Skip to content
Advertisement

How to filter for columns where the first row (not header) starts with string

I’m trying to filter a dataframe by the first row, but can’t seem to figure out how to do it.

Here’s a sample version of the data I’m working with:

In [11]: df = pd.DataFrame(
    ...:     [['Open-Ended Response', 'Open-Ended Response', 'Response', 'Response'], [1, 2, 3, 4]],
    ...:     columns=list('ABCD'),
    ...: )

In [12]: df
Out[12]:
                     A                    B         C         D
0  Open-Ended Response  Open-Ended Response  Response  Response
1                    1                    2         3         4

What I want to do is filter for all columns that start with “Response” in the first non-header row. So in this case, just have the last two columns in there own dataframe.

I can easily filter the header with something like this:

respo = [col for col in df if col.startswith('Response')]

But it doesn’t seem to work if it’s the 1t non-header row. Importantly, I need to keep the current header after I filter.

Thank you.

Advertisement

Answer

First step is to select the values of the first row:

df.iloc[0]  # selects the values in the first row

Then, use python’s .str StringAccessor methods for working with data values rather than column names:

df.iloc[0].str.startswith('Response') # Test the result of the above line

This will give you a Series with True/False values indexed by column name. Finally, use this to select the columns from your dataframe based on the matched labels:

df.loc[:, df.iloc[0].str.startswith('Response')] # Select columns based on the test

This should do the trick!

See pandas’s docs on Indexing and Selecting Data and the StringAccessor methods for more help.

Advertisement