Search and filter pandas dataframe with regular expressions

Question

I'd appreciate your help. I have a pandas dataframe. I want to search 3 columns of the dataframe using a regular expression, then return all rows that meet the search criteria, sorted by one of my columns. I would like to write this as a function so I can implement this logic with other criteria if possible, but am not

Accepted Answer

You can use apply to make the code more concise. For example, given this DataFrame:

    import pandas as pd

    df = pd.DataFrame(
        {
            'col1': ['vhigh', 'low', 'vlow'],
            'col2': ['eee', 'low', 'high'],
            'val': [100, 200, 300]
        }
    )
    print(df)

Input:

        col1  col2  val
    0  vhigh   eee  100
    1    low   low  200
    2   vlow  high  300

You can select all the rows that contain the strings vhigh or high in columns col1 or col2 as follows:

    mask = df[['col1', 'col2']].apply(
        lambda x: x.str.contains(
            'vhigh|high',
            regex=True
        )
    ).any(axis=1)
    print(df[mask])

The apply function applies the contains function to each column (since by default axis=0), producing a DataFrame of Booleans. Calling any(axis=1) on that result reduces it to a Boolean mask with one element per row, where True indicates that at least one of the columns met the search criteria. This mask can then be used to select rows from the original DataFrame.

Output:

        col1  col2  val
    0  vhigh   eee  100
    2   vlow  high  300

Then, to sort the result by a column, e.g. the val column, you could simply do:

    df[mask].sort_values('val')

Note that DataFrame.sort was removed in pandas 0.20; use sort_values in current versions.
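Since the question asks for a reusable function, the steps can be wrapped up roughly as follows. This is a minimal sketch rather than a definitive implementation: the function name filter_by_regex and its parameters (columns, pattern, sort_by) are hypothetical names chosen for illustration, and passing na=False is an added assumption so that missing values count as non-matches.

```python
import pandas as pd


def filter_by_regex(df, columns, pattern, sort_by=None):
    """Return rows where any of `columns` matches `pattern`, optionally sorted.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Test every listed column against the regex; na=False treats NaN as no match.
    mask = df[columns].apply(
        lambda col: col.str.contains(pattern, regex=True, na=False)
    ).any(axis=1)  # True if at least one column matched in that row
    result = df[mask]
    if sort_by is not None:
        result = result.sort_values(sort_by)
    return result


df = pd.DataFrame({
    'col1': ['vhigh', 'low', 'vlow'],
    'col2': ['eee', 'low', 'high'],
    'val': [100, 200, 300],
})

result = filter_by_regex(df, ['col1', 'col2'], 'vhigh|high', sort_by='val')
print(result)
```

Other criteria can be tried by changing the columns list or the pattern. The listed columns are assumed to hold strings, because the .str accessor raises an error on numeric columns.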

Answer