How to query/filter cells against single values when cells have multiple values?

Question

I have a CSV file in the following format:

    Columns one    Column two
    Key1           Value1,Value2,value3
    Key2           value5

I can easily use a list and .isin to filter the data frame as follows:

Which gives me the second row, but if there are cells with multiple values (like in the first row in the example table above w…

Accepted Answer

You can stack the dataframe to reshape it, then split and explode the strings and use isin to test whether each string occurs in list_keep, then groupby on level=0 and reduce with any to create a boolean mask:

    mask = df.stack().str.split(',').explode().isin(list_keep).groupby(level=0).any()

Alternative approach with applymap and set operations:

    mask = df.applymap(lambda s: not set(s.split(',')).isdisjoint(list_keep)).any(axis=1)

    >>> df[mask]
      Columns one            Column two
    0        Key1  Value1,Value2,value3
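The mask-building steps above can be sketched end to end as follows. This is only an illustration: the frame is rebuilt from the example table in the question, and the contents of list_keep are an assumption, since the original list is not shown. The second mask replaces applymap with a column-wise apply plus Series.map, a substitution made here because applymap is deprecated in pandas 2.1 and later; the helper name has_overlap is hypothetical.

```python
import pandas as pd

# Frame rebuilt from the example table in the question.
df = pd.DataFrame({
    "Columns one": ["Key1", "Key2"],
    "Column two": ["Value1,Value2,value3", "value5"],
})

# Assumed filter list; the question's real list is not shown.
list_keep = ["Value2"]

# Approach 1: stack to one long Series, split each cell on commas,
# explode to one token per row, then test membership and reduce
# back to one boolean per original row (index level 0).
tokens = df.stack().str.split(",").explode()
mask = tokens.isin(list_keep).groupby(level=0).any()

# Approach 2: set-based test per cell (hypothetical helper name).
def has_overlap(cell):
    # True when at least one comma-separated token is in list_keep.
    return not set(cell.split(",")).isdisjoint(list_keep)

# Column-wise apply + Series.map stands in for applymap here.
mask_sets = df.apply(lambda col: col.map(has_overlap)).any(axis=1)

result = df[mask]
print(result)
```

Both masks compare whole tokens, so the match is case-sensitive and exact: with this list, "Value2" matches the first row, while "value2" would not.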

Answer