I am trying to filter a dataframe using the isin() function by passing in a list and comparing with a dataframe column that also contains lists. This is an extension of the question below:
How to implement ‘in’ and ‘not in’ for Pandas dataframe
For example, instead of having one country in each row, now each row contains a list of countries.
df = pd.DataFrame({'countries':[['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})
And to filter, I set two separate lists:
countries = ['UK','US']
countries_2 = ['UK']
The intended results should be the same, because rows 0 and 1 both contain UK and/or US:
>>> df[df.countries.isin(countries)]
  countries
0    US, UK
1        UK
>>> df[~df.countries.isin(countries_2)]
  countries
0    US, UK
1        UK
However, Python raised the following error:
TypeError: unhashable type: 'list'
Answer
One possible solution is to use sets with issubset or isdisjoint, applied to the column with map:
print (df[df.countries.map(set(countries).issubset)])
  countries
0  [US, UK]

print (df[~df.countries.map(set(countries).isdisjoint)])
  countries
0  [US, UK]
1      [UK]

print (df[df.countries.map(set(countries_2).issubset)])
  countries
0  [US, UK]
1      [UK]

print (df[~df.countries.map(set(countries_2).isdisjoint)])
  countries
0  [US, UK]
1      [UK]
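The two methods answer different questions, which is why the first pair of results above differs. As the error message indicates, isin() needs hashable cell values and a list is not hashable, so the membership test has to be applied per row instead. The following is a minimal self-contained sketch using the question's data; the names wanted, any_mask, and all_mask are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'countries': [['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})
countries = ['UK', 'US']

# Build the lookup set once; each row's list is then tested against it.
wanted = set(countries)

# "Any" semantics: keep rows sharing at least one country with `wanted`.
any_mask = df['countries'].map(lambda row: not wanted.isdisjoint(row))

# "All" semantics: keep rows that contain every country in `wanted`.
all_mask = df['countries'].map(wanted.issubset)

any_rows = df[any_mask].index.tolist()
all_rows = df[all_mask].index.tolist()
print(any_rows)  # [0, 1]
print(all_rows)  # [0]
```

The "any" variant, based on isdisjoint, matches the result the question asks for (rows 0 and 1). The issubset variant is stricter and keeps only rows that list every requested country.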