Pandas .isin on column entries containing lists

I am trying to filter a dataframe using the isin() function by passing in a list and comparing with a dataframe column that also contains lists. This is an extension of the question below:

How to implement ‘in’ and ‘not in’ for Pandas dataframe

For example, instead of having one country in each row, now each row contains a list of countries.

import pandas as pd

df = pd.DataFrame({'countries': [['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})

And to filter, I set two separate lists:

countries = ['UK','US']
countries_2 = ['UK']

The intended results should be the same for both lists, because rows 0 and 1 each contain UK and/or US:

>>> df[df.countries.isin(countries)]
  countries
0     US, UK
1         UK
>>> df[df.countries.isin(countries_2)]
  countries
0     US, UK
1         UK

However, Python threw the following error:

TypeError: unhashable type: 'list'

Answer

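One possible solution is to use sets with map: issubset keeps rows that contain all of the listed countries, while a negated isdisjoint keeps rows that contain at least one of them (which is what the question asks for):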

# rows whose list contains ALL of countries
print(df[df.countries.map(set(countries).issubset)])
  countries
0  [US, UK]

# rows whose list contains AT LEAST ONE of countries
print(df[~df.countries.map(set(countries).isdisjoint)])
  countries
0  [US, UK]
1      [UK]
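
To see why the two masks differ, it may help to look at the set methods on their own, without pandas. This is only a small sketch; the names `wanted`, `rows` and `results` are made up for illustration. `issubset` asks whether every wanted country appears in the row, while a negated `isdisjoint` asks whether at least one does:

```python
wanted = {'UK', 'US'}
rows = [['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]

results = []
for row in rows:
    contains_all = wanted.issubset(row)        # every wanted item is in row
    contains_any = not wanted.isdisjoint(row)  # at least one wanted item is in row
    results.append((contains_all, contains_any))
    print(row, contains_all, contains_any)
```

Only the first row passes the "all" test, but the first two rows pass the "any" test, which matches the dataframe output above.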

# rows whose list contains ALL of countries_2
print(df[df.countries.map(set(countries_2).issubset)])
  countries
0  [US, UK]
1      [UK]

# rows whose list contains AT LEAST ONE of countries_2
print(df[~df.countries.map(set(countries_2).isdisjoint)])
  countries
0  [US, UK]
1      [UK]
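
For reuse, the "at least one match" filter could be wrapped in a helper. The sketch below is self-contained and rebuilds the data from the question; the function names `any_match` and `any_match_explode` are hypothetical, not part of pandas. The second helper is a different technique from the one above: it flattens the lists with `Series.explode` (assuming pandas 0.25 or later), applies the ordinary `isin`, and then collapses back to one value per original row.

```python
import pandas as pd

df = pd.DataFrame({'countries': [['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})
countries = ['UK', 'US']
countries_2 = ['UK']


def any_match(series, wanted):
    """Mask that is True where a row's list shares an item with wanted."""
    wanted = set(wanted)
    # isdisjoint is True when there is NO overlap, so negate it
    return ~series.map(wanted.isdisjoint).astype(bool)


def any_match_explode(series, wanted):
    """Same mask via explode: one row per list item, then isin, then regroup."""
    exploded = series.explode()            # index repeats for multi-item lists
    hits = exploded.isin(wanted)           # plain isin works on scalar strings
    return hits.groupby(level=0).any()     # True if any item of the row matched


rows_1 = df[any_match(df.countries, countries)].index.tolist()
rows_2 = df[any_match(df.countries, countries_2)].index.tolist()
rows_3 = df[any_match_explode(df.countries, countries_2)].index.tolist()
print(rows_1, rows_2, rows_3)
```

Both helpers select rows 0 and 1 for either list. The set-based version assumes every cell is an iterable such as a list; a column with missing values would need to be cleaned first.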
User contributions licensed under: CC BY-SA