I am trying to filter a dataframe using the isin() function by passing in a list and comparing with a dataframe column that also contains lists. This is an extension of the question below:
How to implement ‘in’ and ‘not in’ for Pandas dataframe
For example, instead of having one country in each row, now each row contains a list of countries.
JavaScript
x
2
1
df = pd.DataFrame({'countries':[['US', 'UK'], ['UK'], ['Germany', 'France'], ['China']]})
2
And to filter, I set two separate lists:
JavaScript
1
3
1
countries = ['UK','US']
2
countries_2 = ['UK']
3
The intended results should be the same because both rows 0 and 1 contain UK and/or US
JavaScript
1
9
1
>>> df[df.countries.isin(countries)]
2
countries
3
0 US, UK
4
1 UK
5
>>> df[~df.countries.isin(countries_2)]
6
countries
7
0 US, UK
8
1 UK
9
However Python threw the following error
JavaScript
1
2
1
TypeError: unhashable type: 'list'
2
Advertisement
Answer
One possible solutions with sets and issubset
or isdisjoint
with map
:
JavaScript
1
19
19
1
print (df[df.countries.map(set(countries).issubset)])
2
countries
3
0 [US, UK]
4
5
print (df[~df.countries.map(set(countries).isdisjoint)])
6
countries
7
0 [US, UK]
8
1 [UK]
9
10
print (df[df.countries.map(set(countries_2).issubset)])
11
countries
12
0 [US, UK]
13
1 [UK]
14
15
print (df[~df.countries.map(set(countries_2).isdisjoint)])
16
countries
17
0 [US, UK]
18
1 [UK]
19