I have lists of strings of various lengths stored in a column of df. The total number of rows in df is 301501. An example is as follows:
    >> df
                                              item
    >> 0                          ['Tom', 'David']
    >> 1            ['Robert', 'Jennifer', 'Jane']
    >> 2             ['Robert', 'Tom', 'Patricia']
    >> 3  ['Thomas', 'David', 'Chloe', 'Michelle']
I have also stored a list of female names in another list called f_name. I want to create another column in df that filters out the elements not found in f_name. What I tried was this:
    df['f_item'] = [item for item in df['item'] if f_name in item]
The error received is “ValueError: Length of values (0) does not match length of index (301501)”. How do I create a new column with a filtered list that only contains elements from f_name?
Answer
Assuming your item column actually contains lists of strings (and not just strings that look like lists, e.g. '[1, 2, 3]'), cast f_name to a set and perform a set intersection:
    f_name = set(f_name)  # a set gives fast membership tests
    # Returns a Series of sets; assign it to df["f_item"] to store it as a new column
    df["item"].apply(f_name.intersection)
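As a side note, the attempt in the question fails because `f_name in item` asks whether the whole list f_name is itself an element of item. That is never true, so the comprehension produces an empty list (length 0). The set intersection above returns sets, so the original order of the names and any duplicates are lost. If an ordered list is wanted instead, a minimal sketch could look like the following. The helper name `keep_listed` is hypothetical. The sample rows are copied from the question, the f_name values from the demo, and plain lists stand in for the DataFrame column.

```python
def keep_listed(names, allowed):
    """Return the names that appear in `allowed`, keeping their original order."""
    return [name for name in names if name in allowed]


# Sample rows from the question (plain lists standing in for df["item"]).
items = [
    ["Tom", "David"],
    ["Robert", "Jennifer", "Jane"],
    ["Robert", "Tom", "Patricia"],
    ["Thomas", "David", "Chloe", "Michelle"],
]

# A set makes each `name in allowed` check fast, which matters for ~300k rows.
f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

# One filtered list per input row, so the result has the same length as the input.
filtered = [keep_listed(names, f_name) for names in items]
print(filtered)
# [[], ['Jennifer', 'Jane'], ['Patricia'], ['Chloe', 'Michelle']]
```

With pandas, the same helper could be applied per row, for example `df["f_item"] = df["item"].apply(lambda names: keep_listed(names, f_name))`.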
Demo:
    In [3]: df
    Out[3]:
                                   item
    0                      [Tom, David]
    1          [Robert, Jennifer, Jane]
    2           [Robert, Tom, Patricia]
    3  [Thomas, David, Chloe, Michelle]

    In [4]: f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

    In [5]: df.item.apply(f_name.intersection)
    Out[5]:
    0                   {}
    1     {Jane, Jennifer}
    2           {Patricia}
    3    {Michelle, Chloe}
    Name: item, dtype: object
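If the item column turns out to hold strings that only look like lists (the case this answer assumes away), the values would need to be parsed first. A hedged sketch using the standard library's `ast.literal_eval` follows. The helper name `to_list` and the sample values are illustrative assumptions, not taken from the question's real data.

```python
import ast


def to_list(value):
    """Parse a string such as "['Tom', 'David']" into a list; leave other values unchanged."""
    if isinstance(value, str):
        # literal_eval only evaluates Python literals, so it is safer than eval().
        return ast.literal_eval(value)
    return value


# Hypothetical raw values: list-like strings instead of real lists.
raw = ["['Tom', 'David']", "['Robert', 'Jennifer', 'Jane']"]
parsed = [to_list(value) for value in raw]

f_name = {"Jane", "Jennifer"}
result = [f_name.intersection(names) for names in parsed]
print(parsed[0])  # ['Tom', 'David']
```

In pandas this conversion could be done once with `df["item"] = df["item"].apply(to_list)` before applying the intersection.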