I have a list of strings of various lengths stored in df. The total number of rows in df is 301501. An example is as follows:
>> df
        item
>> 0    ['Tom', 'David']
>> 1    ['Robert', 'Jennifer', 'Jane']
>> 2    ['Robert', 'Tom', 'Patricia']
>> 3    ['Thomas', 'David', 'Chloe', 'Michelle']
I have also stored a list of female names in another list called f_name.
I want to create another column in df that keeps only the elements found in f_name, filtering out everything else. What I tried was this:
df['f_item'] = [item for item in df['item'] if f_name in item]
The error received is “ValueError: Length of values (0) does not match length of index (301501)”. How do I create a new column with a filtered list that only contains elements from f_name?
Answer
Assuming your item column actually contains lists of strings (and not just strings that look like lists, e.g. '[1, 2, 3]'), cast f_name to a set and perform a set intersection:
f_name = set(f_name)
df["item"].apply(f_name.intersection)
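If the column turns out to hold strings that only look like lists (for instance after a round trip through CSV), the intersection would run over individual characters instead of names. A minimal sketch of one way to convert them first, assuming every string is a valid Python list literal; the `raw` frame and the `to_list` helper are hypothetical names used only for illustration:

```python
import ast

import pandas as pd

# Hypothetical sample: the lists were read back as plain strings.
raw = pd.DataFrame(
    {"item": ["['Tom', 'David']", "['Robert', 'Jennifer', 'Jane']"]}
)


def to_list(value):
    """Parse a string such as "['a', 'b']" into a list; leave real lists alone."""
    return ast.literal_eval(value) if isinstance(value, str) else value


# After this step each cell is a real Python list of strings.
raw["item"] = raw["item"].apply(to_list)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for this kind of cleanup.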
Demo:
In [3]: df
Out[3]:
                              item
0                     [Tom, David]
1         [Robert, Jennifer, Jane]
2          [Robert, Tom, Patricia]
3  [Thomas, David, Chloe, Michelle]

In [4]: f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

In [5]: df.item.apply(f_name.intersection)
Out[5]:
0                   {}
1     {Jane, Jennifer}
2           {Patricia}
3    {Michelle, Chloe}
Name: item, dtype: object
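Set intersection returns sets, so the original order and any duplicates are lost. If you need the new column to hold lists in their original order, a nested list comprehension is one alternative. This is a sketch using the sample data from the question, not a benchmark of what is fastest on 301501 rows. It also shows why the original attempt failed: `f_name in item` asks whether the whole name list is a single element of `item`, which is never true, so every row was dropped and zero values were produced.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "item": [
            ["Tom", "David"],
            ["Robert", "Jennifer", "Jane"],
            ["Robert", "Tom", "Patricia"],
            ["Thomas", "David", "Chloe", "Michelle"],
        ]
    }
)

# A set makes each membership test O(1) on average.
f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

# The outer comprehension yields one list per row, so the lengths match;
# the inner one keeps only the names present in f_name, in original order.
df["f_item"] = [[name for name in names if name in f_name] for names in df["item"]]
```

Rows with no match get an empty list, so the new column always has the same length as the index.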