I have a DataFrame df whose item column holds lists of strings of varying lengths. df has 301501 rows in total. Here is an example:
>>> df
   item
0  ['Tom', 'David']
1  ['Robert', 'Jennifer', 'Jane']
2  ['Robert', 'Tom', 'Patricia']
3  ['Thomas', 'David', 'Chloe', 'Michelle']
I have also stored a list of female names in a separate list called f_name.
I want to create a new column in df that keeps only the elements found in f_name. This is what I tried:
df['f_item'] = [item for item in df['item'] if f_name in item]
This raises “ValueError: Length of values (0) does not match length of index (301501)”. How do I create a new column holding a filtered list that contains only the elements from f_name?
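The ValueError comes from the comprehension returning an empty list. `f_name in item` asks whether the whole list f_name is itself one element of the row's list, which is never true, so every row is skipped and the result has length 0 instead of one entry per row. A minimal sketch reproducing this, assuming a hypothetical two-row frame and made-up f_name contents:

```python
import pandas as pd

# Hypothetical data in the shape described in the question
df = pd.DataFrame({"item": [["Tom", "David"], ["Robert", "Jennifer", "Jane"]]})
f_name = ["Jane", "Jennifer"]  # assumed contents

# `f_name in item` compares the whole list f_name against each string in item;
# a list never equals a string, so no row passes the filter
attempt = [item for item in df["item"] if f_name in item]
print(len(attempt), len(df))  # 0 2

# Assigning a length-0 list to a non-empty frame triggers the length check
raised = False
try:
    df["f_item"] = attempt
except ValueError as err:
    raised = True
    print(err)
```

The comprehension also iterates over rows rather than over the names inside each row, so even a correct condition would produce a shorter list whenever some rows were dropped.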
Answer
Assuming your item column actually contains lists of strings (and not just strings that look like lists, e.g. '[1, 2, 3]'), convert f_name to a set and take its intersection with each row:
f_name = set(f_name)
df["item"].apply(f_name.intersection)
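If item instead holds strings that only look like lists (the caveat just mentioned), the intersection would iterate over the characters of each string, so no multi-letter name could ever match. One option is to parse each string into a real list first with the standard library's ast.literal_eval. A minimal sketch, assuming hypothetical string-valued cells and made-up f_name contents:

```python
import ast

import pandas as pd

# Hypothetical case: each cell is a string that merely looks like a list
df = pd.DataFrame({"item": ["['Tom', 'David']", "['Robert', 'Jennifer', 'Jane']"]})
f_name = {"Jane", "Jennifer"}  # assumed contents

# Safely evaluate each string literal into an actual Python list
df["item"] = df["item"].apply(ast.literal_eval)

# Now the per-row set intersection works as intended
df["f_item"] = df["item"].apply(f_name.intersection)
print(df["f_item"].tolist())
```

ast.literal_eval only accepts Python literals, so it will raise on cells that are not valid list syntax; that is usually preferable to silently producing empty results.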
Demo:
In [3]: df
Out[3]:
item
0 [Tom, David]
1 [Robert, Jennifer, Jane]
2 [Robert, Tom, Patricia]
3 [Thomas, David, Chloe, Michelle]
In [4]: f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}
In [5]: df.item.apply(f_name.intersection)
Out[5]:
0 {}
1 {Jane, Jennifer}
2 {Patricia}
3 {Michelle, Chloe}
Name: item, dtype: object
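To get the new column the question asks for, the result can be assigned directly, because apply returns exactly one set per row and therefore matches the length of df. Sets are unordered, so if the filtered names should stay lists in their original order, a per-row list comprehension that checks membership against the same set works as well. A minimal sketch using the demo data; the second column name f_item_list is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"item": [["Tom", "David"],
                            ["Robert", "Jennifer", "Jane"],
                            ["Robert", "Tom", "Patricia"],
                            ["Thomas", "David", "Chloe", "Michelle"]]})
f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

# One set per row, so the lengths match and assignment succeeds
df["f_item"] = df["item"].apply(f_name.intersection)

# Order-preserving alternative (hypothetical column name): keep matches as lists
df["f_item_list"] = [[name for name in names if name in f_name] for names in df["item"]]
print(df["f_item_list"].tolist())
# [[], ['Jennifer', 'Jane'], ['Patricia'], ['Chloe', 'Michelle']]
```

Keeping f_name as a set matters at 301501 rows: each membership test against a set takes constant time on average, whereas testing against a list scans it for every name.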