Filter string elements from a list using another list

I have a DataFrame df with a column item, where each row holds a list of strings of varying length. df has 301501 rows in total. For example:

 >>  df

         item
 >>  0   ['Tom', 'David']   
 >>  1   ['Robert', 'Jennifer', 'Jane']   
 >>  2   ['Robert', 'Tom', 'Patricia']   
 >>  3   ['Thomas', 'David', 'Chloe', 'Michelle'] 

I also have a list of female names called f_name.

I want to create another column in df to filter out elements that are not found in f_name. What I tried was this:

df['f_item'] = [item for item in df['item'] if f_name in item]

The error received is “ValueError: Length of values (0) does not match length of index (301501)”. How do I create a new column with a filtered list that only contains elements from f_name?

Answer

Assuming your item column actually contains lists of strings (and not just strings that look like lists, e.g. "['Tom', 'David']"), convert f_name to a set and take the set intersection with each row:

f_name = set(f_name)  # a set gives fast membership tests
df["f_item"] = df["item"].apply(f_name.intersection)  # store the result as the new column
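
If the column turns out to hold strings that only look like lists (this often happens after a round trip through CSV), the intersection would compare single characters against the names and return empty sets. A minimal sketch of parsing the strings first with ast.literal_eval; the raw frame and its two rows are made up for illustration:

```python
import ast

import pandas as pd

# Hypothetical sample: each cell is a string that only looks like a list.
raw = pd.DataFrame({"item": ["['Tom', 'David']", "['Robert', 'Jennifer', 'Jane']"]})
f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

# Turn each string into a real Python list, then intersect as before.
parsed = raw["item"].apply(ast.literal_eval)
result = parsed.apply(f_name.intersection)
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for this kind of cleanup.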

Demo:

In [3]: df
Out[3]:
                               item
0                      [Tom, David]
1          [Robert, Jennifer, Jane]
2           [Robert, Tom, Patricia]
3  [Thomas, David, Chloe, Michelle]

In [4]: f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

In [5]: df.item.apply(f_name.intersection)
Out[5]:
0                   {}
1     {Jane, Jennifer}
2           {Patricia}
3    {Michelle, Chloe}
Name: item, dtype: object
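
The original attempt fails because f_name in item asks whether the whole f_name list is itself an element of item. That is never true, so the comprehension yields an empty list whose length does not match the index. Note also that the set intersection returns sets, which drop the original order and any duplicate names. If the new column should hold lists in their original order, a per-row list comprehension is one option. A sketch using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({"item": [
    ["Tom", "David"],
    ["Robert", "Jennifer", "Jane"],
    ["Robert", "Tom", "Patricia"],
    ["Thomas", "David", "Chloe", "Michelle"],
]})
f_name = {"Jane", "Michelle", "Patricia", "Jennifer", "Chloe"}

# Keep only the names found in f_name, preserving order and duplicates.
df["f_item"] = df["item"].apply(
    lambda names: [name for name in names if name in f_name]
)
```

Keeping f_name as a set still matters here, since each name in f_name check is then a constant-time lookup across all 301501 rows.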
User contributions licensed under: CC BY-SA