
Return a list of the matching strings if a column contains any of them

For each value in the Names column, I would like to check whether it contains any of the strings in kw. If it does, I want to return a list of the strings that match.

Here is the data:

import pandas as pd

df = pd.DataFrame({'Names':['APPLE JUICE','APPLE DRINK','APPLE JUICE DRINK', 'APPLE','ORANGE AVAILABLE','TEA AVAILABLE']})
kw = ['APPLE JUICE', 'DRINK', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

I’ve tried:

df['Names2'] = df['Names'].apply(lambda x: [k if any([k in x for k in kw]) else ''])

But it returns:

   Names              Names2
0  APPLE JUICE        [<function <lambda> at 0x0000017BB875C550>]
1  APPLE DRINK        [<function <lambda> at 0x0000017BB875C550>]
2  APPLE JUICE DRINK  [<function <lambda> at 0x0000017BB875C550>]
3  APPLE              []
4  ORANGE AVAILABLE   [<function <lambda> at 0x0000017BB875C550>]
5  TEA AVAILABLE      []
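The <function <lambda>> entries point at a scoping detail. In Python 3, the k of the inner list comprehension is local to that comprehension, so the k before "if" is a different name that gets looked up in the enclosing scope. A plausible assumption (the asker's session is not shown) is that a global k bound to a lambda already existed; without one, the attempt would raise NameError whenever a keyword matched. The sketch below reproduces this with a hypothetical global k:

```python
# Hypothetical stand-in for whatever global `k` existed in the asker's session.
k = lambda s: s

kw = ['APPLE JUICE', 'DRINK', 'ORANGE']

# The inner comprehension has its own `k`, which does not leak out in Python 3.
# When a keyword matches, the outer `k` resolves to the global lambda instead.
broken = [k if any([k in 'APPLE JUICE' for k in kw]) else '']

# When nothing matches, the result is a one-element list holding ''.
no_match = [k if any([k in 'APPLE' for k in kw]) else '']

print(broken)    # a one-element list holding the lambda object
print(no_match)  # ['']
```

A list containing only an empty string is probably why rows 3 and 5 display as [] in the pandas output, since pandas prints list elements without quotes.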

I am expecting an output like:

   Names              Names2
0  APPLE JUICE        ['APPLE JUICE']
1  APPLE DRINK        ['DRINK']
2  APPLE JUICE DRINK  ['APPLE JUICE', 'DRINK']
3  APPLE              []
4  ORANGE AVAILABLE   ['ORANGE']
5  TEA AVAILABLE      []


Answer

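You were very close. Make each keyword the element of the list comprehension, and keep it only if it occurs in the name: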

# keep every keyword from kw that is a substring of the name (in kw order)
df['Names2'] = df['Names'].map(lambda x: [y for y in kw if y in x])
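Here is a self-contained, runnable version of the answer that reproduces the expected output for the question's data. The Names3 column is an alternative that does not come from the original answer. It builds a single regular expression from kw and applies it with Series.str.findall. That variant returns matches in the order they appear in the string rather than in kw order, and it cannot report two keywords whose matches overlap. Sorting the keywords longest first is an assumption about the preferred behavior: it makes a longer keyword such as 'APPLE JUICE' win over any shorter keyword that starts the same way.

```python
import re

import pandas as pd

df = pd.DataFrame({'Names': ['APPLE JUICE', 'APPLE DRINK', 'APPLE JUICE DRINK',
                             'APPLE', 'ORANGE AVAILABLE', 'TEA AVAILABLE']})
kw = ['APPLE JUICE', 'DRINK', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

# The answer's approach: keep every keyword that is a substring of the name.
df['Names2'] = df['Names'].map(lambda x: [y for y in kw if y in x])

# Alternative (not from the answer): one regex alternation, longest keywords first.
# re.escape protects against regex metacharacters inside the keywords.
pattern = '|'.join(re.escape(k) for k in sorted(kw, key=len, reverse=True))
df['Names3'] = df['Names'].str.findall(pattern)

print(df)
```

For this data, no keyword overlaps another, so Names2 and Names3 hold the same lists. The plain comprehension is the safer choice when keywords can overlap or when the results must follow the order of kw.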

User contributions licensed under: CC BY-SA