I want to create a new column based on a condition that have to be applied on a list. Here’s a reproducible example:
import pandas as pd df = pd.DataFrame( { "ID": [1, 2, 3, 4, 5], "BRAND": [[], ["LVH"], ["FER", "MER", "POR"], ["WDC", "AUD", "LVH"], ["AST"]] } ) print(df) ID BRAND 0 1 [] 1 2 [LVH] 2 3 [FER, MER, POR] 3 4 [WDC, AUD, LVH] 4 5 [AST]
As one can see, each object in the BRAND
column is a list that can contain one or more elements (the list can also be empty as for the row where ID = 1
).
Now, given the following target list target_list = ["LVH", "WDC"]
, my goal is to create a new column based on the following rule: if at least one element of target_list
(i.e. either LVH
or WDC
) is present in the BRAND
column value, then assign a flag equal to Y
in a new column (otherwise assign N
). The resulting DataFrame for the above example should look as follows:
ID BRAND FLAG 0 1 [] N 1 2 [LVH] Y 2 3 [FER, MER, POR] N 3 4 [WDC, AUD, LVH] Y 4 5 [AST] N
Advertisement
Answer
Option 1
Seems to be a bit faster on a larger set than Option 2 below:
df['FLAG'] = df.BRAND.explode().isin(target_list).groupby(level=0, sort=False) .any().map({True:'Y',False:'N'}) print(df) ID BRAND FLAG 0 1 [] N 1 2 [LVH] Y 2 3 [FER, MER, POR] N 3 4 [WDC, AUD, LVH] Y 4 5 [AST] N
Explanation:
- Use
Series.explode
to “[t]ransform each element of a list-like to a row”. - Check for matches with
Series.isin
, and getTrue
orFalse
. - We now have a series with duplicate rows, so use
Series.groupby
to isolate the groups, applyany
, and get apd.Series
back with booleans in the correct shape. - Finally, use
Series.map
to turnFalse
andTrue
into"N"
and"Y"
respectively.
Option 2:
Basically same performance as the answer by @AnoushiravanR
df['FLAG'] = df.BRAND.apply(lambda x: 'Y' if len(set(x) & set(target_list)) else 'N') print(df) ID BRAND FLAG 0 1 [] N 1 2 [LVH] Y 2 3 [FER, MER, POR] N 3 4 [WDC, AUD, LVH] Y 4 5 [AST] N
Explanation: set(list_a) & set(list_b)
being a shorthand for set_a.intersection(set_b)
, which we pass to len()
. If len(...) == 0
, this will result in False
.