Skip to content
Advertisement

Pandas apply condition on a column that contains list

I want to create a new column based on a condition that have to be applied on a list. Here’s a reproducible example:

import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "BRAND": [[], ["LVH"], ["FER", "MER", "POR"], ["WDC", "AUD", "LVH"], ["AST"]]
    }
)
print(df)
   ID            BRAND
0   1               []
1   2            [LVH]
2   3  [FER, MER, POR]
3   4  [WDC, AUD, LVH]
4   5            [AST]

As one can see, each object in the BRAND column is a list that can contain one or more elements (the list can also be empty as for the row where ID = 1).

Now, given the following target list target_list = ["LVH", "WDC"], my goal is to create a new column based on the following rule: if at least one element of target_list (i.e. either LVH or WDC) is present in the BRAND column value, then assign a flag equal to Y in a new column (otherwise assign N). The resulting DataFrame for the above example should look as follows:

   ID            BRAND FLAG
0   1               []    N
1   2            [LVH]    Y
2   3  [FER, MER, POR]    N
3   4  [WDC, AUD, LVH]    Y
4   5            [AST]    N

Advertisement

Answer

Option 1

Seems to be a bit faster on a larger set than Option 2 below:

df['FLAG'] = df.BRAND.explode().isin(target_list).groupby(level=0, sort=False)
    .any().map({True:'Y',False:'N'})

print(df)

   ID            BRAND FLAG
0   1               []    N
1   2            [LVH]    Y
2   3  [FER, MER, POR]    N
3   4  [WDC, AUD, LVH]    Y
4   5            [AST]    N

Explanation:

  • Use Series.explode to “[t]ransform each element of a list-like to a row”.
  • Check for matches with Series.isin, and get True or False.
  • We now have a series with duplicate rows, so use Series.groupby to isolate the groups, apply any, and get a pd.Series back with booleans in the correct shape.
  • Finally, use Series.map to turn False and True into "N" and "Y" respectively.

Option 2:

Basically same performance as the answer by @AnoushiravanR

df['FLAG'] = df.BRAND.apply(lambda x: 'Y' if len(set(x) & set(target_list)) 
                            else 'N')

print(df)

   ID            BRAND FLAG
0   1               []    N
1   2            [LVH]    Y
2   3  [FER, MER, POR]    N
3   4  [WDC, AUD, LVH]    Y
4   5            [AST]    N

Explanation: set(list_a) & set(list_b) being a shorthand for set_a.intersection(set_b), which we pass to len(). If len(...) == 0, this will result in False.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement