Pandas apply condition on a column that contains list

I want to create a new column based on a condition that has to be applied to a list. Here’s a reproducible example:

As one can see, each object in the BRAND column is a list that can contain one or more elements (the list can also be empty as for the row where ID = 1).
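A frame of this shape might be built as in the sketch below. The column names ID and BRAND and the empty list for ID = 1 come from the description above; every other value is assumed purely for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the column names and the empty list
# for ID = 1 are taken from the question; the rest is made up.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "BRAND": [[], ["LVH"], ["FER", "MER"], ["WDC", "MER"]],
})

# Each cell of BRAND holds a Python list (possibly empty).
print(df)
```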

Now, given the following target list target_list = ["LVH", "WDC"], my goal is to create a new column based on the following rule: if at least one element of target_list (i.e. either LVH or WDC) is present in the BRAND column value, then assign a flag equal to Y in a new column (otherwise assign N). The resulting DataFrame for the above example should look as follows:

Answer

Option 1

This seems to be a bit faster than Option 2 below on a larger dataset:

Explanation:

  • Use Series.explode to “[t]ransform each element of a list-like to a row”.
  • Check for matches with Series.isin, and get True or False.
  • We now have a series with repeated index values (one row per list element), so use Series.groupby on the index to bring each original row's elements back together, apply any, and get a pd.Series back with booleans in the correct shape.
  • Finally, use Series.map to turn False and True into "N" and "Y" respectively.
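The steps above can be sketched as follows. This is a minimal reconstruction under assumptions, not necessarily the exact code of the answer: the sample rows and the FLAG column name are hypothetical, and grouping with level=0 is one way to group on the original index.

```python
import pandas as pd

# Hypothetical sample data (only ID, BRAND and target_list follow the question).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "BRAND": [[], ["LVH"], ["FER", "MER"], ["WDC", "MER"]],
})
target_list = ["LVH", "WDC"]

# 1. One row per list element; an empty list becomes a single NaN row.
exploded = df["BRAND"].explode()

# 2. True where the element is one of the targets (NaN gives False).
matches = exploded.isin(target_list)

# 3. Collapse back to one boolean per original row via the index.
any_match = matches.groupby(level=0).any()

# 4. Translate the booleans into the "Y"/"N" flag ("FLAG" is an assumed name).
df["FLAG"] = any_match.map({True: "Y", False: "N"})
print(df)
```

Because explode keeps the original index labels, the grouped result aligns with df when it is assigned back as a column.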

Option 2

Basically the same performance as the answer by @AnoushiravanR:

Explanation: set(list_a) & set(list_b) is shorthand for set(list_a).intersection(set(list_b)), and we pass the result to len(). If the two lists share no elements, len(...) == 0, which results in False.
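A minimal sketch of this set-intersection approach is shown below. The sample rows, the helper name has_target and the FLAG column name are assumptions for illustration; the answer's own code may be written differently.

```python
import pandas as pd

# Hypothetical sample data (only ID, BRAND and target_list follow the question).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "BRAND": [[], ["LVH"], ["FER", "MER"], ["WDC", "MER"]],
})
target_list = ["LVH", "WDC"]


def has_target(brands):
    """Return True if the row's list shares at least one element with target_list."""
    # An empty intersection has length 0, so the comparison yields False.
    return len(set(brands) & set(target_list)) > 0


# Apply row by row, then translate the booleans into "Y"/"N".
df["FLAG"] = df["BRAND"].apply(has_target).map({True: "Y", False: "N"})
print(df)
```

Unlike the explode-based version, this works on each list directly, so no regrouping step is needed.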

User contributions licensed under: CC BY-SA