Looping through a filtered dataframe to see if a value is in a list column

Question

Apologies for the vague title; I'm not entirely sure how to word it better. I have a DataFrame like this: Which is created with this: And the logic behind it is that each line is a customer record: they can only ever save one product at a time (which is why savedProduct has one product code) but they can

Accepted Answer

Try:

    import numpy as np

    df = df.explode('purchasedProduct').reset_index(drop=True)
    df['purchase_date'] = df.groupby('customerID').apply(
        lambda df: df.apply(
            lambda x: np.nan if x.savedProduct == 0 else df.loc[df.purchasedProduct == x.savedProduct, 'date'], axis=1))

This first explodes the rows that hold lists in purchasedProduct, creating a separate row for each item in the list. It then adds a purchase_date column, so you can determine at row level if and when the saved product was bought.

    date        customerID  saved   purchased   savedProduct    purchasedProduct    purchase_date
    2021-01-01  456789      1       0           11223344        0                   2021-01-03
    2021-01-01  456789      1       0           55667788        0                   NaN
    2021-01-03  456789      0       1           0               11223344            NaN
    2021-01-03  456789      0       1           0               28373827            NaN

Of course you can filter the df to only have rows with saved products:

    df.loc[df.saved==1]

    date        customerID  saved   purchased   savedProduct    purchasedProduct    purchase_date
    2021-01-01  456789      1       0           11223344        0                   2021-01-03
    2021-01-01  456789      1       0           55667788        0                   NaN

Or with only certain columns:

    df.loc[df.saved==1, ['customerID', 'savedProduct', 'date', 'purchase_date']]

    customerID  savedProduct    date        purchase_date
    456789      11223344        2021-01-01  2021-01-03
    456789      55667788        2021-01-01  NaN
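As an alternative to the nested apply, the same lookup can be sketched with explode followed by a left merge. This is only a sketch under assumptions: the sample frame is reconstructed from the output table in the answer (the question's own construction code is not shown here), and the intermediate names exploded, purchases, saved and result are made up for illustration.

```python
import pandas as pd

# Sample data reconstructed from the answer's output table (an assumption;
# the question's original construction code is not available here).
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-01", "2021-01-03"],
    "customerID": [456789, 456789, 456789],
    "saved": [1, 1, 0],
    "purchased": [0, 0, 1],
    "savedProduct": [11223344, 55667788, 0],
    "purchasedProduct": [0, 0, [11223344, 28373827]],
})
df["date"] = pd.to_datetime(df["date"])

# One row per purchased product; scalar entries are left unchanged.
exploded = df.explode("purchasedProduct").reset_index(drop=True)

# Lookup table: earliest purchase date per (customer, product).
# The columns are renamed so they line up with the saved-product keys.
purchases = (
    exploded.loc[exploded["purchased"] == 1,
                 ["customerID", "purchasedProduct", "date"]]
    .rename(columns={"purchasedProduct": "savedProduct",
                     "date": "purchase_date"})
    .astype({"savedProduct": "int64"})  # explode leaves an object column
    .sort_values("purchase_date")
    .drop_duplicates(["customerID", "savedProduct"])
)

# Keep only the saved rows and attach the purchase date where one exists.
saved = exploded.loc[exploded["saved"] == 1,
                     ["customerID", "savedProduct", "date"]]
result = saved.merge(purchases, on=["customerID", "savedProduct"], how="left")
print(result)
```

Saved products that were never purchased end up with a missing purchase_date (NaT, since the dates are parsed as datetimes). Note that this sketch matches any purchase of the saved product by the same customer; it does not check that the purchase happened after the save, so add that filter if it matters.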

Answer