Conditionally replace values in list of arrays in a pandas dataframe

I would like to conditionally replace values in a DataFrame column in which each cell holds a list of strings.

Example dataset below (my real dataset contains many more columns and rows):

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', '1 pumpkin']   A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['1 pumpkin']                          B
...     ...                                    ...

For example, if the condition is A and the list contains '1 pumpkin', I would like to replace that element with 'XXX'. But if the condition is B and the list contains '1 pumpkin', I would like to replace it with 'YYY'.

Desired output

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', 'XXX']         A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['YYY']                                B
...     ...                                    ...

In practice I need to replace many such values; '1 pumpkin' is just one example. Importantly, I would like to keep the list structure in each cell. Thanks!

Answer

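Use explode to put each list element on its own row, apply np.select to the flattened values, then group by the original index to rebuild the lists: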

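import numpy as np

# One row per list element; the original index is repeated for each element
s = df.explode('lists')
cond = s['lists'].eq('1 pumpkin')
c1 = cond & s['condition'].eq('A')
c2 = cond & s['condition'].eq('B')
# Replace matching elements; every other element keeps its value
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].to_numpy())
# Collect the elements back into one list per original row
df['lists'] = s.groupby(level=0)['lists'].agg(list)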
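
Since the question says '1 pumpkin' is only one of many values to replace, here is a minimal sketch of one way to extend the idea to several (condition, value) pairs. It is not the method from the answer above: it swaps explode/np.select for a plain list comprehension with a dictionary lookup. The `replacements` dictionary and the `replace_items` helper are hypothetical names introduced for illustration, and the DataFrame is rebuilt from the question's sample rows.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'],
              ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'],
              ['5 kiwis'],
              ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})

# Hypothetical lookup table: (condition, value) -> replacement.
# Add one entry per pair that should be replaced.
replacements = {
    ('A', '1 pumpkin'): 'XXX',
    ('B', '1 pumpkin'): 'YYY',
}

def replace_items(items, condition):
    # Items with no entry for this condition are kept unchanged
    return [replacements.get((condition, item), item) for item in items]

# Build the new column row by row, preserving each cell's list structure
df['lists'] = [replace_items(items, condition)
               for items, condition in zip(df['lists'], df['condition'])]
```

One difference that may matter on real data: this version leaves an empty list as an empty list, whereas explode turns an empty list into a NaN row, so the rebuilt cell would become [nan].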

User contributions licensed under: CC BY-SA