I would like to conditionally replace values in a column that contains a series of arrays.
Example dataset below: (my real dataset contains many more columns and rows)
index  lists                                   condition
0      ['5 apples', '2 pears']                 B
1      ['3 apples', '3 pears', '1 pumpkin']    A
2      ['4 blueberries']                       A
3      ['5 kiwis']                             C
4      ['1 pumpkin']                           B
...    ...                                     ...
For example, if the condition is A and the row contains '1 pumpkin', then I would like to replace the value with XXX. But if the condition is B and the row contains '1 pumpkin', then I would like to replace the value with YYY.
Desired output
index  lists                                   condition
0      ['5 apples', '2 pears']                 B
1      ['3 apples', '3 pears', 'XXX']          A
2      ['4 blueberries']                       A
3      ['5 kiwis']                             C
4      ['YYY']                                 B
...    ...                                     ...
The goal is, in fact, to replace all of these values; '1 pumpkin' is just one example. Importantly, I would like to maintain the array structure. Thanks!
Answer
Let us do explode, then np.select:
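    import numpy as np

    # One row per list element; the original index is repeated for each element
    s = df.explode('lists')
    cond = s['lists'] == '1 pumpkin'
    c1 = cond & s['condition'].eq('A')
    c2 = cond & s['condition'].eq('B')
    # Replace where a condition matches, otherwise keep the original value
    s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)
    # Regroup by the original index to rebuild the lists
    df['lists'] = s.groupby(level=0)['lists'].agg(list)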
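
Since the question mentions that '1 pumpkin' is only one of many values to replace, here is a minimal sketch of how the same explode-and-regroup idea might scale to several rules. The `replacements` dictionary and its contents are assumptions made up for illustration; it swaps np.select for a plain dictionary lookup keyed on (condition, value), so adding a rule does not require another boolean mask. The sample frame simply reproduces the rows shown in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'],
              ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'],
              ['5 kiwis'],
              ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})

# Hypothetical lookup table: (condition, original value) -> replacement
replacements = {
    ('A', '1 pumpkin'): 'XXX',
    ('B', '1 pumpkin'): 'YYY',
}

# One row per list element; the original index is repeated for each element
s = df.explode('lists')

# Look up each (condition, value) pair and fall back to the original value
s['lists'] = [replacements.get((c, v), v)
              for c, v in zip(s['condition'], s['lists'])]

# Regroup by the original index to rebuild the lists
df['lists'] = s.groupby(level=0)['lists'].agg(list)
result = df['lists'].tolist()
```

One caveat with either version: explode turns an empty list into a single NaN row, so a row that started as [] would come back as [nan] after regrouping and may need separate handling.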