I would like to conditionally replace values in a column whose cells each contain an array (a list).
Example dataset below (my real dataset contains many more columns and rows):
```
index  lists                                  condition
0      ['5 apples', '2 pears']                B
1      ['3 apples', '3 pears', '1 pumpkin']   A
2      ['4 blueberries']                      A
3      ['5 kiwis']                            C
4      ['1 pumpkin']                          B
```
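To reproduce the example, the data can be built as below. This is a minimal sketch that assumes a pandas DataFrame with a 'lists' column holding Python lists and a 'condition' column; the default RangeIndex stands in for the index column shown above.

```python
import pandas as pd

# Reconstruction of the example data; each cell in 'lists' is a Python list
df = pd.DataFrame({
    'lists': [
        ['5 apples', '2 pears'],
        ['3 apples', '3 pears', '1 pumpkin'],
        ['4 blueberries'],
        ['5 kiwis'],
        ['1 pumpkin'],
    ],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})
print(df)
```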
For example, if the condition is A and the list contains '1 pumpkin', I would like to replace that element with 'XXX'. But if the condition is B and the list contains '1 pumpkin', I would like to replace it with 'YYY'.
Desired output
```
index  lists                                  condition
0      ['5 apples', '2 pears']                B
1      ['3 apples', '3 pears', 'XXX']         A
2      ['4 blueberries']                      A
3      ['5 kiwis']                            C
4      ['YYY']                                B
```
In practice I want to replace many such values; '1 pumpkin' is just one example. Importantly, I would like each cell to remain an array (list). Thanks!
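One way to handle many replacements while keeping every cell a list is a lookup table keyed by (condition, original value), applied element by element with a list comprehension. This is a sketch, not the accepted answer's method: the rules dictionary and the replace_in_lists helper are hypothetical names, and it assumes the column names 'lists' and 'condition' from the example.

```python
import pandas as pd

# Hypothetical rule table: (condition, original value) -> replacement.
# Add as many pairs as needed.
rules = {
    ('A', '1 pumpkin'): 'XXX',
    ('B', '1 pumpkin'): 'YYY',
}

def replace_in_lists(df, rules, list_col='lists', cond_col='condition'):
    """Return a new Series with each list element replaced per rules.

    Elements with no matching (condition, value) key are kept unchanged,
    so every cell stays a list of the same length.
    """
    return pd.Series(
        [[rules.get((cond, item), item) for item in items]
         for items, cond in zip(df[list_col], df[cond_col])],
        index=df.index,
    )

df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'], ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'], ['5 kiwis'], ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})
df['lists'] = replace_in_lists(df, rules)
print(df)
```

Because nothing is exploded, empty lists and duplicate index labels pass through untouched; the trade-off is a Python-level loop over all elements.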
Answer
Let us explode the lists into one row per element, apply np.select, then group the elements back into lists:
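```python
import numpy as np

# One row per list element; the original index label is repeated
s = df.explode('lists')
cond = s['lists'] == '1 pumpkin'
c1 = cond & s['condition'].eq('A')
c2 = cond & s['condition'].eq('B')
# 'XXX' where c1 holds, 'YYY' where c2 holds, otherwise keep the element
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)
# Collapse back to one list per original row
df['lists'] = s.groupby(level=0)['lists'].agg(list)
```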
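To cover "all these values" with the same explode/np.select approach, the condition masks can be generated from a table of rules instead of being written out one by one. This is a minimal sketch: the rules dictionary is a hypothetical name, and the column names are taken from the example.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'lists': [['5 apples', '2 pears'], ['3 apples', '3 pears', '1 pumpkin'],
              ['4 blueberries'], ['5 kiwis'], ['1 pumpkin']],
    'condition': ['B', 'A', 'A', 'C', 'B'],
})

# Hypothetical rule table: (condition, original value) -> replacement
rules = {
    ('A', '1 pumpkin'): 'XXX',
    ('B', '1 pumpkin'): 'YYY',
}

s = df.explode('lists')
# One boolean mask per rule, in the same order as the replacements
conds = [s['condition'].eq(c) & s['lists'].eq(v) for (c, v) in rules]
choices = list(rules.values())
# np.select uses the first matching mask; unmatched elements keep their value
s['lists'] = np.select(conds, choices, default=s['lists'].values)
# Re-assemble one list per original row
df['lists'] = s.groupby(level=0)['lists'].agg(list)
print(df)
```

Two caveats worth checking on real data: groupby(level=0) assumes the original index labels are unique, and explode turns an empty list into NaN, so such a row would likely come back as [nan] rather than [].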