My DataFrame:
```
Col X  Col Y  ID   Value
A      a      'r'  3
A      a      'b'  2
A      a      'c'  1
B      b      'd'  5
B      b      's'  6
B      b      'd'  7
```
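To make the example reproducible, here is a minimal sketch that builds the same sample DataFrame. It assumes the ID values literally contain quote characters, which is why the answer below strips "'" from them.

```python
import pandas as pd

# Sample data from the question; ID values are assumed to include
# literal single quotes (e.g. the string "'r'", not just "r").
df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'a', 'a', 'b', 'b', 'b'],
    'ID': ["'r'", "'b'", "'c'", "'d'", "'s'", "'d'"],
    'Value': [3, 2, 1, 5, 6, 7],
})
print(df)
```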
Output required:
```
Col X  Col Y  Out
A      a      {'r': 3, 'b': 2, 'c': 1}
B      b      {'d': 5, 's': 6, 'd': 7}
```
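Note that the second row cannot exist exactly as written: a Python dict literal that repeats a key keeps only the last value for it. A quick check:

```python
# A dict literal with a repeated key: the later 'd' entry overwrites the
# earlier one, and 'd' keeps its original (first) insertion position.
out = {'d': 5, 's': 6, 'd': 7}
print(out)  # {'d': 7, 's': 6}
```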
Approach tried so far:
```python
df = df.set_index(['Col X', 'Col Y', 'ID']).Value
dict_column = {k: df.xs((k, v)).to_dict() for k, v, v2 in df.index}
```
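For comparison, the set_index/xs idea can produce the required shape if it loops over unique (Col X, Col Y) pairs rather than over every row, and keys the result by both columns instead of only Col X. This is a sketch under the assumption that the quotes are stripped from ID first; the names s, pairs and rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'a', 'a', 'b', 'b', 'b'],
    'ID': ["'r'", "'b'", "'c'", "'d'", "'s'", "'d'"],
    'Value': [3, 2, 1, 5, 6, 7],
})
df['ID'] = df['ID'].str.strip("'")

# Series of Value indexed by (Col X, Col Y, ID)
s = df.set_index(['Col X', 'Col Y', 'ID'])['Value']

# One entry per unique (Col X, Col Y) pair instead of one per row
pairs = s.index.droplevel('ID').unique()

rows = []
for x, y in pairs:
    # xs drops the two selected levels, leaving a Series indexed by ID;
    # to_dict keeps only the last Value for a repeated ID
    sub = s.xs((x, y), level=['Col X', 'Col Y'])
    rows.append({'Col X': x, 'Col Y': y, 'Out': sub.to_dict()})

df1 = pd.DataFrame(rows)
print(df1)
```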
Answer
Use GroupBy.apply with a lambda function that builds a dict from each group's ID and Value columns:
```python
# remove the literal quote characters around the ID values
df['ID'] = df['ID'].str.strip("'")

# for each (Col X, Col Y) group, turn the [ID, Value] rows into a dict
df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']]
         .apply(lambda x: dict(x.to_numpy()))
         .reset_index(name='Out'))
print(df1)
#   Col X Col Y                       Out
# 0     A     a  {'r': 3, 'b': 2, 'c': 1}
# 1     B     b          {'d': 7, 's': 6}
```
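If you prefer to avoid GroupBy.apply, iterating over the groups and pairing ID with Value via zip gives the same result for the sample data. A sketch, assuming the sample frame with quoted IDs:

```python
import pandas as pd

df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'a', 'a', 'b', 'b', 'b'],
    'ID': ["'r'", "'b'", "'c'", "'d'", "'s'", "'d'"],
    'Value': [3, 2, 1, 5, 6, 7],
})
df['ID'] = df['ID'].str.strip("'")

# Each group key is an (x, y) tuple; zip pairs every ID with its Value.
# As with dict(x.to_numpy()), a repeated ID keeps its last Value.
rows = [
    {'Col X': x, 'Col Y': y, 'Out': dict(zip(g['ID'], g['Value'].tolist()))}
    for (x, y), g in df.groupby(['Col X', 'Col Y'])
]
df1 = pd.DataFrame(rows)
print(df1)
```

This keeps the dict-building step as a plain Python expression, which can be easier to adapt than a lambda passed to apply.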
Duplicate keys cannot exist in a Python dictionary, so in the second group the later 'd': 7 overwrites 'd': 5. If you need both values to count, aggregate them first, e.g. by sum:
```python
df['ID'] = df['ID'].str.strip("'")

# sum Value over duplicated (Col X, Col Y, ID) rows first
df = df.groupby(['Col X', 'Col Y','ID'], as_index=False)['Value'].sum()
print(df)
#   Col X Col Y ID  Value
# 0     A     a  b      2
# 1     A     a  c      1
# 2     A     a  r      3
# 3     B     b  d     12
# 4     B     b  s      6

df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']]
         .apply(lambda x: dict(x.to_numpy()))
         .reset_index(name='Out'))
print(df1)
#   Col X Col Y                       Out
# 0     A     a  {'b': 2, 'c': 1, 'r': 3}
# 1     B     b         {'d': 12, 's': 6}
```
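Summing is only one way to aggregate. If you would rather keep every value of a repeated ID, a sketch that collects them into lists instead (so 'd' maps to [5, 7]) could look like this, again assuming the sample frame with quoted IDs; agg is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'a', 'a', 'b', 'b', 'b'],
    'ID': ["'r'", "'b'", "'c'", "'d'", "'s'", "'d'"],
    'Value': [3, 2, 1, 5, 6, 7],
})
df['ID'] = df['ID'].str.strip("'")

# Collect all Values per (Col X, Col Y, ID) into a list instead of summing;
# sort=False keeps the IDs in their original order of appearance
agg = (df.groupby(['Col X', 'Col Y', 'ID'], as_index=False, sort=False)['Value']
         .agg(list))

# Same dict-building step as above, now with list values
df1 = (agg.groupby(['Col X', 'Col Y'])[['ID', 'Value']]
          .apply(lambda x: dict(zip(x['ID'], x['Value'])))
          .reset_index(name='Out'))
print(df1)
```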