My DataFrame:
Col X Col Y ID Value A a 'r' 3 A a 'b' 2 A a 'c' 1 B b 'd' 5 B b 's' 6 B b 'd' 7
Output required:
Col X Col Y Out A a {'r':3, 'b':2, 'c':1} B b {'d': 5, 's': 6, 'd':7}
Approach tried so far:
df = df.set_index(['Col X', 'Col Y', 'ID']).Value dict_column = {k: df.xs((k, v)).to_dict() for k,v,v2 in df.index}
Advertisement
Answer
Use GroupBy.apply
with lambda function:
df['ID'] = df['ID'].str.strip("'") df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']] .apply(lambda x: dict(x.to_numpy())) .reset_index(name='Out')) print (df1) Col X Col Y Out 0 A a {'r': 3, 'b': 2, 'c': 1} 1 B b {'d': 7, 's': 6}
Duplicated keys not exist in python dictionary. You can aggregate values, e.g. by sum
:
df['ID'] = df['ID'].str.strip("'") df = df.groupby(['Col X', 'Col Y','ID'], as_index=False)['Value'].sum() print (df) Col X Col Y ID Value 0 A a b 2 1 A a c 1 2 A a r 3 3 B b d 12 4 B b s 6 df1 = (df.groupby(['Col X', 'Col Y'])[['ID','Value']] .apply(lambda x: dict(x.to_numpy())) .reset_index(name='Out')) print (df1) Col X Col Y Out 0 A a {'b': 2, 'c': 1, 'r': 3} 1 B b {'d': 12, 's': 6}