I am working in Python with pandas and can't figure out how to transform this DataFrame:
    import pandas as pd

    df = pd.DataFrame({
        'Interview': ['1', '1', '2', '2', '3', '3', '4', '4', '5', '5'],
        'Sequence': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
        'Product': ['Orange', 'Banana', 'Banana', 'Orange', 'Apple', 'Banana', 'Apple', 'Apple', 'Orange', 'Apple'],
        'Value': [6, 8, 5, 4, 3, 5, 7, 6, 7, 9],
    })
I need a result like this:
    sequence = pd.DataFrame({
        '1st': ['Orange', 'Orange', 'Orange', 'Banana', 'Banana', 'Banana', 'Apple', 'Apple', 'Apple'],
        '2nd': ['Orange', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple'],
        'Value': [0, 14, 16, 9, 0, 0, 0, 8, 13],
    })
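The order of the sequence matters for my analysis: Orange then Banana is a different pair from Banana then Orange. Each Value is the sum of the interview values for that 1st/2nd pair, and pairs that never occur should show 0.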
Thanks guys for the help!
Answer
Here is another approach using reindex and unstack:
    df2 = df.set_index(['Interview', 'Sequence']).unstack()
    result = df2.Product.join(df2.Value.sum(1).rename("Value"))

    #               1st     2nd  Value
    # Interview
    # 1          Orange  Banana     14
    # 2          Banana  Orange      9
    # 3           Apple  Banana      8
    # 4           Apple   Apple     13
    # 5          Orange   Apple     16
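To get from that per-interview table to the full grid of pairs asked for in the question, one possible continuation is to group by the two sequence columns and then reindex against every product combination. This is only a sketch: the `products` list and the `full_index` name are my own assumptions, with the product order copied from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Interview': ['1', '1', '2', '2', '3', '3', '4', '4', '5', '5'],
    'Sequence': ['1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd', '1st', '2nd'],
    'Product': ['Orange', 'Banana', 'Banana', 'Orange', 'Apple', 'Banana', 'Apple', 'Apple', 'Orange', 'Apple'],
    'Value': [6, 8, 5, 4, 3, 5, 7, 6, 7, 9],
})

# One row per interview: the 1st and 2nd product plus the summed value.
df2 = df.set_index(['Interview', 'Sequence']).unstack()
result = df2['Product'].join(df2['Value'].sum(axis=1).rename('Value'))

# Assumed product order, taken from the desired output in the question.
products = ['Orange', 'Banana', 'Apple']
full_index = pd.MultiIndex.from_product([products, products], names=['1st', '2nd'])

# Sum per ordered pair, then fill in the pairs that never occurred with 0.
sequence = (
    result.groupby(['1st', '2nd'])['Value']
    .sum()
    .reindex(full_index, fill_value=0)
    .reset_index()
)

print(sequence)
```

If the product list should not be hard-coded, `df['Product'].unique()` would give the same values, though in order of first appearance rather than the order shown in the question.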