Skip to content
Advertisement

How to sum a sequence in pandas?

I need to do some coding in python and I can’t do this code:

import pandas as pd

df = pd.DataFrame({'Interview':['1','1','2','2','3','3','4','4','5','5'],
                 'Sequence':['1st','2nd','1st','2nd','1st','2nd','1st','2nd','1st','2nd'],
                 'Product':['Orange','Banana','Banana','Orange','Apple','Banana','Apple','Apple','Orange','Apple'],
                  'Value':[6,8,5,4,3,5,7,6,7,9]})

Pandas sample

I need to do something like this as result:

sequence = pd.DataFrame({'1st':['Orange','Orange','Orange','Banana','Banana','Banana','Apple','Apple','Apple'],
                       '2nd':['Orange','Banana','Apple','Orange','Banana','Apple','Orange','Banana','Apple'],
                       'Value':[0,14,16,9,0,0,0,8,13]})

Pandas sample result

For me the sequence matters most in my analysis. It’s a sum of the results in interviews.

Thanks guys for the help!

Advertisement

Answer

Here is another approach using reindex and unstack:

df2 = df.set_index(['Interview', 'Sequence']).unstack()
result = df2.Product.join(df2.Value.sum(1).rename("Value"))

#               1st     2nd  Value
# Interview                       
# 1          Orange  Banana     14
# 2          Banana  Orange      9
# 3           Apple  Banana      8
# 4           Apple   Apple     13
# 5          Orange   Apple     16
Advertisement