let’s say I have a data frame like this
eff_date,mdl_cd,ast_cd,prop_cd,value 2021-09-22,Comm,Agri,Car,-0.1234 2021-09-22,Comm,Agri,Fund,0.5123 2021-09-22,Comm,Agri,Mmt,-0.7612 2021-09-22,Comm,Engy,Car,0.1212 2021-09-22,Comm,Engy,Fund,-0.1234 2021-09-22,Comm,Engy,Mmt,0.5123 2021-09-22,Comm,Industry,Car,-0.7612 2021-09-22,Comm,Industry,Fund,0.1212 2021-09-22,Comm,Industry,Mmt,-0.1234 2021-09-22,Comm,Metal,Car,0.5123 2021-09-22,Comm,Metal,Fund,-0.7612 2021-09-22,Comm,Metal,Mmt,0.1212 2021-09-23,Equity,Agri,Car,0.6541 2021-09-23,Equity,Agri,Fund,0.5123 2021-09-23,Equity,Agri,Mmt,-0.1874 2021-09-23,Equity,Engy,Car,0.1212 2021-09-23,Equity,Engy,Fund,-0.6234 2021-09-23,Equity,Engy,Mmt,0.5123 2021-09-23,Equity,Industry,Car,-0.1612 2021-09-23,Equity,Industry,Fund,0.1212 2021-09-23,Equity,Industry,Mmt,-0.1934 2021-09-23,Equity,Metal,Car,0.5123 2021-09-23,Equity,Metal,Fund,0.5412 2021-09-23,Equity,Metal,Mmt,0.1212
I wanted to add a new row into each group of groupby(by=[‘eff_date’,’mdl_cd’,’ast_cd’])
in which column value for eff_date,mdl_cd
and ast_cd
will remain same
but for prop_cd
value become Hlds
and value value column become sum of value of that group e.g. for first group value of value column will be (-0.1234+0.5123+-0.7612) i.e. -0.3723
hence the output will be like this
eff_date,mdl_cd,ast_cd,prop_cd,value 2021-09-22,Comm,Agri,Car,-0.1234 2021-09-22,Comm,Agri,Fund,0.5123 2021-09-22,Comm,Agri,Mmt,-0.7612 2021-09-22,Comm,Agri,Hlds,-0.3723 +row added (sum of value in that group) 2021-09-22,Comm,Engy,Car,0.1212 2021-09-22,Comm,Engy,Fund,-0.1234 2021-09-22,Comm,Engy,Mmt,0.5123 2021-09-22,Comm,Engy,Hlds,0.5101 +row added (sum of value in that group) 2021-09-22,Comm,Industry,Car,-0.7612 2021-09-22,Comm,Industry,Fund,0.1212 2021-09-22,Comm,Industry,Mmt,-0.1234 2021-09-22,Comm,Industry,Hlds,-0.7634 +row added (sum of value in that group) 2021-09-22,Comm,Metal,Car,0.5123 2021-09-22,Comm,Metal,Fund,-0.7612 2021-09-22,Comm,Metal,Mmt,0.1212 2021-09-22,Comm,Metal,Hlds,-0.1277 +row added (sum of value in that group) 2021-09-23,Equity,Agri,Car,0.6541 2021-09-23,Equity,Agri,Fund,0.5123 2021-09-23,Equity,Agri,Mmt,-0.1874 2021-09-23,Equity,Agri,Hlds,0.979 +row added (sum of value in that group) 2021-09-23,Equity,Engy,Car,0.1212 2021-09-23,Equity,Engy,Fund,-0.6234 2021-09-23,Equity,Engy,Mmt,0.5123 2021-09-23,Equity,Engy,Hlds,0.0101 +row added (sum of value in that group) 2021-09-23,Equity,Industry,Car,-0.1612 2021-09-23,Equity,Industry,Fund,0.1212 2021-09-23,Equity,Industry,Mmt,-0.1934 2021-09-23,Equity,Industry,Hlds,-0.2334 +row added (sum of value in that group) 2021-09-23,Equity,Metal,Car,0.5123 2021-09-23,Equity,Metal,Fund,0.5412 2021-09-23,Equity,Metal,Mmt,0.1212 2021-09-23,Equity,Metal,Hlds,1.1747 +row added (sum of value in that group)
how to perform this computation using pandas
Advertisement
Answer
You can create a dataframe with the sum of each group by .groupby()
and .sum()
, set the prop_cd
as Hlds
by .assign()
.
Then, concat with the original dataframe by pd.concat()
and sort the columns to put the sum rows back together with their respective groups by .sort_values()
, as follows:
df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds') df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)
Result:
print(df_out) eff_date mdl_cd ast_cd prop_cd value 0 2021-09-22 Comm Agri Car -0.1234 1 2021-09-22 Comm Agri Fund 0.5123 2 2021-09-22 Comm Agri Mmt -0.7612 3 2021-09-22 Comm Agri Hlds -0.3723 4 2021-09-22 Comm Engy Car 0.1212 5 2021-09-22 Comm Engy Fund -0.1234 6 2021-09-22 Comm Engy Mmt 0.5123 7 2021-09-22 Comm Engy Hlds 0.5101 8 2021-09-22 Comm Industry Car -0.7612 9 2021-09-22 Comm Industry Fund 0.1212 10 2021-09-22 Comm Industry Mmt -0.1234 11 2021-09-22 Comm Industry Hlds -0.7634 12 2021-09-22 Comm Metal Car 0.5123 13 2021-09-22 Comm Metal Fund -0.7612 14 2021-09-22 Comm Metal Mmt 0.1212 15 2021-09-22 Comm Metal Hlds -0.1277 16 2021-09-23 Equity Agri Car 0.6541 17 2021-09-23 Equity Agri Fund 0.5123 18 2021-09-23 Equity Agri Mmt -0.1874 19 2021-09-23 Equity Agri Hlds 0.9790 20 2021-09-23 Equity Engy Car 0.1212 21 2021-09-23 Equity Engy Fund -0.6234 22 2021-09-23 Equity Engy Mmt 0.5123 23 2021-09-23 Equity Engy Hlds 0.0101 24 2021-09-23 Equity Industry Car -0.1612 25 2021-09-23 Equity Industry Fund 0.1212 26 2021-09-23 Equity Industry Mmt -0.1934 27 2021-09-23 Equity Industry Hlds -0.2334 28 2021-09-23 Equity Metal Car 0.5123 29 2021-09-23 Equity Metal Fund 0.5412 30 2021-09-23 Equity Metal Mmt 0.1212 31 2021-09-23 Equity Metal Hlds 1.1747
Setup
df = pd.read_clipboard(',') eff_date mdl_cd ast_cd prop_cd value 0 2021-09-22 Comm Agri Car -0.1234 1 2021-09-22 Comm Agri Fund 0.5123 2 2021-09-22 Comm Agri Mmt -0.7612 3 2021-09-22 Comm Engy Car 0.1212 4 2021-09-22 Comm Engy Fund -0.1234 5 2021-09-22 Comm Engy Mmt 0.5123 6 2021-09-22 Comm Industry Car -0.7612 7 2021-09-22 Comm Industry Fund 0.1212 8 2021-09-22 Comm Industry Mmt -0.1234 9 2021-09-22 Comm Metal Car 0.5123 10 2021-09-22 Comm Metal Fund -0.7612 11 2021-09-22 Comm Metal Mmt 0.1212 12 2021-09-23 Equity Agri Car 0.6541 13 2021-09-23 Equity Agri Fund 0.5123 14 2021-09-23 Equity Agri Mmt -0.1874 15 2021-09-23 Equity Engy Car 0.1212 16 2021-09-23 Equity Engy Fund -0.6234 17 2021-09-23 Equity Engy Mmt 0.5123 18 2021-09-23 Equity Industry Car -0.1612 19 2021-09-23 Equity Industry Fund 0.1212 20 2021-09-23 Equity Industry Mmt -0.1934 21 2021-09-23 Equity Metal Car 0.5123 22 2021-09-23 Equity Metal Fund 0.5412 23 2021-09-23 Equity Metal Mmt 0.1212
Interim result:
print(df_sum) eff_date mdl_cd ast_cd value prop_cd 0 2021-09-22 Comm Agri -0.3723 Hlds 1 2021-09-22 Comm Engy 0.5101 Hlds 2 2021-09-22 Comm Industry -0.7634 Hlds 3 2021-09-22 Comm Metal -0.1277 Hlds 4 2021-09-23 Equity Agri 0.9790 Hlds 5 2021-09-23 Equity Engy 0.0101 Hlds 6 2021-09-23 Equity Industry -0.2334 Hlds 7 2021-09-23 Equity Metal 1.1747 Hlds