Skip to content
Advertisement

Pandas get difference from first row at a set dtime with groupby

If I have a dataframe with [Group], [DTime] and [Value] columns

For each [Group] I’m trying to find the difference between the first [Value] and every subsequent value from a set [DTime], for this example say it’s the start of the df at 2015-01-01.

Ultimately I would like to plot a timeseries of [Difference] with a trace for each [Group] where every trace begins at 0 on the Y-axis.

I’ve used this below, which does work to give the difference but I’m not sure how to implement pd.groupby here to get it to repeat for each group rather than start at the very beginning.

df['Difference'] = df.loc[1:, 'Value'] - df.at[0, 'Value']

Group   Dtime         Value        
Grp1    2015-01-01    1261.406773       
Grp1    2015-01-02    1252.660231       
Grp1    2015-01-03    1223.076426
Grp2    2015-01-01    1214.402352
Grp2    2015-01-02    1422.532532
Grp2    2015-01-03    1262.990213

Is this even the best way to go about this?

Advertisement

Answer

Try this:

df['Difference'] = (df['Value'] - 
                    df.sort_values('Dtime').groupby('Group')['Value']
                                           .transform('first'))

Output:

  Group       Dtime        Value  Difference
0  Grp1  2015-01-01  1261.406773    0.000000
1  Grp1  2015-01-02  1252.660231   -8.746542
2  Grp1  2015-01-03  1223.076426  -38.330347
3  Grp2  2015-01-01  1214.402352    0.000000
4  Grp2  2015-01-02  1422.532532  208.130180
5  Grp2  2015-01-03  1262.990213   48.587861
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement