If I have a dataframe with [Group], [DTime] and [Value] columns
For each [Group] I’m trying to find the difference between the first [Value] and every subsequent value from a set [DTime], for this example say it’s the start of the df at 2015-01-01.
Ultimately I would like to plot a timeseries of [Difference] with a trace for each [Group] where every trace begins at 0 on the Y-axis.
I’ve used this below, which does work to give the difference but I’m not sure how to implement pd.groupby here to get it to repeat for each group rather than start at the very beginning.
df['Difference'] = df.loc[1:, 'Value'] - df.at[0, 'Value']
Group Dtime Value Grp1 2015-01-01 1261.406773 Grp1 2015-01-02 1252.660231 Grp1 2015-01-03 1223.076426 Grp2 2015-01-01 1214.402352 Grp2 2015-01-02 1422.532532 Grp2 2015-01-03 1262.990213
Is this even the best way to go about this?
Advertisement
Answer
Try this:
df['Difference'] = (df['Value'] - df.sort_values('Dtime').groupby('Group')['Value'] .transform('first'))
Output:
Group Dtime Value Difference 0 Grp1 2015-01-01 1261.406773 0.000000 1 Grp1 2015-01-02 1252.660231 -8.746542 2 Grp1 2015-01-03 1223.076426 -38.330347 3 Grp2 2015-01-01 1214.402352 0.000000 4 Grp2 2015-01-02 1422.532532 208.130180 5 Grp2 2015-01-03 1262.990213 48.587861