I have a groupby with a diff function, however I want to add an extra mean column for heart rate, how can I do this the best way?
this is the code
data= pd.DataFrame( [[Timestamp('2022-08-05 10:11:04'), 140, 120], [Timestamp('2022-08-05 10:11:05'), 160, 155], [Timestamp('2022-08-05 10:11:06'), 230, 156], [Timestamp('2022-08-05 10:11:07'), 230, 155], [Timestamp('2022-08-05 10:11:08'), 230, 160], [Timestamp('2022-08-05 10:11:09'), 140, 130], [Timestamp('2022-08-05 10:11:10'), 140, 131], [Timestamp('2022-08-05 10:11:11'), 230, 170]], columns=['timestamp', 'power', 'heart rate']) m = data['power'].gt(200) #fill in power value gb = (-data['timestamp'].diff(-1))[m].groupby([(~m).cumsum()).sum() gb= gb.groupby((~m).cumsum()).sum() gb
where should I add in the piece of code to calculate the average heart rate?
output will be the amount of seconds in high power zone and then i would like to add the average heart rate during this period. like this
gb = pd.DataFrame( [[Timestamp('00:00:04'), 210, 145], [Timestamp('00:00:15'), 250, 155], [Timestamp('00:01:00'), 230, 180], columns=['time at high intensity', ' avg power', ' avg heart rate'])
Advertisement
Answer
You can create helper column from by difference and then aggregate by it and another column in named aggregation in GroupBy.agg
:
m = data['power'].gt(200) #fill in power value gb = (data.assign(new=-data['timestamp'].diff(-1))[m] .groupby((~m).cumsum()) .agg(time_at_high_intensity=('new','sum'), avg_power=('power','mean'), avg_heart_rate=('heart rate','mean'))) print (gb) time_at_high_intensity avg_power avg_heart_rate power 2 0 days 00:00:03 230 157 4 0 days 00:00:00 230 170