
How to calculate a Process Duration from a TimeSeries Dataset with Pandas

I have a huge dataset of sensor data, sorted chronologically (by timestamp) and by sensor type. I want to calculate the duration of a process in seconds by subtracting the timestamp of a sensor's first entry from that of its last entry. This is to be done with Python and pandas. Attached is an example for better understanding: [image: example data]

For each sensor type, I want to subtract the timestamp in the first row from the timestamp in the last row to get the process duration in seconds (i.e. row 8 minus row 1: 2022-04-04T09:44:56.962Z – 2022-04-04T09:44:56.507Z = 0.455 seconds). The duration should then be written to a newly created column, in the last row of that sensor type.

Thanks in advance!


Answer

Assuming your 'timestamp' column has already been converted with pd.to_datetime, would this work?

grouped = df.groupby('sensor_type')['timestamp']
df['diffPerSensor_type'] = grouped.transform('last') - grouped.transform('first')

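You could then extract the duration in seconds with this. Note that .dt.seconds returns only the whole-seconds component of each timedelta (it would give 0 for a 0.455 second duration), so use .dt.total_seconds() instead:

df['diffPerSensor_type'].dt.total_seconds()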
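The question also asks for the duration to appear only in the last row of each sensor type, which the lines above do not do on their own (transform fills every row of the group). Here is a minimal sketch of one way to get that. The sample data, the sensor names "A" and "B", and the column name 'duration_s' are made up for illustration, and it assumes each sensor type corresponds to exactly one process run:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    "sensor_type": ["A", "A", "A", "B", "B"],
    "timestamp": [
        "2022-04-04T09:44:56.507Z",
        "2022-04-04T09:44:56.700Z",
        "2022-04-04T09:44:56.962Z",
        "2022-04-04T09:44:57.000Z",
        "2022-04-04T09:44:58.500Z",
    ],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Last minus first timestamp per sensor type, broadcast to every row of the group.
grouped = df.groupby("sensor_type")["timestamp"]
duration = (grouped.transform("last") - grouped.transform("first")).dt.total_seconds()

# True only for the last occurrence of each sensor type.
is_last_row = ~df.duplicated("sensor_type", keep="last")

# Keep the duration on the last row of each sensor type; all other rows become NaN.
df["duration_s"] = duration.where(is_last_row)
```

If the same sensor type can occur in several separate runs, you would need an additional run identifier to group on; the sketch above would treat them as one process.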