I have a huge dataset of various sensor data sorted chronologically (by timestamp) and by sensor type. I want to calculate the duration of a process in seconds by subtracting the first entry of a sensor from the last entry. This is to be done with Python and pandas. An example is attached for better understanding.
I want to subtract the timestamp of the first row from that of the last row for each sensor type to get the process duration in seconds (for example, row 8 minus row 1: 2022-04-04T09:44:56.962Z - 2022-04-04T09:44:56.507Z = 0.455 seconds). The duration should then be written to a newly created column, in the last row of each sensor type.
Thanks in advance!
Answer
Assuming your 'timestamp' column has already been converted with pd.to_datetime, would this work?
df['diffPerSensor_type']=df.groupby('sensor_type')['timestamp'].transform('last')-df.groupby('sensor_type')['timestamp'].transform('first')
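You could then extract the duration in seconds with this (use total_seconds() rather than .dt.seconds, which returns only the whole-seconds component and would give 0 for the 0.455 s example):
df['diffPerSensor_type'].dt.total_seconds()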
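Note that transform fills every row of a group with the same value, whereas the question asks for the duration only in the last row of each sensor type. Here is a minimal sketch of one way to do that. The sample data and the column name 'duration_s' are made up for illustration, and the column names 'sensor_type' and 'timestamp' are assumed from the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the first and last "A" timestamps come from the question.
df = pd.DataFrame({
    "sensor_type": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2022-04-04T09:44:56.507Z",
        "2022-04-04T09:44:56.700Z",
        "2022-04-04T09:44:56.962Z",
        "2022-04-04T09:44:57.000Z",
        "2022-04-04T09:44:58.500Z",
    ]),
})

# Last minus first timestamp per sensor type, as fractional seconds.
grouped = df.groupby("sensor_type")["timestamp"]
duration = (grouped.transform("last") - grouped.transform("first")).dt.total_seconds()

# Keep the value only in the last row of each sensor type; other rows become NaN.
is_last = ~df.duplicated("sensor_type", keep="last")
df["duration_s"] = duration.where(is_last)

print(df)
```

If one row per sensor is enough, grouped.agg(['first', 'last']) followed by a subtraction gives a compact summary table instead of a new column.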