Duration between two timestamps

I have a dataframe with several timestamps per user, and I want to calculate the duration between consecutive timestamps for each user. I used this code to import my CSV files:

import pandas as pd
import glob

path = r'C:Users...Desktop' 
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0,encoding='ISO-8859-1')
    li.append(df)

df = pd.concat(li, axis=0, ignore_index=True)

df.head()

  ID     timestamp
1828765  31-05-2021 22:27:03    
1828765  31-05-2021 22:27:12    
1828765  31-05-2021 22:27:13    
1828765  31-05-2021 22:27:34
2056557  21-07-2021 10:27:12
2056557  21-07-2021 10:27:20
2056557  21-07-2021 10:27:22

I want to get something like this:

   ID    timestamp             duration(s)
1828765  31-05-2021 22:27:03    NAN
1828765  31-05-2021 22:27:12    9
1828765  31-05-2021 22:27:13    1
1828765  31-05-2021 22:27:34    21
2056557  21-07-2021 10:27:12    NAN
2056557  21-07-2021 10:27:20    8
2056557  21-07-2021 10:27:22    2

I tried this code, but it doesn’t work for me:

import datetime
df['timestamp'] =  pd.to_datetime(df['timestamp'], format = "%d-%m-%Y %H:%M:%S") 
df['time_diff'] = 0
for i in range(df.shape[0] - 1):
    df['time_diff'][i+1] = (datetime.datetime.min +  (df['timestamp'][i+1] - df['timestamp'][i])).time()

Answer

Operations that are computed separately within each group of rows (here, the rows sharing an ID) are GroupBy operations in pandas.

pandas supports mathematical operations over timestamps natively. For this reason, subtraction will give the correct duration between any two timestamps.
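As a small illustration of that point, subtracting one pandas Timestamp from another yields a Timedelta, which can then be converted to seconds. The two values below are taken from the sample data; the variable names are only for illustration.

```python
import pandas as pd

# Two timestamps taken from the sample data
start = pd.Timestamp("2021-05-31 22:27:03")
end = pd.Timestamp("2021-05-31 22:27:12")

# Subtracting two Timestamps yields a Timedelta
delta = end - start

# total_seconds() returns the duration as a float number of seconds
seconds = delta.total_seconds()
print(delta, seconds)
```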

The code in the question already converts the timestamp column to datetime64[ns] successfully:

df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")

Now we can take the difference between consecutive rows within each group with GroupBy.diff:

df['duration'] = df.groupby('ID')['timestamp'].diff()

df:

        ID           timestamp        duration
0  1828765 2021-05-31 22:27:03             NaT
1  1828765 2021-05-31 22:27:12 0 days 00:00:09
2  1828765 2021-05-31 22:27:13 0 days 00:00:01
3  1828765 2021-05-31 22:27:34 0 days 00:00:21
4  2056557 2021-07-21 10:27:12             NaT
5  2056557 2021-07-21 10:27:20 0 days 00:00:08
6  2056557 2021-07-21 10:27:22 0 days 00:00:02
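Note that diff compares each row with the previous row of the same group in the current row order. The sample data is already chronological, but if the concatenated CSV files might not be, sorting first may be worth considering. The sketch below uses a hypothetical, deliberately shuffled frame named events; it is not data from the question.

```python
import pandas as pd

# Hypothetical sample with rows deliberately out of chronological order
events = pd.DataFrame({
    "ID": [1828765, 2056557, 1828765, 2056557, 1828765],
    "timestamp": pd.to_datetime(
        ["31-05-2021 22:27:12", "21-07-2021 10:27:20", "31-05-2021 22:27:03",
         "21-07-2021 10:27:12", "31-05-2021 22:27:13"],
        format="%d-%m-%Y %H:%M:%S",
    ),
})

# Sort by ID, then by time, so each row follows its chronological predecessor
events = events.sort_values(["ID", "timestamp"], ignore_index=True)

# The first row of every ID has no predecessor, so its duration is NaT
events["duration"] = events.groupby("ID")["timestamp"].diff()
print(events)
```

Without the sort, the same call would still run, but some durations would be negative or measured against the wrong predecessor.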

If we want to get the duration in seconds we can extract the total number of seconds using Series.dt.total_seconds:

df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()

df:

        ID           timestamp  duration (s)
0  1828765 2021-05-31 22:27:03           NaN
1  1828765 2021-05-31 22:27:12           9.0
2  1828765 2021-05-31 22:27:13           1.0
3  1828765 2021-05-31 22:27:34          21.0
4  2056557 2021-07-21 10:27:12           NaN
5  2056557 2021-07-21 10:27:20           8.0
6  2056557 2021-07-21 10:27:22           2.0
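The column is float because NaN cannot be stored in a plain integer column. If whole numbers are preferred, as in the desired output in the question, one option is the nullable Int64 dtype. This is a sketch that assumes every duration is a whole number of seconds, since fractional values cannot be cast safely to integers; the frame named sample is a shortened copy of the example data.

```python
import pandas as pd

# Shortened copy of the example data (two rows per ID)
sample = pd.DataFrame({
    "ID": [1828765, 1828765, 2056557, 2056557],
    "timestamp": pd.to_datetime(
        ["31-05-2021 22:27:03", "31-05-2021 22:27:12",
         "21-07-2021 10:27:12", "21-07-2021 10:27:20"],
        format="%d-%m-%Y %H:%M:%S",
    ),
})

# Float seconds, with NaN for the first row of each ID
seconds = sample.groupby("ID")["timestamp"].diff().dt.total_seconds()

# Nullable integer dtype: whole seconds, with <NA> in place of NaN
sample["duration (s)"] = seconds.astype("Int64")
print(sample)
```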

Complete Working Example:

import pandas as pd

df = pd.DataFrame({
    'ID': [1828765, 1828765, 1828765, 1828765, 2056557, 2056557, 2056557],
    'timestamp': ['31-05-2021 22:27:03', '31-05-2021 22:27:12',
                  '31-05-2021 22:27:13', '31-05-2021 22:27:34',
                  '21-07-2021 10:27:12', '21-07-2021 10:27:20',
                  '21-07-2021 10:27:22']
})

df['timestamp'] = pd.to_datetime(df['timestamp'], format="%d-%m-%Y %H:%M:%S")
df['duration (s)'] = df.groupby('ID')['timestamp'].diff().dt.total_seconds()
print(df)
User contributions licensed under: CC BY-SA