I’m simply looking for a more intuitive and faster way to get start and end times of uninterrupted time sequences. Here’s a reproducible example as well as my way of doing it for the time being:
import pandas as pd
import numpy as np
import datetime
data = ['1999-01-01 00:00:00', '1999-01-01 01:00:00', '1999-01-01 02:00:00',
'1999-01-10 10:00:00', '1999-01-10 11:00:00', '1999-01-10 12:00:00', '1999-01-10 13:00:00',
'1999-01-20 17:00:00', '1999-01-20 18:00:00', '1999-01-20 19:00:00']
df = pd.DataFrame(data, columns = ['time'])
df['time'] = pd.to_datetime(df['time'])
# Conversion:
new_df = pd.DataFrame(columns=['Start Date', 'End Date'])
new_df2 = pd.DataFrame(columns=['End Date'])
df['diff'] = df['time'].diff(1)
df['diff2'] = df['diff'].shift(-1)
new_df['Start Date'] = df['time'].loc[df['diff'] != pd.to_timedelta(1, unit ='h')].reset_index(drop = True)
new_df2['End Date'] = df['time'].loc[df['diff2'] != pd.to_timedelta(1, unit ='h')].reset_index(drop = True)
new_df['End Date'] = new_df2['End Date']
new_df['Duration [Hours]'] = (new_df['End Date'] - new_df['Start Date']) / np.timedelta64(1, 'h')
print(new_df)
Resulting dataframe:
Start Date End Date Duration [Hours] 0 1999-01-01 00:00:00 1999-01-01 02:00:00 2.0 1 1999-01-10 10:00:00 1999-01-10 13:00:00 3.0 2 1999-01-20 17:00:00 1999-01-20 19:00:00 2.0
Any kind of help will be valuable.
Advertisement
Answer
Another approach is to create a group column indicating which group each row belongs to. This can be done by using shift and cumsum. After that, we can simply use groupby to create the wanted columns.
df['group'] = (df['time'].shift(1) != df['time'] - pd.Timedelta(hours=1)).cumsum()
df = df.groupby('group')['time'].agg(['first', 'last', 'count'])
df['count'] = df['count'] - 1
df.columns = ['Start Date', 'End Date', 'Duration [Hours]']
df = df.reset_index(drop=True)
We need to remove 1 from the count value since it’s the time we want, not simply the number of rows in each group. The last two lines are simply for cleanup to match the wanted output dataframe.
Result:
Start Date End Date Duration [Hours] 0 1999-01-01 00:00:00 1999-01-01 02:00:00 2 1 1999-01-10 10:00:00 1999-01-10 13:00:00 3 2 1999-01-20 17:00:00 1999-01-20 19:00:00 2