Skip to content
Advertisement

How to resample a dataframe an include start and end times?

So I am working with tick data and I am attempting to resample the dataframe to minute bars, but when resample is called the time series begins and ends the first instance that a tick exists. How would I resample this data such that the first and last times can be specified to a certain start and end time?

Edit here is some sample data.

df = pd.DataFrame(data={'Code': pd.Series(['A', 'A', 'B', 'B'], dtype='str'), 'Timestamp': pd.Series([1608627600073933, 1698929600124359, 1608627600073933, 1608929600124359], dtype='datetime64[ns]'),
'Val':[5, 6, 5, 6]})
df.set_index(['Timestamp'], inplace=True)
df.groupby('Code').resample('1T').agg('sum')

Which outputs

                     Val
Timestamp
1970-01-19 14:50:00    5
1970-01-19 14:51:00    0
1970-01-19 14:52:00    0
1970-01-19 14:53:00    0
1970-01-19 14:54:00    0
1970-01-19 14:55:00    6

But I would like an output dataframe that includes a timestamp for every minute of a specific hour for example.

Advertisement

Answer

You can add start and end datetimes manually:

#removed minutes and seconds
df1 = df.rename(lambda x: x.floor('H'))
#removed duplicated DatetimeIndex - output empty df
df1 = df1.loc[~df1.index.duplicated(), []]
#join together
df1 = pd.concat([df, df1, df1.rename(lambda x: x + pd.Timedelta('00:59:00'))])
print (df1)
                              Code  Val
Timestamp                              
1970-01-19 14:50:27.600073933    A  5.0
1970-01-19 14:55:29.600124359    A  6.0
1970-01-19 14:00:00.000000000  NaN  NaN
1970-01-19 14:59:00.000000000  NaN  NaN

df2 = df1.resample('1T').agg('sum')
print (df2)

For add values per days:

df1 = df.rename(lambda x: x.floor('D'))
df1 = df1.loc[~df1.index.duplicated(), []]
df1 = pd.concat([df, df1, df1.rename(lambda x: x + pd.Timedelta('23:59:00'))])
print (df1)
                              Code  Val
Timestamp                              
1970-01-19 14:50:27.600073933    A  5.0
1970-01-19 14:55:29.600124359    A  6.0
1970-01-19 00:00:00.000000000  NaN  NaN
1970-01-19 23:59:00.000000000  NaN  NaN

df2 = df1.resample('1T').agg('sum')
print (df2)
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement