So I am working with tick data and I am attempting to resample the dataframe to minute bars, but when resample is called the time series begins and ends the first instance that a tick exists. How would I resample this data such that the first and last times can be specified to a certain start and end time?
Edit here is some sample data.
df = pd.DataFrame(data={'Code': pd.Series(['A', 'A', 'B', 'B'], dtype='str'), 'Timestamp': pd.Series([1608627600073933, 1698929600124359, 1608627600073933, 1608929600124359], dtype='datetime64[ns]'), 'Val':[5, 6, 5, 6]}) df.set_index(['Timestamp'], inplace=True) df.groupby('Code').resample('1T').agg('sum')
Which outputs
Val Timestamp 1970-01-19 14:50:00 5 1970-01-19 14:51:00 0 1970-01-19 14:52:00 0 1970-01-19 14:53:00 0 1970-01-19 14:54:00 0 1970-01-19 14:55:00 6
But I would like an output dataframe that includes a timestamp for every minute of a specific hour for example.
Advertisement
Answer
You can add start and end datetimes manually:
#removed minutes and seconds df1 = df.rename(lambda x: x.floor('H')) #removed duplicated DatetimeIndex - output empty df df1 = df1.loc[~df1.index.duplicated(), []] #join together df1 = pd.concat([df, df1, df1.rename(lambda x: x + pd.Timedelta('00:59:00'))]) print (df1) Code Val Timestamp 1970-01-19 14:50:27.600073933 A 5.0 1970-01-19 14:55:29.600124359 A 6.0 1970-01-19 14:00:00.000000000 NaN NaN 1970-01-19 14:59:00.000000000 NaN NaN df2 = df1.resample('1T').agg('sum') print (df2)
For add values per days:
df1 = df.rename(lambda x: x.floor('D')) df1 = df1.loc[~df1.index.duplicated(), []] df1 = pd.concat([df, df1, df1.rename(lambda x: x + pd.Timedelta('23:59:00'))]) print (df1) Code Val Timestamp 1970-01-19 14:50:27.600073933 A 5.0 1970-01-19 14:55:29.600124359 A 6.0 1970-01-19 00:00:00.000000000 NaN NaN 1970-01-19 23:59:00.000000000 NaN NaN df2 = df1.resample('1T').agg('sum') print (df2)