I have two dataframes, each containing one or more time series covering the same time frame but sampled at different timestamps.
I’d like to merge them into a single dataframe, resampled and interpolated onto the index of the first.
Here’s a sample of the first dataframe:
                        a      b      c     d
2013-01-01 07:00:00  0.45  24.33   9.04  0.00
2013-01-01 08:00:00  0.55  23.11  11.60  0.06
2013-01-01 09:00:00  0.69  27.23  18.18  0.03
2013-01-01 10:00:00  0.64  26.58  31.46  0.06
2013-01-01 11:00:00  0.36  17.50  42.58  0.29
2013-01-01 12:00:00  0.32  15.39  50.30  0.17
2013-01-01 13:00:00  0.41  17.73  51.45  0.13
2013-01-01 14:00:00  0.50  19.48  50.50  0.05
2013-01-01 15:00:00  0.48  18.32  51.51  0.03
2013-01-01 16:00:00  0.50  18.49  50.70  0.02
2013-01-01 17:00:00  1.13  32.89  40.07  0.20
2013-01-01 18:00:00  1.81  59.64  16.59  0.37
And the second one:
                        e
2013-01-01 06:15:00   9.0
2013-01-01 06:45:00   9.0
2013-01-01 06:55:00   9.0
2013-01-01 07:15:00   9.0
2013-01-01 07:45:00   9.0
2013-01-01 07:55:00   9.0
2013-01-01 08:15:00  10.0
2013-01-01 08:45:00  11.0
2013-01-01 08:55:00  11.0
2013-01-01 09:15:00  12.0
2013-01-01 09:45:00  13.0
2013-01-01 09:55:00  13.0
2013-01-01 10:15:00  14.0
2013-01-01 10:45:00  15.0
2013-01-01 10:55:00  15.0
2013-01-01 11:15:00  14.0
2013-01-01 11:45:00  14.0
2013-01-01 11:55:00  14.0
2013-01-01 12:15:00  14.0
2013-01-01 12:45:00  14.0
2013-01-01 12:55:00  14.0
2013-01-01 13:15:00  14.0
2013-01-01 13:45:00  14.0
2013-01-01 13:55:00  14.0
2013-01-01 14:15:00  14.0
2013-01-01 14:45:00  14.0
2013-01-01 14:55:00  14.0
2013-01-01 15:15:00  14.0
2013-01-01 15:45:00  13.0
2013-01-01 15:55:00  13.0
2013-01-01 16:15:00  13.0
2013-01-01 16:45:00  13.0
2013-01-01 16:55:00  13.0
2013-01-01 17:15:00  12.0
2013-01-01 17:45:00  12.0
2013-01-01 17:55:00  12.0
2013-01-01 18:15:00  11.0
In this case the second one is more granular, but that won’t necessarily be the case. I’d like to resample the second one at the timestamps of the first. Is this possible in an elegant pandas way?
I tried reindex with the full dataframes, but it complains about a duplicate axis. Maybe that’s really my issue.
Answer
A simple new_df = pd.concat((df1, df2), axis=1) retains all information and timestamps. You can then resample new_df as needed.
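As a rough illustration of that idea, the sketch below concatenates two small stand-in frames (a few rows copied from the question's samples, not the full data) and then fills the gaps in e by time-weighted interpolation before keeping only the rows at df1's timestamps. The use of interpolate(method="time") is an assumption about what "interpolated" should mean here; other methods may suit the data better.

```python
import pandas as pd

# Small stand-ins for df1 and df2, using a few rows from the question.
df1 = pd.DataFrame(
    {"a": [0.45, 0.55, 0.69]},
    index=pd.to_datetime(
        ["2013-01-01 07:00", "2013-01-01 08:00", "2013-01-01 09:00"]
    ),
)
df2 = pd.DataFrame(
    {"e": [9.0, 9.0, 9.0, 9.0, 10.0, 11.0, 11.0, 12.0]},
    index=pd.to_datetime(
        [
            "2013-01-01 06:55", "2013-01-01 07:15", "2013-01-01 07:45",
            "2013-01-01 07:55", "2013-01-01 08:15", "2013-01-01 08:45",
            "2013-01-01 08:55", "2013-01-01 09:15",
        ]
    ),
)

# Outer join on the union of both indexes; rows missing from one frame get NaN.
# Sorting explicitly so that time-based interpolation sees ordered timestamps.
new_df = pd.concat((df1, df2), axis=1).sort_index()

# Fill NaNs linearly with respect to the actual time gaps between samples.
interpolated = new_df.interpolate(method="time")

# Keep only the timestamps of the first frame.
result = interpolated.loc[df1.index]
print(result)
```

Because only rows at df1's own timestamps are kept, columns from df1 come through unchanged and only e carries interpolated values.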
In this specific case, you can do:
pd.concat((df1, df2.groupby(df2.index.floor('H')).mean()), axis=1)
Output:
                        a      b      c     d          e
idx
2013-01-01 06:00:00   NaN    NaN    NaN   NaN   9.000000
2013-01-01 07:00:00  0.45  24.33   9.04  0.00   9.000000
2013-01-01 08:00:00  0.55  23.11  11.60  0.06  10.666667
2013-01-01 09:00:00  0.69  27.23  18.18  0.03  12.666667
2013-01-01 10:00:00  0.64  26.58  31.46  0.06  14.666667
2013-01-01 11:00:00  0.36  17.50  42.58  0.29  14.000000
2013-01-01 12:00:00  0.32  15.39  50.30  0.17  14.000000
2013-01-01 13:00:00  0.41  17.73  51.45  0.13  14.000000
2013-01-01 14:00:00  0.50  19.48  50.50  0.05  14.000000
2013-01-01 15:00:00  0.48  18.32  51.51  0.03  13.333333
2013-01-01 16:00:00  0.50  18.49  50.70  0.02  13.000000
2013-01-01 17:00:00  1.13  32.89  40.07  0.20  12.000000
2013-01-01 18:00:00  1.81  59.64  16.59  0.37  11.000000
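On the duplicate-axis complaint from reindex mentioned in the question: pandas generally refuses to reindex a frame whose index contains repeated labels. The sketch below is one possible way around that, under the assumption that averaging repeated timestamps is acceptable. The duplicated 07:15 row and the name target are made up for illustration and do not come from the question's data.

```python
import pandas as pd

# Hypothetical frame with a repeated timestamp (07:15 appears twice).
df2 = pd.DataFrame(
    {"e": [9.0, 9.0, 11.0, 9.0, 10.0]},
    index=pd.to_datetime(
        [
            "2013-01-01 06:55", "2013-01-01 07:15", "2013-01-01 07:15",
            "2013-01-01 07:55", "2013-01-01 08:15",
        ]
    ),
)
# Hypothetical target index, standing in for df1.index.
target = pd.to_datetime(["2013-01-01 07:00", "2013-01-01 08:00"])

# Check whether the index has repeated labels before trying to reindex.
has_duplicates = not df2.index.is_unique

# Collapse repeated timestamps by averaging them (an assumption; keeping the
# first or last value would be equally valid choices). The result is sorted.
deduped = df2.groupby(level=0).mean()

# Reindex onto the union of both sets of timestamps, interpolate by time,
# then keep only the target timestamps.
combined = deduped.reindex(deduped.index.union(target))
resampled = combined.interpolate(method="time").reindex(target)
print(resampled)
```

Unlike the hourly mean above, this keeps the target timestamps exactly and estimates e at each of them from its nearest neighbours in time.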