I have two dataframes, both indexed by datetime. For example, dataframe 1 looks like this:
| date             | value |
|------------------|-------|
| 2021-11-11 09:00 | 1     |
| 2021-11-11 10:00 | 1     |
| 2021-11-12 11:00 | 2     |
| 2021-11-14 09:00 | 2     |
| 2021-11-15 09:30 | 3     |
And dataframe 2 looks like this:
| date             | value |
|------------------|-------|
| 2021-11-10 11:00 | 2     |
| 2021-11-11 09:30 | 3     |
| 2021-11-12 12:00 | 4     |
| 2021-11-13 09:50 | 2     |
| 2021-11-15 10:30 | 3     |
For each entry in dataframe 1, I want to find the most recent entry in dataframe 2 and add a new column to dataframe 1 that records this relationship between the two dataframes.
To make this clearer, the expected result is:
| date             | value | df2_index        |
|------------------|-------|------------------|
| 2021-11-11 09:00 | 1     | 2021-11-10 11:00 |
| 2021-11-11 10:00 | 1     | 2021-11-11 09:30 |
| 2021-11-12 11:00 | 2     | 2021-11-11 09:30 |
| 2021-11-14 09:00 | 2     | 2021-11-13 09:50 |
| 2021-11-15 09:30 | 3     | 2021-11-13 09:50 |
For the first entry in dataframe 1, 2021-11-11 09:00, the most recent entry in dataframe 2 is 2021-11-10 11:00. For the third entry, 2021-11-12 11:00, the most recent entry (that is, the largest timestamp in dataframe 2 that does not exceed 2021-11-12 11:00) is 2021-11-11 09:30.
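The matching rule above (for each dataframe 1 timestamp, take the last dataframe 2 timestamp that is less than or equal to it) can be checked with a minimal NumPy sketch. It assumes, as in the question, that both frames are indexed by a DatetimeIndex sorted in ascending order. It uses np.searchsorted rather than a pandas merge, so it is only one possible way to express the rule.

```python
import numpy as np
import pandas as pd

# Sample data from the question; both frames are indexed by datetime
# and the indexes are assumed to be sorted in ascending order.
df1 = pd.DataFrame(
    {"value": [1, 1, 2, 2, 3]},
    index=pd.to_datetime([
        "2021-11-11 09:00", "2021-11-11 10:00", "2021-11-12 11:00",
        "2021-11-14 09:00", "2021-11-15 09:30",
    ]),
)
df2 = pd.DataFrame(
    {"value": [2, 3, 4, 2, 3]},
    index=pd.to_datetime([
        "2021-11-10 11:00", "2021-11-11 09:30", "2021-11-12 12:00",
        "2021-11-13 09:50", "2021-11-15 10:30",
    ]),
)

# side="right" counts, for each df1 timestamp, how many df2 timestamps are <= it;
# subtracting 1 gives the position of the last such df2 timestamp.
pos = np.searchsorted(df2.index.values, df1.index.values, side="right") - 1

# A position of -1 means no df2 timestamp precedes that df1 timestamp; those
# rows become NaT. clip() only keeps the lookup itself in bounds.
matched = pd.Series(df2.index[pos.clip(min=0)], index=df1.index)
df1["df2_index"] = matched.where(pos >= 0)
```

Because the lookup is a binary search on the sorted index, it avoids a Python-level loop over the rows of dataframe 1.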
Is there a pandas method that can do this efficiently?
Many thanks.
Answer
pandas merge_asof should suffice:
    import pandas as pd

    pd.merge_asof(df1, df2.assign(df2_index=df2.date), on='date')

                     date  value_x  value_y           df2_index
    0 2021-11-11 09:00:00        1        2 2021-11-10 11:00:00
    1 2021-11-11 10:00:00        1        3 2021-11-11 09:30:00
    2 2021-11-12 11:00:00        2        3 2021-11-11 09:30:00
    3 2021-11-14 09:00:00        2        2 2021-11-13 09:50:00
    4 2021-11-15 09:30:00        3        2 2021-11-13 09:50:00
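The answer's call uses on='date', which treats date as an ordinary column. If date is instead the index of both frames, as the question states, a minimal sketch of the same approach can pass left_index=True and right_index=True. This assumes both indexes are datetime and sorted ascending; merge_asof raises an error if the keys are not sorted. The defaults direction="backward" and allow_exact_matches=True give the "largest timestamp that does not exceed" rule; direction is spelled out here only for readability.

```python
import pandas as pd

# Sample data from the question, with datetime as the index of both frames.
df1 = pd.DataFrame(
    {"value": [1, 1, 2, 2, 3]},
    index=pd.to_datetime([
        "2021-11-11 09:00", "2021-11-11 10:00", "2021-11-12 11:00",
        "2021-11-14 09:00", "2021-11-15 09:30",
    ]),
)
df2 = pd.DataFrame(
    {"value": [2, 3, 4, 2, 3]},
    index=pd.to_datetime([
        "2021-11-10 11:00", "2021-11-11 09:30", "2021-11-12 12:00",
        "2021-11-13 09:50", "2021-11-15 10:30",
    ]),
)

# merge_asof requires both sides to be sorted on the join key.
df1 = df1.sort_index()
df2 = df2.sort_index()

# Copy df2's index into a column so the matched timestamp survives the merge.
right = df2.assign(df2_index=df2.index)

result = pd.merge_asof(
    df1,
    right[["df2_index"]],   # keep only the new column to avoid value_x/value_y
    left_index=True,
    right_index=True,
    direction="backward",   # last right key <= left key (the default)
)
```

Selecting only df2_index from the right frame keeps the original value column unsuffixed. Passing right without that selection would also bring df2's value column along, as value_x/value_y in the answer's output.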