
Efficient way to find, for each entry of a datetime-indexed dataframe, the most recent entry in another dataframe in pandas

I have two dataframes, both indexed by datetime. For example, dataframe 1 looks like this:

|date            |  value |
+----------------+--------+
|2021-11-11 09:00|    1   |
|2021-11-11 10:00|    1   |
|2021-11-12 11:00|    2   |
|2021-11-14 09:00|    2   |
|2021-11-15 09:30|    3   |

and dataframe 2 looks like this:

|date            |  value |
+----------------+--------+
|2021-11-10 11:00|    2   |
|2021-11-11 09:30|    3   |
|2021-11-12 12:00|    4   |
|2021-11-13 09:50|    2   |
|2021-11-15 10:30|    3   |

For each entry in dataframe 1, I want to find the most recent entry in dataframe 2 and add a new column to dataframe 1 that records the relationship between the two dataframes.

To make this clearer, the expected result should look like this:

|date            |  value |    df2_index   |
+----------------+--------+----------------|
|2021-11-11 09:00|    1   |2021-11-10 11:00|
|2021-11-11 10:00|    1   |2021-11-11 09:30|
|2021-11-12 11:00|    2   |2021-11-11 09:30|
|2021-11-14 09:00|    2   |2021-11-13 09:50|
|2021-11-15 09:30|    3   |2021-11-13 09:50|

For the first entry in dataframe 1, the most recent entry for 2021-11-11 09:00 is 2021-11-10 11:00. For the third entry, 2021-11-12 11:00, the most recent entry (that is, the largest timestamp in dataframe 2 that does not exceed 2021-11-12 11:00) is 2021-11-11 09:30.

Is there a pandas method that can do this efficiently?

Many thanks.


Answer

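pandas' merge_asof should suffice. By default it matches each row of df1 with the last row of df2 whose key is less than or equal to the df1 key. The call below assumes date is a regular column, sorted in ascending order, in both dataframes: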

pd.merge_asof(df1, df2.assign(df2_index=df2.date), on='date')

                 date  value_x  value_y           df2_index
0 2021-11-11 09:00:00        1        2 2021-11-10 11:00:00
1 2021-11-11 10:00:00        1        3 2021-11-11 09:30:00
2 2021-11-12 11:00:00        2        3 2021-11-11 09:30:00
3 2021-11-14 09:00:00        2        2 2021-11-13 09:50:00
4 2021-11-15 09:30:00        3        2 2021-11-13 09:50:00
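
Since the question says both dataframes are indexed by datetime rather than holding date as a column, here is a minimal self-contained sketch of the same idea using left_index and right_index. The sample data is copied from the question; the names right and result are just illustrative, and the approach assumes both indexes are sorted in ascending order, which merge_asof requires.

```python
import pandas as pd

# Sample data from the question, with the datetime as the index
df1 = pd.DataFrame(
    {"value": [1, 1, 2, 2, 3]},
    index=pd.to_datetime([
        "2021-11-11 09:00", "2021-11-11 10:00", "2021-11-12 11:00",
        "2021-11-14 09:00", "2021-11-15 09:30",
    ]).rename("date"),
)
df2 = pd.DataFrame(
    {"value": [2, 3, 4, 2, 3]},
    index=pd.to_datetime([
        "2021-11-10 11:00", "2021-11-11 09:30", "2021-11-12 12:00",
        "2021-11-13 09:50", "2021-11-15 10:30",
    ]).rename("date"),
)

# Copy df2's index into a column so it is still visible after the merge.
# Only this column is kept, so df2's "value" does not clash with df1's.
right = df2.index.to_series(name="df2_index").to_frame()

# The default direction="backward" picks, for each df1 timestamp,
# the last df2 timestamp that is less than or equal to it.
result = pd.merge_asof(df1, right, left_index=True, right_index=True)
print(result)
```

A df1 row with no earlier entry in df2 would get NaT in df2_index. If exact timestamp matches should not count as "most recent", passing allow_exact_matches=False is one option.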
User contributions licensed under: CC BY-SA