Skip to content
Advertisement

Pandas: Merge Dataframes Based on Condition but Keep NaN

I have two dataframes, df1 and df2, which I would like to merge on the column ‘id’ where the ‘triggerdate’ from df1 falls between the ‘startdate’ and ‘enddate’ of df2, however, keep the rows where there’s no match.

df1:

id  triggerdate
a    09/01/2022
a    08/15/2022
b    06/25/2022
c    06/30/2022
c    07/01/2022

df2:

id startdate   enddate     value
a  08/30/2022  09/03/2022     30
b  07/10/2022  07/15/2022      5
c  06/28/2022  07/05/2022     10

Expected Output:

id triggerdate  startdate  enddate     value
a  09/01/2022  08/30/2022  09/03/2022     30
a  08/15/2022         NaN         NaN    NaN
b  06/25/2022         NaN         NaN    NaN
c  06/30/2022  06/28/2022  07/05/2022     10
c  07/01/2022  06/28/2022  07/05/2022     10

The approach that I have taken so far is:

df_merged = df1.merge(df2, on = ['id'], how='outer')

output = df_merged.loc[
             df_merged['triggerdate'].between(
                 df_merged['startdate'], 
                 df_merged['enddate'], inclusive='both')]

However, this approach does the following 1) Matches the ‘id’ values in df1 with df2 regardless of whether the condition is met and 2) then drops all the rows where the condition isn’t met.

Unfortunately, I had no luck finding the solution online.

What would be the recommended approach to get the expected output?

Thank you in advance for your help!

Advertisement

Answer

Try pd.merge_asof()

modify df2 to merge on date column

df2 = (pd.concat([df2,
df2[['id','enddate']]
.rename({'enddate':'startdate'},axis=1)]))

Then merge

(pd.merge_asof(df.reset_index().sort_values('triggerdate'),
df2.sort_values('startdate'),
left_on = 'triggerdate',
right_on = 'startdate',
by = 'id')
.sort_values('index')
.drop('index',axis=1))

Output:

  id triggerdate  startdate    enddate  value
4  a  2022-09-01 2022-08-30 2022-09-03   30.0
3  a  2022-08-15        NaT        NaT    NaN
0  b  2022-06-25        NaT        NaT    NaN
1  c  2022-06-30 2022-06-28 2022-07-05   10.0
2  c  2022-07-01 2022-06-28 2022-07-05   10.0
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement