Hi I will try to explain the issue I am facing.
I have one dataframe (df) with the following:
ID | Date (dd-mm-yyyy) |
---|---|
AAA | 01-09-2020 |
AAA | 01-11-2020 |
AAA | 18-03-2021 |
AAA | 10-10-2022 |
BBB | 01-01-2019 |
BBB | 01-03-2019 |
CCC | 01-05-2020 |
CCC | 01-07-2020 |
CCC | 01-08-2020 |
CCC | 01-10-2021 |
I have created another dataframe (df2) with the first date (t) registered per ID and t+3months:
ID | T (First Date Occurred) | T+3 |
---|---|---|
AAA | 01-09-2020 | 01-12-2020 |
BBB | 01-01-2019 | 01-03-2020 |
CCC | 01-05-2020 | 01-08-2020 |
The desired output where I am struggling is to filter the df based on the two date filters defined in df2(“T” & “T+3):
e.g.AAA = AAA > T & AAA < T+3
ID | Date (dd-mm-yyyy) |
---|---|
AAA | 01-11-2020 |
BBB | 01-03-2019 |
CCC | 01-07-2020 |
CCC | 01-08-2020 |
What is the best way to approach this? Any help is appreciated!
Advertisement
Answer
IIUC, you can use pandas.merge_asof
with allow_exact_matches=False
:
(pd.merge_asof(df1.sort_values(by='Date'), df2.sort_values(by='T'), allow_exact_matches=False, by='ID', left_on='Date', right_on='T') .loc[lambda d: d['Date'] <= d['T+3']] )
NB. the exact condition on the T+3 is unclear as you describe “< T+3” but the shown output has “<= T+3”, just chose what you want (< or <=) in the loc
output:
ID Date T T+3 1 BBB 2019-03-01 2019-01-01 2020-03-01 3 CCC 2020-07-01 2020-05-01 2020-08-01 4 CCC 2020-08-01 2020-05-01 2020-08-01 6 AAA 2020-11-01 2020-09-01 2020-12-01