I have a pandas Dataframe with an ID column and a date column (YYYY-MM-DD),
ID | Date |
---|---|
001 | 2022-01-01 |
001 | 2022-01-04 |
001 | 2022-02-07 |
002 | 2022-01-02 |
002 | 2022-01-03 |
002 | 2022-01-28 |
There may be gaps in the date field, as shown. I would like to have a new column, “occurrences_last_month” where it counts the number of occurrences for each ID in the last month (30 days).
The idea would be to add a new column with the following output,
ID | Date | Ocurrences_last_month |
---|---|---|
001 | 2022-01-01 | 0 |
001 | 2022-01-04 | 1 |
001 | 2022-02-07 | 0 |
002 | 2022-01-02 | 0 |
002 | 2022-01-03 | 1 |
002 | 2022-01-28 | 2 |
For example, in the case of ID 001:
- January 1st: substracts 1 month to December 2nd, so 0 occurrences
- January 2nd: goes from December 3rd to January 1st, so 1 occurrence
- February 7th: goes from January 8th, so 0 occurrences
I tried to use the datetime.timedelta to compute a new column “date_previous_month”, but I am unable to compute what I need from there, I tried using count() but I failed to obtain what I need.
Advertisement
Answer
First idea is per grouops use Rolling.count
with remove first level created by ID
:
df = df.set_index('Date') df['Ocurrences_last_month'] = (df.groupby('ID') .rolling('30D') .count().sub(1).droplevel(0).astype(int)) print (df) ID Ocurrences_last_month Date 2022-01-01 1 0 2022-01-04 1 1 2022-02-07 1 0 2022-01-02 2 0 2022-01-03 2 1 2022-01-28 2 2
EDIT: If possible duplciated values create Series
and assign to original DataFrame by DataFrame.join
:
s = df.groupby('ID').rolling('30D', on='Date')['Date'].count().sub(1).astype(int) df = df.join(s.rename('Ocurrences_last_month'), on=['ID','Date']) print (df) ID Date Ocurrences_last_month 0 1 2022-01-01 0 1 1 2022-01-04 1 2 1 2022-02-07 0 3 2 2022-01-02 0 4 2 2022-01-03 1 5 2 2022-01-28 2
Alternative solution from comments:
df = df.merge(s.rename('Ocurrences_last_month'), on=['ID','Date'])