Skip to content
Advertisement

Count occurrences in last 30 days with Pandas Dataframe

I have a pandas Dataframe with an ID column and a date column (YYYY-MM-DD),

ID Date
001 2022-01-01
001 2022-01-04
001 2022-02-07
002 2022-01-02
002 2022-01-03
002 2022-01-28

There may be gaps in the date field, as shown. I would like to have a new column, “occurrences_last_month” where it counts the number of occurrences for each ID in the last month (30 days).

The idea would be to add a new column with the following output,

ID Date Ocurrences_last_month
001 2022-01-01 0
001 2022-01-04 1
001 2022-02-07 0
002 2022-01-02 0
002 2022-01-03 1
002 2022-01-28 2

For example, in the case of ID 001:

  • January 1st: substracts 1 month to December 2nd, so 0 occurrences
  • January 2nd: goes from December 3rd to January 1st, so 1 occurrence
  • February 7th: goes from January 8th, so 0 occurrences

I tried to use the datetime.timedelta to compute a new column “date_previous_month”, but I am unable to compute what I need from there, I tried using count() but I failed to obtain what I need.

Advertisement

Answer

First idea is per grouops use Rolling.count with remove first level created by ID:

df = df.set_index('Date')
df['Ocurrences_last_month'] = (df.groupby('ID')
                                 .rolling('30D')
                                 .count().sub(1).droplevel(0).astype(int))
print (df)
            ID  Ocurrences_last_month
Date                                 
2022-01-01   1                      0
2022-01-04   1                      1
2022-02-07   1                      0
2022-01-02   2                      0
2022-01-03   2                      1
2022-01-28   2                      2

EDIT: If possible duplciated values create Series and assign to original DataFrame by DataFrame.join:

s = df.groupby('ID').rolling('30D', on='Date')['Date'].count().sub(1).astype(int)

df = df.join(s.rename('Ocurrences_last_month'), on=['ID','Date'])
print (df)
   ID       Date  Ocurrences_last_month
0   1 2022-01-01                      0
1   1 2022-01-04                      1
2   1 2022-02-07                      0
3   2 2022-01-02                      0
4   2 2022-01-03                      1
5   2 2022-01-28                      2

Alternative solution from comments:

df = df.merge(s.rename('Ocurrences_last_month'), on=['ID','Date'])
9 People found this is helpful
Advertisement