Skip to content
Advertisement

Datetime rolling count per category in Pandas

Starting from a DataFrame with a date and user column, I’d like to add a third count_past_5_days column to indicate the rolling count of occurrences of each row’s user during the past 5 days:

date user count_past_5_days
2020-01-01 abc 1
2020-01-01 def 1
2020-01-02 abc 2
2020-01-03 abc 3
2020-01-04 abc 4
2020-01-04 def 2
2020-01-04 ghi 1
2020-01-05 abc 5
2020-01-06 abc 5
2020-01-07 abc 5

I’ve tried the following:

df.set_index('date').rolling('5D')['user'].count()

But this gets the total count for the past five rolling days, not just for the specific user of the current row. How to get this rolling count, for each row’s specific user only?

Advertisement

Answer

Try this, you can chain rolling on groupby:

df.set_index('date').groupby('user')['user']
  .rolling('5D')
  .count()
  .rename('count_past_5_days')
  .reset_index()
  .sort_values('date')

Output:

  user       date  count_past_5_days
0  abc 2020-01-01                1.0
1  def 2020-01-01                1.0
2  abc 2020-01-02                2.0
3  abc 2020-01-03                3.0
4  abc 2020-01-04                4.0
5  def 2020-01-04                2.0
6  ghi 2020-01-04                1.0
7  abc 2020-01-05                5.0
8  abc 2020-01-06                5.0
9  abc 2020-01-07                5.0
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement