Skip to content
Advertisement

Cumulative of last 12 months from latest communication date?

I’m looking at counting the number of interactions grouped by ID in the last 12 months for each unique ID. The count starts from the latest date to the last one grouped by ID.

ID    date       
001   2022-02-01 
002   2018-03-26  
001   2021-08-05  
001   2019-05-01  
002   2019-02-01  
003   2018-07-01  

Output is something like the below.

ID    Last_12_Months_Count
001    2
002    2
003    1

How can I achieve this in Pandas? Any function that would count the months based on the dates from the latest date per group?

Advertisement

Answer

Use:

m = df['date'].gt(df.groupby('ID')['date'].transform('max')
                    .sub(pd.offsets.DateOffset(years=1)))
df1 = df[m]

df1 = df1.groupby('ID').size().reset_index(name='Last_12_Months_Count')
print (df1)
   ID  Last_12_Months_Count
0   1                     2
1   2                     2
2   3                     1

Or:

df1 = (df.groupby('ID')['date']
         .agg(lambda x: x.gt(x.max() - pd.offsets.DateOffset(years=1)).sum())
         .reset_index(name='Last_12_Months_Count'))
print (df1)
   ID  Last_12_Months_Count
0   1                     2
1   2                     2
2   3                     1

For count multiple columns use named aggregation:

df['date1'] = df['date']

f = lambda x: x.gt(x.max() - pd.offsets.DateOffset(years=1)).sum()
df1 = (df.groupby('ID')
         .agg(Last_12_Months_Count_date = ('date', f),
              Last_12_Months_Count_date1 = ('date1', f))
         .reset_index())
print (df1)
   ID  Last_12_Months_Count_date  Last_12_Months_Count_date1
0   1                          2                           2
1   2                          2                           2
2   3                          1                           1
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement