I’m looking at counting the number of interactions grouped by ID in the last 12 months for each unique ID. The count starts from the latest date to the last one grouped by ID.
ID date 001 2022-02-01 002 2018-03-26 001 2021-08-05 001 2019-05-01 002 2019-02-01 003 2018-07-01
Output is something like the below.
ID Last_12_Months_Count 001 2 002 2 003 1
How can I achieve this in Pandas? Any function that would count the months based on the dates from the latest date per group?
Advertisement
Answer
Use:
m = df['date'].gt(df.groupby('ID')['date'].transform('max') .sub(pd.offsets.DateOffset(years=1))) df1 = df[m] df1 = df1.groupby('ID').size().reset_index(name='Last_12_Months_Count') print (df1) ID Last_12_Months_Count 0 1 2 1 2 2 2 3 1
Or:
df1 = (df.groupby('ID')['date'] .agg(lambda x: x.gt(x.max() - pd.offsets.DateOffset(years=1)).sum()) .reset_index(name='Last_12_Months_Count')) print (df1) ID Last_12_Months_Count 0 1 2 1 2 2 2 3 1
For count multiple columns use named aggregation:
df['date1'] = df['date'] f = lambda x: x.gt(x.max() - pd.offsets.DateOffset(years=1)).sum() df1 = (df.groupby('ID') .agg(Last_12_Months_Count_date = ('date', f), Last_12_Months_Count_date1 = ('date1', f)) .reset_index()) print (df1) ID Last_12_Months_Count_date Last_12_Months_Count_date1 0 1 2 2 1 2 2 2 2 3 1 1