I have the pandas dataframe,
data = pd.DataFrame([['1001','2020-03-06'], ['1001','2020-04-06'], ['1002','2021-04-02'], ['1003','2022-07-08'], ['1001','2020-09-06'], ['1003','2022-04-04'], ['1002','2021-06-05'], ['1007','2020-09-08'], ['1002','2021-12-07'], ['1003','2022-12-06'], ['1007','2020-02-10'], ], columns=['Type', 'Date'])
I need to Group by each id and then apply the monthly difference in each id to get monthly frequency number
I tried out : Solution :
data['Date'] = pd.to_datetime(data['Date']) data['diff'] = data.groupby(['Type'])['Date'].apply(lambda x:(x.max() - x)/np.timedelta64(1, 'M')) data['diff'] = data['diff'].astype(int) data = data.sort_values('Type')
I am expecting the Output dataframe :
Dataframe : ([['1007','2020-09-08', 0], ['1007','2020-02-10', 7], ['1003','2022-12-06', 0], ['1003','2022-07-08', 5], ['1003','2022-04-04', 3], ['1002','2021-12-07', 0], ['1002','2021-06-05', 6], ['1002','2021-04-02', 2], ['1001','2020-09-06', 0], ['1001','2020-04-06', 5], ['1001','2020-03-06', 1], ], columns=['Type', 'Date', 'MonthlyFreq'])
Advertisement
Answer
You can use period objects to calculate the number of monthly periods in between 2 dates:
data['Date'] = pd.to_datetime(data['Date']) data = data.sort_values(['Type', 'Date'], ascending=False) data['diff'] = (data['Date'] .dt.to_period('M') # convert to monthly period .groupby(data['Type']).diff().mul(-1) .apply(lambda x: 0 if pd.isna(x) else x.n) )
output:
Type Date diff 7 1007 2020-09-08 0 10 1007 2020-02-10 7 9 1003 2022-12-06 0 3 1003 2022-07-08 5 5 1003 2022-04-04 3 8 1002 2021-12-07 0 6 1002 2021-06-05 6 2 1002 2021-04-02 2 4 1001 2020-09-06 0 1 1001 2020-04-06 5 0 1001 2020-03-06 1