I have the pandas dataframe,
JavaScript
x
14
14
1
data = pd.DataFrame([['1001','2020-03-06'],
2
['1001','2020-04-06'],
3
['1002','2021-04-02'],
4
['1003','2022-07-08'],
5
['1001','2020-09-06'],
6
['1003','2022-04-04'],
7
['1002','2021-06-05'],
8
['1007','2020-09-08'],
9
['1002','2021-12-07'],
10
['1003','2022-12-06'],
11
['1007','2020-02-10'],
12
],
13
columns=['Type', 'Date'])
14
I need to Group by each id and then apply the monthly difference in each id to get monthly frequency number
I tried out : Solution :
JavaScript
1
5
1
data['Date'] = pd.to_datetime(data['Date'])
2
data['diff'] = data.groupby(['Type'])['Date'].apply(lambda x:(x.max() - x)/np.timedelta64(1, 'M'))
3
data['diff'] = data['diff'].astype(int)
4
data = data.sort_values('Type')
5
I am expecting the Output dataframe :
JavaScript
1
14
14
1
Dataframe : ([['1007','2020-09-08', 0],
2
['1007','2020-02-10', 7],
3
['1003','2022-12-06', 0],
4
['1003','2022-07-08', 5],
5
['1003','2022-04-04', 3],
6
['1002','2021-12-07', 0],
7
['1002','2021-06-05', 6],
8
['1002','2021-04-02', 2],
9
['1001','2020-09-06', 0],
10
['1001','2020-04-06', 5],
11
['1001','2020-03-06', 1],
12
],
13
columns=['Type', 'Date', 'MonthlyFreq'])
14
Advertisement
Answer
You can use period objects to calculate the number of monthly periods in between 2 dates:
JavaScript
1
9
1
data['Date'] = pd.to_datetime(data['Date'])
2
data = data.sort_values(['Type', 'Date'], ascending=False)
3
4
data['diff'] = (data['Date']
5
.dt.to_period('M') # convert to monthly period
6
.groupby(data['Type']).diff().mul(-1)
7
.apply(lambda x: 0 if pd.isna(x) else x.n)
8
)
9
output:
JavaScript
1
13
13
1
Type Date diff
2
7 1007 2020-09-08 0
3
10 1007 2020-02-10 7
4
9 1003 2022-12-06 0
5
3 1003 2022-07-08 5
6
5 1003 2022-04-04 3
7
8 1002 2021-12-07 0
8
6 1002 2021-06-05 6
9
2 1002 2021-04-02 2
10
4 1001 2020-09-06 0
11
1 1001 2020-04-06 5
12
0 1001 2020-03-06 1
13