I have a DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({'Customer': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
                   'Date': ['1/1/2021', '2/1/2021', '3/1/2021', '4/1/2021', '5/1/2021', '6/1/2021', '7/1/2021', '1/1/2021', '2/1/2021', '3/1/2021', '4/1/2021', '5/1/2021', '6/1/2021', '7/1/2021'],
                   'Amt': [0, 10, 0, 10, 0, 0, 0, 0, 0, 10, 10, 0, 10, 10]})
df
I’m trying to calculate the beginning and end date for each customer, which I think is pretty straightforward (i.e., the first and last time each customer made a purchase, as defined by Amt > 0).
What I need help with is calculating the number of new acquisitions, whether it’s a customer’s first purchase or a return after a period of churn. For example, for Customer A the first acquisition is Feb-21 and the second is Apr-21. Similarly, for Customer B the first would be Mar-21 and then again Jun-21. Both would have two new acquisitions.
The desired output would be:
I’m just not sure where to start on this one.
Answer
One way, using pandas.DataFrame.groupby with the shift trick:
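# Parse the dates so that min/max compare chronologically rather than as strings.
df["Date"] = pd.to_datetime(df["Date"])
# A new run id starts whenever Amt differs from the previous row.
df["grp"] = df["Amt"].ne(df["Amt"].shift()).cumsum()
# Keep purchase rows only; each distinct run id per customer is one (re)activation.
new_df = df[df["Amt"].gt(0)].groupby("Customer").agg(Start=("Date", "min"),
                                                     End=("Date", "max"),
                                                     Reactivation=("grp", "nunique"))
print(new_df)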
Output:
              Start        End  Reactivation
Customer
A        2021-02-01 2021-04-01             2
B        2021-03-01 2021-07-01             2
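One caveat with the shift trick as written: a new run starts whenever Amt changes value, so two consecutive purchases of different amounts (say 10 followed by 20) would be counted as two reactivations, and the shift is not restricted to a single customer. Below is a minimal sketch of a variant that sidesteps both, assuming an acquisition means a row with Amt > 0 whose previous row for the same customer either does not exist or has Amt == 0. The names `is_active`, `prev_active`, `Acquired` and `Acquisitions` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["A"] * 7 + ["B"] * 7,
    "Date": ["1/1/2021", "2/1/2021", "3/1/2021", "4/1/2021",
             "5/1/2021", "6/1/2021", "7/1/2021"] * 2,
    "Amt": [0, 10, 0, 10, 0, 0, 0, 0, 0, 10, 10, 0, 10, 10],
})

# Assumption: dates are month/day/year, as the question's Feb-21 example implies.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.sort_values(["Customer", "Date"])

# True on rows where the customer bought something.
is_active = df["Amt"].gt(0)

# Previous row's status within the same customer (0 when there is no history).
prev_active = is_active.astype(int).groupby(df["Customer"]).shift(fill_value=0)

# An acquisition is an active row that follows an inactive row or no row at all.
df["Acquired"] = is_active & prev_active.eq(0)

summary = (
    df[is_active]
    .groupby("Customer")
    .agg(Start=("Date", "min"),
         End=("Date", "max"),
         Acquisitions=("Acquired", "sum"))
)
print(summary)
```

Because the comparison is on the boolean "bought / did not buy" flag rather than on the raw amount, a change in purchase size does not open a new run, and grouping before the shift keeps one customer's last row from influencing the next customer's first row.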