I have DataFrame that looks like:
import pandas as pd df = pd.DataFrame({'Customer': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B','B','B','B','B'], 'Date': ['1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021', '1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021'], 'Amt': [0, 10, 0, 10, 0, 0, 0, 0, 0, 10, 10, 0, 10, 10]})
df
I’m trying to calculate the beginning and end date for each, which I think is pretty straight forward (i.e., first time each customer customer and last time, as defined by amt > 0).
What I need help with is calculating the number of new acquisitions, whether it’s their first purchase or they’ve churned for a period and come back. For example, for Customer A the first is Feb-21 and the second would be Apr-21. Moreover, Customer B would be Mar-21 and then again on Jun-21. Both would have two new
The desired output would be:
I’m just not sure where to start on this one.
Advertisement
Answer
One way using pandas.DataFrame.groupby
with shift
trick:
df["grp"] = df["Amt"].ne(df["Amt"].shift()).cumsum() new_df = df[df["Amt"].gt(0)].groupby("Customer").agg(Start=("Date", min), End=("Date", max), Reactivation=("grp", "nunique")) print(new_df)
Output:
Start End Reactivation Customer A 2021-02-01 2021-04-01 2 B 2021-03-01 2021-07-01 2