Skip to content
Advertisement

Calculating the number of starts, by customer using Pandas

I have DataFrame that looks like:

import pandas as pd

df = pd.DataFrame({'Customer': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B','B','B','B','B'],
               'Date': ['1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021', '1/1/2021', '2/1/2021','3/1/2021', '4/1/2021','5/1/2021', '6/1/2021','7/1/2021'], 
               'Amt': [0, 10, 0, 10, 0, 0, 0, 0, 0, 10, 10, 0, 10, 10]})

df

enter image description here

I’m trying to calculate the beginning and end date for each, which I think is pretty straight forward (i.e., first time each customer customer and last time, as defined by amt > 0).

What I need help with is calculating the number of new acquisitions, whether it’s their first purchase or they’ve churned for a period and come back. For example, for Customer A the first is Feb-21 and the second would be Apr-21. Moreover, Customer B would be Mar-21 and then again on Jun-21. Both would have two new

The desired output would be:

enter image description here

I’m just not sure where to start on this one.

Advertisement

Answer

One way using pandas.DataFrame.groupby with shift trick:

df["grp"] = df["Amt"].ne(df["Amt"].shift()).cumsum()
new_df = df[df["Amt"].gt(0)].groupby("Customer").agg(Start=("Date", min), 
                                                     End=("Date", max),
                                                     Reactivation=("grp", "nunique"))
print(new_df)

Output:

              Start        End  Reactivation
Customer                                    
A        2021-02-01 2021-04-01             2
B        2021-03-01 2021-07-01             2
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement