How to write a function to find clients that are gone, boomeranging, new, etc?

Question

I am trying to come up with a dynamic way to check for the existence of a string and report back a few different results: gone_client, boomerang, new_client. If I groupby address_id and my_date, and the pattern is Verizon, Verizon, Comcast, Comcast, the client left Verizon and went to another company. If the …

Accepted Answer

Here's a solution with some verbose logic that you can play around with. It doesn't sound like you're quite certain of your final logic, but hopefully this gives you enough to start from.

This solution groups the dataframe by address_id. Then, for each individual group, we examine the values in the my_company column. Combined with a timedelta, that gives us logic for whether the address is exclusively with Verizon, was never with Verizon, is no longer with Verizon (having left within the last 30 days or earlier), or is currently with Verizon after coming back (within the last 30 days or earlier).

This answer is not sponsored by Verizon. Other cellular providers exist.

    import datetime

    import pandas as pd

    # data stored in dictionary (one row of values per address_id)
    details = {
        'address_id': [111, 111, 111, 111, 111, 111,
                       222, 222, 222, 222, 222, 222,
                       333, 333, 333, 333, 333, 333,
                       444, 444, 444, 444, 444, 444,
                       555, 555, 555, 555, 555, 555,
                       777, 777, 777],
        'my_company': ['Comcast', 'Verizon', 'Other', 'Other', 'Comcast', 'Comcast',
                       'Spectrum', 'Spectrum', 'Spectrum', 'Spectrum', 'Spectrum', 'Spectrum',
                       'Verizon', 'Verizon', 'Verizon', 'Verizon', 'Verizon', 'Verizon',
                       'Spectrum', 'Spectrum', 'Spectrum', 'Spectrum', 'Verizon', 'Spectrum',
                       'Spectrum', 'Spectrum', 'Spectrum', 'Spectrum', 'Verizon', 'Other',
                       'Verizon', 'Comcast', 'Comcast'],
        'my_date': ['2022-01-24', '2022-02-21', '2022-03-28', '2022-04-25', '2022-05-23', '2022-06-27',
                    '2022-01-24', '2022-02-21', '2022-03-28', '2022-04-25', '2022-05-23', '2022-06-27',
                    '2022-01-24', '2022-02-21', '2022-03-28', '2022-04-25', '2022-05-23', '2022-06-27',
                    '2022-01-24', '2022-02-21', '2022-03-28', '2022-04-25', '2022-05-23', '2022-06-27',
                    '2022-01-24', '2022-02-21', '2022-03-28', '2022-04-25', '2022-05-23', '2022-06-27',
                    '2022-01-24', '2022-02-21', '2022-03-28'],
    }
    df = pd.DataFrame(details)
    df['my_date'] = pd.to_datetime(df['my_date'])

    address_groups = df.groupby('address_id')
    frame_list = []
    current_date = datetime.datetime.now()

    for group, frame in address_groups:
        # Work on a date-sorted copy so the last row is the most recent one
        # and adding a column does not trigger SettingWithCopyWarning.
        frame = frame.sort_values('my_date').copy()

        # Create a list and set of each company used by a given address_id:
        company_list = frame['my_company'].values.tolist()
        company_set = set(company_list)

        # Exclusively Verizon
        if ('Verizon' in company_set) and (len(company_set) == 1):
            frame['status'] = 'Verizon Diehard'

        # Never Verizon
        if 'Verizon' not in company_set:
            frame['status'] = 'Verizon Never'

        # Verizon at some point but not currently
        if ('Verizon' in company_set) and (company_list[-1] != 'Verizon'):
            v_frame = frame[frame['my_company'] == 'Verizon']
            # my_date is already a datetime column, so no string parsing is needed
            last_verizon_date = v_frame['my_date'].iloc[-1]
            if (current_date - last_verizon_date) < pd.Timedelta("30 days"):
                frame['status'] = 'Not currently Verizon, but was in last 30 days'
            else:
                frame['status'] = 'Not currently Verizon, but was so more than 30 days ago'

        # Verizon currently but was a boomerang
        if (company_list[-1] == 'Verizon') and (len(company_set) >= 2):
            non_v_frame = frame[frame['my_company'] != 'Verizon']
            last_non_v_date = non_v_frame['my_date'].iloc[-1]
            if (current_date - last_non_v_date) < pd.Timedelta("30 days"):
                frame['status'] = 'Boomerang back to Verizon in last 30 days'
            else:
                frame['status'] = 'Boomerang back more than 30 days ago'

        frame_list.append(frame)

    final_df = pd.concat(frame_list)
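
If you would rather get the labels from the question (gone_client, boomerang, new_client) than the longer status strings, the same idea can be wrapped in a small function that looks only at the order of companies per address. This is a rough sketch, not a definitive implementation: the function name classify_history, the extra labels never_client and loyal_client, and the exact definitions of boomerang and new_client are my own assumptions, since the question does not spell them out in full. It also ignores the 30-day window.

```python
import pandas as pd


def classify_history(companies, target="Verizon"):
    """Label one address from its date-ordered list of providers.

    Label definitions are assumptions made for illustration:
      never_client: target never appears
      loyal_client: target is the only provider
      gone_client:  target was used at some point, but the latest provider is another
      boomerang:    latest provider is target, and target was left at least once before
      new_client:   latest provider is target, reached by a single switch to it
    """
    is_target = [c == target for c in companies]
    if not any(is_target):
        return "never_client"
    if all(is_target):
        return "loyal_client"
    if not is_target[-1]:
        return "gone_client"
    # The history ends on target. If any non-target entry follows the first
    # target entry, the address left and later came back.
    first_target = is_target.index(True)
    if not all(is_target[first_target:]):
        return "boomerang"
    return "new_client"


# Small made-up dataset, one pattern per address_id (hypothetical data).
df = pd.DataFrame({
    "address_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "my_company": ["Verizon", "Verizon", "Comcast", "Comcast",
                   "Verizon", "Comcast", "Comcast", "Verizon",
                   "Comcast", "Comcast", "Verizon", "Verizon"],
    "my_date": pd.to_datetime(
        ["2022-01-24", "2022-02-21", "2022-03-28", "2022-04-25"] * 3
    ),
})

# One label per address, computed from the companies in date order.
status = (
    df.sort_values(["address_id", "my_date"])
      .groupby("address_id")["my_company"]
      .apply(lambda s: classify_history(s.tolist()))
)

# Broadcast the per-address label back onto every row.
df["status"] = df["address_id"].map(status)
```

Keeping the rules in one plain function makes them easy to change once the final logic is settled, and the groupby only has to hand it each address's company list.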

Answer