My problem is closely related to Groupby count only when a certain value is present in one of the column in pandas.
Let’s say I have a dataframe which is sorted by not_unique_id and date_of_call.
import pandas as pd

rng = pd.date_range('2015-02-24', periods=8, freq='D')
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)
df.sort_values(['not_unique_id', 'date_of_call'], inplace=True, ascending=False)
df.reset_index(drop=True, inplace=True)  # reset index
df.head(10)
Now I want to add a new column which tells me how often the customer was called successfully in the past. In other words: count how many times the customer was reached in the past and save the result in a new column. For the provided example, this would be the result:

  unique_id not_unique_id date_of_call  customer_reached  new
0        K7          A333   2015-03-03                 1    0
1        K6          A222   2015-03-02                 1    3
2        K5          A222   2015-03-01                 1    2
3        K4          A222   2015-02-28                 1    1
4        K3          A222   2015-02-27                 1    0
5        K2          A222   2015-02-26                 0    0
6        K1          A111   2015-02-25                 0    0
7        K0          A000   2015-02-24                 1    0
How would you do it?
Answer
If all datetimes are unique and sorted, you can reverse the row order by indexing first, then aggregate with a custom function combining Series.shift and a cumulative sum via Series.cumsum; last, replace missing values and convert the column to integers:
df['new'] = (df.iloc[::-1]
               .groupby('not_unique_id')['customer_reached']
               .apply(lambda x: x.shift().cumsum())
               .fillna(0)
               .astype(int))
print(df)

  unique_id not_unique_id date_of_call  customer_reached  new
0        K7          A333   2015-03-03                 1    0
1        K6          A222   2015-03-02                 1    3
2        K5          A222   2015-03-01                 1    2
3        K4          A222   2015-02-28                 1    1
4        K3          A222   2015-02-27                 1    0
5        K2          A222   2015-02-26                 0    0
6        K1          A111   2015-02-25                 0    0
7        K0          A000   2015-02-24                 1    0
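Why this works may not be obvious: iloc[::-1] reverses the row order but keeps the original index labels, so the Series computed on the reversed frame aligns back to the right rows by index when it is assigned as a new column. A minimal sketch of that behavior:

```python
import pandas as pd

s = pd.Series([10, 20, 30])

# reversing by position keeps the index labels attached to each value
rev = s.iloc[::-1]
print(rev.index.tolist())  # [2, 1, 0]
print(rev.tolist())        # [30, 20, 10]
```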
Or if possible change order:
df.sort_values(['not_unique_id', 'date_of_call'], inplace=True)
df['new'] = (df.groupby('not_unique_id')['customer_reached']
               .apply(lambda x: x.shift().cumsum())
               .fillna(0)
               .astype(int))
print(df)

  unique_id not_unique_id date_of_call  customer_reached  new
0        K0          A000   2015-02-24                 1    0
1        K1          A111   2015-02-25                 0    0
2        K2          A222   2015-02-26                 0    0
3        K3          A222   2015-02-27                 1    0
4        K4          A222   2015-02-28                 1    1
5        K5          A222   2015-03-01                 1    2
6        K6          A222   2015-03-02                 1    3
7        K7          A333   2015-03-03                 1    0
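If you would rather avoid apply entirely, here is one possible alternative (a sketch, not part of the original answer): a group-wise cumulative sum includes the current row, so subtracting the current value leaves exactly the count of past successes.

```python
import pandas as pd

rng = pd.date_range('2015-02-24', periods=8, freq='D')
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)
df.sort_values(['not_unique_id', 'date_of_call'], inplace=True)

# cumsum within each group counts successes up to and including this row;
# subtracting the current row's value keeps only the earlier successes
g = df.groupby('not_unique_id')['customer_reached']
df['new'] = g.cumsum() - df['customer_reached']
print(df)
```

This is fully vectorized, which tends to be faster than apply on larger frames.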