
Groupby and count how many times a customer was called successfully up to a specific point in time

My problem is closely related to Groupby count only when a certain value is present in one of the column in pandas.

Let’s say I have a dataframe which is sorted by not_unique_id and date_of_call (in descending order).

import pandas as pd

rng = pd.date_range('2015-02-24', periods=8, freq='D')
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)
df.sort_values(['not_unique_id','date_of_call'], inplace=True, ascending=False)
df.reset_index(drop=True, inplace=True) # reset index

df.head(10)

(screenshot of df: the dataframe sorted by not_unique_id and date_of_call in descending order)

Now I want to add a new column which tells me how often the customer was reached successfully in the past. In other words: for each call, count how many times the customer was reached before that call and save the result in a new column. For example, by the fourth call to A222 (on 2015-03-01) the customer had already been reached twice, so the new column holds 2 for that row. For the provided example, this would be the result:

(screenshot of the expected result: df with an additional column holding the count of earlier successful calls)

How would you do it?


Answer

If all datetimes are unique and the frame is already sorted (descending here), you can reverse the order by indexing first, then apply a custom function per group that shifts and takes the cumulative sum with Series.cumsum, and finally replace the missing values and convert the column to integers:

df['new'] = (df.iloc[::-1]
               .groupby('not_unique_id')['customer_reached']
               .apply(lambda x: x.shift().cumsum())
               .fillna(0)
               .astype(int))
print (df)
  unique_id not_unique_id date_of_call  customer_reached  new
0        K7          A333   2015-03-03                 1    0
1        K6          A222   2015-03-02                 1    3
2        K5          A222   2015-03-01                 1    2
3        K4          A222   2015-02-28                 1    1
4        K3          A222   2015-02-27                 1    0
5        K2          A222   2015-02-26                 0    0
6        K1          A111   2015-02-25                 0    0
7        K0          A000   2015-02-24                 1    0
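
An equivalent variant avoids the lambda entirely: the per-group cumulative sum of customer_reached, computed in reversed (chronological) order, minus the current row's own value gives the number of successful calls strictly in the past. A minimal sketch, assuming the same descending-sorted df as above:

# Shift-free alternative (assumption: df is the descending-sorted frame from above)
cum = df.iloc[::-1].groupby('not_unique_id')['customer_reached'].cumsum()
# subtract the current call so only strictly earlier successes are counted
df['new'] = (cum - df['customer_reached']).astype(int)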

Or, if it is possible to change the order, sort ascending first:

df.sort_values(['not_unique_id','date_of_call'], inplace=True)


df['new'] = (df.groupby('not_unique_id')['customer_reached']
               .apply(lambda x: x.shift().cumsum())
               .fillna(0)
               .astype(int))
print (df)
  unique_id not_unique_id date_of_call  customer_reached  new
0        K0          A000   2015-02-24                 1    0
1        K1          A111   2015-02-25                 0    0
2        K2          A222   2015-02-26                 0    0
3        K3          A222   2015-02-27                 1    0
4        K4          A222   2015-02-28                 1    1
5        K5          A222   2015-03-01                 1    2
6        K6          A222   2015-03-02                 1    3
7        K7          A333   2015-03-03                 1    0
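
As a quick sanity check, the running counts can be verified against the raw data by brute force: for each row, count the earlier rows for the same not_unique_id where the customer was reached. A minimal sketch, assuming the ascending-sorted df with the new column from above:

# Brute-force verification (assumption: ascending-sorted df with 'new' from above)
check = df.apply(
    lambda r: ((df['not_unique_id'] == r['not_unique_id'])
               & (df['date_of_call'] < r['date_of_call'])
               & (df['customer_reached'] == 1)).sum(),
    axis=1,
)
assert (check == df['new']).all()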