Pandas conditional counting by date

Question

I want to count all orders done by each customer at each order date, to find out how many orders were done at the time of each order. Input: Expected output: The following code works but is extremely slow. Taking upwards of 10 hours for 100k+ rows. There is certainly a better way. Answer Try sort_values to get dates in

Accepted Answer

Try sort_values to get dates in ascending order then groupby cumcount to enumerate groups in order:df['orders_total'] = df.sort_values('order_date').groupby('customer_id').cumcount()df:   customer_id  order_id order_date  orders_total0            1        12 2019-01-06             11            1        22 2019-01-01             02            2        34 2018-05-08             03            2        33 2018-05-12             14            2        38 2018-05-29             2Complete Working Example:import pandas as pddf = pd.DataFrame({    'customer_id': [1, 1, 2, 2, 2],    'order_id': [12, 22, 34, 33, 38],    'order_date': ['2019-01-06', '2019-01-01', '2018-05-08', '2018-05-12',                   '2018-05-29']})df['order_date'] = pd.to_datetime(df['order_date'])df['orders_total'] = (    df.sort_values('order_date')        .groupby('customer_id')        .cumcount())print(df)EditAssuming same dates should have the same value per group via rank:import pandas as pddf = pd.DataFrame({    'customer_id': [1, 1, 1, 2, 2, 2],    'order_id': [15, 12, 22, 34, 33, 38],    'order_date': ['2019-01-06', '2019-01-06', '2019-01-01',                   '2018-05-08', '2018-05-12', '2018-05-29']})df['order_date'] = pd.to_datetime(df['order_date'])df['orders_total'] = (        df.sort_values('order_date')        .groupby('customer_id')['order_date']        .rank(method='dense').astype(int) - 1)df:   customer_id  order_id order_date  orders_total0            1        15 2019-01-06             11            1        12 2019-01-06             12            1        22 2019-01-01             03            2        34 2018-05-08             04            2        33 2018-05-12             15            2        38 2018-05-29             2

Advertisement

Answer