For each order, I want to count the orders the same customer had already placed before that order's date, i.e. how many orders existed at the time of each order.
Input:
Expected output:
The following code works but is extremely slow, taking upwards of 10 hours for 100k+ rows. There is certainly a better way.
orders_total = []

for y, row in df_dated_filt.iterrows():
    orders_total.append(df_dated_filt[(df_dated_filt["order_id"] != row["order_id"]) &
                                      (df_dated_filt["customer_id"] == row["customer_id"]) &
                                      (pd.to_datetime(df_dated_filt['order_date']) < pd.to_datetime(row['order_date']))]["order_id"].count())

df_dated_filt["orders_total"] = orders_total
Answer
Try sort_values to get dates in ascending order, then groupby cumcount to enumerate the rows of each group in order:
df['orders_total'] = df.sort_values('order_date').groupby('customer_id').cumcount()
df:
   customer_id  order_id order_date  orders_total
0            1        12 2019-01-06             1
1            1        22 2019-01-01             0
2            2        34 2018-05-08             0
3            2        33 2018-05-12             1
4            2        38 2018-05-29             2
Complete Working Example:
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 1, 2, 2, 2],
    'order_id': [12, 22, 34, 33, 38],
    'order_date': ['2019-01-06', '2019-01-01', '2018-05-08', '2018-05-12',
                   '2018-05-29']
})
# Convert once up front so sorting compares dates, not strings
df['order_date'] = pd.to_datetime(df['order_date'])

# cumcount numbers each customer's rows 0, 1, 2, ... in date order;
# the assignment aligns on the index, so df keeps its original row order
df['orders_total'] = (
    df.sort_values('order_date')
      .groupby('customer_id')
      .cumcount()
)

print(df)
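As a sanity check, the sketch below compares the cumcount result with a row-by-row count written in the spirit of the loop from the question, on the same sample data. The helper name `count_earlier_orders_slow` is hypothetical, and the check assumes each customer's order dates are unique (ties are handled differently by the two approaches).

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 1, 2, 2, 2],
    'order_id': [12, 22, 34, 33, 38],
    'order_date': pd.to_datetime(['2019-01-06', '2019-01-01', '2018-05-08',
                                  '2018-05-12', '2018-05-29'])
})


def count_earlier_orders_slow(frame):
    """Hypothetical helper: row-by-row count of the same customer's strictly earlier orders."""
    counts = []
    for _, row in frame.iterrows():
        earlier = frame[(frame['order_id'] != row['order_id']) &
                        (frame['customer_id'] == row['customer_id']) &
                        (frame['order_date'] < row['order_date'])]
        counts.append(len(earlier))
    return pd.Series(counts, index=frame.index)


slow = count_earlier_orders_slow(df)

# Vectorized version; the result is indexed like the sorted frame,
# so reindex it back to the original row order before comparing
fast = (
    df.sort_values('order_date')
      .groupby('customer_id')
      .cumcount()
      .reindex(df.index)
)

print((slow == fast).all())
```

On real data, running the same comparison on a small sample of customers is a cheap way to confirm the fast version before dropping the loop.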
Edit
Assuming orders on the same date should get the same value within each group, use rank:
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2, 2],
    'order_id': [15, 12, 22, 34, 33, 38],
    'order_date': ['2019-01-06', '2019-01-06', '2019-01-01',
                   '2018-05-08', '2018-05-12', '2018-05-29']
})
df['order_date'] = pd.to_datetime(df['order_date'])

# Dense rank gives tied dates the same rank (1, 2, 3, ...) per customer;
# subtract 1 so the earliest date maps to 0
df['orders_total'] = (
    df.sort_values('order_date')
      .groupby('customer_id')['order_date']
      .rank(method='dense').astype(int) - 1
)
df:
   customer_id  order_id order_date  orders_total
0            1        15 2019-01-06             1
1            1        12 2019-01-06             1
2            1        22 2019-01-01             0
3            2        34 2018-05-08             0
4            2        33 2018-05-12             1
5            2        38 2018-05-29             2
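One caveat about ties: a dense rank counts distinct earlier dates, not earlier orders. If the goal is the number of strictly earlier orders, which is what the loop in the question computes, `rank(method='min') - 1` may be the closer match. The sketch below uses made-up data (one customer with two orders on the same day) and the hypothetical column names `earlier_dates` and `earlier_orders` to show the difference.

```python
import pandas as pd

# Made-up data: customer 1 places two orders on 2019-01-01
df = pd.DataFrame({
    'customer_id': [1, 1, 1, 1],
    'order_id': [15, 12, 22, 40],
    'order_date': pd.to_datetime(['2019-01-01', '2019-01-01',
                                  '2019-01-06', '2019-01-09'])
})

by_customer = df.groupby('customer_id')['order_date']

# Dense rank: tied dates share a rank and the next rank follows directly,
# so this counts the distinct earlier dates
df['earlier_dates'] = by_customer.rank(method='dense').astype(int) - 1

# Min rank: tied dates share the lowest rank and later ranks skip ahead,
# so this counts the strictly earlier orders
df['earlier_orders'] = by_customer.rank(method='min').astype(int) - 1

print(df)
```

Here the order on 2019-01-06 gets 1 in `earlier_dates` but 2 in `earlier_orders`, because two orders were placed before it on a single date.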