I have a Dataframe with three columns Customer
, Purchase amount
and Purchase date
.
I need to get sum of Purchase amount
of customer purchases for each day, then get mean of purchases for each customer for one month.
I’ve tried to use df.pivot_table()
but it takes mean of purchases for one day and then mean of purchases for one month.
Hope you can help me.
My Dataframe:
JavaScript
x
31
31
1
lst = [['Customer1', 100, '10.01.2020'], ['Customer1', 210, '10.01.2020'], ['Customer1', 122, '10.01.2020'],
2
['Customer1', 259, '10.13.2020'], ['Customer1', 160, '10.13.2020'], ['Customer1', 225, '10.13.2020'],
3
['Customer1', 310, '10.15.2020'], ['Customer1', 310, '10.15.2020'], ['Customer1', 310, '10.15.2020'],
4
['Customer2', 50, '10.21.2020'], ['Customer2', 70, '10.21.2020'], ['Customer2', 100, '10.21.2020'],
5
['Customer2', 120, '10.25.2020'], ['Customer2', 140, '10.25.2020'], ['Customer2', 155, '10.25.2020'],
6
['Customer2', 40, '10.27.2020'], ['Customer2', 30, '10.27.2020'], ['Customer2', 80, '10.27.2020']]
7
8
df = pd.DataFrame(lst, columns = ['Customer', 'Amount', 'Purchase date'])
9
10
11
12
Customer Amount Purchase date
13
0 Customer1 100 10.01.2020
14
1 Customer1 210 10.01.2020
15
2 Customer1 122 10.01.2020
16
3 Customer1 259 10.13.2020
17
4 Customer1 160 10.13.2020
18
5 Customer1 225 10.13.2020
19
6 Customer1 310 10.15.2020
20
7 Customer1 310 10.15.2020
21
8 Customer1 310 10.15.2020
22
9 Customer2 50 10.21.2020
23
10 Customer2 70 10.21.2020
24
11 Customer2 100 10.21.2020
25
12 Customer2 120 10.25.2020
26
13 Customer2 140 10.25.2020
27
14 Customer2 155 10.25.2020
28
15 Customer2 40 10.27.2020
29
16 Customer2 30 10.27.2020
30
17 Customer2 80 10.27.2020
31
I’ve tried to use:
JavaScript
1
2
1
df.pivot_table(index = 'Customer', columns = 'Purchase date', values = 'Amount', aggfunc = 'mean', margins = True)
2
Result:
JavaScript
1
5
1
Purchase date 10.01.2020 10.13.2020 10.15.2020 10.21.2020 10.25.2020 10.27.2020 Mean
2
Customer
3
Customer1 144 214 310 NaN NaN NaN 222
4
Customer2 NaN NaN NaN 73 138 50 87
5
Here is what I need to get:
JavaScript
1
5
1
Purchase date 10.01.2020 10.13.2020 10.15.2020 10.21.2020 10.25.2020 10.27.2020 Mean
2
Customer1 432 644 930 NaN NaN NaN 668
3
Customer2 NaN NaN NaN 220 415 150.0 261
4
5
Advertisement
Answer
JavaScript
1
4
1
df['Purchase date'] = pd.to_datetime(df['Purchase date'])
2
# create a month column
3
df['month'] = df['Purchase date'].apply(lambda x: x.month)
4
For getting the sum for each day use groupby
JavaScript
1
2
1
df.groupby(by=['Purchase date', 'Customer']).sum()
2
output
JavaScript
1
8
1
Purchase date Customer Amount month
2
2020-10-01 Customer1 432 30
3
2020-10-13 Customer1 644 30
4
2020-10-15 Customer1 930 30
5
2020-10-21 Customer2 220 30
6
2020-10-25 Customer2 415 30
7
2020-10-27 Customer2 150 30
8
For monthly average again use groupby
JavaScript
1
2
1
df.groupby(by=['month','Customer'] ).mean()
2
output
JavaScript
1
5
1
2
month Customer Amount
3
10 Customer1 222.888889
4
Customer2 87.222222
5
Looking at what you are trying to achieve can be done using pivot table
as
JavaScript
1
3
1
df = df.pivot_table(index = 'Customer', columns = 'Purchase date', values = 'Amount', aggfunc = 'sum', margins = True, margins_name='mean')
2
df['mean'] = df.apply(lambda x: round(x['mean']/(x.count()-1)), axis=1)
3
output
JavaScript
1
6
1
Purchase date 10.01.2020 10.13.2020 10.15.2020 10.21.2020 10.25.2020 10.27.2020 mean
2
Customer
3
Customer1 432.0 644.0 930.0 NaN NaN NaN 669.0
4
Customer2 NaN NaN NaN 220.0 415.0 150.0 262.0
5
mean 432.0 644.0 930.0 220.0 415.0 150.0 930.0
6