I have the following dataframe df:
    datestamp   device  country  users
    2021-01-14  ipad    uk       10
    2021-01-14  iphone  uk       15
    2021-01-14  ipad    us       20
    2021-01-14  iphone  us       40
    2021-01-14  ipad    fr       100
    2021-01-14  iphone  fr       50
    2021-01-15  ipad    uk       20
    2021-01-15  iphone  uk       10
    2021-01-15  ipad    us       20
    2021-01-15  iphone  us       60
    2021-01-15  ipad    fr       300
    2021-01-15  iphone  fr       500
I want to know the percentage change of users per the datestamp, device, and country columns.
I tried:
    df.groupby(['datestamp', 'country', 'device']).count().pct_change().reset_index()
But this ignores the grouping and simply computes the change row by row.
Desired result would look like this:
    datestamp   device  country  users  change
    2021-01-14  ipad    uk       10     np.nan
    2021-01-14  iphone  uk       15     np.nan
    2021-01-14  ipad    us       20     np.nan
    2021-01-14  iphone  us       40     np.nan
    2021-01-14  ipad    fr       100    np.nan
    2021-01-14  iphone  fr       50     np.nan
    2021-01-15  ipad    uk       20     100%
    2021-01-15  iphone  uk       10     -33%
    2021-01-15  ipad    us       20     0%
    2021-01-15  iphone  us       60     50%
    2021-01-15  ipad    fr       300    300%
    2021-01-15  iphone  fr       500    1000%
Answer
It looks like you want the percent change for each device / country combination, measured from one datestamp to the next (day over day in this data). In that case, you don’t want to group by datestamp. Instead, you should sort by datestamp and group by device and country:
    # Parentheses let the method chain span several lines.
    df['change'] = (
        df.sort_values('datestamp')           # put each series in date order
          .groupby(['device', 'country'])     # one series per device/country pair
          ['users']
          .pct_change()                       # fractional change vs. the previous row in the group
          .mul(100)                           # convert the fraction to a percentage
    )

    df
    #      datestamp  device country  users      change
    # 0   2021-01-14    ipad      uk     10         NaN
    # 1   2021-01-14  iphone      uk     15         NaN
    # 2   2021-01-14    ipad      us     20         NaN
    # 3   2021-01-14  iphone      us     40         NaN
    # 4   2021-01-14    ipad      fr    100         NaN
    # 5   2021-01-14  iphone      fr     50         NaN
    # 6   2021-01-15    ipad      uk     20  100.000000
    # 7   2021-01-15  iphone      uk     10  -33.333333
    # 8   2021-01-15    ipad      us     20    0.000000
    # 9   2021-01-15  iphone      us     60   50.000000
    # 10  2021-01-15    ipad      fr    300  200.000000
    # 11  2021-01-15  iphone      fr    500  900.000000
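If you want to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and applies the same sort / groupby / pct_change chain. The `change_label` column and the `pd.to_datetime` conversion are my own additions for illustration, not something the question asked for: the first formats the result as whole-percent strings like the desired output, the second just makes sure the sort is chronological.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "datestamp": ["2021-01-14"] * 6 + ["2021-01-15"] * 6,
    "device": ["ipad", "iphone"] * 6,
    "country": ["uk", "uk", "us", "us", "fr", "fr"] * 2,
    "users": [10, 15, 20, 40, 100, 50, 20, 10, 20, 60, 300, 500],
})

# Assumption: parse the strings as dates so sorting is chronological.
df["datestamp"] = pd.to_datetime(df["datestamp"])

# Percent change within each (device, country) series, in date order.
# The result keeps the original index, so it aligns on assignment.
df["change"] = (
    df.sort_values("datestamp")
      .groupby(["device", "country"])["users"]
      .pct_change()
      .mul(100)
)

# Hypothetical display column: whole-percent strings, empty where there
# is no previous row to compare against.
df["change_label"] = df["change"].map(
    lambda v: f"{v:.0f}%" if pd.notna(v) else ""
)

print(df)
```

Two things worth noting. The original attempt grouped by all three columns, so every group held exactly one row; `count()` then collapsed each group to a single aggregated row, and `pct_change()` had nothing left to do but walk those rows in order. Also, `pct_change()` computes (new - old) / old, so the two fr rows come out as 200 and 900 rather than the 300% and 1000% written in the desired output.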