I have a following dataframe df:
datestamp   device  country  users
2021-01-14  ipad    uk          10
2021-01-14  iphone  uk          15
2021-01-14  ipad    us          20
2021-01-14  iphone  us          40
2021-01-14  ipad    fr         100
2021-01-14  iphone  fr          50
2021-01-15  ipad    uk          20
2021-01-15  iphone  uk          10
2021-01-15  ipad    us          20
2021-01-15  iphone  us          60
2021-01-15  ipad    fr         300
2021-01-15  iphone  fr         500
And I want to know the percentage change of users from one datestamp to the next, for each device/country combination.
I tried:
df.groupby(['datestamp','country', 'device']).count().pct_change().reset_index()
But this ignores the grouping and simply computes the change row by row.
Desired result would look like this:
datestamp   device  country  users  change
2021-01-14  ipad    uk          10  NaN
2021-01-14  iphone  uk          15  NaN
2021-01-14  ipad    us          20  NaN
2021-01-14  iphone  us          40  NaN
2021-01-14  ipad    fr         100  NaN
2021-01-14  iphone  fr          50  NaN
2021-01-15  ipad    uk          20  100%
2021-01-15  iphone  uk          10  -33%
2021-01-15  ipad    us          20  0%
2021-01-15  iphone  us          60  50%
2021-01-15  ipad    fr         300  200%
2021-01-15  iphone  fr         500  900%
Answer
It looks like you want the percent change for each device/country combination, measured from one datestamp to the next. In that case, you don't want to group by datestamp. Instead, sort by datestamp and group by device and country:
df['change'] = (df.sort_values('datestamp')
                  .groupby(['device', 'country'])
                  .users
                  .pct_change()
                  .mul(100))
df
# datestamp device country users change
#0 2021-01-14 ipad uk 10 NaN
#1 2021-01-14 iphone uk 15 NaN
#2 2021-01-14 ipad us 20 NaN
#3 2021-01-14 iphone us 40 NaN
#4 2021-01-14 ipad fr 100 NaN
#5 2021-01-14 iphone fr 50 NaN
#6 2021-01-15 ipad uk 20 100.000000
#7 2021-01-15 iphone uk 10 -33.333333
#8 2021-01-15 ipad us 20 0.000000
#9 2021-01-15 iphone us 60 50.000000
#10 2021-01-15 ipad fr 300 200.000000
#11 2021-01-15 iphone fr 500 900.000000
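For reference, here is a minimal self-contained sketch of the whole approach, reconstructing the example DataFrame from the question (the column values are taken from the table above) and applying the sorted groupby pct_change:

```python
import pandas as pd

# Reconstruction of the example DataFrame from the question.
df = pd.DataFrame({
    'datestamp': ['2021-01-14'] * 6 + ['2021-01-15'] * 6,
    'device':    ['ipad', 'iphone'] * 6,
    'country':   ['uk', 'uk', 'us', 'us', 'fr', 'fr'] * 2,
    'users':     [10, 15, 20, 40, 100, 50, 20, 10, 20, 60, 300, 500],
})

# Sort by date so consecutive rows within each group are chronological,
# then compute the percent change of users within each device/country pair.
df['change'] = (
    df.sort_values('datestamp')
      .groupby(['device', 'country'])['users']
      .pct_change()
      .mul(100)
)
print(df)
```

Because `pct_change` is applied per group, the first row of each device/country pair gets `NaN`, and the second row compares against that pair's earlier date, regardless of row order in the original frame. Note the result aligns back to `df` by index, so the sort inside the chain does not reorder the assigned column.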