I have the following dataframe df:
datestamp   device  country  users
2021-01-14  ipad    uk       10
2021-01-14  iphone  uk       15
2021-01-14  ipad    us       20
2021-01-14  iphone  us       40
2021-01-14  ipad    fr       100
2021-01-14  iphone  fr       50
2021-01-15  ipad    uk       20
2021-01-15  iphone  uk       10
2021-01-15  ipad    us       20
2021-01-15  iphone  us       60
2021-01-15  ipad    fr       300
2021-01-15  iphone  fr       500
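For reference, a minimal sketch that reconstructs this frame in pandas (values copied from the table above):

```python
import pandas as pd

# Rebuild the example frame: two dates, devices alternating per row,
# two rows per country per date
df = pd.DataFrame({
    'datestamp': ['2021-01-14'] * 6 + ['2021-01-15'] * 6,
    'device': ['ipad', 'iphone'] * 6,
    'country': ['uk', 'uk', 'us', 'us', 'fr', 'fr'] * 2,
    'users': [10, 15, 20, 40, 100, 50, 20, 10, 20, 60, 300, 500],
})
```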
And I want to know the percentage change of users per datestamp / device / country.
I tried:
df.groupby(['datestamp','country', 'device']).count().pct_change().reset_index()
But it ignores the grouping and simply computes the change row by row.
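To see why, a small sketch (using a cut-down version of the data) shows that grouping by all three key columns leaves exactly one row per group, so count() returns 1 everywhere and pct_change() then just walks the flattened result row by row:

```python
import pandas as pd

# Cut-down version of the data above
df = pd.DataFrame({
    'datestamp': ['2021-01-14', '2021-01-14', '2021-01-15', '2021-01-15'],
    'device': ['ipad', 'iphone', 'ipad', 'iphone'],
    'country': ['uk', 'uk', 'uk', 'uk'],
    'users': [10, 15, 20, 10],
})

# Every (datestamp, country, device) combination is unique, so each
# group holds exactly one row and count() is 1 for all of them
counted = df.groupby(['datestamp', 'country', 'device']).count()
print(counted['users'].tolist())  # [1, 1, 1, 1]
```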
Desired result would look like this:

datestamp   device  country  users  change
2021-01-14  ipad    uk       10     np.nan
2021-01-14  iphone  uk       15     np.nan
2021-01-14  ipad    us       20     np.nan
2021-01-14  iphone  us       40     np.nan
2021-01-14  ipad    fr       100    np.nan
2021-01-14  iphone  fr       50     np.nan
2021-01-15  ipad    uk       20     100%
2021-01-15  iphone  uk       10     -33%
2021-01-15  ipad    us       20     0%
2021-01-15  iphone  us       60     50%
2021-01-15  ipad    fr       300    200%
2021-01-15  iphone  fr       500    900%
Answer
It looks like you want the percent change for each device / country combination, from one datestamp to the next. In that case, you don't want to group by datestamp. Instead, you should sort by datestamp and group by device and country:
df['change'] = (df.sort_values('datestamp')
                  .groupby(['device', 'country'])
                  .users
                  .pct_change()
                  .mul(100))

df
#      datestamp  device country  users      change
# 0   2021-01-14    ipad      uk     10         NaN
# 1   2021-01-14  iphone      uk     15         NaN
# 2   2021-01-14    ipad      us     20         NaN
# 3   2021-01-14  iphone      us     40         NaN
# 4   2021-01-14    ipad      fr    100         NaN
# 5   2021-01-14  iphone      fr     50         NaN
# 6   2021-01-15    ipad      uk     20  100.000000
# 7   2021-01-15  iphone      uk     10  -33.333333
# 8   2021-01-15    ipad      us     20    0.000000
# 9   2021-01-15  iphone      us     60   50.000000
# 10  2021-01-15    ipad      fr    300  200.000000
# 11  2021-01-15  iphone      fr    500  900.000000
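If you also want the %-suffixed strings shown in the question, one possible follow-up (this formatting step is an assumption, not part of the original answer) is to render the non-NaN values with map:

```python
import numpy as np
import pandas as pd

# Two sample rows of the result above; 'change' is pct_change().mul(100)
df = pd.DataFrame({
    'users': [100, 300],
    'change': [np.nan, 200.0],
})

# Format non-NaN changes as integer-percent strings; NaN is left as-is
df['change'] = df['change'].map(
    lambda x: f'{x:.0f}%' if pd.notna(x) else x
)
print(df['change'].tolist())  # [nan, '200%']
```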