Using pct_change on grouped multiindex dataframe with a datestamp

Tags: ,



I have a following dataframe df:

datestamp       device      country     users
2021-01-14      ipad        uk          10
2021-01-14      iphone      uk          15
2021-01-14      ipad        us          20
2021-01-14      iphone      us          40
2021-01-14      ipad        fr          100
2021-01-14      iphone      fr          50
2021-01-15      ipad        uk          20
2021-01-15      iphone      uk          10
2021-01-15      ipad        us          20
2021-01-15      iphone      us          60
2021-01-15      ipad        fr          300
2021-01-15      iphone      fr          500

And I want to know percentage change of users per datestamp device country columns.

I tried:

df.groupby(['datestamp','country', 'device']).count().pct_change().reset_index()

But it ignores the grouping and checks it simply row by row.

Desired result would look like this:

datestamp       device      country     users       change
2021-01-14      ipad        uk          10          np.nan
2021-01-14      iphone      uk          15          np.nan
2021-01-14      ipad        us          20          np.nan
2021-01-14      iphone      us          40          np.nan
2021-01-14      ipad        fr          100         np.nan
2021-01-14      iphone      fr          50          np.nan
2021-01-15      ipad        uk          20          100%
2021-01-15      iphone      uk          10          -33%
2021-01-15      ipad        us          20          0%
2021-01-15      iphone      us          60          50%
2021-01-15      ipad        fr          300         300%
2021-01-15      iphone      fr          500         1000%

Answer

It looks like you want the percent change for each device / country combination. And the change reflects year over year. In which case, you don’t want to group by datestamp. Instead, you should sort by datestamp and groupby device and country:

df['change'] = df.sort_values('datestamp') 
                 .groupby(['device', 'country']) 
                 .users 
                 .pct_change() 
                 .mul(100)

df
#     datestamp  device country  users      change
#0   2021-01-14    ipad      uk     10         NaN
#1   2021-01-14  iphone      uk     15         NaN
#2   2021-01-14    ipad      us     20         NaN
#3   2021-01-14  iphone      us     40         NaN
#4   2021-01-14    ipad      fr    100         NaN
#5   2021-01-14  iphone      fr     50         NaN
#6   2021-01-15    ipad      uk     20  100.000000
#7   2021-01-15  iphone      uk     10  -33.333333
#8   2021-01-15    ipad      us     20    0.000000
#9   2021-01-15  iphone      us     60   50.000000
#10  2021-01-15    ipad      fr    300  200.000000
#11  2021-01-15  iphone      fr    500  900.000000


Source: stackoverflow