
Is it possible to create an additional pct_change column for each column of the dataframe?

I tried to solve this on my own and have searched other topics for help, yet my problem remains. If anyone can help me or point me in the right direction, I would appreciate it.

I’m fairly new to Python and I’m trying to perform some transformations on a pandas dataframe. To summarize, I want to calculate percentage changes in sales data.

I’m aware of the pct_change method and below is what I’ve tried.

This is sample data that looks like my original dataframe:

  store_id  sales_value      day
0   1       54,141.00        2020-12-22
1   1       78,921.00        2020-12-23
2   6       5,894.00         2020-12-24
3   6       22,991.00        2020-12-23
4   6       25,802.00        2020-12-22
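For reference, this is roughly how the sample above can be rebuilt (in my real data, sales_value arrives as text with thousands separators, so I keep it as strings here):

import pandas as pd

# Rough reconstruction of the sample shown above; sales_value is kept as
# strings with thousands separators, matching how it comes in my real data.
df = pd.DataFrame({
    "store_id": [1, 1, 6, 6, 6],
    "sales_value": ["54,141.00", "78,921.00", "5,894.00", "22,991.00", "25,802.00"],
    "day": ["2020-12-22", "2020-12-23", "2020-12-24", "2020-12-23", "2020-12-22"],
})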

I made a function to calculate variations over rows. It looks like this:

def var_percent(df, n):
    return df.pct_change(n)

Then, using:

df['var_pct'] = var_percent(df['sales_value'],1)

It gave me something like the following:

  store_id  sales_value    day           var_pct
0   1       54,141.00      2020-12-22    nan
1   1       78,921.00      2020-12-23    0.46
4   6       25,802.00      2020-12-22   -0.67
3   6       22,991.00      2020-12-23   -0.11
2   6       5,894.00       2020-12-24   -0.74

That’s not really what I want. I need to see the changes for each store (store_id) separately, but this configuration calculates over consecutive rows regardless of which store they belong to.

Moving forward I tried this:

df.set_index(["day", "store_id"]).unstack(level=1)
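As far as I can tell, pivoting only the sales column gives the same wide shape, so I note it here as an alternative I looked at (it assumes each (day, store_id) pair appears only once):

# Alternative reshape I also looked at; assumes one row per (day, store_id) pair
pivoted = df.pivot(index="day", columns="store_id", values="sales_value")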

Finally, I get my actual dataframe, the one I’m stuck with, and it looks something like this:

store_id        1      6      15         22
day
2020-12-22  54141  25802  173399  36,200.00
2020-12-23  78921  22991  234885  32,762.00
2020-12-24      0   5894       0  10,956.00
2020-12-26      0      0       0       0.00  10980
2020-12-28      0      0       0       0.00      0

Now the dataframe has the shape I need, but I haven’t found a way to apply pct_change the way I want, which would be adding a percentage-change column for each existing column, something like this (these are dummy numbers, just a visual representation of how I’d like it to be):

store_id        1  1_pct      6  6_pct      15  15_pct
day
2020-12-22  54141      0  25802      0  173399       0
2020-12-23  78921     25  22991     -8  234885      20
2020-12-24      0      0   5894    -60       0       0
2020-12-26      0      0   0.00      0   10980    1000
2020-12-28      0      0   0.00      0       0       0

Is it even possible to do so?


Answer

You can use the approach below.

sales_value should be converted to a numeric type and day should be converted to datetime, then the data should be sorted. If all of this has already been done, you can skip this block:

df['sales_value']=pd.to_numeric(df['sales_value'].str.replace(",",''))
df['day'] = pd.to_datetime(df['day'])
df = df.sort_values(['store_id','day'])

Calculate pct_change for each group and then unstack

out = (df.assign(pct=df.groupby("store_id")["sales_value"].pct_change().mul(100).round())
         .set_index(["day", "store_id"])
         .unstack(level=1)
         .fillna(0)
         .sort_index(level=1, axis=1))

out.columns = out.columns.map('{0[1]} {0[0]}'.format)

print(out)
            1 pct  1 sales_value  6 pct  6 sales_value
day
2020-12-22    0.0        54141.0    0.0        25802.0
2020-12-23   46.0        78921.0  -11.0        22991.0
2020-12-24    0.0            0.0  -74.0         5894.0
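If you want column names closer to the layout in the question (1, 1_pct, and so on), one option (just a sketch) is to flatten the MultiIndex yourself instead of the .map(...) line above:

# Sketch of an alternative flattening that yields "1" / "1_pct" style names.
# Run this in place of the .map(...) line, while out.columns is still a
# MultiIndex of (column_name, store_id) tuples.
out.columns = [
    f"{store}_pct" if col == "pct" else str(store)
    for col, store in out.columns
]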