I tried to solve this on my own and have searched other topics for help, yet, my problem remains. If anyone can help me or point me to the right direction I would appreciate
I’m fairly new to python and I’m trying to perform some changes on a pandas dataframe. To summarize, I want to verify percental changes over sales data.
I’m aware of the pct_change method and below is what I’ve tried.
This is a sample data that looks like my original dataframe:
store_id sales_value day 0 1 54,141.00 2020-12-22 1 1 78,921.00 2020-12-23 2 6 5,894.00 2020-12-24 3 6 22,991.00 2020-12-23 4 6 25,802.00 2020-12-22
I made a function to calculate variations over rows. it looks like this:
def var_percent(df, n): return df.pct_change(n)
Then, using:
df['var_pct'] = var_percent(df['sales_value'],1)
it gave me something like the following:
store_id sales_value day var_pct 0 1 54,141.00 2020-12-22 nan 1 1 78,921.00 2020-12-23 0.46 4 6 25,802.00 2020-12-22 -0.67 3 6 22,991.00 2020-12-23 -0.11 2 6 5,894.00 2020-12-24 -0.74
That’s not really what I want. I need to see changes for each store alone (store_id), where this type of configuration calculates over rows, no matter from which store it is.
Moving forward I tried this:
df.set_index(["day", "store_id"]).unstack(level=1)
Finally I get my actual dataframe, the one I’m stuck with and it’s kind of like this:
sotore_id 1 6 15 22 day 2020-12-22 54141 25802 173399 36,200.00 2020-12-23 78921 22991 234885 32,762.00 2020-12-24 0 5894 0 10,956.00 2020-12-26 0 0 0 0.00 10980 2020-12-28 0 0 0 0.00 0
Now the dataframe is how I need it to be, but I haven’t found a way to implement pct_change the way I want, which would be something like this, adding a percental change column for each existing column (these are dummy numbers, it’s just a visual representation of how I’d like it to be):
sotore_id 1 1_pct 6 6_pct 15 15_pct day 2020-12-22 54141 0 25802 0 173399 0 2020-12-23 78921 25 22991 -8 234885 20 2020-12-24 0 0 5894 -60 0 0 2020-12-26 0 0 0.00 0 10980 1000 2020-12-28 0 0 0.00 0 0 0
Is is even possible to do so?
Advertisement
Answer
You can use the below:
sales value should be converted to a numeric and date should be changed to datetime, then the data should be sorted. If all of these are already done , you can skip this block:
df['sales_value']=pd.to_numeric(df['sales_value'].str.replace(",",'')) df['day'] = pd.to_datetime(df['day']) df = df.sort_values(['store_id','day'])
Calculate pct_change
for each group and then unstack
out = (df.assign(pct=df.groupby("store_id")['sales_value'].pct_change() .mul(100).round()).set_index(["day", "store_id"]) .unstack(level=1).fillna(0).sort_index(level=1,axis=1)) out.columns = out.columns.map('{0[1]} {0[0]}'.format)
print(out)
day | 1 pct | 1 sales_value | 6 pct | 6 sales_value |
---|---|---|---|---|
2020-12-22 | 0.0 | 54141.0 | 0.0 | 25802.0 |
2020-12-23 | 46.0 | 78921.0 | -11.0 | 22991.0 |
2020-12-24 | 0.0 | 0.0 | -74.0 | 5894.0 |