I have 5 years of daily volume data. I want to create a new column in pandas dataframe where it produces the values of YoY Growth for that particular day. For e.g. compares 2018-01-01 with 2017-01-01, and 2019-01-01 compares with 2018-01-01
I have 364 records for each year (except for the year 2020 there are 365 days)
How can I create the column YoY_Growth as below in pandas dataframe.
Advertisement
Answer
# It's more convenient to index the dataframe with the Date for our algorith, df = df.set_index("Date") is_leap_day = (df.index.month == 2) & (df.index.day == 29) # Leap day is an edge case, since you can't find Feb 29 of the previous year. # pandas handles this by shifting to Feb 28 of the previous year: # 2020-02-29 -> 2019-02-28 # 2020-02-28 -> 2019-02-28 # This creates a duplicate for Feb 28. So we need to handle leap day separately. volume_last_year = df.loc[~is_leap_day, "Volume"].shift(freq=pd.DateOffset(years=1)) # For non leap days df["YoY_Growth"] = df["Volume"] / volume_last_year - 1 # For leap days df.loc[is_leap_day, "YoY_Growth"] = ( df.loc[is_leap_day, "Volume"] / volume_last_year.shift(freq=pd.DateOffset(days=1)) - 1 )
Result (Volume
was randomly generated):
df.loc[["2019-01-01", "2019-02-28", "2020-01-01", "2020-02-28", "2020-02-29"], :] Volume YoY_Growth Date 2019-01-01 45 NaN 2019-02-28 23 NaN 2020-01-01 10 -0.777778 # = 10 / 45 - 1 2020-02-28 34 0.478261 # = 34 / 23 - 1 2020-02-29 76 2.304348 # = 76 / 23 - 1