I am building a financial model in Python. To do so, I need to calculate the “tax carry forward”. This position consists of the “EBT” plus the “tax carry forward” of the previous period. In the first period, the “tax carry forward” is equal to “EBT”.
I am currently trying to solve this with the df.shift() function:
df2["carry forward"] = df2["EBT"] + df2["carry forward"].shift(periods=1, fill_value=0)
This, however, doesn’t work properly. While I get correct results for the first two iterations (2021, 2022), it doesn’t work anymore from the third iteration onwards.
EBT carry forward year 2021 -377893.353711 -377893.353711 2022 -282754.978037 -660648.331748 2023 -224512.990469 -507267.968506 2024 -167696.637680 -392209.628149
The carry forward, as shown in the table above, for the year 2023 is the sum of EBT 2023 and 2022, which is incorrect. I can’t quite figure out my mistake, because I am not sure how exactly Python is populating columns in a dataframe. To me, it looks like Python isn’t populating the dataframe row by row, but rather simultaneously. Is this the problem? If so, how do I work around it? Is there a better way to do the task than the df.shift() function?
Advertisement
Answer
I assume that you are actually looking for a cumulative sum of the column EBT
for the column of carry_forward
?
For the input of:
EBT carry_forward year 2021 -377893.353711 0 2022 -282754.978037 0 2023 -224512.990469 0 2024 -167696.637680 0
with:
df["carry_forward"] = df["EBT"].cumsum() df
You will get:
EBT carry_forward year 2021 -377893.353711 -3.778934e+05 2022 -282754.978037 -6.606483e+05 2023 -224512.990469 -8.851613e+05 2024 -167696.637680 -1.052858e+06