Say I have a pd.DataFrame that I differenced with .diff(5), which works like "new number at idx i = (number at idx i) - (number at idx i-5)".
import pandas as pd
import random
example_df = pd.DataFrame(data=random.sample(range(1, 100), 20), columns=["number"])
df_diff = example_df.diff(5)
Now I want to undo this operation using the first 5 entries of example_df, and using df_diff.
If I had done .diff(1), I would simply use .cumsum(). But how can I make it sum only every 5th value?
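For reference, the lag-1 case mentioned above can be sketched like this. The Series `s` and its values are made up for illustration; the only extra step is seeding the first row, which .diff(1) leaves as NaN, with the known first value.

```python
import pandas as pd

# Hypothetical toy data, not taken from the question
s = pd.Series([3, 8, 6, 10])

d = s.diff(1)            # first row is NaN, the rest are step-to-step changes
d.iloc[0] = s.iloc[0]    # seed the NaN with the known first value
restored = d.cumsum()    # running total rebuilds the original values
```

The result comes back as floats because of the intermediate NaN.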
My desired output is a df with the following values:
example_df[0]
example_df[1]
example_df[2]
example_df[3]
example_df[4]
df_diff[5] + example_df[0]
df_diff[6] + example_df[1]
df_diff[7] + example_df[2]
df_diff[8] + example_df[3]
Answer
You could shift the original column down by 5 rows, add it to the differenced column, and fill the resulting NaNs with the original values:
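# Original values moved down 5 rows, so row i holds the original value from row i-5
df_diff["shifted"] = example_df["number"].shift(5)
# Difference plus the value 5 rows earlier gives back the original (NaN in the first 5 rows)
df_diff["undone"] = df_diff["number"] + df_diff["shifted"]
# The first 5 rows have no predecessor, so take them from the original
df_diff["undone"] = df_diff["undone"].fillna(example_df["number"])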
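Note that the shift approach reads all of example_df. If only the first 5 original values are available, as the question describes, one possible alternative is a grouped cumulative sum: rows i, i+5, i+10, ... form an independent chain, so a cumsum within each chain undoes .diff(5). This is a minimal sketch, not part of the original answer; the names `seeded`, `chain` and `restored` are made up here, and it assumes a default 0..n-1 index.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # only to make the sketch reproducible
example_df = pd.DataFrame(data=random.sample(range(1, 100), 20), columns=["number"])
df_diff = example_df.diff(5)

# Seed the 5 leading NaNs with the known first 5 original values
seeded = df_diff["number"].fillna(example_df["number"].iloc[:5])

# Position modulo 5 labels the chain each row belongs to (0, 1, 2, 3, 4, 0, 1, ...)
chain = np.arange(len(seeded)) % 5

# A cumulative sum inside each chain rebuilds the original values
restored = seeded.groupby(chain).cumsum()
```

Here `restored` holds floats because of the intermediate NaNs; `restored.astype(int)` converts back if the original data was integer.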