
Rolling window calculation is added to the dataframe as a column of NaN

I have a data frame that is indexed from 1 to 100000 and I want to calculate the slope for every 12 steps. Is there any rolling window for that?

I did the following, but it is not working. The 'slope' column is created, but all of the values are NaN.

!pip install yfinance
import yfinance as yf 
from scipy.stats import linregress
import pandas as pd
import numpy as np

# test data
df = yf.download('^GSPC',start='2009-11-26',end='2014-12-31',interval='1d')

# I want to get the slope for the Close every 7 days 
def get_slope(array):
    y = np.array(array)
    x = np.arange(len(y))
    slope, intercept, r_value, p_value, std_err = linregress(x,y)
    return slope


# calculate slope of regression of last 7 days
days_back = 7

df['slope'] = df.groupby(df.index)['Close'].rolling(window=days_back,
                               min_periods=days_back).apply(get_slope, raw=True).reset_index(0, drop=True)


Answer

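  1. It’s not necessary to use .groupby, because there is only one record per day. Worse, grouping by df.index puts every row in its own one-row group, so a window of days_back rows can never fill, and with min_periods=days_back every result is NaN.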
  2. Don’t use .reset_index(0, drop=True). Without the groupby, the rolling result already carries the same date index as df; resetting it drops the dates and replaces them with a RangeIndex (0, 1, 2, …). That index no longer matches the index of df, so every value is added as NaN.
    • df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True) creates a pandas.Series indexed by the same dates as df. When a pandas.Series is assigned to a pandas.DataFrame as a new column, pandas aligns it on the index, so the indices must match; any row whose label is not found gets NaN.
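Both failure modes can be reproduced without downloading anything. This is a sketch on synthetic data: a linear 'Close' series that rises by 2 per day stands in for the yfinance download (an assumption), and get_slope uses np.polyfit instead of scipy's linregress so only NumPy and pandas are needed; both return the least-squares slope.

```python
import numpy as np
import pandas as pd

def get_slope(array):
    # least-squares slope of the window against x = 0, 1, ..., n-1
    x = np.arange(len(array))
    return np.polyfit(x, array, 1)[0]

days_back = 7

# synthetic stand-in for the yfinance download: 30 daily closes rising by 2 per day
idx = pd.date_range("2020-01-01", periods=30, freq="D")
df = pd.DataFrame({"Close": 100.0 + 2.0 * np.arange(30)}, index=idx)

# 1) groupby on the unique date index: each group holds one row,
#    so min_periods=days_back is never reached and every value is NaN
grouped = (df.groupby(df.index)["Close"]
             .rolling(window=days_back, min_periods=days_back)
             .apply(get_slope, raw=True))
print(grouped.isna().all())  # True

# 2) no groupby, but reset_index replaces the dates with 0..29,
#    so nothing aligns with df's DatetimeIndex on assignment
plain = (df["Close"]
         .rolling(window=days_back, min_periods=days_back)
         .apply(get_slope, raw=True))
df["bad_slope"] = plain.reset_index(0, drop=True)
print(df["bad_slope"].isna().all())  # True

# fix: assign the rolling result directly, keeping the date index
df["slope"] = plain
print(df["slope"].iloc[:days_back - 1].isna().all())  # True (incomplete windows)
print(np.allclose(df["slope"].iloc[days_back - 1:], 2.0))  # True
```

The first days_back - 1 rows stay NaN in the fixed version as well; that is expected, because min_periods=days_back requires a full window.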
# add the slope column
df['slope'] = df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True)

# plot slope
df.plot(y='slope')
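With 100000 rows, calling a Python function once per window is comparatively slow. Below is a vectorized sketch, assuming NumPy 1.20 or later (for sliding_window_view). It swaps linregress for the closed-form least-squares slope, which for x = 0, …, n−1 is Σ(x − x̄)·y / Σ(x − x̄)², and evaluates it for all windows with one matrix product. The helper name rolling_slope, the random-walk data, and the 12-step trailing window are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def rolling_slope(values, window):
    """Least-squares slope of each trailing window against x = 0..window-1.

    Output has the same length as `values`; the first window-1 entries are
    NaN, like rolling(window=window, min_periods=window).
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(window, dtype=float)
    xc = x - x.mean()
    # slope = sum(xc * y) / sum(xc**2); y needs no centering because sum(xc) == 0
    weights = xc / np.sum(xc ** 2)
    out = np.full(y.shape, np.nan)
    if len(y) >= window:
        # one row per window, shape (len(y) - window + 1, window)
        out[window - 1:] = sliding_window_view(y, window) @ weights
    return out

# hypothetical data shaped like the question: index 1..100000, 12-step windows
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.normal(size=100_000).cumsum()},
                  index=pd.RangeIndex(1, 100_001))
steps = 12
df["slope"] = rolling_slope(df["value"].to_numpy(), steps)

# spot-check against the rolling/apply approach on the first 2,000 rows
check = (df["value"].iloc[:2000]
         .rolling(window=steps, min_periods=steps)
         .apply(lambda a: np.polyfit(np.arange(len(a)), a, 1)[0], raw=True))
print(np.allclose(check, df["slope"].iloc[:2000], equal_nan=True))  # True
```

A NaN anywhere inside a window makes that window's slope NaN, which is also what rolling(..., min_periods=window).apply returns for such a window.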

