I have a data frame that is indexed from 1 to 100000 and I want to calculate the slope for every 12 steps. Is there any rolling window for that?
I tried the following, but it is not working. The 'slope' column is created, but all of its values are NaN.
!pip install yfinance
import yfinance as yf
from scipy.stats import linregress
import pandas as pd
import numpy as np
# test data
df = yf.download('^GSPC',start='2009-11-26',end='2014-12-31',interval='1d')
# I want to get the slope for the Close every 7 days
def get_slope(array):
    y = np.array(array)
    x = np.arange(len(y))
    slope, intercept, r_value, p_value, std_err = linregress(x,y)
    return slope
# calculate slope of regression of last 7 days
days_back = 7
df['slope'] = df.groupby(df.index)['Close'].rolling(window=days_back,
                                                    min_periods=days_back).apply(get_slope, raw=True).reset_index(0, drop=True)
Answer
- It's not necessary to use .groupby because there is only 1 record per day. Grouping by the index puts every row in its own group, so no window ever reaches min_periods and every value comes out as NaN.
- Once .groupby is removed, don't use .reset_index(0, drop=True), because it drops the date index. When you drop the index from the calculation, it no longer matches the index of df, so the data is added as NaN.
- df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True) creates a pandas.Series. When assigning a pandas.Series to a pandas.DataFrame as a new column, the indices must match.
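Both effects can be reproduced on a tiny frame without downloading anything. The following is only a sketch: the names toy, no_dates and grouped and the three-row data are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame with a date index (not the S&P 500 data)
dates = pd.date_range("2020-01-01", periods=3, freq="D")
toy = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=dates)

# Same values, but the date index has been replaced by 0, 1, 2
no_dates = toy["Close"].reset_index(drop=True)

# Assignment aligns on index labels, not on position
toy["mismatched"] = no_dates        # no label matches: every row becomes NaN
toy["matched"] = toy["Close"] * 2   # labels match: values line up row by row

# Grouping by a unique index gives one-row groups, so a window with
# min_periods=2 is never filled and every result is NaN
grouped = toy.groupby(toy.index)["Close"].rolling(window=2, min_periods=2).mean()

print(toy)
print(grouped)
```

Printing toy shows NaN throughout the mismatched column, while the matched column holds the doubled prices.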
# add the slope column
df['slope'] = df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True)
# plot slope
df.plot(y='slope')
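To check the approach without a network connection, here is a minimal sketch on synthetic data. It makes two assumptions that are not in the original post: the frame demo is a made-up straight line rising by 2 per row, and np.polyfit is swapped in for linregress so that only NumPy and pandas are needed. On such data every complete window should report a slope of about 2.

```python
import numpy as np
import pandas as pd

def get_slope(window):
    """Least-squares slope of one window's values against 0, 1, 2, ..."""
    x = np.arange(len(window))
    # polyfit with degree 1 returns [slope, intercept]
    return np.polyfit(x, window, 1)[0]

# Synthetic stand-in for the downloaded prices (assumed data, rises 2 per row)
dates = pd.date_range("2021-01-01", periods=20, freq="D")
demo = pd.DataFrame({"Close": 2.0 * np.arange(20) + 5.0}, index=dates)

days_back = 7
# No groupby and no reset_index: the result keeps the date index of demo
demo["slope"] = (
    demo["Close"]
    .rolling(window=days_back, min_periods=days_back)
    .apply(get_slope, raw=True)
)

print(demo.head(10))
```

The first days_back - 1 rows stay NaN because their windows are incomplete. Note that window=7 counts rows, so on daily market data it spans 7 trading days, not 7 calendar days.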
