I have a data frame that is indexed from 1 to 100000 and I want to calculate the slope for every 12 steps. Is there any rolling window for that?
I did the following, but it is not working. The 'slope' column is created, but all of the values are NaN.
    !pip install yfinance

    import yfinance as yf
    from scipy.stats import linregress
    import pandas as pd
    import numpy as np

    # test data
    df = yf.download('^GSPC', start='2009-11-26', end='2014-12-31', interval='1d')

    # I want to get the slope for the Close every 7 days
    def get_slope(array):
        y = np.array(array)
        x = np.arange(len(y))
        slope, intercept, r_value, p_value, std_err = linregress(x, y)
        return slope

    # calculate slope of regression of last 7 days
    days_back = 7
    df['slope'] = df.groupby(df.index)['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True).reset_index(0, drop=True)
Answer
- It’s not necessary to use `.groupby` because there is only 1 record per day. Grouping by the index also puts each row in its own group, so a window with `min_periods=days_back` never has enough rows and every result is `NaN`.
- Don’t use `.reset_index(0, drop=True)` because this is dropping the date index. When you drop the index from the calculation, it no longer matches the index of `df`, so the data is added as `NaN`.
- `df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True)` creates a `pandas.Series`. When assigning a `pandas.Series` to a `pandas.DataFrame` as a new column, the indices must match.
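To see these effects without downloading anything, here is a minimal sketch on a made-up five-row frame. The frame `toy`, its values, the 3-row window, and the column names `aligned`, `misaligned`, and `by_position` are all assumptions for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical toy data: one row per day, like the daily price data above
dates = pd.date_range("2020-01-01", periods=5, freq="D")
toy = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

# Grouping by a unique index gives one-row groups, so a 3-row window
# with min_periods=3 never has enough rows and every result is NaN
per_group = toy.groupby(toy.index)["Close"].rolling(window=3, min_periods=3).mean()

# A Series that keeps the date index is assigned by matching labels
doubled = toy["Close"] * 2
toy["aligned"] = doubled

# Dropping the date index leaves integer labels 0..4, which match
# none of the dates in toy.index, so the new column is all NaN
toy["misaligned"] = doubled.reset_index(drop=True)

# A plain NumPy array has no index, so it is assigned by position
toy["by_position"] = doubled.to_numpy()

print(per_group.isna().all())
print(toy)
```

If a result has lost its index but is known to be in the same row order as the frame, assigning the underlying array (as in `by_position`) is one way around the alignment step.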
    # add the slope column
    df['slope'] = df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True)

    # plot slope
    df.plot(y='slope')
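As a quick sanity check of the rolling approach, the sketch below runs it on synthetic data whose slope is known in advance. It is only an illustration under stated assumptions: the frame `demo` and its straight-line `Close` values are made up, and `np.polyfit` is swapped in for `linregress` so the example needs only NumPy and pandas.

```python
import numpy as np
import pandas as pd

def get_slope(window: np.ndarray) -> float:
    """Least-squares slope of the window values against 0, 1, ..., n-1."""
    x = np.arange(len(window))
    # Degree-1 fit returns [slope, intercept]
    slope, _intercept = np.polyfit(x, window, 1)
    return float(slope)

days_back = 7

# Synthetic stand-in for the Close column: a straight line with slope 3
dates = pd.date_range("2020-01-01", periods=20, freq="D")
demo = pd.DataFrame({"Close": 3.0 * np.arange(20) + 1.0}, index=dates)

# Rolling directly on the column keeps the date index, so the result
# lines up with demo when it is assigned as a new column
demo["slope"] = (
    demo["Close"]
    .rolling(window=days_back, min_periods=days_back)
    .apply(get_slope, raw=True)
)

print(demo)
```

The first `days_back - 1` rows stay `NaN` because their windows are incomplete; every later row should come out at (approximately) 3, the slope of the synthetic line.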