I have a data frame that is indexed from 1 to 100000 and I want to calculate the slope for every 12 steps. Is there any rolling window for that?
I did the following, but it is not working. The 'slope' column is created, but all of the values are NaN.
!pip install yfinance
import yfinance as yf
from scipy.stats import linregress
import pandas as pd
import numpy as np

# test data
df = yf.download('^GSPC',start='2009-11-26',end='2014-12-31',interval='1d')

# I want to get the slope for the Close every 7 days
def get_slope(array):
    y = np.array(array)
    x = np.arange(len(y))
    slope, intercept, r_value, p_value, std_err = linregress(x,y)
    return slope


# calculate slope of regression of last 7 days
days_back = 7

df['slope'] = df.groupby(df.index)['Close'].rolling(window=days_back,
               min_periods=days_back).apply(get_slope, raw=True).reset_index(0, drop=True)
Answer
- It's not necessary to use .groupby because there is only 1 record per day. Grouping by the index puts every row in its own group, so a window with min_periods=days_back never has enough rows and every result is NaN.
- Without .groupby, don't use .reset_index(0, drop=True) either, because it drops the date index. Once the index is dropped from the calculation, it no longer matches the index of df, so the data is added as NaN.
- df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True) creates a pandas.Series. When assigning a pandas.Series to a pandas.DataFrame as a new column, the indices must match.
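The index-matching rule in the last point can be seen on a small example. This is only a sketch: the toy frame, its dates, and the column names `aligned` and `misaligned` are made up for illustration and are not the S&P 500 data from the question.

```python
import pandas as pd

# Hypothetical toy frame with a date index (stand-in for the downloaded data)
dates = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

# The rolling result keeps df's date index, so values line up row by row
aligned = df["Close"].rolling(window=2, min_periods=2).mean()
df["aligned"] = aligned

# After reset_index(drop=True) the Series is labelled 0..4; none of those
# labels exist in df's date index, so the new column is filled with NaN
df["misaligned"] = aligned.reset_index(drop=True)

print(df)
```

Only the first row of `aligned` should be empty (the window is not full yet), while `misaligned` should be empty throughout.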
# add the slope column
df['slope'] = df['Close'].rolling(window=days_back, min_periods=days_back).apply(get_slope, raw=True)

# plot slope
df.plot(y='slope')
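To check the approach without downloading anything, here is a minimal sketch on synthetic data. The frame `demo` and its prices (rising by exactly 2.0 per day) are assumptions for the demo, and `np.polyfit` stands in for `linregress` so that only NumPy and pandas are needed. It compares the grouped version from the question with the plain rolling window.

```python
import numpy as np
import pandas as pd

def get_slope(values):
    # Least-squares slope of values against 0..n-1
    # (np.polyfit is used here in place of scipy's linregress)
    x = np.arange(len(values))
    return np.polyfit(x, values, 1)[0]

# Synthetic prices: a straight line with slope 2.0 (assumed for the demo)
dates = pd.date_range("2020-01-01", periods=30, freq="D")
demo = pd.DataFrame({"Close": 100.0 + 2.0 * np.arange(30)}, index=dates)

days_back = 7

# Grouping by the index gives one row per group, so no window reaches 7 rows
grouped = (
    demo.groupby(demo.index)["Close"]
    .rolling(window=days_back, min_periods=days_back)
    .apply(get_slope, raw=True)
)

# Plain rolling window over the column, keeping the date index
demo["slope"] = (
    demo["Close"]
    .rolling(window=days_back, min_periods=days_back)
    .apply(get_slope, raw=True)
)

print(grouped.isna().all())
print(demo["slope"].tail(3))
```

The first `days_back - 1` rows of `slope` stay NaN because the window is not yet full. For the 12-step case in the question, setting `days_back = 12` should be all that changes.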