In a nutshell: I need to calculate the Hurst Exponent (HE) across a rolling window inside a pandas dataframe and assign the values to its own column.
The HE function I use was lifted from here as it seemed more robust. For convenience it’s posted below:
def HurstEXP( ts = [ None, ] ):
# TESTED: HurstEXP() Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
""" __doc__
USAGE:
HurstEXP( ts = [ None, ] )
Returns the Hurst Exponent of the time series vector ts[]
PARAMETERS:
ts[,] a time-series, with 100+ elements
( or [ None, ] that produces a demo run )
RETURNS:
float - a Hurst Exponent approximation,
as a real value
or
an explanatory string on an empty call
THROWS:
n/a
EXAMPLE:
>>> HurstEXP() # actual numbers will vary, as per np.random.randn() generator used
HurstEXP( Geometric Browian Motion ): 0.49447454
HurstEXP( Mean-Reverting Series ): -0.00016013
HurstEXP( Trending Series ): 0.95748937
'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'
>>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
REF.s:
>>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
"""
#---------------------------------------------------------------------------------------------------------------------------<self-reflective>
if ( ts[0] == None ): # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:
gbm = np.log( 1000 + np.cumsum( np.random.randn( 100000 ) ) ) # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log( 1000 + rand + ... )]
mr = np.log( 1000 + np.random.randn( 100000 ) ) # a Mean-Reverting Series [log(1000 + rand), log(1000 + rand ), log(1000 + rand ),... log( 1000 + rand )]
tr = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) ) # a Trending Series [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]
# Output the Hurst Exponent for each of the above SYNTH series
print ( "HurstEXP( Geometric Browian Motion ): {0: > 12.8f}".format( HurstEXP( gbm ) ) )
print ( "HurstEXP( Mean-Reverting Series ): {0: > 12.8f}".format( HurstEXP( mr ) ) )
print ( "HurstEXP( Trending Series ): {0: > 12.8f}".format( HurstEXP( tr ) ) )
return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
""" # FIX:
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:Python27.anacondalibsite-packagesnumpylibpolynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
pass; too_short_list = 101 - len( ts ) # MUST HAVE 101+ ELEMENTS
if ( 0 < too_short_list ): # IF NOT:
ts = too_short_list * ts[:1] + ts # PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
#---------------------------------------------------------------------------------------------------------------------------
lags = range( 2, 100 ) # Create the range of lag values
tau = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ] # Calculate the array of the variances of the lagged differences
#oly = np.polyfit( np.log( lags ), np.log( tau ), 1 ) # Use a linear fit to estimate the Hurst Exponent
#eturn ( 2.0 * poly[0] ) # Return the Hurst exponent from the polyfit output
""" ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
C:Python27.anacondalibsite-packagesnumpycore_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
C:Python27.anacondalibsite-packagesnumpycore_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
C:Python27.anacondalibsite-packagesnumpycore_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
ret, rcount, out=ret, casting='unsafe', subok=False)
QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] ) # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:Python27.anacondalibsite-packagesnumpylibpolynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471879418359915
|
|
|# DATA:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
memmap([ 1763.31005859, 1765.01000977, 1765.44995117, 1764.80004883,
1765.83996582, 1768.91003418, 1771.04003906, 1769.43994141,
1771.4699707 , 1771.61999512, 1774.76000977, 1769.55004883,
1773.4699707 , 1773.32995605, 1770.08996582, 1770.20996094,
1768.34997559, 1768.02001953, 1767.59997559, 1767.23999023,
1768.41003418, 1769.06994629, 1769.56994629, 1770.7800293 ,
1770.56994629, 1769.7800293 , 1769.90002441, 1770.44995117,
1770.9699707 , 1771.04003906, 1771.16003418, 1769.81005859,
1768.76000977, 1769.39001465, 1773.23999023, 1771.91003418,
1766.92004395, 1765.56994629, 1762.65002441, 1760.18005371,
1755. , 1756.67004395, 1753.48999023, 1753.7199707 ,
1751.92004395, 1745.44995117, 1745.44995117, 1744.54003906,
1744.54003906, 1744.84997559, 1744.84997559, 1744.34997559,
1744.34997559, 1743.75 , 1743.75 , 1745.23999023,
1745.23999023, 1745.15002441, 1745.31005859, 1745.47998047,
1745.47998047, 1749.06994629, 1749.06994629, 1748.29003906,
1748.29003906, 1747.42004395, 1747.42004395, 1746.98999023,
1747.61999512, 1748.79003906, 1748.79003906, 1748.38000488,
1748.38000488, 1744.81005859, 1744.81005859, 1736.80004883,
1736.80004883, 1735.43005371, 1735.43005371, 1737.9699707
], dtype=float32
)
|
|
| # CONVERTED .tolist() to avoid .memmap-type artifacts:
|
|>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
[1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
C:Python27.anacondalibsite-packagesnumpycore_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:Python27.anacondalibsite-packagesnumpylibpolynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.028471876494884543
===================================================================================================================
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
0.47537688039105963
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
-0.31081076640420308
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
nan
|
|>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )
Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
C:Python27.anacondalibsite-packagesnumpylibpolynomial.py:594: RankWarning: Polyfit may be poorly conditioned
warnings.warn(msg, RankWarning)
0.026867491053098096
"""
return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )
Now in order to test the function let’s grab some TSLA data from Yahoo finance:
import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta
years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')
df = yf.download('TSLA',
start=lastyeartoday,
end=today,
progress=False)
df = df.dropna()
df = df[[u'Close']]
df
Output:
Date Close
2016-02-16 31.034000
2016-02-17 33.736000
2016-02-18 33.354000
2016-02-19 33.316002
2016-02-22 35.548000
2021-02-08 863.419983
2021-02-09 849.460022
2021-02-10 804.820007
2021-02-11 811.659973
2021-02-12 816.119995
1259 rows × 1 columns
So far so good. We have the function and the data to test it with. Now let’s do a sanity test, i.e. run the function against a subsample of the data:
import numpy as np
window = 20
hurst = lambda x: (HurstEXP(ts = df[u'Close'][:-x].to_numpy()))
hurst(window)
Output:
0.5163981260143369
Excellent.
Now to the meaty part. Applying the lambda across a rolling window and assigning the result to its own column. I’ve pretty much tried every trick I was able to dig up but cannot make it work.
The vanilla approach:
df.Close.rolling(window).apply(hurst, engine='cython', raw=True)
Gives me the following error:
TypeError: Cannot convert input [[-31.0340004 -33.73600006 -33.35400009 -33.31600189 -35.54800034
-35.44200134 -35.79999924 -37.48600006 -38.06800079 -38.38600159
-37.27000046 -37.66799927 -39.14799881 -40.20800018 -41.05799866
-40.52000046 -41.74399948 -41.0359993 -41.5 -43.02999878]] of type <class 'numpy.ndarray'> to Timestamp
Then I tried to get clever:
hurst = df.apply(lambda x: pd.Series(x.index).rolling(window).agg({'Hurst': lambda window: HurstEXP(x.loc[window])})).reset_index()['Hurst']
df.assign(Hurst=hurst)
Also failed ignominiously. So at this point – half a day later – I’m pretty much stumped. Do any of you hardcore python aficionados know of a way to do this?
Thanks a lot in advance for any insights and pointers.
Advertisement
Answer
I think your problem is that your window is too short. It says in the docstring that the window length has to be 100+ elements, and the Hurst code isn’t handling it properly, resulting in a failure of the SVD.
Separately, your test is actually slicing everything but the last 20 elements, so is actually a long array, which is why it didn’t fail:
tmp = df[u'Close'][:-20].to_numpy()
print(tmp.shape, HurstEXP(ts = tmp))
(1239,) 0.5163981260143368
If you test a window < 100 length, it throws a LinAlg exception:
tmp = df[u'Close'][:20].to_numpy()
print(tmp.shape, HurstEXP(ts = tmp))
(fails)
It should work if increase your rolling window length or repair the code in the Hurst function to pad out the array if it’s too short.
window = 500
df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), raw=True)
The code in the HurstEXP
function for handling lists shorter than 100 elements won’t work for values of ts
that are np.ndarray
objects like those being provided from the .rolling(raw=True)
.
You could modify the function to start with the following, and it will work for windows under 100 elements:
def HurstEXP( ts= [ None, ] ):
if isinstance(ts, np.ndarray):
ts = ts.tolist()
…alternatively, if you’re always going to have numpy arrays, you could change the line that fixes it:
ts = too_short_list * ts[:1] + ts # PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
to
ts = np.pad(ts, pad_width=(too_short_list,0), mode='edge')