I have a large data frame, df, containing 4 columns:
id period ret_1m mkt_ret_1m 131146 CAN00WG0 199609 -0.1538 0.047104 133530 CAN00WG0 199610 -0.0455 -0.014143 135913 CAN00WG0 199611 0.0000 0.040926 138334 CAN00WG0 199612 0.2952 0.008723 140794 CAN00WG0 199701 -0.0257 0.039916 143274 CAN00WG0 199702 -0.0038 -0.025442 145754 CAN00WG0 199703 -0.2992 -0.049279 148246 CAN00WG0 199704 -0.0919 -0.005948 150774 CAN00WG0 199705 0.0595 0.122322 153318 CAN00WG0 199706 -0.0337 0.045765 id period ret_1m mkt_ret_1m 160980 CAN00WH0 199709 0.0757 0.079293 163569 CAN00WH0 199710 -0.0741 -0.044000 166159 CAN00WH0 199711 0.1000 -0.014644 168782 CAN00WH0 199712 -0.0909 -0.007072 171399 CAN00WH0 199801 -0.0100 0.001381 174022 CAN00WH0 199802 0.1919 0.081924 176637 CAN00WH0 199803 0.0085 0.050415 179255 CAN00WH0 199804 -0.0168 0.018393 181880 CAN00WH0 199805 0.0427 -0.051279 184516 CAN00WH0 199806 -0.0656 -0.011516 id period ret_1m mkt_ret_1m 143275 CAN00WO0 199702 -0.1176 -0.025442 145755 CAN00WO0 199703 -0.0074 -0.049279 148247 CAN00WO0 199704 -0.0075 -0.005948 150775 CAN00WO0 199705 0.0451 0.122322
etc.
I am attempting to calculate a common financial measure, known as beta, using a function, that takes two of the columns, ret_1m, the monthly stock_return, and ret_1m_mkt, the market 1 month return for the same period (period_id). I want to apply a function (calc_beta) to calculate the 12-month result of this function on a 12 month rolling basis.
To do this, I am creating a groupby object:
grp = df.groupby('id')
What I would like to do is use something like:
period = 12 for stock, sub_df in grp: arg = sub_df[['ret_1m', 'mkt_ret_1m']] beta = pd.rolling_apply(arg, period, calc_beta, min_periods = period)
Now, here is the first problem. According to the documentation, pd.rolling_apply arg can be either a series or a data frame. However, it appears that the data frame I supply is converted into a numpy array that can only contain one column of data, rather than the two I have tried to supply. So my code below for calc_beta will not work, because I need to pass both the stock and market returns:
def calc_beta(np_array) s = np_array[:,0] # stock returns are column zero from numpy array m = np_array[:,1] # market returns are column one from numpy array covariance = np.cov(s,m) # Calculate covariance between stock and market beta = covariance[0,1]/covariance[1,1] return beta
So my questions are as follows, I think it makes sense to list them in this way:
(i) How can I pass a data frame/multiple series/numpy array with more than one column to calc_beta using rolling_apply? (ii) How can I return more than one value (e.g. the beta) from the calc_beta function? (iii) Having calculated rolling quantities, how can I recombined with the original dataframe df so that I have the rolling quantities corresponding to the correct date in the period column? (iv) Is there a better (vectorized) way of achieving this? I have seen some similar questions using e.g. df.apply(pd.rolling_apply,period,??) but I did not understand how these worked.
I gather that rolling_apply previously was unable to handle data frames, but the documentations suggests that it is now able to do so. My pandas.version is 0.16.1.
Thanks for any help! I have lost 1.5 days trying to figure this out and am totally stumped.
Ultimately, what I want is something like this:
id period ret_1m mkt_ret_1m beta other_quantities 131146 CAN00WG0 199609 -0.1538 0.047104 0.521 xxx 133530 CAN00WG0 199610 -0.0455 -0.014143 0.627 xxxx 135913 CAN00WG0 199611 0.0000 0.040926 0.341 xxx 138334 CAN00WG0 199612 0.2952 0.008723 0.567 xx 140794 CAN00WG0 199701 -0.0257 0.039916 0.4612 xxx 143274 CAN00WG0 199702 -0.0038 -0.025442 0.215 xxx 145754 CAN00WG0 199703 -0.2992 -0.049279 0.4678 xxx 148246 CAN00WG0 199704 -0.0919 -0.005948 -0.4225 xxx 150774 CAN00WG0 199705 0.0595 0.122322 0.780 xxx 153318 CAN00WG0 199706 -0.0337 0.045765 0.623 xxx id period ret_1m mkt_ret_1m beta other_quantities 160980 CAN00WH0 199709 0.0757 0.079293 -0.913 xx 163569 CAN00WH0 199710 -0.0741 -0.044000 0.894 xxx 166159 CAN00WH0 199711 0.1000 -0.014644 0.563 xxx 168782 CAN00WH0 199712 -0.0909 -0.007072 0.734 xxx 171399 CAN00WH0 199801 -0.0100 0.001381 0.894 xxxx 174022 CAN00WH0 199802 0.1919 0.081924 0.789 xx 176637 CAN00WH0 199803 0.0085 0.050415 0.1563 xxxx 179255 CAN00WH0 199804 -0.0168 0.018393 -0.64 xxxx 181880 CAN00WH0 199805 0.0427 -0.051279 -0.742 xxx 184516 CAN00WH0 199806 -0.0656 -0.011516 0.925 xxx id period ret_1m mkt_ret_1m beta 143275 CAN00WO0 199702 -0.1176 -0.025442 -1.52 xx 145755 CAN00WO0 199703 -0.0074 -0.049279 -0.632 xxx 148247 CAN00WO0 199704 -0.0075 -0.005948 1.521 xx 150775 CAN00WO0 199705 0.0451 0.122322 0.0321 xxx
etc.
Advertisement
Answer
def rolling_apply(df, period, func, min_periods=None): if min_periods is None: min_periods = period result = pd.Series(np.nan, index=df.index) for i in range(1, len(df)): sub_df = df.iloc[max(i-period, 0):i,:] #get a subsample to run if len(sub_df) >= min_periods: idx = sub_df.index[-1]+1 # mind the forward looking bias,your return in time t should not be inclued in the beta calculating in time t result[idx] = func(sub_df) return result
I fix a forward looking bias for Happy001’s code. It’s a finance problem, so it should be cautious.
I find that vlmercado‘s answer is so wrong. If you simply use pd.rolling_cov and pd.rolling_var you are making mistakes in finance. Firstly, it’s obvious that the second stock CAN00WH0 do not have any NaN beta, since it use the return of CAN00WG0, which is wrong at all. Secondly, consider such a situation: a stock suspended for ten years, and you can also get that sample into your beta calculating.
I find that pandas.rolling also works for Timestamp, you can see how in my answer above if interested. I change the code of Happy001’s code . It’s not the fastest way, but is at least 20x faster than the origin code.
crsp_daily['date']=pd.to_datetime(crsp_daily['date']) crsp_daily=crsp_daily.set_index('date') # rolling needs a time serie index crsp_daily.index=pd.DatetimeIndex(crsp_daily.index) calc=crsp_daily[['permno','ret','mkt_ret']] grp = calc.groupby('permno') #rolling beta for each stock beta=pd.DataFrame() for stock, sub_df in grp: sub2_df=sub_df[['ret','mkt_ret']].sort_index() beta_m = sub2_df.rolling('1825d',min_periods=150).cov() # 5yr rolling beta , note that d for day, and you cannot use w/m/y, s/d are availiable. beta_m['beta']=beta_m['ret']/beta_m['mkt_ret'] beta_m=beta_m.xs('mkt_ret',level=1,axis=0) beta=beta.append(pd.merge(sub_df,pd.DataFrame(beta_m['beta']))) beta=beta.reset_index() beta=beta[['date','permno','beta']]