I want to see a correlation on a rolling week basis in time series data. The reason because I want to see how rolling correlation moves each year. To do so, I tried to use pandas.corr()
, pandas.rolling_corr()
built-in function for getting rolling correlation and tried to make line plot, but I couldn’t correct the correlation line chart. I don’t know how should I aggregate time series for getting rolling correlation line chart. Does anyone knows any way of doing this in python? Is there any workaround to get rolling correlation line chart from time series data in pandas? any idea?
my attempt:
I tried of using pandas.corr()
to get correlation but it was not helpful to generate rolling correlation line chart. So, here is my new attempt but it is not working. I assume I should think about the right way of data aggregation to make rolling correlation line chart.
import pandas as pd import matplotlib.pyplot as plt import seaborn as sns url = 'https://gist.githubusercontent.com/adamFlyn/eb784c86c44fd7ed3f2504157a33dc23/raw/79b6aa4f2e0ffd1eb626dffdcb609eb2cb8dae48/corr.csv' df = pd.read_csv(url) df['date'] = pd.to_datetime(df['date']) def get_corr(df, window=4): dfs = [] for key, value in df: value["ROLL_CORR"] = pd.rolling_corr(value["prod_A_price"],value["prod_B_price"], window) dfs.append(value) df_final = pd.concat(dfs) return df_final corr_df = get_corr(df, window=12) fig, ax = plt.subplots(figsize=(7, 4), dpi=144) sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=corr_df,alpha=.8) plt.show() plt.close()
doing this way is not working to me. By doing this, I want to see how the rolling correlations move each year. Can anyone point me out possible of doing rolling correlation line chart from time-series data in python? any thoughts?
desired output
here is the desired rolling correlation line chart that I want to get. Note that desired plot was generated from MS excel. I am wondering is there any possible way of doing this in python? Is there any workaround to get a rolling correlation line chart from time-series data in python? how should I correct my current attempt to get the desired output? any thoughts?
Advertisement
Answer
Using your code and description as a starting point.
Panda’s Rolling
class has an apply
function which can be leveraged (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.apply.html#pandas.core.window.rolling.Rolling.apply)
Two tricks are involved to make the code work:
- Accessing the whole row in the applied function (Pandas rolling apply using multiple columns)
- We call the
rolling
function on apandas.Series
(heredf['week']
) to avoid going the applied function once per column
import pandas as pd import matplotlib.pyplot as plt import seaborn as sns url = 'https://gist.githubusercontent.com/adamFlyn/eb784c86c44fd7ed3f2504157a33dc23/raw/79b6aa4f2e0ffd1eb626dffdcb609eb2cb8dae48/corr.csv' df = pd.read_csv(url) def get_corr(ser): rolling_df = df.loc[ser.index] return rolling_df['prod_A_price'].corr(rolling_df['prod_B_price']) df['ROLL_CORR'] = df['week'].rolling(4).apply(get_corr) number_years = 3 for week, df_week in df.groupby('week'): df = df.append({ 'week': week, 'year': f'{number_years} year avg', 'ROLL_CORR': df_week.sort_values(by='date').head(number_years)['ROLL_CORR'].mean() }, ignore_index=True) fig, ax = plt.subplots(figsize=(7, 4), dpi=144) sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=df,alpha=.8) plt.show() plt.close()