Skip to content
Advertisement

make correlation plot on time series data in python

I want to see a correlation on a rolling week basis in time series data. The reason because I want to see how rolling correlation moves each year. To do so, I tried to use pandas.corr(), pandas.rolling_corr() built-in function for getting rolling correlation and tried to make line plot, but I couldn’t correct the correlation line chart. I don’t know how should I aggregate time series for getting rolling correlation line chart. Does anyone knows any way of doing this in python? Is there any workaround to get rolling correlation line chart from time series data in pandas? any idea?

my attempt:

I tried of using pandas.corr() to get correlation but it was not helpful to generate rolling correlation line chart. So, here is my new attempt but it is not working. I assume I should think about the right way of data aggregation to make rolling correlation line chart.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

url = 'https://gist.githubusercontent.com/adamFlyn/eb784c86c44fd7ed3f2504157a33dc23/raw/79b6aa4f2e0ffd1eb626dffdcb609eb2cb8dae48/corr.csv'
df = pd.read_csv(url)
df['date'] = pd.to_datetime(df['date'])

def get_corr(df, window=4):
    dfs = []
    for key, value in df:
        value["ROLL_CORR"] = pd.rolling_corr(value["prod_A_price"],value["prod_B_price"], window)
        dfs.append(value)
    df_final = pd.concat(dfs)
    return df_final

corr_df = get_corr(df, window=12)

fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=corr_df,alpha=.8)
plt.show()
plt.close()

doing this way is not working to me. By doing this, I want to see how the rolling correlations move each year. Can anyone point me out possible of doing rolling correlation line chart from time-series data in python? any thoughts?

desired output

here is the desired rolling correlation line chart that I want to get. Note that desired plot was generated from MS excel. I am wondering is there any possible way of doing this in python? Is there any workaround to get a rolling correlation line chart from time-series data in python? how should I correct my current attempt to get the desired output? any thoughts?

Advertisement

Answer

Using your code and description as a starting point. Panda’s Rolling class has an apply function which can be leveraged (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.rolling.Rolling.apply.html#pandas.core.window.rolling.Rolling.apply)

Two tricks are involved to make the code work:

  1. Accessing the whole row in the applied function (Pandas rolling apply using multiple columns)
  2. We call the rolling function on a pandas.Series (here df['week']) to avoid going the applied function once per column
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

url = 'https://gist.githubusercontent.com/adamFlyn/eb784c86c44fd7ed3f2504157a33dc23/raw/79b6aa4f2e0ffd1eb626dffdcb609eb2cb8dae48/corr.csv'
df = pd.read_csv(url)

def get_corr(ser):
    rolling_df = df.loc[ser.index]
    return rolling_df['prod_A_price'].corr(rolling_df['prod_B_price'])

df['ROLL_CORR'] = df['week'].rolling(4).apply(get_corr)

number_years = 3
for week, df_week in df.groupby('week'):
    df = df.append({
        'week': week,
        'year': f'{number_years} year avg',
        'ROLL_CORR': df_week.sort_values(by='date').head(number_years)['ROLL_CORR'].mean()
    }, ignore_index=True)

fig, ax = plt.subplots(figsize=(7, 4), dpi=144)
sns.lineplot(x='week', y='ROLL_CORR', hue='year', data=df,alpha=.8)
plt.show()
plt.close()

You’ll find here the generated image by seaborn

With the 3 year average

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement