Cramér's V correlation in Python, using weights instead of frequencies?

Question

The dataset for the Cramér's V correlation has multiple categorical variables in columns, plus a column that tells us how often each combination of values appears, similar to the table below. I want to calculate the Cramér's V correlation between Season/Age/Weather, using Sales as the weight. If doable, how would one write something to

Accepted Answer

As you probably know, Cramér's V measures association between two nominal variables. So you can convert your current table into separate contingency tables, one for each pairwise combination of your variables, and then compute the statistic for each pair.

Code to create a table similar to yours:

```python
from itertools import product

import numpy as np
import pandas as pd
import scipy.stats as stats

np.random.seed(42)

# One row per Season/Age/Weather combination, with a random Sales count as the weight
all_combs = product(
    ['Spring', 'Summer', 'Fall', 'Winter'],
    ['New', 'Old'],
    ['Cold', 'Warm', 'Hot'])
df = pd.DataFrame(all_combs, columns=['Season', 'Age', 'Weather'])
df['Sales'] = np.random.randint(25, 200, len(df))

df.head()
#     Season    Age    Weather    Sales
# 0   Spring    New      Cold       127
# 1   Spring    New      Warm       117
# 2   Spring    New       Hot        39
# 3   Spring    Old      Cold       131
# 4   Spring    Old      Warm        96
```

Convert the table into a contingency table for measuring association between Season and Age, summing Sales within each cell, and save it as a 2-d array:

```python
cont = df.pivot_table(values='Sales', index='Season', columns='Age', aggfunc='sum')
cont
# Age     New  Old
# Season
# Fall    459  277
# Spring  283  272
# Summer  372  377
# Winter  356  384

cont_arr = cont.values
```

Now you can calculate the chi-squared statistic and from that compute Cramér's V. The formula is V = sqrt((chi2 / n) / min(r - 1, c - 1)), where n is the sum of all cells in the contingency table and r and c are its numbers of rows and columns.

```python
# correction=False turns off Yates' continuity correction, which would otherwise apply to 2x2 tables
chi2 = stats.chi2_contingency(cont_arr, correction=False)[0]
sample_size = np.sum(cont_arr)        # n: total weight (sum of Sales)
min_dim = min(cont_arr.shape) - 1     # min(r - 1, c - 1)
cramer_v = np.sqrt((chi2 / sample_size) / min_dim)
cramer_v
# 0.1157257...
```
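To cover every pairwise combination of Season, Age and Weather, the steps in this answer can be wrapped in a small helper and looped over the pairs. The following is a minimal sketch under a few assumptions: the function name `weighted_cramers_v` is hypothetical (it is not part of pandas or SciPy), the column names come from the example table, and `fill_value=0` is added so that combinations missing from the data count as zero weight.

```python
from itertools import combinations, product

import numpy as np
import pandas as pd
from scipy import stats


def weighted_cramers_v(df, var_a, var_b, weight):
    """Cramér's V between two categorical columns, using `weight` as cell counts.

    Hypothetical helper: sums `weight` per (var_a, var_b) cell, then applies
    V = sqrt(chi2 / (n * min(r - 1, c - 1))).
    """
    cont = df.pivot_table(values=weight, index=var_a, columns=var_b,
                          aggfunc='sum', fill_value=0)
    arr = cont.to_numpy()
    # No Yates correction, matching the calculation in the answer
    chi2 = stats.chi2_contingency(arr, correction=False)[0]
    n = arr.sum()
    min_dim = min(arr.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))


# Rebuild the example table from the answer
np.random.seed(42)
df = pd.DataFrame(
    product(['Spring', 'Summer', 'Fall', 'Winter'],
            ['New', 'Old'],
            ['Cold', 'Warm', 'Hot']),
    columns=['Season', 'Age', 'Weather'])
df['Sales'] = np.random.randint(25, 200, len(df))

# Cramér's V for each pair of categorical columns, weighted by Sales
results = {
    (a, b): weighted_cramers_v(df, a, b, 'Sales')
    for a, b in combinations(['Season', 'Age', 'Weather'], 2)
}
for pair, v in results.items():
    print(pair, round(v, 4))
```

Because the weights are summed before the chi-squared test, this treats each unit of Sales as one observation. With large weights, even a small V can come with a very small p-value, so the effect size is usually the more informative number here.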

Answer