How to Create a Correlation Dataframe from already related data

Question

I have a data frame of language similarity. Here is a small snippet that's been edited for simplicity: I would like to create a correlation dataframe such as: To create the first dataframe, I ran: I have tried: Which returns: I have looked at other similar questions, but in those the data passed to .corr() appears to be raw observations, whereas my data already holds the pairwise similarity values.

Accepted Answer

Use crosstab to create all the language combinations and fill them with the existing data:

    lg = pd.concat([df[0], df[1]]).unique()  # ['English', 'Spanish', 'Russian']
    cx = pd.crosstab(lg, lg)
    cx.update(df.set_index([0, 1]).squeeze().unstack())
    cx.update(df.set_index([0, 1]).squeeze().unstack().T)

    >>> cx
    col_0    English  Russian  Spanish
    row_0
    English     1.00     0.15      0.5
    Russian     0.15     1.00      0.0
    Spanish     0.50     0.00      1.0
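For readers who want something to run end to end, here is a minimal self-contained sketch of the same idea. It is not the crosstab/update approach from the answer: it swaps in a plain loop over an identity matrix, which avoids writing float values into the integer count table that crosstab produces. The input frame is an assumption reconstructed from the output shown above (columns 0 and 1 hold the language pair, and a third column holds the similarity), and the names `languages` and `sim` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed input: one row per language pair, similarity in the third column.
df = pd.DataFrame([
    ["English", "Spanish", 0.50],
    ["English", "Russian", 0.15],
    ["Spanish", "Russian", 0.00],
])

# Every language that appears on either side of a pair, sorted for a stable order.
languages = sorted(pd.concat([df[0], df[1]]).unique())

# Start from an identity matrix: each language is fully similar to itself.
sim = pd.DataFrame(np.eye(len(languages)), index=languages, columns=languages)

# Write each known pair in both directions so the matrix is symmetric.
for a, b, value in df.itertuples(index=False, name=None):
    sim.loc[a, b] = value
    sim.loc[b, a] = value

print(sim)
```

Pairs that never appear in the input stay at 0.0 here. If a missing pair should be treated as unknown rather than dissimilar, one option is to start from a matrix of NaN and set the diagonal to 1.0 separately.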

Answer