
Use .corr to get the correlation between two columns

I have the following pandas DataFrame, Top15: [screenshot of the DataFrame]

I create a column, 'Citable docs per Capita', that estimates the number of citable documents per person.
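One way to build such a column is sketched below. This assumes Top15 has columns named 'Energy Supply', 'Energy Supply per Capita' and 'Citable documents'. Only 'Energy Supply per Capita' appears in the question; the other two names, the helper column 'PopEst' and all values are hypothetical. The population is estimated as total energy supply divided by energy supply per person.

```python
import pandas as pd

# Hypothetical stand-in for Top15: values are made up, and the column
# names other than 'Energy Supply per Capita' are assumptions.
Top15 = pd.DataFrame(
    {
        "Energy Supply": [1.0e9, 4.0e9, 9.0e9],
        "Energy Supply per Capita": [100.0, 200.0, 300.0],
        "Citable documents": [5_000, 40_000, 60_000],
    },
    index=["Country A", "Country B", "Country C"],
)

# Estimated population = total supply / supply per person
Top15["PopEst"] = Top15["Energy Supply"] / Top15["Energy Supply per Capita"]

# Citable documents per person
Top15["Citable docs per Capita"] = Top15["Citable documents"] / Top15["PopEst"]
print(Top15[["PopEst", "Citable docs per Capita"]])
```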

I want to know the correlation between the number of citable documents per capita and the energy supply per capita, so I use the .corr() method (Pearson's correlation).

I want it to return a single number, but the result is a correlation matrix: [screenshot of the result]
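A likely explanation, assuming .corr() was called on a DataFrame (for example a two-column selection of Top15): DataFrame.corr always returns a DataFrame of pairwise coefficients, even when there are only two columns. The small frame below is hypothetical; .loc then picks out the single off-diagonal number.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the relevant part of Top15
df = pd.DataFrame(
    {
        "Citable docs per Capita": [0.1, 0.2, 0.3, 0.5],
        "Energy Supply per Capita": [10.0, 25.0, 31.0, 48.0],
    }
)

matrix = df.corr()  # Pearson by default; the result is a 2x2 DataFrame
print(type(matrix).__name__)  # DataFrame
print(matrix.shape)  # (2, 2)

# The single coefficient sits off the diagonal
r = matrix.loc["Citable docs per Capita", "Energy Supply per Capita"]
print(r)
```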


Answer

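Without the actual data it is hard to answer the question, but I guess you are looking for something like calling .corr() on one of the columns and passing the other column as its argument, instead of calling it on the whole DataFrame.

That calculates the correlation between your two columns 'Citable docs per Capita' and 'Energy Supply per Capita' and returns a single number.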
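A minimal sketch of that call, using a hypothetical stand-in for Top15 with made-up values (only the two column names come from the question). Series.corr takes the other Series as its argument and returns a float; its method argument defaults to "pearson".

```python
import pandas as pd

# Hypothetical stand-in for Top15 (made-up values, two relevant columns only)
Top15 = pd.DataFrame(
    {
        "Citable docs per Capita": [0.0005, 0.0020, 0.0020, 0.0031],
        "Energy Supply per Capita": [100.0, 200.0, 300.0, 280.0],
    },
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Call .corr on one column (a Series) and pass the other column:
# the result is a single float rather than a matrix.
r = Top15["Citable docs per Capita"].corr(Top15["Energy Supply per Capita"])
print(r)
```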

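To give an example, take a small DataFrame with two columns, the second of which is an exact positive linear function of the first. Computing the correlation between the two columns then gives 1, as expected.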
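A sketch of such an example with made-up values, where the hypothetical column 'B' is exactly twice column 'A'. The hand-written pearson function is included only to show what .corr() computes by default; it is not part of pandas.

```python
import math

import pandas as pd

# Made-up data: B is an exact positive linear function of A
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0, 2, 4, 6]})

r = df["A"].corr(df["B"])
print(r)  # 1 in exact arithmetic; floating point may differ in the last digit


def pearson(x, y):
    """Pearson's r: sum of co-deviations over the root of the product of the
    sums of squared deviations (the 1/n factors cancel out)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(pearson(df["A"].tolist(), df["B"].tolist()))
```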

Now, if you change a single value in the second column, the same command returns a slightly smaller number that is still close to 1, as expected.
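A sketch of that perturbation with made-up values: the hypothetical column 'B' starts as exactly 2 * 'A', and one entry is then changed so the relationship is no longer exactly linear.

```python
import pandas as pd

# Made-up data: B starts as exactly 2 * A (floats, so 4.5 fits the dtype)
df = pd.DataFrame({"A": [0, 1, 2, 3], "B": [0.0, 2.0, 4.0, 6.0]})

# Perturb one value so the points no longer lie on a straight line
df.loc[2, "B"] = 4.5

r = df["A"].corr(df["B"])
print(round(r, 4))  # 0.9959
```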

If you apply .corr() directly to your DataFrame, it returns all pairwise correlations between your columns; that is why you see 1s on the diagonal of your matrix (each column is perfectly correlated with itself).

Applying .corr() to the whole example DataFrame therefore returns a 2×2 matrix with 1s on the diagonal and the correlation between the two columns in both off-diagonal entries.
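A sketch of the full pairwise matrix, using a made-up frame with three hypothetical columns: the result has one row and one column per input column, 1s on the diagonal, and is symmetric.

```python
import numpy as np
import pandas as pd

# Made-up frame; column names are hypothetical
df = pd.DataFrame(
    {
        "A": [0, 1, 2, 3],
        "B": [0.0, 2.0, 4.5, 6.0],
        "C": [3.0, 1.0, 2.0, 0.5],
    }
)

corr = df.corr()  # all pairwise Pearson coefficients
print(corr.round(3))

# Each column is perfectly correlated with itself, so the diagonal is 1
print(np.diag(corr.to_numpy()))

# The matrix is symmetric: corr(A, B) equals corr(B, A)
print(np.allclose(corr, corr.T))  # True
```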

In the screenshot you show, I assume only the upper-left corner of the correlation matrix is visible.

In some cases you may get NaNs in the result; check this post for an example.
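Three common situations that produce NaN are sketched below with made-up Series (these are general pandas behaviors, not necessarily the case from the linked post): a constant column has zero variance so Pearson's r is undefined, missing values are dropped pairwise and may leave no complete pairs, and min_periods can demand more complete pairs than exist.

```python
import numpy as np
import pandas as pd

# Case 1: a constant column has zero variance, so Pearson's r is undefined
x = pd.Series([1.0, 2.0, 3.0, 4.0])
const = pd.Series([5.0, 5.0, 5.0, 5.0])
with np.errstate(divide="ignore", invalid="ignore"):  # silence 0/0 warnings
    r_const = x.corr(const)
print(r_const)  # nan

# Case 2: missing values are dropped pairwise; here no row has both values
a = pd.Series([1.0, 2.0, np.nan, np.nan])
b = pd.Series([np.nan, np.nan, 3.0, 4.0])
r_no_overlap = a.corr(b)
print(r_no_overlap)  # nan

# Case 3: min_periods sets the minimum number of complete pairs required
c = pd.Series([1.0, 2.0, 3.0, np.nan])
d = pd.Series([2.0, 4.0, 7.0, 8.0])
r_strict = c.corr(d, min_periods=4)  # only 3 complete pairs exist
print(r_strict)  # nan
```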

If you want to filter entries above or below a certain threshold, check this question. If you want to plot a heatmap of the correlation coefficients, check this answer, and if you then run into overlapping axis labels, check the following post.
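For the threshold case, one possible approach (not necessarily the one in the linked question) is to visit each unordered column pair once and keep those whose absolute coefficient reaches the threshold. The helper name strong_pairs, its output columns and the data are hypothetical.

```python
from itertools import combinations

import pandas as pd


def strong_pairs(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return column pairs whose |Pearson r| is at least `threshold`.

    Hypothetical helper: each unordered pair is visited once, so the
    diagonal (always 1) and the mirrored half of the matrix are skipped.
    """
    corr = df.corr()
    rows = []
    for a, b in combinations(corr.columns, 2):
        r = corr.loc[a, b]
        # abs(nan) >= threshold is False, so undefined coefficients are dropped
        if abs(r) >= threshold:
            rows.append((a, b, r))
    rows.sort(key=lambda row: abs(row[2]), reverse=True)
    return pd.DataFrame(rows, columns=["col1", "col2", "r"])


# Made-up data: y is exactly 2 * x, z is only weakly related to either
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],
        "z": [3, 1, 4, 1, 5],
    }
)
result = strong_pairs(df, threshold=0.9)
print(result)
```

Using the absolute value keeps strong negative correlations too; drop abs() if only positive ones matter.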

User contributions licensed under: CC BY-SA