Return highest correlation values pandas

I have this function

def highest_correlation(dataframe):
    corr_table = dataframe.corr().unstack()
    df_corrvalues = corr_table.sort_values(ascending=False)
    
    return df_corrvalues 

correlation = highest_correlation(heart)
correlation

This is the output

age      age        1.000000
sex      sex        1.000000
thall    thall      1.000000
caa      caa        1.000000
slp      slp        1.000000
                      ...   
output   oldpeak   -0.429146
         exng      -0.435601
exng     output    -0.435601
slp      oldpeak   -0.576314
oldpeak  slp       -0.576314
Length: 196, dtype: float64

How can I return the highest correlation values that are lower than 1?

That is, I want to remove the 1s that appear at the top when I use sort_values(ascending=False).

Answer

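Calling unstack() on the correlation table gives a Series with a MultiIndex, and such a Series can be filtered with a boolean mask like any other. Here is an example MultiIndex Series from the pandas User Guide: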

import pandas as pd
from numpy.random import default_rng
rng = default_rng()


arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
s = pd.Series(rng.standard_normal(8), index=index)
print(s)

first  second
bar    one       1.602675
       two      -0.197277
baz    one      -0.746729
       two       1.384208
foo    one       1.587294
       two      -1.616769
qux    one      -0.872030
       two      -0.721226
dtype: float64

Filter for values less than one:

print(s[s<1])

first  second
bar    two      -0.197277
baz    one      -0.746729
foo    two      -1.616769
qux    one      -0.872030
       two      -0.721226
dtype: float64
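
Applied to the function from the question, a minimal sketch could look like the following. The small DataFrame `df` is a made-up stand-in for `heart`, and the `is_self` mask is an illustrative variation: instead of relying on `corr < 1` (which can be sensitive to floating-point rounding on the diagonal), it drops the entries where both index levels name the same column.

```python
import pandas as pd

# Hypothetical stand-in for the `heart` DataFrame from the question.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 1, 4, 3, 6],
    "c": [5, 3, 4, 1, 2],
})

def highest_correlation(dataframe):
    # Flatten the correlation matrix into a Series indexed by (column, column).
    corr_table = dataframe.corr().unstack()
    # True where a column is paired with itself (the 1.0 entries on the diagonal).
    is_self = (
        corr_table.index.get_level_values(0)
        == corr_table.index.get_level_values(1)
    )
    # Keep only pairs of different columns, highest correlation first.
    return corr_table[~is_self].sort_values(ascending=False)

correlation = highest_correlation(df)
print(correlation)
```

Each pair still appears twice, for example (a, b) and (b, a), because the correlation matrix is symmetric. Using `corr_table[corr_table < 1]` as in the answer above should give the same result whenever the diagonal values are exactly 1.0.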
User contributions licensed under: CC BY-SA