
Why doesn’t the MinMaxScaler change the sns.pairplot of the dataset?

I’m trying to create a pairplot of my dataset, whose variables have vastly different ranges (some are in the 0-1 range, while others, such as age and MonthlyIncome, go much higher). I want to scale the variables that exceed 1 into the 0-1 range using the following code:

from sklearn.preprocessing import MinMaxScaler

scale_vars = ['MonthlyIncome', 'age', 'NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio',
              'NumberOfOpenCreditLinesAndLoans', 'NumberOfTimes90DaysLate',
              'NumberRealEstateLoansOrLines', 'NumberOfTime60-89DaysPastDueNotWorse',
              'NumberOfDependents']
scaler = MinMaxScaler(copy=False)
train2[scale_vars] = scaler.fit_transform(train2[scale_vars])

My problem is that after scaling the variables and creating the pairplot again, the plot doesn’t change at all. Do you know what might cause this? Here’s the code I use to create the pairplot:

g=sns.pairplot(train2, hue='SeriousDlqin2yrs', diag_kws={'bw':0.2})

where SeriousDlqin2yrs is the Y variable.


Answer

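The plots are expected to look the same, except for the tick labels, which should now show the 0-1 ranges. MinMaxScaler applies a linear transformation to each column, x' = (x - min) / (max - min), and seaborn sets each axis's limits from the range of the values it plots. Every point therefore keeps the same position relative to the axes, so the arrangement of points in the scatter plots does not change.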
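To see why only the tick labels change, here is a small numeric sketch using NumPy alone. It assumes the scaler's default feature_range=(0, 1), and it models automatic axis limits with a hypothetical `padded_limits` helper that pads the data range by 5% on each side (similar in spirit to matplotlib's default margins, not seaborn's actual code). The income values are made up for illustration.

```python
import numpy as np

def min_max_scale(x):
    """Rescale values to [0, 1] with (x - min) / (max - min), the formula
    MinMaxScaler applies with its default feature_range=(0, 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def padded_limits(x, margin=0.05):
    """Hypothetical stand-in for automatic axis limits: the data range
    padded by `margin` of its width on each side."""
    lo, hi = np.min(x), np.max(x)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad

def position_on_axis(x, limits):
    """Fraction of the axis width at which each value is drawn."""
    lo, hi = limits
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

# Made-up monthly incomes, standing in for a column such as MonthlyIncome
income = np.array([1500.0, 3200.0, 4100.0, 9800.0, 25000.0])
scaled = min_max_scale(income)

before = position_on_axis(income, padded_limits(income))
after = position_on_axis(scaled, padded_limits(scaled))

print(padded_limits(income))  # axis range (and so the tick labels) differ...
print(padded_limits(scaled))
print(np.allclose(before, after))  # ...but every point lands in the same place: True
```

Because the padding is proportional to the data range, it cancels out in the same way the scaling does, which is why the two pairplots differ only in their tick labels.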

Since I do not have your data, here is the same effect with Ronald Fisher’s classic iris dataset:

import pandas as pd
import seaborn as sns; sns.set()
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# With as_frame=True, the 'data' entry is a pandas DataFrame
iris_dict = load_iris(as_frame=True)
iris = iris_dict['data']
iris['species'] = iris_dict['target']  # integer class labels 0, 1, 2

g = sns.pairplot(iris, hue='species', diag_kws={'bw_method':0.2})

[Figure: pairplot of the unscaled iris data]

scale_vars = ['sepal length (cm)', 'sepal width (cm)',
              'petal length (cm)', 'petal width (cm)']
# Rescale each measurement column to the range [0, 1]
scaler = MinMaxScaler(copy=False)
iris[scale_vars] = scaler.fit_transform(iris[scale_vars])

g = sns.pairplot(iris, hue='species', diag_kws={'bw_method':0.2})

[Figure: pairplot of the scaled iris data]

Note that the column names should have been changed when the scaling was done, because the values are no longer in centimeters.
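One way to act on that note is sketched below with pandas. The small DataFrame is a made-up stand-in for the scaled iris frame, and the "(scaled)" suffix is just one possible naming choice, not a convention from the original answer.

```python
import pandas as pd

# Made-up stand-in for the scaled iris frame; values are already in [0, 1]
scaled = pd.DataFrame({
    'sepal length (cm)': [0.0, 0.5, 1.0],
    'petal length (cm)': [1.0, 0.25, 0.0],
})

# Replace the unit suffix with a marker that the values are min-max scaled
renamed = scaled.rename(columns=lambda c: c.replace('(cm)', '(scaled)'))
print(list(renamed.columns))
# ['sepal length (scaled)', 'petal length (scaled)']
```

Applied to the real frame, the same `rename` call would run right after `fit_transform`, so the pairplot's axis labels no longer claim centimeters.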

User contributions licensed under: CC BY-SA