
Scaling / Normalizing pandas column

I have a dataframe like:

TOTAL | Name
3232  | Jane
382   | Jack
8291  | Jones

I’d like to create a new scaled column in the dataframe called SIZE, where SIZE is a number between 10 and 50.

For example:

TOTAL | Name  | SIZE
3232  | Jane  | 24.413
382   | Jack  | 10
8291  | Jones | 50

I’ve tried:

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

scaler=MinMaxScaler(feature_range=(10,50))
df["SIZE"]=scaler.fit_transform(df["TOTAL"])

but got this error: Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

I’ve also tried other approaches, such as converting the column to a list, transforming it, and appending it back to the dataframe.

What is the easiest way to do this?

Thanks!


Answer

Option 1
sklearn
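This problem comes up again and again, and the error message tells you what to do. scikit-learn transformers expect 2-D input of shape (n_samples, n_features), but df["TOTAL"] is a 1-D Series, so you’re missing a dimension on the input. Change df["TOTAL"] to df[["TOTAL"]], which selects a one-column DataFrame instead.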

df['SIZE'] = scaler.fit_transform(df[["TOTAL"]])

df
   TOTAL   Name       SIZE
0   3232   Jane  24.413959
1    382   Jack  10.000000
2   8291  Jones  50.000000
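The error message also suggests a second fix: convert the column to a NumPy array and reshape it into a single column with reshape(-1, 1). The sketch below assumes the example data from the question. Because fit_transform returns a 2-D array, it is flattened with ravel() before being assigned back as a column.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Example data from the question
df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})

# A Series is 1-D; MinMaxScaler expects 2-D input of shape (n_samples, n_features)
print(df["TOTAL"].shape)    # (3,)
print(df[["TOTAL"]].shape)  # (3, 1)

scaler = MinMaxScaler(feature_range=(10, 50))

# reshape(-1, 1): a single feature (column), as many rows as needed
values = df["TOTAL"].to_numpy().reshape(-1, 1)

# fit_transform returns a (3, 1) array; ravel() flattens it to 1-D for the new column
df["SIZE"] = scaler.fit_transform(values).ravel()
print(df)
```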

Option 2
pandas
Preferably, I would bypass sklearn and just do the min-max scaling myself.

a, b = 10, 50
x, y = df.TOTAL.min(), df.TOTAL.max()
df['SIZE'] = (df.TOTAL - x) / (y - x) * (b - a) + a

df
   TOTAL   Name       SIZE
0   3232   Jane  24.413959
1    382   Jack  10.000000
2   8291  Jones  50.000000
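The scaling line above corresponds to the following formula. Here x and y are the column minimum and maximum, and [a, b] is the target range. Plugging in Jane's row reproduces the 24.413959 shown in the output:

```latex
\text{SIZE} = \frac{\text{TOTAL} - x}{y - x}\,(b - a) + a,
\qquad x = \min(\text{TOTAL}),\quad y = \max(\text{TOTAL})

\text{SIZE}_{\text{Jane}} = \frac{3232 - 382}{8291 - 382}\,(50 - 10) + 10
  = \frac{2850}{7909}\cdot 40 + 10 \approx 24.414
```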

This is essentially what MinMaxScaler does, but without the overhead of importing scikit-learn. It’s a heavy library, so don’t pull it in unless you need it for something else.
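If you need this in more than one place, the pandas version can be wrapped in a small function. The name min_max_scale is hypothetical and is not a pandas API. The one-liner does not handle a column whose values are all equal: y - x is then 0, and the division produces NaN. This sketch assumes that mapping such a column to the lower bound a is acceptable.

```python
import pandas as pd


def min_max_scale(s: pd.Series, a: float = 10, b: float = 50) -> pd.Series:
    """Linearly map the values of s onto [a, b] (hypothetical helper, not a pandas API)."""
    lo, hi = s.min(), s.max()
    if hi == lo:
        # Constant column: avoid dividing by zero; assumption: map every value to a
        return pd.Series(float(a), index=s.index)
    return (s - lo) / (hi - lo) * (b - a) + a


df = pd.DataFrame({"TOTAL": [3232, 382, 8291], "Name": ["Jane", "Jack", "Jones"]})
df["SIZE"] = min_max_scale(df["TOTAL"])
print(df)
```

Keeping a and b as parameters makes it easy to switch ranges, for example min_max_scale(df["TOTAL"], 5, 50).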

User contributions licensed under: CC BY-SA