How to build a custom scaler based on StandardScaler?

I am trying to build a custom scaler that scales only the continuous variables of a dataset (the US Adult Income dataset: https://www.kaggle.com/uciml/adult-census-income), using StandardScaler as a base. Here is the Python code I used:

However, when I tried to run the scaler, I got an error.

So what is the mistake in how I built the scaler? And how would you build a custom scaler for this dataset?

Thank you!

Answer

I agree with @AntoineDubuis that ColumnTransformer is a better (built-in!) way to do this. That said, I’d like to address where your code goes wrong.
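
For reference, a minimal sketch of the ColumnTransformer route. The toy frame and its column names (`age`, `hours`, `sex`) are assumptions standing in for the real dataset, not taken from your code:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the dataset; the column names are illustrative assumptions.
df = pd.DataFrame({
    "age": [25, 38, 52, 41],
    "hours": [40, 50, 60, 30],
    "sex": ["Male", "Female", "Female", "Male"],
})
continuous = ["age", "hours"]

# Scale only the continuous columns; pass every other column through unchanged.
ct = ColumnTransformer(
    [("scale", StandardScaler(), continuous)],
    remainder="passthrough",
)
out = ct.fit_transform(df)  # transformed columns come first, then the passthrough ones
```

Note that the output is a NumPy array with the scaled columns moved to the front, so the original column order is not preserved.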

In fit, you call self.scaler.fit(X[self.columns], y), which means self.columns should be a list of column names (or one of a few other valid column selectors). But you’ve defined the parameter as continuous = df.iloc[:, np.r_[0,2,10:13]], which is a dataframe, not a list of names.
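
One way to get the names rather than the data is to index df.columns with the same positions. A small sketch on a toy frame whose placeholder column names (`c0` to `c13`) are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy frame with 14 placeholder columns named c0..c13 (hypothetical names).
df = pd.DataFrame(
    np.arange(28).reshape(2, 14),
    columns=[f"c{i}" for i in range(14)],
)

positions = np.r_[0, 2, 10:13]               # positions 0, 2, 10, 11, 12
continuous = df.columns[positions].tolist()  # a list of column names, not a DataFrame
subset = df[continuous]                      # this is the kind of indexing fit performs
```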

A couple of other issues:

  1. You should only set attributes in __init__ that come from its signature, or cloning will fail. Create self.scaler in fit instead, and store its parameters (copy, with_mean, with_std) directly as attributes in __init__. Don’t initialize mean_ or var_ there.
  2. You never actually use mean_ or var_. You can keep them if you want, but the relevant statistics are already stored in the fitted scaler object.
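
Putting those points together, a corrected scaler might look roughly like this. It is a sketch, not your original class: the name `CustomScaler`, the `scaler_` attribute, and the toy data are my assumptions.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import StandardScaler


class CustomScaler(BaseEstimator, TransformerMixin):
    """Standardize only the listed columns of a DataFrame (hypothetical sketch)."""

    def __init__(self, columns, copy=True, with_mean=True, with_std=True):
        # Store only constructor arguments here so get_params/clone keep working.
        self.columns = columns
        self.copy = copy
        self.with_mean = with_mean
        self.with_std = with_std

    def fit(self, X, y=None):
        # Build the inner scaler at fit time; the trailing underscore marks fitted state.
        self.scaler_ = StandardScaler(
            copy=self.copy, with_mean=self.with_mean, with_std=self.with_std
        )
        self.scaler_.fit(X[self.columns])
        return self

    def transform(self, X):
        scaled = pd.DataFrame(
            self.scaler_.transform(X[self.columns]),
            columns=self.columns,
            index=X.index,
        )
        # Recombine with the untouched columns and restore the original column order.
        return pd.concat([X.drop(columns=self.columns), scaled], axis=1)[X.columns]


# Usage on toy data (column names are assumptions).
df = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "sex": ["M", "F", "M"],
    "hours": [10.0, 20.0, 60.0],
})
scaler = clone(CustomScaler(columns=["age", "hours"]))  # cloning now succeeds
out = scaler.fit_transform(df)
```

The fitted means and variances are available as scaler.scaler_.mean_ and scaler.scaler_.var_, so there is no need to duplicate them on the wrapper.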
User contributions licensed under: CC BY-SA