
sklearn.compose.make_column_transformer(): using SimpleImputer() and OneHotEncoder() in one step on one dataframe column

I have a dataframe with a column of categorical values that also contains NaNs.


I’d like to use sklearn.compose.make_column_transformer() to prepare the df in a clean way. I tried to impute the NaN values and one-hot encode the column with the following code:


Running the transformer on my training data raises

ValueError: Input contains NaN


The desired output would be something like this:


That raises two questions:

  1. Does the transformer compute the SimpleImputer and the OneHotEncoder in parallel on the original data, or in the order in which I passed them to the transformer?

  2. How can I change my code so that the OneHotEncoder receives the imputed values as input? I know I can solve this outside the transformer with pandas in two separate steps, but I’d like to keep the code in a clean pipeline format.


Answer

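The column transformer applies each transformer independently to the original column and concatenates the results, so the OneHotEncoder never sees the imputed values. To chain the two steps, use an sklearn Pipeline, which applies a list of transforms sequentially: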
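The original snippet was not preserved on this page, so the following is only a minimal sketch of the idea. The dataframe, the column name `color`, and the choice of `strategy="constant"` with `fill_value="missing"` are assumptions made for illustration, not the asker's actual data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical column containing NaNs.
df = pd.DataFrame(
    {"color": pd.Series(["red", "blue", np.nan, "red", np.nan], dtype=object)}
)

# The pipeline runs its steps in order, so the encoder
# receives the imputer's output rather than the raw column.
impute_then_encode = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

# The whole pipeline is passed as a single transformer for the column.
transformer = make_column_transformer(
    (impute_then_encode, ["color"]),
    remainder="passthrough",
)

encoded = transformer.fit_transform(df)

# The result may be sparse or dense depending on its density; densify for display.
encoded = encoded.toarray() if hasattr(encoded, "toarray") else np.asarray(encoded)
print(encoded)
```

With this setup the NaNs become their own `missing` category. If you would rather fill them with the most common value, `SimpleImputer(strategy="most_frequent")` should work in the same position.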

User contributions licensed under: CC BY-SA