
Extracting feature names from sklearn column transformer

I’m using sklearn.pipeline to transform my features and fit a model, so my general flow looks like this: column transformer –> general pipeline –> model. I would like to be able to extract feature names from the column transformer (since the following step, the general pipeline, applies the same transformation to all columns, e.g. nan_to_num) and use them for model explainability (e.g. feature importance). I’d also like it to work with custom transformer classes.

Here is the set up:

import numpy as np
import pandas as pd
from sklearn import compose, pipeline, preprocessing

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": ["x", "y", "z"]})
column_transformer = compose.make_column_transformer(
   (preprocessing.StandardScaler(), ["a", "b"]),
   (preprocessing.KBinsDiscretizer(n_bins=2, encode="ordinal"), ["a"]),
   (preprocessing.OneHotEncoder(), ["c"]),
)
pipe = pipeline.Pipeline([
   ("transform", column_transformer),
   ("nan_to_num", preprocessing.FunctionTransformer(np.nan_to_num, validate=False))
])
pipe.fit_transform(df)  # returns a numpy array

So far I’ve tried using get_feature_names_out, e.g.:

pipe.named_steps["transform"].get_feature_names_out()

But I’m running into the error get_feature_names_out() takes 1 positional argument but 2 were given. I’m not sure what’s going on, and this entire process doesn’t feel right. Is there a better way to do it?

EDIT: A big thank you to @amiola for answering the question; that was indeed the problem. I just wanted to add another important point for posterity. I was having other problems with my own custom pipeline, which also produced the error get_feature_names_out() takes 1 positional argument but 2 were given. It turns out that, aside from the KBinsDiscretizer, there was another bug in my custom transformer classes: I had implemented the get_feature_names_out method, but it did not accept any parameters, and that was the problem. If you run into similar issues, make sure that this method has the following signature: get_feature_names_out(self, input_features) -> List[str].
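To illustrate the point about the signature, here is a minimal sketch of a custom transformer that plays nicely with get_feature_names_out (DoubleTransformer is a made-up example class, not from the original question):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class DoubleTransformer(BaseEstimator, TransformerMixin):
    """Toy transformer that doubles every value."""

    def fit(self, X, y=None):
        # remember the input column names when fitting on a DataFrame
        self.feature_names_in_ = getattr(X, "columns", None)
        return self

    def transform(self, X):
        return np.asarray(X) * 2

    def get_feature_names_out(self, input_features=None):
        # the method MUST accept input_features, even if it goes unused;
        # omitting it triggers "takes 1 positional argument but 2 were given"
        if input_features is not None:
            return np.asarray(input_features, dtype=object)
        return np.asarray(self.feature_names_in_, dtype=object)
```

A ColumnTransformer passes the selected column names as input_features when it collects output names, which is why the parameter is required.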


Answer

It seems the problem is caused by the encode="ordinal" parameter passed to the KBinsDiscretizer constructor. The bug is tracked in GitHub issues #22731 and #22841 and fixed by PR #22735.

Indeed, you can see that by specifying encode="onehot" you get a consistent result:

import numpy as np
import pandas as pd
from sklearn import compose, pipeline, preprocessing

df = pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3], "c": ["x", "y", "z"]})
column_transformer = compose.make_column_transformer(
   (preprocessing.StandardScaler(), ["a", "b"]),
   (preprocessing.KBinsDiscretizer(n_bins=2, encode="onehot"), ["a"]),
   (preprocessing.OneHotEncoder(), ["c"]),
)
pipe = pipeline.Pipeline([
   ("transform", column_transformer),
   ("nan_to_num", preprocessing.FunctionTransformer(np.nan_to_num, validate=False))
])
pipe.fit_transform(df) 

pipe.named_steps['transform'].get_feature_names_out()

# array(['standardscaler__a', 'standardscaler__b', 'kbinsdiscretizer__a_0.0',
#        'kbinsdiscretizer__a_1.0', 'onehotencoder__c_x', 'onehotencoder__c_y',
#        'onehotencoder__c_z'], dtype=object)

Besides this, everything seems fine to me.

In the end, it appears that even after installing the nightly builds, I still get the same error.

User contributions licensed under: CC BY-SA