Python SKLearn: How to Get Feature Names After OneHotEncoder?

I would like to get the feature names of a data set after it has been transformed by scikit-learn's OneHotEncoder.

The post "active_features_ attribute in OneHotEncoder" gives a very good explanation of how the attributes n_values_, feature_indices_ and active_features_ are filled after transform() has been executed.

My question is:

For example, given DataFrame-based input data:

JavaScript

What does the code look like that gets from the original feature names a, b and c to a list of the transformed feature names, for example:

a-0, a-1, a-2, b-0, b-1, b-2, b-3, c-0, c-1, c-2, c-3

or

a-0, a-1, a-2, b-0, b-1, b-2, b-3, b-4, b-5, b-6, b-7, b-8

or anything else that shows which encoded columns belong to which original column?

Background: I would like to see the feature importances of some of the algorithms to get a feeling for which features have the most effect on the algorithm used.

Answer

You can use pd.get_dummies():

JavaScript

will give you:

JavaScript

which automatically generates the column names. You can apply this to all of your columns and then read off the column names. There is no need to convert them to a NumPy matrix.
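
A minimal sketch of applying pd.get_dummies() to every column at once, under a few assumptions: the DataFrame values below are made up for illustration (they are not the original example data), the columns hold integer category codes, and prefix_sep="-" is chosen only to reproduce the a-0, a-1, ... naming style from the question. The source_of dictionary is a hypothetical helper for tracing each encoded column back to its original column.

```python
import pandas as pd

# Made-up sample data (assumption): three categorical columns stored as integers.
df = pd.DataFrame({
    "a": [0, 1, 2, 1],
    "b": [0, 1, 2, 3],
    "c": [3, 2, 1, 0],
})

# Integer columns are only encoded when they are listed explicitly in `columns`.
# prefix_sep="-" yields names such as "a-0" instead of the default "a_0".
encoded = pd.get_dummies(df, columns=list(df.columns), prefix_sep="-")

# The transformed feature names are simply the column labels.
feature_names = list(encoded.columns)
print(feature_names)

# Hypothetical helper: map each encoded column back to its source column.
# Splitting from the right assumes the category values contain no "-".
source_of = {name: name.rsplit("-", 1)[0] for name in feature_names}
print(source_of["b-3"])
```

Because the names stay attached to the DataFrame, they can later be lined up with a fitted model's feature importances without any separate bookkeeping.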

So with:

JavaScript

the solution looks like:

JavaScript
User contributions licensed under: CC BY-SA