Python SKLearn: How to Get Feature Names After OneHotEncoder?

Question

I would like to get the feature names of a data set after it has been transformed by the SKLearn OneHotEncoder. In "active_features_ attribute in OneHotEncoder" there is a very good explanation of how the attributes n_values_, feature_indices_ and active_features_ get filled after transform() has been executed. My question is: for DataFrame-based input data, for example, what does the code look like

Accepted Answer

You can use pd.get_dummies(). For a single column of a DataFrame df:

    # dtype=int gives the 0/1 values shown below; pandas 2.0+ returns booleans by default
    pd.get_dummies(df["a"], prefix="a", dtype=int)

will give you:

       a_0  a_1  a_2
    0    1    0    0
    1    0    1    0
    2    0    0    1
    3    1    0    0

which automatically generates the column names. You can apply this to all your columns and then get the column names. There is no need to convert the data to a numpy matrix.

So with:

    import pandas as pd

    df = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]})
    # df.as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
    data = df.to_numpy()

the solution looks like:

    columns = df.columns
    my_result = pd.DataFrame()
    for runner in columns:
        # One-hot encode one column; the prefix becomes part of each new column name
        temp = pd.get_dummies(df[runner], prefix=runner, dtype=int)
        my_result[temp.columns] = temp
    print(my_result.columns)

    >>Index(['a_0', 'a_1', 'a_2', 'b_0', 'b_1', 'b_4', 'b_5', 'c_0', 'c_1', 'c_4',
           'c_5'],
          dtype='object')
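The per-column loop can probably be collapsed into a single call. The following is only a sketch, assuming the same example DataFrame as above: passing columns= makes pd.get_dummies() encode the integer columns as well, and the names encoded and feature_names are illustrative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 0], "b": [0, 1, 4, 5], "c": [0, 1, 4, 5]})

# Numeric columns are left untouched unless they are listed in `columns`,
# so every column is named here explicitly. Each original column name is
# used as the prefix of its dummy columns.
encoded = pd.get_dummies(df, columns=list(df.columns), dtype=int)

# The generated column names are the feature names.
feature_names = list(encoded.columns)
print(feature_names)
```

This yields the same column names as the loop, in the same order, without building the result one column group at a time.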

Answer