
Merging pandas get_dummies back to categorical values

I have a pandas DataFrame that I one-hot encoded with get_dummies. The data previously had a ‘type’ column containing the values small_airport, large_airport, and medium_airport. I split that column into one column per airport type, with 1s and 0s marking which type each frequency belongs to. After using get_dummies, it looks a bit like this:

frequency_mhz, type_large_airport, type_medium_airport, type_small_airport
-122.3648, 0, 1, 0

Basically, I now need to ‘undo’ the get_dummies and get the type column back. I have looked at loads of similar questions and have tried using:

df = pd.get_dummies(data).idxmax(1)

but I can’t seem to get the result I need, or I don’t understand the answers well enough to implement them in my own project.

I really hope that is clear! Any help would be massively appreciated!


Answer

Test df:

   key_a  key_b  key_c
0      0      1      0
1      0      1      0
2      1      0      0
3      0      0      1
4      1      0      0
5      0      1      0

Code:

df['key'] = df.idxmax(axis='columns')

Output:

   key_a  key_b  key_c    key
0      0      1      0  key_b
1      0      1      0  key_b
2      1      0      0  key_a
3      0      0      1  key_c
4      1      0      0  key_a
5      0      1      0  key_b
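One caveat worth knowing: idxmax returns the label of the first maximum in each row, so a row where every dummy is 0 would silently get the first column's label. A minimal sketch of one way to guard against that, using a made-up frame with an all-zero last row (the has_value name is only illustrative):

```python
import pandas as pd

# Hypothetical dummy frame; the last row has no category set at all.
df = pd.DataFrame({
    "key_a": [0, 1, 0],
    "key_b": [1, 0, 0],
    "key_c": [0, 0, 0],
})

# True for rows where at least one dummy column is set.
has_value = df.eq(1).any(axis="columns")

# idxmax alone would label the all-zero row "key_a" (the first column),
# so mask those rows out and leave them as missing values instead.
key = df.idxmax(axis="columns").where(has_value)

print(key)
```

If your data was produced by get_dummies without drop_first, every row should have exactly one 1 and the mask changes nothing.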

For your case, you may want to explicitly choose the columns you’re working with, e.g.:

df['airport_type'] = df[['type_large_airport', 'type_medium_airport', 'type_small_airport']].idxmax(axis='columns')
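To show why selecting the dummy columns matters, here is a small round-trip sketch. The sample values are invented, and dtype=int is passed to get_dummies only so the dummies are 0/1 integers like in the question. With positive frequencies, taking idxmax over every column lets the numeric column win, while restricting it to the type_ columns gives back the category:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
data = pd.DataFrame({
    "frequency_mhz": [122.3, 118.1, 125.9],
    "type": ["medium_airport", "small_airport", "large_airport"],
})

# One-hot encode only the 'type' column, as 0/1 integers.
encoded = pd.get_dummies(data, columns=["type"], dtype=int)

# idxmax over all columns: the frequency is larger than any 0/1 dummy,
# so every row reports 'frequency_mhz'.
naive = encoded.idxmax(axis="columns")

# idxmax over the dummy columns only recovers the encoded category.
dummy_cols = [c for c in encoded.columns if c.startswith("type_")]
recovered = encoded[dummy_cols].idxmax(axis="columns")

print(naive.tolist())
print(recovered.tolist())
```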

And then if you want to simplify the results:

df['airport_type'] = df['airport_type'].replace(['type_large_airport', 'type_medium_airport', 'type_small_airport'], ['large', 'medium', 'small'])
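If you would rather get the original values back (small_airport and so on) instead of shortened ones, one option is to strip the prefix that get_dummies added and then drop the dummy columns. This is only a sketch on made-up rows, and it assumes the prefix is exactly "type_":

```python
import pandas as pd

# Hypothetical one-hot encoded frame shaped like the one in the question.
df = pd.DataFrame({
    "frequency_mhz": [122.3, 118.1],
    "type_large_airport": [0, 1],
    "type_medium_airport": [1, 0],
    "type_small_airport": [0, 0],
})

prefix = "type_"  # assumed prefix added by get_dummies
dummy_cols = [c for c in df.columns if c.startswith(prefix)]

# Pick the column holding the 1, then slice the prefix off its name.
df["type"] = df[dummy_cols].idxmax(axis="columns").str[len(prefix):]

# The dummy columns are no longer needed once 'type' is restored.
df = df.drop(columns=dummy_cols)

print(df)
```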
User contributions licensed under: CC BY-SA