I have a pandas DataFrame that I one-hot encoded with get_dummies. The data previously had a ‘type’ column containing the values small_airport, large_airport and medium_airport. I split that column into one column per airport type, with a 1 in the column for the matching type and 0s elsewhere. After using get_dummies, it looks a bit like this:
frequency_mhz, type_large_airport, type_medium_airport, type_small_airport
-122.3648, 0, 1, 0
Basically, I now need to ‘undo’ the get_dummies and get the type column back. I have looked at lots of similar questions and have tried using:
df = pd.get_dummies(data).idxmax(1)
but I can’t seem to get the result I need, or I don’t understand the answers well enough to implement them in my own project.
I really hope that is clear! Any help would be massively appreciated!
Answer
Test df:
   key_a  key_b  key_c
0      0      1      0
1      0      1      0
2      1      0      0
3      0      0      1
4      1      0      0
5      0      1      0
Code:
df['key'] = df.idxmax(axis='columns')
Output:
   key_a  key_b  key_c    key
0      0      1      0  key_b
1      0      1      0  key_b
2      1      0      0  key_a
3      0      0      1  key_c
4      1      0      0  key_a
5      0      1      0  key_b
For your case, you may want to explicitly choose the columns you’re working with, i.e.:
df['airport_type'] = df[['type_large_airport', 'type_medium_airport', 'type_small_airport']].idxmax(axis='columns')
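Choosing the dummy columns explicitly matters because idxmax compares every column in the row, so any other numeric column can win the maximum. A small sketch of that pitfall, assuming made-up rows (the 118.3 frequency is illustrative and not from the question's data; only the column names are):

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame({
    "frequency_mhz": [118.3, -122.3648],
    "type_large_airport": [1, 0],
    "type_medium_airport": [0, 1],
    "type_small_airport": [0, 0],
})

# Over all columns, a frequency greater than 1 wins the row-wise maximum.
all_columns = df.idxmax(axis="columns")

# Restricting to the dummy columns gives the intended label for every row.
dummy_cols = ["type_large_airport", "type_medium_airport", "type_small_airport"]
dummies_only = df[dummy_cols].idxmax(axis="columns")

print(all_columns.tolist())
print(dummies_only.tolist())
```

In this sketch the first row comes back as frequency_mhz when all columns are used, which is probably the kind of unexpected result the unrestricted call produces.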
And then if you want to simplify the results:
df.replace(['type_large_airport', 'type_medium_airport', 'type_small_airport'], ['large', 'medium', 'small'], inplace=True)
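If the goal is to get back the original values (small_airport, large_airport, medium_airport) rather than shortened labels, one option is to strip the prefix that get_dummies added. A rough end-to-end sketch, assuming invented sample frequencies and the default "type_" prefix; the names data, encoded and restored are just for illustration:

```python
import pandas as pd

# Hypothetical sample data using the question's column names.
data = pd.DataFrame({
    "frequency_mhz": [118.3, 122.8, 119.1],
    "type": ["large_airport", "small_airport", "medium_airport"],
})

# One-hot encode; dtype=int gives 1s and 0s as in the question.
encoded = pd.get_dummies(data, columns=["type"], dtype=int)

# Pick out the dummy columns by their shared prefix.
dummy_cols = [c for c in encoded.columns if c.startswith("type_")]

# idxmax returns the name of the column holding the 1;
# slicing off the "type_" prefix recovers the original value.
restored = encoded.drop(columns=dummy_cols)
restored["type"] = encoded[dummy_cols].idxmax(axis="columns").str[len("type_"):]

print(restored)
```

This only works cleanly when each row has exactly one 1 among the dummy columns. Newer pandas versions (1.5 and later) also ship pd.from_dummies, which is intended as the inverse of get_dummies.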