Python: How to One Hot Encode a Feature with multiple values?

Question

I have the following dataframe df, in which the route column holds the names of the cities an aircraft travels through, alongside its ticket_price. I want to extract the individual city names from route and one-hot encode them.

Dataframe (df)

Required Dataframe (df_encoded)

Code

I have performed some preprocessing on the route column using the following code but am unable to understand how

Accepted Answer

If you have dataframe:

       id                      route  ticket_price
    0   1  Mumbai - Pune - Bangalore         10000
    1   2               Pune - Delhi          7000
    2   3               Delhi - Pune          6500

Then:

    df.route = df.route.str.split(" - ")
    df_out = pd.concat(
        [
            df.explode("route")
            .pivot_table(index="id", columns="route", aggfunc="size", fill_value=0)
            .add_prefix("Route_"),
            df.set_index("id").ticket_price,
        ],
        axis=1,
    )
    print(df_out)

Prints:

        Route_Bangalore  Route_Delhi  Route_Mumbai  Route_Pune  ticket_price
    id
    1                 1            0             1           1         10000
    2                 0            1             0           1          7000
    3                 0            1             0           1          6500
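As a possible alternative to the explode/pivot_table approach, pandas also provides Series.str.get_dummies, which splits each string on a separator and builds indicator columns in one step. The sketch below is not part of the original answer; it rebuilds the sample dataframe from the answer and assumes the cities are always separated by exactly " - ". The name df_encoded is taken from the question. Note one difference: get_dummies yields 0/1 flags, whereas pivot_table with aggfunc="size" would count a city that appears more than once in the same route.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "route": ["Mumbai - Pune - Bangalore", "Pune - Delhi", "Delhi - Pune"],
        "ticket_price": [10000, 7000, 6500],
    }
)

# Split each route on " - " and create one indicator column per city.
# Assumption: the separator is always exactly " - " (space, hyphen, space).
dummies = df["route"].str.get_dummies(sep=" - ").add_prefix("Route_")

# Put the indicator columns between id and ticket_price.
df_encoded = pd.concat([df[["id"]], dummies, df[["ticket_price"]]], axis=1)
print(df_encoded)
```

If the separators in the real data are inconsistent (for example "Pune-Delhi" without spaces), normalising the route strings first, or splitting with a regular expression as in the answer's str.split step, would likely be safer.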

Answer