Normalization and flattening of JSON column in a mixed type dataframe

The dataframe below has columns with mixed types. The column of interest for expansion is “Info”. Each row value in this column is a JSON object.

I would like to have the headers expanded, i.e. have “Info.id”, “info.x_y_cord”, “info.neutral” etc. as individual columns, with the corresponding values under them across the dataset. I’ve tried normalizing the column by calling pd.json_normalize(df["Info"]), but nothing seems to change. Do I need to convert the column to another type first? Can someone point me in the right direction?

The output should be something like this:

Answer

First of all, your JSON strings are not valid because of the “id” value. JSON does not allow numbers with leading zeros, so a bare 001 makes json.loads fail; you’ll need to pass the “id” value as a string instead. Here’s one way to do that:
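The original snippet is not available here, so the following is only a minimal sketch of one way to quote the “id” value. The sample string and the helper name quote_id are assumptions for illustration; only the field names come from the question.

```python
import json
import re

# Hypothetical raw value: field names are from the question, values are made up.
raw = '{"id": 001, "x_y_cord": [10, 20], "neutral": 0.5}'


def quote_id(s: str) -> str:
    """Wrap a bare numeric "id" value in double quotes so leading zeros survive."""
    # Group 1 keeps the key and colon, group 2 is the unquoted run of digits.
    return re.sub(r'("id"\s*:\s*)(\d+)', r'\1"\2"', s)


fixed = quote_id(raw)
record = json.loads(fixed)  # parses now that the id is a JSON string
print(record["id"])
```

An “id” that is already quoted is left untouched, because the pattern only matches digits that directly follow the colon.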

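Once you’ve done that, load the values from the JSON strings with json.loads and then use pd.json_normalize on the parsed “Info” column. pd.json_normalize works on parsed objects (dicts), not on raw JSON strings, which is why the column has to be converted first: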
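A short sketch of this step, assuming a small made-up dataframe: the “Name” column and all values are hypothetical, and only the “Info” column and its field names come from the question.

```python
import json

import pandas as pd

# Hypothetical data: "Info" holds JSON strings whose "id" is already quoted.
df = pd.DataFrame(
    {
        "Name": ["a", "b"],
        "Info": [
            '{"id": "001", "x_y_cord": [10, 20], "neutral": 0.5}',
            '{"id": "002", "x_y_cord": [30, 40], "neutral": 0.7}',
        ],
    }
)

# Parse each JSON string into a dict, then flatten the dicts into columns.
parsed = df["Info"].apply(json.loads)
info = pd.json_normalize(parsed.tolist())

# json_normalize returns a fresh 0..n-1 index; realign it with the source frame.
info.index = df.index
print(info)
```

List values such as “x_y_cord” stay as Python lists in a single column; only nested dicts are expanded into separate columns.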

After that, just rename the columns and use pd.concat to form the output dataframe:
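As a rough sketch of this last step, the example below uses DataFrame.add_prefix as one possible way to do the renaming and then concatenates along the columns. The “Name” column, the values, and the "Info." prefix are assumptions for illustration, not the original answer's code.

```python
import json

import pandas as pd

# Hypothetical data, same shape as described in the question.
df = pd.DataFrame(
    {
        "Name": ["a", "b"],
        "Info": [
            '{"id": "001", "x_y_cord": [10, 20], "neutral": 0.5}',
            '{"id": "002", "x_y_cord": [30, 40], "neutral": 0.7}',
        ],
    }
)

# Flatten the parsed JSON, keeping the original row index for alignment.
info = pd.json_normalize(df["Info"].apply(json.loads).tolist())
info.index = df.index

# Rename: prefix every flattened column with the source column name.
info = info.add_prefix("Info.")

# Replace the raw "Info" column with the expanded columns.
out = pd.concat([df.drop(columns="Info"), info], axis=1)
print(out.columns.tolist())
```

Resetting the index of the flattened frame before pd.concat matters when the original dataframe does not use a default 0..n-1 index, since concat aligns rows by index label.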

User contributions licensed under: CC BY-SA