Normalization and flattening of a JSON column in a mixed-type dataframe

Question

The dataframe below has columns with mixed types. The column of interest for expansion is "Info", and each row value in this column is a JSON object. I would like to have its keys expanded into headers, i.e. have "Info.id", "Info.x_y_cord", "Info.neutral", etc. as individual columns with the corresponding values under them across the dataset. I've tried normalizing them via pd.json_normalize(df["Info"]) and through iteration, but nothing seems to change.

Accepted Answer

First of all, your JSON strings seem to be invalid because of the ID value. JSON does not allow leading zeros in numbers, so 001 is not parsed correctly, and you'll need to pass the "id" value as a string instead. Here's one way to do that:

    def id_as_string(matchObj):
        # Adds " around the ID value
        return f'"id":"{matchObj.group(1)}",'

    df["Info"] = df["Info"].str.replace(r'"id":(\d*),', repl=id_as_string, regex=True)

Once you've done that, you can use pd.json_normalize on your "Info" column after you've loaded the values from the JSON strings using json.loads:

    import json

    json_part_df = pd.json_normalize(df["Info"].map(json.loads))

After that, just rename the columns and use pd.concat to form the output dataframe:

    # Rename columns
    json_part_df.columns = [f"Info.{column}" for column in json_part_df.columns]

    # Use pd.concat to create output
    df = pd.concat([df[["Code", "Region"]], json_part_df], axis=1)
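Putting the three steps together, here is a minimal end-to-end sketch. The question's actual dataframe isn't shown, so the sample rows below are hypothetical: the column names "Code", "Region" and "Info" come from the answer, the keys id, x_y_cord and neutral come from the question, and all values are made up. Two small swaps relative to the answer: DataFrame.add_prefix does the "Info." renaming, and the left-hand frame's index is reset before pd.concat, because concat with axis=1 aligns rows by index label and json_normalize returns a fresh 0..n-1 index.

```python
import json

import pandas as pd

# Hypothetical sample data (the real dataframe from the question is not shown).
df = pd.DataFrame(
    {
        "Code": ["A1", "B2"],
        "Region": ["North", "South"],
        "Info": [
            '{"id":001,"x_y_cord":"10_20","neutral":true}',
            '{"id":002,"x_y_cord":"30_40","neutral":false}',
        ],
    }
)

# The leading zeros make the raw strings invalid JSON, so json.loads rejects them.
try:
    json.loads(df.loc[0, "Info"])
except json.JSONDecodeError as exc:
    print("invalid JSON:", exc)


def id_as_string(match_obj):
    # Wrap the captured digits in quotes so 001 survives as the string "001".
    return f'"id":"{match_obj.group(1)}",'


# A callable repl requires regex=True.
df["Info"] = df["Info"].str.replace(r'"id":(\d*),', id_as_string, regex=True)

# Parse each string into a dict, then flatten the list of dicts into columns.
json_part_df = pd.json_normalize(df["Info"].map(json.loads).tolist())
json_part_df = json_part_df.add_prefix("Info.")

# Reset df's index so rows line up with json_part_df's 0..n-1 index.
out = pd.concat(
    [df[["Code", "Region"]].reset_index(drop=True), json_part_df], axis=1
)
print(out)
```

If the real dataframe has more non-JSON columns than "Code" and "Region", df.drop(columns="Info") can stand in for the explicit column selection.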


Answer