The dataframe below has columns with mixed types. The column of interest for expansion is “Info”. Each row value in this column is a JSON object.
data = {'Code':['001', '002', '003', '004'],
df = pd.DataFrame(data)
I would like to have the headers expanded, i.e. have “”, “info.x_y_cord”, “info.neutral”, etc. as individual columns with the corresponding values under them across the dataset. I’ve tried normalizing them by iterating with pd.json_normalize(df["Info"]), but nothing seems to change. Do I need to convert the column to another type first? Can someone point me in the right direction?
The output should be something like this:
data1 = {'Code':['001', '002', '003', '004'],
df_final = pd.DataFrame(data1)
First of all, your JSON strings don't seem to be valid because of the ID value. 001 is a number with a leading zero, which JSON does not allow, so json.loads can't parse it. You’ll need to pass the “id” value as a string instead. Here’s one way to do that:
def id_as_string(match_obj):
    # Wrap the captured ID digits in double quotes: "id":001 -> "id":"001"
    return f'"id":"{match_obj.group(1)}"'

df["Info"] = df["Info"].str.replace(r'"id":\s*(\d+)', repl=id_as_string, regex=True)
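To check the ID fix in isolation, here is a small sketch on two made-up “Info” strings (the real values from the question aren't shown, so the sample rows are hypothetical). It shows json.loads rejecting the unquoted 001 and then parsing the string once the regex replacement has quoted it.

```python
import json

import pandas as pd

# Hypothetical sample rows; the question's real "Info" values are not shown.
info = pd.Series([
    '{"id":001,"x_y_cord":"10_20","neutral":true}',
    '{"id":002,"x_y_cord":"30_40","neutral":false}',
])

def id_as_string(match_obj):
    # Wrap the captured digits in double quotes: "id":001 -> "id":"001"
    return f'"id":"{match_obj.group(1)}"'

# Before the fix, json.loads rejects the leading zero in 001.
try:
    json.loads(info[0])
    parsed_before = True
except json.JSONDecodeError:
    parsed_before = False

# A callable repl is only allowed together with regex=True.
fixed = info.str.replace(r'"id":\s*(\d+)', id_as_string, regex=True)
parsed = [json.loads(s) for s in fixed]

print(parsed_before)    # False
print(parsed[0]["id"])  # 001
```

Because the ID ends up as a string, its leading zeros survive, which matches the “Code” column in the question.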
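Once you’ve done that, load each value from its JSON string with json.loads and pass the result to pd.json_normalize on your “Info” column. This also answers your question about types: pd.json_normalize expects dicts (or lists of dicts), not JSON strings, so the column has to be parsed first.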
import json
# Parse each JSON string into a dict, then flatten the dicts into columns
json_part_df = pd.json_normalize(df["Info"].map(json.loads))
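If some of the values inside “Info” are themselves objects, pd.json_normalize also flattens those and joins the nested keys with "." by default (the sep parameter). The sketch below uses hypothetical, already-valid strings where x_y_cord is a nested object, purely to show the naming. The real shape of x_y_cord isn't shown in the question.

```python
import json

import pandas as pd

# Hypothetical, already-valid JSON strings with one nested object.
info = pd.Series([
    '{"id":"001","x_y_cord":{"x":1,"y":2},"neutral":true}',
    '{"id":"002","x_y_cord":{"x":3,"y":4},"neutral":false}',
])

# Parse each string into a dict, then flatten; nested keys are joined with "."
flat = pd.json_normalize(info.map(json.loads).tolist())

print(sorted(flat.columns))  # ['id', 'neutral', 'x_y_cord.x', 'x_y_cord.y']
```

Column order for nested keys can differ between pandas versions, which is why the example sorts the names before printing them.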
After that, rename the columns and use pd.concat to build the output dataframe:
# Rename columns
json_part_df.columns = [f"Info.{column}" for column in json_part_df.columns]
# Use pd.concat to create output
df = pd.concat([df[["Code", "Region"]], json_part_df], axis=1)
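Putting the steps together, here is an end-to-end sketch on a hypothetical stand-in for the question's dataframe (the Region and Info values are made up). It uses DataFrame.add_prefix as an alternative to the list-comprehension rename. It also copies df's index onto the normalized frame, because pd.json_normalize returns a fresh 0..n-1 index and pd.concat(axis=1) aligns rows by index label. The non-default index [10, 20] simulates a df that was filtered earlier.

```python
import json

import pandas as pd

# Hypothetical stand-in for the question's DataFrame (real values not shown);
# index [10, 20] simulates a frame that was filtered earlier.
df = pd.DataFrame(
    {
        "Code": ["001", "002"],
        "Region": ["North", "South"],
        "Info": [
            '{"id":"001","x_y_cord":"10_20","neutral":true}',
            '{"id":"002","x_y_cord":"30_40","neutral":false}',
        ],
    },
    index=[10, 20],
)

json_part_df = pd.json_normalize(df["Info"].map(json.loads).tolist())
# add_prefix is an alternative to the list-comprehension rename
json_part_df = json_part_df.add_prefix("Info.")
# Reuse df's index so concat lines the rows up instead of padding with NaN
json_part_df.index = df.index

out = pd.concat([df[["Code", "Region"]], json_part_df], axis=1)
print(out.columns.tolist())
# ['Code', 'Region', 'Info.id', 'Info.x_y_cord', 'Info.neutral']
```

Without the index assignment, this example would produce four half-empty rows instead of two.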