I have the json in this format:
{ "fields": { "tcidte": { "mode": "required", "type": "date", "format": "%Y%m%d" }, "tcmcid": { "mode": "required", "type": "string" }, "tcacbr": { "mode": "required", "type": "string" } } }
I want it to be in a dataframe format where each of the three field names are separate rows. Where one row has a column(e.g “format”) where others are blank should be assumed to be NULL.
I have tried to use the flatten_json function which I found on here, but doesn’t work as expected but will still include here:
def flatten_json(nested_json, exclude=['']): """Flatten json object with nested keys into a single level. Args: nested_json: A nested json object. exclude: Keys to exclude from output. Returns: The flattened json object if successful, None otherwise. """ out = {} def flatten(x, name='', exclude=exclude): if type(x) is dict: for a in x: if a not in exclude: flatten(x[a], name + a + '_') elif type(x) is list: i = 0 for a in x: flatten(a, name + str(i) + '_') i += 1 else: out[name[:-1]] = x flatten(nested_json) return out flatten_json_file = pd.DataFrame(flatten_json(nested_json)) pprint.pprint(flatten_json_file)
Additional Complexity JSON:
{ "fields": { "action": { "type": { "field_type": "string" }, "mode": "required" }, "upi": { "type": { "field_type": "string" }, "regex": "^[0-9]{9}$", "mode": "required" }, "firstname": { "type": { "field_type": "string" }, "mode": "required" } } }
Advertisement
Answer
With
data = { "fields": { "tcidte": { "mode": "required", "type": "date", "format": "%Y%m%d" }, "tcmcid": { "mode": "required", "type": "string" }, "tcacbr": { "mode": "required", "type": "string" } } }
this
df = pd.DataFrame(data["fields"].values())
results in
mode type format 0 required date %Y%m%d 1 required string NaN 2 required string NaN
Is that your goal?
If you want the keys of data["fields"]
as index:
df = pd.DataFrame(data["fields"]).T
or
df = pd.DataFrame.from_dict(data["fields"], orient="index")
both result in
mode type format tcidte required date %Y%m%d tcmcid required string NaN tcacbr required string NaN
With
data = { "fields": { "action": { "type": { "field_type": "string" }, "mode": "required" }, "upi": { "type": { "field_type": "string" }, "regex": "^[0-9]{9}$", "mode": "required" }, "firstname": { "type": { "field_type": "string" }, "mode": "required" } } }
you could either do
data = {key: {**d, **d["type"]} for key, d in data["fields"].items()} df = pd.DataFrame.from_dict(data, orient="index").drop(columns="type")
or
df = pd.DataFrame.from_dict(data["fields"], orient="index") df = pd.concat( [df, pd.DataFrame(df.type.to_list(), index=df.index)], axis=1 ).drop(columns="type")
with a result like (column positions may differ)
mode field_type regex action required string NaN upi required string ^[0-9]{9}$ firstname required string NaN