I have the json in this format:
JavaScript
x
18
18
1
{
2
"fields": {
3
"tcidte": {
4
"mode": "required",
5
"type": "date",
6
"format": "%Y%m%d"
7
},
8
"tcmcid": {
9
"mode": "required",
10
"type": "string"
11
},
12
"tcacbr": {
13
"mode": "required",
14
"type": "string"
15
}
16
}
17
}
18
I want it to be in a dataframe format where each of the three field names are separate rows. Where one row has a column(e.g “format”) where others are blank should be assumed to be NULL.
I have tried to use the flatten_json function which I found on here, but doesn’t work as expected but will still include here:
JavaScript
1
28
28
1
def flatten_json(nested_json, exclude=['']):
2
"""Flatten json object with nested keys into a single level.
3
Args:
4
nested_json: A nested json object.
5
exclude: Keys to exclude from output.
6
Returns:
7
The flattened json object if successful, None otherwise.
8
"""
9
out = {}
10
11
def flatten(x, name='', exclude=exclude):
12
if type(x) is dict:
13
for a in x:
14
if a not in exclude: flatten(x[a], name + a + '_')
15
elif type(x) is list:
16
i = 0
17
for a in x:
18
flatten(a, name + str(i) + '_')
19
i += 1
20
else:
21
out[name[:-1]] = x
22
23
flatten(nested_json)
24
return out
25
26
flatten_json_file = pd.DataFrame(flatten_json(nested_json))
27
pprint.pprint(flatten_json_file)
28
Additional Complexity JSON:
JavaScript
1
24
24
1
{
2
"fields": {
3
"action": {
4
"type": {
5
"field_type": "string"
6
},
7
"mode": "required"
8
},
9
"upi": {
10
"type": {
11
"field_type": "string"
12
},
13
"regex": "^[0-9]{9}$",
14
"mode": "required"
15
},
16
"firstname": {
17
"type": {
18
"field_type": "string"
19
},
20
"mode": "required"
21
}
22
}
23
}
24
Advertisement
Answer
With
JavaScript
1
18
18
1
data = {
2
"fields": {
3
"tcidte": {
4
"mode": "required",
5
"type": "date",
6
"format": "%Y%m%d"
7
},
8
"tcmcid": {
9
"mode": "required",
10
"type": "string"
11
},
12
"tcacbr": {
13
"mode": "required",
14
"type": "string"
15
}
16
}
17
}
18
this
JavaScript
1
2
1
df = pd.DataFrame(data["fields"].values())
2
results in
JavaScript
1
5
1
mode type format
2
0 required date %Y%m%d
3
1 required string NaN
4
2 required string NaN
5
Is that your goal?
If you want the keys of data["fields"]
as index:
JavaScript
1
2
1
df = pd.DataFrame(data["fields"]).T
2
or
JavaScript
1
2
1
df = pd.DataFrame.from_dict(data["fields"], orient="index")
2
both result in
JavaScript
1
5
1
mode type format
2
tcidte required date %Y%m%d
3
tcmcid required string NaN
4
tcacbr required string NaN
5
With
JavaScript
1
24
24
1
data = {
2
"fields": {
3
"action": {
4
"type": {
5
"field_type": "string"
6
},
7
"mode": "required"
8
},
9
"upi": {
10
"type": {
11
"field_type": "string"
12
},
13
"regex": "^[0-9]{9}$",
14
"mode": "required"
15
},
16
"firstname": {
17
"type": {
18
"field_type": "string"
19
},
20
"mode": "required"
21
}
22
}
23
}
24
you could either do
JavaScript
1
3
1
data = {key: {**d, **d["type"]} for key, d in data["fields"].items()}
2
df = pd.DataFrame.from_dict(data, orient="index").drop(columns="type")
3
or
JavaScript
1
5
1
df = pd.DataFrame.from_dict(data["fields"], orient="index")
2
df = pd.concat(
3
[df, pd.DataFrame(df.type.to_list(), index=df.index)], axis=1
4
).drop(columns="type")
5
with a result like (column positions may differ)
JavaScript
1
5
1
mode field_type regex
2
action required string NaN
3
upi required string ^[0-9]{9}$
4
firstname required string NaN
5