I have a complex nested dictionary with multiple levels. I wish to create a multilevel index pandas dataframe from the dict.
The dict takes the format:
{
  "bucket_0": [
    {
      "key": "foo",
      "level_1": {
        "bucket_1": [
          {
            "key": "bar",
            "level_2": {
              "bucket_2": [
                {
                  "key": "baz",
                  "max": {
                    "value": 100
                  }
                },
                ...
              ]
            }
          },
          ...
        ]
      }
    },
    ...
  ]
}
Basically, there are 3 arrays, each containing dicts. In the outermost level, bucket_0, we have X number of foo dicts. In the next level, bucket_1, we have Y number of bar dicts. In the last level, bucket_2, we have Z number of baz dicts.
I know I can iterate through using 3 nested for loops, but that’s obviously not desirable. I’m aware of pandas’ json_normalize() but am unsure how to use it effectively, especially when trying to create a multilevel index.
My desired result would be a multilevel index dataframe where a row would look like:
['foo', 'bar', 'baz', 'max.value']
Also tagged as Elasticsearch since this is a result from a nested aggregation query.
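For reference, the brute-force nested-loop version I’d like to avoid would look something like this (a sketch against a minimal example of the structure above):

```python
import pandas as pd

# Minimal instance of the nested aggregation result described above.
js = {"bucket_0": [{"key": "foo",
                    "level_1": {"bucket_1": [{"key": "bar",
                                              "level_2": {"bucket_2": [{"key": "baz",
                                                                        "max": {"value": 100}}]}}]}}]}

# Three nested loops, one per bucket level, collecting one row per baz dict.
rows = []
for b0 in js["bucket_0"]:                      # X foo dicts
    for b1 in b0["level_1"]["bucket_1"]:       # Y bar dicts
        for b2 in b1["level_2"]["bucket_2"]:   # Z baz dicts
            rows.append((b0["key"], b1["key"], b2["key"], b2["max"]["value"]))

df = pd.DataFrame(rows, columns=["l1", "l2", "l3", "value"]).set_index(["l1", "l2", "l3"])
print(df)
```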
Answer
You can systematically expand the structure one level at a time and build the index as you go:
import pandas as pd

js = {'bucket_0': [{'key': 'foo',
                    'level_1': {'bucket_1': [{'key': 'bar',
                                              'level_2': {'bucket_2': [{'key': 'baz', 'max': {'value': 100}}]}}]}}]}

# Expand one bucket per step; each step appends that level's "key" to the index.
df = pd.json_normalize(js["bucket_0"]).explode("level_1.bucket_1").set_index("key")
df = df["level_1.bucket_1"].apply(pd.Series).set_index("key", append=True)
df = df["level_2"].apply(pd.Series).explode("bucket_2")
df = df["bucket_2"].apply(pd.Series).set_index("key", append=True)
df["max"].apply(pd.Series).rename_axis(index=["l1", "l2", "l3"])
output

             value
l1  l2  l3
foo bar baz    100
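An alternative sketch, assuming a recent pandas: json_normalize can walk the whole path in one call via record_path, while meta carries each outer level’s key along (meta column names use the default "." separator):

```python
import pandas as pd

js = {'bucket_0': [{'key': 'foo',
                    'level_1': {'bucket_1': [{'key': 'bar',
                                              'level_2': {'bucket_2': [{'key': 'baz',
                                                                        'max': {'value': 100}}]}}]}}]}

# Drill down to the innermost bucket_2 records; pull the outer keys in as meta.
out = pd.json_normalize(
    js,
    record_path=["bucket_0", "level_1", "bucket_1", "level_2", "bucket_2"],
    meta=[["bucket_0", "key"], ["bucket_0", "level_1", "bucket_1", "key"]],
)

# Move the three key columns into a MultiIndex, leaving only max.value.
out = (out.set_index(["bucket_0.key", "bucket_0.level_1.bucket_1.key", "key"])
          .rename_axis(["l1", "l2", "l3"]))
print(out)
```

This avoids the repeated apply(pd.Series) expansions, at the cost of spelling out the full record path and meta paths by hand.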