I have a complex nested dictionary with multiple levels, and I want to create a MultiIndex pandas DataFrame from it.
The dict takes the format:
```
{
  "bucket_0": [
    {
      "key": "foo",
      "level_1": {
        "bucket_1": [
          {
            "key": "bar",
            "level_2": {
              "bucket_2": [
                { "key": "baz", "max": { "value": 100 } },
                ...
              ]
            }
          },
          ...
        ]
      }
    },
    ...
  ]
}
```
Basically, there are 3 arrays, each containing dicts. In the outermost level, `bucket_0`, we have X number of `foo` dicts. In the next level, `bucket_1`, we have Y number of `bar` dicts. In the last level, `bucket_2`, we have Z number of `baz` dicts.
I know I can iterate through using 3 for loops, but that's obviously not desirable. I'm aware of pandas' `json_normalize()` method, but I'm unsure how to use it effectively, especially when trying to create a multi-level index.
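For reference, the triple-loop approach I'm trying to avoid would look something like this (a sketch against a minimal sample of the structure; the `l1`/`l2`/`l3` names are just placeholder labels):

```python
import pandas as pd

# minimal sample matching the structure described above
js = {"bucket_0": [{"key": "foo", "level_1": {"bucket_1": [
    {"key": "bar", "level_2": {"bucket_2": [
        {"key": "baz", "max": {"value": 100}}
    ]}}
]}}]}

# walk each nesting level and collect one flat row per innermost dict
rows = []
for b0 in js["bucket_0"]:
    for b1 in b0["level_1"]["bucket_1"]:
        for b2 in b1["level_2"]["bucket_2"]:
            rows.append((b0["key"], b1["key"], b2["key"], b2["max"]["value"]))

df = pd.DataFrame(rows, columns=["l1", "l2", "l3", "value"]).set_index(["l1", "l2", "l3"])
```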
My desired result would be a MultiIndex DataFrame where a row would look like:
['foo', 'bar', 'baz', 'max.value']
Also tagged as Elasticsearch since this is a result from a nested aggregation query.
Answer
You can systematically expand the structure and build the index:
```python
import pandas as pd

js = {'bucket_0': [{'key': 'foo', 'level_1': {'bucket_1': [{'key': 'bar',
      'level_2': {'bucket_2': [{'key': 'baz', 'max': {'value': 100}}]}}]}}]}

# expand bucket_0, explode the bucket_1 list, and index on the outer key
df = pd.json_normalize(js["bucket_0"]).explode("level_1.bucket_1").set_index("key")
# expand each bucket_1 dict into columns and append its key to the index
df = df["level_1.bucket_1"].apply(pd.Series).set_index("key", append=True)
# expand level_2 and explode the bucket_2 list
df = df["level_2"].apply(pd.Series).explode("bucket_2")
# expand each bucket_2 dict and append the innermost key
df = df["bucket_2"].apply(pd.Series).set_index("key", append=True)
# finally expand "max" and name the three index levels
df["max"].apply(pd.Series).rename_axis(index=["l1", "l2", "l3"])
```

output

```
             value
l1  l2  l3
foo bar baz    100
```
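Alternatively, `json_normalize` can walk the whole nesting in a single call via its `record_path` and `meta` parameters: `record_path` descends through all three arrays, while `meta` carries the outer keys down to each innermost record. A sketch (the `l1`/`l2`/`l3` level names are just labels, as above):

```python
import pandas as pd

js = {'bucket_0': [{'key': 'foo', 'level_1': {'bucket_1': [{'key': 'bar',
      'level_2': {'bucket_2': [{'key': 'baz', 'max': {'value': 100}}]}}]}}]}

# record_path lists the keys to descend through; meta pulls the keys
# encountered along the way into columns named by their joined paths
out = pd.json_normalize(
    js,
    record_path=["bucket_0", "level_1", "bucket_1", "level_2", "bucket_2"],
    meta=[["bucket_0", "key"], ["bucket_0", "level_1", "bucket_1", "key"]],
)

# move the three key columns into a named MultiIndex
out = (out.set_index(["bucket_0.key", "bucket_0.level_1.bucket_1.key", "key"])
          .rename_axis(["l1", "l2", "l3"]))
```

This avoids the repeated `apply(pd.Series)` round trips, which can be slow on large results since each one builds a new DataFrame row by row.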