
Nested JSON to Multiple Dataframe in Pandas

I am trying to build a tool that takes arbitrary JSON data and converts it into multiple dataframes based on data types. I also want each dataframe to carry a relation so that we can identify which data belongs to which parent element (key).

For example:

JavaScript

I want to end up with dataframes such as:

JavaScript

Each dataframe should also have a relation, via a UUID, that identifies which key belongs to which parent ID. I am trying a flattening approach to solve this problem using Python and pandas, but my solution does not work for nested JSON.

Here is what I have tried:

JavaScript
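To illustrate the limitation (this is not the code from the question, and the sample data is made up): `pandas.json_normalize` flattens nested dicts into dotted column names, but a list of dicts is left as a single list object in one cell, so one flat frame cannot hold the child rows.

```python
import pandas as pd

# Hypothetical sample: one nested dict and one list of dicts.
data = {
    "id": 1,
    "address": {"city": "Pune"},
    "orders": [{"sku": "A"}, {"sku": "B"}],
}

flat = pd.json_normalize(data)

# The nested dict becomes a dotted column ...
print(sorted(flat.columns))

# ... but the list of dicts is kept as one Python list in a single cell.
print(type(flat.loc[0, "orders"]))
```

This is why the child records need their own dataframe, linked back to the parent row.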

Update

The reason I want to create multiple dataframes is to put this data into a data lake and later query it via Athena in AWS. Once I have the dataframes, I can move them into SQL tables.


Answer

The structure you are describing, a JSON object containing an arbitrary number of nested JSON objects, maps directly onto a tree data structure. Since we want to store the ID of the parent JSON in each dataframe, we will approach this with BFS (breadth-first search), a.k.a. level order traversal. This is a common graph traversal algorithm that visits each parent before its children, so the parent's ID is always known by the time a child is processed.

If an element's stored parent ID is None, it is the root (top-level) element.

JavaScript

Output:

JavaScript
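The breadth-first approach described above can be sketched roughly as follows. This is an illustrative sketch, not the answer's original code: the function name `json_to_frames`, the column names `uuid` and `parent_uuid`, the `value` column for scalar list items, and the choice to name each dataframe after its JSON key are all assumptions made for the example.

```python
import uuid
from collections import defaultdict, deque

import pandas as pd


def json_to_frames(data, root_name="root"):
    """Split nested JSON into one DataFrame per key, linked by uuid/parent_uuid.

    Hypothetical helper: table and column names are assumptions.
    """
    rows = defaultdict(list)  # table name -> list of row dicts
    # Each queue entry is (table name, node, uuid of the parent row).
    queue = deque([(root_name, data, None)])

    while queue:
        table, node, parent_uuid = queue.popleft()

        if isinstance(node, list):
            # Every list element becomes its own row in the same table.
            for item in node:
                queue.append((table, item, parent_uuid))
            continue

        if not isinstance(node, dict):
            # Scalars inside a list go into a single "value" column.
            node = {"value": node}

        node_uuid = str(uuid.uuid4())
        row = {"uuid": node_uuid, "parent_uuid": parent_uuid}
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                # Nested structures are deferred to their own table.
                queue.append((key, value, node_uuid))
            else:
                row[key] = value
        rows[table].append(row)

    return {name: pd.DataFrame(table_rows) for name, table_rows in rows.items()}


# Made-up sample data for demonstration only.
sample = {
    "name": "Acme",
    "address": {"city": "Pune"},
    "staff": [{"n": "a", "skills": ["x", "y"]}, {"n": "b"}],
}
frames = json_to_frames(sample)
for name, frame in frames.items():
    print(name)
    print(frame, end="\n\n")
```

Each child row's `parent_uuid` matches the `uuid` of the row it was nested under, so the frames can be joined back together later, for example after loading them as SQL tables. Note that keys with the same name at different depths would land in the same dataframe in this sketch.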

Level Order Traversal

Breadth First Search

User contributions licensed under: CC BY-SA