Nested JSON to Multiple Dataframe in Pandas

Question

I am trying to build a tool which can take any JSON data and convert that into multiple data frame based on data types. I am trying to add each data frame with a relation so that we can identify which data belong to which parent element(key). For Example : I wanted to have data frame such as And have

Accepted Answer

The structure you are describing &#8211; a JSON of an indefinitely defined number of nested JSONs &#8211; fits exactly with a tree data structure. Since we are looking to store the ID of the parent JSON in each dataframe, we will approach this with BFS (breadth first search) a.k.a. level order traversal. This is a common graph traversal algorithm well suited to this kind of problem.If a element has an id of None, it indicates it is the root or top level element.import jsonimport pandas as pdimport uuiddef nested_json_list_df(file : str):    dict_list = []    def bfs(json_dict : dict, node_name : str, parent_uuid : uuid.UUID):        """Breadth First Search a.k.a. Level order Traversal        """        # Create parent node        uuid_val = uuid.uuid1()        out_dict = {'id': node_name, 'uuid': uuid_val, 'parent id': parent_uuid}        # Search child nodes        for key, val in json_dict.items():            # If a child node is a dict itself, it is a sub-nested JSON            if isinstance(val, dict):                bfs(val, key, uuid_val)            # A list of single-nested dicts is simply a new entry            # A list containing dicts within dicts is interpreted            # as another nested JSON            elif isinstance(val, list):                new_val = []                for v in val:                    if isinstance(v, dict):                        new_dict = dict()                        for key2, val2 in v.items():                            # Indicates nested JSONs                            if isinstance(val2, dict):                                bfs(val2, key, uuid_val)                            else:                                new_dict[key2] = val2                        new_val.append(new_dict)                    else:                        new_val.append(v)                uuid2 = uuid.uuid1()                out_dict2 = {'id': key, 'uuid': uuid2, 'parent id': parent_uuid, key : new_val}                dict_list.append({key : out_dict2})            else:                out_dict[key] = val                dict_list.append({node_name : out_dict})        return dict_list        ## Run BFS ##    with open(file) as f:        json_dict = json.load(f)    df_list = []    for d in bfs(json_dict, file, None):        df_list.append(pd.DataFrame(d))    return df_listdf_list = nested_json_list_df('temp.json')for df in df_list:     print(df)Output:                                                    StudentsStudents   [{'id': ',some id', 'value': 'some val'}, {'id...id                                                  Studentsparent id                                               Noneuuid                    2d68cce3-c7f7-11ec-81a3-b0227ae68aa0                                                       ErrorError      [{'id': ',some id', 'code': 'some code', 'emes...id                                                     Errorparent id                                               Noneuuid                    2d68cce4-c7f7-11ec-b0da-b0227ae68aa0                                      temp.jsondate                                 12:23:2022id                                    temp.jsonname                                  Some nameparent id                                  Noneuuid       2d68cce2-c7f7-11ec-bcd8-b0227ae68aa0Level Order TraversalBreadth First Search

Advertisement

Answer