Convert a CSV into a JSON using a JSON Schema

Question

How do I convert a flat table into JSON? I have previously converted JSON into flat tables using both custom code and libraries. However, what I am aiming to do here is the reverse. Before going ahead and creating a custom library, I was wondering if anyone had encountered this problem before and if there was an existing solution.

Accepted Answer

I'm not entirely clear on why JSON schema would be needed for this, but if you wanted to, you could easily create a convenience function which can essentially "unflatten" the flat JSON that your CSV data would be mapped to, into a nested dictionary format as mentioned above.

The following example should demonstrate a simplified version of how this would work. Note the following two points:

- In the CSV header, I've corrected a typo and renamed one of the columns to address.city; previously, it was adress.city, which would result in it getting mapped to another JSON path under a separate adress key, which might not be desirable.
- I wasn't sure of the best way to handle this, but it looks like the csv module only allows a single-character delimiter; in the CSV file, it looks like you have a comma and a space (', ') as the separator, so I've just replaced all occurrences of this with a single comma (',') so that the split on the delimiter works as expected.

    from csv import DictReader
    from io import StringIO
    from typing import Any

    csv_data = StringIO("""user_id, address.city, address.street, address.number, name, aka, contacts.name, contacts.relationship
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Rick, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Grandpa, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Albert Ein-douche, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Richard, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Rick, Beth, Daughter
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Grandpa, Beth, Daughter
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Albert Ein-douche, Beth, Daughter
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Richard, Beth, Daughter""".replace(', ', ','))


    def unflatten_json(json_dict: dict):
        """Unflatten a JSON dictionary object, with keys like 'a.b.c'"""
        result_dict = {}

        for k, v in json_dict.items():
            *nested_parts, field_name = k.split('.')

            obj = result_dict
            for p in nested_parts:
                obj = obj.setdefault(p, {})

            obj[field_name] = v

        return result_dict


    def main():
        reader = DictReader(csv_data)

        flat_json: list[dict[str, Any]] = list(reader)
        first_obj = flat_json[0]

        nested_dict = unflatten_json(first_obj)

        print('Flat JSON:   ', first_obj)
        print('Nested JSON: ', nested_dict)


    if __name__ == '__main__':
        main()

The output is given below:

    Flat JSON:    {'user_id': '1', 'address.city': 'Seattle', 'address.street': 'Atomic Street', 'address.number': '6910', 'name': 'Rick Sanchez', 'aka': 'Rick', 'contacts.name': 'Morty', 'contacts.relationship': 'Grandson'}
    Nested JSON:  {'user_id': '1', 'address': {'city': 'Seattle', 'street': 'Atomic Street', 'number': '6910'}, 'name': 'Rick Sanchez', 'aka': 'Rick', 'contacts': {'name': 'Morty', 'relationship': 'Grandson'}}

Note, if you want to unflatten all JSON dictionary objects in the list, you could instead use a list comprehension as below:

    result_list = [unflatten_json(d) for d in flat_json]

I would also point out that the above solution isn't perfect, as it will pass in everything as string values, for example in the case of 'user_id': '1'. To work around that, you can modify the unflatten_json function so it is like below:

    ...
    for k, v in json_dict.items():
        ...
        try:
            v = int(v)
        except ValueError:
            pass
        obj[field_name] = v

Now the unflattened JSON object should be as below. Note that I'm pretty printing it with json.dumps(nested_dict, indent=2) so it's a little easier to see.

    {
      "user_id": 1,
      "address": {
        "city": "Seattle",
        "street": "Atomic Street",
        "number": 6910
      },
      "name": "Rick Sanchez",
      "aka": "Rick",
      "contacts": {
        "name": "Morty",
        "relationship": "Grandson"
      }
    }

Complete Solution

The full solution to achieve the desired output (data for all rows appended to aka and contacts) is provided below:

    from csv import DictReader
    from io import StringIO
    from pprint import pprint

    csv_data = StringIO("""user_id, address.city, address.street, address.number, name, aka, contacts.name, contacts.relationship
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Rick, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Grandpa, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Albert Ein-douche, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Richard, Morty, Grandson
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Rick, Beth, Daughter
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Grandpa, Beth, Daughter
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Albert Ein-douche, Beth, Daughter
    1, Seattle, Atomic Street, 6910, Rick Sanchez, Richard, Beth, Daughter""".replace(', ', ','))


    def unflatten_json(json_dict: dict[str, str]):
        """Unflatten a JSON dictionary object, with keys like 'a.b.c'"""
        result_dict = {}

        for k, v in json_dict.items():
            *nested_parts, field_name = k.split('.')

            obj = result_dict
            for p in nested_parts:
                obj = obj.setdefault(p, {})

            obj[field_name] = int(v) if v.isnumeric() else v

        return result_dict


    def main():
        reader = DictReader(csv_data)
        rows = list(map(unflatten_json, reader))

        # retrieve the first element in the (unflattened) sequence
        result_obj = rows[0]

        # define list fields that we want to merge data for
        list_fields = ('aka', 'contacts')

        # now loop through, and for all rows merge the data for these fields
        for field in list_fields:
            result_obj[field] = [row[field] for row in rows]

        print('Result object:')
        pprint(result_obj)


    if __name__ == '__main__':
        main()

This should have the desired result as also noted in the question:

    Result object:
    {'address': {'city': 'Seattle', 'number': 6910, 'street': 'Atomic Street'},
     'aka': ['Rick',
             'Grandpa',
             'Albert Ein-douche',
             'Richard',
             'Rick',
             'Grandpa',
             'Albert Ein-douche',
             'Richard'],
     'contacts': [{'name': 'Morty', 'relationship': 'Grandson'},
                  {'name': 'Morty', 'relationship': 'Grandson'},
                  {'name': 'Morty', 'relationship': 'Grandson'},
                  {'name': 'Morty', 'relationship': 'Grandson'},
                  {'name': 'Beth', 'relationship': 'Daughter'},
                  {'name': 'Beth', 'relationship': 'Daughter'},
                  {'name': 'Beth', 'relationship': 'Daughter'},
                  {'name': 'Beth', 'relationship': 'Daughter'}],
     'name': 'Rick Sanchez',
     'user_id': 1}
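
The complete solution merges every row into the first object, which works when the file holds a single user but repeats values in aka and contacts and would mix users together if there were several. As a rough sketch of one way to handle that, the variant below groups rows by user_id and keeps only distinct values. The sample data (including the second user), the group_rows helper and its key_field and list_fields parameters are all my own assumptions for illustration, not something from the question. It also passes skipinitialspace=True to DictReader, which is a standard csv option that ignores the space after each comma, so the replace(', ', ',') step is not needed here. I've used isdecimal() rather than isnumeric() because isnumeric() is also true for characters such as '½' that int() cannot parse.

```python
import json
from csv import DictReader
from io import StringIO

# Hypothetical sample data: two users, with repeated rows per user.
CSV_TEXT = """user_id, address.city, name, aka, contacts.name, contacts.relationship
1, Seattle, Rick Sanchez, Rick, Morty, Grandson
1, Seattle, Rick Sanchez, Grandpa, Morty, Grandson
1, Seattle, Rick Sanchez, Rick, Beth, Daughter
2, Seattle, Morty Smith, Morty, Rick, Grandfather
"""


def unflatten(flat: dict[str, str]) -> dict:
    """Turn keys like 'a.b.c' into nested dictionaries."""
    result: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split('.')
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        # isdecimal() is True only for non-empty strings of decimal digits
        node[leaf] = int(value) if value.isdecimal() else value
    return result


def group_rows(rows, key_field='user_id', list_fields=('aka', 'contacts')):
    """Merge rows sharing key_field, collecting distinct list_fields values."""
    grouped: dict = {}
    for row in map(unflatten, rows):
        key = row[key_field]
        if key not in grouped:
            # First row for this key: copy it and start each list field empty
            grouped[key] = {**row, **{f: [] for f in list_fields}}
        for field in list_fields:
            # Skip values already collected for this key
            if row[field] not in grouped[key][field]:
                grouped[key][field].append(row[field])
    return list(grouped.values())


# skipinitialspace=True drops the space that follows each comma
users = group_rows(DictReader(StringIO(CSV_TEXT), skipinitialspace=True))
print(json.dumps(users, indent=2))
```

The membership check on a list is linear, which is fine for small files; for large inputs you would probably want a different way of tracking values already seen.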
