
Convert a CSV into a JSON using a JSON Schema

How do I convert a flat table into a JSON?

I have previously converted JSONs into Flat Tables using both custom code and libraries. However, what I am aiming to do here is the reverse. Before going ahead and creating a custom library, I was wondering if anyone had encountered this problem before and if there was an existing solution to it.

When you flatten a JSON into a CSV, you lose the information about its structure, so to reverse the process you need a document that describes how the JSON should be built; ideally this would be the standardised JSON Schema.

The following example shows a source CSV, the JSON Schema and the expected output.

User CSV

JavaScript

JSON Schema

This follows the defined standard with the addition of the “source” property. I am suggesting this custom property for this specific problem in order to map between the CSV columns and the JSON values (leaves).

JavaScript

Expected JSON

JavaScript

From the above we see that although there are 8 rows in the CSV, we are producing a single JSON Object (instead of 8) since there is only one unique user (user_id = 1). This could be inferred from the JSON Schema where the root element is an object and not a list.
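One possible reading of the “source” idea can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the schema, the column names and the `build` helper are all hypothetical, every row is assumed to belong to the same entity (the root is an object), and array items are de-duplicated across rows.

```python
# Hypothetical schema: a JSON Schema-like dict whose leaves carry the
# custom "source" property naming the CSV column they are read from.
schema = {
    "type": "object",
    "properties": {
        "user_id": {"type": "integer", "source": "user_id"},
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "source": "address.city"},
            },
        },
        "aka": {"type": "array", "items": {"type": "string", "source": "aka"}},
    },
}

# Map JSON Schema leaf types to Python casts (only the ones used here).
CASTS = {"integer": int, "number": float, "string": str}


def build(node, rows):
    """Build a JSON value for `node` from flat CSV rows (dicts of strings)."""
    kind = node["type"]
    if kind == "object":
        return {name: build(child, rows)
                for name, child in node["properties"].items()}
    if kind == "array":
        items = []
        for row in rows:
            item = build(node["items"], [row])
            if item not in items:  # assumption: repeated values collapse
                items.append(item)
        return items
    # Leaf: read the mapped column from the first row and cast it.
    return CASTS[kind](rows[0][node["source"]])


# Hypothetical flat rows, as csv.DictReader would produce them.
rows = [
    {"user_id": "1", "address.city": "London", "aka": "Al"},
    {"user_id": "1", "address.city": "London", "aka": "Ali"},
]
result = build(schema, rows)
print(result)
```

Because the root node is an object, all rows collapse into a single result; a root of type array would instead yield one item per distinct row.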

If we did not specify a JSON Schema or another kind of mapping, we could simply assume no structure and create 8 flat JSON objects, as below:

JavaScript
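With Python's csv module, a structure-free conversion of this kind could look something like the sketch below. The sample data and column names are illustrative assumptions, not the CSV from the question.

```python
import csv
import io
import json

# Hypothetical sample data; the real CSV from the question is not reproduced.
csv_text = """user_id,name,address.city
1,Alice,London
1,Ali,London
"""

# DictReader maps every data row to a dict keyed by the header row,
# so each CSV row becomes one flat JSON object.
flat_rows = list(csv.DictReader(io.StringIO(csv_text)))

print(json.dumps(flat_rows, indent=2))
```

Every value stays a string and dotted column names such as address.city stay as plain keys, which is exactly the structure information that is missing.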

I am adding the Python tag since that is the language I use mostly, but in this case, the solution doesn’t need to be in Python.


Answer

I’m not entirely clear on why JSON Schema would be needed for this, but if you wanted to, you could easily create a convenience function that “unflattens” the flat JSON your CSV data is mapped to into the nested dictionary format mentioned above.

The following is a simplified example of how this would work. Note the following two points:

  • In the CSV header, I’ve corrected a typo and renamed one of the columns to address.city; previously, it was adress.city, which would result in it getting mapped to another JSON path under a separate adress key, which might not be desirable.

  • I wasn’t sure of the best way to handle this, but it looks like the csv module only allows a single-character delimiter. Your CSV file appears to use a comma followed by a space (", ") as the separator, so I’ve replaced all occurrences of this with a single comma (",") so that the split on the delimiter works as expected.

JavaScript
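A minimal sketch of such an unflatten helper is shown here, assuming dot-separated column names such as address.city. The name unflatten_json comes from this answer, but the body, the `sep` parameter and the sample data are assumptions for illustration.

```python
import csv
import io


def unflatten_json(flat: dict, sep: str = ".") -> dict:
    """Turn {'a.b': 'x'} into {'a': {'b': 'x'}} by splitting keys on `sep`."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            # Walk down, creating intermediate dicts as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical data that uses ", " (comma plus space) as the separator.
raw = "user_id, name, address.city\n1, Alice, London\n"

# Normalise the two-character separator to a single comma before parsing.
reader = csv.DictReader(io.StringIO(raw.replace(", ", ",")))
first_row = next(reader)
nested = unflatten_json(first_row)
print(nested)
```

The blanket replace would also alter any quoted field that contains a comma followed by a space. The csv module's skipinitialspace=True option, which ignores whitespace directly after the delimiter, may be a safer alternative in that case.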

The output is given below:

JavaScript

Note, if you want to unflatten all JSON dictionary objects in the list, you could instead use a list comprehension as below:

JavaScript
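As a sketch, the list comprehension could look like this. The helper is repeated so the snippet runs on its own, and the flat rows are hypothetical stand-ins for what csv.DictReader would yield.

```python
def unflatten_json(flat: dict, sep: str = ".") -> dict:
    """Nest a flat dict by splitting its keys on `sep` (assumed helper)."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical flat rows.
flat_rows = [
    {"user_id": "1", "address.city": "London"},
    {"user_id": "1", "address.city": "Paris"},
]

# One nested dict per flat row, in the same order.
nested_rows = [unflatten_json(row) for row in flat_rows]
print(nested_rows)
```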

I would also point out that the above solution isn’t perfect, as it will pass in everything as string values, for example in the case of 'user_id': '1'. To work around that, you can modify the unflatten_json function so it is like below:

JavaScript
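One way to do that conversion is sketched below. The `convert` helper is a hypothetical name, and the "try int, then float, else keep the string" rule is an assumption about what the desired typing is.

```python
import json


def convert(value: str):
    """Best-effort cast: int first, then float, otherwise the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def unflatten_json(flat: dict, sep: str = ".") -> dict:
    """Nest a flat dict by splitting keys on `sep`, converting leaf values."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = convert(value)
    return nested


# Hypothetical flat row with string values, as read from a CSV.
nested_dict = unflatten_json({"user_id": "1", "address.city": "London"})
print(json.dumps(nested_dict, indent=2))
```

Blind numeric conversion has a cost: identifiers with leading zeros, such as postal codes, would lose them, so a per-column rule may be preferable for real data.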

Now the unflattened JSON object should be as below. Note that I’m pretty printing it with json.dumps(nested_dict, indent=2) so it’s a little easier to see.

JavaScript

Complete Solution

The full solution to achieve the desired output (data for all rows appended to aka and contacts) is provided below:

JavaScript
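A rough sketch of this merging step is given below. It assumes that rows are grouped by user_id, that aka and contacts are the only list-valued keys, and that repeated entries should be dropped; the column names, the sample rows and the `merge_rows` helper are hypothetical.

```python
def unflatten_json(flat: dict, sep: str = ".") -> dict:
    """Nest a flat dict by splitting its keys on `sep` (assumed helper)."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Assumption: these top-level keys collect one entry per CSV row.
LIST_KEYS = ("aka", "contacts")


def merge_rows(flat_rows, id_key="user_id"):
    """Merge rows that share `id_key` into one nested object per user."""
    users = {}
    for row in flat_rows:
        nested = unflatten_json(row)
        user = users.setdefault(nested[id_key], {})
        for key, value in nested.items():
            if key in LIST_KEYS:
                items = user.setdefault(key, [])
                if value not in items:  # skip entries already collected
                    items.append(value)
            else:
                user.setdefault(key, value)  # keep the first value seen
    return list(users.values())


# Hypothetical rows: two CSV lines describing the same user.
flat_rows = [
    {"user_id": "1", "name": "Alice", "aka": "Al",
     "contacts.type": "email", "contacts.value": "a@example.com"},
    {"user_id": "1", "name": "Alice", "aka": "Ali",
     "contacts.type": "phone", "contacts.value": "555-0100"},
]
merged = merge_rows(flat_rows)
print(merged)
```

With a single distinct user_id, the result is a list holding one object, whatever the number of input rows.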

This should produce the desired result, as given in the question:

JavaScript
User contributions licensed under: CC BY-SA