
Search in JSON with variable depth and structure

I have some highly nested JSON files I need to work with.

A short example:

{
   "coffee":[
      {
         "value":"coffee"
      },
      {
         "value":"water"
      }
   ],
   "cake":{
      "value":{
         "dough":[
            {
               "value":"2",
               "name":"eggs"
            },
            {
               "value":"500g",
               "name":"flour"
            },
            {
               "value":{
                  "almondpaste":[
                     {
                        "value":"300g",
                        "name":"almonds"
                     },
                     {
                        "value":"200g",
                        "name":"oil"
                     },
                     {
                        "value":"200g",
                        "name":"sugar"
                     }
                  ]
               }
            },
            {
               "value":"200g",
               "name":"sugar"
            },
            ...

I would now like to read all "name" values from the JSON file and write them into a list. This is not particularly difficult if the JSON file has a fixed structure. However, my JSON files vary in both structure and depth: sometimes everything is on one level, but some files go four or five levels deep. I would like a generic solution that walks through every level of the JSON and searches for certain keys.

I have tried something along the following lines, but I keep getting errors:

list = []
for k for val in json_file for d in val for j in d.keys():
    if k== "name":
        list.append(k['name'])
    if d=="name":
        list.append(k['name'])
    if j=="name":
        list.append(k['name'])
print(list)

Error:

for k for val in json_file for d in val for j in d.keys():
              ^
    SyntaxError: invalid syntax

Does anyone have a code sample that solves this, or one I could use as a starting point for my own solution?

Answer

You can define this recursive generator function:

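def iterate(data):
    # Recursively walk lists and dicts and yield every value stored under "name"
    if isinstance(data, list):
        for item in data:
            yield from iterate(item)
    elif isinstance(data, dict):
        for key, item in data.items():
            if key == 'name':
                yield item
            else:
                # Not the key we want: descend into its value
                yield from iterate(item)
    # Strings, numbers, booleans and None are leaves and yield nothing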

Then use it like this, where data is your parsed JSON data (the nested dicts and lists, not the raw file):

result = list(iterate(data))
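If the JSON lives in a file, it has to be parsed into Python dicts and lists first, for example with json.load from the standard library. The sketch below assumes a UTF-8 encoded file; the helper names_from_file and the key parameter are hypothetical additions, the latter so you can search for a key other than "name".

```python
import json


def iterate(data, key="name"):
    """Recursively yield every value stored under `key` in nested lists/dicts."""
    if isinstance(data, list):
        for item in data:
            yield from iterate(item, key)
    elif isinstance(data, dict):
        for k, item in data.items():
            if k == key:
                # Matched values are yielded as-is and not searched further
                yield item
            else:
                yield from iterate(item, key)
    # Anything else (str, int, float, bool, None) is a leaf and yields nothing.


def names_from_file(path, key="name"):
    """Hypothetical helper: parse the JSON file at `path` and collect `key` values."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # nested dicts, lists, strings, numbers, ...
    return list(iterate(data, key))
```

Usage would be something like names_from_file("recipe.json"), where "recipe.json" is a placeholder for your own file path.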

Let’s do an example. This is your input data:

>>> data
{'coffee': [{'value': 'coffee'}, {'value': 'water'}], 'cake': {'value': {'dough': [{'value': '2', 'name': 'eggs'}, {'value': '500g', 'name': 'flour'}, {'value': {'almondpaste': [{'value': '300g', 'name': 'almonds'}, {'value': '200g', 'name': 'oil'}, {'value': '200g', 'name': 'sugar'}]}}, {'value': '200g', 'name': 'sugar'}]}}}

Here is the output:

>>> list(iterate(data))
['eggs', 'flour', 'almonds', 'oil', 'sugar', 'sugar']
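With only four or five levels of nesting, recursion is no problem: CPython's default recursion limit is 1000 (see sys.getrecursionlimit()). If you ever need to handle much deeper documents, the same traversal can be written with an explicit stack instead. The function name find_values and its key parameter are illustrative, not part of any library. The stack holds (is_match, node) pairs so that matches are emitted in the same order the recursive version produces them.

```python
from typing import Any, List


def find_values(data: Any, key: str = "name") -> List[Any]:
    """Collect every value stored under `key` without recursion."""
    found = []
    # Each entry is (is_match, node): matched values are emitted when popped,
    # other nodes are expanded, so results keep document order.
    stack = [(False, data)]
    while stack:
        is_match, node = stack.pop()
        if is_match:
            found.append(node)
        elif isinstance(node, list):
            # Push in reverse so the first item is popped first.
            stack.extend((False, item) for item in reversed(node))
        elif isinstance(node, dict):
            stack.extend((k == key, v) for k, v in reversed(list(node.items())))
        # Other types are leaves and are simply dropped.
    return found
```

For the example data above, find_values(data) returns the same list as list(iterate(data)).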