Iterating through nested dictionaries and finding keywords in the dictionary values in Python

I have data in the format below.

data = {"policy": {"1": {"ID": "ML_0", "URL": "www.a.com", "Text": "my name is Martin and here is my code"}, "2": {"ID": "ML_1", "URL": "www.b.com", "Plain_Text": "my name is Mikal and here is my code"}}}


keywords = ['is', 'my']

Here are a few things I want to do with my data in Python.

First, I want to iterate over the dictionary and count how often each of the keywords above occurs in the text value of both "1" and "2". Then I want to update the dictionary with those keyword counts (the number of times each keyword occurs in "1" and "2"), like below.

{"policy": {"1": {"ID": "ML_0", "URL": "www.a.com", "Text": "my name is Martin and here is my code", "is": "2", "my": "2"}, "2": {"ID": "ML_1", "URL": "www.b.com", "Plain_Text": "my name is Mikal and here is my code", "is": "2", "my": "2"}}}

If anyone can help me, I would be thankful.

Answer

You could use collections.Counter:

from collections import Counter
import json  # Only for pretty printing `data` dictionary.


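def get_keyword_counts(text: str, keywords: list[str]) -> dict[str, int]:
    # Build the lookup set once instead of once per word.
    wanted = set(keywords)
    # Count whitespace-separated words, keeping only the keywords.
    return {
        word: count for word, count in Counter(text.split()).items()
        if word in wanted
    }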


def main() -> None:
    data = {
        "policy": {
            "1": {
                "ID": "ML_0",
                "URL": "www.a.com",
                "Text": "my name is Martin and here is my code"
            },
            "2": {
                "ID": "ML_1",
                "URL": "www.b.com",
                "Text": "my name is Mikal and here is my code"
            }
        }
    }
    keywords = ['is', 'my']
    for policy in data['policy'].values():
        policy |= get_keyword_counts(policy['Text'], keywords)
    print(json.dumps(data, indent=4))


if __name__ == '__main__':
    main()

Output:

{
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code",
            "my": 2,
            "is": 2
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Text": "my name is Mikal and here is my code",
            "my": 2,
            "is": 2
        }
    }
}
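
The data in the question stores the text of entry "2" under "Plain_Text" rather than "Text", so policy['Text'] would raise a KeyError on that entry. Below is a minimal sketch, assuming the text may live under either key. The names add_keyword_counts and TEXT_KEYS are hypothetical and not part of the answer above. Unlike the version above, it also stores a count of 0 for any keyword that never occurs.

```python
from collections import Counter

# Hypothetical: key names under which the text may be stored.
TEXT_KEYS = ("Text", "Plain_Text")


def add_keyword_counts(data: dict, keywords: list[str]) -> None:
    """Add one count per keyword to every entry under data["policy"], in place."""
    for entry in data["policy"].values():
        # Use the first text key that is present; fall back to an empty string.
        text = next((entry[key] for key in TEXT_KEYS if key in entry), "")
        counts = Counter(text.split())
        # A Counter returns 0 for missing words, so every keyword gets a value.
        entry.update({word: counts[word] for word in keywords})


data = {
    "policy": {
        "1": {"ID": "ML_0", "URL": "www.a.com",
              "Text": "my name is Martin and here is my code"},
        "2": {"ID": "ML_1", "URL": "www.b.com",
              "Plain_Text": "my name is Mikal and here is my code"},
    }
}
add_keyword_counts(data, ["is", "my"])
print(data["policy"]["2"])
```

If the counts need to be strings, as in the desired output in the question, wrapping counts[word] in str() should be enough.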

Note: Merging dicts with |= requires Python 3.9 or later, as do the list[str] and dict[str, int] annotations. On older versions, dict.update() performs the same in-place merge.
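
For interpreters that lack the |= operator for dicts, here is a small sketch of the same in-place merge using dict.update. The sample values are taken from the example above, and the variable name keyword_counts is only illustrative.

```python
policy = {"ID": "ML_0", "URL": "www.a.com"}
keyword_counts = {"my": 2, "is": 2}

# Same effect as `policy |= keyword_counts`: the new keys are added in place.
policy.update(keyword_counts)
print(policy)
```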

User contributions licensed under: CC BY-SA