I have data in the format below.
data = {
    "policy": {
        "1": {"ID": "ML_0", "URL": "www.a.com", "Text": "my name is Martin and here is my code"},
        "2": {"ID": "ML_1", "URL": "www.b.com", "Plain_Text": "my name is Mikal and here is my code"}
    }
}
keywords = ['is', 'my']
Here are a few things I want to do with this data in Python.
First, I want to iterate over the dictionary and count how many times each of the keywords above occurs in the text value of both "1" and "2". Then I want to update the dictionary with those keyword counts (the number of times each keyword is mentioned in "1" and "2"), like below.
{
    "policy": {
        "1": {"ID": "ML_0", "URL": "www.a.com", "Text": "my name is Martin and here is my code", "is": "2", "my": "2"},
        "2": {"ID": "ML_1", "URL": "www.b.com", "Plain_Text": "my name is Mikal and here is my code", "is": "2", "my": "2"}
    }
}
I would be grateful for any help.
Answer
You could use collections.Counter:
from collections import Counter
import json  # Only for pretty printing `data` dictionary.


def get_keyword_counts(text: str, keywords: list[str]) -> dict[str, int]:
    # Build the lookup set once rather than once per counted word.
    wanted = set(keywords)
    # Count every whitespace-separated word, then keep only the keywords.
    return {
        word: count
        for word, count in Counter(text.split()).items()
        if word in wanted
    }


def main() -> None:
    data = {
        "policy": {
            "1": {
                "ID": "ML_0",
                "URL": "www.a.com",
                "Text": "my name is Martin and here is my code"
            },
            "2": {
                "ID": "ML_1",
                "URL": "www.b.com",
                "Text": "my name is Mikal and here is my code"
            }
        }
    }
    keywords = ['is', 'my']
    # Merge the keyword counts into each policy entry in place.
    for policy in data['policy'].values():
        policy |= get_keyword_counts(policy['Text'], keywords)
    print(json.dumps(data, indent=4))


if __name__ == '__main__':
    main()
Output:
{
    "policy": {
        "1": {
            "ID": "ML_0",
            "URL": "www.a.com",
            "Text": "my name is Martin and here is my code",
            "my": 2,
            "is": 2
        },
        "2": {
            "ID": "ML_1",
            "URL": "www.b.com",
            "Text": "my name is Mikal and here is my code",
            "my": 2,
            "is": 2
        }
    }
}
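The sample data in the question differs slightly from the data used in this answer: entry "2" stores its text under "Plain_Text" rather than "Text", and the desired counts are strings. If that is intentional, a variant along these lines might work. It is only a sketch: the `TEXT_KEYS` tuple and the `add_keyword_counts` helper are hypothetical names introduced here, and it assumes each entry has at most one of the two text keys.

```python
from collections import Counter

# Assumption: the text lives under one of these keys (taken from the question's sample).
TEXT_KEYS = ("Text", "Plain_Text")


def add_keyword_counts(entry: dict, keywords: list) -> None:
    """Add one string-valued count per keyword to `entry`, in place."""
    # Use the first text key that is present; fall back to an empty string.
    text = next((entry[key] for key in TEXT_KEYS if key in entry), "")
    counts = Counter(text.split())
    for word in keywords:
        # A Counter returns 0 for missing words, so absent keywords get "0".
        entry[word] = str(counts[word])


data = {
    "policy": {
        "1": {"ID": "ML_0", "URL": "www.a.com",
              "Text": "my name is Martin and here is my code"},
        "2": {"ID": "ML_1", "URL": "www.b.com",
              "Plain_Text": "my name is Mikal and here is my code"},
    }
}
keywords = ["is", "my"]

for policy in data["policy"].values():
    add_keyword_counts(policy, keywords)
```

Unlike the Counter comprehension above, this version also records a count for keywords that never occur, which may or may not be what you want.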
Note: Using |= to merge dicts is a Python 3.9 feature (PEP 584), as are the list[str] and dict[str, int] type hints. On older versions, policy.update(get_keyword_counts(policy['Text'], keywords)) performs the same in-place merge.
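For older Python 3 versions, the same idea can be sketched with `dict.update` and the `typing` aliases. This is a minimal variant rather than a drop-in copy of the code above: it looks each keyword up in the Counter directly, so a keyword that never appears is stored with a count of 0 instead of being left out.

```python
from collections import Counter
from typing import Dict, List


def get_keyword_counts(text: str, keywords: List[str]) -> Dict[str, int]:
    # Count all whitespace-separated words once, then read off each keyword.
    counts = Counter(text.split())
    return {word: counts[word] for word in keywords}


data = {
    "policy": {
        "1": {"ID": "ML_0", "URL": "www.a.com",
              "Text": "my name is Martin and here is my code"},
        "2": {"ID": "ML_1", "URL": "www.b.com",
              "Text": "my name is Mikal and here is my code"},
    }
}
keywords = ["is", "my"]

for policy in data["policy"].values():
    # dict.update merges in place, like |= on newer versions.
    policy.update(get_keyword_counts(policy["Text"], keywords))
```

Note that `text.split()` only splits on whitespace, so a word followed directly by punctuation (for example "my,") would not be counted as "my".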