I am trying to compute semantic descriptors from a nested list and turn them into a nested dictionary. First I build distinct_words; each word in it will be a key of my final dictionary.
def build_semantic_descriptors(sentences):
    flat_list = [term for group in sentences for term in group]
    distinct_words = set(flat_list)
    d = {}
    for row in sentences:
        for words in row:
            if words not in d:
                d[words] = 1
            else:
                d[words] += 1

if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what",
          "ails", "me"]]
    print(build_semantic_descriptors(x))
EXPECTED OUTPUT: {'i': {'am': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1, 'believe': 1, 'my': 2, 'liver': 1, 'is': 1, 'diseased': 1, 'however': 1, 'know': 1, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'am': {'i': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1},
etc…}
This is my code so far. I already have the words I want as keys, but I don't know how to count the words related to each of them and put those counts into the final dictionary. I tried the counter above, but it only calculates the overall number of appearances of each word.
Thanks in advance for any help.
Answer
Try this:
from collections import defaultdict
from itertools import product

def build_semantic_descriptors(sentences):
    # Outer dict maps a word to an inner dict of co-occurrence counts.
    d = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Count each word once per sentence, keeping first-seen order.
        unique_words = list(dict.fromkeys(sentence))
        for key, word in product(unique_words, unique_words):
            if key == word:
                # A word is not counted as related to itself.
                continue
            d[key][word] += 1
    return d

if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what",
          "ails", "me"]]
    print(build_semantic_descriptors(x))
You need to loop over each sentence twice in order to pair each key with every word in that sentence. For this you can use itertools.product.
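To make the pairing concrete, here is a small sketch of what itertools.product yields for a single sentence. The two-word `sentence` is made up for illustration and is not taken from the question's data.

```python
from itertools import product

# Hypothetical two-word sentence, used only for illustration.
sentence = ["i", "am"]

# product(sentence, sentence) yields every ordered pair of words,
# including the pairs where a word is matched with itself.
pairs = list(product(sentence, sentence))
print(pairs)
# [('i', 'i'), ('i', 'am'), ('am', 'i'), ('am', 'am')]
```

The self-pairs such as ('i', 'i') are the ones that have to be skipped when counting related words.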
Also note that I use collections.defaultdict here, which you should read about. It is a handy utility that supplies a default value when a key does not exist, which lets you skip the membership check you had.
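As a rough comparison, the sketch below counts the same words with a plain dict and with defaultdict, then shows the nested form. The word list and the names `counts`, `auto_counts`, `nested` and `plain` are made up for illustration. The last step, converting to plain dicts, is an optional extra that may be useful if you want the printed result to look like an ordinary dictionary.

```python
from collections import defaultdict

words = ["i", "am", "i"]  # hypothetical input, for illustration only

# Plain dict: the key has to be checked before it can be incremented.
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 1
    else:
        counts[word] += 1

# defaultdict(int): a missing key starts at int(), which is 0.
auto_counts = defaultdict(int)
for word in words:
    auto_counts[word] += 1

# Nested form: a missing outer key gets a fresh inner defaultdict(int).
nested = defaultdict(lambda: defaultdict(int))
nested["i"]["am"] += 1

# Optional: convert to plain dicts so printing shows ordinary dictionaries.
plain = {key: dict(inner) for key, inner in nested.items()}

print(counts)             # {'i': 2, 'am': 1}
print(dict(auto_counts))  # {'i': 2, 'am': 1}
print(plain)              # {'i': {'am': 1}}
```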