Calculating the semantic descriptor of a nested list

I am trying to calculate the semantic descriptors of a nested list and store them in a nested dictionary. First I built distinct_words; each word in it will be a key of my final dictionary.

def build_semantic_descriptors(sentences):
    flat_list = [term for group in sentences for term in group]
    distinct_words = set(flat_list)

    # Counts how often each word appears overall, not per related word.
    d = {}
    for row in sentences:
        for words in row:
            if words not in d:
                d[words] = 1
            else:
                d[words] += 1
    return d


if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
    print(build_semantic_descriptors(x))

EXPECTED OUTPUT: {'i': {'am': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1, 'believe': 1, 'my': 2, 'liver': 1, 'is': 1, 'diseased': 1, 'however': 1, 'know': 1, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'am': {'i': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1}, etc…}

This is my code so far. I already have the words I want as keys, but I don't know how to count the words related to each of them and put those counts into the final dictionary. I tried the counter above, but it only calculates the overall number of appearances of each word.

Thanks in advance for any help.

Answer

Try this:

from collections import defaultdict
from itertools import product


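def build_semantic_descriptors(sentences):
    d = defaultdict(lambda: defaultdict(int))

    for sentence in sentences:
        # Drop repeated words (order preserved) so that each pair is counted
        # once per sentence, as in the expected output ('know' counts once).
        unique_words = list(dict.fromkeys(sentence))
        for key, word in product(unique_words, repeat=2):
            if key == word:
                # A word is not part of its own descriptor.
                continue
            d[key][word] += 1
    # Convert to plain dicts so the result prints like the expected output.
    return {key: dict(counts) for key, counts in d.items()}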


if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
          ["i", "am", "a", "spiteful", "man"],
          ["i", "am", "an", "unattractive", "man"],
          ["i", "believe", "my", "liver", "is", "diseased"],
          ["however", "i", "know", "nothing", "at", "all", "about", "my",
           "disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
    print(build_semantic_descriptors(x))

You need to loop over each sentence twice, so that every word in it is paired with every key from the same sentence. itertools.product does this in a single loop.
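
To see what that pairing looks like, here is a small sketch on a made-up three-word sentence (the sample sentence is only for illustration, it is not from the question). It also shows that itertools.product gives the same pairs as two nested loops, including each word paired with itself:

```python
from itertools import product

sentence = ["i", "am", "sick"]  # illustrative input only

# Every ordered (key, word) pair from the sentence, self-pairs included.
# product(sentence, repeat=2) would produce the same pairs.
pairs = list(product(sentence, sentence))
print(len(pairs))   # 9
print(pairs[:4])    # [('i', 'i'), ('i', 'am'), ('i', 'sick'), ('am', 'i')]

# The same pairs written as two nested loops.
nested = [(key, word) for key in sentence for word in sentence]
print(pairs == nested)  # True
```

Because the self-pairs such as ('i', 'i') are part of the product, they have to be filtered out if a word should not appear in its own descriptor.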

Also note that I use collections.defaultdict here, which is worth reading about. It is a handy utility that supplies a default value when a key does not exist, so you can skip the membership check you had.
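
As a rough side-by-side, assuming a short made-up word list, this is the same count written with a plain dict and with defaultdict(int), followed by the nested form used for descriptors:

```python
from collections import defaultdict

words = ["i", "am", "i", "man"]  # illustrative input only

# Plain dict: the key has to be checked before it can be incremented.
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 0
    plain[word] += 1

# defaultdict(int): a missing key starts at int(), which is 0.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

print(dict(counts))  # {'i': 2, 'am': 1, 'man': 1}

# Nested version: each missing outer key gets its own defaultdict(int).
nested = defaultdict(lambda: defaultdict(int))
nested["i"]["am"] += 1
print(nested["i"]["am"])  # 1
```

One thing to keep in mind is that merely reading a missing key from a defaultdict creates that key with the default value, so use `in` or `.get()` when you only want to look something up.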

User contributions licensed under: CC BY-SA