
Dictionary with words as keys and the sentences they appear in as values

I have a text that I split into a list of unique words using set. I have also split the text into a list of sentences, and then split that list of sentences into a list of lists holding the words of each sentence (maybe I don't need this last step).

text = 'i was hungry. i got food. now i am not hungry i am full'

sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']

words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am', 'not','hungry','i','am','full']]
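For reference, lists of this shape can be built from text with plain string methods. This is a minimal sketch that assumes '.' is the only sentence delimiter and that words are separated by whitespace. It uses dict.fromkeys instead of set only to keep the word order stable; a set holds the same words in an arbitrary order.

```python
text = 'i was hungry. i got food. now i am not hungry i am full'

# Split on '.' and drop empty or whitespace-only pieces.
sents = [s.strip() for s in text.split('.') if s.strip()]

# One list of words per sentence.
split_sents = [s.split() for s in sents]

# Unique words in first-seen order (same members as a set would have).
words = list(dict.fromkeys(w for sent in split_sents for w in sent))

print(sents)
print(words)
```

Splitting on '.' gives three sentences, the last one being 'now i am not hungry i am full'. That matches split_sents above, while the four-item sents list appears to have been split by hand.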

I want to write a loop or a list comprehension that builds a dictionary in which each word in words is a key and the value is a list of every sentence that word appears in. From that I'd like to compute some statistics for each word, such as the number of sentences it appears in and the average length of those sentences. So far I have the following, but it's not right:

word_freq = {}
for sent in split_sents:
  for word in words:
    if word in sent:
      word_freq[word] += sent
    else:
      word_freq[word] = sent

It returns a dictionary of word keys and empty values. Ideally I'd like to do this without collections.Counter, though any solution is appreciated. I'm sure this question has been asked before, but I couldn't find the right solution, so feel free to close it if you link to one.
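The loop above has a few problems. As written, word_freq[word] += sent raises a KeyError the first time a word matches, because the key doesn't exist yet. The else branch stores a sentence under every word that is not in it. Even once a key exists, += concatenates the sentence's words onto the list instead of adding the sentence as one item. Below is a minimal sketch of a corrected loop that uses plain dicts (no collections) and assumes the split_sents and words lists from the question. It creates each key once with dict.setdefault and appends the whole sentence:

```python
split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'],
               ['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']]
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

word_freq = {}
for sent in split_sents:
    for word in words:
        if word in sent:
            # Create the list the first time the word is seen, then add the
            # whole sentence as a single item.
            word_freq.setdefault(word, []).append(sent)

print(word_freq['hungry'])
```

Words that never occur get no key at all, so word_freq.get(word, []) is a safe way to look them up.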


Answer

Here is an approach using a dictionary comprehension with a nested list comprehension:

Code:

text = 'i was hungry. i got food. now i am not hungry i am full'

sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']

words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# For each word, keep every sentence whose list of words contains it.
# s.split() makes this a whole-word match rather than a substring match.
word_freq = {w: [s for s in sents if w in s.split()] for w in words}

print(word_freq)

Output:

{
'i': ['i was hungry', 'i got food', 'now i am', 'not hungry i am full'],
'was': ['i was hungry'],
'hungry': ['i was hungry', 'not hungry i am full'],
'got': ['i got food'],
'food': ['i got food'],
'now': ['now i am'],
'not': ['not hungry i am full'],
'am': ['now i am', 'not hungry i am full'],
'full': ['not hungry i am full']
}
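The comprehension re-splits every sentence once per word, so it performs len(words) × len(sents) membership checks. If that becomes slow on a larger text, one alternative is to build the same mapping in a single pass over the sentences, splitting each sentence only once. This is a sketch of a different technique, not part of the original answer:

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']

word_freq = {}
for s in sents:
    # dict.fromkeys drops repeated words within the sentence (keeping order),
    # so a sentence is listed only once per word even if the word repeats.
    for w in dict.fromkeys(s.split()):
        word_freq.setdefault(w, []).append(s)

print(word_freq)
```

Unlike the comprehension, this creates keys only for words that actually occur, and the keys come out in first-seen order. Because dict equality ignores key order, the result for the data above compares equal to the comprehension's output.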

Or, if you want each sentence in the output as a list of words:

word_freq = {w: [s.split() for s in sents if w in s.split()] for w in words}

Output:

{
'i': [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']], 
'was': [['i', 'was', 'hungry']], 
'hungry': [['i', 'was', 'hungry'], ['not', 'hungry', 'i', 'am', 'full']], 
'got': [['i', 'got', 'food']], 
'food': [['i', 'got', 'food']], 
'now': [['now', 'i', 'am']], 
'not': [['not', 'hungry', 'i', 'am', 'full']], 
'am': [['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']], 
'full': [['not', 'hungry', 'i', 'am', 'full']]
}
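Since the question's end goal is per-word statistics, here is a sketch of how the sentence count and the average sentence length (in words) could be read off the list-of-words version. The name stats and the (count, average) tuple layout are illustrative choices, not anything from the question:

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

word_freq = {w: [s.split() for s in sents if w in s.split()] for w in words}

# word -> (number of sentences containing it, mean sentence length in words)
stats = {}
for w, sent_lists in word_freq.items():
    lengths = [len(sent) for sent in sent_lists]
    # Guard against words with no sentences to avoid dividing by zero.
    avg = sum(lengths) / len(lengths) if lengths else 0.0
    stats[w] = (len(lengths), avg)

print(stats['hungry'])  # (2, 4.0)
```

For 'hungry' the two sentences have 3 and 5 words, which gives a count of 2 and an average of 4.0.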

User contributions licensed under: CC BY-SA