I have a text that I split into a list of unique words using set. I have also split the text into a list of sentences, and then split that list into a list of lists of the words in each sentence (maybe I don’t need that last step).
text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']
split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am', 'not','hungry','i','am','full']]
I want to write a loop or a list comprehension that builds a dictionary where each word in words is a key and the value is a list of every sentence that word appears in. That way I can get statistics for each word, such as the number of sentences it appears in and the average length of those sentences. So far I have the following, but it’s not right.
word_freq = {}
for sent in split_sents:
    for word in words:
        if word in sent:
            word_freq[word] += sent
        else:
            word_freq[word] = sent
It returns a dictionary of word keys and empty values. Ideally, I’d like to do it without collections/Counter, though any solution is appreciated. I’m sure this question has been asked before, but I couldn’t find the right solution, so feel free to link to one and close this.
Answer
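Here is an approach using a list comprehension inside a dictionary comprehension. Splitting each sentence with s.split() makes the membership test match whole words rather than substrings.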
Code:
text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']
word_freq = {w:[s for s in sents if w in s.split()] for w in words }
print(word_freq)
Output:
{
'i': ['i was hungry', 'i got food', 'now i am', 'not hungry i am full'],
'was': ['i was hungry'],
'hungry': ['i was hungry', 'not hungry i am full'],
'got': ['i got food'],
'food': ['i got food'],
'now': ['now i am'],
'not': ['not hungry i am full'],
'am': ['now i am', 'not hungry i am full'],
'full': ['not hungry i am full']
}
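Since the question asked for a loop, here is a rough plain-loop equivalent of the comprehension above. It is only a sketch: it assumes the same sents and words lists, and it pre-fills every key with an empty list so that appending never hits a missing key. The tokens name is introduced here for illustration.

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# Start every word with an empty list so no key is ever missing.
word_freq = {w: [] for w in words}

for s in sents:
    # Split once per sentence; a set gives fast whole-word membership tests.
    tokens = set(s.split())
    for w in words:
        if w in tokens:
            word_freq[w].append(s)

print(word_freq['hungry'])  # ['i was hungry', 'not hungry i am full']
```

This should give the same dictionary as the comprehension, while splitting each sentence only once instead of once per word.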
Or, if you want each matching sentence as a list of words:
word_freq = {w:[s.split() for s in sents if w in s.split()] for w in words }
Output:
{
'i': [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']],
'was': [['i', 'was', 'hungry']],
'hungry': [['i', 'was', 'hungry'], ['not', 'hungry', 'i', 'am', 'full']],
'got': [['i', 'got', 'food']],
'food': [['i', 'got', 'food']],
'now': [['now', 'i', 'am']],
'not': [['not', 'hungry', 'i', 'am', 'full']],
'am': [['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']],
'full': [['not', 'hungry', 'i', 'am', 'full']]
}
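For the statistics mentioned in the question (sentence count and average sentence length per word), one possible follow-up on the list-of-words version is sketched below. The stats dictionary and its key names sentence_count and avg_sentence_length are made up for this example, and sentence length is assumed to mean the number of words.

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# Same mapping as above: word -> list of matching sentences, each as a word list.
word_freq = {w: [s.split() for s in sents if w in s.split()] for w in words}

stats = {}
for w, matched in word_freq.items():
    count = len(matched)
    # Guard against a word that matches no sentence to avoid dividing by zero.
    avg_len = sum(len(m) for m in matched) / count if count else 0.0
    stats[w] = {'sentence_count': count, 'avg_sentence_length': avg_len}

print(stats['i'])       # {'sentence_count': 4, 'avg_sentence_length': 3.5}
print(stats['hungry'])  # {'sentence_count': 2, 'avg_sentence_length': 4.0}
```

No collections or Counter import is needed here; len and sum over the stored lists are enough.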