I have a text that I split into a list of unique words using set. I have also split the text into a list of sentences, and then split that list into a list of lists of the words in each sentence (maybe I don’t need that last step).
text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']
split_sents = [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am', 'not', 'hungry', 'i', 'am', 'full']]
I want to write a loop or a list comprehension that builds a dictionary in which each word in words is a key and the value is a list of the sentences that contain that word. From that I can compute statistics such as the number of sentences per word and the average length of those sentences. So far I have the following, but it’s not right.
word_freq = {}
for sent in split_sents:
    for word in words:
        if word in sent:
            word_freq[word] += sent
        else:
            word_freq[word] = sent
It returns a dictionary of word keys and empty values. Ideally, I’d like to do it without collections/Counter, though any solution is appreciated. I’m sure this question has been asked before, but I couldn’t find the right solution, so feel free to link to one and close this question.
Answer
Here is an approach using a list comprehension inside a dictionary comprehension.
Code:
text = 'i was hungry. i got food. now i am not hungry i am full'
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']
word_freq = {w: [s for s in sents if w in s.split()] for w in words}
print(word_freq)
Output:
{
    'i': ['i was hungry', 'i got food', 'now i am', 'not hungry i am full'],
    'was': ['i was hungry'],
    'hungry': ['i was hungry', 'not hungry i am full'],
    'got': ['i got food'],
    'food': ['i got food'],
    'now': ['now i am'],
    'not': ['not hungry i am full'],
    'am': ['now i am', 'not hungry i am full'],
    'full': ['not hungry i am full']
}
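For comparison with the loop in the question, the same dictionary of sentence strings can also be built with plain nested loops and no imports. This is only a sketch: it gives every word an empty list up front, so appending never fails on a missing key, which is one thing the loop in the question does not do. The name `word_sents` is made up for this example.

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# Start every word with an empty list so append() never hits a missing key.
word_sents = {w: [] for w in words}

for s in sents:
    # Split once per sentence; a set makes the membership test cheap.
    tokens = set(s.split())
    for w in words:
        if w in tokens:
            word_sents[w].append(s)

print(word_sents['hungry'])  # ['i was hungry', 'not hungry i am full']
```

Splitting each sentence once, rather than once per word, is the only practical difference from the comprehension; the resulting dictionary is the same.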
Or if you want output sentences as list of words:
word_freq = {w:[s.split() for s in sents if w in s.split()] for w in words }
Output:
{
    'i': [['i', 'was', 'hungry'], ['i', 'got', 'food'], ['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']],
    'was': [['i', 'was', 'hungry']],
    'hungry': [['i', 'was', 'hungry'], ['not', 'hungry', 'i', 'am', 'full']],
    'got': [['i', 'got', 'food']],
    'food': [['i', 'got', 'food']],
    'now': [['now', 'i', 'am']],
    'not': [['not', 'hungry', 'i', 'am', 'full']],
    'am': [['now', 'i', 'am'], ['not', 'hungry', 'i', 'am', 'full']],
    'full': [['not', 'hungry', 'i', 'am', 'full']]
}
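The question also asks for per-word statistics, namely the number of sentences and their average length. A minimal sketch of one way to get them from the list-of-words form follows. The names `word_stats`, `count`, and `avg_len` are chosen for this example, and sentence length is assumed to mean the number of words.

```python
sents = ['i was hungry', 'i got food', 'now i am', 'not hungry i am full']
words = ['i', 'was', 'hungry', 'got', 'food', 'now', 'not', 'am', 'full']

# Map each word to the sentences (as word lists) that contain it.
word_freq = {w: [s.split() for s in sents if w in s.split()] for w in words}

word_stats = {}
for w, matches in word_freq.items():
    count = len(matches)
    # Guard against division by zero for a word that matches no sentence.
    avg_len = sum(len(m) for m in matches) / count if count else 0.0
    word_stats[w] = {'count': count, 'avg_len': avg_len}

print(word_stats['hungry'])  # {'count': 2, 'avg_len': 4.0}
print(word_stats['i'])       # {'count': 4, 'avg_len': 3.5}
```

If length should be measured in characters instead, replacing `len(m)` with `len(' '.join(m))` would be one option.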