I am currently trying to solve this homework question.
My task is to implement a function that returns a vector of word counts in a given text. I am required to split the text into words, then use NLTK's tokeniser to tokenise each sentence.
This is the code I have so far:
```python
import nltk
import collections
nltk.download('punkt')
nltk.download('gutenberg')
nltk.download('brown')

def word_counts(text, words):
    """Return a vector that represents the counts of specific words in the text
    >>> word_counts("Here is sentence one. Here is sentence two.", ['Here', 'two', 'three'])
    [2, 1, 0]
    >>> emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    >>> word_counts(emma, ['the', 'a'])
    [4842, 3001]
    """
    from nltk.tokenize import TweetTokenizer
    text = nltk.sent_tokenize(text)
    words = nltk.sent_tokenize(words)

    wordList = []

    for sen in text, words:
        for word in nltk.word_tokenize(sen):
            wordList.append(text, words).split(word)

    counter = TweetTokenizer(wordList)
    return counter
```
There are two doctests that should give the results `[2, 1, 0]` and `[4842, 3001]`.
This is the error message I am getting from my code
I've spent all day trying to tackle this and I feel I'm getting close, but I don't know what I'm doing wrong; the script gives me an error every time.
Any help would be very much appreciated. Thank you.
Answer
This is how I would use nltk to get to the result your homework wants:
```python
import nltk
import collections
from nltk.tokenize import TweetTokenizer
# nltk.download('punkt')
# nltk.download('gutenberg')
# nltk.download('brown')

def word_counts(text, words):
    """Return a vector that represents the counts of specific words in the text
    word_counts("Here is one. Here is two.", ['Here', 'two', 'three'])
    [2, 1, 0]
    emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
    word_counts(emma, ['the', 'a'])
    [4842, 3001]
    """
    textTok = nltk.word_tokenize(text)
    counts = nltk.FreqDist(textTok)  # this counts ALL word occurrences

    return [counts[x] for x in words]  # this returns what was counted for *words*

r1 = word_counts("Here is one. Here is two.", ['Here', 'two', 'three'])
print(r1)  # [2, 1, 0]

emma = nltk.corpus.gutenberg.raw('austen-emma.txt')
r2 = word_counts(emma, ['the', 'a'])
print(r2)  # [4842, 3001]
```
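Incidentally, since the code imports `collections` without using it, the counting step can equally be done with `collections.Counter`, which, like `FreqDist`, returns 0 for missing keys. A minimal sketch (using a naive regex split as a stand-in for `nltk.word_tokenize` so it runs without NLTK; `word_counts_counter` is just an illustrative name):

```python
import collections
import re

def word_counts_counter(text, words):
    """Count occurrences of each word in *words* within *text*.

    Uses a simple regex token split as a stand-in for nltk.word_tokenize.
    """
    tokens = re.findall(r"\w+", text)     # naive tokenisation on word characters
    counts = collections.Counter(tokens)  # counts ALL tokens, like FreqDist
    return [counts[w] for w in words]     # missing words count as 0

print(word_counts_counter("Here is one. Here is two.", ['Here', 'two', 'three']))
# [2, 1, 0]
```

Note that the regex split handles punctuation differently from NLTK's tokeniser, so the counts on a large corpus like `emma` may differ slightly.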
Your code does multiple things that look just wrong:

```python
for sen in text, words:
    for word in nltk.word_tokenize(sen):

        wordList.append(text, words).split(word)
```

- `sent_tokenize()` takes a string and returns a list of sentences from it – you store the results in the two variables `text` and `words`, and then you try to iterate over a tuple of them?
- `words` is not a text with sentences to begin with, so this makes not much sense to me.
- `wordList` is a list; if you use `.append()` on it, `append()` returns `None`, and `None` has no `.split()` function.
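That last pitfall is easy to demonstrate in isolation (a tiny sketch, independent of NLTK):

```python
wordList = []
result = wordList.append("hello")  # append() mutates the list in place...
print(result)                      # ...and returns None
print(wordList)                    # ['hello']

# chaining .split() on the return value therefore raises:
try:
    wordList.append("x").split("x")
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'split'
```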