Basically, I have no idea why I’m getting this error.
So that there is more to go on than a screenshot, here is a similar error message as text. Because it is more recent, the message itself already contains the fix given in this thread's answer:
Preprocessing raw texts ...
---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-38-263240bbee7e> in <module>()
----> 1 main()

7 frames
<ipython-input-32-62fa346501e8> in main()
     32     data = data.fillna('')  # only the comments has NaN's
     33     rws = data.abstract
---> 34     sentences, token_lists, idx_in = preprocess(rws, samp_size=samp_size)
     35     # Define the topic model object
     36     #tm = Topic_Model(k = 10), method = TFIDF)

<ipython-input-31-f75213289788> in preprocess(docs, samp_size)
     25     for i, idx in enumerate(samp):
     26         sentence = preprocess_sent(docs[idx])
---> 27         token_list = preprocess_word(sentence)
     28         if token_list:
     29             idx_in.append(idx)

<ipython-input-29-eddacbfa6443> in preprocess_word(s)
    179     if not s:
    180         return None
--> 181     w_list = word_tokenize(s)
    182     w_list = f_punct(w_list)
    183     w_list = f_noun(w_list)

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in word_tokenize(text, language, preserve_line)
    126     :type preserver_line: bool
    127     """
--> 128     sentences = [text] if preserve_line else sent_tokenize(text, language)
    129     return [token for sent in sentences
    130             for token in _treebank_word_tokenizer.tokenize(sent)]

/usr/local/lib/python3.7/dist-packages/nltk/tokenize/__init__.py in sent_tokenize(text, language)
     92     :param language: the model name in the Punkt corpus
     93     """
---> 94     tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
     95     return tokenizer.tokenize(text)
     96

/usr/local/lib/python3.7/dist-packages/nltk/data.py in load(resource_url, format, cache, verbose, logic_parser, fstruct_reader, encoding)
    832
    833     # Load the resource.
--> 834     opened_resource = _open(resource_url)
    835
    836     if format == 'raw':

/usr/local/lib/python3.7/dist-packages/nltk/data.py in _open(resource_url)
    950
    951     if protocol is None or protocol.lower() == 'nltk':
--> 952         return find(path_, path + ['']).open()
    953     elif protocol.lower() == 'file':
    954         # urllib might not use mode='rb', so handle this one ourselves:

/usr/local/lib/python3.7/dist-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674
    675

LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************
Answer
Perform the following:
>>> import nltk
>>> nltk.download()
Then, when the downloader window pops up, select punkt in the Identifier column of the Models tab and download it.
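In a notebook or on a remote machine the downloader window may never appear, so it can be easier to fetch the resource by name, as the error message itself suggests with nltk.download('punkt'). Below is a minimal sketch of that idea, not an official NLTK recipe. The helper name ensure_nltk_resource and its find/download parameters are made up for illustration; only nltk.data.find and nltk.download are real NLTK calls. It assumes the resource is called punkt, as in the traceback above; if your own error message names a different resource, pass that name instead.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource_path: str = "tokenizers/punkt",
    package: str = "punkt",
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if the resource is available, downloading it if needed.

    `find` and `download` are hypothetical hooks added for this sketch so the
    helper can be exercised without NLTK or network access; by default they
    fall back to nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the real NLTK calls are needed

        find = find or nltk.data.find
        # quiet=True suppresses the progress output; no GUI window is opened
        # because the package name is passed explicitly.
        download = download or (lambda name: nltk.download(name, quiet=True))

    try:
        find(resource_path)  # raises LookupError when the resource is missing
        return True
    except LookupError:
        # Same situation as in the traceback: resource not found, so fetch it.
        return bool(download(package))
```

Calling ensure_nltk_resource() once before the first word_tokenize call should be enough; on later runs the lookup succeeds and nothing is downloaded again.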