I’m on Windows with Python 3.7. When I run the download:
import nltk
nltk.download('reuters')
it runs without any problem, and I have already installed nltk from cmd.
But when I run this code:
import matplotlib.pyplot as plt
from collections import Counter
from nltk.corpus import reuters
import re
import spacy
nlp = spacy.load('en', disable=['parser', 'tagger'])
reuters_fileids = reuters.fileids()
# Collapse runs of whitespace (r'\s+') into a single space before parsing
reuters_nlp = [nlp(re.sub(r'\s+', ' ', reuters.raw(i)).strip()) for i in reuters_fileids[:100]]
label_counter = Counter()
it raises an error that I don’t know how to fix. The same code works fine on my MacBook, so I’m wondering what is different on Windows. P.S. I use Anaconda, and on the Windows computer it is installed on E:.
Resource reuters not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('reuters')

Searched in:
- 'C:\Users\user/nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- 'E:\Anaconda\nltk_data'
- 'E:\Anaconda\share\nltk_data'
- 'E:\Anaconda\lib\nltk_data'
- 'C:\Users\user\AppData\Roaming\nltk_data'
Answer
You don’t have the corpus in your new environment.
Download the corpus as suggested in the error message:
>>> from nltk.corpus import reuters

>>> import nltk
>>> nltk.download('reuters')
[nltk_data] Downloading package reuters to
[nltk_data]     /Users/liling.tan/nltk_data
True

>>> reuters.words()
['ASIAN', 'EXPORTERS', 'FEAR', 'DAMAGE', 'FROM', 'U', ...]
>>> reuters.sents()
[['ASIAN', 'EXPORTERS', 'FEAR', 'DAMAGE', 'FROM', 'U', '.', 'S', '.-', 'JAPAN', 'RIFT', 'Mounting', 'trade', 'friction', 'between', 'the', 'U', '.', 'S', '.', 'And', 'Japan', 'has', 'raised', 'fears', 'among', 'many', 'of', 'Asia', "'", 's', 'exporting', 'nations', 'that', 'the', 'row', 'could', 'inflict', 'far', '-', 'reaching', 'economic', 'damage', ',', 'businessmen', 'and', 'officials', 'said', '.'], ['They', 'told', 'Reuter', 'correspondents', 'in', 'Asian', 'capitals', 'a', 'U', '.', 'S', '.', 'Move', 'against', 'Japan', 'might', 'boost', 'protectionist', 'sentiment', 'in', 'the', 'U', '.', 'S', '.', 'And', 'lead', 'to', 'curbs', 'on', 'American', 'imports', 'of', 'their', 'products', '.'], ...]
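If the same script has to run on several machines (the MacBook and the Windows box, say), it may help to download the corpus only when it is missing. Below is a minimal sketch of that idea. `ensure_corpus` is a hypothetical helper name, not part of NLTK; by default it relies on `nltk.data.find` and `nltk.download`, and the `find`/`download` parameters exist only so the logic can be tried without NLTK installed.

```python
def ensure_corpus(name, find=None, download=None):
    """Download an NLTK corpus only if it cannot be found locally.

    Hypothetical helper (not an NLTK function). `find` and `download`
    default to nltk.data.find and nltk.download.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing
        find(f"corpora/{name}")
        return False
    except LookupError:
        download(name)
        return True
```

Calling `ensure_corpus('reuters')` at the top of the script, before `reuters.fileids()` is used, should then behave the same on both machines.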
Alternatively, you can download the corpus from the command line:
$ python3 -m nltk.downloader reuters
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[nltk_data] Downloading package reuters to
[nltk_data] /Users/liling.tan/nltk_data
[nltk_data] Package reuters is already up-to-date!
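If the download reports success but the error persists, it can be worth checking whether the corpus actually landed in one of the directories listed under "Searched in". The sketch below uses only the standard library and is a simplified approximation of NLTK's lookup, not its real implementation: it assumes a corpus shows up as either `corpora/<name>` or `corpora/<name>.zip` under a data directory. `locate_corpus` is a hypothetical helper name.

```python
import os

def locate_corpus(name, search_dirs):
    """Return the first existing corpora/<name> or corpora/<name>.zip
    under the given data directories, or None if nothing is found.

    Simplified approximation of NLTK's search, for diagnosis only.
    """
    for base in search_dirs:
        candidates = (
            os.path.join(base, "corpora", name),           # unpacked corpus
            os.path.join(base, "corpora", name + ".zip"),  # zipped corpus
        )
        for candidate in candidates:
            if os.path.exists(candidate):
                return candidate
    return None
```

With NLTK installed, `locate_corpus('reuters', nltk.data.path)` checks the directories NLTK is configured to search in the current environment. A result of `None` suggests the download went somewhere else, for example into a different user profile or Python environment.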
See also: How do I download NLTK data?