ImportError: No module named ‘spacy.en’

I’m working on a codebase that uses spaCy. I installed spaCy using:

JavaScript

and then

JavaScript

At the end of this last command, I got a message:

JavaScript

Now, when I try running my code, on the line:

JavaScript

it gives me the following error:

ImportError: No module named 'spacy.en'

I’ve looked on Stack Exchange, and the closest question is: Import error with spacy: “No module named en”, which does not solve my problem.

Any help would be appreciated. Thanks.

Edit: I might have solved this by doing the following:

JavaScript

and then using:

JavaScript

I’m still keeping this open in case there are any other answers.

Answer

Yes, I can confirm that your solution is correct. The version of spaCy you downloaded from pip is v2.0, which includes a lot of new features, but also a few changes to the API. One of them is that all language data has been moved to a submodule, spacy.lang, to keep things cleaner and better organised. So instead of using spacy.en, you now import from spacy.lang.en.

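As a rough sketch of that change, an import that works under either layout could look like the following. The class name English and the helper tokenize are assumptions chosen for illustration; the exact line from the question is not shown here.

```python
# Compatibility import: spacy.lang.en in v2.0 and later, spacy.en in 1.x.
# Importing "English" is an assumption; adapt it to whatever your code imports.
try:
    from spacy.lang.en import English  # spaCy v2.0 and later
except ImportError:
    try:
        from spacy.en import English  # older spaCy 1.x layout
    except ImportError:
        English = None  # spaCy is not installed at all


def tokenize(text):
    """Return the token strings for text, or an empty list if spaCy is unavailable."""
    if English is None:
        return []
    nlp = English()  # with v2.0+, this builds a pipeline from the language data only
    return [token.text for token in nlp(text)]


print(tokenize("This is a sentence."))
```

If the rest of the codebase only ever targets v2.0 or later, the plain `from spacy.lang.en import English` line is enough and the fallback can be dropped.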

However, it’s also worth mentioning that what you download when you run spacy download en is not the same as spacy.lang.en. The language data shipped with spaCy includes the static data like tokenization rules, stop words or lemmatization tables. The en package that you can download is a shortcut for the statistical model en_core_web_sm. It includes the language data, as well as binary weights that enable spaCy to make predictions for part-of-speech tags, dependencies and named entities.
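To make the distinction concrete, here is a small sketch that looks only at the shipped language data, assuming spaCy v2.0 or later. The helper name language_data_summary is hypothetical. No downloaded model is involved, so the pipeline has no tagger, parser or entity recognizer.

```python
# Inspect the static language data that ships with spaCy itself.
# Assumes spaCy v2.0+; falls back to None values if spaCy is not installed.
try:
    from spacy.lang.en import English
    from spacy.lang.en.stop_words import STOP_WORDS
except ImportError:
    English, STOP_WORDS = None, frozenset()


def language_data_summary():
    """Summarise what is available without a statistical model (None if spaCy is missing)."""
    if English is None:
        return None
    nlp = English()  # language class built from static data only
    return {
        "the_is_stop_word": "the" in STOP_WORDS,       # stop words are static data
        "pipeline_components": list(nlp.pipe_names),   # statistical components, if any
    }


print(language_data_summary())
```

Predictions such as part-of-speech tags or named entities only become available once a model like en_core_web_sm is loaded.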

Instead of just downloading en, I’d actually recommend using the full model name, en_core_web_sm, both in the download command and in the call to spacy.load, which makes it much more obvious what’s going on.
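A minimal sketch of that workflow, assuming the model has been installed beforehand with the command shown in the comment. The helper load_english_model is a hypothetical wrapper, and returning None on failure is a design choice for this example, not something spaCy does itself.

```python
# Install the model once from the command line (not from inside this script):
#     python -m spacy download en_core_web_sm
import importlib.util

MODEL_NAME = "en_core_web_sm"  # full model name instead of the "en" shortcut


def load_english_model(name=MODEL_NAME):
    """Return the loaded pipeline, or None if spaCy or the model is not installed."""
    if importlib.util.find_spec("spacy") is None:
        return None  # spaCy itself is missing
    import spacy
    try:
        return spacy.load(name)
    except OSError:
        # spaCy raises an IOError/OSError when it cannot find the named model
        return None


nlp = load_english_model()
if nlp is None:
    print("Model not available; install it with the command above.")
else:
    doc = nlp("Apple is looking at buying a U.K. startup.")
    print([(token.text, token.pos_) for token in doc])
```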

When you call spacy.load, spaCy does the following:

  1. Find the installed model named "en_core_web_sm" (a package or shortcut link).
  2. Read its meta.json and check which language it’s using (in this case, spacy.lang.en), and how its processing pipeline should look (in this case, tagger, parser and ner).
  3. Initialise the language class and add the pipeline to it.
  4. Load in the binary weights from the model data so pipeline components (like the tagger, parser or entity recognizer) can make predictions.
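The result of those steps can be inspected on the loaded object. The sketch below assumes en_core_web_sm is installed; the helper describe_model is hypothetical, and it returns None when spaCy or the model cannot be found.

```python
import importlib.util


def describe_model(name="en_core_web_sm"):
    """Return the language and pipeline of an installed model, or None if unavailable."""
    if importlib.util.find_spec("spacy") is None:
        return None  # spaCy is not installed
    import spacy
    try:
        nlp = spacy.load(name)  # runs the lookup, meta.json and weight-loading steps
    except OSError:
        return None  # the named model could not be found
    return {
        "lang": nlp.meta.get("lang"),      # language recorded in the model's meta.json
        "pipeline": list(nlp.pipe_names),  # components added to the language class
    }


print(describe_model())
```

For a model such as en_core_web_sm, the reported language should be en, and the pipeline should include the tagger, parser and ner components mentioned above, though the exact list depends on the spaCy and model versions.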

See this section in the docs for more details.

User contributions licensed under: CC BY-SA