spaCy can’t find table(s) lexeme_norm for language ‘en’ in spacy-lookups-data

I am trying to train a text categorization pipe in spaCy:

import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textcat", last=True)
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'textcat']
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    # training logic

However, every time I call nlp.begin_training(), I get this error:

ValueError: [E955] Can't find table(s) lexeme_norm for language 'en' in spacy-lookups-data. Make sure you have the package installed or provide your own lookup tables if no default lookups are available for your language.

Running python3 -m spacy validate returns:

✔ Loaded compatibility table

================= Installed pipeline packages (spaCy v3.0.3) =================
ℹ spaCy installation:
/xxx/xxx/xxx/env/lib/python3.8/site-packages/spacy

NAME             SPACY            VERSION                            
en_core_web_lg   >=3.0.0,<3.1.0   3.0.0   ✔
en_core_web_sm   >=3.0.0,<3.1.0   3.0.0   ✔

Furthermore, I have tried installing spacy-lookups-data without success.
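E955 needs two things to line up: the pipeline's [initialize] config has to request lookup tables, and spacy_lookups_data has to be importable from the same interpreter that runs spaCy. A minimal diagnostic sketch for both follows; the helper name lookups_diagnostics is hypothetical, and it only reads values, it does not fix anything.

```python
import importlib.util
import sys

import spacy


def lookups_diagnostics(nlp: spacy.Language) -> dict:
    """Hypothetical helper: report what nlp.initialize() will need from spacy-lookups-data."""
    return {
        # The interpreter actually running spaCy; the package must be installed into this one.
        "python": sys.executable,
        # True if spacy_lookups_data can be imported from this interpreter.
        "lookups_data_installed": importlib.util.find_spec("spacy_lookups_data") is not None,
        # Lookup settings from the pipeline's [initialize] block (None means none requested).
        "requested_lookups": nlp.config["initialize"].get("lookups"),
    }
```

For the pipeline in the question this would be called as lookups_diagnostics(spacy.load("en_core_web_sm")). If tables are requested but lookups_data_installed is False, installing with the reported interpreter (e.g. `<that python> -m pip install spacy-lookups-data`) is the first thing to try.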

How can I resolve this error?

Answer

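You shouldn’t call nlp.begin_training() on a pretrained pipeline. In spaCy v3 it is a deprecated alias for nlp.initialize(), which re-initializes every component and applies the pipeline’s [initialize] settings; for en_core_web_sm these include loading the lexeme_norm table from spacy-lookups-data, which is where E955 comes from. If you want to train a new model, use nlp = spacy.blank('en') instead of nlp = spacy.load("en_core_web_sm").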
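A minimal sketch of the blank-pipeline route, assuming spaCy v3 and a hypothetical toy TRAIN_DATA set. It uses nlp.initialize(), the v3 name for begin_training(); on a blank English pipeline no lookup tables are requested at initialization, so spacy-lookups-data is not needed.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import fix_random_seed, minibatch

# Hypothetical toy data: each text has exactly one positive label,
# which the exclusive "textcat" component expects.
TRAIN_DATA = [
    ("I loved this film", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("What a great story", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("I hated every minute", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("A boring, awful mess", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]

fix_random_seed(0)
nlp = spacy.blank("en")  # blank pipeline: its [initialize] block requests no lookups
nlp.add_pipe("textcat", last=True)

examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# nlp.initialize() (formerly begin_training()) infers the labels from the
# examples, initializes the model weights and returns an optimizer.
optimizer = nlp.initialize(lambda: examples)

for epoch in range(10):
    random.shuffle(examples)
    losses = {}
    for batch in minibatch(examples, size=2):
        nlp.update(batch, sgd=optimizer, losses=losses)

print(losses)
print(nlp("I loved it").cats)
```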

However, if you want to continue training an existing model, call optimizer = nlp.create_optimizer() instead of begin_training().
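A sketch of the second case under spaCy v3, assuming the goal from the question (adding a new textcat to a pretrained pipeline). create_optimizer() only builds the optimizer, so the freshly added textcat is initialized on its own through its initialize() method, leaving the pretrained components untouched. The helper train_new_textcat and TRAIN_DATA are hypothetical; select_pipes() is the v3 replacement for disable_pipes().

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Hypothetical toy data; one positive label per text for the exclusive "textcat".
TRAIN_DATA = [
    ("I loved this film", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("What a great story", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("I hated every minute", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("A boring, awful mess", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]


def train_new_textcat(nlp: spacy.Language, train_data, n_iter: int = 10) -> dict:
    """Hypothetical helper: add a textcat to an existing pipeline and train only it."""
    textcat = nlp.add_pipe("textcat", last=True)
    examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data]

    # Initialize only the new component; nlp.initialize()/begin_training() would
    # reset every component and run the pipeline's [initialize] lookups step.
    textcat.initialize(lambda: examples, nlp=nlp)

    # Optimizer from the pipeline's [training.optimizer] config, no re-initialization.
    optimizer = nlp.create_optimizer()

    losses = {}  # accumulated over all iterations
    with nlp.select_pipes(enable="textcat"):
        for _ in range(n_iter):
            random.shuffle(examples)
            for batch in minibatch(examples, size=2):
                nlp.update(batch, sgd=optimizer, losses=losses)
    return losses
```

With the pipeline from the question this would be called as train_new_textcat(spacy.load("en_core_web_sm"), TRAIN_DATA). spaCy v3 also offers nlp.resume_training(), which returns an optimizer and additionally sets up rehearsal for components that support it.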
