I am trying to train a text categorization pipe in spaCy:
```python
import spacy

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textcat", last=True)
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "textcat"]
with nlp.disable_pipes(*other_pipes):
    optimizer = nlp.begin_training()
    # training logic
```
However, every time I call `nlp.begin_training()`, I get the following error:
```
ValueError: [E955] Can't find table(s) lexeme_norm for language 'en' in spacy-lookups-data. Make sure you have the package installed or provide your own lookup tables if no default lookups are available for your language.
```
Running `python3 -m spacy validate` returns:
```
✔ Loaded compatibility table

================= Installed pipeline packages (spaCy v3.0.3) =================
ℹ spaCy installation:
/xxx/xxx/xxx/env/lib/python3.8/site-packages/spacy

NAME             SPACY            VERSION
en_core_web_lg   >=3.0.0,<3.1.0   3.0.0   ✔
en_core_web_sm   >=3.0.0,<3.1.0   3.0.0   ✔
```
I have also tried installing `spacy-lookups-data`, without success.
How can I resolve this error?
Answer
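You shouldn't call `nlp.begin_training()` on a pretrained pipeline. In spaCy v3, `begin_training()` is the deprecated name for `nlp.initialize()`, which re-initializes every component (discarding the pretrained weights) and, for this pipeline, also tries to load the `lexeme_norm` lookup table from `spacy-lookups-data`, which is what raises E955. If you want to train a new model from scratch, use

```python
nlp = spacy.blank("en")
```

instead of `nlp = spacy.load("en_core_web_sm")`.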
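A minimal sketch of the blank-pipeline route for spaCy v3, assuming a tiny made-up dataset with two hypothetical, mutually exclusive labels (`POSITIVE`/`NEGATIVE`) and an arbitrary number of iterations; `train_blank_textcat` and `TRAIN_DATA` are illustrative names, not part of spaCy. Because the pipeline starts empty, calling `nlp.initialize()` (the v3 name for `begin_training()`) has no pretrained weights to destroy. It is an assumption, not something confirmed in the question, that the default blank English config requests no lookup tables and therefore does not hit E955.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import fix_random_seed, minibatch

# Hypothetical toy data with two made-up, mutually exclusive labels.
TRAIN_DATA = [
    ("I loved this film", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("What a fantastic day", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("This was terrible", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
    ("I hated every minute", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]


def train_blank_textcat(n_iter: int = 10, seed: int = 0):
    """Train a 'textcat' component on a blank English pipeline (spaCy v3)."""
    random.seed(seed)
    fix_random_seed(seed)

    nlp = spacy.blank("en")  # tokenizer only, no pretrained components
    textcat = nlp.add_pipe("textcat", last=True)
    for label in ("POSITIVE", "NEGATIVE"):
        textcat.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in TRAIN_DATA
    ]

    # initialize() replaces begin_training() in v3. Assumption: the default
    # blank "en" config loads no lookup tables, so this should not raise E955.
    optimizer = nlp.initialize(lambda: examples)

    losses = {}
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for batch in minibatch(examples, size=2):
            nlp.update(batch, sgd=optimizer, losses=losses)
    return nlp, losses


nlp, losses = train_blank_textcat()
print(losses)
print(nlp("I loved this film").cats)
```

With only four examples the scores are not meaningful; the point is the order of calls: blank pipeline, add the component, `initialize()`, then `update()`.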
However, if you want to continue training an existing model, call `optimizer = nlp.create_optimizer()` instead of `begin_training()`.
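For the continue-training route, a sketch along these lines may work, assuming spaCy v3.0 and the same kind of toy data (`add_and_train_textcat` and `TRAIN_DATA` are hypothetical names). One assumption worth flagging: `create_optimizer()` only builds the optimizer, so the newly added `textcat` still needs initialized weights of its own. Here they come from calling `initialize()` on that single component rather than on the whole pipeline, and `select_pipes` keeps the other components out of the update.

```python
import random

from spacy.language import Language
from spacy.training import Example
from spacy.util import fix_random_seed, minibatch

# Hypothetical toy data; replace with your own labelled texts.
TRAIN_DATA = [
    ("I loved this film", {"cats": {"POSITIVE": 1.0, "NEGATIVE": 0.0}}),
    ("This was terrible", {"cats": {"POSITIVE": 0.0, "NEGATIVE": 1.0}}),
]


def add_and_train_textcat(nlp: Language, n_iter: int = 10, seed: int = 0) -> dict:
    """Add a 'textcat' to an already-loaded pipeline and train only that component."""
    random.seed(seed)
    fix_random_seed(seed)

    textcat = nlp.add_pipe("textcat", last=True)
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in TRAIN_DATA
    ]

    # Assumption: initializing only the new component (labels are read from
    # the examples) avoids nlp.initialize(), which would reset every component.
    textcat.initialize(lambda: examples, nlp=nlp)

    # Build an optimizer from the pipeline's [training.optimizer] config.
    optimizer = nlp.create_optimizer()

    losses = {}
    # Temporarily disable every other component so only textcat is updated.
    with nlp.select_pipes(enable="textcat"):
        for _ in range(n_iter):
            random.shuffle(examples)
            losses = {}
            for batch in minibatch(examples, size=2):
                nlp.update(batch, sgd=optimizer, losses=losses)
    return losses
```

With the pipeline from the question, this would be called as `add_and_train_textcat(spacy.load("en_core_web_sm"))`; the pretrained components are neither re-initialized nor updated, since only the new `textcat` goes through `initialize()` and `update()`.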