I am trying to get entity ruler patterns to use a combination of LEMMA and ENT_TYPE to tag the phrase “landed (or land) in Baltimore (location)”. The pattern works with the Matcher, but not with the entity ruler I created. I set overwrite_ents to True, so I am not sure why it isn’t working. It is most likely a user error, but I can’t see what it is. The code example is below. From the output, you can see that the ruler was added after NER. Any input or suggestions would be appreciated!
The Matcher tags the entire phrase (landed in Baltimore), but the entity ruler does not.
Code Sample
import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')
matcher = Matcher(nlp.vocab)

pattern = [{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]
patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

matcher.add("Flying", [pattern])

rulerActions= EntityRuler(nlp, overwrite_ents=True)
rulerActions = nlp.add_pipe("entity_ruler","ruleActions").add_patterns(patterns)
# rulerActions.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')
doc = nlp("The student landed in Baltimore for the holidays.")

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id} -> {span.text}')

for ent in doc.ents:
    print(ent.text, ent.label_)
Print Statements
spaCy Pipelines: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'ruleActions']
Flying -> landed in Baltimore
Baltimore GPE
the holidays DATE
Answer
Here is a working version of your code:
import spacy

nlp = spacy.load('en_core_web_lg')

patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

# Pass overwrite_ents through the pipe's config so it applies to the
# component that actually runs in the pipeline.
ruler = nlp.add_pipe("entity_ruler","ruleActions", config={"overwrite_ents": True})
ruler.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')
doc = nlp("The student landed in Baltimore for the holidays.")

for ent in doc.ents:
    print(ent.text, ent.label_)
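The Matcher you are creating is separate from the pipeline, so it has no effect on doc.ents. When you call EntityRuler directly, that creates an EntityRuler, but calling add_pipe creates a completely different object, and that one doesn’t have the overwrite_ents config. The component added by add_pipe is the one that runs, so the setting has to be passed in its config.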
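To see the effect of overwrite_ents in isolation, here is a minimal sketch that does not need a trained model. It makes some assumptions that are not in your code: it uses spacy.blank("en"), a first entity ruler (named gpe_ruler here, a made-up name) stands in for the statistical NER by tagging "Baltimore" as GPE, and the pattern matches on LOWER instead of LEMMA because a blank pipeline has no lemmatizer. The helper ents_for is also just for illustration.

```python
import spacy


def ents_for(overwrite: bool):
    """Return (text, label) pairs for the example sentence.

    `overwrite` is passed to the second ruler as its overwrite_ents config.
    """
    nlp = spacy.blank("en")

    # Stand-in for NER (assumption): tag "Baltimore" as GPE with a first ruler.
    gpe_ruler = nlp.add_pipe("entity_ruler", name="gpe_ruler")
    gpe_ruler.add_patterns([{"label": "GPE", "pattern": [{"LOWER": "baltimore"}]}])

    # The ruler under test: the config goes through add_pipe, so it applies
    # to the component that actually runs in the pipeline.
    ruler = nlp.add_pipe(
        "entity_ruler",
        name="ruleActions",
        config={"overwrite_ents": overwrite},
    )
    # LOWER replaces LEMMA here because the blank pipeline sets no lemmas.
    ruler.add_patterns(
        [{"label": "FLYING", "pattern": [{"LOWER": "landed"}, {}, {"ENT_TYPE": "GPE"}]}]
    )

    doc = nlp("The student landed in Baltimore for the holidays.")
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    for overwrite in (False, True):
        print(f"overwrite_ents={overwrite}: {ents_for(overwrite)}")
```

With overwrite_ents left at False, the second ruler should skip its match because the span overlaps the existing GPE entity, which is the behavior you were seeing. With it set to True, the FLYING span should replace the GPE entity.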