
spaCy EntityRuler pattern isn’t working for ENT_TYPE

I am trying to get EntityRuler patterns to use a combination of LEMMA and ENT_TYPE to tag the phrase “landed (or land) in Baltimore (location)”. It works with the Matcher, but not with the entity ruler I created. I set overwrite_ents to True, so I'm not sure why this isn't working. It is most likely a user error; I just can't see what it is. The code is below. From the output you can see that the ruler component was added after ner. Any input or suggestions would be appreciated!

The Matcher tags the entire phrase (landed in Baltimore), but the entity ruler does not.

Code Sample

import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')

matcher = Matcher(nlp.vocab)

pattern = [{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]
patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

matcher.add("Flying", [pattern])

rulerActions= EntityRuler(nlp, overwrite_ents=True)
rulerActions = nlp.add_pipe("entity_ruler","ruleActions").add_patterns(patterns)
# rulerActions.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id}  ->  {span.text}')
    
for ent in doc.ents:
    print(ent.text, ent.label_)

Print Statements

spaCy Pipelines: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'ruleActions']
Flying  ->  landed in Baltimore
Baltimore GPE
the holidays DATE


Answer

Here is a working version of your code:

import spacy

nlp = spacy.load('en_core_web_lg')

patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

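# Pass overwrite_ents through add_pipe's config so it applies to the component in the pipeline
ruler = nlp.add_pipe("entity_ruler", "ruleActions", config={"overwrite_ents": True})
# add_patterns modifies the ruler in place and returns None, so call it on its own line
ruler.add_patterns(patterns)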

print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

for ent in doc.ents:
    print(ent.text, ent.label_)
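To see what overwrite_ents changes in isolation, here is a minimal sketch that doesn't need en_core_web_lg. It makes two assumptions that are not in the original code: a first entity_ruler (named gpe_ruler here, a hypothetical stand-in for the trained ner) tags “Baltimore” as GPE, and because spacy.blank("en") has no lemmatizer, the LEMMA token is approximated with a LOWER match on a few inflected forms of “land”. The rest mirrors the pattern from the question.

```python
import spacy

TEXT = "The student landed in Baltimore for the holidays."


def run_pipeline(overwrite_ents: bool):
    """Return (text, label) pairs from doc.ents for the given overwrite_ents setting."""
    nlp = spacy.blank("en")

    # Hypothetical stand-in for the trained "ner" component: tag Baltimore as GPE
    # so the token carries ENT_TYPE == "GPE" before the second ruler runs.
    gpe_ruler = nlp.add_pipe("entity_ruler", name="gpe_ruler")
    gpe_ruler.add_patterns([{"label": "GPE", "pattern": [{"ORTH": "Baltimore"}]}])

    # The ruler from the question, configured through add_pipe's config.
    ruler = nlp.add_pipe(
        "entity_ruler",
        name="ruleActions",
        config={"overwrite_ents": overwrite_ents},
    )
    # Assumption: a blank pipeline has no lemmatizer, so LEMMA "land" is
    # approximated with a LOWER match on a few inflected forms.
    land = {"LOWER": {"IN": ["land", "lands", "landed", "landing"]}}
    ruler.add_patterns(
        [{"label": "FLYING", "pattern": [land, {}, {"ENT_TYPE": "GPE"}]}]
    )

    doc = nlp(TEXT)
    return [(ent.text, ent.label_) for ent in doc.ents]


print(run_pipeline(False))  # [('Baltimore', 'GPE')]
print(run_pipeline(True))   # [('landed in Baltimore', 'FLYING')]
```

With overwrite_ents left at False, the FLYING match is skipped because one of its tokens already has an entity type; with True, the overlapping GPE span is replaced by the longer match.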

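The EntityRuler object you create by calling EntityRuler(nlp, overwrite_ents=True) isn’t used at all. Calling nlp.add_pipe creates a completely separate ruler from the "entity_ruler" factory, and that one doesn’t get your overwrite_ents setting, so it keeps the default of False. Without overwriting, the ruler skips any match that overlaps a token that already has an entity type, so “landed in Baltimore” is dropped because ner has already tagged “Baltimore” as GPE. Also note that add_patterns returns None, so chaining it onto add_pipe leaves rulerActions set to None; keep the component returned by add_pipe in a variable and call add_patterns on it separately.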
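The two-objects problem can be checked directly. This sketch uses spacy.blank("en") instead of en_core_web_lg so it runs without a model download, reads the component's settings back from nlp.config, and uses a simplified pattern (LOWER "landed") as a stand-in for the one in the question.

```python
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.blank("en")

# Constructing the class directly builds a ruler that is never added to nlp.pipeline.
standalone = EntityRuler(nlp, overwrite_ents=True)

# add_pipe builds a separate ruler from the "entity_ruler" factory, using only
# the config passed to add_pipe (none here), so overwrite_ents keeps its default.
in_pipe = nlp.add_pipe("entity_ruler", name="ruleActions")

# add_patterns updates the ruler in place and returns None, so chaining it onto
# add_pipe (as in the question) leaves the assigned variable set to None.
returned = in_pipe.add_patterns(
    [{"label": "FLYING", "pattern": [{"LOWER": "landed"}, {}, {"ENT_TYPE": "GPE"}]}]
)

print(nlp.pipe_names)                                              # ['ruleActions']
print(standalone is in_pipe)                                       # False
print(nlp.config["components"]["ruleActions"]["overwrite_ents"])   # False
print(returned)                                                    # None
```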

User contributions licensed under: CC BY-SA