
Named Entity Recognition (NER) for multiple languages

I am writing some code to perform Named Entity Recognition (NER), which is coming along quite nicely for English texts. However, I would like to be able to apply NER to any language. To do this, I would like to 1) identify the language of a text, and then 2) apply the NER for the identified language. For step 2, I am unsure whether to A) translate the text to English and then apply the English NER, or B) apply NER directly in the identified language.

Below is the code I have so far. I would like the NER to work for text2, or for text in any other language, once that language has been detected:

import spacy
from spacy_langdetect import LanguageDetector
from langdetect import DetectorFactory

text = 'In 1793, Alexander Hamilton recruited Webster to move to New York City and become an editor for a Federalist Party newspaper.'
text2 = 'Em 1793, Alexander Hamilton recrutou Webster para se mudar para a cidade de Nova York e se tornar editor de um jornal do Partido Federalista.'

# Step 1: Identify the language of a text
DetectorFactory.seed = 0
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe(LanguageDetector(), name='language_detector', last=True)
doc = nlp(text)
print(doc._.language)

# Step 2: NER
Entities = [(str(x), x.label_) for x in nlp(str(text)).ents]
print(Entities)

Does anyone have any experience with this? Much appreciated!


Answer

spaCy needs a separate model for each language, so load the model that matches the detected language instead of running the English model on every text.

See https://spacy.io/usage/models for available models.

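import spacy
from langdetect import detect

# Load one pipeline per language up front, keyed by language code.
# Each model package must be installed first, e.g. `python -m spacy download pt_core_news_lg`.
nlp = {}
for lang in ["en", "es", "pt", "ru"]:  # List the languages you need; each must have a spaCy model.
    if lang == "en":
        nlp[lang] = spacy.load(lang + "_core_web_lg")  # English models are named *_core_web_*
    else:
        nlp[lang] = spacy.load(lang + "_core_news_lg")  # these languages use *_core_news_*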

def entities(text):
    # Detect the language, then run the matching pipeline.
    lang = detect(text)
    try:
        nlp2 = nlp[lang]
    except KeyError:
        # No pipeline was loaded for this language.
        raise ValueError(lang + " model is not loaded")
    return [(ent.text, ent.label_) for ent in nlp2(text).ents]

Then you can run both steps with a single call:

ents = entities(text)
print(ents)
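
The loop above loads every large model at start-up, which can take a lot of memory if you support many languages. As a rough sketch of an alternative (not part of the original answer), the variant below loads each pipeline only the first time its language is seen. The names MODEL_NAMES, model_name_for, get_pipeline and entities_with_language are made up for this illustration, and the small (`_sm`) models are an assumption chosen to keep the example light; swap in the `_lg` names if you need them.

```python
from functools import lru_cache

# Assumed mapping from language code to spaCy model package name.
# Extend it with whichever models you have installed.
MODEL_NAMES = {
    "en": "en_core_web_sm",
    "es": "es_core_news_sm",
    "pt": "pt_core_news_sm",
    "ru": "ru_core_news_sm",
}


def model_name_for(lang):
    """Return the configured model name for a language code, or raise ValueError."""
    try:
        return MODEL_NAMES[lang]
    except KeyError:
        raise ValueError(f"no spaCy model configured for language {lang!r}") from None


@lru_cache(maxsize=None)
def get_pipeline(lang):
    """Load the pipeline for `lang` on first use and cache it for later calls."""
    import spacy  # imported here so the mapping can be inspected without spaCy installed
    return spacy.load(model_name_for(lang))


def entities_with_language(text):
    """Detect the language of `text` and return (language code, entity list)."""
    from langdetect import DetectorFactory, detect
    DetectorFactory.seed = 0  # makes langdetect's result repeatable, as in the question
    lang = detect(text)
    doc = get_pipeline(lang)(text)
    return lang, [(ent.text, ent.label_) for ent in doc.ents]
```

Calling entities_with_language(text2) should then detect Portuguese and use the Portuguese pipeline, provided that model is installed. Returning the detected language alongside the entities makes it easier to spot cases where the detector guessed wrong, which can happen with very short texts.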
User contributions licensed under: CC BY-SA