Skip to content
Advertisement

Given a word can we get all possible lemmas for it using Spacy?

The input word is standalone and not part of a sentence but I would like to get all of its possible lemmas as if the input word were in different sentences with all possible POS tags. I would also like to get the lookup version of the word’s lemma.

Why am I doing this?

I have extracted lemmas from all the documents and I have also calculated the number of dependency links between lemmas. Both of which I have done using en_core_web_sm. Now, given an input word, I would like to return the lemmas that are linked most frequently to all the possible lemmas of the input word.

So in short, I would like to replicate the behaviour of token._lemma for the input word with all possible POS tags to maintain consistency with the lemma links I have counted.

Advertisement

Answer

I found it difficult to get lemmas and inflections directly out of spaCy without first constructing an example sentence to give it context. This wasn’t ideal, so I looked further and found LemmaInflect did this very well.

> from lemminflect import getAllLemmas, getInflection, getAllInflections, getAllInflectionsOOV

> getAllLemmas('watches')
{'NOUN': ('watch',), 'VERB': ('watch',)}

> getAllInflections('watch')
{'NN': ('watch',), 'NNS': ('watches', 'watch'), 'VB': ('watch',), 'VBD': ('watched',), 'VBG': ('watching',), 'VBZ': ('watches',),  'VBP': ('watch',)}
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement