The input word is standalone, not part of a sentence, but I would like to get all of its possible lemmas, as if the input word appeared in different sentences under every possible POS tag. I would also like to get the lookup version of the word's lemma.
Why am I doing this?
I have extracted lemmas from all the documents and also counted the number of dependency links between lemmas, both using en_core_web_sm. Now, given an input word, I would like to return the lemmas that are most frequently linked to each of the input word's possible lemmas.
So in short, I would like to replicate the behaviour of `token.lemma_` for the input word under all possible POS tags, to stay consistent with the lemma links I have counted.
Answer
I found it difficult to get lemmas and inflections directly out of spaCy without first constructing an example sentence to give the word context. This wasn't ideal, so I looked further and found that LemmInflect handles this very well.
```python
>>> from lemminflect import getAllLemmas, getInflection, getAllInflections, getAllInflectionsOOV
>>> getAllLemmas('watches')
{'NOUN': ('watch',), 'VERB': ('watch',)}
>>> getAllInflections('watch')
{'NN': ('watch',), 'NNS': ('watches', 'watch'), 'VB': ('watch',), 'VBD': ('watched',), 'VBG': ('watching',), 'VBZ': ('watches',), 'VBP': ('watch',)}
```