I have two lists of dictionaries that result from a pymongo extraction. A list of dicts containing ids (strings) and lemmas (strings): lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': 'lemma2'}, {'id': 'id3', 'lemma': 'lemma3'}, …] A list of dicts containing ids and multiple words per id: words = [{'id': 'id1', 'word': 'word1.1'}, {'id': 'id1', 'word': 'word1.2'}, {'id': 'id2',
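The excerpt is cut off, so what the asker wants to do with the two lists is not known. A minimal sketch, assuming the goal is to collect every word under the lemma that shares its id: the function name group_words_by_lemma, the helper lemma_by_id, and the sample entry 'word2.1' are illustrative and do not come from the question.

```python
from collections import defaultdict


def group_words_by_lemma(lemmas, words):
    """Join the two lists on 'id' and return {lemma: [word, ...]}."""
    # Index lemmas by id so each word needs a single dict lookup.
    lemma_by_id = {entry["id"]: entry["lemma"] for entry in lemmas}
    grouped = defaultdict(list)
    for entry in words:
        lemma = lemma_by_id.get(entry["id"])
        if lemma is not None:  # skip words whose id has no lemma
            grouped[lemma].append(entry["word"])
    return dict(grouped)


# Sample data in the shape described above ('word2.1' is made up).
lemmas = [
    {"id": "id1", "lemma": "lemma1"},
    {"id": "id2", "lemma": "lemma2"},
]
words = [
    {"id": "id1", "word": "word1.1"},
    {"id": "id1", "word": "word1.2"},
    {"id": "id2", "word": "word2.1"},
]
result = group_words_by_lemma(lemmas, words)
```

Lemmas that have no matching words do not appear in the result; if they should, the dictionary could be pre-filled with empty lists.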
Tag: lemmatization
Neither the stemmer nor the lemmatizer seems to work very well; what should I do?
I am new to text analysis and am trying to create a bag-of-words model (using sklearn's CountVectorizer). I have a data frame with a column of text containing words like 'acid', 'acidic', 'acidity', 'wood', 'woodsy', 'woody'. I think that 'acid' and 'wood' should be the only words included in the final output, however neither stemming nor lemmatizing seems
Given a word, can we get all possible lemmas for it using spaCy?
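When a stemmer or lemmatizer does not collapse variants the way you expect, one possible fallback is an explicit, hand-written normalization map applied before counting. The sketch below uses only the standard library instead of CountVectorizer; the VARIANTS table, the function names, and the sample sentences are assumptions made for illustration, not part of the question.

```python
import re
from collections import Counter

# Hand-written map from surface forms to the base form we want to count.
# This table is an assumption; extend it for your own vocabulary.
VARIANTS = {
    "acidic": "acid",
    "acidity": "acid",
    "woodsy": "wood",
    "woody": "wood",
}


def normalize_tokens(text):
    """Lowercase, split into alphabetic tokens, and map known variants."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [VARIANTS.get(token, token) for token in tokens]


def bag_of_words(documents):
    """Return one Counter of normalized tokens per document."""
    return [Counter(normalize_tokens(doc)) for doc in documents]


docs = ["Acidic and woody", "acidity of wood, woodsy acid"]
bags = bag_of_words(docs)
```

Because CountVectorizer accepts a callable for its analyzer parameter, a function like normalize_tokens could in principle be passed there to get the same normalization inside scikit-learn.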
The input word is standalone and not part of a sentence but I would like to get all of its possible lemmas as if the input word were in different sentences with all possible POS tags. I would also like to get the lookup version of the word’s lemma. Why am I doing this? I have extracted lemmas from all
How to solve Spanish lemmatization problems with spaCy?
When I try to lemmatize a CSV with more than 60,000 Spanish words, spaCy does not lemmatize certain words correctly. I understand that the model is not 100% accurate. However, I have not found any other solution, since NLTK does not include a Spanish core. A friend tried to ask this question on Stack Overflow in Spanish, however, the community is quite small
NLTK: how to get inflections of words
I have a list of words, nearly 5000 English words, and for each word I need these inflectional forms:
noun: singular and plural
verb: infinitive, present simple, present simple 3rd person, past simple, present participle (ing form), past participle
adjective: comparative and superlative
adverb
How can I extract this information from a given word (e.g. help) in NLTK via Python?