
Tag: lemmatization

How to create a dictionary whose key:value pairs are the values of two different lists of dictionaries?

I have two lists of dictionaries that result from a pymongo extraction. A list of dicts containing IDs (strings) and lemmas (strings): lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': 'lemma2'}, {'id': 'id3', 'lemma': 'lemma3'}, …] A list of dicts containing IDs and multiple words per ID: words = [{'id': 'id1', 'word': 'word1.1'}, {'id': 'id1', 'word': 'word1.2'}, {'id': 'id2',
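One possible approach, sketched under assumptions: the excerpt is cut off, so the goal is taken here to be a mapping from each word to the lemma that shares its `id`. The entry `word2.1` and the names `id_to_lemma`, `word_to_lemma`, and `lemma_to_words` are hypothetical and appear only for illustration.

```python
from collections import defaultdict

# Sample data shaped like the lists in the question.
lemmas = [
    {"id": "id1", "lemma": "lemma1"},
    {"id": "id2", "lemma": "lemma2"},
    {"id": "id3", "lemma": "lemma3"},
]
words = [
    {"id": "id1", "word": "word1.1"},
    {"id": "id1", "word": "word1.2"},
    {"id": "id2", "word": "word2.1"},  # hypothetical entry; the original excerpt is truncated here
]

# Step 1: index the lemmas by id so each lookup is O(1).
id_to_lemma = {entry["id"]: entry["lemma"] for entry in lemmas}

# Step 2: map every word to the lemma that shares its id,
# skipping words whose id has no lemma.
word_to_lemma = {
    entry["word"]: id_to_lemma[entry["id"]]
    for entry in words
    if entry["id"] in id_to_lemma
}

# Optional inverse view: each lemma with all of its words.
lemma_to_words = defaultdict(list)
for word, lemma in word_to_lemma.items():
    lemma_to_words[lemma].append(word)

print(word_to_lemma)
```

Building the intermediate `id_to_lemma` index first avoids scanning the whole `lemmas` list once per word, which matters when both lists are large.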

Neither stemming nor lemmatizing seems to work very well; what should I do?

I am new to text analysis and am trying to create a bag-of-words model (using sklearn's CountVectorizer class). I have a data frame with a column of text with words like 'acid', 'acidic', 'acidity', 'wood', 'woodsy', 'woody'. I think that 'acid' and 'wood' should be the only words included in the final output; however, neither stemming nor lemmatizing seems
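When the vocabulary is small and domain-specific, one option is a hand-curated normalization table applied before counting. The following is a minimal sketch using only the standard library, not the asker's method; the mapping `NORMALIZE`, the function `tokenize`, and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical hand-curated mapping from surface forms to the desired base form.
NORMALIZE = {
    "acidic": "acid",
    "acidity": "acid",
    "woodsy": "wood",
    "woody": "wood",
}

def tokenize(text):
    """Lowercase, split on alphabetic runs, and map known variants to their base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [NORMALIZE.get(token, token) for token in tokens]

# Hypothetical sample documents.
docs = ["Acidic with woody notes", "High acidity, woodsy finish"]

# One bag of words (token -> count) per document.
bags = [Counter(tokenize(doc)) for doc in docs]
print(bags)
```

A callable like `tokenize` could in principle be passed to `CountVectorizer` through its `tokenizer` parameter, so that the vectorizer counts the normalized forms; that wiring is left out here to keep the sketch dependency-free.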

How to solve Spanish lemmatization problems with spaCy?

When I try to lemmatize a CSV with more than 60,000 Spanish words, spaCy does not lemmatize certain words correctly. I understand that the model is not 100% accurate. However, I have not found any other solution, since NLTK does not come with a Spanish model. A friend tried to ask this question on the Spanish Stack Overflow; however, the community is quite small

NLTK: how to get inflections of words

I have a list of nearly 5000 English words, and for each word I need these inflectional forms: noun: singular and plural; verb: infinitive, present simple, present simple 3rd person, past simple, present participle (-ing form), past participle; adjective: comparative and superlative; adverb. How can I extract this information for a given word (e.g. help) with NLTK in Python?
