I’m getting the following prompt when calling model.train() from gensim word2vec. The only solutions I found while searching for an answer point to the iterable vs. iterator difference, and at this point I have tried everything I could to solve this on my own. Currently, my code looks like this: The corpus variable is a list containing sentences, and each
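The usual cause of that warning is passing a one-shot generator where gensim needs a restartable iterable. A minimal sketch, assuming gensim 4.x (the parameter is `size` rather than `vector_size` in 3.x), of training on a plain list of tokenized sentences:

    from gensim.models import Word2Vec

    # A plain list of tokenized sentences is a restartable iterable, so gensim can
    # make multiple passes over it; a generator would be exhausted after one pass.
    corpus = [
        ["i", "am", "studying", "word2vec"],
        ["gensim", "needs", "a", "restartable", "corpus"],
    ]

    model = Word2Vec(vector_size=100, min_count=1)   # `size` instead of `vector_size` in gensim 3.x
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)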
Tag: gensim
Retrieve n-grams with word2vec
I have a list of texts. I turn each text into a token list. For example, if one of the texts is ‘I am studying word2vec’, the respective token list will be (assuming I consider n-grams with n = 1, 2, 3) [‘I’, ‘am’, ‘studying’, ‘word2vec’, ‘I am’, ‘am studying’, ‘studying word2vec’, ‘I am studying’, ‘am studying word2vec’]. Is
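A hedged sketch of how such a token list could be built before feeding it to word2vec; ngram_tokens is a hypothetical helper, not part of gensim:

    def ngram_tokens(text, max_n=3):
        # Return all word n-grams (n = 1..max_n) of a text as space-joined strings.
        words = text.split()
        tokens = []
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                tokens.append(" ".join(words[i:i + n]))
        return tokens

    print(ngram_tokens("I am studying word2vec"))
    # ['I', 'am', 'studying', 'word2vec', 'I am', 'am studying',
    #  'studying word2vec', 'I am studying', 'am studying word2vec']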
disable logging for specific lines of code
I am tuning the word2vec model hyper-parameters. Word2Vec prints so many log messages in the console that I cannot read Optuna’s output or my custom log. Is there any trick to suppress the logs generated by Word2Vec? Answer I used the following code in Python 3.7; in Python 3.6 we have to send logging.ERROR to the disable function.
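A minimal sketch of both options, assuming the noise comes from gensim’s own logger; silencing that logger directly leaves Optuna and custom logs untouched:

    import logging

    # Option 1: raise only gensim's logger threshold, keeping other loggers intact.
    logging.getLogger("gensim").setLevel(logging.ERROR)

    # Option 2 (the answer's approach): disable logging globally up to a level.
    logging.disable(logging.ERROR)   # Python 3.6 needs an explicit level
    # logging.disable()              # Python 3.7+ defaults to CRITICAL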
Modifying .trainables.syn1neg[i] with previously trained vectors in Gensim word2vec
My issue is the following. In my code I’m modifying the .wv[word] vectors before training but after .build_vocab(), which is fairly straightforward: for every word I just replace the vectors in there with my own, where setIntersection is just the set of words common to the gensim word2vec vocabulary and the RandomIndexing-trained model, with the same vector size of 300 in both. Now I want to also
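A sketch of that setup using gensim 4.x attribute names (key_to_index and model.syn1neg; in 3.x these are wv.vocab[word].index and model.trainables.syn1neg); the corpus, setIntersection and ri_vectors here are toy stand-ins for the question’s data:

    import numpy as np
    from gensim.models import Word2Vec

    corpus = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]]

    model = Word2Vec(vector_size=300, min_count=1, negative=5)  # `size` in gensim 3.x
    model.build_vocab(corpus)

    # Hypothetical stand-ins: words shared between the gensim vocabulary and the
    # Random Indexing model, plus their 300-d vectors.
    setIntersection = {"the", "dog"}
    ri_vectors = {w: np.random.rand(300).astype(np.float32) for w in setIntersection}

    for word in setIntersection:
        idx = model.wv.key_to_index[word]         # model.wv.vocab[word].index in gensim 3.x
        model.wv.vectors[idx] = ri_vectors[word]  # input (projection) layer
        model.syn1neg[idx] = ri_vectors[word]     # output layer; model.trainables.syn1neg in 3.x

    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)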
training a Fasttext model
I want to train a FastText model in Python using the “gensim” library. First, I should tokenize each sentence into its words, hence converting each sentence to a list of words. Then, this list should be appended to a final list. Therefore, at the end, I will have a nested list containing all tokenized sentences: Then, the model should be
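A short sketch of that flow, assuming gensim 4.x parameter names (vector_size/epochs rather than size/iter) and toy sentences:

    from gensim.models import FastText

    # Tokenize each sentence into words and collect them in a nested list.
    raw_sentences = ["I want to train a FastText model", "gensim makes this straightforward"]
    tokenized = [sentence.lower().split() for sentence in raw_sentences]

    model = FastText(vector_size=100, window=5, min_count=1)
    model.build_vocab(tokenized)
    model.train(tokenized, total_examples=model.corpus_count, epochs=model.epochs)

    # Subword n-grams let FastText produce vectors even for words unseen in training.
    print(model.wv["fasttextish"])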
Gensim LDA Coherence Score Nan
I created a Gensim LDA model as shown in this tutorial: https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/ It generates 10 topics with a log_perplexity of lda_model.log_perplexity(data_df[‘bow_corpus’]) = -5.325966117835991. But when I run the coherence model on it to calculate the coherence score, like so: my LDA score is nan. What am I doing wrong here? Answer Solved! The Coherence Model requires the original texts instead of the
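A self-contained sketch of the fix with toy data standing in for the question’s dataframe; the point is that c_v coherence takes the tokenized texts (plus the dictionary), not the bag-of-words corpus:

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel, CoherenceModel

    # Toy tokenized texts standing in for the question's data.
    texts = [["topic", "modeling", "with", "gensim"],
             ["gensim", "lda", "coherence", "score"],
             ["topic", "coherence", "needs", "texts"]]

    dictionary = Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(t) for t in texts]
    lda_model = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=2, passes=5)

    # c_v coherence requires the original tokenized texts, not the bow corpus;
    # passing only corpus= is what typically produces nan here.
    cm = CoherenceModel(model=lda_model, texts=texts, dictionary=dictionary, coherence="c_v")
    print(cm.get_coherence())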
Doc2Vec find the similar sentence
I am trying to find similar sentences using doc2vec. What I am not able to find is the actual sentence that matches from the trained sentences. Below is the code from this article: But the above code only gives me vectors or numbers. How can I get the actual sentence matched from the training data? For e.g. – In this case
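One way to recover the matched text, sketched with toy data: keep the training sentences in a list, use their indices as tags, and map the tags returned by most_similar() back to the sentences (model.dv is model.docvecs in gensim 3.x):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Toy training data standing in for the article's sentences.
    sentences = ["I love machine learning", "doc2vec finds similar sentences",
                 "gensim is a topic modelling library"]
    tagged = [TaggedDocument(words=s.lower().split(), tags=[i]) for i, s in enumerate(sentences)]

    model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

    query = "which sentence is most similar".lower().split()
    inferred = model.infer_vector(query)

    # most_similar returns (tag, similarity) pairs; the integer tag indexes back
    # into the original sentence list, which is how to recover the actual text.
    for tag, score in model.dv.most_similar([inferred], topn=3):
        print(score, sentences[tag])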
CalledProcessError: Returned non-zero exit status 1
When I try to run: I get the following error: What can I do in my code specifically to make it work? Furthermore, the question about this error has been asked a few times before. However, each answer seems so specific to a particular case that I don’t see what I can change in my code now so that it
Measure similarity between two documents using Doc2Vec
I have an already trained gensim Doc2Vec model, which finds the most similar documents to an unknown one. Now I need to find the similarity value between two unknown documents (which were not in the training data, so they cannot be referenced by doc id). In the code above, vec1 and vec2 are successfully initialized to some values and of
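A minimal sketch, assuming a trained Doc2Vec model (the file name here is hypothetical): infer a vector for each unseen document and take the cosine similarity of the two:

    import numpy as np
    from gensim.models.doc2vec import Doc2Vec

    # Assumes `model` is the already-trained Doc2Vec model from the question.
    model = Doc2Vec.load("my_doc2vec.model")   # hypothetical path

    doc1 = "first unseen document about word embeddings".split()
    doc2 = "second unseen document about document vectors".split()

    vec1 = model.infer_vector(doc1)
    vec2 = model.infer_vector(doc2)

    # Cosine similarity between the two inferred vectors.
    similarity = np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
    print(similarity)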