Tag: gensim

Gensim Word2Vec exhausting iterable

I’m getting the following prompt when calling model.train() from gensim word2vec. The only solutions I found in my search for an answer point to the iterable vs. iterator difference, and at this point I have tried everything I could to solve this on my own. Currently, my code looks like this: The corpus variable is a list containing sentences, and each
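The iterable vs. iterator difference mentioned in this question can be illustrated without gensim. Word2Vec reads the corpus more than once (a vocabulary pass, then the training epochs), so a one-shot generator runs dry after the first pass, while an object whose `__iter__` starts over each time does not. The following is only a minimal sketch: the class name `RestartableCorpus` and the sample sentences are made up for illustration and are not taken from the asker's code.

```python
class RestartableCorpus:
    """Iterable over tokenized sentences that can be looped over repeatedly."""

    def __init__(self, sentences):
        self.sentences = sentences  # list of token lists (hypothetical data)

    def __iter__(self):
        # A fresh generator is created on every call, so each pass starts over.
        for tokens in self.sentences:
            yield tokens


sentences = [["hello", "world"], ["gensim", "word2vec"]]

one_shot = (tokens for tokens in sentences)  # generator: a one-shot iterator
restartable = RestartableCorpus(sentences)   # iterable: restartable

first_pass_gen = list(one_shot)
second_pass_gen = list(one_shot)             # nothing left to yield

first_pass_iter = list(restartable)
second_pass_iter = list(restartable)         # yields every sentence again
```

An object like `restartable`, or a plain list of token lists, is the kind of input that survives multiple passes.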

Retrieve n-grams with word2vec

I have a list of texts. I turn each text into a token list. For example, if one of the texts is 'I am studying word2vec', the respective token list will be (assuming I consider n-grams with n = 1, 2, 3) ['I', 'am', 'studying', 'word2vec', 'I am', 'am studying', 'studying word2vec', 'I am studying', 'am studying word2vec']. Is
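The token list described in this question can be produced with a few lines of plain Python. This is a sketch under simple assumptions: whitespace tokenization, and a helper name `ngram_tokens` chosen here for illustration.

```python
def ngram_tokens(text, max_n=3):
    """Return all word n-grams of the text for n = 1..max_n, shortest first."""
    words = text.split()  # assumption: whitespace tokenization is good enough
    tokens = []
    for n in range(1, max_n + 1):
        # Slide a window of n words across the sentence.
        for start in range(len(words) - n + 1):
            tokens.append(" ".join(words[start:start + n]))
    return tokens


tokens = ngram_tokens("I am studying word2vec")
```

For the example sentence this yields the nine tokens listed in the question, unigrams first, then bigrams, then trigrams.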

disable logging for specific lines of code

I am tuning the word2vec model hyper-parameters. Word2Vec writes so many log messages to the console that I cannot read the Optuna logs or my own custom logs. Is there any trick to suppress the logs generated by Word2Vec? Answer: I used the following code in Python 3.7; in Python 3.6 we have to pass logging.ERROR to the disable function.
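The answer's code is not reproduced in this excerpt. One way to silence logging for specific lines only, sketched here with the standard `logging` module, is a small context manager around `logging.disable`. The name `quiet_logging` is hypothetical, the logger name `gensim.models.word2vec` is assumed from the usual module-name convention, and the sketch assumes logging was not already disabled before the block is entered. Passing the level explicitly also works on Python 3.6, where `logging.disable` has no default argument.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logging(level=logging.ERROR):
    """Suppress log records at `level` and below while the block runs."""
    logging.disable(level)
    try:
        yield
    finally:
        # Assumption: nothing was disabled beforehand, so reset to NOTSET.
        logging.disable(logging.NOTSET)


# Assumed logger name; any logger is affected, since disable() is global.
logger = logging.getLogger("gensim.models.word2vec")
logger.setLevel(logging.INFO)

with quiet_logging():
    # Noisy calls (for example, model training) would go here.
    enabled_inside = logger.isEnabledFor(logging.INFO)

enabled_after = logger.isEnabledFor(logging.INFO)
```

Because `logging.disable` is process-wide, Optuna and custom loggers are silenced inside the block too, so only the noisy lines should be wrapped.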

training a Fasttext model

I want to train a Fasttext model in Python using the “gensim” library. First, I should tokenize each sentence into its words, converting each sentence to a list of words. Then, this list should be appended to a final list. Therefore, at the end, I will have a nested list containing all tokenized sentences: Then, the model should be
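The preprocessing step described here, building a nested list of tokenized sentences, could look like the sketch below. The sample sentences and the lowercase, whitespace-based tokenization are assumptions made for illustration; the asker's actual tokenizer is not shown in the excerpt.

```python
raw_sentences = [
    "I want to train a model",
    "Each sentence becomes a list of words",
]

tokenized_sentences = []
for sentence in raw_sentences:
    # Assumption: lowercase and split on whitespace; swap in a real tokenizer as needed.
    words = sentence.lower().split()
    tokenized_sentences.append(words)
```

A list of token lists like `tokenized_sentences` is the corpus shape that gensim's training classes accept.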

Gensim LDA Coherence Score Nan

I created a Gensim LDA Model as shown in this tutorial: And it generates 10 topics with a log_perplexity of: lda_model.log_perplexity(data_df['bow_corpus']) = -5.325966117835991 But when I run the coherence model on it to calculate the coherence score, like so: My LDA-Score is nan. What am I doing wrong here? Answer: Solved! Coherence Model requires the original text, instead of the

Doc2Vec find the similar sentence

I am trying to find a similar sentence using doc2vec. What I am not able to find is the actual sentence that matches from the trained sentences. Below is the code from this article: But the above code only gives me vectors or numbers. How can I get the actual matching sentence from the training data? For example, in this case

CalledProcessError: Returned non-zero exit status 1

When I try to run: I get the following error: What can I do in my code specifically to make it work? Furthermore, the question on this error has been asked a few times before. However, each answer seems so specific to a particular case that I don’t see what I can change in my code now so that it
