Modifying .trainables.syn1neg[i] with previously trained vectors in Gensim word2vec

Question

My issue is the following. In my code I modify .wv[word] after .build_vocab() but before training, which is fairly straightforward: for every word, I replace the vector that is there with my own. Here setIntersection is just the set of words common to the Gensim word2vec vocabulary and my trained RandomIndexing model. Both use vectors of size 300. Now I want to also

Accepted Answer

In Gensim 4.0+, that "hidden to output layer" is just in w2v_model.syn1neg, instead of a (now-removed) subcomponent .trainables.

Following the original word2vec.c on which Gensim's implementation is based, those weights begin training as uninitialized zeros.

As the output (predicted-word) nodes are exactly the same vocabulary as are considered in the input/projection layer, the correspondence of rows-to-words is exactly the same as in the input layer, aka the word-vectors being trained. (That was previously in an array called .syn0, more recently called just .vectors.)

So the word that's in slot 0 in w2v_model.wv.vectors is also the word represented by the output-node fed by w2v_model.syn1neg[0].

In Gensim 4.0+, these word-to-slot values can be read from w2v_model.wv.key_to_index[word]. (Pre-4.0, I think it was w2v_model.wv.vocab[word].index.)
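To make the row-to-word correspondence concrete, here is a minimal sketch of copying previously trained vectors into the matching rows. The helper name copy_pretrained_rows, the pretrained mapping, and the toy data are hypothetical and only for illustration. Plain lists stand in for the real arrays so the snippet runs without Gensim. The comments at the end show how it might be applied to a real model, assuming Gensim 4.0+, negative sampling enabled (so syn1neg exists), and that .build_vocab() has already been called.

```python
def copy_pretrained_rows(target_rows, key_to_index, pretrained, words):
    """Overwrite target_rows[slot] with pretrained[word] for every word in
    `words` that is present in both key_to_index and pretrained.

    target_rows  : row-indexable container (e.g. a 2-D NumPy array)
    key_to_index : mapping from word to row slot
    pretrained   : mapping from word to a vector of matching width
    Returns the list of words whose rows were overwritten.
    """
    updated = []
    for word in words:
        slot = key_to_index.get(word)
        if slot is None or word not in pretrained:
            continue  # skip words missing from either side
        target_rows[slot] = pretrained[word]
        updated.append(word)
    return updated


# Toy stand-ins (hypothetical data, not a real model).
key_to_index = {"cat": 0, "dog": 1, "fish": 2}
syn1neg_like = [[0.0, 0.0] for _ in key_to_index]  # starts as zeros
pretrained = {"cat": [0.1, 0.2], "fish": [0.5, 0.6], "bird": [0.9, 0.9]}

updated = copy_pretrained_rows(
    syn1neg_like, key_to_index, pretrained, ["cat", "fish", "bird"]
)

# With a real Gensim 4.0+ model the same call might look like this
# (ri_vectors is an assumed dict of word -> 300-dim vector):
#   copy_pretrained_rows(w2v_model.syn1neg, w2v_model.wv.key_to_index,
#                        ri_vectors, setIntersection)
#   copy_pretrained_rows(w2v_model.wv.vectors, w2v_model.wv.key_to_index,
#                        ri_vectors, setIntersection)
```

Because the same slot number indexes both arrays, one helper can serve for the input vectors and the output weights alike. Whether seeding syn1neg with vectors from another model actually helps training is a separate question that this sketch does not address.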

Answer