Measure similarity between two documents using Doc2Vec

I have already trained a gensim Doc2Vec model, which finds the documents most similar to an unknown one.

Now I need to find the similarity value between two unknown documents (which were not in the training data, so they cannot be referenced by doc id).


In my code, vec1 and vec2 are successfully initialized to some values, and each has size ‘vector_size’.

Looking through the gensim API and examples, I could not find a method that works for me; all of them expect a TaggedDocument.

Can I compare the feature vectors value by value, and if they are closer, does that mean the texts are more similar?


Answer

Hello, just in case someone is interested: to do this you only need the cosine distance between the two vectors.

I found that most people use the ‘spatial’ module from SciPy for this purpose.

Here is a small code snippet that should work pretty well if you have already trained a doc2vec model:
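As a minimal sketch of the idea (not the original snippet), cosine similarity can be computed without any extra dependency. The function name `cosine_similarity` and the placeholder vectors are assumptions made for illustration; with a trained gensim model the vectors would normally come from `model.infer_vector(tokens)`. If SciPy is installed, `1 - scipy.spatial.distance.cosine(vec1, vec2)` should give the same value, since that function returns the cosine distance.

```python
import math


def cosine_similarity(vec1, vec2):
    """Return the cosine similarity of two equal-length vectors, in [-1, 1]."""
    if len(vec1) != len(vec2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Assumption: with a trained gensim Doc2Vec model, the vectors would be
#   vec1 = model.infer_vector(tokens_of_doc1)
#   vec2 = model.infer_vector(tokens_of_doc2)
# Placeholder values are used here so the sketch runs on its own.
vec1 = [0.1, 0.3, -0.2, 0.5]
vec2 = [0.2, 0.1, -0.1, 0.4]

similarity = cosine_similarity(vec1, vec2)
print(f"cosine similarity: {similarity:.3f}")
```

A value near 1 means the two vectors point in almost the same direction, so the texts are likely similar; a value near 0 or below suggests they are unrelated. Because `infer_vector` is randomized, repeated calls on the same text can yield slightly different scores.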
