Measure similarity between two documents using Doc2Vec

Question

I have already trained a gensim Doc2Vec model, which finds the documents most similar to an unknown one. Now I need to find the similarity value between two unknown documents (which were not in the training data, so they cannot be referenced by doc id). In the code above, vec1 and vec2 are successfully initiali…

Accepted Answer

Hello, just in case someone is interested: to do this you just need the cosine distance between the two vectors. I found that most people use scipy's 'spatial' module for this purpose. Here is a small code snippet that should work pretty well if you already have a trained doc2vec model:

    from gensim.models import doc2vec
    from scipy import spatial

    # model_file is the path to your saved Doc2Vec model
    d2v_model = doc2vec.Doc2Vec.load(model_file)

    first_text = '..'
    second_text = '..'

    # infer_vector expects a list of tokens, so split each text into words
    vec1 = d2v_model.infer_vector(first_text.split())
    vec2 = d2v_model.infer_vector(second_text.split())

    cos_distance = spatial.distance.cosine(vec1, vec2)
    # cos_distance indicates how much the two texts differ from each other:
    # higher values mean more distant (i.e. different) texts
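Since the question asks for a similarity value rather than a distance, it may help to note that cosine similarity is 1 minus the cosine distance. Below is a minimal sketch of that relationship in plain Python. The helper name cosine_similarity and the toy vectors are made up for illustration; they only stand in for the vectors returned by infer_vector.

```python
import math


def cosine_similarity(vec1, vec2):
    """Return the cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = math.sqrt(sum(a * a for a in vec1))
    norm2 = math.sqrt(sum(b * b for b in vec2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Toy vectors standing in for inferred document vectors (illustrative values only).
vec1 = [1.0, 2.0, 3.0]
vec2 = [2.0, 4.0, 6.0]     # points in the same direction as vec1
vec3 = [-1.0, -2.0, -3.0]  # points in the opposite direction

sim_same = cosine_similarity(vec1, vec2)      # close to 1: very similar
sim_opposite = cosine_similarity(vec1, vec3)  # close to -1: opposite
cos_distance = 1.0 - sim_same                 # distance is 1 minus similarity
```

With real inferred vectors, the same similarity should be obtainable as 1 - spatial.distance.cosine(vec1, vec2): values near 1 mean the two texts are similar, and lower values mean they differ more.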

Answer