The Problem: I am trying to determine the similarity between two 1D arrays composed of counts. Both the positions and relative magnitudes of the counts inside the arrays are important.
X = [1, 5, 10, 0, 0, 0, 2]
Y = [1, 2, 0, 0, 10, 0, 5]
Z = [1, 3, 8, 0, 0, 0, 1]
In this case array X is more similar to array Z than array Y.
I have tried a few metrics, including cosine distance, earth mover's distance (EMD), and histogram intersection. Cosine distance and EMD both work decently, but only EMD really satisfies both of my conditions.
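For reference, the three metrics named above can be sketched in plain Python. This is only an illustration and not necessarily how they were computed here: the function names are hypothetical, the histograms are assumed to be normalized to sum to 1, and the bins are assumed to be equally spaced so that the 1D EMD reduces to a sum of absolute differences between cumulative sums.

```python
import math
from itertools import accumulate

X = [1, 5, 10, 0, 0, 0, 2]
Y = [1, 2, 0, 0, 10, 0, 5]
Z = [1, 3, 8, 0, 0, 0, 1]

def normalize(a):
    # Scale counts so they sum to 1 (assumes at least one nonzero count).
    total = sum(a)
    return [x / total for x in a]

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means the vectors point the same way.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def emd_1d(a, b):
    # 1D earth mover's distance on unit-spaced bins:
    # sum of absolute differences between the cumulative distributions.
    ca = accumulate(normalize(a))
    cb = accumulate(normalize(b))
    return sum(abs(x - y) for x, y in zip(ca, cb))

def histogram_intersection(a, b):
    # Similarity in [0, 1]; higher means more overlap.
    return sum(min(x, y) for x, y in zip(normalize(a), normalize(b)))

for name, other in (("Y", Y), ("Z", Z)):
    print(name,
          cosine_distance(X, other),
          emd_1d(X, other),
          histogram_intersection(X, other))
```

With these definitions, all three measures rank Z as closer to X than Y is. Only the EMD grows with how far the counts have been shifted.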
I am curious to know if there are other algorithms / distance metrics out there that exist to answer this sort of problem.
Thank you!
Answer
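One popular and simple method is a root-mean-square style difference, where you sum the squares of the differences between corresponding elements, take the square root, and divide by the number of elements. Strictly speaking, RMS divides by the number of elements before taking the square root; the version used here is the Euclidean distance divided by the array length, which gives the same ranking for arrays of equal length. In your case, X vs Y produces about 2.1, and X vs Z produces about 0.4.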
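import math

X = [1, 5, 10, 0, 0, 0, 2]
Y = [1, 2, 0, 0, 10, 0, 5]
Z = [1, 3, 8, 0, 0, 0, 1]

def rms(a, b):
    # Euclidean distance between a and b, divided by the number of elements
    return math.sqrt(sum((a1 - b1) * (a1 - b1) for a1, b1 in zip(a, b))) / len(a)

print(rms(X, Y))  # about 2.11
print(rms(X, Z))  # about 0.43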
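For comparison, here is a minimal sketch of the textbook RMS difference, which takes the mean of the squared differences before the square root. The name `rms_difference` is just illustrative, and the length check is an added assumption that both arrays have the same number of bins.

```python
import math

X = [1, 5, 10, 0, 0, 0, 2]
Y = [1, 2, 0, 0, 10, 0, 5]
Z = [1, 3, 8, 0, 0, 0, 1]

def rms_difference(a, b):
    # Square root of the mean squared element-wise difference.
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    mean_sq = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.sqrt(mean_sq)

print(rms_difference(X, Y))
print(rms_difference(X, Z))
```

The absolute values differ from the version above, but X is still closer to Z than to Y. Like any element-wise measure, this only compares counts at matching positions, so it does not reflect how far a count has moved between bins.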