Determine the similarity between two arrays of counts [closed]

The Problem: I am trying to determine the similarity between two 1D arrays composed of counts. Both the positions and relative magnitudes of the counts inside the arrays are important.

X = [1, 5, 10, 0,  0, 0, 2]
Y = [1, 2,  0, 0, 10, 0, 5]
Z = [1, 3,  8, 0,  0, 0, 1]

In this case, array X is more similar to array Z than to array Y.

I have tried a few metrics, including cosine distance, earth mover's distance (EMD), and histogram intersection. Cosine distance and EMD both work decently, but only EMD really satisfies both of my conditions.
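
The three metrics named above can be sketched in plain Python as follows. This is an illustrative sketch only: the function names, the normalization of each array to sum to 1, and the unit spacing between bins are assumptions, not details taken from the question.

```python
import math

X = [1, 5, 10, 0, 0, 0, 2]
Y = [1, 2, 0, 0, 10, 0, 5]
Z = [1, 3, 8, 0, 0, 0, 1]

def normalize(a):
    # Assumption: scale counts so they sum to 1 (discards total magnitude).
    total = sum(a)
    return [v / total for v in a]

def cosine_distance(a, b):
    # 1 - cos(angle between a and b); 0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def histogram_intersection(a, b):
    # Overlap of the normalized histograms; 1 means identical shapes.
    pa, pb = normalize(a), normalize(b)
    return sum(min(x, y) for x, y in zip(pa, pb))

def emd_1d(a, b):
    # 1D EMD for normalized histograms with unit bin spacing:
    # the sum of absolute differences between the running totals.
    pa, pb = normalize(a), normalize(b)
    cum_a = cum_b = total = 0.0
    for x, y in zip(pa, pb):
        cum_a += x
        cum_b += y
        total += abs(cum_a - cum_b)
    return total

for name, other in (("Y", Y), ("Z", Z)):
    print(name, cosine_distance(X, other),
          histogram_intersection(X, other), emd_1d(X, other))
```

With these definitions, all three measures rate X as closer to Z than to Y on the example arrays. Cosine distance and EMD are distances (smaller is closer), while histogram intersection is a similarity (larger is closer).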

I am curious to know whether other algorithms or distance metrics exist for this sort of problem.

Thank you!

Answer

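One popular and simple method is based on the root-mean-square (RMS) difference: you sum the squares of the differences between corresponding elements, take the square root, and divide by the number of elements. Strictly speaking, RMS divides by the number of elements before taking the square root; the version here is the Euclidean distance scaled by the array length, and for arrays of equal length it ranks pairs in the same order. In your case, X vs Y produces about 2.1, and X vs Z produces about 0.4, so X is closer to Z.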

import math

X = [1, 5, 10, 0,  0, 0, 2]
Y = [1, 2,  0, 0, 10, 0, 5]
Z = [1, 3,  8, 0,  0, 0, 1]

def rms(a, b):
    # Euclidean distance between a and b, divided by the number of elements.
    return math.sqrt(sum((a1 - b1) ** 2 for a1, b1 in zip(a, b))) / len(a)

print(rms(X,Y))
print(rms(X,Z))
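
For comparison, here is a minimal sketch of the textbook RMS difference, which takes the mean of the squared differences before the square root. The name rms_difference and the length check are illustrative additions, not part of the answer above.

```python
import math

X = [1, 5, 10, 0, 0, 0, 2]
Y = [1, 2, 0, 0, 10, 0, 5]
Z = [1, 3, 8, 0, 0, 0, 1]

def rms_difference(a, b):
    # Hypothetical helper: square root of the mean squared difference.
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    mean_sq = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.sqrt(mean_sq)

print(rms_difference(X, Y))  # about 5.58
print(rms_difference(X, Z))  # about 1.13
```

The absolute values differ from the scaled version, but the ordering is unchanged: X is still much closer to Z than to Y. Note that either variant compares values position by position, so it does not account for how far a count has moved between bins the way EMD does.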
User contributions licensed under: CC BY-SA