Suppose I have a bunch of arrays, including x and y, and I want to check if they’re equal. Generally, I can just use np.all(x == y) (barring some dumb corner cases which I’m ignoring now).
However, this evaluates the entire array of (x == y), which is usually not needed. My arrays are really large, and I have a lot of them, and the probability of two arrays being equal is small, so in all likelihood I only need to evaluate a very small portion of (x == y) before the all function could return False. So this is not an optimal solution for me.
I’ve tried using the builtin all function in combination with itertools.izip: all(val1==val2 for val1,val2 in itertools.izip(x, y))
However, that is so much slower when the two arrays are equal that, overall, it’s still not worth using over np.all, I presume because of the builtin all’s general-purposeness. And np.all doesn’t work on generators.
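For reference, the generator-based attempt can be written as a self-contained function. This is only a sketch under a few assumptions: it targets Python 3, where the builtin zip is already lazy and itertools.izip no longer exists; the name arrays_equal_lazy is hypothetical; and it iterates over .flat so that multi-dimensional arrays are compared element by element. It stops at the first mismatch, but each comparison runs in the Python interpreter, which is presumably why it is slow when the arrays are equal.

```python
import numpy as np


def arrays_equal_lazy(x, y):
    """Hypothetical helper: element-wise equality with early termination.

    Assumes Python 3, where zip() is lazy (the role itertools.izip played
    in Python 2).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Arrays of different shapes are treated as unequal (no broadcasting).
    if x.shape != y.shape:
        return False
    # .flat iterates over single elements regardless of dimensionality;
    # all() stops consuming the generator at the first False.
    return all(v1 == v2 for v1, v2 in zip(x.flat, y.flat))
```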
Is there a way to do what I want in a more speedy manner?
I know this question is similar to previously asked questions (e.g. Comparing two numpy arrays for equality, element-wise) but they specifically don’t cover the case of early termination.
Answer
Until this is implemented in numpy natively you can write your own function and jit-compile it with numba:
import numpy as np
import numba as nb

@nb.jit(nopython=True)
def arrays_equal(a, b):
    if a.shape != b.shape:
        return False
    for ai, bi in zip(a.flat, b.flat):
        if ai != bi:
            return False
    return True

a = np.random.rand(10, 20, 30)
b = np.random.rand(10, 20, 30)

%timeit np.all(a==b)
# 100000 loops, best of 3: 9.82 µs per loop
%timeit arrays_equal(a, a)
# 100000 loops, best of 3: 9.89 µs per loop
%timeit arrays_equal(a, b)
# 100000 loops, best of 3: 691 ns per loop
Worst-case performance (arrays equal) is equivalent to np.all, and in the case of early stopping the compiled function has the potential to outperform np.all by a lot.
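If numba is not available, a rough alternative is to compare the arrays block by block in plain NumPy and stop at the first block that differs. This is a different technique from the jit-compiled loop in this answer, and only a sketch: the function name arrays_equal_chunked and the default chunk_size of 4096 are assumptions chosen for illustration, and no timings are claimed for it. Each block is compared in vectorized code, so at most one block's worth of extra work is done past the first mismatch.

```python
import numpy as np


def arrays_equal_chunked(a, b, chunk_size=4096):
    """Hypothetical helper: block-wise equality check with early termination.

    chunk_size is an illustrative default, not a tuned value.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Arrays of different shapes are treated as unequal (no broadcasting).
    if a.shape != b.shape:
        return False
    # reshape(-1) gives a 1-D view for contiguous arrays and a copy otherwise.
    a_flat = a.reshape(-1)
    b_flat = b.reshape(-1)
    for start in range(0, a_flat.size, chunk_size):
        stop = start + chunk_size
        # Compare one block at a time and return as soon as one differs.
        if not np.array_equal(a_flat[start:stop], b_flat[start:stop]):
            return False
    return True
```

A smaller chunk_size exits sooner after a mismatch but adds more Python-level loop overhead when the arrays are equal, so the value would need to be measured on the actual data.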