I've tried to optimize the search for the minimum absolute difference between the elements of two numpy vectors with numba. There is a speed-up and the result is correct until I use prange and the parallel=True option. I understand that the issue is in sharing the variables min_val, tmp, min_val_idx_a and min_val_idx_b during parallel execution (maybe between parallel threads). Is there a way to overcome the problem and use numba with the parallel=True option (which makes my simple code 300x faster)?
import numpy as np
from numba import jit, int32, void, double, prange

@jit(void(double[:], double[:], int32), nopython=True, parallel=True)
def lowest_value_numba(a, b, n):
    # initialization
    min_val_idx_a, min_val_idx_b = 0, 0
    min_val = tmp = np.abs(a[0]-b[0])
    for i in prange(n):
        # print(i)
        for j in prange(i, n):
            tmp = np.abs(a[i]-b[j])
            if(tmp < min_val):
                min_val = tmp
                min_val_idx_a = i
                min_val_idx_b = j
    print(min_val, min_val_idx_a, min_val_idx_b)

n = int(1e4)
a = np.random.uniform(low=0.0, high=1.0, size=n)
b = np.random.uniform(low=0.0, high=1.0, size=n)
# force a known minimum by setting the same value for a[n-1] and b[n-1]
a[n-1], b[n-1] = 1, 1
%timeit -n 1 -r 1 lowest_value_numba(a, b, n)
The output is incorrect (it should be 0.0 9999 9999):
0.23648058275546968 0 0
223 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
But when compiled with parallel=False, the output is correct (the last values are the closest to each other):
0.0 9999 9999
65 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Answer
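You can avoid the cross-iteration dependencies mentioned in the comments (every iteration reads and writes the shared min_val, min_val_idx_a and min_val_idx_b) if you parallelize by rows: each iteration i writes only to its own slot of an output array, and the final reduction over those slots is done serially. For example: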
import numpy as np
from numba import jit, int32, double, prange
from numba.types import Tuple

@jit(Tuple([double, int32, int32])(double[:], double[:], int32),
     nopython=True, parallel=True)
def lowest_value_numba(a, b, n):
    # one slot per row, so parallel iterations never write to the same place
    min_dif = np.empty_like(a)
    min_j = np.empty((n,), dtype=int32)
    for i in prange(n):
        diff = np.abs(a[i] - b)
        min_j[i] = j = np.argmin(diff)
        min_dif[i] = diff[j]
    # serial reduction over the per-row minima
    i = np.argmin(min_dif)
    j = min_j[i]
    min_val = min_dif[i]
    return min_val, i, j
The results are consistent with your implementation (tested with parallel=False and for j in prange(n)) and with the brute-force NumPy approach:
def lowest_value_numpy(a, b):
    diff = np.abs(np.atleast_2d(a).T - np.atleast_2d(b))
    indices = np.unravel_index(diff.argmin(), diff.shape)
    return diff[indices], *indices
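
As a rough sanity check, the row-wise reduction can be compared against the brute-force version on a tiny hand-made input. The sketch below uses plain NumPy only (no numba), so it illustrates the logic rather than the speed-up. The name lowest_value_by_rows and the sample arrays are made up for illustration, and the last elements are set equal, as in the question, so the expected minimum is at the last index of both arrays.

```python
import numpy as np

def lowest_value_numpy(a, b):
    # Brute force: full n x n matrix of absolute differences.
    diff = np.abs(a[:, None] - b[None, :])
    i, j = np.unravel_index(diff.argmin(), diff.shape)
    return float(diff[i, j]), int(i), int(j)

def lowest_value_by_rows(a, b):
    # Hypothetical pure-Python mirror of the row-wise scheme:
    # iteration i writes only to min_dif[i] and min_j[i],
    # so the iterations do not depend on each other.
    n = a.shape[0]
    min_dif = np.empty(n)
    min_j = np.empty(n, dtype=np.int64)
    for i in range(n):
        diff = np.abs(a[i] - b)
        j = np.argmin(diff)
        min_j[i] = j
        min_dif[i] = diff[j]
    # Serial reduction over the per-row minima.
    i = int(np.argmin(min_dif))
    return float(min_dif[i]), i, int(min_j[i])

# Illustrative data: a[3] == b[3], so the minimum difference is 0.0 at (3, 3).
a = np.array([0.10, 0.50, 0.90, 1.00])
b = np.array([0.30, 0.75, 0.05, 1.00])

result_rows = lowest_value_by_rows(a, b)
result_full = lowest_value_numpy(a, b)
print(result_rows, result_full)
```

Both functions search all pairs (i, j), not only j >= i as in the original double loop, which matches the for j in prange(n) variant mentioned above.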