I’ve tried to optimize searching for minimum value between two numpy
vectors with numba
. There is speed up and result is correct until I use prange
and parallel=True
option. I understand that the issue is in sharing variables min_val
, tmp
, min_val_idx_a
, min_val_idx_b
during parallel execution (maybe with parallel threads). Is there way to overcome the problem and use numba
in with parallel=True
option ? (which makes my simple code 300x faster)
import numpy as np from numba import jit, int32, void, double, prange @jit(void(double[:], double[:], int32), nopython=True, parallel=True) def lowest_value_numba(a, b, n): # initialization min_val_idx_a, min_val_idx_b = 0, 0 min_val = tmp = np.abs(a[0]-b[0]) for i in prange(n): # print(i) for j in prange(i, n): tmp = np.abs(a[i]-b[j]) if(tmp < min_val): min_val = tmp min_val_idx_a = i min_val_idx_b = j print(min_val, min_val_idx_a, min_val_idx_b) n = int(1e4) a = np.random.uniform(low=0.0, high=1.0, size=n) b = np.random.uniform(low=0.0, high=1.0, size=n) # setting min value by setting the same valu efor a[n-1] and b[n-1] a[n-1], b[n-1] = 1, 1 %timeit -n 1 -r 1 lowest_value_numba(a, b, n)
output which is incorrect (it should be 0.0 9999 9999
):
0.23648058275546968 0 0 223 µs ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
but for compilation with parallel=False
output is correct (last values are most close to each other):
0.0 9999 9999 65 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Advertisement
Answer
You can avoid the issues commented about cross iteration dependencies if you parallelize by rows. For example:
from numba import jit, int32, double, prange from numba.types import Tuple @jit(Tuple([double, int32, int32])(double[:], double[:], int32), nopython=True, parallel=True) def lowest_value_numba(a, b, n): min_dif = np.empty_like(a) min_j = np.empty((n,), dtype=int32) for i in prange(n): diff = np.abs(a[i] - b) min_j[i] = j = np.argmin(diff) min_dif[i] = diff[j] i = np.argmin(min_dif) j = min_j[i] min_val = min_dif[i] return min_val, i, j
Results are consistent with your implementation (tested with parallel=False
and for j in prange(n)
) and with the brute force Numpy approach:
def lowest_value_numpy(a, b): diff = np.abs(np.atleast_2d(a).T - np.atleast_2d(b)) indices = np.unravel_index(diff.argmin(), diff.shape) return diff[indices], *indices