Skip to content
Advertisement

FAST: 1D overlaps with rows in 2D?

let say i have 2D array, f.e.:

JavaScript

I want to calculate overlap with 1D vector, FAST.

I can almost do it with (8ms on big array):

JavaScript

The problem with it is that it only matches if both Position and Value match.

JavaScript

F.e. 5 in 2nd column of 1d vec did not match with 5 in 3rd column on the 2nd row.

It works with .isin()

JavaScript

but it is slow on big arrays (200000,10) ~20ms

Help me extend the first case so that it can match the Value in any position of 1D vector with the row.


the expected result is row-indexes ordered by OVERLAP COUNT, lets use [2,4,5] because it has more matches:

JavaScript

Overlap :

JavaScript

Order by overlap :

JavaScript

See rows : 6,3,1 have Overlap:2 that why they are first


Variants:

JavaScript

Can this be sped if using numba+cuda OR cupy on GPU ?

Advertisement

Answer

The main problem of all approach so fast is that they create huge temporary array while finally only 5 items are important. Numba can be used to compute the arrays on the fly (with efficient JIT-compiled loops) avoiding some temporary array. Moreover, a full sort is not required as only the top 5 items need to be retrieved. A partition can be used instead. It is even possible to use a faster approach since only the 5 selected items matters and not the others. Here is the resulting code:

JavaScript

Note that best4 can provide results different to best2 when the matching score (stored in tmp) is equal between multiple items. This is due to the sorting algorithm which is not stable by default in Numpy (the kind parameter can be used to adapt this behavior). This is also true for the partition algorithm although Numpy does not seems to provide a stable partition algorithm yet.

This code should be faster than other implementation, but not by a large margin. One of the issue is that Numba (and most C/C++ compilers like the one used to compile Numpy) do not succeed to generate a fast code since it does not know the value m at compile time. As a result, the most aggressive optimizations (eg. unrolling loops and using of SIMD instructions) can hardly be applied. You can help Numba using assertions or escaping conditionals.

Moreover, the code can be parallelized using multiple threads to make it much faster on mainstream platforms. Note that the parallelized version may not faster on small data nor all platforms since creating threads introduces an overhead that could be bigger than the actual computation.

Here is the resulting implementation:

JavaScript

Here are the timings on my machine with the example dataset:

JavaScript

The fastest implementation is 15 times faster than the original reference implementation.

Note that if m is greater than about 30, it should be better to use a more advanced set-based algorithm. An alternative solution is to sort match first and then use np.isin in the i-based loop in this case.

Advertisement