
2D Vectorization of unique values per row with condition

Consider the array and function definition shown:

import numpy as np

a = np.array([[2, 2, 5, 6, 2, 5],
              [1, 5, 8, 9, 9, 1],
              [0, 4, 2, 3, 7, 9],
              [1, 4, 1, 1, 5, 1],
              [6, 5, 4, 3, 2, 1],
              [3, 6, 3, 6, 3, 6],
              [0, 2, 7, 6, 3, 4],
              [3, 3, 7, 7, 3, 3]])

def grpCountSize(arr, grpCount, grpSize):
    # Per row: the unique values and how often each occurs.
    count = [np.unique(row, return_counts=True) for row in arr]
    # Per row: True if exactly grpCount values occur exactly grpSize times.
    valid = [np.any(np.count_nonzero(row[1] == grpSize) == grpCount) for row in count]
    return valid

The function is meant to select the rows of array a that contain exactly grpCount groups of identical elements, where each group holds exactly grpSize elements.

For example:

# which rows have exactly 1 group that holds exactly 2 identical elements?
out = a[grpCountSize(a, 1, 2)]

As expected, the code outputs out = [[2, 2, 5, 6, 2, 5], [3, 3, 7, 7, 3, 3]]. The 1st output row has exactly 1 group of 2 (i.e. 5,5), and the 2nd output row also has exactly 1 group of 2 (i.e. 7,7).

Similarly:

# which rows have exactly 2 groups that each hold exactly 3 identical elements?
out = a[grpCountSize(a, 2, 3)]

This produces out = [[3, 6, 3, 6, 3, 6]], because only this row has exactly 2 groups each holding exactly 3 identical elements (i.e. 3,3,3 and 6,6,6).

PROBLEM: My actual arrays have just 6 columns, but they can have many millions of rows. The code works perfectly as intended, but it is VERY SLOW for long arrays. Is there a way to speed this up?


Answer

np.unique sorts each row, which makes it less efficient for this purpose. Use np.bincount instead (note that np.bincount requires non-negative integers); depending on the array's shape and values, it will most likely save some time. You also no longer need np.any:

def grpCountSize(arr, grpCount, grpSize):
    # Per row: occurrence count of every value from 0 to row.max().
    count = [np.bincount(row) for row in arr]
    # Per row: True if exactly grpCount values occur exactly grpSize times.
    valid = [np.count_nonzero(row == grpSize) == grpCount for row in count]
    return valid
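
For intuition, here is what np.bincount produces for the first row of a (bin i holds how many times value i occurs):

row = np.array([2, 2, 5, 6, 2, 5])
print(np.bincount(row))                         # [0 0 3 0 0 2 1]: three 2s, two 5s, one 6
print(np.count_nonzero(np.bincount(row) == 2))  # 1 -> exactly one group of size 2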

Another way that might save even more time is to use the same number of bins for all rows and build a single 2D count array:

def grpCountSize(arr, grpCount, grpSize):
    m = arr.max()
    # One row of counts per input row, all with the same number of bins.
    count = np.stack([np.bincount(row, minlength=m + 1) for row in arr])
    return (count == grpSize).sum(1) == grpCount
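
As a quick sanity check with the array a from the question, this version returns a boolean mask that indexes the matching rows directly:

mask = grpCountSize(a, 1, 2)
print(mask)     # [ True False False False False False False  True]
print(a[mask])  # the two rows with exactly one group of two identical elements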

Yet another upgrade is to use a vectorized 2D bin count from this post. (Note that the Numba solutions tested in the post above are faster; I just provided the NumPy solution as an example. You can replace the function with any of the ones suggested in the linked post.)

def grpCountSize(arr, grpCount, grpSize):
    count = bincount2D_vectorized(arr)
    return (count == grpSize).sum(1) == grpCount

# From the post linked above.
def bincount2D_vectorized(a):
    N = a.max() + 1
    # Shift each row's values into its own block of N bins...
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    # ...so one flat bincount covers all rows, then reshape to (rows, N).
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)
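
To check whether the vectorized version pays off at your scale, a rough timing sketch along these lines can help; the random array below is a hypothetical stand-in for your data, and actual timings depend on hardware and on the range of values (which sets the number of bins):

import timeit

# Hypothetical input mimicking the question's shape: many rows, 6 columns, small values.
big = np.random.randint(0, 10, size=(1_000_000, 6))

t = timeit.timeit(lambda: grpCountSize(big, 1, 2), number=3)
print(f"vectorized 2D bincount: {t / 3:.3f} s per call")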

Output of all the solutions above:

a[grpCountSize(a, 1, 2)]
#array([[2, 2, 5, 6, 2, 5],
#       [3, 3, 7, 7, 3, 3]])