NumPy: Optimal way to count index occurrences in an array

I have an array indexs. It's very long (>10k elements), and each int value is rather small (<100), e.g.:

indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0]) # int index array
indexs_max = 4 # already known

Now I want to count the occurrences of each index value (e.g. 0 occurs 3 times, 1 occurs 2 times…) and get the counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods as follows:

UPDATE: _test4 is @Ch3steR's solution:

import numpy as np

indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9

def _test1():
    counts = np.zeros((indexs_max + 1, ), dtype=np.int32)
    for ind in indexs:
        counts[ind] += 1
    return counts

def _test2():
    # start from zeros because some values in the range may be missing from indexs
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    return counts

def _test3():
    therange = np.arange(0, indexs_max + 1)
    counts = np.sum(indexs[None] == therange[:, None], axis=1)
    return counts

def _test4():
    return np.bincount(indexs, minlength=indexs_max+1)

Running each method 500 times took 32.499472856521606s, 0.31386804580688477s, 0.14069509506225586s, and 0.017721891403198242s respectively. ~~Although _test3 is the fastest, it uses a lot of additional memory.~~
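For anyone who wants to reproduce a comparison like this, here is a minimal timing sketch using the standard-library timeit module. It only covers the np.unique and np.bincount variants. The function names count_unique and count_bincount are illustrative and not part of the original test script, and absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

rng_indexs = np.random.randint(0, 10, (20000,))  # same shape as in the question
indexs_max = 9


def count_unique(indexs, indexs_max):
    # np.unique only reports values that occur, so scatter into a zeroed array
    counts = np.zeros(indexs_max + 1, dtype=np.int64)
    vals, cnts = np.unique(indexs, return_counts=True)
    counts[vals] = cnts
    return counts


def count_bincount(indexs, indexs_max):
    # minlength pads the result so missing trailing values still get a 0 bin
    return np.bincount(indexs, minlength=indexs_max + 1)


# both approaches should agree before their speed is compared
assert np.array_equal(count_unique(rng_indexs, indexs_max),
                      count_bincount(rng_indexs, indexs_max))

for fn in (count_unique, count_bincount):
    elapsed = timeit.timeit(lambda: fn(rng_indexs, indexs_max), number=500)
    print(f"{fn.__name__}: {elapsed:.4f}s for 500 runs")
```

Checking that the two functions return the same array first guards against comparing the speed of methods that do not actually compute the same thing.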

So I'm asking whether there are any better methods. Thank you :) (@Ch3steR)

UPDATE: np.bincount seems optimal so far.

Answer

You can use np.bincount to count the occurrences in an array.

indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs)
# array([3, 2,  1,  1,  1])
#        0's 1's 2's 3's 4's count

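There's a caveat: np.bincount(x).size == np.amax(x) + 1, so the result has one bin for every integer from 0 up to the maximum, including values that never occur.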

Example:

indexs = np.array([5, 10])
np.bincount(indexs)
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
#                       5's            10's count

Here it counts occurrences of every value from 0 to the maximum in the array. A workaround can be:

c = np.bincount(indexs) # indexs is [5, 10]
c = c[c>0]
# array([1,  1])
#        5's 10's count
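
One thing to keep in mind with this workaround is that c[c>0] drops the information about which value each count belongs to. A minimal sketch of one way to keep both, using np.flatnonzero (the variable names values and counts are just illustrative):

```python
import numpy as np

indexs = np.array([5, 10])
c = np.bincount(indexs)

values = np.flatnonzero(c)  # positions of non-empty bins, i.e. the values that occur
counts = c[values]          # how often each of those values occurs

print(values)  # [ 5 10]
print(counts)  # [1 1]
```

np.unique(indexs, return_counts=True) returns the same pair directly, at the cost of sorting the input.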

If you have no missing values, i.e. every value from 0 to your_max occurs, you can use np.bincount directly.
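
When the maximum possible value is known up front, as with indexs_max in the question, the minlength argument of np.bincount gives a result of fixed length even if the largest values never occur. A short sketch, with a made-up input array:

```python
import numpy as np

indexs = np.array([1, 0, 0, 1, 2, 0])  # values 3 and 4 never occur
indexs_max = 4                         # known upper bound, as in the question

print(np.bincount(indexs))                            # [3 2 1]
counts = np.bincount(indexs, minlength=indexs_max + 1)
print(counts)                                         # [3 2 1 0 0]
```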

Another caveat:

From the docs:

Count the number of occurrences of each value in an array of non-negative ints.
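
To illustrate that restriction: np.bincount raises a ValueError when the input contains a negative integer. If the values span a known range, one possible workaround is to shift them by the minimum first. This is only a sketch of that idea, not something taken from the docs, and offset is an illustrative name.

```python
import numpy as np

x = np.array([-2, 0, -2, 1])

try:
    np.bincount(x)
except ValueError as exc:
    print("bincount rejected the input:", exc)

offset = x.min()                  # -2 here
counts = np.bincount(x - offset)  # shifted values start at 0
values = np.arange(offset, offset + counts.size)

print(values)  # [-2 -1  0  1]
print(counts)  # [2 0 1 1]
```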
