
NumPy: Optimal way to count index occurrences in an array

I have an array indexs. It is very long (>10k elements), and each int value is rather small (<100), e.g.:
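A small stand-in illustrates the shape of the data and the desired result. The values below are hypothetical, chosen only for illustration, and the plain loop is just a slow baseline that spells out what "counting" means here.

```python
import numpy as np

# Hypothetical input: small non-negative ints (the real array is much longer).
indexs = np.array([0, 1, 0, 2, 0, 3, 1, 4])

# Baseline: one counter slot per possible value, filled by a plain loop.
counts = np.zeros(indexs.max() + 1, dtype=np.int64)
for i in indexs:
    counts[i] += 1

print(counts)  # [3 2 1 1 1]
```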


Now I want to count the occurrences of each index value (e.g. 0 occurs 3 times, 1 occurs 2 times, …) and get the counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods as follows:

UPDATE: _test4 is @Ch3steR’s solution:

Run 500 times each, they take 32.499472856521606s, 0.31386804580688477s, 0.14069509506225586s, and 0.017721891403198242s respectively. Although _test3 is the fastest of my first three methods, it uses a lot of additional memory.
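For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. The functions count_loop and count_bincount are hypothetical stand-ins, not the _test1 to _test4 functions above, and the random data and repeat count are assumptions.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
indexs = rng.integers(0, 100, size=10_000)  # hypothetical data: 10k ints in [0, 100)


def count_loop(a, n=100):
    """Slow baseline: increment one counter per element in a Python loop."""
    counts = np.zeros(n, dtype=np.int64)
    for i in a:
        counts[i] += 1
    return counts


def count_bincount(a, n=100):
    """Vectorized count; minlength pads the result to n bins."""
    return np.bincount(a, minlength=n)


# Both approaches must agree before their timings are worth comparing.
assert np.array_equal(count_loop(indexs), count_bincount(indexs))

for fn in (count_loop, count_bincount):
    elapsed = timeit.timeit(lambda: fn(indexs), number=20)
    print(f"{fn.__name__}: {elapsed:.4f}s for 20 runs")
```

Absolute numbers depend on the machine and NumPy version, so only the relative ordering is meaningful.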

So I’m asking whether there are any better methods. Thank you :) (@Ch3steR)


UPDATE: np.bincount seems optimal so far.


Answer

You can use np.bincount to count the occurrences in an array.
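A minimal sketch of the call, using a hypothetical input array: position v of the result holds the number of times the value v occurs.

```python
import numpy as np

indexs = np.array([0, 1, 0, 2, 0, 3, 1, 4])  # hypothetical data

counts = np.bincount(indexs)
print(counts)  # [3 2 1 1 1]
```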


There’s a caveat: np.bincount(x).size == np.amax(x) + 1.

Example:
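A minimal sketch of this caveat with made-up values: an array holding only 10 and 11 still produces 12 bins, because every value from 0 up to the maximum gets a slot. The minlength parameter works in the other direction and pads the result to a fixed number of bins.

```python
import numpy as np

x = np.array([10, 11])  # hypothetical data with no small values
counts = np.bincount(x)

print(counts.size == np.amax(x) + 1)  # True
print(counts)  # [0 0 0 0 0 0 0 0 0 0 1 1]

# minlength guarantees at least that many bins, even if large values never occur.
padded = np.bincount(np.array([0, 1, 1]), minlength=5)
print(padded)  # [1 2 0 0 0]
```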


Here it counts the occurrences of every value from 0 up to the maximum in the array. A workaround can be:
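One possible workaround, sketched here as an assumption rather than as the answer's original snippet, is to keep only the bins with nonzero counts. np.nonzero also recovers which values those counts belong to.

```python
import numpy as np

x = np.array([10, 11, 11])  # hypothetical data
counts = np.bincount(x)

# Indices of the nonzero bins are exactly the values that occur in x.
values = np.nonzero(counts)[0]

print(values)          # [10 11]
print(counts[values])  # [1 2]
```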


If no values are missing, i.e. every value from 0 to your_max occurs at least once, you can use np.bincount directly.

Another caveat:

From docs:

Count the number of occurrences of each value in an array of non-negative ints.
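A short sketch of what this restriction means in practice, with made-up data: np.bincount raises a ValueError on negative ints. Shifting the values by their minimum is one possible way around it (an assumption, not something the docs prescribe); bin k of the result then corresponds to the value k + offset.

```python
import numpy as np

x = np.array([-2, -1, -1, 0])  # hypothetical data containing negatives

try:
    np.bincount(x)
except ValueError as exc:
    print("bincount rejected the input:", exc)

# Shift so the smallest value maps to bin 0.
offset = x.min()
counts = np.bincount(x - offset)
print(counts)  # [1 2 1]  -> counts for the values -2, -1, 0
```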

User contributions licensed under: CC BY-SA