I have an array indexs. It's very long (>10k elements), and each int value is rather small (<100). E.g.:

indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])  # int index array
indexs_max = 4  # already known

Now I want to count the occurrences of each index value (e.g. 0 occurs 3 times, 1 occurs 2 times…) and get counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods as follows:
UPDATE: _test4 is @Ch3steR's solution:
import numpy as np

indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9

def _test1():
    # plain Python loop over every element
    counts = np.zeros((indexs_max + 1, ), dtype=np.int32)
    for ind in indexs:
        counts[ind] += 1
    return counts

def _test2():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    # this is because some value in range may be missing
    return counts

def _test3():
    # compare every element against every candidate value, then sum per value
    therange = np.arange(0, indexs_max + 1)
    counts = np.sum(indexs[None] == therange[:, None], axis=1)
    return counts

def _test4():
    return np.bincount(indexs, minlength=indexs_max+1)
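Running each 500 times, _test1 to _test4 take 32.499472856521606s, 0.31386804580688477s, 0.14069509506225586s, and 0.017721891403198242s respectively. Among my original three methods _test3 is the fastest, but it uses a lot of additional memory, because it builds an intermediate boolean array of shape (indexs_max + 1, len(indexs)).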
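To make the memory cost of _test3 concrete, here is a minimal sketch that builds the same broadcast comparison and inspects the intermediate array. The seeded generator and the names mask and rng are only illustrative choices, not part of the original benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
indexs = rng.integers(0, 10, size=20000)
indexs_max = 9

therange = np.arange(0, indexs_max + 1)
# Broadcasting compares every candidate value with every element,
# producing a boolean array of shape (indexs_max + 1, len(indexs)).
mask = indexs[None] == therange[:, None]

print(mask.shape)   # (10, 20000)
print(mask.nbytes)  # 200000 (one byte per boolean)

counts = mask.sum(axis=1)  # same result _test3 returns
```

The intermediate grows with both the array length and the value range, so with values up to 100 and a longer array it gets large quickly.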
So I'm asking whether there is a better method. Thank you :) (@Ch3steR)
UPDATE: np.bincount seems optimal so far.
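For anyone who wants to reproduce the comparison, here is a rough sketch using the standard timeit module. It first checks that the np.unique and np.bincount approaches agree, then times each for 500 runs. The function names count_unique and count_bincount are made up for this sketch, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9

def count_unique():
    # np.unique only reports values that occur, so scatter into a zero array
    counts = np.zeros((indexs_max + 1,), dtype=np.int64)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    return counts

def count_bincount():
    # minlength guarantees a slot for every value in 0..indexs_max
    return np.bincount(indexs, minlength=indexs_max + 1)

# Sanity check: both approaches should give identical counts.
same = np.array_equal(count_unique(), count_bincount())
print(same)

for fn in (count_unique, count_bincount):
    elapsed = timeit.timeit(fn, number=500)
    print(f"{fn.__name__}: {elapsed:.4f}s for 500 runs")
```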
Answer
You can use np.bincount to count the occurrences in an array.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs)
# array([3, 2, 1, 1, 1])
# counts of 0's, 1's, 2's, 3's, 4's
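There's a caveat: np.bincount(x).size == np.amax(x) + 1, i.e. the result has one slot for every integer from 0 up to the maximum of x, whether or not that integer occurs.
Example: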
indexs = np.array([5, 10])
np.bincount(indexs)
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
# positions 5 and 10 hold the counts of 5's and 10's

Here it counts the occurrences of every value from 0 to the max of the array. A workaround can be:

c = np.bincount(indexs)  # indexs is [5, 10]
c = c[c > 0]
# array([1, 1])
# counts of 5's and 10's

If you have no missing values, i.e. every value from 0 to your_max occurs, you can use np.bincount directly.
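One thing to keep in mind with the filtering workaround is that c[c > 0] no longer tells you which value each count belongs to. A small sketch of one way to keep both, using np.nonzero on the bincount result (the names values and counts are just illustrative):

```python
import numpy as np

indexs = np.array([5, 10])
c = np.bincount(indexs)

# Positions with a non-zero count are exactly the values that occur.
values = np.nonzero(c)[0]
counts = c[values]

for v, n in zip(values, counts):
    print(f"{v} occurs {n} time(s)")
```

np.unique(indexs, return_counts=True) gives the same pair of arrays, at the cost of the sort that made _test2 slower in the question.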
Another caveat, from the docs:
Count the number of occurrences of each value in an array of non-negative ints.
So np.bincount does not accept negative values.
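If the data can contain negative integers, one possible workaround (not something the question needs, just a sketch) is to shift everything by the minimum before counting and then map positions back to the original values. The array data and the name offset below are hypothetical.

```python
import numpy as np

data = np.array([-2, 0, -2, 3, 0, -2])  # hypothetical input with negative values

try:
    np.bincount(data)
except ValueError as exc:
    # bincount refuses negative elements
    print("bincount rejected the input:", exc)

# Shift so the smallest value maps to 0, count, then recover the labels.
offset = data.min()
counts = np.bincount(data - offset)
values = np.arange(offset, offset + counts.size)

for v, n in zip(values, counts):
    print(f"{v}: {n}")
```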