Suppose I have:
x1 = [1, 3, 2, 4]
and:
x2 = [0, 1, 1, 0]
with the same shape.
Now I want to "put x2 on top of x1" and sum up the numbers of x1 that share the same number in x2,
so the end result is:
end = [1+4, 3+2]  # end[0] is the sum of all numbers of x1 where a 0 was in x2
This is a naive implementation using lists to further clarify the question:
store_0 = 0
store_1 = 0
x1 = [1, 3, 2, 4]
x2 = [0, 1, 1, 0]
for value_x1, value_x2 in zip(x1, x2):
    if value_x2 == 0:
        store_0 += value_x1
    elif value_x2 == 1:
        store_1 += value_x1
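Running this on the example input confirms the expected result:

print([store_0, store_1])  # [5, 5], i.e. end = [1+4, 3+2]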
So my question: is there a way to implement this in numpy without using loops, or in general just faster?
Answer
In this particular example (and, in general, for unique, duplicated, and groupby kinds of operations), pandas is faster than a pure numpy solution:
A pandas way, using Series (credit: very similar to @mcsoini’s answer):
import numpy as np
import pandas as pd

def pd_group_sum(x1, x2):
    return pd.Series(x1, index=x2).groupby(x2).sum()
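To see what this does, here is a minimal sketch on the question's input: the values of the Series come from x1 and the group labels from x2 become the index, so groupby(x2).sum() collapses each label to a single total:

import pandas as pd

x1 = [1, 3, 2, 4]
x2 = [0, 1, 1, 0]

s = pd.Series(x1, index=x2)  # values [1, 3, 2, 4] labelled [0, 1, 1, 0]
print(s.groupby(x2).sum())
# 0    5
# 1    5
# dtype: int64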
A pure numpy way, using np.unique and some fancy indexing:
def np_group_sum(a, groups):
    # rix maps each element to its group id; compare against all group ids at once
    _, ix, rix = np.unique(groups, return_index=True, return_inverse=True)
    return np.where(np.arange(len(ix))[:, None] == rix, a, 0).sum(axis=1)
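As a sketch of the mechanics on the question's small input: rix holds the group id of each element, and the broadcast comparison builds one boolean row per group, so masking a and summing along axis=1 yields one total per group:

import numpy as np

a = np.array([1, 3, 2, 4])
groups = np.array([0, 1, 1, 0])

_, ix, rix = np.unique(groups, return_index=True, return_inverse=True)
# rix == array([0, 1, 1, 0]): the group id of each element

mask = np.arange(len(ix))[:, None] == rix
# mask == [[ True, False, False,  True],   row 0 selects group 0
#          [False,  True,  True, False]]   row 1 selects group 1

print(np.where(mask, a, 0).sum(axis=1))  # [5 5]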
Note: a better pure numpy way is inspired by @Woodford’s answer:
def selsum(a, g, e):
    return a[g == e].sum()

vselsum = np.vectorize(selsum, signature='(n),(n),()->()')

def np_group_sum2(a, groups):
    return vselsum(a, groups, np.unique(groups))
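The signature '(n),(n),()->()' tells np.vectorize to pass the two arrays through whole and loop only over the scalar third argument, so handing it the unique labels calls selsum once per group. A quick self-contained check:

import numpy as np

def selsum(a, g, e):
    return a[g == e].sum()

# loop over the scalar third argument only; a and g are passed whole
vselsum = np.vectorize(selsum, signature='(n),(n),()->()')

a = np.array([1, 3, 2, 4])
groups = np.array([0, 1, 1, 0])
print(vselsum(a, groups, np.unique(groups)))  # [5 5]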
Yet another pure numpy way is inspired by a comment from @mapf about using argsort(). That by itself already takes 45 ms, but we can try something based on np.argpartition(x2, len(x2)-1) instead, since that takes only 7.5 ms by itself on the benchmark below:
def np_group_sum3(a, groups):
    ix = np.argpartition(groups, len(groups)-1)
    ends = np.nonzero(np.diff(np.r_[groups[ix], groups.max() + 1]))[0]
    return np.diff(np.r_[0, a[ix].cumsum()[ends]])
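As a sketch of the cumsum trick (using np.argsort here instead of np.argpartition so the trace is deterministic on a small input): after permuting by ix, equal labels are adjacent; ends marks the last index of each run, and differencing the cumulative sum at those positions recovers the per-group totals. The appended groups.max() + 1 sentinel ensures the final run is closed.

import numpy as np

a = np.array([1, 3, 2, 4])
groups = np.array([0, 1, 1, 0])

ix = np.argsort(groups, kind='stable')  # [0, 3, 1, 2] -> labels [0, 0, 1, 1]
ends = np.nonzero(np.diff(np.r_[groups[ix], groups.max() + 1]))[0]
# ends == [1, 3]: last position of each run of equal labels
print(np.diff(np.r_[0, a[ix].cumsum()[ends]]))  # [5 5]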
(Slightly modified) example
x1 = np.array([1, 3, 2, 4, 8])  # I added a group for the sake of generality
x2 = np.array([0, 1, 1, 0, 7])

>>> pd_group_sum(x1, x2)
0    5
1    5
7    8

>>> np_group_sum(x1, x2)  # and all the np_group_sum() variants
array([5, 5, 8])
Speed
n = 1_000_000
x1 = np.random.randint(0, 20, n)
x2 = np.random.randint(0, 20, n)

%timeit pd_group_sum(x1, x2)
# 13.9 ms ± 65.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np_group_sum(x1, x2)
# 171 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum2(x1, x2)
# 66.7 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum3(x1, x2)
# 25.6 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Going via pandas is faster, in part because of numpy issue 11136.