Suppose I have:
x1 = [1, 3, 2, 4]
and:
x2 = [0, 1, 1, 0]
with the same shape
Now I want to “put x2 on top of x1” and sum up all the numbers of x1 corresponding to the numbers of x2,
so the end result is:
end = [1+4, 3+2]  # end[0] is the sum of all numbers of x1 where a 0 was in x2
This is a naive implementation using lists to further clarify the question:
store_0 = 0
store_1 = 0

x1 = [1, 3, 2, 4]
x2 = [0, 1, 1, 0]

for value_x1, value_x2 in zip(x1, x2):
    if value_x2 == 0:
        store_0 += value_x1
    elif value_x2 == 1:
        store_1 += value_x1
So my question: is there a way to implement this in numpy without using loops, or in general just faster?
Answer
In this particular example (and, in general, for unique, duplicated, and groupby kinds of operations), pandas is faster than a pure numpy solution:
A pandas way, using Series (credit: very similar to @mcsoini’s answer):
import pandas as pd

def pd_group_sum(x1, x2):
    return pd.Series(x1, index=x2).groupby(x2).sum()
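For readers less familiar with pandas, here is a quick trace of what the one-liner builds on the question's data; the intermediate comments are my own illustration, not part of the original answer:

x1 = [1, 3, 2, 4]
x2 = [0, 1, 1, 0]

s = pd.Series(x1, index=x2)   # values 1, 3, 2, 4 labeled with the index 0, 1, 1, 0
s.groupby(x2).sum()           # groups the values by the labels in x2 -> 0: 5, 1: 5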
A pure numpy way, using np.unique and some fancy indexing:
import numpy as np

def np_group_sum(a, groups):
    _, ix, rix = np.unique(groups, return_index=True, return_inverse=True)
    return np.where(np.arange(len(ix))[:, None] == rix, a, 0).sum(axis=1)
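To see why the fancy indexing works, here is a minimal sketch on the question's small arrays; the intermediate values in the comments are my own illustration:

a = np.array([1, 3, 2, 4])
groups = np.array([0, 1, 1, 0])

_, ix, rix = np.unique(groups, return_index=True, return_inverse=True)
# rix maps every position to its group index:            [0, 1, 1, 0]
# np.arange(len(ix))[:, None] == rix is a (n_groups, n) boolean mask:
#   [[ True, False, False,  True],    <- row 0 selects group 0
#    [False,  True,  True, False]]    <- row 1 selects group 1
# np.where(mask, a, 0) keeps the selected values of a (zeroing the rest),
# and .sum(axis=1) yields one sum per group: array([5, 5])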
Note: a better pure numpy way is inspired by @Woodford’s answer:
def selsum(a, g, e):
    return a[g == e].sum()

vselsum = np.vectorize(selsum, signature='(n),(n),()->()')

def np_group_sum2(a, groups):
    return vselsum(a, groups, np.unique(groups))
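As a rough usage note (mine, not from the answer): np.vectorize with a core signature is still a Python-level loop, but here it iterates only once per unique group label rather than once per element, which helps explain why it does better than the broadcasting version in the benchmark below:

a = np.array([1, 3, 2, 4])
g = np.array([0, 1, 1, 0])

# vselsum broadcasts over its last argument, so selsum runs once per label:
# selsum(a, g, 0) -> 5 and selsum(a, g, 1) -> 5
np_group_sum2(a, g)   # -> array([5, 5])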
Yet another pure numpy way is inspired by a comment from @mapf about using argsort(). That in itself already takes 45 ms, but we may try something based on np.argpartition(x2, len(x2)-1) instead, since that takes only 7.5 ms by itself on the benchmark below:
def np_group_sum3(a, groups):
    ix = np.argpartition(groups, len(groups)-1)
    ends = np.nonzero(np.diff(np.r_[groups[ix], groups.max() + 1]))[0]
    return np.diff(np.r_[0, a[ix].cumsum()[ends]])
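The idea behind this variant is easiest to follow with a full np.argsort, which guarantees that equal group labels become contiguous; the sketch below (my own, for illustration only) traces the same cumsum/diff trick on the question's arrays, with np.argpartition in the answer above acting as the faster stand-in for the sort:

a = np.array([1, 3, 2, 4])
groups = np.array([0, 1, 1, 0])

ix = np.argsort(groups)             # e.g. [0, 3, 1, 2]: group 0 first, then group 1
sorted_groups = groups[ix]          # [0, 0, 1, 1]
# indices where each group ends (a sentinel larger than any label is appended):
ends = np.nonzero(np.diff(np.r_[sorted_groups, groups.max() + 1]))[0]   # [1, 3]
# running total of the reordered values, cut at each group end:
csum = a[ix].cumsum()               # e.g. [1, 5, 8, 10]
np.diff(np.r_[0, csum[ends]])       # [5, 5]: the per-group sums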
(Slightly modified) example
x1 = np.array([1, 3, 2, 4, 8])  # I added a group for the sake of generality
x2 = np.array([0, 1, 1, 0, 7])

>>> pd_group_sum(x1, x2)
0    5
1    5
7    8

>>> np_group_sum(x1, x2)  # and all the np_group_sum() variants
array([5, 5, 8])
Speed
n = 1_000_000
x1 = np.random.randint(0, 20, n)
x2 = np.random.randint(0, 20, n)

%timeit pd_group_sum(x1, x2)
# 13.9 ms ± 65.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np_group_sum(x1, x2)
# 171 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum2(x1, x2)
# 66.7 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np_group_sum3(x1, x2)
# 25.6 ms ± 41.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Going via pandas is faster, in part because of numpy issue 11136.