Assuming I have a numpy array like [1,2,3,4,5,6] and another array [0,0,1,2,2,1], I want to sum the items in the first array by group (the second array gives each item's group) and obtain one result per group, in group number order (in this case the result would be [3, 9, 9]). How do I do this in numpy?
Answer
There’s more than one way to do this, but here’s one way:
    import numpy as np

    data = np.arange(1, 7)
    groups = np.array([0,0,1,2,2,1])

    unique_groups = np.unique(groups)
    sums = []
    for group in unique_groups:
        sums.append(data[groups == group].sum())
You can vectorize things so that there’s no for loop at all, but I’d recommend against it. It becomes unreadable, and will require a couple of 2D temporary arrays, which could require large amounts of memory if you have a lot of data.
Edit: Here’s one way you could entirely vectorize. Keep in mind that this may (and likely will) be slower than the version above. (And there may be a better way to vectorize this, but it’s late and I’m tired, so this is just the first thing to pop into my head…)
However, keep in mind that this is a bad example… You’re really better off (both in terms of speed and readability) with the loop above…
    import numpy as np

    data = np.arange(1, 7)
    groups = np.array([0,0,1,2,2,1])

    unique_groups = np.unique(groups)

    # Forgive the bad naming here...
    # I can't think of more descriptive variable names at the moment...
    x, y = np.meshgrid(groups, unique_groups)
    data_stack = np.tile(data, (unique_groups.size, 1))

    data_in_group = np.zeros_like(data_stack)
    data_in_group[x==y] = data_stack[x==y]

    sums = data_in_group.sum(axis=1)
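For what it's worth, one possible "better way to vectorize" is np.bincount with its weights argument. This is a different technique from the meshgrid version above, and the sketch below rests on an assumption that isn't stated in the question: the group labels are non-negative integers, as they are in the example.

```python
import numpy as np

data = np.arange(1, 7)
groups = np.array([0, 0, 1, 2, 2, 1])

# Assumption: group labels are non-negative integers.
# bincount adds each weight (here, each data value) into the slot
# indexed by its group label, with no 2D temporaries.
sums = np.bincount(groups, weights=data)

# Passing weights makes the result float64; cast back if integer
# sums are wanted.
sums_int = sums.astype(data.dtype)
```

Note that bincount returns one slot for every integer from 0 to groups.max(), so a label that never occurs shows up as a 0 in the result, whereas the loop over np.unique(groups) simply skips it.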