I have two arrays. vals has shape (N, m), where N is ~1 million and m is 3; the values are floats. indices has shape (N, 4), and all of its values are row indices into vals. (Additionally, unlike the example here, every row of indices contains unique values.)
    import numpy as np
    from random import randrange

    # set up the arrays for this test example (no need to improve this)
    N = 9
    vals = np.array(list(range(3*N))).reshape((N,3))
    indices = np.array([randrange(N) for n in range(4*N)]).reshape((N,4))
I would like to replace the following for loop, which creates the array aug:
    # form an augmented matrix by indexing into vals using rows from indices
    aug = np.stack([vals[indices[x]] for x in range(N)])

    # compute a mean along axis=1 of aug
    aug.mean(axis=1)
The broader context for the question is that vals contains numeric data for particles distributed in 3D. indices is generated using a nearest neighbor search on the spatial positions of the particles (using scipy.spatial.cKDTree). I would like to average the numeric data over the nearest neighbors. As I have ~1 million particles, a for-loop is quite slow.
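For anyone who wants to reproduce the setup, here is a minimal sketch of how an indices array of this kind might be built. The positions array, the random data, and the small N are assumptions for illustration only, not the actual data; k=4 is chosen to match the (N, 4) shape described above.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

N = 1000                         # small stand-in for ~1 million particles
positions = rng.random((N, 3))   # hypothetical particle coordinates in 3D
vals = rng.random((N, 3))        # hypothetical per-particle numeric data

# Build the tree once, then ask for the 4 nearest neighbours of every particle.
tree = cKDTree(positions)
distances, indices = tree.query(positions, k=4)

print(indices.shape)  # (N, 4): one row of neighbour indices per particle
```

Because the query points are the same points the tree was built from, each particle is typically returned as its own nearest neighbor (at distance 0), so every row contains distinct indices.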
Answer
You can actually replace the entire aug = ... line with:

    aug = vals[indices]
That will produce the same result:
    np.array_equal(
        np.stack([vals[indices[x]] for x in range(N)]),
        vals[indices]
    )
    # True
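To make the shapes concrete, here is a small sketch of the whole computation, including the mean from the question. It relies on NumPy integer-array indexing: indexing an (N, 3) array with an (N, 4) integer array gives an (N, 4, 3) result. The random test data and the names means and loop_means are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 9
vals = rng.random((N, 3))
# Each row holds 4 distinct row indices into vals (test data only).
indices = np.stack([rng.choice(N, size=4, replace=False) for _ in range(N)])

# Integer-array indexing: result shape is indices.shape + vals.shape[1:]
aug = vals[indices]          # shape (N, 4, 3)
means = aug.mean(axis=1)     # shape (N, 3): average over the 4 neighbours

# Cross-check against the original loop-based version
loop_means = np.stack([vals[indices[x]] for x in range(N)]).mean(axis=1)
assert np.allclose(means, loop_means)
```

One thing to keep in mind: aug is a full copy, so with N around 1 million and float64 data it holds 12 million values, roughly 96 MB. That is usually fine, but if memory gets tight you could process indices in chunks of rows and concatenate the per-chunk means.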