I have two arrays:
vals has shape (N,m), where N is ~1 million and m is 3. The values are floats.
I have another array, indices, with shape (N,4). All values in indices are row indices into vals. (Additionally, unlike the example here, every row of indices contains unique values.)
import numpy as np
from random import randrange

# set up the arrays for this test example (no need to improve this)
N = 9
vals = np.array(list(range(3*N))).reshape((N,3))
indices = np.array([randrange(N) for n in range(4*N)]).reshape((N,4))
I would like to replace the following for loop when creating the array aug:
# form an augmented matrix by indexing into vals using rows from indices
aug = np.stack([vals[indices[x]] for x in range(N)])

# compute a mean along axis=1 of aug
aug.mean(axis=1)
The broader context for the question is that vals contains numeric data for particles distributed in 3D. indices is generated using a nearest neighbor search on the spatial positions of the particles (using scipy.spatial.cKDTree). I would like to average the numeric data over the nearest neighbors. As I have ~1 million particles, a for-loop is quite slow.
Answer
You actually can replace the entire aug = ... line with
aug = vals[indices]
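That will produce the same result, because indexing with an integer array picks out the corresponding row of vals for every entry of indices, giving an array of shape (N, 4, m):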
np.array_equal(
np.stack([vals[indices[x]] for x in range(N)]),
vals[indices]
)
# True
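Putting the two steps together, the neighbor average from the question can be sketched without any Python-level loop. This is a minimal sketch that relies only on NumPy's integer array indexing; the helper name neighbor_mean and the seeded random generator are illustrative assumptions, not part of the original question.

```python
import numpy as np

def neighbor_mean(vals, indices):
    """Average rows of vals over the neighbors listed in each row of indices.

    neighbor_mean is a hypothetical helper name used for illustration.
    vals has shape (N, m); indices has shape (N, k) and holds row indices into vals.
    """
    # Integer array indexing: vals[indices] has shape (N, k, m)
    aug = vals[indices]
    # Average over the k neighbors, giving shape (N, m)
    return aug.mean(axis=1)

# Small test setup mirroring the question (seeded so the run is repeatable)
rng = np.random.default_rng(0)
N = 9
vals = np.arange(3 * N, dtype=float).reshape(N, 3)
indices = rng.integers(0, N, size=(N, 4))

# Reference result using the original list comprehension
loop_result = np.stack([vals[indices[x]] for x in range(N)]).mean(axis=1)
fast_result = neighbor_mean(vals, indices)

print(fast_result.shape)
print(np.allclose(loop_result, fast_result))
```

Note that vals[indices] materializes the full (N, 4, m) array. For N of about 1 million, m = 3 and float64 values, that is roughly 96 MB, which is usually acceptable but worth keeping in mind if the number of neighbors grows.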