Eliminate for loop when indexing into array

I have two arrays. vals has shape (N,m), where N is ~1 million and m is 3, and its values are floats. I have another array, indices, with shape (N,4). All values in indices are row indices into vals. (Additionally, unlike the example here, every row of indices contains unique values.)

import numpy as np
from random import randrange

# set up the arrays for this test example (no need to improve this)
N = 9
vals = np.array(list(range(3*N))).reshape((N,3))
indices = np.array([randrange(N) for n in range(4*N)]).reshape((N,4))

I would like to replace the following for loop when creating the array aug:

# form an augmented matrix by indexing into vals using rows from indices
aug = np.stack([vals[indices[x]] for x in range(N)])

# compute a mean along axis=1 of aug
aug.mean(axis=1)

The broader context for the question is that vals contains numeric data for particles distributed in 3D. indices is generated using a nearest neighbor search on the spatial positions of the particles (using scipy.spatial.cKDTree). I would like to average the numeric data over the nearest neighbors. As I have ~1 million particles, a for loop is quite slow.

Answer

You can actually replace the entire aug = ... line with

aug = vals[indices]

That will produce the same result:

np.array_equal(
    np.stack([vals[indices[x]] for x in range(N)]),
    vals[indices]
)
# True
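
To see why this works: indexing vals with a 2-D integer array returns an array of shape indices.shape + vals.shape[1:], so each row of indices selects four rows of vals at once. Below is a minimal sketch on float data that also carries through to the mean. The sizes, the random test data, and the names means, loop_means and col_means are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical test data: random floats stand in for the particle data,
# random integers stand in for the nearest-neighbor indices.
rng = np.random.default_rng(0)
N, m, k = 1000, 3, 4
vals = rng.random((N, m))
indices = rng.integers(0, N, size=(N, k))

# Integer-array indexing: the result has shape indices.shape + vals.shape[1:]
aug = vals[indices]          # shape (N, k, m)
means = aug.mean(axis=1)     # shape (N, m), one averaged row per particle

# Loop version from the question, kept only for comparison
loop_means = np.stack([vals[indices[x]] for x in range(N)]).mean(axis=1)

# Optional variant (an assumption, not part of the original answer):
# accumulate one neighbor column at a time, which avoids allocating
# the full (N, k, m) intermediate array.
col_means = sum(vals[indices[:, j]] for j in range(k)) / k
```

For N around one million, the (N, 4, 3) float64 intermediate takes roughly 96 MB, which is usually acceptable. The column-wise variant may be worth trying if memory becomes a concern with more neighbors or more columns.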

Answer