
Can this for loop be vectorized?

import numpy as np

data = np.arange(10)
n = len(data)
np.asarray([np.sum((data[0:i]-np.mean(data[0:i]))**2) for i in range(1,n)])

Can this for loop be vectorized, perhaps by expanding dimensions and then collapsing them?

I got a hint from somewhere that I can replace

np.mean(data[0:i])

with

np.cumsum(data[0:n-1])/(np.arange(n-1)+1)
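A quick way to check the hint is to compare it against the means computed one slice at a time. This is a minimal sketch assuming the same `data = np.arange(10)` as above; `loop_means` and `cumsum_means` are just illustrative names.

```python
import numpy as np

data = np.arange(10)
n = len(data)

# Prefix means computed one slice at a time, as in the question's loop
loop_means = np.asarray([np.mean(data[0:i]) for i in range(1, n)])

# The hinted replacement: running sums divided by the prefix lengths 1..n-1
cumsum_means = np.cumsum(data[0:n-1]) / (np.arange(n-1) + 1)

print(np.allclose(loop_means, cumsum_means))  # True
```

Both arrays hold n-1 values, one per prefix length from 1 to n-1, which is exactly the set of slices `range(1, n)` visits.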


Answer

It can be vectorized by expanding dimensions, as you suggested. I think the secret sauce is using np.tril to zero out, in each row, the terms that fall beyond that row's prefix before summing:

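import numpy as np

data = np.arange(10)
n = len(data)

# calculate means of data[0:1], data[0:2], ..., data[0:n] using cumsum
mean = np.cumsum(data) / np.arange(1, n+1)

# expand into 2 dimensions: row i repeats mean[i], every row is a copy of data
mean_2d = np.repeat(mean, n).reshape(n, n)
data_2d = np.tile(data, n).reshape(n, n)

# zero out unneeded terms: row i keeps only the entries for data[0:i+1]
diff_squared = np.tril((data_2d-mean_2d)**2)

# sum along rows; the last row covers the whole array, which range(1, n)
# in the question never reaches, so drop it to get the same n-1 values
np.sum(diff_squared, axis=1)[:-1]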
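For comparison, here is a sketch of two variants, checked against the original list comprehension. The helper names (`prefix_ss_loop`, `prefix_ss_broadcast`, `prefix_ss_cumsum`) are hypothetical. The broadcast version is the same np.tril idea, but it lets NumPy broadcasting build the n×n grid instead of np.repeat/np.tile. The cumsum version is a different technique, not part of the answer above: it uses the identity sum((x - m)^2) = sum(x^2) - (sum x)^2 / k, so it needs only running sums and O(n) memory.

```python
import numpy as np

def prefix_ss_loop(data):
    """Reference: the list comprehension from the question (prefix lengths 1..n-1)."""
    n = len(data)
    return np.asarray([np.sum((data[0:i] - np.mean(data[0:i]))**2) for i in range(1, n)])

def prefix_ss_broadcast(data):
    """Same tril idea as the answer, using broadcasting instead of repeat/tile."""
    n = len(data)
    mean = np.cumsum(data) / np.arange(1, n + 1)     # mean of data[0:k+1] for each k
    diff_sq = (data[None, :] - mean[:, None]) ** 2   # (n, n): entry [k, j] uses mean k
    # keep only j <= k, sum each row, drop the full-length prefix to match the loop
    return np.tril(diff_sq).sum(axis=1)[:-1]

def prefix_ss_cumsum(data):
    """O(n) alternative via sum((x - m)^2) = sum(x^2) - (sum x)^2 / k."""
    x = np.asarray(data, dtype=float)
    k = np.arange(1, len(x) + 1)   # prefix lengths 1..n
    s1 = np.cumsum(x)              # running sum of x
    s2 = np.cumsum(x * x)          # running sum of x^2
    return (s2 - s1**2 / k)[:-1]   # drop the full-length prefix to match the loop

data = np.arange(10)
expected = prefix_ss_loop(data)
print(np.allclose(prefix_ss_broadcast(data), expected))  # True
print(np.allclose(prefix_ss_cumsum(data), expected))     # True
```

The broadcast version still allocates an n×n array, so it is only practical for moderate n. The cumsum version avoids that, but subtracting two large running sums can lose precision when the values are large relative to their spread, so it is worth checking against the loop on representative data.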
User contributions licensed under: CC BY-SA