    import numpy as np

    data = np.arange(10)
    n = len(data)
    np.asarray([np.sum((data[0:i]-np.mean(data[0:i]))**2) for i in range(1,n)])
Can this loop be vectorized, maybe by expanding dimensions and then collapsing them?
I got the hint from somewhere that I can replace
np.mean(data[0:i])
with
np.cumsum(data[0:n-1])/(np.arange(n-1)+1)
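The hint works because a cumulative sum divided by the element count gives the mean of every prefix data[0:i] in one pass. A minimal sketch that checks it against the per-prefix loop, assuming NumPy and the same data as above (the names running_mean and loop_means are illustrative only):

```python
import numpy as np

data = np.arange(10)
n = len(data)

# Mean of data[0:i] for i = 1..n-1, all at once: prefix sums / prefix lengths
running_mean = np.cumsum(data[0:n-1]) / (np.arange(n-1) + 1)

# The same means computed prefix by prefix, as in the original loop
loop_means = np.array([np.mean(data[0:i]) for i in range(1, n)])

print(np.allclose(running_mean, loop_means))  # True
```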
Answer
It can be vectorized by expanding dimensions, as you suggested. I think the key is using np.tril to zero out, in each row, the terms that lie beyond the current prefix before summing:
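    # calculate the mean of every prefix data[0:k+1] using cumsum
    mean = np.cumsum(data) / np.arange(1, n+1)
    # expand into 2 dimensions: row k of mean_2d is filled with mean[k],
    # and every row of data_2d is a copy of data
    mean_2d = np.repeat(mean, n).reshape(n, n)
    data_2d = np.tile(data, n).reshape(n, n)
    # zero out unneeded terms (columns j > k in row k)
    diff_squared = np.tril((data_2d-mean_2d)**2)
    # sum along rows; row k covers data[0:k+1], so the last row is the
    # full array (i = n), which range(1, n) excludes; drop it to match
    np.sum(diff_squared, axis=1)[:-1]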
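The np.tril version builds two n×n arrays, so its memory use grows quadratically with n. If that matters, a different technique (not the one used in the answer above) avoids the 2-D arrays entirely: the sum of squared deviations of a prefix equals sum(x²) − (sum(x))²/i, so both terms come straight from cumulative sums. A minimal sketch, assuming float precision is acceptable (this identity can lose accuracy through cancellation when values are large relative to their spread); sum_sq_dev and expected are illustrative names:

```python
import numpy as np

data = np.arange(10)
n = len(data)

# Prefix lengths i = 1..n-1, matching range(1, n) in the question
counts = np.arange(1, n)

# Prefix sums and prefix sums of squares, truncated to the first n-1 prefixes
csum = np.cumsum(data)[:n-1]
csum_sq = np.cumsum(data.astype(float) ** 2)[:n-1]

# Sum of squared deviations per prefix: sum(x^2) - (sum(x))^2 / i
sum_sq_dev = csum_sq - csum**2 / counts

# Reference result from the original loop
expected = np.asarray([np.sum((data[0:i] - np.mean(data[0:i]))**2) for i in range(1, n)])
print(np.allclose(sum_sq_dev, expected))  # True
```

This runs in O(n) time and memory; the trade-off against the np.tril approach is numerical robustness, since the tril version subtracts the mean before squaring.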