Skip to content
Advertisement

Looking for a more efficient way to do the array multiplication in this loop

I have a script that is taking a bit long to run, so I was trying to look through and speed it up where I can. I found a part that takes ~10 minutes or so and I feel like it could be a bit more efficient, but I might be wrong.

Basically, I am trying to multiply one array, W, with another array, res. W is an NxN diagonal matrix with 0’s on the off diagonals and real numbers on the diagonal and I have Y occurrences such that the full size of the array is NxNxY.

res is a NxY array.

I want to multiply W*res for each Y. So a single instance would be like W[:,:,1]*res[:,1].

In the simplest case this could be a for loop, but that was super inefficient, so I figured this out

Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T

This is better, but this is all being done in a different loop that ends up repeated T times.

I was hoping to improve this by saving each instance of W and res, so they end up being NxNxYxT and NxYxT respectively and doing it all in a single operation. Which I think I accomplished by this:

W_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T

But this seems to take even longer.

I am hoping someone can see something I am missing.

I attempted to write a script to replicate this. It isn’t perfect, but I think it demonstrates the difference. W and res are not randomly generated, so improving those parts won’t help (which are particularly slow right now so I timed the relevant parts separately). It seems like the 2nd case is faster for smaller sizes, but pretty quickly the 1st case becomes faster.

import numpy as np
import time

gsize = 10000 #real size is 40000
t_range = np.arange(0,100) # real size is 200
n_range = np.arange(0,5) # real size is 65

I = np.identity(len(n_range))
test = np.zeros([gsize,len(t_range)]);

W = np.zeros([len(n_range),len(n_range),gsize])
tr  = np.random.rand(len(n_range),gsize)
for i in range(0,tr.shape[1]):
   W[:,:,i] = I*tr[:,i]
    
res = np.random.rand(len(n_range),gsize)

# first case
tot_time = 0
for t in t_range:
    #Generating random inputs for example, this are not normally random.
    res = np.random.rand(len(n_range),gsize)
    W = np.zeros([len(n_range),len(n_range),gsize])
    tr  = np.random.rand(len(n_range),gsize)
    for i in range(0,tr.shape[1]):
       W[:,:,i] = I*tr[:,i]

    st = time.time()
    Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
    ss = (np.sum(np.square(Wres), axis=0))
    test[:,t] = ss
    tot_time = tot_time + time.time() - st

print(tot_time)   # total time for first case
# 2nd method
W_T = np.zeros((len(n_range),len(n_range),gsize,len(t_range)), float)
res_T = np.zeros((len(n_range),gsize,len(t_range)), float)


for t in t_range:
    res = np.random.rand(len(n_range),gsize)
    W = np.zeros([len(n_range),len(n_range),gsize])
    tr  = np.random.rand(len(n_range),gsize)
    for i in range(0,tr.shape[1]):
       W[:,:,i] = I*tr[:,i]
    
    res_T[:,:,t] = res
    W_T[:,:,:,t] = W 


st = time.time()
Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
test = np.sum(Wres_T, axis = 0)

print(time.time() - st)    #time for second case

I’d appreciate any help/ideas.

Thanks

Advertisement

Answer

Given that you know some of your matrix is just a diagonal, you can save a lot of time only using the diagonal part.

Your original methods take:

0.05394101142883301
0.04986453056335449

On my computer.

Here’s a couple of tries at it. Make sure you verify there’s no mistakes in the values: it looks ok to me though.

Replacing just the Wres call in the first call, with Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)

tot_time = 0
for t in t_range:
    #Generating random inputs for example, this are not normally random.
    res = np.random.rand(len(n_range),gsize)
    W = np.zeros([len(n_range),len(n_range),gsize])
    tr  = np.random.rand(len(n_range),gsize)
    for i in range(0,tr.shape[1]):
       W[:,:,i] = I*tr[:,i]

    st = time.time()
    #Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
    Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)
    ss = (np.sum(np.square(Wres), axis=0))
    test[:,t] = ss
    tot_time = tot_time + time.time() - st

print(tot_time)   # total time for first case

Gives me:

0.01994800567626953

Modifying the second one in a similar manner: Wres_T = np.diagonal(W_T).T * res_T.transpose(0,2,1)

# 2nd method
W_T = np.zeros((len(n_range),len(n_range),gsize,len(t_range)), float)
res_T = np.zeros((len(n_range),gsize,len(t_range)), float)


for t in t_range:
    res = np.random.rand(len(n_range),gsize)
    W = np.zeros([len(n_range),len(n_range),gsize])
    tr  = np.random.rand(len(n_range),gsize)
    for i in range(0,tr.shape[1]):
       W[:,:,i] = I*tr[:,i]
    
    res_T[:,:,t] = res
    W_T[:,:,:,t] = W 


st = time.time()
#Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
Wres_T = np.diagonal(W_T).T *  res_T.transpose(0,2,1)
test = np.sum(Wres_T, axis = 0)

print(time.time() - st)    #time for second case

Gives me: 0.011936426162719727

So, we are at between 1/4 to 1/5 of the original time.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement