I have a script that is taking a bit long to run, so I was trying to look through and speed it up where I can. I found a part that takes ~10 minutes or so and I feel like it could be a bit more efficient, but I might be wrong.
Basically, I am trying to multiply one array, W, with another array, res. W is an NxN diagonal matrix with 0’s on the off diagonals and real numbers on the diagonal and I have Y occurrences such that the full size of the array is NxNxY.
res is a NxY array.
I want to multiply W*res for each Y. So a single instance would be like W[:,:,1]*res[:,1].
In the simplest case this could be a for loop, but that was super inefficient, so I figured this out
Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
This is better, but this is all being done in a different loop that ends up repeated T times.
I was hoping to improve this by saving each instance of W and res, so they end up being NxNxYxT and NxYxT respectively and doing it all in a single operation. Which I think I accomplished by this:
W_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
But this seems to take even longer.
I am hoping someone can see something I am missing.
I attempted to write a script to replicate this. It isn’t perfect, but I think it demonstrates the difference. W and res are not randomly generated, so improving those parts won’t help (which are particularly slow right now so I timed the relevant parts separately). It seems like the 2nd case is faster for smaller sizes, but pretty quickly the 1st case becomes faster.
import numpy as np import time gsize = 10000 #real size is 40000 t_range = np.arange(0,100) # real size is 200 n_range = np.arange(0,5) # real size is 65 I = np.identity(len(n_range)) test = np.zeros([gsize,len(t_range)]); W = np.zeros([len(n_range),len(n_range),gsize]) tr = np.random.rand(len(n_range),gsize) for i in range(0,tr.shape[1]): W[:,:,i] = I*tr[:,i] res = np.random.rand(len(n_range),gsize) # first case tot_time = 0 for t in t_range: #Generating random inputs for example, this are not normally random. res = np.random.rand(len(n_range),gsize) W = np.zeros([len(n_range),len(n_range),gsize]) tr = np.random.rand(len(n_range),gsize) for i in range(0,tr.shape[1]): W[:,:,i] = I*tr[:,i] st = time.time() Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T ss = (np.sum(np.square(Wres), axis=0)) test[:,t] = ss tot_time = tot_time + time.time() - st print(tot_time) # total time for first case # 2nd method W_T = np.zeros((len(n_range),len(n_range),gsize,len(t_range)), float) res_T = np.zeros((len(n_range),gsize,len(t_range)), float) for t in t_range: res = np.random.rand(len(n_range),gsize) W = np.zeros([len(n_range),len(n_range),gsize]) tr = np.random.rand(len(n_range),gsize) for i in range(0,tr.shape[1]): W[:,:,i] = I*tr[:,i] res_T[:,:,t] = res W_T[:,:,:,t] = W st = time.time() Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T test = np.sum(Wres_T, axis = 0) print(time.time() - st) #time for second case
I’d appreciate any help/ideas.
Thanks
Advertisement
Answer
Given that you know some of your matrix is just a diagonal, you can save a lot of time only using the diagonal part.
Your original methods take:
0.05394101142883301 0.04986453056335449
On my computer.
Here’s a couple of tries at it. Make sure you verify there’s no mistakes in the values: it looks ok to me though.
Replacing just the Wres call in the first call, with Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)
tot_time = 0 for t in t_range: #Generating random inputs for example, this are not normally random. res = np.random.rand(len(n_range),gsize) W = np.zeros([len(n_range),len(n_range),gsize]) tr = np.random.rand(len(n_range),gsize) for i in range(0,tr.shape[1]): W[:,:,i] = I*tr[:,i] st = time.time() #Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T Wres = (np.diagonal(W, axis1=0, axis2=1).T*res) ss = (np.sum(np.square(Wres), axis=0)) test[:,t] = ss tot_time = tot_time + time.time() - st print(tot_time) # total time for first case
Gives me:
0.01994800567626953
Modifying the second one in a similar manner: Wres_T = np.diagonal(W_T).T * res_T.transpose(0,2,1)
# 2nd method W_T = np.zeros((len(n_range),len(n_range),gsize,len(t_range)), float) res_T = np.zeros((len(n_range),gsize,len(t_range)), float) for t in t_range: res = np.random.rand(len(n_range),gsize) W = np.zeros([len(n_range),len(n_range),gsize]) tr = np.random.rand(len(n_range),gsize) for i in range(0,tr.shape[1]): W[:,:,i] = I*tr[:,i] res_T[:,:,t] = res W_T[:,:,:,t] = W st = time.time() #Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T Wres_T = np.diagonal(W_T).T * res_T.transpose(0,2,1) test = np.sum(Wres_T, axis = 0) print(time.time() - st) #time for second case
Gives me: 0.011936426162719727
So, we are at between 1/4 to 1/5 of the original time.