Looking for a more efficient way to do the array multiplication in this loop

Question

I have a script that is taking a bit long to run, so I was trying to look through and speed it up where I can. I found a part that takes ~10 minutes or so and I feel like it could be a bit more efficient, but I might be wrong. Basically, I am trying to multiply one array,

Accepted Answer

Since each slice W[:,:,i] is just a diagonal matrix, you can save a lot of time by using only the diagonal part.

On my computer, your original methods take:

    0.05394101142883301
    0.04986453056335449

Here are a couple of attempts at it. Make sure you verify that there are no mistakes in the values; it looks OK to me, though.

Replacing just the Wres line in the first method with

    Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)

gives:

    import time
    import numpy as np

    # n_range, t_range, gsize, I and test are defined as in the question.
    tot_time = 0
    for t in t_range:
        # Generating random inputs for the example; these are not normally random.
        res = np.random.rand(len(n_range), gsize)
        W = np.zeros([len(n_range), len(n_range), gsize])
        tr = np.random.rand(len(n_range), gsize)
        for i in range(0, tr.shape[1]):
            W[:, :, i] = I*tr[:, i]
        st = time.time()
        # Original line, kept for comparison:
        # Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
        Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)
        ss = (np.sum(np.square(Wres), axis=0))
        test[:, t] = ss
        tot_time = tot_time + time.time() - st
    print(tot_time)   # total time for first case

Gives me:

    0.01994800567626953

Modifying the second one in a similar manner, with

    Wres_T = np.diagonal(W_T).T * res_T.transpose(0,2,1)

gives:

    # 2nd method
    W_T = np.zeros((len(n_range), len(n_range), gsize, len(t_range)), float)
    res_T = np.zeros((len(n_range), gsize, len(t_range)), float)
    for t in t_range:
        res = np.random.rand(len(n_range), gsize)
        W = np.zeros([len(n_range), len(n_range), gsize])
        tr = np.random.rand(len(n_range), gsize)
        for i in range(0, tr.shape[1]):
            W[:, :, i] = I*tr[:, i]
        res_T[:, :, t] = res
        W_T[:, :, :, t] = W
    st = time.time()
    # Original line, kept for comparison:
    # Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
    Wres_T = np.diagonal(W_T).T * res_T.transpose(0, 2, 1)
    test = np.sum(Wres_T, axis=0)
    print(time.time() - st)    # time for second case

Gives me:

    0.011936426162719727

So we are at roughly a quarter to a bit over a third of the original time (about 0.37x for the first method and 0.24x for the second).
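Since the answer asks you to verify the values, here is a minimal sketch of one way to do that check. It assumes small, made-up sizes (`n`, `gsize`, `T`) and random inputs; those names and the seeded generator are illustrative and not from the original script. It compares the batched `np.matmul` result against the diagonal shortcut for both the 3-D and the stacked 4-D case, and also against the plain elementwise product `tr * res`, which should match whenever W is built only from `I*tr[:,i]`.

```python
import numpy as np

# Hypothetical small sizes, chosen only for illustration.
n, gsize, T = 5, 7, 3
rng = np.random.default_rng(0)

I = np.eye(n)
res = rng.random((n, gsize))
tr = rng.random((n, gsize))

# Build W as in the answer's loop: one diagonal matrix per grid point.
W = np.zeros((n, n, gsize))
for i in range(gsize):
    W[:, :, i] = I * tr[:, i]

# Original batched matrix product.
Wres_matmul = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
# Diagonal shortcut.
Wres_diag = np.diagonal(W, axis1=0, axis2=1).T * res
# If the full W is never needed elsewhere, its diagonal is just tr.
Wres_direct = tr * res

print(np.allclose(Wres_matmul, Wres_diag))
print(np.allclose(Wres_matmul, Wres_direct))

# Same check for the stacked 4-D version (an extra time axis of length T).
tr_T = rng.random((n, gsize, T))
res_T = rng.random((n, gsize, T))
W_T = np.zeros((n, n, gsize, T))
idx = np.arange(n)
W_T[idx, idx] = tr_T  # fill only the diagonals: W_T[j, j, i, t] = tr_T[j, i, t]

Wres_T_matmul = np.squeeze(
    np.matmul(W_T.transpose(2, 3, 0, 1), res_T.transpose(1, 2, 0)[..., None])
).T
Wres_T_diag = np.diagonal(W_T).T * res_T.transpose(0, 2, 1)

print(np.allclose(Wres_T_matmul, Wres_T_diag))
```

All three checks should print True when the shortcut agrees with the matrix product. If the real W can have off-diagonal entries, the shortcut no longer applies and the comparison would fail, which is exactly what this check is meant to catch.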

Answer