Why does this custom function cost so much time during the backward pass in PyTorch?

I’m revising a baseline method in PyTorch, but when I add a custom function to the training phase, the backward pass takes about 4x longer on a single V100. Here is an example of the custom function:

import torch

def batch_function(M, kernel_size=21, sf=2):
    '''
    Input:
        M: b x (h*w) x 2 x 2 torch tensor
        kernel_size: side length k of the output kernel
        sf: scale factor
    Output:
        kernel: b x (h*w) x k x k torch tensor
    '''

    M_t = M.permute(0, 1, 3, 2)  # b x (h*w) x 2 x 2
    INV_SIGMA = torch.matmul(M_t, M).unsqueeze(2).unsqueeze(2)  # b x (h*w) x 1 x 1 x 2 x 2

    X, Y = torch.meshgrid(torch.arange(kernel_size), torch.arange(kernel_size))
    Z = torch.stack((Y, X), dim=2).unsqueeze(3).to(M.device, M.dtype)  # k x k x 2 x 1 (cast to M's dtype so the matmul below works)

    Z = Z.unsqueeze(0).unsqueeze(0)    # 1 x 1 x k x k x 2 x 1
    Z_t = Z.permute(0, 1, 2, 3, 5, 4)  # 1 x 1 x k x k x 1 x 2
    raw_kernel = torch.exp(-0.5 * torch.squeeze(Z_t.matmul(INV_SIGMA).matmul(Z)))  # b x (h*w) x k x k

    # Normalize so each k x k kernel sums to 1
    kernel = raw_kernel / torch.sum(raw_kernel, dim=(2, 3)).unsqueeze(-1).unsqueeze(-1)  # b x (h*w) x k x k

    return kernel

where b is the batch size, 16; h and w are the spatial dimensions, 100 each; and k is equal to 21. I’m not sure whether the large size of M is what makes the backward pass take so long. Why does it take so much longer? And are there other ways to rewrite this code to make it faster? I’m new here, so if the problem is not clearly described, please let me know!
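For reference, here is a minimal driver matching the shapes above (the loss is just a placeholder, not my real training objective):

import time
import torch

M = torch.randn(16, 100 * 100, 2, 2, device='cuda', requires_grad=True)

kernel = batch_function(M, kernel_size=21)   # 16 x 10000 x 21 x 21
loss = kernel.sum()                          # placeholder loss

torch.cuda.synchronize()
start = time.time()
loss.backward()                              # this is the step that became ~4x slower
torch.cuda.synchronize()
print(f'backward time: {time.time() - start:.3f} s')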


Answer

You might be able to get a performance boost on the double tensor multiplication by using torch.einsum:

>>> o = torch.einsum('acdefg,bshigj,kldejm->bsdefm', ZZ_t, INV_SIGMA, ZZ)

The resulting tensor o will be shaped (b, h*w, k, k, 1, 1). Here ZZ_t, INV_SIGMA, and ZZ are your Z_t, INV_SIGMA, and Z (after the unsqueeze calls).
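Since o keeps the two trailing singleton dimensions, you would squeeze them before the exponential and normalization, something along these lines:

raw_kernel = torch.exp(-0.5 * o.squeeze(-1).squeeze(-1))  # b x (h*w) x k x k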


For details on the subscript notation:

  • b: batch dimension.
  • s: ‘s’ for spatial, i.e. the h*w dimension.
  • d and e: the two k dimensions which are paired across ZZ_t and ZZ.

A simple 2D matrix multiplication applied with matmul corresponds to the subscript pattern ij,jk->ik.
Keeping that in mind, in your case we have:

  • A first matrix multiplication: r = ZZ_t@INV_SIGMA,
    which does something like *fg,*gj->*fj,
    where the asterisk * stands for the leading dimensions.

  • A second matrix multiplication: r@ZZ,
    which comes down to *fj,*jm->*fm.

Overall, if we combine both, we get directly: *fg,*gj,*jm->*fm.
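As a quick sanity check of the combined pattern (leaving out the leading dimensions), a three-operand einsum matches the chained matmuls on small random matrices:

import torch

A = torch.randn(1, 2)   # plays the role of *fg
B = torch.randn(2, 2)   # plays the role of *gj
C = torch.randn(2, 1)   # plays the role of *jm
assert torch.allclose(torch.einsum('fg,gj,jm->fm', A, B, C), A @ B @ C)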

Finally, I have assigned the remaining dimensions (all of size 1) random but distinct subscript letters:

a, c, h, i, k, l

Since each of these dimensions has size 1, summing over them is trivial and they simply drop out of the result.

Replacing the asterisk above with those notations, we get the following subscript input:

#  *  fg, *  gj, *  jm-> *  fm
# acdefg,bshigj,kldejm->bsdefm
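
Putting it all together, here is a sketch of your function with the double matmul replaced by the single einsum call (variable names as in your snippet, everything else unchanged):

import torch

def batch_function_einsum(M, kernel_size=21, sf=2):
    M_t = M.permute(0, 1, 3, 2)                                  # b x (h*w) x 2 x 2
    INV_SIGMA = torch.matmul(M_t, M).unsqueeze(2).unsqueeze(2)   # b x (h*w) x 1 x 1 x 2 x 2

    X, Y = torch.meshgrid(torch.arange(kernel_size), torch.arange(kernel_size))
    Z = torch.stack((Y, X), dim=2).unsqueeze(3).to(M.device, M.dtype)  # k x k x 2 x 1

    Z = Z.unsqueeze(0).unsqueeze(0)      # 1 x 1 x k x k x 2 x 1
    Z_t = Z.permute(0, 1, 2, 3, 5, 4)    # 1 x 1 x k x k x 1 x 2

    # single einsum instead of Z_t.matmul(INV_SIGMA).matmul(Z)
    o = torch.einsum('acdefg,bshigj,kldejm->bsdefm', Z_t, INV_SIGMA, Z)  # b x (h*w) x k x k x 1 x 1
    raw_kernel = torch.exp(-0.5 * o.squeeze(-1).squeeze(-1))             # b x (h*w) x k x k

    kernel = raw_kernel / torch.sum(raw_kernel, dim=(2, 3)).unsqueeze(-1).unsqueeze(-1)
    return kernel

# quick check against the original on a small input
M = torch.randn(2, 16, 2, 2)
assert torch.allclose(batch_function_einsum(M), batch_function(M))

Whether this actually speeds up the backward pass is something you would have to benchmark on your setup.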