
PyTorch’s nn.TransformerEncoder “src_key_padding_mask” not functioning as expected

I’m working with PyTorch’s nn.TransformerEncoder module. My input samples have the usual shape (batch-size, seq-len, emb-dim), and all samples in a batch are zero-padded to the length of the longest sample in that batch. I therefore want attention over the all-zero padding positions to be ignored.

The documentation says to pass an argument src_key_padding_mask to the forward function of the nn.TransformerEncoder module. This mask should be a tensor of shape (batch-size, seq-len) with True at the padded positions and False everywhere else.

I construct that mask from the zero-padded input.

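Roughly like this (a simplified sketch; I treat a position as padding when its whole embedding vector is zero, and the tensor names are just for illustration):

```python
import torch

# x: (batch-size, seq-len, emb-dim), zero-padded to the longest sample in the batch
x = torch.randn(2, 5, 8)
x[0, 3:] = 0.0  # pretend the first sample only has 3 real tokens

# True wherever the whole embedding vector is zero, i.e. at the padded positions
src_key_padding_mask = (x == 0).all(dim=-1)  # shape: (batch-size, seq-len)
```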

Everything works fine when I don’t set src_key_padding_mask, but as soon as I pass the mask the forward call fails with an assertion error.


It seems like the assertion compares the first dimension of the mask, which is the batch size, against bsz, which presumably also stands for the batch size. So why is it failing? Help is very much appreciated!


Answer

I ran into the same issue, and it is not a bug: PyTorch’s Transformer implementation expects the input x to be (seq-len, batch-size, emb-dim), while yours seems to be (batch-size, seq-len, emb-dim).
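For example, something along these lines should work (a quick sketch with made-up dimensions; the mask itself stays (batch-size, seq-len)):

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=8, nhead=2)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)

x = torch.randn(2, 5, 8)                     # (batch-size, seq-len, emb-dim)
x[0, 3:] = 0.0                               # zero padding in the first sample
src_key_padding_mask = (x == 0).all(dim=-1)  # (batch-size, seq-len)

# The encoder expects (seq-len, batch-size, emb-dim), so swap the first two dims
out = encoder(x.transpose(0, 1), src_key_padding_mask=src_key_padding_mask)
out = out.transpose(0, 1)                    # back to (batch-size, seq-len, emb-dim)
```

Newer PyTorch releases also accept batch_first=True in nn.TransformerEncoderLayer, which avoids the transpose altogether.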

User contributions licensed under: CC BY-SA