I use a MultiHeadAttention layer in my transformer model (the model is very similar to named entity recognition models). Because my data comes in different lengths, I use padding and the attention_mask parameter of MultiHeadAttention to mask the padding. If I used the Masking layer before MultiHeadAttention instead, would it have the same effect as the attention_mask parameter? Or should I use both?
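For reference, a minimal sketch of passing attention_mask to Keras's MultiHeadAttention explicitly (TensorFlow 2.x assumed; the sizes and the hand-built padding mask are illustrative, not from the original question). Whether a preceding Masking layer is picked up automatically depends on the TF version, so the explicit mask is the unambiguous option:

```python
import tensorflow as tf

batch_size, seq_len, emb_dim = 2, 6, 16  # illustrative sizes
inputs = tf.random.normal((batch_size, seq_len, emb_dim))

# Hand-built padding mask: True marks real tokens, False marks padding.
pad = tf.constant([[1, 1, 1, 1, 1, 1],
                   [1, 1, 1, 1, 0, 0]], dtype=tf.bool)

# MultiHeadAttention accepts a boolean attention_mask of shape
# (batch, query_len, key_len); a position attends only where the mask is True.
attention_mask = pad[:, tf.newaxis, :] & pad[:, :, tf.newaxis]

mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=emb_dim // 4)
out = mha(query=inputs, value=inputs, attention_mask=attention_mask)
print(out.shape)  # (2, 6, 16)
```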
PyTorch TransformerEncoderLayer different input order gets different results
Before I start, I’m very new to Transformers, and sorry for my bad sentence structure, I have a fever right now. Any time I use nn.TransformerEncoderLayer in any way with a saved model, if the data is in a different order I get different results. Is there a way to save the encode table (or whatever this would be), this would
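The excerpt cuts off, but a common cause of order-dependent outputs from a saved model is running it in training mode, where dropout randomizes every forward pass. A minimal sketch of the check (layer sizes are illustrative and the checkpoint name is hypothetical):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
# encoder.load_state_dict(torch.load("encoder.pt"))  # hypothetical saved weights

encoder.eval()  # in train mode, dropout makes every forward pass different

x = torch.randn(3, 5, 16)             # (batch, seq, emb)
with torch.no_grad():
    out_a = encoder(x)
    out_b = encoder(x[[2, 0, 1]])     # same samples, shuffled batch order

# In eval mode each sample's encoding should match up to float tolerance.
print(torch.allclose(out_a[0], out_b[1], atol=1e-5))
```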
MultiHeadAttention giving very different values between versions (PyTorch/TensorFlow)
I’m trying to recreate a transformer that was written in PyTorch and port it to TensorFlow. Everything was going pretty well until each version of MultiHeadAttention started giving extremely different outputs. Both are implementations of multi-head attention as described in the paper “Attention Is All You Need”, so they should be able to achieve the same output. I’m converting
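One likely reason for the mismatch is that the two layers are initialized independently and store their projection weights in different layouts, so they only agree after the weights are copied across with the appropriate transposes/reshapes. A minimal sketch for inspecting both layouts (sizes are illustrative):

```python
import tensorflow as tf
import torch
import torch.nn as nn

E, H = 16, 4  # illustrative embed dim and number of heads

# PyTorch packs the Q/K/V projections into one (3*E, E) matrix applied as
# x @ W.T, plus a separate (E, E) output projection.
pt_mha = nn.MultiheadAttention(embed_dim=E, num_heads=H, batch_first=True)
print(pt_mha.in_proj_weight.shape, pt_mha.out_proj.weight.shape)  # (48, 16) (16, 16)

# Keras keeps separate per-head kernels of shape (E, H, key_dim) applied via
# einsum, so matching outputs requires reshaping/transposing when copying
# weights from one framework to the other.
tf_mha = tf.keras.layers.MultiHeadAttention(num_heads=H, key_dim=E // H)
x = tf.zeros((1, 5, E))
tf_mha(x, x)  # call once so the weights get built
for w in tf_mha.weights:
    print(w.name, w.shape)
```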
Load a model as DPRQuestionEncoder in HuggingFace
I would like to load BERT’s weights (or those of any other transformer) into a DPRQuestionEncoder architecture, so that I can use the HuggingFace save_pretrained method and plug the saved model into the RAG architecture for end-to-end fine-tuning. But I got the following error. I am using the latest version of Transformers. Answer: As already mentioned in the comments, DPRQuestionEncoder does
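A minimal sketch of one way to do this, based on my reading of the current transformers source (the attribute path question_encoder.bert_model and the default DPRConfig sizes may need adjusting for other versions or checkpoints):

```python
from transformers import BertModel, DPRConfig, DPRQuestionEncoder

bert = BertModel.from_pretrained("bert-base-uncased")

# DPRConfig's defaults match bert-base-uncased sizes; adjust the config for
# other checkpoints.
dpr_encoder = DPRQuestionEncoder(DPRConfig())

# DPRQuestionEncoder wraps a BertModel at question_encoder.bert_model, so the
# weights can be copied over directly (strict=False tolerates the pooler keys).
result = dpr_encoder.question_encoder.bert_model.load_state_dict(
    bert.state_dict(), strict=False
)
print(result.missing_keys, result.unexpected_keys)

dpr_encoder.save_pretrained("./dpr_question_encoder_from_bert")
```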
Pytorch’s nn.TransformerEncoder “src_key_padding_mask” not functioning as expected
I’m working with PyTorch’s nn.TransformerEncoder module. I have input samples with the (usual) shape (batch-size, seq-len, emb-dim). All samples in a batch have been zero-padded to the length of the longest sample in the batch. Therefore I want attention to ignore the all-zero positions. The documentation says to add an argument src_key_padding_mask to the forward
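A minimal sketch of building the mask from zero-padded inputs (sizes are illustrative). Two details that often trip this up: True in src_key_padding_mask marks positions to ignore, and without batch_first=True the module expects (seq-len, batch-size, emb-dim) rather than (batch-size, seq-len, emb-dim):

```python
import torch
import torch.nn as nn

batch_size, seq_len, emb_dim = 2, 5, 16  # illustrative sizes
x = torch.randn(batch_size, seq_len, emb_dim)
x[1, 3:] = 0.0                           # zero-pad the last two steps of sample 1

# True marks positions attention should IGNORE (the padding).
src_key_padding_mask = x.abs().sum(dim=-1) == 0   # shape (batch, seq_len)

layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)

out = encoder(x, src_key_padding_mask=src_key_padding_mask)
print(out.shape)  # torch.Size([2, 5, 16])
```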