I’m trying to recreate a transformer that was written in PyTorch and convert it to TensorFlow. Everything was going pretty well until each framework’s version of MultiHeadAttention started giving extremely different outputs. Both methods are implementations of multi-headed attention as described in the paper “Attention Is All You Need”, so they should be able to achieve the same output. I’m converting …
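The asker's code isn't shown here, but a common reason the two layers disagree is that they parameterise the head size differently and are initialised with independent random weights. Below is a minimal sketch (hypothetical dimensions, not the asker's model) of how `torch.nn.MultiheadAttention` and `tf.keras.layers.MultiHeadAttention` are usually lined up; even then the outputs only match once the Q/K/V and output projection weights are copied from one layer to the other.

```python
# Minimal sketch: PyTorch takes the *total* embedding dim, Keras takes the
# *per-head* key_dim, and each layer starts from its own random weights,
# so identical inputs give different outputs until the weights are transferred.
import numpy as np
import torch
import tensorflow as tf

embed_dim, num_heads, seq_len, batch = 16, 4, 5, 2  # assumed sizes
x = np.random.rand(batch, seq_len, embed_dim).astype("float32")

# PyTorch: embed_dim is the full model dimension; batch_first=True matches
# the (batch, seq, features) layout that Keras uses by default.
torch_mha = torch.nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
torch_out, _ = torch_mha(torch.tensor(x), torch.tensor(x), torch.tensor(x))

# Keras: key_dim is the dimension *per head*, so embed_dim // num_heads here.
keras_mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                               key_dim=embed_dim // num_heads)
keras_out = keras_mha(query=x, value=x, key=x)

# The shapes agree, but the values will differ until the projection weights
# are copied across between the two layers.
print(torch_out.shape, keras_out.shape)
```

Masking conventions also differ (PyTorch masks positions that are `True`, while Keras masks positions that are `False`/`0`), which is another frequent source of large output differences in ported transformers.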
How to add an attention layer to an LSTM autoencoder built as a sequential Keras model in Python?
So I want to build an autoencoder model for sequence data. I have started to build a sequential Keras model in Python, and now I want to add an attention layer in the middle, but I have no idea how to approach this. My model so far: … So far I have tried to add an attention function copied from here and …
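The question's model code is elided, so the following is only a rough sketch under assumed dimensions (`timesteps`, `n_features`, and `latent_dim` are placeholders). A Sequential model can't feed two tensors into an attention layer, so the usual workaround is to rebuild the autoencoder with the functional API and place the built-in `tf.keras.layers.Attention` between the encoder outputs and the decoder states.

```python
# Rough sketch (not the asker's model): an LSTM autoencoder rebuilt with the
# functional API so that dot-product attention can connect decoder and encoder.
import tensorflow as tf

timesteps, n_features, latent_dim = 30, 1, 64  # assumed dimensions

inputs = tf.keras.Input(shape=(timesteps, n_features))

# Encoder: keep the full sequence of hidden states so attention can attend over it.
encoder_seq, state_h, state_c = tf.keras.layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(inputs)

# Decoder: repeat the final encoder state as the input sequence, as in a plain
# LSTM autoencoder, then let each decoder step attend over the encoder outputs.
decoder_in = tf.keras.layers.RepeatVector(timesteps)(state_h)
decoder_seq = tf.keras.layers.LSTM(latent_dim, return_sequences=True)(
    decoder_in, initial_state=[state_h, state_c])

# Luong-style dot-product attention: query = decoder states, value = encoder states.
context = tf.keras.layers.Attention()([decoder_seq, encoder_seq])
merged = tf.keras.layers.Concatenate()([decoder_seq, context])

outputs = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Dense(n_features))(merged)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```

If additive (Bahdanau-style) attention is preferred, `tf.keras.layers.AdditiveAttention` can be dropped in place of `Attention` with the same `[query, value]` call convention.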