So I want to build an autoencoder model for sequence data. I have started to build a sequential Keras model in Python and now I want to add an attention layer in the middle, but I have no idea how to approach this. My model so far:
from keras.layers import LSTM, Dense, TimeDistributed, RepeatVector, Layer
from keras.models import Sequential
import keras.backend as K

model = Sequential()
model.add(LSTM(20, activation="relu", input_shape=(time_steps, n_features), return_sequences=False))
model.add(RepeatVector(time_steps, name="bottleneck_output"))
model.add(LSTM(30, activation="relu", return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer="adam", loss="mae")
So far I have tried to add an attention layer copied from here:
class attention(Layer):
    def __init__(self, **kwargs):
        super(attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # Learnable weights: one score weight per feature, one bias per time step
        self.W = self.add_weight(name="att_weight", shape=(input_shape[-1], 1), initializer="normal")
        self.b = self.add_weight(name="att_bias", shape=(input_shape[1], 1), initializer="zeros")
        super(attention, self).build(input_shape)

    def call(self, x):
        # Score each time step, normalise the scores with softmax,
        # then return the attention-weighted sum over the time axis
        et = K.squeeze(K.tanh(K.dot(x, self.W) + self.b), axis=-1)
        at = K.softmax(et)
        at = K.expand_dims(at, axis=-1)
        output = x * at
        return K.sum(output, axis=1)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])

    def get_config(self):
        return super(attention, self).get_config()
and added it after the first LSTM, before the RepeatVector, i.e.:
model = Sequential()
model.add(LSTM(20, activation="relu", input_shape=(time_steps, n_features), return_sequences=False))
model.add(attention())  # this is added
model.add(RepeatVector(time_steps, name="bottleneck_output"))
model.add(LSTM(30, activation="relu", return_sequences=True))
model.add(TimeDistributed(Dense(n_features)))
model.compile(optimizer="adam", loss="mae")
but the code gives an error because the dimensions somehow do not fit; the problem is in feeding the output of attention() into the RepeatVector layer:
ValueError: Input 0 is incompatible with layer bottleneck_output: expected ndim=2, found ndim=1
However, according to model.summary() the output dimension of the attention layer is (None, 20), which is the same as for the first lstm_1 layer. The code works without the attention layer.
I would also appreciate some explanation of why the solution actually solves the problem. I am fairly new to Python and have trouble understanding what the attention() class is doing; I just copied it and tried to use it, which of course is not working.
Answer
OK, I solved it. There has to be return_sequences=True in the first LSTM layer; then it works as it is.

The reason: with return_sequences=False the LSTM emits only its final hidden state, shape (None, 20), so the attention layer has no time axis to weight. Its sum over axis 1 then collapses the feature axis instead and produces a 1-D tensor, which is the ndim=1 that RepeatVector rejects (model.summary() still reports (None, 20) only because compute_output_shape declares that shape regardless of what call() actually returns). With return_sequences=True the LSTM outputs the full sequence, shape (None, time_steps, 20); the attention layer computes one weight per time step and sums the weighted sequence over time, giving the 2-D (None, 20) vector that RepeatVector expects.
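For reference, a minimal sketch of the corrected autoencoder, assuming time_steps and n_features are already defined and the attention class above is in scope (the comments trace the tensor shapes through the model):

from keras.layers import LSTM, Dense, TimeDistributed, RepeatVector
from keras.models import Sequential

model = Sequential()
model.add(LSTM(20, activation="relu", input_shape=(time_steps, n_features),
               return_sequences=True))                          # (None, time_steps, 20): full sequence for attention
model.add(attention())                                          # (None, 20): attention-weighted sum over time
model.add(RepeatVector(time_steps, name="bottleneck_output"))   # (None, time_steps, 20)
model.add(LSTM(30, activation="relu", return_sequences=True))   # (None, time_steps, 30)
model.add(TimeDistributed(Dense(n_features)))                   # (None, time_steps, n_features)
model.compile(optimizer="adam", loss="mae")
model.summary()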