I know you can use different types of recurrent layers in an RNN architecture in Keras, depending on the type of problem you have. What I'm referring to is, for example, layers.SimpleRNN, layers.LSTM or layers.GRU.
So let’s say we have (with the functional API in Keras):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(timesteps, features), name='input')
lstm_1 = layers.LSTM(64, return_sequences=True, name='lstm_1')(inputs)
lstm_2 = layers.LSTM(64, return_sequences=True, name='lstm_2')(lstm_1)
# last LSTM layer
lstm_3 = layers.LSTM(64, return_sequences=False, name='lstm_3')(lstm_2)

model = keras.Model(inputs=inputs, outputs=lstm_3, name='rnn_example')
# print(model.summary())

batch = tf.random.normal([32, timesteps, features])
print(model(batch).shape)
Where lstm_3 is the last layer.
Does it make sense for that last layer to be an LSTM layer, or would it have to be a different type of layer? I've seen both.
For example, here (this time with the Sequential API):
model = keras.Sequential()
model.add(
    layers.Bidirectional(layers.LSTM(64, return_sequences=True),
                         input_shape=(timesteps, features))
)
# Second Bidirectional layer
model.add(layers.Bidirectional(layers.LSTM(32)))
# Output
model.add(layers.Dense(10))

model.summary()
Answer
TL;DR Both are valid choices.
Overall, it depends on the kind of output you want or, more precisely, where you want your output to come from. You can use the output of the final LSTM layer directly, or you can add a Dense layer, with or without a TimeDistributed wrapper. One reason to add a Dense layer after the final LSTM is that it makes your model more expressive (and also more prone to overfitting), so whether to use a final Dense layer is up to experimentation.
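As a minimal sketch of those options (the layer sizes, layer names, the 10-unit output, and the timesteps/features values below are arbitrary placeholders, not taken from your code):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 20, 8  # arbitrary example dimensions

inputs = keras.Input(shape=(timesteps, features), name='input')

# Option A: use the final LSTM's output directly (return_sequences=False)
x = layers.LSTM(64, return_sequences=True, name='lstm_1')(inputs)
last_state = layers.LSTM(64, return_sequences=False, name='lstm_2')(x)
lstm_only = keras.Model(inputs, last_state, name='lstm_output')

# Option B: add a Dense head after the final LSTM, e.g. mapping to 10 targets
dense_out = layers.Dense(10, name='dense_head')(last_state)
lstm_dense = keras.Model(inputs, dense_out, name='dense_output')

# Option C: keep the whole sequence and apply a Dense layer at every timestep
seq = layers.LSTM(64, return_sequences=True, name='lstm_seq')(inputs)
per_step = layers.TimeDistributed(layers.Dense(10), name='per_step_dense')(seq)
seq_dense = keras.Model(inputs, per_step, name='timedistributed_output')

batch = tf.random.normal([32, timesteps, features])
print(lstm_only(batch).shape)   # (32, 64)
print(lstm_dense(batch).shape)  # (32, 10)
print(seq_dense(batch).shape)   # (32, 20, 10)

The first model returns the raw 64-dimensional state of the last LSTM, the second maps it to 10 outputs with a Dense head, and the third keeps the full sequence and applies the same Dense layer at every timestep via TimeDistributed.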