
Last layer in an RNN – Dense, LSTM, GRU…?

I know you can use different types of recurrent layers in an RNN architecture in Keras, depending on the type of problem you have, for example layers.SimpleRNN, layers.LSTM, or layers.GRU.

So let’s say we have (with the functional API in Keras):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 10, 8  # example dimensions

inputs = tf.keras.Input(shape=(timesteps, features), name='input')
lstm_1 = layers.LSTM(64, return_sequences=True, name='lstm_1')(inputs)
lstm_2 = layers.LSTM(64, return_sequences=True, name='lstm_2')(lstm_1)
# last LSTM layer: return_sequences=False, so only the final hidden state is returned
lstm_3 = layers.LSTM(64, return_sequences=False, name='lstm_3')(lstm_2)
model = keras.Model(inputs=inputs, outputs=lstm_3, name='rnn_example')
# print(model.summary())
inputs = tf.random.normal([32, timesteps, features])
print(model(inputs).shape)  # (32, 64)

Where lstm_3 is the last layer.

Does it make sense for the last layer to be an LSTM layer, or would it have to be a different type of layer? I've seen both.

For example, here (this time with the Sequential API):

model = keras.Sequential()

# First Bidirectional layer, returning the full sequence
model.add(
    layers.Bidirectional(layers.LSTM(64, return_sequences=True), input_shape=(timesteps, features))
)
# Second Bidirectional layer, returning only the last output
model.add(layers.Bidirectional(layers.LSTM(32)))
# Output layer
model.add(layers.Dense(10))

model.summary()


Answer

TL;DR Both are valid choices.

Overall it depends on the kind of output you want or, more precisely, where you want your output to come from. You can use the outputs of the LSTM layer directly, or you can add a Dense layer, with or without a TimeDistributed wrapper. One reason for adding a Dense layer after the final LSTM is that it lets your model be more expressive (and also more prone to overfitting). So whether to use a final Dense layer or not is up to experimentation.
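To make the options concrete, here is a minimal sketch of the three variants. The layer sizes, the output dimension of 10, and the timesteps/features values are arbitrary placeholders for illustration:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features = 12, 8  # arbitrary example dimensions

# Variant 1: use the final LSTM state directly -> output shape (batch, 64)
inputs = keras.Input(shape=(timesteps, features))
x = layers.LSTM(64, return_sequences=False)(inputs)
direct = keras.Model(inputs, x)

# Variant 2: Dense head on the final LSTM state -> output shape (batch, 10)
inputs = keras.Input(shape=(timesteps, features))
x = layers.LSTM(64, return_sequences=False)(inputs)
x = layers.Dense(10)(x)
dense_head = keras.Model(inputs, x)

# Variant 3: per-timestep Dense on the full sequence -> output shape (batch, timesteps, 10)
inputs = keras.Input(shape=(timesteps, features))
x = layers.LSTM(64, return_sequences=True)(inputs)
x = layers.TimeDistributed(layers.Dense(10))(x)
per_step = keras.Model(inputs, x)

batch = tf.random.normal([32, timesteps, features])
print(direct(batch).shape)      # (32, 64)
print(dense_head(batch).shape)  # (32, 10)
print(per_step(batch).shape)    # (32, 12, 10)

Note that a Dense layer applied to a 3-D tensor in Keras already operates on the last axis, so in variant 3 the TimeDistributed wrapper is equivalent to a plain Dense; the wrapper mainly makes the per-timestep intent explicit.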
