
Implementing a minimal LSTMCell in Keras using RNN and Layer classes

I am trying to implement a simple LSTMCell without the “fancy kwargs” that the tf.keras.layers.LSTMCell class implements by default, following a schematic model like this. It doesn’t really have a direct purpose; I would just like to practice implementing a more complex RNNCell than the one described here in the Examples section. My code is the following:

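A minimal sketch of such a cell (the names CustomLSTMCell, merge_with_state, stateH and stateC are the ones referred to in the errors discussed below; everything else is assumed). Note that this version still contains the bug the question is about, so running it through an RNN layer raises an exception:

```python
import tensorflow as tf


class CustomLSTMCell(tf.keras.layers.Layer):
    """Sketch of the cell as first written: state_size is a single int."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = units  # a single integer, not a list of two

    def build(self, input_shape):
        n = input_shape[-1] + self.units  # width of input merged with stateH
        self.weightF = self.add_weight(
            name="weightF", shape=(n, self.units), initializer="glorot_uniform"
        )
        self.biasF = self.add_weight(
            name="biasF", shape=(self.units,), initializer="zeros"
        )
        super().build(input_shape)

    def merge_with_state(self, inputs):
        return tf.concat([inputs, self.stateH], axis=-1)

    def call(self, inputs, states):
        self.stateH = states[0]
        # state_size is a single int, so RNN hands the cell ONE state tensor;
        # this line, setting stateC, is where the exception is raised
        self.stateC = states[1]
        x = self.merge_with_state(inputs)
        forget_gate = tf.sigmoid(tf.matmul(x, self.weightF) + self.biasF)
        self.stateC = forget_gate * self.stateC
        return self.stateH, [self.stateH, self.stateC]


try:
    tf.keras.layers.RNN(CustomLSTMCell(3))(tf.zeros((2, 5, 3)))
except Exception as e:
    print("exception during call:", type(e).__name__)
```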

However, when I tried to test it, an exception was raised in the call function, at the line where I set the value for self.stateC. Here I thought that the states argument of the call function is initially a single tensor rather than a list of tensors, which would explain the error. So I added a self.already_called = False line to the class’s __init__ and a segment at the top of the call function to handle that first call, hoping that this would eliminate the problem. It resulted in another error instead, this time in the merge_with_state function, complaining about incompatible shapes, which I genuinely do not get, since the RNN layer should only “show” the CustomLSTMCell tensors of shape (3) and not (None, 3); axis 0 is the axis it should iterate along. At this point I was convinced that I was doing something really wrong and should ask the community for help. Basically my question is: what is wrong with my code, and if the answer is “almost everything”, then how should I implement an LSTMCell from scratch?


Answer

OK, so it seems that I managed to fix the problem. It turns out that it is always useful to read the documentation, in this case the docs for the RNN class. First, the already_called attribute is unnecessary, because the real problem lies in the first line of the __init__ function: the state_size attribute should be a list of integers, not a single integer, like this: self.state_size = [units, units] (an LSTM of size units needs two states, not one). When I corrected that, I got a different error: the tensors were not compatible in dimension for the addition in the forget_gate. This happened because the RNN layer sees the whole batch at once, not each element of the batch separately (hence the None shape at axis 0). The correction is to add an extra dimension of size 1 at axis 0 to each tensor, like this:

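For example (a sketch with hypothetical names; tf.expand_dims adds the size-1 axis, and a broadcasting tf.matmul stands in for the batched product):

```python
import tensorflow as tf

units, n = 4, 7                        # e.g. n = input_dim + units
wF = tf.random.normal((n, units))      # one gate's weight matrix
wF = tf.expand_dims(wF, axis=0)        # size-1 dim at axis 0 -> (1, n, units)
x = tf.random.normal((2, n))           # a whole batch, shape (batch, n)
# batch-aware product instead of a plain dot: (2, 1, n) @ (1, n, units)
out = tf.squeeze(tf.matmul(tf.expand_dims(x, 1), wF), axis=1)
print(out.shape)                       # (2, 4)
```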

and instead of plain dot products I had to use the K.batch_dot function. So the whole working code is the following:

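A working sketch of the cell described above: state_size = [units, units], a size-1 leading axis on every weight, and a broadcasting tf.matmul playing the role of K.batch_dot (the gate names are assumptions, and per the edit at the end of the answer, the input gate's amount uses a sigmoid):

```python
import tensorflow as tf


class CustomLSTMCell(tf.keras.layers.Layer):
    """Minimal LSTM cell: two states (stateH, stateC), four gates."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.state_size = [units, units]  # the fix: one size per state

    def build(self, input_shape):
        n = input_shape[-1] + self.units  # width of input merged with stateH

        def w(name):  # every weight gets a size-1 axis at axis 0
            return self.add_weight(
                name=name, shape=(1, n, self.units),
                initializer="glorot_uniform",
            )

        def b(name):
            return self.add_weight(
                name=name, shape=(self.units,), initializer="zeros"
            )

        self.weightF, self.biasF = w("weightF"), b("biasF")  # forget gate
        self.weightS, self.biasS = w("weightS"), b("biasS")  # input gate: select
        self.weightA, self.biasA = w("weightA"), b("biasA")  # input gate: amount
        self.weightO, self.biasO = w("weightO"), b("biasO")  # output gate
        super().build(input_shape)

    def batch_mul(self, x, w):
        # (batch, n) x (1, n, units) -> (batch, units); the broadcast over
        # axis 0 is what K.batch_dot was used for in the answer
        return tf.squeeze(tf.matmul(tf.expand_dims(x, 1), w), axis=1)

    def call(self, inputs, states):
        stateH, stateC = states
        x = tf.concat([inputs, stateH], axis=-1)  # merge_with_state
        forget = tf.sigmoid(self.batch_mul(x, self.weightF) + self.biasF)
        select = tf.sigmoid(self.batch_mul(x, self.weightS) + self.biasS)
        # sigmoid per the edit note (a textbook LSTM uses tanh here)
        amount = tf.sigmoid(self.batch_mul(x, self.weightA) + self.biasA)
        output = tf.sigmoid(self.batch_mul(x, self.weightO) + self.biasO)
        stateC = forget * stateC + select * amount
        stateH = output * tf.tanh(stateC)
        return stateH, [stateH, stateC]


layer = tf.keras.layers.RNN(CustomLSTMCell(5))
out = layer(tf.zeros((2, 8, 3)))  # batch of 2, 8 timesteps, 3 features
print(out.shape)  # (2, 5)
```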

Edit: In the question I made a mistake with respect to the model linked: I used a tanh function in the input_gate for the amount instead of a sigmoid. I have edited it in the code here, so it is correct now.

User contributions licensed under: CC BY-SA