I don’t fully understand the LSTM layer in PyTorch. When I instantiate an LSTM layer, how can I specify the number of LSTM cells inside the layer? My first thought was that it was the “num_layers” argument, if we assume that LSTM cells are connected vertically. But if that is the case, how can we implement a stacked LSTM with, for example, two layers of 8 cells each?
Answer
The number of cells of an LSTM (or RNN or GRU) is the number of timesteps your input has/needs. For example, when you want to run the word "hello" through the LSTM function in PyTorch, you can just convert the word to a vector (with one-hot encoding or embeddings) and then pass that vector through the LSTM function. It will then, in the background, iterate through all the embedded characters ("h", "e", "l", …). And each input can even have a different number of timesteps/cells; for example, when you want to pass "hello" and after that "Joe", the LSTM will need a different number of iterations (5 for "hello", 3 for "Joe"). So as you can see, there is no need to give a number of cells! Hope that answer satisfied you. :)
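A minimal sketch of this point (with made-up sizes for demonstration): the LSTM is constructed without any "number of cells", and sequences of different lengths can be passed through the same layer, yielding one output per timestep.

```python
import torch
import torch.nn as nn

# no sequence length anywhere in the constructor
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

hello = torch.randn(1, 5, 10)  # "hello": 5 timesteps, 10 features each
joe = torch.randn(1, 3, 10)    # "Joe": 3 timesteps

out_hello, _ = lstm(hello)
out_joe, _ = lstm(joe)

print(out_hello.shape)  # torch.Size([1, 5, 20]) - one output per timestep
print(out_joe.shape)    # torch.Size([1, 3, 20])
```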
Edit
An example:
sentence = "Hey Im Joe"
embedding_size = 300
batch_size = 1  # batch size 1 for demonstration

# embed each word; the LSTM will need three timesteps/cells to process this sentence
input_ = [create_embedding(word, dims=embedding_size) for word in sentence.split()]
input_ = torch.tensor(input_).reshape(batch_size, -1, embedding_size)

hidden_size = 256
layers = 2

lstm = nn.LSTM(input_size=embedding_size, hidden_size=hidden_size,
               num_layers=layers, dropout=0.5, batch_first=True)

# initialize hidden state (must be a tuple with the following dimensions)
hidden = (torch.zeros(layers, batch_size, hidden_size),
          torch.zeros(layers, batch_size, hidden_size))

outputs, hidden = lstm(input_, hidden)

# outputs now contains the output of each timestep
# for classification you can take the output of the last timestep and use it further, like this:
output = outputs[:, -1]
So what happens in this outputs, hidden = lstm(input_, hidden) line?
Again pseudo code:
# inside the LSTM function
# let's say for demonstration reasons the embedding for "Hey" is a,
# for "Im" it's b and for "Joe" it's c
input_ = [a, b, c]

def LSTM(input_sentence):
    hidden = ...
    outputs = []
    # each iteration is one timestep/cell
    for embedded_word in input_sentence:
        output, hidden = neural_network(embedded_word, hidden)
        outputs.append(output)
    # return all outputs and the hidden state (which you normally don't need)
    return outputs, hidden

outputs, hidden = LSTM(input_)
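You can verify that this pseudocode really is what nn.LSTM does by looping an nn.LSTMCell (PyTorch's single-timestep building block) over the sequence yourself, after copying the LSTM's weights into the cell (sizes here are made up for demonstration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=6, batch_first=True)
cell = nn.LSTMCell(4, 6)

# copy the LSTM's weights into the cell so both compute the same function
cell.weight_ih.data = lstm.weight_ih_l0.data.clone()
cell.weight_hh.data = lstm.weight_hh_l0.data.clone()
cell.bias_ih.data = lstm.bias_ih_l0.data.clone()
cell.bias_hh.data = lstm.bias_hh_l0.data.clone()

x = torch.randn(1, 3, 4)  # batch of 1, 3 timesteps, 4 features
outputs_full, _ = lstm(x)

# manual loop: one LSTMCell call per timestep, exactly like the pseudocode
h = torch.zeros(1, 6)
c = torch.zeros(1, 6)
outputs_loop = []
for t in range(x.size(1)):
    h, c = cell(x[:, t], (h, c))
    outputs_loop.append(h)
outputs_loop = torch.stack(outputs_loop, dim=1)

print(torch.allclose(outputs_full, outputs_loop, atol=1e-6))  # True
```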
Is it clear now what the LSTM function does and how to use it?