Specifying number of cells in LSTM layer in PyTorch

I don’t fully understand the LSTM layer in PyTorch. When I instantiate an LSTM layer, how can I specify the number of LSTM cells inside it? My first thought was that it was the “num_layers” argument, if we assume that LSTM cells are connected vertically. But if that is the case, how can we implement a stacked LSTM with, for example, two layers of 8 cells each?

Answer

The number of cells in an LSTM (or RNN or GRU) is the number of timesteps your input has/needs. For example, when you want to run the word “hello” through the LSTM function in PyTorch, you can convert each character to a vector (with one-hot encoding or embeddings) and then pass that sequence of vectors through the LSTM. It will then, in the background, iterate over all the embedded characters (“h”, “e”, “l”, …). And each input can even have a different number of timesteps/cells; for example, when you pass “hello” and after that “Joe”, the LSTM will need a different number of iterations (5 for “hello”, 3 for “Joe”). So, as you can see, there is no need to specify a number of cells! Hope that answer satisfies you. :)
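To make that concrete, here is a minimal sketch; the vectors are just random tensors standing in for real character embeddings. The very same nn.LSTM instance processes the 5-timestep “hello” and the 3-timestep “Joe”. It also uses hidden_size=8 and num_layers=2, which is how the question’s “two layers with 8 cells each” maps onto PyTorch: num_layers stacks layers vertically, hidden_size sets the width of each layer.

import torch
import torch.nn as nn

embedding_dim = 10  # arbitrary, just for the sketch
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=8, num_layers=2, batch_first=True)

# random stand-ins for character embeddings: (batch, timesteps, embedding_dim)
hello = torch.randn(1, 5, embedding_dim)  # "hello": 5 timesteps/cells
joe = torch.randn(1, 3, embedding_dim)    # "Joe":   3 timesteps/cells

out_hello, _ = lstm(hello)  # the hidden state defaults to zeros when omitted
out_joe, _ = lstm(joe)

print(out_hello.shape)  # torch.Size([1, 5, 8]) -- one output per timestep
print(out_joe.shape)    # torch.Size([1, 3, 8])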

Edit

An example:

sentence = "Hey Im Joe"

embedding_size = 300
batch_size = 1  # batch-size 1 for demonstration

input_ = [create_embedding(word, dims=embedding_size) for word in sentence]
# the LSTM will need three timesteps or cells to process that sentence 
input_ = torch.tensor(x).reshape(1, embedding_size)

hidden_size = 256
layers = 256

lstm = nn.LSTM(input_size=embedding_size, hidden_size=hidden_size, num_layers=layers, dropout=0.5, batch_first=True)

# initialize hidden-state (must be tuple of following dimentsions
hidden = (torch.zeros(layers, batch_size, hidden_size), torch.zeros(layers, batch_size, hidden_size))

outputs, hidden = lstm(input_, hidden)
# outputs is now a list containing the outputs of each timestep
# for classification you can take output of the last timestep and use it further, like this
output = outputs[:, -1]
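A quick sanity check of the shapes, given the values chosen above (batch_size=1, three timesteps, hidden_size=256, layers=2):

print(outputs.shape)    # torch.Size([1, 3, 256]) -- (batch, timesteps, hidden_size)
print(output.shape)     # torch.Size([1, 256])    -- last timestep only
print(hidden[0].shape)  # torch.Size([2, 1, 256]) -- final h_n for each of the 2 layers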

So what happens in the outputs, hidden = lstm(input_, hidden) line? Again, pseudocode:

# inside the LSTM function

# let's say, for demonstration purposes, the embedding for "Hey" is a, for "Im" it's b, and for "Joe" it's c

input_ = [a, b, c]

def LSTM(input_sentence):
    hidden = ...
    outputs = []
    
    # each iteration is one timestep/cell
    for embedded_word in input_sentence:
        output, hidden = neural_network(embedded_word, hidden)
        outputs.append(output)
    
    # returns all outputs and the hidden state (which you normally don't need)
    return outputs, hidden

outputs, hidden = LSTM(input_)

Is it clear now what the LSTM function does and how to use it?
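One last practical note: if you want to batch variable-length inputs like “hello” and “Joe” together, PyTorch’s padding and packing utilities do the bookkeeping. A minimal sketch, again with random tensors standing in for the embeddings:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

embedding_dim, hidden_size = 10, 8
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size, batch_first=True)

# random stand-ins: "hello" has 5 timesteps, "Joe" has 3
seqs = [torch.randn(5, embedding_dim), torch.randn(3, embedding_dim)]
lengths = [5, 3]  # must be sorted descending unless enforce_sorted=False

padded = pad_sequence(seqs, batch_first=True)  # (2, 5, 10), "Joe" zero-padded
packed = pack_padded_sequence(padded, lengths, batch_first=True)

packed_out, _ = lstm(packed)
outputs, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(outputs.shape)  # torch.Size([2, 5, 8]); positions past each true length are zeros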
