I don’t fully understand the LSTM layer in PyTorch. When I instantiate an LSTM layer, how can I specify the number of LSTM cells inside the layer? My first thought was that it was the “num_layers” argument, if we assume that LSTM cells are connected vertically. But if that is the case, how can we implement a stacked LSTM with, for example, two layers of 8 cells each?
Answer
The number of cells of an LSTM (or RNN or GRU) is the number of timesteps your input has/needs. For example, when you want to run the word "hello" through the LSTM function in PyTorch, you can just convert the word to a vector (with one-hot encoding or embeddings) and then pass that vector through the LSTM function. It will then, in the background, iterate through all the embedded characters ("h", "e", "l", …). And each input can even have a different number of timesteps/cells; for example, when you want to pass "hello" and after that "Joe", the LSTM will need a different number of iterations (5 for "hello", 3 for "Joe"). So as you can see, there is no need to give a number of cells! Hope that answer satisfied you. :)
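A minimal sketch of this point (with made-up sizes for demonstration): the LSTM is constructed without any "number of cells", and sequences of different lengths can be passed through the same layer, yielding one output per timestep.

```python
import torch
import torch.nn as nn

# no sequence length anywhere in the constructor
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

hello = torch.randn(1, 5, 10)  # "hello": 5 timesteps, 10 features each
joe = torch.randn(1, 3, 10)    # "Joe": 3 timesteps

out_hello, _ = lstm(hello)
out_joe, _ = lstm(joe)

print(out_hello.shape)  # torch.Size([1, 5, 20]) - one output per timestep
print(out_joe.shape)    # torch.Size([1, 3, 20])
```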
Edit
An example:
sentence = "Hey Im Joe"
embedding_size = 300
batch_size = 1  # batch size 1 for demonstration

# embed each word; the LSTM will need three timesteps/cells to process this sentence
input_ = [create_embedding(word, dims=embedding_size) for word in sentence.split()]
input_ = torch.tensor(input_).reshape(batch_size, -1, embedding_size)

hidden_size = 256
layers = 2

lstm = nn.LSTM(input_size=embedding_size, hidden_size=hidden_size,
               num_layers=layers, dropout=0.5, batch_first=True)

# initialize hidden state (must be a tuple with the following dimensions)
hidden = (torch.zeros(layers, batch_size, hidden_size),
          torch.zeros(layers, batch_size, hidden_size))

outputs, hidden = lstm(input_, hidden)

# outputs now contains the output of each timestep
# for classification you can take the output of the last timestep and use it further, like this:
output = outputs[:, -1]
So what happens in this outputs, hidden = lstm(input_, hidden) line?
Again pseudo code:
# inside the LSTM function
# let's say for demonstration reasons the embedding for "Hey" is a,
# for "Im" it's b and for "Joe" it's c
input_ = [a, b, c]

def LSTM(input_sentence):
    hidden = ...
    outputs = []
    # each iteration is one timestep/cell
    for embedded_word in input_sentence:
        output, hidden = neural_network(embedded_word, hidden)
        outputs.append(output)
    # return all outputs and the hidden state (which you normally don't need)
    return outputs, hidden

outputs, hidden = LSTM(input_)
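You can verify that this pseudocode really is what nn.LSTM does by looping an nn.LSTMCell (PyTorch's single-timestep building block) over the sequence yourself, after copying the LSTM's weights into the cell (sizes here are made up for demonstration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=6, batch_first=True)
cell = nn.LSTMCell(4, 6)

# copy the LSTM's weights into the cell so both compute the same function
cell.weight_ih.data = lstm.weight_ih_l0.data.clone()
cell.weight_hh.data = lstm.weight_hh_l0.data.clone()
cell.bias_ih.data = lstm.bias_ih_l0.data.clone()
cell.bias_hh.data = lstm.bias_hh_l0.data.clone()

x = torch.randn(1, 3, 4)  # batch of 1, 3 timesteps, 4 features
outputs_full, _ = lstm(x)

# manual loop: one LSTMCell call per timestep, exactly like the pseudocode
h = torch.zeros(1, 6)
c = torch.zeros(1, 6)
outputs_loop = []
for t in range(x.size(1)):
    h, c = cell(x[:, t], (h, c))
    outputs_loop.append(h)
outputs_loop = torch.stack(outputs_loop, dim=1)

print(torch.allclose(outputs_full, outputs_loop, atol=1e-6))  # True
```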
Is it clear now what the LSTM function does and how to use it?