
How to infer the shape of the output when connecting convolutional layers with dense layers?

I am trying to construct a convolutional neural network using PyTorch and cannot work out the number of input neurons for the first densely connected layer. Say, for example, I have the following architecture:

self.conv_layer = nn.Sequential(
   nn.Conv2d(3, 32, 5),
   nn.Conv2d(32, 64, 5),
   nn.MaxPool2d(2, 2),
   nn.Conv2d(64, 128, 5),
   nn.Conv2d(128, 128, 5),
   nn.MaxPool2d(2, 2))

self.fc_layer = nn.Sequential(
   nn.Linear(X, 512),
   nn.Linear(512, 128),
   nn.Linear(128, 10))

Here X would be the number of input features of the first linear layer. So, do I need to keep track of the shape of the output tensor at each layer so that I can figure out X?

Now, I can put the values into the formula (W - F + 2P) / S + 1, where W is the input size, F the kernel size, P the padding and S the stride, and calculate the shape after each layer, which would be somewhat convenient.
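As a rough sketch of that manual calculation, the formula can be applied layer by layer in plain Python. The 32 x 32 input size is an assumption (the question does not state the image size), and conv_out is a hypothetical helper name, not a PyTorch function. The division is floored, which matches the default behaviour of nn.Conv2d and nn.MaxPool2d.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size along one spatial dimension: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# Assumed input: 3 x 32 x 32 images (not stated in the question).
size = 32

# (kernel, stride) for each layer in conv_layer, in order.
# nn.Conv2d(..., 5) uses stride 1 and padding 0 by default;
# nn.MaxPool2d(2, 2) uses kernel 2 and stride 2.
layers = [(5, 1), (5, 1), (2, 2), (5, 1), (5, 1), (2, 2)]
for kernel, stride in layers:
    size = conv_out(size, kernel, stride)

out_channels = 128  # channels produced by the last Conv2d
X = out_channels * size * size  # features after flattening
print(size, X)
```

Under the 32 x 32 assumption the spatial size shrinks 32, 28, 24, 12, 8, 4, 2, so X would be 128 * 2 * 2 = 512. A different input size gives a different X.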

Isn’t there something even more convenient which might do this automatically?

Answer

An easy solution would be to use the LazyLinear layer: https://pytorch.org/docs/stable/generated/torch.nn.LazyLinear.html.

According to the documentation:

A torch.nn.Linear module where in_features is inferred … They will be initialized after the first call to forward is done and the module will become a regular torch.nn.Linear module. The in_features argument of the Linear is inferred from the input.shape[-1].
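A minimal sketch of how this might look with the architecture from the question. The class name Net, the 32 x 32 input size and the nn.Flatten layer are assumptions added for illustration. Because in_features is taken from input.shape[-1], the convolutional output needs to be flattened before it reaches LazyLinear. LazyLinear is available in PyTorch 1.8 and later.

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical wrapper around the question's layers
    def __init__(self):
        super().__init__()
        self.conv_layer = nn.Sequential(
            nn.Conv2d(3, 32, 5),
            nn.Conv2d(32, 64, 5),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 5),
            nn.Conv2d(128, 128, 5),
            nn.MaxPool2d(2, 2),
        )
        self.fc_layer = nn.Sequential(
            nn.Flatten(),         # (N, C, H, W) -> (N, C*H*W)
            nn.LazyLinear(512),   # in_features inferred on the first forward
            nn.Linear(512, 128),
            nn.Linear(128, 10),
        )

    def forward(self, x):
        return self.fc_layer(self.conv_layer(x))

model = Net()

# One dummy forward pass initializes the lazy layer.
# Assumed input shape: a batch of one 3 x 32 x 32 image.
dummy = torch.zeros(1, 3, 32, 32)
with torch.no_grad():
    out = model(dummy)

print(model.fc_layer[1].in_features)  # the inferred X
print(out.shape)
```

The dummy forward pass should run before the parameters are handed to an optimizer or inspected, since the lazy layer's weights are uninitialized until then.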

User contributions licensed under: CC BY-SA