Difference between the calculation of the training loss and validation loss using PyTorch

I want to use the following code, written for a traditional image classification problem, for my regression problem. The code can be found here:

GeeksforGeeks-Training Neural Networks with Validation using Pytorch

import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network,self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(1,-1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)

epochs = 5

# trainloader and validloader are the DataLoaders built earlier in the linked article
for e in range(epochs):
    train_loss = 0.0
    model.train()     # Optional when not using model-specific layers such as Dropout or BatchNorm
    for data, labels in trainloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        
        optimizer.zero_grad()
        target = model(data)
        loss = criterion(target,labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    
    valid_loss = 0.0
    model.eval()     # Optional when not using model-specific layers such as Dropout or BatchNorm
    for data, labels in validloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        
        target = model(data)
        loss = criterion(target,labels)
        valid_loss = loss.item() * data.size(0)

print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')

I can understand why the training loss is summed over the batches and then divided by the number of batches (len(trainloader)) in this example, but I can't see why the validation loss is not also summed up and divided in the same way. If I understand correctly, the validation loss here will end up being just the loss of the last batch, multiplied by that batch's size.

Is this calculation of the validation loss the correct way to do it? And can I use the code for my regression problem, assuming I swap in regression-specific components (e.g. MSE instead of CrossEntropyLoss)?


Answer

Yes, you can use the code for your regression task. The targets in the code example are class labels (in the MNIST case the numbers 0 to 9, which identify the classes). In the regression case you would use a scalar target value instead. The loss function, which is the cross-entropy in the example, can be replaced by the MSE in your case.
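
As an illustration, here is a minimal sketch of how the model and criterion might be adapted for a scalar regression target. The name RegressionNetwork, the single-output final layer, and nn.MSELoss are choices made for this example, not part of the original code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RegressionNetwork(nn.Module):   # hypothetical name, for illustration only
    def __init__(self):
        super(RegressionNetwork, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 1)   # one scalar output instead of 10 class scores

    def forward(self, x):
        x = x.view(x.size(0), -1)      # flatten each sample in the batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

model = RegressionNetwork()
criterion = nn.MSELoss()               # MSE replaces CrossEntropyLoss
# the labels passed to criterion should now be float tensors of shape (batch_size, 1)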

I assume that the validation loss in this example is effectively estimated from a single batch only: valid_loss is overwritten (= rather than +=) on every iteration, so only the loss of the last batch, multiplied by data.size(0) (the batch size), is left at the end. Dividing that value by len(validloader) therefore does not give an average over the whole validation set.

On the web page, however, the validation loss is accumulated over all data points in the validation set, as it should be.
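
For completeness, here is a sketch of a validation loop that accumulates the loss over all batches. The torch.no_grad() context and the division by len(validloader.dataset) are additions of mine; the print statement in the question divides by len(validloader), i.e. the number of batches:

valid_loss = 0.0
model.eval()
with torch.no_grad():                              # no gradients needed during validation
    for data, labels in validloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()
        target = model(data)
        loss = criterion(target, labels)
        valid_loss += loss.item() * data.size(0)   # accumulate, weighted by batch size

# dividing by the number of samples gives the average per-sample validation loss
valid_loss = valid_loss / len(validloader.dataset)

If you keep the division by len(validloader) from the question instead, accumulate loss.item() without the data.size(0) factor, so that you average the per-batch losses.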
