What is the relation between a learning rate scheduler and an optimizer?

If I have a model:

import torch
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(2, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.out(x)
        return x

nx = net_x()

And then I’m defining my inputs, optimizer (with lr=0.1), scheduler (with base_lr=1e-3), and training:

r = torch.tensor([1.0, 2.0])
optimizer = optim.Adam(nx.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1,
                                              step_size_up=1, mode="triangular2",
                                              cycle_momentum=False)

path = 'opt.pt'
for epoch in range(10):
    optimizer.zero_grad()
    net_predictions = nx(r)
    loss = torch.sum(torch.randint(0,10,(4,)) - net_predictions)
    loss.backward()
    optimizer.step()
    scheduler.step()
    print('loss:', loss)

    # save the model, optimizer, and scheduler state dicts every epoch
    torch.save({'epoch': epoch,
                'net_x_state_dict': nx.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'scheduler': scheduler.state_dict(),
                }, path)
# load the state dicts back
checkpoint = torch.load(path)
nx.load_state_dict(checkpoint['net_x_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler'])

The optimizer seems to take its learning rate from the scheduler:

for g in optimizer.param_groups:
    print(g)
>>>
{'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'initial_lr': 0.001, 'params': [Parameter containing:

Does the learning rate scheduler overwrite the optimizer's learning rate? How is it connected to the optimizer? I'm trying to understand the relation between them (i.e., how they interact).


Answer

TL;DR: The LR scheduler holds the optimizer as a member and explicitly alters the learning rates of its parameter groups.


As mentioned in the official PyTorch documentation, the learning rate scheduler receives the optimizer as an argument in its constructor, and thus has access to its parameter groups.
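
For example (a minimal sketch with a hypothetical stand-in model, using the same CyclicLR settings as in the question), you can verify that the scheduler keeps a reference to the very same optimizer object and rewrites its param_groups in place:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 4)                  # hypothetical stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1,
                                        step_size_up=1, mode="triangular2",
                                        cycle_momentum=False)

print(scheduler.optimizer is optimizer)  # True: the scheduler stores a reference, not a copy
print(optimizer.param_groups[0]['lr'])   # 0.001: already overwritten to base_lr at construction
scheduler.step()                         # (in real training, call optimizer.step() first to avoid a warning)
print(optimizer.param_groups[0]['lr'])   # 0.1: changed in place by scheduler.step()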

The common use is to update the LR after every epoch:

scheduler = ... # initialize some LR scheduler
for epoch in range(100):
    train(...) # here optimizer.step() is called numerous times.
    validate(...)
    scheduler.step()
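
Where exactly you call scheduler.step() depends on the scheduler: per the PyTorch docs, CyclicLR (used in the question) is meant to be stepped after every batch rather than after every epoch. Below is a hedged sketch (hypothetical stand-in model, illustrative step sizes) that prints the LR the scheduler writes into the optimizer at each step:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 4)                  # hypothetical stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1,
                                        step_size_up=4, mode="triangular2",
                                        cycle_momentum=False)

for batch in range(8):
    # ... forward pass, loss.backward(), and optimizer.step() would go here ...
    scheduler.step()
    # get_last_lr() reports the value the scheduler just wrote into param_groups
    print(batch, scheduler.get_last_lr(), optimizer.param_groups[0]['lr'])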

All optimizers inherit from a common parent class torch.optim.Optimizer and are updated using the step method implemented for each of them.
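
In fact, nothing stops you from changing the learning rate yourself by writing to optimizer.param_groups; a scheduler simply automates exactly this kind of assignment. A minimal sketch (hypothetical stand-in model):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 4)                  # hypothetical stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Manually lowering the LR: this is precisely the kind of update a scheduler performs in step()
for param_group in optimizer.param_groups:
    param_group['lr'] = 0.01

print(optimizer.param_groups[0]['lr'])   # 0.01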

Similarly, all LR schedulers (besides ReduceLROnPlateau) inherit from a common parent class named _LRScheduler. Looking at its source code reveals that in its step method the class indeed changes the LR of the optimizer's parameter groups in place:

...
for i, data in enumerate(zip(self.optimizer.param_groups, values)):
    param_group, lr = data
    param_group['lr'] = lr
...
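
This also explains the checkpointing code in the question: optimizer.state_dict() already contains the (scheduler-modified) lr of each param group, and scheduler.state_dict() contains the scheduler's own counters, so restoring both keeps them consistent. A hedged sketch of such a round-trip (hypothetical stand-in model, same CyclicLR settings as in the question):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 4)                  # hypothetical stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1,
                                        step_size_up=1, mode="triangular2",
                                        cycle_momentum=False)
scheduler.step()                         # LR is now at max_lr = 0.1

checkpoint = {'optimizer_state_dict': optimizer.state_dict(),
              'scheduler_state_dict': scheduler.state_dict()}

# Fresh optimizer/scheduler, then restore: the loaded optimizer carries the LR the
# scheduler had set, and the loaded scheduler resumes from the same cycle position.
optimizer2 = optim.Adam(model.parameters(), lr=0.1)
scheduler2 = optim.lr_scheduler.CyclicLR(optimizer2, base_lr=1e-3, max_lr=0.1,
                                         step_size_up=1, mode="triangular2",
                                         cycle_momentum=False)
optimizer2.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler2.load_state_dict(checkpoint['scheduler_state_dict'])
print(optimizer2.param_groups[0]['lr'], scheduler2.get_last_lr())   # 0.1 [0.1]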