If I have a model:
import torch
import torch.nn as nn
import torch.optim as optim

class net_x(nn.Module):
    def __init__(self):
        super(net_x, self).__init__()
        self.fc1 = nn.Linear(2, 20)
        self.fc2 = nn.Linear(20, 20)
        self.out = nn.Linear(20, 4)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.out(x)
        return x

nx = net_x()
And then I’m defining my inputs, optimizer (with lr=0.1), scheduler (with base_lr=1e-3), and training loop:
r = torch.tensor([1.0, 2.0])
optimizer = optim.Adam(nx.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=1e-3, max_lr=0.1, step_size_up=1,
                                              mode="triangular2", cycle_momentum=False)
path = 'opt.pt'

for epoch in range(10):
    optimizer.zero_grad()
    net_predictions = nx(r)
    loss = torch.sum(torch.randint(0, 10, (4,)) - net_predictions)
    loss.backward()
    optimizer.step()
    scheduler.step()
    print('loss:', loss)

# save state dict
torch.save({
    'epoch': epoch,
    'net_x_state_dict': nx.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler': scheduler.state_dict(),
}, path)

# loading state dict
checkpoint = torch.load(path)
nx.load_state_dict(checkpoint['net_x_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
scheduler.load_state_dict(checkpoint['scheduler'])
The optimizer seems to take on the learning rate set by the scheduler:
for g in optimizer.param_groups:
    print(g)

>>> {'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'initial_lr': 0.001, 'params': [Parameter containing:
Does the learning rate scheduler overwrite the optimizer's learning rate? How does it connect to it? I'm trying to understand the relationship between them (i.e., how they interact).
Answer
TL;DR: The LR scheduler holds the optimizer as a member and explicitly alters the learning rates of its parameter groups.
As mentioned in the PyTorch official documentation, the learning rate scheduler receives the optimizer as a parameter in its constructor, and thus has access to its parameter groups.
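To make this concrete, here is a minimal, self-contained sketch (using a bare nn.Linear with SGD and StepLR purely for illustration, not the Adam/CyclicLR setup from your question) that shows the scheduler rewriting the optimizer's learning rate in place:

import torch.nn as nn
import torch.optim as optim

# toy setup just for illustration
model = nn.Linear(2, 4)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

for epoch in range(3):
    print(epoch, optimizer.param_groups[0]['lr'])
    optimizer.step()   # no gradients here, so effectively a no-op
    scheduler.step()   # rewrites optimizer.param_groups[0]['lr'] in place

# prints 0.1, 0.05, 0.025 -- the scheduler mutates the optimizer's lr directly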
The common use is to update the LR after every epoch:
scheduler = ...  # initialize some LR scheduler

for epoch in range(100):
    train(...)  # here optimizer.step() is called numerous times
    validate(...)
    scheduler.step()
All optimizers inherit from a common parent class, torch.optim.Optimizer, and are updated using the step method implemented for each of them.
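For intuition about why the scheduler's write takes effect, note that the optimizer re-reads group['lr'] from its param_groups on every step call. A heavily simplified, SGD-style sketch of that idea (illustrative only, not the actual torch.optim.SGD source):

import torch

def sgd_like_step(param_groups):
    # stripped-down stand-in for Optimizer.step(); illustrative only
    for group in param_groups:
        lr = group['lr']   # whatever value is stored right now, including anything a scheduler wrote
        for p in group['params']:
            if p.grad is None:
                continue
            with torch.no_grad():
                p.add_(p.grad, alpha=-lr)   # plain gradient-descent update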
Similarly, all LR schedulers (besides ReduceLROnPlateau) inherit from a common parent class named _LRScheduler. Looking at its source code reveals that in its step method the class indeed changes the LR of the optimizer's parameter groups:
...
for i, data in enumerate(zip(self.optimizer.param_groups, values)):
    param_group, lr = data
    param_group['lr'] = lr
...
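If you want to watch this interaction in your own loop, you can print both views of the learning rate after each scheduler.step(); they always agree, because there is only one value, stored in the optimizer's param_groups (scheduler.get_last_lr() requires a reasonably recent PyTorch, roughly 1.4+):

for epoch in range(10):
    optimizer.zero_grad()
    loss = torch.sum(torch.randint(0, 10, (4,)) - nx(r))
    loss.backward()
    optimizer.step()
    scheduler.step()
    # both lines report the same number: the scheduler wrote it into the optimizer
    print('optimizer lr:', optimizer.param_groups[0]['lr'])
    print('scheduler lr:', scheduler.get_last_lr())

This is also why the param_groups dump in your question shows 'lr': 0.001 after loading: it is simply the last value the CyclicLR scheduler wrote into the optimizer (at that point in the cycle, its base_lr), not a sign that loading the state dicts discarded the original lr=0.1.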