optimization – Extra Optimization Schemes

Extra Optimizers

class neuralnet_pytorch.optim.AdaBound(params, lr=0.001, betas=(0.9, 0.999), final_lr=0.1, gamma=0.001, eps=1e-08, weight_decay=0, amsbound=False)[source]

Implements AdaBound algorithm proposed in Adaptive Gradient Methods with Dynamic Bound of Learning Rate.

Parameters:
  • params – iterable of parameters to optimize or dicts defining parameter groups.
  • lr – Adam learning rate. Default: 1e-3.
  • betas – coefficients used for computing running averages of gradient and its square. Default: (0.9, 0.999).
  • final_lr – final (SGD) learning rate. Default: 0.1.
  • gamma – convergence speed of the bound functions. Default: 1e-3.
  • eps – term added to the denominator to improve numerical stability. Default: 1e-8.
  • weight_decay – weight decay (L2 penalty). Default: 0.
  • amsbound (bool) – whether to use the AMSBound variant of this algorithm.
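
A minimal usage sketch (assuming AdaBound follows the standard torch.optim.Optimizer interface; the model and batch below are placeholders):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    import neuralnet_pytorch as nnt

    model = nn.Linear(10, 2)                          # toy model for illustration
    optimizer = nnt.optim.AdaBound(model.parameters(), lr=1e-3,
                                   final_lr=0.1, gamma=1e-3)

    x, y = torch.randn(4, 10), torch.randn(4, 2)      # dummy batch
    loss = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()    # Adam-style update with step sizes bounded toward final_lr
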
class neuralnet_pytorch.optim.Lookahead(optimizer, la_steps=5, alpha=0.8, pullback_momentum=None)[source]

PyTorch implementation of the Lookahead wrapper, proposed in Lookahead Optimizer: k steps forward, 1 step back. https://arxiv.org/abs/1907.08610.

Parameters:
  • optimizer – a regular optimizer such as Adam.
  • la_steps – number of lookahead steps. Default: 5.
  • alpha – linear interpolation coefficient. Default: 0.8.
  • pullback_momentum – either 'reset', 'pullback', or None. Default: None.
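
A sketch of wrapping a base optimizer (here Adam) with Lookahead; the training step assumes the wrapper forwards the usual zero_grad()/step() calls to the inner optimizer:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    import neuralnet_pytorch as nnt

    model = nn.Linear(10, 2)                          # toy model for illustration
    base = torch.optim.Adam(model.parameters(), lr=1e-3)
    optimizer = nnt.optim.Lookahead(base, la_steps=5, alpha=0.8)

    x, y = torch.randn(4, 10), torch.randn(4, 2)      # dummy batch
    loss = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()    # after every la_steps fast steps, the slow weights are interpolated with coefficient alpha
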
class neuralnet_pytorch.optim.NAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, decay=<function NAdam.<lambda>>)[source]

Adaptive moment estimation (Adam) with Nesterov momentum.

http://cs229.stanford.edu/proj2015/054_report.pdf

Parameters:
  • params – iterable of parameters to optimize or dicts defining parameter groups
  • lr – learning rate (default: 1e-3)
  • betas – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))
  • eps – term added to the denominator to improve numerical stability (default: 1e-8)
  • weight_decay – weight decay (L2 penalty) (default: 0)
  • decay – a decay scheme for betas[0]. Default: \(\beta * (1 - 0.5 * 0.96^{\frac{t}{250}})\) where t is the training step.
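
A minimal usage sketch (assuming NAdam follows the standard torch.optim.Optimizer interface; the model and batch below are placeholders):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    import neuralnet_pytorch as nnt

    model = nn.Linear(10, 2)                          # toy model for illustration
    optimizer = nnt.optim.NAdam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))

    x, y = torch.randn(4, 10), torch.randn(4, 2)      # dummy batch
    loss = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()    # Adam update with a Nesterov-style correction on the first moment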

Extra LR Schedulers

class neuralnet_pytorch.optim.lr_scheduler.InverseLR(optimizer, gamma, last_epoch=-1)[source]

Decreases the learning rate every iteration by the inverse of \(1 + \gamma t\): \(\text{lr} = \text{lr} / (1 + \gamma t)\), where \(t\) is the iteration index.

Parameters:
  • optimizer – wrapped optimizer.
  • gamma – decrease coefficient.
  • last_epoch (int) – the index of last epoch. Default: -1.
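
A sketch of the per-iteration schedule (assuming the scheduler follows the torch.optim.lr_scheduler interface, with step() called once per iteration):

    import torch

    import neuralnet_pytorch as nnt

    params = [torch.nn.Parameter(torch.zeros(1))]     # dummy parameter
    optimizer = torch.optim.SGD(params, lr=0.1)
    scheduler = nnt.optim.lr_scheduler.InverseLR(optimizer, gamma=0.01)

    for t in range(5):
        optimizer.step()
        scheduler.step()                              # lr expected to follow 0.1 / (1 + 0.01 * t)
        print(optimizer.param_groups[0]['lr'])
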
class neuralnet_pytorch.optim.lr_scheduler.WarmRestart(optimizer, T_max, T_mul=1, eta_min=0, last_epoch=-1)[source]

step() should be called inside the batch iteration loop; calling it in the epoch loop results in incorrect restart behavior. Do not pass the iteration number to step().

Parameters:
  • optimizer – wrapped optimizer.
  • T_max – maximum number of iterations.
  • T_mul – multiplier for T_max.
  • eta_min – minimum learning rate. Default: 0.
  • last_epoch – the index of last epoch. Default: -1.
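
A sketch of the intended call pattern, with step() called once per batch and without an argument (assuming the scheduler follows the torch.optim.lr_scheduler interface; the loop bounds are placeholders):

    import torch

    import neuralnet_pytorch as nnt

    params = [torch.nn.Parameter(torch.zeros(1))]     # dummy parameter
    optimizer = torch.optim.SGD(params, lr=0.1)
    scheduler = nnt.optim.lr_scheduler.WarmRestart(optimizer, T_max=100, T_mul=1, eta_min=1e-5)

    num_epochs, batches_per_epoch = 2, 100            # placeholder sizes
    for epoch in range(num_epochs):
        for batch in range(batches_per_epoch):        # step() belongs in the batch loop ...
            optimizer.step()
            scheduler.step()                          # ... and is called without the iteration number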