monitor – Monitor Training of Neural Networks

class neuralnet_pytorch.monitor.Monitor(model_name=None, root=None, current_folder=None, print_freq=100, num_iters=None, prefix='run', use_visdom=False, use_tensorboard=False, send_slack=False, with_git=False, **kwargs)

Collects statistics and displays the results using various backends. The collected stats are stored in ‘<root>/<model_name>/<prefix><#id>’ where #id is automatically assigned each time a new run starts.
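
As a rough illustration of the folder layout described above, the sketch below assigns a fresh '<prefix><#id>' folder on each call. The helper name next_run_folder and the numbering rule (highest existing id plus one, starting at 1) are assumptions made for this example, not the library's actual logic.

```python
import os
import re
import tempfile


def next_run_folder(root, model_name, prefix="run"):
    """Create and return '<root>/<model_name>/<prefix><id>' with a fresh id.

    Hypothetical helper: the rule "highest existing id + 1, starting at 1"
    is an assumption made for this sketch.
    """
    model_dir = os.path.join(root, model_name)
    os.makedirs(model_dir, exist_ok=True)
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    # collect the ids of the run folders that already exist
    ids = [int(m.group(1)) for name in os.listdir(model_dir)
           if (m := pattern.match(name))]
    run_dir = os.path.join(model_dir, f"{prefix}{max(ids, default=0) + 1}")
    os.makedirs(run_dir)
    return run_dir


with tempfile.TemporaryDirectory() as root:
    first = next_run_folder(root, "foo-model")
    second = next_run_folder(root, "foo-model")
    print(os.path.basename(first), os.path.basename(second))
```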

Examples

The following snippet shows how to plot smoothed training losses and save images from the current iteration, and then display them every 100 iterations.

from neuralnet_pytorch import monitor as mon

mon.model_name = 'foo-model'
mon.set_path()
mon.print_freq = 100

...
for epoch in mon.iter_epoch(range(n_epochs)):
    for data in mon.iter_batch(data_loader):
        loss = net(data)
        mon.plot('training loss', loss, smooth=.99, filter_outliers=True)
        mon.imwrite('input images', data['images'], latest_only=True)
...
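
The smooth=.99 argument in the snippet above asks for a smoothed loss curve. One common way to smooth a noisy series is a bias-corrected exponential moving average, sketched below. Whether Monitor uses exactly this formula is an assumption, and smooth_series is a hypothetical name.

```python
def smooth_series(values, smooth=0.99):
    """Bias-corrected exponential moving average (illustrative only)."""
    smoothed, running = [], 0.0
    for step, value in enumerate(values, start=1):
        # blend the previous running average with the new value
        running = smooth * running + (1.0 - smooth) * value
        # divide out the bias caused by starting the average at zero
        smoothed.append(running / (1.0 - smooth ** step))
    return smoothed


noisy_losses = [1.0, 0.2, 1.4, 0.6]
print(smooth_series(noisy_losses, smooth=0.5))
```

A value of smooth close to 1 gives a flatter curve that reacts slowly to new values, while a value close to 0 follows the raw series closely.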
Parameters:
  • model_name (str) – name of the model folder. Default: None.
  • root (str) – path to store the collected statistics. Default: None.
  • current_folder (str) – if given, all the stats will be loaded from the given folder. Default: None.
  • print_freq (int) – statistics display frequency. Unit is iteration. Default: 100.
  • num_iters (int) – number of iterations per epoch. If specified, training iteration percentage will be displayed along with epoch. Otherwise, it will be automatically calculated in the first epoch. Default: None.
  • prefix (str) – prefix for the folder name of each run. Default: 'run'.
  • use_visdom (bool) – whether to use Visdom for real-time monitoring. Default: False.
  • use_tensorboard (bool) – whether to use Tensorboard for real-time monitoring. Default: False.
  • send_slack (bool) – whether to send the statistics to Slack chatroom. Default: False.
  • with_git (bool) – whether to retrieve git information. Default: False.
  • kwargs – some miscellaneous options for Visdom and other functions.
path

contains all the runs of model_name.

current_folder

path to the current run.

vis

an instance of Visdom when use_visdom is set to True.

writer

an instance of Tensorboard’s SummaryWriter when use_tensorboard is set to True.

plot_folder

path to the folder containing the collected plots.

file_folder

path to the folder containing the collected files.

image_folder

path to the folder containing the collected images.

hist_folder

path to the folder containing the collected histograms.

clear_hist_stats(key)

removes the collected statistics for histogram plot of the specified key.

Parameters:key – the name of the histogram collection.
Returns:None.
clear_mat_stats(key)

removes the collected statistics for matrix plot of the specified key.

Parameters:key – the name of the matrix collection.
Returns:None.
clear_num_stats(key)

removes the collected statistics for scalar plot of the specified key.

Parameters:key – the name of the scalar collection.
Returns:None.
epoch

returns the current epoch.

Returns:_last_epoch.
hist_stats

returns the collected tensors from beginning.

Returns:_hist_since_beginning.
iter

returns the current iteration.

Returns:_iter.
iter_batch(iterator)

tracks training iteration and returns the item in iterator.

Parameters:iterator – the batch iterator, e.g., enumerate(loader).
Returns:a generator over iterator.

Examples

>>> from neuralnet_pytorch import monitor as mon
>>> mon.print_freq = 1000
>>> data_loader = ...
>>> num_epochs = 10
>>> for epoch in mon.iter_epoch(range(num_epochs)):
...     for idx, data in mon.iter_batch(enumerate(data_loader)):
...         # do something here
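
To make the idea of wrapping an iterator to track progress concrete, here is a minimal sketch of generators that count epochs and iterations while passing items through unchanged. LoopTracker is a hypothetical stand-in written for this example. It assumes the epoch iterator yields integer epoch indices and that an iteration is counted once its loop body has finished.

```python
class LoopTracker:
    """Hypothetical stand-in that counts epochs and iterations."""

    def __init__(self):
        self.epoch = 0
        self.iter = 0

    def iter_epoch(self, iterator):
        for item in iterator:
            self.epoch = item  # assumes items are integer epoch indices
            yield item

    def iter_batch(self, iterator):
        for item in iterator:
            yield item
            self.iter += 1  # runs after the caller's loop body has finished


tracker = LoopTracker()
seen = []
for epoch in tracker.iter_epoch(range(2)):
    for idx, batch in tracker.iter_batch(enumerate(["a", "b", "c"])):
        seen.append((epoch, tracker.iter, batch))
print(tracker.epoch, tracker.iter)
```

Because the counter lives on the tracker rather than in the loop, the iteration count keeps increasing across epochs.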

See also

iter_epoch()

iter_epoch(iterator)

tracks training epoch and returns the item in iterator.

Parameters:iterator – the epoch iterator, e.g., range(num_epochs).
Returns:a generator over iterator.

Examples

>>> from neuralnet_pytorch import monitor as mon
>>> mon.print_freq = 1000
>>> num_epochs = 10
>>> for epoch in mon.iter_epoch(range(mon.epoch, num_epochs)):
...     # do something here

See also

iter_batch()

load(file, method='pickle', version=-1, **kwargs)

loads from the given file.

Parameters:
  • file – name of the saved file without version.
  • method

    str or callable. If callable, it should be a custom method to load the object. There are 3 types of str.

    'pickle': use pickle.load() to load the object.

    'torch': use torch.load() to load the object.

    'txt': use numpy.loadtxt() to load the object.

    Default: 'pickle'.

  • version – the version of the saved file to load. Default: -1 (loads the latest version of the saved file).
  • kwargs – additional keyword arguments to the underlying load function.
Returns:

None.
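
The version=-1 convention (load the latest saved version) can be sketched as follows. The '<file>-<version>' naming scheme and the helper load_version are assumptions made for illustration, since the on-disk naming is not stated here. Unlike the documented return value, this sketch returns the loaded object so that the result can be inspected.

```python
import os
import pickle
import tempfile


def load_version(folder, file, version=-1):
    """Load '<file>-<version>' from folder; version=-1 picks the highest.

    The '<file>-<version>' naming scheme is an assumption of this sketch.
    """
    if version == -1:
        suffixes = [name[len(file) + 1:] for name in os.listdir(folder)
                    if name.startswith(file + "-")]
        versions = [int(s) for s in suffixes if s.isdigit()]
        if not versions:
            raise FileNotFoundError(f"no saved versions of {file!r} in {folder}")
        version = max(versions)
    with open(os.path.join(folder, f"{file}-{version}"), "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as folder:
    # write two versions of the same logical file
    for version, payload in enumerate(["old", "new"], start=1):
        with open(os.path.join(folder, f"stats.pkl-{version}"), "wb") as handle:
            pickle.dump(payload, handle)
    latest = load_version(folder, "stats.pkl")
    oldest = load_version(folder, "stats.pkl", version=1)
print(latest, oldest)
```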

mat_stats

returns the collected matrix statistics from beginning.

Returns:_num_since_beginning.
model_name

returns the name of the model.

Returns:_model_name.
num_stats

returns the collected scalar statistics from beginning.

Returns:_num_since_beginning.
prefix

returns the prefix of saved folders.

Returns:_prefix.
read_log(log)

reads a saved log file.

Parameters:log – name of the log file.
Returns:contents of the log file.
reset()

factory-resets the monitor object. This includes clearing all the collected data, setting the iteration and epoch counters to 0, and resetting the timer.

Returns:None.
run_training(net, solver: torch.optim.optimizer.Optimizer, train_loader, n_epochs: int, closure=None, eval_loader=None, valid_freq=None, start_epoch=None, scheduler=None, scheduler_iter=False, device=None, *args, **kwargs)

Runs the training loop for the given neural network.

Parameters:
  • net – must be an instance of Net and Module.
  • solver – a solver for optimization.
  • train_loader – provides training data for neural net.
  • n_epochs – number of training epochs.
  • closure – a method to calculate loss in each optimization step. Optional.
  • eval_loader – provides validation data for neural net. Optional.
  • valid_freq – indicates how often validation is run. Takes effect only if eval_loader is given.
  • start_epoch – the epoch from which training will continue. If None, training counter will be set to 0.
  • scheduler – a learning rate scheduler. Default: None.
  • scheduler_iter – if True, scheduler will run every iteration. Otherwise, it will step every epoch. Default: False.
  • device – device to perform calculation. Default: None.
  • args – additional arguments that will be passed to neural net.
  • kwargs – additional keyword arguments that will be passed to neural net.
Returns:

None.
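
The effect of scheduler_iter described above can be pictured with a loop skeleton that performs no real training. CountingScheduler and training_skeleton are hypothetical names used only for this sketch. The real loop also handles the solver, closure, validation and device placement, none of which is modelled here.

```python
class CountingScheduler:
    """Hypothetical scheduler that only counts how often step() is called."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def training_skeleton(n_epochs, n_batches, scheduler=None, scheduler_iter=False):
    """Walk the epoch/iteration structure without doing any real training."""
    for _ in range(n_epochs):
        for _ in range(n_batches):
            # the forward/backward pass and the solver step would go here
            if scheduler is not None and scheduler_iter:
                scheduler.step()  # step once per iteration
        if scheduler is not None and not scheduler_iter:
            scheduler.step()  # step once per epoch
    return scheduler


per_epoch = training_skeleton(3, 10, CountingScheduler(), scheduler_iter=False)
per_iter = training_skeleton(3, 10, CountingScheduler(), scheduler_iter=True)
print(per_epoch.steps, per_iter.steps)
```

With 3 epochs of 10 batches each, the per-epoch scheduler steps 3 times and the per-iteration scheduler steps 30 times.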

Examples

import neuralnet_pytorch as nnt
from neuralnet_pytorch import monitor as mon

class MyNet(nnt.Net, nnt.Module):
    ...

    def train_procedure(self, batch, *args, **kwargs):
        loss = ...
        mon.plot('train loss', loss)
        return loss

    def eval_procedure(self, batch, *args, **kwargs):
        pred = ...
        loss = ...
        acc = ...
        mon.plot('eval loss', loss)
        mon.plot('eval accuracy', acc)

# define the network, and training and testing loaders
net = MyNet(...)
train_loader = ...
eval_loader = ...
solver = ...
scheduler = ...

# instantiate a Monitor object
mon.model_name = 'my_net'
mon.print_freq = 100
mon.set_path()

# collect the parameters of the network
def save_checkpoint():
    states = {
        'states': mon.epoch,
        'model_state_dict': net.state_dict(),
        'opt_state_dict': solver.state_dict()
    }
    if scheduler is not None:
        states['scheduler_state_dict'] = scheduler.state_dict()

    mon.dump(name='training.pt', obj=states, type='torch', keep=5)

# save a checkpoint at the end of each epoch and keep only the 5 latest checkpoints
mon.schedule(save_checkpoint, when='end_epoch')
print('Training...')

# run the training loop
mon.run_training(net, solver, train_loader, n_epochs, eval_loader=eval_loader, scheduler=scheduler,
                 valid_freq=val_freq)
print('Training finished!')
schedule(func, when=None, *args, **kwargs)

schedules a routine to be executed during every epoch in run_training().

Parameters:
  • func – a routine to be executed in run_training().
  • when – the moment at which func is executed. Currently, the choices are: 'begin_epoch', 'end_epoch', 'begin_iter', and 'end_iter'. Default: 'begin_epoch'.
  • args – additional arguments to func.
  • kwargs – additional keyword arguments to func.
Returns:

None
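
A scheduling mechanism of this kind can be sketched as a small hook registry keyed by the four moments listed above. HookRegistry and its fire method are hypothetical and written only for this example; this is not the library's implementation.

```python
from collections import defaultdict

MOMENTS = ("begin_epoch", "end_epoch", "begin_iter", "end_iter")


class HookRegistry:
    """Hypothetical registry of routines keyed by the moment they run at."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def schedule(self, func, when=None, *args, **kwargs):
        when = "begin_epoch" if when is None else when  # documented default
        if when not in MOMENTS:
            raise ValueError(f"unknown moment: {when!r}")
        self._hooks[when].append((func, args, kwargs))

    def fire(self, when):
        # run every routine registered for this moment, in registration order
        for func, args, kwargs in self._hooks[when]:
            func(*args, **kwargs)


events = []
registry = HookRegistry()
registry.schedule(events.append, "end_epoch", "checkpoint")
registry.schedule(events.append, None, "start")

for epoch in range(2):
    registry.fire("begin_epoch")
    for _ in range(2):
        registry.fire("begin_iter")
        registry.fire("end_iter")
    registry.fire("end_epoch")
print(events)
```

Storing args and kwargs alongside func lets the same routine be registered several times with different arguments.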