monitor – Monitor Training of Neural Networks
- class neuralnet_pytorch.monitor.Monitor(model_name=None, root=None, current_folder=None, print_freq=100, num_iters=None, prefix='run', use_visdom=False, use_tensorboard=False, send_slack=False, with_git=False, **kwargs)
  Collects statistics and displays the results using various backends. The collected stats are stored in ‘<root>/<model_name>/<prefix><#id>’, where #id is automatically assigned each time a new run starts.
Examples
The following snippet shows how to plot smoothed training losses and save images from the current iteration, and then display them every 100 iterations.
    from neuralnet_pytorch import monitor as mon

    mon.model_name = 'foo-model'
    mon.set_path()
    mon.print_freq = 100

    ...
    for epoch in mon.iter_epoch(range(n_epochs)):
        for data in mon.iter_batch(data_loader):
            loss = net(data)
            mon.plot('training loss', loss, smooth=.99, filter_outliers=True)
            mon.imwrite('input images', data['images'], latest_only=True)
    ...
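The `smooth=.99` argument in the snippet above suggests a running average over the plotted values. The source does not say which scheme `plot` uses, so the following is only a rough sketch assuming an exponential moving average. The function name `ema_smooth` is hypothetical and is not part of neuralnet_pytorch.

```python
def ema_smooth(values, smooth=0.99):
    """Exponentially smooth a sequence of scalars.

    Assumption: each output is smooth * previous_output + (1 - smooth) * value,
    seeded with the first value. This is a generic scheme and not necessarily
    the one used by neuralnet_pytorch.
    """
    smoothed = []
    running = None
    for value in values:
        if running is None:
            running = value  # seed the average with the first observation
        else:
            running = smooth * running + (1.0 - smooth) * value
        smoothed.append(running)
    return smoothed


losses = [4.0, 2.0, 2.0, 0.0]
print(ema_smooth(losses, smooth=0.5))  # [4.0, 3.0, 2.5, 1.25]
```

A value of `smooth` close to 1 gives a curve that reacts slowly to new losses, while a value close to 0 follows the raw values almost exactly.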
Parameters:
- model_name (str) – name of the model folder. Default: None.
- root (str) – path to store the collected statistics. Default: None.
- current_folder (str) – if given, all the stats will be loaded from the given folder. Default: None.
- print_freq (int) – statistics display frequency, in iterations. Default: 100.
- num_iters (int) – number of iterations per epoch. If specified, the training iteration percentage will be displayed along with the epoch. Otherwise, it will be calculated automatically in the first epoch. Default: None.
- prefix (str) – prefix for the folder name of each run. Default: 'run'.
- use_visdom (bool) – whether to use Visdom for real-time monitoring. Default: False.
- use_tensorboard (bool) – whether to use Tensorboard for real-time monitoring. Default: False.
- send_slack (bool) – whether to send the statistics to a Slack chatroom. Default: False.
- with_git (bool) – whether to retrieve git information. Default: False.
- kwargs – miscellaneous options for Visdom and other functions.
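The storage layout ‘<root>/<model_name>/<prefix><#id>’ described above can be illustrated with a small standard-library sketch. It assumes that ids are positive integers and that a new run takes the largest existing id plus one; the source only says the id is assigned automatically, so the actual numbering may differ. The helper `next_run_folder` is hypothetical and is not a neuralnet_pytorch function.

```python
import os
import re
import tempfile


def next_run_folder(root, model_name, prefix='run'):
    """Create and return '<root>/<model_name>/<prefix><id>' with a fresh id.

    Assumption: ids are positive integers and a new run takes the largest
    existing id plus one. The library's actual numbering may differ.
    """
    model_dir = os.path.join(root, model_name)
    os.makedirs(model_dir, exist_ok=True)

    # Collect the ids of the run folders that already exist for this prefix.
    pattern = re.compile(r'^' + re.escape(prefix) + r'(\d+)$')
    existing = []
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(model_dir, name)):
            existing.append(int(match.group(1)))

    run_dir = os.path.join(model_dir, f'{prefix}{max(existing, default=0) + 1}')
    os.makedirs(run_dir)
    return run_dir


with tempfile.TemporaryDirectory() as root:
    first = next_run_folder(root, 'foo-model')
    second = next_run_folder(root, 'foo-model')
    print(os.path.basename(first), os.path.basename(second))  # run1 run2
```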
- path
  contains all the runs of model_name.
- current_folder
  path to the current run.
- vis
  an instance of Visdom when use_visdom is set to True.
- writer
  an instance of Tensorboard’s SummaryWriter when use_tensorboard is set to True.
- plot_folder
  path to the folder containing the collected plots.
- file_folder
  path to the folder containing the collected files.
- image_folder
  path to the folder containing the collected images.
- hist_folder
  path to the folder containing the collected histograms.
- clear_hist_stats(key)
  removes the collected statistics for the histogram plot of the specified key.
  Parameters: key – the name of the histogram collection.
  Returns: None.
- clear_mat_stats(key)
  removes the collected statistics for the matrix plot of the specified key.
  Parameters: key – the name of the matrix collection.
  Returns: None.
- clear_num_stats(key)
  removes the collected statistics for the scalar plot of the specified key.
  Parameters: key – the name of the scalar collection.
  Returns: None.
- epoch
  returns the current epoch.
  Returns: _last_epoch.
- hist_stats
  returns the collected tensors from the beginning.
  Returns: _hist_since_beginning.
- iter
  returns the current iteration.
  Returns: _iter.
- iter_batch(iterator)
  tracks training iteration and returns the item in iterator.
  Parameters: iterator – the batch iterator, e.g., enumerate(loader).
  Returns: a generator over iterator.
  Examples
    >>> from neuralnet_pytorch import monitor as mon
    >>> mon.print_freq = 1000
    >>> data_loader = ...
    >>> num_epochs = 10
    >>> for epoch in mon.iter_epoch(range(num_epochs)):
    ...     for idx, data in mon.iter_batch(enumerate(data_loader)):
    ...         pass  # do something here
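To make the idea of "tracking" an iterator concrete, here is a minimal sketch of a generator that counts batches as they are consumed and notes when a report would be due. It is not the library's implementation. The class `IterationTracker` and its `reports` list are hypothetical stand-ins, and the assumption that a report is due every `print_freq` iterations is taken from the parameter description only.

```python
class IterationTracker:
    """Counts batches as they are consumed from a wrapped iterator."""

    def __init__(self, print_freq=100):
        self.print_freq = print_freq
        self.iter = 0      # number of batches fully processed so far
        self.reports = []  # iterations at which a report would be displayed

    def iter_batch(self, iterator):
        for item in iterator:
            yield item
            # Control returns here once the caller's loop body has finished
            # with the current batch and asks for the next one.
            self.iter += 1
            if self.iter % self.print_freq == 0:
                self.reports.append(self.iter)


tracker = IterationTracker(print_freq=2)
for idx, data in tracker.iter_batch(enumerate(['a', 'b', 'c', 'd', 'e'])):
    pass  # a training step would go here
print(tracker.iter, tracker.reports)  # 5 [2, 4]
```

Because the counter is updated after the `yield`, a loop that exits early with `break` does not count the batch it stopped on.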
- iter_epoch(iterator)
  tracks training epoch and returns the item in iterator.
  Parameters: iterator – the epoch iterator, e.g., range(num_epochs).
  Returns: a generator over iterator.
  Examples
    >>> from neuralnet_pytorch import monitor as mon
    >>> mon.print_freq = 1000
    >>> num_epochs = 10
    >>> for epoch in mon.iter_epoch(range(mon.epoch, num_epochs)):
    ...     pass  # do something here
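Starting the range at the recorded epoch, as in the example above, is what lets an interrupted run pick up where it stopped. The following sketch shows that pattern with a hypothetical `EpochTracker` class. It assumes the tracker simply records the epoch value it last yielded, which may differ from how the library maintains its counter.

```python
class EpochTracker:
    """Remembers which epoch a loop has reached (hypothetical stand-in)."""

    def __init__(self, start_epoch=0):
        self.epoch = start_epoch

    def iter_epoch(self, iterator):
        for epoch in iterator:
            self.epoch = epoch  # record the epoch currently being run
            yield epoch


tracker = EpochTracker()
visited = []

# First session: stop early, as if training had been interrupted.
for epoch in tracker.iter_epoch(range(tracker.epoch, 5)):
    if epoch == 2:
        break
    visited.append(epoch)

# Second session: restart from the recorded epoch instead of from 0.
for epoch in tracker.iter_epoch(range(tracker.epoch, 5)):
    visited.append(epoch)

print(visited)  # [0, 1, 2, 3, 4]
```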
- load(file, method='pickle', version=-1, **kwargs)
  loads from the given file.
  Parameters:
  - file – name of the saved file without version.
  - method – str or callable. If callable, it should be a custom method to load the object. There are 3 types of str. 'pickle': use pickle.load() to load the object. 'torch': use torch.load() to load the object. 'txt': use numpy.loadtxt() to load the object. Default: 'pickle'.
  - version – the version of the saved file to load. Default: -1 (loads the latest version of the saved file).
  - kwargs – additional keyword arguments to the underlying load function.
  Returns: None.
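The `version=-1` convention ("load the latest saved version") can be sketched with pickle and the standard library. The on-disk naming scheme used here, '<file>-<n>' with an integer n, is invented for the example, since the source does not describe how versions are stored. The helper `load_versioned` is hypothetical, and unlike the documented return value it returns the loaded object so that the result can be inspected.

```python
import os
import pickle
import re
import tempfile


def load_versioned(folder, file, version=-1):
    """Load '<file>-<version>' from folder; version=-1 picks the highest one.

    Assumption: versions are stored as '<file>-<n>' with integer n. This
    naming scheme is invented for the example.
    """
    if version == -1:
        pattern = re.compile(r'^' + re.escape(file) + r'-(\d+)$')
        matches = (pattern.match(name) for name in os.listdir(folder))
        versions = [int(m.group(1)) for m in matches if m]
        if not versions:
            raise FileNotFoundError(f'no saved version of {file!r} in {folder!r}')
        version = max(versions)
    with open(os.path.join(folder, f'{file}-{version}'), 'rb') as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as folder:
    # Write three versions of the same logical file.
    for n, obj in enumerate([{'epoch': 0}, {'epoch': 1}, {'epoch': 2}]):
        with open(os.path.join(folder, f'training.pkl-{n}'), 'wb') as f:
            pickle.dump(obj, f)
    latest = load_versioned(folder, 'training.pkl')
    first = load_versioned(folder, 'training.pkl', version=0)
    print(latest, first)  # {'epoch': 2} {'epoch': 0}
```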
- mat_stats
  returns the collected scalar statistics from the beginning.
  Returns: _num_since_beginning.
- model_name
  returns the name of the model.
  Returns: _model_name.
- num_stats
  returns the collected scalar statistics from the beginning.
  Returns: _num_since_beginning.
- prefix
  returns the prefix of saved folders.
  Returns: _prefix.
- read_log(log)
  reads a saved log file.
  Parameters: log – name of the log file.
  Returns: contents of the log file.
- reset()
  factory-resets the monitor object. This includes clearing all the collected data, setting the iteration and epoch counters to 0, and resetting the timer.
  Returns: None.
- run_training(net, solver: torch.optim.optimizer.Optimizer, train_loader, n_epochs: int, closure=None, eval_loader=None, valid_freq=None, start_epoch=None, scheduler=None, scheduler_iter=False, device=None, *args, **kwargs)
  Runs the training loop for the given neural network.
  Parameters:
  - net – must be an instance of Net and Module.
  - solver – a solver for optimization.
  - train_loader – provides training data for the neural net.
  - n_epochs – number of training epochs.
  - closure – a method to calculate the loss in each optimization step. Optional.
  - eval_loader – provides validation data for the neural net. Optional.
  - valid_freq – indicates how often validation is run. Takes effect only if eval_loader is given.
  - start_epoch – the epoch from which training will continue. If None, the training counter will be set to 0.
  - scheduler – a learning rate scheduler. Default: None.
  - scheduler_iter – if True, the scheduler will step every iteration. Otherwise, it will step every epoch. Default: False.
  - device – device on which to perform the calculation. Default: None.
  - args – additional arguments that will be passed to the neural net.
  - kwargs – additional keyword arguments that will be passed to the neural net.
  Returns: None.
  Examples
    import neuralnet_pytorch as nnt
    from neuralnet_pytorch import monitor as mon


    class MyNet(nnt.Net, nnt.Module):
        ...

        def train_procedure(batch, *args, **kwargs):
            loss = ...
            mon.plot('train loss', loss)
            return loss

        def eval_procedure(batch, *args, **kwargs):
            pred = ...
            loss = ...
            acc = ...
            mon.plot('eval loss', loss)
            mon.plot('eval accuracy', acc)


    # define the network, and training and testing loaders
    net = MyNet(...)
    train_loader = ...
    eval_loader = ...
    solver = ...
    scheduler = ...

    # instantiate a Monitor object
    mon.model_name = 'my_net'
    mon.print_freq = 100
    mon.set_path()


    # collect the parameters of the network
    def save_checkpoint():
        states = {
            'states': mon.epoch,
            'model_state_dict': net.state_dict(),
            'opt_state_dict': solver.state_dict()
        }
        if scheduler is not None:
            states['scheduler_state_dict'] = scheduler.state_dict()

        mon.dump(name='training.pt', obj=states, type='torch', keep=5)


    # save a checkpoint after each epoch and keep only the 5 latest checkpoints
    mon.schedule(save_checkpoint)
    print('Training...')

    # run the training loop
    mon.run_training(net, solver, train_loader, n_epochs, eval_loader=eval_loader,
                     scheduler=scheduler, valid_freq=val_freq)
    print('Training finished!')
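The difference between stepping the scheduler every iteration and every epoch (the scheduler_iter option) is easiest to see by counting calls. The sketch below uses a toy loop and a stand-in scheduler that only counts `step()` calls. Both `CountingScheduler` and `toy_training_loop` are hypothetical, and the loop is a simplified illustration rather than the body of `run_training`.

```python
class CountingScheduler:
    """Stand-in for a learning rate scheduler; only counts step() calls."""

    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1


def toy_training_loop(n_epochs, n_batches, scheduler=None, scheduler_iter=False):
    """Step the scheduler per iteration or per epoch, depending on the flag."""
    for _ in range(n_epochs):
        for _ in range(n_batches):
            # a forward/backward pass and a solver step would go here
            if scheduler is not None and scheduler_iter:
                scheduler.step()
        if scheduler is not None and not scheduler_iter:
            scheduler.step()
    return scheduler


per_epoch = toy_training_loop(3, 4, CountingScheduler(), scheduler_iter=False)
per_iter = toy_training_loop(3, 4, CountingScheduler(), scheduler_iter=True)
print(per_epoch.steps, per_iter.steps)  # 3 12
```

With 3 epochs of 4 batches each, the per-epoch setting steps the scheduler 3 times and the per-iteration setting steps it 12 times, so a schedule designed for one setting needs its step counts rescaled before being used with the other.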
schedule
(func, when=None, *args, **kwargs)[source]¶ uses to schedule a routine during every epoch in
run_training()
.Parameters: - func – a routine to be executed in
run_training()
. - when – the moment when the
func
is executed. For the moment, choices are:'begin_epoch'
,'end_epoch'
,'begin_iter'
, and'end_iter'
. Default:'begin_epoch'
. - args – additional arguments to func.
- kwargs – additional keyword arguments to func.
Returns: None
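The four moments listed for `when` map naturally onto a small hook registry. The following is a minimal sketch of that pattern, assuming that `when=None` falls back to the documented default 'begin_epoch'. The class `HookRegistry` and its `fire` method are hypothetical and do not reflect how neuralnet_pytorch stores or invokes scheduled routines.

```python
from collections import defaultdict

VALID_MOMENTS = ('begin_epoch', 'end_epoch', 'begin_iter', 'end_iter')


class HookRegistry:
    """Stores routines per moment and runs them on request."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def schedule(self, func, when=None, *args, **kwargs):
        # Assumption: None means the documented default, 'begin_epoch'.
        when = 'begin_epoch' if when is None else when
        if when not in VALID_MOMENTS:
            raise ValueError(f'unknown moment: {when!r}')
        self._hooks[when].append((func, args, kwargs))

    def fire(self, when):
        for func, args, kwargs in self._hooks[when]:
            func(*args, **kwargs)


events = []
hooks = HookRegistry()
hooks.schedule(events.append, 'begin_epoch', 'epoch start')
hooks.schedule(events.append, 'end_iter', 'iter done')

# A toy loop with 2 epochs of 2 iterations each, firing hooks at each moment.
for epoch in range(2):
    hooks.fire('begin_epoch')
    for batch in range(2):
        hooks.fire('begin_iter')
        hooks.fire('end_iter')
    hooks.fire('end_epoch')

print(events)
```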