Note: This article requires some basic understanding of neural networks. For readers new to the topic, we recommend this video.
It is well known to neural network practitioners that the training phase plays a key role in making the network a reliable tool. Over the years, many training techniques have been developed that achieve good convergence, in the sense that the network output tends towards the real target values.
In this article we shall try to clarify the difference between two approaches that can be applied, as judged appropriate, to the best-known training techniques (back-propagation, conjugate gradient, etc.).
The main difference between a batch approach and an online approach is that, in the latter, the samples of the training set are acquired incrementally during the training process.
The use of the batch method, in which the entire set of samples is available before training begins, remains a good basis for understanding what it means to train a neural network.
Suppose we want to train the network with backpropagation, which can be seen as a broader, less rigid version of the gradient method and is well suited to defining training algorithms for multilayer networks.
The next step is to choose how to carry out the training: do we want to update the weights and biases, at each step, using information from all the samples of the training set (batch), or do we want to update them taking into account only a single sample (online)?
It is quite clear that if the entire training set is available then we can start training with a batch method. Conversely, online methods can be used for real-time training, for instance when the samples are acquired one at a time during the training process.
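To make the distinction concrete, here is a minimal sketch (on an illustrative toy linear model with a squared error, not a model taken from this article) of what one epoch looks like under each approach: the batch version applies a single update computed from all the samples, while the online version applies one update per sample.

```python
# Minimal sketch: batch vs. online updates on a toy linear model.
# The model, data and learning rates are illustrative assumptions.
import numpy as np

def gradient(w, x, t):
    """Gradient of the squared error 0.5 * (w.x - t)^2 for one sample."""
    return (w @ x - t) * x

def batch_epoch(w, X, T, lr):
    """Batch: one update per epoch, using the mean gradient over ALL samples."""
    g = np.mean([gradient(w, x, t) for x, t in zip(X, T)], axis=0)
    return w - lr * g

def online_epoch(w, X, T, lr):
    """Online: one update per SAMPLE, applied as each sample is processed."""
    for x, t in zip(X, T):
        w = w - lr * gradient(w, x, t)
    return w

# Toy data: targets generated by a known weight vector plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
T = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

w_batch, w_online = np.zeros(3), np.zeros(3)
for _ in range(50):
    w_batch = batch_epoch(w_batch, X, T, lr=0.1)
    w_online = online_epoch(w_online, X, T, lr=0.01)
```

Both variants converge towards the generating weights here; the only difference is where the gradient information used at each update comes from.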
Furthermore, batch methods are generally considered more efficient, since they guarantee, with less effort, convergence of the error function to stationary points and usually show a good trend towards the objective (target).
What should we do if only a few samples are available?
First of all, we should remember that more samples will have to be acquired, but that training can begin with the few that are available. Online methods require, first of all, computing the error for each individual sample, avoiding the overall error function that batch methods rely on.
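In symbols (generic notation, not formulas taken from the article), batch methods work with the overall error, which is the sum of the per-sample errors, whereas online methods use a single term of that sum at each update:

```latex
E(w) = \sum_{p=1}^{P} E_p(w),
\qquad \text{batch: } w \leftarrow w - \eta \, \nabla E(w),
\qquad \text{online: } w \leftarrow w - \eta \, \nabla E_p(w)
```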
In real-time learning problems, convergence properties can only be characterized in probabilistic terms. However, online methods are often preferred over batch methods for the following reasons:
- less time to compute a solution, as derivatives are calculated for a single sample's error function only;
- if the available training set has very high cardinality, computing the overall error may be impractical;
- if the available training set is made up of equivalent (redundant) sample vectors, the overall error function contains many redundant terms, so evaluating it in full adds little information;
- the online technique implicitly introduces a certain randomness in the order in which samples are chosen; this can help avoid convergence towards undesired local minima (see the sketch after this list).
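A minimal sketch of that last point, assuming the toy model and per-sample gradient of the earlier snippet: presenting the samples in a fresh random order at every epoch is enough to introduce the randomness mentioned above.

```python
# Online epoch with a fresh random ordering of the samples (illustrative).
import numpy as np

def shuffled_online_epoch(w, X, T, lr, rng):
    order = rng.permutation(len(X))              # new random order each epoch
    for i in order:
        w = w - lr * (w @ X[i] - T[i]) * X[i]    # per-sample gradient step
    return w

# e.g.: w = shuffled_online_epoch(np.zeros(3), X, T, 0.01, np.random.default_rng(1))
```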
Recalling what we wrote earlier, namely that the convergence of online methods can only be characterized in probabilistic terms, such methods can be compared (with due caution) to the non-linear regression algorithms known in statistics. Online methods are good stochastic-approximation methods, and convergence can also be obtained in a deterministic setting, but only by imposing appropriate conditions on the step size, which must necessarily tend to zero.
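The "appropriate conditions on the step size" are typically the classical stochastic-approximation (Robbins–Monro) conditions: the steps must sum to infinity, so the iterates can travel arbitrarily far, but their squares must have a finite sum, so the noise dies out:

```latex
\sum_{k=0}^{\infty} \eta_k = \infty, \qquad \sum_{k=0}^{\infty} \eta_k^2 < \infty
```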
In fact, from a deterministic point of view, online methods can be seen as incremental methods, in which each single error function (or its derivatives), associated with the corresponding sample, is computed in turn. The evaluation of the different error functions is thus spread over a sequence of many iterations (an epoch). To establish convergence, however, it is necessary to generate a succession of epochs, that is, to take all the error functions and their derivatives individually (unlike in batch methods).
The main limitation of the online method is precisely that, to ensure convergence without ever evaluating the overall objective function, the step size must be forced a priori to tend to zero according to a predetermined law (a heuristic), while trying to avoid excessively slow (sublinear) convergence.
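One common predetermined law of this kind (a standard heuristic, not one prescribed by this article) is a hyperbolic decay such as

```latex
\eta_k = \frac{\eta_0}{1 + k/\tau}
```

which satisfies the conditions above; a poor choice of the constants, however, is exactly what produces the slow, sublinear convergence mentioned here.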
Ultimately, for training in real time it is possible to adopt a compromise based on mixed online-batch techniques (such as bold-driver methods), in which the online method is used for one or more epochs and the step size is recalculated periodically by evaluating the overall error function.
It is then possible to switch gradually from the online method, which is particularly beneficial in the initial iterations, to the batch method, which ensures faster convergence.
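A minimal sketch of such a compromise, in the spirit of a bold-driver scheme and assuming the toy model of the first snippet: the weights are updated online within each epoch, and the overall error is evaluated once per epoch to adapt the step size (the 1.05 and 0.5 factors are conventional illustrative choices, not values from this article).

```python
# Hybrid online/batch sketch in the spirit of a bold-driver method.
# Toy model and adjustment factors are illustrative assumptions.
import numpy as np

def overall_error(w, X, T):
    return 0.5 * np.mean((X @ w - T) ** 2)

def train_bold_driver(X, T, epochs=100, lr=0.05, up=1.05, down=0.5):
    w = np.zeros(X.shape[1])
    prev_err = overall_error(w, X, T)
    for _ in range(epochs):
        w_old = w.copy()
        for x, t in zip(X, T):                   # one online epoch
            w = w - lr * (w @ x - t) * x
        err = overall_error(w, X, T)             # periodic batch-style evaluation
        if err < prev_err:
            lr, prev_err = lr * up, err          # error fell: take a bolder step
        else:
            w, lr = w_old, lr * down             # error rose: undo epoch, shrink step
    return w

# e.g. with the toy data from the first snippet: w = train_bold_driver(X, T)
```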