Testing Dropout Method for Neural Network

James Jung, Sumved Ravi, Ryan Jakiel

Figure 1. Srivastava’s Comparison of Dropout Network vs Regular Network Classification Errors


From previous work done by Srivastava and his peers [2], it was found that the dropout method applied to neural networks with a node retention probability between 0.4 and 0.8 produced close-to-optimal results across a broad spectrum of datasets. Srivastava’s group then went on to experiment with averaging the results of multiple dropout networks to create a final neural network for test classification. It was found that dropout networks produced a noticeable reduction in classification error across nearly all weight updates, specifically between 0.25% and 0.5%, as seen in Figure 1.

Figure 2. Srivastava’s Dropout Network Classification Error Results per Node Retention Probability


In addition, the node retention probability was varied to see which probability value led to the greatest decrease in classification error. Srivastava’s group found a plateau in percent error between probabilities of 0.5 < p < 0.8. This region of probabilities yields approximately a 0.5% - 2% improvement in classification error compared to other dropout probabilities.

The purpose of this experiment is to validate two findings. First, to show the drop in classification error between dropout networks and regular networks. Second, to validate the concave trend (Figure 2) in classification error as the probability p of node retention is varied.

In order to reproduce these results, we modified the multi-layered perceptron model from homework 3 as follows. The final neural network weights are averaged across 20 thinned neural network maps. This method is repeated across ten testing sets and scored through cross-validation of the given data. The number of hidden units is kept constant per run, as was done in Srivastava’s paper (seen in Figure 2 above, “keeping n fixed,” n being the number of hidden units). For each testing set the node retention probability p was varied over the values [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0]. The tests with a retention probability of 1.0 are effectively a regular neural network with no dropout, and as such are treated as the baseline.
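A minimal NumPy sketch of this procedure is shown below. It is an illustration of the approach described above, not the exact homework 3 code: the function names (train_thinned, average_weights, run_experiment) and the training details (full-batch sigmoid backpropagation, one fixed dropout mask per thinned network) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(n_in=4, n_hidden=4, n_out=1):
    """Small random weights for the 4-4-1 network used in this report."""
    return {"W1": rng.normal(0.0, 0.5, (n_in, n_hidden)),
            "W2": rng.normal(0.0, 0.5, (n_hidden, n_out))}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_thinned(X, y, p, epochs=200, lr=0.1):
    """Train one thinned network whose hidden units are retained with probability p."""
    w = init_weights()
    # One fixed dropout mask per thinned network; dropped units stay off for all epochs.
    mask = (rng.random(w["W1"].shape[1]) < p).astype(float)
    for _ in range(epochs):
        h = sigmoid(X @ w["W1"]) * mask          # forward pass with dropped hidden units
        out = sigmoid(h @ w["W2"])
        delta_out = (out - y.reshape(-1, 1)) * out * (1.0 - out)
        grad_W2 = h.T @ delta_out
        delta_h = (delta_out @ w["W2"].T) * mask * h * (1.0 - h)
        grad_W1 = X.T @ delta_h
        w["W2"] -= lr * grad_W2                  # plain full-batch gradient descent
        w["W1"] -= lr * grad_W1
    return w

def average_weights(ws):
    """Average the weights of several thinned networks into one final network."""
    return {k: np.mean([w[k] for w in ws], axis=0) for k in ws[0]}

def classification_error(w, X, y):
    """%classification error of the averaged network on a held-out fold."""
    pred = sigmoid(sigmoid(X @ w["W1"]) @ w["W2"]) > 0.5
    return 100.0 * np.mean(pred.ravel() != y)

def run_experiment(X, y, ps=(0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0),
                   folds=10, n_thinned=20):
    """For each p: train 20 thinned nets per fold, average them, score the held-out fold."""
    errors = {p: [] for p in ps}
    fold_idx = np.array_split(rng.permutation(len(X)), folds)
    for test_idx in fold_idx:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        for p in ps:
            thinned = [train_thinned(X[train_idx], y[train_idx], p) for _ in range(n_thinned)]
            errors[p].append(classification_error(average_weights(thinned),
                                                  X[test_idx], y[test_idx]))
    return errors
```

Note that p = 1.0 makes every mask all ones, so the same code produces the no-dropout baseline.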


Figure 3. Dropout Network %Classification Error in Comparison to the Baseline p = 1.0

The data was re-formatted to show the increase or decrease in %classification error relative to the baseline of p = 1.0 (no dropout).
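This re-formatting step is a simple difference against the baseline column; a small helper (assuming the {p: [error per fold]} layout produced by the sketch above) could look like:

```python
import numpy as np

def relative_to_baseline(errors, baseline_p=1.0):
    """Express each fold's %classification error as a difference from the p = 1.0 baseline.

    Positive values are worse than the no-dropout baseline, negative values are better.
    """
    base = np.asarray(errors[baseline_p])
    return {p: np.asarray(e) - base for p, e in errors.items()}
```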

Figure 4. Dropout Network %Classification Error Results per Node Retention Probability

The results above show the classification error per cross-validation fold per probability p.

Figure 5. Averaged %Classification Error per Node Retention Probability p

The classification error across all testing sets was averaged per probability p. The results in Figure 5 can be directly compared to Figure 2 above.
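The averaging itself is a single reduction over the folds (again assuming the {p: [error per fold]} layout used above):

```python
import numpy as np

def mean_error_per_p(errors):
    """Average the per-fold %classification errors for each retention probability p."""
    return {p: float(np.mean(e)) for p, e in errors.items()}
```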

The results show that the effect of dropout on network accuracy is inconclusive. The standard neural network maintains a lower %classification error than the dropout networks at every retention probability tested. In fact, the trend shows that as probability p increases, reducing the number of dropped nodes, the accuracy of the network increases. That said, certain aspects of the network used by Srivastava and his team could not be replicated here. In the paper, the number of possible thinned neural networks (weight configurations) being averaged is 2^n, where n is the number of nodes; this increases the number of variations and smooths out the distribution of dropped nodes when averaged. Our network model is deliberately simple, with 4 input nodes, 4 hidden nodes, and 1 output node. This means we have only 2^8 combinations of dropouts to consider, and it also leads to a higher chance of dropping the majority of the nodes in a layer. The model used in the paper is much more complex and robust, built for varying training datasets. This gives it greater variation to account for, and in return better training and strengthening of weights, and it reduces the likelihood of dropping the majority of a layer.
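A quick back-of-the-envelope calculation makes this difference concrete: with n droppable units there are 2^n possible thinned networks, and the chance that a single mask drops most of a small layer is far from negligible. The helper below is hypothetical code written for this discussion, not code from Srivastava's paper.

```python
import math

def p_drop_at_least(n_units, p_retain, frac=0.75):
    """Probability that at least `frac` of a layer's units are dropped by one mask."""
    p_drop = 1.0 - p_retain
    k_min = math.ceil(frac * n_units)
    return sum(math.comb(n_units, k) * p_drop ** k * (1.0 - p_drop) ** (n_units - k)
               for k in range(k_min, n_units + 1))

print(2 ** 8)                        # 256 possible thinned networks for our 8 droppable nodes
print(p_drop_at_least(4, 0.5))       # ~0.31: a 4-unit hidden layer loses 3+ units quite often
print(p_drop_at_least(1024, 0.5))    # effectively 0 for a 1024-unit layer at the same p
```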

References:

[1] B. Ko, H. Kim and H. Choi, "Controlled dropout: A different dropout for improving training speed on deep neural network," 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, 2017, pp. 972-977. doi: 10.1109/SMC.2017.8122736

https://ieeexplore-ieee-org.ezproxy.lib.vt.edu/document/8122736?arnumber=8122736&SID=EBSCO:edseee


[2] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A Simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research 15, pp. 1929–1958, Jun. 2014.

http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf


[3] S. M. Yadav and K. George, "On the Use of Dropouts in Neural Networks for System Identification and Control," 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 2018, pp. 1374-1381. doi: 10.1109/SSCI.2018.8628706

https://ieeexplore-ieee-org.ezproxy.lib.vt.edu/document/8628706?arnumber=8628706&SID=EBSCO:edseee