Gautham Sunder, Carlson School of Management
Title: Hyperparameter Optimization of Deep Neural Networks with Application to Medical Device Manufacturing
Abstract: The prediction performance of Deep Neural Networks (DNNs) is highly sensitive to the choice of hyperparameters. Hyperparameter optimization (HPO), the process of identifying hyperparameter values that maximize model performance, is therefore a critical step in training DNNs. Bayesian Optimization (BO), a class of Response Surface Optimization (RSO) methods for optimizing nonlinear functions, is a commonly adopted strategy for HPO. In this study, we empirically illustrate that the validation loss in HPO problems can, in some cases, be well-approximated by a second-order polynomial function. When this is the case, Classical RSO (C-RSO) methods are demonstrably more efficient than BO at estimating the optimal response, especially under run-size constraints. We propose Compound-RSO, a three-stage batch-sequential RSO strategy for optimizing continuous experimental factors, which estimates the complexity of the response function and chooses between C-RSO and BO accordingly. To estimate the complexity of the unknown response surface, we propose a robust design that is supersaturated for the full polynomial model. Additionally, when the second-order approximation is adequate, we propose Adaptive-RSO, an adaptive experimentation strategy for optimizing the second-order response surface. In simulation studies on test functions of varying complexity and noise levels, we show that Compound-RSO is more efficient than BO when the true response function is second-order and performs comparably to BO when the true response function is more complex. A case study on HPO of DNNs for quality inspection at a medical device manufacturer illustrates the usefulness of the proposed Compound-RSO strategy in a business application.
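To make the second-order idea concrete, the following is a minimal, hypothetical sketch (not the authors' method) of Classical RSO on a quadratic surrogate: a small batch of noisy validation-loss observations over two continuous hyperparameters is fit with a full second-order polynomial by least squares, and the fitted surface's stationary point is taken as the estimated optimum. The loss function, design points, and noise level here are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation loss over two continuous hyperparameters
# (e.g. log learning rate, dropout rate), assumed to be close to
# quadratic near its optimum.
def val_loss(x1, x2):
    return (x1 - 0.3) ** 2 + 2.0 * (x2 + 0.1) ** 2 + 0.4 * x1 * x2

# Small batch design (run-size constrained) with observation noise.
X = rng.uniform(-1.0, 1.0, size=(15, 2))
y = np.array([val_loss(a, b) for a, b in X]) + rng.normal(0.0, 0.01, 15)

# Full second-order model: 1, x1, x2, x1^2, x2^2, x1*x2.
D = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                     X[:, 0] ** 2, X[:, 1] ** 2, X[:, 0] * X[:, 1]])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
b0, b1, b2, b11, b22, b12 = beta

# Stationary point of the fitted quadratic: solve grad = 0,
# i.e. H @ x = -g with H the Hessian and g the linear coefficients.
H = np.array([[2 * b11, b12], [b12, 2 * b22]])
g = np.array([b1, b2])
x_star = np.linalg.solve(H, -g)
print("estimated optimum:", x_star)
```

When the true response really is second-order, a design of this size pins down the six coefficients directly, which is the efficiency advantage over BO that the abstract refers to; when the surface is more complex, the quadratic fit is biased and a BO-style surrogate is the safer choice.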