Learning-to-learn through Model-based Optimization: HPO, NAS, and Distributed Systems


In recent years we have seen rapid progress in developing modern NLP applications, either by building general-purpose systems through training massive language models such as GPT-3 on big data, or by building industrial solutions for specific real-world use cases through composition of pre-made modules. In both cases, a bottleneck developers often face is the effort required to determine the best way to train the model: how to find the optimal hyper-parameter configuration of the model(s), big or small, single or multiple; how to choose the best structure for a single large network or a pipeline of multiple model modules; or even how to dynamically pick the best learning rate and gradient-update transmission/synchronization scheme to achieve the best “Goodput” of training on a cluster. This is a special area of meta-learning that concerns the question of “learning to learn”. However, many existing methods remain rather primitive, including random search, simple line or grid (or hyper-grid) search, and genetic algorithms, which suffer from limitations in optimality, efficiency, scalability, adaptability, and the ability to leverage domain knowledge.

In this talk, we present a learning-to-learn methodology based on model-based optimization (MBO), which uses machine learning models that take actions to gather information and provide recommendations to efficiently improve performance. This approach has several advantages over existing alternatives: 1) it provides adaptive/elastic algorithms that improve performance online; 2) it allows domain knowledge to be incorporated into the models for improved recommendations; 3) it facilitates more data-efficient automatic learning-to-learn, or Auto-ML. We show applications of Auto-ML via MBO to three main tasks: hyper-parameter tuning, neural architecture search, and Goodput optimization in distributed systems. We argue that such applications can improve the productivity and performance of NLP systems across the board.
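
As a rough illustration of the MBO loop described above (fit a model to past evaluations, let it recommend the next configuration, evaluate, repeat), the following is a minimal sketch for tuning a single hyper-parameter. It is not the method presented in the talk: the objective `validation_loss`, the kernel-smoothing surrogate, the distance-based exploration bonus, and every name and default value are assumptions chosen only to keep the example short and self-contained.

```python
import math
import random


def validation_loss(log_lr):
    """Hypothetical stand-in for an expensive training run (minimum at log_lr = -3)."""
    return (log_lr + 3.0) ** 2 + 0.5


def surrogate(x, observed, bandwidth=0.5):
    """Cheap model of the objective: kernel-weighted average of observed losses."""
    weights = [math.exp(-(((x - xi) / bandwidth) ** 2)) for xi, _ in observed]
    total = sum(weights)
    if total < 1e-12:
        # Far from every observation: fall back to the plain mean.
        return sum(yi for _, yi in observed) / len(observed)
    return sum(w * yi for w, (_, yi) in zip(weights, observed)) / total


def acquisition(x, observed, explore=1.0):
    """Lower is better: predicted loss minus a bonus for unexplored regions."""
    nearest = min(abs(x - xi) for xi, _ in observed)
    return surrogate(x, observed) - explore * nearest


def mbo_minimize(objective, low, high, n_init=3, n_iter=20, n_candidates=200, seed=0):
    """Model-based search over one continuous hyper-parameter in [low, high]."""
    rng = random.Random(seed)
    # Start with a few random evaluations so the surrogate has data to fit.
    observed = []
    for _ in range(n_init):
        x = rng.uniform(low, high)
        observed.append((x, objective(x)))
    for _ in range(n_iter):
        # Ask the model which of many random candidates looks most promising,
        # then spend one real evaluation on that recommendation.
        candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
        x_next = min(candidates, key=lambda x: acquisition(x, observed))
        observed.append((x_next, objective(x_next)))
    # Return the best (configuration, loss) pair actually evaluated.
    return min(observed, key=lambda pair: pair[1])


best_log_lr, best_loss = mbo_minimize(validation_loss, low=-6.0, high=0.0)
print(f"best log10(lr) = {best_log_lr:.3f}, loss = {best_loss:.3f}")
```

In this sketch the surrogate is where domain knowledge could enter (for example, a prior over plausible learning rates), and the acquisition rule is what trades off exploiting good regions against gathering new information. A practical system would typically replace both with stronger components such as a Gaussian process and an expected-improvement criterion.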