Conservative Objective Models for Effective Offline Model-Based Optimization

ICML 2021
Brandon Trabucco*, Aviral Kumar*, Xinyang Geng, Sergey Levine
UC Berkeley (* Equal Contribution)

Paper: https://arxiv.org/abs/2107.06882 | Code: https://github.com/brandontrabucco/design-baselines

If you want to devise new offline MBO methods and compare to COMs, consider using our standardized benchmark for offline MBO: https://github.com/brandontrabucco/design-bench, which can be installed with pip install design-bench==2.0.12
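As a loose illustration, loading a benchmark task and querying its dataset looks roughly like the sketch below. The task name and attributes follow the design-bench documentation as we recall it; treat the exact names as assumptions and check the repository for the current API.

    import design_bench

    # Load one of the benchmark tasks (task name assumed from the design-bench docs).
    task = design_bench.make('TFBind8-Exact-v0')

    # Static dataset of prior experiments: designs and their measured scores.
    x = task.x  # designs, e.g. DNA sequences for TFBind8
    y = task.y  # corresponding objective values, shape (n, 1)

    # Score a batch of proposed designs with the task's oracle.
    scores = task.predict(x[:10])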

Abstract

Computational design problems arise in a number of settings, from synthetic biology to computer architectures. In this paper, we aim to solve data-driven model-based optimization (MBO) problems, where the goal is to find a design input that maximizes an unknown objective function provided access to only a static dataset of prior experiments. Such data-driven optimization procedures are the only practical methods in many real-world domains where active data collection is expensive (e.g., when optimizing over proteins) or dangerous (e.g., when optimizing over aircraft designs). Typical methods for MBO that optimize the design against a learned model suffer from distributional shift: it is easy to find a design that "fools" the model into predicting a high value. To overcome this, we propose conservative objective models (COMs), a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs, and uses it for optimization. Structurally, COMs resemble adversarial training methods used to overcome adversarial examples. COMs are simple to implement, and outperform a number of existing methods on a wide range of MBO problems, including optimizing protein sequences, robot morphologies, neural network weights, and superconducting materials.

Conservative Objective Models (COMs)

  • How can we learn to optimize a black-box function using only a static dataset? Given the success of supervised deep learning in producing large, accurate predictive models that generalize well, we might be tempted to simply learn a model of the objective function from the dataset and then optimize this learned model with simple schemes such as gradient ascent, as sketched below.
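For concreteness, here is a minimal PyTorch sketch of this naive approach, assuming a regressor f_model has already been fit to the dataset (f_model, the step count, and the step size are all illustrative, not the released implementation):

    import torch

    def naive_gradient_ascent(f_model, x_init, steps=100, step_size=0.05):
        # Start from existing designs and climb the learned objective.
        x = x_init.clone().requires_grad_(True)
        for _ in range(steps):
            score = f_model(x).sum()                # predicted objective values
            grad, = torch.autograd.grad(score, x)   # d(score) / d(design)
            x = (x + step_size * grad).detach().requires_grad_(True)
        return x.detach()                           # candidate "optimized" designs

As the next point explains, the designs this procedure returns are exactly the inputs where a standard predictive model tends to be wrong.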

  • Is optimizing standard predictive models sufficient to obtain good, optimized designs? It isn't! Errors in the learned function can "fool" the optimizer into producing designs that receive erroneously optimistic values under the learned model but are actually quite poor under the ground-truth function. Check the plot on the right: even a highly accurate predictive model still overestimates the objective value at unseen data points that lie outside the manifold of the training data.

To mitigate this issue and still make use of predictive models, we design conservative objective models (COMs), which learn a conservative model of the objective function in a specific way that avoids these adversarial examples that fool the optimizer:

  • Similar to adversarial training, COMs explicitly mine the adversarial design inputs that fool the optimizer and penalize the predictive model's values on them.

  • This amounts to training with two objectives:

    • Supervised prediction for training the model, and

    • An additional regularizer that minimizes the predicted value on adversarial examples and, optionally, maximizes the predicted value on the observed dataset.

    • These adversarial examples can be found via simple gradient ascent on the partially trained function; a sketch of the resulting training objective is given below.
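Putting the two objectives together, one COMs-style training loss might look like the following minimal PyTorch sketch (alpha, the number of ascent steps, and the step size are illustrative hyperparameters; see the code release for the actual implementation):

    import torch

    def coms_loss(f_model, x_data, y_data, alpha=1.0, ascent_steps=50, step_size=0.05):
        # (1) Supervised prediction: fit the model to the observed dataset.
        mse = ((f_model(x_data) - y_data) ** 2).mean()

        # (2) Mine adversarial designs: run gradient ascent on the current
        #     model, starting from dataset points, to find inputs that
        #     "fool" it into predicting high values.
        x_adv = x_data.clone().requires_grad_(True)
        for _ in range(ascent_steps):
            grad, = torch.autograd.grad(f_model(x_adv).sum(), x_adv)
            x_adv = (x_adv + step_size * grad).detach().requires_grad_(True)
        x_adv = x_adv.detach()

        # (3) Conservatism regularizer: push predictions down on the mined
        #     adversarial designs and up on the dataset designs.
        conservatism = f_model(x_adv).mean() - f_model(x_data).mean()

        return mse + alpha * conservatism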

A summary of the method is provided in the figure on the right.

Empirical Results
Empirically, we evaluate COMs on 7 tasks spanning real-world design problems in robot morphology design, material design, and neural network weight design, as well as design problems with discrete input spaces (such as protein and DNA sequences). The performance of COMs compared to a number of other methods is shown in the tables below. We observe that COMs dominate on tasks with continuous design spaces and perform competitively on discrete spaces, even though they use a simple gradient-ascent optimizer on the learned conservative model.

For questions and comments, please contact Brandon Trabucco (btrabucco@berkeley.edu) and Aviral Kumar (aviralk@berkeley.edu).