Cliqueformer
Model-Based Optimization With Structured Transformers
Jakub Grudzien Kuba | Pieter Abbeel | Sergey Levine
We introduce Cliqueformer, a scalable model for solving offline model-based optimization (MBO) tasks such as protein and hardware design. The model learns the structural properties of the target function by acquiring its functional graphical model (FGM), and scales to large, high-dimensional datasets thanks to its transformer backbone. Together with the novel architecture, our analysis leads to an original form of variational information bottleneck, through which the FGM structure is discovered. Cliqueformer achieves state-of-the-art results on several tasks, including scores of 1.43 (relative to the training-data maximum) on Superconductor and 3.15 on DNA Enhancers k562.
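For reference, a standard variational information bottleneck objective has the form below; the clique-wise KL term is our reading of how such a bottleneck would shape the FGM structure, not the exact objective from the paper.

```latex
\mathcal{L}(\theta) \;=\;
\mathbb{E}_{(x,y)\sim\mathcal{D}}\,\mathbb{E}_{q_\theta(z\mid x)}
\big[\big(f_\theta(z) - y\big)^2\big]
\;+\; \beta \sum_{c \in \mathcal{C}}
D_{\mathrm{KL}}\!\big(q_\theta(z_c \mid x)\,\big\|\,\mathcal{N}(0, I)\big)
```

Here $z_c$ denotes the representation of clique $c$, $f_\theta$ is the predictor over representations, and $\beta$ trades prediction accuracy against compression of each clique toward a standard normal.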
Cliqueformer's architecture and training algorithm address the fundamental questions of MBO:
How can we improve upon the dataset's designs without merely exploiting errors in the predictive model?
Introduce an additive clique decomposition into the model, so that its predictions can be maximized by stitching together the best in-distribution cliques (see the sketch after this list).
How do we ensure that the optimized designs are valid?
Pre-train, with variational training, representations of designs that can be decoded back to valid designs.
What neural mechanisms can empower a single model to attain these properties?
Employ a transformer backbone, whose expressive attention layers let a single model satisfy all of the above criteria.
How do we confine design optimization to in-distribution regions?
Keep the magnitude of the designs' representations small with weight decay (see the optimizer line in the sketch below).
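A minimal PyTorch sketch of these ideas, not the paper's exact architecture: a transformer backbone produces one representation per clique, the target value is predicted as a sum of per-clique heads (the additive clique decomposition), and weight decay keeps representation magnitudes small. All dimensions and hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CliqueformerSketch(nn.Module):
    """Transformer backbone -> per-clique representations -> additive heads."""

    def __init__(self, design_dim, n_cliques, clique_dim, d_model=128):
        super().__init__()
        self.n_cliques, self.clique_dim = n_cliques, clique_dim
        # The design enters as a single token; one learned query token per clique.
        self.embed = nn.Linear(design_dim, d_model)
        self.clique_queries = nn.Parameter(torch.randn(n_cliques, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.to_clique = nn.Linear(d_model, clique_dim)
        # One small head per clique; their outputs are summed.
        self.heads = nn.ModuleList(
            [nn.Linear(clique_dim, 1) for _ in range(n_cliques)]
        )

    def forward(self, x):
        b = x.shape[0]
        tokens = torch.cat(
            [self.embed(x).unsqueeze(1),
             self.clique_queries.expand(b, -1, -1)], dim=1)
        h = self.backbone(tokens)[:, 1:]      # one output token per clique
        z = self.to_clique(h)                 # (B, n_cliques, clique_dim)
        # Additive clique decomposition: prediction is a sum over cliques.
        y_hat = sum(head(z[:, i]) for i, head in enumerate(self.heads))
        return y_hat, z

model = CliqueformerSketch(design_dim=64, n_cliques=8, clique_dim=4)
# Weight decay keeps representation magnitudes small, confining the
# subsequent design optimization to in-distribution regions.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```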
Cliqueformer learns representations of input designs that follow a pre-defined FGM decomposition: the decomposition holds with respect to the representations, over the cliques of the FGM. The cliques are marginally, but not jointly, standard-normal, so new high-value designs can be found by stitching together high-value, in-distribution cliques, as sketched below.
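A sketch of the stitching step, assuming the `CliqueformerSketch` above, pre-computed representations `z_data` of the dataset's designs, and a hypothetical `decoder` that maps representations back to valid designs:

```python
import torch

@torch.no_grad()
def stitch(model, z_data):
    """z_data: (N, n_cliques, clique_dim), one row per dataset design.
    Because each clique is marginally standard-normal, taking clique i from
    whichever design scores highest under head i yields a representation
    whose every clique is individually in-distribution."""
    parts = []
    for i, head in enumerate(model.heads):
        scores = head(z_data[:, i]).squeeze(-1)   # per-design value of clique i
        parts.append(z_data[scores.argmax(), i])  # best in-distribution clique i
    return torch.stack(parts)                     # (n_cliques, clique_dim)

# z_new = stitch(model, z_data)
# x_new = decoder(z_new)   # hypothetical decoder returns a valid design
```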
We achieve state-of-the-art results on latent RBF tasks, sustained across all problem dimensionalities. We also sustain state-of-the-art performance on the remaining tasks, and establish new state-of-the-art results on Superconductor (designing a superconducting material) and DNA k562 (designing a DNA sequence of length 200 with high k562 activity levels).