Wuyang Chen* Zhiding Yu† Zhangyang Wang* Anima Anandkumar†‡
*Texas A&M University †NVIDIA ‡Caltech
Contact: Wuyang Chen (wuyang.chen@tamu.edu) Zhiding Yu (zhidingy@nvidia.com)
Abstract
Training with synthetic images has become increasingly popular because it can provide unlimited labels at low cost. However, models trained on synthetic images often generalize poorly to real data. To remedy such domain gaps, we propose an Automated Synthetic-to-real Generalization (ASG) framework, formulating the problem as lifelong learning with an ImageNet pre-trained model. The method is automated in two respects: 1) it consistently improves generalization during transfer learning, avoiding the difficulty of choosing when to stop training; 2) it automates the complicated tuning of layer-wise learning rates toward better generalization. The core of our work is the intuition that a good synthetically trained model should share similar representations with ImageNet models, and we leverage this intuition as proxy guidance to search layer-wise training schedules through learning-to-optimize. Our method generalizes synthetically trained models to real data without seeing any real data during training. Since it requires no extra training loop beyond synthetic training, it can be conveniently used as a drop-in module in many applications involving synthetic training.
Citation
If you find this work useful in your research, please consider citing the following paper:
@inproceedings{chen2020automated,
title={Automated Synthetic-to-Real Generalization},
author={Chen, Wuyang and Yu, Zhiding and Wang, Zhangyang and Anandkumar, Anima},
booktitle={International Conference on Machine Learning (ICML)},
year={2020}
}
Motivation
Proxy Guidance
Use ImageNet pre-trained representations as proxy guidance to measure syn-to-real generalization without accessing any real data.
Formulated as a learning-without-forgetting problem, with similarity between the synthetically trained model and the ImageNet pre-trained model imposed via KL divergence.
The resulting losses serve as rewards for the subsequent learning-to-optimize (L2O) module, which determines layer-wise learning rates.
Generalizes to other tasks defined on synthetic data (e.g., semantic segmentation).
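The proxy-guidance objective above can be sketched per example as a task loss on the synthetic label plus a KL term that keeps the ImageNet head of the model being trained close to the frozen ImageNet pre-trained model, in the spirit of learning without forgetting. This is a minimal pure-Python sketch under stated assumptions: the function names, `kl_weight`, `temperature`, and the KL direction (frozen model's distribution as the target) are illustrative choices, not the authors' released code. A real implementation would compute these terms on batched tensors in a deep-learning framework.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Cross-entropy of a single example against its integer label."""
    return -math.log(softmax(logits)[label])


def proxy_guided_loss(
    task_logits: Sequence[float],
    label: int,
    imagenet_logits_new: Sequence[float],
    imagenet_logits_frozen: Sequence[float],
    kl_weight: float = 1.0,    # hypothetical weighting, not a value from the paper
    temperature: float = 1.0,  # hypothetical softening temperature
) -> float:
    """Task loss on synthetic data plus a KL proxy term (illustrative sketch).

    imagenet_logits_new: ImageNet-head outputs of the model being trained.
    imagenet_logits_frozen: outputs of the frozen ImageNet pre-trained model
    on the same synthetic image.
    """
    task_loss = cross_entropy(task_logits, label)
    p_frozen = softmax(imagenet_logits_frozen, temperature)
    p_new = softmax(imagenet_logits_new, temperature)
    proxy_loss = kl_divergence(p_frozen, p_new)
    return task_loss + kl_weight * proxy_loss
```

When the trained model's ImageNet predictions match the frozen model's, the KL term vanishes and only the synthetic task loss remains; drifting away from the ImageNet representation is penalized in proportion to `kl_weight`.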
Learning-to-Optimize
Learning rate matters for syn-to-real generalization.
Automated layer-wise learning rate selection via learning-to-optimize.
Observations: training statistics.
Reward at t+1: L_{t} - L_{t+1}.
Actions at t+1: discrete learning rate scale factors [0, 0.1, 0.2, ..., 0.9, 1].
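The reward and action definitions above can be sketched as follows. This is a hypothetical illustration, not the paper's controller: it assumes one discrete action per layer that scales that layer's base learning rate, and it swaps in a plain stateless REINFORCE policy (a per-layer softmax over the 11 scale factors, with no baseline and no use of the training-statistics observations) so that the reward L_t - L_{t+1} has something to update. `LayerwisePolicy`, `scaled_learning_rates`, and the policy step size are names and parameters introduced only for this sketch.

```python
import math
import random
from typing import List

# Discrete action space: learning-rate scale factors 0, 0.1, ..., 0.9, 1.
SCALE_FACTORS: List[float] = [round(0.1 * k, 1) for k in range(11)]


def reward(loss_t: float, loss_t_plus_1: float) -> float:
    """Reward at step t+1: the decrease in loss, L_t - L_{t+1}."""
    return loss_t - loss_t_plus_1


def scaled_learning_rates(base_lrs: List[float], action_indices: List[int]) -> List[float]:
    """Apply each layer's chosen scale factor to that layer's base learning rate."""
    return [lr * SCALE_FACTORS[a] for lr, a in zip(base_lrs, action_indices)]


class LayerwisePolicy:
    """Hypothetical per-layer categorical policy trained with REINFORCE."""

    def __init__(self, num_layers: int, step_size: float = 0.1, seed: int = 0):
        # One row of logits per layer; all-zero logits give a uniform policy.
        self.logits = [[0.0] * len(SCALE_FACTORS) for _ in range(num_layers)]
        self.step_size = step_size
        self.rng = random.Random(seed)

    def probs(self, layer: int) -> List[float]:
        """Softmax over the scale factors for one layer."""
        row = self.logits[layer]
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self) -> List[int]:
        """Sample one scale-factor index per layer."""
        choices = range(len(SCALE_FACTORS))
        return [self.rng.choices(choices, weights=self.probs(i))[0]
                for i in range(len(self.logits))]

    def update(self, actions: List[int], r: float) -> None:
        """REINFORCE ascent: d log pi(a) / d logits = onehot(a) - probs."""
        for layer, a in enumerate(actions):
            p = self.probs(layer)  # computed before modifying this layer's logits
            for k in range(len(p)):
                grad = (1.0 if k == a else 0.0) - p[k]
                self.logits[layer][k] += self.step_size * r * grad
```

In a training loop one would sample actions, train for one interval with the scaled learning rates, then call `policy.update(actions, reward(loss_before, loss_after))`. A positive reward (the loss went down) makes the sampled scale factors more likely for each layer, and a negative reward makes them less likely.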
ASG Improves Domain Generalization
VisDA-17 (Image Classification)
GTA5 -> Cityscapes (Semantic Segmentation)
ASG Improves Domain Adaptation
Improved domain adaptation performance on VisDA-17 as a result of ASG.
t-SNE visualization of learned embeddings. Left: Source-Res101. Right: ASG + CRST.