Efficiently estimating the performance of neural architectures is a major challenge in neural architecture search (NAS). To reduce the architecture training costs in NAS, one-shot estimators (OSEs) amortize these costs by sharing the parameters of one “supernet” between all architectures. Recently, zero-shot estimators (ZSEs) that involve no training have been proposed to further reduce the architecture evaluation cost. Despite the high efficiency of these estimators, the quality of their estimations has not been thoroughly studied. In this paper, we conduct an extensive and organized assessment of OSEs and ZSEs on five NAS benchmarks: NAS-Bench-101/201/301, and NDS ResNet/ResNeXt-A. Specifically, we employ a set of NAS-oriented criteria to study the behavior of OSEs and ZSEs and reveal that they have certain biases and variances. After analyzing how and why the OSE estimations are unsatisfying, we explore how to mitigate the correlation gap of OSEs from several perspectives. Based on our analysis, we offer suggestions for the future application and development of efficient architecture performance estimators. Furthermore, the analysis framework proposed in our work could be used in future research to give a more comprehensive understanding of newly designed architecture performance estimators.
One-Shot Estimators (OSEs) and Zero-Shot Estimators (ZSEs) are two kinds of efficient architecture performance estimation strategies. Despite their widespread use, especially that of OSEs, studies have revealed that they might fail to reflect the true ranking of architectures. In this work, we conduct a more comprehensive study of OSEs in five search spaces with distinct properties, including three topological search spaces (NAS-Bench-101, NAS-Bench-201, and NAS-Bench-301) and two non-topological search spaces (NDS ResNet and NDS ResNeXt-A). We further analyze how and why OSE estimations have bias and variance, and explore how to improve OSEs. We also study various ZSEs on several benchmarks and reveal their properties and weaknesses.
Analysis Framework
Our analysis framework consists of three components: the analysis target, the analysis benchmark, and the analysis aspect. The analysis targets are One-Shot Estimators (OSEs) and Zero-Shot Estimators (ZSEs), two commonly used kinds of efficient performance estimators in the NAS field. The analysis benchmarks are NAS-Bench-101 (NB101), NAS-Bench-201 (NB201), NAS-Bench-301 (NB301), NDS-ResNet and NDS-ResNeXt-A. The analysis aspects are the ranking correlation, the ability to distinguish top/bottom architectures, and the estimation bias and variance. We believe this analysis framework is comprehensive enough to evaluate efficient performance estimators and can be used in future research.
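As a rough illustration of two of the analysis aspects, the sketch below computes a ranking correlation (Kendall's tau) and the precision at the top/bottom K% of architectures. The function names, the toy accuracies, and the exact P@topK definition used here (the overlap between the true top-K% set and the estimated top-K% set, divided by the set size) are assumptions made for this example and may differ in detail from the criteria used in the paper.

```python
from itertools import combinations


def kendall_tau(true_scores, est_scores):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs."""
    n = len(true_scores)
    total = 0
    for i, j in combinations(range(n), 2):
        prod = (true_scores[i] - true_scores[j]) * (est_scores[i] - est_scores[j])
        if prod > 0:
            total += 1  # the pair is ordered the same way by both scores
        elif prod < 0:
            total -= 1  # the pair is ordered differently
    return total / (n * (n - 1) / 2)


def precision_at_top(true_scores, est_scores, ratio):
    """Share of the estimated top-`ratio` architectures that are truly top-`ratio`."""
    n = len(true_scores)
    k = max(1, round(n * ratio))
    top_true = sorted(range(n), key=lambda i: true_scores[i], reverse=True)[:k]
    top_est = sorted(range(n), key=lambda i: est_scores[i], reverse=True)[:k]
    return len(set(top_true) & set(top_est)) / k


def precision_at_bottom(true_scores, est_scores, ratio):
    """Same as precision_at_top, but for the worst architectures."""
    return precision_at_top([-s for s in true_scores], [-s for s in est_scores], ratio)


# Hypothetical data: ground-truth accuracies and estimator scores of 10 architectures.
true_acc = [0.94, 0.93, 0.91, 0.90, 0.88, 0.85, 0.80, 0.70, 0.60, 0.50]
est_score = [0.60, 0.62, 0.58, 0.55, 0.50, 0.45, 0.40, 0.30, 0.20, 0.10]

print("Kendall's tau:", kendall_tau(true_acc, est_score))
print("P@top 10%:    ", precision_at_top(true_acc, est_score, 0.1))
print("P@bottom 10%: ", precision_at_bottom(true_acc, est_score, 0.1))
```

In this toy case the estimator swaps only the two best architectures, so the overall correlation stays high while P@top 10% drops to zero. This is why the framework reports top/bottom precision alongside the ranking correlation.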
OSE Diagnosis & Improvements
Criteria Trend
Ranking quality keeps increasing as one-shot training proceeds. Longer one-shot training helps.
OSEs distinguish bottom architectures better than top architectures (P@top 5% < P@bottom 5%).
Bias Diagnosis
Some architectures need higher sampling probability to match their relative performance in standalone training.
Architectures are sampled from an unfair distribution, in which some architectures have undesirably higher equivalent sampling probabilities.
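One way such unequal equivalent probabilities can arise is sketched below, under the assumption that several encodings describe the same architecture. The toy search space is hypothetical and is not one of the benchmarks: a cell sums the outputs of two parallel branches, so swapping the branches does not change the function. Sampling encodings uniformly then samples some architectures twice as often as others.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical operation set for a toy two-branch cell.
OPS = ("conv1x1", "conv3x3", "skip")


def canonical(encoding):
    """Map an encoding to its equivalence class (branch order does not matter)."""
    return tuple(sorted(encoding))


# All encodings the sampler can draw, each with probability 1 / len(encodings).
encodings = list(product(OPS, repeat=2))

# Number of encodings that collapse onto each distinct architecture.
class_sizes = Counter(canonical(e) for e in encodings)

# Equivalent probability of each distinct architecture under uniform encoding sampling.
equivalent_prob = {
    arch: Fraction(count, len(encodings)) for arch, count in class_sizes.items()
}

# A fairer alternative: sample uniformly over distinct architectures instead.
fair_prob = Fraction(1, len(class_sizes))

for arch, prob in sorted(equivalent_prob.items()):
    print(arch, "over encodings:", prob, "| over architectures:", fair_prob)
```

Here the cells with two different operations receive probability 2/9 while the cells with two identical operations receive 1/9, so the former get more supernet updates. Sampling over distinct architectures removes this particular source of unfairness in the toy setting.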
Variance Diagnosis
Training on subsequently sampled architectures overwrites the previously learned shared weights and thus degrades the accuracy of earlier architectures.
Even with fairly long training (1k epochs), where the mean OS accuracy has already saturated, the ranking stability of top architectures is still not high.
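The temporal variance described above can be inspected with a small script like the following. It is a minimal sketch on made-up numbers: the architecture names, the one-shot accuracies, and the two stability measures (per-architecture standard deviation across checkpoints, and how often the top-ranked architecture changes) are illustrative assumptions rather than the paper's exact protocol.

```python
from statistics import mean, pstdev

# Hypothetical one-shot accuracies of three architectures at three late checkpoints.
os_acc = {
    "arch_a": [0.70, 0.66, 0.71],
    "arch_b": [0.69, 0.70, 0.68],
    "arch_c": [0.50, 0.51, 0.50],
}
num_ckpts = len(next(iter(os_acc.values())))

# Mean one-shot accuracy per checkpoint: nearly flat, i.e. already saturated.
mean_per_ckpt = [mean(acc[t] for acc in os_acc.values()) for t in range(num_ckpts)]

# Temporal standard deviation of each architecture's one-shot accuracy.
temporal_std = {name: pstdev(acc) for name, acc in os_acc.items()}

# Architecture ranked first at each checkpoint, and how often that winner changes.
top_per_ckpt = [max(os_acc, key=lambda name: os_acc[name][t]) for t in range(num_ckpts)]
top_changes = sum(a != b for a, b in zip(top_per_ckpt, top_per_ckpt[1:]))

print("mean OS accuracy per checkpoint:", mean_per_ckpt)
print("temporal std per architecture:  ", temporal_std)
print("top architecture per checkpoint:", top_per_ckpt, "| changes:", top_changes)
```

In this toy data the mean accuracy barely moves between checkpoints, yet the best-ranked architecture flips at every step, which mirrors the observation that saturation of the mean does not imply a stable ranking among top architectures.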
Improvements
Improving stability of OSE estimations to reduce temporal variances.
Improving the fairness of OSE sampling to reduce biases.
Intuitively, reducing the sharing extent of OSE might help.
ZSE Diagnosis
Criteria Comparison
ZSEs cannot surpass the ranking qualities of #FLOPs or #Param.
The best ZSE differs across search spaces.
Architecture-level ZSEs achieve good overall correlation on topological search spaces.
Current ZSEs have poor P@topK.
Bias Diagnosis
Some parameter-level ZSEs prefer linear architectures without skip connections (i.e., they favor gradient explosion).
jacob_cov / relu_logdet prefer small kernel sizes (e.g., 1x1 over 3x3).
@article{ning2021evaluating,
title={Evaluating Efficient Performance Estimators of Neural Architectures},
author={Ning, Xuefei and Tang, Changcheng and Li, Wenshuo and Zhou, Zixuan and Liang, Shuang and Yang, Huazhong and Wang, Yu},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}