One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency. However, previous studies reveal that one-shot performance estimations of architectures might not be well correlated with their performances in stand-alone training because of the excessive sharing of operation parameters (i.e., large sharing extent) between architectures. Thus, recent methods construct even more over-parameterized supernets to reduce the sharing extent. But these improved methods introduce a large number of extra parameters and thus cause an undesirable trade-off between the training costs and the ranking quality. To alleviate the above issues, we propose to apply Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively. Specifically, we train the supernet with a large sharing extent (an easier curriculum) at the beginning and gradually decrease the sharing extent of the supernet (a harder curriculum). To support this training strategy, we design a novel supernet (CLOSENet) that decouples the parameters from operations to realize a flexible sharing scheme and adjustable sharing extent. Extensive experiments demonstrate that CLOSE can obtain a better ranking quality across different computational budget constraints than other one-shot supernets, and is able to discover superior architectures when combined with various search strategies.
To alleviate the severe search inefficiency of traditional NAS algorithms, one-shot NAS shares operation parameters among candidate architectures in a "supernet" and trains this supernet to evaluate all sampled candidate architectures, reducing the overall search cost from thousands of GPU days to only a few GPU hours. Despite this efficiency, previous studies reveal that one-shot NAS suffers from a poor ranking correlation between one-shot estimations and stand-alone estimations, which leads to unfair comparisons between candidate architectures. The excessive sharing of parameters, i.e., the large sharing extent, has been widely regarded as the most important cause of the unsatisfying performance estimation, and our experiments give a more concrete demonstration of how the sharing extent affects the ranking quality. On the one hand, using a smaller sharing extent (more parameters, larger supernet capacity) can alleviate the undesired co-adaptation between architectures and has the potential to achieve higher saturating performances. On the other hand, training the supernet with a larger sharing extent than the vanilla one (fewer parameters, smaller supernet capacity) greatly accelerates the training of the parameters and helps the supernet obtain a good ranking quality faster. Based on these results and analysis, a natural idea to achieve a win-win of supernet training efficiency and high ranking quality is to adapt the sharing extent during the supernet training process.
Method
CLOSENet: A Supernet with An Adjustable Sharing Extent
To enable adaptation of the sharing extent during the training process, we design a novel supernet, CLOSENet, whose sharing extent can be easily adjusted. The key idea behind CLOSENet is to decouple the parameters from the operations, enabling a flexible sharing scheme and an adjustable sharing extent. Specifically, we design the GLobal Operation Weight (GLOW) block to store the parameters, and a GATE module that assigns the proper GLOW block to each operation.
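The decoupling above can be illustrated with a minimal sketch. The class and method names (`GLOWBlock`, `Gate`, `weights_for`) are illustrative assumptions, not the authors' actual API, and a simple deterministic hash stands in for the learned assignment that a real GATE module would compute:

```python
class GLOWBlock:
    """Stores a parameter tensor that is not owned by any single operation."""
    def __init__(self, block_id, param_shape=(8, 8)):
        self.block_id = block_id
        # Stand-in for learnable weights (a real implementation would use
        # a framework tensor, e.g. a torch.nn.Parameter).
        self.weights = [[0.0] * param_shape[1] for _ in range(param_shape[0])]


class Gate:
    """Assigns each (edge, operation) pair to one GLOW block.

    Hypothetical stand-in: a deterministic hash replaces the learned
    assignment described in the paper.
    """
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks

    def assign(self, edge_id, op_id):
        return hash((edge_id, op_id)) % self.num_blocks


class CLOSENet:
    """Supernet sketch: operations fetch parameters through the GATE,
    so the sharing extent is set by the number of GLOW blocks."""
    def __init__(self, num_blocks):
        self.blocks = [GLOWBlock(i) for i in range(num_blocks)]
        self.gate = Gate(num_blocks)

    def weights_for(self, edge_id, op_id):
        # Operations own no parameters; they look them up via the GATE.
        return self.blocks[self.gate.assign(edge_id, op_id)].weights
```

With a single GLOW block, every operation resolves to the same weights (maximal sharing extent); instantiating more blocks lowers the sharing extent without changing the operations themselves.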
CLOSE: Curriculum Learning On Sharing Extent
We borrow the idea of curriculum learning to design a novel supernet training strategy, CLOSE. Specifically, we initialize CLOSENet with only one GLOW block; this large sharing extent lets the supernet train much faster at the beginning. Then, we gradually add GLOW blocks at preset epochs to reduce the sharing extent. In this way, CLOSE not only accelerates the supernet training, but also improves the saturating ranking quality of the supernet. We also propose two techniques, WIT and SRT, to further improve the ranking quality.
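The curriculum can be sketched as a simple epoch-indexed schedule. The milestone epochs below are illustrative assumptions, not the paper's actual settings:

```python
def sharing_schedule(epoch, grow_epochs=(100, 200, 300)):
    """Number of GLOW blocks active at a given epoch.

    Training starts with a single block (largest sharing extent, the
    easier curriculum) and adds one block at each preset milestone,
    gradually decreasing the sharing extent (the harder curriculum).
    The milestones here are placeholder values.
    """
    return 1 + sum(epoch >= e for e in grow_epochs)
```

During training, the supernet would be grown whenever `sharing_schedule(epoch)` exceeds the current number of GLOW blocks, at which point techniques like WIT and SRT would come into play to hand over the learned state.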
Evaluation of Ranking Quality
CLOSENet achieves higher KD and P@top5% scores on all the NAS benchmarks (e.g., NAS-Bench-101, NAS-Bench-201, NDS-ResNet, NDS-ResNeXt-A).
Results show that CLOSE reaches state-of-the-art KDs on all three datasets of NAS-Bench-201.
Evaluation of Search Performance
CLOSE benefits the search process significantly. In particular, it can alleviate the collapse issue of DARTS caused by the improper preference for parameter-free operations (e.g., skip connections) in early training stages.
CLOSE achieves a competitive test error of 2.72% on CIFAR-10, and when transferred to ImageNet, the discovered architecture achieves a low test error of 24.7%.
@article{zhou2022close,
  title={CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS},
  author={Zhou, Zixuan and Ning, Xuefei and Cai, Yi and Han, Jiashu and Deng, Yiping and Dong, Yuhan and Yang, Huazhong and Wang, Yu},
  journal={arXiv preprint arXiv:2207.07868},
  year={2022}
}