The vanilla evaluation strategy trains each candidate architecture separately from scratch and tests it on the validation dataset to obtain its evaluation result, which is extremely computation-intensive.
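The vanilla strategy can be sketched as a simple loop whose cost grows linearly with the number of candidates (the `train_from_scratch` stub and all names below are hypothetical placeholders, not code from any actual NAS system):

```python
import random

def train_from_scratch(arch, steps=100):
    """Stand-in for full stand-alone training: returns a mock validation
    accuracy. A real NAS run would train a network to convergence here,
    which is what makes the vanilla strategy so expensive."""
    random.seed(arch)  # deterministic mock accuracy per architecture
    return random.random(), steps  # (val_accuracy, training steps spent)

def vanilla_evaluate(candidates):
    """Vanilla strategy: every candidate pays the full training cost."""
    total_steps, scores = 0, {}
    for arch in candidates:
        acc, steps = train_from_scratch(arch)
        scores[arch] = acc
        total_steps += steps
    return scores, total_steps

scores, cost = vanilla_evaluate(["arch-a", "arch-b", "arch-c"])
# The total cost scales linearly with the number of candidates evaluated.
```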
Instead of training each architecture from scratch to convergence, some works propose to inherit weights from parent architectures (used in conjunction with local-search or mutation-based search strategies), or to stop training early before convergence. These evaluation strategies still train candidate architectures separately. In contrast, one-shot evaluation strategies amortize the training costs of candidate architectures into the training of a single parameter-sharing supernet or a single parameter-generating hypernet, as shown in the figure below. Since mid-2020, researchers have been exploring a more ambitious question, "can we further reduce the training cost of architecture evaluation to zero?", and have proposed several "zero-shot" evaluation strategies.
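The parameter-sharing idea behind one-shot evaluation can be illustrated with a minimal sketch (all names are hypothetical; the shared weights are random placeholders rather than a trained supernet): every architecture is a path through one shared table of operation parameters, so evaluating a new candidate reuses the supernet's weights instead of training fresh ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared supernet parameters: one weight matrix per candidate operation.
# In a real one-shot NAS run these would be trained jointly; here they are
# random placeholders just to show the sharing mechanics.
OPS = {
    "conv_a": rng.normal(size=(4, 4)),
    "conv_b": rng.normal(size=(4, 4)),
    "skip":   np.eye(4),
}

def forward(arch, x):
    """An 'architecture' is just a sequence of op names; every architecture
    indexes into the same shared OPS table (weight sharing)."""
    for op in arch:
        x = np.maximum(OPS[op] @ x, 0.0)  # shared weights + ReLU
    return x

x = rng.normal(size=4)
y1 = forward(["conv_a", "skip"], x)
y2 = forward(["conv_b", "skip"], x)
# Both candidates are "evaluated" with zero extra training: they reuse OPS.
```

The flip side of this efficiency is exactly the bias studied in the works below: because many architectures pull on the same shared parameters, the scores they receive may rank them differently than stand-alone training would.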
Illustration of the one-shot and zero-shot evaluation strategies.
Despite the high efficiency of one-shot and zero-shot strategies and the wide application of one-shot strategies, their evaluation quality and biases lack a thorough study. Therefore, we conduct a comprehensive "surgery" of these efficient evaluation strategies (one-shot and eight types of zero-shot) on five diverse benchmarks at NeurIPS'21. This work provides a diagnosis toolset, multiple findings, application practices, and research directions, which we hope are valuable for the NAS community.
Following one direction pointed out by this evaluation research, we propose an improved one-shot evaluation strategy with a dynamically decided and curriculum-scheduled sharing extent at ECCV'22.
You are welcome to check out these two works below!
The CLOSE one-shot evaluation strategy learns a more proper parameter-sharing pattern between architectures.
Conducting efficient performance estimations of neural architectures is a major challenge in NAS. To reduce the architecture training costs in NAS, one-shot estimators (OSEs) amortize the architecture training costs by sharing the parameters of one "supernet" between all architectures. Recently, zero-shot estimators (ZSEs) that involve no training have been proposed to further reduce the architecture evaluation cost. Despite the high efficiency of these estimators, the quality of such estimations has not been thoroughly studied. In this paper, we conduct an extensive and organized assessment of OSEs and ZSEs on five NAS benchmarks: NAS-Bench-101/201/301, and NDS ResNet/ResNeXt-A. After analyzing how and why the OSE estimations are unsatisfactory, we explore how to mitigate the correlation gap of OSEs from several perspectives.
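Estimation quality in this kind of assessment is commonly quantified by the ranking correlation between estimator scores and stand-alone accuracies, e.g. Kendall's tau. A minimal tau-a implementation (illustrative; ties and efficiency are ignored for clarity):

```python
def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs.
    Measures how well an estimator's ranking matches ground truth."""
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Perfectly aligned rankings give tau = 1, fully reversed rankings give -1.
print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

A "correlation gap" then simply means the tau between one-shot scores and true accuracies falls well below 1.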
One-shot NAS has been widely used to discover architectures due to its efficiency. However, previous studies reveal that one-shot performance estimations of architectures might not be well correlated with their performances in stand-alone training because of the excessive sharing of operation parameters (i.e., large sharing extent) between architectures. To alleviate this issue, we propose to apply Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively. Also, we design a novel supernet (CLOSENet) that decouples the parameters from operations to realize a flexible sharing scheme and adjustable sharing extent.
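The idea of a curriculum on sharing extent can be sketched schematically as follows (all names and the schedule are hypothetical illustrations, not the actual CLOSENet code): training starts with few parameter groups, i.e. a large sharing extent, and splits them into more groups, i.e. a smaller sharing extent, as training progresses.

```python
def sharing_groups(num_ops, epoch, schedule=(1, 2, 4, 8)):
    """Map each operation index to a parameter-group id. Early epochs use
    few groups (large sharing extent); later epochs use more groups
    (smaller sharing extent)."""
    stage = min(epoch // 10, len(schedule) - 1)  # advance every 10 epochs
    num_groups = min(schedule[stage], num_ops)
    return [op % num_groups for op in range(num_ops)]

# Epoch 0: every operation shares one parameter group (maximal sharing).
print(sharing_groups(8, epoch=0))   # [0, 0, 0, 0, 0, 0, 0, 0]
# Epoch 30: eight groups, so each op here gets its own parameters.
print(sharing_groups(8, epoch=30))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Decoupling parameters from operations, as CLOSENet does, is what makes such a schedule possible: the op-to-group assignment can change during training without discarding the supernet.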