Our Work and Thoughts on Neural Architecture Search
NICS-EFC group, Tsinghua University
This website summarizes our work and thoughts on neural architecture search (NAS) at the NICS-EFC lab, Tsinghua University. We'll first give a brief introduction to NAS, then discuss our work, and finally present our summary and outlook of this field.
NAS Basics
In recent years, with the development of deep learning technology, neural networks have become one of the main methods to realize artificial intelligence.
The performance and efficiency of neural networks are closely related to their architectures. In the early days, neural network architectures were designed by experienced researchers through long periods of trial and error. In recent years, researchers have focused on applying automatic algorithms to discover neural architectures (i.e., neural architecture search).
Neural architecture search (NAS) mainly consists of three components: the search space, the architecture performance estimation strategy (or evaluation strategy), and the search strategy. Specifically, the search space defines the set of all candidate architectures, the evaluation strategy describes how to evaluate the performance of a sampled architecture, and the search strategy defines how to explore the search space and sample architectures for evaluation (a minimal sketch of this loop follows the figure below).
The three key components of a NAS framework [Elsken et al., JMLR 2019].
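To make the interplay of the three components concrete, below is a minimal sketch of the generic NAS loop. The `SearchStrategy`/`EvaluationStrategy` interfaces here are our own hypothetical illustration, not the API of any specific library:

```python
# A minimal sketch of the generic NAS loop. The search_strategy and
# eval_strategy objects are hypothetical illustrations of the two components.

def nas_loop(search_space, search_strategy, eval_strategy, n_iters=100):
    """Explore the search space and return the best architecture found."""
    best_arch, best_score = None, float("-inf")
    for _ in range(n_iters):
        # The search strategy explores the search space and samples a candidate.
        arch = search_strategy.sample(search_space)
        # The evaluation strategy estimates the candidate's performance
        # (e.g., via proxy training, a one-shot supernet, or a zero-shot score).
        score = eval_strategy.estimate(arch)
        # Feed the result back so the search strategy can refine its sampling.
        search_strategy.update(arch, score)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch
```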
The typical workflow of conducting NAS for a specific application scenario (illustrated in the figure below) consists of several steps:
(1) Design or pick a proper search space, evaluation strategy, and search strategy for the given task, hardware, and other constraints (e.g., budget requirements for the NAS process or for the output architecture).
(2) Run the NAS search process. This step is usually the most time-consuming one, and thus receives much research attention on boosting its efficiency. To improve the efficiency of the NAS search process, one can design a more compact search space, improve the sample efficiency of the search strategy, or use a faster proxy evaluation strategy. Check our NAS improvement research and NAS field summary figure for more information.
(3) Derive (decide) the final architecture. Different search strategies have different deriving practices. For example, the final architecture could be picked from a population of architectures according to some criterion (e.g., in evolutionary NAS [Real et al., AAAI 2019]), derived according to a human-crafted discretization rule as in the earliest differentiable NAS work [Liu et al., ICLR 2019], or sampled from an architecture distribution specified by the output of a controller network (e.g., as in most RL-based NAS work [Zoph et al., ICLR 2017][Zoph et al., CVPR 2018]) or by globally trainable parameters (e.g., as in most reparametrized differentiable NAS work [Xie et al., ICLR 2019]). For most discrete search-based NAS methods, the derive step is very straightforward and does not receive much research interest, since these search algorithms themselves usually define the proper way to output a final decision. However, if you'd like to use differentiable NAS methods, you should pay careful attention to the derive process [Wang et al., ICLR 2021], as there exists a gap between the architecture representation in the search process (continuous and relaxed) and the actual discrete architecture (see the sketch after the workflow figure below).
(4) Retrain the final architecture on the task to get the final model. Usually, to make the search process in the 2nd step faster, we don't sufficiently train the candidate architectures. Instead, we evaluate architectures on a proxy task (e.g., on a smaller dataset, trained for fewer epochs) using a proxy model (e.g., with fewer channels or a smaller depth). The evaluation proxy can be even more aggressive and noisy, as in one-shot NAS [Pham et al., ICML 2018] and zero-shot NAS [Abdelfattah et al., ICLR 2021] (BTW, you can check our thorough study [Ning et al., NeurIPS 2021] on these proxy evaluation strategies to get a complete overview of these proxies). Therefore, we cannot directly use the candidate model from the search process as the final model for deployment, and we usually retrain the derived architecture on the task to get the final model. Nevertheless, Once-For-All (OFA) [Cai et al., ICLR 2020] demonstrates the possibility of directly using a subnetwork of a dedicatedly and well-trained one-shot supernet as the final model (i.e., without retraining). The OFA method is very suitable for scenarios where we'd like to tailor different models for one task to different hardware platforms or hardware requirements. On the one hand, in the search process, OFA decouples the supernet training process and the search process, just like some previous one-shot NAS work [Bender et al., ICML 2018][Guo et al., ECCV 2020]; thus, we do not need to retrain the supernet in the 2nd step for each hardware target. On the other hand, OFA utilizes more techniques and spends more resources & time on supernet training to improve the parameter-sharing accuracies of candidate architectures, thus helping get rid of the need to retrain the found architecture for each hardware target (the 4th step).
The "Design -> Search -> Derive -> Retrain" workflow of most NAS application studies.
Our Work
We hope the above introduction has given you a basic notion of what NAS is and how a NAS method is composed. Our group has conducted research on improving NAS efficiency, and has also applied NAS to various efficiency-related and security-related applications.
Improving NAS
It's very expensive to run the earliest NAS-RL method [Zoph et al., ICLR 2017]. It uses an RL-learned RNN controller to sample architectures (the search strategy) and trains each sampled architecture from scratch for T=50 epochs (the evaluation strategy). The NAS-RL process explores N=13k architectures and takes ~48k GPU hours in total on CIFAR-10. Since then, a lot of work has tried to make NAS more efficient. Generally speaking, these works fall into two directions: (1) Decrease N by designing compact search spaces or sample-efficient search strategies, so that fewer architectures need to be evaluated before a good architecture is discovered; (2) Decrease T by designing efficient evaluation strategies that accelerate the evaluation of each architecture while maintaining evaluation quality.
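As a back-of-the-envelope decomposition (assuming a roughly constant per-epoch training cost t_epoch, which is our simplification), the total search cost is approximately:

```latex
\mathrm{Cost}_{\text{NAS}} \;\approx\; N \times T \times t_{\text{epoch}},
\qquad \text{e.g., } 13\,\text{k} \times 50 \approx 650\,\text{k epochs}
\;\Rightarrow\; \frac{48\,\text{k GPU-hours}}{650\,\text{k epochs}} \approx 4.4\ \text{GPU-minutes/epoch}.
```

Direction (1) shrinks N, while direction (2) shrinks the effective T (or replaces per-architecture training with a much cheaper proxy); either way, the product scales down directly.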
For the search strategy, the exploration of the search space should be fast but shouldn't miss high-performance architectures. The evaluation strategy should be efficient while still reflecting the ranking of actual architecture performance. However, these requirements are often contradictory. To address these problems, we conduct several studies on the search strategy and the evaluation strategy, respectively.
Click the two links below for more details.
Applying NAS
Besides improving NAS itself, we also apply NAS to different tasks to discover high-performance architectures. It is often necessary to tailor an appropriate search space and evaluation strategy for efficient application to these specific tasks.
Concretely, we focus on the efficiency and security properties of models. For the former, we aim to discover high-performance and efficient architectures. For the latter, we employ NAS to enhance security properties.
Click the two links below for more details.
NAS Software
Our group develops aw_nas, a NAS framework with various NAS algorithms implemented in a modularized manner, available at https://github.com/walkerning/aw_nas. Currently, aw_nas can be used to reproduce the results of many mainstream NAS algorithms, e.g., ENAS, DARTS, SNAS, FBNet, OFA, and various predictor-based NAS methods. We have applied NAS algorithms to various applications & scenarios with aw_nas, including NAS for classification, detection, text modeling, hardware fault tolerance, adversarial robustness, hardware inference efficiency, and so on.
The overall workflow of aw_nas is illustrated in the figure below. All our research is conducted using this framework. Nevertheless, since aw_nas is mainly a research-oriented framework, there might be bugs and some barriers to getting started. We're sorry about that; if you encounter any problem, please open an issue to help us improve it. All types of contributions are welcome, including new NAS component implementations, new NAS applications, bug fixes, documentation, and so on.
Overview of the NAS Research Field
Development
Since the resurgence work [Zoph et al., ICLR 2017] in ICLR 2017 by Zoph and Le, a lot of research work has emerged in the NAS field. We summarize some representative works in the field, together with some of our own, in the following figure.
A rough summary of the development history. For each "NAS Efficiency Improvement" type of work, we underscore the major component it focuses on (the "controller", i.e., the search strategy, the "evaluation", or the "search space"). For layout convenience, we omit all "et al." and some affiliations. Note that this summary is only meant to present a view of the development process, and is far from covering all important and representative works in the field, due to the limitations of space and of our knowledge.
NAS Efficiency Improvement. We can see continual research efforts to improve NAS efficiency. Early studies designed the one-shot evaluation strategy [Pham et al., ICML 2018, Bender et al., ICML 2018, Liu et al., ICLR 2019] and applied different search strategies to NAS, including RL-based [Zoph et al., ICLR 2017, Zoph et al., CVPR 2018], evolutionary [Real et al., AAAI 2019], gradient-based [Liu et al., ICLR 2019], predictor-based [Luo et al., NIPS 2018] ones, and so on. Along this research direction, recent studies can still be classified by the component they improve: (1) designing novel predictor construction or training methods for a more sample-efficient controller (i.e., search strategy) [Ning et al., ECCV 2020, Ru et al., ICLR 2021, Colin et al., NeurIPS 2021]; (2) improving one-shot evaluation [Zhao et al., ICML 2021, Zhou et al., ECCV 2022] and crafting ultra-fast zero-shot evaluation strategies [Abdelfattah et al., ICLR 2021, Mellor et al., ICML 2021, Lin et al., ICCV 2021]. For a more comprehensive discussion of the current status, application suggestions, and open research problems along this direction, we refer the readers to our two posts.
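To give a flavor of how cheap zero-shot evaluation can be, here is a rough sketch of one of the simplest zero-cost proxies (the "grad_norm" proxy studied in [Abdelfattah et al., ICLR 2021]); this is our illustrative re-implementation, not the authors' code:

```python
import torch
import torch.nn as nn

def grad_norm_score(model, inputs, targets, loss_fn=nn.CrossEntropyLoss()):
    """Score an *untrained* architecture by the gradient norm of one minibatch."""
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Sum the gradient norms over all parameters as the proxy score.
    return sum(p.grad.norm().item() for p in model.parameters()
               if p.grad is not None)

# Usage: a higher score is taken as (noisy) evidence of a better architecture,
# at the cost of a single forward/backward pass instead of full training.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
print(grad_norm_score(model, x, y))
```

Such proxies reduce per-architecture evaluation to one forward/backward pass, which is exactly why they are "ultra-fast", at the price of noisier ranking quality.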
Hardware-Related. Starting in 2018, hardware-aware NAS has attracted a lot of attention. Though we only list two representative works, MNASNet [Tan et al., CVPR 2019] and Once-For-All [Cai et al., ICLR 2020], in the figure, there exists a vast literature on this topic. And starting in late 2019, researchers began to design algorithm-mapping-hardware co-exploration methods [Yang et al., DAC 2020, Lin et al., DAC 2021, Sun et al., DATE 2022] to automatically find a design in the larger, multi-level search space.
Advanced Task Settings. From the perspective of NAS problem settings, architecture evaluation has many coupling factors, such as label requirements, data requirements, and application tasks. Therefore, it is worth exploring whether we can conduct NAS under more advanced task settings or more constrained proxy problem settings than the vanilla fully-supervised single-task setting, e.g., with fewer data [Li et al., NeurIPS 2021], fewer labels [Liu et al., ECCV 2020, Yan et al., NeurIPS 2020, Zhang et al., CVPR 2021], across tasks [Lian et al., ICLR 2020], etc.
Search Space Analysis & Auto Design. The search space design is essential for a NAS method to find well-performing architectures. In early NAS work, researchers embedded a lot of expert knowledge & experience about task and hardware performance into the search space design [Zoph et al., CVPR 2018, Tan et al., CVPR 2019]. Some works [Yang et al., ICLR 2020, Wan et al., ICLR 2022] reveal that the search space design largely influences or limits the usefulness of the NAS process. It is worthwhile to evaluate, explore, and analyze the advantages and pitfalls of current search space designs [Radosavovic et al., ICCV 2019, Yang et al., ICLR 2020, Wan et al., ICLR 2022], discard some prior design constraints properly [Xie et al., ICCV 2019], and extract low-dimensional and interpretable rules in an automated way [Radosavovic et al., CVPR 2020] (see the sketch below). Researchers at FAIR led some interesting research in this direction in 2019. Thereafter, follow-up research interest has mainly focused on the auto-shrinking of the search space [Hu et al., ECCV 2020, Ci et al., ICCV 2021, Chen et al., NeurIPS 2021].
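As one example of such a low-dimensional, interpretable rule, RegNet [Radosavovic et al., CVPR 2020] parameterizes per-block widths with a quantized linear function. The sketch below follows the paper's linear rule u_j = w0 + wa·j, with the quantization details simplified on our side:

```python
import math

# A simplified sketch of RegNet's quantized-linear width rule; the exact
# quantization in the paper differs slightly from this illustration.

def regnet_widths(depth, w0=24, wa=36.0, wm=2.5, q=8):
    """Generate per-block widths from a handful of interpretable parameters."""
    widths = []
    for j in range(depth):
        u_j = w0 + wa * j                        # linear width rule
        s_j = round(math.log(u_j / w0, wm))      # snap to the nearest power of wm
        widths.append(int(round(w0 * wm ** s_j / q)) * q)  # quantize to multiples of q
    return widths

# Example: a 6-block width schedule determined by only four parameters.
print(regnet_widths(depth=6))
```

The appeal of such rules is that a whole (sub-)search space collapses into a few human-readable knobs, which is far easier to analyze and transfer than an unconstrained architecture encoding.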
Field Summary
Below is our figure summary of NAS research, organized into four sub-categories: "task-related", "NAS improvement", "hardware-related", and "AutoML in broader spaces". Due to the importance of benchmark research in pushing the field forward, we also list benchmark studies as a separate sub-category.
Regarding the "AutoML in broader spaces" category, NAS is a sub-field in the larger AutoML field [Hutter et al., Springer 2019, Yao et al., 2019]. The deep learning pipeline is complex and contains a lot of designs crafted according to manual knowledge and experiences (see the below figure). Intuitively, all these factors can be decided automatically as long as we can give a proper search space design. And more ambitiously, can we jump out of the deep learning paradigm and automatically discover things like "back-propagation" or "SGD optimizer", AutoML-Zero [Real et al., ICML 2020] is a very interesting work along this direction.
The broader AutoML space. A lot of factors in the deep learning pipeline can be decided automatically at the algorithm level, including hyper-parameters, the optimizer, the loss design, data augmentation, and, of course, the neural network architecture.
What's Next
Serious disclaimer: The following is only my personal opinion!
As of mid-2022, NAS has become a somewhat standardized workflow and toolset, at least for single-task settings. When a new task or a new model family (such as the Transformer) emerges, after designing an appropriate search space according to domain-specific knowledge, one can pick suitable search and evaluation strategies according to the constraints (e.g., budget requirements for the NAS process or for the output architecture). There might be a need, or an opportunity, to amend the method for better evaluation or faster search on the specific task, but this requires task-by-task analysis. Here, we'd like to share our opinions on more general methodology research directions.
The general methodology of efficient NAS, hardware-aware NAS, and NAS under constrained settings with fewer data and labels is relatively well-explored. Though there certainly exists room for further improvement along these directions, the current methods are already very strong baselines and can satisfy most practical use cases. Below, we list some directions that, in our opinion, need more research.
Mind the Search Space: First, the search space can be extended into much larger ones, bringing larger efficiency challenges that might call for further technique improvements (we summarize three types of extension below). Second, the search space can be broken down into smaller granularity for discovering novel knowledge. Third, how to define a rule space that enables the automatic discovery of compact sub-search-spaces is still an interesting question.
Vertical extension to multi-level search spaces (co-exploration of the compiler, mapping, schedule, hardware architecture, and so on).
Single-agent horizontal extension to joint AutoML over all types of factors in the deep learning pipeline.
Multi-agent horizontal extension to AutoML / DSE under collaborative settings.
Breaking down the granularity, or even going beyond the deep learning paradigm, to discover novel knowledge (AutoML-Zero is a representative and inspiring work).
Automated rule search for sub-search-spaces (RegNet [Radosavovic et al., CVPR 2020] is a pioneering and inspiring work).
Opening the Black Box: NAS and co-exploration problems are black-box optimization problems, since the training pipeline involves complex learning dynamics that we cannot yet fully understand. How to open the black box, by either deductive or inductive analysis, is very interesting.
Understanding the rules (of the black-box-discovered design choices): Which architecture patterns transfer between tasks? Which architecture or cross-layer interplay patterns are good? Can we build interpretation & analysis toolsets that conduct inductive rule extraction (model-based or statistics-based) for design space exploration (DSE)? This would help users interact with, trust, and benefit from the DSE results.
Understanding the dynamics (of the black-box evaluation): Understanding how an architecture decides the learning dynamics is essential for designing better zero-shot evaluation strategies.
Consider Some Dynamicity: Fast transfer & adaptation to dynamic changes in the task, workload, hardware, and so on. We may want the transfer and adaptation to be both data-efficient and computationally efficient.
Application to General Big Models: There is a prominent trend of developing general big models. How much can AutoML techniques benefit this field? Does the "multi-modal and multi-task pre-training at scale" setting bring enormous computational challenges or other unique challenges that call for special method designs (e.g., fast adaptation)?
If you'd like to discuss these directions further and collaborate with us on them, please contact me :D
Contact
Contents by: Xuefei Ning, Junbo Zhao, Zixuan Zhou
Group PI: Prof. Yu Wang, Prof. Huazhong Yang
Thanks to All Co-Authors of Our Work, including Tianchen Zhao, Yin Zheng, Wenshuo Li, Changcheng Tang, Hanbo Sun, Yi Cai, Zhenhua Zhu, Shulin Zeng, Xiangsheng Shi, Enshu Liu, Chenyu Wang, and others.
Lab Affiliation: Department of Electronic Engineering, Tsinghua University, Beijing, China, 100084.
Due to my limited knowledge, there are inevitably some omissions and errors in the contents. If you'd like to provide information for us :D, please contact us at foxdoraame@gmail.com (Xuefei Ning) and yu-wang@tsinghua.edu.cn (Prof. Yu Wang). We also welcome all kinds of discussions about NAS, algorithm-hardware co-exploration, or other efficient deep learning topics.
[AD] Our team is recruiting master students, visiting students, research assistants, and engineers. If you're interested, check the information on our website and contact us!
References
[Snoek et al., NIPS 2012] Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. "Practical bayesian optimization of machine learning algorithms." Advances in neural information processing systems 25 (2012).
[Zoph et al., ICLR 2017] Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." International Conference on Learning Representations. 2017.
[Pham et al., ICML 2018] Pham, Hieu, et al. "Efficient neural architecture search via parameters sharing." International conference on machine learning. PMLR, 2018.
[Bender et al., ICML 2018] Bender, Gabriel, et al. "Understanding and simplifying one-shot architecture search." International conference on machine learning. PMLR, 2018.
[Zoph et al., CVPR 2018] Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
[Kandasamy et al., NIPS 2018] Kandasamy, Kirthevasan, et al. "Neural architecture search with bayesian optimisation and optimal transport." Advances in neural information processing systems 31 (2018).
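[Luo et al., NIPS 2018] Luo, Renqian, et al. "Neural architecture optimization." Advances in neural information processing systems 31 (2018).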
[Elsken et al., JMLR 2019] Elsken, Thomas, Jan Hendrik Metzen, and Frank Hutter. "Neural architecture search: A survey." The Journal of Machine Learning Research 20.1 (2019): 1997-2017.
[Hutter et al., Springer 2019] Hutter, Frank, and Kotthoff, Lars, and Vanschoren, Joaquin, "Automated Machine Learning - Methods, Systems, Challenges", Springer 2019.
[Yao et al., 2019] Yao, Quanming, et al. "Taking human out of learning applications: A survey on automated machine learning." arXiv preprint arXiv:1810.13306 (2018).
[Real et al., AAAI 2019] Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the aaai conference on artificial intelligence. Vol. 33. No. 01. 2019.
[Liu et al., ICLR 2019] Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." International Conference on Learning Representations. 2019.
[Xie et al., ICLR 2019] Xie, Sirui, et al. "SNAS: stochastic neural architecture search." International Conference on Learning Representations. 2019.
[Tan et al., CVPR 2019] Tan, Mingxing, et al. "Mnasnet: Platform-aware neural architecture search for mobile." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[Cubuk et al., CVPR 2019] Cubuk, Ekin D., et al. "Autoaugment: Learning augmentation strategies from data." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[Xie et al., ICCV 2019] Xie, Saining, et al. "Exploring randomly wired neural networks for image recognition." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
[Radosavovic et al., ICCV 2019] Radosavovic, Ilija, et al. "On network design spaces for visual recognition." Proceedings of the IEEE/CVF international conference on computer vision. 2019.
[Cai et al., ICLR 2020] Cai, Han, et al. "Once-for-All: Train One Network and Specialize it for Efficient Deployment." International Conference on Learning Representations. 2020.
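[Lian et al., ICLR 2020] Lian, Dongze, et al. "Towards Fast Adaptation of Neural Architectures with Meta Learning." International Conference on Learning Representations. 2020.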
[Yang et al., ICLR 2020] Yang, Antoine, Pedro M. Esperança, and Fabio M. Carlucci. "NAS evaluation is frustratingly hard." International Conference on Learning Representations. 2020.
[Radosavovic et al., CVPR 2020] Radosavovic, Ilija, et al. "Designing network design spaces." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
[Yang et al., DAC 2020] Yang, Lei, et al. "Co-exploration of neural architectures and heterogeneous asic accelerator designs targeting multiple tasks." 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 2020.
[Real et al., ICML 2020] Real, Esteban, et al. "Automl-zero: Evolving machine learning algorithms from scratch." International Conference on Machine Learning. PMLR, 2020.
[Liu et al., ECCV 2020] Liu, Chenxi, et al. "Are labels necessary for neural architecture search?." European Conference on Computer Vision. Springer, Cham, 2020.
[Ning et al., ECCV 2020] Ning, Xuefei, et al. "A generic graph-based neural architecture encoding scheme for predictor-based nas." European Conference on Computer Vision. Springer, Cham, 2020.
[Guo et al., ECCV 2020] Guo, Zichao, et al. "Single path one-shot neural architecture search with uniform sampling." European conference on computer vision. Springer, Cham, 2020.
[Hu et al., ECCV 2020] Hu, Yiming, et al. "Angle-based search space shrinking for neural architecture search." European Conference on Computer Vision. Springer, Cham, 2020.
[Yan et al., NeurIPS 2020] Yan, Shen, et al. "Does unsupervised architecture representation learning help neural architecture search?." Advances in Neural Information Processing Systems 33 (2020): 12486-12498.
[Xie et al., ACMCS 2021] Xie, Lingxi, et al. "Weight-sharing neural architecture search: A battle to shrink the optimization gap." ACM Computing Surveys (CSUR) 54.9 (2021): 1-37.
[Wang et al., ICLR 2021] Wang, Ruochen, et al. "Rethinking architecture selection in differentiable NAS." International Conference on Learning Representations. 2021.
[Abdelfattah et al., ICLR 2021] Abdelfattah, Mohamed S., et al. "Zero-Cost Proxies for Lightweight NAS." International Conference on Learning Representations. 2021.
[Ru et al., ICLR 2021] Ru, Binxin, et al. "Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels." International Conference on Learning Representations. 2021.
[Li et al., ICLR 2021] Li, Hao, et al. "Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation." International Conference on Learning Representations. 2021.
[Zhang et al., CVPR 2021] Zhang, Xuanyang, et al. "Neural architecture search with random labels." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.
[Lin et al., DAC 2021] Lin, Yujun, Mengtian Yang, and Song Han. "Naas: Neural accelerator architecture search." 2021 58th ACM/IEEE Design Automation Conference (DAC). IEEE, 2021.
[Mellor et al., ICML 2021] Mellor, Joe, et al. "Neural architecture search without training." International Conference on Machine Learning. PMLR, 2021.
[Zhao et al., ICML 2021] Zhao, Yiyang, et al. "Few-shot neural architecture search." International Conference on Machine Learning. PMLR, 2021.
[Ci et al., ICCV 2021] Ci, Yuanzheng, et al. "Evolving search space for neural architecture search." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[Lin et al., ICCV 2021] Lin, Ming, et al. "Zen-nas: A zero-shot nas for high-performance image recognition." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[Ning et al., NeurIPS 2021] Ning, Xuefei, et al. "Evaluating efficient performance estimators of neural architectures." Advances in Neural Information Processing Systems 34 (2021): 12265-12277.
[Colin et al., NeurIPS 2021] White, Colin, et al. "How powerful are performance predictors in neural architecture search?." Advances in Neural Information Processing Systems 34 (2021): 28454-28469.
[Chen et al., NeurIPS 2021] Chen, Minghao, et al. "Searching the Search Space of Vision Transformer." Advances in Neural Information Processing Systems 34 (2021): 8714-8726.
[Li et al., NeurIPS 2021] Li, Yuhong, et al. "Generic neural architecture search via regression." Advances in Neural Information Processing Systems 34 (2021): 20476-20490.
[Wan et al., ICLR 2022] Wan, Xingchen, et al. "On Redundancy and Diversity in Cell-based Neural Architecture Search." International Conference on Learning Representations. 2022.
[Sun et al., DATE 2022] Sun, Hanbo, et al. "Gibbon: efficient co-exploration of NN model and processing-in-memory architecture." 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2022.
[Zhou et al., ECCV 2022] Zhou, Zixuan, et al. "CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS." European Conference on Computer Vision. 2022.