Neural Architecture Search (NAS) is a promising approach to discovering good neural network architectures for given applications. Among the three basic components of a NAS system (search space, search strategy, and evaluation), prior work has mainly focused on developing different search strategies and evaluation methods. Since most previous hardware-aware search space designs target CPUs and GPUs, designing a suitable search space for Deep Neural Network (DNN) accelerators remains a challenge. Moreover, the architectures and compilers of DNN accelerators vary greatly, so it is difficult to obtain a unified and accurate evaluation of DNN latency across different platforms. To address these issues, we propose a black box profiling-based search space tuning method and further improve latency evaluation by introducing a layer adaptive latency correction method. Used as the first stage in our general accelerator-aware NAS pipeline, the proposed methods provide a smaller, dynamic search space with a controllable trade-off between accuracy and latency for DNN accelerators. Experimental results on CIFAR-10 and ImageNet demonstrate that our search space is effective, with up to a 12.7% improvement in accuracy and a 2.2x reduction in latency, and also efficient, reducing search time and GPU memory by up to 4.35x and 6.25x, respectively.
Recently, many researchers have begun to study hardware-aware NAS frameworks at the algorithm level. However, most of these frameworks target mobile CPUs; other hardware platforms, such as Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs), are rarely discussed. Take FBNet as an example: the candidate blocks in its Search Space (SS) are inspired by MobileNet [7], which is specially designed for the mobile CPU platform. Meanwhile, more and more DNN accelerators are being built on FPGAs and ASICs, and their architecture designs and compilation tools differ greatly from one another. Since the SS restricts the upper bound of the network performance obtained by NAS, ignoring the relationship between the SS and the accelerator platform will prevent NAS from obtaining the optimal network. Besides, in most cases we cannot obtain any information about the hardware architecture or the software compiler, making it hard to accurately evaluate the latency of each candidate network on these black box accelerators. To overcome these challenges, we propose a black box profiling-based search space tuning method and a layer adaptive latency correction method to build a general accelerator-aware NAS framework with a controllable trade-off between accuracy and latency.
Method
A General Accelerator-Aware NAS Framework
We propose a general accelerator-aware NAS framework targeted at black box accelerator platforms. Specifically, we introduce a black box search space profiling method to provide an efficient and effective SS. In addition, we apply a layer adaptive latency correction method to obtain a policy-aware latency Look-Up Table (LUT).
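To make the latency-LUT idea concrete, the following is a minimal sketch of a per-layer latency table with layer adaptive correction factors. The class name, the multiplicative form of the correction, and the calibration interface are all illustrative assumptions on our part, not the paper's exact formulation; the point is only that naively summing per-layer table entries can be corrected toward end-to-end measurements on the black box accelerator.

```python
# Hypothetical sketch of a latency LUT with per-layer correction.
# All names and the multiplicative-correction form are assumptions.

class LatencyLUT:
    def __init__(self):
        self.table = {}       # (layer_type, config) -> measured latency (ms)
        self.correction = {}  # layer_type -> multiplicative correction factor

    def record(self, layer_key, latency_ms):
        # Store a profiled per-layer latency entry.
        self.table[layer_key] = latency_ms

    def calibrate(self, layer_type, measured_on_device, summed_from_table):
        # Fit a per-layer-type factor so that summed table entries
        # match the end-to-end latency measured on the accelerator.
        self.correction[layer_type] = measured_on_device / summed_from_table

    def estimate(self, layers):
        # Network latency = sum of corrected per-layer entries.
        total = 0.0
        for layer_type, config in layers:
            base = self.table[(layer_type, config)]
            total += base * self.correction.get(layer_type, 1.0)
        return total
```

For example, if two convolution entries sum to 3.0 ms in the table but the whole network measures 4.5 ms on the device, calibrating gives a correction factor of 1.5, and subsequent estimates scale the table entries accordingly.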
BBSSP: A Black Box Search Space Profiling Method
We first introduce the Search Space Base Networks (SSBNs) and present a cost function for evaluating the SS. Based on the proposed SSBNs and cost function, our search space profiling flow is as follows:
Firstly, we generate the SSBNs for each candidate SS.
Then, we evaluate the accuracy and latency of each candidate network in the basenet pool.
Next, we compute the cost of each base network and select the high-performing ones.
Using the proposed algorithm, we can derive an optimized SS, of which the size is much smaller and controllable.
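The four steps above can be sketched as follows. This is a minimal illustration under our own assumptions: the function names (`build_ssbn`, `measure_accuracy`, `measure_latency`), the linear cost form `-alpha * accuracy + beta * latency`, and the keep-ratio selection are hypothetical stand-ins for the paper's actual SSBN construction and cost function; `alpha` and `beta` play the role of the tunable accuracy/latency trade-off parameters.

```python
# Hypothetical sketch of the BBSSP profiling flow described above.
# The cost form and all callback names are illustrative assumptions.

def cost(accuracy, latency, alpha=1.0, beta=1.0):
    """Lower is better: reward accuracy, penalize latency.
    alpha and beta are the tunable trade-off parameters."""
    return -alpha * accuracy + beta * latency

def profile_search_spaces(candidate_spaces, build_ssbn,
                          measure_accuracy, measure_latency,
                          keep_ratio=0.5, alpha=1.0, beta=1.0):
    # Step 1: build one base network (SSBN) per candidate search space.
    basenet_pool = [(space, build_ssbn(space)) for space in candidate_spaces]

    # Steps 2-3: evaluate each base network and compute its cost.
    scored = []
    for space, net in basenet_pool:
        acc = measure_accuracy(net)   # e.g. proxy-task accuracy
        lat = measure_latency(net)    # measured on the black box accelerator
        scored.append((cost(acc, lat, alpha, beta), space))

    # Step 4: keep the lowest-cost (best) candidate search spaces.
    scored.sort(key=lambda t: t[0])
    n_keep = max(1, int(len(scored) * keep_ratio))
    return [space for _, space in scored[:n_keep]]
```

Raising `beta` relative to `alpha` shifts the surviving search spaces toward lower-latency candidates, which is how the trade-off between accuracy and latency stays controllable.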
Validity of the cost function design: the obtained search space has the lowest cost compared with the baseline, and the balance between accuracy and latency can be adjusted through the user-defined trade-off parameters.
Our search space achieves 1.4% higher accuracy, a 1.9x reduction in latency, and 2.7x and 4x reductions in search time and GPU memory, respectively, on average.
@inproceedings{zeng2020black,
title={Black box search space profiling for accelerator-aware neural architecture search},
author={Zeng, Shulin and Sun, Hanbo and Xing, Yu and Ning, Xuefei and Shan, Yi and Chen, Xiaoming and Wang, Yu and Yang, Huazhong},
booktitle={2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)},
pages={518--523},
year={2020},
organization={IEEE}
}