Memristor-based Processing-In-Memory (PIM) architectures have shown great potential to boost the energy efficiency of Neural Network (NN) computing. Existing work concentrates on hardware architecture design and algorithm-hardware co-optimization, but neglects the non-negligible correlation between NN models and PIM architectures. To ensure high accuracy and energy efficiency, it is important to co-design the NN model and PIM architecture. However, on the one hand, the joint design space of NN models and PIM architectures is enormous, making it difficult to search for optimal results. On the other hand, PIM simulators impose a heavy computational burden and long runtimes on evaluation during the co-exploration process. To address these problems, this paper proposes Gibbon, an efficient co-exploration framework for the NN model and PIM architecture. In Gibbon, we propose an evolutionary search algorithm with adaptive parameter priority, which focuses on the subspace of high-priority parameters and alleviates the problem of the vast co-design space. Besides, we design a Recurrent Neural Network (RNN)-based predictor for accuracy and hardware performance, which substitutes for a large part of the PIM simulator workload and reduces the long simulation time. Experimental results show that Gibbon can find better NN models and PIM architectures than existing studies in only seven GPU hours (an 8.4∼41.3× speedup). At the same time, Gibbon can improve the accuracy of co-design results by up to 10.7% and reduce the energy-delay product (EDP) by up to 6.48× compared with existing work.
Memristor-based Processing-In-Memory (PIM) architectures have shown powerful capabilities in NN computing. PIM architectures can perform in-situ Matrix-Vector Multiplications (MVMs) and reduce weight data movement, improving computing energy efficiency. Existing PIM work mainly focuses on hardware architecture design and algorithm-hardware co-optimization (e.g., pruning and quantization) for given NN models. However, these studies neglect the non-negligible impact of the correlation between NN structure parameters (e.g., kernel size) and PIM architecture design parameters (e.g., crossbar size) on accuracy and hardware performance. Neural Architecture Search (NAS) is an effective way to automatically search for well-performing NN models, and researchers have proposed PIM-oriented NAS methods to automatically explore the joint design space of NN models and PIM architectures. However, these studies suffer from low exploration efficiency and long search times because of the explosive search space expansion and the time-consuming simulation of PIM architectures. To solve these problems, this paper proposes an efficient co-exploration framework for the NN model and PIM architecture, which reduces the search time from hundreds of GPU hours to several GPU hours and generates better search results with higher accuracy and hardware performance.
Method
Gibbon: A Co-Exploration Framework
Our proposed framework, Gibbon, consists of three key parts: the joint search space for NN model and PIM architecture co-exploration (b), the evolutionary search algorithm with adaptive parameter priority (ESAPP) (c), and the RNN-based performance predictor (d). The joint search space contains many search candidates. In each iteration of the search process, ESAPP samples multiple parents and sends them to the RNN-based predictor, which predicts the hardware performance and the accuracy difference. We use these prediction results to filter out ∼95% of the sampled parents. Afterward, ESAPP mutates the selected parents to generate new candidates. Finally, these candidates are evaluated by the accurate but costly PIM simulator, and the evaluation results are used to update the RNN-based predictor and the dynamic priorities in ESAPP.
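The iteration above (sample parents, screen with the cheap predictor, mutate the survivors, simulate the few remaining candidates) can be sketched as a toy loop. Everything here is illustrative: the parameter names, the synthetic "simulator" and "predictor" scoring functions, and the population sizes are assumptions, not the paper's actual search space or code.

```python
import random

random.seed(0)

# Toy joint search space standing in for the NN/PIM co-design space.
SPACE = {
    "kernel_size": [1, 2, 3, 5],
    "out_channels": [16, 32, 64, 128],
    "weight_bits": [4, 6, 8],
    "crossbar_size": [64, 128, 256],
}

def sample_design():
    return {k: random.choice(v) for k, v in SPACE.items()}

def simulate(design):
    # Stand-in for the accurate but costly PIM simulator: a synthetic score.
    return (design["out_channels"] / design["crossbar_size"]
            + design["weight_bits"] / design["kernel_size"])

def predict(design):
    # Stand-in for the cheap RNN-based predictor: a noisy simulator estimate.
    return simulate(design) + random.gauss(0.0, 0.1)

def mutate(design):
    child = dict(design)
    key = random.choice(list(SPACE))
    child[key] = random.choice(SPACE[key])
    return child

def co_explore(iterations=20, parents_per_iter=40, keep_ratio=0.05):
    population = [sample_design() for _ in range(16)]
    best, best_score = None, float("-inf")
    for _ in range(iterations):
        parents = [random.choice(population) for _ in range(parents_per_iter)]
        # Cheap predictor screens the sampled parents; ~95% are filtered out.
        parents.sort(key=predict, reverse=True)
        survivors = parents[: max(1, int(len(parents) * keep_ratio))]
        children = [mutate(p) for p in survivors]
        for child in children:
            score = simulate(child)  # expensive evaluation on few candidates
            if score > best_score:
                best, best_score = child, score
        population.extend(children)
    return best, best_score

best, score = co_explore()
```

The key design point is that the expensive simulator is only called on the small fraction of candidates that survive the predictor filter, which is where the search-time savings come from.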
Evolutionary Search with Adaptive Parameter Priority (ESAPP)
The large search space of PIM-oriented co-exploration poses efficiency challenges for evolutionary search. To tackle this problem, we propose the evolutionary search algorithm with adaptive parameter priority (ESAPP), which assigns a priority to each design parameter and, in each iteration, decides which parameters to mutate according to these priorities. In our experiments, ESAPP reduces the average equivalent search space size from 10^90 to roughly 10^42 during the search.
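A minimal sketch of priority-guided mutation in the spirit of ESAPP: each design parameter carries a priority, mutation picks parameters in proportion to their priorities, and priorities adapt to observed improvements. The parameter names, the toy objective, and the multiplicative update rule are all assumptions for illustration, not the paper's exact algorithm.

```python
import random

random.seed(1)

# Toy design space; names and values are illustrative.
SPACE = {
    "kernel_size": [1, 2, 3, 5],
    "out_channels": [16, 32, 64, 128],
    "weight_bits": [4, 6, 8],
    "crossbar_size": [64, 128, 256],
}
priority = {k: 1.0 for k in SPACE}

def mutate(design):
    # Pick the parameter to mutate with probability proportional to priority.
    keys = list(SPACE)
    key = random.choices(keys, weights=[priority[k] for k in keys], k=1)[0]
    child = dict(design)
    child[key] = random.choice(SPACE[key])
    return child, key

def update_priority(key, improved, lr=0.2, floor=0.1):
    # Raise the priority of parameters whose mutation helped, lower otherwise.
    factor = 1.0 + lr if improved else 1.0 - lr
    priority[key] = max(floor, priority[key] * factor)

# Toy objective and a short adaptive search loop.
def objective(d):
    return d["out_channels"] / d["crossbar_size"] + d["weight_bits"] / d["kernel_size"]

design = {k: v[0] for k, v in SPACE.items()}
score = objective(design)
for _ in range(200):
    child, key = mutate(design)
    improved = objective(child) > score
    update_priority(key, improved)
    if improved:
        design, score = child, objective(child)
```

Over time, mutation effort concentrates on the high-priority subspace, which is the mechanism by which the equivalent search space shrinks.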
RNN-based Predictor
The evaluator assesses the NN accuracy and hardware performance of candidate designs. However, existing evaluators require a long time (∼10 minutes) to simulate a single NN model, which brings huge search costs. To accelerate evaluation, we propose an RNN-based predictor for NN accuracy and PIM performance. The predictor consists of a design embedder, a feature extractor, and a regressor. In addition, the predictor only needs to predict the relative accuracy loss brought by the PIM architecture of a candidate design, which is an easier problem than predicting the absolute accuracy.
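The three-part structure (embedder, feature extractor, regressor) might be sketched as a tiny vanilla RNN in NumPy. All dimensions, the token encoding of a design, and the untrained random weights are assumptions; the paper's actual architecture and training procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyRNNPredictor:
    """Illustrative three-part predictor: an embedder maps each design
    parameter (as an integer token) to a vector, a vanilla RNN extracts a
    feature over the parameter sequence, and a linear regressor outputs
    the predicted relative accuracy loss."""

    def __init__(self, vocab=32, emb=8, hidden=16):
        self.E = rng.normal(0.0, 0.1, (vocab, emb))       # design embedder
        self.Wx = rng.normal(0.0, 0.1, (emb, hidden))     # RNN input weights
        self.Wh = rng.normal(0.0, 0.1, (hidden, hidden))  # RNN recurrence
        self.w = rng.normal(0.0, 0.1, hidden)             # regressor weights
        self.hidden = hidden

    def predict(self, tokens):
        h = np.zeros(self.hidden)
        for t in tokens:                 # feature extractor: scan the sequence
            h = np.tanh(self.E[t] @ self.Wx + h @ self.Wh)
        return float(h @ self.w)         # predicted relative accuracy loss

pred = TinyRNNPredictor()
delta = pred.predict([3, 7, 1, 12])  # one hypothetical encoded design
```

Because a forward pass like this costs microseconds rather than minutes of simulation, it can screen the bulk of candidates cheaply.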
Gibbon can achieve better results with higher co-exploration efficiency
Gibbon can achieve a 0.2∼10.7% accuracy improvement.
Gibbon with area optimization can achieve a 2.51× area reduction.
Gibbon with EDP optimization can achieve a 6.48× EDP reduction.
Gibbon needs only seven GPU hours for co-exploration, an 8.4∼41.3× search efficiency improvement.
Insights
We make the following observations, hoping to provide design suggestions for the future co-design of NN models and PIM architectures:
Most convolution layers tend to select even-sized kernels to reduce area and EDP.
Group convolution can reduce the amount of computation.
The shallower and deeper layers tend to have larger output channel numbers.
The deeper convolution layers tend to choose higher quantization bitwidths for weights.
For CIFAR-10, the accuracy of 8-bit ADCs is close to that of 10-bit ADCs.
Low-latency models tend to choose smaller output channel numbers in shallow layers.
The energy-optimal PIM design tends to choose a large crossbar size.
@inproceedings{sun2022gibbon,
title={Gibbon: efficient co-exploration of NN model and processing-in-memory architecture},
author={Sun, Hanbo and Wang, Chenyu and Zhu, Zhenhua and Ning, Xuefei and Dai, Guohao and Yang, Huazhong and Wang, Yu},
booktitle={2022 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)},
pages={867--872},
year={2022},
organization={IEEE}
}