Applications for Efficiency

For NAS applications targeting efficiency, we introduce three of our works:

  1. DSA is a differentiable pruning method that targets a given resource budget.

  2. BBSSP is a very simple search space tuning method that decides a compact search space for a given hardware accelerator.

  3. Gibbon is a framework that conducts cross-layer co-exploration of the NN architecture, bitwidths, and hardware configurations for PIM-based NN accelerators.

Research List

Budgeted pruning is the problem of pruning a network under resource constraints. Its key question is how to distribute the resources across layers, i.e., sparsity allocation. Traditional methods answer it by discretely searching for the layer-wise pruning ratios, which is inefficient. In this paper, we propose Differentiable Sparsity Allocation (DSA), an efficient end-to-end budgeted pruning flow. Using a novel differentiable pruning process, DSA finds the layer-wise pruning ratios with gradient-based optimization. It allocates sparsity in continuous space, which is more efficient than methods based on discrete evaluation and search. Furthermore, DSA can work in a pruning-from-scratch manner, whereas traditional budgeted pruning methods require pre-trained models.
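To make the differentiable pruning idea concrete, here is a minimal sketch, not the paper's actual implementation: each layer's keep ratio is parameterized by a learnable logit, channels are gated softly by their normalized importance, and a differentiable penalty pushes the soft FLOPs count under the budget. The names SoftChannelGate and budget_penalty, as well as the importance/threshold heuristic, are assumptions made for this example.

```python
# Sketch of differentiable sparsity allocation (illustration only).
import torch
import torch.nn as nn

class SoftChannelGate(nn.Module):
    """Softly gates the channels of one layer according to a learnable keep ratio."""
    def __init__(self, temperature: float = 50.0):
        super().__init__()
        self.keep_logit = nn.Parameter(torch.zeros(1))  # layer-wise sparsity parameter
        self.temperature = temperature

    def keep_ratio(self) -> torch.Tensor:
        return torch.sigmoid(self.keep_logit)           # keep ratio in (0, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # channel importance: batch-averaged L1 norm of each channel's activation
        importance = x.abs().mean(dim=(0, 2, 3))
        importance = importance / (importance.max() + 1e-12)   # normalize to [0, 1]
        # channels whose importance exceeds (1 - keep_ratio) are (softly) kept
        gate = torch.sigmoid(self.temperature * (importance - (1.0 - self.keep_ratio())))
        return x * gate.view(1, -1, 1, 1)

def budget_penalty(gates, layer_flops, flops_budget):
    """Differentiable penalty that pushes the (soft) total FLOPs under the budget."""
    soft_flops = sum(g.keep_ratio() * f for g, f in zip(gates, layer_flops))
    return torch.relu(soft_flops - flops_budget) ** 2
```

In such a setup, the training loss would combine the task loss with this penalty, so that gradient descent jointly learns the weights and the layer-wise sparsity allocation instead of searching over discrete pruning ratios.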

NAS is a promising approach to discovering good neural network architectures for given applications. Among the three basic components of a NAS system (search space, search strategy, and evaluation), prior work has mainly focused on developing different search strategies and evaluation methods. Since most previous hardware-aware search space designs targeted CPUs and GPUs, designing a suitable search space for Deep Neural Network (DNN) accelerators remains a challenge. Besides, the architectures and compilers of DNN accelerators vary greatly, so it is difficult to obtain a unified and accurate evaluation of DNN latency across different platforms. To address these issues, we propose a black-box profiling-based search space tuning method and further improve latency evaluation by introducing a layer-adaptive latency correction method. Used as the first stage of our general accelerator-aware NAS pipeline, the proposed methods provide a smaller, dynamic search space with a controllable trade-off between accuracy and latency for DNN accelerators.
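The sketch below illustrates the black-box profiling idea under simplifying assumptions; it is not the paper's code. A hypothetical profiling callback profile_op measures candidate operators on the target accelerator to prune the per-layer search space, a hypothetical measure_network callback measures a whole reference network, and a correction factor reconciles a latency lookup table with that end-to-end measurement (here a single global scale, whereas a finer per-layer fit would use several reference architectures).

```python
# Sketch of profiling-based search-space tuning with latency correction (illustration only).
from typing import Callable, Dict, List

def tune_search_space(candidate_ops: List[str],
                      num_layers: int,
                      profile_op: Callable[[int, str], float],
                      latency_budget_per_layer: float) -> Dict[int, List[str]]:
    """Keep, for each layer, only the operators whose profiled latency fits the budget."""
    space = {}
    for layer in range(num_layers):
        kept = [op for op in candidate_ops
                if profile_op(layer, op) <= latency_budget_per_layer]
        # always keep at least the fastest operator so the layer stays searchable
        space[layer] = kept or [min(candidate_ops, key=lambda op: profile_op(layer, op))]
    return space

def layer_correction_factors(table: Dict[int, Dict[str, float]],
                             reference_ops: List[str],
                             measure_network: Callable[[List[str]], float]) -> List[float]:
    """Scale the lookup table so its sum matches a measured end-to-end latency."""
    measured_total = measure_network(reference_ops)
    table_total = sum(table[i][op] for i, op in enumerate(reference_ops))
    scale = measured_total / table_total
    return [scale] * len(reference_ops)

def predict_latency(ops: List[str],
                    table: Dict[int, Dict[str, float]],
                    correction: List[float]) -> float:
    """Predict network latency as the corrected sum of per-layer table entries."""
    return sum(table[i][op] * correction[i] for i, op in enumerate(ops))
```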

Memristor-based Processing-In-Memory (PIM) architectures have shown great potential to boost the computing energy efficiency of Neural Networks (NNs). To ensure high accuracy and energy efficiency, it is important to co-design the NN model and the PIM architecture. However, on the one hand, the co-exploration space of NN model and PIM architecture is extremely large, making it difficult to search for the optimal results. On the other hand, during the co-exploration process, PIM simulators impose a heavy computational burden and long runtime on evaluation. To address these problems, in this paper we propose Gibbon, an efficient co-exploration framework for the NN model and PIM architecture. In Gibbon, we propose an evolutionary search algorithm with adaptive parameter priority, which focuses on the subspace of high-priority parameters and alleviates the problem of the vast co-design space. Besides, we design a Recurrent Neural Network (RNN) based predictor for accuracy and hardware performance. It substitutes for a large part of the PIM simulator workload and reduces the long simulation time.
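As a rough illustration of the adaptive-priority idea in isolation, and not Gibbon's implementation, the sketch below runs an evolutionary loop that mutates high-priority parameters more often and updates priorities according to whether a mutation improved the fitness. The predictor argument stands in for the RNN-based accuracy/hardware predictor mentioned above and is an assumption of this example.

```python
# Sketch of evolutionary co-exploration with adaptive parameter priority (illustration only).
import random
from typing import Callable, Dict, List

def evolve(space: Dict[str, List],              # parameter name -> candidate values
           predictor: Callable[[Dict], float],  # surrogate fitness, higher is better
           pop_size: int = 32,
           generations: int = 20,
           mutations_per_child: int = 2) -> Dict:
    # priority[p] grows when mutating parameter p tends to improve fitness
    priority = {p: 1.0 for p in space}
    population = [{p: random.choice(v) for p, v in space.items()} for _ in range(pop_size)]

    for _ in range(generations):
        parents = sorted(population, key=predictor, reverse=True)[:pop_size // 2]
        children = []
        for parent in parents:
            child = dict(parent)
            # pick parameters to mutate in proportion to their current priority
            total = sum(priority.values())
            params = random.choices(list(space), k=mutations_per_child,
                                    weights=[priority[p] / total for p in space])
            for p in params:
                child[p] = random.choice(space[p])
            # reward the mutated parameters if the child beats its parent
            gain = predictor(child) - predictor(parent)
            for p in params:
                priority[p] = max(0.1, priority[p] + (0.5 if gain > 0 else -0.5))
            children.append(child)
        population = parents + children
    return max(population, key=predictor)
```

Because the fitness here is a cheap surrogate rather than a full PIM simulation, the loop can afford many evaluations, which is the point of pairing the search with a learned predictor.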