Budgeted pruning is the problem of pruning under resource constraints. In budgeted pruning, how to distribute the resources across layers (i.e., sparsity allocation) is the key problem. Traditional methods solve it by discretely searching for the layer-wise pruning ratios, which is inefficient. In this paper, we propose Differentiable Sparsity Allocation (DSA), an efficient end-to-end budgeted pruning flow. Utilizing a novel differentiable pruning process, DSA finds the layer-wise pruning ratios with gradient-based optimization. It allocates sparsity in a continuous space, which is more efficient than methods based on discrete evaluation and search. Furthermore, DSA can work in a pruning-from-scratch manner, whereas traditional budgeted pruning methods are applied to pre-trained models. Experimental results on CIFAR-10 and ImageNet show that DSA achieves superior performance to current iterative budgeted pruning methods, while shortening the time cost of the overall pruning process by at least 1.5×.
The large storage and computing overhead of CNNs poses challenges for deployment in resource-constrained scenarios. Model pruning is widely used to reduce the computation and storage consumption of CNNs. Moreover, many resource-constrained scenarios explicitly limit model size, latency, or other efficiency metrics, requiring deployed models to satisfy given resource budgets. In this setting, the pruning process must guarantee that the resulting model meets the given resource constraints.
Pruning under a given resource constraint can be divided into two subproblems: determining how many channels to keep at each layer (architecture decision, i.e., sparsity allocation), and training the corresponding weights (weight optimization). Existing studies have shown that once the pruned architecture is determined, similar performance can be obtained whether the model is trained from scratch or fine-tuned from a pre-trained model. In other words, sparsity allocation is the core issue in pruning under a given resource constraint.
To solve the pruning problem under a given resource limit, mainstream approaches adopt an "iterative pruning" pipeline with three steps: pre-training, sparsity allocation, and fine-tuning. In this pipeline, sparsity allocation is a multi-stage discrete search that generally requires hundreds of model accuracy evaluations, each of which is expensive. In addition, such methods require pre-training a large model, which further increases the resource consumption. Since the architecture decision is the key problem in budgeted pruning, this work draws on differentiable NAS methods and proposes DSA, an end-to-end differentiable pruning method that solves the task-performance maximization problem under the resource limit through gradient-based optimization, thereby improving pruning efficiency.
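In generic notation (the symbols here are chosen for illustration, not taken from the paper), the problem described above is a constrained optimization over the keep ratios and the weights:

```latex
\min_{\alpha,\, w} \; \mathcal{L}_{\mathrm{task}}(w, \alpha)
\qquad \text{s.t.} \qquad \mathcal{F}(\alpha) \le B ,
```

where $\alpha$ collects the layer-wise (group-wise) keep ratios, $w$ the model weights, $\mathcal{F}(\alpha)$ a differentiable model of the resource consumption (e.g., FLOPs) of the pruned network, and $B$ the resource budget. Making every term of this problem differentiable in $\alpha$ is what allows gradient-based sparsity allocation.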
Since validation accuracy is not differentiable, we follow differentiable NAS algorithms and use the validation loss as a differentiable proxy for validation accuracy. For the inner optimization of the weights under the current pruning ratios, this work uses a first-order update rule in the bilevel optimization, similar to DARTS, so that the weights adapt to the changing ratios.
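A minimal sketch of such a first-order alternating scheme, with a toy quadratic loss standing in for the network and a single scalar keep ratio `alpha`; all names and the loss itself are illustrative, not the paper's code:

```python
import numpy as np

# Toy proxy loss: loss(w, alpha) = mean((alpha * w - d)^2), where
# "alpha * w" stands in for a softly pruned layer's output.
def loss(w, alpha, d):
    return np.mean((alpha * w - d) ** 2)

def grad_w(w, alpha, d):        # d loss / d w (alpha held fixed)
    return 2.0 * alpha * (alpha * w - d) / w.size

def grad_alpha(w, alpha, d):    # d loss / d alpha (w held fixed)
    return 2.0 * np.mean(w * (alpha * w - d))

rng = np.random.default_rng(0)
w = rng.standard_normal(4)      # "weights"
alpha = 0.8                     # continuous keep ratio
d_train = np.ones(4)            # stand-in for training data
d_val = np.full(4, 0.9)         # stand-in for validation data

lr_w, lr_a = 0.1, 0.05
init_val_loss = loss(w, alpha, d_val)
for _ in range(200):
    # Inner step: update weights on the training loss, alpha fixed.
    w = w - lr_w * grad_w(w, alpha, d_train)
    # Outer step (first-order approximation, as in DARTS): update alpha
    # on the validation loss, treating the current weights as constants.
    alpha = alpha - lr_a * grad_alpha(w, alpha, d_val)
```

The first-order approximation ignores how the optimal weights would shift with `alpha`, which is exactly the cheap trade-off DARTS makes; after the alternating updates, the validation-loss proxy is lower than at initialization.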
The workflow of the proposed DSA method is shown in the figure above. First, DSA partitions the convolution layers into K groups according to the topological constraints introduced by skip connections in the model, and assigns a keep ratio to each group; the sparsity allocation problem is then to determine appropriate keep ratios for the K groups. DSA optimizes these keep ratios end-to-end in a continuous space using a gradient-based approach. Since the problem is an optimization under resource constraints, DSA uses an ADMM-like method to coordinate the gradient of the task loss (the optimization objective) with that of the resource loss (the constraint), finding keep ratios that satisfy the resource budget while achieving good task performance. To obtain the gradient of the task loss with respect to each layer's keep ratio, this work designs a probabilistic, differentiable pruning process for each layer.
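The two ingredients above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact formulation: the soft channel mask uses a plain sigmoid relaxation around a keep-ratio-dependent threshold, the FLOPs model treats each layer's cost as bilinear in the input/output keep ratios, and the constraint is handled with a quadratic penalty in the spirit of ADMM's augmented Lagrangian. All function and variable names are hypothetical.

```python
import numpy as np

def soft_mask(importance, keep_ratio, temperature=0.05):
    """Differentiable relaxation of channel pruning: channels whose
    importance exceeds a keep-ratio-dependent threshold get mask values
    near 1, the rest near 0."""
    thresh = np.quantile(importance, 1.0 - keep_ratio)
    return 1.0 / (1.0 + np.exp(-(importance - thresh) / temperature))

def relative_flops(keep_ratios, full_flops):
    """FLOPs of the pruned model: each conv layer's cost scales with the
    keep ratios of its input and output channels, so the resource model
    is a differentiable polynomial in the keep ratios."""
    ratios = np.concatenate(([1.0], keep_ratios))  # input image: ratio 1
    return sum(f * ratios[i] * ratios[i + 1]
               for i, f in enumerate(full_flops))

# Soft mask example: keep roughly half of six channels.
importance = np.array([0.9, 0.1, 0.5, 0.7, 0.05, 0.3])
mask = soft_mask(importance, keep_ratio=0.5)

# Resource loss example: a 3-layer toy network against a 25% FLOPs budget.
full_flops = np.array([100.0, 200.0, 200.0])
budget = 0.25 * full_flops.sum()
ratios = np.array([0.6, 0.5, 0.5])
violation = relative_flops(ratios, full_flops) - budget
# Quadratic penalty on budget violation; its gradient w.r.t. the keep
# ratios is what gets coordinated with the task-loss gradient.
penalty = 0.5 * max(violation, 0.0) ** 2
```

Because both the mask and the resource model are smooth in the keep ratios, gradients of the task loss and of the penalty flow back to the same continuous variables, which is what makes the allocation end-to-end trainable.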
To demonstrate the effectiveness of our proposed method, we conduct pruning experiments on CIFAR-10 (with ResNet-20 and ResNet-56) and on ImageNet.
Main Results
The pruned models obtained by DSA meet the budget constraints with smaller accuracy drops than the baseline methods. Compared with regularization-based methods (e.g., SSL and MorphNet), DSA's explicit budget modeling guarantees that the resulting models meet different budget constraints without hyperparameter trials. Compared with iterative pruning methods (e.g., AMC), DSA allocates the sparsity in a gradient-based way and is more efficient.
Pruning results of ResNet-20 and ResNet-56 on CIFAR-10. SSL and MorphNet are re-implemented with topological grouping. Accuracy drops for the cited results are calculated based on the baselines reported in their papers. Headers: "TG" stands for Topological Grouping; "FLOPs Budget" stands for the percentage of the pruned model's FLOPs relative to the full model.
Pruning results on ImageNet. "TG" stands for Topological Grouping.
Rationality of the Differentiable Sparsity Allocation
The figure below presents the sparsity allocation (FLOPs budget 25%) for ResNet-18 on CIFAR-10 obtained by DSA and SSL. The results show that the first layer, which performs primary feature extraction, should not be pruned too aggressively (group A), and neither should the shortcut layers responsible for information transmission across stages (groups A, D, G, J). The strided convolutions are relatively more sensitive, and more of their channels should be kept (groups E, H). In conclusion, DSA obtains reasonable sparsity allocations that match empirical knowledge, at a lower computational cost than iterative pruning methods.
A comparison of the normalized sensitivity and the sparsity allocations of DSA and SSL for ResNet-18 on CIFAR-10.
@inproceedings{ning2020dsa,
title={{DSA}: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation},
author={Ning, Xuefei and Zhao, Tianchen and Li, Wenshuo and Lei, Peng and Wang, Yu and Yang, Huazhong},
booktitle={European Conference on Computer Vision},
pages={592--607},
year={2020},
organization={Springer}
}