Using Condensa we demonstrate that one can Surpass the State-of-the-Art in DNN model compression and we have compared our approach with two prior works:
Illustration of the level of automation provided by CONDENSA
Selectively compress DNN layers based on
layer.<PROPERTY_NAME>
methods.platform_has_<QUERY_NAME>
methods.Optimization on "any" User defined Black Box function indicating a performance metric of the DNN to improve, some examples are:
Easily and Correctly compose Condensa schemes
Compose[S<n]
methodQuantize(float16)
if platform supports IEEE half precision floating point computationsPyThon
easily, in the compression program of line-6. There are two major choices that must be made when performing Bayesian optimization.
This plot illustrates the prior we use in Condensa when searching for the global sparsity value, By default we setup a simple 1D Gaussian process. In this plot above, you will notice 3 past observations. The solid black line is the GP surrogate mean prediction of the objective function given the data, and the shaded area shows the mean plus and mean minus the variance. The superimposed Gaussians corresponds to the GP mean and standard deviation of the prediction at the 3 previously sampled points.
Here there are 2 plots, the plot on the left illustrates Condensa sparsity profile for ResNet 110 (CIFAR-10) Filter Pruning and the Goal for Bayesian optimization is search for a global sparsity value that satisfies a user provided accuracy constraint, for example line 17 in the listing above (- 2.00%).
The plot on the right, illustrates how we modeled our prior belief of this expensive black-box function (L-C , the Green Curve) to be a Gaussian Process, with kernel functions that have default values, and can be easily modified by the Condensa user, depending on the dataset-network under compression.
Gaussian Processes (GP) are a generic supervised learning method designed to solve regression and probabilistic classification problems.
The advantages of Gaussian processes are:
The choice of co variance functions for the Gaussian Process is crucial, as it determines the smoothness properties of samples drawn from it.
A level set of a real-valued function f of n real variables is a set of the form: