• Prior over parameters in sparse linear regression: In sparse linear regression, the learning algorithm aims to find a gradient vector (the parameters "phi"; one slope per data dimension, so D x 1) in which most entries are zero. The goal is to keep non-useful dimensions from influencing the uncertainty over the parameters by pushing their slopes as close to 0 as possible. To do this, a penalty on non-zero slopes is imposed by means of a prior over the parameters ("phi"). One choice is a product of D univariate t-distributions, which forms ridges along the coordinate axes in D dimensions (note: this is not the same as a multivariate t-distribution in D dimensions). With this prior, the slopes of dimensions that are not useful are pulled close to 0, whereas for the useful dimensions the training data outweighs the effect of the prior. The MATLAB code below plots this prior for 2 dimensions, as a product of 2 univariate t-distributions.

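gridsize = 50;  % number of grid points per axis
[X1,X2] = meshgrid(linspace(-1,1,gridsize)', linspace(-1,1,gridsize)');
X = [X1(:) X2(:)];
p = tpdf(X(:,1),0.01);      % univariate t pdf, degrees of freedom = 0.01, unit scale
p = tpdf(X(:,2),0.01).*p;   % product of the two univariate t pdfs
subplot(1,2,1); surf(X1,X2,reshape(p,gridsize,gridsize));     % surface plot of the prior
subplot(1,2,2); contour(X1,X2,reshape(p,gridsize,gridsize));  % contour plot of the prior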
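
For reference, the prior plotted above can be written out explicitly. This is a sketch under assumed notation: phi_d for the d-th slope and nu for the degrees of freedom are names chosen here, with zero mean and unit scale as in the tpdf calls. The second line gives the multivariate t-distribution with identity scale matrix for comparison. It depends on phi only through phi^T phi, so its contours are circular and it has no ridges along the axes. The third line shows the penalty that the product prior contributes to the negative log posterior.

```latex
\begin{align}
\Pr(\boldsymbol{\phi})
  &= \prod_{d=1}^{D}
     \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
          {\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
     \left(1 + \frac{\phi_d^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}
  && \text{(product of $D$ univariate $t$)} \\
\Pr\nolimits_{\text{mv}}(\boldsymbol{\phi})
  &= \frac{\Gamma\!\left(\frac{\nu+D}{2}\right)}
          {(\nu\pi)^{D/2}\,\Gamma\!\left(\frac{\nu}{2}\right)}
     \left(1 + \frac{\boldsymbol{\phi}^{\mathsf{T}}\boldsymbol{\phi}}{\nu}\right)^{-\frac{\nu+D}{2}}
  && \text{(multivariate $t$, identity scale)} \\
-\log \Pr(\boldsymbol{\phi})
  &= \frac{\nu+1}{2} \sum_{d=1}^{D} \log\!\left(1 + \frac{\phi_d^{2}}{\nu}\right) + \text{const}
  && \text{(penalty from the product prior)}
\end{align}
```

Because the penalty is a sum of per-dimension terms that grow only logarithmically in each slope, a single large slope costs less than several moderate ones, which is why this prior favors solutions where most entries are near zero.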