Adaptive Multi-Column Deep Neural Networks

This page contains additional information about the following paper: 
"Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising" 

This work was presented at Neural Information Processing Conference (NIPS), 2013.

Our code and paper can be downloaded below:

*Please send questions/comments/suggestions to Forest Agostinelli:


Stacked sparse denoising autoencoders (SSDAs) have recently been shown to be successful at removing noise from corrupted images. However, like most denoising techniques, the SSDA is not robust to variation in noise types beyond what it has seen during training. To address this limitation, we present the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA), a novel technique of combining multiple SSDAs by (1) computing optimal column weights via solving a nonlinear optimization program and (2) training a separate network to predict the optimal weights. We eliminate the need to determine the type of noise, let alone its statistics, at test time and even show that the system can be robust to noise not seen in the training set. We show that state-of-the-art denoising performance can be achieved with a single system on a variety of different noise types. Additionally, we demonstrate the efficacy of AMC-SSDA as a preprocessing (denoising) algorithm by achieving strong classification performance on corrupted MNIST digits. 


Neural networks have the ability to perform well on specific tasks; however, if the distribution of the training data varies too much, the network will have a hard time learning to perform this task. This can be mitigated by making the network larger and increasing the amount of training examples, but this can make training intractable because of memory constraints (the amount of parameters significantly increase) and time constraints (the time to train the network will be much longer). If, instead, the training data was split into smaller pieces and a network was trained on each piece, we can learn to perform well on the larger task.

To solve this problem, we introduce the adaptive multi-column deep neural network. This network has the ability to combine multiple networks (or columns) so that the robustness of the final model spans that of its columns and even remains robust to testing examples whose distributions are different than what was seen in the training set.

Application to Denoising

Many denoising algorithms make strong assumptions on the input, for example, requiring that the images be corrupted with Gaussian noise and even requiring that the user to provide statistics of each test image. We could denoise images corrupted by a much wider range of noise types if we incorporate multiple neural networks, each trained on a different type of noise. During test time, there would be no need to manually figure out statistics or even what type of noise each image is corrupted with.


We build of what is called a stacked sparse denoising autoencoder (SSDA) which has been shown to perform well on image denoising tasks. We train each network on a different type of noise and combine them into an adaptive multi-column SSDA (AMC-SSDA). Each network corresponds to a column in the AMC-SSDA. We then use the hidden activations of each column to train a weight prediction module which weights the output of each column.

Here is a visualization of the AMC-SSDA:

A visualization of the adaptive multi-column stacked sparse denoising autoencoder (AMC-SSDA).

Visualization of denoising performance on MNIST digits

This image shows the AMC-SSDA's ability to handle a multitude of different noise types.

Robustness to Unseen Distributions for CT scans

In this chart, the AMC-SSDA and its columns are trained of various noise types of Gaussian, salt & pepper, and speckle noises. However, it can still be robust to Poisson and uniform noise, though it has never seen it before. Though some of its columns do very poorly on these noise types, it still finds a linear combination of its outputs to select the columns that will best denoise this image corrupted with a noise of an unseen distribution.