The paper is one of the most cited in the field of machine learning. The architecture, AlexNet, brought the top-1 and top-5 error rates on ILSVRC-2010 down to 37.5% and 17.0%, considerably better than the previous best published results of 45.7% and 25.7% respectively; a variant of the model also won ILSVRC-2012 with a top-5 test error of 15.3%. The network has about 60 million parameters and 650,000 neurons. It consists of five convolutional layers for feature extraction followed by three fully connected layers, ending in a 1000-way softmax output.
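As a sanity check on the 60-million-parameter figure, the per-layer weight and bias counts implied by the published layer shapes can be tallied. This is only an illustrative sketch of the arithmetic, not code from the paper:

```python
# Tally AlexNet's parameters from the layer shapes given in the paper.
# Conv layers: (out_channels, kernel_h, kernel_w, in_channels seen by each kernel)
conv_layers = [
    (96, 11, 11, 3),    # conv1
    (256, 5, 5, 48),    # conv2 (input split across the two GPUs)
    (384, 3, 3, 256),   # conv3 (the GPUs communicate here)
    (384, 3, 3, 192),   # conv4
    (256, 3, 3, 192),   # conv5
]
# Fully connected layers: (out_features, in_features)
fc_layers = [
    (4096, 6 * 6 * 256),  # fc6
    (4096, 4096),         # fc7
    (1000, 4096),         # fc8 -> 1000-way softmax
]

total = 0
for out_c, kh, kw, in_c in conv_layers:
    total += out_c * kh * kw * in_c + out_c   # weights + biases
for out_f, in_f in fc_layers:
    total += out_f * in_f + out_f

print(total)  # roughly 61 million, matching the paper's "60 million" figure
```

Note that the fully connected layers account for the vast majority of the parameters.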
The model was trained on two GPUs using a highly efficient implementation of the convolution operation. Interestingly, the kernels learned on one GPU were mostly color-agnostic, while those on the other were mostly color-specific. The two GPUs communicate only in certain layers of the architecture. The authors used the ReLU non-linearity because it is computationally cheaper and reached 25% training error on CIFAR-10 six times faster than an equivalent network with tanh activations. They also used local response normalization, a form of "brightness normalization", which reduced the top-1 error rate by 1.4%.
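The "brightness normalization" is the paper's local response normalization, which divides each activation by a term summing the squared activations of adjacent channels at the same spatial position (the paper uses k=2, n=5, alpha=1e-4, beta=0.75). A minimal NumPy sketch, with a function name of my own choosing:

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize activations across channels, AlexNet-style (sketch).

    a: activations of shape (channels, height, width).
    Each activation is divided by
    (k + alpha * sum of squares over the n adjacent channels) ** beta.
    """
    num_channels = a.shape[0]
    b = np.empty_like(a)
    for i in range(num_channels):
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b

acts = np.random.default_rng(0).random((8, 4, 4)).astype(np.float32)
out = local_response_norm(acts)
```

Because the sum runs over neighboring channels, a strongly responding kernel suppresses similarly located responses in nearby kernels, a kind of lateral inhibition.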
The authors also tried several methods to reduce overfitting. Data augmentation was done with simple transformations (translations and horizontal reflections) and by perturbing the RGB pixel values along their principal components found via PCA. They also applied dropout in the fully connected layers, which reduced overfitting but roughly doubled the number of iterations needed to converge. Overlapping pooling likewise made the model slightly harder to overfit.
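The PCA-based color augmentation adds to every pixel of an image a multiple of the principal components of its RGB values, scaled by the corresponding eigenvalues and by Gaussian draws with standard deviation 0.1, as in the paper. A minimal NumPy sketch on a random image (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))  # toy stand-in for an RGB image, values in [0, 1]

# Flatten to a list of RGB pixels and compute the 3x3 covariance matrix.
pixels = img.reshape(-1, 3)
cov = np.cov(pixels - pixels.mean(axis=0), rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# One Gaussian draw per principal component, std 0.1 as in the paper.
alphas = rng.normal(0.0, 0.1, size=3)

# The same RGB shift is added to every pixel of the image.
shift = eigvecs @ (alphas * eigvals)
augmented = img + shift
```

Because the shift follows the directions of greatest color variance in natural images, this roughly simulates changes in illumination intensity and color while leaving object identity intact.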
The paper is a landmark in the field of image processing and showed that state-of-the-art results were achievable with a purely supervised method. The original paper is available at this link.