Each section contains a few-line description of a paper, while the associated link leads to a more detailed explanation.
Table of Contents:
RISE explains the model's prediction for an image using an occlusion-based technique, treating the model as a black box. More details
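A minimal sketch of the idea: random binary masks occlude parts of the image, and each mask is weighted by the black-box model's score on the masked input. The `model` callable and image layout are assumptions, and the upsampling is simplified (the paper uses bilinear interpolation with random shifts).

```python
import numpy as np

def rise_saliency(model, image, n_masks=1000, grid=7, p=0.5):
    """Monte-Carlo estimate of a saliency map in the spirit of RISE.

    model: black-box callable mapping an (H, W, C) image in [0, 1]
           to a scalar class probability (assumed interface).
    """
    H, W = image.shape[:2]
    cell = (H // grid + 1, W // grid + 1)
    saliency = np.zeros((H, W))
    for _ in range(n_masks):
        # Sample a coarse binary grid, then upsample it to image size
        # (nearest-neighbour here; the paper uses bilinear + random shift).
        coarse = (np.random.rand(grid, grid) < p).astype(float)
        mask = np.kron(coarse, np.ones(cell))[:H, :W]
        # Weight the mask by the model's confidence on the occluded image.
        saliency += model(image * mask[..., None]) * mask
    return saliency / (n_masks * p)
```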
The paper introduced an extended triplet loss to extract more meaningful features from iris images by ignoring the non-iris regions during training and compensating for the small changes caused by rotation of the iris. More details
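A minimal sketch of the masked, shift-tolerant distance such a loss can be built on; tensor shapes, the shift range, and the margin are illustrative assumptions, with horizontal shifts of the unrolled iris image standing in for rotations.

```python
import torch

def masked_shifted_dist(f1, f2, m1, m2, max_shift=4):
    """Distance between two feature maps (C, H, W), computed only on
    pixels valid in both iris masks and minimised over horizontal
    shifts, which correspond to in-plane rotations of the iris."""
    dists = []
    for s in range(-max_shift, max_shift + 1):
        f2s = torch.roll(f2, shifts=s, dims=-1)
        m2s = torch.roll(m2, shifts=s, dims=-1)
        valid = m1 * m2s  # ignore non-iris regions in both maps
        d = ((f1 - f2s) ** 2 * valid).sum() / valid.sum().clamp(min=1)
        dists.append(d)
    return torch.stack(dists).min()

def extended_triplet_loss(fa, fp, fn, ma, mp, mn, margin=0.2):
    """Triplet loss over the masked, shift-minimised distances."""
    d_ap = masked_shifted_dist(fa, fp, ma, mp)
    d_an = masked_shifted_dist(fa, fn, ma, mn)
    return torch.relu(d_ap - d_an + margin)
```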
The paper is one of the most cited in the field and reintroduced Convolutional Neural Networks to the machine learning community. The AlexNet architecture achieved a sharp drop in error rate at ILSVRC 2012. More details
The paper introduced the Transformer, marking the start of the transformer revolution that changed NLP and, later, computer vision. More details
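The mechanism at the paper's core is scaled dot-product attention, softmax(QKᵀ/√d)V; a minimal sketch (the tensor layout is an assumption):

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Core Transformer operation: softmax(QK^T / sqrt(d)) V.
    q, k, v: (batch, heads, seq, d) tensors."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    if mask is not None:
        # Masked positions get -inf so they receive zero attention weight.
        scores = scores.masked_fill(mask == 0, float('-inf'))
    return torch.softmax(scores, dim=-1) @ v
```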
The paper is the first of the Generative Pre-Training (GPT) models. It first trains a language model with unsupervised learning to predict the next word, then uses the model for supervised fine-tuning on downstream tasks. More details
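A minimal sketch of the two stages; `lm`, `lm.hidden_states`, and `head` are hypothetical placeholders for the model, its feature hook, and a task classifier.

```python
import torch.nn.functional as F

def pretrain_step(lm, tokens):
    """Unsupervised stage: predict each token from the ones before it."""
    logits = lm(tokens[:, :-1])                      # (batch, seq-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def finetune_step(lm, head, tokens, labels):
    """Supervised stage: a small task head on top of the pretrained model
    (the paper also keeps the LM objective as an auxiliary loss)."""
    hidden = lm.hidden_states(tokens)                # hypothetical feature hook
    return F.cross_entropy(head(hidden[:, -1]), labels)
```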
The paper introduced BERT, which considers bidirectional context during pre-training. It uses Masked Language Modeling, where randomly masked tokens in the input are predicted, and Next Sentence Prediction for better sentence-level understanding. More details.
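A minimal sketch of the masking scheme; the 80/10/10 split is from the paper, while `mask_id` and the tensor shapes are assumptions.

```python
import torch

def mask_tokens(tokens, mask_id, vocab_size, p=0.15):
    """BERT-style masking: pick ~15% of positions as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    labels = tokens.clone()
    target = torch.rand(tokens.shape) < p
    labels[~target] = -100                    # ignore non-targets in the loss
    tokens = tokens.clone()
    r = torch.rand(tokens.shape)
    tokens[target & (r < 0.8)] = mask_id
    rand_pos = target & (r >= 0.8) & (r < 0.9)
    rand_tok = torch.randint(vocab_size, tokens.shape)
    tokens[rand_pos] = rand_tok[rand_pos]
    return tokens, labels
```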
ELMo is one of the last major models before Transformers gained popularity. It provides contextual word representations using bidirectional LSTMs and shows that a linear combination of the layer representations is better at providing contextual information. More details
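A minimal sketch of that combination, following the paper's formula ELMo = γ · Σⱼ sⱼ hⱼ with task-specific softmax weights; the tensor layout is an assumption.

```python
import torch

def elmo_embedding(layer_reps, layer_logits, gamma):
    """Combine biLSTM layer representations with learned, task-specific
    softmax-normalised weights and a scalar scale gamma.

    layer_reps: (layers, seq, dim) stacked hidden states.
    layer_logits: (layers,) learnable weights before the softmax.
    """
    s = torch.softmax(layer_logits, dim=0)
    return gamma * (s[:, None, None] * layer_reps).sum(dim=0)
```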
This paper introduced how language models should be trained so that they transfer better to downstream tasks. It is one of the more influential papers in NLP, since it enabled fine-tuning for downstream tasks without retraining the whole model and also reduced catastrophic forgetting. More details
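A minimal sketch of two techniques commonly used for this kind of staged fine-tuning, discriminative learning rates and gradual unfreezing; the layer list, rates, and decay factor are illustrative assumptions.

```python
def discriminative_param_groups(layers, base_lr=1e-3, decay=2.6):
    """Assign lower learning rates to earlier layers so general pretrained
    features change more slowly than task-specific ones.
    Pass the result to any torch.optim optimizer."""
    groups = []
    for depth, layer in enumerate(reversed(layers)):
        groups.append({"params": layer.parameters(),
                       "lr": base_lr / (decay ** depth)})
    return groups

def unfreeze_top(layers, n):
    """Gradual unfreezing: keep only the top n layers trainable, and
    unfreeze one more layer per epoch to limit catastrophic forgetting."""
    for i, layer in enumerate(layers):
        trainable = i >= len(layers) - n
        for p in layer.parameters():
            p.requires_grad = trainable
```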
This paper introduces GPT-2. Apart from scaling up the model size and some changes to where LayerNorm is placed in the architecture, there are no major architectural changes. However, the paper also introduces WebText, a huge dataset built from the text of pages linked from Reddit. More details.
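The LayerNorm change amounts to moving normalisation to the input of each sub-block ("pre-norm") instead of after the residual addition; a minimal sketch of such a block (dimensions and the MLP shape are illustrative, and the causal attention mask is omitted for brevity).

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Transformer block with LayerNorm applied before each sub-block,
    as in GPT-2, rather than after the residual addition."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)                      # normalise *before* attention
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))        # normalise *before* the MLP
        return x
```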