Each section contains a few-line description of a paper, while the associated link leads to a more detailed explanation.
Table of Contents:
RISE explains the model's prediction for an image using an occlusion-based technique, treating the model as a black box. More details
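A minimal sketch of the idea: random binary masks occlude parts of the image, and each mask is weighted by the black-box model's score on the masked input. The `model` callable and image layout are assumptions, and the upsampling is simplified (the paper uses bilinear interpolation with random shifts).

```python
import numpy as np

def rise_saliency(model, image, n_masks=1000, grid=7, p=0.5):
    """Monte-Carlo estimate of a saliency map in the spirit of RISE.

    model: black-box callable mapping an (H, W, C) image in [0, 1]
           to a scalar class probability (assumed interface).
    """
    H, W = image.shape[:2]
    cell = (H // grid + 1, W // grid + 1)
    saliency = np.zeros((H, W))
    for _ in range(n_masks):
        # Sample a coarse binary grid, then upsample it to image size
        # (nearest-neighbour here; the paper uses bilinear + random shift).
        coarse = (np.random.rand(grid, grid) < p).astype(float)
        mask = np.kron(coarse, np.ones(cell))[:H, :W]
        # Weight the mask by the model's confidence on the occluded image.
        saliency += model(image * mask[..., None]) * mask
    return saliency / (n_masks * p)
```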
The paper introduced an extended triplet loss to extract more meaningful features from iris images by ignoring the non-iris regions during training and compensating for the small changes caused by rotation of the iris. More details
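A minimal sketch of the masked, shift-tolerant distance such a loss can be built on; tensor shapes, the shift range, and the margin are illustrative assumptions, with horizontal shifts of the unrolled iris image standing in for rotations.

```python
import torch

def masked_shifted_dist(f1, f2, m1, m2, max_shift=4):
    """Distance between two feature maps (C, H, W), computed only on
    pixels valid in both iris masks and minimised over horizontal
    shifts, which correspond to in-plane rotations of the iris."""
    dists = []
    for s in range(-max_shift, max_shift + 1):
        f2s = torch.roll(f2, shifts=s, dims=-1)
        m2s = torch.roll(m2, shifts=s, dims=-1)
        valid = m1 * m2s  # ignore non-iris regions in both maps
        d = ((f1 - f2s) ** 2 * valid).sum() / valid.sum().clamp(min=1)
        dists.append(d)
    return torch.stack(dists).min()

def extended_triplet_loss(fa, fp, fn, ma, mp, mn, margin=0.2):
    """Triplet loss over the masked, shift-minimised distances."""
    d_ap = masked_shifted_dist(fa, fp, ma, mp)
    d_an = masked_shifted_dist(fa, fn, ma, mn)
    return torch.relu(d_ap - d_an + margin)
```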
The paper is one of the most cited in the field and reintroduced Convolutional Neural Networks to the machine learning community. The AlexNet architecture achieved a sharp drop in error rate at ILSVRC 2012. More details
The paper introduced the Transformer, marking the start of the transformer revolution that changed NLP and, later, computer vision. More details
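The mechanism at the paper's core is scaled dot-product attention, softmax(QKᵀ/√d)V; a minimal sketch (the tensor layout is an assumption):

```python
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Core Transformer operation: softmax(QK^T / sqrt(d)) V.
    q, k, v: (batch, heads, seq, d) tensors."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    if mask is not None:
        # Masked positions get -inf so they receive zero attention weight.
        scores = scores.masked_fill(mask == 0, float('-inf'))
    return torch.softmax(scores, dim=-1) @ v
```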
The paper is the first of the Generative Pre-Training (GPT) models. It first trains a language model with unsupervised learning to predict the next word, then uses the model for supervised fine-tuning on downstream tasks. More details
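A minimal sketch of the two stages; `lm`, `lm.hidden_states`, and `head` are hypothetical placeholders for the model, its feature hook, and a task classifier.

```python
import torch.nn.functional as F

def pretrain_step(lm, tokens):
    """Unsupervised stage: predict each token from the ones before it."""
    logits = lm(tokens[:, :-1])                      # (batch, seq-1, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def finetune_step(lm, head, tokens, labels):
    """Supervised stage: a small task head on top of the pretrained model
    (the paper also keeps the LM objective as an auxiliary loss)."""
    hidden = lm.hidden_states(tokens)                # hypothetical feature hook
    return F.cross_entropy(head(hidden[:, -1]), labels)
```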
The paper introduced BERT, which considers bidirectional context during pre-training. It uses Masked Language Modeling, where randomly masked tokens in the input are predicted, and Next Sentence Prediction for better sentence-level understanding. More details.
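A minimal sketch of the masking scheme; the 80/10/10 split is from the paper, while `mask_id` and the tensor shapes are assumptions.

```python
import torch

def mask_tokens(tokens, mask_id, vocab_size, p=0.15):
    """BERT-style masking: pick ~15% of positions as prediction targets;
    of those, 80% become [MASK], 10% a random token, 10% stay unchanged."""
    labels = tokens.clone()
    target = torch.rand(tokens.shape) < p
    labels[~target] = -100                    # ignore non-targets in the loss
    tokens = tokens.clone()
    r = torch.rand(tokens.shape)
    tokens[target & (r < 0.8)] = mask_id
    rand_pos = target & (r >= 0.8) & (r < 0.9)
    rand_tok = torch.randint(vocab_size, tokens.shape)
    tokens[rand_pos] = rand_tok[rand_pos]
    return tokens, labels
```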
ELMo is one of the last major models before Transformers gained popularity. It provides contextual word representations using bidirectional LSTMs and shows that a linear combination of the layer representations is better at providing contextual information. More details
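A minimal sketch of that combination, following the paper's formula ELMo = γ · Σⱼ sⱼ hⱼ with task-specific softmax weights; the tensor layout is an assumption.

```python
import torch

def elmo_embedding(layer_reps, layer_logits, gamma):
    """Combine biLSTM layer representations with learned, task-specific
    softmax-normalised weights and a scalar scale gamma.

    layer_reps: (layers, seq, dim) stacked hidden states.
    layer_logits: (layers,) learnable weights before the softmax.
    """
    s = torch.softmax(layer_logits, dim=0)
    return gamma * (s[:, None, None] * layer_reps).sum(dim=0)
```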
This paper introduced how language models should be trained so that they transfer better to downstream tasks. It is one of the more influential papers in NLP, since it enabled fine-tuning for downstream tasks without retraining the whole model and also reduced catastrophic forgetting. More details
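A minimal sketch of two techniques commonly used for this kind of staged fine-tuning, discriminative learning rates and gradual unfreezing; the layer list, rates, and decay factor are illustrative assumptions.

```python
def discriminative_param_groups(layers, base_lr=1e-3, decay=2.6):
    """Assign lower learning rates to earlier layers so general pretrained
    features change more slowly than task-specific ones.
    Pass the result to any torch.optim optimizer."""
    groups = []
    for depth, layer in enumerate(reversed(layers)):
        groups.append({"params": layer.parameters(),
                       "lr": base_lr / (decay ** depth)})
    return groups

def unfreeze_top(layers, n):
    """Gradual unfreezing: keep only the top n layers trainable, and
    unfreeze one more layer per epoch to limit catastrophic forgetting."""
    for i, layer in enumerate(layers):
        trainable = i >= len(layers) - n
        for p in layer.parameters():
            p.requires_grad = trainable
```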
This paper introduces GPT-2. Apart from scaling up the model size and some changes to where LayerNorm is placed in the architecture, there are no major architectural changes. However, the paper also introduces WebText, a huge dataset built from the text of pages linked from Reddit. More details.
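The LayerNorm change amounts to moving normalisation to the input of each sub-block ("pre-norm") instead of after the residual addition; a minimal sketch of such a block (dimensions and the MLP shape are illustrative, and the causal attention mask is omitted for brevity).

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Transformer block with LayerNorm applied before each sub-block,
    as in GPT-2, rather than after the residual addition."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)                      # normalise *before* attention
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))        # normalise *before* the MLP
        return x
```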