ULMFiT (Universal Language Model Fine-tuning) was introduced in the paper "Universal Language Model Fine-tuning for Text Classification" and focuses on transferring knowledge from a pre-trained language model to downstream NLP tasks.
ULMFiT uses an AWD-LSTM language model and proceeds in three stages: general-domain language model pre-training on a large corpus, fine-tuning the language model on the target task data, and finally fine-tuning a classifier on the target task. The fine-tuning stages rely on discriminative fine-tuning (layer-wise learning rates) and slanted triangular learning rates for optimization.
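The slanted triangular learning rate is a simple schedule: a short linear warm-up followed by a long linear decay. The sketch below follows the paper's notation (T iterations, cut_frac, ratio, eta_max); the default values shown are the ones reported in the paper, but treat the function itself as an illustrative reimplementation rather than the authors' code.

```python
def stlr(t, T, cut_frac=0.1, ratio=32, eta_max=0.01):
    """Slanted triangular learning rate at iteration t of T (ULMFiT, Eq. 3)."""
    cut = int(T * cut_frac)              # iteration at which the LR peaks
    if t < cut:
        p = t / cut                                      # short linear increase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # long linear decay
    return eta_max * (1 + p * (ratio - 1)) / ratio

# Example: with T=1000 the LR rises to 0.01 over the first 100 iterations,
# then decays linearly back toward eta_max / ratio.
rates = [stlr(t, T=1000) for t in range(1000)]
```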
ULMFiT was a breakthrough in enabling transfer learning for NLP, significantly improving results on text classification tasks.
ULMFiT drew its motivation from transfer learning in computer vision and argued that earlier attempts at fine-tuning pre-trained representations for NLP often failed because of how the fine-tuning itself was done. The authors therefore introduced better ways of scheduling the learning rate, of setting a different learning rate for each layer, and of gradually unfreezing layers during fine-tuning, as sketched below.
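A minimal sketch, in plain PyTorch, of what discriminative fine-tuning and gradual unfreezing could look like in practice. The `layers` list, `build_optimizer`, and `gradual_unfreeze` are hypothetical names for illustration only; the official ULMFiT implementation lives in fastai and differs in detail. The paper's suggested per-layer decay factor is 2.6.

```python
import torch

def build_optimizer(layers, base_lr=0.01, decay=2.6):
    """Discriminative fine-tuning: one parameter group per layer,
    with each lower layer getting base_lr / decay**depth."""
    groups = []
    for depth, layer in enumerate(reversed(layers)):   # top (classifier) layer first
        groups.append({"params": layer.parameters(),
                       "lr": base_lr / (decay ** depth)})
    return torch.optim.SGD(groups)

def gradual_unfreeze(layers, epoch):
    """Gradual unfreezing: at epoch 0 only the top layer trains,
    and one additional layer (from the top down) is unfrozen each epoch."""
    for depth, layer in enumerate(reversed(layers)):
        trainable = depth <= epoch
        for p in layer.parameters():
            p.requires_grad = trainable
```

In use, `gradual_unfreeze(layers, epoch)` would be called at the start of each epoch of classifier fine-tuning, with the optimizer built once over all layers so the layer-wise learning rates stay fixed.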