ELMo (Embeddings from Language Models) is a contextualized word representation model introduced in the paper "Deep contextualized word representations". Unlike traditional static word embeddings like Word2Vec or GloVe, ELMo captures context-dependent meaning by considering the entire input sentence when producing embeddings.
ELMo uses a bi-directional LSTM architecture trained with a language modeling objective (both forward and backward). It produces word embeddings that vary based on the context of the word, making it effective in downstream tasks like question answering and sentiment analysis.
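Below is a minimal PyTorch sketch of that idea, not ELMo's actual implementation (module names and dimensions are illustrative): two independent multi-layer LSTMs read the sentence left-to-right and right-to-left, each trained as a language model over its own direction.

```python
import torch
import torch.nn as nn

class BiLMSketch(nn.Module):
    """Illustrative bidirectional LSTM language model (not the original ELMo code)."""
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Separate stacks so the forward LM never sees future tokens
        # and the backward LM never sees past tokens.
        self.fwd_lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.bwd_lstm = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.fwd_head = nn.Linear(hidden_dim, vocab_size)  # predicts the next token
        self.bwd_head = nn.Linear(hidden_dim, vocab_size)  # predicts the previous token

    def forward(self, token_ids):
        x = self.embed(token_ids)                      # (batch, seq, emb_dim)
        fwd_states, _ = self.fwd_lstm(x)
        bwd_states, _ = self.bwd_lstm(torch.flip(x, dims=[1]))
        bwd_states = torch.flip(bwd_states, dims=[1])  # realign with the original order
        return self.fwd_head(fwd_states), self.bwd_head(bwd_states)
```

The hidden states of these LSTMs, which depend on the surrounding words, are what make the resulting embeddings contextual.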
ELMo differs from GPT and BERT by relying on LSTMs rather than Transformers for its contextual embeddings.
To produce the final contextual representation of each word, ELMo takes a learned linear combination of the hidden states from all layers of the biLSTM (a softmax-weighted sum scaled by a task-specific factor), rather than using only the top layer.
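A rough sketch of that weighted combination, under the assumption that the per-layer weights and the overall scale are the only new parameters learned for the downstream task (names here are illustrative):

```python
import torch
import torch.nn as nn

class ScalarMixSketch(nn.Module):
    """Illustrative learned weighting of biLM layer outputs."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))  # one scalar per layer
        self.gamma = nn.Parameter(torch.ones(1))               # overall task-specific scale

    def forward(self, layer_states):
        # layer_states: list of (batch, seq, dim) tensors, one per biLM layer
        s = torch.softmax(self.weights, dim=0)                  # normalized layer weights
        mixed = sum(w * h for w, h in zip(s, layer_states))    # weighted sum over layers
        return self.gamma * mixed
```

Because the weights are learned per task, different downstream tasks can emphasize different layers of the biLSTM.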