Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood.
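Concretely, "tractable likelihood" refers to the standard autoregressive factorization: the image is flattened into a sequence of intensities and the joint probability is written as a product of per-position conditionals, each modeled with self-attention. A minimal sketch of that factorization (the notation here is illustrative, not taken verbatim from the paper):

```latex
% Autoregressive factorization of the image likelihood (illustrative notation).
% x = (x_1, ..., x_n) is the image flattened into a sequence of intensities.
\[
  p(x) \;=\; \prod_{t=1}^{n} p\!\left(x_t \mid x_1, \dots, x_{t-1}\right)
\]
% Each conditional p(x_t | x_{<t}) is produced by the model, so the exact
% log-likelihood is simply the sum of the per-position log-probabilities.
```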
The input is run through the six layers of the encoder stack, and the final encoder output is then sent to the encoder-decoder Multi-Head Attention layer of every decoder. The Masked Multi-Head Attention layer takes as input the output of the previous decoder block (or, in the first block, the words generated so far). In this way, the decoders take into account both the words from previous time steps and the context of each word produced by the encoding process.
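The sketch below illustrates that data flow for a single decoder block, using plain NumPy and a single attention head. The function names, shapes, and dimensions are assumptions made for the example, and residual connections, layer normalization, the feed-forward sublayer, and the multi-head split are omitted for brevity.

```python
# Illustrative sketch (not the paper's code) of how one decoder block combines
# masked self-attention over the words generated so far with attention over
# the encoder stack's final output.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    # Scaled dot-product attention: scores -> (optionally masked) softmax -> weighted sum.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)   # block attention to future positions
    return softmax(scores) @ v

def decoder_block(dec_inputs, enc_outputs):
    t = dec_inputs.shape[0]
    causal_mask = np.tril(np.ones((t, t), dtype=bool))   # each step sees only earlier steps
    # 1) Masked self-attention: the decoder attends to the words produced so far.
    x = attention(dec_inputs, dec_inputs, dec_inputs, causal_mask)
    # 2) Encoder-decoder attention: queries come from the decoder, while keys
    #    and values come from the encoder stack's final output.
    x = attention(x, enc_outputs, enc_outputs)
    return x   # residual/norm and feed-forward sublayers omitted

# Toy usage: 4 encoded source tokens, 3 target tokens generated so far, model dim 8.
enc_outputs = np.random.randn(4, 8)
dec_inputs  = np.random.randn(3, 8)
print(decoder_block(dec_inputs, enc_outputs).shape)   # (3, 8)
```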
The decoder stack produces an output vector, which a linear transformation turns into a logits vector with one entry per word in the vocabulary. This logits vector is then passed through a softmax function, which converts it into a probability distribution telling us how likely each vocabulary word is to be the next word in the generated sentence; the highest-probability word is typically taken as the next word.
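A minimal sketch of that output head follows. The tiny vocabulary, the model dimension, and the random weights are made up purely for illustration.

```python
# Project the decoder's final vector to vocabulary-sized logits, then apply
# softmax to obtain next-word probabilities (all values here are toy examples).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]
d_model = 8

decoder_output = rng.standard_normal(d_model)          # final decoder vector for this step
W = rng.standard_normal((d_model, len(vocab)))         # linear projection to vocabulary size
logits = decoder_output @ W                            # one logit per vocabulary word

probs = np.exp(logits - logits.max())
probs /= probs.sum()                                   # softmax: probabilities sum to 1

next_word = vocab[int(np.argmax(probs))]               # greedy choice of the next word
print(dict(zip(vocab, probs.round(3))), "->", next_word)
```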