“One glance at a book and you hear the voice of another person, perhaps someone dead for 1,000 years. To read is to voyage through time.” ― Carl Sagan
Transformer architectures are now the basis of the most advanced systems for modeling complex data structures such as sequences. Transformers, built around the attention mechanism, are the heart of Large Language Models such as GPT-4, Gemini, and Claude. Below is a reading path that helps you build an intuitive understanding of how Transformers work internally.
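To make the core idea concrete before diving into the resources, here is a minimal sketch of scaled dot-product attention, the operation at the center of every Transformer layer. All names and shapes are illustrative and not taken from any of the resources below.

```python
# Minimal sketch of scaled dot-product attention: each output vector is a
# weighted average of the value vectors, with weights derived from
# query-key similarity.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # (batch, seq_len, seq_len)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)                 # each row sums to 1
    return weights @ v                                  # weighted sum of values

# Toy self-attention: queries, keys, and values all come from the same input.
x = torch.randn(1, 4, 8)                                # 1 sequence, 4 tokens, 8 dims
print(scaled_dot_product_attention(x, x, x).shape)      # torch.Size([1, 4, 8])
```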
Recommended reading path:
A very useful and impressive tool for visually understanding LLM Transformers
Recommended reading:
An up-to-date free book on Machine Learning applied to Natural Language Processing and Understanding
Learning by doing
LLMs from Scratch by rasbt (GitHub)
This repository offers a step-by-step guide to building a GPT-like LLM in PyTorch from scratch. It covers everything from understanding attention mechanisms and coding GPT models to pretraining on large datasets and fine-tuning for specific tasks like text classification. It’s an ideal resource for those who want to learn how LLMs are built from the ground up and develop a deeper understanding of their architecture.
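As a taste of what the repository covers, the sketch below (a hedged illustration of the general idea, not the repository's own code) shows the kind of causal self-attention module a from-scratch GPT is built around: each token attends only to itself and to earlier positions.

```python
# Hedged PyTorch sketch of causal self-attention, the building block of a
# decoder-only GPT: a triangular mask prevents attention to future tokens.
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int, context_len: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint query/key/value projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        # Upper-triangular mask marks the future positions to be blocked.
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return self.proj(weights @ v)

# Toy usage: 2 sequences of 16 tokens with 32-dimensional embeddings.
attn = CausalSelfAttention(d_model=32, context_len=128)
print(attn(torch.randn(2, 16, 32)).shape)  # torch.Size([2, 16, 32])
```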
LLM Course by mlabonne (GitHub)
This GitHub repository offers a comprehensive course on Large Language Models (LLMs) with Colab notebooks. It provides explanations on key concepts like tokenization, attention mechanisms, and model training, along with practical implementations. This resource is perfect for both beginners and intermediate learners who want to understand the architecture and workings of LLMs.
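For example, subword tokenization, one of the first concepts the course covers, can be tried in a few lines with the Hugging Face transformers package. The GPT-2 tokenizer here is an illustrative choice, not necessarily the one used in the course notebooks.

```python
# Small illustration of subword tokenization.
# Requires the `transformers` package; downloads the GPT-2 tokenizer on first run.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Transformers turn text into token IDs."
tokens = tokenizer.tokenize(text)   # subword pieces the tokenizer splits the text into
ids = tokenizer.encode(text)        # integer IDs the model actually consumes

print(tokens)
print(ids)
print(tokenizer.decode(ids))        # decoding round-trips back to the original text
```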
NLP with Transformers (Hugging Face, GitHub)
Based on the book Natural Language Processing with Transformers, this repository offers a series of Jupyter notebooks for hands-on learning. It covers a variety of NLP tasks using transformer models, including text classification, named entity recognition, and text generation. This resource is especially useful for understanding how to apply transformers to practical NLP problems using the Hugging Face library.
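A minimal illustration of the Hugging Face usage those notebooks build on is the pipeline API, shown below for the three tasks just mentioned. The models downloaded here are library defaults, not necessarily the ones used in the book.

```python
# Hedged sketch of the Hugging Face `pipeline` API; each call downloads a
# default pretrained model on first use.
from transformers import pipeline

# Text classification (sentiment analysis by default).
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers make NLP much easier."))

# Named entity recognition, with entity spans grouped into whole words.
ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))

# Text generation with GPT-2.
generator = pipeline("text-generation", model="gpt2")
print(generator("Attention is", max_new_tokens=20)[0]["generated_text"])
```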
Training a Language Model from Scratch (Colab, Keras)
This Colab notebook provides an example of training a transformer-based language model from scratch using the Hugging Face Transformers library and TensorFlow. It walks through the entire process of preparing data, setting up a tokenizer, and training the model. It’s ideal for users who want to experiment with transformers using Google Colab’s cloud resources.
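A heavily compressed sketch of that workflow, under my own assumptions (a toy in-memory corpus, a deliberately tiny GPT-2 configuration, and Keras fit with the model's built-in loss), might look like the following; the actual notebook uses a real dataset and its own configuration.

```python
# Hedged sketch: train a small causal language model from scratch with
# TensorFlow and Hugging Face Transformers.
import tensorflow as tf
from transformers import GPT2Config, GPT2TokenizerFast, TFGPT2LMHeadModel

# Reuse an existing tokenizer; GPT-2 has no pad token, so borrow the EOS token.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# A tiny, randomly initialized model (not pretrained).
config = GPT2Config(vocab_size=tokenizer.vocab_size,
                    n_positions=64, n_embd=128, n_layer=2, n_head=2)
model = TFGPT2LMHeadModel(config)

# Toy corpus; a real run would use a large dataset and a proper data pipeline.
corpus = ["to read is to voyage through time", "attention is all you need"]
enc = tokenizer(corpus, padding="max_length", truncation=True,
                max_length=32, return_tensors="tf")

# For causal language modeling the labels are the input IDs; the model shifts
# them internally. (A real setup would also mask padded positions in the labels.)
features = dict(enc)
features["labels"] = enc["input_ids"]
dataset = tf.data.Dataset.from_tensor_slices(features).batch(2)

# Compiling without an explicit loss uses the model's built-in LM loss.
model.compile(optimizer=tf.keras.optimizers.Adam(3e-4))
model.fit(dataset, epochs=1)
```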
A free book on Mathematics for Machine Learning that motivates readers to learn mathematical concepts. The book is not intended to cover advanced machine learning techniques, because plenty of books already do that. Instead, it aims to provide the mathematical skills needed to read those other books.
What do traffic jams, stock market crashes, and wars have in common? They are all explained using complexity, an unsolved puzzle that many researchers believe is the key to predicting and ultimately solving everything from terrorist attacks and pandemic viruses right down to rush-hour traffic congestion.
Complexity is considered by many to be the single most important scientific development since general relativity and it promises to make sense of no less than the very heart of the Universe. Using it, scientists can find order emerging from seemingly random interactions of all kinds, from something as simple as flipping coins through to more challenging problems such as the patterns in modern jazz, the growth of cancer tumours, and predicting shopping habits.