Last updated: December 23, 2024
This (full) course is based on two books:
Sebastian Raschka, “Build a Large Language Model (From Scratch)”, Manning Publications 2024.
The book is still in progress; chapters are currently available through Chapter 7.
https://www.manning.com/books/build-a-large-language-model-from-scratch.
Costs $31, as of today.
The corresponding GitHub page (with information to buy the book) can be found here: https://github.com/rasbt/LLMs-from-scratch/tree/main
A 3-hour video lecture by the author: Building LLMs from the Ground Up, https://www.youtube.com/watch?v=quh7z1q7-uc
An extension of the above, the Medical LLM Project: https://github.com/samadon1/LLM-From-Scratch/tree/main
David Foster, “Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play”, 2nd Edition. O'Reilly Media, 2023.
Costs around $50, as of today.
The corresponding GitHub page (with information to buy the book) can be found here: https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition
Here are the top 10 picks from the hundreds of research papers published on GenAI. https://www.analyticsvidhya.com/blog/2023/12/top-research-papers-on-genai/
(The above list was compiled by K. C. Sabreena Basheer, 15 April 2024.)
List of 27 papers given to John Carmack by Ilya Sutskever (co-founder of OpenAI): "If you really learn all of these, you’ll know 90% of what matters today." Shared by @keshavchan on Twitter; the following list is taken from https://www.linkedin.com/posts/descalo_learning-generativeai-llms-activity-7199041499789959169-UfJp?utm_source=share&utm_medium=member_android
1. The Annotated Transformer (https://lnkd.in/evrqygtu)
2. The First Law of Complexodynamics (https://lnkd.in/eu5aucVm)
3. The Unreasonable Effectiveness of RNNs (https://lnkd.in/e9wht6Js)
4. Understanding LSTM Networks (https://lnkd.in/eY4WnawT)
5. Recurrent Neural Network Regularization (https://lnkd.in/ebrwzuwY)
6. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (https://lnkd.in/e4f4s9h6)
7. Pointer Networks (https://lnkd.in/e6qcSXYT)
8. ImageNet Classification with Deep CNNs (https://lnkd.in/etrjwGmY)
9. Order Matters: Sequence to sequence for sets (https://lnkd.in/eYrjEHRP)
10. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (https://lnkd.in/ezFVyhyk)
11. Deep Residual Learning for Image Recognition (https://lnkd.in/ejJT79DE)
12. Multi-Scale Context Aggregation by Dilated Convolutions (https://lnkd.in/eN-p4Hi9)
13. Neural Quantum Chemistry (https://lnkd.in/eChquKQi)
14. Attention Is All You Need (https://lnkd.in/eakhSPXf)
15. Neural Machine Translation by Jointly Learning to Align and Translate (https://lnkd.in/eZfrwxDG)
16. Identity Mappings in Deep Residual Networks (https://lnkd.in/eVuuYTTy)
17. A Simple NN Module for Relational Reasoning (https://lnkd.in/e9xYieKc)
18. Variational Lossy Autoencoder (https://lnkd.in/e8XZrzcn)
19. Relational RNNs (https://lnkd.in/eEs3e_MJ)
20. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton (https://lnkd.in/e7V-jw8S)
21. Neural Turing Machines (https://lnkd.in/e3qidTvP)
22. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (https://lnkd.in/eYDgB9cA)
23. Scaling Laws for Neural LMs (https://lnkd.in/ev9s6Pz2)
24. A Tutorial Introduction to the Minimum Description Length Principle (https://lnkd.in/eUJtMXDU)
25. Machine Super Intelligence Dissertation (https://lnkd.in/ebCNq64x)
26. PAGE 434 onwards: Kolmogorov Complexity (https://lnkd.in/ecV-qdfV)
27. CS231n Convolutional Neural Networks for Visual Recognition (https://cs231n.github.io/)
Sebastian Raschka, Building LLMs from the Ground Up: A 3-hour Coding Workshop, https://www.youtube.com/watch?v=quh7z1q7-uc
What are Generative AI models? (IBM Technology), https://youtu.be/hfIUstzHs9A?si=uBaG0DIFPI7-qQ_t
How Large Language Models Work (IBM Technology), https://youtu.be/5sLYAQS9sWQ?si=WLtiQsya0x6bQxFb
Generative AI Fundamentals: Build foundational knowledge of generative AI, including large language models (LLMs), with 4 short videos (Databricks on-demand training), https://www.databricks.com/resources/learn/training/generative-ai-fundamentals
To get a sense of the major steps required to create a model, see the following. No programming, just the steps involved.
Building Your Own Large Language Model (LLM) from Scratch: A Step-by-Step Guide https://www.linkedin.com/pulse/building-your-own-large-language-model-llm-from-guide-ramachandran/
Here is a very long article that explains all of these topics, and much more, in detail. Worth reading.
A Very Gentle Introduction to Large Language Models without the Hype, https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e
Getting Started with LLMs: A Quick Guide to Resources and Opportunities (by Wendy Ran Wei), https://www.linkedin.com/pulse/getting-started-llms-guide-resources-opportunities-wendy-ran-wei/
A Brief History of Large Language Models (LLM), https://www.linkedin.com/pulse/brief-history-large-language-models-llm-feiyu-chen/
HuggingGPT, https://youtu.be/PfY9lVtM_H0?si=McLJw0PhcbIvxAuo
Simple image learning using Hopfield Nets. See them in action using Google Sheets, https://docs.google.com/spreadsheets/d/1_MrXPz6hWmTvIq5ccsBmd9M1n6kKyPDK3lt_Bzh3alE/edit?usp=sharing
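To make the idea concrete, here is a minimal Hopfield-network sketch in plain Python/numpy (my own illustration, separate from the spreadsheet demo): two small binary patterns are stored with Hebbian learning, a corrupted copy of one is presented, and repeated updates let the network settle back onto the stored pattern.

# A tiny Hopfield-network demo (an illustrative sketch, not the Google Sheets version).
import numpy as np

# Two 5x5 binary "images" (a box and a cross), flattened into +/-1 vectors.
patterns = np.array([
    [ 1, 1, 1, 1, 1,   1,-1,-1,-1, 1,   1,-1,-1,-1, 1,   1,-1,-1,-1, 1,   1, 1, 1, 1, 1],  # box
    [-1,-1, 1,-1,-1,  -1,-1, 1,-1,-1,   1, 1, 1, 1, 1,  -1,-1, 1,-1,-1,  -1,-1, 1,-1,-1],  # cross
])
n = patterns.shape[1]

# Hebbian learning: sum of outer products of the stored patterns, with a zero diagonal.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

# Corrupt the "box" pattern by flipping a few pixels, then let the network settle.
state = patterns[0].copy()
state[[0, 6, 12, 18]] *= -1
for _ in range(10):                      # synchronous updates
    state = np.where(W @ state >= 0, 1, -1)

print("recovered the box pattern:", np.array_equal(state, patterns[0]))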
LLMs for Vision - prompting Gemini to explain an image (notebook 2), https://colab.research.google.com/drive/1925e75y1fmE_pYf0W6Tix7qS3LBA4XQi?usp=sharing
LLMs for next-word prediction (requires the CSV file ‘fake_or_real_news.csv’, which can be downloaded from https://github.com/lutzhamel/fake-news/tree/master/data and then uploaded to Google Colab), https://colab.research.google.com/drive/1Lt8qs58pJZAWuRSfjtbihvU11xySRuCN?usp=sharing
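As a toy illustration of next-word prediction (my own sketch, not the notebook's code, which presumably trains a neural model on the news data), the snippet below counts which word follows which in a tiny hand-made corpus and predicts the most frequent follower.

# A minimal bigram next-word predictor (illustrative sketch; the corpus is made up).
from collections import Counter, defaultdict

corpus = [
    "the model predicts the next word",
    "the model learns from the data",
    "the data shapes the next prediction",
]

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))    # 'model' here (ties are broken by insertion order)
print(predict_next("next"))   # 'word'

An LLM replaces this frequency table with a learned model that can generalise to word sequences it has never seen.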
What is fine-tuning? An example using Falcon-7B with MidJourney prompts, https://colab.research.google.com/drive/1jH0OCoQV8boZAQTz4WRtQ6nQvrtsTGjF?usp=sharing
LLaVA Jupyter notebook: https://colab.research.google.com/drive/1-AR1OC6Csm4rPoWTM8vM8sFI55nye4l_?usp=sharing
Word Embeddings, https://colab.research.google.com/drive/1PXmg1erDvxq1Msdww84zxrYwtX2JQA58?usp=sharing
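For a quick taste of word embeddings outside the notebook, here is a minimal sketch using gensim's Word2Vec (my own choice of library; the notebook may do things differently). On such a tiny toy corpus the vectors are noisy, but the calls are the same ones used on real text.

# A minimal word-embedding sketch with gensim (illustrative; real embeddings need far more text).
from gensim.models import Word2Vec

# A toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=200, seed=1)

print(model.wv["king"][:5])                  # first 5 dimensions of the 'king' vector
print(model.wv.similarity("king", "queen"))  # cosine similarity between two word vectors
print(model.wv.most_similar("dog", topn=2))  # nearest neighbours (noisy on a toy corpus)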
Your fast track to LLM mastery:
1) Implement the data pipeline from Chapters 1 and 2 of Build a Large Language Model From Scratch (BLLMFS), https://www.manning.com/books/build-a-large-language-model-from-scratch.
2) Watch Karpathy's video on training a BPE tokenizer from scratch, https://www.youtube.com/watch?v=zduSFxRajkE (a minimal BPE-merge sketch appears after this list)
3) Read Chapters 3 and 4 of BLLMFS to learn how to implement the model architecture.
4) Watch Karpathy's video on pretraining the LLM, https://www.youtube.com/watch?v=zjkBMFhNj_g&t=4s.
5) Read Chapter 5 of BLLMFS on pre-training the LLM and loading pretrained weights.
6) Check out Appendix E of BLLMFS for tips on adding extra features to the training loop.
7) Study Chapters 6 and 7 of BLLMFS to master finetuning the LLM.
8) Read Appendix E of BLLMFS for parameter-efficient finetuning with LoRA.
9) Check out Karpathy's repo on coding the LLM in C (llm.c): https://github.com/karpathy/llm.c
Props to Sebastian Raschka, PhD, for putting this list together.
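As flagged in item 2, here is a minimal sketch of the byte-pair-encoding merge step (my own simplified illustration, not the code from Karpathy's video or the book): start from the raw UTF-8 bytes of a string, repeatedly find the most frequent adjacent pair of token ids, and replace it with a new id.

# A minimal BPE-style merge loop (illustrative sketch only).
from collections import Counter

def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Start from raw UTF-8 bytes and repeatedly merge the most frequent pair.
text = "low lower lowest"
ids = list(text.encode("utf-8"))
merges = {}
for step in range(5):                      # 5 merges, just for illustration
    pair, count = get_pair_counts(ids).most_common(1)[0]
    new_id = 256 + step                    # new token ids start after the 256 byte values
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
    print(f"merged {pair} -> {new_id} (count {count}), sequence length now {len(ids)}")

A real tokenizer runs thousands of such merges over a large corpus and stores the merge table so new text can be encoded consistently.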