Last updated: December 23, 2024
This (full) course is based on two books:
Sebastian Raschka, “Build a Large Language Model (From Scratch)”, Manning Publications 2024.
The book is still in progress; chapters are currently available through Chapter 7.
https://www.manning.com/books/build-a-large-language-model-from-scratch.
Costs $31, as of today.
The corresponding GitHub page (with information to buy the book) can be found here: https://github.com/rasbt/LLMs-from-scratch/tree/main
A 3-hour video lecture by the author: Building LLMs from the Ground Up, https://www.youtube.com/watch?v=quh7z1q7-uc
An extension of the above, the Medical LLM Project: https://github.com/samadon1/LLM-From-Scratch/tree/main
David Foster, “Generative Deep Learning: Teaching Machines to Paint, Write, Compose, and Play”, 2nd Edition. O'Reilly Media, 2023.
Costs around $50, as of today.
The corresponding GitHub page (with information to buy the book) can be found here: https://github.com/davidADSP/Generative_Deep_Learning_2nd_Edition
Here are the top 10 picks from the hundreds of research papers published on GenAI. https://www.analyticsvidhya.com/blog/2023/12/top-research-papers-on-genai/
(The above list was compiled by K. C. Sabreena Basheer, 15 April 2024.)
List of 27 papers given to John Carmack by Ilya Sutskever (co-founder of OpenAI): "If you really learn all of these, you’ll know 90% of what matters today." Shared by @keshavchan on Twitter; the following list is taken from https://www.linkedin.com/posts/descalo_learning-generativeai-llms-activity-7199041499789959169-UfJp?utm_source=share&utm_medium=member_android
1. The Annotated Transformer (https://lnkd.in/evrqygtu)
2. The First Law of Complexodynamics (https://lnkd.in/eu5aucVm)
3. The Unreasonable Effectiveness of RNNs (https://lnkd.in/e9wht6Js)
4. Understanding LSTM Networks (https://lnkd.in/eY4WnawT)
5. Recurrent Neural Network Regularization (https://lnkd.in/ebrwzuwY)
6. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights (https://lnkd.in/e4f4s9h6)
7. Pointer Networks (https://lnkd.in/e6qcSXYT)
8. ImageNet Classification with Deep CNNs (https://lnkd.in/etrjwGmY)
9. Order Matters: Sequence to sequence for sets (https://lnkd.in/eYrjEHRP)
10. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (https://lnkd.in/ezFVyhyk)
11. Deep Residual Learning for Image Recognition (https://lnkd.in/ejJT79DE)
12. Multi-Scale Context Aggregation by Dilated Convolutions (https://lnkd.in/eN-p4Hi9)
13. Neural Quantum Chemistry (https://lnkd.in/eChquKQi)
14. Attention Is All You Need (https://lnkd.in/eakhSPXf)
15. Neural Machine Translation by Jointly Learning to Align and Translate (https://lnkd.in/eZfrwxDG)
16. Identity Mappings in Deep Residual Networks (https://lnkd.in/eVuuYTTy)
17. A Simple NN Module for Relational Reasoning (https://lnkd.in/e9xYieKc)
18. Variational Lossy Autoencoder (https://lnkd.in/e8XZrzcn)
19. Relational RNNs (https://lnkd.in/eEs3e_MJ)
20. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton (https://lnkd.in/e7V-jw8S)
21. Neural Turing Machines (https://lnkd.in/e3qidTvP)
22. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (https://lnkd.in/eYDgB9cA)
23. Scaling Laws for Neural LMs (https://lnkd.in/ev9s6Pz2)
24. A Tutorial Introduction to the Minimum Description Length Principle (https://lnkd.in/eUJtMXDU)
25. Machine Super Intelligence Dissertation (https://lnkd.in/ebCNq64x)
26. PAGE 434 onwards: Kolmogorov Complexity (https://lnkd.in/ecV-qdfV)
27. CS231n Convolutional Neural Networks for Visual Recognition (https://cs231n.github.io/)
Sebastian Raschka, Building LLMs from the Ground Up: A 3-hour Coding Workshop, https://www.youtube.com/watch?v=quh7z1q7-uc
What are Generative AI models? (IBM Technology), https://youtu.be/hfIUstzHs9A?si=uBaG0DIFPI7-qQ_t
How Large Language Models Work (IBM Technology), https://youtu.be/5sLYAQS9sWQ?si=WLtiQsya0x6bQxFb
Generative AI Fundamentals: Build foundational knowledge of generative AI, including large language models (LLMs), with 4 short videos (Databricks on-demand training), https://www.databricks.com/resources/learn/training/generative-ai-fundamentals
To get a sense of the major steps required to create a model, see the following. No programming, just the steps involved.
Building Your Own Large Language Model (LLM) from Scratch: A Step-by-Step Guide https://www.linkedin.com/pulse/building-your-own-large-language-model-llm-from-guide-ramachandran/
Here is a very long article that explains all of these topics, and much more, in detail. Worth reading.
A Very Gentle Introduction to Large Language Models without the Hype, https://mark-riedl.medium.com/a-very-gentle-introduction-to-large-language-models-without-the-hype-5f67941fa59e
Getting Started with LLMs: A Quick Guide to Resources and Opportunities (by Wendy Ran Wei), https://www.linkedin.com/pulse/getting-started-llms-guide-resources-opportunities-wendy-ran-wei/
A Brief History of Large Language Models (LLM), https://www.linkedin.com/pulse/brief-history-large-language-models-llm-feiyu-chen/
HuggingGPT, https://youtu.be/PfY9lVtM_H0?si=McLJw0PhcbIvxAuo
Simple image learning using Hopfield Nets. See them in action using Google Sheets, https://docs.google.com/spreadsheets/d/1_MrXPz6hWmTvIq5ccsBmd9M1n6kKyPDK3lt_Bzh3alE/edit?usp=sharing
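To make the idea concrete, here is a minimal Hopfield-network sketch in plain Python/numpy (my own illustration, separate from the spreadsheet demo): two small binary patterns are stored with Hebbian learning, a corrupted copy of one is presented, and repeated updates let the network settle back onto the stored pattern.

# A tiny Hopfield-network demo (an illustrative sketch, not the Google Sheets version).
import numpy as np

# Two 5x5 binary "images" (a box and a cross), flattened into +/-1 vectors.
patterns = np.array([
    [ 1, 1, 1, 1, 1,   1,-1,-1,-1, 1,   1,-1,-1,-1, 1,   1,-1,-1,-1, 1,   1, 1, 1, 1, 1],  # box
    [-1,-1, 1,-1,-1,  -1,-1, 1,-1,-1,   1, 1, 1, 1, 1,  -1,-1, 1,-1,-1,  -1,-1, 1,-1,-1],  # cross
])
n = patterns.shape[1]

# Hebbian learning: sum of outer products of the stored patterns, with a zero diagonal.
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)

# Corrupt the "box" pattern by flipping a few pixels, then let the network settle.
state = patterns[0].copy()
state[[0, 6, 12, 18]] *= -1
for _ in range(10):                      # synchronous updates
    state = np.where(W @ state >= 0, 1, -1)

print("recovered the box pattern:", np.array_equal(state, patterns[0]))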
LLMs for Vision - prompting Gemini to explain an image (notebook 2), https://colab.research.google.com/drive/1925e75y1fmE_pYf0W6Tix7qS3LBA4XQi?usp=sharing
LLMs for next-word prediction (requires the CSV file ‘fake_or_real_news.csv’, which can be downloaded from https://github.com/lutzhamel/fake-news/tree/master/data and then uploaded to Google Colab), https://colab.research.google.com/drive/1Lt8qs58pJZAWuRSfjtbihvU11xySRuCN?usp=sharing
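As a toy illustration of next-word prediction (my own sketch, not the notebook's code, which presumably trains a neural model on the news data), the snippet below counts which word follows which in a tiny hand-made corpus and predicts the most frequent follower.

# A minimal bigram next-word predictor (illustrative sketch; the corpus is made up).
from collections import Counter, defaultdict

corpus = [
    "the model predicts the next word",
    "the model learns from the data",
    "the data shapes the next prediction",
]

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))    # 'model' here (ties are broken by insertion order)
print(predict_next("next"))   # 'word'

An LLM replaces this frequency table with a learned model that can generalise to word sequences it has never seen.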
What is fine-tuning? An example using Falcon-7B with MidJourney prompts, https://colab.research.google.com/drive/1jH0OCoQV8boZAQTz4WRtQ6nQvrtsTGjF?usp=sharing
LLaVA Jupyter notebook: https://colab.research.google.com/drive/1-AR1OC6Csm4rPoWTM8vM8sFI55nye4l_?usp=sharing
Word Embeddings, https://colab.research.google.com/drive/1PXmg1erDvxq1Msdww84zxrYwtX2JQA58?usp=sharing
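For a quick taste of word embeddings outside the notebook, here is a minimal sketch using gensim's Word2Vec (my own choice of library; the notebook may do things differently). On such a tiny toy corpus the vectors are noisy, but the calls are the same ones used on real text.

# A minimal word-embedding sketch with gensim (illustrative; real embeddings need far more text).
from gensim.models import Word2Vec

# A toy corpus: each "sentence" is a list of tokens.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=200, seed=1)

print(model.wv["king"][:5])                  # first 5 dimensions of the 'king' vector
print(model.wv.similarity("king", "queen"))  # cosine similarity between two word vectors
print(model.wv.most_similar("dog", topn=2))  # nearest neighbours (noisy on a toy corpus)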
Your fast track to LLM mastery:
1) Implement the data pipeline from Chapters 1 and 2 of Build a Large Language Model From Scratch (BLLMFS), https://www.manning.com/books/build-a-large-language-model-from-scratch.
2) Watch Karpathy's video on training a BPE tokenizer from scratch, https://www.youtube.com/watch?v=zduSFxRajkE (a minimal BPE-merge sketch appears after this list)
3) Read Chapters 3 and 4 of BLLMFS to learn how to implement the model architecture.
4) Watch Karpathy's video on pretraining the LLM, https://www.youtube.com/watch?v=zjkBMFhNj_g&t=4s.
5) Read Chapter 5 of BLLMFS on pre-training the LLM and loading pretrained weights.
6) Check out Appendix E of BLLMFS for tips on adding extra features to the training loop.
7) Study Chapters 6 and 7 of BLLMFS to master finetuning the LLM.
8) Read Appendix E of BLLMFS for parameter-efficient finetuning with LoRA.
9) Check out Karpathy's repo on coding the LLM in C (llm.c): https://github.com/karpathy/llm.c
Props to Sebastian Raschka, PhD, for putting this list together.
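As flagged in item 2, here is a minimal sketch of the byte-pair-encoding merge step (my own simplified illustration, not the code from Karpathy's video or the book): start from the raw UTF-8 bytes of a string, repeatedly find the most frequent adjacent pair of token ids, and replace it with a new id.

# A minimal BPE-style merge loop (illustrative sketch only).
from collections import Counter

def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Start from raw UTF-8 bytes and repeatedly merge the most frequent pair.
text = "low lower lowest"
ids = list(text.encode("utf-8"))
merges = {}
for step in range(5):                      # 5 merges, just for illustration
    pair, count = get_pair_counts(ids).most_common(1)[0]
    new_id = 256 + step                    # new token ids start after the 256 byte values
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
    print(f"merged {pair} -> {new_id} (count {count}), sequence length now {len(ids)}")

A real tokenizer runs thousands of such merges over a large corpus and stores the merge table so new text can be encoded consistently.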