Last updated: March 06, 2025
By M. Ali Yousuf
[Disclaimer: The code on this page is NOT written by me; it comes from the original author/website credited in each Google Colab notebook. I have made modifications, where needed, so the notebooks run within Google Colab, and have added some extra code and text to explain the code.]
[Note: If you are new to the field of AI, there is a precursor to this course, "Machine Learning For Non-Computer Science Majors", which you can find here: https://sites.google.com/view/aiandml4all/home]
This course is a collection of labs designed to give you hands-on experience. Please note that I am NOT the author of these examples; I have collected them here and added some notes to the notebooks. You will find links to the original versions within the notebooks.
Generative artificial intelligence (AI) is a type of AI that uses existing data to create new and realistic content. This content can include text, images, audio, video, and more. Generative AI is different from other types of AI because it produces new data, rather than just analyzing it. The goal is to create content that is similar to what humans would create. [Text generated by Google AI].
Generative AI utilizes various machine learning methods to create new data. Here are some common ones:
Generative Adversarial Networks (GANs): Two neural networks compete, with one generating new data (generator) and the other evaluating its authenticity (discriminator).
Diffusion Models: These models progressively add noise to data, then learn to reverse the process, essentially denoising the data to create new content.
Variational Autoencoders (VAEs): These models encode data into a latent space, allowing for the manipulation and generation of similar data.
Transformers: While not exclusive to generative AI, transformers are powerful neural network architectures that can be used for various tasks, including generating different creative text formats.
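To make the diffusion idea above more concrete, here is a toy sketch (my own illustration, not from any of the linked notebooks) of the forward noising process in plain Python. It shows why the process works: after enough small noising steps, a sample "forgets" its starting point, so a model trained to reverse the steps can generate new data from pure noise. The step sizes (`betas`) are arbitrary values chosen for the demo.

```python
import math
import random

def forward_diffusion(x0, betas, rng):
    """Forward noising process of a diffusion model (scalar toy case):
    x_t = sqrt(1 - b) * x_{t-1} + sqrt(b) * noise, for each step b."""
    x = x0
    for b in betas:
        eps = rng.gauss(0.0, 1.0)  # standard Gaussian noise
        x = math.sqrt(1.0 - b) * x + math.sqrt(b) * eps
    return x

rng = random.Random(0)
betas = [0.05] * 200  # 200 small noising steps

# Whether we start at +3 or -3, after 200 steps the samples are
# indistinguishable from standard Gaussian noise (mean near 0).
from_pos = [forward_diffusion(+3.0, betas, rng) for _ in range(2000)]
from_neg = [forward_diffusion(-3.0, betas, rng) for _ in range(2000)]

mean_pos = sum(from_pos) / len(from_pos)
mean_neg = sum(from_neg) / len(from_neg)
print(round(mean_pos, 2), round(mean_neg, 2))  # both near 0
```

Generating new content is the reverse of this: a neural network learns to undo one noising step at a time, which is the part the labs below explore.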
Some of the sources used:
SuperAnnotate - Introduction to diffusion models for machine learning
TechTarget SearchEnterpriseAI - What is Generative AI? Everything You Need to Know
IBM - What is Generative AI?
Medium - Generative AI (Part-1)
Encord - An Introduction to Diffusion Models for Machine Learning
When you click on any file below, it will open and you will be able to make changes. However, you will NOT be able to save those changes, as these are my files and you are only a viewer. To get full access, make a copy by clicking File -> 'Save a Copy in Drive'. That way you will have a copy of the file in YOUR Google Drive, under the folder 'Colab Notebooks' (usually shown with a yellow folder icon). You can edit that version as much as you want and it will be saved to your drive. If you mess it up, go back to the original link to the file in my Google Drive and make another fresh copy!
My own presentation on the topic: What is Generative AI (Google Slides)
Another of my presentations on Generative Art (Google Slides).
Understanding LLMs from Scratch Using Middle School Math, https://towardsdatascience.com/understanding-llms-from-scratch-using-middle-school-math-e602d27ec876
Generative AI exists because of the transformer (Financial Times article, very good): https://ig.ft.com/generative-ai/
The IBM Technology YouTube channel has many high-quality videos, all very short (less than 10 minutes each) describing various aspects of the field. Here are two starting points:
What are Generative AI models? (IBM Technology), https://www.youtube.com/watch?v=hfIUstzHs9A&t=430s
How Large Language Models Work (IBM Technology), https://youtu.be/5sLYAQS9sWQ?si=WLtiQsya0x6bQxFb
The CPU version takes about 14 hours to run; on a Google Colab GPU with maximum RAM it takes only 16 minutes! https://colab.research.google.com/drive/1h2lY-zE2O_oAD4uDxQW_o9MAVbw9zeNB?usp=sharing . The GPU version works better! (It consumes around $1 of Google Colab credit.)
Want to know more about vectorizers before starting the above? Check https://colab.research.google.com/drive/1qJ1pvafSEMhFehZ97qqxDKeech5o3a-O?usp=sharing
Want to play more with vector embeddings? See https://colab.research.google.com/drive/1PXmg1erDvxq1Msdww84zxrYwtX2JQA58?usp=sharing
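Before opening those notebooks, it may help to see the core idea of a vectorizer in a few lines. The sketch below (my own toy illustration, not taken from the linked notebooks) turns sentences into count-based bag-of-words vectors and compares them with cosine similarity; the example sentences are made up.

```python
import math
from collections import Counter

def vectorize(text, vocab):
    """Count-based bag-of-words vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two vectors (0 when either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat",
        "a cat played with a mat",
        "stocks fell sharply today"]
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [vectorize(d, vocab) for d in docs]

# The two cat sentences share words, so they are closer to each
# other than either is to the finance sentence.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Real vector embeddings (as in the notebooks above) replace these sparse word counts with dense vectors learned by a neural network, so that sentences with similar meaning, not just shared words, end up close together.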
This example comes from https://huggingface.co/blog/vlms.
The video presentation can be found here: https://www.youtube.com/live/cambXXq9mrs?feature=shared
The notebook itself is a copy and the link to the original can be found in the notebook, https://colab.research.google.com/drive/1-AR1OC6Csm4rPoWTM8vM8sFI55nye4l_?usp=sharing
Fine-tune models for better results and efficiency (using Python), https://platform.openai.com/docs/guides/fine-tuning/fine-tuning
Data preparation and analysis for chat model fine-tuning, https://cookbook.openai.com/examples/chat_finetuning_data_prep
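To give a feel for what the data-preparation step involves, here is a small sketch (mine, simplified from the spirit of the cookbook notebook above) that checks training examples in OpenAI's chat fine-tuning JSONL format: one JSON object per line, each with a "messages" list of role/content pairs. The two training examples are made-up illustrations.

```python
import json

# Two hypothetical training examples in the chat fine-tuning JSONL format.
jsonl_lines = [
    json.dumps({"messages": [
        {"role": "system", "content": "You are a helpful tutor."},
        {"role": "user", "content": "What is a GAN?"},
        {"role": "assistant", "content": "A pair of competing neural networks."}]}),
    json.dumps({"messages": [
        {"role": "user", "content": "Define RAG."},
        {"role": "assistant", "content": "Retrieval-Augmented Generation."}]}),
]

def check_example(line):
    """Basic structural checks on one JSONL training example."""
    ex = json.loads(line)
    msgs = ex.get("messages")
    assert isinstance(msgs, list) and msgs, "each example needs a messages list"
    for m in msgs:
        assert m.get("role") in {"system", "user", "assistant"}
        assert isinstance(m.get("content"), str)
    # The final message is the reply the model learns to produce.
    assert msgs[-1]["role"] == "assistant"

for line in jsonl_lines:
    check_example(line)
print("all examples valid")
```

The cookbook notebook goes much further (token counting, cost estimation, warning about missing system messages); this only shows the shape of the data you will be preparing.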
Use, fine-tune, pretrain, and deploy LLMs lightning fast. Every LLM is implemented from scratch with no abstractions and full control, making them blazing fast, minimal, and performant at enterprise scale. Choose from 20+ LLMs. Requires a Hugging Face token.
https://colab.research.google.com/drive/1Zf8Ta_0SP9gKn91kMxhZAajaWAXzOBNp?usp=sharing
Watch the video https://www.youtube.com/watch?v=pxhkDaKzBaY [You MUST watch the last part of the video, which requires work outside the Google Colab notebook and uses Ollama]
Google Colab, https://colab.research.google.com/drive/1GY3m_AXubttIaTZ-Yv6FMvAYqgBn99qO?usp=sharing
Find free (and tiny) models and code here: https://huggingface.co/models?sort=trending&search=tiny Everything here runs on a CPU.
Google Colab, https://colab.research.google.com/drive/12vz_KCk81gqxRJ1-R67B_7KWPCJ_yTyB?usp=sharing
There are various ways to fine-tune an LLM, but this one requires a paid chatgpt.com account ($20 per month at the time of writing). It lets you fine-tune a model using simple instructions and optional configuration, and you can upload your own documents to the system. https://chatgpt.com/gpts/editor/
What is Retrieval-Augmented Generation and why is it better? https://youtu.be/T-D1OfcDW1M?si=gql4IZ3-i5zp9e_q
"I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3, https://youtu.be/u5Vcrwpzoz8?si=3VmJv1oEFCBrJ1qR
A Crash Course on Building RAG Systems – Part 1 (With Implementation), https://www.dailydoseofds.com/a-crash-course-on-building-rag-systems-part-1-with-implementations/
A Crash Course on Building RAG Systems – Part 2 (With Implementation), https://www.dailydoseofds.com/a-crash-course-on-building-rag-systems-part-2-with-implementations/
A Crash Course on Building RAG Systems – Part 3 (With Implementation), https://www.dailydoseofds.com/a-crash-course-on-building-rag-systems-part-3-with-implementation/
Understanding RAG Part IV: RAGAs & Other Evaluation Frameworks
This version uses plain text and takes just a few lines, making the whole process very clear.
We use OpenAI's API again (you need the openai_api_key, which generally comes with a paid account) https://colab.research.google.com/drive/1CombJNNVmzx9FhsnQz-35JUg0roN2zbG?usp=sharing
This version allows you to upload PDF files and converts each line into plain text.
We use OpenAI's API again (you need the openai_api_key, which generally comes with a paid account)
The Google Colab file can be found here: https://colab.research.google.com/drive/1yXQAzuZam6hd7YS1AU8iGQ1ZNPoGJeUy?usp=sharing
Large Language Models are not up to date and lack domain-specific knowledge: they are trained for generalized tasks, so on their own they cannot answer questions about your own data.
That's where Retrieval-Augmented Generation (RAG) comes in: an architecture that provides the most relevant and contextually important data to the LLMs when answering questions.
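The retrieve-then-generate idea can be sketched in a few lines. This is my own toy illustration (not from the linked labs): it uses simple word overlap as a stand-in for a real embedding-based retriever, and the documents and question are made up. A real system would send the assembled `prompt` to an LLM.

```python
import re

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday to Friday, 9am to 5pm.",
    "Support tickets are answered within one business day.",
]

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, docs):
    """Return the document sharing the most words with the question
    (a real RAG system would rank by embedding similarity instead)."""
    q = words(question)
    return max(docs, key=lambda d: len(q & words(d)))

question = "How many days do I have to return a purchase?"
context = retrieve(question, documents)

# The retrieved passage is prepended so the model answers from it
# instead of relying on stale or generic training data.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(context)
```

The labs below build the same pipeline with real components: a vector database for retrieval and an LLM for generation.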
Learn more about the differences between RAG and Fine Tuning here, https://www.youtube.com/watch?v=00Q0G84kq3M
For this purpose, we'll use the Clarifai platform. Please open a free account at: https://www.clarifai.com/
Here is the Jupyter notebook that achieves this in only a few lines of code:
https://colab.research.google.com/drive/1bHIOtwuU3CGvduZsBeqXCzf52FFSzqW0?usp=sharing
Authored by Richmond Alake. From https://huggingface.co/learn/cookbook/en/rag_with_hugging_face_gemma_mongodb, available here as a Google Colab notebook. The original can be found here: https://colab.research.google.com/github/huggingface/cookbook/blob/main/notebooks/en/rag_with_hugging_face_gemma_mongodb.ipynb#scrollTo=5gCzss27UwWw
My copy, https://colab.research.google.com/drive/1brwqjA97_P6P0WIel-y289lUB6FfJoMs?usp=sharing
This code comes from https://medium.com/@akriti.upadhyay/implementing-rag-with-langchain-and-hugging-face-28e3ea66c5f7
https://colab.research.google.com/drive/1wQFGHy03ux7As0tPkDENquRZh5MIDzlG?usp=sharing
Six lessons and code at https://github.com/decodingml/second-brain-ai-assistant-course/tree/main?tab=readme-ov-file
Advanced RAG Techniques: Elevating Your Retrieval-Augmented Generation Systems, https://github.com/NirDiamant/RAG_Techniques
5 Python Libraries to Build an Optimized RAG System, https://machinelearningmastery.com/5-python-libraries-build-optimized-rag-system/?utm_source=drip&utm_medium=email&utm_campaign=MLM+Newsletter+January+24%2C+2025&utm_content=Python+Libraries+to+Build+an+Optimized+RAG+System+%E2%80%A2+Integrating+Language+Models+Into+Your+Text+Adventure+Games
Building a Retrieval-Augmented Generation (RAG) System with DeepSeek R1: A Step-by-Step Guide, https://www.marktechpost.com/2025/01/27/building-a-retrieval-augmented-generation-rag-system-with-deepseek-r1-a-step-by-step-guide/
ColiVara - Make your RAG application 10x smarter, https://github.com/tjmlabs/ColiVara?tab=readme-ov-file