Exploring Gen AI Capabilities:
Lessons from Google x Kaggle
Gen AI Intensive Course Capstone
Thant Thiha
18 April 2025
1. Building a RAG Application:
Making LLMs Smarter with Retrieval
Large Language Models (LLMs) are powerful, but they have two key limitations:
They only know what they were trained on (knowledge cutoff).
They have limited context windows for input.
Retrieval-Augmented Generation (RAG) addresses both limitations by combining LLMs with external knowledge bases. Here’s how it works in three steps:
Indexing: Relevant documents are embedded and stored in a vector database.
Retrieval: At query time, the system fetches the most relevant documents.
Generation: The LLM generates an answer using both the query and the retrieved information.
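The three steps can be sketched end to end without any external services. This is a minimal illustration under stated assumptions, not the capstone's code: the word-count `embed` function is a hypothetical stand-in for a real embedding model such as Gemini's, `InMemoryVectorStore` is a hypothetical stand-in for a vector database like ChromaDB (whose `collection.add` and `collection.query` methods fill the same roles), and the prompt wording in the hypothetical `build_rag_prompt` helper is illustrative. The final Gemini call is left commented out and assumes the `google-genai` Python SDK, an API key, and a model name chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: lowercase word counts.

    A real pipeline would call an embedding model (such as Gemini's) here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as ChromaDB."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vectors: list[Counter] = []

    def add(self, docs: list[str]) -> None:
        # Step 1, indexing: embed each document once and keep the vector with it.
        for doc in docs:
            self.docs.append(doc)
            self.vectors.append(embed(doc))

    def query(self, question: str, k: int = 1) -> list[str]:
        # Step 2, retrieval: rank stored documents by similarity to the question.
        q = embed(question)
        order = sorted(range(len(self.docs)),
                       key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.docs[i] for i in order[:k]]


def build_rag_prompt(question: str, passages: list[str]) -> str:
    # Step 3, generation: the LLM sees the question plus the retrieved passages.
    # The instruction wording is illustrative, not the capstone's exact prompt.
    context = "\n".join(f"PASSAGE: {p}" for p in passages)
    return ("Answer the question using the passages below, in clear, "
            "non-technical language. Ignore passages that are irrelevant.\n\n"
            f"QUESTION: {question}\n{context}\n")


store = InMemoryVectorStore()
store.add([
    "Retrieval augmented generation supplies retrieved documents to an LLM in the prompt.",
    "The Iris dataset contains measurements of three flower species.",
    "A rubric assigns scores from 1 to 5.",
])
question = "How does retrieval augmented generation work?"
prompt = build_rag_prompt(question, store.query(question, k=1))
print(prompt)
# With the google-genai SDK (an assumption; requires an API key), the prompt
# would then be sent to Gemini, e.g.:
#   from google import genai
#   client = genai.Client(api_key="YOUR_API_KEY")
#   answer = client.models.generate_content(
#       model="gemini-2.0-flash", contents=prompt).text
```

Swapping `embed` for a real embedding model is what makes retrieval semantic rather than keyword-based; the ranking and prompt-assembly logic stays the same.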
In the capstone, I used ChromaDB as the vector store and Gemini’s embedding API to index and retrieve passages about RAG itself. For example, when asked "What is RAG and how does it work in LLMs?", the system retrieved a relevant passage and prompted Gemini to generate a user-friendly, accurate answer:
"RAG, which stands for Retrieval-Augmented Generation, is a helpful technique that addresses two common issues with Large Language Models, or LLMs. First, LLMs sometimes 'hallucinate,' meaning they generate responses that sound believable but are actually factually incorrect. Second, it can be expensive to constantly retrain LLMs to keep them updated with current information. Instead of retraining, RAG allows newer data to be supplied through the prompt. So, RAG can reduce hallucinations, but it doesn't eliminate them entirely. To further reduce hallucinations, the sources from the retrieval can be returned and a quick coherence check can be done by a human or an LLM. This ensures that the LLM response is consistent with the semantically relevant sources."
2. Prompt Engineering:
One-Shot Prompting for Python Code Generation
Prompt engineering is essential for getting the most out of LLMs. By crafting clear instructions and providing examples, we can guide the model to produce high-quality outputs. In the capstone, I explored one-shot prompting, which provides a single example to set the pattern.
Task: Generate structured Python code for data analysis on the Iris dataset using Google’s PACE framework (Plan, Analyze, Construct, Evaluate). The prompt included a sample title and breakdown, and the model responded with well-commented, ready-to-run code.
This approach is useful for automating repetitive coding tasks and keeping code consistent across projects. However, the generated code may still need to be adapted to the requirements and context of each analysis.
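A one-shot prompt pairs a single worked example with the new request so that the model mirrors the example's structure. The sketch below assembles such a prompt; the Titanic title and its four-line PACE breakdown are hypothetical placeholders, not the capstone's actual sample, and the instruction wording and the `one_shot_prompt` helper are illustrative.

```python
# Hypothetical worked example (the "one shot"); not the capstone's actual sample.
EXAMPLE_TITLE = "Title: Exploring the Titanic dataset with PACE"
EXAMPLE_BREAKDOWN = "\n".join([
    "# Plan: state the question and load the data",
    "# Analyze: summarize and visualize the features",
    "# Construct: fit a baseline classification model",
    "# Evaluate: report a classification report on held-out data",
])


def one_shot_prompt(new_title: str) -> str:
    """Build a prompt with one example input/output pair followed by the new input."""
    return (
        "Write well-commented, runnable Python code for data analysis, organized "
        "with the PACE framework (Plan, Analyze, Construct, Evaluate).\n\n"
        f"EXAMPLE INPUT:\n{EXAMPLE_TITLE}\n"
        f"EXAMPLE OUTPUT:\n{EXAMPLE_BREAKDOWN}\n\n"
        f"INPUT:\n{new_title}\n"
        "OUTPUT:\n"
    )


print(one_shot_prompt("Title: Classifying Iris species with PACE"))
```

Ending the prompt with an open `OUTPUT:` slot invites the model to continue in the same format as the example.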
3. Evaluating Gen AI Outputs:
Ensuring Quality and Reliability
Evaluating LLM outputs is crucial, especially for open-ended tasks. The capstone introduced a systematic approach:
Criteria: Instruction following, groundedness (using only the provided context), completeness, and fluency.
Rubric: A 1–5 scale, with 5 being very good (fully meets all criteria).
Automated Evaluation: The model itself can rate its outputs step by step, explaining the reasons for its score.
For example, the generated Python code described above received a score of 5 ("Very good") for following the instructions and for being well-organized and complete:
"STEP 1: The response provides Python code for data analysis using the Iris dataset, following the PACE framework. The code includes data loading, exploration, visualization, model building (SVM), and evaluation with a classification report. The code is well-structured, commented, and covers the key aspects of the analysis.
STEP 2: Based on the evaluation, the response is rated as 5.
Rationale:
Instruction Following: The response successfully translates the PACE framework into a data analysis workflow, covering planning, analysis, construction, and evaluation steps.
Groundedness: The code uses the well-known Iris dataset, which aligns with the prompt.
Completeness: The code covers essential steps in data analysis, from data loading and exploration to model building and evaluation.
Fluency: The code is well-structured, commented, and easy to follow."
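An automated evaluation of this kind needs two pieces: a judge prompt that spells out the criteria and rubric, and a way to read the score back out of the judge's reply. The sketch below covers both under stated assumptions: the prompt wording and the `build_eval_prompt` and `parse_rating` helpers are hypothetical, and extracting the rating with a regular expression keyed to the phrase "rated as N" (as in STEP 2 of the reply above) is a simple substitute for the structured output a production pipeline might request instead.

```python
import re
from typing import Optional

CRITERIA = ["Instruction following", "Groundedness", "Completeness", "Fluency"]


def build_eval_prompt(user_prompt: str, response: str) -> str:
    """Judge prompt listing the criteria and the 1-5 rubric (illustrative wording)."""
    criteria = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "You are an expert evaluator. Assess the AI-generated response below "
        "against these criteria:\n"
        f"{criteria}\n\n"
        "Use a 1-5 scale, where 5 is very good (fully meets all criteria).\n"
        "STEP 1: Explain your assessment.\n"
        "STEP 2: Give the rating in the form 'rated as N'.\n\n"
        f"USER PROMPT:\n{user_prompt}\n\nRESPONSE:\n{response}\n"
    )


def parse_rating(judge_reply: str) -> Optional[int]:
    """Return the 1-5 rating from a reply containing 'rated as N', or None if absent."""
    match = re.search(r"rated as\s+([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


reply = "STEP 2: Based on the evaluation, the response is rated as 5."
print(parse_rating(reply))  # 5
```

Returning `None` instead of guessing when no rating is found makes malformed judge replies easy to detect and re-run.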
The course also suggests improving evaluation confidence by using multiple models or providers (e.g., Gemini, Claude, ChatGPT) to get diverse perspectives.
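One simple way to combine several judges is to average their ratings and flag disagreement. The sketch below assumes each judge model has already produced a 1-5 rating parsed from its reply; the judge names and scores are hypothetical, not capstone results, and using the spread (maximum minus minimum) as a disagreement signal is a design choice of this sketch rather than something the course prescribes.

```python
from statistics import mean


def aggregate_ratings(ratings: dict[str, int]) -> dict:
    """Summarize 1-5 ratings from several judge models."""
    if not ratings:
        raise ValueError("need at least one rating")
    scores = list(ratings.values())
    return {
        "mean": mean(scores),
        # A large spread means the judges disagree; a human should take a look.
        "spread": max(scores) - min(scores),
        "unanimous": len(set(scores)) == 1,
    }


# Hypothetical judges and scores for illustration; not results from the capstone.
summary = aggregate_ratings({"judge_a": 5, "judge_b": 4, "judge_c": 5})
print(summary)
```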
In conclusion, the Google x Kaggle Gen AI Intensive Course Capstone is a great resource for anyone looking to move from theory to practice in Gen AI. Its hands-on approach, using real APIs, vector databases, and evaluation pipelines, makes these advanced techniques accessible and actionable. The Jupyter Notebook for this project is available on Kaggle.
Reference: Addison Howard, Brenda Flynn, Myles O'Neill, Nate, and Polong Lin. Gen AI Intensive Course Capstone 2025Q1. Kaggle, 2025.