The Business Problem
An EdTech client aimed to integrate Large Language Models (LLMs) as AI tutors but faced significant business risks. Deploying a model prone to factual inaccuracies (hallucinations) or poor pedagogical methods could damage learner trust, harm educational outcomes, and severely impact their brand reputation.
My Solution & The Deliverable
I developed a comprehensive evaluation framework to benchmark the performance and safety of leading LLMs (including Gemini and ChatGPT) for educational use. The core deliverable was a detailed analysis report based on a custom, multi-faceted rubric I created, focusing on factual accuracy, pedagogical appropriateness, conceptual clarity, and learner engagement. This framework provided the client with actionable, data-driven feedback to inform model selection and fine-tuning.
Tech & Skills Showcase
Core Skills: LLM Evaluation & Benchmarking, Responsible AI, Prompt Engineering
Models Analyzed: OpenAI (GPT series), Google (Gemini series)
Methodology: Custom Evaluation Rubrics, Qualitative & Quantitative Feedback Analysis
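To illustrate how a multi-faceted rubric like the one above can be turned into quantitative model comparisons, here is a minimal sketch. All names (`RubricScore`, `compare_models`, the 1-5 scale, the weighting scheme) are illustrative assumptions, not the client's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical rubric dimensions mirroring the criteria described above.
DIMENSIONS = [
    "factual_accuracy",
    "pedagogical_appropriateness",
    "conceptual_clarity",
    "learner_engagement",
]

@dataclass
class RubricScore:
    """Scores for one model response, on an assumed 1-5 scale per dimension."""
    model: str
    scores: dict  # dimension name -> int (1-5)

    def weighted_total(self, weights: dict) -> float:
        # Weighted average across dimensions; weights are normalized here,
        # so they need not sum to 1.
        total_weight = sum(weights.values())
        return sum(self.scores[d] * weights[d] for d in DIMENSIONS) / total_weight

def compare_models(evaluations: list, weights: dict) -> dict:
    """Average weighted rubric score per model across all graded responses."""
    totals, counts = {}, {}
    for ev in evaluations:
        totals[ev.model] = totals.get(ev.model, 0.0) + ev.weighted_total(weights)
        counts[ev.model] = counts.get(ev.model, 0) + 1
    return {m: totals[m] / counts[m] for m in totals}
```

Weighting lets safety-critical dimensions (e.g. factual accuracy) dominate the ranking while still surfacing engagement differences between models.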
Quantifiable Results & Impact
Provided recommendations that led to a 25% reduction in model hallucinations on key academic subjects.
Identified key areas for fine-tuning that improved learner engagement metrics by over 15% in pilot tests.
Delivered a reusable evaluation framework that became the client's standard for ensuring all future AI tutors are safe and effective.
GitHub repository: https://github.com/juliocode-job/LLM-edtech-safety-evaluation
Interested in hiring an LLM trainer to build and deploy capabilities like these, customized for your business needs?
Send me an email: lemosfranca1234@gmail.com