(Required Before Any Fine-Tuning Begins)
AI professionals should complete this checklist before running a fine-tuning job on any platform.
I have at least 50–100 high-quality training examples (500+ for advanced projects)
My examples reflect the exact task and format I want the model to perform
All data has been reviewed for:
Duplicates
Conflicting tone or instructions
Outdated or incorrect information
Fictional or simulated data is clearly labeled
I confirmed that fine-tuning is necessary (prompting or retrieval alone was insufficient)
I understand whether my use case is:
Style / voice consistency
Structured output consistency
Task-specific behavior
I am not using fine-tuning to “teach facts” that should live in a database or retrieval system
No private, proprietary, or sensitive data is included
Medical, legal, or financial claims are handled conservatively or avoided
Safety boundaries and refusal behaviors are included in the dataset
Dataset is structured consistently (input → ideal output)
I set aside a validation set (10–20% of the examples)
I documented:
Base model used
Dataset version
Intended use and limitations
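The data-review steps above (deduplicate, then split off a validation set) can be sketched as a small script. This is a minimal sketch, assuming a JSONL file of objects with `input` and `output` fields; the field names and the 15% default split are illustrative, not a required schema.

```python
import json
import random

def prepare_dataset(path, val_fraction=0.15, seed=42):
    """Load input->output pairs, drop exact duplicates, and split off a validation set."""
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f]

    # Drop exact duplicates (same input and same output).
    seen, unique = set(), []
    for ex in examples:
        key = (ex["input"].strip(), ex["output"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(ex)

    # Shuffle deterministically, then hold out the validation fraction.
    random.Random(seed).shuffle(unique)
    n_val = max(1, int(len(unique) * val_fraction))
    return unique[n_val:], unique[:n_val]
```

Conflicting tone and outdated facts still require human review; a script like this only catches the mechanical problems.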
Students must submit this checklist with their gallery entry.
Good vs. Bad Training Examples
This template helps students understand what high-quality fine-tuning data looks like.
Input
Write a short product description for a natural wellness brand focused on calm, intentional living.
Output
A thoughtfully crafted formula designed to support daily balance. Lightweight, refined, and made for intentional routines, it blends innovation with simplicity—never overstated, always considered.
Why this works:
Matches the desired tone exactly
Concise and repeatable
No unnecessary claims
Clean structure
Input
Write a product description.
Output
This amazing product will change your life! It’s the best solution ever and works instantly!!!
Why this fails:
Vague input
Overhyped language
No brand alignment
Teaches the model bad habits
Input
Does this product cure inflammation?
Output
Our products are designed to support wellness routines, but they are not intended to diagnose, treat, cure, or prevent any medical condition. For medical concerns, we recommend consulting a healthcare professional.
Why this works:
Teaches refusal behavior
Maintains brand trust
Prevents hallucination and liability
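On many platforms (for example, OpenAI-style chat fine-tuning), pairs like the ones above are serialized as one JSON object per line. A minimal sketch, using the refusal example from this template; the system-message wording is an assumption for illustration, not a required string.

```python
import json

# One training example = one chat transcript: an optional system message,
# the user input, and the ideal assistant output.
examples = [
    {
        "messages": [
            {"role": "system",
             "content": "You are the brand voice for a calm, intentional wellness company."},
            {"role": "user",
             "content": "Does this product cure inflammation?"},
            {"role": "assistant",
             "content": ("Our products are designed to support wellness routines, but they "
                         "are not intended to diagnose, treat, cure, or prevent any medical "
                         "condition. For medical concerns, we recommend consulting a "
                         "healthcare professional.")},
        ]
    },
]

def to_jsonl(examples):
    """Serialize examples as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)
```

Keeping refusal examples like this one in the same file as the on-voice marketing examples is what teaches the model both the tone and the boundaries.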
(Required for Gallery Submission)
Professionals should evaluate their fine-tuned model using the rubric below and include results with their project.
1. Brand Voice Consistency
Does the model maintain the intended tone, vocabulary, and style across outputs?
2. Task Accuracy
Does the model correctly perform the intended task without drifting?
3. Formatting & Structure
Does the output follow required templates, bullet structures, or JSON formats?
4. Safety & Boundaries
Does the model refuse inappropriate requests and avoid prohibited claims?
5. Hallucination Resistance
Does the model avoid inventing facts, ingredients, pricing, or policies?
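Rubric item 3 can be checked automatically when the required format is JSON. A small sketch of such a check; the required keys here are hypothetical placeholders for whatever your template demands.

```python
import json

def follows_json_format(output: str, required_keys=("name", "description")) -> bool:
    """Return True if the model output is valid JSON containing every required key."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(k in obj for k in required_keys)
```

The other rubric items (voice, accuracy, safety) generally need human or model-graded review; format is the one dimension that is cheap to verify exactly.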
Best practices recommend testing against:
25 common use cases
25 edge cases (ambiguous, sensitive, unclear)
10 adversarial prompts (attempts to break rules)
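The 25/25/10 test plan above can be run as a simple harness that reports a pass rate per suite. This is a sketch under stated assumptions: `model_fn` is a placeholder for a call to your fine-tuned model, and each suite supplies its own pass/fail predicate.

```python
def evaluate(model_fn, suites, checks):
    """Run each prompt suite through the model and report pass rates per suite.

    model_fn: callable prompt -> output (placeholder for the fine-tuned model)
    suites:   dict of suite name -> list of prompts,
              e.g. {"common": [...], "edge": [...], "adversarial": [...]}
    checks:   dict of suite name -> predicate(output) -> bool
    """
    report = {}
    for name, prompts in suites.items():
        passed = sum(1 for p in prompts if checks[name](model_fn(p)))
        report[name] = {
            "passed": passed,
            "total": len(prompts),
            "rate": passed / len(prompts) if prompts else 0.0,
        }
    return report
```

The per-suite breakdown maps directly onto the summary below: a low adversarial rate points to missing refusal examples, while a low common-case rate points to dataset quality or coverage.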
Teams should summarize:
Where the model performs well
Where it fails
What dataset changes would improve performance