The following section describes recipes for tailoring language models to your brand dataset using OpenAI fine-tuning (API/CLI workflow) or Hugging Face Transformers (Llama 2, Falcon). It is written so students can follow a repeatable, ethical process and document their work for the Touro AI Gallery.
Fine-tuning teaches a model to follow your preferred style, formatting, tone, and domain rules by training on examples of “input → ideal output.” It is most effective when you need consistent behavior at scale (customer support tone, structured outputs, brand voice), not when you simply need more knowledge (use retrieval/RAG for that).
Use fine-tuning when you want:
Consistent brand voice and formatting without long prompts
Reliable structured outputs (JSON, tables, templates)
“Do it our way” responses for a recurring task
Do not fine-tune when:
The problem is missing facts (use a knowledge base + retrieval)
You only need a one-off style (use a strong system prompt + few-shot examples)
Data quality is weak (garbage in, expensive garbage out)
Pick the smallest model that meets quality requirements, then scale up only if needed; OpenAI notes that newer small models can replace older GPT-3.5-style use cases with better cost/performance in many situations. The following examples use the brand Halo42 (https://halo42.com) to illustrate best practices and testing approaches.
Recommended approach:
Baseline with prompting
Use a system prompt + 5–20 exemplars (few-shot).
Measure quality, speed, and cost.
If outputs are inconsistent → try supervised fine-tuning
Best for tone, structure, and repeatable tasks.
If you need domain knowledge updates → use retrieval
Don’t bake facts into the model if they’ll change.
For open-source experimentation / local training → use Hugging Face + PEFT
Great for learning workflows, privacy constraints, or research.
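The prompting baseline in step 1 can be sketched in a few lines. This assembles a system prompt plus few-shot exemplars in the chat-message format used by most chat-completion APIs; the Halo42 voice rules and exemplar texts below are invented placeholders, not real brand guidelines.

```python
# Baseline prompting: a system prompt plus few-shot exemplars, assembled in the
# chat-message format used by most chat-completion APIs. The Halo42 voice rules
# and exemplars are illustrative placeholders.

SYSTEM_PROMPT = (
    "You are the Halo42 brand copywriter. Tone: warm, concise, luxury. "
    "Never make medical claims. End product copy with a call to action."
)

# (input, ideal output) pairs drawn from the gold dataset
FEW_SHOT_EXAMPLES = [
    ("Describe our copaiba oil in two sentences.",
     "Halo42 copaiba oil wraps skin in quiet, weightless care. Discover it today."),
    ("A customer asks if the oil cures eczema.",
     "We can't make medical claims, though many customers enjoy it for daily "
     "care. For skin conditions, please consult a dermatologist."),
]

def build_messages(user_input: str) -> list[dict]:
    """Assemble system prompt + few-shot exemplars + the new request."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for example_in, example_out in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example_in})
        messages.append({"role": "assistant", "content": example_out})
    messages.append({"role": "user", "content": user_input})
    return messages

messages = build_messages("Write a tagline for the spring launch.")
```

If outputs from this baseline are already consistent, you may not need fine-tuning at all; measure before training.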
Important note for students: OpenAI’s fine-tuning support is model-dependent and changes over time. Always confirm which models are currently eligible in the OpenAI fine-tuning docs before starting a project.
Your dataset should look like the real work you want the model to do.
Best-practice dataset rules:
Minimum: 50–100 high-quality examples for a small pilot; 500–2,000 for strong consistency
Consistency beats volume: fewer, cleaner examples outperform messy large sets
Include edge cases: refunds, safety boundaries, “we don’t do that,” ambiguity
Match the desired format exactly (if you want bullet lists, train bullet lists)
Example training pairs (conceptual):
Input: “Write a short product description for Halo42 Skin’s copaiba oil…”
Output: A description in the approved brand voice + compliance constraints
Input: “Customer asks if this cures eczema”
Output: Brand-safe, medically cautious response + redirect language
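Conceptual pairs like these eventually have to become the JSONL chat format OpenAI’s supervised fine-tuning expects: one JSON object per line with a `messages` array. A minimal converter, using invented Halo42 example text:

```python
import json

# Conceptual (input, ideal output) pairs; the content is illustrative.
PAIRS = [
    ("Write a short product description for Halo42 Skin's copaiba oil.",
     "Halo42 copaiba oil: calm, weightless hydration in the approved voice."),
    ("Customer asks if this cures eczema.",
     "We can't make medical claims. For skin conditions, please see a "
     "dermatologist."),
]

SYSTEM_PROMPT = "You are the Halo42 brand assistant."

def to_jsonl(pairs, system_prompt=SYSTEM_PROMPT) -> str:
    """Serialize (input, output) pairs into OpenAI chat-format JSONL."""
    lines = []
    for user_input, ideal_output in pairs:
        record = {"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": ideal_output},
        ]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

jsonl_text = to_jsonl(PAIRS)
```

Keeping the conversion in one small script makes it easy to regenerate the JSONL whenever the reviewed CSV/doc changes.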
OpenAI’s supervised fine-tuning process is broadly:
Prepare a labeled dataset of correct responses
Upload training (and optional validation) files
Create a fine-tuning job
Evaluate outputs and iterate
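In the current OpenAI Python SDK (v1.x), the upload-and-train steps look roughly like the sketch below. The client is passed in as a parameter so the workflow can be exercised without network access; the model snapshot name and file path are placeholders, so confirm currently eligible models in the fine-tuning docs before running this.

```python
# Sketch of the OpenAI supervised fine-tuning steps (v1.x Python SDK).
# Model name and file path are placeholders; confirm eligible models
# in the current fine-tuning docs.

def run_finetune(client, train_path: str,
                 model: str = "gpt-4o-mini-2024-07-18"):
    """Upload a JSONL training file, then create a fine-tuning job."""
    with open(train_path, "rb") as f:
        train_file = client.files.create(file=f, purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=train_file.id,
        model=model,
    )
    return job

# Real usage (requires OPENAI_API_KEY in the environment):
#   from openai import OpenAI
#   job = run_finetune(OpenAI(), "halo42_train.jsonl")
```

Passing the client in (rather than constructing it inside) also lets you unit-test the wiring with a stub before spending money on a real job.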
Recommended workflow:
Build a “gold” dataset
Create a CSV or doc first; review with instructor/teammates; then convert to JSONL.
Add a validation split
Keep 10–20% for validation so you can detect overfitting early.
Run a small pilot job first
Train on a smaller dataset to confirm formatting, tone, and safety rules are learned.
Evaluate with a rubric
Score: accuracy, brand voice, formatting, refusal behavior, hallucination rate.
Iterate
Fix data, not the model, first. Most failures are dataset issues.
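The validation-split step above is easy to get wrong if the shuffle isn’t reproducible. A minimal deterministic 80/20 split (the 20% matches the guidance above; the seed is arbitrary):

```python
import random

def split_dataset(examples: list, val_fraction: float = 0.2, seed: int = 42):
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = examples[:]  # don't mutate the caller's list
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)

train, val = split_dataset([f"example-{i}" for i in range(100)])
```

Fixing the seed means teammates re-running the script get the same split, so validation scores stay comparable across iterations.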
Where the CLI fits:
Many teams use a CLI to upload files and create jobs, but the exact CLI commands can vary by tool version. The durable workflow is the same: upload → create job → monitor → evaluate. Always follow the current OpenAI fine-tuning documentation for the exact commands/parameters.
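The "monitor" step of that durable workflow can be scripted as a polling loop. This sketch assumes the v1.x OpenAI Python SDK job-retrieval call and its terminal status strings; the sleep function is injectable so the loop can be tested without waiting.

```python
import time

# Terminal fine-tuning job statuses in the OpenAI API.
TERMINAL = {"succeeded", "failed", "cancelled"}

def wait_for_job(client, job_id: str, poll_seconds: float = 30,
                 sleep=time.sleep):
    """Poll a fine-tuning job until it reaches a terminal status."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL:
            return job
        sleep(poll_seconds)
```

In practice you would also log intermediate statuses (queued, running) so students can see how long jobs actually take.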
For open-source models, students should use parameter-efficient fine-tuning (PEFT) so training can run on limited hardware.
Recommended “standard recipe”:
Base model: Llama 2 or Falcon variant (choose size that fits GPU/compute)
Fine-tuning method: LoRA or QLoRA using PEFT
Trainer: TRL SFTTrainer (common supervised fine-tuning approach)
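A minimal version of that recipe is sketched below, assuming trl, peft, transformers, and datasets are installed. The hyperparameters (rank 16, alpha 32, q_proj/v_proj targets) are common starting points for Llama-style models, not tuned values, and the exact SFTTrainer signature varies by trl version; heavy imports are kept inside the training function so the hyperparameter helper works without GPU libraries.

```python
def lora_hyperparams(rank: int = 16):
    """Common LoRA starting points (not tuned values)."""
    return {"r": rank, "lora_alpha": 2 * rank, "lora_dropout": 0.05,
            "target_modules": ["q_proj", "v_proj"]}

def train_lora(base_model: str, dataset, output_dir: str = "halo42-lora"):
    """Supervised fine-tuning with a LoRA adapter via TRL's SFTTrainer.

    Heavy imports are local so lora_hyperparams() can be used without
    these libraries installed. Exact SFTTrainer/SFTConfig arguments vary
    by trl version; check the current TRL docs.
    """
    from peft import LoraConfig          # pip install peft
    from trl import SFTTrainer, SFTConfig  # pip install trl

    peft_config = LoraConfig(task_type="CAUSAL_LM", **lora_hyperparams())
    trainer = SFTTrainer(
        model=base_model,       # e.g. a Llama 2 or Falcon checkpoint id
        train_dataset=dataset,  # Hugging Face Dataset of chat/text examples
        peft_config=peft_config,
        args=SFTConfig(output_dir=output_dir, max_steps=100),
    )
    trainer.train()
    trainer.save_model(output_dir)  # saves only the small adapter weights
```

Because only the adapter is saved, the deliverable students publish is megabytes, not gigabytes.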
Best practices:
Use LoRA adapters so you train only a small portion of parameters (faster + cheaper)
Start with a small subset and verify formatting before scaling
Track experiment metadata (base model, LoRA rank, learning rate, dataset version)
Student-friendly deliverable:
Publish the adapter weights (where permitted) and a model card describing:
dataset sources and cleaning steps
intended use and limitations
evaluation results
examples of correct behavior and failure cases
A fine-tuned model should be judged on repeatable criteria, not “it feels better.”
Suggested evaluation rubric:
Brand voice adherence (consistent tone, vocabulary, style)
Formatting compliance (templates, JSON validity, headings)
Task accuracy (answers the question correctly)
Safety and boundaries (refuses when appropriate, avoids medical/legal claims)
Hallucination resistance (doesn’t invent ingredients, policies, or pricing)
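One way to make that rubric repeatable rather than “it feels better”: score each output 0–5 per criterion and aggregate. The criterion names come from the rubric above; the passing threshold is an arbitrary starting point to tune with your instructor.

```python
# Rubric criteria from the evaluation rubric above; scores are 0-5.
CRITERIA = ["accuracy", "brand_voice", "formatting",
            "refusal_behavior", "hallucination_resistance"]

def score_output(scores: dict, passing: float = 4.0):
    """Average the per-criterion scores; flag any criterion below passing."""
    missing = set(CRITERIA) - scores.keys()
    if missing:
        raise ValueError(f"unscored criteria: {sorted(missing)}")
    mean = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    weak = [c for c in CRITERIA if scores[c] < passing]
    return {"mean": mean, "passed": not weak, "weak_criteria": weak}

result = score_output({"accuracy": 5, "brand_voice": 4, "formatting": 5,
                       "refusal_behavior": 3, "hallucination_resistance": 5})
```

Flagging weak criteria (not just the mean) tells you which kind of training data to add in the next iteration.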
Recommended testing set:
25 “easy” common requests
25 edge cases (ambiguous, policy-sensitive)
10 adversarial prompts (try to break rules)
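Some rubric criteria, like formatting compliance and JSON validity, can be checked automatically across the whole test set instead of by hand. The required keys below (product/copy/cta) are invented for illustration; substitute your approved output schema.

```python
import json

def check_json_output(text: str, required_keys=("product", "copy", "cta")):
    """Return (ok, reason) for one model response that should be JSON.

    The required keys are a hypothetical Halo42 schema; use your own.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON: {e}"
    missing = [k for k in required_keys if k not in data]
    if missing:
        return False, f"missing keys: {missing}"
    return True, "ok"

def pass_rate(outputs):
    """Fraction of model outputs that satisfy the format check."""
    return sum(check_json_output(o)[0] for o in outputs) / len(outputs)
```

Running this over the 60-prompt test set gives a single formatting-compliance number you can track across fine-tuning iterations.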
Brand voice and formatting:
“Write 3 ad variations in the Halo42 brand voice with compliance-safe claims”
“Generate product page bullets using the approved structure”
“Convert long text into a short, luxury-toned email snippet”
Customer support:
“Respond to shipping delay with empathy + policy”
“Handle ingredient questions without medical claims”
Content ops:
“Turn a customer review into a compliant testimonial + CTA”
“Generate FAQ entries with consistent structure”
OpenAI fine-tuning (best when):
You want managed infrastructure and fast iteration
You need strong instruction-following and consistent formatting
Hugging Face Transformers + PEFT (best when):
You need open-source experimentation or local control
You want hands-on learning of training workflows and adapters