Assessment in schools has always evolved with the times. Not long ago, much of the conversation was about “catching cheating”, policing plagiarism, and relying on traffic-light systems or rigid controls. But with the arrival of generative AI, those approaches feel outdated.
Today, the challenge isn’t about stopping students from using AI. It’s about designing assessment that is authentic, meaningful, and future-ready. Teachers want assessment to reflect real learning, to build skills that matter, and to uphold integrity without creating unnecessary barriers.
To support this shift, we’ve gathered four practical frameworks that teachers can adapt to their own context:
The AUTHENTIC Assessment Framework by cechat, designed to prioritise relevance and deep learning.
Task Design that responds to AI, from Adrian Cotterell.
The Menu Approach by Professor Danny Liu (University of Sydney).
The AI Assessment Scale from Leon Furze, which reframes assessment design in the AI context.
Each offers a way forward in a time when assessment needs to be more than detection; it needs to be a bridge between learning, evidence, and student growth.
Alongside these frameworks, it helps to clear up some persistent myths about AI and assessment.
“Detectors can reliably catch AI-written work.”
Not true. Major tools have low accuracy and known bias against non-native writers; even OpenAI withdrew its own text detector because of poor performance. Don’t base decisions on detector scores. Ask students to show process (drafts, notes, oral checks) instead. (OpenAI; Stanford HAI)
“The safest response is to ban AI.”
Regulators and sector bodies advise redesigning assessment, not blanket bans: build authentic tasks, require disclosure, and embed at least one secure, supervised element where appropriate. (qaa.ac.uk)
“Watermarking/provenance will soon make detection easy.”
For text, watermarking and provenance standards are promising but not robust enough to depend on for academic integrity decisions yet. Use them as one signal, not proof. (NIST)
“LLMs can mark on their own as accurately as teachers.”
Current rules in high-stakes systems require human oversight; zero-/few-shot LLMs underperform for practical scoring without careful tuning and moderation. If you pilot AI-assisted marking, keep a human in the loop and calibrate on exemplars. (GOV.UK)
“RAG or a private ‘walled garden’ eliminates hallucinations.”
Grounding reduces errors but does not remove them; models still fabricate or mis-attribute even with good retrieval. Require sources, quote-checking, and spot oral defences for critical tasks. (NIST)
“Using enterprise AI means our student data will train the models.”
With reputable enterprise offerings, prompts/outputs aren’t used for training by default (and vendors publish data processing agreements, or DPAs). Still: avoid PII in prompts unless your contract explicitly permits it. (OpenAI)
“AI feedback is a like-for-like replacement for teacher feedback.”
Policy guidance frames AI as augmenting feedback, not replacing teacher judgment, because outputs can be incorrect, generic, or misaligned with outcomes. Use AI to draft first-pass comments, then edit them against your rubric. (ERIC)
“The answer is to go back to handwritten exams.”
Quality bodies caution against over-relying on pen-and-paper exams; authenticity and accessibility matter. Mix modalities: supervised/practical/oral components plus process evidence in coursework. (blogs.qub.ac.uk)
“AI makes assessment fair by default.”
Bias, transparency, and explainability remain live issues; regulators emphasise safeguards to protect fairness and public confidence. Keep bias checks, diverse exemplars, and moderation panels. (GOV.UK)
“Students fully own AI-only outputs and can submit them as original work.”
In some jurisdictions (e.g., the U.S.), purely AI-generated work isn’t copyrightable; meaningful human authorship is required. That’s separate from academic originality rules, but it matters for declarations and reuse. Require disclosure of AI assistance. (Federal Register)
Taken together, these myths point to a handful of practical moves:
Require disclosure of any AI assistance and show-the-work artefacts (drafts, planning notes, prompts, citations, a short viva). (qaa.ac.uk)
Redesign tasks for authenticity (local data, iterative products, performance pieces, fieldwork, portfolios). (TEQSA)
Use AI as a co-marker or coach, not the examiner: generate draft feedback, then have a teacher moderate it against the rubric. (ERIC)
Protect data: use enterprise tools with no-training-on-customer-data terms; avoid PII in prompts unless contracts and consent cover it. (OpenAI)
Assess sources, not just answers: insist on verifiable references; penalise unsupported claims, not “AI use” per se. (NIST)