GenAI can transform educational assessments by improving efficiency, engagement, equity, and personalization, but without shared evaluation standards, AI-generated assessments risk low-quality outcomes. This project analyzes over 250 recent research studies to identify leading evaluation practices and create guidance for schools, EdTech developers, and measurement researchers.
Build a consensus-based approach for analyzing GenAI research to evaluate validity, reliability, and fairness.
Human-label at least 250 research studies using the approach.
Pilot large language models to scale rubric application.
Provide actionable guidance for developers, educators, and policymakers.
Students – Shorter, more engaging, and more equitable assessments.
Educators & System Leaders – Trusted guidance to select high-quality tools.
EdTech & Assessment Providers – Shared rubric and benchmarks to accelerate innovation.
Professional Communities – Members of NCME, ISTE, and 1EdTech gain reusable, evidence-based resources.
Research review → Human-in-the-loop labeling → LLM pilot → Synthesis & Recommendations
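To make the human-in-the-loop labeling and LLM pilot stages concrete, here is a minimal sketch of what a single annotation record and a rubric-application prompt could look like. The field names, the 1–4 scale, and the prompt wording are illustrative assumptions, not the project's actual rubric or prompt.

```python
from dataclasses import dataclass, asdict
import json

# Rubric dimensions named in the project objectives; the rating scale is assumed.
RUBRIC_DIMENSIONS = ("validity", "reliability", "fairness")


@dataclass
class StudyAnnotation:
    """One human- or LLM-produced rubric rating for a single study (illustrative schema)."""
    study_id: str
    rater: str          # e.g. "human:rater_03" or "llm:pilot_model"
    scores: dict        # dimension -> rating on an assumed 1-4 scale
    evidence_notes: str # brief justification grounded in the study text


def build_rubric_prompt(abstract: str) -> str:
    """Assemble a placeholder prompt asking a model to apply the rubric to one study."""
    criteria = ", ".join(RUBRIC_DIMENSIONS)
    return (
        f"Rate the following study on {criteria} using a 1-4 scale. "
        "Return JSON with one integer per criterion and a brief justification.\n\n"
        f"Study abstract:\n{abstract}"
    )


if __name__ == "__main__":
    # Example human label; all values are made up for illustration.
    example = StudyAnnotation(
        study_id="study-0001",
        rater="human:rater_03",
        scores={"validity": 3, "reliability": 2, "fairness": 4},
        evidence_notes="Expert review of item alignment reported; no test-retest data.",
    )
    print(json.dumps(asdict(example), indent=2))
    print(build_rubric_prompt("A GenAI item-generation study in middle-school mathematics..."))
```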
Publicly available evidence framework and evaluation rubric.
Annotated dataset of 250+ studies.
Final synthesis report summarizing key findings and gaps.
Web portal hosting all materials for broad accessibility.
Open-access materials via a dedicated portal with Creative Commons licensing.
No paywalls; designed for under-resourced schools and public institutions.
Dissemination through conferences, webinars, and professional networks.