AI Reviewing

Overview

Companion’s AI Reviewing Initiative brings artificial intelligence to the writing tasks used across EnglishConnect 3 (EC3), PathwayConnect (PC), and the BYU English Language Assessment (ELA). These programs currently rely on large numbers of human reviewers or older machine‑learning scoring systems. By embedding a unified, rubric‑aligned AI writing grader directly in Companion (mobile + web), we can deliver faster feedback, lower operating costs, and consistent scoring quality at scale.

Main Objective and Expected Outcomes

Our primary objective is to deliver a fast, reliable, rubric‑aligned AI writing assessment that reduces manual review costs while meeting program quality standards. The table below summarizes the key targets, current status, and what is needed to reach production launch.

Why It Matters

Educators and program leads asked a simple question: What real problem does this solve, and why invest now? The answer is that the AI grader automates, at scale, the writing-review work that humans do today. Learners receive feedback in minutes instead of waiting for a human reviewer, and programs can redirect budget from manual grading to instruction.

Faster Feedback, Stronger Learning Cycles. A human typically needs 10–15 minutes to review an EC3 5-prompt block; the AI returns rubric-aligned scores and comments in under 1 minute and can handle peak workloads without a backlog. Faster turnaround encourages more attempts, revision, and skill growth.
Meaningful Cost Relief. EC3 depends on paid reviewers, and ELA relies on a custom-hosted scoring engine. Consolidating into a single AI service reduces reviewer hours and legacy technology overhead, freeing funds for learner support and instructional improvement.
Quality That Meets Program Standards. Leadership defined 70% agreement with human scoring as the minimum acceptable bar. Current EC3 pilot results show ≈85% alignment, demonstrating that automated scoring can deliver trusted results while still benefiting from targeted human oversight.
Built to Scale Across Programs. The system learns from existing human‑scored data and adapts to each program’s rubric (EC3 Color; ELA ACTFL). This lets us extend one service across EnglishConnect 3, PathwayConnect, and BYU ELA without multiplying reviewer headcount or maintaining separate aging systems.

How It Works

1. Data & Training

Program teams provide historical human‑scored student responses (spreadsheet format). The AI trains on a portion and validates on a separate hold‑out set to measure alignment with human judgment. Iterative training has already produced strong accuracy in EC3 pilot testing.
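
As a rough illustration of the hold-out approach described above, the sketch below shuffles human-scored responses, sets a fraction aside for validation, and computes exact-match agreement between human and AI scores. The function names, the 20% hold-out fraction, and the record layout are assumptions made for illustration; the source does not specify how Companion splits its data or which agreement metric it uses.

```python
import random


def split_holdout(records, holdout_fraction=0.2, seed=0):
    """Shuffle records and split them into (train, holdout) lists.

    holdout_fraction is an assumed value, not one taken from the pilot.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]


def agreement_rate(score_pairs):
    """Fraction of (human_score, ai_score) pairs with an exact match."""
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("no scored pairs to compare")
    return sum(1 for human, ai in pairs if human == ai) / len(pairs)


# Toy rows standing in for spreadsheet data: (response_id, human_score).
records = [(f"resp-{i}", "Green") for i in range(10)]
train, holdout = split_holdout(records)
```

Keeping the hold-out rows out of training is what makes the agreement figure a fair estimate of how the grader behaves on responses it has not seen.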

2. Automated Scoring

When a student submits a 5‑prompt writing block (EC3) or an admissions writing sample (ELA), the AI applies the appropriate program rubric and returns an overall score plus brief feedback comments. For EC3, results display in the familiar Green / Yellow / Red format; for ELA, results map to the ACTFL proficiency scale.

3. Human‑in‑the‑Loop Quality

At launch, a small supervisory reviewer pool will periodically audit AI scores, resolve edge cases, and flag items for retraining. This ensures confidence while reducing the large, ongoing reviewer effort currently required.

4. Continuous Improvement

Instructional and assessment leads can periodically upload new “gold” examples (high / mid / low performance) so the model stays aligned as curriculum changes or new learner error patterns emerge. Quality dashboards will surface where additional calibration data is needed.

EnglishConnect 3 Reviewer (Demo)

English Language Assessment (Demo)

This demo illustrates how the AI functions; it does not reproduce the student experience. For demonstration purposes, it uses a different layout and shows the final evaluation result. In the actual student experience, this result is not displayed; instead, students receive a message telling them whether they passed the exam.

Progress & Next Steps

The current phase focuses only on assessing English writing skills for both EC3 and ELA.

EnglishConnect 3 (EC3) AI Reviewer

Pilot accuracy is currently ≈85% (above the 70% acceptance threshold).
Engineering integration is estimated at ~70–75% complete.
Remaining steps: secure production platform access, finalize API integration, and run a controlled pilot with real learner submissions.
Target launch: Block 2 of 2026, subject to access approvals.

English Language Assessment (ELA) AI

Requested to replace an older hosted ML scoring system used in Pathway/BYU admissions.
Early development: the team is collecting high-quality, representative writing samples for each ACTFL proficiency level (Novice through Advanced) so the AI can be trained properly.
Lower near‑term priority while EC3 production work completes; the roadmap will be set once sufficient calibration data is available.

Impact & Results (Preliminary)

Accuracy: EC3 pilot shows ≈85% agreement with human reviewers, above the 70% threshold defined by program leadership.
Speed: AI returns feedback in under 1 minute per submission and can process ~2,000 responses in under an hour—eliminating backlogs during peak periods.
Cost Efficiency: Reducing paid reviewer hours and retiring a legacy hosted scoring stack creates room in the budget for instructional innovation and student services.
Student Experience: Immediate, actionable feedback encourages more frequent practice attempts and earlier identification of grammar, clarity, and task‑completion issues.
Operational Scalability: One service can support multiple programs and rubrics without rebuilding separate tools.

EC3 Preliminary Results Analysis

High‑confidence scoring: Approximately 70% of EC3 writing submissions show a 3/3 match, meaning the AI assigns the same score as the human reviewer on all three questions. This indicates strong rubric alignment and supports immediate use in a production setting.
Acceptable variance band: An additional 19% of submissions fall into the 2/3 match bucket, differing on only one question. Combined with perfect matches, ≈89% of submissions meet or exceed the program’s ≥70% concordance benchmark, confirming readiness for rollout once infrastructure is provisioned.
Focused refinement window: Only ~11% of submissions (1/3 and 0/3 matches) display significant divergence. These low‑agreement cases are ideal “edge‑case” material for targeted retraining and error analysis.
Efficient human‑in‑the‑loop model: Because the disagreement slice is small, a modest supervisory reviewer pool can audit these specific submissions weekly, tag mis‑scored questions, and feed corrections back into the model, maintaining quality without high labor cost.
Final Conclusion: With nearly nine out of ten submissions already meeting the acceptance bar, the AI grader delivers production‑ready performance, enabling rapid student feedback and cost‑effective scaling for EC3.
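
The match buckets above can be tallied with a few lines of code. The sketch below is a minimal example, assuming each submission carries one human score and one AI score per question. The function names and the toy data are hypothetical and are not the pilot data. With the shares reported above, 70% + 19% gives the ≈89% figure, and the remaining ~11% is the low-agreement slice.

```python
from collections import Counter


def match_buckets(submissions):
    """Count submissions by the number of questions where AI == human.

    submissions: iterable of (human_scores, ai_scores) pairs, where both
    members are equal-length sequences of per-question scores.
    """
    counts = Counter()
    for human_scores, ai_scores in submissions:
        if len(human_scores) != len(ai_scores):
            raise ValueError("human and AI score lists differ in length")
        matches = sum(h == a for h, a in zip(human_scores, ai_scores))
        counts[matches] += 1
    return counts


def bucket_shares(counts):
    """Convert bucket counts into fractions of all submissions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no submissions to summarize")
    return {matches: n / total for matches, n in counts.items()}


# Toy data only: four submissions with three questions each.
toy = [
    (["Green", "Yellow", "Red"], ["Green", "Yellow", "Red"]),     # 3/3 match
    (["Green", "Green", "Red"], ["Green", "Green", "Red"]),       # 3/3 match
    (["Green", "Yellow", "Red"], ["Green", "Yellow", "Yellow"]),  # 2/3 match
    (["Green", "Yellow", "Red"], ["Red", "Green", "Yellow"]),     # 0/3 match
]
shares = bucket_shares(match_buckets(toy))
```

The submissions in the 1/3 and 0/3 buckets are the ones a supervisory reviewer would pull for the weekly audit.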

Risks & Mitigation

Responsible rollout of AI‑based writing assessment requires more than model accuracy. We must ensure the right rubric is applied to the right program, that we have enough labeled data to calibrate ELA’s ACTFL scale, and that infrastructure (Azure AI Search, Blob Storage, production credentials) is in place before EC3 goes live at scale. We are launching with supervisory human review, clearly defined agreement thresholds, staged integration checkpoints, and institutional safeguards around student data privacy. The table below summarizes the major risks, our mitigation strategies, and current status.

Governance & Quality Assurance

To preserve academic trust, Companion will implement transparent quality controls:

Accuracy Dashboard: Regular comparison of AI vs. human sample scores by rubric category.
Rubric Alignment Reviews: Quarterly working sessions with instructional leads (EC3 Color; ELA ACTFL) to confirm interpretation and threshold levels.
Alerting & Escalation: Automated alerts if rolling accuracy falls below program tolerance; trigger targeted human review sampling.
Data Privacy & Security: Student submissions processed within approved institutional cloud environments; personally identifiable data minimized in model training sets.
Version Tracking: Documented change log for rubric updates, model retrains, and scoring parameter adjustments.
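
One way the alerting control could work is a rolling comparison of audited AI scores against human scores. The sketch below is an assumption-laden illustration, not Companion's implementation: the class name, the window size, and the minimum sample count are hypothetical, and the 0.70 default simply reuses the 70% acceptance bar mentioned earlier as a stand-in for "program tolerance".

```python
from collections import deque


class RollingAgreementMonitor:
    """Track exact-match agreement over the most recent audited scores."""

    def __init__(self, window=200, threshold=0.70, min_samples=50):
        if min_samples > window:
            raise ValueError("min_samples cannot exceed window")
        self.threshold = threshold      # assumed tolerance (70% bar)
        self.min_samples = min_samples  # avoid alerting on tiny samples
        self._results = deque(maxlen=window)  # oldest entries drop off

    def record(self, human_score, ai_score):
        """Store one audited comparison and return True if an alert is due."""
        self._results.append(human_score == ai_score)
        return self.should_alert()

    def rate(self):
        """Current rolling agreement, or None if nothing is recorded yet."""
        if not self._results:
            return None
        return sum(self._results) / len(self._results)

    def should_alert(self):
        """Alert only once enough samples exist and agreement is too low."""
        if len(self._results) < self.min_samples:
            return False
        return self.rate() < self.threshold


# Small window so the behavior is easy to follow by hand.
monitor = RollingAgreementMonitor(window=4, threshold=0.70, min_samples=4)
alerts = [
    monitor.record("Green", "Green"),
    monitor.record("Green", "Green"),
    monitor.record("Green", "Green"),
    monitor.record("Green", "Red"),  # window is 3 of 4 matching: no alert
    monitor.record("Green", "Red"),  # window is 2 of 4 matching: alert
]
```

An alert from a monitor like this would be the trigger for the targeted human review sampling described in the list above.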

Quick FAQ

Does this replace human reviewers entirely? Not immediately. We will retain a smaller supervisory reviewer pool to check quality, guide improvements, and handle edge cases.
What level of accuracy is “good enough”? Program leadership set ≥70% agreement with human scoring as acceptable. EC3 pilot performance is already ≈85% and will continue to improve with calibration.
Will this slow down students? The opposite—feedback in under a minute supports same‑session revision and more frequent practice.
Is it secure? Student data is processed in institutionally approved cloud environments with controlled access and minimal retention of personal identifiers.
