Dr. Danielle R. Thomas

I am a Systems Scientist within the Human-Computer Interaction Institute at Carnegie Mellon University, where I serve as the Research Lead for PLUS—Personalized Learning Squared, a hybrid human-AI tutoring project led by Prof. Ken Koedinger at CMU in collaboration with Carnegie Learning, Inc. and Stanford University. I also serve as the Director of Research to Practice at The National Tutoring Observatory, a research infrastructure led by Prof. Rene Kizilcec at Cornell University. The Observatory aims to enable researchers to unlock the secrets behind effective teaching by analyzing "a million tutor moves" of data.

Collectively, these projects have been supported by the Learning Engineering Virtual Institute, Gates Foundation, Chan Zuckerberg Initiative, Walton Family Foundation, Overdeck Family Foundation, Richard King Mellon Foundation, and Google DeepMind.

My research centers on improving student learning outcomes through: 1) uncovering new insights into teaching and learning, 2) developing effective hybrid human-AI tutoring systems, and 3) leveraging AI to affordably scale learning interventions and enhance tutor and teacher training. Check out my CV for more on my experience, education, and publications.

As a former teacher, principal, and teacher educator, my first-hand experiences fuel the mission to greatly improve teaching and learning for the students who need it the most—the chronic disengagers, the historical strugglers, and the kids that say, "I hate math."

How to Fix the 5% Problem

In this Learning Engineering Virtual Institute (LEVI) article, I showcase how even the most advanced AI tutoring tools can’t help students who don’t use them. Research shows that only ~5% of students engage with math tutoring programs enough to see measurable gains. I explain how human-AI programs are addressing this challenge by embedding tutoring into daily classroom routines and giving teachers the tools to guide student engagement. This piece highlights how thoughtful integration, goal-setting, and teacher-led supports can ensure the missing 95 percent benefit from AI-assisted learning.

AI + Human Tutors = A New Equation for Math Success

What if every student had their own personal math tutor?

NBC's "The Learning Shift" recently featured our work on PLUS Tutoring to show how students are learning with a combination of human tutors and AI. See it in action and hear directly from our teachers and students. PLUS is already reaching students in 13 schools across 4 states—most from low-income backgrounds—providing one-on-one support they might not otherwise receive.

Watch the feature

Read the article

Paper accepted to AIME-CON (October 2025)

Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation

Humans are biased and inconsistent. Yet, we keep trusting them to define “ground truth.” This paper questions the overreliance on inter-rater reliability in educational AI and proposes a multidimensional approach leveraging expert-based approaches and close-the-loop validity to build annotations that reflect impact, not just agreement. It's time we do better. [link]

Danielle R. Thomas, Conrad Borchers, & Kenneth R. Koedinger.

In Artificial Intelligence in Measurement and Education, Oct. 27-29, 2025. Pittsburgh, PA.

Papers accepted to ECTEL (September 2025)

LLM-Generated Feedback Supports Learning If Learners Choose to Use It

Feedback boosts learning—but can AI do the same? In over 2,600 lessons with delayed corrective feedback for all, learners who engaged with LLM-generated feedback saw modest gains without spending extra time, showing LLMs can support learning. [link]

Danielle R. Thomas, Conrad Borchers, Shambhavi Bhushan, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger.

In The 20th European Conference on Technology Enhanced Learning, Sept. 15-19, 2025. Durham and Newcastle, UK.

Detecting LLM-Generated Short Answers and Effects on Learner Performance

We fine-tuned GPT-4o to detect LLM-generated short answers in online lessons, outperforming GPTZero and a stylometric baseline. Learners flagged for misusing LLMs performed significantly better on posttests, suggesting AI-generated responses may assist learners in bypassing meaningful learning. [link]

Shambhavi Bhushan, Danielle R. Thomas, Conrad Borchers, Isha Raguvanshi, Ralph Abboud, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger.

In The 20th European Conference on Technology Enhanced Learning, Sept. 15-19, 2025. Durham and Newcastle, UK.

Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study

We evaluate the feasibility of using LLMs to assess tutor behaviors in real-world math tutoring transcripts. Multiple models reliably detected and evaluated key tutor moves—effective praise and error response—with high alignment to human judgments, suggesting promise for scalable, low-cost tutor assessment. [link]

Danielle R. Thomas, Conrad Borchers, Jionghao Lin, Sanjit Kakarla, Shambhavi Bhushan, Ralph Abboud, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger.

In The 20th European Conference on Technology Enhanced Learning, Sept. 15-19, 2025. Durham and Newcastle, UK.

Paper presented at AIED in Palermo (July 2025)

Improving Open-Response Assessment with LearnLM

LearnLM, a fine-tuned model trained on pedagogical data, outperforms general models like GPT-4o at grading tutor responses in realistic tutoring scenarios. Given the notorious ambiguity of using human scores as "ground truth," we introduce a clever predictive validity method for establishing truth and rethinking assessment. No red pens required! [link]

Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Shambhavi Bhushan, Alex Houk, Shivang Gupta, Erin Gatz, & Kenneth R. Koedinger.

In The 26th International Conference on Artificial Intelligence in Education, July 21-25, 2025. Palermo, Italy (2025).

Papers presented at LAK in Dublin (March 2025)

Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT

MCQs often get a bad rap, but is it justified? Here we find no differences in learning among those engaging with MCQs, open responses, or both, but MCQs are faster to complete. Despite using GenAI to automatically grade open responses, we don't plan on getting rid of MCQs just yet! Dataset and AI prompts included to try it yourself. Let's support open science! [link]

Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz, & Kenneth R. Koedinger.

In The 15th International Learning Analytics and Knowledge Conference (LAK), March 3-8, 2025, Dublin, Ireland (2025)

Do Tutors Learn from Equity Training and Can Generative AI Assess It?

Human tutors need training on how to support students in math, but what about helping tutors attend to equity? Here we use GenAI to assess tutors' equity skills, showing pre- to post-test gains. Dataset and support materials included. [link]

Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz, & Kenneth R. Koedinger

In The 15th International Learning Analytics and Knowledge Conference, March 3-8, 2025, Dublin, Ireland (2025)

Highlights from 2024 & 2025

Prof. Koedinger and I received funding from Google DeepMind to use generative AI to assess tutors in training and tutoring

I attended Google's Learning in the AI Era event embracing the importance of curiosity, collaboration, and critical thinking in the AI age

As a panelist at AIED24 in Recife, Brazil, I discussed school perspectives and future classroom use of AI in schools

PLUS in action...

PLUS awarded Best Demo at AIED2023 in Tokyo, Japan

PLUS expands across the country

Active research on human-AI tutoring systems

Past pubs...

The Neglected 15%: Positive Effects of Hybrid Human-AI Tutoring Among Students with Disabilities

We conduct a two-study quasi-experiment to determine the impact of hybrid human-AI tutoring among students with and without disabilities in general education classrooms. We find positive effects among students. In particular, students with disabilities may benefit more from the motivational benefits of human tutor interaction. [link]

Danielle R. Thomas, Erin Gatz, Shivang Gupta, Vincent Aleven, & Kenneth R. Koedinger

In The 25th Artificial Intelligence in Education (AIED) Conference, July 7-13, 2024, Recife, Brazil (2024)

Learning and AI Evaluation of Tutors Responding to Students Engaging in Negative Self-Talk

How do you respond to students saying, "I am dumb" or "I can't do this"? We (and generative AI!) assess the performance of 60 tutors within an online lesson on responding to students engaging in negative self-talk. We find evidence of tutor learning, with GPT-4 demonstrating high absolute performance. This LLM assessment system can easily scale from 60 to 600 tutors, making it a game-changer for evaluating human tutors at scale. [link]

Danielle R. Thomas, Jionghao Lin, Shambhavi Bhushan, Ralph Abboud, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger

In The 11th ACM Conference on Learning @ Scale (L@S), July 18-20, 2024, Atlanta, Georgia (2024)

Using Generative AI to Provide Feedback to Adult Tutors in Training and Assess Real-life Performance

This work overviews the progress of the PLUS project towards using generative AI for tutoring feedback and assessment. While using generative AI shows promise as a low-cost and efficient method for these uses, ethical considerations and practical implications are discussed to ensure fair and responsible use. [preprint] [presentation]

Danielle R. Thomas, Erin Gatz, Shivang Gupta, Jionghao Lin, Cindy Tipper, & Kenneth R. Koedinger

In The 17th Annual Learning Ideas Conference, June 12-14, 2024, New York, NY (2024)

Improving Student Learning with Hybrid Human-AI Tutoring: A Three-Study Quasi-Experimental Investigation

We introduce hybrid human-AI tutoring and implement the model across three diverse schools. We find positive impacts on learning outcomes with evidence suggesting lower achieving students may benefit more from tutoring than higher achieving students—a promising finding. [link]

Danielle R. Thomas, Jionghao Lin, Erin Gatz, Ashish Gurung, Shivang Gupta, Kole Norberg, Stephen E. Fancsali, Vincent Aleven, Lee Branstetter, Emma Brunskill, Kenneth R. Koedinger

In The 14th Learning Analytics and Knowledge (LAK) Conference, March 18-22, 2024, Kyoto, Japan (2024)

A Meta-Analytic Investigation of the Impact of Middle School STEM Education: Where Are All the Students of Color?

In this systematic review, we determine the average STEM student outperforms ~70% of their peers. Most notably, underrepresented minority students benefit given one caveat—they must be given the opportunity. [Journal article link]

Danielle R. Thomas & Karen H. Larwin

International Journal of STEM Education (2023)

Towards the Future of AI-Augmented Human Tutoring in Math Learning

This workshop highlights the challenges and opportunities of AI-in-the-loop math tutoring and encourages discourse in the AIED community. Access papers and presentations here.

Vincent Aleven, Richard Baraniuk, Emma Brunskill, Scott Crossley, Dora Demszky, Stephen Fancsali, Shivang Gupta, Kenneth R. Koedinger, Chris Piech, Steve Ritter, Danielle R. Thomas, Simon Woodhead, Wanli Xing

In The 24th Artificial Intelligence in Education (AIED) Conference, July 3-7, 2023, Tokyo, Japan (2023)

So You Want to Be a Tutor? Professional Development and Scenario-based Training for Adult Tutors

We introduce Personalized Learning Squared (PLUS), a human-AI tutoring platform designed to improve tutoring efficiency. PLUS leverages student-facing AI-powered math software and a tutor-facing personalized dashboard to provide the right support, to the right student, and at the right time.

Danielle R. Thomas, Shivang Gupta, Erin Gatz, Cindy Tipper, Kenneth R. Koedinger

In The 16th Annual Learning Ideas Conference, New York, NY (2023)

Using Large Language Models to Provide Explanatory Feedback to Human Tutors

We introduce a method of providing explanatory feedback to human tutors on their responses to open-ended questions leveraging LLMs using named entity recognition.

Jionghao Lin, Danielle R. Thomas, Feifei Han, Shivang Gupta, Wei Tan, Ngoc Dang Nguyen, Kenneth R. Koedinger

Workshop at 24th Artificial Intelligence in Education (AIED) Conference (2023): Towards the Future of AI-Augmented Human Tutoring in Math Learning

Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise Given to Students in Synthetic Dialogues

We compare the performance of humans and GPT-4 in identifying criteria of praise by tutors to students. GPT-4 performs moderately well in some areas but underperforms in recognizing sincerity and authenticity. This is not surprising, yet it paves the way for future work.

Dollaya Hirunyasiri, Danielle R. Thomas, Jionghao Lin, Kenneth R. Koedinger, Vincent Aleven

Workshop at 24th Artificial Intelligence in Education Conference (2023): Towards the Future of AI-Augmented Human Tutoring in Math Learning

Comparative Analysis of Learnersourced Human-Graded and AI-Generated Responses for Autograding Online Tutor Lessons

We introduce an AI-based method of autograding online tutor lessons. We compare two methods of training set creation: using learnersourced tutor responses and prompting ChatGPT. Our findings show a constructive use of ChatGPT for pedagogical purposes that is not without limitations. [Video presentation]

Danielle R. Thomas, Shivang Gupta, Kenneth R. Koedinger

In The 24th Artificial Intelligence in Education Conference, July 3-7, 2023, Tokyo, Japan (2023)

When the Tutor Becomes the Student: Design and Evaluation of Efficient Scenario-based Lessons for Tutors

We show tutors perform ~20% better from pretest to posttest on our short scenario-based lessons similar to situational judgment tests. How would you respond to a student who has just made a math error?

Danielle R. Thomas, Xinyu Yang, Shivang Gupta, Adetunji Adeniran, Elizabeth McLaughlin, Kenneth R. Koedinger

In The 13th International Learning Analytics & Knowledge Conference, Austin, TX (2023)

Educational Equity Through Combined Human-AI Personalization: A Propensity Matching Evaluation

Comparing the achievement of 70 students participating in a hybrid tutoring program with that of a matched control group, we found the learning gain among participating students was nearly double that of students not participating.

Danielle R. Chine, Cassandra Brentley, Carmen Thomas-Browne, J. Elizabeth Richey, Abdulmenaf Gul,... Kenneth R. Koedinger

In The 23rd Artificial Intelligence in Education Conference, Durham, UK (2022)

Want to know more?

Check out my CV for more pubs.
