
E: drthomas@cmu.edu

W: daniellethomas.org

Google Scholar

ResearchGate


LinkedIn

Danielle R. Thomas

I am a Systems Scientist in the Human-Computer Interaction Institute at Carnegie Mellon University, where I serve as the Research Lead for PLUS (Personalized Learning Squared), a hybrid human-AI tutoring project led by Prof. Ken Koedinger at CMU in collaboration with Carnegie Learning, Inc. and Stanford University.


I also serve as the Director of Research to Practice at The National Tutoring Observatory, a research infrastructure led by Prof. Rene Kizilcec at Cornell University. The Observatory aims to enable researchers to unlock the secrets behind effective teaching by analyzing "a million tutor moves" of data.


My research centers on improving student learning outcomes through: 1) uncovering new insights into teaching and learning, 2) developing effective hybrid human-AI tutoring systems, and 3) leveraging AI to affordably scale learning interventions and enhance tutor and teacher training. Check out my CV for more on my experience, education, and publications.


My first-hand experiences as a former middle school teacher, principal, and teacher educator fuel my mission to greatly improve teaching and learning by focusing on the students who need it most: the chronic disengagers, the historical strugglers, and the kids who say, "I hate math."

Recent News

August 2025: Paper accepted to AIME-CON proposing a multidimensional "ground truth."

July 2025: Appointed Director of Research to Practice at The National Tutoring Observatory.

July 2025: Presented at AIED25 in Palermo on automated assessment using LearnLM. 

June 2025: Three papers accepted to ECTEL: Paper 1, Paper 2, and Paper 3. 

May 2025: Featured on NBC News highlighting PLUS in schools. Now streaming on Snapchat and YouTube.  

April 2025: Organized the Learning at Scale Workshop, taking place July 21 in Palermo, Italy.

March 2025: Interviewed by The Learning Agency: Five Questions with the Developers of PLUS.

March 2025: Presented at LAK25 in Dublin, Ireland: Paper 1 and Paper 2.

March 2025: Paper at the iRAISE Workshop at AAAI25 in Philadelphia.

March 2025: Guest lecturer in Dr. Erin Gatz's Learning About Learning course at Carnegie Mellon.

January 2025: Received research funding from Google DeepMind.

December 2024: Appointed Interim Research Director of The National Tutoring Observatory.

November 2024: Invited to Google's Learning in the AI Era convening in Mountain View, CA.

October 2024: Guest lecturer in Prof. Ken Holstein's Augmenting Intelligence course at Carnegie Mellon.

July 2024: Panelist on generative AI and schools at AIED 2024 in Recife, Brazil.

July 2024: Presented at AIED 2024 on tutoring and students with disabilities.

Paper accepted to AIME-CON (October 2025) 

Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation

Humans are biased and inconsistent, and yet we keep trusting them to define "ground truth." This paper questions the overreliance on inter-rater reliability in educational AI and proposes a multidimensional approach, leveraging expert-based methods and close-the-loop validity to build annotations that reflect impact, not just agreement. It's time we do better. [link]

Danielle R. Thomas, Conrad Borchers, & Kenneth R. Koedinger. 

In Artificial Intelligence in Measurement and Education, Oct. 27-29, 2025. Pittsburgh, PA.

Papers accepted to ECTEL (September 2025) 

LLM-Generated Feedback Supports Learning If Learners Choose to Use It  

Feedback boosts learning—but can AI do the same? In over 2,600 lessons with delayed corrective feedback for all, learners who engaged with LLM-generated feedback saw modest gains without spending extra time, showing LLMs can support learning. [link]

Danielle R. Thomas, Conrad Borchers, Shambhavi Bhushan, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger. 

In The 20th European Conference on Technology Enhanced Learning, Sept. 15-19, 2025. Durham and Newcastle, UK.

Detecting LLM-Generated Short Answers and Effects on Learner Performance

We fine-tuned GPT-4o to detect LLM-generated short answers in online lessons, outperforming GPTZero and a stylometric baseline. Learners flagged for misusing LLMs performed significantly better on posttests, suggesting AI-generated responses may assist learners in bypassing meaningful learning. [link]

Shambhavi Bhushan, Danielle R. Thomas, Conrad Borchers, Isha Raguvanshi, Ralph Abboud, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger. 

In The 20th European Conference on Technology Enhanced Learning, Sept. 15-19, 2025. Durham and Newcastle, UK.

Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study

We evaluate the feasibility of using LLMs to assess tutor behaviors in real-world math tutoring transcripts. Multiple models reliably detected and evaluated key tutor moves—effective praise and error response—with high alignment to human judgments, suggesting promise for scalable, low-cost tutor assessment. [link]

Danielle R. Thomas, Conrad Borchers, Jionghao Lin, Sanjit Kakarla, Shambhavi Bhushan, Ralph Abboud, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger. 

In The 20th European Conference on Technology Enhanced Learning, Sept. 15-19, 2025. Durham and Newcastle, UK.

Paper presented at AIED in Palermo (July 2025)

Improving Open-Response Assessment with LearnLM 

LearnLM, a fine-tuned model trained on pedagogical data, outperforms general models like GPT-4o at grading tutor responses in realistic tutoring scenarios. Given the notorious ambiguity of using human scores as "ground truth," we introduce a predictive validity method for establishing truth, rethinking assessment. No red pens required! [link]

Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Shambhavi Bhushan, Alex Houk, Shivang Gupta, Erin Gatz, & Kenneth R. Koedinger. 

In The 26th International Conference on Artificial Intelligence in Education, July 21-25, 2025. Palermo, Italy (2025).

Papers presented at LAK in Dublin (March 2025)

Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT

MCQs often get a bad rap, but is it justified? Here we find no differences in learning among those engaging with MCQs, open responses, or both, though MCQs are faster to complete. Despite using GenAI to automatically grade open responses, we don't plan on getting rid of MCQs just yet! Dataset and AI prompts included to try it yourself. Let's support open science! [link]

Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz, & Kenneth R. Koedinger.  

In The 15th International Learning Analytics and Knowledge Conference (LAK), March 3-8, 2025, Dublin, Ireland (2025)

Do Tutors Learn from Equity Training and Can Generative AI Assess It?

Human tutors need training on how to support students in math, but what about helping tutors attend to equity? Here we use GenAI to assess tutors' equity skills, showing pre- to posttest gains. Dataset and support materials included. [link]

Danielle R. Thomas, Conrad Borchers, Sanjit Kakarla, Jionghao Lin, Shambhavi Bhushan, Boyuan Guo, Erin Gatz, & Kenneth R. Koedinger 

In The 15th International Learning Analytics and Knowledge Conference, March 3-8, 2025, Dublin, Ireland (2025)

Highlights from 2024

Prof. Koedinger and I received funding from Google DeepMind to use generative AI to assess tutors in training and tutoring

I attended Google's Learning in the AI Era event embracing the importance of curiosity, collaboration, and critical thinking in the AI age

As a panelist at AIED24 in Recife, Brazil, I discussed school perspectives and the future use of AI in classrooms

PLUS in action...

PLUS awarded Best Demo at AIED2023 in Tokyo, Japan 

PLUS expands across the country  

Active research on human-AI tutoring systems

Past pubs...

The Neglected 15%: Positive Effects of Hybrid Human-AI Tutoring Among Students with Disabilities 

We conduct a two-study quasi-experiment to determine the impact of hybrid human-AI tutoring among students with and without disabilities in general education classrooms. We find positive effects among students; in particular, students with disabilities may benefit more from the motivational aspects of human tutor interaction. [link]

Danielle R. Thomas, Erin Gatz, Shivang Gupta, Vincent Aleven, & Kenneth R. Koedinger 

In The 25th Artificial Intelligence in Education (AIED) Conference, July 7-13, 2024, Recife, Brazil (2024)

Learning and AI Evaluation of Tutors Responding to Students Engaging in Negative Self-Talk

How do you respond to students saying, "I am dumb" or "I can't do this"? We (and generative AI!) assess the performance of 60 tutors within an online lesson on responding to students engaging in negative self-talk. We find evidence of tutor learning, with GPT-4 demonstrating high absolute performance. This LLM assessment system can easily scale from 60 to 600 tutors, making it a game-changer for evaluating human tutors at scale. [link]

Danielle R. Thomas, Jionghao Lin, Shambhavi Bhushan, Ralph Abboud, Erin Gatz, Shivang Gupta, & Kenneth R. Koedinger 

In The 11th ACM Conference on Learning @ Scale (L@S), July 18-20, 2024, Atlanta, Georgia (2024)

Using Generative AI to Provide Feedback to Adult Tutors in Training and Assess Real-life Performance

This work overviews the progress of the PLUS project towards using generative AI for tutoring feedback and assessment. While using generative AI shows promise as a low-cost and efficient method for these uses, ethical considerations and practical implications are discussed to ensure fair and responsible use. [preprint] [presentation]

Danielle R. Thomas, Erin Gatz, Shivang Gupta, Jionghao Lin, Cindy Tipper, & Kenneth R. Koedinger 

In The 17th Annual Learning Ideas Conference, June 12-14, 2024, New York, NY (2024)

Improving Student Learning with Hybrid Human-AI Tutoring: A Three-Study Quasi-Experimental Investigation

We introduce hybrid human-AI tutoring and implement the model across three diverse schools. We find positive impacts on learning outcomes with evidence suggesting lower achieving students may benefit more from tutoring than higher achieving students—a promising finding. [link] 

Danielle R. Thomas, Jionghao Lin, Erin Gatz, Ashish Gurung, Shivang Gupta, Kole Norberg, Stephen E. Fancsali, Vincent Aleven, Lee Branstetter, Emma Brunskill, Kenneth R. Koedinger 

In The 14th Learning Analytics and Knowledge (LAK) Conference, March 18-22, 2024, Kyoto, Japan (2024)

A Meta-Analytic Investigation of the Impact of Middle School STEM Education: Where Are All the Students of Color? 

In this systematic review, we determine the average STEM student outperforms ~70% of their peers. Most notably, underrepresented minority students benefit given one caveat—they must be given the opportunity. [Journal article link] 

Danielle R. Thomas & Karen H. Larwin 

International Journal of STEM Education (2023)

Towards the Future of AI-Augmented Human Tutoring in Math Learning

This workshop highlights the challenges and opportunities of AI-in-the-loop math tutoring and encourages discourse in the AIED community. Access papers and presentations here. 

Vincent Aleven, Richard Baraniuk, Emma Brunskill, Scott Crossley, Dora Demszky, Stephen Fancsali, Shivang Gupta, Kenneth R. Koedinger, Chris Piech, Steve Ritter, Danielle R. Thomas, Simon Woodhead, Wanli Xing

In The 24th Artificial Intelligence in Education (AIED) Conference, July 3-7, 2023, Tokyo, Japan (2023)

So You Want to Be a Tutor? Professional Development and Scenario-based Training for Adult Tutors

We introduce Personalized Learning Squared (PLUS), a human-AI tutoring platform designed to improve tutoring efficiency. PLUS leverages student-facing AI-powered math software and a tutor-facing personalized dashboard to provide the right support, to the right student, and at the right time. 

Danielle R. Thomas, Shivang Gupta, Erin Gatz, Cindy Tipper, Kenneth R. Koedinger

In The 16th Annual Learning Ideas Conference, New York, NY (2023)

Using Large Language Models to Provide Explanatory Feedback to Human Tutors

We introduce a method of providing explanatory feedback to human tutors on their responses to open-ended questions leveraging LLMs using named entity recognition. 

Jionghao Lin, Danielle R. Thomas, Feifei Han, Shivang Gupta, Wei Tan, Ngoc Dang Nguyen, Kenneth R. Koedinger

Workshop at The 24th Artificial Intelligence in Education (AIED) Conference (2023): Towards the Future of AI-Augmented Human Tutoring in Math Learning

Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise Given to Students in Synthetic Dialogues

We compare the performance of humans and GPT-4 in identifying criteria of praise given by tutors to students. GPT-4 performs moderately well in some areas but underperforms in recognizing sincerity and authenticity, which is not surprising, yet paves the way for future work.

Dollaya Hirunyasiri, Danielle R. Thomas, Jionghao Lin, Kenneth R. Koedinger, Vincent Aleven 

Workshop at The 24th Artificial Intelligence in Education (AIED) Conference (2023): Towards the Future of AI-Augmented Human Tutoring in Math Learning

Comparative Analysis of Learnersourced Human-Graded and AI-Generated Responses for Autograding Online Tutor Lessons

We introduce an AI-based method of autograding online tutor lessons, comparing two methods of training-set creation: using learnersourced tutor responses and prompting ChatGPT. Our findings show a constructive use of ChatGPT for pedagogical purposes that is not without limitations. [Video presentation]

Danielle R. Thomas, Shivang Gupta, Kenneth R. Koedinger 

In The 24th Artificial Intelligence in Education Conference, July 3-7, 2023, Tokyo, Japan (2023)

When the Tutor Becomes the Student: Design and Evaluation of Efficient Scenario-based Lessons for Tutors

We show tutors perform ~20% better from pretest to posttest on our short scenario-based lessons, which are similar to situational judgment tests. How would you respond to a student who has just made a math error?

Danielle R. Thomas, Xinyu Yang, Shivang Gupta, Adetunji Adeniran, Elizabeth McLaughlin, Kenneth R. Koedinger

In The 13th International Learning Analytics & Knowledge Conference, Austin, TX (2023)

Educational Equity Through Combined Human-AI Personalization: A Propensity Matching Evaluation

Comparing the achievement of 70 students participating in a hybrid tutoring program to a matched control group, we found the learning gain among participating students was nearly double that of students not participating.

Danielle R. Chine, Cassandra Brentley, Carmen Thomas-Browne, J. Elizabeth Richey, Abdulmenaf Gul,... Kenneth R. Koedinger

In The 23rd Artificial Intelligence in Education Conference, Durham, UK (2022)

Want to know more?

Check out my CV for more pubs.

