Full-day Workshop as part of the 2026 International Learning Analytics & Knowledge Conference
April 27, 2026, 9am-5pm CET at the Radisson Blu Royal Hotel, Bergen, Norway
Tutoring is one of the most consistently effective educational interventions, with high-dosage models showing especially strong results. Yet the field still lacks a systematic understanding of why tutoring works—particularly which instructional moves drive learning. The emerging area of “teacher move analytics” addresses this gap by analyzing discourse, feedback, and interaction patterns. Advances in NLP, multimodal analytics, and AI-assisted annotation make large-scale analysis possible, but progress is hindered by fragmented data, high annotation costs, and privacy concerns.
The National Tutoring Observatory (NTO) offers a unifying infrastructure to overcome these barriers—enabling systematic analysis of tutoring, predictive and causal models of effectiveness, and the design of human–AI tutoring systems. By convening the LAK community, this workshop will help advance a shared agenda for uncovering the instructional moves that make tutoring and teaching effective.
Date and time:
April 27, 2026, 9 am - 5 pm (Central European Time)
Location:
Dræggen 4b, Radisson Blu Royal Hotel, Dreggsallmenningen 1, 5003 Bergen, Norway
We will demo a tool that allows for the annotation of tutoring and teaching dialogue data using AI. Then, attendees will work individually or in groups to annotate and analyze their own datasets or data provided by the NTO.
As generative AI tutors become increasingly common in educational settings, understanding how their emotional communication patterns compare to human tutors is critical. This study applies sentiment analysis to tutoring transcripts from UPchieve, a free online tutoring platform, comparing sessions with an AI tutor (N=787) to matched human tutor sessions. We find that AI tutors display significantly higher positive sentiment (M=0.601) than human tutors (M=0.197, d=2.13), creating large sentiment gaps with students. While human tutor sessions show converging sentiment trajectories (54%), AI sessions more often diverge (54%). Critically, large sentiment gaps in AI sessions predicted negative student sentiment change, whereas human sessions showed no such relationship. These findings suggest that AI tutors may benefit from calibrating emotional tone to better match student affect rather than maintaining consistently high positivity.
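To make the session-level measures concrete, the sketch below computes a tutor-student sentiment gap and a simple convergence label for one transcript. It is an illustrative sketch only, assuming a session is a list of (speaker, utterance) pairs and using the off-the-shelf VADER analyzer; the study's actual sentiment model, matching procedure, and convergence criterion may differ.

# Illustrative sketch: per-session sentiment gap and trajectory convergence.
# Assumes a session is a list of (speaker, utterance) tuples; the study's
# actual sentiment model and convergence criterion may differ.
from statistics import mean
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def sentiment(text):
    return analyzer.polarity_scores(text)["compound"]  # in [-1, 1]

def session_metrics(session):
    tutor = [sentiment(t) for s, t in session if s == "tutor"]
    student = [sentiment(t) for s, t in session if s == "student"]
    gap = mean(tutor) - mean(student)  # positive: tutor more positive
    # Convergence: does the absolute gap shrink from the first half
    # of the session to the second half?
    def half_gap(part):
        t = [sentiment(x) for s, x in part if s == "tutor"]
        u = [sentiment(x) for s, x in part if s == "student"]
        return abs(mean(t) - mean(u))
    half = len(session) // 2
    converging = half_gap(session[half:]) < half_gap(session[:half])
    return {"gap": gap, "converging": converging}

demo = [("tutor", "Great job, that's exactly right!"),
        ("student", "I'm still confused about step two."),
        ("tutor", "No worries, let's look at it together."),
        ("student", "Okay, that makes more sense now.")]
print(session_metrics(demo))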
This workshop examines how Third Space Learning is approaching the transition from human tutoring to AI tutoring by identifying the interaction patterns that support student thinking. The focus is on a central question: which sequences of tutor moves are most effective at eliciting evidence of student reasoning and construction of understanding? Using a set of human and AI tutoring sessions, participants will apply a tutor-move taxonomy and develop a complementary prompt to classify the level of reasoning visible in student responses. This will allow the group to compare whether human tutoring currently elicits richer reasoning than AI tutoring and to identify tutor-move sequences that may be important to preserve or adapt as tutoring systems become more automated. The workshop is intended to inform how AI tutors can move beyond isolated responses toward more effective sequences of pedagogical support.
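As a purely hypothetical illustration of the kind of prompt participants might develop, the sketch below rates the reasoning visible in a student reply on a simple ordinal scale. The levels shown are placeholders, not Third Space Learning's actual taxonomy.

# Hypothetical example of a reasoning-level classification prompt; the
# scale below is a placeholder, not Third Space Learning's taxonomy.
REASONING_PROMPT = """You are coding tutoring transcripts.
Given the tutor's move and the student's reply, rate the reasoning
visible in the reply:
0 = no reasoning (silence, "yes"/"no", off-task)
1 = recall or restatement of a given fact or step
2 = partial reasoning (applies a step with some justification)
3 = constructed reasoning (explains why, connects ideas, self-corrects)
Return only the number.

Tutor move: {tutor_move}
Student reply: {student_reply}
"""

print(REASONING_PROMPT.format(
    tutor_move="Why do we divide both sides by 2?",
    student_reply="We want x alone, and dividing undoes multiplying by 2.",
))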
Timely and accurate identification of student misconceptions is key to improving learning outcomes and pre-empting the compounding of student errors. However, this task is highly dependent on the effort and intuition of the teacher. In this work, we present a novel approach for detecting misconceptions from student-tutor dialogues using large language models (LLMs). First, we use a fine-tuned LLM to generate plausible misconceptions, and then retrieve the most promising candidates among these using embedding similarity with the input dialogue. These candidates are then assessed and re-ranked by another fine-tuned LLM to improve misconception relevance. Empirically, we evaluate our system on real dialogues from an educational tutoring platform. We consider multiple base LLMs, including LLaMA, Qwen, and Claude, in zero-shot and fine-tuned settings. We find that our approach improves predictive performance over baseline models, and that fine-tuning improves the quality of generated misconceptions and can outperform larger closed-source models. Finally, we conduct ablation studies to validate the importance of both the generation and re-ranking steps for misconception quality.
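The pipeline shape (generate, retrieve by embedding similarity, re-rank) can be sketched as follows. The generator and re-ranker below are toy stand-ins for the paper's fine-tuned LLMs, and the embedding model is an assumed choice; none of this is the authors' actual implementation.

# Sketch of a generate -> retrieve -> re-rank misconception pipeline.
# generate_misconceptions() and rerank() are toy stand-ins for the paper's
# fine-tuned LLMs; the embedding model is an assumed choice.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

def generate_misconceptions(dialogue):
    # Toy stand-in for a fine-tuned generator LLM.
    return ["Adds denominators when adding fractions",
            "Believes multiplying always makes numbers larger",
            "Treats the equals sign as 'compute the answer'"]

def rerank(dialogue, candidates):
    # Toy stand-in for a fine-tuned re-ranker LLM (keeps input order here).
    return candidates

def detect_misconceptions(dialogue, k=2):
    candidates = generate_misconceptions(dialogue)
    # Retrieve the k candidates closest to the dialogue in embedding space.
    vecs = embedder.encode([dialogue] + candidates, normalize_embeddings=True)
    sims = vecs[1:] @ vecs[0]  # cosine similarity on unit vectors
    top_k = [candidates[i] for i in np.argsort(-sims)[:k]]
    return rerank(dialogue, top_k)

print(detect_misconceptions("Student: 1/2 + 1/3 = 2/5, right?"))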
Wide-scale analyses of utterance-based conversational data require scalable methods for annotating behaviors of interest. With gold-standard annotations on a subset of the full sample, language models can be used to extend human expertise and annotate behaviors of interest across a full dataset. However, developing gold-standard samples typically involves an iterative process of developing a schema or codebook for annotation, having humans annotate conversations, reviewing disagreements, and updating the schema. We introduce a framework and software that leverage language models to automate the annotation step in this process, allowing researchers to focus on defining and clarifying behaviors of interest. In particular, we propose a session-by-session procedure in which a researcher proposes a measure of interest within a set of tutoring transcripts, language models annotate it within a single conversation, and the researcher approves annotations, reviews disagreements between language models, and then makes changes to the schema. In addition, using text embeddings, we conduct a guided search over the space of schema changes and automatically generate proposals for schema changes, which the researcher may approve or reject. Our framework and software provide a tool for education researchers conducting analyses of conversational data to rapidly develop schemas describing any behaviors of interest. Furthermore, by deriving our classifier from the current state of the schema in real time, our method narrows the gap between schema and classifier: once a researcher is satisfied with the semantic and predictive accuracy of a schema for their behavior of interest, they immediately have a classifier that can be extended to the entire dataset. Our work enables education researchers to shape analyses around their own behaviors of interest rather than deferring to which behaviors have gold-standard datasets.
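A minimal sketch of the session-by-session loop is shown below. The two annotators are toy keyword matchers standing in for language models; the actual framework's schema format, interfaces, and review workflow will differ.

# Sketch of the session-by-session review loop; the annotators are toy
# keyword matchers standing in for language models.
from dataclasses import dataclass, field

@dataclass
class Schema:
    name: str
    definition: str
    revisions: list = field(default_factory=list)

# Toy stand-ins for two LLM annotators labeling "press for reasoning".
annotators = {
    "model_a": lambda utt: any(w in utt.lower() for w in ("why", "explain")),
    "model_b": lambda utt: "?" in utt and "why" in utt.lower(),
}

def review_session(schema, session):
    # Annotate one session and collect disagreements for human review.
    disagreements = []
    for utt in session:
        labels = {name: fn(utt) for name, fn in annotators.items()}
        if len(set(labels.values())) > 1:
            disagreements.append((utt, labels))
    # The researcher inspects disagreements and, if needed, appends a
    # clarification to schema.revisions; the updated schema then drives
    # annotation of the next session.
    return disagreements

schema = Schema("press_for_reasoning",
                "Tutor asks the student to justify a step.")
session = ["Why did you divide by 2?", "Explain your last step.", "Good job!"]
print(review_session(schema, session))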
Human tutoring can meaningfully increase student engagement and persistence, yet it remains unclear which specific tutor moves are most effective during brief conversational interventions. Prior work combining dialogue transcripts with learning analytics has identified correlations between certain forms of tutor support, such as stepwise scaffolding, and subsequent gains in student engagement. However, much of this analysis relies on manual coding and relatively coarse categorizations of dialogue. At the same time, online tutoring environments generate rich multimodal data, including dialogue, screen activity, and system log traces, that capture finer-grained aspects of tutor-student interactions. These data open new opportunities to move beyond predefined coding schemes and instead discover patterns of effective tutor moves in a more data-driven way, leveraging complementary signals across modalities. This brief input introduces embeddings as a tool for discovering and interpreting patterns in tutor-student interactions. Using a dataset of de-identified tutoring transcripts and multimodal features from fall 2025 (including dialogue and behavioral traces from systems such as MATHia), participants will explore how embedding-based clustering can surface meaningful groupings of tutor moves. The activity centers on human interpretation: participants will examine cluster structures, compare embedding configurations, and reflect on how different modalities and hyperparameters shape the resulting groupings. Through hands-on activities, attendees will (1) experiment with clustering tutor dialogue and multimodal actions, (2) interpret emergent cluster “types” of tutor moves, and (3) consider how these groupings relate to downstream engagement outcomes. We will also discuss how embedding-based approaches can complement traditional qualitative analysis by helping researchers and practitioners rapidly identify promising interaction patterns for further study.
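As a preview of the hands-on activity, a minimal clustering sketch might look like the following. The embedding model, cluster count, and utterances are illustrative assumptions, not the workshop's fixed configuration.

# Illustrative sketch: cluster tutor utterances in embedding space, then
# inspect members per cluster. Model and hyperparameters are assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

utterances = [
    "What do you think the next step is?",
    "Can you explain why that works?",
    "Remember to distribute before combining like terms.",
    "First multiply both sides by 3.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model
X = embedder.encode(utterances, normalize_embeddings=True)

kmeans = KMeans(n_clusters=2, n_init="auto", random_state=0).fit(X)
for label, utt in zip(kmeans.labels_, utterances):
    print(label, utt)  # human interpretation of cluster "types" happens here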
Large language models (LLMs) offer a scalable alternative to human coding for data annotation tasks, enabling the scale-up of research across data-intensive domains such as learning analytics. While LLMs are already achieving near-human accuracy on objective annotation tasks, their performance on subjective annotation tasks, such as those involving psychological constructs, is less consistent and more prone to errors. Standard evaluation practices typically collapse all annotation errors into a single alignment metric, but this simplified approach may obscure different kinds of errors that affect final analytical conclusions in different ways. Here, we propose a diagnostic evaluation paradigm that incorporates a human-in-the-loop step to separate task-inherent ambiguity from model-driven inaccuracies and to assess annotation quality in terms of potential downstream impacts. We refine this paradigm for ordinal annotation tasks, which are common in subjective annotation. The refined paradigm includes: (1) a diagnostic taxonomy that categorizes LLM annotation errors along two dimensions: source (model-specific vs. task-inherent) and type (boundary ambiguity vs. conceptual misidentification); (2) a lightweight human annotation test to estimate task-inherent ambiguity from LLM annotations; and (3) a computational method to decompose observed LLM annotation errors following our taxonomy. We validate this paradigm on four educational annotation tasks, demonstrating both its conceptual validity and practical utility. Theoretically, our work provides empirical evidence for why excessively high alignment is unrealistic in specific annotation tasks and why single alignment metrics inadequately reflect the quality of LLM annotations. In practice, our paradigm can be a low-cost diagnostic tool that assesses the suitability of a given task for LLM annotation and provides actionable insights for further technical optimization.
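For ordinal labels, one plausible (though simplified) operationalization of the boundary-vs-conceptual distinction is to separate adjacent-level disagreements from larger ones, as sketched below. This covers only the "type" dimension of the taxonomy; the paper's full decomposition also attributes errors to their source using the human annotation test.

# Simplified sketch: split ordinal annotation errors into adjacent-level
# ("boundary") and larger ("conceptual") disagreements. One plausible
# operationalization, not the paper's exact method.
def decompose_errors(llm_labels, gold_labels):
    counts = {"exact": 0, "boundary": 0, "conceptual": 0}
    for pred, gold in zip(llm_labels, gold_labels):
        d = abs(pred - gold)
        if d == 0:
            counts["exact"] += 1
        elif d == 1:
            counts["boundary"] += 1   # adjacent ordinal levels
        else:
            counts["conceptual"] += 1  # non-adjacent: likely misidentification
    return counts

# e.g., on a 1-5 engagement scale
print(decompose_errors([3, 4, 1, 5], [3, 3, 4, 5]))
# -> {'exact': 2, 'boundary': 1, 'conceptual': 1}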
The NTO is a first-of-its-kind research infrastructure that advances the science of teaching by capturing and analyzing tutoring interactions at scale. Its two core components are: (1) the Million Tutor Moves (MTM) repository, a large collection of multimodal tutoring data—transcripts, video, audio, and metadata—linked to student outcomes; and (2) open-source tools for securely de-identifying and annotating these data. Together, they enable researchers, practitioners, and developers to identify which instructional moves and interaction patterns most effectively drive learning.
The NTO partners with leading tutoring providers (Saga Education, UPchieve, Carnegie Learning, Third Space Learning, PLUS, Eedi), teacher feedback platforms like TeachFX, and infrastructures such as Databrary and LearnSphere. These collaborations pool diverse datasets and advance community standards for interoperability, annotation, and responsible sharing. By lowering the barriers to accessing fine-grained tutoring data, the NTO empowers the learning analytics community to explore new questions of instructional effectiveness. For LAK participants, it offers datasets, workflows, and a collaborative hub for advancing tutoring research.
Kirk Vanacore is the Research Director for the NTO and an Assistant Research Professor in the Bowers College of Computing and Information Science at Cornell University. His research blends statistics, machine learning, and artificial intelligence to uncover causal learning mechanisms in environments that combine human and AI instruction.
Email: kpv27@cornell.edu
Main Contact for this Workshop
Kizilcec is an Associate Professor in the Bowers College of Computing and Information Science at Cornell University, where he directs the Cornell Future of Learning Lab. He is a PI of the National Tutoring Observatory. Kizilcec studies behavioral, psychological, and computational aspects of technology in education to inform practices and policies that promote learning, equity, and academic and career success. Kizilcec has authored over 100 research papers, won numerous Best Paper awards, and received funding from the NSF, Schmidt Futures Foundation, Gates Foundation, Jacobs Foundation, Chan Zuckerberg Initiative, and Google.
Email: kizilcec@cornell.edu
Rachel is the Associate Director of the Future of Learning Lab in the Bowers College of Computing and Information Science at Cornell University. Slama’s research focuses on the role of technology in accelerating learning in the rapidly changing world of education and training. She also serves as a co-principal investigator and the partnerships director of the National Tutoring Observatory. Slama held previous roles leading workforce and training portfolios at RAND, MIT, and the National Science Foundation. Slama received her doctorate in Education Policy, Leadership, and Instructional Practice from Harvard University, where she was an Ambach Fellow, an opportunity designed to promote innovation in state education agencies. A former teacher in New York City and mother of four public school students, Slama is deeply committed to ensuring that technology is developed with, and for, the educators and learners it aims to serve.
Email: rslama@cornell.edu
Josh is the Managing Director for the NTO. He has worked in a variety of education data and technology roles for almost 20 years. Josh is driven by finding new ways to create opportunities for those not served by traditional systems and has focused his research at the intersection of equitable measurement and policy. He earned a Master's degree from Brown University in Urban Education Policy and a Doctorate in Research, Educational Measurement, and Psychometrics from the University of Massachusetts Amherst. Josh is the first in his family to go to college and a proud community college graduate.
Email: jm2945@cornell.edu
Zhuqian Zhou is a postdoctoral researcher at the National Tutoring Observatory. Her work brings together data from diverse sources to deepen understanding of learners’ cognitive processes, guide the design of effective learning interventions, and address ethical considerations in artificial intelligence and education (AIEd) research. Beyond academia, she co-founded an EdTech company that supports high school students in exploring career pathways and serves as a board member of a nonprofit organization dedicated to supporting the Chinese community in the Greater New York area.
Email: zz968@cornell.edu
Bakhtawar Ahtisham is a Learning Analyst and Engineer at the National Tutoring Observatory. She holds a Master’s degree in Information Science from Cornell University and brings a multidisciplinary background in UX research, education technology, and AI-powered learning systems. Her work focuses on developing data-driven tools to support equitable, effective tutoring by combining qualitative research with scalable engineering solutions. At the NTO, Bakhtawar contributes to the design and analysis of open-source annotation systems and AI-driven insights that support researchers and practitioners in understanding tutoring practices at scale.
Email: ba453@cornell.edu
Danielle is a Systems Scientist at Carnegie Mellon University, the Research Lead on the PLUS tutoring project, and the Director of Research to Practice at the NTO. She is a former middle school teacher, instructional coach, and school administrator. Danielle wishes every child had their own human math tutor. Given that this is not feasible, she is striving to improve human-AI tutoring so that it is not only impactful but cost-effective and scalable. In recent years, she has first-authored over a dozen papers at conferences such as Artificial Intelligence in Education and the International Conference on Learning Analytics & Knowledge.
Email: drthomas@cmu.edu
PLUS: http://tutors.plus
Ken is the Hillman Professor of Computer Science and Psychology at Carnegie Mellon University and founder of PLUS tutoring. He is a co-founder of Carnegie Learning, Inc., which has brought Cognitive Tutor based courses to millions of students since it was formed in 1998, and leads LearnLab, the scientific arm of CMU's Simon Initiative. Through extensive research and development in human-AI tutoring, Ken has demonstrated a doubling of math learning among middle school students and aims to bring similar high-quality tutoring that is cost-effective to scale. He has authored over 300 research papers and over 60 grant proposals.
Email: koedinger@cmu.edu
PLUS: http://tutors.plus
Doug is the Technology Director of the NTO and the CEO of Freshcognate, an instructional design firm tackling the most challenging educational situations at scale. An educator at heart and a former CPS English teacher focused on equity, Doug combines his passion for technology and education to drive impactful teaching and learning experiences and products at scale. He is one of the original members of Lynda.com and edX and was a major contributor to education at Google.
Email: doug@freshcognate.com
Justin is an associate professor of digital media in the Comparative Media Studies/Writing department at MIT and the director of the Teaching Systems Lab. He is the author of Iterate: The Secret to Innovation in Schools and Failure to Disrupt: Why Technology Alone Can’t Transform Education, and he is the host of the TeachLab Podcast. He earned his doctorate from the Harvard Graduate School of Education and is a past Fellow at the Berkman Klein Center for Internet & Society. His writings have been published in Science, Proceedings of the National Academy of Sciences, the Washington Post, The Atlantic, and other venues. He started his career as a high school history teacher and wrestling coach.
Email: jreich@mit.edu
Workshop at the International Conference on Learning Analytics & Knowledge (LAK26)
For questions about the LAK26 workshop, contact:
Danielle R. Thomas: drthomas@cmu.edu
Rene Kizilcec: kizilcec@cornell.edu