LDI Workshop EDM 2020

The Learner Data Institute is funded by The National Science Foundation

Big Data, Research Challenges, & Science Convergence in Educational Data Science

The First Workshop of the Learner Data Institute

A Half-Day Virtual Workshop @ EDM 2020

July 10, 2020 — online

Workshop URL: https://sites.google.com/view/learnerdatainstitute/ldiedm


Proceedings of the Workshop


PDFs of contributed papers as well as slides from keynote and contributed paper presentations are also available below.


[All times are Central Time (UTC−05:00) in the USA.]

1:00PM - 1:25PM: Overview of the Learner Data Institute (Vasile Rus & Stephen Fancsali, Workshop Co-Organizers) [Presentation]

1:25PM - 1:30PM: Break

1:30PM - 2:00PM: Invited Keynote Speaker - Jason Hartline, Northwestern University - "Peer Grading and Mechanism Design" [Presentation]

Abstract: The talk will provide an overview of a peer grading system that is under development at Northwestern U. In courses that use the system it has (a) reduced the grading load of course staff by over 75%, (b) expanded and improved the students’ interaction with the course material, and (c) improved turn-around time of feedback on student work (students receive comments on their work after three days, rather than two weeks). As a research platform, this system enables a dialogue between theory and practice for algorithms, machine learning, data science, and mechanism design. A particular focus of the talk is mechanisms that incentivize peers to produce accurate reviews and the connection between these mechanisms and auction theory.

2:00PM - 2:10PM: Break

2:10PM - 2:40PM: Invited Keynote Speaker - Carolyn Rose, Carnegie Mellon University - "Towards Computer-Supported Collaborative Learning in the Workplace Enabled by Language Technologies" [Presentation]

Abstract: Well-meaning companies offer training opportunities to their employees, but when push comes to shove, companies are known to push for short-term productivity over learning and higher productivity in the long term. The practical goal of the research is to enable learning during work, with a focus on software development teams.

Building on over a decade of AI-enabled collaborative learning experiences in the classroom and online, in this talk we report our work in progress beginning with classroom studies in large online software courses with substantial teamwork components. Project courses provide an effective test bed to begin our investigations due to similar tensions imposed by the reward structure. Project courses are believed to be valuable experiences for students to engage in reflection on concepts while applying them in practice. However, there is a concern that the reward structure encourages students to engage in performance-oriented behaviors, such as the most capable student taking on the lion's share of the work while leaving the others behind. These behaviors undercut the opportunity to use the project experience for each student to gain practice and for the students to reflect together on underlying concepts. In our classroom work, we have adapted an industry-standard team practice referred to as Mob Programming into a paradigm called Online Mob Programming (OMP) for the purpose of encouraging teams to reflect on concepts and share work in the midst of their project experience. At the core of this work are process mining technologies that enable real-time monitoring and just-in-time support for learning during productive work. This talk will offer an overview of a series of classroom studies and introduce a corpus available through Learnsphere.org’s DiscourseDB facilities: http://discoursedb.github.io.

2:40PM - 2:50PM: Break

2:50PM - 4:00PM: Contributed Paper Presentations (Note: 15-minute time slots = 10-minute talk + 5-minute Q&A; 10-minute time slots = 5-minute talk + 5-minute Q&A)

      • The Learner Data Institute: Mission, Framework, & Activities (Vasile Rus, Stephen E. Fancsali, Dale Bowman, Philip Pavlik Jr., Steve Ritter, Deepak Venugopal, Donald Morrison, and The LDI Team) [PDF]

      • Transforming the 21st-Century Learning Ecosystem with Data Science: A Survey of the Learner Data Institute Team (Vasile Rus, Donald M. Morrison, Stephen E. Fancsali) [PDF]

      • 2:50PM - 3:05PM: Student Strategy Prediction using Relational Models (Anup Shakya, Vasile Rus & Deepak Venugopal) [PDF]

      • 3:05PM - 3:15PM: Predicting Mathematics Learning for Young Children in An Adaptive Instructional System (KP Thai, Daniel Jacobs & Eric Keylor) [PDF] [Presentation]

      • 3:15PM - 3:25PM: Analysis of the Efficacy of Workskills Programs Using Detailed Data (Mary Hayes & Tabraiz Dada) [PDF] [Presentation]

      • 3:25PM - 3:40PM: Data-Intensive Learning Engineering & Applied Education Research with Carnegie Learning’s MATHia Platform (Stephen E. Fancsali & Steven Ritter) [PDF] [Presentation]

      • 3:40PM - 3:50PM: Educational Data Mining, 21st-Century Learning Ecosystems, and Models of Human Learning (Donald "Chip" Morrison) [PDF]

      • 3:50PM - 4:00PM: Towards Convergence in Approaching LDI Concrete Tasks: The Education Expert Panel Perspective (Christian Mueller, Laura Casey, Dietrich Albert, Leigh Harrell-Williams, Todd Zoblotsky and Susan Elswick) [PDF]

Workshop Summary

The First Workshop of the Learner Data Institute (LDI) seeks to bring together researchers working across disciplines on data-intensive research of interest to the educational data science and educational data mining communities. In addition to welcoming work describing mature, data-intensive or “big data” research and emerging work-in-progress that spans traditional academic disciplines, the workshop organizers welcome case studies of interdisciplinary research programs and projects, including case studies of learning engineering efforts pursued by universities, learning technology providers, and others (both successful and unsuccessful), as well as position papers on important challenges for researchers harnessing “big data” and crossing disciplinary boundaries as they do so.

Approximately 40 LDI contributors were recently surveyed as a part of the initiative to envision the future of convergence research in educational data science; they were asked to identify areas of challenge and societal need with respect to improving education and learning, compelling opportunities, and ways in which big data can be harnessed to address both. Portions of this workshop will be devoted to presenting and discussing the findings of this survey. In addition, the workshop convenes a diverse set of researchers and developers, some associated with LDI and others not, working with big data from contexts in which learning takes place, seeking to better understand state-of-the-art interdisciplinary research as well as compelling, specific societal needs and challenges and the scientific, big data frameworks that might be leveraged to solve them in the future.

We expect that this convening of a diverse, highly experienced group of researchers will stimulate substantial growth and interest in the notion of science convergence, including helping to set the direction of the LDI and the framework that it is tasked by NSF with developing.

Questions & Areas of Interest

  • How can we use massive and diverse datasets generated by adaptive instructional systems (AISs) to address core questions and challenges in learning science and engineering?

  • Are learners, teachers, and learning science researchers successfully interacting with cyber-learning technologies?

  • What are some critical challenges with respect to scaling the development of AISs across many domains and perhaps millions of learners?

  • What are the limitations of AISs and adaptive components of instructional systems? Which aspects of learning are best handled by humans and which ones by cyber-learning technologies (and how do we enhance the interaction of the two)?

  • How can data from student and teacher interactions with cyber-learning technologies, in and outside the classroom, be collected in ways consistent with best practices—e.g., with respect to data fidelity, security, reliability, privacy, human subject research protocols, school policies, parental consent, HIPAA, FERPA, etc.?

  • Methodology, infrastructure, and workflows for “big data” and data-intensive educational research

  • Inter/multi/trans-disciplinary approaches to data-intensive educational research

  • Case studies of successful & unsuccessful efforts to practically harness insights from large datasets in settings where learning takes place (e.g., case studies of “learning engineering” efforts)

  • Emerging challenges for researchers working across disciplines with large datasets

  • Use cases, workflows, and case studies illustrating the need for, or possible extensions to, data infrastructure for research leveraging learner data, including data repositories, (open source) software and statistical libraries, innovative use of cloud computing resources, etc.


The Workshop Committee solicits three types of submissions:

  • Full papers (up to 8 pages): mature research, extensive descriptions of data-intensive workflows, and learning engineering efforts; suitable for a 20-minute presentation.

  • Short papers (4-6 pages): work-in-progress and shorter case studies; suitable for a 10-minute presentation.

  • Position papers (up to 3 pages): approaches to convergence research (see below), emerging challenges (e.g., that the LDI might take on collaboratively with authors with future funding), “wishlist(s)” for transformative learning applications, resources like data repositories, and other infrastructure that would fuel innovative work (e.g., that LDI could collaboratively develop with future funding); suitable for a 5-minute presentation.

We hope that all papers, but especially position papers, will spark conversations and interactions to drive future collaborations between LDI researchers and workshop participants. The Workshop Committee also intends to invite 1-2 speakers to deliver talks during the workshop.

Important Dates

  • Submission Deadline: June 15, 2020

  • Acceptance Notification: June 29, 2020

  • Workshop: July 10, 2020


Use the EDM 2020 paper templates.

Microsoft Word template: https://educationaldatamining.org/edm2020/wp-content/uploads/sites/4/2019/09/edm_word_template2020.doc

LaTeX template: https://educationaldatamining.org/edm2020/wp-content/uploads/sites/4/2019/09/edm_submission2020.zip

Submission System (EasyChair): https://easychair.org/conferences/?conf=ldiedm2020

More About the Workshop

The proceedings of the International Conference on Educational Data Mining (and related conferences, including the International Conference on Artificial Intelligence in Education, the ACM Conference on Learning at Scale, and Learning Analytics and Knowledge) demonstrate inherent linkages across traditional and emerging academic disciplines and research areas. Whether efforts are described as interdisciplinary, multidisciplinary, or trans-disciplinary, providing solutions to compelling challenges faced by learners, those individuals and institutions that facilitate learning, and other learning stakeholders must draw on expertise across boundaries of disciplines as diverse as, but not limited to, psychology, cognitive and learning science(s), mathematics, computer science (e.g., machine learning, artificial intelligence), statistics, human-computer interaction, public policy, education, neuroscience, social work, moral and political philosophy, and any of a number of sub-fields and research areas at the intersection of these disciplines.

The need for such multi/inter/trans-disciplinary solutions is even more relevant today, as the vast and diverse repositories of digital data now available can make such solutions viable. Indeed, recognizing both substantial scientific challenges and the need for innovative scientific frameworks to solve them, the U.S. National Science Foundation has identified the notion of “convergence” research as one of ten “big ideas” for its ongoing investment strategy [1]. Two attributes are crucial to NSF’s notion of convergence research [2], namely that such research is “driven by a specific and compelling problem” and emphasizes “deep integration across disciplines.” Such integration is achieved when:

“... experts from different disciplines pursue common research challenges, [and] their knowledge, theories, methods, data, research communities and languages become increasingly intermingled or integrated. New frameworks, paradigms or even disciplines can form sustained interactions across multiple communities” [2].

The NSF-funded LDI focuses on such science convergence solutions for major challenges in learning with technology (online and blended). Furthermore, LDI will contribute to at least two of the ten new Big Ideas for Future Investment announced by NSF: Harnessing the Data Revolution for 21st Century Science and Engineering (HDR) and The Future of Work at the Human-Technology Frontier (FW-HTF).

About the Learner Data Institute

The Learner Data Institute (LDI) is an NSF-funded “data-intensive research in science and engineering” (DIRSE) initiative seeking to set out compelling, specific, big data research challenges for educational data science researchers and large-scale scientific and data convergence approaches to address them.

The LDI will help us learn: (1) how to transform a far-flung group of interdisciplinary researchers, developers, and practitioners into a community of practice that can fully exploit the data revolution through data and science convergence; (2) how adaptive instructional systems and data science can be used as research vehicles to further our understanding of how learners learn; (3) how to leverage the human-technology partnership with data and data science to improve learners’ and teachers’ ability to employ technology in a way that facilitates learning, while at the same time improving the affordability, effectiveness, and scalability of these systems; and (4) more generally, how to extend the frontiers of data science to include: new methods of data collection and design; more interpretable machine learning methods (e.g., by combining deep learning with more interpretable inference frameworks like Markov Logic); scalable new algorithms (e.g., for joint inference in Markov Logic Networks); and methods for identifying causal mechanisms from unstructured, semistructured, and structured data.

More specifically, LDI contributors from university-based research groups, industry, and government are focusing on cutting-edge, big data approaches to assessment, learner modeling, instructional design, modeling subject-area domains in instructionally useful ways, socio-cultural aspects of learning, ethical aspects of working with learner data, and the human-technology frontier, among other areas of interest.

Workshop & Review Committee

Vasile Rus, Ph.D., University of Memphis (Co-Chair)

Stephen E. Fancsali, Ph.D., Carnegie Learning, Inc. (Co-Chair)

Dale Bowman, Ph.D., University of Memphis

Jody Cockroft, AA, BS, CCRP, University of Memphis

Art Graesser, Ph.D., University of Memphis

Andrew Hampton, Ph.D., University of Memphis

Philip I. Pavlik Jr., Ph.D., University of Memphis

Chip Morrison, Ed.D., University of Memphis

Steven Ritter, Ph.D., Carnegie Learning, Inc.

Deepak Venugopal, Ph.D., University of Memphis

[Ad-hoc reviewers will be drawn from the group of LDI contributors and the broader community as necessary.]