Data Challenge

Predicting Performance Based on the Analysis of Reading Behavior

As the adoption of digital learning materials in modern education systems increases, the analysis of reading behavior and its effect on student performance is gaining attention. The main motivation of this workshop is to foster research into the analysis of students' interaction with digital textbooks, and to find new ways in which it can be used to inform and provide meaningful feedback to stakeholders such as teachers, students, and researchers. Building on the success of last year's workshop at LAK19, this year we will offer participants a chance to take part in a data challenge to predict the performance of 300 students based on the reading behaviors of over 1000 students from the previous year in the same course. Additional information on lecture schedules and the syllabus will also enable analysis of the learning context, giving further insight into the preview, in-class, and review reading strategies that learners employ. Participant contributions will be collected as evidence in a repository provided by the workshop and shared with the wider research community to promote research into reading analysis systems.

We welcome submissions on topics including, but not limited to, the following:

  • Student performance/at-risk prediction
  • Student reading behavior self-regulation profiles spanning the entire course
  • Preview, in-class, and review reading patterns
  • Student engagement analysis and behavior change detection
  • Visualization methods to inform and provide meaningful feedback to stakeholders

This year we will provide two datasets: a labeled training dataset of over 1000 students and a test dataset with data from around 300 students in the following year's classes. The learners' performance scores for the test dataset will be withheld; participants can upload their predicted scores to the workshop website once per day to check the evaluation results. A leaderboard showing the best evaluation score achieved by each participant will be provided to encourage competition between teams. Compared to the previous year's class, only small updates have been made to the reading materials, offering a real-world scenario for participants to tackle the problem of performance prediction based on digital reader usage.
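
As a rough illustration of the train-then-predict workflow described above, the sketch below fits a one-feature ordinary least-squares baseline on labeled training students and then predicts scores for test students whose labels are withheld. It is a hypothetical example, not a prescribed method: the feature (for instance, a per-student count of logged reading events), the function names, and the toy numbers are all assumptions.

```python
# Hypothetical baseline sketch: fit final_score ~ intercept + slope * feature
# on labeled training students, then predict scores for test students.
# The feature and all names here are illustrative assumptions.
from statistics import mean


def fit_simple_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # No variation in the feature: fall back to predicting the mean score.
        return y_bar, 0.0
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(intercept: float, slope: float, xs: list[float]) -> list[float]:
    """Apply the fitted line to feature values of unlabeled students."""
    return [intercept + slope * x for x in xs]


if __name__ == "__main__":
    # Toy training data (made up): feature value and known final score.
    train_x = [10.0, 20.0, 30.0, 40.0]
    train_y = [50.0, 60.0, 70.0, 80.0]
    a, b = fit_simple_linear(train_x, train_y)
    # Toy test students whose scores are withheld.
    print(predict(a, b, [15.0, 35.0]))
```

A single-feature linear model is only a floor to beat; richer reading-behavior features and stronger regressors would likely be needed for a competitive leaderboard entry.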

Participants will be encouraged to share their results and insights from analyzing the provided data, or other research related to reading behavior analysis, by submitting a paper for presentation at the workshop.

Participants will also be encouraged to contribute the programs and source code created in the workshop to an ongoing open learning analytics tool development project, for inclusion as analysis features.

Evaluation Metrics

  • Predictions will be evaluated using root mean squared error (RMSE).
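
RMSE is the square root of the mean of the squared differences between predicted and actual scores, so lower values are better and the error is expressed in the same units as the scores. The sketch below assumes the standard definition, computed over all test students; the challenge's own evaluation script is not described here, and the sample scores are made up.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between true and predicted scores."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Made-up true and predicted final scores for four students.
    print(rmse([80.0, 60.0, 90.0, 70.0], [78.0, 62.0, 88.0, 72.0]))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones, which rewards models that avoid badly wrong predictions for individual students.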


Workshop Date

  • March 23rd or 24th, 2020, at LAK 2020 in Frankfurt, Germany.

Important Dates

  • Initial paper submission: December 15, 2019 (This can be an outline of work in progress with preliminary analysis)
  • Notification of acceptance: January 5, 2020
  • Registration deadline: TBA
  • Data Challenge Final Results Submission deadline: TBA (same as the camera-ready deadline; last year it was around the end of January)
  • Camera-Ready deadline: TBA


Data challenge track: Initial paper submissions should at least give an outline of work in progress with some preliminary analysis.

Research track: Paper submissions should be complete, finalized papers.

  • Full paper: 8-10 pages (data challenge initial paper submission: 6 pages or more)
  • Short paper: 5-6 pages (data challenge initial paper submission: 4 pages or more)
  • Poster paper: 2-3 pages (data challenge initial paper submission: 1 page or more)

Submit papers using EasyChair:

All submissions to the workshop must follow the format of the Companion Proceedings Template (

Organizing Committee

  • Brendan Flanagan (Kyoto University, Japan)
  • Rwitajit Majumdar (Kyoto University, Japan)
  • Atsushi Shimada (Kyushu University, Japan)
  • Hiroaki Ogata (Kyoto University, Japan)

PC Members

  • Gökhan Akçapınar (Hacettepe University)
  • Ivica Boticki (University of Zagreb)
  • Mei-Rong Alice Chen (Kyoto University)
  • Mohammad Nehal Hasnine (Tokyo University of Agriculture and Technology)
  • Tsubasa Minematsu (Kyushu University)
  • Shitanshu Mishra (Vanderbilt University)
  • Yuichi Ono (Tsukuba University)
  • Rekha Ramesh (IIT Bombay)
  • Yuta Taniguchi (Kyushu University)
  • Masanori Yamada (Kyushu University)


By downloading or using our dataset, you agree to our Terms of Use.

The dataset for this data challenge includes 4 types of files:

   - Logged activity data from students' interactions with the BookRoll system.

   - Information about the length of the lecture materials used.

   - Information about the schedule of the lectures. This can be used to analyze the preview/in-class/review reading behaviors.

   - Data on the final score for each student. This can be used as a label for training and testing prediction models.
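
One simple way to use the schedule file for preview/in-class/review analysis is to compare each reading event's timestamp with the time slot of the lecture its material belongs to. The rule in this sketch (before the slot is preview, inside it is in-class, after it is review) is an assumption for illustration; how events are matched to lectures, and the exact schedule fields, should be taken from the README.

```python
from collections import Counter
from datetime import datetime


def reading_phase(event_time: datetime, lecture_start: datetime,
                  lecture_end: datetime) -> str:
    """Label one reading event relative to its lecture's time slot.

    Hypothetical rule: before the lecture starts -> "preview",
    during the lecture (both ends inclusive) -> "in-class",
    after the lecture ends -> "review".
    """
    if lecture_end <= lecture_start:
        raise ValueError("lecture_end must be later than lecture_start")
    if event_time < lecture_start:
        return "preview"
    if event_time <= lecture_end:
        return "in-class"
    return "review"


if __name__ == "__main__":
    # Made-up lecture slot and event timestamps, for illustration only.
    start = datetime(2019, 4, 10, 10, 30)
    end = datetime(2019, 4, 10, 12, 0)
    events = [
        datetime(2019, 4, 9, 21, 15),   # evening before the lecture
        datetime(2019, 4, 10, 11, 0),   # during the lecture
        datetime(2019, 4, 12, 18, 45),  # two days after the lecture
    ]
    # Tally how many events fall into each reading phase.
    print(Counter(reading_phase(t, start, end) for t in events))
```

Per-student tallies of these phases could then serve as features describing reading strategies across the course.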

For a more detailed description of the columns, please refer to the README file in the dataset download.
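
Before modeling, the raw event log usually needs to be aggregated into one feature vector per student. This minimal standard-library sketch counts logged operations per student; the column names `userid` and `operationname` and the operation labels in the sample rows are placeholders, so substitute the names documented in the README.

```python
import csv
import io
from collections import Counter, defaultdict


def count_operations(csv_file) -> dict[str, Counter]:
    """Count logged operations per student from a CSV event log.

    Assumes hypothetical columns "userid" and "operationname"; check the
    README in the dataset download for the actual column names.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["userid"]][row["operationname"]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Tiny in-memory stand-in for an event log file (made-up rows).
    sample = io.StringIO(
        "userid,operationname\n"
        "s1,OPEN\n"
        "s1,NEXT\n"
        "s1,NEXT\n"
        "s2,OPEN\n"
    )
    for user, ops in sorted(count_operations(sample).items()):
        print(user, dict(ops))
```

The resulting per-student counts could then be joined to the final-score file by student ID to form a training table.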

A link to download the dataset will be provided after you have registered your contact information and agreed to the terms of use.

For more information about BookRoll and the learning analytics platform on which the data was collected, please refer to the following:

  • Brendan Flanagan, Hiroaki Ogata, Integration of Learning Analytics Research and Production Systems While Protecting Privacy, Proceedings of the 25th International Conference on Computers in Education (ICCE2017), pp.333-338, 2017.
  • Digital teaching material delivery system "BookRoll"
  • Hiroaki Ogata, Misato Oi, Kousuke Mohri, Fumiya Okubo, Atsushi Shimada, Masanori Yamada, Jingyun Wang, and Sachio Hirokawa, Learning Analytics for E-Book-Based Educational Big Data in Higher Education, In Smart Sensors at the IoT Frontier, pp.327-350, Springer, Cham, 2017.