Data Challenge

Predicting Performance Based on the Analysis of Reading Behavior

As the adoption of digital learning materials in modern education systems increases, the analysis of reading behavior and its effect on student performance is gaining attention. The main motivation of this workshop is to foster research into the analysis of students’ interaction with digital textbooks and to find new ways in which it can be used to inform and provide meaningful feedback to stakeholders such as teachers, students, and researchers. Building on the success of last year’s workshop at LAK19, this year we will offer participants a chance to take part in a data challenge to predict the performance of 300 students based on the reading behaviors of over 1000 students from the previous year in the same course. Additional information on lecture schedules and the syllabus will also enable analysis of the learning context, giving further insight into the preview, in-class, and review reading strategies that learners employ. Participant contributions will be collected as evidence in a repository provided by the workshop and shared with the wider research community to promote the development of research into reading analysis systems.

We welcome submissions on topics including, but not limited to, the following:

  • Student performance/at-risk prediction
  • Student reading behavior self-regulation profiles spanning the entire course
  • Preview, in-class, and review reading patterns
  • Student engagement analysis and behavior change detection
  • Visualization methods to inform and provide meaningful feedback to stakeholders

This year we will provide two datasets: a labeled training dataset of over 1000 students and a test dataset of around 300 students from the following year's classes. The learners' performance scores for the test dataset will be withheld; participants can upload their predicted scores to the workshop website to check the evaluation results once per day. A leaderboard will show the best evaluation score that each participant has achieved, to encourage competition between teams. Compared to the previous year's class, only small updates have been made to the reading materials, offering a real-world scenario in which participants can tackle the problem of performance prediction based on digital reader usage.

Participants will be encouraged to share their results and insights from analyzing the provided data, or other research related to reading behavior analysis, by submitting a paper for presentation at the workshop.

Participants will also be encouraged to contribute their programs/source code created in the workshop to an ongoing open learning analytics tool development project for inclusion as an analysis feature.

Evaluation Metrics

  • Predictions will be evaluated using the root mean squared error (RMSE).
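
For reference, RMSE over n students with actual scores y_i and predicted scores ŷ_i is sqrt((1/n) Σ (y_i − ŷ_i)²); lower is better. The following is a minimal sketch of that calculation, not the workshop's official scoring script. The function name `rmse` and the sample scores are illustrative assumptions and are not taken from the challenge data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of scores."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one score is required")
    # Square each per-student error, average, then take the square root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical scores for three students (assumed values for illustration).
actual_scores = [80.0, 65.0, 90.0]
predicted_scores = [78.0, 70.0, 85.0]
print(rmse(actual_scores, predicted_scores))
```

Because the errors are squared before averaging, a few large misses on individual students raise the score more than many small ones.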

Important Dates

  • Initial paper submission: December 15, 2019
  • Notification of acceptance: January 5, 2020
  • Registration deadline: January 20, 2020
  • Data Challenge Final Results Submission deadline: February 5, 2020
  • Camera-Ready deadline: February 5, 2020


  • March 23rd or 24th at LAK 2020 in Frankfurt, Germany.


Presentation time for full, short, poster, and data challenge papers is as follows:

(F) Full Paper: 10 min presentation + 5 min Q&A

(S) Short Paper: 8 min presentation + 4 min Q&A

(P) Poster Paper

(D) Data Challenge Lightning Talk: 3 min presentation & poster (please submit PPT/PDF before March 20, 2020 to

9:00 - Opening (Brendan Flanagan)

9:00 - 10:30 Research Paper Session (Chair: TBA)

  • Improving Learning Analytics and Student Performance through Connected Lifelong Learning on the Blockchain (Patrick Ocheja, Brendan Flanagan, Louis Lecailliez and Hiroaki Ogata) (F)
  • Interactive Dashboard for Teacher Orchestration Based on Collaborative Science Inquiry Behaviors (Jiaxin Cao and Yanjie Song) (S)
  • Understanding Jump Back Behaviors in E-book System (Boxuan Ma, Jiadong Chen, Chenhao Li, Likun Liu, Min Lu, Yuta Taniguchi and Shin’ichi Konomi) (F)
  • Social Knowledge Mapping Tool for Interactive Visualization of Learners' Knowledge (Akira Onoue, Masanori Yamada, Atsushi Shimada, Tsubasa Minematsu and Rin-Ichiro Taniguchi) (S)
  • Can the Area marked in eBook Readers Specify Learning Performance? (Yufan Xu, Xuewang Geng, Li Chen, Satomi Hamada, Yuta Taniguchi, Hiroaki Ogata, Atsushi Shimada and Masanori Yamada) (F)
  • Recommendation of Personalized Learning Materials based on Learning History and Campus Life Sensing (Keita Nakayama, Atsushi Shimada, Tsubasa Minematsu, Masanori Yamada and Rin-Ichiro Taniguchi) (S)
  • OpenLA: An Open-Source Library for e-Book Log Analytics (Atsushi Shimada, Ryusuke Murata and Tsubasa Minematsu) (S)

10:30 - 11:00 Morning Tea (Break)

11:00 - 11:20 Data Challenge Lightning Talk Session (Chair: TBA)

  • Learning Engagement - Clustering Analysis based on Student Interaction with Digital Textbooks (Abu Abu, Owoeye Oluwaseyi, Patrick Ocheja, Brendan Flanagan and Hiroaki Ogata) (D)
  • Learner’s Performance Prediction based on Histogram of Actions during class (Takayoshi Yamashita, Akiyoshi Satake, Tsubasa Hirakawa and Hironobu Fujiyoshi) (D)
  • Score Prediction Based on Page Feature Clustering (Ryusuke Murata, Tsubasa Minematsu and Atsushi Shimada) (D)
  • Performance prediction by behavior feature classification (Taisei Aoki, Yuya Kida and Maiya Hori) (D)
  • Predicting Student Exam Scores Based on Click-stream Level Data of Their Usage of an E-book System (Jihed Makhlouf and Tsunenori Mine) (D)

11:20 - 12:20 Research and Data Challenge Poster Session

  • All Data Challenge participants
  • A Picture-Book Recommender System for Extensive Reading on an E-Book System (Chifumi Nishioka, Sanae Fujita, Takashi Hattori, Tessei Kobayashi, Futoshi Naya and Hiroaki Ogata) (P)
  • What Activity Contributes to Academic Performance? (Tetsuya Shiino, Tsubasa Minematsu, Atsushi Shimada and Rin-Ichiro Taniguchi) (P)
  • Automatic Retrieval of Learning Contents Related to Quizzes for Supporting Students’ Enhanced Reviews (Takashi Ishikawa, Tsubasa Minematsu, Atsushi Shimada and Rin-Ichiro Taniguchi) (P)
  • Generating individual advice corresponding to the learning level by analyzing learning behaviors (Taisei Aoki, Maiya Hori and Atsushi Shimada) (P)
  • Evaluating the Accuracy of Real-time Learning Analytics in Student Activities (Takuro Owatari, Tsubasa Minematsu, Atsushi Shimada and Rin-Ichiro Taniguchi) (P)

12:20 - 12:30 Awards & Closing (Brendan Flanagan)


Data challenge track: Initial paper submissions should at least give an outline of work in progress with some preliminary analysis.

Research track: Paper submissions should be fully finalized papers.

  • Full paper: 8-10 pages (data challenge initial paper submission: 6 pages or more)
  • Short paper: 5-6 pages (data challenge initial paper submission: 4 pages or more)
  • Poster paper: 2-3 pages (data challenge initial paper submission: 1 page or more)

Submit papers using EasyChair:

All submissions to the workshop must follow the format of the Companion Proceedings Template (

Organizing Committee

  • Brendan Flanagan (Kyoto University, Japan)
  • Rwitajit Majumdar (Kyoto University, Japan)
  • Atsushi Shimada (Kyushu University, Japan)
  • Hiroaki Ogata (Kyoto University, Japan)

PC Members

  • Gökhan Akçapınar (Hacettepe University)
  • Ivica Boticki (University of Zagreb)
  • Mei-Rong Alice Chen (Kyoto University)
  • Mohammad Nehal Hasnine (Tokyo University of Agriculture and Technology)
  • Tsubasa Minematsu (Kyushu University)
  • Shitanshu Mishra (Vanderbilt University)
  • Yuichi Ono (Tsukuba University)
  • Rekha Ramesh (IIT Bombay)
  • Yuta Taniguchi (Kyushu University)
  • Masanori Yamada (Kyushu University)


By downloading and using our dataset, you agree to our Terms of Use.

The dataset for this data challenge includes 4 types of files:

   - Logged activity data from students' interactions with the BookRoll system.

   - Information about the length of the lecture materials used.

   - Information about the schedule of the lectures. This can be used to analyze the preview/in-class/review reading behaviors.

   - Data on the final score for each student. This can be used as a label for training and testing prediction models.
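
As one possible way to use the lecture schedule, each logged reading event can be labeled as preview, in-class, or review by comparing its timestamp with the start and end of the lecture in which the material was used. The sketch below is a hedged illustration only: the function name `reading_phase`, the timestamps, and the boundary rule (before the lecture starts is preview, after it ends is review) are assumptions, and the actual column names and formats are described in the dataset README.

```python
from datetime import datetime


def reading_phase(event_time, lecture_start, lecture_end):
    """Label an event relative to the lecture that used the material."""
    if event_time < lecture_start:
        return "preview"
    if event_time <= lecture_end:
        return "in-class"
    return "review"


# Hypothetical schedule entry and event timestamps (assumed values).
lecture_start = datetime(2019, 4, 10, 10, 30)
lecture_end = datetime(2019, 4, 10, 12, 0)
events = [
    datetime(2019, 4, 9, 21, 15),   # evening before the lecture
    datetime(2019, 4, 10, 11, 0),   # during the lecture
    datetime(2019, 4, 12, 8, 45),   # two days after the lecture
]

phases = [reading_phase(t, lecture_start, lecture_end) for t in events]
print(phases)  # ['preview', 'in-class', 'review']
```

Counts of events per phase and per student could then serve as features for a prediction model, with the final score file providing the training labels.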

For a more detailed description of the columns, please refer to the README file in the dataset download.

A link to download the dataset will be provided after you have registered your contact information and agreed to the terms of use.

For more information about BookRoll and the learning analytics platform on which the data was collected, please refer to the following:

  • Brendan Flanagan, Hiroaki Ogata, Integration of Learning Analytics Research and Production Systems While Protecting Privacy, Proceedings of the 25th International Conference on Computers in Education (ICCE2017), pp.333-338, 2017.
  • Digital teaching material delivery system "BookRoll"
  • Hiroaki Ogata, Misato Oi, Kousuke Mohri, Fumiya Okubo, Atsushi Shimada, Masanori Yamada, Jingyun Wang, and Sachio Hirokawa, Learning Analytics for E-Book-Based Educational Big Data in Higher Education, In Smart Sensors at the IoT Frontier, pp.327-350, Springer, Cham, 2017.