Longitudinal Data Mining Competition

Although this competition is now over, the journal of the Educational Data Mining Society released a special issue on August 29, 2020, in which some of the winners wrote up papers related to their findings.

Before that, we organized the Workshop on Scientific Findings from the ASSISTments Longitudinal Data Competition during the 11th Conference of Educational Data Mining in Buffalo, NY, on July 15-18, 2018. For more information, please visit our workshop website.

The prediction submission form has been reopened. The scoreboard

The Big Data for Education Spoke of the Northeast Big Data Innovation Hub is pleased to announce a competition in which data miners can try to predict an important longitudinal outcome using real-world educational data.

This competition uses data from a longitudinal study, now over a decade long, led by Professor Ryan Baker and Professor Neil Heffernan. This study, funded by multiple grants from the National Science Foundation, tracks students from their use of the ASSISTments blended learning platform in middle school in 2004-2007 through their high school course-taking, college enrollment, and first job out of college. Several papers have shown that behavior in ASSISTments in middle school can predict high school and college outcomes. In this competition, you will receive access to extensive (but carefully deidentified) click-stream data from middle school ASSISTments use, as well as carefully curated, brand-new outcome data on first job out of college, never before used in published research.

Why do we need this? Schools are already using 'drop out' detectors as early warning systems, but they also need 'these students are losing interest in STEM' detectors. The results of this competition could help inform the design of systems that try to reignite students' interest in studying STEM.

Successful entrants, as well as any other interested researchers, have been invited to submit both to a conference workshop (at EDM2018, in Buffalo, NY) and to a special issue of the Journal of Educational Data Mining. Drs. Baker and Heffernan's goal is twofold: 1) (the engineering goal) to do well at the task and 2) (the science goal) to have the field learn from this competition. Some places, like Kaggle, are focused on the first goal; we think both goals are important. Therefore, we are interested in having folks submit papers to the EDM society to talk about what they tried. For instance, if you ran a good set of interesting experiments as part of your attempt to do well in this competition, that could be a contribution for a submitted paper, even if you did not win (or come close to winning) the competition.

This competition closed on December 3rd, 2017 at 23:59 EST (originally December 1 at noon EST). Please see the Data Mining Competition 2017 page for more details about participating. You can still get access to the data now that the competition has ended.

Copyright Baker, Heffernan & Woolf (2018)