ASSISTments

Longitudinal Data Mining Competition

The Big Data for Education spoke of the Northeast Big Data Innovation Hub is pleased to release a competition where data miners can try to predict an important longitudinal outcome using real-world educational data.

This competition uses data from a longitudinal study, now over a decade long, led by Professor Ryan Baker and Professor Neil Heffernan. This study, funded by multiple grants from the National Science Foundation, tracks students from their use of the ASSISTments blended learning platform in middle school in 2004-2007 to their high school course-taking, college enrollment, and first job out of college. Several papers have shown that behavior in ASSISTments in middle school can predict high school and college outcomes. In this competition, you will receive access to extensive (but carefully deidentified) click-stream data from middle school ASSISTments use, as well as carefully curated, brand-new outcome data on first job out of college, never before used in published research. Successful entries will be invited to submit both to a conference workshop (at EDM2018, in Buffalo, NY) and to a special issue of the Journal of Educational Data Mining.

Drs. Baker and Heffernan's goal is twofold: 1) (the engineering goal) to do well at the task and 2) (the science goal) to have the field learn from this competition. Some places like Kaggle are focused on the first goal; we think both goals are important. Therefore, we are interested in having folks submit papers to the EDM society to talk about what they tried. For instance, if you ran a good set of interesting experiments as part of your attempt to do well in this competition, that could be a contribution for a submitted paper, even if you did not win (or come close to winning) the competition.

If you would like to meet with the competition organizers, we are hosting a one-day conference at Teachers College, Columbia University on August 28. The conference is free, but you must register for it here.

This competition will close on December 1 at noon EST. Please see the Data Mining Competition 2017 page for more details about participating.