ASSISTments Data Mining Competition 2017: Details
Introduction Video by Professor Neil Heffernan
Overview: This competition is sponsored by the Big Data for Education Spoke of the Northeast Big Data Hub, an NSF initiative to help spur progress in educational research using big data. Individuals and teams taking part will use educational data from ASSISTments, an intelligent tutoring system for middle school mathematics, to make long-term predictions. Winners will be asked to publish their work in the Journal of Educational Data Mining.
Long-term outcome being modeled: The task in this competition is to develop a cross-validated prediction model that uses middle-school data to predict whether students (who have now finished college) pursue a career in a STEM field (1) or not (0).
The dataset for this competition, in which you will predict which students have entered STEM career fields (and which haven't), includes these 12 files:
- student_log_#.csv (9 files), which contain the logged activity data from students' interactions with ASSISTments.
- training_label.csv, which contains some student-level data and the dependent measure for the training set: isSTEM.
- validation_test_label.csv, which contains some student-level data but not the dependent measure for the validation and test sets.
More information can be found in our column label descriptions for these files.
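As a starting point, the files listed above can be read with Python's standard csv module. The sketch below counts logged rows per student and joins those counts to the isSTEM labels. The student-ID column name (student_id) is a placeholder that does not come from this page, and the glob pattern student_log_*.csv is an assumption about how the nine log files are numbered; check the column label descriptions and adjust both.

```python
import csv
import glob
from collections import Counter

# Hypothetical column name: this page does not name the student-ID column.
# Replace it with the real one from the column label descriptions.
ID_COLUMN = "student_id"


def count_log_rows(lines, id_column=ID_COLUMN):
    """Count logged rows per student in one CSV, given as an iterable of lines."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[id_column]] += 1
    return counts


def read_labels(lines, id_column=ID_COLUMN, label_column="isSTEM"):
    """Map student ID -> isSTEM label (0 or 1) from training_label.csv lines."""
    return {
        row[id_column]: int(float(row[label_column]))
        for row in csv.DictReader(lines)
    }


def load_training_table(data_dir="."):
    """Return {student ID: (number of log rows, isSTEM)} for the training set."""
    counts = Counter()
    # Assumed file naming: student_log_1.csv, student_log_2.csv, ...
    for path in sorted(glob.glob(f"{data_dir}/student_log_*.csv")):
        with open(path, newline="") as f:
            counts.update(count_log_rows(f))
    with open(f"{data_dir}/training_label.csv", newline="") as f:
        labels = read_labels(f)
    return {sid: (counts.get(sid, 0), label) for sid, label in labels.items()}
```

The row count is only a stand-in feature to show the join; a real model would aggregate the logged columns that matter for the prediction.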
There is no fee to participate, but you must register here. On registering, you will receive a key that allows you to submit your models for evaluation.
Prediction Model Submissions:
Models can be submitted (here) daily for evaluation, using the following format.
1. The submission must be a comma-separated string of 172 predictions (the same number as the rows in validation_test_label.csv) with no whitespace.
2. Each prediction must be a number between 0 and 1, inclusive. It can be an integer (e.g. 1), a decimal (e.g. 0.69472), or a decimal without a leading zero (e.g. .69472).
3. The predictions must be in the same order as the students in validation_test_label.csv.
4. You also need to enter the 10-character key that was sent to you during registration.
5. Submissions are evaluated only once a day, at noon EST. We will evaluate only the latest submission from each registered participant.
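The format rules above can be checked locally before submitting. The following is a minimal sketch, not the organizers' validator: the function name and the choice of five decimal places are assumptions made for illustration, and the submission form may apply checks that are not reproduced here.

```python
N_PREDICTIONS = 172  # number of rows in validation_test_label.csv, per rule 1


def format_submission(predictions, expected_count=N_PREDICTIONS):
    """Return the comma-separated submission string, or raise ValueError."""
    values = [float(p) for p in predictions]
    if len(values) != expected_count:
        raise ValueError(
            f"expected {expected_count} predictions, got {len(values)}"
        )
    for i, v in enumerate(values):
        # The chained comparison is also False for NaN, so NaN is rejected.
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"prediction {i} is outside [0, 1]: {v}")
    # Fixed-point formatting avoids scientific notation such as 1e-05,
    # and join() with a bare comma leaves no whitespace in the string.
    return ",".join(f"{v:.5f}" for v in values)
```

The predictions must already be in the row order of validation_test_label.csv; this function does not reorder them.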
In keeping with the cross-validation practices that are an important part of the Educational Data Mining community, we have held out two randomly selected portions of this data set to evaluate prediction models:
(1) A validation set, which is used to give participants formative feedback on their prediction models. Participants can resubmit as often as once daily between now and December 1st.
(2) A test set, which will be used for the final evaluation of prediction models on December 1st, at the end of the competition. The labels of the test set will not be available to participants by any means.
We will use the validation set to evaluate any newly submitted models each day at noon EST.
Submissions are evaluated using two measures: root mean squared error (RMSE) and area under the ROC curve (AUC). The winner of this competition is the competitor whose submission has both low RMSE and high AUC on the test set. More specifically, we will determine the winner using the linear aggregation of the two measures, (1 - RMSE) + AUC, where a higher value is better.
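To estimate this score on your own held-out data, the aggregate (1 - RMSE) + AUC can be computed as in the sketch below. It assumes binary 0/1 labels and computes AUC as the probability that a randomly chosen positive student receives a higher prediction than a randomly chosen negative one, with ties counted as one half. How the organizers' implementation handles ties is not stated here, so small differences are possible.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    n = len(labels)
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(labels, predictions)) / n)


def auc(labels, predictions):
    """Probability that a positive outranks a negative (ties count as 0.5)."""
    pos = [p for y, p in zip(labels, predictions) if y == 1]
    neg = [p for y, p in zip(labels, predictions) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def competition_score(labels, predictions):
    """Linear aggregate used for ranking: (1 - RMSE) + AUC, higher is better."""
    return (1.0 - rmse(labels, predictions)) + auc(labels, predictions)
```

Under this definition a perfect model scores 2.0, and predicting a constant 0.5 for every student scores 1.0 (RMSE 0.5, AUC 0.5), which makes a convenient baseline.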
In addition, participants must agree to submit the code that produces the submitted predictions in order to be eligible for the winner's awards.
The best submission of each competitor will be updated on our PUBLIC SCOREBOARD, which allows you to see how your model compares to those of other competitors.
Common submission errors:
There are a few things that can invalidate your submission.
- Email address: you can check whether you entered the right address by looking at your submission log file (a Google Doc), which was sent to you when you registered. If you entered the wrong address, no new row for the submission will appear.
- Key: if you entered the right email address but the wrong key, the submission will still show up in the log file, but with isValidated = FALSE. In this case, the submission will not count as your latest submission. Only the latest submission with isValidated = TRUE is evaluated.
- File formats: If your predictions do not match the format we specified, the form will show an error message, and you will not be able to proceed with the submission.
If you have any questions, comments, or concerns, please contact us at assistments.data.mining.team [at] gmail [dot] com