

Press Release

2019 NAEP Educational Data Mining Competition Results Announced


Researchers at Worcester Polytechnic Institute (WPI), the University of Pennsylvania, the University of Massachusetts Amherst, and ETS have announced the results of an educational data mining competition, co-sponsored by the Big Data for Education Spoke of the NSF Northeast Big Data Innovation Hub and by ETS.


The Competition:

The goal of the competition was to engage leading researchers and promising doctoral students to push the field of educational data mining forward, develop metrics for measuring students’ test-taking activities, and help develop and test evaluation methods for educational analysis.

Competition participants were invited to use data produced by students on the first half of the National Assessment of Educational Progress (NAEP) test to predict which students would demonstrate ineffective test-taking behaviors.

The competition used a NAEP dataset provided by ETS, with permission from the National Center for Education Statistics (NCES). Administered since 1969, NAEP is the only assessment that measures U.S. students' knowledge nationwide across academic subjects, in urban, suburban, and rural areas.

The goal of this competition was to understand effective and ineffective test-taking behaviors, and to determine how quickly these behaviors can be detected. Specifically, participants in the competition were given data from students’ early performance in the test and asked to predict which students would later be flagged for potentially not being as motivated in the second half of the test, identified by whether students spent enough time on problems and whether they completed all the problems on the test. The full details of this competition can be found on the competition website.


What makes this competition different?

This competition was designed to improve the scientific understanding of student test-taking strategies. The results show that as early as two minutes into the test, the best of these algorithms could predict with 65% accuracy whether the data came from a student who was less motivated in the second half of the test.

Professor Neil Heffernan, director of the PhD program in Learning Sciences and Technology at WPI and one of the organizers, said: "The Nation's Report Card’s mission is to show the trend line of our nation's progress in developing student knowledge. This competition is one step in helping to improve our understanding of the NAEP, as there is a concern that students might not be taking the NAEP test as seriously as they used to. For instance, we could use this data to identify a student who is potentially not motivated throughout the test, and between sections, invite the student's teacher to offer encouragement. It's too early to know how NAEP should use these algorithms, but this competition could be an important step in developing appropriate interventions."

Details on the data being measured are available at the competition’s website.


Participation:

The Nation's Report Card 2019 Data Mining Competition drew 89 individual and team participants, who made a total of 723 submissions. Researchers and students from 11 countries and 24 U.S. states took part. Some of the research teams were made up entirely of undergraduates; the organizers are pleased that this competition inspired undergraduates to care about educational data and become interested in its use in research.


The Results and the Winners:

Winners were judged based on the final score of their submission, using the evaluation criteria specified on the competition website.

The first place winner was Nathan Levin from Teachers College, Columbia University in New York City. He constructed and refined features based on student click data and the time students spent working on problems, then trained an XGBoost regressor on the final feature set.
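The general shape of this kind of feature construction can be sketched in a few lines. The log schema below (student, item, action, timestamp) and the specific features are illustrative assumptions, not the actual NAEP data format or the winner's feature set; in the real pipeline, features like these would be fed to an XGBoost regressor.

```python
from collections import defaultdict

# Hypothetical click-log records: (student_id, item_id, action, timestamp_sec).
# Schema and values are illustrative, not the actual NAEP log format.
logs = [
    ("s1", "q1", "click", 0.0), ("s1", "q1", "answer", 35.0),
    ("s1", "q2", "click", 40.0), ("s1", "q2", "answer", 42.0),
    ("s2", "q1", "click", 0.0), ("s2", "q1", "answer", 5.0),
]

def per_student_features(logs):
    """Aggregate simple timing and click-count features per student."""
    times = defaultdict(list)   # student -> time spent on each item
    clicks = defaultdict(int)   # student -> total logged actions
    item_start = {}
    for student, item, action, ts in logs:
        clicks[student] += 1
        key = (student, item)
        if key not in item_start:
            item_start[key] = ts          # first event on this item
        else:
            times[student].append(ts - item_start[key])
    feats = {}
    for student in clicks:
        t = times[student]
        feats[student] = {
            "n_actions": clicks[student],
            "mean_item_time": sum(t) / len(t) if t else 0.0,
            # Very short item times can signal rapid guessing (disengagement).
            "min_item_time": min(t) if t else 0.0,
        }
    return feats

features = per_student_features(logs)
```

Each student row in `features` would become one row of the design matrix passed to the regressor.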

The second place winners were Nirmal Patel, Aditya Sharma, and Tirth Shah from Playpower Labs. They constructed a large number of features building on their previous research, many of which were inspired by process mining and curriculum pacing. They then applied genetic-algorithm-based feature selection and modeling, and ensembled the predictions from multiple models into a single final prediction.
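Genetic-algorithm feature selection can be sketched in miniature: candidate feature subsets are encoded as bit masks, scored by a fitness function, and evolved by crossover and mutation. The fitness function below is a toy stand-in (rewarding a known-good subset); in practice it would be a cross-validated model score, and all names here are illustrative.

```python
import random

random.seed(0)

# Toy fitness: reward masks that select the "good" feature indices and
# lightly penalize extras. A real pipeline would score each mask with a
# cross-validated model instead. Everything here is illustrative.
GOOD = {0, 3, 5}
N_FEATURES = 8

def fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & GOOD) - 0.1 * len(chosen - GOOD)

def evolve(pop_size=20, generations=30, mutation_rate=0.1):
    # Random initial population of binary feature masks.
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_FEATURES)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with small probability (mutation).
            child = [bit ^ (random.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best_mask = evolve()
```

The final ensembling step would then average or stack the predictions of models trained on the selected subsets.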

The third place winner was Assistant Professor Nigel Bosch from the iSchool at the University of Illinois Urbana-Champaign. He constructed a large number of features (more than 4,000) using both domain knowledge and automatic feature-engineering methods, specifically tsfresh and Featuretools.
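The core idea behind such automatic feature engineering is to apply a large bank of aggregation functions to every student's event series, generating many candidate features mechanically. The miniature sketch below illustrates the idea with stdlib Python only; the series values and the feature bank are assumptions for illustration, not the actual tsfresh or Featuretools output.

```python
import statistics

# A tsfresh-style feature bank in miniature: each entry maps a feature name
# to an aggregation applied to a student's per-item time series (seconds).
# The functions and threshold below are illustrative assumptions.
FEATURE_BANK = {
    "mean": statistics.mean,
    "stdev": statistics.pstdev,
    "min": min,
    "max": max,
    "n": len,
    # Share of very quick responses; short times can indicate guessing.
    "frac_fast": lambda xs: sum(x < 5 for x in xs) / len(xs),
}

def auto_features(series_by_student):
    """Apply every function in the bank to every student's series."""
    return {
        student: {name: fn(xs) for name, fn in FEATURE_BANK.items()}
        for student, xs in series_by_student.items()
    }

feats = auto_features({"s1": [35.0, 2.0, 12.0], "s2": [5.0, 4.0]})
```

Scaling the bank to hundreds of functions over many event streams is how feature counts in the thousands arise, after which feature selection prunes the set.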

Participants of the top submissions will receive an invitation to submit their work and findings to a special issue of the Journal of Educational Data Mining. This should help to further improve the field’s understanding of this important work.


Honorable Mentions:

Among all of the participating teams, two additional teams showed outstanding effort and achieved impressive results on both the leaderboard and the final test set: KLETech B Division from KLE Technological University (Huballi, India) and LTWZ from Columbia University (New York City) and the University of Arizona (Tucson).

KLETech B Division treated the hidden dataset as three different tasks and developed a model for each task based on the different amounts of information provided (e.g., only the first 10 minutes, only the first 20 minutes, and all 30 minutes of log data). LTWZ developed their model using features based on student test-taking behaviors, such as how often each student checked the test timer.
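Splitting the log data into those three information conditions amounts to filtering events by a time cutoff. The sketch below shows the idea; the event tuples and timestamps (in seconds) are illustrative, not the actual competition data.

```python
# Hypothetical (timestamp_sec, action) events for one student; values are
# illustrative, not actual competition data.
events = [(30, "click"), (700, "answer"), (1300, "click"), (1750, "answer")]

def window(events, minutes):
    """Keep only events that occurred within the first `minutes` minutes."""
    cutoff = minutes * 60
    return [e for e in events if e[0] <= cutoff]

# One dataset per information condition: first 10, first 20, all 30 minutes.
tasks = {m: window(events, m) for m in (10, 20, 30)}
```

A separate model is then trained on each condition, so each model only ever sees the amount of log data its task provides.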

Final Scoreboard

NAEP Data Mining Competition 2019: Final Scoreboard

Except for the top 3, only participants who submitted their code before the deadline are included in the final scoreboard. Participants not included in the final scoreboard may request their final scores via email: assistments.data.mining.team [at] gmail [dot] com

The Annual Competition:

This was the second competition in what is anticipated to be an annual learning analytics competition. The first competition involved 11-year longitudinal predictions of student success. The third competition will be announced in a few months, and will be designed to challenge researchers to determine how to personalize learning.


Competition Organizers:

Technical Directors: Thanaporn “March” Patikorn,1 Neil Heffernan1

Organizers: Ryan Baker,2 Beverly Woolf,3 Irvin Katz,4 Carol Forsyth4 and Jaclyn Ocumpaugh2

1 Worcester Polytechnic Institute, 2 University of Pennsylvania

3 University of Massachusetts-Amherst, 4 Educational Testing Service


Acknowledgements:

The running of this competition was funded by National Science Foundation grants to UMass Amherst, WPI, and UPenn (1636782, 1636847, and 1661987); the opinions expressed here are not those of the NSF. We thank NCES for helping to make the NAEP data available, and ETS for helping coordinate the use of the NAEP data to run this competition.