Data Challenge

2020 CSEDM Data Challenge

During the CSEDM Workshop, we will launch the 2020 CSEDM Data Challenge. The goal of the challenge is to bring researchers together to tackle a common data mining task that is specific to CS Education. This year's challenge will feature multiple datasets from programming courses and environments in a common format (ProgSnap2).

A working (and incomplete) draft of the challenge can be found here.

In the 2019 Data Challenge, participants competed to create the best student model to predict programming performance. We had four entries and one winner, which were presented at the 2nd CSEDM Workshop. In 2020, we plan to extend the challenge in several ways:

  • Provide multiple datasets in a shared format, and use these to evaluate submissions on their generalizability.

  • Create multiple tracks with separate goals. For example, we might have one track for predicting the success of a student's next submission, as in 2019, and another track for modeling learning curves (as in this paper).

  • Provide incentives for teams to collaborate across institutions.

Participants in this year's CSEDM Workshop will have the chance to shape the goals and procedures of the Data Challenge.

We are seeking community input before the workshop as well. If you have a computing education dataset you can share, or an idea for a good data mining challenge, contact Thomas Price at twprice@ncsu.edu.