30 Days After Introducing Programming: Which of My Students Will Fail?

At the end of this webpage, you can find the script, data, and materials we use in our research.

We now describe each file:

Script.r:
- The R script we use to fetch the data from the database, run the clustering strategy, and save the results as HTML files
- This script also has a function to run the hypothesis tests
Clusters.zip:
- This zip file contains all files with the per-user data (User, Submissions, Correct Submissions)
2 Groups.zip / 3 Groups.zip:
- These zip files contain all files generated by our script. Everything is summarized in HTML files
- The HTML files contain data regarding our two metrics for each student
- The k-means results are summarized in the files
- They also contain the mapping from each student to their cluster group
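As a rough illustration of the clustering step described above, the following Python sketch rescales the two metrics to the range 0 to 1 and assigns each student a cluster number with Lloyd's k-means. It is not a transcription of Script.r: min-max scaling, the deterministic farthest-point initialization, and the names `min_max` and `kmeans` are assumptions made for this example. R's built-in `kmeans()`, by contrast, starts from randomly chosen data points, so an actual run may number the clusters differently.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (normalized submissions, normalized correct submissions)


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster number in 1..k for each point."""
    # Assumed initialization: start at the first point, then repeatedly add
    # the point farthest from every center chosen so far (deterministic).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return [lab + 1 for lab in labels]


# Toy counts for six hypothetical students (not data from this study).
submissions = [40, 38, 45, 5, 8, 3]
correct = [35, 30, 41, 1, 2, 0]
points = list(zip(min_max(submissions), min_max(correct)))
print(kmeans(points, 2))  # [1, 1, 1, 2, 2, 2]
```

Because k-means numbers its clusters arbitrarily, cluster 1 is not necessarily the group of struggling students; only the Reproved column links a cluster to failure.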

All tables in the HTML files share the same structure. For each student, we have:

- ID: an internal number used by our database
- Number of Correct Submissions and Number of Submissions: our two metrics, normalized between 0 and 1
- Students: we omit the student names and use the letters A to Z instead
- Cluster Number: the group to which k-means assigned the student. Possible values:
  - For two groups: 1 or 2
  - For three groups: 1, 2, or 3
- Reproved: whether the student failed the course, true or false (0 or 1)
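To make this layout concrete, here is a hypothetical Python representation of one table row, plus a helper that computes the share of failed students in each cluster. The field names, the `validate` checks, and the reading of Reproved = 1 as "the student failed" are assumptions for this sketch, not details taken from the HTML files; the rows are toy values, not research data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StudentRow:
    """One row of an HTML results table (hypothetical field names)."""
    student_id: int             # internal database ID
    correct_submissions: float  # normalized to [0, 1]
    submissions: float          # normalized to [0, 1]
    student: str                # anonymized letter, A to Z
    cluster: int                # 1..k, as assigned by k-means
    reproved: int               # assumed: 1 = failed, 0 = passed

    def validate(self, k: int) -> None:
        """Raise ValueError if the row breaks the layout described above."""
        for value in (self.correct_submissions, self.submissions):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"student {self.student_id}: metric {value} not in [0, 1]")
        if not 1 <= self.cluster <= k:
            raise ValueError(f"student {self.student_id}: cluster {self.cluster} not in 1..{k}")
        if self.reproved not in (0, 1):
            raise ValueError(f"student {self.student_id}: reproved must be 0 or 1")


def failure_rate_by_cluster(rows: List[StudentRow], k: int) -> Dict[int, float]:
    """Fraction of students with reproved == 1 in each non-empty cluster."""
    for row in rows:
        row.validate(k)
    totals = Counter(row.cluster for row in rows)
    failed = Counter(row.cluster for row in rows if row.reproved == 1)
    return {c: failed[c] / totals[c] for c in sorted(totals)}


# Toy rows for a two-group run (not data from this study).
rows = [
    StudentRow(101, 0.9, 0.8, "A", 1, 0),
    StudentRow(102, 0.7, 0.9, "B", 1, 0),
    StudentRow(103, 0.1, 0.2, "C", 2, 1),
    StudentRow(104, 0.2, 0.1, "D", 2, 0),
]
print(failure_rate_by_cluster(rows, k=2))  # {1: 0.0, 2: 0.5}
```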

Each course is identified by an ID. Here is the mapping between the course IDs and the semesters: