30 Days After Introducing Programming: Which of My Students Will Fail?

At the end of this webpage, you can find the script, data, and materials we use in our research.

We now describe each file:

  • Script.r:
    • The R script we use to fetch the data from the database, execute the strategy, and save the results as HTML files
    • This script also has a function to run the hypothesis tests
  • Clusters.zip:
    • This zip file contains all files describing each user's submissions and correct submissions (User, Submissions, Correct Submissions)
  • 2 Groups.zip / 3 Groups.zip:
    • These zip files contain all files generated by our script. Everything is summarized in HTML files
    • The HTML files contain the values of our two metrics for each student
    • They summarize the k-means results
    • They also map each student to a cluster group

All tables in the HTML files share the same structure, which we detail in what follows. For each student, we have:

  • An internal number used by our database (ID)
  • The two metrics, Number of Correct Submissions and Number of Submissions, each normalized to a value between 0 and 1
  • An anonymized label in place of the student name (column Students), taken from the letters A to Z
  • Cluster Number: the group to which k-means assigned the student
    • Possible numbers:
      • For two groups: 1 or 2
      • For three groups: 1, 2, or 3
  • Reproved: whether the student failed the course, stored as a binary value (0 or 1)
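
To make the table structure concrete, here is a minimal sketch of how rows of this shape could be produced. It is written in Python for illustration only and is not the authors' Script.r. The student data, the IDs, the function names, and the choice of plain Lloyd-style k-means with a deterministic start are all assumptions. The Reproved column is left out because it comes from course records, not from the clustering.

```python
# Hypothetical raw counts for four students (illustrative data, not from the study).
ids = [101, 102, 103, 104]          # assumed internal database IDs
letters = ["A", "B", "C", "D"]      # anonymized student labels
submissions = [40, 35, 5, 8]
correct = [30, 28, 1, 2]


def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=100):
    """Plain Lloyd-style k-means on 2-D points; returns 1-based cluster labels."""
    # Assumption: start from the first k distinct points so the run is deterministic.
    centroids = list(dict.fromkeys(points))[:k]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return [lab + 1 for lab in labels]


normalized_submissions = min_max(submissions)
normalized_correct = min_max(correct)
points = list(zip(normalized_correct, normalized_submissions))
clusters = kmeans(points, k=2)

# One row per student, mirroring the columns described in the list above.
rows = [
    {
        "ID": i,
        "Students": s,
        "Number of Correct Submissions": c,
        "Number of Submissions": n,
        "Cluster Number": g,
    }
    for i, s, c, n, g in zip(
        ids, letters, normalized_correct, normalized_submissions, clusters
    )
]

for row in rows:
    print(row)
```

In this sketch, passing k=3 instead of k=2 would yield cluster numbers 1, 2, or 3, matching the three-group case.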

We have IDs to represent each course. Here is the mapping between the IDs and the semesters: