Introductory Project

Overview

The purpose of the introductory project is to expose you to common tasks in user modeling research. You will pick one experiment from the Common Trends in Online Educational Experiments Dataset and perform the tasks specified below. For each task, you MUST include your work: if you used Python, attach your scripts; if you used Excel, attach your spreadsheets; if you did everything by hand, upload pictures of your work.


Whichever experiment you select, ensure that it has the following:

  • At least 30 student participants

  • At least one control and one treatment condition


The introductory project is due September 15th before class.

Task 1

  • Report the following:

    • # of students in any control condition

    • # of students in any treatment condition

    • % of students placed in a control condition who completed the assignment

    • % of students placed in a treatment condition who completed the assignment

  • Find an appropriate quantitative method for determining whether the likelihood that a student completed their assignment was influenced by the condition they were placed in. Justify in writing why this method is appropriate.

  • Use your selected method to determine whether the likelihood that students completed their assignment was influenced by the condition they were placed in, and explain the implications of whatever value(s) your method produced.
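
As an illustration of the Task 1 workflow, the sketch below counts students per condition, computes completion percentages, and runs a Pearson chi-square test of independence on the resulting 2x2 table. The chi-square test is only one candidate method, not the required answer, and the records here are made-up toy data rather than values from the dataset. The function names and the (condition, completed) record layout are assumptions for this example. Note that scipy.stats.chi2_contingency applies Yates' continuity correction to 2x2 tables by default, so its result would differ slightly from this uncorrected version.

```python
import math

def completion_summary(records):
    """Tally (condition, completed) pairs into counts and completion percentages.

    `records` is an assumed layout: condition is 'control' or 'treatment',
    completed is 0 or 1.
    """
    # counts[condition] = [number not completed, number completed]
    counts = {"control": [0, 0], "treatment": [0, 0]}
    for condition, completed in records:
        counts[condition][int(bool(completed))] += 1
    summary = {}
    for condition, (not_done, done) in counts.items():
        n = not_done + done
        summary[condition] = {
            "n": n,
            "pct_completed": 100.0 * done / n if n else float("nan"),
        }
    return counts, summary

def chi_square_2x2(counts):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    observed = [counts["control"], counts["treatment"]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data (hypothetical): 20 control students, 15 completed; 20 treatment students, 10 completed.
records = (
    [("control", 1)] * 15 + [("control", 0)] * 5
    + [("treatment", 1)] * 10 + [("treatment", 0)] * 10
)
counts, summary = completion_summary(records)
chi2, p_value = chi_square_2x2(counts)
print(summary)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value would suggest that completion and condition are not independent; whether the chi-square approximation is appropriate depends on the expected cell counts in your chosen experiment, which is part of what your written justification should address.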

Task 2

  • Identify three different features of students (either that already exist in the priors table or that you calculate yourself from the action and problem logs) that you think could influence a student's ability to complete their assignment. Justify in writing why you have chosen these three features.

  • Calculate correlations

    • Calculate the correlation between each of your features and all students' assignment completion.

    • Calculate separate correlations for students in control conditions and students in treatment conditions.

    • Report the statistical significance of each correlation coefficient you've calculated.

    • Create a scatter plot for each of your features, with the feature on the x-axis and students' assignment completion on the y-axis.
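
The correlation steps above can be sketched as follows. Because assignment completion is binary, the Pearson correlation computed here is the point-biserial correlation. For significance, this sketch swaps in a permutation test so that it needs only the standard library; the more common route is the t-based p-value that scipy.stats.pearsonr reports. The feature name prior_percent_correct, the tuple layout, and the toy values are all hypothetical and not taken from the dataset.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value: how often shuffled data gives |r| at least as large."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

def correlations_by_group(students):
    """Correlate the feature with completion for all students and for each condition."""
    groups = {
        "all": students,
        "control": [s for s in students if s[0] == "control"],
        "treatment": [s for s in students if s[0] == "treatment"],
    }
    results = {}
    for name, rows in groups.items():
        feature = [r[1] for r in rows]
        completed = [r[2] for r in rows]
        results[name] = (pearson_r(feature, completed),
                         permutation_p_value(feature, completed))
    return results

# Toy rows (hypothetical): (condition, prior_percent_correct, completed)
students = [
    ("control", 0.2, 0), ("control", 0.4, 0), ("control", 0.6, 1),
    ("control", 0.8, 1), ("control", 0.5, 1),
    ("treatment", 0.3, 0), ("treatment", 0.7, 1), ("treatment", 0.9, 1),
    ("treatment", 0.1, 0), ("treatment", 0.6, 1),
]
results = correlations_by_group(students)
for name, (r, p) in results.items():
    print(f"{name}: r = {r:.3f}, permutation p = {p:.3f}")
```

For the scatter plots, a library such as matplotlib (pyplot.scatter) is one option. Since the y-axis takes only the values 0 and 1, adding a small vertical jitter can make overlapping points easier to see.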

Task 3

  • Randomly partition your data into a "training" and a "test" set. Explain the value of doing this.

  • Pick a supervised learning model to predict assignment completion, and justify your choice in writing.

  • Fit your model on the training data to predict assignment completion using only your three features from Task 2 and the students' assigned experimental condition.

  • Use your model to predict assignment completion for your training set and your test set.

  • Calculate and report at least three relevant accuracy metrics (such as AUC) separately for the training and test set predictions.

  • Discuss the differences in these metrics between the training and test set.
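
A minimal end-to-end sketch of the Task 3 steps is shown below, using only the standard library. It assumes logistic regression fit by plain gradient descent as the model, which is just one possible choice, and it runs on synthetic data generated inside the script, not on the dataset. All function names, the 80/20 split, the learning rate, and the epoch count are assumptions for illustration. In practice, scikit-learn offers equivalents such as train_test_split, LogisticRegression, and roc_auc_score.

```python
import math
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, xi):
    """Predicted completion probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def accuracy(y, p, threshold=0.5):
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def rmse(y, p):
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y))

def evaluate(w, rows):
    X = [features for features, _ in rows]
    y = [label for _, label in rows]
    p = [predict_proba(w, xi) for xi in X]
    return {"accuracy": accuracy(y, p), "auc": auc(y, p), "rmse": rmse(y, p)}

# Synthetic rows (hypothetical): ([feature1, feature2, feature3, in_treatment], completed)
rng = random.Random(42)
rows = []
for _ in range(200):
    f = [rng.random(), rng.random(), rng.random(), rng.randint(0, 1)]
    prob = sigmoid(2.0 * f[0] - 1.0 + 0.5 * f[3])
    rows.append((f, 1 if rng.random() < prob else 0))

train, test = train_test_split(rows, test_fraction=0.2, seed=0)
w = fit_logistic([f for f, _ in train], [c for _, c in train])
train_metrics = evaluate(w, train)
test_metrics = evaluate(w, test)
print("train:", train_metrics)
print("test: ", test_metrics)
```

Training metrics that are noticeably better than test metrics would suggest the model has fit noise specific to the training set, which is the kind of difference the discussion in the last bullet should address.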