2009-2010 ASSISTment Data

Data Description for

This is the ASSISTment data gathered during the 2009-2010 school year. The full dataset is split into two files: one contains all skill builder data, and the other contains all non skill builder data.

Skill builder data is also called mastery learning data. It comes from skill builder (mastery learning) problem sets, in which a student is considered to have mastered a skill upon meeting a certain criterion (normally answering 3 questions correctly in a row). No more questions are given after mastery.
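
The mastery criterion described above can be sketched in a few lines. This is only an illustration, assuming a student's responses on one skill are available as a chronological list of 0/1 values (as in the `correct` column); the function name `opportunities_to_mastery` and the `streak_needed` parameter are hypothetical and not part of the dataset.

```python
def opportunities_to_mastery(responses, streak_needed=3):
    """Return the 1-based position of the response that completes the
    required streak of correct answers, or None if it is never reached.

    responses: chronological list of 0/1 values for one student and skill.
    streak_needed: assumed mastery criterion (3 in a row is the usual default).
    """
    streak = 0
    for position, correct in enumerate(responses, start=1):
        # A correct answer extends the streak; anything else resets it.
        streak = streak + 1 if correct == 1 else 0
        if streak >= streak_needed:
            return position
    return None


print(opportunities_to_mastery([0, 1, 1, 0, 1, 1, 1]))
print(opportunities_to_mastery([1, 1, 0, 1]))
```

Because no more questions are given after mastery, a mastered student's sequence in the skill builder data would be expected to end at the position this function returns.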

The dataset is free to use.

Data Download

There are three separate files available. If you write a paper using them, please include a link to the data page so that it is clear exactly which data you used. These data sets include the number of hints and the number of attempts, but they do not contain action-level data (that is, the exact sequence of hints and attempts). That exact sequence is often important in Ryan Baker's affect detector work.

This is the data set that has gotten a lot of attention. Click below.

Skill-builder data 2009-2010:

Non-Skill Builder data 2009-10:

ASSISTments 2009-2010 Full Data set:

This file contains the data from the two data sets above, plus data that has no problem set type associated with it.

Possible Research Questions You Could Use This Data For

RQ1: Predict Student Performance

The educational data mining field has been building student models to fit student data and predict student performance for many years. A lot of research has used ASSISTment data to predict student performance. Some work predicts a student's very next response, as in the paper The “Assistance” Model: Leveraging How Many Hints and Attempts a Student Needs; other work predicts student performance after a time interval, as in the paper Using Student Modeling to Estimate Student Knowledge Retention.
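
As a point of comparison for next-response prediction, a naive baseline can be sketched as follows. This is not the method of either paper named above. It simply predicts each response from the student's running proportion of correct answers on the same skill. The function name, the `prior` value of 0.5, and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def running_mean_predictions(rows, prior=0.5):
    """Predict each response from the student's earlier responses on that skill.

    rows: iterable of (user_id, skill_id, correct) tuples in chronological
          order, where correct is 0 or 1.
    prior: assumed prediction when the student has no history on the skill.
    Returns a list of (predicted_probability, actual) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # (user, skill) -> [sum correct, count]
    predictions = []
    for user_id, skill_id, correct in rows:
        history = totals[(user_id, skill_id)]
        predicted = history[0] / history[1] if history[1] else prior
        predictions.append((predicted, correct))
        # Update the history only after predicting, so no future data leaks in.
        history[0] += correct
        history[1] += 1
    return predictions


sample = [(1, "a", 1), (1, "a", 0), (1, "a", 1), (2, "a", 0)]
for predicted, actual in running_mean_predictions(sample):
    print(predicted, actual)
```

A student model would normally be expected to beat a baseline of this kind on held-out data.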

RQ2: Personalization

There have been efforts to individualize student models. Research has shown that personalizing student parameters improves model fit.

Here are some examples of work in this field using the ASSISTment data:

The Student Skill Model

Modeling Individualization in a Bayesian Networks Implementation of Knowledge Tracing

RQ3: Wheel-Spinning

Wheel-spinning refers to the situation where a student may find it hard to learn a skill from a problem set. Detecting wheel-spinning is useful in intelligent tutoring systems.

For more details, please see the paper: Wheel-spinning: students who fail to master a skill
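
One possible way to label wheel-spinning in this data is sketched below. It assumes wheel-spinning means failing to reach the mastery streak within a fixed number of practice opportunities. Both thresholds (`streak_needed=3` and `max_opportunities=10`) are assumptions here and should be checked against the paper. The function name is hypothetical.

```python
def is_wheel_spinning(responses, streak_needed=3, max_opportunities=10):
    """Label one student-skill response sequence.

    responses: chronological list of 0/1 values for one student and skill.
    Returns False if the streak is reached within max_opportunities,
    True if max_opportunities responses were seen without reaching it,
    and None if the sequence is too short to decide either way.
    """
    streak = 0
    for correct in responses[:max_opportunities]:
        streak = streak + 1 if correct == 1 else 0
        if streak >= streak_needed:
            return False
    if len(responses) < max_opportunities:
        return None  # Student stopped early; the outcome is undetermined.
    return True


print(is_wheel_spinning([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]))
print(is_wheel_spinning([0, 1, 1, 1]))
print(is_wheel_spinning([0, 1]))
```

Returning `None` for short sequences keeps students who simply stopped practicing separate from students who practiced at length without mastering the skill.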

RQ4: Clustering

Previous work has shown some benefit from clustering students when predicting student performance. Different clustering features and different clustering methods could be explored to further improve student models.

Here are some examples of clustering work done using the ASSISTment data:

Clustering Students to Generate an Ensemble to Improve Standard Test Score Prediction

Spectral Clustering in Educational Data Mining

Column Headings (this list is old and we have more complete descriptions of some of these fields here)

order_id
- These IDs are chronological and refer to the ID of the original problem log.
assignment_id
- Two different assignments can have the same sequence ID. Each assignment is specific to a single teacher/class.
user_id
- The ID of the student doing the problem.
assistment_id
- The ID of the ASSISTment. An ASSISTment consists of one or more problems.
problem_id
- The ID of the problem.
original
- 1 = Main problem
- 0 = Scaffolding problem
correct
- 1 = Correct on the first attempt
- 0 = Incorrect on the first attempt, or asked for help
- This column is often the target for prediction.

attempt_count
- Number of student attempts on this problem.
ms_first_response
- The time in milliseconds for the student's first response.
tutor_mode
- tutor, test mode, pretest, or posttest
answer_type
- choose_1: Multiple choice (radio buttons)
- algebra: Math evaluated string (text box)
- fill_in: Simple string-compared answer (text box)
- open_response: Records the student's answer, but the response is always marked correct
sequence_id
- The content ID of the problem set. Different assignments that assign the same problem set will have the same sequence ID.
student_class_id
- The class ID.
position
- Assignment position on the class assignments page.
problem_set_type
- Linear - Student completes all problems in a predetermined order.
- Random - Student completes all problems, but each student is presented with the problems in a different random order.
- Mastery - Random order; students must "master" the problem set by getting a certain number of questions (3 by default) correct in a row before being able to continue.
base_sequence_id
- This accounts for sequences that have been copied. It points to the original copy, or is the same as sequence_id if the sequence has not been copied.
skill_id
- ID of the skill associated with the problem.
  - For the skill builder dataset, different skills for the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills.
  - For the non skill builder dataset, different skills for the same data record are in the same row, separated by commas.
skill_name
- Skill name associated with the problem.
  - For the skill builder dataset, different skills for the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills.
  - For the non skill builder dataset, different skills for the same data record are in the same row, separated by commas.
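
The two multi-skill layouts described for skill_id and skill_name can be converted into each other with a small amount of code. The sketch below assumes rows are held as dictionaries keyed by column name and that order_id identifies a single problem log. The helper names `explode_skills` and `unique_problem_logs` are hypothetical. Only skill_id is split here, since skill names might themselves contain commas.

```python
def explode_skills(rows):
    """Non skill builder layout -> one row per skill.

    Splits a comma-separated skill_id value into separate rows,
    copying every other column unchanged.
    """
    result = []
    for row in rows:
        skills = [s.strip() for s in str(row["skill_id"]).split(",") if s.strip()]
        for skill in skills or [""]:  # Keep rows that have no skill tagged.
            new_row = dict(row)
            new_row["skill_id"] = skill
            result.append(new_row)
    return result


def unique_problem_logs(rows):
    """Skill builder layout -> one row per problem log.

    Keeps the first row seen for each order_id, so multi-skill
    questions are not counted more than once.
    """
    seen = set()
    result = []
    for row in rows:
        if row["order_id"] not in seen:
            seen.add(row["order_id"])
            result.append(row)
    return result


non_skill_builder = [{"order_id": 1, "skill_id": "10,12", "correct": 1}]
skill_builder = [
    {"order_id": 5, "skill_id": "10", "correct": 0},
    {"order_id": 5, "skill_id": "12", "correct": 0},
    {"order_id": 6, "skill_id": "10", "correct": 1},
]
print(explode_skills(non_skill_builder))
print(unique_problem_logs(skill_builder))
```

Deduplicating before computing overall accuracy avoids giving multi-skill questions extra weight, while the exploded form is the more convenient one for per-skill models.
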
teacher_id
- The ID of the teacher who assigned the problem.
school_id

- The ID of the school where the problem was assigned.

hint_count

- Number of hints the student asked for on this problem.

hint_total

- Number of possible hints on this problem.

overlap_time

- The time in milliseconds for the student's overlap time.

template_id

- The template ID of the ASSISTment. ASSISTments with the same template ID have similar questions.

answer_id

- The answer ID for multi-choice questions.

answer_text

- The answer text for fill-in questions.

first_action

- The type of first action: attempt or ask for a hint.

bottom_hint

- Whether or not the student asks for all hints.

opportunity

- The number of opportunities the student has to practice on this skill.
  - For the skill builder dataset, opportunities for different skills of the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills and the corresponding opportunity count.
  - For the non skill builder dataset, opportunities for different skills of the same data record are in the same row, separated by commas.

opportunity_original

- The number of opportunities the student has to practice on this skill counting only original problems.
  - For the skill builder dataset, original opportunities for different skills of the same data record are in different rows. This means if a student answers a multi skill question, this record is duplicated several times, and each duplication is tagged with one of the multi skills and the corresponding original opportunity count.
  - For the non skill builder dataset, original opportunities for different skills of the same data record are in the same row, separated by commas.
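
To illustrate how opportunity counts of this kind could be derived from the other columns, here is a minimal sketch. It assumes one skill per row (the skill builder layout), chronological ordering by order_id, and counts that start at 1. How the dataset itself fills opportunity_original on scaffolding rows is not stated above, so this sketch leaves it as `None` there. The function name `count_opportunities` is hypothetical.

```python
from collections import defaultdict


def count_opportunities(rows):
    """Add opportunity and opportunity_original to copies of the rows.

    rows: dicts with order_id, user_id, skill_id, and original (1 or 0).
    Counts are kept per (user_id, skill_id) in order_id order.
    """
    counts = defaultdict(int)           # all problems
    original_counts = defaultdict(int)  # main (original == 1) problems only
    result = []
    for row in sorted(rows, key=lambda r: r["order_id"]):
        key = (row["user_id"], row["skill_id"])
        counts[key] += 1
        new_row = dict(row)
        new_row["opportunity"] = counts[key]
        if row["original"] == 1:
            original_counts[key] += 1
            new_row["opportunity_original"] = original_counts[key]
        else:
            # Assumption: scaffolding rows get no original-opportunity count.
            new_row["opportunity_original"] = None
        result.append(new_row)
    return result


logs = [
    {"order_id": 3, "user_id": 7, "skill_id": "10", "original": 1},
    {"order_id": 1, "user_id": 7, "skill_id": "10", "original": 1},
    {"order_id": 2, "user_id": 7, "skill_id": "10", "original": 0},
    {"order_id": 4, "user_id": 7, "skill_id": "12", "original": 1},
]
for row in count_opportunities(logs):
    print(row["order_id"], row["skill_id"], row["opportunity"], row["opportunity_original"])
```

Comparing the recomputed values against the provided opportunity columns on a few students is a quick way to check whether these assumptions match the dataset.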