2009-2010 ASSISTment Data
Data Description for
This is the ASSISTment data gathered during the 2009-2010 school year. The full dataset is split into two files: one contains all skill builder data and the other contains all non skill builder data.
Skill builder data is also called mastery learning data. This dataset comes from skill builder (mastery learning) problem sets, in which a student is considered to have mastered a skill upon meeting a certain criterion (normally answering 3 questions correctly in a row), and no more questions are given after mastery.
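The mastery criterion described above can be sketched in a few lines. This is only an illustration, assuming a student's responses on one skill are available as a chronological list of 0/1 values (the same coding as the correct column); the function name and the streak_needed parameter are hypothetical and not part of the dataset.

```python
def problems_until_mastery(responses, streak_needed=3):
    """Return the 1-based position of the response that completes the
    required streak of correct answers, or None if it is never reached."""
    streak = 0
    for position, correct in enumerate(responses, start=1):
        # A correct answer extends the streak; anything else resets it.
        streak = streak + 1 if correct == 1 else 0
        if streak >= streak_needed:
            return position
    return None


print(problems_until_mastery([0, 1, 1, 0, 1, 1, 1]))  # streak completes on the 7th response
print(problems_until_mastery([1, 1, 0, 1]))           # streak never completes
```

Because no more questions are given after mastery, a mastered sequence in the skill builder file would be expected to end at the position this function returns.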
The dataset is free to use.
Data Download
There are three separate files you can download. If you write a paper using these, please link to this data page so that readers know precisely which data you used. These data sets include the number of hints and the number of attempts but do not contain action-level data (that is, the exact sequence of hints and attempts). That exact sequence is often important in Ryan Baker's affect detector work.
Skill Builder data 2009-10:
This is the data set that has gotten a lot of attention.
Non-Skill Builder data 2009-10:
ASSISTments 2009-2010 Full Data set:
This file contains the data from the two data sets above. It also includes data that has no problem set type associated with it.
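As a rough sketch of how one of these files might be read once downloaded, the snippet below uses only the Python standard library. It assumes the file is a comma-separated file with a header row using the column headings listed further down; the helper name read_rows, the INT_COLUMNS selection, and the sample values are illustrative assumptions, not part of the dataset.

```python
import csv
import io

# Columns this sketch converts to integers (an illustrative subset).
INT_COLUMNS = ("order_id", "user_id", "problem_id", "original", "correct", "attempt_count")


def read_rows(handle):
    """Yield one dict per CSV row, converting a few numeric columns to int."""
    for row in csv.DictReader(handle):
        for name in INT_COLUMNS:
            value = row.get(name)
            if value not in (None, ""):
                # int(float(...)) also accepts values written as "1.0".
                row[name] = int(float(value))
        yield row


# Tiny in-memory stand-in for a downloaded file (values are made up).
sample = io.StringIO(
    "order_id,user_id,problem_id,original,correct,attempt_count\n"
    "101,7,55,1,0,2\n"
    "102,7,56,1,1,1\n"
)
rows = list(read_rows(sample))
first_try_accuracy = sum(r["correct"] for r in rows) / len(rows)
print(first_try_accuracy)
```

For a real file, the same function could be called on a handle opened with open(path, newline=""), where path is wherever the download was saved.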
Possible Research Questions You Could Use This Data For
RQ1: Predict Student Performance
The educational data mining field has been building student models to fit student data and predict student performance for many years. A lot of research has used ASSISTment data to predict student performance. Some of it predicts a student's very next performance, as in the paper: The “Assistance” Model: Leveraging How Many Hints and Attempts a Student Needs; some of it predicts student performance after a time interval, as in the paper: Using Student Modeling to Estimate Student Knowledge Retention.
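To make "predicting the very next performance" concrete, here is a deliberately simple baseline. It is not the method of either paper named above. It assumes one student's responses are given as a chronological list of 0/1 values and predicts each response from the smoothed average of the responses before it; the function names, prior, and prior_weight are hypothetical choices for illustration.

```python
def running_average_predictions(responses, prior=0.5, prior_weight=1.0):
    """For each response, predict P(correct) using only earlier responses."""
    predictions = []
    total = prior * prior_weight   # pseudo-count of correct answers
    count = prior_weight           # pseudo-count of observations
    for correct in responses:
        predictions.append(total / count)  # predict before seeing the answer
        total += correct
        count += 1
    return predictions


def mean_absolute_error(predictions, responses):
    """Average absolute gap between predicted probability and outcome."""
    return sum(abs(p - r) for p, r in zip(predictions, responses)) / len(responses)


history = [0, 1, 1, 1]  # made-up responses for one student
preds = running_average_predictions(history)
print(preds)
print(mean_absolute_error(preds, history))
```

A student model is usually judged by how much it improves on a naive baseline of this kind.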
RQ2: Personalization
Efforts have been made to individualize student models. Research has shown that personalizing student parameters improves model fit.
Here are some examples of work in this field using the ASSISTment data:
Modeling Individualization in a Bayesian Networks Implementation of Knowledge Tracing
RQ3: Wheel-Spinning
Wheel-spinning refers to the situation where a student finds it hard to learn a skill from a problem set and keeps practicing without reaching mastery. Detecting wheel-spinning is useful in Intelligent Tutoring Systems.
For more details, please see the paper: Wheel-spinning: students who fail to master a skill
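One possible way to label wheel-spinning in this data is sketched below. The thresholds are assumptions made for illustration (no streak of 3 correct answers within the first 10 practice opportunities) and should be checked against the paper before use. The input is assumed to be one student's chronological 0/1 responses on one skill, and the function name is hypothetical.

```python
def wheel_spinning_status(responses, streak_needed=3, max_opportunities=10):
    """Classify one student-skill sequence as 'mastered', 'wheel-spinning',
    or 'indeterminate' (stopped early without mastering)."""
    streak = 0
    for correct in responses[:max_opportunities]:
        streak = streak + 1 if correct == 1 else 0
        if streak >= streak_needed:
            return "mastered"
    if len(responses) >= max_opportunities:
        return "wheel-spinning"
    # Too few responses to say whether the student would have mastered in time.
    return "indeterminate"


print(wheel_spinning_status([1, 1, 1]))
print(wheel_spinning_status([1, 0] * 5))
print(wheel_spinning_status([0, 1]))
```

The "indeterminate" label keeps students who simply stopped early from being counted as wheel-spinning.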
RQ4: Clustering
Previous work has shown some benefit from clustering students when predicting student performance. Different clustering features and different clustering methods could be explored to further improve student models.
Here are some examples of clustering work done using the ASSISTment data:
Clustering Students to Generate an Ensemble to Improve Standard Test Score Prediction
Column Headings (this list is old and we have more complete descriptions of some of these fields here)
order_id
These IDs are chronological and refer to the ID of the original problem log.
assignment_id
Two different assignments can have the same sequence id. Each assignment is specific to a single teacher/class.
user_id
The ID of the student doing the problem.
assistment_id
The ID of the ASSISTment. An ASSISTment consists of one or more problems.
problem_id
The ID of the problem.
original
1 = Main problem
0 = Scaffolding problem
correct
1 = Correct on the first attempt
0 = Incorrect on the first attempt, or asked for help.
attempt_count
Number of student attempts on this problem.
ms_first_response
The time in milliseconds for the student's first response.
tutor_mode
tutor, test mode, pretest, or posttest
answer_type
choose_1: Multiple choice (radio buttons)
algebra: Math evaluated string (text box)
fill_in: Simple string-compared answer (text box)
open_response: Records student answer, but their response is always marked correct
sequence_id
The content id of the problem set. Different assignments that assign the same problem set will have the same sequence id.
student_class_id
The class ID.
position
Assignment position on the class assignments page.
problem_set_type
Linear - Student completes all problems in a predetermined order.
Random - Student completes all problems, but each student is presented with the problems in a different random order.
Mastery - Random order; students must "master" the problem set by getting a certain number of questions (3 by default) correct in a row before being able to continue.
base_sequence_id
This is to account for if a sequence has been copied. This will point to the original copy, or be the same as sequence_id if it hasn't been copied.
skill_id
ID of the skill associated with the problem.
For the skill builder dataset, different skills for the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills.
For the non skill builder dataset, different skills for the same data record are in the same row, separated by commas.
skill_name
Skill name associated with the problem.
For the skill builder dataset, different skills for the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills.
For the non skill builder dataset, different skills for the same data record are in the same row, separated by commas.
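The two multi-skill layouts described for skill_id and skill_name can be converted into each other. The sketch below is illustrative only: it assumes rows are dicts keyed by the column headings, that skill names themselves contain no commas, and that order_id identifies the original problem log so duplicated skill builder rows share it. The function names and sample values are made up.

```python
def split_multi_skill(row):
    """Non skill builder layout: one row with comma-separated skills
    becomes one row per skill."""
    ids = [s.strip() for s in row["skill_id"].split(",")]
    names = [s.strip() for s in row["skill_name"].split(",")]
    return [dict(row, skill_id=i, skill_name=n) for i, n in zip(ids, names)]


def merge_duplicated_skills(rows):
    """Skill builder layout: rows duplicated per skill become one row
    per order_id, with the skills collected into lists."""
    merged = {}
    for row in rows:
        key = row["order_id"]
        if key not in merged:
            merged[key] = dict(row, skill_id=[row["skill_id"]],
                               skill_name=[row["skill_name"]])
        else:
            merged[key]["skill_id"].append(row["skill_id"])
            merged[key]["skill_name"].append(row["skill_name"])
    return list(merged.values())


# Made-up record in the non skill builder layout.
record = {"order_id": 1, "correct": 1,
          "skill_id": "10,11", "skill_name": "Addition,Subtraction"}
long_rows = split_multi_skill(record)
merged = merge_duplicated_skills(long_rows)
print(long_rows)
print(merged)
```

Merging before computing per-problem statistics avoids counting a multi-skill response once per skill in the skill builder file.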
teacher_id
The ID of the teacher who assigned the problem.
school_id
The ID of the school where the problem was assigned.
hint_count
Number of hints the student requested on this problem.
hint_total
Number of possible hints on this problem.
overlap_time
The time in milliseconds for the student's overlap time.
template_id
The template ID of the ASSISTment. ASSISTments with the same template ID have similar questions.
answer_id
The answer ID for multi-choice questions.
answer_text
The answer text for fill-in questions.
first_action
The type of first action: attempt or ask for a hint.
bottom_hint
Whether or not the student asks for all hints.
opportunity
The number of opportunities the student has had to practice this skill.
For the skill builder dataset, opportunities for different skills of the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills and the corresponding opportunity count.
For the non skill builder dataset, opportunities for different skills of the same data record are in the same row, separated by commas.
opportunity_original
The number of opportunities the student has had to practice this skill, counting only original problems.
For the skill builder dataset, original opportunities for different skills of the same data record are in different rows. This means that if a student answers a multi-skill question, the record is duplicated several times, and each duplicate is tagged with one of the skills and the corresponding original opportunity count.
For the non skill builder dataset, original opportunities for different skills of the same data record are in the same row, separated by commas.
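The opportunity and opportunity_original columns can be approximated from the other columns, which may help when checking the data or rebuilding the counts after filtering. The sketch below rests on assumptions: rows are dicts with one skill per row (the skill builder layout), counts start at 1, order_id gives chronological order, and a scaffolding row simply carries forward the running count of original problems. How the released files fill opportunity_original on scaffolding rows is not stated here, so treat that last choice as a guess. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def add_opportunity_counts(rows):
    """Return copies of the rows, in order_id order, with recomputed
    opportunity and opportunity_original values per (user_id, skill_id)."""
    seen = defaultdict(int)           # all problems seen so far
    seen_original = defaultdict(int)  # only problems with original == 1
    counted = []
    for row in sorted(rows, key=lambda r: r["order_id"]):
        key = (row["user_id"], row["skill_id"])
        seen[key] += 1
        if row["original"] == 1:
            seen_original[key] += 1
        counted.append(dict(row, opportunity=seen[key],
                            opportunity_original=seen_original[key]))
    return counted


# Made-up rows, deliberately out of order.
rows = [
    {"order_id": 3, "user_id": 7, "skill_id": "10", "original": 1},
    {"order_id": 1, "user_id": 7, "skill_id": "10", "original": 1},
    {"order_id": 2, "user_id": 7, "skill_id": "10", "original": 0},
    {"order_id": 4, "user_id": 8, "skill_id": "10", "original": 1},
]
counted = add_opportunity_counts(rows)
for r in counted:
    print(r["order_id"], r["opportunity"], r["opportunity_original"])
```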