An Explanation of How to Interpret Our Data Sets

Many of the log files you will receive come from the "problem-logs" table. That is, each row represents a single problem done by a single student. There is extra information, such as the number of hints used on the problem or the time it took the student to make their first response (called "first_response_time" in the table below). Problem_log_id refers to the unique row in the database table. We sometimes "join" student-level or assignment-level information into this file, so those values are duplicated across rows (for example, user ID and sequence ID are duplicated).
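Because joined values repeat on every problem row, averaging them over rows weights each student by how many problems they did. The following is a minimal sketch of collapsing to one row per student first. It assumes the file has already been loaded as a list of dicts keyed by column name with numeric values parsed, and that all rows come from one assignment; the helper name `one_row_per_student` and the sample rows are made up for illustration.

```python
def one_row_per_student(rows, student_key="user_id"):
    """Keep the first row seen for each student.

    Useful before summarizing columns that were joined onto every
    problem log row and are therefore duplicated.
    """
    seen = {}
    for row in rows:
        seen.setdefault(row[student_key], row)
    return list(seen.values())


# Hypothetical sample: student 1 has two problem logs, student 2 has one.
rows = [
    {"problem_logs_id": 101, "user_id": 1, "prior_problem_count": 40},
    {"problem_logs_id": 102, "user_id": 1, "prior_problem_count": 40},
    {"problem_logs_id": 103, "user_id": 2, "prior_problem_count": 10},
]

students = one_row_per_student(rows)
# Average over students, not over problem rows.
mean_prior = sum(r["prior_problem_count"] for r in students) / len(students)
print(mean_prior)  # 25.0
```

Averaging over the three raw rows instead would give 30.0, because student 1 would be counted twice.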

Professor Neil Heffernan has documented with videos what many of these fields mean. See here. If you run your own experiment, there is a second set of descriptions of some of these fields in the files we give out that could be helpful, so take a look here.

Column Headings

  • user_id

    • The ID of the student doing the problem.

We have a set of features that tell researchers something about how students did prior to starting the problem set of interest.

  • prior_problem_count

    • The number of problems the student had completed in ASSISTments prior to this assignment. This allows you to know something about the student before they did this problem set. You could use this to split students into low and high incoming ability to see if your intervention is effective. If this is empty, it means the student has not done any prior ASSISTments work.

  • prior_correct

    • Goes with prior_problem_count. The number of problems the student had answered correctly in ASSISTments prior to this assignment.

  • prior_percent_correct

    • prior_correct/prior_problem_count. The percent of past ASSISTments problems the student got correct.
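One way to use these prior-performance columns is a median split into low and high incoming ability. The sketch below is only an illustration under stated assumptions: rows are dicts with numeric values already parsed, an empty prior_problem_count is an empty string or None, and ties at the median are placed in the "high" group. The function names and sample rows are hypothetical.

```python
from statistics import median


def prior_percent_correct(row):
    """Return prior_correct / prior_problem_count, or None if no prior work."""
    count = row.get("prior_problem_count")
    if count in (None, "", 0):
        return None  # empty means no prior ASSISTments work
    return row["prior_correct"] / count


def median_split(rows):
    """Label each row "low", "high", or "none" (no prior work).

    Assumption: ties at the median go to "high".
    """
    scores = [prior_percent_correct(r) for r in rows]
    known = [s for s in scores if s is not None]
    if not known:
        return ["none"] * len(rows)
    cut = median(known)
    return [
        "none" if s is None else ("high" if s >= cut else "low")
        for s in scores
    ]


# Hypothetical sample rows, one per student.
rows = [
    {"user_id": 1, "prior_problem_count": 10, "prior_correct": 9},
    {"user_id": 2, "prior_problem_count": 20, "prior_correct": 5},
    {"user_id": 3, "prior_problem_count": "", "prior_correct": ""},
    {"user_id": 4, "prior_problem_count": 8, "prior_correct": 4},
]
labels = median_split(rows)
print(labels)  # ['high', 'low', 'none', 'high']
```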

We have some features that relate to the problem set overall, from how many problems the student finished to whether the assignment was done as homework.

  • problem_count

    • Number of problems done by the student in this assignment.

  • assignment_started_count

    • The number of students who have started this assignment for the given class (assignment_id)

  • assignment_finished_count

    • The number of students who have finished this assignment for the given class (assignment_id)

  • assignment_homework_count

    • The number of students who have finished the assignment outside of school time. School time is considered to be between the hours of 7 and 3, server time.

  • homework_percent

    • The percent of students who did the assignment as homework (assignment_homework_count / assignment_finished_count)
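The homework columns above can be approximated from finish times. This is a rough sketch, assuming that "7-3" means 7:00 to 15:00 and that finish times are already parsed into server-time `datetime` objects. The constants, function names, and sample times are illustrative and are not taken from the ASSISTments code.

```python
from datetime import datetime

SCHOOL_START_HOUR = 7   # assumption: "7" means 07:00 server time
SCHOOL_END_HOUR = 15    # assumption: "3" means 15:00 server time


def is_homework(finished_at):
    """True if the finish time falls outside the assumed school hours."""
    return not (SCHOOL_START_HOUR <= finished_at.hour < SCHOOL_END_HOUR)


def homework_percent(finish_times):
    """assignment_homework_count / assignment_finished_count, or None if empty."""
    if not finish_times:
        return None
    homework_count = sum(1 for t in finish_times if is_homework(t))
    return homework_count / len(finish_times)


# Hypothetical finish times for four students on one assignment.
finish_times = [
    datetime(2012, 10, 1, 9, 30),   # school time
    datetime(2012, 10, 1, 19, 5),   # homework
    datetime(2012, 10, 1, 14, 59),  # school time
    datetime(2012, 10, 2, 6, 45),   # homework
]
result = homework_percent(finish_times)
print(result)  # 0.5
```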

  • problem_logs_id

    • Each problem the student does is recorded as a problem log. This is the ID of a problem log. Problem_log is the biggest table in ASSISTments. About 10 million problems were solved in 2012, so there are 10 million rows in the database. For each problem, a child might have made a few attempts and a few hint requests. We call these actions. We don't store actions in their own table, but if we did it would be bigger, as every problem has at least one action (if it is correct) and more if the student was incorrect. We store the actions, with time stamps, in the "actions" field of problem_log.

  • problem_logs_assignment_id

    • The ID of the assignment the problem log belongs to.

  • problem_logs_user_id

    • The same as user_id. It is the user id of the student whose problem log this is.

  • condition

    • We do not currently provide the condition; that is coming soon.

We have three new columns related to ARRS.

  • ARRS Correctness.

    • The correctness on the first reassessment test. A '1' represents that the student answered the question correctly and a '0' represents that the student answered the question incorrectly. All other values (nulls or dashes) mean the student was never assigned an ARRS test or has not yet attempted the ARRS question.
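The three-way reading of ARRS Correctness described above (correct, incorrect, not assigned or not attempted) can be sketched as a small parser. It assumes the raw cell arrives as a string, a number, or None; the function name is hypothetical.

```python
def parse_arrs_correctness(value):
    """Map a raw ARRS Correctness cell to True, False, or None.

    '1' -> True (answered correctly)
    '0' -> False (answered incorrectly)
    anything else (null, dash, empty) -> None (never assigned or not attempted)
    """
    text = "" if value is None else str(value).strip()
    if text == "1":
        return True
    if text == "0":
        return False
    return None


print(parse_arrs_correctness("1"))   # True
print(parse_arrs_correctness(0))     # False
print(parse_arrs_correctness("-"))   # None
```

Keeping None separate from False matters here: treating missing tests as incorrect answers would understate reassessment performance.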

  • ARRS Delay Days.

    • The number of days between when the student finished the skill builder and when the ARRS test was assigned.

  • ARRS Adaptive Mode.

    • ASSISTments now has an adaptive version of ARRS. Students who need more items to learn a skill get reassessed earlier. When looking at the data, it is important to know whether the student was assigned in adaptive mode. There is a nicely published paper on this feature here.

  • guessed_gender

    • Details on how we "guess" gender are listed here. This will be either "Male", "Female" or "Unknown".

  • guessed_gender_2

    • A similar method to "guessed_gender", but with wider coverage. Details are listed here. This will be either "Male", "Female" or "Unknown".

  • assistment_id

    • Similar to problem_id. The ID of a problem one will see in the builder. If a problem has multiple main problems and/or scaffolding, everything relating to one problem is called an assistment and has the same assistment_id. If you see problem logs with the same assistment number, they are multiple main problems (or scaffolding problems) that are part of the same overarching problem.

  • problem_id

    • The ID of the problem. If a problem has multiple main problems, each multiple main problem will have a different problem_id.

  • original

    • 1 = Main problem

    • 0 = Scaffolding problem

    • If a problem has scaffolding and the student answers incorrectly or asks for the problem to be broken into steps, a new problem will be created called a scaffolding problem. This creates a separate problem log row in the file with the variable original = 0.
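The relationship between assistment_id, problem_id, and original can be illustrated by grouping problem log rows. This is a sketch only, assuming rows are dicts and that original has been parsed to the integers 1 and 0; the function name and sample IDs are invented.

```python
from collections import defaultdict


def group_by_assistment(rows):
    """Group problem_ids under their assistment_id, split by the original flag."""
    groups = defaultdict(lambda: {"main": [], "scaffolding": []})
    for row in rows:
        # original = 1 is a main problem, original = 0 is a scaffolding problem
        kind = "main" if row["original"] == 1 else "scaffolding"
        groups[row["assistment_id"]][kind].append(row["problem_id"])
    return dict(groups)


# Hypothetical problem log rows for one student.
rows = [
    {"assistment_id": 500, "problem_id": 9001, "original": 1},
    {"assistment_id": 500, "problem_id": 9002, "original": 0},
    {"assistment_id": 500, "problem_id": 9003, "original": 0},
    {"assistment_id": 501, "problem_id": 9010, "original": 1},
]
groups = group_by_assistment(rows)
print(groups[500])  # {'main': [9001], 'scaffolding': [9002, 9003]}
```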

  • correct

    • 1 = Correct on first attempt

    • 0 = Incorrect on first attempt, or asked for help

    • This column is often the target for prediction. (Minor note: Neil Heffernan notes that while this is true most of the time, we also have essay questions that teachers can grade. Neil thinks that if this value is, say, .25, that means the teacher gave it a 1 out of 4.)
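If correct is used as a binary prediction target, rows with fractional values (possibly teacher-graded essay questions, per the note above) may need separate handling. The sketch below assumes every row has a non-empty correct value that parses as a number; the function name and sample rows are hypothetical.

```python
def split_by_scoring(rows):
    """Separate rows with a 0/1 correct value from rows with partial credit."""
    binary, partial = [], []
    for row in rows:
        value = float(row["correct"])  # assumes a non-empty numeric cell
        if value in (0.0, 1.0):
            binary.append(row)
        else:
            partial.append(row)
    return binary, partial


# Hypothetical rows: two first-attempt outcomes and one fractional score.
rows = [
    {"problem_logs_id": 1, "correct": "1"},
    {"problem_logs_id": 2, "correct": "0"},
    {"problem_logs_id": 3, "correct": "0.25"},
]
binary, partial = split_by_scoring(rows)
print(len(binary), len(partial))  # 2 1
```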

  • answer_id

    • Only exists for multiple choice or choose all that apply questions

    • A number = the answer the student put in corresponds with one of the answers for that problem

    • 0 or empty = the student put an answer not corresponding with one of the answers for that problem

  • answer_text

    • The answer the student entered. Or the value the student selected in a multiple choice or choose all that apply problem.

  • first_action

    • 0 = attempt

    • 1 = hint

    • 2 = scaffolding

    • empty = student clicked on the problem but did nothing else
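The first_action codes above can be turned into readable labels with a lookup table. This is a small sketch assuming the cell is a string, a number, or None; the label for unexpected codes ("unknown") is an addition for illustration and not part of the data set.

```python
FIRST_ACTION_LABELS = {"0": "attempt", "1": "hint", "2": "scaffolding"}


def decode_first_action(value):
    """Translate a raw first_action cell into a label."""
    text = "" if value is None else str(value).strip()
    if text == "":
        # empty = student clicked on the problem but did nothing else
        return "no action"
    return FIRST_ACTION_LABELS.get(text, "unknown")


print(decode_first_action(0))    # attempt
print(decode_first_action("2"))  # scaffolding
print(decode_first_action(""))   # no action
```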

  • hint_count

    • Number of hints a student asked for while working on the problem

  • bottom_hint

    • 1 = The student asked for the bottom out hint

    • 0 = The student did not ask for the bottom out hint.

    • If this is blank it means the student did not ask for a hint. Remember that for scaffolding questions they can not get a hint.

    • The bottom out hint is the last hint for a problem and will generally contain the problem’s answer.

  • attempt_count

    • Number of attempts (number of times a student entered an answer)

  • problem_start_time

    • Time the student started the problem

  • problem_end_time

    • Should be the time the student finished the problem. Currently it is the time in milliseconds after the problem_start_time.

  • first_response_time

    • Time between the start time and the first student action (asking for a hint or entering an answer), in milliseconds

  • overlap_time

    • The time in milliseconds for the student to complete the problem. Ideally this is meant to be the time it took the student to finish the problem. For instance, if a student spent 10 seconds reading the problem, asked for a hint and spent 3 seconds reading it, and then spent 14 seconds typing in the correct answer, the overlap time would be 27 seconds (27,000 milliseconds).

    • This field is often computed incorrectly. Many data sets display an overlap time identical to the first response time. You could compute overlap time using other fields, for example the start times of two consecutive problems.
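One possible way to recompute overlap time from start times is sketched below. It assumes you have one student's problem_start_time values for a single sitting, already parsed into `datetime` objects, and it treats the gap to the next problem's start as the time spent on the current problem. That is only an approximation: the last problem has no following start time, and breaks between problems inflate the estimate. The function name and sample times are illustrative.

```python
from datetime import datetime, timedelta


def estimate_overlap_ms(start_times):
    """Estimate time on each problem as the gap to the next problem's start.

    Returns estimates in chronological order, in milliseconds. The last
    problem gets None because no later start time exists.
    """
    ordered = sorted(start_times)
    if not ordered:
        return []
    estimates = []
    for current, following in zip(ordered, ordered[1:]):
        estimates.append((following - current) // timedelta(milliseconds=1))
    estimates.append(None)
    return estimates


# Hypothetical start times for three consecutive problems.
starts = [
    datetime(2012, 10, 1, 10, 0, 0),
    datetime(2012, 10, 1, 10, 0, 27),
    datetime(2012, 10, 1, 10, 1, 0),
]
estimates = estimate_overlap_ms(starts)
print(estimates)  # [27000, 33000, None]
```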

  • tutor_strategy_id

    • The types of tutoring strategies in ASSISTments are hints or scaffolding

    • For a problem, the entire hint set or scaffolding set has a number. If a student used a tutoring strategy, the number of that strategy is tutor_strategy_id. Professor Heffernan documented and showed what this means here.

    • If there are multiple main problems, the tutoring strategy for each main problem has a unique number.

  • assignment_type

    • Determined by assignment type id. Usually ClassAssignment, but sometimes ARRS or remedial.

  • teacher_comment

    • Currently unsure what this column refers to

  • network_state

    • The state of the network when the student did the assignment.

    • CONNECTED = the student did the assignment while online

    • DISCONNECTED = the student did the assignment while offline

  • assignment_logs_id

    • Each assignment the student does is recorded as an assignment log. This is the id of an assignment log.

  • assignment_logs_assignment_id

    • This is the id of the assignment. This number is linked to the assignment and is the same for all students that do this assignment.

  • assignment_logs_user_id

    • The same as user_id

  • assignment_start_time

    • same as problem_start_time

  • assignment_end_time

    • same as problem_end_time

  • instance

    • Currently unsure what this column refers to

  • assignment_logs_sequence

    • Currently unsure what this column refers to

  • effort_score

    • Currently unsure what this column refers to

  • assignment_logs_variables

    • If a variablized template, this will return any variables present.

  • last_worked_on

    • This is the last time the student worked on something from the assignment that goes with this problem log

    • Only a date, not a time

    • Used in seeing if the daily limit is reached

  • mastery_status

    • Only significant for a skill builder

    • mastered = Student completed the number of problems required for mastery

    • limit exceeded = Student exceeded the daily limit of problems for that skill builder

    • not mastered exhausted = Student attempted all the problems in that skill builder

    • blank = Student did not fit into one of the previous three categories

  • assignment_logs_assignment_type

    • same as assignment_type

  • status_id

    • Used to tell if a teacher has marked an assignment as excused

    • 10 = this assignment has been excused (for this student)

  • email_notification

    • Not used anymore

  • user_details_grade

    • This is the current grade of the student

    • There appear to be students assigned to K (Kindergarten) that are in a higher grade

  • preferences

    • Old, not used anymore

  • registration_code

    • Not important for research, used for registration

  • email_validation_code

    • Not important for research, used for registration

  • heard_of_us_option_id

    • Not important for research, used for registration

  • heard_of_us_other_text

    • Not important for research, used for registration

  • class_assignments_id

    • Same as assignment_id

  • class_assignments_sequence_id

    • ASSISTments is very confusing in how it uses "Problem Set" and "Sequence". The same object that is called a "Sequence" in the database is exposed to teachers as a "Problem Set". If you have a sequence ID, you can use the converter here to get the corresponding problem set number, which you can use in ASSISTments to see exactly what the content looked like.

  • student_class_id

    • The ID of the class, the same for all students in the same class. If you want to do hierarchical linear modeling, you can use this as the class ID. We can also give you a teacher ID. You might also want to look at section ID (if it's not in here we can give it to you: Korinn).

  • class_assignments_name

    • Currently unsure what this column refers to

  • points_possible

    • This is the points possible we have assigned to the problem. It is not significant now, but might be used in the future for computing partial credit.

  • has_files

    • This is very old. Currently unsure what this column refers to

  • due_date

    • Due date for the assignment. Can be set by the teacher as a date and time or just a date.

  • scope_type

    • Currently unsure what this column refers to

  • scope_id

    • Currently unsure what this column refers to

  • release_date

    • The time when the assignment showed up in student view

  • curriculum_item_id

    • Mainly used for folders. Aside from that, currently unsure what this column refers to.

  • sprial_assignment_id

    • Used to be used for ARRS, not useful anymore.

  • category

    • Currently unsure what this column refers to

  • class_assignments_assignment_type_id

    • Number corresponding with assignment type

    • 1 = ClassAssignment

    • Other numbers correspond with other assignment types (one of these means individualized assignment, one means reassessments means ret

    • 6 means releasing

  • assigned_date

    • Should be the date the teacher assigned the assignment. In some problem logs the day appears to be missing and only the time is shown.

  • time_to_due_date

    • Currently unsure what this column refers to. Possibly the amount of time between when the student finished the assignment and when the assignment was due.

  • assignment_types_id

    • Same as class_assignments_assignment_type_id

  • assignment_types_assignment_type

    • Type of assignment, usually class, but probably can also be ARRS

  • assignment_types_origin

    • Who assigned the assignment. Can be Teacher, ARRS or placements

  • assignment_types_display_name

    • How the assignment type is displayed. The same as assignment_types_assignment_type.

  • user_roles_id

    • Each student has a distinct user_role_id that points to a place in the user roles table that tells what role they have.

  • role_id

    • 4 = Student

    • 2 = Teacher

    • 14 = Researcher

  • user_roles_user_id

    • Same as user_id

  • location_id

    • The id number for the school

  • type

    • For students this will always be school

  • location_type

    • For students this will always be school

  • school_id

    • ID of the school where the problem was assigned.

  • district_id

    • The id of the district

  • verified

    • Not useful for researchers. Students are only allowed to join verified schools (where verified = TRUE)

  • locked

    • Currently unsure what this column refers to

  • districts_id

    • Id number for the student’s district

  • district_name

    • Name of the student’s district

  • districts_state

    • Student’s state

  • districts_state_id

    • Id for the student’s state

  • districts_verfied

    • Not useful for researchers. Teachers are only allowed to join verified districts (where districts_verified = TRUE).

  • state_id

    • The id of the student’s state

  • state_name

    • Name of the student’s state

  • abbreviation

    • Abbreviation of the student’s state

  • state_locked

    • Not useful for researchers. Teachers are only allowed to join verified states (where state_locked = TRUE)

  • assistment_id

    • described above

  • assistments_name

    • The name of the assistment, will be cut off if too long

  • description

    • Is quite old, and doesn't appear to be used

  • assistments_created_at

    • Time the assistment was created. Should be a date and time, but sometimes shows up as only a time.

  • assistments_position

    • Should be blank

  • assistment_type_id

    • 1 = normal problem

    • 2 = variablized template

  • sequence_id

    • same as class_assignments_sequence_id

  • head_section_id

    • Not important for research. The head section decides if a problem is linear or a skill builder.

  • sequences_created_at

    • Should be the date the sequence was created, but currently doesn't appear to be working.

  • sequence_name

    • The title of the problem set (problem set and sequence are the same thing)

  • sequence_description

    • ASSISTments does not allow the sequence description to be changed anymore. If you are looking at an old problem set, there could be a sequence description.

  • quality_level_id

    • 1 = uncertified

    • 3 = WPI certified

  • sequences_parameters

    • Parameters returned from the quick builder

  • sequences_updated_at

    • The last date the sequence was updated

Column Headings Not Currently Used

  • order_id

    • These id's are chronological, and refer to the id of the original problem log.

  • tutor_mode

    • tutor, test mode, pre-test, or post-test

  • problem_set_type

    • Linear - Student completes all problems in a predetermined order.

    • Random - Student completes all problems, but each student is presented with the problems in a different random order.

    • Mastery - Random order, and student must "master" the problem set by getting a certain number of questions correct in a row before being able to continue.

  • base_sequence_id

    • This is to account for if a sequence has been copied. This will point to the original copy, or be the same as sequence_id if it hasn't been copied.

  • skill_id

    • ID of the skill associated with the problem.

    • For the skill builder dataset, different skills for the same data record are in different rows. This means if a student answers a multi skill question, this record is duplicated several times, and each duplication is tagged with one of the multi skills.

    • For the non skill builder dataset, different skills for the same data record are in the same row, separated by commas.

  • skill_name

    • Skill name associated with the problem.

    • For the skill builder dataset, different skills for the same data record are in different rows. This means if a student answers a multi skill question, this record is duplicated several times, and each duplication is tagged with one of the multi skills.

    • For the non skill builder dataset, different skills for the same data record are in the same row, separated by commas.
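To make the non skill builder layout look like the skill builder layout (one row per skill), the comma-separated skill_id cell can be expanded. This is a sketch under stated assumptions: rows are dicts, skill_id is a string or None, and rows with no skill are kept with skill_id set to None. It handles skill_id only, since skill names could themselves contain commas. The function name and sample rows are hypothetical.

```python
def one_row_per_skill(rows, column="skill_id"):
    """Duplicate each row once per comma-separated skill in `column`."""
    expanded = []
    for row in rows:
        value = row.get(column)
        raw = "" if value is None else str(value)
        skills = [s.strip() for s in raw.split(",") if s.strip()]
        # Keep untagged rows as a single row with no skill.
        for skill in skills or [None]:
            expanded.append({**row, column: skill})
    return expanded


# Hypothetical rows: one multi-skill problem and one untagged problem.
rows = [
    {"problem_logs_id": 1, "skill_id": "10,27"},
    {"problem_logs_id": 2, "skill_id": ""},
]
expanded = one_row_per_skill(rows)
print(expanded)
```

After expansion, a multi-skill record appears several times, so counts of problems should be taken over distinct problem log IDs rather than over rows.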

  • hint_total

    • Number of possible hints on this problem. We tell you the total number of hints so you can compute something like a % of hints used. Not all problems have all the same number of hints.
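The fraction of hints used mentioned above could be computed as hint_count divided by hint_total. A minimal sketch, assuming both values are already numeric and that problems with no hints should yield None rather than an error; the function name is made up.

```python
def hint_fraction(hint_count, hint_total):
    """Fraction of the available hints the student used, or None if no hints exist."""
    if not hint_total:
        return None  # avoid dividing by zero for problems without hints
    return hint_count / hint_total


print(hint_fraction(2, 4))  # 0.5
print(hint_fraction(0, 0))  # None
```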

  • template_id

    • The template ID of the ASSISTment. ASSISTments with the same template ID have similar questions.

  • opportunity

    • The number of opportunities the student has to practice on this skill.

    • For the skill builder dataset, opportunities for different skills of the same data record are in different rows. This means if a student answers a multi skill question, this record is duplicated several times, and each duplication is tagged with one of the multi skills and the corresponding opportunity count.

    • For the non skill builder dataset, opportunities for different skills of the same data record are in the same row, separated by commas.

  • opportunity_original

    • The number of opportunities the student has to practice on this skill counting only original problems.

    • For the skill builder dataset, original opportunities for different skills of the same data record are in different rows. This means if a student answers a multi skill question, this record is duplicated several times, and each duplication is tagged with one of the multi skills and the corresponding original opportunity count.

    • For the non skill builder dataset, original opportunities for different skills of the same data record are in the same row, separated by commas.