2012-13 School Data with Affect

Press

In the summer of 2014, a Scientific American article highlighted the work of Ryan Baker with ASSISTments data. Below is the relevant paragraph and a link to the full article.

The papers relevant to this include (but are not limited to):

First paper reporting we can measure affect
- San Pedro, M., Baker, R., Gowda, S., & Heffernan, N. (2013). Towards an Understanding of Affect and Knowledge from Student Interaction with an Intelligent Tutoring System. In Lane, Yacef, Mostow & Pavlik (Eds) The Artificial Intelligence in Education Conference. Springer-Verlag. pp. 41-50. Publisher link
Then we showed we can better predict state test scores
- Pardos, Z.A., Baker, R.S.J.d., San Pedro, M.O.C.Z., Gowda, S.M., Gowda, S.M. (2014) Affective States and State Tests: Investigating How Affect and Engagement during the School Year Predict End‐of‐Year Learning Outcomes. Journal of Learning Analytics, 1(1), 107–128.
  1. First appeared as Pardos, Z.A., Baker, R.S.J.d., San Pedro, M.O.C.Z., Gowda, S.M., Gowda, S.M. (2013) Affective states and state tests: Investigating how affect throughout the school year predicts end of year learning outcomes. Proceedings of the 3rd International Conference on Learning Analytics and Knowledge, 117-124.
Then we showed we can predict who enrolls in college years later
- San Pedro, M., Baker, R., Bowers, A. & Heffernan, N. (2013) Predicting College Enrollment from Student Interaction with an Intelligent Tutoring System in Middle School. In S. D'Mello, R. Calvo, & A. Olney (Eds.) Proceedings of the 6th International Conference on Educational Data Mining
Then we showed we can predict college major they will pursue
- San Pedro, M., Ocumpaugh, J., Baker, R., & Heffernan, N. (2014) Predicting STEM and Non-STEM College Major Enrollment from Middle School Interaction with Mathematics Educational Software. In John Stamper et al. (Eds) Proceedings of the 7th International Conference on Educational Data Mining. pp. 276-279. A longer version is here
We followed that up with looking at how gaming the system was important in making the decision.
- San Pedro, M.O., Baker, R., Heffernan, N., Ocumpaugh, J. (2015) Exploring College Major Choice and Middle School Student Behavior, Affect and Learning: What Happens to Students Who Game the System? Proceedings of the 5th International Learning Analytics and Knowledge Conference. pp 36-40.
Along the way, Baker et al. led the team to make sure the detectors generalize across urban, suburban and rural areas

1. Ocumpaugh, J., Baker, R., Gowda, S., Heffernan, N., Heffernan, C. (2014) Population validity for Educational Data Mining models: A case study in affect detection. British Journal of Educational Technology, 45 (3), 487-501. DOI: 10.1111/bjet.12156

Finally, Heffernan has refit the detectors with new features
- Wang, Y., Heffernan, N, & Heffernan, C. (2015) Towards better affect detectors: effect of missing skills, class features and common wrong answers. Proceedings of the Fifth International Conference on Learning Analytics And Knowledge. pp 31-35. See data here and here
Later, Heffernan's PhD student Anthony Botelho refit the detectors with deep learning
- Botelho, A. F., Baker, R. S., & Heffernan, N. T. (2017). Improving Sensor-Free Affect Detection Using Deep Learning. In E. André et al. (Eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence in Education. pp. 40-51.
Ryan's student used the detectors and some NLP tools.
- Slater, S., Ocumpaugh, J., Almeda, M., Allen, L., Heffernan, N., & Baker, R. (2017) Using Natural Language Processing Tools to Develop Complex Models of Student Engagement. Affective Computing and Intelligent Interaction, At San Antonio, TX, US.

Here is the Data

This is the ASSISTments data for the school year 2012-2013 with affect predictions.

https://drive.google.com/file/d/1cU6Ft4R3hLqA7G1rIGArVfelSZvc6RxY/view?usp=sharing

https://drive.google.com/file/d/0BxCxNjHXlkkHczVDT2kyaTQyZUk/edit?usp=sharing (used to be stored here)

When you download this file and unzip it, it will take 3 GB of RAM! So I made a small version with the first few rows here, but even this is hard to read. The actions field contains quotes, which makes the file impossible to open in normal editors. This is in fact how the data is stored in our database, but that does not make it easy to use. I think sometime soon we will have an easier-to-use data set. Removing the actions column makes the file easier to open, but throws away all the action data.
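Because the quoted actions field is what trips up ordinary editors, one option is to stream the file through a CSV parser that understands quoting rather than opening it whole. The following is a minimal Python sketch, not the ASSISTments team's own tooling. The function name iter_problem_logs and the sample rows are made up for illustration, and the sketch assumes that quotes inside the actions field are escaped by doubling them, which may not match the real file.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, TextIO


def iter_problem_logs(
    handle: TextIO, drop_columns: Iterable[str] = ()
) -> Iterator[Dict[str, str]]:
    """Yield one row at a time so the whole file never sits in memory.

    csv.DictReader handles quoted fields, including embedded commas
    and doubled quotes, which plain line splitting would break on.
    """
    drop = tuple(drop_columns)
    for row in csv.DictReader(handle):
        for name in drop:
            row.pop(name, None)  # ignore columns that are not present
        yield row


# Made-up two-row sample; the real actions format may differ.
SAMPLE = (
    "problem_log_id,user_id,correct,actions\n"
    '1,10,1,"[""start"", ""answer""]"\n'
    '2,11,0.5,"[""start"", ""hint"", ""answer""]"\n'
)

full_rows = list(iter_problem_logs(io.StringIO(SAMPLE)))
slim_rows = list(iter_problem_logs(io.StringIO(SAMPLE), drop_columns=("actions",)))

print(full_rows[0]["actions"])
print(sorted(slim_rows[0]))
```

For the downloaded file, the handle would be something like open(path, newline="") with whatever path and encoding your copy uses. If the parser complains that a field is too large, csv.field_size_limit can raise the limit.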

If you use the columns related to affect please cite the following paper.

- Wang, Y., Heffernan, N, & Heffernan, C. (2015) Towards better affect detectors: effect of missing skills, class features and common wrong answers. Proceedings of the Fifth International Conference on Learning Analytics And Knowledge. pp 31-35. See data here and here

My PhD student Zach Pardos refit the detector. Pardos, Z.A., Baker, R.S.J.d., San Pedro, M.O.C.Z., Gowda, S.M., Gowda, S.M. (2013) Affective states and state tests: Investigating how affect throughout the school year predicts end of year learning outcomes. Proceedings of the 3rd International Conference on Learning Analytics and Knowledge, 117-124.

Later, my other PhD student, Anthony Botelho, refit them with deep learning.

Botelho, A. F., Baker, R. S., & Heffernan, N. T. (2017). Improving Sensor-Free Affect Detection Using Deep Learning. In E. André et al. (Eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence in Education. pp. 40-51.

How to cite this data if you don't care about affect?

If you are not using the affect column please acknowledge ASSISTments with a citation to the following paper.

Feng, M., Heffernan, N.T., & Koedinger, K.R. (2009). Addressing the assessment challenge in an Intelligent Tutoring System that tutors as it assesses. The Journal of User Modeling and User-Adapted Interaction, 19, 243-266. Best Paper of the Year. Mentioned in the National Ed. Tech Plan.

Column Headings

See this page for more detail on the column headings. See also https://sites.google.com/site/assistmentsdata/how-to-interpret on how to interpret this.

problem_log_id
- Unique ID of the logged actions. problem_log is the biggest table in ASSISTments. About 10 million problems were solved in 2012, so there are 10 million rows in the database. For each problem, there might be a few attempts that a child made and a few hint requests. We call them actions. We don't store actions in their own table, but if we did it would be bigger, as every problem has at least one action (if it's correct) and more if the student was incorrect. We store the actions, with timestamps, in the "actions" field of problem_log.
skill
- Skill name associated with the problem (different skills are in different rows).
problem_id
- The ID of the problem.
user_id
- The ID of the student doing the problem.
assignment_id
- Two different assignments can have the same sequence id. Each assignment is specific to a single teacher/class.
assistment_id
- The ID of the ASSISTment. An ASSISTment consists of one or more problems.
start_time
- Timestamp when the problem starts.
end_time
- Timestamp when the problem ends.
problem_type
- choose_1: Multiple choice (radio buttons)
- algebra: Math evaluated string (text box)
- fill_in: Simple string-compared answer (text box)
- open_response: Records student answer, but their response is always marked correct
original
- 1 = Main problem
- 0 = Scaffolding problem
correct
- 1 = Correct on first attempt
- decimal values are calculated as a partial credit based on the number of hints and attempts needed to solve (based on teacher setting)
- 0 = student either saw the answer, exhausted partial credit from too many hints/attempts, or (based on teacher setting) answered incorrectly on the first attempt
  - c.f. Wang, Y., Heffernan, N. T., & Beck, J. E. (2010, June). Representing Student Performance with Partial Credit. In EDM (pp. 335-336).
- When observed as a dependent variable, it is recommended that this value be converted to a binary variable using the formula: 1 = correct, <1 = Incorrect
bottom_hint
- Whether or not the student asks for all hints.
hint_count
- Number of hints on this problem.
actions
- Every action on this problem.
attempt_count
- Number of student attempts on this problem.
ms_first_response
- The time in milliseconds for the student's first response.
tutor_mode
- tutor, test mode, pre-test, or post-test
sequence_id
- The content id of the problem set. Different assignments that are assigned the same problem set will have the same sequence id. Again the terminology is confused as years ago when ASSISTments was starting we called problem sets sequences. But a problem set in our modern use of the term is really stored as a sequence. Most sequences are simple, but it's possible to build a problem set that is a hierarchical tree of problem sets.
student_class_id
- The class ID.
position
- Assignment position on the class assignments page.
type
- This is the type of the head section of the problem set. Each problem set is usually one of the following three.
  - Linear - Student completes all problems in a predetermined order.
  - Random - Student completes all problems, but each student is presented with the problems in a different random order.
  - Mastery - Random order, and student must "master" the problem set by getting a certain number of questions correct in a row before being able to continue. ASSISTments calls problem sets that have a head section that is of type mastery a "Skill Builder".
base_sequence_id
- This is to account for if a sequence has been copied. This will point to the original copy, or be the same as sequence_id if it hasn't been copied.
skill_id
- ID of the skill associated with the problem (different skills are in different rows).
teacher_id
- The ID of the teacher who assigned the problem.
school_id
- The ID of the school where the problem was assigned.
overlap_time
- The time in milliseconds for the student's overlap time.
template_id
- The template ID of the ASSISTments. ASSISTments with the same template ID have similar questions.
answer_id
- The answer ID for multi-choice questions.
answer_text
- The answer text for fill-in questions.
first_action
- The type of first action: attempt or ask for a hint.
problemlog_id
- Unique ID of the logged actions.
Average_confidence(FRUSTRATED)
- Predicted Frustration of student for the problem. Value close to "0" being less frustrated and close to "1" being more frustrated.
Average_confidence(CONFUSED)
- Predicted Confusion of student for the problem. Value close to "0" being less confused and close to "1" being more confused.
Average_confidence(CONCENTRATING)
- Predicted Engaged Concentration of student for the problem. Value close to "0" being less concentrated and close to "1" being more concentrated.
Average_confidence(BORED)
- Predicted Boredom of student for the problem. Value close to "0" being less bored and close to "1" being more bored.
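
Several of the column notes above describe small processing steps: converting correct to a binary value, coping with one row per skill, and reading the affect confidences. The following Python sketch shows one way these could fit together. It is an illustration under stated assumptions, not an official ASSISTments script. The helper names and the sample rows are made up, and treating rows that share a problem_log_id as duplicates of one problem log is an assumption about how the multi-skill rows are laid out.

```python
from collections import defaultdict
from statistics import mean

AFFECT_COLUMNS = (
    "Average_confidence(FRUSTRATED)",
    "Average_confidence(CONFUSED)",
    "Average_confidence(CONCENTRATING)",
    "Average_confidence(BORED)",
)


def binarize_correct(value) -> int:
    """Apply the recommended rule: 1 = correct, anything below 1 = incorrect."""
    return 1 if float(value) >= 1.0 else 0


def one_row_per_problem_log(rows):
    """Keep the first row seen for each problem_log_id (assumed duplicate key)."""
    seen = set()
    unique = []
    for row in rows:
        key = row["problem_log_id"]
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def mean_affect_by_student(rows):
    """Average each affect confidence over every row belonging to a student."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column in AFFECT_COLUMNS:
            grouped[row["user_id"]][column].append(float(row[column]))
    return {
        user: {column: mean(values) for column, values in columns.items()}
        for user, columns in grouped.items()
    }


def make_row(log_id, skill, correct, frustrated):
    """Build a made-up row; the other affect values are fixed at 0.5."""
    row = {"problem_log_id": log_id, "user_id": "10", "skill": skill, "correct": correct}
    row.update({column: "0.5" for column in AFFECT_COLUMNS})
    row["Average_confidence(FRUSTRATED)"] = frustrated
    return row


SAMPLE_ROWS = [
    make_row("1", "Addition", "1", "0.25"),
    make_row("1", "Subtraction", "1", "0.25"),  # same log, second skill
    make_row("2", "Addition", "0.5", "0.75"),  # partial credit
]

unique_rows = one_row_per_problem_log(SAMPLE_ROWS)
labels = [binarize_correct(row["correct"]) for row in unique_rows]
affect = mean_affect_by_student(unique_rows)

print(labels)
print(affect["10"]["Average_confidence(FRUSTRATED)"])
```

Collapsing the per-skill rows first keeps a multi-skill problem from being counted twice in the averages. Rows with missing affect values would need to be filtered out before the float conversion.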

Here's the code and data for the current affect detectors:

https://drive.google.com/folderview?id=0B9MXO4ELrnzyUjdlVE1PWElHSE0&usp=sharing

Professor Heffernan has released the actual questions as well. For instance, in this paper

Pardos, Z.A., Dadu, A. (2017) Imputing KCs with Representations of Problem Content and Context. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization (UMAP'17). Bratislava, Slovakia. ACM. Pp. 148-155. http://dl.acm.org/authorize?N31523

the authors applied an NLP technique to try to guess, for each problem, what skill it should be tagged to. If you want access to the questions, you must ask Professor Heffernan, as he does not want the questions (and the answers) put up on the web where students could get them. He will ask you to agree to abide by that request. If you want access to the text of the problems, email nth@wpi.edu and cc td@wpi.edu from a Google email account, with enough information to explain that you are legitimate and that you agree not to share with anyone else, and we will share the Google folder with your Gmail account.