Datasets

This page is old. We have moved the maintenance of data sets to here, but we are leaving this page live because others might be linking to it.

Professor Heffernan asks his graduate students to release their data and code to this website every time they publish a paper, as long as the data is 100% anonymized. We do have data sets that include student open-response data, where a student could have revealed personally identifiable information (PII), like this paper. For such papers you can ask for a legally binding contract that requires you to do a few things, such as not attempting to look for PII and, if you do find any, telling me about it and deleting it. He believes that releasing data and code makes for good science. All papers after 2012 should have a publicly available link to the data; if that is not the case, email all the authors, and me, asking for the data. I am glad that researchers I don't even know are starting to use these data sets and are writing their own papers. (If you are writing a paper, or know of a paper, that uses ASSISTments data, please send me (Neil Heffernan) a citation!)

With all this concern about student privacy, I liked this quote from Annie Murphy Paul: "But there’s another potential danger regarding student data that is much less frequently noted: the possibility that it will sit unused, inaccessible to parents, educators, the general public—and students themselves."

ASSISTments seeks to share the data from its system. Of course, we scrub the data of all personally identifiable information to the best of our ability. Sometimes people release data that they think is scrubbed but turn out to be wrong (see this), so the terms of use of these data sets are that if you find any personally identifiable information, you must report it, delete it, and not use it in any way. (Suppose some student typed in "I am Barack Obama and I live at 1600 Penn Ave in DC"; that would be personal data that we did not catch.) Please see the WPI Institutional Review Board approved terms of use that govern the use of the data.

If you want to hear more on open science, read this.

I am aware that over 10 of our data sets have been used in over 45 published works, all written by others (i.e., where Heffernan is not an author).

Click here to get to those commonly used data sets.

How do you read log files from ASSISTments? Since many different projects use the same variables, here is the one place where we try to document the meaning of the different variables.

This data is released by Professor Neil Heffernan of WPI.

 N. T. Heffernan, 2014.