Winners

Winners announced in a competition to predict student interest in science

Researchers at WPI, UPenn, and UMass announced the results of an educational data mining competition at the NorthEast Big Data Spoke Meeting at MIT on February 16, 2018.

The competition.

Our schools need tools to identify students who are interested in science, technology, engineering and mathematics (STEM) careers. "One purpose of this competition was to develop better tools to understand and predict students' future interests in STEM careers," said Dr. Neil Heffernan of WPI, one of the competition organizers. Once these interests are known, young students can be supported and encouraged to continue their interest in science.

Seventy-four teams of researchers were given a very large stream of data (418 megabytes) and asked to predict which students would enter STEM graduate programs and, later, STEM careers. For a sample of 514 students, they were told which students' first job out of college was in a STEM career. No personally identifiable information was released. The data consisted only of student actions in the software (such as answering "4" on a math question after pausing for 22 seconds) and a single outcome measure: whether the former student is now pursuing a STEM career. The dataset for this competition followed 423 students for about 10 years, from when they used an online mathematics system as middle-school students (2004-2006) through their graduation from college.

The participants.

Participants from several countries received access to extensive, de-identified clickstream data from middle school instructional software. Their job was to use data mining algorithms of their choice to predict each student's first job out of college. Participants applied these algorithms to the log data to relate middle school students' behavior within the software to high school and college outcomes and, ultimately, to career choice. The goals in hosting this competition were to make good predictions about student career choices and to help the educational data mining field learn from the results.

Dr. Heffernan is the creator of ASSISTments, an online mathematics platform used by over 50,000 students a year in the U.S. The data for the competition came from students who used ASSISTments a decade ago, along with data on whether these students had begun a STEM career after college. The National Science Foundation (NSF) had previously funded the creation and maintenance of this longitudinal dataset, as well as the tracking of which colleges these students attended.

Another organizer, Dr. Ryan Baker of UPenn, said, "We already knew that by using this data we could better predict state test scores, who will go on to college, and whether they will choose a STEM major. So for this competition, we asked participants to predict the next thing: did the students wind up in a STEM career?"

Heffernan said, “It makes sense that students whose clickstream data shows they are more engaged in math class are probably more likely to go into a STEM career, and on the flip side, students who are bored, frustrated, off task, or confused are less likely to want to study more math. Researchers can detect who is losing interest in math, and these effects persist for over a decade.”

Announcement of winners.

74 research teams participated in the competition.

The first-place team of Chun Kit Yeung, Kai Yang, and Dit-Yan Yeung is from the Hong Kong University of Science and Technology. They combined a state-of-the-art "Deep Knowledge Tracing" model, used to predict student performance, with boosting algorithms. Prof. Yeung said, "It was great to participate in this challenge and to be recognized. We learned a lot about deep learning, and were glad to be given a chance to practice those skills."

The second-place winner was Makhlouf Jihed of Kyushu University in Japan, and third-place honors went to the University of Michigan Data Science Team, a group that regularly competes in data competitions like this one.

The bigger picture.

Justin Reich, an assistant professor of Comparative Media Studies/Writing at MIT who hosted the meeting, reflected: "Open data is a crucial component of the future of science, and we are thrilled at the MIT Teaching Systems Lab to host the announcement of this important competition. It's so exciting to see a global community of students and researchers collaborating to better understand how learning analytics can meaningfully support learners, while accounting for considerations of privacy and ethics. I applaud WPI and UPenn for sharing their data, and offering this competition in order to advance data science in education."

Beverly Woolf, a research professor at UMass Amherst and director of the NSF grant that funded the project, said, "One of our goals is to build a community of researchers and practitioners around gathering, storing, and analyzing data from schools, students and administrators. Developing longitudinal analyses to predict students' career choices is one use of exploratory analysis to identify regular (or unusual) patterns in data, thus helping to formulate new scientific hypotheses. We are excited to provide this opportunity, and an associated training opportunity, to the winners of this competition."