Automated Scoring of Multidimensional Assessments

Using Machine Learning to Evaluate Multidimensional Assessments of Chemistry and Physics

Sarah Maestrales1, Xiaoming Zhai2, Israel Touitou3, Quinton Baker4,

Barbara Schneider5, Joseph Krajcik6

1 College of Education, Michigan State University

2 Department of Mathematics and Science Education, University of Georgia

3 CREATE for STEM Institute, Michigan State University

4 College of Education, Michigan State University

5 College of Education, Michigan State University

6 CREATE for STEM Institute, Michigan State University

Abstract

With the release of the Framework for K-12 Science Education (NRC, 2012) and the subsequent Next Generation Science Standards (NGSS, 2013), there has been a call for a shift in teaching and learning in science classrooms. Rather than learning the content of a scientific domain in isolation, students are prompted to engage in a variety of science and engineering practices (SEPs) to make sense of phenomena while learning both the domain's disciplinary core ideas (DCIs) and the crosscutting concepts (CCs) that support understanding and examining those ideas. This shift in learning and instruction calls for new science assessments: to evaluate students' learning, assessment items are now designed to incorporate all three dimensions of science learning, with SEPs and CCs assessed alongside DCIs. Given the complexity of such items, constructed response (CR) items are viewed as a more appropriate format for eliciting these ideas than traditional multiple-choice items. However, CR items make grading laborious and, when multiple raters are involved, potentially inconsistent. In this study, we examined the use and reliability of a machine learning text analysis protocol as an alternative to human scoring of CR items that incorporate multiple dimensions of science learning. By following human raters through training and calibration as they scored a large randomized sample of responses, we built a robust training set. Owing to these rigorous procedures, the predictive scoring models achieved performance comparable to that of human raters. These results show the potential of machine learning protocols to lessen the load of human scoring while maintaining fidelity to, and differentiation among, the dimensions of science learning.
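To make the general approach the abstract describes more concrete, the sketch below illustrates training a text classifier on human-scored responses and checking its agreement with human raters. It is an illustration under stated assumptions, not the study's protocol: the toy responses, the 0-2 score scale, and the choice of scikit-learn with TF-IDF features and logistic regression are all hypothetical.

```python
# Illustrative sketch only: a text classifier trained on human-scored
# constructed responses. The data, score scale, and model choice are
# hypothetical, not those used in the study.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

# Toy human-scored responses (rubric scores on a 0-2 scale).
responses = [
    "Heating makes the gas particles move faster, so pressure rises.",
    "Energy transfers from the hot plate to the water by conduction.",
    "It gets hot because it is hot.",
    "The collision changes the cart's momentum because of the force.",
    "Because science.",
    "Faster particles hit the container walls harder and more often.",
]
human_scores = [2, 2, 0, 2, 0, 2]

# Bag-of-words features feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(responses, human_scores)

# Agreement with human raters, reported as quadratic-weighted kappa.
# A real evaluation would score held-out responses, not the training set.
predicted = model.predict(responses)
print(cohen_kappa_score(human_scores, predicted, weights="quadratic"))
```

In a study of this kind, a separate model would typically be trained for each item and each dimension of learning (SEP, DCI, CC), with agreement statistics computed on responses the model never saw during training.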

Maestrales, S., Zhai, X., Touitou, I., Baker, Q., Schneider, B., & Krajcik, J. (2020). Using Machine Learning to Evaluate Multidimensional Assessments of Chemistry and Physics. Manuscript submitted for publication.