2025
Obroampah, D., Cui, Y., & Gierl, M. (in press). Investigating predictors of student performance in STEM using educational data mining techniques. Canadian Journal of Science, Mathematics and Technology Education.
Kornerup, I., Compton, S., Guo, Q., Gierl, M., Lai, H., Zeinabadi, M., Horta, K., & Catunda, R. (in press). Assessing dental students’ emotions while treating uncooperative children: A pilot study. Latin American Journal of Pediatric Dentistry.
Tan, B., Bulut, O., Mazzullo, E., Amoush, N., & Gierl, M. (in press). A review of automatic item generation techniques leveraging large language models. International Journal of Assessment Tools in Education.
Shin, J., Gierl, M., Brett-MacLean, P., & Lai, H. (2025). Assessing reflective writing of medical students using Natural Language Processing approaches. In R. Johnson & M. Tweedie (Eds.), Technology for Medical Language Assessment: Transdisciplinary Perspectives (pp. 39-60). Berlin, Boston: De Gruyter Mouton. https://doi.org/10.1515/9783110793321-003
Firoozi, T., & Gierl, M. (2025). Banking strategies and software solutions for generated test items. In Proceedings of the 17th International Conference on Computer Supported Education (CSEDU 2025), Vol. 1 (pp. 779-784). DOI: 10.5220/0013491000003932
Mohammadi, H., Firoozi, T., & Gierl, M. J. (2025). Neural semantic video analysis. In M. Khosrow-Pour (Ed.), Encyclopedia of Organizational Knowledge, Administration, and Technologies (6th ed., Chapter 65, pp. 1-15). Hershey, PA: IGI Global. DOI: 10.4018/978-1-6684-7366-5.ch068
2024
Firoozi, T., Mohammadi, H., & Gierl, M. J. (2024). Using automated procedures to score educational essays written in three languages. Journal of Educational Measurement, 1-24. https://doi.org/10.1111/jedm.12406
Firoozi, T., & Gierl, M. J. (2024). Scoring essays written in Persian using a transformer-based model: Implications for multilingual AES. Invited chapter in M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation (pp. 55-77). New York: Routledge.
Shin, J., & Gierl, M. J. (2024). Automated short-response scoring for automated item generation in science assessments. Invited chapter in M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation (pp. 504-534). New York: Routledge.
Shin, J., Wang, B., Pinto, W., & Gierl, M. J. (2024). An engagement-aware predictive model to evaluate problem-solving performance from the Study of Adult Skills' process data. Large-scale Assessments in Education, 12:6.
Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43, 5-18.
2023
Firoozi, T., Bulut, O., & Gierl, M. (2023). Language models in automated essay scoring: Insights for the multilingual world. International Journal of Assessment Tools in Education, 10, 148-162. (Special Issue: Educational Measurement and Evaluation: Lessons from the Past, Visions for the Future)
Sayin, A., Bozdag, S., & Gierl, M. (2023). Automatic item generation for nonverbal reasoning items. International Journal of Assessment Tools in Education, 10, 131-147. (Special Issue: Educational Measurement and Evaluation: Lessons from the Past, Visions for the Future)
Sayin, A. & Gierl, M. (2023). Automatic item generation for online measurement and evaluation: Turkish literature items. International Journal of Assessment Tools in Education, 10, 218-231.
Leslie, T., & Gierl, M. J. (2023). Using automatic item generation to create multiple-choice questions for pharmacy assessment. American Journal of Pharmaceutical Education. https://doi.org/10.1016/j.ajpe.2023.100081
Shin, J., & Gierl, M. J. (2023). A technology-enhanced approach for locating timely and relevant news articles for context-based science education. In F. Ouyang, P. Jiao, B. M. McLaren, & A. H. Alavi (Eds.), Artificial Intelligence in STEM Education: The Paradigmatic Shifts in Research, Education, and Technology (pp. 109-126). Boca Raton, FL: CRC Press.
Gierl, M., Shin, J., & Firoozi, T. (2023). Automatic item generation. In R. Tierney, F. Rizvi, & K. Ercikan (Eds.), International Encyclopedia of Education (4th Ed., pp. 193-200). New York: Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.10026-0
2022
Firoozi, T., Mohammadi, H., & Gierl, M. J. (2022). Using active learning methods to strategically select essays for automated scoring. Educational Measurement: Issues and Practice, 00, 1-10. https://doi.org/10.1111/emip.12537
Shin, J., Gierl, M., & Lai, H. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22, 289-311.
Gierl, M. J., Swygert, K., Matovinovic, D., Kulesher, A., & Lai, H. (2022). Three sources of validation evidence needed to evaluate the quality of generated test items for medical licensure. Teaching and Learning in Medicine. https://doi.org/10.1080/10401334.2022.2119569.
Shin, J., & Gierl, M. J. (2022). Attribute-specific essay scoring using convolution neural networks. Journal of Applied Testing Technology, 22, 1-17.
Gierl, M. J., Shin, J., Firoozi, T., & Lai, H. (2022). Using content coding and automatic item generation to improve test security. Frontiers in Education (Special Issue: Online Assessment for Humans—Advancements, Challenges, and Futures for Digital Assessment), 7:853578. doi: 10.3389/feduc.2022.853578
2021
Stephen, T., Gierl, M., & King, S. (2021). Automated essay scoring of constructed responses in nursing examination papers: An evaluation. Nurse Education in Practice, 54. https://doi.org/10.1016/j.nepr.2021.103085
Odell, B., Gierl, M., & Cutumisu, M. (2021). Testing measurement invariance of PISA 2015 mathematics, science, and ICT scales using the alignment method. Studies in Educational Evaluation, 68. https://doi.org/10.1016/j.stueduc.2020.100965
Lai, H. & Gierl, M. J. (2021). Automating the generation of test items. In M. Khosrow-Pour (Ed.), Encyclopedia of Organizational Knowledge, Administration, and Technologies (pp. 233-244). Hershey, PA: IGI Global.
2020
Shin, J., Guo, Q., & Gierl, M. J. (2020). Automated essay scoring using deep learning algorithms. In M. Khosrow-Pour (Ed.), Handbook of Research on Modern Educational Technologies, Applications, and Management (1st ed.). Hershey, PA: IGI Global. DOI: 10.4018/978-1-7998-3476-2.ch003
Leslie, T., & Gierl, M. (2020). Using automatic item generation methodology to create multiple-choice questions appropriate for entry to pharmacy practice assessment. AFPC PERC 2020 Abstracts—Oral and Poster Presentations. Canadian Pharmacists Journal, 153, 30.
Shin, J., & Gierl, M. J. (2020). More efficient processes for creating automated essay scoring frameworks: A demonstration of two algorithms. Language Testing. https://doi.org/10.1177/0265532220937830
Pugh, D., De Champlain, A., Gierl, M., Lai, H., & Touchie, C. (2020). Can automated item generation be used to develop high-quality MCQs that assess application of knowledge? Research and Practice in Technology Enhanced Learning, 15, 12. https://doi.org/10.1186/s41039-020-00134-8.
Latifi, F. S., & Gierl, M. J. (2020). Automated scoring of junior high essays using Coh-Metrix features: Implications for large-scale language testing. Language Testing. https://doi.org/10.1177/0265532220929918
Shin, J., Bulut, O., & Gierl, M. (2020). Development practices of trusted AI systems among Canadian data scientists. The International Review of Information Ethics, 28. http://informationethics.ca/index.php/irie/article/view/377
2019
Shin, J., Bulut, O., & Gierl, M. J. (2019). The effect of best distractor location on the difficulty of multiple-choice items. Journal of Experimental Education. DOI: 10.1080/00220973.2019.1629577
Shin, J., Guo, Q., & Gierl, M. J. (2019). Multiple-choice item distractor development using topic modeling approaches. Frontiers in Psychology, 10:825. doi: 10.3389/fpsyg.2019.00825. Invited Paper in Special Issue on Advancements in Technology-Based Assessment: Emerging Item Formats, Test Designs, and Data Sources, Frank Goldhammer, Ronny Scherer, Samuel Greiff (Guest Editors).
Gierl, M. J., Matovinovic, D., & Lai, H. (2019). Creating content for educational testing using a workflow that supports automatic item generation. In A. Reyes-Munoz, P. Zheng, D. Crawford, & V. Callaghan (Eds.), EAI International Conference on Technology, Innovation, Entrepreneurship and Education, Lecture Notes in Electrical Engineering 532 (pp. 27-38). New York: Springer.
Gierl, M. J., Lai, H., & Matovinovic, D. (in press). Augmented intelligence and the future of item development. In H. Jiao & R. W. Lissitz (Eds.), Application of artificial intelligence to assessment. Charlotte, NC: Information Age Publishing.
2018
Gierl, M. J., Bulut, O., & Zhang, X. (2018). Using computerized formative testing to support personalized learning in higher education: An application of two assessment technologies. In R. Zheng (Ed.), Digital technologies and instructional design for personalized learning (pp. 99-119). Hershey, PA: IGI Global.
Gierl, M. J., Lai, H., & Zhang, X. (2018). Automatic item generation. In M. Khosrow-Pour (Ed.), Encyclopedia of information science and technology (4th Ed., pp. 2369-2379). Hershey, PA: IGI Global.
Gierl, M. J., & Lai, H. (2018). Using automatic item generation to create solutions and rationales for computerized formative testing. Applied Psychological Measurement, 42, 42-57.
2017
Gierl, M. J., Bulut, O., Gao, Q., & Zhang, X. (2017). Developing, analyzing, and using distractors for multiple-choice tests: A comprehensive review. Review of Educational Research, 87, 1082-1116. [At the time of publication, the impact factor for the journal Review of Educational Research was 5.263 with a ranking of 2 out of 236 for the Education & Educational Research category using the 2016 release of Journal Citation Reports.]
Daniels, L., & Gierl, M. J. (2017). The impact of immediate test score reporting on university students’ achievement emotions in the context of computer-based multiple-choice exams. Learning and Instruction, 52, 27-35.
Gierl, M. J., Daniels, L., & Zhang, X. (2017). Creating parallel forms to support on-demand testing for undergraduate students in psychology. Journal of Measurement and Evaluation in Education and Psychology, 8, 298-303.
Bulut, O., Guo, Q., & Gierl, M. J. (2017). A structural equation modeling approach for examining position effects in large-scale assessments. Large-scale Assessments in Education, 5: 8, 1-20.
Lai, H., Gierl, M. J., Cui, Y., & Babenko, O. (2017). Item consistency index: A method for evaluating item-model fit for cognitive diagnostic assessment. International Journal of Learning, Teaching and Educational Research, 16, 1-21.
Latifi, S., Gierl, M., Wang, R., Lai, H., & Wang, A. (2017). Information-based methods for evaluating the semantics of automatically generated test items. Artificial Intelligence Research, 6, 69-79.
2016
Gierl, M. J. & Lai, H. (2016). A process for reviewing and evaluating generated test items. Educational Measurement: Issues and Practice, 35, 6–20.
Gierl, M. J., Lai, H., Pugh, D., Touchie, C., Boulais, A.-P., & De Champlain, A. (2016). Evaluating the psychometric characteristics of generated multiple-choice test items. Applied Measurement in Education, 29, 196-210.
Latifi, S., Bulut, O., Gierl, M., Christie, T., & Jeeva, S. (2016). Differential performance on national exams: Evaluating item and bundle functioning methods using English, mathematics, and science assessments. SAGE Open, 6(2).
Zhang, X., & Gierl, M. J. (2016). A model-based method for content validation of automatically generated test items. Journal of Educational Issues, 2, 184-202.
Lai, H., Gierl, M. J., Touchie, C., Pugh, D., Boulais, A., & De Champlain, A. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching & Learning in Medicine, 28, 166-173.
Pugh, D., De Champlain, A., Gierl, M. J., Lai, H., & Touchie, C. (2016). Using cognitive models to develop quality multiple-choice questions. Medical Teacher.
Lai, H., Gierl, M. J., Byrne, B. E., Spielman, A., & Waldschmidt, D. (2016). Three modeling applications to promote automatic item generation for examinations in dentistry. Journal of Dental Education, 80, 339-347.
Gierl, M. J. & Lai, H. (2016). The role of cognitive models in automatic item generation. In A. Rupp & J. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 124-145). New York: Wiley.
Cui, Y., Gierl, M. J., & Guo, Q. (2016). The rule space and attribute hierarchy methods. In A. Rupp & J. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 354-378). New York: Wiley.
Gierl, M. J. & Lai, H. (2016). Automatic item generation. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of test development (2nd edition, pp. 410-429). New York: Routledge.
Gierl, M. J., Lai, H., Fung, K., & Zheng, B. (2016). Using technology-enhanced processes to generate items in multiple languages. In F. Drasgow (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 109-127). New York: Routledge.
Gierl, M. J., Latifi, F., Lai, H., Matovinovic, D., & Boughton, K. (2016). Using automated processes to generate items to measure K-12 science skills. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on computational tools for real-world skill development (pp. 590-610). Hershey, PA: IGI Global.
2015
Gierl, M. J., Lai, H., Hogan, J., & Matovinovic, D. (2015). A method for generating test items that are aligned to the Common Core State Standards. Journal of Applied Testing Technology, 16, 1-18.
Gierl, M. J., & Lai, H. (2015). Using automated processes to generate test items and their associated solutions and rationales to support formative feedback. Interaction Design & Architecture(s)—IxD&A Journal, 25, 9-20. Special Issue on Technology-Enhanced Assessment: Agency Change in the Educational Eco-System, Marco Kalz, Eric Ras, & Denise Whitelock (Guest Editors).
Latifi, F., Gierl, M. J., Boulais, A.-P., & De Champlain, A. (2015). Using automated essay scoring to evaluate written-response prompts in English and French on high-stakes medical licensure exams. Evaluation & the Health Professions, 1-5. [E-publication ahead of print, September 16]. DOI: 10.1177/0163278715605358
Gierl, M. J., Lai, H., Houston, L., Rich, C., & Boughton, K. (2015). Using automated processes to generate items in three or more languages. International Journal of e-Assessment, 1, 1-19.
Cui, Y., Gierl, M. J., & Guo, Q. (2015). Statistical classification for cognitive diagnostic assessment: An artificial neural network approach. Educational Psychology. DOI: 10.1080/01443410.2015.1062078
Gierl, M. J., & Lai, H. (2015). Using automated processes to generate English and French test items simultaneously. Mesure et évaluation en éducation—Measurement and Evaluation in Education, 37, 39-61. Invited Paper appearing in Special Issue on Methodological Advances in Assessment, François Vachon (Guest Editor).
Gierl, M. J., MacMahon-Ball, M., Vele, V., & Lai, H. (2015). Method for generating nonverbal reasoning items using n-layer modeling. In E. Ras & D. Joosten-ten Brinke (Eds.), Proceedings from the 2015 International Computer Assisted Assessment Conference, Communications in Computer and Information Science (pp. 1-10). New York: Springer.
2014
Gierl, M. J., Lai, H., Latifi, F., Boulais, A.-P., & De Champlain, A. (2014). Automated essay scoring and the future of assessment in medical education. Medical Education, 48, 950-962.
2013
Gierl, M. J., & Lai, H. (2013). Using automated processes to generate test items. Educational Measurement: Issues and Practice, 32, 36-50.
Gierl, M. J., & Lai, H. (2013). Evaluating the quality of medical multiple-choice items created with automated generation processes. Medical Education, 47, 726-733.
Gierl, M. J., Lai, H., & Li, J. (2013). Identifying differential item functioning in multi-stage computer adaptive testing. Educational Research and Evaluation, 19:2-3, 188-203. Invited paper appearing in Special Issue on Fairness Issues in Educational Assessment, Hossein Karami (Guest Editor).
Gierl, M. J., & Haladyna, T. (2013). Introduction and overview of automatic item generation. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 3-12). New York: Routledge.
Gierl, M. J., & Lai, H. (2013). Using weak and strong theory to create item models for automatic item generation: Some practical guidelines with examples. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 26-39). New York: Routledge.
Lai, H., & Gierl, M. J. (2013). Using principles in assessment engineering to generate items for reading comprehension and mathematical reasoning. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 77-101). New York: Routledge.
Haladyna, T., & Gierl, M. J. (2013). The future of automatic item generation. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 231-239). New York: Routledge.
2012
Gierl, M. J., Lai, H., & Turner, S. (2012). Using automatic item generation to create multiple-choice items for assessments in medical education. Medical Education, 46, 757-765.
Gierl, M. J., & Lai, H. (2012). Using item models for automatic item generation. International Journal of Testing, 12, 273-298.
Cui, Y., Gierl, M. J., & Chang, W. W. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49, 19-38.
2011
Squires, J.E., Estabrooks, C.A., Newburn-Cook, C.V., & Gierl, M. (2011). Validation of the Conceptual Research Utilization Scale: An application of the Standards for Educational and Psychological Testing in Healthcare. BMC Health Services Research, 11:107.
Wang, C., & Gierl, M. J. (2011). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills in critical reading. Journal of Educational Measurement, 48, 1-24.
2010
Gierl, M. J., Alves, C., & Taylor-Majeau, R. (2010). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' skills in mathematics: An operational implementation of cognitive diagnostic assessment. International Journal of Testing, 10, 318-341.
Roberts, M. R., & Gierl, M. J. (2010). Developing score reports for cognitive diagnostic assessment. Educational Measurement: Issues and Practice, 29, 25-38.
Zheng, Y., Gierl, M. J., & Cui, Y. (2010). Using Cochran's Z statistic to test the kernel-smoothed IRF differences between focal and reference groups. Educational and Psychological Measurement, 70, 541-556.
2009
Gierl, M. J., Cui, Y., & Zhou, J. (2009). Reliability of attribute-based scoring in cognitive diagnostic assessment. Journal of Educational Measurement, 46, 293-313.
Cor, K., Alves, C., & Gierl, M. J. (2009). Three applications of automated test assembly within a user-friendly modeling environment. Practical Assessment, Research & Evaluation, 14, 1-23.
Gierl, M. J., Leighton, J. P., Wang, C., Zhou, J., Gokiert, R., & Tan, A. (2009). Developing and validating cognitive models of algebra performance on the SAT© (Research Report No. 2009-03). New York: The College Board.
2008
Gierl, M. J., Cui, Y., & Hunka, S. (2008). Using connectionist models to evaluate examinees' response patterns on tests. Journal of Modern Applied Statistical Methods, 7, 234-245.
Gierl, M. J., Zhou, J., & Alves, C. (2008). Developing a taxonomy of item model types to promote assessment engineering. Journal of Technology, Learning, and Assessment, 7(2). Retrieved from http://www.jtla.org.
Gierl, M. J., & Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement: Interdisciplinary Research and Perspectives, 6, 263-268.
Gierl, M. J., & Zhou, J. (2008). Computer adaptive-attribute testing: A new approach to cognitive diagnostic assessment. Zeitschrift für Psychologie—Journal of Psychology, 216, 29-39. Invited Paper appearing in Special Issue on Adaptive Models of Psychological Testing, Wim J. van der Linden (Guest Editor).
Gierl, M. J., Zheng, Y., & Cui, Y. (2008). Using the Attribute Hierarchy Method to identify and interpret the cognitive skills that produce group differences. Journal of Educational Measurement, 45, 65-89.
Gierl, M. J., Wang, C., & Zhou, J. (2008). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills in algebra on the SAT. Journal of Technology, Learning, and Assessment, 6(6). Retrieved from http://www.jtla.org.
Cor, M. K., Alves, C., & Gierl, M. J. (2008). [Review of the software ‘Conducting Automated Test Assembly using the Premium Solver Platform Version 7.0 with Microsoft EXCEL and the Large-Scale LP/QP Solver Engine Add-In.’] Applied Psychological Measurement, 32, 652-663.
2007
Gierl, M. J. (2007). Making diagnostic inferences about cognitive attributes using the rule space model and Attribute Hierarchy Method. Journal of Educational Measurement, 44, 325-340. Invited Paper appearing in Special Issue on IRT-Based Cognitive Diagnostic Models and Related Methods, Lou DiBello & William Stout (Guest Editors).
Leighton, J. P., & Gierl, M. J. (2007). Defining and evaluating models of cognition used in educational measurement to make inferences about examinees' thinking processes. Educational Measurement: Issues and Practice, 26, 3-16.
Magill-Evans, J., Harrison, M., Benzies, K., Gierl, M. J., & Kimak, C. (2007). Effects of parenting education on first-time fathers' skills in interactions with their infants. Fathering, 5, 41-56.
Gierl, M. J., Leighton, J. P., & Hunka, S. (2007). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees’ cognitive skills. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 242-274). Cambridge, UK: Cambridge University Press.
Gierl, M. J., & Leighton, J. P. (2007). Directions for future research in cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 341-351). Cambridge, UK: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (2007). Verbal reports as data for cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 146-172). Cambridge, UK: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (2007). Cognitive diagnostic assessment: An introduction. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 3-18). Cambridge, UK: Cambridge University Press.
Gierl, M. J., & Leighton, J. P. (2007). Linking cognitively-based models and psychometric methods. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics, Volume 26 (pp. 1103-1106). Amsterdam: North-Holland/Elsevier.
VanderVeen, A. A., Huff, K., Gierl, M., McNamara, D. S., Louwerse, M., & Graesser, A. (2007). Developing and validating instructionally relevant reading competency profiles measured by the critical reading section of the SAT Reasoning Test©. In D. S. McNamara (Ed.), Reading comprehension strategies: Theories, interventions, and technologies (pp. 137-172). New York, NY: Taylor & Francis.
Gierl, M. J., & Elatia, S. (2007). [Review of the book ‘Adapting educational and psychological tests for cross-cultural assessment’ edited by Ronald K. Hambleton, Peter Merenda, & Charles Spielberger.] Applied Psychological Measurement, 31, 74-78.
2006
Bolt, D. M., & Gierl, M. J. (2006). Testing features of graphical DIF: Application of a regression correction to three nonparametric statistical tests. Journal of Educational Measurement, 43, 313-333.
Gierl, M. J., Leighton, J. P., & Tan, X. (2006). Evaluating DETECT classification accuracy and consistency when data display complex structure. Journal of Educational Measurement, 43, 265-289.
Puhan, G., & Gierl, M. J. (2006). Evaluating the effectiveness of two-stage testing on English and French versions of a science achievement test. Journal of Cross-Cultural Psychology, 37, 136-154.
2005
Gierl, M. J., Tan, X., & Wang, C. (2005). Identifying content and cognitive dimensions on the SAT© (Research Report No. 2005-11). New York: The College Board.
Gierl, M. J. (2005). Using a dimensionality-based DIF analysis paradigm to identify and interpret constructs that elicit group differences. Educational Measurement: Issues and Practice, 24, 3-14.
2004
Leighton, J. P., Gierl, M. J., & Hunka, S. (2004). The Attribute Hierarchy Method for cognitive assessment: A variation on Tatsuoka's rule-space approach. Journal of Educational Measurement, 41, 205-237.
Gierl, M. J., Gotzmann, A., & Boughton, K. A. (2004). Performance of SIBTEST when the percentage of DIF items is large. Applied Measurement in Education, 17, 241-264.
Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17, 301-321.
Gierl, M. J., & Leighton, J. P. (2004). [Review of the book ‘Item generation for test development’ edited by Sidney Irvine & Patrick Kyllonen.] Journal of Educational Measurement, 41, 69-72.
2003
Gierl, M. J., Bisanz, J., Bisanz, G., & Boughton, K. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the DIF analysis framework. Journal of Educational Measurement, 40, 281-306.
Rogers, W. T., Gierl, M. J., Tardif, C., Lin, J., & Rinaldi, C. (2003). Differential validity and utility of successive and simultaneous approaches to the development of equivalent achievement tests in French and English. Alberta Journal of Educational Research, 49, 290-304.
Ackerman, T. A., Gierl, M. J., & Walker, C. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37-53.
2002
Klein, S., Sollereder, P., & Gierl, M. (2002). Examining the factor structure and psychometric properties of the Test of Visual-Perceptual Skills. The Occupational Therapy Journal of Research, 22, 16-24.
2001
Gierl, M. J., & Bolt, D. (2001). Illustrating the use of nonparametric regression to assess differential item and bundle functioning among multiple groups. International Journal of Testing, 1, 249-270.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests. Journal of Educational Measurement, 38, 164-187.
Gierl, M. J., Bisanz, J., Bisanz, G., Boughton, K., & Khaliq, S. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences on achievement tests. Educational Measurement: Issues and Practice, 20, 26-36.
Gierl, M. J., Henderson, D., Jodoin, M., & Klinger, D. (2001). Minimizing the influence of item parameter estimation errors in test development: A comparison of three selection procedures. Journal of Experimental Education, 69, 261-279.
2000
Gierl, M. J. (2000). Construct equivalence on translated achievement tests. Canadian Journal of Education, 25, 280-296.
Gierl, M. J., Leighton, J. P., & Hunka, S. (2000). Exploring the logic of Tatsuoka's rule-space model for test development and analysis. Educational Measurement: Issues and Practice, 19, 34-44.
Before 2000
Gierl, M. J., Rogers, W. T., & Klinger, D. (1999). Using statistical and substantive reviews to identify and interpret translation DIF. Alberta Journal of Educational Research, 45, 353-376. [Theme Issue: “Measurement and Evaluation in the New Millennium”].
Gierl, M. J. (1999). Differential item functioning on the Alberta Education Social Studies 30 Diploma Exam. Canadian Social Studies, 33, 54-58.
Gierl, M. J. (1998). Generalizability of written-response scores for the Alberta Education English 30 Diploma Examination. Alberta Journal of Educational Research, 44, 91-94.
Gierl, M. J. (1997). Comparing the cognitive representations of test developers and students on a mathematics achievement test using Bloom’s taxonomy. Journal of Educational Research, 91, 26-32.
McLeod, D. B., Stake, R. E., Schappelle, B. P., Mellissinos, M., & Gierl, M. J. (1996). Setting the Standards: NCTM’s role in the reform of mathematics education. In S. A. Raizen & E. D. Britton (Eds.) Bold ventures. Volume 3: Case studies of U.S. innovations in mathematics education (pp. 13-132). Dordrecht, The Netherlands: Kluwer.
Gierl, M. J., & Rogers, W. T. (1996). A confirmatory factor analysis of the Test Anxiety Inventory using Canadian high school students. Educational and Psychological Measurement, 56, 315-324.
Gierl, M. J., & Bisanz, J. (1995). Anxiety and attitudes related to mathematics in grades 3 and 6. Journal of Experimental Education, 63, 139-158.