2024
Firoozi, T., & Gierl, M. J. (in press). Scoring essays written in Persian using a transformer-based model: Implications for multilingual AES. Invited chapter to appear in M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation. New York: Routledge. [We were invited to prepare a chapter on multilingual automated essay scoring for the upcoming edition of the Routledge International Handbook of Automated Essay Evaluation.]
Shin, J., & Gierl, M. J. (in press). Automated short-response scoring for automated item generation in science assessments. Invited chapter to appear in M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation. New York: Routledge. [We were invited to prepare a chapter on the interface between automated essay scoring and automatic item generation for the upcoming edition of the Routledge International Handbook of Automated Essay Evaluation.]
Mohammadi, H., Firoozi, T., & Gierl, M. J. (in press). Neural semantic video analysis. In M. Khosrow-Pour (Ed.), Encyclopedia of Organizational Knowledge, Administration, and Technologies (6th Ed.). Hershey, PA: IGI Global.
Shin, J., Wang, B., Pinto, W., & Gierl, M. J. (2024). An engagement-aware predictive model to evaluate problem-solving performance from the Study of Adult Skills' process data. Large-scale Assessments in Education, 12:6.
Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43, 5-18.
2023
Firoozi, T., Bulut, O., & Gierl, M. (2023). Language models in automated essay scoring: Insights for the multilingual world. International Journal of Assessment Tools in Education, 10, 148-162. (Special Issue: Educational Measurement and Evaluation: Lessons from the Past, Visions for the Future)
Sayin, A., Bozdag, S., & Gierl, M. (2023). Automatic item generation for nonverbal reasoning items. International Journal of Assessment Tools in Education, 10, 131-147. (Special Issue: Educational Measurement and Evaluation: Lessons from the Past, Visions for the Future)
Sayin, A. & Gierl, M. (2023). Automatic item generation for online measurement and evaluation: Turkish literature items. International Journal of Assessment Tools in Education, 10, 218-231.
Leslie, T., & Gierl, M. J. (2023). Using automatic item generation to create multiple-choice questions for pharmacy assessment. American Journal of Pharmaceutical Education. https://doi.org/10.1016/j.ajpe.2023.100081
Shin, J., & Gierl, M. J. (2023). A technology-enhanced approach for locating timely and relevant news articles for context-based science education. In F. Ouyang, P. Jiao, B. M. McLaren, & A. H. Alavi (Eds.), Artificial Intelligence in STEM Education: The Paradigmatic Shifts in Research, Education, and Technology (pp. 109-126). Boca Raton, FL: CRC Press.
Gierl, M., Shin, J., & Firoozi, T. (2023). Automatic item generation. In R. Tierney, F. Rizvi, & K. Ercikan (Eds.), International Encyclopedia of Education (4th Ed., pp. 193-200). New York: Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.10026-0
2022
Firoozi, T., Mohammadi, H., & Gierl, M. J. (2022). Using active learning methods to strategically select essays for automated scoring. Educational Measurement: Issues and Practice, 00, 1–10. https://doi.org/10.1111/emip.12537
Shin, J., Gierl, M., & Lai, H. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22, 289-311.
Gierl, M. J., Swygert, K., Matovinovic, D., Kulesher, A., & Lai, H. (2022). Three sources of validation evidence needed to evaluate the quality of generated test items for medical licensure. Teaching and Learning in Medicine. https://doi.org/10.1080/10401334.2022.2119569.
Shin, J., & Gierl, M. J. (2022). Attribute-specific essay scoring using convolutional neural networks. Journal of Applied Testing Technology, 22, 1-17.
Gierl, M. J., Shin, J., Firoozi, T., & Lai, H. (2022). Using content coding and automatic item generation to improve test security. Frontiers in Education (Special Issue: Online Assessment for Humans—Advancements, Challenges, and Futures for Digital Assessment), 7:853578. doi: 10.3389/feduc.2022.853578
2021
Stephen, T., Gierl, M., & King, S. (2021). Automated essay scoring of constructed responses in nursing examination papers: An evaluation. Nurse Education in Practice, 54. https://doi.org/10.1016/j.nepr.2021.103085
Odell, B., Gierl, M., & Cutumisu, M. (2021). Testing measurement invariance of PISA 2015 mathematics, science, and ICT scales using the alignment method. Studies in Educational Evaluation, 68. https://doi.org/10.1016/j.stueduc.2020.100965
Lai, H. & Gierl, M. J. (2021). Automating the generation of test items. In M. Khosrow-Pour (Ed.), Encyclopedia of Organizational Knowledge, Administration, and Technologies (pp. 233-244). Hershey, PA: IGI Global.
2020
Shin, J., Guo, Q., & Gierl, M. J. (2020). Automated essay scoring using deep learning algorithms. In M. Khosrow-Pour (Ed.), Handbook of Research on Modern Educational Technologies, Applications, and Management (1st Ed.). DOI: 10.4018/978-1-7998-3476-2.ch003.
Leslie, T., & Gierl, M. (2020). Using automatic item generation methodology to create multiple-choice questions appropriate for entry to pharmacy practice assessment. AFPC PERC 2020 Abstracts—Oral and Poster Presentations. Canadian Pharmacists Journal, 153, 30.
Shin, J., & Gierl, M. J. (2020). More efficient processes for creating automated essay scoring frameworks: A demonstration of two algorithms. Language Testing. https://doi.org/10.1177/0265532220937830
Pugh, D., De Champlain, A., Gierl, M., Lai, H., & Touchie, C. (2020). Can automated item generation be used to develop high-quality MCQs that assess application of knowledge? Research and Practice in Technology Enhanced Learning, 15, 12. https://doi.org/10.1186/s41039-020-00134-8.
Latifi, F. S., & Gierl, M. J. (2020). Automated scoring of junior high essays using Coh-Metrix features: Implications for large-scale language testing. Language Testing. https://doi.org/10.1177/0265532220929918
Shin, J., Bulut, O., & Gierl, M. (2020). Development practices of trusted AI systems among Canadian data scientists. The International Review of Information Ethics, 28. http://informationethics.ca/index.php/irie/article/view/377.
2019
Shin, J., Bulut, O., & Gierl, M. J. (2019). The effect of best distractor location on the difficulty of multiple-choice items. Journal of Experimental Education. DOI: 10.1080/00220973.2019.1629577
Shin, J., Guo, Q., & Gierl, M. J. (2019). Multiple-choice item distractor development using topic modeling approaches. Frontiers in Psychology, 10: 825. doi: 10.3389/fpsyg.2019.00825. Invited Paper in Special Issue on Advancements in Technology-Based Assessment: Emerging Item Formats, Test Designs, and Data Sources, Frank Goldhammer, Ronny Scherer, Samuel Greiff (Guest Editors).
Gierl, M. J., Matovinovic, D., & Lai, H. (2019). Creating content for educational testing using a workflow that supports automatic item generation. In A. Reyes-Munoz, P. Zheng, D. Crawford, & V. Callaghan (Eds.), EAI International Conference on Technology, Innovation, Entrepreneurship and Education, Lecture Notes in Electrical Engineering 532 (pp. 27-38), New York: Springer.
Gierl, M. J., Lai, H., & Matovinovic, D. (in press). Augmented intelligence and the future of item development. In H. Jiao & R. W. Lissitz (Eds.), Application of artificial intelligence to assessment. Charlotte, NC: Information Age Publishing.
2018
Gierl, M. J., Bulut, O., & Zhang, X. (2018). Using computerized formative testing to support personalized learning in higher education: An application of two assessment technologies. In R. Zheng (Ed.), Digital technologies and instructional design for personalized learning (pp. 99-119). Hershey, PA: IGI Global.
Gierl, M. J., Lai, H., & Zhang, X. (2018). Automatic item generation. In M. Khosrow-Pour (Ed.), Encyclopedia of information science and technology (4th Ed., pp. 2369-2379). Hershey, PA: IGI Global.
Gierl, M. J., & Lai, H. (2018). Using automatic item generation to create solutions and rationales for computerized formative testing. Applied Psychological Measurement, 42, 42-57.
2017
Gierl, M. J., Bulut, O., Gao, Q., & Zhang, X. (2017). Developing, analyzing, and using distractors for multiple-choice tests: A comprehensive review. Review of Educational Research, 87, 1082-1116. [At the time of publication, the impact factor for the journal Review of Educational Research was 5.263 with a ranking of 2 out of 236 for the Education & Educational Research category using the 2016 release of Journal Citation Reports.]
Daniels, L., & Gierl, M. J. (2017). The impact of immediate test score reporting on university students' achievement emotions in the context of computer-based multiple-choice exams. Learning and Instruction, 52, 27-35.
Gierl, M. J., Daniels, L., & Zhang, X. (2017). Creating parallel forms to support on-demand testing for undergraduate students in psychology. Journal of Measurement and Evaluation in Education and Psychology, 8, 298-303.
Bulut, O., Guo, Q., & Gierl, M. J. (2017). A structural equation modeling approach for examining position effects in large-scale assessments. Large-scale Assessments in Education, 5: 8, 1-20.
Lai, H., Gierl, M. J., Cui, Y., & Babenko, O. (2017). Item consistency index: A method for evaluating item-model fit for cognitive diagnostic assessment. International Journal of Learning, Teaching and Educational Research, 16, 1-21.
Latifi, S., Gierl, M., Wang, R., Lai, H., & Wang, A. (2017). Information-based methods for evaluating the semantics of automatically generated test items. Artificial Intelligence Research, 6, 69-79.
2016
Gierl, M. J. & Lai, H. (2016). A process for reviewing and evaluating generated test items. Educational Measurement: Issues and Practice, 35, 6–20.
Gierl, M. J., Lai, H., Pugh, D., Touchie, C., Boulais, A.-P., & De Champlain, A. (2016). Evaluating the psychometric characteristics of generated multiple-choice test items. Applied Measurement in Education, 29, 196-210.
Latifi, S., Bulut, O., Gierl, M., Christie, T., & Jeeva, S. (2016). Differential performance on national exams: Evaluating item and bundle functioning methods using English, mathematics, and science assessments. SAGE Open, 6(2).
Zhang, X., & Gierl, M. J. (2016). A model-based method for content validation of automatically generated test items. Journal of Educational Issues, 2, 184-202.
Lai, H., Gierl, M. J., Touchie, C., Pugh, D., Boulais, A., & De Champlain, A. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching & Learning in Medicine, 28, 166-173.
Pugh, D., De Champlain, A., Gierl, M. J., Lai, H., & Touchie, C. (2016). Using cognitive models to develop quality multiple-choice questions. Medical Teacher.
Lai, H., Gierl, M. J., Byrne, B. E., Spielman, A., & Waldschmidt, D. (2016). Three modeling applications to promote automatic item generation for examinations in dentistry. Journal of Dental Education, 80, 339-347.
Gierl, M. J. & Lai, H. (2016). The role of cognitive models in automatic item generation. In A. Rupp & J. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 124-145). New York: Wiley.
Cui, Y., Gierl, M. J., & Guo, Q. (2016). The rule space and attribute hierarchy methods. In A. Rupp & J. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 354-378). New York: Wiley.
Gierl, M. J. & Lai, H. (2016). Automatic item generation. In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of test development (2nd edition, pp. 410-429). New York: Routledge.
Gierl, M. J., Lai, H., Fung, K., & Zheng, B. (2016). Using technology-enhanced processes to generate items in multiple languages. In F. Drasgow (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 109-127). New York: Routledge.
Gierl, M. J., Latifi, F., Lai, H., Matovinovic, D., & Boughton, K. (2016). Using automated processes to generate items to measure K-12 science skills. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on computational tools for real-world skill development (pp. 590-610). Hershey, PA: IGI Global.
2015
Gierl, M. J., Lai, H., Hogan, J., & Matovinovic, D. (2015). A method for generating test items that are aligned to the Common Core State Standards. Journal of Applied Testing Technology, 16, 1-18.
Gierl, M. J., & Lai, H. (2015). Using automated processes to generate test items and their associated solutions and rationales to support formative feedback. Interaction Design & Architecture(s)—IxD&A Journal, N.25, 9-20. Special Issue on Technology-Enhanced Assessment: Agency Change in the Educational Eco-System, Marco Kalz, Eric Ras, & Denise Whitelock (Guest Editors).
Latifi, F., Gierl, M. J., Boulais, A.-P., & De Champlain, A. (2015). Using automated essay scoring to evaluate written-response prompts in English and French on high-stakes medical licensure exams. Evaluation & the Health Professions, 1-5. [Epub ahead of print, September 16]. DOI: 10.1177/0163278715605358.
Gierl, M. J., Lai, H., Houston, L., Rich, C., & Boughton, K. (2015). Using automated processes to generate items in three or more languages. International Journal of e-Assessment, 1, 1-19.
Cui, Y., Gierl, M. J., & Guo, Q. (2015). Statistical classification for cognitive diagnostic assessment: An artificial neural network approach. Educational Psychology. DOI: 10.1080/01443410.2015.1062078
Gierl, M. J., & Lai, H. (2015). Using automated processes to generate English and French test items simultaneously. Mesure et évaluation en éducation—Measurement and Evaluation in Education, 37, 39-61. Invited Paper appearing in Special Issue on Methodological Advances in Assessment, François Vachon (Guest Editor).
Gierl, M. J., MacMahon-Ball, M., Vele, V., & Lai, H. (2015). Method for generating nonverbal reasoning items using n-layer modeling. In E. Ras & D. Joosten-ten Brinke (Eds.), Proceedings from the 2015 International Computer Assisted Assessment Conference, Communications in Computer and Information Science (pp. 1-10). New York: Springer.
2014
Gierl, M. J., Lai, H., Latifi, F., Boulais, A.-P., & De Champlain, A. (2014). Automated essay scoring and the future of assessment in medical education. Medical Education, 48, 950–962.
2013
Gierl, M. J., & Lai, H. (2013). Using automated processes to generate test items. Educational Measurement: Issues and Practice, 32, 36-50.
Gierl, M. J., & Lai, H. (2013). Evaluating the quality of medical multiple-choice items created with automated generation processes. Medical Education, 47, 726-733.
Gierl, M. J., Lai, H., & Li, J. (2013). Identifying differential item functioning in multi-stage computer adaptive testing. Educational Research and Evaluation, 19:2-3, 188-203. Invited paper appearing in Special Issue on Fairness Issues in Educational Assessment, Hossein Karami (Guest Editor).
Gierl, M. J., & Haladyna, T. (2013). Introduction and overview of automatic item generation. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 3-12). New York: Routledge.
Gierl, M. J., & Lai, H. (2013). Using weak and strong theory to create item models for automatic item generation: Some practical guidelines with examples. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 26-39). New York: Routledge.
Lai, H., & Gierl, M. J. (2013). Using principles in assessment engineering to generate items for reading comprehension and mathematical reasoning. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 77-101). New York: Routledge.
Haladyna, T., & Gierl, M. J. (2013). The future of automatic item generation. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 231-239). New York: Routledge.
2012
Gierl, M. J., Lai, H., & Turner, S. (2012). Using automatic item generation to create multiple-choice items for assessments in medical education. Medical Education, 46, 757-765.
Gierl, M. J., & Lai, H. (2012). Using item models for automatic item generation. International Journal of Testing, 12, 273-298.
Cui, Y., Gierl, M. J., & Chang, W. W. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49, 19-38.
2011
Squires, J.E., Estabrooks, C.A., Newburn-Cook, C.V., & Gierl, M. (2011). Validation of the Conceptual Research Utilization Scale: An application of the Standards for Educational and Psychological Testing in Healthcare. BMC Health Services Research, 11:107.
Wang, C., & Gierl, M. J. (2011). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills in critical reading. Journal of Educational Measurement, 48, 1-24.
2010
Gierl, M. J., Alves, C., & Taylor-Majeau, R. (2010). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' skills in mathematics: An operational implementation of cognitive diagnostic assessment. International Journal of Testing, 10, 318-341.
Roberts, M. R., & Gierl, M. J. (2010). Developing score reports for cognitive diagnostic assessment. Educational Measurement: Issues and Practice, 29, 25-38.
Zheng, Y., Gierl, M. J., & Cui, Y. (2010). Using Cochran's Z statistic to test the kernel-smoothed IRF differences between focal and reference groups. Educational and Psychological Measurement, 70, 541-556.
2009
Gierl, M. J., Cui, Y., & Zhou, J. (2009). Reliability of attribute-based scoring in cognitive diagnostic assessment. Journal of Educational Measurement, 46, 293-313.
Cor, K., Alves, C., & Gierl, M. J. (2009). Three applications of automated test assembly within a user-friendly modeling environment. Practical Assessment Research and Evaluation, 14, 1-23.
Gierl, M. J., Leighton, J. P., Wang, C., Zhou, J., Gokiert, R., & Tan, A. (2009). Developing and validating cognitive models of algebra performance on the SAT® (Research Report No. 2009-03). New York: The College Board.
2008
Gierl, M. J., Cui, Y., & Hunka, S. (2008). Using connectionist models to evaluate examinees' response patterns on tests. Journal of Modern Applied Statistical Methods, 7, 234-245.
Gierl, M. J., Zhou, J., & Alves, C. (2008). Developing a taxonomy of item model types to promote assessment engineering. Journal of Technology, Learning, and Assessment, 7(2). Retrieved from http://www.jtla.org.
Gierl, M. J., & Cui, Y. (2008). Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment. Measurement: Interdisciplinary Research and Perspectives, 6, 263-268.
Gierl, M. J., & Zhou, J. (2008). Computer adaptive-attribute testing: A new approach to cognitive diagnostic assessment. Zeitschrift für Psychologie—Journal of Psychology, 216, 29-39. Invited Paper appearing in Special Issue on Adaptive Models of Psychological Testing, Wim J. van der Linden (Guest Editor).
Gierl, M. J., Zheng, Y., & Cui, Y. (2008). Using the Attribute Hierarchy Method to identify and interpret the cognitive skills that produce group differences. Journal of Educational Measurement, 45, 65-89.
Gierl, M. J., Wang, C., & Zhou, J. (2008). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills in algebra on the SAT®. Journal of Technology, Learning, and Assessment, 6(6). Retrieved from http://www.jtla.org.
Cor, M. K., Alves, C., & Gierl, M. J. (2008). [Review of the software ‘Conducting Automated Test Assembly using the Premium Solver Platform Version 7.0 with Microsoft EXCEL and the Large-Scale LP/QP Solver Engine Add-In.’] Applied Psychological Measurement, 32, 652-663.
2007
Gierl, M. J. (2007). Making diagnostic inferences about cognitive attributes using the rule space model and Attribute Hierarchy Method. Journal of Educational Measurement, 44, 325-340. Invited Paper appearing in Special Issue on IRT-Based Cognitive Diagnostic Models and Related Methods, Lou DiBello & William Stout (Guest Editors).
Leighton, J. P., & Gierl, M. J. (2007). Defining and evaluating models of cognition used in educational measurement to make inferences about examinees' thinking processes. Educational Measurement: Issues and Practice, 26, 3-16.
Magill-Evans, J., Harrison, M., Benzie, K., Gierl, M. J., & Kimak, C. (2007). Effects of parenting education on first-time fathers' skills in interactions with their infants. Fathering, 5, 41-56.
Gierl, M. J., Leighton, J. P., & Hunka, S. (2007). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 242-274). Cambridge, UK: Cambridge University Press.
Gierl, M. J., & Leighton, J. P. (2007). Directions for future research in cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 341-351). Cambridge, UK: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (2007). Verbal reports as data for cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 146-172). Cambridge, UK: Cambridge University Press.
Leighton, J. P., & Gierl, M. J. (2007). Cognitive diagnostic assessment: An introduction. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 3-18). Cambridge, UK: Cambridge University Press.
Gierl, M. J., & Leighton, J. P. (2007). Linking cognitively-based models and psychometric methods. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics, Volume 26 (pp. 1103-1106). Amsterdam: Elsevier.
VanderVeen, A. A., Huff, K., Gierl, M., McNamara, D. S., Louwerse, M., & Graesser, A. (2007). Developing and validating instructionally relevant reading competency profiles measured by the critical reading section of the SAT Reasoning Test®. In D. S. McNamara (Ed.), Reading comprehension strategies: Theories, interventions, and technologies (pp. 137-172). New York, NY: Taylor & Francis.
Gierl, M. J., & Elatia, S. (2007). [Review of the book ‘Adapting educational and psychological tests for cross-cultural assessment’ edited by Ronald K. Hambleton, Peter Merenda, & Charles Spielberger.] Applied Psychological Measurement, 31, 74-78.
2006
Bolt, D. M., & Gierl, M. J. (2006). Testing features of graphical DIF: Application of a regression correction to three nonparametric statistical tests. Journal of Educational Measurement, 43, 313-333.
Gierl, M. J., Leighton, J. P., & Tan, X. (2006). Evaluating DETECT classification accuracy and consistency when data display complex structure. Journal of Educational Measurement, 43, 265-289.
Puhan, G., & Gierl, M. J. (2006). Evaluating the effectiveness of two-stage testing on English and French versions of a science achievement test. Journal of Cross-Cultural Psychology, 37, 136-154.
2005
Gierl, M. J., Tan, X., & Wang, C. (2005). Identifying content and cognitive dimensions on the SAT® (Research Report No. 2005-11). New York: The College Board.
Gierl, M. J. (2005). Using a dimensionality-based DIF analysis paradigm to identify and interpret constructs that elicit group differences. Educational Measurement: Issues and Practice, 24, 3-14.
2004
Leighton, J. P., Gierl, M. J., & Hunka, S. (2004). The Attribute Hierarchy Method for cognitive assessment: A variation on Tatsuoka's rule-space approach. Journal of Educational Measurement, 41, 205-237.
Gierl, M. J., Gotzmann, A., & Boughton, K. A. (2004). Performance of SIBTEST when the percentage of DIF items is large. Applied Measurement in Education, 17, 241-264.
Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17, 301-321.
Gierl, M. J., & Leighton, J. P. (2004). [Review of the book ‘Item generation for test development’ edited by Sidney Irvine & Patrick Kyllonen.] Journal of Educational Measurement, 41, 69-72.
2003
Gierl, M. J., Bisanz, J., Bisanz, G., & Boughton, K. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the DIF analysis framework. Journal of Educational Measurement, 40, 281-306.
Rogers, W. T., Gierl, M. J., Tardif, C., Lin, J., & Rinaldi, C. (2003). Differential validity and utility of successive and simultaneous approaches to the development of equivalent achievement tests in French and English. Alberta Journal of Educational Research, 49, 290-304.
Ackerman, T. A., Gierl, M. J., & Walker, C. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37-53.
2002
Klein, S., Sollereder, P., & Gierl, M. (2002). Examining the factor structure and psychometric properties of the Test of Visual-Perceptual Skills. The Occupational Therapy Journal of Research, 22, 16-24.
2001
Gierl, M. J., & Bolt, D. (2001). Illustrating the use of nonparametric regression to assess differential item and bundle functioning among multiple groups. International Journal of Testing, 1, 249-270.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests. Journal of Educational Measurement, 38, 164-187.
Gierl, M. J., Bisanz, J., Bisanz, G., Boughton, K., & Khaliq, S. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences on achievement tests. Educational Measurement: Issues and Practice, 20, 26-36.
Gierl, M. J., Henderson, D., Jodoin, M., & Klinger, D. (2001). Minimizing the influence of item parameter estimation errors in test development: A comparison of three selection procedures. Journal of Experimental Education, 69, 261-279.
2000
Gierl, M. J. (2000). Construct equivalence on translated achievement tests. Canadian Journal of Education, 25, 280-296.
Gierl, M. J., Leighton, J. P., & Hunka, S. (2000). Exploring the logic of Tatsuoka's rule-space model for test development and analysis. Educational Measurement: Issues and Practice, 19, 34-44.
Before 2000
Gierl, M. J., Rogers, W. T., & Klinger, D. (1999). Using statistical and substantive reviews to identify and interpret translation DIF. Alberta Journal of Educational Research, 45, 353-376. [Theme Issue: “Measurement and Evaluation in the New Millennium”].
Gierl, M. J. (1999). Differential item functioning on the Alberta Education Social Studies 30 Diploma Exam. Canadian Social Studies, 33, 54-58.
Gierl, M. J. (1998). Generalizability of written-response scores for the Alberta Education English 30 Diploma Examination. Alberta Journal of Educational Research, 44, 91-94.
Gierl, M. J. (1997). Comparing the cognitive representations of test developers and students on a mathematics achievement test using Bloom’s taxonomy. Journal of Educational Research, 91, 26-32.
McLeod, D. B., Stake, R. E., Schappelle, B. P., Mellissinos, M., & Gierl, M. J. (1996). Setting the Standards: NCTM’s role in the reform of mathematics education. In S. A. Raizen & E. D. Britton (Eds.) Bold ventures. Volume 3: Case studies of U.S. innovations in mathematics education (pp. 13-132). Dordrecht, The Netherlands: Kluwer.
Gierl, M. J., & Rogers, W. T. (1996). A confirmatory factor analysis of the Test Anxiety Inventory using Canadian high school students. Educational and Psychological Measurement, 56, 315-324.
Gierl, M. J., & Bisanz, J. (1995). Anxiety and attitudes related to mathematics in grades 3 and 6. Journal of Experimental Education, 63, 139-158.