SELECT PUBLICATIONS

2024

Firoozi, T., & Gierl, M. J. (in press). Scoring essays written in Persian using a transformer-based model: Implications for multilingual AES. Invited chapter to appear in M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation. New York: Routledge. [We were invited to prepare a chapter on multilingual automated essay scoring for the upcoming edition of the Routledge International Handbook of Automated Essay Evaluation.]

 

Shin, J., & Gierl, M. J. (in press). Automated short-response scoring for automated item generation in science assessments. Invited chapter to appear in M. Shermis & J. Wilson (Eds.), The Routledge International Handbook of Automated Essay Evaluation. New York: Routledge. [We were invited to prepare a chapter on the interface between automated essay scoring and automatic item generation for the upcoming edition of the Routledge International Handbook of Automated Essay Evaluation.]


Mohammadi, H., Firoozi, T., & Gierl, M. J. (in press). Neural semantic video analysis. In M. Khosrow-Pour (Ed.), Encyclopedia of Organizational Knowledge, Administration, and Technologies (6th Ed.). Hershey, PA: IGI Global.


Shin, J., Wang, B., Pinto, W., & Gierl, M. J. (2024). An engagement-aware predictive model to evaluate problem-solving performance from the Study of Adult Skills' process data. Large-scale Assessments in Education, 12:6.


Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice, 43, 5-18.

2023

Firoozi, T., Bulut, O., & Gierl, M. (2023). Language models in automated essay scoring: Insights for the multilingual world. International Journal of Assessment Tools in Education, 10, 148-162.  (Special Issue: Educational Measurement and Evaluation: Lessons from the Past, Visions for the Future)

 

Sayin, A., Bozdag, S., & Gierl, M. (2023). Automatic item generation for nonverbal reasoning items. International Journal of Assessment Tools in Education, 10, 131-147. (Special Issue: Educational Measurement and Evaluation: Lessons from the Past, Visions for the Future)


Sayin, A., & Gierl, M. (2023). Automatic item generation for online measurement and evaluation: Turkish literature items. International Journal of Assessment Tools in Education, 10, 218-231.


Leslie, T., & Gierl, M. J. (2023). Using automatic item generation to create multiple-choice questions for pharmacy assessment. American Journal of Pharmaceutical Education. https://doi.org/10.1016/j.ajpe.2023.100081


Shin, J., & Gierl, M. J. (2023). A technology-enhanced approach for locating timely and relevant news articles for context-based science education. In F. Ouyang, P. Jiao, B. M. McLaren, & A. H. Alavi (Eds.), Artificial Intelligence in STEM Education: The Paradigmatic Shifts in Research, Education, and Technology (pp. 109-126). Boca Raton, FL: CRC Press.

 

Gierl, M., Shin, J., & Firoozi, T. (2023). Automatic item generation. In R. Tierney, F. Rizvi, & K. Ercikan (Eds.), International Encyclopedia of Education (4th Ed., pp. 193-200). New York: Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.10026-0

2022

Firoozi, T., Mohammadi, H., & Gierl, M. J. (2022). Using active learning methods to strategically select essays for automated scoring. Educational Measurement: Issues and Practice, 00, 1-10. https://doi.org/10.1111/emip.12537

 

Shin, J., Gierl, M., & Lai, H. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22, 289-311.

 

Gierl, M. J., Swygert, K., Matovinovic, D., Kulesher, A., & Lai, H. (2022). Three sources of validation evidence needed to evaluate the quality of generated test items for medical licensure. Teaching and Learning in Medicine. https://doi.org/10.1080/10401334.2022.2119569.


Shin, J., & Gierl, M. J. (2022). Attribute-specific essay scoring using convolutional neural networks. Journal of Applied Testing Technology, 22, 1-17.

 

Gierl, M. J., Shin, J., Firoozi, T., & Lai, H. (2022). Using content coding and automatic item generation to improve test security. Frontiers in Education (Special Issue: Online Assessment for Humans—Advancements, Challenges, and Futures for Digital Assessment), 7:853578. doi: 10.3389/feduc.2022.853578

2021

Stephen, T., Gierl, M., & King, S. (2021). Automated essay scoring of constructed responses in nursing examination papers: An evaluation. Nurse Education in Practice, 54. https://doi.org/10.1016/j.nepr.2021.103085


Odell, B., Gierl, M., & Cutumisu, M. (2021). Testing measurement invariance of PISA 2015 mathematics, science, and ICT scales using the alignment method. Studies in Educational Evaluation, 68.  https://doi.org/10.1016/j.stueduc.2020.100965


Lai, H. & Gierl, M. J. (2021). Automating the generation of test items. In M. Khosrow-Pour (Ed.), Encyclopedia of Organizational Knowledge, Administration, and Technologies (pp. 233-244). Hershey, PA: IGI Global.

Shin, J., Guo, Q., & Gierl, M. J. (2020). Automated essay scoring using deep learning algorithms. In M. Khosrow-Pour (Ed.), Handbook of Research on Modern Educational Technologies, Applications, and Management (1st Ed.). DOI: 10.4018/978-1-7998-3476-2.ch003.

2020

Leslie, T., & Gierl, M. (2020). Using automatic item generation methodology to create multiple-choice questions appropriate for entry to pharmacy practice assessment. AFPC PERC 2020 Abstracts—Oral and Poster Presentations. Canadian Pharmacists Journal, 153, 30.

Shin, J., & Gierl, M. J. (2020). More efficient processes for creating automated essay scoring frameworks: A demonstration of two algorithms. Language Testing.  https://doi.org/10.1177/0265532220937830 

Pugh, D., De Champlain, A., Gierl, M., Lai, H., & Touchie, C. (2020). Can automated item generation be used to develop high-quality MCQs that assess application of knowledge? Research and Practice in Technology Enhanced Learning, 15, 12. https://doi.org/10.1186/s41039-020-00134-8.

Latifi, F. S., & Gierl, M. J. (2020). Automated scoring of junior high essays using Coh-Metrix features: Implications for large-scale language testing. Language Testing. https://doi.org/10.1177/0265532220929918 

Shin, J., Bulut, O., & Gierl, M. (2020). Development practices of trusted AI systems among Canadian data scientists. The International Review of Information Ethics, 28. http://informationethics.ca/index.php/irie/article/view/377.

2019

Shin, J., Bulut, O., & Gierl, M. J. (2019). The effect of best distractor location on the difficulty of multiple-choice items. Journal of Experimental Education. DOI: 10.1080/00220973.2019.1629577

Shin J., Guo Q., & Gierl M.J. (2019) Multiple-choice item distractor development using topic modeling approaches. Frontiers in Psychology, 10: 825. doi: 10.3389/fpsyg.2019.00825. Invited Paper in Special Issue on Advancements in Technology-Based Assessment: Emerging Item Formats, Test Designs, and Data Sources, Frank Goldhammer, Ronny Scherer, Samuel Greiff (Guest Editors).

Gierl, M. J., Matovinovic, D., & Lai, H. (2019). Creating content for educational testing using a workflow that supports automatic item generation. In A. Reyes-Munoz, P. Zheng, D. Crawford, & V. Callaghan (Eds.), EAI International Conference on Technology, Innovation, Entrepreneurship and Education, Lecture Notes in Electrical Engineering 532 (pp. 27-38). New York: Springer.


Gierl, M. J., Lai, H., & Matovinovic, D. (in press). Augmented intelligence and the future of item development. In M. H. Jiao & R. Lissitz (Eds.), Applications of artificial intelligence in assessment. Daryaganj, New Delhi: New Age Publishing.


2018

Gierl, M. J., Bulut, O., & Zhang, X. (2018). Using computerized formative testing to support personalized learning in higher education: An application of two assessment technologies. In R. Zheng (Ed.), Digital technologies and instructional design for personalized learning (pp. 99-119). Hershey, PA: IGI Global.

Gierl, M. J., Lai, H., & Zhang, X. (2018). Automatic item generation. In M. Khosrow-Pour (Ed.), Encyclopedia of information science and technology (4th Ed., pp. 2369-2379). Hershey, PA: IGI Global.

Gierl, M. J., & Lai, H. (2018). Using automatic item generation to create solutions and rationales for computerized formative testing.  Applied Psychological Measurement, 42, 42-57. 

2017

Gierl, M. J., Bulut, O., Gao, Q., & Zhang, X. (2017). Developing, analyzing, and using distractors for multiple-choice tests: A comprehensive review. Review of Educational Research, 87, 1082-1116. [At the time of publication, the impact factor for the journal Review of Educational Research was 5.263 with a ranking of 2 out of 236 for the Education & Educational Research category using the 2016 release of Journal Citation Reports.]

Daniels, L., & Gierl, M. J. (2017). The impact of immediate test score reporting on university students' achievement emotions in the context of computer-based multiple-choice exams. Learning and Instruction, 52, 27-35.

 Gierl, M. J., Daniels, L., & Zhang, X. (2017). Creating parallel forms to support on-demand testing for undergraduate students in psychology. Journal of Measurement and Evaluation in Education and Psychology, 8, 298-303.

Bulut, O., Guo, Q., & Gierl, M. J. (2017). A structural equation modeling approach for examining position effects in large-scale assessments. Large-scale Assessments in Education, 5: 8, 1-20.

Lai, H., Gierl, M. J., Cui, Y., & Babenko, O. (2017). Item consistency index: A method for evaluating item- model fit for cognitive diagnostic assessment. International Journal of Learning, Teaching and Educational Research, 16, 1-21.

Latifi, S., Gierl, M., Wang, R., Lai, H., & Wang, A. (2017). Information-based methods for evaluating the semantics of automatically generated test items. Artificial Intelligence Research, 6, 69-79.

2016

Gierl, M. J. & Lai, H. (2016). A process for reviewing and evaluating generated test items. Educational Measurement: Issues and Practice, 35, 6–20.

Gierl, M. J., Lai, H., Pugh, D., Touchie, C., Boulais, A-P, & DeChamplain, A. (2016). Evaluating the psychometric characteristics of generated multiple-choice test items. Applied Measurement in Education, 29, 196-210.

Latifi, S., Bulut, O., Gierl, M., Christie, T., & Jeeva, S. (2016). Differential performance on national exams: Evaluating item and bundle functioning methods using English, mathematics, and science assessments. SAGE Open, 6(2).

Zhang, X., & Gierl, M. J. (2016). A model-based method for content validation of automatically generated test items. Journal of Educational Issues, 2, 184-202.

Lai, H., Gierl, M. J., Touchie, C., Pugh, D., Boulais, A., & De Champlain, A. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching & Learning in Medicine, 28, 166-173.

Pugh, D., DeChamplain, A., Gierl, M. J., Lai, H., & Touchie, C. (2016). Using cognitive models to develop quality multiple-choice questions. Medical Teacher.

Lai, H., Gierl, M. J., Byrne, B. E., Spielman, A., & Waldschmidt, D. (2016). Three modeling applications to promote automatic item generation for examinations in dentistry. Journal of Dental Education, 80, 339-347.

Gierl, M. J., & Lai, H. (2016). The role of cognitive models in automatic item generation. In A. Rupp & J. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 124-145). New York: Wiley.

Cui, Y., Gierl, M. J., & Guo, Q. (2016). The rule space and attribute hierarchy methods. In A. Rupp & J. Leighton (Eds.), The Wiley handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 354-378). New York: Wiley.

Gierl, M. J. & Lai, H. (2016). Automatic item generation.  In S. Lane, M. Raymond, & T. Haladyna (Eds.), Handbook of test development (2nd edition, pp. 410-429). New York: Routledge.

Gierl, M. J., Lai, H., Fung, K., & Zheng, B. (2016). Using technology-enhanced processes to generate items in multiple languages.  In F. Drasgow (Ed.), Technology and testing: Improving educational and psychological measurement (pp. 109-127). New York: Routledge.

Gierl, M. J., Latifi, F., Lai, H., Matovinovic, D., & Boughton, K. (2016). Using automated processes to generate items to measure K-12 science skills.  In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on computational tools for real-world skill development (pp. 590-610). Hershey, PA: IGI Global.

2015


Gierl, M. J., Lai, H., Hogan, J., & Matovinovic, D. (2015). A method for generating test items that are aligned to the Common Core State Standards. Journal of Applied Testing Technology, 16, 1-18.

Gierl, M. J., & Lai. H. (2015).  Using automated processes to generate test items and their associated solutions and rationales to support formative feedback. Interaction Design & Architecture(s)—IxD&A Journal, N.25, 9-20. Special Issue on Technology-Enhanced Assessment: Agency Change in the Educational Eco-System, Marco Kalz, Eric Ras, & Denise Whitelock (Guest Editors).

Latifi, F., Gierl, M. J., Boulais, A-P, & DeChamplain, A. (2015).  Using automated essay scoring to evaluate written-response prompts in English and French on high-stakes medical licensure exams.  Evaluation & the Health Professions, 1-5. September 16. [Epublication ahead of print]. DOI: 10.1177/0163278715605358.

Gierl, M. J., Lai, H., Houston, L., Rich, C., & Boughton, K. (2015).  Using automated processes to generate items in three or more languages.  International Journal of e-Assessment, 1, 1-19.

Cui, Y., Gierl, M. J., & Guo, Q. (2015). Statistical classification for cognitive diagnostic assessment: An artificial neural network approach. Educational Psychology. DOI: 10.1080/01443410.2015.1062078

Gierl, M. J., & Lai, H. (2015).  Using automated processes to generate English and French test items simultaneously.  Mesure et évaluation en éducation—Measurement and Evaluation in Education, 37, 39-61. Invited Paper appearing in Special Issue on Methodological Advances in Assessment, François Vachon (Guest Editor).

Gierl, M. J., MacMahon-Ball, M., Vele, V., & Lai, H. (2015). Method for generating nonverbal reasoning items using n-layer modeling. In E. Ras & D. Joosten-ten Brinke (Eds.), Proceedings from the 2015 International Computer Assisted Assessment Conference, Communications in Computer and Information Science (pp. 1-10). New York: Springer.

2014

Gierl, M. J., Lai, H., Latifi, F., Boulais, A-P, & DeChamplain, A. (2014). Automated essay scoring and the future of assessment in medical education. Medical Education, 48, 950-962.

2013

Gierl, M. J., & Lai, H. (2013).  Using automated processes to generate test items.  Educational Measurement: Issues and Practice, 32, 36-50.

Gierl, M. J., & Lai, H. (2013).  Evaluating the quality of medical multiple-choice items created with automated generation processes.  Medical Education, 47, 726-733.

Gierl, M. J., Lai, H., & Li, J. (2013).  Identifying differential item functioning in multi-stage computer adaptive testing.  Educational Research and Evaluation, 19:2-3, 188-203. Invited paper appearing in Special Issue on Fairness Issues in Educational Assessment, Hossein Karami (Guest Editor).

Gierl, M. J., & Haladyna, T. (2013). Introduction and overview of automatic item generation. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 3-12). New York: Routledge.

Gierl, M. J., & Lai, H. (2013). Using weak and strong theory to create item models for automatic item generation: Some practical guidelines with examples. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 26-39). New York: Routledge.

Lai, H., & Gierl, M. J. (2013). Using principles in assessment engineering to generate items for reading comprehension and mathematical reasoning. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 77-101). New York: Routledge.

Haladyna, T., & Gierl, M. J. (2013). The future of automatic item generation. In M. J. Gierl & T. Haladyna (Eds.), Automatic item generation: Theory and practice (pp. 231-239). New York: Routledge.

2012

Gierl, M. J., Lai, H., & Turner, S. (2012).  Using automatic item generation to create multiple-choice items for assessments in medical education.  Medical Education, 46, 757-765.

Gierl, M. J., & Lai, H. (2012).  Using item models for automatic item generation.  International Journal of Testing, 12, 273-298.

Cui, Y., Gierl, M. J., & Chang, W. W. (2012).  Estimating classification consistency and accuracy for cognitive diagnostic assessment.  Journal of Educational Measurement, 49, 19-38.

2011

Squires, J.E., Estabrooks, C.A., Newburn-Cook, C.V., & Gierl, M. (2011).  Validation of the Conceptual Research Utilization Scale: An application of the Standards for Educational and Psychological Testing in Healthcare.  BMC Health Services Research, 11:107.

Wang, C, & Gierl, M. J. (2011).  Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills in critical reading.  Journal of Educational Measurement, 48, 1-24.

2010

Gierl, M. J., Alves, C., & Taylor-Majeau, R. (2010).  Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' skills in mathematics: An operational implementation of cognitive diagnostic assessment.  International Journal of Testing, 10, 318-341.

Roberts, M. R., & Gierl, M. J. (2010).  Developing score reports for cognitive diagnostic assessment.  Educational Measurement: Issues and Practice, 29, 25-38.

Zheng, Y., Gierl, M. J., & Cui, Y. (2010).  Using Cochran's Z statistic to test the kernel-smoothed IRF differences between focal and reference groups.  Educational and Psychological Measurement, 70, 541-556.

2009

Gierl, M. J., Cui, Y., & Zhou, J. (2009).  Reliability of attribute-based scoring in cognitive diagnostic assessment.  Journal of Educational Measurement, 46, 293-313.

Cor, K., Alves, C., & Gierl, M. J. (2009). Three applications of automated test assembly within a user-friendly modeling environment. Practical Assessment, Research & Evaluation, 14, 1-23.

Gierl, M. J., Leighton, J. P., Wang, C., Zhou, J., Gokiert, R., & Tan, A. (2009).  Developing and validating cognitive models of algebra performance on the SAT© (Research Report No. 2009-03).  New York: The College Board.

2008

Gierl, M. J., Cui, Y., & Hunka, S. (2008).  Using connectionist models to evaluate examinees' response patterns on tests.  Journal of Modern Applied Statistical Methods, 7, 234-245.

Gierl, M. J., Zhou, J., & Alves, C. (2008). Developing a taxonomy of item model types to promote assessment engineering. Journal of Technology, Learning, and Assessment, 7(2). Retrieved from http://www.jtla.org.

Gierl, M. J., & Cui, Y. (2008).  Defining characteristics of diagnostic classification models and the problem of retrofitting in cognitive diagnostic assessment.  Measurement: Interdisciplinary Research and Perspectives, 6, 263-268.

Gierl, M. J., & Zhou, J. (2008). Computer adaptive-attribute testing: A new approach to cognitive diagnostic assessment. Zeitschrift für Psychologie—Journal of Psychology, 216, 29-39. Invited Paper appearing in Special Issue on Adaptive Models of Psychological Testing, Wim J. van der Linden (Guest Editor).

Gierl, M. J., Zheng, Y., & Cui, Y. (2008).  Using the Attribute Hierarchy Method to identify and interpret the cognitive skills that produce group differences.  Journal of Educational Measurement, 45, 65-89.

Gierl, M. J., Wang, C., & Zhou, J. (2008). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees' cognitive skills in algebra on the SAT. Journal of Technology, Learning, and Assessment, 6(6). Retrieved from http://www.jtla.org.

Cor, M. K., Alves, C., & Gierl, M. J. (2008). [Review of the software ‘Conducting Automated Test Assembly using the Premium Solver Platform Version 7.0 with Microsoft EXCEL and the Large-Scale LP/QP Solver Engine Add-In.’] Applied Psychological Measurement, 32, 652-663.

2007

Gierl, M. J. (2007).  Making diagnostic inferences about cognitive attributes using the rule space model and Attribute Hierarchy Method.  Journal of Educational Measurement, 44, 325-340. Invited Paper appearing in Special Issue on IRT-Based Cognitive Diagnostic Models and Related Methods, Lou DiBello & William Stout (Guest Editors).

Leighton, J. P., & Gierl, M. J. (2007).  Defining and evaluating models of cognition used in educational measurement to make inferences about examinees' thinking processes. Educational Measurement: Issues and Practice, 26, 3-16.

Magill-Evans, J., Harrison, M., Benzie, K., Gierl, M. J., & Kimak, C. (2007).  Effects of parenting education on first-time fathers' skills in interactions with their infants.  Fathering, 5, 41-56.

Gierl, M. J., Leighton, J. P., & Hunka, S. (2007). Using the Attribute Hierarchy Method to make diagnostic inferences about examinees’ cognitive skills. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications. (pp. 242-274). Cambridge, UK: Cambridge University Press.

Gierl, M. J., & Leighton, J. P. (2007). Directions for future research in cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 341-351). Cambridge, UK: Cambridge University Press.

Leighton, J. P., & Gierl, M. J. (2007). Verbal reports as data for cognitive diagnostic assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications. (pp. 146-172). Cambridge, UK: Cambridge University Press.

Leighton, J. P., & Gierl, M. J. (2007). Cognitive diagnostic assessment: An introduction. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 3-18). Cambridge, UK: Cambridge University Press.

Gierl, M. J., & Leighton, J. P. (2007). Linking cognitively-based models and psychometric methods. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics: Psychometrics, Volume 26 (pp. 1103-1106). Amsterdam, The Netherlands: Elsevier.

VanderVeen, A. A., Huff, K., Gierl, M., McNamara, D. S., Louwerse, M., & Graesser, A. (2007). Developing and validating instructionally relevant reading competency profiles measured by the critical reading section of the SAT Reasoning Test©. In D. S. McNamara (Ed.) Reading comprehension strategies: Theories, interventions, and technologies (pp. 137-172). New York, NY: Taylor & Francis. 

Gierl, M. J., & Elatia, S. (2007). [Review of the book ‘Adapting educational and psychological tests for cross-cultural assessment’ edited by Ronald K. Hambleton, Peter Merenda, & Charles Spielberger.] Applied Psychological Measurement, 31, 74-78.

2006

Bolt, D. M., & Gierl, M. J. (2006).  Testing features of graphical DIF: Application of a regression correction to three nonparametric statistical tests.  Journal of Educational Measurement, 43, 313-333.

Gierl, M. J., Leighton, J. P., & Tan, X. (2006). Evaluating DETECT classification accuracy and consistency when data display complex structure. Journal of Educational Measurement, 43, 265-289.

Puhan, G., & Gierl, M. J. (2006).  Evaluating the effectiveness of two-stage testing on English and French versions of a science achievement test. Journal of Cross-Cultural Psychology, 37, 136-154.

2005

Gierl, M. J., Tan, X., & Wang, C. (2005).  Identifying content and cognitive dimensions on the SAT© (Research Report No. 2005-11).  New York: The College Board.

Gierl, M. J. (2005).  Using a dimensionality-based DIF analysis paradigm to identify and interpret constructs that elicit group differences.  Educational Measurement: Issues and Practice, 24, 3-14.

2004

Leighton, J. P., Gierl, M. J., & Hunka, S. (2004).  The Attribute Hierarchy Method for cognitive assessment: A variation on Tatsuoka's rule-space approach.  Journal of Educational Measurement, 41, 205-237.

Gierl, M. J., Gotzmann, A., & Boughton, K. A. (2004).  Performance of SIBTEST when the percentage of DIF items is large.  Applied Measurement in Education, 17, 241-264.

Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17, 301-321.

Gierl, M. J., & Leighton, J. P. (2004). [Review of the book ‘Item generation for test development’ edited by Sidney Irvine & Patrick Kyllonen.] Journal of Educational Measurement, 41, 69-72.

2003

Gierl, M. J., Bisanz, J., Bisanz, G., & Boughton, K. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the DIF analysis framework. Journal of Educational Measurement, 40, 281-306.

Rogers, W. T., Gierl, M. J., Tardif, C., Lin, J., & Rinaldi, C. (2003).  Differential validity and utility of successive and simultaneous approaches to the development of equivalent achievement tests in French and English.  Alberta Journal of Educational Research, 49, 290-304.

Ackerman, T. A., Gierl, M. J., & Walker, C. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37-53.

2002

Klein, S., Sollereder, P., & Gierl, M. (2002). Examining the factor structure and psychometric properties of the Test of Visual-Perceptual Skills. The Occupational Therapy Journal of Research, 22, 16-24.

2001

Gierl, M. J., & Bolt, D. (2001). Illustrating the use of nonparametric regression to assess differential item and bundle functioning among multiple groups. International Journal of Testing, 1, 249-270.

Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.

Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests. Journal of Educational Measurement, 38, 164-187.

Gierl, M. J., Bisanz, J., Bisanz, G., Boughton, K., & Khaliq, S. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences on achievement tests. Educational Measurement: Issues and Practice, 20, 26-36.

Gierl, M. J., Henderson, D., Jodoin, M., & Klinger, D. (2001). Minimizing the influence of item parameter estimation errors in test development: A comparison of three selection procedures. Journal of Experimental Education, 69, 261-279.

2000

Gierl, M. J. (2000).  Construct equivalence on translated achievement tests.  Canadian Journal of Education, 25, 280-296.

Gierl, M. J., Leighton, J. P., & Hunka, S. (2000).  Exploring the logic of Tatsuoka's rule-space model for test development and analysis.  Educational Measurement: Issues and Practice, 19, 34-44.

Before 2000

Gierl, M. J., Rogers, W. T., & Klinger, D. (1999). Using statistical and substantive reviews to identify and interpret translation DIF. Alberta Journal of Educational Research, 45, 353-376.  [Theme Issue: “Measurement and Evaluation in the New Millennium”].

Gierl, M. J. (1999). Differential item functioning on the Alberta Education Social Studies 30 Diploma Exam. Canadian Social Studies, 33, 54-58.

Gierl, M. J. (1998). Generalizability of written-response scores for the Alberta Education English 30 Diploma Examination. Alberta Journal of Educational Research, 44, 91-94.

Gierl, M. J. (1997). Comparing the cognitive representations of test developers and students on a mathematics achievement test using Bloom’s taxonomy. Journal of Educational Research, 91, 26-32.

McLeod, D. B., Stake, R. E., Schappelle, B. P., Mellissinos, M., & Gierl, M. J. (1996). Setting the Standards: NCTM’s role in the reform of mathematics education. In S. A. Raizen & E. D. Britton (Eds.) Bold ventures. Volume 3: Case studies of U.S. innovations in mathematics education (pp. 13-132). Dordrecht, The Netherlands: Kluwer.

Gierl, M. J., & Rogers, W. T. (1996). A confirmatory factor analysis of the Test Anxiety Inventory using Canadian high school students. Educational and Psychological Measurement, 56, 315-324.

Gierl, M. J., & Bisanz, J. (1995). Anxiety and attitudes related to mathematics in grades 3 and 6. Journal of Experimental Education, 63, 139-158.