Motivating Assessment: How to Leverage Summative Assessments for the Good of Intrinsic Motivation. In Sharon Nichols & Divya Varier (eds.). Theory to Practice: Educational Psychology for Teachers and Teaching (Teaching on Assessment). Daniels, L. M., Pelletier, G., Radil, A. I., & Goegan, L. D. (2020).
An intelligent recommender system for personalized test administration scheduling with computerized formative assessments. Frontiers in Education. Okan Bulut, Damien C. Cormier, & Jinnie Shin (2020).
Guidelines for generating effective feedback from e-assessments. Hacettepe University Journal of Education. Okan Bulut, Maria Cutumisu, Deepak Singh, & Alexandra M. Aquilina (2020).
Development practices of trusted AI systems among Canadian data scientists. International Review for Information Ethics. Jinnie Shin, Okan Bulut, & M. J. Gierl (2020).
Odell*, B., Gierl, M., & Cutumisu, M. (2020). Testing measurement invariance of PISA 2015 mathematics, science, and ICT Scales using the alignment method. Studies in Educational Evaluation. https://doi.org/10.1016/j.stueduc.2020.100965. Impact Factor: 1.983. Rank: 34% = 90/263.
Ghoman*, S. K., Cutumisu, M., & Schmölzer, G. M. (2020). Digital simulation improves, maintains, and helps transfer healthcare providers’ neonatal resuscitation performance. Frontiers in Pediatrics: Neonatology. Impact Factor: 2.63. Rank: 27% = 33/124.
Cutumisu, M., Ghoman*, S. K., Lu*, C., Patel*, S. D., Garcia-Hidalgo*, C., Fray*, C., Brown, M. R. G., Greiner, R., & Schmölzer, G. M. (2020). Health care providers’ performance, mindset, and attitudes toward a neonatal resuscitation computer-based simulator: Empirical study. JMIR Serious Games, 8(4):e21855. Impact Factor: 3.526. Rank: 26% = 7/27. PMID: 33346741. DOI: 10.2196/21855
Harley, J. M., Lou*, N. M., Liu*, Y., Cutumisu, M., Daniels, M., Leighton, J. P., & Nadon*, L. (2020). University students’ negative emotions in a computer-based exam: The roles of trait test-emotion, prior test-taking methods, and gender. Assessment and Evaluation in Higher Education. Impact Factor: 2.32. Rank: 21% = 56/263.
Ghoman*, S. K., Cutumisu, M., & Schmölzer, G. M. (2020). Using technology to bridge the gap for remote healthcare education during COVID-19. BMJ Simulation & Technology Enhanced Learning (BMJ STEL). http://dx.doi.org/10.1136/bmjstel-2020-000733
Ghoman*, S. K., Cutumisu, M., & Schmölzer, G. M. (2020). Using the RETAIN tabletop simulator as a summative assessment tool for neonatal resuscitation healthcare professionals: A pilot study. Frontiers in Pediatrics: Neonatology. Impact Factor: 2.63. Rank: 27% = 33/124. https://www.frontiersin.org/articles/10.3389/fped.2020.569776/abstract
Lu*, C., Ghoman*, S. K., Cutumisu, M., & Schmölzer, G. M. (2020). Unsupervised machine learning algorithms examine healthcare providers' perceptions and longitudinal performance in a digital neonatal resuscitation simulator. Frontiers in Pediatrics: Neonatology. Impact Factor: 2.63. Rank: 27% = 33/124.
Odell*, B., Cutumisu, M., & Gierl, M. (2020). A scoping review of the relationship between students’ ICT and performance in mathematics and science in the PISA data. Social Psychology of Education, 23, 1449-1481. Impact Factor: 1.62. Rank: 55% = 33/60. DOI: 10.1007/s11218-020-09591-x
Cutumisu, M., & Lou*, N. M. (2020). The roles of generic and domain-specific mindsets in learning graphic design principles. Interactive Learning Environments. Impact Factor: 1.94. Rank: 37% = 96/263.
Cutumisu, M., & Lou*, N. M. (2020). The moderating effect of mindset on the relation between university students' critical feedback-seeking and learning. Computers in Human Behavior. Impact Factor: 5.00. Rank: 5% = 4/87. https://doi.org/10.1016/j.chb.2020.106445
Ghoman*, S. K., Cutumisu, M., & Schmölzer, G. M. (2020). Using the RETAIN neonatal resuscitation game to train and assess health care professionals’ competence in an observational study design. SAGE Research Methods Cases: Medicine and Health. https://doi.org/10.4135/9781529734461
Odell*, B., Galovan, A., & Cutumisu, M. (2020). The relation between ICT and science in PISA 2015 for Bulgarian and Finnish students. Eurasia Journal of Mathematics, Science and Technology Education, 16(6), Article em1846. https://doi.org/10.29333/ejmste/7805.
Ghoman*, S. K., Cutumisu, M., & Schmölzer, G. M. (2020). Simulation-based summative assessment of neonatal resuscitation providers using the RETAIN serious board game – A pilot study. Frontiers in Pediatrics: Neonatology, 8. https://doi.org/10.3389/fped.2020.00014. Impact Factor: 2.63. Rank: 27% = 33/124.
Cutumisu, M., Schwartz, D. L., & Lou*, N. M. (2020). The relation between academic achievement and the spontaneous use of design-thinking strategies. Computers & Education, 149, Article 103806. https://doi.org/10.1016/j.compedu.2020.103806. Impact Factor: 5.30. Rank: 1.5% = 4/263.
Objective Score Versus Subjective Satisfaction: Impact on Emotions Following Immediate Score Reporting. Lia Daniels. International Journal of Experimental Education.
Simultaneously Students and Teachers: Comparing Student and Professional Perspectives on Measures of Achievement Goals. Daniels, L. M., Goegan, L.D., Radil, A. I., & Frohlich, J. Journal of Experimental Education.
From syllabus to final grades: A wrap-around workshop to support student motivation. Lia Daniels. Canadian Society for the Study of Education, Vancouver, BC.
The effect of best distractor location on the difficulty of multiple-choice items. Journal of Experimental Education. Jinnie Shin, Okan Bulut, & Mark Gierl (2019).
Multiple-choice item distractor development using topic modeling approaches. Frontiers in Psychology, 10: 825. Jinnie Shin, Qi Guo, & Mark Gierl (2019).
Creating content for educational testing using a workflow that supports automatic item generation. In A. Reyes-Munoz, P. Zheng, D. Crawford, & V. Callaghan (Eds.), EAI International Conference on Technology, Innovation, Entrepreneurship and Education, Lecture Notes in Electrical Engineering 532 (pp. 27-38), New York: Springer. Mark Gierl, Donna Matovinovic, & Hollis Lai (2019).
Transforming test development - Anything but the status quo. Paper presented at the annual meeting of the European Association of Test Publishers, Madrid, Spain. Donna Matovinovic & Mark Gierl (2019, September).
A comparison of machine learning and deep learning approaches in automated essay scoring. Paper presented at the annual meeting of the National Council on Measurement in Education, Toronto, ON. Jinnie Shin & Mark Gierl. (2019, April).
Using students’ written responses to inform content specialists about common misconceptions. In O. Bulut (Chair), Communicating Assessment Results: How to Inform Decision-Making in Education. Paper presented at the annual meeting of the National Council on Measurement in Education, Toronto, ON. Jinnie Shin, Qi Guo, & Mark Gierl (2019, April).
Cutumisu, M., Adams, C., & Lu*, C. (2019). A scoping review of empirical research on recent computational thinking assessments. Journal of Science Education and Technology, 28(6), 651-676. https://doi.org/10.1007/s10956-019-09799-3. Impact Factor: 1.64. Rank: 47% = 124/263.
Cutumisu, M., Turgeon*, K.-L., Saiyera*, T., Chuong*, S., González Esparza*, L. M., MacDonald*, R., & Kokhan*, V. (2019). Eye tracking the feedback assigned to undergraduate students in a digital assessment game. Frontiers in Psychology - Educational Psychology. Neuroeducation: Translating Lab Insights into Classroom Practice. https://doi.org/10.3389/fpsyg.2019.01931. Impact Factor: 2.07. Rank: 33% = 45/138.
Cutumisu, M. (2019). Feedback valence agency moderates the effect of pre-service teachers’ growth mindset on the relation between revising and performance. Frontiers in Psychology - Educational Psychology, 10, Article 1794. https://doi.org/10.3389/fpsyg.2019.01794. Impact Factor: 2.07. Rank: 33% = 45/138.
Cutumisu, M., & Guo*, Q. (2019). Using topic modeling to extract pre-service teachers’ understandings of computational thinking from their coding reflections. IEEE Transactions on Education, 62(4), 325-332. https://doi.org/10.1109/TE.2019.2925253. Impact Factor: 1.86. Rank: 36% = 15/41.
Bulut, O., Cutumisu, M., Aquilina*, A. M., & Singh*, D. (2019). Effects of digital score reporting and feedback on students’ learning in higher education. Frontiers in Education: Assessment, Testing and Applied Measurement, 4. DOI: 10.3389/feduc.2019.00065
Ghoman*, S. K., Patel*, S. D., Cutumisu, M., von Hauff*, P., Jeffery*, T., Brown, M. R. G., & Schmölzer, G. M. (2019). Serious games, a game changer in teaching neonatal resuscitation? A review. Archives of Disease in Childhood - Fetal and Neonatal Edition, 105(1), 98-107. https://doi.org/10.1136/archdischild-2019-317011. Impact factor: 5.44. Rank: 6% = 7/124.
Cutumisu, M., Chin, D. B., & Schwartz, D. L. (2019). A digital game-based assessment of middle-school and college students’ choices to seek critical feedback and to revise. British Journal of Educational Technology, 50(6), 2977-3003. https://doi.org/10.1111/bjet.12796. Impact factor: 2.95. Rank: 12% = 31/263.
Cutumisu, M., Vasquez*, C., Uhlich*, M., Beatty*, P., Hamayeli-Mehrabani*, H., Djebah*, R., Murtha, A., Greiner, R., & Lewis, J. (2019). PROSPeCT: A Predictive Research Online System for Prostate Cancer Tasks. JCO Clinical Cancer Informatics, 3, 1-12. DOI: 10.1200/CCI.18.00144
Cutumisu, M., Patel*, S., Brown, M. R. G., Fray*, C., von Hauff*, P., Jeffery*, T., & Schmölzer, G. M. (2019). RETAIN: A board game that improves neonatal resuscitation knowledge retention. Frontiers in Pediatrics: Neonatology, 7(13). DOI: 10.3389/fped.2019.00013. Impact factor: 2.63. Rank: 31% = 38/124.
Chin, D. B., Blair, K. P., Wolf, R. C., Conlin, L. D., Cutumisu, M., Pfaffman, J., & Schwartz, D. L. (2019). Educating and measuring choice: A test of the transfer of design thinking in problem solving and learning. Journal of the Learning Sciences, 3(28), 337-380. DOI: 10.1080/10508406.2019.1570933. Impact factor: 3.59. Rank: 6% = 16/263.
Cutumisu, M. (2019, online 2018). The association between critical feedback seeking and performance is moderated by growth mindset in a digital assessment game. Computers in Human Behavior, 93, 267-278. DOI: 10.1016/j.chb.2018.12.026. Impact factor: 5.00. Rank: 5% = 4/87.
Internal medicine residents' achievement goals, emotions, efficacy, and assessments. Lia Daniels & Vijay Daniels. Canadian Medical Education Journal, 9, e59-e68.
Enhancing our understanding of teachers’ personal responsibility for student motivation: Mixed insights informing theory, measurement, and practice. Lia Daniels, Cheryl Poth, Lauren Goegan. Frontiers: Educational Psychology.
Human machine interactive automatic item generation. Xinxin Zhang & Mark Gierl (NCME 2018)
Examining the effect of best distractor location on item difficulty. Jinnie Shin, Okan Bulut, & Mark Gierl (NCME 2018)
Using automated procedures to generate test items for nursing examinations. Kim Brunnert, Barbara Schneiner, Mark Gierl, & Hollis Lai (ATP 2018)
Using automatic item generation to support formative feedback in medical education. Mark Gierl, Hollis Lai, & Andre DeChamplain (CCME 2018)
Developing, analyzing, and using distractors for multiple-choice medical testing: A comprehensive review. Mark Gierl, Okan Bulut, Qi Guo, & Xinxin Zhang (CCME 2018)
The influence of culture and historical practice on the makeup and technical quality of national school accountability models. Okan Bulut (AERA 2018)
How score versus satisfaction relate to students' emotions following a computer-based test. Lia Daniels, Lily Le, & Lindsay Nadon (AERA 2018)
When technology does not add up: ICT use negatively predicts mathematics and science achievement for Finnish and Turkish students in PISA 2012. Okan Bulut & Maria Cutumisu. Journal of Educational Multimedia and Hypermedia, 27(1), 25-42.
The informational value of feedback choices for performance and revision in a digital assessment game. Maria Cutumisu. Interactive Technology and Smart Education. ISSN: 1741-5659.
The impact of critical feedback choice on students' revision, performance, learning, and memory. Maria Cutumisu & Daniel L. Schwartz. Computers in Human Behavior, 78, 351-367, ISSN 0747-5632. DOI: 10.1016/j.chb.2017.06.029.
Growth mindset moderates the impact of neonatal resuscitation skill maintenance on performance in a simulation training video simulator. Maria Cutumisu, Matthew Brown, Caroline Fray, & Georg Schmölzer (Canadian Paediatric Society Conference)
Growth mindset moderates the association between critical feedback and performance in a digital assessment game. Maria Cutumisu & Daniel L. Schwartz (AERA 2018)
The Effect of Feedback Choices on Mindset, Revision, and Performance in a Digital Assessment Game. Maria Cutumisu & Daniel L. Schwartz (AERA 2018)
Eye tracking students' gazes on feedback in a digital assessment game. Maria Cutumisu, Krystle-Lee Turgeon, Lydia González, Tasbire Saiyera, Steven Chuong, & Daniel L. Schwartz (ICLS)
The influence of feedback choice on university students’ revision choices and performance in a digital assessment game. Maria Cutumisu. In Proceedings of the IEEE Computational Intelligence and Games Conference (IEEE CIG)
Automatic item generation and artificial intelligence. Hollis Lai & Mark J. Gierl (Maryland Conference 2017)
A large-scale progress monitoring application with computerized adaptive testing. Okan Bulut & Damien C. Cormier (IACAT 2017)
Generating rationales to support formative feedback in computerized adaptive testing. Mark J. Gierl & Okan Bulut (IACAT 2017)
Rationale generation: An expansion of the item generation framework. Mark J. Gierl & Xinxin Zhang (NCME 2017)
The Achilles' heel of multiple-choice items: Distractors. Okan Bulut, Mark J. Gierl, Qi Guo & Xinxin Zhang (NCME 2017)
Evaluating text similarity of generated items using graph theory. Xinxin Zhang & Mark J. Gierl (NCME 2017)
Extreme scoring machine: Integrating deep language features for developing an essay scoring framework. Syed Latifi & Mark J. Gierl (NCME 2017)
Implementing automated item generation in a large-scale medical licensing examination Program: Lessons learned. André De Champlain & Mark J. Gierl (ATP 2017)
A neural network approach to estimate student skill mastery in cognitive diagnostic assessments. Qi Guo, Maria Cutumisu, & Ying Cui (EDM 2017)
Teachers’ experience using technology to provide feedback that enhances student persuasive writing skills. Maria Cutumisu, Chantal Labonté, Vanessa Oslie, Elizabeth Gange*, Heather Brown, & Veronica Smith. LEARNing Landscapes, 11(1), 87-102.
Problem-solving attitudes and gender as predictors of academic achievement in mathematics and science for Canadian and Finnish students in the PISA 2012 assessment. Maria Cutumisu & Okan Bulut. Journal of Educational Multimedia and Hypermedia, 26(4), 325-342.
Assessing whether students seek constructive criticism: The design of an automated feedback system for a graphic design task. Maria Cutumisu, Kristen P. Blair, Doris B. Chin, & Daniel L. Schwartz. International Journal of Artificial Intelligence in Education (IJAIED), 27(3), 419-447, DOI: 10.1007/s40593-016-0137-5, Springer.
Modeling the global text features for enhancing the automated scoring system. Syed F. Latifi & Mark J. Gierl (NCME 2016)
Recovering the item model structure from automatically generated items using graph theory Xinxin Zhang & Mark J. Gierl (NCME 2016)
Examining position effects in large-scale assessments using an SEM approach. Okan Bulut, Qi Guo, & Mark J. Gierl (ITC 2016)
Criterion-related validity of subscores in high school diploma examinations. Okan Bulut (ITC 2016)
Examining testlet-position effects of reading passages in computer-based assessments. Okan Bulut, Xiaodong Hou, & Ming Lei (NCME 2016)
Understanding nonresponse behaviors of students with disabilities in alternate assessments. Okan Bulut, Ming Lei, Mehmet Kaplan, & Damien Cormier (AERA 2016)
Linguistic demand of cognitive test directions across commonly used batteries. Damien Cormier, Deepak Singh, & Okan Bulut (NASP 2016)
A Novel Approach for Quantifying the Semantics of Automatically Generated Items. Syed F. Latifi, Mark J. Gierl, Ren Wang & Andong Wang (NCME 2015)
Developing and Validating the Attitudes Towards Mistakes Inventory (ATMI): A Self-Report Measure. Jacqueline P. Leighton, Wei Tang, Qi Guo (NCME 2015)
Accounting for Affective States in Response Processing Data: Impact for Validation. Jacqueline P. Leighton (NCME 2015)
A Method for Multilingual Automatic Item Generation. Mark J. Gierl, Hollis Lai, Lorena Houston, Changhua Rich & Keith Boughton (ATP 2015)
Evaluating the Psychometric Properties of Generated Test Items. Mark J. Gierl, Hollis Lai, André-Philippe Boulais, André De Champlain, Claire Touchie & Debra Pugh (ATP 2015)
Using Automatic Item Generation to Develop Practice Non-Verbal Reasoning Items for a High-Stakes Admissions Test. Marita Ball & Mark J. Gierl (ATP 2015)
Posterlet: A game-based assessment of children’s choices to seek feedback and to revise. Maria Cutumisu, Kristen P. Blair, Doris B. Chin, & Daniel L. Schwartz (2015). Journal of Learning Analytics, 2(1), 49-71. Acceptance rate: 43%. https://doi.org/10.7939/R31R6ND7N
Student and School Factors Associated with Aberrant Response Patterns on a Large-Scale Assessment. Amin Mousavi & Ying Cui (AERA 2014)
Using hierarchical linear modeling to examine factors predicting students' reading achievement. Karen Fung & Samira ElAtia (CSSE 2014)
Evaluating the quality of items generated using automatic processes. Mark Gierl, Fahad Latifi, Hollis Lai, Donna Matovinovic, & Keith Boughton (NCME 2014)
Technology-enhanced scoring of a multilingual medical licensing examination. Fahad Latifi, Mark Gierl, André-Philippe Boulais, & Andre DeChamplain (NCME 2014)
Creating plausible distractors in an item generation framework. Hollis Lai, Mark Gierl, & Andre DeChamplain (Ottawa Conference, 2014)
Analyzing data from educational surveys: a comparison of HLM and Multilevel IRT. Amin Mousavi (NCME 2013)
Evaluating the translations of item models in automatic item generation. Karen Fung & Mark J. Gierl (NCME 2013)
Using linked elements for creating item models of multiple languages. Karen Fung & Mark J. Gierl (CSSE 2013)
Evaluating the performance of the Iz and I*z person-fit statistics: A simulation study. Amin Mousavi & Ying Cui (NCME 2013)
Internal Consistency: Do We Really Know What It Is and How To Assess It? Wei Tang & Ying Cui (AERA 2013)
Towards Automated Scoring using Open-Source Technologies. Sayed M. Fahad Latifi, Qi Guo, Mark J. Gierl, Amin Mousavi, Karen Fung (CSSE 2013)
Establishing Item Uniqueness for Automatic Item Generation. Sayed M. Fahad Latifi, Mark J. Gierl, Hollis Lai & Karen Fung (NCME 2013)
Defeating the Automated Scoring: Is it Possible to Cheat in Automatic Essay Scoring? Syed M. Fahad Latifi, Karen Fung, Mark J. Gierl, Amin Mousavi & Qi Guo (AERA 2013)
Using Automated Processes to Generate Test Items in Multiple Languages. Mark J. Gierl, Karen Fung, Hollis Lai, & Bin Zheng (NCME 2013)
Gestalt Principles in Physics Education: Does it come with Teaching Experience? Man-Wai Chu (CSSE 2012)
Testing Expert-Based vs. Student-Based Cognitive Models for a Grade 3 Diagnostic Mathematics Assessment. Mary Roduta Roberts (AERA 2012)
Issues of Cost, Time and Validity: Psychometric Perspectives on Technologically-Rich Innovative Assessments (TRIAs). Jacqueline P. Leighton (AERA 2012)
Bootstrap Confidence Intervals for the Range-Restricted Coefficient Alpha. Johnson Ching-hong Li, Ying Cui, Mark J. Gierl & Wai Chan (AERA 2012)
Examining Language Proficiency, Test Performance, and Test Fairness using Data from the Pan-Canadian Assessment Program. Karen Fung, Samira ElAtia, & Mark J. Gierl (AERA 2012)
A Simulation Study for Comparing Three Lower Bounds to Reliability. Wei Tang & Ying Cui (AERA 2012)
Detecting Directional DIF using CATSIB with Impact Present. Man-Wai Chu, Hollis Lai, Xian Wang (NCME 2012)
Estimating Classification Consistency and Accuracy for Cognitive Diagnostic Assessment. Ying Cui (NCME 2012)
Design Principles Required for Skills-Based Calibrated Item Generation. Hollis Lai & Mark J. Gierl (NCME 2012)
Item Consistency Index: An Item-Fit Index for Cognitive Diagnostic Assessment. Hollis Lai, Mark J. Gierl & Ying Cui (NCME 2012)
Methods for Creating and Evaluating the Item Model Structure Used In Automatic Item Generation. Mark J. Gierl, Hollis Lai & Krista Breithaupt (NCME 2012)
Developing and Evaluating Score Reports for a Diagnostic Mathematics Assessment. Mary Roduta Roberts & Mark Gierl (AERA 2011)
Does Culture have an Effect on Cognitive Patterns? Examination of Cultural Effect on Categorization. Alex Riedel & Qi Guo (AERA 2011)
The Role of Item Models in Automatic Item Generation. Mark Gierl & Hollis Lai (NCME 2011)
A Comparison of Logistic Regression, CSIBTEST, and Combined Decision Rule for Detection of Uniform and Nonuniform DIF Items using Real Data. Qi Guo & Alex Riedel (NCME 2011)
Evaluating Statistical Reasoning of College Students in the Social and Health Sciences with Cognitive Diagnostic Assessment. Ying Cui, Mary Roduta Roberts, Andrea Gotzmann (AERA 2010)
Do cognitive models consistently show good model-data-fit for students at different ability levels? Andrea Gotzmann, Mary Roduta Roberts (AERA 2010)
Using Automated Item Generation to Promote Principled Test Design and Development. Cecilia B. Alves, Mark J. Gierl, & Hollis Lai (AERA 2010)
Using Principled Test Design to Develop and Evaluate a Diagnostic Mathematics Assessment in Grades 3 and 6. Mark J. Gierl, Cecilia Alves, & Renate Taylor Maueau (AERA 2010)
Two Types of Think Aloud Interview for Educational Measurement: Protocol and Verbal Analysis. Jacqueline P. Leighton (NCME 2009)
Using Cognitive Models to Evaluate Ethnicity and Gender Differences. Andrea Gotzmann, Mary Roduta Roberts, Cecilia Brito Alves, & Mark J. Gierl (AERA 2009)
Development of a Framework for Diagnostic Score Reporting. Mary Roduta Roberts & Mark J. Gierl (AERA 2009)
Estimating the Attribute Hierarchy Method with Mathematica. Ying Cui, Mark Gierl, & Jacqueline Leighton
A Comparison of Three Weighting Procedures for High- and Low-Stakes Examinations with Mixed Item Formats in Different Subject Areas. W. Todd Rogers & Denise M. Nowicki (NCME 2009)
Three Applications of Automated Test Assembly within a User-Friendly Modeling Environment. Ken Cor, Cecilia Alves & Mark J. Gierl (NCME, 2009)
Attribute Reliability in Cognitive Diagnostic Assessment. Jiawen Zhou, Mark J. Gierl & Ying Cui (NCME, 2009)
Development of Cognitive Models in Mathematics to Promote Diagnostic Inferences about Student Performance. Mary Roduta Roberts, Cecilia Brito Alves, Andrea Gotzmann & Mark J. Gierl (AERA, 2009)
Using Judgments from Content Specialists to Develop Cognitive Models for Diagnostic Assessments. Mark J. Gierl, Mary Roberts, Cecilia Alves & Andrea Gotzmann (NCME, 2009)
An Experimental Test of Student Verbal Reports and Expert Teacher Evaluation as a Source of Validity Evidence for Test Development. Jacqueline P. Leighton, Colleen Heffernan, M. Kenneth Cor, Rebecca J. Gokiert & Ying Cui (AERA, 2008)
The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment. Ying Cui & Jacqueline P. Leighton (AERA, 2008)
Using Cochran's Z Statistic to Test the Kernel-Smoothed IRF Differences between Focal and Reference Group. Yinggan Zheng & Mark J. Gierl (AERA, 2008)
Computerized Adaptive-Attribute Testing: Incorporating Psychological Principles with Assessment Practices in Computerized Adaptive Testing. Jiawen Zhou, Mark J. Gierl & Ying Cui (NCME, 2008)
The Role of Academic Confidence and Epistemological Beliefs in Syllogistic Reasoning Performance. Carol M. Okamoto, Jacqueline P. Leighton & M. Kenneth Cor (AERA, 2008)
Testing Expert-Based and Student-Based Cognitive Models: An Application of the Attribute Hierarchy Method and Hierarchy Consistency Index. Jacqueline P. Leighton, Ying Cui & M. Kenneth Cor (NCME, 2008)
Cognitive-Psychometric Modeling of the MELAB Reading Items. Lingyun Gao & Todd Rogers (NCME, 2007)
Purposes of and Issues with the Provincial Testing Programs in Alberta. W. Todd Rogers & Donald A. Klinger (NCME, 2007)
Using Connectionist Models to Evaluate Examinees' Response Patterns on Tests: An Application of the Attribute Hierarchy Method to Assessment Engineering. Mark J. Gierl, Ying Cui & Steve Hunka (NCME, 2007)
Using Real Data to Compare DIF Detection and Effect Size Measures among Mantel-Haenszel, SIBTEST, and Logistic Regression Procedures. Yinggan Zheng, Mark J. Gierl & Ying Cui (NCME, 2007)
Investigating the Cognitive Attributes Underlying Student Performance on the SAT Critical Reading Subtest: An Application of the Attribute Hierarchy Method. Changjiang Wang & Mark J. Gierl (NCME, 2007)
Using the Attribute Hierarchy Method to Make Diagnostic Inferences about Examinees' Cognitive Skills. Mark J. Gierl (ATP, 2007)
The Hierarchy Consistency Index: A Person-fit Statistic for the Attribute Hierarchy Method. Ying Cui, Jacqueline P. Leighton, Mark J. Gierl, & Steve M. Hunka (NCME, 2006)
Simulation Studies for Evaluating the Performance of the Two Classification Methods in the AHM. Ying Cui, Jacqueline P. Leighton, & Yinggan Zheng (NCME, 2006)
Evaluating DETECT Classification Accuracy and Consistency when Data Display Complex Structure. Mark J. Gierl, Jacqueline P. Leighton, & Xuan Tan (NCME, 2006)
A Three-Stage Approach for Identifying Gender Differences on Large-Scale Science Assessments. Rebecca J. Gokiert & Jacqueline P. Leighton (NCME, 2006)
Validity of the Simultaneous Approach to the Development of Equivalent Achievement tests in English and French (Stage III). Jie Lin & W. Todd Rogers (NCME, 2006)
Investigating the Cognitive Attributes Underlying Student Performance on a Foreign Language Reading Test: An Application of the Attribute Hierarchy Method. Changjiang Wang, Mark J. Gierl, & Jacqueline P. Leighton (NCME, 2006)
Evaluating the Performance of SIBTEST and MULTISIB Using Different Matching Criteria. Jiawen Zhou, Mark J. Gierl, & Xuan Tan (NCME, 2006)
Evaluating the Consistency of DETECT Indices and Item Clusters Using Simulated and Real Data that Display both Simple and Complex Structure. Xuan Tan & Mark J. Gierl (AERA, 2006)
Evaluating DETECT Classification Accuracy and Consistency when Data Display Complex Structure. Mark J. Gierl, Jacqueline P. Leighton, & Xuan Tan (CSSE, 2005)
Investigating Test Items Designed to Measure Higher-Order Reasoning using Think-Aloud Methods: Implications for Construct Validity and Alignment. Jacqueline P. Leighton & Rebecca J. Gokiert (AERA, 2005)
Investigating the Statistical and Cognitive Dimensions in Large-Scale Science Assessments: Causal and Categorical Reasoning in Science. Jacqueline P. Leighton, Rebecca J. Gokiert*, & Ying Cui (AERA, 2005)
The Cognitive Effects of Test Item Features: Informing Item Generation by Identifying Construct Irrelevant Variance. Jacqueline P. Leighton & Rebecca J. Gokiert (NCME, 2005)
Identifying cognition dimensions that affect student performance on the new SAT. Mark J. Gierl, Xuan Tan, & Changjiang Wang (NCME, 2005)
Validity of the Simultaneous Approach to the Development of Equivalent Achievement Tests in English and French (Stage II). Jie Lin & W. Todd Rogers (NCME, 2005)
Using Five Procedures to Detect DIF with Passage-Based Testlets. Lingyun Gao & Changjiang Wang (NCME, 2005)
Using Global and Local DIF Analyses to Assess DIF across Language Groups. Xuan Tan & Mark J. Gierl (NCME, 2005)
Using a Multidimensionality-Based Framework to Identify and Interpret the Construct-Related Dimensions that Elicit Group Differences. Mark J. Gierl (AERA, 2004)
Gender Differential Item Functioning on the WISC-II: Analysis of the Canadian Standardization Sample. Rebecca J. Gokiert & Kathryn L. Ricker (AERA, 2004)
Robustness of Lord's Formulas for Item Difficulty and Discrimination Conversions between Classical and Item Response Theory Models. Tess Dawber (AERA, 2004)
Standard Setting Using the Attribute Hierarchy Model. Gregory S. Sadesky (NCME, 2004)
The Identification and Interpretation of Group Differences on the Canadian Language Benchmarks Assessment Reading Items. Marilyn Abbott (NCME, 2004)
Using the Multidimensionality-Based DIF Analysis Paradigm to Study Cognitive Skills that Elicit Group Differences: A Critique. Mark J. Gierl (NCME, 2004)
Setting Cut Scores: Critical Review of Angoff and Modified-Angoff Methods. Kathryn L. Ricker (CSSE, 2003)
Standard Setting For Complex Performance Assessments: A Critical Examination of the Analytic Judgment Method. Marilyn Abbott (CSSE, 2003)
Cluster Analysis and its Application In Standard Setting. Gregory S. Sadesky (CSSE, 2003)
Standard-setting Issues in Computerized-Adaptive Testing. Matthew M. Gushta (CSSE, 2003)
The Bookmark Standard Setting Procedure: Strengths and Weaknesses. Jie Lin (CSSE, 2003)
Promoting Gender Equity in Alberta's Provincial Social Studies 30 Diploma Examinations. Marilyn Abbott (NCME, 2003)
Differential Validity and Utility of Successive and Simultaneous Approaches to the Development of Equivalent Achievement Tests in French and English. W. Todd Rogers, Mark J. Gierl, Claudette Tardif, & Jie Lin (NCME, 2003)
Implications of the Multidimensionality-Based DIF Analysis Framework for Selecting a Matching and Studied Subtest. Mark J. Gierl & Daniel M. Bolt (NCME, 2003)
Evaluating the Comparability of English- and French-Speaking Examinees on a Science Achievement Test Administered using Two-Stage Testing. Gautam Puhan & Mark J. Gierl (NCME, 2003)
Differential Performance by Gender in Foreign Language Testing. Jie Lin & Fenglan Wu (NCME, 2003)
Identifying Content and Cognitive Skills that Produce Gender Differences in Mathematics: A Demonstration of the DIF Analysis Framework. Mark J. Gierl, Jeffrey Bisanz, Gay L. Bisanz, & Keith A. Boughton (NCME, 2002)
The Attribute Hierarchy Model for Cognitive Assessment. Jacqueline P. Leighton, Mark J. Gierl, & Stephen M. Hunka (NCME, 2002)
The Cognitive Experience of Bookmark Standard Setting Participants. Tess Dawber & Daniel M. Lewis (AERA, 2002)
Cognition or Motivation: What leads to Performance Differences in Science. Gautam Puhan & Huiqin Hu (NCME, 2002)
Illustrating the Utility of Differential Bundle Functioning Analyses to Identify and Interpret Group Differences on Achievement Tests. Mark J. Gierl, Jeffrey Bisanz, Gay L. Bisanz, & Keith A. Boughton (AERA, 2001)
Effects of Random Rater Error on Parameter Recovery of the Generalized Partial Credit Model and Graded Response Model. Keith A. Boughton, Don A. Klinger, & Mark J. Gierl (NCME, 2001)
Differential Bundle Functioning on Three Achievement Tests: A Comparison of Aboriginal and Non-Aboriginal Examinees. Christine N. Vandenberghe & Mark J. Gierl (AERA, 2001)
Construction of Automated Parallel Forms and Multiple Parallel Panels in Computer-Adaptive Sequential Testing: New Measures of Parallelism and Their Applications. Keith A. Boughton, Fernando L. Cartwright, & Mark J. Gierl (AERA, 2001)
Differential Bundle Functioning on Social Studies High School Certification Exams. Keith A. Boughton, Tess E. Dawber, & Laurie-Ann M. Hellsten (AERA, 2001)
Identifying Sources of Differential Item Functioning on Translated Tests: A Confirmatory Approach. Mark Gierl & Shameem Nyla Khaliq (NCME, 2000)
Reducing Type I Error Using an Effect Size Measure with the Logistic Regression Procedure for DIF detection. Michael Jodoin & Mark Gierl (NCME, 2000)
Comparison of Ability Estimates from Dichotomously and Nominally-Scored Testwise Susceptible and Non-susceptible Items. Joanna Tomkowicz & W. Todd Rogers (AERA, 2000)
Performance of Mantel-Haenszel, SIBTEST, and Logistic Regression when the Number of DIF items is Large. Mark Gierl, Michael Jodoin, & Terry Ackerman (AERA, 2000)
Automated Test Assembly Procedures for Criterion-Referenced Testing Using Optimization Heuristics. Keith A. Boughton & Mark J. Gierl (AERA, 2000)
Differential Bundle Functioning on Mathematics and Science Achievement Tests: A Small Step Toward Understanding Differential Performance. Keith A. Boughton, Mark J. Gierl, & Shameem Nyla Khaliq (CSSE, 2000)
Assessing the Computational Accuracy in Statistical Packages. Steve Hunka (August, 1999)
Using Statistical and Judgmental Reviews to Identify and Interpret Translation DIF. Mark J. Gierl, W. Todd Rogers, & Don Klinger (NCME, 1999)
Gender Differential Item Functioning in Mathematics and Science: Prevalence and Policy Implications. Mark Gierl, Shameem Nyla Khaliq, & Keith A. Boughton (CSSE, 1999)
Teacher Evaluation. Robert Stake (November, 1998)
Principles for Fair Student Assessment Practices for Education in Canada. Downloadable in English or French.