Precision, validity, and efficiency of computerized adaptive testing

Bibliographic reference for citation

Žitný, P. (2011). Presnosť, validita a efektívnosť počítačového adaptívneho testovania. Československá psychologie, vol. 55, no. 2, pp. 167-179. ISSN 0009-062X. Available online: <http://tinyurl.com/zitny>

Abstract

Current developments in psychological assessment emphasize methodological improvement and the importance of increasing efficiency. Computerized adaptive testing (CAT) algorithms based on item response theory (IRT) offer attractive opportunities to optimize measurement precision and efficiency at the same time. This article presents the findings of 15 research studies from ability testing, clinical psychology, personality testing, and health care that examined the reliability, utility (in terms of item savings), and validity (in terms of correlations with existing instruments) of CAT. Overall, the findings are encouraging: CAT provides an efficient means of obtaining the optimal amount of information needed to answer the question under assessment, using a minimal amount of time and/or number of items to obtain that information. CAT scores correlated strongly with scores from the full item bank (range r = 0.83 to 0.99) and moderately strongly with established instruments (range r = 0.58 to 0.83), providing evidence for the reliability, validity, and comparability of adaptive instruments. However, these results are based mainly on CAT simulation studies, so further live-CAT studies (involving the administration of real tests to live respondents) are needed to confirm them.

Keywords

Item response theory. Computerized adaptive testing. Validity. Reliability. Comparability.

Language of the work

Slovak

Full-text

I would appreciate it if any author who uses this work in a publication of their own sent me a brief e-mail saying where it was used.

References

ASVAB. (2010). Official Site of the ASVAB Testing Program. [online], [cited 10.05.2010]. Available online: <http://tinyurl.com/asvab2010>

Ayala, R.J. (2009). The Theory and Practice of Item Response Theory (Methodology In The Social Sciences). 1st ed. New York: The Guilford Press, 448 pp. ISBN 978-1-59385-869-8.

Becker, J. - Fliege, H. - Kocalevent, R.-D. - Bjorner, J.B. - Rose, M. - Walter, O.B. - Klapp, B.F. (2008). Functioning and validity of A Computerized Adaptive Test to measure anxiety (A-CAT). Depression and Anxiety, vol. 25, no. 12, pp. E182-E194. ISSN 1091-4269.

Butcher, J.N. - Keller, L.S. - Bacon, S.F. (1985). Current Developments and Future Directions in Computerized Personality Assessment. Journal of consulting and clinical psychology, vol. 53, no. 6, pp. 803-815. ISSN 0022-006X.

Butcher, J.N. - Perry, J. - Hahn, J. (2004). Computers in clinical assessment: Historical developments, present status, and future challenges. Journal of clinical psychology, vol. 60, no. 3, pp. 331-345. ISSN 0021-9762.

Butcher, J.N. - Perry, J.N. - Atlis, M.M. (2000). Validity and utility of computer-based test interpretation. Psychological Assessment, vol. 12, no. 1, pp. 6-18. ISSN 1040-3590.

Embretson, S.E. - Reise, S.P. (2000). Item Response Theory for Psychologists (Multivariate Applications Book Series). 1st ed. Mahwah, NJ: Lawrence Erlbaum Associates, 376 pp. ISBN 0-8058-2819-2.

ETS. (2010a). Graduate Record Examinations - About the GRE General Test. Educational Testing Service. [online], [cited 10.05.2010]. Available online: <http://tinyurl.com/ets2010a>

ETS. (2010b). The GRE revised General Test - Launching in 2011. Educational Testing Service. [online], [cited 10.05.2010]. Available online: <http://tinyurl.com/ets2010b>

Finger, M.S. - Ones, D.S. (1999). Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, vol. 11, no. 1, pp. 58-66. ISSN 1040-3590.

Fliege, H. - Becker, J. - Walter, O.B. - Bjorner, J.B. - Klapp, B.F. - Rose, M. (2005). Development of a computer-adaptive test for depression (D-CAT). Quality of life research, vol. 14, no. 10, pp. 2277-2291. ISSN 0962-9343.

Fliege, H. - Becker, J. - Walter, O.B. - Rose, M. - Bjorner, J.B. - Klapp, B.F. (2009). Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application. International journal of methods in psychiatric research, vol. 18, no. 1, pp. 23-36. ISSN 1049-8931.

Goodwin, L.D. (2002). Changing conceptions of measurement validity: An update on the new standards. Journal of nursing education, vol. 41, no. 3, pp. 100-106. ISSN 0148-4834.

Halama, P. (2005). Adaptívne testovanie pomocou počítača: Aplikácia teórie odpovede na položku v diagnostike inteligencie. Psychológia a patopsychológia dieťaťa, vol. 40, no. 3, pp. 252-266. ISSN 0555-5574.

Halama, P. (2011). Princípy psychologickej diagnostiky. 2nd ed. Trnava: Filozofická fakulta Trnavskej univerzity v Trnave, 208 pp. ISBN 978-80-8082-451-8.

Halama, P. - Bieščad, M. (2006). Psychometrická analýza Rosenbergovej škály sebahodnotenia s použitím metód klasickej teórie testov (CTT) a teórie odpovede na položku (IRT). Československá psychologie, vol. 50, no. 6, pp. 588-603. ISSN 0009-062X.

Halama, P. - Bieščad, M. (2009). Item response theory analysis of the CORE-OM. In: 11th European Congress of Psychology ECP09. Oslo, Norway 7 – 10 July 2009: Abstracts, CD-ROM, 63 pp.

Haley, S.M. - Fragala-Pinkham, M.A. - Dumas, H.M. - Ni, P. - Gorton, G.E. - Watson, K. - Montpetit, K. - Bilodeau, N. - Hambleton, R.K. - Tucker, C.A. (2009). Evaluation of an Item Bank for a Computerized Adaptive Test of Activity in Children With Cerebral Palsy. Physical therapy, vol. 89, no. 6, pp. 589-600. ISSN 0031-9023.

Handel, R.W. - Ben-Porath, Y.S. - Watt, M. (1999). Computerized adaptive assessment with the MMPI-2 in a clinical setting. Psychological Assessment, vol. 11, no. 3, pp. 369-380. ISSN 1040-3590.

Harwell, M. - Stone, C.A. - Hsu, T.C. - Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, vol. 20, no. 2, pp. 101-125. ISSN 0146-6216.

Jelínek, M. - Květon, P. - Denglerová, D. (2006). Adaptivní testování - základní pojmy a principy. Československá psychologie, vol. 50, no. 2, pp. 163-173. ISSN 0009-062X.

Jette, A.M. - Haley, S.M. - Tao, W. - Ni, P.S. - Moed, R. - Meyers, D. - Zurek, M. (2007). Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings. Physical therapy, vol. 87, no. 4, pp. 385-398. ISSN 0031-9023.

Kocalevent, R.D. - Matthias, R. - Becker, J. - Walter, O.B. - Fliege, H. - Bjorner, J.B. - Kleiber, D. - Klapp, B.F. (2009). An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception. Journal of clinical epidemiology, vol. 62, no. 3, pp. 278-287. ISSN 0895-4356.

Kopec, J.A. - Badii, M. - McKenna, M. - Lima, V.D. - Sayre, E.C. - Dvorak, M. (2008). Computerized adaptive testing in back pain - Validation of the CAT-5D-QOL. Spine, vol. 33, no. 12, pp. 1384-1390. ISSN 0362-2436.

Kosinski, M. - Bjorner, J.B. - Ware, J.E. - Sullivan, E. - Straus, W.L. (2006). An evaluation of a patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact. Journal of clinical epidemiology, vol. 59, no. 7, pp. 715-723. ISSN 0895-4356.

Květon, P. - Jelínek, M. - Denglerová, D. - Vobořil, D. (2008). Software pro adaptivní testování: CAT v praxi. Československá psychologie, vol. 52, no. 2, pp. 145-154. ISSN 0009-062X.

Kveton, P. - Jelinek, M. - Voboril, D. - Klimusova, H. (2007). Computer-based tests: the impact of test design and problem of equivalency. Computers in human behavior, vol. 23, no. 1, pp. 32-51. ISSN 0747-5632.

Květon, P. - Jelínek, M. - Vobořil, D. - Klimusová, H. (2003). Ekvivalence tradiční a počítačové formy testu IST-70. Československá psychologie, vol. 47, no. 6, pp. 562-572. ISSN 0009-062X.

Květon, P. - Klimusová, H. (2002). Metodologické aspekty počítačové administrace psychodiagnostických metod. Československá psychologie, vol. 46, no. 3, pp. 251-264. ISSN 0009-062X.

Leung, C.K. - Chang, H.H. - Hau, K.T. (2003). Computerized adaptive testing: A comparison of three content balancing methods. Journal of Technology, Learning, and Assessment, vol. 2, no. 5, pp. 1-15. ISSN 1540-2525.

Moreno, K.E. - Segall, D.O. (2006). Reliability and Construct Validity of CAT-ASVAB. In: CAT-ASVAB Technical Bulletin No. 1 (pp. 169-174). Personnel Testing Division Defense Manpower Data Center, 293 pp. [online], [cited 09.06.2009]. Available online: <http://tinyurl.com/moreno2006>

Mulcahey, M.J. - Haley, S.M. - Duffy, T. - Pengsheng, N. - Betz, R.R. (2008). Measuring Physical Functioning in Children With Spinal Impairments With Computerized Adaptive Testing. Journal of pediatric orthopaedics, vol. 28, no. 3, pp. 330-335. ISSN 0271-6798.

Reise, S.P. - Henson, J.M. (2000). Computerization and adaptive administration of the NEO PI-R. Assessment, vol. 7, no. 4, pp. 347-364. ISSN 1073-1911.

Roper, B.L. - Ben-Porath, Y.S. - Butcher, J.N. (1995). Comparability and Validity of Computerized Adaptive Testing With the MMPI-2. Journal of Personality Assessment, vol. 65, no. 2, p. 358. ISSN 0022-3891.

Sebille, V. - Hardouin, J.B. - Le Neel, T. - Kubis, G. - Boyer, F. - Guillemin, F. - Falissard, B. (2010). Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study. BMC medical research methodology, vol. 10, no. 24, pp. 1-10. ISSN 1471-2288.

Segall, D.O. - Moreno, K.E. (1999). Development of the Computerized Adaptive Testing Version of the Armed Services Vocational Aptitude Battery. In: F. Drasgow - J. B. Olson-Buchanan (Eds.), Innovations in Computerized Assessment (pp. 35-65). Mahwah, NJ: Lawrence Erlbaum Associates, 266 pp. ISBN 0585114862. [online], [cited 20.06.2009]. Available online: <http://tinyurl.com/segall1999>

Schaeffer, G.A. - Bridgeman, B. - Golub-Smith, M.L. - Lewis, C. - Potenza, M.T. - Steffen, M. (1998). Comparability of Paper-and-Pencil and Computer Adaptive Test Scores on the GRE General Test. Princeton, NJ: Educational Testing Service (Research Report No: RR-98-38), 25 pp.

Schaeffer, G.A. - Reese, C.M. - Steffen, M. - McKinley, R.L. - Mills, C.N. (1993). Field Test of a Computer-Based GRE General Test. Princeton, NJ: Educational Testing Service (Research Report No: RR-93-07), 48 pp.

Schaeffer, G.A. - Steffen, M. - Golub-Smith, M.L. - Mills, C.N. - Durso, R. (1995). The Introduction and Comparability of the Computer Adaptive GRE General Test. Princeton, NJ: Educational Testing Service (Research Report No: RR-95-20), 40 pp.

Schuhfried. (2010). Vienna Test System. Austria: Schuhfried GmbH. [online], [cited 01.07.2011]. Available online: <http://tinyurl.com/schuhfried2010>

Simms, L.J. - Clark, L.A. (2005). Validation of a Computerized Adaptive Version of the Schedule for Nonadaptive and Adaptive Personality (SNAP). Psychological Assessment, vol. 17, no. 1, pp. 28-43. ISSN 1040-3590.

Urbánek, T. (2002). Základy psychometriky. 1st ed. Brno: Masarykova univerzita, 154 pp. ISBN 80-210-2797-5.

Urbánek, T. - Šimeček, M. (2001). Teorie odpovědi na položku. Československá psychologie, vol. 45, no. 5, pp. 428-440. ISSN 0009-062X.

Vispoel, W.P. (2000). Computerized versus paper-and-pencil assessment of self-concept: Score comparability and respondent preferences. Measurement and evaluation in counseling and development, vol. 33, no. 3, pp. 130-143. ISSN 0748-1756.

Vispoel, W.P. - Boo, J. - Bleiler, T. (2001). Computerized and paper-and-pencil versions of the Rosenberg Self-Esteem Scale: A comparison of psychometric features and respondent preferences. Educational and Psychological Measurement, vol. 61, no. 3, pp. 461-474. ISSN 0013-1644.

Waller, N.G. - Reise, S.P. (1989). Computerized Adaptive Personality Assessment: An Illustration With the Absorption Scale. Journal of personality and social psychology, vol. 57, no. 6, pp. 1051-1058. ISSN 0022-3514.

Weiss, D.J. (1985). Adaptive testing by computer. Journal of consulting and clinical psychology, vol. 53, no. 6, pp. 774-789. ISSN 0022-006X.

Weiss, D.J. (2004). Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and evaluation in counseling and development, vol. 37, no. 2, pp. 70-84. ISSN 0748-1756.

Williams, J.E. - McCord, D.M. (2006). Equivalence of standard and computerized versions of the Raven Progressive Matrices Test. Computers in human behavior, vol. 22, no. 5, pp. 791-800. ISSN 0747-5632.

Wolfe, J.H. - Moreno, K.E. - Segall, D.O. (2006). Evaluating the Predictive Validity of CAT-ASVAB. In: CAT-ASVAB Technical Bulletin No. 1 (pp. 175-180). Personnel Testing Division Defense Manpower Data Center, 293 pp. [online], [cited 09.06.2009]. Available online: <http://tinyurl.com/wolfe2006>

Abstracts of articles from the reference list

Becker, J. - Fliege, H. - Kocalevent, R.-D. - Bjorner, J.B. - Rose, M. - Walter, O.B. - Klapp, B.F. (2008). Functioning and validity of A Computerized Adaptive Test to measure anxiety (A-CAT). Depression and Anxiety, vol. 25, no. 12, pp. E182-E194. ISSN 1091-4269.

ABSTRAKT: Background: The aim of this study was to evaluate the Computerized Adaptive Test to measure anxiety (A-CAT), a patient-reported outcome questionnaire that uses computerized adaptive testing to measure anxiety. Methods: The A-CAT builds on an item bank of 50 items that has been built using conventional item analyses and item response theory analyses. The A-CAT was administered on Personal Digital Assistants to n=357 patients diagnosed and treated at the department of Psychosomatic Medicine and Psychotherapy, Germany. For validation purposes, two subgroups of patients (n=110 and 125) answered the A-CAT along with established anxiety and depression questionnaires. Results: The A-CAT was fast to complete (on average in 2 min, 38 s) and a precise item response theory based CAT score (reliability>.9) could be estimated after 4–41 items. On average, the CAT displayed 6 items (SD=4.2). Convergent validity of the A-CAT was supported by correlations to existing tools (Hospital Anxiety and Depression Scale-A, Beck Anxiety Inventory, Berliner Stimmungs-Fragebogen A/D, and State Trait Anxiety Inventory: r=.56-.66); discriminant validity between diagnostic groups was higher for the A-CAT than for other anxiety measures. Conclusions: The German A-CAT is an efficient, reliable, and valid tool for assessing anxiety in patients suffering from anxiety disorders and other conditions with significant potential for initial assessment and long-term treatment monitoring. Future research directions are to explore content balancing of the item selection algorithm of the CAT, to norm the tool to a healthy sample, and to develop practical cutoff scores.

Butcher, J.N. - Keller, L.S. - Bacon, S.F. (1985). Current Developments and Future Directions in Computerized Personality Assessment. Journal of consulting and clinical psychology, vol. 53, no. 6, pp. 803-815. ISSN 0022-006X.

ABSTRAKT: Contends, on the basis of a review of current examples of computer usage in personality assessment, that there is wide acceptance of automated clerical tasks such as test scoring and administration. The computer is also writing narrative interpretive reports from test results. Three proposed strategies (countdown, adaptive typological, and a strategy borrowed from ability testing) for developing computerized adaptive personality tests are described.

Butcher, J.N. - Perry, J. - Hahn, J. (2004). Computers in clinical assessment: Historical developments, present status, and future challenges. Journal of clinical psychology, vol. 60, no. 3, pp. 331-345. ISSN 0021-9762.

ABSTRAKT: Computerized testing methods have long been regarded as a potentially powerful asset for providing psychological assessment services. Ever since computers were first introduced and adapted to the field of assessment psychology in the 1950s, they have been a valuable aid for scoring, data processing, and even interpretation of test results. The history and status of computer-based personality and neuropsychological tests are discussed in this article. Several pertinent issues involved in providing test interpretation by computer are highlighted. Advances in computer-based test use, such as computerized adaptive testing, are described and problems noted. Today, there is great interest in expanding the availability of psychological assessment applications on the Internet. Although these applications show great promise, there are a number of problems associated with providing psychological tests on the Internet that need to be addressed by psychologists before the Internet can become a major medium for psychological service delivery. (C) 2004 Wiley Periodicals, Inc.

Butcher, J.N. - Perry, J.N. - Atlis, M.M. (2000). Validity and utility of computer-based test interpretation. Psychological Assessment, vol. 12, no. 1, pp. 6-18. ISSN 1040-3590.

ABSTRAKT: Computers have been important to applied psychology since their introduction, and the application of computerized methods has expanded in recent decades. The application of computerized methods has broadened in both scope and depth. This article explores the most recent uses of computer-based assessment methods and examines their validity. The comparability between computer-administered tests and their pencil-and-paper counterparts is discussed. Basic decision making in psychiatric screening, personality assessment, neuropsychology, and personnel psychology is also investigated. Studies on the accuracy of computerized narrative reports in personality assessment and psychiatric screening are then summarized. Research thus far appears to indicate that computer-generated reports should be viewed as valuable adjuncts to, rather than substitutes for, clinical judgment. Additional studies are needed to support broadened computer-based test usage.

Finger, M.S. - Ones, D.S. (1999). Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, vol. 11, no. 1, pp. 58-66. ISSN 1040-3590.

ABSTRAKT: Inconsistent findings have repeatedly been found by researchers attempting to determine whether the computer form of the Minnesota Multiphasic Personality Inventory (MMPI) is psychometrically equivalent to the booklet form. This article applied psychometric meta-analysis to pool results from all available studies to examine the equivalence of the computer and booklet MMPI forms. Means, standard deviations, and crossform correlations were cumulated. A comprehensive meta-analysis of the literature demonstrated that the disparate findings can be explained in terms of sampling error across individual studies. Differences in means and standard deviations across studies were near 0, and crossform rank orderings were near perfect. The results of this study suggest that the computer and booklet forms of the MMPI are psychometrically equivalent.

Fliege, H. - Becker, J. - Walter, O.B. - Bjorner, J.B. - Klapp, B.F. - Rose, M. (2005). Development of a computer-adaptive test for depression (D-CAT). Quality of life research, vol. 14, no. 10, pp. 2277-2291. ISSN 0962-9343.

ABSTRAKT: Depression is one of the most prevalent mental health problems and measuring depressive symptoms becomes increasingly important in science as well as medical practice. Computer Adaptive Tests (CAT) based on the Item Response Theory (IRT) promise to enhance measurement precision and reduce respondent's burden. Our aim was to develop a CAT application to measure depressive symptoms. Three thousand two hundred seventy psychosomatic patients answered an overall of 11 mental health questionnaires at the University Clinic in Berlin. Three independent reviewers rated 144 items out of these questionnaires as indicative of depressive symptoms. All items underwent six empirical steps to analyze unidimensionality, local independence and item discrimination. Finally 64 items could be used to calculate item parameters applying a Generalized Partial Credit Model (GPCM). CAT scores were estimated using an 'expected a posteriori' algorithm (EAP). Two simulation experiments showed that for theta values within the range of 2SD around the mean (98% of the cases), the latent trait can be estimated out of approximately six items with a predefined standard error of <= 0.32 (reliability rho >= 0.90). The CAT-scores correlated high with scores of all depression items (r = 0.95), with the Beck Depression Inventory (r = 0.79) and with a CES-D 8 item short form (r = 0.76). We conclude that the Depression-CAT measures depressive symptoms with high precision and low respondent burden.
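
The pairing of a standard error of 0.32 with a reliability of 0.90 in the abstract above can be reproduced with a common IRT convention. The worked equation below is only a sketch: it assumes the latent trait is scaled to unit variance (\sigma_\theta^2 = 1), which the abstract does not state, and it is not necessarily the formula the authors used.

```latex
\rho = 1 - \frac{SE^2}{\sigma_\theta^2},
\qquad
\sigma_\theta^2 = 1,\; SE = 0.32
\;\Rightarrow\;
\rho = 1 - 0.32^2 = 1 - 0.1024 = 0.8976 \approx 0.90
```

Under the same assumption, any stopping rule phrased as a target reliability can be converted into a target standard error, and the reverse.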

Fliege, H. - Becker, J. - Walter, O.B. - Rose, M. - Bjorner, J.B. - Klapp, B.F. (2009). Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application. International journal of methods in psychiatric research, vol. 18, no. 1, pp. 23-36. ISSN 1049-8931.

ABSTRAKT: In the past, a German Computerized Adaptive Test, based on Item Response Theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. This study aims at testing the feasibility and validity of the real computer-adaptive application. The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression). Items were adaptively administered until a predetermined reliability (r >= 0.90) was attained. For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)]. The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients and reliable after an average administration of only six items. In 95% of the cases, 10 items or less were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do. The D-CAT proved an efficient, well accepted and reliable tool. Discriminative power was comparable to other depression measures, whereby the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (C) 2009 John Wiley & Sons, Ltd.

Goodwin, L.D. (2002). Changing conceptions of measurement validity: An update on the new standards. Journal of nursing education, vol. 41, no. 3, pp. 100-106. ISSN 0148-4834.

ABSTRAKT: This article serves as a follow up to a 1997 article in the Journal of Nursing Education, in which the author presented a historical overview of the ways in which views of measurement validity had changed during the past half century. The new American Educational Research Association, American Psychological Association, and National Council on Measurement in Education Standards for Educational and Psychological Testing, released in 1999, includes a revised conceptualization of validity. The key changes, including the elimination of the old "trinity" view of validity and the operationalization of validity as five types of evidence, are described in this article, and specific ways to obtain evidence of each type are provided. The article concludes with a brief discussion of some of the major continuing issues and challenges in validity theory and practice.

Halama, P. (2005). Adaptívne testovanie pomocou počítača: Aplikácia teórie odpovede na položku v diagnostike inteligencie. Psychológia a patopsychológia dieťaťa, vol. 40, no. 3, pp. 252-266. ISSN 0555-5574.

ABSTRAKT: The worldwide development of computer technology is also reflected in psychodiagnostics, chiefly through the use of computers to present test stimuli and to process and present test results. This paper focuses on one such application of computers in the psychodiagnostics of intelligence, namely computerized adaptive testing (CAT). The aim of CAT is to make testing more efficient and more precise, so that the tested person works on the items that best match his or her level of intelligence. The paper presents the basic principles of item response theory (IRT), which forms the basis of modern adaptive testing. It describes the individual phases of adaptive testing: building the item bank, presenting the first item, estimating the probable level of ability, the item adaptation algorithm, and terminating the test. It closes with the advantages of CAT as well as the possible problems and limitations associated with its use.
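
The phases listed in the abstract above (item bank, first item, ability estimate, item adaptation, termination) can be illustrated with a minimal Python sketch. It is not the procedure from Halama (2005) or from any study listed here. It assumes a dichotomous two-parameter logistic (2PL) model, maximum-information item selection, an expected a posteriori (EAP) estimate with a standard normal prior, and a stopping rule of SE <= 0.32. The item parameters, the simulated respondent, and all function names are hypothetical.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a keyed response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Quadrature grid for the EAP estimate: theta from -4 to 4 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap_estimate(responses):
    """Return (posterior mean, posterior SD) of theta.

    responses is a list of (a, b, u) with u in {0, 1}; the prior is
    standard normal (an assumption of this sketch).
    """
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for a, b, u in responses:
            p = p_correct(t, a, b)
            w *= p if u == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.32, max_items=20):
    """Administer items adaptively until SE <= se_target or limits are hit."""
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # first item is chosen at the prior mean
    while remaining and len(responses) < max_items and se > se_target:
        # Adaptation step: pick the most informative unused item.
        item = max(remaining, key=lambda ab: item_information(theta, ab[0], ab[1]))
        remaining.remove(item)
        u = answer(item)
        responses.append((item[0], item[1], u))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

# Hypothetical item bank of (discrimination a, difficulty b) pairs and a
# simulated respondent; none of these values come from the cited studies.
rng = random.Random(0)
bank = [(rng.uniform(1.0, 2.5), rng.uniform(-3.0, 3.0)) for _ in range(200)]
true_theta = 1.0

def simulated_answer(item):
    a, b = item
    return 1 if rng.random() < p_correct(true_theta, a, b) else 0

theta_hat, se, n_used = run_cat(bank, simulated_answer)
print(f"estimate={theta_hat:.2f}, SE={se:.2f}, items used={n_used}")
```

Several of the clinical instruments in this list use polytomous models such as the Generalized Partial Credit Model instead. The loop would keep the same shape, with only the response probability and information functions changing.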

Halama, P. - Bieščad, M. (2006). Psychometrická analýza Rosenbergovej škály sebahodnotenia s použitím metód klasickej teórie testov (CTT) a teórie odpovede na položku (IRT). Československá psychologie, vol. 50, no. 6, pp. 588-603. ISSN 0009-062X.

ABSTRAKT: The Rosenberg Self-Esteem Scale was psychometrically analyzed in a sample of 591 adolescents (234 men, 327 women, mean age 18.77). Methods of classical test theory as well as item response theory were used to evaluate the psychometric properties of the scale and its items. As in several other studies, factor analysis revealed two correlated factors formed by positively and negatively worded items. Classical item analysis confirmed that most items contribute well to the internal consistency of the scale. Item response theory, specifically Samejima's graded response model, showed that the individual items differ in their discriminating power and relate differently to the level of self-esteem. The information function of the scale also showed that the scale is more informative for persons with low and medium self-esteem and less informative for persons with high self-esteem. The results of the item analysis pointed to a certain convergence of the coefficients from the two approaches.

Haley, S.M. - Fragala-Pinkham, M.A. - Dumas, H.M. - Ni, P. - Gorton, G.E. - Watson, K. - Montpetit, K. - Bilodeau, N. - Hambleton, R.K. - Tucker, C.A. (2009). Evaluation of an Item Bank for a Computerized Adaptive Test of Activity in Children With Cerebral Palsy. Physical therapy, vol. 89, no. 6, pp. 589-600. ISSN 0031-9023.

ABSTRAKT: Background. Contemporary clinical assessments of activity are needed across the age span for children with cerebral palsy (CP). Computerized adaptive testing (CAT) has the potential to efficiently administer items for children across wide age spans and functional levels. Objective. The objective of this study was to examine the psychometric properties of a new item bank and simulated computerized adaptive test to assess activity level abilities in children with CP. Design. This was a cross-sectional item calibration study. Methods. The convenience sample consisted of 308 children and youth with CP, aged 2 to 20 years (mean=10.7, SD=4.0), recruited from 4 pediatric hospitals. We collected parent-report data on an initial set of 45 activity items. Using an Item Response Theory (IRT) approach, we compared estimated scores from the activity item bank with concurrent instruments, examined discriminant validity, and developed computer simulations of a CAT algorithm with multiple stop rules to evaluate scale coverage, score agreement with CAT algorithms, and discriminant and concurrent validity. Results. Confirmatory factor analysis supported scale unidimensionality, local item dependence, and invariance. Scores from the computer simulations of the prototype CATs with varying stop rules were consistent with scores from the full item bank (r=.93-.98). The activity summary scores discriminated across levels of upper-extremity and gross motor severity and were correlated with the Pediatric Outcomes Data Collection Instrument (PODCI) physical function and sports subscale (r=.86), the Functional Independence Measure for Children (Wee-FIM) (r=.79), and the Pediatric Quality of Life Inventory-Cerebral Palsy version (r=.74). Limitations. The sample size was small for such IRT item banks and CAT development studies. Another limitation was oversampling of children with CP at higher functioning levels. Conclusions. The new activity item bank appears to have promise for use in a CAT application for the assessment of activity abilities in children with CP across a wide age range and different levels of motor severity.

Handel, R.W. - Ben-Porath, Y.S. - Watt, M. (1999). Computerized adaptive assessment with the MMPI-2 in a clinical setting. Psychological Assessment, vol. 11, no. 3, pp. 369-380. ISSN 1040-3590.

ABSTRAKT: Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method (Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial.

Harwell, M. - Stone, C.A. - Hsu, T.C. - Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, vol. 20, no. 2, pp. 101-125. ISSN 0146-6216.

ABSTRAKT: Monte Carlo studies are being used in item response theory (IRT) to provide information about how validly these methods can be applied to realistic datasets (e.g., small numbers of examinees and multidimensional data). This paper describes the conditions under which Monte Carlo studies are appropriate in IRT-based research, the kinds of problems these techniques have been applied to, available computer programs for generating item responses and estimating item and examinee parameters, and the importance of conceptualizing these studies as statistical sampling experiments that should be subject to the same principles of experimental design and data analysis that pertain to empirical studies. The number of replications that should be used in these studies is also addressed.

Jelínek, M. - Květon, P. - Denglerová, D. (2006). Adaptivní testování - základní pojmy a principy. Československá psychologie, vol. 50, no. 2, pp. 163-173. ISSN 0009-062X.

ABSTRAKT: Alongside classical tests, modern psychodiagnostics also offers a more efficient procedure for capturing the characteristics of interest: adaptive testing. The idea of adaptive testing has a relatively long history, but its advantages have come into their own only with truly interactive computer administration. Computer technology has made it possible to bring a more advanced mathematical apparatus, known as item response theory (IRT), into the adaptive testing process. Most adaptive tests are aimed at ability assessment, but efforts to extend the adaptive approach are also directed toward personality assessment.

Jette, A.M. - Haley, S.M. - Tao, W. - Ni, P.S. - Moed, R. - Meyers, D. - Zurek, M. (2007). Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings. Physical therapy, vol. 87, no. 4, pp. 385-398. ISSN 0031-9023.

ABSTRAKT: Background and Purpose. The purpose of this study was to prospectively evaluate the practical and psychometric adequacy of the Activity Measure for Post-Acute Care (AM-PAC) "item bank" and computerized adaptive testing (CAT) assessment platform (AM-PAC-CAT) when applied within orthopedic outpatient physical therapy settings. Method. This was a prospective study with a convenience sample of 1,815 patients with spine, lower-extremity, or upper-extremity impairments who received outpatient physical therapy in 1 of 20 outpatient clinics across 5 states. The authors conducted an evaluation of the number of items used and amount of time needed to complete the CAT assessment; evaluation of breadth of content coverage, item exposure rate, and test precision; as well as an assessment of the validity and sensitivity to change of the score estimates. Results. Overall, the AM-PAC-CAT's Basic Mobility scale demonstrated excellent psychometric properties while the Daily Activity scale demonstrated less adequate psychometric properties when applied in this outpatient sample. The mean length of time to complete the Basic Mobility scale was 1.9 minutes, using, on average, 6.6 items per CAT session, and the mean length of time to complete the Daily Activity scale was 1.01 minutes, using, on average, 6.8 items. Discussion and Conclusion. Overall, the findings are encouraging, yet they do reveal several areas where the AM-PAC-CAT scales can be improved to best suit the needs of patients who are receiving outpatient orthopedic physical therapy of the type included in this study.

Kocalevent, R.D. - Matthias, R. - Becker, J. - Walter, O.B. - Fliege, H. - Bjorner, J.B. - Kleiber, D. - Klapp, B.F. (2009). An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception. Journal of clinical epidemiology, vol. 62, no. 3, pp. 278-287. ISSN 0895-4356.

ABSTRAKT: Objectives: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. Study Design and Setting: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n = 116 inpatients (N3), together with established stress questionnaires as validity criteria. Results: The final banks included n = 38 stress exposure items and n = 31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE < 0.32; rho > 0.90) using 7.0 ± 2.3 (M ± SD) stress reaction items and 11.6 ± 1.7 stress exposure items. The second simulation study reanalyzed real patient data (N1) and showed an average use of items of 5.6 ± 2.1 for the dimension stress reaction and 10.0 ± 4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. Conclusions: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making. (c) 2008 Elsevier Inc. All rights reserved.

Kopec, J.A. - Badii, M. - McKenna, M. - Lima, V.D. - Sayre, E.C. - Dvorak, M. (2008). Computerized adaptive testing in back pain - Validation of the CAT-5D-QOL. Spine, vol. 33, no. 12, pp. 1384-1390. ISSN 0362-2436.

ABSTRAKT: Study Design. We have conducted an outcome instrument validation study. Objective. Our objective was to develop a computerized adaptive test (CAT) to measure 5 domains of health-related quality of life (HRQL) and assess its feasibility, reliability, validity, and efficiency. Summary of Background Data. Kopec and colleagues have recently developed item response theory based item banks for 5 domains of HRQL relevant to back pain and suitable for CAT applications. The domains are Daily Activities (DAILY), Walking (WALK), Handling Objects (HAND), Pain or Discomfort (PAIN), and Feelings (FEEL). Methods. An adaptive algorithm was implemented in a web-based questionnaire administration system. The questionnaire included CAT-5D-QOL (5 scales), Modified Oswestry Disability Index (MODI), Roland-Morris Disability Questionnaire (RMDQ), SF-36 Health Survey, and standard clinical and demographic information. Participants were outpatients treated for mechanical back pain at a referral center in Vancouver, Canada. Results. A total of 215 patients completed the questionnaire and 84 completed a retest. On average, patients answered 5.2 items per CAT-5D-QOL scale. Reliability ranged from 0.83 (FEEL) to 0.92 (PAIN) and was 0.92 for the MODI, RMDQ, and Physical Component Summary (PCS-36). The ceiling effect was 0.5% for PAIN compared with 2% for MODI and 5% for RMDQ. The CAT-5D-QOL scales correlated as anticipated with other measures of HRQL and discriminated well according to the level of satisfaction with current symptoms, duration of the last episode, sciatica, and disability compensation. The average relative discrimination index was 0.87 for PAIN, 0.67 for DAILY and 0.62 for WALK, compared with 0.89 for MODI, 0.80 for RMDQ, and 0.59 for PCS-36. Conclusion. The CAT-5D-QOL is feasible, reliable, valid, and efficient in patients with back pain. This methodology can be recommended for use in back pain research and should improve outcome assessment, facilitate comparisons across studies, and reduce patient burden.

Kosinski, M. - Bjorner, J.B. - Ware, J.E. - Sullivan, E. - Straus, W.L. (2006). An evaluation of a patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact. Journal of clinical epidemiology, vol. 59, no. 7, pp. 715-723. ISSN 0895-4356.

ABSTRAKT: Background and Objectives: Evaluate a patient-reported outcomes questionnaire that uses computerized adaptive testing (CAT) to measure the impact of osteoarthritis (OA) on functioning and well-being. Materials and Methods: OA patients completed 37 questions about the impact of OA on physical, social and role functioning, emotional well-being, and vitality. Questionnaire responses were calibrated and scored using item response theory, and two scores were estimated: a Total-OA score based on patients' responses to all 37 questions, and a simulated CAT-OA score where the computer selected and scored the five most informative questions for each patient. Agreement between Total-OA and CAT-OA scores was assessed using correlations. Discriminant validity of Total-OA and CAT-OA scores was assessed with analysis of variance. Criterion measures included OA pain and severity, patient global assessment, and missed work days. Results: Simulated CAT-OA and Total-OA scores correlated highly (r = 0.96). Both Total-OA and simulated CAT-OA scores discriminated significantly between patients differing on the criterion measures. F-statistics across criterion measures ranged from 39.0 (P <.001) to 225.1 (P <.001) for the Total-OA score, and from 40.5 (P <.001) to 221.5 (P <.001) for the simulated CAT-OA score. Conclusions: CAT methods produce valid and precise estimates of the impact of OA on functioning and well-being with significant reduction in response burden. (c) 2006 Elsevier Inc. All rights reserved.

Květon, P. - Jelínek, M. - Denglerová, D. - Vobořil, D. (2008). Software pro adaptivní testování: CAT v praxi. Československá psychologie, vol. 52, no. 2, pp. 145-154. ISSN 0009-062X.

ABSTRAKT: Computerized adaptive testing is a new approach to testing psychological (and other) characteristics that makes the testing process more efficient and more precise. The basic idea is to administer only those items that are appropriate for the given examinee and that therefore provide maximum information, in the terminology of item response theory (IRT), the basic mathematical apparatus of adaptive testing. The aim of the paper is to present original software developed at the Institute of Psychology of the Academy of Sciences of the Czech Republic, which implements functions for interactive administration and selection of appropriate items, estimation of the measured characteristic, and evaluation of a defined test termination condition. At present the program works reliably with tests composed of dichotomously scored items. The software was named Computerized Adaptive Testing optimized, abbreviated CATO™.

Kveton, P. - Jelinek, M. - Voboril, D. - Klimusova, H. (2007). Computer-based tests: the impact of test design and problem of equivalency. Computers in human behavior, vol. 23, no. 1, pp. 32-51. ISSN 0747-5632.

ABSTRAKT: Nowadays, computerized forms of psychodiagnostic methods are often produced without providing appropriate psychometric characteristics, or without proving equivalency with conventional forms. Moreover, there exist tests with more than one computerized version, and these are mostly designed differently. Study I focused on the impact of test design. It was found that even a simple change of color scheme (light stimuli on dark background vs. dark stimuli on light background) had a significant effect on subjects' performance. Study II examined the equivalency of a computerized speeded test that is broadly used among psychological practitioners in the Czech Republic; this form was found non-equivalent with its conventional counterpart. (c) 2004 Elsevier Ltd. All rights reserved.

Květon, P. - Jelínek, M. - Vobořil, D. - Klimusová, H. (2003). Ekvivalence tradiční a počítačové formy testu IST-70. Československá psychologie, vol. 47, no. 6, pp. 562-572. ISSN 0009-062X.

ABSTRAKT: The study addresses the equivalence of traditional and computerized forms of psychodiagnostic methods. The authors tested the equivalence of the IST-70 intelligence test in its traditional form and in the computerized form used in practice. Although most subtests were not markedly affected by the mode of administration, a difference between the two forms was found for the graphical subtests. For the Figure Selection and Cube Tasks subtests, participants' performance under computer administration was markedly poorer than under administration of the traditional form, which the authors attribute to the more demanding control and display ergonomics.

Květon, P. - Klimusová, H. (2002). Metodologické aspekty počítačové administrace psychodiagnostických metod. Československá psychologie, vol. 46, no. 3, pp. 251-264. ISSN 0009-062X.

ABSTRAKT: Computer administration finds many applications in modern psychodiagnostics. Computerized versions of classical psychodiagnostic methods, computer interviews, computerized adaptive tests and, more recently, online testing on the Internet are being developed. The character of the computer administration (the quality of the test design) in many cases affects a person's performance in the test situation. The degree of influence varies with the type of test. Most affected are performance tests with a speed component and tests with visually complex stimuli that are perceptually demanding. Questionnaire methods are less affected. A subjectively operating factor in computer administration is computer anxiety.

Leung, C.K. - Chang, H.H. - Hau, K.T. (2003). Computerized adaptive testing: A comparison of three content balancing methods. Journal of Technology, Learning, and Assessment, vol. 2, no. 5, pp. 1-15. ISSN 1540-2525.

ABSTRAKT: Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method in measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items.

Mulcahey, M.J. - Haley, S.M. - Duffy, T. - Pengsheng, N. - Betz, R.R. (2008). Measuring Physical Functioning in Children With Spinal Impairments With Computerized Adaptive Testing. Journal of pediatric orthopaedics, vol. 28, no. 3, pp. 330-335. ISSN 0271-6798.

ABSTRAKT: Background: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. Computerized adaptive testing uses a computer interface to administer the most optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. Methods: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progressive spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. Results: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales. Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. Conclusions: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. Level of Evidence: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice.

Reise, S.P. - Henson, J.M. (2000). Computerization and adaptive administration of the NEO PI-R. Assessment, vol. 7, no. 4, pp. 347-364. ISSN 1073-1911.

ABSTRAKT: This study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced in half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments.

Roper, B.L. - Ben-Porath, Y.S. - Butcher, J.N. (1995). Comparability and Validity of Computerized Adaptive Testing With the MMPI-2. Journal of Personality Assessment, vol. 65, no. 2, p. 358. ISSN 0022-3891.

ABSTRAKT: The comparability and validity of a computerized adaptive (CA) Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 571 undergraduate college students. The CA MMPI-2 administered adaptively Scales L, F, the 10 clinical scales, and the 15 content scales, utilizing the countdown method (Butcher, Keller, & Bacon, 1985). All subjects completed the MMPI-2 twice, with three experimental conditions: booklet test-retest, booklet-CA, and conventional computerized (CC)-CA. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of the three forms. Correlations between MMPI-2 scales and other psychometric measures (Beck Depression Inventory; Symptom Checklist-Revised; State-Trait Anxiety and Anger Scales; and the Anger Expression Scale) support the validity of the CA MMPI-2. Substantial item savings may be realized with the implementation of the countdown procedure.

Sebille, V. - Hardouin, J.B. - Le Neel, T. - Kubis, G. - Boyer, F. - Guillemin, F. - Falissard, B. (2010). Methodological issues regarding power of classical test theory (CTT) and item response theory (IRT)-based approaches for the comparison of patient-reported outcomes in two groups of patients - a simulation study. BMC medical research methodology, vol. 10, no. 24, pp. 1-10. ISSN 1471-2288.

ABSTRAKT: Background: Patients-Reported Outcomes (PRO) are increasingly used in clinical and epidemiological research. Two main types of analytical strategies can be found for these data: classical test theory (CTT) based on the observed scores and models coming from Item Response Theory (IRT). However, whether IRT or CTT would be the most appropriate method to analyse PRO data remains unknown. The statistical properties of CTT and IRT, regarding power and corresponding effect sizes, were compared. Methods: Two-group cross-sectional studies were simulated for the comparison of PRO data using IRT or CTT-based analysis. For IRT, different scenarios were investigated according to whether items or person parameters were assumed to be known, to a certain extent for item parameters, from good to poor precision, or unknown and therefore had to be estimated. The powers obtained with IRT or CTT were compared and parameters having the strongest impact on them were identified. Results: When person parameters were assumed to be unknown and items parameters to be either known or not, the power achieved using IRT or CTT were similar and always lower than the expected power using the well-known sample size formula for normally distributed endpoints. The number of items had a substantial impact on power for both methods. Conclusion: Without any missing data, IRT and CTT seem to provide comparable power. The classical sample size formula for CTT seems to be adequate under some conditions but is not appropriate for IRT. In IRT, it seems important to take account of the number of items to obtain an accurate formula.

Simms, L.J. - Clark, L.A. (2005). Validation of a Computerized Adaptive Version of the Schedule for Nonadaptive and Adaptive Personality (SNAP). Psychological Assessment, vol. 17, no. 1, pp. 28-43. ISSN 1040-3590.

ABSTRAKT: This is a validation study of a computerized adaptive (CAT) version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) conducted with 413 undergraduates who completed the SNAP twice, 1 week apart. Participants were assigned randomly to 1 of 4 retest groups: (a) paper-and-pencil (P&P) SNAP, (b) CAT, (c) P&P/CAT, and (d) CAT/P&P. With number of items held constant, computerized administration had little effect on descriptive statistics, rank ordering of scores, reliability, and concurrent validity, but was preferred over P&P administration by most participants. CAT administration yielded somewhat lower precision and validity than P&P administration, but required 36% to 37% fewer items and 58% to 60% less time to complete. These results confirm not only key findings from previous CAT simulation studies of personality measures but extend them for the 1st time to a live assessment setting.

Urbánek, T. - Šimeček, M. (2001). Teorie odpovědi na položku. Československá psychologie, vol. 45, no. 5, pp. 428-440. ISSN 0009-062X.

ABSTRAKT: The article compares the basic principles of classical test theory (CTT) and item response theory (IRT). The main emphasis is on introducing IRT models and their advantages for test construction. The comparison of CTT and IRT focuses on the relationship between the item and the whole test, item properties, reliability and measurement precision, and the possibilities for interpreting test results.

Vispoel, W.P. (2000). Computerized versus paper-and-pencil assessment of self-concept: Score comparability and respondent preferences. Measurement and evaluation in counseling and development, vol. 33, no. 3, pp. 130-143. ISSN 0748-1756.

ABSTRAKT: Results supported the comparability of scores yielded by computerized and paper-and-pencil versions of the third edition of the Self-Description Questionnaire and respondents' preferences for computerized assessment.

Vispoel, W.P. - Boo, J. - Bleiler, T. (2001). Computerized and paper-and-pencil versions of the Rosenberg Self-Esteem Scale: A comparison of psychometric features and respondent preferences. Educational and Psychological Measurement, vol. 61, no. 3, pp. 461-474. ISSN 0013-1644.

ABSTRAKT: Although the use of computerized assessment tools in educational and psychological settings has increased dramatically in recent years, limited information is available about the properties of computerized self-concept measures. The authors evaluated the characteristics of computerized and paper-and-pencil versions of the Rosenberg Self-Esteem Scale (SES), one of the most widely used self-concept measures in educational and psychological research. Results showed that administration mode (computerized versus paper and pencil) had little effect on the psychometric properties of the SES (i.e., score magnitude, variability, and factor structure) but that the computerized version took longer and was preferred by examinees. With the exception of administration time, these results support the use of the computerized SES and its comparability to the paper-and-pencil version.

Waller, N.G. - Reise, S.P. (1989). Computerized Adaptive Personality Assessment: An Illustration With the Absorption Scale. Journal of personality and social psychology, vol. 57, no. 6, pp. 1051-1058. ISSN 0022-3514.

ABSTRAKT: This article introduces the theory behind and applications of adaptive personality assessment based on the item response theory. Two adaptive testing strategies were compared: (a) fixed test length and (b) clinical decision. Real-data simulations, based on the item responses from 1,000 subjects who had previously taken the 34-item Absorption scale (A. Tellegen, 1982) by means of paper-and-pencil format, were used to illustrate these strategies. Results suggest that computerized adaptive personality assessment works impressively well. With the fixed-test-length strategy, a 50% savings in administered items was achieved with little loss of measurement precision. In the clinical-decision testing strategy, individuals who were extreme on the Absorption trait were identified with perfect accuracy using, on average, 25% of the available items. The implications of these results for personality research and assessment are discussed.

Weiss, D.J. (1985). Adaptive testing by computer. Journal of consulting and clinical psychology, vol. 53, no. 6, pp. 774-789. ISSN 0022-006X.

ABSTRAKT: Describes problems with conventional tests, in which all examinees take the same items; the advantages of adaptive tests are also described. Computer simulation results and results of other live-testing studies by J. G. Thompson and the present author (1980) are reviewed. Adaptive testing based on item response theory is also discussed. Results support theoretical predictions that adaptive tests can decrease testing time by about 50% while resulting in more precise measurements in comparison with conventional tests. Item pool and computer hardware and software requirements for adaptive testing are specified, and a microcomputer-based system for implementing adaptive testing in clinical environments is briefly described.

Weiss, D.J. (2004). Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and evaluation in counseling and development, vol. 37, no. 2, pp. 70-84. ISSN 0748-1756.

ABSTRAKT: Computerized adaptive testing (CAT) is described and compared with conventional tests, and its advantages summarized. Some item response theory concepts used in CAT are summarized and illustrated. The author describes the potential usefulness of CAT in counseling and education and reviews some current issues in the implementation of CAT.

Williams, J.E. - McCord, D.M. (2006). Equivalence of standard and computerized versions of the Raven Progressive Matrices Test. Computers in human behavior, vol. 22, no. 5, pp. 791-800. ISSN 0747-5632.

ABSTRAKT: The present study examined the equivalence of the computer administered version of the Raven Standard Progressive Matrices (RSPM) with the standard paper-and-pencil administered version of the RSPM. In addition, the effects of state and trait anxiety as well as computer anxiety were investigated. Fifty undergraduate volunteers were administered the RSPM twice under one of four conditions: computer-computer, standard-standard, computer-standard, or standard-computer. No significant differences were found between mean scores and standard deviations across administrations or formats. Rank-order correlations revealed similar ranking across formats. Tentative support for the equivalence of the computerized version of the RSPM was found. Analyses revealed no significant differences in anxiety across formats and no significant correlations between anxiety and RSPM performance. Explanations and implications for further research are discussed. (c) 2004 Elsevier Ltd. All rights reserved.
