Validity of cognitive ability tests – comparison of computerized adaptive testing with paper-and-pencil and computer-based forms of administrations

Bibliographic reference for citation

Žitný, P. – Halama, P. – Jelínek, M. – Květon, P. (2012). Validity of cognitive ability tests – comparison of computerized adaptive testing with paper-and-pencil and computer-based forms of administrations. Studia Psychologica, vol. 54, no. 3, pp. 181-194. ISSN 0039-3320. Available online: <http://tinyurl.com/zitny>

Abstract

The study analyzes and compares the validity of administering cognitive ability tests via computerized adaptive testing (CAT) with paper-and-pencil and computer-based administration. The research was carried out on a sample of 803 secondary school students (567 completed the tests in paper-and-pencil form, 236 via computer / CAT simulation; 363 males, 440 females) with a mean age of 16.8 years (SD = 1.33). The test battery consisted of the Test of Intellect Potential (TIP) and the Vienna Matrices Test (VMT). Overall, the results showed that the validity of CAT was adequately comparable across the individual forms of administration. Consistent with previous research, CAT uses only a small number of items while yielding results that, in terms of validity, differ only negligibly from those of traditional administration. The simulated CAT administration of the TIP was roughly 55% more economical than the traditional version, and that of the VMT roughly 54%. These results suggest that CAT is a useful way to improve the methodology of psychological testing.
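
The simulated CAT administrations summarized above were produced post hoc from examinees' recorded responses. As an illustration only, the sketch below shows the general logic of such a post-hoc simulation under a two-parameter logistic (2PL) IRT model with maximum-information item selection, EAP scoring and a standard-error stopping rule; the 30-item bank, the parameter values, the se_stop threshold and all helper names are hypothetical and do not reproduce the actual simulation settings or the TIP/VMT item calibrations used in the study.

```python
# Minimal sketch of a post-hoc ("real data") CAT simulation under a 2PL IRT model.
# All parameters and the item bank below are illustrative, not the study's calibration.
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1.0 - p)

def eap_estimate(responses, a, b, nodes=np.linspace(-4, 4, 81)):
    """EAP ability estimate and posterior SD from the responses given so far."""
    prior = np.exp(-0.5 * nodes**2)            # standard normal prior (unnormalized)
    like = np.ones_like(nodes)
    for item, x in responses:
        p = p_2pl(nodes, a[item], b[item])
        like *= p if x == 1 else (1.0 - p)
    post = prior * like
    post /= post.sum()
    theta = np.sum(nodes * post)
    se = np.sqrt(np.sum((nodes - theta)**2 * post))
    return theta, se

def post_hoc_cat(full_responses, a, b, se_stop=0.40, max_items=None):
    """Re-score one examinee's recorded full-test responses adaptively."""
    n_items = len(a)
    max_items = max_items or n_items
    available = set(range(n_items))
    given = []
    theta, se = 0.0, np.inf
    while available and len(given) < max_items and se > se_stop:
        # pick the unused item with maximum information at the current theta estimate
        item = max(available, key=lambda i: item_information(theta, a[i], b[i]))
        available.remove(item)
        given.append((item, full_responses[item]))   # reuse the recorded answer
        theta, se = eap_estimate(given, a, b)
    return theta, se, len(given)

# Hypothetical 30-item bank and one examinee's recorded answers
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 30)
b = rng.normal(0.0, 1.0, 30)
answers = rng.integers(0, 2, 30)
theta, se, used = post_hoc_cat(answers, a, b)
saving = 100 * (1 - used / len(a))                   # item saving relative to the full test
print(f"theta = {theta:.2f}, SE = {se:.2f}, items used = {used}, saving = {saving:.0f}%")
```

The item savings reported in the abstract correspond to the last line of the sketch: the share of test items that the adaptive run did not need to administer.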

Keywords

Item response theory. Computerized adaptive testing. Paper-and-pencil. Computer-based. Criterion and construct validity. Efficiency.

Language of the paper

English

Full-text

I would welcome it if any author who uses this work in a publication of their own sends me a brief e-mail message about the publication in which it was used.

References

BECKER, J., FLIEGE, H., KOCALEVENT, R.-D., BJORNER, J.B., ROSE, M., WALTER, O.B., KLAPP, B.F., 2008, Functioning and validity of A Computerized Adaptive Test to measure anxiety (A-CAT). Depression and Anxiety, 25, 12, E182-E194.

BUTCHER, J.N., PERRY, J., HAHN, J., 2004, Computers in clinical assessment: Historical developments, present status, and future challenges. Journal of clinical psychology, 60, 3, 331-345.

CUDECK, R., 1985, A Structural Comparison of Conventional and Adaptive Versions of the ASVAB. Multivariate behavioral research, 20, 3, 305-322.

EMBRETSON, S.E., REISE, S.P., 2000, Item Response Theory for Psychologists (Multivariate Applications Book Series). Mahwah, NJ: Lawrence Erlbaum Associates.

FINGER, M.S., ONES, D.S., 1999, Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, 11, 1, 58-66.

FLIEGE, H., BECKER, J., WALTER, O.B., ROSE, M., BJORNER, J.B., KLAPP, B.F., 2009, Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application. International journal of methods in psychiatric research, 18, 1, 23-36.

HAHN, E.A., CELLA, D., BODE, R.K., GERSHON, R., LAI, J.-S., 2006, Item Banks and Their Potential Applications to Health Status Assessment in Diverse Populations. Medical Care, 44, 11, 189-197.

HALAMA, P., 2005, Adaptívne testovanie pomocou počítača: Aplikácia teórie odpovede na položku v diagnostike inteligencie [Computerized adaptive testing: Application of item response theory in intelligence testing]. Psychológia a patopsychológia dieťaťa, 40, 3, 252-266.

HAMBLETON, R.K., 2000, Emergence of item response modeling in instrument development and data analysis. Medical Care, 38, 9, 60-65.

HAMBLETON, R.K., JONES, R.W., 1993, Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12, 3, 253-262.

HAMBLETON, R.K., SWAMINATHAN, H., ROGERS, H.J., 1991, Fundamentals of item response theory (Measurement Methods for the Social Sciences). Newbury Park, CA: Sage Publications.

HANDEL, R.W., BEN-PORATH, Y.S., WATT, M., 1999, Computerized adaptive assessment with the MMPI-2 in a clinical setting. Psychological Assessment, 11, 3, 369-380.

HART, D.L., MIODUSKI, J.E., WERNEKE, M.W., STRATFORD, P.W., 2006, Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function. Journal of clinical epidemiology, 59, 9, 947-956.

HOL, A.M., VORST, H.C.M., MELLENBERGH, G.J., 2007, Computerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms. Applied psychological measurement, 31, 5, 412-429.

JELÍNEK, M., KVĚTON, P., DENGLEROVÁ, D., 2006, Adaptivní testování - základní pojmy a principy [Adaptive testing - basic concepts and principles]. Československá psychologie, 50, 2, 163-173.

JELÍNEK, M., KVĚTON, P., VOBOŘIL, D., 2011a, Adaptivní administrace NEO PI-R: výhody a omezení [Adaptive administration of NEO PI-R: limits and benefits]. Československá psychologie, 55, 1, 69-81.

JELÍNEK, M., KVĚTON, P., VOBOŘIL, D., 2011b, Testování v psychologii - Teorie odpovědi na položku a počítačové adaptivní testování [Testing in Psychology: Item Response Theory and Computerized Adaptive Testing]. Praha: Grada Publishing.

KINGSBURY, G.G., HOUSER, R.L., 1993, Assessing the Utility of Item Response Models: Computerized Adaptive Testing. Educational Measurement: Issues and Practice, 12, 1, 21-27.

KVĚTON, P., JELÍNEK, M., DENGLEROVÁ, D., VOBOŘIL, D., 2008, Software pro adaptivní testování: CAT v praxi [Software for adaptive testing: CAT in action]. Československá psychologie, 52, 2, 145-154.

KVĚTON, P., JELÍNEK, M., VOBOŘIL, D., KLIMUSOVÁ, H., 2007, Computer-based tests: the impact of test design and problem of equivalency. Computers in human behavior, 23, 1, 32-51.

KVĚTON, P., KLIMUSOVÁ, H., 2002, Metodologické aspekty počítačové administrace psychodiagnostických metod [Methodological aspects of computer administration of psychodiagnostics methods]. Československá psychologie, 46, 3, 251-264.

LORD, F.M., 1980, Applications of item response theory to practical testing problems. Hillsdale, N.J.: Lawrence Erlbaum Associates.

MEAD, A.D., DRASGOW, F., 1993, Equivalence of Computerized and Paper-and-Pencil Cognitive Ability Tests: A Meta-Analysis. Psychological Bulletin, 114, 3, 449-458.

MEIJER, R.R., NERING, M.L., 1999, Computerized adaptive testing: Overview and introduction. Applied psychological measurement, 23, 3, 187-194.

MILLS, C.N., STOCKING, M.L., 1996, Practical issues in Large-Scale Computerized Adaptive Testing. Applied Measurement in Education, 9, 4, 287-304.

ROPER, B.L., BEN-PORATH, Y.S., BUTCHER, J.N., 1995, Comparability and Validity of Computerized Adaptive Testing With the MMPI-2. Journal of Personality Assessment, 65, 2, 358.

ŘÍČAN, P., 1971, Test Intelektového potenciálu - TIP [Test of Intellect Potential - TIP]. Bratislava: Psychodiagnostické a didaktické testy.

SCHAEFFER, G.A., BRIDGEMAN, B., GOLUB-SMITH, M.L., LEWIS, C., POTENZA, M.T., STEFFEN, M., 1998, Comparability of Paper-and-Pencil and Computer Adaptive Test Scores on the GRE General Test. Princeton, NJ: Educational Testing Service (Research Report No: RR-98-38).

SCHAEFFER, G.A., REESE, C.M., STEFFEN, M., MCKINLEY, R.L., MILLS, C.N., 1993, Field Test of a Computer-Based GRE General Test. Princeton, NJ: Educational Testing Service (Research Report No: RR-93-07).

SCHAEFFER, G.A., STEFFEN, M., GOLUB-SMITH, M.L., MILLS, C.N., DURSO, R., 1995, The Introduction and Comparability of the Computer Adaptive GRE General Test. Princeton, NJ: Educational Testing Service (Research Report No: RR-95-20).

SIMMS, L.J., CLARK, L.A., 2005, Validation of a Computerized Adaptive Version of the Schedule for Nonadaptive and Adaptive Personality (SNAP). Psychological Assessment, 17, 1, 28-43.

THISSEN, D., CHEN, W., BOCK, R.D., 2003, MULTILOG 7 for Windows: Multiple-category item analysis and test scoring using item response theory. Lincolnwood, IL: Scientific Software International, Inc. [Computer software].

THISSEN, D., MISLEVY, R.J., 2000, Testing algorithms. In: H. Wainer (Ed.), Computerized adaptive testing: A Primer (pp. 101-133). Mahwah, NJ: Lawrence Erlbaum Associates, 360.

URBÁNEK, T., ŠIMEČEK, M., 2001, Teorie odpovědi na položku [Item response theory]. Československá psychologie, 45, 5, 428-440.

VAN DER LINDEN, W.J., GLAS, C.A.W., 2002, Computerized adaptive testing: Theory and practice. New York: Kluwer Academic Publishers.

VISPOEL, W.P., BOO, J., BLEILER, T., 2001, Computerized and paper-and-pencil versions of the Rosenberg Self-Esteem Scale: A comparison of psychometric features and respondent preferences. Educational and Psychological Measurement, 61, 3, 461-474.

VONKOMER, J., 1992, Viedenský matricový test - VMT [Vienna Matrices Test - VMT]. Bratislava: Psychodiagnostika.

WAINER, H., 2000, Computerized adaptive testing: A Primer. Mahwah, NJ: Lawrence Erlbaum Associates.

WAINER, H., MISLEVY, R.J., 2000, Item response theory, item calibration, and proficiency estimation. In: H. Wainer (Ed.), Computerized adaptive testing: A Primer (pp. 61-100). Mahwah, NJ: Lawrence Erlbaum Associates, 360.

WANG, T.Y., KOLEN, M.J., 2001, Evaluating comparability in computerized adaptive testing: Issues, criteria and an example. Journal of educational measurement, 38, 1, 19-49.

WANG, X.B., PAN, W., HARRIS, V., 1999, Computerized adaptive testing simulations using real test taker responses. Newtown, PA: Law School Admission Council.

WEISS, D.J., 1982, Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6, 4, 473-492.

WEISS, D.J., 1985, Adaptive testing by computer. Journal of consulting and clinical psychology, 53, 6, 774-789.

WEISS, D.J., 2004, Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and evaluation in counseling and development, 37, 2, 70-84.

WEISS, D.J., 2005, Manual for POSTSIM: Post-hoc simulation of computerized adaptive testing. Version 2.0. St. Paul, MN: Assessment Systems Corporation.

WILLIAMS, J.E., MCCORD, D.M., 2006, Equivalence of standard and computerized versions of the Raven Progressive Matrices Test. Computers in human behavior, 22, 5, 791-800.

WOOD, R., WILSON, D., GIBBONS, R., SCHILLING, S., MURAKI, E., BOCK, D., 2003, TESTFACT: Test scoring, item statistics, and item factor analysis. Lincolnwood, IL: Scientific Software International, Inc. [Computer software].

ŽITNÝ, P., 2011, Presnosť, validita a efektívnosť počítačového adaptívneho testovania [Computerized adaptive testing: precision, validity and efficiency]. Československá psychologie, 55, 2, 167-179.

Abstracts of the cited articles

BUTCHER, J.N. - PERRY, J. - HAHN, J. (2004). Computers in clinical assessment: Historical developments, present status, and future challenges. Journal of clinical psychology, vol. 60, no. 3, pp. 331-345. ISSN 0021-9762.

ABSTRACT: Computerized testing methods have long been regarded as a potentially powerful asset for providing psychological assessment services. Ever since computers were first introduced and adapted to the field of assessment psychology in the 1950s, they have been a valuable aid for scoring, data processing, and even interpretation of test results. The history and status of computer-based personality and neuropsychological tests are discussed in this article. Several pertinent issues involved in providing test interpretation by computer are highlighted. Advances in computer-based test use, such as computerized adaptive testing, are described and problems noted. Today, there is great interest in expanding the availability of psychological assessment applications on the Internet. Although these applications show great promise, there are a number of problems associated with providing psychological tests on the Internet that need to be addressed by psychologists before the Internet can become a major medium for psychological service delivery.

FINGER, M.S. - ONES, D.S. (1999). Psychometric equivalence of the computer and booklet forms of the MMPI: A meta-analysis. Psychological Assessment, vol. 11, no. 1, pp. 58-66. ISSN 1040-3590.

ABSTRACT: Inconsistent findings have repeatedly been found by researchers attempting to determine whether the computer form of the Minnesota Multiphasic Personality Inventory (MMPI) is psychometrically equivalent to the booklet form. This article applied psychometric meta-analysis to pool results from all available studies to examine the equivalence of the computer and booklet MMPI forms. Means, standard deviations, and crossform correlations were cumulated. A comprehensive meta-analysis of the literature demonstrated that the disparate findings can be explained in terms of sampling error across individual studies. Differences in means and standard deviations across studies were near 0, and crossform rank orderings were near perfect. The results of this study suggest that the computer and booklet forms of the MMPI are psychometrically equivalent.

FLIEGE, H. - BECKER, J. - WALTER, O.B. - ROSE, M. - BJORNER, J.B. - KLAPP, B.F. (2009). Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application. International journal of methods in psychiatric research, vol. 18, no. 1, pp. 23-36. ISSN 1049-8931.

ABSTRACT: In the past, a German Computerized Adaptive Test, based on Item Response Theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. This study aims at testing the feasibility and validity of the real computer-adaptive application. The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression). Items were adaptively administered until a predetermined reliability (r >= 0.90) was attained. For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)]. The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients and reliable after an average administration of only six items. In 95% of the cases, 10 items or less were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do. The D-CAT proved an efficient, well accepted and reliable tool. Discriminative power was comparable to other depression measures, whereby the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (C) 2009 John Wiley & Sons, Ltd.

HAHN, E.A. - CELLA, D. - BODE, R.K. - GERSHON, R. - LAI, J.-S. (2006). Item Banks and Their Potential Applications to Health Status Assessment in Diverse Populations. Medical Care, vol. 44, no. 11, pp. 189-197. ISSN 0025-7079.

ABSTRACT: In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority. Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale. Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. (C) 2006 Lippincott Williams & Wilkins, Inc.

HALAMA, P. (2005). Adaptívne testovanie pomocou počítača: Aplikácia teórie odpovede na položku v diagnostike inteligencie [Computerized adaptive testing: Application of item response theory in intelligence testing]. Psychológia a patopsychológia dieťaťa, vol. 40, no. 3, pp. 252-266. ISSN 0555-5574.

ABSTRACT: Developments in computer technology around the world are also reflected in psychological assessment, especially in the use of computers for presenting test stimuli and for processing and presenting test results. The paper focuses on one such application of computers in the assessment of intelligence, namely computerized adaptive testing (CAT). The aim of CAT is to make the testing process more efficient and more precise, so that the tested person solves items that correspond as closely as possible to his or her level of intelligence. The paper presents the basic principles of item response theory (IRT), which forms the basis of modern adaptive testing. The individual phases of adaptive testing are described, namely building an item bank, presenting the first item, estimating the probable level of ability, the item adaptation algorithm, and terminating the test. Finally, the advantages of CAT as well as possible problems and limitations associated with its use are discussed.
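
For orientation, the item response and information functions that drive the adaptive loop described above are usually written, in the common two-parameter logistic (2PL) model, as follows; this is shown only as the standard textbook form, not as a restatement of the specific IRT model used for the TIP and VMT calibrations.

```latex
% Standard 2PL item response function and item information function
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},
\qquad
I_i(\theta) = a_i^{2}\, P_i(\theta)\, \bigl(1 - P_i(\theta)\bigr)
```

Here a_i is the discrimination and b_i the difficulty of item i; at each step the CAT administers the unused item with the largest information at the current ability estimate and stops once a termination criterion (for example, a target standard error) is reached.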

HANDEL, R.W. - BEN-PORATH, Y.S. - WATT, M. (1999). Computerized adaptive assessment with the MMPI-2 in a clinical setting. Psychological Assessment, vol. 11, no. 3, pp. 369-380. ISSN 1040-3590.

ABSTRACT: Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method (Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial.

HART, D.L. - MIODUSKI, J.E. - WERNEKE, M.W. - STRATFORD, P.W. (2006). Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function. Journal of clinical epidemiology, vol. 59, no. 9, pp. 947-956. ISSN 0895-4356.

ABSTRACT: Objective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (theta(IRT)) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (theta(CAT)). Methods: We performed a secondary analysis of retrospective intake rehabilitation data. Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine specific CAT was developed that was 72% more efficient than using all 25 items to estimate LFS measures. theta(IRT) and theta(CAT) measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. theta(CAT) measures were as precise as theta(IRT) measures. Conclusion: A body part specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity. (c) 2006 Elsevier Inc. All rights reserved.

HOL, A.M. - VORST, H.C.M. - MELLENBERGH, G.J. (2007). Computerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms. Applied psychological measurement, vol. 31, no. 5, pp. 412-429. ISSN 0146-6216.

ABSTRACT: In a randomized experiment (n = 515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed-length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicates that the real data results are not affected by model misfit.

JELÍNEK, M. - KVĚTON, P. - DENGLEROVÁ, D. (2006). Adaptivní testování - základní pojmy a principy [Adaptive testing - basic concepts and principles]. Československá psychologie, vol. 50, no. 2, pp. 163-173. ISSN 0009-062X.

ABSTRACT: In modern psychological assessment, alongside classical tests, a more efficient procedure for capturing the characteristics of interest has appeared, namely the technique of adaptive testing. The idea of adaptive testing has a relatively long history, but its advantages have come into their own only with truly interactive computer administration. Computer technology has made it possible to introduce into the adaptive testing process a more advanced mathematical apparatus, known as item response theory (IRT). Most adaptive tests are aimed at performance assessment, but efforts to extend the possibilities of the adaptive approach are also directed toward personality assessment.

JELÍNEK, M. - KVĚTON, P. - VOBOŘIL, D. (2011). Adaptivní administrace NEO PI-R: výhody a omezení [Adaptive administration of NEO PI-R: limits and benefits]. Československá psychologie, vol. 55, no. 1, pp. 69-81. ISSN 0009-062X.

ABSTRACT: Adaptively administered tests with dichotomously scored items are already well described in the relevant literature and used in practice. The presented study analyses the possibilities of adaptive administration of a test with polytomous items, which are commonly used in personality testing. Based on an analysis of simulated adaptive administration of the NEO PI-R, the limits and benefits of this approach are discussed. It was found that adaptive administration successfully and more effectively reconstructs the level of measured traits in comparison with full scale administration. On the other hand, a significant problem consists in the overexposure of several items with the highest item discrimination power. A representative sample built for the purposes of the Czech standardization of the NEO PI-R was used (N = 2084).

KINGSBURY, G.G. - HOUSER, R.L. (1993). Assessing the Utility of Item Response Models: Computerized Adaptive Testing. Educational Measurement: Issues and Practice, vol. 12, no. 1, pp. 21-27. ISSN 1745-3992.

ABSTRACT: How has Item Response Theory helped solve problems in the development and use of computer-adaptive tests? Do we need to balance item content with computer-adaptive tests? Could we use IRT to evaluate unusual responses to computer-delivered tests?

KVĚTON, P. - JELÍNEK, M. - DENGLEROVÁ, D. - VOBOŘIL, D. (2008). Software pro adaptivní testování: CAT v praxi [Software for adaptive testing: CAT in action]. Československá psychologie, vol. 52, no. 2, pp. 145-154. ISSN 0009-062X.

ABSTRACT: Computerized adaptive testing represents a new approach to the testing of psychological (and other) characteristics that makes the testing process more efficient and more precise. The basic idea is to administer only those items that are adequate for the given examinee and therefore, in the terminology of item response theory (IRT), which is the basic mathematical apparatus of adaptive testing, provide maximum information. The aim of the paper is to present original software developed at the Institute of Psychology of the Academy of Sciences of the Czech Republic, which implements functions for the interactive administration and selection of adequate items, the estimation of the measured characteristic, and the evaluation of a defined test termination condition. At present, the program can work without problems with tests composed of dichotomously scored items. The software was named Computerized Adaptive Testing optimized, abbreviated CATO™.

KVĚTON, P. - JELÍNEK, M. - VOBOŘIL, D. - KLIMUSOVÁ, H. (2007). Computer-based tests: the impact of test design and problem of equivalency. Computers in human behavior, vol. 23, no. 1, pp. 32-51. ISSN 0747-5632.

ABSTRACT: Nowadays, computerized forms of psychodiagnostic methods are often produced without providing appropriate psychometric characteristics, or without proving equivalency with conventional forms. Moreover, there exist tests with more than one computerized version, which are mostly designed differently. Study I focused on the impact of test design. It was found that even a simple change of color scheme (light stimuli on dark background vs. dark stimuli on light background) had a significant effect on subjects' performance. Study II examined the equivalency of a computerized speeded test, which is widely used among psychological practitioners in the Czech Republic; this form was found non-equivalent with its conventional counterpart. (c) 2004 Elsevier Ltd. All rights reserved.

KVĚTON, P. - KLIMUSOVÁ, H. (2002). Metodologické aspekty počítačové administrace psychodiagnostických metod [Methodological aspects of computer administration of psychodiagnostics methods]. Československá psychologie, vol. 46, no. 3, pp. 251-264. ISSN 0009-062X.

ABSTRACT: Computer administration has many applications in modern psychological assessment. Computerized versions of classical psychodiagnostic methods, computerized interviews, computerized adaptive tests and, more recently, online testing on the Internet are being developed. The nature of computer administration (the quality of the test design) in many cases influences a person's performance in the test situation. The degree of influence varies with the type of test: performance tests with a speed component and tests with visually complex stimuli that place high demands on perception are affected the most, while questionnaire methods are affected less. A subjectively acting factor in the computer administration situation is computer anxiety.

MEAD, A.D. - DRASGOW, F. (1993). Equivalence of Computerized and Paper-and-Pencil Cognitive Ability Tests: A Meta-Analysis. Psychological Bulletin, vol. 114, no. 3, pp. 449-458. ISSN 0033-2909.

ABSTRACT: The effects of the medium of test administration (paper and pencil versus computerized) were examined for timed power and speeded tests of cognitive abilities for populations of young adults and adults. Meta-analytic techniques were used to estimate the cross-mode correlation after correcting for measurement error. A total of 159 correlations was meta-analyzed: 123 from timed power tests and 36 from speeded tests. The corrected cross-mode correlation was found to be .91 when all correlations were analyzed simultaneously. Speededness was found to moderate the effects of administration mode in that the cross-mode correlation was estimated to be .97 for timed power tests but only .72 for speeded tests. No difference in equivalence was observed between adaptively and conventionally administered computerized tests. Some limitations on the generality of these results are discussed, and directions for future research are outlined.

MEIJER, R.R. - NERING, M.L. (1999). Computerized adaptive testing: Overview and introduction. Applied psychological measurement, vol. 23, no. 3, pp. 187-194. ISSN 0146-6216.

ABSTRACT: Use of computerized adaptive testing (CAT) has increased substantially since it was first formulated in the 1970s. This paper provides an overview of CAT and introduces the contributions to this Special Issue. The elements of CAT discussed here include item selection procedures, estimation of the latent trait, item exposure, measurement precision, and item bank development. Some topics for future research are also presented.

ROPER, B.L. - BEN-PORATH, Y.S. - BUTCHER, J.N. (1995). Comparability and Validity of Computerized Adaptive Testing With the MMPI-2. Journal of Personality Assessment, vol. 65, no. 2, p. 358. ISSN 0022-3891.

ABSTRACT: The comparability and validity of a computerized adaptive (CA) Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 571 undergraduate college students. The CA MMPI-2 administered adaptively Scales L and F, the 10 clinical scales, and the 15 content scales, utilizing the countdown method (Butcher, Keller, & Bacon, 1985). All subjects completed the MMPI-2 twice, with three experimental conditions: booklet test-retest, booklet-CA, and conventional computerized (CC)-CA. Profiles across administration modalities show a high degree of similarity, providing evidence for the comparability of the three forms. Correlations between MMPI-2 scales and other psychometric measures (Beck Depression Inventory; Symptom Checklist-Revised; State-Trait Anxiety and Anger Scales; and the Anger Expression Scale) support the validity of the CA MMPI-2. Substantial item savings may be realized with the implementation of the countdown procedure.

SIMMS, L.J. - CLARK, L.A. (2005). Validation of a Computerized Adaptive Version of the Schedule for Nonadaptive and Adaptive Personality (SNAP). Psychological Assessment, vol. 17, no. 1, pp. 28-43. ISSN 1040-3590.

ABSTRACT: This is a validation study of a computerized adaptive (CAT) version of the Schedule for Nonadaptive and Adaptive Personality (SNAP) conducted with 413 undergraduates who completed the SNAP twice, 1 week apart. Participants were assigned randomly to 1 of 4 retest groups: (a) paper-and-pencil (P&P) SNAP, (b) CAT, (c) P&P/CAT, and (d) CAT/P&P. With number of items held constant, computerized administration had little effect on descriptive statistics, rank ordering of scores, reliability, and concurrent validity, but was preferred over P&P administration by most participants. CAT administration yielded somewhat lower precision and validity than P&P administration, but required 36% to 37% fewer items and 58% to 60% less time to complete. These results not only confirm key findings from previous CAT simulation studies of personality measures but also extend them for the 1st time to a live assessment setting.

URBÁNEK, T. - ŠIMEČEK, M. (2001). Teorie odpovědi na položku [Item response theory]. Československá psychologie, vol. 45, no. 5, pp. 428-440. ISSN 0009-062X.

ABSTRACT: The article compares the basic principles of classical test theory (CTT) and item response theory (IRT). The main emphasis is placed on presenting IRT models and their advantages in the construction of test methods. The comparison of CTT and IRT focuses on the relationship between an item and the test as a whole, the properties of items, reliability and precision of measurement, and the possibilities of interpreting test results.

VISPOEL, W.P. - BOO, J. - BLEILER, T. (2001). Computerized and paper-and-pencil versions of the Rosenberg Self-Esteem Scale: A comparison of psychometric features and respondent preferences. Educational and Psychological Measurement, vol. 61, no. 3, pp. 461-474. ISSN 0013-1644.

ABSTRACT: Although the use of computerized assessment tools in educational and psychological settings has increased dramatically in recent years, limited information is available about the properties of computerized self-concept measures. The authors evaluated the characteristics of computerized and paper-and-pencil versions of the Rosenberg Self-Esteem Scale (SES), one of the most widely used self-concept measures in educational and psychological research. Results showed that administration mode (computerized versus paper and pencil) had little effect on the psychometric properties of the SES (i.e., score magnitude, variability, and factor structure) but that the computerized version took longer and was preferred by examinees. With the exception of administration time, these results support the use of the computerized SES and its comparability to the paper-and-pencil version.

WANG, T.Y. - KOLEN, M.J. (2001). Evaluating comparability in computerized adaptive testing: Issues, criteria and an example. Journal of educational measurement, vol. 38, no. 1, pp. 19-49. ISSN 0022-0655.

ABSTRACT: When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores.

WEISS, D.J. (2004). Computerized adaptive testing for effective and efficient measurement in counseling and education. Measurement and evaluation in counseling and development, vol. 37, no. 2, pp. 70-84. ISSN 0748-1756.

ABSTRACT: Computerized adaptive testing (CAT) is described and compared with conventional tests, and its advantages are summarized. Some item response theory concepts used in CAT are summarized and illustrated. The author describes the potential usefulness of CAT in counseling and education and reviews some current issues in the implementation of CAT.

WILLIAMS, J.E. - MCCORD, D.M. (2006). Equivalence of standard and computerized versions of the Raven Progressive Matrices Test. Computers in human behavior, vol. 22, no. 5, pp. 791-800. ISSN 0747-5632.

ABSTRACT: The present study examined the equivalence of the computer administered version of the Raven Standard Progressive Matrices (RSPM) with the standard paper-and-pencil administered version of the RSPM. In addition, the effects of state and trait anxiety as well as computer anxiety were investigated. Fifty undergraduate volunteers were administered the RSPM twice under one of four conditions: computer-computer, standard-standard, computer-standard, or standard-computer. No significant differences were found between mean scores and standard deviations across administrations or formats. Rank-order correlations revealed similar ranking across formats. Tentative support for the equivalence of the computerized version of the RSPM was found. Analyses revealed no significant differences in anxiety across formats and no significant correlations between anxiety and RSPM performance. Explanations and implications for further research are discussed. (c) 2004 Elsevier Ltd. All rights reserved.

ŽITNÝ, P. (2011). Presnosť, validita a efektívnosť počítačového adaptívneho testovania [Computerized adaptive testing: precision, validity and efficiency]. Československá psychologie, vol. 55, no. 2, pp. 167-179. ISSN 0009-062X.

ABSTRACT: Present developments in the area of psychological assessment place emphasis on methodological improvements and the importance of increasing effectiveness. Computerized adaptive testing (CAT) algorithms based on item response theory (IRT) offer attractive opportunities for simultaneously optimizing both measurement precision and efficiency. This article presents findings of 15 research studies from the fields of ability testing, clinical psychology, personality testing and health care designed to explore the reliability, utility (in terms of item savings) and validity (in terms of correlations with existing tools) of CAT. Overall, the findings are encouraging: CAT provides an effective means to gain an optimal amount of information needed to answer an assessment question, while keeping the time and/or number of items required to obtain that information at a minimum. CAT scores correlated highly with scores from the full item bank (range r = 0.83-0.99) and moderately with established measures (range r = 0.58-0.83), providing evidence for the reliability, validity and comparability of adaptive tools. However, these results are based mainly on CAT simulation studies, and therefore additional live-CAT studies (involving the administration of real tests to live examinees) are needed to confirm this pattern of findings.
