%0 Journal Article %J Journal of Computerized Adaptive Testing %D 2024 %T The Influence of Computerized Adaptive Testing on Psychometric Theory and Practice %A Reckase, Mark D. %K computerized adaptive testing %K Item Response Theory %K paradigm shift %K scaling theory %K test design %X

The major premise of this article is that part of the stimulus for the evolution of psychometric theory since the 1950s was the introduction of the concept of computerized adaptive testing (CAT) or its earlier non-CAT variations. The conceptual underpinning of CAT that had the most influence on psychometric theory was the shift of emphasis from the test (or test score) as the focus of analysis to the test item (or item score). The change in focus allowed a change in the way that test results are conceived of as measurements. It also resolved the conflict among a number of ideas that were present in the early work on psychometric theory. Some of these conflicting ideas are summarized below to show how work on the development of CAT resolved them.

%B Journal of Computerized Adaptive Testing %V 11 %G English %U https://jcatpub.net/index.php/jcat/issue/view/34/9 %N 1 %R 10.7333/2403-1101001 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2023 %T An Extended Taxonomy of Variants of Computerized Adaptive Testing %A Levy, Roy %A Behrens, John T. %A Mislevy, Robert J. %K Adaptive Testing %K evidence-centered design %K Item Response Theory %K knowledge-based model construction %K missingness %B Journal of Computerized Adaptive Testing %V 10 %G English %N 1 %R 10.7333/2302-100101 %0 Journal Article %J Applied Psychological Measurement %D 2011 %T catR: An R Package for Computerized Adaptive Testing %A Magis, D. %A Raîche, G. %K computer program %K computerized adaptive testing %K Estimation %K Item Response Theory %X

Computerized adaptive testing (CAT) is an active research field in psychometrics and educational measurement. However, there is very little software available to handle such adaptive tasks. The R package catR was developed to perform adaptive testing with as much flexibility as possible, in an attempt to provide a developmental and testing platform to the interested user. Several item-selection rules and ability estimators are implemented. The item bank can be provided by the user or randomly generated from parent distributions of item parameters. Three stopping rules are available. The output can be graphically displayed.
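The loop that such packages automate can be summarized compactly. The following is a minimal, self-contained Python sketch of the cycle the abstract describes (an item bank generated from parent distributions, maximum-information item selection, an EAP ability estimator, and a standard-error stopping rule). Every name and value in it is an illustrative assumption for this sketch, not catR's API, which is an R package with its own functions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2PL item bank drawn from parent distributions of item
# parameters (a: lognormal discriminations, b: normal difficulties).
n_items = 200
a = rng.lognormal(mean=0.0, sigma=0.3, size=n_items)
b = rng.normal(loc=0.0, scale=1.0, size=n_items)

def p_correct(theta, a_i, b_i):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a_i * (theta - b_i)))

def fisher_info(theta, a_i, b_i):
    """2PL item information: I(theta) = a^2 * P * (1 - P)."""
    p = p_correct(theta, a_i, b_i)
    return a_i ** 2 * p * (1.0 - p)

def eap(items, responses, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate with a standard-normal prior; returns (estimate, SE)."""
    like = np.ones_like(grid)
    for i, u in zip(items, responses):
        p = p_correct(grid, a[i], b[i])
        like *= p if u == 1 else 1.0 - p
    post = like * np.exp(-0.5 * grid ** 2)
    post /= post.sum()
    est = (grid * post).sum()
    se = np.sqrt(((grid - est) ** 2 * post).sum())
    return est, se

true_theta = 0.8                         # simulee's latent trait
items, responses = [], []
theta, se = 0.0, np.inf
while se > 0.3 and len(items) < 40:      # stopping rule: SE threshold or max length
    info = fisher_info(theta, a, b)
    info[items] = -np.inf                # do not readminister items
    nxt = int(np.argmax(info))           # maximum-information selection
    u = int(rng.random() < p_correct(true_theta, a[nxt], b[nxt]))
    items.append(nxt)
    responses.append(u)
    theta, se = eap(items, responses)

print(f"theta_hat = {theta:.2f}, SE = {se:.2f}, items used = {len(items)}")
```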

%B Applied Psychological Measurement %G eng %R 10.1177/0146621611407482 %0 Journal Article %J Zeitschrift für Psychologie / Journal of Psychology %D 2008 %T Computerized Adaptive Testing of Personality Traits %A Hol, A. M. %A Vorst, H. C. M. %A Mellenbergh, G. J. %K Adaptive Testing %K computer-assisted testing %K Item Response Theory %K Likert scales %K Personality Measures %X

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction of the number of items used could be realized (at the stopping rule of 0.4, about 33% of the 36 items were used).
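As background for the simulation just described, the sketch below shows how Samejima's graded response model assigns probabilities to the five Likert categories of a single item. The item parameters are invented for illustration; they are not the ACL dominance-scale estimates.

```python
import numpy as np

def grm_probs(theta, a, thresholds):
    """Category probabilities P(X = k | theta), k = 0..m, under the GRM.

    Each cumulative probability P(X >= k) follows a 2PL curve at its own
    threshold; differencing adjacent cumulative curves yields the
    category probabilities.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - thresholds)))  # P(X >= k), k = 1..m
    cum = np.concatenate(([1.0], p_star, [0.0]))              # P(X >= 0) = 1, P(X >= m+1) = 0
    return cum[:-1] - cum[1:]

# One invented five-category Likert item: four ordered thresholds.
probs = grm_probs(theta=0.5, a=1.4, thresholds=[-1.5, -0.5, 0.4, 1.3])
print(probs, probs.sum())  # five category probabilities that sum to 1
```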

%B Zeitschrift für Psychologie / Journal of Psychology %V 216 %P 12-21 %N 1 %R 10.1027/0044-3409.216.1.12 %0 Journal Article %J Psycho-Oncology %D 2007 %T The initial development of an item bank to assess and screen for psychological distress in cancer patients %A Smith, A. B. %A Rush, R. %A Velikova, G. %A Wall, L. %A Wright, E. P. %A Stark, D. %A Selby, P. %A Sharpe, M. %K 3293 Cancer %K cancer patients %K Distress %K initial development %K Item Response Theory %K Models %K Neoplasms %K Patients %K Psychological %K psychological distress %K Rasch %K Stress %X Psychological distress is a common problem among cancer patients. Despite the large number of instruments that have been developed to assess distress, their utility remains disappointing. This study aimed to use Rasch models to develop an item bank which would provide the basis for better means of assessing psychological distress in cancer patients. An item bank was developed from eight psychological distress questionnaires using Rasch analysis to link common items. Items from the questionnaires were added iteratively with common items as anchor points and misfitting items (infit mean square > 1.3) removed, and unidimensionality assessed. A total of 4914 patients completed the questionnaires, providing an initial pool of 83 items. Twenty items were removed, resulting in a final pool of 63 items. Good fit was demonstrated and no additional factor structure was evident from the residuals. However, there was little overlap between item locations and person measures, since items mainly targeted higher levels of distress. The Rasch analysis allowed items to be pooled and generated a unidimensional instrument for measuring psychological distress in cancer patients. Additional items are required to assess patients more accurately across the whole continuum of psychological distress. (PsycINFO Database Record (c) 2007 APA ) (journal abstract) %B Psycho-Oncology %V 16 %P 724-732 %@ 1057-9249 %G English %M 2007-12507-004 %0 Journal Article %J European Journal of Psychological Assessment %D 2007 %T Psychometric properties of an emotional adjustment measure: An application of the graded response model %A Rubio, V. J. %A Aguado, D. %A Hontangas, P. M. %A Hernández, J. M. %K computerized adaptive tests %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K personnel recruitment %K Psychometrics %K Samejima's graded response model %K test reliability %K validity %X Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has mainly been used for assessing achievement and ability rather than personality factors. This paper presents an application of IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure consisting of 28 items with six graded response categories are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of IRT models for the description and use of data originating from personality measures.
In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) the invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity with the Eysenck Personality Questionnaire (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B European Journal of Psychological Assessment %I Hogrefe & Huber Publishers GmbH: Germany %V 23 %P 39-46 %@ 1015-5759 (Print) %G eng %M 2007-01587-007 %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Optimal testing with easy or difficult items in computerized adaptive testing %A Eggen, Theo %A Verschoor, Angela J. %K computer adaptive tests %K individualized tests %K Item Response Theory %K item selection %K Measurement %X Computerized adaptive tests (CATs) are individualized tests that, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. In the present study, it is shown that maximum information item selection in CATs, using an item bank calibrated with the one- or the two-parameter logistic model, results in each individual answering about 50% of the items correctly. Two item selection procedures giving easier (or more difficult) tests for students are presented and evaluated. Item selection on probability points of items yields good results only with the one-parameter logistic model and not with the two-parameter logistic model. An alternative selection procedure, based on maximum information at a shifted ability level, gives satisfactory results with both models. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Applied Psychological Measurement %I Sage Publications: US %V 30 %P 379-393 %@ 0146-6216 (Print) %G eng %M 2006-10279-002 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function %A Hart, D. L. %A Mioduski, J. E. %A Werneke, M. W. %A Stratford, P. W. %K Back Pain Functional Scale %K computerized adaptive testing %K Item Response Theory %K Lumbar spine %K Rehabilitation %K True-score equating %X Objective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare the discriminant validity of LFS measures generated using all items analyzed with a rating scale Item Response Theory model (RSM), denoted θIRT, and measures generated using the simulated CAT, denoted θCAT. Methods: We performed a secondary analysis of retrospective intake rehabilitation data. Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar-spine-specific CAT was developed that was 72% more efficient than using all 25 items to estimate LFS measures. θIRT and θCAT measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. θCAT measures were as precise as θIRT measures.
Conclusion: A body-part-specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity. %B Journal of Clinical Epidemiology %V 59 %P 947–956 %G eng %R 10.1016/j.jclinepi.2005.10.017 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function %A Hart, D. L. %A Cook, K. F. %A Mioduski, J. E. %A Teal, C. R. %A Crane, P. K. %K computerized adaptive testing %K Flexilevel Scale of Shoulder Function %K Item Response Theory %K Rehabilitation %X

Background and Objective: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare the discriminant validity of measures generated using all items (θIRT) and measures generated using the simulated CAT (θCAT). Study Design and Setting: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. Results: Factor analytic techniques indicated that 42 of the SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The θIRT and θCAT measures were highly correlated (r = .96) and resulted in similar classifications of patients. Conclusion: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.
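The validation logic summarized above (correlate full-bank estimates with simulated-CAT estimates and compare the resulting classifications) reduces to a short computation. The sketch below uses fabricated stand-in estimates solely to show the calculation; it does not reproduce the study's data or its reported r of .96.

```python
import numpy as np

rng = np.random.default_rng(7)

# Stand-in estimates for 400 simulated patients (NOT the study's data):
# full-bank estimates, and CAT estimates perturbed by estimation error.
theta_irt = rng.normal(size=400)
theta_cat = theta_irt + rng.normal(scale=0.25, size=400)

r = np.corrcoef(theta_irt, theta_cat)[0, 1]  # Pearson correlation of the two sets

cut = 0.0                                    # illustrative classification cutoff
agreement = np.mean((theta_irt > cut) == (theta_cat > cut))
print(f"r = {r:.2f}, classification agreement = {agreement:.1%}")
```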

%B Journal of Clinical Epidemiology %V 59 %P 290-298 %G English %N 3 %0 Journal Article %J Anales de Psicología %D 2006 %T Técnicas para detectar patrones de respuesta atípicos [Aberrant patterns detection methods] %A Núñez, R. M. N. %A Pina, J. A. L. %K aberrant patterns detection %K Classical Test Theory %K generalizability theory %K Item Response Theory %K Mathematics %K methods %K person-fit %K Psychometrics %K psychometry %K Test Validity %K test validity analysis %X The detection of aberrant response patterns is very useful for constructing tests and item banks with sound psychometric properties and for analyzing their validity. This review gathers the most relevant and novel person-fit methods developed within each of the main areas of psychometrics: Guttman's scalogram, Classical Test Theory (CTT), Generalizability Theory (GT), Item Response Theory (IRT), Nonparametric Item Response Models (NIRM), Order-Restricted Latent Class Models (OR-LCM), and Covariance Structure Analysis (CSA). %B Anales de Psicología %V 22 %P 143-154 %@ 0212-9728 %G Spanish %M 2006-07751-018 %0 Book Section %B Outcomes assessment in cancer %D 2005 %T Applications of item response theory to improve health outcomes assessment: Developing item banks, linking instruments, and computer-adaptive testing %A Hambleton, R. K. %E Gotay, C. C. %E Snyder, C. %K Computer Assisted Testing %K Health %K Item Response Theory %K Measurement %K Test Construction %K Treatment Outcomes %X (From the chapter) The current chapter builds on Reise's introduction to the basic concepts, assumptions, popular models, and important features of IRT and discusses the applications of item response theory (IRT) modeling to health outcomes assessment. In particular, we highlight the critical role of IRT modeling in: developing an instrument to match a study's population; linking two or more instruments measuring similar constructs on a common metric; and creating item banks that provide the foundation for tailored short-form instruments or for computerized adaptive assessments. (PsycINFO Database Record (c) 2005 APA ) %B Outcomes assessment in cancer %I Cambridge University Press %C Cambridge, UK %P 445-464 %G eng %0 Journal Article %J Educational Technology & Society %D 2005 %T An Authoring Environment for Adaptive Testing %A Guzmán, E. %A Conejo, R. %A García-Hervás, E. %K Adaptability %K Adaptive Testing %K Authoring environment %K Item Response Theory %X

SIETTE is a web-based adaptive testing system. It implements computerized adaptive tests: tailor-made, theory-based tests in which the questions shown to students, the finalization of the test, and the estimation of student knowledge are all accomplished adaptively. To construct these tests, SIETTE has an authoring environment comprising a suite of tools that helps teachers create questions and tests properly and analyze students’ performance after taking a test. In this paper, we present this authoring environment in the framework of adaptive testing. As will be shown, this set of visual tools, which contains some adaptable features, can be useful for teachers lacking skills in this kind of testing. Additionally, other systems that implement adaptive testing will be studied.

%B Educational Technology & Society %V 8 %P 66-76 %G eng %N 3 %0 Journal Article %J International Journal of Artificial Intelligence in Education %D 2005 %T A Bayesian student model without hidden nodes and its comparison with item response theory %A Desmarais, M. C. %A Pu, X. %K Bayesian Student Model %K computer adaptive testing %K hidden nodes %K Item Response Theory %X The Bayesian framework offers a number of techniques for inferring an individual's knowledge state from evidence of mastery of concepts or skills. A typical application where such a technique can be useful is Computer Adaptive Testing (CAT). A Bayesian modeling scheme, POKS, is proposed and compared to the traditional Item Response Theory (IRT), which has been the prevalent CAT approach for the last three decades. POKS is based on the theory of knowledge spaces and constructs item-to-item graph structures without hidden nodes. It aims to offer an effective knowledge assessment method with an efficient algorithm for learning the graph structure from data. We review the different Bayesian approaches to modeling student ability assessment and discuss how POKS relates to them. The performance of POKS is compared to the IRT two-parameter logistic model. Experimental results over a 34-item Unix test and a 160-item French language test show that both approaches can classify examinees as master or non-master effectively and efficiently, with relatively comparable performance. However, more significant differences are found in favor of POKS for a second task that consists of predicting individual question item outcomes. Implications of these results for adaptive testing and student modeling are discussed, as well as the limitations and advantages of POKS, namely the issue of integrating concepts into its structure. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B International Journal of Artificial Intelligence in Education %I IOS Press: Netherlands %V 15 %P 291-323 %@ 1560-4292 (Print); 1560-4306 (Electronic) %G eng %M 2006-10770-003 %0 Journal Article %J Health Services Research %D 2005 %T Dynamic assessment of health outcomes: Time to let the CAT out of the bag? %A Cook, K. F. %A O'Malley, K. J. %A Roddey, T. S. %K computer adaptive testing %K Item Response Theory %K self reported health outcomes %X Background: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective: To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure, including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges.
(PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Health Services Research %I Blackwell Publishing: United Kingdom %V 40 %P 1694-1711 %@ 0017-9124 (Print); 1475-6773 (Electronic) %G eng %M 2006-02162-008 %0 Journal Article %J Acta Psychologica Sinica %D 2005 %T [Item characteristic curve equating under graded response models in IRT] %A Jun, Z. %A Dongming, O. %A Shuyuan, X. %A Haiqi, D. %A Shuqing, Q. %K graded response models %K item characteristic curve %K Item Response Theory %X In the economist test, one of the largest qualification tests, item characteristic curve equating and an anchor-test equating design under graded response models in IRT were used to guarantee comparability across different years, construct an item bank, and prepare for computerized adaptive testing. These methods achieved item and ability parameter equating for five years of test data and succeeded in establishing an item bank. On this basis, cut scores from different years were compared through equating, providing an empirical basis for setting the eligibility standard of the economist test. %B Acta Psychologica Sinica %I Science Press: China %V 37 %P 832-838 %@ 0439-755X (Print) %G eng %M 2005-16031-017 %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Using patterns of summed scores in paper-and-pencil tests and computer-adaptive tests to detect misfitting item score patterns %A Meijer, R. R. %K Computer Assisted Testing %K Item Response Theory %K person Fit %K Test Scores %X Two new methods have been proposed to determine unexpected sum scores on subtests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted ρ, was compared with a method where the probability for each score combination was calculated using a highest density region (HDR). Furthermore, these methods were compared with the standardized log-likelihood statistic with and without a correction for the estimated latent trait value (denoted as l_z* and l_z, respectively). Data were simulated on the basis of the one-parameter logistic model, and both parametric and nonparametric logistic regression were used to obtain estimates of the latent trait. Results showed that it is important to take the trait level into account when comparing subtest scores. In a nonparametric item response theory (IRT) context, an adapted version of the HDR method was a powerful alternative to ρ. In a parametric IRT context, results showed that l_z* had the highest power when the data were simulated conditionally on the estimated latent trait level. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Journal of Educational Measurement %V 41 %P 119-136 %G eng %0 Journal Article %J International Journal of Selection and Assessment %D 2003 %T Computerized adaptive rating scales for measuring managerial performance %A Schneider, R. J. %A Goff, M. %A Anderson, S. %A Borman, W. C. %K Adaptive Testing %K Algorithms %K Associations %K Citizenship %K Computer Assisted Testing %K Construction %K Contextual %K Item Response Theory %K Job Performance %K Management %K Management Personnel %K Rating Scales %K Test %X Computerized adaptive rating scales (CARS) had previously been developed to measure contextual or citizenship performance.
This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al., 2001). In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that substantially reduces the number of item pairs required to obtain almost as much precision in the performance estimates. (PsycINFO Database Record (c) 2005 APA ) %B International Journal of Selection and Assessment %V 11 %P 237-246 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Computerized adaptive testing using the nearest-neighbors criterion %A Cheng, P. E. %A Liou, M. %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis (Statistical) %K Item Response Theory %K Statistical Analysis %K Statistical Estimation %K computerized adaptive testing %K Statistical Tests %X Item selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 27 %P 204-216 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Item exposure constraints for testlets in the verbal reasoning section of the MCAT %A Davis, L. L. %A Dodd, B. G. %K Adaptive Testing %K Computer Assisted Testing %K Entrance Examinations %K Item Response Theory %K Random Sampling %K Reasoning %K Verbal Ability %K computerized adaptive testing %X The current study examined item exposure control procedures for testlet-scored reading passages in the Verbal Reasoning section of the Medical College Admission Test with four computerized adaptive testing (CAT) systems using the partial credit model. The first system used a traditional CAT using maximum information item selection. The second used random item selection to provide a baseline for optimal exposure rates. The third used a variation of Lunz and Stahl's randomization procedure. The fourth used Luecht and Nungester's computerized adaptive sequential testing (CAST) system. A series of simulated fixed-length CATs was run to determine the optimal item length selection procedure.
Results indicated that both the randomization procedure and CAST performed well in terms of exposure control and measurement precision, with the CAST system providing the best overall solution when all variables were taken into consideration. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 27 %P 335-356 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Optimal stratification of item pools in α-stratified computerized adaptive testing %A Chang, Hua-Hua %A van der Linden, W. J. %K Adaptive Testing %K Computer Assisted Testing %K Item Content (Test) %K Item Response Theory %K Mathematical Modeling %K Test Construction %K computerized adaptive testing %X A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 27 %P 262-274 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model %A Pastor, D. A. %A Dodd, B. G. %A Chang, Hua-Hua %K Adaptive Testing %K Algorithms %K computerized adaptive testing %K Computer Assisted Testing %K Item Analysis (Statistical) %K Item Response Theory %K Mathematical Modeling %X The use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two differently sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the α-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minimally degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 26 %P 147-163 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2002 %T Data sparseness and on-line pretest item calibration-scaling methods in CAT %A Ban, J-C. %A Hanson, B. A. %A Yi, Q. %A Harris, D. J.
%K Computer Assisted Testing %K Educational Measurement %K Item Response Theory %K Maximum Likelihood %K Methodology %K Scaling (Testing) %K Statistical Data %X Compared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods. The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 39 %P 207-218 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T An EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model %A Stark, S. %A Drasgow, F. %K Adaptive Testing %K Computer Assisted Testing %K Item Response Theory %K Maximum Likelihood %K Personnel Evaluation %K Statistical Correlation %K Statistical Estimation %X Borman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach, the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article, the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte Carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e., ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition, latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. %B Applied Psychological Measurement %V 26 %P 208-227 %G eng %0 Report %D 2002 %T Mathematical-programming approaches to test item pool design %A Veldkamp, B. P. %A van der Linden, W. J. %A Ariel, A. %K Adaptive Testing %K Computer Assisted Testing %K Computer Programming %K Educational Measurement %K Item Response Theory %K Mathematics %K Psychometrics %K Statistical Rotation %K computerized adaptive testing %K Test Items %X (From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools for computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT. %I University of Twente, Faculty of Educational Science and Technology %C Twente, The Netherlands %P 93-108 %@ 02-09 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2002 %T Outlier detection in high-stakes certification testing %A Meijer, R. R.
%K Adaptive Testing %K computerized adaptive testing %K Educational Measurement %K Goodness of Fit %K Item Analysis (Statistical) %K Item Response Theory %K person Fit %K Statistical Estimation %K Statistical Power %K Test Scores %X Discusses recent developments of person-fit analysis in computerized adaptive testing (CAT). Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 39 %P 219-233 %G eng %0 Journal Article %J Assessment %D 2002 %T A structure-based approach to psychological measurement: Matching measurement models to latent structure %A Ruscio, John %A Ruscio, Ayelet Meron %K Adaptive Testing %K Classification (Cognitive Process) %K Computer Assisted Testing %K Item Response Theory %K Psychological Assessment %K Scaling (Testing) %K Statistical Analysis %K computerized adaptive testing %K Taxonomies %X The present article sets forth the argument that psychological assessment should be based on a construct's latent structure. The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Assessment %V 9 %P 4-16 %G eng %0 Journal Article %J Behaviormetrika %D 2001 %T Developments in measurement of persons and items by means of item response models %A Sijtsma, K. %K Cognitive Processes %K Computer Assisted Testing %K Item Response Theory %K Models %K Nonparametric Statistical Tests %X This paper starts with a general introduction to the measurement of hypothetical constructs typical of the social and behavioral sciences. After the stages ranging from theory through operationalization and item domain to preliminary test or questionnaire have been treated, the general assumptions of item response theory are discussed.
The family of parametric item response models for dichotomous items is introduced and it is explained how parameters for respondents and items are estimated from the scores collected from a sample of respondents who took the test or questionnaire. Next, the family of nonparametric item response models is explained, followed by the three classes of item response models for polytomous item scores (e.g., rating scale scores). Then, to what degree the mean item score and the unweighted sum of item scores for persons are useful for measuring items and persons in the context of item response theory is discussed. Methods for fitting parametric and nonparametric models to data are briefly discussed. Finally, the main applications of item response models are discussed, which include equating and item banking, computerized and adaptive testing, research into differential item functioning, person fit research, and cognitive modeling. (PsycINFO Database Record (c) 2005 APA ) %B Behaviormetrika %V 28 %P 65-94 %G eng %0 Book Section %B Test scoring %D 2001 %T Item response theory applied to combinations of multiple-choice and constructed-response items--approximation methods for scale scores %A Thissen, D. %A Nelson, L. A. %A Swygert, K. A. %K Adaptive Testing %K Item Response Theory %K Multiple Choice (Testing Method) %K Scoring (Testing) %K Statistical Estimation %K Statistical Weighting %K Test Items %K Test Scores %X (From the chapter) The authors develop approximate methods that replace the scoring tables with weighted linear combinations of the component scores. Topics discussed include: a linear approximation for the extension to combinations of scores; the generalization of two or more scores; potential applications of linear approximations to item response theory in computerized adaptive tests; and evaluation of the pattern-of-summed-scores, and Gaussian approximation, estimates of proficiency. (PsycINFO Database Record (c) 2005 APA ) %B Test scoring %I Lawrence Erlbaum Associates %C Mahwah, N.J. USA %P 289-315 %G eng %& 8 %0 Journal Article %J Medical Care %D 2000 %T Emergence of item response modeling in instrument development and data analysis %A Hambleton, R. K. %K Computer Assisted Testing %K Health %K Item Response Theory %K Measurement %K Statistical Validity %K computerized adaptive testing %K Test Construction %K Treatment Outcomes %B Medical Care %V 38 %P II60-II65 %G eng %0 Journal Article %J Dissertation Abstracts International Section A: Humanities and Social Sciences %D 2000 %T An exploratory analysis of item parameters and characteristics that influence item level response time %A Smith, Russell Winsor %K Item Analysis (Statistical) %K Item Response Theory %K Problem Solving %K Reaction Time %K Reading Comprehension %K Reasoning %X This research examines the relationship between item level response time and (1) item discrimination, (2) item difficulty, (3) word count, (4) item type, and (5) whether a figure is included in an item. Data are from the Graduate Management Admission Test, which is currently offered only as a computerized adaptive test. Analyses revealed significant differences in response time between the five item types: problem solving, data sufficiency, sentence correction, critical reasoning, and reading comprehension. For this reason, the planned pairwise and complex analyses were run within each item type. Pairwise curvilinear regression analyses explored the relationship between response time and item discrimination, item difficulty, and word count.
Item difficulty significantly contributed to the prediction of response time for each item type; two of the relationships were significantly quadratic. Item discrimination significantly contributed to the prediction of response time for only two of the item types; one revealed a quadratic relationship and the other a cubic relationship. Word count had a significant linear relationship with response time for all the item types except reading comprehension, for which there was no significant relationship. Multiple regression analyses using word count, item difficulty, and item discrimination predicted between 35.4% and 71.4% of the variability in item response time across item types. The results suggest that response time research should consider the type of item that is being administered and continue to explore curvilinear relationships between response time and its predictor variables. (PsycINFO Database Record (c) 2005 APA ) %B Dissertation Abstracts International Section A: Humanities and Social Sciences %V 61 %P 1812 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T An integer programming approach to item bank design %A van der Linden, W. J. %A Veldkamp, B. P. %A Reese, L. M. %K Aptitude Measures %K Item Analysis (Test) %K Item Response Theory %K Test Construction %K Test Items %X An integer programming approach to item bank design is presented that can be used to calculate an optimal blueprint for an item bank, in order to support an existing testing program. The results are optimal in that they minimize the effort involved in producing the items as revealed by current item writing patterns. Also presented is an adaptation of the models, which can be used as a set of monitoring tools in item bank management. The approach is demonstrated empirically for an item bank that was designed for the Law School Admission Test.
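The kind of 0-1 integer program behind such blueprint calculations can be sketched compactly: binary variables indicate whether an item is written into the pool, the objective minimizes total item-writing effort, and linear constraints impose content quotas. The Python example below is a toy instance with invented numbers (solved here with SciPy's milp), not the authors' actual model for the LSAT pool.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp  # SciPy >= 1.9

# Invented example: 12 candidate items, each with a writing-effort cost and
# membership in one of two content areas. Require at least 3 items per area.
cost = np.array([3, 1, 4, 2, 5, 2, 1, 3, 2, 4, 1, 2], dtype=float)
content = np.array([[1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],   # area A indicator
                    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],  # area B indicator
                   dtype=float)

res = milp(c=cost,                                          # minimize total effort
           constraints=LinearConstraint(content, lb=[3, 3], ub=[np.inf, np.inf]),
           integrality=np.ones(cost.size),                  # integer variables...
           bounds=Bounds(0, 1))                             # ...restricted to {0, 1}

chosen = np.flatnonzero(res.x > 0.5)
print("blueprint items:", chosen, "total effort:", cost[chosen].sum())
```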