%0 Journal Article
%J Personality and Individual Differences
%D 2010
%T Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM
%A Egberink, I. J. L.
%A Meijer, R. R.
%A Veldkamp, B. P.
%A Schakel, L.
%A Smid, N. G.
%K CAT
%K computerized adaptive testing
%K CUSUM approach
%K Person Fit
%X The scalability of individual trait scores on a computerized adaptive test (CAT) was assessed by investigating the consistency of individual item score patterns. A sample of N = 428 persons completed a personality CAT as part of a career development procedure. To detect inconsistent item score patterns, we used a cumulative sum (CUSUM) procedure. Combined information from the CUSUM, other personality measures, and interviews showed that similar estimated trait values may have different interpretations. Implications for computer-based assessment are discussed.
%B Personality and Individual Differences
%V 48
%P 921-925
%@ 0191-8869
%G eng

%0 Journal Article
%J Journal of Educational Measurement
%D 2004
%T Using patterns of summed scores in paper-and-pencil tests and computer-adaptive tests to detect misfitting item score patterns
%A Meijer, R. R.
%K Computer Assisted Testing
%K Item Response Theory
%K Person Fit
%K Test Scores
%X Two new methods are proposed to determine unexpected sum scores on subtests (testlets), both for paper-and-pencil tests and computer-adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted ρ, was compared with a method in which the probability for each score combination was calculated using a highest density region (HDR). Furthermore, these methods were compared with the standardized log-likelihood statistic with and without a correction for the estimated latent trait value (denoted l*_z and l_z, respectively). Data were simulated on the basis of the one-parameter logistic model, and both parametric and nonparametric logistic regression were used to obtain estimates of the latent trait. Results showed that it is important to take the trait level into account when comparing subtest scores. In a nonparametric item response theory (IRT) context, an adapted version of the HDR method was a powerful alternative to ρ. In a parametric IRT context, results showed that l*_z had the highest power when the data were simulated conditionally on the estimated latent trait level.
%B Journal of Educational Measurement
%V 41
%P 119-136
%G eng

%0 Journal Article
%J Psychometrika
%D 2003
%T Using response times to detect aberrant responses in computerized adaptive testing
%A van der Linden, W. J.
%A van Krimpen-Stoop, E. M. L. A.
%K Adaptive Testing
%K Behavior
%K Computer Assisted Testing
%K computerized adaptive testing
%K Models
%K Person Fit
%K Prediction
%K Reaction Time
%X A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrance in response times are presented. The Bayesian checks had higher detection rates than the classical checks, but at the cost of higher false-alarm rates. A guideline for choosing between the two types of checks is offered.
%B Psychometrika
%V 68
%P 251-265
%G eng

%0 Journal Article
%J Journal of Educational Measurement
%D 2002
%T Outlier detection in high-stakes certification testing
%A Meijer, R. R.
%K Adaptive Testing
%K computerized adaptive testing
%K Educational Measurement
%K Goodness of Fit
%K Item Analysis (Statistical)
%K Item Response Theory
%K Person Fit
%K Statistical Estimation
%K Statistical Power
%K Test Scores
%X Discusses recent developments in person-fit analysis in computerized adaptive testing (CAT). Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data; in this study, empirical data from a certification test were used. Alternatives are discussed for generating norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications of statistical process control methods to detect misfitting item score patterns are discussed.
%B Journal of Educational Measurement
%V 39
%P 219-233
%G eng

%0 Book Section
%B Computer adaptive testing: Theory and practice
%D 2000
%T Detecting person misfit in adaptive testing using statistical process control techniques
%A van Krimpen-Stoop, E. M. L. A.
%A Meijer, R. R.
%K Person Fit
%B Computer adaptive testing: Theory and practice
%I Kluwer Academic
%C Dordrecht, The Netherlands
%P 201-219
%G eng