TY - JOUR T1 - Computerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms JF - Applied Psychological Measurement Y1 - 2007 A1 - Hol, A. M. A1 - Vorst, H. C. M. A1 - Mellenbergh, G. J. KW - 2220 Tests & Testing KW - Adaptive Testing KW - Attitude Measurement KW - computer adaptive testing KW - Computer Assisted Testing KW - items KW - Motivation KW - polytomous motivation KW - Statistical Validity KW - Test Administration KW - Test Forms KW - Test Items AB - In a randomized experiment (n=515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicate that the real data results are not affected by model misfit. (PsycINFO Database Record (c) 2007 APA ) (journal abstract) VL - 31 SN - 0146-6216 N1 - 10.1177/0146621606297314Journal; Peer Reviewed Journal; Journal Article ER - TY - CHAP T1 - Computer-based testing T2 - Handbook of multimethod measurement in psychology Y1 - 2006 A1 - F Drasgow A1 - Chuah, S. C. KW - Adaptive Testing computerized adaptive testing KW - Computer Assisted Testing KW - Experimentation KW - Psychometrics KW - Theories AB - (From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA ) JF - Handbook of multimethod measurement in psychology PB - American Psychological Association CY - Washington D.C. USA VL - xiv N1 - Using Smart Source ParsingHandbook of multimethod measurement in psychology. (pp. 87-100). Washington, DC : American Psychological Association, [URL:http://www.apa.org/books]. xiv, 553 pp ER - TY - CHAP T1 - Applications of item response theory to improve health outcomes assessment: Developing item banks, linking instruments, and computer-adaptive testing T2 - Outcomes assessment in cancer Y1 - 2005 A1 - Hambleton, R. K. ED - C. C. Gotay ED - C. Snyder KW - Computer Assisted Testing KW - Health KW - Item Response Theory KW - Measurement KW - Test Construction KW - Treatment Outcomes AB - (From the chapter) The current chapter builds on Reise's introduction to the basic concepts, assumptions, popular models, and important features of IRT and discusses the applications of item response theory (IRT) modeling to health outcomes assessment. In particular, we highlight the critical role of IRT modeling in: developing an instrument to match a study's population; linking two or more instruments measuring similar constructs on a common metric; and creating item banks that provide the foundation for tailored short-form instruments or for computerized adaptive assessments. (PsycINFO Database Record (c) 2005 APA ) JF - Outcomes assessment in cancer PB - Cambridge University Press CY - Cambridge, UK N1 - Using Smart Source ParsingOutcomes assessment in cancer: Measures, methods, and applications. (pp. 445-464). New York, NY : Cambridge University Press. xiv, 662 pp ER - TY - JOUR T1 - A computer-assisted test design and diagnosis system for use by classroom teachers JF - Journal of Computer Assisted Learning Y1 - 2005 A1 - He, Q. A1 - Tymms, P. KW - Computer Assisted Testing KW - Computer Software KW - Diagnosis KW - Educational Measurement KW - Teachers AB - Computer-assisted assessment (CAA) has become increasingly important in education in recent years. A variety of computer software systems have been developed to help assess the performance of students at various levels. However, such systems are primarily designed to provide objective assessment of students and analysis of test items, and focus has been mainly placed on higher and further education. Although there are commercial professional systems available for use by primary and secondary educational institutions, such systems are generally expensive and require skilled expertise to operate. In view of the rapid progress made in the use of computer-based assessment for primary and secondary students by education authorities here in the UK and elsewhere, there is a need to develop systems which are economic and easy to use and can provide the necessary information that can help teachers improve students' performance. This paper presents the development of a software system that provides a range of functions including generating items and building item banks, designing tests, conducting tests on computers and analysing test results. Specifically, the system can generate information on the performance of students and test items that can be easily used to identify curriculum areas where students are under performing. A case study based on data collected from five secondary schools in Hong Kong involved in the Curriculum, Evaluation and Management Centre's Middle Years Information System Project, Durham University, UK, has been undertaken to demonstrate the use of the system for diagnostic and performance analysis. (PsycINFO Database Record (c) 2006 APA ) (journal abstract) VL - 21 ER - TY - JOUR T1 - Controlling item exposure and test overlap in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2005 A1 - Chen, S-Y. A1 - Lei, P-W. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Content (Test) computerized adaptive testing AB - This article proposes an item exposure control method, which is the extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 29 ER - TY - JOUR T1 - Propiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test] JF - Psicothema Y1 - 2005 A1 - Aguado, D. A1 - Rubio, V. J. A1 - Hontangas, P. M. A1 - Hernández, J. M. KW - Computer Assisted Testing KW - Emotional Adjustment KW - Item Response KW - Personality Measures KW - Psychometrics KW - Test Validity KW - Theory AB - En el presente trabajo se describen las propiedades psicométricas de un Test Adaptativo Informatizado para la medición del ajuste emocional de las personas. La revisión de la literatura acerca de la aplicación de los modelos de la teoría de la respuesta a los ítems (TRI) muestra que ésta se ha utilizado más en el trabajo con variables aptitudinales que para la medición de variables de personalidad, sin embargo diversos estudios han mostrado la eficacia de la TRI para la descripción psicométrica de dichasvariables. Aun así, pocos trabajos han explorado las características de un Test Adaptativo Informatizado, basado en la TRI, para la medición de una variable de personalidad como es el ajuste emocional. Nuestros resultados muestran la eficiencia del TAI para la evaluación del ajuste emocional, proporcionando una medición válida y precisa, utilizando menor número de elementos de medida encomparación con las escalas de ajuste emocional de instrumentos fuertemente implantados. Psychometric properties of an emotional adjustment computerized adaptive test. In the present work it was described the psychometric properties of an emotional adjustment computerized adaptive test. An examination of Item Response Theory (IRT) research literature indicates that IRT has been mainly used for assessing achievements and ability rather than personality factors. Nevertheless last years have shown several studies wich have successfully used IRT to personality assessment instruments. Even so, a few amount of works has inquired the computerized adaptative test features, based on IRT, for the measurement of a personality traits as it’s the emotional adjustment. Our results show the CAT efficiency for the emotional adjustment assessment so this provides a valid and accurate measurement; by using a less number of items in comparison with the emotional adjustment scales from the most strongly established questionnaires. VL - 17 ER - TY - JOUR T1 - A randomized experiment to compare conventional, computerized, and computerized adaptive administration of ordinal polytomous attitude items JF - Applied Psychological Measurement Y1 - 2005 A1 - Hol, A. M. A1 - Vorst, H. C. M. A1 - Mellenbergh, G. J. KW - Computer Assisted Testing KW - Test Administration KW - Test Items AB - A total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima's graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 29 ER - TY - JOUR T1 - Somministrazione di test computerizzati di tipo adattivo: Un' applicazione del modello di misurazione di Rasch [Administration of computerized and adaptive tests: An application of the Rasch Model] JF - Testing Psicometria Metodologia Y1 - 2005 A1 - Miceli, R. A1 - Molinengo, G. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Response Theory computerized adaptive testing KW - Models KW - Psychometrics AB - The aim of the present study is to describe the characteristics of a procedure for administering computerized and adaptive tests (Computer Adaptive Testing or CAT). Items to be asked to the individuals are interactively chosen and are selected from a "bank" in which they were previously calibrated and recorded on the basis of their difficulty level. The selection of items is performed by increasingly more accurate estimates of the examinees' ability. The building of an item-bank on Psychometrics and the implementation of this procedure allow a first validation through Monte Carlo simulations. (PsycINFO Database Record (c) 2006 APA ) (journal abstract) VL - 12 ER - TY - JOUR T1 - Assisted self-adapted testing: A comparative study JF - European Journal of Psychological Assessment Y1 - 2004 A1 - Hontangas, P. A1 - Olea, J. A1 - Ponsoda, V. A1 - Revuelta, J. A1 - Wise, S. L. KW - Adaptive Testing KW - Anxiety KW - Computer Assisted Testing KW - Psychometrics KW - Test AB - A new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. Neither differences in posttest anxiety nor ability were obtained. Concerning precision, AS-AT was as precise as CAT, and both revealed more precision than S-AT. It was concluded that AS-AT acted as a CAT concerning precision. Some hints, but not conclusive support, of the psychological similarity between AS-AT and S-AT was also found. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 20 ER - TY - JOUR T1 - Kann die Konfundierung von Konzentrationsleistung und Aktivierung durch adaptives Testen mit dern FAKT vermieden werden? [Avoiding the confounding of concentration performance and activation by adaptive testing with the FACT] JF - Zeitschrift für Differentielle und Diagnostische Psychologie Y1 - 2004 A1 - Frey, A. A1 - Moosbrugger, H. KW - Adaptive Testing KW - Computer Assisted Testing KW - Concentration KW - Performance KW - Testing computerized adaptive testing AB - The study investigates the effect of computerized adaptive testing strategies on the confounding of concentration performance with activation. A sample of 54 participants was administered 1 out of 3 versions (2 adaptive, 1 non-adaptive) of the computerized Frankfurt Adaptive Concentration Test FACT (Moosbrugger & Heyden, 1997) at three subsequent points in time. During the test administration changes in activation (electrodermal activity) were recorded. The results pinpoint a confounding of concentration performance with activation for the non-adaptive test version, but not for the adaptive test versions (p = .01). Thus, adaptive FACT testing strategies can remove the confounding of concentration performance with activation, thereby increasing the discriminant validity. In conclusion, an attention-focusing-hypothesis is formulated to explain the observed effect. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 25 ER - TY - JOUR T1 - Using patterns of summed scores in paper-and-pencil tests and computer-adaptive tests to detect misfitting item score patterns JF - Journal of Educational Measurement Y1 - 2004 A1 - Meijer, R. R. KW - Computer Assisted Testing KW - Item Response Theory KW - person Fit KW - Test Scores AB - Two new methods have been proposed to determine unexpected sum scores on subtests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted ρ, was compared with a method where the probability for each score combination was calculated using a highest density region (HDR). Furthermore, these methods were compared with the standardized log-likelihood statistic with and without a correction for the estimated latent trait value (denoted as l-super(*)-sub(z) and l-sub(z), respectively). Data were simulated on the basis of the one-parameter logistic model, and both parametric and nonparametric logistic regression was used to obtain estimates of the latent trait. Results showed that it is important to take the trait level into account when comparing subtest scores. In a nonparametric item response theory (IRT) context, on adapted version of the HDR method was a powerful alterative to ρ. In a parametric IRT context, results showed that l-super(*)-sub(z) had the highest power when the data were simulated conditionally on the estimated latent trait level. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 41 ER - TY - CHAP T1 - Assessing question banks T2 - Reusing online resources: A sustanable approach to e-learning Y1 - 2003 A1 - Bull, J. A1 - Dalziel, J. A1 - Vreeland, T. KW - Computer Assisted Testing KW - Curriculum Based Assessment KW - Education KW - Technology computerized adaptive testing AB - In Chapter 14, Joanna Bull and James Daziel provide a comprehensive treatment of the issues surrounding the use of Question Banks and Computer Assisted Assessment, and provide a number of excellent examples of implementations. In their review of the technologies employed in Computer Assisted Assessment the authors include Computer Adaptive Testing and data generation. The authors reveal significant issues involving the impact of Intellectual Property rights and computer assisted assessment and make important suggestions for strategies to overcome these obstacles. (PsycINFO Database Record (c) 2005 APA )http://www-jime.open.ac.uk/2003/1/ (journal abstract) JF - Reusing online resources: A sustanable approach to e-learning PB - Kogan Page Ltd. CY - London, UK ER - TY - JOUR T1 - A Bayesian method for the detection of item preknowledge in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - McLeod, L. A1 - Lewis, C. A1 - Thissen, D. KW - Adaptive Testing KW - Cheating KW - Computer Assisted Testing KW - Individual Differences computerized adaptive testing KW - Item KW - Item Analysis (Statistical) KW - Mathematical Modeling KW - Response Theory AB - With the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA ) VL - 27 ER - TY - JOUR T1 - A comparative study of item exposure control methods in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chang, S-W. A1 - Ansley, T. N. KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational KW - Item Analysis (Statistical) KW - Measurement KW - Strategies computerized adaptive testing AB - This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. (PsycINFO Database Record (c) 2005 APA ) VL - 40 ER - TY - JOUR T1 - Computerized adaptive rating scales for measuring managerial performance JF - International Journal of Selection and Assessment Y1 - 2003 A1 - Schneider, R. J. A1 - Goff, M. A1 - Anderson, S. A1 - Borman, W. C. KW - Adaptive Testing KW - Algorithms KW - Associations KW - Citizenship KW - Computer Assisted Testing KW - Construction KW - Contextual KW - Item Response Theory KW - Job Performance KW - Management KW - Management Personnel KW - Rating Scales KW - Test AB - Computerized adaptive rating scales (CARS) had been developed to measure contextual or citizenship performance. This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al, 2001). In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that reduces substantially the number of item pairs required to obtain almost as much precision in the performance estimates. (PsycINFO Database Record (c) 2005 APA ) VL - 11 ER - TY - JOUR T1 - Computerized adaptive testing using the nearest-neighbors criterion JF - Applied Psychological Measurement Y1 - 2003 A1 - Cheng, P. E. A1 - Liou, M. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis KW - Item Response Theory KW - Statistical Analysis KW - Statistical Estimation computerized adaptive testing KW - Statistical Tests AB - Item selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - JOUR T1 - Item exposure constraints for testlets in the verbal reasoning section of the MCAT JF - Applied Psychological Measurement Y1 - 2003 A1 - Davis, L. L. A1 - Dodd, B. G. KW - Adaptive Testing KW - Computer Assisted Testing KW - Entrance Examinations KW - Item Response Theory KW - Random Sampling KW - Reasoning KW - Verbal Ability computerized adaptive testing AB - The current study examined item exposure control procedures for testlet scored reading passages in the Verbal Reasoning section of the Medical College Admission Test with four computerized adaptive testing (CAT) systems using the partial credit model. The first system used a traditional CAT using maximum information item selection. The second used random item selection to provide a baseline for optimal exposure rates. The third used a variation of Lunz and Stahl's randomization procedure. The fourth used Luecht and Nungester's computerized adaptive sequential testing (CAST) system. A series of simulated fixed-length CATs was run to determine the optimal item length selection procedure. Results indicated that both the randomization procedure and CAST performed well in terms of exposure control and measurement precision, with the CAST system providing the best overall solution when all variables were taken into consideration. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - JOUR T1 - Optimal stratification of item pools in α-stratified computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2003 A1 - Chang, Hua-Hua A1 - van der Linden, W. J. KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Content (Test) KW - Item Response Theory KW - Mathematical Modeling KW - Test Construction computerized adaptive testing AB - A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 27 ER - TY - JOUR T1 - The relationship between item exposure and test overlap in computerized adaptive testing JF - Journal of Educational Measurement Y1 - 2003 A1 - Chen, S-Y. A1 - Ankemann, R. D. A1 - Spray, J. A. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Human Computer KW - Interaction computerized adaptive testing KW - Item Analysis KW - Item Analysis (Test) KW - Test Items AB - The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r-sub(max)). Therefore, item exposure control methods which implement a specification of r-sub(max) (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. (PsycINFO Database Record (c) 2005 APA ) VL - 40 ER - TY - JOUR T1 - Some alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing JF - Journal of Educational and Behavioral Statistics Y1 - 2003 A1 - van der Linden, W. J. KW - Adaptive Testing KW - Computer Assisted Testing KW - Test Items computerized adaptive testing AB - TheHetter and Sympson (1997; 1985) method is a method of probabilistic item-exposure control in computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees’ ability parameter. Formal properties of the method are identified that help us explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test from an item pool from the Law School Admission Test (LSAT). Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. VL - 28 ER - TY - JOUR T1 - Using response times to detect aberrant responses in computerized adaptive testing JF - Psychometrika Y1 - 2003 A1 - van der Linden, W. J. A1 - van Krimpen-Stoop, E. M. L. A. KW - Adaptive Testing KW - Behavior KW - Computer Assisted Testing KW - computerized adaptive testing KW - Models KW - person Fit KW - Prediction KW - Reaction Time AB - A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times offer thus information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered. VL - 68 ER - TY - JOUR T1 - A comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model JF - Applied Psychological Measurement Y1 - 2002 A1 - Pastor, D. A. A1 - Dodd, B. G. A1 - Chang, Hua-Hua KW - (Statistical) KW - Adaptive Testing KW - Algorithms computerized adaptive testing KW - Computer Assisted Testing KW - Item Analysis KW - Item Response Theory KW - Mathematical Modeling AB - The use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minorly degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 26 ER - TY - JOUR T1 - Data sparseness and on-line pretest item calibration-scaling methods in CAT JF - Journal of Educational Measurement Y1 - 2002 A1 - Ban, J-C. A1 - Hanson, B. A. A1 - Yi, Q. A1 - Harris, D. J. KW - Computer Assisted Testing KW - Educational Measurement KW - Item Response Theory KW - Maximum Likelihood KW - Methodology KW - Scaling (Testing) KW - Statistical Data AB - Compared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods. The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error (PsycINFO Database Record (c) 2005 APA ) VL - 39 ER - TY - JOUR T1 - An EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model JF - Applied Psychological Measurement Y1 - 2002 A1 - Stark, S. A1 - F Drasgow KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Response Theory KW - Maximum Likelihood KW - Personnel Evaluation KW - Statistical Correlation KW - Statistical Estimation AB - Borman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach,the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article,the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e.,ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition,latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. VL - 26 ER - TY - JOUR T1 - Hypergeometric family and item overlap rates in computerized adaptive testing JF - Psychometrika Y1 - 2002 A1 - Chang, Hua-Hua A1 - Zhang, J. KW - Adaptive Testing KW - Algorithms KW - Computer Assisted Testing KW - Taking KW - Test KW - Time On Task computerized adaptive testing AB - A computerized adaptive test (CAT) is usually administered to small groups of examinees at frequent time intervals. It is often the case that examinees who take the test earlier share information with examinees who will take the test later, thus increasing the risk that many items may become known. Item overlap rate for a group of examinees refers to the number of overlapping items encountered by these examinees divided by the test length. For a specific item pool, different item selection algorithms may yield different item overlap rates. An important issue in designing a good CAT item selection algorithm is to keep item overlap rate below a preset level. In doing so, it is important to investigate what the lowest rate could be for all possible item selection algorithms. In this paper we rigorously prove that if every item had an equal possibility to be selected from the pool in a fixed-length CAT, the number of overlapping item among any α randomly sampled examinees follows the hypergeometric distribution family for α ≥ 1. Thus, the expected values of the number of overlapping items among any randomly sampled α examinee can be calculated precisely. These values may serve as benchmarks in controlling item overlap rates for fixed-length adaptive tests. (PsycINFO Database Record (c) 2005 APA ) VL - 67 ER - TY - JOUR T1 - Information technology and literacy assessment JF - Reading and Writing Quarterly Y1 - 2002 A1 - Balajthy, E. KW - Computer Applications KW - Computer Assisted Testing KW - Information KW - Internet KW - Literacy KW - Models KW - Systems KW - Technology AB - This column discusses information technology and literacy assessment in the past and present. The author also describes computer-based assessments today including the following topics: computer-scored testing, computer-administered formal assessment, Internet formal assessment, computerized adaptive tests, placement tests, informal assessment, electronic portfolios, information management, and Internet information dissemination. A model of the major present-day applications of information technologies in reading and literacy assessment is also included. (PsycINFO Database Record (c) 2005 APA ) VL - 18 ER - TY - CHAP T1 - The work ahead: A psychometric infrastructure for computerized adaptive tests T2 - Computer-based tests: Building the foundation for future assessment Y1 - 2002 A1 - F Drasgow ED - M. P. Potenza ED - J. J. Freemer ED - W. C. Ward KW - Adaptive Testing KW - Computer Assisted Testing KW - Educational KW - Measurement KW - Psychometrics AB - (From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from The National Council of Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. (PsycINFO Database Record (c) 2005 APA ) JF - Computer-based tests: Building the foundation for future assessment PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Using Smart Source ParsingComputer-based testing: Building the foundation for future assessments. (pp. 1-35). Mahwah, NJ : Lawrence Erlbaum Associates, Publishers. xi, 326 pp ER - TY - JOUR T1 - Computerized adaptive testing with the generalized graded unfolding model JF - Applied Psychological Measurement Y1 - 2001 A1 - Roberts, J. S. A1 - Lin, Y. A1 - Laughlin, J. E. KW - Attitude Measurement KW - College Students computerized adaptive testing KW - Computer Assisted Testing KW - Item Response KW - Models KW - Statistical Estimation KW - Theory AB - Examined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. The number items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates. These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. (PsycINFO Database Record (c) 2005 APA ) VL - 25 ER - TY - JOUR T1 - Developments in measurement of persons and items by means of item response models JF - Behaviormetrika Y1 - 2001 A1 - Sijtsma, K. KW - Cognitive KW - Computer Assisted Testing KW - Item Response Theory KW - Models KW - Nonparametric Statistical Tests KW - Processes AB - This paper starts with a general introduction into measurement of hypothetical constructs typical of the social and behavioral sciences. After the stages ranging from theory through operationalization and item domain to preliminary test or questionnaire have been treated, the general assumptions of item response theory are discussed. The family of parametric item response models for dichotomous items is introduced and it is explained how parameters for respondents and items are estimated from the scores collected from a sample of respondents who took the test or questionnaire. Next, the family of nonparametric item response models is explained, followed by the 3 classes of item response models for polytomous item scores (e.g., rating scale scores). Then, to what degree the mean item score and the unweighted sum of item scores for persons are useful for measuring items and persons in the context of item response theory is discussed. Methods for fitting parametric and nonparametric models to data are briefly discussed. Finally, the main applications of item response models are discussed, which include equating and item banking, computerized and adaptive testing, research into differential item functioning, person fit research, and cognitive modeling. (PsycINFO Database Record (c) 2005 APA ) VL - 28 ER - TY - JOUR T1 - Differences between self-adapted and computerized adaptive tests: A meta-analysis JF - Journal of Educational Measurement Y1 - 2001 A1 - Pitkin, A. K. A1 - Vispoel, W. P. KW - Adaptive Testing KW - Computer Assisted Testing KW - Scores computerized adaptive testing KW - Test KW - Test Anxiety AB - Self-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings. (PsycINFO Database Record (c) 2005 APA ) VL - 38 ER - TY - JOUR T1 - Item selection in computerized adaptive testing: Should more discriminating items be used first? JF - Journal of Educational Measurement Y1 - 2001 A1 - Hau, Kit-Tai A1 - Chang, Hua-Hua KW - ability KW - Adaptive Testing KW - Computer Assisted Testing KW - Estimation KW - Statistical KW - Test Items computerized adaptive testing AB - During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Yings (1999) stratified method had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. (PsycINFO Database Record (c) 2005 APA ) VL - 38 ER - TY - JOUR T1 - Outlier measures and norming methods for computerized adaptive tests JF - Journal of Educational and Behavioral Statistics Y1 - 2001 A1 - Bradlow, E. T. A1 - Weiss, R. E. KW - Adaptive Testing KW - Computer Assisted Testing KW - Statistical Analysis KW - Test Norms AB - Notes that the problem of identifying outliers has 2 important aspects: the choice of outlier measures and the method to assess the degree of outlyingness (norming) of those measures. Several classes of measures for identifying outliers in Computerized Adaptive Tests (CATs) are introduced. Some of these measures are constructed to take advantage of CATs' sequential choice of items; other measures are taken directly from paper and pencil (P&P) tests and are used for baseline comparisons. Assessing the degree of outlyingness of CAT responses, however, can not be applied directly from P&P tests because stopping rules associated with CATs yield examinee responses of varying lengths. Standard outlier measures are highly correlated with the varying lengths which makes comparison across examinees impossible. Therefore, 4 methods are presented and compared which map outlier statistics to a familiar probability scale (a p value). The methods are explored in the context of CAT data from a 1995 Nationally Administered Computerized Examination (NACE). (PsycINFO Database Record (c) 2005 APA ) VL - 26 ER - TY - CHAP T1 - Practical issues in setting standards on computerized adaptive tests T2 - Setting performance standards: Concepts, methods, and perspectives Y1 - 2001 A1 - Sireci, S. G. A1 - Clauser, B. E. KW - Adaptive Testing KW - Computer Assisted Testing KW - Performance Tests KW - Testing Methods AB - (From the chapter) Examples of setting standards on computerized adaptive tests (CATs) are hard to find. Some examples of CATs involving performance standards include the registered nurse exam and the Novell systems engineer exam. Although CATs do not require separate standard setting-methods, there are special issues to be addressed by test specialist who set performance standards on CATs. Setting standards on a CAT will typical require modifications on the procedures used with more traditional, fixed-form, paper-and -pencil examinations. The purpose of this chapter is to illustrate why CATs pose special challenges to the standard setter. (PsycINFO Database Record (c) 2005 APA ) JF - Setting performance standards: Concepts, methods, and perspectives PB - Lawrence Erlbaum Associates, Inc. CY - Mahwah, N.J. USA N1 - Using Smart Source ParsingSetting performance standards: Concepts, methods, and perspectives. (pp. 355-369). Mahwah, NJ : Lawrence Erlbaum Associates, Publishers. xiii, 510 pp ER - TY - JOUR T1 - Requerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and investigation in computerized adaptive testing] JF - Apuntes de Psicologia Y1 - 2001 A1 - Olea Díaz, J. A1 - Ponsoda Gil, V. A1 - Revuelta Menéndez, J. A1 - Hontangas Beltrán, P. A1 - Abad, F. J. KW - Computer Assisted Testing KW - English as Second Language KW - Psychometrics computerized adaptive testing AB - Summarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed. (PsycINFO Database Record (c) 2005 APA ) VL - 19 ER - TY - JOUR T1 - Toepassing van een computergestuurde adaptieve testprocedure op persoonlijkheidsdata [Application of a computerised adaptive test procedure on personality data] JF - Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden Y1 - 2001 A1 - Hol, A. M. A1 - Vorst, H. C. M. A1 - Mellenbergh, G. J. KW - Adaptive Testing KW - Computer Applications KW - Computer Assisted Testing KW - Personality Measures KW - Test Reliability computerized adaptive testing AB - Studied the applicability of a computerized adaptive testing procedure to an existing personality questionnaire within the framework of item response theory. The procedure was applied to the scores of 1,143 male and female university students (mean age 21.8 yrs) in the Netherlands on the Neuroticism scale of the Amsterdam Biographical Questionnaire (G. J. Wilde, 1963). The graded response model (F. Samejima, 1969) was used. The quality of the adaptive test scores was measured based on their correlation with test scores for the entire item bank and on their correlation with scores on other scales from the personality test. The results indicate that computerized adaptive testing can be applied to personality scales. (PsycINFO Database Record (c) 2005 APA ) VL - 56 ER - TY - JOUR T1 - A comparison of item selection rules at the early stages of computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Chen, S-Y. A1 - Ankenmann, R. D. A1 - Chang, Hua-Hua KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis (Test) KW - Statistical Estimation computerized adaptive testing AB - The effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) VL - 24 ER - TY - JOUR T1 - The development of a computerized version of Vandenberg's mental rotation test and the effect of visuo-spatial working memory loading JF - Dissertation Abstracts International Section A: Humanities and Social Sciences Y1 - 2000 A1 - Strong, S. D. KW - Computer Assisted Testing KW - Mental Rotation KW - Short Term Memory computerized adaptive testing KW - Test Construction KW - Test Validity KW - Visuospatial Memory AB - This dissertation focused on the generation and evaluation of web-based versions of Vandenberg's Mental Rotation Test. Memory and spatial visualization theory were explored in relation to the addition of a visuo-spatial working memory component. Analysis of the data determined that there was a significant difference between scores on the MRT Computer and MRT Memory test. The addition of a visuo-spatial working memory component did significantly affect results at the .05 alpha level. Reliability and discrimination estimates were higher on the MRT Memory version. The computerization of the paper and pencil version on the MRT did not significantly effect scores but did effect the time required to complete the test. The population utilized in the quasi-experiment consisted of 107 university students from eight institutions in engineering graphics related courses. The subjects completed two researcher developed, Web-based versions of Vandenberg's Mental Rotation Test and the original paper and pencil version of the Mental Rotation Test. One version of the test included a visuo-spatial working memory loading. Significant contributions of this study included developing and evaluating computerized versions of Vandenberg's Mental Rotation Test. Previous versions of Vandenberg's Mental Rotation Test did not take advantage of the ability of the computer to incorporate an interaction factor, such as a visuo-spatial working memory loading, into the test. The addition of an interaction factor results in a more discriminate test which will lend itself well to computerized adaptive testing practices. Educators in engineering graphics related disciplines should strongly consider the use of spatial visualization tests to aid in establishing the effects of modern computer systems on fundamental design/drafting skills. Regular testing of spatial visualization skills will result assist in the creation of a more relevant curriculum. Computerized tests which are valid and reliable will assist in making this task feasible. (PsycINFO Database Record (c) 2005 APA ) VL - 60 ER - TY - JOUR T1 - Emergence of item response modeling in instrument development and data analysis JF - Medical Care Y1 - 2000 A1 - Hambleton, R. K. KW - Computer Assisted Testing KW - Health KW - Item Response Theory KW - Measurement KW - Statistical Validity computerized adaptive testing KW - Test Construction KW - Treatment Outcomes VL - 38 ER - TY - JOUR T1 - Estimation of trait level in computerized adaptive testing JF - Applied Psychological Measurement Y1 - 2000 A1 - Cheng, P. E. A1 - Liou, M. KW - (Statistical) KW - Adaptive Testing KW - Computer Assisted Testing KW - Item Analysis KW - Statistical Estimation computerized adaptive testing AB - Notes that in computerized adaptive testing (CAT), a examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. (PsycINFO Database Record (c) 2005 APA ) VL - 24 ER - TY - JOUR T1 - An examination of the reliability and validity of performance ratings made using computerized adaptive rating scales JF - Dissertation Abstracts International: Section B: The Sciences and Engineering Y1 - 2000 A1 - Buck, D. E. KW - Adaptive Testing KW - Computer Assisted Testing KW - Performance Tests KW - Rating Scales KW - Reliability KW - Test KW - Test Validity AB - This study compared the psychometric properties of performance ratings made using recently-developed computerized adaptive rating scales (CARS) to the psyc hometric properties of ratings made using more traditional paper-and-pencil rati ng formats, i.e., behaviorally-anchored and graphic rating scales. Specifically, the reliability, validity and accuracy of the performance ratings from each for mat were examined. One hundred twelve participants viewed six 5-minute videotape s of office situations and rated the performance of a target person in each vide otape on three contextual performance dimensions-Personal Support, Organizationa l Support, and Conscientious Initiative-using CARS and either behaviorally-ancho red or graphic rating scales. Performance rating properties were measured using Shrout and Fleiss's intraclass correlation (2, 1), Borman's differential accurac y measure, and Cronbach's accuracy components as indexes of rating reliability, validity, and accuracy, respectively. Results found that performance ratings mad e using the CARS were significantly more reliable and valid than performance rat ings made using either of the other formats. Additionally, CARS yielded more acc urate performance ratings than the paper-and-pencil formats. The nature of the C ARS system (i.e., its adaptive nature and scaling methodology) and its paired co mparison judgment task are offered as possible reasons for the differences found in the psychometric properties of the performance ratings made using the variou s rating formats. (PsycINFO Database Record (c) 2005 APA ) VL - 61 ER - TY - JOUR T1 - The distribution of indexes of person fit within the computerized adaptive testing environment JF - Applied Psychological Measurement Y1 - 1997 A1 - Nering, M. L. KW - Adaptive Testing KW - Computer Assisted Testing KW - Fit KW - Person Environment AB - The extent to which a trait estimate represents the underlying latent trait of interest can be estimated by using indexes of person fit. Several statistical methods for indexing person fit have been proposed to identify nonmodel-fitting response vectors. These person-fit indexes have generally been found to follow a standard normal distribution for conventionally administered tests. The present investigation found that within the context of computerized adaptive testing (CAT) these indexes tended not to follow a standard normal distribution. As the item pool became less discriminating, as the CAT termination criterion became less stringent, and as the number of items in the pool decreased, the distributions of the indexes approached a standard normal distribution. It was determined that under these conditions the indexes' distributions approached standard normal distributions because more items were being administered. However, even when over 50 items were administered in a CAT the indexes were distributed in a fashion that was different from what was expected. (PsycINFO Database Record (c) 2006 APA ) VL - 21 N1 - Journal; Peer Reviewed Journal ER -