TY - JOUR
T1 - Item Selection and Exposure Control Methods for Computerized Adaptive Testing with Multidimensional Ranking Items
JF - Journal of Educational Measurement
Y1 - 2020
A1 - Chen, Chia-Wen
A1 - Wang, Wen-Chung
A1 - Chiu, Ming Ming
A1 - Ro, Sage
AB - The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across items). To address these issues, we introduce subpool partition strategies for item selection and within-person statement exposure control procedures. Simulations showed that the multinomial method reduces computation time while maintaining measurement precision. Both the freeze and revised Sympson-Hetter online (RSHO) methods controlled the statement exposure rate; RSHO sacrificed some measurement precision but increased pool use. Furthermore, preventing a statement's repetition on consecutive items neither hindered the effectiveness of the freeze or RSHO method nor reduced measurement precision.
VL - 57
UR - https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12252
ER -

TY - JOUR
T1 - Variable-Length Computerized Adaptive Testing Using the Higher Order DINA Model
JF - Journal of Educational Measurement
Y1 - 2015
A1 - Hsu, Chia-Ling
A1 - Wang, Wen-Chung
AB - Cognitive diagnosis models provide profile information about a set of latent binary attributes, whereas item response models yield a summary report on a latent continuous trait. To utilize the advantages of both models, higher order cognitive diagnosis models were developed in which information about both latent binary attributes and latent continuous traits is available. To facilitate the utility of cognitive diagnosis models, corresponding computerized adaptive testing (CAT) algorithms were developed. Most of them adopt the fixed-length rule to terminate CAT and are limited to ordinary cognitive diagnosis models. In this study, the higher order deterministic-input, noisy-and-gate (DINA) model was used as an example, and three criteria based on the minimum-precision termination rule were implemented: one for the latent class, one for the latent trait, and the other for both. The simulation results demonstrated that all of the termination criteria were successful when items were selected according to the Kullback-Leibler information and the posterior-weighted Kullback-Leibler information, and the minimum-precision rule outperformed the fixed-length rule with a similar test length in recovering the latent attributes and the latent trait.
VL - 52
UR - http://dx.doi.org/10.1111/jedm.12069
ER -

TY - JOUR
T1 - The Random-Threshold Generalized Unfolding Model and Its Application of Computerized Adaptive Testing
JF - Applied Psychological Measurement
Y1 - 2013
A1 - Wang, Wen-Chung
A1 - Liu, Chen-Wei
A1 - Wu, Shiu-Lien
AB - The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs Sampler) freeware, which adopts a Bayesian approach for estimation. A series of simulations was conducted to evaluate the parameter recovery of the new model and the consequences of ignoring the randomness in thresholds. The results showed that the parameters of RTGUM were recovered fairly well and that ignoring the randomness in thresholds led to biased estimates. Computerized adaptive testing was also implemented on RTGUM, where the Fisher information criterion was used for item selection and the maximum a posteriori method was used for ability estimation. The simulation study showed that the longer the test length, the smaller the randomness in thresholds, and the more categories in an item, the more precise the ability estimates would be.
VL - 37
UR - http://apm.sagepub.com/content/37/3/179.abstract
ER -
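
The Wang, Liu, and Wu (2013) record above pairs Fisher-information item selection with maximum a posteriori (MAP) ability estimation. As a rough illustration of that adaptive loop only, the Python sketch below uses a plain two-parameter logistic model as a stand-in (the RTGUM likelihood itself is more involved); the item bank, prior, grid, and function names are invented for this example and are not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2PL item bank (a = discrimination, b = difficulty); a stand-in
# for the RTGUM bank used in the paper.
N_ITEMS = 200
a = rng.uniform(0.8, 2.0, N_ITEMS)
b = rng.normal(0.0, 1.0, N_ITEMS)

def p_correct(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def map_estimate(responses, items, grid=np.linspace(-4, 4, 401)):
    """MAP estimate of theta under a standard normal prior, via a grid search."""
    log_post = -0.5 * grid**2                      # N(0, 1) prior
    for x, j in zip(responses, items):
        p = p_correct(grid, a[j], b[j])
        log_post += x * np.log(p) + (1 - x) * np.log(1.0 - p)
    return grid[np.argmax(log_post)]

def run_cat(true_theta, test_length=20):
    administered, responses = [], []
    theta_hat = 0.0
    for _ in range(test_length):
        info = fisher_info(theta_hat, a, b)
        info[administered] = -np.inf               # no item reuse
        j = int(np.argmax(info))                   # maximum-information selection
        x = int(rng.random() < p_correct(true_theta, a[j], b[j]))
        administered.append(j)
        responses.append(x)
        theta_hat = map_estimate(responses, administered)
    return theta_hat

print(run_cat(true_theta=1.0))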

TY - JOUR
T1 - Variable-Length Computerized Adaptive Testing Based on Cognitive Diagnosis Models
JF - Applied Psychological Measurement
Y1 - 2013
A1 - Hsu, Chia-Ling
A1 - Wang, Wen-Chung
A1 - Chen, Shu-Ying
AB - Interest in developing computerized adaptive testing (CAT) under cognitive diagnosis models (CDMs) has increased recently. CAT algorithms that use a fixed-length termination rule frequently lead to different degrees of measurement precision for different examinees. Fixed precision, in which the examinees receive the same degree of measurement precision, is a major advantage of CAT over nonadaptive testing. In addition to the precision issue, test security is another important issue in practical CAT programs. In this study, the authors implemented two termination criteria for the fixed-precision rule and evaluated their performance under two popular CDMs using simulations. The results showed that using the two criteria with the posterior-weighted Kullback-Leibler information procedure for selecting items could achieve the prespecified measurement precision. A control procedure was developed to control item exposure and test overlap simultaneously among examinees. The simulation results indicated that in contrast to no method of controlling exposure, the control procedure developed in this study could maintain item exposure and test overlap at the prespecified level at the expense of only a few more items.
VL - 37
UR - http://apm.sagepub.com/content/37/7/563.abstract
ER -
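
Both cognitive-diagnosis records above (Hsu & Wang, 2015; Hsu, Wang, & Chen, 2013) select items with posterior-weighted Kullback-Leibler (PWKL) information. A minimal sketch of that criterion under a plain DINA model follows; the Q-matrix, slip and guess values, and helper names are fabricated for illustration, and the higher-order structure used in the papers is omitted.

import itertools
import numpy as np

rng = np.random.default_rng(1)

K, N_ITEMS = 3, 30
Q = rng.integers(0, 2, size=(N_ITEMS, K))        # illustrative Q-matrix
Q[Q.sum(axis=1) == 0, 0] = 1                     # every item measures something
slip = rng.uniform(0.05, 0.25, N_ITEMS)
guess = rng.uniform(0.05, 0.25, N_ITEMS)

# All 2^K attribute patterns (latent classes).
patterns = np.array(list(itertools.product([0, 1], repeat=K)))

def p_correct(j, alpha):
    """DINA success probability for item j given attribute pattern(s) alpha."""
    eta = np.all(alpha >= Q[j], axis=-1).astype(float)
    return (1 - slip[j]) ** eta * guess[j] ** (1 - eta)

def pwkl(j, alpha_hat, posterior):
    """Posterior-weighted KL information of item j at the current estimate."""
    p_hat = p_correct(j, alpha_hat)
    p_c = p_correct(j, patterns)                 # probability for every class
    kl = (p_hat * np.log(p_hat / p_c)
          + (1 - p_hat) * np.log((1 - p_hat) / (1 - p_c)))
    return float(np.sum(posterior * kl))

def select_item(administered, alpha_hat, posterior):
    scores = [(-np.inf if j in administered else pwkl(j, alpha_hat, posterior))
              for j in range(N_ITEMS)]
    return int(np.argmax(scores))

# One adaptive step: uniform posterior, estimate = currently most probable class.
posterior = np.full(len(patterns), 1.0 / len(patterns))
alpha_hat = patterns[np.argmax(posterior)]
print("first item:", select_item([], alpha_hat, posterior))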

TY - JOUR
T1 - Computerized Adaptive Testing Using a Class of High-Order Item Response Theory Models
JF - Applied Psychological Measurement
Y1 - 2012
A1 - Huang, Hung-Yu
A1 - Chen, Po-Hsi
A1 - Wang, Wen-Chung
AB - In the human sciences, a common assumption is that latent traits have a hierarchical structure. Higher order item response theory models have been developed to account for this hierarchy. In this study, computerized adaptive testing (CAT) algorithms based on these kinds of models were implemented, and their performance under a variety of situations was examined using simulations. The results showed that the CAT algorithms were very effective. The progressive method for item selection, the Sympson and Hetter method with online and freeze procedure for item exposure control, and the multinomial model for content balancing can simultaneously maintain good measurement precision, item exposure control, content balance, test security, and pool usage.
VL - 36
UR - http://apm.sagepub.com/content/36/8/689.abstract
ER -
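
Huang, Chen, and Wang (2012), above, use the progressive method for item selection, which blends a random component with Fisher information and shifts the weight toward information as the test proceeds. The sketch below shows one common form of that weighting under a two-parameter logistic stand-in; the exact weighting scheme and any acceleration parameter vary across implementations, and every name and value here is illustrative rather than taken from the paper.

import numpy as np

rng = np.random.default_rng(2)

def fisher_info_2pl(theta, a, b):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def progressive_select(theta_hat, a, b, administered, n_given, test_length):
    """Progressive item selection: a weighted mix of a random component and
    Fisher information, with the information weight growing from 0 toward 1
    as the test proceeds (one simple version of the rule)."""
    info = fisher_info_2pl(theta_hat, a, b)
    info[administered] = -np.inf                 # exclude administered items
    available = info > -np.inf
    w = n_given / test_length                    # 0 at the start, near 1 at the end
    random_part = rng.uniform(0.0, info[available].max(), size=info.shape)
    score = (1 - w) * random_part + w * info
    score[~available] = -np.inf
    return int(np.argmax(score))

# Illustrative bank and a first selection (weight on information is still 0).
a = rng.uniform(0.8, 2.0, 100)
b = rng.normal(0.0, 1.0, 100)
print(progressive_select(0.0, a, b, administered=[], n_given=0, test_length=20))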

TY - JOUR
T1 - Computerized Classification Testing Under the Generalized Graded Unfolding Model
JF - Educational and Psychological Measurement
Y1 - 2011
A1 - Wang, Wen-Chung
A1 - Liu, Chen-Wei
AB - The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut score/sequential probability ratio test method, and (b) evaluated their accuracy and efficiency in classification through simulations. The results indicated that both methods were very accurate and efficient. The more points each item had and the fewer the classification categories, the more accurate and efficient the classification would be. However, the latter method may yield a very low accuracy in dichotomous items with a short maximum test length. Thus, if it is to be used to classify examinees with dichotomous items, the maximum test length should be increased.
VL - 71
UR - http://epm.sagepub.com/content/71/1/114.abstract
ER -
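
The GGUM classification study above (and the 1PL-AG study that follows) terminates testing with the sequential probability ratio test (SPRT). The sketch below shows the standard SPRT decision rule around a single cut score; it uses a two-parameter logistic likelihood as a simple stand-in rather than the GGUM or 1PL-AG likelihoods from the papers, and the indifference-region width and error rates are arbitrary example values.

import numpy as np

def sprt_decision(responses, a, b, cut, delta=0.3, alpha=0.05, beta=0.05):
    """SPRT check for a classification decision at `cut`. `responses`, `a`,
    and `b` are the scored 0/1 answers and 2PL parameters of the items given
    so far. Returns 'above', 'below', or 'continue'."""
    def p(theta):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))
    p0, p1 = p(cut - delta), p(cut + delta)      # hypotheses on either side of the cut
    x = np.asarray(responses, dtype=float)
    llr = np.sum(x * np.log(p1 / p0) + (1 - x) * np.log((1 - p1) / (1 - p0)))
    upper = np.log((1 - beta) / alpha)
    lower = np.log(beta / (1 - alpha))
    if llr >= upper:
        return "above"
    if llr <= lower:
        return "below"
    return "continue"

# Example: five correct answers on items centered at a cut score of 0.0.
print(sprt_decision([1, 1, 1, 1, 1], a=np.ones(5), b=np.zeros(5), cut=0.0))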

TY - JOUR
T1 - Computerized Classification Testing Under the One-Parameter Logistic Response Model With Ability-Based Guessing
JF - Educational and Psychological Measurement
Y1 - 2011
A1 - Wang, Wen-Chung
A1 - Huang, Sheng-Yun
AB - The one-parameter logistic model with ability-based guessing (1PL-AG) has been recently developed to account for the effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their performance. Four item selection methods (the Fisher information, the Fisher information with a posterior distribution, the progressive method, and the adjusted progressive method) and two termination criteria (the ability confidence interval [ACI] method and the sequential probability ratio test [SPRT]) were developed. In addition, the Sympson-Hetter online method with freeze (SHOF) was implemented for item exposure control. Major results include the following: (a) when no item exposure control was made, all four item selection methods yielded very similar correct classification rates, but the Fisher information method had the worst item bank usage and the highest item exposure rate; (b) SHOF successfully maintained the item exposure rate at a prespecified level without substantially compromising accuracy and efficiency in classification; (c) once SHOF was implemented, all four methods performed almost identically; (d) ACI appeared to be slightly more efficient than SPRT; and (e) in general, a higher weight of ability in guessing led to slightly higher accuracy and efficiency and a lower forced classification rate.
VL - 71
UR - http://epm.sagepub.com/content/71/6/925.abstract
ER -
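
The 1PL-AG study above controls item exposure with the Sympson-Hetter online method with freeze (SHOF). The class below sketches only the "freeze" idea, excluding items whose running exposure rate exceeds a target r_max until the rate drops, and omits the online updating of exposure-control parameters that the full procedure performs; the class and attribute names are made up for this illustration.

import numpy as np

class FreezeExposureControl:
    """Simplified 'freeze' exposure control: items whose running exposure
    rate exceeds r_max are excluded from selection until the rate drops.
    (The full Sympson-Hetter online procedure also updates exposure-control
    parameters on the fly; that part is omitted in this sketch.)"""

    def __init__(self, n_items, r_max=0.2):
        self.r_max = r_max
        self.exposure = np.zeros(n_items)   # times each item was administered
        self.examinees = 0                  # examinees seen so far

    def eligible(self):
        if self.examinees == 0:
            return np.ones_like(self.exposure, dtype=bool)
        return (self.exposure / self.examinees) < self.r_max

    def record_test(self, administered):
        self.examinees += 1
        for j in administered:
            self.exposure[j] += 1

# Usage: mask the information vector before argmax-based selection, e.g.
# info[~ctrl.eligible()] = -np.inf  inside the item-selection step.
ctrl = FreezeExposureControl(n_items=200, r_max=0.2)
print(ctrl.eligible().all())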

TY - JOUR
T1 - Implementation and Measurement Efficiency of Multidimensional Computerized Adaptive Testing
JF - Applied Psychological Measurement
Y1 - 2004
A1 - Wang, Wen-Chung
A1 - Chen, Po-Hsi
AB - Multidimensional adaptive testing (MAT) procedures are proposed for the measurement of several latent traits by a single examination. Bayesian latent trait estimation and adaptive item selection are derived. Simulations were conducted to compare the measurement efficiency of MAT with those of unidimensional adaptive testing and random administration. The results showed that the higher the correlation between latent traits, the more latent traits there were, and the more scoring levels there were in the items, the more efficient MAT was relative to the other two procedures. For tests containing multidimensional items, only MAT is applicable; unidimensional adaptive testing is not. Issues in implementing MAT are discussed.
VL - 28
UR - http://apm.sagepub.com/content/28/5/295.abstract
ER -
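
Wang and Chen (2004), above, derive Bayesian latent-trait estimation and adaptive item selection for multidimensional CAT. As a loose illustration of multidimensional selection only, the sketch below scores candidate items by how much they increase the determinant of the accumulated information matrix (a D-optimality-style rule) under a compensatory multidimensional 2PL with a correlated normal prior. This is a generic criterion, not necessarily the one derived in the paper, and all parameter values are invented.

import numpy as np

rng = np.random.default_rng(3)

D, N_ITEMS = 2, 100
A = rng.uniform(0.5, 1.5, size=(N_ITEMS, D))      # discrimination vectors
d = rng.normal(0.0, 1.0, N_ITEMS)                 # intercepts
PRIOR_COV = np.array([[1.0, 0.6], [0.6, 1.0]])    # correlated latent traits

def prob(theta, j):
    """Compensatory multidimensional 2PL success probability."""
    return 1.0 / (1.0 + np.exp(-(A[j] @ theta + d[j])))

def item_info(theta, j):
    """Fisher information matrix contribution of item j at theta."""
    p = prob(theta, j)
    return p * (1 - p) * np.outer(A[j], A[j])

def select_d_optimal(theta_hat, administered):
    """Pick the item that most increases the determinant of the accumulated
    information matrix (prior precision plus administered items)."""
    base = np.linalg.inv(PRIOR_COV) + sum(
        (item_info(theta_hat, j) for j in administered), np.zeros((D, D)))
    best, best_det = None, -np.inf
    for j in range(N_ITEMS):
        if j in administered:
            continue
        det = np.linalg.det(base + item_info(theta_hat, j))
        if det > best_det:
            best, best_det = j, det
    return best

print(select_d_optimal(np.zeros(D), administered=[]))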