%0 Journal Article %J Applied Psychological Measurement %D 2020 %T Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing %A Jing Yang %A Hua-Hua Chang %A Jian Tao %A Ningzhong Shi %X Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so as to support more efficient remediation, whereas CAT tailors optimal items to the examinee’s mastery profile. The item selection method is the key factor in the CD-CAT procedure. In recent years, a large number of parametric/nonparametric item selection methods have been proposed. In this article, the authors proposed a series of stratified item selection methods in CD-CAT, which are combined with the posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, and named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performances of the proposed item selection methods were evaluated via simulation studies and compared with the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. Item selection methods with novel stratification indices performed slightly better than those with original stratification indices, and methods without stratification performed the worst. %B Applied Psychological Measurement %V 44 %P 346-361 %U https://doi.org/10.1177/0146621619893783 %R 10.1177/0146621619893783 %0 Journal Article %J Quality of Life Research %D 2018 %T Some recommendations for developing multidimensional computerized adaptive tests for patient-reported outcomes %A Smits, Niels %A Paap, Muirne C. S. %A Böhnke, Jan R. %X Multidimensional item response theory and computerized adaptive testing (CAT) are increasingly used in mental health, quality of life (QoL), and patient-reported outcome measurement. Although multidimensional assessment techniques hold promise, they are more challenging to apply than unidimensional ones. The authors comment on minimal standards when developing multidimensional CATs. %B Quality of Life Research %V 27 %P 1055–1063 %8 Apr %U https://doi.org/10.1007/s11136-018-1821-8 %R 10.1007/s11136-018-1821-8 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Scripted On-the-fly Multistage Testing %A Edison Choe %A Bruce Williams %A Sung-Hyuck Lee %K CAT %K multistage testing %K On-the-fly testing %X

On-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.

However, these methods can be computationally expensive, thereby impeding practical implementation. Therefore, this study investigated the script method as a simpler solution to the challenge of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and generally proceeds as follows: For a test with m items, there are m slots to be filled, and an item is selected according to pre-defined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) Discard any enemies of items already administered in previous slots; 2) Draw a designated number of candidate items (selection length) from the designated collection according to the current ability estimate; 3) Randomly select one item from the set of candidates. There are two distinct features of the script method. First, a predetermined sequence of collections guarantees meeting content specifications. The specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 depict a method of exposure control, in which selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward. For the first module, randomly select each item from a designated collection. For each subsequent module, the process is the same as in scripted CAT (SCAT) except that the same ability estimate is used for the selection of all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by programming an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints. Preliminary results indicated that SOMST is a capable design with performance comparable to that of SCAT. The encouraging findings and ease of implementation make it a promising candidate for operational use in large-scale assessments.
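The slot-by-slot procedure described in this abstract translates naturally into code. The sketch below illustrates the script method as summarized above; it is not the authors' implementation, and the item fields ("collection", "enemies"), the 2PL information function, and the selection_length parameter are assumptions made for the example.

```python
import math
import random

def fisher_info(item, theta):
    """Fisher information of a 2PL item at theta (illustrative model choice)."""
    p = 1.0 / (1.0 + math.exp(-item["a"] * (theta - item["b"])))
    return item["a"] ** 2 * p * (1.0 - p)

def scripted_select(pool, collection, theta, administered, selection_length):
    """Fill one slot of a scripted test, per the three steps above."""
    given = {it["id"] for it in administered}
    blocked = set()
    for it in administered:
        blocked.update(it["enemies"])  # step 1: discard enemies
    eligible = [it for it in pool
                if it["collection"] == collection
                and it["id"] not in blocked
                and it["id"] not in given]
    # step 2: draw the `selection_length` most informative candidates
    candidates = sorted(eligible, key=lambda it: fisher_info(it, theta),
                        reverse=True)[:selection_length]
    # step 3: randomly select one item from the candidate set
    return random.choice(candidates)
```

For SOMST, per the abstract, the same theta would simply be reused for every slot within a module instead of being updated after each item.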

Presentation Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing %A Jing Yang %A Jian Tao %A Hua-Hua Chang %A Ning-Zhong Shi %X

Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees according to which skills they have mastered and which they have not, so as to support more efficient remediation. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only the specification of a Q-matrix and classifies examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with common profile estimation methods such as maximum a posteriori (MAP) estimation in CD-CAT. The simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways of generating attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.
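As a concrete illustration of classification by proximity to ideal response patterns, here is a minimal sketch of the nonparametric idea from Chiu and Douglas (2013), assuming a conjunctive (DINA-type) ideal response rule; the function names and the use of plain Hamming distance are choices made for this example.

```python
import itertools
import numpy as np

def ideal_response(alpha, Q):
    """Conjunctive ideal responses: item j is answered correctly only if
    every attribute required by row j of the Q-matrix is mastered."""
    return np.all(Q <= alpha, axis=1).astype(int)

def npc_classify(responses, Q):
    """Assign the attribute profile whose ideal response pattern is
    closest (in Hamming distance) to the observed response vector."""
    n_attributes = Q.shape[1]
    best_alpha, best_dist = None, np.inf
    for alpha in itertools.product([0, 1], repeat=n_attributes):
        eta = ideal_response(np.array(alpha), Q)
        dist = int(np.sum(np.abs(responses - eta)))
        if dist < best_dist:
            best_alpha, best_dist = alpha, dist
    return best_alpha
```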

References

Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Stochastic Curtailment of Questionnaires for Three-Level Classification: Shortening the CES-D for Assessing Low, Moderate, and High Risk of Depression %A Smits, Niels %A Finkelman, Matthew D. %A Kelderman, Henk %X In clinical assessment, efficient screeners are needed to ensure low respondent burden. In this article, Stochastic Curtailment (SC), a method for efficient computerized testing for classification into two classes for observable outcomes, was extended to three classes. In a post hoc simulation study using the item scores on the Center for Epidemiologic Studies–Depression Scale (CES-D) of a large sample, three versions of SC, SC via Empirical Proportions (SC-EP), SC via Simple Ordinal Regression (SC-SOR), and SC via Multiple Ordinal Regression (SC-MOR) were compared at both respondent burden and classification accuracy. All methods were applied under the regular item order of the CES-D and under an ordering that was optimal in terms of the predictive power of the items. Under the regular item ordering, the three methods were equally accurate, but SC-SOR and SC-MOR needed less items. Under the optimal ordering, additional gains in efficiency were found, but SC-MOR suffered from capitalization on chance substantially. It was concluded that SC-SOR is an efficient and accurate method for clinical screening. Strengths and weaknesses of the methods are discussed. %B Applied Psychological Measurement %V 40 %P 22-36 %U http://apm.sagepub.com/content/40/1/22.abstract %R 10.1177/0146621615592294 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T Stochastic Curtailment in Adaptive Mastery Testing: Improving the Efficiency of Confidence Interval–Based Stopping Rules %A Sie, Haskell %A Finkelman, Matthew D. %A Bartroff, Jay %A Thompson, Nathan A. %X A well-known stopping rule in adaptive mastery testing is to terminate the assessment once the examinee’s ability confidence interval lies entirely above or below the cut-off score. This article proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision remains the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood ability estimate. In two simulation sets, the new procedures showed a substantial reduction in average test length while maintaining similar classification accuracy to the original method. %B Applied Psychological Measurement %V 39 %P 278-292 %U http://apm.sagepub.com/content/39/4/278.abstract %R 10.1177/0146621614561314 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2014 %T The Sequential Probability Ratio Test and Binary Item Response Models %A Nydick, Steven W. %X

The sequential probability ratio test (SPRT) is a common method for terminating item response theory (IRT)-based adaptive classification tests. To decide whether a classification test should stop, the SPRT compares a simple log-likelihood ratio, based on the classification bound separating two categories, to prespecified critical values. As has been previously noted (Spray & Reckase, 1994), the SPRT test statistic is not necessarily monotonic with respect to the classification bound when item response functions have nonzero lower asymptotes. Because of nonmonotonicity, several researchers (including Spray & Reckase, 1994) have recommended selecting items at the classification bound rather than the current ability estimate when terminating SPRT-based classification tests. Unfortunately, this well-worn advice is a bit too simplistic. Items yielding optimal evidence for classification depend on the IRT model, item parameters, and location of an examinee with respect to the classification bound. The current study illustrates, in depth, the relationship between the SPRT test statistic and classification evidence in binary IRT models. Unlike earlier studies, we examine the form of the SPRT-based log-likelihood ratio while altering the classification bound and item difficulty. These investigations motivate a novel item selection algorithm based on optimizing the expected SPRT criterion given the current ability estimate. The new expected log-likelihood ratio algorithm results in test lengths noticeably shorter than current, commonly used algorithms, and with no loss in classification accuracy.
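The termination rule the abstract refers to can be made concrete. The following sketch implements a standard SPRT stopping check for a 3PL classification test, with hypotheses placed symmetrically around the cut score; the indifference-region half-width `delta` and the error rates are illustrative parameters, and this is not the article's new expected log-likelihood ratio algorithm.

```python
import math

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def sprt_decision(items, responses, cut, delta=0.3, alpha=0.05, beta=0.05):
    """SPRT termination check for a classification CAT (sketch).

    Compares the log-likelihood ratio at theta = cut +/- delta against
    Wald's critical values; returns 'pass', 'fail', or 'continue'.
    """
    llr = 0.0
    for item, u in zip(items, responses):
        p1 = p3pl(cut + delta, item["a"], item["b"], item["c"])
        p0 = p3pl(cut - delta, item["a"], item["b"], item["c"])
        llr += u * math.log(p1 / p0) + (1 - u) * math.log((1 - p1) / (1 - p0))
    if llr >= math.log((1 - beta) / alpha):
        return "pass"
    if llr <= math.log(beta / (1 - alpha)):
        return "fail"
    return "continue"
```

The article's contribution is to select items by optimizing the expected value of this criterion at the current ability estimate, rather than simply maximizing information at the classification bound.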

%B Journal of Educational and Behavioral Statistics %V 39 %P 203-230 %U http://jeb.sagepub.com/cgi/content/abstract/39/3/203 %R 10.3102/1076998614524824 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T A Sequential Procedure for Detecting Compromised Items in the Item Pool of a CAT System %A Zhang, Jinming %X

To maintain the validity of a continuous testing system, such as computerized adaptive testing (CAT), items should be monitored to ensure that their performance has not changed significantly during their lifetime in an item pool. In this article, the author developed a sequential monitoring procedure based on a series of statistical hypothesis tests to examine whether the statistical characteristics of individual items have changed significantly during test administration. Simulation studies show that, under the simulated setting, by choosing an appropriate cutoff point the procedure can control the rate of Type I errors at any reasonable significance level while maintaining a very low rate of Type II errors.
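To make the flavor of such a procedure concrete, here is a deliberately simplified sketch: administrations are grouped into windows, each window's observed proportion correct is tested against the model-predicted proportion, and the item is flagged after enough significant departures. The windowing, the z-statistic, and both cutoffs are assumptions for illustration, not the statistic developed in the article.

```python
import math

def monitor_item(batches, flag_cutoff=2, z_crit=2.58):
    """Sequentially test whether an item drifts from its calibration.

    batches: list of (observed_correct, n, expected_p) tuples, where
    expected_p is the mean model-predicted probability of a correct
    answer for the examinees in that administration window. The item
    is flagged once `flag_cutoff` windows show a significant gap.
    Illustrative scheme only, not the article's exact test statistic.
    """
    flags = 0
    for correct, n, expected_p in batches:
        se = math.sqrt(expected_p * (1 - expected_p) / n)
        z = (correct / n - expected_p) / se
        if abs(z) > z_crit:
            flags += 1
            if flags >= flag_cutoff:
                return "flag for review"
    return "no significant change"
```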

%B Applied Psychological Measurement %V 38 %P 87-104 %U http://apm.sagepub.com/content/38/2/87.abstract %R 10.1177/0146621613510062 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Stratified Item Selection and Exposure Control in Unidimensional Adaptive Testing in the Presence of Two-Dimensional Data %A Kalinowski, Kevin E. %A Natesan, Prathiba %A Henson, Robin K. %X

It is not uncommon to use unidimensional item response theory models to estimate ability in multidimensional data with computerized adaptive testing (CAT). The current Monte Carlo study investigated the penalty of this model misspecification in CAT implementations using different item selection methods and exposure control strategies. Three item selection methods—maximum information (MAXI), a-stratification (STRA), and a-stratification with b-blocking (STRB)—with and without the Sympson–Hetter (SH) exposure control strategy were investigated. Calibrating multidimensional items as unidimensional items resulted in inaccurate item parameter estimates. Therefore, MAXI performed better than STRA and STRB in estimating the ability parameters. However, all three methods had relatively large standard errors. SH exposure control had no impact on the number of overexposed items. Existing unidimensional CAT implementations might consider using MAXI only if recalibration with a multidimensional model is too expensive. Otherwise, building a CAT pool by calibrating multidimensional data as unidimensional is not recommended.
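For readers unfamiliar with the stratified designs being compared, the following sketch shows the core of a-stratified selection (STRA): the pool is partitioned into strata of increasing discrimination, and difficulty is matched to the interim ability estimate within the stratum for the current stage. The field names and the even stratum split are illustrative assumptions; STRB additionally forms strata so that difficulty is balanced across them (b-blocking).

```python
def a_stratified_select(pool, theta, administered, n_strata, test_length):
    """a-stratified (STRA) item selection, sketched.

    Sort the pool by the discrimination parameter a, cut it into
    `n_strata` strata, and pick from the stratum matching the current
    stage of the test: low-a items early, high-a items late. Within
    the stratum, choose the item whose difficulty b is closest to the
    interim ability estimate.
    """
    ranked = sorted(pool, key=lambda it: it["a"])
    size = len(ranked) // n_strata
    stage = min(len(administered) * n_strata // test_length, n_strata - 1)
    stratum = ranked[stage * size:(stage + 1) * size]
    available = [it for it in stratum if it not in administered]
    return min(available, key=lambda it: abs(it["b"] - theta))
```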

%B Applied Psychological Measurement %V 38 %P 563-576 %U http://apm.sagepub.com/content/38/7/563.abstract %R 10.1177/0146621614536768 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2013 %T A Semiparametric Model for Jointly Analyzing Response Times and Accuracy in Computerized Testing %A Wang, Chun %A Fan, Zhewen %A Chang, Hua-Hua %A Douglas, Jeffrey A. %X

The item response times (RTs) collected from computerized testing represent an underutilized type of information about items and examinees. In addition to knowing the examinees’ responses to each item, we can investigate the amount of time examinees spend on each item. Current RT models are mainly parametric, which has the advantage of conciseness but may reduce the flexibility needed to fit real data. We propose a semiparametric approach, specifically the Cox proportional hazards model with a latent speed covariate, to model the RTs, embedded within the hierarchical framework proposed by van der Linden to model RTs and response accuracy simultaneously. This semiparametric approach combines the flexibility of nonparametric modeling with the brevity and interpretability of parametric modeling. A Markov chain Monte Carlo method for parameter estimation is given and may be used with sparse data obtained by computerized adaptive testing. Both simulation studies and real data analysis are carried out to demonstrate the applicability of the new model.
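In symbols, the model described can plausibly be written as below; the notation and the sign convention on the speed parameter are assumptions based on the abstract rather than the authors' exact formulation.

```latex
% Cox proportional hazards model for the response time of examinee j
% on item i, with nonparametric baseline hazard h_{0i}(t) and latent
% speed \tau_j as covariate (sign convention assumed):
h_i(t \mid \tau_j) \;=\; h_{0i}(t)\, \exp(\tau_j)
% A larger \tau_j raises the hazard of completing the item at every t,
% i.e., a faster examinee; leaving h_{0i}(t) unspecified is what makes
% the model semiparametric.
```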

%B Journal of Educational and Behavioral Statistics %V 38 %P 381-417 %U http://jeb.sagepub.com/cgi/content/abstract/38/4/381 %R 10.3102/1076998612461831 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2013 %T Speededness and Adaptive Testing %A van der Linden, Wim J. %A Xiong, Xinhui %X

Two simple constraints on the item parameters in a response–time model are proposed to control the speededness of an adaptive test. Because the constraints are additive, they can easily be included in the constraint set for a shadow-test approach (STA) to adaptive testing. Alternatively, a simple heuristic is presented to control speededness in plain adaptive testing without any constraints. Both types of control are easy to implement and require no real-time parameter estimation during the test other than the regular update of the test taker’s ability estimate. Evaluation of the two approaches using simulated adaptive testing showed that the STA was especially effective. It guaranteed testing times that differed by less than 10 seconds from a reference test across a variety of conditions.
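Since the abstract emphasizes that the constraints are additive and therefore fit the shadow-test constraint set, a rough rendering of the idea follows; it assumes a lognormal RT model with time intensity parameters, and the exact constraints in the paper may differ.

```latex
% Additive control of speededness in the shadow-test program (sketch):
% x_i \in \{0,1\} indicates whether item i enters the shadow test,
% \beta_i is item i's time intensity, and b_{\max} is a bound chosen
% to match a reference test.
\sum_{i=1}^{I} \beta_i \, x_i \;\le\; b_{\max}
% Because the constraint is linear (additive) in the x_i, it can be
% added to the existing shadow-test constraint set without changing
% the structure of the optimization problem.
```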

%B Journal of Educational and Behavioral Statistics %V 38 %P 418-438 %U http://jeb.sagepub.com/cgi/content/abstract/38/4/418 %R 10.3102/1076998612466143 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T A Stochastic Method for Balancing Item Exposure Rates in Computerized Classification Tests %A Huebner, Alan %A Li, Zhushan %X

Computerized classification tests (CCTs) classify examinees into categories such as pass/fail, master/nonmaster, and so on. This article proposes the use of stochastic methods from sequential analysis to address item overexposure, a practical concern in operational CCTs. Item overexposure is traditionally dealt with in CCTs by the Sympson-Hetter (SH) method, but this method is unable to restrict the exposure of the most informative items to the desired level. The authors’ new method of stochastic item exposure balance (SIEB) works in conjunction with the SH method and is shown to greatly reduce the number of overexposed items in a pool and improve overall exposure balance while maintaining classification accuracy comparable with using the SH method alone. The method is demonstrated using a simulation study.

%B Applied Psychological Measurement %V 36 %P 181-188 %U http://apm.sagepub.com/content/36/3/181.abstract %R 10.1177/0146621612439932 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Small-Sample Shadow Testing %A Wallace Judd %K CAT %K shadow test %B Annual Conference of the International Association for Computerized Adaptive Testing %G eng %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Sequencing an Adaptive Test Battery %A van der Linden, W. J. %B Elements of Adaptive Testing %G eng %& 5 %R 10.1007/978-0-387-85461-8 %0 Generic %D 2010 %T SimulCAT: Windows application that simulates computerized adaptive test administration %A Han, K. T. %G eng %U http://www.hantest.net/simulcat %0 Journal Article %J Journal of Educational Measurement %D 2010 %T Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing %A Deng, Hui %A Ansley, Timothy %A Chang, Hua-Hua %X

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.

%B Journal of Educational Measurement %V 47 %P 202–226 %U http://dx.doi.org/10.1111/j.1745-3984.2010.00109.x %R 10.1111/j.1745-3984.2010.00109.x %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Studying the Equivalence of Computer-Delivered and Paper-Based Administrations of the Raven Standard Progressive Matrices Test %A Arce-Ferrer, Alvaro J. %A Martínez Guzmán, Elvira %X

This study investigates the effect of mode of administration of the Raven Standard Progressive Matrices test on distribution, accuracy, and meaning of raw scores. A random sample of high school students take counterbalanced paper-and-pencil and computer-based administrations of the test and answer a questionnaire surveying preferences for computer-delivered test administrations. Administration mode effect is studied with repeated measures multivariate analysis of variance, internal consistency reliability estimates, and confirmatory factor analysis approaches. Results show a lack of test mode effect on distribution, accuracy, and meaning of raw scores. Participants indicate their preferences for the computer-delivered administration of the test. The article discusses findings in light of previous studies of the Raven Standard Progressive Matrices test.

%B Educational and Psychological Measurement %V 69 %P 855-867 %U http://epm.sagepub.com/content/69/5/855.abstract %R 10.1177/0013164409332219 %0 Journal Article %J Applied Psychological Measurement %D 2008 %T Severity of Organized Item Theft in Computerized Adaptive Testing: A Simulation Study %A Qing Yi %A Jinming Zhang %A Chang, Hua-Hua %X

Criteria had been proposed for assessing the severity of possible test security violations for computerized tests with high-stakes outcomes. However, these criteria resulted from theoretical derivations that assumed uniformly randomized item selection. This study investigated potential damage caused by organized item theft in computerized adaptive testing (CAT) for two realistic item selection methods, maximum item information and a-stratified with content blocking, using the randomized method as a baseline for comparison. Damage caused by organized item theft was evaluated by the number of compromised items each examinee could encounter and the impact of the compromised items on examinees' ability estimates. Severity of test security violation was assessed under self-organized and organized item theft simulation scenarios. Results indicated that though item theft could cause severe damage to CAT with either item selection method, the maximum item information method was more vulnerable to the organized item theft simulation than was the a-stratified method.

%B Applied Psychological Measurement %V 32 %P 543-558 %U http://apm.sagepub.com/content/32/7/543.abstract %R 10.1177/0146621607311336 %0 Journal Article %J Zeitschrift für Psychologie %D 2008 %T Some new developments in adaptive testing technology %A van der Linden, W. J. %K computerized adaptive testing %X

In an ironic twist of history, modern psychological testing has returned to an adaptive format quite common when testing was not yet standardized. Important stimuli to the renewed interest in adaptive testing have been the development of item-response theory in psychometrics, which models the responses on test items using separate parameters for the items and test takers, and the use of computers in test administration, which enables us to estimate the parameter for a test taker and select the items in real time. This article reviews a selection from the latest developments in the technology of adaptive testing, such as constrained adaptive item selection, adaptive testing using rule-based item generation, multidimensional adaptive testing, adaptive use of test batteries, and the use of response times in adaptive testing.

%B Zeitschrift für Psychologie %V 216 %P 3-11 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2008 %T Strategies for controlling item exposure in computerized adaptive testing with the partial credit model %A Davis, L. L. %A Dodd, B. G. %K *Algorithms %K *Computers %K *Educational Measurement/statistics & numerical data %K Humans %K Questionnaires/*standards %K United States %X Exposure control research with polytomous item pools has determined that randomization procedures can be very effective for controlling test security in computerized adaptive testing (CAT). The current study investigated the performance of four procedures for controlling item exposure in a CAT under the partial credit model. In addition to a no exposure control baseline condition, the Kingsbury-Zara, modified-within-.10-logits, Sympson-Hetter, and conditional Sympson-Hetter procedures were implemented to control exposure rates. The Kingsbury-Zara and the modified-within-.10-logits procedures were implemented with 3 and 6 item candidate conditions. The results show that the Kingsbury-Zara and modified-within-.10-logits procedures with 6 item candidates performed as well as the conditional Sympson-Hetter in terms of exposure rates, overlap rates, and pool utilization. These two procedures are strongly recommended for use with partial credit CATs due to their simplicity and the strength of their results. %B Journal of Applied Measurement %7 2008/01/09 %V 9 %P 1-17 %@ 1529-7713 (Print) 1529-7713 (Linking) %G eng %M 18180546 %0 Journal Article %J Educational and Psychological Measurement %D 2008 %T A Strategy for Controlling Item Exposure in Multidimensional Computerized Adaptive Testing %A Lee, Yi-Hsuan %A Ip, Edward H. %A Fuh, Cheng-Der %X

Although computerized adaptive tests have enjoyed tremendous growth, solutions for important problems remain unavailable. One problem is the control of item exposure rate. Because adaptive algorithms are designed to select optimal items, they choose items with high discriminating power. Thus, these items are selected more often than others, leading to both overexposure and underutilization of some parts of the item pool. Overused items are often compromised, creating a security problem that could threaten the validity of a test. Building on a previously proposed stratification scheme to control the exposure rate for one-dimensional tests, the authors extend their method to multidimensional tests. A strategy is proposed based on stratification in accordance with a functional of the vector of the discrimination parameter, which can be implemented with minimal computational overhead. Both theoretical and empirical validation studies are provided. Empirical results indicate significant improvement over the commonly used method of controlling exposure rate that requires only a reasonable sacrifice in efficiency.
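A minimal sketch of the stratification step may help. The paper stratifies on a functional of the item discrimination vector; the Euclidean norm used below is one plausible choice of functional, and the data layout is assumed for the example.

```python
import math

def stratify_by_a_functional(pool, n_strata):
    """Stratify a multidimensional item pool on a functional of each
    item's discrimination vector. The Euclidean norm is one plausible
    functional; the paper treats the functional generally.

    pool: list of dicts whose "a" key holds a per-dimension list of
    discrimination parameters. Returns strata ordered from least to
    most discriminating.
    """
    ranked = sorted(pool,
                    key=lambda it: math.sqrt(sum(a * a for a in it["a"])))
    size = math.ceil(len(ranked) / n_strata)
    return [ranked[i * size:(i + 1) * size] for i in range(n_strata)]
```

Selection can then proceed stratum by stratum as in unidimensional a-stratification, which is consistent with the paper's claim of minimal computational overhead.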

%B Educational and Psychological Measurement %V 68 %P 215-232 %U http://epm.sagepub.com/content/68/2/215.abstract %R 10.1177/0013164407307007 %0 Book Section %D 2007 %T The shadow-test approach: A universal framework for implementing adaptive testing %A van der Linden, W. J. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2007 %T Some thoughts on controlling item exposure in adaptive testing %A Lewis, C. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2007 %T Statistical aspects of adaptive testing %A van der Linden, W. J. %A Glas, C. A. W. %C C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (Vol. 27: Psychometrics) (pp. 801-838). Amsterdam: North-Holland. %G eng %0 Journal Article %J Journal of Pain Symptom Management %D 2007 %T A system for interactive assessment and management in palliative care %A Chang, C-H. %A Boni-Saenz, A. A. %A Durazo-Arvizu, R. A. %A DesHarnais, S. %A Lau, D. T. %A Emanuel, L. L. %K *Needs Assessment %K Humans %K Medical Informatics/*organization & administration %K Palliative Care/*organization & administration %X The availability of psychometrically sound and clinically relevant screening, diagnosis, and outcome evaluation tools is essential to high-quality palliative care assessment and management. Such data will enable us to improve patient evaluations, prognoses, and treatment selections, and to increase patient satisfaction and quality of life. To accomplish these goals, medical care needs more precise, efficient, and comprehensive tools for data acquisition, analysis, interpretation, and management. We describe a system for interactive assessment and management in palliative care (SIAM-PC), which is patient centered, model driven, database derived, evidence based, and technology assisted. The SIAM-PC is designed to reliably measure the multiple dimensions of patients' needs for palliative care, and then to provide information to clinicians, patients, and the patients' families to achieve optimal patient care, while improving our capacity for doing palliative care research. This system is innovative in its application of the state-of-the-science approaches, such as item response theory and computerized adaptive testing, to many of the significant clinical problems related to palliative care. %B Journal of Pain Symptom Management %7 2007/03/16 %V 33 %P 745-55 %@ 0885-3924 (Print) %G eng %M 17360148 %0 Journal Article %J Clin Rehabil %D 2006 %T Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme %A Haley, S. M. %A Fragala-Pinkham, M. A. %A Ni, P. %B Clin Rehabil %V 20 %P 616-622 %0 Journal Article %J International Journal of Testing %D 2006 %T Sequential Computerized Mastery Tests—Three Simulation Studies %A Wiberg, Marie %B International Journal of Testing %V 6 %P 41-55 %U http://www.tandfonline.com/doi/abs/10.1207/s15327574ijt0601_3 %R 10.1207/s15327574ijt0601_3 %0 Journal Article %J Applied Psychological Measurement %D 2006 %T SIMCAT 1.0: A SAS computer program for simulating computer adaptive testing %A Raîche, G. %A Blais, J-G. %K computer adaptive testing %K computer program %K estimated proficiency level %K Monte Carlo methodologies %K Rasch logistic model %X Monte Carlo methodologies are frequently applied to study the sampling distribution of the estimated proficiency level in adaptive testing. 
These methods eliminate real situational constraints. However, these Monte Carlo methodologies are not currently supported by the available software programs, and when these programs are available, their flexibility is limited. SIMCAT 1.0 is aimed at the simulation of adaptive testing sessions under different adaptive expected a posteriori (EAP) proficiency-level estimation methods (Blais & Raîche, 2005; Raîche & Blais, 2005) based on the one-parameter Rasch logistic model. These methods are all adaptive in the a priori proficiency-level estimation, the proficiency-level estimation bias correction, the integration interval, or a combination of these factors. The use of these adaptive EAP estimation methods diminishes considerably the shrinking, and therefore biasing, effect of the estimated a priori proficiency level encountered when this a priori is fixed at a constant value independently of the computed previous value of the proficiency level. SIMCAT 1.0 also computes empirical and estimated skewness and kurtosis coefficients, such as the standard error, of the estimated proficiency-level sampling distribution. In this way, the program allows one to compare empirical and estimated properties of the estimated proficiency-level sampling distribution under different variations of the EAP estimation method: standard error and bias, like the skewness and kurtosis coefficients. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Applied Psychological Measurement %I Sage Publications: US %V 30 %P 60-61 %@ 0146-6216 (Print) %G eng %M 2005-16359-005 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function %A Hart, D. L. %A Mioduski, J. E. %A Werneke, M. W. %A Stratford, P. W. %K Back Pain Functional Scale %K computerized adaptive testing %K Item Response Theory %K Lumbar spine %K Rehabilitation %K True-score equating %X Objective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (qIRT) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (qCAT). Methods: We performed a secondary analysis of retrospective intake rehabilitation data. Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine specific CAT was developed that was 72% more efficient than using all 25 items to estimate LFS measures. qIRT and qCAT measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. qCAT measures were as precise as qIRT measures. Conclusion: A body part specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity. 
%B Journal of Clinical Epidemiology %V 59 %P 947–956 %G eng %R 10.1016/j.jclinepi.2005.10.017 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function %A Hart, D. L. %A Cook, K. F. %A Mioduski, J. E. %A Teal, C. R. %A Crane, P. K. %K *Computer Simulation %K *Range of Motion, Articular %K Activities of Daily Living %K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Prospective Studies %K Reproducibility of Results %K Research Support, N.I.H., Extramural %K Research Support, U.S. Gov't, Non-P.H.S. %K Shoulder Dislocation/*physiopathology/psychology/rehabilitation %K Shoulder Pain/*physiopathology/psychology/rehabilitation %K Shoulder/*physiopathology %K Sickness Impact Profile %K Treatment Outcome %X BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. %B Journal of Clinical Epidemiology %V 59 %P 290-8 %G eng %M 16488360 %0 Journal Article %J Journal of Clinical Epidemiology %D 2005 %T Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments %A Hart, D. L. %A Mioduski, J. E. %A Stratford, P. W. %K *Health Status Indicators %K Activities of Daily Living %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Ankle Joint/physiopathology %K Diagnosis, Computer-Assisted/*methods %K Female %K Hip Joint/physiopathology %K Humans %K Joint Diseases/physiopathology/*rehabilitation %K Knee Joint/physiopathology %K Lower Extremity/*physiopathology %K Male %K Middle Aged %K Research Support, N.I.H., Extramural %K Research Support, U.S. Gov't, P.H.S. %K Retrospective Studies %X BACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. 
Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity. %B Journal of Clinical Epidemiology %V 58 %P 629-38 %G eng %M 15878477 %0 Journal Article %J Testing Psicometria Metodologia %D 2005 %T Somministrazione di test computerizzati di tipo adattivo: Un' applicazione del modello di misurazione di Rasch [Administration of computerized and adaptive tests: An application of the Rasch Model] %A Miceli, R. %A Molinengo, G. %K Adaptive Testing %K Computer Assisted Testing %K Item Response Theory computerized adaptive testing %K Models %K Psychometrics %X The aim of the present study is to describe the characteristics of a procedure for administering computerized and adaptive tests (Computer Adaptive Testing or CAT). Items to be asked to the individuals are interactively chosen and are selected from a "bank" in which they were previously calibrated and recorded on the basis of their difficulty level. The selection of items is performed by increasingly more accurate estimates of the examinees' ability. The building of an item-bank on Psychometrics and the implementation of this procedure allow a first validation through Monte Carlo simulations. (PsycINFO Database Record (c) 2006 APA ) (journal abstract) %B Testing Psicometria Metodologia %V 12 %P 131-149 %G eng %0 Generic %D 2005 %T Strategies for controlling item exposure in computerized adaptive testing with the partial credit model %A Davis, L. L. %A Dodd, B. %C Pearson Educational Measurement Research Report 05-01 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2004 %T Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Kosinski, M. %A Ni, P. %K Boston %K Factor Analysis, Statistical %K Humans %K Outcome Assessment (Health Care)/*methods %K Prospective Studies %K Questionnaires/standards %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. 
MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range, .90-.95; 10-item CAT r range, .96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. %B Archives of Physical Medicine and Rehabilitation %7 2004/04/15 %V 85 %P 661-6 %8 Apr %@ 0003-9993 (Print) %G eng %M 15083444 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T A sequential Bayesian procedure for item calibration in multistage testing %A van der Linden, W. J. %A Alan D Mead %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Journal of Statistical Planning and Inference %D 2004 %T Sequential estimation in variable length computerized adaptive testing %A Chang, I. Y. %X With the advent of modern computer technology, there have been growing efforts in recent years to computerize standardized tests, including the popular Graduate Record Examination (GRE), the Graduate Management Admission Test (GMAT) and the Test of English as a Foreign Language (TOEFL). Many such computer-based tests are known as computerized adaptive tests, a major feature of which is that, depending on their performance in the course of testing, different examinees may be given different sets of items (questions). In doing so, items can be efficiently utilized to yield maximum accuracy for estimation of examinees’ ability traits. We consider, in this article, one type of such tests where test lengths vary with examinees to yield approximately the same predetermined accuracy for all ability traits. A comprehensive large sample theory is developed for the expected test length and the sequential point and interval estimates of the latent trait. Extensive simulations are conducted with results showing that the large sample approximations are adequate for realistic sample sizes. %B Journal of Statistical Planning and Inference %V 121 %P 249-264 %@ 0378-3758 %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T A sharing item response theory model for computerized adaptive testing %A Segall, D. O. %X A new sharing item response theory (SIRT) model is presented which explicitly models the effects of sharing item content between informants and test takers. 
This model is used to construct adaptive item selection and scoring rules that provide increased precision and reduced score gains in instances where sharing occurs. The adaptive item selection rules are expressed as functions of the item’s exposure rate in addition to other commonly used properties (characterized by difficulty, discrimination, and guessing parameters). Based on the results of simulated item responses, the new item selection and scoring algorithms compare favorably to the Sympson-Hetter exposure control method. The new SIRT approach provides higher reliability and lower score gains in instances where sharing occurs. %B Journal of Educational and Behavioral Statistics %V 29 %P 439-460 %8 Win %G eng %0 Journal Article %J International Journal of Artificial Intelligence in Education %D 2004 %T Siette: a web-based tool for adaptive testing %A Conejo, R %A Guzmán, E %A Millán, E %A Trella, M %A Pérez-De-La-Cruz, JL %A Ríos, A %K computerized adaptive testing %B International Journal of Artificial Intelligence in Education %V 14 %P 29-61 %G eng %0 Book Section %D 2004 %T State-of-the-art and adaptive open-closed items in adaptive foreign language assessment %A Giouroglou, H. %A Economides, A. A. %C Proceedings of the 4th Hellenic Conference with International Participation: Informational and Communication Technologies in Education, Athens, 747-756 %G eng %0 Journal Article %J Metodologia de Las Ciencias del Comportamiento %D 2004 %T Statistics for detecting disclosed items in a CAT environment %A Lu, Y. %A Hambleton, R. K. %B Metodologia de Las Ciencias del Comportamiento %V 5 %G eng %N 2 %& pp. 225-242 %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Strategies for Controlling Item Exposure in Computerized Adaptive Testing With the Generalized Partial Credit Model %A Davis, Laurie Laughlin %X

Choosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline condition, the randomesque, modified-within-.10-logits, Sympson-Hetter, conditional Sympson-Hetter, a-stratified with multiple-stratification, and enhanced a-stratified with multiple-stratification procedures were implemented to control exposure rates. Two variations of the randomesque and modified-within-.10-logits procedures were examined, which varied the size of the item group from which the next item to be administered was randomly selected. The results indicate that although the conditional Sympson-Hetter provides somewhat lower maximum exposure rates, the randomesque and modified-within-.10-logits procedures with the six-item group variation have great utility for controlling overlap rates and increasing pool utilization and should be given further consideration.
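For reference, the two randomization procedures that performed well here are simple to state in code. The sketch below shows common forms of the randomesque and within-.10-logits ideas for a scalar difficulty; it is a generic illustration with assumed field names, not the study's implementation (which worked with generalized partial credit items).

```python
import random

def randomesque(eligible, info, group_size=6):
    """Randomesque exposure control, sketched: administer a random draw
    from the `group_size` most informative eligible items rather than
    the single best one. `info` maps an item to its information at the
    current trait estimate."""
    top = sorted(eligible, key=info, reverse=True)[:group_size]
    return random.choice(top)

def within_logits(eligible, theta, window=0.10, group_size=6):
    """Modified-within-.10-logits, sketched for a scalar difficulty b:
    randomly pick among up to `group_size` items whose difficulty lies
    within `window` logits of the best-matching item."""
    ranked = sorted(eligible, key=lambda it: abs(it["b"] - theta))
    best = abs(ranked[0]["b"] - theta)
    close = [it for it in ranked
             if abs(it["b"] - theta) <= best + window][:group_size]
    return random.choice(close)
```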

%B Applied Psychological Measurement %V 28 %P 165-185 %U http://apm.sagepub.com/content/28/3/165.abstract %R 10.1177/0146621604264133 %0 Journal Article %J Dissertation Abstracts International: Section B: The Sciences & Engineering %D 2004 %T Strategies for controlling testlet exposure rates in computerized adaptive testing systems %A Boyd, Aimee Michelle %X Exposure control procedures in computerized adaptive testing (CAT) systems protect item pools from being compromised, however, this impacts measurement precision. Previous research indicates that exposure control procedures perform differently for dichotomously scored versus polytomously scored CAT systems. For dichotomously scored CATs, conditional selection procedures are often the optimal choice, while randomization procedures perform best for polytomously scored CATs. CAT systems modeled with testlet response theory have not been examined to determine optimal exposure control procedures. This dissertation examined various exposure control procedures in testlet-based CAT systems using the three-parameter logistic testlet response theory model and the partial credit model. The exposure control procedures were the randomesque procedure, the modified within .10 logits procedure, two levels of the progressive restricted procedure, and two levels of the Sympson-Hetter procedure. Each of these was compared to a baseline no exposure control procedure, maximum information. The testlets were reading passages with six to ten multiple-choice items. The CAT systems consisted of maximum information testlet selection contingent on an exposure control procedure and content balancing for passage type and the number of items per passage; expected a posteriori ability estimation; and a fixed length stopping rule of seven testlets totaling fifty multiple-choice items. 
Measurement precision and exposure rates were examined to evaluate the effectiveness of the exposure control procedures for each measurement model. The exposure control procedures yielded similar results for measurement precision within the models. The exposure rates distinguished which exposure control procedures were most effective. The Sympson-Hetter conditions, which are conditional procedures, maintained the pre-specified maximum exposure rate, but performed very poorly in terms of pool utilization. The randomization procedures, randomesque and modified within .10 logits, yielded low maximum exposure rates, but used only about 70% of the testlet pool. Surprisingly, the progressive restricted procedure, which is a combination of a conditional and a randomization procedure, yielded the best results in its ability to maintain and control the maximum exposure rate, and it used the entire testlet pool. The progressive restricted conditions were the optimal procedures for both the partial credit CAT systems and the three-parameter logistic testlet response theory CAT systems. (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Dissertation Abstracts International: Section B: The Sciences & Engineering %V 64 %P 5835 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T A study of multiple stage adaptive test designs %A Armstrong, R. D. %A Edmonds, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Generic %D 2003 %T A sequential Bayes procedure for item calibration in multi-stage testing %A van der Linden, W. J. %A Alan D Mead %C Manuscript in preparation %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T A simulation study to compare CAT strategies for cognitive diagnosis %A Xu, X. %A Chang, Hua-Hua %A Douglas, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Small sample estimation in dichotomous item response models: Effect of priors based on judgmental information on the accuracy of item parameter estimates %A Swaminathan, H. %A Hambleton, R. K. %A Sireci, S. G. %A Xing, D. %A Rizavi, S. M. %X Large item banks with properly calibrated test items are essential for ensuring the validity of computer-based tests. At the same time, item calibrations with small samples are desirable to minimize the amount of pretesting and limit item exposure. Bayesian estimation procedures show considerable promise with small examinee samples. The purposes of the study were (a) to examine how prior information for Bayesian item parameter estimation can be specified and (b) to investigate the relationship between sample size and the specification of prior information on the accuracy of item parameter estimates. The results of the simulation study were clear: Estimation of item response theory (IRT) model item parameters can be improved considerably. Improvements in the one-parameter model were modest; considerable improvements with the two- and three-parameter models were observed. Both the study of different forms of priors and ways to improve the judgmental data used in forming the priors appear to be promising directions for future research. 
%0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2003 %T Some alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing %A van der Linden, W. J. %K Adaptive Testing %K Computer Assisted Testing %K Test Items %K computerized adaptive testing %X The Hetter and Sympson (1997, 1985) method (the SH method) is a method of probabilistic item-exposure control in computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees' ability parameter. Formal properties of the method are identified that help us explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test from an item pool from the Law School Admission Test (LSAT). Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. %B Journal of Educational and Behavioral Statistics %V 28 %P 249-265 %G eng
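The SH method discussed in the entry above gates administration probabilistically: once item i is selected, it is actually administered only with probability K_i, and the K_i must be calibrated in advance by the iterative simulations whose slowness the abstract analyzes. A minimal sketch of the operational gate only (hypothetical item IDs and control values; not the calibration step):

import random

def select_with_sh(ranked_items, K):
    # Walk down the information-ranked candidate list; the first item
    # that passes its Sympson-Hetter gate K[i] is administered, so
    # popular items are held back at rate 1 - K[i].
    for i in ranked_items:
        if random.random() < K.get(i, 1.0):
            return i
    return ranked_items[-1]  # fallback: administer the last candidate

# Hypothetical control parameters, normally set beforehand by
# iterative simulation over the intended examinee population.
K = {101: 0.45, 102: 0.80, 103: 1.00}
print(select_with_sh([101, 102, 103], K))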
%0 Conference Paper %B Paper Prepared for Presentation at the Annual Conference of the Canadian Society for Studies in Education %D 2003 %T Standard-setting issues in computerized-adaptive testing %A Gushta, M. M. %B Paper Prepared for Presentation at the Annual Conference of the Canadian Society for Studies in Education %C Halifax, Nova Scotia, May 30th, 2003 %G eng
%0 Journal Article %J Dissertation Abstracts International: Section B: The Sciences & Engineering %D 2003 %T Statistical detection and estimation of differential item functioning in computerized adaptive testing %A Feng, X. %X Differential item functioning (DIF) is an important issue in large-scale standardized testing. DIF refers to an unexpected difference in item performance among groups of equally proficient examinees, usually classified by ethnicity or gender. Its presence could seriously affect the validity of inferences drawn from a test. Various statistical methods have been proposed to detect and estimate DIF. This dissertation addresses DIF analysis in the context of computerized adaptive testing (CAT), whose item selection algorithm adapts to the ability level of each individual examinee. In a CAT, a DIF item may be more consequential and more detrimental because fewer items are administered in a CAT than in a traditional paper-and-pencil test and because the remaining sequence of items presented to examinees depends in part on their responses to the DIF item. Consequently, an efficient, stable, and flexible method to detect and estimate CAT DIF becomes necessary and increasingly important. We propose simultaneous implementations of online calibration and DIF testing. The idea is to perform online calibration of an item of interest separately in the focal and reference groups. Under any specific parametric IRT model, we can use the (online) estimated latent traits as covariates and fit a nonlinear regression model to each of the two groups. Because of the use of the estimated, rather than the true, latent traits, the regression fit has to adjust for the covariate "measurement errors". It turns out that this situation fits nicely into the framework of nonlinear errors-in-variables modelling, which has been extensively studied in the statistical literature. We develop two bias-correction methods using asymptotic expansion and conditional score theory. After correcting the bias caused by measurement error, one can perform a significance test to detect DIF with the parameter estimates for different groups. This dissertation also discusses some general techniques to handle measurement error modelling with different IRT models, including the three-parameter normal ogive model and polytomous response models. Several methods of estimating DIF are studied as well. Large sample properties are established to justify the proposed methods. Extensive simulation studies show that the resulting methods perform well in terms of Type-I error rate control, accuracy in estimating DIF, and power against both unidirectional and crossing DIF. (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Dissertation Abstracts International: Section B: The Sciences & Engineering %V 64 %P 2736 %G eng
%0 Journal Article %J Dissertation Abstracts International: Section B: The Sciences & Engineering %D 2003 %T Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items %A Davis, L. L. %X Choosing a strategy for controlling the exposure of items to examinees has become an integral part of test development for computerized adaptive testing (CAT). Item exposure can be controlled through the use of a variety of algorithms which modify the CAT item selection process. This may be done through a randomization, conditional selection, or stratification approach. The effectiveness of each procedure, as well as the degree to which measurement precision is sacrificed, has been extensively studied with dichotomously scored item pools. However, only recently have researchers begun to examine these procedures in polytomously scored item pools. The current study investigated the performance of six different exposure control mechanisms under three polytomous IRT models in terms of measurement precision, test security, and ease of implementation. The three models examined in the current study were the partial credit, generalized partial credit, and graded response models. In addition to a no exposure control baseline condition, the randomesque, within .10 logits, Sympson-Hetter, conditional Sympson-Hetter, a-stratified, and enhanced a-stratified procedures were implemented to control item exposure rates. The a-stratified and enhanced a-stratified procedures were not evaluated with the partial credit model. Two variations of the randomesque and within .10 logits procedures were also examined, which varied the size of the item group from which the next item to be administered was randomly selected. The results of this study were remarkably similar for all three models and indicated that the randomesque and within .10 logits procedures, when implemented with the six-item group variation, provide the best option for controlling exposure rates when impact to measurement precision and ease of implementation are considered. The three-item group variations of the procedures were, however, ineffective in controlling exposure, overlap, and pool utilization rates to desired levels. The Sympson-Hetter and conditional Sympson-Hetter procedures were difficult and time consuming to implement, and while they did control exposure rates to the target level, their performance in terms of item overlap (for the Sympson-Hetter) and pool utilization was disappointing. The a-stratified and enhanced a-stratified procedures both turned in surprisingly poor performances across all variables. (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Dissertation Abstracts International: Section B: The Sciences & Engineering %V 64 %P 458 %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Strategies for controlling item exposure in computerized adaptive testing with the generalized partial credit model %A Davis, L. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng
%0 Book %D 2003 %T Strategies for controlling testlet exposure rates in computerized adaptive testing systems %A Boyd, A. M. %C Unpublished Ph.D. Dissertation, The University of Texas at Austin. %G eng
%0 Journal Article %J System %D 2003 %T Student modeling and ab initio language learning %A Heift, T. %A Schulze, M. %X Provides examples of student modeling techniques that have been employed in computer-assisted language learning over the past decade. Describes two systems for learning German: "German Tutor" and "Geroline." Shows how a student model can support computerized adaptive language testing for diagnostic purposes in a Web-based language learning environment that does not rely on parsing technology. (Author/VWL) %B System %V 31 %P 519-535 %G eng %M EJ677996
%0 Journal Article %J Quality of Life Research %D 2003 %T A study of the feasibility of Internet administration of a computerized health survey: The Headache Impact Test (HIT) %A Bayliss, M. S. %A Dewey, J. E. %A Dunlap, I. %A et al. %B Quality of Life Research %V 12 %P 953-961 %G eng
%0 Journal Article %J International Journal of Continuing Engineering Education and Life-Long Learning %D 2002 %T Self-adapted testing: An overview %A Wise, S. L. %A Ponsoda, V. %A Olea, J. %B International Journal of Continuing Engineering Education and Life-Long Learning %V 12 %P 107-122 %G eng
%0 Conference Paper %B Paper presented at the 11th biannual International Objective Measurement Workshop. New Orleans: International Objective Measurement Workshops. %D 2002 %T Some features of the estimated sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Raîche, G. %A Blais, J. G. %B Paper presented at the 11th biannual International Objective Measurement Workshop. New Orleans: International Objective Measurement Workshops. %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the International Objective Measurement Workshops-XI %D 2002 %T Some features of the sampling distribution of the ability estimate in computerized adaptive testing according to two stopping rules %A Blais, J-G. %A Raiche, G. %B Paper presented at the annual meeting of the International Objective Measurement Workshops-XI %C New Orleans, LA %G eng
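The two stopping-rule entries above compare termination after a fixed number of items with termination once the standard error of the ability estimate is small enough. A minimal sketch of both rules under a 2PL, using SE = 1/sqrt(test information) (generic illustration; the thresholds are arbitrary):

import math

def item_info(theta, a, b):
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def should_stop(administered, theta, max_items=40, se_target=0.35):
    # Rule 1: fixed length. Rule 2: stop once the standard error of
    # the ability estimate falls below se_target.
    info = sum(item_info(theta, a, b) for a, b in administered)
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    return len(administered) >= max_items or se <= se_target

print(should_stop([(1.2, 0.0), (1.0, 0.4), (1.4, -0.2)] * 4, theta=0.1))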
%0 Generic %D 2002 %T STAR Math 2 Computer-Adaptive Math Test and Database: Technical Manual %A Renaissance-Learning-Inc. %C Wisconsin Rapids, WI: Author %G eng
%0 Conference Paper %B (Original title: Detecting item misfit in computerized adaptive testing.) Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Statistical indexes for monitoring item behavior under computer adaptive testing environment %A Zhu, R. %A Yu, F. %A Liu, S. M. %B (Original title: Detecting item misfit in computerized adaptive testing.) Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng
%0 Book %D 2002 %T Strategies for controlling item exposure in computerized adaptive testing with polytomously scored items %A Davis, L. L. %C Unpublished doctoral dissertation, University of Texas, Austin %G eng
%0 Generic %D 2002 %T A strategy for controlling item exposure in multidimensional computerized adaptive testing %A Lee, Y. H. %A Ip, E. H. %A Fuh, C. D. %C Available from http://www3.stat.sinica.edu.tw/library/c_tec_rep/c-2002-11.pdf %G eng
%0 Journal Article %J Assessment %D 2002 %T A structure-based approach to psychological measurement: Matching measurement models to latent structure %A Ruscio, John %A Ruscio, Ayelet Meron %K Adaptive Testing %K Assessment %K Classification (Cognitive Process) %K Computer Assisted %K Item Response Theory %K Psychological %K Scaling (Testing) %K Statistical Analysis %K computerized adaptive testing %K Taxonomies %K Testing %X The present article sets forth the argument that psychological assessment should be based on a construct's latent structure. The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B Assessment %V 9 %P 4-16 %G eng
%0 Generic %D 2001 %T Scoring alternatives for incomplete computerized adaptive tests (Research Report 01-20) %A Way, W. D. %A Gawlick, L. A. %A Eignor, D. R. %C Princeton NJ: Educational Testing Service %G eng
%0 Generic %D 2001 %T STAR Early Literacy Computer-Adaptive Diagnostic Assessment: Technical Manual %A Renaissance-Learning-Inc. %C Wisconsin Rapids, WI: Author %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T A system for on-the-fly adaptive testing %A Wagner, M. E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng
%0 Generic %D 2000 %T A selection procedure for polytomous items in computerized adaptive testing (Measurement and Research Department Reports 2000-5) %A van Rijn, P. W. %A Eggen, T. J. H. M. %A Hemker, B. T. %A Sanders, P. F. %C Arnhem, The Netherlands: Cito %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Solving complex constraints in a-stratified computerized adaptive testing designs %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, USA %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Some considerations for improving accuracy of estimation of item characteristic curves in online calibration of computerized adaptive testing %A Samejima, F. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Specific information item selection for adaptive testing %A Davey, T. %A Fan, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans %G eng
%0 Generic %D 2000 %T STAR Reading 2 Computer-Adaptive Reading Test and Database: Technical Manual %A Renaissance-Learning-Inc. %C Wisconsin Rapids, WI: Author %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Sufficient simplicity or comprehensive complexity? A comparison of probabilistic and stratification methods of exposure control %A Parshall, C. G. %A Kromrey, J. D. %A Hogarty, K. Y. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng
%0 Generic %D 1999 %T Some relationships among issues in CAT item pool management %A Wang, T. %G eng
%0 Journal Article %J Applied Psychological Measurement %D 1999 %T Some reliability estimates for computerized adaptive tests %A Nicewander, W. A. %A Thomasson, G. L. %X Three reliability estimates are derived for the Bayes modal estimate (BME) and the maximum likelihood estimate (MLE) of θ in computerized adaptive tests (CAT). Each reliability estimate is a function of test information. Two of the estimates are shown to be upper bounds to true reliability. The three reliability estimates and the true reliabilities of both MLE and BME were computed for seven simulated CATs. Results showed that the true reliabilities for MLE and BME were nearly identical in all seven tests. The three reliability estimates never differed from the true reliabilities by more than .02 (.01 in most cases). A simple implementation of one reliability estimate was found to accurately estimate reliability in CATs. %B Applied Psychological Measurement %V 23 %P 239-247 %G eng %M EJ596308
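The Nicewander and Thomasson entry above derives reliability estimates as functions of test information. The sketch below shows only the textbook approximation rel = var(θ) / (var(θ) + SE²), which reduces to I/(I + 1) for unit trait variance; it is not necessarily one of the three estimators derived in the article:

def reliability_from_information(test_info, trait_variance=1.0):
    # Classical approximation: reliability = true-score variance over
    # observed-score variance, with error variance SE^2 = 1 / I.
    # A generic textbook form, used here purely for illustration.
    se2 = 1.0 / test_info
    return trait_variance / (trait_variance + se2)

print(reliability_from_information(10.0))  # -> about 0.909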
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Standard errors of proficiency estimates in stratum scored CAT %A Kingsbury, G. G. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Study of methods to detect aberrant response patterns in computerized testing %A Iwamoto, C. K. %A Nungester, R. J. %A Luecht, RM %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng
%0 Generic %D 1998 %T Simulating nonmodel-fitting responses in a CAT environment (Research Report 98-10) %A Yi, Q. %A Nering, M. L. %C Iowa City IA: ACT Inc. (Also presented at National Council on Measurement in Education, 1999: ERIC No. ED 427 042) %G eng
%0 Generic %D 1998 %T Simulating the null distribution of person-fit statistics for conventional and adaptive tests (Research Report 98-02) %A Meijer, R. R. %A van Krimpen-Stoop, E. M. L. A. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng
%0 Journal Article %J Journal of Educational Measurement %D 1998 %T Simulating the use of disclosed items in computerized adaptive testing %A Stocking, M. L. %A W. C. Ward %A Potenza, M. T. %K computerized adaptive testing %X Regular use of questions previously made available to the public (i.e., disclosed items) may provide one way to meet the requirement for large numbers of questions in a continuous testing environment, that is, an environment in which testing is offered at test taker convenience throughout the year rather than on a few prespecified test dates. First it must be shown that such use has effects on test scores small enough to be acceptable. In this study simulations are used to explore the use of disclosed items under a worst-case scenario which assumes that disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible. %B Journal of Educational Measurement %V 35 %P 48-68 %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Some considerations for eliminating biases in ability estimation in computerized adaptive testing %A Samejima, F. %B Paper presented at the annual meeting of the American Educational Research Association %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Some item response theory to provide scale scores based on linear combinations of testlet scores, for computerized adaptive tests %A Thissen, D. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng
%0 Journal Article %J Journal of Educational Measurement %D 1998 %T Some practical examples of computerized adaptive sequential testing %A Luecht, RM %A Nungester, R. J. %B Journal of Educational Measurement %V 35 %P 229-249 %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Some reliability estimators for computerized adaptive tests %A Nicewander, W. A. %A Thomasson, G. L. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng
%0 Report %D 1998 %T Statistical tests for person misfit in computerized adaptive testing (Research Report 98-01) %A Glas, C. A. W. %A Meijer, R. R. %A van Krimpen-Stoop, E. M. L. A. %I Faculty of Educational Science and Technology, University of Twente %C Enschede, The Netherlands %P 28 %@ 98-01 %G eng
%0 Journal Article %J Psychometrika %D 1998 %T Stochastic order in dichotomous item response models for fixed, adaptive, and multidimensional tests %A van der Linden, W. J. %B Psychometrika %V 63 %P 211-226 %G eng
%0 Journal Article %J International Journal of Selection and Assessment %D 1998 %T Swedish Enlistment Battery: Construct validity and latent variable estimation of cognitive abilities by the CAT-SEB %A Mardberg, B. %A Carlstedt, B. %B International Journal of Selection and Assessment %V 6 %P 107-114 %G eng
%0 Journal Article %J Stress & Coping: An International Journal %D 1997 %T Self-adapted testing: Improving performance by modifying tests instead of examinees %A Rocklin, T. %X This paper describes self-adapted testing and some of the evidence concerning its effects, presents possible theoretical explanations for those effects, and discusses some of the practical concerns regarding self-adapted testing. Self-adapted testing is a variant of computerized adaptive testing in which the examinee makes dynamic choices about the difficulty of the items he or she attempts. Self-adapted testing generates scores that are, in contrast to those from computerized adaptive tests and fixed-item tests, uncorrelated with a measure of trait test anxiety. This lack of correlation with an irrelevant attribute of the examinee is evidence of an improvement in the construct validity of the scores. This improvement comes at the cost of a decrease in testing efficiency. The interaction between test anxiety and test administration mode is more consistent with an interference theory of test anxiety than a deficit theory. Some of the practical concerns regarding self-adapted testing can be ruled out logically, but others await empirical investigation. %B Stress & Coping: An International Journal %V 10(1) %P 83-104 %G eng
%0 Generic %D 1997 %T Simulating the use of disclosed items in computerized adaptive testing (Research Report 97-10) %A Stocking, M. L. %A W. C. Ward %A Potenza, M. T. %C Princeton NJ: Educational Testing Service %G eng
%0 Conference Paper %B Paper presented at the Psychometric Society meeting %D 1997 %T Simulation of realistic ability vectors %A Nering, M. %A Thompson, T. D. %A Davey, T. %B Paper presented at the Psychometric Society meeting %C Gatlinburg TN %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T A simulation study of the use of the Mantel-Haenszel and logistic regression procedures for assessing DIF in a CAT environment %A Ross, L. P. %A Nandakumar, R. %A Clauser, B. E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng
%0 Journal Article %J Journal of Educational and Behavioral Statistics %D 1997 %T Some new item selection criteria for adaptive testing %A Veerkamp, W. J. J. %A Berger, M. P. F. %B Journal of Educational and Behavioral Statistics %V 22 %P 203-226 %G eng
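The Veerkamp and Berger entry above studies alternatives to the usual maximum-information selection rule, including criteria that weight an item's information across θ by the likelihood of the responses observed so far. A rough quadrature sketch of such a likelihood-weighted criterion (our simplification for illustration, not the authors' exact formulation):

import math

def p2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def likelihood(theta, history):
    # history: list of (a, b, scored response) triples observed so far
    L = 1.0
    for a, b, x in history:
        p = p2pl(theta, a, b)
        L *= p if x == 1 else (1.0 - p)
    return L

def lw_information(a, b, history):
    # Average the candidate item's Fisher information over a theta
    # grid, weighted by the likelihood of the responses so far.
    grid = [g / 10.0 for g in range(-40, 41)]
    num = den = 0.0
    for th in grid:
        w = likelihood(th, history)
        p = p2pl(th, a, b)
        num += w * a * a * p * (1.0 - p)
        den += w
    return num / den

history = [(1.2, 0.0, 1), (1.0, 0.5, 0)]
print(lw_information(1.3, 0.2, history))

Compared with point-estimate maximum information, this spreads the criterion over the plausible range of θ, which matters most early in the test when the interim estimate is unstable.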
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Some questions that must be addressed to develop and maintain an item pool for use in an adaptive test %A Kingsbury, G. G. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng
%0 Book %D 1997 %T Statistical methods for computerized adaptive testing %A Veerkamp, W. J. J. %C Unpublished doctoral dissertation, University of Twente, Enschede, The Netherlands %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T A search procedure to determine sets of decision points when using testlet-based Bayesian sequential testing procedures %A Smith, R. %A Lewis, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York %G eng
%0 Generic %D 1996 %T Some practical examples of computerized adaptive sequential testing (Internal Report) %A Luecht, RM %A Nungester, R. J. %C Philadelphia: National Board of Medical Examiners %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Strategies for managing item pools to maximize item security %A Way, W. D. %A Zara, A. %A Leahy, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego %G eng
%0 Journal Article %J The Chronicle of Higher Education %D 1995 %T Shortfall of questions curbs use of computerized graduate exam %A Jacobson, R. L. %B The Chronicle of Higher Education %G eng %& A23.
%0 Generic %D 1995 %T Some alternative CAT item selection heuristics (Internal report) %A Luecht, RM %C Philadelphia PA: National Board of Medical Examiners %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1995 %T Some new methods for content balancing adaptive tests %A Segall, D. O. %A Davey, T. C. %B Paper presented at the annual meeting of the Psychometric Society %C Minneapolis MN %G eng
%0 Journal Article %J Shinrigaku Kenkyu %D 1995 %T A study of psychologically optimal level of item difficulty %A Fujimori, S. %K *Adaptation, Psychological %K *Psychological Tests %K Adult %K Female %K Humans %K Male %X For the purpose of selecting items in a test, this study considered the psychologically optimal difficulty level of items, as well as their measurement efficiency. A paper-and-pencil test (P & P) composed of hard, moderate, and easy subtests was administered to 298 students at a university. A computerized adaptive test (CAT) was also administered to 79 students. The items of both tests were selected from Shiba's Word Meaning Comprehension Test, for which estimates of the parameters of the two-parameter item response model were available. The results of the P & P research showed that the psychologically optimal success level would be such that the proportion of right answers is somewhere between .75 and .85. A similar result was obtained from the CAT research, where a proportion of about .8 might be desirable. Traditionally a success rate of .5 has been recommended in adaptive testing. In this study, however, it was suggested that items at such a level would be too hard psychologically for many examinees. %B Shinrigaku Kenkyu %7 1995/02/01 %V 65 %P 446-453 %8 Feb %@ 0021-5236 (Print) 0021-5236 (Linking) %G jpn %M 7752567
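The Fujimori entry above suggests targeting a success rate near .8 rather than the information-optimal .5. Under a 2PL, the difficulty yielding success probability p at ability theta is b = theta - logit(p)/a, so the recommendation amounts to choosing items roughly 1.4/a logits easier than the information-optimal b = theta. A small worked sketch (generic arithmetic, not code from the article):

import math

def target_difficulty(theta, a, p_target=0.8):
    # Solve p = 1 / (1 + exp(-a (theta - b))) for b:
    # b = theta - logit(p) / a, where logit(.8) = ln 4, about 1.386.
    return theta - math.log(p_target / (1.0 - p_target)) / a

# For theta = 0 and a = 1.0, targeting .8 suggests items about
# 1.39 logits easier than the maximally informative difficulty.
print(round(target_difficulty(0.0, 1.0), 2))  # -> -1.39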
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T The selection of test items for decision making with a computer adaptive test %A Reckase, M. D. %A Spray, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng
%0 Journal Article %J Applied Measurement in Education %D 1994 %T Self-adapted testing %A Rocklin, T. R. %B Applied Measurement in Education %V 7 %P 3-14 %G eng
%0 Generic %D 1994 %T A simple and fast item selection procedure for adaptive testing %A Veerkamp, W. J. J. %C (Research Report 94-13). University of Twente. %G eng
%0 Journal Article %J Applied Psychological Measurement %D 1994 %T A simulation study of methods for assessing differential item functioning in computerized adaptive tests %A Zwick, R. %A Thayer, D. T. %A Wingersky, M. %B Applied Psychological Measurement %V 18 %P 121-140 %G eng
%0 Generic %D 1994 %T A simulation study of the Mantel-Haenszel procedure for detecting DIF with the NCLEX using CAT (Technical Report xx-xx) %A Way, W. D. %C Princeton NJ: Educational Testing Service %G eng
%0 Generic %D 1994 %T Some new item selection criteria for adaptive testing (Research Rep 94-6) %A Veerkamp, W. J. %A Berger, M. P. F. %C Enschede, The Netherlands: University of Twente, Department of Educational Measurement and Data Analysis. %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T A simulated comparison of testlets and a content balancing procedure for an adaptive certification examination %A Reshetar, R. A. %A Norcini, J. J. %A Shea, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T A simulated comparison of two content balancing and maximum information item selection procedures for an adaptive certification examination %A Reshetar, R. A. %A Norcini, J. J. %A Shea, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta %G eng
%0 Generic %D 1993 %T A simulation study of methods for assessing differential item functioning in computer-adaptive tests (Educational Testing Service Research Rep No RR 93-11) %A Zwick, R. %A Thayer, D. %A Wingersky, M. %C Princeton NJ: Educational Testing Service. %G eng
%0 Conference Paper %B Paper presented at the New Methods and Applications in Consumer Research Conference %D 1993 %T Some initial experiments with adaptive survey designs for structured questionnaires %A Singh, J. %B Paper presented at the New Methods and Applications in Consumer Research Conference %C Cambridge MA %G eng
%0 Journal Article %J Educational Measurement: Issues and Practice %D 1993 %T Some practical considerations when converting a linearly administered test to an adaptive format %A Wainer, H. %B Educational Measurement: Issues and Practice %V 12 (1) %P 15-20 %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1992 %T Scaling of two-stage adaptive test configurations for achievement testing %A Hendrickson, A. B. %A Kolen, M. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng
%0 Generic %D 1992 %T Some practical considerations when converting a linearly administered test to an adaptive format (Research Report 92-21 or 13?) %A Wainer, H. %C Princeton NJ: Educational Testing Service %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1992 %T Student attitudes toward computer-adaptive test administration %A Baghi, H. %A Ferrara, S. F. %A Gabrys, R. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng
%0 Generic %D 1991 %T A simulation study of some simple approaches to the study of DIF for CATs %A Holland, P. W. %A Zwick, R. %C Internal memorandum, Educational Testing Service %G eng
%0 Generic %D 1991 %T Some empirical guidelines for building testlets (Technical Report 91-56) %A Wainer, H. %A Kaplan, B. %A Lewis, C. %C Princeton NJ: Educational Testing Service, Program Statistics Research %G eng
%0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 1990 %T Sequential item response models with an ordered response %A Tutz, G. %B British Journal of Mathematical and Statistical Psychology %V 43 %P 39-55
%0 Journal Article %J Journal of Educational Measurement %D 1990 %T A simulation and comparison of flexilevel and Bayesian computerized adaptive testing %A De Ayala, R. J. %A Dodd, B. G. %A Koch, W. R. %K computerized adaptive testing %X Computerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT. %B Journal of Educational Measurement %V 27 %P 227-239 %G eng
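The De Ayala, Dodd, and Koch entry above relies on Bayesian ability estimation; in practice such a Bayesian CAT typically updates θ with an expected a posteriori (EAP) mean computed over a quadrature grid. A minimal sketch with a standard normal prior and 2PL likelihood (generic illustration; the article does not specify this exact implementation):

import math

def eap_estimate(history, n_points=81):
    # EAP: posterior mean of theta over a quadrature grid, using a
    # N(0,1) prior and 2PL response likelihood.
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    num = den = 0.0
    for th in grid:
        w = math.exp(-0.5 * th * th)  # prior weight, up to a constant
        for a, b, x in history:  # (discrimination, difficulty, response)
            p = 1.0 / (1.0 + math.exp(-a * (th - b)))
            w *= p if x == 1 else (1.0 - p)
        num += th * w
        den += w
    return num / den

# Two correct answers and one miss on items of middling difficulty
print(round(eap_estimate([(1.2, -0.2, 1), (1.0, 0.3, 1), (1.4, 0.8, 0)]), 3))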
%0 Journal Article %J Journal of Educational Measurement %D 1990 %T Software review: MicroCAT Testing System Version 3 %A Patience, W. M. %B Journal of Educational Measurement %V 7 %P 82-88 %G eng
%0 Conference Paper %B Paper presented at the Midwest Objective Measurement Seminar %D 1990 %T The stability of Rasch pencil and paper item calibrations on computer adaptive tests %A Bergstrom, Betty A. %A Lunz, M. E. %B Paper presented at the Midwest Objective Measurement Seminar %C Chicago IL %G eng
%0 Journal Article %J International Journal of Educational Research %D 1989 %T Some procedures for computerized ability testing %A van der Linden, W. J. %A Zwarts, M. A. %B International Journal of Educational Research %V 13(2) %P 175-187 %G eng
%0 Generic %D 1988 %T Scale drift in on-line calibration (Research Report RR-88-28-ONR; also ERIC ED389710) %A Stocking, M. L. %C Princeton NJ: Educational Testing Service %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T Simple and effective algorithms [for] computer-adaptive testing %A Linacre, J. M. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng
%0 Generic %D 1988 %T Some considerations in maintaining adaptive test item pools (Research Report 88-33-ONR; also ERIC ED391814) %A Stocking, M. L. %C Princeton NJ: Educational Testing Service %G eng
%0 Journal Article %J Journal of Educational Psychology %D 1987 %T Self-adapted testing: A performance improving variation of computerized adaptive testing %A Rocklin, T. R. %A O’Donnell, A. M. %B Journal of Educational Psychology %V 79 %P 315-319 %G eng
%0 Journal Article %J Applied Psychological Measurement %D 1986 %T Some applications of optimization algorithms in test design and adaptive testing %A Theunissen, T. J. J. M. %B Applied Psychological Measurement %V 10 %P 381-389 %G eng %N 4
%0 Book %D 1985 %T Sequential analysis: Tests and confidence intervals %A Siegmund, D. %C New York: Springer-Verlag %G eng
%0 Journal Article %J Multivariate Behavioral Research %D 1985 %T A structural comparison of conventional and adaptive versions of the ASVAB %A Cudeck, R. %X Examined several structural models of similarity between the Armed Services Vocational Aptitude Battery (ASVAB) and a battery of computerized adaptive tests designed to measure the same aptitudes. 12 plausible models were fitted to sample data in a double cross-validation design. 1,411 US Navy recruits completed 10 ASVAB subtests. A computerized adaptive test version of the ASVAB subtests was developed on item pools of approximately 200 items each. The items were pretested using applicants from military entrance processing stations across the US, resulting in a total calibration sample size of approximately 60,000 for the computerized adaptive tests. Three of the 12 models provided reasonable summaries of the data. One model with a multiplicative structure (M. W. Browne; see record 1984-24964-001) performed quite well. This model provides an estimate of the disattenuated method correlation between conventional testing and adaptive testing. In the present data, this correlation was estimated to be 0.97 and 0.98 in the 2 halves of the data. Results support computerized adaptive tests as replacements for conventional tests. (33 ref) (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Multivariate Behavioral Research %V 20 %P 305-322 %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1984 %T The selection of items for decision making with a computer adaptive test %A Spray, J. A. %A Reckase, M. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng
%0 Book Section %B New horizons in testing: Latent trait test theory and computerized adaptive testing %D 1983 %T Small N justifies Rasch model %A Lord, F. M. %E Bock, R. D. %B New horizons in testing: Latent trait test theory and computerized adaptive testing %I Academic Press %C New York, NY, USA %P 51-61 %G eng
%0 Book %D 1983 %T The stochastic modeling of elementary psychological processes %A Townsend, J. T. %A Ashby, G. F. %C Cambridge: Cambridge University Press %G eng
%0 Journal Article %J Applied Psychological Measurement %D 1982 %T Sequential testing for selection %A Weitzman, R. A. %B Applied Psychological Measurement %V 6 %P 337-351 %G eng %N 3
%0 Journal Article %J British Journal of Educational Psychology %D 1980 %T A simple form of tailored testing %A Nisbet, J. %A Adams, M. %A Arthur, J. %B British Journal of Educational Psychology %V 50 %P 301-303
%0 Book Section %D 1980 %T Some decision procedures for use with tailored testing %A Reckase, M. D. %C D. J. Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 79-100). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. %G eng
%0 Book Section %D 1980 %T Some how and which for practical tailored testing %A Lord, F. M. %C L. J. T. van der Kamp, W. F. Langerak and D. N. M. de Gruijter (Eds.), Psychometrics for educational debates (pp. 189-206). New York: John Wiley and Sons. %G eng
%0 Generic %D 1980 %T A successful application of latent trait theory to tailored achievement testing (Research Report 80-1) %A McKinley, R. L. %A Reckase, M. D. %C University of Missouri, Department of Educational Psychology, Tailored Testing Research Laboratory %G eng
%0 Conference Paper %B Paper presented at the 87th annual meeting of the American Psychological Association %D 1979 %T Student reaction to computerized adaptive testing in the classroom %A Johnson, M. J. %B Paper presented at the 87th annual meeting of the American Psychological Association %C New York %G eng
%0 Journal Article %J TIMS Studies in the Management Sciences %D 1978 %T The stratified adaptive ability test as a tool for personnel selection and placement %A Vale, C. D. %A Weiss, D. J. %B TIMS Studies in the Management Sciences %V 8 %P 135-151 %G eng
%0 Journal Article %J Japanese Journal of Educational Psychology %D 1978 %T A stratified adaptive test of verbal ability %A Shiba, S. %A Noguchi, H. %A Haebara, T. %B Japanese Journal of Educational Psychology %V 26 %P 229-238
%0 Journal Article %J Applied Psychological Measurement %D 1977 %T Some properties of a Bayesian adaptive ability testing strategy %A J. R. McBride %B Applied Psychological Measurement %V 1 %P 121-140 %G eng %N 1
%0 Book Section %D 1977 %T Student attitudes toward tailored testing %A Koch, W. R. %A Patience, W. M. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng
%0 Book %D 1976 %T Simulation studies of adaptive testing: A comparative evaluation %A J. R. McBride %C Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN %G eng
%0 Book Section %D 1976 %T Some likelihood functions found in tailored testing %A Lord, F. M. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 79-81). Washington DC: U.S. Government Printing Office. %G eng
%0 Generic %D 1976 %T Some properties of a Bayesian adaptive ability testing strategy (Research Report 76-1) %A J. R. McBride %A Weiss, D. J. %C Minneapolis MN: Department of Psychology, Computerized Adaptive Testing Laboratory %G eng
%0 Book Section %D 1975 %T Scoring adaptive tests %A J. R. McBride %C D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and prospects (Research Report 75-5), pp. 17-25. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng
%0 Journal Article %J Journal of Computer-Based Instruction %D 1975 %T Sequential testing for instructional classification %A Thomas, D. B. %B Journal of Computer-Based Instruction %V 1 %P 92-99 %G eng
%0 Generic %D 1975 %T A simulation study of stradaptive ability testing (Research Report 75-6) %A Vale, C. D. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng
%0 Book Section %D 1975 %T Strategies of branching through an item pool %A Vale, C. D. %C D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and prospects (Research Report 75-5), pp. 1-16. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng
%0 Generic %D 1975 %T A study of computer-administered stradaptive ability testing (Research Report 75-4) %A Vale, C. D. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng
%0 Generic %D 1974 %T Simulation studies of two-stage ability testing (Research Report 74-4) %A Betz, N. E. %A Weiss, D. J. %C Minneapolis: Department of Psychology, Psychometric Methods Program %G eng
%0 Generic %D 1974 %T Strategies of adaptive ability measurement (Research Report 74-5) %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng
%0 Generic %D 1973 %T The stratified adaptive computerized ability test (Research Report 73-3) %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng
%0 Journal Article %J Educational and Psychological Measurement %D 1972 %T Sequential testing for dichotomous decisions %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %K CCAT %K Classification %K Computerized Adaptive Testing %K sequential probability ratio testing %K SPRT %B Educational and Psychological Measurement %V 32 %P 85-95 %G eng
%0 Journal Article %J Journal of Educational Measurement %D 1971 %T The self-scoring flexilevel test %A Lord, F. M. %B Journal of Educational Measurement %V 8 %P 147-151 %G eng
%0 Generic %D 1970 %T The self-scoring flexilevel test (RB-7043) %A Lord, F. M. %C Princeton NJ: Educational Testing Service %G eng
%0 Generic %D 1970 %T Sequential testing for dichotomous decisions. College Entrance Examination Board Research and Development Report (RDR 69-70, No. 3, and Educational Testing Service RB-70-31) %A Linn, R. L. %A Rock, D. A. %A Cleary, T. A. %C Princeton NJ: Educational Testing Service. %G eng
%0 Book Section %D 1970 %T Some test theory for tailored testing %A Lord, F. M. %C W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 139-183). New York: Harper and Row. %G eng
%0 Generic %D 1969 %T Short tailored tests (RB-69-63) %A Stocking, M. L. %C Princeton NJ: Educational Testing Service %G eng
%0 Journal Article %J American Psychologist %D 1956 %T The sequential item test %A Krathwohl, D. R. %A Huyser, R. J. %B American Psychologist %V 11 %P 419 %G eng
%0 Journal Article %J Journal of the Royal Statistical Society, Series B %D 1950 %T Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis %A Armitage, P. %B Journal of the Royal Statistical Society, Series B %V 12 %P 137-144 %G eng
%0 Journal Article %J Journal of Experimental Education %D 1950 %T Some empirical aspects of the sequential analysis technique as applied to an achievement examination %A Moonan, W. J. %B Journal of Experimental Education %V 18 %P 195-207 %G eng
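Several of the oldest entries above (Linn, Rock, and Cleary; Armitage) concern sequential testing for dichotomous decisions, the ancestry of SPRT-based classification in adaptive testing. A minimal sketch of Wald's sequential probability ratio test for a pass/fail decision under a Rasch model (generic illustration; the cut abilities and error rates are invented):

import math

def sprt_decision(history, theta_fail=-0.5, theta_pass=0.5, alpha=0.05, beta=0.05):
    # Accumulate the log likelihood ratio of the responses under two
    # hypothesized abilities; decide when it crosses the log bounds,
    # otherwise continue testing.
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0
    for b, x in history:  # (item difficulty, scored response)
        p_pass = 1.0 / (1.0 + math.exp(-(theta_pass - b)))
        p_fail = 1.0 / (1.0 + math.exp(-(theta_fail - b)))
        llr += math.log(p_pass / p_fail) if x == 1 else math.log((1 - p_pass) / (1 - p_fail))
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"

print(sprt_decision([(0.0, 1), (0.2, 1), (-0.1, 1), (0.3, 1), (0.1, 1)]))

Tightening alpha and beta widens the decision bounds, so more items are needed before a classification is made; that trade-off between error rates and test length is the central theme of this early sequential-testing literature.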