%0 Journal Article %J Journal of Educational Measurement %D 2019 %T Routing Strategies and Optimizing Design for Multistage Testing in International Large-Scale Assessments %A Svetina, Dubravka %A Liaw, Yuan-Ling %A Rutkowski, Leslie %A Rutkowski, David %X This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s) and whether routing choices (optimal versus suboptimal routing) have an impact on achievement precision. Additionally, we examine the impact of testlet length on both person and item recovery. Overall, our results suggest that no single approach works best across the studied conditions. With respect to the mean person parameter recovery, IRT scoring (via either Fisher information or preliminary EAP estimates) outperformed classical NC methods, although differences in bias and root mean squared error were generally small. Item exposure rates were found to be more evenly distributed when suboptimal routing methods were used, and item recovery (both difficulty and discrimination) was most precisely observed for items with moderate difficulties. Based on the results of the simulation study, we draw conclusions and discuss implications for practice in the context of international large-scale assessments that recently introduced adaptive assessment in the form of MST. Future research directions are also discussed. %B Journal of Educational Measurement %V 56 %P 192-213 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12206 %R 10.1111/jedm.12206 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Response Time and Response Accuracy in Computerized Adaptive Testing %A Yang Shi %K CAT %K response accuracy %K Response time %X

Introduction. This study explores the relationship between response speed and response accuracy in Computerized Adaptive Testing (CAT). CAT provides a score as well as item response times, which can offer additional diagnostic information regarding behavioral processes of task completion that cannot be uncovered by paper-based instruments. The goal of this study is to investigate how the accuracy rate evolves as a function of response time. If the accuracy of cognitive test responses decreases with response time, this indicates that the underlying cognitive process is a degrading one, such as knowledge retrieval: more accessible knowledge can be retrieved faster than less accessible knowledge. For instance, in reading tasks the time-on-task effect is negative, and the more negative it is, the easier the task. However, if the accuracy of cognitive test responses increases with response time, the process is of an upgrading nature, with an increasing success rate as a function of response time. For example, problem solving takes time, and fast responses are less likely to be well-founded. It is of course also possible that the relationship is curvilinear, as when an increasing success rate is followed by a decreasing one, or vice versa.

Hypothesis. The present study argues that the relationship between response time on task and response accuracy can be positive, negative, or curvilinear, depending on the cognitive nature of the task items, holding the subjects' ability and the items' difficulty constant.

Methodology. Data from a subsection of the GRE quantitative test were available. We will use generalized linear mixed models. "Linear" means that a linear combination of predictors determines the probability that person p answers item i correctly. "Mixed" means that both random effects and fixed effects are included; fixed effects are constant across test takers. The models are equivalent to advanced IRT models that go beyond the regular modeling of test responses in terms of one or more latent variables and item parameters. The lme4 package for R will be used for estimation.
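The core of the modeling approach can be sketched as follows. This is a minimal illustration in Python rather than R's lme4, on synthetic data rather than the GRE responses, and it fits only the fixed time-on-task effect (the full GLMM would also include person and item random effects); all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated responses with a negative time-on-task effect (the study's
# preliminary finding): slower responses are less likely to be correct.
n = 5000
log_time = rng.normal(size=n)                  # standardized log response time
p_true = 1 / (1 + np.exp(-(0.8 - 0.5 * log_time)))
correct = (rng.random(n) < p_true).astype(float)

def fit_logistic(x, y, iters=25):
    """One-predictor logistic regression via Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)                        # IRLS weights
        beta += np.linalg.solve((X.T * W) @ X, X.T @ (y - p))
    return beta

intercept, slope = fit_logistic(log_time, correct)
print(slope)   # negative slope: accuracy declines with elapsing response time
```

A positive or curvilinear process would show up as a positive slope or a significant quadratic term for response time, respectively.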

Research questions. 1. What is the relationship between response accuracy and response speed? 2. What is the correlation between response accuracy and type of response time (fast vs. slow responses) after controlling for examinee ability?

Preliminary Findings. 1. There is a negative relationship between response time and response accuracy: the success rate declines as response time elapses. 2. The correlation between the two response-time latent variables (fast and slow) is 1.0, indicating that the time-on-task effects do not differ between response-time types.

Implications. The right amount of testing time in CAT is important: too much is wasteful and costly, too little threatens score validity. The study is expected to provide new insight into the relationship between response time and response accuracy, which, in turn, can inform the optimal timing strategy in CAT, with or without time constraints.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1yYP01bzGrKvJnfLwepcAoQQ2F4TdSvZ2 %0 Journal Article %J Frontiers in Education %D 2017 %T Robust Automated Test Assembly for Testlet-Based Tests: An Illustration with Analytical Reasoning Items %A Veldkamp, Bernard P. %A Paap, Muirne C. S. %B Frontiers in Education %V 2 %P 63 %U https://www.frontiersin.org/article/10.3389/feduc.2017.00063 %R 10.3389/feduc.2017.00063 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T The Random-Threshold Generalized Unfolding Model and Its Application of Computerized Adaptive Testing %A Wang, Wen-Chung %A Liu, Chen-Wei %A Wu, Shiu-Lien %X

The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs Sampler) freeware, which adopts a Bayesian approach for estimation. A series of simulations was conducted to evaluate the parameter recovery of the new model and the consequences of ignoring the randomness in thresholds. The results showed that the parameters of RTGUM were recovered fairly well and that ignoring the randomness in thresholds led to biased estimates. Computerized adaptive testing was also implemented on RTGUM, where the Fisher information criterion was used for item selection and the maximum a posteriori method was used for ability estimation. The simulation study showed that the longer the test length, the smaller the randomness in thresholds, and the more categories in an item, the more precise the ability estimates would be.

%B Applied Psychological Measurement %V 37 %P 179-200 %U http://apm.sagepub.com/content/37/3/179.abstract %R 10.1177/0146621612469720 %0 Book Section %B Research on PISA. %D 2013 %T Reporting differentiated literacy results in PISA by using multidimensional adaptive testing. %A Frey, A. %A Seitz, N-N. %A Kröhne, U. %B Research on PISA. %I Dordrecht: Springer %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T On the Reliability and Validity of a Numerical Reasoning Speed Dimension Derived From Response Times Collected in Computerized Testing %A Davison, Mark L. %A Semmes, Robert %A Huang, Lan %A Close, Catherine N. %X

Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy of speeded responses. Speed factor scores were significantly correlated with performance on the ACT math scale. Results suggest that the speed dimension underlying response times can be reliably measured and that the dimension is related to the accuracy of performance under the pressure of time limits.

%B Educational and Psychological Measurement %V 72 %P 245-263 %U http://epm.sagepub.com/content/72/2/245.abstract %R 10.1177/0013164411408412 %0 Journal Article %J Journal of Educational Measurement %D 2011 %T Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing %A Wang, Chun %A Chang, Hua-Hua %A Huebner, Alan %X

This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson-Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

%B Journal of Educational Measurement %V 48 %P 255–273 %U http://dx.doi.org/10.1111/j.1745-3984.2011.00145.x %R 10.1111/j.1745-3984.2011.00145.x %0 Journal Article %J Health and Quality of Life Outcomes %D 2009 %T Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation %A Chien, T. W. %A Wu, H. M. %A Wang, W-C. %A Castillo, R. V. %A Chou, W. %K *Activities of Daily Living %K *Computer Graphics %K *Computer Simulation %K *Diagnosis, Computer-Assisted %K Female %K Humans %K Male %K Point-of-Care Systems %K Reproducibility of Results %K Stroke/*rehabilitation %K Taiwan %K United States %X BACKGROUND: The aim of this study was to verify the effectiveness and efficacy of saving time and reducing burden for patients, nurses, and even occupational therapists through computer adaptive testing (CAT). METHODS: Based on an item bank of the Barthel Index (BI) and the Frenchay Activities Index (FAI) for assessing comprehensive activities of daily living (ADL) function in stroke patients, we developed a visual basic application (VBA)-Excel CAT module, and (1) investigated whether the averaged test length via CAT is shorter than that of the traditional all-item-answered non-adaptive testing (NAT) approach through simulation, (2) illustrated the CAT multimedia on a tablet PC showing data collection and response errors of ADL clinical functional measures in stroke patients, and (3) demonstrated the quality control of endorsing scale with fit statistics to detect responding errors, which will be further immediately reconfirmed by technicians once patient ends the CAT assessment. RESULTS: The results show that endorsed items could be shorter on CAT (M = 13.42) than on NAT (M = 23) at 41.64% efficiency in test length. However, averaged ability estimations reveal insignificant differences between CAT and NAT. 
CONCLUSION: This study found that mobile nursing services, placed at the bedsides of patients could, through the programmed VBA-Excel CAT module, reduce the burden to patients and save time, more so than the traditional NAT paper-and-pencil testing appraisals. %B Health and Quality of Life Outcomes %7 2009/05/07 %V 7 %P 39 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G eng %M 19416521 %2 2688502 %0 Journal Article %J Quality of Life Research %D 2009 %T Replenishing a computerized adaptive test of patient-reported daily activity functioning %A Haley, S. M. %A Ni, P. %A Jette, A. M. %A Tao, W. %A Moed, R. %A Meyers, D. %A Ludlow, L. H. %K *Activities of Daily Living %K *Disability Evaluation %K *Questionnaires %K *User-Computer Interface %K Adult %K Aged %K Cohort Studies %K Computer-Assisted Instruction %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %X PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. 
TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. %B Quality of Life Research %7 2009/03/17 %V 18 %P 461-71 %8 May %@ 0962-9343 (Print)0962-9343 (Linking) %G eng %M 19288222 %0 Journal Article %J Spanish Journal of Psychology %D 2008 %T Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing %A Barrada, J %A Olea, J. %A Abad, F. J. %K *Character %K *Databases %K *Software Design %K Aptitude Tests/*statistics & numerical data %K Bias (Epidemiology) %K Computing Methodologies %K Diagnosis, Computer-Assisted/*statistics & numerical data %K Educational Measurement/*statistics & numerical data %K Humans %K Mathematical Computing %K Psychometrics/statistics & numerical data %X

If examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided in two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting the maximum exposure rate of the master bank by means of the Sympson-Hetter method.

%B Spanish Journal of Psychology %7 2008/11/08 %V 11 %P 618-625 %@ 1138-7416 %G eng %M 18988447 %0 Journal Article %J Journal of Applied Measurement %D 2007 %T Relative precision, efficiency and construct validity of different starting and stopping rules for a computerized adaptive test: The GAIN Substance Problem Scale %A Riley, B. B. %A Conrad, K. J. %A Bezruczko, N. %A Dennis, M. L. %K My article %X Substance abuse treatment programs are being pressed to measure and make clinical decisions more efficiently about an increasing array of problems. This computerized adaptive testing (CAT) simulation examined the relative efficiency, precision and construct validity of different starting and stopping rules used to shorten the Global Appraisal of Individual Needs’ (GAIN) Substance Problem Scale (SPS) and facilitate diagnosis based on it. Data came from 1,048 adolescents and adults referred to substance abuse treatment centers in 5 sites. CAT performance was evaluated using: (1) average standard errors, (2) average number of items, (3) bias in personmeasures, (4) root mean squared error of person measures, (5) Cohen’s kappa to evaluate CAT classification compared to clinical classification, (6) correlation between CAT and full-scale measures, and (7) construct validity of CAT classification vs. clinical classification using correlations with five theoretically associated instruments. Results supported both CAT efficiency and validity. %B Journal of Applied Measurement %V 8 %P 48-65 %G eng %0 Journal Article %J Journal of Technology,Learning, and Assessment, %D 2007 %T A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005 %A Georgiadou, E. %A Triantafillou, E. %A Economides, A. A. %X Since researchers acknowledged the several advantages of computerized adaptive testing (CAT) over traditional linear test administration, the issue of item exposure control has received increased attention. 
Due to CAT’s underlying philosophy, particular items in the item pool may be presented too often and become overexposed, while other items are rarely selected by the CAT algorithm and thus become underexposed. Several item exposure control strategies have been presented in the literature aiming to prevent overexposure of some items and to increase the use rate of rarely or never selected items. This paper reviews such strategies that appeared in the relevant literature from 1983 to 2005. The focus of this paper is on studies that have been conducted in order to evaluate the effectiveness of item exposure control strategies for dichotomous scoring, polytomous scoring and testlet-based CAT systems. In addition, the paper discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented but rather a compendium of models is reviewed with an overall objective of providing researchers of this field, especially newcomers, a wide view of item exposure control strategies. %B Journal of Technology,Learning, and Assessment, %V 5(8) %G eng %0 Journal Article %J Applied Psychological Measurement %D 2005 %T A randomized experiment to compare conventional, computerized, and computerized adaptive administration of ordinal polytomous attitude items %A Hol, A. M. %A Vorst, H. C. M. %A Mellenbergh, G. J. %K Computer Assisted Testing %K Test Administration %K Test Items %X A total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima's graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. 
Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 29 %P 159-183 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2005 %T A Randomized Experiment to Compare Conventional, Computerized, and Computerized Adaptive Administration of Ordinal Polytomous Attitude Items %A Hol, A. Michiel %A Vorst, Harrie C. M. %A Mellenbergh, Gideon J. %X

A total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima’s graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule.

%B Applied Psychological Measurement %V 29 %P 159-183 %U http://apm.sagepub.com/content/29/3/159.abstract %R 10.1177/0146621604271268 %0 Report %D 2005 %T Recent trends in comparability studies %A Paek, P. %K computer adaptive testing %K Computerized assessment %K differential item functioning %K Mode effects %B PEM Research Report 05-05 %I Pearson %8 August, 2005 %@ 05-05 %G eng %0 Conference Paper %B National Council on Measurement in Education %D 2005 %T Rescuing CAT by fixing the problems %A Chang, S-H. %A Zhang, J. %B National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Medical Care %D 2004 %T Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain %A Coster, W. J. %A Haley, S. M. %A Andres, P. L. %A Ludlow, L. H. %A Bond, T. L. %A Ni, P. S. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aged %K Aged, 80 and over %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Questionnaires/*standards %K Recovery of Function/physiology %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Research Support, U.S. Gov't, Non-P.H.S. %K Research Support, U.S. Gov't, P.H.S. %K Sensitivity and Specificity %X BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. 
We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. 
Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. %B Medical Care %V 42 %P I62-172 %8 Jan %G eng %M 14707756 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Recalibration of IRT item parameters in CAT: Sparse data matrices and missing data treatments %A Harmes, J. C. %A Parshall, C. G. %A Kromrey, J. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Journal of Educational Measurement %D 2003 %T The relationship between item exposure and test overlap in computerized adaptive testing %A Chen, S-Y. %A Ankemann, R. D. %A Spray, J. A. %K (Statistical) %K Adaptive Testing %K Computer Assisted Testing %K Human Computer %K Interaction computerized adaptive testing %K Item Analysis %K Item Analysis (Test) %K Test Items %X The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r-sub(max)). Therefore, item exposure control methods which implement a specification of r-sub(max) (e.g., J. B. Sympson and R. D. 
Hetter, 1985) provide the most direct control at both the item and test levels. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 40 %P 129-145 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T A “rearrangement procedure” for administering adaptive tests with review options %A Papanastasiou, E. C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Redeveloping the exposure control parameters of CAT items when a pool is modified %A Chang, S-W. %A Harris, D. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Relative precision of ability estimation in polytomous CAT: A comparison under the generalized partial credit model and graded response model %A Wang, S %A Wang, T.
%B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the International Conference on Computer-Based Testing and the Internet %D 2002 %T Reliability and decision accuracy of linear parallel form and multi stage tests with realistic and ideal item pools %A Jodoin, M. G. %B Paper presented at the International Conference on Computer-Based Testing and the Internet %C Winchester, England %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T The robustness of the unidimensional 3PL IRT model when applied to two-dimensional data in computerized adaptive testing %A Zhao, J. C. %A McMorris, R. F. %A Pruzek, R. M. %A Chen, R. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Book %D 2001 %T A rearrangement procedure for administering adaptive tests when review options are permitted %A Papanastasiou, E. C. %C Unpublished doctoral dissertation, Michigan State University %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Refining a system for computerized adaptive testing pool creation %A Way, W. D. %A Swanson, l, %A Stocking, M. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Generic %D 2001 %T Refining a system for computerized adaptive testing pool creation (Research Report 01-18) %A Way, W. D. %A Swanson, L. %A Steffen, M. %A Stocking, M. L. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Apuntes de Psicologia %D 2001 %T Requerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and investigation in computerized adaptive testing] %A Olea Díaz, J. %A Ponsoda Gil, V. %A Revuelta Menéndez, J. %A Hontangas Beltrán, P. 
%A Abad, F. J. %K Computer Assisted Testing %K English as Second Language %K Psychometrics computerized adaptive testing %X Summarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed. (PsycINFO Database Record (c) 2005 APA ) %B Apuntes de Psicologia %V 19 %P 11-28 %G eng %0 Journal Article %J Computers in Human Behavior %D 2000 %T A real data simulation of computerized adaptive administration of the MMPI-A %A Forbey, J. D. %A Handel, R. W. %A Ben-Porath, Y. S. %X A real data simulation of computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A) was conducted using item responses from three groups of participants. The first group included 196 adolescents (age range 14-18) tested at a midwestern residential treatment facility for adolescents. The second group was the normative sample used in the standardization of the MMPI-A (Butcher, Williams, Graham, Archer, Tellegen, Ben-Porath, & Kaemmer, 1992. Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A): manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.). The third group was the clinical sample used in the validation of the MMPI-A (Williams & Butcher, 1989. An MMPI study of adolescents: I. Empirical validation of the study's scales. Personality assessment, 1, 251-259.).
The MMPI-A data for each group of participants were run through a modified version of the MMPI-2 adaptive testing computer program (Roper, Ben-Porath & Butcher, 1995. Comparability and validity of computerized adaptive testing with the MMPI-2. Journal of Personality Assessment, 65, 358-371.). To determine the optimal amount of item savings, each group's MMPI-A item responses were used to simulate three different orderings of the items: (1) from least to most frequently endorsed in the keyed direction; (2) from least to most frequently endorsed in the keyed direction with the first 120 items rearranged into their booklet order; and (3) all items in booklet order. The mean number of items administered for each group was computed for both classification and full-scale elevations for T-score cut-off values of 60 and 65. Substantial item administration savings were achieved for all three groups, and the mean number of items saved ranged from 50 items (10.7% of the administered items) to 123 items (26.4% of the administered items), depending upon the T-score cut-off, classification method (i.e. classification only or full-scale elevation), and group. (C) 2000 Elsevier Science Ltd. All rights reserved. %B Computers in Human Behavior %V 16 %P 83-96 %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2000 %T Rescuing computerized testing by breaking Zipf’s law %A Wainer, H. %B Journal of Educational and Behavioral Statistics %V 25 %P 203-224 %G eng %0 Journal Article %J Medical Care %D 2000 %T Response to Hays et al and McHorney and Cohen: Practical implications of item response theory and computerized adaptive testing: A brief summary of ongoing studies of widely used headache impact scales %A Ware, J. E., Jr. %A Bjorner, J. B. %A Kosinski, M. %B Medical Care %V 38 %P 73-82 %G eng %0 Journal Article %J Popular Measurement %D 2000 %T A review of CAT review %A Sekula-Wacura, R. 
%X Studied the effects of answer review on results of a computerized adaptive test, the laboratory professional examination of the American Society of Clinical Pathologists. Results from 29,293 candidates show that candidates who changed answers were more likely to improve their scores. (SLD) %B Popular Measurement %V 3 %P 47-49 %G eng %M EJ610760 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T The rationale and principles of stratum scoring %A Wise, S. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Reducing bias in CAT trait estimation: A comparison of approaches %A Wang, T. %A Hanson, B. A. %A Lau, C.-M. H. %B Applied Psychological Measurement %V 23 %P 263-278 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Reducing item exposure without reducing precision (much) in computerized adaptive testing %A Holmes, R. M. %A Segall, D. O. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Book Section %D 1999 %T Research and development of a computer-adaptive test of listening comprehension in the less-commonly taught language Hausa %A Dunkel, P. %C M. Chalhoub-Deville (Ed.). Issues in computer-adaptive testing of reading proficiency. Cambridge, UK: Cambridge University Press. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Response time feedback on computer-administered tests %A Scrams, D. J. %A Schnipke, D. L. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Book %D 1999 %T The robustness of the unidimensional 3PL IRT model when applied to two-dimensional data in computerized adaptive testing %A Zhao, J. C. %C Unpublished Ph.D. dissertation, State University of New York at Albany %G eng %0 Generic %D 1998 %T The relationship between computer familiarity and performance on computer-based TOEFL test tasks (Research Report 98-08) %A Taylor, C. %A Jamieson, J. %A Eignor, D. R. %A Kirsch, I. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Reviewing and changing answers on computer-adaptive and self-adaptive vocabulary tests %A Vispoel, W. P. %B Journal of Educational Measurement %V 35 %P 328-345 %G eng %0 Conference Paper %B In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society %D 1997 %T Realistic simulation procedures for item response data %A Davey, T. %A Nering, M. %A Thompson, T. %B In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society %C Gatlinburg TN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Relationship of response latency to test design, examinee ability, and item difficulty in computer-based test administration %A Swanson, D. B. %A Featherman, C. M. %A Case, A. M. %A Luecht, R. M. %A Nungester, R. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Book Section %D 1997 %T Reliability and construct validity of CAT-ASVAB %A Moreno, K. E. %A Segall, D. O. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.). Computerized adaptive testing: From inquiry to operation (pp. 169-179). Washington DC: American Psychological Association. 
%G eng %0 Book Section %B Computerized adaptive testing: From inquiry to practice %D 1997 %T Research antecedents of applied adaptive testing %A J. R. McBride %E B. K. Waters %E J. R. McBride %K computerized adaptive testing %X (from the chapter) This chapter sets the stage for the entire computerized adaptive testing Armed Services Vocational Aptitude Battery (CAT-ASVAB) development program by describing the state of the art immediately preceding its inception. By the mid-1970s, a great deal of research had been conducted that provided the technical underpinnings needed to develop adaptive tests, but little research had been done to corroborate empirically the promising results of theoretical analyses and computer simulation studies. In this chapter, the author summarizes much of the important theoretical and simulation research prior to 1977. In doing so, he describes a variety of approaches to adaptive testing, and shows that while many methods for adaptive testing had been proposed, few practical attempts had been made to implement it. Furthermore, the few instances of adaptive testing were based primarily on traditional test theory, and were developed in laboratory settings for purposes of basic research. The most promising approaches, those based on item response theory and evaluated analytically or by means of computer simulations, remained to be proven in the crucible of live testing. (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Computerized adaptive testing: From inquiry to practice %7 xviii %I American Psychological Association %C Washington D.C. USA %P 47-57 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1997 %T Revising item responses in computerized adaptive tests: A comparison of three models %A Stocking, M. L. %K computerized adaptive testing %X Interest in the application of large-scale computerized adaptive testing has focused attention on issues that arise when theoretical advances are made operational. 
One such issue is that of the order in which examinees address questions within a test or separately timed test section. In linear testing, this order is entirely under the control of the examinee, who can look ahead at questions and return and revise answers to questions. Using simulation, this study investigated three models that permit restricted examinee control over revising previous answers in the context of adaptive testing. Even under a worst-case model of examinee revision behavior, two of the models permitting item revisions worked well in preserving test fairness and accuracy. One model studied may also preserve some cognitive processing styles developed by examinees for a linear testing environment. %B Applied Psychological Measurement %V 21 %P 129-142 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1997 %T The role of item feedback in self-adapted testing %A Roos, L. L. %A Wise, S. L. %A Plake, B. S. %B Educational and Psychological Measurement %V 57 %P 85-98 %G eng %0 Generic %D 1996 %T Recursive maximum likelihood estimation, sequential design, and computerized adaptive testing %A Chang, Hua-Hua %A Ying, Z. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1996 %T Revising item responses in computerized adaptive testing: A comparison of three models (RR-96-12) %A Stocking, M. L. %C Princeton NJ: Educational Testing Service %G eng %0 Book %D 1996 %T Robustness of a unidimensional computerized testing mastery procedure with multidimensional testing data %A Lau, C. A. %C Unpublished doctoral dissertation, University of Iowa, Iowa City IA %0 Conference Paper %B Paper presented at the Eleventh Workshop on Item Response Theory %D 1995 %T Recursive maximum likelihood estimation, sequential designs, and computerized adaptive testing %A Ying, Z. 
%A Chang, Hua-Hua %B Paper presented at the Eleventh Workshop on Item Response Theory %C University of Twente, the Netherlands %G eng %0 Journal Article %J Psychometrika %D 1995 %T Review of the book Computerized Adaptive Testing: A Primer %A Andrich, D. %B Psychometrika %V 4? %P 615-620 %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The relationship between examinee anxiety and preference for self-adapted testing %A Wise, S. L. %A Roos, L. L. %A Plake, B. S. %A Nebelsick-Gullett, L. J. %B Applied Measurement in Education %V 7 %P 81-91 %G eng %0 Book Section %B Objective measurement, theory into practice %D 1994 %T Reliability of alternate computer adaptive tests %A Lunz, M. E. %A Bergstrom, Betty A. %A Wright, B. D. %B Objective measurement, theory into practice %I Ablex %C New Jersey %V II %G eng %0 Journal Article %J Journal of Educational Measurement %D 1991 %T On the reliability of testlet-based tests %A Sireci, S. G. %A Wainer, H. %A Thissen, D. %B Journal of Educational Measurement %V 28 %P 237-247 %G eng %0 Book Section %D 1990 %T Reliability and measurement precision %A Thissen, D. %C H. Wainer, N. J. Dorans, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, and D. Thissen (Eds.), Computerized adaptive testing: A primer (pp. 161-186). Hillsdale NJ: Erlbaum. %G eng %0 Book Section %D 1990 %T A research proposal for field testing CAT for nursing licensure examinations %A A Zara %C Delegate Assembly Book of Reports 1989. Chicago: National Council of State Boards of Nursing. %G eng %0 Journal Article %J Psychological Assessment %D 1989 %T A real-data simulation of computerized adaptive administration of the MMPI %A Ben-Porath, Y. S. %A Slutske, W. S. %A Butcher, J. N. %K computerized adaptive testing %X A real-data simulation of computerized adaptive administration of the MMPI was conducted with data obtained from two personnel-selection samples and two clinical samples. 
A modification of the countdown method was tested to determine the usefulness, in terms of item administration savings, of several different test administration procedures. Substantial item administration savings were achieved for all four samples, though the clinical samples required administration of more items to achieve accurate classification and/or full-scale scores than did the personnel-selection samples. The use of normative item endorsement frequencies was found to be as effective as sample-specific frequencies for the determination of item administration order. The role of computerized adaptive testing in the future of personality assessment is discussed. (C) 1989 by the American Psychological Association %B Psychological Assessment %V 1 %P 18-22 %G eng %M 00012030-198903000-00003 %0 Book Section %D 1989 %T A research proposal for field testing CAT for nursing licensure examinations %A A Zara %C Delegate Assembly Book of Reports 1989. Chicago: National Council of State Boards of Nursing, Inc. %G eng %0 Conference Paper %B annual meeting of the American Educational Research Association %D 1988 %T The Rasch model and missing data, with an emphasis on tailoring test items %A de Gruijter, D. N. M. %X Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the Mokken scale) is considered. The possibility of tailoring test items in combination with their estimation is discussed; however, most methods for the estimation of item parameters are inadequate under tailoring. Without special measures, only marginal maximum likelihood produces adequate item parameter estimates under item tailoring. 
Fischer's approximate minimum-chi-square method, which efficiently produces item parameter estimates for the Rasch model, is also discussed. (TJH) %B annual meeting of the American Educational Research Association %C New Orleans, LA. USA %8 April 5-9 %G eng %M ED297012 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 1988 %T The Rasch model and multi-stage testing %A Glas, C. A. W. %B Journal of Educational and Behavioral Statistics %V 13 %P 45-52 %G eng %0 Book Section %D 1988 %T On a Rasch-model-based test for non-computerized adaptive testing %A Kubinger, K. D. %C Langeheine, R. and Rost, J. (Eds.), Latent trait and latent class models. New York: Plenum Press. %G eng %0 Conference Paper %B Paper presented at the 23rd Annual Symposium on recent developments in the use of the MMPI %D 1988 %T A real-data simulation of adaptive MMPI administration %A Slutske, W. S. %A Ben-Porath, Y. S. %A Butcher, J. N. %B Paper presented at the 23rd Annual Symposium on recent developments in the use of the MMPI %C St. Petersburg FL %G eng %0 Generic %D 1988 %T Refinement of the Computerized Adaptive Screening Test (CAST) (Final Report, Contract No. MDA203 06-C-0373) %A Wise, L. L. %A McHenry, J. J. %A Chia, W. J. %A Szenas, P. L. %A J. R. McBride %C Washington, DC: American Institutes for Research. %G eng %0 Book Section %D 1985 %T Reducing the predictability of adaptive item sequences %A Wetzel, C. D. %A J. R. McBride %C Proceedings of the 27th Annual Conference of the Military Testing Association, San Diego, 43-48. %G eng %0 Journal Article %J Applied Psychological Measurement %D 1984 %T Relationship Between Corresponding Armed Services Vocational Aptitude Battery (ASVAB) and Computerized Adaptive Testing (CAT) Subtests %A Moreno, K. E. %A Wetzel, C. D. %A J. R. McBride %A Weiss, D. J. 
%B Applied Psychological Measurement %V 8 %P 155-163 %G English %N 2 %0 Journal Article %J Applied Psychological Measurement %D 1984 %T Relationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests %A Moreno, K. E. %A Wetzel, C. D. %A J. R. McBride %A Weiss, D. J. %K computerized adaptive testing %X Investigated the relationships between selected subtests from the Armed Services Vocational Aptitude Battery (ASVAB) and corresponding subtests administered as computerized adaptive tests (CATs), using 270 17-26 yr old Marine recruits as Ss. Ss were administered the ASVAB before enlisting and approximately 2 wks after entering active duty, and the CAT tests were administered to Ss approximately 24 hrs after arriving at the recruit depot. Results indicate that 3 adaptive subtests correlated as well with ASVAB as did the 2nd administration of the ASVAB, although CAT subtests contained only half the number of items. Factor analysis showed CAT subtests to load on the same factors as the corresponding ASVAB subtests, indicating that the same abilities were being measured. It is concluded that CAT can achieve the same measurement precision as a conventional test, with half the number of items. (16 ref) %B Applied Psychological Measurement %V 8 %P 155-163 %G eng %0 Generic %D 1983 %T Relationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests (TR 83-27) %A Moreno, K. E. %A Wetzel, C. D. %A J. R. McBride %A Weiss, D. J. %C San Diego CA: Navy Personnel Research and Development Center %G eng %0 Book Section %D 1983 %T Reliability and validity of adaptive ability tests in a military setting %A J. R. McBride %A Martin, J. T. %C D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 224-236). New York: Academic Press. 
%G eng %0 Generic %D 1983 %T Reliability and validity of adaptive ability tests in a military recruit population (Research Report 83-1) %A J. R. McBride %A Martin, J. T. %A Weiss, D. J. %C Minneapolis: Department of Psychology, Psychometric Methods Program, Computerized Testing Laboratory %G eng %0 Generic %D 1983 %T Reliability and validity of adaptive vs. conventional tests in a military recruit population (Research Rep. No. 83-1). %A Martin, J. T. %A J. R. McBride %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. %G eng %0 Book Section %D 1982 %T Robustness of adaptive testing to multidimensionality %A Weiss, D. J. %A Suhadolnik, D. %C D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized Adaptive Testing Conference. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Book %D 1979 %T The Rasch model in computerized personality testing %A Kunce, C. S. %C Ph.D. dissertation, University of Missouri, Columbia, 1979 %G eng %0 Generic %D 1977 %T A rapid item search procedure for Bayesian adaptive testing (Research Report 77-4) %A Vale, C. D. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Conference Paper %B Third International Conference on Educational Testing %D 1977 %T Real-data simulation of a proposal for tailored testing %A Killcross, M. C. %B Third International Conference on Educational Testing %C Leyden, The Netherlands %8 06/1977 %G eng %0 Book Section %D 1977 %T Reduction of Test Bias by Adaptive Testing %A Pine, S. M. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %0 Book Section %D 1976 %T Reflections on adaptive testing %A Hansen, D. N. %C C. K. 
Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 90-94). Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1976 %T Research on adaptive testing 1973-1976: A review of the literature %A J. R. McBride %C Unpublished manuscript, University of Minnesota %G eng %0 Generic %D 1976 %T A review of research in tailored testing (APRE Report No. 9/76) %A Killcross, M. C. %C Farnborough, Hants, U.K.: Ministry of Defence, Army Personnel Research Establishment. %G eng %0 Book Section %D 1974 %T Recent and projected developments in ability testing by computer %A J. R. McBride %A Weiss, D. J. %C Earl Jones (Ed.), Symposium Proceedings: Occupational Research and the Navy–Prospectus 1980 (TR-74-14). San Diego, CA: Navy Personnel Research and Development Center. %G eng %0 Journal Article %J Review of Educational Research %D 1973 %T Response-contingent testing %A Wood, R. L. %B Review of Educational Research %V 43 %P 529-544 %G eng %0 Generic %D 1973 %T A review of testing and decision-making procedures (Technical Bulletin No. 15) %A Hambleton, R. K. %C Iowa City IA: American College Testing Program. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1971 %T Robbins-Monro procedures for tailored testing %A Lord, F. M. %B Educational and Psychological Measurement %V 31 %P 3-31 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1968 %T Reproduction of total test score through the use of sequential programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Journal of Educational Measurement %V 5 %P 183-187 %G eng