%0 Journal Article %J Journal of Computerized Adaptive Testing %D 2023 %T Expanding the Meaning of Adaptive Testing to Enhance Validity %A Steven L. Wise %K Adaptive Testing %K CAT %K CBT %K test-taking disengagement %K validity %B Journal of Computerized Adaptive Testing %V 10 %P 22-31 %G English %N 2 %R 10.7333/2305-1002022 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2023 %T An Extended Taxonomy of Variants of Computerized Adaptive Testing %A Roy Levy %A John T. Behrens %A Robert J. Mislevy %K Adaptive Testing %K evidence-centered design %K Item Response Theory %K knowledge-based model construction %K missingness %B Journal of Computerized Adaptive Testing %V 10 %G English %N 1 %R 10.7333/2302-100101 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Efficiency of Targeted Multistage Calibration Designs Under Practical Constraints: A Simulation Study %A Berger, Stéphanie %A Verschoor, Angela J. %A Eggen, Theo J. H. M. %A Moser, Urs %X Abstract Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that considers ability-related background variables and performance for assigning students to suitable items. Furthermore, we investigated whether uncertainty about item difficulty could impair the assembling of efficient designs. The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. Limited knowledge about item difficulty reduced the efficiency of one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust. %B Journal of Educational Measurement %V 56 %P 121-146 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12203 %R 10.1111/jedm.12203 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T Evaluation of a New Method for Providing Full Review Opportunities in Computerized Adaptive Testing—Computerized Adaptive Testing With Salt %A Cui, Zhongmin %A Liu, Chunyan %A He, Yong %A Chen, Hanwei %X Abstract Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees’ abilities. The practice of item review in CAT, however, is hindered by the potential danger of test-manipulation strategies. To provide review opportunities to examinees while minimizing the effect of test-manipulation strategies, researchers have proposed different algorithms to implement CAT with restricted revision options. In this article, we propose and evaluate a new method that implements CAT without any restriction on item review. In particular, we evaluate the new method in terms of the accuracy on ability estimates and the robustness against test-manipulation strategies. This study shows that the newly proposed method is promising in a win-win situation: examinees have full freedom to review and change answers, and the impacts of test-manipulation strategies are undermined. 
%B Journal of Educational Measurement %V 55 %P 582-594 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12193 %R 10.1111/jedm.12193 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Efficiency of Item Selection in CD-CAT Based on Conjunctive Bayesian Network Modeling Hierarchical attributes %A Soo-Yun Han %A Yun Joo Yoo %K CD-CAT %K Conjuctive Bayesian Network Modeling %K item selection %X

Cognitive diagnosis models (CDMs) aim to diagnose an examinee's mastery status on multiple fine-grained skills. As new cognitive diagnosis methods emerge, much attention is also being given to cognitive diagnostic computerized adaptive testing (CD-CAT). Topics such as item selection methods, item exposure control strategies, and online calibration methods, which have been well studied for traditional item response theory (IRT) based CAT, are also investigated in the context of CD-CAT (e.g., Xu, Chang, & Douglas, 2003; Wang, Chang, & Huebner, 2011; Chen et al., 2012).

Within the CDM framework, some researchers suggest modeling the structural relationships between cognitive skills, or attributes. In particular, attributes can be hierarchical, such that some attributes must be acquired before subsequent ones can be mastered. For example, in mathematics, addition must be mastered before multiplication, which implies a hierarchy between the addition and multiplication skills. Recently, new CDMs that incorporate attribute hierarchies have been proposed, including the Attribute Hierarchy Method (AHM; Leighton, Gierl, & Hunka, 2004) and the Hierarchical Diagnostic Classification Models (HDCM; Templin & Bradshaw, 2014).

Bayesian Networks (BN), the probabilistic graphical models representing the relationship of a set of random variables using a directed acyclic graph with conditional probability distributions, also provide an efficient framework for modeling the relationship between attributes (Culbertson, 2016). Among various BNs, conjunctive Bayesian network (CBN; Beerenwinkel, Eriksson, & Sturmfels, 2007) is a special kind of BN, which assumes partial ordering between occurrences of events and conjunctive constraints between them.

In this study, we propose using the CBN to model attribute hierarchies and discuss its advantages for CDM. We then explore the impact of CBN modeling on the efficiency of item selection methods for CD-CAT when the attributes are truly hierarchical. To this end, two simulation studies, one for fixed-length CAT and another for variable-length CAT, are conducted. For each study, two attribute hierarchy structures, with 5 and 8 attributes, are assumed. Among the various item selection methods developed for CD-CAT, six algorithms are considered: the posterior-weighted Kullback-Leibler index (PWKL; Cheng, 2009), the modified PWKL index (MPWKL; Kaplan, de la Torre, & Barrada, 2015), Shannon entropy (SHE; Tatsuoka, 2002), mutual information (MI; Wang, 2013), the posterior-weighted CDM discrimination index (PWCDI; Zheng & Chang, 2016), and the posterior-weighted attribute-level CDM discrimination index (PWACDI; Zheng & Chang, 2016). The impact of Q-matrix structure, item quality, and test termination rules on the efficiency of the item selection algorithms is also investigated. Evaluation measures include attribute classification accuracy (fixed-length experiment) and the test length of CD-CAT until stopping (variable-length experiment).

The results of the study indicate that the efficiency of item selection is improved by directly modeling the attribute hierarchies using the CBN. The test length required to reach the diagnosis probability threshold was reduced to 50-70% of that of a CD-CAT assuming independent attributes. The magnitude of improvement is greater when the cognitive model of the test includes more attributes and when the test is shorter. We conclude by discussing how Q-matrix structure, item quality, and test termination rules affect the efficiency.
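
As an illustration of the kind of posterior-based item selection compared in the study, the following minimal sketch implements Shannon-entropy (SHE) selection over attribute profiles under a conjunctive (DINA-like) item model; the Q-matrix, slip/guess values, and function names are hypothetical, and a CBN hierarchy would simply restrict the profile set to hierarchy-consistent patterns.

```python
# Minimal sketch (not the authors' code): Shannon-entropy (SHE) item selection over a
# posterior on attribute profiles under a conjunctive (DINA-like) item model.
# The Q-matrix and slip/guess values below are hypothetical.
import itertools
import numpy as np

K = 5                                                             # number of attributes
profiles = np.array(list(itertools.product([0, 1], repeat=K)))   # all 2^K patterns
# a CBN hierarchy would be imposed here by dropping hierarchy-inconsistent profiles

def p_correct(profile, q_row, slip=0.1, guess=0.2):
    """DINA-style success probability: mastery of every attribute required by q_row."""
    eta = np.all(profile >= q_row, axis=-1)
    return np.where(eta, 1.0 - slip, guess)

def expected_entropy(posterior, q_row):
    """Expected posterior Shannon entropy after administering an item with row q_row."""
    p1 = p_correct(profiles, q_row)                # P(correct | profile)
    m1 = float(np.sum(posterior * p1))             # marginal P(correct)
    m0 = 1.0 - m1
    post1 = posterior * p1 / m1                    # posterior after a correct answer
    post0 = posterior * (1.0 - p1) / m0            # posterior after an incorrect answer
    ent = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return m1 * ent(post1) + m0 * ent(post0)       # smaller expected entropy is better

posterior = np.full(len(profiles), 1.0 / len(profiles))   # uniform prior over profiles
Q = np.array([[1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 1, 1, 0]])                    # toy Q-matrix, one row per item
next_item = min(range(len(Q)), key=lambda j: expected_entropy(posterior, Q[j]))
print("item selected next:", next_item)
```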

References

Beerenwinkel, N., Eriksson, N., & Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli, 13(4), 893-909.

Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77(2), 201-222.

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619-632.

Culbertson, M. J. (2016). Bayesian networks in educational assessment: the state of the field. Applied Psychological Measurement, 40(1), 3-21.

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167-188.

Leighton, J. P., Gierl, M. J., & Hunka, S. M. (2004). The attribute hierarchy method for cognitive assessment: a variation on Tatsuoka's rule‐space approach. Journal of Educational Measurement, 41(3), 205-237.

Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337-350.

Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317-339.

Wang, C. (2013). Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educational and Psychological Measurement, 73(6), 1017-1035.

Wang, C., Chang, H. H., & Huebner, A. (2011). Restrictive stochastic item selection methods in cognitive diagnostic computerized adaptive testing. Journal of Educational Measurement, 48(3), 255-273.

Xu, X., Chang, H., & Douglas, J. (2003, April). A simulation study to compare CAT strategies for cognitive diagnosis. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.

Zheng, C., & Chang, H. H. (2016). High-efficiency response distribution–based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40(8), 608-624.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1RbO2gd4aULqsSgRi_VZudNN_edX82NeD %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Efficiency of Targeted Multistage Calibration Designs under Practical Constraints: A Simulation Study %A Stephanie Berger %A Angela J. Verschoor %A Theo Eggen %A Urs Moser %K CAT %K Efficiency %K Multistage Calibration %X

Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we focused on two related research questions. First, we investigated whether the efficiency of item calibration under the Rasch model could be enhanced by calibration designs that optimize the match between item difficulty and student ability (Berger, 1991). To this end, we introduced targeted multistage calibration designs, a design type that combines traditional targeted calibration designs with multistage designs. Targeted multistage calibration designs consider ability-related background variables (e.g., grade in school) as well as performance (i.e., the outcome of a preceding test stage) when assigning students to suitable items.

Second, we explored how limited a priori knowledge about item difficulty affects the efficiency of both targeted calibration designs and targeted multistage calibration designs. When arranging items within a given calibration design, test developers need to know the item difficulties to locate items optimally within the design. However, usually, no empirical information about item difficulty is available before item calibration. Owing to missing empirical data, test developers might fail to assign all items to the most suitable location within a calibration design.

Both research questions were addressed in a simulation study in which we varied the calibration design as well as the accuracy of the item distribution across the different booklets or modules within each design (i.e., the number of misplaced items). The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. In particular, targeted multistage calibration designs provided more accurate estimates for very easy and very difficult items. Limited knowledge about item difficulty during test construction impaired the efficiency of all designs. The loss of efficiency was considerable for one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust.
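
For background on why this matching matters (a standard Rasch-model fact, not a result of this study): under the Rasch model the item response function and item information are

$$P_i(\theta)=\frac{\exp(\theta-b_i)}{1+\exp(\theta-b_i)},\qquad I_i(\theta)=P_i(\theta)\bigl(1-P_i(\theta)\bigr),$$

and $I_i(\theta)$ is maximized at $\theta=b_i$ (where it equals 0.25), so calibration designs that route students to booklets whose item difficulties are near their abilities collect the most informative responses per administered item.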

References

Berger, M. P. F. (1991). On the efficiency of IRT models when applied to different sampling designs. Applied Psychological Measurement, 15(3), 293–306. doi:10.1177/014662169101500310


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/file/d/1ko2LuiARKqsjL_6aupO4Pj9zgk6p_xhd/view?usp=sharing %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T An Empirical Simulation Study Using mstR for MST Designs %A Soo Lee %K mstR %K multistage testing %X

Unlike other forms of adaptive testing, multistage testing (MST) provides many of the benefits of both adaptive and linear testing, and it has recently become one of the most sought-after designs for computerized testing in educational assessment. It is well suited to measuring educational achievement and can be adapted to practical educational survey testing. However, operational implementation of an MST design involves many practical considerations, including costs and benefits. Practitioners need to begin with simulations to evaluate various MST designs and their performance before implementation. mstR, a recently released open-source R package, was developed to support researchers and practitioners in carrying out such MST simulations.

A conventional MST design has a three-stage module structure (i.e., a 1-2-3 design). Alternatively, the composition of modules can differ from one design to another (e.g., a 1-3 design). For advance planning of equivalence studies, this paper uses both the 1-2-3 design and the 1-3 design as the MST structures. The paper evaluates these MST designs through simulations using the R package mstR. The empirical simulation study provides an introductory overview of mstR and describes what it offers when different MST structures are built from a 2PL item bank. Further comparisons show the advantages of the different MST designs (e.g., the 1-2-3 and 1-3 designs) for different practical implementations.

Built on the open-source statistical environment R, mstR provides a convenient simulation tool that allows psychologists, social scientists, and educational measurement specialists to apply MST to innovative future assessments and to its operational use.
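
To make the routing logic concrete, here is a minimal sketch of a 1-3 MST simulation under the 2PL with number-correct routing; it is written in Python for illustration only (mstR itself is an R package), and all module compositions, cut scores, and parameter values are hypothetical.

```python
# Minimal sketch (illustration only, not the mstR package, which is written in R):
# simulating a 1-3 MST design under the 2PL with number-correct routing after stage 1.
# Module compositions, cut scores, and parameter values are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

def p2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def simulate_module(theta, a, b):
    return (rng.random(len(a)) < p2pl(theta, a, b)).astype(int)

def eap(responses, a, b, nodes=np.linspace(-4, 4, 81)):
    """EAP estimate of theta under a standard normal prior."""
    post = np.exp(-0.5 * nodes**2)
    for u, ai, bi in zip(responses, a, b):
        p = p2pl(nodes, ai, bi)
        post *= p if u else (1.0 - p)
    return float(np.sum(nodes * post) / np.sum(post))

# stage-1 routing module and three stage-2 modules (easy / medium / hard)
a1, b1 = np.full(10, 1.0), np.linspace(-1, 1, 10)
stage2 = {"easy":   (np.full(10, 1.0), np.linspace(-2, 0, 10)),
          "medium": (np.full(10, 1.0), np.linspace(-1, 1, 10)),
          "hard":   (np.full(10, 1.0), np.linspace(0, 2, 10))}

theta_true = 0.5
u1 = simulate_module(theta_true, a1, b1)
route = "easy" if u1.sum() <= 3 else ("hard" if u1.sum() >= 7 else "medium")
a2, b2 = stage2[route]
u2 = simulate_module(theta_true, a2, b2)
theta_hat = eap(np.concatenate([u1, u2]),
                np.concatenate([a1, a2]), np.concatenate([b1, b2]))
print(route, round(theta_hat, 2))
```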

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Evaluation of Parameter Recovery, Drift, and DIF with CAT Data %A Nathan Thompson %A Jordan Stoeger %K CAT %K DIF %K Parameter Drift %K Parameter Recovery %X

Parameter drift and differential item functioning (DIF) analyses are frequent components of a test maintenance plan. That is, after test forms are published, organizations will often calibrate post-publication data at a later date to evaluate whether the performance of the items or the test has changed over time. For example, if item content is leaked, the items might gradually become easier over time, and the item statistics or parameters will reflect this.

When tests are published under a computerized adaptive testing (CAT) paradigm, they are nearly always calibrated with item response theory (IRT). IRT calibrations assume that range restriction is not an issue – that is, that each item is administered to examinees across a wide range of ability. CAT data violate this assumption. However, some organizations still wish to evaluate the continuing performance of the items from a DIF or drift perspective.

This presentation will evaluate just how inaccurate DIF and drift analyses might be on CAT data, using a Monte Carlo parameter recovery methodology. Known item parameters will be used to generate both linear and CAT data sets, which are then calibrated for DIF and drift. In addition, we will implement Randomesque item exposure constraints in some CAT conditions, as this randomization directly alleviates the range restriction problem somewhat, but it is an empirical question as to whether this improves the parameter recovery calibrations.
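
A minimal sketch of the data-generation side of such a parameter-recovery study is shown below; the 2PL pool, provisional-update rule, and sample sizes are hypothetical stand-ins, not the presenters' actual design, but they reproduce the key feature that each item ends up answered mostly by examinees near its difficulty (range restriction).

```python
# Minimal sketch (hypothetical setup, not the presenters' study): generate CAT response
# data from known 2PL parameters with maximum-information item selection, producing the
# range restriction that later recalibration for DIF/drift has to cope with.
import numpy as np

rng = np.random.default_rng(7)
n_items, n_people, test_len = 200, 1000, 20
a = rng.lognormal(mean=0.0, sigma=0.3, size=n_items)    # discriminations
b = rng.normal(0.0, 1.0, n_items)                       # difficulties
true_theta = rng.normal(0.0, 1.0, n_people)

def p2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

records = []                                            # (person, item, response) triples
for i, th in enumerate(true_theta):
    est = 0.0                                           # provisional ability estimate
    used = np.zeros(n_items, dtype=bool)
    for _ in range(test_len):
        p_hat = p2pl(est, a, b)
        info = a**2 * p_hat * (1.0 - p_hat)
        info[used] = -np.inf
        j = int(np.argmax(info))                        # maximum-information selection;
        used[j] = True                                  # randomesque control would instead
        u = int(rng.random() < p2pl(th, a[j], b[j]))    # sample from the top-k items here
        records.append((i, j, u))
        est += 0.5 * (u - p2pl(est, a[j], b[j]))        # crude provisional update
# each item's responses now come mostly from examinees near its difficulty,
# which is the range restriction discussed above
```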


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1F7HCZWD28Q97sCKFIJB0Yps0H66NPeKq %0 Journal Article %J Quality of Life Research %D 2016 %T On the effect of adding clinical samples to validation studies of patient-reported outcome item banks: a simulation study %A Smits, Niels %X To increase the precision of estimated item parameters of item response theory models for patient-reported outcomes, general population samples are often enriched with samples of clinical respondents. Calibration studies provide little information on how this sampling scheme is incorporated into model estimation. In a small simulation study the impact of ignoring the oversampling of clinical respondents on item and person parameters is illustrated. %B Quality of Life Research %V 25 %P 1635–1644 %U https://doi.org/10.1007/s11136-015-1199-9 %R 10.1007/s11136-015-1199-9 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2016 %T Effect of Imprecise Parameter Estimation on Ability Estimation in a Multistage Test in an Automatic Item Generation Context %A Colvin, Kimberly %A Keller, Lisa A %A Robin, Frederic %K Adaptive Testing %K automatic item generation %K errors in item parameters %K item clones %K multistage testing %B Journal of Computerized Adaptive Testing %V 4 %P 1-18 %G English %U http://iacat.org/jcat/index.php/jcat/article/view/59/27 %N 1 %R 10.7333/1608-040101 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Exploration of Item Selection in Dual-Purpose Cognitive Diagnostic Computerized Adaptive Testing: Based on the RRUM %A Dai, Buyun %A Zhang, Minqiang %A Li, Guangming %X Cognitive diagnostic computerized adaptive testing (CD-CAT) can be divided into two broad categories: (a) single-purpose tests, which are based on the subject’s knowledge state (KS) alone, and (b) dual-purpose tests, which are based on both the subject’s KS and traditional ability level ( ). This article seeks to identify the most efficient item selection method for the latter type of CD-CAT corresponding to various conditions and various evaluation criteria, respectively, based on the reduced reparameterized unified model (RRUM) and the two-parameter logistic model of item response theory (IRT-2PLM). The Shannon entropy (SHE) and Fisher information methods were combined to produce a new synthetic item selection index, that is, the “dapperness with information (DWI)” index, which concurrently considers both KS and within one step. The new method was compared with four other methods. The results showed that, in most conditions, the new method exhibited the best performance in terms of KS estimation and the second-best performance in terms of estimation. Item utilization uniformity and computing time are also considered for all the competing methods. %B Applied Psychological Measurement %V 40 %P 625-640 %U http://apm.sagepub.com/content/40/8/625.abstract %R 10.1177/0146621616666008 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T The Effect of Upper and Lower Asymptotes of IRT Models on Computerized Adaptive Testing %A Cheng, Ying %A Liu, Cheng %X In this article, the effect of the upper and lower asymptotes in item response theory models on computerized adaptive testing is shown analytically. 
This is done by deriving the step size between adjacent latent trait estimates under the four-parameter logistic model (4PLM) and two models it subsumes, the usual three-parameter logistic model (3PLM) and the 3PLM with upper asymptote (3PLMU). The authors show analytically that the large effect of the discrimination parameter on the step size holds true for the 4PLM and the two models it subsumes under both the maximum information method and the b-matching method for item selection. Furthermore, the lower asymptote helps reduce the positive bias of ability estimates associated with early guessing, and the upper asymptote helps reduce the negative bias induced by early slipping. Relative step size between modeling versus not modeling the upper or lower asymptote under the maximum Fisher information method (MI) and the b-matching method is also derived. It is also shown analytically why the gain from early guessing is smaller than the loss from early slipping when the lower asymptote is modeled, and vice versa when the upper asymptote is modeled. The benefit to loss ratio is quantified under both the MI and the b-matching method. Implications of the analytical results are discussed. %B Applied Psychological Measurement %V 39 %P 551-565 %U http://apm.sagepub.com/content/39/7/551.abstract %R 10.1177/0146621615585850 %0 Journal Article %J Educational Measurement: Issues and Practice %D 2015 %T Evaluating Content Alignment in Computerized Adaptive Testing %A Wise, S. L. %A Kingsbury, G. G. %A Webb, N. L. %X The alignment between a test and the content domain it measures represents key evidence for the validation of test score inferences. Although procedures have been developed for evaluating the content alignment of linear tests, these procedures are not readily applicable to computerized adaptive tests (CATs), which require large item pools and do not use fixed test forms. This article describes the decisions made in the development of CATs that influence and might threaten content alignment. It outlines a process for evaluating alignment that is sensitive to these threats and gives an empirical example of the process. %B Educational Measurement: Issues and Practice %V 34 %N 4 %R http://dx.doi.org/10.1111/emip.12094 %0 Journal Article %J Journal of Educational Measurement %D 2014 %T An Enhanced Approach to Combine Item Response Theory With Cognitive Diagnosis in Adaptive Testing %A Wang, Chun %A Zheng, Chanjin %A Chang, Hua-Hua %X

Computerized adaptive testing offers the possibility of gaining information on both the overall ability and cognitive profile in a single assessment administration. Some algorithms aiming for these dual purposes have been proposed, including the shadow test approach, the dual information method (DIM), and the constraint weighted method. The current study proposed two new methods, aggregate ranked information index (ARI) and aggregate standardized information index (ASI), which appropriately addressed the noncompatibility issue inherent in the original DIM method. More flexible weighting schemes that put different emphasis on information about general ability (i.e., θ in item response theory) and information about cognitive profile (i.e., α in cognitive diagnostic modeling) were also explored. Two simulation studies were carried out to investigate the effectiveness of the new methods and weighting schemes. Results showed that the new methods with the flexible weighting schemes could produce more accurate estimation of both overall ability and cognitive profile than the original DIM. Among them, the ASI with both empirical and theoretical weights is recommended, and attribute-level weighting scheme is preferred if some attributes are considered more important from a substantive perspective.
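
In rough symbolic form, indices in this dual-information family combine the two information sources as a weighted sum (a generic sketch for orientation, not the exact ARI/ASI definitions given in the article):

$$DI_j \;=\; w\,\tilde I_j(\hat\theta) \;+\; (1-w)\,\tilde D_j(\hat{\boldsymbol\alpha}),\qquad 0\le w\le 1,$$

where $\tilde I_j$ is the item's Fisher information about $\theta$ and $\tilde D_j$ is its cognitive-diagnostic information about $\boldsymbol\alpha$, each first ranked (ARI) or standardized (ASI) across the available items so that the two parts are on comparable scales before being combined.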

%B Journal of Educational Measurement %V 51 %P 358–380 %U http://dx.doi.org/10.1111/jedm.12057 %R 10.1111/jedm.12057 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Enhancing Pool Utilization in Constructing the Multistage Test Using Mixed-Format Tests %A Park, Ryoungsun %A Kim, Jiseon %A Chung, Hyewon %A Dodd, Barbara G. %X

This study investigated a new pool utilization method of constructing multistage tests (MST) using the mixed-format test based on the generalized partial credit model (GPCM). MST simulations of a classification test were performed to evaluate the MST design. A linear programming (LP) model was applied to perform MST reassemblies based on the initial MST construction. Three subsequent MST reassemblies were performed. For each reassembly, three test unit replacement ratios (TRRs; 0.22, 0.44, and 0.66) were investigated. The conditions of the three passing rates (30%, 50%, and 70%) were also considered in the classification testing. The results demonstrated that various MST reassembly conditions increased the overall pool utilization rates, while maintaining the desired MST construction. All MST testing conditions performed equally well in terms of the precision of the classification decision.
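
The reassembly step can be pictured with a generic 0-1 programming formulation (illustrative only; the article's LP model imposes its own objective and constraints):

$$\max_{x}\ \sum_i I_i(\theta_0)\,x_i \quad\text{s.t.}\quad \sum_i x_i = n,\qquad \sum_{i\in V_c} x_i \le n_c\ \ \forall c,\qquad \sum_{i\in S_{\text{prev}}} x_i \le (1-\text{TRR})\,n,\qquad x_i\in\{0,1\},$$

where $x_i$ indicates whether test unit $i$ enters the module, $I_i(\theta_0)$ is its information at the target (cut) ability, the $V_c$ constraints control content and format composition, and the last constraint enforces the requested test-unit replacement ratio relative to the previously assembled module $S_{\text{prev}}$.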

%B Applied Psychological Measurement %V 38 %P 268-280 %U http://apm.sagepub.com/content/38/4/268.abstract %R 10.1177/0146621613515545 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2013 %T Estimating Measurement Precision in Reduced-Length Multi-Stage Adaptive Testing %A Crotts, K.M. %A Zenisky, A. L. %A Sireci, S.G. %A Li, X. %B Journal of Computerized Adaptive Testing %V 1 %P 67-87 %G English %N 4 %R 10.7333/1309-0104067 %0 Journal Article %J Journal of Educational Measurement %D 2012 %T An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing %A Han, Kyung T. %X

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long-term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a-values while eliminating the need for item pool stratification.
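
One way to picture the "efficiency" notion underlying the criterion (a sketch consistent with the abstract's description, not a verbatim statement of the article's formula):

$$e_i(\hat\theta)\;=\;\frac{I_i(\hat\theta)}{\max_{\theta} I_i(\theta)},$$

that is, an item is used efficiently when it is administered near the ability level at which its own information peaks, so weighting selection by efficiency rather than by absolute information alone allows low-$a$ items to compete without stratifying the pool.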

%B Journal of Educational Measurement %V 49 %P 225–246 %U http://dx.doi.org/10.1111/j.1745-3984.2012.00173.x %R 10.1111/j.1745-3984.2012.00173.x %0 Journal Article %J Applied Psychological Measurement %D 2012 %T An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing %A Yen, Yung-Chin %A Ho, Rong-Guey %A Laio, Wen-Wei %A Chen, Li-Ju %A Kuo, Ching-Chin %X

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item administration. To enhance the robust performance of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model were conducted based on simulation experiments. This study attempts to investigate the performance of the 4PL IRT model as a slip-correction mechanism with an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of the examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.
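
For reference, the 4PL item response function referred to above is

$$P_i(\theta)\;=\;c_i+\frac{d_i-c_i}{1+\exp\bigl[-a_i(\theta-b_i)\bigr]},\qquad 0\le c_i<d_i\le 1,$$

where the lower asymptote $c_i$ absorbs lucky guesses and the upper asymptote $d_i<1$ allows even high-ability examinees to slip, which is what damps the effect of early careless errors on the ability estimate.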

%B Applied Psychological Measurement %V 36 %P 75-87 %U http://apm.sagepub.com/content/36/2/75.abstract %R 10.1177/0146621611432862 %0 Thesis %B THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY %D 2011 %T Effects of Different Computerized Adaptive Testing Strategies on Recovery of Ability %A Kalender, I. %X

The purpose of the present study is to compare ability estimates obtained from a computerized adaptive testing (CAT) procedure with the paper-and-pencil administration results of the Student Selection Examination (SSE) science subtest, considering different ability estimation methods and test termination rules. The study had two phases. In the first phase, a post-hoc simulation was conducted to examine the relationships between examinee ability levels estimated by the CAT and paper-and-pencil versions of the SSE. Maximum Likelihood Estimation and Expected A Posteriori were used as the ability estimation methods, and the test termination rules were a standard error threshold and a fixed number of items. In the second phase, a live CAT administration was given to a group of examinees to investigate the performance of CAT outside the simulated environment. Findings of the post-hoc simulations indicated that CAT could be implemented for the SSE using the Expected A Posteriori estimation method with a standard error threshold of 0.30 or higher. The correlation between ability estimates obtained by CAT and the real SSE was 0.95, and the mean number of items administered by the CAT was 18.4. The correlation between live CAT and real SSE ability estimates was 0.74. The number of items used in the CAT administration was approximately 50% of the items in the paper-and-pencil SSE science subtest. The results indicated that CAT for the SSE science subtest provided ability estimates with higher reliability using fewer items than the paper-and-pencil format.
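
A minimal sketch of the estimation-and-termination loop used in such post-hoc simulations is given below; the item pool, 3PL parameter values, and b-matching selection rule are hypothetical, not the SSE item bank or the thesis code.

```python
# Minimal sketch (hypothetical pool and selection rule, not the SSE item bank or the
# thesis code): EAP ability estimation with a standard-error stopping rule of 0.30.
import numpy as np

nodes = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * nodes**2)                    # standard normal prior (unnormalized)

def p3pl(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def eap_and_se(responses, items):
    post = prior.copy()
    for u, (a, b, c) in zip(responses, items):
        p = p3pl(nodes, a, b, c)
        post *= p if u else (1.0 - p)
    post /= post.sum()
    est = float(np.sum(nodes * post))
    se = float(np.sqrt(np.sum((nodes - est) ** 2 * post)))
    return est, se

rng = np.random.default_rng(0)
pool = [(rng.lognormal(0.0, 0.3), rng.normal(0.0, 1.0), 0.2) for _ in range(300)]
theta_true, responses, items = 0.0, [], []
est, se = 0.0, np.inf
while se > 0.30 and len(items) < len(pool):        # SE-threshold termination rule
    a, b, c = min((it for it in pool if it not in items),
                  key=lambda it: abs(it[1] - est))  # administer item with b nearest theta-hat
    items.append((a, b, c))
    responses.append(int(rng.random() < p3pl(theta_true, a, b, c)))
    est, se = eap_and_se(responses, items)
print(len(items), round(est, 2), round(se, 2))
```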

%B THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY %V Ph.D. %G eng %0 Journal Article %J Quality of Life Research %D 2010 %T Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms %A Choi, S. %A Reise, S. P. %A Pilkonis, P. A. %A Hays, R. D. %A Cella, D. %B Quality of Life Research %V 19(1) %P 125–136 %G eng %0 Book %D 2010 %T Elements of Adaptive Testing %A van der Linden, W. J. %A Glas, C. A. W. %I Springer %C New York %P 437 %G eng %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Estimation of the Parameters in an Item-Cloning Model for Adaptive Testing %A Glas, C. A. W. %A van der Linden, W. J. %A Geerlings, H. %B Elements of Adaptive Testing %P 289-314 %G eng %& 15 %R 10.1007/978-0-387-85461-8 %0 Book Section %D 2009 %T Effect of early misfit in computerized adaptive testing on the recovery of theta %A Guyer, R. D. %A Weiss, D. J. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Diagnostica %D 2009 %T Effekte des adaptiven Testens auf die Moti¬vation zur Testbearbeitung [Effects of adaptive testing on test taking motivation]. %A Frey, A. %A Hartig, J. %A Moosbrugger, H. %B Diagnostica %V 55 %P 20-28 %G German %0 Journal Article %J International Journal for Methods in Psychiatric Research %D 2009 %T Evaluation of a computer-adaptive test for the assessment of depression (D-CAT) in clinical application %A Fliege, H. %A Becker, J. %A Walter, O. B. %A Rose, M. %A Bjorner, J. B. %A Klapp, B. F. %X In the past, a German Computerized Adaptive Test, based on Item Response Theory (IRT), was developed for purposes of assessing the construct depression [Computer-adaptive test for depression (D-CAT)]. This study aims at testing the feasibility and validity of the real computer-adaptive application.The D-CAT, supplied by a bank of 64 items, was administered on personal digital assistants (PDAs) to 423 consecutive patients suffering from psychosomatic and other medical conditions (78 with depression). Items were adaptively administered until a predetermined reliability (r >/= 0.90) was attained. For validation purposes, the Hospital Anxiety and Depression Scale (HADS), the Centre for Epidemiological Studies Depression (CES-D) scale, and the Beck Depression Inventory (BDI) were administered. Another sample of 114 patients was evaluated using standardized diagnostic interviews [Composite International Diagnostic Interview (CIDI)].The D-CAT was quickly completed (mean 74 seconds), well accepted by the patients and reliable after an average administration of only six items. In 95% of the cases, 10 items or less were needed for a reliable score estimate. Correlations between the D-CAT and the HADS, CES-D, and BDI ranged between r = 0.68 and r = 0.77. The D-CAT distinguished between diagnostic groups as well as established questionnaires do.The D-CAT proved an efficient, well accepted and reliable tool. Discriminative power was comparable to other depression measures, whereby the CAT is shorter and more precise. Item usage raises questions of balancing the item selection for content in the future. Copyright (c) 2009 John Wiley & Sons, Ltd. 
%B International Journal for Methods in Psychiatric Research %7 2009/02/06 %V 18 %P 233-236 %8 Feb 4 %@ 1049-8931 (Print) %G Eng %M 19194856 %0 Book Section %D 2009 %T An evaluation of a new procedure for computing information functions for Bayesian scores from computerized adaptive tests %A Ito, K. %A Pommerich, M %A Segall, D. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Clinical Epidemiology %D 2009 %T An evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception %A Kocalevent, R. D. %A Rose, M. %A Becker, J. %A Walter, O. B. %A Fliege, H. %A Bjorner, J. B. %A Kleiber, D. %A Klapp, B. F. %K *Diagnosis, Computer-Assisted %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Confidence Intervals %K Female %K Humans %K Male %K Middle Aged %K Perception %K Quality of Health Care/*standards %K Questionnaires %K Reproducibility of Results %K Sickness Impact Profile %K Stress, Psychological/*diagnosis/psychology %K Treatment Outcome %X OBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients, (N3) together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0+/-2.3 (M+/-SD) stress reaction items and 11.6+/-1.7 stress exposure items. The second simulation study reanalyzed real patients data (N1) and showed an average use of items of 5.6+/-2.1 for the dimension stress reaction and 10.0+/-4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making. %B Journal of Clinical Epidemiology %7 2008/07/22 %V 62 %P 278-287 %@ 1878-5921 (Electronic)0895-4356 (Linking) %G eng %M 18639439 %0 Book Section %D 2009 %T An examination of decision-theory adaptive testing procedures %A Rudner, L. M. %X This research examined three ways to adaptively select items using decision theory: a traditional decision theory sequential testing approach (expected minimum cost), information gain (modeled after Kullback-Leibler), and a maximum discrimination approach, and then compared them all against an approach using maximum IRT Fisher information. It also examined the use of Wald’s (1947) wellknown sequential probability ratio test, SPRT, as a test termination rule in this context. The minimum cost approach was notably better than the best-case possibility for IRT. Information gain, which is based on entropy and comes from information theory, was almost identical to minimum cost. 
The simple approach using the item that best discriminates between the two most likely classifications also fared better than IRT, but not as well as information gain or minimum cost. Through Wald’s SPRT, large percentages of examinees can be accurately classified with very few items. With only 25 sequentially selected items, for example, approximately 90% of the simulated NAEP examinees were classified with 86% accuracy. The advantages of the decision theory model are many—the model yields accurate mastery state classifications, can use a small item pool, is simple to implement, requires little pretesting, is applicable to criterion-referenced tests, can be used in diagnostic testing, can be adapted to yield classifications on multiple skills, and should be easy to explain to non-statisticians. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book %D 2008 %T Effect of early misfit in computerized adaptive testing on the recovery of theta %A Guyer, R. D. %C Unpublished Ph.D. dissertation, University of Minnesota, Minneapolis MN. %G eng %0 Journal Article %J Disability & Rehabilitation %D 2008 %T Efficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning %A Allen, D. D. %A Ni, P. %A Haley, S. M. %K *Disability Evaluation %K Child %K Computers %K Disabled Children/*classification/rehabilitation %K Efficiency %K Humans %K Outcome Assessment (Health Care) %K Psychometrics %K Reproducibility of Results %K Retrospective Studies %K Self Care %K Sensitivity and Specificity %X PURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden. %B Disability & Rehabilitation %7 2008/02/26 %V 30 %P 479-84 %@ 0963-8288 (Print)0963-8288 (Linking) %G eng %M 18297502 %0 Journal Article %J Educational Assessment %D 2007 %T The effect of including pretest items in an operational computerized adaptive test: Do different ability examinees spend different amounts of time on embedded pretest items? %A Ferdous, A. A. %A Plake, B. S. %A Chang, S-R. 
%K ability %K operational computerized adaptive test %K pretest items %K time %X The purpose of this study was to examine the effect of pretest items on response time in an operational, fixed-length, time-limited computerized adaptive test (CAT). These pretest items are embedded within the CAT, but unlike the operational items, are not tailored to the examinee's ability level. If examinees with higher ability levels need less time to complete these items than do their counterparts with lower ability levels, they will have more time to devote to the operational test questions. Data were from a graduate admissions test that was administered worldwide. Data from both quantitative and verbal sections of the test were considered. For the verbal section, examinees in the lower ability groups spent systematically more time on their pretest items than did those in the higher ability groups, though for the quantitative section the differences were less clear. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Educational Assessment %I Lawrence Erlbaum: US %V 12 %P 161-173 %@ 1062-7197 (Print); 1532-6977 (Electronic) %G eng %M 2007-06685-003 %0 Generic %D 2007 %T The effect of using item parameters calibrated from paper administrations in computer adaptive test administrations %A Pommerich, M %K Mode effects %X Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers have engaged in extensive studies of whether scores differ across paper and computer presentations of the same tests. The research generally seems to indicate that the more complicated it is to present or take a test on computer, the greater the possibility of mode effects. In a computer adaptive test, mode effects may be a particular concern if items are calibrated using item responses obtained from one administration mode (i.e., paper), and those parameters are then used operationally in a different administration mode (i.e., computer). This paper studies the suitability of using parameters calibrated from a paper administration for item selection and scoring in a computer adaptive administration, for two tests with lengthy passages that required navigation in the computer administration. The results showed that the use of paper calibrated parameters versus computer calibrated parameters in computer adaptive administrations had small to moderate effects on the reliability of examinee scores, at fairly short test lengths. This effect was generally diminished for longer test lengths. However, the results suggest that in some cases, some loss in reliability might be inevitable if paper-calibrated parameters are used in computer adaptive administrations. %B Journal of Technology, Learning, and Assessment %V 5 %P 1-29 %G eng %0 Journal Article %J The Journal of Technology, Learning, and Assessment %D 2007 %T The Effect of Using Item Parameters Calibrated from Paper Administrations in Computer Adaptive Test Administrations %A Pommerich, M %X

Computer administered tests are becoming increasingly prevalent as computer technology becomes more readily available on a large scale. For testing programs that utilize both computer and paper administrations, mode effects are problematic in that they can result in examinee scores that are artificially inflated or deflated. As such, researchers have engaged in extensive studies of whether scores differ across paper and computer presentations of the same tests. The research generally seems to indicate that the more complicated it is to present or take a test on computer, the greater the possibility of mode effects. In a computer adaptive test, mode effects may be a particular concern if items are calibrated using item responses obtained from one administration mode (i.e., paper), and those parameters are then used operationally in a different administration mode (i.e., computer). This paper studies the suitability of using parameters calibrated from a paper administration for item selection and scoring in a computer adaptive administration, for two tests with lengthy passages that required navigation in the computer administration. The results showed that the use of paper calibrated parameters versus computer calibrated parameters in computer adaptive administrations had small to moderate effects on the reliability of examinee scores, at fairly short test lengths. This effect was generally diminished for longer test lengths. However, the results suggest that in some cases, some loss in reliability might be inevitable if paper-calibrated parameters are used in computer adaptive administrations.

%B The Journal of Technology, Learning, and Assessment %V 5 %0 Journal Article %J Educational and Psychological Measurement %D 2007 %T Estimating the Standard Error of the Maximum Likelihood Ability Estimator in Adaptive Testing Using the Posterior-Weighted Test Information Function %A Penfield, Randall D. %X

The standard error of the maximum likelihood ability estimator is commonly estimated by evaluating the test information function at an examinee's current maximum likelihood estimate (a point estimate) of ability. Because the test information function evaluated at the point estimate may differ from the test information function evaluated at an examinee's true ability value, the estimated standard error may be biased under certain conditions. This is of particular concern in adaptive testing because the height of the test information function is expected to be higher at the current estimate of ability than at the actual value of ability. This article proposes using the posterior-weighted test information function in computing the standard error of the maximum likelihood ability estimator for adaptive test sessions. A simulation study showed that the proposed approach provides standard error estimates that are less biased and more efficient than those provided by the traditional point estimate approach.
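
In symbols, the contrast is between evaluating the test information at the point estimate and averaging it over the posterior for ability (a sketch of the idea described above):

$$\widehat{SE}_{\text{point}}(\hat\theta_{ML})=\frac{1}{\sqrt{I(\hat\theta_{ML})}}, \qquad \widehat{SE}_{\text{PW}}(\hat\theta_{ML})=\frac{1}{\sqrt{\int I(\theta)\,\pi(\theta\mid\mathbf u)\,d\theta}},$$

where $I(\theta)=\sum_i I_i(\theta)$ is the information of the administered items and $\pi(\theta\mid\mathbf u)$ is the posterior given the responses, so the information is no longer read off only at the (typically information-inflating) current point estimate.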

%B Educational and Psychological Measurement %V 67 %P 958-975 %U http://epm.sagepub.com/content/67/6/958.abstract %R 10.1177/0013164407301544 %0 Journal Article %J International Journal of Web-Based Learning and Teaching Technologies %D 2007 %T Evaluation of computer adaptive testing systems %A Economides, A. A. %A Roupas, C %K computer adaptive testing systems %K examination organizations %K systems evaluation %X Many educational organizations are trying to reduce the cost of the exams, the workload and delay of scoring, and the human errors. Also, they try to increase the accuracy and efficiency of the testing. Recently, most examination organizations use computer adaptive testing (CAT) as the method for large scale testing. This article investigates the current state of CAT systems and identifies their strengths and weaknesses. It evaluates 10 CAT systems using an evaluation framework of 15 domains categorized into three dimensions: educational, technical, and economical. The results show that the majority of the CAT systems give priority to security, reliability, and maintainability. However, they do not offer to the examinee any advanced support and functionalities. Also, the feedback to the examinee is limited and the presentation of the items is poor. Recommendations are made in order to enhance the overall quality of a CAT system. For example, alternative multimedia items should be available so that the examinee would choose a preferred media type. Feedback could be improved by providing more information to the examinee or providing information anytime the examinee wished. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B International Journal of Web-Based Learning and Teaching Technologies %I IGI Global: US %V 2 %P 70-87 %@ 1548-1093 (Print); 1548-1107 (Electronic) %G eng %M 2007-04391-004 %0 Journal Article %J Acta Psychologica Sinica %D 2007 %T An exploration and realization of computerized adaptive testing with cognitive diagnosis %A Haijing, L. %A Shuliang, D. %X An increased attention paid to “cognitive bugs behavior,” appears to lead to an increased research interests in diagnostic testing based on Item Response Theory(IRT)that combines cognitive psychology and psychometrics. The study of cognitive diagnosis were applied mainly to Paper-and-Pencil (P&P) testing. Rarely has it been applied to computerized adaptive testing CAT), To our knowledge, no research on CAT with cognitive diagnosis has been conducted in China. Since CAT is more efficient and accurate than P&P testing, there is important to develop an application technique for cognitive diagnosis suitable for CAT. This study attempts to construct a preliminary CAT system for cognitive diagnosis.With the help of the methods for “ Diagnosis first, Ability estimation second ”, the knowledge state conversion diagram was used to describe all the possible knowledge states in a domain of interest and the relation among the knowledge states at the diagnosis stage, where a new strategy of item selection based-on the algorithm of Depth First Search was proposed. On the other hand, those items that contain attributes which the examinee has not mastered were removed in ability estimation. 
At the stage of accurate ability estimation, all the items answered by each examinee not only matched his/her estimated ability value but also were limited to those items whose attributes had been mastered by the examinee. We used Monte Carlo simulation to generate all the data for the three different structures of cognitive attributes in this study. These structures were tree-shaped, forest-shaped, and some isolated vertices (related to a simple Q-matrix). Both the tree-shaped and isolated-vertices structures were derived from actual cases, while the forest-shaped structure was a generalized simulation. 3000 examinees and 3000 items were simulated in the tree-shaped experiment, 2550 examinees and 3100 items in the forest-shaped experiment, and 2000 examinees and 2500 items in the isolated-vertices experiment. The maximum test length was set to 30 items in all experiments. The difficulty parameters and the logarithm of the discrimination parameters were drawn from the standard normal distribution N(0,1). There were 100 examinees of each attribute pattern in the tree-shaped experiment and 50 examinees of each attribute pattern in the forest-shaped experiment; in the isolated-vertices experiment, the 2000 examinees were students from an actual case. To assess the behavior of the proposed diagnostic approach, three assessment indices were used: the attribute pattern classification agreement rate (APCAR), the Recovery (the average absolute deviation between the estimated and true values), and the average test length (Length). Parts of the results of the Monte Carlo study were as follows. For the tree-shaped attribute structure, APCAR was 84.27%, Recovery was 0.17, and Length was 24.80. For the forest-shaped attribute structure, APCAR was 84.02%, Recovery was 0.172, and Length was 23.47. For the isolated-vertices attribute structure, APCAR was 99.16%, Recovery was 0.256, and Length was 27.32. As shown above, we can conclude that the results are favorable. The rate of cognitive diagnosis accuracy exceeded 80% in each experiment, and the Recovery was also good. Therefore, it should be acceptable to construct an initial CAT system for cognitive diagnosis using the methods for "Diagnosis first, Ability estimation second" with the help of both the knowledge state conversion diagram and the new strategy of item selection based on the algorithm of Depth First Search %B Acta Psychologica Sinica %V 39 %P 747-753 %G eng %0 Book Section %D 2007 %T Exploring potential designs for multi-form structure computerized adaptive tests with uniform item exposure %A Edwards, M. C. %A Thissen, D. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2006 %T Effects of Estimation Bias on Multiple-Category Classification With an IRT-Based Adaptive Classification Procedure %A Yang, Xiangdong %A Poggio, John C. %A Glasnapp, Douglas R. %X

The effects of five ability estimators, that is, maximum likelihood estimator, weighted likelihood estimator, maximum a posteriori, expected a posteriori, and Owen's sequential estimator, on the performances of the item response theory–based adaptive classification procedure on multiple categories were studied via simulations. The following results were found. (a) The Bayesian estimators were more likely to misclassify examinees into an inward category because of their inward biases, when a fixed start value of zero was assigned to every examinee. (b) When moderately accurate start values were available, however, Bayesian estimators produced classifications that were slightly more accurate than was the maximum likelihood estimator or weighted likelihood estimator. Expected a posteriori was the procedure that produced the most accurate results among the three Bayesian methods. (c) All five estimators produced equivalent efficiencies in terms of number of items required, which was 50 or more items except for abilities that were less than -2.00 or greater than 2.00.
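
The inward bias noted for the Bayesian methods follows from their use of a prior $\pi(\theta)$ centered at the fixed start value (zero here), which shrinks estimates toward the prior mean; compare, for example,

$$\hat\theta_{ML}=\arg\max_{\theta} L(\mathbf u\mid\theta), \qquad \hat\theta_{EAP}=\frac{\int\theta\,L(\mathbf u\mid\theta)\,\pi(\theta)\,d\theta}{\int L(\mathbf u\mid\theta)\,\pi(\theta)\,d\theta}.$$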

%B Educational and Psychological Measurement %V 66 %P 545-564 %U http://epm.sagepub.com/content/66/4/545.abstract %R 10.1177/0013164405284031 %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Equating scores from adaptive to linear tests %A van der Linden, W. J. %K computerized adaptive testing %K equipercentile equating %K local equating %K score reporting %K test characteristic function %X Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method. Both methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. It is argued that the worse performances of the two methods are a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise between the individual score distributions. %B Applied Psychological Measurement %I Sage Publications: US %V 30 %P 493-508 %@ 0146-6216 (Print) %G eng %M 2006-20197-003 %0 Journal Article %J J Educ Eval Health Prof %D 2006 %T Estimation of an examinee's ability in the web-based computerized adaptive testing program IRT-CAT %A Lee, Y. H. %A Park, J. H. %A Park, I. Y. %X We developed a program to estimate an examinee s ability in order to provide freely available access to a web-based computerized adaptive testing (CAT) program. We used PHP and Java Script as the program languages, PostgresSQL as the database management system on an Apache web server and Linux as the operating system. A system which allows for user input and searching within inputted items and creates tests was constructed. We performed an ability estimation on each test based on a Rasch model and 2- or 3-parametric logistic models. Our system provides an algorithm for a web-based CAT, replacing previous personal computer-based ones, and makes it possible to estimate an examinee's ability immediately at the end of test. %B J Educ Eval Health Prof %7 2006/01/01 %V 3 %P 4 %@ 1975-5937 (Electronic) %G eng %M 19223996 %2 2631187 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T An evaluation of a patient-reported outcomes found computerized adaptive testing was efficient in assessing osteoarthritis impact %A Kosinski, M. %A Bjorner, J. %A Warejr, J. %A Sullivan, E. %A Straus, W. %X BACKGROUND AND OBJECTIVES: Evaluate a patient-reported outcomes questionnaire that uses computerized adaptive testing (CAT) to measure the impact of osteoarthritis (OA) on functioning and well-being. MATERIALS AND METHODS: OA patients completed 37 questions about the impact of OA on physical, social and role functioning, emotional well-being, and vitality. Questionnaire responses were calibrated and scored using item response theory, and two scores were estimated: a Total-OA score based on patients' responses to all 37 questions, and a simulated CAT-OA score where the computer selected and scored the five most informative questions for each patient. Agreement between Total-OA and CAT-OA scores was assessed using correlations. 
Discriminant validity of Total-OA and CAT-OA scores was assessed with analysis of variance. Criterion measures included OA pain and severity, patient global assessment, and missed work days. RESULTS: Simulated CAT-OA and Total-OA scores correlated highly (r = 0.96). Both Total-OA and simulated CAT-OA scores discriminated significantly between patients differing on the criterion measures. F-statistics across criterion measures ranged from 39.0 (P < .001) to 225.1 (P < .001) for the Total-OA score, and from 40.5 (P < .001) to 221.5 (P < .001) for the simulated CAT-OA score. CONCLUSIONS: CAT methods produce valid and precise estimates of the impact of OA on functioning and well-being with significant reduction in response burden. %B Journal of Clinical Epidemiology %V 59 %P 715-723 %@ 08954356 %G eng %0 Journal Article %J British Journal of Educational Technology %D 2006 %T Evaluation parameters for computer adaptive testing %A Georgiadou, E. %A Triantafillou, E. %A Economides, A. A. %B British Journal of Educational Technology %V Vol. 37 %P 261-278 %G eng %N No 2 %0 Journal Article %J Journal of Applied Measurement %D 2006 %T Expansion of a physical function item bank and development of an abbreviated form for clinical research %A Bode, R. K. %A Lai, J-S. %A Dineen, K. %A Heinemann, A. W. %A Shevrin, D. %A Von Roenn, J. %A Cella, D. %K clinical research %K computerized adaptive testing %K performance levels %K physical function item bank %K Psychometrics %K test reliability %K Test Validity %X We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Applied Measurement %I Richard M Smith: US %V 7 %P 1-15 %@ 1529-7713 (Print) %G eng %M 2006-01262-001 %0 Conference Paper %B Annual meeting of the National Council on Measurement in Education %D 2005 %T The effectiveness of using multiple item pools in computerized adaptive testing %A Zhang, J. %A Chang, H. 
%B Annual meeting of the National Council on Measurement in Education %C Montreal, Canada %8 04/2005 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %K computerized adaptive testing %K item selection rules %K practical constraints %X The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 41 %P 149-174 %@ 0022-0655 (Print) %G eng %M 2005-04771-004 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Estimating ability and item-selection strategy in self-adapted testing: A latent class approach %A Revuelta, J. %K estimating ability %K item-selection strategies %K psychometric model %K self-adapted testing %X This article presents a psychometric model for estimating ability and item-selection strategies in self-adapted testing. In contrast to computer adaptive testing, in self-adapted testing the examinees are allowed to select the difficulty of the items. The item-selection strategy is defined as the distribution of difficulty conditional on the responses given to previous items. The article shows that missing responses in self-adapted testing are missing at random and can be ignored in the estimation of ability. However, the item-selection strategy cannot always be ignored in such an estimation. An EM algorithm is presented to estimate an examinee's ability and strategies, and a model fit is evaluated using Akaike's information criterion. The article includes an application with real data to illustrate how the model can be used in practice for evaluating hypotheses, estimating ability, and identifying strategies. In the example, four strategies were identified and related to examinees' ability. It was shown that individual examinees tended not to follow a consistent strategy throughout the test. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational and Behavioral Statistics %I American Educational Research Assn: US %V 29 %P 379-396 %@ 1076-9986 (Print) %G eng %M 2005-00264-002 %0 Report %D 2004 %T Evaluating scale stability of a computer adaptive testing system %A Guo, F. %A Wang, L. 
%I GMAC %C McLean, VA %G eng %0 Book %D 2004 %T Evaluating the effects of several multi-stage testing design variables on selected psychometric outcomes for certification and licensure assessment %A Zenisky, A. L. %C Unpublished doctoral dissertation, University of Massachusetts, Amherst %G eng %0 Journal Article %J ReCALL %D 2004 %T Évaluation et multimédia dans l'apprentissage d'une L2 [Assessment and multimedia in learning an L2] %A Laurier, M. %K Adaptive Testing %K Computer Assisted Instruction %K Educational %K Foreign Language Learning %K Program Evaluation %K Technology computerized adaptive testing %X In the first part of this paper different areas where technology may be used for second language assessment are described. First, item banking operations, which are generally based on Item Response Theory but not necessarily restricted to dichotomously scored items, facilitate assessment task organization and require technological support. Second, technology may help to design more authentic assessment tasks or may be needed in some direct testing situations. Third, the assessment environment may be more adapted and more stimulating when technology is used to give the student more control. The second part of the paper presents different functions of assessment. The monitoring function (often called formative assessment) aims at adapting the classroom activities to students and at providing continuous feedback. Technology may be used to train the teachers in monitoring techniques, to organize data or to produce diagnostic information; electronic portfolios or quizzes that are built in some educational software may also be used for monitoring. The placement function is probably the one in which the application of computer adaptive testing procedures (e.g. French CAPT) is the most appropriate. Automatic scoring devices may also be used for placement purposes. Finally, the certification function requires more valid and more reliable tools. Technology may be used to enhance the testing situation (to make it more authentic) or to facilitate data processing during the construction of a test. Almond et al. (2002) propose a four-component model (Selection, Presentation, Scoring and Response) for designing assessment systems. Each component must be planned taking into account the assessment function. %B ReCALL %V 16 %P 475-487 %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Evaluation of the CATSIB DIF procedure in a pretest setting %A Nandakumar, R. %A Roussos, L. A. %K computerized adaptive tests %K differential item functioning %X A new procedure, CATSIB, for assessing differential item functioning (DIF) on computerized adaptive tests (CATs) is proposed. CATSIB, a modified SIBTEST procedure, matches test takers on estimated ability and controls for impact-induced Type I error inflation by employing a CAT version of the SIBTEST "regression correction." The performance of CATSIB in terms of detection of DIF in pretest items was evaluated in a simulation study. Simulated test takers were adaptively administered 25 operational items from a pool of 1,000 and were linearly administered 16 pretest items that were evaluated for DIF. Sample size varied from 250 to 500 in each group. Simulated impact levels ranged from a 0- to 1-standard-deviation difference in mean ability levels. 
The results showed that CATSIB with the regression correction displayed good control over Type I error, whereas CATSIB without the regression correction displayed impact-induced Type I error inflation. With 500 test takers in each group, power rates were exceptionally high (84% to 99%) for values of DIF at the boundary between moderate and large DIF. For smaller samples of 250 test takers in each group, the corresponding power rates ranged from 47% to 95%. In addition, in all cases, CATSIB was very accurate in estimating the true values of DIF, displaying at most only minor estimation bias. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational and Behavioral Statistics %I American Educational Research Assn: US %V 29 %P 177-199 %@ 1076-9986 (Print) %G eng %M 2004-19188-002 %0 Generic %D 2003 %T Effect of extra time on GRE® Quantitative and Verbal Scores (Research Report 03-13) %A Bridgeman, B. %A Cline, F. %A Hessinger, J. %C Princeton NJ: Educational Testing Service %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T The effect of item selection method on the variability of CAT’s ability estimates when item parameters are contaminated with measurement errors %A Li, Y. H. %A Schafer, W. D. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 2003 %T The effects of model misfit in computerized classification test %A Jiao, H. %A Lau, A. C. %C Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago IL %G eng %0 Journal Article %J Dissertation Abstracts International Section A: Humanities & Social Sciences %D 2003 %T The effects of model specification error in item response theory-based computerized classification test using sequential probability ratio test %A Jiao, H. %X This study investigated the effects of model specification error on classification accuracy, error rates, and average test length in an Item Response Theory (IRT) based computerized classification test (CCT) using the sequential probability ratio test (SPRT) in making binary decisions from examinees' dichotomous responses. This study consisted of three sub-studies. In each sub-study, one of the three unidimensional dichotomous IRT models, the 1-parameter logistic (1PL), the 2-parameter logistic (2PL), and the 3-parameter logistic (3PL) model was set as the true model and the other two models were treated as the misfit models. Item pool composition, test length, and stratum depth were manipulated to simulate different test conditions. To ensure the validity of the study results, the true model based CCTs using the true and the recalibrated item parameters were compared first to study the effect of estimation error in item parameters in CCTs. Then, the true model and the misfit model based CCTs were compared to accomplish the research goal. The results indicated that estimation error in item parameters did not affect classification results based on CCTs using SPRT. The effect of model specification error depended on the true model, the misfit model, and the item pool composition. When the 1PL or the 2PL IRT model was the true model, the use of another IRT model had little impact on the CCT results. When the 3PL IRT model was the true model, the use of the 1PL model raised the false positive error rates. The influence of using the 2PL instead of the 3PL model depended on the item pool composition. 
When the item discrimination parameters varied greatly from uniformity of one, the use of the 2PL IRT model raised the false negative error rates to above the nominal level. In the simulated test conditions with test length and item exposure constraints, using a misfit model in CCTs most often affected the average test length. Its effects on error rates and classification accuracy were negligible. It was concluded that in CCTs using SPRT, IRT model selection and evaluation is indispensable (PsycINFO Database Record (c) 2004 APA, all rights reserved). %B Dissertation Abstracts International Section A: Humanities & Social Sciences %V 64 %P 478 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Effects of test administration mode on item parameter estimates %A Yi, Q. %A Harris, D. J. %A Wang, T. %A Ban, J-C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Evaluating a new approach to detect aberrant responses in CAT %A Lu, Y., %A Robin, F. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2003 %T Evaluating computer-based test security by generalized item overlap rates %A Zhang, J. %A Lu, T. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Evaluating computerized adaptive testing design for the MCAT with realistic simulated data %A Lu, Y., %A Pitoniak, M. %A Rizavi, S. %A Way, W. D. %A Steffan, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Evaluating stability of online item calibrations under varying conditions %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Evaluating the comparability of English- and French-speaking examinees on a science achievement test administered using two-stage testing %A Puhan, G. %A Gierl, M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2003 %T The evaluation of exposure control procedures for an operational CAT. %A French, B. F. %A Thompson, T. T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Chicago IL %0 Journal Article %J Journal of Applied Measurement %D 2003 %T An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model %A Davis, L. L. %A Pastor, D. A. %A Dodd, B. G. %A Chiang, C. %A Fitzpatrick, S. J. 
%K *Computers %K *Educational Measurement %K *Models, Theoretical %K Automation %K Decision Making %K Humans %K Reproducibility of Results %X The purpose of the present investigation was to systematically examine the effectiveness of the Sympson-Hetter technique and rotated content balancing relative to no exposure control and no content rotation conditions in a computerized adaptive testing system (CAT) based on the partial credit model. A series of simulated fixed and variable length CATs were run using two data sets generated to multiple content areas for three sizes of item pools. The 2 (exposure control) X 2 (content rotation) X 2 (test length) X 3 (item pool size) X 2 (data sets) design yielded a total of 48 conditions. Results show that while both procedures can be used with no deleterious effect on measurement precision, the gains in exposure control, pool utilization, and item overlap appear quite modest. Difficulties involved with setting the exposure control parameters in small item pools make questionable the utility of the Sympson-Hetter technique with similar item pools. %B Journal of Applied Measurement %V 4 %P 24-42 %G eng %M 12700429 %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Exposure control using adaptive multi-stage item bundles %A Luecht, RM %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Dissertation Abstracts International Section A: Humanities & Social Sciences %D 2002 %T The effect of test characteristics on aberrant response patterns in computer adaptive testing %A Rizavi, S. M. %K computerized adaptive testing %X The advantages that computer adaptive testing offers over linear tests have been well documented. The Computer Adaptive Test (CAT) design is more efficient than the linear test design as fewer items are needed to estimate an examinee's proficiency to a desired level of precision. In the ideal situation, a CAT will result in examinees answering different numbers of items according to the stopping rule employed. Unfortunately, the realities of testing conditions have necessitated the imposition of time and minimum test length limits on CATs. Such constraints might place a burden on the CAT test taker resulting in aberrant response behaviors by some examinees. Occurrence of such response patterns results in inaccurate estimation of examinee proficiency levels. This study examined the effects of test lengths, time limits and the interaction of these factors with the examinee proficiency levels on the occurrence of aberrant response patterns. The focus of the study was on the aberrant behaviors caused by rushed guessing due to restrictive time limits. Four different testing scenarios were examined: fixed length performance tests with and without content constraints, fixed length mastery tests and variable length mastery tests without content constraints. For each of these testing scenarios, the effect of two test lengths, five different timing conditions and the interaction between these factors with three ability levels on ability estimation were examined. 
For fixed and variable length mastery tests, decision accuracy was also looked at in addition to the estimation accuracy. Several indices were used to evaluate the estimation and decision accuracy for different testing conditions. The results showed that changing time limits had a significant impact on the occurrence of aberrant response patterns conditional on ability. Increasing test length had a negligible, if not negative, effect on ability estimation when rushed guessing occurred. In the case of performance testing, high-ability examinees suffered the most, whereas in classification testing, middle-ability examinees did. The decision accuracy was considerably affected in the case of variable-length classification tests. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Dissertation Abstracts International Section A: Humanities & Social Sciences %V 62 %P 3363 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T An EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model %A Stark, S. %A F Drasgow %K Adaptive Testing %K Computer Assisted Testing %K Item Response Theory %K Maximum Likelihood %K Personnel Evaluation %K Statistical Correlation %K Statistical Estimation %X Borman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach, the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article, the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte Carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e., ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition, latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. %B Applied Psychological Measurement %V 26 %P 208-227 %G eng %0 Conference Paper %B annual meeting of the American Educational Research Association %D 2002 %T An empirical comparison of achievement level estimates from adaptive tests and paper-and-pencil tests %A Kingsbury, G. G. %K computerized adaptive testing %B annual meeting of the American Educational Research Association %C New Orleans, LA. USA %G eng %0 Generic %D 2002 %T An empirical investigation of selected multi-stage testing design variables on test assembly and decision accuracy outcomes for credentialing exams (Center for Educational Assessment Research Report No 469) %A Zenisky, A. L. %C Amherst, MA: University of Massachusetts, School of Education. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Employing new ideas in CAT to a simulated reading test %A Thompson, T. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Mesure et évaluation en éducation %D 2002 %T Étude de la distribution d'échantillonnage de l'estimateur du niveau d'habileté en testing adaptatif en fonction de deux règles d'arrêt dans le contexte de l'application du modèle de Rasch [Study of the sampling distribution of the proficiency estimator in adaptive testing as a function of two stopping rules in the context of the application of the Rasch model] %A Raîche, G. %A Blais, J-G. %B Mesure et évaluation en éducation %V 24(2-3) %P 23-40 %G French %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Evaluation of selection procedures for computerized adaptive testing with polytomous items %A van Rijn, P. W. %A Theo Eggen %A Hemker, B. T. %A Sanders, P. F. %K computerized adaptive testing %X In the present study, a procedure that has been used to select dichotomous items in computerized adaptive testing was applied to polytomous items. This procedure was designed to select the item with maximum weighted information. In a simulation study, the item information function was integrated over a fixed interval of ability values and the item with the maximum area was selected. This maximum interval information item selection procedure was compared to a maximum point information item selection procedure. Substantial differences between the two item selection procedures were not found when computerized adaptive tests were evaluated on bias and the root mean square of the ability estimate. %B Applied Psychological Measurement %V 26 %P 393-411 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T An examination of decision-theory adaptive testing procedures %A Rudner, L. M. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Generic %D 2002 %T An exploration of potentially problematic adaptive tests %A Stocking, M. %A Steffen, M. %A Golub-Smith, M. L. %A Eignor, D. R. %C Research Report 02-05 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T The effect of test and examinee characteristics on the occurrence of aberrant response patterns in a computerized adaptive test %A Rizavi, S. %A Swaminathan, H. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Effective use of simulated data in an on-line item calibration in practical situations of computerized adaptive testing %A Samejima, F. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Effects of changes in the examinees’ ability distribution on the exposure control methods in CAT %A Chang, S-W. %A Twu, B.-Y. 
%B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Efficient on-line item calibration using a nonparametric method adjusted to computerized adaptive testing %A Samejima, F. %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Journal of Personality Assessment %D 2001 %T Evaluation of an MMPI-A short form: Implications for adaptive testing %A Archer, R. P. %A Tirrell, C. A. %A Elkins, D. E. %K Adaptive Testing %K Mean %K Minnesota Multiphasic Personality Inventory %K Psychometrics %K Statistical Correlation %K Statistical Samples %K Test Forms %X Reports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al, 1992) short form based on administration of the 1st 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Personality Assessment %V 76 %P 76-89 %G eng %0 Journal Article %J Applied Measurement in Education %D 2001 %T An examination of conditioning variables used in computer adaptive testing for DIF analyses %A Walker, C. M. %A Beretvas, S. N %A Ackerman, T. A. %B Applied Measurement in Education %V 14 %P 3-16 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T An examination of item review on a CAT using the specific information item selection algorithm %A Bowles, R %A Pommerich, M %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Generic %D 2001 %T An examination of item review on computer adaptive tests %A Bowles, R %C Manuscript in preparation, University of Virginia %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2001 %T An examination of item selection rules by stratified CAT designs integrated with content balancing methods %A Leung, C-K.. %A Chang, Hua-Hua %A Hau, K-T. 
%B Paper presented at the Annual Meeting of the American Educational Research Association %C Seattle WA %G eng %0 Conference Paper %D 2001 %T An examination of testlet scoring and item exposure constraints in the Verbal Reasoning section of the MCAT %A Davis, L. L. %A Dodd, B. G. %G eng %0 Generic %D 2001 %T An examination of testlet scoring and item exposure constraints in the verbal reasoning section of the MCAT %A Davis, L. L. %A Dodd, B. G. %C MCAT Monograph Series: Association of American Medical Colleges %G eng %0 Journal Article %J Journal of Applied Psychology %D 2001 %T An examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales %A Borman, W. C. %A Buck, D. E. %A Hanson, M. A. %A Motowidlo, S. J. %A Stark, S. %A F Drasgow %K *Computer Simulation %K *Employee Performance Appraisal %K *Personnel Selection %K Adult %K Automatic Data Processing %K Female %K Human %K Male %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S. %K Task Performance and Analysis %K Video Recording %X This laboratory research compared the reliability, validity, and accuracy of a computerized adaptive rating scale (CARS) format and 2 relatively common and representative rating formats. The CARS is a paired-comparison rating task that uses adaptive testing principles to present pairs of scaled behavioral statements to the rater to iteratively estimate a ratee's effectiveness on 3 dimensions of contextual performance. Videotaped vignettes of 6 office workers were prepared, depicting prescripted levels of contextual performance, and 112 subjects rated these vignettes using the CARS format and one or the other competing format. Results showed 23%-37% lower standard errors of measurement for the CARS format. In addition, validity was significantly higher for the CARS format (d = .18), and Cronbach's accuracy coefficients showed significantly higher accuracy, with a median effect size of .08. The discussion focuses on possible reasons for the results. %B Journal of Applied Psychology %V 86 %P 965-973 %G eng %M 11596812 %0 Generic %D 2000 %T Effects of item-selection criteria on classification testing with the sequential probability ratio test (Research Report 2000-8) %A Lin, C.-J. %A Spray, J. A. %C Iowa City, IA: American College Testing %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Effects of nonequivalence of item pools on ability estimates in CAT %A Ban, J. C. %A Wang, T. %A Yi, Q. %A Harris, D. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Medical Care %D 2000 %T Emergence of item response modeling in instrument development and data analysis %A Hambleton, R. K. %K Computer Assisted Testing %K Health %K Item Response Theory %K Measurement %K Statistical Validity computerized adaptive testing %K Test Construction %K Treatment Outcomes %B Medical Care %V 38 %P II60-II65 %G eng %0 Generic %D 2000 %T Estimating item parameters from classical indices for item pool development with a computerized classification test (Research Report 2000-4) %A Huang, C.-Y. %A Kalohn, J.C. %A Lin, C.-J. %A Spray, J. %C Iowa City IA: ACT Inc %G eng %0 Report %D 2000 %T Estimating Item Parameters from Classical Indices for Item Pool Development with a Computerized Classification Test. %A Huang, C.-Y. %A Kalohn, J.C. %A Lin, C.-J. 
%A Spray, J. A. %I ACT, Inc. %C Iowa City, Iowa %G eng %0 Generic %D 2000 %T Estimating item parameters from classical indices for item pool development with a computerized classification test (ACT Research 2000-4) %A Chang, C.-Y. %A Kalohn, J.C. %A Lin, C.-J. %A Spray, J. %C Iowa City IA, ACT, Inc %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T Estimation of trait level in computerized adaptive testing %A Cheng, P. E. %A Liou, M. %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis (Statistical) %K Statistical Estimation computerized adaptive testing %X Notes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. (PsycINFO Database Record (c) 2005 APA ) %B Applied Psychological Measurement %V 24 %P 257-265 %G eng %0 Journal Article %J Chronicle of Higher Education %D 2000 %T ETS finds flaws in the way online GRE rates some students %A Carlson, S. %B Chronicle of Higher Education %V 47 %P a47 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2000 %T An examination of exposure control and content balancing restrictions on item selection in CATs using the partial credit model %A Davis, L. L. %A Pastor, D. A. %A Dodd, B. G. %A Chiang, C. %A Fitzpatrick, S. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Journal Article %J Dissertation Abstracts International: Section B: The Sciences and Engineering %D 2000 %T An examination of the reliability and validity of performance ratings made using computerized adaptive rating scales %A Buck, D. E. %K Adaptive Testing %K Computer Assisted Testing %K Performance Tests %K Rating Scales %K Reliability %K Test %K Test Validity %X This study compared the psychometric properties of performance ratings made using recently-developed computerized adaptive rating scales (CARS) to the psychometric properties of ratings made using more traditional paper-and-pencil rating formats, i.e., behaviorally-anchored and graphic rating scales. Specifically, the reliability, validity and accuracy of the performance ratings from each format were examined. One hundred twelve participants viewed six 5-minute videotapes of office situations and rated the performance of a target person in each videotape on three contextual performance dimensions-Personal Support, Organizational Support, and Conscientious Initiative-using CARS and either behaviorally-anchored or graphic rating scales. Performance rating properties were measured using Shrout and Fleiss's intraclass correlation (2, 1), Borman's differential accuracy measure, and Cronbach's accuracy components as indexes of rating reliability, validity, and accuracy, respectively. 
Results found that performance ratings made using the CARS were significantly more reliable and valid than performance ratings made using either of the other formats. Additionally, CARS yielded more accurate performance ratings than the paper-and-pencil formats. The nature of the CARS system (i.e., its adaptive nature and scaling methodology) and its paired comparison judgment task are offered as possible reasons for the differences found in the psychometric properties of the performance ratings made using the various rating formats. (PsycINFO Database Record (c) 2005 APA ) %B Dissertation Abstracts International: Section B: The Sciences and Engineering %V 61 %P 570 %G eng %0 Journal Article %J Dissertation Abstracts International Section A: Humanities and Social Sciences %D 2000 %T An exploratory analysis of item parameters and characteristics that influence item level response time %A Smith, Russell Winsor %K Item Analysis (Statistical) %K Item Response Theory %K Problem Solving %K Reaction Time %K Reading Comprehension %K Reasoning %X This research examines the relationship between item level response time and (1) item discrimination, (2) item difficulty, (3) word count, (4) item type, and (5) whether a figure is included in an item. Data are from the Graduate Management Admission Test, which is currently offered only as a computerized adaptive test. Analyses revealed significant differences in response time between the five item types: problem solving, data sufficiency, sentence correction, critical reasoning, and reading comprehension. For this reason, the planned pairwise and complex analyses were run within each item type. Pairwise curvilinear regression analyses explored the relationship between response time and item discrimination, item difficulty, and word count. Item difficulty significantly contributed to the prediction of response time for each item type; two of the relationships were significantly quadratic. Item discrimination significantly contributed to the prediction of response time for only two of the item types; one revealed a quadratic relationship and the other a cubic relationship. Word count had a significant linear relationship with response time for all the item types except reading comprehension, for which there was no significant relationship. Multiple regression analyses using word count, item difficulty, and item discrimination predicted between 35.4% and 71.4% of the variability in item response time across item types. The results suggest that response time research should consider the type of item that is being administered and continue to explore curvilinear relationships between response time and its predictor variables. (PsycINFO Database Record (c) 2005 APA ) %B Dissertation Abstracts International Section A: Humanities and Social Sciences %V 61 %P 1812 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1999 %T The effect of model misspecification on classification decisions made using a computerized test %A Kalohn, J.C. %A Spray, J. A. %K computerized adaptive testing %X Many computerized testing algorithms require the fitting of some item response theory (IRT) model to examinees' responses to facilitate item selection, the determination of test stopping rules, and classification decisions. Some IRT models are thought to be particularly useful for small volume certification programs that wish to make the transition to computerized adaptive testing (CAT). 
The 1-parameter logistic model (1-PLM) is usually assumed to require a smaller sample size than the 3-parameter logistic model (3-PLM) for item parameter calibrations. This study examined the effects of model misspecification on the precision of the decisions made using the sequential probability ratio test. For this comparison, the 1-PLM was used to estimate item parameters, even though the items' characteristics were represented by a 3-PLM. Results demonstrate that the 1-PLM produced considerably more decision errors under simulation conditions similar to a real testing environment, compared to the true model and to a fixed-form standard reference set of items. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Journal of Educational Measurement %V 36 %P 47-59 %G eng %0 Journal Article %J Applied Measurement in Education %D 1999 %T The effects of test difficulty manipulation in computerized adaptive testing and self-adapted testing %A Ponsoda, V. %A Olea, J. %A Rodriguez, M. S. %A Revuelta, J. %B Applied Measurement in Education %V 12 %P 167-184 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Empirical initialization of the trait estimator in adaptive testing %A van der Linden, W. J. %B Applied Psychological Measurement %V 23 %P 21-29 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1999 %T An enhanced stratified computerized adaptive testing design %A Leung, C-K.. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Journal Article %J Academic Medicine %D 1999 %T Evaluating the usefulness of computerized adaptive testing for medical in-course assessment %A Kreiter, C. D. %A Ferguson, K. %A Gruppen, L. D. %K *Automation %K *Education, Medical, Undergraduate %K Educational Measurement/*methods %K Humans %K Internal Medicine/*education %K Likelihood Functions %K Psychometrics/*methods %K Reproducibility of Results %X PURPOSE: This study investigated the feasibility of converting an existing computer-administered, in-course internal medicine test to an adaptive format. METHOD: A 200-item internal medicine extended matching test was used for this research. Parameters were estimated with commercially available software with responses from 621 examinees. A specially developed simulation program was used to retrospectively estimate the efficiency of the computer-adaptive exam format. RESULTS: It was found that the average test length could be shortened by almost half with measurement precision approximately equal to that of the full 200-item paper-and-pencil test. However, computer-adaptive testing with this item bank provided little advantage for examinees at the upper end of the ability continuum. An examination of classical item statistics and IRT item statistics suggested that adding more difficult items might extend the advantage to this group of examinees. CONCLUSIONS: Medical item banks presently used for incourse assessment might be advantageously employed in adaptive testing. However, it is important to evaluate the match between the items and the measurement objective of the test before implementing this format. %B Academic Medicine %7 1999/10/28 %V 74 %P 1125-8 %8 Oct %@ 1040-2446 (Print) %G eng %M 10536635 %! 
Acad Med %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T An examination of conditioning variables in DIF analysis in a computer adaptive testing environment %A Walker, C. M. %A Ackerman, T. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Applied Measurement in Education %D 1999 %T Examinee judgments of changes in item difficulty: Implications for item review in computerized adaptive testing %A Wise, S. L. %A Finney, S. J. %A Enders, C. K. %A Freeman, S.A. %A Severance, D.D. %B Applied Measurement in Education %V 12 %P 185-198 %G eng %0 Generic %D 1999 %T Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing (ACT Research Report Series 99-5) %A Chen, S-Y. %A Ankenmann, R. D. %A Spray, J. A. %C Iowa City IA: ACT, Inc %G eng %0 Generic %D 1999 %T Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing %A Chen, S. %A Ankenmann, R. D. %A Spray, J. A. %C Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada %G eng %0 Journal Article %J J Outcome Meas %D 1998 %T The effect of item pool restriction on the precision of ability measurement for a Rasch-based CAT: comparisons to traditional fixed length examinations %A Halkitis, P. N. %K *Decision Making, Computer-Assisted %K Comparative Study %K Computer Simulation %K Education, Nursing %K Educational Measurement/*methods %K Human %K Models, Statistical %K Psychometrics/*methods %X This paper describes a method for examining the precision of a computerized adaptive test with a limited item pool. Standard errors of measurement ascertained in the testing of simulees with a CAT using a restricted pool were compared to the results obtained in a live paper-and-pencil achievement testing of 4494 nursing students on four versions of an examination of calculations of drug administration. CAT measures of precision were considered when the simulated examinee pools were uniform and normal. Precision indices were also considered in terms of the number of CAT items required to reach the precision of the traditional tests. Results suggest that regardless of the size of the item pool, CAT provides greater precision in measurement with a smaller number of items administered even when the choice of items is limited but fails to achieve equiprecision along the entire ability continuum. %B J Outcome Meas %V 2 %P 97-122 %G eng %M 9661734 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T Effect of item selection on item exposure rates within a computerized classification test %A Kalohn, J.C. %A Spray, J. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T An empirical Bayes approach to Mantel-Haenszel DIF analysis: Theoretical development and application to CAT data %A Zwick, R. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Essentially unbiased Bayesian estimates in computerized adaptive testing %A Wang, T. %A Lau, C. %A Hanson, B. A. 
%B Paper presented at the annual meeting of the American Educational Research Association %C San Diego %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Evaluating and insuring measurement precision in adaptive testing %A Davey, T. %A Nering, M. L. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Evaluation of methods for the use of underutilized items in a CAT environment %A Steffen, M. %A Liu, M. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T An examination of item-level response times from an operational CAT %A Swygert, K. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Urbana IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of National Council on Measurement in Education %D 1998 %T Expected losses for individuals in Computerized Mastery Testing %A Smith, R. %A Lewis, C. %B Paper presented at the annual meeting of National Council on Measurement in Education %C San Diego %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1997 %T The effect of adaptive administration on the variability of the Mantel-Haenszel measure of differential item functioning %A Zwick, R. %B Educational and Psychological Measurement %V 57 %P 412-421 %G eng %0 Journal Article %J Educational & Psychological Measurement %D 1997 %T The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model %A Chen, S-K. %A Hou, L. Y. %A Fitzpatrick, S. J. %A Dodd, B. G. %K computerized adaptive testing %X Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Educational & Psychological Measurement %V 57 %P 422-439 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1997 %T The effect of population distribution and methods of theta estimation on computerized adaptive testing (CAT) using the rating scale model %A Chen, S. %A Hou, L. %A Fitzpatrick, S. J. %A Dodd, B. 
%B Educational and Psychological Measurement %V 57 %P 422-439 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T The effects of motivation on equating adaptive and conventional tests %A Segall, D. O. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Book Section %D 1997 %T Equating the CAT-ASVAB %A Segall, D. O. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 181-198). Washington DC: American Psychological Association. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1997 %T Essentially unbiased EAP estimates in computerized adaptive testing %A Wang, T. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Journal Article %D 1997 %T Evaluating an automatically scorable, open-ended response type for measuring mathematical reasoning in computer-adaptive tests %A Bennett, R. E. %A Steffen, M. %A Singley, M.K. %A Morley, M. %A Jacquemin, D. %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1997 %T Evaluating comparability in computerized adaptive testing: A theoretical framework with an example %A Wang, T. %A Kolen, M. J. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Book Section %D 1997 %T Evaluating item calibration medium in computerized adaptive testing %A Hetter, R. D. %A Segall, D. O. %A Bloxom, B. M. %C W.A. Sands, B.K. Waters and J.R. McBride, Computerized adaptive testing: From inquiry to operation (pp. 161-168). Washington, DC: American Psychological Association. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Examinee issues in CAT %A Wise, S. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Effect of altering passing score in CAT when unidimensionality is violated %A Abdel-Fattah, A. A. %A Lau, CA %A Spray, J. A. %B Paper presented at the annual meeting of the American Educational Research Association %C New York NY %8 April %G eng %0 Journal Article %J Dissertation Abstracts International: Section B: the Sciences & Engineering %D 1996 %T The effect of individual differences variables on the assessment of ability for Computerized Adaptive Testing %A Gershon, R. C. %K computerized adaptive testing %X Computerized Adaptive Testing (CAT) continues to gain momentum as the accepted testing modality for a growing number of certification, licensure, education, government and human resource applications. However, the developers of these tests have for the most part failed to adequately explore the impact of individual differences such as test anxiety on the adaptive testing process. It is widely accepted that non-cognitive individual differences variables interact with the assessment of ability when using written examinations. Logic would dictate that individual differences variables would equally affect CAT. Two studies were used to explore this premise. 
In the first study, 507 examinees were given a test anxiety survey prior to taking a high stakes certification exam using CAT or using a written format. All examinees had already completed their course of study, and the examination would be their last hurdle prior to being awarded certification. High test anxious examinees performed worse than their low anxious counterparts on both testing formats. The second study replicated the finding that anxiety depresses performance in CAT. It also addressed the differential effect of anxiety on within test performance. Examinees were candidates taking their final certification examination following a four year college program. Ability measures were calculated for each successive part of the test for 923 subjects. Within subject performance varied depending upon test position. High anxious examinees performed poorly at all points in the test, while low and medium anxious examinee performance peaked in the middle of the test. If test anxiety and performance measures were actually the same trait, then low anxious individuals should have performed equally well throughout the test. The observed interaction of test anxiety and time on task serves as strong evidence that test anxiety has motivationally mediated as well as cognitively mediated effects. The results of the studies are discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Dissertation Abstracts International: Section B: the Sciences & Engineering %V 57 %P 4085 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education. %D 1996 %T Effects of answer feedback and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests %A Vispoel, W. P. %A Brunsman, B. %A Forte, E. %A Bleiler, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Effects of answer review and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests %A Vispoel, W. %A Forte, E. %A Boo, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model %A Hou, L. %A Chen, S. %A Dodd, B. G. %A Fitzpatrick, S. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New York NY %G eng %0 Book %D 1996 %T The effects of person misfit in computerized adaptive testing %A Nering, M. L. %C Unpublished doctoral dissertation, University of Minnesota, Minneapolis %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Effects of randomesque item selection on CAT item exposure rates and proficiency estimation under 1- and 2-PL models %A Featherman, C. M. %A Subhiyah, R. G. %A Hadadi, A. 
%B Paper presented at the annual meeting of the American Educational Research Association %C New York %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1996 %T An evaluation of a two-stage testlet design for computerized adaptive testing %A Reese, L. M. %A Schnipke, D. L. %B Paper presented at the annual meeting of the Psychometric Society %C Banff, Alberta, Canada %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the Psychometric Society %D 1995 %T The effect of ability estimation for polytomous CAT in different item selection procedures %A Fan, M. %A Hsu, Y. %B Paper presented at the Annual meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1995 %T The effect of model misspecification on classification decisions made using a computerized test: UIRT versus MIRT %A Abdel-Fattah, A. A. %A Lau, C.-M. A. %B Paper presented at the annual meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the Psychometric Society %D 1995 %T The effect of model misspecification on classification decisions made using a computerized test: 3-PLM vs. 1PLM (and UIRT versus MIRT) %A Spray, J. A. %A Kalohn, J.C. %A Schulz, M. %A Fleer, P. Jr. %B Paper presented at the Annual Meeting of the Psychometric Society %C Minneapolis, MN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1995 %T The effect of population distribution and methods of theta estimation on CAT using the rating scale model %A Chen, S. %A Hou, L. %A Fitzpatrick, S. J. %A Dodd, B. G. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Journal Article %J Journal of Educational Measurement %D 1995 %T Effect of Rasch calibration on ability and DIF estimation in computer-adaptive tests %A Zwick, R. %A Thayer, D. T. %A Wingersky, M. %B Journal of Educational Measurement %V 32 %P 341-363 %0 Journal Article %J Journal of Educational Psychology %D 1995 %T Effects and underlying mechanisms of self-adapted testing %A Rocklin, T. R. %A O’Donnell, A. M. %A Holst, P. M. %B Journal of Educational Psychology %V 87 %P 103-116 %G eng %0 Conference Paper %B Paper presented at the meeting of the Society for Industrial and Organizational Psychology %D 1995 %T The effects of item compromise on computerized adaptive test scores %A Segall, D. O. %B Paper presented at the meeting of the Society for Industrial and Organizational Psychology %C Orlando, FL %G eng %0 Book %D 1995 %T El control de la exposición de los items en tests adaptativos informatizados [Item exposure control in computerized adaptive tests] %A Revuelta, J. %C Unpublished master’s dissertation, Universidad Autónoma de Madrid, Spain %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T Equating computerized adaptive certification examinations: The Board of Registry series of studies %A Lunz, M. E. %A Bergstrom, Betty A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Conference Paper %B Paper presented at the meeting of the National Council on Measurement in Education %D 1995 %T Equating the CAT-ASVAB: Experiences and lessons learned %A Segall, D. O. 
%B Paper presented at the meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Conference Paper %B Paper presented at the meeting of the National Council on Measurement in Education %D 1995 %T Equating the CAT-ASVAB: Issues and approach %A Segall, D. O. %A Carter, G. %B Paper presented at the meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T Equating the computerized adaptive edition of the Differential Aptitude Tests %A J. R. McBride %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco, CA %G eng %0 Conference Paper %B Paper presented at the annual conference of the National Council on Measurement in Education %D 1995 %T Estimation of item difficulty from restricted CAT calibration samples %A Sykes, R. %A Ito, K. %B Paper presented at the annual conference of the National Council on Measurement in Education %C San Francisco %G eng %0 Generic %D 1995 %T An evaluation of alternative concepts for administering the Armed Services Vocational Aptitude Battery to applicants for enlistment %A Hogan, P. F. %A J. R. McBride %A Curran, L. T. %C DMDC Technical Report 95-013. Monterey, CA: Personnel Testing Division, Defense Manpower Data Center %G eng %0 Conference Paper %B Paper presented at the 102nd Annual Convention of the American Psychological Association %D 1994 %T Early psychometric research in the CAT-ASVAB Project %A J. R. McBride %B Paper presented at the 102nd Annual Convention of the American Psychological Association %C Los Angeles, CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T The effect of restricting ability distributions in the estimation of item difficulties: Implications for a CAT implementation %A Ito, K. %A Sykes, R. C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The effect of review on the psychometric characteristics of computerized adaptive tests %A Lunz, M. E. %A Stone, G. E. %B Applied Measurement in Education %V 7 %P 211-222 %N 3 %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The effect of review on the psychometric characteristics of computerized adaptive tests %A Stone, G. E. %A Lunz, M. E. %B Applied Measurement in Education %V 7 %P 211-222 %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T The effect of review on the psychometric characteristics of computerized adaptive tests %A Stone, G. E. %A Lunz, M. E. %X Explored the effect of reviewing items and altering responses on examinee ability estimates, test precision, test information, decision confidence, and pass/fail status for computerized adaptive tests. Two different populations of examinees took different computerized certification examinations. For purposes of analysis, each population was divided into three ability groups (high, medium, and low). Ability measures before and after review were highly correlated, but slightly lower decision confidence was found after review. Pass/fail status was most affected for examinees with estimates close to the pass point. Decisions remained the same for 94% of the examinees.
Test precision was only slightly affected by review, and the average information loss could be recovered by the addition of one item. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Applied Measurement in Education %V 7 %P 211-222 %G eng %0 Book %D 1994 %T Effects of computerized adaptive test anxiety on nursing licensure examinations %A Arrowwood, V. E. %C Dissertation Abstracts International, A (Humanities and Social Sciences), 54 (9-A), 3410 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T The effects of item pool depth on the accuracy of pass/fail decisions for NCLEX using CAT %A Haynie, K. A. %A Way, W. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans %G eng %0 Journal Article %J Journal of Educational Measurement %D 1994 %T An empirical study of computerized adaptive test administration conditions %A Lunz, M. E. %A Bergstrom, Betty A. %B Journal of Educational Measurement %V 31 %P 251-263 %8 Fall %G eng %0 Book Section %B Objective measurement: Theory into practice %D 1994 %T The equivalence of Rasch item calibrations and ability estimates across modes of administration %A Bergstrom, Betty A. %A Lunz, M. E. %K computerized adaptive testing %B Objective measurement: Theory into practice %I Ablex Publishing Co. %C Norwood, NJ, USA %V 2 %P 122-128 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T Establishing the comparability of the NCLEX using CAT with traditional NCLEX examinations %A Eignor, D. R. %A Way, W. D. %A Amoss, K. E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Psychological Association %D 1994 %T Evaluation and implementation of CAT-ASVAB %A Curran, L. T. %A Wise, L. L. %B Paper presented at the annual meeting of the American Psychological Association %C Los Angeles %G eng %0 Book %D 1994 %T The exploration of an alternative method for scoring computer adaptive tests %A Potenza, M. %C Unpublished doctoral dissertation, University of Nebraska, Lincoln, NE %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T The efficiency, reliability, and concurrent validity of adaptive and fixed-item tests of music listening skills %A Vispoel, W. P. %A Wang, T. %A Bleiler, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta, GA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T Establishing time limits for the GRE computer adaptive tests %A Reese, C. M. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta, GA %G eng %0 Journal Article %J Applied Psychological Measurement %D 1992 %T The effect of review on student ability and test efficiency for computerized adaptive tests %A Lunz, M. E. %A Bergstrom, Betty A. %A Wright, Benjamin D. %X 220 students were randomly assigned to a review condition for a medical technology test; their test instructions indicated that each item must be answered when presented, but that the responses could be reviewed and altered at the end of the test.
A sample of 492 students did not have the opportunity to review and alter responses. Within the review condition, examinee ability estimates before and after review were correlated .98. The average efficiency of the test was decreased by 1% after review. Approximately 32% of the examinees improved their ability estimates after review but did not change their pass/fail status. Disallowing review on adaptive tests administered under these rules is not supported by these data. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Applied Psychological Measurement %V 16 %P 33-40 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1992 %T The Effect of Review on Student Ability and Test Efficiency for Computerized Adaptive Tests %A Lunz, M. E. %A Bergstrom, B. A. %A Wright, B. D. %B Applied Psychological Measurement %V 16 %P 33-40 %G English %N 1 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1992 %T Effects of feedback during self-adapted testing on estimates of ability %A Holst, P. M. %A O’Donnell, A. M. %A Rocklin, T. R. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Conference Paper %B Paper presented at the annual meeting of the NCME %D 1992 %T The effects of feedback in computerized adaptive and self-adapted tests %A Roos, L. L. %A Plake, B. S. %A Wise, S. L. %B Paper presented at the annual meeting of the NCME %C San Francisco %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1992 %T Estimation of ability level by using only observable quantities in adaptive testing %A Kirisci, L. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Book Section %D 1992 %T Evaluation of alternative operational concepts %A J. R. McBride %A Hogan, P. F. %C Proceedings of the 34th Annual Conference of the Military Testing Association. San Diego, CA: Navy Personnel Research and Development Center. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1991 %T An empirical comparison of self-adapted and maximum information item selection %A Rocklin, T. R. %A O’Donnell, A. M. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Journal Article %J Applied Psychological Measurement %D 1990 %T The Effect of Item Selection Procedure and Stepsize on Computerized Adaptive Attitude Measurement Using the Rating Scale Model %A Dodd, B. G. %B Applied Psychological Measurement %V 14 %P 355-366 %G English %N 4 %0 Journal Article %J Applied Psychological Measurement %D 1990 %T The effect of item selection procedure and stepsize on computerized adaptive attitude measurement using the rating scale model %A Dodd, B. G. %X Real and simulated datasets were used to investigate the effects of the systematic variation of two major variables on the operating characteristics of computerized adaptive testing (CAT) applied to instruments consisting of polytomously scored rating scale items. The two variables studied were the item selection procedure and the stepsize method used until maximum likelihood trait estimates could be calculated.
The findings suggested that (1) item pools that consist of as few as 25 items may be adequate for CAT; (2) the variable stepsize method of preliminary trait estimation produced fewer cases of nonconvergence than the use of a fixed stepsize procedure; and (3) the scale value item selection procedure used in conjunction with a minimum standard error stopping rule outperformed the information item selection technique used in conjunction with a minimum information stopping rule in terms of the frequencies of nonconvergent cases, the number of items administered, and the correlations of CAT θ estimates with full-scale estimates and known θ values. The implications of these findings for implementing CAT with rating scale items are discussed. %B Applied Psychological Measurement %V 14 %P 355-366 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1990 %T The effects of variable entry on bias and information of the Bayesian adaptive testing procedure %A Hankins, J. A. %B Educational and Psychological Measurement %V 50 %P 785-802 %G eng %0 Conference Paper %B Paper presented at the 25th Annual Symposium on Recent Developments in the MMPI/MMPI-2 %D 1990 %T An empirical study of the computer adaptive MMPI-2 %A Ben-Porath, Y. S. %A Roper, B. L. %A Butcher, J. N. %B Paper presented at the 25th Annual Symposium on Recent Developments in the MMPI/MMPI-2 %C Minneapolis MN %0 Journal Article %J Applied Psychological Measurement %D 1989 %T Estimating Reliabilities of Computerized Adaptive Tests %A Divgi, D. R. %B Applied Psychological Measurement %V 13 %P 145-149 %G English %N 2 %0 Book %D 1989 %T Étude de praticabilité du testing adaptatif de maîtrise des apprentissages scolaires au Québec : une expérimentation en éducation économique secondaire 5 [Feasibility study of adaptive mastery testing of school learning in Quebec: An experiment in Secondary 5 economics education] %A Auger, R. %C Unpublished doctoral dissertation, Université du Québec à Montréal, Montréal. [In French] %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1989 %T EXSPRT: An expert systems approach to computer-based adaptive testing %A Frick, T. W. %A Plew, G. T. %A Luk, H.-K. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Generic %D 1988 %T The equivalence of scores from automated and conventional educational and psychological tests (College Board Report No. 88-8) %A Mazzeo, J. %A Harvey, A. L. %C New York: The College Entrance Examination Board. %G eng %0 Report %D 1987 %T The effect of item parameter estimation error on decisions made using the sequential probability ratio test %A Spray, J. A. %A Reckase, M. D. %K computerized adaptive testing %K Sequential probability ratio test %B ACT Research Report Series %I DTIC Document %C Iowa City, IA. USA %G eng %0 Generic %D 1987 %T The effect of item parameter estimation error on the decisions made using the sequential probability ratio test (ACT Research Report Series 87-17) %A Spray, J. A. %A Reckase, M. D. %C Iowa City IA: American College Testing %G eng %0 Book %D 1987 %T The effects of variable entry on bias and information of the Bayesian adaptive testing procedure %A Hankins, J. A. %C Dissertation Abstracts International, 47 (8A), 3013 %G eng %0 Conference Paper %B Paper presented at the meeting of the American Psychological Association %D 1987 %T Equating the computerized adaptive edition of the Differential Aptitude Tests %A J. R. McBride %A Corpe, V. A. %A Wing, H.
%B Paper presented at the meeting of the American Psychological Association %C New York %G eng %0 Generic %D 1987 %T Equivalent-groups versus single-group equating designs for the Accelerated CAT-ASVAB Project (Research Memorandum 87-6) %A Stoloff, P. H. %C Alexandria VA: Center for Naval Analyses %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1986 %T The effects of computer experience on computerized adaptive test performance %A Lee, J. A. %B Educational and Psychological Measurement %V 46 %P 727-733 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1986 %T Equivalence of conventional and computer presentation of speed tests %A Greaud, V. A. %A Green, B. F. %B Applied Psychological Measurement %V 10 %P 23-34 %G eng %0 Report %D 1985 %T Equivalence of scores from computerized adaptive and paper-and-pencil ASVAB tests %A Stoloff, P. H. %I Center for Naval Analysis %C Alexandria, VA. USA %P 100 %G eng %0 Generic %D 1984 %T Efficiency and precision in two-stage adaptive testing %A Loyd, B. H. %C West Palm Beach, Florida: Eastern ERA %G eng %0 Generic %D 1984 %T Evaluation of computerized adaptive testing of the ASVAB %A Hardwicke, S. %A Vicino, F. %A J. R. McBride %A Nemeth, C. %C San Diego, CA: Navy Personnel Research and Development Center, unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1984 %T An evaluation of the utility of large scale computerized adaptive testing %A Vicino, F. L. %A Hardwicke, S. B. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1984 %T An evaluation of the utility of large scale computerized adaptive testing %A Vicino, F. L. %A Hardwicke, S. B. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Generic %D 1984 %T Evaluation plan for the computerized adaptive vocational aptitude battery (Research Report 82-1) %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %G eng %0 Book %D 1983 %T Effects of item parameter error and other factors on trait estimation in latent trait based adaptive testing %A Mattson, J. D. %C Unpublished doctoral dissertation, University of Minnesota %G eng %0 Generic %D 1983 %T An evaluation of one- and three-parameter logistic tailored testing procedures for use with small item pools (Research Report ONR83-1) %A McKinley, R. L. %A Reckase, M. D. %C Iowa City IA: American College Testing Program %G eng %0 Book %D 1981 %T Effect of error in item parameter estimates on adaptive testing (Doctoral dissertation, University of Minnesota) %A Crichton, L. I. %C Dissertation Abstracts International, 42, 06-B %G eng %0 Journal Article %J Applied Psychological Measurement %D 1981 %T The Effects of Item Calibration Sample Size and Item Pool Size on Adaptive Testing %A Ree, M. J. %B Applied Psychological Measurement %V 5 %P 11-19 %G English %N 1 %0 Generic %D 1980 %T Effects of computerized adaptive testing on Black and White students (Research Report 79-2) %A Pine, S. M. %A Church, A. T. %A Gialluca, K. A. %A Weiss, D. J.
%C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1980 %T Effects of program parameters and item pool characteristics on the bias of a three-parameter tailored testing procedure %A Patience, W. M. %A Reckase, M. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Boston, MA %G eng %0 Generic %D 1980 %T An empirical study of a broad range test of verbal ability %A Kreitzberg, C. B. %A Jones, D. J. %C Princeton, NJ: Educational Testing Service %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1980 %T Estimating the reliability of adaptive tests from a single test administration %A Sympson, J. B. %B Paper presented at the annual meeting of the American Educational Research Association %C Boston %G eng %0 Report %D 1979 %T Efficiency of an adaptive inter-subtest branching strategy in the measurement of classroom achievement (Research Report 79-6) %A Gialluca, K. A. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %0 Generic %D 1979 %T An evaluation of computerized adaptive testing %A J. R. McBride %C In Proceedings of the 21st Military Testing Association Conference. San Diego, CA: Navy Personnel Research and Development Center. %G eng %0 Journal Article %J Applied Psychological Measurement %D 1979 %T Evaluation of implied orders as a basis for tailored testing with simulation data %A Cliff, N. A. %A McCormick, D. %B Applied Psychological Measurement %V 3 %P 495-514 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1979 %T Evaluation of Implied Orders as a Basis for Tailored Testing with Simulation Data %A N. Cliff %A Cudeck, R. %A McCormick, D. J. %B Applied Psychological Measurement %V 3 %P 495-514 %G English %N 4 %0 Generic %D 1978 %T Evaluations of implied orders as a basis for tailored testing using simulations (Technical Report No. 4) %A Cliff, N. A. %A Cudeck, R. %A McCormick, D. %C Los Angeles, CA: University of Southern California, Department of Psychology. %G eng %0 Journal Article %J Applied Psychological Measurement %D 1977 %T Effects of Immediate Knowledge of Results and Adaptive Testing on Ability Test Performance %A Betz, N. E. %B Applied Psychological Measurement %V 1 %P 259-266 %G English %N 2 %0 Journal Article %J Applied Psychological Measurement %D 1977 %T Effects of immediate knowledge of results and adaptive testing on ability test performance %A Betz, N. E. %B Applied Psychological Measurement %V 1 %P 259-266 %G eng %0 Book Section %D 1977 %T Effects of Knowledge of Results and Varying Proportion Correct on Ability Test Performance and Psychological Variables %A Prestwood, J. S. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %0 Book Section %D 1977 %T An empirical evaluation of implied orders as a basis for tailored testing %A Cliff, N. A. %A Cudeck, R. %A McCormick, D. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program.
%G eng %0 Journal Article %J Applied Psychological Measurement %D 1977 %T An Empirical Investigation of the Stratified Adaptive Computerized Testing Model %A B. K. Waters %B Applied Psychological Measurement %V 1 %P 141-152 %N 1 %0 Book Section %D 1977 %T Estimation of latent trait status in adaptive testing %A Sympson, J. B. %C D. J. Weiss (Ed.), Applications of computerized testing (Research Report 77-1). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1976 %T The effect of item pool characteristics on the operation of a tailored testing procedure %A Reckase, M. D. %B Paper presented at the annual meeting of the Psychometric Society %C Murray Hill, NJ %G eng %0 Book Section %D 1976 %T Effectiveness of the ancillary estimation procedure %A Gugel, J. F. %A Schmidt, F. L. %A Urry, V. W. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 103-106). Washington DC: U.S. Government Printing Office. %G eng %0 Generic %D 1976 %T Effects of immediate knowledge of results and adaptive testing on ability test performance (Research Report 76-3) %A Betz, N. E. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng %0 Generic %D 1976 %T Elements of a basic test theory generalizable to tailored testing %A Cliff, N. A. %C Unpublished manuscript %G eng %0 Book Section %D 1976 %T An empirical investigation of Weiss' stradaptive testing model %A B. K. Waters %C C. L. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 54-63). Washington DC: U. S. Civil Service Commission. %G eng %0 Thesis %D 1976 %T An exploratory study of the efficiency of the flexilevel testing procedure %A Seguin, S. P. %I University of Toronto %C Toronto, Canada %V Doctoral %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1975 %T The effect of item choice on ability estimation when using a simple logistic tailored testing model %A Reckase, M. D. %B Paper presented at the annual meeting of the American Educational Research Association %C Washington, D.C. %G eng %0 Generic %D 1975 %T Empirical and simulation studies of flexilevel ability testing (Research Report 75-3) %A Betz, N. E. %A Weiss, D. J. %C Minneapolis: Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1975 %T An empirical comparison of two-stage and pyramidal ability testing (Research Report 75-1) %A Larkin, K. C. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Book Section %D 1975 %T Evaluating the results of computerized adaptive testing %A Sympson, J. B. %C D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and Prospects (Research Report 75-5), pp. 26-31. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Generic %D 1974 %T An empirical investigation of computer-administered pyramidal ability testing (Research Report 74-3) %A Larkin, K. C. %A Weiss, D. J.
%C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Conference Paper %B Annual meeting of the National Council on Measurement in Education %D 1974 %T An empirical investigation of the stability and accuracy of flexilevel tests %A Kocher, A. T. %B Annual meeting of the National Council on Measurement in Education %C Chicago, IL %8 03/1974 %G eng %0 Book %D 1974 %T An empirical investigation of the stratified adaptive computerized testing model for the measurement of human ability %A B. K. Waters %C Unpublished Ph.D. dissertation, Florida State University %G eng %0 Book %D 1974 %T An evaluation of the self-scoring flexilevel testing model %A Olivier, P. %C Unpublished dissertation, Florida State University. Dissertation Abstracts International, 35 (7-A), 4257 %G eng %0 Thesis %D 1974 %T An evaluation of the self-scoring flexilevel testing model %A Olivier, P. %I Florida State University %G eng %9 Ph.D. Dissertation %0 Generic %D 1973 %T An empirical study of computer-administered two-stage ability testing (Research Report 73-4) %A Betz, N. E. %A Weiss, D. J. %C Minneapolis: Department of Psychology, Psychometric Methods Program %G eng %0 Journal Article %J Educational Research %D 1969 %T The efficacy of tailored testing %A Wood, R. L. %B Educational Research %V 11 %P 219-222 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1969 %T An exploratory study of programmed tests %A Cleary, T. A. %A Linn, R. L. %A Rock, D. A. %B Educational and Psychological Measurement %V 28 %P 345-360 %G eng %0 Generic %D 1967 %T An exploratory study of branching tests (Technical Research Note 188) %A Bayroff, A. G. %A Seeley, L. C. %C Washington DC: US Army Behavioral Science Research Laboratory. (NTIS No. AD 655263) %G eng %0 Book %D 1962 %T An evaluation of the sequential method of testing %A Paterson, J. J. %C Unpublished doctoral dissertation, Michigan State University %G eng %0 Generic %D 1962 %T Exploratory study of a sequential item test %A Seeley, L. C. %A Morton, M. A. %A Anderson, A. A. %C U.S. Army Personnel Research Office, Technical Research Note 129. %G eng %0 Journal Article %D 1953 %T An empirical study of the applicability of sequential analysis to item selection %A Anastasi, A. %V 13 %P 3-13 %G eng