%0 Journal Article %J Assessment %D In Press %T Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank %A Gerard Flens %A Niels Smits %A Caroline B. Terwee %A Joost Dekker %A Irma Huijbrechts %A Philip Spinhoven %A Edwin de Beurs %X We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has psychometric properties that are required for a CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average number of 8.64 items for the clinical sample, and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. %B Assessment %U https://doi.org/10.1177/1073191117746742 %R 10.1177/1073191117746742 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T Framework for Developing Multistage Testing With Intersectional Routing for Short-Length Tests %A Kyung (Chris) T. Han %X Multistage testing (MST) has many practical advantages over typical item-level computerized adaptive testing (CAT), but there is a substantial tradeoff when using MST because of its reduced level of adaptability. In typical MST, the first stage almost always performs as a routing stage in which all test takers see a linear test form. 
If multiple test sections measure different but moderately or highly correlated traits, then a score estimate for one section might be capable of adaptively selecting item modules for following sections without having to administer routing stages repeatedly for each section. In this article, a new framework for developing MST with intersectional routing (ISR) was proposed and evaluated under several research conditions with different MST structures, section score distributions and relationships, and types of regression models for ISR. The overall findings of the study suggested that the MST with ISR approach could improve measurement efficiency and test optimality, especially for short tests. %B Applied Psychological Measurement %V 44 %P 87-102 %U https://doi.org/10.1177/0146621619837226 %R 10.1177/0146621619837226 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T New Efficient and Practicable Adaptive Designs for Calibrating Items Online %A Yinhong He %A Ping Chen %A Yong Li %X When calibrating new items online, it is practicable to first compare all new items according to some criterion and then assign the most suitable one to the current examinee who reaches a seeding location. The modified D-optimal design proposed by van der Linden and Ren (denoted as D-VR design) works within this practicable framework with the aim of directly optimizing the estimation of item parameters. However, the optimal design point for a given new item should be obtained by comparing all examinees in a static examinee pool. Thus, D-VR design still has room for improvement in calibration efficiency from the view of traditional optimal design. To this end, this article incorporates the idea of traditional optimal design into D-VR design and proposes a new online calibration design criterion, namely, the excellence degree (ED) criterion.
Four different schemes are developed to measure the information provided by the current examinee when implementing this new criterion, and four new ED designs equipped with them are put forward accordingly. Simulation studies were conducted under a variety of conditions to compare the D-VR design and the four proposed ED designs in terms of calibration efficiency. Results showed that the four ED designs outperformed D-VR design in almost all simulation conditions. %B Applied Psychological Measurement %V 44 %P 3-16 %U https://doi.org/10.1177/0146621618824854 %R 10.1177/0146621618824854 %0 Journal Article %J Educational and Psychological Measurement %D 2019 %T Developing Multistage Tests Using D-Scoring Method %A Kyung (Chris) T. Han %A Dimiter M. Dimitrov %A Faisal Al-Mashary %X The D-scoring method for scoring and equating tests with binary items proposed by Dimitrov offers some of the advantages of item response theory, such as item-level difficulty information and score computation that reflects the item difficulties, while retaining the merits of classical test theory such as the simplicity of number correct score computation and relaxed requirements for model sample sizes. Because of its unique combination of those merits, the D-scoring method has seen quick adoption in the educational and psychological measurement field. Because item-level difficulty information is available with the D-scoring method and item difficulties are reflected in test scores, it conceptually makes sense to use the D-scoring method with adaptive test designs such as multistage testing (MST). In this study, we developed and compared several versions of the MST mechanism using the D-scoring approach and also proposed and implemented a new framework for conducting MST simulation under the D-scoring method. Our findings suggest that the score recovery performance under MST with D-scoring was promising, as it retained score comparability across different MST paths. 
We found that MST using the D-scoring method can achieve improvements in measurement precision and efficiency over linear tests that use the D-scoring method. %B Educational and Psychological Measurement %V 79 %P 988-1008 %U https://doi.org/10.1177/0013164419841428 %R 10.1177/0013164419841428 %0 Journal Article %J Educational and Psychological Measurement %D 2019 %T Imputation Methods to Deal With Missing Responses in Computerized Adaptive Multistage Testing %A Dee Duygu Cetin-Berber %A Halil Ibrahim Sari %A Anne Corinne Huggins-Manley %X Routing examinees to modules based on their ability level is a very important aspect in computerized adaptive multistage testing. However, the presence of missing responses may complicate estimation of examinee ability, which may result in misrouting of individuals. Therefore, missing responses should be handled carefully. This study investigated multiple missing data methods in computerized adaptive multistage testing, including two imputation techniques, the use of full information maximum likelihood and the use of scoring missing data as incorrect. These methods were examined under the missing completely at random, missing at random, and missing not at random frameworks, as well as other testing conditions. Comparisons were made to baseline conditions where no missing data were present. The results showed that imputation and the full information maximum likelihood methods outperformed incorrect scoring methods in terms of average bias, average root mean square error, and correlation between estimated and true thetas. %B Educational and Psychological Measurement %V 79 %P 495-511 %U https://doi.org/10.1177/0013164418805532 %R 10.1177/0013164418805532 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T An Investigation of Exposure Control Methods With Variable-Length CAT Using the Partial Credit Model %A Audrey J. Leroux %A J. Kay Waid-Ebbs %A Pey-Shan Wen %A Drew A. Helmer %A David P. Graham %A Maureen K.
O’Connor %A Kathleen Ray %X The purpose of this simulation study was to investigate the effect of several different item exposure control procedures in computerized adaptive testing (CAT) with variable-length stopping rules using the partial credit model. Previous simulation studies on CAT exposure control methods with polytomous items rarely considered variable-length tests. The four exposure control techniques examined were the randomesque with a group of three items, randomesque with a group of six items, progressive-restricted standard error (PR-SE), and no exposure control. The two variable-length stopping rules included were the SE and predicted standard error reduction (PSER), along with three item pools of varied sizes (43, 86, and 172 items). Descriptive statistics on number of nonconvergent cases, measurement precision, testing burden, item overlap, item exposure, and pool utilization were calculated. Results revealed that the PSER stopping rule administered fewer items on average while maintaining measurement precision similar to the SE stopping rule across the different item pool sizes and exposure controls. The PR-SE exposure control procedure surpassed the randomesque methods by further reducing test overlap, maintaining maximum exposure rates at the target rate or lower, and utilizing all items from the pool with a minimal increase in number of items administered and nonconvergent cases. %B Applied Psychological Measurement %V 43 %P 624-638 %U https://doi.org/10.1177/0146621618824856 %R 10.1177/0146621618824856 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T Multidimensional Computerized Adaptive Testing Using Non-Compensatory Item Response Theory Models %A Chia-Ling Hsu %A Wen-Chung Wang %X Current use of multidimensional computerized adaptive testing (MCAT) has been developed in conjunction with compensatory multidimensional item response theory (MIRT) models rather than with non-compensatory ones. 
In recognition of the usefulness of MCAT and the complications associated with non-compensatory data, this study aimed to develop MCAT algorithms using non-compensatory MIRT models and to evaluate their performance. For the purpose of the study, three item selection methods were adapted and compared, namely, the Fisher information method, the mutual information method, and the Kullback–Leibler information method. The results of a series of simulations showed that the Fisher information and mutual information methods performed similarly, and both outperformed the Kullback–Leibler information method. In addition, it was found that the more stringent the termination criterion and the higher the correlation between the latent traits, the higher the resulting measurement precision and test reliability. Test reliability was very similar across the dimensions, regardless of the correlation between the latent traits and termination criterion. On average, the difficulties of the administered items were found to be at a lower level than the examinees’ abilities, which shed light on item bank construction for non-compensatory items. %B Applied Psychological Measurement %V 43 %P 464-480 %U https://doi.org/10.1177/0146621618800280 %R 10.1177/0146621618800280 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T A Continuous a-Stratification Index for Item Exposure Control in Computerized Adaptive Testing %A Alan Huebner %A Chun Wang %A Bridget Daly %A Colleen Pinkelman %X The method of a-stratification aims to reduce item overexposure in computerized adaptive testing, as items that are administered at very high rates may threaten the validity of test scores. In existing methods of a-stratification, the item bank is partitioned into a fixed number of nonoverlapping strata according to the items’ a, or discrimination, parameters.
This article introduces a continuous a-stratification index which incorporates exposure control into the item selection index itself and thus eliminates the need for fixed discrete strata. The new continuous a-stratification index is compared with existing stratification methods via simulation studies in terms of ability estimation bias, mean squared error, and control of item exposure rates. %B Applied Psychological Measurement %V 42 %P 523-537 %U https://doi.org/10.1177/0146621618758289 %R 10.1177/0146621618758289 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T Evaluation of a New Method for Providing Full Review Opportunities in Computerized Adaptive Testing—Computerized Adaptive Testing With Salt %A Cui, Zhongmin %A Liu, Chunyan %A He, Yong %A Chen, Hanwei %X Abstract Allowing item review in computerized adaptive testing (CAT) is getting more attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees’ abilities. The practice of item review in CAT, however, is hindered by the potential danger of test-manipulation strategies. To provide review opportunities to examinees while minimizing the effect of test-manipulation strategies, researchers have proposed different algorithms to implement CAT with restricted revision options. In this article, we propose and evaluate a new method that implements CAT without any restriction on item review. In particular, we evaluate the new method in terms of the accuracy on ability estimates and the robustness against test-manipulation strategies. This study shows that the newly proposed method is promising in a win-win situation: examinees have full freedom to review and change answers, and the impacts of test-manipulation strategies are undermined. 
%B Journal of Educational Measurement %V 55 %P 582-594 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12193 %R 10.1111/jedm.12193 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2018 %T Factors Affecting the Classification Accuracy and Average Length of a Variable-Length Cognitive Diagnostic Computerized Test %A Huebner, Alan %A Finkelman, Matthew D. %A Weissman, Alexander %B Journal of Computerized Adaptive Testing %V 6 %P 1-14 %U http://iacat.org/jcat/index.php/jcat/article/view/55/30 %N 1 %R 10.7333/1802-060101 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items %A Dongbo Tu %A Yuting Han %A Yan Cai %A Xuliang Gao %X Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, and most of them can only deal with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and testing complicated abilities and skills. The purpose of this study is to discuss the item selection algorithms used in MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and the new proposed item selection methods perform well. Combined with the security of the pool, when two dimensions are considered (Study 1), the proposed modified continuous entropy method (MCEM) is the ideal of all in that it gains the lowest item exposure rate and has a relatively high accuracy. 
As for high dimensions (Study 2), results show that mutual information (MUI) and MCEM keep relatively high estimation accuracy, and the item exposure rates decrease as the correlation increases. %B Applied Psychological Measurement %V 42 %P 677-694 %U https://doi.org/10.1177/0146621618762748 %R 10.1177/0146621618762748 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Comparison of Pretest Item Calibration Methods in a Computerized Adaptive Test (CAT) %A Huijuan Meng %A Chris Han %K CAT %K Pretest Item Calibration %X

Calibration methods for pretest items in a computerized adaptive test (CAT) are not a new area of research inquiry. After decades of research on CAT, the fixed item parameter calibration (FIPC) method has been widely accepted and used by practitioners to address two CAT calibration issues: (a) a restricted ability range each item is exposed to, and (b) a sparse response data matrix. In FIPC, the parameters of the operational items are fixed at their original values, and multiple expectation maximization (EM) cycles are used to estimate parameters of the pretest items, with the prior ability distribution being updated multiple times (Ban, Hanson, Wang, Yi, & Harris, 2001; Kang & Petersen, 2009; Pommerich & Segall, 2003).
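The FIPC logic just described — operational parameters held fixed, EM cycles re-estimating the pretest parameters and the ability prior — can be sketched compactly. The fragment below is a simplified illustration for a single pretest item under assumed 2PL items, not the flexMIRT or BILOG-MG implementation; all function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def fipc_calibrate(op_a, op_b, op_resp, pt_resp, n_cycles=5):
    """FIPC sketch for one pretest item: operational 2PL parameters stay
    fixed, the ability prior is re-estimated each EM cycle, and the pretest
    item's (a, b) maximize the expected log-likelihood."""
    quad = np.linspace(-4, 4, 41)                     # quadrature points
    prior = np.exp(-0.5 * quad ** 2)
    prior /= prior.sum()                              # N(0, 1) starting prior
    a_pt, b_pt = 1.0, 0.0

    for _ in range(n_cycles):
        # E-step: posterior over quadrature points, using the fixed
        # operational items plus the pretest item's current estimates
        like = np.ones((op_resp.shape[0], quad.size))
        for j in range(op_resp.shape[1]):
            pj = 1.0 / (1.0 + np.exp(-op_a[j] * (quad - op_b[j])))
            like *= np.where(op_resp[:, [j]] == 1, pj, 1.0 - pj)
        p_pt = 1.0 / (1.0 + np.exp(-a_pt * (quad - b_pt)))
        like *= np.where(pt_resp[:, None] == 1, p_pt, 1.0 - p_pt)
        post = like * prior
        post /= post.sum(axis=1, keepdims=True)

        # M-step: posterior-weighted ML for the pretest item's parameters
        def neg_q(params):
            a, b = params
            p = np.clip(1.0 / (1.0 + np.exp(-a * (quad - b))), 1e-9, 1 - 1e-9)
            return -np.sum(post * (pt_resp[:, None] * np.log(p)
                                   + (1.0 - pt_resp[:, None]) * np.log(1.0 - p)))
        a_pt, b_pt = minimize(neg_q, [a_pt, b_pt], method="Nelder-Mead").x

        prior = post.mean(axis=0)                     # updated ability prior
    return a_pt, b_pt
```

A production implementation would handle many pretest items at once, sparse response matrices, and explicit convergence checks.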

Another calibration method is the fixed person parameter calibration (FPPC) method proposed by Stocking (1988) as “Method A.” Under this approach, candidates’ ability estimates are fixed in the calibration of pretest items, and they define the scale on which the parameter estimates are reported. The logic of FPPC is suitable for CAT applications because the person parameters are estimated based on operational items and available for pretest item calibration. In Stocking (1988), the FPPC was evaluated using the LOGIST computer program developed by Wood, Wingersky, and Lord (1976). She reported that “Method A” produced larger root mean square errors (RMSEs) in the middle ability range than “Method B,” which required the use of anchor items (administered non-adaptively) and linking steps to attempt to correct for the potential scale drift due to the use of imperfect ability estimates.
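Because "Method A" treats the CAT ability estimates as fixed and known, calibrating a pretest item reduces to ordinary per-item maximum likelihood. A minimal sketch under an assumed 2PL model (illustrative only, not the LOGIST procedure; names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def calibrate_pretest_item(thetas, responses):
    """Fixed-person-parameter calibration in the spirit of Stocking's
    'Method A': treat ability estimates as known constants and maximize
    the 2PL likelihood of one pretest item's responses over (a, b)."""
    thetas = np.asarray(thetas, float)
    responses = np.asarray(responses, float)

    def neg_loglik(params):
        a, b = params
        p = 1.0 / (1.0 + np.exp(-a * (thetas - b)))
        p = np.clip(p, 1e-9, 1 - 1e-9)            # numerical safety
        return -np.sum(responses * np.log(p)
                       + (1.0 - responses) * np.log(1.0 - p))

    result = minimize(neg_loglik, x0=[1.0, 0.0], method="Nelder-Mead")
    return result.x                               # (a_hat, b_hat)
```

In practice the thetas are themselves imperfect estimates from the operational items, which is exactly the source of the scale drift Stocking examined.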

Since then, new commercial software tools such as BILOG-MG and flexMIRT (Cai, 2013) have been developed to handle the FPPC method with different implementations (e.g., the MH-RM algorithm with flexMIRT). The performance of the FPPC method with those new software tools, however, has rarely been researched in the literature.

In our study, we evaluated the performance of two pretest item calibration methods using flexMIRT, the new software tool. The FIPC and FPPC are compared under various CAT settings. Each simulated exam contains 75% operational items and 25% pretest items, and real item parameters are used to generate the CAT data. This study also addresses the lack of guidelines in the existing CAT item calibration literature regarding population ability shift and exam length (more accurate theta estimates are expected in longer exams). Thus, it investigates four factors and their impact on parameter estimation accuracy: (1) candidate population changes (3 ability distributions); (2) exam length (20: 15 OP + 5 PT, 40: 30 OP + 10 PT, and 60: 45 OP + 15 PT); (3) data model fit (3PL and 3PL with fixed c); and (4) pretest item calibration sample sizes (300, 500, and 1,000). The findings will fill the gap in this area of research and provide new information on which practitioners can base their decisions when selecting a pretest calibration method for their exams.

References

Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001). A comparative study of on-line pretest item calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191–212.

Cai, L. (2013). flexMIRT® Flexible Multilevel Multidimensional Item Analysis and Test Scoring (Version 2) [Computer software]. Chapel Hill, NC: Vector Psychometric Group.

Kang, T., & Petersen, N. S. (2009). Linking item parameters to a base scale (Research Report No. 2009-2). Iowa City, IA: ACT.

Pommerich, M., & Segall, D.O. (2003, April). Calibrating CAT pools and online pretest items using marginal maximum likelihood methods. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Stocking, M. L. (1988). Scale drift in online calibration (Research Report No. 88–28). Princeton, NJ: Educational Testing Service.

Wood, R. L., Wingersky, M. S., & Lord, F. M. (1976). LOGIST: A computer program for estimating examinee ability and item characteristic curve parameters (RM76-6) [Computer program]. Princeton, NJ: Educational Testing Service.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Considerations in Performance Evaluations of Computerized Formative Assessments %A Michael Chajewski %A John Harnisher %K algebra %K Formative Assessment %K Performance Evaluations %X

Computerized adaptive instruments have been widely established and used in the context of summative assessments for purposes including licensure, admissions, and proficiency testing. The benefits of examinee-tailored examinations, which can provide estimates of performance that are more reliable and valid, have in recent years attracted a greater audience (e.g., patient-oriented outcomes, test prep). Formative assessments, which are most widely understood in their implementation as diagnostic tools, have recently started to expand into lesser-known areas of computerized testing, such as implementations of instructional designs aiming to maximize examinee learning through targeted practice.

Using a CAT instrument within the framework of evaluating repeated examinee performances (in settings such as Quiz Bank practice, for example) poses unique challenges not germane to summative assessments. The scale on which item parameters (and subsequently examinee performance estimates such as maximum likelihood estimates) are determined usually does not take change over time into consideration. While vertical scaling features resolve the learning-acquisition problem, most content practice engines do not make use of explicit practice windows that could be vertically aligned. Alternatively, multidimensional (MIRT) and hierarchical (HIRT) item response theory models allow for the specification of random effects associated with change over time in examinees’ skills, but they are often complex and require content and usage resources that are rarely available.

The research submitted for consideration simulated examinees’ repeated variable-length Quiz Bank practice in algebra using a 500-item 1-PL operational pool. The stability simulations sought to determine which rolling item-interval size would provide the most informative insight into examinees’ learning progression over time. Estimates were evaluated in terms of reduction in estimate uncertainty, bias, and RMSD against the true and total-item-based ability estimates. It was found that rolling intervals of 20-25 items provided the best reduction of uncertainty around the estimate without compromising the ability to provide informed performance estimates to students. However, while intervals of 20-25 items asymptotically tended to provide adequate estimates of performance, changes over shorter periods of time assessed with shorter quizzes could not be detected, as those changes were suppressed by the performance over the full interval considered. Implications for infrastructure (such as recommendation engines), product, and scale development are discussed.
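The rolling-interval idea can be illustrated with a toy 1-PL tracker that re-estimates ability from only the most recent window of responses. This is a hedged sketch with hypothetical names, not the authors' engine; the clamp stands in for the safeguards a real system needs when a window is all-correct or all-incorrect.

```python
import numpy as np

def rasch_theta_mle(b, u, iters=20):
    """Newton-Raphson ML ability estimate under the Rasch (1-PL) model,
    clamped to [-4, 4] so perfect-score windows stay finite."""
    b = np.asarray(b, float)
    u = np.asarray(u, float)
    theta = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))
        grad = np.sum(u - p)                      # score function
        info = np.sum(p * (1.0 - p))              # Fisher information
        theta = float(np.clip(theta + grad / info, -4.0, 4.0))
    return theta

def rolling_theta(b_seq, u_seq, window=20):
    """Re-estimate ability after each administered item using only the
    most recent `window` responses, tracking change over time."""
    out = []
    for t in range(1, len(u_seq) + 1):
        lo = max(0, t - window)
        out.append(rasch_theta_mle(b_seq[lo:t], u_seq[lo:t]))
    return out
```

With a window of 20-25 items the trace smooths out item-level noise, which is precisely why very short-term learning changes get averaged away, as the abstract notes.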


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Journal Article %J Evaluation & the Health Professions %D 2017 %T Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank %A Gerard Flens %A Niels Smits %A Caroline B. Terwee %A Joost Dekker %A Irma Huijbrechts %A Edwin de Beurs %X We developed a Dutch-Flemish version of the patient-reported outcomes measurement information system (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. %B Evaluation & the Health Professions %V 40 %P 79-105 %U https://doi.org/10.1177/0163278716684168 %R 10.1177/0163278716684168 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Efficiency of Item Selection in CD-CAT Based on Conjunctive Bayesian Network Modeling Hierarchical attributes %A Soo-Yun Han %A Yun Joo Yoo %K CD-CAT %K Conjuctive Bayesian Network Modeling %K item selection %X

Cognitive diagnosis models (CDM) aim to diagnose an examinee’s mastery status of multiple fine-grained skills. As new developments in cognitive diagnosis methods emerge, much attention is given to cognitive diagnostic computerized adaptive testing (CD-CAT) as well. Topics such as item selection methods, item exposure control strategies, and online calibration methods, which have been well-studied for traditional item response theory (IRT) based CAT, are also investigated in the context of CD-CAT (e.g., Xu, Chang, & Douglas, 2003; Wang, Chang, & Huebner, 2011; Chen et al., 2012).

In the CDM framework, some researchers suggest modeling the structural relationships between cognitive skills, or attributes. In particular, attributes can be hierarchical, such that some attributes must be acquired before subsequent ones are mastered. For example, in mathematics, addition must be mastered before multiplication, which gives a hierarchy model for the addition and multiplication skills. Recently, new CDMs considering attribute hierarchies have been suggested, including the Attribute Hierarchy Method (AHM; Leighton, Gierl, & Hunka, 2004) and the Hierarchical Diagnostic Classification Models (HDCM; Templin & Bradshaw, 2014).
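The addition-before-multiplication example amounts to a constraint on which mastery patterns are permissible: a hierarchy over K attributes leaves only a subset of the 2^K binary patterns. A small illustrative helper (hypothetical, not taken from the cited models):

```python
from itertools import product

def permissible_patterns(n_attrs, prereqs):
    """Enumerate mastery patterns consistent with an attribute hierarchy.
    `prereqs` maps an attribute index to the set of attributes that must
    be mastered first (the edges of the hierarchy DAG)."""
    out = []
    for pattern in product([0, 1], repeat=n_attrs):
        ok = all(pattern[p] == 1
                 for k in range(n_attrs) if pattern[k] == 1
                 for p in prereqs.get(k, ()))
        if ok:
            out.append(pattern)
    return out
```

For two attributes where attribute 1 (multiplication) requires attribute 0 (addition), the pattern (0, 1) is excluded and only three of the four patterns survive; this shrinking of the pattern space is what hierarchy-aware CD-CAT exploits.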

Bayesian Networks (BN), the probabilistic graphical models representing the relationship of a set of random variables using a directed acyclic graph with conditional probability distributions, also provide an efficient framework for modeling the relationship between attributes (Culbertson, 2016). Among various BNs, conjunctive Bayesian network (CBN; Beerenwinkel, Eriksson, & Sturmfels, 2007) is a special kind of BN, which assumes partial ordering between occurrences of events and conjunctive constraints between them.

In this study, we propose using the CBN for modeling attribute hierarchies and discuss the advantages of the CBN for CDM. We then explore the impact of CBN modeling on the efficiency of item selection methods for CD-CAT when the attributes are truly hierarchical. To this end, two simulation studies, one for fixed-length CAT and another for variable-length CAT, are conducted. For each study, two attribute hierarchy structures with 5 and 8 attributes are assumed. Among the various item selection methods developed for CD-CAT, six algorithms are considered: the posterior-weighted Kullback-Leibler index (PWKL; Cheng, 2009), the modified PWKL index (MPWKL; Kaplan, de la Torre, & Barrada, 2015), Shannon entropy (SHE; Tatsuoka, 2002), mutual information (MI; Wang, 2013), the posterior-weighted CDM discrimination index (PWCDI; Zheng & Chang, 2016), and the posterior-weighted attribute-level CDM discrimination index (PWACDI; Zheng & Chang, 2016). The impact of Q-matrix structure, item quality, and test termination rules on the efficiency of the item selection algorithms is also investigated. Evaluation measures include attribute classification accuracy (fixed-length experiment) and the test length of CD-CAT until stopping (variable-length experiment).
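Among the six indices, Shannon entropy (SHE) is perhaps the simplest to sketch: administer the item whose response is expected to leave the least uncertainty in the posterior over attribute patterns. A minimal, model-agnostic illustration in which the per-pattern correct-response probabilities are taken as given (e.g., from a DINA model); function names are hypothetical:

```python
import numpy as np

def expected_entropy(posterior, p_correct):
    """Expected Shannon entropy (bits) of the attribute-pattern posterior
    after observing one item response, where p_correct[k] is
    P(correct | pattern k)."""
    def entropy(q):
        q = q[q > 0]
        return -np.sum(q * np.log2(q))
    p1 = float(np.sum(posterior * p_correct))        # marginal P(correct)
    post_if_correct = posterior * p_correct / p1
    post_if_wrong = posterior * (1.0 - p_correct) / (1.0 - p1)
    return p1 * entropy(post_if_correct) + (1.0 - p1) * entropy(post_if_wrong)

def select_item_she(posterior, item_p_correct, administered):
    """SHE selection: pick the unadministered item that minimizes the
    expected posterior entropy."""
    best, best_h = None, float("inf")
    for j, p in enumerate(item_p_correct):
        if j in administered:
            continue
        h = expected_entropy(np.asarray(posterior, float),
                             np.asarray(p, float))
        if h < best_h:
            best, best_h = j, h
    return best
```

A highly diagnostic item (response probabilities that differ sharply across patterns) yields a lower expected entropy than an uninformative one, so it is selected first.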

The results of the study indicate that the efficiency of item selection is improved by directly modeling the attribute hierarchies using the CBN. The test length until achieving the diagnosis probability threshold was reduced to 50-70% for CBN-based CAT compared to CD-CAT assuming independence of attributes. The magnitude of improvement is greater when the cognitive model of the test includes more attributes and when the test length is shorter. We conclude by discussing how Q-matrix structure, item quality, and test termination rules affect the efficiency.

References

Beerenwinkel, N., Eriksson, N., & Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli, 13(4), 893-909.

Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77(2), 201-222.

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619-632.

Culbertson, M. J. (2016). Bayesian networks in educational assessment: the state of the field. Applied Psychological Measurement, 40(1), 3-21.

Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167-188.

Leighton, J. P., Gierl, M. J., & Hunka, S. M. (2004). The attribute hierarchy method for cognitive assessment: a variation on Tatsuoka's rule‐space approach. Journal of Educational Measurement, 41(3), 205-237.

Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337-350.

Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317-339.

Wang, C. (2013). Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educational and Psychological Measurement, 73(6), 1017-1035.

Wang, C., Chang, H. H., & Huebner, A. (2011). Restrictive stochastic item selection methods in cognitive diagnostic computerized adaptive testing. Journal of Educational Measurement, 48(3), 255-273.

Xu, X., Chang, H., & Douglas, J. (2003, April). A simulation study to compare CAT strategies for cognitive diagnosis. Paper presented at the annual meeting of National Council on Measurement in Education, Chicago.

Zheng, C., & Chang, H. H. (2016). High-efficiency response distribution–based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40(8), 608-624.


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1RbO2gd4aULqsSgRi_VZudNN_edX82NeD %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T An Imputation Approach to Handling Incomplete Computerized Tests %A Troy Chen %A Chi-Yu Huang %A Chunyan Liu %K CAT %K imputation approach %K incomplete computerized test %X

As technology advances, computerized adaptive testing (CAT) is becoming increasingly popular as it allows tests to be tailored to an examinee’s ability.  Nevertheless, examinees might devise testing strategies to use CAT to their advantage.  For instance, if only the items that examinees answer count towards their score, then a higher theta score might be obtained by spending more time on items at the beginning of the test and skipping items at the end if time runs out. This type of gaming can be discouraged if examinees’ scores are lowered or “penalized” based on the amount of non-response.

The goal of this study was to devise a penalty function that would meet two criteria: (1) the greater the omit rate, the greater the penalty; and (2) examinees with the same ability and the same omit rate should receive the same penalty. To create the penalty, an ability estimate was first calculated based on only the items the examinee responded to. Next, the expected number-correct score (EXR) was obtained from that estimate and the test characteristic curve. A penalized expected number-correct score was then obtained by multiplying EXR by the proportion of items the examinee responded to. Finally, the penalized theta was identified from the penalized score through the test characteristic curve. Based on the penalized theta and the item parameters of an unanswered item, the likelihood of a correct response was computed and employed to estimate the imputed score for the unanswered item.
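That chain of steps can be sketched as follows, assuming a 2PL test characteristic curve and bisection for its inverse. This is an illustration of the idea under those assumptions, not the paper's exact functional form; all names are hypothetical.

```python
import numpy as np

def tcc(theta, a, b):
    """Test characteristic curve: expected number-correct under the 2PL."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.sum(1.0 / (1.0 + np.exp(-a * (theta - b)))))

def inverse_tcc(target, a, b, lo=-6.0, hi=6.0):
    """Invert the (monotone increasing) TCC by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if tcc(mid, a, b) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def penalized_theta(theta_r, a, b, n_answered):
    """Shrink the expected number-correct by the completion rate, then map
    the penalized score back to the theta scale through the TCC."""
    exr = tcc(theta_r, a, b)                     # expected number correct
    exr_pen = exr * n_answered / len(a)          # completion-rate penalty
    return inverse_tcc(exr_pen, a, b)

def imputed_score(theta_p, a_j, b_j):
    """Imputed score for an unanswered item: model-implied P(correct)."""
    return 1.0 / (1.0 + np.exp(-a_j * (theta_p - b_j)))
```

By construction, a fully completed test incurs no penalty, and two examinees with identical abilities and omit rates receive the same penalized theta, matching the two stated criteria.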

Two datasets were used to generate tests with completion rates of 50%, 80%, and 90%. The first dataset included real data in which approximately 4,500 examinees responded to a 21-item test, which provided a baseline/truth. Sampling was done to achieve the three completion-rate conditions. The second dataset consisted of simulated item scores for 50,000 simulees under a 1-2-4 multistage CAT design in which each stage contained seven items. Imputed item scores for unanswered items were computed using a variety of values for G (and therefore T). Three other approaches to handling unanswered items were also considered: all correct (i.e., T = 0), all incorrect (i.e., T = 1), and random scoring (i.e., T = 0.5).

The current study investigated the impact on theta estimates resulting from the proposed approach to handling unanswered items in a fixed-length CAT. In real testing situations, when examinees do not finish a test, it is hard to tell whether they tried diligently but ran out of time or whether they attempted to manipulate the scoring engine.  To handle unfinished tests with penalties, the proposed approach considers examinees’ abilities and incompletion rates. The results of this study provide direction for psychometric practitioners when considering penalties for omitted responses.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1vznZeO3nsZZK0k6_oyw5c9ZTP8uyGnXh %0 Journal Article %J Applied Psychological Measurement %D 2017 %T The Information Product Methods: A Unified Approach to Dual-Purpose Computerized Adaptive Testing %A Zheng, Chanjin %A He, Guanrui %A Gao, Chunlei %X This article gives a brief summary of major approaches in dual-purpose computerized adaptive testing (CAT) in which the test is tailored interactively to both an examinee's overall ability level, θ, and attribute mastery level, α. It also proposes an information product approach whose connections to the current methods are revealed. An updated comprehensive empirical study demonstrated that the information product approach not only can offer a unified framework to connect all other approaches but also can mitigate the weighting issue in the dual-information approach. %B Applied Psychological Measurement %V 42 %P 321-324 %8 2018/06/01 %@ 0146-6216 %U https://doi.org/10.1177/0146621617730392 %N 4 %! Applied Psychological Measurement %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Item Pool Design and Evaluation %A Mark D Reckase %A Wei He %A Jing-Ru Xu %A Xuechun Zhou %K CAT %K Item Pool Design %X

Early work on CAT tended to use existing sets of items that came from fixed-length test forms. These sets of items had been selected to meet requirements much different from those needed for a CAT, such as decision making or covering a content domain. However, there was also some early work suggesting that items be distributed equally over the range of proficiency of interest, or concentrated at a decision point. Other work showed that proficiency estimates were biased when an item pool was too easy or too hard. These early findings eventually led to work on item pool design and, more recently, on item pool evaluation. This presentation gives a brief overview of these topics to provide context for the following presentations in this symposium.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1ZAsqm1yNZlliqxEHcyyqQ_vOSu20xxZs %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T New Challenges (With Solutions) and Innovative Applications of CAT %A Chun Wang %A David J. Weiss %A Xue Zhang %A Jian Tao %A Yinhong He %A Ping Chen %A Shiyu Wang %A Susu Zhang %A Haiyan Lin %A Xiaohong Gao %A Hua-Hua Chang %A Zhuoran Shang %K CAT %K challenges %K innovative applications %X

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed through the continual efforts of researchers in the field, many longstanding challenges have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of these unresolved challenges: (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; and (3) establishing consistency and asymptotic normality of latent trait estimation when item response revision is allowed in CAT. In addition, this symposium features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Using Bayesian Decision Theory in Cognitive Diagnosis Computerized Adaptive Testing %A Chia-Ling Hsu %A Wen-Chung Wang %A ShuYing Chen %K Bayesian Decision Theory %K CD-CAT %X

Cognitive diagnosis computerized adaptive testing (CD-CAT) purports to provide each individual a profile of the strengths and weaknesses of attributes or skills through computerized adaptive testing. In the CD-CAT literature, researchers have dedicated themselves to developing item selection algorithms that improve measurement efficiency, and most algorithms are based on information theory. Given the discontinuous nature of the latent variables in CD-CAT, this study introduced an alternative for item selection, called the minimum expected cost (MEC) method, which was derived from Bayesian decision theory. Using simulations, the MEC method was evaluated against the posterior-weighted Kullback-Leibler (PWKL) information, modified PWKL (MPWKL), and mutual information (MI) methods by manipulating item bank quality, item selection algorithm, and termination rule. Results indicated that, regardless of item quality and termination criterion, the MEC, MPWKL, and MI methods performed very similarly, and all outperformed the PWKL method in classification accuracy and test efficiency, especially in short tests; the MEC method also used the item bank more efficiently than the MPWKL and MI methods. Moreover, the MEC method could incorporate the costs of incorrect decisions and improve classification accuracy and test efficiency when a particular profile was of concern. All the results suggest the practicability of the MEC method in CD-CAT.
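As a toy sketch of the Bayesian decision-theoretic idea behind a minimum-expected-cost criterion (not the authors' implementation), each candidate item can be scored by its pre-posterior expected misclassification cost under 0-1 loss. The two-class posterior and the class-conditional response probabilities below are invented for illustration:

```python
import numpy as np

# Hypothetical setup: current posterior over two latent classes and, for each
# candidate item, the probability of a correct response given each class.
posterior = np.array([0.6, 0.4])
p_correct = np.array([[0.8, 0.30],   # item 0
                      [0.6, 0.55],   # item 1 (weakly discriminating)
                      [0.9, 0.10]])  # item 2 (highly discriminating)

def expected_cost(post, p1):
    """Pre-posterior expected 0-1 misclassification cost of one item."""
    cost = 0.0
    for px_given_c in (p1, 1.0 - p1):        # response correct, then incorrect
        marg = np.sum(post * px_given_c)     # marginal probability of response
        upd = post * px_given_c / marg       # posterior after that response
        cost += marg * (1.0 - upd.max())     # Bayes risk under 0-1 loss
    return cost

def select_item_mec(post, bank):
    """Return the index of the item with minimum expected cost."""
    return int(np.argmin([expected_cost(post, p1) for p1 in bank]))

best = select_item_mec(posterior, p_correct)  # selects item 2 here
```

With an asymmetric cost matrix in place of the 0-1 loss, the same structure would let a practitioner weight particular misclassifications more heavily, which is the property the abstract highlights.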

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2016 %T On Computing the Key Probability in the Stochastically Curtailed Sequential Probability Ratio Test %A Huebner, Alan R. %A Finkelman, Matthew D. %X The Stochastically Curtailed Sequential Probability Ratio Test (SCSPRT) is a termination criterion for computerized classification tests (CCTs) that has been shown to be more efficient than the well-known Sequential Probability Ratio Test (SPRT). The performance of the SCSPRT depends on computing the probability that at a given stage in the test, an examinee’s current interim classification status will not change before the end of the test. Previous work discusses two methods of computing this probability: an exact method in which all potential responses to remaining items are considered and an approximation based on the central limit theorem (CLT) requiring less computation. Generally, the CLT method should be used early in the test when the number of remaining items is large, and the exact method is more appropriate at later stages of the test when few items remain. However, there is currently a dearth of information as to the performance of the SCSPRT when using the two methods. For the first time, the exact and CLT methods of computing the crucial probability are compared in a simulation study to explore whether there is any effect on the accuracy or efficiency of the CCT. The article is aimed at practitioners and researchers interested in using the SCSPRT as a termination criterion in an operational CCT. %B Applied Psychological Measurement %V 40 %P 142-156 %U http://apm.sagepub.com/content/40/2/142.abstract %R 10.1177/0146621615611633 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Maximum Likelihood Score Estimation Method With Fences for Short-Length Tests and Computerized Adaptive Tests %A Han, Kyung T.
%X A critical shortcoming of the maximum likelihood estimation (MLE) method for test score estimation is that it does not work with certain response patterns, including ones consisting only of all 0s or all 1s. This can be problematic in the early stages of computerized adaptive testing (CAT) administration and for tests short in length. To overcome this challenge, test practitioners often set lower and upper bounds of theta estimation and truncate the score estimation to be one of those bounds when the log likelihood function fails to yield a peak due to responses consisting only of 0s or 1s. Even so, this MLE with truncation (MLET) method still cannot handle response patterns in which all harder items are correct and all easy items are incorrect. Bayesian-based estimation methods such as the modal a posteriori (MAP) method or the expected a posteriori (EAP) method can be viable alternatives to MLE. The MAP or EAP methods, however, are known to result in estimates biased toward the center of a prior distribution, resulting in a shrunken score scale. This study introduces an alternative approach to MLE, called MLE with fences (MLEF). In MLEF, several imaginary “fence” items with fixed responses are introduced to form a workable log likelihood function even with abnormal response patterns. The findings of this study suggest that, unlike MLET, the MLEF can handle any response patterns and, unlike both MAP and EAP, results in score estimates that do not cause shrinkage of the theta scale. %B Applied Psychological Measurement %V 40 %P 289-301 %U http://apm.sagepub.com/content/40/4/289.abstract %R 10.1177/0146621616631317 %0 Journal Article %J Educational Measurement: Issues and Practice. %D 2016 %T Using Response Time to Detect Item Preknowledge in Computer-Based Licensure Examinations %A Qian H. %A Staniewska, D. %A Reckase, M. %A Woo, A.
%X This article addresses the issue of how to detect item preknowledge using item response time data in two computer-based large-scale licensure examinations. Item preknowledge is indicated by an unexpected short response time and a correct response. Two samples were used for detecting item preknowledge for each examination. The first sample was from the early stage of the operational test and was used for item calibration. The second sample was from the late stage of the operational test, which may feature item preknowledge. The purpose of this research was to explore whether there was evidence of item preknowledge and compromised items in the second sample using the parameters estimated from the first sample. The results showed that for one nonadaptive operational examination, two items (of 111) were potentially exposed, and two candidates (of 1,172) showed some indications of preknowledge on multiple items. For another licensure examination that featured computerized adaptive testing, there was no indication of item preknowledge or compromised items. Implications for detected aberrant examinees and compromised items are discussed in the article. %B Educational Measurement: Issues and Practice. %V 35 %N 1 %R http://dx.doi.org/10.1111/emip.12102 %0 Journal Article %J Journal of Educational Measurement %D 2015 %T Variable-Length Computerized Adaptive Testing Using the Higher Order DINA Model %A Hsu, Chia-Ling %A Wang, Wen-Chung %X Cognitive diagnosis models provide profile information about a set of latent binary attributes, whereas item response models yield a summary report on a latent continuous trait. To utilize the advantages of both models, higher order cognitive diagnosis models were developed in which information about both latent binary attributes and latent continuous traits is available. To facilitate the utility of cognitive diagnosis models, corresponding computerized adaptive testing (CAT) algorithms were developed. 
Most of them adopt the fixed-length rule to terminate CAT and are limited to ordinary cognitive diagnosis models. In this study, the higher order deterministic-input, noisy-and-gate (DINA) model was used as an example, and three criteria based on the minimum-precision termination rule were implemented: one for the latent class, one for the latent trait, and the other for both. The simulation results demonstrated that all of the termination criteria were successful when items were selected according to the Kullback-Leibler information and the posterior-weighted Kullback-Leibler information, and the minimum-precision rule outperformed the fixed-length rule with a similar test length in recovering the latent attributes and the latent trait. %B Journal of Educational Measurement %V 52 %P 125–143 %U http://dx.doi.org/10.1111/jedm.12069 %R 10.1111/jedm.12069 %0 Journal Article %J Educational and Psychological Measurement %D 2014 %T A Comparison of Four Item-Selection Methods for Severely Constrained CATs %A He, Wei %A Diao, Qi %A Hauser, Carl %X

This study compared four item-selection procedures developed for use with severely constrained computerized adaptive tests (CATs). Severely constrained CATs refer to adaptive tests that seek to meet a complex set of constraints that are often not mutually exclusive (i.e., an item may contribute to the satisfaction of several constraints at the same time). The procedures examined in the study included the weighted deviation model (WDM), the weighted penalty model (WPM), the maximum priority index (MPI), and the shadow test approach (STA). In addition, two modified versions of the MPI procedure were introduced to deal with an edge case in which the item selection procedure becomes dysfunctional during a test. The results suggest that the STA worked best among the candidate methods in terms of measurement accuracy and constraint management. The other three heuristic approaches did not differ significantly in measurement accuracy or in constraint management at the lower-bound level. However, the WPM method appears to perform considerably better in overall constraint management than either the WDM or MPI method. Limitations and future research directions are also discussed.
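For illustration only: the maximum priority index weights an item's information by the remaining room under each constraint the item touches, so items serving nearly exhausted constraints are deprioritized. The bank, weights, and quotas below are hypothetical, and the one-phase form shown follows the commonly cited MPI formula rather than the modified versions studied in this paper:

```python
import numpy as np

# Hypothetical candidates: Fisher information at the current theta, an
# item-by-constraint incidence matrix, per-constraint weights, and quotas.
info = np.array([0.50, 0.45, 0.60])
incidence = np.array([[1, 0],        # item 0 counts toward constraint 0
                      [1, 1],        # item 1 counts toward both
                      [0, 1]])       # item 2 counts toward constraint 1
weights = np.array([1.0, 2.0])
quota_left = np.array([2, 1])        # items still allowed per constraint
quota_total = np.array([5, 4])       # total items allowed per constraint

def priority_index(info, incidence, w, left, total):
    """MPI: information scaled by weighted remaining room per touched constraint."""
    f = left / total                               # remaining proportion
    scale = np.prod(np.where(incidence == 1, w * f, 1.0), axis=1)
    return info * scale

pi = priority_index(info, incidence, weights, quota_left, quota_total)
best = int(np.argmax(pi))
```

Note how item 1, despite moderate information, is penalized twice because it draws down both constraints; this multiplicative structure is what lets the heuristic balance information against constraint satisfaction without solving an optimization model at each step.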

%B Educational and Psychological Measurement %V 74 %P 677-696 %U http://epm.sagepub.com/content/74/4/677.abstract %R 10.1177/0013164413517503 %0 Journal Article %J Educational and Psychological Measurement %D 2014 %T Item Pool Design for an Operational Variable-Length Computerized Adaptive Test %A He, Wei %A Reckase, Mark D. %X

For computerized adaptive tests (CATs) to work well, they must have an item pool with a sufficient number of good quality items. Many researchers have pointed out that, in developing item pools for CATs, not only is item pool size important but so are the distribution of item parameters and practical considerations such as content distribution and item exposure. Yet there is little research on how to design item pools to have those desirable features. The research reported in this article provides step-by-step, hands-on guidance on the item pool design process by applying the bin-and-union method to design item pools for a large-scale licensure CAT employing a complex adaptive testing algorithm with variable test length, a decision-based stopping rule, content balancing, and exposure control. The design process involved extensive simulations to identify several alternative item pool designs and evaluate their performance against a series of criteria. The design output included the desired item pool size and item parameter distribution. The results indicate that the mechanism used to identify the desirable item pool features functions well and that the two recommended item pool designs would support satisfactory performance of the operational testing program.

%B Educational and Psychological Measurement %V 74 %P 473-494 %U http://epm.sagepub.com/content/74/3/473.abstract %R 10.1177/0013164413509629 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Stratified Item Selection and Exposure Control in Unidimensional Adaptive Testing in the Presence of Two-Dimensional Data %A Kalinowski, Kevin E. %A Natesan, Prathiba %A Henson, Robin K. %X

It is not uncommon to use unidimensional item response theory models to estimate ability from multidimensional data in computerized adaptive testing (CAT). The current Monte Carlo study investigated the penalty of this model misspecification in CAT implementations using different item selection methods and exposure control strategies. Three item selection methods—maximum information (MAXI), a-stratification (STRA), and a-stratification with b-blocking (STRB)—with and without the Sympson–Hetter (SH) exposure control strategy were investigated. Calibrating multidimensional items as unidimensional resulted in inaccurate item parameter estimates. Therefore, MAXI performed better than STRA and STRB in estimating the ability parameters. However, all three methods had relatively large standard errors. SH exposure control had no impact on the number of overexposed items. Existing unidimensional CAT implementations might consider using MAXI only if recalibration with a multidimensional model is too expensive. Otherwise, building a CAT pool by calibrating multidimensional data as unidimensional is not recommended.

%B Applied Psychological Measurement %V 38 %P 563-576 %U http://apm.sagepub.com/content/38/7/563.abstract %R 10.1177/0146621614536768 %0 Journal Article %J Educational and Psychological Measurement %D 2013 %T A Comparison of Exposure Control Procedures in CATs Using the 3PL Model %A Leroux, Audrey J. %A Lopez, Myriam %A Hembry, Ian %A Dodd, Barbara G. %X

This study compares the progressive-restricted standard error (PR-SE) exposure control procedure to three commonly used procedures in computerized adaptive testing, the randomesque, Sympson–Hetter (SH), and no exposure control methods. The performance of these four procedures is evaluated using the three-parameter logistic model under the manipulated conditions of item pool size (small vs. large) and stopping rules (fixed-length vs. variable-length). PR-SE provides the advantage of similar constraints to SH, without the need for a preceding simulation study to execute it. Overall for the large and small item banks, the PR-SE method administered almost all of the items from the item pool, whereas the other procedures administered about 52% or less of the large item bank and 80% or less of the small item bank. The PR-SE yielded the smallest amount of item overlap between tests across conditions and administered fewer items on average than SH. PR-SE obtained these results with similar, and acceptable, measurement precision compared to the other exposure control procedures while vastly improving on item pool usage.

%B Educational and Psychological Measurement %V 73 %P 857-874 %U http://epm.sagepub.com/content/73/5/857.abstract %R 10.1177/0013164413486802 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2013 %T Item Ordering in Stochastically Curtailed Health Questionnaires With an Observable Outcome %A Finkelman, M. D. %A Kim, W. %A He, Y. %A Lai, A.M. %B Journal of Computerized Adaptive Testing %V 1 %P 38-66 %G en %N 3 %R 10.7333/1304-0103038 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T Item Pocket Method to Allow Response Review and Change in Computerized Adaptive Testing %A Han, Kyung T. %X

Most computerized adaptive testing (CAT) programs do not allow test takers to review and change their responses because it could seriously deteriorate the efficiency of measurement and make tests vulnerable to manipulative test-taking strategies. Several modified testing methods have been developed that provide restricted review options while limiting the trade-off in CAT efficiency. The extent to which these methods provided test takers with options to review test items, however, still was quite limited. This study proposes the item pocket (IP) method, a new testing approach that allows test takers greater flexibility in changing their responses by eliminating restrictions that prevent them from moving across test sections to review their answers. A series of simulations were conducted to evaluate the robustness of the IP method against various manipulative test-taking strategies. Findings and implications of the study suggest that the IP method may be an effective solution for many CAT programs when the IP size and test time limit are properly set.

%B Applied Psychological Measurement %V 37 %P 259-275 %U http://apm.sagepub.com/content/37/4/259.abstract %R 10.1177/0146621612473638 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T Variable-Length Computerized Adaptive Testing Based on Cognitive Diagnosis Models %A Hsu, Chia-Ling %A Wang, Wen-Chung %A Chen, Shu-Ying %X

Interest in developing computerized adaptive testing (CAT) under cognitive diagnosis models (CDMs) has increased recently. CAT algorithms that use a fixed-length termination rule frequently lead to different degrees of measurement precision for different examinees. Fixed precision, in which the examinees receive the same degree of measurement precision, is a major advantage of CAT over nonadaptive testing. In addition to the precision issue, test security is another important issue in practical CAT programs. In this study, the authors implemented two termination criteria for the fixed-precision rule and evaluated their performance under two popular CDMs using simulations. The results showed that using the two criteria with the posterior-weighted Kullback–Leibler information procedure for selecting items could achieve the prespecified measurement precision. A control procedure was developed to control item exposure and test overlap simultaneously among examinees. The simulation results indicated that in contrast to no method of controlling exposure, the control procedure developed in this study could maintain item exposure and test overlap at the prespecified level at the expense of only a few more items.

%B Applied Psychological Measurement %V 37 %P 563-582 %U http://apm.sagepub.com/content/37/7/563.abstract %R 10.1177/0146621613488642 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T Comparison Between Dichotomous and Polytomous Scoring of Innovative Items in a Large-Scale Computerized Adaptive Test %A Jiao, H. %A Liu, J. %A Haynie, K. %A Woo, A. %A Gorham, J. %X

This study explored the impact of partial credit scoring of one type of innovative items (multiple-response items) in a computerized adaptive version of a large-scale licensure pretest and operational test settings. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test settings were explored in one real data analysis and two simulation studies when two different polytomous scoring algorithms, automated polytomous scoring and rater-generated polytomous scoring, were applied. For the real data analyses, the ability estimates from dichotomous and polytomous scoring were highly correlated; the classification consistency between different scoring algorithms was nearly perfect. Information distribution changed slightly in the operational item bank. In the two simulation studies comparing each polytomous scoring with dichotomous scoring, the ability estimates resulting from polytomous scoring had slightly higher measurement precision than those resulting from dichotomous scoring. The practical impact related to classification decision was minor because of the extremely small number of items that could be scored polytomously in this current study.

%B Educational and Psychological Measurement %V 72 %P 493-509 %G eng %R 10.1177/0013164411422903 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T Computerized Adaptive Testing Using a Class of High-Order Item Response Theory Models %A Huang, Hung-Yu %A Chen, Po-Hsi %A Wang, Wen-Chung %X

In the human sciences, a common assumption is that latent traits have a hierarchical structure. Higher order item response theory models have been developed to account for this hierarchy. In this study, computerized adaptive testing (CAT) algorithms based on these kinds of models were implemented, and their performance under a variety of situations was examined using simulations. The results showed that the CAT algorithms were very effective. The progressive method for item selection, the Sympson and Hetter method with online and freeze procedure for item exposure control, and the multinomial model for content balancing can simultaneously maintain good measurement precision, item exposure control, content balance, test security, and pool usage.

%B Applied Psychological Measurement %V 36 %P 689-706 %U http://apm.sagepub.com/content/36/8/689.abstract %R 10.1177/0146621612459552 %0 Journal Article %J Journal of Educational Measurement %D 2012 %T Detecting Local Item Dependence in Polytomous Adaptive Data %A Mislevy, Jessica L. %A Rupp, André A. %A Harring, Jeffrey R. %X

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computerized adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although local item dependence has been investigated both for polytomous items in fixed-form settings and for dichotomous items in CAT settings, no publications have applied local item dependence detection methodology to polytomous items in CAT, despite its central importance to these applications. The current research uses a simulation study to investigate the extension of the widely used pairwise statistics, Yen's Q3 statistic and Pearson's X2 statistic, in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient-Reported Outcomes Measurement Information System (PROMIS).

%B Journal of Educational Measurement %V 49 %P 127–147 %U http://dx.doi.org/10.1111/j.1745-3984.2012.00165.x %R 10.1111/j.1745-3984.2012.00165.x %0 Journal Article %J Journal of Educational Measurement %D 2012 %T An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing %A Han, Kyung T. %X

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long-term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a-values while eliminating the need for item pool stratification.

%B Journal of Educational Measurement %V 49 %P 225–246 %U http://dx.doi.org/10.1111/j.1745-3984.2012.00173.x %R 10.1111/j.1745-3984.2012.00173.x %0 Journal Article %J Applied Psychological Measurement %D 2012 %T An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models With Computerized Adaptive Testing %A Yen, Yung-Chin %A Ho, Rong-Guey %A Laio, Wen-Wei %A Chen, Li-Ju %A Kuo, Ching-Chin %X

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses can cause even more serious estimation error because of dynamic item administration. To enhance the robustness of CAT against aberrant responses, Barton and Lord proposed the four-parameter logistic (4PL) item response theory (IRT) model. However, most studies relevant to the 4PL IRT model have been conducted with simulation experiments. This study investigates the performance of the 4PL IRT model as a slip-correction mechanism in an empirical experiment. The results showed that the 4PL IRT model could not only reduce the problematic underestimation of examinees’ ability introduced by careless mistakes in practical situations but also improve measurement efficiency.
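A minimal sketch of the 4PL response function (parameter values invented for illustration): the upper asymptote d < 1 leaves a floor of 1 − d on the probability of an incorrect response, so a single careless error on an easy item cannot drive the likelihood of a high theta toward zero the way it can under a model with d = 1:

```python
import numpy as np

def p_4pl(theta, a, b, c, d):
    """Four-parameter logistic model: c is the lower (guessing) asymptote,
    d < 1 the upper asymptote that absorbs careless errors ("slips")."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))

# Able examinee (theta = 2) missing an easy item (b = -1): with d = 1 the
# miss is nearly impossible; with d = 0.97 it remains plausible, so the
# ability estimate is not dragged down as severely.
p_no_slip = p_4pl(2.0, 1.5, -1.0, 0.2, 1.00)
p_slip = p_4pl(2.0, 1.5, -1.0, 0.2, 0.97)
```

In a CAT likelihood, the incorrect-response term 1 − p enters multiplicatively, so the floor of 1 − d bounds how much any single careless miss can shift the theta estimate.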

%B Applied Psychological Measurement %V 36 %P 75-87 %U http://apm.sagepub.com/content/36/2/75.abstract %R 10.1177/0146621611432862 %0 Journal Article %J Practical Assessment, Research & Evaluation %D 2012 %T Item Overexposure in Computerized Classification Tests Using Sequential Item Selection %A Huebner, A. %X

Computerized classification tests (CCTs) often use sequential item selection, which administers items so as to maximize psychometric information at a cut point demarcating passing and failing scores. This paper illustrates why this method of item selection leads to the overexposure of a significant number of items, and the performance of three different methods for controlling maximum item exposure rates in CCTs is compared. Specifically, the Sympson-Hetter, restricted, and item eligibility methods are examined in two studies realistically simulating different types of CCTs and are evaluated on criteria including classification accuracy, the number of items exceeding the desired maximum exposure rate, and test overlap. The pros and cons of each method are discussed from a practical perspective.

%B Practical Assessment, Research & Evaluation %V 17 %G English %N 12 %0 Journal Article %J Applied Measurement in Education %D 2012 %T Item Selection and Ability Estimation Procedures for a Mixed-Format Adaptive Test %A Ho, Tsung-Han %A Dodd, Barbara G. %B Applied Measurement in Education %V 25 %P 305-326 %U http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.714686 %R 10.1080/08957347.2012.714686 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T A Mixture Rasch Model–Based Computerized Adaptive Test for Latent Class Identification %A Hong Jiao, %A Macready, George %A Liu, Junhui %A Cho, Youngmi %X

This study explored a computerized adaptive test delivery algorithm for latent class identification based on the mixture Rasch model. Four item selection methods based on Kullback–Leibler (KL) information were proposed and compared with the reversed and the adaptive KL information under simulated testing conditions. When item separation was large, the item selection methods did not differ appreciably in the accuracy of classifying examinees into latent classes or of estimating latent ability. However, when item separation was small, the two methods with class-specific ability estimates performed better than the two methods based on a single latent ability estimate across all latent classes. The three types of KL information distributions were compared. The KL and the reversed KL information could be the same or different depending on the ability level and the item difficulty difference between latent classes. Although the KL and the reversed KL information differed at some ability levels and item difficulty difference levels, the use of the KL, the reversed KL, or the adaptive KL information did not affect the results substantially, owing to the symmetric distribution of item difficulty differences between latent classes in the simulated item pools. Item pool usage and classification convergence points were examined as well.

%B Applied Psychological Measurement %V 36 %P 469-493 %U http://apm.sagepub.com/content/36/6/469.abstract %R 10.1177/0146621612450068 %0 Journal Article %J Educational and Psychological Measurement %D 2012 %T On the Reliability and Validity of a Numerical Reasoning Speed Dimension Derived From Response Times Collected in Computerized Testing %A Davison, Mark L. %A Semmes, Robert %A Huang, Lan %A Close, Catherine N. %X

Data from 181 college students were used to assess whether math reasoning item response times in computerized testing can provide valid and reliable measures of a speed dimension. The alternate forms reliability of the speed dimension was .85. A two-dimensional structural equation model suggests that the speed dimension is related to the accuracy of speeded responses. Speed factor scores were significantly correlated with performance on the ACT math scale. Results suggest that the speed dimension underlying response times can be reliably measured and that the dimension is related to the accuracy of performance under the pressure of time limits.

%B Educational and Psychological Measurement %V 72 %P 245-263 %U http://epm.sagepub.com/content/72/2/245.abstract %R 10.1177/0013164411408412 %0 Journal Article %J Applied Psychological Measurement %D 2012 %T A Stochastic Method for Balancing Item Exposure Rates in Computerized Classification Tests %A Huebner, Alan %A Li, Zhushan %X

Computerized classification tests (CCTs) classify examinees into categories such as pass/fail, master/nonmaster, and so on. This article proposes the use of stochastic methods from sequential analysis to address item overexposure, a practical concern in operational CCTs. Item overexposure is traditionally dealt with in CCTs by the Sympson-Hetter (SH) method, but this method is unable to restrict the exposure of the most informative items to the desired level. The authors’ new method of stochastic item exposure balance (SIEB) works in conjunction with the SH method and is shown to greatly reduce the number of overexposed items in a pool and improve overall exposure balance while maintaining classification accuracy comparable with using the SH method alone. The method is demonstrated using a simulation study.

%B Applied Psychological Measurement %V 36 %P 181-188 %U http://apm.sagepub.com/content/36/3/181.abstract %R 10.1177/0146621612439932 %0 Journal Article %J Educational and Psychological Measurement %D 2011 %T Computerized Classification Testing Under the One-Parameter Logistic Response Model With Ability-Based Guessing %A Wang, Wen-Chung %A Huang, Sheng-Yun %X

The one-parameter logistic model with ability-based guessing (1PL-AG) has been recently developed to account for effect of ability on guessing behavior in multiple-choice items. In this study, the authors developed algorithms for computerized classification testing under the 1PL-AG and conducted a series of simulations to evaluate their performances. Four item selection methods (the Fisher information, the Fisher information with a posterior distribution, the progressive method, and the adjusted progressive method) and two termination criteria (the ability confidence interval [ACI] method and the sequential probability ratio test [SPRT]) were developed. In addition, the Sympson–Hetter online method with freeze (SHOF) was implemented for item exposure control. Major results include the following: (a) when no item exposure control was made, all the four item selection methods yielded very similar correct classification rates, but the Fisher information method had the worst item bank usage and the highest item exposure rate; (b) SHOF can successfully maintain the item exposure rate at a prespecified level, without compromising substantial accuracy and efficiency in classification; (c) once SHOF was implemented, all the four methods performed almost identically; (d) ACI appeared to be slightly more efficient than SPRT; and (e) in general, a higher weight of ability in guessing led to a slightly higher accuracy and efficiency, and a lower forced classification rate.

%B Educational and Psychological Measurement %V 71 %P 925-941 %U http://epm.sagepub.com/content/71/6/925.abstract %R 10.1177/0013164410392372 %0 Journal Article %J Physical & Occupational Therapy in Pediatrics %D 2011 %T Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy %A Montpetit, K. %A Haley, S. %A Bilodeau, N. %A Ni, P. %A Tian, F. %A Gorton, G., 3rd %A Mulcahey, M. J. %X This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: Pediatric Outcomes Data Collection Instrument and Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has wider range with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function. %B Physical & Occupational Therapy in Pediatrics %7 2010/10/15 %V 31 %P 90-102 %@ 1541-3144 (Electronic)0194-2638 (Linking) %G eng %M 20942642 %! Phys Occup Ther Pediatr %0 Generic %D 2011 %T Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients %A Giesinger, J. M. %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Arraras, J. I. %A Conroy, T. %A Gamper, E. M. %A Kemmler, G. %A King, M. T. %A Oberguggenberger, A. S. %A Velikova, G. %A Young, T. %A Holzner, B. %A Eortc-Qlg, E. O. 
%X INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. %B Health and Quality of Life Outcomes %7 2011/03/31 %V 9 %P 10 %8 March 29, 2011 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G Eng %M 21447160 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Impact of Item Drift on Candidate Ability Estimation %A Sarah Hagge %A Ada Woo %A Phil Dickison %K item drift %X

For large operational pools, candidate ability estimates appear robust to item drift, especially under conditions that may represent ‘normal’ amounts of drift. Even with ‘extreme’ conditions of drift (e.g., 20% of items drifting 1.00 logits), decision consistency was still high.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2011 %T Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing %A Wang, Chun %A Chang, Hua-Hua %A Huebner, Alan %X

This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson-Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

%B Journal of Educational Measurement %V 48 %P 255–273 %U http://dx.doi.org/10.1111/j.1745-3984.2011.00145.x %R 10.1111/j.1745-3984.2011.00145.x %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Assembling an Inventory of Multistage Adaptive Testing Systems %A Breithaupt, K %A Ariel, A. %A Hare, D. R. %B Elements of Adaptive Testing %P 247-266 %G eng %& 13 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Educational Technology & Society %D 2010 %T Development and evaluation of a confidence-weighting computerized adaptive testing %A Yen, Y. C. %A Ho, R. G. %A Chen, L. J. %A Chou, K. Y. %A Chen, Y. L. %B Educational Technology & Society %V 13(3) %P 163–176 %G eng %0 Journal Article %J Quality of Life Research %D 2010 %T Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Chie, W. C. %A Conroy, T. %A Costantini, A. %A Fayers, P. %A Helbostad, J. %A Holzner, B. %A Kaasa, S. %A Singer, S. %A Velikova, G. %A Young, T. %X PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. 
RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. %B Quality of Life Research %7 2010/10/26 %V 20 %P 479-490 %@ 1573-2649 (Electronic)0962-9343 (Linking) %G Eng %M 20972628 %0 Journal Article %J Quality of Life Research %D 2010 %T Efficiency of static and computer adaptive short forms compared to full-length measures of depressive symptoms %A Choi, S. %A Reise, S. P. %A Pilkonis, P. A. %A Hays, R. D. %A Cella, D. %B Quality of Life Research %V 19(1) %P 125–136 %G eng %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Innovative Items for Computerized Testing %A Parshall, C. G. %A Harmes, J. C. %A Davey, T. %A Pashley, P. J. %B Elements of Adaptive Testing %P 215-230 %G eng %& 11 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T A Japanese Adaptive Test of English as a Foreign Language: Developmental and Operational Aspects %A Nogami, Y. %A Hayashi, N. %B Elements of Adaptive Testing %P 191-211 %G eng %& 10 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Multistage Testing: Issues, Designs, and Research %A Zenisky, A. L. %A Hambleton, R. K. %A Luecht, RM %B Elements of Adaptive Testing %P 355-372 %G eng %& 18 %R 10.1007/978-0-387-85461-8 %0 Generic %D 2010 %T SimulCAT: Windows application that simulates computerized adaptive test administration %A Han, K. T. 
%G eng %U http://www.hantest.net/simulcat %0 Journal Article %J Journal of Applied Measurement %D 2010 %T The use of PROMIS and assessment center to deliver patient-reported outcome measures in clinical research %A Gershon, R. C. %A Rothrock, N. %A Hanrahan, R. %A Bass, M. %A Cella, D. %X The Patient-Reported Outcomes Measurement Information System (PROMIS) was developed as one of the first projects funded by the NIH Roadmap for Medical Research Initiative to re-engineer the clinical research enterprise. The primary goal of PROMIS is to build item banks and short forms that measure key health outcome domains that are manifested in a variety of chronic diseases which could be used as a "common currency" across research projects. To date, item banks, short forms and computerized adaptive tests (CAT) have been developed for 13 domains with relevance to pediatric and adult subjects. To enable easy delivery of these new instruments, PROMIS built a web-based resource (Assessment Center) for administering CATs and other self-report data, tracking item and instrument development, monitoring accrual, managing data, and storing statistical analysis results. Assessment Center can also be used to deliver custom researcher developed content, and has numerous features that support both simple and complicated accrual designs (branching, multiple arms, multiple time points, etc.). This paper provides an overview of the development of the PROMIS item banks and details Assessment Center functionality. %B Journal of Applied Measurement %V 11 %P 304-314 %@ 1529-7713 %G eng %0 Journal Article %J Computers and Education %D 2009 %T An adaptive testing system for supporting versatile educational assessment %A Huang, Y-M. %A Lin, Y-T. %A Cheng, S-C. 
%K Architectures for educational technology system %K Distance education and telelearning %X With the rapid growth of computer and mobile technology, it is a challenge to integrate computer-based testing (CBT) with mobile learning (m-learning), especially for formative assessment and self-assessment. In terms of self-assessment, computer adaptive testing (CAT) is a proper way to enable students to evaluate themselves. In CAT, students are assessed through a process that uses item response theory (IRT), a well-founded psychometric theory. Furthermore, a large item bank is indispensable to a test, but when a CAT system has a large item bank, the test item selection of IRT becomes more tedious. Besides the large item bank, an item exposure mechanism is also essential to a testing system. However, IRT alone does not address these issues, and these reasons motivated the authors to carry out this study. This paper describes a design issue aimed at the development and implementation of an adaptive testing system. The system can support several assessment functions and different devices. Moreover, the researchers apply a novel approach, particle swarm optimization (PSO), to alleviate the computational complexity and resolve the problem of item exposure. Throughout the development of the system, a formative evaluation was embedded into an integral part of the design methodology that was used for improving the system. After the system was formally released onto the web, questionnaires and experiments were conducted to evaluate the usability, precision, and efficiency of the system. The results of these evaluations indicated that the system provides adaptive testing for different devices and supports versatile assessment functions. Moreover, the system can estimate students' ability reliably and validly and conduct an adaptive test efficiently. Furthermore, the computational complexity of the system was alleviated by the PSO approach. 
With this approach, the test item selection procedure becomes efficient, and the average best fitness values are very close to the optimal solutions. %B Computers and Education %V 52 %P 53-67 %@ 0360-1315 %G eng %0 Book Section %D 2009 %T Criterion-related validity of an innovative CAT-based personality measure %A Schneider, R. J. %A McLellan, R. A. %A Kantrowitz, T. M. %A Houston, J. S. %A Borman, W. C. %X This paper describes development and initial criterion-related validation of the PreVisor Computer Adaptive Personality Scales (PCAPS), a computerized adaptive testing-based personality measure that uses an ideal point IRT model based on forced-choice, paired-comparison responses. Based on results from a large consortium study, a composite of six PCAPS scales identified as relevant to the population of interest (first-line supervisors) had an estimated operational validity against an overall job performance criterion of ρ = .25. Uncorrected and corrected criterion-related validity results for each of the six PCAPS scales making up the composite are also reported. Because the PCAPS algorithm computes intermediate scale scores until a stopping rule is triggered, we were able to graph number of statement-pairs presented against criterion-related validities. Results showed generally monotonically increasing functions. However, asymptotic validity levels, or at least a reduction in the rate of increase in slope, were often reached after 5-7 statement-pairs were presented. In the case of the composite measure, there was some evidence that validities decreased after about six statement-pairs. A possible explanation for this is provided. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Diagnostica %D 2009 %T Effekte des adaptiven Testens auf die Motivation zur Testbearbeitung [Effects of adaptive testing on test taking motivation]. %A Frey, A. %A Hartig, J. %A Moosbrugger, H. 
%B Diagnostica %V 55 %P 20-28 %G German %0 Book Section %D 2009 %T Features of J-CAT (Japanese Computerized Adaptive Test) %A Imai, S. %A Ito, S. %A Nakamura, Y. %A Kikuchi, K. %A Akagi, Y. %A Nakasono, H. %A Honda, A. %A Hiramura, T. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T A gradual maximum information ratio approach to item selection in computerized adaptive testing %A Han, K. T. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Report %D 2009 %T Gradual maximum information ratio approach to item selection in computerized adaptive testing %A Han, K. T. %B GMAC Research Reports %I Graduate Management Admissions Council %C McLean, VA. USA %G eng %0 Journal Article %J Quality of Life Research %D 2009 %T Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing %A Haley, S. M. %A Ni, P. %A Dumas, H. M. %A Fragala-Pinkham, M. A. %A Hambleton, R. K. %A Montpetit, K. %A Bilodeau, N. %A Gorton, G. E. %A Watson, K. %A Tucker, C. A. %K *Computer Simulation %K *Health Status %K *Models, Statistical %K Adaptation, Psychological %K Adolescent %K Cerebral Palsy/*physiopathology %K Child %K Child, Preschool %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Massachusetts %K Pennsylvania %K Questionnaires %K Young Adult %X PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. 
We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. %B Quality of Life Research %7 2009/02/18 %V 18 %P 359-370 %8 Apr %@ 0962-9343 (Print)0962-9343 (Linking) %G eng %M 19221892 %2 2692519 %0 Book Section %D 2009 %T Practical issues concerning the application of the DINA model to CAT data %A Huebner, A. %A Wang, B. %A Lee, S. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Quality of Life Research %D 2009 %T Replenishing a computerized adaptive test of patient-reported daily activity functioning %A Haley, S. M. %A Ni, P. %A Jette, A. M. %A Tao, W. %A Moed, R. %A Meyers, D. %A Ludlow, L. H. 
%K *Activities of Daily Living %K *Disability Evaluation %K *Questionnaires %K *User-Computer Interface %K Adult %K Aged %K Cohort Studies %K Computer-Assisted Instruction %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %X PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. %B Quality of Life Research %7 2009/03/17 %V 18 %P 461-71 %8 May %@ 0962-9343 (Print)0962-9343 (Linking) %G eng %M 19288222 %0 Journal Article %J American Journal of Physical Medicine and Rehabilitation %D 2008 %T Adaptive short forms for outpatient rehabilitation outcome assessment %A Jette, A. M. %A Haley, S. M. %A Ni, P. %A Moed, R. 
%K *Activities of Daily Living %K *Ambulatory Care Facilities %K *Mobility Limitation %K *Treatment Outcome %K Disabled Persons/psychology/*rehabilitation %K Female %K Humans %K Male %K Middle Aged %K Questionnaires %K Rehabilitation Centers %X OBJECTIVE: To develop outpatient Adaptive Short Forms for the Activity Measure for Post-Acute Care item bank for use in outpatient therapy settings. DESIGN: A convenience sample of 11,809 adults with spine, lower limb, upper limb, and miscellaneous orthopedic impairments who received outpatient rehabilitation in 1 of 127 outpatient rehabilitation clinics in the United States. We identified optimal items for use in developing outpatient Adaptive Short Forms based on the Basic Mobility and Daily Activities domains of the Activity Measure for Post-Acute Care item bank. Patient scores were derived from the Activity Measure for Post-Acute Care computerized adaptive testing program. Items were selected for inclusion on the Adaptive Short Forms based on functional content, range of item coverage, measurement precision, item exposure rate, and data collection burden. RESULTS: Two outpatient Adaptive Short Forms were developed: (1) an 18-item Basic Mobility Adaptive Short Form and (2) a 15-item Daily Activities Adaptive Short Form, derived from the same item bank used to develop the Activity Measure for Post-Acute Care computerized adaptive testing program. Both Adaptive Short Forms achieved acceptable psychometric properties. CONCLUSIONS: In outpatient postacute care settings where computerized adaptive testing outcome applications are currently not feasible, item response theory-derived Adaptive Short Forms provide the efficient capability to monitor patients' functional outcomes. 
The development of Adaptive Short Form functional outcome instruments linked by a common, calibrated item bank has the potential to create a bridge to outcome monitoring across postacute care settings and can make the eventual transition from Adaptive Short Forms to computerized adaptive testing applications easier and more acceptable to the rehabilitation community. %B American Journal of Physical Medicine and Rehabilitation %7 2008/09/23 %V 87 %P 842-52 %8 Oct %@ 1537-7385 (Electronic) %G eng %M 18806511 %0 Journal Article %J Psychiatric Services %D 2008 %T Are we ready for computerized adaptive testing? %A Unick, G. J. %A Shumway, M. %A Hargreaves, W. %K *Attitude of Health Personnel %K *Diagnosis, Computer-Assisted/instrumentation %K Humans %K Mental Disorders/*diagnosis %K Software %B Psychiatric Services %7 2008/04/02 %V 59 %P 369 %8 Apr %@ 1075-2730 (Print)1075-2730 (Linking) %G eng %M 18378833 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2008 %T Assessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory %A Coster, W. J. %A Haley, S. M. %A Ni, P. %A Dumas, H. M. %A Fragala-Pinkham, M. A. %K *Disability Evaluation %K *Social Adjustment %K Activities of Daily Living %K Adolescent %K Age Factors %K Child %K Child, Preschool %K Computer Simulation %K Cross-Over Studies %K Disabled Children/*rehabilitation %K Female %K Follow-Up Studies %K Humans %K Infant %K Male %K Outcome Assessment (Health Care) %K Reference Values %K Reproducibility of Results %K Retrospective Studies %K Risk Factors %K Self Care/*standards/trends %K Sex Factors %K Sickness Impact Profile %X OBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. 
DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time. 
%B Archives of Physical Medicine and Rehabilitation %7 2008/04/01 %V 89 %P 622-629 %8 Apr %@ 1532-821X (Electronic)0003-9993 (Linking) %G eng %M 18373991 %2 2666276 %0 Journal Article %J Educational and Psychological Measurement %D 2008 %T Comparability of Computer-Based and Paper-and-Pencil Testing in K–12 Reading Assessments %A Shudong Wang, %A Hong Jiao, %A Young, Michael J. %A Brooks, Thomas %A Olson, John %X

In recent years, computer-based testing (CBT) has grown in popularity, is increasingly being implemented across the United States, and will likely become the primary mode for delivering tests in the future. Although CBT offers many advantages over traditional paper-and-pencil testing, assessment experts, researchers, practitioners, and users have expressed concern about the comparability of scores between the two test administration modes. To help provide an answer to this issue, a meta-analysis was conducted to synthesize the administration mode effects of CBTs and paper-and-pencil tests on K–12 student reading assessments. Findings indicate that the administration mode had no statistically significant effect on K–12 student reading achievement scores. Four moderator variables (study design, sample size, computer delivery algorithm, and computer practice) made statistically significant contributions to predicting effect size. Three moderator variables (grade level, type of test, and computer delivery method) did not affect the differences in reading scores between test modes.

%B Educational and Psychological Measurement %V 68 %P 5-24 %U http://epm.sagepub.com/content/68/1/5.abstract %R 10.1177/0013164407305592 %0 Journal Article %J Educational and Psychological Measurement %D 2008 %T Computer-Based and Paper-and-Pencil Administration Mode Effects on a Statewide End-of-Course English Test %A Kim, Do-Hong %A Huynh, Huynh %X

The current study compared student performance between paper-and-pencil testing (PPT) and computer-based testing (CBT) on a large-scale statewide end-of-course English examination. Analyses were conducted at both the item and test levels. The overall results suggest that scores obtained from PPT and CBT were comparable. However, at the content domain level, a rather large difference in the reading comprehension section suggests that reading comprehension tests may be more affected by the test administration mode. Results from the confirmatory factor analysis suggest that the administration mode did not alter the construct of the test.

%B Educational and Psychological Measurement %V 68 %P 554-570 %U http://epm.sagepub.com/content/68/4/554.abstract %R 10.1177/0013164407310132 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2008 %T Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes %A Haley, S. M. %A Gandek, B. %A Siebens, H. %A Black-Schaffer, R. M. %A Sinclair, S. J. %A Tao, W. %A Coster, W. J. %A Ni, P. %A Jette, A. M. %K *Activities of Daily Living %K *Adaptation, Physiological %K *Computer Systems %K *Questionnaires %K Adult %K Aged %K Aged, 80 and over %K Chi-Square Distribution %K Factor Analysis, Statistical %K Female %K Humans %K Longitudinal Studies %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Patient Discharge %K Prospective Studies %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. 
Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. %B Archives of Physical Medicine and Rehabilitation %7 2008/01/30 %V 89 %P 275-283 %8 Feb %@ 1532-821X (Electronic)0003-9993 (Linking) %G eng %M 18226651 %2 2666330 %0 Journal Article %J Journal of Clinical Epidemiology %D 2008 %T Computerized adaptive testing for patients with knee impairments produced valid and responsive measures of function %A Hart, D. L. %A Wang, Y-C. %A Stratford, P. W. %A Mioduski, J. E. %B Journal of Clinical Epidemiology %V 61 %P 1113-1124 %G eng %0 Journal Article %J Zeitschrift für Psychologie / Journal of Psychology %D 2008 %T Computerized Adaptive Testing of Personality Traits %A Hol, A. M. %A Vorst, H. C. M. %A Mellenbergh, G. J. %K Adaptive Testing %K computer-assisted testing %K Item Response Theory %K Likert scales %K Personality Measures %X

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction of the number of used items could be realized (at the stopping rule of 0.4, about 33% of the 36 items were used).

%B Zeitschrift für Psychologie / Journal of Psychology %V 216 %P 12-21 %N 1 %R 10.1027/0044-3409.216.1.12 %0 Conference Paper %B Joint Meeting on Adolescent Treatment Effectiveness %D 2008 %T Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing %A Dennis, M. L. %A Funk, R. %A Titus, J. %A Riley, B. B. %A Hosman, S. %A Kinne, S. %B Joint Meeting on Adolescent Treatment Effectiveness %C Washington D.C., USA %8 2008 %G eng %( 2008 %) ADDED 1 Aug 2008 %F 205795 %0 Journal Article %J Disability & Rehabilitation %D 2008 %T Efficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning %A Allen, D. D. %A Ni, P. %A Haley, S. M. %K *Disability Evaluation %K Child %K Computers %K Disabled Children/*classification/rehabilitation %K Efficiency %K Humans %K Outcome Assessment (Health Care) %K Psychometrics %K Reproducibility of Results %K Retrospective Studies %K Self Care %K Sensitivity and Specificity %X PURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. 
RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden. %B Disability & Rehabilitation %7 2008/02/26 %V 30 %P 479-84 %@ 0963-8288 (Print)0963-8288 (Linking) %G eng %M 18297502 %0 Journal Article %J Zeitschrift für Psychologie %D 2008 %T ICAT: An adaptive testing procedure for the identification of idiosyncratic knowledge patterns %A Kingsbury, G. G. %A Houser, R.L. %K computerized adaptive testing %X

Traditional adaptive tests provide an efficient method for estimating student achievement levels by adjusting the characteristics of the test questions to match the performance of each student. These traditional adaptive tests are not designed to identify idiosyncratic knowledge patterns. As students move through their education, they learn content in any number of different ways related to their learning style and cognitive development. This may result in a student having different achievement levels from one content area to another within a domain of content. This study investigates whether such idiosyncratic knowledge patterns exist. It discusses the differences between idiosyncratic knowledge patterns and multidimensionality. Finally, it proposes an adaptive testing procedure that can be used to identify a student’s areas of strength and weakness more efficiently than current adaptive testing approaches. The findings of the study indicate that a fairly large number of students may have test results that are influenced by their idiosyncratic knowledge patterns. The findings suggest that these patterns persist across time for a large number of students, and that the differences in student performance between content areas within a subject domain are large enough to allow them to be useful in instruction. Given the existence of idiosyncratic patterns of knowledge, the proposed testing procedure may enable us to provide more useful information to teachers. It should also allow us to differentiate between idiosyncratic patterns of knowledge and important multidimensionality in the testing data.

%B Zeitschrift für Psychologie %V 216 %P 40-48 %G eng %0 Journal Article %J Journal of Pediatric Orthopedics %D 2008 %T Measuring physical functioning in children with spinal impairments with computerized adaptive testing %A Mulcahey, M. J. %A Haley, S. M. %A Duffy, T. %A Pengsheng, N. %A Betz, R. R. %K *Disability Evaluation %K Adolescent %K Child %K Child, Preschool %K Computer Simulation %K Cross-Sectional Studies %K Disabled Children/*rehabilitation %K Female %K Humans %K Infant %K Kyphosis/*diagnosis/rehabilitation %K Male %K Prospective Studies %K Reproducibility of Results %K Scoliosis/*diagnosis/rehabilitation %X BACKGROUND: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. Computerized adaptive testing uses a computer interface to administer the most optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. METHODS: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progressive spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. RESULTS: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales. Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. 
The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. CONCLUSIONS: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. LEVEL OF EVIDENCE: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice. %B Journal of Pediatric Orthopedics %7 2008/03/26 %V 28 %P 330-5 %8 Apr-May %@ 0271-6798 (Print)0271-6798 (Linking) %G eng %M 18362799 %2 2696932 %0 Generic %D 2008 %T Preparing the implementation of computerized adaptive testing for high-stakes examinations %A Huh, S. %B Journal of Educational Evaluation for Health Professions %7 2009/02/19 %V 5 %P 1 %@ 1975-5937 (Electronic) %G eng %M 19223998 %2 2631196 %0 Journal Article %J Zeitschrift für Psychologie \ Journal of Psychology %D 2008 %T Transitioning from fixed-length questionnaires to computer-adaptive versions %A Walter, O. B. %A Holling, H. %B Zeitschrift für Psychologie \ Journal of Psychology %V 216(1) %P 22–28 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2007 %T Automated Simultaneous Assembly of Multistage Testlets for a High-Stakes Licensing Examination %A Breithaupt, Krista %A Hare, Donovan R. %X

Many challenges exist for high-stakes testing programs offering continuous computerized administration. The automated assembly of test questions to exactly meet content and other requirements, provide uniformity, and control item exposure can be modeled and solved by mixed-integer programming (MIP) methods. A case study of the computerized licensing examination of the American Institute of Certified Public Accountants is offered as one application of MIP techniques for test assembly. The solution illustrates assembly for a computer-adaptive multistage testing design. However, the general form of the constraint-based solution can be modified to generate optimal test designs for paper-based or computerized administrations, regardless of the specific psychometric model. An extension of this methodology allows for long-term planning for the production and use of test content on the basis of exact psychometric test designs and administration schedules.

%B Educational and Psychological Measurement %V 67 %P 5-20 %U http://epm.sagepub.com/content/67/1/5.abstract %R 10.1177/0013164406288162 %0 Journal Article %J Applied Psychological Measurement %D 2007 %T Computerized Adaptive Testing for Polytomous Motivation Items: Administration Mode Effects and a Comparison With Short Forms %A Hol, A. Michiel %A Vorst, Harrie C. M. %A Mellenbergh, Gideon J. %X

In a randomized experiment (n = 515), a computerized test and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed-length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that the CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicates that the real data results are not affected by model misfit.

%B Applied Psychological Measurement %V 31 %P 412-429 %U http://apm.sagepub.com/content/31/5/412.abstract %R 10.1177/0146621606297314 %0 Journal Article %J Educational and Psychological Measurement %D 2007 %T Computerizing Organizational Attitude Surveys %A Mueller, Karsten %A Liebig, Christian %A Hattrup, Keith %X

Two quasi-experimental field studies were conducted to evaluate the psychometric equivalence of computerized and paper-and-pencil job satisfaction measures. The present research extends previous work in the area by providing better control of common threats to validity in quasi-experimental research on test mode effects and by evaluating a more comprehensive measurement model for job attitudes. Results of both studies demonstrated substantial equivalence of the computerized measure with the paper-and-pencil version. Implications for the practical use of computerized organizational attitude surveys are discussed.

%B Educational and Psychological Measurement %V 67 %P 658-678 %U http://epm.sagepub.com/content/67/4/658.abstract %R 10.1177/0013164406292084 %0 Book Section %D 2007 %T Designing templates based on a taxonomy of innovative items %A Parshall, C. G. %A Harmes, J. C. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Acta Psychologica Sinica %D 2007 %T An exploration and realization of computerized adaptive testing with cognitive diagnosis %A Haijing, L. %A Shuliang, D. %X Increased attention to “cognitive bug behavior” appears to have led to increased research interest in diagnostic testing based on Item Response Theory (IRT) that combines cognitive psychology and psychometrics. The study of cognitive diagnosis has been applied mainly to paper-and-pencil (P&P) testing; rarely has it been applied to computerized adaptive testing (CAT). To our knowledge, no research on CAT with cognitive diagnosis has been conducted in China. Since CAT is more efficient and accurate than P&P testing, it is important to develop an application technique for cognitive diagnosis suitable for CAT. This study attempts to construct a preliminary CAT system for cognitive diagnosis. With the help of the method of “diagnosis first, ability estimation second,” a knowledge state conversion diagram was used to describe all the possible knowledge states in a domain of interest and the relations among the knowledge states at the diagnosis stage, where a new item selection strategy based on the depth-first search algorithm was proposed. In addition, items that contain attributes the examinee has not mastered were removed in ability estimation. 
At the stage of accurate ability estimation, all the items answered by each examinee not only matched his/her estimated ability value but also were limited to those items whose attributes had been mastered by the examinee. We used Monte Carlo simulation to generate all the data for the three different structures of cognitive attributes in this study. These structures were tree-shaped, forest-shaped, and isolated vertices (related to a simple Q-matrix). Both the tree-shaped and isolated-vertices structures were derived from actual cases, while the forest-shaped structure was a generalized simulation. 3,000 examinees and 3,000 items were simulated in the tree-shaped experiment, 2,550 examinees and 3,100 items in the forest-shaped experiment, and 2,000 examinees and 2,500 items in the isolated-vertices experiment. The maximum test length was set at 30 items for all experiments. The difficulty parameters and the logarithm of the discrimination were drawn from the standard normal distribution N(0,1). There were 100 examinees of each attribute pattern in the tree-shaped experiment and 50 examinees of each attribute pattern in the forest-shaped experiment. In the isolated-vertices experiment, the 2,000 examinees were students from an actual case. To assess the behavior of the proposed diagnostic approach, three assessment indices were used: the attribute pattern classification agreement rate (abbr. APCAR), the Recovery (the average of the absolute deviation between the estimated value and the true value), and the average test length (abbr. Length). Part of the results of the Monte Carlo study were as follows. For the tree-shaped attribute structure, APCAR was 84.27%, Recovery was 0.17, and Length was 24.80. For the forest-shaped attribute structure, APCAR was 84.02%, Recovery was 0.172, and Length was 23.47. For the isolated-vertices attribute structure, APCAR was 99.16%, Recovery was 0.256, and Length was 27.32. As shown above, we can conclude that the results are favorable. 
The rate of cognitive diagnosis accuracy exceeded 80% in each experiment, and the Recovery was also good. Therefore, it should be an acceptable idea to construct an initial CAT system for cognitive diagnosis using the method of “diagnosis first, ability estimation second” with the help of both the knowledge state conversion diagram and the new item selection strategy based on the depth-first search algorithm. %B Acta Psychologica Sinica %V 39 %P 747-753 %G eng %0 Book Section %D 2007 %T ICAT: An adaptive testing procedure to allow the identification of idiosyncratic knowledge patterns %A Kingsbury, G. G. %A Houser, R.L. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Quality of Life Research %D 2007 %T IRT health outcomes data analysis project: an overview and summary %A Cook, K. F. %A Teal, C. R. %A Bjorner, J. B. %A Cella, D. %A Chang, C-H. %A Crane, P. K. %A Gibbons, L. E. %A Hays, R. D. %A McHorney, C. A. %A Ocepek-Welikson, K. %A Raczek, A. E. %A Teresi, J. A. %A Reeve, B. B. %K *Data Interpretation, Statistical %K *Health Status %K *Quality of Life %K *Questionnaires %K *Software %K Female %K HIV Infections/psychology %K Humans %K Male %K Neoplasms/psychology %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Stress, Psychological %X BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. 
RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. %B Quality of Life Research %7 2007/03/14 %V 16 %P 121-132 %@ 0962-9343 (Print) %G eng %M 17351824 %0 Journal Article %J Educational Measurement: Issues and Practice %D 2007 %T An NCME instructional module on multistage testing %A Hendrickson, A. %B Educational Measurement: Issues and Practice %V 26(2) %P 44-52 %G eng %0 Journal Article %J Physical Therapy %D 2007 %T Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings %A Jette, A. %A Haley, S. %A Tao, W. %A Ni, P. %A Moed, R. %A Meyers, D. %A Zurek, M. 
%B Physical Therapy %V 87 %P 385-398 %G eng %0 Journal Article %J Medical Care %D 2007 %T Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) %A Reeve, B. B. %A Hays, R. D. %A Bjorner, J. B. %A Cook, K. F. %A Crane, P. K. %A Teresi, J. A. %A Thissen, D. %A Revicki, D. A. %A Weiss, D. J. %A Hambleton, R. K. %A Liu, H. %A Gershon, R. C. %A Reise, S. P. %A Lai, J. S. %A Cella, D. %K *Health Status %K *Information Systems %K *Quality of Life %K *Self Disclosure %K Adolescent %K Adult %K Aged %K Calibration %K Databases as Topic %K Evaluation Studies as Topic %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Questionnaires/standards %K United States %X BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. 
%B Medical Care %7 2007/04/20 %V 45 %P S22-31 %8 May %@ 0025-7079 (Print) %G eng %M 17443115 %0 Journal Article %J European Journal of Psychological Assessment %D 2007 %T Psychometric properties of an emotional adjustment measure: An application of the graded response model %A Rubio, V. J. %A Aguado, D. %A Hontangas, P. M. %A Hernández, J. M. %K computerized adaptive tests %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K personnel recruitment %K Psychometrics %K Samejima's graded response model %K test reliability %K validity %X Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been used mainly for assessing achievement and ability rather than personality factors. This paper presents an application of IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure that consists of 28 six-graded response items are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) the invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) %B European Journal of Psychological Assessment %I Hogrefe & Huber Publishers GmbH: Germany %V 23 %P 39-46 %@ 1015-5759 (Print) %G eng %M 2007-01587-007 %0 Journal Article %J Psychology Science %D 2006 %T Adaptive success control in computerized adaptive testing %A Häusler, Joachim %K adaptive success control %K computerized adaptive testing %K Psychometrics %X In computerized adaptive testing (CAT) procedures within the framework of probabilistic test theory the difficulty of an item is adjusted to the ability of the respondent, with the aim of maximizing the amount of information generated per item, thereby also increasing test economy and test reasonableness. However, earlier research indicates that respondents might feel over-challenged by a constant success probability of p = 0.5 and therefore cannot come to a sufficiently high answer certainty within a reasonable timeframe. Consequently response time per item increases, which -- depending on the test material -- can outweigh the benefit of administering optimally informative items. Instead of a benefit, the result of using CAT procedures could be a loss of test economy. Based on this problem, an adaptive success control algorithm was designed and tested, adapting the success probability to the working style of the respondent. Persons who need higher answer certainty in order to come to a decision are detected and receive a higher success probability, in order to minimize the test duration (not the number of items as in classical CAT). The method is validated on the re-analysis of data from the Adaptive Matrices Test (AMT, Hornke, Etzel & Rettig, 1999) and by the comparison between an AMT version using classical CAT and an experimental version using Adaptive Success Control. The results are discussed in the light of psychometric and psychological aspects of test quality. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Psychology Science %I Pabst Science Publishers: Germany %V 48 %P 436-450 %@ 0033-3018 (Print) %G eng %M 2007-03313-004 %0 Generic %D 2006 %T A CAT with personality and attitude %A Hol, A. M. %C Enschede, The Netherlands: PrintPartners Ipskamp B %G eng %0 Journal Article %J Acta Psychologica Sinica %D 2006 %T The comparison among item selection strategies of CAT with multiple-choice items %A Hai-qi, D. %A De-zhi, C. %A Shuliang, D. %A Taiping, D. %K CAT %K computerized adaptive testing %K graded response model %K item selection strategies %K multiple choice items %X The initial purpose of comparing item selection strategies for CAT was to increase the efficiency of tests. As studies continued, however, it was found that increasing the efficiency of item bank using was also an important goal of comparing item selection strategies. These two goals often conflicted. The key solution was to find a strategy with which both goals could be accomplished. The item selection strategies for graded response model in this study included: the average of the difficulty orders matching with the ability; the medium of the difficulty orders matching with the ability; maximum information; A stratified (average); and A stratified (medium). The evaluation indexes used for comparison included: the bias of ability estimates for the true; the standard error of ability estimates; the average items which the examinees have administered; the standard deviation of the frequency of items selected; and sum of the indices weighted. Using the Monte Carlo simulation method, we obtained some data and computer iterated the data 20 times each under the conditions that the item difficulty parameters followed the normal distribution and even distribution. The results were as follows; The results indicated that no matter difficulty parameters followed the normal distribution or even distribution. 
Each item selection strategy designed in this research had its strong and weak points. In the overall evaluation, provided that the items were stratified appropriately, a stratified (medium) (ASM) had the best effect. %B Acta Psychologica Sinica %I Science Press: China %V 38 %P 778-783 %@ 0439-755X (Print) %G eng %M 2006-20552-017 %0 Journal Article %J Applied Measurement in Education %D 2006 %T Comparison of the Psychometric Properties of Several Computer-Based Test Designs for Credentialing Exams With Multiple Purposes %A Jodoin, Michael G. %A Zenisky, April %A Hambleton, Ronald K. %B Applied Measurement in Education %V 19 %P 203-220 %U http://www.tandfonline.com/doi/abs/10.1207/s15324818ame1903_3 %R 10.1207/s15324818ame1903_3 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Computer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank %A Haley, S. M. %A Ni, P. %A Hambleton, R. K. %A Slavin, M. D. %A Jette, A. M. 
%K *Recovery of Function %K Activities of Daily Living %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Confidence Intervals %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Rehabilitation/*standards %K Reproducibility of Results %K Software %X BACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision. We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings. 
%B Journal of Clinical Epidemiology %7 2006/10/10 %V 59 %P 1174-82 %8 Nov %@ 0895-4356 (Print) %G eng %M 17027428 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2006 %T Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes %A Haley, S. M. %A Siebens, H. %A Coster, W. J. %A Tao, W. %A Black-Schaffer, R. M. %A Gandek, B. %A Sinclair, S. J. %A Ni, P. %K *Activities of Daily Living %K *Adaptation, Physiological %K *Computer Systems %K *Questionnaires %K Adult %K Aged %K Aged, 80 and over %K Chi-Square Distribution %K Factor Analysis, Statistical %K Female %K Humans %K Longitudinal Studies %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Patient Discharge %K Prospective Studies %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). 
RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2006/08/01 %V 87 %P 1033-42 %8 Aug %@ 0003-9993 (Print) %G eng %M 16876547 %0 Journal Article %J Journal of Applied Measurement %D 2006 %T Expansion of a physical function item bank and development of an abbreviated form for clinical research %A Bode, R. K. %A Lai, J-S. %A Dineen, K. %A Heinemann, A. W. %A Shevrin, D. %A Von Roenn, J. %A Cella, D. %K clinical research %K computerized adaptive testing %K performance levels %K physical function item bank %K Psychometrics %K test reliability %K Test Validity %X We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. 
This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. %B Journal of Applied Measurement %I Richard M Smith: US %V 7 %P 1-15 %@ 1529-7713 (Print) %G eng %M 2006-01262-001 %0 Journal Article %J Medical Care %D 2006 %T Item banks and their potential applications to health status assessment in diverse populations %A Hahn, E. A. %A Cella, D. %A Bode, R. K. %A Gershon, R. C. %A Lai, J. S. %X In the context of an ethnically diverse, aging society, attention is increasingly turning to health-related quality of life measurement to evaluate healthcare and treatment options for chronic diseases. When evaluating and treating symptoms and concerns such as fatigue, pain, or physical function, reliable and accurate assessment is a priority. 
Modern psychometric methods have enabled us to move from long, static tests that provide inefficient and often inaccurate assessment of individual patients, to computerized adaptive tests (CATs) that can precisely measure individuals on health domains of interest. These modern methods, collectively referred to as item response theory (IRT), can produce calibrated "item banks" from larger pools of questions. From these banks, CATs can be conducted on individuals to produce their scores on selected domains. Item banks allow for comparison of patients across different question sets because the patient's score is expressed on a common scale. Other advantages of using item banks include flexibility in terms of the degree of precision desired; interval measurement properties under most circumstances; realistic capability for accurate individual assessment over time (using CAT); and measurement equivalence across different patient populations. This work summarizes the process used in the creation and evaluation of item banks and reviews their potential contributions and limitations regarding outcome assessment and patient care, particularly when they are applied across people of different cultural backgrounds. %B Medical Care %V 44 %P S189-S197 %8 Nov %G eng %M 17060827 %0 Journal Article %J Acta Psychologica Sinica %D 2006 %T [Item Selection Strategies of Computerized Adaptive Testing based on Graded Response Model.] %A Ping, Chen %A Shuliang, Ding %A Haijing, Lin %A Jie, Zhou %K computerized adaptive testing %K item selection strategy %X Item selection strategy (ISS) is an important component of Computerized Adaptive Testing (CAT). Its performance directly affects the security, efficiency and precision of the test. Thus, ISS becomes one of the central issues in CATs based on the Graded Response Model (GRM). It is well known that the goal of an ISS is to administer the next unused item remaining in the item bank that best fits the examinee's current ability estimate. 
In dichotomous IRT models, every item has only one difficulty parameter and the item whose difficulty matches the examinee's current ability estimate is considered to be the best fitting item. However, in GRM, each item has more than two ordered categories and has no single value to represent the item difficulty. Consequently, some researchers have employed the average or the median difficulty value across categories as the difficulty estimate for the item. Using the average value and the median value in effect introduced two corresponding ISSs. In this study, we used computer simulation to compare four ISSs based on GRM. We also discussed the effect of a "shadow pool" on the uniformity of pool usage as well as the influence of different item parameter distributions and different ability estimation methods on the evaluation criteria of CAT. In the simulation process, the Monte Carlo method was adopted to simulate the entire CAT process; 1,000 examinees drawn from a standard normal distribution and four 1,000-sized item pools of different item parameter distributions were also simulated. The assumption of the simulation is that a polytomous item comprises six ordered categories. In addition, ability estimates were derived using two methods: expected a posteriori Bayesian (EAP) and maximum likelihood estimation (MLE). In MLE, the Newton-Raphson iteration method and the Fisher Score iteration method were employed, respectively, to solve the likelihood equation. Moreover, the CAT process was simulated 30 times for each examinee to eliminate random error. The ISSs were evaluated by four indices usually used in CAT, covering four aspects: the accuracy of ability estimation, the stability of the ISS, the usage of the item pool, and the test efficiency. Simulation results showed that the ISSs that matched the examinee's current trait estimate with the difficulty values across categories performed adequately. 
Setting a "shadow pool" in the ISS improved the uniformity of pool utilization. Finally, different item parameter distributions and ability estimation methods affected the evaluation indices of the CAT. %B Acta Psychologica Sinica %I Science Press: China %V 38 %P 461-467 %@ 0439-755X (Print) %G eng %M 2006-09336-020 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2006 %T Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory %A Haley, S. M. %A Ni, P. %A Ludlow, L. H. %A Fragala-Pinkham, M. A. %K *Disability Evaluation %K *Pediatrics %K Adolescent %K Child %K Child, Preschool %K Computers %K Disabled Persons/*classification/rehabilitation %K Efficiency %K Humans %K Infant %K Outcome Assessment (Health Care) %K Psychometrics %K Self Care %X OBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application with a unidimensional CAT (U-CAT), using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. 
Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired. %B Archives of Physical Medicine and Rehabilitation %7 2006/08/29 %V 87 %P 1223-9 %8 Sep %@ 0003-9993 (Print) %G eng %M 16935059 %0 Journal Article %J Applied Measurement in Education %D 2006 %T Optimal and nonoptimal computer-based test designs for making pass-fail decisions %A Hambleton, R. K. %A Xing, D. %K adaptive test %K credentialing exams %K Decision Making %K Educational Measurement %K multistage tests %K optimal computer-based test designs %K test form %X Now that many credentialing exams are being routinely administered by computer, new computer-based test designs, along with item response theory models, are being aggressively researched to identify specific designs that can increase the decision consistency and accuracy of pass-fail decisions. The purpose of this study was to investigate the impact of optimal and nonoptimal multistage test (MST) designs, linear parallel-form test designs (LPFT), and computer adaptive test (CAT) designs on the decision consistency and accuracy of pass-fail decisions. Realistic testing situations matching those of one of the large credentialing agencies were simulated to increase the generalizability of the findings. 
The conclusions were clear: (a) With the LPFTs, matching test information functions (TIFs) to the mean of the proficiency distribution produced slightly better results than matching them to the passing score; (b) all of the test designs worked better than test construction using random selection of items, subject to content constraints only; (c) CAT performed better than the other test designs; and (d) if matching a TIF to the passing score, the MST design produced slightly better results than the LPFT design. If an argument for the MST design is to be made, it can be made on the basis of slight improvements over the LPFT design and better expected item bank utilization, candidate preference, and the potential for improved diagnostic feedback, compared with the feedback that is possible with fixed linear test forms. %B Applied Measurement in Education %I Lawrence Erlbaum: US %V 19 %P 221-239 %@ 0895-7347 (Print); 1532-4818 (Electronic) %G eng %M 2006-08493-004 %0 Journal Article %J Clin Rehabil %D 2006 %T Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme %A Haley, S. M. %A Fragala-Pinkham, M. A. %A Ni, P. %B Clin Rehabil %V 20 %P 616-622 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function %A Hart, D. L. %A Mioduski, J. E. %A Werneke, M. W. %A Stratford, P. W. 
%K Back Pain Functional Scale %K computerized adaptive testing %K Item Response Theory %K Lumbar spine %K Rehabilitation %K True-score equating %X Objective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (θIRT) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (θCAT). Methods: We performed a secondary analysis of retrospective intake rehabilitation data. Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine specific CAT was developed that was 72% more efficient than using all 25 items to estimate LFS measures. θIRT and θCAT measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. θCAT measures were as precise as θIRT measures. Conclusion: A body part specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity. %B Journal of Clinical Epidemiology %V 59 %P 947–956 %G eng %R 10.1016/j.jclinepi.2005.10.017 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function %A Hart, D. L. %A Cook, K. F. %A Mioduski, J. E. %A Teal, C. R. %A Crane, P. K. %K *Computer Simulation %K *Range of Motion, Articular %K Activities of Daily Living %K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Prospective Studies %K Reproducibility of Results %K Research Support, N.I.H., Extramural %K Research Support, U.S. Gov't, Non-P.H.S. %K Shoulder Dislocation/*physiopathology/psychology/rehabilitation %K Shoulder Pain/*physiopathology/psychology/rehabilitation %K Shoulder/*physiopathology %K Sickness Impact Profile %K Treatment Outcome %X BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. 
CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability. %B Journal of Clinical Epidemiology %V 59 %P 290-8 %G eng %M 16488360 %0 Book Section %B Outcomes assessment in cancer %D 2005 %T Applications of item response theory to improve health outcomes assessment: Developing item banks, linking instruments, and computer-adaptive testing %A Hambleton, R. K. %E C. C. Gotay %E C. Snyder %K Computer Assisted Testing %K Health %K Item Response Theory %K Measurement %K Test Construction %K Treatment Outcomes %X (From the chapter) The current chapter builds on Reise's introduction to the basic concepts, assumptions, popular models, and important features of IRT and discusses the applications of item response theory (IRT) modeling to health outcomes assessment. In particular, we highlight the critical role of IRT modeling in: developing an instrument to match a study's population; linking two or more instruments measuring similar constructs on a common metric; and creating item banks that provide the foundation for tailored short-form instruments or for computerized adaptive assessments. %B Outcomes assessment in cancer %I Cambridge University Press %C Cambridge, UK %P 445-464 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2005 %T Assessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory %A Haley, S. M. %A Raczek, A. E. %A Coster, W. J. %A Dumas, H. M. %A Fragala-Pinkham, M. A. 
%K *Computer Simulation %K *Disability Evaluation %K Adolescent %K Child %K Child, Preschool %K Cross-Sectional Studies %K Disabled Children/*rehabilitation %K Female %K Humans %K Infant %K Male %K Outcome Assessment (Health Care)/*methods %K Rehabilitation Centers %K Rehabilitation/*standards %K Sensitivity and Specificity %X OBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. 
CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2005/05/17 %V 86 %P 932-9 %8 May %@ 0003-9993 (Print) %G eng %M 15895339 %0 Journal Article %J Developmental Medicine and Child Neurology %D 2005 %T A computer adaptive testing approach for assessing physical functioning in children and adolescents %A Haley, S. M. %A Ni, P. %A Fragala-Pinkham, M. A. %A Skrinar, A. M. %A Corzo, D. %K *Computer Systems %K Activities of Daily Living %K Adolescent %K Age Factors %K Child %K Child Development/*physiology %K Child, Preschool %K Computer Simulation %K Confidence Intervals %K Demography %K Female %K Glycogen Storage Disease Type II/physiopathology %K Health Status Indicators %K Humans %K Infant %K Infant, Newborn %K Male %K Motor Activity/*physiology %K Outcome Assessment (Health Care)/*methods %K Reproducibility of Results %K Self Care %K Sensitivity and Specificity %X The purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. 
We built simulated CATs (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2mo], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that score estimates (based on computer simulations) comparable to the full-length tests can be achieved with a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning. %B Developmental Medicine and Child Neurology %7 2005/02/15 %V 47 %P 113-120 %8 Feb %@ 0012-1622 (Print) %G eng %M 15707234 %0 Journal Article %J Journal of Computer Assisted Learning %D 2005 %T A computer-assisted test design and diagnosis system for use by classroom teachers %A He, Q. %A Tymms, P. %K Computer Assisted Testing %K Computer Software %K Diagnosis %K Educational Measurement %K Teachers %X Computer-assisted assessment (CAA) has become increasingly important in education in recent years. A variety of computer software systems have been developed to help assess the performance of students at various levels. However, such systems are primarily designed to provide objective assessment of students and analysis of test items, and focus has been mainly placed on higher and further education. Although there are commercial professional systems available for use by primary and secondary educational institutions, such systems are generally expensive and require skilled expertise to operate. 
In view of the rapid progress made in the use of computer-based assessment for primary and secondary students by education authorities in the UK and elsewhere, there is a need to develop systems that are economical and easy to use and can provide the necessary information to help teachers improve students' performance. This paper presents the development of a software system that provides a range of functions including generating items and building item banks, designing tests, conducting tests on computers and analysing test results. Specifically, the system can generate information on the performance of students and test items that can be easily used to identify curriculum areas where students are underperforming. A case study based on data collected from five secondary schools in Hong Kong involved in the Curriculum, Evaluation and Management Centre's Middle Years Information System Project, Durham University, UK, has been undertaken to demonstrate the use of the system for diagnostic and performance analysis. %B Journal of Computer Assisted Learning %V 21 %P 419-429 %G eng %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2005 %T Computerized adaptive testing: a mixture item selection approach for constrained situations %A Leung, C. K. %A Chang, Hua-Hua %A Hau, K. T. %K *Computer-Aided Design %K *Educational Measurement/methods %K *Models, Psychological %K Humans %K Psychometrics/methods %X In computerized adaptive testing (CAT), traditionally the most discriminating items are selected to provide the maximum information so as to attain the highest efficiency in trait (theta) estimation. The maximum information (MI) approach typically results in unbalanced item exposure and hence high item-overlap rates across examinees. Recently, Yi and Chang (2003) proposed the multiple stratification (MS) method to remedy the shortcomings of MI. 
In MS, items are first sorted according to content, then difficulty and finally discrimination parameters. As discriminating items are used strategically, MS offers a better utilization of the entire item pool. However, for testing with imposed non-statistical constraints, this new stratification approach may not maintain its high efficiency. Through a series of simulation studies, this research explored the possible benefits of a mixture item selection approach (MS-MI), integrating the MS and MI approaches, in testing with non-statistical constraints. In all simulation conditions, MS consistently outperformed the other two competing approaches in item pool utilization, while the MS-MI and the MI approaches yielded higher measurement efficiency and offered better conformity to the constraints. Furthermore, the MS-MI approach was shown to perform better than MI on all evaluation criteria when control of item exposure was imposed. %B British Journal of Mathematical and Statistical Psychology %7 2005/11/19 %V 58 %P 239-57 %8 Nov %@ 0007-1102 (Print) 0007-1102 (Linking) %G eng %M 16293199 %0 Government Document %D 2005 %T Computerizing statewide assessments in Minnesota: A report on the feasibility of converting the Minnesota Comprehensive Assessments to a computerized adaptive format %A Peterson, K. A. %A Davison, M. L. %A Hjelseth, L. %I Office of Educational Accountability, College of Education and Human Development, University of Minnesota %G eng %0 Journal Article %J Journal of Rehabilitation Medicine %D 2005 %T Contemporary measurement techniques for rehabilitation outcomes assessment %A Jette, A. M. %A Haley, S. M. %K *Disability Evaluation %K Activities of Daily Living/classification %K Disabled Persons/classification/*rehabilitation %K Health Status Indicators %K Humans %K Outcome Assessment (Health Care)/*methods/standards %K Recovery of Function %K Research Support, N.I.H., Extramural %K Research Support, U.S. Gov't, Non-P.H.S.
%K Sensitivity and Specificity %K computerized adaptive testing %X In this article, we review the limitations of traditional rehabilitation functional outcome instruments currently in use within the rehabilitation field to assess Activity and Participation domains as defined by the International Classification of Functioning, Disability and Health. These include a narrow scope of functional outcomes, data incompatibility across instruments, and the precision vs. feasibility dilemma. Following this, we illustrate how contemporary measurement techniques, such as item response theory methods combined with computer adaptive testing methodology, can be applied in rehabilitation to design functional outcome instruments that are comprehensive in scope, accurate, allow for compatibility across instruments, and are sensitive to clinically important change without sacrificing their feasibility. Finally, we present some of the pressing challenges that need to be overcome to provide effective dissemination and training assistance to ensure that current and future generations of rehabilitation professionals are familiar with and skilled in the application of contemporary outcomes measurement. %B Journal of Rehabilitation Medicine %V 37 %P 339-345 %G eng %M 16287664 %0 Journal Article %J IEEE Transactions on Education %D 2005 %T Design and evaluation of an XML-based platform-independent computerized adaptive testing system %A Ho, R.-G. %A Yen, Y.-C. %B IEEE Transactions on Education %V 48 %N 2 %P 230-237 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2005 %T Infeasibility in automated test assembly models: A comparison study of different methods %A Huitzing, H. A. %A Veldkamp, B. P. %A Verschoor, A. J. %K Algorithms %K Item Content (Test) %K Models %K Test Construction %X Several techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics.
A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler. Test assembly problems are often formulated in terms of a model consisting of restrictions and an objective to be maximized or minimized. A problem arises when it is impossible to construct a test from the item pool that meets all specifications, that is, when the model is not feasible. Several methods exist to handle these infeasibility problems. In this article, test assembly models resulting from two practical testing programs were reconstructed to be infeasible. These models were analyzed using methods that forced a solution (Goal Programming, Multiple-Goal Programming, Greedy Heuristic), that analyzed the causes (Relaxed and Ordered Deletion Algorithm (RODA), Integer Randomized Deletion Algorithm (IRDA), Set Covering (SC), and Item Sampling), or that analyzed the causes and used this information to force a solution (Irreducible Infeasible Set-Solver). Specialized methods such as the IRDA and the Irreducible Infeasible Set-Solver performed best. Recommendations about the use of different methods are given. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Journal of Educational Measurement %V 42 %P 223-243 %G eng %0 Journal Article %J Acta Psychologica Sinica %D 2005 %T [Item characteristic curve equating under graded response models in IRT] %A Jun, Z. %A Dongming, O. %A Shuyuan, X. %A Haiqi, D. %A Shuqing, Q. %K graded response models %K item characteristic curve %K Item Response Theory %X In the economist test, one of the largest qualification examinations, item characteristic curve equating and an anchor-test equating design under graded response models in IRT were used to guarantee comparability across years, to construct an item bank, and to prepare for computerized adaptive testing. These methods equated the item and ability parameters of five years of test data and succeeded in establishing an item bank.
Building on this, cut scores from different years were compared through equating, providing an empirical basis for establishing the eligibility standard of the economist test. %B Acta Psychologica Sinica %I Science Press: China %V 37 %P 832-838 %@ 0439-755X (Print) %G eng %M 2005-16031-017 %0 Journal Article %J American Journal of Physical Medicine and Rehabilitation %D 2005 %T Measuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach %A Siebens, H. %A Andres, P. L. %A Pengsheng, N. %A Coster, W. J. %A Haley, S. M. %K Activities of Daily Living/*classification %K Adult %K Aged %K Cohort Studies %K Continuity of Patient Care %K Disability Evaluation %K Female %K Health Services Research %K Humans %K Male %K Middle Aged %K Postoperative Care/*rehabilitation %K Prognosis %K Recovery of Function %K Rehabilitation Centers %K Rehabilitation/*standards %K Sensitivity and Specificity %K Sickness Impact Profile %K Treatment Outcome %X OBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients), the physical functioning portion of the SF-36 (outpatients), and standardized chart items (11 Functional Independence Measure (FIM) items, 19 Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items).
Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. CAT achieves valid patient scores using fewer questions. %B American Journal of Physical Medicine and Rehabilitation %V 84 %P 741-8 %8 Oct %G eng %M 16205429 %0 Journal Article %J Psicothema %D 2005 %T Propiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test] %A Aguado, D. %A Rubio, V. J. %A Hontangas, P. M. %A Hernández, J. M. %K Computer Assisted Testing %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K Psychometrics %K Test Validity %X En el presente trabajo se describen las propiedades psicométricas de un Test Adaptativo Informatizado para la medición del ajuste emocional de las personas. La revisión de la literatura acerca de la aplicación de los modelos de la teoría de la respuesta a los ítems (TRI) muestra que ésta se ha utilizado más en el trabajo con variables aptitudinales que para la medición de variables de personalidad; sin embargo, diversos estudios han mostrado la eficacia de la TRI para la descripción psicométrica de dichas variables. Aun así, pocos trabajos han explorado las características de un Test Adaptativo Informatizado, basado en la TRI, para la medición de una variable de personalidad como es el ajuste emocional.
Nuestros resultados muestran la eficiencia del TAI para la evaluación del ajuste emocional, proporcionando una medición válida y precisa, utilizando menor número de elementos de medida en comparación con las escalas de ajuste emocional de instrumentos fuertemente implantados. Psychometric properties of an emotional adjustment computerized adaptive test. In the present work, the psychometric properties of an emotional adjustment computerized adaptive test are described. An examination of the Item Response Theory (IRT) research literature indicates that IRT has mainly been used for assessing achievement and ability rather than personality factors. Nevertheless, recent years have seen several studies that successfully applied IRT to personality assessment instruments. Even so, few works have examined the features of an IRT-based computerized adaptive test for the measurement of a personality trait such as emotional adjustment. Our results show the efficiency of the CAT for emotional adjustment assessment, providing a valid and accurate measurement while using fewer items than the emotional adjustment scales of well-established questionnaires. %B Psicothema %V 17 %P 484-491 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2005 %T A Randomized Experiment to Compare Conventional, Computerized, and Computerized Adaptive Administration of Ordinal Polytomous Attitude Items %A Hol, A. Michiel %A Vorst, Harrie C. M. %A Mellenbergh, Gideon J. %X

A total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima’s graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule.

%B Applied Psychological Measurement %V 29 %P 159-183 %U http://apm.sagepub.com/content/29/3/159.abstract %R 10.1177/0146621604271268 %G eng %0 Journal Article %J Journal of Clinical Epidemiology %D 2005 %T Simulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments %A Hart, D. L. %A Mioduski, J. E. %A Stratford, P. W.
%K *Health Status Indicators %K Activities of Daily Living %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Ankle Joint/physiopathology %K Diagnosis, Computer-Assisted/*methods %K Female %K Hip Joint/physiopathology %K Humans %K Joint Diseases/physiopathology/*rehabilitation %K Knee Joint/physiopathology %K Lower Extremity/*physiopathology %K Male %K Middle Aged %K Research Support, N.I.H., Extramural %K Research Support, U.S. Gov't, P.H.S. %K Retrospective Studies %X BACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity. 
%B Journal of Clinical Epidemiology %V 58 %P 629-38 %G eng %M 15878477 %0 Journal Article %J Applied Psychological Measurement %D 2005 %T Test construction for cognitive diagnosis %A Henson, R. K. %A Douglas, J. %K Measurement %K Cognitive Assessment %K Item Analysis (Statistical) %K Profiles %K Test Construction %K Test Interpretation %K Test Items %X Although cognitive diagnostic models (CDMs) can be useful in the analysis and interpretation of existing tests, little has been developed to specify how one might construct a good test using aspects of the CDMs. This article discusses the derivation of a general CDM index based on Kullback-Leibler information that will serve as a measure of how informative an item is for the classification of examinees. The effectiveness of the index is examined for items calibrated using the deterministic input noisy "and" gate model (DINA) and the reparameterized unified model (RUM) by implementing a simple heuristic to construct a test from an item bank. When compared to randomly constructed tests from the same item bank, the heuristic shows significant improvement in classification rates. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 29 %P 262-277 %G eng %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students.
To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Journal Article %J Medical Care %D 2004 %T Activity outcome measurement for postacute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Ludlow, L. H. %A Ni, P. %A Bond, T. L. %A Sinclair, S. J. %A Jette, A. M.
%K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aftercare/*standards/statistics & numerical data %K Aged %K Boston %K Cognition/physiology %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Human %K Male %K Middle Aged %K Movement/physiology %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Psychometrics %K Questionnaires/standards %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S. %K Support, U.S. Gov't, P.H.S. %X BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. 
RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. %B Medical Care %V 42 %P I49-I61 %G eng %M 14707755 %0 Generic %D 2004 %T The AMC Linear Disability Score project in a population requiring residential care: psychometric properties %A Holman, R. %A Lindeboom, R. %A Vermeulen, M. %A de Haan, R. J. %K *Disability Evaluation %K *Health Status Indicators %K Activities of Daily Living/*classification %K Adult %K Aged %K Aged, 80 and over %K Data Collection/methods %K Female %K Humans %K Logistic Models %K Male %K Middle Aged %K Netherlands %K Pilot Projects %K Probability %K Psychometrics/*instrumentation %K Questionnaires/standards %K Residential Facilities/*utilization %K Severity of Illness Index %X BACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes, including functional status. However, few item banks have been developed to quantify functional status, as expressed by the ability to perform activities of daily life.
METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The other 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status. %B Health and Quality of Life Outcomes %7 2004/08/05 %V 2 %P 42 %8 Aug 3 %@ 1477-7525 (Electronic) 1477-7525 (Linking) %G eng %M 15291958 %2 514531 %0 Journal Article %J European Journal of Psychological Assessment %D 2004 %T Assisted self-adapted testing: A comparative study %A Hontangas, P. %A Olea, J. %A Ponsoda, V. %A Revuelta, J. %A Wise, S. L. %K Adaptive Testing %K Anxiety %K Computer Assisted Testing %K Psychometrics %K Test %X A new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance.
Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. No differences were found in either posttest anxiety or ability. Concerning precision, the AS-AT was as precise as the CAT, and both were more precise than the S-AT; it was concluded that the AS-AT performed like a CAT with respect to precision. Some hints of psychological similarity between the AS-AT and the S-AT, though not conclusive support, were also found. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B European Journal of Psychological Assessment %V 20 %P 2-9 %G eng %0 Journal Article %J Stroke Rehabilitation %D 2004 %T Computer adaptive testing: a strategy for monitoring stroke rehabilitation across settings %A Andres, P. L. %A Black-Schaffer, R. M. %A Ni, P. %A Haley, S. M. %K *Computer Simulation %K *User-Computer Interface %K Adult %K Aged %K Aged, 80 and over %K Cerebrovascular Accident/*rehabilitation %K Disabled Persons/*classification %K Female %K Humans %K Male %K Middle Aged %K Monitoring, Physiologic/methods %K Severity of Illness Index %K Task Performance and Analysis %X Current functional assessment instruments in stroke rehabilitation are often setting-specific and lack precision, breadth, and/or feasibility. Computer adaptive testing (CAT) offers a promising potential solution by providing a quick, yet precise, measure of function that can be used across a broad range of patient abilities and in multiple settings. CAT technology yields a precise score by selecting very few relevant items from a large and diverse item pool based on each individual's responses. We demonstrate the potential usefulness of a CAT assessment model with a cross-sectional sample of persons with stroke from multiple rehabilitation settings.
%B Stroke Rehabilitation %7 2004/05/01 %V 11 %P 33-39 %8 Spring %@ 1074-9357 (Print) %G eng %M 15118965 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2004 %T Computer adaptive testing and the No Child Left Behind Act %A Kingsbury, G. G. %A Hauser, C. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego CA %G eng %0 Generic %D 2004 %T Computer-based test designs with optimal and non-optimal tests for making pass-fail decisions %A Hambleton, R. K. %A Xing, D. %C Research Report, University of Massachusetts, Amherst, MA %G eng %0 Journal Article %J Journal of Clinical Psychology %D 2004 %T Computers in clinical assessment: Historical developments, present status, and future challenges %A Butcher, J. N. %A Perry, J. L. %A Hahn, J. A. %K clinical assessment %K computerized testing method %K Internet %K psychological assessment services %X Computerized testing methods have long been regarded as a potentially powerful asset for providing psychological assessment services. Ever since computers were first introduced and adapted to the field of assessment psychology in the 1950s, they have been a valuable aid for scoring, data processing, and even interpretation of test results. The history and status of computer-based personality and neuropsychological tests are discussed in this article. Several pertinent issues involved in providing test interpretation by computer are highlighted. Advances in computer-based test use, such as computerized adaptive testing, are described and problems noted. Today, there is great interest in expanding the availability of psychological assessment applications on the Internet. 
Although these applications show great promise, there are a number of problems associated with providing psychological tests on the Internet that need to be addressed by psychologists before the Internet can become a major medium for psychological service delivery. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Clinical Psychology %I John Wiley & Sons: US %V 60 %P 331-345 %@ 0021-9762 (Print); 1097-4679 (Electronic) %G eng %M 2004-11596-008 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T Detecting exposed test items in computer-based testing %A Han, N. %A Hambleton, R. K. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2004 %T Impact of Test Design, Item Quality, and Item Bank Size on the Psychometric Properties of Computer-Based Credentialing Examinations %A Xing, Dehui %A Hambleton, Ronald K. %X

Computer-based testing by credentialing agencies has become common; however, selecting a test design is difficult because several good ones are available—parallel forms, computer adaptive (CAT), and multistage (MST). In this study, three computer-based test designs under some common examination conditions were investigated. Item bank size and item quality had a practically significant impact on decision consistency and accuracy. Even in nearly ideal situations, the choice of test design was not a factor in the results. Two conclusions follow from the findings: (a) More time and resources should be committed to expanding the size and quality of item banks, and (b) designs that individualize an exam administration such as MST and CAT may not be helpful when the primary purpose of the examination is to make pass-fail decisions and conditions are present for using parallel forms with a target information function that can be centered on the passing score.

%B Educational and Psychological Measurement %V 64 %P 5-21 %U http://epm.sagepub.com/content/64/1/5.abstract %R 10.1177/0013164403258393 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T Investigating the effects of selected multi-stage test design alternatives on credentialing outcomes %A Zenisky, A. L. %A Hambleton, R. K. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Mokken Scale Analysis Using Hierarchical Clustering Procedures %A van Abswoude, Alexandra A. H. %A Vermunt, Jeroen K. %A Hemker, Bas T. %A van der Ark, L. Andries %X

Mokken scale analysis (MSA) can be used to assess and build unidimensional scales from an item pool that is sensitive to multiple dimensions. These scales satisfy a set of scaling conditions, one of which follows from the model of monotone homogeneity. An important drawback of the MSA program is that the sequential item selection and scale construction procedure may not find the dominant underlying dimensionality of the responses to a set of items. The authors investigated alternative hierarchical item selection procedures and compared the performance of four hierarchical methods and the sequential clustering method in the MSA context. The results showed that hierarchical clustering methods can improve the search process of the dominant dimensionality of a data matrix. In particular, the complete linkage and scale linkage methods were promising in finding the dimensionality of the item response data from a set of items.

%B Applied Psychological Measurement %V 28 %P 332-354 %U http://apm.sagepub.com/content/28/5/332.abstract %R 10.1177/0146621604265510 %0 Generic %D 2004 %T Practical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project %A Holman, R. %A Glas, C. A. %A Lindeboom, R. %A Zwinderman, A. H. %A de Haan, R. J. %K *Disability Evaluation %K *Health Surveys %K *Logistic Models %K *Questionnaires %K Activities of Daily Living/*classification %K Data Interpretation, Statistical %K Health Status %K Humans %K Pilot Projects %K Probability %K Quality of Life %K Severity of Illness Index %X BACKGROUND: Whenever questionnaires are used to collect data on constructs, such as functional status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. METHODS: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. RESULTS: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. 
CONCLUSIONS: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used. %B Health and Quality of Life Outcomes %7 2004/06/18 %V 2 %P 29 %8 Jun 16 %@ 1477-7525 (Electronic); 1477-7525 (Linking) %G eng %9 Comparative Study; Research Support, Non-U.S. Gov't %M 15200681 %2 441407 %0 Journal Article %J Medical Care %D 2004 %T Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain %A Coster, W. J. %A Haley, S. M. %A Andres, P. L. %A Ludlow, L. H. %A Bond, T. L. %A Ni, P. S. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aged %K Aged, 80 and over %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Questionnaires/*standards %K Recovery of Function/physiology %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Research Support, U.S. Gov't, Non-P.H.S. %K Research Support, U.S. Gov't, P.H.S. %K Sensitivity and Specificity %X BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance.
We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. 
Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. %B Medical Care %V 42 %P I62-172 %8 Jan %G eng %M 14707756 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2004 %T Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Kosinski, M. %A Ni, P. %K Boston %K Factor Analysis, Statistical %K Humans %K Outcome Assessment (Health Care)/*methods %K Prospective Studies %K Questionnaires/standards %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range, .90-.95; 10-item CAT r range, .96-.98).
Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater for the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. %B Archives of Physical Medicine and Rehabilitation %7 2004/04/15 %V 85 %P 661-6 %8 Apr %@ 0003-9993 (Print) %G eng %M 15083444 %0 Journal Article %J Metodologia de Las Ciencias del Comportamiento. %D 2004 %T Statistics for detecting disclosed items in a CAT environment %A Lu, Y. %A Hambleton, R. K. %B Metodologia de Las Ciencias del Comportamiento. %V 5 %G eng %N 2 %& pp. 225-242 %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Using Set Covering with Item Sampling to Analyze the Infeasibility of Linear Programming Test Assembly Models %A Huitzing, Hiddo A. %X

This article shows how set covering with item sampling (SCIS) methods can be used in the analysis and preanalysis of linear programming models for test assembly (LPTA). LPTA models can construct tests fulfilling a set of constraints specified by the test assembler. Sometimes, no solution to the LPTA model exists. The model is then said to be infeasible. Causes of infeasibility can be difficult to find. A method is proposed that constitutes a helpful tool for test assemblers to detect infeasibility beforehand and, in the case of infeasibility, give insight into its causes. This method is based on SCIS. Although SCIS can help to detect feasibility or infeasibility, its power lies in pinpointing causes of infeasibility such as irreducible infeasible sets of constraints. Methods to resolve infeasibility are also given, minimizing the model deviations. A simulation study is presented, offering a guide to test assemblers to analyze and solve infeasibility.

%B Applied Psychological Measurement %V 28 %P 355-375 %U http://apm.sagepub.com/content/28/5/355.abstract %R 10.1177/0146621604266152 %0 Journal Article %J The Journal of Technology, Learning and Assessment %D 2003 %T Computerized adaptive testing: A comparison of three content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %X Content balancing is often a practical consideration in the design of computerized adaptive testing (CAT). This study compared three content balancing methods, namely, the constrained CAT (CCAT), the modified constrained CAT (MCCAT), and the modified multinomial model (MMM), under various conditions of test length and target maximum exposure rate. Results of a series of simulation studies indicate that there is no systematic effect of content balancing method in measurement efficiency and pool utilization. However, among the three methods, the MMM appears to consistently over-expose fewer items. %B The Journal of Technology, Learning and Assessment %V 2 %P 1-15 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Computerized adaptive testing: A comparison of three content balancing methods %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %A Wen, Z. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Journal of Applied Measurement %D 2003 %T Developing an initial physical function item bank from existing sources %A Bode, R. K. %A Cella, D. %A Lai, J. S. %A Heinemann, A. W. %K *Databases %K *Sickness Impact Profile %K Adaptation, Psychological %K Data Collection %K Humans %K Neoplasms/*physiopathology/psychology/therapy %K Psychometrics %K Quality of Life/*psychology %K Research Support, U.S. Gov't, P.H.S.
%K United States %X The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank was developed. %B Journal of Applied Measurement %V 4 %P 124-36 %G eng %M 12748405 %0 Generic %D 2003 %T Effect of extra time on GRE® Quantitative and Verbal Scores (Research Report 03-13) %A Bridgeman, B. %A Cline, F. %A Hessinger, J. %C Princeton NJ: Educational Testing Service %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Effects of test administration mode on item parameter estimates %A Yi, Q. %A Harris, D. J. %A Wang, T. %A Ban, J-C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2003 %T Incorporation Of Content Balancing Requirements In Stratification Designs For Computerized Adaptive Testing %A Leung, Chi-Keung %A Chang, Hua-Hua %A Hau, Kit-Tai %X

In computerized adaptive testing, the multistage a-stratified design advocates a new philosophy on pool management and item selection in which, contrary to common practice, less discriminating items are used first. The method is effective in reducing item-overlap rate and enhancing pool utilization. This stratification method has been extended in different ways to deal with the practical issues of content constraints and the positive correlation between item difficulty and discrimination. Nevertheless, these modified designs on their own do not automatically satisfy content requirements. In this study, three stratification designs were examined in conjunction with three well-developed content balancing methods. The performance of each of these nine combinational methods was evaluated in terms of their item security, measurement efficiency, and pool utilization. Results showed substantial differences in item-overlap rate and pool utilization among different methods. An optimal combination of stratification design and content balancing method is recommended.

%B Educational and Psychological Measurement %V 63 %P 257-270 %U http://epm.sagepub.com/content/63/2/257.abstract %R 10.1177/0013164403251326 %0 Journal Article %J Educational and Psychological Measurement %D 2003 %T Incorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %K computerized adaptive testing %X Studied three stratification designs for computerized adaptive testing in conjunction with three well-developed content balancing methods. Simulation study results show substantial differences in item overlap rate and pool utilization among different methods. Recommends an optimal combination of stratification design and content balancing method. (SLD) %B Educational and Psychological Measurement %V 63 %P 257-70 %G eng %M EJ672406 %0 Journal Article %J Quality of Life Research %D 2003 %T Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale %A Lai, J-S. %A Crane, P. K. %A Cella, D. %A Chang, C-H. %A Bode, R. K. %A Heinemann, A. W. %K *Health Status Indicators %K *Questionnaires %K Adult %K Fatigue/*diagnosis/etiology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/complications %K Psychometrics %K Research Support, Non-U.S. Gov't %K Research Support, U.S. Gov't, P.H.S. %K Sickness Impact Profile %X Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses.
This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model, was utilized. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum. %B Quality of Life Research %V 12 %P 485-501 %8 Aug %G eng %M 13677494 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Maintaining scale in computer adaptive testing %A Smith, R. L. %A Rizavi, S. %A Paez, R. %A Damiano, M. %A Herbert, E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Recalibration of IRT item parameters in CAT: Sparse data matrices and missing data treatments %A Harmes, J. C. %A Parshall, C. G. %A Kromrey, J. D.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Small sample estimation in dichotomous item response models: Effect of priors based on judgmental information on the accuracy of item parameter estimates %A Swaminathan, H. %A Hambleton, R. K. %A Sireci, S. G. %A Xing, D. %A Rizavi, S. M. %X Large item banks with properly calibrated test items are essential for ensuring the validity of computer-based tests. At the same time, item calibrations with small samples are desirable to minimize the amount of pretesting and limit item exposure. Bayesian estimation procedures show considerable promise with small examinee samples. The purposes of the study were (a) to examine how prior information for Bayesian item parameter estimation can be specified and (b) to investigate the relationship between sample size and the specification of prior information on the accuracy of item parameter estimates. The results of the simulation study were clear: Estimation of item response theory (IRT) model item parameters can be improved considerably. Improvements in the one-parameter model were modest; considerable improvements with the two- and three-parameter models were observed. Both the study of different forms of priors and ways to improve the judgmental data used in forming the priors appear to be promising directions for future research. %B Applied Psychological Measurement %V 27 %P 27-51 %G eng %0 Journal Article %J System %D 2003 %T Student modeling and ab initio language learning %A Heift, T. %A Schulze, M. %X Provides examples of student modeling techniques that have been employed in computer-assisted language learning over the past decade. Describes two systems for learning German: "German Tutor" and "Geroline." 
Shows how a student model can support computerized adaptive language testing for diagnostic purposes in a Web-based language learning environment that does not rely on parsing technology. (Author/VWL) %B System %V 31 %P 519-535 %G eng %M EJ677996 %0 Generic %D 2003 %T Using moving averages to assess test and item security in computer-based testing (Center for Educational Assessment Research Report No 468) %A Han, N. %C Amherst, MA: University of Massachusetts, School of Education. %G eng %0 Journal Article %J Journal of Educational Measurement %D 2002 %T Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %A Hendrickson, A. B. %A Ihrig, D. %B Journal of Educational Measurement %V 39 %P 311-330 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Comparing three item selection approaches for computerized adaptive testing with content balancing requirement %A Leung, C-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Psychologische Beiträge %D 2002 %T A comparison of non-deterministic procedures for the adaptive assessment of knowledge %A Hockemeyer, C. %B Psychologische Beiträge %V 44 %P 495-503 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Comparison of the psychometric properties of several computer-based test designs for credentialing exams %A Jodoin, M. %A Zenisky, A. L. %A Hambleton, R. K. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Journal of Educational Measurement %D 2002 %T Data sparseness and on-line pretest item calibration-scaling methods in CAT %A Ban, J-C.
%A Hanson, B. A. %A Yi, Q. %A Harris, D. J. %K Computer Assisted Testing %K Educational Measurement %K Item Response Theory %K Maximum Likelihood %K Methodology %K Scaling (Testing) %K Statistical Data %X Compared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods. The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 39 %P 207-218 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2002 %T Development of an index of physical functional health status in rehabilitation %A Hart, D. L. %A Wright, B. D. %K *Health Status Indicators %K *Rehabilitation Centers %K Adolescent %K Adult %K Aged %K Aged, 80 and over %K Female %K Health Surveys %K Humans %K Male %K Middle Aged %K Musculoskeletal Diseases/*physiopathology/*rehabilitation %K Nervous System Diseases/*physiopathology/*rehabilitation %K Physical Fitness/*physiology %K Recovery of Function/physiology %K Reproducibility of Results %K Retrospective Studies %X OBJECTIVE: To describe (1) the development of an index of physical functional health status (FHS) and (2) its hierarchical structure, unidimensionality, reproducibility of item calibrations, and practical application. DESIGN: Rasch analysis of existing data sets. SETTING: A total of 715 acute, orthopedic outpatient centers and 62 long-term care facilities in 41 states participating with Focus On Therapeutic Outcomes, Inc. 
PATIENTS: A convenience sample of 92,343 patients (40% male; mean age +/- standard deviation [SD], 48+/-17y; range, 14-99y) seeking rehabilitation between 1993 and 1999. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Patients completed self-report health status surveys at admission and discharge. The Medical Outcomes Study 36-Item Short-Form Health Survey's physical functioning scale (PF-10) is the foundation of the physical FHS. The Oswestry Low Back Pain Disability Questionnaire, Neck Disability Index, Lysholm Knee Questionnaire, items pertinent to patients with upper-extremity impairments, and items pertinent to patients with more involved neuromusculoskeletal impairments were cocalibrated into the PF-10. RESULTS: The final FHS item bank contained 36 items (patient separation, 2.3; root mean square measurement error, 5.9; mean square +/- SD infit, 0.9+/-0.5; outfit, 0.9+/-0.9). Analyses supported empirical item hierarchy, unidimensionality, reproducibility of item calibrations, and content and construct validity of the FHS-36. CONCLUSIONS: Results support the reliability and validity of FHS-36 measures in the present sample. Analyses show the potential for a dynamic, computer-controlled, adaptive survey for FHS assessment applicable for group analysis and clinical decision making for individual patients. %B Archives of Physical Medicine and Rehabilitation %V 83 %P 655-65 %8 May %G eng %M 11994805 %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Evaluation of selection procedures for computerized adaptive testing with polytomous items %A van Rijn, P. W. %A Theo Eggen %A Hemker, B. T. %A Sanders, P. F. %K computerized adaptive testing %X In the present study, a procedure that has been used to select dichotomous items in computerized adaptive testing was applied to polytomous items. This procedure was designed to select the item with maximum weighted information. 
In a simulation study, the item information function was integrated over a fixed interval of ability values and the item with the maximum area was selected. This maximum interval information item selection procedure was compared to a maximum point information item selection procedure. Substantial differences between the two item selection procedures were not found when computerized adaptive tests were evaluated on bias and the root mean square of the ability estimate. %B Applied Psychological Measurement %V 26 %P 393-411 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Impact of item quality and item bank size on the psychometric quality of computer-based credentialing exams %A Hambleton, R. K. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Impact of selected factors on the psychometric quality of credentialing examinations administered with a sequential testlet design %A Hambleton, R. K. %A Jodoin, M. %A Zenisky, A. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the meeting of the National Council on Measurement in Education %D 2002 %T Impact of test design, item quality and item bank size on the psychometric properties of computer-based credentialing exams %A Xing, D. %A Hambleton, R. K.
%B Paper presented at the meeting of the National Council on Measurement in Education %C New Orleans %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T An investigation of procedures for estimating error indexes in proficiency estimation in CAT %A Shyu, C.-Y. %A Fan, M. %A Thompson, T. %A Hsu, Y. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm %A Leung, C. K. %A Chang, Hua-Hua %A Hau, K. T. %B Applied Psychological Measurement %V 26 %P 376-392 %@ 0146-6216 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Optimum number of strata in the a-stratified adaptive testing design %A Wen, J.-B. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Redeveloping the exposure control parameters of CAT items when a pool is modified %A Chang, S-W. %A Harris, D. J.
%B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Adaptation of a-stratified method in variable length computerized adaptive testing %A Wen, J.-B. %A Chang, Hua-Hua %A Hau, K.-T. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J Journal of Educational Measurement %D 2001 %T A comparative study of on-line pretest item-calibration/scaling methods in computerized adaptive testing %A Ban, J. C. %A Hanson, B. A. %A Wang, T. %A Yi, Q. %A Harris, D. J. %X The purpose of this study was to compare and evaluate five on-line pretest item-calibration/scaling methods in computerized adaptive testing (CAT): marginal maximum likelihood estimate with one EM cycle (OEM), marginal maximum likelihood estimate with multiple EM cycles (MEM), Stocking's Method A, Stocking's Method B, and BILOG/Prior. The five methods were evaluated in terms of item-parameter recovery, using three different sample sizes (300, 1000 and 3000). The MEM method appeared to be the best choice among these, because it produced the smallest parameter-estimation errors for all sample size conditions. MEM and OEM are mathematically similar, although the OEM method produced larger errors. MEM also was preferable to OEM, unless the amount of time involved in iterative computation is a concern. Stocking's Method B also worked very well, but it required anchor items that either would increase test lengths or require larger sample sizes depending on test administration design. Until more appropriate ways of handling sparse data are devised, the BILOG/Prior method may not be a reasonable choice for small sample sizes.
Stocking's Method A had the largest weighted total error, as well as a theoretical weakness (i.e., treating estimated ability as true ability); thus, there appeared to be little reason to use it. %B Journal of Educational Measurement %V 38 %P 191-212 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Comparison of the SPRT and CMT procedures in computerized adaptive testing %A Yi, Q. %A Hanson, B. %A Widiatmo, H. %A Harris, D. J. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Data sparseness and online pretest calibration/scaling methods in CAT %A Ban, J. %A Hanson, B. A. %A Yi, Q. %A Harris, D. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2001 %T An examination of item selection rules by stratified CAT designs integrated with content balancing methods %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J Journal of Applied Psychology %D 2001 %T An examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales %A Borman, W. C. %A Buck, D. E. %A Hanson, M. A. %A Motowidlo, S. J. %A Stark, S. %A Drasgow, F. %K *Computer Simulation %K *Employee Performance Appraisal %K *Personnel Selection %K Adult %K Automatic Data Processing %K Female %K Human %K Male %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S.
%K Task Performance and Analysis %K Video Recording %X This laboratory research compared the reliability, validity, and accuracy of a computerized adaptive rating scale (CARS) format and 2 relatively common and representative rating formats. The CARS is a paired-comparison rating task that uses adaptive testing principles to present pairs of scaled behavioral statements to the rater to iteratively estimate a ratee's effectiveness on 3 dimensions of contextual performance. Videotaped vignettes of 6 office workers were prepared, depicting prescripted levels of contextual performance, and 112 subjects rated these vignettes using the CARS format and one or the other competing format. Results showed 23%-37% lower standard errors of measurement for the CARS format. In addition, validity was significantly higher for the CARS format (d = .18), and Cronbach's accuracy coefficients showed significantly higher accuracy, with a median effect size of .08. The discussion focuses on possible reasons for the results. %B Journal of Applied Psychology %V 86 %P 965-973 %G eng %M 11596812 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Impact of scoring options for not reached items in CAT %A Yi, Q. %A Widiatmo, H. %A Ban, J-C. %A Harris, D. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Generic %D 2001 %T Impact of several computer-based testing variables on the psychometric properties of credentialing examinations (Laboratory of Psychometric and Evaluative Research Report No 393) %A Xing, D. %A Hambleton, R. K. %C Amherst, MA: University of Massachusetts, School of Education. %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Impact of several computer-based testing variables on the psychometric properties of credentialing examinations %A Xing, D. 
%A Hambleton, R. K. %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Integrating stratification and information approaches for multiple constrained CAT %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T An investigation of procedures for estimating error indexes in proficiency estimation in a realistic second-order equitable CAT environment %A Shyu, C.-Y. %A Fan, M. %A Thompson, T. %A Hsu, Y. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Journal Article %J Journal of Educational Measurement %D 2001 %T Item selection in computerized adaptive testing: Should more discriminating items be used first? %A Hau, Kit-Tai %A Chang, Hua-Hua %K ability %K Adaptive Testing %K Computer Assisted Testing %K Estimation %K Statistical %K Test Items computerized adaptive testing %X During computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter (1985) control resulted in the use of more discriminating items first.
The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Ying's (1999) stratified method, had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. (PsycINFO Database Record (c) 2005 APA) %B Journal of Educational Measurement %V 38 %P 249-266 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Multidimensional adaptive testing using weighted likelihood estimation: A comparison of estimation methods %A Tseng, F.-E. %A Hsu, T.-C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Nearest neighbors, simple strata, and probabilistic parameters: An empirical comparison of methods for item exposure control in CATs %A Parshall, C. G. %A Kromrey, J. D. %A Harmes, J. C. %A Sentovich, C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Generic %D 2001 %T Online item parameter recalibration: Application of missing data treatments to overcome the effects of sparse data conditions in a computerized adaptive version of the MCAT %A Harmes, J. C. %A Kromrey, J. D. %A Parshall, C. G. %C Unpublished manuscript %G eng %0 Journal Article %J Apuntes de Psicologia %D 2001 %T Requerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and investigation in computerized adaptive testing] %A Olea Díaz, J. %A Ponsoda Gil, V. %A Revuelta Menéndez, J.
%A Hontangas Beltrán, P. %A Abad, F. J. %K Computer Assisted Testing %K English as Second Language %K Psychometrics computerized adaptive testing %X Summarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed. (PsycINFO Database Record (c) 2005 APA ) %B Apuntes de Psicologia %V 19 %P 11-28 %G eng %0 Journal Article %J Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden %D 2001 %T Toepassing van een computergestuurde adaptieve testprocedure op persoonlijkheidsdata [Application of a computerised adaptive test procedure on personality data] %A Hol, A. M. %A Vorst, H. C. M. %A Mellenbergh, G. J. %K Adaptive Testing %K Computer Applications %K Computer Assisted Testing %K Personality Measures %K Test Reliability computerized adaptive testing %X Studied the applicability of a computerized adaptive testing procedure to an existing personality questionnaire within the framework of item response theory. The procedure was applied to the scores of 1,143 male and female university students (mean age 21.8 yrs) in the Netherlands on the Neuroticism scale of the Amsterdam Biographical Questionnaire (G. J. Wilde, 1963). The graded response model (F. Samejima, 1969) was used. The quality of the adaptive test scores was measured based on their correlation with test scores for the entire item bank and on their correlation with scores on other scales from the personality test. The results indicate that computerized adaptive testing can be applied to personality scales. 
(PsycINFO Database Record (c) 2005 APA) %B Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden %V 56 %P 119-133 %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 2001 %T Validity issues in computer-based testing %A Huff, K. L. %A Sireci, S. G. %B Educational Measurement: Issues and Practice %V 20 %N 3 %P 16-25 %G eng %0 Journal Article %J European Journal of Psychological Assessment %D 2000 %T The choice of item difficulty in self-adapted testing %A Hontangas, P. %A Ponsoda, V. %A Olea, J. %A Wise, S. L. %B European Journal of Psychological Assessment %V 16 %P 3-12 %G eng %N 1 %0 Journal Article %J Assessment %D 2000 %T Computerization and adaptive administration of the NEO PI-R %A Reise, S. P. %A Henson, J. M. %K *Personality Inventory %K Algorithms %K California %K Diagnosis, Computer-Assisted/*methods %K Humans %K Models, Psychological %K Psychometrics/methods %K Reproducibility of Results %X This study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced in half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments. %B Assessment %V 7 %P 347-364 %G eng %M 11151961 %0 Generic %D 2000 %T Computerized adaptive rating scales (CARS): Development and evaluation of the concept %A Borman, W. C. %A Hanson, M. A. %A Kubisiak, U. C.
%A Buck, D. E. %C (Institute Rep No. 350). Tampa FL: Personnel Decisions Research Institute. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2000 %T Content balancing in stratified computerized adaptive testing designs %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Effects of nonequivalence of item pools on ability estimates in CAT %A Ban, J. C. %A Wang, T. %A Yi, Q. %A Harris, D. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Journal Article %J Medical Care %D 2000 %T Emergence of item response modeling in instrument development and data analysis %A Hambleton, R. K. %K Computer Assisted Testing %K Health %K Item Response Theory %K Measurement %K Statistical Validity computerized adaptive testing %K Test Construction %K Treatment Outcomes %B Medical Care %V 38 %P II60-II65 %G eng %0 Report %D 2000 %T Estimating item parameters from classical indices for item pool development with a computerized classification test (Research Report 2000-4) %A Huang, C.-Y. %A Kalohn, J. C. %A Lin, C.-J. %A Spray, J. A. %I ACT, Inc. %C Iowa City IA %G eng %0 Journal Article %J Florida Journal of Educational Research %D 2000 %T Item exposure control in computer-adaptive testing: The use of freezing to augment stratification %A Parshall, C. %A Harmes, J. C. %A Kromrey, J. D.
%B Florida Journal of Educational Research %V 40 %P 28-52 %G eng %0 Journal Article %J Medical Care %D 2000 %T Item response theory and health outcomes measurement in the 21st century %A Hays, R. D. %A Morales, L. S. %A Reise, S. P. %K *Models, Statistical %K Activities of Daily Living %K Data Interpretation, Statistical %K Health Services Research/*methods %K Health Surveys %K Human %K Mathematical Computing %K Outcome Assessment (Health Care)/*methods %K Research Design %K Support, Non-U.S. Gov't %K Support, U.S. Gov't, P.H.S. %K United States %X Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods. %B Medical Care %V 38 %P II28-II42 %G eng %M 10982088 %0 Journal Article %J Journal of Educational Measurement %D 2000 %T Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results %A Vispoel, W. P. %A Hendrickson, A. B. %A Bleiler, T. %B Journal of Educational Measurement %V 37 %P 21-38 %G eng %0 Journal Article %J Metodología de las Ciencias del Comportamiento %D 2000 %T Los tests adaptativos informatizados en la frontera del siglo XXI: Una revisión [Computerized adaptive tests at the turn of the 21st century: A review] %A Hontangas, P. %A Ponsoda, V. %A Olea, J. %A Abad, F. J. 
%K computerized adaptive testing %B Metodología de las Ciencias del Comportamiento %V 2 %P 183-216 %@ 1575-9105 %G eng %0 Journal Article %J Computers in Human Behavior %D 2000 %T A real data simulation of computerized adaptive administration of the MMPI-A %A Forbey, J. D. %A Handel, R. W. %A Ben-Porath, Y. S. %X A real data simulation of computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A) was conducted using item responses from three groups of participants. The first group included 196 adolescents (age range 14-18) tested at a midwestern residential treatment facility for adolescents. The second group was the normative sample used in the standardization of the MMPI-A (Butcher, Williams, Graham, Archer, Tellegen, Ben-Porath, & Kaemmer, 1992. Minnesota Multiphasic Personality Inventory-Adolescent (MMPI-A): manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.). The third group was the clinical sample used in the validation of the MMPI-A (Williams & Butcher, 1989. An MMPI study of adolescents: I. Empirical validation of the study's scales. Personality assessment, 1, 251-259.). The MMPI-A data for each group of participants were run through a modified version of the MMPI-2 adaptive testing computer program (Roper, Ben-Porath & Butcher, 1995. Comparability and validity of computerized adaptive testing with the MMPI-2. Journal of Personality Assessment, 65, 358-371.). To determine the optimal amount of item savings, each group's MMPI-A item responses were used to simulate three different orderings of the items: (1) from least to most frequently endorsed in the keyed direction; (2) from least to most frequently endorsed in the keyed direction with the first 120 items rearranged into their booklet order; and (3) all items in booklet order. The mean number of items administered for each group was computed for both classification and full-scale elevations for T-score cut-off values of 60 and 65.
Substantial item administration savings were achieved for all three groups, and the mean number of items saved ranged from 50 items (10.7% of the administered items) to 123 items (26.4% of the administered items), depending upon the T-score cut-off, classification method (i.e. classification only or full-scale elevation), and group. (C) 2000 Elsevier Science Ltd. All rights reserved. %B Computers in Human Behavior %V 16 %P 83-96 %G eng %0 Generic %D 2000 %T A selection procedure for polytomous items in computerized adaptive testing (Measurement and Research Department Reports 2000-5) %A van Rijn, P. W. %A Theo Eggen %A Hemker, B. T. %A Sanders, P. F. %C Arnhem, The Netherlands: Cito %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Solving complex constraints in a-stratified computerized adaptive testing designs %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, USA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Sufficient simplicity or comprehensive complexity? A comparison of probabilistic and stratification methods of exposure control %A Parshall, C. G. %A Kromrey, J. D. %A Hogarty, K. Y. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1999 %T Adjusting "scores" from a CAT following successful item challenges %A Wang, T. %A Yi, Q. %A Ban, J. C. %A Harris, D. J. %A Hanson, B. A.
%B Paper presented at the annual meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Journal Article %J European Journal of Psychological Assessment %D 1999 %T Benefits from computerized adaptive testing as seen in simulation studies %A Hornke, L. F. %B European Journal of Psychological Assessment %V 15 %P 91-98 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T A comparative study of ability estimates from computer-adaptive testing and multi-stage testing %A Patsula, L. N. %A Hambleton, R. K. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Psychological Assessment %D 1999 %T Computerized adaptive assessment with the MMPI-2 in a clinical setting %A Handel, R. W. %A Ben-Porath, Y. S. %A Watt, M. E. %B Psychological Assessment %V 11 %P 369-380 %G eng %M 00012030-199909000-00013 %0 Book Section %D 1999 %T Developing computerized adaptive tests for school children %A Kingsbury, G. G. %A Houser, R. L. %C F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 93-115). Mahwah NJ: Erlbaum. %G eng %0 Book Section %D 1999 %T The development of a computerized adaptive selection system for computer programmers in a financial services company %A Zickar, M. J. %A Overton, R. C. %A Taylor, L. R. %A Harms, H. J. %C F. Drasgow and J. B. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 7-33). Mahwah NJ: Erlbaum. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1999 %T An enhanced stratified computerized adaptive testing design %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T.
%B Paper presented at the annual meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1999 %T Item exposure in adaptive tests: An empirical investigation of control strategies %A Parshall, C. %A Hogarty, K. %A Kromrey, J. %B Paper presented at the annual meeting of the Psychometric Society %C Lawrence KS %G eng %0 Generic %D 1999 %T Item nonresponse: Occurrence, causes and imputation of missing answers to test items %A Huisman, J. M. E. %C (M and T Series No 32). Leiden: DSWO Press %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 1999 %T Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter algorithm %A Leung, C.-K. %A Chang, Hua-Hua %A Hau, K-T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results %A Vispoel, W. P. %A Hendrickson, A. %A Bleiler, T. %A Widiatmo, H. %A Shrairi, S. %A Ihrig, D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Reducing bias in CAT trait estimation: A comparison of approaches %A Wang, T. %A Hanson, B. H. %A Lau, C.-M. H. %B Applied Psychological Measurement %V 23 %P 263-278 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Reducing item exposure without reducing precision (much) in computerized adaptive testing %A Holmes, R. M. %A Segall, D. O.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T CAT item calibration %A Hsu, Y. %A Thompson, T. D. %A Chen, W-H. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1998 %T A comparison of maximum likelihood estimation and expected a posteriori estimation in CAT using the partial credit model %A Chen, S. %A Hou, L. %A Dodd, B. G. %B Educational and Psychological Measurement %V 58 %P 569-595 %G eng %0 Conference Paper %B Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology %D 1998 %T Computerized adaptive rating scales that measure contextual performance %A Borman, W. C. %A Hanson, M. A. %A Motowidlo, S. J. %A Drasgow, F. %A Foster, L. %A Kubisiak, U. C. %B Paper presented at the 13th annual conference of the Society for Industrial and Organizational Psychology %C Dallas TX %G eng %0 Journal Article %J Journal of Outcome Measurement %D 1998 %T The effect of item pool restriction on the precision of ability measurement for a Rasch-based CAT: comparisons to traditional fixed length examinations %A Halkitis, P. N. %K *Decision Making, Computer-Assisted %K Comparative Study %K Computer Simulation %K Education, Nursing %K Educational Measurement/*methods %K Human %K Models, Statistical %K Psychometrics/*methods %X This paper describes a method for examining the precision of a computerized adaptive test with a limited item pool. Standard errors of measurement ascertained in the testing of simulees with a CAT using a restricted pool were compared to the results obtained in a live paper-and-pencil achievement testing of 4494 nursing students on four versions of an examination of calculations of drug administration.
CAT measures of precision were considered when the simulated examinee pools were uniform and normal. Precision indices were also considered in terms of the number of CAT items required to reach the precision of the traditional tests. Results suggest that regardless of the size of the item pool, CAT provides greater precision in measurement with a smaller number of items administered even when the choice of items is limited but fails to achieve equiprecision along the entire ability continuum. %B Journal of Outcome Measurement %V 2 %P 97-122 %G eng %M 9661734 %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Essentially unbiased Bayesian estimates in computerized adaptive testing %A Wang, T. %A Lau, C. %A Hanson, B. A. %B Paper presented at the annual meeting of the American Educational Research Association %C San Diego %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1998 %T Item selection in computerized adaptive testing: Should more discriminating items be used first? %A Hau, K. T. %A Chang, Hua-Hua %C San Diego CA %G eng %0 Journal Article %J Personnel Psychology %D 1997 %T Adapting to adaptive testing %A Overton, R. C. %A Harms, H. J. %A Taylor, L. R. %A Zickar, M. J. %B Personnel Psychology %V 50 %P 171-185 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1997 %T Calibration of CAT items administered online for classification: Assumption of local independence %A Spray, J. A. %A Parshall, C. G. %A Huang, C.-H.
%B Paper presented at the annual meeting of the Psychometric Society %C Gatlinburg TN %G eng %0 Conference Paper %B Paper presented at the 32nd Annual Symposium and Recent Developments in the use of the MMPI-2 and MMPI-A. Minneapolis MN. %D 1997 %T Comparability and validity of computerized adaptive testing with the MMPI-2 using a clinical sample %A Handel, R. W. %A Ben-Porath, Y. S. %A Watt, M. %B Paper presented at the 32nd Annual Symposium and Recent Developments in the use of the MMPI-2 and MMPI-A. Minneapolis MN. %G eng %0 Journal Article %J Educational & Psychological Measurement %D 1997 %T The effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model %A Chen, S-K. %A Hou, L. Y. %A Fitzpatrick, S. J. %A Dodd, B. G. %K computerized adaptive testing %X Investigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. The EAP estimation with a normal prior or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). 
%B Educational & Psychological Measurement %V 57 %P 422-439 %G eng %0 Book Section %D 1997 %T Evaluating item calibration medium in computerized adaptive testing %A Hetter, R. D. %A Segall, D. O. %A Bloxom, B. M. %C W.A. Sands, B.K. Waters and J.R. McBride, Computerized adaptive testing: From inquiry to operation (pp. 161-168). Washington, DC: American Psychological Association. %G eng %0 Book Section %B Computerized adaptive testing: From inquiry to operation %D 1997 %T Item exposure control in CAT-ASVAB %A Hetter, R. D. %A Sympson, J. B. %E J. R. McBride %X Describes the method used to control item exposure in the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery (CAT-ASVAB). The method described was developed specifically to ensure that CAT-ASVAB items were exposed no more often than the items in the printed ASVAB's alternate forms, ensuring that CAT-ASVAB is no more vulnerable than printed ASVAB forms to compromise from item exposure. (PsycINFO Database Record (c) 2010 APA, all rights reserved) %B Computerized adaptive testing: From inquiry to operation %I American Psychological Association %C Washington D.C., USA %P 141-144 %G eng %0 Book Section %D 1997 %T Item pool development and evaluation %A Segall, D. O. %A Moreno, K. E. %A Hetter, R. D. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 117-130). Washington DC: American Psychological Association. %G eng %0 Book Section %D 1997 %T Policy and program management perspective %A Martin, C.J. %A Hoshaw, C.R. %C W.A. Sands, B.K. Waters, and J.R.
McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association. %G eng %0 Book Section %D 1997 %T Preliminary psychometric research for CAT-ASVAB: Selecting an adaptive testing strategy %A J. R. McBride %A Wetzel, C. D. %A Hetter, R. D. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 83-95). Washington DC: American Psychological Association. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Psychometric mode effects and fit issues with respect to item difficulty estimates %A Hadadi, A. %A Luecht, RM %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Book Section %D 1997 %T Psychometric procedures for administering CAT-ASVAB %A Segall, D. O. %A Moreno, K. E. %A Bloxom, B. M. %A Hetter, R. D. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 131-140). Washington D.C.: American Psychological Association. %G eng %0 Generic %D 1997 %T Unidimensional approximations for a computerized adaptive test when the item pool and latent space are multidimensional (Research Report 97-5) %A Spray, J. A. %A Abdel-Fattah, A. A. %A Huang, C.-Y. %A Lau, C. A. %C Iowa City IA: ACT Inc %G eng %0 Book Section %D 1996 %T Adaptive assessment and training using the neighbourhood of knowledge states %A Dowling, C. E. %A Hockemeyer, C. %A Ludwig, A. H. %C Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 578-587. %G eng %0 Book Section %D 1996 %T Adaptive assessment using granularity hierarchies and Bayesian nets %A Collins, J. A. %A Greer, J. E. %A Huang, S. X.
%C Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.) Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 569-577. %G eng %0 Book Section %D 1996 %T A content-balanced adaptive testing algorithm for computer-based training systems %A Huang, S. X. %C Frasson, C. and Gauthier, G. and Lesgold, A. (Eds.), Intelligent Tutoring Systems, Third International Conference, ITS'96, Montréal, Canada, June 1996 Proceedings. Lecture Notes in Computer Science 1086. Berlin Heidelberg: Springer-Verlag 306-314. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T The effects of methods of theta estimation, prior distribution, and number of quadrature points on CAT using the graded response model %A Hou, L. %A Chen, S. %A Dodd, B. G. %A Fitzpatrick, S. J. %B Paper presented at the annual meeting of the American Educational Research Association %C New York NY %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Effects of randomesque item selection on CAT item exposure rates and proficiency estimation under 1- and 2-PL models %A Featherman, C. M. %A Subhiyah, R. G. %A Hadadi, A. %B Paper presented at the annual meeting of the American Educational Research Association %C New York %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Heuristic-based CAT: Balancing item information, content and exposure %A Luecht, RM %A Hadadi, A. %A Nungester, R. J.
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York NY %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 1996 %T Multidimensional computer adaptive testing %A Fan, M. %A Hsu, Y. %B Paper presented at the Annual Meeting of the American Educational Research Association %C New York NY %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T New algorithms for item selection and exposure and proficiency estimation under 1- and 2-PL models %A Featherman, C. M. %A Subhiyah, R. G. %A Hadadi, A. %B Paper presented at the annual meeting of the American Educational Research Association %C New York %G eng %0 Generic %D 1996 %T Preliminary cost-effectiveness analysis of alternative ASVAB testing concepts at MET sites %A Hogan, P.F. %A Dall, T. %A J. R. McBride %C Interim report to Defense Manpower Data Center. Fairfax, VA: Lewin-VHI, Inc. %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 1996 %T Utility of Fisher information, global information and different starting abilities in mini CAT %A Fan, M. %A Hsu, Y.
%B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C New York NY %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the Psychometric Society %D 1995 %T The effect of ability estimation for polytomous CAT in different item selection procedures %A Fan, M. %A Hsu, Y. %B Paper presented at the Annual meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1995 %T The effect of population distribution and methods of theta estimation on CAT using the rating scale model %A Chen, S. %A Hou, L. %A Fitzpatrick, S. J. %A Dodd, B. G. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Journal Article %J Journal of Educational Psychology %D 1995 %T Effects and underlying mechanisms of self-adapted testing %A Rocklin, T. R. %A O’Donnell, A. M. %A Holst, P. M. %B Journal of Educational Psychology %V 87 %P 103-116 %G eng %0 Generic %D 1995 %T An evaluation of alternative concepts for administering the Armed Services Vocational Aptitude Battery to applicants for enlistment %A Hogan, P.F. %A J. R. McBride %A Curran, L. T. %C DMDC Technical Report 95-013. Monterey, CA: Personnel Testing Division, Defense Manpower Data Center %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1995 %T An investigation of item calibration procedures for a computerized licensure examination %A Haynie, K.A. %A Way, W. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco, CA %G eng %0 Journal Article %J European Journal of Psychological Assessment %D 1995 %T Item times in computerized testing: A new differential information %A Hornke, L. F. %B European Journal of Psychological Assessment %V 11 (Suppl. 
1) %P 108-109 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1995 %T Theoretical results and item selection from multidimensional item bank in the Mokken IRT model for polytomous items %A Hemker, B. T. %A Sijtsma, K. %A Molenaar, I. W. %B Applied Psychological Measurement %V 19 %P 337–352 %G eng %0 Generic %D 1995 %T Using simulation to select an adaptive testing strategy: An item bank evaluation program %A Hsu, T. C. %A Tseng, F. L. %C Unpublished manuscript, University of Pittsburgh %G eng %0 Journal Article %J Applied Psychological Measurement %D 1994 %T A comparison of item calibration media in computerized adaptive testing %A Hetter, R. D. %A Segall, D. O. %A Bloxom, B. M. %B Applied Psychological Measurement %V 18 %P 197-204 %N 3 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T The effects of item pool depth on the accuracy of pass/fail decisions for NCLEX using CAT %A Haynie, K.A. %A Way, W. D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans %G eng %0 Journal Article %J Journal of Applied Psychology %D 1994 %T The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery %A Van de Vijver, F. J. R. %A Harsveld, M. %B Journal of Applied Psychology %V 79 %P 852-859 %G eng %0 Journal Article %J Educational Measurement: Issues and Practice %D 1993 %T Assessing the utility of item response models: computerized adaptive testing %A Kingsbury, G. G. %A Houser, R.L.
%K computerized adaptive testing %B Educational Measurement: Issues and Practice %V 12 %P 21-27 %G eng %0 Journal Article %J Nurs Health Care %D 1993 %T Computerized adaptive testing: the future is upon us %A Halkitis, P. N. %A Leahy, J. M. %K *Computer-Assisted Instruction %K *Education, Nursing %K *Educational Measurement %K *Reaction Time %K Humans %K Pharmacology/education %K Psychometrics %B Nurs Health Care %7 1993/09/01 %V 14 %P 378-85 %8 Sep %@ 0276-5284 (Print) %G eng %M 8247367 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T An investigation of restricted self-adapted testing %A Wise, S. L. %A Kingsbury, G. G. %A Houser, R.L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta GA %G eng %0 Generic %D 1993 %T Item Calibration: Medium-of-administration effect on computerized adaptive scores (TR-93-9) %A Hetter, R. D. %A Bloxom, B. M. %A Segall, D. O. %C Navy Personnel Research and Development Center %G eng %0 Conference Paper %B Paper presented to the annual meeting of the American Educational Research Association: Atlanta GA. %D 1993 %T A practical examination of the use of free-response questions in computerized adaptive testing %A Kingsbury, G. G. %A Houser, R.L. %B Paper presented to the annual meeting of the American Educational Research Association: Atlanta GA. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1992 %T Effects of feedback during self-adapted testing on estimates of ability %A Holst, P. M. %A O’Donnell, A. M. %A Rocklin, T. R. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Book Section %D 1992 %T Evaluation of alternative operational concepts %A J. R. McBride %A Hogan, P.F. %C Proceedings of the 34th Annual Conference of the Military Testing Association. 
San Diego, CA: Navy Personnel Research and Development Center. %G eng %0 Journal Article %J Applied Psychological Measurement %D 1992 %T Item selection using an average growth approximation of target information functions %A Luecht, RM %A Hirsch, T. M. %B Applied Psychological Measurement %V 16 %P 41-51 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1992 %T Scaling of two-stage adaptive test configurations for achievement testing %A Hendrickson, A. B. %A Kolen, M. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Book Section %D 1991 %T Computerized adaptive testing: Theory, applications, and standards %A Hambleton, R. K. %A Zaal, J. N. %A Pieters, J. P. M. %C R. K. Hambleton and J. N. Zaal (Eds.), Advances in educational and psychological testing: Theory and Applications (pp. 341-366). Boston: Kluwer. %G eng %0 Generic %D 1991 %T A simulation study of some simple approaches to the study of DIF for CATs %A Holland, P. W. %A Zwick, R. %C Internal memorandum, Educational Testing Service %G eng %0 Journal Article %J Journal of Marketing Research %D 1990 %T Adaptive designs for Likert-type data: An approach for implementing marketing research %A Singh, J. %A Howell, R. D. %A Rhoads, G. K. %B Journal of Marketing Research %V 27 %P 304-321 %G eng %0 Conference Paper %B A paper presented to the annual meeting of the National Council of Measurement in Education %D 1990 %T Assessing the utility of item response models: Computerized adaptive testing %A Kingsbury, G. G. %A Houser, R.L. %B A paper presented to the annual meeting of the National Council of Measurement in Education %C Boston MA %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1990 %T The effects of variable entry on bias and information of the Bayesian adaptive testing procedure %A Hankins, J. A. 
%B Educational and Psychological Measurement %V 50 %P 785-802 %G eng %0 Journal Article %J Issues %D 1990 %T National Council Computerized Adaptive Testing Project Review--committee perspective %A Haynes, B. %K *Computers %K *Licensure %K Educational Measurement/*methods %K Feasibility Studies %K Societies, Nursing %K United States %B Issues %V 11 %P 3 %G eng %M 2074156 %0 Journal Article %J Applied Psychological Measurement %D 1989 %T Adaptive and conventional versions of the DAT: The first complete test battery comparison %A Henly, S. J. %A Klebe, K. J. %A J. R. McBride %A Cudeck, R. %B Applied Psychological Measurement %V 13 %P 363-371 %N 4 %G eng %0 Journal Article %J Dissertation Abstracts International %D 1989 %T Application of computerized adaptive testing to the University Entrance Exam of Taiwan, R.O.C %A Hung, P-H. %K computerized adaptive testing %B Dissertation Abstracts International %V 49 %P 3662 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1989 %T Assessing the impact of using item parameter estimates obtained from paper-and-pencil testing for computerized adaptive testing %A Kingsbury, G. G. %A Houser, R.L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1989 %T A comparison of three adaptive testing strategies using MicroCAT %A Ho, R. %A Hsu, T. C.
%B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco %G eng %0 Journal Article %J Journal of Educational Computing Research %D 1989 %T Comparisons of paper-administered, computer-administered and computerized adaptive achievement tests %A Olson, J. B %A Maynes, D. D. %A Slawson, D. %A Ho, K %X This research study was designed to compare student achievement scores from three different testing methods: paper-administered testing, computer-administered testing, and computerized adaptive testing. The three testing formats were developed from the California Assessment Program (CAP) item banks for grades three and six. The paper-administered and the computer-administered tests were identical in item content, format, and sequence. The computerized adaptive test was a tailored or adaptive sequence of the items in the computer-administered test. %B Journal of Educational Computing Research %V 5 %P 311-326 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T A comparison of achievement level estimates from computerized adaptive testing and paper-and-pencil testing %A Kingsbury, G. G. %A Houser, R.L. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper(s) presented at the annual meeting(s) of the American Educational Research Association %D 1988 %T The development and evaluation of a microcomputerized adaptive placement testing system for college mathematics %A Hsu, T.-C. %A Shermis, M. D. %B Paper(s) presented at the annual meeting(s) of the American Educational Research Association %C 1986 (San Francisco CA) and 1987 (Washington DC) %G eng %0 Generic %D 1988 %T The equivalence of scores from automated and conventional educational and psychological tests (College Board Report No. 88-8) %A Mazzeo, J. %A Harvey, A. L. 
%C New York: The College Entrance Examination Board. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1988 %T A predictive analysis approach to adaptive testing %A Kirisci, L. %A Hsu, T.-C. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Book %D 1987 %T The effects of variable entry on bias and information of the Bayesian adaptive testing procedure %A Hankins, J. A. %C Dissertation Abstracts International, 47 (8A), 3013 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1986 %T Comparison and equating of paper-administered, computer-administered, and computerized adaptive tests of achievement %A Olsen, J. B. %A Maynes, D. D. %A Slawson, D. %A Ho, K. %B Paper presented at the annual meeting of the American Educational Research Association %C San Francisco CA %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 1998 %T Controlling item exposure conditional on ability in computerized adaptive testing %A Stocking, M. L. %A Lewis, C. %B Journal of Educational and Behavioral Statistics %V 23 %P 57-75 %G eng %0 Book Section %D 1985 %T Controlling item-exposure rates in computerized adaptive testing %A Sympson, J. B. %A Hetter, R. D. %C Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego CA: Navy Personnel Research and Development Center. %G eng %0 Generic %D 1984 %T Evaluation of computerized adaptive testing of the ASVAB %A Hardwicke, S. %A Vicino, F. %A J. R. McBride %A Nemeth, C. %C San Diego, CA: Navy Personnel Research and Development Center, unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1984 %T An evaluation of the utility of large scale computerized adaptive testing %A Vicino, F. L.
%A Hardwicke, S. B. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1984 %T An evaluation of the utility of large scale computerized adaptive testing %A Vicino, F. L. %A Hardwicke, S. B. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Generic %D 1984 %T Evaluation plan for the computerized adaptive vocational aptitude battery (Research Report 82-1) %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %G eng %0 Journal Article %J Journal of Educational Measurement %D 1984 %T Technical guidelines for assessing computerized adaptive tests %A Green, B. F. %A Bock, R. D. %A Humphreys, L. G. %A Linn, R. L. %A Reckase, M. D. %K computerized adaptive testing %K Mode effects %K paper-and-pencil %B Journal of Educational Measurement %V 21 %P 347-360 %@ 1745-3984 %G eng %0 Generic %D 1983 %T Predictive utility evaluation of adaptive testing: Results of the Navy research %A Hardwicke, S. %A White, K. E. %C Falls Church VA: The Rehab Group Inc %G eng %0 Report %D 1982 %T Comparison of live and simulated adaptive tests %A Hunter, D. R. %B Air Force Human Resources Laboratory %I Air Force Systems Command %C Brooks Air Force Base, Texas %8 December 1982 %G eng %0 Book Section %D 1982 %T Item calibrations for Computerized Adaptive Testing (CAT) experimental item pools %A Sympson, J. B. %A Hartmann, L. %C D. J. Weiss (Ed.). Proceedings of the 1982 Computerized Adaptive Testing Conference (pp. 290-294). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Book Section %D 1980 %T A validity study of an adaptive test of reading comprehension %A Hornke, L. F. %A Sauter, M. B. %C D. J.
Weiss (Ed.), Proceedings of the 1979 Computerized Adaptive Testing Conference (pp. 57-67). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Journal Article %J Journal of Auditory Research %D 1979 %T A comparison of a standard and a computerized adaptive paradigm in Bekesy fixed-frequency audiometry %A Harris, J. D. %A Smith, P. F. %B Journal of Auditory Research %V 19 %P 1-22 %G eng %0 Journal Article %J Programmed Learning and Educational Technology %D 1979 %T Four realizations of pyramidal adaptive testing %A Hornke, L. F. %B Programmed Learning and Educational Technology %V 16 %P 164-169 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1978 %T Predictive ability of a branching test %A Brooks, S. %A Hartz, M. A. %B Educational and Psychological Measurement %V 38 %P 415-419 %G eng %0 Journal Article %J Japanese Journal of Educational Psychology %D 1978 %T A stratified adaptive test of verbal ability %A Shiba, S. %A Noguchi, H. %A Haebara, T. %B Japanese Journal of Educational Psychology %V 26 %P 229-238 %0 Book Section %D 1977 %T Adaptive Branching in a Multi-Content Achievement Test %A Pennell, R. J. %A Harris, D. A. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program %0 Book Section %D 1977 %T Adaptive Testing Applied to Hierarchically Structured Objectives-Based Programs %A Hambleton, R. K. %A Eignor, D. R. %C D. J. Weiss (Ed.), Proceedings of the 1977 Computerized Adaptive Testing Conference. Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program %0 Journal Article %J Educational and Psychological Measurement %D 1977 %T A computer simulation study of tailored testing strategies for objective-based instructional programs %A Spineti, J. P. %A Hambleton, R. K.
%X One possible way of reducing the amount of time spent testing in objective-based instructional programs would involve the implementation of a tailored testing strategy. Our purpose was to provide some additional data on the effectiveness of various tailored testing strategies for different testing situations. The three factors of a tailored testing strategy under study with various hypothetical distributions of abilities across two learning hierarchies were test length, mastery cutting score, and starting point. Overall, our simulation results indicate that it is possible to obtain a reduction of more than 50% in testing time without any loss in decision-making accuracy, when compared to a conventional testing procedure, by implementing a tailored testing strategy. In addition, our study of starting points revealed that it was generally best to begin testing in the middle of the learning hierarchy. Finally, we observed a 40% reduction in errors of classification as the number of items for testing each objective was increased from one to five. %B Educational and Psychological Measurement %V 37 %P 139-158 %G eng %0 Generic %D 1977 %T Flexilevel adaptive testing paradigm: Validation in technical training %A Hansen, D. N. %A Ross, S. %A Harris, D. A. %C AFHRL Technical Report 77-35 (I) %G eng %0 Generic %D 1977 %T Flexilevel adaptive training paradigm: Hierarchical concept structures %A Hansen, D. N. %A Ross, S. %A Harris, D. A. %C AFHRL Technical Report 77-35 (II) %G eng %0 Conference Paper %B Paper presented at the Third International Symposium on Educational Testing %D 1977 %T Four realizations of pyramidal adaptive testing strategies %A Hornke, L. F. %B Paper presented at the Third International Symposium on Educational Testing %C University of Leiden, The Netherlands %G eng %0 Book Section %D 1976 %T Reflections on adaptive testing %A Hansen, D. N. %C C. K. Clark (Ed.), Proceedings of the First Conference on Computerized Adaptive Testing (pp. 90-94).
Washington DC: U.S. Government Printing Office. %G eng %0 Conference Paper %B American Educational Research Association %D 1974 %T A Bayesian approach in sequential testing %A Hsu, T. %A Pingel, K. %B American Educational Research Association %C Chicago IL %8 04/1974 %G eng %0 Generic %D 1974 %T Computer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical training %A Hansen, D. N. %A Johnson, B. F. %A Fagan, R. L. %A Tan, P. %A Dick, W. %C JSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48. %G eng %0 Journal Article %J Review of Educational Research %D 1974 %T Testing and decision-making procedures for selected individualized instruction programs %A Hambleton, R. K. %B Review of Educational Research %V 10 %P 371-400 %G eng %0 Generic %D 1973 %T A review of testing and decision-making procedures (Technical Bulletin No. 15) %A Hambleton, R. K. %C Iowa City IA: American College Testing Program. %G eng %0 Generic %D 1971 %T The application of item generators for individualizing mathematics testing and instruction (Report 1971/14) %A Ferguson, R. L. %A Hsu, T. %C Pittsburgh PA: University of Pittsburgh Learning Research and Development Center %G eng %0 Book Section %D 1970 %T Individually tailored testing: Discussion %A Holtzman, W. H. %C W. H. Holtzman (Ed.), Computer-assisted instruction, testing, and guidance (pp. 198-200). New York: Harper and Row. %G eng %0 Book Section %D 1969 %T An investigation of computer-based science testing %A Hansen, D. N. %C R. C. Atkinson and H. A. Wilson (Eds.), Computer-assisted instruction: A book of readings. New York: Academic Press. %G eng %0 Book %D 1968 %T Computer-assisted testing (Eds.) %A Harman, H. H. %A Helm, C. E. %A Loye, D. E. %C Princeton NJ: Educational Testing Service %G eng %0 Generic %D 1968 %T An investigation of computer-based science testing %A Hansen, D.
N. %A Schwarz, G. %C Tallahassee: Institute of Human Learning, Florida State University %G eng %0 Book Section %D 1966 %T Programmed testing in the examinations of the National Board of Medical Examiners %A Hubbard, J. P. %C A. Anastasi (Ed.), Testing problems in perspective. Washington DC: American Council on Education. %G eng %0 Journal Article %J American Psychologist %D 1956 %T The sequential item test %A Krathwohl, D. R. %A Huyser, R. J. %B American Psychologist %V 11 %P 419 %G eng %0 Journal Article %J Journal of Consulting Psychology %D 1947 %T A clinical study of consecutive and adaptive testing with the revised Stanford-Binet %A Hutt, M. L. %B Journal of Consulting Psychology %V 11 %P 93-103 %G eng