2024. The Influence of Computerized Adaptive Testing on Psychometric Theory and Practice. Vol. 11. ISSN 2165-6592.
The major premise of this article is that part of the stimulus for the evolution of psychometric theory since the 1950s was the introduction of the concept of computerized adaptive testing (CAT) or its earlier non-CAT variations. The conceptual underpinning of CAT that had the most influence on psychometric theory was the shift of emphasis from the test (or test score) as the focus of analysis to the test item (or item score). This change in focus allowed a change in the way that test results are conceived of as measurements. It also resolved conflicts among a number of ideas that were present in the early work on psychometric theory. Some of these conflicting ideas are summarized below to show how work on the development of CAT resolved them.
Keywords: computerized adaptive testing; Item Response Theory; paradigm shift; scaling theory; test design. Author: Reckase, Mark D. https://jcatpub.net/index.php/jcat/issue/view/34/9

2023. Expanding the Meaning of Adaptive Testing to Enhance Validity. pp. 22-31, Vol. 10.
Keywords: Adaptive Testing; CAT; CBT; test-taking disengagement; validity. Author: Wise, Steven L. http://iacat.org/expanding-meaning-adaptive-testing-enhance-validity

2023. How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? pp. 32-58, Vol. 10.
Keywords: adaptive measurement of change; computerized adaptive testing; longitudinal measurement; trait change patterns. Authors: Tai, Ming Him; Cooperman, Allison W.; DeWeese, Joseph N.; Weiss, David J. http://iacat.org/how-do-trait-change-patterns-affect-performance-adaptive-measurement-change

2022. The (non)Impact of Misfitting Items in Computerized Adaptive Testing. Vol. 9.
Keywords: computerized adaptive testing; item fit; three-parameter logistic model. Author: DeMars, Christine E. https://jcatpub.net/index.php/jcat/issue/view/26

2019. How Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive? pp. 1-14, Vol. 7.
Keywords: computerized adaptive test; multistage test; statistical indicators of amount of adaptation. Authors: Reckase, Mark; Ju, Unhee; Kim, Sewon. http://iacat.org/jcat/index.php/jcat/article/view/69/34

2019. Time-Efficient Adaptive Measurement of Change. pp. 15-34, Vol. 7. ISSN 2165-6592.
The adaptive measurement of change (AMC) refers to the use of computerized adaptive testing (CAT) at multiple occasions to efficiently assess a respondent's improvement, decline, or sameness from occasion to occasion. Whereas previous AMC research focused on administering the most informative item to a respondent at each stage of testing, the current research proposes the use of Fisher information per time unit as an item selection procedure for AMC. The latter procedure incorporates not only the amount of information provided by a given item but also the expected amount of time required to complete it. In a simulation study, the use of Fisher information per time unit item selection resulted in a lower false positive rate in the majority of conditions studied, and a higher true positive rate in all conditions studied, compared to item selection via Fisher information without accounting for the expected time taken.
Future directions of research are suggested.
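As a rough illustration of the time-adjusted criterion this abstract describes, the sketch below selects the item maximizing Fisher information per expected time unit. The 2PL model, pool values, and function names are illustrative assumptions, not the authors' code.

# Sketch of Fisher-information-per-time-unit item selection (assumed 2PL model).
p2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

fisher_info <- function(theta, a, b) {
  p <- p2pl(theta, a, b)
  a^2 * p * (1 - p)
}

# The standard criterion maximizes I_j(theta); the time-adjusted criterion
# maximizes I_j(theta) / E[T_j], trading information against expected time.
select_item <- function(theta, pool, per_time = TRUE) {
  info <- fisher_info(theta, pool$a, pool$b)
  crit <- if (per_time) info / pool$exp_time else info
  which.max(crit)
}

set.seed(1)
pool <- data.frame(a = runif(50, 0.8, 2.0), b = rnorm(50),
                   exp_time = runif(50, 20, 90))  # expected seconds per item
select_item(theta = 0.3, pool)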
Keywords: adaptive measurement of change; computerized adaptive testing; Fisher information; item selection; response-time modeling. Authors: Finkelman, Matthew; Wang, Chun. http://iacat.org/jcat/index.php/jcat/article/view/73/35

2017. Adapting Linear Models for Optimal Test Design to More Complex Test Specifications. Niigata, Japan: Niigata Seiryo University, 08/2017.
Combinatorial optimization (CO) has proven to be a very helpful approach for addressing test assembly issues and for providing solutions. Furthermore, CO has been applied to several test designs, including: (1) the development of linear test forms; (2) computerized adaptive testing; and (3) multistage testing. In his seminal work, van der Linden (2006) laid out the basis for using linear models to simultaneously assemble exams and item pools in a variety of conditions: (1) for single and multiple tests; (2) with item sets; etc. However, for some testing programs, the number and complexity of test specifications can grow rapidly. Consequently, the mathematical representation of the test assembly problem goes beyond most approaches reported either in van der Linden's book or in the majority of other publications related to test assembly. In this presentation, we extend van der Linden's framework by including the concept of blocks for test specifications. We modify the usual mathematical notation of a test assembly problem to include this concept, and we show how it can be applied to various test designs. Finally, we demonstrate an implementation of this approach in stand-alone software called ATASolver.
Session Video
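To make the combinatorial-optimization idea concrete, here is a toy 0-1 linear-programming assembly in the spirit of van der Linden's models. The objective, constraints, and use of the CRAN package lpSolve are simplifying assumptions; this is not ATASolver.

# Toy test assembly: maximize information at theta = 0 subject to a length
# constraint and a minimal content-blueprint constraint per category.
library(lpSolve)  # assumed available from CRAN

set.seed(2)
n <- 30
a <- runif(n, 0.8, 2); b <- rnorm(n)
p <- 1 / (1 + exp(-a * (0 - b)))
info <- a^2 * p * (1 - p)                  # objective: item information at 0
content <- sample(1:3, n, replace = TRUE)  # three content categories

const <- rbind(rep(1, n),                        # total test length
               outer(1:3, content, "==") + 0)    # items per category
dir   <- c("==", rep(">=", 3))
rhs   <- c(10, 3, 3, 3)                          # 10 items, >= 3 per category

sol <- lp("max", info, const, dir, rhs, all.bin = TRUE)
which(sol$solution == 1)  # assembled test form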
Keywords: Complex Test Specifications; Linear Models; Optimal Test Design. Author: Morin, Maxim. http://iacat.org/adapting-linear-models-optimal-test-design-more-complex-test-specifications-0

2017. Adaptivity in a Diagnostic Educational Test. Niigata, Japan: Niigata Seiryo University, 08/2017.
During the past five years, a diagnostic educational test (DET) for three subjects (writing Dutch, writing English, and math) has been developed in the Netherlands. The test informs students and their teachers about the students' strengths and weaknesses in such a manner that the learning process can be adjusted to their personal needs. It is a computer-based assessment for students in five different educational tracks midway through secondary education that can yield diagnoses of many sub-skills. One of the main challenges at the outset of the development was to devise a way to deliver many diagnoses within a reasonable testing time. The answer to this challenge was to make the DET adaptive.
In this presentation we will first discuss how the adaptivity is shaped to the purpose of the Diagnostic Educational Test. The adaptive design, particularly working with item blocks, will be discussed, as well as the implemented adaptive rules. We will also show a simulation of different adaptive paths of students and some empirical information on the paths students took through the test.
Session Video
Keywords: CAT; Diagnostic tests; Education. Author: Schouwstra, Sanneke. http://iacat.org/adaptivity-diagnostic-educational-test-0

2017. Analysis of CAT Precision Depending on Parameters of the Item Pool. Niigata, Japan: Niigata Seiryo University, 08/2017.
The purpose of this research project is to analyze the measurement precision of a latent variable depending on parameters of the item pool. The influence of the following factors is analyzed:
Factor A – range of variation of items in the pool. This factor varies on three levels with the following ranges in logits: a1 – [-3.0; +3.0], a2 – [-4.0; +4.0], a3 – [-5.0; +5.0].
Factor B – number of items in the pool. This factor varies on six levels with the following number of items for every level: b1 – 128, b2 – 256, b3 – 512, b4 – 1024, b5 – 2048, b6 – 4096. The items are evenly distributed in each of the variation ranges.
Factor C – examinees' proficiency, which varies at 30 levels (c1, c2, ..., c30) evenly distributed in the range [-3.0; +3.0] logits.
The investigation was based on a simulation experiment within the framework of the theory of latent variables.
Response Y is the precision of measurement of examinees' proficiency, calculated as the difference between the true levels of examinees' proficiency and the estimates obtained by means of adaptive testing. A three-factor ANOVA was used for data processing.
The following results were obtained:
1. Factor A is significant. Ceteris paribus, the greater the range of variation of items in the pool, the higher the estimation precision is.
2. Factor B is significant. Ceteris paribus, the greater the number of items in the pool, the higher the estimation precision is.
3. Factor C is statistically insignificant at level α = .05. This means that the precision of estimation of examinees' proficiency is constant across the range of proficiency variation.
4. The only significant interaction is A×B. This interaction is explained by the fact that increasing the number of items in the pool decreases the effect of the range of variation of items in the pool.
Session Video
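A minimal version of the simulation design described above can be sketched as follows: a Rasch CAT with maximum-information selection and EAP scoring, run over a few levels of factor A (pool range) and factor B (pool size). Test length, replication count, and the estimator are simplifying assumptions, not the authors' setup.

# Rasch CAT precision as a function of pool range and pool size (sketch).
eap <- function(resp, b_used, grid = seq(-4, 4, 0.1)) {
  like <- sapply(grid, function(th) {
    p <- 1 / (1 + exp(-(th - b_used)))
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- like * dnorm(grid)
  sum(grid * post) / sum(post)
}

run_cat <- function(theta, pool_b, len = 20) {
  used <- integer(0); resp <- numeric(0); th_hat <- 0
  for (k in 1:len) {
    avail <- setdiff(seq_along(pool_b), used)
    j <- avail[which.min(abs(pool_b[avail] - th_hat))]  # max Rasch info
    used <- c(used, j)
    resp <- c(resp, rbinom(1, 1, 1 / (1 + exp(-(theta - pool_b[j])))))
    th_hat <- eap(resp, pool_b[used])
  }
  th_hat - theta  # estimation error for this examinee
}

set.seed(3)
for (rng in c(3, 4, 5)) for (n_items in c(128, 512)) {
  pool <- seq(-rng, rng, length.out = n_items)      # evenly distributed items
  err <- replicate(50, run_cat(rnorm(1), pool))
  cat(sprintf("range +/-%d, %4d items: RMSE = %.3f\n",
              rng, n_items, sqrt(mean(err^2))))
}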
Keywords: CAT; Item parameters; Precision. Authors: Maslak, Anatoly; Pozdniakov, Stanislav. https://drive.google.com/file/d/1Bwe58kOQRgCSbB8x6OdZTDK4OIm3LQI3/view?usp=drive_web

2017. Bayesian Perspectives on Adaptive Testing. Niigata, Japan: Niigata Seiryo University, 08/2017.
Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation not only holds for the core process of ability estimation but extends to such processes as item calibration and real-time monitoring of item security as well. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.
The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes the ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions ("multiple imputations") to deal with the problem of scoring incomplete adaptive tests.
Session Video 1
Session Video 2
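The core sequential update the symposium builds on can be sketched on a grid: the posterior over theta is renewed after each response, and a one-step Bayesian criterion (expected posterior variance) scores candidate items. The 2PL model and numeric values below are illustrative assumptions.

# Sequential Bayesian update of theta on a grid, with a simple one-step
# Bayesian optimal-design criterion for picking the next item (sketch).
grid <- seq(-4, 4, 0.05)
post <- dnorm(grid)                 # prior N(0, 1)
post <- post / sum(post)

update_posterior <- function(post, u, a, b) {
  p <- 1 / (1 + exp(-a * (grid - b)))
  post <- post * if (u == 1) p else 1 - p
  post / sum(post)
}

# Expected posterior variance of theta if this item were administered next.
expected_post_var <- function(post, a, b) {
  p <- 1 / (1 + exp(-a * (grid - b)))
  pbar <- sum(post * p)                       # predictive P(correct)
  v <- function(w) { w <- w / sum(w); sum(w * grid^2) - sum(w * grid)^2 }
  pbar * v(post * p) + (1 - pbar) * v(post * (1 - p))
}

post <- update_posterior(post, u = 1, a = 1.2, b = -0.3)
c(eap = sum(grid * post),
  sd  = sqrt(sum(grid^2 * post) - sum(grid * post)^2))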
Keywords: Bayesian Perspective; CAT. Authors: van der Linden, Wim J.; Jiang, Bingnan; Ren, Hao; Choi, Seung W.; Diao, Qi. http://iacat.org/bayesian-perspectives-adaptive-testing-0

2017. Is CAT Suitable for Automated Speaking Test? Niigata, Japan: Niigata Seiryo University, 08/2017.
We have developed an automated scoring system of Japanese speaking proficiency, namely SJ-CAT (Speaking Japanese Computerized Adaptive Test), which has been operational for the last few months. One of the unique features of the test is that it is an adaptive test based on polytomous IRT.
SJ-CAT consists of two sections: Section 1 has sentence reading-aloud tasks and multiple-choice reading tasks, and Section 2 has sentence generation tasks and open-answer tasks. In a reading-aloud task, a test taker reads a phoneme-balanced sentence on the screen after listening to a model reading. In a multiple-choice reading task, a test taker sees a picture and reads aloud the one sentence among three on the screen that describes the scene most appropriately. In a sentence generation task, a test taker sees a picture or watches a video clip and describes the scene in his/her own words for about ten seconds. In an open-answer task, the test taker expresses support for or opposition to, e.g., nuclear power generation, with reasons, for about 30 seconds.
In the course of the development of the test, we found many unexpected and unique characteristics of a speaking CAT that are not found in the usual multiple-choice CATs. In this presentation, we will discuss some of the factors that were not noticed in our previous project of developing the dichotomous J-CAT (Japanese Computerized Adaptive Test), which consists of vocabulary, grammar, reading, and listening. First, we will argue that the distribution of item difficulty parameters depends on the types of items: with an item pool of unrestricted item types, such as open questions, it is difficult to achieve an ideal difficulty distribution, whether normal or uniform. Second, contrary to our expectations, open questions are not necessarily more difficult to handle in an automated scoring system than more restricted questions such as sentence reading, as long as one can set up a suitable algorithm for scoring open questions. Third, we will show that the standard deviation of the posterior distribution (the standard error of the theta parameter) converges faster in the polytomous IRT used for SJ-CAT than in the dichotomous IRT used in J-CAT. Fourth, we will discuss problems in equating items in SJ-CAT and suggest introducing deep learning with reinforcement learning instead of equating. Finally, we will discuss issues in operating SJ-CAT on the web, including scoring speed, operating costs, and security, among others.
Session Video
Keywords: Automated Speaking Test; CAT; language testing. Author: Imai, Shingo. http://iacat.org/cat-suitable-automated-speaking-test

2017. Comparison of Pretest Item Calibration Methods in a Computerized Adaptive Test (CAT). Niigata, Japan: Niigata Seiryo University, 08/2017.
Calibration methods for pretest items in a computerized adaptive test (CAT) are not a new area of research inquiry. After decades of research on CAT, the fixed item parameter calibration (FIPC) method has been widely accepted and used by practitioners to address two CAT calibration issues: (a) the restricted ability range each item is exposed to, and (b) a sparse response data matrix. In FIPC, the parameters of the operational items are fixed at their original values, and multiple expectation maximization (EM) cycles are used to estimate the parameters of the pretest items, with the prior ability distribution being updated multiple times (Ban, Hanson, Wang, Yi, & Harris, 2001; Kang & Petersen, 2009; Pommerich & Segall, 2003).
Another calibration method is the fixed person parameter calibration (FPPC) method proposed by Stocking (1988) as "Method A." Under this approach, candidates' ability estimates are fixed in the calibration of pretest items, and they define the scale on which the parameter estimates are reported. The logic of FPPC is suitable for CAT applications because the person parameters are estimated based on operational items and are available for pretest item calibration. In Stocking (1988), the FPPC was evaluated using the LOGIST computer program developed by Wood, Wingersky, and Lord (1976). He reported that "Method A" produced larger root mean square errors (RMSEs) in the middle ability range than "Method B," which required the use of anchor items (administered non-adaptively) and linking steps to attempt to correct for the potential scale drift due to the use of imperfect ability estimates.
Since then, new commercial software tools such as BILOG-MG and flexMIRT (Cai, 2013) have been developed to handle the FPPC method with different implementations (e.g., the MH-RM algorithm with flexMIRT). The performance of the FPPC method with those new software tools, however, has rarely been researched in the literature.
In our study, we evaluated the performance of these two pretest item calibration methods using flexMIRT, the newer software tool. The FIPC and FPPC methods are compared under various CAT settings. Each simulated exam contains 75% operational items and 25% pretest items, and real item parameters are used to generate the CAT data. This study also addresses the lack of guidelines in the existing CAT item calibration literature regarding population ability shift and exam length (more accurate theta estimates are expected in longer exams). It therefore investigates the following four factors and their impact on parameter estimation accuracy: (1) candidate population changes (3 ability distributions); (2) exam length (20: 15 OP + 5 PT; 40: 30 OP + 10 PT; and 60: 45 OP + 15 PT); (3) data model fit (3PL and 3PL with fixed c); and (4) pretest item calibration sample sizes (300, 500, and 1000). The findings will fill a gap in this area of research and provide new information on which practitioners can base their decisions when selecting a pretest calibration method for their exams.
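For intuition about FPPC, a minimal sketch: with person estimates treated as fixed and known, a single pretest item's 2PL parameters can be recovered by direct maximum likelihood. This is a stand-in for the idea only, not flexMIRT's EM or MH-RM machinery.

# Fixed person parameter calibration of one pretest 2PL item (sketch).
set.seed(5)
theta <- rnorm(1000)                     # "fixed" person estimates from the CAT
a_true <- 1.3; b_true <- 0.4
u <- rbinom(1000, 1, 1 / (1 + exp(-a_true * (theta - b_true))))

negloglik <- function(par) {
  p <- 1 / (1 + exp(-par[1] * (theta - par[2])))
  -sum(u * log(p) + (1 - u) * log(1 - p))
}
fit <- optim(c(1, 0), negloglik)   # Nelder-Mead by default
round(fit$par, 3)                  # recovered (a, b) for the pretest item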
References
Ban, J. C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001). A comparative study of online pretest item—Calibration/scaling methods in computerized adaptive testing. Journal of Educational Measurement, 38(3), 191–212.
Cai, L. (2013). flexMIRT® Flexible Multilevel Multidimensional Item Analysis and Test Scoring (Version 2) [Computer software]. Chapel Hill, NC: Vector Psychometric Group.
Kang, T., & Petersen, N. S. (2009). Linking item parameters to a base scale (Research Report No. 2009–2). Iowa City, IA: ACT.
Pommerich, M., & Segall, D.O. (2003, April). Calibrating CAT pools and online pretest items using marginal maximum likelihood methods. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Stocking, M. L. (1988). Scale drift in online calibration (Research Report No. 88–28). Princeton, NJ: Educational Testing Service.
Wood, R. L., Wingersky, M. S., & Lord, F. M. (1976). LOGIST: A computer program for estimating examinee ability and item characteristic curve parameters (RM76-6) [Computer program]. Princeton, NJ: Educational Testing Service.
Session Video
Keywords: CAT; Pretest Item Calibration. Authors: Meng, Huijuan; Han, Chris. http://iacat.org/comparison-pretest-item-calibration-methods-computerized-adaptive-test-cat-0

2017. A Comparison of Three Empirical Reliability Estimates for Computerized Adaptive Testing. Niigata, Japan: Niigata Seiryo University, 08/2017.
Reliability estimates in computerized adaptive testing (CAT) are derived from estimated thetas and the standard errors of those estimates. In practice, the observed standard error (OSE) of an estimated theta can be obtained from the test information function for each examinee under item response theory (IRT). Unlike in classical test theory (CTT), OSEs in IRT are conditional on each estimated theta, so these values must be marginalized to assess test reliability. The arithmetic mean, the harmonic mean, and Jensen equality were applied to marginalize OSEs and estimate CAT reliability. Based on the different marginalization methods, three empirical CAT reliability estimates were compared with true reliability. Results showed that all three empirical estimates underestimated true reliability at short test lengths (< 40 items), whereas at longer test lengths (> 40) the magnitudes of the estimates were ordered: Jensen equality, harmonic mean, arithmetic mean. Specifically, Jensen equality overestimated true reliability across all conditions at test lengths above 50 items.
Session Video
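The marginalization step can be illustrated as follows. Because the paper's exact formulas are not reproduced here, the reliability expression and the stand-in for the Jensen-equality variant are assumptions, labeled as such in the comments.

# Marginalizing conditional OSEs into one empirical reliability (sketch):
# rel = 1 - (marginalized SE^2) / var(theta_hat) is a generic stand-in.
set.seed(6)
theta_hat <- rnorm(500)
se        <- runif(500, 0.25, 0.45)   # conditional OSEs from a CAT

marginal_se2 <- c(
  arithmetic = mean(se^2),
  harmonic   = 1 / mean(1 / se^2),
  geometric  = exp(mean(log(se^2)))   # placeholder for the Jensen-based variant
)
round(1 - marginal_se2 / var(theta_hat), 3)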
Keywords: CAT; Reliability. Author: Seo, Dong Gi. https://drive.google.com/file/d/1gXgH-epPIWJiE0LxMHGiCAxZZAwy4dAH/view?usp=sharing

2017. Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach. Niigata, Japan: Niigata Seiryo University, 08/2017.
In the past decade, cognitive diagnosis models (CDMs) of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). CDMs of educational test performance decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual's strengths and weaknesses within the ability domain. Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well researched in the context of large-scale assessments, this type of system has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied to parametric CD-CAT require large samples to guarantee the reliable calibration of item parameters and accurate estimation of examinees' attribute profiles. In response to this challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm, but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student's ability state as the test proceeds. Based on a student's initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows for an update of the skill profile, and the set of possible proficiency classes is further narrowed. In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms, and the differences were significant when the item parameter calibration was not optimal.
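The nonparametric classification step (Chiu & Douglas, 2013) reduces to a nearest-ideal-pattern rule, sketched below for a conjunctive (DINA-type) ideal response. The tiny Q-matrix is invented for illustration.

# Classify by minimal Hamming distance to ideal response patterns (sketch).
Q <- matrix(c(1,0, 0,1, 1,1, 1,0, 0,1), ncol = 2, byrow = TRUE)  # 5 items, 2 attributes
profiles <- as.matrix(expand.grid(0:1, 0:1))                     # all 2^K profiles

# Conjunctive ideal response: correct iff all required attributes are mastered.
ideal <- function(alpha) apply(Q, 1, function(q) as.integer(all(alpha >= q)))

classify <- function(y) {
  d <- apply(profiles, 1, function(alpha) sum(y != ideal(alpha)))
  profiles[which.min(d), ]
}

classify(c(1, 0, 0, 1, 0))  # closest attribute profile for this response vector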
References
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.
Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. Weiss (Ed.) Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.
Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250.
Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, Illinois.
Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172.
Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford.
Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.
Session Video
Keywords: CD-CAT; non-parametric approach. Authors: Chang, Yuan-Pei; Chiu, Chia-Yi; Tsai, Rung-Ching. http://iacat.org/computerized-adaptive-testing-cognitive-diagnosis-classroom-nonparametric-approach-0

2017. Concerto 5 Open Source CAT Platform: From Code to Nodes. Niigata, Japan: Niigata Seiryo University, 08/2017.
Concerto 5 is the newest version of the Concerto open-source R-based computerized adaptive testing platform, which is currently used in educational testing and in clinical trials. In our quest to make CAT accessible to all, the latest version uses flowchart nodes to connect different elements of a test, so that CAT test creation is an intuitive high-level process that does not require writing code.
A test creator might connect an Info Page node to a Consent Page node, to a CAT node, and to a Feedback node; after uploading their items, the test is done.
This talk will show the new flowchart interface, and demonstrate the creation of a CAT test from scratch in less than 10 minutes.
Concerto 5 also includes a new Polytomous CAT node, so CATs with Likert items can be easily created in the flowchart interface. This node is currently used in depression and anxiety tests in a clinical trial.
Session Video
Keywords: Concerto 5; Open Source CAT. Author: Stillwell, David. https://drive.google.com/open?id=11eu1KKILQEoK5c-CYO1P1AiJgiQxX0E0

2017. Developing a CAT: An Integrated Perspective. Niigata, Japan: Niigata Seiryo University, 08/2017.
Most resources on computerized adaptive testing (CAT) tend to focus on psychometric aspects such as mathematical formulae for item selection or ability estimation. However, development of a CAT assessment requires a holistic view of project management, financials, content development, product launch and branding, and more. This presentation will develop such a holistic view, which serves several purposes, including providing a framework for validity, estimating costs and ROI, and making better decisions regarding the psychometric aspects.
Thompson and Weiss (2011) presented a 5-step model for developing computerized adaptive tests (CATs). This model will be presented and discussed as the core of this holistic framework, then applied to real-life examples. While most CAT research focuses on developing new quantitative algorithms, this presentation is instead intended to help researchers evaluate and select the algorithms that are most appropriate for their needs. It is therefore ideal for practitioners who are familiar with the basics of item response theory and CAT and wish to explore how they might apply these methodologies to improve their assessments.
Steps include:
1. Feasibility, applicability, and planning studies
2. Develop item bank content or utilize existing bank
3. Pretest and calibrate item bank
4. Determine specifications for final CAT
5. Publish live CAT.
So, for example, Step 1 will contain simulation studies which estimate item bank requirements, which then can be used to determine costs of content development, which in turn can be integrated into an estimated project cost timeline. Such information is vital in determining if the CAT should even be developed in the first place.
References
Thompson, N. A., & Weiss, D. J. (2011). A Framework for the Development of Computerized Adaptive Tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from http://pareonline.net/getvn.asp?v=16&n=1.
Session Video
Keywords: CAT Development; integrated approach. Author: Thompson, Nathan. https://drive.google.com/open?id=1Jv8bpH2zkw5TqSMi03e5JJJ98QtXf-Cv

2017. The Development of a Web-Based CAT in China. Niigata, Japan: Niigata Seiryo University, 08/2017.
Cognitive ability assessment has been widely used as a recruitment tool for screening potential employees. Traditional cognitive ability tests face threats from item exposure and long administration times. In China especially, campus recruitment places a premium on short testing times and cheating prevention. Beisen, the largest domestic online assessment software provider, developed a web-based CAT for cognitive ability that assesses verbal, quantitative, logical, and spatial ability, in order to shorten testing time, improve assessment accuracy, and reduce threats from cheating and faking in online ability testing. The web-based test is convenient for examinees, who can access it easily over the internet by logging in to the test website at any time and place, through any internet-enabled device (e.g., laptops, iPads, and smartphones).
We designed the CAT around strategies for establishing the item bank, setting the starting point, selecting items, scoring, and terminating the test. Additionally, we paid close attention to administering the test via the web. For the CAT procedures, we employed online calibration to establish a stable and expanding item bank, and we integrated maximum Fisher information, an α-stratified strategy, and randomization for item selection and for coping with item exposure. Fixed-length and variable-length strategies were combined to terminate the test. To keep web-based testing fluid, we employed cloud computing techniques and designed each computing process carefully. Distributed computation executes EAP scoring and item selection at high speed. Caching all items on the servers in advance helps shorten the process of loading items onto examinees' terminal equipment. Horizontally scalable cloud servers cope with high concurrency. The massive computation in item selection was converted into a lookup in a precomputed item-information matrix table.
We examined average accuracy, bank usage, and computing performance under laboratory and live testing conditions. In a test of almost 28,000 examinees, we found that bank usage averaged 50%, and that 80% of tests terminated at a test information of 10, with an average of 9.6. Under high concurrency, testing is unhindered, and scoring plus item selection takes only 0.23 s per examinee on average.
Session Video
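The information-matrix-table idea can be sketched as a precompute-then-lookup routine. Pool size, grid resolution, and the exposure handling below are illustrative assumptions, not Beisen's production system.

# Precompute item information on a theta grid once; selection is a lookup.
set.seed(8)
pool <- data.frame(a = runif(500, 0.7, 2), b = rnorm(500))
grid <- seq(-4, 4, 0.1)

# rows = items, columns = grid points; computed offline and cached
info_table <- outer(seq_len(nrow(pool)), seq_along(grid),
                    function(i, g) {
                      p <- 1 / (1 + exp(-pool$a[i] * (grid[g] - pool$b[i])))
                      pool$a[i]^2 * p * (1 - p)
                    })

select_fast <- function(theta, administered) {
  g <- which.min(abs(grid - theta))   # nearest grid column
  cand <- info_table[, g]
  cand[administered] <- -Inf          # skip items already administered
  which.max(cand)
}
select_fast(0.42, administered = c(3, 17))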
Keywords: China; Web-Based CAT. Authors: Liang, Chongli; Wang, Danjun; Zhou, Dan; Zhan, Peida. http://iacat.org/development-web-based-cat-china

2017. Efficiency of Item Selection in CD-CAT Based on Conjunctive Bayesian Network Modeling Hierarchical Attributes. Niigata, Japan: Niigata Seiryo University, 08/2017.
Cognitive diagnosis models (CDMs) aim to diagnose an examinee's mastery status on multiple fine-grained skills. As new developments in cognitive diagnosis methods emerge, much attention has been given to cognitive diagnostic computerized adaptive testing (CD-CAT) as well. Topics such as item selection methods, item exposure control strategies, and online calibration methods, which have been well studied for traditional item response theory (IRT) based CAT, are also being investigated in the context of CD-CAT (e.g., Xu, Chang, & Douglas, 2003; Wang, Chang, & Huebner, 2011; Chen et al., 2012).
In the CDM framework, some researchers have suggested modeling structural relationships between cognitive skills, or attributes. In particular, attributes can be hierarchical, such that some attributes must be acquired before subsequent ones can be mastered. For example, in mathematics, addition must be mastered before multiplication, which gives a hierarchy model for the addition and multiplication skills. Recently, new CDMs considering attribute hierarchies have been suggested, including the Attribute Hierarchy Method (AHM; Leighton, Gierl, & Hunka, 2004) and the Hierarchical Diagnostic Classification Models (HDCM; Templin & Bradshaw, 2014).
Bayesian networks (BNs), probabilistic graphical models that represent the relationships among a set of random variables using a directed acyclic graph with conditional probability distributions, also provide an efficient framework for modeling the relationships between attributes (Culbertson, 2016). Among the various BNs, the conjunctive Bayesian network (CBN; Beerenwinkel, Eriksson, & Sturmfels, 2007) is a special kind of BN that assumes a partial ordering of event occurrences and conjunctive constraints between them.
In this study, we propose using the CBN to model attribute hierarchies and discuss the advantages of the CBN for CDM. We then explore the impact of CBN modeling on the efficiency of item selection methods for CD-CAT when the attributes are truly hierarchical. To this end, two simulation studies, one for fixed-length CAT and another for variable-length CAT, were conducted. For each study, two attribute hierarchy structures with 5 and 8 attributes were assumed. Among the various item selection methods developed for CD-CAT, six algorithms were considered: the posterior-weighted Kullback-Leibler index (PWKL; Cheng, 2009), the modified PWKL index (MPWKL; Kaplan, de la Torre, & Barrada, 2015), Shannon entropy (SHE; Tatsuoka, 2002), mutual information (MI; Wang, 2013), the posterior-weighted CDM discrimination index (PWCDI; Zheng & Chang, 2016), and the posterior-weighted attribute-level CDM discrimination index (PWACDI; Zheng & Chang, 2016). The impact of Q-matrix structure, item quality, and test termination rules on the efficiency of the item selection algorithms was also investigated. Evaluation measures included attribute classification accuracy (fixed-length experiment) and the test length of CD-CAT until stopping (variable-length experiment).
The results of the study indicate that the efficiency of item selection is improved by directly modeling the attribute hierarchies using the CBN. The test length required to reach the diagnosis probability threshold was reduced to 50-70% for CBN-based CAT compared to CD-CAT assuming independence of attributes. The magnitude of improvement was greater when the cognitive model of the test included more attributes and when the test length was shorter. We conclude by discussing how Q-matrix structure, item quality, and test termination rules affect the efficiency.
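The practical effect of conjunctive hierarchy constraints is to shrink the space of admissible attribute profiles before item selection even begins, as in this small sketch; the three-attribute hierarchy is invented for illustration.

# Enumerate attribute profiles consistent with a conjunctive hierarchy:
# attribute 1 is a prerequisite of attributes 2 and 3.
prereq <- rbind(c(1, 2), c(1, 3))              # parent -> child edges
profiles <- as.matrix(expand.grid(rep(list(0:1), 3)))

respects <- apply(profiles, 1, function(alpha)
  all(alpha[prereq[, 2]] <= alpha[prereq[, 1]]))  # child mastered => parent mastered

profiles[respects, ]   # 5 admissible profiles instead of 2^3 = 8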
References
Beerenwinkel, N., Eriksson, N., & Sturmfels, B. (2007). Conjunctive Bayesian networks. Bernoulli, 893-909.
Chen, P., Xin, T., Wang, C., & Chang, H. H. (2012). Online calibration methods for the DINA model with independent attributes in CD-CAT. Psychometrika, 77(2), 201-222.
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619-632.
Culbertson, M. J. (2016). Bayesian networks in educational assessment: the state of the field. Applied Psychological Measurement, 40(1), 3-21.
Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167-188.
Leighton, J. P., Gierl, M. J., & Hunka, S. M. (2004). The attribute hierarchy method for cognitive assessment: a variation on Tatsuoka's rule‐space approach. Journal of Educational Measurement, 41(3), 205-237.
Tatsuoka, C. (2002). Data analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 51(3), 337-350.
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317-339.
Wang, C. (2013). Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educational and Psychological Measurement, 73(6), 1017-1035.
Wang, C., Chang, H. H., & Huebner, A. (2011). Restrictive stochastic item selection methods in cognitive diagnostic computerized adaptive testing. Journal of Educational Measurement, 48(3), 255-273.
Xu, X., Chang, H., & Douglas, J. (2003, April). A simulation study to compare CAT strategies for cognitive diagnosis. Paper presented at the annual meeting of National Council on Measurement in Education, Chicago.
Zheng, C., & Chang, H. H. (2016). High-efficiency response distribution–based item selection algorithms for short-length cognitive diagnostic computerized adaptive testing. Applied Psychological Measurement, 40(8), 608-624.
Session Video
Keywords: CD-CAT; Conjunctive Bayesian Network Modeling; item selection. Authors: Han, Soo-Yun; Yoo, Yun Joo. https://drive.google.com/open?id=1RbO2gd4aULqsSgRi_VZudNN_edX82NeD

2017. Efficiency of Targeted Multistage Calibration Designs under Practical Constraints: A Simulation Study. Niigata, Japan: Niigata Seiryo University, 08/2017.
Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we focused on two related research questions. First, we investigated whether the efficiency of item calibration under the Rasch model could be enhanced by calibration designs that optimize the match between item difficulty and student ability (Berger, 1991). To this end, we introduced targeted multistage calibration designs, a design type that combines traditional targeted calibration designs with multistage designs. As such, targeted multistage calibration designs consider ability-related background variables (e.g., grade in school), as well as performance (i.e., the outcome of a preceding test stage), for assigning students to suitable items.
Second, we explored how limited a priori knowledge about item difficulty affects the efficiency of both targeted calibration designs and targeted multistage calibration designs. When arranging items within a given calibration design, test developers need to know the item difficulties to locate items optimally within the design. However, usually, no empirical information about item difficulty is available before item calibration. Owing to missing empirical data, test developers might fail to assign all items to the most suitable location within a calibration design.
Both research questions were addressed in a simulation study in which we varied the calibration design, as well as the accuracy of item distribution across the different booklets or modules within each design (i.e., the number of misplaced items). The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. In particular, targeted multistage calibration designs provided more accurate estimates for very easy and very difficult items. Limited knowledge about item difficulty during test construction impaired the efficiency of all designs. The loss of efficiency was considerably large for one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust.
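A targeted multistage assignment rule of the kind described might look like the following sketch; the grade bands, cut score, and module labels are invented, not the studied designs.

# Route a student to a calibration module using a background variable (grade)
# and a stage-1 score (sketch of targeted multistage assignment).
assign_module <- function(grade, stage1_score) {
  band  <- if (grade <= 4) "low" else if (grade <= 6) "mid" else "high"
  route <- if (stage1_score >= 7) "harder" else "easier"
  paste(band, route, sep = "-")   # e.g., "mid-harder" module of pretest items
}
assign_module(grade = 5, stage1_score = 8)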
References
Berger, M. P. F. (1991). On the efficiency of IRT models when applied to different sampling designs. Applied Psychological Measurement, 15(3), 293–306. doi:10.1177/014662169101500310
Session Video
Keywords: CAT; Efficiency; Multistage Calibration. Authors: Berger, Stephanie; Verschoor, Angela J.; Eggen, Theo; Moser, Urs. https://drive.google.com/file/d/1ko2LuiARKqsjL_6aupO4Pj9zgk6p_xhd/view?usp=sharing

2017. Evaluation of Parameter Recovery, Drift, and DIF with CAT Data. Niigata, Japan: Niigata Seiryo University, 08/2017.
Parameter drift and differential item functioning (DIF) analyses are frequent components of a test maintenance plan. That is, after a test form is published, organizations will often calibrate post-publishing data at a later date to evaluate whether the performance of the items or the test has changed over time. For example, if item content is leaked, the items might gradually become easier over time, and item statistics or parameters can reflect this.
When tests are published under a computerized adaptive testing (CAT) paradigm, they are nearly always calibrated with item response theory (IRT). IRT calibrations assume that range restriction is not an issue, that is, that each item is administered to a wide range of examinee ability. CAT data violate this assumption. However, some organizations still wish to evaluate the continuing performance of the items from a DIF or drift perspective.
This presentation will evaluate just how inaccurate DIF and drift analyses might be on CAT data, using a Monte Carlo parameter recovery methodology. Known item parameters will be used to generate both linear and CAT data sets, which are then calibrated for DIF and drift. In addition, we will implement Randomesque item exposure constraints in some CAT conditions, as this randomization directly alleviates the range restriction problem somewhat, but it is an empirical question as to whether this improves the parameter recovery calibrations.
Session Video
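The range-restriction problem motivating the study can be made concrete in a few lines: under maximum-information selection, the examinees who see a given item cluster near its difficulty, so the calibration sample for that item has a sharply reduced ability spread. The window width below is an arbitrary stand-in for adaptive selection.

# Range restriction in CAT data (sketch).
set.seed(11)
theta <- rnorm(5000)
b_item <- 0.5
linear_takers <- theta                              # linear test: everyone sees the item
cat_takers    <- theta[abs(theta - b_item) < 0.5]   # CAT: mostly nearby abilities
c(sd_linear = sd(linear_takers), sd_cat = sd(cat_takers))
# Calibrating on the restricted sample yields unstable slope estimates,
# which is what a Monte Carlo recovery study can quantify.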
Keywords: CAT; DIF; Parameter Drift; Parameter Recovery. Authors: Thompson, Nathan; Stoeger, Jordan. https://drive.google.com/open?id=1F7HCZWD28Q97sCKFIJB0Yps0H66NPeKq

2017. From Blueprints to Systems: An Integrated Approach to Adaptive Testing. Niigata, Japan: Niigata Seiryo University, 08/2017.
For years, test blueprints have told test developers how many items, and what types of items, will be included in a test. Adaptive testing adopted this approach from paper testing, and it is reasonably useful. Unfortunately, "how many items and what types of items" are not all the elements one should consider when choosing items for an adaptive test. To fill the gaps, practitioners have developed tools to make an adaptive test behave appropriately (e.g., exposure control, content balancing, and item drift procedures). Each of these tools involves a separate process external to the primary item selection process.
The use of these subsidiary processes makes item selection less optimal and makes it difficult to prioritize aspects of selection. This discussion describes systems-based adaptive testing. This approach uses metadata concerning items, test takers and test elements to select items. These elements are weighted by the stakeholders to shape an expanded blueprint designed for adaptive testing.
Session Video
Keywords: CAT; integrated approach; Keynote. Authors: Kingsbury, Gage; Zara, Tony. https://drive.google.com/open?id=1CBaAfH4ES7XivmvrMjPeKyFCsFZOpQMJ

2017. How Adaptive is an Adaptive Test: Are all Adaptive Tests Adaptive? Niigata, Japan: Niigata Seiryo University, 08/2017.
There are many different kinds of adaptive tests, but they all share the characteristic that some feature of the test is customized to the purpose of the test. In the time allotted, it is impossible to consider the adaptation of all of these types, so this address will focus on the "classic" adaptive test that matches the difficulty of the test to the capabilities of the person being tested. The address will first present information on the maximum level of adaptation that can occur and then compare the amount of adaptation that typically occurs on an operational adaptive test to that maximum. An index is proposed to summarize the amount of adaptation, and it is argued that this type of index should be reported for operational adaptive tests to show how much adaptation typically occurs.
Presentation Video
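One plausible form such an adaptation index could take (a guess at the flavor of the proposal, not its actual definition) correlates each examinee's mean administered difficulty with ability: values near 1 indicate strong adaptation, values near 0 a fixed form.

# Illustrative adaptation index (hypothetical definition, for intuition only).
adaptation_index <- function(b_administered, theta_hat) {
  b_mean <- sapply(b_administered, mean)   # one mean difficulty per examinee
  cor(b_mean, theta_hat)                   # near 1 = strongly adaptive
}
set.seed(12)
theta_hat <- rnorm(100)
b_admin <- lapply(theta_hat, function(t) rnorm(20, mean = t, sd = 0.3))
adaptation_index(b_admin, theta_hat)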
Keywords: Adaptive Testing; CAT. Author: Reckase, Mark D. https://drive.google.com/open?id=1Nj-zDCKk3DvHA4Jlp1qkb2XovmHeQfxu

2017. An Imputation Approach to Handling Incomplete Computerized Tests. Niigata, Japan: Niigata Seiryo University, 08/2017.
As technology advances, computerized adaptive testing (CAT) is becoming increasingly popular, as it allows tests to be tailored to an examinee's ability. Nevertheless, examinees might devise testing strategies to use CAT to their advantage. For instance, if only the items that examinees answer count toward their score, then a higher theta score might be obtained by spending more time on items at the beginning of the test and skipping items at the end if time runs out. This type of gaming can be discouraged if examinees' scores are lowered or "penalized" based on the amount of non-response.
The goal of this study was to devise a penalty function that would meet two criteria: 1) the greater the omit rate, the greater the penalty; and 2) examinees with the same ability and the same omit rate should receive the same penalty. To create the penalty, theta was first calculated based on only the items the examinee responded to. Next, the expected number-correct score (EXR) was obtained using that theta and the test characteristic curve. A penalized expected number-correct score was obtained by multiplying EXR by the proportion of items the examinee responded to. Finally, the penalized theta was identified from the test characteristic curve. Based on the penalized theta and the item parameters of an unanswered item, the likelihood of a correct response is computed and employed to estimate the imputed score for the unanswered item.
Two datasets were used to generate tests with completion rates of 50%, 80%, and 90%. The first dataset included real data in which approximately 4,500 examinees responded to a 21-item test, which provided a baseline/truth. Sampling was done to achieve the three completion-rate conditions. The second dataset consisted of simulated item scores for 50,000 simulees under a 1-2-4 multistage CAT design in which each stage contained seven items. Imputed item scores for unanswered items were computed using a variety of values for G (and therefore T). Three other approaches to handling unanswered items were also considered: all correct (i.e., T = 0), all incorrect (i.e., T = 1), and random scoring (i.e., T = 0.5).
The current study investigated the impact on theta estimates resulting from the proposed approach to handling unanswered items in a fixed-length CAT. In real testing situations, when examinees do not finish a test, it is hard to tell whether they tried diligently but ran out of time or whether they attempted to manipulate the scoring engine. To handle unfinished tests with penalties, the proposed approach considers examinees’ abilities and incompletion rates. The results of this study provide direction for psychometric practitioners when considering penalties for omitted responses.
Session Video
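The penalty can be traced end to end in a small sketch. The function names, the 2PL item parameters, and the root-finding bounds are reconstructions, since the abstract's original notation was lost in extraction.

# Penalized theta via the test characteristic curve (TCC), as described above.
set.seed(13)
a <- runif(21, 0.8, 2); b <- rnorm(21)      # 21-item test, 2PL assumed
tcc <- function(theta) sum(1 / (1 + exp(-a * (theta - b))))

penalized_theta <- function(theta_r, n_answered, n_total = 21) {
  exr   <- tcc(theta_r)                     # expected number correct at theta_r
  exr_p <- exr * n_answered / n_total       # penalty: scale by completion rate
  uniroot(function(th) tcc(th) - exr_p, c(-6, 6))$root   # invert the TCC
}
penalized_theta(theta_r = 0.8, n_answered = 14)          # ~67% completion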
Keywords: CAT; imputation approach; incomplete computerized test. Authors: Chen, Troy; Huang, Chi-Yu; Liu, Chunyan. https://drive.google.com/open?id=1vznZeO3nsZZK0k6_oyw5c9ZTP8uyGnXh

2017. Issues in Trait Range Coverage for Patient Reported Outcome Measure CATs - Extending the Ceiling for Above-average Physical Functioning. Niigata, Japan: Niigata Seiryo University, 08/2017.
The use of a measure that fails to cover the upper range of functioning may produce results that lead to serious misinterpretation. Scores produced by such a measure may fail to recognize significant improvement, or may not be able to demonstrate functioning commensurate with an important milestone. Accurate measurement of this range is critical for the assessment of physically active adults, e.g., athletes recovering from injury and active military personnel who wish to return to active service. Alternatively, a physical function (PF) measure with a low ceiling might fail to differentiate patients in rehabilitation who continue to improve but whose scores hit the ceiling of the measure used.
The assessment of PF has greatly benefited from modern psychometric theory and resulting scales, such as the Patient-Reported Outcomes Measurement Information System (PROMIS®) PF instruments. While PROMIS PF has extended the range of function upwards relative to older "legacy" instruments, few PROMIS PF items assess high levels of function. We report here on the development of higher-functioning items for the PROMIS PF bank.
An expert panel representing orthopedics, sports/military medicine, and rehabilitation reviewed existing instruments and wrote new items. After internal review, cognitive interviews were conducted with 24 individuals of average and high levels of physical function. The remaining candidate items were administered, along with 50 existing PROMIS anchor items, to an internet panel screened for low, average, and high levels of physical function (N = 1,600), as well as to members of Boston-area gyms (N = 344). The resulting data were subjected to standard psychometric analysis, along with multiple linking methods to place the new items on the existing PF metric. The new items were added to the full PF bank for simulated computerized adaptive testing (CAT).
Item response data were collected on 54 candidate items. Items that exhibited local dependence (LD) or differential item functioning (DIF) related to gender, age, race, education, or PF status were removed from consideration. Of the 50 existing PROMIS PF items, 31 were free of DIF and LD and were used as anchors. The parameters for the remaining new candidate items were estimated twice: freely estimated and then linked with coefficients, and with fixed-anchor calibration. Both methods were comparable and had appropriate fit. The new items were added to the full PF bank for simulated CATs.
Extending the range of items by which PF is measured will substantially improve measurement quality, applicability, and efficiency. The bank has incorporated these extension items and is available for use in research and clinics for brief CAT administration (see www.healthmeasures.net). Future research projects should focus on recovery trajectories of the measure for individuals with above average function who are recovering from injury.
Session Video
Keywords: CAT; Issues; Patient Reported Outcome. Author: Gershon, Richard C. https://drive.google.com/open?id=1ZC02F-dIyYovEjzpeuRdoXDiXMLFRuKb

2017. Item Pool Design and Evaluation. Niigata, Japan: Niigata Seiryo University, 08/2017.
Early work on CAT tended to use existing sets of items that came from fixed-length test forms. These sets of items were selected to meet much different requirements than are needed for a CAT, such as decision making or covering a content domain. However, there was also some early work suggesting that items be equally distributed over the range of proficiency of interest, or concentrated at a decision point. Other early work showed that there was bias in proficiency estimates when an item pool was too easy or too hard. These early findings eventually led to work on item pool design and, more recently, on item pool evaluation. This presentation gives a brief overview of these topics to provide context for the following presentations in this symposium.
Session Video
Keywords: CAT; Item Pool Design. Authors: Reckase, Mark D.; He, Wei; Xu, Jing-Ru; Zhou, Xuechun. https://drive.google.com/open?id=1ZAsqm1yNZlliqxEHcyyqQ_vOSu20xxZs

2017. Item Response Time on Task Effect in CAT. Niigata, Japan: Niigata Seiryo University, 08/2017.
Introduction. In addition to reduced test length and increased measurement efficiency, computerized adaptive testing (CAT) can provide new insights into the cognitive process of task completion that cannot be mined via conventional tests. Response time is a primary characteristic of the task-completion procedure. It has the potential to inform us about underlying processes. In this study, the relationship between response time and response accuracy will be investigated.
Hypothesis. The present study argues that the relationship between response time on task and response accuracy, which may be positive, negative, or curvilinear, will depend on the cognitive nature of the task items, holding the ability of the subjects and the difficulty of the items constant. The interpretations of the associations are not uniform either.
Research question. Is there a homogeneous effect of response time on test outcome across Graduate Record Examination (GRE) quantitative and verbal tests?
Proposed explanations. If the accuracy of cognitive test responses decreases with response time, then it is an indication that the underlying cognitive process is a degrading process such as knowledge retrieval. More accessible knowledge can be retrieved faster than less accessible knowledge; it is inherent to knowledge retrieval that the success rate declines with elapsing response time. For instance, in reading tasks, the time-on-task effect is negative, and the more negative it is, the easier the task. However, if the accuracy of cognitive test responses increases with response time, then the process is of an upgrading nature, with an increasing success rate as a function of response time. For example, problem solving takes time, and fast responses are less likely to be well-founded responses. It is of course also possible that the relationship is curvilinear, as when an increasing success rate is followed by a decreasing success rate or vice versa.
Methodology. The data are from computer-based GRE quantitative and verbal tests and will be analyzed within a generalized linear mixed model (GLMM) framework, after controlling for ability and item difficulty as possible confounding factors. Here, a linear model means a linear combination of predictors that determines the probability of person p answering item i correctly. The models are equivalent to advanced IRT models that go beyond the regular modeling of test responses in terms of one or more latent variables and item parameters. The lme4 package for R will be used for the statistical computation.
Implications. The right amount of testing time in CAT is important—too much is wasteful and costly, too little undermines score validity. The study is expected to provide new insight into the relationship between response time and response accuracy, which in turn contributes to a better understanding of time effects and the relevant cognitive processes in CAT.
Session Video
10aCAT10aResponse time10aTask effect1 aShi, Yang uhttp://iacat.org/item-response-time-task-effect-cat01708nas a2200145 4500008004100000245006200041210006200103260005500165520122100220653000801441653001401449653003001463100002101493856004801514 2017 eng d00aItem Selection Strategies for Developing CAT in Indonesia0 aItem Selection Strategies for Developing CAT in Indonesia aNiigata, JapanbNiigata Seiryo Universityc08/20173 aRecently, the development of computerized testing in Indonesia has become quite promising. Many government institutions have used the technology for recruitment. Since the Indonesian Army acknowledged the benefits of computerized adaptive testing (CAT) over conventional test administration, the issue of selecting the first item has attracted attention. Given CAT's basic philosophy, several methods can be used to select the first item, such as educational level, ability estimates from item simulation, or other methods. The question remains how to apply these methods most effectively in the context of constrained adaptive testing. This paper reviews such strategies as they appear in the relevant literature, focusing on studies conducted to evaluate the effectiveness of item selection strategies for dichotomous scoring. It also discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented; rather, a compendium of models is reviewed from a newcomer's perspective, offering a wide view of first-item selection strategies.
10aCAT10aIndonesia10aitem selection strategies1 aChandra, Istiani uhttps://www.youtube.com/watch?v=2KuFrRATq9Q03826nas a2200157 4500008004100000245008500041210006900126260005500195520325800250653000803508653002203516653002303538100001603561700002003577856007103597 2017 eng d00aA Large-Scale Progress Monitoring Application with Computerized Adaptive Testing0 aLargeScale Progress Monitoring Application with Computerized Ada aNiigata, JapanbNiigata Seiryo Universityc08/20173 aMany conventional assessment tools are available to teachers in schools for monitoring student progress in a formative manner. The outcomes of these assessment tools are essential to teachers’ instructional modifications and schools’ data-driven educational strategies, such as using remedial activities and planning instructional interventions for students with learning difficulties. When measuring student progress toward instructional goals or outcomes, assessments should be not only considerably precise but also sensitive to individual change in learning. Unlike conventional paper-pencil assessments that are usually not appropriate for every student, computerized adaptive tests (CATs) are highly capable of estimating growth consistently with minimum and consistent error. Therefore, CATs can be used as a progress monitoring tool in measuring student growth.
This study focuses on an operational CAT assessment that has been used for measuring student growth in reading during the academic school year. The sample of this study consists of nearly 7 million students from the 1st grade to the 12th grade in the US. The students received a CAT-based reading assessment periodically during the school year. The purpose of these periodical assessments is to measure the growth in students’ reading achievement and identify the students who may need additional instructional support (e.g., academic interventions). Using real data, this study aims to address the following research questions: (1) How many CAT administrations are necessary to make psychometrically sound decisions about the need for instructional changes in the classroom or when to provide academic interventions?; (2) What is the ideal amount of time between CAT administrations to capture student growth for the purpose of producing meaningful decisions from assessment results?
To address these research questions, we first used the Theil-Sen estimator for robustly fitting a regression line to each student’s test scores obtained from a series of CAT administrations. Next, we used the conditional standard error of measurement (cSEM) from the CAT administrations to create an error band around the Theil-Sen slope (i.e., student growth rate). This process resulted in the normative slope values across all the grade levels. The optimal number of CAT administrations was established from grade-level regression results. The amount of time needed for progress monitoring was determined by calculating the amount of time required for a student to show growth beyond the median cSEM value for each grade level. The results showed that the normative slope values were the highest for lower grades and declined steadily as grade level increased. The results also suggested that the CAT-based reading assessment is most useful for grades 1 through 4, since most struggling readers requiring an intervention appear to be within this grade range. Because CAT yielded very similar cSEM values across administrations, the amount of error in the progress monitoring decisions did not seem to depend on the number of CAT administrations.
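A compact R sketch of the growth-rate logic described above: the Theil-Sen slope is the median of all pairwise slopes, and a cSEM band determines when observed growth becomes meaningful. The scores, time points, and cSEM value below are hypothetical:

theil_sen <- function(t, y) {
  pairs  <- combn(length(t), 2)                       # all pairs of occasions
  slopes <- (y[pairs[2, ]] - y[pairs[1, ]]) / (t[pairs[2, ]] - t[pairs[1, ]])
  median(slopes)                                      # robust growth rate
}
t <- c(0, 6, 12, 18, 24)         # weeks at which the CAT was administered
y <- c(205, 212, 210, 221, 226)  # scale scores from the successive CATs
csem <- 7                        # median conditional SEM across administrations
slope <- theil_sen(t, y)         # points per week
csem / slope                     # weeks needed for growth to exceed the cSEM band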
Session Video
10aCAT10aLarge-Scale tests10aProcess monitoring1 aBulut, Okan1 aCormier, Damien uhttps://drive.google.com/open?id=1uGbCKenRLnqTxImX1fZicR2c7GRV6Udc00666nas a2200193 4500008003900000022001400039245008000053210006900133300001000202490000600212653004000218653002600258653003300284653002600317653002100343100002200364700002600386856006000412 2017 d a2165-659200aLatent-Class-Based Item Selection for Computerized Adaptive Progress Tests0 aLatentClassBased Item Selection for Computerized Adaptive Progre a22-430 v510acomputerized adaptive progress test10aitem selection method10aKullback-Leibler information10aLatent class analysis10alog-odds scoring1 avan Buuren, Nikky1 aEggen, Theo, J. H. M. uhttp://iacat.org/jcat/index.php/jcat/article/view/62/2902365nas a2200277 4500008004100000245007100041210006900112260005500181520152900236653000801765653001501773653002801788100001501816700002101831700001501852700001401867700001601881700001501897700001601912700001601928700001601944700001801960700001901978700001901997856007102016 2017 eng d00aNew Challenges (With Solutions) and Innovative Applications of CAT0 aNew Challenges With Solutions and Innovative Applications of CAT aNiigata, JapanbNiigata Seiryo Universityc08/20173 aOver the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.
Session Video
10aCAT10achallenges10ainnovative applications1 aWang, Chun1 aWeiss, David, J.1 aZhang, Xue1 aTao, Jian1 aHe, Yinhong1 aChen, Ping1 aWang, Shiyu1 aZhang, Susu1 aLin, Haiyan1 aGao, Xiaohong1 aChang, Hua-Hua1 aShang, Zhuoran uhttps://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa02579nas a2200157 4500008004100000245011700041210006900158260005500227520193000282653001102212653001902223653002702242100001802269700001902287856011502306 2017 eng d00aA New Cognitive Diagnostic Computerized Adaptive Testing for Simultaneously Diagnosing Skills and Misconceptions0 aNew Cognitive Diagnostic Computerized Adaptive Testing for Simul aNiigata, JapanbNiigata Seiryo Universityc08/20173 aIn educational diagnosis, diagnosing misconceptions is as important as diagnosing skills. However, traditional cognitive diagnostic computerized adaptive testing (CD-CAT) is usually developed to diagnose skills only. This study proposes a new CD-CAT that can simultaneously diagnose skills and misconceptions. The proposed CD-CAT is based on a recently published CDM, the simultaneously identifying skills and misconceptions (SISM) model (Kuo, Chen, & de la Torre, in press). A new item selection algorithm is also proposed for the new CD-CAT to achieve high adaptive testing performance. In simulation studies, we compare our new item selection algorithm with three existing item selection methods: the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) methods proposed by Cheng (2009) and the modified PWKL (MPWKL) proposed by Kaplan, de la Torre, and Barrada (2015). The results show that our proposed CD-CAT can efficiently diagnose skills and misconceptions; the accuracy of our new item selection algorithm is close to that of the MPWKL but with less computational burden; and our new item selection algorithm outperforms the KL and PWKL methods in diagnosing skills and misconceptions.
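The comparison above centers on KL-based item selection indices. The following short R sketch shows the posterior-weighted KL (PWKL) index of Cheng (2009) for dichotomous items; the latent-class response probability matrix and the posterior are hypothetical inputs assumed to lie strictly between 0 and 1:

pwkl_select <- function(post, P, admin, est) {
  # post: posterior over latent classes; P[j, c] = P(correct | item j, class c)
  # admin: items already administered; est: index of the current MAP class
  idx <- rep(-Inf, nrow(P))
  for (j in setdiff(seq_len(nrow(P)), admin)) {
    p0 <- P[j, est]
    kl <- p0 * log(p0 / P[j, ]) + (1 - p0) * log((1 - p0) / (1 - P[j, ]))
    idx[j] <- sum(post * kl)    # posterior-weighted KL divergence
  }
  which.max(idx)                # item maximizing the PWKL index
}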
References
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. doi: 10.1007/s11336-009-9123-2
Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. doi:10.1177/0146621614554650
Kuo, B.-C., Chen, C.-H., & de la Torre, J. (in press). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement.
Session Video
10aCD-CAT10aMisconceptions10aSimultaneous diagnosis1 aKuo, Bor-Chen1 aChen, Chun-Hua uhttp://iacat.org/new-cognitive-diagnostic-computerized-adaptive-testing-simultaneously-diagnosing-skills-and-003804nas a2200169 4500008004100000245008600041210006900127260005500196520318900251653000903440653000803449653002503457100002503482700001703507700001703524856009303541 2017 eng d00aNew Results on Bias in Estimates due to Discontinue Rules in Intelligence Testing0 aNew Results on Bias in Estimates due to Discontinue Rules in Int aNiigata, JapanbNiigata Seiryo Universityc08/20173 aThe presentation provides new results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty, and the presentation of items is adaptive in the sense that each subtest session is discontinued once a test taker produces a certain number of incorrect responses in sequence. The subsequent (not observed) responses are commonly scored as wrong for that subtest, even though the test taker has not seen them. Discontinuation rules allow a certain form of adaptiveness in both paper-based and computer-based testing, and help reduce testing time.
Two relevant lines of research are studies that directly assess the impact of discontinuation rules, and studies that more broadly look at the impact of scoring rules on test results with a large number of not-administered or not-reached items. He & Wolfe (2012) compared different ability estimation methods for this type of discontinuation-rule adaptation of test length in a simulation study. However, to our knowledge there has been no rigorous analytical study of the underlying distributional changes of the response variables under discontinuation rules. It is important to point out that the results obtained by He & Wolfe (2012) agree with results presented by, for example, DeAyala, Plake & Impara (2001) as well as Rose, von Davier & Xu (2010) and Rose, von Davier & Nagengast (2016) in that ability estimates are biased most when the not-observed responses are scored as wrong. Discontinuation rules combined with scoring the non-administered items as wrong are used operationally in several major intelligence tests, so more research is needed in order to improve this particular type of adaptiveness in testing practice.
The presentation extends existing research on adaptiveness by discontinue-rules in intelligence tests in multiple ways: First, a rigorous analytical study of the distributional properties of discontinue-rule scored items is presented. Second, an extended simulation is presented that includes additional alternative scoring rules as well as bias-corrected ability estimators that may be suitable to improve results for discontinue-rule scored intelligence tests.
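In the spirit of the extended simulations described above, a small R sketch shows how scoring unseen items as wrong after a discontinue rule can bias the ability estimate downward; the Rasch model and all constants here are illustrative assumptions:

set.seed(1)
n_items <- 30
b <- sort(rnorm(n_items))               # items in order of increasing difficulty
theta <- 0.5
x <- rbinom(n_items, 1, plogis(theta - b))
cons <- 0; stop_at <- n_items
for (i in seq_len(n_items)) {           # discontinue after 3 consecutive errors
  cons <- if (x[i] == 0) cons + 1 else 0
  if (cons == 3) { stop_at <- i; break }
}
x_disc <- x
if (stop_at < n_items) x_disc[(stop_at + 1):n_items] <- 0   # score unseen as wrong
nll <- function(th, resp) -sum(dbinom(resp, 1, plogis(th - b), log = TRUE))
c(full         = optimize(nll, c(-4, 4), resp = x)$minimum,
  discontinued = optimize(nll, c(-4, 4), resp = x_disc)$minimum)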
References
DeAyala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213-234.
He, W., & Wolfe, E. W. (2012). Treatment of not-administered items on individually administered intelligence tests. Educational and Psychological Measurement, 72(5), 808-826. doi:10.1177/0013164412441937
Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.
Rose, N., von Davier, M., & Nagengast, B. (2016). Modeling omitted and not-reached items in IRT models. Psychometrika. doi:10.1007/s11336-016-9544-7
Session Video
10aBias10aCAT10aIntelligence Testing1 avon Davier, Matthias1 aCho, Youngmi1 aPan, Tianshu uhttp://iacat.org/new-results-bias-estimates-due-discontinue-rules-intelligence-testing-003772nas a2200145 4500008004100000245007300041210006900114260005500183520325500238653000803493653002203501653001803523100001403541856007103555 2017 eng d00aResponse Time and Response Accuracy in Computerized Adaptive Testing0 aResponse Time and Response Accuracy in Computerized Adaptive Tes aNiigata, JapanbNiigata Seiryo Universityc08/20173 aIntroduction. This study explores the relationship between response speed and response accuracy in Computerized Adaptive Testing (CAT). CAT provides a score as well as item response times, which can offer additional diagnostic information regarding behavioral processes of task completion that cannot be uncovered by paper-based instruments. The goal of this study is to investigate how the accuracy rate evolves as a function of response time. If the accuracy of cognitive test responses decreases with response time, then it is an indication that the underlying cognitive process is a degrading process such as knowledge retrieval. More accessible knowledge can be retrieved faster than less accessible knowledge. For instance, in reading tasks, the time on task effect is negative and the more negative, the easier a task is. However, if the accuracy of cognitive test responses increases with response time, then the process is of an upgrading nature, with an increasing success rate as a function of response time. For example, problem-solving takes time, and fast responses are less likely to be well-founded responses. It is of course also possible that the relationship is curvilinear, as when an increasing success rate is followed by a decreasing success rate or vice versa.
Hypothesis. The present study argues that the relationship between response time on task and response accuracy can be positive, negative, or curvilinear, depending on the cognitive nature of the task items, holding the ability of the subjects and the difficulty of the items constant.
Methodology. Data from a subsection of the GRE quantitative test were available. We will use generalized linear mixed models (GLMMs). A linear model here means a linear combination of predictors determining the probability of person p answering item i correctly. Modeling mixed effects means both random effects and fixed effects are included; fixed effects are constants across test takers. The models are equivalent to advanced IRT models that go beyond the regular modeling of test responses in terms of one or more latent variables and item parameters. The lme4 package for R will be used for the statistical calculations.
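One way to pose the curvilinearity question in this framework is to fit linear and quadratic time effects with lme4 and compare them with a likelihood-ratio test. A minimal sketch follows; the simulated data stand in for the GRE responses, and all names and effect sizes are illustrative assumptions:

library(lme4)
set.seed(3)
n_p <- 150; n_i <- 20
person <- rep(seq_len(n_p), each = n_i); item <- rep(seq_len(n_i), times = n_p)
theta <- rnorm(n_p); b <- rnorm(n_i); z_rt <- rnorm(n_p * n_i)
prob <- plogis(theta[person] - b[item] - 0.2 * z_rt + 0.1 * z_rt^2)
d <- data.frame(correct = rbinom(n_p * n_i, 1, prob), z_rt,
                person = factor(person), item = factor(item))
m_lin  <- glmer(correct ~ z_rt + (1 | person) + (1 | item),
                data = d, family = binomial)
m_quad <- glmer(correct ~ z_rt + I(z_rt^2) + (1 | person) + (1 | item),
                data = d, family = binomial)
anova(m_lin, m_quad)   # LR test: does the quadratic term improve fit?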
Research questions. 1. What is the relationship between response accuracy and response speed? 2. What is the correlation between response accuracy and type of response time (fast vs. slow responses) after controlling for ability?
Preliminary Findings. 1. There is a negative relationship between response time and response accuracy: the success rate declines with elapsing response time. 2. The correlation between the two response latent variables (fast and slow) is 1.0, indicating that the time-on-task effects do not differ between response time types.
Implications. The right amount of testing time in CAT is important—too much is wasteful and costly, too little undermines score validity. The study is expected to provide new insight into the relationship between response time and response accuracy, which in turn contributes to the best timing strategy in CAT—with or without time constraints.
Session Video
10aCAT10aresponse accuracy10aResponse time1 aShi, Yang uhttps://drive.google.com/open?id=1yYP01bzGrKvJnfLwepcAoQQ2F4TdSvZ203598nas a2200169 4500008004100000245004300041210004100084260005500125520306600180653000803246653002303254653002303277100001703300700002003317700002003337856007103357 2017 eng d00aScripted On-the-fly Multistage Testing0 aScripted Onthefly Multistage Testing aNiigata, JapanbNiigata Seiryo Universityc08/20173 aOn-the-fly multistage testing (OMST) was introduced recently as a promising alternative to preassembled MST. A decidedly appealing feature of both is the reviewability of items within the current stage. However, the fundamental difference is that, instead of routing to a preassembled module, OMST adaptively assembles a module at each stage according to an interim ability estimate. This produces more individualized forms with finer measurement precision, but imposing nonstatistical constraints and controlling item exposure become more cumbersome. One recommendation is to use the maximum priority index followed by a remediation step to satisfy content constraints, and the Sympson-Hetter method with a stratified item bank for exposure control.
However, these methods can be computationally expensive, thereby impeding practical implementation. Therefore, this study investigated the script method as a simpler solution to the challenge of strict content balancing and effective item exposure control in OMST. The script method was originally devised as an item selection algorithm for CAT and generally proceeds as follows: For a test with m items, there are m slots to be filled, and an item is selected according to pre-defined rules for each slot. For the first slot, randomly select an item from a designated content area (collection). For each subsequent slot, 1) Discard any enemies of items already administered in previous slots; 2) Draw a designated number of candidate items (selection length) from the designated collection according to the current ability estimate; 3) Randomly select one item from the set of candidates. There are two distinct features of the script method. First, a predetermined sequence of collections guarantees meeting content specifications. The specific ordering may be determined either randomly or deliberately by content experts. Second, steps 2 and 3 depict a method of exposure control, in which selection length balances item usage at the possible expense of ability estimation accuracy. The adaptation of the script method to OMST is straightforward. For the first module, randomly select each item from a designated collection. For each subsequent module, the process is the same as in scripted CAT (SCAT) except the same ability estimate is used for the selection of all items within the module. A series of simulations was conducted to evaluate the performance of scripted OMST (SOMST, with 3 or 4 evenly divided stages) relative to SCAT under various item exposure restrictions. In all conditions, reliability was maximized by programming an optimization algorithm that searches for the smallest possible selection length for each slot within the constraints. Preliminary results indicated that SOMST is certainly a capable design with performance comparable to that of SCAT. The encouraging findings and ease of implementation highly motivate the prospect of operational use for large-scale assessments.
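A condensed R sketch of one slot of the script method as described above; the item bank structure (columns a and b for 2PL parameters, a collection label, and a list column of enemy items) and the constants are assumptions for illustration:

script_select <- function(bank, slot_collection, sel_len, theta, used) {
  cand <- setdiff(which(bank$collection == slot_collection), used)
  cand <- setdiff(cand, unique(unlist(bank$enemies[used])))   # step 1: drop enemies
  p    <- plogis(1.7 * bank$a[cand] * (theta - bank$b[cand]))
  info <- (1.7 * bank$a[cand])^2 * p * (1 - p)                # 2PL Fisher information
  top  <- cand[order(info, decreasing = TRUE)][seq_len(min(sel_len, length(cand)))]
  # steps 2-3: random pick among the sel_len most informative candidates
  if (length(top) == 1) top else sample(top, 1)
}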
Presentation Video
10aCAT10amultistage testing10aOn-the-fly testing1 aChoe, Edison1 aWilliams, Bruce1 aLee, Sung-Hyuck uhttps://drive.google.com/open?id=1wKuAstITLXo6BM4APf2mPsth1BymNl-y02148nas a2200157 4500008004100000245008800041210006900129260005400198520153900252653002901791653001101820100001901831700002001850700001801870856010201888 2017 eng d00aUsing Bayesian Decision Theory in Cognitive Diagnosis Computerized Adaptive Testing0 aUsing Bayesian Decision Theory in Cognitive Diagnosis Computeriz aNiigata, JapanbNiigata Seiryo Universityc08/20173 aCognitive diagnosis computerized adaptive testing (CD-CAT) purports to provide each individual a profile of the strengths and weaknesses of attributes or skills via computerized adaptive testing. In the CD-CAT literature, researchers have dedicated themselves to evolving item selection algorithms to improve measurement efficiency, and most algorithms were developed based on information theory. Given the discontinuous nature of the latent variables in CD-CAT, this study introduced an alternative for item selection, called the minimum expected cost (MEC) method, which was derived from Bayesian decision theory. Using simulations, the MEC method was evaluated against the posterior-weighted Kullback-Leibler (PWKL) information, the modified PWKL (MPWKL), and the mutual information (MI) methods by manipulating item bank quality, item selection algorithm, and termination rule. Results indicated that, regardless of item quality and termination criterion, the MEC, MPWKL, and MI methods performed very similarly and all outperformed the PWKL method in classification accuracy and test efficiency, especially in short tests; the MEC method had more efficient item bank usage than the MPWKL and MI methods. Moreover, the MEC method can take the costs of incorrect decisions into account and improve classification accuracy and test efficiency when a particular profile is of concern. All the results suggest the practicability of the MEC method in CD-CAT.
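A schematic R sketch of a minimum expected cost selection step under a 0/1 misclassification loss, one way to instantiate the Bayesian decision-theoretic idea described above; the response-probability matrix and posterior are hypothetical, and the published MEC method may differ in detail:

mec_select <- function(post, P, admin) {
  # post: posterior over attribute profiles; P[j, c] = P(correct | item j, profile c)
  ec <- rep(Inf, nrow(P))
  for (j in setdiff(seq_len(nrow(P)), admin)) {
    p1 <- sum(post * P[j, ])                      # predictive P(correct response)
    ec_j <- 0
    for (x in c(1, 0)) {
      like   <- if (x == 1) P[j, ] else 1 - P[j, ]
      px     <- if (x == 1) p1 else 1 - p1
      post_x <- post * like / sum(post * like)    # updated posterior given x
      ec_j   <- ec_j + px * (1 - max(post_x))     # P(misclassify) under MAP rule
    }
    ec[j] <- ec_j
  }
  which.min(ec)   # item with minimum expected cost
}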
Session Video
10aBayesian Decision Theory10aCD-CAT1 aHsu, Chia-Ling1 aWang, Wen-Chung1 aChen, ShuYing uhttp://iacat.org/using-bayesian-decision-theory-cognitive-diagnosis-computerized-adaptive-testing04515nas a2200181 4500008004100000245010800041210006900149260005500218520381200273653000804085653002404093653002604117100001604143700001204159700001604171700002004187856012604207 2017 eng d00aUsing Computerized Adaptive Testing to Detect Students’ Misconceptions: Exploration of Item Selection0 aUsing Computerized Adaptive Testing to Detect Students Misconcep aNiigata, JapanbNiigata Seiryo Universityc08/20173 aHolding misconceptions impedes learning; thus, detecting misconceptions through assessments is crucial to facilitating teaching. However, most computerized adaptive testing (CAT) applications that diagnose examinees’ attribute profiles focus on whether examinees have mastered correct concepts or not. In educational scenarios, teachers and students have to figure out the misconceptions underlying incorrect answers after obtaining the scores from assessments and then correct the corresponding misconceptions. The Scaling Individuals and Classifying Misconceptions (SICM) models proposed by Bradshaw and Templin (2014) fill this gap. SICM models can identify a student’s misconceptions directly from the distractors of multiple-choice questions and report whether he or she holds the misconceptions or not. Simultaneously, SICM models are able to estimate a continuous ability within the item response theory (IRT) framework to fulfill the needs of policy-driven assessment systems that rely on scaling examinees’ ability. However, the advantage of providing estimates for two types of latent variables also increases the complexity of model estimation: more items are required to achieve the same accuracies for classification and estimation compared to dichotomous DCMs and to IRT, respectively. Thus, we aim to develop a CAT using the SICM models (SICM-CAT) to estimate students’ misconceptions and continuous abilities simultaneously using fewer items than a linear test.
To achieve this goal, our research questions focus on establishing several item selection rules that aim to provide both accurate classification results and accurate continuous ability estimates in the SICM-CAT. The first research question is which information criterion to use. The Kullback–Leibler (KL) divergence is the first choice, as it can naturally combine the continuous and discrete latent variables. Based on this criterion, we propose an item selection index that integrates the two types of information. With this index, the items selected in real time could discriminate the examinee’s current misconception profile and ability estimates from other possible estimates to the greatest extent. The second research question is how to adaptively balance the estimation of the misconception profile and the continuous latent ability. Mimicking the idea of the Hybrid Design proposed by Wang et al. (2016), we propose a design framework in which item selection transitions from the group level to the item level. We aim to explore several design questions, such as how to select the transition point and which latent variable estimation should be targeted first.
Preliminary results indicated that, under all simulation conditions, the SICM-CAT based on the proposed item selection index classified examinees into latent classes and measured their latent abilities more accurately and reliably than the random selection method. As a next step, we plan to compare different CAT designs based on our proposed item selection rules with the best linear test. We expect the SICM-CAT to achieve a shorter test length while retaining the same accuracy and reliability.
References
Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79(3), 403-425.
Wang, S., Lin, H., Chang, H. H., & Douglas, J. (2016). Hybrid computerized adaptive testing: from group sequential design to fully sequential design. Journal of Educational Measurement, 53(1), 45-62.
Session Video
10aCAT10aincorrect answering10aStudent Misconception1 aShen, Yawei1 aBao, Yu1 aWang, Shiyu1 aBradshaw, Laine uhttp://iacat.org/using-computerized-adaptive-testing-detect-students%E2%80%99-misconceptions-exploration-item-selection-000516nas a2200193 4500008004500000022001400045245004500059210004200104300000900146490000600155653001300161653001500174653001300189653001200202653001100214653001200225100002100237856006400258 2015 Engldsh a2165-659200aImplementing a CAT: The AMC Experience 0 aImplementing a CAT The AMC Experience a1-120 v310aadaptive10aAssessment10acomputer10amedical10aonline10aTesting1 aBarnard, John, J uhttp://www.iacat.org/jcat/index.php/jcat/article/view/52/2500689nas a2200193 4500008004500000022001400045245012100059210006900180300001000249490000600259653003100265653002300296653002200319653003200341653002500373653001800398100001500416856006400431 2014 Engldsh a2165-659200aDetecting Item Preknowledge in Computerized Adaptive Testing Using Information Theory and Combinatorial Optimization0 aDetecting Item Preknowledge in Computerized Adaptive Testing Usi a37-580 v210acombinatorial optimization10ahypothesis testing10aitem preknowledge10aKullback-Leibler divergence10asimulated annealing.10atest security1 aBelov, D I uhttp://www.iacat.org/jcat/index.php/jcat/article/view/36/1800637nas a2200181 4500008003900000022001400039245011500053210006900168300001100237490000700248653001800255653002900273653003000302653004200332100002200374700001800396856004100414 2013 d a1745-399200aThe Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments0 aPhilosophical Aspects of IRT Equating Modeling Drift to Evaluate a2–140 v3210acohort growth10aconstruct-relevant drift10aevaluation of scale drift10aphilosophical aspects of IRT equating1 aTaherbhai, Husein1 aSeo, Daeryong uhttp://dx.doi.org/10.1111/emip.1200000623nas a2200169 4500008004100000245008900041210006900130260001200199653000800211653002500219653002700244653002100271653001200292100002300304700001600327856011000343 2011 eng d00aAdaptive Item Calibration and Norming: Unique Considerations of a Global Deployment0 aAdaptive Item Calibration and Norming Unique Considerations of a c10/201110aCAT10acommon item equating10aFigural Reasoning Test10aitem calibration10anorming1 aSchwall, Alexander1 aSinar, Evan uhttp://iacat.org/content/adaptive-item-calibration-and-norming-%0Bunique-considerations-global-deployment00490nas a2200121 4500008004100000245009500041210006900136653001800205653000800223653000900231100001900240856010900259 2011 eng d00aBuilding Affordable CD-CAT Systems for Schools To Address Today's Challenges In Assessment0 aBuilding Affordable CDCAT Systems for Schools To Address Todays 10aaffordability10aCAT10acost1 aChang, Hua-Hua uhttp://iacat.org/content/building-affordable-cd-cat-systems-schools-address-todays-challenges-assessment01161nas a2200157 4500008004100000245005700041210005600098520065200154653002100806653003400827653001500861653002500876100001300901700001500914856007400929 2011 eng d00acatR: An R Package for Computerized Adaptive Testing0 acatR An R Package for Computerized Adaptive Testing3 aComputerized adaptive testing (CAT) is an active current research field in psychometrics and educational measurement. However, there is very little software available to handle such adaptive tasks. The R package catR was developed to perform adaptive testing with as much flexibility as possible, in an attempt to provide a developmental and testing platform to the interested user. 
Several item-selection rules and ability estimators are implemented. The item bank can be provided by the user or randomly generated from parent distributions of item parameters. Three stopping rules are available. The output can be graphically displayed.
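A brief usage sketch: genDichoMatrix and randomCAT are functions from the catR package, but the exact arguments shown here are assumptions to be checked against the package documentation:

library(catR)
bank <- genDichoMatrix(items = 200, model = "2PL")    # simulated 2PL item bank
res  <- randomCAT(trueTheta = 1, itemBank = bank,
                  test = list(method = "ML", itemSelect = "MFI"),
                  stop = list(rule = "precision", thr = 0.30))
res$thFinal   # final ability estimate from the simulated adaptive test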
10acomputer program10acomputerized adaptive testing10aEstimation10aItem Response Theory1 aMagis, D1 aRaîche, G uhttp://iacat.org/content/catr-r-package-computerized-adaptive-testing01629nas a2200145 4500008004100000245005200041210005000093260001200143520119300155653000801348653001601356653002001372100002301392856006801415 2011 eng d00aContinuous Testing (an avenue for CAT research)0 aContinuous Testing an avenue for CAT research c10/20113 aPublishing an Adaptive Test
- Write Items
- Field Test Items
- Select an Operational Pool
- Publish Pool
- Distribute Pool
- Administer Tests
- Rinse and Repeat
Problems with Publishing
- The static pool
  - Restricts content
  - Restricts psychometrics
  - Restricts corrections
  - Structures item exposure
- The publication process
  - Creates logistic problems
  - Slows response to new content
Item filtration – An alternative to test publishing
- Realtime test creation
- Begins with entire item pool
- Removes items until one item is chosen to administer
Research Questions
- Since filtering adds constraints, what is the psychometric impact?
- Can we use filtering to control item drift?
- How does filtering interact with item selection?
10aCAT10aitem filter10aitem filtration1 aKingsbury, Gage, G uhttp://iacat.org/content/continuous-testing-avenue-cat-research01227nas a2200193 4500008004100000245009400041210006900135260001200204520052700216653002600743653000800769653000800777653003100785653003200816653003000848100002000878700001900898856011600917 2011 eng d00aDetecting DIF between Conventional and Computerized Adaptive Testing: A Monte Carlo Study0 aDetecting DIF between Conventional and Computerized Adaptive Tes c10/20113 aTwo procedures, Modified Robust Z and the 95% Credible Interval, were compared in a Monte Carlo study. Both procedures evidenced adequate control of false-positive DIF results.
- Exception: low-difficulty items (< -2.5 logits).
- Not significantly affected by the percentage of DIF items.
- Was affected by mean trait level difference.
- The 95% Credible Interval evidenced slightly higher power to detect DIF, but also a higher false-positive rate (see the robust Z sketch below).
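A sketch of one common form of robust Z computation for difficulty drift between two calibrations; the 0.74-scaled IQR and the 1.96 cutoff follow general robust-Z practice and are assumptions here, not necessarily the exact variant used in this study:

robust_z <- function(b_conv, b_cat) {
  d <- b_cat - b_conv                    # item difficulty differences
  (d - median(d)) / (0.74 * IQR(d))      # robust standardization
}
set.seed(4)
b_conv <- rnorm(60)
b_cat  <- b_conv + rnorm(60, 0, 0.1)
b_cat[5] <- b_cat[5] + 1                 # one item with simulated drift/DIF
which(abs(robust_z(b_conv, b_cat)) > 1.96)   # flagged items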
10a95% Credible Interval10aCAT10aDIF10adifferential item function10amodified robust Z statistic10aMonte Carlo methodologies1 aRiley, Barth, B1 aCarle, Adam, C uhttp://iacat.org/content/detecting-dif-between-conventional-and-computerized-adaptive-testing-monte-carlo-study02527nas a2200205 4500008004100000245011900041210006900160260001200229520179400241653000802035653000802043653003402051653003002085653000802115653003102123653001602154653001302170100002002183856011802203 2011 eng d00aFrom Reliability to Validity: Expanding Adaptive Testing Practice to Find the Most Valid Score for Each Test Taker0 aFrom Reliability to Validity Expanding Adaptive Testing Practice c10/20113 aCAT is an exception to the traditional conception of validity. It is one of the few examples of individualized testing. Item difficulty is tailored to each examinee. The intent, however, is increased efficiency. Focus on reliability (reduced standard error); Equivalence with paper & pencil tests is valued; Validity is enhanced through improved reliability.
How Else Might We Individualize Testing Using CAT?
- By addressing construct-irrelevant factors influencing individual test scores (usually in negatively biased ways).
- Individual Score Validity (ISV) – how free a particular score is from construct-irrelevant factors (often called construct-irrelevant variance, or CIV).
An ISV-Based View of Validity
- Test Event – an examinee encounters a series of items in a particular context.
- All 3 elements (examinee, items, context) are potential sources of CIV. Examples:
  - Test anxiety (examinee)
  - Amount/difficulty of reading required (item)
  - Test stakes (context)
- ISV can be affected by all 3 elements.
CAT Goal: individualize testing to address CIV threats to score validity (i.e., maximize ISV).
Some Research Issues:
- What are some innovative methods for expanding CAT that address ISV threats while preserving measurement of the target construct?
- How might CAT help address the ISV challenges posed by test anxiety?
- How should policy-makers deal with scores that have been shown to have low ISV?
10aCAT10aCIV10aconstruct-irrelevant variance10aIndividual Score Validity10aISV10alow test taking motivation10aReliability10avalidity1 aWise, Steven, L uhttp://iacat.org/content/reliability-validity-expanding-adaptive-testing-practice-find-most-valid-score-each-test00478nas a2200145 4500008004100000245006100041210005900102653000800161653001600169653001300185100001900198700001600217700002200233856007700255 2011 eng d00aA Heuristic Of CAT Item Selection Procedure For Testlets0 aHeuristic Of CAT Item Selection Procedure For Testlets10aCAT10ashadow test10atestlets1 aChien, Yuehmei1 aShin, David1 aWay, Walter Denny uhttp://iacat.org/content/heuristic-cat-item-selection-procedure-testlets00534nas a2200121 4500008004100000245011800041210006900159653000800228653002400236653001100260100002000271856012100291 2011 eng d00aHigh-throughput Health Status Measurement using CAT in the Era of Personal Genomics: Opportunities and Challenges0 aHighthroughput Health Status Measurement using CAT in the Era of10aCAT10ahealth applications10aPROMIS1 aKrishnan, Eswar uhttp://iacat.org/content/high-throughput-health-status-measurement-using-cat-era-personal-genomics-opportunities-and01247nas a2200181 4500008004100000245012100041210006900162260001200231520055600243653003300799653000800832653001900840653003500859100001800894700001600912700002200928856011500950 2011 eng d00aItem Selection Methods based on Multiple Objective Approaches for Classification of Respondents into Multiple Levels0 aItem Selection Methods based on Multiple Objective Approaches fo c10/20113 aIs it possible to develop new item selection methods which take advantage of the fact that we want to classify into multiple categories? New methods: Taking multiple points on the ability scale into account; Based on multiple objective approaches.
Conclusions
- Sequential Classification Tests: higher ATL (average test length) than Adaptive Classification Tests
- Sequential Classification Tests: slightly lower PCD (proportion of correct decisions) than Adaptive Classification Tests
- Results also hold with three and four cutting points
10aadaptive classification test10aCAT10aitem selection10asequential classification test1 aGroen, Maaike1 aEggen, Theo1 aVeldkamp, Bernard uhttp://iacat.org/content/item-selection-methods-based-multiple-objective-approaches-classification-respondents01253nas a2200181 4500008004100000245008300041210006900124260001200193520063300205653000800838653002600846653000900872653003500881653001600916653001400932100002300946856010200969 2011 eng d00aMoving beyond Efficiency to Allow CAT to Provide Better Diagnostic Information0 aMoving beyond Efficiency to Allow CAT to Provide Better Diagnost c10/20113 a
Future CATs will provide better diagnostic information to
- Examinees
- Regulators, Educators, Employers
- Test Developers
This goal will be accomplished by
- Smart CATs which collect additional information during the test
- Psychomagic
The time is now for Reporting
10aCAT10adiagnostic information10aMIRT10aMultiple unidimensional scales10apsychomagic10asmart CAT1 aBontempo, Brian, D uhttp://iacat.org/content/moving-beyond-efficiency-allow-cat-provide-better-diagnostic-information01099nas a2200169 4500008004100000245006600041210006600107260001200173520055300185653002600738653000800764653002100772653001700793653001000810100002200820856008700842 2011 eng d00aOptimal Calibration Designs for Computerized Adaptive Testing0 aOptimal Calibration Designs for Computerized Adaptive Testing c10/20113 aOptimization
How can we exploit the advantages of Balanced Block Design while keeping the logistics manageable?
- Maximize number of item pairs
- Subject to maximum number of test booklets
- Subject to other constraints
Homogeneous Designs: overlap between test booklets as regular as possible
Conclusions:
- Establish overlaps as regular as possible between all test booklets
- Or, at least as many test booklets as possible
10abalanced block design10aCAT10aitem calibration10aoptimization10aRasch1 aVerschoor, Angela uhttp://iacat.org/content/optimal-calibration-designs-computerized-adaptive-testing01303nas a2200133 4500008004100000245005000041210004800091260001200139520089400151653000801045653003501053100001201088856006901100 2011 eng d00aA Paradigm for Multinational Adaptive Testing0 aParadigm for Multinational Adaptive Testing c10/20113 aImpact of Issues in “Exported” Adaptive Testing
- Exam content issues
- Validity issues/construct differences
- Method bias
- DIF (item performance)
- Interpretation of scores – “what do they mean?”
Goal is construct equivalency in the new environment
Research Questions
- How can we assure that the constructs being measured across the geographies are equivalent?
- Do we need to assure that the educational opportunities are “equivalent” across geographies?
- Do standard CAT content balancing constraints work?
- Computerized testing is common in the US; is there a need to re-visit the basic research on modality comparability in different geographies?
10aCAT10amultinational adaptive testing1 aZara, A uhttp://iacat.org/content/paradigm-multinational-adaptive-testing00684nas a2200217 4500008004100000245006000041210005800101260001200159653001700171653001700188653000800205653001500213653002500228653003200253653001700285100001800302700002100320700001600341700002400357856008500381 2011 eng d00aPractitioner’s Approach to Identify Item Drift in CAT0 aPractitioner s Approach to Identify Item Drift in CAT c10/201110aCUSUM method10aG2 statistic10aIPA10aitem drift10aitem parameter drift10aLord's chi-square statistic10aRaju's NCDIF1 aMeng, Huijuan1 aSteinkamp, Susan1 aJones, Paul1 aMatthews-Lopez, Joy uhttp://iacat.org/content/practitioner%E2%80%99s-approach-identify-item-drift-cat00313nas a2200109 4500008004100000245003200041210003100073653000800104653001600112100001800128856005700146 2011 eng d00aSmall-Sample Shadow Testing0 aSmallSample Shadow Testing10aCAT10ashadow test1 aJudd, Wallace uhttp://iacat.org/content/small-sample-shadow-testing00777nas a2200205 4500008004100000245003400041210003200075260001200107520027300119653000800392653000800400653002300408653001000431653001200441653000800453100002200461700001900483700001600502856005300518 2011 eng d00aA Test Assembly Model for MST0 aTest Assembly Model for MST c10/20113 aThis study is a short exploration of the optimization of an MST. It is extremely hard, perhaps impossible, to chart the influence of the item pool and test specifications on the optimization process. Simulations are very helpful in finding an acceptable MST.
10aCAT10amst10amultistage testing10aRasch10arouting10atif1 aVerschoor, Angela1 aRadtke, Ingrid1 aEggen, Theo uhttp://iacat.org/content/test-assembly-model-mst02116nas a2200193 4500008004100000245007900041210006900120260001200189520146800201653002801669653000801697653001801705100002001723700001701743700002301760700002301783700002301806856009301829 2011 eng d00aThe Use of Decision Trees for Adaptive Item Selection and Score Estimation0 aUse of Decision Trees for Adaptive Item Selection and Score Esti c10/20113 aThis study conducted post-hoc simulations comparing the relative efficiency and precision of decision trees (using CHAID and CART) vs. IRT-based CAT.
- Measure: Global Appraisal of Individual Needs (GAIN) Substance Problem Scale (16 items)
  - Past-year symptom count (SPSy)
  - Recency of symptom scale (SPSr)
Conclusions
- Decision tree methods were more efficient than CAT
  - CART for dichotomous items (SPSy)
  - CHAID for polytomous items (SPSr)
- Score bias was low in all conditions, particularly for decision trees using dichotomous items
- In early stages of administration, decision trees provided slightly higher correlations with the full scale and lower RMSE values.
But...
- CAT outperformed decision tree methods in later stages of administration.
- CAT also outperformed decision trees with respect to sensitivity to group differences as measured by effect size.
Conclusions
- CAT selects items based on two criteria: item location relative to the current estimate of theta, and item discrimination.
- Decision trees select items that best discriminate between groups defined by the total score.
- CAT is optimal only when trait level is well estimated.
- Findings suggest that combining decision-tree item selection in early stages with CAT item selection later may be advantageous (see the sketch below).
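A toy R sketch of the decision-tree side of this comparison: a CART model (via the rpart package) that finds the items best separating low and high total-score groups. The simulated scale and the median-split grouping rule are illustrative assumptions:

library(rpart)
set.seed(2)
n <- 500; J <- 16
theta <- rnorm(n); b <- seq(-2, 2, length.out = J)
X <- sapply(b, function(bj) rbinom(n, 1, plogis(theta - bj)))   # item responses
colnames(X) <- paste0("item", seq_len(J))
group <- factor(rowSums(X) >= median(rowSums(X)), labels = c("low", "high"))
dat  <- data.frame(group, X)
tree <- rpart(group ~ ., data = dat, method = "class")
tree   # the splits show which items best discriminate the score groups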
10aadaptive item selection10aCAT10adecision tree1 aRiley, Barth, B1 aFunk, Rodney1 aDennis, Michael, L1 aLennox, Richard, D1 aFinkelman, Matthew uhttp://iacat.org/content/use-decision-trees-adaptive-item-selection-and-score-estimation01388nas a2200145 4500008004100000245007100041210006900112260001200181520089200193653000801085653001801093653001701111100002601128856008801154 2011 eng d00aWalking the Tightrope: Using Better Content Control to Improve CAT0 aWalking the Tightrope Using Better Content Control to Improve CA c10/20113 aAll testing involves a balance between measurement precision and content considerations. CAT item-selection algorithms have evolved to accommodate content considerations. The presentation reviews CAT evolution, including: original/“pure” adaptive exams; constrained CAT; the weighted-deviations method; the Shadow-Test Approach; testlets instead of fully adaptive tests; and item relationships (administration of one item may preclude the administration of other items).
Research Questions
- How do the (new) item-selection algorithms perform in terms of measurement precision? Item exposure? Item-pool usage?
- Do we know the best way to incorporate item sets into adaptive exams?
- Consider exams with multiple item types. Must the item types be treated separately in a CAT, or can we adapt exams across item types?
10aCAT10aCAT evolution10atest content1 aGialluca, Kathleen, A uhttp://iacat.org/content/walking-tightrope-using-better-content-control-improve-cat01380nas a2200133 4500008004100000245006000041210006000101300001200161490000700173520093200180653003401112100001801146856008201164 2010 eng d00aBayesian item selection in constrained adaptive testing0 aBayesian item selection in constrained adaptive testing a149-1690 v313 aApplication of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item selection process. The Shadow Test Approach is a general-purpose algorithm for administering constrained CAT. In this paper it is shown how the approach can be slightly modified to handle Bayesian item selection criteria. No differences in performance were found between the shadow test approach and the modified approach. In a simulation study of the LSAT, the effects of Bayesian item selection criteria are illustrated. The results are compared to item selection based on Fisher information. General recommendations about the use of Bayesian item selection criteria are provided.
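A simplified R sketch of a CUSUM person-fit check of the kind referred to in this record; the residual definition and the threshold h are illustrative assumptions (operational CUSUM charts typically also subtract a reference value):

cusum_flag <- function(x, p, h = 0.5) {
  # x: 0/1 responses in administration order; p: model probabilities at theta-hat
  r <- (x - p) / length(x)                 # per-item residual contribution
  cp <- cm <- 0; flag <- FALSE
  for (i in seq_along(x)) {
    cp <- max(0, cp + r[i])                # upper CUSUM: better than expected
    cm <- min(0, cm + r[i])                # lower CUSUM: worse than expected
    if (cp > h || cm < -h) flag <- TRUE
  }
  flag                                     # TRUE signals a possibly aberrant pattern
}
set.seed(5)
p <- runif(30, 0.4, 0.8)
x_ok  <- rbinom(30, 1, p)                                        # consistent pattern
x_odd <- c(rbinom(15, 1, p[1:15]), rbinom(15, 1, 1 - p[16:30]))  # aberrant second half
c(ok = cusum_flag(x_ok, p), odd = cusum_flag(x_odd, p))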
Combined information from the CUSUM, other personality measures, and interviews showed that similar estimated trait values may have a different interpretation. Implications for computer-based assessment are discussed.10aCAT10acomputerized adaptive testing10aCUSUM approach10aperson fit1 aEgberink, I J L1 aMeijer, R R1 aVeldkamp, B P1 aSchakel, L1 aSmid, N G uhttp://iacat.org/content/detection-aberrant-item-score-patterns-computerized-adaptive-testing-empirical-example-using03099nas a2200445 4500008004100000020004100041245012000082210006900202250001500271260001000286300001100296490000700307520175400314653003802068653002102106653001002127653000902137653002202146653002802168653003302196653001102229653001102240653000902251653001602260653001802276653001902294653003102313653003102344653001602375100001602391700001002407700001402417700001502431700001402446700001502460700001802475700002402493700001802517856011802535 2010 eng d a0161-8105 (Print)0161-8105 (Linking)00aDevelopment and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments0 aDevelopment and validation of patientreported outcome measures f a2010/06/17 cJun 1 a781-920 v333 aSTUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Measurement Information System), a National Institutes of Health Roadmap initiative. Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders.
CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions.10a*Outcome Assessment (Health Care)10a*Self Disclosure10aAdult10aAged10aAged, 80 and over10aCross-Sectional Studies10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aPsychometrics10aQuestionnaires10aReproducibility of Results10aSleep Disorders/*diagnosis10aYoung Adult1 aBuysse, D J1 aYu, L1 aMoul, D E1 aGermain, A1 aStover, A1 aDodds, N E1 aJohnston, K L1 aShablesky-Cade, M A1 aPilkonis, P A uhttp://iacat.org/content/development-and-validation-patient-reported-outcome-measures-sleep-disturbance-and-sleep01507nas a2200217 4500008004100000245008100041210006900122300001200191490000700203520078900210653001100999653003401010653002201044653003501066653002101101653002101122100001901143700001401162700001601176856009701192 2010 eng d00aItem Selection and Hypothesis Testing for the Adaptive Measurement of Change0 aItem Selection and Hypothesis Testing for the Adaptive Measureme a238-2540 v343 aAssessing individual change is an important topic in both psychological and educational measurement. An adaptive measurement of change (AMC) method had previously been shown to exhibit greater efficiency in detecting change than conventional nonadaptive methods. However, little work had been done to compare different procedures within the AMC framework. This study introduced a new item selection criterion and two new test statistics for detecting change with AMC that were specifically designed for the paradigm of hypothesis testing. In two simulation sets, the new methods for detecting significant change improved on existing procedures by demonstrating better adherence to Type I error rates and substantially better power for detecting relatively small change.
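The hypothesis-testing paradigm mentioned in this abstract can be illustrated with the familiar Z-test for change between two CAT occasions; the theta estimates and standard errors below are assumed inputs, and this is a generic sketch rather than the authors' new statistics:

amc_z_test <- function(theta1, se1, theta2, se2, alpha = 0.05) {
  z <- (theta2 - theta1) / sqrt(se1^2 + se2^2)   # standardized change
  p <- 2 * pnorm(-abs(z))                        # two-sided p-value
  list(z = z, p = p, significant_change = p < alpha)
}
amc_z_test(theta1 = -0.2, se1 = 0.30, theta2 = 0.5, se2 = 0.28)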
10achange10acomputerized adaptive testing10aindividual change10aKullback–Leibler information10alikelihood ratio10ameasuring change1 aFinkelman, M D1 aWeiss, DJ1 aKim-Kang, G uhttp://iacat.org/content/item-selection-and-hypothesis-testing-adaptive-measurement-change-002745nas a2200409 4500008004100000020004600041245009400087210006900181250001500250260000800265300001200273490000700285520144500292653001501737653002001752653003101772653003001803653002001833653001901853653002601872653001101898653001101909653000901920653001601929653002601945653003701971653003002008653004402038653001802082653002002100653002802120100002002148700002302168700001602191700001702207856011102224 2009 eng d a1528-8447 (Electronic)1526-5900 (Linking)00aDevelopment and preliminary testing of a computerized adaptive assessment of chronic pain0 aDevelopment and preliminary testing of a computerized adaptive a a2009/07/15 cSep a932-9430 v103 aThe aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (kappa = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. 
The application has the potential to improve pain assessment and help clinicians manage chronic pain.10a*Computers10a*Questionnaires10aActivities of Daily Living10aAdaptation, Psychological10aChronic Disease10aCohort Studies10aDisability Evaluation10aFemale10aHumans10aMale10aMiddle Aged10aModels, Psychological10aOutcome Assessment (Health Care)10aPain Measurement/*methods10aPain, Intractable/*diagnosis/psychology10aPsychometrics10aQuality of Life10aUser-Computer Interface1 aAnatchkova, M D1 aSaris-Baglama, R N1 aKosinski, M1 aBjorner, J B uhttp://iacat.org/content/development-and-preliminary-testing-computerized-adaptive-assessment-chronic-pain02747nas a2200433 4500008004100000020004600041245012800087210006900215250001500284300001200299490000700311520139300318653003401711653001501745653001001760653000901770653002201779653002501801653001101826653001101837653000901848653001601857653001501873653003801888653001901926653003101945653002801976653004802004653002202052100002002074700001202094700001402106700001602120700001402136700001702150700001502167700001502182856011602197 2009 eng d a1878-5921 (Electronic)0895-4356 (Linking)00aAn evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception0 aevaluation of patientreported outcomes found computerized adapti a2008/07/22 a278-2870 v623 aOBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients, (N3) together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0+/-2.3 (M+/-SD) stress reaction items and 11.6+/-1.7 stress exposure items. The second simulation study reanalyzed real patients data (N1) and showed an average use of items of 5.6+/-2.1 for the dimension stress reaction and 10.0+/-4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. 
CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making.10a*Diagnosis, Computer-Assisted10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFemale10aHumans10aMale10aMiddle Aged10aPerception10aQuality of Health Care/*standards10aQuestionnaires10aReproducibility of Results10aSickness Impact Profile10aStress, Psychological/*diagnosis/psychology10aTreatment Outcome1 aKocalevent, R D1 aRose, M1 aBecker, J1 aWalter, O B1 aFliege, H1 aBjorner, J B1 aKleiber, D1 aKlapp, B F uhttp://iacat.org/content/evaluation-patient-reported-outcomes-found-computerized-adaptive-testing-was-efficient03143nas a2200457 4500008004100000020004100041245015500082210006900237250001500306260000800321300001200329490000700341520174100348653002502089653001902114653002502133653003002158653001502188653003602203653001002239653002102249653003302270653001102303653001102314653000902325653001802334653001702352653001902369653001602388100001502404700001002419700001502429700002502444700001802469700001702487700001602504700001602520700001402536700001602550856011902566 2009 eng d a0962-9343 (Print)0962-9343 (Linking)00aMeasuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing0 aMeasuring global physical health in children with cerebral palsy a2009/02/18 cApr a359-3700 v183 aPURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores. CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. 
The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner.10a*Computer Simulation10a*Health Status10a*Models, Statistical10aAdaptation, Psychological10aAdolescent10aCerebral Palsy/*physiopathology10aChild10aChild, Preschool10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMassachusetts10aPennsylvania10aQuestionnaires10aYoung Adult1 aHaley, S M1 aNi, P1 aDumas, H M1 aFragala-Pinkham, M A1 aHambleton, RK1 aMontpetit, K1 aBilodeau, N1 aGorton, G E1 aWatson, K1 aTucker, C A uhttp://iacat.org/content/measuring-global-physical-health-children-cerebral-palsy-illustration-multidimensional-bi02431nas a2200385 4500008004100000020004100041245009300082210006900175250001500244260000800259300001100267490000700278520128100285653003201566653002701598653002001625653002901645653001001674653000901684653001901693653003401712653001101746653001101757653000901768653001601777653004601793100001501839700001001854700001501864700001101879700001201890700001401902700001601916856011301932 2009 eng d a0962-9343 (Print)0962-9343 (Linking)00aReplenishing a computerized adaptive test of patient-reported daily activity functioning0 aReplenishing a computerized adaptive test of patientreported dai a2009/03/17 cMay a461-710 v183 aPURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent (ICC = 0.90 versus 0.96) than DA-CAT-1 when compared with the full item bank. TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. 
CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT.10a*Activities of Daily Living10a*Disability Evaluation10a*Questionnaires10a*User-Computer Interface10aAdult10aAged10aCohort Studies10aComputer-Assisted Instruction10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods1 aHaley, S M1 aNi, P1 aJette, A M1 aTao, W1 aMoed, R1 aMeyers, D1 aLudlow, L H uhttp://iacat.org/content/replenishing-computerized-adaptive-test-patient-reported-daily-activity-functioning03432nas a2200481 4500008004100000020004600041245013800087210006900225250001500294260000800309300001200317490000700329520191400336653002702250653002302277653003102300653001502331653001602346653001002362653002102372653002402393653002302417653003802440653001102478653002202489653001102511653001102522653000902533653003702542653002102579653003102600653002602631653001702657653003202674653001602706653002802722100001602750700001502766700001002781700001502791700002502806856011902831 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aAssessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing selfcare and social function using a computer adaptive a2008/04/01 cApr a622-6290 v893 aOBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. 
CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time.10a*Disability Evaluation10a*Social Adjustment10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild, Preschool10aComputer Simulation10aCross-Over Studies10aDisabled Children/*rehabilitation10aFemale10aFollow-Up Studies10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)10aReference Values10aReproducibility of Results10aRetrospective Studies10aRisk Factors10aSelf Care/*standards/trends10aSex Factors10aSickness Impact Profile1 aCoster, W J1 aHaley, S M1 aNi, P1 aDumas, H M1 aFragala-Pinkham, M A uhttp://iacat.org/content/assessing-self-care-and-social-function-using-computer-adaptive-testing-version-pediatric01431nas a2200169 4500008003900000245009000039210006900129300001000198490000800208520081300216653002901029653003601058653003001094100001501124700001201139856011001151 2008 d00aComputer Adaptive-Attribute Testing: A New Approach to Cognitive Diagnostic Assessment0 aComputer AdaptiveAttribute Testing A New Approach to Cognitive D a29-390 v2163 aThe influence of interdisciplinary forces stemming from developments in cognitive science, mathematical statistics, educational
psychology, and computing science is beginning to appear in educational and psychological assessment. Computer adaptive-attribute testing (CA-AT) is one example. The concepts and procedures in CA-AT can be found at the intersection between computer adaptive testing and cognitive diagnostic assessment. CA-AT allows us to fuse the administrative benefits of computer adaptive testing with the psychological benefits of cognitive diagnostic assessment to produce an innovative psychologically-based adaptive testing approach. We describe the concepts behind CA-AT as well as illustrate how it can be used to promote formative, computer-based classroom assessment.
10acognition and assessment10acognitive diagnostic assessment10acomputer adaptive testing1 aGierl, M J1 aZhou, J uhttp://iacat.org/content/computer-adaptive-attribute-testing-new-approach-cognitive-diagnostic-assessment03037nas a2200481 4500008004100000020004600041245012200087210006900209250001500278260000800293300001200301490000700313520155700320653003201877653003101909653002201940653002001962653001001982653000901992653002202001653002802023653003302051653001102084653001102095653002502106653000902131653001602140653004602156653002202202653002402224653003002248653002902278100001502307700001402322700001502336700002402351700001802375700001102393700001602404700001002420700001502430856011002445 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes0 aComputerized adaptive testing for followup after discharge from a2008/01/30 cFeb a275-2830 v893 aOBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. 
CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aGandek, B1 aSiebens, H1 aBlack-Schaffer, R M1 aSinclair, S J1 aTao, W1 aCoster, W J1 aNi, P1 aJette, A M uhttp://iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-ii01872nas a2200205 4500008003900000245005600039210005600095300001000151490000800161520124900169653002101418653003001439653002501469653001801494653002501512100001301537700001701550700002101567856007801588 2008 d00aComputerized Adaptive Testing of Personality Traits0 aComputerized Adaptive Testing of Personality Traits a12-210 v2163 aA computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a
conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective
Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction of the number of used items could be realized (at the stopping rule of 0.4 about 33% of the 36 items were used).
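The SE-based stopping rule this abstract describes can be illustrated with a minimal simulation: administer the most informative remaining item, update an EAP estimate, and stop once the posterior standard error drops below the threshold. The sketch below is not the authors' implementation; the 36-item bank of five-category items, the theta grid, and all parameter values are invented for illustration.

```python
# Minimal CAT simulation under Samejima's graded response model (GRM) with
# an SE-based stopping rule (stop when SE <= 0.4), loosely mirroring the
# design described in the abstract above. Item parameters are invented.
import numpy as np

rng = np.random.default_rng(1)
GRID = np.linspace(-4, 4, 161)           # quadrature grid for theta
PRIOR = np.exp(-0.5 * GRID**2)           # standard-normal prior (unnormalized)

def grm_probs(a, b, theta):
    """Category probabilities for one GRM item (b = ordered thresholds)."""
    theta = np.atleast_1d(theta)
    pstar = 1.0 / (1.0 + np.exp(-a * np.subtract.outer(theta, b)))
    pstar = np.column_stack([np.ones(len(theta)), pstar, np.zeros(len(theta))])
    return pstar[:, :-1] - pstar[:, 1:]  # adjacent differences = category probs

def item_info(a, b, theta):
    """Fisher information I(theta) = sum_k (dP_k/dtheta)^2 / P_k (numeric)."""
    eps = 1e-4
    dp = (grm_probs(a, b, theta + eps) - grm_probs(a, b, theta - eps)) / (2 * eps)
    p = np.clip(grm_probs(a, b, theta), 1e-10, None)
    return (dp**2 / p).sum(axis=1)

# Hypothetical 36-item bank: 5 response categories -> 4 ordered thresholds.
bank = [(rng.uniform(1.0, 2.5), np.sort(rng.normal(0, 1, 4))) for _ in range(36)]

def eap(posterior):
    w = posterior / posterior.sum()
    mean = (GRID * w).sum()
    return mean, np.sqrt(((GRID - mean)**2 * w).sum())

def run_cat(true_theta, se_target=0.4):
    posterior, used = PRIOR.copy(), set()
    theta_hat, se = eap(posterior)
    while len(used) < len(bank) and se > se_target:
        # select the unused item with maximum information at the current estimate
        j = max((i for i in range(len(bank)) if i not in used),
                key=lambda i: item_info(*bank[i], theta_hat)[0])
        used.add(j)
        probs = grm_probs(*bank[j], true_theta)[0]
        k = rng.choice(len(probs), p=probs)                     # simulate a response
        posterior = posterior * grm_probs(*bank[j], GRID)[:, k] # Bayesian update
        theta_hat, se = eap(posterior)
    return theta_hat, se, len(used)

print(run_cat(0.5))   # typically stops well short of the full 36-item bank
```

At a 0.4 threshold this kind of simulation tends to stop after a fraction of the bank, which is the qualitative pattern the study reports; the exact numbers depend entirely on the invented parameters.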
10aAdaptive Testing10acomputer-assisted testing10aItem Response Theory10aLikert scales10aPersonality Measures1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://iacat.org/content/computerized-adaptive-testing-personality-traits01724nas a2200181 4500008004100000245011200041210006900153300001100222490000700233520107200240653003401312653001701346653001901363100001601382700001801398700001501416856011101431 2008 eng d00aThe D-optimality item selection criterion in the early stage of CAT: A study with the graded response model0 aDoptimality item selection criterion in the early stage of CAT A a88-1100 v333 aDuring the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher’s information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion. Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the Nominal Response Model and is shown herein to be reproducible for the Graded Response Model. For comparative purposes, the A-optimality and the global information criteria are also applied. Their item selection is investigated as a function of test progression and item bank composition. The results indicate how the selection of specific item parameters underlies the criteria performances evaluated via accuracy and precision of estimation. In addition, the criteria item exposure rates are compared, without the use of any exposure controlling measure. On account of stability, precision, accuracy, numerical simplicity, and less evidently, item exposure rate, the D-optimality criterion can be recommended for CAT.10acomputerized adaptive testing10aD optimality10aitem selection1 aPassos, V L1 aBerger, M P F1 aTan, F E S uhttp://iacat.org/content/d-optimality-item-selection-criterion-early-stage-cat-study-graded-response-model02556nas a2200313 4500008004100000020004100041245011500082210006900197250001500266300001100281490000700292520149300299653002701792653001001819653001401829653005301843653001501896653001101911653003701922653001801959653003101977653002602008653001402034653003202048100001502080700001002095700001502105856012202120 2008 eng d a0963-8288 (Print)0963-8288 (Linking)00aEfficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning0 aEfficiency and sensitivity of multidimensional computerized adap a2008/02/26 a479-840 v303 aPURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). 
All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden.10a*Disability Evaluation10aChild10aComputers10aDisabled Children/*classification/rehabilitation10aEfficiency10aHumans10aOutcome Assessment (Health Care)10aPsychometrics10aReproducibility of Results10aRetrospective Studies10aSelf Care10aSensitivity and Specificity1 aAllen, D D1 aNi, P1 aHaley, S M uhttp://iacat.org/content/efficiency-and-sensitivity-multidimensional-computerized-adaptive-testing-pediatric-physical02190nas a2200145 4500008004100000245009900041210006900140300001000209490000800219520163900227653003401866100001901900700001601919856010901935 2008 eng d00aICAT: An adaptive testing procedure for the identification of idiosyncratic knowledge patterns0 aICAT An adaptive testing procedure for the identification of idi a40-480 v2163 aTraditional adaptive tests provide an efficient method for estimating student achievement levels, by adjusting the characteristics of the test questions to match the performance of each student. These traditional adaptive tests are not designed to identify idiosyncratic knowledge patterns. As students move through their education, they learn content in any number of different ways related to their learning style and cognitive development. This may result in a student having different achievement levels from one content area to another within a domain of content. This study investigates whether such idiosyncratic knowledge patterns exist. It discusses the differences between idiosyncratic knowledge patterns and multidimensionality. Finally, it proposes an adaptive testing procedure that can be used to identify a student’s areas of strength and weakness more efficiently than current adaptive testing approaches. The findings of the study indicate that a fairly large number of students may have test results that are influenced by their idiosyncratic knowledge patterns. The findings suggest that these patterns persist across time for a large number of students, and that the differences in student performance between content areas within a subject domain are large enough to allow them to be useful in instruction. Given the existence of idiosyncratic patterns of knowledge, the proposed testing procedure may enable us to provide more useful information to teachers. It should also allow us to differentiate between idiosyncratic patterns of knowledge and important multidimensionality in the testing data.
10acomputerized adaptive testing1 aKingsbury, G G1 aHouser, R L uhttp://iacat.org/content/icat-adaptive-testing-procedure-identification-idiosyncratic-knowledge-patterns03228nas a2200397 4500008004100000020002700041245014200068210006900210250001500279260001100294300001200305490000700317520193600324653002702260653003002287653001002317653000902327653002202336653003602358653001602394653002402410653004402434653001102478653001602489653002602505653003002531653003002561653003102591100001302622700001402635700001502649700001402664700001702678700001502695856012002710 2008 eng d a1528-1159 (Electronic)00aLetting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire0 aLetting the CAT out of the bag Comparing computer adaptive tests a2008/05/23 cMay 20 a1378-830 v333 aSTUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. 
Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages.10a*Disability Evaluation10a*Health Status Indicators10aAdult10aAged10aAged, 80 and over10aBack Pain/*diagnosis/psychology10aCalibration10aComputer Simulation10aDiagnosis, Computer-Assisted/*standards10aHumans10aMiddle Aged10aModels, Psychological10aPredictive Value of Tests10aQuestionnaires/*standards10aReproducibility of Results1 aCook, KF1 aChoi, S W1 aCrane, P K1 aDeyo, R A1 aJohnson, K L1 aAmtmann, D uhttp://iacat.org/content/letting-cat-out-bag-comparing-computer-adaptive-tests-and-11-item-short-form-roland-morris03424nas a2200385 4500008004100000020004100041245010600082210006900188250001500257260001200272300001000284490000700294520220300301653002702504653001502531653001002546653002102556653002402577653002802601653003802629653001102667653001102678653001102689653003902700653000902739653002402748653003102772653004002803100001802843700001502861700001302876700001702889700001402906856011802920 2008 eng d a0271-6798 (Print)0271-6798 (Linking)00aMeasuring physical functioning in children with spinal impairments with computerized adaptive testing0 aMeasuring physical functioning in children with spinal impairmen a2008/03/26 cApr-May a330-50 v283 aBACKGROUND: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. Computerized adaptive testing uses a computer interface to administer the most optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. METHODS: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progressive spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. RESULTS: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales. Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. CONCLUSIONS: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. 
LEVEL OF EVIDENCE: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice.10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aComputer Simulation10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aKyphosis/*diagnosis/rehabilitation10aMale10aProspective Studies10aReproducibility of Results10aScoliosis/*diagnosis/rehabilitation1 aMulcahey, M J1 aHaley, S M1 aDuffy, T1 aPengsheng, N1 aBetz, R R uhttp://iacat.org/content/measuring-physical-functioning-children-spinal-impairments-computerized-adaptive-testing02170nas a2200301 4500008004100000020001400041245010200055210006900157250001500226300001200241490000700253520109000260653001501350653001501365653002101380653004801401653002401449653002801473653006201501653005701563653001101620653002701631653004601658100001701704700001201721700001401733856012101747 2008 eng d a1138-741600aRotating item banks versus restriction of maximum exposure rates in computerized adaptive testing0 aRotating item banks versus restriction of maximum exposure rates a2008/11/08 a618-6250 v113 aIf examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided into two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting maximum exposure rate of the master bank by means of the Sympson-Hetter method.
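For readers unfamiliar with the Sympson-Hetter baseline used in this comparison, the sketch below shows the core idea in schematic form: each item carries a control parameter k[i], a selected item is actually administered only with probability k[i], and the k[i] are tuned by repeated simulation until no item's exposure rate exceeds the target r_max. The 2PL bank, the oracle use of the true theta for selection, and the exact update rule are simplifying assumptions, not the procedure from either cited study.

```python
# Schematic Sympson-Hetter exposure control: tune per-item screening
# probabilities k[i] by simulation so that no item's exposure rate exceeds
# r_max. Assumes an invented 2PL bank and selects on information at the
# examinee's true theta (a deliberate simplification of a real CAT).
import numpy as np

rng = np.random.default_rng(7)
n_items, test_len, r_max = 300, 20, 0.25
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0, 1, n_items)
k = np.ones(n_items)                     # exposure control parameters

def info(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)            # 2PL Fisher information

def simulate_rates(n_examinees=2000):
    counts = np.zeros(n_items)
    for _ in range(n_examinees):
        theta = rng.normal()
        administered = np.zeros(n_items, bool)
        blocked = np.zeros(n_items, bool)     # failed the S-H screen this test
        order = np.argsort(-info(theta))      # most informative first
        for _ in range(test_len):
            for j in order:
                if administered[j] or blocked[j]:
                    continue
                if rng.random() <= k[j]:      # administer with probability k[j]
                    administered[j] = True
                    counts[j] += 1
                    break
                blocked[j] = True             # screened out for this examinee
    return counts / n_examinees

for _ in range(10):                      # iterative calibration of k
    rates = simulate_rates()
    over = rates > r_max
    k[over] *= r_max / rates[over]       # tighten over-exposed items
    k[~over] = np.minimum(1.0, k[~over] * 1.05)   # slowly relax the rest
print("max exposure rate after tuning:", simulate_rates().max())
```

Rotating sub-banks, the alternative studied in the abstract above, attacks the same overlap problem by partitioning the bank rather than by screening individual items.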
10a*Character10a*Databases10a*Software Design10aAptitude Tests/*statistics & numerical data10aBias (Epidemiology)10aComputing Methodologies10aDiagnosis, Computer-Assisted/*statistics & numerical data10aEducational Measurement/*statistics & numerical data10aHumans10aMathematical Computing10aPsychometrics/statistics & numerical data1 aBarrada, J R1 aOlea, J1 aAbad, F J uhttp://iacat.org/content/rotating-item-banks-versus-restriction-maximum-exposure-rates-computerized-adaptive-testing01289nas a2200133 4500008004100000245005700041210005700098300000900155490000800164520084700172653003401019100002301053856007901076 2008 eng d00aSome new developments in adaptive testing technology0 aSome new developments in adaptive testing technology a3-110 v2163 aIn an ironic twist of history, modern psychological testing has returned to an adaptive format quite common when testing was not yet standardized. Important stimuli to the renewed interest in adaptive testing have been the development of item-response theory in psychometrics, which models the responses on test items using separate parameters for the items and test takers, and the use of computers in test administration, which enables us to estimate the parameter for a test taker and select the items in real time. This article reviews a selection from the latest developments in the technology of adaptive testing, such as constrained adaptive item selection, adaptive testing using rule-based item generation, multidimensional adaptive testing, adaptive use of test batteries, and the use of response times in adaptive testing.
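One of the developments this review mentions, the use of response times in adaptive testing, can be sketched as a change of the item selection score: rank candidate items by Fisher information per expected second rather than by information alone. The lognormal response-time model below follows the general form E[T] = exp(beta - tau + sigma^2/2); all parameter values, and the pairing with a 2PL bank, are invented for illustration and are not taken from the review.

```python
# Sketch of information-per-time item selection: score each candidate item
# by 2PL Fisher information divided by its expected response time under a
# lognormal response-time model. All parameters are invented.
import numpy as np

rng = np.random.default_rng(5)
n_items = 100
a = rng.uniform(0.8, 2.0, n_items)      # 2PL discrimination
b = rng.normal(0, 1, n_items)           # 2PL difficulty
beta = rng.normal(4.0, 0.3, n_items)    # item time intensity (log-seconds)
SIGMA = 0.4                             # residual SD of log response time

def fisher_info(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def expected_seconds(tau):
    # lognormal RT model: E[T_i] = exp(beta_i - tau + sigma^2 / 2),
    # where tau is the examinee's speed parameter
    return np.exp(beta - tau + 0.5 * SIGMA**2)

def next_item(theta, tau, used):
    score = fisher_info(theta) / expected_seconds(tau)   # info per second
    score[list(used)] = -np.inf                          # exclude used items
    return int(np.argmax(score))

# e.g. for an average-speed examinee currently estimated at theta = 0:
print(next_item(theta=0.0, tau=0.0, used={1, 2}))
```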
10acomputerized adaptive testing1 avan der Linden, WJ uhttp://iacat.org/content/some-new-developments-adaptive-testing-technology01473nas a2200241 4500008004100000020002200041245007800063210006900141250001500210260001100225300001200236490000700248520069100255653002300946653002500969653002100994653006201015653001101077653001601088100001701104700001401121856009601135 2007 eng d a0277-6715 (Print)00aComputerized adaptive testing for measuring development of young children0 aComputerized adaptive testing for measuring development of young a2006/11/30 cJun 15 a2629-380 v263 aDevelopmental indicators that are used for routine measurement in The Netherlands are usually chosen to optimally identify delayed children. Measurements on the majority of children without problems are therefore quite imprecise. This study explores the use of computerized adaptive testing (CAT) to monitor the development of young children. CAT is expected to improve the measurement precision of the instrument. We do two simulation studies - one with real data and one with simulated data - to evaluate the usefulness of CAT. It is shown that CAT selects developmental indicators that maximally match the individual child, so that all children can be measured to the same precision.10a*Child Development10a*Models, Statistical10aChild, Preschool10aDiagnosis, Computer-Assisted/*statistics & numerical data10aHumans10aNetherlands1 aJacobusse, G1 aBuuren, S uhttp://iacat.org/content/computerized-adaptive-testing-measuring-development-young-children01944nas a2200301 4500008004500000020001400045245012900059210006900188300001200257490000700269520093500276653002501211653002101236653002501257653003001282653003001312653001001342653001501352653002601367653002501393653002401418653001501442653001501457100001301472700001701485700002101502856011901523 2007 English a0146-621600aComputerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms0 aComputerized adaptive testing for polytomous motivation items Ad a412-4290 v313 aIn a randomized experiment (n=515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed-length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicate that the real data results are not affected by model misfit. 
(PsycINFO Database Record (c) 2007 APA ) (journal abstract)10a2220 Tests & Testing10aAdaptive Testing10aAttitude Measurement10acomputer adaptive testing10aComputer Assisted Testing10aitems10aMotivation10apolytomous motivation10aStatistical Validity10aTest Administration10aTest Forms10aTest Items1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://iacat.org/content/computerized-adaptive-testing-polytomous-motivation-items-administration-mode-effects-and00500nas a2200121 4500008004100000245006600041210006600107260005700173653003400230100001800264700001000282856008600292 2007 eng d00aComputerized classification testing with composite hypotheses0 aComputerized classification testing with composite hypotheses aSt. Paul, MNbGraduate Management Admissions Council10acomputerized adaptive testing1 aThompson, N A1 aRo, S uhttp://iacat.org/content/computerized-classification-testing-composite-hypotheses01873nas a2200193 4500008004100000020004600041245005200087210005200139260001900191300001000210490000600220520125500226653003801481653003001519653002301549100001901572700001401591856007401605 2007 eng d a1548-1093 (Print); 1548-1107 (Electronic)00aEvaluation of computer adaptive testing systems0 aEvaluation of computer adaptive testing systems bIGI Global: US a70-870 v23 aMany educational organizations are trying to reduce the cost of the exams, the workload and delay of scoring, and the human errors. Also, they try to increase the accuracy and efficiency of the testing. Recently, most examination organizations use computer adaptive testing (CAT) as the method for large scale testing. This article investigates the current state of CAT systems and identifies their strengths and weaknesses. It evaluates 10 CAT systems using an evaluation framework of 15 domains categorized into three dimensions: educational, technical, and economical. The results show that the majority of the CAT systems give priority to security, reliability, and maintainability. However, they do not offer to the examinee any advanced support and functionalities. Also, the feedback to the examinee is limited and the presentation of the items is poor. Recommendations are made in order to enhance the overall quality of a CAT system. For example, alternative multimedia items should be available so that the examinee would choose a preferred media type. Feedback could be improved by providing more information to the examinee or providing information anytime the examinee wished. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputer adaptive testing systems10aexamination organizations10asystems evaluation1 aEconomides, AA1 aRoupas, C uhttp://iacat.org/content/evaluation-computer-adaptive-testing-systems02871nas a2200313 4500008004100000020002200041245010100063210006900164250001500233260000800248300001200256490000700268520179500275653005102070653002002121653003702141653002602178653001902204653001102223653003002234653004602264653003502310653002802345653001302373100002102386700001702407700001502424856011802439 2007 eng d a0315-162X (Print)00aImproving patient reported outcomes using item response theory and computerized adaptive testing0 aImproving patient reported outcomes using item response theory a a2007/06/07 cJun a1426-310 v343 aOBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. 
More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: Physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance" items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials.10a*Rheumatic Diseases/physiopathology/psychology10aClinical Trials10aData Interpretation, Statistical10aDisability Evaluation10aHealth Surveys10aHumans10aInternational Cooperation10aOutcome Assessment (Health Care)/*methods10aPatient Participation/*methods10aResearch Design/*trends10aSoftware1 aChakravarty, E F1 aBjorner, J B1 aFries, J F uhttp://iacat.org/content/improving-patient-reported-outcomes-using-item-response-theory-and-computerized-adaptive02408nas a2200361 4500008004500000020001400045245011100059210006900170300001200239490000700251520135900258653001601617653002001633653001301653653002401666653002501690653001101715653001401726653001301740653001801753653002701771653001001798653001101808100001501819700001201834700001601846700001201862700001601874700001301890700001301903700001401916856011601930 2007 English a1057-924900aThe initial development of an item bank to assess and screen for psychological distress in cancer patients0 ainitial development of an item bank to assess and screen for psy a724-7320 v163 aPsychological distress is a common problem among cancer patients. Despite the large number of instruments that have been developed to assess distress, their utility remains disappointing. This study aimed to use Rasch models to develop an item-bank which would provide the basis for better means of assessing psychological distress in cancer patients. An item bank was developed from eight psychological distress questionnaires using Rasch analysis to link common items. Items from the questionnaires were added iteratively with common items as anchor points and misfitting items (infit mean square > 1.3) removed, and unidimensionality assessed. A total of 4914 patients completed the questionnaires providing an initial pool of 83 items. Twenty items were removed resulting in a final pool of 63 items. 
Good fit was demonstrated and no additional factor structure was evident from the residuals. However, there was little overlap between item locations and person measures, since items mainly targeted higher levels of distress. The Rasch analysis allowed items to be pooled and generated a unidimensional instrument for measuring psychological distress in cancer patients. Additional items are required to more accurately assess patients across the whole continuum of psychological distress. (PsycINFO Database Record (c) 2007 APA) (journal abstract)10a3293 Cancer10acancer patients10aDistress10ainitial development10aItem Response Theory10aModels10aNeoplasms10aPatients10aPsychological10apsychological distress10aRasch10aStress1 aSmith, A B1 aRush, R1 aVelikova, G1 aWall, L1 aWright, E P1 aStark, D1 aSelby, P1 aSharpe, M uhttp://iacat.org/content/initial-development-item-bank-assess-and-screen-psychological-distress-cancer-patients02208nas a2200229 4500008004100000020004600041245008500087210006900172260004500241300001000286490000600296520140300302653003401705653002301739653002601762653001701788653002601805100001701831700001201848700001501860856010301875 2007 eng d a1614-1881 (Print); 1614-2241 (Electronic)00aMethods for restricting maximum exposure rate in computerized adaptative testing0 aMethods for restricting maximum exposure rate in computerized ad bHogrefe & Huber Publishers GmbH: Germany a14-230 v33 aThe Sympson-Hetter (1985) method provides a means of controlling maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods on the measurement accuracy has not been tested yet. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, the accuracy is decreased. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with the two van der Linden's methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aitem bank security10aitem exposure control10aoverlap rate10aSympson-Hetter method1 aBarrada, J R1 aOlea, J1 aPonsoda, V uhttp://iacat.org/content/methods-restricting-maximum-exposure-rate-computerized-adaptative-testing01795nas a2200265 4500008004100000020004100041245010400082210006900186250001500255300001100270490001500281520085700296653001901153653003801172653002101210653001401231653002901245653005601274653001101330653002501341653001901366653001801385100001501403856011101418 2007 eng d a0962-9343 (Print)0962-9343 (Linking)00aPatient-reported outcomes measurement and management with innovative methodologies and technologies0 aPatientreported outcomes measurement and management with innovat a2007/05/29 a157-660 v16 Suppl 13 aSuccessful integration of modern psychometrics and advanced informatics in patient-reported outcomes (PRO) measurement and management can potentially maximize the value of health outcomes research and optimize the delivery of quality patient care. Unlike the traditional labor-intensive paper-and-pencil data collection method, item response theory-based computerized adaptive testing methodologies coupled with novel technologies provide an integrated environment to collect, analyze and present ready-to-use PRO data for informed and shared decision-making. This article describes the needs, challenges and solutions for accurate, efficient and cost-effective PRO data acquisition and dissemination means in order to provide critical and timely PRO information necessary to actively support and enhance routine patient care in busy clinical settings.10a*Health Status10a*Outcome Assessment (Health Care)10a*Quality of Life10a*Software10aComputer Systems/*trends10aHealth Insurance Portability and Accountability Act10aHumans10aPatient Satisfaction10aQuestionnaires10aUnited States1 aChang, C-H uhttp://iacat.org/content/patient-reported-outcomes-measurement-and-management-innovative-methodologies-and01453nas a2200181 4500008004100000245008200041210006900123260001300192490000800205520080800213653000801021653001901029653003001048653003401078653004001112100001801152856010101170 2007 eng d00aA practitioner's guide to variable-length computerized classification testing0 apractitioners guide to variablelength computerized classificatio c7/1/20090 v12 3 aVariable-length computerized classification tests, CCTs, (Lin & Spray, 2000; Thompson, 2006) are a powerful and efficient approach to testing for the purpose of classifying examinees into groups. CCTs are designed by the specification of at least five technical components: psychometric model, calibrated item bank, starting point, item selection algorithm, and termination criterion. Several options exist for each of these CCT components, creating a myriad of possible designs. Confusion among designs is exacerbated by the lack of a standardized nomenclature. This article outlines the components of a CCT, common options for each component, and the interaction of options for different components, so that practitioners may more efficiently design CCTs. It also offers a suggestion of nomenclature. 
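Of the five CCT components this guide enumerates, the termination criterion is the easiest to show in miniature. The sketch below implements one common choice, the sequential probability ratio test (SPRT), for a pass/fail decision under assumed 2PL items, a cut score at theta = 0, and an indifference region of +/-0.3; it is one possible design among the many the article catalogs, not a recommended configuration.

```python
# Sequential probability ratio test (SPRT) termination for a pass/fail
# computerized classification test. Assumes 2PL items, a cut score at
# theta_c = 0.0, and an indifference region of +/- delta. Illustrative only.
import math
import random

random.seed(3)
theta_c, delta = 0.0, 0.3            # cut score and indifference region
alpha, beta = 0.05, 0.05             # nominal classification error rates
A = math.log((1 - beta) / alpha)     # upper (pass) bound on the log-LR
B = math.log(beta / (1 - alpha))     # lower (fail) bound on the log-LR

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sprt_cct(true_theta, bank, max_items=50):
    llr = 0.0                        # log likelihood ratio: pass vs fail model
    for n, (a, b) in enumerate(bank[:max_items], start=1):
        u = 1 if random.random() < p_correct(true_theta, a, b) else 0
        p1 = p_correct(theta_c + delta, a, b)   # examinee just above the cut
        p0 = p_correct(theta_c - delta, a, b)   # examinee just below the cut
        llr += math.log((p1 if u else 1 - p1) / (p0 if u else 1 - p0))
        if llr >= A:
            return "pass", n         # confident the examinee is above the cut
        if llr <= B:
            return "fail", n
    return "undecided", n            # bank exhausted; fall back to an estimate

bank = [(random.uniform(0.8, 2.0), random.gauss(0, 1)) for _ in range(50)]
print(sprt_cct(0.8, bank), sprt_cct(-0.8, bank))
```

Item selection is deliberately omitted here (items are taken in bank order); in practice the selection algorithm and the termination criterion are chosen together, which is exactly the design space the article maps.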
10aCAT10aclassification10acomputer adaptive testing10acomputerized adaptive testing10aComputerized classification testing1 aThompson, N A uhttp://iacat.org/content/practitioners-guide-variable-length-computerized-classification-testing02740nas a2200541 4500008004100000020002200041245017000063210006900233250001500302260000800317300001100325490000700336520116200343653001901505653002501524653002101549653002101570653001501591653001001606653000901616653001601625653002301641653003201664653001101696653001101707653000901718653001601727653004601743653001801789653002901807653001801836100001501854700001401869700001701883700001301900700001501913700001601928700001501944700001701959700001401976700001801990700001102008700001602019700001502035700001302050700001302063856012202076 2007 eng d a0025-7079 (Print)00aPsychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS)0 aPsychometric evaluation and calibration of healthrelated quality a2007/04/20 cMay aS22-310 v453 aBACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment.10a*Health Status10a*Information Systems10a*Quality of Life10a*Self Disclosure10aAdolescent10aAdult10aAged10aCalibration10aDatabases as Topic10aEvaluation Studies as Topic10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPsychometrics10aQuestionnaires/standards10aUnited States1 aReeve, B B1 aHays, R D1 aBjorner, J B1 aCook, KF1 aCrane, P K1 aTeresi, J A1 aThissen, D1 aRevicki, D A1 aWeiss, DJ1 aHambleton, RK1 aLiu, H1 aGershon, RC1 aReise, S P1 aLai, J S1 aCella, D uhttp://iacat.org/content/psychometric-evaluation-and-calibration-health-related-quality-life-item-banks-plans-patient02408nas a2200289 4500008004100000020002200041245010800063210006900171260004500240300001000285490000700295520141000302653003201712653002501744653002501769653002501794653002601819653001801845653003701863653002101900653001301921100001501934700001401949700001901963700002001982856011602002 2007 eng d a1015-5759 (Print)00aPsychometric properties of an emotional adjustment measure: An application of the graded response model0 aPsychometric properties of an emotional adjustment measure An ap bHogrefe & Huber Publishers GmbH: Germany a39-460 v233 aItem response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. 
However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of the IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure that consists of 28 six-category graded response items are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) the invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive tests10aEmotional Adjustment10aItem Response Theory10aPersonality Measures10apersonnel recruitment10aPsychometrics10aSamejima's graded response model10atest reliability10avalidity1 aRubio, V J1 aAguado, D1 aHontangas, P M1 aHernández, J M uhttp://iacat.org/content/psychometric-properties-emotional-adjustment-measure-application-graded-response-model
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10anominal response model10arobust performance10atest design optimization1 aPassos, V L1 aBerger, M P F1 aTan, F E uhttp://iacat.org/content/test-design-optimization-cat-early-stage-nominal-response-model02172nas a2200181 4500008004100000020002200041245006200063210006200125260003800187300001200225490000700237520155900244653002901803653003401832653001801866100002201884856008401906 2006 eng d a0033-3018 (Print)00aAdaptive success control in computerized adaptive testing0 aAdaptive success control in computerized adaptive testing bPabst Science Publishers: Germany a436-4500 v483 aIn computerized adaptive testing (CAT) procedures within the framework of probabilistic test theory, the difficulty of an item is adjusted to the ability of the respondent, with the aim of maximizing the amount of information generated per item, thereby also increasing test economy and test reasonableness. However, earlier research indicates that respondents might feel over-challenged by a constant success probability of p = 0.5 and therefore cannot come to a sufficiently high answer certainty within a reasonable timeframe. Consequently response time per item increases, which -- depending on the test material -- can outweigh the benefit of administering optimally informative items. Instead of a benefit, the result of using CAT procedures could be a loss of test economy. Based on this problem, an adaptive success control algorithm was designed and tested, adapting the success probability to the working style of the respondent. Persons who need higher answer certainty in order to come to a decision are detected and receive a higher success probability, in order to minimize the test duration (not the number of items as in classical CAT). The method is validated on the re-analysis of data from the Adaptive Matrices Test (AMT, Hornke, Etzel & Rettig, 1999) and by the comparison between an AMT version using classical CAT and an experimental version using Adaptive Success Control. The results are discussed in the light of psychometric and psychological aspects of test quality. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive success control10acomputerized adaptive testing10aPsychometrics1 aHäusler, Joachim uhttp://iacat.org/content/adaptive-success-control-computerized-adaptive-testing01633nas a2200217 4500008004100000020004600041245008900087210006900176260002500245300000900270490000700279520083400286653001901120653002801139653003001167653003301197653002101230653003501251100001801286856011101304 2006 eng d a0895-7347 (Print); 1532-4818 (Electronic)00aApplying Bayesian item selection approaches to adaptive tests using polytomous items0 aApplying Bayesian item selection approaches to adaptive tests us bLawrence Erlbaum: US a1-200 v193 aThis study applied the maximum expected information (MEI) and the maximum posterior-weighted information (MPI) approaches of computer adaptive testing item selection to the case of a test using polytomous items following the partial credit model. The MEI and MPI approaches are described. A simulation study compared the efficiency of ability estimation using the MEI and MPI approaches to the traditional maximal item information (MII) approach. The results of the simulation study indicated that the MEI and MPI approaches led to a superior efficiency of ability estimation compared with the MII approach. 
The superiority of the MEI and MPI approaches over the MII approach was greatest when the bank contained items having a relatively peaked information function. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive tests10aBayesian item selection10acomputer adaptive testing10amaximum expected information10apolytomous items10aposterior weighted information1 aPenfield, R D uhttp://iacat.org/content/applying-bayesian-item-selection-approaches-adaptive-tests-using-polytomous-items01959nas a2200265 4500008004100000020002200041245008200063210006900145260002600214300001000240490000700250520112900257653001501386653003401401653001401435653001701449653002401466653001501490653002201505653001501527100002301542700001301565700001801578856009701596 2006 eng d a1076-9986 (Print)00aAssembling a computerized adaptive testing item pool as a set of linear tests0 aAssembling a computerized adaptive testing item pool as a set of bSage Publications: US a81-990 v313 aTest-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items that provide less than optimal information, violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires the tests to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. 10aAlgorithms10acomputerized adaptive testing10aitem pool10alinear tests10amathematical models10astatistics10aTest Construction10aTest Items1 avan der Linden, WJ1 aAriel, A1 aVeldkamp, B P uhttp://iacat.org/content/assembling-computerized-adaptive-testing-item-pool-set-linear-tests01943nas a2200241 4500008004100000020002200041245011200063210006900175260004100244300001200285490000700297520102300304653003401327653002401361653004701385653002401432653002101456653007001477100001301547700001401560700001001574856011701584 2006 eng d a0022-0655 (Print)00aComparing methods of assessing differential item functioning in a computerized adaptive testing environment0 aComparing methods of assessing differential item functioning in bBlackwell Publishing: United Kingdom a245-2640 v433 aMantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment, in which pretest items are assumed to be seeded in CATs but not used for trait estimation.
The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aeducational testing10aitem response theory likelihood ratio test10alogistic regression10atrait estimation10aunidirectional & non-unidirectional differential item functioning1 aLei, P-W1 aChen, S-Y1 aYu, L uhttp://iacat.org/content/comparing-methods-assessing-differential-item-functioning-computerized-adaptive-testing02399nas a2200241 4500008004100000020002200041245008500063210006900148260002500217300001200242490000700254520161500261653000801876653003401884653002601918653003001944653002601974100001402000700001402014700001602028700001502044856009802059 2006 eng d a0439-755X (Print)00aThe comparison among item selection strategies of CAT with multiple-choice items0 acomparison among item selection strategies of CAT with multiplec bScience Press: China a778-7830 v383 aThe initial purpose of comparing item selection strategies for CAT was to increase the efficiency of tests. As studies continued, however, it was found that increasing the efficiency of item bank usage was also an important goal of comparing item selection strategies. These two goals often conflicted; the key was to find a strategy with which both goals could be accomplished. The item selection strategies for the graded response model compared in this study were: matching the average of an item's difficulty values to the ability estimate; matching the median of an item's difficulty values to the ability estimate; maximum information; a-stratified (average); and a-stratified (median). The evaluation indices used for comparison were: the bias of the ability estimates relative to the true values; the standard error of the ability estimates; the average number of items administered per examinee; the standard deviation of the frequency with which items were selected; and a weighted sum of these indices. Using Monte Carlo simulation, data were generated and the procedure was replicated 20 times under conditions in which the item difficulty parameters followed either a normal or a uniform distribution. The results indicated that, under both difficulty distributions, each of the item selection strategies examined had its own strengths and weaknesses. Overall, when items were stratified appropriately, the a-stratified (median) strategy (ASM) performed best.
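The two location-matching strategies in the record above amount to summarizing each graded item's difficulty values by their mean or median and administering the unused item whose summary lies closest to the current ability estimate. Here is a minimal Python sketch of that selection step under an assumed bank layout (a list of items, each carrying its boundary difficulties); all values are invented for illustration.

import numpy as np

def select_item_by_location(theta_hat, bank, used, summary=np.mean):
    """Pick the unused item whose summarized difficulty is closest to theta_hat.
    summary=np.mean gives the 'average' strategy, np.median the 'median' one."""
    best_j, best_dist = None, np.inf
    for j, item in enumerate(bank):
        if j in used:
            continue
        dist = abs(summary(item["thresholds"]) - theta_hat)
        if dist < best_dist:
            best_j, best_dist = j, dist
    return best_j

rng = np.random.default_rng(0)
bank = [{"thresholds": np.sort(rng.uniform(-2, 2, 5))} for _ in range(100)]
print(select_item_by_location(0.5, bank, used=set()))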
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10aCAT10acomputerized adaptive testing10agraded response model10aitem selection strategies10amultiple choice items1 aHai-qi, D1 aDe-zhi, C1 aShuliang, D1 aTaiping, D uhttp://iacat.org/content/comparison-among-item-selection-strategies-cat-multiple-choice-items02648nas a2200397 4500008004100000020002200041245013500063210006900198250001500267260000800282300001200290490000700302520140700309653002601716653003101742653001501773653001001788653000901798653002201807653002501829653003301854653001101887653001101898653000901909653001601918653004601934653003001980653003102010653001302041100001502054700001002069700001802079700001602097700001502113856012202128 2006 eng d a0895-4356 (Print)00aComputer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank0 aComputer adaptive testing improved accuracy and precision of sco a2006/10/10 cNov a1174-820 v593 aBACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision. We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings.10a*Recovery of Function10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aRehabilitation/*standards10aReproducibility of Results10aSoftware1 aHaley, S M1 aNi, P1 aHambleton, RK1 aSlavin, M D1 aJette, A M uhttp://iacat.org/content/computer-adaptive-testing-improved-accuracy-and-precision-scores-over-random-item-selectio-001466nas a2200205 4500008004100000245002700041210002600068260006000094300001100154490000800165520087300173653005101046653003001097653002001127653001801147653001301165100001501178700001501193856005201208 2006 eng d00aComputer-based testing0 aComputerbased testing aWashington D.C. USAbAmerican Psychological Association a87-1000 vxiv3 a(From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. 
Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Research on innovative computerized assessments is then summarized; these assessments go beyond multiple-choice items by using formats made possible by computerization. Next, some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA )10aAdaptive Testing10acomputerized adaptive testing10aComputer Assisted Testing10aExperimentation10aPsychometrics10aTheories1 aDrasgow, F1 aChuah, S C uhttp://iacat.org/content/computer-based-testing03325nas a2200469 4500008004100000020002200041245011600063210006900179250001500248260000800263300001200271490000700283520189400290653003202184653003102216653002202247653002002269653001002289653000902299653002202308653002802330653003302358653001102391653001102402653002502413653000902438653001602447653004602463653002202509653002402531653003002555653002902585100001502614700001502629700001602644700001102660700002402671700001402695700001802709700001002727856011802737 2006 eng d a0003-9993 (Print)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes0 aComputerized adaptive testing for followup after discharge from a2006/08/01 cAug a1033-420 v873 aOBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed-length formats were comparable in responsiveness to patient-reported change over a 3-month interval. 
CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aSiebens, H1 aCoster, W J1 aTao, W1 aBlack-Schaffer, R M1 aGandek, B1 aSinclair, S J1 aNi, P uhttp://iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-i-activity01580nas a2200205 4500008004100000020002200041245005000063210005000113260002600163300001200189490000700201520094200208653003401150653002801184653001901212653002001231653003301251100002301284856006701307 2006 eng d a0146-6216 (Print)00aEquating scores from adaptive to linear tests0 aEquating scores from adaptive to linear tests bSage Publications: US a493-5080 v303 aTwo local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method. Both methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. It is argued that the worse performances of the two methods are a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise between the individual score distributions. 10acomputerized adaptive testing10aequipercentile equating10alocal equating10ascore reporting10atest characteristic function1 avan der Linden, WJ uhttp://iacat.org/content/equating-scores-adaptive-linear-tests02394nas a2200301 4500008004100000020002200041245010800063210006900171260002400240300000900264490000600273520142200279653002201701653003401723653002301757653003201780653001801812653002101830653001801851100001401869700001301883700001401896700001901910700001501929700001701944700001301961856011801974 2006 eng d a1529-7713 (Print)00aExpansion of a physical function item bank and development of an abbreviated form for clinical research0 aExpansion of a physical function item bank and development of an bRichard M Smith: US a1-150 v73 aWe expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. 
We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aclinical research10acomputerized adaptive testing10aperformance levels10aphysical function item bank10aPsychometrics10atest reliability10aTest Validity1 aBode, R K1 aLai, J-S1 aDineen, K1 aHeinemann, A W1 aShevrin, D1 aVon Roenn, J1 aCella, D uhttp://iacat.org/content/expansion-physical-function-item-bank-and-development-abbreviated-form-clinical-research02156nas a2200289 4500008004100000245010000041210006900141260000800210300001200218490000700230520127700237653003401514653002101548653000901569653001201578653002201590653001101612653001101623653000901634653001601643653002901659653001901688100001301707700001501720700001301735856011801748 2006 eng d00aFactor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue0 aFactor analysis techniques for assessing sufficient unidimension cSep a1179-900 v153 aBACKGROUND: Fatigue is the most common unrelieved symptom experienced by people with cancer. The purpose of this study was to examine whether cancer-related fatigue (CRF) can be summarized using a single score, that is, whether CRF is sufficiently unidimensional for measurement approaches that require or assume unidimensionality. We evaluated this question using factor analysis techniques including the theory-driven bi-factor model. METHODS: Five hundred and fifty five cancer patients from the Chicago metropolitan area completed a 72-item fatigue item bank, covering a range of fatigue-related concerns including intensity, frequency and interference with physical, mental, and social activities. Dimensionality was assessed using exploratory and confirmatory factor analysis (CFA) techniques. RESULTS: Exploratory factor analysis (EFA) techniques identified from 1 to 17 factors. The bi-factor model suggested that CRF was sufficiently unidimensional. CONCLUSIONS: CRF can be considered sufficiently unidimensional for applications that require unidimensionality. One such application, item response theory (IRT), will facilitate the development of short-form and computer-adaptive testing. 
This may further enable practical and accurate clinical assessment of CRF.10a*Factor Analysis, Statistical10a*Quality of Life10aAged10aChicago10aFatigue/*etiology10aFemale10aHumans10aMale10aMiddle Aged10aNeoplasms/*complications10aQuestionnaires1 aLai, J-S1 aCrane, P K1 aCella, D uhttp://iacat.org/content/factor-analysis-techniques-assessing-sufficient-unidimensionality-cancer-related-fatigue03379nas a2200205 4500008004100000020002200041245009700063210006900160260002500229300001200254490000700266520266000273653003402933653002802967100001502995700001903010700001703029700001403046856011303060 2006 eng d a0439-755X (Print)00a[Item Selection Strategies of Computerized Adaptive Testing based on Graded Response Model.]0 aItem Selection Strategies of Computerized Adaptive Testing based bScience Press: China a461-4670 v383 aItem selection strategy (ISS) is an important component of Computerized Adaptive Testing (CAT). Its performance directly affects the security, efficiency and precision of the test. Thus, the ISS is one of the central issues in CATs based on the Graded Response Model (GRM). It is well known that the goal of an ISS is to administer the next unused item remaining in the item bank that best fits the examinee's current ability estimate. In dichotomous IRT models, every item has only one difficulty parameter, and the item whose difficulty matches the examinee's current ability estimate is considered to be the best fitting item. However, in the GRM, each item has more than two ordered categories and has no single value to represent the item difficulty. Consequently, some researchers have employed the average or the median difficulty value across categories as the difficulty estimate for the item. Using the average value and the median value in effect introduced two corresponding ISSs. In this study, we used computer simulation to compare four ISSs based on the GRM. We also discussed the effect of a "shadow pool" on the uniformity of pool usage, as well as the influence of different item parameter distributions and different ability estimation methods on the evaluation criteria of CAT. In the simulation process, the Monte Carlo method was adopted to simulate the entire CAT process; 1,000 examinees drawn from a standard normal distribution and four 1,000-sized item pools with different item parameter distributions were also simulated. The assumption of the simulation is that a polytomous item comprises six ordered categories. In addition, ability estimates were derived using two methods: expected a posteriori Bayesian (EAP) and maximum likelihood estimation (MLE). In MLE, the Newton-Raphson iteration method and the Fisher scoring iteration method were employed, respectively, to solve the likelihood equation. Moreover, the CAT process was simulated 30 times for each examinee to eliminate random error. The ISSs were evaluated by four indices usually used in CAT, covering four aspects: the accuracy of ability estimation, the stability of the ISS, the usage of the item pool, and test efficiency. Simulation results favored the ISSs that match the difficulty values across categories to the estimate of an examinee's current trait level. Setting a "shadow pool" in the ISS improved the uniformity of pool utilization. Finally, different item parameter distributions and different ability estimation methods affected the evaluation indices of CAT. 
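Both estimators named in the record above are standard. As a concrete reference point, here is a minimal Python sketch of EAP estimation by numerical quadrature under a standard normal prior; the Rasch response function and the item parameters are stand-ins chosen for illustration, not the authors' GRM setup.

import numpy as np

def rasch_response_prob(x, b):
    """Likelihood of observed response x (1 = correct, 0 = incorrect) at theta."""
    return lambda theta: np.exp(x * (theta - b)) / (1.0 + np.exp(theta - b))

def eap_estimate(response_fns, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate: posterior mean over a quadrature grid,
    standard normal prior; response_fns are per-item likelihood functions."""
    prior = np.exp(-0.5 * grid ** 2)
    like = np.ones_like(grid)
    for f in response_fns:
        like *= f(grid)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))

# Three hypothetical Rasch items with difficulties -0.5, 0.2, 0.8 answered 1, 1, 0
print(eap_estimate([rasch_response_prob(1, -0.5),
                    rasch_response_prob(1, 0.2),
                    rasch_response_prob(0, 0.8)]))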
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aitem selection strategy1 aPing, Chen1 aShuliang, Ding1 aHaijing, Lin1 aJie, Zhou uhttp://iacat.org/content/item-selection-strategies-computerized-adaptive-testing-based-graded-response-model03115nas a2200277 4500008004100000020002200041245010900063210006900172250001500241260000800256300001200264490000700276520221700283653002902500653002002529653002502549653002102574653001502595653002802610653001102638653002502649100001702674700001502691700001202706856011902718 2006 eng d a0214-9915 (Print)00aMaximum information stratification method for controlling item exposure in computerized adaptive testing0 aMaximum information stratification method for controlling item e a2007/02/14 cFeb a156-1590 v183 aThe proposal for increasing the security in computerized adaptive tests that has received most attention in recent years is the a-stratified method (AS; Chang and Ying, 1999): at the beginning of the test, only items with low discrimination parameters (a) can be administered, with the admissible values of the a parameters increasing as the test goes on. With this method, the distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.
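The a-stratified logic described in this record is mechanical enough to sketch: sort the bank by discrimination, carve it into strata, restrict each stage of the test to its stratum, and match difficulty to the current ability estimate within the stratum. A minimal Python illustration follows; the stratum count and the (a, b) bank are invented, and the MIS refinement involving the c parameter is not shown.

import numpy as np

def a_stratified_select(theta_hat, bank, used, stage, n_strata=4):
    """a-stratified selection sketch (after Chang & Ying, 1999): stage k
    (0-based) draws from the stratum holding the k-th lowest band of
    discriminations; within it, choose the unused item whose difficulty
    is closest to theta_hat. Assumes the stratum is not exhausted."""
    order = sorted(range(len(bank)), key=lambda j: bank[j][0])  # sort by a
    size = len(bank) // n_strata
    stratum = order[stage * size:(stage + 1) * size]
    candidates = [j for j in stratum if j not in used]
    return min(candidates, key=lambda j: abs(bank[j][1] - theta_hat))

rng = np.random.default_rng(0)
bank = list(zip(rng.uniform(0.5, 2.0, 200), rng.normal(0.0, 1.0, 200)))  # (a, b)
print(a_stratified_select(theta_hat=0.0, bank=bank, used=set(), stage=0))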
10a*Artificial Intelligence10a*Microcomputers10a*Psychological Tests10a*Software Design10aAlgorithms10aChi-Square Distribution10aHumans10aLikelihood Functions1 aBarrada, J R1 aMazuela, P1 aOlea, J uhttp://iacat.org/content/maximum-information-stratification-method-controlling-item-exposure-computerized-adaptive02563nas a2200349 4500008004100000020002200041245016600063210006900229250001500298260000800313300001100321490000700332520142900339653002701768653001601795653001501811653001001826653002101836653001401857653005201871653001501923653001101938653001101949653003701960653001801997653001402015100001502029700001002044700001602054700002502070856011802095 2006 eng d a0003-9993 (Print)00aMeasurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory0 aMeasurement precision and efficiency of multidimensional compute a2006/08/29 cSep a1223-90 v873 aOBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application with a unidimensional CAT (U-CAT) application, using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. 
Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired.10a*Disability Evaluation10a*Pediatrics10aAdolescent10aChild10aChild, Preschool10aComputers10aDisabled Persons/*classification/rehabilitation10aEfficiency10aHumans10aInfant10aOutcome Assessment (Health Care)10aPsychometrics10aSelf Care1 aHaley, S M1 aNi, P1 aLudlow, L H1 aFragala-Pinkham, M A uhttp://iacat.org/content/measurement-precision-and-efficiency-multidimensional-computer-adaptive-testing-physical02429nas a2200241 4500008004100000020004600041245008600087210006900173260002500242300001200267490000700279520159900286653001801885653002401903653002001927653002801947653002101975653004001996653001402036100001802050700001202068856010702080 2006 eng d a0895-7347 (Print); 1532-4818 (Electronic)00aOptimal and nonoptimal computer-based test designs for making pass-fail decisions0 aOptimal and nonoptimal computerbased test designs for making pas bLawrence Erlbaum: US a221-2390 v193 aNow that many credentialing exams are being routinely administered by computer, new computer-based test designs, along with item response theory models, are being aggressively researched to identify specific designs that can increase the decision consistency and accuracy of pass-fail decisions. The purpose of this study was to investigate the impact of optimal and nonoptimal multistage test (MST) designs, linear parallel-form test designs (LPFT), and computer adaptive test (CAT) designs on the decision consistency and accuracy of pass-fail decisions. Realistic testing situations matching those of one of the large credentialing agencies were simulated to increase the generalizability of the findings. The conclusions were clear: (a) With the LPFTs, matching test information functions (TIFs) to the mean of the proficiency distribution produced slightly better results than matching them to the passing score; (b) all of the test designs worked better than test construction using random selection of items, subject to content constraints only; (c) CAT performed better than the other test designs; and (d) when matching a TIF to the passing score, the MST design produced slightly better results than the LPFT design. If an argument for the MST design is to be made, it can be made on the basis of slight improvements over the LPFT design and better expected item bank utilization, candidate preference, and the potential for improved diagnostic feedback, compared with the feedback that is possible with fixed linear test forms. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aadaptive test10acredentialing exams10aDecision Making10aEducational Measurement10amultistage tests10aoptimal computer-based test designs10atest form1 aHambleton, RK1 aXing, D uhttp://iacat.org/content/optimal-and-nonoptimal-computer-based-test-designs-making-pass-fail-decisions01617nas a2200217 4500008004100000020002200041245008200063210006900145260002600214300001200240490000700252520088700259653002801146653002501174653002501199653001901224653001601243100001601259700002501275856009901300 2006 eng d a0146-6216 (Print)00aOptimal testing with easy or difficult items in computerized adaptive testing0 aOptimal testing with easy or difficult items in computerized ada bSage Publications: US a379-3930 v303 aComputerized adaptive tests (CATs) are individualized tests that, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. 
In the present study, it is shown that maximum information item selection in CATs using an item bank that is calibrated with the one- or the two-parameter logistic model results in each individual answering about 50% of the items correctly. Two item selection procedures giving easier (or more difficult) tests for students are presented and evaluated. Item selection on probability points of items yields good results only with the one-parameter logistic model and not with the two-parameter logistic model. An alternative selection procedure, based on maximum information at a shifted ability level, gives satisfactory results with both models. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputer adaptive tests10aindividualized tests10aItem Response Theory10aitem selection10aMeasurement1 aEggen, Theo1 aVerschoor, Angela, J uhttp://iacat.org/content/optimal-testing-easy-or-difficult-items-computerized-adaptive-testing02342nas a2200217 4500008004100000020002200041245008000063210006900143260002600212300001000238490000700248520160400255653003001859653002101889653003201910653003001942653002501972100001501997700001502012856009702027 2006 eng d a0146-6216 (Print)00aSIMCAT 1.0: A SAS computer program for simulating computer adaptive testing0 aSIMCAT 10 A SAS computer program for simulating computer adaptiv bSage Publications: US a60-610 v303 aMonte Carlo methodologies are frequently applied to study the sampling distribution of the estimated proficiency level in adaptive testing. These methods eliminate real situational constraints. However, these Monte Carlo methodologies are not currently supported by the available software programs, and when these programs are available, their flexibility is limited. SIMCAT 1.0 is aimed at the simulation of adaptive testing sessions under different adaptive expected a posteriori (EAP) proficiency-level estimation methods (Blais & Raîche, 2005; Raîche & Blais, 2005) based on the one-parameter Rasch logistic model. These methods are all adaptive in the a priori proficiency-level estimation, the proficiency-level estimation bias correction, the integration interval, or a combination of these factors. The use of these adaptive EAP estimation methods diminishes considerably the shrinking, and therefore biasing, effect of the estimated a priori proficiency level encountered when this a priori is fixed at a constant value independently of the computed previous value of the proficiency level. SIMCAT 1.0 also computes empirical and estimated skewness and kurtosis coefficients, as well as the standard error, of the estimated proficiency-level sampling distribution. In this way, the program allows one to compare empirical and estimated properties of the estimated proficiency-level sampling distribution under different variations of the EAP estimation method: standard error and bias, as well as the skewness and kurtosis coefficients. 
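The kind of Monte Carlo summary SIMCAT reports (standard error, bias, skewness, and kurtosis of the estimated proficiency distribution) can be reproduced in a few lines. The Python sketch below simulates a fixed-length Rasch test repeatedly for one true proficiency and summarizes the resulting EAP estimates; all settings (item count, difficulty range, replication count) are illustrative, and no claim is made about SIMCAT's internals.

import numpy as np

rng = np.random.default_rng(1)

def simulate_sampling_distribution(theta_true=0.0, n_items=30, n_reps=500):
    """Monte Carlo sketch of an estimator's sampling distribution under the
    Rasch model: simulate the test n_reps times, re-estimate theta by EAP,
    then report bias, SE, skewness, and excess kurtosis of the estimates."""
    b = rng.uniform(-2, 2, n_items)                  # fixed item difficulties
    grid = np.linspace(-4, 4, 81)
    prior = np.exp(-0.5 * grid ** 2)                 # standard normal prior
    estimates = []
    for _ in range(n_reps):
        p = 1 / (1 + np.exp(-(theta_true - b)))      # response probabilities
        x = rng.random(n_items) < p                  # simulated responses
        like = np.ones_like(grid)
        for xi, bi in zip(x, b):
            pg = 1 / (1 + np.exp(-(grid - bi)))
            like *= np.where(xi, pg, 1 - pg)
        post = prior * like
        estimates.append(np.sum(grid * post) / np.sum(post))
    e = np.asarray(estimates)
    z = (e - e.mean()) / e.std()
    return e.mean() - theta_true, e.std(), (z ** 3).mean(), (z ** 4).mean() - 3

print(simulate_sampling_distribution())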
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputer adaptive testing10acomputer program10aestimated proficiency level10aMonte Carlo methodologies10aRasch logistic model1 aRaîche, G1 aBlais, J-G uhttp://iacat.org/content/simcat-10-sas-computer-program-simulating-computer-adaptive-testing02107nas a2200229 4500008004100000245013800041210006900179300001400248490000700262520127000269653003101539653003401570653002501604653001701629653001901646653002401665100001401689700001801703700001701721700001901738856012001757 2006 eng d00aSimulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function0 aSimulated computerized adaptive test for patients with lumbar sp a947–9560 v593 aObjective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (θIRT) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (θCAT).
Methods: We performed a secondary analysis of retrospective intake rehabilitation data.
Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine-specific CAT was developed
that was 72% more efficient than using all 25 items to estimate LFS measures. θIRT and θCAT measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. θCAT measures were as precise as θIRT measures.
Conclusion: A body-part-specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity.10aBack Pain Functional Scale10acomputerized adaptive testing10aItem Response Theory10aLumbar spine10aRehabilitation10aTrue-score equating1 aHart, D L1 aMioduski, J E1 aWerneke, M W1 aStratford, P W uhttp://iacat.org/content/simulated-computerized-adaptive-test-patients-lumbar-spine-impairments-was-efficient-and-002068nas a2200217 4500008004500000245013400045210006900179300001200248490000700260520127300267653003401540653004201574653002501616653001901641100001401660700001301674700001801687700001401705700001501719856011601734 2006 Engldsh 00aSimulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function0 aSimulated computerized adaptive test for patients with shoulder a290-2980 v593 aBackground and Objective: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items,
develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (θIRT) and measures generated using the simulated CAT (θCAT).
Study Design and Setting: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients
with shoulder impairments who completed 60 SFS items.
Results: Factor analytic techniques indicated that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The θIRT and θCAT measures were highly correlated (r = .96) and resulted in similar classifications of patients.
Conclusion: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good
discriminating ability.
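The efficiency reported in the lumbar spine and shoulder records above (precise measures from roughly 6 of 37 items) comes from the standard CAT loop: repeatedly administer the most informative remaining item and stop once the standard error of the ability estimate drops below a target. Here is a minimal Python sketch of that loop; for brevity it assumes a dichotomous two-parameter logistic bank and a single approximate Newton update per item, rather than the rating scale model and estimation used in these studies, and all parameter values are invented.

import numpy as np

def cat_until_precise(theta_true, bank, se_target=0.30, seed=7):
    """Minimal CAT loop: give the maximum-information 2PL item, update the
    estimate, stop when SE(theta_hat) = 1/sqrt(total info) < se_target."""
    rng = np.random.default_rng(seed)
    used, theta_hat, total_info = set(), 0.0, 0.0
    while total_info == 0.0 or total_info ** -0.5 > se_target:
        best, best_info = None, -1.0
        for j, (a, b) in enumerate(bank):
            if j in used:
                continue
            p = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
            info = a * a * p * (1.0 - p)             # 2PL Fisher information
            if info > best_info:
                best, best_info = j, info
        a, b = bank[best]
        used.add(best)
        p_true = 1.0 / (1.0 + np.exp(-a * (theta_true - b)))
        x = float(rng.random() < p_true)             # simulated response
        total_info += best_info
        p_hat = 1.0 / (1.0 + np.exp(-a * (theta_hat - b)))
        # One approximate Newton step on the newest item's score; a full
        # MLE or EAP re-estimate would be used in practice.
        theta_hat += a * (x - p_hat) / total_info
    return theta_hat, sorted(used)

rng = np.random.default_rng(0)
bank = np.column_stack([rng.uniform(0.8, 2.0, 200), rng.normal(0.0, 1.0, 200)])
print(cat_until_precise(0.5, bank))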
10acomputerized adaptive testing10aFlexilevel Scale of Shoulder Function10aItem Response Theory10aRehabilitation1 aHart, D L1 aCook, KF1 aMioduski, J E1 aTeal, C R1 aCrane, P K uhttp://iacat.org/content/simulated-computerized-adaptive-test-patients-shoulder-impairments-was-efficient-and-002247nas a2200301 4500008004500000020001400045245009800059210007100157300001200228490000700240520127200247653003201519653002601551653002801577653001801605653002501623653001601648653001201664653001501676653001801691653001601709653001801725653002701743653001101770100001901781700001601800856012901816 2006 Spandsh a0212-972800aTécnicas para detectar patrones de respuesta atípicos [Aberrant patterns detection methods]0 aTécnicas para detectar patrones de respuesta atípicos Aberrant p a143-1540 v223 aThe identification of aberrant response patterns is highly useful for constructing tests and item banks with sound psychometric properties and for analyzing the validity of tests and items. This review collects the most relevant and novel person-fit methods developed within each of the main areas of psychometrics: Guttman's scalogram, Classical Test Theory (CTT), Generalizability Theory (GT), Item Response Theory (IRT), Non-parametric Response Models (NPRM), Order-Restricted Latent Class Models (OR-LCM), and Covariance Structure Analysis (CSA).10aaberrant patterns detection10aClassical Test Theory10ageneralizability theory10aItem Response10aItem Response Theory10aMathematics10amethods10aperson-fit10aPsychometrics10apsychometry10aTest Validity10atest validity analysis10aTheory1 aNúñez, R M N1 aPina, J A L uhttp://iacat.org/content/t%C3%A9cnicas-para-detectar-patrones-de-respuesta-at%C3%ADpicos-aberrant-patterns-detection-methods01465nas a2200217 4500008004100000245015400041210006900195260004600264300001200310520063100322653003000953653001100983653002500994653001601019653002201035653002301057100001801080700001501098700001401113856012001127 2005 eng d00aApplications of item response theory to improve health outcomes assessment: Developing item banks, linking instruments, and computer-adaptive testing0 aApplications of item response theory to improve health outcomes aCambridge, UKbCambridge University Press a445-4643 a(From the chapter) The current chapter builds on Reise's introduction to the basic concepts, assumptions, popular models, and important features of IRT and discusses the applications of item response theory (IRT) modeling to health outcomes assessment. In particular, we highlight the critical role of IRT modeling in: developing an instrument to match a study's population; linking two or more instruments measuring similar constructs on a common metric; and creating item banks that provide the foundation for tailored short-form instruments or for computerized adaptive assessments. 
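The "linking two or more instruments on a common metric" highlighted in the chapter record above is often done with a simple linear transformation estimated from items the instruments share. As one concrete possibility, here is a Python sketch of mean-sigma linking; the difficulty values are invented, and real applications would use a full anchor set and possibly a more robust method (e.g., Stocking-Lord).

import numpy as np

def mean_sigma_linking(b_source, b_target):
    """Mean-sigma linking constants from the difficulties of common items.
    Returns (A, B) such that A * value_on_source_metric + B expresses the
    value on the target metric (applies to theta and b parameters alike)."""
    b_s = np.asarray(b_source, dtype=float)
    b_t = np.asarray(b_target, dtype=float)
    A = b_t.std(ddof=1) / b_s.std(ddof=1)
    B = b_t.mean() - A * b_s.mean()
    return A, B

# Four common items calibrated separately on two instruments (invented values)
A, B = mean_sigma_linking([-1.2, -0.3, 0.4, 1.1], [-0.9, 0.0, 0.8, 1.6])
print(A, B, A * 0.25 + B)  # a source-metric score expressed on the target metric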
(PsycINFO Database Record (c) 2005 APA )10aComputer Assisted Testing10aHealth10aItem Response Theory10aMeasurement10aTest Construction10aTreatment Outcomes1 aHambleton, RK1 aGotay, C C1 aSnyder, C uhttp://iacat.org/content/applications-item-response-theory-improve-health-outcomes-assessment-developing-item-banks03119nas a2200385 4500008004100000020002200041245012900063210006900192250001500261260000800276300001000284490000700294520188400301653002502185653002702210653001502237653001002252653002102262653002802283653003802311653001102349653001102360653001102371653000902382653004602391653002702437653003002464653003202494100001502526700001602541700001602557700001502573700002502588856012002613 2005 eng d a0003-9993 (Print)00aAssessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing mobility in children using a computer adaptive testing a2005/05/17 cMay a932-90 v863 aOBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. 
CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time.10a*Computer Simulation10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)/*methods10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity1 aHaley, S M1 aRaczek, A E1 aCoster, W J1 aDumas, H M1 aFragala-Pinkham, M A uhttp://iacat.org/content/assessing-mobility-children-using-computer-adaptive-testing-version-pediatric-evaluation-002238nas a2200205 4500008004100000020004600041245009500087210006900182260002700251300001200278490000700290520149400297653002701791653003001818653001701848653002501865100001901890700001001909856011301919 2005 eng d a1560-4292 (Print); 1560-4306 (Electronic)00aA Bayesian student model without hidden nodes and its comparison with item response theory0 aBayesian student model without hidden nodes and its comparison w bIOS Press: Netherlands a291-3230 v153 aThe Bayesian framework offers a number of techniques for inferring an individual's knowledge state from evidence of mastery of concepts or skills. A typical application where such a technique can be useful is Computer Adaptive Testing (CAT). A Bayesian modeling scheme, POKS, is proposed and compared to the traditional Item Response Theory (IRT), which has been the prevalent CAT approach for the last three decades. POKS is based on the theory of knowledge spaces and constructs item-to-item graph structures without hidden nodes. It aims to offer an effective knowledge assessment method with an efficient algorithm for learning the graph structure from data. We review the different Bayesian approaches to modeling student ability assessment and discuss how POKS relates to them. The performance of POKS is compared to the IRT two parameter logistic model. Experimental results over a 34 item Unix test and a 160 item French language test show that both approaches can classify examinees as master or non-master effectively and efficiently, with relatively comparable performance. However, more significant differences are found in favor of POKS for a second task that consists in predicting individual question item outcome. Implications of these results for adaptive testing and student modeling are discussed, as well as the limitations and advantages of POKS, namely the issue of integrating concepts into its structure. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aBayesian Student Model10acomputer adaptive testing10ahidden nodes10aItem Response Theory1 aDesmarais, M C1 aPu, X uhttp://iacat.org/content/bayesian-student-model-without-hidden-nodes-and-its-comparison-item-response-theory02148nas a2200229 4500008004100000020002200041245008700063210006900150260004100219300001200260490000700272520135500279653002101634653001501655653002401670653002601694653002501720653002101745653003101766100002301797856009801820 2005 eng d a0022-0655 (Print)00aA comparison of item-selection methods for adaptive tests with content constraints0 acomparison of itemselection methods for adaptive tests with cont bBlackwell Publishing: United Kingdom a283-3020 v423 aIn test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. 
Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test, a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT), which showed equally good item-exposure rates but violations of some of the constraints and larger bias and inaccuracy of the ability estimator for the WDM.10aAdaptive Testing10aAlgorithms10acontent constraints10aitem selection method10ashadow test approach10aspiraling method10aweighted deviations method1 avan der Linden, WJ uhttp://iacat.org/content/comparison-item-selection-methods-adaptive-tests-content-constraints01601nas a2200253 4500008004100000020002200041245003000063210003000093250001500123300001100138490000600149520095200155653001401107653002501121653002901146653001801175653001901193653001101212653001401223653001901237653002001256100001601276856005501292 2005 eng d a1529-7713 (Print)00aComputer adaptive testing0 aComputer adaptive testing a2005/02/11 a109-270 v63 aThe development of item response theory (IRT) and Rasch models, inexpensive access to high-speed desktop computers, and the growth of the Internet have led to the creation and growth of computerized adaptive testing (CAT). This form of assessment is applicable both to high-stakes tests, such as certification or licensure exams, and to health-related quality-of-life surveys. This article discusses the historical background of CAT, including its many advantages over conventional (typically paper-and-pencil) alternatives. The CAT process is then described, including the specific differences among CATs based on 1-, 2-, and 3-parameter IRT models and on various Rasch models. Numerous topics concerning CAT in practice are then discussed, including initial item selection, content balancing, test difficulty, test length, and stopping rules. 
The article concludes with the author's reflections regarding the future of CAT.10a*Internet10a*Models, Statistical10a*User-Computer Interface10aCertification10aHealth Surveys10aHumans10aLicensure10aMicrocomputers10aQuality of Life1 aGershon, RC uhttp://iacat.org/content/computer-adaptive-testing02787nas a2200469 4500008004100000020002200041245010400063210006900167250001500236260000800251300001200259490000700271520132800278653002201606653003101628653001501659653001601674653001001690653003401700653002101734653002401755653002501779653001501804653001101819653005301830653002901883653001101912653001101923653002001934653000901954653003101963653004601994653003102040653001402071653003202085100001502117700001002132700002502142700001702167700001302184856012002197 2005 eng d a0012-1622 (Print)00aA computer adaptive testing approach for assessing physical functioning in children and adolescents0 acomputer adaptive testing approach for assessing physical functi a2005/02/15 cFeb a113-1200 v473 aThe purpose of this article is to demonstrate (1) the accuracy of and (2) the reduction in the amount of time and effort required by computer-adaptive testing (CAT) in assessing the physical functioning (self-care and mobility domains) of children and adolescents. A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. A simulation study using a CAT algorithm was conducted to determine the number of items necessary to approximate the score of a full-length assessment. We built simulated CATs (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that score estimates (based on computer simulations) comparable to those of the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning.10a*Computer Systems10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild Development/*physiology10aChild, Preschool10aComputer Simulation10aConfidence Intervals10aDemography10aFemale10aGlycogen Storage Disease Type II/physiopathology10aHealth Status Indicators10aHumans10aInfant10aInfant, Newborn10aMale10aMotor Activity/*physiology10aOutcome Assessment (Health Care)/*methods10aReproducibility of Results10aSelf Care10aSensitivity and Specificity1 aHaley, S M1 aNi, P1 aFragala-Pinkham, M A1 aSkrinar, A M1 aCorzo, D uhttp://iacat.org/content/computer-adaptive-testing-approach-assessing-physical-functioning-children-and-adolescents02348nas a2200193 4500008004100000245008700041210006900128300001200197490000700209520170500216653003001921653002201951653001401973653002801987653001302015100001002028700001302038856010302051 2005 eng d00aA computer-assisted test design and diagnosis system for use by classroom teachers0 acomputerassisted test design and diagnosis system for use by cla a419-4290 v213 aComputer-assisted assessment (CAA) has become increasingly important in education in recent years. 
A variety of computer software systems have been developed to help assess the performance of students at various levels. However, such systems are primarily designed to provide objective assessment of students and analysis of test items, and focus has mainly been placed on higher and further education. Although there are commercial professional systems available for use by primary and secondary educational institutions, such systems are generally expensive and require skilled expertise to operate. In view of the rapid progress made in the use of computer-based assessment for primary and secondary students by education authorities in the UK and elsewhere, there is a need to develop systems that are economical and easy to use and can provide the necessary information to help teachers improve students' performance. This paper presents the development of a software system that provides a range of functions including generating items and building item banks, designing tests, conducting tests on computers and analysing test results. Specifically, the system can generate information on the performance of students and test items that can be easily used to identify curriculum areas where students are underperforming. A case study based on data collected from five secondary schools in Hong Kong involved in the Curriculum, Evaluation and Management Centre's Middle Years Information System Project, Durham University, UK, has been undertaken to demonstrate the use of the system for diagnostic and performance analysis. (PsycINFO Database Record (c) 2006 APA ) (journal abstract)10aComputer Assisted Testing10aComputer Software10aDiagnosis10aEducational Measurement10aTeachers1 aHe, Q1 aTymms, P uhttp://iacat.org/content/computer-assisted-test-design-and-diagnosis-system-use-classroom-teachers01555nas a2200169 4500008004100000245008000041210006900121300001200190490000700202520094200209653002101151653003001172653005401202100001401256700001301270856010201283 2005 eng d00aControlling item exposure and test overlap in computerized adaptive testing0 aControlling item exposure and test overlap in computerized adapt a204-2170 v293 aThis article proposes an item exposure control method, which is an extension of the Sympson and Hetter procedure and can provide item exposure control at both the item and test levels. Item exposure rate and test overlap rate are two indices commonly used to track item exposure in computerized adaptive tests. By considering both indices, item exposure can be monitored at both the item and test levels. To control the item exposure rate and test overlap rate simultaneously, the modified procedure attempted to control not only the maximum value but also the variance of item exposure rates. Results indicated that the item exposure rate and test overlap rate could be controlled simultaneously by implementing the modified procedure. Item exposure control was improved and precision of trait estimation decreased when a prespecified maximum test overlap rate was stringent. 
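For context on the record above: in the classic Sympson and Hetter procedure that it extends, each item i carries an administration probability P(A|S)_i; when the selection rule picks item i, the item is actually administered with that probability, and the probabilities are tuned by repeated simulation so that the overall exposure rate P(A)_i = P(S)_i * P(A|S)_i stays below a ceiling r_max. A minimal Python sketch of one tuning iteration follows; the variable names and the relaxation detail are illustrative, and the item- and test-level extension proposed in the record is not reproduced here.

import numpy as np

def sympson_hetter_update(p_select, p_admin, r_max=0.20):
    """One iteration of Sympson-Hetter exposure-control tuning.

    p_select : P(S)_i, item selection rates observed in a simulated sample
    p_admin  : current administration probabilities P(A|S)_i
    Returns updated P(A|S)_i pushing exposure P(S)_i * P(A|S)_i below
    r_max; in practice this update is iterated to convergence."""
    p_select = np.asarray(p_select, dtype=float)
    p_admin = np.asarray(p_admin, dtype=float)
    exposure = p_select * p_admin
    new = np.where(exposure > r_max,
                   r_max / np.maximum(p_select, 1e-12),  # damp over-exposed items
                   np.minimum(1.0, p_admin * 1.1))       # let under-used items recover
    return np.clip(new, 0.0, 1.0)

print(sympson_hetter_update([0.45, 0.10, 0.02], [1.0, 1.0, 1.0]))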
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Content (Test) computerized adaptive testing1 aChen, S-Y1 aLei, P-W uhttp://iacat.org/content/controlling-item-exposure-and-test-overlap-computerized-adaptive-testing01996nas a2200205 4500008004100000020004600041245007900087210006900166260004100235300001400276490000700290520127200297653003001569653002501599653003401624100001301658700001801671700001601689856008501705 2005 eng d a0017-9124 (Print); 1475-6773 (Electronic)00aDynamic assessment of health outcomes: Time to let the CAT out of the bag?0 aDynamic assessment of health outcomes Time to let the CAT out of bBlackwell Publishing: United Kingdom a1694-17110 v403 aBackground: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective: To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure, including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputer adaptive testing10aItem Response Theory10aself reported health outcomes1 aCook, KF1 aO'Malley, K J1 aRoddey, T S uhttp://iacat.org/content/dynamic-assessment-health-outcomes-time-let-cat-out-bag02237nas a2200217 4500008004100000020002200041245014200063210006900205260004100274300001200315490000700327520142600334653001401760653003401774653002301808653001601831653002701847100001201874700001701886856011601903 2005 eng d a0022-0655 (Print)00aIncreasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests0 aIncreasing the homogeneity of CATs itemexposure rates by minimiz bBlackwell Publishing: United Kingdom a245-2690 v423 aA computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CAT's item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions to be separately used in the first, second, third, and fourth quarter test of CAT. 
The elements to be used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aalgorithm10acomputerized adaptive testing10aitem exposure rate10ashadow test10avaried target function1 aLi, Y H1 aSchafer, W D uhttp://iacat.org/content/increasing-homogeneity-cats-item-exposure-rates-minimizing-or-maximizing-varied-target01739nas a2200229 4500008004100000245008300041210006900124300001100193490000700204520103000211653003401241100001301275700001401288700001501302700001701317700001501334700001501349700001401364700001301378700001301391856010501404 2005 eng d00aAn item response theory-based pain item bank can enhance measurement precision0 aitem response theorybased pain item bank can enhance measurement a278-880 v303 aCancer-related pain is often under-recognized and undertreated. This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. 
We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice.10acomputerized adaptive testing1 aLai, J-S1 aDineen, K1 aReeve, B B1 aVon Roenn, J1 aShervin, D1 aMcGuire, M1 aBode, R K1 aPaice, J1 aCella, D uhttp://iacat.org/content/item-response-theory-based-pain-item-bank-can-enhance-measurement-precision02898nas a2200409 4500008004100000245012300041210006900164260000800233300001000241490000700251520159700258653004701855653001001902653000901912653001901921653003101940653002601971653001101997653002902008653001102037653000902048653001602057653003902073653001402112653002502126653002702151653003002178653003202208653002802240653002202268100001502290700001602305700001702321700001602338700001502354856011902369 2005 eng d00aMeasuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach0 aMeasuring physical function in patients with complex medical and cOct a741-80 v843 aOBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients) and physical functioning portion of the SF-36 (outpatients) and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. 
CAT achieves valid patient scores using fewer questions.10aActivities of Daily Living/*classification10aAdult10aAged10aCohort Studies10aContinuity of Patient Care10aDisability Evaluation10aFemale10aHealth Services Research10aHumans10aMale10aMiddle Aged10aPostoperative Care/*rehabilitation10aPrognosis10aRecovery of Function10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity10aSickness Impact Profile10aTreatment Outcome1 aSiebens, H1 aAndres, P L1 aPengsheng, N1 aCoster, W J1 aHaley, S M uhttp://iacat.org/content/measuring-physical-function-patients-complex-medical-and-postsurgical-conditions-computer01910nas a2200157 4500008004100000245010500041210006900146300001000215490000700225520132900232653003401561100001501595700001301610700001301623856011601636 2005 eng d00aThe promise of PROMIS: using item response theory to improve assessment of patient-reported outcomes0 apromise of PROMIS using item response theory to improve assessme aS53-70 v233 aPROMIS (Patient-Reported Outcomes Measurement Information System) is an NIH Roadmap network project intended to improve the reliability, validity, and precision of PROs and to provide definitive new instruments that will exceed the capabilities of classic instruments and enable improved outcome measurement for clinical research across all NIH institutes. Item response theory (IRT) measurement models now permit us to transition conventional health status assessment into an era of item banking and computerized adaptive testing (CAT). Item banking uses IRT measurement models and methods to develop item banks from large pools of items from many available questionnaires. IRT allows the reduction and improvement of items and assembles domains of items which are unidimensional and not excessively redundant. CAT provides a model-driven algorithm and software to iteratively select the most informative remaining item in a domain until a desired degree of precision is obtained. Through these approaches the number of patients required for a clinical trial may be reduced while holding statistical power constant. PROMIS tools, expected to improve precision and enable assessment at the individual patient level, which should broaden the appeal of PROs, will begin to be available to the general medical community in 2008.10acomputerized adaptive testing1 aFries, J F1 aBruce, B1 aCella, D uhttp://iacat.org/content/promise-promis-using-item-response-theory-improve-assessment-patient-reported-outcomes02850nas a2200241 4500008004100000245018600041210007000227300001200297490000700309520195000316653003002266653002502296653001802321653002502339653001802364653001802382653001102400100001402411700001502425700001902440700002002459856012902479 2005 eng d00aPropiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test]0 aPropiedades psicométricas de un test Adaptativo Informatizado pa a484-4910 v173 aThis article describes the psychometric properties of a computerized adaptive test (CAT) for measuring people's emotional adjustment. A review of the literature on item response theory (IRT) shows that IRT has been applied more often to the measurement of aptitude than of personality variables; nevertheless, several studies have demonstrated the effectiveness of IRT for the psychometric description of such variables.
Even so, few studies have explored the characteristics of an IRT-based computerized adaptive test for measuring a personality variable such as emotional adjustment. Our results show the efficiency of the CAT for assessing emotional adjustment: it provides a valid and precise measurement while using fewer items than the emotional adjustment scales of strongly established instruments.10aComputer Assisted Testing10aEmotional Adjustment10aItem Response10aPersonality Measures10aPsychometrics10aTest Validity10aTheory1 aAguado, D1 aRubio, V J1 aHontangas, P M1 aHernández, J M uhttp://iacat.org/content/propiedades-psicom%C3%A9tricas-de-un-test-adaptativo-informatizado-para-la-medici%C3%B3n-del-ajuste01733nas a2200181 4500008004100000245014500041210006900186300001200255490000700267520104600274653003001320653002401350653001501374100001301389700001701402700002101419856011101440 2005 eng d00aA randomized experiment to compare conventional, computerized, and computerized adaptive administration of ordinal polytomous attitude items0 arandomized experiment to compare conventional computerized and c a159-1830 v293 aA total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima's graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aComputer Assisted Testing10aTest Administration10aTest Items1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://iacat.org/content/randomized-experiment-compare-conventional-computerized-and-computerized-adaptive00507nas a2200157 4500008004100000020001000041245004300051210004300094260002600137653003000163653002800193653003400221653001700255100001200272856006500284 2005 eng d a05-0500aRecent trends in comparability studies0 aRecent trends in comparability studies bPearsoncAugust, 200510acomputer adaptive testing10aComputerized assessment10adifferential item functioning10aMode effects1 aPaek, P uhttp://iacat.org/content/recent-trends-comparability-studies01472nas a2200193 4500008004100000245020200041210006900243300001200312490000700324520066400331653002100995653003001016653005501046653001101101653001801112100001401130700001701144856011701161 2005 eng d00aSomministrazione di test computerizzati di tipo adattivo: Un' applicazione del modello di misurazione di Rasch [Administration of computerized and adaptive tests: An application of the Rasch Model]0 aSomministrazione di test computerizzati di tipo adattivo Un appl a131-1490 v123 aThe aim of the present study is to describe the characteristics of a procedure for administering computerized and adaptive tests (Computer Adaptive Testing or CAT). Items to be asked to the individuals are interactively chosen and are selected from a "bank" in which they were previously calibrated and recorded on the basis of their difficulty level. The selection of items is performed by increasingly more accurate estimates of the examinees' ability. The building of an item-bank on Psychometrics and the implementation of this procedure allow a first validation through Monte Carlo simulations. (PsycINFO Database Record (c) 2006 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Response Theory computerized adaptive testing10aModels10aPsychometrics1 aMiceli, R1 aMolinengo, G uhttp://iacat.org/content/somministrazione-di-test-computerizzati-di-tipo-adattivo-un-applicazione-del-modello-di01468nas a2200217 4500008004100000245004600041210004600087300001200133490000700145520085100152653001801003653002501021653003201046653001301078653002201091653002401113653001501137100001601152700001501168856006701183 2005 eng d00aTest construction for cognitive diagnosis0 aTest construction for cognitive diagnosis a262-2770 v293 aAlthough cognitive diagnostic models (CDMs) can be useful in the analysis and interpretation of existing tests, little has been developed to specify how one might construct a good test using aspects of the CDMs. This article discusses the derivation of a general CDM index based on Kullback-Leibler information that will serve as a measure of how informative an item is for the classification of examinees. The effectiveness of the index is examined for items calibrated using the deterministic input noisy "and" gate model (DINA) and the reparameterized unified model (RUM) by implementing a simple heuristic to construct a test from an item bank. When compared to randomly constructed tests from the same item bank, the heuristic shows significant improvement in classification rates. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10a(Measurement)10aCognitive Assessment10aItem Analysis (Statistical)10aProfiles10aTest Construction10aTest Interpretation10aTest Items1 aHenson, R K1 aDouglas, J uhttp://iacat.org/content/test-construction-cognitive-diagnosis03703nas a2200481 4500008004100000245005200041210005200093300001200145490000700157520221100164653001902375653002902394653005802423653001002481653005302491653000902544653001102553653002502564653002602589653003302615653001102648653001002659653000902669653001602678653002402694653007402718653001802792653002902810653005802839653003102897653003202928653003602960653003202996100001503028700001603043700001603059700001603075700001003091700001403101700001803115700001503133856007303148 2004 eng d00aActivity outcome measurement for postacute care0 aActivity outcome measurement for postacute care aI49-1610 v423 aBACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. 
CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAftercare/*standards/statistics & numerical data10aAged10aBoston10aCognition/physiology10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHuman10aMale10aMiddle Aged10aMovement/physiology10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aPsychometrics10aQuestionnaires/standards10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aSupport, U.S. Gov't, P.H.S.1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aLudlow, L H1 aNi, P1 aBond, T L1 aSinclair, S J1 aJette, A M uhttp://iacat.org/content/activity-outcome-measurement-postacute-care01880nas a2200241 4500008004100000245006000041210005900101260004800160300001200208520112200220653001501342653003401357653002201391653002101413653001901434653001601453653001701469653001301486653002801499100001301527700001601540856008201556 2004 eng d00aAdaptive computerized educational systems: A case study0 aAdaptive computerized educational systems A case study aSan Diego, CA. USAbElsevier Academic Press a143-1693 a(Created by APA) Adaptive instruction describes adjustments typical of one-on-one tutoring as discussed in the college tutorial scenario. So computerized adaptive instruction refers to the use of computer software--almost always incorporating artificially intelligent services--which has been designed to adjust both the presentation of information and the form of questioning to meet the current needs of an individual learner. This chapter describes a system for Internet-delivered adaptive instruction. The author attempts to demonstrate a sharp difference between the teaching that takes place outside of the classroom in universities and the kind that is at least afforded, if not taken advantage of by many, students in a more personalized educational setting such as those in the small liberal arts colleges. The author describes a computer-based technology that allows that gap to be bridged with the advantage of at least having more highly prepared learners sitting in college classrooms. A limited range of emerging research that supports that proposition is cited. (PsycINFO Database Record (c) 2005 APA )10aArtificial10aComputer Assisted Instruction10aComputer Software10aHigher Education10aIndividualized10aInstruction10aIntelligence10aInternet10aUndergraduate Education1 aRay, R D1 aMalott, R W uhttp://iacat.org/content/adaptive-computerized-educational-systems-case-study01664nas a2200229 4500008004100000245005500041210005300096300000800149490000700157520102900164653002101193653001201214653003001226653001801256653000901274100001701283700001201300700001501312700001601327700001401343856007701357 2004 eng d00aAssisted self-adapted testing: A comparative study0 aAssisted selfadapted testing A comparative study a2-90 v203 aA new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. 
Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. No differences in either posttest anxiety or ability were found. Concerning precision, AS-AT was as precise as CAT, and both revealed more precision than S-AT. It was concluded that AS-AT acted as a CAT concerning precision. Some hints, but not conclusive support, of the psychological similarity between AS-AT and S-AT were also found. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aAnxiety10aComputer Assisted Testing10aPsychometrics10aTest1 aHontangas, P1 aOlea, J1 aPonsoda, V1 aRevuelta, J1 aWise, S L uhttp://iacat.org/content/assisted-self-adapted-testing-comparative-study01797nas a2200361 4500008004100000020002200041245009500063210006900158250001500227260001100242300001000253490000700263520066300270653002500933653002900958653001000987653000900997653002201006653004501028653003701073653001101110653001101121653000901132653001601141653003601157653003001193653003401223100001601257700002401273700001001297700001501307856011301322 2004 eng d a1074-9357 (Print)00aComputer adaptive testing: a strategy for monitoring stroke rehabilitation across settings0 aComputer adaptive testing a strategy for monitoring stroke rehab a2004/05/01 cSpring a33-390 v113 aCurrent functional assessment instruments in stroke rehabilitation are often setting-specific and lack precision, breadth, and/or feasibility. Computer adaptive testing (CAT) offers a promising potential solution by providing a quick, yet precise, measure of function that can be used across a broad range of patient abilities and in multiple settings. CAT technology yields a precise score by selecting very few relevant items from a large and diverse item pool based on each individual's responses. We demonstrate the potential usefulness of a CAT assessment model with a cross-sectional sample of persons with stroke from multiple rehabilitation settings.10a*Computer Simulation10a*User-Computer Interface10aAdult10aAged10aAged, 80 and over10aCerebrovascular Accident/*rehabilitation10aDisabled Persons/*classification10aFemale10aHumans10aMale10aMiddle Aged10aMonitoring, Physiologic/methods10aSeverity of Illness Index10aTask Performance and Analysis1 aAndres, P L1 aBlack-Schaffer, R M1 aNi, P1 aHaley, S M uhttp://iacat.org/content/computer-adaptive-testing-strategy-monitoring-stroke-rehabilitation-across-settings02589nas a2200469 4500008004100000245007200041210006900113300001000182490000600192520108600198653002501284653001001309653001501319653002101334653002201355653005901377653007001436653003301506653001101539653001101550653001301561653000901574653002701583653002201610653005501632653001901687653001501706653006601721653001801787653003701805653004101842653003001883653001301913100001501926700001301941700001801954700001501972700001401987700001402001700001302015856009102028 2004 eng d00aComputerized adaptive measurement of depression: A simulation study0 aComputerized adaptive measurement of depression A simulation stu a13-230 v43 aBackground: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). 
We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items). The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity.10a*Computer Simulation10aAdult10aAlgorithms10aArea Under Curve10aComparative Study10aDepressive Disorder/*diagnosis/epidemiology/psychology10aDiagnosis, Computer-Assisted/*methods/statistics & numerical data10aFactor Analysis, Statistical10aFemale10aHumans10aInternet10aMale10aMass Screening/methods10aPatient Selection10aPersonality Inventory/*statistics & numerical data10aPilot Projects10aPrevalence10aPsychiatric Status Rating Scales/*statistics & numerical data10aPsychometrics10aResearch Support, Non-U.S. Gov't10aResearch Support, U.S. Gov't, P.H.S.10aSeverity of Illness Index10aSoftware1 aGardner, W1 aShear, K1 aKelleher, K J1 aPajer, K A1 aMammen, O1 aBuysse, D1 aFrank, E uhttp://iacat.org/content/computerized-adaptive-measurement-depression-simulation-study01539nas a2200229 4500008004100000020002200041245006400063210006300127260002600190300001200216490000700228520081800235653003401053653003001087653002801117653001301145100001901158700001501177700001601192700001701208856008401225 2004 eng d a0146-6216 (Print)00aComputerized adaptive testing with multiple-form structures0 aComputerized adaptive testing with multipleform structures bSage Publications: US a147-1640 v283 aA multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of an examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT). 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aLaw School Admission Test10amultiple-form structure10atestlets1 aArmstrong, R D1 aJones, D H1 aKoppel, N B1 aPashley, P J uhttp://iacat.org/content/computerized-adaptive-testing-multiple-form-structures01904nas a2200217 4500008004100000020004600041245010100087210006900188260002600257300001200283490000700295520111100302653002401413653003201437653001301469653003801482100001701520700001501537700001401552856012001566 2004 eng d a0021-9762 (Print); 1097-4679 (Electronic)00aComputers in clinical assessment: Historical developments, present status, and future challenges0 aComputers in clinical assessment Historical developments present bJohn Wiley & Sons: US a331-3450 v603 aComputerized testing methods have long been regarded as a potentially powerful asset for providing psychological assessment services. Ever since computers were first introduced and adapted to the field of assessment psychology in the 1950s, they have been a valuable aid for scoring, data processing, and even interpretation of test results. The history and status of computer-based personality and neuropsychological tests are discussed in this article. Several pertinent issues involved in providing test interpretation by computer are highlighted. Advances in computer-based test use, such as computerized adaptive testing, are described and problems noted. Today, there is great interest in expanding the availability of psychological assessment applications on the Internet. Although these applications show great promise, there are a number of problems associated with providing psychological tests on the Internet that need to be addressed by psychologists before the Internet can become a major medium for psychological service delivery. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10aclinical assessment10acomputerized testing method10aInternet10apsychological assessment services1 aButcher, J N1 aPerry, J L1 aHahn, J A uhttp://iacat.org/content/computers-clinical-assessment-historical-developments-present-status-and-future-challenges01608nas a2200217 4500008004100000020002200041245008200063210006900145260004300214300001200257490000700269520084600276653003401122653002601156653003501182653001601217653001701233100002301250700001801273856009901291 2004 eng d a1076-9986 (Print)00aConstraining item exposure in computerized adaptive testing with shadow tests0 aConstraining item exposure in computerized adaptive testing with bAmerican Educational Research Assn: US a273-2910 v293 aItem-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles Sympson and Hetter’s (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. The method does not, however, require time-consuming simulation studies to set values for control parameters before the operational use of the test. Instead, it can set the probabilities of item ineligibility adaptively during the test using the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item-exposure rates and had negligible impact on the bias and mean-squared error functions of the ability estimator. 
10acomputerized adaptive testing10aitem exposure control10aitem ineligibility constraints10aProbability10ashadow tests1 avan der Linden, WJ1 aVeldkamp, B P uhttp://iacat.org/content/constraining-item-exposure-computerized-adaptive-testing-shadow-tests01903nas a2200217 4500008004100000020002200041245007000063210006900133260004100202300001200243490000700255520117100262653003201433653003301465653001801498653002401516100001301540700001801553700002301571856009101594 2004 eng d a0022-0655 (Print)00aConstructing rotating item pools for constrained adaptive testing0 aConstructing rotating item pools for constrained adaptive testin bBlackwell Publishing: United Kingdom a345-3590 v413 aPreventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive tests10aconstrained adaptive testing10aitem exposure10arotating item pools1 aAriel, A1 aVeldkamp, B P1 avan der Linden, WJ uhttp://iacat.org/content/constructing-rotating-item-pools-constrained-adaptive-testing00542nas a2200145 4500008004100000245008900041210006900130300001200199490000700211653003400218100001400252700001400266700001500280856010100295 2004 eng d00aThe development and evaluation of a software prototype for computer-adaptive testing0 adevelopment and evaluation of a software prototype for computera a109-1230 v4310acomputerized adaptive testing1 aLilley, M1 aBarker, T1 aBritton, C uhttp://iacat.org/content/development-and-evaluation-software-prototype-computer-adaptive-testing01989nas a2200193 4500008004100000020002200041245011400063210006900177260004100246300001200287490000700299520125600306653003401562653002501596653002601621100001401647700001901661856011501680 2004 eng d a0022-0655 (Print)00aEffects of practical constraints on item selection rules at the early stages of computerized adaptive testing0 aEffects of practical constraints on item selection rules at the bBlackwell Publishing: United Kingdom a149-1740 v413 aThe purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. 
The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aitem selection rules10apractical constraints1 aChen, S-Y1 aAnkenmann, R D uhttp://iacat.org/content/effects-practical-constraints-item-selection-rules-early-stages-computerized-adaptive02468nas a2200193 4500008004100000245010700041210007100148300001200219490000700231520172200238653002101960653003401981653001602015653003002031653002302061653004502084100001502129856013002144 2004 eng d00aÉvaluation et multimédia dans l'apprentissage d'une L2 [Assessment and multimedia in learning an L2]0 aÉvaluation et multimédia dans lapprentissage dune L2 Assessment a475-4870 v163 aIn the first part of this paper different areas where technology may be used for second language assessment are described. First, item banking operations, which are generally based on item Response Theory but not necessarily restricted to dichotomously scored items, facilitate assessment task organization and require technological support. Second, technology may help to design more authentic assessment tasks or may be needed in some direct testing situations. Third, the assessment environment may be more adapted and more stimulating when technology is used to give the student more control. The second part of the paper presents different functions of assessment. The monitoring function (often called formative assessment) aims at adapting the classroom activities to students and to provide continuous feedback. Technology may be used to train the teachers in monitoring techniques, to organize data or to produce diagnostic information; electronic portfolios or quizzes that are built in some educational software may also be used for monitoring. The placement function is probably the one in which the application of computer adaptive testing procedures (e.g. French CAPT) is the most appropriate. Automatic scoring devices may also be used for placement purposes. Finally the certification function requires more valid and more reliable tools. Technology may be used to enhance the testing situation (to make it more authentic) or to facilitate data processing during the construction of a test. Almond et al. (2002) propose a four component model (Selection, Presentation, Scoring and Response) for designing assessment systems. Each component must be planned taking into account the assessment function. 
10aAdaptive Testing10aComputer Assisted Instruction10aEducational10aForeign Language Learning10aProgram Evaluation10aTechnology computerized adaptive testing1 aLaurier, M uhttp://iacat.org/content/%C3%A9valuation-et-multim%C3%A9dia-dans-lapprentissage-dune-l2-assessment-and-multimedia-learning-l202032nas a2200181 4500008004100000020002200041245006400063210006400127260004300191300001200234490000700246520141900253653003201672653003401704100001801738700001701756856007701773 2004 eng d a1076-9986 (Print)00aEvaluation of the CATSIB DIF procedure in a pretest setting0 aEvaluation of the CATSIB DIF procedure in a pretest setting bAmerican Educational Research Assn: US a177-1990 v293 aA new procedure, CATSIB, for assessing differential item functioning (DIF) on computerized adaptive tests (CATs) is proposed. CATSIB, a modified SIBTEST procedure, matches test takers on estimated ability and controls for impact-induced Type I error inflation by employing a CAT version of the SIBTEST "regression correction." The performance of CATSIB in terms of detection of DIF in pretest items was evaluated in a simulation study. Simulated test takers were adaptively administered 25 operational items from a pool of 1,000 and were linearly administered 16 pretest items that were evaluated for DIF. Sample size varied from 250 to 500 in each group. Simulated impact levels ranged from a 0- to 1-standard-deviation difference in mean ability levels. The results showed that CATSIB with the regression correction displayed good control over Type I error, whereas CATSIB without the regression correction displayed impact-induced Type I error inflation. With 500 test takers in each group, power rates were exceptionally high (84% to 99%) for values of DIF at the boundary between moderate and large DIF. For smaller samples of 250 test takers in each group, the corresponding power rates ranged from 47% to 95%. In addition, in all cases, CATSIB was very accurate in estimating the true values of DIF, displaying at most only minor estimation bias. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive tests10adifferential item functioning1 aNandakumar, R1 aRoussos, L A uhttp://iacat.org/content/evaluation-catsib-dif-procedure-pretest-setting00705nas a2200157 4500008004100000245013900041210006900180260003200249653003400281653004000315653004100355100001200396700001200408700001200420856011500432 2004 eng d00aAn investigation of two combination procedures of SPRT for three-category classification decisions in computerized classification test0 ainvestigation of two combination procedures of SPRT for threecat aSan Antonio, Texasc04/200410acomputerized adaptive testing10aComputerized classification testing10asequential probability ratio testing1 aJiao, H1 aWang, S1 aLau, CA uhttp://iacat.org/content/investigation-two-combination-procedures-sprt-three-category-classification-decisions01769nas a2200193 4500008004100000245023000041210006900271300000900340490000700349520093900356653002101295653003001316653001801346653001601364653004201380100001201422700001901434856012201453 2004 eng d00aKann die Konfundierung von Konzentrationsleistung und Aktivierung durch adaptives Testen mit dem FAKT vermieden werden? 
[Avoiding the confounding of concentration performance and activation by adaptive testing with the FACT]0 aKann die Konfundierung von Konzentrationsleistung und Aktivierun a1-170 v253 aThe study investigates the effect of computerized adaptive testing strategies on the confounding of concentration performance with activation. A sample of 54 participants was administered 1 out of 3 versions (2 adaptive, 1 non-adaptive) of the computerized Frankfurt Adaptive Concentration Test FACT (Moosbrugger & Heyden, 1997) at three successive points in time. During test administration, changes in activation (electrodermal activity) were recorded. The results pinpoint a confounding of concentration performance with activation for the non-adaptive test version, but not for the adaptive test versions (p = .01). Thus, adaptive FACT testing strategies can remove the confounding of concentration performance with activation, thereby increasing the discriminant validity. In conclusion, an attention-focusing hypothesis is formulated to explain the observed effect. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aConcentration10aPerformance10aTesting computerized adaptive testing1 aFrey, A1 aMoosbrugger, H uhttp://iacat.org/content/kann-die-konfundierung-von-konzentrationsleistung-und-aktivierung-durch-adaptives-testen-mit01770nas a2200217 4500008004100000245007700041210006900118300001100187490000600198520108400204653001501288653002501303653001601328653001001344653001801354653002101372653003101393100001901424700001501443856009401458 2004 eng d00aPre-equating: a simulation study based on a large scale assessment model0 aPreequating a simulation study based on a large scale assessment a301-180 v53 aAlthough post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so short that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalence of the calibrations and their associated errors vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. 
Results indicate that PrE is a viable solution to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as the full population) in populating the item bank.10a*Databases10a*Models, Theoretical10aCalibration10aHuman10aPsychometrics10aReference Values10aReproducibility of Results1 aTaherbhai, H M1 aYoung, M J uhttp://iacat.org/content/pre-equating-simulation-study-based-large-scale-assessment-model00537nas a2200181 4500008004100000245005000041210004800091300001000139490000700149653003400156100001400190700001500204700001500219700001400234700002600248700001300274856006800287 2004 eng d00aSiette: a web-based tool for adaptive testing0 aSiette a webbased tool for adaptive testing a29-610 v1410acomputerized adaptive testing1 aConejo, R1 aGuzmán, E1 aMillán, E1 aTrella, M1 aPérez-De-La-Cruz, JL1 aRíos, A uhttp://iacat.org/content/siette-web-based-tool-adaptive-testing01891nas a2200181 4500008004100000020002200041245012000063210006900183260002600252300001200278490000700290520119300297653003401490653003701524653001801561100001401579856011601593 2004 eng d a0146-6216 (Print)00aStrategies for controlling item exposure in computerized adaptive testing with the generalized partial credit model0 aStrategies for controlling item exposure in computerized adaptiv bSage Publications: US a165-1850 v283 aChoosing a strategy for controlling item exposure has become an integral part of test development for computerized adaptive testing (CAT). This study investigated the performance of six procedures for controlling item exposure in a series of simulated CATs under the generalized partial credit model. In addition to a no-exposure control baseline condition, the randomesque, modified-within-.10-logits, Sympson-Hetter, conditional Sympson-Hetter, a-stratified with multiple-stratification, and enhanced a-stratified with multiple-stratification procedures were implemented to control exposure rates. Two variations of the randomesque and modified-within-.10-logits procedures were examined, which varied the size of the item group from which the next item to be administered was randomly selected. The results indicate that although the conditional Sympson-Hetter provides somewhat lower maximum exposure rates, the randomesque and modified-within-.10-logits procedures with the six-item group variation have great utility for controlling overlap rates and increasing pool utilization and should be given further consideration. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10ageneralized partial credit model10aitem exposure1 aDavis, LL uhttp://iacat.org/content/strategies-controlling-item-exposure-computerized-adaptive-testing-generalized-partial01875nas a2200169 4500008004100000245013100041210006900172300001200241490000700253520122700260653003001487653002501517653001501542653001601557100001601573856011601589 2004 eng d00aUsing patterns of summed scores in paper-and-pencil tests and computer-adaptive tests to detect misfitting item score patterns0 aUsing patterns of summed scores in paperandpencil tests and comp a119-1360 v413 aTwo new methods have been proposed to determine unexpected sum scores on subtests (testlets) both for paper-and-pencil tests and computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted ρ, was compared with a method where the probability for each score combination was calculated using a highest density region (HDR). 
Furthermore, these methods were compared with the standardized log-likelihood statistic with and without a correction for the estimated latent trait value (denoted as l_z* and l_z, respectively). Data were simulated on the basis of the one-parameter logistic model, and both parametric and nonparametric logistic regression were used to obtain estimates of the latent trait. Results showed that it is important to take the trait level into account when comparing subtest scores. In a nonparametric item response theory (IRT) context, an adapted version of the HDR method was a powerful alternative to ρ. In a parametric IRT context, results showed that l_z* had the highest power when the data were simulated conditionally on the estimated latent trait level. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aComputer Assisted Testing10aItem Response Theory10aperson Fit10aTest Scores1 aMeijer, R R uhttp://iacat.org/content/using-patterns-summed-scores-paper-and-pencil-tests-and-computer-adaptive-tests-detect01223nas a2200193 4500008004100000245002900041210002900070260003200099300001200131520066800143653003000811653003200841653001400873653004500887100001200932700001500944700001600959856005400975 2003 eng d00aAssessing question banks0 aAssessing question banks aLondon, UKbKogan Page Ltd. a171-2303 aIn Chapter 14, Joanna Bull and James Dalziel provide a comprehensive treatment of the issues surrounding the use of Question Banks and Computer Assisted Assessment, and provide a number of excellent examples of implementations. In their review of the technologies employed in Computer Assisted Assessment the authors include Computer Adaptive Testing and data generation. The authors reveal significant issues involving the impact of Intellectual Property rights and computer assisted assessment and make important suggestions for strategies to overcome these obstacles. (PsycINFO Database Record (c) 2005 APA ) http://www-jime.open.ac.uk/2003/1/ (journal abstract)10aComputer Assisted Testing10aCurriculum Based Assessment10aEducation10aTechnology computerized adaptive testing1 aBull, J1 aDalziel, J1 aVreeland, T uhttp://iacat.org/content/assessing-question-banks01919nas a2200241 4500008004100000245009400041210006900135300001200204490000700216520110100223653002101324653001301345653003001358653005701388653000901445653003201454653002601486653002001512100001401532700001301546700001501559856010301574 2003 eng d00aA Bayesian method for the detection of item preknowledge in computerized adaptive testing0 aBayesian method for the detection of item preknowledge in comput a121-1370 v273 aWith the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. 
These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aCheating10aComputer Assisted Testing10aIndividual Differences computerized adaptive testing10aItem10aItem Analysis (Statistical)10aMathematical Modeling10aResponse Theory1 aMcLeod, L1 aLewis, C1 aThissen, D uhttp://iacat.org/content/bayesian-method-detection-item-preknowledge-computerized-adaptive-testing02650nas a2200385 4500008004100000245014400041210006900185300001200254490000700266520138100273653002101654653003301675653002901708653001501737653001001752653000901762653002201771653002601793653003301819653002501852653001901877653001001896653002501906653001601931653002401947653002601971653002701997653003202024653001302056653002802069100001702097700001602114700001402130856012002144 2003 eng d00aCalibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT)0 aCalibration of an item pool for assessing the burden of headache a913-9330 v123 aBACKGROUND: Measurement of headache impact is important in clinical trials, case detection, and the clinical monitoring of patients. Computerized adaptive testing (CAT) of headache impact has potential advantages over traditional fixed-length tests in terms of precision, relevance, real-time quality control and flexibility. OBJECTIVE: To develop an item pool that can be used for a computerized adaptive test of headache impact. METHODS: We analyzed responses to four well-known tests of headache impact from a population-based sample of recent headache sufferers (n = 1016). We used confirmatory factor analysis for categorical data and analyses based on item response theory (IRT). RESULTS: In factor analyses, we found very high correlations between the factors hypothesized by the original test constructers, both within and between the original questionnaires. These results suggest that a single score of headache impact is sufficient. We established a pool of 47 items which fitted the generalized partial credit IRT model. By simulating a computerized adaptive health test we showed that an adaptive test of only five items had a very high concordance with the score based on all items and that different worst-case item selection scenarios did not lead to bias. CONCLUSION: We have established a headache impact item pool that can be used in CAT of headache impact.10a*Cost of Illness10a*Decision Support Techniques10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aDisability Evaluation10aFactor Analysis, Statistical10aHeadache/*psychology10aHealth Surveys10aHuman10aLongitudinal Studies10aMiddle Aged10aMigraine/psychology10aModels, Psychological10aPsychometrics/*methods10aQuality of Life/*psychology10aSoftware10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. 
uhttp://iacat.org/content/calibration-item-pool-assessing-burden-headaches-application-item-response-theory-headache01836nas a2200205 4500008004100000245009000041210006900131300001100200490000700211520111400218653002101332653003001353653001601383653003201399653001601431653004501447100001501492700001601507856010701523 2003 eng d00aA comparative study of item exposure control methods in computerized adaptive testing0 acomparative study of item exposure control methods in computeriz a71-1030 v403 aThis study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aEducational10aItem Analysis (Statistical)10aMeasurement10aStrategies computerized adaptive testing1 aChang, S-W1 aAnsley, T N uhttp://iacat.org/content/comparative-study-item-exposure-control-methods-computerized-adaptive-testing01757nas a2200313 4500008004100000245007700041210006900118300001200187490000700199520083400206653002101040653001501061653001701076653001601093653003001109653001701139653001501156653002501171653002001196653001501216653002501231653001801256653000901274100001801283700001201301700001601313700001601329856009801345 2003 eng d00aComputerized adaptive rating scales for measuring managerial performance0 aComputerized adaptive rating scales for measuring managerial per a237-2460 v113 aComputerized adaptive rating scales (CARS) had been developed to measure contextual or citizenship performance. This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al, 2001). In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that reduces substantially the number of item pairs required to obtain almost as much precision in the performance estimates. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aAlgorithms10aAssociations10aCitizenship10aComputer Assisted Testing10aConstruction10aContextual10aItem Response Theory10aJob Performance10aManagement10aManagement Personnel10aRating Scales10aTest1 aSchneider, RJ1 aGoff, M1 aAnderson, S1 aBorman, W C uhttp://iacat.org/content/computerized-adaptive-rating-scales-measuring-managerial-performance01927nas a2200229 4500008004100000245007200041210006900113300001200182490000700194520116000201653001801361653002101379653003001400653001801430653002501448653002501473653005701498653002201555100001501577700001201592856009301604 2003 eng d00aComputerized adaptive testing using the nearest-neighbors criterion0 aComputerized adaptive testing using the nearestneighbors criteri a204-2160 v273 aItem selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis10aItem Response Theory10aStatistical Analysis10aStatistical Estimation computerized adaptive testing10aStatistical Tests1 aCheng, P E1 aLiou, M uhttp://iacat.org/content/computerized-adaptive-testing-using-nearest-neighbors-criterion01587nas a2200145 4500008004100000245005200041210005200093300001200145490000700157520113200164653003401296100001601330700002301346856007201369 2003 eng d00aComputerized adaptive testing with item cloning0 aComputerized adaptive testing with item cloning a247-2610 v273 a(from the journal abstract) To increase the number of items available for adaptive testing and reduce the cost of item writing, the use of techniques of item cloning has been proposed. An important consequence of item cloning is possible variability between the item parameters. To deal with this variability, a multilevel item response theory (IRT) model is presented which allows for differences between the distributions of item parameters of families of item clones. A marginal maximum likelihood and a Bayesian procedure for estimating the hyperparameters are presented. In addition, an item-selection procedure for computerized adaptive testing with item cloning is presented which has the following two stages: First, a family of item clones is selected to be optimal at the estimate of the person parameter. Second, an item is randomly selected from the family for administration. 
Results from simulation studies based on an item pool from the Law School Admission Test (LSAT) illustrate the accuracy of these item pool calibration and adaptive testing procedures. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aGlas, C A W1 avan der Linden, WJ uhttp://iacat.org/content/computerized-adaptive-testing-item-cloning03162nas a2200361 4500008004100000245012500041210006900166300001200235490000700247520200400254653002902258653001502287653001002302653000902312653002202321653002002343653003302363653002402396653001102420653001002431653000902441653001602450653002502466653002602491653004302517653003202560653001902592653002802611100001702639700001602656700001402672856011402686 2003 eng d00aThe feasibility of applying item response theory to measures of migraine impact: a re-analysis of three clinical studies0 afeasibility of applying item response theory to measures of migr a887-9020 v123 aBACKGROUND: Item response theory (IRT) is a powerful framework for analyzing multiitem scales and is central to the implementation of computerized adaptive testing. OBJECTIVES: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact--the Migraine Specific Questionnaire (MSQ). METHODS: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. RESULTS: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test constructions. Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by only one score. IRT analyses of the MSQ were feasible and provided several suggestions as to how to improve the items and in particular the response choices. Out of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores that mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients that are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. CONCLUSIONS: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing.10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aCost of Illness10aFactor Analysis, Statistical10aFeasibility Studies10aFemale10aHuman10aMale10aMiddle Aged10aMigraine/*psychology10aModels, Psychological10aPsychometrics/instrumentation/*methods10aQuality of Life/*psychology10aQuestionnaires10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. 
uhttp://iacat.org/content/feasibility-applying-item-response-theory-measures-migraine-impact-re-analysis-three00957nas a2200157 4500008004100000245011200041210006900153300001100222490000700233520035900240653003400599100001500633700001900648700001300667856011900680 2003 eng d00aIncorporation of Content Balancing Requirements in Stratification Designs for Computerized Adaptive Testing0 aIncorporation of Content Balancing Requirements in Stratificatio a257-700 v633 aStudied three stratification designs for computerized adaptive testing in conjunction with three well-developed content balancing methods. Simulation study results show substantial differences in item overlap rate and pool utilization among different methods. Recommends an optimal combination of stratification design and content balancing method. (SLD)10acomputerized adaptive testing1 aLeung, C-K1 aChang, Hua-Hua1 aHau, K-T uhttp://iacat.org/content/incorporation-content-balancing-requirements-stratification-designs-computerized-adaptive01743nas a2200217 4500008004100000245008700041210006900128300001200197490000700209520100200216653002101218653003001239653002601269653002501295653002001320653001401340653004901354100001401403700001401417856009401431 2003 eng d00aItem exposure constraints for testlets in the verbal reasoning section of the MCAT0 aItem exposure constraints for testlets in the verbal reasoning s a335-3560 v273 aThe current study examined item exposure control procedures for testlet scored reading passages in the Verbal Reasoning section of the Medical College Admission Test with four computerized adaptive testing (CAT) systems using the partial credit model. The first system used a traditional CAT using maximum information item selection. The second used random item selection to provide a baseline for optimal exposure rates. The third used a variation of Lunz and Stahl's randomization procedure. The fourth used Luecht and Nungester's computerized adaptive sequential testing (CAST) system. A series of simulated fixed-length CATs was run to determine the optimal item length selection procedure. Results indicated that both the randomization procedure and CAST performed well in terms of exposure control and measurement precision, with the CAST system providing the best overall solution when all variables were taken into consideration. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aEntrance Examinations10aItem Response Theory10aRandom Sampling10aReasoning10aVerbal Ability computerized adaptive testing1 aDavis, LL1 aDodd, B G uhttp://iacat.org/content/item-exposure-constraints-testlets-verbal-reasoning-section-mcat00516nas a2200169 4500008004100000245003700041210003700078260004900115300001400164653003400178100001800212700001300230700001700243700001200260700001500272856005900287 2003 eng d00aItem selection in polytomous CAT0 aItem selection in polytomous CAT aTokyo, JapanbPsychometric Society, Springer a207–21410acomputerized adaptive testing1 aVeldkamp, B P1 aOkada, A1 aShigemasu, K1 aKano, Y1 aMeulman, J uhttp://iacat.org/content/item-selection-polytomous-cat01432nas a2200205 4500008004100000245008800041210007000129300001200199490000700211520067700218653002100895653003000916653002400946653002500970653002600995653005201021100001901073700002301092856011101115 2003 eng d00aOptimal stratification of item pools in α-stratified computerized adaptive testing0 aOptimal stratification of item pools in αstratified computerized a262-2740 v273 aA method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Content (Test)10aItem Response Theory10aMathematical Modeling10aTest Construction computerized adaptive testing1 aChang, Hua-Hua1 avan der Linden, WJ uhttp://iacat.org/content/optimal-stratification-item-pools-%CE%B1-stratified-computerized-adaptive-testing01797nas a2200241 4500008004100000245009300041210006900134300001200203490000700215520098200222653001801204653002101222653003001243653001901273653004601292653001801338653002501356653001501381100001401396700001901410700001501429856011101444 2003 eng d00aThe relationship between item exposure and test overlap in computerized adaptive testing0 arelationship between item exposure and test overlap in computeri a129-1450 v403 aThe purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r-sub(max)). Therefore, item exposure control methods which implement a specification of r-sub(max) (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. 
(PsycINFO Database Record (c) 2005 APA )10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aHuman Computer10aInteraction computerized adaptive testing10aItem Analysis10aItem Analysis (Test)10aTest Items1 aChen, S-Y1 aAnkenmann, R D1 aSpray, J A uhttp://iacat.org/content/relationship-between-item-exposure-and-test-overlap-computerized-adaptive-testing01516nas a2200157 4500008004100000245009500041210006900136300001200205490000700217520090100224653002101125653003001146653004501176100002301221856011401244 2003 eng d00aSome alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing0 aSome alternatives to SympsonHetter itemexposure control in compu a249-2650 v283 aThe Hetter and Sympson (1997; 1985) method is a method of probabilistic item-exposure control in computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees’ ability parameter. Formal properties of the method are identified that help us explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test from an item pool from the Law School Admission Test (LSAT). Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. 10aAdaptive Testing10aComputer Assisted Testing10aTest Items computerized adaptive testing1 avan der Linden, WJ uhttp://iacat.org/content/some-alternatives-sympson-hetter-item-exposure-control-computerized-adaptive-testing01545nas a2200193 4500008004100000245026000041210006900301300001000370490000700380520067700387653002101064653002201085653001701107653001501124653004801139100002201187700002201209856012001231 2003 eng d00aTiming behavior in computerized adaptive testing: Response times for correct and incorrect answers are not related to general fluid intelligence/Zum Zeitverhalten beim computergestützten adaptiven Testen: Antwortlatenzen bei richtigen und falschen Lösungen0 aTiming behavior in computerized adaptive testing Response times a57-630 v243 aExamined the effects of general fluid intelligence on item response times for correct and false responses in computerized adaptive testing. After performing the CFT3 intelligence test, 80 individuals (aged 17-44 yrs) completed perceptual and cognitive discrimination tasks. Results show that response times were related neither to the proficiency dimension reflected by the task nor to the individual level of fluid intelligence. Furthermore, the false > correct-phenomenon as well as substantial positive correlations between item response times for false and correct responses were shown to be independent of intelligence levels. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aCognitive Ability10aIntelligence10aPerception10aReaction Time computerized adaptive testing1 aRammsayer, Thomas1 aBrandler, Susanne uhttp://iacat.org/content/timing-behavior-computerized-adaptive-testing-response-times-correct-and-incorrect-answers01512nas a2200229 4500008004100000245008700041210006900128300001200197490000700209520075300216653002100969653001300990653003001003653003401033653001101067653001501078653001501093653001801108100002301126700002701149856010601176 2003 eng d00aUsing response times to detect aberrant responses in computerized adaptive testing0 aUsing response times to detect aberrant responses in computerize a251-2650 v683 aA lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times offer thus information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. The detection rates for the Bayesian checks outperformed those for the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered.10aAdaptive Testing10aBehavior10aComputer Assisted Testing10acomputerized adaptive testing10aModels10aperson Fit10aPrediction10aReaction Time1 avan der Linden, WJ1 aKrimpen-Stoop, E M L A uhttp://iacat.org/content/using-response-times-detect-aberrant-responses-computerized-adaptive-testing02855nas a2200265 4500008004100000245006600041210006600107260000800173300000900181490000700190520208800197653002102285653002902306653003002335653001202365653001102377653001302388653003102401653001902432100001302451700001502464700001302479700001502492856008202507 2002 eng d00aAdvances in quality of life measurements in oncology patients0 aAdvances in quality of life measurements in oncology patients cJun a60-80 v293 aAccurate assessment of the quality of life (QOL) of patients can provide important clinical information to physicians, especially in the area of oncology. Changes in QOL are important indicators of the impact of a new cytotoxic therapy, can affect a patient's willingness to continue treatment, and may aid in defining response in the absence of quantifiable endpoints such as tumor regression. Because QOL is becoming an increasingly important aspect in the management of patients with malignant disease, it is vital that the instruments used to measure QOL are reliable and accurate. Assessment of QOL involves a multidimensional approach that includes physical, functional, social, and emotional well-being, and the most comprehensive instruments measure at least three of these domains. Instruments to measure QOL can be generic (eg, the Nottingham Health Profile), targeted toward specific illnesses (eg, Functional Assessment of Cancer Therapy - Lung), or be a combination of generic and targeted. Two of the most widely used examples of the combination, or hybrid, instruments are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 Items and the Functional Assessment of Chronic Illness Therapy. 
A consequence of the increasing international collaboration in clinical trials has been the growing necessity for instruments that are valid across languages and cultures. To assure the continuing reliability and validity of QOL instruments in this regard, item response theory can be applied. Techniques such as item response theory may be used in the future to construct QOL item banks containing large sets of validated questions that represent various levels of QOL domains. As QOL becomes increasingly important in understanding and approaching the overall management of cancer patients, the tools available to clinicians and researchers to assess QOL will continue to evolve. While the instruments currently available provide reliable and valid measurement, further improvements in precision and application are anticipated.10a*Quality of Life10a*Sickness Impact Profile10aCross-Cultural Comparison10aCulture10aHumans10aLanguage10aNeoplasms/*physiopathology10aQuestionnaires1 aCella, D1 aChang, C-H1 aLai, J S1 aWebster, K uhttp://iacat.org/content/advances-quality-life-measurements-oncology-patients01978nas a2200289 4500008004100000245007600041210006900117260000800186300001200194490000700206520114900213653002401362653001301386653002101399653002001420653001501440653001001455653001001465653001101475653001101486653000901497653002401506653002601530100001601556700001501572856010101587 2002 eng d00aAssessing tobacco beliefs among youth using item response theory models0 aAssessing tobacco beliefs among youth using item response theory cNov aS21-S390 v683 aSuccessful intervention research programs to prevent adolescent smoking require well-chosen, psychometrically sound instruments for assessing smoking prevalence and attitudes. Twelve thousand eight hundred and ten adolescents were surveyed about their smoking beliefs as part of the Teenage Attitudes and Practices Survey project, a prospective cohort study of predictors of smoking initiation among US adolescents. Item response theory (IRT) methods are used to frame a discussion of questions that a researcher might ask when selecting an optimal item set. IRT methods are especially useful for choosing items during instrument development, trait scoring, evaluating item functioning across groups, and creating optimal item subsets for use in specialized applications such as computerized adaptive testing. Data analytic steps for IRT modeling are reviewed for evaluating item quality and differential item functioning across subgroups of gender, age, and smoking status. 
Implications and challenges in the use of these methods for tobacco onset research and for assessing the developmental trajectories of smoking among youth are discussed.10a*Attitude to Health10a*Culture10a*Health Behavior10a*Questionnaires10aAdolescent10aAdult10aChild10aFemale10aHumans10aMale10aModels, Statistical10aSmoking/*epidemiology1 aPanter, A T1 aReeve, B B uhttp://iacat.org/content/assessing-tobacco-beliefs-among-youth-using-item-response-theory-models02081nas a2200229 4500008004100000245012900041210006900170300001200239490000700251520124000258653001801498653002101516653004501537653003001582653001801612653002501630653002601655100001601681700001401697700001901711856012101730 2002 eng d00aA comparison of item selection techniques and exposure control mechanisms in CATs using the generalized partial credit model0 acomparison of item selection techniques and exposure control mec a147-1630 v263 aThe use of more performance items in large-scale testing has led to an increase in the research investigating the use of polytomously scored items in computer adaptive testing (CAT). Because this research has to be complemented with information pertaining to exposure control, the present research investigated the impact of using five different exposure control algorithms in two sized item pools calibrated using the generalized partial credit model. The results of the simulation study indicated that the a-stratified design, in comparison to a no-exposure control condition, could be used to reduce item exposure and overlap, increase pool utilization, and only minorly degrade measurement precision. Use of the more restrictive exposure control algorithms, such as the Sympson-Hetter and conditional Sympson-Hetter, controlled exposure to a greater extent but at the cost of measurement precision. Because convergence of the exposure control parameters was problematic for some of the more restrictive exposure control algorithms, use of the more simplistic exposure control mechanisms, particularly when the test length to item pool size ratio is large, is recommended. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10a(Statistical)10aAdaptive Testing10aAlgorithms computerized adaptive testing10aComputer Assisted Testing10aItem Analysis10aItem Response Theory10aMathematical Modeling1 aPastor, D A1 aDodd, B G1 aChang, Hua-Hua uhttp://iacat.org/content/comparison-item-selection-techniques-and-exposure-control-mechanisms-cats-using-generalized01498nas a2200133 4500008004100000245011800041210006900159300000900228490000700237520094700244653003401191100001801225856012101243 2002 eng d00aComputer adaptive testing: The impact of test characteristics on perceived performance and test takers' reactions0 aComputer adaptive testing The impact of test characteristics on a34100 v623 aThis study examined the relationship between characteristics of adaptive testing and test takers' subsequent reactions to the test. Participants took a computer adaptive test in which two features, the difficulty of the initial item and the difficulty of subsequent items, were manipulated. These two features of adaptive testing determined the number of items answered correctly by examinees and their subsequent reactions to the test. The data show that the relationship between test characteristics and reactions was fully mediated by perceived performance on the test. In addition, the impact of feedback on reactions to adaptive testing was also evaluated. 
In general, feedback that was consistent with perceptions of performance had a positive impact on reactions to the test. Implications for adaptive test design concerning maximizing test takers' reactions are discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aTonidandel, S uhttp://iacat.org/content/computer-adaptive-testing-impact-test-characteristics-perceived-performance-and-test-takers00685nas a2200145 4500008004100000245003400041210003400075300001100109490000700120520029200127653003400419100001200453700001500465856005900480 2002 eng d00aComputerised adaptive testing0 aComputerised adaptive testing a619-220 v333 aConsiders the potential of computer adaptive testing (CAT). Discusses the use of CAT instead of traditional paper and pencil tests, identifies decisions that impact the efficacy of CAT, and concludes that CAT is beneficial when used to its full potential on certain types of tests. (LRW)10acomputerized adaptive testing1 aLatu, E1 aChapman, E uhttp://iacat.org/content/computerised-adaptive-testing01443nas a2200241 4500008004100000245008000041210006900121300001200190490000700202520067300209653003000882653002800912653002500940653002300965653001600988653002201004653002101026100001301047700001601060700001001076700001601086856009901102 2002 eng d00aData sparseness and on-line pretest item calibration-scaling methods in CAT0 aData sparseness and online pretest item calibrationscaling metho a207-2180 v393 aCompared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods. The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error (PsycINFO Database Record (c) 2005 APA )10aComputer Assisted Testing10aEducational Measurement10aItem Response Theory10aMaximum Likelihood10aMethodology10aScaling (Testing)10aStatistical Data1 aBan, J-C1 aHanson, B A1 aYi, Q1 aHarris, D J uhttp://iacat.org/content/data-sparseness-and-line-pretest-item-calibration-scaling-methods-cat02773nas a2200133 4500008004100000245009800041210006900139300000900208490000700217520225500224653003402479100001602513856011002529 2002 eng d00aThe effect of test characteristics on aberrant response patterns in computer adaptive testing0 aeffect of test characteristics on aberrant response patterns in a33630 v623 aThe advantages that computer adaptive testing offers over linear tests have been well documented. The Computer Adaptive Test (CAT) design is more efficient than the Linear test design as fewer items are needed to estimate an examinee's proficiency to a desired level of precision. In the ideal situation, a CAT will result in examinees answering different number of items according to the stopping rule employed. Unfortunately, the realities of testing conditions have necessitated the imposition of time and minimum test length limits on CATs. Such constraints might place a burden on the CAT test taker resulting in aberrant response behaviors by some examinees. Occurrence of such response patterns results in inaccurate estimation of examinee proficiency levels. 
This study examined the effects of test lengths, time limits and the interaction of these factors with the examinee proficiency levels on the occurrence of aberrant response patterns. The focus of the study was on the aberrant behaviors caused by rushed guessing due to restrictive time limits. Four different testing scenarios were examined: fixed length performance tests with and without content constraints, fixed length mastery tests and variable length mastery tests without content constraints. For each of these testing scenarios, the effect of two test lengths, five different timing conditions and the interaction between these factors with three ability levels on ability estimation were examined. For fixed and variable length mastery tests, decision accuracy was also looked at in addition to the estimation accuracy. Several indices were used to evaluate the estimation and decision accuracy for different testing conditions. The results showed that changing time limits had a significant impact on the occurrence of aberrant response patterns conditional on ability. Increasing test length had a negligible, if not negative, effect on ability estimation when rushed guessing occurred. In the case of performance testing, high-ability examinees suffered the most, while in classification testing, middle-ability examinees did. The decision accuracy was considerably affected in the case of variable length classification tests. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aRizavi, S M uhttp://iacat.org/content/effect-test-characteristics-aberrant-response-patterns-computer-adaptive-testing01634nas a2200217 4500008004100000245009700041210006900138300001200207490000700219520087500226653002101101653003001122653002501152653002301177653002501200653002801225653002701253100001301280700001501293856010801308 2002 eng d00aAn EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model0 aEM approach to parameter estimation for the Zinnes and Griggs pa a208-2270 v263 aBorman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach, the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article, the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte Carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e., ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition, latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. 10aAdaptive Testing10aComputer Assisted Testing10aItem Response Theory10aMaximum Likelihood10aPersonnel Evaluation10aStatistical Correlation10aStatistical Estimation1 aStark, S1 aDrasgow, F uhttp://iacat.org/content/em-approach-parameter-estimation-zinnes-and-griggs-paired-comparison-irt-model00524nas a2200109 4500008004100000245010600041210006900147260002500216653003400241100001900275856012000294 2002 eng d00aAn empirical comparison of achievement level estimates from adaptive tests and paper-and-pencil tests0 aempirical comparison of achievement level estimates from adaptiv aNew Orleans, LA. 
USA10acomputerized adaptive testing1 aKingsbury, G G uhttp://iacat.org/content/empirical-comparison-achievement-level-estimates-adaptive-tests-and-paper-and-pencil-tests01306nas a2200169 4500008004100000245009500041210006900136300001200205490000700217520070700224653003400931100001400965700001600979700001600995700001701011856010801028 2002 eng d00aEvaluation of selection procedures for computerized adaptive testing with polytomous items0 aEvaluation of selection procedures for computerized adaptive tes a393-4110 v263 aIn the present study, a procedure that has been used to select dichotomous items in computerized adaptive testing was applied to polytomous items. This procedure was designed to select the item with maximum weighted information. In a simulation study, the item information function was integrated over a fixed interval of ability values and the item with the maximum area was selected. This maximum interval information item selection procedure was compared to a maximum point information item selection procedure. Substantial differences between the two item selection procedures were not found when computerized adaptive tests were evaluated on bias and the root mean square of the ability estimate. 10acomputerized adaptive testing1 aRijn, P W1 aEggen, Theo1 aHemker, B T1 aSanders, P F uhttp://iacat.org/content/evaluation-selection-procedures-computerized-adaptive-testing-polytomous-items01774nas a2200217 4500008004100000245006200041210006200103260005600165300001200221520108400233653002401317653001601341653001401357653002201371653001501393653001801408653001301426100001901439700001601458856008201474 2002 eng d00aGenerating abstract reasoning items with cognitive theory0 aGenerating abstract reasoning items with cognitive theory aMahwah, N.J. USAbLawrence Erlbaum Associates, Inc. a219-2503 a(From the chapter) Developed and evaluated a generative system for abstract reasoning items based on cognitive theory. The cognitive design system approach was applied to generate matrix completion problems. Study 1 involved developing the cognitive theory with 191 college students who were administered Set I and Set II of the Advanced Progressive Matrices. Study 2 examined item generation by cognitive theory. Study 3 explored the psychometric properties and construct representation of abstract reasoning test items with 728 young adults. Five structurally equivalent forms of Abstract Reasoning Test (ART) items were prepared from the generated item bank and administered to the Ss. In Study 4, the nomothetic span of construct validity of the generated items was examined with 728 young adults who were administered ART items, and 217 young adults who were administered ART items and the Advanced Progressive Matrices. Results indicate the matrix completion items were effectively generated by the cognitive design system approach. 
(PsycINFO Database Record (c) 2005 APA )10aCognitive Processes10aMeasurement10aReasoning10aTest Construction10aTest Items10aTest Validity10aTheories1 aEmbretson, S E1 aKyllonen, P uhttp://iacat.org/content/generating-abstract-reasoning-items-cognitive-theory02009nas a2200205 4500008004100000245008200041210006900123300001200192490000700204520132300211653002101534653001501555653003001570653001101600653000901611653004701620100001901667700001301686856010401699 2002 eng d00aHypergeometric family and item overlap rates in computerized adaptive testing0 aHypergeometric family and item overlap rates in computerized ada a387-3980 v673 aA computerized adaptive test (CAT) is usually administered to small groups of examinees at frequent time intervals. It is often the case that examinees who take the test earlier share information with examinees who will take the test later, thus increasing the risk that many items may become known. Item overlap rate for a group of examinees refers to the number of overlapping items encountered by these examinees divided by the test length. For a specific item pool, different item selection algorithms may yield different item overlap rates. An important issue in designing a good CAT item selection algorithm is to keep item overlap rate below a preset level. In doing so, it is important to investigate what the lowest rate could be for all possible item selection algorithms. In this paper we rigorously prove that if every item had an equal possibility to be selected from the pool in a fixed-length CAT, the number of overlapping items among any α randomly sampled examinees follows the hypergeometric distribution family for α ≥ 1. Thus, the expected values of the number of overlapping items among any α randomly sampled examinees can be calculated precisely. These values may serve as benchmarks in controlling item overlap rates for fixed-length adaptive tests. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aAlgorithms10aComputer Assisted Testing10aTaking10aTest10aTime On Task computerized adaptive testing1 aChang, Hua-Hua1 aZhang, J uhttp://iacat.org/content/hypergeometric-family-and-item-overlap-rates-computerized-adaptive-testing02787nas a2200133 4500008004100000245010200041210006900143300000900212490000700221520226500228653003402493100002002527856010602547 2002 eng d00aThe implications of the use of non-optimal items in a Computer Adaptive Testing (CAT) environment0 aimplications of the use of nonoptimal items in a Computer Adapti a16060 v633 aThis study describes the effects of manipulating item difficulty in a computer adaptive testing (CAT) environment. There are many potential benefits when using CATs as compared to traditional tests. These include increased security, shorter tests, and more precise measurement. According to IRT, the theory underlying CAT, as the computer continually recalculates ability, items that match that current estimate of ability are administered. Such items provide maximum information about examinees during the test. Herein, however, lies a potential problem. These optimal CAT items result in an examinee having only a 50% chance of a correct response. Some examinees may consider such items unduly challenging. Further, when test anxiety is a factor, it is possible that test scores may be negatively affected. This research was undertaken to determine the effects of administering easier CAT items on ability estimation and test length using computer simulations. 
Also considered was the administration of different numbers of initial items prior to the start of the adaptive portion of the test, using three different levels of measurement precision. Results indicate that regardless of the number of initial items administered, the level of precision employed, or the modifications made to item difficulty, the approximation of estimated ability to true ability is good in all cases. Additionally, the standard deviations of the ability estimates closely approximate the theoretical levels of precision used as stopping rules for the simulated CATs. Since optimal CAT items are not used, each item administered provides less information about examinees than optimal CAT items. This results in longer tests. Fortunately, using easier items that provide up to a 66.4% chance of a correct response results in tests that only modestly increase in length, across levels of precision. For larger standard errors, even easier items (up to a 73.5% chance of a correct response) result in only negligible to modest increases in test length. Examinees who find optimal CAT items difficult or examinees with test anxiety may find CATs that implement easier items enhance the already existing benefits of CAT. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aGrodenchik, D J uhttp://iacat.org/content/implications-use-non-optimal-items-computer-adaptive-testing-cat-environment01213nas a2200217 4500008004100000245005100041210005100092300001200143490000700155520060500162653002600767653003000793653001600823653001300839653001300852653001100865653001200876653001500888100001600903856007600919 2002 eng d00aInformation technology and literacy assessment0 aInformation technology and literacy assessment a369-3730 v183 aThis column discusses information technology and literacy assessment in the past and present. The author also describes computer-based assessments today including the following topics: computer-scored testing, computer-administered formal assessment, Internet formal assessment, computerized adaptive tests, placement tests, informal assessment, electronic portfolios, information management, and Internet information dissemination. A model of the major present-day applications of information technologies in reading and literacy assessment is also included. (PsycINFO Database Record (c) 2005 APA )10aComputer Applications10aComputer Assisted Testing10aInformation10aInternet10aLiteracy10aModels10aSystems10aTechnology1 aBalajthy, E uhttp://iacat.org/content/information-technology-and-literacy-assessment01179nas a2200133 4500008004100000245006200041210005900103300001200162490000700174520073400181653003400915100001600949856008000965 2002 eng d00aAn item response model for characterizing test compromise0 aitem response model for characterizing test compromise a163-1790 v273 aThis article presents an item response model for characterizing test-compromise that enables the estimation of item-preview and score-gain distributions observed in on-demand high-stakes testing programs. Model parameters and posterior distributions are estimated by Markov Chain Monte Carlo (MCMC) procedures. Results of a simulation study suggest that when at least some of the items taken by a small sample of test takers are known to be secure (uncompromised), the procedure can provide useful summaries of test-compromise and its impact on test scores. 
The article includes discussions of operational use of the proposed procedure, possible model violations and extensions, and application to computerized adaptive testing. 10acomputerized adaptive testing1 aSegall, D O uhttp://iacat.org/content/item-response-model-characterizing-test-compromise01689nas a2200277 4500008004100000020001000041245006500051210006400116260009700180300001100277520074500288653002101033653002201054653002501076653002801101653002501129653001601154653001801170653005501188653001501243653001201258100001801270700002301288700001301311856008701324 2002 eng d a02-0900aMathematical-programming approaches to test item pool design0 aMathematicalprogramming approaches to test item pool design aTwente, The NetherlandsbUniversity of Twente, Faculty of Educational Science and Technology a93-1083 a(From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools for computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT.10aAdaptive Testing10aComputer Assisted10aComputer Programming10aEducational Measurement10aItem Response Theory10aMathematics10aPsychometrics10aStatistical Rotation computerized adaptive testing10aTest Items10aTesting1 aVeldkamp, B P1 avan der Linden, WJ1 aAriel, A uhttp://iacat.org/content/mathematical-programming-approaches-test-item-pool-design02411nas a2200277 4500008004100000245012200041210006900163260000800232300001000240490000700250520148700257653002101744653002101765653002001786653001001806653002201816653002901838653001101867653001801878653001901896653004101915653003201956100001301988700001802001856011402019 2002 eng d00aMeasuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system0 aMeasuring quality of life in chronic illness the functional asse cDec aS10-70 v833 aWe focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments: generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. Thus, it capitalizes on the strengths of each type of measure. 
Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing.10a*Chronic Disease10a*Quality of Life10a*Rehabilitation10aAdult10aComparative Study10aHealth Status Indicators10aHumans10aPsychometrics10aQuestionnaires10aResearch Support, U.S. Gov't, P.H.S.10aSensitivity and Specificity1 aCella, D1 aNowinski, C J uhttp://iacat.org/content/measuring-quality-life-chronic-illness-functional-assessment-chronic-illness-therapy03058nas a2200325 4500008004100000020004100041245008100082210006900163250001500232260000800247300001100255490000700266520201300273653001502286653001002301653004002311653005702351653003302408653001102441653001102452653001802463653000902481653002802490653001202518653005502530100001502585700001802600700001502618856009902633 2002 eng d a0025-7079 (Print)0025-7079 (Linking)00aMultidimensional adaptive testing for mental health problems in primary care0 aMultidimensional adaptive testing for mental health problems in a2002/09/10 cSep a812-230 v403 aOBJECTIVES: Efficient and accurate instruments for assessing child psychopathology are increasingly important in clinical practice and research. For example, screening in primary care settings can identify children and adolescents with disorders that may otherwise go undetected. However, primary care offices are notorious for the brevity of visits and screening must not burden patients or staff with long questionnaires. One solution is to shorten assessment instruments, but dropping questions typically makes an instrument less accurate. An alternative is adaptive testing, in which a computer selects the items to be asked of a patient based on the patient's previous responses. This research used a simulation to test a child mental health screen based on this technology. RESEARCH DESIGN: Using half of a large sample of data, a computerized version was developed of the Pediatric Symptom Checklist (PSC), a parental-report psychosocial problem screen. With the unused data, a simulation was conducted to determine whether the Adaptive PSC can reproduce the results of the full PSC with greater efficiency. SUBJECTS: PSCs were completed by parents on 21,150 children seen in a national sample of primary care practices. RESULTS: Four latent psychosocial problem dimensions were identified through factor analysis: internalizing problems, externalizing problems, attention problems, and school problems. A simulated adaptive test measuring these traits asked an average of 11.6 questions per patient, and asked five or fewer questions for 49% of the sample. There was high agreement between the adaptive test and the full (35-item) PSC: only 1.3% of screening decisions were discordant (kappa = 0.93). This agreement was higher than that obtained using a comparable length (12-item) short-form PSC (3.2% of decisions discordant; kappa = 0.84). 
CONCLUSIONS: Multidimensional adaptive testing may be an accurate and efficient technology for screening for mental health problems in primary care settings.10aAdolescent10aChild10aChild Behavior Disorders/*diagnosis10aChild Health Services/*organization & administration10aFactor Analysis, Statistical10aFemale10aHumans10aLinear Models10aMale10aMass Screening/*methods10aParents10aPrimary Health Care/*organization & administration1 aGardner, W1 aKelleher, K J1 aPajer, K A uhttp://iacat.org/content/multidimensional-adaptive-testing-mental-health-problems-primary-care01627nas a2200241 4500008004100000245005900041210005800100300001200158490000700170520087100177653002101048653003401069653002801103653002001131653003201151653002501183653001501208653002701223653002201250653001601272100001601288856008101304 2002 eng d00aOutlier detection in high-stakes certification testing0 aOutlier detection in highstakes certification testing a219-2330 v393 aDiscusses recent developments of person-fit analysis in computerized adaptive testing (CAT). Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10acomputerized adaptive testing10aEducational Measurement10aGoodness of Fit10aItem Analysis (Statistical)10aItem Response Theory10aperson Fit10aStatistical Estimation10aStatistical Power10aTest Scores1 aMeijer, R R uhttp://iacat.org/content/outlier-detection-high-stakes-certification-testing02030nas a2200253 4500008004100000245010900041210006900150300000900219490000600228520114100234653002101375653001501396653003901411653002201450653002501472653001801497653002201515653005501537653001501592653001201607100001701619700002501636856011501661 2002 eng d00aA structure-based approach to psychological measurement: Matching measurement models to latent structure0 astructurebased approach to psychological measurement Matching me a4-160 v93 aThe present article sets forth the argument that psychological assessment should be based on a construct's latent structure. The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. 
Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aAssessment10aClassification (Cognitive Process)10aComputer Assisted10aItem Response Theory10aPsychological10aScaling (Testing)10aStatistical Analysis computerized adaptive testing10aTaxonomies10aTesting1 aRuscio, John1 aRuscio, Ayelet Meron uhttp://iacat.org/content/structure-based-approach-psychological-measurement-matching-measurement-models-latent01145nas a2200205 4500008004100000245008200041210006900123260005600192520043100248653002100679653003000700653001600730653001600746653001800762100001500780700001700795700001700812700001400829856009600843 2002 eng d00aThe work ahead: A psychometric infrastructure for computerized adaptive tests0 awork ahead A psychometric infrastructure for computerized adapti aMahwah, N.J. USAbLawrence Erlbaum Associates, Inc.3 a(From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from The National Council on Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aEducational10aMeasurement10aPsychometrics1 aDrasgow, F1 aPotenza, M P1 aFreemer, J J1 aWard, W C uhttp://iacat.org/content/work-ahead-psychometric-infrastructure-computerized-adaptive-tests02220nas a2200145 4500008004100000245011600041210006900157300001100226490000600237520165200243653003401895100001301929700001601942856011601958 2001 eng d00aAssessment in the twenty-first century: A role of computerised adaptive testing in national curriculum subjects0 aAssessment in the twentyfirst century A role of computerised ada a241-570 v53 aWith the investment of large sums of money in new technologies for schools and education authorities and the subsequent training of teachers to integrate Information and Communications Technology (ICT) into their teaching strategies, it is remarkable that the old outdated models of assessment still remain. This article highlights the current problems associated with pen-and-paper testing and offers suggestions for an innovative and new approach to assessment for the twenty-first century. Based on the principle of the 'wise examiner', a computerised adaptive testing system which measures pupils' ability against the levels of the United Kingdom National Curriculum has been developed for use in mathematics. Using constructed response items, pupils are administered a test tailored to their ability with a reliability index of 0.99. Since the software administers maximally informative questions matched to each pupil's current ability estimate, no two pupils will receive the same set of items in the same order, therefore removing opportunities for plagiarism and teaching to the test. All marking is automated and a journal recording the outcome of the test and highlighting the areas of difficulty for each pupil is available for printing by the teacher. 
The current prototype of the system can be used on a school's network; however, the authors envisage a day when Examination Boards or the Qualifications and Assessment Authority (QCA) will administer Government tests from a central server to all United Kingdom schools or testing centres. Results will be issued at the time of testing and opportunities for resits will become more widespread.10acomputerized adaptive testing1 aCowan, P1 aMorrison, H uhttp://iacat.org/content/assessment-twenty-first-century-role-computerised-adaptive-testing-national-curriculum00838nas a2200157 4500008004100000245007400041210006900115300001100184490000700195520030900202653003400511100001900545700001200564700001200576856009200588 2001 eng d00aa-stratified multistage computerized adaptive testing with b blocking0 aastratified multistage computerized adaptive testing with b bloc a333-410 v253 aProposed a refinement, based on the stratification of items developed by D. Weiss (1973), of the computerized adaptive testing item selection procedure of H. Chang and Z. Ying (1999). Simulation studies using an item bank from the Graduate Record Examination show the benefits of the new procedure. (SLD)10acomputerized adaptive testing1 aChang, Hua-Hua1 aQian, J1 aYang, Z uhttp://iacat.org/content/stratified-multistage-computerized-adaptive-testing-b-blocking01695nas a2200229 4500008004100000245007800041210006900119300001200188490000700200520094500207653002501152653005101177653003001228653001801258653001101276653002701287653001101314100001701325700001101342700001801353856009401371 2001 eng d00aComputerized adaptive testing with the generalized graded unfolding model0 aComputerized adaptive testing with the generalized graded unfold a177-1960 v253 aExamined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. The number of items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates. These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. (PsycINFO Database Record (c) 2005 APA )10aAttitude Measurement10aCollege Students computerized adaptive testing10aComputer Assisted Testing10aItem Response10aModels10aStatistical Estimation10aTheory1 aRoberts, J S1 aLin, Y1 aLaughlin, J E uhttp://iacat.org/content/computerized-adaptive-testing-generalized-graded-unfolding-model01965nas a2200193 4500008004100000245008600041210006900127300001000196490000700206520131400213653001401527653003001541653002501571653001101596653003601607653001401643100001501657856009901672 2001 eng d00aDevelopments in measurement of persons and items by means of item response models0 aDevelopments in measurement of persons and items by means of ite a65-940 v283 aThis paper starts with a general introduction into measurement of hypothetical constructs typical of the social and behavioral sciences. 
After the stages ranging from theory through operationalization and item domain to preliminary test or questionnaire have been treated, the general assumptions of item response theory are discussed. The family of parametric item response models for dichotomous items is introduced, and it is explained how parameters for respondents and items are estimated from the scores collected from a sample of respondents who took the test or questionnaire. Next, the family of nonparametric item response models is explained, followed by the 3 classes of item response models for polytomous item scores (e.g., rating scale scores). Then the degree to which the mean item score and the unweighted sum of item scores for persons are useful for measuring items and persons in the context of item response theory is discussed. Methods for fitting parametric and nonparametric models to data are briefly discussed. Finally, the main applications of item response models are discussed, which include equating and item banking, computerized and adaptive testing, research into differential item functioning, person fit research, and cognitive modeling. (PsycINFO Database Record (c) 2005 APA )10aCognitive10aComputer Assisted Testing10aItem Response Theory10aModels10aNonparametric Statistical Tests10aProcesses1 aSijtsma, K uhttp://iacat.org/content/developments-measurement-persons-and-items-means-item-response-models01613nas a2200193 4500008004100000245008600041210006900127300001200196490000700208520094500215653002101160653003001181653004101211653000901252653001701261100001601278700001701294856010801311 2001 eng d00aDifferences between self-adapted and computerized adaptive tests: A meta-analysis0 aDifferences between selfadapted and computerized adaptive tests a235-2470 v383 aSelf-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error, the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aScores computerized adaptive testing10aTest10aTest Anxiety1 aPitkin, A K1 aVispoel, W P uhttp://iacat.org/content/differences-between-self-adapted-and-computerized-adaptive-tests-meta-analysis00678nas a2200133 4500008004100000245001800041210001700059300001000076490000800086520036100094653003400455100001300489856004200502 2001 eng d00aFinal answer?0 aFinal answer a24-260 v1883 aThe Northwest Evaluation Association helped an Indiana school district develop a computerized adaptive testing system that was aligned with its curriculum and geared toward measuring individual student growth. Now the district can obtain such information from semester to semester and year to year, get immediate results, and test students on demand.
(MLH)10acomputerized adaptive testing1 aCoyle, J uhttp://iacat.org/content/final-answer01981nas a2200205 4500008004100000245010100041210006900142300001200211490000700223520124900230653001201479653002101491653003001512653001501542653001601557653004501573100001701618700001901635856012101654 2001 eng d00aItem selection in computerized adaptive testing: Should more discriminating items be used first?0 aItem selection in computerized adaptive testing Should more disc a249-2660 v383 aDuring computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimation and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter's (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Ying's (1999) stratified method, had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure. (PsycINFO Database Record (c) 2005 APA )10aability10aAdaptive Testing10aComputer Assisted Testing10aEstimation10aStatistical10aTest Items computerized adaptive testing1 aHau, Kit-Tai1 aChang, Hua-Hua uhttp://iacat.org/content/item-selection-computerized-adaptive-testing-should-more-discriminating-items-be-used-first03148nas a2200133 4500008004100000245007900041210006900120300000900189490000700198520266000205653003402865100001502899856010002914 2001 eng d00aMultidimensional adaptive testing using the weighted likelihood estimation0 aMultidimensional adaptive testing using the weighted likelihood a47460 v613 aThis study extended Warm's (1989) weighted likelihood estimation (WLE) to a multidimensional computerized adaptive test (MCAT) setting. WLE was compared with the maximum likelihood estimation (MLE), expected a posteriori (EAP), and maximum a posteriori (MAP) using a three-dimensional 3PL IRT model under a variety of computerized adaptive testing conditions. The dependent variables included bias, standard error of ability estimates (SE), square root of mean square error (RMSE), and test information. The independent variables were ability estimation methods, intercorrelation levels between dimensions, multidimensional structures, and ability combinations. Simulation results were presented in terms of descriptive statistics, such as figures and tables. In addition, inferential procedures were used to analyze bias by conceptualizing this Monte Carlo study as a statistical sampling experiment. The results of this study indicate that WLE and the other three estimation methods yield significantly more accurate ability estimates under an approximate simple test structure with one dominant dimension and several secondary dimensions.
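The stratified method credited above to H. Chang and Z. Ying (1999) administers low-discrimination items first and saves high-discrimination items for later stages. A rough Python sketch of that a-stratification idea, under assumed item parameters and an assumed stage-to-stratum mapping (not the authors' implementation):

import math

def a_stratified_select(theta_hat, pool, stage, n_strata, used):
    """pool: list of (a, b) pairs; stage: 0-based test stage mapped to a stratum."""
    order = sorted(range(len(pool)), key=lambda i: pool[i][0])  # ascending a
    size = math.ceil(len(order) / n_strata)
    stratum = order[stage * size:(stage + 1) * size]
    candidates = [i for i in stratum if i not in used]
    # Within the current stratum, match item difficulty b to the ability estimate.
    return min(candidates, key=lambda i: abs(pool[i][1] - theta_hat))

# Hypothetical six-item pool of (a, b) pairs split into three strata.
pool = [(0.5, -0.2), (0.7, 0.4), (1.0, -1.0), (1.4, 0.1), (1.8, 0.9), (2.1, -0.4)]
print(a_stratified_select(0.0, pool, stage=0, n_strata=3, used=set()))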
All four estimation methods, especially WLE, yield very large SEs when a multidimensional structure with three equally dominant dimensions was employed. Consistent with previous findings based on unidimensional IRT models, MLE and WLE are less biased at the extremes of the ability scale; MLE and WLE yield larger SEs than the Bayesian methods; WLE reduced the bias of MLE under the approximate simple structure; and test information-based SEs underestimate the actual SEs of the MLE and WLE estimators in MCAT conditions, especially at shorter test lengths, similar to the findings of Warm (1989) in the unidimensional case. The results from the MCAT simulations did show some advantages of WLE in reducing the bias of MLE under the approximate simple structure with a fixed test length of 50 items, which was consistent with previous research findings based on different unidimensional models. It is clear from the current results that all four methods perform very poorly when multidimensional structures with multiple dominant factors were employed. More research efforts are urged to investigate systematically how different multidimensional structures affect the accuracy and reliability of ability estimation. Based on the simulated results in this study, no significant effect of the intercorrelation between dimensions on ability estimation was found. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aTseng, F-L uhttp://iacat.org/content/multidimensional-adaptive-testing-using-weighted-likelihood-estimation01855nas a2200193 4500008004100000245012400041210007100165300001200236490000700248520110700255653002101362653002601383653002201409653001401431653005901445100001601504700001701520856012401537 2001 eng d00aNouveaux développements dans le domaine du testing informatisé [New developments in the area of computerized testing]0 aNouveaux développements dans le domaine du testing informatisé N a221-2300 v463 aThe use of computer-assisted assessment has developed considerably since the first formulation of its basic principles in the 1960s and 1970s. This article offers an introduction to the latest developments in the field of computer-assisted assessment, in particular computerized adaptive testing (CAT). Ability estimation, item selection, and item pool development in CAT are discussed. In addition, examples of innovative uses of the computer in integrated testing systems and in testing via the Internet are presented. The article ends with some illustrations of new applications of computerized testing and suggestions for future research.
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Applications10aComputer Assisted10aDiagnosis10aPsychological Assessment computerized adaptive testing1 aMeijer, R R1 aGrégoire, J uhttp://iacat.org/content/nouveaux-d%C3%A9veloppements-dans-le-domaine-du-testing-informatis%C3%A9-new-developments-area01701nas a2200181 4500008004100000245007300041210006900114300001100183490000700194520110100201653002101302653003001323653002501353653001501378100001701393700001501410856009401425 2001 eng d00aOutlier measures and norming methods for computerized adaptive tests0 aOutlier measures and norming methods for computerized adaptive t a85-1040 v263 aNotes that the problem of identifying outliers has 2 important aspects: the choice of outlier measures and the method to assess the degree of outlyingness (norming) of those measures. Several classes of measures for identifying outliers in Computerized Adaptive Tests (CATs) are introduced. Some of these measures are constructed to take advantage of CATs' sequential choice of items; other measures are taken directly from paper and pencil (P&P) tests and are used for baseline comparisons. Methods for assessing the degree of outlyingness of CAT responses, however, cannot be applied directly from P&P tests because stopping rules associated with CATs yield examinee responses of varying lengths. Standard outlier measures are highly correlated with the varying lengths, which makes comparison across examinees impossible. Therefore, 4 methods are presented and compared which map outlier statistics to a familiar probability scale (a p value). The methods are explored in the context of CAT data from a 1995 Nationally Administered Computerized Examination (NACE). (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aStatistical Analysis10aTest Norms1 aBradlow, E T1 aWeiss, R E uhttp://iacat.org/content/outlier-measures-and-norming-methods-computerized-adaptive-tests02333nas a2200157 4500008004100000020001400041245019300055210006900248300001200317490000700329520164900336653003401985100001402019700001502033856012702048 2001 eng d a0214-991500aPasado, presente y futuro de los test adaptativos informatizados: Entrevista con Isaac I. Béjar [Past, present and future of computerized adaptive testing: Interview with Isaac I. Béjar]0 aPasado presente y futuro de los test adaptativos informatizados a685-6900 v133 aPast, present and future of Computerized Adaptive Testing: Interview with Isaac I. Bejar. In this paper the results of an interview with Isaac I. Bejar are presented. Dr.
Bejar is currently Principal Research Scientist and Director of the Center for Assessment Design and Scoring in the Research Division at Educational Testing Service (Princeton, NJ, U.S.A.). The aim of this interview was to review the past, present, and future of Computerized Adaptive Tests. The beginnings of the Adaptive Tests and Computerized Adaptive Tests, and the latest advances developed at the Educational Testing Service (generative response models, isomorphs, automated scoring…) are reviewed. The future of Computerized Adaptive Tests is analyzed, and its use in Spain is discussed.10acomputerized adaptive testing1 aTejada, R1 aAntonio, J uhttp://iacat.org/content/pasado-presente-y-futuro-de-los-test-adaptativos-informatizados-entrevista-con-isaac-i-b%C3%A9jar01344nas a2200181 4500008004100000245007300041210006900114260005600183300001200239520069300251653002100944653003000965653002200995653002001017100001601037700001701053856009201070 2001 eng d00aPractical issues in setting standards on computerized adaptive tests0 aPractical issues in setting standards on computerized adaptive t aMahwah, N.J. USAbLawrence Erlbaum Associates, Inc. a355-3693 a(From the chapter) Examples of setting standards on computerized adaptive tests (CATs) are hard to find. Some examples of CATs involving performance standards include the registered nurse exam and the Novell systems engineer exam. Although CATs do not require separate standard-setting methods, there are special issues to be addressed by test specialists who set performance standards on CATs. Setting standards on a CAT will typically require modifications of the procedures used with more traditional, fixed-form, paper-and-pencil examinations. The purpose of this chapter is to illustrate why CATs pose special challenges to the standard setter. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aPerformance Tests10aTesting Methods1 aSireci, S G1 aClauser, B E uhttp://iacat.org/content/practical-issues-setting-standards-computerized-adaptive-tests01356nas a2200205 4500008004100000245016700041210007000208300001000278490000700288520051600295653003000811653003100841653004800872100001800920700001900938700002600957700002600983700001401009856012701023 2001 eng d00aRequerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and investigation in computerized adaptive testing]0 aRequerimientos aplicaciones e investigación en tests adaptativos a11-280 v193 aSummarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed.
(PsycINFO Database Record (c) 2005 APA )10aComputer Assisted Testing10aEnglish as Second Language10aPsychometrics computerized adaptive testing1 aOlea Díaz, J1 aPonsoda Gil, V1 aRevuelta Menéndez, J1 aHontangas Beltrán, P1 aAbad, F J uhttp://iacat.org/content/requerimientos-aplicaciones-e-investigaci%C3%B3n-en-tests-adaptativos-informatizados-requirements01587nas a2200205 4500008004100000245016500041210006900206300001200275490000700287520076900294653002101063653002601084653003001110653002501140653005101165100001301216700001701229700002101246856011401267 2001 eng d00aToepassing van een computergestuurde adaptieve testprocedure op persoonlijkheidsdata [Application of a computerised adaptive test procedure on personality data]0 aToepassing van een computergestuurde adaptieve testprocedure op a119-1330 v563 aStudied the applicability of a computerized adaptive testing procedure to an existing personality questionnaire within the framework of item response theory. The procedure was applied to the scores of 1,143 male and female university students (mean age 21.8 yrs) in the Netherlands on the Neuroticism scale of the Amsterdam Biographical Questionnaire (G. J. Wilde, 1963). The graded response model (F. Samejima, 1969) was used. The quality of the adaptive test scores was measured based on their correlation with test scores for the entire item bank and on their correlation with scores on other scales from the personality test. The results indicate that computerized adaptive testing can be applied to personality scales. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Applications10aComputer Assisted Testing10aPersonality Measures10aTest Reliability computerized adaptive testing1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://iacat.org/content/toepassing-van-een-computergestuurde-adaptieve-testprocedure-op-persoonlijkheidsdata02438nas a2200169 4500008004100000245012400041210007300165300001000238490000700248520174900255653003402004100002002038700001802058700002202076700002202098856014802120 2000 eng d00aAlgoritmo mixto mínima entropía-máxima información para la selección de ítems en un test adaptativo informatizado0 aAlgoritmo mixto mínima entropíamáxima información para la selecc a12-140 v123 aThe aim of this study is to compare the efficacy, as item selection strategies, of three different algorithms: (a) one based on maximum information; (b) one based on minimum entropy; and (c) a mixed algorithm that applies minimum entropy to the initial items and maximum information to the rest, under the hypothesis that the mixed algorithm can make the computerized adaptive test (CAT) more efficient.
The CAT process was simulated using an emotional adjustment item bank containing 28 graded items in six categories, calibrated using Samejima's (1969) graded response model, with the subjects' original calibration responses taken as the CAT responses. The initial results show that the mixed criterion is more effective than either of the other two criteria taken independently; its efficacy is maximized when the minimum-entropy algorithm is restricted to the selection of the first items of the CAT, since the responses to those first items make the theta estimate reliable enough for the maximum-information algorithm to perform at its best.10acomputerized adaptive testing1 aDorronsoro, J R1 aSanta-Cruz, C1 aRubio Franco, V J1 aAguado García, D uhttp://iacat.org/content/algoritmo-mixto-m%C3%ADnima-entrop%C3%ADa-m%C3%A1xima-informaci%C3%B3n-para-la-selecci%C3%B3n-de-%C3%ADtems-en-un-test01256nas a2200145 4500008004100000245006500041210006500106300001000171490000700181520076500188653003400953100002300987700001601010856008401026 2000 eng d00aCapitalization on item calibration error in adaptive testing0 aCapitalization on item calibration error in adaptive testing a35-530 v133 a(from the journal abstract) In adaptive testing, item selection is sequentially optimized during the test. Because the optimization takes place over a pool of items calibrated with estimation error, capitalization on chance is likely to occur. How serious the consequences of this phenomenon are depends not only on the distribution of the estimation errors in the pool or the conditional ratio of the test length to the pool size given ability, but may also depend on the structure of the item selection criterion used. A simulation study demonstrated a dramatic impact of capitalization on estimation errors on ability estimation. Four different strategies to minimize the likelihood of capitalization on error in computerized adaptive testing are discussed.10acomputerized adaptive testing1 avan der Linden, WJ1 aGlas, C A W uhttp://iacat.org/content/capitalization-item-calibration-error-adaptive-testing02650nas a2200133 4500008004100000245007300041210006900114300000900183490000700192520217300199653003402372100001702406856009302423 2000 eng d00aA comparison of computerized adaptive testing and multistage testing0 acomparison of computerized adaptive testing and multistage testi a58290 v603 aThere is considerable evidence to show that computerized-adaptive testing (CAT) and multi-stage testing (MST) are viable frameworks for testing. With many testing organizations looking to move towards CAT or MST, it is important to know what framework is superior in different situations and at what cost in terms of measurement. What was needed was a comparison of the different testing procedures under various realistic testing conditions. This dissertation addressed the important problem of the increase or decrease in accuracy of ability estimation in using MST rather than CAT. The purpose of this study was to compare the accuracy of ability estimates produced by MST and CAT while keeping some variables fixed and varying others. A simulation study was conducted to investigate the effects of several factors on the accuracy of ability estimation using different CAT and MST designs. The factors that were manipulated are the number of stages, the number of subtests per stage, and the number of items per subtest. Kept constant were test length, distribution of subtest information, method of determining cut-points on subtests, amount of overlap between subtests, and method of scoring the total test.
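The minimum-entropy criterion in the Dorronsoro et al. record above can be read as choosing the item whose expected posterior Shannon entropy is smallest. A speculative Python illustration over a discrete theta grid, simplified to dichotomous 2PL items (the original study used 28 graded-response items, so this is not the authors' algorithm):

import math

GRID = [g / 10.0 for g in range(-30, 31)]  # theta grid from -3 to 3

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def entropy(post):
    return -sum(w * math.log(w) for w in post if w > 0)

def expected_entropy(post, a, b):
    """Expectation over the two item scores of the updated posterior's entropy."""
    total = 0.0
    for x in (0, 1):
        like = [p_2pl(t, a, b) if x == 1 else 1 - p_2pl(t, a, b) for t in GRID]
        joint = [w * l for w, l in zip(post, like)]
        marg = sum(joint)
        total += marg * entropy([j / marg for j in joint])
    return total

def min_entropy_select(post, pool, used):
    candidates = [i for i in range(len(pool)) if i not in used]
    return min(candidates, key=lambda i: expected_entropy(post, *pool[i]))

prior = [1.0 / len(GRID)] * len(GRID)  # uniform prior over the grid
pool = [(0.8, -1.0), (1.2, 0.0), (1.5, 0.5)]  # invented (a, b) pairs
print(min_entropy_select(prior, pool, used=set()))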
The primary question of interest was, given a fixed test length, how many stages and many subtests per stage should there be to maximize measurement precision? Furthermore, how many items should there be in each subtest? Should there be more in the routing test or should there be more in the higher stage tests? Results showed that, in general, increasing the number of stages from two to three decreased the amount of errors in ability estimation. Increasing the number of subtests from three to five increased the accuracy of ability estimates as well as the efficiency of the MST designs relative to the P&P and CAT designs at most ability levels (-.75 to 2.25). Finally, at most ability levels (-.75 to 2.25), varying the number of items per stage had little effect on either the resulting accuracy of ability estimates or the relative efficiency of the MST designs to the P&P and CAT designs. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aPatsula, L N uhttp://iacat.org/content/comparison-computerized-adaptive-testing-and-multistage-testing01385nas a2200193 4500008004100000245009400041210006900135300001200204490000700216520067900223653002100902653003000923653002500953653005700978100001401035700001901049700001901068856010401087 2000 eng d00aA comparison of item selection rules at the early stages of computerized adaptive testing0 acomparison of item selection rules at the early stages of comput a241-2550 v243 aThe effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis (Test)10aStatistical Estimation computerized adaptive testing1 aChen, S-Y1 aAnkenmann, R D1 aChang, Hua-Hua uhttp://iacat.org/content/comparison-item-selection-rules-early-stages-computerized-adaptive-testing00540nas a2200157 4500008004100000245006500041210006300106260002700169490000700196653003400203100001700237700001200254700001200266700001700278856008700295 2000 eng d00aComputer-adaptive testing: A methodology whose time has come0 aComputeradaptive testing A methodology whose time has come aChicago, IL. USAbMESA0 v6910acomputerized adaptive testing1 aLinacre, J M1 aKang, U1 aJean, E1 aLinacre, J M uhttp://iacat.org/content/computer-adaptive-testing-methodology-whose-time-has-come01559nas a2200229 4500008004100000245006400041210006300105300001100168490000600179520083800185653002701023653001501050653001501065653004201080653001101122653002601133653002601159653003101185100001501216700001601231856008201247 2000 eng d00aComputerization and adaptive administration of the NEO PI-R0 aComputerization and adaptive administration of the NEO PIR a347-640 v73 aThis study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. 
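The Kullback-Leibler (KL) selection rules compared in the Chen, Ankenmann, and Chang record above replace point-wise Fisher information with a divergence integrated over an interval around the provisional theta estimate. A minimal Python sketch under assumed 2PL parameters and an assumed interval half-width delta:

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def kl_bernoulli(p0, p1):
    """KL divergence between Bernoulli(p0) and Bernoulli(p1)."""
    return p0 * math.log(p0 / p1) + (1 - p0) * math.log((1 - p0) / (1 - p1))

def kl_index(theta_hat, a, b, delta=1.0, steps=100):
    """Integrate KL(theta_hat || theta) over [theta_hat - delta, theta_hat + delta]."""
    p0 = p_2pl(theta_hat, a, b)
    h = 2 * delta / steps
    total = 0.0
    for k in range(steps + 1):
        theta = theta_hat - delta + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal rule
        total += weight * kl_bernoulli(p0, p_2pl(theta, a, b))
    return total * h

print(kl_index(0.0, a=1.2, b=0.3))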
Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced by half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments.10a*Personality Inventory10aAlgorithms10aCalifornia10aDiagnosis, Computer-Assisted/*methods10aHumans10aModels, Psychological10aPsychometrics/methods10aReproducibility of Results1 aReise, S P1 aHenson, J M uhttp://iacat.org/content/computerization-and-adaptive-administration-neo-pi-r01594nas a2200157 4500008004100000245008200041210006900123300001100192490000700203520101400210653003401224653004001258100001601298700002401314856009801338 2000 eng d00aComputerized adaptive testing for classifying examinees into three categories0 aComputerized adaptive testing for classifying examinees into thr a713-340 v603 aThe objective of this study was to explore the possibilities for using computerized adaptive testing in situations in which examinees are to be classified into one of three categories. Testing algorithms with two different statistical computation procedures are described and evaluated. The first computation procedure is based on statistical testing and the other on statistical estimation. Item selection methods based on maximum information (MI) considering content and exposure control are considered. The measurement quality of the proposed testing algorithms is reported. The results of the study are that a reduction of at least 22% in the mean number of items can be expected in a computerized adaptive test (CAT) compared to an existing paper-and-pencil placement test. Furthermore, statistical testing is a promising alternative to statistical estimation. Finally, it is concluded that imposing constraints on the MI selection strategy does not negatively affect the quality of the testing algorithms.10acomputerized adaptive testing10aComputerized classification testing1 aEggen, Theo1 aStraetmans, G J J M uhttp://iacat.org/content/computerized-adaptive-testing-classifying-examinees-three-categories02840nas a2200193 4500008004100000245013800041210006900179300000900248490000700257520208000264653003002344653002002374653005202394653002202446653001802468653002402486100001602510856012002526 2000 eng d00aThe development of a computerized version of Vandenberg's mental rotation test and the effect of visuo-spatial working memory loading0 adevelopment of a computerized version of Vandenbergs mental rota a39380 v603 aThis dissertation focused on the generation and evaluation of web-based versions of Vandenberg's Mental Rotation Test. Memory and spatial visualization theory were explored in relation to the addition of a visuo-spatial working memory component. Analysis of the data determined that there was a significant difference between scores on the MRT Computer and MRT Memory test. The addition of a visuo-spatial working memory component did significantly affect results at the .05 alpha level. Reliability and discrimination estimates were higher on the MRT Memory version. The computerization of the paper-and-pencil version of the MRT did not significantly affect scores but did affect the time required to complete the test.
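For the three-category classification problem in the Eggen and Straetmans record above, the statistical-testing procedure can be illustrated with a sequential probability ratio test (SPRT) against a single cutscore; a full three-category version would run one such test per cutscore. A simplified Python sketch with invented constants, not the authors' algorithm:

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sprt(responses, items, theta_c, delta=0.5, alpha=0.05, beta=0.05):
    """responses: 0/1 scores; items: matching (a, b) pairs.
    Tests theta_c + delta against theta_c - delta; returns a running decision."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0  # accumulated log-likelihood ratio
    for x, (a, b) in zip(responses, items):
        p_hi = p_2pl(theta_c + delta, a, b)
        p_lo = p_2pl(theta_c - delta, a, b)
        llr += math.log(p_hi / p_lo) if x == 1 else math.log((1 - p_hi) / (1 - p_lo))
    if llr >= upper:
        return "above"
    if llr <= lower:
        return "below"
    return "continue"

items = [(1.0, 0.0), (1.3, 0.2), (0.9, -0.1), (1.1, 0.4)]
print(sprt([1, 1, 1, 1], items, theta_c=0.0))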
The population utilized in the quasi-experiment consisted of 107 university students from eight institutions in engineering graphics-related courses. The subjects completed two researcher-developed, Web-based versions of Vandenberg's Mental Rotation Test and the original paper-and-pencil version of the Mental Rotation Test. One version of the test included a visuo-spatial working memory loading. Significant contributions of this study included developing and evaluating computerized versions of Vandenberg's Mental Rotation Test. Previous versions of Vandenberg's Mental Rotation Test did not take advantage of the ability of the computer to incorporate an interaction factor, such as a visuo-spatial working memory loading, into the test. The addition of an interaction factor results in a more discriminating test, which will lend itself well to computerized adaptive testing practices. Educators in engineering graphics-related disciplines should strongly consider the use of spatial visualization tests to aid in establishing the effects of modern computer systems on fundamental design/drafting skills. Regular testing of spatial visualization skills will assist in the creation of a more relevant curriculum. Computerized tests which are valid and reliable will assist in making this task feasible. (PsycINFO Database Record (c) 2005 APA )10aComputer Assisted Testing10aMental Rotation10aShort Term Memory computerized adaptive testing10aTest Construction10aTest Validity10aVisuospatial Memory1 aStrong, S D uhttp://iacat.org/content/development-computerized-version-vandenbergs-mental-rotation-test-and-effect-visuo-spatial03270nas a2200217 4500008004100000245020800041210007000249300001000319490000700329520241300336653002102749653002402770653003202794653001302826100001902839700001802858700001702876700001802893700001702911856012402928 2000 eng d00aDiagnostische programme in der Demenzfrüherkennung: Der Adaptive Figurenfolgen-Lerntest (ADAFI) [Diagnostic programs in the early detection of dementia: The Adaptive Figure Series Learning Test (ADAFI)]0 aDiagnostische programme in der Demenzfrüherkennung Der Adaptive a16-290 v133 aThe aim of this study was to examine the ability of the computerized Adaptive Figure Series Learning Test (ADAFI) to differentiate between old subjects at risk for dementia and old healthy controls. Several studies on the subject of measuring the intellectual potential (cognitive plasticity) of old subjects have shown the usefulness of the fluid intelligence type of task used in the ADAFI (completion of figure series) for this differentiation. Because the ADAFI has been developed as a Diagnostic Program, it is able to counter some critical issues in those studies. It was shown (a) that distinct differences between both groups are revealed by the ADAFI, (b) that the prediction of the cognitive health status of individual subjects is quite good (sensitivity: 80%, specificity: 90%), and (c) that the prediction of the cognitive health status with tests of processing speed and working memory is worse than with the ADAFI. The results indicate that the ADAFI might be a promising plasticity-oriented tool for the measurement of cognitive decline in the elderly, and thus might be useful for the early detection of dementia.10aAdaptive Testing10aAt Risk Populations10aComputer Assisted Diagnosis10aDementia1 aSchreiber, M D1 aSchneider, RJ1 aSchweizer, A1 aBeckmann, J F1 aBaltissen, R uhttp://iacat.org/content/diagnostische-programme-der-demenzfr%C3%BCherkennung-der-adaptive-figurenfolgen-lerntest-adafi00712nas a2200193 4500008004100000245008400041210006900125300001400194490000700208653003000215653001100245653002500256653001600281653005500297653002200352653002300374100001800397856010300415 2000 eng d00aEmergence of item response modeling in instrument development and data analysis0 aEmergence of item response modeling in instrument development an aII60-II650 v3810aComputer Assisted Testing10aHealth10aItem Response Theory10aMeasurement10aStatistical Validity computerized adaptive testing10aTest Construction10aTreatment Outcomes1 aHambleton, RK uhttp://iacat.org/content/emergence-item-response-modeling-instrument-development-and-data-analysis01428nas a2200193 4500008004100000245006300041210006300104300001200167490000700179520079500186653001800981653002100999653003001020653001801050653005701068100001501125700001201140856008201152 2000 eng d00aEstimation of trait level in computerized adaptive testing0 aEstimation of trait level in computerized adaptive testing a257-2650 v243 aNotes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items.
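The Cheng and Liou record above notes that correcting θ estimates for bias matters early in a CAT; Warm's (1989) weighted likelihood estimator (WLE), compared in several records in this section, is one such correction. A bare-bones Python sketch for 2PL items, in which the WLE maximizes L(theta) * sqrt(I(theta)); grid search stands in for a proper Newton-Raphson solver and all data are invented:

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_weighted_likelihood(theta, responses, items):
    """Log-likelihood plus half the log of the test information (Warm's weight)."""
    ll, info = 0.0, 0.0
    for x, (a, b) in zip(responses, items):
        p = p_2pl(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1 - p)
        info += a * a * p * (1 - p)
    return ll + 0.5 * math.log(info)

def wle(responses, items):
    grid = [g / 100.0 for g in range(-400, 401)]
    return max(grid, key=lambda t: log_weighted_likelihood(t, responses, items))

items = [(1.2, -0.5), (0.9, 0.0), (1.5, 0.6), (1.1, 1.0)]
print(wle([1, 1, 0, 0], items))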
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis10aStatistical Estimation computerized adaptive testing1 aCheng, P E1 aLiou, M uhttp://iacat.org/content/estimation-trait-level-computerized-adaptive-testing02311nas a2200205 4500008004100000245012100041210006900162300000800231490000700239520159200246653002101838653003001859653002201889653001801911653001601929653000901945653001801954100001401972856011901986 2000 eng d00aAn examination of the reliability and validity of performance ratings made using computerized adaptive rating scales0 aexamination of the reliability and validity of performance ratin a5700 v613 aThis study compared the psychometric properties of performance ratings made using recently-developed computerized adaptive rating scales (CARS) to the psychometric properties of ratings made using more traditional paper-and-pencil rating formats, i.e., behaviorally-anchored and graphic rating scales. Specifically, the reliability, validity and accuracy of the performance ratings from each format were examined. One hundred twelve participants viewed six 5-minute videotapes of office situations and rated the performance of a target person in each videotape on three contextual performance dimensions-Personal Support, Organizational Support, and Conscientious Initiative-using CARS and either behaviorally-anchored or graphic rating scales. Performance rating properties were measured using Shrout and Fleiss's intraclass correlation (2, 1), Borman's differential accuracy measure, and Cronbach's accuracy components as indexes of rating reliability, validity, and accuracy, respectively. Results found that performance ratings made using the CARS were significantly more reliable and valid than performance ratings made using either of the other formats. Additionally, CARS yielded more accurate performance ratings than the paper-and-pencil formats. The nature of the CARS system (i.e., its adaptive nature and scaling methodology) and its paired comparison judgment task are offered as possible reasons for the differences found in the psychometric properties of the performance ratings made using the various rating formats.
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aPerformance Tests10aRating Scales10aReliability10aTest10aTest Validity1 aBuck, D E uhttp://iacat.org/content/examination-reliability-and-validity-performance-ratings-made-using-computerized-adaptive00688nas a2200169 4500008004100000020001400041245015800055210006900213300001200282490000600294653003400300100001700334700001500351700001200366700001400378856012600392 2000 eng d a1575-910500aLos tests adaptativos informatizados en la frontera del siglo XXI: Una revisión [Computerized adaptive tests at the turn of the 21st century: A review]0 aLos tests adaptativos informatizados en la frontera del siglo XX a183-2160 v210acomputerized adaptive testing1 aHontangas, P1 aPonsoda, V1 aOlea, J1 aAbad, F J uhttp://iacat.org/content/los-tests-adaptativos-informatizados-en-la-frontera-del-siglo-xxi-una-revisi%C3%B3n-computerized00863nas a2200145 4500008004100000245006600041210006600107300001200173490000700185520036100192653002100553653004400574100001500618856008400633 2000 eng d00aOverview of the computerized adaptive testing special section0 aOverview of the computerized adaptive testing special section a115-1200 v213 aThis paper provides an overview of the five papers included in the Psicologica special section on computerized adaptive testing. A short introduction to this topic is presented as well. The main results, the links between the five papers, and the general research topics to which they are most closely related are also shown. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputers computerized adaptive testing1 aPonsoda, V uhttp://iacat.org/content/overview-computerized-adaptive-testing-special-section01195nas a2200133 4500008004100000245008300041210006900124300001200193490000700205520069300212653003400905100002000939856010200959 2000 eng d00aTaylor approximations to logistic IRT models and their use in adaptive testing0 aTaylor approximations to logistic IRT models and their use in ad a307-3430 v253 aTaylor approximation can be used to generate a linear approximation to a logistic ICC and a linear ability estimator. For a specific situation it will be shown to result in a special case of a Robbins-Monro item selection procedure for adaptive testing. The linear estimator can be used for the situation of zero and perfect scores when maximum likelihood estimation fails to come up with a finite estimate. It is also possible to use this estimator to generate starting values for maximum likelihood and weighted likelihood estimation. Approximations to the expectation and variance of the linear estimator for a sequence of Robbins-Monro item selections can be determined analytically.
10acomputerized adaptive testing1 aVeerkamp, W J J uhttp://iacat.org/content/taylor-approximations-logistic-irt-models-and-their-use-adaptive-testing00508nas a2200121 4500008004100000245009600041210006900137300000900206490000700215653003400222100002300256856010700279 1999 eng d00aAlternative methods for the detection of item preknowledge in computerized adaptive testing0 aAlternative methods for the detection of item preknowledge in co a37650 v5910acomputerized adaptive testing1 aMcLeod, Lori Davis uhttp://iacat.org/content/alternative-methods-detection-item-preknowledge-computerized-adaptive-testing01730nas a2200145 4500008004100000245005800041210005700099300001200156490000700168520126300175653003401438100001901472700001201491856008101503 1999 eng d00aa-stratified multistage computerized adaptive testing0 aastratified multistage computerized adaptive testing a211-2220 v233 aFor computerized adaptive tests (CAT) based on the three-parameter logistic model, it was found that administering items with low discrimination parameter (a) values early in the test and administering those with high a values later was advantageous; the skewness of item exposure distributions was reduced while efficiency was maintained in trait level estimation. Thus, a new multistage adaptive testing approach is proposed that factors a into the item selection process. In this approach, the items in the item bank are stratified into a number of levels based on their a values. The early stages of a test use items with lower a values and later stages use items with higher a values. At each stage, items are selected according to an optimization criterion from the corresponding level. Simulation studies were performed to compare a-stratified CATs with CATs based on the Sympson-Hetter method for controlling item exposure. Results indicated that this new strategy led to tests that were well-balanced, with respect to item exposure, and efficient. The a-stratified CATs achieved a lower average exposure rate than CATs based on Bayesian or information-based item selection and the Sympson-Hetter method. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aChang, Hua-Hua1 aYing, Z uhttp://iacat.org/content/stratified-multistage-computerized-adaptive-testing01272nas a2200145 4500008004100000245004000041210004000081260004600121300001000167520081600177653003400993100002401027700001401051856006101065 1999 eng d00aCAT for certification and licensure0 aCAT for certification and licensure aMahwah, N.J.bLawrence Erlbaum Associates a67-913 a(from the chapter) This chapter discusses implementing computerized adaptive testing (CAT) for high-stakes examinations that determine whether or not a particular candidate will be certified or licensed. The experience of several boards who have chosen to administer their licensure or certification examinations using the principles of CAT illustrates the process of moving into this mode of administration. Examples of the variety of options that can be utilized within a CAT administration are presented, the decisions that boards must make to implement CAT are discussed, and a timetable for completing the tasks that need to be accomplished is provided. In addition to the theoretical aspects of CAT, practical issues and problems are reviewed.
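The Veerkamp record above links Taylor (linear) approximations of the logistic ICC to a Robbins-Monro item selection procedure. A small Python sketch of the Robbins-Monro style of ability updating, where the estimate moves by a step that shrinks as 1/n, driven by the residual between the observed score and the model-predicted probability; the step constant c is an assumption:

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def robbins_monro_trace(responses, items, theta0=0.0, c=2.0):
    """Yield successive theta estimates after each scored item."""
    theta = theta0
    for n, (x, (a, b)) in enumerate(zip(responses, items), start=1):
        theta += (c / n) * (x - p_2pl(theta, a, b))
        yield theta

items = [(1.0, 0.0), (1.2, 0.4), (0.8, -0.3), (1.4, 0.7)]  # invented (a, b) pairs
print(list(robbins_monro_trace([1, 0, 1, 1], items)))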
(PsycINFO Database Record (c) 2002 APA, all rights reserved).10acomputerized adaptive testing1 aBergstrom, Betty, A1 aLunz, M E uhttp://iacat.org/content/cat-certification-and-licensure01768nas a2200265 4500008004100000245004900041210004800090300001000138490000600148520092600154653002501080653005701105653001501162653001201177653001001189653002101199653006401220653001101284653002201295653001101317653000901328653007801337100001701415856007001432 1999 eng d00aCompetency gradient for child-parent centers0 aCompetency gradient for childparent centers a35-520 v33 aThis report describes an implementation of the Rasch model during the longitudinal evaluation of a federally-funded early childhood preschool intervention program. An item bank is described for operationally defining a psychosocial construct called community life-skills competency, an expected teenage outcome of the preschool intervention. This analysis examined the position of teenage students on this scale structure, and investigated a pattern of cognitive operations necessary for students to pass community life-skills test items. Then this scale structure was correlated with nationally standardized reading and math achievement scores, teacher ratings, and school records to assess its validity as a measure of the community-related outcome goal for this intervention. The results show a functional relationship between years of early intervention and magnitude of effect on the life-skills competency variable.10a*Models, Statistical10aActivities of Daily Living/classification/psychology10aAdolescent10aChicago10aChild10aChild, Preschool10aEarly Intervention (Education)/*statistics & numerical data10aFemale10aFollow-Up Studies10aHumans10aMale10aOutcome and Process Assessment (Health Care)/*statistics & numerical data1 aBezruczko, N uhttp://iacat.org/content/competency-gradient-child-parent-centers00911nas a2200145 4500008004100000245006100041210006000102300001100162490000700173520043400180653003400614100001600648700001600664856008500680 1999 eng d00aComputerized Adaptive Testing: Overview and Introduction0 aComputerized Adaptive Testing Overview and Introduction a187-940 v233 aUse of computerized adaptive testing (CAT) has increased substantially since it was first formulated in the 1970s. This paper provides an overview of CAT and introduces the contributions to this Special Issue. The elements of CAT discussed here include item selection procedures, estimation of the latent trait, item exposure, measurement precision, and item bank development. Some topics for future research are also presented. 10acomputerized adaptive testing1 aMeijer, R R1 aNering, M L uhttp://iacat.org/content/computerized-adaptive-testing-overview-and-introduction01680nas a2200145 4500008004100000245010000041210006900141300001000210490000700220520112900227653003401356100001601390700001501406856011301421 1999 eng d00aThe effect of model misspecification on classification decisions made using a computerized test0 aeffect of model misspecification on classification decisions mad a47-590 v363 aMany computerized testing algorithms require the fitting of some item response theory (IRT) model to examinees' responses to facilitate item selection, the determination of test stopping rules, and classification decisions. Some IRT models are thought to be particularly useful for small volume certification programs that wish to make the transition to computerized adaptive testing (CAT). 
The 1-parameter logistic model (1-PLM) is usually assumed to require a smaller sample size than the 3-parameter logistic model (3-PLM) for item parameter calibrations. This study examined the effects of model misspecification on the precision of the decisions made using the sequential probability ratio test. For this comparison, the 1-PLM was used to estimate item parameters, even though the items' characteristics were represented by a 3-PLM. Results demonstrate that the 1-PLM produced considerably more decision errors under simulation conditions similar to a real testing environment, compared to the true model and to a fixed-form standard reference set of items. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aKalohn, J C1 aSpray, J A uhttp://iacat.org/content/effect-model-misspecification-classification-decisions-made-using-computerized-test00750nas a2200145 4500008004100000245005500041210005500096300001100151490000700162520028800169653003400457100001600491700001700507856008000524 1999 eng d00aGraphical models and computerized adaptive testing0 aGraphical models and computerized adaptive testing a223-370 v233 aConsiders computerized adaptive testing from the perspective of graphical modeling (GM). GM provides methods for making inferences about multifaceted skills and knowledge and for extracting data from complex performances. Provides examples from language-proficiency assessment. (SLD)10acomputerized adaptive testing1 aAlmond, R G1 aMislevy, R J uhttp://iacat.org/content/graphical-models-and-computerized-adaptive-testing02467nam a2200133 4500008004100000245004300041210004300084260005200127520201600179653003402195100001502229700002402244856006502268 1999 eng d00aInnovations in computerized assessment0 aInnovations in computerized assessment aMahwah, N.J.bLawrence Erlbaum Associates, Inc.3 aChapters in this book present the challenges and dilemmas faced by researchers as they created new computerized assessments, focusing on issues addressed in developing, scoring, and administering the assessments. Chapters are: (1) "Beyond Bells and Whistles; An Introduction to Computerized Assessment" (Julie B. Olson-Buchanan and Fritz Drasgow); (2) "The Development of a Computerized Selection System for Computer Programmers in a Financial Services Company" (Michael J. Zickar, Randall C. Overton, L. Rogers Taylor, and Harvey J. Harms); (3) "Development of the Computerized Adaptive Testing Version of the Armed Services Vocational Aptitude Battery" (Daniel O. Segall and Kathleen E. Moreno); (4) "CAT for Certification and Licensure" (Betty A. Bergstrom and Mary E. Lunz); (5) "Developing Computerized Adaptive Tests for School Children" (G. Gage Kingsbury and Ronald L. Houser); (6) "Development and Introduction of a Computer Adaptive Graduate Record Examinations General Test" (Craig N. Mills); (7) "Computer Assessment Using Visual Stimuli: A Test of Dermatological Skin Disorders" (Terry A. Ackerman, John Evans, Kwang-Seon Park, Claudia Tamassia, and Ronna Turner); (8) "Creating Computerized Adaptive Tests of Music Aptitude: Problems, Solutions, and Future Directions" (Walter P. Vispoel); (9) "Development of an Interactive Video Assessment: Trials and Tribulations" (Fritz Drasgow, Julie B. Olson-Buchanan, and Philip J. Moberg); (10) "Computerized Assessment of Skill for a Highly Technical Job" (Mary Ann Hanson, Walter C. Borman, Henry J. Mogilka, Carol Manning, and Jerry W. 
Hedge); (11) "Easing the Implementation of Behavioral Testing through Computerization" (Wayne A. Burroughs, Janet Murray, S. Scott Wesley, Debra R. Medina, Stacy L. Penn, Steven R. Gordon, and Michael Catello); and (12) "Blood, Sweat, and Tears: Some Final Comments on Computerized Assessment." (Fritz Drasgow and Julie B. Olson-Buchanan). Each chapter contains references. (Contains 17 tables and 21 figures.) (SLD)10acomputerized adaptive testing1 aDrasgow, F1 aOlson-Buchanan, J B uhttp://iacat.org/content/innovations-computerized-assessment01086nas a2200133 4500008004100000245007800041210006900119300001200188490000700200520059200207653003400799100002300833856009600856 1999 eng d00aMultidimensional adaptive testing with a minimum error-variance criterion0 aMultidimensional adaptive testing with a minimum errorvariance c a398-4120 v243 aAdaptive testing under a multidimensional logistic response model is addressed. An algorithm is proposed that minimizes the (asymptotic) variance of the maximum-likelihood estimator of a linear combination of abilities of interest. The criterion results in a closed-form expression that is easy to evaluate. In addition, it is shown how the algorithm can be modified if the interest is in a test with a "simple ability structure". The statistical properties of the adaptive ML estimator are demonstrated for a two-dimensional item pool with several linear combinations of the abilities. 10acomputerized adaptive testing1 avan der Linden, WJ uhttp://iacat.org/content/multidimensional-adaptive-testing-minimum-error-variance-criterion02642nas a2200133 4500008004100000245007300041210006900114300000900183490000700192520216800199653003402367100001602401856009102417 1999 eng d00aOptimal design for item calibration in computerized adaptive testing0 aOptimal design for item calibration in computerized adaptive tes a42200 v593 aItem Response Theory is the psychometric model used for standardized tests such as the Graduate Record Examination. A test-taker's response to an item is modelled as a binary response with success probability depending on parameters for both the test-taker and the item. Two popular models are the two-parameter logistic (2PL) model and the three-parameter logistic (3PL) model. For the 2PL model, the logit of the probability of a correct response equals a_i(theta_j - b_i), where a_i and b_i are item parameters, while theta_j is the test-taker's parameter, known as "proficiency." The 3PL model adds a nonzero left asymptote to model random response behavior by low-theta test-takers. Assigning scores to students requires accurate estimation of thetas, while accurate estimation of thetas requires accurate estimation of the item parameters. The operational implementation of Item Response Theory, particularly following the advent of computerized adaptive testing, generally involves handling these two estimation problems separately. This dissertation addresses the optimal design for item parameter estimation. Most current designs calibrate items with a sample drawn from the overall test-taking population. For 2PL models a sequential design based on the D-optimality criterion has been proposed, while no 3PL design is in the literature. In this dissertation, we design the calibration with the ultimate use of the items in mind, namely to estimate test-takers' proficiency parameters. For both the 2PL and 3PL models, this criterion leads to a locally L-optimal design criterion, named the Minimal Information Loss criterion.
In turn, this criterion and the General Equivalence Theorem give a two-point design for the 2PL model and a three-point design for the 3PL model. A sequential implementation of this optimal design is presented. For the 2PL model, this design is almost 55% more efficient than the simple random sample approach, and 12% more efficient than the locally D-optimal design. For the 3PL model, the proposed design is 34% more efficient than the simple random sample approach. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aBuyske, S G uhttp://iacat.org/content/optimal-design-item-calibration-computerized-adaptive-testing01186nas a2200157 4500008004100000245010900041210006900150300001200219490000700231520058300238653003400821100002300855700001600878700001800894856011600912 1999 eng d00aUsing response-time constraints to control for differential speededness in computerized adaptive testing0 aUsing responsetime constraints to control for differential speed a195-2100 v233 aAn item-selection algorithm is proposed for neutralizing the differential effects of time limits on computerized adaptive test scores. The method is based on a statistical model for distributions of examinees’ response times on items in a bank that is updated each time an item is administered. Predictions from the model are used as constraints in a 0-1 linear programming model for constrained adaptive testing that maximizes the accuracy of the trait estimator. The method is demonstrated empirically using an item bank from the Armed Services Vocational Aptitude Battery. 10acomputerized adaptive testing1 avan der Linden, WJ1 aScrams, D J1 aSchnipke, D L uhttp://iacat.org/content/using-response-time-constraints-control-differential-speededness-computerized-adaptive01452nas a2200133 4500008004100000245006700041210006700108300000900175490000700184520098800191653003401179100001901213856008601232 1998 eng d00aApplications of network flows to computerized adaptive testing0 aApplications of network flows to computerized adaptive testing a08550 v593 aRecently, the concept of Computerized Adaptive Testing (CAT) has been receiving ever-growing attention from the academic community. This is so because of both practical and theoretical considerations. Its practical importance lies in the advantages of CAT over the traditional (perhaps outdated) paper-and-pencil test in terms of time, accuracy, and money. The theoretical interest is sparked by its natural relationship to Item Response Theory (IRT). This dissertation offers a mathematical programming approach which creates a model that generates a CAT that takes care of many questions concerning the test, such as feasibility, accuracy and time of testing, as well as item pool security. The CAT generated is designed to obtain the most information about a single test taker. Several methods for estimating the examinee's ability, based on the (dichotomous) responses to the items in the test, are also offered here.
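The van der Linden, Scrams, and Schnipke record above feeds predicted response times into a 0-1 linear programming model. As a simpler, hedged stand-in for that trade-off (not the authors' method), one can rank items by Fisher information per expected second, as sketched below in Python; each item carries an invented mean log response time:

import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_per_second(theta, a, b, mu_log_time):
    """Fisher information divided by a crude expected response time in seconds."""
    p = p_2pl(theta, a, b)
    info = a * a * p * (1 - p)
    expected_time = math.exp(mu_log_time)
    return info / expected_time

# Hypothetical items: (a, b, mean log response time in seconds).
pool = [(1.5, 0.2, math.log(90)), (1.0, 0.1, math.log(30)), (1.2, 0.0, math.log(45))]
best = max(range(len(pool)), key=lambda i: info_per_second(0.0, *pool[i]))
print(best)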
01749nas a2200217 4500008004100000245015600041210006900197300001100266490000600277520090700283653004001190653002201230653002401252653002301276653003701299653001001336653002401346653002701370100001801397856011601415 1998 eng d00aThe effect of item pool restriction on the precision of ability measurement for a Rasch-based CAT: comparisons to traditional fixed length examinations0 aeffect of item pool restriction on the precision of ability meas a97-1220 v23 aThis paper describes a method for examining the precision of a computerized adaptive test with a limited item pool. Standard errors of measurement ascertained in the testing of simulees with a CAT using a restricted pool were compared to the results obtained in live paper-and-pencil achievement testing of 4494 nursing students on four versions of an examination of calculations of drug administration. CAT measures of precision were considered when the simulated examinee pools were uniform and normal. Precision indices were also considered in terms of the number of CAT items required to reach the precision of the traditional tests. Results suggest that regardless of the size of the item pool, CAT provides greater precision in measurement with a smaller number of items administered, even when the choice of items is limited, but fails to achieve equiprecision along the entire ability continuum.10a*Decision Making, Computer-Assisted10aComparative Study10aComputer Simulation10aEducation, Nursing10aEducational Measurement/*methods10aHuman10aModels, Statistical10aPsychometrics/*methods1 aHalkitis, P N uhttp://iacat.org/content/effect-item-pool-restriction-precision-ability-measurement-rasch-based-cat-comparisons01236nas a2200157 4500008004100000245006600041210006600107300001000173490000600183520071600189653003400905100001500939700001700954700001900971856008800990 1998 eng d00aMaintaining content validity in computerized adaptive testing0 aMaintaining content validity in computerized adaptive testing a29-410 v33 aThe authors empirically demonstrate some of the trade-offs that can occur when content balancing is imposed in computerized adaptive testing (CAT) forms or, conversely, when it is ignored. The authors contend that the content validity of a CAT form can actually change across a score scale when content balancing is ignored. However, they caution that efficiency and score precision can be severely reduced by over-specifying content restrictions in a CAT form. The results from 2 simulation studies are presented as a means of highlighting some of the trade-offs that could occur between content and statistical considerations in CAT form assembly. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aLuecht, RM1 aChamplain, A1 aNungester, R J uhttp://iacat.org/content/maintaining-content-validity-computerized-adaptive-testing01435nas a2200145 4500008004100000245005300041210005100094300001200145490000700157520098100164653003401145100002301179700001501202856007201217 1998 eng d00aA model for optimal constrained adaptive testing0 amodel for optimal constrained adaptive testing a259-2700 v223 aA model for constrained computerized adaptive testing is proposed in which the information in the test at the trait level (theta) estimate is maximized subject to a number of possible constraints on the content of the test.
At each item-selection step, a full test is assembled to have maximum information at the current theta estimate, fixing the items already administered. Then the item with maximum information is selected. All test assembly is optimal because a linear programming (LP) model is used that automatically updates to allow for the attributes of the items already administered and the new value of the theta estimator. The LP model also guarantees that each adaptive test always meets the entire set of constraints. A simulation study using a bank of 753 items from the Law School Admission Test showed that the theta estimator for adaptive tests of realistic lengths did not suffer any loss of efficiency from the presence of 433 constraints on the item selection process. 10acomputerized adaptive testing1 avan der Linden, WJ1 aReese, L M uhttp://iacat.org/content/model-optimal-constrained-adaptive-testing01294nas a2200157 4500008004100000245007500041210006900116300001000185490000700195520076100202653003400963100001800997700001401015700001701029856009001046 1998 eng d00aSimulating the use of disclosed items in computerized adaptive testing0 aSimulating the use of disclosed items in computerized adaptive t a48-680 v353 aRegular use of questions previously made available to the public (i.e., disclosed items) may provide one way to meet the requirement for large numbers of questions in a continuous testing environment, that is, an environment in which testing is offered at test-taker convenience throughout the year rather than on a few prespecified test dates. First it must be shown that such use has effects on test scores small enough to be acceptable. In this study, simulations are used to explore the use of disclosed items under a worst-case scenario that assumes disclosed items are always answered correctly. Some item pool and test designs were identified in which the use of disclosed items produces effects on test scores that may be viewed as negligible.10acomputerized adaptive testing1 aStocking, M L1 aWard, W C1 aPotenza, M T uhttp://iacat.org/content/simulating-use-disclosed-items-computerized-adaptive-testing
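The constrained-selection model in the van der Linden and Reese (1998) record above (often called the shadow-test approach) admits a compact sketch: at every step, assemble a full-length test that meets all constraints and contains the items already given, then administer the most informative free item from that assembly. The original solves a linear programming model at each step; the greedy assembly and the single content-area quota below are simplified stand-ins for the LP, not the authors' algorithm:

import math

def item_info(item, theta):
    # 2PL item information at the current theta estimate
    p = 1.0 / (1.0 + math.exp(-item["a"] * (theta - item["b"])))
    return item["a"] ** 2 * p * (1.0 - p)

def assemble_shadow(pool, administered, theta, length, max_per_area):
    # Greedily build a full-length test around the administered items
    # while respecting a per-content-area quota.
    counts = {}
    for it in administered:
        counts[it["area"]] = counts.get(it["area"], 0) + 1
    shadow = list(administered)
    for it in sorted(pool, key=lambda i: -item_info(i, theta)):
        if len(shadow) == length:
            break
        if it in administered:
            continue
        if counts.get(it["area"], 0) < max_per_area:
            shadow.append(it)
            counts[it["area"]] = counts.get(it["area"], 0) + 1
    return shadow

def next_item(pool, administered, theta, length, max_per_area):
    # The item actually administered is the best free item of the shadow test.
    shadow = assemble_shadow(pool, administered, theta, length, max_per_area)
    free = [it for it in shadow if it not in administered]
    return max(free, key=lambda it: item_info(it, theta))

pool = [{"id": k, "a": 1.0 + 0.05 * k, "b": 0.3 * k - 1.5, "area": k % 3}
        for k in range(12)]
print(next_item(pool, [], theta=0.0, length=6, max_per_area=2)["id"])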
02911nas a2200133 4500008004100000245016300041210006900204300000800273490000700281520232300288653003402611100001402645856011802659 1997 eng d00aA comparison of maximum likelihood estimation and expected a posteriori estimation in computerized adaptive testing using the generalized partial credit model0 acomparison of maximum likelihood estimation and expected a poste a4530 v583 aA simulation study was conducted to investigate the application of expected a posteriori (EAP) trait estimation in computerized adaptive tests (CAT) based on the generalized partial credit model (Muraki, 1992), and to compare the performance of EAP with maximum likelihood trait estimation (MLE). The performance of EAP was evaluated under different conditions: the number of quadrature points (10, 20, and 30) and the type of prior distribution (normal, uniform, negatively skewed, and positively skewed). The relative performance of the MLE and EAP estimation methods was assessed under two distributional forms of the latent trait, one normal and the other negatively skewed. Also, both known and estimated item parameters were employed in the simulation study. Descriptive statistics, correlations, scattergrams, accuracy indices, and audit trails were used to compare the different methods of trait estimation in CAT. The results showed that, regardless of the latent trait distribution, MLE and EAP with a normal prior, a uniform prior, or the prior that matches the latent trait distribution, using either 20 or 30 quadrature points, provided relatively accurate estimation in CAT based on the generalized partial credit model. However, EAP using only 10 quadrature points did not work well in the generalized partial credit CAT. Also, the study found that increasing the number of quadrature points from 20 to 30 did not increase the accuracy of EAP estimation. Therefore, it appears that 20 or more quadrature points are sufficient for accurate EAP estimation. The results also showed that EAP with a negatively skewed or positively skewed prior performed poorly for the normal data set, and EAP with a positively skewed prior did not provide accurate estimates for the negatively skewed data set. Furthermore, trait estimation in CAT using estimated item parameters produced results similar to those obtained using known item parameters. In general, when at least 20 quadrature points are used, EAP estimation with a normal prior, a uniform prior, or the prior that matches the latent trait distribution appears to be a good alternative to MLE in the application of polytomous CAT based on the generalized partial credit model. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aChen, S-K uhttp://iacat.org/content/comparison-maximum-likelihood-estimation-and-expected-posteriori-estimation-computerized
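The EAP estimator compared with MLE in the Chen (1997) record above is just a weighted average of quadrature points under the posterior. In the minimal sketch below, a dichotomous 2PL likelihood stands in for the generalized partial credit model for brevity, and the standard normal prior and 21-point grid are illustrative choices:

import math

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses, items, n_points=21):
    # responses: 0/1 scored answers; items: (a, b) parameter pairs.
    # Returns the posterior mean of theta under a standard normal prior.
    grid = [-4.0 + 8.0 * k / (n_points - 1) for k in range(n_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # normal prior, up to a constant
        for u, (a, b) in zip(responses, items):
            p = p_correct(t, a, b)
            w *= p if u == 1 else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

print(eap([1, 1, 0], [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]))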
01682nam a2200145 4500008004100000245006100041210006000102260006200162520115300224653003401377100001501411700001601426700001701442856007701459 1997 eng d00aComputerized adaptive testing: From inquiry to operation0 aComputerized adaptive testing From inquiry to operation aWashington, D.C., USAbAmerican Psychological Association3 a(from the cover) This book traces the development of computerized adaptive testing (CAT) from its origins in the 1960s to its integration with the Armed Services Vocational Aptitude Battery (ASVAB) in the 1990s. A paper-and-pencil version of the battery (P&P-ASVAB) has been used by the Defense Department since the 1970s to measure the abilities of applicants for military service. The test scores are used both for initial qualification and for classification into entry-level training opportunities. /// This volume provides the developmental history of the CAT-ASVAB through its various stages in the Joint-Service arena. Although the majority of the book concerns the myriad technical issues that were identified and resolved, information is provided on various political and funding support challenges that were successfully overcome in developing, testing, and implementing the battery in one of the nation's largest testing programs. The book provides useful information to professionals in the testing community and everyone interested in personnel assessment and evaluation. (PsycINFO Database Record (c) 2004 APA, all rights reserved).10acomputerized adaptive testing1 aSands, W A1 aWaters, B K1 aMcBride, J R uhttp://iacat.org/content/computerized-adaptive-testing-inquiry-operation01729nas a2200169 4500008004100000245009900041210006900140300001200209490000700221520112300228653002101351653003001372653000801402653002301410100001601433856011001449 1997 eng d00aThe distribution of indexes of person fit within the computerized adaptive testing environment0 adistribution of indexes of person fit within the computerized ad a115-1270 v213 aThe extent to which a trait estimate represents the underlying latent trait of interest can be estimated by using indexes of person fit. Several statistical methods for indexing person fit have been proposed to identify nonmodel-fitting response vectors. These person-fit indexes have generally been found to follow a standard normal distribution for conventionally administered tests. The present investigation found that within the context of computerized adaptive testing (CAT) these indexes tended not to follow a standard normal distribution. As the item pool became less discriminating, as the CAT termination criterion became less stringent, and as the number of items in the pool decreased, the distributions of the indexes approached a standard normal distribution. It was determined that under these conditions the indexes' distributions approached standard normal distributions because more items were being administered. However, even when over 50 items were administered in a CAT, the indexes were distributed in a fashion that was different from what was expected. (PsycINFO Database Record (c) 2006 APA)10aAdaptive Testing10aComputer Assisted Testing10aFit10aPerson Environment1 aNering, M L uhttp://iacat.org/content/distribution-indexes-person-fit-within-computerized-adaptive-testing-environment
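A concrete example of the indexes discussed in the Nering (1997) record above is the standardized log-likelihood person-fit statistic commonly written l_z. The sketch below uses a dichotomous 2PL model and treats theta as known; in operational CAT the estimate is plugged in instead, which is one reason the null distribution departs from standard normal:

import math

def lz(responses, items, theta):
    # l_z = (l0 - E[l0]) / sqrt(Var[l0]), where l0 is the response
    # log-likelihood and the moments are taken under the model.
    l0 = e = v = 0.0
    for u, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        l0 += u * math.log(p) + (1 - u) * math.log(1 - p)
        e += p * math.log(p) + (1 - p) * math.log(1 - p)
        v += p * (1 - p) * math.log(p / (1 - p)) ** 2
    return (l0 - e) / math.sqrt(v)

# Missing the easy items while passing the hard ones is aberrant and
# yields a clearly negative l_z:
items = [(1.0, -1.5), (1.0, -0.5), (1.0, 0.5), (1.0, 1.5)]
print(lz([0, 0, 1, 1], items, theta=0.0))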
01795nas a2200169 4500008004100000245014100041210006900182300001200251490000700263520113700270653003401407100001401441700001301455700002101468700001401489856012201503 1997 eng d00aThe effect of population distribution and method of theta estimation on computerized adaptive testing (CAT) using the rating scale model0 aeffect of population distribution and method of theta estimation a422-4390 v573 aInvestigated the effect of population distribution on maximum likelihood estimation (MLE) and expected a posteriori estimation (EAP) in a simulation study of computerized adaptive testing (CAT) based on D. Andrich's (1978) rating scale model. Comparisons were made among MLE and EAP with a normal prior distribution and EAP with a uniform prior distribution within 2 data sets: one generated using a normal trait distribution and the other using a negatively skewed trait distribution. Descriptive statistics, correlations, scattergrams, and accuracy indices were used to compare the different methods of trait estimation. EAP estimation with a normal or uniform prior yielded results similar to those obtained with MLE, even though the prior did not match the underlying trait distribution. An additional simulation study based on real data suggested that more work is needed to determine the optimal number of quadrature points for EAP in CAT based on the rating scale model. The choice between MLE and EAP for particular measurement situations is discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aChen, S-K1 aHou, L Y1 aFitzpatrick, S J1 aDodd, B G uhttp://iacat.org/content/effect-population-distribution-and-method-theta-estimation-computerized-adaptive-testing-cat01810nas a2200169 4500008004100000245005300041210005300094250001000147260006000157300001000217520125400227653003401481100001701515700001601532700001701548856007501565 1997 eng d00aResearch antecedents of applied adaptive testing0 aResearch antecedents of applied adaptive testing axviii aWashington D.C. USAbAmerican Psychological Association a47-573 a(from the chapter) This chapter sets the stage for the entire computerized adaptive testing Armed Services Vocational Aptitude Battery (CAT-ASVAB) development program by describing the state of the art immediately preceding its inception. By the mid-1970s, a great deal of research had been conducted that provided the technical underpinnings needed to develop adaptive tests, but little research had been done to corroborate empirically the promising results of theoretical analyses and computer simulation studies. In this chapter, the author summarizes much of the important theoretical and simulation research prior to 1977. In doing so, he describes a variety of approaches to adaptive testing, and shows that while many methods for adaptive testing had been proposed, few practical attempts had been made to implement it. Furthermore, the few instances of adaptive testing were based primarily on traditional test theory, and were developed in laboratory settings for purposes of basic research. The most promising approaches, those based on item response theory and evaluated analytically or by means of computer simulations, remained to be proven in the crucible of live testing. (PsycINFO Database Record (c) 2004 APA, all rights reserved).10acomputerized adaptive testing1 aMcBride, J R1 aWaters, B K1 aMcBride, J R uhttp://iacat.org/content/research-antecedents-applied-adaptive-testing01402nas a2200133 4500008004100000245008900041210006900130300001200199490000700211520089300218653003401111100001801145856010501163 1997 eng d00aRevising item responses in computerized adaptive tests: A comparison of three models0 aRevising item responses in computerized adaptive tests A compari a129-1420 v213 aInterest in the application of large-scale computerized adaptive testing has focused attention on issues that arise when theoretical advances are made operational. One such issue is the order in which examinees address questions within a test or a separately timed test section. In linear testing, this order is entirely under the control of the examinee, who can look ahead at questions and return to and revise answers to previous questions. Using simulation, this study investigated three models that permit restricted examinee control over revising previous answers in the context of adaptive testing. Even under a worst-case model of examinee revision behavior, two of the models for permitting item revisions worked well in preserving test fairness and accuracy. One model studied may also preserve some cognitive processing styles developed by examinees for a linear testing environment.
10acomputerized adaptive testing1 aStocking, M L uhttp://iacat.org/content/revising-item-responses-computerized-adaptive-tests-comparison-three-models01946nas a2200133 4500008004100000245005600041210005600097260002100153520149700174653003401671100001801705700001701723856007201740 1997 eng d00aValidation of CATSIB to investigate DIF of CAT data0 aValidation of CATSIB to investigate DIF of CAT data aChicago, IL. USA3 aThis paper investigates the performance of CATSIB (a modified version of the SIBTEST computer program) in assessing differential item functioning (DIF) in the context of computerized adaptive testing (CAT). One of the distinguishing features of CATSIB is its theoretically built-in regression correction, which controls the Type I error rate when the distributions of the reference and focal groups differ on the intended ability; this phenomenon is also called impact. The Type I error rate of CATSIB with the regression correction (WRC) was compared with that of CATSIB without the regression correction (WORC) to see if the regression correction was indeed effective. Also of interest was the power level of CATSIB after the regression correction. The subtest size was set at 25 items, and sample size, the impact level, and the amount of DIF were varied. Results show that the regression correction was very useful in controlling the Type I error rate; CATSIB WORC had inflated observed Type I error rates, especially when impact levels were high. CATSIB WRC had observed Type I error rates very close to the nominal level of 0.05. The power rates of CATSIB WRC were impressive. As expected, power increased as the sample size increased and as the amount of DIF increased. Even for small samples with high impact rates, power rates were 64% or higher for high DIF levels. For large samples, power rates were over 90% for high DIF levels. (Contains 12 tables and 7 references.) (Author/SLD)10acomputerized adaptive testing1 aNandakumar, R1 aRoussos, L A uhttp://iacat.org/content/validation-catsib-investigate-dif-cat-data00557nas a2200121 4500008004100000245011900041210006900160260002100229653003400250653001900284100001400303856011800317 1996 eng d00aA comparison of the traditional maximum information method and the global information method in CAT item selection0 acomparison of the traditional maximum information method and the aNew York, NY USA10acomputerized adaptive testing10aitem selection1 aTang, K L uhttp://iacat.org/content/comparison-traditional-maximum-information-method-and-global-information-method-cat-item02572nas a2200133 4500008004100000245009100041210006900132300000900201490000700210520206600217653003402283100001402317856010702331 1996 eng d00aDynamic scaling: An ipsative procedure using techniques from computer adaptive testing0 aDynamic scaling An ipsative procedure using techniques from comp a58240 v563 aThe purpose of this study was to create a prototype method for scaling items using computer adaptive testing techniques and to demonstrate the method with a working model program. The method can be used to scale items, to rank individuals with respect to the scaled items, and to re-scale the items with respect to the individuals' responses. When using this prototype method, the items to be scaled are part of a database that contains not only the items but also measures of how individuals respond to each item.
After completion of all presented items, the individual is assigned an overall scale value, which is then compared with each item responded to, and an individual "error" term is stored with each item. After several individuals have responded to the items, the item error terms are used to revise the placement of the scaled items. This revision feature allows the natural adaptation of one general list to reflect subgroup differences, for example, differences among geographic areas or ethnic groups. It also provides easy revision and limited authoring of the scale items by the computer program administrator. This study addressed the methodology; the instrumentation needed to handle the scale-item administration, data recording, item error analysis, and scale-item database editing required by the method; and the behavior of a prototype vocabulary test in use. Analyses were made of item ordering, response profiles, item stability, reliability, and validity. Although slow, the movement of unordered words used as items in the prototype program was accurate, as determined by comparison with an expert word ranking. Person scores obtained by multiple administrations of the prototype test were reliable and correlated at .94 with a commercial paper-and-pencil vocabulary test, while holding a three-to-one speed advantage in administration. Although based upon self-report data, dynamic scaling instruments like the model vocabulary test could be very useful for self-assessment, for pre (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aBerg, S R uhttp://iacat.org/content/dynamic-scaling-ipsative-procedure-using-techniques-computer-adaptive-testing02609nas a2200133 4500008004100000245011400041210006900155300000900224490000700233520206700240653003402307100001602341856011802357 1996 eng d00aThe effect of individual differences variables on the assessment of ability for Computerized Adaptive Testing0 aeffect of individual differences variables on the assessment of a40850 v573 aComputerized Adaptive Testing (CAT) continues to gain momentum as the accepted testing modality for a growing number of certification, licensure, education, government, and human resource applications. However, the developers of these tests have for the most part failed to adequately explore the impact of individual differences such as test anxiety on the adaptive testing process. It is widely accepted that non-cognitive individual differences variables interact with the assessment of ability when using written examinations. Logic would dictate that individual differences variables would equally affect CAT. Two studies were used to explore this premise. In the first study, 507 examinees were given a test anxiety survey prior to taking a high-stakes certification exam using CAT or using a written format. All examinees had already completed their course of study, and the examination would be their last hurdle prior to being awarded certification. Highly test-anxious examinees performed worse than their low-anxious counterparts on both testing formats. The second study replicated the finding that anxiety depresses performance in CAT. It also addressed the differential effect of anxiety on within-test performance. Examinees were candidates taking their final certification examination following a four-year college program. Ability measures were calculated for each successive part of the test for 923 subjects. Within-subject performance varied depending upon test position.
Highly anxious examinees performed poorly at all points in the test, while the performance of low- and medium-anxious examinees peaked in the middle of the test. If test anxiety and performance measures were actually the same trait, then low-anxious individuals should have performed equally well throughout the test. The observed interaction of test anxiety and time on task serves as strong evidence that test anxiety has motivationally mediated as well as cognitively mediated effects. The results of the studies are di (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aGershon, RC uhttp://iacat.org/content/effect-individual-differences-variables-assessment-ability-computerized-adaptive-testing01524nas a2200229 4500008004100000020004100041245010500082210006900187250001500256260001200271300000900283490000700292520069800299653002500997653002501022653004301047653003701090653001101127100001601138700001801154856012201172 1996 eng d a0363-3624 (Print)0363-3624 (Linking)00aMethodologic trends in the healthcare professions: computer adaptive and computer simulation testing0 aMethodologic trends in the healthcare professions computer adapt a1996/07/01 cJul-Aug a13-40 v213 aAssessing knowledge and performance on computer is rapidly becoming a common phenomenon in testing and measurement. Computer adaptive testing presents an individualized test format in accordance with the examinee's ability level. The efficiency of the testing process enables a more precise estimate of performance, often with fewer items than traditional paper-and-pencil testing methodologies. Computer simulation testing involves performance-based, or authentic, assessment of the examinee's clinical decision-making abilities. The authors discuss the trends in assessing performance through computerized means and the application of these methodologies to community-based nursing practice.10a*Clinical Competence10a*Computer Simulation10aComputer-Assisted Instruction/*methods10aEducational Measurement/*methods10aHumans1 aForker, J E1 aMcDonald, M E uhttp://iacat.org/content/methodologic-trends-healthcare-professions-computer-adaptive-and-computer-simulation-testing01606nas a2200133 4500008004100000245009100041210006900132300001200201490000700213520109200220653003401312100001501346856011101361 1996 eng d00aMultidimensional computerized adaptive testing in a certification or licensure context0 aMultidimensional computerized adaptive testing in a certificatio a389-4040 v203 a(from the journal abstract) Multidimensional item response theory (MIRT) computerized adaptive testing, building on recent work by D. O. Segall (1996), is applied in a licensing/certification context. An example of a medical licensure test is used to demonstrate situations in which complex, integrated content must be balanced at the total test level for validity reasons, but items assigned to reportable subscore categories may be used under a MIRT adaptive paradigm to improve the reliability of the subscores. A heuristic optimization framework is outlined that generalizes to both univariate and multivariate statistical objective functions, with additional systems of constraints included to manage the content balancing or other test specifications on adaptively constructed test forms. Simulation results suggested that a multivariate treatment of the problem, although somewhat complicating the objective function used and the estimation of traits, nonetheless produces advantages from a psychometric perspective.
(PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aLuecht, RM uhttp://iacat.org/content/multidimensional-computerized-adaptive-testing-certification-or-licensure-context02605nas a2200133 4500008004100000245012000041210006900161300000900230490000700239520205500246653003402301100001602335856012002351 1995 eng d00aAssessment of scaled score consistency in adaptive testing from a multidimensional item response theory perspective0 aAssessment of scaled score consistency in adaptive testing from a55980 v553 aThe purpose of this study was twofold: (a) to examine whether unidimensional adaptive testing estimates are comparable for different ability levels of examinees when the true examinee-item interaction is correctly modeled using a compensatory multidimensional item response theory (MIRT) model; and (b) to investigate the effects on adaptive testing estimation when the item selection procedure of computerized adaptive testing (CAT) is controlled by either content balancing or selecting the most informative item in a user-specified direction at the current estimate of unidimensional ability. A series of Monte Carlo simulations was conducted in this study. Deviation from the reference composite angle was used as an index of the (theta_1, theta_2)-composite consistency across the different levels of unidimensional CAT estimates. In addition, the effects of the content-balancing item selection procedure and the fixed-direction item selection procedure were compared across the different ability levels. The characteristics of item selection, test information, and the relationship between unidimensional and multidimensional models were also investigated. In addition to employing statistical analysis to examine the robustness of the CAT procedure to violations of unidimensionality, this research also included graphical analyses to present the results. The results were summarized as follows: (a) the reference angles for the no-control item selection method were disparate across the unidimensional ability groups; (b) the unidimensional CAT estimates from the content-balancing item selection method did not offer much improvement; (c) the fixed-direction item selection method did provide greater consistency for the unidimensional CAT estimates across the different levels of ability; and (d) increasing the CAT test length did not provide greater score scale consistency. Based on the results of this study, the following conclusions were drawn: (a) without any controlling (PsycINFO Database Record (c) 2003 APA, all rights reserved).10acomputerized adaptive testing1 aFan, Miechu uhttp://iacat.org/content/assessment-scaled-score-consistency-adaptive-testing-multidimensional-item-response-theory00605nas a2200145 4500008004100000245010000041210006900141260004400210300001200254490000600266653003400272100002400306700001400330856011500344 1994 eng d00aThe equivalence of Rasch item calibrations and ability estimates across modes of administration0 aequivalence of Rasch item calibrations and ability estimates acr aNorwood, N.J. USAbAblex Publishing Co.
a122-1280 v210acomputerized adaptive testing1 aBergstrom, Betty, A1 aLunz, M E uhttp://iacat.org/content/equivalence-rasch-item-calibrations-and-ability-estimates-across-modes-administration00503nas a2200121 4500008004100000245009300041210006900134300000900203490000700212653003400219100001300253856011500266 1994 eng d00aMonte Carlo simulation comparison of two-stage testing and computerized adaptive testing0 aMonte Carlo simulation comparison of twostage testing and comput a25480 v5410acomputerized adaptive testing1 aKim, H-O uhttp://iacat.org/content/monte-carlo-simulation-comparison-two-stage-testing-and-computerized-adaptive-testing00497nas a2200121 4500008004100000245009700041210006900138300001400207490000700221653003400228100001200262856010100274 1993 eng d00aAn application of Computerized Adaptive Testing to the Test of English as a Foreign Language0 aapplication of Computerized Adaptive Testing to the Test of Engl a4257-42580 v5310acomputerized adaptive testing1 aMoon, O uhttp://iacat.org/content/application-computerized-adaptive-testing-test-english-foreign-language00509nas a2200133 4500008004100000245008100041210006900122300001000191490000700201653003400208100001900242700001600261856009800277 1993 eng d00aAssessing the utility of item response models: computerized adaptive testing0 aAssessing the utility of item response models computerized adapt a21-270 v1210acomputerized adaptive testing1 aKingsbury, G G1 aHouser, R L uhttp://iacat.org/content/assessing-utility-item-response-models-computerized-adaptive-testing00470nas a2200121 4500008004100000245008000041210006900121300000900190490000700199653003400206100001500240856009300255 1993 eng d00aComparability and validity of computerized adaptive testing with the MMPI-20 aComparability and validity of computerized adaptive testing with a37910 v5310acomputerized adaptive testing1 aRoper, B L uhttp://iacat.org/content/comparability-and-validity-computerized-adaptive-testing-mmpi-200572nas a2200121 4500008004100000245015100041210006900192300000900261490000700270653003400277100001700311856012200328 1993 eng d00aComputer adaptive testing: A comparison of four item selection strategies when used with the golden section search strategy for estimating ability0 aComputer adaptive testing A comparison of four item selection st a17720 v5410acomputerized adaptive testing1 aCarlson, R D uhttp://iacat.org/content/computer-adaptive-testing-comparison-four-item-selection-strategies-when-used-golden-section01216nas a2200157 4500008004100000245006600041210006600107300001200173490000600185520069800191653003400889100002400923700001400947700001600961856008100977 1992 eng d00aAltering the level of difficulty in computer adaptive testing0 aAltering the level of difficulty in computer adaptive testing a137-1490 v53 aExamines the effect of altering test difficulty on examinee ability measures and test length in a computer adaptive test. The 225 Ss were randomly assigned to 3 test difficulty conditions and given a variable length computer adaptive test. Examinees in the hard, medium, and easy test condition took a test targeted at the 50%, 60%, or 70% probability of correct response. The results show that altering the probability of a correct response does not affect estimation of examinee ability and that taking an easier computer adaptive test only slightly increases the number of items necessary to reach specified levels of precision. 
(PsycINFO Database Record (c) 2002 APA, all rights reserved).10acomputerized adaptive testing1 aBergstrom, Betty, A1 aLunz, M E1 aGershon, RC uhttp://iacat.org/content/altering-level-difficulty-computer-adaptive-testing00477nas a2200121 4500008004100000245008100041210006900122300000900191490000700200653003400207100002100241856009300262 1992 eng d00aThe development and evaluation of a system for computerized adaptive testing0 adevelopment and evaluation of a system for computerized adaptive a43040 v5210acomputerized adaptive testing1 aTorre Sanchez, R uhttp://iacat.org/content/development-and-evaluation-system-computerized-adaptive-testing00495nas a2200121 4500008004100000245008200041210006900123300000900192490000700201653003400208100002400242856010700266 1992 eng d00aTest anxiety and test performance under computerized adaptive testing methods0 aTest anxiety and test performance under computerized adaptive te a25180 v5210acomputerized adaptive testing1 aPowell, Zen-Hsiu, E uhttp://iacat.org/content/test-anxiety-and-test-performance-under-computerized-adaptive-testing-methods00580nas a2200121 4500008004100000245016000041210006900201300000900270490000700279653003400286100002000320856011800340 1991 eng d00aA comparison of paper-and-pencil, computer-administered, computerized feedback, and computerized adaptive testing methods for classroom achievement testing0 acomparison of paperandpencil computeradministered computerized f a17190 v5210acomputerized adaptive testing1 aKuan, Tsung Hao uhttp://iacat.org/content/comparison-paper-and-pencil-computer-administered-computerized-feedback-and-computerized00435nas a2200121 4500008004100000245006100041210006000102300001200162490000700174653003400181100001500215856008300230 1991 eng d00aInter-subtest branching in computerized adaptive testing0 aIntersubtest branching in computerized adaptive testing a140-1410 v5210acomputerized adaptive testing1 aChang, S-H uhttp://iacat.org/content/inter-subtest-branching-computerized-adaptive-testing00734nas a2200169 4500008004100000020000900041245012400050210006900174260008700243653003400330653001500364653001800379100001600397700001900413700001700432856011500449 1991 eng d aR-1100aPatterns of alcohol and drug use among federal offenders as assessed by the Computerized Lifestyle Screening Instrument0 aPatterns of alcohol and drug use among federal offenders as asse aOttawa, ON. CanadabResearch and Statistics Branch, Correctional Service of Canada10acomputerized adaptive testing10adrug abuse10asubstance use1 aRobinson, D1 aPorporino, F J1 aMillson, W A uhttp://iacat.org/content/patterns-alcohol-and-drug-use-among-federal-offenders-assessed-computerized-lifestyle01504nas a2200157 4500008004100000245008900041210006900130300001200199490000700211520093700218653003401155100002001189700001401209700001401223856010901237 1990 eng d00aA simulation and comparison of flexilevel and Bayesian computerized adaptive testing0 asimulation and comparison of flexilevel and Bayesian computerize a227-2390 v273 aComputerized adaptive testing (CAT) is a testing procedure that adapts an examination to an examinee's ability by administering only items of appropriate difficulty for the examinee. In this study, the authors compared Lord's flexilevel testing procedure (flexilevel CAT) with an item response theory-based CAT using Bayesian estimation of ability (Bayesian CAT). 
Three flexilevel CATs, which differed in test length (36, 18, and 11 items), and three Bayesian CATs were simulated; the Bayesian CATs differed from one another in the standard error of estimate (SEE) used for terminating the test (0.25, 0.10, and 0.05). Results showed that the flexilevel 36- and 18-item CATs produced ability estimates that may be considered as accurate as those of the Bayesian CAT with SEE = 0.10 and comparable to the Bayesian CAT with SEE = 0.05. The authors discuss the implications for classroom testing and for item response theory-based CAT.10acomputerized adaptive testing1 ade Ayala, R. J.1 aDodd, B G1 aKoch, W R uhttp://iacat.org/content/simulation-and-comparison-flexilevel-and-bayesian-computerized-adaptive-testing00423nas a2200133 4500008004100000020001400041245005100055210005000106300001000156490000600166653003400172100001700206856006600223 1989 eng d a1745-399200aAdaptive testing: The evolution of a good idea0 aAdaptive testing The evolution of a good idea a11-150 v810acomputerized adaptive testing1 aReckase, M D uhttp://iacat.org/content/adaptive-testing-evolution-good-idea00501nas a2200121 4500008004100000245009800041210006900139300000900208490000700217653003400224100001400258856010700272 1989 eng d00aApplication of computerized adaptive testing to the University Entrance Exam of Taiwan, R.O.C0 aApplication of computerized adaptive testing to the University E a36620 v4910acomputerized adaptive testing1 aHung, P-H uhttp://iacat.org/content/application-computerized-adaptive-testing-university-entrance-exam-taiwan-roc01759nas a2200133 4500008004100000245005400041210005100095260005500146300000800201520129200209653003401501100001701535856007301552 1989 eng d00aAn applied study on computerized adaptive testing0 aapplied study on computerized adaptive testing aGroningen, The NetherlandsbUniversity of Groningen a1853 a(from the cover) The rapid development and falling prices of powerful personal computers, in combination with new test theories, will have a large impact on psychological testing. One of the new possibilities is computerized adaptive testing. During the test administration each item is chosen to be appropriate for the person being tested. The test becomes tailor-made, resolving some of the problems with classical paper-and-pencil tests. In this way individual differences can be measured with higher efficiency and reliability. Scores on other meaningful variables, such as response time, can be obtained easily using computers. /// In this book a study on computerized adaptive testing is described. The study took place at Dutch Railways in an applied setting and served practical goals. Topics discussed include the construction of computerized tests, the use of response time, the choice of algorithms, and the implications of using a latent trait model. After running a number of simulations and calibrating the item banks, an experiment was carried out. In the experiment a pretest was administered to a sample of over 300 applicants, followed by an adaptive test. In addition, a survey concerning the attitudes of testees towards computerized testing formed part of the design.10acomputerized adaptive testing1 aSchoonman, W uhttp://iacat.org/content/applied-study-computerized-adaptive-testing
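The termination rule that distinguishes the Bayesian CATs in the de Ayala, Dodd, and Koch (1990) record above (stop once the standard error of estimate falls below a cutoff) can be sketched in a short simulation loop. EAP on a coarse grid stands in for their Bayesian estimator, difficulty matching stands in for information-based selection, and every tuning value is illustrative:

import math
import random

def p2(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_mean_sd(responses, used):
    # Posterior mean and SD of theta on a 21-point grid, standard normal prior.
    grid = [-4.0 + 0.4 * k for k in range(21)]
    w = [math.exp(-0.5 * t * t) for t in grid]
    for u, (a, b) in zip(responses, used):
        w = [wi * (p2(t, a, b) if u else 1.0 - p2(t, a, b))
             for wi, t in zip(w, grid)]
    total = sum(w)
    mean = sum(t * wi for t, wi in zip(grid, w)) / total
    var = sum((t - mean) ** 2 * wi for t, wi in zip(grid, w)) / total
    return mean, math.sqrt(var)

def cat(pool, answer, see_cut=0.25, max_items=30):
    responses, used, theta, sd = [], [], 0.0, 1.0
    remaining = list(pool)
    while remaining and len(used) < max_items and sd > see_cut:
        a, b = min(remaining, key=lambda ab: abs(ab[1] - theta))  # nearest difficulty
        remaining.remove((a, b))
        used.append((a, b))
        responses.append(answer(a, b))
        theta, sd = eap_mean_sd(responses, used)
    return theta, sd, len(used)

pool = [(2.0, -3.0 + 0.1 * k) for k in range(60)]
true_theta = 0.8
theta, sd, n = cat(pool, lambda a, b: int(random.random() < p2(true_theta, a, b)))
print(round(theta, 2), round(sd, 2), n)  # stops once the SEE reaches the cutoff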
01445nas a2200157 4500008004100000245007900041210006900120300001000189490000600199520090200205653003401107100002001141700001701161700001701178856009201195 1989 eng d00aA real-data simulation of computerized adaptive administration of the MMPI0 arealdata simulation of computerized adaptive administration of t a18-220 v13 aA real-data simulation of computerized adaptive administration of the MMPI was conducted with data obtained from two personnel-selection samples and two clinical samples. A modification of the countdown method was tested to determine the usefulness, in terms of item administration savings, of several different test administration procedures. Substantial item administration savings were achieved for all four samples, though the clinical samples required administration of more items to achieve accurate classification and/or full-scale scores than did the personnel-selection samples. The use of normative item endorsement frequencies was found to be as effective as sample-specific frequencies for the determination of item administration order. The role of computerized adaptive testing in the future of personality assessment is discussed. (C) 1989 by the American Psychological Association10acomputerized adaptive testing1 aBen-Porath, Y S1 aSlutske, W S1 aButcher, J N uhttp://iacat.org/content/real-data-simulation-computerized-adaptive-administration-mmpi00529nas a2200121 4500008004100000245010800041210006900149300000900218490000700227653003400234100002000268856011900288 1988 eng d00aComputerized adaptive testing: A comparison of the nominal response model and the three-parameter model0 aComputerized adaptive testing A comparison of the nominal respon a31480 v4810acomputerized adaptive testing1 ade Ayala, R. J. uhttp://iacat.org/content/computerized-adaptive-testing-comparison-nominal-response-model-and-three-parameter-model00616nas a2200133 4500008004100000245011200041210006900153260003800222653003400260653003800294100001500332700001700347856011800364 1987 eng d00aThe effect of item parameter estimation error on decisions made using the sequential probability ratio test0 aeffect of item parameter estimation error on decisions made usin aIowa City, IA. USAbDTIC Document10acomputerized adaptive testing10aSequential probability ratio test1 aSpray, J A1 aReckase, M D uhttp://iacat.org/content/effect-item-parameter-estimation-error-decisions-made-using-sequential-probability-ratio01520nas a2200157 4500008004100000020001400041245008900055210006900144300001000213490000700223520095700230653003401187100001801221700002001239856010301259 1986 eng d a0013-164400aAn application of computer adaptive testing with communication handicapped examinees0 aapplication of computer adaptive testing with communication hand a23-350 v463 aThis study was conducted to evaluate a computerized adaptive testing procedure for the measurement of mathematical skills of entry-level deaf college students. The theoretical basis of the study was the Rasch model for person measurement. Sixty persons were tested using an Apple II Plus microcomputer. Ability estimates provided by the computerized procedure were compared for stability with those obtained six to eight weeks earlier from conventional (written) testing of the same subject matter.
Students' attitudes toward their testing experiences also were measured. Substantial increases in measurement efficiency (by reducing test length) were realized through the adaptive testing procedure. Because the item pool used was not specifically designed for adaptive testing purposes, the psychometric quality of measurements resulting from the different testing methods was approximately equal. Attitudes toward computerized testing were favorable.10acomputerized adaptive testing1 aGarrison, W M1 aBaumgarten, B S uhttp://iacat.org/content/application-computer-adaptive-testing-communication-handicapped-examinees00645nas a2200121 4500008004100000245022600041210006900267300000900336490000700345653003400352100001900386856011800405 1985 eng d00aAdaptive self-referenced testing as a procedure for the measurement of individual change due to instruction: A comparison of the reliabilities of change estimates obtained from conventional and adaptive testing procedures0 aAdaptive selfreferenced testing as a procedure for the measureme a30570 v4510acomputerized adaptive testing1 aKingsbury, G G uhttp://iacat.org/content/adaptive-self-referenced-testing-procedure-measurement-individual-change-due-instruction01566nas a2200169 4500008004100000245013900041210006900180300001200249490000600261520091500267653003401182100001601216700001601232700001701248700001401265856011701279 1984 eng d00aRelationship between corresponding Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing (CAT) subtests0 aRelationship between corresponding Armed Services Vocational Apt a155-1630 v83 aInvestigated the relationships between selected subtests from the Armed Services Vocational Aptitude Battery (ASVAB) and corresponding subtests administered as computerized adaptive tests (CATs), using 270 17-26 yr old Marine recruits as Ss. Ss were administered the ASVAB before enlisting and approximately 2 wks after entering active duty, and the CAT tests were administered to Ss approximately 24 hrs after arriving at the recruit depot. Results indicate that 3 adaptive subtests correlated as well with ASVAB as did the 2nd administration of the ASVAB, although CAT subtests contained only half the number of items. Factor analysis showed CAT subtests to load on the same factors as the corresponding ASVAB subtests, indicating that the same abilities were being measured. It is concluded that CAT can achieve the same measurement precision as a conventional test, with half the number of items. 
(16 ref) 10acomputerized adaptive testing1 aMoreno, K E1 aWetzel, C D1 aMcBride, J R1 aWeiss, DJ uhttp://iacat.org/content/relationship-between-corresponding-armed-services-vocational-aptitude-battery-asvab-and00651nas a2200205 4500008004100000020001400041245006700055210006700122300001200189490000700201653003400208653001700242653002100259100001400280700001400294700001900308700001300327700001700340856008800357 1984 eng d a1745-398400aTechnical guidelines for assessing computerized adaptive tests0 aTechnical guidelines for assessing computerized adaptive tests a347-3600 v2110acomputerized adaptive testing10aMode effects10apaper-and-pencil1 aGreen, BF1 aBock, R D1 aHumphreys, L G1 aLinn, RL1 aReckase, M D uhttp://iacat.org/content/technical-guidelines-assessing-computerized-adaptive-tests00569nas a2200157 4500008004100000245006000041210005700101260003800158300001200196653000900208653004900217653004100266653000900307100001700316856007800333 1983 eng d00aA procedure for decision making using tailored testing.0 aprocedure for decision making using tailored testing aNew York, NY. USAbAcademic Press a237-25410aCCAT10aCLASSIFICATION Computerized Adaptive Testing10asequential probability ratio testing10aSPRT1 aReckase, M D uhttp://iacat.org/content/procedure-decision-making-using-tailored-testing00567nas a2200121 4500008004100000245015600041210006900197300000900266490000700275653003400282100001500316856011400331 1982 eng d00aAbility measurement, test bias reduction, and psychological reactions to testing as a function of computer adaptive testing versus conventional testing0 aAbility measurement test bias reduction and psychological reacti a42330 v4210acomputerized adaptive testing1 aOrban, J A uhttp://iacat.org/content/ability-measurement-test-bias-reduction-and-psychological-reactions-testing-function00562nas a2200181 4500008004100000245005100041210004900092300001100141490000700152653000900159653004900168653004100217653000900258100001300267700001400280700001600294856007000310 1972 eng d00aSequential testing for dichotomous decisions. 0 aSequential testing for dichotomous decisions a85-95.0 v3210aCCAT10aCLASSIFICATION Computerized Adaptive Testing10asequential probability ratio testing10aSPRT1 aLinn, RL1 aRock, D A1 aCleary, T A uhttp://iacat.org/content/sequential-testing-dichotomous-decisions
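Several of the records above (Reckase, 1983; Spray and Reckase, 1987; Linn, Rock, and Cleary, 1972) build classification testing on the sequential probability ratio test: accumulate the log likelihood ratio of two hypothesized ability points and stop as soon as a Wald boundary is crossed. A minimal sketch under a 2PL model, with illustrative ability points and nominal error rates:

import math

def sprt(responses, items, theta0, theta1, alpha=0.05, beta=0.05):
    # Wald boundaries for deciding between theta0 (fail) and theta1 (pass).
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = 0.0
    for n, (u, (a, b)) in enumerate(zip(responses, items), start=1):
        p0 = 1.0 / (1.0 + math.exp(-a * (theta0 - b)))
        p1 = 1.0 / (1.0 + math.exp(-a * (theta1 - b)))
        llr += math.log(p1 / p0) if u else math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "pass", n
        if llr <= lower:
            return "fail", n
    return "undecided", len(items)

# A run of correct answers crosses the upper boundary after a few items:
items = [(1.0, 0.0)] * 20
print(sprt([1] * 20, items, theta0=-0.5, theta1=0.5))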