01395nas a2200181 4500008004500000022001400045245008700059210006900146490000700215520080200222653003401024653002501058653001901083653001901102653001601121100002101137856005501158 2024 Engldsh a2165-659200aThe Influence of Computerized Adaptive Testing on Psychometric Theory and Practice0 aInfluence of Computerized Adaptive Testing on Psychometric Theor0 v113 a
The major premise of this article is that part of the stimulus for the evolution of psychometric theory since the 1950s was the introduction of the concept of computerized adaptive testing (CAT) or its earlier non-CAT variations. The conceptual underpinning of CAT that had the most influence on psychometric theory was the shift of emphasis from the test (or test score) as the focus of analysis to the test item (or item score). The change in focus allowed a change in the way that test results are conceived of as measurements. It also resolved the conflict among a number of ideas that were present in the early work on psychometric theory. Some of the conflicting ideas are summarized below to show how work on the development of CAT resolved them.
10acomputerized adaptive testing10aItem Response Theory10aparadigm shift10ascaling theory10atest design1 aReckase, Mark, D uhttps://jcatpub.net/index.php/jcat/issue/view/34/900533nas a2200181 4500008004500000022001400045245004000059210004000099260001200139300000800151490000600159653003300165653003000198653002000228653002700248100002200275856005400297 2022 Engldsh a2165-659200aImproving Precision of CAT Measures0 aImproving Precision of CAT Measures c10/2022 a1-70 v910a: dichotomously scored items10aoption probability theory10ascoring methods10asubjective probability1 aBarnard, John, J. uhttp://iacat.org/improving-precision-cat-measures00579nas a2200169 4500008004500000245007100045210006900116300000900185490000600194653003100200653002000231653005100251100001800302700001400320700001500334856006000349 2019 Engldsh 00aHow Adaptive Is an Adaptive Test: Are All Adaptive Tests Adaptive?0 aHow Adaptive Is an Adaptive Test Are All Adaptive Tests Adaptive a1-140 v710acomputerized adaptive test10amultistage test10astatistical indicators of amount of adaptation1 aReckase, Mark1 aJu, Unhee1 aKim, Sewon uhttp://iacat.org/jcat/index.php/jcat/article/view/69/3403946nas a2200181 4500008004100000245009300041210006900134260005500203520329800258653001203556653002403568653002603592653002503618100001403643700002103657700001503678856007103693 2017 eng d00aDIF-CAT: Doubly Adaptive CAT Using Subgroup Information to Improve Measurement Precision0 aDIFCAT Doubly Adaptive CAT Using Subgroup Information to Improve aNiigata, JapanbNiigata Seiryo Universityc08/20173 aDifferential item functioning (DIF) is usually regarded as a test fairness issue in high-stakes tests. In low-stakes tests, it is more of an accuracy problem. However, in low-stakes tests, the same method, deleting items that demonstrate significant DIF, is still employed to treat DIF items. 
When political concerns are not important, such as in low-stakes tests and instruments that are not used to make decisions about people, deleting items might not be optimal. Computerized adaptive testing (CAT) is increasingly used in low-stakes tests. The DIF-CAT method evaluated in this research is designed to cope with DIF in a CAT environment. Using this method, item parameters are separately estimated for the focal group and the reference group in a DIF study, and CATs are then administered using group-specific item parameters for the focal and reference groups.
To evaluate the performance of the DIF-CAT procedure, it was compared in a simulation study to (1) deleting all the DIF items in a CAT bank and (2) ignoring DIF. A 300-item flat item bank and a 300-item peaked item bank were simulated using the three-parameter logistic IRT model with D = 1.7. 40% of the items in each bank showed DIF. The DIF size was 0.5 in b and/or a, while the original b values ranged from -3 to 3 and a ranged from 0.3 to 2.1. Three types of DIF were considered: (1) uniform DIF caused by differences in b, (2) non-uniform DIF caused by differences in a, and (3) non-uniform DIF caused by differences in both a and b. 500 normally distributed simulees in each of the reference and focal groups were used in item parameter re-calibration. In the Delete DIF method, only DIF-free items were calibrated. In the Ignore DIF method, all the items were calibrated using all simulees without differentiating the groups. In the DIF-CAT method, the DIF-free items were used as anchor items to estimate the item parameters for the focal and reference groups, and the item parameters from recalibration were used. All simulees used the same item parameters in the Delete method and the Ignore method. CATs for simulees within the two groups used group-specific item parameters in the DIF-CAT method. In the CAT stage, 100 simulees were generated for each of the reference and focal groups, at each of six discrete θ levels ranging from -2.5 to 2.5. CAT test length was fixed at 40 items. Bias, average absolute difference, RMSE, standard error of θ estimates, and person fit were used to compare the performance of the DIF methods. DIF item usage was also recorded for the Ignore method and the DIF-CAT method.
Generally, the DIF-CAT method outperformed both the Delete method and the Ignore method in dealing with DIF items in CAT. The Delete method, which is the most frequently used method for handling DIF, performed the worst of the three methods in a CAT environment, as reflected in multiple indices of measurement precision. Even the Ignore method, which simply left DIF items in the item bank, provided θ estimates of higher precision than the Delete method. This poor performance of the Delete method was probably due to reduction in size of the item bank available for each CAT.
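The group-specific scoring idea behind DIF-CAT can be sketched with the 3PL model and D = 1.7 used in the simulation. This is a minimal illustration, not the authors' code: the item parameters below are hypothetical, with the focal group's difficulty shifted by 0.5 as in the uniform-DIF condition.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

# Hypothetical item showing uniform DIF: the focal group's difficulty (b)
# is shifted by 0.5, so the same response is scored with different
# parameters depending on group membership.
params = {"reference": (1.2, 0.0, 0.2), "focal": (1.2, 0.5, 0.2)}

theta = 0.0
p_ref = p_3pl(theta, *params["reference"])  # reference-group probability
p_foc = p_3pl(theta, *params["focal"])      # focal-group probability
```

Administering the CAT with these group-specific parameters, rather than deleting the item, is what preserves the bank size that the Delete method loses.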
10aDIF-CAT10aDoubly Adaptive CAT10aMeasurement Precision10asubgroup information1 aWang, Joy1 aWeiss, David, J.1 aWang, Chun uhttps://drive.google.com/open?id=1Gu4FR06qM5EZNp_Ns0Kt3HzBqWAv3LPy01509nas a2200145 4500008004100000245003400041210003300075260005500108520107900163653002101242653001601263653001701279100002201296856004501318 2017 eng d00aGrow a Tiger out of Your CAT 0 aGrow a Tiger out of Your CAT aNiigata, JapanbNiigata Seiryo Universityc08/20173 aThe main focus in the community of test developers and researchers is on improving adaptive test procedures and methodologies. Yet, the transition from research projects to larger-scale operational CATs is facing its own challenges. Usually, these operational CATs find their origin in government tenders. “Scalability”, “Interoperability” and “Transparency” are three keywords often found in these documents. Scalability is concerned with parallel system architectures which are based upon stateless selection algorithms. Design capacities often range from 10,000 to well over 100,000 concurrent students. Interoperability is implemented in standards like QTI, standards that were not designed with adaptive testing in mind. Transparency is being realized by open source software: the adaptive test should not be a black box. These three requirements often complicate the development of an adaptive test, or sometimes even conflict.
10ainteroperability10aScalability10atransparency1 aVerschoor, Angela uhttp://iacat.org/grow-tiger-out-your-cat02579nas a2200157 4500008004100000245011700041210006900158260005500227520193000282653001102212653001902223653002702242100001802269700001902287856011502306 2017 eng d00aA New Cognitive Diagnostic Computerized Adaptive Testing for Simultaneously Diagnosing Skills and Misconceptions0 aNew Cognitive Diagnostic Computerized Adaptive Testing for Simul aNiigata, JapanbNiigata Seiryo Universityc08/20173 aIn educational diagnosis, identifying misconceptions is as important as diagnosing skills. However, traditional cognitive diagnostic computerized adaptive testing (CD-CAT) is usually developed to diagnose skills. This study aims to propose a new CD-CAT that can simultaneously diagnose skills and misconceptions. The proposed CD-CAT is based on a recently published new CDM, called the simultaneously identifying skills and misconceptions (SISM) model (Kuo, Chen, & de la Torre, in press). A new item selection algorithm is also proposed for the CD-CAT to achieve high adaptive testing performance. In simulation studies, we compare our new item selection algorithm with three existing item selection methods, including the Kullback–Leibler (KL) and posterior-weighted KL (PWKL) proposed by Cheng (2009) and the modified PWKL (MPWKL) proposed by Kaplan, de la Torre, and Barrada (2015). The results show that our proposed CD-CAT can efficiently diagnose skills and misconceptions; the accuracy of our new item selection algorithm is close to that of the MPWKL but with less computational burden; and our new item selection algorithm outperforms the KL and PWKL methods on diagnosing skills and misconceptions.
References
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74(4), 619–632. doi:10.1007/s11336-009-9123-2
Kaplan, M., de la Torre, J., & Barrada, J. R. (2015). New item selection methods for cognitive diagnosis computerized adaptive testing. Applied Psychological Measurement, 39(3), 167–188. doi:10.1177/0146621614554650
Kuo, B.-C., Chen, C.-H., & de la Torre, J. (in press). A cognitive diagnosis model for identifying coexisting skills and misconceptions. Applied Psychological Measurement.
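For a single dichotomous item, the KL index of Cheng (2009) cited above can be sketched as the total divergence between the item's response distribution under the current latent-class estimate and under every competing class. This is an illustrative sketch, not the authors' algorithm; the per-class success probabilities and the current estimate are hypothetical.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between two Bernoulli response distributions."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_index(p_current, p_other_classes):
    """KL item-selection index: sum of divergences between the item's
    response distribution under the current latent-class estimate and
    under each competing class. Larger values favor selecting the item."""
    return sum(bernoulli_kl(p_current, q) for q in p_other_classes)

# Hypothetical success probabilities for one item under four latent classes,
# with the current attribute-profile estimate corresponding to the last one.
index = kl_index(0.85, [0.25, 0.40, 0.60])
```

An item whose probabilities differ sharply across classes gets a large index, which is why KL-type criteria discriminate well among attribute profiles.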
10aCD-CAT10aMisconceptions10aSimultaneous diagnosis1 aKuo, Bor-Chen1 aChen, Chun-Hua uhttp://iacat.org/new-cognitive-diagnostic-computerized-adaptive-testing-simultaneously-diagnosing-skills-and-004515nas a2200181 4500008004100000245010800041210006900149260005500218520381200273653000804085653002404093653002604117100001604143700001204159700001604171700002004187856012604207 2017 eng d00aUsing Computerized Adaptive Testing to Detect Students’ Misconceptions: Exploration of Item Selection0 aUsing Computerized Adaptive Testing to Detect Students Misconcep aNiigata, JapanbNiigata Seiryo Universityc08/20173 aHolding misconceptions impedes learning; thus, detecting misconceptions through assessment is crucial for effective teaching. However, most computerized adaptive testing (CAT) applications to diagnose examinees’ attribute profiles focus only on whether or not examinees have mastered correct concepts. In educational settings, teachers and students have to figure out the misconceptions underlying incorrect answers after obtaining the scores from assessments and then correct the corresponding misconceptions. The Scaling Individuals and Classifying Misconceptions (SICM) models proposed by Bradshaw and Templin (2014) fill this gap. SICMs can identify a student’s misconceptions directly from the distractors of multiple-choice questions and report whether or not s/he holds each misconception. Simultaneously, SICM models are able to estimate a continuous ability within the item response theory (IRT) framework to fulfill the needs of policy-driven assessment systems relying on scaling examinees’ ability. However, the advantage of providing estimations for two types of latent variables also increases the complexity of model estimation. More items are required to achieve the same accuracies for both classification and estimation compared to dichotomous DCMs and to IRT, respectively. 
Thus, we aim to develop a CAT using the SICM models (SICM-CAT) to estimate students’ misconceptions and continuous abilities simultaneously using fewer items than a linear test.
To achieve this goal, our research questions mainly focus on establishing several item selection rules that target both accurate classification results and accurate continuous ability estimation in the SICM-CAT. The first research question is which information criterion should be used. The Kullback–Leibler (KL) divergence is the first choice, as it can naturally combine the continuous and discrete latent variables. Based on this criterion, we propose an item selection index that integrates the two types of information. Using this index, the items selected in real time discriminate the examinee’s current misconception profile and ability estimates from other possible estimates to the greatest extent. The second research question is how to adaptively balance estimation of the misconception profile and the continuous latent ability. Mimicking the idea of the Hybrid Design proposed by Wang et al. (2016), we propose a design framework that makes the item selection transition from the group level to the item level. We aim to explore several design questions, such as how to select the transition point and which latent variable estimation should be targeted first.
Preliminary results indicated that the SICM-CAT based on the proposed item selection index could classify examinees into different latent classes and measure their latent abilities more accurately and reliably than the random selection method under all the simulation conditions. As a next step, we plan to compare different CAT designs based on our proposed item selection rules with the best linear test. We expect that the SICM-CAT can achieve the same accuracy and reliability with a shorter test length.
References
Bradshaw, L., & Templin, J. (2014). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika, 79(3), 403-425.
Wang, S., Lin, H., Chang, H. H., & Douglas, J. (2016). Hybrid computerized adaptive testing: from group sequential design to fully sequential design. Journal of Educational Measurement, 53(1), 45-62.
10aCAT10aincorrect answering10aStudent Misconception1 aShen, Yawei1 aBao, Yu1 aWang, Shiyu1 aBradshaw, Laine uhttp://iacat.org/using-computerized-adaptive-testing-detect-students%E2%80%99-misconceptions-exploration-item-selection-000689nas a2200193 4500008004500000022001400045245012100059210006900180300001000249490000600259653003100265653002300296653002200319653003200341653002500373653001800398100001500416856006400431 2014 Engldsh a2165-659200aDetecting Item Preknowledge in Computerized Adaptive Testing Using Information Theory and Combinatorial Optimization0 aDetecting Item Preknowledge in Computerized Adaptive Testing Usi a37-580 v210acombinatorial optimization10ahypothesis testing10aitem preknowledge10aKullback-Leibler divergence10asimulated annealing.10atest security1 aBelov, D I uhttp://www.iacat.org/jcat/index.php/jcat/article/view/36/1800478nas a2200145 4500008004100000245006100041210005900102653000800161653001600169653001300185100001900198700001600217700002200233856007700255 2011 eng d00aA Heuristic Of CAT Item Selection Procedure For Testlets0 aHeuristic Of CAT Item Selection Procedure For Testlets10aCAT10ashadow test10atestlets1 aChien, Yuehmei1 aShin, David1 aWay, Walter Denny uhttp://iacat.org/content/heuristic-cat-item-selection-procedure-testlets01247nas a2200181 4500008004100000245012100041210006900162260001200231520055600243653003300799653000800832653001900840653003500859100001800894700001600912700002200928856011500950 2011 eng d00aItem Selection Methods based on Multiple Objective Approaches for Classification of Respondents into Multiple Levels0 aItem Selection Methods based on Multiple Objective Approaches fo c10/20113 aIs it possible to develop new item selection methods which take advantage of the fact that we want to classify into multiple categories? New methods: Taking multiple points on the ability scale into account; Based on multiple objective approaches.
Conclusions
- Sequential Classification Tests: higher ATL (average test length) than Adaptive Classification Tests
- Sequential Classification Tests: slightly lower PCD (proportion of correct decisions) than Adaptive Classification Tests
- Results also hold with three and four cutting points
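The abstract does not state which sequential procedure was used; a common choice for sequential classification tests is Wald's sequential probability ratio test (SPRT), sketched here for a single cutting point. The error rates and thresholds are illustrative assumptions, not values from the study.

```python
import math

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Wald SPRT decision for one cutting point: classify the examinee
    once the accumulated log-likelihood ratio leaves (log B, log A),
    where A = (1 - beta)/alpha and B = beta/(1 - alpha)."""
    upper = math.log((1 - beta) / alpha)   # classify above the cut
    lower = math.log(beta / (1 - alpha))   # classify below the cut
    if llr >= upper:
        return "above cut"
    if llr <= lower:
        return "below cut"
    return "continue testing"
```

With multiple cutting points, as in the three- and four-cut conditions above, one such test runs per cut and testing stops when all cuts yield a decision.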
10aadaptive classification test10aCAT10aitem selection10asequential classification test1 aGroen, Maaike1 aEggen, Theo1 aVeldkamp, Bernard uhttp://iacat.org/content/item-selection-methods-based-multiple-objective-approaches-classification-respondents01253nas a2200181 4500008004100000245008300041210006900124260001200193520063300205653000800838653002600846653000900872653003500881653001600916653001400932100002300946856010200969 2011 eng d00aMoving beyond Efficiency to Allow CAT to Provide Better Diagnostic Information0 aMoving beyond Efficiency to Allow CAT to Provide Better Diagnost c10/20113 a
Future CATs will provide better diagnostic information to:
–Examinees
–Regulators, Educators, Employers
–Test Developers
This goal will be accomplished by:
–Smart CATs, which collect additional information during the test
–Psychomagic
The time is now for Reporting
10aCAT10adiagnostic information10aMIRT10aMultiple unidimensional scales10apsychomagic10asmart CAT1 aBontempo, Brian, D uhttp://iacat.org/content/moving-beyond-efficiency-allow-cat-provide-better-diagnostic-information00313nas a2200109 4500008004100000245003200041210003100073653000800104653001600112100001800128856005700146 2011 eng d00aSmall-Sample Shadow Testing0 aSmallSample Shadow Testing10aCAT10ashadow test1 aJudd, Wallace uhttp://iacat.org/content/small-sample-shadow-testing03099nas a2200445 4500008004100000020004100041245012000082210006900202250001500271260001000286300001100296490000700307520175400314653003802068653002102106653001002127653000902137653002202146653002802168653003302196653001102229653001102240653000902251653001602260653001802276653001902294653003102313653003102344653001602375100001602391700001002407700001402417700001502431700001402446700001502460700001802475700002402493700001802517856011802535 2010 eng d a0161-8105 (Print)0161-8105 (Linking)00aDevelopment and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments0 aDevelopment and validation of patientreported outcome measures f a2010/06/17 cJun 1 a781-920 v333 aSTUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. 
Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions.10a*Outcome Assessment (Health Care)10a*Self Disclosure10aAdult10aAged10aAged, 80 and over10aCross-Sectional Studies10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aPsychometrics10aQuestionnaires10aReproducibility of Results10aSleep Disorders/*diagnosis10aYoung Adult1 aBuysse, D J1 aYu, L1 aMoul, D E1 aGermain, A1 aStover, A1 aDodds, N E1 aJohnston, K L1 aShablesky-Cade, M A1 aPilkonis, P A uhttp://iacat.org/content/development-and-validation-patient-reported-outcome-measures-sleep-disturbance-and-sleep02877nas a2200493 4500008004100000020004100041245014100082210006900223250001500292260000800307300001100315490000700326520125100333653003001584653001001614653000901624653004601633653003301679653001101712653003101723653001101754653000901765653003301774653001601807653002401823653004601847653005501893653005501948653004602003653001902049653003102068653001402099100001602113700001502129700001302144700001402157700001502171700001702186700001502203700001702218700001502235700001302250856012002263 2009 eng d 
a0090-5550 (Print)0090-5550 (Linking)00aDevelopment of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis0 aDevelopment of an item bank for the assessment of depression in a2009/05/28 cMay a186-970 v543 aOBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD =14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. 
It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings.10aAdaptation, Psychological10aAdult10aAged10aDepressive Disorder/*diagnosis/psychology10aDiagnosis, Computer-Assisted10aFemale10aHeart Diseases/*psychology10aHumans10aMale10aMental Disorders/*psychology10aMiddle Aged10aModels, Statistical10aOtorhinolaryngologic Diseases/*psychology10aPersonality Assessment/statistics & numerical data10aPersonality Inventory/*statistics & numerical data10aPsychometrics/statistics & numerical data10aQuestionnaires10aReproducibility of Results10aSick Role1 aForkmann, T1 aBoecker, M1 aNorra, C1 aEberle, N1 aKircher, T1 aSchauerte, P1 aMischke, K1 aWesthofen, M1 aGauggel, S1 aWirtz, M uhttp://iacat.org/content/development-item-bank-assessment-depression-persons-mental-illnesses-and-physical-diseases02747nas a2200433 4500008004100000020004600041245012800087210006900215250001500284300001200299490000700311520139300318653003401711653001501745653001001760653000901770653002201779653002501801653001101826653001101837653000901848653001601857653001501873653003801888653001901926653003101945653002801976653004802004653002202052100002002074700001202094700001402106700001602120700001402136700001702150700001502167700001502182856011602197 2009 eng d a1878-5921 (Electronic)0895-4356 (Linking)00aAn evaluation of patient-reported outcomes found computerized adaptive testing was efficient in assessing stress perception0 aevaluation of patientreported outcomes found computerized adapti a2008/07/22 a278-2870 v623 aOBJECTIVES: This study aimed to develop and evaluate a first computerized adaptive test (CAT) for the measurement of stress perception (Stress-CAT), in terms of the two dimensions: exposure to stress and stress reaction. STUDY DESIGN AND SETTING: Item response theory modeling was performed using a two-parameter model (Generalized Partial Credit Model). 
The evaluation of the Stress-CAT comprised a simulation study and real clinical application. A total of 1,092 psychosomatic patients (N1) were studied. Two hundred simulees (N2) were generated for a simulated response data set. Then the Stress-CAT was given to n=116 inpatients, (N3) together with established stress questionnaires as validity criteria. RESULTS: The final banks included n=38 stress exposure items and n=31 stress reaction items. In the first simulation study, CAT scores could be estimated with a high measurement precision (SE<0.32; rho>0.90) using 7.0+/-2.3 (M+/-SD) stress reaction items and 11.6+/-1.7 stress exposure items. The second simulation study reanalyzed real patients data (N1) and showed an average use of items of 5.6+/-2.1 for the dimension stress reaction and 10.0+/-4.9 for the dimension stress exposure. Convergent validity showed significantly high correlations. CONCLUSIONS: The Stress-CAT is short and precise, potentially lowering the response burden of patients in clinical decision making.10a*Diagnosis, Computer-Assisted10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFemale10aHumans10aMale10aMiddle Aged10aPerception10aQuality of Health Care/*standards10aQuestionnaires10aReproducibility of Results10aSickness Impact Profile10aStress, Psychological/*diagnosis/psychology10aTreatment Outcome1 aKocalevent, R D1 aRose, M1 aBecker, J1 aWalter, O B1 aFliege, H1 aBjorner, J B1 aKleiber, D1 aKlapp, B F uhttp://iacat.org/content/evaluation-patient-reported-outcomes-found-computerized-adaptive-testing-was-efficient01650nas a2200289 4500008004100000020004100041245011100082210006900193250001500262260000800277300001100285490000700296520053700303653004800840653006200888653005700950653001101007653002701018653002401045653005101069653004701120653003101167653001301198100001301211700001901224856011701243 2009 eng d a0007-1102 (Print)0007-1102 (Linking)00aThe maximum priority index method for severely constrained item selection in 
computerized adaptive testing0 amaximum priority index method for severely constrained item sele a2008/06/07 cMay a369-830 v623 aThis paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing. Our simulation study shows that it is able to accommodate various non-statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision.10aAptitude Tests/*statistics & numerical data10aDiagnosis, Computer-Assisted/*statistics & numerical data10aEducational Measurement/*statistics & numerical data10aHumans10aMathematical Computing10aModels, Statistical10aPersonality Tests/*statistics & numerical data10aPsychometrics/*statistics & numerical data10aReproducibility of Results10aSoftware1 aCheng, Y1 aChang, Hua-Hua uhttp://iacat.org/content/maximum-priority-index-method-severely-constrained-item-selection-computerized-adaptive02593nas a2200337 4500008004100000020004600041245012800087210006900215250001500284300000700299490000600306520149200312653003201804653002301836653002501859653003401884653001101918653001101929653000901940653002601949653003101975653002702006653001102033653001802044100001502062700001202077700001402089700001802103700001202121856012202133 2009 eng d a1477-7525 (Electronic)1477-7525 (Linking)00aReduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation0 aReduction in patient burdens with graphical computerized adaptiv a2009/05/07 a390 v73 aBACKGROUND: The aim of this study was to verify the effectiveness and efficacy of saving time and reducing burden for patients, nurses, and even occupational therapists through computer adaptive testing (CAT). 
METHODS: Based on an item bank of the Barthel Index (BI) and the Frenchay Activities Index (FAI) for assessing comprehensive activities of daily living (ADL) function in stroke patients, we developed a visual basic application (VBA)-Excel CAT module, and (1) investigated whether the averaged test length via CAT is shorter than that of the traditional all-item-answered non-adaptive testing (NAT) approach through simulation, (2) illustrated the CAT multimedia on a tablet PC showing data collection and response errors of ADL clinical functional measures in stroke patients, and (3) demonstrated the quality control of endorsing scale with fit statistics to detect responding errors, which will be further immediately reconfirmed by technicians once patient ends the CAT assessment. RESULTS: The results show that endorsed items could be shorter on CAT (M = 13.42) than on NAT (M = 23) at 41.64% efficiency in test length. However, averaged ability estimations reveal insignificant differences between CAT and NAT. 
CONCLUSION: This study found that mobile nursing services, placed at the bedsides of patients could, through the programmed VBA-Excel CAT module, reduce the burden to patients and save time, more so than the traditional NAT paper-and-pencil testing appraisals.10a*Activities of Daily Living10a*Computer Graphics10a*Computer Simulation10a*Diagnosis, Computer-Assisted10aFemale10aHumans10aMale10aPoint-of-Care Systems10aReproducibility of Results10aStroke/*rehabilitation10aTaiwan10aUnited States1 aChien, T W1 aWu, H M1 aWang, W-C1 aCastillo, R V1 aChou, W uhttp://iacat.org/content/reduction-patient-burdens-graphical-computerized-adaptive-testing-adl-scale-tool-development00713nas a2200229 4500008004100000020004100041245005200082210005100134250001500185260000800200300000800208490000700216653003400223653005000257653001100307653003200318653001300350100001500363700001500378700001800393856007200411 2008 eng d a1075-2730 (Print)1075-2730 (Linking)00aAre we ready for computerized adaptive testing?0 aAre we ready for computerized adaptive testing a2008/04/02 cApr a3690 v5910a*Attitude of Health Personnel10a*Diagnosis, Computer-Assisted/instrumentation10aHumans10aMental Disorders/*diagnosis10aSoftware1 aUnick, G J1 aShumway, M1 aHargreaves, W uhttp://iacat.org/content/are-we-ready-computerized-adaptive-testing03432nas a2200481 4500008004100000020004600041245013800087210006900225250001500294260000800309300001200317490000700329520191400336653002702250653002302277653003102300653001502331653001602346653001002362653002102372653002402393653002302417653003802440653001102478653002202489653001102511653001102522653000902533653003702542653002102579653003102600653002602631653001702657653003202674653001602706653002802722100001602750700001502766700001002781700001502791700002502806856011902831 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aAssessing self-care and social function using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing 
selfcare and social function using a computer adaptive a2008/04/01 cApr a622-6290 v893 aOBJECTIVE: To examine score agreement, validity, precision, and response burden of a prototype computer adaptive testing (CAT) version of the self-care and social function scales of the Pediatric Evaluation of Disability Inventory compared with the full-length version of these scales. DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics; community-based day care, preschool, and children's homes. PARTICIPANTS: Children with disabilities (n=469) and 412 children with no disabilities (analytic sample); 38 children with disabilities and 35 children without disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from prototype CAT applications of each scale using 15-, 10-, and 5-item stopping rules; scores from the full-length self-care and social function scales; time (in seconds) to complete assessments and respondent ratings of burden. RESULTS: Scores from both computer simulations and field administration of the prototype CATs were highly consistent with scores from full-length administration (r range, .94-.99). Using computer simulation of retrospective data, discriminant validity, and sensitivity to change of the CATs closely approximated that of the full-length scales, especially when the 15- and 10-item stopping rules were applied. In the cross-validation study the time to administer both CATs was 4 minutes, compared with over 16 minutes to complete the full-length scales. 
CONCLUSIONS: Self-care and social function score estimates from CAT administration are highly comparable with those obtained from full-length scale administration, with small losses in validity and precision and substantial decreases in administration time.10a*Disability Evaluation10a*Social Adjustment10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild, Preschool10aComputer Simulation10aCross-Over Studies10aDisabled Children/*rehabilitation10aFemale10aFollow-Up Studies10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)10aReference Values10aReproducibility of Results10aRetrospective Studies10aRisk Factors10aSelf Care/*standards/trends10aSex Factors10aSickness Impact Profile1 aCoster, W J1 aHaley, S M1 aNi, P1 aDumas, H M1 aFragala-Pinkham, M A uhttp://iacat.org/content/assessing-self-care-and-social-function-using-computer-adaptive-testing-version-pediatric03037nas a2200481 4500008004100000020004600041245012200087210006900209250001500278260000800293300001200301490000700313520155700320653003201877653003101909653002201940653002001962653001001982653000901992653002202001653002802023653003302051653001102084653001102095653002502106653000902131653001602140653004602156653002202202653002402224653003002248653002902278100001502307700001402322700001502336700002402351700001802375700001102393700001602404700001002420700001502430856011002445 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes0 aComputerized adaptive testing for followup after discharge from a2008/01/30 cFeb a275-2830 v893 aOBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. 
DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patients' home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. 
CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aGandek, B1 aSiebens, H1 aBlack-Schaffer, R M1 aSinclair, S J1 aTao, W1 aCoster, W J1 aNi, P1 aJette, A M uhttp://iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-ii02556nas a2200313 4500008004100000020004100041245011500082210006900197250001500266300001100281490000700292520149300299653002701792653001001819653001401829653005301843653001501896653001101911653003701922653001801959653003101977653002602008653001402034653003202048100001502080700001002095700001502105856012202120 2008 eng d a0963-8288 (Print)0963-8288 (Linking)00aEfficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning0 aEfficiency and sensitivity of multidimensional computerized adap a2008/02/26 a479-840 v303 aPURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. 
METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden.10a*Disability Evaluation10aChild10aComputers10aDisabled Children/*classification/rehabilitation10aEfficiency10aHumans10aOutcome Assessment (Health Care)10aPsychometrics10aReproducibility of Results10aRetrospective Studies10aSelf Care10aSensitivity and Specificity1 aAllen, D D1 aNi, P1 aHaley, S M uhttp://iacat.org/content/efficiency-and-sensitivity-multidimensional-computerized-adaptive-testing-pediatric-physical03424nas a2200385 4500008004100000020004100041245010600082210006900188250001500257260001200272300001000284490000700294520220300301653002702504653001502531653001002546653002102556653002402577653002802601653003802629653001102667653001102678653001102689653003902700653000902739653002402748653003102772653004002803100001802843700001502861700001302876700001702889700001402906856011802920 2008 eng d a0271-6798 (Print)0271-6798 (Linking)00aMeasuring physical functioning in children with spinal impairments with computerized adaptive testing0 aMeasuring physical 
functioning in children with spinal impairmen a2008/03/26 cApr-May a330-50 v283 aBACKGROUND: The purpose of this study was to assess the utility of measuring current physical functioning status of children with scoliosis and kyphosis by applying computerized adaptive testing (CAT) methods. Computerized adaptive testing uses a computer interface to administer the optimal items based on previous responses, reducing the number of items needed to obtain a scoring estimate. METHODS: This was a prospective study of 77 subjects (0.6-19.8 years) who were seen by a spine surgeon during a routine clinic visit for progressive spine deformity. Using a multidimensional version of the Pediatric Evaluation of Disability Inventory CAT program (PEDI-MCAT), we evaluated content range, accuracy and efficiency, known-group validity, concurrent validity with the Pediatric Outcomes Data Collection Instrument, and test-retest reliability in a subsample (n = 16) within a 2-week interval. RESULTS: We found the PEDI-MCAT to have sufficient item coverage in both self-care and mobility content for this sample, although most patients tended to score at the higher ends of both scales. Both the accuracy of PEDI-MCAT scores as compared with a fixed format of the PEDI (r = 0.98 for both mobility and self-care) and test-retest reliability were very high [self-care: intraclass correlation (3,1) = 0.98, mobility: intraclass correlation (3,1) = 0.99]. The PEDI-MCAT took an average of 2.9 minutes for the parents to complete. The PEDI-MCAT detected expected differences between patient groups, and scores on the PEDI-MCAT correlated in expected directions with scores from the Pediatric Outcomes Data Collection Instrument domains. CONCLUSIONS: Use of the PEDI-MCAT to assess the physical functioning status, as perceived by parents of children with complex spinal impairments, seems to be feasible and achieves accurate and efficient estimates of self-care and mobility function. 
Additional item development will be needed at the higher functioning end of the scale to avoid ceiling effects for older children. LEVEL OF EVIDENCE: This is a level II prospective study designed to establish the utility of computer adaptive testing as an evaluation method in a busy pediatric spine practice.10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aComputer Simulation10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aKyphosis/*diagnosis/rehabilitation10aMale10aProspective Studies10aReproducibility of Results10aScoliosis/*diagnosis/rehabilitation1 aMulcahey, M J1 aHaley, S M1 aDuffy, T1 aPengsheng, N1 aBetz, R R uhttp://iacat.org/content/measuring-physical-functioning-children-spinal-impairments-computerized-adaptive-testing02137nas a2200289 4500008004100000020004600041245007200087210006400159250001500223260001100238300000700249490000700256520118400263653002901447653003501476653002601511653002601537653001101563653006101574653001801635653004501653653001301698100001601711700001301727700001801740856008901758 2008 eng d a1553-6467 (Electronic)0002-9459 (Linking)00aThe NAPLEX: evolution, purpose, scope, and educational implications0 aNAPLEX evolution purpose scope and educational implications a2008/05/17 cApr 15 a330 v723 aSince 2004, passing the North American Pharmacist Licensure Examination (NAPLEX) has been a requirement for earning initial pharmacy licensure in all 50 United States. The creation and evolution from 1952-2005 of the particular pharmacy competency testing areas and quantities of questions are described for the former paper-and-pencil National Association of Boards of Pharmacy Licensure Examination (NABPLEX) and the current candidate-specific computer adaptive NAPLEX pharmacy licensure examinations. 
A 40% increase in the weighting of NAPLEX Blueprint Area 2 in May 2005, compared to that in the preceding 1997-2005 Blueprint, has implications for candidates' NAPLEX performance and associated curricular content and instruction. New pharmacy graduates' scores on the NAPLEX are neither intended nor validated to serve as a criterion for assessing or judging the quality or effectiveness of pharmacy curricula and instruction. The newest cycle of NAPLEX Blueprint revision, a continual process to ensure representation of nationwide contemporary practice, began in early 2008. It may take up to 2 years, including surveying several thousand national pharmacists, to complete.10a*Educational Measurement10aEducation, Pharmacy/*standards10aHistory, 20th Century10aHistory, 21st Century10aHumans10aLicensure, Pharmacy/history/*legislation & jurisprudence10aNorth America10aPharmacists/*legislation & jurisprudence10aSoftware1 aNewton, D W1 aBoyle, M1 aCatizone, C A uhttp://iacat.org/content/naplex-evolution-purpose-scope-and-educational-implications01943nas a2200277 4500008004100000020004100041245007300082210006900155250001500224260000800239300001000247490000700257520099700264653001601261653002901277653004801306653006201354653001101416653002401427653004601451653003101497653001301528100001401541700001501555856009501570 2008 eng d a0007-1102 (Print)0007-1102 (Linking)00aPredicting item exposure parameters in computerized adaptive testing0 aPredicting item exposure parameters in computerized adaptive tes a2008/05/17 cMay a75-910 v613 aThe purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. 
Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters.10a*Algorithms10a*Artificial Intelligence10aAptitude Tests/*statistics & numerical data10aDiagnosis, Computer-Assisted/*statistics & numerical data10aHumans10aModels, Statistical10aPsychometrics/statistics & numerical data10aReproducibility of Results10aSoftware1 aChen, S-Y1 aDoong, S H uhttp://iacat.org/content/predicting-item-exposure-parameters-computerized-adaptive-testing01944nas a2200301 4500008004500000020001400045245012900059210006900188300001200257490000700269520093500276653002501211653002101236653002501257653003001282653003001312653001001342653001501352653002601367653002501393653002401418653001501442653001501457100001301472700001701485700002101502856011901523 2007 Engldsh a0146-621600aComputerized adaptive testing for polytomous motivation items: Administration mode effects and a comparison with short forms0 aComputerized adaptive testing for polytomous motivation items Ad a412-4290 v313 aIn a randomized experiment (n=515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. 
CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicate that the real data results are not affected by model misfit. (PsycINFO Database Record (c) 2007 APA ) (journal abstract)10a2220 Tests & Testing10aAdaptive Testing10aAttitude Measurement10acomputer adaptive testing10aComputer Assisted Testing10aitems10aMotivation10apolytomous motivation10aStatistical Validity10aTest Administration10aTest Forms10aTest Items1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://iacat.org/content/computerized-adaptive-testing-polytomous-motivation-items-administration-mode-effects-and01873nas a2200193 4500008004100000020004600041245005200087210005200139260001900191300001000210490000600220520125500226653003801481653003001519653002301549100001901572700001401591856007401605 2007 eng d a1548-1093 (Print); 1548-1107 (Electronic)00aEvaluation of computer adaptive testing systems0 aEvaluation of computer adaptive testing systems bIGI Global: US a70-870 v23 aMany educational organizations are trying to reduce the cost of the exams, the workload and delay of scoring, and the human errors. Also, they try to increase the accuracy and efficiency of the testing. Recently, most examination organizations use computer adaptive testing (CAT) as the method for large scale testing. This article investigates the current state of CAT systems and identifies their strengths and weaknesses. It evaluates 10 CAT systems using an evaluation framework of 15 domains categorized into three dimensions: educational, technical, and economical. 
The results show that the majority of the CAT systems give priority to security, reliability, and maintainability. However, they do not offer to the examinee any advanced support and functionalities. Also, the feedback to the examinee is limited and the presentation of the items is poor. Recommendations are made in order to enhance the overall quality of a CAT system. For example, alternative multimedia items should be available so that the examinee would choose a preferred media type. Feedback could be improved by providing more information to the examinee or providing information anytime the examinee wished. (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputer adaptive testing systems10aexamination organizations10asystems evaluation1 aEconomides, AA1 aRoupas, C uhttp://iacat.org/content/evaluation-computer-adaptive-testing-systems02871nas a2200313 4500008004100000020002200041245010100063210006900164250001500233260000800248300001200256490000700268520179500275653005102070653002002121653003702141653002602178653001902204653001102223653003002234653004602264653003502310653002802345653001302373100002102386700001702407700001502424856011802439 2007 eng d a0315-162X (Print)00aImproving patient reported outcomes using item response theory and computerized adaptive testing0 aImproving patient reported outcomes using item response theory a a2007/06/07 cJun a1426-310 v343 aOBJECTIVE: Patient reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability from the more general to most specific. 
Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow for real-time questionnaires of potentially smaller numbers of questions tailored directly to each individual's level of physical function. RESULTS: Physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring use of present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance"items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. CONCLUSION: By applying statistical techniques of IRT through use of CAT, existing PRO instruments may be improved to reduce questionnaire burden on the individual patients while increasing measurement precision that may ultimately lead to reduced sample size requirements for costly clinical trials.10a*Rheumatic Diseases/physiopathology/psychology10aClinical Trials10aData Interpretation, Statistical10aDisability Evaluation10aHealth Surveys10aHumans10aInternational Cooperation10aOutcome Assessment (Health Care)/*methods10aPatient Participation/*methods10aResearch Design/*trends10aSoftware1 aChakravarty, E F1 aBjorner, J B1 aFries, J F uhttp://iacat.org/content/improving-patient-reported-outcomes-using-item-response-theory-and-computerized-adaptive02408nas a2200361 
4500008004500000020001400045245011100059210006900170300001200239490000700251520135900258653001601617653002001633653001301653653002401666653002501690653001101715653001401726653001301740653001801753653002701771653001001798653001101808100001501819700001201834700001601846700001201862700001601874700001301890700001301903700001401916856011601930 2007 Engldsh a1057-924900aThe initial development of an item bank to assess and screen for psychological distress in cancer patients0 ainitial development of an item bank to assess and screen for psy a724-7320 v163 aPsychological distress is a common problem among cancer patients. Despite the large number of instruments that have been developed to assess distress, their utility remains disappointing. This study aimed to use Rasch models to develop an item-bank which would provide the basis for better means of assessing psychological distress in cancer patients. An item bank was developed from eight psychological distress questionnaires using Rasch analysis to link common items. Items from the questionnaires were added iteratively with common items as anchor points and misfitting items (infit mean square > 1.3) removed, and unidimensionality assessed. A total of 4914 patients completed the questionnaires providing an initial pool of 83 items. Twenty items were removed resulting in a final pool of 63 items. Good fit was demonstrated and no additional factor structure was evident from the residuals. However, there was little overlap between item locations and person measures, since items mainly targeted higher levels of distress. The Rasch analysis allowed items to be pooled and generated a unidimensional instrument for measuring psychological distress in cancer patients. Additional items are required to more accurately assess patients across the whole continuum of psychological distress. 
(PsycINFO Database Record (c) 2007 APA ) (journal abstract)10a3293 Cancer10acancer patients10aDistress10ainitial development10aItem Response Theory10aModels10aNeoplasms10aPatients10aPsychological10apsychological distress10aRasch10aStress1 aSmith, A B1 aRush, R1 aVelikova, G1 aWall, L1 aWright, E P1 aStark, D1 aSelby, P1 aSharpe, M uhttp://iacat.org/content/initial-development-item-bank-assess-and-screen-psychological-distress-cancer-patients03100nas a2200445 4500008004100000020002200041245007100063210006900134250001500203300001200218490000700230520183100237653003802068653001902106653002102125653002002146653001402166653001102180653003002191653001102221653000902232653002502241653004602266653001802312653002602330100001302356700001402369700001702383700001302400700001502413700001502428700001702443700001402460700001802474700002302492700001602515700001602531700001502547856009202562 2007 eng d a0962-9343 (Print)00aIRT health outcomes data analysis project: an overview and summary0 aIRT health outcomes data analysis project an overview and summar a2007/03/14 a121-1320 v163 aBACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. 
SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed.10a*Data Interpretation, Statistical10a*Health Status10a*Quality of Life10a*Questionnaires10a*Software10aFemale10aHIV Infections/psychology10aHumans10aMale10aNeoplasms/psychology10aOutcome Assessment (Health Care)/*methods10aPsychometrics10aStress, Psychological1 aCook, KF1 aTeal, C R1 aBjorner, J B1 aCella, D1 aChang, C-H1 aCrane, P K1 aGibbons, L E1 aHays, R D1 aMcHorney, C A1 aOcepek-Welikson, K1 aRaczek, A E1 aTeresi, J A1 aReeve, B B uhttp://iacat.org/content/irt-health-outcomes-data-analysis-project-overview-and-summary02208nas a2200229 4500008004100000020004600041245008500087210006900172260004500241300001000286490000600296520140300302653003401705653002301739653002601762653001701788653002601805100001701831700001201848700001501860856010301875 2007 eng d a1614-1881 (Print); 1614-2241 (Electronic)00aMethods for restricting maximum exposure rate in computerized adaptative testing0 aMethods for restricting maximum exposure rate in computerized ad bHogrefe & Huber Publishers GmbH: Germany a14-230 v33 
aThe Sympson-Hetter (1985) method provides a means of controlling maximum exposure rate of items in Computerized Adaptive Testing. Through a series of simulations, control parameters are set that mark the probability of administration of an item on being selected. This method presents two main problems: it requires a long computation time for calculating the parameters and the maximum exposure rate is slightly above the fixed limit. Van der Linden (2003) presented two alternatives which appear to solve both of the problems. The impact of these methods on the measurement accuracy has not been tested yet. We show how these methods over-restrict the exposure of some highly discriminating items and, thus, the accuracy is decreased. It is also shown that, when the desired maximum exposure rate is near the minimum possible value, these methods offer an empirical maximum exposure rate clearly above the goal. A new method, based on the initial estimation of the probability of administration and the probability of selection of the items with the restricted method (Revuelta & Ponsoda, 1998), is presented in this paper. It can be used with the Sympson-Hetter method and with the two van der Linden methods. This option, when used with Sympson-Hetter, speeds the convergence of the control parameters without decreasing the accuracy. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aitem bank security10aitem exposure control10aoverlap rate10aSympson-Hetter method1 aBarrada, J R1 aOlea, J1 aPonsoda, V uhttp://iacat.org/content/methods-restricting-maximum-exposure-rate-computerized-adaptative-testing02408nas a2200289 4500008004100000020002200041245010800063210006900171260004500240300001000285490000700295520141000302653003201712653002501744653002501769653002501794653002601819653001801845653003701863653002101900653001301921100001501934700001401949700001901963700002001982856011602002 2007 eng d a1015-5759 (Print)00aPsychometric properties of an emotional adjustment measure: An application of the graded response model0 aPsychometric properties of an emotional adjustment measure An ap bHogrefe & Huber Publishers GmbH: Germany a39-460 v233 aItem response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of the IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure that consists of a 28-six graded response items is shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) The invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). 
The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive tests10aEmotional Adjustment10aItem Response Theory10aPersonality Measures10apersonnel recruitment10aPsychometrics10aSamejima's graded response model10atest reliability10avalidity1 aRubio, V J1 aAguado, D1 aHontangas, P M1 aHernández, J M uhttp://iacat.org/content/psychometric-properties-emotional-adjustment-measure-application-graded-response-model01959nas a2200265 4500008004100000020002200041245008200063210006900145260002600214300001000240490000700250520112900257653001501386653003401401653001401435653001701449653002401466653001501490653002201505653001501527100002301542700001301565700001801578856009701596 2006 eng d a1076-9986 (Print)00aAssembling a computerized adaptive testing item pool as a set of linear tests0 aAssembling a computerized adaptive testing item pool as a set of bSage Publications: US a81-990 v313 aTest-item writing efforts typically results in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. 
It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires them to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. 10aAlgorithms10acomputerized adaptive testing10aitem pool10alinear tests10amathematical models10astatistics10aTest Construction10aTest Items1 avan der Linden, WJ1 aAriel, A1 aVeldkamp, B P uhttp://iacat.org/content/assembling-computerized-adaptive-testing-item-pool-set-linear-tests02648nas a2200397 4500008004100000020002200041245013500063210006900198250001500267260000800282300001200290490000700302520140700309653002601716653003101742653001501773653001001788653000901798653002201807653002501829653003301854653001101887653001101898653000901909653001601918653004601934653003001980653003102010653001302041100001502054700001002069700001802079700001602097700001502113856012202128 2006 eng d a0895-4356 (Print)00aComputer adaptive testing improved accuracy and precision of scores over random item selection in a physical functioning item bank0 aComputer adaptive testing improved accuracy and precision of sco a2006/10/10 cNov a1174-820 v593 aBACKGROUND AND OBJECTIVE: Measuring physical functioning (PF) within and across postacute settings is critical for monitoring outcomes of rehabilitation; however, most current instruments lack sufficient breadth and feasibility for widespread use. Computer adaptive testing (CAT), in which item selection is tailored to the individual patient, holds promise for reducing response burden, yet maintaining measurement precision. 
We calibrated a PF item bank via item response theory (IRT), administered items with a post hoc CAT design, and determined whether CAT would improve accuracy and precision of score estimates over random item selection. METHODS: 1,041 adults were interviewed during postacute care rehabilitation episodes in either hospital or community settings. Responses for 124 PF items were calibrated using IRT methods to create a PF item bank. We examined the accuracy and precision of CAT-based scores compared to a random selection of items. RESULTS: CAT-based scores had higher correlations with the IRT-criterion scores, especially with short tests, and resulted in narrower confidence intervals than scores based on a random selection of items; gains, as expected, were especially large for low and high performing adults. CONCLUSION: The CAT design may have important precision and efficiency advantages for point-of-care functional assessment in rehabilitation practice settings.10a*Recovery of Function10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aConfidence Intervals10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aRehabilitation/*standards10aReproducibility of Results10aSoftware1 aHaley, S M1 aNi, P1 aHambleton, RK1 aSlavin, M D1 aJette, A M uhttp://iacat.org/content/computer-adaptive-testing-improved-accuracy-and-precision-scores-over-random-item-selectio-003325nas a2200469 4500008004100000020002200041245011600063210006900179250001500248260000800263300001200271490000700283520189400290653003202184653003102216653002202247653002002269653001002289653000902299653002202308653002802330653003302358653001102391653001102402653002502413653000902438653001602447653004602463653002202509653002402531653003002555653002902585100001502614700001502629700001602644700001102660700002402671700001402695700001802709700001002727856011802737 2006 eng d a0003-9993 (Print)00aComputerized adaptive testing for 
follow-up after discharge from inpatient rehabilitation: I. Activity outcomes0 aComputerized adaptive testing for followup after discharge from a2006/08/01 cAug a1033-420 v873 aOBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. 
CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aSiebens, H1 aCoster, W J1 aTao, W1 aBlack-Schaffer, R M1 aGandek, B1 aSinclair, S J1 aNi, P uhttp://iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-i-activity01580nas a2200205 4500008004100000020002200041245005000063210005000113260002600163300001200189490000700201520094200208653003401150653002801184653001901212653002001231653003301251100002301284856006701307 2006 eng d a0146-6216 (Print)00aEquating scores from adaptive to linear tests0 aEquating scores from adaptive to linear tests bSage Publications: US a493-5080 v303 aTwo local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method. Both methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. 
It is argued that the poorer performance of these two methods is a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise among the individual score distributions. 10acomputerized adaptive testing10aequipercentile equating10alocal equating10ascore reporting10atest characteristic function1 avan der Linden, WJ uhttp://iacat.org/content/equating-scores-adaptive-linear-tests02563nas a2200349 4500008004100000020002200041245016600063210006900229250001500298260000800313300001100321490000700332520142900339653002701768653001601795653001501811653001001826653002101836653001401857653005201871653001501923653001101938653001101949653003701960653001801997653001402015100001502029700001002044700001602054700002502070856011802095 2006 eng d a0003-9993 (Print)00aMeasurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory0 aMeasurement precision and efficiency of multidimensional compute a2006/08/29 cSep a1223-90 v873 aOBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application with a unidimensional CAT (U-CAT) application, using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. 
RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired.10a*Disability Evaluation10a*Pediatrics10aAdolescent10aChild10aChild, Preschool10aComputers10aDisabled Persons/*classification/rehabilitation10aEfficiency10aHumans10aInfant10aOutcome Assessment (Health Care)10aPsychometrics10aSelf Care1 aHaley, S M1 aNi, P1 aLudlow, L H1 aFragala-Pinkham, M A uhttp://iacat.org/content/measurement-precision-and-efficiency-multidimensional-computer-adaptive-testing-physical02512nas a2200265 4500008004100000020004100041245013200082210006900214250001500283260000800298300001100306490000700317520154000324653003101864653003701895653003301932653002401965653001101989653002402000653002702024653003302051653003002084100001602114856011602130 2006 eng d a0025-7079 (Print)0025-7079 (Linking)00aOverview of quantitative measurement methods. Equivalence, invariance, and differential item functioning in health applications0 aOverview of quantitative measurement methods Equivalence invaria a2006/10/25 cNov aS39-490 v443 aBACKGROUND: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. 
Discussed are parameters and statistics that are not invariant and cannot be compared validly in cross-cultural studies with varying distributions of disability, in contrast to those that can be compared (if the model assumptions are met) because they are produced by models such as linear and nonlinear regression. OBJECTIVES: The purpose of this overview is to provide an integrated approach to the quantitative methods used in this special issue to examine measurement equivalence. The methods include classical test theory (CTT), factor analytic, and parametric and nonparametric approaches to DIF detection. Also included in the quantitative section is a discussion of item banking and computerized adaptive testing (CAT). METHODS: Factorial invariance and the articles discussing this topic are introduced. A brief overview of the DIF methods presented in the quantitative section of the special issue is provided together with a discussion of ways in which DIF analyses and examination of invariance using factor models may be complementary. 
CONCLUSIONS: Although factor analytic and DIF detection methods share features, they provide unique information and can be viewed as complementary in informing about measurement equivalence.10a*Cross-Cultural Comparison10aData Interpretation, Statistical10aFactor Analysis, Statistical10aGuidelines as Topic10aHumans10aModels, Statistical10aPsychometrics/*methods10aStatistics as Topic/*methods10aStatistics, Nonparametric1 aTeresi, J A uhttp://iacat.org/content/overview-quantitative-measurement-methods-equivalence-invariance-and-differential-item02649nas a2200409 4500008004100000245013400041210006900175300001000244490000700254520123100261653002501492653003201517653003101549653001001580653000901590653002201599653003301621653001101654653001101665653000901676653001601685653002401701653003101725653004101756653004501797653006801842653006101910653003001971653002802001653002202029100001402051700001302065700001802078700001402096700001502110856011402125 2006 eng d00aSimulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function0 aSimulated computerized adaptive test for patients with shoulder a290-80 v593 aBACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. 
The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.10a*Computer Simulation10a*Range of Motion, Articular10aActivities of Daily Living10aAdult10aAged10aAged, 80 and over10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aProspective Studies10aReproducibility of Results10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. Gov't, Non-P.H.S.10aShoulder Dislocation/*physiopathology/psychology/rehabilitation10aShoulder Pain/*physiopathology/psychology/rehabilitation10aShoulder/*physiopathology10aSickness Impact Profile10aTreatment Outcome1 aHart, D L1 aCook, KF1 aMioduski, J E1 aTeal, C R1 aCrane, P K uhttp://iacat.org/content/simulated-computerized-adaptive-test-patients-shoulder-impairments-was-efficient-and03119nas a2200385 4500008004100000020002200041245012900063210006900192250001500261260000800276300001000284490000700294520188400301653002502185653002702210653001502237653001002252653002102262653002802283653003802311653001102349653001102360653001102371653000902382653004602391653002702437653003002464653003202494100001502526700001602541700001602557700001502573700002502588856012002613 2005 eng d a0003-9993 (Print)00aAssessing mobility in children using a computer adaptive testing version of the pediatric evaluation of disability inventory0 aAssessing mobility in children using a computer adaptive testing a2005/05/17 cMay a932-90 v863 aOBJECTIVE: To assess score agreement, validity, precision, and response burden of a prototype computerized adaptive testing (CAT) version of the Mobility Functional Skills Scale (Mob-CAT) of the Pediatric 
Evaluation of Disability Inventory (PEDI) as compared with the full 59-item version (Mob-59). DESIGN: Computer simulation analysis of cross-sectional and longitudinal retrospective data; and cross-sectional prospective study. SETTING: Pediatric rehabilitation hospital, including inpatient acute rehabilitation, day school program, outpatient clinics, community-based day care, preschool, and children's homes. PARTICIPANTS: Four hundred sixty-nine children with disabilities and 412 children with no disabilities (analytic sample); 41 children without disabilities and 39 with disabilities (cross-validation sample). INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from a prototype Mob-CAT application and versions using 15-, 10-, and 5-item stopping rules; scores from the Mob-59; and number of items and time (in seconds) to administer assessments. RESULTS: Mob-CAT scores from both computer simulations (intraclass correlation coefficient [ICC] range, .94-.99) and field administrations (ICC=.98) were in high agreement with scores from the Mob-59. Using computer simulations of retrospective data, discriminant validity, and sensitivity to change of the Mob-CAT closely approximated that of the Mob-59, especially when using the 15- and 10-item stopping rule versions of the Mob-CAT. The Mob-CAT used no more than 15% of the items for any single administration, and required 20% of the time needed to administer the Mob-59. 
CONCLUSIONS: Comparable score estimates for the PEDI mobility scale can be obtained from CAT administrations, with losses in validity and precision for shorter forms, but with a considerable reduction in administration time.10a*Computer Simulation10a*Disability Evaluation10aAdolescent10aChild10aChild, Preschool10aCross-Sectional Studies10aDisabled Children/*rehabilitation10aFemale10aHumans10aInfant10aMale10aOutcome Assessment (Health Care)/*methods10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity1 aHaley, S M1 aRaczek, A E1 aCoster, W J1 aDumas, H M1 aFragala-Pinkham, M A uhttp://iacat.org/content/assessing-mobility-children-using-computer-adaptive-testing-version-pediatric-evaluation-002148nas a2200229 4500008004100000020002200041245008700063210006900150260004100219300001200260490000700272520135500279653002101634653001501655653002401670653002601694653002501720653002101745653003101766100002301797856009801820 2005 eng d a0022-0655 (Print)00aA comparison of item-selection methods for adaptive tests with content constraints0 acomparison of itemselection methods for adaptive tests with cont bBlackwell Publishing: United Kingdom a283-3020 v423 aIn test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test—a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. 
Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT), which showed equally good item-exposure rates but violations of some of the constraints and larger bias and inaccuracy of the ability estimator for the WDM.10aAdaptive Testing10aAlgorithms10acontent constraints10aitem selection method10ashadow test approach10aspiraling method10aweighted deviations method1 avan der Linden, WJ uhttp://iacat.org/content/comparison-item-selection-methods-adaptive-tests-content-constraints02787nas a2200469 4500008004100000020002200041245010400063210006900167250001500236260000800251300001200259490000700271520132800278653002201606653003101628653001501659653001601674653001001690653003401700653002101734653002401755653002501779653001501804653001101819653005301830653002901883653001101912653001101923653002001934653000901954653003101963653004601994653003102040653001402071653003202085100001502117700001002132700002502142700001702167700001302184856012002197 2005 eng d a0012-1622 (Print)00aA computer adaptive testing approach for assessing physical functioning in children and adolescents0 acomputer adaptive testing approach for assessing physical functi a2005/02/15 cFeb a113-1200 v473 aThe purpose of this article is to demonstrate: (1) the accuracy and (2) the reduction in amount of time and effort in assessing physical functioning (self-care and mobility domains) of children and adolescents using computer-adaptive testing (CAT). A CAT algorithm selects questions directly tailored to the child's ability level, based on previous responses. 
Using a CAT algorithm, a simulation study was used to determine the number of items necessary to approximate the score of a full-length assessment. We built simulated CAT (5-, 10-, 15-, and 20-item versions) for self-care and mobility domains and tested their accuracy in a normative sample (n=373; 190 males, 183 females; mean age 6y 11mo [SD 4y 2m], range 4mo to 14y 11mo) and a sample of children and adolescents with Pompe disease (n=26; 21 males, 5 females; mean age 6y 1mo [SD 3y 10mo], range 5mo to 14y 10mo). Results indicated that comparable score estimates (based on computer simulations) to the full-length tests can be achieved in a 20-item CAT version for all age ranges and for normative and clinical samples. No more than 13 to 16% of the items in the full-length tests were needed for any one administration. These results support further consideration of using CAT programs for accurate and efficient clinical assessments of physical functioning.10a*Computer Systems10aActivities of Daily Living10aAdolescent10aAge Factors10aChild10aChild Development/*physiology10aChild, Preschool10aComputer Simulation10aConfidence Intervals10aDemography10aFemale10aGlycogen Storage Disease Type II/physiopathology10aHealth Status Indicators10aHumans10aInfant10aInfant, Newborn10aMale10aMotor Activity/*physiology10aOutcome Assessment (Health Care)/*methods10aReproducibility of Results10aSelf Care10aSensitivity and Specificity1 aHaley, S M1 aNi, P1 aFragala-Pinkham, M A1 aSkrinar, A M1 aCorzo, D uhttp://iacat.org/content/computer-adaptive-testing-approach-assessing-physical-functioning-children-and-adolescents02119nas a2200253 4500008004100000245007900041210006900120300001200189490000700201520113300208653002701341653004601368653005201414653002901466653001101495653005601506653002501562653004101587653004501628653006201673100001501735700001501750856010001765 2005 eng d00aContemporary measurement techniques for rehabilitation outcomes assessment0 aContemporary measurement techniques for 
rehabilitation outcomes a339-3450 v373 aIn this article, we review the limitations of traditional rehabilitation functional outcome instruments currently in use within the rehabilitation field to assess Activity and Participation domains as defined by the International Classification of Function, Disability, and Health. These include a narrow scope of functional outcomes, data incompatibility across instruments, and the precision vs feasibility dilemma. Following this, we illustrate how contemporary measurement techniques, such as item response theory methods combined with computer adaptive testing methodology, can be applied in rehabilitation to design functional outcome instruments that are comprehensive in scope, accurate, allow for compatibility across instruments, and are sensitive to clinically important change without sacrificing their feasibility. Finally, we present some of the pressing challenges that need to be overcome to provide effective dissemination and training assistance to ensure that current and future generations of rehabilitation professionals are familiar with and skilled in the application of contemporary outcomes measurement.10a*Disability Evaluation10aActivities of Daily Living/classification10aDisabled Persons/classification/*rehabilitation10aHealth Status Indicators10aHumans10aOutcome Assessment (Health Care)/*methods/standards10aRecovery of Function10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. 
Gov't, Non-P.H.S.10aSensitivity and Specificity computerized adaptive testing1 aJette, A M1 aHaley, S M uhttp://iacat.org/content/contemporary-measurement-techniques-rehabilitation-outcomes-assessment02210nas a2200373 4500008004100000245011500041210006900156300001100225490000700236520104400243653002101287653002001308653001001328653000901338653002801347653001101375653003801386653000901424653001601433653004101449653001801490653003701508653003001545100001401575700001301589700001301602700001501615700001701630700001501647700001901662700001601681700001801697856012101715 2005 eng d00aData pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer0 aData pooling and analysis to build a preliminary item bank an ex a142-590 v283 aAssessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. 
Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress.10a*Quality of Life10a*Questionnaires10aAdult10aAged10aData Collection/methods10aHumans10aIntestine, Large/*physiopathology10aMale10aMiddle Aged10aProstatic Neoplasms/*physiopathology10aPsychometrics10aResearch Support, Non-U.S. Gov't10aStatistics, Nonparametric1 aEton, D T1 aLai, J S1 aCella, D1 aReeve, B B1 aTalcott, J A1 aClark, J A1 aMcPherson, C P1 aLitwin, M S1 aMoinpour, C M uhttp://iacat.org/content/data-pooling-and-analysis-build-preliminary-item-bank-example-using-bowel-function-prostate01996nas a2200205 4500008004100000020004600041245007900087210006900166260004100235300001400276490000700290520127200297653003001569653002501599653003401624100001301658700001801671700001601689856008501705 2005 eng d a0017-9124 (Print); 1475-6773 (Electronic)00aDynamic assessment of health outcomes: Time to let the CAT out of the bag?0 aDynamic assessment of health outcomes Time to let the CAT out of bBlackwell Publishing: United Kingdom a1694-17110 v403 aBackground: The use of item response theory (IRT) to measure self-reported outcomes has burgeoned in recent years. Perhaps the most important application of IRT is computer-adaptive testing (CAT), a measurement approach in which the selection of items is tailored for each respondent. Objective. To provide an introduction to the use of CAT in the measurement of health outcomes, describe several IRT models that can be used as the basis of CAT, and discuss practical issues associated with the use of adaptive scaling in research settings. Principal Points: The development of a CAT requires several steps that are not required in the development of a traditional measure including identification of "starting" and "stopping" rules. CAT's most attractive advantage is its efficiency. Greater measurement precision can be achieved with fewer items. 
Disadvantages of CAT include the high cost and level of technical expertise required to develop a CAT. Conclusions: Researchers, clinicians, and patients benefit from the availability of psychometrically rigorous measures that are not burdensome. CAT outcome measures hold substantial promise in this regard, but their development is not without challenges.10acomputer adaptive testing10aItem Response Theory10aself reported health outcomes1 aCook, KF1 aO'Malley, K J1 aRoddey, T S uhttp://iacat.org/content/dynamic-assessment-health-outcomes-time-let-cat-out-bag02237nas a2200217 4500008004100000020002200041245014200063210006900205260004100274300001200315490000700327520142600334653001401760653003401774653002301808653001601831653002701847100001201874700001701886856011601903 2005 eng d a0022-0655 (Print)00aIncreasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests0 aIncreasing the homogeneity of CATs itemexposure rates by minimiz bBlackwell Publishing: United Kingdom a245-2690 v423 aA computerized adaptive testing (CAT) algorithm that has the potential to increase the homogeneity of CAT's item-exposure rates without significantly sacrificing the precision of ability estimates was proposed and assessed in the shadow-test (van der Linden & Reese, 1998) CAT context. This CAT algorithm was formed by a combination of maximizing or minimizing varied target functions while assembling shadow tests. There were four target functions to be separately used in the first, second, third, and fourth quarter test of CAT. 
The elements to be used in the four functions were associated with (a) a random number assigned to each item, (b) the absolute difference between an examinee's current ability estimate and an item difficulty, (c) the absolute difference between an examinee's current ability estimate and an optimum item difficulty, and (d) item information. The results indicated that this combined CAT fully utilized all the items in the pool, reduced the maximum exposure rates, and achieved more homogeneous exposure rates. Moreover, its precision in recovering ability estimates was similar to that of the maximum item-information method. The combined CAT method resulted in the best overall results compared with the other individual CAT item-selection methods. The findings from the combined CAT are encouraging. Future uses are discussed.10aalgorithm10acomputerized adaptive testing10aitem exposure rate10ashadow test10avaried target function1 aLi, Y H1 aSchafer, W D uhttp://iacat.org/content/increasing-homogeneity-cats-item-exposure-rates-minimizing-or-maximizing-varied-target02296nas a2200205 4500008004100000245009000041210007100131300000900202490000700211520161900218653002001837653001601857653001801873653002201891653001601913653001501929653001801944100001401962856011401976 2005 eng d00aLa Validez desde una óptica psicométrica [Validity from a psychometric perspective]0 aLa Validez desde una óptica psicométrica Validity from a psychom a9-200 v133 aEl estudio de la validez constituye el eje central de los análisis psicométricos de los instrumentos de medida. En esta comunicación se traza una breve nota histórica de los distintos modos de concebir la validez a lo largo de los tiempos, se comentan las líneas actuales, y se tratan de vislumbrar posibles vías futuras, teniendo en cuenta el impacto que las nuevas tecnologías informáticas están ejerciendo sobre los propios instrumentos de medida en Psicología y Educación. 
Cuestiones como los nuevos formatos multimedia de los ítems, la evaluación a distancia, el uso intercultural de las pruebas, las consecuencias de su uso, o los tests adaptativos informatizados, reclaman nuevas formas de evaluar y conceptualizar la validez. También se analizan críticamente algunos planteamientos recientes sobre el concepto de validez. The study of validity constitutes a central axis of psychometric analyses of measurement instruments. This paper presents a historical sketch of different modes of conceiving validity, with commentary on current views, and it attempts to predict future lines of research by considering the impact of new computerized technologies on measurement instruments in psychology and education. Factors such as the new multimedia format of items, distance assessment, the intercultural use of tests, the consequences of the latter, or the development of computerized adaptive tests demand new ways of conceiving and evaluating validity. Some recent thoughts about the concept of validity are also critically analyzed. 
10aFactor Analysis10aMeasurement10aPsychometrics10aScaling (Testing)10aStatistical10aTechnology10aTest Validity1 aMuñiz, J uhttp://iacat.org/content/la-validez-desde-una-%C3%B3ptica-psicom%C3%A9trica-validity-psychometric-perspective02898nas a2200409 4500008004100000245012300041210006900164260000800233300001000241490000700251520159700258653004701855653001001902653000901912653001901921653003101940653002601971653001101997653002902008653001102037653000902048653001602057653003902073653001402112653002502126653002702151653003002178653003202208653002802240653002202268100001502290700001602305700001702321700001602338700001502354856011902369 2005 eng d00aMeasuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach0 aMeasuring physical function in patients with complex medical and cOct a741-80 v843 aOBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients), the physical functioning portion of the SF-36 (outpatients), and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). 
Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. CAT achieves valid patient scores using fewer questions.10aActivities of Daily Living/*classification10aAdult10aAged10aCohort Studies10aContinuity of Patient Care10aDisability Evaluation10aFemale10aHealth Services Research10aHumans10aMale10aMiddle Aged10aPostoperative Care/*rehabilitation10aPrognosis10aRecovery of Function10aRehabilitation Centers10aRehabilitation/*standards10aSensitivity and Specificity10aSickness Impact Profile10aTreatment Outcome1 aSiebens, H1 aAndres, P L1 aPengsheng, N1 aCoster, W J1 aHaley, S M uhttp://iacat.org/content/measuring-physical-function-patients-complex-medical-and-postsurgical-conditions-computer10197nas a2200553 4500008004100000245016300041210006900204300001400273490000700287520862500294653001808919653003208937100001608969700001608985700001409001700001609015700001409031700001509045700001609060700001409076700001509090700001509105700001409120700001709134700001709151700001709168700001609185700001709201700001609218700001509234700001609249700001709265700002309282700001609305700002209321700001609343700001609359700001709375700001309392700001509405700001309420700001609433700001209449700001409461700001609475700002009491700001209511856012009523 2005 eng d00aToward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire0 aToward efficient and comprehensive measurement of the alcohol pr a1180-11890 v293 aBackground: Although a number 
of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias., Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample., Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. 
To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. An important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach. The authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. 
A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression. A didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed. Recent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. 
These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems. Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance. This study examined the psychometric characteristics of an index of substance use involvement using item response theory. The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. 
Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption. Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. 
Item and time savings were substantial.10aPsychometrics10aSubstance-Related Disorders1 aKahler, C W1 aStrong, D R1 aRead, J P1 aDe Boeck, P1 aWilson, M1 aActon, G S1 aPalfai, T P1 aWood, M D1 aMehta, P D1 aNeale, M C1 aFlay, B R1 aConklin, C A1 aClayton, R R1 aTiffany, S T1 aShiffman, S1 aKrueger, R F1 aNichol, P E1 aHicks, B M1 aMarkon, K E1 aPatrick, C J1 aIacono, William, G1 aMcGue, Matt1 aLangenbucher, J W1 aLabouvie, E1 aMartin, C S1 aSanjuan, P M1 aBavly, L1 aKirisci, L1 aChung, T1 aVanyukov, M1 aDunn, M1 aTarter, R1 aHandel, R W1 aBen-Porath, Y S1 aWatt, M uhttp://iacat.org/content/toward-efficient-and-comprehensive-measurement-alcohol-problems-continuum-college-students03703nas a2200481 4500008004100000245005200041210005200093300001200145490000700157520221100164653001902375653002902394653005802423653001002481653005302491653000902544653001102553653002502564653002602589653003302615653001102648653001002659653000902669653001602678653002402694653007402718653001802792653002902810653005802839653003102897653003202928653003602960653003202996100001503028700001603043700001603059700001603075700001003091700001403101700001803115700001503133856007303148 2004 eng d00aActivity outcome measurement for postacute care0 aActivity outcome measurement for postacute care aI49-1610 v423 aBACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. 
METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. 
CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAftercare/*standards/statistics & numerical data10aAged10aBoston10aCognition/physiology10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHuman10aMale10aMiddle Aged10aMovement/physiology10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aPsychometrics10aQuestionnaires/standards10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aSupport, U.S. Gov't, P.H.S.1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aLudlow, L H1 aNi, P1 aBond, T L1 aSinclair, S J1 aJette, A M uhttp://iacat.org/content/activity-outcome-measurement-postacute-care02772nas a2200421 4500008004100000020004600041245011200087210006900199250001500268260001000283300000700293490000600300520144100306653002701747653003001774653004701804653001001851653000901861653002201870653002801892653001101920653001101931653002001942653000901962653001601971653001601987653001902003653001602022653003502038653002902073653004002102653003002142100001402172700001702186700001702203700001402220856011602234 2004 eng d a1477-7525 (Electronic)1477-7525 (Linking)00aThe AMC Linear Disability Score project in a population requiring residential care: psychometric properties0 aAMC Linear Disability Score project in a population requiring re a2004/08/05 cAug 3 a420 v23 aBACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes, including functional status. 
However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The remaining 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status.10a*Disability Evaluation10a*Health Status Indicators10aActivities of Daily Living/*classification10aAdult10aAged10aAged, 80 and over10aData Collection/methods10aFemale10aHumans10aLogistic Models10aMale10aMiddle Aged10aNetherlands10aPilot Projects10aProbability10aPsychometrics/*instrumentation10aQuestionnaires/standards10aResidential Facilities/*utilization10aSeverity of Illness Index1 aHolman, R1 aLindeboom, R1 aVermeulen, M1 aHaan, R J uhttp://iacat.org/content/amc-linear-disability-score-project-population-requiring-residential-care-psychometric01797nas a2200361 
4500008004100000020002200041245009500063210006900158250001500227260001100242300001000253490000700263520066300270653002500933653002900958653001000987653000900997653002201006653004501028653003701073653001101110653001101121653000901132653001601141653003601157653003001193653003401223100001601257700002401273700001001297700001501307856011301322 2004 eng d a1074-9357 (Print)00aComputer adaptive testing: a strategy for monitoring stroke rehabilitation across settings0 aComputer adaptive testing a strategy for monitoring stroke rehab a2004/05/01 cSpring a33-390 v113 aCurrent functional assessment instruments in stroke rehabilitation are often setting-specific and lack precision, breadth, and/or feasibility. Computer adaptive testing (CAT) offers a promising potential solution by providing a quick, yet precise, measure of function that can be used across a broad range of patient abilities and in multiple settings. CAT technology yields a precise score by selecting very few relevant items from a large and diverse item pool based on each individual's responses. 
We demonstrate the potential usefulness of a CAT assessment model with a cross-sectional sample of persons with stroke from multiple rehabilitation settings.10a*Computer Simulation10a*User-Computer Interface10aAdult10aAged10aAged, 80 and over10aCerebrovascular Accident/*rehabilitation10aDisabled Persons/*classification10aFemale10aHumans10aMale10aMiddle Aged10aMonitoring, Physiologic/methods10aSeverity of Illness Index10aTask Performance and Analysis1 aAndres, P L1 aBlack-Schaffer, R M1 aNi, P1 aHaley, S M uhttp://iacat.org/content/computer-adaptive-testing-strategy-monitoring-stroke-rehabilitation-across-settings02589nas a2200469 4500008004100000245007200041210006900113300001000182490000600192520108600198653002501284653001001309653001501319653002101334653002201355653005901377653007001436653003301506653001101539653001101550653001301561653000901574653002701583653002201610653005501632653001901687653001501706653006601721653001801787653003701805653004101842653003001883653001301913100001501926700001301941700001801954700001501972700001401987700001402001700001302015856009102028 2004 eng d00aComputerized adaptive measurement of depression: A simulation study0 aComputerized adaptive measurement of depression A simulation stu a13-230 v43 aBackground: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items). 
The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity.10a*Computer Simulation10aAdult10aAlgorithms10aArea Under Curve10aComparative Study10aDepressive Disorder/*diagnosis/epidemiology/psychology10aDiagnosis, Computer-Assisted/*methods/statistics & numerical data10aFactor Analysis, Statistical10aFemale10aHumans10aInternet10aMale10aMass Screening/methods10aPatient Selection10aPersonality Inventory/*statistics & numerical data10aPilot Projects10aPrevalence10aPsychiatric Status Rating Scales/*statistics & numerical data10aPsychometrics10aResearch Support, Non-U.S. Gov't10aResearch Support, U.S. Gov't, P.H.S.10aSeverity of Illness Index10aSoftware1 aGardner, W1 aShear, K1 aKelleher, K J1 aPajer, K A1 aMammen, O1 aBuysse, D1 aFrank, E uhttp://iacat.org/content/computerized-adaptive-measurement-depression-simulation-study01608nas a2200217 4500008004100000020002200041245008200063210006900145260004300214300001200257490000700269520084600276653003401122653002601156653003501182653001601217653001701233100002301250700001801273856009901291 2004 eng d a1076-9986 (Print)00aConstraining item exposure in computerized adaptive testing with shadow tests0 aConstraining item exposure in computerized adaptive testing with bAmerican Educational Research Assn: US a273-2910 v293 aItem-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles Sympson and Hetter’s (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. 
The method does not, however, require time-consuming simulation studies to set values for control parameters before the operational use of the test. Instead, it can set the probabilities of item ineligibility adaptively during the test using the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item-exposure rates and had negligible impact on the bias and mean-squared error functions of the ability estimator. 10acomputerized adaptive testing10aitem exposure control10aitem ineligibility constraints10aProbability10ashadow tests1 avan der Linden, WJ1 aVeldkamp, B P uhttp://iacat.org/content/constraining-item-exposure-computerized-adaptive-testing-shadow-tests01899nas a2200193 4500008004100000020002200041245010000063210006900163260004300232300001200275490000700287520117500294653002301469653003001492653002301522653002501545100001601570856011901586 2004 eng d a1076-9986 (Print)00aEstimating ability and item-selection strategy in self-adapted testing: A latent class approach0 aEstimating ability and itemselection strategy in selfadapted tes bAmerican Educational Research Assn: US a379-3960 v293 aThis article presents a psychometric model for estimating ability and item-selection strategies in self-adapted testing. In contrast to computer adaptive testing, in self-adapted testing the examinees are allowed to select the difficulty of the items. The item-selection strategy is defined as the distribution of difficulty conditional on the responses given to previous items. The article shows that missing responses in self-adapted testing are missing at random and can be ignored in the estimation of ability. However, the item-selection strategy cannot always be ignored in such an estimation. An EM algorithm is presented to estimate an examinee's ability and strategies, and a model fit is evaluated using Akaike's information criterion. 
The article includes an application with real data to illustrate how the model can be used in practice for evaluating hypotheses, estimating ability, and identifying strategies. In the example, four strategies were identified and related to examinees' ability. It was shown that individual examinees tended not to follow a consistent strategy throughout the test.10aestimating ability10aitem-selection strategies10apsychometric model10aself-adapted testing1 aRevuelta, J uhttp://iacat.org/content/estimating-ability-and-item-selection-strategy-self-adapted-testing-latent-class-approach00705nas a2200157 4500008004100000245013900041210006900180260003200249653003400281653004000315653004100355100001200396700001200408700001200420856011500432 2004 eng d00aAn investigation of two combination procedures of SPRT for three-category classification decisions in computerized classification test0 ainvestigation of two combination procedures of SPRT for threecat aSan Antonio, Texasc04/200410acomputerized adaptive testing10aComputerized classification testing10asequential probability ratio testing1 aJiao, H1 aWang, S1 aLau, CA uhttp://iacat.org/content/investigation-two-combination-procedures-sprt-three-category-classification-decisions02839nas a2200349 4500008004100000020004600041245011400087210006900201250001500270260001100285300000700296490000600303520169400309653002702003653002002030653002102050653002002071653004702091653003702138653001802175653001102193653001902204653001602223653002002239653003002259100001402289700001402303700001702317700002002334700001402354856012102368 2004 eng d a1477-7525 (Electronic)1477-7525 (Linking)00aPractical methods for dealing with 'not applicable' item responses in the AMC Linear Disability Score project0 aPractical methods for dealing with not applicable item responses a2004/06/18 cJun 16 a290 v23 aBACKGROUND: Whenever questionnaires are used to collect data on constructs, such as functional 
status or health related quality of life, it is unlikely that all respondents will respond to all items. This paper examines ways of dealing with responses in a 'not applicable' category to items included in the AMC Linear Disability Score (ALDS) project item bank. METHODS: The data examined in this paper come from the responses of 392 respondents to 32 items and form part of the calibration sample for the ALDS item bank. The data are analysed using the one-parameter logistic item response theory model. The four practical strategies for dealing with this type of response are: cold deck imputation; hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. RESULTS: The item and respondent population parameter estimates were very similar for the strategies involving hot deck imputation; treating the missing responses as if these items had never been offered to those individual patients; and using a model which takes account of the 'tendency to respond to items'. The estimates obtained using the cold deck imputation method were substantially different. CONCLUSIONS: The cold deck imputation method was not considered suitable for use in the ALDS item bank. The other three methods described can be usefully implemented in the ALDS item bank, depending on the purpose of the data analysis to be carried out. 
These three methods may be useful for other data sets examining similar constructs, when item response theory based methods are used.10a*Disability Evaluation10a*Health Surveys10a*Logistic Models10a*Questionnaires10aActivities of Daily Living/*classification10aData Interpretation, Statistical10aHealth Status10aHumans10aPilot Projects10aProbability10aQuality of Life10aSeverity of Illness Index1 aHolman, R1 aGlas, C A1 aLindeboom, R1 aZwinderman, A H1 aHaan, R J uhttp://iacat.org/content/practical-methods-dealing-not-applicable-item-responses-amc-linear-disability-score-project04028nas a2200433 4500008004100000245012300041210006900164260000800233300001200241490000700253520252400260653001902784653002902803653005802832653001002890653000902900653002202909653002602931653003302957653001102990653001103001653000903012653001603021653007403037653003003111653003603141653005803177653003103235653004503266653004103311653003203352100001603384700001503400700001603415700001603431700001403447700001203461856012103473 2004 eng d00aRefining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain0 aRefining the conceptual basis for rehabilitation outcome measure cJan aI62-1720 v423 aBACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. 
OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. 
Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches.10a*Self Efficacy10a*Sickness Impact Profile10aActivities of Daily Living/*classification/psychology10aAdult10aAged10aAged, 80 and over10aDisability Evaluation10aFactor Analysis, Statistical10aFemale10aHumans10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods/statistics & numerical data10aQuestionnaires/*standards10aRecovery of Function/physiology10aRehabilitation/*standards/statistics & numerical data10aReproducibility of Results10aResearch Support, U.S. Gov't, Non-P.H.S.10aResearch Support, U.S. Gov't, P.H.S.10aSensitivity and Specificity1 aCoster, W J1 aHaley, S M1 aAndres, P L1 aLudlow, L H1 aBond, T L1 aNi, P S uhttp://iacat.org/content/refining-conceptual-basis-rehabilitation-outcome-measurement-personal-care-and-instrumental02884nas a2200301 4500008004100000020002200041245013700063210006900200250001500269260000800284300001000292490000700302520186600309653001102175653003302186653001102219653004602230653002402276653002902300653003002329653002902359100001502388700001602403700001602419700001602435700001002451856012102461 2004 eng d a0003-9993 (Print)00aScore comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care0 aScore comparability of short forms and computerized adaptive tes a2004/04/15 cApr a661-60 v853 aOBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. 
SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range,.90-.95; 10-item CAT r range,.96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. 
The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals.10aBoston10aFactor Analysis, Statistical10aHumans10aOutcome Assessment (Health Care)/*methods10aProspective Studies10aQuestionnaires/standards10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aCoster, W J1 aAndres, P L1 aKosinski, M1 aNi, P uhttp://iacat.org/content/score-comparability-short-forms-and-computerized-adaptive-testing-simulation-study-activity02650nas a2200385 4500008004100000245014400041210006900185300001200254490000700266520138100273653002101654653003301675653002901708653001501737653001001752653000901762653002201771653002601793653003301819653002501852653001901877653001001896653002501906653001601931653002401947653002601971653002701997653003202024653001302056653002802069100001702097700001602114700001402130856012002144 2003 eng d00aCalibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT)0 aCalibration of an item pool for assessing the burden of headache a913-9330 v123 aBACKGROUND: Measurement of headache impact is important in clinical trials, case detection, and the clinical monitoring of patients. Computerized adaptive testing (CAT) of headache impact has potential advantages over traditional fixed-length tests in terms of precision, relevance, real-time quality control and flexibility. OBJECTIVE: To develop an item pool that can be used for a computerized adaptive test of headache impact. METHODS: We analyzed responses to four well-known tests of headache impact from a population-based sample of recent headache sufferers (n = 1016). We used confirmatory factor analysis for categorical data and analyses based on item response theory (IRT). 
RESULTS: In factor analyses, we found very high correlations between the factors hypothesized by the original test constructors, both within and between the original questionnaires. These results suggest that a single score of headache impact is sufficient. We established a pool of 47 items which fitted the generalized partial credit IRT model. By simulating a computerized adaptive health test we showed that an adaptive test of only five items had a very high concordance with the score based on all items and that different worst-case item selection scenarios did not lead to bias. CONCLUSION: We have established a headache impact item pool that can be used in CAT of headache impact.10a*Cost of Illness10a*Decision Support Techniques10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aDisability Evaluation10aFactor Analysis, Statistical10aHeadache/*psychology10aHealth Surveys10aHuman10aLongitudinal Studies10aMiddle Aged10aMigraine/psychology10aModels, Psychological10aPsychometrics/*methods10aQuality of Life/*psychology10aSoftware10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. uhttp://iacat.org/content/calibration-item-pool-assessing-burden-headaches-application-item-response-theory-headache01836nas a2200205 4500008004100000245009000041210006900131300001100200490000700211520111400218653002101332653003001353653001601383653003201399653001601431653004501447100001501492700001601507856010701523 2003 eng d00aA comparative study of item exposure control methods in computerized adaptive testing0 acomparative study of item exposure control methods in computeriz a71-1030 v403 aThis study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. 
The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aEducational10aItem Analysis (Statistical)10aMeasurement10aStrategies computerized adaptive testing1 aChang, S-W1 aAnsley, T N uhttp://iacat.org/content/comparative-study-item-exposure-control-methods-computerized-adaptive-testing01927nas a2200229 4500008004100000245007200041210006900113300001200182490000700194520116000201653001801361653002101379653003001400653001801430653002501448653002501473653005701498653002201555100001501577700001201592856009301604 2003 eng d00aComputerized adaptive testing using the nearest-neighbors criterion0 aComputerized adaptive testing using the nearestneighbors criteri a204-2160 v273 aItem selection procedures designed for computerized adaptive testing need to accurately estimate every taker's trait level (θ) and, at the same time, effectively use all items in a bank. Empirical studies showed that classical item selection procedures based on maximizing Fisher or other related information yielded highly varied item exposure rates; with these procedures, some items were frequently used whereas others were rarely selected. 
In the literature, methods have been proposed for controlling exposure rates; they tend to affect the accuracy in θ estimates, however. A modified version of the maximum Fisher information (MFI) criterion, coined the nearest neighbors (NN) criterion, is proposed in this study. The NN procedure improves to a moderate extent the undesirable item exposure rates associated with the MFI criterion and keeps sufficient precision in estimates. The NN criterion will be compared with a few other existing methods in an empirical study using the mean squared errors in θ estimates and plots of item exposure rates associated with different distributions. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis10aItem Response Theory10aStatistical Analysis10aStatistical Estimation computerized adaptive testing10aStatistical Tests1 aCheng, P E1 aLiou, M uhttp://iacat.org/content/computerized-adaptive-testing-using-nearest-neighbors-criterion03162nas a2200361 4500008004100000245012500041210006900166300001200235490000700247520200400254653002902258653001502287653001002302653000902312653002202321653002002343653003302363653002402396653001102420653001002431653000902441653001602450653002502466653002602491653004302517653003202560653001902592653002802611100001702639700001602656700001402672856011402686 2003 eng d00aThe feasibility of applying item response theory to measures of migraine impact: a re-analysis of three clinical studies0 afeasibility of applying item response theory to measures of migr a887-9020 v123 aBACKGROUND: Item response theory (IRT) is a powerful framework for analyzing multiitem scales and is central to the implementation of computerized adaptive testing. OBJECTIVES: To explain the use of IRT to examine measurement properties and to apply IRT to a questionnaire for measuring migraine impact--the Migraine Specific Questionnaire (MSQ). 
METHODS: Data from three clinical studies that employed the MSQ-version 1 were analyzed by confirmatory factor analysis for categorical data and by IRT modeling. RESULTS: Confirmatory factor analyses showed very high correlations between the factors hypothesized by the original test constructors. Further, high item loadings on one common factor suggest that migraine impact may be adequately assessed by only one score. IRT analyses of the MSQ were feasible and provided several suggestions as to how to improve the items and in particular the response choices. Out of 15 items, 13 showed adequate fit to the IRT model. In general, IRT scores were strongly associated with the scores proposed by the original test developers and with the total item sum score. Analysis of response consistency showed that more than 90% of the patients answered consistently according to a unidimensional IRT model. For the remaining patients, scores on the dimension of emotional function were less strongly related to the overall IRT scores that mainly reflected role limitations. Such response patterns can be detected easily using response consistency indices. Analysis of test precision across score levels revealed that the MSQ was most precise at one standard deviation worse than the mean impact level for migraine patients that are not in treatment. Thus, gains in test precision can be achieved by developing items aimed at less severe levels of migraine impact. CONCLUSIONS: IRT proved useful for analyzing the MSQ. The approach warrants further testing in a more comprehensive item pool for headache impact that would enable computerized adaptive testing.10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aCost of Illness10aFactor Analysis, Statistical10aFeasibility Studies10aFemale10aHuman10aMale10aMiddle Aged10aMigraine/*psychology10aModels, Psychological10aPsychometrics/instrumentation/*methods10aQuality of Life/*psychology10aQuestionnaires10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. uhttp://iacat.org/content/feasibility-applying-item-response-theory-measures-migraine-impact-re-analysis-three02744nas a2200349 4500008004100000245015800041210006900199260000800268300001200276490000700288520160300295653003001898653002001928653001001948653003201958653001101990653001102001653000902012653001602021653002802037653001802065653003702083653004102120653002802161100001302189700001502202700001302217700001502230700001402245700001902259856011602278 2003 eng d00aItem banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale0 aItem banking to improve shorten and computerize selfreported fa cAug a485-5010 v123 aFatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model was utilized. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. 
Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum.10a*Health Status Indicators10a*Questionnaires10aAdult10aFatigue/*diagnosis/etiology10aFemale10aHumans10aMale10aMiddle Aged10aNeoplasms/complications10aPsychometrics10aResearch Support, Non-U.S. Gov't10aResearch Support, U.S. Gov't, P.H.S.10aSickness Impact Profile1 aLai, J-S1 aCrane, P K1 aCella, D1 aChang, C-H1 aBode, R K1 aHeinemann, A W uhttp://iacat.org/content/item-banking-improve-shorten-and-computerized-self-reported-fatigue-illustration-steps01978nas a2200289 4500008004100000245007600041210006900117260000800186300001200194490000700206520114900213653002401362653001301386653002101399653002001420653001501440653001001455653001001465653001101475653001101486653000901497653002401506653002601530100001601556700001501572856010101587 2002 eng d00aAssessing tobacco beliefs among youth using item response theory models0 aAssessing tobacco beliefs among youth using item response theory cNov aS21-S390 v683 aSuccessful intervention research programs to prevent adolescent smoking require well-chosen, psychometrically sound instruments for assessing smoking prevalence and attitudes. Twelve thousand eight hundred and ten adolescents were surveyed about their smoking beliefs as part of the Teenage Attitudes and Practices Survey project, a prospective cohort study of predictors of smoking initiation among US adolescents. Item response theory (IRT) methods are used to frame a discussion of questions that a researcher might ask when selecting an optimal item set. 
IRT methods are especially useful for choosing items during instrument development, trait scoring, evaluating item functioning across groups, and creating optimal item subsets for use in specialized applications such as computerized adaptive testing. Data analytic steps for IRT modeling are reviewed for evaluating item quality and differential item functioning across subgroups of gender, age, and smoking status. Implications and challenges in the use of these methods for tobacco onset research and for assessing the developmental trajectories of smoking among youth are discussed.10a*Attitude to Health10a*Culture10a*Health Behavior10a*Questionnaires10aAdolescent10aAdult10aChild10aFemale10aHumans10aMale10aModels, Statistical10aSmoking/*epidemiology1 aPanter, A T1 aReeve, B B uhttp://iacat.org/content/assessing-tobacco-beliefs-among-youth-using-item-response-theory-models01443nas a2200241 4500008004100000245008000041210006900121300001200190490000700202520067300209653003000882653002800912653002500940653002300965653001600988653002201004653002101026100001301047700001601060700001001076700001601086856009901102 2002 eng d00aData sparseness and on-line pretest item calibration-scaling methods in CAT0 aData sparseness and online pretest item calibrationscaling metho a207-2180 v393 aCompared and evaluated 3 on-line pretest item calibration-scaling methods (the marginal maximum likelihood estimate with 1 expectation maximization [EM] cycle [OEM] method, the marginal maximum likelihood estimate with multiple EM cycles [MEM] method, and M. L. Stocking's Method B) in terms of item parameter recovery when the item responses to the pretest items in the pool are sparse. Simulations of computerized adaptive tests were used to evaluate the results yielded by the three methods. 
The MEM method produced the smallest average total error in parameter estimation, and the OEM method yielded the largest total error (PsycINFO Database Record (c) 2005 APA )10aComputer Assisted Testing10aEducational Measurement10aItem Response Theory10aMaximum Likelihood10aMethodology10aScaling (Testing)10aStatistical Data1 aBan, J-C1 aHanson, B A1 aYi, Q1 aHarris, D J uhttp://iacat.org/content/data-sparseness-and-line-pretest-item-calibration-scaling-methods-cat01634nas a2200217 4500008004100000245009700041210006900138300001200207490000700219520087500226653002101101653003001122653002501152653002301177653002501200653002801225653002701253100001301280700001501293856010801308 2002 eng d00aAn EM approach to parameter estimation for the Zinnes and Griggs paired comparison IRT model0 aEM approach to parameter estimation for the Zinnes and Griggs pa a208-2270 v263 aBorman et al. recently proposed a computer adaptive performance appraisal system called CARS II that utilizes paired comparison judgments of behavioral stimuli. To implement this approach,the paired comparison ideal point model developed by Zinnes and Griggs was selected. In this article,the authors describe item response and information functions for the Zinnes and Griggs model and present procedures for estimating stimulus and person parameters. Monte carlo simulations were conducted to assess the accuracy of the parameter estimation procedures. The results indicated that at least 400 ratees (i.e.,ratings) are required to obtain reasonably accurate estimates of the stimulus parameters and their standard errors. In addition,latent trait estimation improves as test length increases. The implications of these results for test construction are also discussed. 
10aAdaptive Testing10aComputer Assisted Testing10aItem Response Theory10aMaximum Likelihood10aPersonnel Evaluation10aStatistical Correlation10aStatistical Estimation1 aStark, S1 aDrasgow, F uhttp://iacat.org/content/em-approach-parameter-estimation-zinnes-and-griggs-paired-comparison-irt-model01213nas a2200217 4500008004100000245005100041210005100092300001200143490000700155520060500162653002600767653003000793653001600823653001300839653001300852653001100865653001200876653001500888100001600903856007600919 2002 eng d00aInformation technology and literacy assessment0 aInformation technology and literacy assessment a369-3730 v183 aThis column discusses information technology and literacy assessment in the past and present. The author also describes computer-based assessments today including the following topics: computer-scored testing, computer-administered formal assessment, Internet formal assessment, computerized adaptive tests, placement tests, informal assessment, electronic portfolios, information management, and Internet information dissemination. A model of the major present-day applications of information technologies in reading and literacy assessment is also included. 
(PsycINFO Database Record (c) 2005 APA )10aComputer Applications10aComputer Assisted Testing10aInformation10aInternet10aLiteracy10aModels10aSystems10aTechnology1 aBalajthy, E uhttp://iacat.org/content/information-technology-and-literacy-assessment01689nas a2200277 4500008004100000020001000041245006500051210006400116260009700180300001100277520074500288653002101033653002201054653002501076653002801101653002501129653001601154653001801170653005501188653001501243653001201258100001801270700002301288700001301311856008701324 2002 eng d a02-0900aMathematical-programming approaches to test item pool design0 aMathematicalprogramming approaches to test item pool design aTwente, The NetherlandsbUniversity of Twente, Faculty of Educational Science and Technology a93-1083 a(From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools computerized adaptive testing (CAT), and systems of rotating item pools for CAT. 
The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT.10aAdaptive Testing10aComputer Assisted10aComputer Programming10aEducational Measurement10aItem Response Theory10aMathematics10aPsychometrics10aStatistical Rotation computerized adaptive testing10aTest Items10aTesting1 aVeldkamp, B P1 avan der Linden, WJ1 aAriel, A uhttp://iacat.org/content/mathematical-programming-approaches-test-item-pool-design02411nas a2200277 4500008004100000245012200041210006900163260000800232300001000240490000700250520148700257653002101744653002101765653002001786653001001806653002201816653002901838653001101867653001801878653001901896653004101915653003201956100001301988700001802001856011402019 2002 eng d00aMeasuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system0 aMeasuring quality of life in chronic illness the functional asse cDec aS10-70 v833 aWe focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments-generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. 
Thus, it capitalizes on the strengths of each type of measure. Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing.10a*Chronic Disease10a*Quality of Life10a*Rehabilitation10aAdult10aComparative Study10aHealth Status Indicators10aHumans10aPsychometrics10aQuestionnaires10aResearch Support, U.S. Gov't, P.H.S.10aSensitivity and Specificity1 aCella, D1 aNowinski, C J uhttp://iacat.org/content/measuring-quality-life-chronic-illness-functional-assessment-chronic-illness-therapy01627nas a2200241 4500008004100000245005900041210005800100300001200158490000700170520087100177653002101048653003401069653002801103653002001131653003201151653002501183653001501208653002701223653002201250653001601272100001601288856008101304 2002 eng d00aOutlier detection in high-stakes certification testing0 aOutlier detection in highstakes certification testing a219-2330 v393 aDiscusses recent developments of person-fit analysis in computerized adaptive testing (CAT). Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in CAT. Most person-fit research in CAT is restricted to simulated data. In this study, empirical data from a certification test were used. Alternatives are discussed to generate norms so that bounds can be determined to classify an item score pattern as fitting or misfitting. 
Using bounds determined from a sample of a high-stakes certification test, the empirical analysis showed that different types of misfit can be distinguished. Further applications using statistical process control methods to detect misfitting item score patterns are discussed. (PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10acomputerized adaptive testing10aEducational Measurement10aGoodness of Fit10aItem Analysis (Statistical)10aItem Response Theory10aperson Fit10aStatistical Estimation10aStatistical Power10aTest Scores1 aMeijer, R R uhttp://iacat.org/content/outlier-detection-high-stakes-certification-testing02030nas a2200253 4500008004100000245010900041210006900150300000900219490000600228520114100234653002101375653001501396653003901411653002201450653002501472653001801497653002201515653005501537653001501592653001201607100001701619700002501636856011501661 2002 eng d00aA structure-based approach to psychological measurement: Matching measurement models to latent structure0 astructurebased approach to psychological measurement Matching me a4-160 v93 aThe present article sets forth the argument that psychological assessment should be based on a construct's latent structure. The authors differentiate dimensional (continuous) and taxonic (categorical) structures at the latent and manifest levels and describe the advantages of matching the assessment approach to the latent structure of a construct. A proper match will decrease measurement error, increase statistical power, clarify statistical relationships, and facilitate the location of an efficient cutting score when applicable. Thus, individuals will be placed along a continuum or assigned to classes more accurately. The authors briefly review the methods by which latent structure can be determined and outline a structure-based approach to assessment that builds on dimensional scaling models, such as item response theory, while incorporating classification methods as appropriate. 
Finally, the authors empirically demonstrate the utility of their approach and discuss its compatibility with traditional assessment methods and with computerized adaptive testing. (PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aAssessment10aClassification (Cognitive Process)10aComputer Assisted10aItem Response Theory10aPsychological10aScaling (Testing)10aStatistical Analysis computerized adaptive testing10aTaxonomies10aTesting1 aRuscio, John1 aRuscio, Ayelet Meron uhttp://iacat.org/content/structure-based-approach-psychological-measurement-matching-measurement-models-latent01695nas a2200229 4500008004100000245007800041210006900119300001200188490000700200520094500207653002501152653005101177653003001228653001801258653001101276653002701287653001101314100001701325700001101342700001801353856009401371 2001 eng d00aComputerized adaptive testing with the generalized graded unfolding model0 aComputerized adaptive testing with the generalized graded unfold a177-1960 v253 aExamined the use of the generalized graded unfolding model (GGUM) in computerized adaptive testing. The objective was to minimize the number of items required to produce equiprecise estimates of person locations. Simulations based on real data about college student attitudes toward abortion and on data generated to fit the GGUM were used. It was found that as few as 7 or 8 items were needed to produce accurate and precise person estimates using an expected a posteriori procedure. The number items in the item bank (20, 40, or 60 items) and their distribution on the continuum (uniform locations or item clusters in moderately extreme locations) had only small effects on the accuracy and precision of the estimates. These results suggest that adaptive testing with the GGUM is a good method for achieving estimates with an approximately uniform level of precision using a small number of items. 
(PsycINFO Database Record (c) 2005 APA )10aAttitude Measurement10aCollege Students computerized adaptive testing10aComputer Assisted Testing10aItem Response10aModels10aStatistical Estimation10aTheory1 aRoberts, J S1 aLin, Y1 aLaughlin, J E uhttp://iacat.org/content/computerized-adaptive-testing-generalized-graded-unfolding-model01613nas a2200193 4500008004100000245008600041210006900127300001200196490000700208520094500215653002101160653003001181653004101211653000901252653001701261100001601278700001701294856010801311 2001 eng d00aDifferences between self-adapted and computerized adaptive tests: A meta-analysis0 aDifferences between selfadapted and computerized adaptive tests a235-2470 v383 aSelf-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aScores computerized adaptive testing10aTest10aTest Anxiety1 aPitkin, A K1 aVispoel, W P uhttp://iacat.org/content/differences-between-self-adapted-and-computerized-adaptive-tests-meta-analysis01836nas a2200229 4500008004100000245007400041210006900115300001000184490000700194520110700201653002101308653000901329653004801338653001801386653002801404653002401432653001501456100001601471700001701487700001601504856008601520 2001 eng d00aEvaluation of an MMPI-A short form: Implications for adaptive testing0 aEvaluation of an MMPIA short form Implications for adaptive test a76-890 v763 aReports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al, 1992) short form based on administration of the 1st 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aMean10aMinnesota Multiphasic Personality Inventory10aPsychometrics10aStatistical Correlation10aStatistical Samples10aTest Forms1 aArcher, R P1 aTirrell, C A1 aElkins, D E uhttp://iacat.org/content/evaluation-mmpi-short-form-implications-adaptive-testing02097nas a2200337 4500008004100000245014400041210006900185300001200254490000700266520096600273653002501239653003601264653002501300653001001325653003001335653001101365653001001376653000901386653003101395653003201426653003601458653003401494653002001528100001601548700001401564700001601578700001901594700001301613700001501626856011801641 2001 eng d00aAn examination of the comparative reliability, validity, and accuracy of performance ratings made using computerized adaptive rating scales0 aexamination of the comparative reliability validity and accuracy a965-9730 v863 aThis laboratory research compared the reliability, validity, and accuracy of a computerized adaptive rating scale (CARS) format and 2 relatively common and representative rating formats. The CARS is a paired-comparison rating task that uses adaptive testing principles to present pairs of scaled behavioral statements to the rater to iteratively estimate a ratee's effectiveness on 3 dimensions of contextual performance. Videotaped vignettes of 6 office workers were prepared, depicting prescripted levels of contextual performance, and 112 subjects rated these vignettes using the CARS format and one or the other competing format. Results showed 23%-37% lower standard errors of measurement for the CARS format. In addition, validity was significantly higher for the CARS format (d = .18), and Cronbach's accuracy coefficients showed significantly higher accuracy, with a median effect size of .08. 
The discussion focuses on possible reasons for the results.10a*Computer Simulation10a*Employee Performance Appraisal10a*Personnel Selection10aAdult10aAutomatic Data Processing10aFemale10aHuman10aMale10aReproducibility of Results10aSensitivity and Specificity10aSupport, U.S. Gov't, Non-P.H.S.10aTask Performance and Analysis10aVideo Recording1 aBorman, W C1 aBuck, D E1 aHanson, M A1 aMotowidlo, S J1 aStark, S1 aDrasgow, F uhttp://iacat.org/content/examination-comparative-reliability-validity-and-accuracy-performance-ratings-made-using01461nas a2200253 4500008004100000245013900041210006900180260005000249300001200299520053700311653002100848653002500869653001200894653002900906653002200935653002700957653002600984653001501010653001601025100001501041700001601056700001701072856011801089 2001 eng d00aItem response theory applied to combinations of multiple-choice and constructed-response items--approximation methods for scale scores0 aItem response theory applied to combinations of multiplechoice a aMahwah, N.J. USAbLawrence Erlbaum Associates a289-3153 a(From the chapter) The authors develop approximate methods that replace the scoring tables with weighted linear combinations of the component scores. Topics discussed include: a linear approximation for the extension to combinations of scores; the generalization of two or more scores; potential applications of linear approximations to item response theory in computerized adaptive tests; and evaluation of the pattern-of-summed-scores, and Gaussian approximation, estimates of proficiency. 
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aItem Response Theory10aMethod)10aMultiple Choice (Testing10aScoring (Testing)10aStatistical Estimation10aStatistical Weighting10aTest Items10aTest Scores1 aThissen, D1 aNelson, L A1 aSwygert, K A uhttp://iacat.org/content/item-response-theory-applied-combinations-multiple-choice-and-constructed-response-items01981nas a2200205 4500008004100000245010100041210006900142300001200211490000700223520124900230653001201479653002101491653003001512653001501542653001601557653004501573100001701618700001901635856012101654 2001 eng d00aItem selection in computerized adaptive testing: Should more discriminating items be used first?0 aItem selection in computerized adaptive testing Should more disc a249-2660 v383 aDuring computerized adaptive testing (CAT), items are selected continuously according to the test-taker's estimated ability. Test security has become a problem because high-discrimination items are more likely to be selected and become overexposed. So, there seems to be a tradeoff between high efficiency in ability estimations and balanced usage of items. This series of four studies addressed the dilemma by focusing on the notion of whether more or less discriminating items should be used first in CAT. The first study demonstrated that the common maximum information method with J. B. Sympson and R. D. Hetter (1985) control resulted in the use of more discriminating items first. The remaining studies showed that using items in the reverse order, as described in H. Chang and Z. Ying's (1999) stratified method, had potential advantages: (a) a more balanced item usage and (b) a relatively stable resultant item pool structure with easy and inexpensive management. This stratified method may have ability-estimation efficiency better than or close to that of other methods. It is argued that the judicious selection of items, as in the stratified method, is a more active control of item exposure.
(PsycINFO Database Record (c) 2005 APA )10aability10aAdaptive Testing10aComputer Assisted Testing10aEstimation10aStatistical10aTest Items computerized adaptive testing1 aHau, Kit-Tai1 aChang, Hua-Hua uhttp://iacat.org/content/item-selection-computerized-adaptive-testing-should-more-discriminating-items-be-used-first02027nas a2200253 4500008004100000245007700041210006900118260001200187300001200199490000700211520125800218653003901476653002901515653001501544653001001559653001101569653001101580653000901591653003001600653001301630100001601643700002001659856009401679 2001 eng d00aNCLEX-RN performance: predicting success on the computerized examination0 aNCLEXRN performance predicting success on the computerized exami cJul-Aug a158-1650 v173 aSince the adoption of the Computerized Adaptive Testing (CAT) format of the National Council Licensure Examination for Registered Nurses (NCLEX-RN), no studies have been reported in the literature on predictors of successful performance by baccalaureate nursing graduates on the licensure examination. In this study, a discriminant analysis was used to identify which of 21 variables can be significant predictors of success on the CAT NCLEX-RN. The convenience sample consisted of 289 individuals who graduated from a baccalaureate nursing program between 1995 and 1998. Seven significant predictor variables were identified. The total number of C+ or lower grades earned in nursing theory courses was the best predictor, followed by grades in several individual nursing courses. More than 93 per cent of graduates were correctly classified. Ninety-four per cent of NCLEX "passes" were correctly classified, as were 92 per cent of NCLEX failures. This degree of accuracy in classifying CAT NCLEX-RN failures represents a marked improvement over results reported in previous studies of licensure examinations, and suggests the discriminant function will be helpful in identifying future students in danger of failure.
J Prof Nurs 17:158-165, 2001.10a*Education, Nursing, Baccalaureate10a*Educational Measurement10a*Licensure10aAdult10aFemale10aHumans10aMale10aPredictive Value of Tests10aSoftware1 aBeeman, P B1 aWaterhouse, J K uhttp://iacat.org/content/nclex-rn-performance-predicting-success-computerized-examination01701nas a2200181 4500008004100000245007300041210006900114300001100183490000700194520110100201653002101302653003001323653002501353653001501378100001701393700001501410856009401425 2001 eng d00aOutlier measures and norming methods for computerized adaptive tests0 aOutlier measures and norming methods for computerized adaptive t a85-1040 v263 aNotes that the problem of identifying outliers has 2 important aspects: the choice of outlier measures and the method to assess the degree of outlyingness (norming) of those measures. Several classes of measures for identifying outliers in Computerized Adaptive Tests (CATs) are introduced. Some of these measures are constructed to take advantage of CATs' sequential choice of items; other measures are taken directly from paper and pencil (P&P) tests and are used for baseline comparisons. Methods for assessing the degree of outlyingness of CAT responses, however, cannot be applied directly from P&P tests because stopping rules associated with CATs yield examinee responses of varying lengths. Standard outlier measures are highly correlated with the varying lengths, which makes comparison across examinees impossible. Therefore, 4 methods are presented and compared which map outlier statistics to a familiar probability scale (a p value). The methods are explored in the context of CAT data from a 1995 Nationally Administered Computerized Examination (NACE).
(PsycINFO Database Record (c) 2005 APA )10aAdaptive Testing10aComputer Assisted Testing10aStatistical Analysis10aTest Norms1 aBradlow, E T1 aWeiss, R E uhttp://iacat.org/content/outlier-measures-and-norming-methods-computerized-adaptive-tests01385nas a2200193 4500008004100000245009400041210006900135300001200204490000700216520067900223653002100902653003000923653002500953653005700978100001401035700001901049700001901068856010401087 2000 eng d00aA comparison of item selection rules at the early stages of computerized adaptive testing0 acomparison of item selection rules at the early stages of comput a241-2550 v243 aThe effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. 
(PsycINFO Database Record (c) 2005 APA ) (journal abstract)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis (Test)10aStatistical Estimation computerized adaptive testing1 aChen, S-Y1 aAnkenmann, R D1 aChang, Hua-Hua uhttp://iacat.org/content/comparison-item-selection-rules-early-stages-computerized-adaptive-testing02840nas a2200193 4500008004100000245013800041210006900179300000900248490000700257520208000264653003002344653002002374653005202394653002202446653001802468653002402486100001602510856012002526 2000 eng d00aThe development of a computerized version of Vandenberg's mental rotation test and the effect of visuo-spatial working memory loading0 adevelopment of a computerized version of Vandenbergs mental rota a39380 v603 aThis dissertation focused on the generation and evaluation of web-based versions of Vandenberg's Mental Rotation Test. Memory and spatial visualization theory were explored in relation to the addition of a visuo-spatial working memory component. Analysis of the data determined that there was a significant difference between scores on the MRT Computer and MRT Memory test. The addition of a visuo-spatial working memory component did significantly affect results at the .05 alpha level. Reliability and discrimination estimates were higher on the MRT Memory version. The computerization of the paper and pencil version on the MRT did not significantly affect scores but did affect the time required to complete the test. The population utilized in the quasi-experiment consisted of 107 university students from eight institutions in engineering graphics related courses. The subjects completed two researcher developed, Web-based versions of Vandenberg's Mental Rotation Test and the original paper and pencil version of the Mental Rotation Test. One version of the test included a visuo-spatial working memory loading.
Significant contributions of this study included developing and evaluating computerized versions of Vandenberg's Mental Rotation Test. Previous versions of Vandenberg's Mental Rotation Test did not take advantage of the ability of the computer to incorporate an interaction factor, such as a visuo-spatial working memory loading, into the test. The addition of an interaction factor results in a more discriminating test, which will lend itself well to computerized adaptive testing practices. Educators in engineering graphics related disciplines should strongly consider the use of spatial visualization tests to aid in establishing the effects of modern computer systems on fundamental design/drafting skills. Regular testing of spatial visualization skills will assist in the creation of a more relevant curriculum. Computerized tests which are valid and reliable will assist in making this task feasible. (PsycINFO Database Record (c) 2005 APA )10aComputer Assisted Testing10aMental Rotation10aShort Term Memory computerized adaptive testing10aTest Construction10aTest Validity10aVisuospatial Memory1 aStrong, S D uhttp://iacat.org/content/development-computerized-version-vandenbergs-mental-rotation-test-and-effect-visuo-spatial
4500008004100000245006300041210006300104300001200167490000700179520079500186653001800981653002100999653003001020653001801050653005701068100001501125700001201140856008201152 2000 eng d00aEstimation of trait level in computerized adaptive testing0 aEstimation of trait level in computerized adaptive testing a257-2650 v243 aNotes that in computerized adaptive testing (CAT), an examinee's trait level (θ) must be estimated with reasonable accuracy based on a small number of item responses. A successful implementation of CAT depends on (1) the accuracy of statistical methods used for estimating θ and (2) the efficiency of the item-selection criterion. Methods of estimating θ suitable for CAT are reviewed, and the differences between Fisher and Kullback-Leibler information criteria for selecting items are discussed. The accuracy of different CAT algorithms was examined in an empirical study. The results show that correcting θ estimates for bias was necessary at earlier stages of CAT, but most CAT algorithms performed equally well for tests of 10 or more items. (PsycINFO Database Record (c) 2005 APA )10a(Statistical)10aAdaptive Testing10aComputer Assisted Testing10aItem Analysis10aStatistical Estimation computerized adaptive testing1 aCheng, P E1 aLiou, M uhttp://iacat.org/content/estimation-trait-level-computerized-adaptive-testing01770nas a2200289 4500008004100000245007700041210006900118300001400187490000700201520080000208653002501008653003101033653003701064653003801101653001901139653001001158653002701168653004601195653002001241653002801261653003201289653001801321100001401339700001701353700001501370856009501385 2000 eng d00aItem response theory and health outcomes measurement in the 21st century0 aItem response theory and health outcomes measurement in the 21st aII28-II420 v383 aItem response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. 
IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit and is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with a discussion of some of the methodological and practical challenges in applying IRT methods.10a*Models, Statistical10aActivities of Daily Living10aData Interpretation, Statistical10aHealth Services Research/*methods10aHealth Surveys10aHuman10aMathematical Computing10aOutcome Assessment (Health Care)/*methods10aResearch Design10aSupport, Non-U.S. Gov't10aSupport, U.S. Gov't, P.H.S.10aUnited States1 aHays, R D1 aMorales, L S1 aReise, S P uhttp://iacat.org/content/item-response-theory-and-health-outcomes-measurement-21st-century02714nas a2200169 4500008004100000245011500041210006900156300000900225490000700234520208500241653001302326653002802339653002602367653001602393100001602409856011902425 2000 eng d00aLagrangian relaxation for constrained curve-fitting with binary variables: Applications in educational testing0 aLagrangian relaxation for constrained curvefitting with binary v a10630 v613 aThis dissertation offers a mathematical programming approach to curve fitting with binary variables. Various Lagrangian Relaxation (LR) techniques are applied to constrained curve fitting. Applications in educational testing with respect to test assembly are utilized. In particular, techniques are applied to both static exams (i.e. conventional paper-and-pencil (P&P)) and adaptive exams (i.e. a hybrid computerized adaptive test (CAT) called a multiple-forms structure (MFS)). 
This dissertation focuses on the development of mathematical models to represent these test assembly problems as constrained curve-fitting problems with binary variables and solution techniques for the test development. Mathematical programming techniques are used to generate parallel test forms with item characteristics based on item response theory. A binary variable is used to represent whether or not an item is present on a form. The problem of creating a test form is modeled as a network flow problem with additional constraints. In order to meet the target information and the test characteristic curves, a Lagrangian relaxation heuristic is applied to the problem. The Lagrangian approach works by multiplying the constraint by a "Lagrange multiplier" and adding it to the objective. By systematically varying the multiplier, the test form curves approach the targets. This dissertation explores modifications to Lagrangian Relaxation as it is applied to the classical paper-and-pencil exams. For the P&P exams, LR techniques are also utilized to include additional practical constraints to the network problem, which limit the item selection. An MFS is a type of computerized adaptive test. It is a hybrid of a standard CAT and a P&P exam. The concept of an MFS will be introduced in this dissertation, as well as the application of LR to constructing parallel MFSs. The approach is applied to the Law School Admission Test for the assembly of the conventional P&P test as well as an experimental computerized test using MFSs. 
(PsycINFO Database Record (c) 2005 APA )10aAnalysis10aEducational Measurement10aMathematical Modeling10aStatistical1 aKoppel, N B uhttp://iacat.org/content/lagrangian-relaxation-constrained-curve-fitting-binary-variables-applications-educational01906nas a2200229 4500008004100000245008600041210006900127300001000196490000700206520111400213653003201327653003701359653001001396653003401406653003001440653002901470653003201499100001601531700001801547700001301565856009801578 1999 eng d00aThe use of Rasch analysis to produce scale-free measurement of functional ability0 ause of Rasch analysis to produce scalefree measurement of functi a83-900 v533 aInnovative applications of Rasch analysis can lead to solutions for traditional measurement problems and can produce new assessment applications in occupational therapy and health care practice. First, Rasch analysis is a mechanism that translates scores across similar functional ability assessments, thus enabling the comparison of functional ability outcomes measured by different instruments. This will allow for the meaningful tracking of functional ability outcomes across the continuum of care. Second, once the item-difficulty order of an instrument or item bank is established by Rasch analysis, computerized adaptive testing can be used to target items to the patient's ability level, reducing assessment length by as much as one half. More importantly, Rasch analysis can provide the foundation for "equiprecise" measurement or the potential to have precise measurement across all levels of functional ability. 
The use of Rasch analysis to create scale-free measurement of functional ability demonstrates how this methodology can be used in practical applications of clinical and outcome assessment.10a*Activities of Daily Living10aDisabled Persons/*classification10aHuman10aOccupational Therapy/*methods10aPredictive Value of Tests10aQuestionnaires/standards10aSensitivity and Specificity1 aVelozo, C A1 aKielhofner, G1 aLai, J-S uhttp://iacat.org/content/use-rasch-analysis-produce-scale-free-measurement-functional-ability00734nas a2200169 4500008004100000020000900041245012400050210006900174260008700243653003400330653001500364653001800379100001600397700001900413700001700432856011500449 1991 eng d aR-1100aPatterns of alcohol and drug use among federal offenders as assessed by the Computerized Lifestyle Screening Instrument0 aPatterns of alcohol and drug use among federal offenders as asse aOttawa, ON. CanadabResearch and Statistics Branch, Correctional Service of Canada10acomputerized adaptive testing10adrug abuse10asubstance use1 aRobinson, D1 aPorporino, F J1 aMillson, W A uhttp://iacat.org/content/patterns-alcohol-and-drug-use-among-federal-offenders-assessed-computerized-lifestyle
Review--committee perspective0 aNational Council Computerized Adaptive Testing Project Reviewcom a30 v1110a*Computers10a*Licensure10aEducational Measurement/*methods10aFeasibility Studies10aSocieties, Nursing10aUnited States1 aHaynes, B uhttp://iacat.org/content/national-council-computerized-adaptive-testing-project-review-committee-perspective00616nas a2200133 4500008004100000245011200041210006900153260003800222653003400260653003800294100001500332700001700347856011800364 1987 eng d00aThe effect of item parameter estimation error on decisions made using the sequential probability ratio test0 aeffect of item parameter estimation error on decisions made usin aIowa City, IA. USAbDTIC Document10acomputerized adaptive testing10aSequential probability ratio test1 aSpray, J A1 aReckase, M D uhttp://iacat.org/content/effect-item-parameter-estimation-error-decisions-made-using-sequential-probability-ratio00569nas a2200157 4500008004100000245006000041210005700101260003800158300001200196653000900208653004900217653004100266653000900307100001700316856007800333 1983 eng d00aA procedure for decision making using tailored testing.0 aprocedure for decision making using tailored testing aNew York, NY. USAbAcademic Press a237-25410aCCAT10aCLASSIFICATION Computerized Adaptive Testing10asequential probability ratio testing10aSPRT1 aReckase, M D uhttp://iacat.org/content/procedure-decision-making-using-tailored-testing00562nas a2200181 4500008004100000245005100041210004900092300001100141490000700152653000900159653004900168653004100217653000900258100001300267700001400280700001600294856007000310 1972 eng d00aSequential testing for dichotomous decisions. 0 aSequential testing for dichotomous decisions a85-95.0 v3210aCCAT10aCLASSIFICATION Computerized Adaptive Testing10asequential probability ratio testing10aSPRT1 aLinn, RL1 aRock, D A1 aCleary, T A uhttp://iacat.org/content/sequential-testing-dichotomous-decisions