00728nas a2200193 4500008004500000245009100045210006900136300001000205490000700215653003500222653003400257653002900291653002600320100001900346700002700365700002400392700002100416856009700437 2023 Engldsh 00aHow Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change?0 aHow Do Trait Change Patterns Affect the Performance of Adaptive a32-580 v1010aadaptive measurement of change10acomputerized adaptive testing10alongitudinal measurement10atrait change patterns1 aTai, Ming, Him1 aCooperman, Allison, W.1 aDeWeese, Joseph, N.1 aWeiss, David, J. uhttp://iacat.org/how-do-trait-change-patterns-affect-performance-adaptive-measurement-change01965nas a2200145 4500008004100000245008700041210006900128260005500197520137700252653003201629653001801661653002401679100001701703856009901720 2017 eng d00aAdapting Linear Models for Optimal Test Design to More Complex Test Specifications0 aAdapting Linear Models for Optimal Test Design to More Complex T aNiigata, JapanbNiigata Seiryo Universityc08/20173 a
Combinatorial optimization (CO) has proven to be a very helpful approach for addressing test assembly issues and for providing solutions. Furthermore, CO has been applied to several test designs, including (1) the development of linear test forms, (2) computerized adaptive testing, and (3) multistage testing. In his seminal work, van der Linden (2006) laid out the basis for using linear models to simultaneously assemble exams and item pools under a variety of conditions, such as single tests and multiple tests, tests with item sets, etc. However, for some testing programs, the number and complexity of test specifications can grow rapidly. Consequently, the mathematical representation of the test assembly problem goes beyond most approaches reported either in van der Linden’s book or in the majority of other publications related to test assembly. In this presentation, we extend van der Linden’s framework by including the concept of blocks for test specifications. We modify the usual mathematical notation of a test assembly problem to include this concept, and we show how it can be applied to various test designs. Finally, we will demonstrate an implementation of this approach in stand-alone software called the ATASolver.
Session Video
10aComplex Test Specifications10aLinear Models10aOptimal Test Design1 aMorin, Maxim uhttp://iacat.org/adapting-linear-models-optimal-test-design-more-complex-test-specifications-003109nas a2200145 4500008004100000245004900041210004500090260005500135520264100190653002802831653000802859653002102867100001702888856005802905 2017 eng d00aIs CAT Suitable for Automated Speaking Test?0 aCAT Suitable for Automated Speaking Test aNiigata, JapanbNiigata Seiryo Universityc08/20173 aWe have developed an automated scoring system for Japanese speaking proficiency, namely SJ-CAT (Speaking Japanese Computerized Adaptive Test), which has been operational for the last few months. One of the unique features of the test is that it is an adaptive test based on polytomous IRT.
SJ-CAT consists of two sections: Section 1 has sentence reading-aloud tasks and multiple-choice reading tasks, and Section 2 has sentence generation tasks and open-answer tasks. In a reading-aloud task, a test taker reads a phoneme-balanced sentence on the screen after listening to a model reading. In a multiple-choice reading task, a test taker sees a picture and reads aloud the one sentence, among three on the screen, that describes the scene most appropriately. In a sentence generation task, a test taker sees a picture or watches a video clip and describes the scene in his/her own words for about ten seconds. In an open-answer task, the test taker expresses support for or opposition to a topic (e.g., nuclear power generation), with reasons, for about 30 seconds.
In the course of developing the test, we found many unexpected and unique characteristics of speaking CAT that are not found in usual multiple-choice CATs. In this presentation, we will discuss some of the factors that went unnoticed in our previous project of developing the dichotomous J-CAT (Japanese Computerized Adaptive Test), which consists of vocabulary, grammar, reading, and listening. Firstly, we will claim that the distribution of item difficulty parameters depends on the types of items. With an item pool of unrestricted item types such as open questions, it is difficult to achieve an ideal distribution, either normal or uniform. Secondly, contrary to our expectations, open questions are not necessarily more difficult to operate in an automated scoring system than more restricted questions such as sentence reading, as long as one can set up a suitable algorithm for open-question scoring. Thirdly, we will show that the standard deviation of the posterior distribution, or the standard error of the theta parameter, converges faster in the polytomous IRT used for SJ-CAT than in the dichotomous IRT used in J-CAT. Fourthly, we will discuss problems in the equating of items in SJ-CAT and suggest introducing deep learning with reinforcement learning instead of equating. And finally, we will discuss issues in operating SJ-CAT on the web, including speed of scoring, operation costs, and security, among others.
Session Video
10aAutomated Speaking Test10aCAT10alanguage testing1 aImai, Shingo uhttp://iacat.org/cat-suitable-automated-speaking-test03826nas a2200157 4500008004100000245008500041210006900126260005500195520325800250653000803508653002203516653002303538100001603561700002003577856007103597 2017 eng d00aA Large-Scale Progress Monitoring Application with Computerized Adaptive Testing0 aLargeScale Progress Monitoring Application with Computerized Ada aNiigata, JapanbNiigata Seiryo Universityc08/20173 aMany conventional assessment tools are available to teachers in schools for monitoring student progress in a formative manner. The outcomes of these assessment tools are essential to teachers’ instructional modifications and schools’ data-driven educational strategies, such as using remedial activities and planning instructional interventions for students with learning difficulties. When measuring student progress toward instructional goals or outcomes, assessments should be not only considerably precise but also sensitive to individual change in learning. Unlike conventional paper-pencil assessments that are usually not appropriate for every student, computerized adaptive tests (CATs) are highly capable of estimating growth consistently with minimum and consistent error. Therefore, CATs can be used as a progress monitoring tool in measuring student growth.
This study focuses on an operational CAT assessment that has been used for measuring student growth in reading during the academic school year. The sample of this study consists of nearly 7 million students from the 1st grade to the 12th grade in the US. The students received a CAT-based reading assessment periodically during the school year. The purpose of these periodic assessments is to measure the growth in students’ reading achievement and identify the students who may need additional instructional support (e.g., academic interventions). Using real data, this study aims to address the following research questions: (1) How many CAT administrations are necessary to make psychometrically sound decisions about the need for instructional changes in the classroom or when to provide academic interventions? (2) What is the ideal amount of time between CAT administrations to capture student growth for the purpose of producing meaningful decisions from assessment results?
To address these research questions, we first used the Theil-Sen estimator for robustly fitting a regression line to each student’s test scores obtained from a series of CAT administrations. Next, we used the conditional standard error of measurement (cSEM) from the CAT administrations to create an error band around the Theil-Sen slope (i.e., student growth rate). This process resulted in the normative slope values across all the grade levels. The optimal number of CAT administrations was established from grade-level regression results. The amount of time needed for progress monitoring was determined by calculating the amount of time required for a student to show growth beyond the median cSEM value for each grade level. The results showed that the normative slope values were the highest for lower grades and declined steadily as grade level increased. The results also suggested that the CAT-based reading assessment is most useful for grades 1 through 4, since most struggling readers requiring an intervention appear to be within this grade range. Because CAT yielded very similar cSEM values across administrations, the amount of error in the progress monitoring decisions did not seem to depend on the number of CAT administrations.
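The slope-fitting step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the week-based time unit, and the sample scores and cSEM values are all hypothetical, and the abstract does not specify exactly how the error band was built. The sketch only shows the Theil-Sen slope (the median of all pairwise slopes) and the time a student would need to grow by more than the median cSEM at that rate.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(times, scores):
    """Theil-Sen estimate: the median of the slopes over all pairs of points."""
    points = list(zip(times, scores))
    slopes = [
        (s2 - s1) / (t2 - t1)
        for (t1, s1), (t2, s2) in combinations(points, 2)
        if t2 != t1  # skip pairs administered at the same time
    ]
    if not slopes:
        raise ValueError("need at least two administrations at distinct times")
    return median(slopes)


def time_to_exceed_csem(slope, csems):
    """Time needed for growth at `slope` to exceed the median cSEM.

    Returns None when the slope is not positive (no growth to detect).
    """
    if slope <= 0:
        return None
    return median(csems) / slope


# Hypothetical data: four administrations, time in weeks, scaled scores.
weeks = [0, 4, 8, 12]
scores = [500, 508, 530, 524]
csems = [12, 10, 11, 13]  # made-up conditional SEMs per administration

growth_rate = theil_sen_slope(weeks, scores)
weeks_needed = time_to_exceed_csem(growth_rate, csems)
```

Because the estimate is a median of pairwise slopes, a single unusually high or low administration (such as the third score here) has limited influence on the growth rate.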
Session Video
10aCAT10aLarge-Scale tests10aProcess monitoring1 aBulut, Okan1 aCormier, Damien uhttps://drive.google.com/open?id=1uGbCKenRLnqTxImX1fZicR2c7GRV6Udc00666nas a2200193 4500008003900000022001400039245008000053210006900133300001000202490000600212653004000218653002600258653003300284653002600317653002100343100002200364700002600386856006000412 2017 d a2165-659200aLatent-Class-Based Item Selection for Computerized Adaptive Progress Tests0 aLatentClassBased Item Selection for Computerized Adaptive Progre a22-430 v510acomputerized adaptive progress test10aitem selection method10aKullback-Leibler information10aLatent class analysis10alog-odds scoring1 avan Buuren, Nikky1 aEggen, Theo, J. H. M. uhttp://iacat.org/jcat/index.php/jcat/article/view/62/2902104nas a2200169 4500008004100000245005200041210005100093260005500144520156900199653002101768653000801789653002301797100001501820700001901835700001301854856006701867 2017 eng d00aMHK-MST Design and the Related Simulation Study0 aMHKMST Design and the Related Simulation Study aNiigata, JapanbNiigata Seiryo Universityc08/20173 aThe MHK is a national standardized exam that tests and rates Chinese language proficiency. It assesses non-native Chinese minorities’ abilities in using the Chinese language in their daily, academic, and professional lives. Computerized multistage adaptive testing (MST) is a combination of conventional paper-and-pencil (P&P) testing and item-level computerized adaptive testing (CAT). It is a computer-based test format that takes the item set as the scoring unit. MST estimates ability at the extremes more accurately than conventional P&P tests and uses the adaptive character of CAT to reduce test length and score reporting time. At present, MST is used in some large-scale tests, such as the Uniform CPA Examination and the Graduate Record Examination (GRE). Therefore, it is necessary to develop MST applications in China.
Based on consideration of the MHK characteristics and its future development, the researchers start with the design of the MHK-MST. This simulation study is conducted to validate the performance of the MHK-MST system. Real difficulty parameters of MHK items and simulated ability parameters of the candidates are used to generate the original score matrix, and the item modules are delivered to the candidates following the adaptive procedures set according to the path rules. This simulation study provides a sound basis for the implementation of MHK-MST.
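Generating a score matrix from item difficulties and simulated abilities can be sketched as follows. The abstract does not name the IRT model; because only difficulty parameters are mentioned, this sketch assumes a Rasch (one-parameter logistic) model. The difficulty values, the standard normal ability distribution, the sample size, and all function names are illustrative assumptions, not details of the MHK-MST study.

```python
import math
import random


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def simulate_score_matrix(thetas, difficulties, seed=0):
    """Return a candidates-by-items matrix of 0/1 scores.

    Each entry is a Bernoulli draw with success probability rasch_prob(theta, b).
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < rasch_prob(theta, b) else 0 for b in difficulties]
        for theta in thetas
    ]


# Hypothetical item difficulties; a real study would use calibrated values.
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]

# Simulated candidate abilities drawn from a standard normal (an assumption).
ability_rng = random.Random(42)
thetas = [ability_rng.gauss(0.0, 1.0) for _ in range(200)]

score_matrix = simulate_score_matrix(thetas, difficulties, seed=1)
```

In a full MST simulation, each candidate would see only the modules on the path selected by the routing rules, so only the corresponding columns of such a matrix would be used for scoring.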
Session Video
10alanguage testing10aMHK10amultistage testing1 aYuyu, Ling1 aChenglin, Zhou1 aJie, Ren uhttp://iacat.org/mhk-mst-design-and-related-simulation-study-004649nas a2200145 4500008004100000245010200041210006900143260005500212520406800267653003004335653001604365653002204381100002904403856007104432 2017 eng d00aUsing Automated Item Generation in a Large-scale Medical Licensure Exam Program: Lessons Learned.0 aUsing Automated Item Generation in a Largescale Medical Licensur aNiigata, JapanbNiigata Seiryo Universityc08.20173 aOn-demand testing has become commonplace with most large-scale testing programs. Continuous testing is appealing for candidates in that it affords greater flexibility in scheduling a session at the desired location. Furthermore, the push for more comprehensive systems of assessment (e.g. CBAL) is predicated on the availability of more frequently administered tasks given the purposeful link between instruction and assessment in these frameworks. However, continuous testing models impose several challenges to programs, including overexposure of items. Robust item banks are therefore needed to support routine retirement and replenishment of items. In a traditional approach to developing items, content experts select a topic and then develop an item consisting of a stem, lead-in question, a correct answer and list of distractors. The item then undergoes review by a panel of experts to validate the content and identify any potential flaws. The process involved in developing quality MCQ items can be time-consuming as well as costly, with estimates as high as $1500-$2500 USD per item (Rudner, 2010). The Medical Council of Canada (MCC) has been exploring a novel item development process to supplement traditional approaches. Specifically, the use of automated item generation (AIG), which uses technology to generate test items from cognitive models, has been studied for over five years. 
Cognitive models are representations of the knowledge and skills that are required to solve any given problem. While developing a cognitive model for a medical scenario, for example, content experts are asked to deconstruct the (clinical) reasoning process involved via clearly stated variables and related elements. The latter information is then entered into a computer program that uses algorithms to generate MCQs. The MCC has been piloting AIG-based items for over five years with the MCC Qualifying Examination Part I (MCCQE I), a prerequisite for licensure in Canada. The aim of this presentation is to provide an overview of the practical lessons learned in the use and operational rollout of AIG with the MCCQE I. Psychometrically, the quality of the items is at least equal, and in many instances superior, to that of traditionally written MCQs, based on difficulty, discrimination, and information. In fact, 96% of the AIG-based items piloted in a recent administration were retained for future operational scoring based on pre-defined inclusion criteria. AIG also offers a framework for the systematic creation of plausible distractors, in that the content experts need to provide not only the clinical reasoning underlying a correct response but also the cognitive errors associated with each of the distractors (Lai et al., 2016). Consequently, AIG holds great promise in regard to improving and tailoring diagnostic feedback for remedial purposes (Pugh, De Champlain, Lai, Gierl, & Touchie, 2016). Furthermore, our test development process has been greatly enhanced by the addition of AIG, as it requires that item writers use metacognitive skills to describe how they solve problems. We are hopeful that sharing our experiences with attendees might not only help other testing organizations interested in adopting AIG, but also foster discussion which might benefit all participants.
References
Lai, H., Gierl, M.J., Touchie, C., Pugh, D., Boulais, A.P., & De Champlain, A.F. (2016). Using automatic item generation to improve the quality of MCQ distractors. Teaching and Learning in Medicine, 28, 166-173.
Pugh, D., De Champlain, A.F., Lai, H., Gierl, M., & Touchie, C. (2016). Using cognitive models to develop quality multiple choice questions. Medical Teacher, 38, 838-843.
Rudner, L. (2010). Implementing the Graduate Management Admission Test Computerized Adaptive Test. In W. van der Linden & C. Glas (Eds.), Elements of adaptive testing (pp. 151-165). New York, NY: Springer.
Presentation Video
10aAutomated item generation10alarge scale10amedical licensure1 aDe Champlain, André, F. uhttps://drive.google.com/open?id=14N8hUc8qexAy5W_94TykEDABGVIJHG1h02527nas a2200205 4500008004100000245011900041210006900160260001200229520179400241653000802035653000802043653003402051653003002085653000802115653003102123653001602154653001302170100002002183856011802203 2011 eng d00aFrom Reliability to Validity: Expanding Adaptive Testing Practice to Find the Most Valid Score for Each Test Taker0 aFrom Reliability to Validity Expanding Adaptive Testing Practice c10/20113 aCAT is an exception to the traditional conception of validity. It is one of the few examples of individualized testing. Item difficulty is tailored to each examinee. The intent, however, is increased efficiency. Focus on reliability (reduced standard error); Equivalence with paper & pencil tests is valued; Validity is enhanced through improved reliability.
How Else Might We Individualize Testing Using CAT?
- By addressing construct-irrelevant factors influencing individual test scores (usually in negatively biased ways).
- Individual Score Validity (ISV) – how free is a particular score from construct-irrelevant factors (often called construct-irrelevant variance, or CIV).
An ISV-Based View of Validity
Test Event -- An examinee encounters a series of items in a particular context.
- All 3 elements (examinee, items, context) are potential sources of CIV.
- Examples:
  - Test anxiety (examinee)
  - Amount/difficulty of reading required (item)
  - Test stakes (context)
ISV can be affected by all 3 elements.
CAT Goal: individualize testing to address CIV threats to score validity (i.e., maximize ISV).
Some Research Issues:
- What are some innovative methods for expanding CAT that address ISV threats while preserving measurement of the target construct?
- How might CAT help address the ISV challenges posed by test anxiety?
- How should policy-makers deal with scores that have been shown to have low ISV?
10aCAT10aCIV10aconstruct-irrelevant variance10aIndividual Score Validity10aISV10alow test taking motivation10aReliability10avalidity1 aWise, Steven, L uhttp://iacat.org/content/reliability-validity-expanding-adaptive-testing-practice-find-most-valid-score-each-test00684nas a2200217 4500008004100000245006000041210005800101260001200159653001700171653001700188653000800205653001500213653002500228653003200253653001700285100001800302700002100320700001600341700002400357856008500381 2011 eng d00aPractitioner’s Approach to Identify Item Drift in CAT0 aPractitioner s Approach to Identify Item Drift in CAT c10/201110aCUSUM method10aG2 statistic10aIPA10aitem drift10aitem parameter drift10aLord's chi-square statistic10aRaju's NCDIF1 aMeng, Huijuan1 aSteinkamp, Susan1 aJones, Paul1 aMatthews-Lopez, Joy uhttp://iacat.org/content/practitioner%E2%80%99s-approach-identify-item-drift-cat03045nas a2200241 4500008004100000020004100041245017000082210007100252250001500323300001000338490000700348520214300355653001402498653006602512653001102578653001302589100001402602700001202616700001402628700001502642700001702657856012902674 2010 spa d a0214-9915 (Print)0214-9915 (Linking)00aDeterioro de parámetros de los ítems en tests adaptativos informatizados: estudio con eCAT [Item parameter drift in computerized adaptive testing: Study with eCAT]0 aDeterioro de parámetros de los ítems en tests adaptativos inform a2010/04/29 a340-70 v223 aEn el presente trabajo se muestra el análisis realizado sobre un Test Adaptativo Informatizado (TAI) diseñado para la evaluación del nivel de inglés, denominado eCAT, con el objetivo de estudiar el deterioro de parámetros (parameter drift) producido desde la calibración inicial del banco de ítems. Se ha comparado la calibración original desarrollada para la puesta en servicio del TAI (N= 3224) y la calibración actual obtenida con las aplicaciones reales del TAI (N= 7254). 
Se ha analizado el Funcionamiento Diferencial de los Ítems (FDI) en función de los parámetros utilizados y se ha simulado el impacto que sobre el nivel de rasgo estimado tiene la variación en los parámetros. Los resultados muestran que se produce especialmente un deterioro de los parámetros a y c, que hay un importante número de ítems del banco para los que existe FDI y que la variación de los parámetros produce un impacto moderado en la estimación de θ de los evaluados con nivel de inglés alto. Se concluye que los parámetros de los ítems se han deteriorado y deben ser actualizados. Item parameter drift in computerized adaptive testing: Study with eCAT. This study describes the parameter drift analysis conducted on eCAT (a Computerized Adaptive Test to assess the written English level of Spanish speakers). The original calibration of the item bank (N = 3224) was compared to a new calibration obtained from the data provided by most eCAT operative administrations (N = 7254). A Differential Item Functioning (DIF) study was conducted between the original and the new calibrations. The impact that the new parameters have on the trait level estimates was obtained by simulation. Results show that parameter drift is found especially for a and c parameters, an important number of bank items show DIF, and the parameter change has a moderate impact on high-level-English θ estimates. It is then recommended to replace the original estimates by the new set.
10a*Software10aEducational Measurement/*methods/*statistics & numerical data10aHumans10aLanguage1 aAbad, F J1 aOlea, J1 aAguado, D1 aPonsoda, V1 aBarrada, J R uhttp://iacat.org/content/deterioro-de-par%C3%A1metros-de-los-%C3%ADtems-en-tests-adaptativos-informatizados-estudio-con-ecat01507nas a2200217 4500008004100000245008100041210006900122300001200191490000700203520078900210653001100999653003401010653002201044653003501066653002101101653002101122100001901143700001401162700001601176856009701192 2010 eng d00aItem Selection and Hypothesis Testing for the Adaptive Measurement of Change0 aItem Selection and Hypothesis Testing for the Adaptive Measureme a238-2540 v343 aAssessing individual change is an important topic in both psychological and educational measurement. An adaptive measurement of change (AMC) method had previously been shown to exhibit greater efficiency in detecting change than conventional nonadaptive methods. However, little work had been done to compare different procedures within the AMC framework. This study introduced a new item selection criterion and two new test statistics for detecting change with AMC that were specifically designed for the paradigm of hypothesis testing. In two simulation sets, the new methods for detecting significant change improved on existing procedures by demonstrating better adherence to Type I error rates and substantially better power for detecting relatively small change.
10achange10acomputerized adaptive testing10aindividual change10aKullback–Leibler information10alikelihood ratio10ameasuring change1 aFinkelman, M D1 aWeiss, DJ1 aKim-Kang, G uhttp://iacat.org/content/item-selection-and-hypothesis-testing-adaptive-measurement-change-001520nas a2200181 4500008004100000020001400041245007400055210006900129300001200198490000800210520093200218653001401150653002401164653002301188100001501211700001901226856009301245 2009 eng d a0377-221700aA mixed integer programming model for multiple stage adaptive testing0 amixed integer programming model for multiple stage adaptive test a342-3500 v1933 aThe last decade has seen paper-and-pencil (P&P) tests being replaced by computerized adaptive tests (CATs) within many testing programs. A CAT may yield several advantages relative to a conventional P&P test. A CAT can determine the questions or test items to administer, allowing each test form to be tailored to a test taker's skill level. Subsequent items can be chosen to match the capability of the test taker. By adapting to a test taker's ability, a CAT can acquire more information about a test taker while administering fewer items. A Multiple Stage Adaptive test (MST) provides a means to implement a CAT that allows review before the administration. The MST format is a hybrid between the conventional P&P and CAT formats. This paper presents mixed integer programming models for MST assembly problems. 
Computational results with commercial optimization software will be given and advantages of the models evaluated.10aEducation10aInteger programming10aLinear programming1 aEdmonds, J1 aArmstrong, R D uhttp://iacat.org/content/mixed-integer-programming-model-multiple-stage-adaptive-testing03037nas a2200481 4500008004100000020004600041245012200087210006900209250001500278260000800293300001200301490000700313520155700320653003201877653003101909653002201940653002001962653001001982653000901992653002202001653002802023653003302051653001102084653001102095653002502106653000902131653001602140653004602156653002202202653002402224653003002248653002902278100001502307700001402322700001502336700002402351700001802375700001102393700001602404700001002420700001502430856011002445 2008 eng d a1532-821X (Electronic)0003-9993 (Linking)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes0 aComputerized adaptive testing for followup after discharge from a2008/01/30 cFeb a275-2830 v893 aOBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness. DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). 
RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aGandek, B1 aSiebens, H1 aBlack-Schaffer, R M1 aSinclair, S J1 aTao, W1 aCoster, W J1 aNi, P1 aJette, A M uhttp://iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-ii01872nas a2200205 4500008003900000245005600039210005600095300001000151490000800161520124900169653002101418653003001439653002501469653001801494653002501512100001301537700001701550700002101567856007801588 2008 d00aComputerized Adaptive Testing of Personality Traits0 aComputerized Adaptive Testing of Personality Traits a12-210 v2163 aA computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a
conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective
Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction in the number of items used could be realized (at the stopping rule of 0.4, about 33% of the 36 items were used).
10aAdaptive Testing10acomputer-assisted testing10aItem Response Theory10aLikert scales10aPersonality Measures1 aHol, A M1 aVorst, H C M1 aMellenbergh, G J uhttp://iacat.org/content/computerized-adaptive-testing-personality-traits02137nas a2200289 4500008004100000020004600041245007200087210006400159250001500223260001100238300000700249490000700256520118400263653002901447653003501476653002601511653002601537653001101563653006101574653001801635653004501653653001301698100001601711700001301727700001801740856008901758 2008 eng d a1553-6467 (Electronic)0002-9459 (Linking)00aThe NAPLEX: evolution, purpose, scope, and educational implications0 aNAPLEX evolution purpose scope and educational implications a2008/05/17 cApr 15 a330 v723 aSince 2004, passing the North American Pharmacist Licensure Examination (NAPLEX) has been a requirement for earning initial pharmacy licensure in all 50 United States. The creation and evolution from 1952-2005 of the particular pharmacy competency testing areas and quantities of questions are described for the former paper-and-pencil National Association of Boards of Pharmacy Licensure Examination (NABPLEX) and the current candidate-specific computer adaptive NAPLEX pharmacy licensure examinations. A 40% increase in the weighting of NAPLEX Blueprint Area 2 in May 2005, compared to that in the preceding 1997-2005 Blueprint, has implications for candidates' NAPLEX performance and associated curricular content and instruction. New pharmacy graduates' scores on the NAPLEX are neither intended nor validated to serve as a criterion for assessing or judging the quality or effectiveness of pharmacy curricula and instruction. The newest cycle of NAPLEX Blueprint revision, a continual process to ensure representation of nationwide contemporary practice, began in early 2008. 
It may take up to 2 years, including surveying several thousand national pharmacists, to complete.10a*Educational Measurement10aEducation, Pharmacy/*standards10aHistory, 20th Century10aHistory, 21st Century10aHumans10aLicensure, Pharmacy/history/*legislation & jurisprudence10aNorth America10aPharmacists/*legislation & jurisprudence10aSoftware1 aNewton, D W1 aBoyle, M1 aCatizone, C A uhttp://iacat.org/content/naplex-evolution-purpose-scope-and-educational-implications01959nas a2200265 4500008004100000020002200041245008200063210006900145260002600214300001000240490000700250520112900257653001501386653003401401653001401435653001701449653002401466653001501490653002201505653001501527100002301542700001301565700001801578856009701596 2006 eng d a1076-9986 (Print)00aAssembling a computerized adaptive testing item pool as a set of linear tests0 aAssembling a computerized adaptive testing item pool as a set of bSage Publications: US a81-990 v313 aTest-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires them to have maximal information at a well-chosen set of ability values. 
An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. 10aAlgorithms10acomputerized adaptive testing10aitem pool10alinear tests10amathematical models10astatistics10aTest Construction10aTest Items1 avan der Linden, WJ1 aAriel, A1 aVeldkamp, B P uhttp://iacat.org/content/assembling-computerized-adaptive-testing-item-pool-set-linear-tests01943nas a2200241 4500008004100000020002200041245011200063210006900175260004100244300001200285490000700297520102300304653003401327653002401361653004701385653002401432653002101456653007001477100001301547700001401560700001001574856011701584 2006 eng d a0022-0655 (Print)00aComparing methods of assessing differential item functioning in a computerized adaptive testing environment0 aComparing methods of assessing differential item functioning in bBlackwell Publishing: United Kingdom a245-2640 v433 aMantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. 
Implications of these and other findings are discussed. all rights reserved)10acomputerized adaptive testing10aeducational testing10aitem response theory likelihood ratio test10alogistic regression10atrait estimation10aunidirectional & non-unidirectional differential item functioning1 aLei, P-W1 aChen, S-Y1 aYu, L uhttp://iacat.org/content/comparing-methods-assessing-differential-item-functioning-computerized-adaptive-testing03325nas a2200469 4500008004100000020002200041245011600063210006900179250001500248260000800263300001200271490000700283520189400290653003202184653003102216653002202247653002002269653001002289653000902299653002202308653002802330653003302358653001102391653001102402653002502413653000902438653001602447653004602463653002202509653002402531653003002555653002902585100001502614700001502629700001602644700001102660700002402671700001402695700001802709700001002727856011802737 2006 eng d a0003-9993 (Print)00aComputerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes0 aComputerized adaptive testing for followup after discharge from a2006/08/01 cAug a1033-420 v873 aOBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. 
MAIN OUTCOME MEASURES: Summary scores from AM-PAC-CAT, including 3 activity domains of movement and physical, personal care and instrumental, and applied cognition were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66. On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates for functional activity group-level changes can be obtained from CAT administrations, with a considerable reduction in administration time.10a*Activities of Daily Living10a*Adaptation, Physiological10a*Computer Systems10a*Questionnaires10aAdult10aAged10aAged, 80 and over10aChi-Square Distribution10aFactor Analysis, Statistical10aFemale10aHumans10aLongitudinal Studies10aMale10aMiddle Aged10aOutcome Assessment (Health Care)/*methods10aPatient Discharge10aProspective Studies10aRehabilitation/*standards10aSubacute Care/*standards1 aHaley, S M1 aSiebens, H1 aCoster, W J1 aTao, W1 aBlack-Schaffer, R M1 aGandek, B1 aSinclair, S J1 aNi, P uhttp://iacat.org/content/computerized-adaptive-testing-follow-after-discharge-inpatient-rehabilitation-i-activity03162nas a2200361 
4500008004100000020002200041245013600063210006900199250001500268260000800283300001200291490000700303520206800310653001502378653002402393653002102417653001002438653000902448653002902457653003402486653002402520653001102544653001102555653001302566653000902579653001602588100001602604700001302620700002302633700001202656700001102668856012102679 2006 eng d a0962-9343 (Print)00aComputerized adaptive testing of diabetes impact: a feasibility study of Hispanics and non-Hispanics in an active clinic population0 aComputerized adaptive testing of diabetes impact a feasibility s a2006/10/13 cNov a1503-180 v153 aBACKGROUND: Diabetes is a leading cause of death and disability in the US and is twice as common among Hispanic Americans as non-Hispanics. The societal costs of diabetes provide an impetus for developing tools that can improve patient care and delay or prevent diabetes complications. METHODS: We implemented a feasibility study of a Computerized Adaptive Test (CAT) to measure diabetes impact using a sample of 103 English- and 97 Spanish-speaking patients (mean age = 56.5, 66.5% female) in a community medical center with a high proportion of minority patients (28% African-American). The 37 items of the Diabetes Impact Survey were translated using forward-backward translation and cognitive debriefing. Participants were randomized to receive either the full-length tool or the Diabetes-CAT first, in the patient's native language. RESULTS: The number of items and the amount of time to complete the survey for the CAT was reduced to one-sixth the amount for the full-length tool in both languages, across disease severity. Confirmatory Factor Analysis confirmed that the Diabetes Impact Survey is unidimensional. 
The Diabetes-CAT demonstrated acceptable internal consistency reliability, construct validity, and discriminant validity in the overall sample, although subgroup analyses suggested that the English sample data evidenced higher levels of reliability and validity than the Spanish sample and issues with discriminant validity in the Spanish sample. Differential Item Function analysis revealed differences in response tendencies by language group in 3 of the 37 items. Participant interviews suggested that the Spanish-speaking patients generally preferred the paper survey to the computer-assisted tool, and were twice as likely to experience difficulties understanding the items. CONCLUSIONS: While the Diabetes-CAT demonstrated clear advantages in reducing respondent burden as compared to the full-length tool, simplifying the item bank will be necessary for enhancing the feasibility of the Diabetes-CAT for use with low literacy patients.10a*Computers10a*Hispanic Americans10a*Quality of Life10aAdult10aAged10aData Collection/*methods10aDiabetes Mellitus/*psychology10aFeasibility Studies10aFemale10aHumans10aLanguage10aMale10aMiddle Aged1 aSchwartz, C1 aWelch, G1 aSantiago-Kelley, P1 aBode, R1 aSun, X uhttp://iacat.org/content/computerized-adaptive-testing-diabetes-impact-feasibility-study-hispanics-and-non-hispanics01580nas a2200205 4500008004100000020002200041245005000063210005000113260002600163300001200189490000700201520094200208653003401150653002801184653001901212653002001231653003301251100002301284856006701307 2006 eng d a0146-6216 (Print)00aEquating scores from adaptive to linear tests0 aEquating scores from adaptive to linear tests bSage Publications: US a493-5080 v303 aTwo local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. 
In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method. Both methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. It is argued that the worse performances of the two methods are a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise between the individual score distributions. 10acomputerized adaptive testing10aequipercentile equating10alocal equating10ascore reporting10atest characteristic function1 avan der Linden, WJ uhttp://iacat.org/content/equating-scores-adaptive-linear-tests03115nas a2200277 4500008004100000020002200041245010900063210006900172250001500241260000800256300001200264490000700276520221700283653002902500653002002529653002502549653002102574653001502595653002802610653001102638653002502649100001702674700001502691700001202706856011902718 2006 eng d a0214-9915 (Print)00aMaximum information stratification method for controlling item exposure in computerized adaptive testing0 aMaximum information stratification method for controlling item e a2007/02/14 cFeb a156-1590 v183 aThe proposal for increasing the security in Computerized Adaptive Tests that has received most attention in recent years is the a-stratified method (AS - Chang and Ying, 1999): at the beginning of the test only items with low discrimination parameters (a) can be administered, with the values of the a parameters increasing as the test goes on. 
With this method, distribution of the exposure rates of the items is less skewed, while efficiency is maintained in trait-level estimation. The pseudo-guessing parameter (c), present in the three-parameter logistic model, is considered irrelevant, and is not used in the AS method. The Maximum Information Stratified (MIS) model incorporates the c parameter in the stratification of the bank and in the item-selection rule, improving accuracy by comparison with the AS, for item banks with a and b parameters correlated and uncorrelated. For both kinds of banks, the blocking b methods (Chang, Qian and Ying, 2001) improve the security of the item bank.Método de estratificación por máxima información para el control de la exposición en tests adaptativos informatizados. La propuesta para aumentar la seguridad en los tests adaptativos informatizados que ha recibido más atención en los últimos años ha sido el método a-estratificado (AE - Chang y Ying, 1999): en los momentos iniciales del test sólo pueden administrarse ítems con bajos parámetros de discriminación (a), incrementándose los valores del parámetro a admisibles según avanza el test. Con este método la distribución de las tasas de exposición de los ítems es más equilibrada, manteniendo una adecuada precisión en la medida. El parámetro de pseudoadivinación (c), presente en el modelo logístico de tres parámetros, se supone irrelevante y no se incorpora en el AE. El método de Estratificación por Máxima Información (EMI) incorpora el parámetro c a la estratificación del banco y a la regla de selección de ítems, mejorando la precisión en comparación con AE, tanto para bancos donde los parámetros a y b correlacionan como para bancos donde no. 
Para ambos tipos de bancos, los métodos de bloqueo de b (Chang, Qian y Ying, 2001) mejoran la seguridad del banco.10a*Artificial Intelligence10a*Microcomputers10a*Psychological Tests10a*Software Design10aAlgorithms10aChi-Square Distribution10aHumans10aLikelihood Functions1 aBarrada, J R1 aMazuela, P1 aOlea, J uhttp://iacat.org/content/maximum-information-stratification-method-controlling-item-exposure-computerized-adaptive02107nas a2200229 4500008004100000245013800041210006900179300001400248490000700262520127000269653003101539653003401570653002501604653001701629653001901646653002401665100001401689700001801703700001701721700001901738856012001757 2006 eng d00aSimulated computerized adaptive test for patients with lumbar spine impairments was efficient and produced valid measures of function0 aSimulated computerized adaptive test for patients with lumbar sp a947–9560 v593 aObjective: To equate physical functioning (PF) items with Back Pain Functional Scale (BPFS) items, develop a computerized adaptive test (CAT) designed to assess lumbar spine functional status (LFS) in people with lumbar spine impairments, and compare discriminant validity of LFS measures (qIRT) generated using all items analyzed with a rating scale Item Response Theory model (RSM) and measures generated using the simulated CAT (qCAT).
Methods: We performed a secondary analysis of retrospective intake rehabilitation data.
Results: Unidimensionality and local independence of 25 BPFS and PF items were supported. Differential item functioning was negligible for levels of symptom acuity, gender, age, and surgical history. The RSM fit the data well. A lumbar spine specific CAT was developed
that was 72% more efficient than using all 25 items to estimate LFS measures. qIRT and qCAT measures did not discriminate patients by symptom acuity, age, or gender, but discriminated patients by surgical history in similar clinically logical ways. qCAT measures were as precise as qIRT measures.
Conclusion: A body part specific simulated CAT developed from an LFS item bank was efficient and produced precise measures of LFS without eroding discriminant validity.10aBack Pain Functional Scale10acomputerized adaptive testing10aItem Response Theory10aLumbar spine10aRehabilitation10aTrue-score equating1 aHart, D L1 aMioduski, J E1 aWerneke, M W1 aStratford, P W uhttp://iacat.org/content/simulated-computerized-adaptive-test-patients-lumbar-spine-impairments-was-efficient-and-001601nas a2200253 4500008004100000020002200041245003000063210003000093250001500123300001100138490000600149520095200155653001401107653002501121653002901146653001801175653001901193653001101212653001401223653001901237653002001256100001601276856005501292 2005 eng d a1529-7713 (Print)00aComputer adaptive testing0 aComputer adaptive testing a2005/02/11 a109-270 v63 aThe creation of item response theory (IRT) and Rasch models, inexpensive accessibility to high speed desktop computers, and the growth of the Internet, has led to the creation and growth of computerized adaptive testing or CAT. This form of assessment is applicable for both high stakes tests such as certification or licensure exams, as well as health related quality of life surveys. This article discusses the historical background of CAT including its many advantages over conventional (typically paper and pencil) alternatives. The process of CAT is then described including descriptions of the specific differences of using CAT based upon 1-, 2- and 3-parameter IRT and various Rasch models. Numerous specific topics describing CAT in practice are described including: initial item selection, content balancing, test difficulty, test length and stopping rules. 
The article concludes with the author's reflections regarding the future of CAT.10a*Internet10a*Models, Statistical10a*User-Computer Interface10aCertification10aHealth Surveys10aHumans10aLicensure10aMicrocomputers10aQuality of Life1 aGershon, RC uhttp://iacat.org/content/computer-adaptive-testing02716nas a2200373 4500008004100000245017500041210006900216300001100285490000700296520137800303653003001681653003101711653001501742653001001757653000901767653002201776653003201798653004201830653001101872653003001883653001101913653005101924653003101975653003702006653000902043653001602052653004102068653004102109653002602150100001402176700001802190700001902208856011502227 2005 eng d00aSimulated computerized adaptive tests for measuring functional status were efficient with good discriminant validity in patients with hip, knee, or foot/ankle impairments0 aSimulated computerized adaptive tests for measuring functional s a629-380 v583 aBACKGROUND AND OBJECTIVE: To develop computerized adaptive tests (CATs) designed to assess lower extremity functional status (FS) in people with lower extremity impairments using items from the Lower Extremity Functional Scale and compare discriminant validity of FS measures generated using all items analyzed with a rating scale Item Response Theory model (theta(IRT)) and measures generated using the simulated CATs (theta(CAT)). METHODS: Secondary analysis of retrospective intake rehabilitation data. RESULTS: Unidimensionality of items was strong, and local independence of items was adequate. Differential item functioning (DIF) affected item calibration related to body part, that is, hip, knee, or foot/ankle, but DIF did not affect item calibration for symptom acuity, gender, age, or surgical history. Therefore, patients were separated into three body part specific groups. The rating scale model fit all three data sets well. Three body part specific CATs were developed: each was 70% more efficient than using all LEFS items to estimate FS measures. 
theta(IRT) and theta(CAT) measures discriminated patients by symptom acuity, age, and surgical history in similar ways. theta(CAT) measures were as precise as theta(IRT) measures. CONCLUSION: Body part-specific simulated CATs were efficient and produced precise measures of FS with good discriminant validity.10a*Health Status Indicators10aActivities of Daily Living10aAdolescent10aAdult10aAged10aAged, 80 and over10aAnkle Joint/physiopathology10aDiagnosis, Computer-Assisted/*methods10aFemale10aHip Joint/physiopathology10aHumans10aJoint Diseases/physiopathology/*rehabilitation10aKnee Joint/physiopathology10aLower Extremity/*physiopathology10aMale10aMiddle Aged10aResearch Support, N.I.H., Extramural10aResearch Support, U.S. Gov't, P.H.S.10aRetrospective Studies1 aHart, D L1 aMioduski, J E1 aStratford, P W uhttp://iacat.org/content/simulated-computerized-adaptive-tests-measuring-functional-status-were-efficient-good02772nas a2200421 4500008004100000020004600041245011200087210006900199250001500268260001000283300000700293490000600300520144100306653002701747653003001774653004701804653001001851653000901861653002201870653002801892653001101920653001101931653002001942653000901962653001601971653001601987653001902003653001602022653003502038653002902073653004002102653003002142100001402172700001702186700001702203700001402220856011602234 2004 eng d a1477-7525 (Electronic)1477-7525 (Linking)00aThe AMC Linear Disability Score project in a population requiring residential care: psychometric properties0 aAMC Linear Disability Score project in a population requiring re a2004/08/05 cAug 3 a420 v23 aBACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient relevant outcomes, including functional status. However, there are few item banks, which have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. 
METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The remaining 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status.10a*Disability Evaluation10a*Health Status Indicators10aActivities of Daily Living/*classification10aAdult10aAged10aAged, 80 and over10aData Collection/methods10aFemale10aHumans10aLogistic Models10aMale10aMiddle Aged10aNetherlands10aPilot Projects10aProbability10aPsychometrics/*instrumentation10aQuestionnaires/standards10aResidential Facilities/*utilization10aSeverity of Illness Index1 aHolman, R1 aLindeboom, R1 aVermeulen, M1 aHaan, R J uhttp://iacat.org/content/amc-linear-disability-score-project-population-requiring-residential-care-psychometric01539nas a2200229 4500008004100000020002200041245006400063210006300127260002600190300001200216490000700228520081800235653003401053653003001087653002801117653001301145100001901158700001501177700001601192700001701208856008401225 2004 eng d a0146-6216 (Print)00aComputerized adaptive testing with multiple-form structures0 aComputerized 
adaptive testing with multipleform structures bSage Publications: US a147-1640 v283 aA multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of an examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT). (PsycINFO Database Record (c) 2007 APA, all rights reserved)10acomputerized adaptive testing10aLaw School Admission Test10amultiple-form structure10atestlets1 aArmstrong, R D1 aJones, D H1 aKoppel, N B1 aPashley, P J uhttp://iacat.org/content/computerized-adaptive-testing-multiple-form-structures02650nas a2200385 4500008004100000245014400041210006900185300001200254490000700266520138100273653002101654653003301675653002901708653001501737653001001752653000901762653002201771653002601793653003301819653002501852653001901877653001001896653002501906653001601931653002401947653002601971653002701997653003202024653001302056653002802069100001702097700001602114700001402130856012002144 2003 eng d00aCalibration of an item pool for assessing the burden of headaches: an application of item response theory to the Headache Impact Test (HIT)0 aCalibration of an item pool for assessing the burden of headache a913-9330 v123 aBACKGROUND: Measurement of headache impact is important in clinical trials, case detection, and the clinical monitoring of patients. 
Computerized adaptive testing (CAT) of headache impact has potential advantages over traditional fixed-length tests in terms of precision, relevance, real-time quality control and flexibility. OBJECTIVE: To develop an item pool that can be used for a computerized adaptive test of headache impact. METHODS: We analyzed responses to four well-known tests of headache impact from a population-based sample of recent headache sufferers (n = 1016). We used confirmatory factor analysis for categorical data and analyses based on item response theory (IRT). RESULTS: In factor analyses, we found very high correlations between the factors hypothesized by the original test constructers, both within and between the original questionnaires. These results suggest that a single score of headache impact is sufficient. We established a pool of 47 items which fitted the generalized partial credit IRT model. By simulating a computerized adaptive health test we showed that an adaptive test of only five items had a very high concordance with the score based on all items and that different worst-case item selection scenarios did not lead to bias. CONCLUSION: We have established a headache impact item pool that can be used in CAT of headache impact.10a*Cost of Illness10a*Decision Support Techniques10a*Sickness Impact Profile10aAdolescent10aAdult10aAged10aComparative Study10aDisability Evaluation10aFactor Analysis, Statistical10aHeadache/*psychology10aHealth Surveys10aHuman10aLongitudinal Studies10aMiddle Aged10aMigraine/psychology10aModels, Psychological10aPsychometrics/*methods10aQuality of Life/*psychology10aSoftware10aSupport, Non-U.S. Gov't1 aBjorner, J B1 aKosinski, M1 aWare, Jr. 
uhttp://iacat.org/content/calibration-item-pool-assessing-burden-headaches-application-item-response-theory-headache02855nas a2200265 4500008004100000245006600041210006600107260000800173300000900181490000700190520208800197653002102285653002902306653003002335653001202365653001102377653001302388653003102401653001902432100001302451700001502464700001302479700001502492856008202507 2002 eng d00aAdvances in quality of life measurements in oncology patients0 aAdvances in quality of life measurements in oncology patients cJun a60-80 v293 aAccurate assessment of the quality of life (QOL) of patients can provide important clinical information to physicians, especially in the area of oncology. Changes in QOL are important indicators of the impact of a new cytotoxic therapy, can affect a patient's willingness to continue treatment, and may aid in defining response in the absence of quantifiable endpoints such as tumor regression. Because QOL is becoming an increasingly important aspect in the management of patients with malignant disease, it is vital that the instruments used to measure QOL are reliable and accurate. Assessment of QOL involves a multidimensional approach that includes physical, functional, social, and emotional well-being, and the most comprehensive instruments measure at least three of these domains. Instruments to measure QOL can be generic (eg, the Nottingham Health Profile), targeted toward specific illnesses (eg, Functional Assessment of Cancer Therapy - Lung), or be a combination of generic and targeted. Two of the most widely used examples of the combination, or hybrid, instruments are the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 Items and the Functional Assessment of Chronic Illness Therapy. A consequence of the increasing international collaboration in clinical trials has been the growing necessity for instruments that are valid across languages and cultures. 
To assure the continuing reliability and validity of QOL instruments in this regard, item response theory can be applied. Techniques such as item response theory may be used in the future to construct QOL item banks containing large sets of validated questions that represent various levels of QOL domains. As QOL becomes increasingly important in understanding and approaching the overall management of cancer patients, the tools available to clinicians and researchers to assess QOL will continue to evolve. While the instruments currently available provide reliable and valid measurement, further improvements in precision and application are anticipated.10a*Quality of Life10a*Sickness Impact Profile10aCross-Cultural Comparison10aCulture10aHumans10aLanguage10aNeoplasms/*physiopathology10aQuestionnaires1 aCella, D1 aChang, C-H1 aLai, J S1 aWebster, K uhttp://iacat.org/content/advances-quality-life-measurements-oncology-patients01213nas a2200217 4500008004100000245005100041210005100092300001200143490000700155520060500162653002600767653003000793653001600823653001300839653001300852653001100865653001200876653001500888100001600903856007600919 2002 eng d00aInformation technology and literacy assessment0 aInformation technology and literacy assessment a369-3730 v183 aThis column discusses information technology and literacy assessment in the past and present. The author also describes computer-based assessments today including the following topics: computer-scored testing, computer-administered formal assessment, Internet formal assessment, computerized adaptive tests, placement tests, informal assessment, electronic portfolios, information management, and Internet information dissemination. A model of the major present-day applications of information technologies in reading and literacy assessment is also included. 
(PsycINFO Database Record (c) 2005 APA )10aComputer Applications10aComputer Assisted Testing10aInformation10aInternet10aLiteracy10aModels10aSystems10aTechnology1 aBalajthy, E uhttp://iacat.org/content/information-technology-and-literacy-assessment03058nas a2200325 4500008004100000020004100041245008100082210006900163250001500232260000800247300001100255490000700266520201300273653001502286653001002301653004002311653005702351653003302408653001102441653001102452653001802463653000902481653002802490653001202518653005502530100001502585700001802600700001502618856009902633 2002 eng d a0025-7079 (Print)0025-7079 (Linking)00aMultidimensional adaptive testing for mental health problems in primary care0 aMultidimensional adaptive testing for mental health problems in a2002/09/10 cSep a812-230 v403 aOBJECTIVES: Efficient and accurate instruments for assessing child psychopathology are increasingly important in clinical practice and research. For example, screening in primary care settings can identify children and adolescents with disorders that may otherwise go undetected. However, primary care offices are notorious for the brevity of visits and screening must not burden patients or staff with long questionnaires. One solution is to shorten assessment instruments, but dropping questions typically makes an instrument less accurate. An alternative is adaptive testing, in which a computer selects the items to be asked of a patient based on the patient's previous responses. This research used a simulation to test a child mental health screen based on this technology. RESEARCH DESIGN: Using half of a large sample of data, a computerized version was developed of the Pediatric Symptom Checklist (PSC), a parental-report psychosocial problem screen. With the unused data, a simulation was conducted to determine whether the Adaptive PSC can reproduce the results of the full PSC with greater efficiency. 
SUBJECTS: PSCs were completed by parents on 21,150 children seen in a national sample of primary care practices. RESULTS: Four latent psychosocial problem dimensions were identified through factor analysis: internalizing problems, externalizing problems, attention problems, and school problems. A simulated adaptive test measuring these traits asked an average of 11.6 questions per patient, and asked five or fewer questions for 49% of the sample. There was high agreement between the adaptive test and the full (35-item) PSC: only 1.3% of screening decisions were discordant (kappa = 0.93). This agreement was higher than that obtained using a comparable length (12-item) short-form PSC (3.2% of decisions discordant; kappa = 0.84). CONCLUSIONS: Multidimensional adaptive testing may be an accurate and efficient technology for screening for mental health problems in primary care settings.10aAdolescent10aChild10aChild Behavior Disorders/*diagnosis10aChild Health Services/*organization & administration10aFactor Analysis, Statistical10aFemale10aHumans10aLinear Models10aMale10aMass Screening/*methods10aParents10aPrimary Health Care/*organization & administration1 aGardner, W1 aKelleher, K J1 aPajer, K A uhttp://iacat.org/content/multidimensional-adaptive-testing-mental-health-problems-primary-care02151nas a2200277 4500008004100000020002200041245009600063210006900159250001500228260000800243300001100251490000700262520122800269653001601497653003901513653003701552653001101589653003301600653002501633653002701658653003101685100001701716700001601733700001701749856010701766 1999 eng d a1040-2446 (Print)00aEvaluating the usefulness of computerized adaptive testing for medical in-course assessment0 aEvaluating the usefulness of computerized adaptive testing for m a1999/10/28 cOct a1125-80 v743 aPURPOSE: This study investigated the feasibility of converting an existing computer-administered, in-course internal medicine test to an adaptive format. 
METHOD: A 200-item internal medicine extended matching test was used for this research. Parameters were estimated with commercially available software with responses from 621 examinees. A specially developed simulation program was used to retrospectively estimate the efficiency of the computer-adaptive exam format. RESULTS: It was found that the average test length could be shortened by almost half with measurement precision approximately equal to that of the full 200-item paper-and-pencil test. However, computer-adaptive testing with this item bank provided little advantage for examinees at the upper end of the ability continuum. An examination of classical item statistics and IRT item statistics suggested that adding more difficult items might extend the advantage to this group of examinees. CONCLUSIONS: Medical item banks presently used for incourse assessment might be advantageously employed in adaptive testing. However, it is important to evaluate the match between the items and the measurement objective of the test before implementing this format.10a*Automation10a*Education, Medical, Undergraduate10aEducational Measurement/*methods10aHumans10aInternal Medicine/*education10aLikelihood Functions10aPsychometrics/*methods10aReproducibility of Results1 aKreiter, C D1 aFerguson, K1 aGruppen, L D uhttp://iacat.org/content/evaluating-usefulness-computerized-adaptive-testing-medical-course-assessment00753nas a2200241 4500008004100000020002200041245006700063210006400130250001500194260000800209300001100217490000700228653001500235653002600250653003700276653002300313653001800336100002100354700001400375700002400389700001400413856008400427 1993 eng d a0744-6314 (Print)00aMoving in a new direction: Computerized adaptive testing (CAT)0 aMoving in a new direction Computerized adaptive testing CAT a1993/01/01 cJan a80, 820 v2410a*Computers10aAccreditation/methods10aEducational Measurement/*methods10aLicensure, Nursing10aUnited States1 aJones-Dickson, C1 aDorsey, D1 
aCampbell-Warnock, J1 aFields, F uhttp://iacat.org/content/moving-new-direction-computerized-adaptive-testing-cat