%0 Journal Article %J Applied Psychological Measurement %D 2020 %T Multidimensional Test Assembly Using Mixed-Integer Linear Programming: An Application of Kullback–Leibler Information %A Dries Debeer %A Peter W. van Rijn %A Usama S. Ali %X Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method to assemble parallel test forms is to apply combinatorial optimization using mixed-integer linear programming (MILP). Using this approach, in the unidimensional case, Fisher information (FI) is commonly used as the statistical target to obtain parallelism. In the multidimensional case, however, FI is a multidimensional matrix, which complicates its use as a statistical target. Previous research addressing this problem focused on item selection criteria for multidimensional computerized adaptive testing (MCAT). Yet these selection criteria are not directly transferable to the assembly of linear parallel test forms. To bridge this gap the authors derive different statistical targets, based on either FI or the Kullback–Leibler (KL) divergence, that can be applied in MILP models to assemble multidimensional parallel test forms. Using simulated item pools and an item pool based on empirical items, the proposed statistical targets are compared and evaluated. Promising results with respect to the KL-based statistical targets are presented and discussed. %B Applied Psychological Measurement %V 44 %P 17-32 %U https://doi.org/10.1177/0146621619827586 %R 10.1177/0146621619827586 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Computerized Adaptive Testing in Early Education: Exploring the Impact of Item Position Effects on Ability Estimation %A Albano, Anthony D. %A Cai, Liuhan %A Lease, Erin M. %A McConnell, Scott R. %X Abstract Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20-item form. The estimated linear change in logits for an increase of 1 in position was .024, resulting in a predicted change of .46 logits for a shift from the beginning to the end of the form. A subsequent simulation study examined impacts of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed. %B Journal of Educational Measurement %V 56 %P 437-451 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12215 %R 10.1111/jedm.12215 %0 Journal Article %J Educational and Psychological Measurement %D 2019 %T Developing Multistage Tests Using D-Scoring Method %A Kyung (Chris) T. Han %A Dimiter M. 
Dimitrov %A Faisal Al-Mashary %X The D-scoring method for scoring and equating tests with binary items proposed by Dimitrov offers some of the advantages of item response theory, such as item-level difficulty information and score computation that reflects the item difficulties, while retaining the merits of classical test theory such as the simplicity of number correct score computation and relaxed requirements for model sample sizes. Because of its unique combination of those merits, the D-scoring method has seen quick adoption in the educational and psychological measurement field. Because item-level difficulty information is available with the D-scoring method and item difficulties are reflected in test scores, it conceptually makes sense to use the D-scoring method with adaptive test designs such as multistage testing (MST). In this study, we developed and compared several versions of the MST mechanism using the D-scoring approach and also proposed and implemented a new framework for conducting MST simulation under the D-scoring method. Our findings suggest that the score recovery performance under MST with D-scoring was promising, as it retained score comparability across different MST paths. We found that MST using the D-scoring method can achieve improvements in measurement precision and efficiency over linear-based tests that use D-scoring method. %B Educational and Psychological Measurement %V 79 %P 988-1008 %U https://doi.org/10.1177/0013164419841428 %R 10.1177/0013164419841428 %0 Journal Article %J Applied Psychological Measurement %D 2017 %T Is a Computerized Adaptive Test More Motivating Than a Fixed-Item Test? %A Guangming Ling %A Yigal Attali %A Bridgid Finn %A Elizabeth A. Stone %X Computer adaptive tests provide important measurement advantages over traditional fixed-item tests, but research on the psychological reactions of test takers to adaptive tests is lacking. In particular, it has been suggested that test-taker engagement, and possibly test performance as a consequence, could benefit from the control that adaptive tests have on the number of test items examinees answer correctly. However, previous research on this issue found little support for this possibility. This study expands on previous research by examining this issue in the context of a mathematical ability assessment and by considering the possible effect of immediate feedback of response correctness on test engagement, test anxiety, time on task, and test performance. Middle school students completed a mathematics assessment under one of three test type conditions (fixed, adaptive, or easier adaptive) and either with or without immediate feedback about the correctness of responses. Results showed little evidence for test type effects. The easier adaptive test resulted in higher engagement and lower anxiety than either the adaptive or fixed-item tests; however, no significant differences in performance were found across test types, although performance was significantly higher across all test types when students received immediate feedback. In addition, these effects were not related to ability level, as measured by the state assessment achievement levels. The possibility that test experiences in adaptive tests may not in practice be significantly different than in fixed-item tests is raised and discussed to explain the results of this and previous studies. 
%B Applied Psychological Measurement %V 41 %P 495-511 %U https://doi.org/10.1177/0146621617707556 %R 10.1177/0146621617707556 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Construction of Gratitude Scale Using Polytomous Item Response Theory Model %A Nurul Arbiyah %K Gratitude Scale %K polytomous items %X

Various studies have shown that gratitude is essential for increasing individuals' happiness and quality of life. Unfortunately, research on gratitude has received little attention, and no standardized measure of it exists. Existing gratitude scales were developed overseas and have not been adapted to the Indonesian cultural context. Moreover, scale development is generally carried out with a classical test theory approach, which has some drawbacks. This research develops a gratitude scale using a polytomous item response theory (IRT) model, the Partial Credit Model (PCM).

The pilot study showed that the 44-item gratitude scale is reliable (α = 0.944) and valid, meeting both convergent and discriminant validity requirements. The pilot study also showed that the gratitude scale satisfies the unidimensionality assumption.

Calibration with the PCM showed that the gratitude scale fit the model. Of the 44 items, one item did not fit and was eliminated. A second calibration of the remaining 43 items showed that they fit the model and were suitable for measuring gratitude. A differential item functioning (DIF) analysis flagged four items as showing gender-related response bias, leaving 39 items in the final scale.

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1pHhO4cq2-wh24ht3nBAoXNHv7234_mjH %0 Journal Article %J Quality of Life Research %D 2017 %T The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study %A Barthel, D. %A Otto, C. %A Nolte, S. %A Meyrose, A.-K. %A Fischer, F. %A Devine, J. %A Walter, O. %A Mierke, A. %A Fischer, K. I. %A Thyen, U. %A Klein, M. %A Ankermann, T. %A Rose, M. %A Ravens-Sieberer, U. %X Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. %B Quality of Life Research %V 26 %P 1105–1117 %8 May %U https://doi.org/10.1007/s11136-016-1437-9 %R 10.1007/s11136-016-1437-9 %0 Journal Article %J Applied Measurement in Education %D 2015 %T Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing %A Wyse, A. E. %A Albano, A. D. %X This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for students with disabilities that have typically taken alternate assessments based on modified achievement standards (AA-MAS). A simulation study indicated that the abilities of AA-MAS students can be underestimated or overestimated by the mixed-item CAT, depending on students’ location on the underlying ability scale. These findings held across grade levels and test lengths. The mixed-item CAT appeared to function well for non-AA-MAS students. %B Applied Measurement in Education %V 28 %N 2 %R http://dx.doi.org/10.1080/08957347.2014.1002921 %0 Generic %D 2011 %T Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients %A Giesinger, J. M. %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Arraras, J. I. %A Conroy, T. %A Gamper, E. M. %A Kemmler, G. %A King, M. T. %A Oberguggenberger, A. S. %A Velikova, G. %A Young, T. %A Holzner, B. %A Eortc-Qlg, E. O. %X ABSTRACT: INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. 
RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. %B Health and Quality of Life Outcomes %7 2011/03/31 %V 9 %P 10 %8 March 29, 2011 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G Eng %M 21447160 %0 Journal Article %J Journal of Applied Testing Technology %D 2011 %T Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications %A Jacobsen, J. %A Ackermann, R. %A Egüez, J. %A Ganguli, D. %A Rickard, P. %A Taylor, L. %X

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the
validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed as well as implications for testing programs considering the use of a
CAT delivery system.

%B Journal of Applied Testing Technology %V 12 %G English %U http://www.testpublishers.org/journal-of-applied-testing-technology %N 1 %0 Journal Article %J BMC Medical Informatics and Decision Making %D 2011 %T A new adaptive testing algorithm for shortening health literacy assessments %A Kandula, S. %A Ancker, J.S. %A Kaufman, D.R. %A Currie, L.M. %A Qing, Z.-T. %X

 

%U http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3178473/?tool=pmcentrez
%B BMC Medical Informatics and Decision Making %V 11 %G English %N 52 %R 10.1186/1472-6947-11-52 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Assembling an Inventory of Multistage Adaptive Testing Systems %A Breithaupt, K %A Ariel, A. %A Hare, D. R. %B Elements of Adaptive Testing %P 247-266 %G eng %& 13 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Psicothema %D 2010 %T Deterioro de parámetros de los ítems en tests adaptativos informatizados: estudio con eCAT [Item parameter drift in computerized adaptive testing: Study with eCAT] %A Abad, F. J. %A Olea, J. %A Aguado, D. %A Ponsoda, V. %A Barrada, J %K *Software %K Educational Measurement/*methods/*statistics & numerical data %K Humans %K Language %X

En el presente trabajo se muestra el análisis realizado sobre un Test Adaptativo Informatizado (TAI) diseñado para la evaluación del nivel de inglés, denominado eCAT, con el objetivo de estudiar el deterioro de parámetros (parameter drift) producido desde la calibración inicial del banco de ítems. Se ha comparado la calibración original desarrollada para la puesta en servicio del TAI (N = 3224) y la calibración actual obtenida con las aplicaciones reales del TAI (N = 7254). Se ha analizado el Funcionamiento Diferencial de los Ítems (FDI) en función de los parámetros utilizados y se ha simulado el impacto que sobre el nivel de rasgo estimado tiene la variación en los parámetros. Los resultados muestran que se produce especialmente un deterioro de los parámetros a y c, que hay un importante número de ítems del banco para los que existe FDI y que la variación de los parámetros produce un impacto moderado en la estimación de θ de los evaluados con nivel de inglés alto. Se concluye que los parámetros de los ítems se han deteriorado y deben ser actualizados. Item parameter drift in computerized adaptive testing: Study with eCAT. This study describes the parameter drift analysis conducted on eCAT (a Computerized Adaptive Test to assess the written English level of Spanish speakers). The original calibration of the item bank (N = 3224) was compared to a new calibration obtained from the data provided by most eCAT operative administrations (N = 7254). A Differential Item Functioning (DIF) study was conducted between the original and the new calibrations. The impact that the new parameters have on the trait level estimates was obtained by simulation. Results show that parameter drift is found especially for the a and c parameters, an important number of bank items show DIF, and the parameter change has a moderate impact on high-level-English θ estimates. It is then recommended to replace the original estimates by the new set.

%B Psicothema %7 2010/04/29 %V 22 %P 340-7 %@ 0214-9915 (Print)0214-9915 (Linking) %G spa %M 20423641 %( Deterioro de parametros de los items en tests adaptativos informatizados: estudio con eCAT. %0 Journal Article %J Quality of Life Research %D 2010 %T Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Chie, W. C. %A Conroy, T. %A Costantini, A. %A Fayers, P. %A Helbostad, J. %A Holzner, B. %A Kaasa, S. %A Singer, S. %A Velikova, G. %A Young, T. %X PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. %B Quality of Life Research %7 2010/10/26 %V 20 %P 479-490 %@ 1573-2649 (Electronic)0962-9343 (Linking) %G Eng %M 20972628 %0 Journal Article %J Applied Psychological Measurement %D 2010 %T A Method for the Comparison of Item Selection Rules in Computerized Adaptive Testing %A Barrada, Juan Ramón %A Olea, Julio %A Ponsoda, Vicente %A Abad, Francisco José %X

In a typical study comparing the relative efficiency of two item selection rules in computerized adaptive testing, the common result is that they simultaneously differ in accuracy and security, making it difficult to reach a conclusion on which is the more appropriate rule. This study proposes a strategy to conduct a global comparison of two or more selection rules. A plot showing the performance of each selection rule for several maximum exposure rates is obtained and the whole plot is compared with other rule plots. The strategy was applied in a simulation study with fixed-length CATs for the comparison of six item selection rules: the point Fisher information, Fisher information weighted by likelihood, Kullback-Leibler weighted by likelihood, maximum information stratification with blocking, progressive and proportional methods. Our results show that there is no optimal rule for any overlap value or root mean square error (RMSE). The fact that a rule, for a given level of overlap, has lower RMSE than another does not imply that this pattern holds for another overlap rate. A fair comparison of the rules requires extensive manipulation of the maximum exposure rates. The best methods were the Kullback-Leibler weighted by likelihood, the proportional method, and the maximum information stratification method with blocking.

%B Applied Psychological Measurement %V 34 %P 438-452 %U http://apm.sagepub.com/content/34/6/438.abstract %R 10.1177/0146621610370152 %0 Journal Article %J Journal of Educational Measurement %D 2010 %T Stratified and Maximum Information Item Selection Procedures in Computer Adaptive Testing %A Deng, Hui %A Ansley, Timothy %A Chang, Hua-Hua %X

In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computer adaptive testing (CAT) (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were with respect to error variances, reliability of ability estimates and item usage through CATs simulated under nine test conditions of various practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances for STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.

%B Journal of Educational Measurement %V 47 %P 202–226 %U http://dx.doi.org/10.1111/j.1745-3984.2010.00109.x %R 10.1111/j.1745-3984.2010.00109.x %0 Journal Article %J Journal of Educational Measurement %D 2010 %T Stratified and maximum information item selection procedures in computer adaptive testing %A Deng, H. %A Ansley, T. %A Chang, H.-H. %B Journal of Educational Measurement %V 47 %P 202-226 %G Eng %0 Journal Article %J Papeles del Psicólogo %D 2010 %T Tests informatizados y otros nuevos tipos de tests [Computerized and other new types of tests] %A Olea, J. %A Abad, F. J. %A Barrada, J %X Recientemente se ha producido un considerable desarrollo de los tests adaptativos informatizados, en los que el test se adapta progresivamente al rendimiento del evaluando, y de otros tipos de tests: a) los test basados en modelos (se dispone de un modelo o teoría de cómo se responde a cada ítem, lo que permite predecir su dificultad), b) los tests ipsativos (el evaluado ha de elegir entre opciones que tienen parecida deseabilidad social, por lo que pueden resultar eficaces para controlar algunos sesgos de respuestas), c) los tests conductuales (miden rasgos que ordinariamente se han venido midiendo con autoinformes, mediante tareas que requieren respuestas no verbales) y d) los tests situacionales (en los que se presenta al evaluado una situación de conflicto laboral, por ejemplo, con varias posibles soluciones, y ha de elegir la que le parece la mejor descripción de lo que el haría en esa situación). El artículo comenta las características, ventajas e inconvenientes de todos ellos y muestra algunos ejemplos de tests concretos. Palabras clave: Test adaptativo informatizado, Test situacional, Test comportamental, Test ipsativo y generación automática de ítems.The paper provides a short description of some test types that are earning considerable interest in both research and applied areas. The main feature of a computerized adaptive test is that in despite of the examinees receiving different sets of items, their test scores are in the same metric and can be directly compared. Four other test types are considered: a) model-based tests (a model or theory is available to explain the item response process and this makes the prediction of item difficulties possible), b) ipsative tests (the examinee has to select one among two or more options with similar social desirability; so, these tests can help to control faking or other examinee’s response biases), c) behavioral tests (personality traits are measured from non-verbal responses rather than from self-reports), and d) situational tests (the examinee faces a conflictive situation and has to select the option that best describes what he or she will do). The paper evaluates these types of tests, comments on their pros and cons and provides some specific examples. Key words: Computerized adaptive test, Situational test, Behavioral test, Ipsative test and y automatic item generation. %B Papeles del Psicólogo %V 31 %P 94-107 %G eng %0 Generic %D 2010 %T Validation of a computer-adaptive test to evaluate generic health-related quality of life %A Rebollo, P. %A Castejon, I. %A Cuervo, J. %A Villa, G. %A Garcia-Cueto, E. %A Diaz-Cuervo, H. %A Zardain, P. C. %A Muniz, J. %A Alonso, J. %X BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. 
Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. %B Health and Quality of Life Outcomes %7 2010/12/07 %V 8 %P 147 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G eng %M 21129169 %2 3022567 %0 Journal Article %J Psicothema %D 2009 %T Comparison of methods for controlling maximum exposure rates in computerized adaptive testing %A Barrada, J %A Abad, F. J. %A Veldkamp, B. P. %K *Numerical Analysis, Computer-Assisted %K Psychological Tests/*standards/*statistics & numerical data %X This paper has two objectives: (a) to provide a clear description of three methods for controlling the maximum exposure rate in computerized adaptive testing —the Symson-Hetter method, the restricted method, and the item-eligibility method— showing how all three can be interpreted as methods for constructing the variable sub-bank of items from which each examinee receives the items in his or her test; (b) to indicate the theoretical and empirical limitations of each method and to compare their performance. With the three methods, we obtained basically indistinguishable results in overlap rate and RMSE (differences in the third decimal place). The restricted method is the best method for controlling exposure rate, followed by the item-eligibility method. The worst method is the Sympson-Hetter method. The restricted method presents problems of sequential overlap rate. Our advice is to use the item-eligibility method, as it saves time and satisfies the goals of restricting maximum exposure. Comparación de métodos para el control de tasa máxima en tests adaptativos informatizados. 
Este artículo tiene dos objetivos: (a) ofrecer una descripción clara de tres métodos para el control de la tasa máxima en tests adaptativos informatizados, el método Symson-Hetter, el método restringido y el métodode elegibilidad del ítem, mostrando cómo todos ellos pueden interpretarse como métodos para la construcción del subbanco de ítems variable, del cual cada examinado recibe los ítems de su test; (b) señalar las limitaciones teóricas y empíricas de cada método y comparar sus resultados. Se obtienen resultados básicamente indistinguibles en tasa de solapamiento y RMSE con los tres métodos (diferencias en la tercera posición decimal). El método restringido es el mejor en el control de la tasa de exposición,seguido por el método de elegibilidad del ítem. El peor es el método Sympson-Hetter. El método restringido presenta un problema de solapamiento secuencial. Nuestra recomendación sería utilizar el método de elegibilidad del ítem, puesto que ahorra tiempo y satisface los objetivos de limitar la tasa máxima de exposición. %B Psicothema %7 2009/05/01 %V 21 %P 313-320 %8 May %@ 0214-9915 (Print)0214-9915 (Linking) %G eng %M 19403088 %0 Book Section %D 2009 %T A comparison of three methods of item selection for computerized adaptive testing %A Costa, D. R. %A Karino, C. A. %A Moura, F. A. S. %A Andrade, D. F. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Journal of Pain %D 2009 %T Development and preliminary testing of a computerized adaptive assessment of chronic pain %A Anatchkova, M. D. %A Saris-Baglama, R. N. %A Kosinski, M. %A Bjorner, J. B. %K *Computers %K *Questionnaires %K Activities of Daily Living %K Adaptation, Psychological %K Chronic Disease %K Cohort Studies %K Disability Evaluation %K Female %K Humans %K Male %K Middle Aged %K Models, Psychological %K Outcome Assessment (Health Care) %K Pain Measurement/*methods %K Pain, Intractable/*diagnosis/psychology %K Psychometrics %K Quality of Life %K User-Computer Interface %X The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (kappa = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYHNA) software and administered to a clinical sample of pain sufferers (n = 100). The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain. 
%B Journal of Pain %7 2009/07/15 %V 10 %P 932-943 %8 Sep %@ 1528-8447 (Electronic)1526-5900 (Linking) %G eng %M 19595636 %2 2763618 %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Direct and Inverse Problems of Item Pool Design for Computerized Adaptive Testing %A Belov, Dmitry I. %A Armstrong, Ronald D. %X

The recent literature on computerized adaptive testing (CAT) has developed methods for creating CAT item pools from a large master pool. Each CAT pool is designed as a set of nonoverlapping forms reflecting the skill levels of an assumed population of test takers. This article presents a Monte Carlo method to obtain these CAT pools and discusses its advantages over existing methods. Also, a new problem is considered that finds a population ability density function best matching the master pool. An analysis of the solution to this new problem provides testing organizations with effective guidance for maintaining their master pools. Computer experiments with a pool of Law School Admission Test items and its assembly constraints are presented.

%B Educational and Psychological Measurement %V 69 %P 533-547 %U http://epm.sagepub.com/content/69/4/533.abstract %R 10.1177/0013164409332224 %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Direct and inverse problems of item pool design for computerized adaptive testing %A Belov, D. I. %A Armstrong, R. D. %B Educational and Psychological Measurement %V 69 %P 533-547 %G eng %0 Book Section %D 2009 %T Features of J-CAT (Japanese Computerized Adaptive Test) %A Imai, S. %A Ito, S. %A Nakamura, Y. %A Kikuchi, K. %A Akagi, Y. %A Nakasono, H. %A Honda, A. %A Hiramura, T. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Methodology %D 2009 %T Item selection rules in computerized adaptive testing: Accuracy and security %A Barrada, J %A Olea, J. %A Ponsoda, V. %A Abad, F. J. %B Methodology %V 5 %P 7-17 %G eng %0 Journal Article %J European Journal of Operational Research %D 2009 %T A mixed integer programming model for multiple stage adaptive testing %A Edmonds, J. %A Armstrong, R. D. %K Education %K Integer programming %K Linear programming %X The last decade has seen paper-and-pencil (P&P) tests being replaced by computerized adaptive tests (CATs) within many testing programs. A CAT may yield several advantages relative to a conventional P&P test. A CAT can determine the questions or test items to administer, allowing each test form to be tailored to a test taker's skill level. Subsequent items can be chosen to match the capability of the test taker. By adapting to a test taker's ability, a CAT can acquire more information about a test taker while administering fewer items. A Multiple Stage Adaptive test (MST) provides a means to implement a CAT that allows review before the administration. The MST format is a hybrid between the conventional P&P and CAT formats. This paper presents mixed integer programming models for MST assembly problems. Computational results with commercial optimization software will be given and advantages of the models evaluated. %B European Journal of Operational Research %V 193 %P 342-350 %@ 0377-2217 %G eng %0 Generic %D 2009 %T Proposta para a construo de um Teste Adaptativo Informatizado baseado na Teoria da Resposta ao Item (Proposal for the construction of a Computerized Adaptive Testing based on the Item Response Theory) %A Moreira Junior, F. J. %A Andrade, D. F. %C Poster session presented at the Congresso Brasileiro de Teoria da Resposta ao Item, Florianpolis SC Brazil %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Studying the Equivalence of Computer-Delivered and Paper-Based Administrations of the Raven Standard Progressive Matrices Test %A Arce-Ferrer, Alvaro J. %A Martínez Guzmán, Elvira %X

This study investigates the effect of mode of administration of the Raven Standard Progressive Matrices test on distribution, accuracy, and meaning of raw scores. A random sample of high school students takes counterbalanced paper-and-pencil and computer-based administrations of the test and answers a questionnaire surveying preferences for computer-delivered test administrations. The administration mode effect is studied with repeated measures multivariate analysis of variance, internal consistency reliability estimates, and confirmatory factor analysis approaches. Results show a lack of test mode effect on distribution, accuracy, and meaning of raw scores. Participants indicate their preferences for the computer-delivered administration of the test. The article discusses findings in light of previous studies of the Raven Standard Progressive Matrices test.

%B Educational and Psychological Measurement %V 69 %P 855-867 %U http://epm.sagepub.com/content/69/5/855.abstract %R 10.1177/0013164409332219 %0 Book Section %D 2009 %T Test overlap rate and item exposure rate as indicators of test security in CATs %A Barrada, J %A Olea, J. %A Ponsoda, V. %A Abad, F. J. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Using automatic item generation to address item demands for CAT %A Lai, H. %A Alves, C. %A Gierl, M. J. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Disability & Rehabilitation %D 2008 %T Efficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning %A Allen, D. D. %A Ni, P. %A Haley, S. M. %K *Disability Evaluation %K Child %K Computers %K Disabled Children/*classification/rehabilitation %K Efficiency %K Humans %K Outcome Assessment (Health Care) %K Psychometrics %K Reproducibility of Results %K Retrospective Studies %K Self Care %K Sensitivity and Specificity %X PURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden. %B Disability & Rehabilitation %7 2008/02/26 %V 30 %P 479-84 %@ 0963-8288 (Print)0963-8288 (Linking) %G eng %M 18297502 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2008 %T Incorporating randomness in the Fisher information for improving item-exposure control in CATs %A Barrada, J %A Olea, J. %A Ponsoda, V. %A Abad, F. J. %B British Journal of Mathematical and Statistical Psychology %V 61 %P 493-513 %G eng %0 Journal Article %J Spine %D 2008 %T Letting the CAT out of the bag: Comparing computer adaptive tests and an 11-item short form of the Roland-Morris Disability Questionnaire %A Cook, K. F. %A Choi, S. W. %A Crane, P. K. %A Deyo, R. A. %A Johnson, K. L. %A Amtmann, D. 
%K *Disability Evaluation %K *Health Status Indicators %K Adult %K Aged %K Aged, 80 and over %K Back Pain/*diagnosis/psychology %K Calibration %K Computer Simulation %K Diagnosis, Computer-Assisted/*standards %K Humans %K Middle Aged %K Models, Psychological %K Predictive Value of Tests %K Questionnaires/*standards %K Reproducibility of Results %X STUDY DESIGN: A post hoc simulation of a computer adaptive administration of the items of a modified version of the Roland-Morris Disability Questionnaire. OBJECTIVE: To evaluate the effectiveness of adaptive administration of back pain-related disability items compared with a fixed 11-item short form. SUMMARY OF BACKGROUND DATA: Short form versions of the Roland-Morris Disability Questionnaire have been developed. An alternative to paper-and-pencil short forms is to administer items adaptively so that items are presented based on a person's responses to previous items. Theoretically, this allows precise estimation of back pain disability with administration of only a few items. MATERIALS AND METHODS: Data were gathered from 2 previously conducted studies of persons with back pain. An item response theory model was used to calibrate scores based on all items, items of a paper-and-pencil short form, and several computer adaptive tests (CATs). RESULTS: Correlations between each CAT condition and scores based on a 23-item version of the Roland-Morris Disability Questionnaire ranged from 0.93 to 0.98. Compared with an 11-item short form, an 11-item CAT produced scores that were significantly more highly correlated with scores based on the 23-item scale. CATs with even fewer items also produced scores that were highly correlated with scores based on all items. For example, scores from a 5-item CAT had a correlation of 0.93 with full scale scores. Seven- and 9-item CATs correlated at 0.95 and 0.97, respectively. A CAT with a standard-error-based stopping rule produced scores that correlated at 0.95 with full scale scores. CONCLUSION: A CAT-based back pain-related disability measure may be a valuable tool for use in clinical and research contexts. Use of CAT for other common measures in back pain research, such as other functional scales or measures of psychological distress, may offer similar advantages. %B Spine %7 2008/05/23 %V 33 %P 1378-83 %8 May 20 %@ 1528-1159 (Electronic) %G eng %M 18496352 %0 Journal Article %J Applied Psychological Measurement %D 2008 %T A monte carlo approach for adaptive testing with content constraints %A Belov, D. I. %A Armstrong, R. D. %A Weissman, A. %B Applied Psychological Measurement %V 32 %P 431-446 %R 10.1177/0146621607309081 %0 Journal Article %J Applied Psychological Measurement %D 2008 %T A Monte Carlo Approach for Adaptive Testing With Content Constraints %A Belov, Dmitry I. %A Armstrong, Ronald D. %A Weissman, Alexander %X

This article presents a new algorithm for computerized adaptive testing (CAT) when content constraints are present. The algorithm is based on shadow CAT methodology to meet content constraints but applies Monte Carlo methods and provides the following advantages over shadow CAT: (a) lower maximum item exposure rates, (b) higher utilization of the item pool, and (c) more robust ability estimates. Computer simulations with Law School Admission Test items demonstrated that the new algorithm (a) produces similar ability estimates as shadow CAT but with half the maximum item exposure rate and 100% pool utilization and (b) produces more robust estimates when a high- (or low-) ability examinee performs poorly (or well) at the beginning of the test.

%B Applied Psychological Measurement %V 32 %P 431-446 %U http://apm.sagepub.com/content/32/6/431.abstract %R 10.1177/0146621607309081 %0 Journal Article %J Applied Psychological Measurement %D 2008 %T A Monte Carlo Approach to the Design, Assembly, and Evaluation of Multistage Adaptive Tests %A Belov, Dmitry I. %A Armstrong, Ronald D. %X

This article presents an application of Monte Carlo methods for developing and assembling multistage adaptive tests (MSTs). A major advantage of the Monte Carlo assembly over other approaches (e.g., integer programming or enumerative heuristics) is that it provides a uniform sampling from all MSTs (or MST paths) available from a given item pool. The uniform sampling allows a statistically valid analysis for MST design and evaluation. Given an item pool, MST model, and content constraints for test assembly, three problems are addressed in this study. They are (a) the construction of item response theory (IRT) targets for each MST path, (b) the assembly of an MST such that each path satisfies content constraints and IRT constraints, and (c) an analysis of the pool and constraints to increase the number of nonoverlapping MSTs that can be assembled from the pool. The primary intent is to produce reliable measurements and enhance pool utilization.

%B Applied Psychological Measurement %V 32 %P 119-137 %U http://apm.sagepub.com/content/32/2/119.abstract %R 10.1177/0146621606297308 %0 Journal Article %J Spanish Journal of Psychology %D 2008 %T Rotating item banks versus restriction of maximum exposure rates in computerized adaptive testing %A Barrada, J %A Olea, J. %A Abad, F. J. %K *Character %K *Databases %K *Software Design %K Aptitude Tests/*statistics & numerical data %K Bias (Epidemiology) %K Computing Methodologies %K Diagnosis, Computer-Assisted/*statistics & numerical data %K Educational Measurement/*statistics & numerical data %K Humans %K Mathematical Computing %K Psychometrics/statistics & numerical data %X

If examinees were to know, beforehand, part of the content of a computerized adaptive test, their estimated trait levels would then have a marked positive bias. One of the strategies to avoid this consists of dividing a large item bank into several sub-banks and rotating the sub-bank employed (Ariel, Veldkamp & van der Linden, 2004). This strategy permits substantial improvements in exposure control at little cost to measurement accuracy. However, we do not know whether this option provides better results than using the master bank with greater restriction in the maximum exposure rates (Sympson & Hetter, 1985). In order to investigate this issue, we worked with several simulated banks of 2100 items, comparing them, for RMSE and overlap rate, with the same banks divided into two, three... up to seven sub-banks. By means of extensive manipulation of the maximum exposure rate in each bank, we found that the option of rotating banks slightly outperformed the option of restricting the maximum exposure rate of the master bank by means of the Sympson-Hetter method.

%B Spanish Journal of Psychology %7 2008/11/08 %V 11 %P 618-625 %@ 1138-7416 %G eng %M 18988447 %0 Journal Article %J Medical Care %D 2007 %T The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years %A Cella, D. %A Yount, S. %A Rothrock, N. %A Gershon, R. C. %A Cook, K. F. %A Reeve, B. %A Ader, D. %A Fries, J.F. %A Bruce, B. %A Rose, M. %X The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we will summarize the organization and scientific activity of the PROMIS network during its first 2 years. %B Medical Care %V 45 %P S3-S11 %G eng %0 Journal Article %J European Journal of Psychological Assessment %D 2007 %T Psychometric properties of an emotional adjustment measure: An application of the graded response model %A Rubio, V. J. %A Aguado, D. %A Hontangas, P. M. %A Hernández, J. M. %K computerized adaptive tests %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K personnel recruitment %K Psychometrics %K Samejima's graded response model %K test reliability %K validity %X Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of the IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure that consists of a 28-six graded response items is shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) The invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B European Journal of Psychological Assessment %I Hogrefe & Huber Publishers GmbH: Germany %V 23 %P 39-46 %@ 1015-5759 (Print) %G eng %M 2007-01587-007 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2006 %T Assembling a computerized adaptive testing item pool as a set of linear tests %A van der Linden, W. J. %A Ariel, A. %A Veldkamp, B. P. %K Algorithms %K computerized adaptive testing %K item pool %K linear tests %K mathematical models %K statistics %K Test Construction %K Test Items %X Test-item writing efforts typically results in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information. 
If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, that violate the content constraints, and/or have unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires them to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. %B Journal of Educational and Behavioral Statistics %I Sage Publications: US %V 31 %P 81-99 %@ 1076-9986 (Print) %G eng %M 2007-08137-004 %0 Conference Paper %B Presented at the National Council on Measurement on Education %D 2006 %T A comparison of online calibration methods for a CAT %A Morgan, D. L. %A Way, W. D. %A Augemberg, K.E. %B Presented at the National Council on Measurement on Education %C San Francisco, CA %G eng %0 Journal Article %J Quality of Life Research %D 2006 %T Multidimensional computerized adaptive testing of the EORTC QLQ-C30: basic developments and evaluations %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Fayers, P. %A Sprangers, M. %A Bjorner, J. B. %K *Quality of Life %K *Self Disclosure %K Adult %K Female %K Health Status %K Humans %K Male %K Middle Aged %K Questionnaires/*standards %K User-Computer Interface %X OBJECTIVE: Self-report questionnaires are widely used to measure health-related quality of life (HRQOL). Ideally, such questionnaires should be adapted to the individual patient and at the same time scores should be directly comparable across patients. This may be achieved using computerized adaptive testing (CAT). Usually, CAT is carried out for a single domain at a time. However, many HRQOL domains are highly correlated. Multidimensional CAT may utilize these correlations to improve measurement efficiency. We investigated the possible advantages and difficulties of multidimensional CAT. STUDY DESIGN AND SETTING: We evaluated multidimensional CAT of three scales from the EORTC QLQ-C30: the physical functioning, emotional functioning, and fatigue scales. Analyses utilised a database with 2958 European cancer patients. RESULTS: It was possible to obtain scores for the three domains with five to seven items administered using multidimensional CAT that were very close to the scores obtained using all 12 items and with no or little loss of measurement precision. CONCLUSION: The findings suggest that multidimensional CAT may significantly improve measurement precision and efficiency and encourage further research into multidimensional CAT. Particularly, the estimation of the model underlying the multidimensional CAT and the conceptual aspects need further investigations. %B Quality of Life Research %7 2006/03/21 %V 15 %P 315-29 %8 Apr %@ 0962-9343 (Print) %G eng %M 16547770 %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Optimal Testlet Pool Assembly for Multistage Testing Designs %A Ariel, Adelaide %A Veldkamp, Bernard P. %A Breithaupt, Krista %X

Computerized multistage testing (MST) designs require sets of test questions (testlets) to be assembled to meet strict, often competing criteria. Rules that govern testlet assembly may dictate the number of questions on a particular subject or may describe desirable statistical properties for the test, such as measurement precision. In an MST design, testlets of differing difficulty levels must be created. Statistical properties for assembly of the testlets can be expressed using item response theory (IRT) parameters. The testlet test information function (TIF) value can be maximized at a specific point on the IRT ability scale. In practical MST designs, parallel versions of testlets are needed, so sets of testlets with equivalent properties are built according to equivalent specifications. In this project, the authors study the use of a mathematical programming technique to simultaneously assemble testlets to ensure equivalence and fairness to candidates who may be administered different testlets.

%B Applied Psychological Measurement %V 30 %P 204-215 %U http://apm.sagepub.com/content/30/3/204.abstract %R 10.1177/0146621605284350 %0 Journal Article %J International Journal of Testing %D 2005 %T Automated Simultaneous Assembly for Multistage Testing %A Breithaupt, Krista %A Ariel, Adelaide %A Veldkamp, Bernard P. %B International Journal of Testing %V 5 %P 319-330 %U http://www.tandfonline.com/doi/abs/10.1207/s15327574ijt0503_8 %R 10.1207/s15327574ijt0503_8 %0 Journal Article %J American Journal of Physical Medicine and Rehabilitation %D 2005 %T Measuring physical function in patients with complex medical and postsurgical conditions: a computer adaptive approach %A Siebens, H. %A Andres, P. L. %A Pengsheng, N. %A Coster, W. J. %A Haley, S. M. %K Activities of Daily Living/*classification %K Adult %K Aged %K Cohort Studies %K Continuity of Patient Care %K Disability Evaluation %K Female %K Health Services Research %K Humans %K Male %K Middle Aged %K Postoperative Care/*rehabilitation %K Prognosis %K Recovery of Function %K Rehabilitation Centers %K Rehabilitation/*standards %K Sensitivity and Specificity %K Sickness Impact Profile %K Treatment Outcome %X OBJECTIVE: To examine whether the range of disability in the medically complex and postsurgical populations receiving rehabilitation is adequately sampled by the new Activity Measure--Post-Acute Care (AM-PAC), and to assess whether computer adaptive testing (CAT) can derive valid patient scores using fewer questions. DESIGN: Observational study of 158 subjects (mean age 67.2 yrs) receiving skilled rehabilitation services in inpatient (acute rehabilitation hospitals, skilled nursing facility units) and community (home health services, outpatient departments) settings for recent-onset or worsening disability from medical (excluding neurological) and surgical (excluding orthopedic) conditions. Measures were interviewer-administered activity questions (all patients) and physical functioning portion of the SF-36 (outpatients) and standardized chart items (11 Functional Independence Measure (FIM), 19 Standardized Outcome and Assessment Information Set (OASIS) items, and 22 Minimum Data Set (MDS) items). Rasch modeling analyzed all data and the relationship between person ability estimates and average item difficulty. CAT assessed the ability to derive accurate patient scores using a sample of questions. RESULTS: The 163-item activity item pool covered the range of physical movement and personal and instrumental activities. CAT analysis showed comparable scores between estimates using 10 items or the total item pool. CONCLUSION: The AM-PAC can assess a broad range of function in patients with complex medical illness. CAT achieves valid patient scores using fewer questions. %B American Journal of Physical Medicine and Rehabilitation %V 84 %P 741-8 %8 Oct %G eng %M 16205429 %0 Journal Article %J Applied Psychological Measurement %D 2005 %T Monte Carlo Test Assembly for Item Pool Analysis and Extension %A Belov, Dmitry I. %A Armstrong, Ronald D. %X

A new test assembly algorithm based on a Monte Carlo random search is presented in this article. A major advantage of the Monte Carlo test assembly over other approaches (integer programming or enumerative heuristics) is that it performs a uniform sampling from the item pool, which provides every feasible item combination (test) with an equal chance of being built during an assembly. This allows the authors to address the following issues of pool analysis and extension: compare the strengths and weaknesses of different pools, identify the most restrictive constraint(s) for test assembly, and identify properties of the items that should be added to a pool to achieve greater usability of the pool. Computer experiments with operational pools are given.

%B Applied Psychological Measurement %V 29 %P 239-261 %U http://apm.sagepub.com/content/29/4/239.abstract %R 10.1177/0146621605275413 %0 Journal Article %J Psicothema %D 2005 %T Propiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test] %A Aguado, D. %A Rubio, V. J. %A Hontangas, P. M. %A Hernández, J. M. %K Computer Assisted Testing %K Emotional Adjustment %K Item Response %K Personality Measures %K Psychometrics %K Test Validity %K Theory %X En el presente trabajo se describen las propiedades psicométricas de un Test Adaptativo Informatizado para la medición del ajuste emocional de las personas. La revisión de la literatura acerca de la aplicación de los modelos de la teoría de la respuesta a los ítems (TRI) muestra que ésta se ha utilizado más en el trabajo con variables aptitudinales que para la medición de variables de personalidad, sin embargo diversos estudios han mostrado la eficacia de la TRI para la descripción psicométrica de dichas variables. Aun así, pocos trabajos han explorado las características de un Test Adaptativo Informatizado, basado en la TRI, para la medición de una variable de personalidad como es el ajuste emocional. Nuestros resultados muestran la eficiencia del TAI para la evaluación del ajuste emocional, proporcionando una medición válida y precisa, utilizando menor número de elementos de medida en comparación con las escalas de ajuste emocional de instrumentos fuertemente implantados. Psychometric properties of an emotional adjustment computerized adaptive test. In the present work, the psychometric properties of an emotional adjustment computerized adaptive test are described. An examination of the item response theory (IRT) research literature indicates that IRT has been used mainly for assessing achievement and ability rather than personality factors. Nevertheless, in recent years several studies have successfully applied IRT to personality assessment instruments. Even so, few studies have examined the characteristics of an IRT-based computerized adaptive test for measuring a personality trait such as emotional adjustment. Our results show that the CAT assesses emotional adjustment efficiently, providing a valid and accurate measurement while using fewer items than the emotional adjustment scales of well-established questionnaires. %B Psicothema %V 17 %P 484-491 %G eng %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P.
%K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Journal Article %J Medical Care %D 2004 %T Activity outcome measurement for postacute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Ludlow, L. H. %A Ni, P. %A Bond, T. L. %A Sinclair, S. J. %A Jette, A. M. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aftercare/*standards/statistics & numerical data %K Aged %K Boston %K Cognition/physiology %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Human %K Male %K Middle Aged %K Movement/physiology %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Psychometrics %K Questionnaires/standards %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S. %K Support, U.S. Gov't, P.H.S. %X BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement.
METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. %B Medical Care %V 42 %P I49-161 %G eng %M 14707755 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T Automated Simultaneous Assembly of Multi-Stage Testing for the Uniform CPA Examination %A Breithaupt, K %A Ariel, A. %A Veldkamp, B. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Stroke Rehabilitation %D 2004 %T Computer adaptive testing: a strategy for monitoring stroke rehabilitation across settings %A Andres, P. L. %A Black-Schaffer, R. M. %A Ni, P. %A Haley, S. M. %K *Computer Simulation %K *User-Computer Interface %K Adult %K Aged %K Aged, 80 and over %K Cerebrovascular Accident/*rehabilitation %K Disabled Persons/*classification %K Female %K Humans %K Male %K Middle Aged %K Monitoring, Physiologic/methods %K Severity of Illness Index %K Task Performance and Analysis %X Current functional assessment instruments in stroke rehabilitation are often setting-specific and lack precision, breadth, and/or feasibility. Computer adaptive testing (CAT) offers a promising potential solution by providing a quick, yet precise, measure of function that can be used across a broad range of patient abilities and in multiple settings. CAT technology yields a precise score by selecting very few relevant items from a large and diverse item pool based on each individual's responses. We demonstrate the potential usefulness of a CAT assessment model with a cross-sectional sample of persons with stroke from multiple rehabilitation settings. %B Stroke Rehabilitation %7 2004/05/01 %V 11 %P 33-39 %8 Spring %@ 1074-9357 (Print) %G eng %M 15118965 %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Computerized adaptive testing with multiple-form structures %A Armstrong, R. D. %A Jones, D. H. %A Koppel, N. B. %A Pashley, P. J. 
%K computerized adaptive testing %K Law School Admission Test %K multiple-form structure %K testlets %X A multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of an examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT). (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Applied Psychological Measurement %I Sage Publications: US %V 28 %P 147-164 %@ 0146-6216 (Print) %G eng %M 2004-13800-001 %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Computerized Adaptive Testing With Multiple-Form Structures %A Armstrong, Ronald D. %A Jones, Douglas H. %A Koppel, Nicole B. %A Pashley, Peter J. %X

A multiple-form structure (MFS) is an ordered collection or network of testlets (i.e., sets of items). An examinee's progression through the network of testlets is dictated by the correctness of the examinee's answers, thereby adapting the test to his or her trait level. The collection of paths through the network yields the set of all possible test forms, allowing test specialists the opportunity to review them before they are administered. Also, limiting the exposure of an individual MFS to a specific period of time can enhance test security. This article provides an overview of methods that have been developed to generate parallel MFSs. The approach is applied to the assembly of an experimental computerized Law School Admission Test (LSAT).

%B Applied Psychological Measurement %V 28 %P 147-164 %U http://apm.sagepub.com/content/28/3/147.abstract %R 10.1177/0146621604263652 %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Constructing rotating item pools for constrained adaptive testing %A Ariel, A. %A Veldkamp, B. P. %A van der Linden, W. J. %K computerized adaptive tests %K constrained adaptive testing %K item exposure %K rotating item pools %X Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 41 %P 345-359 %@ 0022-0655 (Print) %G eng %M 2004-21596-004 %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %K computerized adaptive testing %K item selection rules %K practical constraints %X The purpose of this study was to compare the effects of four item selection rules--(1) Fisher information (F), (2) Fisher information with a posterior distribution (FP), (3) Kullback-Leibler information with a posterior distribution (KP), and (4) completely randomized item selection (RN)--with respect to the precision of trait estimation and the extent of item usage at the early stages of computerized adaptive testing. The comparison of the four item selection rules was carried out under three conditions: (1) using only the item information function as the item selection criterion; (2) using both the item information function and content balancing; and (3) using the item information function, content balancing, and item exposure control. When test length was less than 10 items, FP and KP tended to outperform F at extreme trait levels in Condition 1. However, in more realistic settings, it could not be concluded that FP and KP outperformed F, especially when item exposure control was imposed. When test length was greater than 10 items, the three nonrandom item selection procedures performed similarly no matter what the condition was, while F had slightly higher item usage. 
(PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 41 %P 149-174 %@ 0022-0655 (Print) %G eng %M 2005-04771-004 %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %B Journal of Educational Measurement %V 41 %P 149-174 %G eng %0 Book Section %B Intelligent Tutoring Systems %D 2004 %T A Learning Environment for English for Academic Purposes Based on Adaptive Tests and Task-Based Systems %A Gonçalves, Jean P. %A Aluisio, Sandra M. %A de Oliveira, Leandro H.M. %A Oliveira Jr., Osvaldo N. %E Lester, James C. %E Vicari, Rosa Maria %E Paraguaçu, Fábio %B Intelligent Tutoring Systems %S Lecture Notes in Computer Science %I Springer Berlin / Heidelberg %V 3220 %P 1-11 %@ 978-3-540-22948-3 %G eng %U http://dx.doi.org/10.1007/978-3-540-30139-4_1 %R 10.1007/978-3-540-30139-4_1 %0 Conference Paper %B Intelligent Tutoring Systems. %D 2004 %T A learning environment for English for academic purposes based on adaptive tests and task-based systems %A PITON-GONÇALVES, J. %A ALUISIO, S. M. %A MENDONCA, L. H. %A NOVAES, O. O. %B Intelligent Tutoring Systems. %I Springer Berlin Heidelberg %G eng %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Mokken Scale Analysis Using Hierarchical Clustering Procedures %A van Abswoude, Alexandra A. H. %A Vermunt, Jeroen K. %A Hemker, Bas T. %A van der Ark, L. Andries %X

Mokken scale analysis (MSA) can be used to assess and build unidimensional scales from an item pool that is sensitive to multiple dimensions. These scales satisfy a set of scaling conditions, one of which follows from the model of monotone homogeneity. An important drawback of the MSA program is that the sequential item selection and scale construction procedure may not find the dominant underlying dimensionality of the responses to a set of items. The authors investigated alternative hierarchical item selection procedures and compared the performance of four hierarchical methods and the sequential clustering method in the MSA context. The results showed that hierarchical clustering methods can improve the search process of the dominant dimensionality of a data matrix. In particular, the complete linkage and scale linkage methods were promising in finding the dimensionality of the item response data from a set of items.

%B Applied Psychological Measurement %V 28 %P 332-354 %U http://apm.sagepub.com/content/28/5/332.abstract %R 10.1177/0146621604265510 %0 Journal Article %J Medical Care %D 2004 %T Refining the conceptual basis for rehabilitation outcome measurement: personal care and instrumental activities domain %A Coster, W. J. %A Haley, S. M. %A Andres, P. L. %A Ludlow, L. H. %A Bond, T. L. %A Ni, P. S. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aged %K Aged, 80 and over %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Questionnaires/*standards %K Recovery of Function/physiology %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Research Support, U.S. Gov't, Non-P.H.S. %K Research Support, U.S. Gov't, P.H.S. %K Sensitivity and Specificity %X BACKGROUND: Rehabilitation outcome measures routinely include content on performance of daily activities; however, the conceptual basis for item selection is rarely specified. These instruments differ significantly in format, number, and specificity of daily activity items and in the measurement dimensions and type of scale used to specify levels of performance. We propose that a requirement for upper limb and hand skills underlies many activities of daily living (ADL) and instrumental activities of daily living (IADL) items in current instruments, and that items selected based on this definition can be placed along a single functional continuum. OBJECTIVE: To examine the dimensional structure and content coverage of a Personal Care and Instrumental Activities item set and to examine the comparability of items from existing instruments and a set of new items as measures of this domain. METHODS: Participants (N = 477) from 3 different disability groups and 4 settings representing the continuum of postacute rehabilitation care were administered the newly developed Activity Measure for Post-Acute Care (AM-PAC), the SF-8, and an additional setting-specific measure: FIM (in-patient rehabilitation); MDS (skilled nursing facility); MDS-PAC (postacute settings); OASIS (home care); or PF-10 (outpatient clinic). Rasch (partial-credit model) analyses were conducted on a set of 62 items covering the Personal Care and Instrumental domain to examine item fit, item functioning, and category difficulty estimates and unidimensionality. RESULTS: After removing 6 misfitting items, the remaining 56 items fit acceptably along the hypothesized continuum. Analyses yielded different difficulty estimates for the maximum score (eg, "Independent performance") for items with comparable content from different instruments. Items showed little differential item functioning across age, diagnosis, or severity groups, and 92% of the participants fit the model. CONCLUSIONS: ADL and IADL items from existing rehabilitation outcomes instruments that depend on skilled upper limb and hand use can be located along a single continuum, along with the new personal care and instrumental items of the AM-PAC addressing gaps in content. Results support the validity of the proposed definition of the Personal Care and Instrumental Activities dimension of function as a guide for future development of rehabilitation outcome instruments, such as linked, setting-specific short forms and computerized adaptive testing approaches. 
%B Medical Care %V 42 %P I62-172 %8 Jan %G eng %M 14707756 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2004 %T Score comparability of short forms and computerized adaptive testing: Simulation study with the activity measure for post-acute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. L. %A Kosinski, M. %A Ni, P. %K Boston %K Factor Analysis, Statistical %K Humans %K Outcome Assessment (Health Care)/*methods %K Prospective Studies %K Questionnaires/standards %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To compare simulated short-form and computerized adaptive testing (CAT) scores to scores obtained from complete item sets for each of the 3 domains of the Activity Measure for Post-Acute Care (AM-PAC). DESIGN: Prospective study. SETTING: Six postacute health care networks in the greater Boston metropolitan area, including inpatient acute rehabilitation, transitional care units, home care, and outpatient services. PARTICIPANTS: A convenience sample of 485 adult volunteers who were receiving skilled rehabilitation services. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Inpatient and community-based short forms and CAT applications were developed for each of 3 activity domains (physical & mobility, personal care & instrumental, applied cognition) using item pools constructed from new items and items from existing postacute care instruments. RESULTS: Simulated CAT scores correlated highly with score estimates from the total item pool in each domain (4- and 6-item CAT r range,.90-.95; 10-item CAT r range,.96-.98). Scores on the 10-item short forms constructed for inpatient and community settings also provided good estimates of the AM-PAC item pool scores for the physical & movement and personal care & instrumental domains, but were less consistent in the applied cognition domain. Confidence intervals around individual scores were greater in the short forms than for the CATs. CONCLUSIONS: Accurate scoring estimates for AM-PAC domains can be obtained with either the setting-specific short forms or the CATs. The strong relationship between CAT and item pool scores can be attributed to the CAT's ability to select specific items to match individual responses. The CAT may have additional advantages over short forms in practicality, efficiency, and the potential for providing more precise scoring estimates for individuals. %B Archives of Physical Medicine and Rehabilitation %7 2004/04/15 %V 85 %P 661-6 %8 Apr %@ 0003-9993 (Print) %G eng %M 15083444 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T A study of multiple stage adaptive test designs %A Armstrong, R. D. %A Edmonds, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T The assembly of multiple form structures %A Armstrong, R. D. %A Little, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 2003 %T The assembly of multiple stage adaptive tests with discrete items %A Armstrong, R. D. %A Edmonds, J.J. %C Newtown, PA: Law School Admission Council Report %G eng %0 Journal Article %J Journal of Educational Measurement %D 2003 %T A comparative study of item exposure control methods in computerized adaptive testing %A Chang, S-W. %A Ansley, T. N. 
%K Adaptive Testing %K Computer Assisted Testing %K Educational %K Item Analysis (Statistical) %K Measurement %K Strategies computerized adaptive testing %X This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The M. L. Stocking and C. Lewis conditional multinomial procedure and, to a slightly lesser extent, the T. Davey and C. G. Parshall method seemed to be the most promising considering all of the factors that this study addressed. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 40 %P 71-103 %G eng %0 Journal Article %J International Journal of Selection and Assessment %D 2003 %T Computerized adaptive rating scales for measuring managerial performance %A Schneider, R. J. %A Goff, M. %A Anderson, S. %A Borman, W. C. %K Adaptive Testing %K Algorithms %K Associations %K Citizenship %K Computer Assisted Testing %K Construction %K Contextual %K Item Response Theory %K Job Performance %K Management %K Management Personnel %K Rating Scales %K Test %X Computerized adaptive rating scales (CARS) had been developed to measure contextual or citizenship performance. This rating format used a paired-comparison protocol, presenting pairs of behavioral statements scaled according to effectiveness levels, and an iterative item response theory algorithm to obtain estimates of ratees' citizenship performance (W. C. Borman et al, 2001). In the present research, we developed CARS to measure the entire managerial performance domain, including task and citizenship performance, thus addressing a major limitation of the earlier CARS. The paper describes this development effort, including an adjustment to the algorithm that reduces substantially the number of item pairs required to obtain almost as much precision in the performance estimates. (PsycINFO Database Record (c) 2005 APA ) %B International Journal of Selection and Assessment %V 11 %P 237-246 %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Constructing rotating item pools for constrained adaptive testing %A Ariel, A. %A Veldkamp, B. %A van der Linden, W. J. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Generic %D 2003 %T A method to determine targets for multi-stage adaptive tests %A Armstrong, R. D. %A Roussos, L. %C Unpublished manuscript %G eng %0 Journal Article %J Journal of Educational Measurement %D 2003 %T The relationship between item exposure and test overlap in computerized adaptive testing %A Chen, S. %A Ankenmann, R. D. %A Spray, J. A. 
%B Journal of Educational Measurement %V 40 %P 129-145 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2003 %T The relationship between item exposure and test overlap in computerized adaptive testing %A Chen, S. %A Ankenmann, R. D. %A Spray, J. A. %B Journal of Educational Measurement %V 40 %P 129-145 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2003 %T The relationship between item exposure and test overlap in computerized adaptive testing %A Chen, S-Y. %A Ankemann, R. D. %A Spray, J. A. %K (Statistical) %K Adaptive Testing %K Computer Assisted Testing %K Human Computer %K Interaction computerized adaptive testing %K Item Analysis %K Item Analysis (Test) %K Test Items %X The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r-sub(max)). Therefore, item exposure control methods which implement a specification of r-sub(max) (e.g., J. B. Sympson and R. D. Hetter, 1985) provide the most direct control at both the item and test levels. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 40 %P 129-145 %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T To stratify or not: An investigation of CAT item selection procedures under practical constraints %A Deng, H. %A Ansley, T. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Journal of Applied Psychology %D 2002 %T Computer-adaptive testing: The impact of test characteristics on perceived performance and test takers’ reactions %A Tonidandel, S. %A Quiñones, M. A. %A Adams, A. A. %B Journal of Applied Psychology %V 87 %P 320-332 %0 Report %D 2002 %T Mathematical-programming approaches to test item pool design %A Veldkamp, B. P. %A van der Linden, W. J. %A Ariel, A. %K Adaptive Testing %K Computer Assisted %K Computer Programming %K Educational Measurement %K Item Response Theory %K Mathematics %K Psychometrics %K Statistical Rotation computerized adaptive testing %K Test Items %K Testing %X (From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT. 
%I University of Twente, Faculty of Educational Science and Technology %C Twente, The Netherlands %P 93-108 %@ 02-09 %G eng %0 Journal Article %J Journal of Personality Assessment %D 2001 %T Evaluation of an MMPI-A short form: Implications for adaptive testing %A Archer, R. P. %A Tirrell, C. A. %A Elkins, D. E. %K Adaptive Testing %K Mean %K Minnesota Multiphasic Personality Inventory %K Psychometrics %K Statistical Correlation %K Statistical Samples %K Test Forms %X Reports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al, 1992) short form based on administration of the 1st 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Personality Assessment %V 76 %P 76-89 %G eng %0 Journal Article %J Applied Measurement in Education %D 2001 %T An examination of conditioning variables used in computer adaptive testing for DIF analyses %A Walker, C. M. %A Beretvas, S. N %A Ackerman, T. A. %B Applied Measurement in Education %V 14 %P 3-16 %0 Book %D 2001 %T The FastTEST Professional Testing System, Version 1.6 [Computer software] %A Assessment-Systems-Corporation %C St. Paul MN: Author %G eng %0 Journal Article %J Psicothema %D 2001 %T Pasado, presente y futuro de los test adaptativos informatizados: Entrevista con Isaac I. Béjar [Past, present and future of computerized adaptive testing: Interview with Isaac I. Béjar] %A Tejada, R. %A Antonio, J. %K computerized adaptive testing %X En este artículo se presenta el resultado de una entrevista con Isaac I. Bejar. El Dr. Bejar es actualmente Investigador Científico Principal y Director del Centro para el Diseño de Evaluación y Sistemas de Puntuación perteneciente a la División de Investigación del Servicio de Medición Educativa (Educa - tional Testing Service, Princeton, NJ, EE.UU.). El objetivo de esta entrevista fue conversar sobre el pasado, presente y futuro de los Tests Adaptativos Informatizados. En la entrevista se recogen los inicios de los Tests Adaptativos y de los Tests Adaptativos Informatizados y últimos avances que se desarrollan en el Educational Testing Service sobre este tipo de tests (modelos generativos, isomorfos, puntuación automática de ítems de ensayo…). Se finaliza con la visión de futuro de los Tests Adaptativos Informatizados y su utilización en España.Past, present and future of Computerized Adaptive Testing: Interview with Isaac I. Bejar. In this paper the results of an interview with Isaac I. Bejar are presented. Dr. 
Bejar is currently Principal Research Scientist and Director of the Center for Assessment Design and Scoring in the Research Division at Educational Testing Service (Princeton, NJ, U.S.A.). The aim of this interview was to review the past, present, and future of computerized adaptive tests. The beginnings of adaptive tests and computerized adaptive tests, and the latest advances developed at the Educational Testing Service (generative response models, isomorphs, automated scoring…), are reviewed. The future of computerized adaptive tests is analyzed, and their use in Spain is discussed. %B Psicothema %V 13 %P 685-690 %@ 0214-9915 %G eng %0 Journal Article %J Apuntes de Psicología %D 2001 %T Requerimientos, aplicaciones e investigación en tests adaptativos informatizados [Requirements, applications, and investigation in computerized adaptive testing] %A Olea Díaz, J. %A Ponsoda Gil, V. %A Revuelta Menéndez, J. %A Hontangas Beltrán, P. %A Abad, F. J. %K Computer Assisted Testing %K English as Second Language %K Psychometrics computerized adaptive testing %X Summarizes the main requirements and applications of computerized adaptive testing (CAT) with emphasis on the differences between CAT and conventional computerized tests. Psychometric properties of estimations based on CAT, item selection strategies, and implementation software are described. Results of CAT studies in Spanish-speaking samples are described. Implications for developing a CAT measuring the English vocabulary of Spanish-speaking students are discussed. (PsycINFO Database Record (c) 2005 APA ) %B Apuntes de Psicología %V 19 %P 11-28 %G eng %0 Journal Article %J Psicothema %D 2000 %T Algoritmo mixto mínima entropía-máxima información para la selección de ítems en un test adaptativo informatizado [A mixed minimum entropy-maximum information algorithm for item selection in computerized adaptive testing] %A Dorronsoro, J. R. %A Santa-Cruz, C. %A Rubio Franco, V. J. %A Aguado García, D. %K computerized adaptive testing %X El objetivo del estudio que presentamos es comparar la eficacia como estrategia de selección de ítems de tres algoritmos diferentes: a) basado en máxima información; b) basado en mínima entropía; y c) mixto, mínima entropía en los ítems iniciales y máxima información en el resto; bajo la hipótesis de que el algoritmo mixto puede dotar al TAI de mayor eficacia. Las simulaciones de procesos TAI se realizaron sobre un banco de 28 ítems de respuesta graduada calibrado según el modelo de Samejima, tomando como respuesta al TAI la respuesta original de los sujetos que fueron utilizados para la calibración. Los resultados iniciales muestran cómo el criterio mixto es más eficaz que cualquiera de los otros dos tomados independientemente. Dicha eficacia se maximiza cuando el algoritmo de mínima entropía se restringe a la selección de los primeros ítems del TAI, ya que con las respuestas a estos primeros ítems la estimación de θ comienza a ser relevante y el algoritmo de máxima información se optimiza. Item selection algorithms in computerized adaptive testing. The aim of this paper is to compare the efficacy of three different item selection algorithms in computerized adaptive testing (CAT): the first is based on item information, the second on minimum entropy, and the third is a mixture of the two (minimum entropy for the initial items and maximum information thereafter). The CAT process was simulated using an emotional adjustment item bank. This item bank contains 28 graded items in six categories, calibrated using Samejima's (1969) Graded Response Model.
The initial results show that the mixed criterion algorithm performs better than either of the other two. %B Psicothema %V 12 %P 12-14 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T A comparison of item selection rules at the early stages of computerized adaptive testing %A Chen, S-Y. %A Ankenmann, R. D. %A Chang, Hua-Hua %K Adaptive Testing %K Computer Assisted Testing %K Item Analysis (Test) %K Statistical Estimation computerized adaptive testing %X The effects of 5 item selection rules--Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP)--were compared with respect to the efficiency and precision of trait (θ) estimation at the early stages of computerized adaptive testing (CAT). FII, FIP, KL, and KLP performed marginally better than FI at the early stages of CAT for θ=-3 and -2. For tests longer than 10 items, there appeared to be no precision advantage for any of the selection rules. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 24 %P 241-255 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T A comparison of item selection rules at the early stages of computerized adaptive testing %A Chen, S.Y. %A Ankenmann, R. D. %A Chang, Hua-Hua %B Applied Psychological Measurement %V 24 %P 241-255 %G eng %0 Journal Article %J Metodología de las Ciencias del Comportamiento %D 2000 %T Los tests adaptativos informatizados en la frontera del siglo XXI: Una revisión [Computerized adaptive tests at the turn of the 21st century: A review] %A Hontangas, P. %A Ponsoda, V. %A Olea, J. %A Abad, F. J. %K computerized adaptive testing %B Metodología de las Ciencias del Comportamiento %V 2 %P 183-216 %@ 1575-9105 %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2000 %T Performance of item exposure control methods in computerized adaptive testing: Further explorations %A Chang, Hua-Hua %A Chang, S. %A Ansley %B Paper presented at the Annual Meeting of the American Educational Research Association %C New Orleans, LA %G eng %0 Journal Article %J Psicológica %D 2000 %T Psychometric and psychological effects of review on computerized fixed and adaptive tests %A Olea, J. %A Revuelta, J. %A Ximenez, M. C. %A Abad, F. J. %B Psicológica %V 21 %P 157-173 %G Spanish %0 Conference Paper %B Paper presented at the Computer-Assisted Testing Conference. %D 2000 %T Using constraints to develop and deliver adaptive tests %A Abdullah, S. C. %A Cooley, R. E. %B Paper presented at the Computer-Assisted Testing Conference. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T An examination of conditioning variables in DIF analysis in a computer adaptive testing environment %A Walker, C. M. %A Ackerman, T. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Generic %D 1999 %T Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing (ACT Research Report series 99-5) %A Chen, S-Y. %A Ankenmann, R. D. %A Spray, J. A. %C Iowa City IA: ACT, Inc %G eng %0 Generic %D 1999 %T Exploring the relationship between item exposure rate and test overlap rate in computerized adaptive testing %A Chen, S. %A Ankenmann, R. D.
%A Spray, J. A. %C Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Graphical models and computerized adaptive testing %A Almond, R. G. %A Mislevy, R. J. %K computerized adaptive testing %X Considers computerized adaptive testing from the perspective of graphical modeling (GM). GM provides methods for making inferences about multifaceted skills and knowledge and for extracting data from complex performances. Provides examples from language-proficiency assessment. (SLD) %B Applied Psychological Measurement %V 23 %P 223-37 %G eng %M EJ596307 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Use of conditional item exposure methodology for an operational CAT %A Anderson, D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Generic %D 1998 %T Computer adaptive testing – Approaches for item selection and measurement %A Armstrong, R. D. %A Jones, D. H. %C Rutgers Center for Operations Research, New Brunswick NJ %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Computerized adaptive testing with multiple form structures %A Armstrong, R. D. %A Jones, D. H. %A Berliner, N. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the colloquium %D 1998 %T Developing, maintaining, and renewing the item inventory to support computer-based testing %A Way, W. D. %A Steffen, M. %A Anderson, G. S. %B Paper presented at the colloquium %C Computer-Based Testing: Building the Foundation for Future Assessments, Philadelphia PA %G eng %0 Generic %D 1997 %T Unidimensional approximations for a computerized adaptive test when the item pool and latent space are multidimensional (Research Report 97-5) %A Spray, J. A. %A Abdel-Fattah, A. A. %A Huang, C.-Y. %A Lau, CA %C Iowa City IA: ACT Inc %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Effect of altering passing score in CAT when unidimensionality is violated %A Abdel-Fattah, A. A. %A Lau, CA %A Spray, J. A. %B Paper presented at the annual meeting of the American Educational Research Association %C New York NY %8 April %G eng %0 Book %D 1996 %T Users manual for the MicroCAT testing system, Version 3.5 %A Assessment-Systems-Corporation. %C St Paul MN: Assessment Systems Corporation %G eng %0 Conference Paper %B Poster session presented at the annual meeting of the American Educational Research Association %D 1996 %T Using unidimensional IRT models for dichotomous classification via CAT with multidimensional data %A Lau, CA %A Abdel-Fattah, A. A. %A Spray, J. A. %B Poster session presented at the annual meeting of the American Educational Research Association %C Boston MA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1995 %T The effect of model misspecification on classification decisions made using a computerized test: UIRT versus MIRT %A Abdel-Fattah, A. A. %A Lau, C.-M. A. %B Paper presented at the annual meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Book %D 1995 %T Guidelines for computer-adaptive test development and use in education %A American-Council-on-Education. 
%C Washington DC: Author %G eng %0 Journal Article %J Psychometrika %D 1995 %T Review of the book Computerized Adaptive Testing: A Primer %A Andrich, D. %B Psychometrika %V 4? %P 615-620 %G eng %0 Book %D 1994 %T Effects of computerized adaptive test anxiety on nursing licensure examinations %A Arrowwood, V. E. %C Dissertation Abstracts International, A (Humanities and Social Sciences), 54 (9-A), 3410 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1994 %T Establishing the comparability of the NCLEX using CAT with traditional NCLEX examinations %A Eignor, D. R. %A Way, W. D. %A Amoss, K.E. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA %G eng %0 Journal Article %J Educational and Psychological Measurement %D 1993 %T Linking the standard and advanced forms of the Ravens Progressive Matrices in both the paper-and-pencil and computer-adaptive-testing formats %A Styles, I. %A Andrich, D. %B Educational and Psychological Measurement %V 53 %P 905-925 %G eng %0 Journal Article %J Mesure et évaluation en éducation %D 1992 %T Le testing adaptatif avec interprétation critérielle, une expérience de praticabilité du TAM pour l’évaluation sommative des apprentissages au Québec. %A Auger, R. %E Seguin, S. P. %B Mesure et évaluation en éducation %V 15-1 et 2 %G French %& 10 %0 Journal Article %J Applied Psychological Measurement %D 1991 %T The use of unidimensional parameter estimates of multidimensional items in adaptive testing %A Ackerman, T. A. %B Applied Psychological Measurement %V 15 %P 13-24 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1991 %T The Use of Unidimensional Parameter Estimates of Multidimensional Items in Adaptive Testing %A Ackerman, T. A. %B Applied Psychological Measurement %V 15 %P 13-24 %G English %N 1 %0 Journal Article %D 1990 %T The construction of customized two-staged tests %A Adema, J. J. %V 27 %P 241-253 %G eng %0 Book %D 1989 %T Étude de praticabilité du testing adaptatif de maîtrise des apprentissages scolaires au Québec : une expérimentation en éducation économique secondaire 5 %A Auger, R. %C Thèse de doctorat non publiée. Montréal : Université du Québec à Montréal. [In French] %G eng %0 Book %D 1988 %T Users manual for the MicroCAT Testing System, Version 3 %A Assessment-Systems-Corporation %C St. Paul MN: Author. %G eng %0 Generic %D 1987 %T Adaptive testing, information, and the partial credit model %A Adams, R. J. %C Melbourne, Australia: University of Melbourne, Center for the Study of Higher Education %0 Report %D 1987 %T The use of unidimensional item parameter estimates of multidimensional items in adaptive testing %A Ackerman, T. A. %X Investigated the effect of using multidimensional (MDN) items in a computer adaptive test setting that assumes a unidimensional item response theory model in 2 experiments, using generated and real data in which difficulty was known to be confounded with dimensionality. Results from simulations suggest that univariate calibration of MDN data filtered out multidimensionality. The closer an item's MDN composite aligned itself with the calibrated univariate ability scale's orientation, the larger was the estimated discrimination parameter. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B ACT Research Reports %I ACT %C Iowa City, IA %P 33 %8 September, 1987 %@ 87-13 %G eng %0 Generic %D 1984 %T Analysis of experimental CAT ASVAB data %A Allred, L. 
A %A Green, B. F. %C Baltimore MD: Johns Hopkins University, Department of Psychology %0 Journal Article %J Journal of Educational Measurement %D 1984 %T Issues in item banking %A Millman, J. %A Arter, J.A. %B Journal of Educational Measurement %V 1 %P 315-330 %G eng %0 Book %D 1984 %T Users manual for the MicroCAT Testing System %A Assessment-Systems-Corporation %C St. Paul MN: Author %G eng %0 Book %D 1983 %T The stochastic modeling of elementary psychological processes %A Townsend, J. T. %A Ashby, G. F. %C Cambridge: Cambridge University Press %G eng %0 Generic %D 1982 %T An adaptive Private Pilot Certification Exam %A Trollip, S. R. %A Anderson, R. I. %C Aviation, Space, and Environmental Medicine %G eng %0 Journal Article %J British Journal of Educational Psychology %D 1980 %T A simple form of tailored testing %A Nisbet, J. %A Adams, M. %A Arthur, J. %B British Journal of Educational Psychology %V 50 %P 301-303 %0 Generic %D 1962 %T Exploratory study of a sequential item test %A Seeley, L. C. %A Morton, M. A. %A Anderson, A. A. %C U.S. Army Personnel Research Office, Technical Research Note 129. %G eng %0 Generic %D 1960 %T Construction of an experimental sequential item test (Research Memorandum 60-1) %A Bayroff, A. G. %A Thomas, J. J %A Anderson, A. A. %C Washington DC: Personnel Research Branch, Department of the Army %G eng %0 Generic %D 1958 %T The multi-level experiment: A study of a two-level test system for the College Board Scholastic Aptitude Test %A Angoff, W. H. Huddleston, E. M. %C Princeton NJ: Educational Testing Service %G eng %0 Journal Article %D 1953 %T An empirical study of the applicability of sequential analysis to item selection %A Anastasi, A. %V 13 %P 3-13 %G eng %0 Journal Article %D 1950 %T Sequential analysis with more than two alternative hypotheses, and its relation to discriminant function analysis %A Armitage, P. %V 12 %P 137-144 %G eng