%0 Journal Article
%J Journal of Applied Measurement
%D 2009
%T Considerations about expected a posteriori estimation in adaptive testing: adaptive a priori, adaptive correction for bias, and adaptive integration interval
%A Raiche, G.
%A Blais, J. G.
%K *Bias (Epidemiology)
%K *Computers
%K Data Interpretation, Statistical
%K Models, Statistical
%X In a computerized adaptive test, we would like to obtain an acceptable precision of the proficiency level estimate using an optimal number of items. Unfortunately, decreasing the number of items is accompanied by a certain degree of bias when the true proficiency level differs significantly from the a priori estimate. The authors suggest that it is possible to reduce the bias, and even the standard error of the estimate, by applying to each provisional estimate one or a combination of the following strategies: adaptive correction for bias proposed by Bock and Mislevy (1982), adaptive a priori estimate, and adaptive integration interval.
%B Journal of Applied Measurement
%7 2009/07/01
%V 10
%P 138-56
%@ 1529-7713 (Print); 1529-7713 (Linking)
%G eng
%M 19564695
%0 Journal Article
%J Rehabilitation Psychology
%D 2009
%T Development of an item bank for the assessment of depression in persons with mental illnesses and physical diseases using Rasch analysis
%A Forkmann, T.
%A Boecker, M.
%A Norra, C.
%A Eberle, N.
%A Kircher, T.
%A Schauerte, P.
%A Mischke, K.
%A Westhofen, M.
%A Gauggel, S.
%A Wirtz, M.
%K Adaptation, Psychological
%K Adult
%K Aged
%K Depressive Disorder/*diagnosis/psychology
%K Diagnosis, Computer-Assisted
%K Female
%K Heart Diseases/*psychology
%K Humans
%K Male
%K Mental Disorders/*psychology
%K Middle Aged
%K Models, Statistical
%K Otorhinolaryngologic Diseases/*psychology
%K Personality Assessment/statistics & numerical data
%K Personality Inventory/*statistics & numerical data
%K Psychometrics/statistics & numerical data
%K Questionnaires
%K Reproducibility of Results
%K Sick Role
%X OBJECTIVE: The calibration of item banks provides the basis for computerized adaptive testing that ensures high diagnostic precision and minimizes participants' test burden. The present study aimed at developing a new item bank that allows for assessing depression in persons with mental diseases and persons with somatic diseases. METHOD: The sample consisted of 161 participants treated for a depressive syndrome, and 206 participants with somatic illnesses (103 cardiologic, 103 otorhinolaryngologic; overall mean age = 44.1 years, SD = 14.0; 44.7% women) to allow for validation of the item bank in both groups. Persons answered a pool of 182 depression items on a 5-point Likert scale. RESULTS: Evaluation of Rasch model fit (infit < 1.3), differential item functioning, dimensionality, local independence, item spread, item and person separation (>2.0), and reliability (>.80) resulted in a bank of 79 items with good psychometric properties. CONCLUSIONS: The bank provides items with a wide range of content coverage and may serve as a sound basis for computerized adaptive testing applications. It might also be useful for researchers who wish to develop new fixed-length scales for the assessment of depression in specific rehabilitation settings.
%B Rehabilitation Psychology
%7 2009/05/28
%V 54
%P 186-97
%8 May
%@ 0090-5550 (Print); 0090-5550 (Linking)
%G eng
%M 19469609
%0 Journal Article
%J British Journal of Mathematical and Statistical Psychology
%D 2009
%T The maximum priority index method for severely constrained item selection in computerized adaptive testing
%A Cheng, Y.
%A Chang, H-H.
%K Aptitude Tests/*statistics & numerical data
%K Diagnosis, Computer-Assisted/*statistics & numerical data
%K Educational Measurement/*statistics & numerical data
%K Humans
%K Mathematical Computing
%K Models, Statistical
%K Personality Tests/*statistics & numerical data
%K Psychometrics/*statistics & numerical data
%K Reproducibility of Results
%K Software
%X This paper introduces a new heuristic approach, the maximum priority index (MPI) method, for severely constrained item selection in computerized adaptive testing. Our simulation study shows that it is able to accommodate various non-statistical constraints simultaneously, such as content balancing, exposure control, answer key balancing, and so on. Compared with the weighted deviation modelling method, it leads to fewer constraint violations and better exposure control while maintaining the same level of measurement precision.
%B British Journal of Mathematical and Statistical Psychology
%7 2008/06/07
%V 62
%P 369-83
%8 May
%@ 0007-1102 (Print); 0007-1102 (Linking)
%G eng
%M 18534047
%0 Journal Article
%J Journal of Applied Measurement
%D 2008
%T Binary items and beyond: a simulation of computer adaptive testing using the Rasch partial credit model
%A Lange, R.
%K *Data Interpretation, Statistical
%K *User-Computer Interface
%K Educational Measurement/*statistics & numerical data
%K Humans
%K Illinois
%K Models, Statistical
%X Past research on Computer Adaptive Testing (CAT) has focused almost exclusively on the use of binary items and minimizing the number of items to be administered. To address this situation, extensive computer simulations were performed using partial credit items with two, three, four, and five response categories. Other variables manipulated include the number of available items, the number of respondents used to calibrate the items, and various manipulations of respondents' true locations. Three item selection strategies were used, and the theoretically optimal Maximum Information method was compared to random item selection and Bayesian Maximum Falsification approaches. The Rasch partial credit model proved to be quite robust to various imperfections, and systematic distortions occurred mainly in the absence of sufficient numbers of items located near the trait or performance levels of interest. The findings further indicate that having small numbers of items is more problematic in practice than having small numbers of respondents to calibrate these items. Most importantly, increasing the number of response categories consistently improved CAT's efficiency as well as the general quality of the results. In fact, increasing the number of response categories proved to have a greater positive impact than did the choice of item selection method, as the Maximum Information approach performed only slightly better than the Maximum Falsification approach. Accordingly, issues related to the efficiency of item selection methods are far less important than is commonly suggested in the literature. However, being based on computer simulations only, the preceding presumes that actual respondents behave according to the Rasch model. CAT research could thus benefit from empirical studies aimed at determining whether, and if so, how, selection strategies impact performance.
%B Journal of Applied Measurement
%7 2008/01/09
%V 9
%P 81-104
%@ 1529-7713 (Print); 1529-7713 (Linking)
%G eng
%M 18180552
%0 Journal Article
%J British Journal of Mathematical and Statistical Psychology
%D 2008
%T Predicting item exposure parameters in computerized adaptive testing
%A Chen, S-Y.
%A Doong, S. H.
%K *Algorithms
%K *Artificial Intelligence
%K Aptitude Tests/*statistics & numerical data
%K Diagnosis, Computer-Assisted/*statistics & numerical data
%K Humans
%K Models, Statistical
%K Psychometrics/statistics & numerical data
%K Reproducibility of Results
%K Software
%X The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) - a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge-based solution for finding item exposure parameters.
%B British Journal of Mathematical and Statistical Psychology
%7 2008/05/17
%V 61
%P 75-91
%8 May
%@ 0007-1102 (Print); 0007-1102 (Linking)
%G eng
%M 18482476
%0 Journal Article
%J Quality of Life Research
%D 2007
%T Developing tailored instruments: item banking and computerized adaptive assessment
%A Bjorner, J. B.
%A Chang, C-H.
%A Thissen, D.
%A Reeve, B. B.
%K *Health Status
%K *Health Status Indicators
%K *Mental Health
%K *Outcome Assessment (Health Care)
%K *Quality of Life
%K *Questionnaires
%K *Software
%K Algorithms
%K Factor Analysis, Statistical
%K Humans
%K Models, Statistical
%K Psychometrics
%X Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges.
%B Quality of Life Research
%7 2007/05/29
%V 16
%P 95-108
%@ 0962-9343 (Print)
%G eng
%M 17530450
%0 Journal Article
%J Medical Care
%D 2006
%T Overview of quantitative measurement methods. Equivalence, invariance, and differential item functioning in health applications
%A Teresi, J. A.
%K *Cross-Cultural Comparison
%K Data Interpretation, Statistical
%K Factor Analysis, Statistical
%K Guidelines as Topic
%K Humans
%K Models, Statistical
%K Psychometrics/*methods
%K Statistics as Topic/*methods
%K Statistics, Nonparametric
%X BACKGROUND: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. Discussed are parameters and statistics that are not invariant and cannot be compared validly in cross-cultural studies with varying distributions of disability in contrast to those that can be compared (if the model assumptions are met) because they are produced by models such as linear and nonlinear regression. OBJECTIVES: The purpose of this overview is to provide an integrated approach to the quantitative methods used in this special issue to examine measurement equivalence. The methods include classical test theory (CTT), factor analytic, and parametric and nonparametric approaches to DIF detection. Also included in the quantitative section is a discussion of item banking and computerized adaptive testing (CAT). METHODS: Factorial invariance and the articles discussing this topic are introduced. A brief overview of the DIF methods presented in the quantitative section of the special issue is provided together with a discussion of ways in which DIF analyses and examination of invariance using factor models may be complementary. CONCLUSIONS: Although factor analytic and DIF detection methods share features, they provide unique information and can be viewed as complementary in informing about measurement equivalence.
%B Medical Care
%7 2006/10/25
%V 44
%P S39-49
%8 Nov
%@ 0025-7079 (Print); 0025-7079 (Linking)
%G eng
%M 17060834
%0 Journal Article
%J Drug and Alcohol Dependence
%D 2002
%T Assessing tobacco beliefs among youth using item response theory models
%A Panter, A. T.
%A Reeve, B. B.
%K *Attitude to Health
%K *Culture
%K *Health Behavior
%K *Questionnaires
%K Adolescent
%K Adult
%K Child
%K Female
%K Humans
%K Male
%K Models, Statistical
%K Smoking/*epidemiology
%X Successful intervention research programs to prevent adolescent smoking require well-chosen, psychometrically sound instruments for assessing smoking prevalence and attitudes. Twelve thousand eight hundred and ten adolescents were surveyed about their smoking beliefs as part of the Teenage Attitudes and Practices Survey project, a prospective cohort study of predictors of smoking initiation among US adolescents. Item response theory (IRT) methods are used to frame a discussion of questions that a researcher might ask when selecting an optimal item set. IRT methods are especially useful for choosing items during instrument development, trait scoring, evaluating item functioning across groups, and creating optimal item subsets for use in specialized applications such as computerized adaptive testing. Data analytic steps for IRT modeling are reviewed for evaluating item quality and differential item functioning across subgroups of gender, age, and smoking status. Implications and challenges in the use of these methods for tobacco onset research and for assessing the developmental trajectories of smoking among youth are discussed.
%B Drug and Alcohol Dependence
%V 68
%P S21-S39
%8 Nov
%G eng
%M 12324173
%0 Journal Article
%J Journal of Outcome Measurement
%D 1998
%T The effect of item pool restriction on the precision of ability measurement for a Rasch-based CAT: comparisons to traditional fixed length examinations
%A Halkitis, P. N.
%K *Decision Making, Computer-Assisted
%K Comparative Study
%K Computer Simulation
%K Education, Nursing
%K Educational Measurement/*methods
%K Human
%K Models, Statistical
%K Psychometrics/*methods
%X This paper describes a method for examining the precision of a computerized adaptive test with a limited item pool. Standard errors of measurement ascertained in the testing of simulees with a CAT using a restricted pool were compared to the results obtained in a live paper-and-pencil achievement testing of 4494 nursing students on four versions of an examination of calculations of drug administration. CAT measures of precision were considered when the simulated examinee pools were uniform and normal. Precision indices were also considered in terms of the number of CAT items required to reach the precision of the traditional tests. Results suggest that regardless of the size of the item pool, CAT provides greater precision in measurement with a smaller number of items administered even when the choice of items is limited, but fails to achieve equiprecision along the entire ability continuum.
%B Journal of Outcome Measurement
%V 2
%P 97-122
%G eng
%M 9661734