PLENARY SYMPOSIUM: Computerized Adaptive Testing (CAT) and Multistage Testing (MST) with R
Presenters: David Magis (University of Liège); Duanli Yan (ETS); Alina A. von Davier (ACT)
Abstract: Computerized Adaptive Testing (CAT) has become a very popular method for administering questionnaires, collecting data, and scoring on the fly. It has been used in many large-scale assessments over the past few decades and remains an important field of research in psychometrics. Multistage testing (MST) has likewise gained popularity in recent years. Both approaches rely on the notion of adaptive testing, in which items are administered sequentially and selected optimally based on the responses to the items administered so far. The conceptual difference between CAT and MST is that in CAT, items are selected one after another and the ability of the test taker is re-estimated after each item is administered. In MST, items are grouped into predefined modules, and the selection of the subsequent module(s) is based on performance on the previously administered module(s), not on single items. For CAT, some commercial software exists (e.g., CATSim, Adaptest, …). Open-source solutions for simulation studies also exist, most of them implemented in R; among others are the packages catR and mirtCAT and the R-based software Firestar. For MST, MSTGen exists. Very recently, the R package mstR was developed as a tool for simulations in the MST context, similar to the catR package for the CAT framework.
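For illustration, the item-level adaptation just described can be sketched in a few lines with catR; the bank size, estimator, and stopping rule below are illustrative choices, not settings from the presentation.

    ## Minimal CAT simulation with catR (illustrative settings only).
    library(catR)

    ## Generate an artificial 2PL item bank of 200 items.
    bank <- genDichoMatrix(items = 200, model = "2PL", seed = 1)

    ## One adaptive administration for a test taker with true theta = 0.5:
    ## items are picked one at a time by maximum Fisher information (MFI)
    ## and ability is re-estimated (Bayes modal, "BM") after every response.
    res <- randomCAT(trueTheta = 0.5, itemBank = bank,
                     start = list(nrItems = 1, theta = 0),
                     test  = list(method = "BM", itemSelect = "MFI"),
                     stop  = list(rule = "length", thr = 20),
                     final = list(method = "BM"))

    res$thFinal  # final ability estimate
    res$seFinal  # final standard error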
This presentation provides a practical (and brief) overview of the theory of computerized adaptive testing and multistage testing, and illustrates the methodologies and applications using the open-source R language and several data examples. The implementations rely on the R packages catR and mstR, which are under active development and include some of the newest research algorithms developed by the presenters. The session covers several topics: a theoretical overview of CAT and MST, CAT and MST designs, assembly methodologies, the catR and mstR packages, simulations, and applications.
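The module-level adaptation of MST can be sketched in the same spirit with mstR; the 1-2 panel design below (one routing module, followed by an easier or a harder module) is a hypothetical example, not a design from the presentation.

    ## Minimal MST simulation with mstR (hypothetical 1-2 design).
    library(mstR)

    ## 30-item 2PL bank: items 1-10 form the routing module,
    ## items 11-20 an easier second-stage module, items 21-30 a harder one.
    it <- genDichoMatrix(items = 30, model = "2PL", seed = 1)

    ## Module membership: one row per item, one column per module.
    modules <- matrix(0, 30, 3)
    modules[1:10, 1]  <- 1
    modules[11:20, 2] <- 1
    modules[21:30, 3] <- 1

    ## Transitions: module 1 can be followed by module 2 or module 3.
    trans <- matrix(0, 3, 3)
    trans[1, 2:3] <- 1

    ## Ability is estimated after the routing module, and the next module
    ## is selected by maximum Fisher information (MFI).
    res <- randomMST(trueTheta = 0, itemBank = it, modules = modules,
                     transMatrix = trans,
                     test  = list(method = "BM", moduleSelect = "MFI"),
                     final = list(method = "BM"))
    res$thFinal  # final ability estimate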
Bio: David Magis is Research Associate of the “Fonds de la Recherche Scientifique – FNRS” at the Department of Education, University of Liège, Belgium. He earned an MSc in biostatistics (Hasselt University, Belgium) and a Ph.D. in statistics (University of Liège). He specializes in statistical methods for psychometrics, with particular interest in item response theory (IRT), differential item functioning, and computerized adaptive testing (CAT). His research interests include theoretical and methodological development of psychometric models and methods, as well as their open-source implementation and dissemination in R.
He is an associate editor of the British Journal of Mathematical and Statistical Psychology and has published numerous research papers in psychometric journals. He is the main developer and maintainer of the packages catR and mstR for adaptive and multistage testing, among others. He was awarded the 2016 Psychometric Society Early Career Award for his contributions to open-source programming and adaptive testing.
Bio: Duanli Yan is a Manager of Data Analysis and Computational Research in the Research and Development Division at Educational Testing Service (ETS). She is also an adjunct professor at Rutgers, The State University of New Jersey. She holds a Ph.D. in psychometrics from Fordham University, NY, and dual master's degrees in statistics and in operations research from Penn State University, University Park. At ETS, she is responsible for the statistical modeling and analysis of automated scoring engines, including essay scoring and speech scoring. During her tenure at ETS, she has led the statistical operational analysis and scoring for the EXADEP, NAC, and TOEIC® Institutional programs, has been a development scientist for innovative research applications, and has served as a psychometrician for several operational programs.
She is a co-author of Bayesian Networks in Educational Assessment (Almond, Mislevy, Steinberg, Yan, and Williamson) and a co-editor of Computerized Multistage Testing: Theory and Applications (Yan, von Davier, and Lewis). She is a recipient of the ETS Presidential Award and spot awards, the NCME Brenda H. Loyd Outstanding Dissertation Award, the IACAT Early Career Award, and the AERA Division D Significant Contribution to Educational Measurement and Research Methodology Award. She has been an invited training session/workshop lecturer, symposium organizer, discussant, and presenter at many conferences, including those of the National Council on Measurement in Education (NCME), the International Association for Computerized Adaptive Testing (IACAT), and the International Meeting of the Psychometric Society (IMPS). Her current research interests include computerized multistage testing applications in operational programs, Bayesian inference methodologies, and automated scoring methodologies and applications.
PLENARY SYMPOSIUM: Item Pool Design and Evaluation
Presenters: Mark D. Reckase (Michigan State University); Wei He (Northwest Evaluation Association); Emre Gonulates (Western Michigan University); Jing-Ru Xu (Pearson VUE); Xuechun Zhou (Pearson Clinical Assessment)
Introductory Topic: The Need for Item Pool Design and Evaluation for Operational CATs
Abstract: Early work on CAT tended to use existing sets of items that came from fixed-length test forms. Those sets of items had been selected to meet requirements quite different from those of a CAT, such as supporting decision making or covering a content domain. However, some early work suggested having items equally distributed over the proficiency range of interest, or concentrated at a decision point; other work showed that proficiency estimates are biased when an item pool is too easy or too hard. These early findings eventually led to work on item pool design and, more recently, on item pool evaluation. This presentation gives a brief overview of these topics to provide context for the following presentations in this symposium.
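The bias phenomenon mentioned above is easy to reproduce in simulation; the following catR sketch (with arbitrary pool sizes and settings) compares a pool that is far too easy for an able examinee against a well-targeted one.

    ## Sketch: bias from an off-target item pool (illustrative settings).
    library(catR)

    ## A pool centred far below the examinee (b ~ N(-2, 0.5)) versus a
    ## pool centred at the examinee's level (b ~ N(1, 0.5)).
    easy    <- genDichoMatrix(items = 100, model = "2PL",
                              bPrior = c("norm", -2, 0.5), seed = 1)
    matched <- genDichoMatrix(items = 100, model = "2PL",
                              bPrior = c("norm", 1, 0.5), seed = 1)

    ## Average final Bayes modal estimate over 100 simulated 15-item CATs.
    avgEst <- function(bank, theta = 1, n = 100)
      mean(replicate(n, randomCAT(trueTheta = theta, itemBank = bank,
                                  stop = list(rule = "length", thr = 15))$thFinal))

    avgEst(easy) - 1     # marked bias: little information near theta = 1
    avgEst(matched) - 1  # close to zero with a well-targeted pool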
Bio: Mark Reckase is a University Distinguished Professor Emeritus at Michigan State University, where he has taught courses in psychometric theory and various aspects of item response theory. He has also worked on standard-setting procedures for educational and licensure tests, the use of statistical models for evaluating the performance of teachers, international studies of the preparation of teachers of mathematics, and the design and implementation of computerized adaptive tests. He has been the editor of Applied Psychological Measurement and the Journal of Educational Measurement. He has been the president of the National Council on Measurement in Education (NCME), the vice president of Division D of the American Educational Research Association, and the secretary of the Psychometric Society.
Topic: Item Pool Design for CAT with Repeaters
Abstract: Item pool design in CAT focuses on developing an item pool blueprint that describes the number of items needed for each possible combination of the relevant statistical and non-statistical item attributes. Typically, no items are available at the time of design. Several factors are known to affect item pool design, and the ability distribution of the expected examinee population is one of them. This study takes a deeper look at how the presence of repeaters changes the ability distribution of the expected examinee population and thereby affects item pool design, in the context of an achievement test that employs a complex adaptive testing algorithm.
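As a toy illustration of the underlying idea, the target ability distribution that a blueprint must cover can be written as a mixture of first-time test takers and repeaters; all numbers below are hypothetical.

    ## Sketch: a repeater subgroup shifts the target ability distribution
    ## (all proportions and distribution parameters are hypothetical).
    n <- 10000
    p_repeat <- 0.25
    theta <- c(rnorm(n * (1 - p_repeat), mean = 0,    sd = 1),    # first-timers
               rnorm(n * p_repeat,       mean = -0.7, sd = 0.8))  # repeaters

    ## Counts per difficulty stratum suggest how many items per b-range
    ## the blueprint should call for.
    table(cut(theta, breaks = seq(-4, 4, by = 1)))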
Bio: Wei He obtained her Ph.D. in Measurement and Quantitative Methods from Michigan State University. Her primary research interests include computerized adaptive and computer-based testing, psychometrics, and large-scale educational assessment. Dr. He is currently Director of the Psychometric Services Team at Northwest Evaluation Association in Portland, Oregon.
Topic: Statistical Methods for Evaluating the Sufficiency of Item Pools for Adaptive Tests
Abstract: An item pool that is sufficient for one CAT can function quite differently for another CAT with different specifications or a different target population. Traditional methods such as the item pool information function may not capture the sufficiency of an item pool, especially when there are many constraints on the item selection algorithm. This session presents a new method for evaluating item pool sufficiency and for diagnosing the potential inadequacies of an item pool.
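For reference, the traditional item pool information function mentioned above is simply the sum of the item information functions over the pool, I(theta) = sum_j I_j(theta); it can be computed directly with catR for an artificial 2PL bank.

    ## Sketch: classical pool information function for a 2PL bank.
    library(catR)

    bank  <- genDichoMatrix(items = 200, model = "2PL", seed = 1)
    theta <- seq(-4, 4, by = 0.1)

    ## Ii() returns the item information functions; the pool information
    ## at each theta is their sum over all items.
    poolInfo <- sapply(theta, function(t) sum(Ii(t, bank)$Ii))

    plot(theta, poolInfo, type = "l",
         xlab = expression(theta), ylab = "Pool information")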
Bio: Emre Gonulates is an Assistant Professor of Evaluation, Measurement and Research at Western Michigan University. He earned a Ph.D. in measurement and quantitative methods from Michigan State University, where he also received an M.S. in statistics, and holds M.S. and B.S. degrees in mathematics education from Bogazici University, Turkey. Prior to his academic work, he was a high school mathematics teacher. His research interests include educational measurement and psychometrics, with a focus on computerized adaptive testing.
Topic: Evaluating Different Content Distributions for a Variable Length Content-Balanced CAT
Abstract: In computerized adaptive testing (CAT), content balancing designs are used to control the content coverage of the items administered to candidates. In current CAT practice, the content balancing design ensures that all candidates take a similar proportion of items from each content area; in some fixed-length CATs, every candidate receives exactly the same number of items from each domain. This research proposes different content balancing designs for a variable-length CAT. Simulation studies with several evaluation criteria were conducted to investigate the influence of content balancing control in a variable-length CAT with multiple content areas. This study is pioneering work for future analyses of CAT item pool evaluation and design under different conditions.
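As background, catR already supports proportional content balancing in a variable-length CAT. The sketch below shows the mechanics; the content areas, target proportions, and precision threshold are illustrative, and the call assumes the current catR interface in which group membership is passed through cbGroup.

    ## Sketch: content-balanced variable-length CAT (illustrative settings).
    library(catR)

    bank  <- genDichoMatrix(items = 300, model = "2PL", seed = 1)
    areas <- rep(c("Algebra", "Geometry", "Statistics"), each = 100)

    res <- randomCAT(trueTheta = 0, itemBank = bank,
                     ## target proportions of items per content area
                     cbControl = list(names = c("Algebra", "Geometry", "Statistics"),
                                      props = c(0.4, 0.3, 0.3)),
                     cbGroup = areas,
                     ## variable length: stop once SE(theta) falls below 0.3
                     stop = list(rule = "precision", thr = 0.3))

    table(areas[res$testItems])  # realized content distribution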
Bio: Jing-Ru Xu is a Psychometrician at Pearson VUE in Chicago, where she works on computerized adaptive licensure tests. She received her Ph.D. in Measurement and Quantitative Methods and M.S. in Statistics from Michigan State University.
Topic: Item Pool Design and Management Using the p-Optimality Method for Mixed-Format CAT
Abstract: The primary purpose of this study is to identify the item pool characteristics for a CAT program consisting of both dichotomous and polytomous items, using the p-optimality method with the Rasch model and the partial credit model (PCM). Optimal item pools are generated through CAT simulations in which two factors are considered: the stopping rule for the dichotomous items that form the first part of the test, and item exposure control. The resulting characteristics describe the item difficulty distribution for each item type, the proportions of the two item types, the item pool information distribution, and the pool size. The performance of the simulated item pools is evaluated against an operational item pool with respect to ability estimation, classification at the cut point, and pool utilization.
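The polytomous side of such a pool can be simulated with catR as well; the sketch below generates a PCM bank and adds randomesque exposure control, with all settings illustrative rather than taken from the study.

    ## Sketch: PCM-based CAT with randomesque exposure control.
    library(catR)

    ## 100 four-category partial credit items.
    poly <- genPolyMatrix(items = 100, nrCat = 4, model = "PCM", seed = 1)

    res <- randomCAT(trueTheta = 0, itemBank = poly, model = "PCM",
                     ## pick at random among the 5 most informative items
                     ## to spread item exposure
                     test = list(itemSelect = "MFI", randomesque = 5),
                     stop = list(rule = "precision", thr = 0.4))
    res$thFinal  # final ability estimate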
Bio: Xuechun Zhou is a Psychometrician at Pearson Clinical Assessment. She holds a Ph.D. in Measurement and Quantitative Methods from Michigan State University. Her responsibilities at Pearson involve developing norm-referenced psychological and educational assessments, including research design and data analysis. Her research interests are optimal item pool design for computerized adaptive testing, item pool management, and statistical models for improving assessments and clinical utility.