PLENARY SYMPOSIUM: Computerized Adaptive Testing (CAT) and Multistage Testing (MST) with R
Presenters: David Magis (University of Liège); Duanli Yan (ETS); Alina A. von Davier (ACT)
Abstract: Computerized Adaptive Testing (CAT) has become a very popular method for administering questionnaires, collecting data, and scoring on the fly. It has been used in many large-scale assessments over the past few decades and remains an important field of research in psychometrics. Multistage testing (MST) has likewise gained popularity in recent years. Both approaches rely on the notion of adaptive testing, in which items are administered sequentially and selected optimally based on the responses to the items administered so far. The conceptual difference between CAT and MST is that in CAT, items are selected one after another and the ability of the test taker is re-estimated after each item is administered. In MST, items are grouped into predefined modules, and the selection of the subsequent module(s) is based on performance on the previously administered module(s), not on single items. For CAT, some commercial software exists (e.g., CATSim, Adaptest, …). Open-source solutions for simulation studies also exist, most of them implemented in R; among others are the packages catR and mirtCAT and the R-based software Firestar. For MST, MSTGen exists. Very recently, the R package mstR was developed as a tool for simulations in the MST context, similar to the catR package for the CAT framework.
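For illustration, the item-level adaptation just described can be sketched in a few lines with catR; the bank size, estimator, and stopping rule below are illustrative choices, not settings from the presentation.

    ## Minimal CAT simulation with catR (illustrative settings only).
    library(catR)

    ## Generate an artificial 2PL item bank of 200 items.
    bank <- genDichoMatrix(items = 200, model = "2PL", seed = 1)

    ## One adaptive administration for a test taker with true theta = 0.5:
    ## items are picked one at a time by maximum Fisher information (MFI)
    ## and ability is re-estimated (Bayes modal, "BM") after every response.
    res <- randomCAT(trueTheta = 0.5, itemBank = bank,
                     start = list(nrItems = 1, theta = 0),
                     test  = list(method = "BM", itemSelect = "MFI"),
                     stop  = list(rule = "length", thr = 20),
                     final = list(method = "BM"))

    res$thFinal  # final ability estimate
    res$seFinal  # final standard error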
This presentation provides a practical (and brief) overview of the theory of computerized adaptive testing and multistage testing, and illustrates the methodologies and applications using the open-source R language and several data examples. The implementations rely on the R packages catR and mstR, which are under active development and include some of the newest research algorithms developed by the presenters. The session covers several topics: a theoretical overview of CAT and MST, CAT and MST designs, assembly methodologies, the catR and mstR packages, simulations, and applications.
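The module-level adaptation of MST can be sketched in the same spirit with mstR; the 1-2 panel design below (one routing module, followed by an easier or a harder module) is a hypothetical example, not a design from the presentation.

    ## Minimal MST simulation with mstR (hypothetical 1-2 design).
    library(mstR)

    ## 30-item 2PL bank: items 1-10 form the routing module,
    ## items 11-20 an easier second-stage module, items 21-30 a harder one.
    it <- genDichoMatrix(items = 30, model = "2PL", seed = 1)

    ## Module membership: one row per item, one column per module.
    modules <- matrix(0, 30, 3)
    modules[1:10, 1]  <- 1
    modules[11:20, 2] <- 1
    modules[21:30, 3] <- 1

    ## Transitions: module 1 can be followed by module 2 or module 3.
    trans <- matrix(0, 3, 3)
    trans[1, 2:3] <- 1

    ## Ability is estimated after the routing module, and the next module
    ## is selected by maximum Fisher information (MFI).
    res <- randomMST(trueTheta = 0, itemBank = it, modules = modules,
                     transMatrix = trans,
                     test  = list(method = "BM", moduleSelect = "MFI"),
                     final = list(method = "BM"))
    res$thFinal  # final ability estimate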
Bio: David Magis is Research Associate of the “Fonds de la Recherche Scientifique – FNRS” at the Department of Education, University of Liège, Belgium. He earned an MSc in biostatistics (Hasselt University, Belgium) and a Ph.D. in statistics (University of Liège). He specializes in statistical methods for psychometrics, with particular interest in item response theory (IRT), differential item functioning, and computerized adaptive testing (CAT). His research interests include theoretical and methodological development of psychometric models and methods, as well as their open-source implementation and dissemination in R.
He is an associate editor of the British Journal of Mathematical and Statistical Psychology and has published numerous research papers in psychometric journals. He is the main developer and maintainer of the packages catR and mstR for adaptive and multistage testing, among others. He was awarded the 2016 Psychometric Society Early Career Award for his contributions to open-source programming and adaptive testing.
Bio: Duanli Yan is a Manager of Data Analysis and Computational Research in the Research and Development Division at Educational Testing Service (ETS). She is also an adjunct professor at Rutgers, The State University of New Jersey. She holds a Ph.D. in psychometrics from Fordham University, NY, and dual master's degrees in statistics and in operations research from Penn State University, University Park. At ETS, she is responsible for the statistical modeling and analysis of automated scoring engines, including essay scoring and speech scoring. During her tenure at ETS, she has led the statistical operational analysis and scoring for the EXADEP, NAC, and TOEIC® Institutional programs, has been a development scientist for innovative research applications, and has served as a psychometrician for several operational programs.
She is a co-author of Bayesian Networks in Educational Assessment (Almond, Mislevy, Steinberg, Yan, and Williamson) and a co-editor of Computerized Multistage Testing: Theory and Applications (Yan, von Davier, and Lewis). She is a recipient of the ETS Presidential Award and spot awards, the NCME Brenda H. Loyd Outstanding Dissertation Award, the IACAT Early Career Award, and the AERA Division D Significant Contribution to Educational Measurement and Research Methodology Award. She has been an invited training session/workshop lecturer, symposium organizer, discussant, and presenter at many conferences, including those of the National Council on Measurement in Education (NCME), the International Association for Computerized Adaptive Testing (IACAT), and the International Meeting of the Psychometric Society (IMPS). Her current research interests include computerized multistage testing applications in operational programs, Bayesian inference methodologies, and automated scoring methodologies and applications.
PLENARY SYMPOSIUM: Item Pool Design and Evaluation
Presenters: Mark D. Reckase (Michigan State University); Wei He (Northwest Evaluation Association); Emre Gonulates (Western Michigan University); Jing-Ru Xu (Pearson VUE); Xuechun Zhou (Pearson Clinical Assessment)
Introductory Topic: The Need for Item Pool Design and Evaluation for Operational CATs
Abstract: Early work on CAT tended to use existing sets of items that came from fixed-length test forms. Those sets of items had been selected to meet requirements quite different from those of a CAT, such as supporting decision making or covering a content domain. However, some early work suggested having items equally distributed over the proficiency range of interest, or concentrated at a decision point; other work showed that proficiency estimates are biased when an item pool is too easy or too hard. These early findings eventually led to work on item pool design and, more recently, on item pool evaluation. This presentation gives a brief overview of these topics to provide context for the following presentations in this symposium.
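The bias phenomenon mentioned above is easy to reproduce in simulation; the following catR sketch (with arbitrary pool sizes and settings) compares a pool that is far too easy for an able examinee against a well-targeted one.

    ## Sketch: bias from an off-target item pool (illustrative settings).
    library(catR)

    ## A pool centred far below the examinee (b ~ N(-2, 0.5)) versus a
    ## pool centred at the examinee's level (b ~ N(1, 0.5)).
    easy    <- genDichoMatrix(items = 100, model = "2PL",
                              bPrior = c("norm", -2, 0.5), seed = 1)
    matched <- genDichoMatrix(items = 100, model = "2PL",
                              bPrior = c("norm", 1, 0.5), seed = 1)

    ## Average final Bayes modal estimate over 100 simulated 15-item CATs.
    avgEst <- function(bank, theta = 1, n = 100)
      mean(replicate(n, randomCAT(trueTheta = theta, itemBank = bank,
                                  stop = list(rule = "length", thr = 15))$thFinal))

    avgEst(easy) - 1     # marked bias: little information near theta = 1
    avgEst(matched) - 1  # close to zero with a well-targeted pool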
Bio: Mark Reckase is a University Distinguished Professor Emeritus at Michigan State University, where he has taught courses in psychometric theory and various aspects of item response theory. He has also worked on standard-setting procedures for educational and licensure tests, the use of statistical models for evaluating the performance of teachers, international studies of the preparation of teachers of mathematics, and the design and implementation of computerized adaptive tests. He has been the editor of Applied Psychological Measurement and the Journal of Educational Measurement. He has been the president of the National Council on Measurement in Education (NCME), the vice president of Division D of the American Educational Research Association, and the secretary of the Psychometric Society.
Topic: Item Pool Design for CAT with Repeaters
Abstract: Item pool design in CAT focuses on developing an item pool blueprint that describes the number of items needed for each possible combination of the relevant statistical and non-statistical item attributes. Typically, no items are available at the time of design. Several factors are known to affect item pool design, and the ability distribution of the expected examinee population is one of them. This study takes a deeper look at how the presence of repeaters changes the ability distribution of the expected examinee population and thereby affects item pool design, in the context of an achievement test that employs a complex adaptive testing algorithm.
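As a toy illustration of the underlying idea, the target ability distribution that a blueprint must cover can be written as a mixture of first-time test takers and repeaters; all numbers below are hypothetical.

    ## Sketch: a repeater subgroup shifts the target ability distribution
    ## (all proportions and distribution parameters are hypothetical).
    n <- 10000
    p_repeat <- 0.25
    theta <- c(rnorm(n * (1 - p_repeat), mean = 0,    sd = 1),    # first-timers
               rnorm(n * p_repeat,       mean = -0.7, sd = 0.8))  # repeaters

    ## Counts per difficulty stratum suggest how many items per b-range
    ## the blueprint should call for.
    table(cut(theta, breaks = seq(-4, 4, by = 1)))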
Bio: Wei He obtained her Ph.D. in Measurement and Quantitative Methods from Michigan State University. Her primary research interests include computerized adaptive and computer-based testing, psychometrics, and large-scale educational assessment. Dr. He is currently Director of the Psychometric Services Team at Northwest Evaluation Association in Portland, Oregon.
Topic: Statistical Methods for Evaluating the Sufficiency of Item Pools for Adaptive Tests
Abstract: An item pool that is sufficient for one CAT can function quite differently for another CAT with different specifications or a different target population. Traditional methods such as the item pool information function may not capture the sufficiency of an item pool, especially when there are many constraints on the item selection algorithm. This session presents a new method for evaluating item pool sufficiency and for diagnosing the potential inadequacies of an item pool.
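For reference, the traditional item pool information function mentioned above is simply the sum of the item information functions over the pool, I(theta) = sum_j I_j(theta); it can be computed directly with catR for an artificial 2PL bank.

    ## Sketch: classical pool information function for a 2PL bank.
    library(catR)

    bank  <- genDichoMatrix(items = 200, model = "2PL", seed = 1)
    theta <- seq(-4, 4, by = 0.1)

    ## Ii() returns the item information functions; the pool information
    ## at each theta is their sum over all items.
    poolInfo <- sapply(theta, function(t) sum(Ii(t, bank)$Ii))

    plot(theta, poolInfo, type = "l",
         xlab = expression(theta), ylab = "Pool information")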
Bio: Emre Gonulates is an Assistant Professor of Evaluation, Measurement and Research at Western Michigan University. He earned a Ph.D. in measurement and quantitative methods from Michigan State University, where he also received an M.S. in statistics, and holds M.S. and B.S. degrees in mathematics education from Bogazici University, Turkey. Prior to his academic work, he was a high school mathematics teacher. His research interests include educational measurement and psychometrics, with a focus on computerized adaptive testing.
Topic: Evaluating Different Content Distributions for a Variable Length Content-Balanced CAT
Abstract: In computerized adaptive testing (CAT), content balancing designs are used to control the content coverage of the items administered to candidates. In current CAT practice, the content balancing design ensures that all candidates take a similar proportion of items from each content area; in some fixed-length CATs, every candidate receives exactly the same number of items from each domain. This research proposes different content balancing designs for a variable-length CAT. Simulation studies with several evaluation criteria were conducted to investigate the influence of content balancing control in a variable-length CAT with multiple content areas. This study is pioneering work for future analyses of CAT item pool evaluation and design under different conditions.
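As background, catR already supports proportional content balancing in a variable-length CAT. The sketch below shows the mechanics; the content areas, target proportions, and precision threshold are illustrative, and the call assumes the current catR interface in which group membership is passed through cbGroup.

    ## Sketch: content-balanced variable-length CAT (illustrative settings).
    library(catR)

    bank  <- genDichoMatrix(items = 300, model = "2PL", seed = 1)
    areas <- rep(c("Algebra", "Geometry", "Statistics"), each = 100)

    res <- randomCAT(trueTheta = 0, itemBank = bank,
                     ## target proportions of items per content area
                     cbControl = list(names = c("Algebra", "Geometry", "Statistics"),
                                      props = c(0.4, 0.3, 0.3)),
                     cbGroup = areas,
                     ## variable length: stop once SE(theta) falls below 0.3
                     stop = list(rule = "precision", thr = 0.3))

    table(areas[res$testItems])  # realized content distribution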
Bio: Jing-Ru Xu is a Psychometrician at Pearson VUE in Chicago, where she works on computerized adaptive licensure tests. She received her Ph.D. in Measurement and Quantitative Methods and M.S. in Statistics from Michigan State University.
Topic: Item Pool Design and Management Using the p-Optimality Method for Mixed-Format CAT
Abstract: The primary purpose of this study is to identify the item pool characteristics for a CAT program consisting of both dichotomous and polytomous items, using the p-optimality method with the Rasch model and the partial credit model (PCM). Optimal item pools are generated through CAT simulations in which two factors are considered: the stopping rule for the dichotomous items that form the first part of the test, and item exposure control. The resulting characteristics describe the item difficulty distribution for each item type, the proportions of the two item types, the item pool information distribution, and the pool size. The performance of the simulated item pools is evaluated against an operational item pool with respect to ability estimation, classification at the cut point, and pool utilization.
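The polytomous side of such a pool can be simulated with catR as well; the sketch below generates a PCM bank and adds randomesque exposure control, with all settings illustrative rather than taken from the study.

    ## Sketch: PCM-based CAT with randomesque exposure control.
    library(catR)

    ## 100 four-category partial credit items.
    poly <- genPolyMatrix(items = 100, nrCat = 4, model = "PCM", seed = 1)

    res <- randomCAT(trueTheta = 0, itemBank = poly, model = "PCM",
                     ## pick at random among the 5 most informative items
                     ## to spread item exposure
                     test = list(itemSelect = "MFI", randomesque = 5),
                     stop = list(rule = "precision", thr = 0.4))
    res$thFinal  # final ability estimate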
Bio: Xuechun Zhou is a Psychometrician at Pearson Clinical Assessment. She holds a Ph.D. in Measurement and Quantitative Methods from Michigan State University. Her responsibilities at Pearson involve developing norm-referenced psychological and educational assessments, including research design and data analysis. Her research interests are optimal item pool design for computerized adaptive testing, item pool management, and statistical models for improving assessments and clinical utility.