Presidential Address: Improving precision of CAT measures
John Barnard, Executive Director, EPEC Pty Ltd & Professor, Universities of Sydney and Cape Town
The basic idea of adaptive testing is quite simple and has been implemented for over a century (the Binet-Simon scale, oral examinations, etc.). Over the years, item selection algorithms such as MI, MPP and WI have been developed to maximize efficiency and convergence, while MLE, EAP and MAP are commonly used to estimate ability.
Dichotomously scored MCQs are most often used to obtain response vectors in CATs, meaning that a response is scored as either correct or incorrect. However, a correct response does not necessarily mean that the test taker knew the answer. Although the SEM decreases as the provisional ability estimate is updated, the question is whether the process can be improved at the item response level. In other words, can more information be extracted from a response than a simple 0 or 1? My presentation addresses this question.
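As background to the abstract above, the EAP estimation it refers to can be sketched in a few lines. The following is a minimal illustration, not any presenter's actual implementation: it computes an EAP ability estimate and its posterior SD (the SEM) from a dichotomous 0/1 response vector under the 2PL IRT model, using a standard normal prior on a quadrature grid. All parameter values are hypothetical.

```python
import numpy as np

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL IRT model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_estimate(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate and posterior SD (SEM) from a 0/1 response vector.

    Standard normal prior on theta, evaluated on a quadrature grid.
    """
    prior = np.exp(-0.5 * grid**2)          # unnormalized N(0, 1) prior
    like = np.ones_like(grid)
    for u, ai, bi in zip(responses, a, b):
        p = p_2pl(grid, ai, bi)
        like *= p**u * (1 - p)**(1 - u)     # dichotomous-response likelihood
    post = prior * like
    post /= post.sum()                      # normalize the posterior
    theta_hat = (grid * post).sum()         # posterior mean = EAP estimate
    sem = np.sqrt(((grid - theta_hat)**2 * post).sum())
    return theta_hat, sem
```

Note how the posterior SD shrinks as more item responses are accumulated, which is exactly the SEM behavior the abstract describes, and why each response currently contributes only a single bit of information.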
Bio: John is the founder and Executive Director of EPEC Pty Ltd in Melbourne, Australia and an internationally recognized expert in psychometrics, assessment and education. His work and leadership have been recognized through his appointment as professor, with a dual appointment in an adjunct capacity at The University of Sydney and an honorary position at The University of Cape Town. He also taught courses in psychometrics, statistics and research methodology on a sessional basis at the Australian Catholic University from 2000 to 2013. John is a full member of a number of professional organizations; most recently, he has served as a Board member of the International Association for Computerized Adaptive Testing (IACAT) since 2010, was elected Vice President in 2013 and became President in 2015. John is also a member of the International Assessments Joint National Advisory Committee (IAJNAC). He is a consulting editor of the Journal of Computerized Adaptive Testing and a member of the International Editorial Board of the SA Journal of Science and Technology.
He holds three doctorates (D.Ed.; Ph.D.; Ed.D.) following two master's degrees (M.Sc.; M.Ed.). He specializes in psychometrics, computer-based assessment and measurement theories. His acclaimed work as a researcher has been acknowledged with various awards and promotion to the highest rank of chief research specialist. Following studies in the US, John pioneered the implementation of IRT in South Africa at the national level, developed an item banking system and published the first CATs for selection of students in the late 1980s. After resigning as professor at the University of South Africa, he migrated to Australia in 1996 to take up a position as research director and head of psychometrics. Since founding EPEC in 2000, John has been active in numerous projects - from engineering CATs, through psychometric services, to the development of Option Probability Theory.
Keynote 1: Some Exciting New Developments Concerning CAT Foundations and Implementations
Hua-Hua Chang, Professor, University of Illinois
Bio: Dr. Hua-Hua Chang is a Professor of Educational Psychology, Psychology and Statistics at the University of Illinois at Urbana-Champaign (UIUC). He is a practitioner turned professor. Before moving to academia in 2001, he worked in the testing industry for nine years: six at Educational Testing Service in Princeton, NJ, and three at the National Board of Medical Examiners in Philadelphia, PA. He is the Editor-in-Chief of Applied Psychological Measurement, past President of the Psychometric Society (2012-2013), and a Fellow of the American Educational Research Association. Since 2008, he has been included on the "List of Teachers Ranked Excellent by Their Students" at UIUC for six consecutive years. Dr. Chang currently serves as the director of the Confucius Institute at UIUC, and he was most recently awarded the Changjiang Scholar Chair Professorship by the Ministry of Education of PR China.
Keynote 2: Multidimensional CAT: Calibration, Model Fit, Secondary Analyses
Cees Glas, Department of Research Methodology, Measurement and Data Analysis, Faculty of Behavioural Sciences, University of Twente, the Netherlands
Bio: Cees Glas is the chair of the Department of Research Methodology, Measurement and Data Analysis, at the Faculty of Behavioural Science of the University of Twente in the Netherlands. The focus of his work is on estimation and testing of latent variable models in general and of IRT models in particular, and on the application of IRT models in educational and psychological testing. He participated in numerous research projects including projects of the Dutch National Institute for Educational Measurement (Cito, the Netherlands), the Law School Admission Council (USA) and the OECD international educational survey PISA. He serves as the chair of the technical advisory committee of the OECD project PIAAC.
With Wim van der Linden, he co-edited the volume Elements of Adaptive Testing (2010) published by Springer. Published articles, book chapters and supervised theses cover such topics as testing of fit to IRT models, Bayesian estimation of multidimensional and multilevel IRT models using MCMC, modeling with non-ignorable missing data, concurrent modeling of item responses and response times, concurrent modeling of item response and textual input, and the application of computerized adaptive testing in the context of health assessment and organisational psychology.
Keynote 3: Happy CAT: Options to Allow Test Takers to Review and Change Responses in CAT
Kyung (Chris) Han (GMAC)
With its well-known advantages such as improved measurement efficiency, computerized adaptive testing (CAT) is quickly becoming mainstream in the testing industry. Many test takers, however, say they are not necessarily happy with the testing experience under CAT. Most (if not all) CAT programs do not allow test takers to review and change their responses during the testing process in order to prevent individuals from attempting to game the CAT system. According to findings from our recent research study, more than 50% of test takers complained about increased test anxiety due to these CAT restrictions and more than 80% of test takers believe they would perform better on the test if they were allowed to review and change their responses. In this keynote session, Chris Han from Graduate Management Admission Council (GMAC®) will introduce several CAT testing options that would allow for response review and revision while still retaining the measurement efficiency of CAT and its robustness against attempts to game the CAT system.
Bio: Kyung (Chris) T. Han is a senior psychometrician and director at the Graduate Management Admission Council, responsible for designing various test programs, including the Graduate Management Admission Test® (GMAT®) exam, and for conducting psychometric research to improve and ensure the quality of those programs. Han received his doctorate in Research and Evaluation Methods from the University of Massachusetts at Amherst. He received the Alicia Cascallar Award for an Outstanding Paper by an Early Career Scholar in 2012 and the Jason Millman Promising Measurement Scholar Award in 2013 from the National Council on Measurement in Education (NCME). He has presented and published numerous papers and book chapters on a variety of topics, from item response theory, test validity, and test equating to adaptive testing. He has also developed several psychometric software programs, including WinGen, IRTEQ, and SimulCAT, which are used widely in the measurement field.
Keynote 4: The future of CAT should be Open Source
Michal Kosinski, Stanford University
Bio: Michal is a Professor of Organizational Behaviour at Stanford University Graduate School of Business. His research focuses on humans in a digital environment and employs cutting-edge computational methods and Big Data mining. Michal holds a PhD in Psychology from the University of Cambridge, an MPhil in Psychometrics, and an MS in Social Psychology. He previously worked at Microsoft Research, founded a successful ITC start-up and served as a brand manager for a major digital brand.
Keynote 5: A Self-replenishing Adaptive Test
Wim van der Linden (Pacific Metrics)
Items in the operational pool for an adaptive test have a restricted life span. Ideally, we should be able to replace them periodically, using the response data immediately both to calibrate the new items and to score the examinees. In my presentation, I will show how a fully Bayesian approach to calibration, item selection, and examinee scoring can be exploited to realize this ideal.
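The core idea of on-the-fly Bayesian calibration can be sketched in simplified form. The code below is an illustrative toy, not the presenter's method: it maintains a grid-based posterior for a new item's Rasch difficulty, treating the examinees' abilities as already known from the operational items (a deliberate simplification of a fully Bayesian treatment, which would propagate uncertainty in both directions) and assuming a hypothetical N(0, 1) prior on the difficulty.

```python
import numpy as np

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def update_difficulty_posterior(thetas, responses, grid=np.linspace(-4, 4, 161)):
    """Grid-based posterior for a new item's difficulty b.

    Assumes examinee abilities are (approximately) known from the
    operational items and a hypothetical N(0, 1) prior on b.
    """
    log_post = -0.5 * grid**2                    # log N(0, 1) prior on b
    for theta, u in zip(thetas, responses):
        p = rasch_p(theta, grid)
        log_post += u * np.log(p) + (1 - u) * np.log(1 - p)
    post = np.exp(log_post - log_post.max())     # stabilize, then normalize
    post /= post.sum()
    b_hat = (grid * post).sum()                  # posterior mean for b
    sd = np.sqrt(((grid - b_hat)**2 * post).sum())
    return b_hat, sd
```

As responses to the new item accumulate, the posterior for its difficulty concentrates, and the item can graduate into the operational pool, which is the self-replenishing cycle the abstract describes.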
Bio: Wim J. van der Linden is Distinguished Scientist and Director of Research Innovation, Pacific Metrics Corporation, Monterey, CA, and Professor Emeritus of Measurement and Data Analysis, University of Twente. He received his PhD in psychometrics from the University of Amsterdam. His research interests include test theory, computerized adaptive testing, optimal test assembly, test equating, modeling response times on test items, as well as decision theory and its application to problems of educational decision making. He is the author of Linear Models for Optimal Test Design published by Springer in 2005 and the editor of a new three-volume Handbook of Item Response Theory: Models, Statistical Tools, and Applications to be published by Chapman & Hall/CRC in 2015. He is also a co-editor of Computerized Adaptive Testing: Theory and Applications (Boston: Kluwer, 2000; with C. A. W. Glas), and its sequel Elements of Adaptive Testing (New York: Springer, 2010; with C. A. W. Glas). Wim van der Linden has served on the editorial boards of nearly all major test-theory journals and is co-editor of the Chapman & Hall/CRC Series on Statistics for Social and Behavioral Sciences. He is also a former President of both the National Council on Measurement in Education (NCME) and the Psychometric Society, Fellow of the Center for Advanced Study in the Behavioral Sciences, Stanford, CA, was awarded an Honorary Doctorate from Umeå University in Sweden in 2008, and is a recipient of the ATP and NCME Career Achievement Awards for his work on educational measurement.
Keynote 6: CAT and Optimal Design for Rasch Poisson Counts Models
Heinz Holling, University of Münster
Keynote 7: Learning Parameters in Learning Environments: Trials and Tribulations
Kevin Wilson, Knewton
Calibration of items for adaptive testing is a very well-studied problem, and frequently the solution involves carefully curating the populations exposed to particular items. However, in an adaptive learning environment, students are exposed to items based on their current goals and needs. Thus, in adaptive learning environments, response patterns of students are necessarily biased in ways that can typically be avoided in assessment contexts. In this talk, we discuss several variations on item response theory and response patterns observed in data from Knewton’s adaptive learning platform, and we connect the pitfalls they highlight to corner cases of these models that practitioners should be aware of.
Bio: Kevin studied mathematics at the University of Michigan, where he re-implemented the computer science department's autograding technology and studied parallel SAT solving. During this time, he also obtained several cryptography-related patents. He went on to get his Ph.D. at Princeton University where he worked with Manjul Bhargava and focused on the connections between representation theory, commutative algebra, and number theory. Kevin continues collaborating with number theorists, being especially interested in the security of certain cryptographic schemes and in random families of representations. He joined Knewton in 2012 and specializes in the proficiency models underlying Knewton’s recommendation system.