%0 Journal Article %J BMC Pediatrics %D 2020 %T Computerized adaptive testing to screen children for emotional and behavioral problems by preventive child healthcare %A Theunissen, Meinou H.C. %A de Wolff, Marianne S. %A Deurloo, Jacqueline A. %A Vogels, Anton G. C. %X

Background

Questionnaires to detect emotional and behavioral problems (EBP) in Preventive Child Healthcare (PCH) should be short, which potentially affects their validity and reliability. Simulation studies have shown that Computerized Adaptive Testing (CAT) could overcome these weaknesses. We studied the applicability (measured by participation rate, satisfaction, and efficiency) and the validity of CAT in routine PCH practice.

Methods

We analyzed data on 461 children aged 10–11 years (response rate 41%), who were assessed during routine well-child examinations by PCH professionals. Before the visit, parents completed the CAT and the Child Behavior Checklist (CBCL). Satisfaction was measured by parent and PCH professional report. The efficiency of the CAT procedure was measured as the number of items needed to assess whether a child has serious problems. Its validity was assessed using the CBCL as the criterion.

Results

Parents and PCH professionals rated the CAT on average as good. The procedure required on average 16 items to assess whether a child has serious problems. Agreement of scores on the CAT scales with the corresponding CBCL scales was high (range of Spearman correlations: 0.59–0.72). Areas under the curve (AUCs) were high (range: 0.95–0.97) for the Psycat total, externalizing, and hyperactivity scales, using the corresponding CBCL scale scores as the criterion. For the Psycat internalizing scale the AUC was somewhat lower but still high (0.86).

Conclusions

CAT is a valid procedure for the identification of emotional and behavioral problems in children aged 10–11 years. It may support the efficient and accurate identification of children with overall, and potentially also specific, emotional and behavioral problems in routine PCH.

%B BMC Pediatrics %V 20 %U https://bmcpediatr.biomedcentral.com/articles/10.1186/s12887-020-2018-1 %N Article number: 119 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T Multidimensional Test Assembly Using Mixed-Integer Linear Programming: An Application of Kullback–Leibler Information %A Dries Debeer %A Peter W. van Rijn %A Usama S. Ali %X Many educational testing programs require different test forms with minimal or no item overlap. At the same time, the test forms should be parallel in terms of their statistical and content-related properties. A well-established method to assemble parallel test forms is to apply combinatorial optimization using mixed-integer linear programming (MILP). Using this approach, in the unidimensional case, Fisher information (FI) is commonly used as the statistical target to obtain parallelism. In the multidimensional case, however, FI is a multidimensional matrix, which complicates its use as a statistical target. Previous research addressing this problem focused on item selection criteria for multidimensional computerized adaptive testing (MCAT). Yet these selection criteria are not directly transferable to the assembly of linear parallel test forms. To bridge this gap the authors derive different statistical targets, based on either FI or the Kullback–Leibler (KL) divergence, that can be applied in MILP models to assemble multidimensional parallel test forms. Using simulated item pools and an item pool based on empirical items, the proposed statistical targets are compared and evaluated. Promising results with respect to the KL-based statistical targets are presented and discussed. 
%B Applied Psychological Measurement %V 44 %P 17-32 %U https://doi.org/10.1177/0146621619827586 %R 10.1177/0146621619827586 %0 Journal Article %J Journal of Educational Measurement %D 2019 %T Efficiency of Targeted Multistage Calibration Designs Under Practical Constraints: A Simulation Study %A Berger, Stéphanie %A Verschoor, Angela J. %A Eggen, Theo J. H. M. %A Moser, Urs %X Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that considers ability-related background variables and performance for assigning students to suitable items. Furthermore, we investigated whether uncertainty about item difficulty could impair the assembly of efficient designs. The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. Limited knowledge about item difficulty reduced the efficiency of one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust. %B Journal of Educational Measurement %V 56 %P 121-146 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12203 %R 10.1111/jedm.12203 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Measuring patient-reported outcomes adaptively: Multidimensionality matters! %A Paap, Muirne C. S. %A Kroeze, Karel A. %A Glas, C. A. W. %A Terwee, C. B. %A van der Palen, Job %A Veldkamp, Bernard P.
%B Applied Psychological Measurement %R 10.1177/0146621617733954 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Adaptive Item and Feedback Selection in Personalized Learning with a Network Approach %A Nikky van Buuren %A Hendrik Straat %A Theo Eggen %A Jean-Paul Fox %K feedback selection %K item selection %K network approach %K personalized learning %X

Personalized learning is a term used to describe educational systems that adapt curriculum sequencing, pacing, and presentation to each student's unique background, knowledge, preferences, interests, and learning goals (Chen, 2008; Netcoh, 2016). The technological approach to personalized learning provides data-driven models that incorporate these adaptations automatically. Examples of applications include online learning systems, educational games, and revision-aid systems. In this study, we introduce Bayesian networks as a methodology for implementing an adaptive framework within a personalized learning environment. Existing ideas from Computerized Adaptive Testing (CAT) with Item Response Theory (IRT), where choices about content provision are based on maximizing information, are related to the goals of personalized learning environments. Personalized learning entails goals beyond efficient ability estimation by maximizing information, such as an adaptive configuration of preferences and feedback to the student. These considerations are discussed, and their application in networks is illustrated.

Adaptivity in Personalized Learning. In standard CATs, the focus is on selecting items that provide maximum information about the ability of an individual at a certain point in time (Van der Linden & Glas, 2000). When learning is the main goal of testing, alternative adaptive item-selection methods have been explored (Eggen, 2012). The adaptive choices made in personalized learning applications require additional adaptivity with respect to the following aspects: the moment of feedback, the kind of feedback, and the possibility for students to actively influence the learning process.

Bayesian Networks and Personalized Learning. Personalized learning aims at constructing a framework that incorporates all the aspects mentioned above. The goal of this framework is therefore not only to obtain ability estimates by choosing items on maximum information, but also to allow these other factors to play a role. Plajner and Vomlel (2016) have already applied Bayesian networks to adaptive testing, selecting items with the help of entropy reduction. Almond et al. (2015) provide a reference work on Bayesian networks in educational assessment. Both acknowledge the potential of the method in terms of features such as modularity options to build finer-grained models. IRT does not easily allow modeling sub-skills or gathering information at a fine-grained level, because it generally depends on the assumption of a single underlying trait. The local independence assumption in IRT implies an interest mainly in the student's overall ability on the subject of interest. When the goal is to improve students' learning, we are not just interested in efficiently arriving at a test score on a global subject: we want a model that can map educational problems and talents in detail across the whole educational program, while allowing for dependency between items. At a given moment in time some topics may be mastered better than others, and this is exactly what we want to get out of a model. The possibility of modeling flexible structures, estimating abilities at a very detailed level for sub-skills, and easily incorporating other variables such as feedback makes Bayesian networks a very promising method for making adaptive choices in personalized learning. This research shows how item and feedback selection can be performed with the help of Bayesian networks. A student involvement option is also introduced and evaluated.
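The entropy-reduction criterion used by Plajner and Vomlel (2016) can be sketched for a single mastered/not-mastered skill node; the item parameters below are hypothetical, and an operational network would contain many linked skill and feedback nodes.

```python
import math

def entropy(p):
    """Shannon entropy of a Bernoulli belief P(mastered) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def posterior(p, p_corr_mastered, p_corr_not, correct):
    """Bayes update of P(mastered) after observing one response."""
    if correct:
        num = p * p_corr_mastered
        den = num + (1 - p) * p_corr_not
    else:
        num = p * (1 - p_corr_mastered)
        den = num + (1 - p) * (1 - p_corr_not)
    return num / den

def select_item(belief, items):
    """Pick the item whose answer minimizes expected posterior entropy,
    i.e., maximizes expected entropy reduction."""
    best, best_h = None, float("inf")
    for name, (pm, pn) in items.items():
        p_corr = belief * pm + (1 - belief) * pn  # predictive P(correct)
        h = (p_corr * entropy(posterior(belief, pm, pn, True))
             + (1 - p_corr) * entropy(posterior(belief, pm, pn, False)))
        if h < best_h:
            best, best_h = name, h
    return best
```

With a flat belief, a diagnostic item that separates masters from non-masters is preferred over an item almost everyone answers correctly.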

References

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian Networks in Educational Assessment. New York: Springer Science+Business Media. http://doi.org/10.1007/978-0-387-98138-3

Eggen, T. J. H. M. (2012). Computerized adaptive testing item selection in computerized adaptive learning systems. In T. J. H. M. Eggen & B. P. Veldkamp (Eds.), Psychometrics in Practice at RCEC. Enschede: RCEC.

Netcoh, S. (2016, March). “What do you mean by ‘personalized learning’?” Crosscutting Conversations in Education – Research, Reflections & Practice [Blog post].

Plajner, M., & Vomlel, J. (2016). Student Skill Models in Adaptive Testing. In Proceedings of the Eighth International Conference on Probabilistic Graphical Models (pp. 403-414).

Van der Linden, W. J., & Glas, C. A. (2000). Computerized adaptive testing: Theory and practice. Dordrecht: Kluwer Academic Publishers.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2017 %T ATS-PD: An Adaptive Testing System for Psychological Disorders %A Ivan Donadello %A Andrea Spoto %A Francesco Sambo %A Silvana Badaloni %A Umberto Granziol %A Giulio Vidotto %X The clinical assessment of mental disorders can be a time-consuming and error-prone procedure, consisting of a sequence of diagnostic hypothesis formulation and testing aimed at restricting the set of plausible diagnoses for the patient. In this article, we propose a novel computerized system for the adaptive testing of psychological disorders. The proposed system combines a mathematical representation of psychological disorders, known as the “formal psychological assessment,” with an algorithm designed for the adaptive assessment of an individual’s knowledge. The assessment algorithm is extended and adapted to the new application domain. Testing the system on a real sample of 4,324 healthy individuals, screened for obsessive-compulsive disorder, we demonstrate the system’s ability to support clinical testing, both by identifying the correct critical areas for each individual and by reducing the number of posed questions with respect to a standard written questionnaire. %B Educational and Psychological Measurement %V 77 %P 792-815 %U https://doi.org/10.1177/0013164416652188 %R 10.1177/0013164416652188 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Bayesian Perspectives on Adaptive Testing %A Wim J. van der Linden %A Bingnan Jiang %A Hao Ren %A Seung W. Choi %A Qi Diao %K Bayesian Perspective %K CAT %X

Although adaptive testing is usually treated from the perspective of maximum-likelihood parameter estimation and maximum-information item selection, a Bayesian perspective is more natural, statistically efficient, and computationally tractable. This observation holds not only for the core process of ability estimation but also for such processes as item calibration and real-time monitoring of item security. Key elements of the approach are parametric modeling of each relevant process, updating of the parameter estimates after the arrival of each new response, and optimal design of the next step.

The purpose of the symposium is to illustrate the role of Bayesian statistics in this approach. The first presentation discusses a basic Bayesian algorithm for the sequential update of any parameter in adaptive testing and illustrates the idea of Bayesian optimal design for the two processes of ability estimation and online item calibration. The second presentation generalizes these ideas to the case of adaptive testing with polytomous items. The third presentation uses the fundamental Bayesian idea of sampling from updated posterior predictive distributions (“multiple imputations”) to deal with the problem of scoring incomplete adaptive tests.
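The sequential update described in the first presentation can be sketched on a discrete ability grid under the Rasch model; the grid range and standard-normal prior are illustrative choices, not those of any particular program.

```python
import math

GRID = [i / 10 for i in range(-40, 41)]  # ability grid from -4.0 to 4.0

def rasch_p(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def update(posterior, b, correct):
    """Sequential Bayesian update: multiply the current posterior by the
    likelihood of the new response and renormalize."""
    like = [rasch_p(t, b) if correct else 1.0 - rasch_p(t, b) for t in GRID]
    post = [p * l for p, l in zip(posterior, like)]
    total = sum(post)
    return [p / total for p in post]

def eap(posterior):
    """Expected a posteriori (EAP) ability estimate."""
    return sum(t * p for t, p in zip(GRID, posterior))

# Discretized standard-normal prior over ability
prior = [math.exp(-t * t / 2.0) for t in GRID]
s = sum(prior)
prior = [p / s for p in prior]
```

After each response the posterior replaces the prior, so the estimate is updated item by item; the same machinery extends to item-parameter updates in online calibration.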

Session Video 1

Session Video 2


%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Efficiency of Targeted Multistage Calibration Designs under Practical Constraints: A Simulation Study %A Stephanie Berger %A Angela J. Verschoor %A Theo Eggen %A Urs Moser %K CAT %K Efficiency %K Multistage Calibration %X

Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we focused on two related research questions. First, we investigated whether the efficiency of item calibration under the Rasch model could be enhanced by calibration designs that optimize the match between item difficulty and student ability (Berger, 1991). To this end, we introduced targeted multistage calibration designs, a design type that combines traditional targeted calibration designs with multistage designs. As such, targeted multistage calibration designs consider ability-related background variables (e.g., grade in school), as well as performance (i.e., the outcome of a preceding test stage), for assigning students to suitable items.

Second, we explored how limited a priori knowledge about item difficulty affects the efficiency of both targeted calibration designs and targeted multistage calibration designs. When arranging items within a given calibration design, test developers need to know the item difficulties to locate items optimally within the design. However, usually, no empirical information about item difficulty is available before item calibration. Owing to missing empirical data, test developers might fail to assign all items to the most suitable location within a calibration design.

Both research questions were addressed in a simulation study in which we varied the calibration design, as well as the accuracy of item distribution across the different booklets or modules within each design (i.e., the number of misplaced items). The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. In particular, targeted multistage calibration designs provided more accurate estimates for very easy and very difficult items. Limited knowledge about item difficulty during test construction impaired the efficiency of all designs. The loss of efficiency was considerable for one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust.

References

Berger, M. P. F. (1991). On the efficiency of IRT models when applied to different sampling designs. Applied Psychological Measurement, 15(3), 293–306. doi:10.1177/014662169101500310

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/file/d/1ko2LuiARKqsjL_6aupO4Pj9zgk6p_xhd/view?usp=sharing %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Grow a Tiger out of Your CAT %A Angela Verschoor %K interoperability %K Scalability %K transparency %X

The main focus in the community of test developers and researchers is on improving adaptive test procedures and methodologies. Yet the transition from research projects to larger-scale operational CATs faces its own challenges. Usually, these operational CATs originate in government tenders. “Scalability”, “Interoperability”, and “Transparency” are three keywords often found in these documents. Scalability is addressed with parallel system architectures based on stateless selection algorithms; design capacities often range from 10,000 to well over 100,000 concurrent students. Interoperability is implemented through standards like QTI, standards that were not designed with adaptive testing in mind. Transparency is realized through open-source software: the adaptive test should not be a black box. These three requirements often complicate the development of an adaptive test, and sometimes they even conflict.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T The Implementation of Nationwide High Stakes Computerized (adaptive) Testing in the Netherlands %A Mia van Boxel %A Theo Eggen %K High stakes CAT %K Netherlands %K WISCAT %X

In this presentation, the challenges of implementing (adaptive) digital testing in the Facet system in the Netherlands are discussed. The Netherlands has a long tradition of adaptive testing in educational settings. Since the late 1990s, adaptive testing has been used there, mostly in low-stakes settings. Several CATs were implemented in student monitoring systems for primary education and for the general subjects language and arithmetic in vocational education. The only nationwide high-stakes CAT is the WISCAT-pabo: an arithmetic test for students in the first year of primary school teacher colleges. The psychometric advantages of item-based adaptive testing, such as efficiency and high measurement precision, are obvious. But there are also disadvantages, such as the impossibility of reviewing items during and after the test. During the test, students are not in control of their own test; e.g., they can only navigate forward to the next item. This is one of the reasons that other methods of testing, such as multistage testing, with adaptivity not at the item level but at the subtest level, have become more popular for high-stakes testing.

A main challenge of computerized (adaptive) testing is the implementation of the item bank and the test workflow in a digital system. In 2014, a new nationwide digital system (Facet) was introduced in the Netherlands, with connections to the digital systems of different parties based on international standards (LTI and QTI). The first nationwide tests in the Facet system were flexible exams in Dutch and arithmetic for vocational (and secondary) education, administered as item response theory-based equated linear multiple-forms tests during five periods per year. Nowadays there are several implementations of different methods of (multistage) adaptive testing in the same Facet system (DTT and Acet).

At this conference, other presenters from Cito will elaborate on the psychometric characteristics of these other adaptive testing methods. In this contribution, the system architecture and interoperability of the Facet system are explained. The emphasis is on the implementation and on the problems to be solved in using this digital system in all phases of the (adaptive) testing process: item banking, test construction, design, publication, test taking, analysis, and reporting to the student. An evaluation of the use of the system is presented.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Kn1PvgioUYaOJ5pykq-_XWnwDU15rRsf %0 Journal Article %J Quality of Life Research %D 2017 %T Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life %A Paap, Muirne C. S. %A Kroeze, Karel A. %A Terwee, Caroline B. %A van der Palen, Job %A Veldkamp, Bernard P. %B Quality of Life Research %V 26 %P 2909–2918 %U https://doi.org/10.1007/s11136-017-1624-3 %R 10.1007/s11136-017-1624-3 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2017 %T Latent-Class-Based Item Selection for Computerized Adaptive Progress Tests %A van Buuren, Nikky %A Eggen, Theo J. H. M. %K computerized adaptive progress test %K item selection method %K Kullback-Leibler information %K Latent class analysis %K log-odds scoring %B Journal of Computerized Adaptive Testing %V 5 %P 22-43 %U http://iacat.org/jcat/index.php/jcat/article/view/62/29 %N 2 %R 10.7333/1704-0502022 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T New Results on Bias in Estimates due to Discontinue Rules in Intelligence Testing %A Matthias von Davier %A Youngmi Cho %A Tianshu Pan %K Bias %K CAT %K Intelligence Testing %X

The presentation provides new results on a form of adaptive testing that is used frequently in intelligence testing. In these tests, items are presented in order of increasing difficulty, and the presentation is adaptive in the sense that each subtest session is discontinued once a test taker produces a certain number of incorrect responses in sequence. The subsequent (unobserved) responses are commonly scored as wrong for that subtest, even though the test taker has not seen them. Discontinuation rules allow a certain form of adaptiveness in both paper-based and computer-based testing, and help reduce testing time.

Two lines of research are relevant here: studies that directly assess the impact of discontinuation rules, and studies that more broadly look at the impact of scoring rules on test results with a large number of not-administered or not-reached items. He and Wolfe (2012) compared different ability estimation methods for this type of discontinuation-rule adaptation of test length in a simulation study. To our knowledge, however, there has been no rigorous analytical study of the underlying distributional changes of the response variables under discontinuation rules. It is important to point out that the results obtained by He and Wolfe (2012) agree with results presented by, for example, DeAyala, Plake, and Impara (2001), as well as Rose, von Davier, and Xu (2010) and Rose, von Davier, and Nagengast (2016), in that ability estimates are biased most when the unobserved responses are scored as wrong. Discontinuation rules combined with scoring non-administered items as wrong are used operationally in several major intelligence tests, so more research is needed to improve this particular type of adaptiveness in testing practice.

The presentation extends existing research on adaptiveness by discontinue-rules in intelligence tests in multiple ways: First, a rigorous analytical study of the distributional properties of discontinue-rule scored items is presented. Second, an extended simulation is presented that includes additional alternative scoring rules as well as bias-corrected ability estimators that may be suitable to improve results for discontinue-rule scored intelligence tests.
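The discontinue-rule mechanism under discussion can be sketched in a small simulation; the Rasch model, the stopping length, and the item difficulties below are illustrative assumptions, not the parameters of any operational intelligence test.

```python
import math
import random

def p_correct(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def administer(theta, difficulties, stop_after=3, rng=random):
    """Present items in order of increasing difficulty and discontinue the
    subtest after `stop_after` consecutive incorrect responses."""
    responses, streak = [], 0
    for b in sorted(difficulties):
        x = 1 if rng.random() < p_correct(theta, b) else 0
        responses.append(x)
        streak = streak + 1 if x == 0 else 0
        if streak == stop_after:
            break
    return responses

def proportion_correct(responses, n_items, missing_as_wrong=True):
    """Score the subtest: not-administered items are either scored as wrong
    (the operational practice the abstract questions) or dropped."""
    if missing_as_wrong:
        return sum(responses) / n_items
    return sum(responses) / len(responses)
```

Comparing the two scoring options on the same truncated response vector makes the direction of the distortion visible: scoring the unseen remainder as wrong can only lower the observed proportion correct.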

References

DeAyala, R. J., Plake, B. S., & Impara, J. C. (2001). The impact of omitted responses on the accuracy of ability estimation in item response theory. Journal of Educational Measurement, 38, 213-234.

He, W., & Wolfe, E. W. (2012). Treatment of Not-Administered Items on Individually Administered Intelligence Tests. Educational and Psychological Measurement, 72(5), 808-826. doi:10.1177/0013164412441937

Rose, N., von Davier, M., & Xu, X. (2010). Modeling non-ignorable missing data with item response theory (IRT; ETS RR-10-11). Princeton, NJ: Educational Testing Service.

Rose, N., von Davier, M., & Nagengast, B. (2016). Modeling omitted and not-reached items in IRT models. Psychometrika. doi:10.1007/s11336-016-9544-7

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Journal Article %J Frontiers in Education %D 2017 %T Robust Automated Test Assembly for Testlet-Based Tests: An Illustration with Analytical Reasoning Items %A Veldkamp, Bernard P. %A Paap, Muirne C. S. %B Frontiers in Education %V 2 %P 63 %U https://www.frontiersin.org/article/10.3389/feduc.2017.00063 %R 10.3389/feduc.2017.00063 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Using Determinantal Point Processes for Multistage Testing %A Jill-Jênn Vie %K Multidimensional CAT %K multistage testing %X

Multistage tests are a generalization of computerized adaptive tests (CATs) that allow batches of questions to be asked before the process starts to adapt, instead of asking questions one by one. To be used in real-world scenarios, they should be assembled on the fly, and recent models have been designed accordingly (Zheng & Chang, 2015). We present a new algorithm for assembling multistage tests, based on a recent machine-learning technique called determinantal point processes. We illustrate this technique on various student data sets, ranging from fraction subtraction items to massive open online courses.

In multidimensional CATs, feature vectors are estimated for students and questions, and the probability that a student answers a question correctly depends on how strongly their feature vector is correlated with the question's feature vector. In other words, questions that are close in space lead to similar response patterns from the students. Therefore, to maximize the information of a batch of questions, the volume spanned by their feature vectors should be as large as possible. Determinantal point processes allow efficient sampling of diverse batches of items from a bank, i.e., batches that span a large volume: it is possible to draw k items among n with O(nk^3) complexity, which is convenient for large banks of tens of thousands of items.
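A common way to approximate DPP-based selection is the greedy volume-maximization heuristic sketched below; the exact k-DPP sampler referred to in the abstract is more involved, and the item feature vectors here are hypothetical.

```python
def residual_norm_sq(v, basis):
    """Squared norm of v after projecting out an orthonormal basis."""
    r = list(v)
    for b in basis:
        coef = sum(x * y for x, y in zip(r, b))
        r = [x - coef * y for x, y in zip(r, b)]
    return sum(x * x for x in r)

def greedy_diverse_batch(vectors, k):
    """Greedily pick up to k item feature vectors spanning a large volume,
    a standard greedy approximation to DPP MAP inference: each step adds
    the vector with the largest component orthogonal to the span so far."""
    selected, basis = [], []
    for _ in range(k):
        best, best_gain = None, 0.0
        for i, v in enumerate(vectors):
            if i in selected:
                continue
            gain = residual_norm_sq(v, basis)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # remaining vectors add no volume
            break
        selected.append(best)
        # extend the orthonormal basis with the chosen vector's residual
        r = list(vectors[best])
        for b in basis:
            coef = sum(x * y for x, y in zip(r, b))
            r = [x - coef * y for x, y in zip(r, b)]
        norm = best_gain ** 0.5
        basis.append([x / norm for x in r])
    return selected
```

Two nearly parallel vectors contribute little extra volume, so the heuristic skips the redundant one in favor of a question probing a different direction of the latent space.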

References

Zheng, Y., & Chang, H. H. (2015). On-the-fly assembled multistage adaptive testing. Applied Psychological Measurement, 39(2), 104-118.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1GkJkKTEFWK3srDX8TL4ra_Xbsliemu1R %0 Journal Article %J Journal of Educational Measurement %D 2016 %T On the Issue of Item Selection in Computerized Adaptive Testing With Response Times %A Veldkamp, Bernard P. %X Many standardized tests are now administered via computer rather than paper-and-pencil format. The computer-based delivery mode brings with it certain advantages. One advantage is the ability to adapt the difficulty level of the test to the ability level of the test taker in what has been termed computerized adaptive testing (CAT). A second advantage is the ability to record not only the test taker's response to each item (i.e., question), but also the amount of time the test taker spends considering and answering each item. Combining these two advantages, various methods were explored for utilizing response time data in selecting appropriate items for an individual test taker. Four strategies for incorporating response time data were evaluated, and the precision of the final test-taker score was assessed by comparing it to a benchmark value that did not take response time information into account. While differences in measurement precision and testing times were expected, results showed that the strategies did not differ much with respect to measurement precision but that there were differences with regard to the total testing time. %B Journal of Educational Measurement %V 53 %P 212–228 %U http://dx.doi.org/10.1111/jedm.12110 %R 10.1111/jedm.12110 %0 Journal Article %J Applied Psychological Measurement %D 2016 %T Multidimensional Computerized Adaptive Testing for Classifying Examinees With Within-Dimensionality %A van Groen, Maaike M. %A Eggen, Theo J. H. M. %A Veldkamp, Bernard P.
%X A classification method is presented for adaptive classification testing with a multidimensional item response theory (IRT) model in which items are intended to measure multiple traits, that is, within-dimensionality. The reference composite is used with the sequential probability ratio test (SPRT) to make decisions and decide whether testing can be stopped before reaching the maximum test length. Item-selection methods are provided that maximize the determinant of the information matrix at the cutoff point or at the projected ability estimate. A simulation study illustrates the efficiency and effectiveness of the classification method. Simulations were run with the new item-selection methods, random item selection, and maximization of the determinant of the information matrix at the ability estimate. The study also showed that the SPRT with multidimensional IRT has the same characteristics as the SPRT with unidimensional IRT and results in more accurate classifications than the latter when used for multidimensional data. %B Applied Psychological Measurement %V 40 %P 387-404 %U http://apm.sagepub.com/content/40/6/387.abstract %R 10.1177/0146621616648931 %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Computerized Adaptive Testing for the Random Weights Linear Logistic Test Model %A Crabbe, Marjolein %A Vandebroek, Martina %X

This article discusses four item-selection rules to design efficient individualized tests for the random weights linear logistic test model (RWLLTM): minimum posterior-weighted D-error (DB), minimum expected posterior-weighted D-error (EDB), maximum expected Kullback–Leibler divergence between subsequent posteriors (KLP), and maximum mutual information (MUI). The RWLLTM decomposes test items into a set of subtasks or cognitive features and assumes individual-specific effects of the features on the difficulty of the items. The model extends and improves the well-known linear logistic test model, in which feature effects are estimated only at the aggregate level. Simulations show that the efficiencies of the designs obtained with the different criteria appear to be equivalent. However, KLP and MUI are given preference over DB and EDB due to their lesser complexity, which significantly reduces the computational burden.

%B Applied Psychological Measurement %V 38 %P 415-431 %U http://apm.sagepub.com/content/38/6/415.abstract %R 10.1177/0146621614533987 %0 Book %D 2014 %T Computerized multistage testing: Theory and applications %A Duanli Yan %A Alina A von Davier %A Charles Lewis %I CRC Press %C Boca Raton FL %@ 13-978-1-4665-0577-3 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2014 %T Item Selection Methods Based on Multiple Objective Approaches for Classifying Respondents Into Multiple Levels %A van Groen, Maaike M. %A Eggen, Theo J. H. M. %A Veldkamp, Bernard P. %X

Computerized classification tests classify examinees into two or more levels while maximizing accuracy and minimizing test length. The majority of currently available item-selection methods maximize information at one point on the ability scale, but in a test with multiple cutting points, selection methods could take all these points into account simultaneously. If one objective is specified for each cutting point, the objectives can be combined into one optimization function using multiple-objective approaches. Simulation studies were used to compare the efficiency and accuracy of eight selection methods in a test based on the sequential probability ratio test. Small differences were found in accuracy and efficiency between the methods, depending on the item pool and the settings of the classification method. The size of the indifference region had little influence on accuracy but considerable influence on efficiency. Content and exposure control had little influence on accuracy and efficiency.
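The sequential probability ratio test underlying these classification methods can be sketched for a single cutting point under the Rasch model; the paper's methods handle multiple cutting points and richer item selection, and the indifference-region width and error rates below are illustrative defaults.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_classify(responses, cut=0.0, delta=0.5, alpha=0.05, beta=0.05):
    """Wald's SPRT for one cutting point: test theta = cut + delta against
    theta = cut - delta (the indifference region is cut +/- delta),
    stopping as soon as the log-likelihood ratio leaves the continuation
    region."""
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    llr = 0.0
    for b, x in responses:  # each response is (item difficulty, 0/1 score)
        p_hi, p_lo = rasch_p(cut + delta, b), rasch_p(cut - delta, b)
        llr += math.log(p_hi / p_lo) if x else math.log((1.0 - p_hi) / (1.0 - p_lo))
        if llr >= upper:
            return "above"
        if llr <= lower:
            return "below"
    return "undecided"
```

A consistent response pattern triggers an early decision, while a mixed pattern keeps the statistic inside the continuation region, which is why the indifference-region width drives test length more than accuracy.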

%B Applied Psychological Measurement %V 38 %P 187-200 %U http://apm.sagepub.com/content/38/3/187.abstract %R 10.1177/0146621613509723 %0 Journal Article %J Applied Psychological Measurement %D 2013 %T Uncertainties in the Item Parameter Estimates and Robust Automated Test Assembly %A Veldkamp, Bernard P. %A Matteucci, Mariagiulia %A de Jong, Martijn G. %X

Item response theory parameters have to be estimated, and because of the estimation process they carry uncertainty. In most large-scale testing programs, the parameters are stored in item banks, and automated test assembly algorithms are applied to assemble operational test forms. These algorithms treat item parameters as fixed values, and uncertainty is not taken into account. As a consequence, resulting tests might be off target or less informative than expected. In this article, the process of parameter estimation is described to provide insight into the causes of uncertainty in the item parameters, and the consequences of that uncertainty are studied. In addition, an alternative automated test assembly algorithm is presented that is robust against uncertainties in the data. Several numerical examples demonstrate the performance of the robust test assembly algorithm and illustrate the consequences of not taking this uncertainty into account. Finally, some recommendations about the use of robust test assembly and some directions for further research are given.

%B Applied Psychological Measurement %V 37 %P 123-139 %U http://apm.sagepub.com/content/37/2/123.abstract %R 10.1177/0146621612469825 %0 Generic %D 2011 %T Cross-cultural development of an item list for computer-adaptive testing of fatigue in oncological patients %A Giesinger, J. M. %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Arraras, J. I. %A Conroy, T. %A Gamper, E. M. %A Kemmler, G. %A King, M. T. %A Oberguggenberger, A. S. %A Velikova, G. %A Young, T. %A Holzner, B. %A Eortc-Qlg, E. O. %X ABSTRACT: INTRODUCTION: Within an ongoing project of the EORTC Quality of Life Group, we are developing computerized adaptive test (CAT) measures for the QLQ-C30 scales. These new CAT measures are conceptualised to reflect the same constructs as the QLQ-C30 scales. Accordingly, the Fatigue-CAT is intended to capture physical and general fatigue. METHODS: The EORTC approach to CAT development comprises four phases (literature search, operationalisation, pre-testing, and field testing). Phases I-III are described in detail in this paper. A literature search for fatigue items was performed in major medical databases. After refinement through several expert panels, the remaining items were used as the basis for adapting items and/or formulating new items fitting the EORTC item style. To obtain feedback from patients with cancer, these English items were translated into Danish, French, German, and Spanish and tested in the respective countries. RESULTS: Based on the literature search a list containing 588 items was generated. After a comprehensive item selection procedure focusing on content, redundancy, item clarity and item difficulty a list of 44 fatigue items was generated. Patient interviews (n=52) resulted in 12 revisions of wording and translations. DISCUSSION: The item list developed in phases I-III will be further investigated within a field-testing phase (IV) to examine psychometric characteristics and to fit an item response theory model. 
The Fatigue CAT based on this item bank will provide scores that are backward-compatible to the original QLQ-C30 fatigue scale. %B Health and Quality of Life Outcomes %7 2011/03/31 %V 9 %P 10 %8 March 29, 2011 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G Eng %M 21447160 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Item Selection Methods based on Multiple Objective Approaches for Classification of Respondents into Multiple Levels %A Maaike van Groen %A Theo Eggen %A Bernard Veldkamp %K adaptive classification test %K CAT %K item selection %K sequential classification test %X

Is it possible to develop new item selection methods that take advantage of the fact that we want to classify into multiple categories? The new methods take multiple points on the ability scale into account and are based on multiple objective approaches.


%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Journal Article %J Journal of Methods and Measurement in the Social Sciences %D 2011 %T Measuring Individual Growth With Conventional and Adaptive Tests %A Weiss, D. J. %A Von Minden, S. %B Journal of Methods and Measurement in the Social Sciences %V 2 %P 80-101 %G English %N 1 %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T Optimal Calibration Designs for Computerized Adaptive Testing %A Angela Verschoor %K balanced block design %K CAT %K item calibration %K optimization %K Rasch %X

Optimization: How can we exploit the advantages of a Balanced Block Design while keeping the logistics manageable?

Homogeneous Designs: Overlap between test booklets as regular as possible.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Conference Paper %B Annual Conference of the International Association for Computerized Adaptive Testing %D 2011 %T A Test Assembly Model for MST %A Angela Verschoor %A Ingrid Radtke %A Theo Eggen %K CAT %K mst %K multistage testing %K Rasch %K routing %K tif %X

This study is a short exploration of the optimization of an MST. It is extremely hard, and perhaps impossible, to chart the influence of the item pool and test specifications on the optimization process. Simulations are very helpful in finding an acceptable MST.

%B Annual Conference of the International Association for Computerized Adaptive Testing %8 10/2011 %G eng %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Adaptive Mastery Testing Using a Multidimensional IRT Model %A Glas, C. A. W. %A Vos, H. J. %B Elements of Adaptive Testing %P 409-431 %G eng %& 21 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Psicologica %D 2010 %T Bayesian item selection in constrained adaptive testing %A Veldkamp, B. P. %K computerized adaptive testing %X Application of Bayesian item selection criteria in computerized adaptive testing might result in improvement of bias and MSE of the ability estimates. The question remains how to apply Bayesian item selection criteria in the context of constrained adaptive testing, where large numbers of specifications have to be taken into account in the item selection process. The Shadow Test Approach is a general purpose algorithm for administering constrained CAT. In this paper it is shown how the approach can be slightly modified to handle Bayesian item selection criteria. No differences in performance were found between the shadow test approach and the modified approach. In a simulation study of the LSAT, the effects of Bayesian item selection criteria are illustrated. The results are compared to item selection based on Fisher Information. General recommendations about the use of Bayesian item selection criteria are provided. %B Psicologica %V 31 %P 149-169 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2010 %T A Comparison of Item Selection Techniques for Testlets %A Murphy, Daniel L. %A Dodd, Barbara G. %A Vaughn, Brandon K. %X

This study examined the performance of the maximum Fisher’s information, the maximum posterior weighted information, and the minimum expected posterior variance methods for selecting items in a computerized adaptive testing system when the items were grouped in testlets. A simulation study compared the efficiency of ability estimation among the item selection techniques under varying conditions of local-item dependency when the response model was either the three-parameter-logistic item response theory or the three-parameter-logistic testlet response theory. The item selection techniques performed similarly within any particular condition, the practical implications of which are discussed within the article.

%B Applied Psychological Measurement %V 34 %P 424-437 %U http://apm.sagepub.com/content/34/6/424.abstract %R 10.1177/0146621609349804 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Constrained Adaptive Testing with Shadow Tests %A van der Linden, W. J. %B Elements of Adaptive Testing %P 31-56 %G eng %& 2 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Designing Item Pools for Adaptive Testing %A Veldkamp, B. P. %A van der Linden, W. J. %B Elements of Adaptive Testing %P 231-245 %G eng %& 12 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J Personality and Individual Differences %D 2010 %T Detection of aberrant item score patterns in computerized adaptive testing: An empirical example using the CUSUM %A Egberink, I. J. L. %A Meijer, R. R. %A Veldkamp, B. P. %A Schakel, L. %A Smid, N. G. %K CAT %K computerized adaptive testing %K CUSUM approach %K person Fit %X The scalability of individual trait scores on a computerized adaptive test (CAT) was assessed through investigating the consistency of individual item score patterns. A sample of N = 428 persons completed a personality CAT as part of a career development procedure. To detect inconsistent item score patterns, we used a cumulative sum (CUSUM) procedure. Combined information from the CUSUM, other personality measures, and interviews showed that similar estimated trait values may have a different interpretation. Implications for computer-based assessment are discussed. %B Personality and Individual Differences %V 48 %P 921-925 %@ 01918869 %G eng %0 Journal Article %J Quality of Life Research %D 2010 %T Development of computerized adaptive testing (CAT) for the EORTC QLQ-C30 physical functioning dimension %A Petersen, M. A. %A Groenvold, M. %A Aaronson, N. K. %A Chie, W. C. %A Conroy, T. %A Costantini, A. %A Fayers, P. %A Helbostad, J. %A Holzner, B. %A Kaasa, S. %A Singer, S. %A Velikova, G. %A Young, T.
%X PURPOSE: Computerized adaptive test (CAT) methods, based on item response theory (IRT), enable a patient-reported outcome instrument to be adapted to the individual patient while maintaining direct comparability of scores. The EORTC Quality of Life Group is developing a CAT version of the widely used EORTC QLQ-C30. We present the development and psychometric validation of the item pool for the first of the scales, physical functioning (PF). METHODS: Initial developments (including literature search and patient and expert evaluations) resulted in 56 candidate items. Responses to these items were collected from 1,176 patients with cancer from Denmark, France, Germany, Italy, Taiwan, and the United Kingdom. The items were evaluated with regard to psychometric properties. RESULTS: Evaluations showed that 31 of the items could be included in a unidimensional IRT model with acceptable fit and good content coverage, although the pool may lack items at the upper extreme (good PF). There were several findings of significant differential item functioning (DIF). However, the DIF findings appeared to have little impact on the PF estimation. CONCLUSIONS: We have established an item pool for CAT measurement of PF and believe that this CAT instrument will clearly improve the EORTC measurement of PF. %B Quality of Life Research %7 2010/10/26 %V 20 %P 479-490 %@ 1573-2649 (Electronic)0962-9343 (Linking) %G Eng %M 20972628 %0 Book %D 2010 %T Elements of Adaptive Testing %A van der Linden, W. J. %A Glas, C. A. W. %I Springer %C New York %P 437 %G eng %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Estimation of the Parameters in an Item-Cloning Model for Adaptive Testing %A Glas, C. A. W. %A van der Linden, W. J. %A Geerlings, H. 
%B Elements of Adaptive Testing %P 289-314 %G eng %& 15 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Item Selection and Ability Estimation in Adaptive Testing %A van der Linden, W. J. %A Pashley, P. J. %B Elements of Adaptive Testing %I Springer %C New York %P 3-30 %G eng %& 1 %R 10.1007/978-0-387-85461-8 %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2010 %T Marginal likelihood inference for a model for item responses and response times %A Glas, C. A. W. %A van der Linden, W. J. %X

Marginal maximum-likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first-level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

%B British Journal of Mathematical and Statistical Psychology %7 2010/01/30 %V 63 %P 603-26 %@ 0007-1102 (Print)0007-1102 (Linking) %G eng %M 20109271 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T MATHCAT: A Flexible Testing System in Mathematics Education for Adults %A Verschoor, Angela J. %A Straetmans, G. J. J. M. %B Elements of Adaptive Testing %P 137-150 %G eng %& 7 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Multidimensional Adaptive Testing with Kullback–Leibler Information Item Selection %A Mulder, J. %A van der Linden, W. J. %B Elements of Adaptive Testing %P 77-102 %G eng %& 4 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Sequencing an Adaptive Test Battery %A van der Linden, W. J. %B Elements of Adaptive Testing %G eng %& 5 %R 10.1007/978-0-387-85461-8 %0 Book Section %B Elements of Adaptive Testing %D 2010 %T Testlet-Based Adaptive Mastery Testing %A Vos, H. J. %A Glas, C. A. W. %B Elements of Adaptive Testing %P 387-409 %G eng %& 20 %R 10.1007/978-0-387-85461-8 %0 Generic %D 2010 %T Validation of a computer-adaptive test to evaluate generic health-related quality of life %A Rebollo, P. %A Castejon, I. %A Cuervo, J. %A Villa, G. %A Garcia-Cueto, E. %A Diaz-Cuervo, H. %A Zardain, P. C. %A Muniz, J. %A Alonso, J. %X BACKGROUND: Health Related Quality of Life (HRQoL) is a relevant variable in the evaluation of health outcomes. Questionnaires based on Classical Test Theory typically require a large number of items to evaluate HRQoL. Computer Adaptive Testing (CAT) can be used to reduce tests length while maintaining and, in some cases, improving accuracy. This study aimed at validating a CAT based on Item Response Theory (IRT) for evaluation of generic HRQoL: the CAT-Health instrument. METHODS: Cross-sectional study of subjects aged over 18 attending Primary Care Centres for any reason. CAT-Health was administered along with the SF-12 Health Survey. 
Age, gender and a checklist of chronic conditions were also collected. CAT-Health was evaluated considering: 1) feasibility: completion time and test length; 2) content range coverage, Item Exposure Rate (IER) and test precision; and 3) construct validity: differences in the CAT-Health scores according to clinical variables and correlations between both questionnaires. RESULTS: 396 subjects answered CAT-Health and SF-12, 67.2% females, mean age (SD) 48.6 (17.7) years. 36.9% did not report any chronic condition. Median completion time for CAT-Health was 81 seconds (IQ range = 59-118) and it increased with age (p < 0.001). The median number of items administered was 8 (IQ range = 6-10). Neither ceiling nor floor effects were found for the score. None of the items in the pool had an IER of 100% and it was over 5% for 27.1% of the items. Test Information Function (TIF) peaked between levels -1 and 0 of HRQoL. Statistically significant differences were observed in the CAT-Health scores according to the number and type of conditions. CONCLUSIONS: Although domain-specific CATs exist for various areas of HRQoL, CAT-Health is one of the first IRT-based CATs designed to evaluate generic HRQoL and it has proven feasible, valid and efficient, when administered to a broad sample of individuals attending primary care settings. %B Health and Quality of Life Outcomes %7 2010/12/07 %V 8 %P 147 %@ 1477-7525 (Electronic)1477-7525 (Linking) %G eng %M 21129169 %2 3022567 %0 Journal Article %J Psicothema %D 2009 %T Comparison of methods for controlling maximum exposure rates in computerized adaptive testing %A Barrada, J %A Abad, F. J. %A Veldkamp, B. P. 
%K *Numerical Analysis, Computer-Assisted %K Psychological Tests/*standards/*statistics & numerical data %X This paper has two objectives: (a) to provide a clear description of three methods for controlling the maximum exposure rate in computerized adaptive testing (the Sympson-Hetter method, the restricted method, and the item-eligibility method), showing how all three can be interpreted as methods for constructing the variable sub-bank of items from which each examinee receives the items in his or her test; (b) to indicate the theoretical and empirical limitations of each method and to compare their performance. With the three methods, we obtained basically indistinguishable results in overlap rate and RMSE (differences in the third decimal place). The restricted method is the best method for controlling exposure rate, followed by the item-eligibility method. The worst method is the Sympson-Hetter method. The restricted method presents problems of sequential overlap rate. Our advice is to use the item-eligibility method, as it saves time and satisfies the goals of restricting maximum exposure. %B Psicothema %7 2009/05/01 %V 21 %P 313-320 %8 May %@ 0214-9915 (Print)0214-9915 (Linking) %G eng %M 19403088 %0 Journal Article %J Psychometrika %D 2009 %T Multidimensional Adaptive Testing with Optimal Design Criteria for Item Selection %A Mulder, J. %A van der Linden, W. J. %X Several criteria from the optimal design literature are examined for use with item selection in multidimensional adaptive testing. In particular, it is examined what criteria are appropriate for adaptive testing in which all abilities are intentional, some should be considered as a nuisance, or the interest is in the testing of a composite of the abilities. Both the theoretical analyses and the studies of simulated data in this paper suggest that the criteria of A-optimality and D-optimality lead to the most accurate estimates when all abilities are intentional, with the former slightly outperforming the latter. The criterion of E-optimality showed occasional erratic behavior for this case of adaptive testing, and its use is not recommended. If some of the abilities are nuisances, application of the criterion of A(s)-optimality (or D(s)-optimality), which focuses on the subset of intentional abilities, is recommended. For the measurement of a linear combination of abilities, the criterion of c-optimality yielded the best results. The preferences of each of these criteria for items with specific patterns of parameter values were also assessed. It was found that the criteria differed mainly in their preferences of items with different patterns of values for their discrimination parameters. 
%B Psychometrika %7 2010/02/02 %V 74 %P 273-296 %8 Jun %@ 0033-3123 (Print)0033-3123 (Linking) %G Eng %M 20119511 %2 2813188 %0 Journal Article %J Applied Psychological Measurement %D 2009 %T Multiple Maximum Exposure Rates in Computerized Adaptive Testing %A Barrada, Juan Ramón %A Veldkamp, Bernard P. %A Olea, Julio %X

Computerized adaptive testing is subject to security problems, as the item bank content remains operative over long periods and administration time is flexible for examinees. Spreading the content of a part of the item bank could lead to an overestimation of the examinees' trait level. The most common way of reducing this risk is to impose a maximum exposure rate (rmax) that no item should exceed. Several methods have been proposed with this aim. All of these methods establish a single value of rmax throughout the test. This study presents a new method, the multiple-rmax method, that defines as many values of rmax as the number of items presented in the test. In this way, it is possible to impose a high degree of randomness in item selection at the beginning of the test, leaving the administration of items with the best psychometric properties to the moment when the trait level estimation is most accurate. The implementation of the multiple-rmax method is described and is tested in simulated item banks and in an operative bank. Compared with a single maximum exposure method, the new method has a more balanced usage of the item bank and delays the possible distortion of trait estimation due to security problems, with either no or only slight decrements of measurement accuracy.

%B Applied Psychological Measurement %V 33 %P 58-73 %U http://apm.sagepub.com/content/33/1/58.abstract %R 10.1177/0146621608315329 %0 Journal Article %J Zeitschrift für Psychologie / Journal of Psychology %D 2008 %T Adaptive models of psychological testing %A van der Linden, W. J. %B Zeitschrift für Psychologie / Journal of Psychology %V 216(1) %P 3–11 %G eng %0 Journal Article %J Zeitschrift für Psychologie / Journal of Psychology %D 2008 %T Adaptive Models of Psychological Testing %A van der Linden, W. J. %B Zeitschrift für Psychologie / Journal of Psychology %V 216 %P 1-2 %N 1 %R 10.1027/0044-3409.216.1.49 %0 Journal Article %J Zeitschrift für Psychologie / Journal of Psychology %D 2008 %T Computerized Adaptive Testing of Personality Traits %A Hol, A. M. %A Vorst, H. C. M. %A Mellenbergh, G. J. %K Adaptive Testing %K cmoputer-assisted testing %K Item Response Theory %K Likert scales %K Personality Measures %X

A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun’s Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima’s graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while a substantial reduction of the number of used items could be realized (at the stopping rule of 0.4, about 33% of the 36 items were used).

%B Zeitschrift für Psychologie / Journal of Psychology %V 216 %P 12-21 %N 1 %R 10.1027/0044-3409.216.1.12 %0 Journal Article %J International Journal of Testing %D 2008 %T Implementing Sympson-Hetter Item-Exposure Control in a Shadow-Test Approach to Constrained Adaptive Testing %A Veldkamp, Bernard P. %A van der Linden, Wim J. %B International Journal of Testing %V 8 %P 272-289 %U http://www.tandfonline.com/doi/abs/10.1080/15305050802262233 %R 10.1080/15305050802262233 %0 Journal Article %J Zeitschrift für Psychologie %D 2008 %T Some new developments in adaptive testing technology %A van der Linden, W. J. %K computerized adaptive testing %X

In an ironic twist of history, modern psychological testing has returned to an adaptive format quite common when testing was not yet standardized. Important stimuli to the renewed interest in adaptive testing have been the development of item-response theory in psychometrics, which models the responses on test items using separate parameters for the items and test takers, and the use of computers in test administration, which enables us to estimate the parameter for a test taker and select the items in real time. This article reviews a selection from the latest developments in the technology of adaptive testing, such as constrained adaptive item selection, adaptive testing using rule-based item generation, multidimensional adaptive testing, adaptive use of test batteries, and the use of response times in adaptive testing.

%B Zeitschrift für Psychologie %V 216 %P 3-11 %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2008 %T Using response times for item selection in adaptive testing %A van der Linden, W. J. %B Journal of Educational and Behavioral Statistics %V 33 %P 5–20 %G eng %0 Journal Article %J Disability and Rehabilitation %D 2008 %T Utilizing Rasch measurement models to develop a computer adaptive self-report of walking, climbing, and running %A Velozo, C. A. %A Wang, Y. %A Lehman, L. A. %A Wang, J. H. %X Purpose.The purpose of this paper is to show how the Rasch model can be used to develop a computer adaptive self-report of walking, climbing, and running.Method.Our instrument development work on the walking/climbing/running construct of the ICF Activity Measure was used to show how to develop a computer adaptive test (CAT). Fit of the items to the Rasch model and validation of the item difficulty hierarchy was accomplished using Winsteps software. Standard error was used as a stopping rule for the CAT. Finally, person abilities were connected to items difficulties using Rasch analysis ‘maps’.Results.All but the walking one mile item fit the Rasch measurement model. A CAT was developed which selectively presented items based on the last calibrated person ability measure and was designed to stop when standard error decreased to a pre-set criterion. Finally, person ability measures were connected to the ability to perform specific walking/climbing/running activities using Rasch maps.Conclusions.Rasch measurement models can be useful in developing CAT measures for rehabilitation and disability. 
In addition to CATs reducing respondent burden, the connection of person measures to item difficulties may be important for the clinical interpretation of measures. %B Disability and Rehabilitation %V 30 %P 458-467 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2007 %T Computerized Adaptive Testing for Polytomous Motivation Items: Administration Mode Effects and a Comparison With Short Forms %A Hol, A. Michiel %A Vorst, Harrie C. M. %A Mellenbergh, Gideon J. %K 2220 Tests & Testing %K Adaptive Testing %K Attitude Measurement %K computer adaptive testing %K Computer Assisted Testing %K items %K Motivation %K polytomous motivation %K Statistical Validity %K Test Administration %K Test Forms %K Test Items %X

In a randomized experiment (n = 515), a computerized and a computerized adaptive test (CAT) are compared. The item pool consists of 24 polytomous motivation items. Although items are carefully selected, calibration data show that Samejima's graded response model did not fit the data optimally. A simulation study is done to assess possible consequences of model misfit. CAT efficiency was studied by a systematic comparison of the CAT with two types of conventional fixed length short forms, which are created to be good CAT competitors. Results showed no essential administration mode effects. Efficiency analyses show that CAT outperformed the short forms in almost all aspects when results are aggregated along the latent trait scale. The real and the simulated data results are very similar, which indicate that the real data results are not affected by model misfit.

%B Applied Psychological Measurement %V 31 %P 412-429 %U http://apm.sagepub.com/content/31/5/412.abstract %R 10.1177/0146621606297314 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2007 %T Conditional Item-Exposure Control in Adaptive Testing Using Item-Ineligibility Probabilities %A van der Linden, Wim J. %A Veldkamp, Bernard P. %X

Two conditional versions of the exposure-control method with item-ineligibility constraints for adaptive testing in van der Linden and Veldkamp (2004) are presented. The first version is for unconstrained item selection, the second for item selection with content constraints imposed by the shadow-test approach. In both versions, the exposure rates of the items are controlled using probabilities of item ineligibility given θ that adapt the exposure rates automatically to a goal value for the items in the pool. In an extensive empirical study with an adaptive version of the Law School Admission Test, the authors show how the method can be used to drive conditional exposure rates below goal values as low as 0.025. Obviously, the price to be paid for minimal exposure rates is a decrease in the accuracy of the ability estimates. This trend is illustrated with empirical data.

%B Journal of Educational and Behavioral Statistics %V 32 %P 398-418 %U http://jeb.sagepub.com/cgi/content/abstract/32/4/398 %R 10.3102/1076998606298044 %0 Book Section %D 2007 %T The development of a computerized adaptive test for integrity %A Egberink, I. J. L. %A Veldkamp, B. P. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2007 %T Development of a multiple-component CAT for measuring foreign language proficiency (SIMTEST) %A Sumbling, M. %A Sanz, P. %A Viladrich, M. C. %A Doval, E. %A Riera, L. %C D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Psycho-Oncology %D 2007 %T The initial development of an item bank to assess and screen for psychological distress in cancer patients %A Smith, A. B. %A Rush, R. %A Velikova, G. %A Wall, L. %A Wright, E. P. %A Stark, D. %A Selby, P. %A Sharpe, M. %K 3293 Cancer %K cancer patients %K Distress %K initial development %K Item Response Theory %K Models %K Neoplasms %K Patients %K Psychological %K psychological distress %K Rasch %K Stress %X Psychological distress is a common problem among cancer patients. Despite the large number of instruments that have been developed to assess distress, their utility remains disappointing. This study aimed to use Rasch models to develop an item-bank which would provide the basis for better means of assessing psychological distress in cancer patients. An item bank was developed from eight psychological distress questionnaires using Rasch analysis to link common items. Items from the questionnaires were added iteratively with common items as anchor points and misfitting items (infit mean square > 1.3) removed, and unidimensionality assessed. A total of 4914 patients completed the questionnaires providing an initial pool of 83 items. Twenty items were removed resulting in a final pool of 63 items. 
Good fit was demonstrated and no additional factor structure was evident from the residuals. However, there was little overlap between item locations and person measures, since items mainly targeted higher levels of distress. The Rasch analysis allowed items to be pooled and generated a unidimensional instrument for measuring psychological distress in cancer patients. Additional items are required to more accurately assess patients across the whole continuum of psychological distress. (PsycINFO Database Record (c) 2007 APA ) (journal abstract) %B Psycho-Oncology %V 16 %P 724-732 %@ 1057-9249 %G English %M 2007-12507-004 %0 Report %D 2007 %T A multiple objective test assembly approach for exposure control problems in computerized adaptive testing %A Veldkamp, B. P. %A Verschoor, Angela J. %A Theo Eggen %B Measurement and Research Department Reports %I Cito %C Arnhem, The Netherlands %G eng %0 Book Section %D 2007 %T The shadow-test approach: A universal framework for implementing adaptive testing %A van der Linden, W. J. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2007 %T Statistical aspects of adaptive testing %A van der Linden, W. J. %A Glas, C. A. W. %C C. R. Rao and S. Sinharay (Eds.), Handbook of statistics (Vol. 27: Psychometrics) (pp. 801-838). Amsterdam: North-Holland. %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2006 %T Assembling a computerized adaptive testing item pool as a set of linear tests %A van der Linden, W. J. %A Ariel, A. %A Veldkamp, B. P. %K Algorithms %K computerized adaptive testing %K item pool %K linear tests %K mathematical models %K statistics %K Test Construction %K Test Items %X Test-item writing efforts typically result in item pools with an undesirable correlational structure between the content attributes of the items and their statistical information.
If such pools are used in computerized adaptive testing (CAT), the algorithm may be forced to select items with less than optimal information, items that violate the content constraints, and/or items with unfavorable exposure rates. Although at first sight somewhat counterintuitive, it is shown that if the CAT pool is assembled as a set of linear test forms, undesirable correlations can be broken down effectively. It is proposed to assemble such pools using a mixed integer programming model with constraints that guarantee that each test meets all content specifications and an objective function that requires the tests to have maximal information at a well-chosen set of ability values. An empirical example with a previous master pool from the Law School Admission Test (LSAT) yielded a CAT with nearly uniform bias and mean-squared error functions for the ability estimator and item-exposure rates that satisfied the target for all items in the pool. %B Journal of Educational and Behavioral Statistics %I Sage Publications: US %V 31 %P 81-99 %@ 1076-9986 (Print) %G eng %M 2007-08137-004 %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Equating scores from adaptive to linear tests %A van der Linden, W. J. %K computerized adaptive testing %K equipercentile equating %K local equating %K score reporting %K test characteristic function %X Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test for a population of test takers. The two local methods were generally best. Surprisingly, the TCF method performed slightly worse than the equipercentile method.
Both of the latter methods showed strong bias and uniformly large inaccuracy, but the TCF method suffered from extra error due to the lower asymptote of the test characteristic function. It is argued that the worse performance of these two methods is a consequence of the fact that they use a single equating transformation for an entire population of test takers and therefore have to compromise between the individual score distributions. %B Applied Psychological Measurement %I Sage Publications: US %V 30 %P 493-508 %@ 0146-6216 (Print) %G eng %M 2006-20197-003 %0 Journal Article %J Journal of Applied Measurement %D 2006 %T Expansion of a physical function item bank and development of an abbreviated form for clinical research %A Bode, R. K. %A Lai, J-S. %A Dineen, K. %A Heinemann, A. W. %A Shevrin, D. %A Von Roenn, J. %A Cella, D. %K clinical research %K computerized adaptive testing %K performance levels %K physical function item bank %K Psychometrics %K test reliability %K Test Validity %X We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels.
While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Applied Measurement %I Richard M Smith: US %V 7 %P 1-15 %@ 1529-7713 (Print) %G eng %M 2006-01262-001 %0 Conference Paper %B Paper presented at the SMABS-EAM Conference %D 2006 %T Multiple maximum exposure rates in computerized adaptive testing %A Barrada, J %A Veldkamp, B. P. %A Olea, J. %B Paper presented at the SMABS-EAM Conference %C Budapest, Hungary %G eng %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Optimal Testing With Easy or Difficult Items in Computerized Adaptive Testing %A Theo Eggen %A Verschoor, Angela J. %X

Computerized adaptive tests (CATs) are individualized tests that, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. In the present study, it is shown that maximum information item selection in CATs using an item bank that is calibrated with the one- or the two-parameter logistic model results in each individual answering about 50% of the items correctly. Two item selection procedures giving easier (or more difficult) tests for students are presented and evaluated. Item selection on probability points of items yields good results only with the one-parameter logistic model and not with the two-parameter logistic model. An alternative selection procedure, based on maximum information at a shifted ability level, gives satisfactory results with both models. Index terms: computerized adaptive testing, item selection, item response theory

%B Applied Psychological Measurement %V 30 %P 379-393 %U http://apm.sagepub.com/content/30/5/379.abstract %R 10.1177/0146621606288890 %0 Journal Article %J Applied Psychological Measurement %D 2006 %T Optimal Testlet Pool Assembly for Multistage Testing Designs %A Ariel, Adelaide %A Veldkamp, Bernard P. %A Breithaupt, Krista %X

Computerized multistage testing (MST) designs require sets of test questions (testlets) to be assembled to meet strict, often competing criteria. Rules that govern testlet assembly may dictate the number of questions on a particular subject or may describe desirable statistical properties for the test, such as measurement precision. In an MST design, testlets of differing difficulty levels must be created. Statistical properties for assembly of the testlets can be expressed using item response theory (IRT) parameters. The testlet test information function (TIF) value can be maximized at a specific point on the IRT ability scale. In practical MST designs, parallel versions of testlets are needed, so sets of testlets with equivalent properties are built according to equivalent specifications. In this project, the authors study the use of a mathematical programming technique to simultaneously assemble testlets to ensure equivalence and fairness to candidates who may be administered different testlets.

%B Applied Psychological Measurement %V 30 %P 204-215 %U http://apm.sagepub.com/content/30/3/204.abstract %R 10.1177/0146621605284350 %0 Journal Article %J International Journal of Testing %D 2005 %T Automated Simultaneous Assembly for Multistage Testing %A Breithaupt, Krista %A Ariel, Adelaide %A Veldkamp, Bernard P. %B International Journal of Testing %V 5 %P 319-330 %U http://www.tandfonline.com/doi/abs/10.1207/s15327574ijt0503_8 %R 10.1207/s15327574ijt0503_8 %0 Journal Article %J Journal of Educational Measurement %D 2005 %T A closer look at using judgments of item difficulty to change answers on computerized adaptive tests %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %B Journal of Educational Measurement %V 42 %P 331-350 %0 Journal Article %J Journal of Educational Measurement %D 2005 %T A comparison of item-selection methods for adaptive tests with content constraints %A van der Linden, W. J. %K Adaptive Testing %K Algorithms %K content constraints %K item selection method %K shadow test approach %K spiraling method %K weighted deviations method %X In test assembly, a fundamental difference exists between algorithms that select a test sequentially or simultaneously. Sequential assembly allows us to optimize an objective function at the examinee's ability estimate, such as the test information function in computerized adaptive testing. But it leads to the non-trivial problem of how to realize a set of content constraints on the test—a problem more naturally solved by a simultaneous item-selection method. Three main item-selection methods in adaptive testing offer solutions to this dilemma. The spiraling method moves item selection across categories of items in the pool proportionally to the numbers needed from them. Item selection by the weighted-deviations method (WDM) and the shadow test approach (STA) is based on projections of the future consequences of selecting an item. 
These two methods differ in that the former calculates a projection of a weighted sum of the attributes of the eventual test and the latter a projection of the test itself. The pros and cons of these methods are analyzed. An empirical comparison between the WDM and STA was conducted for an adaptive version of the Law School Admission Test (LSAT), which showed equally good item-exposure rates but violations of some of the constraints and larger bias and inaccuracy of the ability estimator for the WDM. %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 42 %P 283-302 %@ 0022-0655 (Print) %G eng %M 2005-10716-004 %0 Generic %D 2005 %T Constraining item exposure in computerized adaptive testing with shadow tests %A van der Linden, W. J. %A Veldkamp, B. P. %C Law School Admission Council Computerized Testing Report 02-03 %G eng %0 Generic %D 2005 %T Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach %A van der Linden, W. J. %A Chang, Hua-Hua %C Law School Admission Council, Computerized Testing Report 01-09 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2005 %T Infeasibility in automated test assembly models: A comparison study of different methods %A Huitzing, H. A. %A Veldkamp, B. P. %A Verschoor, A. J. %K Algorithms %K Item Content (Test) %K Models %K Test Construction %X Several techniques exist to automatically put together a test meeting a number of specifications. In an item bank, the items are stored with their characteristics. A test is constructed by selecting a set of items that fulfills the specifications set by the test assembler.
Test assembly problems are often formulated in terms of a model consisting of restrictions and an objective to be maximized or minimized. A problem arises when it is impossible to construct a test from the item pool that meets all specifications, that is, when the model is not feasible. Several methods exist to handle these infeasibility problems. In this article, test assembly models resulting from two practical testing programs were reconstructed to be infeasible. These models were analyzed using methods that forced a solution (Goal Programming, Multiple-Goal Programming, Greedy Heuristic), that analyzed the causes (Relaxed and Ordered Deletion Algorithm (RODA), Integer Randomized Deletion Algorithm (IRDA), Set Covering (SC), and Item Sampling), or that analyzed the causes and used this information to force a solution (Irreducible Infeasible Set-Solver). Specialized methods such as the IRDA and the Irreducible Infeasible Set-Solver performed best. Recommendations about the use of different methods are given. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Journal of Educational Measurement %V 42 %P 223-243 %G eng %0 Journal Article %J Journal of Clinical Epidemiology %D 2005 %T An item bank was created to improve the measurement of cancer-related fatigue %A Lai, J-S. %A Cella, D. %A Dineen, K. %A Bode, R. %A Von Roenn, J. %A Gershon, R. C. %A Shevrin, D. %K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Fatigue/*etiology/psychology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/*complications/psychology %K Psychometrics %K Questionnaires %X OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. 
Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. The resulting bank is amenable to the development of a CAT platform. %B Journal of Clinical Epidemiology %7 2005/02/01 %V 58 %P 190-7 %8 Feb %@ 0895-4356 (Print)0895-4356 (Linking) %G eng %9 Multicenter Study %M 15680754 %0 Journal Article %J Journal of Pain and Symptom Management %D 2005 %T An item response theory-based pain item bank can enhance measurement precision %A Lai, J-S. %A Dineen, K. %A Reeve, B. B. %A Von Roenn, J. %A Shervin, D. %A McGuire, M. %A Bode, R. K. %A Paice, J. %A Cella, D. %K computerized adaptive testing %X Cancer-related pain is often under-recognized and undertreated. This is partly due to the lack of appropriate assessments, which need to be comprehensive and precise yet easily integrated into clinics. Computerized adaptive testing (CAT) can enable precise-yet-brief assessments by only selecting the most informative items from a calibrated item bank. 
The purpose of this study was to create such a bank. The sample included 400 cancer patients who were asked to complete 61 pain-related items. Data were analyzed using factor analysis and the Rasch model. The final bank consisted of 43 items which satisfied the measurement requirement of factor analysis and the Rasch model, demonstrated high internal consistency and reasonable item-total correlations, and discriminated patients with differing degrees of pain. We conclude that this bank demonstrates good psychometric properties, is sensitive to pain reported by patients, and can be used as the foundation for a CAT pain-testing platform for use in clinical practice. %B Journal of Pain and Symptom Management %V 30 %P 278-88 %G eng %M 16183012 %0 Journal Article %J Applied Psychological Measurement %D 2005 %T A Randomized Experiment to Compare Conventional, Computerized, and Computerized Adaptive Administration of Ordinal Polytomous Attitude Items %A Hol, A. Michiel %A Vorst, Harrie C. M. %A Mellenbergh, Gideon J. %X

A total of 520 high school students were randomly assigned to a paper-and-pencil test (PPT), a computerized standard test (CST), or a computerized adaptive test (CAT) version of the Dutch School Attitude Questionnaire (SAQ), consisting of ordinal polytomous items. The CST administered items in the same order as the PPT. The CAT administered all items of three SAQ subscales in adaptive order using Samejima’s graded response model, so that six different stopping rule settings could be applied afterwards. School marks were used as external criteria. Results showed significant but small multivariate administration mode effects on conventional raw scores and small to medium effects on maximum likelihood latent trait estimates. When the precision of CAT latent trait estimates decreased, correlations with grade point average in general decreased. However, the magnitude of the decrease was not very large as compared to the PPT, the CST, and the CAT without the stopping rule.

%B Applied Psychological Measurement %V 29 %P 159-183 %U http://apm.sagepub.com/content/29/3/159.abstract %R 10.1177/0146621604271268 %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well.
The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items., Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided., (C)2005Research Society on AlcoholismAn important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach., (C) 2005 by the American Psychological AssociationThe authors conducted Rasch model ( G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. 
Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression., (C) 2004 by the American Psychological AssociationA didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script ( M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed., (C) 2004 by the American Psychological AssociationRecent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. 
Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems., (C) 2004 by the American Psychological AssociationItem response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus ( B. Muthen & L. Muthen, 1998) and MULTILOG ( D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance., (C) 2004 by the American Psychological AssociationThis study examined the psychometric characteristics of an index of substance use involvement using item response theory. 
The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption., (C) 2002 by the American Psychological AssociationComparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method ( Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. 
Item and time savings were substantial. (C) 1999 by the American Psychological Association %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Generic %D 2004 %T The AMC Linear Disability Score project in a population requiring residential care: psychometric properties %A Holman, R. %A Lindeboom, R. %A Vermeulen, M. %A de Haan, R. J. %K *Disability Evaluation %K *Health Status Indicators %K Activities of Daily Living/*classification %K Adult %K Aged %K Aged, 80 and over %K Data Collection/methods %K Female %K Humans %K Logistic Models %K Male %K Middle Aged %K Netherlands %K Pilot Projects %K Probability %K Psychometrics/*instrumentation %K Questionnaires/standards %K Residential Facilities/*utilization %K Severity of Illness Index %X BACKGROUND: Currently there is a lot of interest in the flexible framework offered by item banks for measuring patient-relevant outcomes, including functional status. However, there are few item banks that have been developed to quantify functional status, as expressed by the ability to perform activities of daily life. METHOD: This paper examines the psychometric properties of the AMC Linear Disability Score (ALDS) project item bank using an item response theory model and full information factor analysis. Data were collected from 555 respondents on a total of 160 items. RESULTS: Following the analysis, 79 items remained in the item bank. The other 81 items were excluded because of: difficulties in presentation (1 item); low levels of variation in response pattern (28 items); significant differences in measurement characteristics for males and females or for respondents under or over 85 years old (26 items); or lack of model fit to the data at item level (26 items). CONCLUSIONS: It is conceivable that the item bank will have different measurement characteristics for other patient or demographic populations. 
However, these results indicate that the ALDS item bank has sound psychometric properties for respondents in residential care settings and could form a stable base for measuring functional status in a range of situations, including the implementation of computerised adaptive testing of functional status. %B Health and Quality of Life Outcomes %7 2004/08/05 %V 2 %P 42 %8 Aug 3 %@ 1477-7525 (Electronic); 1477-7525 (Linking) %G eng %M 15291958 %2 514531 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T Automated Simultaneous Assembly of Multi-Stage Testing for the Uniform CPA Examination %A Breithaupt, K %A Ariel, A. %A Veldkamp, B. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2004 %T Constraining item exposure in computerized adaptive testing with shadow tests %A van der Linden, W. J. %A Veldkamp, B. P. %K computerized adaptive testing %K item exposure control %K item ineligibility constraints %K Probability %K shadow tests %X Item-exposure control in computerized adaptive testing is implemented by imposing item-ineligibility constraints on the assembly process of the shadow tests. The method resembles Sympson and Hetter’s (1985) method of item-exposure control in that the decisions to impose the constraints are probabilistic. The method does not, however, require time-consuming simulation studies to set values for control parameters before the operational use of the test. Instead, it can set the probabilities of item ineligibility adaptively during the test using the actual item-exposure rates. An empirical study using an item pool from the Law School Admission Test showed that application of the method yielded perfect control of the item-exposure rates and had negligible impact on the bias and mean-squared error functions of the ability estimator. 
%B Journal of Educational and Behavioral Statistics %I American Educational Research Assn: US %V 29 %P 273-291 %@ 1076-9986 (Print) %G eng %M 2006-01687-001 %U http://jeb.sagepub.com/cgi/content/abstract/29/3/273 %R 10.3102/10769986029003273 %0 Journal Article %J Journal of Educational Measurement %D 2004 %T Constructing rotating item pools for constrained adaptive testing %A Ariel, A. %A Veldkamp, B. P. %A van der Linden, W. J. %K computerized adaptive tests %K constrained adaptive testing %K item exposure %K rotating item pools %X Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Educational Measurement %I Blackwell Publishing: United Kingdom %V 41 %P 345-359 %@ 0022-0655 (Print) %G eng %M 2004-21596-004 %0 Book Section %B Intelligent Tutoring Systems %D 2004 %T A Learning Environment for English for Academic Purposes Based on Adaptive Tests and Task-Based Systems %A Gonçalves, Jean P. %A Aluisio, Sandra M. %A de Oliveira, Leandro H.M. 
%A Oliveira Jr., Osvaldo N. %E Lester, James C. %E Vicari, Rosa Maria %E Paraguaçu, Fábio %B Intelligent Tutoring Systems %S Lecture Notes in Computer Science %I Springer Berlin / Heidelberg %V 3220 %P 1-11 %@ 978-3-540-22948-3 %G eng %U http://dx.doi.org/10.1007/978-3-540-30139-4_1 %R 10.1007/978-3-540-30139-4_1 %0 Journal Article %J Applied Psychological Measurement %D 2004 %T Mokken Scale Analysis Using Hierarchical Clustering Procedures %A van Abswoude, Alexandra A. H. %A Vermunt, Jeroen K. %A Hemker, Bas T. %A van der Ark, L. Andries %X

Mokken scale analysis (MSA) can be used to assess and build unidimensional scales from an item pool that is sensitive to multiple dimensions. These scales satisfy a set of scaling conditions, one of which follows from the model of monotone homogeneity. An important drawback of the MSA program is that the sequential item selection and scale construction procedure may not find the dominant underlying dimensionality of the responses to a set of items. The authors investigated alternative hierarchical item selection procedures and compared the performance of four hierarchical methods and the sequential clustering method in the MSA context. The results showed that hierarchical clustering methods can improve the search process of the dominant dimensionality of a data matrix. In particular, the complete linkage and scale linkage methods were promising in finding the dimensionality of the item response data from a set of items.

%B Applied Psychological Measurement %V 28 %P 332-354 %U http://apm.sagepub.com/content/28/5/332.abstract %R 10.1177/0146621604265510 %0 Generic %D 2004 %T Optimal testing with easy items in computerized adaptive testing (Measurement and Research Department Report 2004-2) %A Theo Eggen %A Verschoor, A. J. %C Arnhem, The Netherlands: Cito Group %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2004 %T A sequential Bayesian procedure for item calibration in multistage testing %A van der Linden, W. J. %A Alan D Mead %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego CA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Alpha-stratified adaptive testing with large numbers of content constraints %A van der Linden, W. J. %A Chang, Hua-Hua %B Applied Psychological Measurement %V 27 %P 107-120 %G eng %0 Book Section %B Reusing online resources: A sustainable approach to e-learning %D 2003 %T Assessing question banks %A Bull, J. %A Dalziel, J. %A Vreeland, T. %K Computer Assisted Testing %K Curriculum Based Assessment %K Education %K Technology computerized adaptive testing %X In Chapter 14, Joanna Bull and James Dalziel provide a comprehensive treatment of the issues surrounding the use of Question Banks and Computer Assisted Assessment, and provide a number of excellent examples of implementations. In their review of the technologies employed in Computer Assisted Assessment, the authors include Computer Adaptive Testing and data generation. The authors reveal significant issues involving the impact of Intellectual Property rights and computer assisted assessment and make important suggestions for strategies to overcome these obstacles. (PsycINFO Database Record (c) 2005 APA) http://www-jime.open.ac.uk/2003/1/ (journal abstract) %B Reusing online resources: A sustainable approach to e-learning %I Kogan Page Ltd. 
%C London, UK %P 171-230 %G eng %0 Book Section %D 2003 %T Bayesian checks on outlying response times in computerized adaptive testing %A van der Linden, W. J. %C H. Yanai, A. Okada, K. Shigemasu, Y. Kano, and J. J. Meulman (Eds.), New developments in psychometrics (pp. 215-222). New York: Springer-Verlag. %G eng %0 Journal Article %J Clinical Therapeutics %D 2003 %T Can an item response theory-based pain item bank enhance measurement precision? %A Lai, J-S. %A Dineen, K. %A Cella, D. %A Von Roenn, J. %B Clinical Therapeutics %V 25 %P D34-D36 %G eng %M 14568660 %! Clin Ther %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Computerized adaptive testing with item cloning %A Glas, C. A. W. %A van der Linden, W. J. %K computerized adaptive testing %X (from the journal abstract) To increase the number of items available for adaptive testing and reduce the cost of item writing, the use of techniques of item cloning has been proposed. An important consequence of item cloning is possible variability between the item parameters. To deal with this variability, a multilevel item response theory (IRT) model is presented which allows for differences between the distributions of item parameters of families of item clones. A marginal maximum likelihood and a Bayesian procedure for estimating the hyperparameters are presented. In addition, an item-selection procedure for computerized adaptive testing with item cloning is presented which has the following two stages: First, a family of item clones is selected to be optimal at the estimate of the person parameter. Second, an item is randomly selected from the family for administration. Results from simulation studies based on an item pool from the Law School Admission Test (LSAT) illustrate the accuracy of these item pool calibration and adaptive testing procedures. (PsycINFO Database Record (c) 2003 APA, all rights reserved). 
%B Applied Psychological Measurement %V 27 %P 247-261 %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Constraining item exposure in computerized adaptive testing with shadow tests %A van der Linden, W. J. %A Veldkamp, B. P. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Constructing rotating item pools for constrained adaptive testing %A Ariel, A. %A Veldkamp, B. %A van der Linden, W. J. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %D 2003 %T Controlling item exposure and item eligibility in computerized adaptive testing %A van der Linden, W. J. %A Veldkamp, B. P. %G eng %0 Conference Paper %B Paper presented at the Annual meeting of the National Council on Measurement in Education %D 2003 %T Implementing an alternative to Sympson-Hetter item-exposure control in constrained adaptive testing %A Veldkamp, B. P. %A van der Linden, W. J. %B Paper presented at the Annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Implementing content constraints in alpha-stratified adaptive testing using a shadow test approach %A van der Linden, W. J. %A Chang, Hua-Hua %B Applied Psychological Measurement %V 27 %P 107-120 %G eng %0 Book Section %B New developments in psychometrics %D 2003 %T Item selection in polytomous CAT %A Veldkamp, B. P. %E H. Yanai %E A. Okada %E K. Shigemasu %E Y. Kano %E J. Meulman %K computerized adaptive testing %B New developments in psychometrics %I Psychometric Society, Springer %C Tokyo, Japan %P 207–214 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T Optimal stratification of item pools in α-stratified computerized adaptive testing %A Chang, Hua-Hua %A van der Linden, W. J. %K Adaptive Testing %K Computer Assisted Testing %K Item Content (Test) %K Item Response Theory %K Mathematical Modeling %K Test Construction computerized adaptive testing %X A method based on 0-1 linear programming (LP) is presented to stratify an item pool optimally for use in α-stratified adaptive testing. Because the 0-1 LP model belongs to the subclass of models with a network flow structure, efficient solutions are possible. The method is applied to a previous item pool from the computerized adaptive testing (CAT) version of the Graduate Record Exams (GRE) Quantitative Test. The results indicate that the new method performs well in practical situations. It improves item exposure control, reduces the mean squared error in the θ estimates, and increases test reliability. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Applied Psychological Measurement %V 27 %P 262-274 %G eng %0 Conference Paper %B Paper presented at the conference of the International Association for Educational Assessment %D 2003 %T Optimal testing with easy items in computerized adaptive testing %A Theo Eggen %A Verschoor, A. %B Paper presented at the conference of the International Association for Educational Assessment %C Manchester UK %G eng %0 Generic %D 2003 %T A sequential Bayes procedure for item calibration in multi-stage testing %A van der Linden, W. J. %A Alan D Mead %C Manuscript in preparation %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2003 %T Some alternatives to Sympson-Hetter item-exposure control in computerized adaptive testing %A van der Linden, W. J. 
%K Adaptive Testing %K Computer Assisted Testing %K Test Items computerized adaptive testing %X The Sympson and Hetter (1985, 1997) method is a probabilistic item-exposure control method in computerized adaptive testing. Setting its control parameters to admissible values requires an iterative process of computer simulations that has been found to be time consuming, particularly if the parameters have to be set conditional on a realistic set of values for the examinees’ ability parameter. Formal properties of the method are identified that help us explain why this iterative process can be slow and does not guarantee admissibility. In addition, some alternatives to the SH method are introduced. The behavior of these alternatives was estimated for an adaptive test from an item pool from the Law School Admission Test (LSAT). Two of the alternatives showed attractive behavior and converged smoothly to admissibility for all items in a relatively small number of iteration steps. %B Journal of Educational and Behavioral Statistics %V 28 %P 249-265 %G eng %0 Journal Article %J Psychometrika %D 2003 %T Using response times to detect aberrant responses in computerized adaptive testing %A van der Linden, W. J. %A van Krimpen-Stoop, E. M. L. A. %K Adaptive Testing %K Behavior %K Computer Assisted Testing %K computerized adaptive testing %K Models %K Person Fit %K Prediction %K Reaction Time %X A lognormal model for response times is used to check response times for aberrances in examinee behavior on computerized adaptive tests. Both classical procedures and Bayesian posterior predictive checks are presented. For a fixed examinee, responses and response times are independent; checks based on response times thus offer information independent of the results of checks on response patterns. Empirical examples of the use of classical and Bayesian checks for detecting two different types of aberrances in response times are presented. 
The Bayesian checks had higher detection rates than the classical checks, but at the cost of higher false-alarm rates. A guideline for the choice between the two types of checks is offered. %B Psychometrika %V 68 %P 251-265 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2002 %T Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %A Hendrickson, A. B. %A Ihrig, D. %B Journal of Educational Measurement %V 39 %P 311-330 %G eng %0 Generic %D 2002 %T Constraining item exposure in computerized adaptive testing with shadow tests (Research Report No. 02-06) %A van der Linden, W. J. %A Veldkamp, B. P. %C University of Twente, The Netherlands %G eng %0 Report %D 2002 %T Mathematical-programming approaches to test item pool design %A Veldkamp, B. P. %A van der Linden, W. J. %A Ariel, A. %K Adaptive Testing %K Computer Assisted %K Computer Programming %K Educational Measurement %K Item Response Theory %K Mathematics %K Psychometrics %K Statistical Rotation computerized adaptive testing %K Test Items %K Testing %X (From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools for computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT. 
%I University of Twente, Faculty of Educational Science and Technology %C Twente, The Netherlands %P 93-108 %@ 02-09 %G eng %0 Generic %D 2002 %T Modifications of the Sympson-Hetter method for item-exposure control in computerized adaptive testing %A van der Linden, W. J. %C Manuscript submitted for publication %G eng %0 Journal Article %J Psychometrika %D 2002 %T Multidimensional adaptive testing with constraints on test content %A Veldkamp, B. P. %A van der Linden, W. J. %X The case of adaptive testing under a multidimensional response model with large numbers of constraints on the content of the test is addressed. The items in the test are selected using a shadow test approach. The 0–1 linear programming model that assembles the shadow tests maximizes posterior expected Kullback-Leibler information in the test. The procedure is illustrated for five different cases of multidimensionality. These cases differ in (a) the numbers of ability dimensions that are intentional or should be considered as “nuisance dimensions” and (b) whether the test should or should not display a simple structure with respect to the intentional ability dimensions. %B Psychometrika %V 67 %P 575-588 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T Using judgments of item difficulty to change answers on computerized adaptive vocabulary tests %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Can examinees use judgments of item difficulty to improve proficiency estimates on computerized adaptive vocabulary tests? %A Vispoel, W. P. %A Clough, S. J. %A Bleiler, T. %A Hendrickson, A. B. %A Ihrig, D. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Applied Psychological Measurement %D 2001 %T Computerized adaptive testing with equated number-correct scoring %A van der Linden, W. J. %X A constrained computerized adaptive testing (CAT) algorithm is presented that can be used to equate CAT number-correct (NC) scores to a reference test. As a result, the CAT NC scores also are equated across administrations. The constraints are derived from van der Linden & Luecht’s (1998) set of conditions on item response functions that guarantees identical observed NC score distributions on two test forms. An item bank from the Law School Admission Test was used to compare the results of the algorithm with those for equipercentile observed-score equating, as well as the prediction of NC scores on a reference test using its test response function. The effects of the constraints on the statistical properties of the θ estimator in CAT were examined. %B Applied Psychological Measurement %V 25 %P 343-355 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2001 %T Differences between self-adapted and computerized adaptive tests: A meta-analysis %A Pitkin, A. K. %A Vispoel, W. P. %K Adaptive Testing %K Computer Assisted Testing %K Scores computerized adaptive testing %K Test %K Test Anxiety %X Self-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. 
After controlling for measurement error, the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. The authors speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Educational Measurement %V 38 %P 235-247 %G eng %0 Generic %D 2001 %T Implementing constrained CAT with shadow tests for large item pools %A Veldkamp, B. P. %C Submitted for publication %G eng %0 Generic %D 2001 %T Implementing content constraints in a-stratified adaptive testing using a shadow test approach (Research Report 01-001) %A Chang, Hua-Hua %A van der Linden, W. J. %C University of Twente, Department of Educational Measurement and Data Analysis %G eng %0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 2001 %T A minimax procedure in the context of sequential testing problems in psychodiagnostics %A Vos, H. J. %B British Journal of Mathematical and Statistical Psychology %V 54 %P 139-159 %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Modeling variability in item parameters in CAT %A Glas, C. A. W. %A van der Linden, W. J. %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the National Council on Measurement in Education %D 2001 %T Multidimensional IRT-based adaptive sequential mastery testing %A Vos, H. J. %A Glas, C. A. W. 
%B Paper presented at the Annual Meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden %D 2001 %T Toepassing van een computergestuurde adaptieve testprocedure op persoonlijkheidsdata [Application of a computerised adaptive test procedure on personality data] %A Hol, A. M. %A Vorst, H. C. M. %A Mellenbergh, G. J. %K Adaptive Testing %K Computer Applications %K Computer Assisted Testing %K Personality Measures %K Test Reliability computerized adaptive testing %X Studied the applicability of a computerized adaptive testing procedure to an existing personality questionnaire within the framework of item response theory. The procedure was applied to the scores of 1,143 male and female university students (mean age 21.8 yrs) in the Netherlands on the Neuroticism scale of the Amsterdam Biographical Questionnaire (G. J. Wilde, 1963). The graded response model (F. Samejima, 1969) was used. The quality of the adaptive test scores was measured based on their correlation with test scores for the entire item bank and on their correlation with scores on other scales from the personality test. The results indicate that computerized adaptive testing can be applied to personality scales. (PsycINFO Database Record (c) 2005 APA ) %B Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden %V 56 %P 119-133 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Using response times to detect aberrant behavior in computerized adaptive testing %A van der Linden, W. J. %A van Krimpen-Stoop, E. M. L. A. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Generic %D 2000 %T Adaptive mastery testing using a multidimensional IRT model and Bayesian sequential decision theory (Research Report 00-06) %A Glas, C. A. W. %A Vos, H. J. 
%C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Applied Measurement in Education %D 2000 %T Capitalization on item calibration error in adaptive testing %A van der Linden, W. J. %A Glas, C. A. W. %K computerized adaptive testing %X (from the journal abstract) In adaptive testing, item selection is sequentially optimized during the test. Because the optimization takes place over a pool of items calibrated with estimation error, capitalization on chance is likely to occur. How serious the consequences of this phenomenon are depends not only on the distribution of the estimation errors in the pool or the conditional ratio of the test length to the pool size given ability, but may also depend on the structure of the item selection criterion used. A simulation study demonstrated a dramatic impact of capitalization on estimation errors on ability estimation. Four different strategies to minimize the likelihood of capitalization on error in computerized adaptive testing are discussed. %B Applied Measurement in Education %V 13 %P 35-53 %G eng %0 Book %D 2000 %T Computerized adaptive testing: Theory and practice %A van der Linden, W. J. %A Glas, C. A. W. %I Kluwer Academic Publishers %C Dordrecht, The Netherlands %G eng %0 Book Section %D 2000 %T Constrained adaptive testing with shadow tests %A van der Linden, W. J. %C W. J. van der Linden and C. A. W. Glas (eds.), Computerized adaptive testing: Theory and practice (pp.27-52). Norwell MA: Kluwer. %G eng %0 Book Section %D 2000 %T Cross-validating item parameter estimation in adaptive testing %A van der Linden, W. J. %A Glas, C. A. W. %C A. Boorsma, M. A. J. van Duijn, and T. A. B. Snijders (Eds.) (pp. 205-219), Essays on item response theory. New York: Springer. 
%G eng %0 Book Section %B Computerized adaptive testing: Theory and practice %D 2000 %T Designing item pools for computerized adaptive testing %A Veldkamp, B. P. %A van der Linden, W. J. %B Computerized adaptive testing: Theory and practice %I Kluwer Academic Publishers %C Dordrecht, The Netherlands %P 149–162 %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2000 %T Detection of known items in adaptive testing with a statistical quality control method %A Veerkamp, W. J. J. %A Glas, C. A. W. %B Journal of Educational and Behavioral Statistics %V 25 %P 373-389 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2000 %T An integer programming approach to item bank design %A van der Linden, W. J. %A Veldkamp, B. P. %A Reese, L. M. %K Aptitude Measures %K Item Analysis (Test) %K Item Response Theory %K Test Construction %K Test Items %X An integer programming approach to item bank design is presented that can be used to calculate an optimal blueprint for an item bank, in order to support an existing testing program. The results are optimal in that they minimize the effort involved in producing the items as revealed by current item writing patterns. Also presented is an adaptation of the models, which can be used as a set of monitoring tools in item bank management. The approach is demonstrated empirically for an item bank that was designed for the Law School Admission Test. %B Applied Psychological Measurement %V 24 %P 139-150 %G eng %0 Book Section %B Computerized adaptive testing: Theory and practice %D 2000 %T Item selection and ability estimation in adaptive testing %A van der Linden, W. J. %A Pashley, P. J. 
%B Computerized adaptive testing: Theory and practice %I Kluwer Academic Publishers %C Dordrecht, The Netherlands %P 1–25 %G eng %0 Journal Article %J Journal of Educational Measurement %D 2000 %T Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results %A Vispoel, W. P. %A Hendrickson, A. B. %A Bleiler, T. %B Journal of Educational Measurement %V 37 %P 21-38 %G eng %0 Book Section %D 2000 %T A minimax solution for sequential classification problems %A Vos, H. J. %C H. A. L. Kiers, J.-P. Rasson, P. J. F. Groenen, and M. Schader (Eds.), Data analysis, classification, and related methods (pp. 121-126). Berlin: Springer. %G eng %0 Generic %D 2000 %T Modifications of the branch-and-bound algorithm for application in constrained adaptive testing (Research Report 00-05) %A Veldkamp, B. P. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 2000 %T Multidimensional adaptive testing with constraints on test content (Research Report 00-11) %A Veldkamp, B. P. %A van der Linden, W. J. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 2000 %T Optimal stratification of item pools in a-stratified computerized adaptive testing (Research Report 00-07) %A van der Linden, W. J. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2000 %T Taylor approximations to logistic IRT models and their use in adaptive testing %A Veerkamp, W. J. J. %K computerized adaptive testing %X Taylor approximation can be used to generate a linear approximation to a logistic ICC and a linear ability estimator. 
For a specific situation it will be shown to result in a special case of a Robbins-Monro item selection procedure for adaptive testing. The linear estimator can be used for the situation of zero and perfect scores when maximum likelihood estimation fails to come up with a finite estimate. It is also possible to use this estimator to generate starting values for maximum likelihood and weighted likelihood estimation. Approximations to the expectation and variance of the linear estimator for a sequence of Robbins-Monro item selections can be determined analytically. %B Journal of Educational and Behavioral Statistics %V 25 %P 307-343 %G eng %M EJ620787 %0 Book Section %D 2000 %T Testlet-based adaptive mastery testing %A Vos, H. J. %A Glas, C. A. W. %C W. J. van der Linden (Ed.), Computerized adaptive testing: Theory and practice (pp. 289-309). Norwell MA: Kluwer. %G eng %0 Generic %D 2000 %T Using response times to detect aberrant behavior in computerized adaptive testing (Research Report 00-09) %A van der Linden, W. J. %A van Krimpen-Stoop, E. M. L. A. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 1999 %T Adaptive testing with equated number-correct scoring (Research Report 99-02) %A van der Linden, W. J. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Journal of Educational Measurement %D 1999 %T Can examinees use a review option to obtain positively biased ability estimates on a computerized adaptive test? %A Vispoel, W. P. %A Rocklin, T. R. %A Wang, T. %A Bleiler, T. %B Journal of Educational Measurement %V 36 %P 141-157 %G eng %0 Book Section %D 1999 %T Creating computerized adaptive tests of music aptitude: Problems, solutions, and future directions %A Vispoel, W. P. %C F. Drasgow and J. B. 
Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 151-176). Mahwah NJ: Erlbaum. %G eng %0 Generic %D 1999 %T Designing item pools for computerized adaptive testing (Research Report 99-03) %A Veldkamp, B. P. %A van der Linden, W. J. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Empirical initialization of the trait estimator in adaptive testing %A van der Linden, W. J. %B Applied Psychological Measurement %V 23 %P 21-29 %G eng %0 Book Section %D 1999 %T Het ontwerpen van adaptieve examens [Designing adaptive tests] %A van der Linden, W. J. %C J. M. Pieters, Tj. Plomp, and L. E. Odenthal (Eds.), Twintig jaar Toegepaste Onderwijskunde: Een kaleidoscopisch overzicht van Twents onderwijskundig onderzoek [Twenty years of applied educational science: A kaleidoscopic overview of educational research at Twente] (pp. 249-267). Enschede: Twente University Press. %G eng %0 Book Section %D 1999 %T Item calibration and parameter drift %A Glas, C. A. W. %A Veerkamp, W. J. J. %C W. J. van der Linden and C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice. Norwell MA: Kluwer. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Limiting answer review and change on computerized adaptive vocabulary tests: Psychometric and attitudinal results %A Vispoel, W. P. %A Hendrickson, A. %A Bleiler, T. %A Widiatmo, H. %A Shrairi, S. %A Ihrig, D. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Generic %D 1999 %T A minimax procedure in the context of sequential mastery testing (Research Report 99-04) %A Vos, H. J. 
%C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 1999 %T Multidimensional adaptive testing with a minimum error-variance criterion %A van der Linden, W. J. %K computerized adaptive testing %X Adaptive testing under a multidimensional logistic response model is addressed. An algorithm is proposed that minimizes the (asymptotic) variance of the maximum-likelihood estimator of a linear combination of abilities of interest. The criterion results in a closed-form expression that is easy to evaluate. In addition, it is shown how the algorithm can be modified if the interest is in a test with a "simple ability structure". The statistical properties of the adaptive ML estimator are demonstrated for a two-dimensional item pool with several linear combinations of the abilities. %B Journal of Educational and Behavioral Statistics %V 24 %P 398-412 %G eng %M EJ607470 %0 Journal Article %J American Journal of Occupational Therapy %D 1999 %T The use of Rasch analysis to produce scale-free measurement of functional ability %A Velozo, C. A. %A Kielhofner, G. %A Lai, J-S. %K *Activities of Daily Living %K Disabled Persons/*classification %K Human %K Occupational Therapy/*methods %K Predictive Value of Tests %K Questionnaires/standards %K Sensitivity and Specificity %X Innovative applications of Rasch analysis can lead to solutions for traditional measurement problems and can produce new assessment applications in occupational therapy and health care practice. First, Rasch analysis is a mechanism that translates scores across similar functional ability assessments, thus enabling the comparison of functional ability outcomes measured by different instruments. This will allow for the meaningful tracking of functional ability outcomes across the continuum of care. 
Second, once the item-difficulty order of an instrument or item bank is established by Rasch analysis, computerized adaptive testing can be used to target items to the patient's ability level, reducing assessment length by as much as one half. More importantly, Rasch analysis can provide the foundation for "equiprecise" measurement or the potential to have precise measurement across all levels of functional ability. The use of Rasch analysis to create scale-free measurement of functional ability demonstrates how this methodology can be used in practical applications of clinical and outcome assessment. %B American Journal of Occupational Therapy %V 53 %P 83-90 %G eng %M 9926224 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 1999 %T Using Bayesian decision theory to design a computerized mastery test %A Vos, H. J. %B Journal of Educational and Behavioral Statistics %V 24(3) %P 271–292 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Using response-time constraints to control for differential speededness in computerized adaptive testing %A van der Linden, W. J. %A Scrams, D. J. %A Schnipke, D. L. %K computerized adaptive testing %X An item-selection algorithm is proposed for neutralizing the differential effects of time limits on computerized adaptive test scores. The method is based on a statistical model for distributions of examinees’ response times on items in a bank that is updated each time an item is administered. Predictions from the model are used as constraints in a 0-1 linear programming model for constrained adaptive testing that maximizes the accuracy of the trait estimator. The method is demonstrated empirically using an item bank from the Armed Services Vocational Aptitude Battery. %B Applied Psychological Measurement %V 23 %P 195-210 %G eng %0 Generic %D 1998 %T Adaptive mastery testing using the Rasch model and Bayesian sequential decision theory (Research Report 98-15) %A Glas, C. A. W. %A Vos, H. J. 
%C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Psychometrika %D 1998 %T Bayesian item selection criteria for adaptive testing %A van der Linden, W. J. %B Psychometrika %V 63 %P 201-216 %G eng %0 Generic %D 1998 %T Capitalization on item calibration error in adaptive testing (Research Report 98-07) %A van der Linden, W. J. %A Glas, C. A. W. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Journal Article %J Applied Psychological Measurement %D 1998 %T A model for optimal constrained adaptive testing %A van der Linden, W. J. %A Reese, L. M. %K computerized adaptive testing %X A model for constrained computerized adaptive testing is proposed in which the information in the test at the trait level (θ) estimate is maximized subject to a number of possible constraints on the content of the test. At each item-selection step, a full test is assembled to have maximum information at the current θ estimate, fixing the items already administered. Then the item with maximum information is selected. All test assembly is optimal because a linear programming (LP) model is used that automatically updates to allow for the attributes of the items already administered and the new value of the θ estimator. The LP model also guarantees that each adaptive test always meets the entire set of constraints. A simulation study using a bank of 753 items from the Law School Admission Test showed that the θ estimator for adaptive tests of realistic lengths did not suffer any loss of efficiency from the presence of 433 constraints on the item selection process. %B Applied Psychological Measurement %V 22 %P 259-270 %G eng %0 Journal Article %J Journal of Educational Computing Research %D 1998 %T Optimal sequential rules for computer-based instruction %A Vos, H. J. 
%B Journal of Educational Computing Research %V 19(2) %P 133-154 %G eng %0 Journal Article %J Applied Psychological Measurement %D 1998 %T Optimal test assembly of psychological and educational tests %A van der Linden, W. J. %B Applied Psychological Measurement %V 22 %P 195-211 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Properties of ability estimation methods in computerized adaptive testing %A Wang, T. %A Vispoel, W. P. %B Journal of Educational Measurement %V 35 %P 109-135 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Psychometric characteristics of computer-adaptive and self-adaptive vocabulary tests: The role of answer feedback and test anxiety %A Vispoel, W. P. %B Journal of Educational Measurement %V 35 %P 155-167 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1998 %T Reviewing and changing answers on computer-adaptive and self-adaptive vocabulary tests %A Vispoel, W. P. %B Journal of Educational Measurement %V 35 %P 328-345 %G eng %0 Journal Article %J Psychometrika %D 1998 %T Stochastic order in dichotomous item response models for fixed, adaptive, and multidimensional tests %A van der Linden, W. J. %B Psychometrika %V 63 %P 211-226 %G eng %0 Generic %D 1998 %T Using response-time constraints to control for differential speededness in adaptive testing (Research Report 98-06) %A van der Linden, W. J. %A Scrams, D. J. %A Schnipke, D. L. %C Enschede, The Netherlands: University of Twente, Faculty of Educational Science and Technology, Department of Measurement and Data Analysis %G eng %0 Generic %D 1997 %T Applications of Bayesian decision theory to sequential mastery testing (Research Report 97-06) %A Vos, H. J. 
%C Twente, The Netherlands: Department of Educational Measurement and Data Analysis %G eng %0 Journal Article %J Journal of Educational Measurement %D 1997 %T Computerized adaptive and fixed-item testing of music listening skill: A comparison of efficiency, precision, and concurrent validity %A Vispoel, W. P. %A Wang, T. %A Bleiler, T. %B Journal of Educational Measurement %V 34 %P 43-63 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T Detection of aberrant response patterns in CAT %A van der Linden, W. J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at Multidisciplinary Perspectives on Musicality: The Seashore Symposium %D 1997 %T Improving the quality of music aptitude tests through adaptive administration of items %A Vispoel, W. P. %B Paper presented at Multidisciplinary Perspectives on Musicality: The Seashore Symposium %C University of Iowa, Iowa City IA %G eng %0 Generic %D 1997 %T A minimax sequential procedure in the context of computerized adaptive mastery testing (Research Report 97-07) %A Vos, H. J. %C Twente, The Netherlands: Department of Educational Measurement and Data Analysis %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1997 %T Multidimensional adaptive testing with a minimum error-variance criterion %A van der Linden, W. J. 
%B Paper presented at the annual meeting of the American Educational Research Association %C Chicago %G eng %0 Generic %D 1997 %T Multidimensional adaptive testing with a minimum error-variance criterion (Research Report 97-03) %A van der Linden, W. J. %C Enschede, The Netherlands: University of Twente, Department of Educational Measurement and Data Analysis %G eng %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 1997 %T Some new item selection criteria for adaptive testing %A Veerkamp, W. J. J. %A Berger, M. P. F. %B Journal of Educational and Behavioral Statistics %V 22 %P 203-226 %G eng %0 Book %D 1997 %T Statistical methods for computerized adaptive testing %A Veerkamp, W. J. J. %C Unpublished doctoral dissertation, University of Twente, Enschede, The Netherlands %G eng %0 Book Section %D 1997 %T Validation of the experimental CAT-ASVAB system %A Segall, D. O. %A Moreno, K. E. %A Kieckhaefer, W. F. %A Vicino, F. L. %A McBride, J. R. %C W. A. Sands, B. K. Waters, and J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association. %G eng %0 Generic %D 1996 %T Bayesian item selection criteria for adaptive testing (Research Report 96-01) %A van der Linden, W. J. 
%C Twente, The Netherlands: Department of Educational Measurement and Data Analysis %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Can examinees use a review option to positively bias their scores on a computerized adaptive test? %A Rocklin, T. R. %A Vispoel, W. P. %A Wang, T. %A Bleiler, T. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York NY %G eng %0 Book %D 1996 %T A comparison of adaptive self-referenced testing and classical approaches to the measurement of individual change %A VanLoy, W. J. %C Unpublished doctoral dissertation, University of Minnesota %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Effects of answer feedback and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests %A Vispoel, W. P. %A Brunsman, B. %A Forte, E. %A Bleiler, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T Effects of answer review and test anxiety on the psychometric and motivational characteristics of computer-adaptive and self-adaptive vocabulary tests %A Vispoel, W. %A Forte, E. %A Boo, J. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the Psychometric Society %D 1995 %T Bayesian item selection in adaptive testing %A van der Linden, W. J. 
%B Paper presented at the Annual Meeting of the Psychometric Society %C Minneapolis MN %G eng %0 Book Section %D 1995 %T Computerized testing for licensure %A Vale, C. D. %C J. Impara (Ed.), Licensure testing: Purposes, procedures, and practices (pp. 291-320). Lincoln NE: Buros Institute of Mental Measurements. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1995 %T Precision of ability estimation methods in computerized adaptive testing %A Wang, T. %A Vispoel, W. P. %B Paper presented at the annual meeting of the Psychometric Society %C Minneapolis %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T Computerized-adaptive and self-adapted music-listening tests: Features and motivational benefits %A Vispoel, W. P. %A Coffman, D. D. %B Applied Measurement in Education %V 7 %P 25-51 %G eng %0 Journal Article %J Journal of Applied Psychology %D 1994 %T The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery %A Van de Vijver, F. J. R. %A Harsveld, M. %B Journal of Applied Psychology %V 79 %P 852-859 %G eng %0 Journal Article %J Applied Measurement in Education %D 1994 %T Individual differences and test administration procedures: A comparison of fixed-item, computerized adaptive, and self-adapted testing %A Vispoel, W. P. %A Rocklin, T. R. %A Wang, T. %B Applied Measurement in Education %V 7 %P 53-79 %G eng %0 Generic %D 1994 %T A simple and fast item selection procedure for adaptive testing (Research Report 94-13) %A Veerkamp, W. J. J. %C Enschede, The Netherlands: University of Twente. %G eng %0 Generic %D 1994 %T Some new item selection criteria for adaptive testing (Research Report 94-6) %A Veerkamp, W. J. J. %A Berger, M. P. F. %C Enschede, The Netherlands: University of Twente, Department of Educational Measurement and Data Analysis. 
%G eng %0 Journal Article %J Educational and Psychological Measurement %D 1993 %T Computerized adaptive and fixed-item versions of the ITED Vocabulary test %A Vispoel, W. P. %B Educational and Psychological Measurement %V 53 %P 779-788 %G eng %0 Journal Article %J Journal of Research in Music Education %D 1993 %T The development and evaluation of a computerized adaptive test of tonal memory %A Vispoel, W. P. %B Journal of Research in Music Education %V 41 %P 111-136 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1993 %T The efficiency, reliability, and concurrent validity of adaptive and fixed-item tests of music listening skills %A Vispoel, W. P. %A Wang, T. %A Bleiler, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Atlanta GA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1993 %T Individual differences and test administration procedures: A comparison of fixed-item, adaptive, and self-adapted testing %A Vispoel, W. P. %A Rocklin, T. R. %B Paper presented at the annual meeting of the American Educational Research Association %C Atlanta GA %G eng %0 Journal Article %J Bulletin of the Council for Research in Music Education %D 1992 %T Computerized adaptive testing of music-related skills %A Vispoel, W. P. %A Coffman, D. D. %B Bulletin of the Council for Research in Music Education %V 112 %P 29-49 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1992 %T How review options and administration mode influence scores on computerized vocabulary tests %A Vispoel, W. P. %A Wang, T. %A De la Torre, R. %A Bleiler, T. %A Dings, J. 
%B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Francisco CA %G eng %0 Journal Article %J Psychomusicology %D 1992 %T Improving the measurement of tonal memory with computerized adaptive tests %A Vispoel, W. P. %B Psychomusicology %V 11 %P 73-89 %G eng %0 Book %D 1992 %T Manual for the General Scholastic Aptitude Test (Senior) Computerized adaptive test %A Von Tonder, M. %A Claassen, N. C. W. %C Pretoria: Human Sciences Research Council %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1991 %T The development and evaluation of a computerized adaptive testing system %A De la Torre, R. %A Vispoel, W. P. %B Paper presented at the annual meeting of the American Educational Research Association %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the biannual meeting of the Music Educators National Conference %D 1990 %T Computerized adaptive music tests: A new solution to three old problems %A Vispoel, W. P. %B Paper presented at the biannual meeting of the Music Educators National Conference %C Washington DC %G eng %0 Book Section %D 1990 %T Creating adaptive tests of musical ability with limited-size item pools %A Vispoel, W. P. %A Twing, J. S. %C D. Dalton (Ed.), ADCIS 32nd International Conference Proceedings (pp. 105-112). Columbus OH: Association for the Development of Computer-Based Instructional Systems. %G eng %0 Conference Paper %B Paper presented at the ADCIS 32nd International Conference %D 1990 %T MusicCAT: An adaptive testing program to assess musical ability %A Vispoel, W. P. %A Coffman, D. %A Scriven, D. %B Paper presented at the ADCIS 32nd International Conference %C San Diego CA %G eng %0 Journal Article %J International Journal of Educational Research %D 1989 %T Some procedures for computerized ability testing %A van der Linden, W. J. %A Zwarts, M. A. 
%B International Journal of Educational Research %V 13(2) %P 175-187 %G eng %0 Book %D 1987 %T An adaptive test of musical memory: An application of item response theory to the assessment of musical ability %A Vispoel, W. P. %C Doctoral dissertation, University of Illinois. Dissertation Abstracts International, 49, 79A. %G eng %0 Journal Article %J Applied Psychology: An International Review %D 1987 %T Adaptive testing %A Weiss, D. J. %A Vale, C. D. %B Applied Psychology: An International Review %V 36 %P 249-262 %G eng %0 Book Section %D 1987 %T Computerized adaptive testing for measuring abilities and other psychological variables %A Weiss, D. J. %A Vale, C. D. %C J. N. Butcher (Ed.), Computerized personality measurement: A practitioner's guide (pp. 325-343). New York: Basic Books. %G eng %0 Book Section %D 1987 %T Improving the measurement of musical ability through adaptive testing %A Vispoel, W. P. %C G. Hayes (Ed.), Proceedings of the 29th International ADCIS Conference (pp. 221-228). Bellingham WA: ADCIS. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1987 %T Multidimensional adaptive testing: A procedure for sequential estimation of the posterior centroid and dispersion of theta %A Bloxom, B. M. %A Vale, C. D. %B Paper presented at the annual meeting of the Psychometric Society %C Montreal, Canada %G eng %0 Generic %D 1985 %T Armed Services Vocational Aptitude Battery: Development of an adaptive item pool (AFHRL-TR-85-19; Technical Rep No 85-19) %A Prestwood, J. S. %A Vale, C. D. %A Massey, R. H. %A Welsh, J. R. %C Brooks Air Force Base TX: Air Force Human Resources Laboratory %G eng %0 Generic %D 1985 %T Development of a microcomputer-based adaptive testing system: Phase II Implementation (Research Report ONR 85-5) %A Vale, C. D. %C St. Paul MN: Assessment Systems Corporation %G eng %0 Generic %D 1984 %T Evaluation of computerized adaptive testing of the ASVAB %A Hardwicke, S. %A Vicino, F. %A McBride, J. R. %A Nemeth, C. %C San Diego, CA: Navy Personnel Research and Development Center, unpublished manuscript %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1984 %T An evaluation of the utility of large scale computerized adaptive testing %A Vicino, F. L. %A Hardwicke, S. B. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Book Section %D 1982 %T Design of a Microcomputer-Based Adaptive Testing System %A Vale, C. D. %C D. J. Weiss (Ed.), Proceedings of the 1979 Item Response Theory and Computerized Adaptive Testing Conference (pp. 360-371). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory. %G eng %0 Journal Article %J International Journal of Man-Machine Studies %D 1982 %T Pros and cons of tailored testing: An examination of issues highlighted with an automated testing system %A Volans, P. J. %B International Journal of Man-Machine Studies %V 17 %P 301-304 %G eng %0 Journal Article %J Behavior Research Methods and Instrumentation %D 1981 %T Design and implementation of a microcomputer-based adaptive testing system %A Vale, C. D. %B Behavior Research Methods and Instrumentation %V 13 %P 399-406 %G eng %0 Book %D 1980 %T Development and evaluation of an adaptive testing strategy for use in multidimensional interest assessment %A Vale, C. D. %C Unpublished doctoral dissertation, University of Minnesota. 
Dissertation Abstracts International, 42(11-B), 4248-4249 %G eng %0 Journal Article %J TIMS Studies in the Management Sciences %D 1978 %T The stratified adaptive ability test as a tool for personnel selection and placement %A Vale, C. D. %A Weiss, D. J. %B TIMS Studies in the Management Sciences %V 8 %P 135-151 %G eng %0 Book Section %D 1977 %T Adaptive testing and the problem of classification %A Vale, C. D. %C D. J. Weiss (Ed.), Applications of computerized adaptive testing (Research Report 77-1). Minneapolis MN: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Generic %D 1977 %T A rapid item search procedure for Bayesian adaptive testing (Research Report 77-4) %A Vale, C. D. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Generic %D 1975 %T A simulation study of stradaptive ability testing (Research Report 75-6) %A Vale, C. D. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng %0 Book Section %D 1975 %T Strategies of branching through an item pool %A Vale, C. D. %C D. J. Weiss (Ed.), Computerized adaptive trait measurement: Problems and prospects (Research Report 75-5), pp. 1-16. Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program. %G eng %0 Generic %D 1975 %T A study of computer-administered stradaptive ability testing (Research Report 75-4) %A Vale, C. D. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program %G eng