%0 Journal Article %J Assessment %D In Press %T Development of a Computerized Adaptive Test for Anxiety Based on the Dutch–Flemish Version of the PROMIS Item Bank %A Gerard Flens %A Niels Smits %A Caroline B. Terwee %A Joost Dekker %A Irma Huijbrechts %A Philip Spinhoven %A Edwin de Beurs %X We used the Dutch–Flemish version of the USA PROMIS adult V1.0 item bank for Anxiety as input for developing a computerized adaptive test (CAT) to measure the entire latent anxiety continuum. First, psychometric analysis of a combined clinical and general population sample (N = 2,010) showed that the 29-item bank has the psychometric properties required for CAT administration. Second, a post hoc CAT simulation showed efficient and highly precise measurement, with an average of 8.64 items for the clinical sample and 9.48 items for the general population sample. Furthermore, the accuracy of our CAT version was highly similar to that of the full item bank administration, both in final score estimates and in distinguishing clinical subjects from persons without a mental health disorder. We discuss the future directions and limitations of CAT development with the Dutch–Flemish version of the PROMIS Anxiety item bank. %B Assessment %U https://doi.org/10.1177/1073191117746742 %R 10.1177/1073191117746742 %0 Journal Article %J Journal of Computerized Adaptive Testing %D 2023 %T How Do Trait Change Patterns Affect the Performance of Adaptive Measurement of Change? %A Ming Him Tai %A Allison W. Cooperman %A Joseph N. DeWeese %A David J. Weiss %K adaptive measurement of change %K computerized adaptive testing %K longitudinal measurement %K trait change patterns %B Journal of Computerized Adaptive Testing %V 10 %P 32-58 %G English %N 3 %R 10.7333/2307-1003032 %0 Journal Article %J BMC Pediatrics %D 2020 %T Computerized adaptive testing to screen children for emotional and behavioral problems by preventive child healthcare %A Theunissen, Meinou H. C. %A de Wolff, Marianne S. %A Deurloo, Jacqueline A. %A Vogels, Anton G. C. %X

Background

Questionnaires to detect emotional and behavioral problems (EBP) in Preventive Child Healthcare (PCH) should be short, which potentially affects their validity and reliability. Simulation studies have shown that Computerized Adaptive Testing (CAT) could overcome these weaknesses. We studied the applicability (in terms of participation rate, satisfaction, and efficiency) and the validity of CAT in routine PCH practice.

Methods

We analyzed data on 461 children aged 10–11 years (response rate 41%), who were assessed during routine well-child examinations by PCH professionals. Before the visit, parents completed the CAT and the Child Behavior Checklist (CBCL). Satisfaction was measured by parent and PCH-professional report. Efficiency of the CAT procedure was measured as the number of items needed to assess whether or not a child has serious problems. Validity was assessed using the CBCL as the criterion.

Results

Parents and PCH professionals rated the CAT on average as good. The procedure required on average 16 items to assess whether or not a child has serious problems. Agreement of scores on the CAT scales with the corresponding CBCL scales was high (Spearman correlations 0.59–0.72). Areas under the curve (AUCs) were high (range: 0.95–0.97) for the Psycat total, externalizing, and hyperactivity scales, using the corresponding CBCL scale scores as the criterion. For the Psycat internalizing scale the AUC was somewhat lower but still high (0.86).

Conclusions

CAT is a valid procedure for the identification of emotional and behavioral problems in children aged 10–11 years. It may support the efficient and accurate identification of children with overall, and potentially also specific, emotional and behavioral problems in routine PCH.
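
The validation statistics reported above (Spearman agreement with a criterion scale, AUC for discriminating clinical cases) are straightforward to compute. A minimal sketch in Python, using hypothetical score data rather than the study's actual data:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 461
    cat_score = rng.normal(size=n)                           # CAT scale score (hypothetical)
    cbcl_score = cat_score + rng.normal(scale=0.8, size=n)   # criterion measure (hypothetical)
    # Define "clinical" cases from the criterion, e.g., top decile
    cbcl_clinical = (cbcl_score > np.quantile(cbcl_score, 0.9)).astype(int)

    rho, _ = spearmanr(cat_score, cbcl_score)        # agreement between the two scales
    auc = roc_auc_score(cbcl_clinical, cat_score)    # discrimination of criterion caseness
    print(f"Spearman rho = {rho:.2f}, AUC = {auc:.2f}")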

%B BMC Pediatrics %V 20 %U https://bmcpediatr.biomedcentral.com/articles/10.1186/s12887-020-2018-1 %N Article number: 119 %0 Journal Article %J Applied Psychological Measurement %D 2020 %T Stratified Item Selection Methods in Cognitive Diagnosis Computerized Adaptive Testing %A Jing Yang %A Hua-Hua Chang %A Jian Tao %A Ningzhong Shi %X Cognitive diagnostic computerized adaptive testing (CD-CAT) aims to obtain more useful diagnostic information by taking advantage of computerized adaptive testing (CAT). Cognitive diagnosis models (CDMs) have been developed to classify examinees into the correct proficiency classes so that more efficient remediation can be provided, whereas CAT tailors optimal items to the examinee's mastery profile. The item selection method is the key factor in the CD-CAT procedure. In recent years, a large number of parametric and nonparametric item selection methods have been proposed. In this article, the authors propose a series of stratified item selection methods for CD-CAT, which are combined with the posterior-weighted Kullback–Leibler (PWKL), nonparametric item selection (NPS), and weighted nonparametric item selection (WNPS) methods, and named S-PWKL, S-NPS, and S-WNPS, respectively. Two different types of stratification indices were used: original versus novel. The performance of the proposed item selection methods was evaluated via simulation studies and compared with the PWKL, NPS, and WNPS methods without stratification. Manipulated conditions included calibration sample size, item quality, number of attributes, number of strata, and data generation models. Results indicated that the S-WNPS and S-NPS methods performed similarly, and both outperformed the S-PWKL method. Item selection methods with novel stratification indices performed slightly better than those with original stratification indices, and methods without stratification performed the worst. %B Applied Psychological Measurement %V 44 %P 346-361 %U https://doi.org/10.1177/0146621619893783 %R 10.1177/0146621619893783 %0 Journal Article %J Applied Psychological Measurement %D 2019 %T Nonparametric CAT for CD in Educational Settings With Small Samples %A Yuan-Pei Chang %A Chia-Yi Chiu %A Rung-Ching Tsai %X Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, this type of system has not received the same degree of research and development in small-scale settings, such as at the course-based level, where it would be the most useful. The main obstacle is that the statistical estimation techniques that are successfully applied within the context of a large-scale assessment require large samples to guarantee reliable calibration of the item parameters and an accurate estimation of the examinees' proficiency class membership. Such samples are simply not obtainable in course-based settings. Therefore, this study proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and thus can be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee's attribute profile; based on the examinee's item responses, the item that best discriminates between the estimated attribute profile and the other attribute profiles is then selected.
The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms, and the differences were substantial when the calibration samples were small. %B Applied Psychological Measurement %V 43 %P 543-561 %U https://doi.org/10.1177/0146621618813113 %R 10.1177/0146621618813113 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Item Selection Methods in Multidimensional Computerized Adaptive Testing With Polytomously Scored Items %A Dongbo Tu %A Yuting Han %A Yan Cai %A Xuliang Gao %X Multidimensional computerized adaptive testing (MCAT) has been developed over the past decades, but most existing MCAT procedures can deal only with dichotomously scored items. However, polytomously scored items have been broadly used in a variety of tests for their advantages of providing more information and measuring complex abilities and skills. The purpose of this study is to discuss item selection algorithms for MCAT with polytomously scored items (PMCAT). Several promising item selection algorithms used in MCAT are extended to PMCAT, and two new item selection methods are proposed to improve the existing selection strategies. Two simulation studies are conducted to demonstrate the feasibility of the extended and proposed methods. The simulation results show that most of the extended item selection methods for PMCAT are feasible and that the newly proposed item selection methods perform well. When pool security is also taken into account, in the two-dimensional case (Study 1) the proposed modified continuous entropy method (MCEM) is the best choice overall, in that it yields the lowest item exposure rate while maintaining relatively high accuracy. For higher dimensions (Study 2), results show that mutual information (MUI) and MCEM maintain relatively high estimation accuracy, and item exposure rates decrease as the correlation between dimensions increases. %B Applied Psychological Measurement %V 42 %P 677-694 %U https://doi.org/10.1177/0146621618762748 %R 10.1177/0146621618762748 %0 Journal Article %J Applied Psychological Measurement %D 2018 %T Measuring patient-reported outcomes adaptively: Multidimensionality matters! %A Paap, Muirne C. S. %A Kroeze, Karel A. %A Glas, C. A. W. %A Terwee, C. B. %A van der Palen, Job %A Veldkamp, Bernard P. %B Applied Psychological Measurement %R 10.1177/0146621617733954 %0 Journal Article %J Journal of Educational Measurement %D 2018 %T On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis %A Liu, Shuchang %A Cai, Yan %A Tu, Dongbo %X This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index method (MPI), the revised maximum priority index (RMPI), the weighted deviation model (WDM), and two revised Monte Carlo methods (R1-MC, R2-MC).
Simulation results showed that, on the whole, the CD-OMST performs well: it not only has acceptable attribute-pattern correct classification rates but also satisfies both statistical and nonstatistical constraints. The RMPI method was generally better than the MPI method, the R2-MC method was generally better than the R1-MC method, and the two revised Monte Carlo methods performed best in terms of test security and constraint management, whereas the RMPI and WDM methods worked best in terms of measurement precision. The study is expected not only to provide information about how to combine MST and CD using an on-the-fly method and how these assembly methods in CD-OMST perform relative to one another, but also to offer guidance for practitioners assembling modules in CD-OMST under both statistical and nonstatistical constraints. %B Journal of Educational Measurement %V 55 %P 595-613 %U https://onlinelibrary.wiley.com/doi/abs/10.1111/jedm.12194 %R 10.1111/jedm.12194 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Computerized Adaptive Testing for Cognitive Diagnosis in Classroom: A Nonparametric Approach %A Yuan-Pei Chang %A Chia-Yi Chiu %A Rung-Ching Tsai %K CD-CAT %K non-parametric approach %X

In the past decade, CDMs of educational test performance have received increasing attention among educational researchers (for details, see Fu & Li, 2007, and Rupp, Templin, & Henson, 2010). CDMs of educational test performance decompose the ability domain of a given test into specific skills, called attributes, each of which an examinee may or may not have mastered. The resulting attribute profile documents the individual's strengths and weaknesses within the ability domain. Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation (e.g., Cheng & Chang, 2007; Cheng, 2009; Liu, You, Wang, Ding, & Chang, 2013; Tatsuoka & Tatsuoka, 1997). While model-based CD-CAT is relatively well researched in the context of large-scale assessments, this type of system has not received the same degree of development in small-scale settings, where it would be most useful. The main challenge is that the statistical estimation techniques successfully applied to parametric CD-CAT require large samples to guarantee the reliable calibration of item parameters and accurate estimation of examinees' attribute profiles. In response to this challenge, a nonparametric approach that does not require any parameter calibration, and thus can be used in small educational programs, is proposed. The proposed nonparametric CD-CAT relies on the same principle as the regular CAT algorithm, but uses the nonparametric classification method (Chiu & Douglas, 2013) to assess and update the student's ability state as the test proceeds. Based on a student's initial responses, a neighborhood of candidate proficiency classes is identified, and items not characteristic of the chosen proficiency classes are precluded from being chosen next. The response to the next item then allows for an update of the skill profile, and the set of possible proficiency classes is further narrowed. In this manner, the nonparametric CD-CAT cycles through item administration and update stages until the most likely proficiency class has been pinpointed. The simulation results show that the proposed method outperformed the compared parametric CD-CAT algorithms, and the differences were significant when the item parameter calibration was not optimal.
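
The nonparametric classification step at the heart of this approach can be illustrated compactly. A minimal sketch, assuming a conjunctive (DINA-type) ideal-response rule and an illustrative Q-matrix (neither taken from the paper); an examinee is classified by minimal Hamming distance to each proficiency class's ideal response pattern (cf. Chiu & Douglas, 2013):

    import numpy as np
    from itertools import product

    def ideal_response(alpha, Q):
        # Conjunctive rule: item j is answered correctly only if all
        # attributes required by row j of the Q-matrix are mastered.
        return np.all(alpha >= Q, axis=1).astype(int)

    def npc_classify(y, Q):
        # Classify by minimal Hamming distance between the observed
        # responses y and each class's ideal response pattern.
        K = Q.shape[1]
        best, best_dist = None, np.inf
        for alpha in product([0, 1], repeat=K):
            eta = ideal_response(np.array(alpha), Q)
            d = int(np.sum(y != eta))
            if d < best_dist:
                best, best_dist = np.array(alpha), d
        return best, best_dist

    # Illustrative 5-item, 2-attribute example
    Q = np.array([[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]])
    y = np.array([1, 0, 0, 1, 0])
    alpha_hat, dist = npc_classify(y, Q)
    print(alpha_hat, dist)  # -> [1 0] with Hamming distance 0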

References

Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74, 619-632.

Cheng, Y., & Chang, H. (2007). The modified maximum global discrimination index method for cognitive diagnostic CAT. In D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Computerized Adaptive Testing Conference.

Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250.

Fu, J., & Li, Y. (2007). An integrative review of cognitively diagnostic psychometric models. Paper presented at the Annual Meeting of the National Council on Measurement in Education. Chicago, Illinois.

Liu, H., You, X., Wang, W., Ding, S., & Chang, H. (2013). The development of computerized adaptive testing with cognitive diagnosis for an English achievement test in China. Journal of Classification, 30, 152-172.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford.

Tatsuoka, K. K., & Tatsuoka, M. M. (1997). Computerized cognitive diagnostic adaptive testing: Effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34, 3–20.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Developing a CAT: An Integrated Perspective %A Nathan Thompson %K CAT Development %K integrated approach %X

Most resources on computerized adaptive testing (CAT) tend to focus on psychometric aspects such as mathematical formulae for item selection or ability estimation. However, development of a CAT assessment requires a holistic view of project management, financials, content development, product launch and branding, and more. This presentation will develop such a holistic view, which serves several purposes, including providing a framework for validity, estimating costs and ROI, and making better decisions regarding the psychometric aspects.

Thompson and Weiss (2011) presented a 5-step model for developing computerized adaptive tests (CATs). This model will be presented and discussed as the core of this holistic framework, then applied to real-life examples. While most CAT research focuses on developing new quantitative algorithms, this presentation is instead intended to help researchers evaluate and select the algorithms that are most appropriate for their needs. It is therefore ideal for practitioners who are familiar with the basics of item response theory and CAT and wish to explore how they might apply these methodologies to improve their assessments.

Steps include:

1. Feasibility, applicability, and planning studies

2. Develop item bank content or utilize existing bank

3. Pretest and calibrate item bank

4. Determine specifications for final CAT

5. Publish the live CAT

For example, Step 1 will contain simulation studies that estimate item bank requirements; these estimates can then be used to determine the costs of content development, which in turn can be integrated into an estimated project cost and timeline. Such information is vital in determining whether the CAT should even be developed in the first place.
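
As a toy illustration of the kind of Step 1 simulation study described above, the sketch below estimates the test length a candidate bank would support. It assumes a hypothetical 2PL bank, maximum-information item selection, and a standard-error stopping rule; none of these specifics are prescribed by the presentation:

    import numpy as np

    rng = np.random.default_rng(1)

    def p2pl(theta, a, b):
        # 2PL response probability
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def simulate_cat(theta, a, b, se_target=0.30, max_items=30):
        used, info, theta_hat = [], 0.0, 0.0
        for _ in range(max_items):
            p = p2pl(theta_hat, a, b)
            item_info = a ** 2 * p * (1 - p)      # Fisher information at theta_hat
            item_info[used] = -np.inf             # do not reuse items
            j = int(np.argmax(item_info))         # maximum-information selection
            used.append(j)
            x = int(rng.random() < p2pl(theta, a[j], b[j]))  # simulated response
            pj = p2pl(theta_hat, a[j], b[j])
            info += a[j] ** 2 * pj * (1 - pj)
            theta_hat += a[j] * (x - pj) / info   # one Fisher-scoring step
            if info ** -0.5 < se_target:          # SE-based stopping rule
                break
        return theta_hat, len(used)

    a = rng.uniform(0.8, 2.0, 300)   # hypothetical 300-item bank
    b = rng.normal(0.0, 1.0, 300)
    lengths = [simulate_cat(t, a, b)[1] for t in rng.normal(size=500)]
    print("median simulated test length:", np.median(lengths))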

References

Thompson, N. A., & Weiss, D. J. (2011). A Framework for the Development of Computerized Adaptive Tests. Practical Assessment, Research & Evaluation, 16(1). Retrieved from http://pareonline.net/getvn.asp?v=16&n=1.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Jv8bpH2zkw5TqSMi03e5JJJ98QtXf-Cv %0 Journal Article %J Evaluation & the Health Professions %D 2017 %T Development of a Computer Adaptive Test for Depression Based on the Dutch-Flemish Version of the PROMIS Item Bank %A Gerard Flens %A Niels Smits %A Caroline B. Terwee %A Joost Dekker %A Irma Huijbrechts %A Edwin de Beurs %X We developed a Dutch-Flemish version of the Patient-Reported Outcomes Measurement Information System (PROMIS) adult V1.0 item bank for depression as input for computerized adaptive testing (CAT). As the item bank, we used the Dutch-Flemish translation of the original PROMIS item bank (28 items) and additionally translated 28 U.S. depression items that failed to make the final U.S. item bank. Through psychometric analysis of a combined clinical and general population sample (N = 2,010), 8 of the added items were removed. With the final item bank, we performed several CAT simulations to assess the efficiency of the extended (48 items) and the original item bank (28 items), using various stopping rules. Both item banks resulted in highly efficient and precise measurement of depression and showed high similarity between the CAT simulation scores and the full item bank scores. We discuss the implications of using each item bank and stopping rule for further CAT development. %B Evaluation & the Health Professions %V 40 %P 79-105 %U https://doi.org/10.1177/0163278716684168 %R 10.1177/0163278716684168 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T Evaluation of Parameter Recovery, Drift, and DIF with CAT Data %A Nathan Thompson %A Jordan Stoeger %K CAT %K DIF %K Parameter Drift %K Parameter Recovery %X

Parameter drift and differential item functioning (DIF) analyses are frequent components of a test maintenance plan. That is, after test forms are published, organizations will often calibrate post-publication data at a later date to evaluate whether the performance of the items or the test has changed over time. For example, if item content is leaked, the items might gradually become easier over time, and item statistics or parameters can reflect this.

When tests are published under a computerized adaptive testing (CAT) paradigm, they are nearly always calibrated with item response theory (IRT). IRT calibrations assume that range restriction is not an issue, that is, that each item is administered to examinees across a wide range of ability. CAT data violate this assumption. However, some organizations still wish to evaluate the continuing performance of their items from a DIF or drift perspective.

This presentation will evaluate just how inaccurate DIF and drift analyses might be on CAT data, using a Monte Carlo parameter recovery methodology. Known item parameters will be used to generate both linear and CAT data sets, which are then calibrated for DIF and drift. In addition, we will implement randomesque item exposure constraints in some CAT conditions; this randomization directly alleviates the range restriction problem somewhat, but it is an empirical question whether it improves the parameter recovery calibrations.
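
The range restriction problem described above is easy to demonstrate. In the sketch below, adaptive selection is idealized (each simulated examinee receives the items whose difficulties are closest to his or her true ability, a crude stand-in for CAT, not the presenters' design), and we record the spread of abilities among examinees who receive a particular item under linear versus adaptive administration:

    import numpy as np

    rng = np.random.default_rng(2)
    b = rng.normal(0.0, 1.0, 200)    # difficulties for a hypothetical 200-item bank

    def thetas_receiving_item(item, adaptive, n_examinees=2000, test_len=20):
        # Collect the true abilities of simulated examinees who are
        # administered the given item.
        seen = []
        for _ in range(n_examinees):
            theta = rng.normal()
            if adaptive:
                # Idealized adaptive selection: the test_len items whose
                # difficulties are closest to the examinee's ability.
                items = np.argsort(np.abs(b - theta))[:test_len]
            else:
                items = np.arange(test_len)   # fixed linear form
            if item in items:
                seen.append(theta)
        return np.array(seen)

    for label, flag in [("linear", False), ("adaptive", True)]:
        t = thetas_receiving_item(0, flag)
        print(f"{label}: item 0 administered to {t.size} examinees, "
              f"theta SD = {t.std():.2f}")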

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1F7HCZWD28Q97sCKFIJB0Yps0H66NPeKq %0 Journal Article %J Quality of Life Research %D 2017 %T Item usage in a multidimensional computerized adaptive test (MCAT) measuring health-related quality of life %A Paap, Muirne C. S. %A Kroeze, Karel A. %A Terwee, Caroline B. %A van der Palen, Job %A Veldkamp, Bernard P. %B Quality of Life Research %V 26 %P 2909–2918 %U https://doi.org/10.1007/s11136-017-1624-3 %R 10.1007/s11136-017-1624-3 %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T New Challenges (With Solutions) and Innovative Applications of CAT %A Chun Wang %A David J. Weiss %A Xue Zhang %A Jian Tao %A Yinhong He %A Ping Chen %A Shiyu Wang %A Susu Zhang %A Haiyan Lin %A Xiaohong Gao %A Hua-Hua Chang %A Zhuoran Shang %K CAT %K challenges %K innovative applications %X

Over the past several decades, computerized adaptive testing (CAT) has profoundly changed the administration of large-scale aptitude tests, state-wide achievement tests, professional licensure exams, and health outcome measures. While many challenges of CAT have been successfully addressed due to the continual efforts of researchers in the field, there are still many remaining, longstanding challenges that have yet to be resolved. This symposium will begin with three presentations, each of which provides a sound solution to one of the unresolved challenges. They are (1) item calibration when responses are “missing not at random” from CAT administration; (2) online calibration of new items when person traits have non-ignorable measurement error; (3) establishing consistency and asymptotic normality of latent trait estimation when allowing item response revision in CAT. In addition, this symposium also features innovative applications of CAT. In particular, there is emerging interest in using cognitive diagnostic CAT to monitor and detect learning progress (4th presentation). Last but not least, the 5th presentation illustrates the power of multidimensional polytomous CAT that permits rapid identification of hospitalized patients’ rehabilitative care needs in health outcomes measurement. We believe this symposium covers a wide range of interesting and important topics in CAT.

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1Wvgxw7in_QCq_F7kzID6zCZuVXWcFDPa %0 Conference Paper %B IACAT 2017 Conference %D 2017 %T A Simulation Study to Compare Classification Method in Cognitive Diagnosis Computerized Adaptive Testing %A Jing Yang %A Jian Tao %A Hua-Hua Chang %A Ning-Zhong Shi %X

Cognitive diagnostic computerized adaptive testing (CD-CAT) combines the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models, which can be viewed as restricted latent class models, have been developed to classify examinees according to which skills they have mastered and which they have not, so that more efficient remediation can be provided. Chiu and Douglas (2013) introduced a nonparametric procedure that requires only the specification of a Q-matrix and classifies examinees by proximity to ideal response patterns. In this article, we compare the nonparametric procedure with common profile estimation methods such as maximum a posteriori (MAP) estimation in CD-CAT. Simulation studies consider a variety of Q-matrix structures, numbers of attributes, ways of generating attribute profiles, and levels of item quality. Results indicate that the nonparametric procedure consistently achieves higher pattern and attribute recovery rates in nearly all conditions.
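
For readers unfamiliar with MAP estimation in this context, a minimal sketch of posterior-mode classification over all 2^K attribute profiles under a DINA model; the Q-matrix, slip, and guessing values are illustrative, not taken from the study:

    import numpy as np
    from itertools import product

    def dina_p(alpha, Q, slip, guess):
        # P(correct response) per item under the DINA model:
        # (1 - slip) if all required attributes are mastered, else guess.
        eta = np.all(alpha >= Q, axis=1)
        return np.where(eta, 1 - slip, guess)

    def map_profile(y, Q, slip, guess, prior=None):
        # Posterior mode over all 2^K attribute profiles (uniform prior by default).
        K = Q.shape[1]
        classes = [np.array(c) for c in product([0, 1], repeat=K)]
        if prior is None:
            prior = np.full(len(classes), 1.0 / len(classes))
        post = []
        for alpha, pr in zip(classes, prior):
            p = dina_p(alpha, Q, slip, guess)
            lik = np.prod(np.where(y == 1, p, 1 - p))   # response likelihood
            post.append(pr * lik)
        return classes[int(np.argmax(post))]

    Q = np.array([[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]])
    slip = np.full(5, 0.1)
    guess = np.full(5, 0.2)
    y = np.array([1, 0, 0, 1, 0])
    print(map_profile(y, Q, slip, guess))   # -> [1 0]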

References

Chiu, C.-Y., & Douglas, J. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30, 225-250. doi: 10.1007/s00357-013-9132-9

Session Video

%B IACAT 2017 Conference %I Niigata Seiryo University %C Niigata, Japan %8 08/2017 %G eng %U https://drive.google.com/open?id=1jCL3fPZLgzIdwvEk20D-FliZ15OTUtpr %0 Journal Article %J Quality of Life Research %D 2017 %T The validation of a computer-adaptive test (CAT) for assessing health-related quality of life in children and adolescents in a clinical sample: study design, methods and first results of the Kids-CAT study %A Barthel, D. %A Otto, C. %A Nolte, S. %A Meyrose, A.-K. %A Fischer, F. %A Devine, J. %A Walter, O. %A Mierke, A. %A Fischer, K. I. %A Thyen, U. %A Klein, M. %A Ankermann, T. %A Rose, M. %A Ravens-Sieberer, U. %X Recently, we developed a computer-adaptive test (CAT) for assessing health-related quality of life (HRQoL) in children and adolescents: the Kids-CAT. It measures five generic HRQoL dimensions. The aims of this article were (1) to present the study design and (2) to investigate its psychometric properties in a clinical setting. %B Quality of Life Research %V 26 %P 1105–1117 %8 May %U https://doi.org/10.1007/s11136-016-1437-9 %R 10.1007/s11136-016-1437-9 %0 Journal Article %J Applied Psychological Measurement %D 2015 %T Stochastic Curtailment in Adaptive Mastery Testing: Improving the Efficiency of Confidence Interval–Based Stopping Rules %A Sie, Haskell %A Finkelman, Matthew D. %A Bartroff, Jay %A Thompson, Nathan A. %X A well-known stopping rule in adaptive mastery testing is to terminate the assessment once the examinee’s ability confidence interval lies entirely above or below the cut-off score. This article proposes new procedures that seek to improve such a variable-length stopping rule by coupling it with curtailment and stochastic curtailment. Under the new procedures, test termination can occur earlier if the probability is high enough that the current classification decision remains the same should the test continue. Computation of this probability utilizes normality of an asymptotically equivalent version of the maximum likelihood ability estimate. In two simulation sets, the new procedures showed a substantial reduction in average test length while maintaining similar classification accuracy to the original method. %B Applied Psychological Measurement %V 39 %P 278-292 %U http://apm.sagepub.com/content/39/4/278.abstract %R 10.1177/0146621614561314 %0 Journal Article %J Educational Measurement: Issues and Practice %D 2013 %T The Philosophical Aspects of IRT Equating: Modeling Drift to Evaluate Cohort Growth in Large-Scale Assessments %A Taherbhai, Husein %A Seo, Daeryong %K cohort growth %K construct-relevant drift %K evaluation of scale drift %K philosophical aspects of IRT equating %B Educational Measurement: Issues and Practice %V 32 %P 2–14 %U http://dx.doi.org/10.1111/emip.12000 %R 10.1111/emip.12000 %0 Journal Article %J Applied Measurement in Education %D 2012 %T Multistage Computerized Adaptive Testing With Uniform Item Exposure %A Edwards, Michael C. %A Flora, David B. %A Thissen, David %B Applied Measurement in Education %V 25 %P 118-141 %U http://www.tandfonline.com/doi/abs/10.1080/08957347.2012.660363 %R 10.1080/08957347.2012.660363 %0 Journal Article %J Physical & Occupational Therapy in Pediatrics %D 2011 %T Content range and precision of a computer adaptive test of upper extremity function for children with cerebral palsy %A Montpetit, K. %A Haley, S. %A Bilodeau, N. %A Ni, P. %A Tian, F. %A Gorton, G., 3rd %A Mulcahey, M. J. 
%X This article reports on the content range and measurement precision of an upper extremity (UE) computer adaptive testing (CAT) platform of physical function in children with cerebral palsy. Upper extremity items representing skills of all abilities were administered to 305 parents. These responses were compared with two traditional standardized measures: the Pediatric Outcomes Data Collection Instrument and the Functional Independence Measure for Children. The UE CAT correlated strongly with the upper extremity component of these measures and had greater precision when describing individual functional ability. The UE item bank has a wider range, with items populating the lower end of the ability spectrum. This new UE item bank and CAT have the capability to quickly assess children of all ages and abilities with good precision and, most importantly, with items that are meaningful and appropriate for their age and level of physical function. %B Physical & Occupational Therapy in Pediatrics %7 2010/10/15 %V 31 %P 90-102 %@ 1541-3144 (Electronic)0194-2638 (Linking) %G eng %M 20942642 %! Phys Occup Ther Pediatr %0 Journal Article %J Journal of Applied Testing Technology %D 2011 %T Design of a Computer-Adaptive Test to Measure English Literacy and Numeracy in the Singapore Workforce: Considerations, Benefits, and Implications %A Jacobsen, J. %A Ackermann, R. %A Egüez, J. %A Ganguli, D. %A Rickard, P. %A Taylor, L. %X

A computer adaptive test (CAT) is a delivery methodology that serves the larger goals of the assessment system in which it is embedded. A thorough analysis of the assessment system for which a CAT is being designed is critical to ensure that the delivery platform is appropriate and addresses all relevant complexities. As such, a CAT engine must be designed to conform to the validity and reliability of the overall system. This design takes the form of adherence to the assessment goals and objectives of the adaptive assessment system. When the assessment is adapted for use in another country, consideration must be given to any necessary revisions, including content differences. This article addresses these considerations while drawing, in part, on the process followed in the development of the CAT delivery system designed to test English language workplace skills for the Singapore Workforce Development Agency. Topics include item creation and selection, calibration of the item pool, analysis and testing of the psychometric properties, and reporting and interpretation of scores. The characteristics and benefits of the CAT delivery system are detailed, as well as implications for testing programs considering the use of a CAT delivery system.

%B Journal of Applied Testing Technology %V 12 %G English %U http://www.testpublishers.org/journal-of-applied-testing-technology %N 1 %0 Journal Article %J Practical Assessment Research & Evaluation %D 2011 %T A framework for the development of computerized adaptive tests %A Thompson, N. A. %A Weiss, D. J. %X A substantial amount of research has been conducted over the past 40 years on technical aspects of computerized adaptive testing (CAT), such as item selection algorithms, item exposure controls, and termination criteria. However, there is little literature providing practical guidance on the development of a CAT. This paper seeks to collate some of the available research methodologies into a general framework for the development of any CAT assessment. %B Practical Assessment Research & Evaluation %I Practical Assessment Research & Evaluation %V 16 %G eng %0 Journal Article %J Journal of Applied Testing Technology %D 2011 %T JATT Special Issue on Adaptive Testing: Welcome and Overview %A Thompson, N. A. %B Journal of Applied Testing Technology %V 12 %8 05/2011 %U http://www.testpublishers.org/journal-of-applied-testing-technology %0 Journal Article %J Journal of Career Assessment %D 2011 %T Using Item Response Theory and Adaptive Testing in Online Career Assessment %A Betz, Nancy E. %A Turner, Brandon M. %X

The present article describes the potential utility of item response theory (IRT) and adaptive testing for scale evaluation and for web-based career assessment. The article describes the principles of both IRT and adaptive testing and then illustrates these with reference to data analyses and simulation studies of the Career Confidence Inventory (CCI). The kinds of information provided by IRT are shown to give a more precise look at scale quality across the trait continuum and also to permit the use of adaptive testing, where the items administered are tailored to the individual being tested. Such tailoring can significantly reduce testing time while maintaining high quality of measurement. This efficiency is especially useful when multiscale inventories and/or a large number of scales are to be administered. Readers are encouraged to consider using these advances in career assessment.

%B Journal of Career Assessment %V 19 %P 274-286 %U http://jca.sagepub.com/cgi/content/abstract/19/3/274 %R 10.1177/1069072710395534 %0 Book Section %D 2009 %T Computerized adaptive testing by mutual information and multiple imputations %A Thissen-Roe, A. %X Over the years, most computerized adaptive testing (CAT) systems have used score estimation procedures from item response theory (IRT). IRT models have salutary properties for score estimation, error reporting, and next-item selection. However, some testing purposes favor scoring approaches outside IRT. Where a criterion metric is readily available and more relevant than the assessed construct, for example in the selection of job applicants, a predictive model might be appropriate (Scarborough & Somers, 2006). In these cases, neither IRT scoring nor a unidimensional assessment structure can be assumed. Yet, the primary benefit of CAT remains desirable: shorter assessments with minimal loss of accuracy due to unasked items. In such a case, it remains possible to create a CAT system that produces an estimated score from a subset of available items, recognizes differential item information given the emerging item response pattern, and optimizes the accuracy of the score estimated at every successive item. The method of multiple imputations (Rubin, 1987) can be used to simulate plausible scores given plausible response patterns to unasked items (Thissen-Roe, 2005). Mutual information can then be calculated in order to select an optimally informative next item (or set of items). Previously observed response patterns to two complete neural network-scored assessments were resampled according to MIMI CAT item selection. The reproduced CAT scores were compared to full-length assessment scores. Approximately 95% accurate assignment of examinees to one of three score categories was achieved with a 70%-80% reduction in median test length. Several algorithmic factors influencing accuracy and computational performance were examined. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T Guess what? Score differences with rapid replies versus omissions on a computerized adaptive test %A Talento-Miller, E. %A Guo, F. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Educational and Psychological Measurement %D 2009 %T Item Selection in Computerized Classification Testing %A Thompson, Nathan A. %X

Several alternatives for item selection algorithms based on item response theory in computerized classification testing (CCT) have been suggested, with no conclusive evidence of the substantial superiority of a single method. It is argued that the lack of a sizable effect arises because some of the methods actually assess items very similarly through different calculations and will usually select the same item. Consideration of methods that assess information across a wider range is often unnecessary under realistic conditions, although it might be advantageous to utilize them only early in a test. In addition, the efficiency of item selection approaches depends on the termination criteria that are used, which is demonstrated through a didactic example and Monte Carlo simulation. Item selection at the cut score, which seems conceptually appropriate for CCT, is not always the most efficient option. A broad framework for item selection in CCT is presented that incorporates these points.
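
A compact sketch of the SPRT-style termination logic that the CCT literature cited here builds on, assuming a 2PL model with illustrative items and Wald thresholds (none of these specifics are taken from the article):

    import numpy as np

    def p2pl(theta, a, b):
        # 2PL response probability
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def sprt_decision(x, a, b, theta1, theta2, alpha=0.05, beta=0.05):
        # Log-likelihood ratio of the responses x evaluated at the two
        # indifference-region bounds, compared with Wald's thresholds.
        p1, p2 = p2pl(theta1, a, b), p2pl(theta2, a, b)
        llr = np.sum(x * np.log(p2 / p1) + (1 - x) * np.log((1 - p2) / (1 - p1)))
        upper = np.log((1 - beta) / alpha)
        lower = np.log(beta / (1 - alpha))
        if llr >= upper:
            return "pass"       # classify above the cutscore
        if llr <= lower:
            return "fail"       # classify below the cutscore
        return "continue"       # administer another item

    # Illustrative: 10 responses from an examinee with true theta = 1.0
    rng = np.random.default_rng(3)
    a = rng.uniform(0.8, 2.0, 10)
    b = rng.normal(0.0, 0.5, 10)
    x = (rng.random(10) < p2pl(1.0, a, b)).astype(int)
    print(sprt_decision(x, a, b, theta1=-0.2, theta2=0.2))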

%B Educational and Psychological Measurement %V 69 %P 778-793 %U http://epm.sagepub.com/content/69/5/778.abstract %R 10.1177/0013164408324460 %0 Journal Article %J Quality of Life Research %D 2009 %T Measuring global physical health in children with cerebral palsy: Illustration of a multidimensional bi-factor model and computerized adaptive testing %A Haley, S. M. %A Ni, P. %A Dumas, H. M. %A Fragala-Pinkham, M. A. %A Hambleton, R. K. %A Montpetit, K. %A Bilodeau, N. %A Gorton, G. E. %A Watson, K. %A Tucker, C. A. %K *Computer Simulation %K *Health Status %K *Models, Statistical %K Adaptation, Psychological %K Adolescent %K Cerebral Palsy/*physiopathology %K Child %K Child, Preschool %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Massachusetts %K Pennsylvania %K Questionnaires %K Young Adult %X PURPOSE: The purposes of this study were to apply a bi-factor model for the determination of test dimensionality and a multidimensional CAT using computer simulations of real data for the assessment of a new global physical health measure for children with cerebral palsy (CP). METHODS: Parent respondents of 306 children with cerebral palsy were recruited from four pediatric rehabilitation hospitals and outpatient clinics. We compared confirmatory factor analysis results across four models: (1) one-factor unidimensional; (2) two-factor multidimensional (MIRT); (3) bi-factor MIRT with fixed slopes; and (4) bi-factor MIRT with varied slopes. We tested whether the general and content (fatigue and pain) person score estimates could discriminate across severity and types of CP, and whether score estimates from a simulated CAT were similar to estimates based on the total item bank, and whether they correlated as expected with external measures. RESULTS: Confirmatory factor analysis suggested separate pain and fatigue sub-factors; all 37 items were retained in the analyses. From the bi-factor MIRT model with fixed slopes, the full item bank scores discriminated across levels of severity and types of CP, and compared favorably to external instruments. CAT scores based on 10- and 15-item versions accurately captured the global physical health scores.
CONCLUSIONS: The bi-factor MIRT CAT application, especially the 10- and 15-item versions, yielded accurate global physical health scores that discriminated across known severity groups and types of CP, and correlated as expected with concurrent measures. The CATs have potential for collecting complex data on the physical health of children with CP in an efficient manner. %B Quality of Life Research %7 2009/02/18 %V 18 %P 359-370 %8 Apr %@ 0962-9343 (Print)0962-9343 (Linking) %G eng %M 19221892 %2 2692519 %0 Book Section %D 2009 %T The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: CAT software %A Thompson, N. A. %X Development of computerized adaptive tests (CAT) requires a number of appropriate software tools. This paper describes the development of two new CAT software programs. CATSIM has been designed specifically to conduct several different kinds of simulation studies, which are necessary for planning purposes as well as properly designing live CATs. FastCAT is a software system for banking items and publishing CAT tests as standalone files, to be administered anywhere. Both are available for public use. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2009 %T The MEDPRO project: An SBIR project for a comprehensive IRT and CAT software system: IRT software %A Thissen, D. %X IRTPRO (Item Response Theory for Patient-Reported Outcomes) is an entirely new application for item calibration and test scoring using IRT. IRTPRO implements algorithms for maximum likelihood estimation of item parameters (item calibration) for several unidimensional and multidimensional item response theory (IRT) models for dichotomous and polytomous item responses. In addition, the software provides computation of goodness-of-fit indices, statistics for the diagnosis of local dependence and for the detection of differential item functioning (DIF), and IRT scaled scores. This paper illustrates the use, and some capabilities, of the software. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Quality of Life Research %D 2009 %T Replenishing a computerized adaptive test of patient-reported daily activity functioning %A Haley, S. M. %A Ni, P. %A Jette, A. M. %A Tao, W. %A Moed, R. %A Meyers, D. %A Ludlow, L. H. %K *Activities of Daily Living %K *Disability Evaluation %K *Questionnaires %K *User-Computer Interface %K Adult %K Aged %K Cohort Studies %K Computer-Assisted Instruction %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %X PURPOSE: Computerized adaptive testing (CAT) item banks may need to be updated, but before new items can be added, they must be linked to the previous CAT. The purpose of this study was to evaluate 41 pretest items prior to including them into an operational CAT. METHODS: We recruited 6,882 patients with spine, lower extremity, upper extremity, and nonorthopedic impairments who received outpatient rehabilitation in one of 147 clinics across 13 states of the USA. Forty-one new Daily Activity (DA) items were administered along with the Activity Measure for Post-Acute Care Daily Activity CAT (DA-CAT-1) in five separate waves. We compared the scoring consistency with the full item bank, test information function (TIF), person standard errors (SEs), and content range of the DA-CAT-1 to the new CAT (DA-CAT-2) with the pretest items by real data simulations. 
RESULTS: We retained 29 of the 41 pretest items. Scores from the DA-CAT-2 were more consistent with the full item bank than were scores from the DA-CAT-1 (ICC = 0.96 versus 0.90). TIF and person SEs were improved for persons with higher levels of DA functioning, and ceiling effects were reduced from 16.1% to 6.1%. CONCLUSIONS: Item response theory and online calibration methods were valuable in improving the DA-CAT. %B Quality of Life Research %7 2009/03/17 %V 18 %P 461-71 %8 May %@ 0962-9343 (Print)0962-9343 (Linking) %G eng %M 19288222 %0 Book Section %D 2009 %T Utilizing the generalized likelihood ratio as a termination criterion %A Thompson, N. A. %X Computer-based testing can be used to classify examinees into mutually exclusive groups. Currently, the predominant psychometric algorithm for designing computerized classification tests (CCTs) is the sequential probability ratio test (SPRT; Reckase, 1983) based on item response theory (IRT). The SPRT has been shown to be more efficient than confidence intervals around θ estimates as a method for CCT delivery (Spray & Reckase, 1996; Rudner, 2002). More recently, it was demonstrated that the SPRT, which only uses fixed values, is less efficient than a generalized form which tests whether a given examinee's θ is below θ1 or above θ2 (Thompson, 2007). This formulation allows the indifference region to vary based on observed data. Moreover, this composite hypothesis formulation better represents the conceptual purpose of the test, which is to test whether θ is above or below the cutscore. The purpose of this study was to explore the specifications of the new generalized likelihood ratio (GLR; Huang, 2004). As with the SPRT, the efficiency of the procedure depends on the nominal error rates and the distance between θ1 and θ2 (Eggen, 1999). This study utilized a Monte Carlo approach, with 10,000 examinees simulated under each condition, to evaluate differences in efficiency and accuracy due to hypothesis structure, nominal error rate, and indifference region size. The GLR was always at least as efficient as the fixed-point SPRT while maintaining equivalent levels of accuracy. %C D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J International Journal of Web-Based Learning and Teaching Technologies %D 2008 %T CAT-MD: Computerized adaptive testing on mobile devices %A Triantafillou, E. %A Georgiadou, E. %A Economides, A. A. %B International Journal of Web-Based Learning and Teaching Technologies %V 3 %P 13-20 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2008 %T Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes %A Haley, S. M. %A Gandek, B. %A Siebens, H. %A Black-Schaffer, R. M. %A Sinclair, S. J. %A Tao, W. %A Coster, W. J. %A Ni, P. %A Jette, A. M. %K *Activities of Daily Living %K *Adaptation, Physiological %K *Computer Systems %K *Questionnaires %K Adult %K Aged %K Aged, 80 and over %K Chi-Square Distribution %K Factor Analysis, Statistical %K Female %K Humans %K Longitudinal Studies %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Patient Discharge %K Prospective Studies %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVES: To measure participation outcomes with a computerized adaptive test (CAT) and compare CAT and traditional fixed-length surveys in terms of score agreement, respondent burden, discriminant validity, and responsiveness.
DESIGN: Longitudinal, prospective cohort study of patients interviewed approximately 2 weeks after discharge from inpatient rehabilitation and 3 months later. SETTING: Follow-up interviews conducted in patient's home setting. PARTICIPANTS: Adults (N=94) with diagnoses of neurologic, orthopedic, or medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Participation domains of mobility, domestic life, and community, social, & civic life, measured using a CAT version of the Participation Measure for Postacute Care (PM-PAC-CAT) and a 53-item fixed-length survey (PM-PAC-53). RESULTS: The PM-PAC-CAT showed substantial agreement with PM-PAC-53 scores (intraclass correlation coefficient, model 3,1, .71-.81). On average, the PM-PAC-CAT was completed in 42% of the time and with only 48% of the items as compared with the PM-PAC-53. Both formats discriminated across functional severity groups. The PM-PAC-CAT had modest reductions in sensitivity and responsiveness to patient-reported change over a 3-month interval as compared with the PM-PAC-53. CONCLUSIONS: Although continued evaluation is warranted, accurate estimates of participation status and responsiveness to change for group-level analyses can be obtained from CAT administrations, with a sizeable reduction in respondent burden. %B Archives of Physical Medicine and Rehabilitation %7 2008/01/30 %V 89 %P 275-283 %8 Feb %@ 1532-821X (Electronic)0003-9993 (Linking) %G eng %M 18226651 %2 2666330 %0 Conference Paper %B Joint Meeting on Adolescent Treatment Effectiveness %D 2008 %T Developing a progressive approach to using the GAIN in order to reduce the duration and cost of assessment with the GAIN short screener, Quick and computer adaptive testing %A Dennis, M. L. %A Funk, R. %A Titus, J. %A Riley, B. B. %A Hosman, S. %A Kinne, S. %B Joint Meeting on Adolescent Treatment Effectiveness %C Washington D.C., USA %8 2008 %G eng %( 2008 %) ADDED 1 Aug 2008 %F 205795 %0 Journal Article %J Journal of Educational and Behavioral Statistics %D 2008 %T The D-optimality item selection criterion in the early stage of CAT: A study with the graded response model %A Passos, V. L. %A Berger, M. P. F. %A Tan, F. E. S. %K computerized adaptive testing %K D optimality %K item selection %X During the early stage of computerized adaptive testing (CAT), item selection criteria based on Fisher’s information often produce less stable latent trait estimates than the Kullback-Leibler global information criterion. Robustness against early stage instability has been reported for the D-optimality criterion in a polytomous CAT with the Nominal Response Model and is shown herein to be reproducible for the Graded Response Model. For comparative purposes, the A-optimality and the global information criteria are also applied. Their item selection is investigated as a function of test progression and item bank composition. The results indicate how the selection of specific item parameters underlies the criteria performances evaluated via accuracy and precision of estimation. In addition, the criteria item exposure rates are compared, without the use of any exposure controlling measure. On the account of stability, precision, accuracy, numerical simplicity, and less evidently, item exposure rate, the D-optimality criterion can be recommended for CAT. 
%B Journal of Educational and Behavioral Statistics %V 33 %P 88-110 %G eng %0 Journal Article %J BMC Musculoskeletal Disorders %D 2008 %T An initial application of computerized adaptive testing (CAT) for measuring disability in patients with low back pain %A Elhan, A. H. %A Oztuna, D. %A Kutlay, S. %A Kucukdeveci, A. A. %A Tennant, A. %X BACKGROUND: Recent approaches to outcome measurement involving Computerized Adaptive Testing (CAT) offer an approach for measuring disability in low back pain (LBP) in a way that can reduce the burden upon patient and professional. The aim of this study was to explore the potential of CAT in LBP for measuring disability as defined in the International Classification of Functioning, Disability and Health (ICF), which includes impairments, activity limitation, and participation restriction. METHODS: 266 patients with low back pain answered questions from a range of widely used questionnaires. An exploratory factor analysis (EFA) was used to identify disability dimensions, which were then subjected to Rasch analysis. Reliability was tested by internal consistency and the person separation index (PSI). Discriminant validity of disability levels was evaluated by the Spearman correlation coefficient (r), the intraclass correlation coefficient [ICC(2,1)], and the Bland-Altman approach. A CAT was developed for each dimension, and the results checked against simulated and real applications from a further 133 patients. RESULTS: Factor analytic techniques identified two dimensions, named "body functions" and "activity-participation". After deletion of some items for failure to fit the Rasch model, the remaining items were mostly free of differential item functioning (DIF) for age and gender. Reliability exceeded 0.90 for both dimensions. The disability levels generated using all items and those obtained from the real CAT application were highly correlated (i.e., >0.97 for both dimensions). On average, 19 and 14 items were needed to estimate precise disability levels using the initial CAT for the first and second dimensions, respectively. However, a marginal increase in the standard error of the estimate across successive iterations substantially reduced the number of items required to make an estimate. CONCLUSIONS: Using a combined approach of EFA and Rasch analysis, this study has shown that it is possible to calibrate items onto a single metric in a way that can be used to provide the basis of a CAT application. Thus there is an opportunity to obtain a wide variety of information to evaluate the biopsychosocial model in its more complex forms, without necessarily increasing the burden of information collection for patients. %B BMC Musculoskeletal Disorders %7 2008/12/20 %V 9 %P 166 %8 Dec 18 %@ 1471-2474 (Electronic) %G eng %M 19094219 %0 Book %D 2007 %T A comparison of two methods of polytomous computerized classification testing for multiple cutscores %A Thompson, N. A. %C Unpublished doctoral dissertation, University of Minnesota %G eng %0 Book Section %D 2007 %T Computerized classification testing with composite hypotheses %A Thompson, N. A. %A Ro, S. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Conference Proceedings %B GMAC Conference on Computerized Adaptive Testing %D 2007 %T Computerized classification testing with composite hypotheses %A Thompson, N. A. %A Ro, S. %K computerized adaptive testing %B GMAC Conference on Computerized Adaptive Testing %I Graduate Management Admissions Council %C St. Paul, MN %G eng
%0 Conference Paper %B Paper presented at the international meeting of the Psychometric Society %D 2007 %T Cutscore location and classification accuracy in computerized classification testing %A Ro, S. %A Thompson, N. A. %B Paper presented at the international meeting of the Psychometric Society %C Tokyo, Japan %G eng %0 Journal Article %J Computers & Education %D 2007 %T The design and evaluation of a computerized adaptive test on mobile devices %A Triantafillou, E. %A Georgiadou, E. %A Economides, A. A. %B Computers & Education %V 49 %0 Journal Article %J Quality of Life Research %D 2007 %T Developing tailored instruments: item banking and computerized adaptive assessment %A Bjorner, J. B. %A Chang, C-H. %A Thissen, D. %A Reeve, B. B. %K *Health Status %K *Health Status Indicators %K *Mental Health %K *Outcome Assessment (Health Care) %K *Quality of Life %K *Questionnaires %K *Software %K Algorithms %K Factor Analysis, Statistical %K Humans %K Models, Statistical %K Psychometrics %X Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent, thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. %B Quality of Life Research %7 2007/05/29 %V 16 %P 95-108 %@ 0962-9343 (Print) %G eng %M 17530450 %0 Book Section %D 2007 %T Exploring potential designs for multi-form structure computerized adaptive tests with uniform item exposure %A Edwards, M. C. %A Thissen, D. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Book Section %D 2007 %T Investigating CAT designs to achieve comparability with a paper test %A Thompson, T. %A Way, W. D. %C D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. %G eng %0 Journal Article %J Quality of Life Research %D 2007 %T IRT health outcomes data analysis project: an overview and summary %A Cook, K. F. %A Teal, C. R. %A Bjorner, J. B. %A Cella, D. %A Chang, C-H. %A Crane, P. K. %A Gibbons, L. E. %A Hays, R. D. %A McHorney, C. A. %A Ocepek-Welikson, K. %A Raczek, A. E. %A Teresi, J. A. %A Reeve, B. B.
%K *Data Interpretation, Statistical %K *Health Status %K *Quality of Life %K *Questionnaires %K *Software %K Female %K HIV Infections/psychology %K Humans %K Male %K Neoplasms/psychology %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Stress, Psychological %X BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. MEASURES: Items from 4 HRQOL instruments were evaluated: the Cancer Rehabilitation Evaluation System-Short Form, the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, the Functional Assessment of Cancer Therapy, and the Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance of construct definition in the measurement of HRQOL. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. %B Quality of Life Research %7 2007/03/14 %V 16 %P 121-132 %@ 0962-9343 (Print) %G eng %M 17351824 %0 Conference Paper %B Paper presented at the Conference on High Stakes Testing %D 2007 %T Item selection in computerized classification testing %A Thompson, N. A. %B Paper presented at the Conference on High Stakes Testing %C University of Nebraska %G eng %0 Journal Article %J Quality of Life Research %D 2007 %T Methodological issues for building item banks and computerized adaptive scales %A Thissen, D. %A Reeve, B. B. %A Bjorner, J. B. %A Chang, C-H. %X This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research. %B Quality of Life Research %V 16 %P 109-119 %@ 0962-9343 (Print); 1573-2649 (Electronic) %G eng %0 Journal Article %J Practical Assessment, Research and Evaluation %D 2007 %T A practitioner's guide to variable-length computerized classification testing %A Thompson, N. A.
%B Practical Assessment, Research and Evaluation %V 12 %P 1-13 %N 1 %K CAT %K classification %K computer adaptive testing %K computerized adaptive testing %K Computerized classification testing %X Variable-length computerized classification tests (CCTs; Lin & Spray, 2000; Thompson, 2006) are a powerful and efficient approach to testing for the purpose of classifying examinees into groups. CCTs are designed by the specification of at least five technical components: psychometric model, calibrated item bank, starting point, item selection algorithm, and termination criterion. Several options exist for each of these CCT components, creating a myriad of possible designs. Confusion among designs is exacerbated by the lack of a standardized nomenclature. This article outlines the components of a CCT, common options for each component, and the interaction of options for different components, so that practitioners may more efficiently design CCTs. It also offers a suggestion of nomenclature. %G eng %0 Journal Article %J Physical Therapy %D 2007 %T Prospective evaluation of the AM-PAC-CAT in outpatient rehabilitation settings %A Jette, A. %A Haley, S. %A Tao, W. %A Ni, P. %A Moed, R. %A Meyers, D. %A Zurek, M. %B Physical Therapy %V 87 %P 385-398 %G eng %0 Journal Article %J Medical Care %D 2007 %T Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) %A Reeve, B. B. %A Hays, R. D. %A Bjorner, J. B. %A Cook, K. F. %A Crane, P. K. %A Teresi, J. A. %A Thissen, D. %A Revicki, D. A. %A Weiss, D. J. %A Hambleton, R. K. %A Liu, H. %A Gershon, R. C. %A Reise, S. P. %A Lai, J. S. %A Cella, D. %K *Health Status %K *Information Systems %K *Quality of Life %K *Self Disclosure %K Adolescent %K Adult %K Aged %K Calibration %K Databases as Topic %K Evaluation Studies as Topic %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Questionnaires/standards %K United States %X BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment.
%B Medical Care %7 2007/04/20 %V 45 %P S22-31 %8 May %@ 0025-7079 (Print) %G eng %M 17443115 %0 Journal Article %J Journal of Technology, Learning, and Assessment %D 2007 %T A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005 %A Georgiadou, E. %A Triantafillou, E. %A Economides, A. A. %X Since researchers acknowledged the advantages of computerized adaptive testing (CAT) over traditional linear test administration, the issue of item exposure control has received increased attention. Due to CAT's underlying philosophy, particular items in the item pool may be presented too often and become overexposed, while other items are rarely selected by the CAT algorithm and thus become underexposed. Several item exposure control strategies have been presented in the literature aiming to prevent overexposure of some items and to increase the use rate of rarely or never selected items. This paper reviews such strategies that appeared in the relevant literature from 1983 to 2005. The focus of this paper is on studies that have been conducted in order to evaluate the effectiveness of item exposure control strategies for dichotomous scoring, polytomous scoring, and testlet-based CAT systems. In addition, the paper discusses the strengths and weaknesses of each strategy group using examples from simulation studies. No new research is presented; rather, a compendium of models is reviewed with the overall objective of providing researchers in this field, especially newcomers, with a broad view of item exposure control strategies. %B Journal of Technology, Learning, and Assessment %V 5 %N 8 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2007 %T Test design optimization in CAT early stage with the nominal response model %A Passos, V. L. %A Berger, M. P. F. %A Tan, F. E. %K computerized adaptive testing %K nominal response model %K robust performance %K test design optimization %X The early stage of computerized adaptive testing (CAT) refers to the phase of trait estimation during the administration of only a few items. This phase can be characterized by bias and instability of estimation. In this study, an item selection criterion is introduced in an attempt to lessen this instability: the D-optimality criterion. A polytomous unconstrained CAT simulation is carried out to evaluate this criterion's performance under different test premises. The simulation shows that the extent of early-stage instability depends primarily on the quality of the item pool information and its size, and secondarily on the item selection criteria. The efficiency of the D-optimality criterion is similar to the efficiency of other known item selection criteria. Yet, it often yields estimates that, at the beginning of CAT, display a more robust performance against instability. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Applied Psychological Measurement %I Sage Publications: US %V 31 %P 213-232 %@ 0146-6216 (Print) %G eng %M 2007-06921-005 %0 Journal Article %J Acta Psychologica Sinica %D 2006 %T The comparison among item selection strategies of CAT with multiple-choice items %A Dai, H.-Q. %A Chen, D.-Z. %A Ding, S.-L. %A Deng, T.-P. %K CAT %K computerized adaptive testing %K graded response model %K item selection strategies %K multiple choice items %X The initial purpose of comparing item selection strategies for CAT was to increase the efficiency of tests.
As studies continued, however, it was found that increasing the efficiency of item bank usage was also an important goal of comparing item selection strategies. These two goals often conflict; the key is to find a strategy that accomplishes both. The item selection strategies for the graded response model in this study included: matching the average of the item difficulty orders to the ability; matching the median of the item difficulty orders to the ability; maximum information; a-stratified (average); and a-stratified (median). The evaluation indices used for comparison included: the bias of the ability estimates relative to the true values; the standard error of the ability estimates; the average number of items administered to examinees; the standard deviation of the frequency with which items were selected; and a weighted sum of these indices. Using the Monte Carlo simulation method, data were generated and 20 replications were run under conditions in which the item difficulty parameters followed either a normal or a uniform distribution. The results indicated that, whether the difficulty parameters followed a normal or a uniform distribution, every type of item selection strategy designed in this research had its strong and weak points. In the overall evaluation, under the condition that items were stratified appropriately, a-stratified (median) (ASM) had the best effect. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Acta Psychologica Sinica %I Science Press: China %V 38 %P 778-783 %@ 0439-755X (Print) %G eng %M 2006-20552-017 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2006 %T Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: I. Activity outcomes %A Haley, S. M. %A Siebens, H. %A Coster, W. J. %A Tao, W. %A Black-Schaffer, R. M. %A Gandek, B. %A Sinclair, S. J. %A Ni, P. %K *Activities of Daily Living %K *Adaptation, Physiological %K *Computer Systems %K *Questionnaires %K Adult %K Aged %K Aged, 80 and over %K Chi-Square Distribution %K Factor Analysis, Statistical %K Female %K Humans %K Longitudinal Studies %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Patient Discharge %K Prospective Studies %K Rehabilitation/*standards %K Subacute Care/*standards %X OBJECTIVE: To examine score agreement, precision, validity, efficiency, and responsiveness of a computerized adaptive testing (CAT) version of the Activity Measure for Post-Acute Care (AM-PAC-CAT) in a prospective, 3-month follow-up sample of inpatient rehabilitation patients recently discharged home. DESIGN: Longitudinal, prospective 1-group cohort study of patients followed approximately 2 weeks after hospital discharge and then 3 months after the initial home visit. SETTING: Follow-up visits conducted in patients' home setting. PARTICIPANTS: Ninety-four adults who were recently discharged from inpatient rehabilitation, with diagnoses of neurologic, orthopedic, and medically complex conditions. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Summary scores from the AM-PAC-CAT, covering 3 activity domains (movement and physical, personal care and instrumental, and applied cognition), were compared with scores from a traditional fixed-length version of the AM-PAC with 66 items (AM-PAC-66). RESULTS: AM-PAC-CAT scores were in good agreement (intraclass correlation coefficient model 3,1 range, .77-.86) with scores from the AM-PAC-66.
On average, the CAT programs required 43% of the time and 33% of the items compared with the AM-PAC-66. Both formats discriminated across functional severity groups. The standardized response mean (SRM) was greater for the movement and physical fixed form than for the CAT; the effect size and SRM of the 2 other AM-PAC domains showed similar sensitivity between CAT and fixed formats. Using patients' own report as an anchor-based measure of change, the CAT and fixed-length formats were comparable in responsiveness to patient-reported change over a 3-month interval. CONCLUSIONS: Accurate estimates of group-level change in functional activity can be obtained from CAT administrations, with a considerable reduction in administration time. %B Archives of Physical Medicine and Rehabilitation %7 2006/08/01 %V 87 %P 1033-42 %8 Aug %@ 0003-9993 (Print) %G eng %M 16876547 %0 Journal Article %J British Journal of Educational Technology %D 2006 %T Evaluation parameters for computer adaptive testing %A Georgiadou, E. %A Triantafillou, E. %A Economides, A. A. %B British Journal of Educational Technology %V 37 %P 261-278 %G eng %N 2 %0 Journal Article %J Medical Care %D 2006 %T Overview of quantitative measurement methods. Equivalence, invariance, and differential item functioning in health applications %A Teresi, J. A. %K *Cross-Cultural Comparison %K Data Interpretation, Statistical %K Factor Analysis, Statistical %K Guidelines as Topic %K Humans %K Models, Statistical %K Psychometrics/*methods %K Statistics as Topic/*methods %K Statistics, Nonparametric %X BACKGROUND: Reviewed in this article are issues relating to the study of invariance and differential item functioning (DIF). The aim of factor analyses and DIF, in the context of invariance testing, is the examination of group differences in item response conditional on an estimate of disability. Discussed are parameters and statistics that are not invariant and cannot be compared validly in cross-cultural studies with varying distributions of disability, in contrast to those that can be compared (if the model assumptions are met) because they are produced by models such as linear and nonlinear regression. OBJECTIVES: The purpose of this overview is to provide an integrated approach to the quantitative methods used in this special issue to examine measurement equivalence. The methods include classical test theory (CTT), factor analytic, and parametric and nonparametric approaches to DIF detection. Also included in the quantitative section is a discussion of item banking and computerized adaptive testing (CAT). METHODS: Factorial invariance and the articles discussing this topic are introduced. A brief overview of the DIF methods presented in the quantitative section of the special issue is provided, together with a discussion of ways in which DIF analyses and examination of invariance using factor models may be complementary. CONCLUSIONS: Although factor analytic and DIF detection methods share features, they provide unique information and can be viewed as complementary in informing about measurement equivalence. %B Medical Care %7 2006/10/25 %V 44 %P S39-49 %8 Nov %@ 0025-7079 (Print); 0025-7079 (Linking) %G eng %M 17060834 %0 Journal Article %J Journal of Clinical Epidemiology %D 2006 %T Simulated computerized adaptive test for patients with shoulder impairments was efficient and produced valid measures of function %A Hart, D. L. %A Cook, K. F. %A Mioduski, J. E. %A Teal, C. R. %A Crane, P. K.
%K computerized adaptive testing %K Flexilevel Scale of Shoulder Function %K Item Response Theory %K Rehabilitation %X BACKGROUND AND OBJECTIVE: To test unidimensionality and local independence of a set of shoulder functional status (SFS) items, develop a computerized adaptive test (CAT) of the items using a rating scale item response theory model (RSM), and compare discriminant validity of measures generated using all items (theta(IRT)) and measures generated using the simulated CAT (theta(CAT)). STUDY DESIGN AND SETTING: We performed a secondary analysis of data collected prospectively during rehabilitation of 400 patients with shoulder impairments who completed 60 SFS items. RESULTS: Factor analytic techniques supported that the 42 SFS items formed a unidimensional scale and were locally independent. Except for five items, which were deleted, the RSM fit the data well. The remaining 37 SFS items were used to generate the CAT. On average, 6 items were needed to estimate precise measures of function using the SFS CAT, compared with all 37 SFS items. The theta(IRT) and theta(CAT) measures were highly correlated (r = .96) and resulted in similar classifications of patients. CONCLUSION: The simulated SFS CAT was efficient and produced precise, clinically relevant measures of functional status with good discriminating ability.
%B Journal of Clinical Epidemiology %V 59 %P 290-298 %G English %N 3 %0 Book %D 2005 %T Adaptive selection of personality items to inform a neural network predicting job performance %A Thissen-Roe, A. %C Unpublished doctoral dissertation, University of Washington %G eng %0 Journal Article %J Journal of Computer Assisted Learning %D 2005 %T A computer-assisted test design and diagnosis system for use by classroom teachers %A He, Q. %A Tymms, P. %K Computer Assisted Testing %K Computer Software %K Diagnosis %K Educational Measurement %K Teachers %X Computer-assisted assessment (CAA) has become increasingly important in education in recent years. A variety of computer software systems have been developed to help assess the performance of students at various levels. However, such systems are primarily designed to provide objective assessment of students and analysis of test items, and focus has been mainly placed on higher and further education. Although there are commercial professional systems available for use by primary and secondary educational institutions, such systems are generally expensive and require skilled expertise to operate.
In view of the rapid progress made in the use of computer-based assessment for primary and secondary students by education authorities in the UK and elsewhere, there is a need to develop systems which are economical and easy to use and can provide the necessary information that can help teachers improve students' performance. This paper presents the development of a software system that provides a range of functions including generating items and building item banks, designing tests, conducting tests on computers, and analysing test results. Specifically, the system can generate information on the performance of students and test items that can be easily used to identify curriculum areas where students are underperforming. A case study based on data collected from five secondary schools in Hong Kong involved in the Curriculum, Evaluation and Management Centre's Middle Years Information System Project, Durham University, UK, has been undertaken to demonstrate the use of the system for diagnostic and performance analysis. (PsycINFO Database Record (c) 2006 APA) (journal abstract) %B Journal of Computer Assisted Learning %V 21 %P 419-429 %G eng %0 Journal Article %J Evaluation and the Health Professions %D 2005 %T Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer %A Eton, D. T. %A Lai, J. S. %A Cella, D. %A Reeve, B. B. %A Talcott, J. A. %A Clark, J. A. %A McPherson, C. P. %A Litwin, M. S. %A Moinpour, C. M. %K *Quality of Life %K *Questionnaires %K Adult %K Aged %K Data Collection/methods %K Humans %K Intestine, Large/*physiopathology %K Male %K Middle Aged %K Prostatic Neoplasms/*physiopathology %K Psychometrics %K Research Support, Non-U.S. Gov't %K Statistics, Nonparametric %X Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) and then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. %B Evaluation and the Health Professions %V 28 %P 142-59 %G eng %M 15851770 %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. %A Strong, D. R. %A Read, J. P. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2004 %T Pre-equating: a simulation study based on a large scale assessment model %A Taherbhai, H. M. %A Young, M. J. %K *Databases %K *Models, Theoretical %K Calibration %K Human %K Psychometrics %K Reference Values %K Reproducibility of Results %X Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so short that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of the calibrations and the errors associated with them vis-à-vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable alternative to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as full-population) in populating the item bank.
%B Journal of Applied Measurement %V 5 %P 301-18 %G eng %M 15243175 %0 Journal Article %J International Journal of Artificial Intelligence in Education %D 2004 %T Siette: a web-based tool for adaptive testing %A Conejo, R. %A Guzmán, E. %A Millán, E. %A Trella, M. %A Pérez-de-la-Cruz, J. L. %A Ríos, A. %K computerized adaptive testing %B International Journal of Artificial Intelligence in Education %V 14 %P 29-61 %G eng %0 Journal Article %J Applied Psychological Measurement %D 2003 %T A Bayesian method for the detection of item preknowledge in computerized adaptive testing %A McLeod, L. D. %A Lewis, C. %A Thissen, D. %K Adaptive Testing %K Cheating %K Computer Assisted Testing %K computerized adaptive testing %K Individual Differences %K Item Analysis (Statistical) %K Item Response Theory %K Mathematical Modeling %X With the increased use of continuous testing in computerized adaptive testing, new concerns about test security have evolved, such as how to ensure that items in an item pool are safeguarded from theft. In this article, procedures to detect test takers using item preknowledge are explored. When test takers use item preknowledge, their item responses deviate from the underlying item response theory (IRT) model, and estimated abilities may be inflated. This deviation may be detected through the use of person-fit indices. A Bayesian posterior log odds ratio index is proposed for detecting the use of item preknowledge. In this approach to person fit, the estimated probability that each test taker has preknowledge of items is updated after each item response. These probabilities are based on the IRT parameters, a model specifying the probability that each item has been memorized, and the test taker's item responses. Simulations based on an operational computerized adaptive test (CAT) pool are used to demonstrate the use of the odds ratio index. (PsycINFO Database Record (c) 2005 APA) %B Applied Psychological Measurement %V 27 %N 2 %P 121-137 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2003 %T Evaluating stability of online item calibrations under varying conditions %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the Annual Meeting of the American Educational Research Association %D 2003 %T The evaluation of exposure control procedures for an operational CAT %A French, B. F. %A Thompson, T. T. %B Paper presented at the Annual Meeting of the American Educational Research Association %C Chicago IL %G eng %0 Journal Article %J Applied Psychological Measurement %D 2002 %T Application of an empirical Bayes enhancement of Mantel-Haenszel differential item functioning analysis to a computerized adaptive test %A Zwick, R. %A Thayer, D. T. %B Applied Psychological Measurement %V 26 %P 57-76 %G eng %0 Journal Article %J Dissertation Abstracts International: Section B: the Sciences & Engineering %D 2002 %T Computer adaptive testing: The impact of test characteristics on perceived performance and test takers' reactions %A Tonidandel, S.
%K computerized adaptive testing %X This study examined the relationship between characteristics of adaptive testing and test takers' subsequent reactions to the test. Participants took a computer adaptive test in which two features, the difficulty of the initial item and the difficulty of subsequent items, were manipulated. These two features of adaptive testing determined the number of items answered correctly by examinees and their subsequent reactions to the test. The data show that the relationship between test characteristics and reactions was fully mediated by perceived performance on the test. In addition, the impact of feedback on reactions to adaptive testing was also evaluated. In general, feedback that was consistent with perceptions of performance had a positive impact on reactions to the test. Implications for adaptive test designs that maximize positive test-taker reactions are discussed. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Dissertation Abstracts International: Section B: the Sciences & Engineering %V 62 %P 3410 %G eng %0 Journal Article %J Journal of Applied Psychology %D 2002 %T Computer-adaptive testing: The impact of test characteristics on perceived performance and test takers’ reactions %A Tonidandel, S. %A Quiñones, M. A. %A Adams, A. A. %B Journal of Applied Psychology %V 87 %P 320-332 %0 Conference Paper %B Paper presented at the conference “Advances in Health Outcomes Measurement” %D 2002 %T Developing tailored instruments: Item banking and computerized adaptive assessment %A Thissen, D. %B Paper presented at the conference “Advances in Health Outcomes Measurement” %C Bethesda, Maryland, June 23-25 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2002 %T Employing new ideas in CAT to a simulated reading test %A Thompson, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans LA %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2002 %T An investigation of procedures for estimating error indexes in proficiency estimation in CAT %A Shyu, C.-Y. %A Fan, M. %A Thompson, T. %A Hsu, Y. %B Paper presented at the annual meeting of the American Educational Research Association %C New Orleans LA %G eng %0 Conference Paper %B Presentation to the Annual Meeting of the Society for the Scientific Study of Reading %D 2002 %T Mapping the Development of Pre-reading Skills with STAR Early Literacy %A McBride, J. R. %A Tardrew, S. P. %B Presentation to the Annual Meeting of the Society for the Scientific Study of Reading %C Chicago %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T Effects of changes in the examinees’ ability distribution on the exposure control methods in CAT %A Chang, S-W. %A Twu, B.-Y. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %0 Journal Article %J Journal of Personality Assessment %D 2001 %T Evaluation of an MMPI-A short form: Implications for adaptive testing %A Archer, R. P. %A Tirrell, C. A. %A Elkins, D. E. %K Adaptive Testing %K Mean %K Minnesota Multiphasic Personality Inventory %K Psychometrics %K Statistical Correlation %K Statistical Samples %K Test Forms %X Reports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N.
Butcher et al., 1992) short form based on administration of the first 150 items of the test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. (PsycINFO Database Record (c) 2005 APA) %B Journal of Personality Assessment %V 76 %P 76-89 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 2001 %T An investigation of procedures for estimating error indexes in proficiency estimation in a realistic second-order equitable CAT environment %A Shyu, C.-Y. %A Fan, M. %A Thompson, T. %A Hsu, Y. %B Paper presented at the annual meeting of the American Educational Research Association %C Seattle WA %G eng %0 Book Section %B Test scoring %D 2001 %T Item response theory applied to combinations of multiple-choice and constructed-response items--approximation methods for scale scores %A Thissen, D. %A Nelson, L. A. %A Swygert, K. A. %K Adaptive Testing %K Item Response Theory %K Multiple Choice (Testing Method) %K Scoring (Testing) %K Statistical Estimation %K Statistical Weighting %K Test Items %K Test Scores %X (From the chapter) The authors develop approximate methods that replace the scoring tables with weighted linear combinations of the component scores. Topics discussed include: a linear approximation for the extension to combinations of scores; the generalization to two or more scores; potential applications of linear approximations to item response theory in computerized adaptive tests; and evaluation of the pattern-of-summed-scores, and Gaussian approximation, estimates of proficiency. (PsycINFO Database Record (c) 2005 APA) %B Test scoring %I Lawrence Erlbaum Associates %C Mahwah, N.J. USA %P 289-315 %G eng %& 8 %0 Journal Article %J Dissertation Abstracts International Section A: Humanities & Social Sciences %D 2001 %T Multidimensional adaptive testing using weighted likelihood estimation %A Tseng, F.-L. %K computerized adaptive testing %X This study extended Warm's (1989) weighted likelihood estimation (WLE) to a multidimensional computerized adaptive test (MCAT) setting. WLE was compared with maximum likelihood estimation (MLE), expected a posteriori (EAP), and maximum a posteriori (MAP) estimation using a three-dimensional 3PL IRT model under a variety of computerized adaptive testing conditions. The dependent variables included bias, standard error of ability estimates (SE), root mean square error (RMSE), and test information. The independent variables were ability estimation methods, intercorrelation levels between dimensions, multidimensional structures, and ability combinations. Simulation results were presented in terms of descriptive statistics, such as figures and tables.
In addition, inferential procedures were used to analyze bias by conceptualizing this Monte Carlo study as a statistical sampling experiment. The results of this study indicate that WLE and the other three estimation methods yield significantly more accurate ability estimates under an approximate simple test structure with one dominant dimension and several secondary dimensions. All four estimation methods, especially WLE, yield very large SEs when a structure with three equally dominant dimensions is employed. Consistent with previous findings based on unidimensional IRT models, MLE and WLE are less biased at the extremes of the ability scale; MLE and WLE yield larger SEs than the Bayesian methods; test information-based SEs underestimate actual SEs for both MLE and WLE in MCAT situations, especially at shorter test lengths, similar to the findings of Warm (1989) in the unidimensional case; and WLE reduced the bias of MLE under the approximate simple structure. The results from the MCAT simulations did show some advantages of WLE in reducing the bias of MLE under the approximate simple structure with a fixed test length of 50 items, which was consistent with previous research findings based on different unidimensional models. It is clear from the current results that all four methods perform very poorly when multidimensional structures with multiple dominant factors are employed. More research is needed to investigate systematically how different multidimensional structures affect the accuracy and reliability of ability estimation. Based on the simulation results in this study, no significant effect of the intercorrelation between dimensions on ability estimation was found. (PsycINFO Database Record (c) 2003 APA, all rights reserved). %B Dissertation Abstracts International Section A: Humanities & Social Sciences %V 61 %P 4746 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2001 %T Multidimensional adaptive testing using weighted likelihood estimation: A comparison of estimation methods %A Tseng, F.-L. %A Hsu, T.-C. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Seattle WA %G eng %0 Journal Article %J Psicothema %D 2001 %T Pasado, presente y futuro de los test adaptativos informatizados: Entrevista con Isaac I. Béjar [Past, present, and future of computerized adaptive testing: Interview with Isaac I. Béjar] %A Tejada, R. %A Antonio, J. %K computerized adaptive testing %X
In this paper the results of an interview with Isaac I. Bejar are presented. Dr. Bejar is currently Principal Research Scientist and Director of the Center for Assessment Design and Scoring in the Research Division at Educational Testing Service (Princeton, NJ, U.S.A.). The aim of this interview was to review the past, present, and future of computerized adaptive tests. The beginnings of adaptive tests and computerized adaptive tests are recounted, along with the latest advances developed at the Educational Testing Service for this type of test (generative response models, isomorphs, automated scoring of essay items, etc.). The interview closes with a vision of the future of computerized adaptive tests and their use in Spain. %B Psicothema %V 13 %P 685-690 %@ 0214-9915 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 2000 %T Applying specific information item selection to a passage-based test %A Thompson, T. D. %A Davey, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New Orleans, LA, April %G eng %0 Book %D 2000 %T Computerized adaptive testing: A primer (2nd edition) %A Wainer, H. %A Dorans, N. %A Eignor, D. R. %A Flaugher, R. %A Green, B. F. %A Mislevy, R. %A Steinberg, L. %A Thissen, D. %C Hillsdale NJ: Lawrence Erlbaum Associates %G eng %0 Journal Article %J Journal of Applied Measurement %D 2000 %T The impact of receiving the same items on consecutive computer adaptive test administrations %A O'Neill, T. %A Lunz, M. E. %A Thiede, K. %X Addresses item exposure in a Computerized Adaptive Test (CAT) when the item selection algorithm is permitted to present examinees with questions that they have already been asked in a previous test administration. The data were from a national certification exam in medical technology. The responses of 178 repeat examinees were compared. The results indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate person ability provides substantial protection from score contamination. The implications for constraints that prohibit examinees from seeing an item twice are discussed. (PsycINFO Database Record (c) 2002 APA, all rights reserved). %B Journal of Applied Measurement %V 1 %P 131-151 %G eng %0 Journal Article %J International Journal of Selection and Assessment %D 2000 %T Psychological reactions to adaptive testing %A Tonidandel, S. %A Quiñones, M. A. %B International Journal of Selection and Assessment %V 8 %P 7-15 %0 Book Section %D 2000 %T Using Bayesian Networks in Computerized Adaptive Tests %A Millan, E. %A Trella, M. %A Perez-de-la-Cruz, J.-L. %A Conejo, R. %C M. Ortega and J. Bravo (Eds.), Computers and Education in the 21st Century. Kluwer, pp. 217-228. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Automated flawed item detection and graphical item use in on-line calibration of CAT-ASVAB %A Krass, I. A. %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Constructing adaptive tests to parallel conventional programs %A Fan, M. %A Thompson, T.
%A Davey, T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal %G eng %0 Book Section %D 1999 %T The development of a computerized adaptive selection system for computer programmers in a financial services company %A Zickar, M. J. %A Overton, R. C. %A Taylor, L. R. %A Harms, H. J. %C F. Drasgow and J. B. Olsen (Eds.), Innovations in computerized assessment (pp. 7-33). Mahwah NJ: Erlbaum. %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Implications from information functions and standard errors for determining preferred normed scales for CAT and P&P ASVAB %A Nicewander, W. A. %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1999 %T Pretesting alongside an operational CAT %A Davey, T. %A Pommerich, M. %A Thompson, D. T. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Montreal, Canada %G eng %0 Journal Article %J Applied Psychological Measurement %D 1999 %T Some reliability estimates for computerized adaptive tests %A Nicewander, W. A. %A Thomasson, G. L. %X Three reliability estimates are derived for the Bayes modal estimate (BME) and the maximum likelihood estimate (MLE) of θ in computerized adaptive tests (CAT). Each reliability estimate is a function of test information. Two of the estimates are shown to be upper bounds to true reliability. The three reliability estimates and the true reliabilities of both MLE and BME were computed for seven simulated CATs. Results showed that the true reliabilities for MLE and BME were nearly identical in all seven tests. The three reliability estimates never differed from the true reliabilities by more than .02 (.01 in most cases). A simple implementation of one reliability estimate was found to accurately estimate reliability in CATs. %B Applied Psychological Measurement %V 23 %P 239-47 %G eng %M EJ596308 %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T CAT item calibration %A Hsu, Y. %A Thompson, T. D. %A Chen, W-H. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T CAT item exposure control: New evaluation tools, alternate methods, and integration into a total CAT program %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego, CA %G eng %0 Generic %D 1998 %T A comparative study of item exposure control methods in computerized adaptive testing %A Chang, S-W. %A Twu, B.-Y. %C Research Report Series 98-3, Iowa City: American College Testing. %G eng %0 Conference Paper %B Paper presented at the meeting of the American Educational Research Association %D 1998 %T A comparison of two methods of controlling item exposure in computerized adaptive testing %A Tang, L. %A Jiang, H. %A Chang, H.-H. %B Paper presented at the meeting of the American Educational Research Association %C San Diego CA
%G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1998 %T Constructing adaptive tests to parallel conventional programs %A Thompson, T. %A Davey, T. %A Nering, M. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C San Diego %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Constructing passage-based tests that parallel conventional programs %A Thompson, T. %A Davey, T. %A Nering, M. L. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T A hybrid method for controlling item exposure in computerized adaptive testing %A Nering, M. L. %A Davey, T. %A Thompson, T. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Generic %D 1998 %T The relationship between computer familiarity and performance on computer-based TOEFL test tasks (Research Report 98-08) %A Taylor, C. %A Jamieson, J. %A Eignor, D. R. %A Kirsch, I. %C Princeton NJ: Educational Testing Service %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Some item response theory to provide scale scores based on linear combinations of testlet scores for computerized adaptive tests %A Thissen, D. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1998 %T Some reliability estimators for computerized adaptive tests %A Nicewander, W. A. %A Thomasson, G. L. %B Paper presented at the annual meeting of the Psychometric Society %C Urbana, IL %G eng %0 Journal Article %J Personnel Psychology %D 1997 %T Adapting to adaptive testing %A Overton, R. C. %A Harms, H. J. %A Taylor, L. R. %A Zickar, M. J. %B Personnel Psychology %V 50 %P 171-185 %G eng %0 Journal Article %J Journal of Educational Measurement %D 1997 %T Diagnostic adaptive testing: Effects of remedial instruction as empirical validation %A Tatsuoka, K. K. %A Tatsuoka, M. M. %B Journal of Educational Measurement %V 34 %P 3-20 %G eng %0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1997 %T The goal of equity within and between computerized adaptive tests and paper and pencil forms %A Thomasson, G. L. %B Paper presented at the annual meeting of the National Council on Measurement in Education %C Chicago IL %G eng %0 Conference Paper %B Paper presented at the Psychometric Society meeting %D 1997 %T Identifying similar item content clusters on multiple test forms %A Reckase, M. D. %A Thompson, T. D. %A Nering, M. %B Paper presented at the Psychometric Society meeting %C Gatlinburg, TN, June %G eng %0 Conference Paper %B In T. Miller (Chair), High-dimensional simulation of item response data for CAT research. Psychometric Society %D 1997 %T Realistic simulation procedures for item response data %A Davey, T. %A Nering, M. %A Thompson, T. %B In T.
%0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1997 %T Simulation of realistic ability vectors %A Nering, M. %A Thompson, T. D. %A Davey, T. %B Paper presented at the annual meeting of the Psychometric Society %C Gatlinburg, TN %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the National Council on Measurement in Education %D 1996 %T A comparison of the traditional maximum information method and the global information method in CAT item selection %A Tang, K. L. %K computerized adaptive testing %K item selection %B Paper presented at the annual meeting of the National Council on Measurement in Education %C New York, NY %G eng
%0 Conference Paper %B Paper presented at the annual meeting of the American Educational Research Association %D 1996 %T Constructing adaptive tests to parallel conventional programs %A Davey, T. %A Thomas, L. %B Paper presented at the annual meeting of the American Educational Research Association %C New York %G eng
%0 Journal Article %J Journal of Educational Measurement %D 1995 %T Effect of Rasch calibration on ability and DIF estimation in computer-adaptive tests %A Zwick, R. %A Thayer, D. T. %A Wingersky, M. %B Journal of Educational Measurement %V 32 %P 341-363
%0 Conference Paper %B Paper presented at the annual meeting of the Psychometric Society %D 1995 %T New item exposure control algorithms for computerized adaptive testing %A Thomasson, G. L. %B Paper presented at the annual meeting of the Psychometric Society %C Minneapolis, MN %G eng
%0 Generic %D 1995 %T Using simulation to select an adaptive testing strategy: An item bank evaluation program %A Hsu, T. C. %A Tseng, F. L. %C Unpublished manuscript, University of Pittsburgh %G eng
%0 Generic %D 1994 %T DIF analysis for pretest items in computer-adaptive testing (Educational Testing Service Research Report No. RR-94-33) %A Zwick, R. %A Thayer, D. T. %A Wingersky, M. %C Princeton NJ: Educational Testing Service. %G eng
%0 Journal Article %J Applied Psychological Measurement %D 1994 %T A Simulation Study of Methods for Assessing Differential Item Functioning in Computerized Adaptive Tests %A Zwick, R. %A Thayer, D. T. %A Wingersky, M. %B Applied Psychological Measurement %V 18 %P 121-140 %G English
%0 Generic %D 1993 %T A simulation study of methods for assessing differential item functioning in computer-adaptive tests (Educational Testing Service Research Report No. RR-93-11) %A Zwick, R. %A Thayer, D. %A Wingersky, M. %C Princeton NJ: Educational Testing Service. %G eng
%0 Book %D 1992 %T A comparison of methods for adaptive estimation of a multidimensional trait %A Tam, S. S. %C Unpublished doctoral dissertation, Columbia University %G eng
%0 Journal Article %J Dissertation Abstracts International %D 1992 %T The development and evaluation of a system for computerized adaptive testing %A de la Torre Sanchez, R. %K computerized adaptive testing %B Dissertation Abstracts International %V 52 %P 4304 %G eng
%0 Generic %D 1991 %T Construction and validation of the SON-R 5-17, the Snijders-Oomen non-verbal intelligence test %A Laros, J. A. %A Tellegen, P. J. %C Groningen: Wolters-Noordhoff %G eng
%0 Journal Article %J Journal of Educational Measurement %D 1991 %T On the reliability of testlet-based tests %A Sireci, S. G. %A Wainer, H. %A Thissen, D. %B Journal of Educational Measurement %V 28 %P 237-247 %G eng
%0 Book %D 1990 %T Computerized adaptive testing: A primer %A Wainer, H. %A Dorans, N. J. %A Flaugher, R. %A Green, B. F. %A Mislevy, R. J. %A Steinberg, L. %A Thissen, D. %C Hillsdale NJ: Erlbaum
%0 Book Section %D 1990 %T Creating adaptive tests of musical ability with limited-size item pools %A Vispoel, W. T. %A Twing, J. S. %C D. Dalton (Ed.), ADCIS 32nd International Conference Proceedings (pp. 105-112). Columbus OH: Association for the Development of Computer-Based Instructional Systems. %G eng
%0 Book Section %D 1990 %T Future challenges %A Wainer, H. %A Dorans, N. J. %A Green, B. F. %A Mislevy, R. J. %A Steinberg, L. %A Thissen, D. %C H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 233-272). Hillsdale NJ: Erlbaum. %G eng
%0 Book Section %D 1990 %T Reliability and measurement precision %A Thissen, D. %C H. Wainer, N. J. Dorans, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, and D. Thissen (Eds.), Computerized adaptive testing: A primer (pp. 161-186). Hillsdale NJ: Erlbaum. %G eng
%0 Journal Article %J British Journal of Mathematical and Statistical Psychology %D 1990 %T Sequential item response models with an ordered response %A Tutz, G. %B British Journal of Mathematical and Statistical Psychology %V 43 %P 39-55
%0 Book Section %D 1990 %T Testing algorithms %A Thissen, D. %A Mislevy, R. J. %C H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 103-135). Hillsdale NJ: Erlbaum. %G eng
%0 Generic %D 1990 %T Utility of predicting starting abilities in sequential computer-based adaptive tests (Research Report 90-1) %A Green, B. F. %A Thomas, T. J. %C Baltimore MD: Johns Hopkins University, Department of Psychology %G eng
%0 Book Section %D 1990 %T Validity %A Steinberg, L. %A Thissen, D. %A Wainer, H. %C H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 187-231). Hillsdale NJ: Erlbaum. %G eng
%0 Generic %D 1989 %T Item-presentation controls for computerized adaptive testing: Content-balancing versus min-CAT (Research Report 89-1) %A Thomas, T. J. %A Green, B. F. %C Baltimore MD: Johns Hopkins University, Department of Psychology, Psychometric Laboratory %G eng
%0 Journal Article %J Journal of Educational Measurement %D 1989 %T Trace lines for testlets: A use of multiple-categorical-response models %A Thissen, D. %A Steinberg, L. %A Mooney, J. A. %B Journal of Educational Measurement %V 26 %P 247-260 %G eng
%0 Book Section %D 1986 %T A cognitive error diagnostic adaptive testing system %A Tatsuoka, K. K. %C The 28th ADCIS International Conference Proceedings. Washington DC: ADCIS. %G eng
%0 Journal Article %J Applied Psychological Measurement %D 1986 %T Some applications of optimization algorithms in test design and adaptive testing %A Theunissen, T. J. J. M. %B Applied Psychological Measurement %V 10 %P 381-389 %G English %N 4
%0 Journal Article %J Journal of Employment Counseling %D 1986 %T Using microcomputer-based assessment in career counseling %A Thompson, D. L. %B Journal of Employment Counseling %V 23 %P 50-56 %G eng
%0 Journal Article %J Annual Review of Psychology %D 1985 %T Latent structure and item sampling models for testing %A Traub, R. E. %A Lam, Y. R. %B Annual Review of Psychology %V 36 %P 19-48
%0 Generic %D 1984 %T Adaptive testing (Final Report Contract OPM-29-80) %A Trollip, S. R. %C Urbana-Champaign IL: University of Illinois, Aviation Research Laboratory %G eng
%0 Generic %D 1984 %T Application of adaptive testing to a fraction test (Research Report 84-3-NIE) %A Tatsuoka, K. K. %A Tatsuoka, M. M. %A Baillie, R. %C Urbana IL: University of Illinois, Computer-Based Education Research Laboratory %G eng
%0 Book Section %D 1983 %T The person response curve: Fit of individuals to item response theory models %A Trabin, T. E. %A Weiss, D. J. %C D. J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 83-108). New York: Academic Press. %G eng
%0 Book %D 1983 %T The stochastic modeling of elementary psychological processes %A Townsend, J. T. %A Ashby, G. F. %C Cambridge: Cambridge University Press %G eng
%0 Generic %D 1982 %T An adaptive Private Pilot Certification Exam %A Trollip, S. R. %A Anderson, R. I. %C Aviation, Space, and Environmental Medicine %G eng
%0 Generic %D 1980 %T Criterion-related validity of adaptive testing strategies (Research Report 80-3) %A Thompson, J. G. %A Weiss, D. J. %C Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory %G eng
%0 Generic %D 1979 %T The danger of relying solely on diagnostic adaptive testing when prior and subsequent instructional methods are different (CERL Report E-5) %A Tatsuoka, K. %A Birenbaum, M. %C Urbana IL: University of Illinois, Computer-Based Education Research Laboratory. %G eng
%0 Journal Article %J Journal of Computer-Based Instruction %D 1975 %T Sequential testing for instructional classification %A Thomas, D. B. %B Journal of Computer-Based Instruction %V 1 %P 92-99 %G eng
%0 Generic %D 1974 %T Computer-based adaptive testing models for the Air Force technical training environment: Phase I: Development of a computerized measurement system for Air Force technical training %A Hansen, D. N. %A Johnson, B. F. %A Fagan, R. L. %A Tan, P. %A Dick, W. %C JSAS Catalogue of Selected Documents in Psychology, 5, 1-86 (MS No. 882). AFHRL Technical Report 74-48. %G eng
%0 Book %D 1973 %T A multivariate experimental study of three computerized adaptive testing models for the measurement of attitude toward teaching effectiveness %A Tam, P. T.-K. %C Unpublished doctoral dissertation, Florida State University %G eng %L University Microfilms No. 73-31,534
%0 Journal Article %J Journal of Psychology %D 1965 %T Adaptive testing in an older population %A Greenwood, D. I. %A Taylor, C. %B Journal of Psychology %V 60 %P 193-198 %G eng
%0 Generic %D 1960 %T Construction of an experimental sequential item test (Research Memorandum 60-1) %A Bayroff, A. G. %A Thomas, J. J. %A Anderson, A. A. %C Washington DC: Personnel Research Branch, Department of the Army %G eng