%0 Journal Article %J Sleep %D 2010 %T Development and validation of patient-reported outcome measures for sleep disturbance and sleep-related impairments %A Buysse, D. J. %A Yu, L. %A Moul, D. E. %A Germain, A. %A Stover, A. %A Dodds, N. E. %A Johnston, K. L. %A Shablesky-Cade, M. A. %A Pilkonis, P. A. %K *Outcome Assessment (Health Care) %K *Self Disclosure %K Adult %K Aged %K Aged, 80 and over %K Cross-Sectional Studies %K Factor Analysis, Statistical %K Female %K Humans %K Male %K Middle Aged %K Psychometrics %K Questionnaires %K Reproducibility of Results %K Sleep Disorders/*diagnosis %K Young Adult %X STUDY OBJECTIVES: To develop an archive of self-report questions assessing sleep disturbance and sleep-related impairments (SRI), to develop item banks from this archive, and to validate and calibrate the item banks using classic validation techniques and item response theory analyses in a sample of clinical and community participants. DESIGN: Cross-sectional self-report study. SETTING: Academic medical center and participant homes. PARTICIPANTS: One thousand nine hundred ninety-three adults recruited from an Internet polling sample and 259 adults recruited from medical, psychiatric, and sleep clinics. INTERVENTIONS: None. MEASUREMENTS AND RESULTS: This study was part of PROMIS (Patient-Reported Outcomes Information System), a National Institutes of Health Roadmap initiative. Self-report item banks were developed through an iterative process of literature searches, collecting and sorting items, expert content review, qualitative patient research, and pilot testing. Internal consistency, convergent validity, and exploratory and confirmatory factor analysis were examined in the resulting item banks. Factor analyses identified 2 preliminary item banks, sleep disturbance and SRI. Item response theory analyses and expert content review narrowed the item banks to 27 and 16 items, respectively. 
Validity of the item banks was supported by moderate to high correlations with existing scales and by significant differences in sleep disturbance and SRI scores between participants with and without sleep disorders. CONCLUSIONS: The PROMIS sleep disturbance and SRI item banks have excellent measurement properties and may prove to be useful for assessing general aspects of sleep and SRI with various groups of patients and interventions. %B Sleep %7 2010/06/17 %V 33 %P 781-92 %8 Jun 1 %@ 0161-8105 (Print)0161-8105 (Linking) %G eng %M 20550019 %2 2880437 %0 Journal Article %J Journal of Pain %D 2009 %T Development and preliminary testing of a computerized adaptive assessment of chronic pain %A Anatchkova, M. D. %A Saris-Baglama, R. N. %A Kosinski, M. %A Bjorner, J. B. %K *Computers %K *Questionnaires %K Activities of Daily Living %K Adaptation, Psychological %K Chronic Disease %K Cohort Studies %K Disability Evaluation %K Female %K Humans %K Male %K Middle Aged %K Models, Psychological %K Outcome Assessment (Health Care) %K Pain Measurement/*methods %K Pain, Intractable/*diagnosis/psychology %K Psychometrics %K Quality of Life %K User-Computer Interface %X The aim of this article is to report the development and preliminary testing of a prototype computerized adaptive test of chronic pain (CHRONIC PAIN-CAT) conducted in 2 stages: (1) evaluation of various item selection and stopping rules through real data-simulated administrations of CHRONIC PAIN-CAT; (2) a feasibility study of the actual prototype CHRONIC PAIN-CAT assessment system conducted in a pilot sample. Item calibrations developed from a US general population sample (N = 782) were used to program a pain severity and impact item bank (k = 45), and real data simulations were conducted to determine a CAT stopping rule. The CHRONIC PAIN-CAT was programmed on a tablet PC using QualityMetric's Dynamic Health Assessment (DYNHA) software and administered to a clinical sample of pain sufferers (n = 100). 
The CAT was completed in significantly less time than the static (full item bank) assessment (P < .001). On average, 5.6 items were dynamically administered by CAT to achieve a precise score. Scores estimated from the 2 assessments were highly correlated (r = .89), and both assessments discriminated across pain severity levels (P < .001, RV = .95). Patients' evaluations of the CHRONIC PAIN-CAT were favorable. PERSPECTIVE: This report demonstrates that the CHRONIC PAIN-CAT is feasible for administration in a clinic. The application has the potential to improve pain assessment and help clinicians manage chronic pain. %B Journal of Pain %7 2009/07/15 %V 10 %P 932-943 %8 Sep %@ 1528-8447 (Electronic)1526-5900 (Linking) %G eng %M 19595636 %2 2763618 %0 Journal Article %J Annual Review of Clinical Psychology %D 2009 %T Item response theory and clinical measurement %A Reise, S. P. %A Waller, N. G. %K *Psychological Theory %K Humans %K Mental Disorders/diagnosis/psychology %K Psychological Tests %K Psychometrics %K Quality of Life %K Questionnaires %X In this review, we examine studies that use item response theory (IRT) to explore the psychometric properties of clinical measures. Next, we consider how IRT has been used in clinical research for: scale linking, computerized adaptive testing, and differential item functioning analysis. Finally, we consider the scale properties of IRT trait scores. We conclude that there are notable differences between cognitive and clinical measures that have relevance for IRT modeling. Future research should be directed toward a better understanding of the metric of the latent trait and the psychological processes that lead to individual differences in item response behaviors. 
%B Annual Review of Clinical Psychology %7 2008/11/04 %V 5 %P 27-48 %@ 1548-5951 (Electronic) %G eng %M 18976138 %0 Journal Article %J Disability & Rehabilitation %D 2008 %T Efficiency and sensitivity of multidimensional computerized adaptive testing of pediatric physical functioning %A Allen, D. D. %A Ni, P. %A Haley, S. M. %K *Disability Evaluation %K Child %K Computers %K Disabled Children/*classification/rehabilitation %K Efficiency %K Humans %K Outcome Assessment (Health Care) %K Psychometrics %K Reproducibility of Results %K Retrospective Studies %K Self Care %K Sensitivity and Specificity %X PURPOSE: Computerized adaptive tests (CATs) have efficiency advantages over fixed-length tests of physical functioning but may lose sensitivity when administering extremely low numbers of items. Multidimensional CATs may efficiently improve sensitivity by capitalizing on correlations between functional domains. Using a series of empirical simulations, we assessed the efficiency and sensitivity of multidimensional CATs compared to a longer fixed-length test. METHOD: Parent responses to the Pediatric Evaluation of Disability Inventory before and after intervention for 239 children at a pediatric rehabilitation hospital provided the data for this retrospective study. Reliability, effect size, and standardized response mean were compared between full-length self-care and mobility subscales and simulated multidimensional CATs with stopping rules at 40, 30, 20, and 10 items. RESULTS: Reliability was lowest in the 10-item CAT condition for the self-care (r = 0.85) and mobility (r = 0.79) subscales; all other conditions had high reliabilities (r > 0.94). All multidimensional CAT conditions had equivalent levels of sensitivity compared to the full set condition for both domains. CONCLUSIONS: Multidimensional CATs efficiently retain the sensitivity of longer fixed-length measures even with 5 items per dimension (10-item CAT condition). 
Measuring physical functioning with multidimensional CATs could enhance sensitivity following intervention while minimizing response burden. %B Disability & Rehabilitation %7 2008/02/26 %V 30 %P 479-84 %@ 0963-8288 (Print)0963-8288 (Linking) %G eng %M 18297502 %0 Journal Article %J Quality of Life Research %D 2007 %T Developing tailored instruments: item banking and computerized adaptive assessment %A Bjorner, J. B. %A Chang, C-H. %A Thissen, D. %A Reeve, B. B. %K *Health Status %K *Health Status Indicators %K *Mental Health %K *Outcome Assessment (Health Care) %K *Quality of Life %K *Questionnaires %K *Software %K Algorithms %K Factor Analysis, Statistical %K Humans %K Models, Statistical %K Psychometrics %X Item banks and Computerized Adaptive Testing (CAT) have the potential to greatly improve the assessment of health outcomes. This review describes the unique features of item banks and CAT and discusses how to develop item banks. In CAT, a computer selects the items from an item bank that are most relevant for and informative about the particular respondent; thus optimizing test relevance and precision. Item response theory (IRT) provides the foundation for selecting the items that are most informative for the particular respondent and for scoring responses on a common metric. The development of an item bank is a multi-stage process that requires a clear definition of the construct to be measured, good items, a careful psychometric analysis of the items, and a clear specification of the final CAT. The psychometric analysis needs to evaluate the assumptions of the IRT model such as unidimensionality and local independence; that the items function the same way in different subgroups of the population; and that there is an adequate fit between the data and the chosen item response models. Also, interpretation guidelines need to be established to help the clinical application of the assessment. 
Although medical research can draw upon expertise from educational testing in the development of item banks and CAT, the medical field also encounters unique opportunities and challenges. %B Quality of Life Research %7 2007/05/29 %V 16 %P 95-108 %@ 0962-9343 (Print) %G eng %M 17530450 %0 Journal Article %J Quality of Life Research %D 2007 %T IRT health outcomes data analysis project: an overview and summary %A Cook, K. F. %A Teal, C. R. %A Bjorner, J. B. %A Cella, D. %A Chang, C-H. %A Crane, P. K. %A Gibbons, L. E. %A Hays, R. D. %A McHorney, C. A. %A Ocepek-Welikson, K. %A Raczek, A. E. %A Teresi, J. A. %A Reeve, B. B. %K *Data Interpretation, Statistical %K *Health Status %K *Quality of Life %K *Questionnaires %K *Software %K Female %K HIV Infections/psychology %K Humans %K Male %K Neoplasms/psychology %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Stress, Psychological %X BACKGROUND: In June 2004, the National Cancer Institute and the Drug Information Association co-sponsored the conference, "Improving the Measurement of Health Outcomes through the Applications of Item Response Theory (IRT) Modeling: Exploration of Item Banks and Computer-Adaptive Assessment." A component of the conference was presentation of a psychometric and content analysis of a secondary dataset. OBJECTIVES: A thorough psychometric and content analysis was conducted of two primary domains within a cancer health-related quality of life (HRQOL) dataset. RESEARCH DESIGN: HRQOL scales were evaluated using factor analysis for categorical data, IRT modeling, and differential item functioning analyses. In addition, computerized adaptive administration of HRQOL item banks was simulated, and various IRT models were applied and compared. SUBJECTS: The original data were collected as part of the NCI-funded Quality of Life Evaluation in Oncology (Q-Score) Project. A total of 1,714 patients with cancer or HIV/AIDS were recruited from 5 clinical sites. 
MEASURES: Items from 4 HRQOL instruments were evaluated: Cancer Rehabilitation Evaluation System-Short Form, European Organization for Research and Treatment of Cancer Quality of Life Questionnaire, Functional Assessment of Cancer Therapy and Medical Outcomes Study Short-Form Health Survey. RESULTS AND CONCLUSIONS: Four lessons learned from the project are discussed: the importance of good developmental item banks, the ambiguity of model fit results, the limits of our knowledge regarding the practical implications of model misfit, and the importance in the measurement of HRQOL of construct definition. With respect to these lessons, areas for future research are suggested. The feasibility of developing item banks for broad definitions of health is discussed. %B Quality of Life Research %7 2007/03/14 %V 16 %P 121-132 %@ 0962-9343 (Print) %G eng %M 17351824 %0 Journal Article %J Medical Care %D 2007 %T Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS) %A Reeve, B. B. %A Hays, R. D. %A Bjorner, J. B. %A Cook, K. F. %A Crane, P. K. %A Teresi, J. A. %A Thissen, D. %A Revicki, D. A. %A Weiss, D. J. %A Hambleton, R. K. %A Liu, H. %A Gershon, R. C. %A Reise, S. P. %A Lai, J. S. %A Cella, D. %K *Health Status %K *Information Systems %K *Quality of Life %K *Self Disclosure %K Adolescent %K Adult %K Aged %K Calibration %K Databases as Topic %K Evaluation Studies as Topic %K Female %K Humans %K Male %K Middle Aged %K Outcome Assessment (Health Care)/*methods %K Psychometrics %K Questionnaires/standards %K United States %X BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. 
OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment. %B Medical Care %7 2007/04/20 %V 45 %P S22-31 %8 May %@ 0025-7079 (Print) %G eng %M 17443115 %0 Journal Article %J European Journal of Psychological Assessment %D 2007 %T Psychometric properties of an emotional adjustment measure: An application of the graded response model %A Rubio, V. J. %A Aguado, D. %A Hontangas, P. M. %A Hernández, J. M. %K computerized adaptive tests %K Emotional Adjustment %K Item Response Theory %K Personality Measures %K personnel recruitment %K Psychometrics %K Samejima's graded response model %K test reliability %K validity %X Item response theory (IRT) provides valuable methods for the analysis of the psychometric properties of a psychological measure. However, IRT has been mainly used for assessing achievements and ability rather than personality factors. This paper presents an application of IRT to a personality measure. Thus, the psychometric properties of a new emotional adjustment measure that consists of 28 six-graded response items are shown. Classical test theory (CTT) analyses as well as IRT analyses are carried out. 
Samejima's (1969) graded-response model has been used for estimating item parameters. Results show that the bank of items fulfills model assumptions and fits the data reasonably well, demonstrating the suitability of the IRT models for the description and use of data originating from personality measures. In this sense, the model fulfills the expectations that IRT has undoubted advantages: (1) The invariance of the estimated parameters, (2) the treatment given to the standard error of measurement, and (3) the possibilities offered for the construction of computerized adaptive tests (CAT). The bank of items shows good reliability. It also shows convergent validity compared to the Eysenck Personality Inventory (EPQ-A; Eysenck & Eysenck, 1975) and the Big Five Questionnaire (BFQ; Caprara, Barbaranelli, & Borgogni, 1993). (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B European Journal of Psychological Assessment %I Hogrefe & Huber Publishers GmbH: Germany %V 23 %P 39-46 %@ 1015-5759 (Print) %G eng %M 2007-01587-007 %0 Journal Article %J Psychology Science %D 2006 %T Adaptive success control in computerized adaptive testing %A Häusler, Joachim %K adaptive success control %K computerized adaptive testing %K Psychometrics %X In computerized adaptive testing (CAT) procedures within the framework of probabilistic test theory the difficulty of an item is adjusted to the ability of the respondent, with the aim of maximizing the amount of information generated per item, thereby also increasing test economy and test reasonableness. However, earlier research indicates that respondents might feel over-challenged by a constant success probability of p = 0.5 and therefore cannot come to a sufficiently high answer certainty within a reasonable timeframe. Consequently response time per item increases, which -- depending on the test material -- can outweigh the benefit of administering optimally informative items. 
Instead of a benefit, the result of using CAT procedures could be a loss of test economy. Based on this problem, an adaptive success control algorithm was designed and tested, adapting the success probability to the working style of the respondent. Persons who need higher answer certainty in order to come to a decision are detected and receive a higher success probability, in order to minimize the test duration (not the number of items as in classical CAT). The method is validated on the re-analysis of data from the Adaptive Matrices Test (AMT, Hornke, Etzel & Rettig, 1999) and by the comparison between an AMT version using classical CAT and an experimental version using Adaptive Success Control. The results are discussed in the light of psychometric and psychological aspects of test quality. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Psychology Science %I Pabst Science Publishers: Germany %V 48 %P 436-450 %@ 0033-3018 (Print) %G eng %M 2007-03313-004 %0 Book Section %B Handbook of multimethod measurement in psychology %D 2006 %T Computer-based testing %A Drasgow, F. %A Chuah, S. C. %K Adaptive Testing computerized adaptive testing %K Computer Assisted Testing %K Experimentation %K Psychometrics %K Theories %X (From the chapter) There has been a proliferation of research designed to explore and exploit opportunities provided by computer-based assessment. This chapter provides an overview of the diverse efforts by researchers in this area. It begins by describing how paper-and-pencil tests can be adapted for administration by computers. Computerization provides the important advantage that items can be selected so they are of appropriate difficulty for each examinee. Some of the psychometric theory needed for computerized adaptive testing is reviewed. Then research on innovative computerized assessments is summarized. These assessments go beyond multiple-choice items by using formats made possible by computerization. 
Then some hardware and software issues are described, and finally, directions for future work are outlined. (PsycINFO Database Record (c) 2006 APA ) %B Handbook of multimethod measurement in psychology %I American Psychological Association %C Washington D.C. USA %V xiv %P 87-100 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2006 %T Expansion of a physical function item bank and development of an abbreviated form for clinical research %A Bode, R. K. %A Lai, J-S. %A Dineen, K. %A Heinemann, A. W. %A Shevrin, D. %A Von Roenn, J. %A Cella, D. %K clinical research %K computerized adaptive testing %K performance levels %K physical function item bank %K Psychometrics %K test reliability %K Test Validity %X We expanded an existing 33-item physical function (PF) item bank with a sufficient number of items to enable computerized adaptive testing (CAT). Ten items were written to expand the bank and the new item pool was administered to 295 people with cancer. For this analysis of the new pool, seven poorly performing items were identified for further examination. This resulted in a bank with items that define an essentially unidimensional PF construct, cover a wide range of that construct, reliably measure the PF of persons with cancer, and distinguish differences in self-reported functional performance levels. We also developed a 5-item (static) assessment form ("BriefPF") that can be used in clinical research to express scores on the same metric as the overall bank. The BriefPF was compared to the PF-10 from the Medical Outcomes Study SF-36. Both short forms significantly differentiated persons across functional performance levels. While the entire bank was more precise across the PF continuum than either short form, there were differences in the area of the continuum in which each short form was more precise: the BriefPF was more precise than the PF-10 at the lower functional levels and the PF-10 was more precise than the BriefPF at the higher levels. 
Future research on this bank will include the development of a CAT version, the PF-CAT. (PsycINFO Database Record (c) 2007 APA, all rights reserved) %B Journal of Applied Measurement %I Richard M Smith: US %V 7 %P 1-15 %@ 1529-7713 (Print) %G eng %M 2006-01262-001 %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2006 %T Measurement precision and efficiency of multidimensional computer adaptive testing of physical functioning using the pediatric evaluation of disability inventory %A Haley, S. M. %A Ni, P. %A Ludlow, L. H. %A Fragala-Pinkham, M. A. %K *Disability Evaluation %K *Pediatrics %K Adolescent %K Child %K Child, Preschool %K Computers %K Disabled Persons/*classification/rehabilitation %K Efficiency %K Humans %K Infant %K Outcome Assessment (Health Care) %K Psychometrics %K Self Care %X OBJECTIVE: To compare the measurement efficiency and precision of a multidimensional computer adaptive testing (M-CAT) application to a unidimensional CAT (U-CAT) comparison using item bank data from 2 of the functional skills scales of the Pediatric Evaluation of Disability Inventory (PEDI). DESIGN: Using existing PEDI mobility and self-care item banks, we compared the stability of item calibrations and model fit between unidimensional and multidimensional Rasch models and compared the efficiency and precision of the U-CAT- and M-CAT-simulated assessments to a random draw of items. SETTING: Pediatric rehabilitation hospital and clinics. PARTICIPANTS: Clinical and normative samples. INTERVENTIONS: Not applicable. MAIN OUTCOME MEASURES: Not applicable. RESULTS: The M-CAT had greater levels of precision and efficiency than the separate mobility and self-care U-CAT versions when using a similar number of items for each PEDI subdomain. Equivalent estimation of mobility and self-care scores can be achieved with a 25% to 40% item reduction with the M-CAT compared with the U-CAT. 
CONCLUSIONS: M-CAT applications appear to have both precision and efficiency advantages compared with separate U-CAT assessments when content subdomains have a high correlation. Practitioners may also realize interpretive advantages of reporting test score information for each subdomain when separate clinical inferences are desired. %B Archives of Physical Medicine and Rehabilitation %7 2006/08/29 %V 87 %P 1223-9 %8 Sep %@ 0003-9993 (Print) %G eng %M 16935059 %0 Journal Article %J Anales de Psicología %D 2006 %T Técnicas para detectar patrones de respuesta atípicos [Aberrant patterns detection methods] %A Núñez, R. M. N. %A Pina, J. A. L. %K aberrant patterns detection %K Classical Test Theory %K generalizability theory %K Item Response %K Item Response Theory %K Mathematics %K methods %K person-fit %K Psychometrics %K psychometry %K Test Validity %K test validity analysis %K Theory %X La identificación de patrones de respuesta atípicos es de gran utilidad para la construcción de tests y de bancos de ítems con propiedades psicométricas así como para el análisis de validez de los mismos. En este trabajo de revisión se han recogido los más relevantes y novedosos métodos de ajuste de personas que se han elaborado dentro de cada uno de los principales ámbitos de trabajo de la Psicometría: el escalograma de Guttman, la Teoría Clásica de Tests (TCT), la Teoría de la Generalizabilidad (TG), la Teoría de Respuesta al Ítem (TRI), los Modelos de Respuesta al Ítem No Paramétricos (MRINP), los Modelos de Clase Latente de Orden Restringido (MCL-OR) y el Análisis de Estructura de Covarianzas (AEC). Aberrant pattern detection is highly useful for constructing tests and item banks with sound psychometric properties and for analyzing their validity. The most relevant and novel person-fit methods are reviewed. 
All of them were developed within one of the main areas of psychometrics: Guttman's scalogram, Classical Test Theory (CTT), Generalizability Theory (GT), Item Response Theory (IRT), Non-parametric Response Models (NPRM), Order-Restricted Latent Class Models (OR-LCM) and Covariance Structure Analysis (CSA). %B Anales de Psicología %V 22 %P 143-154 %@ 0212-9728 %G Spanish %M 2006-07751-018 %0 Journal Article %J Evaluation and the Health Professions %D 2005 %T Data pooling and analysis to build a preliminary item bank: an example using bowel function in prostate cancer %A Eton, D. T. %A Lai, J. S. %A Cella, D. %A Reeve, B. B. %A Talcott, J. A. %A Clark, J. A. %A McPherson, C. P. %A Litwin, M. S. %A Moinpour, C. M. %K *Quality of Life %K *Questionnaires %K Adult %K Aged %K Data Collection/methods %K Humans %K Intestine, Large/*physiopathology %K Male %K Middle Aged %K Prostatic Neoplasms/*physiopathology %K Psychometrics %K Research Support, Non-U.S. Gov't %K Statistics, Nonparametric %X Assessing bowel function (BF) in prostate cancer can help determine therapeutic trade-offs. We determined the components of BF commonly assessed in prostate cancer studies as an initial step in creating an item bank for clinical and research application. We analyzed six archived data sets representing 4,246 men with prostate cancer. Thirty-one items from validated instruments were available for analysis. Items were classified into domains (diarrhea, rectal urgency, pain, bleeding, bother/distress, and other) then subjected to conventional psychometric and item response theory (IRT) analyses. Items fit the IRT model if the ratio between observed and expected item variance was between 0.60 and 1.40. Four of 31 items had inadequate fit in at least one analysis. Poorly fitting items included bleeding (2), rectal urgency (1), and bother/distress (1). A fifth item assessing hemorrhoids was poorly correlated with other items. 
Our analyses supported four related components of BF: diarrhea, rectal urgency, pain, and bother/distress. %B Evaluation and the Health Professions %V 28 %P 142-59 %G eng %M 15851770 %0 Journal Article %J Journal of Clinical Epidemiology %D 2005 %T An item bank was created to improve the measurement of cancer-related fatigue %A Lai, J-S. %A Cella, D. %A Dineen, K. %A Bode, R. %A Von Roenn, J. %A Gershon, R. C. %A Shevrin, D. %K Adult %K Aged %K Aged, 80 and over %K Factor Analysis, Statistical %K Fatigue/*etiology/psychology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/*complications/psychology %K Psychometrics %K Questionnaires %X OBJECTIVE: Cancer-related fatigue (CRF) is one of the most common unrelieved symptoms experienced by patients. CRF is underrecognized and undertreated due to a lack of clinically sensitive instruments that integrate easily into clinics. Modern computerized adaptive testing (CAT) can overcome these obstacles by enabling precise assessment of fatigue without requiring the administration of a large number of questions. A working item bank is essential for development of a CAT platform. The present report describes the building of an operational item bank for use in clinical settings with the ultimate goal of improving CRF identification and treatment. STUDY DESIGN AND SETTING: The sample included 301 cancer patients. Psychometric properties of items were examined by using Rasch analysis, an Item Response Theory (IRT) model. RESULTS AND CONCLUSION: The final bank includes 72 items. These 72 unidimensional items explained 57.5% of the variance, based on factor analysis results. Excellent internal consistency (alpha=0.99) and acceptable item-total correlation were found (range: 0.51-0.85). The 72 items covered a reasonable range of the fatigue continuum. No significant ceiling effects, floor effects, or gaps were found. A sample short form was created for demonstration purposes. 
The resulting bank is amenable to the development of a CAT platform. %B Journal of Clinical Epidemiology %7 2005/02/01 %V 58 %P 190-7 %8 Feb %@ 0895-4356 (Print)0895-4356 (Linking) %G eng %9 Multicenter Study %M 15680754 %0 Journal Article %J Acta Comportamentalia %D 2005 %T La Validez desde una óptica psicométrica [Validity from a psychometric perspective] %A Muñiz, J. %K Factor Analysis %K Measurement %K Psychometrics %K Scaling (Testing) %K Statistical %K Technology %K Test Validity %X El estudio de la validez constituye el eje central de los análisis psicométricos de los instrumentos de medida. En esta comunicación se traza una breve nota histórica de los distintos modos de concebir la validez a lo largo de los tiempos, se comentan las líneas actuales, y se tratan de vislumbrar posibles vías futuras, teniendo en cuenta el impacto que las nuevas tecnologías informáticas están ejerciendo sobre los propios instrumentos de medida en Psicología y Educación. Cuestiones como los nuevos formatos multimedia de los ítems, la evaluación a distancia, el uso intercultural de las pruebas, las consecuencias de su uso, o los tests adaptativos informatizados, reclaman nuevas formas de evaluar y conceptualizar la validez. También se analizan críticamente algunos planteamientos recientes sobre el concepto de validez. The study of validity constitutes a central axis of psychometric analyses of measurement instruments. This paper presents a historical sketch of different modes of conceiving validity, with commentary on current views, and it attempts to predict future lines of research by considering the impact of new computerized technologies on measurement instruments in psychology and education. Factors such as the new multimedia format of items, distance assessment, the intercultural use of tests, the consequences of the latter, or the development of computerized adaptive tests demand new ways of conceiving and evaluating validity. 
Some recent thoughts about the concept of validity are also critically analyzed. (PsycINFO Database Record (c) 2005 APA ) (journal abstract) %B Acta Comportamentalia %V 13 %P 9-20 %G eng %0 Journal Article %J Psicothema %D 2005 %T Propiedades psicométricas de un test Adaptativo Informatizado para la medición del ajuste emocional [Psychometric properties of an Emotional Adjustment Computerized Adaptive Test] %A Aguado, D. %A Rubio, V. J. %A Hontangas, P. M. %A Hernández, J. M. %K Computer Assisted Testing %K Emotional Adjustment %K Item Response %K Personality Measures %K Psychometrics %K Test Validity %K Theory %X En el presente trabajo se describen las propiedades psicométricas de un Test Adaptativo Informatizado para la medición del ajuste emocional de las personas. La revisión de la literatura acerca de la aplicación de los modelos de la teoría de la respuesta a los ítems (TRI) muestra que ésta se ha utilizado más en el trabajo con variables aptitudinales que para la medición de variables de personalidad, sin embargo diversos estudios han mostrado la eficacia de la TRI para la descripción psicométrica de dichas variables. Aun así, pocos trabajos han explorado las características de un Test Adaptativo Informatizado, basado en la TRI, para la medición de una variable de personalidad como es el ajuste emocional. Nuestros resultados muestran la eficiencia del TAI para la evaluación del ajuste emocional, proporcionando una medición válida y precisa, utilizando menor número de elementos de medida en comparación con las escalas de ajuste emocional de instrumentos fuertemente implantados. Psychometric properties of an emotional adjustment computerized adaptive test. This paper describes the psychometric properties of an emotional adjustment computerized adaptive test. An examination of Item Response Theory (IRT) research literature indicates that IRT has been mainly used for assessing achievements and ability rather than personality factors. 
Nevertheless, several studies in recent years have successfully applied IRT to personality assessment instruments. Even so, few works have examined the features of an IRT-based computerized adaptive test for measuring a personality trait such as emotional adjustment. Our results show the efficiency of the CAT for assessing emotional adjustment: it provides a valid and accurate measurement while using fewer items than the emotional adjustment scales of the most firmly established questionnaires. %B Psicothema %V 17 %P 484-491 %G eng %0 Journal Article %J Testing Psicometria Metodologia %D 2005 %T Somministrazione di test computerizzati di tipo adattivo: Un' applicazione del modello di misurazione di Rasch [Administration of computerized and adaptive tests: An application of the Rasch Model] %A Miceli, R. %A Molinengo, G. %K Adaptive Testing %K Computer Assisted Testing %K Item Response Theory computerized adaptive testing %K Models %K Psychometrics %X The aim of the present study is to describe the characteristics of a procedure for administering computerized and adaptive tests (Computer Adaptive Testing or CAT). Items to be asked to the individuals are interactively chosen and are selected from a "bank" in which they were previously calibrated and recorded on the basis of their difficulty level. The selection of items is performed by increasingly more accurate estimates of the examinees' ability. The building of an item-bank on Psychometrics and the implementation of this procedure allow a first validation through Monte Carlo simulations. (PsycINFO Database Record (c) 2006 APA) (journal abstract) %B Testing Psicometria Metodologia %V 12 %P 131-149 %G eng %0 Journal Article %J Alcoholism: Clinical & Experimental Research %D 2005 %T Toward efficient and comprehensive measurement of the alcohol problems continuum in college students: The Brief Young Adult Alcohol Consequences Questionnaire %A Kahler, C. W. 
%A Strong, D. R. %A Read, J. P. %A De Boeck, P. %A Wilson, M. %A Acton, G. S. %A Palfai, T. P. %A Wood, M. D. %A Mehta, P. D. %A Neale, M. C. %A Flay, B. R. %A Conklin, C. A. %A Clayton, R. R. %A Tiffany, S. T. %A Shiffman, S. %A Krueger, R. F. %A Nichol, P. E. %A Hicks, B. M. %A Markon, K. E. %A Patrick, C. J. %A Iacono, William G. %A McGue, Matt %A Langenbucher, J. W. %A Labouvie, E. %A Martin, C. S. %A Sanjuan, P. M. %A Bavly, L. %A Kirisci, L. %A Chung, T. %A Vanyukov, M. %A Dunn, M. %A Tarter, R. %A Handel, R. W. %A Ben-Porath, Y. S. %A Watt, M. %K Psychometrics %K Substance-Related Disorders %X Background: Although a number of measures of alcohol problems in college students have been studied, the psychometric development and validation of these scales have been limited, for the most part, to methods based on classical test theory. In this study, we conducted analyses based on item response theory to select a set of items for measuring the alcohol problem severity continuum in college students that balances comprehensiveness and efficiency and is free from significant gender bias. Method: We conducted Rasch model analyses of responses to the 48-item Young Adult Alcohol Consequences Questionnaire by 164 male and 176 female college students who drank on at least a weekly basis. An iterative process using item fit statistics, item severities, item discrimination parameters, model residuals, and analysis of differential item functioning by gender was used to pare the items down to those that best fit a Rasch model and that were most efficient in discriminating among levels of alcohol problems in the sample. Results: The process of iterative Rasch model analyses resulted in a final 24-item scale with the data fitting the unidimensional Rasch model very well. 
The scale showed excellent distributional properties, had items adequately matched to the severity of alcohol problems in the sample, covered a full range of problem severity, and appeared highly efficient in retaining all of the meaningful variance captured by the original set of 48 items. Conclusions: The use of Rasch model analyses to inform item selection produced a final scale that, in both its comprehensiveness and its efficiency, should be a useful tool for researchers studying alcohol problems in college students. To aid interpretation of raw scores, examples of the types of alcohol problems that are likely to be experienced across a range of selected scores are provided. (C) 2005 Research Society on Alcoholism. An important, sometimes controversial feature of all psychological phenomena is whether they are categorical or dimensional. A conceptual and psychometric framework is described for distinguishing whether the latent structure behind manifest categories (e.g., psychiatric diagnoses, attitude groups, or stages of development) is category-like or dimension-like. Being dimension-like requires (a) within-category heterogeneity and (b) between-category quantitative differences. Being category-like requires (a) within-category homogeneity and (b) between-category qualitative differences. The relation between this classification and abrupt versus smooth differences is discussed. Hybrid structures are possible. Being category-like is itself a matter of degree; the authors offer a formalized framework to determine this degree. Empirical applications to personality disorders, attitudes toward capital punishment, and stages of cognitive development illustrate the approach. (C) 2005 by the American Psychological Association. The authors conducted Rasch model (G. Rasch, 1960) analyses of items from the Young Adult Alcohol Problems Screening Test (YAAPST; S. C. Hurlbut & K. J. 
Sher, 1992) to examine the relative severity and ordering of alcohol problems in 806 college students. Items appeared to measure a single dimension of alcohol problem severity, covering a broad range of the latent continuum. Items fit the Rasch model well, with less severe symptoms reliably preceding more severe symptoms in a potential progression toward increasing levels of problem severity. However, certain items did not index problem severity consistently across demographic subgroups. A shortened, alternative version of the YAAPST is proposed, and a norm table is provided that allows for a linking of total YAAPST scores to expected symptom expression. (C) 2004 by the American Psychological Association. A didactic on latent growth curve modeling for ordinal outcomes is presented. The conceptual aspects of modeling growth with ordinal variables and the notion of threshold invariance are illustrated graphically using a hypothetical example. The ordinal growth model is described in terms of 3 nested models: (a) multivariate normality of the underlying continuous latent variables (yt) and its relationship with the observed ordinal response pattern (Yt), (b) threshold invariance over time, and (c) growth model for the continuous latent variable on a common scale. Algebraic implications of the model restrictions are derived, and practical aspects of fitting ordinal growth models are discussed with the help of an empirical example and Mx script (M. C. Neale, S. M. Boker, G. Xie, & H. H. Maes, 1999). The necessary conditions for the identification of growth models with ordinal data and the methodological implications of the model of threshold invariance are discussed. (C) 2004 by the American Psychological Association. Recent research points toward the viability of conceptualizing alcohol problems as arrayed along a continuum. 
Nevertheless, modern statistical techniques designed to scale multiple problems along a continuum (latent trait modeling; LTM) have rarely been applied to alcohol problems. This study applies LTM methods to data on 110 problems reported during in-person interviews of 1,348 middle-aged men (mean age = 43) from the general population. The results revealed a continuum of severity linking the 110 problems, ranging from heavy and abusive drinking, through tolerance and withdrawal, to serious complications of alcoholism. These results indicate that alcohol problems can be arrayed along a dimension of severity and emphasize the relevance of LTM to informing the conceptualization and assessment of alcohol problems. (C) 2004 by the American Psychological Association. Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview-Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus (B. Muthen & L. Muthen, 1998) and MULTILOG (D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance. (C) 2004 by the American Psychological Association. This study examined the psychometric characteristics of an index of substance use involvement using item response theory. 
The sample consisted of 292 men and 140 women who qualified for a Diagnostic and Statistical Manual of Mental Disorders (3rd ed., rev.; American Psychiatric Association, 1987) substance use disorder (SUD) diagnosis and 293 men and 445 women who did not qualify for a SUD diagnosis. The results indicated that men had a higher probability of endorsing substance use compared with women. The index significantly predicted health, psychiatric, and psychosocial disturbances as well as level of substance use behavior and severity of SUD after a 2-year follow-up. Finally, this index is a reliable and useful prognostic indicator of the risk for SUD and the medical and psychosocial sequelae of drug consumption. (C) 2002 by the American Psychological Association. Comparability, validity, and impact of loss of information of a computerized adaptive administration of the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) were assessed in a sample of 140 Veterans Affairs hospital patients. The countdown method (Butcher, Keller, & Bacon, 1985) was used to adaptively administer Scales L (Lie) and F (Frequency), the 10 clinical scales, and the 15 content scales. Participants completed the MMPI-2 twice, in 1 of 2 conditions: computerized conventional test-retest, or computerized conventional-computerized adaptive. Mean profiles and test-retest correlations across modalities were comparable. Correlations between MMPI-2 scales and criterion measures supported the validity of the countdown method, although some attenuation of validity was suggested for certain health-related items. Loss of information incurred with this mode of adaptive testing has minimal impact on test validity. Item and time savings were substantial. (C) 1999 by the American Psychological Association %B Alcoholism: Clinical & Experimental Research %V 29 %P 1180-1189 %G eng %0 Journal Article %J Medical Care %D 2004 %T Activity outcome measurement for postacute care %A Haley, S. M. %A Coster, W. J. %A Andres, P. 
L. %A Ludlow, L. H. %A Ni, P. %A Bond, T. L. %A Sinclair, S. J. %A Jette, A. M. %K *Self Efficacy %K *Sickness Impact Profile %K Activities of Daily Living/*classification/psychology %K Adult %K Aftercare/*standards/statistics & numerical data %K Aged %K Boston %K Cognition/physiology %K Disability Evaluation %K Factor Analysis, Statistical %K Female %K Human %K Male %K Middle Aged %K Movement/physiology %K Outcome Assessment (Health Care)/*methods/statistics & numerical data %K Psychometrics %K Questionnaires/standards %K Rehabilitation/*standards/statistics & numerical data %K Reproducibility of Results %K Sensitivity and Specificity %K Support, U.S. Gov't, Non-P.H.S. %K Support, U.S. Gov't, P.H.S. %X BACKGROUND: Efforts to evaluate the effectiveness of a broad range of postacute care services have been hindered by the lack of conceptually sound and comprehensive measures of outcomes. It is critical to determine a common underlying structure before employing current methods of item equating across outcome instruments for future item banking and computer-adaptive testing applications. OBJECTIVE: To investigate the factor structure, reliability, and scale properties of items underlying the Activity domains of the International Classification of Functioning, Disability and Health (ICF) for use in postacute care outcome measurement. METHODS: We developed a 41-item Activity Measure for Postacute Care (AM-PAC) that assessed an individual's execution of discrete daily tasks in his or her own environment across major content domains as defined by the ICF. We evaluated the reliability and discriminant validity of the prototype AM-PAC in 477 individuals in active rehabilitation programs across 4 rehabilitation settings using factor analyses, tests of item scaling, internal consistency reliability analyses, Rasch item response theory modeling, residual component analysis, and modified parallel analysis. 
RESULTS: Results from an initial exploratory factor analysis produced 3 distinct, interpretable factors that accounted for 72% of the variance: Applied Cognition (44%), Personal Care & Instrumental Activities (19%), and Physical & Movement Activities (9%); these 3 activity factors were verified by a confirmatory factor analysis. Scaling assumptions were met for each factor in the total sample and across diagnostic groups. Internal consistency reliability was high for the total sample (Cronbach alpha = 0.92 to 0.94), and for specific diagnostic groups (Cronbach alpha = 0.90 to 0.95). Rasch scaling, residual factor, differential item functioning, and modified parallel analyses supported the unidimensionality and goodness of fit of each unique activity domain. CONCLUSIONS: This 3-factor model of the AM-PAC can form the conceptual basis for common-item equating and computer-adaptive applications, leading to a comprehensive system of outcome instruments for postacute care settings. %B Medical Care %V 42 %P I49-I61 %G eng %M 14707755 %0 Journal Article %J European Journal of Psychological Assessment %D 2004 %T Assisted self-adapted testing: A comparative study %A Hontangas, P. %A Olea, J. %A Ponsoda, V. %A Revuelta, J. %A Wise, S. L. %K Adaptive Testing %K Anxiety %K Computer Assisted Testing %K Psychometrics %K Test %X A new type of self-adapted test (S-AT), called Assisted Self-Adapted Test (AS-AT), is presented. It differs from an ordinary S-AT in that prior to selecting the difficulty category, the computer advises examinees on their best difficulty category choice, based on their previous performance. Three tests (computerized adaptive test, AS-AT, and S-AT) were compared regarding both their psychometric (precision and efficiency) and psychological (anxiety) characteristics. Tests were applied in an actual assessment situation, in which test scores determined 20% of term grades. A sample of 173 high school students participated. 
No differences in either posttest anxiety or ability were obtained. Concerning precision, AS-AT was as precise as CAT, and both revealed more precision than S-AT. It was concluded that AS-AT acted as a CAT concerning precision. Some hints, but not conclusive support, of the psychological similarity between AS-AT and S-AT were also found. (PsycINFO Database Record (c) 2005 APA) (journal abstract) %B European Journal of Psychological Assessment %V 20 %P 2-9 %G eng %0 Journal Article %J BMC Psychiatry %D 2004 %T Computerized adaptive measurement of depression: A simulation study %A Gardner, W. %A Shear, K. %A Kelleher, K. J. %A Pajer, K. A. %A Mammen, O. %A Buysse, D. %A Frank, E. %K *Computer Simulation %K Adult %K Algorithms %K Area Under Curve %K Comparative Study %K Depressive Disorder/*diagnosis/epidemiology/psychology %K Diagnosis, Computer-Assisted/*methods/statistics & numerical data %K Factor Analysis, Statistical %K Female %K Humans %K Internet %K Male %K Mass Screening/methods %K Patient Selection %K Personality Inventory/*statistics & numerical data %K Pilot Projects %K Prevalence %K Psychiatric Status Rating Scales/*statistics & numerical data %K Psychometrics %K Research Support, Non-U.S. Gov't %K Research Support, U.S. Gov't, P.H.S. %K Severity of Illness Index %K Software %X Background: Efficient, accurate instruments for measuring depression are increasingly important in clinical practice. We developed a computerized adaptive version of the Beck Depression Inventory (BDI). We examined its efficiency and its usefulness in identifying Major Depressive Episodes (MDE) and in measuring depression severity. Methods: Subjects were 744 participants in research studies in which each subject completed both the BDI and the SCID. In addition, 285 patients completed the Hamilton Depression Rating Scale. Results: The adaptive BDI had an AUC as an indicator of a SCID diagnosis of MDE of 88%, equivalent to the full BDI. 
The adaptive BDI asked fewer questions than the full BDI (5.6 versus 21 items). The adaptive latent depression score correlated r = .92 with the BDI total score and the latent depression score correlated more highly with the Hamilton (r = .74) than the BDI total score did (r = .70). Conclusions: Adaptive testing for depression may provide greatly increased efficiency without loss of accuracy in identifying MDE or in measuring depression severity. %B BMC Psychiatry %V 4 %P 13-23 %G eng %M 15132755 %0 Journal Article %J Journal of Applied Measurement %D 2004 %T Pre-equating: a simulation study based on a large scale assessment model %A Taherbhai, H. M. %A Young, M. J. %K *Databases %K *Models, Theoretical %K Calibration %K Human %K Psychometrics %K Reference Values %K Reproducibility of Results %X Although post-equating (PE) has proven to be an acceptable method in the scaling and equating of items and forms, there are times when the turn-around period for equating and converting raw scores to scale scores is so small that PE cannot be undertaken within the prescribed time frame. In such cases, pre-equating (PrE) could be considered as an acceptable alternative. Assessing the feasibility of using item calibrations from the item bank (as in PrE) is conditioned on the equivalency of the calibrations and the errors associated with it vis a vis the results obtained via PE. This paper creates item banks over three periods of item introduction into the banks and uses the Rasch model in examining data with respect to the recovery of item parameters, the measurement error, and the effect cut-points have on examinee placement in both the PrE and PE situations. Results indicate that PrE is a viable solution to PE provided the stability of the item calibrations is enhanced by using large sample sizes (perhaps as large as full-population) in populating the item bank. 
%B Journal of Applied Measurement %V 5 %P 301-18 %G eng %M 15243175 %0 Journal Article %J Journal of Applied Measurement %D 2003 %T Developing an initial physical function item bank from existing sources %A Bode, R. K. %A Cella, D. %A Lai, J. S. %A Heinemann, A. W. %K *Databases %K *Sickness Impact Profile %K Adaptation, Psychological %K Data Collection %K Humans %K Neoplasms/*physiopathology/psychology/therapy %K Psychometrics %K Quality of Life/*psychology %K Research Support, U.S. Gov't, P.H.S. %K United States %X The objective of this article is to illustrate incremental item banking using health-related quality of life data collected from two samples of patients receiving cancer treatment. The kinds of decisions one faces in establishing an item bank for computerized adaptive testing are also illustrated. Pre-calibration procedures include: identifying common items across databases; creating a new database with data from each pool; reverse-scoring "negative" items; identifying rating scales used in items; identifying pivot points in each rating scale; pivot anchoring items at comparable rating scale categories; and identifying items in each instrument that measure the construct of interest. A series of calibrations were conducted in which a small proportion of new items were added to the common core and misfitting items were identified and deleted until an initial item bank had been developed. %B Journal of Applied Measurement %V 4 %P 124-36 %G eng %M 12748405 %0 Journal Article %J Quality of Life Research %D 2003 %T Item banking to improve, shorten and computerize self-reported fatigue: an illustration of steps to create a core item bank from the FACIT-Fatigue Scale %A Lai, J-S. %A Crane, P. K. %A Cella, D. %A Chang, C-H. %A Bode, R. K. %A Heinemann, A. W. %K *Health Status Indicators %K *Questionnaires %K Adult %K Fatigue/*diagnosis/etiology %K Female %K Humans %K Male %K Middle Aged %K Neoplasms/complications %K Psychometrics %K Research Support, Non-U.S. 
Gov't %K Research Support, U.S. Gov't, P.H.S. %K Sickness Impact Profile %X Fatigue is a common symptom among cancer patients and the general population. Due to its subjective nature, fatigue has been difficult to effectively and efficiently assess. Modern computerized adaptive testing (CAT) can enable precise assessment of fatigue using a small number of items from a fatigue item bank. CAT enables brief assessment by selecting questions from an item bank that provide the maximum amount of information given a person's previous responses. This article illustrates steps to prepare such an item bank, using 13 items from the Functional Assessment of Chronic Illness Therapy Fatigue Subscale (FACIT-F) as the basis. Samples included 1022 cancer patients and 1010 people from the general population. An Item Response Theory (IRT)-based rating scale model, a polytomous extension of the Rasch dichotomous model was utilized. Nine items demonstrating acceptable psychometric properties were selected and positioned on the fatigue continuum. The fatigue levels measured by these nine items along with their response categories covered 66.8% of the general population and 82.6% of the cancer patients. Although the operational CAT algorithms to handle polytomously scored items are still in progress, we illustrated how CAT may work by using nine core items to measure level of fatigue. Using this illustration, a fatigue measure comparable to its full-length 13-item scale administration was obtained using four items. The resulting item bank can serve as a core to which will be added a psychometrically sound and operational item bank covering the entire fatigue continuum. %B Quality of Life Research %V 12 %P 485-501 %8 Aug %G eng %M 13677494 %0 Report %D 2002 %T Mathematical-programming approaches to test item pool design %A Veldkamp, B. P. %A van der Linden, W. J. %A Ariel, A. 
%K Adaptive Testing %K Computer Assisted %K Computer Programming %K Educational Measurement %K Item Response Theory %K Mathematics %K Psychometrics %K Statistical Rotation computerized adaptive testing %K Test Items %K Testing %X (From the chapter) This paper presents an approach to item pool design that has the potential to improve on the quality of current item pools in educational and psychological testing and hence to increase both measurement precision and validity. The approach consists of the application of mathematical programming techniques to calculate optimal blueprints for item pools. These blueprints can be used to guide the item-writing process. Three different types of design problems are discussed, namely for item pools for linear tests, item pools for computerized adaptive testing (CAT), and systems of rotating item pools for CAT. The paper concludes with an empirical example of the problem of designing a system of rotating item pools for CAT. %I University of Twente, Faculty of Educational Science and Technology %C Twente, The Netherlands %P 93-108 %@ 02-09 %G eng %0 Journal Article %J Archives of Physical Medicine and Rehabilitation %D 2002 %T Measuring quality of life in chronic illness: the functional assessment of chronic illness therapy measurement system %A Cella, D. %A Nowinski, C. J. %K *Chronic Disease %K *Quality of Life %K *Rehabilitation %K Adult %K Comparative Study %K Health Status Indicators %K Humans %K Psychometrics %K Questionnaires %K Research Support, U.S. Gov't, P.H.S. %K Sensitivity and Specificity %X We focus on quality of life (QOL) measurement as applied to chronic illness. There are 2 major types of health-related quality of life (HRQOL) instruments: generic health status and targeted. Generic instruments offer the opportunity to compare results across patient and population cohorts, and some can provide normative or benchmark data from which to interpret results. 
Targeted instruments ask questions that focus more on the specific condition or treatment under study and, as a result, tend to be more responsive to clinically important changes than generic instruments. Each type of instrument has a place in the assessment of HRQOL in chronic illness, and consideration of the relative advantages and disadvantages of the 2 options best drives choice of instrument. The Functional Assessment of Chronic Illness Therapy (FACIT) system of HRQOL measurement is a hybrid of the 2 approaches. The FACIT system combines a core general measure with supplemental measures targeted toward specific diseases, conditions, or treatments. Thus, it capitalizes on the strengths of each type of measure. Recently, FACIT questionnaires were administered to a representative sample of the general population with results used to derive FACIT norms. These normative data can be used for benchmarking and to better understand changes in HRQOL that are often seen in clinical trials. Future directions in HRQOL assessment include test equating, item banking, and computerized adaptive testing. %B Archives of Physical Medicine and Rehabilitation %V 83 %P S10-7 %8 Dec %G eng %M 12474167 %0 Book Section %B Computer-based tests: Building the foundation for future assessment %D 2002 %T The work ahead: A psychometric infrastructure for computerized adaptive tests %A F Drasgow %E M. P. Potenza %E J. J. Freemer %E W. C. Ward %K Adaptive Testing %K Computer Assisted Testing %K Educational %K Measurement %K Psychometrics %X (From the chapter) Considers the past and future of computerized adaptive tests and computer-based tests and looks at issues and challenges confronting a testing program as it implements and operates a computer-based test. Recommendations for testing programs from The National Council of Measurement in Education Ad Hoc Committee on Computerized Adaptive Test Disclosure are appended. 
(PsycINFO Database Record (c) 2005 APA ) %B Computer-based tests: Building the foundation for future assessment %I Lawrence Erlbaum Associates, Inc. %C Mahwah, N.J. USA %G eng %0 Journal Article %J Journal of Personality Assessment %D 2001 %T Evaluation of an MMPI-A short form: Implications for adaptive testing %A Archer, R. P. %A Tirrell, C. A. %A Elkins, D. E. %K Adaptive Testing %K Mean %K Minnesota Multiphasic Personality Inventory %K Psychometrics %K Statistical Correlation %K Statistical Samples %K Test Forms %X Reports some psychometric properties of an MMPI-Adolescent version (MMPI-A; J. N. Butcher et al, 1992) short form based on administration of the 1st 150 items of this test instrument. The authors report results for both the MMPI-A normative sample of 1,620 adolescents (aged 14-18 yrs) and a clinical sample of 565 adolescents (mean age 15.2 yrs) in a variety of treatment settings. The authors summarize results for the MMPI-A basic scales in terms of Pearson product-moment correlations generated between full administration and short-form administration formats and mean T score elevations for the basic scales generated by each approach. In this investigation, the authors also examine single-scale and 2-point congruences found for the MMPI-A basic clinical scales as derived from standard and short-form administrations. The authors present the relative strengths and weaknesses of the MMPI-A short form and discuss the findings in terms of implications for attempts to shorten the item pool through the use of computerized adaptive assessment approaches. (PsycINFO Database Record (c) 2005 APA ) %B Journal of Personality Assessment %V 76 %P 76-89 %G eng %0 Journal Article %J Journal of Applied Measurement %D 2000 %T CAT administration of language placement examinations %A Stahl, J. %A Bergstrom, B. %A Gershon, R. C. 
%K *Language %K *Software %K Aptitude Tests/*statistics & numerical data %K Educational Measurement/*statistics & numerical data %K Humans %K Psychometrics %K Reproducibility of Results %K Research Support, Non-U.S. Gov't %X This article describes the development of a computerized adaptive test for Cegep de Jonquiere, a community college located in Quebec, Canada. Computerized language proficiency testing allows the simultaneous presentation of sound stimuli as the question is being presented to the test-taker. With a properly calibrated bank of items, the language proficiency test can be offered in an adaptive framework. By adapting the test to the test-taker's level of ability, an assessment can be made with significantly fewer items. We also describe our initial attempt to detect instances in which "cheating low" is occurring. In the "cheating low" situation, test-takers deliberately answer incorrectly questions that they would be fully capable of answering correctly were they taking the test honestly. %B Journal of Applied Measurement %V 1 %P 292-302 %G eng %M 12029172 %0 Journal Article %J Journal of the Acoustical Society of America %D 1997 %T A computerized adaptive testing system for speech discrimination measurement: The Speech Sound Pattern Discrimination Test %A Bochner, J. %A Garrison, W. %A Palmer, L. %A MacKenzie, D. %A Braveman, A. %K *Diagnosis, Computer-Assisted %K *Speech Discrimination Tests %K *Speech Perception %K Adolescent %K Adult %K Audiometry, Pure-Tone %K Human %K Middle Age %K Psychometrics %K Reproducibility of Results %X A computerized, adaptive test-delivery system for the measurement of speech discrimination, the Speech Sound Pattern Discrimination Test, is described and evaluated. 
Using a modified discrimination task, the testing system draws on a pool of 130 items spanning a broad range of difficulty to estimate an examinee's location along an underlying continuum of speech processing ability, yet does not require the examinee to possess a high level of English language proficiency. The system is driven by a mathematical measurement model which selects only test items which are appropriate in difficulty level for a given examinee, thereby individualizing the testing experience. Test items were administered to a sample of young deaf adults, and the adaptive testing system evaluated in terms of respondents' sensory and perceptual capabilities, acoustic and phonetic dimensions of speech, and theories of speech perception. Data obtained in this study support the validity, reliability, and efficiency of this test as a measure of speech processing ability. %B Journal of the Acoustical Society of America %V 101 %P 2289-298 %G eng %M 9104030 %0 Journal Article %J Journal of Outcomes Measurement %D 1997 %T On-line performance assessment using rating scales %A Stahl, J. %A Shumway, R. %A Bergstrom, B. %A Fisher, A. %K *Outcome Assessment (Health Care) %K *Rehabilitation %K *Software %K *Task Performance and Analysis %K Activities of Daily Living %K Humans %K Microcomputers %K Psychometrics %K Psychomotor Performance %X The purpose of this paper is to report on the development of the on-line performance assessment instrument--the Assessment of Motor and Process Skills (AMPS). Issues that will be addressed in the paper include: (a) the establishment of the scoring rubric and its implementation in an extended Rasch model, (b) training of raters, (c) validation of the scoring rubric and procedures for monitoring the internal consistency of raters, and (d) technological implementation of the assessment instrument in a computerized program. 
%B Journal of Outcomes Measurement %V 1 %P 173-191 %G eng %M 9661720 %0 Journal Article %J Nurs Health Care %D 1993 %T Computerized adaptive testing: the future is upon us %A Halkitis, P. N. %A Leahy, J. M. %K *Computer-Assisted Instruction %K *Education, Nursing %K *Educational Measurement %K *Reaction Time %K Humans %K Pharmacology/education %K Psychometrics %B Nurs Health Care %7 1993/09/01 %V 14 %P 378-85 %8 Sep %@ 0276-5284 (Print) %G eng %M 8247367