Research Strategies in CAT

Live CAT Administration

Live CAT testing involves the administration of real tests to live examinees. To implement a live-testing research study you will need test items that have had item parameters estimated by an appropriate IRT model. Most live CAT administrations are based on test items that can be scored dichotomously, such as multiple-choice items answered correct or incorrect. For these types of items, IRT parameter estimates can be obtained from a number of different software programs. You will also need software that will deliver your CATs to an appropriate group of examinees. Prior to implementing a live-testing CAT research study or a CAT testing program, it is advisable to run a number of simulations to determine how CAT will function with the item banks that you have available.

Live-testing CAT research studies have a number of advantages and disadvantages:

  1. They require live examinees and live test administration and, therefore, are expensive and time-consuming.
  2. Live-testing studies can answer only a limited number of questions because they rely on a restricted set of criterion variables, such as correlations with other tests and group differences in performance.
  3. Because test data are obtained from live examinees, the data include an indeterminate degree of "noise" because real people do not always respond in accordance with a specified IRT model.
  4. Nevertheless, live-testing studies are essential for answering certain kinds of questions, such as "which model is appropriate for use with a particular test in a practical testing application?"
  5. Live-testing studies are useful in providing starting points for monte-carlo simulations.
  6. This type of study might also be useful in confirming the results of some monte-carlo simulations.

CAT Simulations

Three types of simulations are used in CAT research, depending on the research question -- post-hoc (or "real-data") simulations, monte-carlo simulations, and hybrid simulations.

Post-Hoc or Real-Data Simulations

This approach to simulation is used when CAT is to be used to reduce the length of a test that has been administered conventionally. The "item bank" used in this case would be all the items in a conventional test. The objective of applying CAT procedures is to determine how much reduction in test length can be achieved by "re-administering" the items adaptively, without significant changes in the psychometric properties of the test scores. The data are the item responses of a group of examinees on the conventional test that is being analyzed. A real-data simulation proceeds as follows:

  1. Using the available item response data, estimate IRT parameters for the items in the test using appropriate software.
  2. Use those parameter estimates to estimate theta for each examinee by maximum likelihood or Bayesian methods.
  3. Apply maximum information CAT with maximum likelihood (or Bayesian) theta estimation to adaptively estimate theta for each examinee based on the item responses available for each examinee from the conventional test administration.
  4. Terminate the CAT when a prespecified termination criterion has been reached (e.g., a standard error of .25, minimum information in the next item of .10 or less, or a specific number of items).
  5. Compare the CAT theta estimates with the conventional test theta estimates as a function of the number of items administered in the CAT.
  6. Determine adaptive test lengths that result in maximum similarity between the CAT theta estimates and those of the conventional test, with a minimum number of CAT items.
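
The steps above can be sketched for a single examinee as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes a two-parameter logistic (2PL) model, Bayesian (EAP) theta estimation with a standard normal prior on a fixed grid, and termination on either a standard error of .25 or a maximum number of items. The function names (p_correct, eap_estimate, post_hoc_cat) and the small item bank are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Quadrature grid for the Bayesian (EAP) estimate: -4.0 to 4.0 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap_estimate(items, responses):
    """Return (EAP theta, posterior SD) under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for (a, b), u in zip(items, responses):
            p = p_correct(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def post_hoc_cat(bank, full_responses, se_target=0.25, max_items=None):
    """Adaptively 're-administer' one examinee's recorded responses.

    bank: list of (a, b) parameter estimates from the conventional test.
    full_responses: the examinee's 0/1 responses to every item in the bank.
    """
    max_items = max_items or len(bank)
    used, theta, se = [], 0.0, float("inf")
    while len(used) < max_items and se > se_target:
        remaining = [i for i in range(len(bank)) if i not in used]
        # Maximum-information selection at the current theta estimate.
        nxt = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        used.append(nxt)
        # The "response" is looked up from the recorded data, not generated.
        theta, se = eap_estimate([bank[i] for i in used],
                                 [full_responses[i] for i in used])
    return theta, se, used

# Hypothetical five-item bank and one examinee's recorded responses.
bank = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0), (1.2, 0.5), (1.0, -0.5)]
recorded = [1, 1, 0, 0, 1]
theta, se, used = post_hoc_cat(bank, recorded)
full_theta, full_se = eap_estimate(bank, recorded)
```

Running post_hoc_cat over every examinee and comparing the resulting theta estimates with the full-test estimates at each test length gives the comparison described in steps 5 and 6. With a bank this small the standard error target is never reached, so all items are used; a realistic item bank would normally terminate earlier.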

Post-hoc simulations can also be used when a group of examinees has responded to all of the items in a CAT item bank.  Post-hoc simulations can be implemented with available software.

Monte-Carlo Simulations

Monte-carlo simulation can be used to evaluate the potential performance of various approaches to CAT with various populations and to evaluate the potential performance of CAT using real or hypothetical item banks. This approach to simulation typically is used prior to the implementation of a live CAT testing program to evaluate the performance of a calibrated item bank and to determine operational CAT parameters, such as appropriate values of CAT entry theta values, termination criteria, and item exposure.

Monte-carlo simulation differs from real-data simulation in the following characteristics:

  1. "Examinees" (generally referred to as "simulees") are generated by the simulation process to have specified distributions of theta.
  2. Item parameters can either be generated to have specified values and distributions, or item parameter estimates from real items can be used as the CAT item bank.
  3. Item responses of the simulees are generated from an appropriate IRT model.
  4. CATs are then administered using prespecified CAT algorithm values to answer a specific research question or set of research questions.

The "monte-carlo" name for this procedure results from the random element introduced in the procedure when item responses are generated from an IRT model.
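
The data-generation part of a monte-carlo study can be sketched as follows. This is only an illustration under assumptions that are not part of the text above: a 2PL response model, normally distributed thetas, discrimination parameters drawn uniformly from 0.5 to 2.0, and normally distributed difficulties. The function names and the seed are hypothetical; a real study would choose these distributions to match its research question.

```python
import math
import random

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def generate_simulees(n, rng, mean=0.0, sd=1.0):
    """Draw n simulee thetas from a specified normal distribution."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def generate_bank(n_items, rng):
    """Draw hypothetical (a, b) item parameters for the item bank."""
    return [(rng.uniform(0.5, 2.0), rng.gauss(0.0, 1.0))
            for _ in range(n_items)]

def generate_responses(thetas, bank, rng):
    """Generate 0/1 responses: correct when a uniform draw falls below P(theta).

    This random draw is the "monte-carlo" element of the procedure.
    """
    return [[1 if rng.random() < p_correct(t, a, b) else 0 for a, b in bank]
            for t in thetas]

rng = random.Random(2024)  # fixed seed so the study can be reproduced
thetas = generate_simulees(500, rng)
bank = generate_bank(40, rng)
responses = generate_responses(thetas, bank, rng)
```

The resulting response matrix, together with the known generating thetas, can then be passed to a CAT algorithm so that the CAT theta estimates are compared with the true values rather than with another estimate.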

A monte-carlo study requires the researcher to carefully specify appropriate values for all the variables in the study. Thus, the number and distribution of examinees, number of items, distributions of IRT item parameters, IRT model, and all CAT algorithm values are under the control of the researcher. Harwell, Stone, Hsu, & Kirisci (1996) provide an excellent discussion of considerations in the design and implementation of monte-carlo simulation studies using IRT.

Hybrid Simulations

Hybrid simulations combine elements of both post-hoc simulations and monte-carlo simulations.  They are designed to operate with data matrices that are sparsely populated with item response data.  Such response matrices frequently result from data collection designs used in developing CAT item banks when a linking or anchor test is used along with different subsets of additional items administered to different groups of examinees.  Item parameters are then estimated from these sparse data matrices.

To implement a hybrid simulation with an item response matrix of this type, available item responses are used for each examinee to estimate theta with the estimated item parameters.  That theta then is used to impute missing item responses for each examinee, to obtain a complete item response matrix.  The complete item response matrix is then used to simulate CAT administration using post-hoc simulation methods.
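
The imputation step can be sketched as follows. This is a minimal illustration, assuming a 2PL model, EAP theta estimation with a standard normal prior, and missing responses coded as None; the function names (eap_theta, impute_row) and the example data are hypothetical and are not taken from any particular software package.

```python
import math
import random

def p_correct(theta, a, b):
    """Probability of a correct response under an assumed 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Quadrature grid for the EAP estimate: -4.0 to 4.0 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap_theta(bank, row):
    """EAP theta from the observed (non-None) responses in one row."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for (a, b), u in zip(bank, row):
            if u is None:
                continue  # missing responses carry no information
            p = p_correct(t, a, b)
            w *= p if u == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(GRID, weights)) / total

def impute_row(bank, row, rng):
    """Keep observed responses; generate missing ones at the estimated theta."""
    theta = eap_theta(bank, row)
    return [u if u is not None
            else (1 if rng.random() < p_correct(theta, a, b) else 0)
            for (a, b), u in zip(bank, row)]

# Hypothetical four-item bank; this examinee answered items 1 and 3 only.
bank = [(1.0, -1.0), (1.2, 0.0), (0.9, 1.0), (1.1, 0.5)]
row = [1, None, 0, None]
complete_row = impute_row(bank, row, random.Random(7))
```

Applying impute_row to every examinee yields the complete response matrix, which can then be used in a post-hoc simulation in the usual way.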

Several general-purpose software packages for CAT simulations are available.

References

Harwell, M., Stone, C. A., Hsu, T.-C., & Kirisci, L. (1996). Monte carlo studies in item response theory. Applied Psychological Measurement, 20, 101-125.

Nydick, S., & Weiss, D. J. (2009). A hybrid procedure for the development of CATs. In D. J. Weiss (Ed.), Proceedings of the 2009 GMAC Conference on Computerized Adaptive Testing.