Computerized adaptive testing (CAT) has been around since the 1970s and is well known for the benefits it can provide, most notably that it can reduce testing time by 50-90% with no loss of measurement precision. Developing a sound, defensible CAT is not easy, but our goal is to make it as easy as possible – that is, everything you need is available in a clean software UI and you never have to write a single line of code. Here, we outline the software, data analysis, and project management steps needed to develop and publish a CAT that aligns with best practices and international standards.
This approach is based on the model of Thompson and Weiss (2011); refer there for a general treatment of CAT development, especially the use of simulation studies. This article also assumes you have mastered the concepts of item response theory (IRT) and CAT, including those below (a brief code sketch of the item response and information functions follows the list):
- IRT models (e.g., 3PL, rating scale model)
- Item response functions
- Item information functions
- Theta estimation
- Conditional standard error of measurement
- Item selection algorithms
- Termination criterion
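As a quick refresher on two of these, here is a minimal Python sketch of the 3PL item response function and the corresponding item information function (standard IRT formulas, shown without the D = 1.7 scaling constant; the parameter values are made up for illustration):

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """3PL item response function: probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c))**2

# Example: a moderately discriminating item of average difficulty
theta = np.linspace(-3, 3, 7)
print(p_3pl(theta, a=1.0, b=0.0, c=0.20))
print(info_3pl(theta, a=1.0, b=0.0, c=0.20))
```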
If IRT is new to you, please visit these resources:
http://www.assess.com/what-is-item-response-theory/
http://www.assess.com/how-do-i-implement-item-response-theory/
http://www.assess.com/what-do-dichotomous-and-polytomous-mean-in-irt/
If you have some background in IRT but CAT is new to you, please visit these resources:
http://www.assess.com/monte-carlo-simulation-adaptive-testing/
http://www.assess.com/adaptive-testing/
And for videos that delve more deeply into IRT and CAT, see:
https://www.youtube.com/user/ASCpsychometrics
Overview
There are nine steps to developing a CAT on our industry-leading platform, FastTest:
| Step | Work to be done | Software |
| --- | --- | --- |
| 1 | Perform feasibility and planning studies | CATSim |
| 2 | Develop item bank | FastTest |
| 3 | Pilot items on 100-2,000 examinees | FastTest |
| 4 | Perform item analysis and other due diligence | Iteman/Xcalibre |
| 5 | IRT calibration | Xcalibre |
| 6 | Upload IRT parameters into FastTest | FastTest |
| 7 | Validity study | CATSim |
| 8 | Publish CAT | FastTest |
| 9 | Quality assurance | FastTest |
We’ll now talk a little more about each of these.
Perform feasibility and planning studies
The first step, before doing anything else, is to confirm that your assessment meets the basic requirements of CAT. For example, you need a decent-sized item bank, data on hundreds of examinees (or the future opportunity to collect it), and items that are scoreable in real time. See this paper for a full discussion. If there are no roadblocks, the next step is to perform Monte Carlo simulations with the CATSim software to help you scope out the project. For example, you might simulate CATs with three sizes of item bank, so you have a better idea of how many items to write.
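CATSim does this work for you, but to make the idea concrete, here is a minimal Monte Carlo sketch of the kind of question such a study answers (this is illustrative only, not CATSim’s actual algorithm): generate a fake bank, simulate examinees, select items by maximum information, estimate theta by EAP on a grid, and stop when the standard error falls below a target.

```python
import numpy as np

rng = np.random.default_rng(42)

def p_3pl(theta, a, b, c):
    """3PL item response function."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def simulate_cat(bank, true_theta, se_target=0.30, max_items=50):
    """Run one simulated CAT; return (theta estimate, number of items used)."""
    grid = np.linspace(-4, 4, 81)
    posterior = np.exp(-0.5 * grid**2)        # standard normal prior
    available = list(range(len(bank)))
    theta_hat, se, n_used = 0.0, np.inf, 0
    while available and n_used < max_items and se > se_target:
        # Select the unused item with maximum information at the current theta
        a, b, c = bank[available].T
        p = p_3pl(theta_hat, a, b, c)
        info = a**2 * ((1 - p) / p) * ((p - c) / (1 - c))**2
        pick = available.pop(int(np.argmax(info)))
        a, b, c = bank[pick]
        # Simulate the examinee's response from their true theta
        resp = rng.random() < p_3pl(true_theta, a, b, c)
        # Bayesian update of the posterior over the theta grid
        p_grid = p_3pl(grid, a, b, c)
        posterior = posterior * (p_grid if resp else 1 - p_grid)
        posterior /= posterior.sum()
        theta_hat = float(np.sum(grid * posterior))                    # EAP
        se = float(np.sqrt(np.sum((grid - theta_hat)**2 * posterior)))
        n_used += 1
    return theta_hat, n_used

# Scope the project: how does bank size affect average test length?
for n_items in (100, 200, 300):
    bank = np.column_stack([rng.lognormal(0, 0.3, n_items),   # a parameters
                            rng.normal(0, 1, n_items),        # b parameters
                            rng.uniform(0.1, 0.3, n_items)])  # c parameters
    lengths = [simulate_cat(bank, rng.normal())[1] for _ in range(200)]
    print(f"{n_items} items in bank -> mean CAT length {np.mean(lengths):.1f}")
```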
Develop item bank
Now that you have some idea of how many items you need, in which ranges of difficulty, and with which content constraints, you can leverage the powerful item authoring functionality of FastTest, as well as its item review and workflow management tools, to ensure that subject matter experts are performing quality assurance on one another’s work.
Pilot items
Because IRT requires data from real examinees to calibrate item difficulty, you need to collect that data. To do so, create test(s) in FastTest to deliver all of your items in a manner that suits your practical situation. Some organizations have a captive audience and might be able to have 500 people take all 300 items in the bank next week. Others might need to create 4 linear forms of 100 items with some overlap. Still others might be constrained to keep using their current test forms and only tack 20 new items onto the end of every examinee’s test.
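To make the overlap idea concrete, here is a toy sketch of a common-item pilot design (the item IDs, bank size, and anchor count are hypothetical; your own design should follow your equating plan):

```python
# Toy common-item design: split a 340-item bank into 4 pilot forms of 100
# items, each sharing the same 20 anchor items so the forms can be linked
# (equated) onto a common IRT scale after calibration.
item_ids = [f"ITEM{i:04d}" for i in range(1, 341)]   # hypothetical IDs
anchors, unique = item_ids[:20], item_ids[20:]
forms = {f"Form {f + 1}": anchors + unique[f * 80:(f + 1) * 80]
         for f in range(4)}
for name, items in forms.items():
    print(name, "-", len(items), "items, including", len(anchors), "anchors")
```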
Of course, some of you might have existing data. That is, you might have spreadsheets of data from a previous test delivery system, paper based delivery, or perhaps even already have your IRT parameters from past efforts. You can use those too.
If you do deliver the pilot phase with FastTest, you then need to export the data to be analyzed in psychometric analysis software. This is done with the Export -> Data Analysis Matrix option. You also need to use Export -> Item Metadata for the test forms you used; this becomes the “control file” for your analysis in the next two steps.
Perform item analysis, DIF, and other due diligence
The purpose of this step is to ensure that the items included in your future CAT are of high quality. Any steps that your organization normally takes to review item performance are still relevant. This typically includes a review of items with low point-biserial correlations (poor discrimination), items where more examinees selected a distractor than the correct option (key flags), high or low classical P values, and differential item functioning (DIF) flags. Our Iteman software is designed exactly for this process. If you have a FastTest account, the Iteman analysis report is available at a single click. If not, Iteman is also available as a standalone program.
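Iteman computes these statistics for you, but as a rough illustration of what the flags mean, here is how the basic classical statistics could be computed from a scored 0/1 response matrix (the data, thresholds, and flag names below are hypothetical; use your organization’s own criteria):

```python
import numpy as np

# Hypothetical scored data: 500 examinees x 40 items of 0/1 responses
rng = np.random.default_rng(1)
scores = (rng.random((500, 40)) < 0.7).astype(float)

p_values = scores.mean(axis=0)       # classical difficulty (P value)
total = scores.sum(axis=1)
# Corrected point-biserial: correlate each item with the rest-of-test score
rpbis = np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                  for j in range(scores.shape[1])])

# Typical review flags (thresholds vary by organization)
for j, (p, r) in enumerate(zip(p_values, rpbis), start=1):
    flags = []
    if p < 0.20 or p > 0.95:
        flags.append("extreme P value")
    if r < 0.15:
        flags.append("low point-biserial")
    if flags:
        print(f"item {j}: P = {p:.2f}, rpbis = {r:.2f} ->", flags)
```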
Calibrate with Xcalibre
Because CAT algorithms rely entirely on IRT parameters (unless you are using special approaches such as diagnostic measurement models or measurement decision theory), we need to estimate the IRT parameters and get them into the testing platform. If you delivered all of your items in a single block to examinees, like the example above with 500 people, then that single matrix can simply be analyzed with Xcalibre. If you used multiple forms, linear-on-the-fly testing (LOFT), or the “tack-on” approach, you will need to address IRT equating.
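There are several equating methods; as one simple illustration of the idea, here is the classic mean/sigma method for placing a second form’s parameters onto the base form’s scale using the anchor items the two forms share (the parameter values are made up):

```python
import numpy as np

# b parameters of the common (anchor) items as estimated on each form
b_anchor_A = np.array([-1.2, -0.4, 0.1, 0.8, 1.5])   # base scale (Form A)
b_anchor_B = np.array([-1.0, -0.1, 0.3, 1.1, 1.9])   # scale to transform

# Mean/sigma linking constants: theta_A = A * theta_B + B
A = b_anchor_A.std() / b_anchor_B.std()
B = b_anchor_A.mean() - A * b_anchor_B.mean()

def to_base_scale(a, b):
    """Transform Form B item parameters onto the Form A scale."""
    return a / A, A * b + B

print(f"A = {A:.3f}, B = {B:.3f}")
print(to_base_scale(np.array([1.1]), np.array([0.5])))
```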
Upload IRT parameters into FastTest
Xcalibre provides all of the IRT parameters in a spreadsheet, in addition to the primary Word report. You’ll then need to export Item Metadata from FastTest to get the template for importing, copy and paste the IRT parameters into the template, and then import the Item Metadata back into FastTest. This associates the IRT parameters with all of the items in your CAT pool.
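If your bank is large, a few lines of scripting can replace the copy-and-paste step. This is a sketch only; the file names and column headers below are hypothetical, so match them to your actual Xcalibre output and FastTest metadata export:

```python
import pandas as pd

# Hypothetical file and column names -- check them against your own exports
params = pd.read_csv("xcalibre_parameters.csv")        # from Xcalibre
template = pd.read_csv("fasttest_item_metadata.csv")   # exported template

# Copy the IRT parameters into the template by item ID, then re-import
merged = template.merge(params[["ItemID", "a", "b", "c"]],
                        on="ItemID", how="left")
merged.to_csv("fasttest_item_metadata_with_irt.csv", index=False)
print(merged[["ItemID", "a", "b", "c"]].head())
```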
Validity study
Now that you have established your final pool of items and calculated the IRT parameters, you need to establish the algorithms you will use to publish the CAT. That is, you need to decide on the initial theta rule, the item selection rule (including subalgorithms such as content or exposure constraints), and the termination criterion. To establish these, you perform more simulation studies, but now with your final bank as the input rather than a fake bank from the Monte Carlo simulations. The most important aspect is determining the tradeoff between test length and precision: a termination criterion that demands more precise scores produces longer tests, and with a CAT you can control the exact extent of that tradeoff.
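CATSim reports this tradeoff for you; purely to illustrate the concept, the simulate_cat() sketch from Step 1 could be reused to sweep the standard-error termination threshold (this assumes that function and rng are in scope, with a fake bank standing in for your real, calibrated parameters):

```python
# Sweep the SEM termination threshold to see the length/precision tradeoff.
# Assumes simulate_cat() and rng from the Step 1 sketch are in scope; a fake
# 300-item bank stands in for your real, calibrated item parameters.
import numpy as np

bank = np.column_stack([rng.lognormal(0, 0.3, 300),
                        rng.normal(0, 1, 300),
                        rng.uniform(0.1, 0.3, 300)])
for se_target in (0.40, 0.30, 0.25, 0.20):
    lengths = [simulate_cat(bank, rng.normal(), se_target=se_target)[1]
               for _ in range(200)]
    print(f"SEM <= {se_target:.2f}: mean length {np.mean(lengths):.1f} items")
```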
Publish CAT
Assemble a “test form” in FastTest that consists of all the items you intend to use in your CAT pool. Then select CAT as the delivery method in the Test Options screen, and you’ll see a screen where you can input the results from your CATSim validity study for the three important CAT algorithms.
Quality assurance
Your CAT is now ready to go! Before bringing in real students, however, we recommend that you take it a few times yourself as quality assurance. Do so with certain students in mind, such as a very low-ability student, a very high-ability student, or one near the cutscore (if you have one). To peek under the hood at the CAT algorithm, you can export the Examinee Test Detail Report from FastTest, which provides an item-by-item picture of how the CAT proceeded.
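In the same spirit, a simulation can preview roughly what that item-by-item trace should look like for one of those target students. Here is a self-contained sketch (the bank and settings are illustrative, not FastTest’s actual implementation) that walks a simulated high-ability examinee through a CAT and prints the trace:

```python
import numpy as np

# Walk a simulated very-high-ability examinee (theta = +2.5) through a CAT
# and print an item-by-item trace, similar in spirit to the Examinee Test
# Detail Report. The bank and settings below are illustrative only.
rng = np.random.default_rng(3)
a = rng.lognormal(0, 0.3, 200)
b = rng.normal(0, 1, 200)
c = rng.uniform(0.1, 0.3, 200)

def p3(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

grid = np.linspace(-4, 4, 81)
post = np.exp(-0.5 * grid**2)      # standard normal prior
used, theta, se, true_theta = set(), 0.0, np.inf, 2.5
for step in range(1, 21):
    p = p3(theta, a, b, c)
    info = a**2 * ((1 - p) / p) * ((p - c) / (1 - c))**2
    info[list(used)] = -np.inf     # never readminister an item
    j = int(np.argmax(info))
    used.add(j)
    resp = rng.random() < p3(true_theta, a[j], b[j], c[j])
    post = post * (p3(grid, a[j], b[j], c[j]) if resp
                   else 1 - p3(grid, a[j], b[j], c[j]))
    post /= post.sum()
    theta = float(np.sum(grid * post))
    se = float(np.sqrt(np.sum((grid - theta)**2 * post)))
    print(f"step {step:2d}: item {j:3d}  b = {b[j]:+.2f}  "
          f"correct = {int(resp)}  theta = {theta:+.2f}  SE = {se:.2f}")
    if se < 0.30:
        break
```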