Abstract
Routing examinees to modules based on their ability level is central to computerized adaptive multistage testing. Missing responses, however, can complicate the estimation of examinee ability and thereby lead to misrouting, so they must be handled carefully. This study investigated several missing data methods in computerized adaptive multistage testing: two imputation techniques, full information maximum likelihood, and scoring missing responses as incorrect. These methods were examined under missing completely at random, missing at random, and missing not at random mechanisms, as well as other testing conditions, and were compared with baseline conditions in which no data were missing. The imputation and full information maximum likelihood methods outperformed incorrect scoring in terms of average bias, average root mean square error, and the correlation between estimated and true ability (theta) values.
