Abstract
The fit of cognitive diagnostic models (CDMs) to response data needs to be evaluated, since CDMs might yield misleading results when they do not fit the data well. The limited-information statistic M2 and the associated root mean square error of approximation (RMSEA2) from item factor analysis were extended to evaluate the fit of CDMs. The findings suggested that the M2 statistic has proper empirical Type I error rates and good statistical power, and that it could be used as a general statistical tool. More importantly, we found a strong linear relationship between mean marginal misclassification rates and RMSEA2 when there was model–data misfit. The evidence demonstrated that .030 and .045 could be reasonable thresholds for excellent and good fit, respectively, under the saturated log-linear cognitive diagnosis model.