Abstract
Psychometric measurement models are only valid if measurement invariance holds between test takers of different groups. Global model tests, such as the well-established likelihood ratio (LR) test, are sensitive to violations of measurement invariance, such as differential item functioning (DIF) and differential step functioning (DSF). However, these traditional approaches are only applicable when comparing previously specified reference and focal groups, such as males and females. Here, we propose a new framework for global model tests for polytomous Rasch models based on a model-based recursive partitioning algorithm. With this approach, reference and focal groups no longer need to be specified a priori, because they are detected automatically in a data-driven way. The statistical background of the new framework is introduced along with an instructive example. A series of simulation studies illustrates its statistical properties and compares them with those of the LR test. Both the LR test and the new framework are sensitive to DIF and DSF and respect a given significance level regardless of true differences in the ability distributions, but the new data-driven approach is more powerful when the group structure is not known a priori, as will usually be the case in practical applications. The usage and interpretation of the new method are illustrated in an empirical application example. A software implementation is freely available in the R system for statistical computing.
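To make the traditional comparison concrete: the classical LR test (usually attributed to Andersen) fits the polytomous Rasch model once to the pooled sample and once separately within each pre-specified group. It then compares the maximized conditional log-likelihoods via LR = 2 (Σ_g ℓ_g − ℓ), which is referred to a chi-square distribution with (G − 1) × (number of free item parameters) degrees of freedom. The Python sketch below is illustrative, not the authors' implementation. It assumes the log-likelihoods have already been obtained from some conditional maximum likelihood fit. The function names are hypothetical, and the numbers in the usage lines are invented for illustration; they are not results from this study. The chi-square tail probability is computed with the standard library only.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a series over half-integer gamma values.
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def andersen_lr_test(
    loglik_pooled: float,
    loglik_groups: Sequence[float],
    n_free_params: int,
) -> Tuple[float, int, float]:
    """LR test of item-parameter equality across pre-specified groups.

    loglik_pooled: maximized conditional log-likelihood, all groups together.
    loglik_groups: maximized conditional log-likelihoods, one per group.
    n_free_params: free item parameters per group (assumed supplied by the user).
    """
    if len(loglik_groups) < 2:
        raise ValueError("need at least two groups")
    lr = 2.0 * (sum(loglik_groups) - loglik_pooled)
    df = (len(loglik_groups) - 1) * n_free_params
    return lr, df, chi2_sf(lr, df)


if __name__ == "__main__":
    # Hypothetical setting: 5 items with 3 ordered categories each gives
    # 5 * 2 - 1 = 9 free threshold parameters per group under the
    # partial credit model (one parameter fixed for identification).
    lr, df, p = andersen_lr_test(-1520.4, [-760.1, -752.3], n_free_params=9)
    print(f"LR = {lr:.2f}, df = {df}, p = {p:.4f}")
```

This covers only the traditional setting, in which groups such as males and females must be fixed before testing. The data-driven framework proposed here instead searches for the group structure itself by model-based recursive partitioning, which this sketch does not attempt to reproduce.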