Psychometric measurement models are only valid if measurement invariance holds between test takers of different groups. Global model tests, such as the well-established likelihood ratio (LR) test, are sensitive to violations of measurement invariance, such as differential item functioning and differential step functioning. However, these traditional approaches are only applicable when comparing previously specified reference and focal groups, such as males and females. Here, we propose a new framework for global model tests for polytomous Rasch models based on a model-based recursive partitioning algorithm. With this approach, a priori specification of reference and focal groups is no longer necessary, because they are automatically detected in a data-driven way. The statistical background of the new framework is introduced along with an instructive example. A series of simulation studies illustrates and compares its statistical properties to the well-established LR test. While both the LR test and the new framework are sensitive to differential item functioning and differential step functioning and respect a given significance level regardless of true differences in the ability distributions, the new data-driven approach is more powerful when the group structure is not known a priori—as will usually be the case in practical applications. The usage and interpretation of the new method are illustrated in an empirical application example. A software implementation is freely available in the R system for statistical computing.
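To make the notion of differential step functioning concrete, the following minimal sketch computes category probabilities under the partial credit model, one of the polytomous Rasch models, for two groups that differ in a single threshold. The function name `pcm_probs`, the threshold values, and the size of the shift are illustrative assumptions. The code is not the proposed tree method or its R implementation.

```python
import math


def pcm_probs(theta, thresholds):
    """Category probabilities of one item under the partial credit model.

    theta      : person ability
    thresholds : list of threshold parameters delta_1..delta_m
    Returns a list of m + 1 probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with four response categories (three thresholds).
reference = pcm_probs(0.0, [-1.0, 0.0, 1.0])
# Assumed differential step functioning: only the second threshold is
# harder for the focal group, although both persons have the same ability.
focal = pcm_probs(0.0, [-1.0, 0.8, 1.0])

for name, probs in (("reference", reference), ("focal", focal)):
    print(name, [round(p, 3) for p in probs])
```

With equal ability in both groups, the shifted threshold alone makes the two upper categories less likely for the focal group. This is the kind of violation of measurement invariance that global model tests aim to detect.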
