Multilevel data structures are ubiquitous in the assessment of differential item functioning (DIF), particularly in large-scale testing programs. Researchers can select from a handful of DIF procedures that appropriately account for multilevel data structures. However, little if any work has extended the Mantel–Haenszel (MH) procedure, a popular, flexible, and effective tool for DIF detection, to this case. Thus, the primary goal of this study was to introduce several new MH-based options for DIF assessment in the presence of multilevel data and to investigate their effectiveness. These multilevel extensions of MH were compared with the standard MH test in a simulation study in which data were generated within a multilevel framework (for example, examinees nested in schools). Results showed that the multilevel tests were preferable to standard MH in a wide variety of conditions with multilevel data, particularly when the intraclass correlation was relatively large. Implications of this study for practice and future research are discussed.
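For reference, the single-level MH statistics that serve as the baseline comparison can be sketched as follows. This is a minimal Python illustration of the standard textbook formulas, not the authors' code. The function name `mantel_haenszel` and the (A, B, C, D) cell layout are hypothetical. Examinees are assumed to be stratified by a matching score, and the -2.35 ln(alpha_MH) rescaling follows the common ETS delta-metric convention. The sketch deliberately treats examinees as independent. Clustered (multilevel) data violate exactly this assumption. The multilevel extensions studied in the article are not reproduced here.

```python
import math
from typing import Dict, Sequence, Tuple

# One 2x2 table per matching-score stratum k (hypothetical layout):
# (A, B, C, D) = (reference correct, reference incorrect,
#                 focal correct, focal incorrect)
Table = Tuple[int, int, int, int]


def mantel_haenszel(tables: Sequence[Table]) -> Dict[str, float]:
    """Standard single-level MH DIF statistics for one studied item.

    Hypothetical helper: assumes independent examinees, i.e. it ignores
    any clustering such as students nested in schools.
    """
    sum_a = sum_ea = sum_var = 0.0
    num = den = 0.0  # numerator/denominator of the common odds ratio
    for a, b, c, d in tables:
        t = a + b + c + d
        if t < 2:
            continue  # hypergeometric variance undefined for t < 2
        n_ref, n_foc = a + b, c + d  # group sizes in this stratum
        m1, m0 = a + c, b + d        # numbers correct / incorrect
        sum_a += a
        sum_ea += n_ref * m1 / t
        sum_var += n_ref * n_foc * m1 * m0 / (t * t * (t - 1))
        num += a * d / t
        den += b * c / t
    if sum_var == 0 or num == 0 or den == 0:
        raise ValueError("degenerate tables: statistic undefined")
    # Continuity-corrected MH chi-square with 1 degree of freedom
    chi2 = (abs(sum_a - sum_ea) - 0.5) ** 2 / sum_var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    alpha = num / den  # MH common odds ratio
    return {
        "chi2": chi2,
        "p_value": p_value,
        "alpha_mh": alpha,
        "mh_d_dif": -2.35 * math.log(alpha),  # ETS delta metric
    }


no_dif = mantel_haenszel([(30, 20, 30, 20), (40, 10, 40, 10)])
dif = mantel_haenszel([(40, 10, 25, 25), (45, 5, 35, 15)])
print(no_dif)
print(dif)
```

When both groups have identical response patterns within each stratum, the common odds ratio is 1 and MH D-DIF is 0. When the reference group outperforms the focal group at every score level, alpha_MH exceeds 1 and MH D-DIF is negative. Because the variance term assumes independent examinees, clustering within schools tends to make this baseline test overly liberal. This is the motivation for the multilevel extensions.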
