Abstract
Multilevel data structures are ubiquitous in the assessment of differential item functioning (DIF), particularly in large-scale testing programs. A handful of DIF procedures are available to researchers that appropriately account for multilevel data structures. However, little, if any, work has extended the Mantel–Haenszel (MH) procedure, a popular, flexible, and effective tool for DIF detection, to this case. Thus, the primary goal of this study was to introduce several new options for MH-based DIF assessment in the presence of multilevel data and to investigate their effectiveness. The performance of these multilevel extensions was compared with that of the standard MH test through a simulation study in which data were generated in a multilevel framework, corresponding, for example, to examinees nested in schools. Results demonstrated that the multilevel tests were preferable to standard MH in a wide variety of cases where multilevel data were present, particularly when the intraclass correlation was relatively large. Implications of this study for practice and future research are discussed.
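For readers unfamiliar with the baseline method, the following is a minimal sketch of the standard (single-level) MH chi-square statistic and common odds ratio, computed over 2 × 2 tables formed at each matched score level. It is not one of the multilevel extensions examined in the study, and it does not adjust for clustering. The function name `mantel_haenszel` and the `(a, b, c, d)` table layout are assumptions made for illustration.

```python
def mantel_haenszel(tables):
    """Standard MH statistics for one studied item (illustrative sketch).

    tables: iterable of (a, b, c, d) counts, one tuple per matched score level:
        a = reference group, correct     b = reference group, incorrect
        c = focal group, correct         d = focal group, incorrect
    Returns (chi2, odds_ratio): the continuity-corrected MH chi-square
    (1 degree of freedom) and the MH common odds ratio estimate.
    """
    sum_a = sum_expected = sum_var = 0.0
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            # A stratum with fewer than two examinees carries no information.
            continue
        n_ref, n_foc = a + b, c + d          # group sizes in this stratum
        m_right, m_wrong = a + c, b + d      # correct / incorrect totals
        sum_a += a
        sum_expected += n_ref * m_right / n
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    if sum_var == 0 or den == 0:
        raise ValueError("statistics are undefined for these tables")
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return chi2, num / den


# Hypothetical counts at three matched score levels (not data from the study).
example_tables = [(30, 20, 22, 28), (40, 10, 33, 17), (45, 5, 41, 9)]
chi2, odds_ratio = mantel_haenszel(example_tables)
print(f"MH chi-square = {chi2:.3f}, common odds ratio = {odds_ratio:.3f}")
```

Because this version treats all examinees as independent, it ignores the dependence among examinees within the same school, which is the situation the multilevel extensions described above are meant to address.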
