Abstract
In differential item functioning (DIF) analysis, a common metric is needed to compare item parameters between groups of test-takers. In the Rasch model, this common metric is defined by placing the same restriction on the item parameters in each group. However, how to appropriately select the items included in this restriction, termed anchor items, is still a major challenge. This article proposes a conceptual framework for categorizing anchor methods that distinguishes the anchor class, which describes the characteristics of an anchor method, from the anchor selection strategy, which guides how the anchor items are determined. Furthermore, a new anchor class, the iterative forward anchor class, is proposed. Several anchor classes are implemented with different anchor selection strategies and compared in an extensive simulation study. The results show that the new anchor class combined with the single-anchor selection strategy is superior in situations where no prior knowledge about the direction of DIF is available.
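To make the role of the restriction concrete, here is a minimal sketch. It assumes that Rasch item difficulties have already been estimated separately in a reference and a focal group (the values below are made up), and that the restriction sets the mean difficulty of the anchor items to zero in each group. The function names `anchor_centered` and `dif_estimates` are hypothetical. This is not the article's estimation or anchor selection procedure. It only shows how the chosen anchor items feed into the DIF estimates.

```python
from statistics import mean

def anchor_centered(difficulties, anchor):
    """Shift one group's Rasch difficulties so that the anchor items average to zero.

    Assumption: the restriction is 'mean anchor difficulty = 0' in every group.
    """
    shift = mean(difficulties[i] for i in anchor)
    return [b - shift for b in difficulties]

def dif_estimates(b_ref, b_foc, anchor):
    """Focal-minus-reference difficulty differences on the common (anchor-defined) metric."""
    ref = anchor_centered(b_ref, anchor)
    foc = anchor_centered(b_foc, anchor)
    return [f - r for r, f in zip(ref, foc)]

# Hypothetical estimates: the focal metric is shifted by 0.5 overall,
# and item 3 additionally carries 1.0 logits of DIF.
b_ref = [-1.0, 0.0, 0.5, 1.0]
b_foc = [-0.5, 0.5, 1.0, 2.5]

print(dif_estimates(b_ref, b_foc, anchor=[0, 1, 2]))     # DIF-free anchor
print(dif_estimates(b_ref, b_foc, anchor=[0, 1, 2, 3]))  # anchor contaminated by item 3
```

With the DIF-free items 0 to 2 as anchor, the differences are approximately 0, 0, 0, and 1.0, so only item 3 shows DIF. Adding item 3 to the anchor shifts the common metric: the three DIF-free items then appear to have DIF of -0.25 each, and the DIF of item 3 shrinks to 0.75. This is why the selection of anchor items matters.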
