Abstract
Differential item functioning (DIF) indicates a violation of the invariance assumption, for instance, in models based on item response theory (IRT). Item-wise DIF analysis using IRT requires a common metric for the item parameters of the groups to be compared (e.g., the reference and the focal group). In the Rasch model, therefore, the same linear restriction is imposed in both groups. The items included in the restriction are termed "anchor items". Ideally, these items are DIF-free, so that false alarm rates are not artificially inflated. However, how to select DIF-free anchor items appropriately remains a major challenge. Furthermore, various authors point out the lack of new anchor selection strategies and the lack of a comprehensive study, especially for dichotomous IRT models. This article reviews existing anchor selection strategies that do not require any knowledge prior to DIF analysis, offers a straightforward notation, and proposes three new anchor selection strategies. An extensive simulation study is conducted to compare the performance of the anchor selection strategies. The results show that appropriate anchor selection is crucial for suitable item-wise DIF analysis. The newly suggested anchor selection strategies outperform the existing strategies and can reliably locate a suitable anchor when the sample sizes are large enough.
