Differential item functioning (DIF) indicates a violation of the invariance assumption, for instance, in models based on item response theory (IRT). Item-wise DIF analysis using IRT requires a common metric for the item parameters of the groups to be compared (e.g., the reference and the focal group). In the Rasch model, this common metric is established by imposing the same linear restriction in both groups. The items included in the restriction are termed "anchor items." Ideally, these items are DIF-free, so that false alarm rates are not artificially inflated. However, how to select DIF-free anchor items appropriately remains a major challenge. Furthermore, various authors point out the lack of new anchor selection strategies and the lack of a comprehensive study, especially for dichotomous IRT models. This article reviews existing anchor selection strategies that do not require any knowledge prior to DIF analysis, offers a straightforward notation, and proposes three new anchor selection strategies. An extensive simulation study compares the performance of the anchor selection strategies. The results show that appropriate anchor selection is crucial for suitable item-wise DIF analysis. The newly suggested anchor selection strategies outperform the existing strategies and can reliably locate a suitable anchor when the sample sizes are large enough.
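
To illustrate why the choice of anchor matters, the following minimal sketch places Rasch item difficulties from two groups on a common metric by requiring the anchor items to sum to zero in each group, and then compares the items across groups. This is only an illustration under stated assumptions, not the procedure evaluated in the article: the sum-zero form of the restriction, the function names, and the difficulty values are all hypothetical, and no significance test is carried out.

```python
def anchor_shift(betas, anchor):
    """Shift item difficulties so that the anchor items sum to zero.

    Assumption: the linear restriction is a sum-zero constraint on the
    anchor items (one common choice; others are possible).
    """
    mean_anchor = sum(betas[j] for j in anchor) / len(anchor)
    return [b - mean_anchor for b in betas]


def item_differences(betas_ref, betas_foc, anchor):
    """Item-wise difficulty differences (reference minus focal) after
    both groups were restricted with the same anchor items."""
    ref = anchor_shift(betas_ref, anchor)
    foc = anchor_shift(betas_foc, anchor)
    return [r - f for r, f in zip(ref, foc)]


# Hypothetical difficulty estimates for five items. The focal group's scale
# is shifted by 0.5 (an arbitrary scale offset, not DIF), and the item at
# index 3 additionally carries a true DIF effect of 0.8.
betas_ref = [-1.0, -0.5, 0.0, 0.5, 1.0]
betas_foc = [-0.5, 0.0, 0.5, 1.8, 1.5]

# Anchor made of DIF-free items (indices 0 and 1).
clean = item_differences(betas_ref, betas_foc, anchor=[0, 1])

# Anchor contaminated by the DIF item (indices 3 and 4).
contaminated = item_differences(betas_ref, betas_foc, anchor=[3, 4])

print([round(d, 2) for d in clean])
print([round(d, 2) for d in contaminated])
```

With the DIF-free anchor, only the item at index 3 shows a difference (about -0.8) and the remaining items show none. With the contaminated anchor, every item shows a difference of about 0.4 in absolute value, so the DIF-free items would appear to have DIF. This is the mechanism behind the inflated false alarm rates mentioned above.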
