This brief explication illustrates two challenges in using differential item functioning (DIF) measures when groups differ greatly in true proficiency. Either can inflate Type I error rates, but for very different reasons. First, groups matched on observed score are not necessarily well matched on true proficiency, so inaccurate matching may lead to the false detection of DIF. Second, a model that does not allow for a nonzero lower asymptote can produce what appears to be DIF. These issues have previously been discussed separately in the literature; this article brings them together in a nontechnical form.
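The matching problem can be made concrete with a small numerical sketch. This is not the article's own analysis. It assumes a Rasch model with no DIF on any item, a hypothetical 21-item test, and normally distributed proficiency with unit variance and group means of 0 (reference) and -1 (focal). All function names and parameter values are illustrative. For each observed total score, it computes the expected true proficiency in each group by numerical integration over a grid.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def score_distribution(theta, difficulties):
    """P(total score = s | theta), built up one item at a time."""
    dist = [1.0]
    for b in difficulties:
        p = rasch_p(theta, b)
        new = [0.0] * (len(dist) + 1)
        for s, prob in enumerate(dist):
            new[s] += prob * (1.0 - p)   # item answered incorrectly
            new[s + 1] += prob * p       # item answered correctly
        dist = new
    return dist

def mean_theta_given_score(group_mean, difficulties, n_points=161):
    """E[theta | total score] for a group with theta ~ N(group_mean, 1).

    Integrates over an evenly spaced grid spanning 4 SDs either side
    of the group mean (an assumed, illustrative choice).
    """
    grid = [group_mean - 4.0 + 8.0 * i / (n_points - 1) for i in range(n_points)]
    weights = [math.exp(-0.5 * (t - group_mean) ** 2) for t in grid]
    n_scores = len(difficulties) + 1
    num = [0.0] * n_scores
    den = [0.0] * n_scores
    for t, w in zip(grid, weights):
        for s, prob in enumerate(score_distribution(t, difficulties)):
            num[s] += w * prob * t
            den[s] += w * prob
    return [n / d for n, d in zip(num, den)]

# Hypothetical test: 21 items with difficulties from -2.0 to 2.0.
difficulties = [-2.0 + 0.2 * i for i in range(21)]

reference = mean_theta_given_score(0.0, difficulties)   # assumed mean 0
focal = mean_theta_given_score(-1.0, difficulties)      # assumed mean -1
gaps = [r - f for r, f in zip(reference, focal)]

for s in (5, 10, 15):
    print(s, round(reference[s], 3), round(focal[s], 3), round(gaps[s], 3))
```

Under these assumptions, every entry of `gaps` is positive: at the same observed score, reference-group examinees have higher expected proficiency than focal-group examinees, even though no item functions differently. A procedure that treats equal observed scores as equal proficiency could therefore flag DIF where none exists.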
