Abstract
This brief explication illustrates two challenges in using differential item functioning (DIF) measures when groups differ greatly in true proficiency. Each difficulty may inflate Type I error rates, though for very different reasons. First, groups matched on observed score are not necessarily well matched on true proficiency, so inaccurate matching may lead to the false detection of DIF. Second, a model that does not allow for a nonzero asymptote can produce what seems to be DIF. Both issues have been discussed separately in the earlier literature; this article brings them together in a nontechnical form.
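As a rough illustration of the two issues, the sketch below uses two standard textbook formulas that the abstract itself does not name, so both are assumptions: Kelley's regressed true-score estimate for the matching problem, and the three-parameter logistic (3PL) item response function for the asymptote problem. The function names, parameter values, and group means are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def expected_true_score(observed: float, group_mean: float, reliability: float) -> float:
    """Kelley's regressed estimate: shrink the observed score toward the group mean."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability * observed + (1.0 - reliability) * group_mean


def matching_gap(observed: float, reference_mean: float,
                 focal_mean: float, reliability: float) -> float:
    """Difference in expected true score between two groups at the SAME observed score."""
    return (expected_true_score(observed, reference_mean, reliability)
            - expected_true_score(observed, focal_mean, reliability))


def p_correct(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """3PL probability of a correct response; c = 0 gives a curve with a zero asymptote."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Issue 1: examinees matched on observed score are not matched on true score.
# Hypothetical group means of +0.5 and -0.5 with reliability 0.8.
gap = matching_gap(observed=0.0, reference_mean=0.5, focal_mean=-0.5, reliability=0.8)

# Issue 2: at low proficiency, a curve with a guessing floor sits well above
# a curve forced to approach zero, even though the item is the same.
low_theta = -2.0
with_floor = p_correct(low_theta, a=1.0, b=0.0, c=0.2)
without_floor = p_correct(low_theta, a=1.0, b=0.0)

print(f"true-score gap at matched observed score: {gap:.3f}")
print(f"P(correct) with floor: {with_floor:.3f}, without floor: {without_floor:.3f}")
```

Under these assumed values the gap at a matched observed score equals (1 - reliability) times the difference in group means, so it disappears only when the test is perfectly reliable or the groups have equal means. The second comparison suggests why misfit from a zero-asymptote model concentrates at the low end of the scale, where a lower-scoring group has more of its examinees.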
