Abstract
Differential item functioning (DIF) can undermine the validity of cross-lingual comparisons. Although many efficient statistics for detecting DIF are available, few general findings explain DIF results. The objective of this article was to study sources of DIF using a mixed methods design. The design involved a quantitative phase in which DIF was analyzed, followed by a qualitative phase in which cognitive interviews were conducted. To illustrate the proposal, polytomous DIF was analyzed in the scales from the PISA (Programme for International Student Assessment) Student Questionnaire (Organisation for Economic Co-operation and Development). The evidence obtained linked DIF to differences in the interpretation patterns of participants from the different linguistic groups. Finally, the benefits of a mixed methods design for analyzing equivalence in cross-lingual assessments are discussed.
References

Allalouf, A., Hambleton, R. K., Sireci, S. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36, 185-198.

Beatty, P., Willis, G. B. (2007). Research synthesis: The practice of cognitive interviewing. Public Opinion Quarterly, 71, 287-311.

Bryman, A. (1988). Quantity and quality in social research. London, England: Unwin Hyman.

Castillo-Díaz, M., Padilla, J-L. (2012). How cognitive interviewing can provide validity evidence of the response processes to scale items. Social Indicators Research. Advance online publication. doi:10.1007/s11205-012-0184-8

Clarke, P. N., Yaros, P. S. (1988). Research blenders: Commentary and response. Nursing Science Quarterly, 1, 147-149.

Creswell, J. W. (1995). Research design: Qualitative and quantitative approaches. Thousand Oaks, CA: Sage.

Edmeades, J., Nyblade, L., Malhotra, A., MacQuarrie, K., Parasuraman, S., Walia, S. (2010). Methodological innovation in studying abortion in developing countries: A narrative quantitative survey in Madhya Pradesh, India. Journal of Mixed Methods Research, 4(3), 176-198.

Elosua, P., López-Jáuregui, A. (2007). Potential sources of differential item functioning in the adaptation of tests. International Journal of Testing, 7, 39-52.

Ercikan, K., Arim, R., Law, D., Domene, J., Gagnon, F., Lacroix, S. (2010). Application of think aloud protocols for examining and confirming sources of differential item functioning identified by expert reviews. Educational Measurement: Issues and Practice, 29(2), 24-35.

Ferne, T., Rupp, A. A. (2007). A synthesis of 15 years of research on DIF in language testing: Methodological advances, challenges, and recommendations. Language Assessment Quarterly, 4, 113-148.

Gadermann, A. M., Guhn, M., Zumbo, B. D. (2011). Investigating the substantive aspect of construct validity for the satisfaction with life scale adapted for children: A focus on cognitive processes. Social Indicators Research, 100, 37-60.

Gierl, M. J., Khaliq, S. N. (2001). Identifying sources of differential item and bundle functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38, 164-187.

Hambleton, R. K. (2006). Good practices for identifying differential item functioning. Medical Care, 44, S182-S188.

Harkness, J. A. (2003). Questionnaire translation. In Harkness, J. A., van de Vijver, F., Mohler, P. Ph (Eds.), Cross-cultural survey methods (pp. 19-34). New York, NY: Wiley.

Hidalgo, M. D., Gómez-Benito, J. (2010). Education measurement: Differential item functioning. In Peterson, P., Baker, E., McGaw, B. (Eds.), International encyclopedia of education (3rd ed., pp. 36-44). New York, NY: Elsevier Science.

International Association for the Evaluation of Educational Achievement. (2011). Progress in International Reading Literacy Study (PIRLS). Alexandria, VA: National Center for Education Statistics.

International Test Commission. (2010). International test commission guidelines for translating and adapting tests. Retrieved from http://www.intestcom.org

Johnson, T. P. (2006). Methods and frameworks for crosscultural measurement. Medical Care, 44(11 Suppl. 3), S17-S20.

Linn, R. L., Harnisch, D. L. (1981). Interactions between item content and group membership on achievement test items. Journal of Educational Measurement, 18, 109-118.

Miller, K. (2007, June). Design and analysis of cognitive interviews for cross-national testing. Paper presented at the European Survey Research Association annual meeting, Prague, Czech Republic.

Miller, T. R., Spray, J. A. (1993). Logistic discriminant function analysis for DIF identification of polytomously scored items. Journal of Educational Measurement, 30, 107-122.

Millsap, R. E., Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297-334.

National Center for Health Statistics. Q-notes software. Retrieved from https://www.cdc.gov/qnotes/login.aspx

Organisation for Economic Co-operation and Development. (2006). PISA 2006 database. Retrieved from http://pisa2006.acer.edu.au/downloads.php

Organisation for Economic Co-operation and Development. (2009). Programme for International Student Assessment (PISA). Paris, France: Author.

Organisation for Economic Co-operation and Development. (2011). Programme for the International Assessment of Adult Competencies (PIAAC). Paris, France: Author.

Penfield, R. D. (2005). DIFAS: Differential item functioning analysis system. Applied Psychological Measurement, 29, 150-151.

Penfield, R. D. (2010). Distinguishing between net and global DIF in polytomous items. Journal of Educational Measurement, 47, 129-149.

Penfield, R. D., Alvarez, K., Lee, O. (2009). Using a taxonomy of differential step functioning to improve the interpretation of DIF in polytomous items: An illustration. Applied Measurement in Education, 22, 61-78.

Penfield, R. D., Gattamorta, K., Childs, R. A. (2009). An NCME instructional module on using differential step functioning to refine the analysis of DIF in polytomous items. Educational Measurement: Issues and Practice, 28, 38-49.

Presser, S., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., Rothgeb, J. M., Singer, E. (2004). Methods for testing and evaluating survey questions. Public Opinion Quarterly, 68, 109-130.

Roussos, L. A., Stout, W. F. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355-371.

Schmeiser, C. B. (1982). Use of experimental design in statistical item bias studies. In Berk, R. A. (Ed.), Handbook of methods for detecting test bias (pp. 64-96). Baltimore, MD: Johns Hopkins University Press.

Shealy, R., Stout, W. F. (1993). An item response theory model for test bias. In Holland, P. W., Wainer, H. (Eds.), Differential item functioning (pp. 197-239). Hillsdale, NJ: Lawrence Erlbaum.

Sireci, S. G. (1997). Problems and issues in linking tests across languages. Educational Measurement: Issues and Practice, 16, 12-19.

Sireci, S. G., Patsula, L., Hambleton, R. K. (2005). Statistical methods for identifying flaws in the test adaptation process. In Hambleton, R. K., Merenda, P. F., Spielberger, C. D. (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 93-115). Hillsdale, NJ: Lawrence Erlbaum.

SPSS Inc. (2007). SPSS-16 user’s guide. Chicago, IL: Author.

Swanson, D. B., Clauser, B. E., Case, S. M., Nungester, R. J., Featherman, C. (2002). Analysis of differential item functioning (DIF) using hierarchical logistic regression models. Journal of Educational and Behavioral Statistics, 27, 53-75.

Tashakkori, A., Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches. Thousand Oaks, CA: Sage.

van de Vijver, F. J. R., Poortinga, Y. H. (2005). Conceptual and methodological issues in adapting tests. In Hambleton, R. K., Merenda, P. F., Spielberger, C. D. (Eds.), Adapting educational and psychological tests for cross-cultural assessment (pp. 39-63). Hillsdale, NJ: Lawrence Erlbaum.

Willis, G. B. (2005). Cognitive interviewing. Thousand Oaks, CA: Sage.

Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In Holland, P. W., Wainer, H. (Eds.), Differential item functioning (pp. 337-347). Hillsdale, NJ: Lawrence Erlbaum.

Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, Ontario, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In Lissitz, R. W. (Ed.), The concept of validity: Revisions, new directions and applications (pp. 65-82). Charlotte, NC: Information Age.

