Abstract
Findings and discussions about cultural bias in testing have been far from unanimous. Even so, this area of inquiry may hold meaningful implications for educators of any subject. In this review of literature, I describe the issues, research, and arguments surrounding cultural bias in testing and discuss implications for the field of music education. For the purpose of this article, a working description of cultural bias in testing involves (a) significantly different results for definable subgroups of apparently similar ability levels and (b) issues with the fair and equitable interpretation and use of test results. Applications of general education scholarship to music education settings include investigations and perceptions of cultural bias, as well as suggestions for improved fairness: addressing group differences, offering diverse ways to perform, discouraging misuse, and accommodating differences.

