Abstract
The use of value-added modeling (VAM) in school accountability is expanding. However, deciding whether and how to adopt VAM can be difficult. Some experts claim it is “too unreliable,” causes “more harm than good,” and has “a big margin for error,” while other experts assert that VAM is “imperfect, but useful” and provides “valuable feedback.” This article attempts to parse these statements by exploring the statistical assumptions underlying VAM, the reliability of its estimates, and the validity of the inferences commonly drawn from those estimates. It then discusses the perverse incentives, unintended consequences, and gaming that might accompany the misuse of VAM. The article concludes that while VAM may in many cases be preferable to other commonly used measurement modes, it should never be used as the sole indicator of teacher effectiveness. Rather, it should be one piece of a larger accountability system.

