Abstract
Recent lawsuits reveal common mistakes plaguing current teacher evaluation systems. Drawing on arguments in court documents for prominent cases, the authors find that evaluation systems using value-added measures (VAM) suffer from a) inconsistent and unreliable teacher ratings, b) bias toward and against teachers of certain types of students, c) easy opportunities for administrators to game the system, and d) a lack of transparency. They urge others to engage with these (and other) arguments to design better, more valid, more useful, and ultimately more defensible teacher evaluation systems.

