Abstract
This study examines the validity of a statewide teacher evaluation system in the Commonwealth of Virginia. Three hundred thirty-eight teachers from 16 at-risk schools located in eight school districts participated in an evaluation system pilot during the 2011-2012 academic year. Teachers received ratings on six teacher effectiveness process standards and one student academic progress outcome measure. For the outcome measure, student academic progress was measured by student growth percentiles (where available and appropriate) and student achievement goal setting (i.e., student learning objectives). The study examines the internal validity of the system, specifically (1) the relationship between the six teacher effectiveness process standards and the student academic progress outcome measure and (2) the relationship between ratings on the outcome measure for teachers with and without student growth percentile data.
Author Biographies
Xianxuan Xu is a senior research associate at Stronge and Associates Educational Consulting, LLC. She received her doctorate from the College of William and Mary's Educational Policy, Planning, and Leadership Program. Her research interests are teacher effectiveness, teacher and principal evaluation, and cross-cultural comparative analyses of teacher qualities and evaluation.
Leslie W. Grant is associate dean for Academic Programs and associate professor of Educational Policy, Planning, and Leadership in the School of Education at the College of William & Mary. Her research interests focus on classroom-based assessments and teacher quality. She is involved in several research projects, including international comparative case studies of award-winning teachers in the United States and China and the development of assessment literacy in pre-service teachers, in-service teachers, and educational leaders.
Thomas J. Ward is professor in the School of Education at the College of William and Mary in Williamsburg, Virginia. Among his primary research interests are the use of data modeling for teaching and school improvement, the use of test data in decision making, and the evaluation of at-risk programs. He has worked as a consultant with the state departments of education in Virginia, South Carolina, and Pennsylvania and with numerous school divisions in Virginia, Pennsylvania, New Jersey, and Delaware.

