Abstract
States and districts are under increasing pressure to evaluate the effectiveness of their teachers and to ensure that all students receive high-quality instruction. This article describes some of the challenges associated with current approaches to measuring teacher effectiveness, including paper-and-pencil tests of pedagogical content knowledge, classroom observation systems, and value-added models. It proposes the development of a new teacher evaluation system that uses a virtual reality environment and describes how innovations in educational measurement and technology can be used to develop an improved measure of teacher effectiveness.

