States and districts are under increasing pressure to evaluate the effectiveness of their teachers and to ensure that all students receive high-quality instruction. This article describes some of the challenges associated with current approaches to measuring teacher effectiveness, including paper-and-pencil tests of pedagogical content knowledge, classroom observation systems, and value-added models. It proposes the development of a new teacher evaluation system based on a virtual reality environment and describes how innovations in educational measurement and technology can be used to build an improved measure of teacher effectiveness.
