A common strategy for estimating treatment effects in observational studies with student-level data is analysis of covariance (ANCOVA) or a hierarchical variant of it, in which outcomes (often standardized test scores) are regressed on pretreatment test scores, other student characteristics, and treatment group indicators. Measurement error in the prior test scores, which is typically both large and heteroscedastic, is regularly overlooked in empirical analyses. It can erode the ability of regression models to adjust for student factors and can therefore bias treatment effect estimates. We develop extensions of method-of-moments, simulation-extrapolation (SIMEX), and latent regression approaches that correct for measurement error using the conditional standard errors of measurement of the test scores. We demonstrate their effectiveness relative to simpler alternatives through simulation and through a case study of teacher value-added effect estimation based on longitudinal data from a large suburban school district.
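
The bias mechanism described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' estimator: it fits a naive ANCOVA on an error-prone prior score and then applies a textbook method-of-moments correction that subtracts the summed squared conditional standard errors of measurement from the cross-product matrix. All names (`simulate`, `naive_ancova`, `mom_ancova`), the data-generating model, and the parameter values are assumptions made for illustration. For simplicity the error variance is computed from the true score, whereas in practice it is reported as a function of the observed score.

```python
import numpy as np


def simulate(n=20000, beta_x=1.0, beta_t=0.0, seed=0):
    """Simulate outcomes, an error-prone prior score, and treatment.

    Hypothetical data-generating model (illustration only):
    treatment uptake depends on the true prior score x, and the
    measurement error variance grows toward the extremes of x.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # true prior achievement
    p = 1.0 / (1.0 + np.exp(-x))                # selection into treatment
    t = rng.binomial(1, p).astype(float)        # treatment indicator
    y = beta_x * x + beta_t * t + rng.normal(scale=0.5, size=n)
    csem = np.sqrt(0.1 + 0.1 * x**2)            # heteroscedastic error SD
    w = x + rng.normal(size=n) * csem           # observed prior score
    return y, w, t, csem


def naive_ancova(y, w, t):
    """Ordinary least squares of y on (1, w, t), ignoring measurement error."""
    Z = np.column_stack([np.ones_like(w), w, t])
    return np.linalg.solve(Z.T @ Z, Z.T @ y)


def mom_ancova(y, w, t, csem):
    """Method-of-moments correction with known error variances.

    Only the (w, w) entry of Z'Z is inflated by measurement error,
    so the sum of csem**2 is subtracted from that entry alone.
    """
    Z = np.column_stack([np.ones_like(w), w, t])
    S = np.zeros((3, 3))
    S[1, 1] = np.sum(csem**2)
    return np.linalg.solve(Z.T @ Z - S, Z.T @ y)


TRUE_EFFECT = 0.0
y, w, t, csem = simulate(beta_t=TRUE_EFFECT)
naive = naive_ancova(y, w, t)
corrected = mom_ancova(y, w, t, csem)
print("naive treatment effect:    ", round(float(naive[2]), 3))
print("corrected treatment effect:", round(float(corrected[2]), 3))
```

In this setup the true treatment effect is zero, so any nonzero naive estimate reflects incomplete adjustment for the prior score. The sketch ignores clustering, additional covariates, and standard error estimation, all of which a real value-added analysis would need.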
