Abstract
A common strategy for estimating treatment effects in observational studies using individual student-level data is analysis of covariance (ANCOVA) or hierarchical variants of it, in which outcomes (often standardized test scores) are regressed on pretreatment test scores, other student characteristics, and treatment group indicators. Measurement error in the prior test scores, which is typically both large and heteroscedastic, is regularly overlooked in empirical analyses. It can erode the ability of regression models to adjust for student factors and can bias treatment effect estimates. We develop extensions of method-of-moments, simulation-extrapolation, and latent regression approaches to correcting for measurement error using the conditional standard errors of measurement of test scores. We demonstrate their effectiveness relative to simpler alternatives using both simulation and a case study of teacher value-added effect estimation using longitudinal data from a large suburban school district.
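To make the bias mechanism concrete, the following is a minimal simulation sketch, not the estimators developed in the paper. It assumes a single error-prone prior score, a binary treatment that depends on the latent prior achievement, and error variances that are known exactly for each student (a stand-in for the squared conditional standard errors of measurement). All function names, coefficients, and the error-variance function are hypothetical choices made for illustration. The correction shown is the textbook corrected-moments estimator, which subtracts the summed error variances from the cross-product matrix before solving the normal equations.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Simulate outcome y, error-prone prior score w, treatment t, and CSEMs.

    Assumed data-generating values (illustrative only):
    y = 1.0 + 0.8 * x + 0.5 * t + noise, where x is latent prior achievement.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                           # latent prior achievement
    t = (x + rng.normal(size=n) > 0).astype(float)   # treatment sorted on x
    csem = 0.3 + 0.3 * np.abs(x)                     # larger error in the tails
    w = x + rng.normal(size=n) * csem                # observed prior score
    y = 1.0 + 0.8 * x + 0.5 * t + rng.normal(scale=0.5, size=n)
    return y, w, t, csem


def ols(y, w, t):
    """Naive ANCOVA: regress y on an intercept, the observed score, and t."""
    Z = np.column_stack([np.ones_like(w), w, t])
    return np.linalg.solve(Z.T @ Z, Z.T @ y)


def mom_corrected(y, w, t, csem):
    """Corrected-moments estimator with known heteroscedastic error variances.

    Only the (w, w) entry of Z'Z is inflated by measurement error, so the
    sum of squared CSEMs is subtracted from that entry alone.
    """
    Z = np.column_stack([np.ones_like(w), w, t])
    A = Z.T @ Z
    A[1, 1] -= np.sum(csem ** 2)
    return np.linalg.solve(A, Z.T @ y)


if __name__ == "__main__":
    y, w, t, csem = simulate()
    print("naive     [intercept, prior score, treatment]:", ols(y, w, t))
    print("corrected [intercept, prior score, treatment]:", mom_corrected(y, w, t, csem))
```

In this setup the naive fit attenuates the prior-score coefficient and overstates the treatment effect, because treatment status absorbs the part of latent achievement that the noisy score fails to capture. The corrected fit should land close to the assumed values of 0.8 and 0.5. In real data the error variances are themselves estimates tied to the observed score, which is one reason the simple correction here is only a starting point.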
