Abstract
We contribute to the debate about causal inference in educational research in two ways. First, we quantify how much bias there must be in an estimate to invalidate an inference. Second, we use Rubin’s causal model to interpret the bias necessary to invalidate an inference in terms of sample replacement. We apply our analysis to two inferences: a positive effect of Open Court Curriculum on reading achievement, drawn from a randomized experiment, and a negative effect of kindergarten retention on reading achievement, drawn from an observational study. We consider details of our framework and then discuss how our approach informs judgment of an inference relative to study design. We conclude with implications for scientific discourse.
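The two quantities named in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that an inference is invalidated once the estimate falls to a threshold equal to the standard error times a critical value, and that the share of the estimate that must be due to bias can be read as the share of cases that would have to be replaced with cases showing no effect. The function names, the default critical value of 1.96, and the numbers in the usage lines are all hypothetical and are not taken from the studies analyzed in the article.

```python
import math


def threshold_for_inference(std_err, critical_value=1.96):
    """Smallest estimate magnitude that would still support the inference.

    Assumption: the inference rests on a significance test, so the
    threshold is the standard error times a critical value.
    """
    return std_err * critical_value


def proportion_bias_to_invalidate(estimate, threshold):
    """Share of the estimate that would have to be bias to invalidate the inference.

    Computed as 1 - threshold / estimate, using magnitudes so the same
    function serves positive and negative effects. Returns 0.0 when the
    estimate does not exceed the threshold to begin with.
    """
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    if abs(estimate) <= abs(threshold):
        return 0.0
    return 1.0 - abs(threshold) / abs(estimate)


def cases_to_replace(n_cases, proportion):
    """Number of cases that would need replacing with zero-effect cases.

    Assumption: the bias proportion maps directly onto a share of the sample.
    """
    return math.ceil(n_cases * proportion)


# Hypothetical inputs, for illustration only.
example_threshold = threshold_for_inference(std_err=0.1)
example_proportion = proportion_bias_to_invalidate(0.5, example_threshold)
example_cases = cases_to_replace(100, example_proportion)
print(f"threshold={example_threshold:.3f}, "
      f"proportion={example_proportion:.3f}, cases={example_cases}")
```

Under these assumptions, a larger proportion means more of the sample would have to be replaced before the inference changes, which is one way to compare the robustness of inferences across study designs.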
