We contribute to the debate about causal inference in educational research in two ways. First, we quantify how much bias there must be in an estimate to invalidate an inference. Second, we use Rubin’s causal model to interpret the bias necessary to invalidate an inference in terms of sample replacement. We apply our analysis to two inferences: a positive effect of the Open Court Curriculum on reading achievement, drawn from a randomized experiment, and a negative effect of kindergarten retention on reading achievement, drawn from an observational study. We then consider details of our framework and discuss how our approach informs judgment of an inference relative to study design. We conclude with implications for scientific discourse.
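
The idea of quantifying the bias needed to invalidate an inference can be illustrated with a minimal sketch. It assumes, as a reading that the abstract itself does not spell out, that an inference is invalidated when the estimate falls below some threshold (for example, the smallest value that would still be statistically significant). The proportion of the estimate that must be due to bias is then 1 - threshold / estimate. Under the sample-replacement interpretation, the same proportion of cases would have to be replaced with cases showing no effect. The function names and all numbers below are hypothetical and are not taken from either study.

```python
import math


def percent_bias_to_invalidate(estimate: float, threshold: float) -> float:
    """Proportion of `estimate` that must be bias for it to drop to `threshold`.

    Assumption: `estimate` and `threshold` share the same sign, and the
    inference holds only while |estimate| exceeds |threshold|.
    """
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    if abs(estimate) <= abs(threshold):
        # The estimate does not exceed the threshold, so no bias is needed.
        return 0.0
    return 1.0 - abs(threshold) / abs(estimate)


def cases_to_replace(n_cases: int, estimate: float, threshold: float) -> int:
    """Number of cases to replace with zero-effect cases to invalidate the inference."""
    return math.ceil(n_cases * percent_bias_to_invalidate(estimate, threshold))


# Hypothetical numbers: an estimated effect of 6.0 and a threshold of 3.0
# in a sample of 100 cases.
share = percent_bias_to_invalidate(6.0, 3.0)
replaced = cases_to_replace(100, 6.0, 3.0)
print(f"bias needed: {share:.0%}, cases to replace: {replaced}")
```

With these hypothetical inputs, half of the estimate would have to be due to bias, which corresponds to replacing half of the sample with cases in which the treatment had no effect.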
