Abstract
This article introduces a new approach for evaluating replication results. It combines effect-size estimation with hypothesis testing, assessing the extent to which the replication results are consistent with an effect size big enough to have been detectable in the original study. The approach is demonstrated by examining replications of three well-known findings. Its benefits include the following: (a) differentiating “unsuccessful” replication attempts (i.e., studies yielding p > .05) that are too noisy from those that actively indicate the effect is undetectably different from zero, (b) “protecting” true findings from underpowered replications, and (c) arriving at intuitively compelling inferences in general and for the revisited replications in particular.
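The logic of the approach can be illustrated with a minimal sketch. The abstract does not give the specifics, so everything concrete here is an assumption made for illustration: a two-group design with equal cell sizes, a normal approximation to the test statistic, and a benchmark of 33% power for what counts as "detectable" in the original study. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the approximation


def smallest_detectable_d(n_per_group, power=0.33, alpha=0.05):
    """Effect size d that gives a two-group study the stated power.

    Assumption: normal approximation, two-sided alpha, equal group sizes.
    The 33% default is an illustrative benchmark, not taken from the abstract.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_crit + z_power) / sqrt(n_per_group / 2)


def p_smaller_than_detectable(d_replication, n_rep_per_group, d_benchmark):
    """One-sided p-value for H0: true effect >= d_benchmark.

    A small p-value suggests the replication is inconsistent with an effect
    big enough to have been detectable in the original study.
    """
    se = sqrt(2 / n_rep_per_group)  # approximate standard error of d
    z = (d_replication - d_benchmark) / se
    return _STD_NORMAL.cdf(z)


# Hypothetical original study with 20 participants per cell.
d_small = smallest_detectable_d(20)

# Two hypothetical replications that both estimate d = 0.
p_large_rep = p_smaller_than_detectable(0.0, 200, d_small)
p_noisy_rep = p_smaller_than_detectable(0.0, 20, d_small)
```

Under these assumptions, the large replication rejects the benchmark effect while the small one does not, even though both would count as "unsuccessful" by p > .05 alone. That is the distinction in benefit (a): the second replication is merely too noisy to be informative.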

