Abstract
Over-reliance on significance testing has been heavily criticized in psychology. The American Psychological Association therefore recommended supplementing the p value with additional measures such as effect sizes and confidence intervals, and taking statistical power seriously. This article elaborates on the conclusions that can be drawn when these measures accompany the p value. An analysis of over 30 summary papers (covering over 6,000 articles) reveals that, if anything, only effect sizes are reported in addition to p values (38% of articles). Only every tenth article provides a confidence interval, and statistical power is reported in a mere 3% of articles. An increase over time in the reporting of these supplements to p values, attributable to stricter guidelines, was found for effect sizes only. Given these practices, research faces a serious problem in the context of dichotomous statistical decision making: because significant results have a higher probability of being published (publication bias), the effect sizes reported in articles may be seriously overestimated.
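To make the closing point concrete, the following is a minimal simulation sketch, illustrative only and not an analysis from this article, of how selective publication of significant results inflates reported effect sizes. All parameter values are assumptions chosen for illustration: a small true effect (d = 0.2), two groups of 30, a two-sided t test at alpha = .05, and a study counted as "published" only if p < .05.

```python
# Illustrative sketch (assumed parameters, not the article's data): when only
# significant results are "published", the published effect sizes overstate
# the true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_d = 0.2        # assumed true standardized effect size (small)
n = 30              # participants per group (an underpowered design)
alpha = 0.05        # conventional significance threshold
n_studies = 10_000  # number of simulated two-group studies

observed_d, published_d = [], []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n)
    t, p = stats.ttest_ind(treatment, control)
    # Cohen's d from the sample means and pooled sample SD
    pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
    d = (treatment.mean() - control.mean()) / pooled_sd
    observed_d.append(d)
    if p < alpha:  # "publication" conditioned on statistical significance
        published_d.append(d)

print(f"true effect:          d = {true_d:.2f}")
print(f"mean, all studies:    d = {np.mean(observed_d):.2f}")
print(f"mean, published only: d = {np.mean(published_d):.2f} "
      f"({len(published_d) / n_studies:.0%} of studies significant)")
```

Under these assumptions, the mean effect size across all simulated studies stays close to the true d of 0.2, whereas the mean among the "published" (significant) studies comes out substantially larger; this is exactly the overestimation described above.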