Calculating and reporting effect sizes on scientific papers (1): p < 0.05 limitations in the analysis of mean differences of two groups

Authors

H. Espirito Santo, F. B. Daniel

DOI:

https://doi.org/10.7342/ismt.rpics.2015.1.1.14

Keywords:

Effect size, Statistical significance, Cohen’s d, p-value, Hedges’ g, Glass’s Delta

Abstract

The Portuguese Journal of Behavioral and Social Research requires authors to follow the recommendations of the Publication Manual of the American Psychological Association (APA, 2010) when presenting statistical information. One of the APA recommendations is that effect sizes be reported along with levels of statistical significance. Because the p-values produced by statistical tests do not indicate the magnitude or importance of a difference, effect sizes (ES) should also be reported. ES give meaning to statistical tests, emphasize the power of statistical tests, reduce the risk of interpreting mere sampling variation as a real relationship, can increase the reporting of “non-significant” results, and allow knowledge from several studies to be accumulated through meta-analysis. The objectives of this paper are therefore to present the limits of the significance level; describe the rationale for reporting ES for statistical tests of differences between two groups; present the formulas for calculating ES directly, with examples from our own previous studies; show how to calculate confidence intervals; provide conversion formulas for use in literature reviews; indicate how to interpret ES; and show that, although ES are interpretable, the conventional labels (small, medium, or large effect on an arbitrary metric) can be inaccurate, so interpretation should be made in the context of the research area and of real-world variables.
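The formulas themselves are not reproduced on this page. As a rough illustration of the three effect sizes named in the keywords, the sketch below uses the common textbook definitions (pooled-SD Cohen's d, the approximate small-sample correction for Hedges' g, control-group SD for Glass's Delta) and a large-sample normal approximation for the confidence interval. The function names and the sample data are hypothetical, and the paper's own formulas may differ in detail.

```python
import math
from statistics import mean, stdev  # stdev is the sample SD (n - 1 denominator)


def cohens_d(group1, group2):
    """Mean difference divided by the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd


def hedges_g(group1, group2):
    """Cohen's d multiplied by the approximate small-sample correction factor."""
    n1, n2 = len(group1), len(group2)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # common approximation, assumed here
    return correction * cohens_d(group1, group2)


def glass_delta(treatment, control):
    """Mean difference divided by the control group's standard deviation only."""
    return (mean(treatment) - mean(control)) / stdev(control)


def d_confidence_interval(d, n1, n2, z=1.96):
    """Approximate 95% CI for d from a large-sample standard error (assumption)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


# Hypothetical scores, for illustration only (not data from the paper).
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]

d = cohens_d(treatment, control)
g = hedges_g(treatment, control)
delta = glass_delta(treatment, control)
ci_low, ci_high = d_confidence_interval(d, len(treatment), len(control))
print(f"d = {d:.2f}, g = {g:.2f}, Delta = {delta:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

With samples this small the interval is wide and includes zero even though d is large by conventional labels, which is one reason to report the interval alongside the point estimate.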

References

Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602.

Aguinis, H., Werner, S., Abbott, J. L., Angert, C., Park, J. H., & Kohlhausen, D. (2010). Customer-centric science: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods, 13(3), 515–539.

Aickin, M. (2004). Bayes without priors. Journal of Clinical Epidemiology, 57(1), 4–13.

American Psychological Association (APA). (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: APA.

Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us? Arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29(5), 664–672.

Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617.

Berben, L., Sereika, S. M., & Engberg, S. (2012). Effect size estimation: methods and examples. International Journal of Nursing Studies, 49(8), 1039–1047.

Bezeau, S., & Graves, R. (2001). Statistical power and effect sizes of clinical neuropsychology research. Journal of Clinical and Experimental Neuropsychology (Neuropsychology, Development and Cognition: Section A), 23(3), 399–406.

Blanton, H., & Jaccard, J. (2006a). Arbitrary metrics in psychology. The American Psychologist, 61(1), 27–41.

Blanton, H., & Jaccard, J. (2006b). Arbitrary metrics redux. The American Psychologist, 61(1), 62.

Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 221–235). New York: Russell Sage Foundation.

Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29(1), 79–97.

Caperos, J. M., & Pardo, A. (2013). Consistency errors in p-values reported in Spanish psychology journals. Psicothema, 25(3), 408–414.

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399.

Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103(1), 105–110.

Coe, R. (2002). It's the effect size, stupid: what effect size is and why it is important. Presented at the Annual Conference of the British Educational Research Association, University of Exeter, England, Education-line.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145–153.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

Cohen, J. (1992a). A power primer. Psychological Bulletin, 112(1), 155.

Cohen, J. (1992b). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98–101.

Cohen, J. (1994). The earth is round (p < .05). The American Psychologist, 49(12), 997–1003.

Conn, V. S., Chan, K. C., & Cooper, P. S. (2014). The problem with p. Western Journal of Nursing Research, 36(3), 291–293.

Cook, R. J., & Sackett, D. L. (1995). The number needed to treat: A clinically useful measure of treatment effect. BMJ, 310(6977), 452–454.

Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York: Russell Sage Foundation.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.

Dunlap, W. P. (1994). Generalizing the common language effect size indicator to bivariate normal correlations. Psychological Bulletin, 116(3), 509–511.

Durlak, J. A. (2009). How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology, 34(9), 917–928.

Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results (pp. 1–193). Cambridge: Cambridge University Press.

Embretson, S. E. (2006). The continued search for nonarbitrary metrics in psychology. The American Psychologist, 61(1), 50–55.

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532–538.

Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23(2), 89–105.

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1959). Statistical methods and scientific inference (2nd ed.). Edinburgh: Oliver and Boyd.

Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen's d: Comparison of two methods. PLoS ONE, 6(4), e19070, 1–5.

Giere, R. N. (1972). The significance test controversy. British Journal for the Philosophy of Science, 23(2), 170–181.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage.

Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational and Behavioral Statistics, 6(2), 107–128.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis (Vol. 11, pp. 104–106). Orlando: Academic Press.

Hentschke, H., & Stüttgen, M. C. (2011). Computation of measures of effect size for neuroscience data sets. The European Journal of Neuroscience, 34(12), 1887–1894.

Huberty, C. J. (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman-Pearson views in textbooks. The Journal of Experimental Education, 61(4), 317–333.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19.

Kazdin, A. E. (2006). Arbitrary metrics: implications for identifying evidence-based treatments. The American Psychologist, 61(1), 42–49.

Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16(5), 345–353.

Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746–759.

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (2nd ed.). Washington, DC: American Psychological Association.

Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996.

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825, 1–8.

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1–12.

Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: comment on Trafimow (2003). Psychological Review, 112(3), 662–668.

Lemos, L., Espirito-Santo, H., Silva, G. F., Costa, M., Cardoso, D., Vicente, F., et al. (2014). The impact of a Neuropsychological Rehabilitation Group Program (NRGP) on cognitive and emotional functioning in institutionalized elderly (p. 1). Presented at the 22nd European Congress of Psychiatry, Munich.

Lenth, R. V. (2006–2014). Java applets for power and sample size. [URL]

Liesbeth, W. A., Prins, J. B., Vernooij-Dassen, M. J. F. J., Wijnen, H. H., Olde Rikkert, M. G. M., & Kessels, R. P. C. (2011). Group therapy for patients with mild cognitive impairment and their significant others: results of a waiting-list controlled trial. Gerontology, 57(5), 444–454.

Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., … Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. National Center for Special Education Research, Institute of Education Sciences.

Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36(2), 102–105.

Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 161–171.

McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71(1), 173–180.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365.

McMillan, J. H., & Foley, J. (2011). Reporting and discussing effect size: Still the road less traveled. Practical Assessment, Research & Evaluation, 16(14), 1–12.

Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago: Aldine.

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605.

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.

Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.

Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25(3), 241–286.

Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8(2), 157–159.

Paiva, A. C., Cunha, M., Xavier, A. M., Marques, M., Simões, S., & Espirito-Santo, H. (2013). Exploratory study of risk-taking and self-harm behaviours in adolescents: prevalence, characteristics and its relationship to attachment styles. European Psychiatry, 28(Suppl. 1).

Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1–47.

Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society: Series D (The Statistician), 48(3), 413–418.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4–13.

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York: Russell Sage.

Rosenthal, J. A. (1996). Qualitative descriptors of strength of association and effect size. Journal of Social Service Research, 21(4), 37–59.

Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. The American Psychologist, 44(10), 1276–1284.

Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological Science, 11(6), 446–453.

Salsburg, D. (2002). The lady tasting tea. New York: Macmillan.

Sanabria, F., & Killeen, P. R. (2007). Better statistics for better decisions: Rejecting null hypotheses statistical tests in favor of replication statistics. Psychology in the Schools, 44(5), 471–481.

Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20(8), 1053–1059.

Schmidt, F. L., & Hunter, J. E. (2004). Methods of meta-analysis. Thousand Oaks: SAGE Publications.

Schneider, A. L., & Darcy, R. E. (1984). Policy implications of using significance tests in evaluation research. Evaluation Review, 8(4), 573–582.

Schünemann, H. J., Oxman, A. D., Vist, G. E., Higgins, J. P. T., Deeks, J. J., Glasziou, P., & Guyatt, G. H. (2008). Interpreting results and drawing conclusions. In J. P. T. Higgins & S. Green (Eds.), Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series (pp. 1–29). The Cochrane Collaboration.

Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065–1071.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.

Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. The Journal of Experimental Education, 61(4), 334–349.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989–1004.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson.

Published

2015-02-28

How to Cite

Espirito Santo, H., & Daniel, F. B. (2015). Calculating and reporting effect sizes on scientific papers (1): p < 0.05 limitations in the analysis of mean differences of two groups. Portuguese Journal of Behavioral and Social Research, 1(1), 3–16. https://doi.org/10.7342/ismt.rpics.2015.1.1.14

Issue

Vol. 1 No. 1 (2015)

Section

Review Paper
