Calculating and reporting effect sizes in scientific papers (1): The limitations of p < .05 in the analysis of mean differences between two groups

Helena Espirito Santo; Fernanda Bento Daniel

doi:10.7342/ismt.rpics.2015.1.1.14

Authors

Helena Espirito Santo, Instituto Superior Miguel Torga, Coimbra, Portugal, https://orcid.org/0000-0003-2625-3754
Fernanda Bento Daniel, Instituto Superior Miguel Torga, Coimbra, Portugal, https://orcid.org/0000-0002-2202-1123

Keywords:

Effect size, Statistical significance, p value, Cohen's d, Hedges' g, Glass's delta

Abstract

The Revista Portuguesa de Investigação Comportamental e Social requires authors to follow the recommendations of the Publication Manual of the American Psychological Association (APA, 2010) when presenting statistical information. One of the APA's recommendations is that effect sizes be reported together with statistical significance levels. Because the p values produced by statistical tests say nothing about the magnitude or importance of a difference, effect sizes (ES) should also be reported. ES give meaning to statistical tests, draw attention to the power of those tests, reduce the risk that mere sampling variation is interpreted as a real relationship, may increase the reporting of "non-significant" results, and make it possible to accumulate knowledge across studies through meta-analysis. The aims of this article are therefore to present the limitations of significance levels; to describe the rationale for reporting ES for statistical tests that analyze differences between two groups; to present the formulas for calculating ES, with examples from our own studies; to present procedures for calculating confidence intervals; to provide conversion formulas for literature reviews; to indicate how ES should be interpreted; and to show that, although an ES is often interpretable, its meaning (a small, medium, or large effect on an arbitrary metric) can be imprecise and needs to be interpreted in the context of the research field and of real-world variables.
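
The abstract mentions formulas for the ES of two-group mean differences (Cohen's d, Hedges' g, Glass's delta), confidence intervals, and conversion formulas. The following is a minimal Python sketch of the common textbook versions of these formulas, assuming only summary statistics (means, standard deviations, sample sizes) from two independent groups. All function names are hypothetical; the Hedges' g correction is the usual approximation, the confidence interval uses the large-sample normal approximation rather than the exact noncentral t method, and the article's own formulas may differ in detail.

```python
import math

# Hypothetical helper names; standard formulas for two independent groups.


def pooled_sd(sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Pooled standard deviation, weighting each variance by n - 1."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, m2: float, sd1: float, sd2: float, n1: int, n2: int) -> float:
    """Standardized mean difference using the pooled SD as the standardizer."""
    return (m1 - m2) / pooled_sd(sd1, sd2, n1, n2)


def hedges_g(d: float, n1: int, n2: int) -> float:
    """Cohen's d multiplied by the approximate small-sample correction J."""
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # J = 1 - 3 / (4*df - 1), with df = n1 + n2 - 2
    return j * d


def glass_delta(m_treat: float, m_control: float, sd_control: float) -> float:
    """Mean difference standardized by the control group's SD only."""
    return (m_treat - m_control) / sd_control


def d_confidence_interval(d: float, n1: int, n2: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate normal-theory CI for d; the default z gives roughly 95%."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se


def d_to_r(d: float, n1: int, n2: int) -> float:
    """Convert d to a point-biserial r; a equals 4 when group sizes are equal."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d**2 + a)


if __name__ == "__main__":
    # Hypothetical summary statistics, not taken from the article
    m1, sd1, n1 = 24.0, 5.0, 20  # treatment group
    m2, sd2, n2 = 20.0, 6.0, 20  # control group
    d = cohens_d(m1, m2, sd1, sd2, n1, n2)
    low, high = d_confidence_interval(d, n1, n2)
    print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
    print(f"g = {hedges_g(d, n1, n2):.2f}")
    print(f"Glass's delta = {glass_delta(m1, m2, sd2):.2f}")
    print(f"r = {d_to_r(d, n1, n2):.2f}")
```

Glass's delta uses only the control group's SD, which is a reasonable choice when the intervention is expected to change the variability of the treated group; Hedges' g always shrinks d slightly, and the correction matters most for small samples.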

References

Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602.

Aguinis, H., Werner, S., Abbott, J. L., Angert, C., Park, J. H., & Kohlhausen, D. (2010). Customer-centric science: Reporting significant research results with rigor, relevance, and practical impact in mind. Organizational Research Methods, 13(3), 515–539.

Aickin, M. (2004). Bayes without priors. Journal of Clinical Epidemiology, 57(1), 4–13.

American Psychological Association. (2010). Publication Manual of the American Psychological Association (6th ed.). Washington, DC: APA.

Andersen, M. B., McCullagh, P., & Wilson, G. J. (2007). But what do the numbers really tell us? Arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29(5), 664–672.

Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603–617.

Berben, L., Sereika, S. M., & Engberg, S. (2012). Effect size estimation: Methods and examples. International Journal of Nursing Studies, 49(8), 1039–1047.

Bezeau, S., & Graves, R. (2001). Statistical power and effect sizes of clinical neuropsychology research. Journal of Clinical and Experimental Neuropsychology (Neuropsychology, Development and Cognition: Section A), 23(3), 399–406.

Blanton, H., & Jaccard, J. (2006a). Arbitrary metrics in psychology. American Psychologist, 61(1), 27–41.

Blanton, H., & Jaccard, J. (2006b). Arbitrary metrics redux. American Psychologist, 61(1), 62.

Borenstein, M. (2009). Effect sizes for continuous data. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis (pp. 221–235). New York: Russell Sage Foundation.

Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29(1), 79–97.

Caperos, J. M., & Pardo, A. (2013). Consistency errors in p-values reported in Spanish psychology journals. Psicothema, 25(3), 408–414.

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48(3), 378–399.

Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103(1), 105–110.

Coe, R. (2002). It's the effect size, stupid: What effect size is and why it is important. Paper presented at the Annual Conference of the British Educational Research Association, University of Exeter, England. Education-line.

Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145–153.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale: Lawrence Erlbaum Associates.

Cohen, J. (1992a). A power primer. Psychological Bulletin, 112(1), 155.

Cohen, J. (1992b). Statistical power analysis. Current Directions in Psychological Science, 1(3), 98–101.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003.

Conn, V. S., Chan, K. C., & Cooper, P. S. (2014). The problem with p. Western Journal of Nursing Research, 36(3), 291–293.

Cook, R. J., & Sackett, D. L. (1995). The number needed to treat: A clinically useful measure of treatment effect. BMJ, 310(6977), 452–454.

Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and meta-analysis (2nd ed.). New York: Russell Sage Foundation.

Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge.

Dunlap, W. P. (1994). Generalizing the common language effect size indicator to bivariate normal correlations. Psychological Bulletin, 116(3), 509–511.

Durlak, J. A. (2009). How to select, calculate, and interpret effect sizes. Journal of Pediatric Psychology, 34(9), 917–928.

Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results (pp. 1–193). Cambridge: Cambridge University Press.

Embretson, S. E. (2006). The continued search for nonarbitrary metrics in psychology. American Psychologist, 61(1), 50–55.

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532–538.

Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23(2), 89–105.

Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1959). Statistical methods and scientific inference (2nd ed.). Edinburgh: Oliver and Boyd.

Furukawa, T. A., & Leucht, S. (2011). How to obtain NNT from Cohen's d: Comparison of two methods. PLoS ONE, 6(4), e19070, 1–5.

Giere, R. N. (1972). The significance test controversy. British Journal for the Philosophy of Science, 23(2), 170–181.

Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.

Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage.

Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Lawrence Erlbaum Associates Publishers.

Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational and Behavioral Statistics, 6(2), 107–128.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis (Vol. 11, pp. 104–106). Orlando: Academic Press.

Hentschke, H., & Stüttgen, M. C. (2011). Computation of measures of effect size for neuroscience data sets. European Journal of Neuroscience, 34(12), 1887–1894.

Huberty, C. J. (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman-Pearson views in textbooks. The Journal of Experimental Education, 61(4), 317–333.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19.

Kazdin, A. E. (2006). Arbitrary metrics: Implications for identifying evidence-based treatments. American Psychologist, 61(1), 42–49.

Killeen, P. R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16(5), 345–353.

Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56(5), 746–759.

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research (2nd ed.). Washington, DC: American Psychological Association.

Kraemer, H. C., & Kupfer, D. J. (2006). Size of treatment effects and their importance to clinical research and practice. Biological Psychiatry, 59(11), 990–996.

Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9), e105825, 1–8.

Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4(863), 1–12.

Lee, M. D., & Wagenmakers, E.-J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112(3), 662–668.

Lemos, L., Espirito-Santo, H., Silva, G. F., Costa, M., Cardoso, D., Vicente, F., et al. (2014). The impact of a Neuropsychological Rehabilitation Group Program (NRGP) on cognitive and emotional functioning in institutionalized elderly. Paper presented at the 22nd European Congress of Psychiatry, Munich.

Lenth, R. V. (2006–2014). Java applets for power and sample size. [URL]

Liesbeth, W. A., Prins, J. B., Vernooij-Dassen, M. J. F. J., Wijnen, H. H., Olde Rikkert, M. G. M., & Kessels, R. P. C. (2011). Group therapy for patients with mild cognitive impairment and their significant others: Results of a waiting-list controlled trial. Gerontology, 57(5), 444–454.

Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W. … Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. National Center for Special Education Research, Institute of Education Sciences.

Loftus, G. R. (1991). On the tyranny of hypothesis testing in the social sciences. Contemporary Psychology, 36(2), 102–105.

Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 161–171.

McCartney, K., & Rosenthal, R. (2000). Effect size, practical importance, and social policy for children. Child Development, 71(1), 173–180.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365.

McMillan, J. H., & Foley, J. (2011). Reporting and discussing effect size: Still the road less traveled. Practical Assessment, Research & Evaluation, 16(14), 1–12.

Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader. Chicago: Aldine.

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82(4), 591–605.

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301.

Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.

Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25(3), 241–286.

Orwin, R. G. (1983). A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics, 8(2), 157–159.

Paiva, A. C., Cunha, M., Xavier, A. M., Marques, M., Simões, S., & Espirito-Santo, H. (2013). Exploratory study of risk-taking and self-harm behaviours in adolescents: Prevalence, characteristics and its relationship to attachment styles. European Psychiatry, 28(Suppl. 1).

Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1–47.

Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: The normal equal variance case. Journal of the Royal Statistical Society: Series D (The Statistician), 48(3), 413–418.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rosenthal, R. (1983). Assessing the statistical and social importance of the effects of psychotherapy. Journal of Consulting and Clinical Psychology, 51, 4–13.

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York: Russell Sage.

Rosenthal, J. A. (1996). Qualitative descriptors of strength of association and effect size. Journal of Social Service Research, 21(4), 37–59.

Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44(10), 1276–1284.

Rosnow, R. L., Rosenthal, R., & Rubin, D. B. (2000). Contrasts and correlations in effect-size estimation. Psychological Science, 11(6), 446–453.

Salsburg, D. (2002). The lady tasting tea. New York: Macmillan.

Sanabria, F., & Killeen, P. R. (2007). Better statistics for better decisions: Rejecting null hypotheses statistical tests in favor of replication statistics. Psychology in the Schools, 44(5), 471–481.

Schatz, P., Jay, K. A., McComb, J., & McLaughlin, J. R. (2005). Misuse of statistical tests in Archives of Clinical Neuropsychology publications. Archives of Clinical Neuropsychology, 20(8), 1053–1059.

Schmidt, F. L., & Hunter, J. E. (2004). Methods of meta-analysis. Thousand Oaks: SAGE Publications.

Schneider, A. L., & Darcy, R. E. (1984). Policy implications of using significance tests in evaluation research. Evaluation Review, 8(4), 573–582.

Schünemann, H. J., Oxman, A. D., Vist, G. E., Higgins, J. P. T., Deeks, J. J., Glasziou, P., & Guyatt, G. H. (2008). Interpreting results and drawing conclusions. In J. P. T. Higgins & S. Green (Eds.), Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series (pp. 1–29). The Cochrane Collaboration.

Sechrest, L., McKnight, P., & McKnight, K. (1996). Calibration of measures for psychotherapy outcome studies. American Psychologist, 51, 1065–1071.

Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.

Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. The Journal of Experimental Education, 61(4), 334–349.

Sun, S., Pan, W., & Wang, L. L. (2010). A comprehensive review of effect size reporting and interpreting practices in academic journals in education and psychology. Journal of Educational Psychology, 102(4), 989–1004.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston: Pearson.