Appendix to Accompany The Impact of APA and AERA Guidelines on Effect Size Reporting
By C.-Y. J. Peng, L.-T. Chen, H.-M. Chiang, & Y.-C. Chiang
February, 2013

Table of Contents
Wilkinson and the Task Force on Statistical Inference (1999, abbreviated as the 1999 APA Task Force Report)
The 6th edition of The APA Publication Manual (APA, 2010)
Reporting Standards for Research in Psychology (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008)
Standards for Reporting on Empirical Social Science Research in AERA Publications (AERA, 2006)
Table A -- ES Reporting Practices for Inferential Statistical Tests in 12 Education and Psychology Journals from 2009 to 2010
Table B -- ES Reporting Practices for Model Fitting Techniques in 12 Education and Psychology Journals from 2009 to 2010

Wilkinson and the Task Force on Statistical Inference (1999, abbreviated as the 1999 APA Task Force Report)

p. 596 [under Power and sample size]
Document the effect sizes, sampling and measurement assumptions, as well as analytic procedures used in power calculations. Because power computations are most meaningful when done before data are collected and examined, it is important to show how effect-size estimates have been derived from previous research and theory in order to dispel suspicions that they might have been taken from data used in the study or, even worse, constructed to justify a particular sample size.

p. 599 [under Hypothesis tests]
Always provide some effect-size estimate when reporting a p value.

[under Effect sizes]
Always present effect sizes [emphasis added] for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes [emphasis added] in a practical and theoretical context. . . . We must stress again that reporting and interpreting effect sizes in the context of previously reported effects is essential to good research. It enables readers to evaluate the stability of results across samples, designs, and analyses. Reporting effect sizes also informs power analyses and meta-analyses needed in future research. Fleiss (1994), Kirk (1996), Rosenthal (1994), and Snyder and Lawson (1993) have summarized various measures of effect sizes used in psychological research. Consult these articles for information on computing them. For a simple, general purpose display of the practical meaning of an effect size, see Rosenthal and Rubin (1982). Consult Rosenthal and Rubin (1994) for information on the use of "counternull intervals" for effect sizes, as alternatives to confidence intervals.

[under Interval estimates]
Interval estimates should be given for any effect sizes involving principal outcomes.
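Illustration (not part of the Task Force text): the p. 599 excerpt distinguishes an unstandardized effect size (a mean difference in meaningful units) from a standardized one (Cohen's d) and asks for interval estimates. The sketch below shows one way to report both, assuming two hypothetical independent groups; the data and variable names are invented for illustration only.

```python
# Hypothetical example: unstandardized mean difference, Cohen's d, and a
# 95% confidence interval for the mean difference (pooled-variance t interval).
import numpy as np
from scipy import stats

treatment = np.array([12, 15, 14, 10, 13, 16, 11, 14])  # e.g., cigarettes smoked per day
control = np.array([16, 18, 15, 17, 19, 14, 18, 17])

n1, n2 = len(treatment), len(control)
mean_diff = treatment.mean() - control.mean()            # unstandardized effect size

# Pooled standard deviation and Cohen's d (standardized effect size)
sp = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
              (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = mean_diff / sp

# 95% CI for the unstandardized mean difference
se = sp * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)
ci_low, ci_high = mean_diff - t_crit * se, mean_diff + t_crit * se

print(f"Mean difference = {mean_diff:.2f} (95% CI [{ci_low:.2f}, {ci_high:.2f}]), d = {cohens_d:.2f}")
```

In a write-up following the excerpt above, the unstandardized difference and its interval would be reported first because the units are meaningful, with d added to aid comparison across studies.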
p. 602 [under Interpretation]
Explicitly compare the effects detected in your inquiry with the effect sizes reported in related previous studies. Do not be afraid to extend your interpretations to a general class or population if you have reasons to assume that your results apply.

The 6th edition of The APA Publication Manual (APA, 2010)

p. 33 [under Statistics and data analysis]
When reporting the results of inferential statistical tests or when providing estimates of parameters or effect sizes, include sufficient information to help the reader fully understand the analyses conducted and possible alternative explanations for the outcomes of those analyses.

p. 34 [under Statistics and data analysis]
For the reader to appreciate the magnitude or importance of a study's findings, it is almost always necessary to include some measure of effect size [emphasis added] in the Results section. Whenever possible, provide a confidence interval for each effect size [emphasis added] reported to indicate the precision of estimation of the effect size [emphasis added]. Effect sizes [emphasis added] may be expressed in the original units (e.g., the mean number of questions answered correctly; kg/month for a regression slope) and are often most easily understood when reported in original units. It can often be valuable to report an effect size [emphasis added] not only in original units but also in some standardized or units-free unit (e.g., Cohen's d value) or a standardized regression weight. Multiple degree-of-freedom effect-size [emphasis added] indicators are often less useful than effect-size [emphasis added] indicators that decompose multiple degree-of-freedom tests into meaningful one degree-of-freedom effects—particularly when the latter are the results that inform the discussion. The general principle to be followed, however, is to provide the reader with enough information to assess the magnitude of the observed effect.

p. 35 [under Discussion]
Your interpretation of the results should take into account (a) sources of potential bias and other threats to internal validity, (b) the imprecision of measures, (c) the overall number of tests or overlap among tests, (d) the effect sizes observed, and (e) other limitations or weaknesses of the study.

p. 247 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design, for Abstract]
Findings, including effect sizes and confidence intervals and/or statistical significance levels

p. 248 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design, for Results – Statistics and data analysis]
For each primary and secondary outcome and for each subgroup, a summary of: Cases deleted from each analysis . . . Effect sizes and confidence intervals

pp. 251-252
Table 4. Meta-analysis reporting standards

Reporting Standards for Research in Psychology (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008)

p. 842 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design, for Abstract]
Findings, including effect sizes and confidence intervals and/or statistical significance levels
p. 843 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design, for Results – Statistics and data analysis]
For each primary and secondary outcome and for each subgroup, a summary of: Cases deleted from each analysis . . . Effect sizes and confidence intervals

pp. 848-849
Table 4. Meta-analysis reporting standards

p. 849 [Under Other Benefits of Reporting Standards]
Or standards that specify reporting a confidence interval along with an effect size might motivate researchers to plan their studies so as to ensure that the confidence intervals surrounding point estimates will be appropriately narrow.

p. 850 [Under Obstacles to Developing Standards]
There are certain situations (e.g., multilevel designs) where no clear consensus exists on how best to conceptualize and/or calculate effect size measures. In a related vein, reporting a confidence interval with an effect size is sound advice, but calculating confidence intervals for effect sizes is often difficult given the current state of software.

Standards for Reporting on Empirical Social Science Research in AERA Publications (AERA, 2006)

p. 37 [Under With quantitative methods]
Interpretation of statistical analyses is enhanced by reporting magnitude of relations (e.g., effect sizes [emphasis added]) and their uncertainty [emphasis added] separately.

p. 37 [Under With quantitative methods]
It is important to report the results of analyses that are critical for interpretation of findings in ways that capture the magnitude as well as the statistical significance of those results. Quantitative indices of effect magnitude (effect size [emphasis added] indices) are a useful way to do this.

p. 37 [Standard 5.10]
For each of the statistical results that is critical to the logic of the design and analysis, there should be included: An index of the quantitative relation between variables (an effect size [emphasis added] of some kind such as a treatment effect, a regression coefficient, or an odds ratio) or, for studies that principally describe variables, an index of effect [emphasis added] that describes the magnitude of the measured variables. An indication of the uncertainty of that index of effect [emphasis added] (such as a standard error or a confidence interval). ... A qualitative interpretation of the index of the effect [emphasis added] that describes its meaningfulness in terms of the questions the study was intended to answer. This interpretation should include any qualification that may be appropriate because of the uncertainty of the findings (e.g., the estimated effect [emphasis added] is large enough to be educationally important but these data do not rule out the possibility that the true effect [emphasis added] is actually quite small).
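Illustration (not part of the AERA standards): a minimal sketch of the three elements standard 5.10 asks for, using an odds ratio as the index of effect, a 95% confidence interval as the indication of uncertainty, and a brief qualitative interpretation. The 2 x 2 counts are hypothetical and chosen only to make the arithmetic concrete.

```python
# Hypothetical 2 x 2 table: rows = treatment/control, columns = pass/fail.
import numpy as np
from scipy import stats

a, b = 30, 10  # treatment group: 30 passed, 10 failed
c, d = 20, 20  # control group:   20 passed, 20 failed

odds_ratio = (a * d) / (b * c)              # index of effect
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)  # standard error of log(odds ratio)
z = stats.norm.ppf(0.975)
ci_low = np.exp(np.log(odds_ratio) - z * se_log_or)
ci_high = np.exp(np.log(odds_ratio) + z * se_log_or)

print(f"Odds ratio = {odds_ratio:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
# Qualitative interpretation (example wording): the odds of passing are about
# three times higher in the treatment group, but the interval is wide, so these
# data do not rule out a much smaller true effect.
```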
Table A -- ES Reporting Practices for Inferential Statistical Tests in 12 Education and Psychology Journals from 2009 to 2010

Journal(a) | Articles | No ES reported (%) | Cohen's d | Adjusted Cohen's d(b) | Hedges's g | ES not specified | R2 | ΔR2 | Adjusted R2 | η2 | Partial η2 | f2 | r(c) | Z score(d) | Cramer's V | φ | % of correct classification(e) | Odds ratio | Others
AERJ† | 32 | 6 | 7 | 0 | 0 | 0 | 11 | 1 | 1 | 2 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 3
ER† | 16 | 6 | 2 | 0 | 0 | 3 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
JCP* | 75 | 23 | 14 | 0 | 2 | 0 | 27 | 20 | 3 | 17 | 6 | 1 | 0 | 1 | 0 | 0 | 0 | 3 | 6
JEP* | 134 | 34 | 27 | 0 | 1 | 3 | 20 | 15 | 7 | 12 | 17 | 2 | 1 | 1 | 1 | 0 | 0 | 4 | 11
JRME | 7 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
JRST | 41 | 19 | 10 | 0 | 1 | 0 | 6 | 4 | 1 | 3 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 3
JRTE | 14 | 6 | 2 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2
JSE | 20 | 9 | 2 | 0 | 0 | 4 | 0 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
JSP | 34 | 9 | 5 | 1 | 0 | 3 | 7 | 9 | 0 | 5 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 3 | 1
MLJ | 19 | 9 | 3 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0
RHE | 55 | 32 | 1 | 0 | 0 | 0 | 10 | 3 | 4 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 6
TRSE | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Total | 451 | 158 (35.0) | 73 | 1 | 4 | 16 | 88 | 52 | 18 | 50 | 36 | 5 | 4 | 2 | 3 | 2 | 1 | 18 | 35

Note. Percentage in parentheses is a row percent. The unit of analysis for the first two columns is articles. The unit of analysis for the remainder of the table (i.e., ES reported) is the type of ES actually reported. Details about each journal article may be obtained from the first author.
(a) Journal abbreviations; APA journals are marked by *, AERA journals are marked by †, and journals reviewed more than once are in bold. AERJ: American Educational Research Journal; ER: Educational Researcher; JCP: Journal of Counseling Psychology; JEP: Journal of Educational Psychology; JRME: Journal for Research in Mathematics Education; JRST: Journal of Research in Science Teaching; JRTE: Journal of Research on Technology in Education (known as Journal of Research on Computing in Education until 2001); JSE: Journal of Special Education; JSP: Journal of School Psychology; MLJ: The Modern Language Journal; RHE: Research in Higher Education; TRSE: Theory and Research in Social Education.
(b) Adjusted Cohen's d was used in one multielement design.
(c) r was used in the Wilcoxon ranks test.
(d) Z was used in the cluster analysis as analogous to Cohen's d.
(e) Percent of correct classification was used in logistic regression.

Table B -- ES Reporting Practices for Model Fitting Techniques in 12 Education and Psychology Journals from 2009 to 2010

Journal(a) | Articles that employed at least one model fitting technique | No fit index nor any other ES reported (%) | Reported at least one fit index but no ES (%) | Reported at least one fit index and specified fit index as ES | Reported at least one fit index and also one other ES measure (%) | Reported no fit index but indicated other measures as ES (%)
AERJ† | 19 | 4 (21.1) | 5 (16.7) | 0 | 9 (47.4) | 1 (5.3)
ER† | 6 | 1 (16.7) | 0 | 1 | 0 | 0
JCP* | 46 | 5 (10.9) | 20 (43.5) | 0 | 16 (34.8) | 1 (2.2)
JEP* | 72 | 8 (11.1) | 38 (52.8) | 0 | 10 (13.9) | 16 (22.2)
JRME | 2 | 2 (100.0) | 0 | 0 | 0 | 0
JRST | 5 | 0 | 2 (40.0) | 0 | 1 (20.0) | 2 (40.0)
JRTE | 0 | 0 | 0 | 0 | 0 | 0
JSE | 3 | 0 | 3 (100.0) | 0 | 0 | 0
JSP | 16 | 4 (25.0) | 9 (56.3) | 0 | 1 (6.3) | 2 (12.5)
MLJ | 0 | 0 | 0 | 0 | 0 | 0
RHE | 20 | 8 (40.0) | 10 (50.0) | 0 | 1 (5.0) | 1 (5.0)
TRSE | 0 | 0 | 0 | 0 | 0 | 0
Total | 189 | 32 (16.9) | 87 (46.0) | 1 | 38 (20.1) | 23 (12.2)

Note. Percentages listed in parentheses are row percents. Details about each journal article may be obtained from the first author.
(a) Journal abbreviations are given in Note (a) under Table A.