Appendix to Accompany
The Impact of APA and AERA Guidelines on Effect Size Reporting
By C.-Y. J. Peng, L.-T. Chen, H.-M. Chiang, & Y.-C. Chiang
February 2013
Table of Contents
Wilkinson and the Task Force on Statistical Inference (1999, abbreviated as the 1999 APA Task Force Report)
The 6th edition of The APA Publication Manual (APA, 2010)
Reporting Standards for Research in Psychology (APA Publications and Communications Board Working Group on Journal Article Reporting Standards, 2008)
Standards for Reporting on Empirical Social Science Research in AERA Publications (AERA, 2006)
Table A -- ES Reporting Practices for Inferential Statistical Tests in 12 Education and Psychology Journals from 2009 to 2010
Table B -- ES Reporting Practices for Model Fitting Techniques in 12 Education and Psychology Journals from 2009 to 2010
Wilkinson and the Task Force on Statistical Inference (1999, abbreviated as the 1999 APA
Task Force Report)
Page 596 [under Power and sample size]
Document the effect sizes, sampling and measurement assumptions, as well as analytic procedures used in power calculations. Because power computations are most meaningful when done before data are collected and examined, it is important to show how effect-size estimates have been derived from previous research and theory in order to dispel suspicions that they might have been taken from data used in the study or, even worse, constructed to justify a particular sample size.
Page 599 [under Hypothesis tests]
Always provide some effect-size estimate when reporting a p value.
[under Effect sizes]
Always present effect sizes [emphasis added] for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes [emphasis added] in a practical and theoretical context. . . . We must stress again that reporting and interpreting effect sizes in the context of previously reported effects is essential to good research. It enables readers to evaluate the stability of results across samples, designs, and analyses. Reporting effect sizes also informs power analyses and meta-analyses needed in future research. Fleiss (1994), Kirk (1996), Rosenthal (1994), and Snyder and Lawson (1993) have summarized various measures of effect sizes used in psychological research. Consult these articles for information on computing them. For a simple, general purpose display of the practical meaning of an effect size, see Rosenthal and Rubin (1982). Consult Rosenthal and Rubin (1994) for information on the use of "counternull intervals" for effect sizes, as alternatives to confidence intervals.
[under Interval estimates]
Interval estimates should be given for any effect sizes involving principal outcomes.
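To make these recommendations concrete, the following is a minimal Python sketch (all data and names are invented for illustration, not taken from the reviewed articles) that reports an effect-size estimate and an interval estimate alongside the p value; the confidence interval for Cohen's d is built by inverting the noncentral t distribution, one common approach:

    # Minimal illustration (invented data): report an effect size and its CI
    # together with the p value, per the Task Force's recommendations.
    import numpy as np
    from scipy import stats
    from scipy.optimize import brentq

    treatment = np.array([12.1, 14.3, 11.8, 15.2, 13.5, 12.9, 14.8, 13.1])
    control = np.array([10.4, 11.9, 10.1, 12.6, 11.2, 10.8, 12.0, 9.7])

    t_obs, p = stats.ttest_ind(treatment, control)

    n1, n2 = len(treatment), len(control)
    df = n1 + n2 - 2
    sp = np.sqrt(((n1 - 1) * treatment.var(ddof=1)
                  + (n2 - 1) * control.var(ddof=1)) / df)
    d = (treatment.mean() - control.mean()) / sp  # Cohen's d

    # 95% CI for d: find the noncentrality parameters that place the
    # observed t at the 97.5th and 2.5th percentiles of a noncentral t.
    scale = np.sqrt(n1 * n2 / (n1 + n2))
    nc_lo = brentq(lambda nc: stats.nct.ppf(0.975, df, nc) - t_obs, -50, 50)
    nc_hi = brentq(lambda nc: stats.nct.ppf(0.025, df, nc) - t_obs, -50, 50)
    print(f"t({df}) = {t_obs:.2f}, p = {p:.4f}, d = {d:.2f}, "
          f"95% CI [{nc_lo / scale:.2f}, {nc_hi / scale:.2f}]")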
Page 602 [under Interpretation]
Explicitly compare the effects detected in your inquiry with the effect sizes reported in related previous studies. Do not be afraid to extend your interpretations to a general class or population if you have reasons to assume that your results apply.
The 6th edition of The APA Publication Manual (APA, 2010)
Page 33 [under Statistics and data analysis]
When reporting the results of inferential statistical tests or when providing estimates of parameters or effect sizes, include sufficient information to help the reader fully understand the analyses conducted and possible alternative explanations for the outcomes of those analyses.
Page 34 [under Statistics and data analysis]
For the reader to appreciate the magnitude or importance of a study’s findings, it is almost always necessary to include some measure of effect size [emphasis added] in the Results section. Whenever possible, provide a confidence interval for each effect size [emphasis added] reported to indicate the precision of estimation of the effect size [emphasis added]. Effect sizes [emphasis added] may be expressed in the original units (e.g., the mean number of questions answered correctly; kg/month for a regression slope) and are often most easily understood when reported in original units. It can often be valuable to report an effect size [emphasis added] not only in original units but also in some standardized or units-free unit (e.g., Cohen’s d value) or a standardized regression weight. Multiple degree-of-freedom effect-size [emphasis added] indicators are often less useful than effect-size [emphasis added] indicators that decompose multiple degree-of-freedom tests into meaningful one degree-of-freedom effects—particularly when the latter are the results that inform the discussion. The general principle to be followed, however, is to provide the reader with enough information to assess the magnitude of the observed effect.
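The Manual's kg/month example can be made concrete with a small sketch (the data here are invented for illustration): a simple least-squares slope reported in original units together with the corresponding standardized weight, which for simple regression equals Pearson's r.

    # Minimal illustration (invented data): report a regression slope in
    # original units (kg/month) and as a standardized weight.
    import numpy as np

    months = np.arange(1.0, 9.0)                             # predictor, months
    kg = np.array([0.4, 0.9, 1.1, 1.7, 2.2, 2.4, 3.0, 3.3])  # outcome, kg

    slope, intercept = np.polyfit(months, kg, 1)        # slope in kg per month
    beta = slope * months.std(ddof=1) / kg.std(ddof=1)  # standardized slope

    print(f"b = {slope:.2f} kg/month (standardized beta = {beta:.2f})")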
Page 35 [under Discussion]
Your interpretation of the results should take into account (a) sources of potential bias and other threats to internal validity, (b) the imprecision of measures, (c) the overall number of tests or overlap among tests, (d) the effect sizes observed, and (e) other limitations or weaknesses of the study.
Page 247 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design for Abstract]
Findings, including effect sizes and confidence intervals and/or statistical significance levels
Page 248 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design for Results – Statistics and data analysis]
For each primary and secondary outcome and for each subgroup, a summary of: Cases deleted from each analysis . . . Effect sizes and confidence intervals

Pages 251-252 [Table 4. Meta-analysis reporting standards]
Reporting Standards for Research in Psychology (APA Publications and Communications
Board Working Group on Journal Article Reporting Standards, 2008)
Page 842 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design for Abstract]
Findings, including effect sizes and confidence intervals and/or statistical significance levels
Page 843 [Table 1. Journal Article Reporting Standards (JARS): Information Recommended for Inclusion in Manuscripts That Report New Data Collections Regardless of Research Design for Results – Statistics and data analysis]
For each primary and secondary outcome and for each subgroup, a summary of: Cases deleted from each analysis . . . Effect sizes and confidence intervals

Pages 848-849 [Table 4. Meta-analysis reporting standards]
Page 849 [under Other Benefits of Reporting Standards]
Or standards that specify reporting a confidence interval along with an effect size might motivate researchers to plan their studies so as to ensure that the confidence intervals surrounding point estimates will be appropriately narrow.
Page 850 [under Obstacles to Developing Standards]
There are certain situations (e.g., multilevel designs) where no clear consensus exists on how best to conceptualize and/or calculate effect size measures. In a related vein, reporting a confidence interval with an effect size is sound advice, but calculating confidence intervals for effect sizes is often difficult given the current state of software.
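The software gap the Working Group describes is often bridged in practice with a percentile bootstrap, which needs no closed-form interval. A minimal sketch, again with invented data and hypothetical group names:

    # Minimal illustration (invented data): percentile-bootstrap 95% CI for a
    # standardized mean difference when no analytic interval is available.
    import numpy as np

    rng = np.random.default_rng(seed=1)
    group_a = np.array([3.1, 2.4, 3.8, 2.9, 3.5, 4.0, 2.7, 3.3, 3.6, 2.8])
    group_b = np.array([2.2, 2.9, 1.8, 2.5, 3.0, 2.1, 2.6, 1.9, 2.4, 2.7])

    def cohens_d(a, b):
        # Pooled-SD standardized mean difference
        df = len(a) + len(b) - 2
        sp = np.sqrt(((len(a) - 1) * a.var(ddof=1)
                      + (len(b) - 1) * b.var(ddof=1)) / df)
        return (a.mean() - b.mean()) / sp

    # Resample each group with replacement and recompute d many times
    boots = [cohens_d(rng.choice(group_a, size=len(group_a), replace=True),
                      rng.choice(group_b, size=len(group_b), replace=True))
             for _ in range(5000)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"d = {cohens_d(group_a, group_b):.2f}, "
          f"bootstrap 95% CI [{lo:.2f}, {hi:.2f}]")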
Standards for Reporting on Empirical Social Science Research in AERA Publications (AERA,
2006)
Page 37 [under With quantitative methods]
Interpretation of statistical analyses is enhanced by reporting magnitude of relations (e.g., effect sizes [emphasis added]) and their uncertainty [emphasis added] separately.
Page 37 [under With quantitative methods]
It is important to report the results of analyses that are critical for interpretation of findings in ways that capture the magnitude as well as the statistical significance of those results. Quantitative indices of effect magnitude (effect size [emphasis added] indices) are a useful way to do this.
Page 37
5.10 For each of the statistical results that is critical to the logic of the design and analysis, there should be included:
• An index of the quantitative relation between variables (an effect size [emphasis added] of some kind such as a treatment effect, a regression coefficient, or an odds ratio) or, for studies that principally describe variables, an index of effect [emphasis added] that describes the magnitude of the measured variables.
• An indication of the uncertainty of that index of effect [emphasis added] (such as a standard error or a confidence interval).
• . . .
• A qualitative interpretation of the index of the effect [emphasis added] that describes its meaningfulness in terms of the questions the study was intended to answer. This interpretation should include any qualification that may be appropriate because of the uncertainty of the findings (e.g., the estimated effect [emphasis added] is large enough to be educationally important but these data do not rule out the possibility that the true effect [emphasis added] is actually quite small). (emphasis added)
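As one concrete reading of the first two bullets of standard 5.10, a brief sketch (the 2 x 2 counts are invented) pairs an odds ratio, the index of effect, with a Wald confidence interval computed on the log-odds scale as the indication of its uncertainty:

    # Minimal illustration (invented 2 x 2 counts): an odds ratio (the index
    # of effect) with a Wald 95% CI (the indication of its uncertainty).
    import math

    a, b = 40, 10   # treatment group: passed, failed
    c, d = 28, 22   # control group:   passed, failed

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    print(f"OR = {odds_ratio:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")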
Table A -- ES Reporting Practices for Inferential Statistical Tests in 12 Education and Psychology Journals from 2009 to 2010
The columns from Cohen's d through Others give the types of ES reported.

Journal^a | Articles | No ES reported (%) | Cohen's d | Adjusted Cohen's d^b | Hedges's g | ES not specified | R^2 | ΔR^2 | Adjusted R^2 | η^2 | Partial η^2 | f^2 | r^c | Z score^d | Cramer's V | φ | % of correct classification^e | Odds ratio | Others
AERJ† | 32 | 6 | 7 | 0 | 0 | 0 | 11 | 1 | 1 | 2 | 3 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 3
ER† | 16 | 6 | 2 | 0 | 0 | 3 | 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
JCP* | 75 | 23 | 14 | 0 | 2 | 0 | 27 | 20 | 3 | 17 | 6 | 1 | 0 | 1 | 0 | 0 | 0 | 3 | 6
JEP* | 134 | 34 | 27 | 0 | 1 | 3 | 20 | 15 | 7 | 12 | 17 | 2 | 1 | 1 | 1 | 0 | 0 | 4 | 11
JRME | 7 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
JRST | 41 | 19 | 10 | 0 | 1 | 0 | 6 | 4 | 1 | 3 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 3
JRTE | 14 | 6 | 2 | 0 | 0 | 2 | 1 | 0 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2
JSE | 20 | 9 | 2 | 0 | 0 | 4 | 0 | 0 | 0 | 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
JSP | 34 | 9 | 5 | 1 | 0 | 3 | 7 | 9 | 0 | 5 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 3 | 1
MLJ | 19 | 9 | 3 | 0 | 0 | 1 | 2 | 0 | 0 | 1 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0
RHE | 55 | 32 | 1 | 0 | 0 | 0 | 10 | 3 | 4 | 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 6
TRSE | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
Total | 451 | 158 (35.0) | 73 | 1 | 4 | 16 | 88 | 52 | 18 | 50 | 36 | 5 | 4 | 2 | 3 | 2 | 1 | 18 | 35
Note. The percentage in parentheses is a row percent. The unit of analysis for the first two columns is articles. The unit of analysis for the remainder of the table (i.e., ES reported) is the type of ES actually reported. Details about each journal article may be obtained from the first author.
a Journal abbreviations; APA journals are marked by *, AERA journals are marked by †, and journals reviewed more than once are in bold.
AERJ†: American Educational Research Journal
ER†: Educational Researcher
JCP*: Journal of Counseling Psychology
JEP*: Journal of Educational Psychology
JRME: Journal for Research in Mathematics Education
JRST: Journal of Research in Science Teaching
JRTE: Journal of Research on Technology in Education (formerly known as Journal of Research on Computing in Education till 2001)
JSE: Journal of Special Education
JSP: Journal of School Psychology
MLJ: The Modern Language Journal
RHE: Research in Higher Education
TRSE: Theory and Research in Social Education
b Adjusted Cohen’s d was used in one multielement design.
c r was used in the Wilcoxon ranks test.
d Z was used in the cluster analysis as analogous to Cohen’s d.
e Percent of correct classification was used in logistic regression.
Table B -- ES Reporting Practices for Model Fitting Techniques in 12 Education and Psychology Journals from 2009 to 2010
Journal^a | Articles that employed at least one model fitting technique | No fit index nor any other ES reported (%) | Reported at least one fit index but no ES (%) | Reported at least one fit index and specified fit index as ES | Reported at least one fit index and also one other ES measure (%) | Reported no fit index but indicated other measures as ES (%)
AERJ† | 19 | 4 (21.1) | 5 (26.3) | 0 | 9 (47.4) | 1 (5.3)
ER† | 6 | 1 (16.7) | 0 | 1 | 0 | 0
JCP* | 46 | 5 (10.9) | 20 (43.5) | 0 | 16 (34.8) | 1 (2.2)
JEP* | 72 | 8 (11.1) | 38 (52.8) | 0 | 10 (13.9) | 16 (22.2)
JRME | 2 | 2 (100.0) | 0 | 0 | 0 | 0
JRST | 5 | 0 | 2 (40.0) | 0 | 1 (20.0) | 2 (40.0)
JRTE | 0 | 0 | 0 | 0 | 0 | 0
JSE | 3 | 0 | 3 (100.0) | 0 | 0 | 0
JSP | 16 | 4 (25.0) | 9 (56.3) | 0 | 1 (6.3) | 2 (12.5)
MLJ | 0 | 0 | 0 | 0 | 0 | 0
RHE | 20 | 8 (40.0) | 10 (50.0) | 0 | 1 (5.0) | 1 (5.0)
TRSE | 0 | 0 | 0 | 0 | 0 | 0
Total | 189 | 32 (16.9) | 87 (46.0) | 1 | 38 (20.1) | 23 (12.2)
Note: Percentages listed in parentheses are row percents. Details about each journal article may be obtained from the first author.
a Journal abbreviations are found in Note a under Table A.