10. Tips for stats in publications

Statistical Reporting
Stats in the literature
Tables, Graphs, Words
(van Belle, Statistical Rules of Thumb, 2002)
Bad – The blood types of the US population
are 40%, 11%, 4% and 45% for A, B, AB
and O respectively.
Good –
Blood type   Percent
O            45%
A            40%
B            11%
AB            4%
Sort by frequency
Bad
Occupation      Active in 1980
Chiropractors           25,600
Dentists               121,240
Nutritionists           32,000
Nurses               1,272,900
Optometrists            22,330
Pharmacists            142,780
Physicians             427,122
Good
Occupation      Active in 1980
Nurses               1,272,900
Physicians             427,122
Pharmacists            142,780
Dentists               121,240
Nutritionists           32,000
Chiropractors           25,600
Optometrists            22,330
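The reordering from the "bad" table to the "good" one can be sketched in a few lines of Python. This is only an illustration: the counts are copied from the table above, while the function names sort_by_frequency and format_table are made up for this example.

```python
# Counts of active health professionals in 1980, copied from the table above.
occupations = {
    "Chiropractors": 25_600,
    "Dentists": 121_240,
    "Nutritionists": 32_000,
    "Nurses": 1_272_900,
    "Optometrists": 22_330,
    "Pharmacists": 142_780,
    "Physicians": 427_122,
}


def sort_by_frequency(counts):
    """Return (label, count) pairs ordered from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def format_table(rows, header=("Occupation", "Active in 1980")):
    """Render rows as a plain-text table with right-aligned counts."""
    width = max(len(header[0]), *(len(label) for label, _ in rows))
    lines = [f"{header[0]:<{width}}  {header[1]:>14}"]
    lines += [f"{label:<{width}}  {count:>14,}" for label, count in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(sort_by_frequency(occupations)))
```

Sorting by the count rather than by the label puts the largest group first, so the reader sees the ranking without having to scan the whole column.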
Don’t use pie charts
(particularly for 2+ dimensions)
Use sorted tables
Blood type   Rh+   Rh-   Total
O             38     7      45
A             34     6      40
B              9     2      11
AB             3     1       4
Total         84    16     100
Bar graphs ok in simple situations
(& put in SE or CI bars)
Don’t use bar graphs in complex
situations
Use line graph
Try not to use asterisks
to report p values
* = 0.05 ≤ p < 0.10
** = 0.01 ≤ p < 0.05
*** = p < 0.01
Report actual p values
Be clear about the reference group
Group   n    Mean   SEM   p value*
cntl    12   20     5     ref
B       22   35     6     0.062
C        9   50     5     0.020
D        7   15     4     0.124
*p value compared to control group
Report actual p values when possible, not just an
asterisk if p < 0.05. (Tables better than bar
charts for this.)
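One way to follow this advice when building a table is to print the p value itself and mark the reference group explicitly. The sketch below is only an assumption about how that might look: the rows are copied from the table on this slide, and format_p, table_lines and the "<0.001" floor are hypothetical choices, not a standard.

```python
def format_p(p, floor=0.001):
    """Return the actual p value to 3 decimals, or '<floor' when it is tiny."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p value must lie in [0, 1]")
    if p < floor:
        return f"<{floor}"
    return f"{p:.3f}"


# (group, n, mean, SEM, p vs control); None marks the reference group.
rows = [
    ("cntl", 12, 20, 5, None),
    ("B", 22, 35, 6, 0.062),
    ("C", 9, 50, 5, 0.020),
    ("D", 7, 15, 4, 0.124),
]


def table_lines(rows):
    """Build the table lines, writing 'ref' instead of a p value for control."""
    lines = [f"{'group':<6}{'n':>4}{'mean':>6}{'SEM':>5}  p value*"]
    for group, n, mean, sem, p in rows:
        p_text = "ref" if p is None else format_p(p)
        lines.append(f"{group:<6}{n:>4}{mean:>6}{sem:>5}  {p_text}")
    lines.append("*p value compared to control group")
    return lines


if __name__ == "__main__":
    print("\n".join(table_lines(rows)))
```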
Don’t cut off axis too high
Misc rules (Petrie & Sabin)
Don’t use 3-D bars for one variable.
Report n. Report SE or CI for all estimates.
Make sure all estimates are labeled with the correct units (e.g., b = 3.0 mg/L, not b = 3.0).
Define any uncommon symbols, including those for equations (b = slope, Δ = mean difference in mg). Define abbreviations.
Make sure all tables have a title.
Label all axes & bars and explain the meaning of all graph symbols.
Misc rules (cont.)
Be careful about drawing firm conclusions about issues on which no data were presented. Distinguish between conclusions and discussion/speculation.
Enough information should be given that a person with the proper training could reproduce the results if the raw data were provided.
Reporting Guidelines
The CONSORT (CONsolidated Standards
of Reporting Trials) guidelines are
formulated for clinical trials, but most of the
guidelines apply to other designs.
Also see the STROBE (STrengthening the
Reporting of OBservational studies in
Epidemiology) statement.
From CONSORT checklist
*Objectives & hypotheses
*Background & rationale
*Design, inclusion/exclusion criteria
*Settings & locations where data collected
*Blinding (if relevant)
*Interventions (if any)
*Primary & secondary outcomes
Example: diet study
Objective- Compare wt loss under 4 diets
Design – RCT, twelve month trial
Setting & Participants – local recruitment. Trial
conducted in US from Feb 2003 to Oct 2005
among 311 free-living, overweight/obese (BMI
27-40) nondiabetic, premenopausal women.
Blinding – Not possible for patients
Interventions – 4 diets
Primary outcome – 12 month wt loss
Secondary outcomes – Lipids (LDL, HDL), insulin,
glucose, BP, energy, carbs, protein
Statistical checklist
*How sample size was determined-power
*How randomization was done, if relevant
(Blocks, stratified?)
*Interim analysis & stopping rules, if any.
*Statistical methods used-tests (parametric
or non), models, univariate & multivariate
methods, multiple testing correction
method (if any), assumptions
Diet study – sample size
Sample size – “Based on previous trials,
we projected a 6.3-kg SD of weight
change…. Thus, with 4 treatment
groups and a projected 75
participants per group, the study was
designed to have 80% power to
detect a 2.7 kg difference for the 12
month weight change between
groups.”
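Diet study – sample size
$\sigma = 6.3$ kg, $\Delta = 2.7$ kg
$\alpha = 0.05$, $Z_{\alpha} = Z_{0.975} = 1.96$
power $= 0.80$, $Z_{\text{power}} = Z_{0.800} = 0.84$
$n = 2 (Z_{\alpha} + Z_{\text{power}})^2 (\sigma/\Delta)^2$
$\;\; = 2 (1.96 + 0.84)^2 (\sigma/\Delta)^2$
$\;\; = (15.7)(6.3/2.7)^2 = 85$ ???
(ANOVA correction, k=4, gives n=81)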
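The two-group calculation on this slide can be checked with Python's standard library. This is a minimal sketch using the normal approximation written on the slide; it does not include the ANOVA correction, and the function names n_per_group and power_two_groups are invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for two-sided alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for power = 0.80
    return 2 * (z_alpha + z_power) ** 2 * (sigma / delta) ** 2


def power_two_groups(n, sigma, delta, alpha=0.05):
    """Approximate power of a two-sided z test with n subjects per group."""
    z = NormalDist()
    se = sigma * sqrt(2 / n)            # SE of the difference between two means
    return z.cdf(delta / se - z.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    print(round(n_per_group(sigma=6.3, delta=2.7), 1))
    print(round(power_two_groups(n=75, sigma=6.3, delta=2.7), 2))
```

With σ = 6.3 kg and Δ = 2.7 kg the first function returns a little over 85 per group, in line with the slide. Under the same approximation, 75 per group gives a power of roughly 0.75 for a single two-group comparison rather than 0.80.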
Diet Study- Randomization
Randomization – “Randomization was
conducted in blocks of 24 (6 per
treatment group) and occurred by
having a blinded research technician
select folded pieces of paper with
group assignments from an opaque
envelope”.
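The study drew folded slips of paper from an envelope. The same permuted-block idea can be sketched in software. The code below is an illustration only, not the study's procedure: the group labels "A" to "D", the seed, and the function names are hypothetical.

```python
import random


def permuted_block(groups, per_group, rng):
    """Return one shuffled block holding per_group assignments for each group."""
    block = [g for g in groups for _ in range(per_group)]
    rng.shuffle(block)
    return block


def assignment_sequence(groups, per_group, n_participants, seed=None):
    """Concatenate permuted blocks until n_participants assignments exist."""
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        sequence.extend(permuted_block(groups, per_group, rng))
    return sequence[:n_participants]


if __name__ == "__main__":
    # Blocks of 24 (6 per group) for 311 participants, as in the diet study.
    seq = assignment_sequence(["A", "B", "C", "D"], 6, 311, seed=2003)
    print(seq[:24])
```

Within every complete block of 24 each group appears exactly 6 times, which keeps the group sizes balanced throughout recruitment. Only the final, partly used block can be unbalanced.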
Diet Study- stat methods
Statistical Methods - Differences among
diets for 12-month changes from baseline
were tested by ANOVA. For statistically
significant ANOVAs, all pairwise
comparisons among the 4 diets were
tested using the Tukey studentized range
adjustment. Statistical testing of changes
from baseline to 2 months and to 6 months
using pairwise comparisons are presented
for descriptive purposes.
Diet study – stat methods
Multiple regression was used to examine
potential interactions between
race/ethnicity and diet group for effects on
weight loss; there were no significant
interactions. All statistical tests were 2-tailed
using a significance level of 0.05.
Reporting:
Number analyzed, lost/missing, excluded
Baseline data including demographics
Stats for primary outcome at least!
Absolute and relative measures for binary
outcomes
Discussion:
Limitations, generalisability, biases, was
power/sample size adequate?
Sources of funding, acknowledgements
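The reporting item "absolute and relative measures for binary outcomes" can be illustrated with a small calculation. The counts in the example are hypothetical, not from any study mentioned here, and binary_outcome_measures is a name invented for this sketch.

```python
def binary_outcome_measures(events_trt, n_trt, events_ctl, n_ctl):
    """Absolute and relative effect measures for a binary outcome."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    odds_trt = risk_trt / (1 - risk_trt)
    odds_ctl = risk_ctl / (1 - risk_ctl)
    return {
        "risk_treated": risk_trt,
        "risk_control": risk_ctl,
        "risk_difference": risk_trt - risk_ctl,  # absolute measure
        "relative_risk": risk_trt / risk_ctl,    # relative measure
        "odds_ratio": odds_trt / odds_ctl,       # relative measure
    }


if __name__ == "__main__":
    # Hypothetical trial: 10/100 events on treatment, 20/100 on control.
    for name, value in binary_outcome_measures(10, 100, 20, 100).items():
        print(f"{name}: {value:.3f}")
```

Reporting both kinds of measure matters because the same relative risk of 0.5 corresponds to very different absolute differences depending on the control-group risk.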
Diet study-baseline
Diet study- Primary outcome
Atkins is best
Diet study- limitations
Insufficient follow up – wt still increasing
No assessment of diet adherence
Prior exposure to diet before study
No assessment of satiety
Menstrual cycle not considered when
measuring lipids
Good organization – Put the statistical
methods in a subsection of the methods
section.
Don’t need to repeat statistical methods in
the results section, don’t put methods in
results section or results in the methods
section.
Try to use the same order of presentation in
all sections.
Example:
Statistical methods: Means +/- SEM in ml are
reported for sperm volume.
The p values for comparing mean sperm
volume across groups were computed under
an analysis of variance (ANOVA) model. The
Fisher LSD criterion was used to control for
multiple comparisons.
Quantile plots were examined to see if sperm
volume followed the Gaussian distribution.
(methods):
The p values for comparing proportions were
computed using chi-square tests.
Multivariate linear regression was used to quantify the
association of sperm volume with age and
occupation. Spline methods were used to
verify that the relationship was linear.
A p < 0.05 is considered statistically significant.
Results:
Table 1 shows that the mean sperm volume
was highest in the biostatistics group and
was significantly higher than controls
(5.0+/- 1.0 vs 3.2+/-0.9, p = 0.0274).
Figure 1 shows sperm volume versus age in
controls. The average rate of decline was
0.30 +/- 0.07 ml/year.
Mistakes that make me suspicious
*If there are “n” rats in your study, do the
various sample size tables add up to “n”?
*Is the amount of missing data reported?
*Do numbers add up?
*If the mean in group A is 20 and the mean in
group B is 40, and these are the only groups,
is the overall mean between 20 and 40?
*If mean time from v1 to v2 is 4 mos & from v2 to
v3 is 1 month, is time from v1 to v3=5 mos?
*Do SDs imply that the 95% PI takes
impossible values (negative)?
(need a transformation?)
*Are correlations, ORs, RRs, in impossible
directions?
*Are p values reported with NO summary
statistics? (gender was sig, p < 0.01)
* Are SEs or CIs reported, labeled with
correct units & not confused with SDs or
PIs?
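Several of these checks are simple arithmetic and can be automated when reviewing a manuscript. The sketch below assumes a Gaussian variable, so the 95% prediction interval is taken as mean ± 1.96 SD. All function names and the example numbers in the final comments are hypothetical.

```python
def overall_mean(ns, means):
    """Sample-size-weighted mean of the group means."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)


def mean_is_consistent(reported_overall, group_means):
    """The overall mean must lie between the smallest and largest group mean."""
    return min(group_means) <= reported_overall <= max(group_means)


def sizes_add_up(total_n, table_ns):
    """Do the per-group sample sizes in a table sum to the stated n?"""
    return sum(table_ns) == total_n


def prediction_interval_95(mean, sd):
    """Approximate 95% PI for a Gaussian variable: mean +/- 1.96 SD."""
    return mean - 1.96 * sd, mean + 1.96 * sd


def implies_negative_values(mean, sd):
    """True if the 95% PI dips below zero, which is impossible for, say, a volume."""
    lower, _ = prediction_interval_95(mean, sd)
    return lower < 0


if __name__ == "__main__":
    print(mean_is_consistent(45, [20, 40]))   # overall mean outside 20 to 40
    print(implies_negative_values(10, 8))     # SD large relative to the mean
```

A PI that reaches below zero for a quantity that cannot be negative suggests the data are skewed and a transformation (such as a log) may be needed.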
Survival
Are survival curves and/or hazard rates
reported (with SEs), not just percent alive
with no time info attached?
Want statistics for follow up time including
median (not mean) follow up time. Same
follow up in each group?
Should not extend survival curve to last
follow up time, when most are dead and/or
have no follow up.
Multivariate regression:
What were the candidate predictors?
What method was used to pick the final
predictors & what α level was used?
Were interactions considered? At least with
the primary predictor of interest?
Was linearity assumed or verified?
(JG-Final equation given in an appendix or footnote?)
SEs or CIs given for all βs or their
transformation (ORs, HRs)?
Were goodness of fit measures reported?
Regression results are in tables
Predictors of in-hospital infection
Characteristic        Odds Ratio (95% CI)   p value
Incr APACHE* score    1.15 (1.11-1.18)      <.001
Transfusion (y/n)     4.15 (2.46-6.99)      <.001
Increasing age        1.03 (1.02-1.05)      <.001
Malignancy            2.60 (1.62-4.17)      <.001
Max temperature       0.70 (0.58-0.85)      <.001
Adm to treat > 7 d    1.66 (1.05-2.61)      0.03
Female (y/n)          1.32 (0.90-1.94)      0.16
*APACHE = Acute Physiology & Chronic Health Evaluation Score
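Odds ratios and their CIs in a table like this are usually transformations of logistic regression coefficients: OR = exp(β), with limits exp(β ± 1.96·SE). The sketch below assumes that Wald-type construction, which the slide does not state explicitly. It backs out β and SE from the transfusion row and then rebuilds the interval. The function names are hypothetical.

```python
from math import exp, log

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se):
    """Transform a logistic coefficient and its SE into an OR with a 95% CI."""
    return exp(beta), exp(beta - Z95 * se), exp(beta + Z95 * se)


def beta_se_from_or(odds_ratio, lower, upper):
    """Recover beta and SE from a reported OR and Wald-type 95% CI."""
    beta = log(odds_ratio)
    se = (log(upper) - log(lower)) / (2 * Z95)
    return beta, se


if __name__ == "__main__":
    # Transfusion row of the table: OR 4.15 (2.46-6.99).
    beta, se = beta_se_from_or(4.15, 2.46, 6.99)
    or_, lo, hi = odds_ratio_ci(beta, se)
    print(f"beta = {beta:.3f}, SE = {se:.3f}")
    print(f"OR = {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

The interval is symmetric on the log scale, not on the OR scale, which is why the upper limit sits farther from the OR than the lower one.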
Abstracts vs manuscripts
Bad (but common) practice to “rush” to get
abstract in before a deadline with idea that
sloppy methods and results will be “fixed”
in manuscript. I conclude therefore that
many abstracts are not to be trusted.
Are there results in the Abstract that are not
in the text?
“Never enough time to do it right, always
enough time to do it over.”
Science Citation index (Thomson Reuters)
Scopus (Elsevier)
“Bibliometrics” from these sources
imply that perhaps 1/3 to 1/2 of all science
papers are never cited or are only cited
once. This implies that, at least in part,
there is a LOT of sloppy stuff in the
literature that is not useful. This cluttering
may actually obscure the truth.
Publication bias
Negative results often not reported
Ethical issues/conflict of interest – Financial
disclosure and incentives. A problem when
private industry funds studies.
Investigators funded by the sponsors of a
treatment are more likely to give a positive
report.
When doing research, try to
follow the advice of Mark
Twain
“Always tell the truth. This will
gratify some people and
astonish the rest”.