Statistical Reporting
Stats in the literature
Tables, Graphs, Words
(van Belle, Statistical Rules of Thumb, 2002)
Bad – The blood types of the US population are 40%, 11%, 4% and 45% for A, B, AB and O respectively.
Good –
Blood type   Percent
O            45%
A            40%
B            11%
AB           4%
Bad – Occupation, number active in 1980 (alphabetical order)
Chiropractors 25,600
Dentists 121,240
Nutritionists 32,000
Nurses 1,272,900
Optometrists 22,330
Pharmacists 142,780
Physicians 427,122
Good – Occupation, number active in 1980 (ordered by size)
Nurses 1,272,900
Physicians 427,122
Pharmacists 142,780
Dentists 121,240
Nutritionists 32,000
Chiropractors 25,600
Optometrists 22,330
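A minimal Python sketch of the reordering above, assuming the counts are held in a plain dictionary (the variable names are illustrative, not from the source). Sorting by count rather than alphabetically puts the largest occupations first.

```python
# Counts from the 1980 occupation table, entered in alphabetical order.
occupations = {
    "Chiropractors": 25_600,
    "Dentists": 121_240,
    "Nutritionists": 32_000,
    "Nurses": 1_272_900,
    "Optometrists": 22_330,
    "Pharmacists": 142_780,
    "Physicians": 427_122,
}

# Sort by count, largest first, so the ranking is visible at a glance.
ranked = sorted(occupations.items(), key=lambda item: item[1], reverse=True)

for name, count in ranked:
    print(f"{name:<15}{count:>12,}")
```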
Don’t use pie charts
(particularly for 2+ dimensions)
Blood type   Rh+   Rh-   total
O            38    7     45
A            34    6     40
B            9     2     11
AB           3     1     4
total        84    16    100
Bar graphs ok in simple situations
(& put in SE or CI bars)
Don’t use bar graphs in complex situations
* = 0.05 ≤ p < 0.10
** = 0.01 ≤ p < 0.05
*** = p < 0.01
Report actual p values
Be clear about the reference group
group   n    mean   SEM   p value*
cntl    12   20     5     ref
B       22   35     6     0.062
C       9    50     5     0.020
D       7    15     4     0.124
*p value compared to control group
Report actual p values when possible, not just an asterisk if p < 0.05. (Tables are better than bar charts for this.)
Don’t use 3-D bars for one variable.
Report n. Report SE or CI for all estimates
Make sure all estimates are labeled with the correct units (i.e., b = 3.0 mg/L, not b = 3.0).
Define any uncommon symbols, including those in equations (b = slope, Δ = mean difference in mg). Define abbreviations.
Make sure all tables have a title.
Label all axes & bars and explain the meaning of all graph symbols.
Be careful about drawing firm conclusions about issues on which no data were presented. Distinguish between conclusions vs discussion/speculation.
Give enough information that a person with the proper training could reproduce the results if the raw data were provided.
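As an illustration of "report n, report SE or CI, label units", here is a small Python sketch. The data values and the `summarize` helper are hypothetical, and the interval uses a normal approximation; with a sample this small a t quantile would be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev

def summarize(values, units, level=0.95):
    """Return n, mean, SD, SEM and an approximate CI, with units attached."""
    n = len(values)
    m = mean(values)
    sd = stdev(values)              # sample SD: spread of individual values
    sem = sd / math.sqrt(n)         # SEM: uncertainty of the mean, not the same as SD
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return {
        "n": n,
        "mean": m,
        "sd": sd,
        "sem": sem,
        "ci": (m - z * sem, m + z * sem),
        "units": units,
    }

# Hypothetical concentrations in mg/L (made up for this example).
summary = summarize([2.8, 3.1, 3.4, 2.9, 3.3, 3.0, 3.2, 2.7], units="mg/L")
print(
    f"n = {summary['n']}, mean = {summary['mean']:.2f} {summary['units']}, "
    f"SEM = {summary['sem']:.2f} {summary['units']}, "
    f"95% CI = ({summary['ci'][0]:.2f}, {summary['ci'][1]:.2f}) {summary['units']}"
)
```

The printed line states which quantity is which, so a reader cannot mistake the SEM for an SD.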
CONSORT (CONsolidated Standards for Reporting Trials) guidelines are formulated for clinical trials, but most of the guidelines apply to other designs.
STROBE (STrengthening the Reporting of OBservational studies in Epidemiology) statement.
TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis)
*Objectives & hypotheses
*Background & rationale
*Design, inclusion/exclusion criteria
*Settings (population) & locations where data collected
*Blinding (if relevant)
*Interventions (if any)
*Primary & secondary outcomes
Objective – Compare weight loss under 4 diets
Design – RCT, twelve-month trial
Setting & Participants – Local recruitment. Trial conducted in US from Feb 2003 to Oct 2005 among 311 free-living, overweight/obese (BMI 27-40), nondiabetic, premenopausal women.
Blinding – Not possible for patients
Interventions – 4 diets
Primary outcome – 12 month wt loss
Secondary outcomes – Lipids (LDL, HDL), insulin, glucose, BP, energy, carbs, protein
*How sample size was determined-power
*How randomization was done, if relevant (blocks, stratified?)
*Interim analysis & stopping rules, if any.
*Statistical methods used-tests (parametric or non), models, univariate & multivariate methods, multiple testing correction method (if any), assumptions
Sample size – “Based on previous trials, we projected a 6.3-kg SD of weight change …. Thus, with 4 treatment groups and a projected 75 participants per group, the study was designed to have 80% power to detect a 2.7 kg difference for the 12 month weight change between groups.”
$\sigma = 6.3$ kg, $\alpha = 0.05$, $Z_{0.975} = 1.96$
$\Delta = 2.7$ kg, power $= 0.80$, $Z_{0.800} = 0.84$
$n = 2 (Z_{\alpha} + Z_{\text{power}})^2 (\sigma/\Delta)^2 = 2 (1.96 + 0.84)^2 (\sigma/\Delta)^2$
$n = (15.7)(6.3/2.7)^2 = 85$ ???
(ANOVA correction, k=4, gives n=81)
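The arithmetic above can be checked with a short Python sketch. It implements only the two-group normal-approximation formula from this slide, not the ANOVA correction; the function name `n_per_group` is an assumption made for this example.

```python
import math
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group n for comparing two means (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # Z_0.975, about 1.96
    z_power = NormalDist().inv_cdf(power)           # Z_0.800, about 0.84
    return 2 * (z_alpha + z_power) ** 2 * (sigma / delta) ** 2

# Values quoted in the trial report: SD 6.3 kg, detectable difference 2.7 kg.
n = n_per_group(sigma=6.3, delta=2.7)
print(f"n per group = {n:.1f}, rounded up to {math.ceil(n)}")
```

The result lands between 85 and 86 per group, in line with the hand calculation and above the 75 per group projected in the quoted text.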
Randomization – “Randomization was conducted in blocks of 24 (6 per treatment group) and occurred by having a blinded research technician select folded pieces of paper with group assignments from an opaque envelope”.
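A software analogue of the blocked scheme described above might look like the sketch below. It is an assumption-laden illustration, not the trial's procedure (which used folded paper in an envelope): the group labels, function name, and seed are hypothetical.

```python
import random

def blocked_assignments(n_participants, groups, per_group_in_block, seed=None):
    """Permuted-block randomization: every full block holds each group equally often."""
    rng = random.Random(seed)
    template = [g for g in groups for _ in range(per_group_in_block)]
    assignments = []
    while len(assignments) < n_participants:
        block = template[:]      # fresh copy of the block contents
        rng.shuffle(block)       # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]   # the last block may be incomplete

# Blocks of 24 = 4 groups x 6, as in the quoted description; labels are placeholders.
groups = ["diet 1", "diet 2", "diet 3", "diet 4"]
assignments = blocked_assignments(311, groups, per_group_in_block=6, seed=2003)
print(assignments[:8])
```

Blocking keeps the group sizes balanced throughout enrollment; only the final, partial block can be slightly unbalanced.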
Statistical Methods - Differences among diets for 12-month changes from baseline were tested by ANOVA. For statistically significant ANOVAs, all pairwise comparisons among the 4 diets were tested using the Tukey studentized range adjustment. Statistical testing of changes from baseline to 2 months and to 6 months using pairwise comparisons are presented for descriptive purposes.
Multiple regression was used to examine potential interactions between race/ethnicity and diet group for effects on weight loss; there were no significant interactions. All statistical tests were 2-tailed using a significance level of 0.05.
Reporting:
Number analyzed, lost/missing, excluded
Baseline data including demographics
Stats for primary outcome at least!
Absolute and relative measures for binary outcomes
Discussion:
Limitations, generalisability, biases, was power/sample size adequate?
Sources of funding, acknowledgements
Atkins is best
Insufficient follow up – wt still increasing
No assessment of diet adherence
Prior exposure to diet before study
No assessment of satiety
Menstrual cycle not considered when measuring lipids
Good organization – Put the statistical methods in a subsection of the methods section.
There is no need to repeat the statistical methods in the results section. Don’t put methods in the results section or results in the methods section.
Try to use the same order of presentation in all sections.
Statistical methods: Means +/- SEM in ml are reported for sperm volume.
The p values for comparing mean sperm volume across groups were computed under an analysis of variance (ANOVA) model. The Fisher LSD criterion was used to control for multiple comparisons.
Quantile plots were examined to see if sperm volume followed the Gaussian distribution.
(methods):
The p values for comparing proportions were computed using chi-square tests.
Multivariate-
Linear regression was used to quantify the association of sperm volume with age and occupation. Spline methods were used to verify that the relationship was linear.
A p value < 0.05 was considered statistically significant.
Results:
Table 1 shows that the mean sperm volume was highest in the biostatistics group and was significantly higher than controls (5.0 +/- 1.0 vs 3.2 +/- 0.9, p = 0.0274).
Figure 1 shows sperm volume versus age in controls. The average rate of decline was 0.30 +/- 0.07 ml/year.
Mistakes that make me suspicious
*If there are “n” rats in your study, do the various sample size tables add up to “n”?
*Is the amount of missing reported?
*Do numbers add up?
*If the mean in group A is 20 and the mean in group B is 40, and these are the only groups, is the overall mean between 20 and 40?
*If mean time from v1 to v2 is 4 mos & from v2 to v3 is 1 month, is time from v1 to v3 = 5 mos?
*Do SDs imply that the 95% PI takes impossible values (negative)? (Need a transformation?)
*Are correlations, ORs, RRs, in impossible directions?
*Are p values reported with NO summary statistics? (gender was sig, p < 0.01)
*Are SEs or CIs reported, labeled with correct units & not confused with SDs or PIs?
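Several of these checks are simple enough to automate. The Python sketch below is one possible way to do so; the function names and the example numbers are hypothetical, and the prediction interval assumes an approximately Gaussian variable.

```python
def counts_add_up(total_n, subgroup_ns):
    """Do the subgroup sample sizes sum to the stated total n?"""
    return sum(subgroup_ns) == total_n

def overall_mean_is_plausible(overall_mean, group_means):
    """An overall mean must lie between the smallest and largest group means."""
    return min(group_means) <= overall_mean <= max(group_means)

def prediction_interval(mean, sd, z=1.96):
    """Approximate 95% prediction interval, mean +/- 1.96 SD (Gaussian assumption)."""
    return (mean - z * sd, mean + z * sd)

def interval_goes_negative(mean, sd):
    """Flag a nonnegative quantity whose reported mean and SD imply negative values."""
    lower, _ = prediction_interval(mean, sd)
    return lower < 0

# Hypothetical reported numbers, used only to exercise the checks.
print(counts_add_up(50, [12, 22, 9, 7]))            # sizes sum to 50
print(overall_mean_is_plausible(45, [20, 40]))      # 45 is outside 20 to 40
print(interval_goes_negative(mean=10, sd=8))        # lower limit is below zero
```

A negative lower limit for a quantity that cannot be negative suggests a skewed distribution, where a transformation (for example, logs) may be needed before summarizing.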
Are survival curves and/or hazard rates reported (with SEs), not just percent alive with no time info attached?
Want statistics for follow up time including median (not mean) follow up time. Same follow up in each group?
Should not extend survival curve to last follow up time, when most are dead and/or have no follow up.
Multivariate regression (TRIPOD):
What were the candidate predictors?
What method was used to pick the final predictors (e.g., what α level was used)?
Were interactions considered at least with the primary predictor of interest (jg) ?
Was linearity assumed or verified?
(JG-Final equation given in an appendix or footnote?)
SEs or CIs given for all βs or their transformation (ORs, HRs)?
Were performance/fit measures reported?
Was there validation?
Predictors of in hospital infection
Characteristic       Odds Ratio (95% CI)   p value
Incr APACHE score    1.15 (1.11-1.18)      <.001
Transfusion (y/n)    4.15 (2.46-6.99)      <.001
Increasing age       1.03 (1.02-1.05)      <.001
Malignancy           2.60 (1.62-4.17)      <.001
Max Temperature      0.70 (0.58-0.85)      <.001
Adm to treat>7 d     1.66 (1.05-2.61)      0.03
Female (y/n)         1.32 (0.90-1.94)      0.16
*APACHE = Acute Physiology & Chronic Health Evaluation Score
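One way to check that a table like this is internally consistent is to recover the p value from the odds ratio and its CI. The Python sketch below assumes the intervals are Wald intervals, symmetric on the log-odds scale (the source does not say how they were computed); the function name is hypothetical.

```python
import math
from statistics import NormalDist

def p_from_odds_ratio_ci(odds_ratio, lower, upper, level=0.95):
    """Two-sided Wald p value implied by an odds ratio and its confidence limits."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # On the log scale the CI is log(OR) +/- z_crit * SE, so its width gives the SE.
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se_log_or
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Two rows from the table above.
p_admission = p_from_odds_ratio_ci(1.66, 1.05, 2.61)
p_female = p_from_odds_ratio_ci(1.32, 0.90, 1.94)
print(f"Adm to treat > 7 d: p = {p_admission:.3f}")
print(f"Female:             p = {p_female:.3f}")
```

Both recovered values are close to the reported 0.03 and 0.16, so these rows agree with their intervals. A large mismatch would be a reason for suspicion.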
It is bad (but common) practice to “rush” to get an abstract in before a deadline with the idea that sloppy methods and results will be “fixed” in the manuscript. I therefore conclude that many abstracts are not to be trusted.
Are there results in the Abstract that are not in the text?
“Never enough time to do it right, always enough time to do it over.”
Science Citation index (Thomson Reuters)
Scopus (Elsevier)
“Bibliometrics” from these sources imply that perhaps 1/3 to 1/2 of all science papers are never cited or are only cited once. This implies that, at least in part, there is a LOT of sloppy stuff in the literature that is not useful. This cluttering may actually obscure the truth.
Negative results often not reported
Ethical issues/conflict of interest – Financial disclosure and incentives. A problem when private industry funds studies.
Investigators funded by the sponsors of a treatment are more likely to give a positive report.
When doing research, try to follow the advice of Mark Twain:
“Always tell the truth. This will gratify some people and astonish the rest.”