Insights into the black box of statistics

What do you think about a doctor who
uses the wrong treatment, either wilfully
or through ignorance, or who uses the
right treatment wrongly (such as by
giving the wrong dose of a drug)?
Most people would agree that such
behaviour is unprofessional, arguably
unethical, and certainly unacceptable.
Derived from: Altman DG. The Scandal of Poor Medical Research. BMJ, 1994; 308:283
What do you think about researchers who use
the wrong techniques (either wilfully or in
ignorance), use the right techniques wrongly,
misinterpret their results, report their results
selectively or draw unjustified conclusions?
We should be appalled… but numerous studies
of the medical literature have shown that all of
the above phenomena are common.
Derived from: Altman DG. The Scandal of Poor Medical Research. BMJ, 1994; 308:283
Understanding your results
Research Talk
2015
Dr Emily Karahalios
emily.karahalios@unimelb.edu.au
Office for Research, Western Centre for Health Research & Education
Centre for Epidemiology and Biostatistics, Melbourne School of Population
and Global Health, University of Melbourne
Overview
• Defining your research question – PICOS
• Describing data
• Understanding the results
– Estimates reported in the literature
– Interpreting 95% confidence intervals and p-values ~ Statistical Inference
Research question
Participants / population
• neonates
Intervention / exposure
• 14 day administration of antenatal corticosteroids
Comparison
• 7 day administration of antenatal corticosteroids
Outcome
• Neonatal mortality and neonatal morbidity
Study design
• RCT
Research question
Murphy et al. The Lancet, 2008; 372:2143-2151.
Research question
Participants / population
• Women at high risk of preterm birth
Intervention / exposure
• 14 day administration of antenatal corticosteroids
Comparison
• 7 day administration of antenatal corticosteroids
Outcome
• Neonatal mortality and neonatal morbidity
Study design
• RCT
Study designs
The general idea…
– Evaluate whether a risk factor (or preventative factor) increases (or decreases) the risk of an outcome (e.g. disease, death, etc.)
[Diagram: exposure and outcome placed along a time axis]
Overview
• Defining your research question – PICOS
• Describing data
• Understanding the results
– Estimates reported in the literature
– Interpreting 95% confidence intervals and p-values ~ Statistical Inference
Summarising the data
Murphy et al. The Lancet, 2008; 372:2143-2151.
Summarising the data
Dreyfus et al. Journal of Pediatrics, 2015 online.
Summarising the data
Table 1: Baseline characteristics for 262 patients
Characteristic             Number (%) or mean [SD] or median {IQR}
Sex
  Males                    128 (48.9)
  Females                  134 (51.1)
Age (years)                49.8 [2.1]
Country of birth
  Australia/NZ             105 (40.1)
  Elsewhere                157 (59.9)
Length of stay (days)      4 {4.0}
Abbreviations: IQR = Inter-quartile range, SD = standard deviation, NZ = New Zealand
Summarising the data
Numerical
• Continuous (age, weight, height)
• Discrete (length of stay, # of hospital visits)
Categorical
• Nominal (sex, blood group)
• Ordinal (tumour stage, quintile of SES)
Summarising the data
• Which variables are categorical?
– Sex (Male/Female)
– Country of birth (Australia/Elsewhere)
• Which variables are numerical?
– Age (years): continuous
– Length of stay (days): discrete
Summarising the data
[Figure: histogram of Age (years), 44 to 54, with Density on the vertical axis]
Stata command: histogram Age
Summarising the data
Mean = 49.8 years
Standard deviation $= \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ = 2.1 years
Note: for a normally distributed variable, approximately 95% of observations lie within ±2×SD of the mean.
In this example, approximately 95% of observations lie between 45.6 and 54.0 years.
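The standard deviation formula and the ±2×SD rule can be sketched in a few lines of Python (used here instead of Stata so the example is self-contained). The list `ages` is hypothetical data invented for illustration; only the mean of 49.8 and SD of 2.1 come from the talk.

```python
import math
import statistics


def sample_sd(xs):
    """Sample standard deviation using the (n - 1) denominator."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


# Hypothetical ages (years), invented for illustration only.
ages = [47.5, 48.9, 49.2, 49.8, 50.1, 50.6, 51.3, 52.4]
sd = sample_sd(ages)
print(f"SD of hypothetical ages: {sd:.2f} (statistics.stdev: {statistics.stdev(ages):.2f})")

# Summary values quoted in the talk: mean 49.8 years, SD 2.1 years.
talk_mean, talk_sd = 49.8, 2.1
lower = talk_mean - 2 * talk_sd
upper = talk_mean + 2 * talk_sd
print(f"Approximate 95% range: {lower:.1f} to {upper:.1f} years")
```

The hand-written function should agree with `statistics.stdev`, which also uses the (n - 1) denominator.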
Summarising the data
[Figure: histogram of Length of stay (days), 0 to 35, with Density on the vertical axis]
Stata command: hist LOS
Summarising the data
Stata command: hist LOS, normal
Summarising the data
Mean = 5 days
Median
= 50th percentile
= 4 days
Summarising the data
Mean = 5 days
Median = 4 days
Mean is not a good measure of central tendency, and standard deviation is not a good measure of spread, for a skewed distribution.
Note, approximately 95% of observations lie within ±2×SD of the mean when the data are normally distributed.
In this example, 95% of observations would lie between -4.8 and 14.8 days,
BUT they don’t because LOS can’t be negative!
Summarising the data
Median
= 50th percentile
= 4 days
Inter-quartile range (IQR)
= lower quartile to upper quartile
= 25th percentile to 75th percentile
= 2 to 6 days
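A minimal Python sketch of why the median and IQR suit a skewed variable better than the mean and SD. The list `los` is hypothetical data invented to be right-skewed; it is not the data from the talk, so the printed values will not match the slides exactly. Note that statistical packages use slightly different quartile conventions, so quartiles from Stata may differ a little from Python's.

```python
import statistics

# Hypothetical lengths of stay (days), invented to be right-skewed.
los = [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 8, 12, 30]

mean = statistics.mean(los)
sd = statistics.stdev(los)
median = statistics.median(los)
# quantiles(n=4) returns the three cut points: Q1, median, Q3.
q1, _, q3 = statistics.quantiles(los, n=4)

print(f"mean = {mean:.1f} days, SD = {sd:.1f} days")
print(f"mean - 2*SD = {mean - 2 * sd:.1f} days (negative, so not a plausible stay)")
print(f"median = {median} days, IQR = {q1} to {q3} days")
```

The long right tail pulls the mean above the median, and mean minus 2×SD falls below zero, which is the problem the slide describes.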
Summarising the data
Table 1: Baseline characteristics for 262 patients
Characteristic             Number (%) or mean [SD] or median {IQR}
Sex
  Males                    128 (48.9)
  Females                  134 (51.1)
Age (years)                49.8 [2.1]
Country of birth
  Australia/NZ             105 (40.1)
  Elsewhere                157 (59.9)
Length of stay (days)      4 {2, 6}
Abbreviations: IQR = Inter-quartile range, SD = standard deviation, NZ = New Zealand
Summarising the data
[Figure: distributions annotated with central tendency and spread]
Summarising the data
[Figure: distributions showing positive skew and negative skew]
Summarising the data
Data variable - numerical
Plot histogram
• Unimodal
– Normally distributed: mean, standard deviation, minimum-maximum
– NOT normally distributed: median, inter-quartile range, minimum-maximum
• Multimodal
– Categorise variable
Simpson et al. J Fam Plan and Rep Health Care, 2001; 27:234-236.
Summarising the data
Numerical
• Continuous (age, weight, height)
– Normally distributed
– Skewed
• Discrete (length of stay, # of hospital visits)
Categorical
• Nominal (sex, blood group)
• Ordinal (tumour stage, quintile of SES)
Absolutely critical to choosing the appropriate form of statistical analysis
Overview
• Defining your research question – PICOS
• Describing data
• Understanding the results
– Estimates reported in the literature
– Interpreting 95% confidence intervals and p-values ~ Statistical Inference
Estimates reported in the literature
– Risk differences
– Odds ratios / risk ratio – logistic regression
– Beta-coefficients – linear regression
Measures of association – binary outcome
Binary variables – two categories only
(also termed – dichotomous variable)
Examples:
• Outcome – diseased or healthy; alive or dead
• Exposure – male or female; smoker or non-smoker;
treatment or control group
Comparing two proportions
                        With outcome    Without outcome    Total
                        (diseased)      (disease free)
Exposed (group 1)       d1              h1                 n1
Unexposed (group 0)     d0              h0                 n0
Total                   d               h                  n
• Proportion of all subjects experiencing outcome, p = d/n
• Proportion of exposed group, p1 = d1/n1
• Proportion of unexposed group, p0 = d0/n0
Comparing two proportions - TBM Trial
Adults with tuberculous meningitis randomly allocated
into 2 treatment groups:
1. Dexamethasone
2. Placebo
Outcome measure: Death during 9 months following
start of treatment.
Research question:
Can treatment with dexamethasone reduce the risk of
death among adults with tuberculous meningitis?
Thwaites et al 2004
Comparing two proportions
Death during 9 months post start of treatment
Treatment group             Yes    No     Total
Dexamethasone (group 1)     87     187    274
Placebo (group 0)           112    159    271
Total                       199    346    545
Thwaites et al 2004
Comparing two proportions - TBM Trial
Measure of effect     Formula
Risk difference       p1-p0
Risk Ratio (RR)       p1/p0
Odds Ratio (OR)       (d1/h1)/(d0/h0)
When there is no association between exposure and outcome:
– Risk difference = 0
– Risk ratio (RR) = 1
– Odds Ratio (OR) = 1
Comparing two proportions
Death during 9 months post start of treatment
Treatment group             Yes         No          Total
Dexamethasone (group 1)     87 (d1)     187 (h1)    274 (n1)
Placebo (group 0)           112 (d0)    159 (h0)    271 (n0)
Total                       199         346         545
Risk difference = p1-p0 = (87/274)-(112/271) = -0.096
Risk ratio = p1/p0 = (87/274)/(112/271) = 0.77
Odds ratio = (d1/h1)/(d0/h0) = (87/187)/(112/159) = 0.66
Thwaites et al 2004
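The three measures of effect can be checked with a short Python sketch applied to the TBM trial counts. The function name `two_by_two` and its argument names are hypothetical, chosen to mirror the d1, h1, d0, h0 notation of the 2×2 table. Working from unrounded proportions, the risk difference comes out at about -0.096.

```python
def two_by_two(d1, h1, d0, h0):
    """Measures of effect for a 2x2 table.

    d1, h1: exposed group with / without the outcome
    d0, h0: unexposed group with / without the outcome
    """
    n1, n0 = d1 + h1, d0 + h0
    p1, p0 = d1 / n1, d0 / n0
    return {
        "risk_difference": p1 - p0,
        "risk_ratio": p1 / p0,
        "odds_ratio": (d1 / h1) / (d0 / h0),
    }


# TBM trial: dexamethasone (group 1) versus placebo (group 0).
tbm = two_by_two(d1=87, h1=187, d0=112, h0=159)
for name, value in tbm.items():
    print(f"{name}: {value:.3f}")
```

A risk ratio and odds ratio below 1, and a risk difference below 0, all point the same way: fewer deaths in the dexamethasone group than in the placebo group.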
Comparing two proportions - TBM Trial
Linear regression
Dreyfus et al. Journal of Pediatrics, 2015 online.
Linear regression
There are four assumptions underlying our linear regression model:
• Linearity (outcome and exposure)
• Normality (residual variation)
• Independence (of observations)
• Homoscedasticity (constant variance)
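As a rough illustration of where the beta-coefficient and the residuals come from, here is a minimal least-squares sketch in Python. The `x` and `y` values are hypothetical and invented for illustration; they are not from Dreyfus et al. The residuals it produces are what one would plot (for example against the fitted values, or as a histogram) to check the normality and constant-variance assumptions.

```python
# Hypothetical exposure (x) and outcome (y) values, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares estimates for the model y = alpha + beta * x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
beta = sxy / sxx                 # change in mean outcome per unit of exposure
alpha = mean_y - beta * mean_x   # mean outcome when exposure is zero

fitted = [alpha + beta * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print(f"beta = {beta:.2f}, alpha = {alpha:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With an intercept in the model the residuals always sum to zero, so it is their pattern and spread, not their average, that carries information about the assumptions.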
Overview
• Defining your research question – PICOS
• Describing data
• Understanding the results
– Estimates reported in the literature
– Interpreting 95% confidence intervals and p-values ~ Statistical Inference
Statistical Inference
Statistical Inference
We follow a standard four-step process
1) Sample size
2) Estimate of the effect size
3) Calculate a confidence interval
4) Derive a p-value to test the hypothesis of no association
Statistical Inference
Definition of a confidence interval
REMEMBER!
If we were to draw several independent, random samples (of equal size) from the same population and calculate 95% confidence intervals for each of them, then on average 19 out of every 20 (95%) such confidence intervals would contain the true population proportion (π), and one of every 20 (5%) would not.
[Figure: sample proportion and 95% CI for each of 20 samples (Sample 1 to 20), plotted against the population proportion = 0.16 (16%)]
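The repeated-sampling definition can be illustrated with a small simulation in Python. This is a sketch under stated assumptions: the population proportion of 0.16 is taken from the figure, but the sample size of 200, the 2000 repetitions, the random seed and the simple normal-approximation interval are all choices made here for illustration, not details from the talk.

```python
import math
import random

random.seed(2015)  # fixed seed so the run is repeatable

TRUE_P = 0.16      # population proportion shown in the figure
N = 200            # hypothetical size of each sample (not given in the talk)
N_SAMPLES = 2000   # number of repeated samples (the figure shows 20)

covered = 0
for _ in range(N_SAMPLES):
    # Draw one random sample and count how many have the outcome.
    d = sum(random.random() < TRUE_P for _ in range(N))
    p_hat = d / N
    # Normal-approximation 95% CI for a proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / N)
    lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
    if lower <= TRUE_P <= upper:
        covered += 1

coverage = covered / N_SAMPLES
print(f"Proportion of intervals containing the true value: {coverage:.3f}")
```

The printed proportion should be close to 0.95, which is the long-run behaviour the definition describes; any single interval either contains the true value or it does not.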
Statistical Inference
P-value
How likely is it we would see a difference this big IF there was NO real difference between the populations?
What is the probability (P-value) of finding a difference at least as large as the one observed IF the null hypothesis is true?
Statistical Inference
Interpretation of p-values
[Figure: P-value scale running from 1 through 0.1, 0.01 and 0.001 down to 0.0001]
– P-values towards 1: weak evidence against the null hypothesis
– Increasing evidence against the null hypothesis with decreasing P-value
– P-values towards 0.0001: strong evidence against the null hypothesis
Statistical Inference
Overweight and obese adults living in the UK
Mean weight loss after 4 weeks
Atkins group – 4.40 kg
Weight Watchers group – 2.86 kg
300 adults participating in an RCT comparing 2 dietary interventions
Source: Truby H et al. BMJ 2007
Statistical Inference
Example: Randomised controlled trial of weight loss programmes in the UK
Group              n     Sample mean weight loss    Sample standard    Sample standard
                         after 4 weeks (kg)         deviation          error
Atkins             57    4.40                       2.45               0.32
Weight Watchers    58    2.86                       2.23               0.29
1) Estimate of difference in population mean weight loss after 4 weeks between Atkins & Weight Watchers groups = 4.40 – 2.86 = 1.54 kg
2) 95% CI: 0.67 kg to 2.41 kg
Source: Truby H et al. BMJ 2007
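The standard errors, the difference in means and an approximate 95% CI can be reproduced from the table with a short Python sketch. It assumes a normal approximation with unpooled standard errors, which is a simplification chosen here; it gives an interval of about 0.68 to 2.40 kg, very close to the 0.67 to 2.41 kg reported, which was presumably obtained with a t distribution.

```python
import math
from statistics import NormalDist

# Summary statistics from the table (weight loss after 4 weeks, kg).
n_a, mean_a, sd_a = 57, 4.40, 2.45   # Atkins
n_w, mean_w, sd_w = 58, 2.86, 2.23   # Weight Watchers

# Standard error of each sample mean: SD / sqrt(n).
se_a = sd_a / math.sqrt(n_a)
se_w = sd_w / math.sqrt(n_w)

# Difference in means and its standard error (unpooled).
diff = mean_a - mean_w
se_diff = math.sqrt(se_a ** 2 + se_w ** 2)

# 95% CI using the normal approximation (multiplier of about 1.96).
z = NormalDist().inv_cdf(0.975)
ci_lower = diff - z * se_diff
ci_upper = diff + z * se_diff

print(f"SE Atkins = {se_a:.2f}, SE Weight Watchers = {se_w:.2f}")
print(f"Difference = {diff:.2f} kg, 95% CI {ci_lower:.2f} to {ci_upper:.2f} kg")
```

The structure is the same for any estimate: estimate plus or minus a multiplier times its standard error.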
Statistical Inference
Interpretation
1) We found a difference of 1.54 kg in mean weight loss after 4 weeks between the Atkins & Weight Watchers diet groups.
2) From the 95% confidence interval, the true difference could be as much as 2.41 kg (much greater weight loss for the Atkins diet) or as little as 0.67 kg (marginally greater weight loss for the Atkins diet compared with Weight Watchers).
Statistical Inference
P-value: comparing two groups
How likely is it we would see a difference this big IF there was NO real difference between the populations?
What is the probability (P-value) of finding a difference at least as large as the one observed IF the null hypothesis is true?
Statistical Inference
Null hypothesis –
There is no difference in the population mean weight loss after 4
weeks between the Atkins and Weight Watchers groups
2-sided p-value <0.001
Thus the probability of observing a difference of at least 1.54 kg in the
sample means of the two groups, assuming the null hypothesis is true,
is <0.001 or <0.1%.
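A rough check of this p-value in Python, again assuming a normal approximation with unpooled standard errors rather than whichever test the trial authors used. The summary statistics are those from the weight loss table; everything else is a simplification made here.

```python
import math
from statistics import NormalDist

# Observed difference in mean weight loss (kg) and its standard error,
# built from the group SDs and sample sizes in the table.
diff = 4.40 - 2.86
se_diff = math.sqrt(2.45 ** 2 / 57 + 2.23 ** 2 / 58)

# Test statistic: how many standard errors the difference is from zero,
# zero being the value expected under the null hypothesis.
z = diff / se_diff

# Two-sided p-value: probability of a difference at least this large,
# in either direction, if the null hypothesis were true.
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided p = {p_two_sided:.5f}")
```

The difference is about 3.5 standard errors from zero, so the approximate p-value falls below 0.001, in line with the reported result.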
Statistical Inference
Presenting the results
1) Sample size
300 adults participating in an RCT comparing 2 dietary interventions
2) Estimate of the effect size
Difference in mean weight loss after 4 weeks, Atkins group compared with Weight Watchers: 1.54 kg
3) Calculate a confidence interval
95% CI for difference in population means: 0.67 kg to 2.41 kg
4) Derive a p-value to test the hypothesis of no association
P-value < 0.001
Overview
• Defining your research question – PICOS
• Describing data
• Understanding the results
– Estimates reported in the literature
– Interpreting 95% confidence intervals and p-values ~ Statistical Inference