Epidemiology Module 3:
Systematic and Random Error
(Biases and Statistical Precision)
Tuhina Neogi, MD, PhD, FRCPC
Steven Vlad, MD
Clinical Epidemiology Research and Training Unit
Boston University School of Medicine
Goals
• Understand the difference between
systematic error and random error
• Review types of systematic error (bias)
– confounding, information bias (measurement
error), selection bias
• Review random error (statistical precision)
– correct interpretations of confidence intervals
and p-values
– Type 1 and Type 2 errors
Bias
How to interpret results of a study
• The result of a study (the stated effect
measure) can arise because:
– It is the truth
– It is due to bias (systematic error)
– It is due to chance (random error)
Bias/Systematic Error
• Once we know the study result, we need
to focus on whether the result is the truth,
or whether it could have been the result of
bias
– If a study’s result is RR=1.2, could the true
(unbiased) result be higher or lower than that
value?
• How could possible biases influence the reported
result?
What are biases
(systematic errors)?
• Biases can move the observed result (as
opposed to the true result) away from the null or
toward the null
– Null = 0 for difference measures (e.g. risk
difference)
– Null = 1 for ratio measures (e.g. risk ratio)
– Away from the null:
• if true result > null, effect appears larger
• if true result < null, effect appears smaller
– Toward the null:
• if true result > null, effect appears smaller
• if true result < null, effect appears larger
– Bias has nothing to do with sample size; it is an
effect of study design
Bias Away from Null / Bias Toward Null
[Figures: number lines from -∞ to +∞ with the null (0 or 1) marked; observed values A and B are displaced relative to the true value, either away from or toward the null]
Direction of Bias
• Often, the result of bias is unpredictable
– Either toward or away from the null
• In some circumstances we can predict the
effect of a bias
– Usually toward the null
Types of Bias
• Only 3 types of bias!
– all studies
– regardless of study design
• The different study designs have ways of dealing
with these biases – whether it’s done effectively
determines how valid the study is
The 3 different types of biases
• Confounding
• Information Bias (Measurement Error)
• Selection Bias
Confounding
Confounding
• Question:
– Are there other factors that may account for
the apparent relationship between exposure
(intervention) and disease (outcome)?
– or
– Is there a common cause of both exposure
and disease?
Two ways to look at Confounding
[Two diagrams: in each, the Confounder points to both Exposure and Disease]
Confounding
• Note
– Confounder not caused by exposure or disease
– Confounder not on causal path from exposure
to disease
• i.e. not an intermediate
[Diagram: Confounder → Exposure; Confounder → Disease]
Examples
– Smoking → Yellow Finger; Smoking → Lung Cancer
– Diabetes → CRP; Diabetes → CAD
– Depression → SSRI; Depression → Suicide
Why Does Confounding Occur?
• As an imbalance in proportions of the
confounder between the two comparison
groups
Total Population
              Exposed     Unexposed
Cases         4,600       140
N             1,000,000   1,000,000
Risk          0.0046      0.00014
Risk Ratio = 0.0046 / 0.00014 = 33
Total Population
              Exposed     Unexposed
Cases         4,600       140
N             1,000,000   1,000,000
Risk          0.0046      0.00014
RR = 33

Men
              Exposed     Unexposed
Cases         4,500       50
N             900,000     100,000
Risk          0.005       0.0005
RR = 10

Women
              Exposed     Unexposed
Cases         100         90
N             100,000     900,000
Risk          0.001       0.0001
RR = 10
Total Population
              Exposed     Unexposed
Cases         4,600       140
N             1,000,000   1,000,000
Risk          0.0046      0.00014
RR = 33

Men
              Exposed     Unexposed
Cases         4,140       14
N             900,000     100,000
Risk          0.0046      0.00014
RR = 33

Women
              Exposed     Unexposed
Cases         460         126
N             100,000     900,000
Risk          0.0046      0.00014
RR = 33
Men
              Exposed     Unexposed
Cases         2,500       250
N             500,000     500,000
Risk          0.005       0.0005
RR = 10

Women
              Exposed     Unexposed
Cases         1,000       100
N             500,000     500,000
Risk          0.002       0.0002
RR = 10

Total Population
              Exposed     Unexposed
Cases         3,500       350
N             1,000,000   1,000,000
Risk          0.0035      0.00035
RR = 10
Why Does Confounding Occur?
[Diagram: Confounder → Exposure; Confounder → Disease]

Men
              Exposed     Unexposed
Cases         4,500       50
N             900,000     100,000
Risk          0.005       0.0005
RR = 10

Women
              Exposed     Unexposed
Cases         100         90
N             100,000     900,000
Risk          0.001       0.0001
RR = 10
Why Does Confounding Occur?
[Diagram: Smoking → Yellow Fingers; Smoking → Lung Cancer]

Smokers
              Yellow      Not Yellow
Cases         4,500       500
N             900,000     100,000
Risk          0.005       0.005
RR = 1

Non-smokers
              Yellow      Not Yellow
Cases         100         900
N             100,000     900,000
Risk          0.001       0.001
RR = 1

Crude RR = (4,600/1M) / (1,400/1M) = 3.3
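The arithmetic in these tables is easy to check in code. Below is a minimal sketch (plain Python, no external libraries, using only the counts from the tables above) showing that the crude yellow finger–lung cancer RR of 3.3 vanishes once smoking is held fixed:

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk ratio = risk in the exposed / risk in the unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Stratum data: (cases yellow, N yellow, cases not yellow, N not yellow)
strata = {
    "smokers":     (4_500, 900_000, 500, 100_000),
    "non-smokers": (100,   100_000, 900, 900_000),
}

for name, (a, n1, b, n0) in strata.items():
    print(f"{name}: RR = {risk_ratio(a, n1, b, n0):.1f}")  # both 1.0

# Collapsing over smoking gives the confounded crude RR of 3.3
crude = risk_ratio(4_500 + 100, 900_000 + 100_000,
                   500 + 900, 100_000 + 900_000)
print(f"crude: RR = {crude:.1f}")
```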
Confounding cont’d
• Does hyperlipidemia cause MIs?
– Do higher lipid levels cause MIs?
[Diagram: High Lipids → MI?, confounded by BP, age, gender, BMI, smoking, DM]
– Does lowering lipid levels lower the risk of MI?
• Does statin use lower lipids?
– If so, does that have an effect on lowering MI events?
[Diagram: statins → lower lipid levels → MI?, confounded by BP, age, gender, BMI, smoking, DM, other healthy lifestyle factors, adherence]
– Does a diet high in cholesterol increase the risk of MI?
• Does a ‘heart healthy diet’ lower lipids?
– If so, does that have an effect on lowering MI events?
[Diagram: heart healthy diet → lower lipid levels → MI?, confounded by BP, age, gender, BMI, smoking, DM, other healthy lifestyle factors, diet, exercise, adherence]
Confounding by Indication
• Particularly common and difficult to deal
with in (observational)
pharmacoepidemiology studies
[Diagram: RA disease severity → TNF-antagonist use; RA disease severity → Lymphoma]
[Diagram: Depression → SSRI; Depression → Suicide]
What Confounding is NOT
• Confounding IS NOT
– A factor on the causal pathway (intermediate)
• high fat diet → LDL → CAD
• smoking → adenomatous polyp → colon CA
– A factor that modifies the relationship
between an exposure and a disease
• Effect of anti-HTN drug is different in Blacks vs
Whites
• Effect of blood levels of X on risk of Y in men vs
women
Effect Modification
(aka Interaction)

Total Population
              High BP     Normal BP
MI            4,700       140
N             1,000,000   1,000,000
Risk          0.0047      0.00014
RR = 33.6

White Americans
              High BP     Normal BP
MI            4,500       50
N             900,000     100,000
Risk          0.005       0.0005
RR = 10

Black Americans
              High BP     Normal BP
MI            200         90
N             100,000     900,000
Risk          0.002       0.0001
RR = 20
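The same kind of check distinguishes effect modification from confounding: here the stratum-specific RRs genuinely differ (10 vs 20), so no single adjusted number summarizes the effect. A minimal sketch using the counts from the tables above:

```python
def risk_ratio(a, n1, b, n0):
    """Risk ratio from exposed cases/N and unexposed cases/N."""
    return (a / n1) / (b / n0)

rr_white = risk_ratio(4_500, 900_000, 50, 100_000)       # 10.0
rr_black = risk_ratio(200, 100_000, 90, 900_000)         # 20.0
rr_crude = risk_ratio(4_700, 1_000_000, 140, 1_000_000)  # ~33.6

print(f"{rr_white:.1f} {rr_black:.1f} {rr_crude:.1f}")
# Because the stratum RRs differ, report them separately rather than
# averaging them into one 'adjusted' estimate.
```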
How does an RCT address
Confounding?
• Randomization
– evenly distribute both known and unknown
confounders (by chance) between the two (or
more) exposure groups (i.e. active treatment
and placebo)
– Can use specific inclusion/exclusion criteria to
ensure everyone is the same for a particular
confounder (e.g., all males in the study) [also
known as restriction]
Control of Confounding by
Randomization
– Since potential confounders are balanced
between exposure groups, they can’t
confound the association

1,000,000 get SSRI; 1,000,000 get placebo
[Diagram: Depression → SSRI; Depression → Suicide]

Depression
              SSRI        placebo
Deaths        250         250
N             500,000     500,000
Risk          0.0005      0.0005
RR = 1

No Depression
              SSRI        placebo
Deaths        100         100
N             500,000     500,000
Risk          0.0002      0.0002
RR = 1

Depression distributed equally by chance, so:
Crude RR = (350/1M) / (350/1M) = 1
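A toy simulation of why this works (the 50% depression prevalence and 1:1 allocation are assumptions for illustration, not data from any trial): with about a million subjects per arm, chance alone balances the confounder across arms almost exactly, as shown below.

```python
import random

random.seed(0)
n = 2_000_000  # ~1,000,000 per arm, as in the example above

depressed = [random.random() < 0.5 for _ in range(n)]  # confounder status
ssri_arm = [random.random() < 0.5 for _ in range(n)]   # random 1:1 allocation

n_ssri = sum(ssri_arm)
p_ssri = sum(d for d, a in zip(depressed, ssri_arm) if a) / n_ssri
p_placebo = sum(d for d, a in zip(depressed, ssri_arm) if not a) / (n - n_ssri)

# The two proportions are nearly identical, so depression cannot
# confound the SSRI-suicide comparison.
print(f"depressed: {p_ssri:.4f} (SSRI) vs {p_placebo:.4f} (placebo)")
```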
Control of Confounding in an RCT
1. Check Table 1 for imbalances between
treatment arms
– Are any of the differences clinically meaningful?
(not the same as statistically significant!)
– How could those differences affect the results?
– What other potential confounders are missing
from Table 1?
– Could they affect the results if they were
imbalanced between the treatment arms?
– How likely is it that unknown confounders are
imbalanced? (with large RCTs, unlikely)
Control of Confounding in an RCT
2. Intention-to-treat analysis
– Maintains the balance of potential
confounders given by randomization,
thereby continuing to (theoretically) address
confounding
– May need to ‘control’ for very unbalanced
factors in the analyses
Control of Confounding in
Observational Studies
• Study design level: Restriction
– Inclusion/exclusion criteria to limit study population to a more
homogeneous group
• E.g. study of alcohol and MI may exclude smokers since smoking is
an important confounder
• Data analysis level: Stratification
– Analyze data stratified by an important confounder (see the Mantel-Haenszel sketch after this list)
• E.g. evaluate effect of alcohol on MI among smokers and among
non-smokers separately
• Added advantage: identifies effect modification
• Data analysis level: Regression
– ‘Control’ for potential confounders in regression models
• can also identify effect modification if planned for by investigators
• Matching
– Match on potential confounding variables
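To make the stratification bullet concrete, here is a sketch using the standard Mantel-Haenszel risk ratio, applied to the men/women example from earlier in this module; it recovers the unconfounded stratum value of 10 rather than the crude value of 33:

```python
def mantel_haenszel_rr(strata):
    """strata: iterable of (exposed cases, exposed N, unexposed cases, unexposed N)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata)
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata)
    return num / den

strata = [
    (4_500, 900_000, 50, 100_000),  # men
    (100, 100_000, 90, 900_000),    # women
]
print(mantel_haenszel_rr(strata))  # 10.0, vs the crude RR of 33
```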
Control of Confounding in
Observational Studies
1. Have the investigators identified all the
important potential confounders in the
study?
– What factors could be common causes of
both exposure and disease?
2. Have the investigators accounted for
these potential confounders either in the
design of the study or in the analysis?
3. If they haven’t, how might that affect the
results they found?
Confounding: Examples
• In large RCTs, confounding is not usually
an important issue
– Randomization distributes confounders
equally between trial arms
• Table 1 appears to confirm this
– Probable confounders seem to be well
balanced
Is this difference important? Previous
fractures are a risk for future fractures ...
• Questions to ask:
1. What are the potential sources of confounding?
2. Have the authors identified these potential
confounders?
3. How have the authors addressed potential
confounders?
4. Was this sufficient?
1. What are the potential
sources of confounding?
• Risk factors for exposure
– i.e. for NSAID use
• RA? (chronic NSAIDs)
• CAD prevention? (ASA)
• prior GI bleeding? (avoid NSAIDs, use coxib)
• age? (avoid NSAIDs)
• Are any of these also related to the likelihood of
having an event?
[Diagram: confounder → NSAID (or type of NSAID); confounder → CV event]
1. What are the potential
sources of confounding?
• Risk factors for outcome
– i.e. having a CV event
• smoking?
• prior CV event?
• general health?
• ASA use? (preventative)
• age?
• Are any of these also related to the likelihood of
being exposed to an NSAID or type of NSAID?
[Diagram: confounder → NSAID (or type of NSAID); confounder → CV event]
1. What are the potential
sources of confounding?
• Confounding by indication
– Almost always a concern in observational studies of
drugs
– Here, could the reason that NSAIDs are prescribed (the
‘indication’) be related to both exposure (NSAID
prescription) and disease (CV outcomes)?
– What about reasons to avoid NSAIDs?
• Chronic kidney disease?
• Other health problems?
[Diagrams: pain, CKD, and poor health each → NSAID (or type of NSAID) and → CV event]
2. Have the authors identified
these potential confounders?
• In this study, seems like yes
– they even have a section talking about ‘covariates’
3. How have the authors
addressed potential confounders?
• Statistical analysis
• ‘Advanced methods’
4. Was this sufficient?
• Often the hardest of these questions to answer,
especially if the authors have done a good job
addressing the first three questions
– Requires experience, judgement, maybe further
analysis
– My take on this study: probably sufficient
• They’ve acknowledged the difficulties and done something
about them;
• What they’ve done is appropriate;
• The methods are reasonable and go beyond what most studies
do;
• However, I recognize what an insidious problem confounding,
especially confounding by indication, is, and therefore I’m ready
to change my opinion if further evidence comes to light
• Questions to ask:
1. What are the potential sources of confounding?
2. Have the authors identified these potential
confounders?
3. How have the authors addressed potential
confounders?
4. Was this sufficient?
1. What are the potential
sources of confounding?
• This is another pharmacoepidemiologic study (kind of),
so let’s jump right to confounding by indication again
– Here, could the reason that DMARDs were prescribed (the
‘indication’) be related to both exposure (DMARD use) and
disease (lymphoma)?
• → Disease severity, i.e. chronic inflammation
[Diagram: disease severity → DMARD (or type of DMARD); disease severity → lymphoma]
2. Have the authors identified
these potential confounders?
– Clearly - that is what this paper is all about
3. How have the authors
addressed potential confounders?
• Here:
– detailed assessment of disease severity for
each subject
– mutual control of drug use and disease severity
etc.
4. Was this sufficient?
• Again, this is often the hardest of these
questions to answer,
– My take on this study: probably sufficient
• They’ve acknowledged the difficulties and done
something about them;
• What they’ve done is appropriate;
• The methods are reasonable;
• Again though, I recognize what an insidious
problem confounding, especially confounding by
indication, is, and therefore I’m ready to change
my opinion if further evidence comes to light
Information Bias
(Misclassification)
Information Bias
• Could exposure (intervention) or disease
(outcome) be inaccurately reported or
measured
– i.e. misclassified?
Information Bias
• Information about exposure, disease,
and/or confounder(s) is erroneous
– Misclassified into the wrong category
[Diagram: True exposure → Measured exposure; True disease status → Measured disease status]
Examples of Information Bias
• Exposure misclassification: True EtOH use → Reported EtOH use (outcome: fetal outcome)
• Outcome misclassification: True gastritis status → Gastritis by chart review (exposure: drug)
• BOTH exposure and outcome misclassification: True illicit drug use → Reported illicit drug use; True blackouts → Reported blackouts
– If using the same source for exposure and disease
information (e.g., from the participant), can get very
biased results
Non-differential Misclassification
• Chance of misclassifying is the same in
– both the exposure groups (or more if more than one)
• and the chance of misclassifying is the same in
– both the outcome groups (ditto)
• i.e. misclassification is random in both exposure and
disease
• General Rule
– If misclassification is non-differential, then the resultant
bias is usually toward the null
• i.e. the measured effect is probably conservative
• Warning: this is true in general. It is not always true
Differential Misclassification
• Chance of misclassifying is not the same in
either
– the two exposure groups (or more if more than one)
• or
– the two outcome groups (ditto)
• i.e. misclassification is not random in either
exposure or disease
• In this case, the resultant bias is unpredictable!
– Consider carefully whether you think any
misclassification is likely to be differential or non-differential!
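A small deterministic sketch of the general rule above. The true RR, the risks, and the sensitivity/specificity of the exposure measurement are all assumed, purely illustrative numbers; because sensitivity and specificity are the same regardless of outcome, the misclassification is non-differential, and the observed RR is pulled from 3.0 toward 1:

```python
# True state: 100,000 exposed (risk 0.03) and 100,000 unexposed (risk 0.01),
# so the true RR is 3.0.
n_e, r_e = 100_000, 0.03
n_u, r_u = 100_000, 0.01

sens, spec = 0.8, 0.9  # same in cases and non-cases => non-differential

# Expected counts after misclassifying exposure status
obs_n_exp = n_e * sens + n_u * (1 - spec)
obs_cases_exp = n_e * r_e * sens + n_u * r_u * (1 - spec)
obs_n_unexp = n_e * (1 - sens) + n_u * spec
obs_cases_unexp = n_e * r_e * (1 - sens) + n_u * r_u * spec

rr_obs = (obs_cases_exp / obs_n_exp) / (obs_cases_unexp / obs_n_unexp)
print(f"true RR = 3.0, observed RR = {rr_obs:.2f}")  # ~2.04, toward the null
```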
How does an RCT address
Information Bias
• EXPOSURE:
– Exposure is assigned (randomly) in an RCT
• the label of the treatment arm is a proxy for the actual
exposure
• bias is usually non-differential
[Diagram: Assigned exposure → Actual exposure → Outcome]
How does an RCT address
Information Bias
• EXPOSURE:
– Problems: non-adherence and contamination
• With less than 100% adherence, intention-to-treat
analysis is biased due to information bias (usually
non-differential)
• If using a ‘completers’ analysis (only analyze those
completing the study in their assigned group),
information bias is addressed, but
• introduces confounding and selection bias (when
subjects leave the study)
– randomization has been broken
– reasons for withdrawal could be related to treatment
» TRUTH is somewhere between ITT and
completers analysis when <100% adherence
(almost always)
• ITT - thought to be a more conservative estimate
(biased toward null)
• Completers analysis - thought to usually exaggerate the estimate
Information Bias in an RCT
• OUTCOME
• allocation concealment (nobody knows what the next
treatment assignment will be)
• blinding (investigators and subjects)
• same study procedures for all arms
– these should prevent differential misclassification
of the outcome
• any remaining (non-differential) misclassification should
bias toward the null
– Problems:
• Unblinding (e.g. side effects from treatment)
• Accuracy of outcome assessment
– reliability, reproducibility
Information Bias in
Observational Studies
• Obtain information about exposure and outcome
in a structured manner (preferably ‘objective’
sources)
– Obtain information about the disease in exactly the
same fashion regardless of the exposure status
– Obtain information about the exposure in exactly the
same fashion regardless of the disease status
– Use different sources of information for exposure and
disease if possible
• If you ask someone about their alcohol use and also
ask them about how many times they fell, you are
likely to get very skewed data
– EtOH users less likely to report both use and falls
Information Bias: Examples
Exposure misclassification?
• An RCT, so exposure should be random
• Pill counts and interviews to assess whether
subjects took medications as assigned
• Not an ITT (but not a completers analysis either)
– (the method they used is probably better than an ITT
analysis - you’ll just have to trust me)
Outcome misclassification?
• Allocation concealment
– Not commented upon
• unfortunately this is common
• we’ll presume allocation was concealed
• Blinding
– Stated to be double-blind
• placebos for both interventions (blinds subjects in absence of noticeable
side effects)
• BMD scan results withheld from local investigators (blinds investigators)
• Outcome assessments
– BMD - ‘Quality assurance, cross-calibration adjustment, and data
processing were done centrally’
• presumably without knowledge of treatment assignment
– Fracture - ‘Radiographs were assessed in a blinded fashion by an
independent reader’
• presumably using a standard protocol of some kind
Likelihood of Information Bias
• Appears to be minimal
• Any bias should be toward the null
• Estimates therefore likely to be
conservative (underestimate true results)
Exposure misclassification?
1. How was exposure assessed?
– prescriptions for NSAIDs (including coxibs)
2. Could exposure be mis-measured?
– here, a prescription does not guarantee that the subject
actually took the medication, or that s/he took it as
prescribed
– OTC medication use may not be measured
[Diagram: NSAID use (true) → NSAID Prescription (measured); outcome: CV event]
Exposure misclassification?
3. Is any exposure mis-measurement likely to be non-differential or differential?
– Three questions here:
1. Coxibs vs traditional NSAIDs
2. NSAIDs vs comparison meds (glaucoma/thyroid meds)
3. Are chronic NSAID users likely to differ in their ‘compliance’
compared to glaucoma/thyroid med users (or coxib users)?
1. Some traditional NSAIDs are available OTC
• coxibs are not
• therefore, more likely that NSAID use is mis-measured
(underestimated)
• therefore, differential misclassification is likely
2. Same potential problem with NSAIDs in general (some
available OTC) and comparison meds (prescription only)
Exposure misclassification?
3. Is any exposure mis-measurement likely
to be non-differential or differential?
3. Are chronic NSAID users likely to differ in
their ‘compliance’ compared to
glaucoma/thyroid med users (or coxib users)?
• I’m not sure, but I suspect there might be a
difference.
• Any opinions?
Outcome misclassification?
1. How was the outcome assessed?
– CV outcomes based on various codes in admin
records
2. Could the outcome be mis-measured?
– errors made by coders (doctors, administrators, etc.)
– no verification that a coded event is an actual event
[Diagram: CV event (true) → Codes for CV events (measured); exposure: NSAID Prescription]
Outcome misclassification?
3. Is any outcome mis-measurement likely to be
non-differential or differential?
• Is it more likely for persons not coded with a CV
event to actually have a CV event?
• or for persons coded with a CV event to not have
one?
• Or does each seem equally likely?
• To me, I think it’s more likely that persons coded
for a CV event were mis-coded compared to
persons who never received a code in the first
place.
Likelihood of Information Bias
• Overall, low-moderate
• Likelihood of differential misclassification
is moderate
• I recognize this risk, think the authors are
also aware of it and have considered it,
and conditionally accept the results of the
study pending further studies
Exposure misclassification?
1. How was exposure assessed?
• Exposure here is degree of inflammation
• Apparently, standardized, protocolized assessment
of tender/swollen joints
2. Could exposure be mis-measured?
– Physician error in counting joints
[Diagram: Inflammation (true) → Tender/swollen joint count (measured); outcome: Lymphoma]
Exposure misclassification?
3. Is any exposure mis-measurement likely
to be non-differential or differential?
• Not 100% sure, but seems to me like it’s likely
to be non-differential
• i.e. all patients have RA
• More likely to count more joints if subject seems
more ‘active’?
• Less likely to count more joints if subject seems to
be doing subjectively well?
Outcome misclassification?
1. How was the outcome assessed?
– Cases - occurrence of lymphoma
– All cases validated and confirmed
– Controls - random other RA subjects w/o lymphoma
2. Could the outcome be mis-measured?
– Would have to argue that some of those without lymphoma
actually had missed cases - seems unlikely
[Diagram: Lymphoma (true) → Lymphoma on biopsy (measured); exposure: Inflammation]
Outcome misclassification?
3. Is any outcome mis-measurement likely to
be non-differential or differential?
• As said, risk seems small but any risk would
have to be non-differential
• Cases are almost certainly lymphoma
• Could be some controls who actually should be
cases
Likelihood of Information Bias
• Overall, low
• Likelihood of differential misclassification
is also low
• Effort taken to verify cases and the relative
rarity of lymphoma in those who are
controls virtually eliminates the risk of
information bias in this study
Selection Bias
Selection Bias
• Is study entry or exit related to exposure
(intervention) or disease (outcome)?
Selection Bias
• Factors that influence study participation
– Subjects entering and staying in the study
• exposure-disease association is different
among those who participate than among
those who do not
– E.g. healthy worker effect
Example of Selection Bias

Exposure   Disease                Participation
New Drug   Pain                   Those who completed study
Chemical   Lung disease           Healthy worker (compared to general population)
DM         Gall bladder disease   Hospitalized patient
Lipids     MI                     Health conscious, or familial hypercholesterolemia or other high risk
How does an RCT address
Selection Bias?
• Study entry: Randomization after study
entry prevents any factors related to the
exposure from influencing study
participation
How does an RCT address
Selection Bias?
• Study exit: Try to achieve full follow-up
– Problem: Loss-to-follow-up (LTFU)
• LTFU is almost always related to either the
exposure or disease or both (e.g. drug is
ineffective (outcome), or drug has side effects
(exposure))
• Equal numbers of LTFU in treatment arms do
not guarantee that there is no selection bias
(different reasons for dropout)
• Analytic techniques such as last observation
carried forward or multiple imputations do not
“take care of” selection bias
Selection Bias in
Observational Studies
• Difficult to deal with selection bias in
observational studies
– Can’t always control factors influencing study
participation
• Healthy worker effect: use a group in the same
office/factory not exposed to the particular
chemical
– Try to minimize loss-to-follow-up
• Use appropriate analytic methods to account for
differing lengths of follow-up (incidence rate ratio,
hazard ratio)
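For that last point, a quick sketch of what a rate-based measure looks like: an incidence rate ratio puts person-time in the denominator, so subjects contribute only the follow-up they actually accrue (the counts are illustrative assumptions):

```python
cases_exp, py_exp = 30, 10_000      # cases and person-years, exposed
cases_unexp, py_unexp = 10, 10_000  # cases and person-years, unexposed

# Incidence rate ratio = rate in exposed / rate in unexposed
irr = (cases_exp / py_exp) / (cases_unexp / py_unexp)
print(irr)  # 3.0
```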
‘Representativeness’:
This is NOT a bias
• Trying to make one’s study population ‘representative’ of
a more general population is counter to using restriction
for control of confounding
– Basic science animal studies ‘restrict’ study population to
genetically homogeneous lab animals
• To elaborate a scientific theory about a biologic process
• If a different group is of clinical interest, should conduct
well-designed study in that group to determine whether
effects differ in that group
– This is a question of whether the BIOLOGY is different in that
group
• This is an issue of “Effect Measure Modification”
Selection Bias: Examples
A Clinical Trial
• Selection bias due to study entry is
virtually never an issue
– true here
– patients randomized after selection for study
• Loss to follow-up
– 70 (33%!) in alendronate group
– 64 (30%!) in teriparatide group
– These are fairly high rates of LTFU!
Does it matter?
• LTFU is roughly equal in each arm
• This does NOT guarantee lack of selection bias
• Question:
– Is LTFU random in each group?
– or is LTFU differential?
• e.g:
• alendronate subjects drop out because of side effects
– (maybe the same people where the drug was effective)
• teriparatide subjects drop out because of inefficacy
– This combination could make teriparatide look better than
it actually is, even though LTFU is equal in each group
• teriparatide users who had an effect are left in study
• alendronate users who had no effect are left in the study
Judgement
• Seems to me that drop out is likely to be
random
• above scenario is unlikely
– Therefore I doubt there is much if any
selection bias in this study
Study entry
• The authors recognize that healthy people
are less likely to have records in their study
database
– one reason they used other drug-users
(glaucoma and hypothyroid) as a control group
• Therefore healthy persons may be less
likely to be selected as a comparison group
subject
– disease estimate in control group could be
exaggerated
– results in underestimate of true risk
Study exit
• Subjects could leave study if they got
insurance, e.g.
– are persons who did this more likely to be
NSAID users? or to have an event? or both?
– are healthier persons more likely to get
insurance and not have an event?
• I would think so
• leaves less healthy persons who could have an
event in the control group
– more events in control group
– underestimate true risk
Judgement
• Selection bias is certainly possible
– Recognized by authors
– Methods to minimize (drug using control
group)
• Overall, risk is there but is probably not
very high(?)
Study entry
• Entry in this study is based on case-control
status
• How could cases or controls be
unrepresentative of RA patients?
• All RA patients should have been captured in
this RA registry
• All lymphoma patients should also be captured
in the lymphoma registry
• One reason why ‘population-based’ studies
are nice!
Study exit
• As with entry, these registers should
capture all RA patients who have had
lymphoma
• If RA patients left the study before they
got lymphoma that could be a source of
bias
• But these persons could not have been
cases or controls.
• Again, likelihood of selection bias seems low
Judgement
• Selection bias seems unlikely in this
nested case-control study
• Note, this can often be the case in c-c
studies where the underlying population is
well-defined (esp. population-based
studies)
• When this is NOT the case (e.g. hospital-based
c-c study) selection bias is much more likely
Interpreting Study Results
• Any study’s results must be interpreted in
the context of whether or not there are any
biases that could have deviated those
results from being the truth
– This is the critical step in understanding
whether or not these results are clinically
meaningful for your patients – i.e., are the
results valid?
• If there are biases, you need to try to determine
how much they would have affected the results –
would the message still be about the same even
taking those biases into account?
Summary
• You should now be able to:
– Understand that all study designs use various
strategies to address systematic error (bias)
to varying degrees of success
– Understand how to identify confounding,
information bias, selection bias
Random Error
Goals
• Review random error (statistical precision)
– Review the correct interpretations of
confidence intervals and p-values
– Review Type 1 and Type 2 errors
Random Error vs Systematic Error
[Figure: measured value plotted against study size; systematic error is the fixed offset between the measured and true value, while random error is the scatter that shrinks as study size grows. In the absence of bias and with perfect precision, the measured value equals the true value]
Random Error
• Results from variability in the data, sampling
– E.g. measuring height with measuring tape: 1
measurement may be off, but multiple measurements
will give you a better estimate of height
• Relates to precision
• We use confidence intervals to express the
degree of uncertainty/random error associated
with a point estimate (e.g. a RR or OR)
– Measure of precision
Confidence Interval
• Epidemiologist/biostatisticians have arbitrarily
chosen 95% as the level of confidence to report
• 95% C.I.=if the data collection and analysis
were repeated infinitely, the confidence interval
should include within it the correct value of the
point estimate 95% of the time
– Note: this is NOT the same as saying the confidence
interval is 95% likely to contain the ‘true’ value.
(INCORRECT!!)
– ASSUMES NO BIAS
[Figure: 90% confidence intervals from repeated studies, scattered around the true value for the RR]
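For concreteness, a sketch of the usual computation behind a 95% CI for a risk ratio (a Wald interval on the log scale); the 2×2 counts are assumptions for illustration, not from any study:

```python
import math

a, n1 = 50, 1_000  # exposed: cases, total
b, n0 = 25, 1_000  # unexposed: cases, total

rr = (a / n1) / (b / n0)                        # point estimate: 2.0
se_log_rr = math.sqrt(1/a - 1/n1 + 1/b - 1/n0)  # SE of log(RR)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

# Prints: RR = 2.00, 95% CI 1.25 to 3.21 -- the CI excludes the null (1)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```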
Confidence Interval cont’d
• If the confidence interval includes the null
(‘1’ for relative measures of effect (e.g.
RR, OR), ‘0’ for absolute measures of
effect (e.g. risk difference)), the result is
considered to be not statistically
significant
– Since the CI contains the correct value 95%
of the time, the correct value could be the null
Confidence Interval cont’d
• “Precision” relates to the width of the
confidence interval
– A very precise estimate of effect will have a
narrow CI
– Can improve precision by:
• Increasing sample size
P-value
• The P-value is calculated from the same
equations that lead to a confidence
interval
• Again, 0.05 (related to the 95% CI) was
arbitrarily chosen
• P-values are confounded by magnitude of
effect and sample size (precision)
– Can get a very small p-value if a study is
large enough, even if the effect is very small
P-value
• P-value=assuming the null hypothesis is true, the
p-value is the probability of this result or one more
extreme, assuming no bias/confounding
– E.g. RR=2, p=0.03: “Assuming the null hypothesis is
true (RR=1), the probability of this result (i.e. RR=2) or
one more extreme (i.e. RR>2) is 3%”
– E.g. RR=0.4, p<0.001: “Assuming that there truly is no
difference between the two treatment groups, the
probability of obtaining a RR of 0.4 or one more extreme
(i.e. RR<0.4) is less than 0.1%”
• Note: this is NOT the same as saying the p-value
is the probability that the null hypothesis is
true (INCORRECT!! since it is conditioned on
the null being true)
– Just because your p-value isn’t “statistically
significant” doesn’t mean that you can say the two
arms are “equivalent”
• The p-value says nothing about the probability
of your alternative hypothesis being true
• Think of the p-value as a relative marker of how
consistent the data are with the null hypothesis
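As a companion to the CI sketch above, the two-sided p-value comes from the same log-RR machinery via a normal approximation (same illustrative counts):

```python
import math

a, n1, b, n0 = 50, 1_000, 25, 1_000
rr = (a / n1) / (b / n0)
se = math.sqrt(1/a - 1/n1 + 1/b - 1/n0)
z = math.log(rr) / se  # distance from the null (log RR = 0) in SE units

# Two-sided p-value from the normal CDF
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Read as: "if the null (RR = 1) were true and there were no bias, data this
# extreme or more would occur with probability p" -- NOT P(null is true).
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.88, p ~ 0.004
```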
Statistical Significance Testing
• Statistical significance testing is really only
about statistical precision
– How precise are the study results?
– P-values on their own tell you NOTHING about the
magnitude of effect; 95% confidence intervals at
least give you an idea of potential magnitude of effect
– P-values and 95% CI tell you NOTHING about how
valid the results are (i.e., whether the results are
biased)
• We should place LESS emphasis on statistical
significance and rather think MORE about
clinical significance or meaningfulness when
considering the results and validity of a study
Type I, Type II Errors
• Emanating from α=0.05, Type I error is
made if one incorrectly rejects the null
hypothesis when the null hypothesis is
actually true
– This probability is determined by the
significance level alpha (=0.05)
• Type II error is made if one fails to reject
the null hypothesis when the null is false
– This probability is determined by β, where 1-β
is the power, which is usually set at 80%
Basic Approach to Understanding
Power
• Power calculations/sample size
calculations:
– Use estimate of the expected event rate in
the unexposed (placebo group)
• Based on published literature/experience (e.g.,
10% of men age 60-70 have MIs)
– Based on that ‘background rate’, determine
sample size needed to detect a difference of
a specified magnitude between treatment and
placebo with 80% power and alpha=0.05
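A sketch of that calculation using the standard two-proportion sample-size formula; the 10% background rate echoes the example above, while the 5% treated-arm rate is an assumed difference the trial should detect:

```python
import math

p0 = 0.10       # expected event rate in placebo arm ('background rate')
p1 = 0.05       # event rate the trial should be able to detect in treated arm
z_alpha = 1.96  # two-sided alpha = 0.05
z_beta = 0.84   # power = 80% (1 - beta)

n_per_arm = ((z_alpha + z_beta) ** 2
             * (p0 * (1 - p0) + p1 * (1 - p1))
             / (p0 - p1) ** 2)
print(math.ceil(n_per_arm))  # ~432 subjects per arm
```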
Random Error: Examples
Statistical Significance: p-value given, but no confidence interval regarding the
difference in means (i.e. the 95% CI for the mean BMD difference of 3.8%)
Can theoretically use the standard error (S.E.) information to
calculate the 95% confidence interval
Adequate power? Sample size: 214/arm
Had sufficient power to detect 2% difference
in BMD at L-spine (found 3.8% difference)
Statistical Significance
• 95% confidence interval
given for each rate ratio
• Adequate sample sizes?
– N=several thousand
persons for each drug
– 100s-1000s of events
– Null results (lack of
association) for some
drugs are unlikely to be due
to inadequate power (and
they did find the expected
association for Vioxx)
– ? Type 2 error
Statistical Significance
95% Confidence Intervals given for each Odds Ratio
Adequate sample size? 378 cases, 378 controls
They found an association, so no concerns about inadequate power due
to insufficient sample size
? Type 1 error
? Bias
Summary
• You should now be able to:
– Understand that confidence intervals reflect
precision, which is related to sample size
– Understand the correct interpretation of
confidence intervals and p-values
– Understand that power relates to sample size