Possible Explanations for an Association Between a Risk Factor and

BIAS
Possible Explanations for an Association Between a Risk Factor and a Disease

Explanation            Assessment Strategy
Random variability     Estimation of precision (95% confidence interval), and testing (p)
Confounding            Experimental design; adjustment
Bias                   Quality assurance and quality control
Causal relationship    Eliminate alternative explanations; Hill’s criteria
1. Definition of Bias
Statistical definition of bias: when the average value of the association measure obtained from an infinite number of studies is not the true value.
[Figure: distribution of the results of an infinite number of studies. The distance between the average of the results and the truth is the bias.]
Difference between validity and representativeness:
1. Biased study: biased, but “representative”
2. Valid (unbiased) study: valid, but not “representative”
[Figures: distribution of an infinite number of studies, marking the study result, the truth, and the average of results for each of the two situations.]
Epidemiological Definition of Bias
Last J: A Dictionary of Epidemiology, ed. by J. Last, 3rd
Edition, IEA
“Deviation of results or inferences from the truth, or
processes leading to such deviation. Any trend in the
collection, analysis, interpretation, publication, or review
of data that can lead to conclusions that are
systematically different from the truth.”
2. Main types of bias
a. Selection bias
b. Information bias
[Figure: the reference population divided into four cells (diseased exposed, healthy exposed, diseased unexposed, healthy unexposed), from which the study sample is drawn.]
Selection bias: one group (cell) in the population (e.g., exposed cases) has a greater likelihood of inclusion in the study.
“Gold Standard”: Total Population Study
A. Hypothetical Case-Control Study Including All Cases and All Non-Cases of a Reference Population

                 Total Reference Population
Risk Factor A    Cases            Controls (Non-cases)
Present          50               180
Absent           50               720
Total            100              900
Exposure odds    50:50 = 1:1      180:720 = 1:4
Relative odds    (50/50) ÷ (180/720) = 4.0

Conclusions
• Unbiased exposure odds in cases and controls
• Unbiased relative odds
Unbiased Study Based on Samples
B. Hypothetical Unbiased Case-Control Study Including 50% Cases and 10% Non-Cases of a Reference Population

                 Total Reference Population
Risk Factor A    Cases                Controls (Non-cases)
Present          25                   18
Absent           25                   72
Total            100 x 0.50 = 50      900 x 0.10 = 90
Exposure odds    25:25 = 1:1          18:72 = 1:4
Relative odds    (25/25) ÷ (18/72) = 4.0

Conclusions
• Unbiased exposure odds in cases and controls
• Unbiased relative odds
Example of Selection Bias Without Compensation Bias
C. Hypothetical Biased Case-Control Study Including 50% Cases and 10% Non-Cases of a Reference Population

                 Total Reference Population
Risk Factor A    Cases                Controls (Non-cases)
Present          50 x 0.60 = 30       180 x 0.10 = 18
Absent           50 x 0.40 = 20       720 x 0.10 = 72
Total            100 x 0.50 = 50      900 x 0.10 = 90
Exposure odds    30:20 = 1.5:1.0      18:72 = 1:4
Relative odds    (30/20) ÷ (18/72) = 6.0

Conclusions
• Biased exposure odds in cases
• Unbiased exposure odds in controls
• Biased relative odds
Selection Bias: Compensating Bias
D. Hypothetical Case-Control Study Including 50% Cases and 10% Non-Cases of a Reference Population with Compensation Bias

                 Total Reference Population
Risk Factor A    Cases                Controls (Non-cases)
Present          50 x 0.60 = 30       180 x 0.13 = 24
Absent           50 x 0.40 = 20       720 x 0.09 = 66
Total            100 x 0.50 = 50      900 x 0.10 = 90
Exposure odds    30:20 = 1.5:1.0      24:66 = 1.0:2.7
Relative odds    (30/20) ÷ (24/66) = 4.1

Conclusions
• Exposure odds equally biased in cases and controls (comparability of selection processes)
• Unbiased relative odds (4.1 rather than 4.0 because of rounding)

True odds:
In cases: 1:1
In controls: 1:4
Bias:
In cases: [1.5:1] ÷ [1:1] = 1.5
In controls: [1:2.7] ÷ [1:4] ≈ 1.5

$$\text{Odds Ratio} = \frac{\text{Odds}_{\text{exp in cases}} \times \text{bias}}{\text{Odds}_{\text{exp in controls}} \times \text{bias}} = \text{True OR}$$
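The four hypothetical studies A to D can be checked numerically. The following is a minimal sketch rather than part of the original slides: the function names `odds_ratio` and `sampled_odds_ratio` are illustrative assumptions, and scenario D uses the rounded control counts (24 and 66) shown on the slide rather than the exact products.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


# Reference population (table A): (exposed, unexposed)
CASES = (50, 50)
CONTROLS = (180, 720)


def sampled_odds_ratio(f_exp_cases, f_unexp_cases, f_exp_controls, f_unexp_controls):
    """Odds ratio after each of the four cells is sampled with its own fraction."""
    return odds_ratio(
        CASES[0] * f_exp_cases,
        CASES[1] * f_unexp_cases,
        CONTROLS[0] * f_exp_controls,
        CONTROLS[1] * f_unexp_controls,
    )


or_a = sampled_odds_ratio(1.0, 1.0, 1.0, 1.0)  # A: total population
or_b = sampled_odds_ratio(0.5, 0.5, 0.1, 0.1)  # B: equal fractions within cases and within controls
or_c = sampled_odds_ratio(0.6, 0.4, 0.1, 0.1)  # C: exposed cases over-sampled, controls unbiased
or_d = odds_ratio(30, 20, 24, 66)              # D: rounded counts from the slide

for label, value in [("A", or_a), ("B", or_b), ("C", or_c), ("D", or_d)]:
    print(f"Scenario {label}: relative odds = {value:.1f}")
```

The relative odds stay at the true value whenever the ratio of sampling fractions (exposed to unexposed) is the same in cases and in controls, which is the situation in A, B and, approximately, D.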
Example of Compensating Bias
Case-control study of breast cancer identified in a population-based screening program
Controls?
- Random sample of total population?
- Women who have normal mammograms, or all women undergoing screening?
[Figure: cases and controls shown within the study base*]
*Study Base
TWO EXAMPLES OF UNCOMPENSATED BIAS
Example 1*
• Hypothesis: HBV is associated with aplastic anemia
(AA)
• Cases: AA patients hospitalized at the Johns Hopkins
Hospital Hematology Division (a referral Division)
• Controls: Patients with non-malignant diseases
admitted to the Johns Hopkins Hospital
PROBLEM??
*Based on a study by Szklo et al
• Patients (cases) hospitalized with AA at the Johns Hopkins Hospital in Baltimore, USA, usually undergo bone marrow transplant
• To be eligible for bone marrow transplant, a patient is more likely to:
– Come from a large family (where a genetically matched donor is more likely to be found), and
– Have medical insurance (as bone marrow transplants are very expensive), and thus belong to a higher S.E.S. than control patients.
THUS, VARIABLES RELATED TO FAMILY SIZE AND TO SOCIO-ECONOMIC STATUS ARE SELECTION FACTORS
THE CHOICE OF CONTROLS DID NOT FOLLOW THE BASIC PRINCIPLE: THAT CASES AND CONTROLS SHOULD BE CHOSEN FROM THE SAME STUDY BASE!
TWO EXAMPLES OF UNCOMPENSATED BIAS
Example 2
Coffee Drinking and Cancer of the Pancreas (MacMahon
et al, New Eng J Med 1981;304:630)
• Cases: Patients newly diagnosed with pancreatic
cancer admitted to 11 Boston and Rhode Island
hospitals during 1974-1980 (n= 369)
• Controls: Patients under the care of the same MD’s in
the same hospitals, interviewed at the same time as
the cases (i.e., cases and controls were matched by
attending physician, hospital, and timing of interview).
Controls mostly had gastrointestinal conditions, or
cancers other than pancreatic and biliary tract (n=
644)
Odds ratios for cancer of the pancreas according to coffee drinking and smoking

                   Coffee intake (cups/day)
Smoking            0      1-2    3+
Never              1.0    2.1    3.1
Ex-smokers         1.0    3.1    2.3
Current smokers    1.0    1.8    3.8
Total              1.0    1.8    2.7

(Adapted from MacMahon et al, New Eng J Med 1981;304:630)
Problem?
[Figure: the study sample drawn from the reference population, showing diseased exposed, diseased unexposed, exposed controls, and unexposed controls.]
2. Types of Bias
a. Selection bias
b. Information bias and
misclassification
[Figure: the reference population (diseased exposed, healthy exposed, diseased unexposed, healthy unexposed) and the study sample, a 2 x 2 table of exposure (Exp, Unexp) by cases and controls with cells a, b, c, d.]
Information bias: misclassification of exposure information in cases and controls
Examples of Information Bias
• Exposure Identification Bias
– Recall Bias
– Interviewer/Observer Bias
• Outcome Identification Bias
– Observer Bias
– Respondent Bias
Result of Information Bias:
Misclassification
• Differential
• Non-differential
Example of Recall Bias in a Study of Melanoma (Weinstock et al, Am J Epidemiol 1991;133:240-5)
[Figure: timeline of cohort follow-up. Data on tanning ability are collected by questionnaire in the healthy cohort; cases and losses occur during follow-up.]
Example of Recall Bias in a Case-Control Study (Ca-Co)

                                                    Premelanoma diagnosis    Postmelanoma diagnosis
Tanning Ability                                     Ca      Co               Ca      Co
No tan or light tan (“exposed”)                     9       79               15      77
Medium, average, deep or dark tan (“unexposed”)     25      155              19      157
Odds Ratio                                          0.7                      1.6

Shift between the two interviews: +6 “exposed” among cases (9 to 15), versus a shift of 2 among controls (79 to 77).
(Weinstock et al, Am J Epidemiol 1991;133:240-5)
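The two odds ratios follow directly from the four counts in each interview. A minimal sketch, assuming the counts as tabulated in the slide (the helper name `odds_ratio` is illustrative and not taken from the source):

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


# "Exposed" = no tan or light tan; "unexposed" = medium, average, deep or dark tan.
or_pre = odds_ratio(9, 25, 79, 155)    # tanning ability reported before diagnosis
or_post = odds_ratio(15, 19, 77, 157)  # tanning ability reported after diagnosis

print(f"Premelanoma diagnosis:  OR = {or_pre:.1f}")
print(f"Postmelanoma diagnosis: OR = {or_post:.1f}")
```

The same women yield an odds ratio below 1 with the pre-diagnosis answers and above 1 with the post-diagnosis answers, because reporting changed mainly among cases.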
Example of nondifferential misclassification
To investigate the validity of self-reported acquired immunodeficiency syndrome (AIDS) among women enrolled in a prospective study of human immunodeficiency virus (HIV) infection, the authors compared the self-reported occurrence of AIDS-specific diagnoses (the “test”) with AIDS diagnoses documented by county AIDS surveillance registries (the “gold standard”).
(Hessol et al, Am J Epidemiol 2001;153:1128-33)
Sensitivity and Specificity of Self-Reported
Diagnosis of Esophageal Candidiasis in AIDS
patients
• Sensitivity: 46%
• Specificity: 84%
(Adapted from: Hessol et al, Am J Epidemiol 2001;153:1128)
Relationship between AIDS and esophageal candidiasis: “Gold Standard”

Esophageal candidiasis    AIDS cases    Normal controls    Odds Ratio
Present                   20            5                  8.3
Absent                    480           995                1.0
Total                     500           1,000
Definitions of Sensitivity and Specificity Used for
the Evaluation of Misclassification
• Sensitivity
– Proportion of all truly infected (exposed)
individuals correctly classified by the study
• Specificity
– Proportion of all truly uninfected
(unexposed) individuals correctly classified
by the study
Relationship between AIDS and esophageal candidiasis ascertained by questionnaire with sensitivity = 46% and specificity = 84%. Assume non-differential misclassification*
*Same sensitivity and specificity values for cases and controls
Starting point: the true results (20 of 500 AIDS cases and 5 of 1,000 normal controls with esophageal candidiasis; odds ratio 8.3). Self-report is then cross-classified against the truth, first in cases and then in controls.
Self-report in cases
                 Truth in Cases
In Study         Present    Absent    Total
Present          9          77        86
Absent           11         403       414
Total            20         480       500

Self-report in controls
                 Truth in Controls
In Study         Present    Absent    Total
Present          2          159       161
Absent           3          836       839
Total            5          995       1000
Study Results:

Esophageal candidiasis    AIDS cases    Normal controls
Present                   86            161
Absent                    414           839
Total                     500           1,000

OR = (86/414) ÷ (161/839) = 1.1
(True OR = 8.3!!)
Rule: When there are two exposure categories, nondifferential misclassification biases the odds ratio or relative risk toward 1.0
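The cross-classification that produces these study results can be reproduced in a few lines. This is a minimal sketch under the slide's assumption of identical sensitivity (46%) and specificity (84%) in cases and controls; the function name `misclassify` and the rounding to whole persons are illustrative choices, not taken from the source.

```python
def misclassify(true_exposed, true_unexposed, sensitivity, specificity):
    """Return (observed exposed, observed unexposed), rounded to whole persons."""
    true_positives = round(true_exposed * sensitivity)
    true_negatives = round(true_unexposed * specificity)
    false_positives = true_unexposed - true_negatives
    observed_exposed = true_positives + false_positives
    total = true_exposed + true_unexposed
    return observed_exposed, total - observed_exposed


def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


SENSITIVITY, SPECIFICITY = 0.46, 0.84

cases_obs = misclassify(20, 480, SENSITIVITY, SPECIFICITY)
controls_obs = misclassify(5, 995, SENSITIVITY, SPECIFICITY)

true_or = odds_ratio(20, 480, 5, 995)
observed_or = odds_ratio(cases_obs[0], cases_obs[1], controls_obs[0], controls_obs[1])

print("Observed cases (present, absent):", cases_obs)
print("Observed controls (present, absent):", controls_obs)
print(f"True OR = {true_or:.1f}, observed OR = {observed_or:.1f}")
```

Most of the attenuation comes from false positives: with a rare exposure, even a specificity of 84% adds far more falsely "exposed" subjects than there are truly exposed ones in either group.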
Odds Ratios for Inaccurate Self-Reporting of AIDS (Hessol et al, Am J Epidemiol 2001;153:1128)
*Simultaneously adjusted for all other variables using multiple logistic regression.
Examples of Differential Misclassification in a Case-Control Study. True Odds Ratio = 3.86; Prevalence of Exposure in Controls = 0.10

Exposure Ascertainment
Sensitivity            Specificity            Odds Ratio
Cases     Controls     Cases     Controls
0.90      0.60         1.00      1.00         5.79
0.60      0.90         1.00      1.00         2.22
1.00      1.00         0.90      0.70         1.00
1.00      1.00         0.70      0.90         4.43
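The odds ratios in this table can be approximated from the stated true odds ratio and control prevalence. The sketch below is an illustration under assumptions: the exposure prevalence in cases is derived from the true odds ratio of 3.86 (which gives about 0.30), and the function names are hypothetical. Results agree with the table to within about 0.01; the small differences presumably reflect rounding in the original.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1 - p)


def observed_prevalence(p, sensitivity, specificity):
    """Apparent exposure prevalence: true positives plus false positives."""
    return p * sensitivity + (1 - p) * (1 - specificity)


TRUE_OR = 3.86
P_CONTROLS = 0.10
_case_odds = TRUE_OR * odds(P_CONTROLS)
P_CASES = _case_odds / (1 + _case_odds)  # about 0.30


def observed_or(se_cases, se_controls, sp_cases, sp_controls):
    """Odds ratio after misclassifying exposure separately in cases and controls."""
    p_cases_obs = observed_prevalence(P_CASES, se_cases, sp_cases)
    p_controls_obs = observed_prevalence(P_CONTROLS, se_controls, sp_controls)
    return odds(p_cases_obs) / odds(p_controls_obs)


# Rows of the table: (sens. cases, sens. controls, spec. cases, spec. controls)
rows = [
    (0.90, 0.60, 1.00, 1.00),
    (0.60, 0.90, 1.00, 1.00),
    (1.00, 1.00, 0.90, 0.70),
    (1.00, 1.00, 0.70, 0.90),
]
results = [observed_or(*row) for row in rows]
for row, value in zip(rows, results):
    print(row, f"observed OR = {value:.2f}")
```

Unlike the nondifferential case, the observed odds ratio here can fall on either side of the true value, depending on whether cases or controls are classified more accurately.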
Net result of misclassification: Regression Dilution Bias
Example: Blood pressure as the exposure
Regression towards the mean:
• Random variability
• Measurement error
• Physiologic variability
[Figure: systolic blood pressure (SBP, mmHg; 120 and 140 marked) plotted against time around the average. Successive measurements classify the same person as hypertensive, normotensive, hypertensive. A second panel shows Person A and Person B, with the classification (“Hypertensive”) assigned at a single point in time.]
2. Types of Bias
a. Selection bias
b. Information bias and misclassification
c. Mixed biases
- Prevalence-Incidence bias
- Temporal bias
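Cross-sectional study (P = point prevalence, I = incidence, D = duration; subscripts + and − denote exposed and unexposed):
$$\frac{P_{+}/(1 - P_{+})}{P_{-}/(1 - P_{-})} = \frac{I_{+} \times D_{+}}{I_{-} \times D_{-}}$$
If the prevalence is low:
$$\frac{P_{+}}{P_{-}} \approx \frac{I_{+} \times D_{+}}{I_{-} \times D_{-}}$$
If exposure to the risk factor does not affect the duration of the disease after it starts:
$$\frac{P_{+}}{P_{-}} \approx \frac{I_{+}}{I_{-}}$$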
EXAMPLE: MYOCARDIAL INFARCTION
Hypothetical numerical example
Yearly incidence in persons older than 60:
• Men: 5%
• Women: 2%
RR= 2.5
Survival after the acute event:
• Men: 20 years
• Women: 10 years
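Prevalence = Incidence × duration
$$\text{Prevalence Rate Ratio} = \frac{\text{Incidence}_{\text{MEN}} \times \text{duration}_{\text{MEN}}}{\text{Incidence}_{\text{WOMEN}} \times \text{duration}_{\text{WOMEN}}} = \text{Risk Ratio} \times \frac{\text{duration}_{\text{MEN}}}{\text{duration}_{\text{WOMEN}}} = 2.5 \times \frac{20}{10} = 5.0$$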
2. Types of Bias
a. Selection bias
b. Information bias and misclassification
c. Mixed biases
- Prevalence-Incidence bias
- Temporal bias
Number of Cases and Controls, and Odds Ratios for Endometrial Cancer According to Type of Estrogen Replacement Therapy, with 95% Confidence Intervals (Antunes et al, NEJM 1979)

Type of estrogen    No. of cases    No. of controls    Odds Ratios    95% confidence intervals
None                274             390                1.0            Reference
Conjugated          56              18                 4.3            2.5, 7.5
Total               339             489
Feinstein & Horwitz’ criticism
Estrogen Use → Endometrial Cancer?
OR
Undiagnosed Endometrial Cancer → Bleeding → Estrogen Use → Diagnosed Endometrial Cancer?
Solution? Analyze only women who take estrogen prophylactically
Example of Temporal Bias: Relationship of Bypass Surgery to Physical Activity
Cases who had bypass surgery vs. controls without coronary heart disease
Question: Do you exercise often now?
“Yes”: Cases > Controls → “Reverse Causality”
Question: Did you exercise often before your bypass surgery/before (date)?
“Yes”: Cases < Controls
A DETOUR…
IS CONFOUNDING A BIAS?
Confounded relationship
[Diagram: sedentary life and oral cancer, linked through alcohol.]
High risk marker useful for secondary prevention
Types of Association (Lilienfeld)
• True
– Causal
– Statistical non-causal (“indirect”, due to confounding)
• Spurious or artifactual (due to bias)
Is confounding a bias? Public health implications

Goal: Primary prevention. Prevention or cessation of exposure. E.g.: saturated fat intake and atherosclerosis
Type of evidence needed: Causal association must be present; otherwise, intervention on the risk factor will not affect disease outcome. E.g.: if excessive fat did not cause atherosclerosis, a lower fat intake would not affect atherosclerosis risk

Goal: Secondary prevention (screening). Early diagnosis via selective screening of “high risk” subjects. E.g.: screening for hypertension in African-Americans
Type of evidence needed: Associations may be either causal or statistical (the latter must not be biased). In other words, the association may be confounded, but it is still useful for secondary prevention. E.g.: even if “race” is not causally related to hypertension (but confounded by SES, etc.), it could be a useful marker to detect individuals at higher risk for hypertension
2. Types of Bias
a. Selection bias
b. Information bias and misclassification
c. Mixed biases
- Prevalence-Incidence bias
- Temporal bias
d. Biases in the evaluation of screening
Biases in the Evaluation of Screening
Programs
•Lead Time Bias
•Selection Biases
•Referral Bias
•Length-biased sampling
Lead Time Bias
Natural History of a Disease (Adapted from Gordis, Epidemiology, 1996)
[Figure: timeline with the following points and intervals]
A: Biologic onset
B: Earliest point when diagnosis is possible
C: Point when early diagnosis is made
D: Usual diagnosis based on symptoms
F: End of the survival intervals
Detectable Preclinical Phase: B to D
Lead Time: C to D
Survival after early diagnosis: C to F
Survival after usual diagnosis: D to F
LEAD TIME BIAS
No protective effect: Survival B = Survival A + lead time
8 years = 6 years + 2 years
[Figure: two timelines starting in 1985 and ending in 2000]
A. Diagnosis based on symptoms: diagnosed 1994, survival 6 years (to 2000)
B. Early diagnosis (screening): diagnosed 1992, lead time 2 years, survival 8 years (to 2000)
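Protective effect: Surv (B) > Surv (A) + Lead Time
10 years > 6 years + 2 years
[Figure: two timelines]
A. Diagnosis based on symptoms: diagnosed 1994, survival 6 years (to 2000)
B. Early diagnosis: diagnosed 1992, lead time 2 years. Survival of 8 years (to 2000) would reflect lead time alone; observed survival is 10 years (to 2002), a gain of 2 years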
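The comparison in the two lead time figures amounts to subtracting the lead time from survival after early diagnosis before comparing it with survival after usual diagnosis. A minimal sketch using the years from the slides; the function name `lead_time_adjusted_gain` is a hypothetical label for this arithmetic:

```python
def lead_time_adjusted_gain(usual_dx, early_dx, death_usual, death_early):
    """Survival gained by early diagnosis after removing the lead time (in years)."""
    lead_time = usual_dx - early_dx
    survival_usual = death_usual - usual_dx
    survival_early = death_early - early_dx
    return survival_early - lead_time - survival_usual


# No protective effect: both patients die in 2000.
gain_no_effect = lead_time_adjusted_gain(1994, 1992, 2000, 2000)

# Protective effect: the screened patient dies in 2002.
gain_effect = lead_time_adjusted_gain(1994, 1992, 2000, 2002)

print("Gain without protective effect:", gain_no_effect, "years")
print("Gain with protective effect:", gain_effect, "years")
```

In the first scenario the apparent 2-year advantage (8 versus 6 years) disappears entirely once the lead time is removed; in the second, a true gain of 2 years remains.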
Natural History of a Disease: Lead Time Bias (Adapted from Frank, Am J Prev 1985;1:3-9)
[Figure: cumulative survival (100%, 70% and 40% marked) against years after diagnosis, with the early-diagnosis time scale shifted by the lead time; the gap between the two readings is labeled lead time bias.]
5 years after early diagnosis = 3 years after usual diagnosis
Lead time: Prevention and
Correction
• Prevention: Use mortality instead of
case-fatality
• Correction: Estimate lead time, and
adjust for it
– Examples
• Breast Cancer: 1 year
• Invasive Cervical Cancer: At least 10 years?
• Lung Cancer: Less than 1 year?
Estimation of Lead Time
Step 1 = Estimation of Duration of DPCP (Detectable Preclinical Phase)
[Figure: natural history timeline. A: biologic onset; B: earliest point when diagnosis is possible; D: usual diagnosis based on symptoms; F. The Detectable Preclinical Phase runs from B to D.]
$$\text{Prevalence}_{\text{DPCP}} = \text{Incidence}_{\text{DPCP}} \times \text{Duration}_{\text{DPCP}}$$
$$\text{Duration}_{\text{DPCP}} = \frac{\text{Prevalence}_{\text{DPCP}}}{\text{Incidence}_{\text{DPCP}}}$$
Prevalence of the DPCP?
Incidence of the DPCP? Also, incidence of clinical cases
[Figure: timeline showing the 1st exam and the 2nd exam.]
Estimation of Lead Time
Step 2 = Estimation of Lead Time
a. Prevalent Cases
[Figure: at the 1st exam, early diagnosis may fall at any point in time within the Detectable Preclinical Phase (between B and D).]
$$\text{Average Lead Time}_{\text{PREVALENT CASES}} = \frac{\text{DPCP}}{2}$$
b. Incident Cases
When screening exams are frequent, the lead time approximates the DPCP
$$\text{Average Lead Time}_{\text{INCIDENT CASES}} \approx \text{DPCP}$$
[Figure: Patient A and Patient B, each detected at a 2nd exam following the 1st exam, with the lead time of each shown against the DPCP.]
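The two steps can be strung together as follows. This is only a sketch: the prevalence and incidence values are hypothetical numbers chosen for illustration (they do not come from the slides), and the function names are assumptions.

```python
def dpcp_duration(prevalence_dpcp, incidence_dpcp):
    """Step 1: duration of the detectable preclinical phase = prevalence / incidence."""
    return prevalence_dpcp / incidence_dpcp


def average_lead_time(dpcp, case_type):
    """Step 2: DPCP/2 for prevalent cases; about the full DPCP for incident cases
    when screening exams are frequent."""
    if case_type == "prevalent":
        return dpcp / 2
    if case_type == "incident":
        return dpcp
    raise ValueError("case_type must be 'prevalent' or 'incident'")


# Hypothetical inputs: 4 preclinical cases per 1,000 at the first exam,
# and 2 new preclinical cases per 1,000 per year.
prevalence = 0.004
incidence_per_year = 0.002

dpcp_years = dpcp_duration(prevalence, incidence_per_year)
lead_prevalent = average_lead_time(dpcp_years, "prevalent")
lead_incident = average_lead_time(dpcp_years, "incident")

print(f"DPCP duration: {dpcp_years:.1f} years")
print(f"Average lead time, prevalent cases: {lead_prevalent:.1f} years")
print(f"Average lead time, incident cases: about {lead_incident:.1f} years")
```

With these made-up inputs the DPCP lasts 2 years, so cases found at the first (prevalence) exam have an average lead time of about 1 year.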
Biases in the Evaluation of Screening Programs
• Lead Time Bias
• Selection Biases
– Referral Bias (Volunteer Bias): DEFINITION and SOLUTION? Randomized Clinical Trial
– Length-Biased Sampling: DEFINITION and SOLUTION?
Length-Biased Sampling
[Figure: each horizontal line represents the DPCP for a case. Screening Exam No. 1 and Screening Exam No. 2 are 1 year apart; interval cases arise between the two exams.]
Lead-Time-Adjusted Five-Year Case-Fatality Rates Among Breast Cancer Patients (Shapiro et al, JNCI 1982;69:349-55)
[Figure: bar chart, vertical axis from 0 to 45. Bars: Control group; Study group (Total; Refused screening; Screened: Total, Screen-detected, Not screen-detected).]
Only valid comparison: Control vs. Total Study group
Biases in Evaluation of Screening
• Selection Bias
– Referral Bias (Volunteer Bias)
– Length-Biased Sampling
– DEFINITION and SOLUTION?
» Randomized Clinical Trials
» Compare all individuals randomized to the “experimental” group with all individuals randomized to the control group