BIAS

Possible Explanations for an Association Between a Risk Factor and a Disease

Explanation           Assessment Strategy
Random variability    Estimation of precision (95% confidence interval), and testing (p)
Confounding           Experimental design; adjustment
Bias                  Quality assurance and quality control
Causal relationship   Eliminate alternative explanations; Hill’s criteria

1. Definition of Bias

Statistical definition of bias: Bias is present when the average value of the association measure obtained from an infinite number of studies is not the true value.
[Figure: distribution of an infinite number of studies; labels: Study Results, Average of Results, TRUTH, Average bias]

Difference between validity and representativeness:
1. Biased study: biased, but “representative”
[Figure: distribution of an infinite number of studies; labels: Study Results, TRUTH, Average of Results]
2. Valid (unbiased) study: valid, but not “representative”
[Figure: distribution of an infinite number of studies; labels: Study Results, TRUTH, Average of Results]

Epidemiological Definition of Bias
Last J: A Dictionary of Epidemiology, ed. by J. Last, 3rd Edition, IEA
“Deviation of results or inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.”

2. Main types of bias
a. Selection bias
b. Information bias

[Figure: Reference Population divided into four cells (Diseased Exposed, Healthy Exposed, Diseased Unexposed, Healthy Unexposed), with a Study Sample drawn from it]
Selection Bias: One group (cell) in the population (e.g., exposed cases) has a greater likelihood of inclusion in the study.

“Gold Standard”: Total Population Study

A.
Hypothetical Case-Control Study Including All Cases and All Non-Cases of a Reference Population

                   Total Reference Population
Risk Factor A      Cases            Controls (Non-cases)
Present            50               180
Absent             50               720
Total              100              900
Exposure Odds      50:50 = 1:1      180:720 = 1:4
Relative Odds      (50/50) ÷ (180/720) = 4.0

Conclusions
• Unbiased exposure odds in cases and controls
• Unbiased relative odds

Unbiased Study Based on Samples

B. Hypothetical Unbiased Case-Control Study Including 50% Cases and 10% Non-Cases of a Reference Population

                   Total Reference Population
Risk Factor A      Cases               Controls (Non-cases)
Present            25                  18
Absent             25                  72
Total              100 × 0.50 = 50     900 × 0.10 = 90
Exposure Odds      25:25 = 1:1         18:72 = 1:4
Relative Odds      (25/25) ÷ (18/72) = 4.0

Conclusions
• Unbiased exposure odds in cases and controls
• Unbiased relative odds

Example of Selection Bias Without Compensating Bias

C. Hypothetical Biased Case-Control Study Including 50% Cases and 10% Non-Cases of a Reference Population

                   Total Reference Population
Risk Factor A      Cases               Controls (Non-cases)
Present            50 × 0.60 = 30      180 × 0.10 = 18
Absent             50 × 0.40 = 20      720 × 0.10 = 72
Total              100 × 0.50 = 50     900 × 0.10 = 90
Exposure Odds      30:20 = 1.5:1.0     18:72 = 1:4
Relative Odds      (30/20) ÷ (18/72) = 6.0

Conclusions
• Biased exposure odds in cases
• Unbiased exposure odds in controls
• Biased relative odds

Selection Bias: Compensating Bias

D.
Hypothetical Case-Control Study Including 50% Cases and 10% Non-Cases of a Reference Population with Compensating Bias

                   Total Reference Population
Risk Factor A      Cases               Controls (Non-cases)
Present            50 × 0.60 = 30      180 × 0.13 ≈ 24
Absent             50 × 0.40 = 20      720 × 0.09 ≈ 66
Total              100 × 0.50 = 50     900 × 0.10 = 90
Exposure Odds      30:20 = 1.5:1.0     24:66 = 1.0:2.7
Relative Odds      (30/20) ÷ (24/66) = 4.1

Conclusions
• Exposure odds equally biased in cases and controls (comparability of selection processes)
• Unbiased relative odds (4.1 rather than 4.0 because of rounding)

True Odds:
  In cases: 1:1
  In controls: 1:4
Bias:
  In cases: [1.5:1] ÷ [1:1] = 1.5
  In controls: [1:2.7] ÷ [1:4] ≈ 1.5

Odds Ratio = (Odds exp in cases × bias) ÷ (Odds exp in controls × bias) = True OR

Example of Compensating Bias
Case-control study of breast cancer identified in a population-based screening program
Controls?
- Random sample of total population?
- Women who have normal mammograms, or all women undergoing screening*
[Figure: Cases and Controls]
*Study Base

TWO EXAMPLES OF UNCOMPENSATED BIAS

Example 1*
• Hypothesis: HBV is associated with aplastic anemia (AA)
• Cases: AA patients hospitalized at the Johns Hopkins Hospital Hematology Division (a referral Division)
• Controls: Patients with non-malignant diseases admitted to the Johns Hopkins Hospital
PROBLEM??
*Based on a study by Szklo et al

• Patients (cases) hospitalized with AA at the Johns Hopkins Hospital in Baltimore, USA, usually undergo bone marrow transplant
• To be eligible for bone marrow transplant, a patient is more likely to:
  – Come from a large family (where a genetically-matched donor is more likely to be found), and
  – Have medical insurance (as bone marrow transplants are very expensive), and thus belong to a higher S.E.S. than control patients.

THUS, VARIABLES RELATED TO FAMILY SIZE AND TO SOCIO-ECONOMIC STATUS ACT AS SELECTION FACTORS

THE CHOICE OF CONTROLS DID NOT FOLLOW THE BASIC PRINCIPLE: THAT CASES AND CONTROLS SHOULD BE CHOSEN FROM THE SAME STUDY BASE!
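The relative odds in the four hypothetical case-control tables (A to D) above can be rechecked with a few lines of arithmetic. The sketch below is only an illustration: the function and variable names are made up for this example, and for table D the sampling fractions of controls are taken as 24/180 and 66/720 (an assumption chosen to reproduce the counts shown, since the slide lists them rounded to 0.13 and 0.09).

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Relative odds: exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


# Reference population from table A: (exposed, unexposed) counts.
CASES = (50, 50)
CONTROLS = (180, 720)

# Sampling fraction applied to each cell, in the order:
# exposed cases, unexposed cases, exposed controls, unexposed controls.
SCENARIOS = {
    "A (total population)": (1.0, 1.0, 1.0, 1.0),
    "B (unbiased samples)": (0.5, 0.5, 0.1, 0.1),
    "C (uncompensated bias)": (0.6, 0.4, 0.1, 0.1),
    # Assumption: 24/180 and 66/720 reproduce the counts shown in table D.
    "D (compensating bias)": (0.6, 0.4, 24 / 180, 66 / 720),
}


def sampled_odds_ratio(fractions):
    """Relative odds in a study that samples each cell with the given fraction."""
    f_exp_ca, f_unexp_ca, f_exp_co, f_unexp_co = fractions
    return odds_ratio(
        CASES[0] * f_exp_ca,
        CASES[1] * f_unexp_ca,
        CONTROLS[0] * f_exp_co,
        CONTROLS[1] * f_unexp_co,
    )


RESULTS = {name: sampled_odds_ratio(f) for name, f in SCENARIOS.items()}

for name, value in RESULTS.items():
    print(f"{name}: relative odds = {value:.1f}")
```

The relative odds stay near 4 whenever the ratio of sampling fractions (exposed to unexposed) is the same in cases and controls, and move away from 4 only in scenario C, where cases are over-sampled on exposure and controls are not.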
TWO EXAMPLES OF UNCOMPENSATED BIAS

Example 2
Coffee Drinking and Cancer of the Pancreas (MacMahon et al, New Eng J Med 1981;304:630)
• Cases: Patients newly diagnosed with pancreatic cancer admitted to 11 Boston and Rhode Island hospitals during 1974-1980 (n= 369)
• Controls: Patients under the care of the same MD’s in the same hospitals, interviewed at the same time as the cases (i.e., cases and controls were matched by attending physician, hospital, and timing of interview). Controls mostly had gastrointestinal conditions, or cancers other than pancreatic and biliary tract (n= 644)

Odds ratios for cancer of the pancreas according to coffee drinking and smoking

                    Coffee intake (cups/day)
Smoking             0        1-2      3+
Never               1.0      2.1      3.1
Ex-smokers          1.0      3.1      2.3
Current smokers     1.0      1.8      3.8
Total               1.0      1.8      2.7

(Adapted from MacMahon et al, New Eng J Med 1981;304:630)

Problem?
[Figure: Reference Population with cells Diseased Exposed, Exposed Controls, Diseased Unexposed, Unexposed Controls, and the Study Sample drawn from it]

2. Types of Bias
a. Selection bias
b.
Information bias and misclassification

[Figure: Reference Population (Diseased Exposed, Healthy Exposed, Diseased Unexposed, Healthy Unexposed) and the Study Sample, shown as a 2 × 2 table of Cases and Controls by exposure: Exp a, b; Unexp c, d]
Information bias: misclassification of exposure information in cases and controls

Examples of Information Bias
• Exposure Identification Bias
  – Recall Bias
  – Interviewer/Observer Bias
• Outcome Identification Bias
  – Observer Bias
  – Respondent Bias

Result of Information Bias: Misclassification
• Differential
• Non-differential

Example of Recall Bias in a Study of Melanoma (Weinstock et al, Am J Epidemiol 1991;133:240-5)
[Figure: healthy cohort with data collection (questionnaire) on tanning ability, followed over time (cohort follow-up); labels: Loss, Case]

Example of Recall Bias in a Case-Control Study (Ca-Co)

                                                   Premelanoma diagnosis   Postmelanoma diagnosis
Tanning Ability                                    Ca        Co            Ca        Co
No tan or light tan (“exposed”)                    9         79            15        77
Medium, average, deep or dark tan (“unexposed”)    25        155           19        157
Odds Ratio                                         0.7                     1.6

Change after diagnosis: +6 exposed cases (9 to 15); +2 unexposed controls (155 to 157)
(Weinstock et al, Am J Epidemiol 1991;133:240-5)

Example of nondifferential misclassification
To investigate the validity of self-reported acquired immunodeficiency syndrome (AIDS) among women enrolled in a prospective study of human immunodeficiency virus (HIV) infection, the authors compared the self-reported occurrence of AIDS-specific diagnoses with AIDS diagnoses documented by county AIDS surveillance registries.
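As a sketch of the mechanics behind nondifferential misclassification, the snippet below applies one sensitivity and one specificity to both cases and controls and compares the true and observed odds ratios. The counts (20/480 cases, 5/995 controls) and the values 46% and 84% are the ones used in the esophageal candidiasis example of this section; the function names are hypothetical, and the code keeps fractional expected counts, whereas the slides round to whole persons.

```python
def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected (classified exposed, classified unexposed) counts for one group."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


# True ("gold standard") distribution of esophageal candidiasis.
cases_true = (20, 480)      # (present, absent) among AIDS cases
controls_true = (5, 995)    # (present, absent) among normal controls

# Nondifferential: the same sensitivity and specificity in both groups.
SENSITIVITY = 0.46
SPECIFICITY = 0.84

true_or = odds_ratio(*cases_true, *controls_true)
cases_observed = misclassify(*cases_true, SENSITIVITY, SPECIFICITY)
controls_observed = misclassify(*controls_true, SENSITIVITY, SPECIFICITY)
observed_or = odds_ratio(*cases_observed, *controls_observed)

print(f"True OR:     {true_or:.1f}")
print(f"Observed OR: {observed_or:.1f}")
```

Passing different sensitivity or specificity values for cases and controls to `misclassify` turns the same sketch into a differential misclassification example, in which the observed odds ratio can move in either direction.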
“Test”: self-reported diagnosis; “gold standard”: diagnosis documented by the surveillance registries (Hessol et al, Am J Epidemiol 2001;153:1128-33)

Sensitivity and Specificity of Self-Reported Diagnosis of Esophageal Candidiasis in AIDS patients
• Sensitivity: 46%
• Specificity: 84%
(Adapted from: Hessol et al, Am J Epidemiol 2001;153:1128)

Relationship between AIDS and esophageal candidiasis: “Gold Standard”

Esophageal candidiasis   AIDS cases   Normal controls   Odds Ratio
Present                  20           5                 8.3
Absent                   480          995               1.0
Total                    500          1,000

Definitions of Sensitivity and Specificity Used for the Evaluation of Misclassification
• Sensitivity
  – Proportion of all truly infected (exposed) individuals correctly classified by the study
• Specificity
  – Proportion of all truly uninfected (unexposed) individuals correctly classified by the study

Relationship between AIDS and esophageal candidiasis ascertained by questionnaire with sensitivity= 46% and specificity= 84%. Assume non-differential misclassification*
*Same sensitivity and specificity values for cases and controls

True Results (as in the “gold standard” table above): Present 20 vs 5, Absent 480 vs 995, Odds Ratio 8.3

Self-report in cases
                  Truth in Cases
In Study          Present   Absent   Total
Present           9         77       86
Absent            11        403      414
Total             20        480      500

Relationship Between AIDS and esophageal candidiasis ascertained by questionnaire with sensitivity= 46% and
specificity= 84%

Self-report in controls
                  Truth in Controls
In Study          Present   Absent   Total
Present           2         159      161
Absent            3         836      839
Total             5         995      1000

Study Results:

Esophageal candidiasis   AIDS cases   Normal controls
Present                  86           161
Absent                   414          839
Total                    500          1,000

OR= (86/414) ÷ (161/839) = 1.1 (True OR= 8.3!!)

Rule: When there are two exposure categories, nondifferential misclassification biases the odds ratio or relative risk toward 1.0

Odds Ratios for Inaccurate Self-Reporting of AIDS (Hessol et al, Am J Epidemiol 2001;153:1128)
[Table not reproduced]
*Simultaneously adjusted for all other variables using multiple logistic regression.

Examples of Differential Misclassification in a Case-Control Study. True Odds Ratio= 3.86; Prevalence of Exposure in Controls= 0.10

Exposure Ascertainment
Sensitivity            Specificity            Odds Ratio
Cases     Controls     Cases     Controls
0.90      0.60         1.00      1.00         5.79
0.60      0.90         1.00      1.00         2.22
1.00      1.00         0.90      0.70         1.00
1.00      1.00         0.70      0.90         4.43

Net result of misclassification: Regression Dilution Bias
Example: Blood pressure as the exposure
Regression towards the mean:
• Random variability
• Measurement error
• Physiologic variability
[Figures: SBP (mmHg, axis marks at 120 and 140) plotted over Time; labels: average; Classification: Hypertensive, Normotensive, Hypertensive; Person A, Person B, Classified as: “Hypertensive”]

2. Types of Bias
a. Selection bias
b. Information bias and misclassification
c.
Mixed biases
   - Prevalence-Incidence bias
   - Temporal bias

Cross-sectional study (P = prevalence, I = incidence, D = duration):
   P ÷ (1 - P) = I × D, i.e., P = I × D × (1 - P)
If the prevalence is low:
   P ≈ I × D
If exposure to the risk factor does not affect the duration of the disease after it starts:
   P ∝ I (the prevalence ratio approximates the incidence ratio)

EXAMPLE: MYOCARDIAL INFARCTION
Hypothetical numerical example
Yearly incidence in persons older than 60:
• Men: 5%
• Women: 2%
RR= 2.5
Survival after the acute event:
• Men: 20 years
• Women: 10 years

Prevalence= Incidence × duration

Prevalence Rate Ratio = (Incidence_MEN × duration_MEN) ÷ (Incidence_WOMEN × duration_WOMEN)
                      = Risk Ratio × (duration_MEN ÷ duration_WOMEN)
                      = 2.5 × (20 ÷ 10) = 5.0

2. Types of Bias
a. Selection bias
b. Information bias and misclassification
c. Mixed biases
   - Prevalence-Incidence bias
   - Temporal bias

Number of Cases and Controls, and Odds Ratios for Endometrial Cancer According to Type of Estrogen Replacement Therapy, with 95% Confidence Intervals (Antunes et al, NEJM 1979)

Type of estrogen   No. of cases   No. of controls   Odds Ratios   95% confidence intervals
None               274            390               1.0           Reference
Conjugated         56             18                4.3           2.5, 7.5
Total              339            489

Feinstein & Horwitz’ criticism
Estrogen Use → Endometrial Cancer?
OR
Undiagnosed Endometrial Cancer → Bleeding → Estrogen Use → Diagnosed Endometrial Cancer?
Solution? Analyze only women who take estrogen prophylactically

Example of Temporal Bias: Relationship of Bypass Surgery to Physical Activity
Cases who had bypass surgery vs. controls without coronary heart disease
Question: Do you exercise often now?
   “Yes”: Cases > Controls (“Reverse Causality”)
Question: Did you exercise often before your bypass surgery/before (date)?
   “Yes”: Cases < Controls

A DETOUR… IS CONFOUNDING A BIAS?
[Figure: confounded relationship between sedentary life and oral cancer, with alcohol as the confounder]
[Figure: the same relationship, with sedentary life labeled as a high risk marker useful for secondary prevention]

Types of Association (Lilienfeld)
• True:
  – Causal
  – Statistical non-causal (“indirect”, due to confounding)
• Spurious or artifactual (due to bias)

Is confounding a bias?
Public health implications

Goal: Primary prevention
   Prevention or cessation of exposure
   E.g.: saturated fat intake and atherosclerosis
Type of evidence needed:
   Causal association must be present; otherwise, intervention on the risk factor will not affect disease outcome
   E.g.: if excessive fat did not cause atherosclerosis, a lower fat intake would not affect atherosclerosis risk

Goal: Secondary prevention (screening)
   Early diagnosis via selective screening of “high risk” subjects
   E.g.: screening for hypertension in African-Americans
Type of evidence needed:
   Associations may be either causal or statistical (the latter must not be biased). In other words, the association may be confounded, but it is still useful for secondary prevention.
   E.g.: even if “race” is not causally related to hypertension (but confounded by SES, etc.), it could be a useful marker to detect individuals at higher risk for hypertension

2. Types of Bias
a. Selection bias
b. Information bias and misclassification
c. Mixed biases
   - Prevalence-Incidence bias
   - Temporal bias
d. Biases in the evaluation of screening

Biases in the Evaluation of Screening Programs
• Lead Time Bias
• Selection Biases
  • Referral Bias
  • Length-biased sampling

Biases in the Evaluation of Screening Programs
• Lead Time Bias

Natural History of a Disease (Adapted from Gordis, Epidemiology, 1996)
[Figure: time line with points A (biologic onset), B (earliest point when diagnosis is possible), C (point when early diagnosis is made), D (usual diagnosis based on symptoms) and F; intervals marked: Detectable Preclinical Phase, Lead Time, Survival after early diagnosis, Survival after usual diagnosis]

LEAD TIME BIAS
No protective effect: Survival B = Survival A + lead time: 8 years = 6 years + 2 years
[Figure: A, diagnosis based on symptoms in 1994, follow-up ends in 2000 (survival 6 years); B, early diagnosis (screening) in 1992, follow-up ends in 2000 (survival 8 years); both time lines start in 1985; lead time 2 years]

Protective effect: Surv (B) > Surv (A) + Lead Time: 10 years > 6 years + 2 years
Diagnosis based on symptoms: A
Early diagn.:
B
[Figure: A, diagnosed in 1994, survival 6 years (to 2000); B, diagnosed in 1992 (lead time to 1994), survival 8 years to 2000 if there is no effect, survival 10 years to 2002 with a protective effect; Gain= 2 yrs]

Natural History of a Disease: Lead Time Bias
[Figure: cumulative survival (marks at 100%, 70%, 40%) by years after diagnosis; the curve after early diagnosis is shifted by the lead time; 5 years after early diagnosis = 3 years after usual diagnosis] (Adapted from Frank, Am J Prev Med 1985;1:3-9)

Lead time: Prevention and Correction
• Prevention: Use mortality instead of case-fatality
• Correction: Estimate lead time, and adjust for it
  – Examples
    • Breast Cancer: 1 year
    • Invasive Cervical Cancer: At least 10 years?
    • Lung Cancer: Less than 1 year?

Estimation of Lead Time
Step 1 = Estimation of Duration of DPCP
[Figure: natural history time line with A (biologic onset), B (earliest point when diagnosis is possible), D (usual diagnosis based on symptoms) and F; the Detectable Preclinical Phase (DPCP) runs from B to D]

   Prevalence_DPCP ≈ Incidence_DPCP × Duration_DPCP
   Duration_DPCP ≈ Prevalence_DPCP ÷ Incidence_DPCP

Prevalence of the DPCP? Incidence of the DPCP? Also, incidence of clinical cases
[Figure: time line with 1st exam and 2nd exam]

Estimation of Lead Time
Step 2 = Estimation of Lead Time
a. Prevalent Cases
[Figure: 1st exam on the time line; possible points in time when early diagnosis is possible lie between B and D, within the Detectable Preclinical Phase]
   Average Lead Time (PREVALENT CASES) = DPCP ÷ 2
b. Incident Cases
When screening exams are frequent, the lead time approximates the DPCP
   Average Lead Time (INCIDENT CASES) ≈ DPCP
[Figure: 1st exam and 2nd exam on the time line; lead time and DPCP shown for Patient A and Patient B, both detected at the 2nd exam]

Biases in the Evaluation of Screening Programs
• Lead Time Bias
• Selection Biases

Biases in Evaluation of Screening
• Selection Bias
  – Referral Bias (Volunteer Bias)
    • DEFINITION and SOLUTION?
      – Randomized Clinical Trial

Biases in Evaluation of Screening
• Selection Bias
  – Referral Bias (Volunteer Bias)
  – Length-Biased Sampling
    – DEFINITION and SOLUTION?

Length-Biased Sampling
[Figure: each horizontal line represents the DPCP for a case; Screening Exam No. 1 and Screening Exam No. 2 are 1 year apart, with interval cases arising between them]

Lead-Time-Adjusted Five-Year Case-Fatality Rates Among Breast Cancer Patients (Shapiro et al, JNCI 1982;69:349-55)
[Bar chart, vertical axis 0 to 45; bars: Control group; Study group: Total, Refused screening, Screened (Total, Screen-detected, Not screen-detected)]
Only valid comparison: Control (total) vs. Study (total)

Biases in Evaluation of Screening
• Selection Bias
  – Referral Bias (Volunteer Bias)
  – Length-Biased Sampling
    – DEFINITION and SOLUTION?
      » Randomized Clinical Trials
      » Compare all individuals randomized to the “experimental” group with all individuals randomized to the control group
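Two of the screening biases covered in this section can be illustrated numerically. The sketch below first redoes the lead-time arithmetic with the survival times from the lead time bias slides (6, 8 and 10 years, lead time 2 years), then simulates length-biased sampling. Everything about the simulation is an assumption made for illustration only: DPCP durations are drawn from an exponential distribution with a mean of 1 year, onsets are spread uniformly over 20 years, and a single screening exam takes place at year 10. The function names are hypothetical.

```python
import random


def gain_after_lead_time(survival_usual, survival_screened, lead_time):
    """Survival gain left once the lead time has been subtracted."""
    return survival_screened - (survival_usual + lead_time)


def simulate_length_bias(n_cases=100_000, mean_dpcp=1.0, window=20.0,
                         screen_time=10.0, seed=0):
    """Return (mean DPCP of all cases, mean DPCP of screen-detected cases).

    Assumed model: exponential DPCP durations, uniform onset times, and one
    screening exam that detects every case whose DPCP spans the exam date.
    """
    rng = random.Random(seed)
    all_durations = []
    detected_durations = []
    for _ in range(n_cases):
        onset = rng.uniform(0.0, window)
        duration = rng.expovariate(1.0 / mean_dpcp)
        all_durations.append(duration)
        # Detected only if the exam falls inside the detectable preclinical phase.
        if onset <= screen_time < onset + duration:
            detected_durations.append(duration)
    mean_all = sum(all_durations) / len(all_durations)
    mean_detected = sum(detected_durations) / len(detected_durations)
    return mean_all, mean_detected


# Lead time bias: 8 years = 6 years + 2 years means no real gain,
# whereas 10 years > 6 years + 2 years leaves a gain of 2 years.
no_effect_gain = gain_after_lead_time(6, 8, 2)
protective_gain = gain_after_lead_time(6, 10, 2)

mean_all, mean_detected = simulate_length_bias()

print("Gain without protective effect:", no_effect_gain)
print("Gain with protective effect:", protective_gain)
print(f"Mean DPCP, all cases: {mean_all:.2f} years")
print(f"Mean DPCP, screen-detected cases: {mean_detected:.2f} years")
```

Under these assumptions the screen-detected cases have a clearly longer average DPCP than cases overall, which is the sense in which a single screening exam preferentially picks up slowly progressing disease.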