ERRORS IN EPIDEMIOLOGICAL STUDIES
Assoc. Prof. Pratap Singhasivanon
Department of Tropical Hygiene

Page 2
ERROR
Is defined as a false or mistaken result obtained in a study or experiment.
Consists of 2 components:
- Systematic error
- Random error

Page 3
RANDOM ERROR
Refers to fluctuations around a true value because of sampling variability.
SYSTEMATIC ERROR
Any difference between the true value and the value actually obtained that is the result of all causes other than sampling variability.

Page 4
ERROR = SYSTEMATIC ERROR + RANDOM ERROR
- ERROR: a false or mistaken result obtained in a study or experiment.
- SYSTEMATIC ERROR (BIAS): error due to factors that are inherent in the design, conduct and analysis.
- RANDOM ERROR (RANDOM VARIABILITY): fluctuation of an estimate around the population value. The result obtained in a sample differs from the result that would be obtained if the entire population were studied.

Page 5
SOURCES AND TYPES OF MEASUREMENT ERROR
Sources of Error                          Types of Error
Observers                                 Bias, Random
Researchers administering the measure     Bias, Random
Subjects                                  Bias, Random

Page 6
SYSTEMATIC ERROR:
- SELECTION BIAS
- INFORMATION BIAS
- CONFOUNDING

Page 7
RANDOM ERROR
Is the divergence, due to chance alone, of an observation on a sample from the true population value.

Page 8
Different combinations of high and low reliability and validity
Figure: panels labeled by RELIABILITY (High, Low) and VALIDITY (High, Low).

Page 9
Internal and External Validity
Figure: External Population, Target Population, Study Sample; INT. and EXT.
VALIDITY

Page 10
VALIDITY AND RELIABILITY
Figure: panels A, B, C and D arranged by VALIDITY (HIGH, LOW) and RELIABILITY (HIGH, LOW).

Page 11
VALIDITY: A study is valid if its results correspond to the truth; there should be no systematic error, and random error should be as small as possible.

Page 12
VALIDITY
Is the expression of the degree to which a test is capable of measuring what it is intended to measure.
A study is valid if its results correspond to the truth; there should be no systematic error, and random error should be as small as possible.

Page 13
RELATIONSHIP BETWEEN BIAS AND CHANCE
Figure: BLOOD PRESSURE MEASUREMENT (SPHYGMOMANOMETER) compared with TRUE BLOOD PRESSURE (INTRA-ARTERIAL CANNULA), with CHANCE and BIAS marked; x-axis DIASTOLIC BLOOD PRESSURE (mmHg), labeled at 80 and 90.

Page 14
SOURCES OF VARIATION
Figure columns: CONDITIONS OF MEASUREMENT, DISTRIBUTION OF MEASUREMENT, SOURCE OF VARIATION
Conditions of measurement:
- One patient, one observer, repeated observations
- One patient, many observers, at one time
- One patient, one observer, many times of day
- Many patients
Sources of variation shown: MEASUREMENT; BIOLOGIC + MEASUREMENT

Page 15
FRAMEWORK FOR THE INTERPRETATION OF AN EPIDEMIOLOGIC STUDY
IS THERE A VALID STATISTICAL ASSOCIATION?
- Is the association likely to be due to chance?
- Is the association likely to be due to bias?
- Is the association likely to be due to confounding?
CAN THIS VALID STATISTICAL ASSOCIATION BE JUDGED AS CAUSE AND EFFECT?
- Is there a strong association?
- Is there biologic credibility to the hypothesis?
- Is there consistency with other studies?
- Is the time sequence compatible?
- Is there evidence of a dose-response relationship?

Page 16
Precision: Is the quality of being sharply defined through exact detail.
The repeated assay of a single test specimen typically gives rise to a set of results that differ to a greater or lesser extent from each other. The smaller the differences, the greater the precision of the assay method.

Page 17
Measurement: The procedure of applying a standard scale to a variable or a set of values.
(Last, 1988)
Terms used to describe properties of measurement:
- Accuracy
- Validity
- Precision
- Reliability
- Repeatability
- Reproducibility

Page 18
SELECTION BIAS
Is a distortion in the estimate of effect resulting from the manner in which subjects are selected for the study population.
MAJOR SOURCES OF SELECTION BIAS
1) flaws in the choice of groups to be compared
2) choice of sampling frame
3) loss to follow-up or nonresponse during data collection
4) selective survival

Page 19
INFORMATION BIAS
Is a distortion in the estimate of effect due to measurement error or misclassification of subjects on one or more variables.
MAJOR SOURCES OF INFORMATION BIAS
1) invalid measurement
2) incorrect diagnostic criteria
3) omissions
4) imprecisions, other inadequacies in previously recorded data

Figure: Prevalence of Down syndrome by maternal age (x-axis: Maternal Age, <20, 20-24, 25-29, 30-34, 35-39, 40+).

Figure: Prevalence of Down syndrome at birth by birth order (x-axis: Birth Order, 1, 2, 3, 4, 5+).

Hypothetical Examples of Unadjusted and Adjusted Relative Risks According to Type of Confounding (Positive or Negative)

Example No.   Type of Confounding   Unadjusted Relative Risk   Adjusted Relative Risk
1             Positive              3.5                        1.0
2             Positive              3.5                        2.1
3             Positive              0.3                        0.7
4             Negative              1.0                        3.2
5             Negative              1.5                        3.2
6             Negative              0.8                        0.2
7             Qualitative           2.0                        0.7
8             Qualitative           0.6                        1.8

Page 25
CONFOUNDING: MIXING OF EFFECTS
The estimate of the effect of the exposure of interest is distorted because it is mixed with the effect of an extraneous factor.

Page 26
CONFOUNDING
COFFEE DRINKING, CIGARETTE SMOKING AND CORONARY HEART DISEASE
Figure: EXPOSURE (coffee drinking), CONFOUNDING VARIABLE (cigarette smoking), DISEASE (heart disease).

Page 27
The distortion introduced by a confounding factor can lead to overestimation or underestimation of an effect, depending on the direction of the association that the confounding factor has with exposure and disease. Confounding can even change the apparent direction of an effect.
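The three outcomes just described (overestimation, underestimation, reversal of direction) correspond to the Positive, Negative and Qualitative rows of the hypothetical examples table. The following is a minimal Python sketch of that reading, not a method given on the slides. It assumes that distance from the null value (RR = 1) is compared on the log scale, and the function name `confounding_type` is hypothetical.

```python
import math


def confounding_type(crude_rr, adjusted_rr, tol=1e-9):
    """Label confounding by comparing a crude and an adjusted relative risk.

    Assumption (not stated on the slides): distance from the null value
    RR = 1 is measured on the log scale.
    """
    log_crude = math.log(crude_rr)
    log_adjusted = math.log(adjusted_rr)

    # Crude and adjusted estimates lie on opposite sides of RR = 1:
    # the apparent direction of the effect is reversed.
    if log_crude * log_adjusted < -tol:
        return "Qualitative"
    # Crude estimate is farther from the null: the effect is overestimated.
    if abs(log_crude) > abs(log_adjusted) + tol:
        return "Positive"
    # Crude estimate is closer to the null: the effect is underestimated.
    if abs(log_crude) < abs(log_adjusted) - tol:
        return "Negative"
    return "None"


# (unadjusted RR, adjusted RR) pairs from the hypothetical examples table
examples = [
    (3.5, 1.0), (3.5, 2.1), (0.3, 0.7),   # examples 1-3
    (1.0, 3.2), (1.5, 3.2), (0.8, 0.2),   # examples 4-6
    (2.0, 0.7), (0.6, 1.8),               # examples 7-8
]
labels = [confounding_type(crude, adjusted) for crude, adjusted in examples]

for number, ((crude, adjusted), label) in enumerate(zip(examples, labels), start=1):
    print(number, crude, adjusted, label)
```

Under this assumed rule the eight printed labels agree with the Type of Confounding column of the table.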
Example: Alcohol, Smoking, Oral cancer
Figure: diagrams relating exposure (E) and disease (D).

Page 32
Situation in which F is a confounder for a D - E association.
Figure: diagrams of E, D and F.
Situation in which F is not a confounder for a D - E association.
Figure: diagrams of E, D and F.

Page 33
To be confounding, the extraneous variable must have the following characteristics:
- A confounding variable must be a risk factor for the disease.
- A confounding variable must be associated with the exposure under study (in the population from which the cases derive).
- A confounding variable must not be an intermediate step in the causal path between the exposure and the disease.

Page 34
The data-based criterion for establishing the presence or absence of confounding involves the comparison of a crude effect measure with an adjusted effect measure that corrects for distortions due to extraneous variables.
Confounding is acknowledged to be present when the crude and adjusted effect measures differ in value.

Page 35
CONTROL OF CONFOUNDING
DESIGN
- RESTRICTION
- MATCHING
ANALYSIS
- STRATIFICATION
- MATHEMATICAL MODEL (Multivariate analysis)

Page 36
Relation of Confounder to Disease and Exposure

          DISEASE                     EXPOSURE
AGE       MI* (%)    CONTROLS (%)     OC** USE (%)
25-29     3          16               29
30-34     9          14               10
35-39     16         20               8
40-44     30         21               4
45-49     42         18               3

*MI: MYOCARDIAL INFARCTION
**OC: ORAL CONTRACEPTIVE

Page 37
CRUDE RR (CRR)
Figure: 2 x 2 table of exposure (E, not E) by disease (D +, D -), column totals 1000 and 1000, overall total 2000; CRUDE RR = 4.
- Collapsed into 1 table without separation into subgroups.

Page 40
CRUDE TABLE
              ALC +    ALC -    TOTAL
DISEASE +     200      50       250
DISEASE -     800      950      1750
TOTAL         1000     1000     2000
CRUDE CIR = 4.0

SMOKERS
              ALC +    ALC -
DISEASE +     194      21
DISEASE -     706      79
CIR (smokers) = 1.02

NON-SMOKERS
              ALC +    ALC -
DISEASE +     6        29
DISEASE -     94       871
CIR (non-smokers) = 1.86

ADJUSTED CIR = 1.13

Page 41
Degree of Confounding
Measures the amount of confounding rather than its mere presence or absence.
degree of confounding (d.c.) = crude measure / adjusted measure
d.c. = 4.00 / 1.13 ≈ 3.53 (overestimation)
d.c. = 1.68 / 3.97 ≈ 0.42 (underestimation)

Page 42
4-fold risk of MI among recent OC users as compared to non-users.

AGE      Recent Use of OC    MI     Controls    OR
25-29    Yes                 4      62          7.2
         No                  2      224
30-34    Yes                 9      33          8.9
         No                  12     390
35-39    Yes                 4      26          1.5
         No                  33     330
40-44    Yes                 6      9           3.7
         No                  65     362
45-49    Yes                 6      5           3.9
         No                  93     301
TOTAL    Yes                 29     135         1.7
         No                  205    1607
OR(MH) = 3.97

Page 43
TYPES OF ASSOCIATION
A. Not statistically associated (Independent)
B. Statistically associated
   1. Noncausally associated (Secondarily)
   2. Causally associated
      a. Indirectly associated
      b. Directly causal

Page 44
Association refers to the statistical dependence between two variables, that is, the degree to which the rate of disease in persons with a specific exposure is either higher or lower than the rate of disease among those without that exposure.
The presence of an association does not imply that the observed association is one of cause and effect.

Page 45
Figure: decision grid crossing STATISTICAL SIGNIFICANCE (YES, NO) with Clinical / Public Health significance (YES, NO); labels shown: OK, Sample size big enough (YES, NO), RESEARCH.

Page 46
Advantages and disadvantages of the major observational designs (cont.)
2. CROSS-SECTIONAL
Advantages
- May study several outcomes
- Control over selection of subjects
- Control over measurements
- Relatively short duration
- A good first step for a cohort study
- Yields prevalence, relative prevalence
Disadvantages
- Does not establish sequence of events
- Potential bias in measuring predictors
- Potential survival bias
- Not feasible for rare conditions
- Does not yield incidence or true relative risk

Page 47
Advantages and disadvantages of the major observational designs (cont.)
4.
NESTED CASE-CONTROL (Prospective or retrospective)
Advantages
- Scientific advantages of cohort design
- Relatively inexpensive
Disadvantage
- Requires bank of samples stored until outcomes occur
* All of these observational designs have the disadvantage (compared to experiments) of being susceptible to the influence of confounding variables.

Page 48
Advantages and disadvantages of the major observational designs
1. COHORT
Advantages
- Establishes sequence of events
- Avoids bias in measuring predictors
- Avoids survival bias
- Can study several outcomes
- Number of outcome events grows over time
Disadvantages
- Often requires large sample sizes
- Not feasible for rare outcomes

Page 49
Advantages and disadvantages of the major observational designs (cont.)
3. CASE-CONTROL
Advantages
- Useful for studying rare conditions
- Short duration
- Relatively inexpensive
- Yields odds ratio (usually a good approximation of relative risk)
Disadvantages
- Potential bias from sampling two populations
- Does not establish sequence of events
- Potential bias in measuring predictors
- Potential survival bias
- Limited to one outcome variable
- Does not yield prevalence, incidence, or excess risk

Page 50
CHARACTERISTICS OF INCIDENCE AND PREVALENCE
NUMERATOR
- Incidence: new cases occurring during a period of time among a group initially free of disease
- Prevalence: all cases counted on a single survey or examination of a group
DENOMINATOR
- Incidence: all susceptible people present at the beginning of the period
- Prevalence: all people examined, including cases and non-cases
TIME
- Incidence: duration of the period
- Prevalence: single point
HOW MEASURED
- Incidence: cohort study
- Prevalence: prevalence (cross-sectional) study

Page 51
Nondifferential Misclassification
SENSITIVITY AND SPECIFICITY REMAIN CONSTANT IRRESPECTIVE OF THE VALUES OF THE OTHER VARIABLE: "BIAS TOWARD THE NULL"

Page 52
Differential Misclassification
WHEN THE MAGNITUDE OF ERROR FOR ONE VARIABLE DIFFERS ACCORDING TO THE ACTUAL VALUE OF ANOTHER VARIABLE (DIFFERENT SENSITIVITY & SPECIFICITY)
E.G.
- EXPOSURE TO RADIATION / CONGENITAL MALFORMATION
- EMPHYSEMA / SMOKING
"BIAS TOWARD OR AWAY FROM NULL VALUE"

Page 53
FRAMEWORK FOR THE INTERPRETATION OF AN EPIDEMIOLOGIC STUDY
IS THERE A VALID STATISTICAL ASSOCIATION?
- Is the association likely to be due to chance?
- Is the association likely to be due to bias?
- Is the association likely to be due to confounding?
CAN THIS VALID STATISTICAL ASSOCIATION BE JUDGED AS CAUSE AND EFFECT?
- Is there a strong association?
- Is there biologic credibility to the hypothesis?
- Is there consistency with other studies?
- Is the time sequence compatible?
- Is there evidence of a dose-response relationship?

Page 54
Prevalence of Dyslipidemia
Source population: individuals 1 to 20, with dyslipidemia (+) in 4, 6, 12, 15 and 20. Prevalence = 25%
Sample 1: 4+, 6+, 8, 9, 14      Prevalence = 40%
Sample 2: 12+, 7, 17, 16, 19    Prevalence = 20%
Sample 3: 18, 14, 11, 10, 5     Prevalence = 0%

Page 55
Advantages and disadvantages of the major observational designs (cont.)
Advantages
- Yields incidence, relative risk, excess risk
- Prospective: more control over selection of subjects
- Retrospective: less expensive; shorter duration
- Double cohort: useful when distinct cohorts have different or rare exposures
Disadvantages
- Prospective: more expensive; longer duration
- Retrospective: less control over selection of subjects; less control over measurements
- Double cohort: potential bias from sampling two populations

Page 56
MISCLASSIFICATION WITH REGARD TO DISEASE (NONDIFFERENTIAL MISCLASSIFICATION)

                                              Exposed    Unexposed    Relative Risk
Number of individuals                         2,000      8,000
Number of cases                               20         40           2.0
Underdiagnosis (Sens. = 0.5; Spec. = 1.0):
  number identified as cases                  10         20           2.0
Overdiagnosis (Sens. = 1.0; Spec. = 0.99):
  number identified as cases                  40         120          1.3

Page 57
Explanation for the observed difference in survival between propranolol and control group:
1. Chance (Random error)
2. Bias (Systematic error)
   - Selection
   - Information
   - Confounding
3.
Effect of propranolol

Page 58
High reliability means that in repeated measurements the results fall very close to each other; conversely, low reliability means that they are scattered.
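Worked example for Pages 41 and 42: the crude odds ratio (1.68), the Mantel-Haenszel odds ratio (3.97) and the degree of confounding (0.42) can be recomputed from the age-stratified counts of recent OC use and MI. The sketch below is a minimal Python illustration, not the author's own calculation. The function names are hypothetical. The count of unexposed controls aged 25-29 is taken as 224, the value consistent with the column total of 1607 and the stratum OR of 7.2.

```python
# Each stratum: (age group, exposed cases, exposed controls,
#                unexposed cases, unexposed controls),
# where "exposed" means recent OC use and "cases" means MI.
# Assumption: 224 unexposed controls in 25-29, consistent with the total 1607.
strata = [
    ("25-29", 4, 62, 2, 224),
    ("30-34", 9, 33, 12, 390),
    ("35-39", 4, 26, 33, 330),
    ("40-44", 6, 9, 65, 362),
    ("45-49", 6, 5, 93, 301),
]


def odds_ratio(a, b, c, d):
    """Odds ratio for one 2 x 2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio over (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = [row[1:] for row in strata]

# Stratum-specific odds ratios, one per age group
stratum_ors = [odds_ratio(*t) for t in tables]

# Crude odds ratio: collapse all strata into a single 2 x 2 table
collapsed = tuple(sum(column) for column in zip(*tables))
crude_or = odds_ratio(*collapsed)

# Adjusted odds ratio and degree of confounding (crude / adjusted)
mh_or = mantel_haenszel_or(tables)
degree_of_confounding = crude_or / mh_or

print([round(x, 1) for x in stratum_ors])
print(round(crude_or, 2), round(mh_or, 2), round(degree_of_confounding, 2))
```

Because the crude estimate is well below the age-adjusted one, the ratio falls below 1, which is the underestimation case on Page 41.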