Biases and Errors in Epidemiology
Anchita Khatri

Definitions
ERROR:
1. A false or mistaken result obtained in a study or experiment.
2. Random error: the portion of variation in a measurement that has no apparent connection to any other measurement or variable; generally regarded as due to chance.
3. Systematic error: error that often has a recognizable source (e.g., a faulty measuring instrument) or pattern (e.g., it is consistently wrong in a particular direction). (Last)

Relationship between Bias and Chance
[Figure: sphygmomanometer readings of diastolic blood pressure scatter around 90 mm Hg, while the true intra-arterial value is 80 mm Hg. The scatter of readings about their own mean reflects chance; the systematic shift of that mean away from the true value reflects bias.]
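The interplay of chance and bias in the figure above can be made concrete with a short simulation. The sketch below is illustrative only: the true value, the size of the systematic offset, and the random spread are all assumed numbers, not data from any real instrument.

```python
import random

random.seed(42)

TRUE_DBP = 80.0   # hypothetical true diastolic BP (intra-arterial), mm Hg
BIAS = 10.0       # assumed systematic offset of the cuff instrument, mm Hg
SPREAD = 4.0      # assumed SD of random measurement variation, mm Hg

# Each reading is the true value, shifted by a constant systematic error
# (bias) and scattered by random error (chance).
readings = [TRUE_DBP + BIAS + random.gauss(0, SPREAD) for _ in range(1000)]

mean_reading = sum(readings) / len(readings)
print(f"true value:   {TRUE_DBP:.1f} mm Hg")
print(f"mean reading: {mean_reading:.1f} mm Hg")  # ~90: averaging removes chance, not bias
```

Averaging many readings shrinks the random component toward zero but leaves the systematic component untouched, which is why bias cannot be fixed by simply collecting more data.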
Validity
• Validity: the degree to which a measurement measures what it purports to measure. (Last)
• The degree to which the data measure what they were intended to measure; that is, the results of a measurement correspond to the true state of the phenomenon being measured. (Fletcher)
• Also known as 'accuracy'.

Reliability
• The degree of stability expected when a measurement is repeated under identical conditions; the degree to which the results obtained from a measurement procedure can be replicated. (Last)
• The extent to which repeated measurements of a stable phenomenon – by different people and instruments, at different times and places – get similar results. (Fletcher)
• Also known as 'reproducibility' and 'precision'.

Validity and Reliability
[Figure: 2 × 2 grid of target diagrams contrasting high vs. low validity with high vs. low reliability.]

Bias
• Deviation of results or inferences from the truth, or processes leading to such deviation: any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth. (Last)
• A process at any stage of inference tending to produce results that depart systematically from true values. (Fletcher)

Types of biases
1. Selection bias
2. Measurement / (mis)classification bias
3. Confounding bias

Selection bias
• Errors due to systematic differences in characteristics between those who are selected for study and those who are not. (Last; Beaglehole)
• When comparisons are made between groups of patients that differ in ways, other than the main factors under study, that affect the outcome under study. (Fletcher)

Examples of selection bias
• Subjects: hospital cases under the care of a physician.
• Excluded:
  1. Those who die before admission (acute/severe disease)
  2. Those not sick enough to require hospital care
  3. Those without access, due to cost, distance, etc.
• Result: conclusions cannot be generalized.
• Also known as 'ascertainment bias'. (Last)

Ascertainment bias
• Systematic failure to represent equally all classes of cases or persons supposed to be represented in a sample. This bias may arise because of the nature of the sources from which the persons come (e.g., a specialized clinic), or from a diagnostic process influenced by culture, custom, or idiosyncrasy. (Last)

Selection bias with 'volunteers'
• Also known as 'response bias'.
• Systematic error due to differences in characteristics between those who choose or volunteer to take part in a study and those who do not.

Examples … response bias
• People volunteer either because they are unwell or because they are worried about an exposure.
• Respondents to surveys on the effects of smoking are usually not as heavy smokers as non-respondents.
• In a cohort study of newborn children, the proportion successfully followed up for 12 months varied according to the income level of the parents.

Examples … (assembly bias)
• Study: is there an association between reserpine and breast cancer in women?
• Design: case-control.
• Cases: women with breast cancer. Controls: women without breast cancer who were not suffering from any cardiovascular disease (conditions frequently associated with hypertension).
• Result: because reserpine was commonly used for hypertension, controls likely to be on reserpine were systematically excluded, and a spurious association between reserpine and breast cancer was observed.

Examples … (assembly bias)
• Study: effectiveness of OCP1 vs. OCP2.
• Subjects: on OCP1, women who had given birth at least once (i.e., known to be able to conceive); on OCP2, women who had never become pregnant.
• Result: if OCP2 is found to be better, is the inference correct? No – the OCP2 group may include women who could not have conceived anyway.

Susceptibility bias
• The groups being compared are not equally susceptible to the outcome of interest, for reasons other than the factors under study.
• Comparable to 'assembly bias'.
• In prognosis studies, cohorts may differ in one or more ways: extent of disease, presence of other diseases, the point of time in the course of disease, prior treatment, etc.

Examples … (susceptibility bias)
• Background: for colorectal cancer, CEA levels are correlated with extent of disease (Dukes' classification), and both Dukes' classification and CEA levels strongly predict disease relapse.
• Question: does CEA level predict relapse independently of Dukes' classification, or is susceptibility to relapse explained by Dukes' classification alone?

Example … CEA levels (contd.)
• Answer: on stratification, the association of pre-operative CEA levels with disease relapse was observed within each category of Dukes' classification.
[Figure: disease-free survival over 24 months by pre-operative CEA level (<2.5, 2.5–10.0, and >10.0 ng) in colorectal cancer patients with similar pathological staging (Dukes' B).]

Selection bias with 'survival cohorts'
• Patients are included in a study because they are available and currently have the disease.
• For lethal diseases, patients in a survival cohort are the ones fortunate enough to have survived, and so are available for observation.
• For remitting diseases, patients are those unfortunate enough to have persistent disease.
• Also known as 'available patient cohorts'.

Example … bias with a 'survival cohort'
TRUE COHORT: assemble the cohort at onset (N = 150), then measure outcome.
  Improved: 75   Not improved: 75   → 50% improvement (true).
SURVIVAL COHORT: begin follow-up with the patients still available (N = 50); 100 are never observed.
  Observed – improved: 40, not improved: 10 → 80% improvement (biased).
  Not observed (dropouts) – improved: 35, not improved: 65 → the true overall rate is still 50%.

Selection bias due to 'loss to follow-up'
• Also known as 'migration bias'.
• In nearly all large studies, some members of the original cohort drop out of the study.
• If drop-outs occur randomly, such that the characteristics of the subjects lost from one group are on average similar to those who remain in the group, no bias is introduced.
• But ordinarily the characteristics of the lost subjects are not the same.

Example of 'loss to follow-up'
Full cohort (exposure: irradiation):
              Exposed   Unexposed    Total
  Diseased         50         100      150
  Total        10,000      20,000   30,000
  RR = (50/10,000) / (100/20,000) = 1

After differential loss to follow-up:
              Exposed   Unexposed    Total
  Diseased         30          30       60
  Total         4,000       8,000   12,000
  RR = (30/4,000) / (30/8,000) = 2
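A minimal sketch of how the risk ratio in the irradiation example above moves from 1 to 2 purely through differential loss to follow-up. The counts are those from the tables; `risk_ratio` is a hypothetical helper, not a library function.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk ratio from cohort counts: (a/n1) / (b/n0)."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)

# Full cohort: no association between irradiation and disease.
print(risk_ratio(50, 10_000, 100, 20_000))  # 1.0

# After differential loss to follow-up, the same cohort appears to show one.
print(risk_ratio(30, 4_000, 30, 8_000))     # 2.0
```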
Migration bias
• A form of selection bias.
• Can occur when patients in one group leave their original group, dropping out of the study altogether or moving to one of the other groups under study. (Fletcher)
• If this occurs on a large scale, it can affect the validity of conclusions.
• Bias due to crossover is more often a problem in risk studies than in prognosis studies, because risk studies go on for many years.

Example of migration
• Question: the relationship between lifestyle and mortality.
• Subjects: 10,269 Harvard College alumni, classified according to physical activity, smoking, weight, and BP, in 1966 and in 1977.
• Mortality rates were observed from 1977 to 1985.

Example of migration (contd.)
• Problem: the original classification of 'lifestyle' might change (migration between groups).
• Solution: four categories were defined –
  - men who maintained high-risk lifestyles
  - men who crossed over from low to high risk
  - men who crossed over from high to low risk
  - men who maintained low-risk lifestyles

Example of migration (contd.)
• Result: after controlling for other risk factors –
  - those who maintained or adopted high-risk characteristics had the highest mortality
  - those who changed from high to low risk had lower mortality than the above
  - those who never had any high-risk behaviour had the lowest mortality

Healthy worker effect
• A phenomenon observed initially in studies of occupational diseases: workers usually exhibit lower overall death rates than the general population, because the severely ill and chronically disabled are ordinarily excluded from employment. Death rates in the general population may be inappropriate for comparison if this effect is not taken into account. (Last)

Example … 'healthy worker effect'
• Question: the association between formaldehyde exposure and eye irritation.
• Subjects: factory workers exposed to formaldehyde.
• Bias: those who suffer most from eye irritation are likely to leave the job, at their own request or on medical advice.
• Result: the remaining workers are less affected, and the association is diluted.

Measurement bias
• Systematic error arising from inaccurate measurement (or classification) of subjects or study variables. (Last)
• Occurs when individual measurements or classifications of disease or exposure are inaccurate (i.e., they do not measure correctly what they are supposed to measure). (Beaglehole)
• Occurs if patients in one group stand a better chance of having their outcomes detected than those in another group. (Fletcher)

Measurement / (mis)classification
• Exposure misclassification occurs when exposed subjects are incorrectly classified as unexposed, or vice versa.
• Disease misclassification occurs when diseased subjects are incorrectly classified as non-diseased, or vice versa. (Norell)

Causes of misclassification
1. Measurement gap – a gap between the measured and the true value of a variable:
   - observer / interviewer bias
   - recall bias
   - reporting bias
2. A gap between the theoretical and the empirical definition of exposure / disease.

Sources of misclassification
[Diagram: measurement results differ from the empirical definition because of measurement errors, and the empirical definition differs from the theoretical definition because of the gap between the two definitions.]

Example … 'gap between definitions'
Theoretical definition –
• Exposure: passive smoking – inhalation of tobacco smoke from other people's smoking.
• Disease: myocardial infarction – necrosis of the heart muscle tissue.
Empirical definition –
• Exposure: passive smoking – time spent with smokers (e.g., having smokers as room-mates).
• Disease: myocardial infarction – certain diagnostic criteria (chest pain, enzyme levels, signs on ECG).

Exposure misclassification – non-differential
• The misclassification does not differ between cases and non-cases.
• Generally leads to dilution of the effect, i.e., bias towards RR = 1 (no association).

Example … non-differential exposure misclassification
True classification (exposure: X-ray):
              Exposed   Unexposed    Total
  Diseased         40          80      120
  Total        10,000      40,000   50,000
  RR = (40/10,000) / (80/40,000) = 2

With non-differential misclassification:
              Exposed   Unexposed    Total
  Diseased         60          60      120
  Total        20,000      30,000   50,000
  RR = (60/20,000) / (60/30,000) = 1.5
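The dilution can be reproduced by pushing the true counts of the X-ray example through an imperfect exposure measure. The sensitivity and specificity below (80% and 90%) are assumed values chosen only to illustrate the direction of the bias; applying them identically to cases and non-cases is what makes the misclassification non-differential.

```python
def risk_ratio(a, n1, b, n0):
    """Risk ratio from cohort counts: (a/n1) / (b/n0)."""
    return (a / n1) / (b / n0)

# True counts (X-ray example above): RR = 2.
a, n1 = 40, 10_000   # diseased, total among the truly exposed
b, n0 = 80, 40_000   # diseased, total among the truly unexposed

se, sp = 0.80, 0.90  # assumed sensitivity/specificity of the exposure measure

# Expected counts after misclassification: the "observed exposed" group keeps
# a fraction se of the truly exposed and gains (1 - sp) of the truly unexposed.
a_obs, n1_obs = se * a + (1 - sp) * b, se * n1 + (1 - sp) * n0
b_obs, n0_obs = (1 - se) * a + sp * b, (1 - se) * n1 + sp * n0

print(round(risk_ratio(a, n1, b, n0), 2))                  # 2.0  (true)
print(round(risk_ratio(a_obs, n1_obs, b_obs, n0_obs), 2))  # ~1.58 (diluted toward 1)
```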
Exposure misclassification – differential
• The misclassification differs between cases and non-cases.
• Introduces a bias towards RR = 0 (a spurious negative / protective association) or towards RR = ∞ (a spuriously strong positive association).

Example … differential exposure misclassification
True classification (exposure: X-ray):
                Exposed   Unexposed    Total
  Diseased           40          80      120
  Not diseased    9,960      39,920   49,880
  Total          10,000      40,000   50,000
  RR = (40/10,000) / (80/40,000) = 2

With differential misclassification (exposure recorded accurately among cases, but misclassified among non-cases):
                Exposed   Unexposed    Total
  Diseased           40          80      120
  Not diseased   19,940      29,940   49,880
  Total          19,980      30,020   50,000
  RR = (40/19,980) / (80/30,020) = 0.75

Implications of differential exposure misclassification
• An improvement in the accuracy of exposure information in one group only (here, no misclassification among those who had breast cancer) actually reduced the accuracy of the results.
• Non-differential misclassification is 'better' than differential misclassification.
• So epidemiologists are more concerned with the comparability of information than with improving the accuracy of information.

Causes of differential exposure misclassification
• Recall bias: systematic error due to differences in the accuracy or completeness of recall to memory of past events or experiences. For example, patients suffering from MI are more likely than controls to recall and report 'lack of exercise' in the past.
• Measurement bias: e.g., analysis of Hb by different methods (cyanmethemoglobin and Sahli's) in cases and controls; or biochemical analysis of the two groups in two different laboratories that give consistently different results.
• Interviewer / observer bias: systematic error due to observer variation (failure of the observer to measure or identify a phenomenon correctly); e.g., in patients with thrombo-embolism, looking for a history of OCP use more aggressively.

Measurement bias in treatment effects
• Hawthorne effect: the effect (usually positive / beneficial) of being under study upon the persons being studied; their knowledge of being studied influences their behaviour.
• Placebo effect: the (usually, but not necessarily, beneficial) effect of the expectation that a regimen will have an effect, i.e., an effect due to the power of suggestion.
• The total effect of a treatment is the sum of spontaneous improvement (natural history), non-specific responses (Hawthorne and placebo effects), and the effects of the specific treatment.
[Figure: total improvement decomposed into natural history, Hawthorne effect, placebo effect, and the effect specific to treatment.]

Confounding
1. A situation in which the effects of two processes are not separated: the distortion of the apparent effect of an exposure on risk, brought about by its association with other factors that can influence the outcome.
2. A relationship between the effects of two or more causal factors as observed in a set of data, such that it is not logically possible to separate the contribution that any single causal factor has made to the effect. (Last)

Confounding
• When another exposure exists in the study population (besides the one being studied) and is associated both with the disease and with the exposure being studied. If this extraneous factor – itself a determinant of, or risk factor for, the health outcome – is unequally distributed between the exposure subgroups, it can lead to confounding. (Beaglehole)

Confounder … must be
1. A risk factor among the unexposed (itself a determinant of disease)
2. Associated with the exposure under study
3. Unequally distributed between the exposed and the unexposed groups

Examples … confounding
• SMOKING → LUNG CANCER, confounded by AGE: as age advances the chance of lung cancer increases, so age confounds if the average ages of the smoking and non-smoking groups are very different.
• COFFEE DRINKING → HEART DISEASE, confounded by SMOKING: smoking increases the risk of heart disease, and coffee drinkers are more likely to smoke.
• ALCOHOL INTAKE → MYOCARDIAL INFARCTION, confounded by SEX: men are more likely to consume alcohol than women, and men are more at risk of MI.

Examples … confounding
Crude analysis (exposure: alcohol):
              Exposed   Unexposed
  Diseased        140         100
  Total          30,000      30,000
  RR = (140/30,000) / (100/30,000) = 1.4

Stratified by sex:
              Exposed (M)   Exposed (F)   Unexposed (M)   Unexposed (F)
  Diseased          120            20              60              40
  Total          20,000        10,000          10,000          20,000
  RR (men)   = (120/20,000) / (60/10,000) = 1
  RR (women) = (20/10,000) / (40/20,000) = 1
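The alcohol-and-MI numbers above can be checked directly: the crude risk ratio suggests an association, and stratifying by sex makes it vanish. A minimal sketch using those counts (the `risk_ratio` helper is hypothetical):

```python
def risk_ratio(a, n1, b, n0):
    """Risk ratio from cohort counts: (a/n1) / (b/n0)."""
    return (a / n1) / (b / n0)

# Crude analysis: alcohol appears to raise the risk of MI.
print(risk_ratio(140, 30_000, 100, 30_000))  # 1.4

# Within each sex stratum, the association disappears.
print(risk_ratio(120, 20_000, 60, 10_000))   # men:   1.0
print(risk_ratio(20, 10_000, 40, 20_000))    # women: 1.0
```

Because both stratum-specific ratios are 1, the crude 1.4 is produced entirely by the unequal distribution of sex across the exposure groups.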
Example … multiple biases
• Study: is there an association between regular exercise and the risk of CHD?
• Methodology: employees of a plant were offered an exercise programme; some volunteered, others did not. Coronary events were detected by regular voluntary check-ups, including a careful history, an ECG, and a check of routine health records.
• Result: the group that exercised had lower CHD rates.

Biases operating
• Selection: volunteers might have had a lower initial risk (e.g., lower lipids).
• Measurement: the exercise group had a better chance of having a coronary event detected, since its members were likely to be examined more frequently.
• Confounding: if the exercise group smoked fewer cigarettes – smoking being a known risk factor for CHD.

Dealing with selection bias
Ideally, to judge the effect of an exposure / factor on the risk / prognosis of disease, we should compare groups with and without that factor, everything else being equal. But in real life 'everything else' is usually not equal.

Methods for controlling selection bias
During study design:
1. Randomization (illustrated in the sketch below)
2. Restriction
3. Matching
During analysis:
1. Stratification
2. Adjustment
   a) Simple / standardization
   b) Multiple / multivariate adjustment
   c) Best case / worst case analysis
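Randomization is listed first because it is the only method that also balances factors nobody thought to measure. The simulation below is a toy illustration of that claim, with a made-up cohort in which 30% of subjects are smokers; the proportions in the two arms come out close without the assignment ever looking at smoking status.

```python
import random

random.seed(1)

# Hypothetical cohort: 2,000 subjects, ~30% smokers. Smoking stands in for
# any extraneous factor, including ones the investigator never measured.
subjects = [{"smoker": random.random() < 0.30} for _ in range(2000)]

# Simple randomization: every subject has an equal chance of either arm.
for s in subjects:
    s["arm"] = random.choice(["treatment", "control"])

for arm in ("treatment", "control"):
    group = [s for s in subjects if s["arm"] == arm]
    pct = 100 * sum(s["smoker"] for s in group) / len(group)
    print(f"{arm}: n = {len(group)}, smokers = {pct:.1f}%")  # roughly equal
```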
Restriction
• Subjects chosen for the study are restricted to those possessing a narrow range of characteristics, to equalize important extraneous factors.
• Limitation: generalizability is compromised; by excluding potential subjects, the cohorts / groups selected may be unusual and not representative of most patients or people with the condition.

Example … restriction
• Study: the effect of age on the prognosis of MI.
• Restriction: male / white / uncomplicated anterior-wall MI.
• Important extraneous factors controlled for: sex / race / severity of disease.
• Limitation: results are not generalizable to females, non-white people, or those with complicated MI.

Example … restriction
• OCP example: restrict the study to women having at least one child.
• Colorectal cancer example: restrict patients to a particular stage of Dukes' classification.

Matching – definition
• The process of making a study group and a comparison group comparable with respect to extraneous factors. (Last)
• For each patient in one group there are one or more patients in the comparison group with the same characteristics, except for the factor of interest. (Fletcher)

Types of matching
• Caliper matching: matching the comparison group to the study group within a specified distance for a continuous variable (e.g., matching age to within 2 years).
• Frequency matching: requiring that the frequency distributions of the matched variable(s) be similar in the study and comparison groups.
• Category matching: matching the groups in broad classes, such as relatively wide age ranges or occupational groups.

Types of matching … (contd.)
• Individual matching: identifying individual subjects for comparison, each resembling a study subject on the matched variable(s).
• Pair matching: individual matching in which the study and comparison subjects are paired. (Last)
• Matching is often done for age, sex, race, place of residence, severity of disease, rate of progression of disease, previous treatment received, etc.
• Limitations:
  - controls bias only for those factors involved in the match
  - it is usually not possible to match for more than a few factors, because of the practical difficulty of finding patients who meet all the matching criteria
  - if the categories for matching are relatively crude, there may be room for substantial differences between the matched groups

Example … matching
• Study: is sickle cell trait (HbAS) associated with defects in physical growth and cognitive development?
• Other potential biasing factors: race, sex, birth date, birth weight, gestational age, 5-min Apgar score, socio-economic status.
• Solution: matching – for each child with HbAS, a child with HbAA was selected who was similar with respect to the seven other factors (50 + 50 = 100).
• Result: no difference in growth and development.
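A small sketch of individual (pair) matching with an age caliper, in the spirit of the definitions above. The subjects are invented tuples of (id, sex, age), and the greedy first-match strategy is one simple choice among many, not a standard algorithm from any package.

```python
# Invented example data: (id, sex, age) for cases and candidate controls.
cases    = [(1, "F", 34), (2, "M", 52), (3, "F", 47)]
controls = [(10, "F", 35), (11, "M", 58), (12, "M", 51), (13, "F", 46), (14, "F", 60)]

CALIPER = 2  # years: "matching age to within 2 years", as in caliper matching

matched, used = [], set()
for cid, csex, cage in cases:
    for kid, ksex, kage in controls:
        # Pair matching on sex, caliper matching on age; each control used once.
        if kid not in used and ksex == csex and abs(kage - cage) <= CALIPER:
            matched.append((cid, kid))
            used.add(kid)
            break

print(matched)  # [(1, 10), (2, 12), (3, 13)] - one matched control per case
```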
Overmatching
A situation that may arise when groups are being matched. Several varieties:
1. The matching procedure partially or completely obscures evidence of a true causal association between the independent and dependent variables. Overmatching may occur if the matching variable is involved in, or is closely connected with, the mechanism whereby the independent variable affects the dependent variable. The matching variable may be an intermediate cause in the causal chain, or it may be strongly affected by, or a consequence of, such an intermediate cause.
2. The matching procedure uses one or more unnecessary matching variables, e.g., variables that have no causal effect or influence on the dependent variable and hence cannot confound the relationship between the independent and dependent variables.
3. The matching process is unduly elaborate, involving the use of numerous matching variables and / or insisting on very close similarity with respect to specific matching variables. This leads to difficulty in finding suitable controls. (Last)

Stratification
• The process, or the result, of separating a sample into several sub-samples according to specified criteria, such as age groups or socio-economic status. (Last)
• The effect of confounding variables may be controlled by stratifying the analysis of results.
• After the data are collected, they can be analysed, and the results presented, according to subgroups of patients – strata – with similar characteristics. (Fletcher)

Example … stratification (Fletcher)
HOSPITAL A
  Pre-op risk   Patients   Deaths   %
  High               500       30   6
  Medium             400       16   4
  Low                300        2   0.67
  Total            1,200       48   4

HOSPITAL B
  Pre-op risk   Patients   Deaths   %
  High               400       24   6
  Medium             800       32   4
  Low              1,200        8   0.67
  Total            2,400       64   2.7
(The crude death rates differ, yet within every risk stratum the two hospitals are identical.)

Example … stratification
  Age stratum    Pinellas County (dead / total, rate)   Dade County (dead / total, rate)   Relative rate
  Birth–54 yrs        737 / 229,198       3.2               2,463 / 748,035      3.3            1.0
  ≥55 yrs           4,989 / 145,147      34.4               5,898 / 187,985     31.2            1.1
  Overall           5,726 / 374,665      15.3               8,332 / 935,047      8.9            1.7
(Rates per 1,000: the crude rates differ mainly because the two counties' age structures differ.)

Standardization
• A set of techniques used to remove, as far as possible, the effects of differences in age or other confounding variables when comparing two or more populations.
• The method uses a weighted average of rates specific for age, sex, or some other potentially confounding variable(s), according to some specified distribution of these variables. (Last)

Standard population
• A population in which the age and sex composition is known precisely, either as a result of a census or by arbitrary means – e.g., an imaginary population such as the "standard million", in which the age and sex composition is arbitrary. A standard population is used as the comparison group in the actuarial procedure of standardization of mortality rates (e.g., the Segi world population, the European standard population). (Last)

Types of standardization
• Direct: the specific rates in a study population are averaged, using as weights the distribution of a specified standard population. The standardized rate so obtained represents what the rate would have been in the study population if that population had had the same distribution as the standard population with respect to the variable(s) for which the adjustment was carried out.
• Indirect: used to compare study populations for which the specific rates are either statistically unstable or unknown. The specific rates are averaged, using as weights the distribution of the study population. The ratio of the crude rate for the study population to the weighted average so obtained is the standardized mortality (or morbidity) ratio, or SMR. (Last) [It represents what the rate would have been in the study population if that population had had the same specific rates as the standard population.]

Standardized mortality ratio (SMR)
SMR = (number of deaths observed in the study group or population / number of deaths expected if the study population had the same specific rates as the standard population) × 100

Example … direct standardization
  Age     Study pop.   Deaths   Rate (per 1,000)   Std. pop.   Expected deaths
  0            4,000       60        15.0              2,400          36
  1–4          4,500       20         4.4              9,600          42.24
  5–14         4,000       12         3.0             19,000          57
  15–19        5,000       15         3.0              9,000          27
  20–24        4,000       16         4.0              8,000          32
  25–34        8,000       25         3.1             14,000          43.4
  35–44        9,000       48         5.3             12,000          63.6
  45–54        8,000      100        12.5             11,000         137.5
  55–64        7,000      150        21.4              8,000         171.2
  Total       53,500      446         8.3             93,000         609.94
Crude rate = 446 / 53,500 = 8.3 per 1,000; standardized rate = 609.94 / 93,000 = 6.56 per 1,000.

Example … direct standardization
HOSPITAL A (observed)
  Pre-op risk   Patients   Deaths   %
  High               500       30   6
  Medium             400       16   4
  Low                300        2   0.67
  Total            1,200       48   4
STANDARD HOSPITAL (Hospital A's rates applied to a standard case mix)
  Pre-op risk   Patients   Rate (%)   Expected deaths
  High               400       6           24
  Medium             400       4           16
  Low                400       0.67         2.68
  Total            1,200                   42.68 (3.6%)
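Direct standardization is just a weighted average, so it is easy to verify. The sketch below recomputes the crude and standardized rates from the (study population, deaths, standard population) triples in the first table above; small differences from 6.56 arise only because the table rounds the age-specific rates before multiplying.

```python
# (study population, deaths, standard population) per age group,
# taken from the direct-standardization table above.
strata = [
    (4_000,  60,  2_400), (4_500,  20,  9_600), (4_000,  12, 19_000),
    (5_000,  15,  9_000), (4_000,  16,  8_000), (8_000,  25, 14_000),
    (9_000,  48, 12_000), (8_000, 100, 11_000), (7_000, 150,  8_000),
]

crude = 1000 * sum(d for _, d, _ in strata) / sum(p for p, _, _ in strata)

# Direct standardization: apply the study population's age-specific rates
# to the standard population's age distribution, then divide.
expected = sum((d / p) * std for p, d, std in strata)
standardized = 1000 * expected / sum(std for _, _, std in strata)

print(f"crude rate:        {crude:.2f} per 1,000")         # ~8.34
print(f"standardized rate: {standardized:.2f} per 1,000")  # ~6.57
```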
Stratification vs. standardization
• Standardization removes the effect of the extraneous factor.
• Stratification controls for the effect of the factor, but the effect can still be seen.
• For example, in the hospital example, standardization shows that patients have a similar prognosis in both hospitals; stratification also lets us see the mortality rates within the different risk strata.
• The distinction is similar to that between an age-standardized mortality rate and age-specific mortality rates.

Multivariate adjustment
• Simultaneously controlling the effects of many variables to determine the independent effect of one.
• Can select, from a large number of variables, a smaller subset that independently and significantly contributes to the overall variation in outcome, and can arrange the variables in order of the strength of their contribution.
• The only feasible way to deal with many variables at one time during the analysis phase.

Examples … multivariate adjustment
• CHD is the joint result of lipid abnormalities, hypertension, smoking, family history, diabetes, exercise, and personality type.
• Start with 2 × 2 tables, using one variable at a time.
• Then contingency tables, i.e., stratified analyses, examining the effect of one variable in the presence / absence of one or more other variables.

Example … multivariate adjustment
• Multivariable modelling, i.e., developing a mathematical expression of the effects of many variables taken together.
• Basic structure of a multivariate model:
  outcome variable = constant + (β1 × variable 1) + (β2 × variable 2) + …
• β1, β2, … are coefficients determined from the data; variable 1, variable 2, … are the predictor variables that might be related to the outcome.

Sensitivity analysis
• When data on important prognostic factors are not available, it is possible to estimate the potential effect on the study by assuming various degrees of maldistribution of the factors between the groups being compared, and seeing how that would affect the results.
• Best case / worst case analysis is a special type of sensitivity analysis, assuming the best and the worst possible maldistribution.

Example … best / worst case analysis
• Study: the effect of gastro-gastrostomy on morbid obesity.
• Subjects: a cohort of 123 morbidly obese patients who underwent gastro-gastrostomy, followed 19 to 47 months after surgery.
• Success: losing >30% of excess weight.
• Follow-up: 103 (84%) patients; 20 patients lost to follow-up.

Example … (contd.)
• Success rate: 60/103 (58%).
• Best case – all 20 lost to follow-up were "successes": best success rate = (60 + 20)/123 (65%).
• Worst case – all 20 lost to follow-up were "failures": worst success rate = 60/123 (49%).
• Result: the true success rate lies between 49% and 65%, and is probably closer to 58%, because the patients lost to follow-up are unlikely to have been all successes or all failures.
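The best case / worst case bounds above are simple arithmetic, reproduced here with the study's counts:

```python
n_enrolled = 123                       # patients who underwent surgery
n_followed = 103                       # successfully followed up
n_success  = 60                        # lost >30% of excess weight
n_lost     = n_enrolled - n_followed   # 20 lost to follow-up

observed   = n_success / n_followed              # among those actually followed
best_case  = (n_success + n_lost) / n_enrolled   # assume every dropout succeeded
worst_case = n_success / n_enrolled              # assume every dropout failed

print(f"observed:   {observed:.0%}")    # 58%
print(f"best case:  {best_case:.0%}")   # 65%
print(f"worst case: {worst_case:.0%}")  # 49%
```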
Randomization
• The only way to equalize all extraneous factors – 'everything else' – is to assign patients to groups randomly, so that each patient has an equal chance of falling into the exposed or the unexposed group.
• It equalizes even those factors we do not know about.
• But it is not always possible.

Overall strategy
• Except for randomization, all ways of dealing with extraneous differences between groups are effective only against the factors singled out for consideration.
• Ordinarily, one uses several methods, layered one upon another.

Example …
• Study: the effect of the presence of VPCs on survival after acute MI.
• Strategies:
  - Restriction: not too young / old; no unusual causes of infarction (e.g., mycotic aneurysm).
  - Matching: for age (an important prognostic factor, but not the factor under study).
  - Stratification: examine results separately for different strata of clinical severity.
  - Multivariate analysis: adjust crude rates for the effects of all other variables, except VPCs, taken together.

Dealing with measurement bias
1. Blinding – of the subject, the observer / interviewer, and the analyst.
2. Strict, standard definitions for exposure / disease / outcome.
3. Equal efforts to discover events in all the groups.

Controlling confounding
• Similar to controlling for selection bias.
• Use randomization, restriction, matching, stratification, standardization, multivariate analysis, etc.

Lead time bias
• Lead time is the period between the detection of a medical condition by screening and the time when it would ordinarily be diagnosed because the patient experiences symptoms and seeks medical care.
• As a result of screening, the patient will on average survive longer from the time of diagnosis than patients diagnosed otherwise, even if the treatment is not effective.
• It is not more 'survival time', but more 'disease time'.

How lead time affects survival time
[Figure: timelines from onset of disease to death for three scenarios – unscreened (diagnosis at symptoms), screened with ineffective early treatment (earlier diagnosis, same death, longer apparent survival after diagnosis), and screened with effective early treatment (earlier diagnosis and later death).]

Controlling lead time bias
• Compare a screened group of people with a control group, and compare age-specific mortality rates rather than survival times from the time of diagnosis.
• E.g., early diagnosis and treatment of colorectal cancer is effective, because the mortality rates of screened people are lower than those of a comparable group of unscreened people.
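A toy timeline makes the figure above concrete. All the times below are invented: onset at year 0, symptomatic diagnosis at year 4, death at year 7, and a screening test assumed to advance diagnosis by 2 years without changing the course of disease.

```python
# Invented patient timeline (years from disease onset).
onset, symptom_dx, death = 0.0, 4.0, 7.0
lead_time = 2.0                      # assumed earlier detection by screening
screen_dx = symptom_dx - lead_time   # diagnosis advanced, disease unchanged

# "Survival after diagnosis" grows by exactly the lead time...
print(death - symptom_dx)  # 3.0 years (unscreened)
print(death - screen_dx)   # 5.0 years (screened, ineffective treatment)

# ...yet the patient dies at the same moment either way, which is why
# age-specific mortality, not survival from diagnosis, is the fair comparison.
```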
Length time bias
• Can affect studies of screening.
• The proportion of slow-growing tumours among cancers diagnosed by screening programmes is greater than among those diagnosed during usual medical care.
• Slow-growing tumours are present for a longer period before they cause symptoms; fast-growing tumours are likely to cause symptoms leading to interval diagnosis.
• Screening therefore tends to find tumours with inherently better prognoses.

Compliance bias
• Compliant patients tend to have better prognoses regardless of the screening.
• If a study compares disease outcomes among volunteers for a screening programme with outcomes in a group of people who did not volunteer, better results in the volunteers might be due not to treatment but to factors related to compliance.
• Compliance bias and length-time bias can both be avoided by relying on RCTs.

Types of studies & related biases
• Prevalence study: uncertainty about temporal sequences; bias from studying 'old' / prevalent cases.
• Case-control study: selection bias in selecting cases / controls; measurement bias.
• Cohort study: susceptibility bias; survival cohort vs. true cohort; migration bias.
• Randomized controlled trial: consider the natural history of disease, the Hawthorne effect, the placebo effect, etc.; compliance problems; effects of co-interventions.

Random error
• The divergence, on the basis of chance alone, of an observation on a sample from the true population value.
• 'Random' because, on average, it is as likely to result in observed values falling on one side of the true value as on the other.
• Inherent in all observations.
• Can be minimized, but never avoided altogether.

Sources of random error
1. Individual biological variation
2. Measurement error
3. Sampling error (the part of the total error of estimation of a parameter caused by the random nature of the sample)

Sampling variation
• Because research must ordinarily be conducted on a sample of patients, and not on all patients with the condition under study, there is always a possibility that the particular sample in a study – even one selected in an unbiased way – might not be similar to the population of patients as a whole.

Sampling variation – definition
• Since the inclusion of individuals in a sample is determined by chance, the results of analyses on two or more samples will differ purely by chance. (Last)

Assessing the role of chance
1. Hypothesis testing
2. Estimation

Hypothesis testing
• Start with the null hypothesis (H0): the statistical hypothesis that one variable has no association with another variable or set of variables, or that two or more population distributions do not differ from one another.
• In simpler terms, the null hypothesis states that the results observed in a study, experiment, or test are no different from what might have occurred as a result of the operation of chance alone. (Last)

Statistical tests – errors (Fletcher)
                            TRUE DIFFERENCE
  Conclusion of test    Present (H0 false)    Absent (H0 true)
  Significant           Correct (power)       Type I (α) error
  (H0 rejected)
  Not significant       Type II (β) error     Correct
  (H0 accepted)

Statistical tests – errors
• Type I (α) error: rejecting a true null hypothesis, i.e., declaring that a difference exists when it does not.
• Type II (β) error: failing to reject a false null hypothesis, i.e., declaring that a difference does not exist when in fact it does.
• Power of a study: the ability of a study to demonstrate an association if one exists. Power = 1 − β.

p-value
• The probability of an α error.
• A quantitative estimate of the probability that the observed difference between the groups in the study could have happened by chance alone, assuming that there is no real difference between the groups. Equivalently: if there were no difference between the groups and the trial were repeated many times, the proportion of trials that would show a difference between the groups as big as, or bigger than, the one found in the study.

p-value – remember!
• Usually p < 0.05 is considered statistically significant (i.e., a probability of less than 1 in 20 that the observed difference is due to chance).
• 0.05 is an arbitrary cut-off; it can change according to requirements.
• A statistically significant result might not be clinically significant, and vice versa.

Statistical significance vs. clinical significance
• A large RCT, GUSTO (41,021 patients with acute MI), compared streptokinase with tPA.
• Result: death rate at 30 days – streptokinase 7.2%, tPA 6.3% (p < 0.001).
• But about 100 patients need to be treated with tPA instead of streptokinase to prevent one death.
• And tPA is costly – roughly $250,000 per death averted. Is the difference clinically significant?
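The gap between statistical and clinical significance in GUSTO comes down to the absolute risk reduction and the number needed to treat, which follow directly from the two death rates:

```python
risk_sk  = 0.072  # 30-day death rate, streptokinase arm
risk_tpa = 0.063  # 30-day death rate, tPA arm

arr = risk_sk - risk_tpa  # absolute risk reduction
nnt = 1 / arr             # patients treated with tPA (instead of SK) per death averted

print(f"ARR: {arr:.1%}")  # 0.9%
print(f"NNT: {nnt:.0f}")  # ~111, i.e., roughly the 100 patients quoted above
```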
Estimation
• The effect size observed in a particular study is called the 'point estimate'.
• The true effect is unlikely to be exactly that observed in the study, because of random variation.
• Confidence interval (CI), usually 95%: a computed interval with a given probability, e.g., 95%, that the true value – such as a mean, proportion, or rate – is contained within the interval. (Last)

Confidence intervals (Fletcher)
• If the study is unbiased, there is a 95% chance that the interval includes the true effect size.
• The true value is likely to be close to the point estimate, less likely to be near the outer limits of the interval, and could (5 times out of 100) fall outside these limits altogether.
• The CI allows the reader to see the range of plausible values, and so to decide whether an effect size they regard as clinically meaningful is consistent with, or ruled out by, the data.

Multiple comparison problem
• If a number of comparisons are made (e.g., in a large study, the effect of treatment is assessed separately for each subgroup and for each outcome), 1 in 20 of these comparisons is likely to be statistically significant at the 0.05 level.
• "If you dredge the data sufficiently deeply, and sufficiently often, you will find something odd. Many of these bizarre findings will be due to chance … discoveries that were not initially postulated among the major objectives of the trial should be treated with extreme caution."

Dealing with random error
• Increase the sample size. The required sample size depends upon:
  - the level of statistical significance (α error)
  - the acceptable chance of missing a real effect (β error)
  - the magnitude of the effect under investigation
  - the amount of disease in the population
  - the relative sizes of the groups being compared
• The sample size actually used is usually a compromise between the ideal and logistic and financial considerations.

References
1. Fletcher RH, et al. Clinical Epidemiology: The Essentials. 3rd ed.
2. Beaglehole R, et al. Basic Epidemiology. WHO.
3. Last JM. A Dictionary of Epidemiology. 3rd ed.
4. Maxcy-Rosenau-Last. Public Health & Preventive Medicine. 14th ed.
5. Norell SE. Workbook of Epidemiology.
6. Park K. Park's Textbook of Preventive and Social Medicine. 16th ed.