Advanced Epi
August 15-19th 2011, SACEMA
Matthew Fox
Boston University
Center for Global Health and Development
Department of Epidemiology
Health Economics and Epidemiology Research Office
mfox@bu.edu

Introductions
  Who are you?
  Where do you work/study?
  What do you study?

Welcome
  About me
  Week-long short course on epi methods
  2 sessions/day, each about 3 hours (depending)
  Assumes intro/intermediate epi and practical experience with epi and stats
  Mix of lecture and discussion
  Too much material: take good notes and go back to them
  Finish mid-day on Friday
  Course works if you read and participate

Course Overview
  Review basic epidemiologic principles
  Reinterpret them in a new light
  Think through problems/implications of what we learned in intro/intermediate epi
  Develop a causal framework(s) on which to hang our epidemiologic thinking
  Learn/apply advanced epi methods
  Modern Epidemiology III

Questions for Today
  What is epidemiology, and what is its goal?
  What are measures of association and measures of effect?
  What do these measures really mean?
  Which ones have causal meanings?
  What is the odds ratio really about?
  Why does everyone use it?

The goal of epidemiologic research
  Epidemiology is the study of:
    The distribution and determinants of disease in human populations and the application of that knowledge to the control of disease
  But the goal is:
    To obtain a valid and precise (and generalizable) estimate of the effect of an exposure on a disease
  Validity is the opposite of bias; precision is the opposite of random error
  Fundamentally concerned with measurement

Anyone remember Type I and Type II error? What are they?

Basic Statistics
                             Truth
                             Effect                  No effect
  Our study    Effect        Correct                 Type I error (alpha)
               No effect     Type II error (beta)    Correct

  Type I (alpha): if there is truly no effect, what is the chance we reject the null?
  Type II (beta): if there truly is an effect, what is the chance we fail to reject the null?

How do we know a particular epidemiologic finding is true?
  Find that the relative risk of exposure to vitamin # on cancer @ is 2.5, p=0.049
  Assume we did the perfect study
    No bias (confounding, selection, information)
    80% power, alpha = 0.05
  What is the chance there is really no effect of vitamins on cancer?
    i.e. the true relative risk is 1

Syphilis testing in the US
  In the US pre-2005, Massachusetts required a syphilis test before marriage
  Assume the test was 95% sensitive and 95% specific
  If I test positive, how likely is it that I truly have syphilis?
  The answer is that it depends

Syphilis (Se = 95%, Sp = 95%)
                  Truth +    Truth -    Total
  Test +             95        495        590
  Test -              5       9405       9410
  Total             100       9900     10,000

  Prevalence is 1%
  PPV = 95/590 = 16%

Back to our study
                             Truth
                             Effect                  No effect
  Our study    Effect        Correct                 Type I error (alpha)
               No effect     Type II error (beta)    Correct

  Alpha and beta use the TRUTH as the denominator and so are like Se and Sp
  Judging the "correctness" of a single study is the PPV, and depends on the prevalence of true hypotheses

Back to our study (alpha = 5%, so Sp = 95%; beta = 5%, so Se = 95%)
                  Truth +    Truth -    Total
  Our study +       950        450       1400
  Our study -        50       8550       8600
  Total            1000       9000     10,000

  Prevalence of true hypotheses is 10%
  950/1400 = 68% chance our study is right

Take home message:
  We need to critically examine the way we have been taught to design and interpret epidemiologic research

Review of basic concepts
  Study design, measures of disease frequency, measures of effect/association

The Source Population
  The population that gives rise to cases
  It is defined:
    In time and place
    With respect to population characteristics
    With respect to external influences (modifiers)
    Not as a sample of the general population

Cohorts
  Membership in a cohort requires that a person meet admissibility criteria
  Members have common admissibility-defining events
  Membership begins once the temporally last criterion is met
  Once a member, a person never leaves (membership is static or closed)
  A closed cohort adds
  no new members and loses members only to death; an open cohort keeps adding new members

Dynamic population
  Membership requires that a person satisfy the membership status criteria
  Members have common admissibility-defining characteristics
  Membership exists so long as all of the status criteria are satisfied
  A person can enter a dynamic population, leave it, and then re-enter

Cohorts vs. Dynamic Populations
  Cohort: in the Framingham heart study, the admissibility criterion is enrolling in the study in 1948. You never leave the cohort once you enroll.
  Dynamic population: we could instead have studied all residents of Framingham from 1948 onwards, the catchment population for a case registry there. Some will leave, new people will join.

STUDY DESIGN: How to harvest information from the base
  Census (cohort) or sample (case-control)
  Cases are valuable (information rich)
    In SE calculations, these drive your standard error
    Ex. SE(ln(RR)) = sqrt(1/A - 1/N1 + 1/B - 1/N0)
    Include all the cases in the population
  Information density of the population that gave rise to the cases is not great
    Can include all or sample
    Nearly all of the base's information is harvested when the sample of the base is a small multiple of the cases

Which is the best measure to assess causal effects?
  1) Risk Difference
  2) Risk Ratio
  3) Odds Ratio

In a case-control study, from what population do we sample controls?
  1) Those with disease
  2) Those without disease
  3) Everyone, regardless of whether they have the disease

Cohort Study vs. Case-control Study
  Kramer and Boivin 1987
  “We define a cohort study as a study in which subjects are followed forward from exposure to outcome… Inferential reasoning is from cause to effect.
  In case-control studies, the directionality is the reverse.
  Study subjects are investigated backwards from outcome to exposure, and the reasoning is from effect to cause.”

Cohort Study: Relative Risks
               Index (E+)    Reference (E-)
  Cases            A               B
  Non-cases        C               D
  Total            N1              N0

  Relative risk: risk in exposed / risk in unexposed = (A/N1) / (B/N0)
  Risk is number of cases / total at risk
  Numerator is the number of cases
  Denominator is cases and non-cases!

Cohort Concept
  [Figure: exposed (NE+) and unexposed (NE-) groups followed from t0 to t. Exposed: cases A, non-cases C = NE+ - A. Unexposed: cases B, non-cases D = NE- - B.]

Cohort Study: Relative Risks (continued)
  Relative risk (A/N1)/(B/N0) can be rearranged as (A/B)/(N1/N0)
  A/B is the ratio of exposed to unexposed cases
  N1/N0 is the ratio of exposed to unexposed in the population
  Relative risk has meaning: the average increase in risk produced by exposure

Case-control: Cases
  Members of the population who develop disease over the follow-up period
  Same cases as the analogous cohort study
  Case ascertainment is influenced by design
    Primary base: population defined first
    Secondary base: cases defined first

Case-control: Controls
  A sample of the population experience that gave rise to the cases
  3 options (paradigms):
    Un-diseased experience
    Population at risk at the beginning of the study
    Population experience over follow-up

               0 mos    6 mos    12 mos    18 mos    24 mos
  Cases           0        5       10        15        20
  Non-cases     100       95       90        85        80

Case-control Concept
  [Figure: the cohort diagram (exposed NE+ and unexposed NE- followed from t0 to t; cases A and B; non-cases C = NE+ - A and D = NE- - B) annotated with three control-sampling options. Option 1: Cumulative. Option 2: Case-cohort. Option 3: Density sampling.]

Case-control study
               Index    Reference
  Cases          A          B
  Controls       C          D

  Now we can't estimate the risks A/N1 and B/N0 because we don't know the denominators
  We are left with an odds ratio
  But how to interpret it?
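
A minimal numeric sketch of the cohort arithmetic above, assuming illustrative counts (400 cases among 1000 exposed, 100 cases among 1000 unexposed) and hypothetical function names. It checks that (A/N1)/(B/N0) and the rearranged form (A/B)/(N1/N0) agree, and applies the SE(ln(RR)) formula from the study design slide.

```python
import math


def relative_risk(a, n1, b, n0):
    """Risk in exposed divided by risk in unexposed: (A/N1) / (B/N0)."""
    return (a / n1) / (b / n0)


def relative_risk_rearranged(a, n1, b, n0):
    """Same quantity as case ratio over base ratio: (A/B) / (N1/N0)."""
    return (a / b) / (n1 / n0)


def se_log_rr(a, n1, b, n0):
    """Standard error of ln(RR): sqrt(1/A - 1/N1 + 1/B - 1/N0)."""
    return math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)


# Illustrative counts (an assumption for this sketch, not real study data).
A, N1 = 400, 1000   # exposed cases, exposed total
B, N0 = 100, 1000   # unexposed cases, unexposed total

rr = relative_risk(A, N1, B, N0)
rr_alt = relative_risk_rearranged(A, N1, B, N0)
se = se_log_rr(A, N1, B, N0)

print(f"RR = {rr:.2f}, rearranged RR = {rr_alt:.2f}, SE(ln RR) = {se:.4f}")
```

In the rearranged form, A/B comes from the cases alone, so the only other ingredient needed is the ratio N1/N0 of exposed to unexposed in the population that gave rise to the cases.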
2 ways to calculate an OR
               Index    Reference
  Cases          A          B
  Controls       C          D

  1) Cross product ratio: (A*D)/(B*C)
     Not particularly meaningful, but it works
  2) Case ratio / base ratio: (A/B) / (C/D)
     A/B is the ratio of exposed to unexposed cases
     C/D is the ratio of exposed to unexposed controls
     Remember back to the relative risk: here C/D fills in for N1/N0

The trohoc fallacy
  Full cohort:
               Index    Reference
  Cases         400        100
  Non-cases     600        900
  Total        1000       1000

  RR = (400/1000) / (100/1000) = 4.0

  Case-control study with a 10% sample of non-cases:
               Index    Reference
  Cases         400        100
  Non-cases      60         90
  Total        Not sampled

  OR = (400/60) / (100/90) = 6.0

  The trohoc fallacy is the idea that a case-control study is a cohort study done backwards (heteropalindrome)
  Under that view, a rare disease assumption is required for the odds ratio to approximate the relative risk

Case-control Concept
  [Figure: the cohort diagram (exposed NE+ and unexposed NE- followed from t0 to t; cases A and B; non-cases C = NE+ - A and D = NE- - B) with Option 1: Cumulative and Option 2: Case-cohort marked. Option 2 takes a 10% sample of the population that gave rise to the cases.]

The trohoc fallacy revealed
  Full cohort:
               Index    Reference
  Cases         400        100
  Non-cases     600        900
  Total        1000       1000

  RR = (400/1000) / (100/1000) = 4.0

  Case-control study with controls sampled from the total population:
               Index    Reference
  Cases         400        100
  Non-cases    Not sampled
  Controls      100        100

  OR = (400/100) / (100/100) = 4.0

  Sample the total population that gave rise to the cases (which includes the cases), not the undiseased at the end
  Cases can be their own controls if randomly sampled
  Requires no rare disease assumption

Miettinen on the trohoc fallacy
  “Consider the clinical trial: the concern is, as always, to contrast categories of treatment as to subsequent occurrence of some outcome phenomenon, whereas comparing different categories of the outcome as to the antecedent distribution of treatment is uninteresting if not downright perverse.”
  Preferred terms like “case-referent” and “case-base” studies, as “the base sample is no more a control series than a census of the base is”

Why it works
  OR = [A*D] / [B*C] = [A/B] / [C/D]
  If we sample 10% of the base then the odds ratio is:
  OR =
  [A/B] / [(10%*N1)/(10%*N0)] = [A/B] / (N1/N0) = RR

               Index    Ref
  Cases          A        B
  Non-cases      C        D
  Total          N1       N0

Who belongs in the study?
  Cohort studies exclude those who are not at risk for disease (though they don't need to).
  In a case-control study, should we exclude those not at risk for exposure?
  Ex. In a study of hormonal contraception and heart disease, should we exclude nuns?

With appropriate sampling, the odds ratio is interpreted as an estimate of the relative risk, which has meaning.
Case-control studies are cohort studies done efficiently, not cohort studies done backwards.

Measures of Disease Frequency
  Provide an estimate of the occurrence of disease in a population
  Typically we study first occurrence, as later occurrences are often affected by the first
  Incorporates:
    Disease state
    Time
    Population definition

Measures of Disease Frequency: Prevalence
  Proportion of the population with disease at a particular time
  Cross-sectional
  Reflects both the rate of disease occurrence and survival with disease

Measures of Disease Frequency: Cumulative Incidence (Simple)
  Proportion of a population that develops disease over a follow-up period
  Also called incidence proportion or risk
  Bounded by 0 and 1
  Time is not part of the measure but must be reported
  Difficult to measure in dynamic populations
  CI(t0,t) = I(t0,t)/N0

Measures of Disease Frequency: Incidence rate (density)
  Number of newly developed cases divided by accumulated person-time
  Time is part of the denominator
  Can be used in dynamic populations/cohorts
  Ignores the distinction between individuals (2/100 py could be 2 people followed 50 yrs each who both get the event, or 100 people followed 1 yr each of whom 2 get the event)
  $IR(t_0,t) = I(t_0,t) / \sum PT$, where $\sum PT = \sum_{i=1}^{N} \Delta t_i$ or $\sum PT = N \cdot \Delta t$

Measures of Disease Frequency: Rules for counting person-time
  Start disease free, free of history of disease at entry
  At risk for outcome?
    Not necessary, but wasteful
  Start after exposure is complete (not during) and after the minimum induction period
  Stop when disease occurs (date or midpoint)
  Stop if withdrawn (lost to follow-up, death from another cause, study ends, no longer at risk)
  Only those eligible to be counted in the numerator are in the denominator
  Ask: if they became a case, would I have counted them?

Person Time Issues I
  We conduct a cohort study of continuous smoking vs. no smoking and prostate cancer
  Enroll 1000 smokers and 1000 non-smokers
  At the end, we find 100 non-smokers became smokers. Should we exclude them?
  We can't, because if they had become cases while not smoking we would have included them

Person Time Issues II
  Study HAART regimens and death
  But there is much death and LTFU in the first 6 months, and we care about long-term mortality
  Exclude any deaths in the first 6 months
    OK if all we care about is long-term effects
  When should person-time start?
    Immortal person-time biases towards the null

[Figure: follow-up diagram with black triangle markers. Prevalence = 2/8 = 0.25. Cumulative incidence = 2/9. Incidence rate = 2/42, with person-time contributions of 5 from eight people and 2 from one person.]

Measure of Effect
  Comparison of the occurrence of an outcome in the same population at the same time under two different conditions
  Only one can be observed
  The second is "counterfactual" (we will come back to this)
  Theoretical, so we substitute a measure of association
    But as an approximation to the measure of effect

Measures of Association
  Comparison of incidence in 2+ populations
  Relative: comparison by division
    Null (no effect) is 1
    Log scale (distance from 0 to 1 is the same as from 1 to infinity)
  Difference: comparison by subtraction
    Null (no effect) is 0
    Distance above and below the null is equivalent

Calculations
  $RD = CI_{E+} - CI_{E-}$
  $IRD = IR_{E+} - IR_{E-}$
  $RR = CI_{E+} / CI_{E-}$
  $IRR = IR_{E+} / IR_{E-}$

Conclusion
  Objective is a VALID and PRECISE estimate of the effect of an exposure on an outcome
  Need to think critically about the logic of the methods we have been taught
  Make sure we understand how to validly design studies and how to correctly interpret study findings
  Odds ratios are odd
  Correct sampling means we can reduce our reliance on them
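
The trohoc example can be re-run as a short sketch. It uses the counts from the slides (400 and 100 cases, 1000 people in each exposure group, a 10% sampling fraction) and computes expected control counts instead of drawing a random sample, which is a simplifying assumption. The function and variable names are hypothetical.

```python
import math


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Case ratio divided by control ratio: (A/B) / (C/D)."""
    return (case_exp / case_unexp) / (ctrl_exp / ctrl_unexp)


# Counts from the trohoc fallacy slides.
A, B = 400, 100        # exposed and unexposed cases
N1, N0 = 1000, 1000    # exposed and unexposed at the start of follow-up
FRACTION = 0.10        # control sampling fraction

# Relative risk from the full cohort.
rr = (A / N1) / (B / N0)

# Cumulative sampling: controls are 10% of those still undiseased at the end.
cum_exp = FRACTION * (N1 - A)      # about 60 exposed controls
cum_unexp = FRACTION * (N0 - B)    # about 90 unexposed controls
or_cumulative = odds_ratio(A, B, cum_exp, cum_unexp)

# Case-cohort sampling: controls are 10% of everyone at risk at the start,
# including people who later become cases.
cc_exp = FRACTION * N1             # about 100 exposed controls
cc_unexp = FRACTION * N0           # about 100 unexposed controls
or_case_cohort = odds_ratio(A, B, cc_exp, cc_unexp)

print(f"RR from full cohort:          {rr:.2f}")
print(f"OR, cumulative sampling:      {or_cumulative:.2f}")
print(f"OR, case-cohort sampling:     {or_case_cohort:.2f}")
```

With these counts the cumulative-sampling odds ratio overstates the relative risk because the disease is common, while sampling the base reproduces it: the sampling fraction cancels in (10%*N1)/(10%*N0).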