The 2006 Summer Program in Applied Biostatistical & Epidemiological Methods
Nicholas P. Jewell
University of California, Berkeley
Ohio State University
July 10, 2006

Day 1: Definitions, Measures of Disease Incidence & Association

Course Outline
• Class meets from 8:30am to 12:15pm (break?)
• Labs meet from 5:30pm to 8pm (except Friday, when they stop at 7pm)

Rough Idea of Topics
• Day 1: Definitions, Measures of Disease Incidence and Association
• Day 2: Confounding, Interaction & Stratification Techniques
• Day 3: Regression Models, Logistic Regression and Maximum Likelihood
• Day 4: Confounding & Interaction in Logistic Regression Models, Model Building & Goodness of Fit
• Day 5: Matched Studies, Alternatives and Extensions to Logistic Regression

Nicholas P. Jewell © Copyright 2006, all rights reserved

Binary Outcome Data
Binary Outcome                          Explanatory Factors
Use of Mental Health Services in 2005   Costs of mental health visit, sex
Moved Residence in 2005                 Family size, family income
Low birthweight of newborn              Health insurance status of mother, marital status of mother
Vote Republican in 2004 election        Parental voting pattern, sex
Health insurance coverage               Place of birth, marital status
Employment status in 2005               Education level
Choice of transportation to work        Income

Issues Related to Application Area
• Study design
  • Randomized?
  • Causality/association
• Definition of binary outcome
• Extensions
  • Longitudinal observations
  • More than 2 categories
    • Ordered categories?

Other Issues
• Statistical Art in addition to Statistical Science
• Case studies
  • WCGS (CHD, men)
  • Coffee drinking and pancreatic cancer
  • Spontaneous abortion history and CHD (women)
  • Titanic

How do we Measure the Binary Outcome for Disease Occurrence?
• Incidence/prevalence
• Role of 'time'
  • Chronological time
  • Exposure time
  • Age
  • Number of contacts
• Incidence (time interval)
• Prevalence (time point or interval)
• Fractions: the incidence proportion is unitless

Incidence Proportion
Definition (D = 1, "yes"):
• Define the risk interval explicitly, including the time scale (calendar year 2005, year of age 55, first year after menopause, etc.)
• Be at risk at the beginning of the interval (define explicitly what 'at risk' means)
• Become an incident case during the interval
The incidence proportion is the fraction of the at-risk population who become cases (D). It is a cumulative measure.

Incidence Rate
• Introduces time at risk into our thinking:
$$\text{Incidence rate (time interval)} \approx \frac{\#D}{\text{cumulative time at risk}}$$
• Units are now time$^{-1}$
• The measure still applies to the whole interval (so it is still cumulative in that sense)
• Instantaneous incidence rate, the hazard function:
$$h(t) = \frac{dI(t)/dt}{1 - I(t)}$$
where I(t) is the incidence proportion over the time interval [0, t].

[Figure: Hazard function for Caucasian males in California in 1980]

[Figure: Survival function (1 - I(t)) for Caucasian males in California in 1980]

1991 US Infant Mortality
                      Mother's Marital Status
Infant Mortality      Unmarried     Married       Total
Death                    16,712      18,784      35,496
Live at 1 Year        1,197,142   2,878,421   4,075,563
Total                 1,213,854   2,897,205   4,111,059

1991 US Infant Mortality
A: Death in first year; B: Unmarried mother
• P(A & B) = 0.0041
• P(A) = 0.0086
• P(B) = 0.295
• P(A) × P(B) = 0.0086 × 0.295 = 0.0025
Since P(A & B) ≠ P(A) × P(B), A and B are not independent.
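As a quick arithmetic check, the probabilities on this slide can be recomputed directly from the table counts. This is a minimal Python sketch; the constant and function names are illustrative and not part of the course materials.

```python
# Recompute the joint and marginal probabilities from the 1991 US infant
# mortality counts. A = death in first year, B = unmarried mother.
DEATH_UNMARRIED = 16_712
DEATH_MARRIED = 18_784
ALIVE_UNMARRIED = 1_197_142
ALIVE_MARRIED = 2_878_421

def infant_mortality_probabilities():
    """Return n, P(A & B), P(A) and P(B) computed from the table counts."""
    n = DEATH_UNMARRIED + DEATH_MARRIED + ALIVE_UNMARRIED + ALIVE_MARRIED
    p_a_and_b = DEATH_UNMARRIED / n                  # death AND unmarried mother
    p_a = (DEATH_UNMARRIED + DEATH_MARRIED) / n      # death (row total)
    p_b = (DEATH_UNMARRIED + ALIVE_UNMARRIED) / n    # unmarried mother (column total)
    return n, p_a_and_b, p_a, p_b

n, p_ab, p_a, p_b = infant_mortality_probabilities()
print(f"n = {n:,}")
print(f"P(A & B) = {p_ab:.4f}")
print(f"P(A) = {p_a:.4f}, P(B) = {p_b:.3f}, P(A) x P(B) = {p_a * p_b:.4f}")
```

The joint probability is noticeably larger than the product of the marginals, which is the departure from independence the slide points to.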
Measures of Association: Relative Risk
$$RR = \frac{P(D \mid E)}{P(D \mid \text{not } E)}$$
• Relative measure
• RR = 1 ⇔ independence
• Note the upper bound: since $P(D \mid E) \le 1$, $RR \le 1/P(D \mid \text{not } E)$
• RR is not symmetric in the roles of D and E

Non-Symmetry of RR
$$RR_{E \to D} = \frac{P(D \mid E)}{P(D \mid \text{not } E)} \qquad RR_{D \to E} = \frac{P(E \mid D)}{P(E \mid \text{not } D)}$$

1991 US Infant Mortality (table as above)
$$RR \text{ (associated with unmarried mothers)} = \frac{16{,}712/1{,}213{,}854}{18{,}784/2{,}897{,}205} = 2.12$$

Measures of Association: Odds Ratio
$$OR = \frac{P(D \mid E)/P(\text{not } D \mid E)}{P(D \mid \text{not } E)/P(\text{not } D \mid \text{not } E)}$$
• Relative measure
• OR = 1 ⇔ independence
• No upper bound
• OR is symmetric in the roles of D and E

Symmetry of OR
$$OR_{E \to D} = \frac{P(D \mid E)/P(\text{not } D \mid E)}{P(D \mid \text{not } E)/P(\text{not } D \mid \text{not } E)} = \frac{[P(D \,\&\, E)/P(E)]\,/\,[P(\text{not } D \,\&\, E)/P(E)]}{[P(D \,\&\, \text{not } E)/P(\text{not } E)]\,/\,[P(\text{not } D \,\&\, \text{not } E)/P(\text{not } E)]} = \frac{P(D \,\&\, E)/P(\text{not } D \,\&\, E)}{P(D \,\&\, \text{not } E)/P(\text{not } D \,\&\, \text{not } E)}$$

Symmetry of OR
$$OR_{E \to D} = \frac{P(D \,\&\, E)/P(D \,\&\, \text{not } E)}{P(\text{not } D \,\&\, E)/P(\text{not } D \,\&\, \text{not } E)} = \frac{[P(D \,\&\, E)/P(D)]\,/\,[P(D \,\&\, \text{not } E)/P(D)]}{[P(\text{not } D \,\&\, E)/P(\text{not } D)]\,/\,[P(\text{not } D \,\&\, \text{not } E)/P(\text{not } D)]} = \frac{P(E \mid D)/P(\text{not } E \mid D)}{P(E \mid \text{not } D)/P(\text{not } E \mid \text{not } D)} = OR_{D \to E}$$

1991 US Infant Mortality (table as above)
OR (associated with unmarried mothers):
$$OR = \frac{(16{,}712/1{,}213{,}854)/(1{,}197{,}142/1{,}213{,}854)}{(18{,}784/2{,}897{,}205)/(2{,}878{,}421/2{,}897{,}205)} = \frac{(16{,}712/35{,}496)/(18{,}784/35{,}496)}{(1{,}197{,}142/4{,}075{,}563)/(2{,}878{,}421/4{,}075{,}563)} = 2.14$$

OR as Approximation to RR
$$OR = \frac{P(D \mid E)/P(\text{not } D \mid E)}{P(D \mid \text{not } E)/P(\text{not } D \mid \text{not } E)} = \frac{P(D \mid E)}{P(D \mid \text{not } E)} \times \frac{P(\text{not } D \mid \text{not } E)}{P(\text{not } D \mid E)} = RR \times \frac{P(\text{not } D \mid \text{not } E)}{P(\text{not } D \mid E)}$$
• The second factor is ≈ 1 when the disease is rare in both exposure groups, so then OR ≈ RR
• The second factor is ≥ 1 if RR ≥ 1, so OR ≥ RR (similarly, OR ≤ RR if RR ≤ 1)
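The symmetry claims and the factorization OR = RR × P(not D | not E)/P(not D | E) can be checked numerically on the infant mortality table. This is a minimal Python sketch with illustrative helper names; the cells use the layout a = D & E, b = not D & E, c = D & not E, d = not D & not E.

```python
# RR and OR in both "directions" for a 2 x 2 table, plus the identity
# OR = RR * P(not D | not E) / P(not D | E).
# Cells (illustrative naming): a = D & E, b = not D & E, c = D & not E, d = not D & not E.

def rr_e_to_d(a, b, c, d):
    """P(D | E) / P(D | not E)."""
    return (a / (a + b)) / (c / (c + d))

def rr_d_to_e(a, b, c, d):
    """P(E | D) / P(E | not D): the relative risk with the roles of D and E swapped."""
    return (a / (a + c)) / (b / (b + d))

def or_e_to_d(a, b, c, d):
    """Odds of D given E divided by odds of D given not E."""
    return (a / b) / (c / d)

def or_d_to_e(a, b, c, d):
    """Odds of E given D divided by odds of E given not D."""
    return (a / c) / (b / d)

def rare_disease_factor(a, b, c, d):
    """P(not D | not E) / P(not D | E); close to 1 when D is rare."""
    return (d / (c + d)) / (b / (a + b))

# 1991 US infant mortality: D = death in first year, E = unmarried mother
a, b, c, d = 16_712, 1_197_142, 18_784, 2_878_421

rr, rr_rev = rr_e_to_d(a, b, c, d), rr_d_to_e(a, b, c, d)
odds_ratio, odds_ratio_rev = or_e_to_d(a, b, c, d), or_d_to_e(a, b, c, d)
print(f"RR: {rr:.2f} (E->D) vs {rr_rev:.2f} (D->E)")
print(f"OR: {odds_ratio:.2f} (E->D) vs {odds_ratio_rev:.2f} (D->E)")
print(f"RR x factor = {rr * rare_disease_factor(a, b, c, d):.2f}")
```

The reversed relative risk comes out noticeably smaller than 2.12, while the two odds ratios agree up to floating-point error.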
Comparison of RR and OR at Various Risk Levels
Relative difference = (OR - RR)/RR.

P(D | not E) = 0.01
P(D | E)   RR      OR      Relative Difference
0.01       1.00    1.00    0
0.02       2.00    2.02    1%
0.05       5.00    5.21    4.2%
0.10       10.00   11.00   10%

P(D | not E) = 0.05
P(D | E)   RR      OR      Relative Difference
0.05       1.00    1.00    0
0.10       2.00    2.11    5.6%
0.15       3.00    3.35    11.8%
0.20       4.00    4.75    18.8%
0.50       10.00   19.00   90%

P(D | not E) = 0.10
P(D | E)   RR      OR      Relative Difference
0.10       1.00    1.00    0
0.15       1.50    1.59    5.9%
0.20       2.00    2.25    12.5%
0.30       3.00    3.86    28.6%
0.40       4.00    6.00    50%
0.50       5.00    9.00    80%

P(D | not E) = 0.20
P(D | E)   RR      OR      Relative Difference
0.20       1.00    1.00    0
0.30       1.50    1.71    14.3%
0.40       2.00    2.67    33.3%
0.50       2.50    4.00    60%
0.60       3.00    6.00    100%
0.80       4.00    16.00   300%
1.00       5.00    ∞       ∞

[Figure: RH (solid line), RR (dotted line), and OR (dash-dotted line) as the risk period extends in time]
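Because every entry in these comparison tables depends only on the two risks, the tables can be regenerated directly. This is a minimal Python sketch with illustrative helper names; P(D | E) = 1 is handled separately because the odds of D given E are then infinite.

```python
# Regenerate the RR-versus-OR comparison tables from the two risks alone.
# "Relative difference" is (OR - RR) / RR, as in the tables above.
import math

def rr_and_or(p_exposed, p_unexposed):
    """Return (RR, OR) for risks P(D | E) and P(D | not E)."""
    rr = p_exposed / p_unexposed
    if p_exposed == 1.0:
        # Odds of D given E are infinite when everyone exposed becomes a case
        return rr, math.inf
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return rr, odds_exposed / odds_unexposed

def comparison_rows(p_unexposed, exposed_risks):
    """One (P(D | E), RR, OR, relative difference) tuple per exposed risk."""
    rows = []
    for p_exposed in exposed_risks:
        rr, odds_ratio = rr_and_or(p_exposed, p_unexposed)
        rows.append((p_exposed, rr, odds_ratio, (odds_ratio - rr) / rr))
    return rows

TABLES = {
    0.01: [0.01, 0.02, 0.05, 0.10],
    0.05: [0.05, 0.10, 0.15, 0.20, 0.50],
    0.10: [0.10, 0.15, 0.20, 0.30, 0.40, 0.50],
    0.20: [0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 1.00],
}

for p_unexposed, risks in TABLES.items():
    print(f"P(D | not E) = {p_unexposed:.2f}")
    for p_exposed, rr, odds_ratio, rel in comparison_rows(p_unexposed, risks):
        print(f"  P(D | E) = {p_exposed:.2f}  RR = {rr:5.2f}  OR = {odds_ratio:6.2f}  rel. diff = {rel:.1%}")
```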
Measures of Association: Excess Risk
$$ER = P(D \mid E) - P(D \mid \text{not } E)$$
• Absolute comparison
• ER = 0 ⇔ independence
• ER is not symmetric in the roles of D and E

Measures of Association: Attributable Risk
$$AR = \frac{N P(D) - N P(D \mid \text{not } E)}{N P(D)} = \frac{P(D) - P(D \mid \text{not } E)}{P(D)} = \frac{P(E)(RR - 1)}{1 + P(E)(RR - 1)}$$
• N P(D) = number of cases with the current exposure distribution
• N P(D | not E) = number of cases with no exposure to E
• Population size = N

1991 US Infant Mortality (table as above)
AR (associated with unmarried mothers):
$$AR = \frac{35{,}496/4{,}111{,}059 - 18{,}784/2{,}897{,}205}{35{,}496/4{,}111{,}059} = \frac{(1{,}213{,}854/4{,}111{,}059)(2.12 - 1)}{1 + (1{,}213{,}854/4{,}111{,}059)(2.12 - 1)} = 0.25$$

Attributable Risk: Caution!
• Encourages a causal interpretation that may be incorrect
• Assumes that modifying E doesn't change other risk factors
"Baseball is 90% mental -- the other half is physical." (Yogi Berra)

Target Population, Study Population and Sample
Target Population → Study Population → Sample
Selection bias may occur when the Study Population differs from the Target Population.

Population-Based Study
• Need: a frame for the Study Population
• Take a simple random sample of size n
• Measure D and E on sampled individuals
• Can estimate:
  • Joint probabilities, e.g. P(D & E)
  • Marginal probabilities, e.g. P(D)
  • Conditional probabilities, e.g. P(D | E)
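The two expressions for AR given above can be compared numerically. This is a minimal Python sketch (variable names are illustrative) that computes ER and AR for the 1991 infant mortality table, with AR obtained both from its definition and from the P(E)(RR - 1) form.

```python
# Excess risk and attributable risk for the 1991 US infant mortality table,
# computing AR both directly and via the P(E)(RR - 1) formula.
# Cells: a = deaths (unmarried), b = survivors (unmarried),
#        c = deaths (married),   d = survivors (married)
a, b, c, d = 16_712, 1_197_142, 18_784, 2_878_421
n = a + b + c + d

p_d = (a + c) / n                 # P(D): overall first-year mortality
p_e = (a + b) / n                 # P(E): proportion of births to unmarried mothers
p_d_given_e = a / (a + b)         # P(D | E)
p_d_given_not_e = c / (c + d)     # P(D | not E)

er = p_d_given_e - p_d_given_not_e
rr = p_d_given_e / p_d_given_not_e

ar_direct = (p_d - p_d_given_not_e) / p_d
ar_formula = p_e * (rr - 1) / (1 + p_e * (rr - 1))

print(f"ER = {er:.4f}")
print(f"AR (direct)  = {ar_direct:.3f}")
print(f"AR (formula) = {ar_formula:.3f}")
```

The two AR values agree up to floating-point error, as the algebra requires.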
Marital Status & Birthweight (population-based sample; D = low birthweight, E = unmarried)
                           Birthweight
Marital Status at Birth    Low    Normal    Total
Unmarried                    7        52       59
Married                      7       134      141
Total                       14       186      200

• Joint probabilities: $\hat{P}(D \,\&\, E) = 7/200 = 0.035$
• Marginal probabilities: $\hat{P}(D) = 14/200 = 0.07$
• Conditional probabilities: $\hat{P}(D \mid E) = 7/59 = 0.119$, $\hat{P}(D \mid \text{not } E) = 7/141 = 0.050$
$$\widehat{RR} = \frac{0.119}{0.050} = 2.39, \quad \widehat{OR} = \frac{0.119/0.881}{0.050/0.950} = 2.58, \quad \widehat{ER} = 0.119 - 0.050 = 0.069, \quad \widehat{AR} = \frac{0.07 - 0.050}{0.07} = 0.29$$

Cohort Study
• Need: frames for the Exposed and Unexposed populations
• Take two (or more) simple random samples, of sizes $n_E$ and $n_{\text{not } E}$, separately from the exposed and unexposed populations, respectively
• Measure D on sampled individuals
• Can estimate some conditional probabilities, e.g. P(D | E)

Marital Status & Birthweight (cohort sample)
                           Birthweight
Marital Status at Birth    Low    Normal    Total
Unmarried                   12        88      100
Married                      5        95      100
Total                       17       183      200

• No joint probabilities
• No marginal probabilities
• Conditional probabilities: $\hat{P}(D \mid E) = 12/100 = 0.120$, $\hat{P}(D \mid \text{not } E) = 5/100 = 0.050$
$$\widehat{RR} = \frac{0.120}{0.050} = 2.40, \quad \widehat{OR} = \frac{0.120/0.880}{0.050/0.950} = 2.59, \quad \widehat{ER} = 0.120 - 0.050 = 0.070$$

Case-Control Study
• Need: frames for the Disease and No Disease populations
• Take two simple random samples, of sizes $n_D$ and $n_{\text{not } D}$, separately from the two case-status groups
• Measure E on sampled individuals
• Can estimate some conditional probabilities, e.g. P(E | D)

Marital Status & Birthweight (case-control sample)
                           Birthweight
Marital Status at Birth    Low    Normal    Total
Unmarried                   50        28       78
Married                     50        72      122
Total                      100       100      200

• No joint probabilities
• No marginal probabilities
• Conditional probabilities: $\hat{P}(E \mid D) = 50/100 = 0.500$
  $\hat{P}(E \mid \text{not } D) = 28/100 = 0.280$
$$\widehat{OR} = \frac{0.5/0.5}{0.28/0.72} = 2.57$$

Risk-Set (Density) Sampling
• For each incident case sampled at time t, select a random set of controls from those still at risk at t
• Note that a control sampled at time s might later be sampled as a case at time t
[Figure: time axis from 0 to T, with controls drawn from the risk set at a case time t]

Example: HSV-2 and Cervical Cancer
• Study population: 550,000 women with donations to serum banks in Finland, Norway, and Sweden
• Cervical cancer cases were identified over time and linked to serum bank data to identify HSV-2 status
• 3 random controls were chosen who were cancer-free at the time of diagnosis of each case
• Caution: HSV-2 status is measured at the time of donation rather than at the time of sampling

Standard Case-Control Sampling
Exposure   D                                 not D
E          $n_D P(E \mid D)$                 $n_{\text{not } D} P(E \mid \text{not } D)$
not E      $n_D P(\text{not } E \mid D)$     $n_{\text{not } D} P(\text{not } E \mid \text{not } D)$
Total      $n_D$                             $n_{\text{not } D}$
Cross-product ratio: $OR = OR_{E \to D} = OR_{D \to E}$

Risk-Set Sampling (at the time t of a case)
Exposure   D                                               not D
E          $N_E(t)\,h_E(t)$                                $m\,N_E(t)/[N_E(t) + N_{\text{not } E}(t)]$
not E      $N_{\text{not } E}(t)\,h_{\text{not } E}(t)$    $m\,N_{\text{not } E}(t)/[N_E(t) + N_{\text{not } E}(t)]$
Here $N_E(t)$ and $N_{\text{not } E}(t)$ are the numbers of exposed and unexposed individuals still at risk at t, $h_E(t)$ and $h_{\text{not } E}(t)$ are their hazards, and m controls are drawn from the risk set.
$$OR = \frac{h_E(t)}{h_{\text{not } E}(t)} = RH(t)$$

Case-Cohort Sampling
• Select cases as for traditional or risk-set sampling; select a random set of m "controls" from all those at risk at the beginning of the interval
• Note that a "control" might also be sampled as a case
[Figure: time axis from 0 to T; all controls drawn at time 0]

Example: Low-Fat Diet and Breast Cancer
• The Women's Health Trial randomly assigned 32,000 women (a high-risk group) to a low-fat intervention or a control group
• All women filled out food questionnaires, and gave blood samples, at regular intervals over 10 years
• All breast cancer cases had their food diaries and blood samples analyzed
• 10% of the original cohort was randomly selected to have their diaries and samples analyzed
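One deterministic way to see what each control-sampling scheme estimates is to work with expected counts instead of a random sample. This Python sketch uses a hypothetical population (P(E) = 0.3, P(D | E) = 0.2, P(D | not E) = 0.1, chosen only for illustration) and arbitrary sample sizes. Under standard case-control sampling the cross-product ratio equals the OR, while with a case-cohort subcohort it equals the RR.

```python
# Expected 2 x 2 counts under two sampling schemes for the same hypothetical
# population, showing which measure each cross-product ratio recovers.
# The population values below are illustrative assumptions, not study data.

def cross_product(a, b, c, d):
    """(a * d) / (b * c) with a = D&E, b = control&E, c = D&notE, d = control&notE."""
    return (a * d) / (b * c)

p_e = 0.30                # hypothetical P(E)
p_d_e = 0.20              # hypothetical P(D | E)
p_d_not_e = 0.10          # hypothetical P(D | not E)

p_d = p_d_e * p_e + p_d_not_e * (1 - p_e)
p_e_given_d = p_d_e * p_e / p_d                       # Bayes' theorem
p_e_given_not_d = (1 - p_d_e) * p_e / (1 - p_d)

n_cases, n_controls, m = 500, 500, 500                # arbitrary sample sizes

# Standard case-control: controls come from the non-diseased
cc = cross_product(n_cases * p_e_given_d, n_controls * p_e_given_not_d,
                   n_cases * (1 - p_e_given_d), n_controls * (1 - p_e_given_not_d))

# Case-cohort: "controls" are a random subcohort of everyone at risk at the start
ccoh = cross_product(n_cases * p_e_given_d, m * p_e,
                     n_cases * (1 - p_e_given_d), m * (1 - p_e))

true_rr = p_d_e / p_d_not_e
true_or = (p_d_e / (1 - p_d_e)) / (p_d_not_e / (1 - p_d_not_e))
print(f"true RR = {true_rr:.2f}, true OR = {true_or:.2f}")
print(f"case-control cross-product = {cc:.2f}")
print(f"case-cohort cross-product  = {ccoh:.2f}")
```

The sample sizes cancel from both ratios, so the choice of 500 is arbitrary.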
Case-Cohort Sampling
Exposure   D (cases)                        subcohort ("controls")
E          $n_D P(E \mid D)$                $m P(E)$
not E      $n_D P(\text{not } E \mid D)$    $m P(\text{not } E)$
Total      $n_D$                            $m$
Cross-product ratio: "OR" = RR

Case-Cohort Sampling: "OR" = RR
$$\text{"OR"} = \frac{n_D P(E \mid D)\; m P(\text{not } E)}{n_D P(\text{not } E \mid D)\; m P(E)} = \frac{P(E \mid D)/P(E)}{P(\text{not } E \mid D)/P(\text{not } E)} = \frac{P(D \mid E)/P(D)}{P(D \mid \text{not } E)/P(D)} = \frac{P(D \mid E)}{P(D \mid \text{not } E)} = RR$$
The third step uses Bayes' theorem: $P(E \mid D)/P(E) = P(D \mid E)/P(D)$, and likewise for not E.

Rare Disease Assumption for OR ≈ RR
• Standard case-control sampling: need the rare disease assumption
• Risk-set sampling: no rare disease assumption needed if the RH is of interest
• Case-cohort sampling: no rare disease assumption needed if the RR is of interest

2 × 2 Table Notation
                Disease Status
Exposure     D       not D    Total
E            a       b        a+b
not E        c       d        c+d
Total        a+c     b+d      n

Chi-Squared Test
Population-based study:
• Independence of D and E
• Look at the estimate of P(D & E) - P(D)P(E)
• Yields $(ad - bc)/n^2$
• Look at (ad - bc), or (ad - bc)², for simplicity
• The estimated variance of (ad - bc) is $(a+b)(a+c)(b+d)(c+d)/n$
• Yields
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$

Statistic for Assessing Independence
$$\hat{P}(D \,\&\, E) = a/n, \qquad \hat{P}(D) = (a+c)/n, \qquad \hat{P}(E) = (a+b)/n$$
$$\hat{P}(D \,\&\, E) - \hat{P}(D)\hat{P}(E) = \frac{a}{n} - \frac{(a+c)(a+b)}{n^2} = \frac{na - a^2 - ab - ac - bc}{n^2} = \frac{a^2 + ab + ac + ad - a^2 - ab - ac - bc}{n^2} = \frac{ad - bc}{n^2}$$
using n = a + b + c + d.

Population-Based Study (population-based birthweight sample above)
$$\chi^2 = \frac{200\,[(7 \times 134) - (7 \times 52)]^2}{59 \times 141 \times 14 \times 186} = 3.04, \qquad p = 0.08$$
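The identity derived above can be confirmed with exact rational arithmetic, and the statistic then applied to the population-based birthweight table. This is a minimal Python sketch using the standard-library `fractions` module; the function names are illustrative.

```python
# Verify P^(D & E) - P^(D) P^(E) = (ad - bc) / n^2 with exact rational arithmetic,
# then compute the chi-squared statistic n(ad - bc)^2 / [(a+b)(a+c)(b+d)(c+d)].
from fractions import Fraction

def independence_gap(a, b, c, d):
    """Exact value of P^(D & E) - P^(D) P^(E) for a 2 x 2 table."""
    n = a + b + c + d
    return Fraction(a, n) - Fraction(a + c, n) * Fraction(a + b, n)

def chi_squared(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

# Population-based birthweight sample: a = low & unmarried, b = normal & unmarried,
# c = low & married, d = normal & married
a, b, c, d = 7, 52, 7, 134
n = a + b + c + d

gap = independence_gap(a, b, c, d)
print(f"gap = {gap}, (ad - bc)/n^2 = {Fraction(a * d - b * c, n ** 2)}")
print(f"chi-squared = {chi_squared(a, b, c, d):.2f}")
```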
Cohort Study
• Look at the estimate of P(D | E) - P(D | not E)
• Yields $(a/n_1) - (c/n_2)$, where $n_1 = a + b$ and $n_2 = c + d$
• The estimated variance of $(a/n_1) - (c/n_2)$ is $\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$, with $\hat{p} = (a + c)/n$
• Yields
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$

Cohort Study (birthweight cohort sample above)
$$\chi^2 = \frac{200\,[(12 \times 95) - (5 \times 88)]^2}{100 \times 100 \times 17 \times 183} = 3.15, \qquad p = 0.08$$

Case-Control Study
• Look at the estimate of P(E | D) - P(E | not D)
• Yields $(a/n_1) - (b/n_2)$, where $n_1 = a + c$ and $n_2 = b + d$
• The estimated variance of $(a/n_1) - (b/n_2)$ is $\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$, with $\hat{p} = (a + b)/n$
• Yields the same statistic
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$

Case-Control Study (birthweight case-control sample above)
$$\chi^2 = \frac{200\,[(50 \times 72) - (50 \times 28)]^2}{78 \times 122 \times 100 \times 100} = 10.17, \qquad p = 0.001$$

Power Comparison
                 Population-Based   Cohort   Case-Control
χ² statistic     3.04               3.15     10.17
P-value          0.08               0.08     0.001

Power Comparison for Specific Population: Cohort vs. Population-Based
$$\hat{V} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \chi^2 = \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{V}}$$
For fixed $(p_1 - p_2)^2$, the factor $1/n_1 + 1/n_2$ is minimized, for fixed n, when $n_1 = n_2 = n/2$.
Here $n_1 = \#E$, $n_2 = \#\text{not } E$, $p_1 = P(D \mid E)$ and $p_2 = P(D \mid \text{not } E)$.

Power Comparison for Specific Population: Case-Control vs. Population-Based
$$\hat{V} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \chi^2 = \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{V}}$$
For fixed $(p_1 - p_2)^2$, the factor $1/n_1 + 1/n_2$ is minimized, for fixed n, when $n_1 = n_2 = n/2$.
Here $n_1 = \#D$, $n_2 = \#\text{not } D$, $p_1 = P(E \mid D)$ and $p_2 = P(E \mid \text{not } D)$.
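For a 1-degree-of-freedom χ² the p-value needs no statistics library, since P(χ²₁ > x) = P(|Z| > √x) = erfc(√(x/2)). This minimal Python sketch (illustrative names) recomputes the three birthweight statistics and their p-values; the case-control p-value comes out at about 0.0014.

```python
# p-values for the three birthweight chi-squared statistics. With 1 degree of
# freedom, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
import math

def chi_squared(a, b, c, d):
    """Pearson chi-squared for a 2 x 2 table (a = D&E, b = notD&E, c = D&notE, d = notD&notE)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

def p_value_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

designs = {
    "population-based": (7, 52, 7, 134),
    "cohort": (12, 88, 5, 95),
    "case-control": (50, 28, 50, 72),
}

results = {}
for name, cells in designs.items():
    x = chi_squared(*cells)
    results[name] = (x, p_value_1df(x))
    print(f"{name:17s} chi2 = {x:6.2f}  p = {results[name][1]:.4f}")
```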
Large-Sample Power Comparison
• Equal sample sizes of exposed and unexposed: the cohort design is more powerful than the population-based design
• Equal sample sizes of cases and controls: the case-control design is more powerful than the population-based design

Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
$$\hat{V} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \chi^2 = \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{V}}$$
For fixed $p_1 - p_2$, power depends on the size of
$$d = \frac{p_1 - p_2}{\sqrt{p(1 - p)}}$$
where $p = (p_1 + p_2)/2$ (because of equal sample sizes). d differs between the cohort and case-control designs (although the OR is fixed).

[Figure: d against p] d is biggest when $p = (p_1 + p_2)/2 = 0.5$.

Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
• When P(E) is closer to 0.5 than P(D), the case-control design has greater power than the cohort design, since then the average of P(E | D) and P(E | not D) is closer to 0.5 than the average of P(D | E) and P(D | not E)
• When P(D) is closer to 0.5 than P(E), the cohort design has greater power than the case-control design, since then the average of P(D | E) and P(D | not E) is closer to 0.5 than the average of P(E | D) and P(E | not D)

Rule of Thumb about Power/Precision
• Want both the exposure and disease marginals to be as balanced as possible, given a fixed total sample size
• For a fixed design, a larger sample still always gives greater power
• For example, suppose a fixed number of cases ($n_1$). Increasing the number of controls ($n_2$) still increases power, since $1/n_1 + 1/n_2$ will get smaller, but with diminishing returns

Fixed Number of Cases, Increasing Number of Controls
$$n_2 = k n_1, \qquad ssf(k) = \frac{1}{n_1} + \frac{1}{n_2} = \frac{n_1 + n_2}{n_1 n_2} = \frac{k + 1}{k n_1}, \qquad R = \frac{ssf(1)}{ssf(k)} = \frac{2k}{k + 1}$$
A bigger R means the χ² statistic gets bigger by the same factor.
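Two parts of this power argument can be checked against the birthweight samples. With equal group sizes and pooled proportion p = (p₁ + p₂)/2, the statistic is χ² = n d²/4, which reproduces the cohort and case-control values 3.15 and 10.17. With k controls per case the gain factor is R(k) = 2k/(k + 1), which approaches 2 as k grows. This is a minimal Python sketch with illustrative names.

```python
# Two pieces of the power argument, using the birthweight samples above.
# 1) With equal group sizes, chi2 = n * d**2 / 4, where
#    d = (p1 - p2) / sqrt(p * (1 - p)) and p = (p1 + p2) / 2.
# 2) With n2 = k * n1 controls per case, chi2 scales by R(k) = 2k / (k + 1).
import math

def d_statistic(p1, p2):
    """Standardized difference using the average of the two proportions."""
    p = (p1 + p2) / 2
    return (p1 - p2) / math.sqrt(p * (1 - p))

def chi2_equal_groups(p1, p2, n):
    """Chi-squared when n1 = n2 = n / 2 and the pooled proportion is (p1 + p2) / 2."""
    return n * d_statistic(p1, p2) ** 2 / 4

def relative_gain(k):
    """ssf(1) / ssf(k) for n2 = k * n1."""
    return 2 * k / (k + 1)

d_cohort = d_statistic(0.12, 0.05)        # P(D | E), P(D | not E)
d_case_control = d_statistic(0.50, 0.28)  # P(E | D), P(E | not D)
print(f"cohort:       d = {d_cohort:.3f}, chi2 = {chi2_equal_groups(0.12, 0.05, 200):.2f}")
print(f"case-control: d = {d_case_control:.3f}, chi2 = {chi2_equal_groups(0.50, 0.28, 200):.2f}")

for k in (1, 2, 3, 4, 5, 10):
    print(f"k = {k:2d}: R = {relative_gain(k):.2f}")
```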
How many more Controls than Cases?
[Figure: R plotted against k] The primary gain comes from going from k = 1 to k = 4.

2 × 2 Table Notation (as above)
$$\widehat{OR} = \frac{[a/(a+b)]/[b/(a+b)]}{[c/(c+d)]/[d/(c+d)]} = \frac{ad}{bc} = \frac{[a/(a+c)]/[c/(a+c)]}{[b/(b+d)]/[d/(b+d)]}$$

Cohort Study Example (Population OR = 1)
A typical study:
                   Disease Status
Exposure Status    D     Not D    Total
E                  8     42       50
not E              11    39       50
Total              19    81       100
$\widehat{OR} = \frac{8 \times 39}{11 \times 42} = 0.68$; $\chi^2 = 0.58$, p = 0.44

Cohort Study Example (Population OR = 1)
1,000 typical studies:
• Smallest OR estimate = 0.15
• Largest OR estimate = 7.58
• Average of OR estimates = 1.16 (bias)
• Median of OR estimates = 1

Sampling Distribution of Odds Ratio Estimate
[Figure: sampling distribution of the 1,000 OR estimates] Not Normal: skewed.

Cohort Study Example (Population OR = 1)
1,000 typical studies, log scale:
• Smallest log(OR) estimate = -1.90 = log(0.15)
• Largest log(OR) estimate = 2.03 = log(7.58)
• Average of log(OR) estimates = -0.011 (little bias)
• Median of log(OR) estimates = 0 = log(1)
(I always use natural logarithms.)

Sampling Distribution of Log Odds Ratio Estimate
[Figure: sampling distribution of the 1,000 log(OR) estimates]

Confidence Intervals for the Odds Ratio
(2 × 2 notation as above)
$$\widehat{OR} = \frac{ad}{bc}, \qquad \log(\widehat{OR}) = \log\left(\frac{ad}{bc}\right)$$
$$\widehat{\mathrm{var}}(\log \widehat{OR}) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$
95% CIs for log(OR) and OR:
$$\log \widehat{OR} \pm 1.96\sqrt{\widehat{\mathrm{var}}(\log \widehat{OR})}, \qquad \left(e^{\log \widehat{OR} - 1.96\sqrt{\widehat{\mathrm{var}}(\log \widehat{OR})}},\; e^{\log \widehat{OR} + 1.96\sqrt{\widehat{\mathrm{var}}(\log \widehat{OR})}}\right)$$

Case-Control Study of Pancreatic Cancer
                                  Coffee Drinking (cups/day)
Sex     Disease Status    0      1-2    3-4    5+     Total
Men     Case              9      94     53     60     216
        Control           32     119    74     82     307
Women   Case              11     59     53     28     151
        Control           56     152    80     48     336
Total                     108    424    260    218    1010

Case-Control Study of Pancreatic Cancer
                              Pancreatic Cancer
Coffee Drinking (cups/day)    Cases    Controls    Total
≥ 1                           347      555         902
0                             20       88          108
Total                         367      643         1010

$$\chi^2 = \frac{1010\,[(347 \times 88) - (555 \times 20)]^2}{902 \times 108 \times 367 \times 643} = 16.60$$
$$\widehat{OR} = \frac{347 \times 88}{555 \times 20} = 2.75, \qquad \log(\widehat{OR}) = \log(2.75) = 1.01$$
$$\widehat{\mathrm{var}}(\log \widehat{OR}) = \frac{1}{347} + \frac{1}{555} + \frac{1}{20} + \frac{1}{88} = 0.066$$
95% CI for log(OR): $1.01 \pm 1.96\sqrt{0.066} = (0.508, 1.516)$
95% CI for OR: $(e^{0.508}, e^{1.516}) = (1.66, 4.55)$

Estimate & Confidence Intervals for the Relative Risk
(2 × 2 notation as above)
$$\widehat{RR} = \frac{a/(a+b)}{c/(c+d)}, \qquad \log(\widehat{RR}) = \log\left(\frac{a/(a+b)}{c/(c+d)}\right), \qquad \widehat{\mathrm{var}}(\log \widehat{RR}) = \frac{b}{a(a+b)} + \frac{d}{c(c+d)}$$
95% CIs for log(RR) and RR:
$$\log \widehat{RR} \pm 1.96\sqrt{\widehat{\mathrm{var}}(\log \widehat{RR})}, \qquad \left(e^{\log \widehat{RR} - 1.96\sqrt{\widehat{\mathrm{var}}(\log \widehat{RR})}},\; e^{\log \widehat{RR} + 1.96\sqrt{\widehat{\mathrm{var}}(\log \widehat{RR})}}\right)$$

Western Collaborative Group Study
                 Occurrence of CHD
Behavior Type    Yes    No      Total
Type A           178    1411    1589
Type B           79     1486    1565
Total            257    2897    3154

$$\chi^2 = \frac{3154\,[(178 \times 1486) - (1411 \times 79)]^2}{1589 \times 1565 \times 257 \times 2897} = 39.9$$
$$\widehat{RR} = \frac{178/1589}{79/1565} = 2.22, \qquad \log(\widehat{RR}) = \log(2.22) = 0.797$$
$$\widehat{\mathrm{var}}(\log \widehat{RR}) = \frac{1411}{178 \times 1589} + \frac{1486}{79 \times 1565} = 0.017$$
95% CI for log(RR): $0.797 \pm 1.96\sqrt{0.017} = (0.542, 1.053)$
95% CI for RR: $(e^{0.542}, e^{1.053}) = (1.72, 2.87)$

Estimate & Confidence Intervals for the Excess Risk
(2 × 2 notation as above)
$$\widehat{ER} = \frac{a}{a+b} - \frac{c}{c+d}, \qquad \widehat{\mathrm{var}}(\widehat{ER}) = \frac{ab}{(a+b)^3} + \frac{cd}{(c+d)^3}$$
95% CIs for ER: $\widehat{ER} \pm 1.96\sqrt{\widehat{\mathrm{var}}(\widehat{ER})}$
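The three Wald-type intervals above (for log OR, log RR, and ER) can be written as small functions. This is a minimal Python sketch with illustrative function names and cells in the 2 × 2 notation above. It is applied to the collapsed pancreatic cancer table for the OR and to the WCGS table for the RR and ER, and the rounded outputs should match the intervals on the slides.

```python
# Wald-type 95% confidence intervals for log OR, log RR and ER.
# Cell layout: a = D&E, b = notD&E, c = D&notE, d = notD&notE.
import math

Z = 1.96

def or_ci(a, b, c, d):
    """OR estimate and CI, built on the log scale with var = 1/a + 1/b + 1/c + 1/d."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), (math.exp(log_or - Z * se), math.exp(log_or + Z * se))

def rr_ci(a, b, c, d):
    """RR estimate and CI, built on the log scale with var = b/(a(a+b)) + d/(c(c+d))."""
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    return math.exp(log_rr), (math.exp(log_rr - Z * se), math.exp(log_rr + Z * se))

def er_ci(a, b, c, d):
    """ER estimate and CI with var = ab/(a+b)^3 + cd/(c+d)^3."""
    er = a / (a + b) - c / (c + d)
    se = math.sqrt(a * b / (a + b) ** 3 + c * d / (c + d) ** 3)
    return er, (er - Z * se, er + Z * se)

# Pancreatic cancer: E = at least 1 cup/day, D = case
print(or_ci(347, 555, 20, 88))
# WCGS: E = Type A, D = CHD
print(rr_ci(178, 1411, 79, 1486))
print(er_ci(178, 1411, 79, 1486))
```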
Western Collaborative Group Study (table as above)
$$\widehat{ER} = \frac{178}{1589} - \frac{79}{1565} = 0.062, \qquad \widehat{\mathrm{var}}(\widehat{ER}) = \frac{178 \times 1411}{1589^3} + \frac{79 \times 1486}{1565^3} = 0.000093$$
95% CI for ER: $0.062 \pm 1.96\sqrt{0.000093} = (0.043, 0.080)$

Estimate & Confidence Intervals for the Attributable Risk: Population-Based Study
(2 × 2 notation as above)
$$\widehat{AR} = \frac{(a+c)/n - c/(c+d)}{(a+c)/n} = \frac{ad - bc}{(a+c)(c+d)}, \qquad \widehat{\mathrm{var}}\left(\log(1 - \widehat{AR})\right) = \frac{b + \widehat{AR}\,(a + d)}{nc}$$
95% CIs for log(1 - AR) and AR:
$$\log(1 - \widehat{AR}) \pm 1.96\sqrt{\widehat{\mathrm{var}}(\log(1 - \widehat{AR}))}$$
$$\left(1 - e^{\log(1 - \widehat{AR}) + 1.96\sqrt{\widehat{\mathrm{var}}(\log(1 - \widehat{AR}))}},\; 1 - e^{\log(1 - \widehat{AR}) - 1.96\sqrt{\widehat{\mathrm{var}}(\log(1 - \widehat{AR}))}}\right)$$

Western Collaborative Group Study (table as above)
$$\widehat{AR} = \frac{(178 \times 1486) - (1411 \times 79)}{1565 \times 257} = 0.38$$

Small Sample Adjustments
Odds ratio
• Estimate: $\widehat{OR}_{ss} = \frac{ad}{(b+1)(c+1)}$
• CIs: $(\log \widehat{OR})_{ss} = \log\left[\frac{(a+0.5)(d+0.5)}{(b+0.5)(c+0.5)}\right]$, with $\widehat{\mathrm{var}}(\log \widehat{OR})_{ss} = \frac{1}{a+0.5} + \frac{1}{b+0.5} + \frac{1}{c+0.5} + \frac{1}{d+0.5}$
• Exact tests/CIs
Relative risk
• Estimate: $\widehat{RR}_{ss} = \frac{a/(a+b)}{(c+1)/(c+d+1)}$

Case-Control Study of Pancreatic Cancer (≥ 1 vs. 0 cups/day, table as above)
$$\widehat{OR}_{ss} = \frac{347 \times 88}{556 \times 21} = 2.62, \qquad (\log \widehat{OR})_{ss} = \log\left(\frac{347.5 \times 88.5}{555.5 \times 20.5}\right) = 0.993$$
$$\widehat{\mathrm{var}}(\log \widehat{OR})_{ss} = \frac{1}{347.5} + \frac{1}{555.5} + \frac{1}{20.5} + \frac{1}{88.5} = 0.065$$
95% CI for log(OR): $0.993 \pm 1.96\sqrt{0.065} = (0.495, 1.492)$
95% CI for OR: $(e^{0.495}, e^{1.492}) = (1.64, 4.45)$
An exact 95% CI for OR is (1.64, 4.80).

Small Sample Ideas
• Be aware when you have entered "small sample world", where approximations may not be accurate and adjustments or exact methods may be required
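The attributable-risk interval and the small-sample odds ratio can be sketched the same way. In this minimal Python illustration (function names are illustrative), `ar_ci` builds the interval on the log(1 - AR) scale and back-transforms it, and `small_sample_or` uses the +1 and +0.5 corrections above. The exact interval (1.64, 4.80) requires exact methods, which are not attempted here.

```python
# Attributable risk with a log(1 - AR) interval, and the small-sample adjusted
# odds ratio. Cell layout: a = D&E, b = notD&E, c = D&notE, d = notD&notE.
import math

Z = 1.96

def ar_ci(a, b, c, d):
    """AR for a population-based 2 x 2 table, CI built on the log(1 - AR) scale."""
    n = a + b + c + d
    ar = (a * d - b * c) / ((a + c) * (c + d))
    se = math.sqrt((b + ar * (a + d)) / (n * c))
    centre = math.log(1 - ar)
    # 1 - exp(.) is decreasing, so the upper log limit gives the lower AR limit
    return ar, (1 - math.exp(centre + Z * se), 1 - math.exp(centre - Z * se))

def small_sample_or(a, b, c, d):
    """Point estimate ad / ((b + 1)(c + 1)) and a Wald CI using +0.5 cell corrections."""
    est = a * d / ((b + 1) * (c + 1))
    log_or = math.log((a + 0.5) * (d + 0.5) / ((b + 0.5) * (c + 0.5)))
    se = math.sqrt(sum(1 / (x + 0.5) for x in (a, b, c, d)))
    return est, (math.exp(log_or - Z * se), math.exp(log_or + Z * se))

# WCGS: E = Type A, D = CHD
print(ar_ci(178, 1411, 79, 1486))
# Pancreatic cancer case-control table: E = at least 1 cup/day, D = case
print(small_sample_or(347, 555, 20, 88))
```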