This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2008, The Johns Hopkins University and Brian Caffo. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided "AS IS"; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

Lecture 24

Brian Caffo
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University
December 19, 2007

Table of contents

1 Table of contents
2 Outline
3 Case-control methods
4 Rare disease assumption
5 Exact inference for the odds ratio

Outline

1 Odds ratios for retrospective studies
2 Odds ratios approximating the prospective RR
3 Exact inference for the odds ratio

Case-control methods

                          Smoker
                       Yes    No   Total
Lung cancer  Cases     688    21     709
             Controls  650    59     709
Total                 1338    80    1418

• Case status is obtained from records
• Cannot estimate P(Case | Smoker), because the numbers of cases and controls (709 each) are fixed by the design
• Can estimate P(Smoker | Case)

Continued

• Can estimate the odds ratio, because
$$\frac{\mathrm{Odds}(\text{case} \mid \text{smoker})}{\mathrm{Odds}(\text{case} \mid \text{smoker}^c)} = \frac{\mathrm{Odds}(\text{smoker} \mid \text{case})}{\mathrm{Odds}(\text{smoker} \mid \text{case}^c)}$$

Proof

Let C denote case and S denote smoker. Then
$$\frac{\mathrm{Odds}(\text{case} \mid \text{smoker})}{\mathrm{Odds}(\text{case} \mid \text{smoker}^c)} = \frac{P(C \mid S)/P(\bar C \mid S)}{P(C \mid \bar S)/P(\bar C \mid \bar S)} = \frac{P(C, S)/P(\bar C, S)}{P(C, \bar S)/P(\bar C, \bar S)} = \frac{P(C, S)\,P(\bar C, \bar S)}{P(C, \bar S)\,P(\bar C, S)}$$
The last expression is unchanged when C and S are exchanged, which gives the result.

Notes

• The sample OR is $\dfrac{n_{11} n_{22}}{n_{12} n_{21}}$
• The sample OR is unchanged if a row or column is multiplied by a constant
• It is invariant to transposing the table
• It is related to the RR

Notes continued

$$\mathrm{OR} = \frac{P(S \mid C)/P(\bar S \mid C)}{P(S \mid \bar C)/P(\bar S \mid \bar C)} = \frac{P(C \mid S)/P(\bar C \mid S)}{P(C \mid \bar S)/P(\bar C \mid \bar S)} = \frac{P(C \mid S)}{P(C \mid \bar S)} \cdot \frac{P(\bar C \mid \bar S)}{P(\bar C \mid S)} = \mathrm{RR} \times \frac{1 - P(C \mid \bar S)}{1 - P(C \mid S)}$$

• The OR approximates the RR if $P(C \mid \bar S)$ and $P(C \mid S)$ are both small (or if they are nearly equal)

Rare disease assumption

                    Disease
                   Yes     No   Total
Exposure  Yes        9      1      10
          No         1    999    1000
Total               10   1000    1010

• Cross-sectional data
• $\hat P(D) = 10/1010 \approx .01$
• $\widehat{\mathrm{OR}} = (9 \times 999)/(1 \times 1) = 8991$
• $\widehat{\mathrm{RR}} = (9/10)/(1/1000) = 900$
• D is rare in the sample
• D is not rare among the exposed (9 of 10 have D), so the OR far exceeds the RR

Notes

• OR = 1 implies no association
• OR > 1 implies a positive association
• OR < 1 implies a negative association
• For retrospective case-control studies, the OR can be interpreted prospectively
• For diseases that are rare among both the exposed and the unexposed, the OR approximates the RR
• The delta method SE for the log OR is
$$\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$
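The properties in these notes can be checked numerically. The Python sketch below is a minimal illustration, not code from the lecture; the helper names `odds_ratio`, `relative_risk`, `log_or_se` and `wald_ci` are hypothetical. It confirms that the sample OR survives row scaling, column scaling and transposition on the lung cancer table, reproduces OR = RR × (1 − P(D | unexposed))/(1 − P(D | exposed)) on the rare disease table, and forms the Wald interval exp{log OR ± 1.96 SE} from the delta method SE.

```python
import math


def odds_ratio(n11, n12, n21, n22):
    """Sample OR n11*n22 / (n12*n21) for the 2x2 table [[n11, n12], [n21, n22]]."""
    return (n11 * n22) / (n12 * n21)


def relative_risk(n11, n12, n21, n22):
    """Row-1 vs row-2 risk of the column-1 outcome; needs prospective or cross-sectional sampling."""
    return (n11 / (n11 + n12)) / (n21 / (n21 + n22))


def log_or_se(n11, n12, n21, n22):
    """Delta-method standard error of log(sample OR)."""
    return math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)


def wald_ci(n11, n12, n21, n22, z=1.96):
    """Wald interval for the OR: exponentiate log(OR) +/- z * SE."""
    log_or = math.log(odds_ratio(n11, n12, n21, n22))
    half_width = z * log_or_se(n11, n12, n21, n22)
    return math.exp(log_or - half_width), math.exp(log_or + half_width)


# Lung cancer table: rows = cases, controls; columns = smoker yes, no.
lung = (688, 21, 650, 59)
n11, n12, n21, n22 = lung
or_lung = odds_ratio(*lung)
or_row_scaled = odds_ratio(3 * n11, 3 * n12, n21, n22)  # multiply row 1 by 3
or_col_scaled = odds_ratio(n11, 7 * n12, n21, 7 * n22)  # multiply column 2 by 7
or_transposed = odds_ratio(n11, n21, n12, n22)  # swap rows and columns

# Rare disease table: rows = exposure yes, no; columns = disease yes, no.
rare = (9, 1, 1, 999)
or_rare = odds_ratio(*rare)  # 9 * 999 / (1 * 1)
rr_rare = relative_risk(*rare)  # (9/10) / (1/1000)
p_exp, p_unexp = 9 / 10, 1 / 1000  # P(D | exposed), P(D | unexposed)
or_via_rr = rr_rare * (1 - p_unexp) / (1 - p_exp)  # RR times the correction factor

print(or_lung, or_row_scaled, or_col_scaled, or_transposed)
print(or_rare, rr_rare, or_via_rr)
print(log_or_se(*lung), wald_ci(*lung))
```

Without intermediate rounding, the lung cancer table gives a Wald interval of roughly [1.79, 4.95]; the [1.80, 5.00] in the example below comes from rounding the OR to 3.0 and the SE to .26 first.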
Example

Lung cancer case-control data (table above): 688 of the 709 cases and 650 of the 709 controls are smokers.

• $\widehat{\mathrm{OR}} = \dfrac{688 \times 59}{21 \times 650} = 3.0$
• $\widehat{\mathrm{SE}}_{\log \widehat{\mathrm{OR}}} = \sqrt{\dfrac{1}{688} + \dfrac{1}{650} + \dfrac{1}{21} + \dfrac{1}{59}} = .26$
• CI $= \log(3.0) \pm 1.96 \times .26 = [.59, 1.61]$
• The estimated odds of lung cancer for smokers are 3 times the odds for non-smokers, with an interval of $[\exp(.59), \exp(1.61)] = [1.80, 5.00]$

1 Data from Agresti, Categorical Data Analysis, second edition

Exact inference for the OR

Same lung cancer data as above.

• X is the number of smokers among the $n_1 = 709$ cases (observed 688)
• Y is the number of smokers among the $n_2 = 709$ controls (observed 650)
• Goal: calculate an exact CI for the odds ratio
• To do so, we have to eliminate a nuisance parameter

Notation

• $\mathrm{logit}(p) = \log\{p/(1 - p)\}$ is the log-odds
• Differences in logits are log-odds ratios
• $\mathrm{logit}\{P(\text{Smoker} \mid \text{Control})\} = \delta$, so $P(\text{Smoker} \mid \text{Control}) = e^{\delta}/(1 + e^{\delta})$
• $\mathrm{logit}\{P(\text{Smoker} \mid \text{Case})\} = \delta + \theta$, so $P(\text{Smoker} \mid \text{Case}) = e^{\delta+\theta}/(1 + e^{\delta+\theta})$
• θ is the log-odds ratio (odds of smoking for cases relative to controls)
• δ is the nuisance parameter

Notation (continued)

• X is binomial with $n_1$ trials and success probability $e^{\delta+\theta}/(1 + e^{\delta+\theta})$
• Y is binomial with $n_2$ trials and success probability $e^{\delta}/(1 + e^{\delta})$

$$P(X = x) = \binom{n_1}{x} \left(\frac{e^{\delta+\theta}}{1 + e^{\delta+\theta}}\right)^{x} \left(\frac{1}{1 + e^{\delta+\theta}}\right)^{n_1 - x} = \binom{n_1}{x} e^{x(\delta+\theta)} \left(\frac{1}{1 + e^{\delta+\theta}}\right)^{n_1}$$

$$P(Y = z - x) = \binom{n_2}{z - x} e^{(z - x)\delta} \left(\frac{1}{1 + e^{\delta}}\right)^{n_2}$$

$$P(X + Y = z) = \sum_u P(X = u)\,P(Y = z - u)$$

$$P(X = x \mid X + Y = z) = \frac{P(X = x)\,P(Y = z - x)}{\sum_u P(X = u)\,P(Y = z - u)}$$

Each product $P(X = u)\,P(Y = z - u)$ equals $\binom{n_1}{u}\binom{n_2}{z-u} e^{u\theta}$ times $e^{z\delta}(1 + e^{\delta+\theta})^{-n_1}(1 + e^{\delta})^{-n_2}$. That second factor does not depend on u, so it cancels in the ratio and δ drops out.

Non-central hypergeometric distribution

$$P(X = x \mid X + Y = z; \theta) = \frac{\binom{n_1}{x}\binom{n_2}{z - x} e^{x\theta}}{\sum_u \binom{n_1}{u}\binom{n_2}{z - u} e^{u\theta}}$$

where u ranges over $\max(0, z - n_2) \le u \le \min(n_1, z)$.

• θ is the log odds ratio
• This distribution is used to calculate exact hypothesis tests for $H_0: \theta = \theta_0$
• Inverting exact tests yields exact confidence intervals for the odds ratio
• For θ = 0 it simplifies to the (central) hypergeometric distribution
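The conditioning argument can be checked and used numerically. The Python sketch below is a minimal illustration under stated assumptions: θ is taken to be the log odds ratio of smoking for cases relative to controls, so $P(\text{Smoker} \mid \text{Case}) = e^{\delta+\theta}/(1 + e^{\delta+\theta})$ and $P(\text{Smoker} \mid \text{Control}) = e^{\delta}/(1 + e^{\delta})$, and all function names are hypothetical. `conditional_from_binomials` builds $P(X = x \mid X + Y = z)$ directly from the two binomials, while `nc_hypergeom_pmf` uses the non-central hypergeometric form; the two agree for any δ, which is the nuisance parameter dropping out. For the interval, the sketch assumes one common way of inverting exact tests: each endpoint solves a one-sided tail probability equal to α/2, located by bisection over θ in [−20, 20].

```python
import math


def log_comb(n, k):
    """log C(n, k) via lgamma, so large tables do not overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def support(n1, n2, z):
    """Values X can take given X + Y = z."""
    return range(max(0, z - n2), min(n1, z) + 1)


def nc_hypergeom_pmf(theta, n1, n2, z):
    """P(X = x | X + Y = z; theta) for each x in the support, normalized in log space."""
    logw = {u: log_comb(n1, u) + log_comb(n2, z - u) + u * theta for u in support(n1, n2, z)}
    m = max(logw.values())
    w = {u: math.exp(v - m) for u, v in logw.items()}
    total = sum(w.values())
    return {u: wu / total for u, wu in w.items()}


def conditional_from_binomials(delta, theta, n1, n2, z):
    """The same conditional pmf built from the two binomials, with delta still present."""
    p1 = 1 / (1 + math.exp(-(delta + theta)))  # P(Smoker | Case), logit = delta + theta
    p2 = 1 / (1 + math.exp(-delta))  # P(Smoker | Control), logit = delta

    def binom_pmf(n, k, p):
        return math.comb(n, k) * p**k * (1 - p) ** (n - k)

    joint = {u: binom_pmf(n1, u, p1) * binom_pmf(n2, z - u, p2) for u in support(n1, n2, z)}
    total = sum(joint.values())
    return {u: j / total for u, j in joint.items()}


def upper_tail(x, theta, n1, n2, z):
    """P(X >= x | X + Y = z; theta), increasing in theta."""
    return sum(p for u, p in nc_hypergeom_pmf(theta, n1, n2, z).items() if u >= x)


def lower_tail(x, theta, n1, n2, z):
    """P(X <= x | X + Y = z; theta), decreasing in theta."""
    return sum(p for u, p in nc_hypergeom_pmf(theta, n1, n2, z).items() if u <= x)


def bisect(f, lo=-20.0, hi=20.0, iters=100):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_or_ci(x, n1, n2, z, alpha=0.05):
    """Exact CI for the OR by inverting two one-sided exact tests, each at level alpha/2."""
    theta_lo = bisect(lambda t: upper_tail(x, t, n1, n2, z) - alpha / 2)
    theta_hi = bisect(lambda t: lower_tail(x, t, n1, n2, z) - alpha / 2)
    return math.exp(theta_lo), math.exp(theta_hi)


# Lung cancer data: x = 688 smoking cases, n1 = n2 = 709, z = 688 + 650 = 1338 smokers.
p_one_sided = upper_tail(688, 0.0, 709, 709, 1338)  # H0: theta = 0, central hypergeometric
lo, hi = exact_or_ci(688, 709, 709, 1338)
print(f"one-sided exact p-value for theta = 0: {p_one_sided:.2e}")
print(f"exact 95% CI for the OR: [{lo:.2f}, {hi:.2f}]")
```

At θ = 0 the tail probability is the one-sided Fisher exact test, and the resulting exact interval can be compared with the Wald interval from the example.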