This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2008, The Johns Hopkins University and Brian Caffo. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Lecture 24
Brian Caffo
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Johns Hopkins University
December 19, 2007
Table of contents
1 Table of contents
2 Outline
3 Case-control methods
4 Rare disease assumption
5 Exact inference for the odds ratio
Outline
1 Odds ratios for retrospective studies
2 Odds ratios approximating the prospective RR
3 Exact inference for the odds ratio
Case-control methods
                 Smoker
Lung cancer    Yes     No   Total
Cases          688     21     709
Controls       650     59     709
Total         1338     80    1418
• Case status obtained from records
• Cannot estimate P(Case | Smoker)
• Can estimate P(Smoker | Case)
Continued
• Can estimate the odds ratio because
$$
\frac{\mathrm{Odds}(\text{case} \mid \text{smoker})}{\mathrm{Odds}(\text{case} \mid \text{smoker}^c)}
= \frac{\mathrm{Odds}(\text{smoker} \mid \text{case})}{\mathrm{Odds}(\text{smoker} \mid \text{case}^c)}
$$
Proof
C - case, S - smoker
$$
\frac{\mathrm{Odds}(\text{case} \mid \text{smoker})}{\mathrm{Odds}(\text{case} \mid \text{smoker}^c)}
= \frac{P(C \mid S)/P(\bar{C} \mid S)}{P(C \mid \bar{S})/P(\bar{C} \mid \bar{S})}
= \frac{P(C, S)/P(\bar{C}, S)}{P(C, \bar{S})/P(\bar{C}, \bar{S})}
= \frac{P(C, S)\,P(\bar{C}, \bar{S})}{P(C, \bar{S})\,P(\bar{C}, S)}
$$
Exchanging the roles of C and S leaves the last expression unchanged, which gives the result
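As a quick numerical check of the identity in the proof, the following minimal sketch computes the odds ratio both ways from a set of joint probabilities. The probabilities are hypothetical values chosen only for illustration, and the helper name `odds` is not from the lecture.

```python
# Hypothetical joint probabilities P(case status, smoking status); they sum to 1.
p = {
    ("case", "smoker"): 0.08,
    ("case", "nonsmoker"): 0.02,
    ("noncase", "smoker"): 0.40,
    ("noncase", "nonsmoker"): 0.50,
}


def odds(p_event, p_complement):
    """Odds of an event, given the probability of it and of its complement."""
    return p_event / p_complement


# Conditioning on smoking status. The common factor P(S) cancels inside each
# odds, so joint probabilities can stand in for the conditional ones.
or_case_given_smoking = (
    odds(p[("case", "smoker")], p[("noncase", "smoker")])
    / odds(p[("case", "nonsmoker")], p[("noncase", "nonsmoker")])
)

# Conditioning on case status instead.
or_smoking_given_case = (
    odds(p[("case", "smoker")], p[("case", "nonsmoker")])
    / odds(p[("noncase", "smoker")], p[("noncase", "nonsmoker")])
)

print(or_case_given_smoking, or_smoking_given_case)
```

Both expressions reduce to the same cross-product of joint probabilities, so the two printed values agree up to floating-point rounding.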
Notes
• Sample OR is $\frac{n_{11} n_{22}}{n_{12} n_{21}}$
• Sample OR is unchanged if a row or column is multiplied by a constant
• Invariant to transposing
• Is related to RR
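The invariance properties listed above can be checked directly on the lung cancer table. This is a small sketch; the function name `sample_odds_ratio` and the scaling constants 10 and 3 are illustrative choices, not part of the lecture.

```python
def sample_odds_ratio(table):
    """Cross-product ratio n11*n22 / (n12*n21) of a 2x2 table [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)


# Rows: cases, controls. Columns: smoker yes, smoker no.
table = [[688, 21], [650, 59]]

# Multiply the first row by 10 (as if ten times as many cases had been sampled).
row_scaled = [[10 * n for n in table[0]], table[1]]
# Multiply the first column by 3.
col_scaled = [[3 * row[0], row[1]] for row in table]
# Swap the roles of rows and columns.
transposed = [list(col) for col in zip(*table)]

base = sample_odds_ratio(table)
for name, t in [("row scaled", row_scaled), ("column scaled", col_scaled), ("transposed", transposed)]:
    print(name, sample_odds_ratio(t), "vs", base)
```

The row-scaling case is the one that matters for case-control sampling: choosing how many cases and controls to enroll rescales the rows but leaves the sample odds ratio alone.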
Notes continued
$$
\mathrm{OR}
= \frac{P(S \mid C)/P(\bar{S} \mid C)}{P(S \mid \bar{C})/P(\bar{S} \mid \bar{C})}
= \frac{P(C \mid S)/P(\bar{C} \mid S)}{P(C \mid \bar{S})/P(\bar{C} \mid \bar{S})}
= \frac{P(C \mid S)}{P(C \mid \bar{S})} \times \frac{P(\bar{C} \mid \bar{S})}{P(\bar{C} \mid S)}
= \mathrm{RR} \times \frac{1 - P(C \mid \bar{S})}{1 - P(C \mid S)}
$$
• OR approximates RR if $P(C \mid \bar{S})$ and $P(C \mid S)$ are small (or if they are nearly equal)
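To see how the correction factor behaves, the sketch below holds the relative risk at 2 and lets the baseline risk grow. The risk values are hypothetical, and `p1`, `p0` are shorthand introduced here for $P(C \mid S)$ and $P(C \mid \bar{S})$.

```python
def relative_risk(p1, p0):
    """RR = P(C | S) / P(C | not S)."""
    return p1 / p0


def odds_ratio(p1, p0):
    """OR = odds of C given S divided by odds of C given not S."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hypothetical (p1, p0) pairs, all with RR = 2 but increasingly common disease.
risk_pairs = [(0.002, 0.001), (0.02, 0.01), (0.2, 0.1), (0.8, 0.4)]

for p1, p0 in risk_pairs:
    rr = relative_risk(p1, p0)
    factor = (1 - p0) / (1 - p1)  # the term multiplying RR in the identity
    print(f"p1={p1}, p0={p0}: RR={rr:.3f}, OR={odds_ratio(p1, p0):.3f}, RR*factor={rr * factor:.3f}")
```

For the rarest pair the odds ratio is barely above 2, while for the most common pair it is 6, three times the relative risk.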
Rare disease assumption
              Disease
Exposure    Yes     No   Total
Yes           9      1      10
No            1    999    1000
Total        10   1000    1010
• Cross-sectional data
• $\hat{P}(D) = 10/1010 \approx .01$
• $\widehat{OR} = (9 \times 999)/(1 \times 1) = 8991$
• $\widehat{RR} = (9/10)/(1/1000) = 900$
• D is rare in the sample
• D is not rare among the exposed
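The arithmetic on this slide can be reproduced in a few lines. This is only a sketch of the calculation; the variable names are illustrative.

```python
# Counts from the cross-sectional table: rows are exposure, columns are disease.
exposed_d, exposed_nd = 9, 1
unexposed_d, unexposed_nd = 1, 999
total = exposed_d + exposed_nd + unexposed_d + unexposed_nd

p_disease = (exposed_d + unexposed_d) / total           # overall prevalence
risk_exposed = exposed_d / (exposed_d + exposed_nd)      # P(D | exposed)
risk_unexposed = unexposed_d / (unexposed_d + unexposed_nd)

or_hat = (exposed_d * unexposed_nd) / (exposed_nd * unexposed_d)
rr_hat = risk_exposed / risk_unexposed

print(f"P(D) = {p_disease:.3f}, P(D | exposed) = {risk_exposed:.2f}")
print(f"OR = {or_hat:.0f}, RR = {rr_hat:.0f}")
```

The disease is rare overall but has risk 0.9 among the exposed, which is why the odds ratio lands roughly ten times above the relative risk here.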
Notes
• OR = 1 implies no association
• OR > 1 positive association
• OR < 1 negative association
• For retrospective CC studies, OR can be interpreted prospectively
• For diseases that are rare among both the exposed and the unexposed, the OR approximates the RR
• Delta method SE for log OR is
$$
\sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}
$$
Example¹
                 Smoker
Lung cancer    Yes     No   Total
Cases          688     21     709
Controls       650     59     709
Total         1338     80    1418
• $\widehat{OR} = \frac{688 \times 59}{21 \times 650} = 3.0$
• $\widehat{SE}_{\log \widehat{OR}} = \sqrt{\frac{1}{688} + \frac{1}{650} + \frac{1}{21} + \frac{1}{59}} = .26$
• CI for the log OR: $\log(3.0) \pm 1.96 \times .26 = [.59, 1.61]$
• The estimated odds of lung cancer for smokers are 3 times the odds for non-smokers, with a 95% interval of [exp(.59), exp(1.61)] = [1.80, 5.00]
¹ Data from Agresti, Categorical Data Analysis, second edition
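The same calculation can be sketched in code. The variable names are illustrative, and 1.96 is used as the normal quantile exactly as on the slide.

```python
import math

# Lung cancer table: cases (smoker yes, no), controls (smoker yes, no).
n11, n12, n21, n22 = 688, 21, 650, 59

or_hat = (n11 * n22) / (n12 * n21)
se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)

z = 1.96  # normal quantile for a 95% interval
log_ci = (math.log(or_hat) - z * se_log_or, math.log(or_hat) + z * se_log_or)
or_ci = (math.exp(log_ci[0]), math.exp(log_ci[1]))

print(f"OR = {or_hat:.2f}, SE(log OR) = {se_log_or:.2f}")
print(f"95% CI for OR: [{or_ci[0]:.2f}, {or_ci[1]:.2f}]")
```

Because the code carries the unrounded odds ratio and standard error through to the end, its interval comes out slightly narrower than the slide's [1.80, 5.00], which starts from the rounded values 3.0 and .26.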
Exact inference for the OR
                 Smoker
Lung cancer    Yes     No   Total
Cases          688     21     709
Controls       650     59     709
Total         1338     80    1418
• X is the number of smokers among the cases
• Y is the number of smokers among the controls
• Calculate an exact CI for the odds ratio
• Have to eliminate a nuisance parameter
Notation
• $\mathrm{logit}(p) = \log\{p/(1 - p)\}$ is the log-odds
• Differences in logits are log-odds ratios
• $\mathrm{logit}\{P(\text{Smoker} \mid \text{Case})\} = \delta$
• $P(\text{Smoker} \mid \text{Case}) = e^{\delta}/(1 + e^{\delta})$
• $\mathrm{logit}\{P(\text{Smoker} \mid \text{Control})\} = \delta + \theta$
• $P(\text{Smoker} \mid \text{Control}) = e^{\delta+\theta}/(1 + e^{\delta+\theta})$
• $\theta$ is the log-odds ratio
• $\delta$ is the nuisance parameter
Notation
• X is binomial with $n_1$ trials and success probability $e^{\delta}/(1 + e^{\delta})$
• Y is binomial with $n_2$ trials and success probability $e^{\delta+\theta}/(1 + e^{\delta+\theta})$
$$
P(X = x)
= \binom{n_1}{x} \left(\frac{e^{\delta}}{1 + e^{\delta}}\right)^{x} \left(\frac{1}{1 + e^{\delta}}\right)^{n_1 - x}
= \binom{n_1}{x} e^{x\delta} \left(\frac{1}{1 + e^{\delta}}\right)^{n_1}
$$
$$
P(X = x) = \binom{n_1}{x} e^{x\delta} \left(\frac{1}{1 + e^{\delta}}\right)^{n_1}
$$
$$
P(Y = z - x) = \binom{n_2}{z - x} e^{(z-x)\delta + (z-x)\theta} \left(\frac{1}{1 + e^{\delta+\theta}}\right)^{n_2}
$$
$$
P(X + Y = z) = \sum_u P(X = u)\,P(Y = z - u)
$$
$$
P(X = x \mid X + Y = z) = \frac{P(X = x)\,P(Y = z - x)}{\sum_u P(X = u)\,P(Y = z - u)}
$$
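A numerical check suggests why conditioning on X + Y removes the nuisance parameter: evaluating the conditional probability above at two very different values of δ gives the same distribution. This is a minimal sketch assuming X and Y are independent; the sample sizes, z, θ and the two δ values are hypothetical, and the function names are introduced only for this illustration.

```python
from math import comb, exp


def binom_pmf(k, n, logit_p):
    """Binomial pmf in the form C(n, k) * e^(k * logit) * (1 + e^logit)^(-n)."""
    return comb(n, k) * exp(k * logit_p) * (1 + exp(logit_p)) ** (-n)


def conditional_pmf(x, z, n1, n2, delta, theta):
    """P(X = x | X + Y = z) for independent binomials with logits delta and delta + theta."""
    lo, hi = max(0, z - n2), min(n1, z)  # values of u with nonzero probability
    num = binom_pmf(x, n1, delta) * binom_pmf(z - x, n2, delta + theta)
    den = sum(
        binom_pmf(u, n1, delta) * binom_pmf(z - u, n2, delta + theta)
        for u in range(lo, hi + 1)
    )
    return num / den


# Hypothetical small example.
n1, n2, z, theta = 8, 10, 6, 0.7
support = range(max(0, z - n2), min(n1, z) + 1)

probs_a = [conditional_pmf(x, z, n1, n2, delta=-1.0, theta=theta) for x in support]
probs_b = [conditional_pmf(x, z, n1, n2, delta=2.0, theta=theta) for x in support]

for x, pa, pb in zip(support, probs_a, probs_b):
    print(x, round(pa, 6), round(pb, 6))
```

The two columns agree because every term in the numerator and denominator carries the same δ-dependent factor, which cancels in the ratio.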
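Non-central hypergeometric distribution
Substituting the binomial probabilities into $P(X = x \mid X + Y = z)$, the factors $e^{z\delta}$, $(1 + e^{\delta})^{-n_1}$ and $(1 + e^{\delta+\theta})^{-n_2}$ cancel, leaving weights proportional to $\binom{n_1}{x}\binom{n_2}{z-x} e^{(z-x)\theta}$. Dividing numerator and denominator by $e^{z\theta}$ gives
$$
P(X = x \mid X + Y = z; \theta)
= \frac{\binom{n_1}{x}\binom{n_2}{z-x}\, e^{-x\theta}}{\sum_u \binom{n_1}{u}\binom{n_2}{z-u}\, e^{-u\theta}}
$$
• $\theta$ is the log odds ratio (of smoking for controls relative to cases, since $\theta$ enters the control logit)
• This distribution is used to calculate exact hypothesis tests for $H_0 : \theta = \theta_0$
• Inverting exact tests yields exact confidence intervals for the odds ratio
• Simplifies to the hypergeometric distribution for $\theta = 0$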
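One way the test inversion could be carried out is sketched below. Several choices here are assumptions of the sketch rather than anything stated in the lecture: the distribution is parametrized directly by ψ, the odds ratio of smoking for cases relative to controls, so that the weight on x is ψ^x; the interval is equal-tailed, putting α/2 in each one-sided exact test; the limits are found by bisection on log ψ; and all function names are hypothetical.

```python
from math import exp, lgamma, log


def log_comb(n, k):
    """Log of the binomial coefficient C(n, k), via log-gamma to avoid overflow."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def nc_hypergeom(n1, n2, z, psi):
    """Support and probabilities of X given X + Y = z, weights C(n1,x) C(n2,z-x) psi**x."""
    support = list(range(max(0, z - n2), min(n1, z) + 1))
    log_w = [log_comb(n1, x) + log_comb(n2, z - x) + x * log(psi) for x in support]
    shift = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [exp(v - shift) for v in log_w]
    total = sum(w)
    return support, [v / total for v in w]


def tail(x_obs, n1, n2, z, psi, upper):
    """P(X >= x_obs) if upper is True, else P(X <= x_obs), given X + Y = z."""
    support, probs = nc_hypergeom(n1, n2, z, psi)
    return sum(p for x, p in zip(support, probs) if (x >= x_obs if upper else x <= x_obs))


def bisect_log(g, lo=1e-8, hi=1e8, iters=200):
    """Root of an increasing function g(psi), found by bisection on log(psi)."""
    a, b = log(lo), log(hi)
    for _ in range(iters):
        mid = (a + b) / 2
        if g(exp(mid)) < 0:
            a = mid
        else:
            b = mid
    return exp((a + b) / 2)


def exact_ci(x_obs, n1, n2, z, alpha=0.05):
    """Equal-tailed interval: odds ratios not rejected by either one-sided exact test."""
    # The upper tail increases with psi; the lower tail decreases with psi.
    lower = bisect_log(lambda psi: tail(x_obs, n1, n2, z, psi, True) - alpha / 2)
    upper = bisect_log(lambda psi: alpha / 2 - tail(x_obs, n1, n2, z, psi, False))
    return lower, upper


# Lung cancer table: 688 smokers among 709 cases, 650 among 709 controls.
ci_lower, ci_upper = exact_ci(x_obs=688, n1=709, n2=709, z=688 + 650)
print(ci_lower, ci_upper)
```

The sketch does not handle an observed count on the boundary of the support, where one of the limits would be 0 or infinite. At ψ = 1 the function `nc_hypergeom` returns the ordinary hypergeometric probabilities, matching the last bullet above.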