Relative Risks and Odds Ratios - John Snow's Cholera Investigations

Case Study - Relative Risk and Odds Ratio
John Snow’s Cholera Investigations
Population Information
• 2 Water Providers: Southwark & Vauxhall (S&V)
and Lambeth (L)
– S&V: Population: 267625 # Cholera Deaths: 3706
– L: Population: 171528 # Cholera Deaths: 411
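$$P(D \mid S\&V) = \frac{3706}{267625} = .013848 \qquad \text{odds}(D \mid S\&V) = \frac{.013848}{1 - .013848} = .014042$$
$$P(D \mid L) = \frac{411}{171528} = .002396 \qquad \text{odds}(D \mid L) = \frac{.002396}{1 - .002396} = .002402$$
Population (S&V/L):
$$RR = \frac{P(D \mid S\&V)}{P(D \mid L)} = \frac{.013848}{.002396} = 5.78 \qquad OR = \frac{\text{odds}(D \mid S\&V)}{\text{odds}(D \mid L)} = \frac{.014042}{.002402} = 5.85$$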
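The population arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the helper names `risk` and `odds` are illustrative and do not come from the original slides.

```python
# Population sizes and cholera deaths for the two water providers (from the slides)
SV_POP, SV_DEATHS = 267625, 3706   # Southwark & Vauxhall
L_POP, L_DEATHS = 171528, 411      # Lambeth


def risk(deaths: int, population: int) -> float:
    """P(D | provider): proportion of the provider's customers who died."""
    return deaths / population


def odds(p: float) -> float:
    """Convert a probability p into the odds p / (1 - p)."""
    return p / (1 - p)


p_sv = risk(SV_DEATHS, SV_POP)
p_l = risk(L_DEATHS, L_POP)

rr = p_sv / p_l                       # relative risk, S&V vs. Lambeth
odds_ratio = odds(p_sv) / odds(p_l)   # odds ratio, S&V vs. Lambeth

print(f"P(D|S&V) = {p_sv:.6f}   odds(D|S&V) = {odds(p_sv):.6f}")
print(f"P(D|L)   = {p_l:.6f}   odds(D|L)   = {odds(p_l):.6f}")
print(f"RR = {rr:.2f}   OR = {odds_ratio:.2f}")
```

Because both risks are small, the odds are close to the probabilities, which is why the OR (5.85) is only slightly larger than the RR (5.78).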
Sampling Distribution of RR & OR
• Goal: Obtain empirical sampling distributions of the
sample RR and OR, and observe the coverage rate of 95%
confidence intervals
• Process: Take independent random samples of sizes nSV
and nL from the 2 populations and observe XSV and XL
deaths in the samples. XSV and XL are approximately
distributed as Binomial random variables
(approximately, because we sample from finite, but very
large, populations):
$$X_{SV} \sim B(n_{SV},\ p_{SV} = 0.013848) \qquad X_L \sim B(n_L,\ p_L = 0.002396)$$
Binomial Distribution for Sample Counts
• Binomial “Experiment”
– Consists of n trials or observations
– Trials/observations are independent of one another
– Each trial/observation can end in one of two possible
outcomes often labelled “Success” and “Failure”
– The probability of success, p, is constant across
trials/observations
– Random variable, X, is the number of successes observed in
the n trials/observations.
• Binomial Distributions: Family of distributions for X,
indexed by Success probability (p) and number of
trials/observations (n). Notation: X~B(n,p)
Binomial Distributions and Sampling
• Problem when sampling from a finite population: the
probability of Success for each individual is altered by the
outcomes of the individuals observed before it.
• When the population is much larger than the sample
(say at least 20 times as large), the effect is minimal and
we say X is approximately binomial
• Obtaining probabilities:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n, \qquad \binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
Table C gives probabilities for various n and p. For p > 0.5,
look up 1-p instead; the tabled entry for k is then P(X = n-k).
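The pmf formula, the Table C shortcut for p > 0.5, and the 20-times rule on this slide can all be checked numerically. A minimal Python sketch, under these assumptions: the helper names are hypothetical, n = 10 and p = 0.7 are arbitrary, and the 20-times comparison uses the hypergeometric distribution (the exact model for sampling without replacement, not named in the slides) on a made-up population of 1000 with 100 successes.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k) for X ~ B(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k: int, n: int, N: int, K: int) -> float:
    """Exact P(X = k) when n of N individuals are drawn without replacement
    and K of the N are 'successes' (math.comb returns 0 for impossible k)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def max_pmf_gap(n: int, N: int, K: int) -> float:
    """Largest |exact - binomial| difference over k = 0, ..., n."""
    return max(abs(hypergeom_pmf(k, n, N, K) - binom_pmf(k, n, K / N))
               for k in range(n + 1))


# 1) The pmf sums to 1 (arbitrary illustration: n = 10, p = 0.7)
n, p = 10, 0.7
total = sum(binom_pmf(k, n, p) for k in range(n + 1))

# 2) Table C trick for p > 0.5: looking up k = 2 under 1 - p = 0.3
#    gives P(X = n - k) = P(X = 8) under the original p = 0.7
table_value = binom_pmf(2, n, 1 - p)
direct_value = binom_pmf(n - 2, n, p)

# 3) 20-times rule on a hypothetical population: N = 1000, K = 100
gap_20x = max_pmf_gap(50, 1000, 100)   # population 20 times the sample
gap_2x = max_pmf_gap(500, 1000, 100)   # population only 2 times the sample

# Population-to-sample ratios for the Simulation Example (nSV = nL = 5000)
ratio_sv, ratio_l = 267625 / 5000, 171528 / 5000

print(f"sum of pmf: {total:.6f}")
print(f"table lookup: {table_value:.4f}  direct: {direct_value:.4f}")
print(f"max pmf gap, 20x: {gap_20x:.4f}  2x: {gap_2x:.4f}")
print(f"N/n for S&V: {ratio_sv:.1f}, for Lambeth: {ratio_l:.1f}")
```

Both Snow populations are more than 20 times the sample size of 5000, so by the rule on this slide the binomial model should be a close approximation.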
Simulating Binomial RVs
• Select n and p
• Obtain n random numbers distributed uniformly
between 0 and 1 (any software package should have
built-in random number generator): U1,…,Un
• Let X be the number of Ui values that  p
• X~B(n,p)
• Finite population adjustments can be made by
“correcting” p after each draw
• Excel has a built-in tool:
– Tools --> Data Analysis --> Random Number Generation
--> Binomial, then fill in p and n
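The uniform-draw recipe and the finite-population correction can be sketched with Python's standard `random` module. This is a sketch, not the slides' own method code, and the function names are hypothetical. Because `random.random()` returns values in [0, 1), the sketch counts `U < p`, which for a continuous uniform has the same probability p as `U ≤ p` and never records a success when the corrected p reaches 0.

```python
import random


def binomial_by_uniforms(n: int, p: float, rng: random.Random) -> int:
    """Draw U1, ..., Un ~ Uniform(0, 1) and count how many fall below p.
    The count is distributed as B(n, p)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def finite_population_draw(n: int, N: int, K: int, rng: random.Random) -> int:
    """Sample n of N individuals (n <= N) without replacement, K of them
    'successes'. Before each draw, p is 'corrected' to
    (successes left) / (individuals left)."""
    successes = 0
    for i in range(n):
        p = (K - successes) / (N - i)   # corrected success probability
        if rng.random() < p:
            successes += 1
    return successes


rng = random.Random(2024)  # fixed seed so the run is reproducible
x_sv = binomial_by_uniforms(5000, 0.013848, rng)               # binomial model
x_sv_exact = finite_population_draw(5000, 267625, 3706, rng)   # finite population
print(x_sv, x_sv_exact)
```

Sampling the whole population (n = N) with the corrected p always returns exactly K, which is a quick sanity check on the correction.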
Simulation Example
• Simulate by taking samples of nSV=nL=5000 individuals
from each population of customers
• Generate XSV~B(5000,.013848) and XL~B(5000,.002396)
• Compute sample relative risk, ln(RR), odds ratio,
ln(OR), and estimated std. errors of ln(RR) and ln(OR)
• Obtain 95% CIs for RR and OR (based on ln(RR) and ln(OR))
• Repeat for a large number of samples (1000 samples)
• Obtain the empirical distribution of each statistic
• Obtain an indicator of whether the 95% CI for RR
contains the population RR (5.78) and whether the 95%
CI for OR contains the population OR (5.85)
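The simulation steps above can be sketched end to end in Python using only the standard library, with the standard-error formulas from the Computations slide. This is a hedged illustration: the seed, the function names, and the choice to redraw a sample whose death count is 0 (where ln(RR) and ln(OR) are undefined, very rare at these sample sizes) are assumptions, not part of the original study.

```python
import math
import random

P_SV, P_L = 0.013848, 0.002396                      # P(D | S&V), P(D | L)
RR_POP = P_SV / P_L                                  # population RR, about 5.78
OR_POP = (P_SV / (1 - P_SV)) / (P_L / (1 - P_L))     # population OR, about 5.85
N = 5000                                             # nSV = nL = 5000
Z = 1.96                                             # multiplier for a 95% CI


def binomial(n: int, p: float, rng: random.Random) -> int:
    """B(n, p) draw: count of n Uniform(0, 1) values below p."""
    return sum(1 for _ in range(n) if rng.random() < p)


def exp_ci(log_est: float, se: float) -> tuple[float, float]:
    """CI on the log scale, mapped back to the ratio scale with exp()."""
    return math.exp(log_est - Z * se), math.exp(log_est + Z * se)


def simulate(reps: int = 1000, seed: int = 1):
    """Return the sample RRs, sample ORs, and the two CI coverage rates."""
    rng = random.Random(seed)
    rrs, ors = [], []
    hits_rr = hits_or = 0
    while len(rrs) < reps:
        x_sv, x_l = binomial(N, P_SV, rng), binomial(N, P_L, rng)
        if x_sv == 0 or x_l == 0:
            continue  # assumption: redraw, since the logs are undefined
        p_sv, p_l = x_sv / N, x_l / N
        rr = p_sv / p_l
        or_ = x_sv * (N - x_l) / (x_l * (N - x_sv))
        se_rr = math.sqrt((1 - p_sv) / x_sv + (1 - p_l) / x_l)
        se_or = math.sqrt(1 / x_sv + 1 / (N - x_sv) + 1 / x_l + 1 / (N - x_l))
        lo, hi = exp_ci(math.log(rr), se_rr)
        hits_rr += lo <= RR_POP <= hi   # does the CI contain the population RR?
        lo, hi = exp_ci(math.log(or_), se_or)
        hits_or += lo <= OR_POP <= hi   # does the CI contain the population OR?
        rrs.append(rr)
        ors.append(or_)
    return rrs, ors, hits_rr / reps, hits_or / reps


if __name__ == "__main__":
    rrs, ors, cov_rr, cov_or = simulate()
    print(f"coverage of 95% CI for RR: {cov_rr:.3f}")
    print(f"coverage of 95% CI for OR: {cov_or:.3f}")
```

If the log-scale intervals behave as intended, both coverage rates should land near 0.95; the exact values depend on the seed. The returned `rrs` and `ors` lists are the empirical sampling distributions that the histograms summarize.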
Computations
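$$\hat{p}_{SV} = \frac{X_{SV}}{n_{SV}} = \frac{X_{SV}}{5000} \qquad \hat{p}_L = \frac{X_L}{n_L} = \frac{X_L}{5000}$$
$$RR = \frac{\hat{p}_{SV}}{\hat{p}_L} = \frac{X_{SV}}{X_L} \qquad OR = \frac{\text{odds}_{SV}}{\text{odds}_L} = \frac{X_{SV}\,(5000 - X_L)}{X_L\,(5000 - X_{SV})}$$
$$SE(\ln(RR)) = \sqrt{\frac{1 - \hat{p}_{SV}}{X_{SV}} + \frac{1 - \hat{p}_L}{X_L}}$$
$$SE(\ln(OR)) = \sqrt{\frac{1}{X_{SV}} + \frac{1}{5000 - X_{SV}} + \frac{1}{X_L} + \frac{1}{5000 - X_L}}$$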
95% CI for population ln(RR) : ln(RR)  1.96 SE (ln( RR ))
95% CI for population ln(OR) : ln(OR)  1.96 SE (ln( OR ))
Raise e  2.718... to the power of the lower and upper bounds
of CIs for ln(RR) and ln(OR) to get CIs for RR and OR
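For a single sample, these formulas and the back-transformation can be written as one small function. A minimal Python sketch; the function name `ratio_cis` is hypothetical, and the counts 69 and 12 in the example are illustrative values close to the expected counts 5000 × .013848 ≈ 69.2 and 5000 × .002396 ≈ 12.0, not data from the slides.

```python
import math


def ratio_cis(x_sv: int, x_l: int, n_sv: int = 5000, n_l: int = 5000, z: float = 1.96):
    """Sample RR and OR with 95% CIs built on the ln scale.

    Requires 0 < x_sv < n_sv and 0 < x_l < n_l so every log and fraction is defined.
    """
    p_sv, p_l = x_sv / n_sv, x_l / n_l
    rr = p_sv / p_l
    or_ = (x_sv * (n_l - x_l)) / (x_l * (n_sv - x_sv))

    se_ln_rr = math.sqrt((1 - p_sv) / x_sv + (1 - p_l) / x_l)
    se_ln_or = math.sqrt(1 / x_sv + 1 / (n_sv - x_sv) + 1 / x_l + 1 / (n_l - x_l))

    # ln(ratio) -/+ z * SE, then raise e to each bound
    ci_rr = tuple(math.exp(math.log(rr) + s * z * se_ln_rr) for s in (-1, 1))
    ci_or = tuple(math.exp(math.log(or_) + s * z * se_ln_or) for s in (-1, 1))
    return rr, ci_rr, or_, ci_or


rr, ci_rr, or_, ci_or = ratio_cis(69, 12)
print(f"RR = {rr:.2f}, 95% CI ({ci_rr[0]:.2f}, {ci_rr[1]:.2f})")
print(f"OR = {or_:.2f}, 95% CI ({ci_or[0]:.2f}, {ci_or[1]:.2f})")
```

Because the interval is symmetric on the ln scale, it is asymmetric on the ratio scale: the upper bound lies farther from the estimate than the lower bound does.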
Histogram of (Sample) Relative Risks
[Figure: histogram of sample relative risks; x-axis: RR, bins from 2.5 to 12.1; y-axis: Frequency]
Note that the distribution of Relative Risks is not normal
Histogram of Sample ln(RR)
[Figure: histogram of sample ln(RR) values; x-axis: ln(RR), bins from 1 to 3.2 plus "More"; y-axis: Frequency]
Note that the distribution of ln(RR) is approximately normal
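The contrast between the two histograms can be quantified with a skewness statistic (not used in the slides), which is 0 for a symmetric distribution such as the normal. A minimal Python sketch that repeats the simulation with the uniform-draw method and compares the sample skewness of RR and ln(RR); the seed, the function names, and skipping the rare samples with a zero death count are assumptions.

```python
import math
import random
import statistics


def skewness(values: list[float]) -> float:
    """Sample skewness: the average cubed z-score (0 for a symmetric sample)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd**3)


def simulate_rr(reps: int = 1000, n: int = 5000, p_sv: float = 0.013848,
                p_l: float = 0.002396, seed: int = 3) -> list[float]:
    """Sample relative risks from reps pairs of B(n, p) death counts."""
    rng = random.Random(seed)
    rrs = []
    while len(rrs) < reps:
        x_sv = sum(1 for _ in range(n) if rng.random() < p_sv)
        x_l = sum(1 for _ in range(n) if rng.random() < p_l)
        if x_sv > 0 and x_l > 0:        # assumption: skip rare zero counts
            rrs.append(x_sv / x_l)      # equal sample sizes, so RR = XSV / XL
    return rrs


if __name__ == "__main__":
    rrs = simulate_rr()
    print(f"skewness of RR:     {skewness(rrs):.2f}")
    print(f"skewness of ln(RR): {skewness([math.log(r) for r in rrs]):.2f}")
```

One would expect a clearly positive skewness for RR, driven by the small Lambeth count in the denominator, and a much smaller value for ln(RR), matching the shapes of the two histograms.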