E - University of California, Berkeley

The 2006 Summer Program in
Applied Biostatistical & Epidemiological Methods
Nicholas P. Jewell
University of California Berkeley
Ohio State University
July 10, 2006
Day 1: Definitions, Measures of
Disease Incidence & Association
Course Outline
 Class meets from 8:30am—12:15pm
 Break?
 Labs
 Meet 5:30—8pm (except Friday when it stops at 7pm)
 Rough Idea of Topics
 Day 1: Definitions, Measures of Disease Incidence and
Association
 Day 2: Confounding, Interaction & Stratification
Techniques
 Day 3: Regression Models, Logistic Regression and
Maximum Likelihood
 Day 4: Confounding & Interaction in Logistic Regression
Models, Model Building & Goodness of Fit
 Day 5: Matched Studies, Alternatives and Extensions to
Logistic Regression
Nicholas P. Jewell
© Copyright 2006, all rights reserved
Binary Outcome Data

Binary Outcome                          | Explanatory Factors
Use of Mental Health Services in 2005   | Costs of mental health visit, sex
Moved Residence in 2005                 | Family size, family income
Low birthweight of newborn              | Health insurance status of mother, marital status of mother
Vote Republican in 2004 election        | Parental voting pattern, sex
Health insurance coverage               | Place of birth, marital status
Employment status in 2005               | Education level
Choice of transportation to work        | Income
Issues Related to Application Area
 Study design
 Randomized?
 Causality/association
 Definition of binary outcome
 Extensions
 Longitudinal observations
 More than 2 categories
• Ordered categories?
Other Issues
 Statistical Art in addition to
Statistical Science
 Case studies
 WCGS (CHD--men)
 Coffee drinking and pancreatic cancer
 Spontaneous abortion history and CHD
(women)
 Titanic
How do we Measure the Binary
Outcome for Disease Occurrence?
 Incidence/prevalence
 Role of ‘time’
• Chronological time
• Exposure time
• Age
• Number of contacts
 Incidence (time interval)
 Prevalence (time point or interval)
 Fractions: Incidence Proportion
 unitless
Incidence Proportion
 Definition (D, =1, “yes”):
 Define risk interval explicitly including
time scale (calendar year 2005, year of
age 55, first year after menopause, etc)
 Be at risk at the beginning of the
interval (define explicitly what ‘at risk’
means)
 Become an incident case during interval
 Incidence proportion is the fraction of the at-risk population who are D
 Cumulative measure
Incidence Rate
 Introduces time at risk into our thinking:
Incidence Rate (time interval)
 “=“ #D/cum. time at risk
 Units are now time-1
 Still measure applies to whole interval (so still
cumulative in that sense)
 Instantaneous Incidence rate: Hazard Function
dI (t )
h(t )  dt
1  I (t )
I(t) is the Incidence Proportion over the time
interval [0,t]
Nicholas P. Jewell
© Copyright 2006, all rights reserved
8
Hazard Function for Caucasian
Males in California in 1980
Survival Function (1-I(t)) for Caucasian
Males in California in 1980
1991 US Infant Mortality

                   | Mother’s Marital Status
Infant Mortality   | Unmarried | Married   | Total
Death              | 16,712    | 18,784    | 35,496
Live at 1 Year     | 1,197,142 | 2,878,421 | 4,075,563
Total              | 1,213,854 | 2,897,205 | 4,111,059
1991 US Infant Mortality
 A: Death in First Year
B: Unmarried Mother
 P(A&B) = 0.0041
 P(A) = 0.0086
 P(B) = 0.295
 P(A)xP(B) = 0.0086 x 0.295 = 0.0025
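The probabilities on this slide can be reproduced directly from the table counts. The following is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Counts taken from the 1991 US infant mortality table.
deaths_unmarried = 16_712      # A & B: death in first year, unmarried mother
deaths_total = 35_496          # A: death in first year
unmarried_total = 1_213_854    # B: unmarried mother
n = 4_111_059                  # all live births in the table

p_a_and_b = deaths_unmarried / n
p_a = deaths_total / n
p_b = unmarried_total / n

print(f"P(A&B)    = {p_a_and_b:.4f}")
print(f"P(A)      = {p_a:.4f}")
print(f"P(B)      = {p_b:.3f}")
print(f"P(A)xP(B) = {p_a * p_b:.4f}")
```

Since P(A&B) is noticeably larger than P(A)xP(B), the two events do not look independent in these data.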
Measures of Association: Relative Risk
$$RR = \frac{P(D \mid E)}{P(D \mid \text{not } E)}$$
 Relative measure
 RR = 1 ⇔ Independence
 Note upper bound
 RR is not symmetric in roles of D and E
Non-Symmetry of RR
$$RR_{E \to D} = \frac{P(D \mid E)}{P(D \mid \text{not } E)} \neq \frac{P(E \mid D)}{P(E \mid \text{not } D)} = RR_{D \to E}$$
1991 US Infant Mortality

                   | Mother’s Marital Status
Infant Mortality   | Unmarried | Married   | Total
Death              | 16,712    | 18,784    | 35,496
Live at 1 Year     | 1,197,142 | 2,878,421 | 4,075,563
Total              | 1,213,854 | 2,897,205 | 4,111,059

$$RR \text{ (assoc. with unmarried)} = \frac{16{,}712 / 1{,}213{,}854}{18{,}784 / 2{,}897{,}205} = 2.12$$
Measures of Association: Odds Ratio
$$OR = \frac{P(D \mid E) / P(\text{not } D \mid E)}{P(D \mid \text{not } E) / P(\text{not } D \mid \text{not } E)}$$
 Relative measure
 OR = 1 ⇔ Independence
 No upper bound
 OR is symmetric in roles of D and E
Symmetry of OR
$$\begin{aligned}
OR_{E \to D} &= \frac{P(D \mid E) / P(\text{not } D \mid E)}{P(D \mid \text{not } E) / P(\text{not } D \mid \text{not } E)} \\
&= \frac{[P(D \,\&\, E)/P(E)] \,/\, [P(\text{not } D \,\&\, E)/P(E)]}{[P(D \,\&\, \text{not } E)/P(\text{not } E)] \,/\, [P(\text{not } D \,\&\, \text{not } E)/P(\text{not } E)]} \\
&= \frac{P(D \,\&\, E) / P(\text{not } D \,\&\, E)}{P(D \,\&\, \text{not } E) / P(\text{not } D \,\&\, \text{not } E)} \\
&= \frac{P(D \,\&\, E) / P(D \,\&\, \text{not } E)}{P(\text{not } D \,\&\, E) / P(\text{not } D \,\&\, \text{not } E)} \\
&= \frac{[P(D \,\&\, E)/P(D)] \,/\, [P(D \,\&\, \text{not } E)/P(D)]}{[P(\text{not } D \,\&\, E)/P(\text{not } D)] \,/\, [P(\text{not } D \,\&\, \text{not } E)/P(\text{not } D)]} \\
&= \frac{P(E \mid D) / P(\text{not } E \mid D)}{P(E \mid \text{not } D) / P(\text{not } E \mid \text{not } D)} = OR_{D \to E}
\end{aligned}$$
1991 US Infant Mortality

                   | Mother’s Marital Status
Infant Mortality   | Unmarried | Married   | Total
Death              | 16,712    | 18,784    | 35,496
Live at 1 Year     | 1,197,142 | 2,878,421 | 4,075,563
Total              | 1,213,854 | 2,897,205 | 4,111,059

$$OR \text{ (assoc. with unmarried)} = \frac{(16{,}712 / 1{,}213{,}854) / (1{,}197{,}142 / 1{,}213{,}854)}{(18{,}784 / 2{,}897{,}205) / (2{,}878{,}421 / 2{,}897{,}205)} = 2.14$$
$$= \frac{(16{,}712 / 35{,}496) / (18{,}784 / 35{,}496)}{(1{,}197{,}142 / 4{,}075{,}563) / (2{,}878{,}421 / 4{,}075{,}563)}$$

OR as Approximation to RR
$$\begin{aligned}
OR &= \frac{P(D \mid E) / P(\text{not } D \mid E)}{P(D \mid \text{not } E) / P(\text{not } D \mid \text{not } E)} \\
&= \frac{P(D \mid E)}{P(D \mid \text{not } E)} \times \frac{P(\text{not } D \mid \text{not } E)}{P(\text{not } D \mid E)} \\
&= RR \times \frac{P(\text{not } D \mid \text{not } E)}{P(\text{not } D \mid E)}
\end{aligned}$$
The second factor is ≥ 1 if RR ≥ 1
⇒ OR ≥ RR
Comparison of RR and OR at Various Risk Levels

P(D|not E) | P(D|E) | RR    | OR    | Relative Difference
0.01       | 0.01   | 1.00  | 1.00  | 0
0.01       | 0.02   | 2.00  | 2.02  | 1%
0.01       | 0.05   | 5.00  | 5.21  | 4.2%
0.01       | 0.10   | 10.00 | 11.00 | 10%
Comparison of RR and OR at Various Risk Levels

P(D|not E) | P(D|E) | RR    | OR    | Relative Difference
0.05       | 0.05   | 1.00  | 1.00  | 0
0.05       | 0.10   | 2.00  | 2.11  | 5.6%
0.05       | 0.15   | 3.00  | 3.35  | 11.8%
0.05       | 0.20   | 4.00  | 4.75  | 18.8%
0.05       | 0.50   | 10.00 | 19.00 | 90%
Comparison of RR and OR at Various Risk Levels

P(D|not E) | P(D|E) | RR   | OR   | Relative Difference
0.10       | 0.10   | 1.00 | 1.00 | 0
0.10       | 0.15   | 1.50 | 1.59 | 5.9%
0.10       | 0.20   | 2.00 | 2.25 | 12.5%
0.10       | 0.30   | 3.00 | 3.86 | 28.6%
0.10       | 0.40   | 4.00 | 6.00 | 50%
0.10       | 0.50   | 5.00 | 9.00 | 80%
Comparison of RR and OR at Various Risk Levels

P(D|not E) | P(D|E) | RR   | OR    | Relative Difference
0.20       | 0.20   | 1.00 | 1.00  | 0
0.20       | 0.30   | 1.50 | 1.71  | 14.3%
0.20       | 0.40   | 2.00 | 2.67  | 33.3%
0.20       | 0.50   | 2.50 | 4.00  | 60%
0.20       | 0.60   | 3.00 | 6.00  | 100%
0.20       | 0.80   | 4.00 | 16.00 | 300%
0.20       | 1.00   | 5.00 | ∞     | ∞
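The rows of these comparison tables follow mechanically from the definitions of RR and OR. Below is a small sketch that regenerates one table; the function name `compare_rr_or` and the choice of baseline are illustrative assumptions, not something the slides prescribe.

```python
import math

def compare_rr_or(p_unexposed, p_exposed):
    """Return (RR, OR, relative difference (OR - RR) / RR)."""
    rr = p_exposed / p_unexposed
    if p_exposed >= 1.0:
        # Odds are infinite when everyone exposed is diseased.
        return rr, math.inf, math.inf
    odds_exposed = p_exposed / (1.0 - p_exposed)
    odds_unexposed = p_unexposed / (1.0 - p_unexposed)
    odds_ratio = odds_exposed / odds_unexposed
    return rr, odds_ratio, (odds_ratio - rr) / rr

baseline = 0.20  # P(D | not E)
for p_e in (0.20, 0.30, 0.40, 0.50, 0.60, 0.80, 1.00):
    rr, odds_ratio, rel = compare_rr_or(baseline, p_e)
    print(f"{baseline:.2f}  {p_e:.2f}  {rr:5.2f}  {odds_ratio:6.2f}  {rel:7.1%}")
```

Changing `baseline` to 0.01, 0.05 or 0.10 gives the other tables; the relative difference grows with both the baseline risk and the size of RR.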
RH (solid line), RR (dotted line), OR (dashdotted line) as Risk Period extends in Time
Measures of Association: Excess Risk
$$ER = P(D \mid E) - P(D \mid \text{not } E)$$
 Absolute comparison
 ER = 0 ⇔ Independence
 ER is not symmetric in roles of D and E
Measures of Association: Attributable Risk
$$AR = \frac{N P(D) - N P(D \mid \text{not } E)}{N P(D)} = \frac{P(D) - P(D \mid \text{not } E)}{P(D)} = \frac{P(E)(RR - 1)}{1 + P(E)(RR - 1)}$$
Population size = N
N P(D) = Number of cases with current exposure distribution
N P(D | not E) = Number of cases with no exposure to E
1991 US Infant Mortality

                   | Mother’s Marital Status
Infant Mortality   | Unmarried | Married   | Total
Death              | 16,712    | 18,784    | 35,496
Live at 1 Year     | 1,197,142 | 2,878,421 | 4,075,563
Total              | 1,213,854 | 2,897,205 | 4,111,059

$$AR \text{ (assoc. with unmarried)} = \frac{(35{,}496 / 4{,}111{,}059) - (18{,}784 / 2{,}897{,}205)}{35{,}496 / 4{,}111{,}059} = 0.25$$
$$= \frac{(1{,}213{,}854 / 4{,}111{,}059) \times (2.12 - 1)}{1 + (1{,}213{,}854 / 4{,}111{,}059) \times (2.12 - 1)}$$
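All four measures of association introduced so far can be computed from the same 2 x 2 counts. The sketch below assumes the layout a = (D, E), b = (not D, E), c = (D, not E), d = (not D, not E); the function name is hypothetical.

```python
def association_measures(a, b, c, d):
    """RR, OR, ER and AR from a 2 x 2 table of counts.

    a: diseased & exposed        b: not diseased & exposed
    c: diseased & unexposed      d: not diseased & unexposed
    """
    n = a + b + c + d
    p_d_given_e = a / (a + b)
    p_d_given_not_e = c / (c + d)
    p_d = (a + c) / n
    p_e = (a + b) / n
    rr = p_d_given_e / p_d_given_not_e
    odds_ratio = (a * d) / (b * c)
    er = p_d_given_e - p_d_given_not_e
    ar = (p_d - p_d_given_not_e) / p_d
    # Equivalent form of AR in terms of P(E) and RR.
    ar_alt = p_e * (rr - 1) / (1 + p_e * (rr - 1))
    return {"RR": rr, "OR": odds_ratio, "ER": er, "AR": ar, "AR_alt": ar_alt}

# 1991 US infant mortality: exposure E = unmarried mother, D = death in first year.
m = association_measures(16_712, 1_197_142, 18_784, 2_878_421)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The two AR expressions agree exactly when computed from the same table, which is a useful check on the algebra.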
Attributable Risk—Caution!
 Encourages causal interpretation that
may be incorrect
 Assumes modification of E doesn’t
change other risk factors
 "Baseball is 90% mental -- the other
half is physical." (Yogi Berra)
Target Population, Study Population and Sample
Target Population
Study Population
Sample
Selection bias may occur when Target Population differs from Study Population
Population-Based Study
 Need: Frame for Study Population
 Take a simple random sample of size n
 Measure D and E on sampled individuals
 Can estimate
 Joint probabilities, e.g. P(D & E)
 Marginal probabilities, e.g. P(D)
 Conditional probabilities, e.g. P(D | E)
Marital Status & Birthweight

                        | Birthweight
Marital Status at Birth | Low | Normal | Total
Unmarried               | 7   | 52     | 59
Married                 | 7   | 134    | 141
Total                   | 14  | 186    | 200

Joint probabilities
$$P(D \,\&\, E) = \frac{7}{200} = 0.035$$
Marginal probabilities
$$P(D) = \frac{14}{200} = 0.07$$
Conditional probabilities
$$P(D \mid E) = \frac{7}{59} = 0.119, \qquad P(D \mid \text{not } E) = \frac{7}{141} = 0.050$$
$$\widehat{RR} = \frac{0.119}{0.050} = 2.39$$
$$\widehat{OR} = \frac{0.119 / 0.881}{0.050 / 0.950} = 2.58$$
$$\widehat{ER} = 0.119 - 0.050 = 0.069$$
$$\widehat{AR} = \frac{0.07 - 0.050}{0.07} = 0.29$$
Cohort Study
 Need: Frame for Exposed and Unexposed
Populations
 Take two (or more) simple random samples of size n_E and n_{not E}, separately from exposed and unexposed populations, respectively
 Measure D on sampled individuals
 Can estimate
 Some Conditional probabilities, e.g. P(D | E)
Marital Status & Birthweight

                        | Birthweight
Marital Status at Birth | Low | Normal | Total
Unmarried               | 12  | 88     | 100
Married                 | 5   | 95     | 100
Total                   | 17  | 183    | 200

No Joint probabilities
No Marginal probabilities
Conditional probabilities
$$P(D \mid E) = \frac{12}{100} = 0.120, \qquad P(D \mid \text{not } E) = \frac{5}{100} = 0.050$$
$$\widehat{RR} = \frac{0.120}{0.050} = 2.40$$
$$\widehat{OR} = \frac{0.120 / 0.880}{0.050 / 0.950} = 2.59$$
$$\widehat{ER} = 0.120 - 0.050 = 0.070$$
Case-Control Study
 Need: Frame for Diseased and Non-Diseased Populations
 Take two simple random samples of size n_D and n_{not D}, separately from case-status groups
 Measure E on sampled individuals
 Can estimate
 Some Conditional probabilities, e.g. P(E | D)
Marital Status & Birthweight

                        | Birthweight
Marital Status at Birth | Low | Normal | Total
Unmarried               | 50  | 28     | 78
Married                 | 50  | 72     | 122
Total                   | 100 | 100    | 200

No Joint probabilities
No Marginal probabilities
Conditional probabilities
$$P(E \mid D) = \frac{50}{100} = 0.500, \qquad P(E \mid \text{not } D) = \frac{28}{100} = 0.280$$
$$\widehat{OR} = \frac{0.5 / 0.5}{0.28 / 0.72} = 2.57$$
Risk-Set (Density) Sampling
 For each incident case sampled at
time t, select random set of controls
from those still at risk at t
 Note control sampled at time s might
be sampled as a case at time t
Example: HSV-2 and Cervical Cancer
 Study Population: 550,000 women with
donations to serum banks in Finland,
Norway, and Sweden
 Cervical cancer cases identified over time
and linked to serum bank data for
identification of HSV-2 status
 3 random controls chosen who were cancer
free at the time of diagnosis of a case
 Caution: HSV-2 status is measured at time
of donation rather than at time of
sampling
Standard Case-Control Sampling

Exposure | D                | not D
E        | n_D P(E | D)     | n_{not D} P(E | not D)
not E    | n_D P(not E | D) | n_{not D} P(not E | not D)
Total    | n_D              | n_{not D}

$$OR = OR_{E \to D} = OR_{D \to E}$$
Risk-Set Sampling

Exposure | D                            | not D
E        | N_E(t) h_E(t)                | m N_E(t) / [N_E(t) + N_{not E}(t)]
not E    | N_{not E}(t) h_{not E}(t)    | m N_{not E}(t) / [N_E(t) + N_{not E}(t)]

$$OR = RH(t)$$
Case-Cohort Sampling
 Select cases as for traditional or
risk-set-sampling; select random set
of m ”controls” from all those at risk
at beginning of interval
 Note “control” might also be sampled
as a case
Example: Low Fat Diet and Breast Cancer
 Women’s Health Trial randomly assigned
32,000 women (high risk group) to low fat
intervention or control group
 All women filled out food questionnaires,
and gave blood samples, at regular
intervals over 10 years
 All breast cancer cases had their food
diaries and blood samples analyzed
 10% of original cohort were randomly
selected to have their diaries and samples
analyzed
Case-Cohort Sampling

Exposure | D                | not D ("controls")
E        | n_D P(E | D)     | m P(E)
not E    | n_D P(not E | D) | m P(not E)
Total    | n_D              | m

Total sample size: m + n_D
$$OR = RR$$
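Case-Cohort Sampling: OR = RR
$$\begin{aligned}
\text{``OR''} &= \frac{n_D P(E \mid D) \times m P(\text{not } E)}{n_D P(\text{not } E \mid D) \times m P(E)} \\
&= \frac{P(E \mid D) \times P(\text{not } E)}{P(\text{not } E \mid D) \times P(E)} \\
&= \frac{P(D \mid E) \dfrac{P(E)}{P(D)} \times P(\text{not } E)}{P(D \mid \text{not } E) \dfrac{P(\text{not } E)}{P(D)} \times P(E)} \quad \text{(Bayes’ Theorem)} \\
&= \frac{P(D \mid E)}{P(D \mid \text{not } E)} = RR
\end{aligned}$$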

Rare Disease Assumption for OR ≈ RR
 Standard Case-control sampling
 Need rare disease assumption
 Risk Set Sampling
 No rare disease assumption if RH is of
interest
 Case-Cohort Sampling
 No rare disease assumption if RR is of
interest
2 x 2 Table Notation

         | Disease Status
Exposure | D   | not D | Total
E        | a   | b     | a+b
not E    | c   | d     | c+d
Total    | a+c | b+d   | n
Chi-Squared Test
 Population-based study: Independence of D and E
 Look at estimate of P(D&E) - P(D)P(E)
 Yields (ad - bc)/n^2
 Look at (ad - bc) or (ad - bc)^2 for simplicity
 Estimated variance of (ad - bc) is (a+b)(a+c)(b+d)(c+d)/n
 Yields
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$
Statistic for Assessing Independence
$$\hat{P}(D \,\&\, E) = a/n, \qquad \hat{P}(D) = (a+c)/n, \qquad \hat{P}(E) = (a+b)/n$$
$$\begin{aligned}
\hat{P}(D \,\&\, E) - \hat{P}(D)\hat{P}(E) &= (a/n) - (a+c)(a+b)/n^2 \\
&= (na - a^2 - ab - ac - bc)/n^2 \\
&= (a^2 + ab + ac + ad - a^2 - ab - ac - bc)/n^2 \\
&= \frac{ad - bc}{n^2}
\end{aligned}$$
Population-Based Study

                        | Birthweight
Marital Status at Birth | Low | Normal | Total
Unmarried               | 7   | 52     | 59
Married                 | 7   | 134    | 141
Total                   | 14  | 186    | 200

$$\chi^2 = \frac{200\,[(7 \times 134) - (7 \times 52)]^2}{59 \times 141 \times 14 \times 186} = 3.04$$
p = 0.08
Cohort Study
 Cohort study
 Look at estimate of P(D|E) - P(D|not E)
 Yields (a/n1) - (c/n2) where n1 = a+b & n2 = c+d
 Estimated variance of (a/n1) - (c/n2) is
$$\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \hat{p} = \frac{a + c}{n}$$
 Yields
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$
Cohort Study

                        | Birthweight
Marital Status at Birth | Low | Normal | Total
Unmarried               | 12  | 88     | 100
Married                 | 5   | 95     | 100
Total                   | 17  | 183    | 200

$$\chi^2 = \frac{200\,[(12 \times 95) - (5 \times 88)]^2}{100 \times 100 \times 17 \times 183} = 3.15$$
p = 0.08
Case-Control Study
 Case-Control study
 Look at estimate of P(E|D) - P(E|not D)
 Yields (a/n1) - (b/n2) where n1 = a+c & n2 = b+d
 Estimated variance of (a/n1) - (b/n2) is
$$\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \hat{p} = \frac{a + b}{n}$$
 Yields
$$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$$
Case-Control Study

                        | Birthweight
Marital Status at Birth | Low | Normal | Total
Unmarried               | 50  | 28     | 78
Married                 | 50  | 72     | 122
Total                   | 100 | 100    | 200

$$\chi^2 = \frac{200\,[(50 \times 72) - (50 \times 28)]^2}{78 \times 122 \times 100 \times 100} = 10.17$$
p = 0.002
Power Comparison

              | Population-Based | Cohort | Case-Control
χ² statistic  | 3.04             | 3.15   | 10.17
P-value       | 0.08             | 0.08   | 0.002
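The three χ² statistics in this comparison come from the same formula applied to the three birthweight tables. The following sketch recomputes them; the p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)) for one degree of freedom, and the function name is an illustrative assumption.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and 1-df p-value for a 2 x 2 table."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))
    # Upper tail of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Rows: unmarried, married; columns: low, normal birthweight.
designs = {
    "population-based": (7, 52, 7, 134),
    "cohort": (12, 88, 5, 95),
    "case-control": (50, 28, 50, 72),
}
for name, cells in designs.items():
    stat, p = chi_squared_2x2(*cells)
    print(f"{name:17s} chi2 = {stat:5.2f}  p = {p:.3f}")
```

All three tables have n = 200, so the difference in the statistics reflects how the design balances the margins, not the total sample size.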
Power Comparison for Specific Population: Cohort vs. Population-Based
$$\hat{V} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \chi^2 = \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{V}}$$
The numerator $(\hat{p}_1 - \hat{p}_2)^2$ is fixed
$\frac{1}{n_1} + \frac{1}{n_2}$ is minimized, for fixed n, when n1 = n2 = n/2
n1 = #E, n2 = #not E
p1 = P(D | E), p2 = P(D | not E)
Power Comparison for Specific Population: Case-Control vs. Population-Based
$$\hat{V} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \chi^2 = \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{V}}$$
The numerator $(\hat{p}_1 - \hat{p}_2)^2$ is fixed
$\frac{1}{n_1} + \frac{1}{n_2}$ is minimized, for fixed n, when n1 = n2 = n/2
n1 = #D, n2 = #not D
p1 = P(E | D), p2 = P(E | not D)
Large-Sample Power Comparison
 Equal sample sizes of Exposed & Unexposed
 Cohort is more powerful than Population-Based
 Equal sample sizes of Cases & Controls
 Case-Control is more powerful than Population-Based
Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
$$\hat{V} = \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right), \qquad \chi^2 = \frac{(\hat{p}_1 - \hat{p}_2)^2}{\hat{V}}$$
Here $\frac{1}{n_1} + \frac{1}{n_2}$ is fixed
Power depends on size of
$$d = \frac{p_1 - p_2}{\sqrt{p(1 - p)}}$$
(where p = (p1 + p2)/2 because of equal sample sizes)
d differs between cohort and case-control (although OR is fixed)
[Figure: d against p]
d is biggest when p = (p1 + p2)/2 = 0.5
Power Comparison: Cohort & Case-Control (Equal Sample Sizes)
 When P(E) is closer to 0.5 than P(D), the case-control design has greater power than the cohort
 Since then the average of P(E|D) and P(E|not D) is closer to 0.5 than the average of P(D|E) and P(D|not E)
 When P(D) is closer to 0.5 than P(E), the cohort design has greater power than the case-control
 Since then the average of P(D|E) and P(D|not E) is closer to 0.5 than the average of P(E|D) and P(E|not D)
Rule of Thumb about
Power/Precision
 Want both exposure and disease
marginals to be as balanced as possible
given fixed total sample size
 For fixed design, more sample still always
gives greater power
 For example, suppose fixed number of
cases (n1)
 Increasing controls (n2) still increases power since 1/n1 + 1/n2 will get smaller, but with diminishing returns
Fixed Number of Cases, Increasing Number of Controls
$$n_2 = k n_1$$
$$\frac{1}{n_1} + \frac{1}{n_2} = \frac{n_2 + n_1}{n_1 n_2} = \frac{k + 1}{k n_1} = ssf(k)$$
$$R = \frac{ssf(1)}{ssf(k)} = \frac{2k}{k + 1}$$
R bigger means χ² statistic gets bigger by same amount
How many more Controls than Cases?
Primary gain comes from going from k = 1 to k = 4
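The diminishing returns are easy to see by tabulating R = 2k/(k + 1) for a few control-to-case ratios. This is a minimal sketch; the function name `relative_gain` is hypothetical.

```python
def relative_gain(k):
    """R = ssf(1) / ssf(k) = 2k / (k + 1) for k controls per case."""
    return 2.0 * k / (k + 1.0)

for k in (1, 2, 3, 4, 5, 10, 100):
    print(f"k = {k:3d}   R = {relative_gain(k):.3f}")
```

R starts at 1 for k = 1, reaches 1.6 at k = 4, and can never exceed 2 however many controls are added, which is why most of the available gain is already captured by about four controls per case.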
2 x 2 Table Notation

         | Disease Status
Exposure | D   | not D | Total
E        | a   | b     | a+b
not E    | c   | d     | c+d
Total    | a+c | b+d   | n

$$\widehat{OR} = \frac{[a/(a+b)] \,/\, [b/(a+b)]}{[c/(c+d)] \,/\, [d/(c+d)]} = \frac{ad}{bc} = \frac{[a/(a+c)] \,/\, [c/(a+c)]}{[b/(b+d)] \,/\, [d/(b+d)]}$$
Cohort Study Example (Population OR = 1)
Typical Study

                | Disease Status
Exposure status | D  | Not D | Total
E               | 8  | 42    | 50
not E           | 11 | 39    | 50
Total           | 19 | 81    | 100

$$\widehat{OR} = \frac{8 \times 39}{11 \times 42} = 0.68$$
χ² = 0.58; p = 0.44
Cohort Study Example
(Population OR = 1)
 1,000 typical studies
 Smallest OR estimate = 0.15
 Largest OR estimate = 7.58
 Average of OR estimates = 1.16 (bias)
 Median of OR estimates = 1
Sampling Distribution of Odds Ratio Estimate
[Figure: sampling distribution of the OR estimates; not Normal, skewed]
Cohort Study Example (Population OR = 1)
 1,000 typical studies
 Smallest log(OR) estimate = -1.90 = log(0.15)
 Largest log(OR) estimate = 2.03 = log(7.58)
 Average of log(OR) estimates = -0.011 (little bias)
 Median of log(OR) estimates = 0 = log(1)
I always use natural logarithms
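The contrast between the skewed OR scale and the nearly unbiased log(OR) scale can be explored with a small simulation. The sketch below is not the simulation behind the slides: it assumes 50 exposed and 50 unexposed subjects per study, a common disease probability of 0.19 (so the population OR is 1), an arbitrary seed, and it skips any study with a zero cell.

```python
import math
import random
import statistics

def simulate_log_or(n_studies=1000, n_per_group=50, p_disease=0.19, seed=2006):
    """Simulate log odds ratio estimates from cohort studies with OR = 1."""
    rng = random.Random(seed)
    log_ors = []
    for _ in range(n_studies):
        a = sum(rng.random() < p_disease for _ in range(n_per_group))  # D among E
        c = sum(rng.random() < p_disease for _ in range(n_per_group))  # D among not E
        b, d = n_per_group - a, n_per_group - c
        if min(a, b, c, d) == 0:
            continue  # OR estimate undefined with a zero cell (assumed handling)
        log_ors.append(math.log((a * d) / (b * c)))
    return log_ors

log_ors = simulate_log_or()
ors = [math.exp(x) for x in log_ors]
print("mean OR      :", round(statistics.mean(ors), 3))
print("median OR    :", round(statistics.median(ors), 3))
print("mean log(OR) :", round(statistics.mean(log_ors), 3))
```

Because the arithmetic mean is never below the geometric mean, the average of the OR estimates always sits at or above exp(average log OR), which is the upward bias the slides describe.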
Sampling Distribution of Log Odds
Ratio Estimate
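Confidence Intervals for the Odds Ratio

         | Disease Status
Exposure | D   | not D | Total
E        | a   | b     | a+b
not E    | c   | d     | c+d
Total    | a+c | b+d   | n

$$\widehat{OR} = \frac{ad}{bc}, \qquad \log(\widehat{OR}) = \log\left(\frac{ad}{bc}\right)$$
$$\widehat{\text{var}}(\log \widehat{OR}) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$
95% CIs for log(OR) and OR:
$$\log \widehat{OR} \pm 1.96 \sqrt{\widehat{\text{var}}(\log \widehat{OR})}$$
$$\left(e^{\log \widehat{OR} - 1.96 \sqrt{\widehat{\text{var}}(\log \widehat{OR})}},\; e^{\log \widehat{OR} + 1.96 \sqrt{\widehat{\text{var}}(\log \widehat{OR})}}\right)$$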
Case-Control Study of Pancreatic Cancer

      |                | Coffee Drinking (cups/day)
Sex   | Disease Status | 0   | 1-2 | 3-4 | 5+  | Total
Men   | Case           | 9   | 94  | 53  | 60  | 216
Men   | Control        | 32  | 119 | 74  | 82  | 307
Women | Case           | 11  | 59  | 53  | 28  | 151
Women | Control        | 56  | 152 | 80  | 48  | 336
Total |                | 108 | 424 | 260 | 218 | 1010
Case-Control Study of Pancreatic Cancer

                           | Pancreatic Cancer
Coffee Drinking (cups/day) | Cases | Controls | Total
≥ 1                        | 347   | 555      | 902
0                          | 20    | 88       | 108
Total                      | 367   | 643      | 1010

$$\chi^2 = \frac{1010\,[(347 \times 88) - (555 \times 20)]^2}{902 \times 108 \times 367 \times 643} = 16.60$$
$$\widehat{OR} = \frac{347 \times 88}{555 \times 20} = 2.75, \qquad \log(\widehat{OR}) = \log(2.75) = 1.01$$
$$\widehat{\text{var}}\left(\log(\widehat{OR})\right) = \frac{1}{347} + \frac{1}{555} + \frac{1}{20} + \frac{1}{88} = 0.066$$
95% CI for log(OR): $1.01 \pm 1.96\sqrt{0.066} = (0.508, 1.516)$
95% CI for OR: $e^{1.01 \pm 1.96\sqrt{0.066}} = (e^{0.508}, e^{1.516}) = (1.66, 4.55)$
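The odds ratio interval above can be reproduced with a few lines of code. This is a sketch of the log-scale calculation described on the slides; the function name and the fixed 1.96 multiplier as a default argument are illustrative choices.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-scale confidence interval.

    a, b: exposed cases, exposed controls
    c, d: unexposed cases, unexposed controls
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower, upper = log_or - z * se, log_or + z * se
    return math.exp(log_or), (math.exp(lower), math.exp(upper))

# Coffee drinking (>= 1 cup/day vs. 0) and pancreatic cancer.
or_hat, (or_lo, or_hi) = odds_ratio_ci(347, 555, 20, 88)
print(f"OR = {or_hat:.2f}, 95% CI = ({or_lo:.2f}, {or_hi:.2f})")
```

Exponentiating the endpoints of the log(OR) interval gives an interval that is not symmetric about the OR estimate, reflecting the skewed sampling distribution on the original scale.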
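Estimate & Confidence Intervals for the Relative Risk

         | Disease Status
Exposure | D   | not D | Total
E        | a   | b     | a+b
not E    | c   | d     | c+d
Total    | a+c | b+d   | n

$$\widehat{RR} = \frac{a/(a+b)}{c/(c+d)}, \qquad \log(\widehat{RR}) = \log\left(\frac{a/(a+b)}{c/(c+d)}\right)$$
$$\widehat{\text{var}}(\log \widehat{RR}) = \frac{b}{a(a+b)} + \frac{d}{c(c+d)}$$
95% CIs for log(RR) and RR:
$$\log \widehat{RR} \pm 1.96 \sqrt{\widehat{\text{var}}(\log \widehat{RR})}$$
$$\left(e^{\log \widehat{RR} - 1.96 \sqrt{\widehat{\text{var}}(\log \widehat{RR})}},\; e^{\log \widehat{RR} + 1.96 \sqrt{\widehat{\text{var}}(\log \widehat{RR})}}\right)$$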
Western Collaborative Group Study

              | Occurrence of CHD
Behavior Type | Yes | No   | Total
Type A        | 178 | 1411 | 1589
Type B        | 79  | 1486 | 1565
Total         | 257 | 2897 | 3154

$$\chi^2 = \frac{3154\,[(178 \times 1486) - (1411 \times 79)]^2}{1589 \times 1565 \times 257 \times 2897} = 39.9$$
$$\widehat{RR} = \frac{178/1589}{79/1565} = 2.22, \qquad \log(\widehat{RR}) = \log(2.22) = 0.797$$
$$\widehat{\text{var}}\left(\log(\widehat{RR})\right) = \frac{1411}{178 \times 1589} + \frac{1486}{79 \times 1565} = 0.017$$
95% CI for log(RR): $0.797 \pm 1.96\sqrt{0.017} = (0.542, 1.053)$
95% CI for RR: $e^{0.797 \pm 1.96\sqrt{0.017}} = (e^{0.542}, e^{1.053}) = (1.72, 2.87)$
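The relative risk interval follows the same log-scale recipe as the odds ratio, with a different variance formula. A minimal sketch, with a hypothetical function name:

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with a log-scale confidence interval.

    a, b: diseased / not diseased among the exposed
    c, d: diseased / not diseased among the unexposed
    """
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    lower, upper = log_rr - z * se, log_rr + z * se
    return math.exp(log_rr), (math.exp(lower), math.exp(upper))

# WCGS: Type A vs. Type B behavior and occurrence of CHD.
rr_hat, (rr_lo, rr_hi) = relative_risk_ci(178, 1411, 79, 1486)
print(f"RR = {rr_hat:.2f}, 95% CI = ({rr_lo:.2f}, {rr_hi:.2f})")
```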
Estimate & Confidence Intervals for the Excess Risk

         | Disease Status
Exposure | D   | not D | Total
E        | a   | b     | a+b
not E    | c   | d     | c+d
Total    | a+c | b+d   | n

$$\widehat{ER} = \frac{a}{a+b} - \frac{c}{c+d}$$
$$\widehat{\text{var}}(\widehat{ER}) = \frac{ab}{(a+b)^3} + \frac{cd}{(c+d)^3}$$
95% CIs for ER:
$$\widehat{ER} \pm 1.96 \sqrt{\widehat{\text{var}}(\widehat{ER})}$$
Western Collaborative Group Study

              | Occurrence of CHD
Behavior Type | Yes | No   | Total
Type A        | 178 | 1411 | 1589
Type B        | 79  | 1486 | 1565
Total         | 257 | 2897 | 3154

$$\widehat{ER} = \frac{178}{1589} - \frac{79}{1565} = 0.062$$
$$\widehat{\text{var}}(\widehat{ER}) = \frac{178 \times 1411}{1589^3} + \frac{79 \times 1486}{1565^3} = 0.000093$$
95% CI for ER: $0.062 \pm 1.96\sqrt{0.000093} = (0.043, 0.080)$
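Unlike RR and OR, the excess risk interval is built directly on the original scale. The sketch below recomputes the WCGS numbers; the function name is an illustrative assumption.

```python
import math

def excess_risk_ci(a, b, c, d, z=1.96):
    """Excess risk with a confidence interval on the original scale."""
    n1, n2 = a + b, c + d
    er = a / n1 - c / n2
    variance = a * b / n1 ** 3 + c * d / n2 ** 3
    half_width = z * math.sqrt(variance)
    return er, variance, (er - half_width, er + half_width)

# WCGS: Type A vs. Type B behavior and occurrence of CHD.
er_hat, er_var, (er_lo, er_hi) = excess_risk_ci(178, 1411, 79, 1486)
print(f"ER = {er_hat:.3f}, var = {er_var:.6f}, 95% CI = ({er_lo:.3f}, {er_hi:.3f})")
```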
Estimate & Confidence Intervals for the Attributable Risk: Population-Based Study

         | Disease Status
Exposure | D   | not D | Total
E        | a   | b     | a+b
not E    | c   | d     | c+d
Total    | a+c | b+d   | n

$$\widehat{AR} = \frac{\dfrac{a+c}{n} - \dfrac{c}{c+d}}{\dfrac{a+c}{n}} = \frac{ad - bc}{(a+c)(c+d)}$$
$$\widehat{\text{var}}\left(\log(1 - \widehat{AR})\right) = \frac{b + \widehat{AR}(a + d)}{nc}$$
95% CIs for log(1-AR) and AR:
$$\log(1 - \widehat{AR}) \pm 1.96 \sqrt{\widehat{\text{var}}(\log(1 - \widehat{AR}))}$$
$$\left(1 - e^{\log(1 - \widehat{AR}) + 1.96 \sqrt{\widehat{\text{var}}(\log(1 - \widehat{AR}))}},\; 1 - e^{\log(1 - \widehat{AR}) - 1.96 \sqrt{\widehat{\text{var}}(\log(1 - \widehat{AR}))}}\right)$$
Western Collaborative Group Study

              | Occurrence of CHD
Behavior Type | Yes | No   | Total
Type A        | 178 | 1411 | 1589
Type B        | 79  | 1486 | 1565
Total         | 257 | 2897 | 3154

$$\widehat{AR} = \frac{(178 \times 1486) - (1411 \times 79)}{1565 \times 257} = 0.38$$
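The attributable risk interval works on the log(1 - AR) scale and is then transformed back. The slides give the formulas and the point estimate for WCGS but no worked interval, so the sketch below is an illustration of the population-based formulas rather than a reproduction of a slide result; the function name is hypothetical.

```python
import math

def attributable_risk_ci(a, b, c, d, z=1.96):
    """Attributable risk with a CI built on the log(1 - AR) scale.

    Uses the population-based variance formula
    var(log(1 - AR)) = (b + AR * (a + d)) / (n * c).
    """
    n = a + b + c + d
    ar = (a * d - b * c) / ((a + c) * (c + d))
    variance = (b + ar * (a + d)) / (n * c)
    centre = math.log(1 - ar)
    half_width = z * math.sqrt(variance)
    # A larger value of log(1 - AR) corresponds to a smaller AR.
    lower = 1 - math.exp(centre + half_width)
    upper = 1 - math.exp(centre - half_width)
    return ar, (lower, upper)

# WCGS: Type A vs. Type B behavior and occurrence of CHD.
ar_hat, (ar_lo, ar_hi) = attributable_risk_ci(178, 1411, 79, 1486)
print(f"AR = {ar_hat:.2f}, 95% CI = ({ar_lo:.2f}, {ar_hi:.2f})")
```

Note the endpoint swap in the back-transformation: adding the half-width on the log(1 - AR) scale produces the lower limit for AR.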
Small sample adjustments
 Odds Ratio
 Estimate:
$$\widehat{OR}_{ss} = \frac{ad}{(b+1)(c+1)}$$
 CIs:
$$(\log \widehat{OR})_{ss} = \log\left(\frac{(a+0.5)(d+0.5)}{(b+0.5)(c+0.5)}\right)$$
$$\widehat{\text{Var}}(\log \widehat{OR})_{ss} = \frac{1}{a+0.5} + \frac{1}{b+0.5} + \frac{1}{c+0.5} + \frac{1}{d+0.5}$$
 Exact tests/CIs
 Relative Risk
 Estimate:
$$\widehat{RR}_{ss} = \frac{a/(a+b)}{(c+1)/(c+d+1)}$$
Case-Control Study of Pancreatic Cancer

                           | Pancreatic Cancer
Coffee Drinking (cups/day) | Cases | Controls | Total
≥ 1                        | 347   | 555      | 902
0                          | 20    | 88       | 108
Total                      | 367   | 643      | 1010

$$\widehat{OR}_{ss} = \frac{347 \times 88}{556 \times 21} = 2.62$$
$$\log(\widehat{OR})_{ss} = \log\left(\frac{347.5 \times 88.5}{555.5 \times 20.5}\right) = 0.993$$
$$\widehat{\text{var}}\left(\log(\widehat{OR})\right)_{ss} = \frac{1}{347.5} + \frac{1}{555.5} + \frac{1}{20.5} + \frac{1}{88.5} = 0.065$$
95% CI for log(OR): $0.993 \pm 1.96\sqrt{0.065} = (0.495, 1.492)$
95% CI for OR: $e^{0.993 \pm 1.96\sqrt{0.065}} = (e^{0.495}, e^{1.492}) = (1.64, 4.45)$
An exact 95% CI for OR is (1.64, 4.80)
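The small-sample adjustments only change the cell counts that enter each formula. A minimal sketch of the adjusted odds ratio calculations for the coffee data follows; the function name is hypothetical, and the exact interval quoted on the slide is not computed here.

```python
import math

def small_sample_odds_ratio(a, b, c, d, z=1.96):
    """Small-sample adjusted OR estimate and adjusted log-scale CI."""
    or_ss = (a * d) / ((b + 1) * (c + 1))
    log_or_ss = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    var_ss = 1 / (a + 0.5) + 1 / (b + 0.5) + 1 / (c + 0.5) + 1 / (d + 0.5)
    half_width = z * math.sqrt(var_ss)
    ci = (math.exp(log_or_ss - half_width), math.exp(log_or_ss + half_width))
    return or_ss, log_or_ss, var_ss, ci

# Coffee drinking (>= 1 cup/day vs. 0) and pancreatic cancer.
or_ss, log_or_ss, var_ss, (ss_lo, ss_hi) = small_sample_odds_ratio(347, 555, 20, 88)
print(f"OR_ss = {or_ss:.2f}, log(OR)_ss = {log_or_ss:.3f}, var = {var_ss:.3f}")
print(f"95% CI for OR = ({ss_lo:.2f}, {ss_hi:.2f})")
```

With counts this large the adjustment barely moves the interval, which is consistent with the unadjusted result of (1.66, 4.55).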
Small Sample Ideas
 Be aware when you have entered
“small sample world” where
approximations may not be accurate
and adjustments/exact methods may
be required