Cobb's Lecture #4

Analysis of matched data
HRP 261 02/02/04
Chapter 9 Agresti – read sections 9.1 and 9.2
Pair Matching: Why match?

• Pairing can control for extraneous sources of variability and increase the power of a statistical test.
• Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.
Example

Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin’s patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient’s. They presented the data as:

                 Tonsillectomy   None
Hodgkin’s             41          44
Sib control           33          52

OR=1.47; chi-square=1.53 (NS)
From John A. Rice, “Mathematical Statistics and Data Analysis.”
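The unpaired figures quoted above can be reproduced with a few lines of Python. This is only a sketch: it assumes the chi-square is the ordinary Pearson statistic without continuity correction, and the function name is made up for illustration.

```python
def unpaired_or_and_chi2(a, b, c, d):
    """Odds ratio and Pearson chi-square (1 df, no continuity correction)
    for a 2x2 table [[a, b], [c, d]], treating the rows as independent samples."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2


# Rows: Hodgkin's patients, sib controls; columns: tonsillectomy, none
odds_ratio, chi2 = unpaired_or_and_chi2(41, 44, 33, 52)
print(round(odds_ratio, 2), round(chi2, 2))  # 1.47 1.53
```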
Example

But several letters to the editor pointed out that the investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to analyze the data like this:

                                 Control (sibling)
                                 None   Tonsillectomy
Case (patient)  None              37          7
                Tonsillectomy     15         26

OR=15/7=2.14; chi-square=2.91 (p=.09)
From John A. Rice, “Mathematical Statistics and Data Analysis.”
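The paired figures use only the two discordant counts: 15 pairs in which only the patient had a tonsillectomy and 7 in which only the sibling did. A minimal sketch follows; the function name is hypothetical, and the p-value uses the identity that the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)).

```python
import math


def paired_or_and_chi2(case_only, control_only):
    """Matched-pair odds ratio, McNemar chi-square and its 1-df p-value."""
    odds_ratio = case_only / control_only
    chi2 = (case_only - control_only) ** 2 / (case_only + control_only)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return odds_ratio, chi2, p_value


odds_ratio, chi2, p_value = paired_or_and_chi2(15, 7)
print(round(odds_ratio, 2), round(chi2, 2), round(p_value, 2))  # 2.14 2.91 0.09
```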
Pair Matching: Agresti example
Match each MI case to an MI-free control based on age and gender.
Ask about history of diabetes to find out whether diabetes increases the risk of MI.
Pair Matching: Agresti example
                                 MI controls
                          Diabetes   No diabetes   Total
MI cases   Diabetes           9          37          46
           No diabetes       16          82          98
           Total             25         119         144

Which cells are informative?
Just the discordant cells are informative!
Pair Matching
OR estimate comes only from discordant pairs!
The question is: among the discordant pairs, what
proportion are discordant in the direction of the
case vs. the direction of the control. If more
discordant pairs “favor” the case, this indicates
OR>1.
P(“favors” case/discordant pair) =
\[
\frac{P(E/D)\,P(\sim E/\sim D)}{P(E/D)\,P(\sim E/\sim D) + P(\sim E/D)\,P(E/\sim D)}
\]
where P(E/D) P(~E/~D) = the probability of observing a case-control pair with only the case exposed, and P(~E/D) P(E/~D) = the probability of observing a case-control pair with only the control exposed.
P(“favors” case/discordant pair) =
\[
\hat{p} = \frac{b}{b+c} = \frac{37}{37+16} = \frac{37}{53}
\]
where b=37 pairs have only the case exposed (diabetes) and c=16 pairs have only the control exposed.
odds(“favors” case/discordant pair) =
\[
\widehat{OR} = \frac{b}{c} = \frac{37}{16}
\]
OR estimate comes only from discordant pairs!!
OR= 37/16 = 2.31
Makes Sense!
McNemar’s Test
                                 MI controls
                          Diabetes   No diabetes
MI cases   Diabetes           9          37
           No diabetes       16          82
Null hypothesis: P(“favors” case / discordant pair) = .5
(note: equivalent to OR=1.0 or cell b=cell c)
Exact binomial:
\[
p\text{-value} = \binom{53}{37}(.5)^{37}(.5)^{16} + \binom{53}{38}(.5)^{38}(.5)^{15} + \binom{53}{39}(.5)^{39}(.5)^{14} + \dots
\]
By normal approximation to binomial:
\[
Z = \frac{37 - \frac{53}{2}}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88; \quad p < .01
\]
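Both calculations can be checked numerically. The sketch below assumes the exact sum runs over the upper tail only (37 through 53 case-favoring pairs out of 53 discordant pairs), which makes it a one-sided p-value; the two-sided normal p-value is shown for comparison.

```python
import math

b, c = 37, 16   # discordant pairs: only case exposed, only control exposed
n = b + c       # 53 discordant pairs in total

# Exact one-sided p-value: P(X >= 37) for X ~ Binomial(53, 0.5)
exact_p = sum(math.comb(n, k) for k in range(b, n + 1)) / 2 ** n

# Normal approximation to the binomial
z = (b - n / 2) / math.sqrt(n * 0.5 * 0.5)
two_sided_p = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2))                      # 2.88
print(exact_p < 0.01, two_sided_p < 0.01)  # True True
```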
McNemar’s Test: generally
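                    controls
                  exp    No exp
cases   exp        a       b
        No exp     c       d

By normal approximation to binomial:
\[
Z = \frac{b - \frac{b+c}{2}}{\sqrt{(b+c)(.5)(.5)}} = \frac{\frac{b-c}{2}}{\frac{\sqrt{b+c}}{2}} = \frac{b-c}{\sqrt{b+c}}
\]
Equivalently:
\[
\chi^2_1 = \left(\frac{b-c}{\sqrt{b+c}}\right)^2 = \frac{(b-c)^2}{b+c}
\]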
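The algebra above says the squared Z statistic and the chi-square form are the same number. A small sketch (the function name `mcnemar` is chosen here for illustration, not taken from any library) confirms this for the MI data:

```python
import math


def mcnemar(b, c):
    """Return (Z, chi-square) for discordant counts b and c."""
    z = (b - (b + c) / 2) / math.sqrt((b + c) * 0.5 * 0.5)
    chi2 = (b - c) ** 2 / (b + c)
    return z, chi2


z, chi2 = mcnemar(37, 16)
print(round(z, 2), round(chi2, 2))  # 2.88 8.32
print(math.isclose(z ** 2, chi2))   # True
```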
95% CI for difference in
dependent proportions
\[
Var(p_1 - p_2) = Var(p_1) + Var(p_2) - 2\,Cov(p_1, p_2)
\]
\[
Var(p_{E/D} - p_{E/\sim D}) = \frac{p_{E/D}(1 - p_{E/D})}{n} + \frac{p_{E/\sim D}(1 - p_{E/\sim D})}{n} - 2\,Cov(p_{E/D}, p_{E/\sim D})
\]
where n is the number of case-control pairs (144).
\[
= \frac{(.32)(.68) + (.17)(.83) - 2(.06 \times .57 - .26 \times .11)}{144} = .0024
\]
\[
\Rightarrow 95\%\ CI: .32 - .17 = .15 \pm 1.96\sqrt{.0024} = (.05, .24)
\]
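The interval can be recomputed from the four cell counts without intermediate rounding. This sketch assumes the covariance term is (p11·p22 − p12·p21)/n, which is what the numbers in the slide correspond to; variable names are illustrative.

```python
import math

a, b, c, d = 9, 37, 16, 82      # cells of the paired table (cases x controls)
n = a + b + c + d               # 144 pairs

p_case = (a + b) / n            # proportion of cases with diabetes
p_control = (a + c) / n         # proportion of controls with diabetes

# Covariance between the two dependent proportions
cov = ((a / n) * (d / n) - (b / n) * (c / n)) / n
var = p_case * (1 - p_case) / n + p_control * (1 - p_control) / n - 2 * cov

diff = p_case - p_control
half_width = 1.96 * math.sqrt(var)
lower, upper = diff - half_width, diff + half_width
print(round(var, 4), round(lower, 2), round(upper, 2))  # 0.0024 0.05 0.24
```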

Each pair is its own “age-gender” stratum
Example:

Pair type                                         Row           Case (MI)   Control   Pairs
Concordant for exposure (cell “a” from before)    Diabetes          1          1       x 9
                                                  No diabetes       0          0
Case exposed, control unexposed                   Diabetes          1          0       x 37
                                                  No diabetes       0          1
Case unexposed, control exposed                   Diabetes          0          1       x 16
                                                  No diabetes       1          0
Concordant for non-exposure                       Diabetes          0          0       x 82
                                                  No diabetes       1          1
Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and
MI controlling for age and gender.
Mantel-Haenszel methods apply.
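RECALL: The Mantel-Haenszel Summary Odds Ratio
\[
OR_{MH} = \frac{\sum_{i=1}^{k} \frac{a_i d_i}{T_i}}{\sum_{i=1}^{k} \frac{b_i c_i}{T_i}}
\]
where each stratum i is a 2x2 table with T_i subjects:

                Case   Control
Exposed           a       b
Not Exposed       c       d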
Pair type                Row           Case (MI)   Control   ad/T   bc/T
Both exposed             Diabetes          1          1        0      0
                         No diabetes       0          0
Only case exposed        Diabetes          1          0       1/2     0
                         No diabetes       0          1
Only control exposed     Diabetes          0          1        0     1/2
                         No diabetes       1          0
Neither exposed          Diabetes          0          0        0      0
                         No diabetes       1          1
Mantel-Haenszel Summary OR
\[
OR_{MH} = \frac{\sum_{i=1}^{144} \frac{a_i d_i}{2}}{\sum_{i=1}^{144} \frac{b_i c_i}{2}} = \frac{37 \times \frac{1}{2}}{16 \times \frac{1}{2}} = \frac{37}{16}
\]
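To see the collapse to 37/16 concretely, the sum can be carried out over the four pair types, weighting each by its number of pairs. This is a sketch with hypothetical names; exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction

# Each pair is one stratum laid out as (a, b, c, d) =
# (exposed case, exposed control, unexposed case, unexposed control).
PAIR_TYPES = {
    "both exposed": (1, 1, 0, 0),
    "only case exposed": (1, 0, 0, 1),
    "only control exposed": (0, 1, 1, 0),
    "neither exposed": (0, 0, 1, 1),
}
mi_counts = {
    "both exposed": 9,
    "only case exposed": 37,
    "only control exposed": 16,
    "neither exposed": 82,
}


def mantel_haenszel_or(pair_types, counts):
    """Sum a*d/T and b*c/T over all strata and take the ratio."""
    numerator = Fraction(0)
    denominator = Fraction(0)
    for name, (a, b, c, d) in pair_types.items():
        total = a + b + c + d
        numerator += counts[name] * Fraction(a * d, total)
        denominator += counts[name] * Fraction(b * c, total)
    return numerator / denominator


or_mh = mantel_haenszel_or(PAIR_TYPES, mi_counts)
print(or_mh, float(or_mh))  # 37/16 2.3125
```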
Mantel-Haenszel Test Statistic
(same as McNemar’s)
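recall:
\[
E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}}, \qquad Var(n_{11k}) = \frac{n_{1+k}\,n_{+1k}\,n_{2+k}\,n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)}
\]
Concordant cells contribute 0.
Discordant cells:
\[
\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad Var(n_{11k}) = \frac{(1)(1)(1)(1)}{2^2(2-1)} = \frac{1}{4}
\]
\[
\chi^2_{CMH} = \frac{\left[\sum_{\text{control disc. cells}} (-.5) + \sum_{\text{case disc. cells}} (.5)\right]^2}{\sum_{\text{disc. cells}} .25} = \frac{[.5(b) - .5(c)]^2}{(b+c)(.25)} = \frac{(b-c)^2}{b+c}
\]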
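The equivalence with McNemar's test can be checked by applying the stratified formulas to every pair type in the MI data. The sketch below is an illustration under the table layout used above (rows exposed / not exposed, columns case / control); the function name is invented.

```python
from fractions import Fraction


def cmh_statistic(strata):
    """strata: list of (n11, n12, n21, n22, number_of_strata), where
    n11 = exposed cases, n12 = exposed controls,
    n21 = unexposed cases, n22 = unexposed controls."""
    observed_minus_expected = Fraction(0)
    variance = Fraction(0)
    for n11, n12, n21, n22, count in strata:
        row1, row2 = n11 + n12, n21 + n22
        col1, col2 = n11 + n21, n12 + n22
        total = row1 + row2
        expected = Fraction(row1 * col1, total)
        var = Fraction(row1 * row2 * col1 * col2, total ** 2 * (total - 1))
        observed_minus_expected += count * (n11 - expected)
        variance += count * var
    return observed_minus_expected ** 2 / variance


mi_strata = [
    (1, 1, 0, 0, 9),    # both exposed
    (1, 0, 0, 1, 37),   # only case exposed
    (0, 1, 1, 0, 16),   # only control exposed
    (0, 0, 1, 1, 82),   # neither exposed
]
cmh = cmh_statistic(mi_strata)
print(cmh == Fraction((37 - 16) ** 2, 37 + 16))  # True
print(round(float(cmh), 2))                      # 8.32
```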
Example: Salmonella
Outbreak in France, 1996
From: “Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study” BMJ 312: 91-94; Jan 1996.
Epidemic Curve
Matched Case Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for:
• age group (< 1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or >= 65 years)
• gender
• city of residence
Results
In 2x2 table form: any goat’s cheese

                                 Controls
                          Goat’s cheese   None   Total
Cases   Goat’s cheese          23          23      46
        None                    6           7      13
        Total                  29          30      59

\[
OR = \frac{b}{c} = \frac{23}{6} = 3.8
\]
In 2x2 table form: Brand B goat’s cheese

                                   Controls
                            Goat’s cheese B   None   Total
Cases   Goat’s cheese B            8           24      32
        None                       2           25      27
        Total                     10           49      59

\[
OR = \frac{b}{c} = \frac{24}{2} = 12.0
\]
Pair type                Row        Case   Control   Pairs
Both exposed             Brand B      1       1       x 8
                         None         0       0
Only case exposed        Brand B      1       0       x 24
                         None         0       1
Only control exposed     Brand B      0       1       x 2
                         None         1       0
Neither exposed          Brand B      0       0       x 25
                         None         1       1
8 concordant exposed:
\[
\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}} = \frac{2 \times 1}{2} = 1
\]
\[
\text{Observed}(n_{11k}) - \mu_{11k} = 1 - 1 = 0
\]
\[
Var(n_{11k}) = \frac{n_{1+k}\,n_{+1k}\,n_{2+k}\,n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{2 \times 1 \times 0 \times 1}{4(2-1)} = 0
\]
Summary: 8 concordant-exposed pairs (=strata) contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).

25 concordant unexposed:
\[
\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}} = \frac{0 \times 1}{2} = 0
\]
\[
\text{Observed}(n_{11k}) - \mu_{11k} = 0 - 0 = 0
\]
\[
Var(n_{11k}) = \frac{n_{1+k}\,n_{+1k}\,n_{2+k}\,n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{0 \times 1 \times 2 \times 1}{4(2-1)} = 0
\]
Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed-expected=0) and nothing to the denominator (variance=0).

2 discordant cells favor control:
\[
\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}
\]
\[
\text{Observed}(n_{11k}) - \mu_{11k} = 0 - .5 = -.5
\]
\[
Var(n_{11k}) = \frac{n_{1+k}\,n_{+1k}\,n_{2+k}\,n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}
\]
Summary: 2 discordant “control-exposed” pairs contribute -.5 each to the numerator (observed-expected= -.5) and .25 each to the denominator (variance= .25).

24 discordant cells favor case:
\[
\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}
\]
\[
\text{Observed}(n_{11k}) - \mu_{11k} = 1 - .5 = .5
\]
\[
Var(n_{11k}) = \frac{n_{1+k}\,n_{+1k}\,n_{2+k}\,n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}
\]
Summary: 24 discordant “case-exposed” pairs contribute +.5 each to the numerator (observed-expected= +.5) and .25 each to the denominator (variance= .25).
\[
\chi^2_{CMH} = \frac{[8(0) + 25(0) + 24(.5) - 2(.5)]^2}{0 + 0 + 24(.25) + 2(.25)} = \frac{22^2(.25)}{26(.25)} = \frac{22^2}{26} = \frac{(24-2)^2}{26} = \frac{(b-c)^2}{b+c}
\]
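The bookkeeping for the Brand B data can be tallied directly from the per-pair contributions listed above. A short sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

half, quarter = Fraction(1, 2), Fraction(1, 4)

# (number of pairs, observed - expected per pair, variance per pair)
contributions = [
    (8, Fraction(0), Fraction(0)),    # concordant exposed
    (25, Fraction(0), Fraction(0)),   # concordant unexposed
    (24, half, quarter),              # discordant, case exposed
    (2, -half, quarter),              # discordant, control exposed
]

numerator = sum(n * diff for n, diff, _ in contributions) ** 2
denominator = sum(n * var for n, _, var in contributions)
cmh = numerator / denominator
print(cmh, round(float(cmh), 2))  # 242/13 18.62
```

The result equals (24 − 2)²/(24 + 2), the McNemar form.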
M:1 matched studies

• One-to-one pair matching provides the most cost-effective design when cases and controls are equally scarce.
• But when cases are the limiting factor, as with rare diseases, statistical power may be increased by selecting more than 1 control matched to each case.
• But with diminishing returns…
M:1 matched studies

• 2:1 matched study of colorectal cancer.
• Background: Carcinoembryonic antigen (CEA) is the classical tumor marker for colorectal cancer. This study investigated whether the plasma levels of carcinoembryonic antigen and/or CA 242 were elevated BEFORE clinical diagnosis of colorectal cancer.
From: Palmqvist R et al. Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal
Cancer: A Matched Case-Control Study. Diseases of the Colon & Rectum. 46(11):1538-1544,
November 2003.
M:1 matched studies
Prediagnostic Levels of Carcinoembryonic Antigen and CA
242 in Colorectal Cancer: A Matched Case-Control Study
Study design: A so-called “nested case-control
study.”
Idea: Study subjects who were members of an
ongoing prospective cohort study in Sweden had
given blood at baseline, when they had no disease.
Years later, blood can be thawed and tested for the
presence of prediagnostic antigens.
Key innovation: The cohort is large, the disease is
rare, and it’s too costly to test everyone’s blood; so
only test stored blood of cases and matched
controls from the cohort.
M:1 matched studies

Two cancer-free controls were randomly selected for each case from the corresponding cohort at the time of diagnosis of the matched case.
Matched for:
• gender
• age at recruitment (±12 months)
• date of blood sampling (±2 months)
• fasting time (<4 hours, 4–8 hours, >8 hours)
2:1 matching:
•stratum=matching group
•3 subjects per stratum
•6 possible 2x2 tables…
                                            Case (CRC)        Controls
Table                                       CEA +   CEA -     CEA +   CEA -
Everyone exposed; non-informative             1       0         2       0
Case exposed; 1 control unexposed             1       0         1       1
Case exposed; both controls unexposed         1       0         0       2
Case unexposed; both controls exposed         0       1         2       0
Case unexposed; 1 control exposed             0       1         1       1
Everyone unexposed; non-informative           0       1         0       2
Observed number of matched sets of each type:

                                            Case (CRC)        Controls        Number
Table                                       CEA +   CEA -     CEA +   CEA -   observed
Everyone exposed                              1       0         2       0        0
Case exposed; 1 control unexposed             1       0         1       1        2
Case exposed; both controls unexposed         1       0         0       2       12
Case unexposed; both controls exposed         0       1         2       0        0
Case unexposed; 1 control exposed             0       1         1       1        1
Everyone unexposed                            0       1         0       2      102
2 Tables with 2 exposed

                                            Case (CRC)        Controls        Total
                                            CEA +   CEA -     CEA +   CEA -   exposed
Case unexposed; both controls exposed         0       1         2       0        2
Case exposed; 1 control unexposed             1       0         1       1        2

13 Tables with 1 exposed

                                            Case (CRC)        Controls        Total
                                            CEA +   CEA -     CEA +   CEA -   exposed
Case exposed; both controls unexposed         1       0         0       2        1
Case unexposed; 1 control exposed             0       1         1       1        1

Represents all possible discordant tables (either 2 or 1 total exposed)
2 Tables with 2 exposed
First table: case unexposed, both controls exposed. Second table: case exposed, 1 control exposed.
\[
P(\text{first table}) = (1 - p_{E/D}) \binom{2}{2} p_{E/\sim D}^{2} (1 - p_{E/\sim D})^{0}
\]
\[
P(\text{second table}) = p_{E/D} \binom{2}{1} p_{E/\sim D} (1 - p_{E/\sim D})
\]
\[
P(\text{case exposed}/\text{2 total exposed}) = \frac{p_{E/D} \binom{2}{1} p_{E/\sim D} (1 - p_{E/\sim D})}{(1 - p_{E/D}) \binom{2}{2} p_{E/\sim D}^{2} + p_{E/D} \binom{2}{1} p_{E/\sim D} (1 - p_{E/\sim D})}
\]
\[
= \frac{p_{E/D}\, 2\, (1 - p_{E/\sim D})}{(1 - p_{E/D})\, p_{E/\sim D} + p_{E/D}\, 2\, (1 - p_{E/\sim D})}
= \frac{2\, p_{E/D}\, p_{\sim E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D} + 2\, p_{E/D}\, p_{\sim E/\sim D}}
\]
Dividing numerator and denominator by p_{~E/D} p_{E/~D}:
\[
= \frac{2\,\dfrac{p_{E/D}\, p_{\sim E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D}}}{\dfrac{p_{\sim E/D}\, p_{E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D}} + 2\,\dfrac{p_{E/D}\, p_{\sim E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D}}} = \frac{2OR}{2OR + 1}
\]
13 Tables with 1 exposed
First table: case exposed, both controls unexposed. Second table: case unexposed, 1 control exposed.
\[
P(\text{first table}) = p_{E/D} \binom{2}{0} p_{E/\sim D}^{0} (1 - p_{E/\sim D})^{2}
\]
\[
P(\text{second table}) = (1 - p_{E/D}) \binom{2}{1} p_{E/\sim D} (1 - p_{E/\sim D})
\]
\[
P(\text{case exposed}/\text{1 total exposed}) = \frac{p_{E/D} \binom{2}{0} (1 - p_{E/\sim D})^{2}}{p_{E/D} \binom{2}{0} (1 - p_{E/\sim D})^{2} + (1 - p_{E/D}) \binom{2}{1} p_{E/\sim D} (1 - p_{E/\sim D})}
\]
\[
= \frac{p_{E/D}\, p_{\sim E/\sim D}^{2}}{p_{E/D}\, p_{\sim E/\sim D}^{2} + p_{\sim E/D}\, 2\, p_{E/\sim D}\, p_{\sim E/\sim D}}
= \frac{p_{E/D}\, p_{\sim E/\sim D}}{p_{E/D}\, p_{\sim E/\sim D} + 2\, p_{\sim E/D}\, p_{E/\sim D}}
\]
Dividing numerator and denominator by p_{~E/D} p_{E/~D}:
\[
= \frac{\dfrac{p_{E/D}\, p_{\sim E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D}}}{\dfrac{p_{E/D}\, p_{\sim E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D}} + \dfrac{2\, p_{\sim E/D}\, p_{E/\sim D}}{p_{\sim E/D}\, p_{E/\sim D}}} = \frac{OR}{OR + 2}
\]
Summary





P(case exposed/2 total exposed)=2OR/(2OR+1)
P(case unexposed/2 total exposed)=1-2OR/(2OR+1)
P(case exposed/1 total exposed) = OR/(OR+2)
P(case unexposed/1 total exposed)= 1-OR/(OR+2)
Therefore, we can make a likelihood equation for our data
that is a function of the OR, and use MLE to solve for OR
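The two conditional probabilities in the summary can be sanity-checked numerically. In the sketch below the exposure probabilities 0.3 and 0.1 are arbitrary illustrative values, not estimates from the study, and the function name is invented.

```python
import math


def prob_case_exposed(p_case, p_control, total_exposed, n_controls=2):
    """P(case exposed / total_exposed subjects exposed) in a matched set of
    1 case and n_controls controls with independent exposures."""
    def controls_pmf(k):
        # Binomial probability that exactly k controls are exposed
        if k < 0 or k > n_controls:
            return 0.0
        return (math.comb(n_controls, k)
                * p_control ** k * (1 - p_control) ** (n_controls - k))

    case_exposed = p_case * controls_pmf(total_exposed - 1)
    case_unexposed = (1 - p_case) * controls_pmf(total_exposed)
    return case_exposed / (case_exposed + case_unexposed)


p_case, p_control = 0.3, 0.1  # arbitrary values for illustration only
odds_ratio = (p_case * (1 - p_control)) / ((1 - p_case) * p_control)

two_exposed = prob_case_exposed(p_case, p_control, total_exposed=2)
one_exposed = prob_case_exposed(p_case, p_control, total_exposed=1)
print(math.isclose(two_exposed, 2 * odds_ratio / (2 * odds_ratio + 1)))  # True
print(math.isclose(one_exposed, odds_ratio / (odds_ratio + 2)))          # True
```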
Applying to example data
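\[
P(\text{data}/OR) = \left(\frac{2OR}{2OR+1}\right)^{2} \left(1 - \frac{2OR}{2OR+1}\right)^{0} \left(\frac{OR}{OR+2}\right)^{12} \left(1 - \frac{OR}{OR+2}\right)^{1}
\]
\[
= \left(\frac{2OR}{2OR+1}\right)^{2} \left(\frac{1}{2OR+1}\right)^{0} \left(\frac{OR}{OR+2}\right)^{12} \left(\frac{2}{OR+2}\right)^{1}
\]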
A little complicated to solve further…
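Although the likelihood is awkward to maximize by hand, it is easy to maximize numerically. The sketch below uses a plain golden-section search on the log-likelihood over an assumed search range of 0.01 to 1000; the search method and range are choices made here for illustration, not part of the lecture.

```python
import math


def log_likelihood(odds_ratio):
    """Log of P(data / OR) for the 2:1 matched CEA data."""
    p_two = 2 * odds_ratio / (2 * odds_ratio + 1)  # case exposed / 2 total exposed
    p_one = odds_ratio / (odds_ratio + 2)          # case exposed / 1 total exposed
    return (2 * math.log(p_two) + 0 * math.log(1 - p_two)
            + 12 * math.log(p_one) + 1 * math.log(1 - p_one))


def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(x1) < f(x2):
            lo, x1 = x1, x2
            x2 = lo + inv_phi * (hi - lo)
        else:
            hi, x2 = x2, x1
            x1 = hi - inv_phi * (hi - lo)
    return (lo + hi) / 2


mle = golden_section_max(log_likelihood, 0.01, 1000.0)
print(round(mle, 2))  # 25.06
```

Setting the derivative of the log-likelihood to zero gives the quadratic 2·OR² − 49·OR − 28 = 0, whose positive root agrees with the numerical answer.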
Applying to example data
BD give a simpler, robust estimate of OR for 2:1 matching:
\[
OR = \frac{1(\#\text{ where 2 total exposed \& case exposed}) + 2(\#\text{ where 1 total exposed \& case exposed})}{2(\#\text{ where 2 total exposed \& 2 controls exposed}) + 1(\#\text{ where 1 total exposed \& control exposed})}
\]
\[
= \frac{1(2) + 2(12)}{2(0) + 1(1)} = 26.0
\]
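The estimate is simple enough to wrap in a small helper. This is a sketch with hypothetical parameter names that follow the four discordant table types of the 2:1 design.

```python
def two_to_one_or(two_exposed_case_exposed, one_exposed_case_exposed,
                  two_exposed_both_controls, one_exposed_control):
    """Weighted ratio estimate of the OR for 2:1 matched sets.

    Arguments are counts of matched sets:
      two_exposed_case_exposed  - 2 total exposed, case is one of them
      one_exposed_case_exposed  - 1 total exposed, and it is the case
      two_exposed_both_controls - 2 total exposed, both are controls
      one_exposed_control       - 1 total exposed, and it is a control
    """
    numerator = 1 * two_exposed_case_exposed + 2 * one_exposed_case_exposed
    denominator = 2 * two_exposed_both_controls + 1 * one_exposed_control
    return numerator / denominator


estimate = two_to_one_or(2, 12, 0, 1)
print(estimate)  # 26.0
```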