Reading: Chapter 3.3, 8
I. Multiple Control Groups
The central concern in an observational study that compares treated and control groups matched for observed covariates is hidden bias: within a matched set, the treated and control subjects may differ on an unobserved covariate.
An observational study has multiple control groups if it has several distinct groups of subjects who did not receive the treatment. In a randomized experiment, every control is denied the treatment for the same reason, namely, the toss of a coin. In an observational study, there may be several distinct ways that the treatment is denied to a subject. If these several control groups have outcomes that differ substantially and significantly after observed covariates have been adjusted for, then this cannot reflect an effect of the treatment, since no control subject received the treatment. It must reflect, instead, some form of hidden bias.
Example 1: Occupational Exposure to Hydrocarbons and Kidney Disease
To investigate whether occupational exposures to hydrocarbons cause renal disorders, Douglas and Carney (1998, Occupational and Environmental Medicine) compared 92 road workers exposed to asphalt or bitumen fumes to two control groups, namely, “38 hard rock quarry workers not occupationally exposed to hydrocarbons, and 43 office workers also not exposed to hydrocarbons.” They identified renal function abnormalities using blood and urine tests.
Renal function    Road   Office   Quarry
Abnormal            24        4        2
Normal              68       39       36
Total               92       43       38
Comparing the two control groups (office vs. quarry) with Fisher’s exact test for a 2x2 table shows no significant difference between them, so this comparison gives no evidence of hidden bias.

fisher.test(matrix(c(4, 39, 2, 36), ncol = 2))

    Fisher's Exact Test for Count Data

data:  matrix(c(4, 39, 2, 36), ncol = 2)
p-value = 0.6792
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.2455057 21.4091805
sample estimates:
odds ratio 
  1.832704 
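As a quick check on what fisher.test reports, the sketch below (row and column labels added for illustration) compares its odds-ratio estimate with the simple cross-product ratio of the office/quarry table. fisher.test reports a conditional maximum likelihood estimate, so the two numbers differ slightly even though both estimate the same odds ratio.

```r
# Office vs. quarry controls from the renal-function table (labels added for readability)
controls <- matrix(c(4, 39, 2, 36), ncol = 2,
                   dimnames = list(Renal = c("Abnormal", "Normal"),
                                   Group = c("Office", "Quarry")))

# Sample (cross-product) odds ratio: (4 * 36) / (39 * 2)
cross_product_or <- (controls[1, 1] * controls[2, 2]) /
  (controls[2, 1] * controls[1, 2])

# fisher.test estimates the odds ratio by conditional maximum likelihood
fisher_result <- fisher.test(controls)

c(cross_product = cross_product_or,
  conditional_mle = unname(fisher_result$estimate),
  p_value = fisher_result$p.value)
```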
Example 2: Case-Control Study
A case-control study compares the treatment histories of groups of subjects defined by their responses, that is, a case group and one or more noncase or “control” groups (the textbook refers to the noncases as referents). See Section 3.3 of the textbook. For example, Doll and Hill (1952, British Medical Journal) compared the smoking histories of lung cancer patients (the cases) and patients with other diseases in the same hospital (the “control” or noncases). This is quite different from a cohort study, in which smokers and nonsmokers are compared with respect to subsequent outcomes. There is, however, a key result due to Cornfield (1951, Journal of the National Cancer Institute), which states that certain population odds ratios estimated from case-control studies are equal to the corresponding population odds ratios obtained from cohort studies of the same population.
A synthetic case-control study starts with the population of subjects and draws a random sample of cases and a separate random sample of noncases, possibly after stratification using the observed covariates X. Synthetic case-control studies are typically conducted when there is a computerized database describing the entire population of subjects but the study requires the costly collection of additional data not in the database.
For a synthetic case-control study, at a fixed $X = x$, define $\psi(x)$ to be the ratio of the odds of disease under the treatment to the odds of disease under the control, i.e.,

$$\psi(x) = \frac{\Pr(r_T = 1 \mid X = x) \,/\, \Pr(r_T = 0 \mid X = x)}{\Pr(r_C = 1 \mid X = x) \,/\, \Pr(r_C = 0 \mid X = x)}.$$

Thus, $\psi(x)$ is a measure of the effect of the treatment on the subpopulation of subjects with $X = x$.
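The definition of $\psi(x)$ can be written as a small helper. This is a sketch: the function name psi is hypothetical, it takes $\Pr(r_T = 1 \mid X = x)$ and $\Pr(r_C = 1 \mid X = x)$ as inputs, and the probabilities in the usage lines are made up for illustration.

```r
# Hypothetical helper: psi(x) from the probability of disease under treatment
# (p_treat = Pr(r_T = 1 | X = x)) and under control (p_control = Pr(r_C = 1 | X = x))
psi <- function(p_treat, p_control) {
  stopifnot(p_treat > 0, p_treat < 1, p_control > 0, p_control < 1)
  (p_treat / (1 - p_treat)) / (p_control / (1 - p_control))
}

psi(0.2, 0.1)  # odds 0.25 under treatment vs. 1/9 under control: psi = 2.25
psi(0.1, 0.1)  # no treatment effect in this stratum: psi = 1
```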
Let $S$ denote different types of noncases. There is no selection bias in the selection of noncases if $Z$ is independent of $S$ given $R$ and $X$, that is, $Z \perp S \mid R, X$.
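Let $\psi_s(x)$ be the ratio of the odds of exposure to the treatment for cases and noncases in stratum $S = s$:

$$\psi_s(x) = \frac{\Pr(Z = 1 \mid R = 1, X = x, S = s) \,/\, \Pr(Z = 0 \mid R = 1, X = x, S = s)}{\Pr(Z = 1 \mid R = 0, X = x, S = s) \,/\, \Pr(Z = 0 \mid R = 0, X = x, S = s)}.$$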
In a synthetic case-control study, $\psi_s(x)$ can be estimated directly from the corresponding empirical odds ratio at $X = x$.
Cornfield’s key result states that $\psi_s(x) = \psi(x)$ if there is no hidden bias in treatment assignment, that is, $(r_T, r_C)$ is independent of $Z$ given $X$, and there is no selection bias in the selection of noncases.
Proof:
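$$
\begin{aligned}
\psi_s(x) &= \frac{\Pr(Z = 1 \mid R = 1, X = x) \,/\, \Pr(Z = 0 \mid R = 1, X = x)}{\Pr(Z = 1 \mid R = 0, X = x) \,/\, \Pr(Z = 0 \mid R = 0, X = x)} && \text{(no selection bias in the selection of noncases)} \\
&= \frac{\Pr(R = 1 \mid Z = 1, X = x) \,/\, \Pr(R = 0 \mid Z = 1, X = x)}{\Pr(R = 1 \mid Z = 0, X = x) \,/\, \Pr(R = 0 \mid Z = 0, X = x)} && \text{(by Bayes' theorem)} \\
&= \frac{\Pr(r_T = 1 \mid Z = 1, X = x) \,/\, \Pr(r_T = 0 \mid Z = 1, X = x)}{\Pr(r_C = 1 \mid Z = 0, X = x) \,/\, \Pr(r_C = 0 \mid Z = 0, X = x)} && \text{($R = r_T$ when $Z = 1$ and $R = r_C$ when $Z = 0$)} \\
&= \psi(x) && \text{(no hidden bias: $(r_T, r_C) \perp Z \mid X$)}.
\end{aligned}
$$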
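A small simulation can make Cornfield's result concrete. The sketch below assumes a single covariate stratum $X = x$, a treatment assigned at random (so there is no hidden bias), and made-up disease probabilities of 0.2 under treatment and 0.1 under control, so that $\psi = 2.25$. It then draws a synthetic case-control sample (a random sample of cases and a separate random sample of noncases) and computes the empirical odds ratio of exposure, which should land near 2.25. Population size, sample sizes and probabilities are all hypothetical.

```r
set.seed(1)
N <- 200000                        # hypothetical population size in one stratum X = x
z   <- rbinom(N, 1, 0.5)           # treatment assigned at random: no hidden bias
r_t <- rbinom(N, 1, 0.2)           # potential response under treatment (made-up risk)
r_c <- rbinom(N, 1, 0.1)           # potential response under control (made-up risk)
r   <- ifelse(z == 1, r_t, r_c)    # observed response R
true_psi <- (0.2 / 0.8) / (0.1 / 0.9)   # psi(x) from the definition, 2.25

# Synthetic case-control sample: separate random samples of cases and noncases
n_sample <- 5000
cases    <- sample(which(r == 1), n_sample)
noncases <- sample(which(r == 0), n_sample)
tab <- table(R = rep(c(1, 0), each = n_sample), Z = z[c(cases, noncases)])

# Empirical odds ratio of exposure, cases vs. noncases (estimates psi_s(x))
psi_hat <- (tab["1", "1"] / tab["1", "0"]) / (tab["0", "1"] / tab["0", "0"])
c(true = true_psi, estimate = psi_hat)
```

Sampling on the response changes how many cases and noncases are seen, but not the odds ratio, which is why the retrospective estimate recovers the prospective $\psi(x)$.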
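If the treatment has no effect, then $\psi(x) = 1$, and hence, under the two conditions of Cornfield’s result, $\psi_s(x) = \psi(x) = 1$ for all $x$. We can test this using the Mantel-Haenszel test of whether exposure is independent of being a case within each stratum of $x$. The Mantel-Haenszel test combines across strata the 2x2 comparisons that Fisher’s exact test makes within a single stratum.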
Hiller, Giacometti and Yuen (1977, American Journal of Epidemiology) studied the effects of sunlight on the risk of cataract. The treatment consisted of exposure to more than 3000 hours of sunshine each year ($Z = 1$) as opposed to exposure to less than 2400 hours ($Z = 0$). Here “exposure” refers simply to living in a region of the United States with these levels of total annual exposure. Cataract cases ($R = 1$) were obtained from a registry. Noncases were drawn from three strata of noncases in the population: noncases in the same registry having diabetic retinopathy ($R = 0, S = 1$), noncases in the same registry having severe myopia ($R = 0, S = 2$), and noncases in the same registry having optic nerve disease ($R = 0, S = 3$). The complete population also contains noncases from another stratum from which no noncases are available, namely the stratum of all noncases not in the registry, possibly because they have no eye disease.
The analysis stratifies on the covariates age and sex.
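Counts of cataract cases (R = 1) and noncases (R = 0) by age, sex and sunlight exposure Z. Noncase groups: S = 1 diabetic retinopathy, S = 2 severe myopia, S = 3 optic nerve disease; the last column is the total over the three noncase groups.

Age     Sex      Z   R=1   R=0,S=1   R=0,S=2   R=0,S=3   R=0 (all)
20-44   Male     1     8         9        11        45          65
20-44   Male     0    33        96        56       204         356
20-44   Female   1     6         3         6        26          35
20-44   Female   0    23        54        29       139         222
46-64   Male     1    19        18        16        48          82
46-64   Male     0   139       172        79       226         477
46-64   Female   1    30        13         7        25          45
46-64   Female   0   114       222        95       134         451
65-74   Male     1    33         3         9        11          23
65-74   Male     0    76        90        36        84         210
65-74   Female   1    26        13         3        11          27
65-74   Female   0    99       185        48        49         282
75+     Male     1   121         2         8        15          25
75+     Male     0   172        41        22        70         133
75+     Female   1   165        14         5        13          32
75+     Female   0   364       123        50        64         237

# Cases (R = 1) vs. all noncases (R = 0) by exposure Z, in 8 age-sex strata
sunlight.data.array <- array(c(8,65,33,356, 6,35,23,222, 19,82,139,477, 30,45,114,451,
  33,23,76,210, 26,27,99,282, 121,25,172,133, 165,32,364,237), dim = c(2, 2, 8),
  dimnames = list(R = c(1, 0), Z = c(1, 0), Strata = c("20-44 Male", "20-44 Female",
  "46-64 Male", "46-64 Female", "65-74 Male", "65-74 Female", "75+ Male", "75+ Female")))
mantelhaen.test(sunlight.data.array)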
    Mantel-Haenszel chi-squared test with continuity correction

data:  sunlight.data.array
Mantel-Haenszel X-squared = 89.2826, df = 1, p-value < 2.2e-16
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 2.064884 3.009394
sample estimates:
common odds ratio 
         2.492799 

Cases were substantially more likely than noncases to have been exposed, suggesting that exposure to sunlight increases the risk of cataract: estimated common odds ratio 2.49; 95% CI (2.06, 3.01).
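To see how mantelhaen.test pools the eight strata, the sketch below recomputes the continuity-corrected Mantel-Haenszel statistic and the Mantel-Haenszel common odds ratio by hand from the same counts. It assumes the classical 2x2xK form of the test: in each stratum the number of exposed cases is compared with its mean under the hypergeometric distribution (the conditional distribution behind Fisher's exact test), the differences are summed across strata, and the squared sum, less a 0.5 continuity correction, is divided by the summed hypergeometric variances. Variable names are illustrative.

```r
# Same counts as sunlight.data.array: R (1, 0) x Z (1, 0) x 8 age-sex strata
x <- array(c(8,65,33,356, 6,35,23,222, 19,82,139,477, 30,45,114,451,
             33,23,76,210, 26,27,99,282, 121,25,172,133, 165,32,364,237),
           dim = c(2, 2, 8))

a  <- x[1, 1, ]                  # exposed cases (R = 1, Z = 1) in each stratum
n1 <- x[1, 1, ] + x[1, 2, ]      # number of cases in each stratum
m1 <- x[1, 1, ] + x[2, 1, ]      # number of exposed subjects in each stratum
nk <- apply(x, 3, sum)           # stratum totals

expected <- n1 * m1 / nk                                        # hypergeometric mean of a
variance <- n1 * (nk - n1) * m1 * (nk - m1) / (nk^2 * (nk - 1))  # hypergeometric variance

# Pooled statistic with 0.5 continuity correction
mh_stat <- (abs(sum(a - expected)) - 0.5)^2 / sum(variance)
# Mantel-Haenszel common odds ratio: sum(a d / n) / sum(b c / n)
mh_or <- sum(x[1, 1, ] * x[2, 2, ] / nk) / sum(x[1, 2, ] * x[2, 1, ] / nk)

builtin <- mantelhaen.test(x)
rbind(by_hand    = c(statistic = mh_stat, odds_ratio = mh_or),
      mantelhaen = c(unname(builtin$statistic), unname(builtin$estimate)))
```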
This study is not a synthetic case-control study because the noncases were not randomly sampled from the population.
There is a possibility of selection bias in the selection of noncases. For example, in a study in which the cases, patients with lung cancer at a hospital, are compared with noncases selected from patients with cardiac disease in the same hospital, the odds ratio linking lung cancer with cigarette smoking would be too small, because smoking causes both lung cancer and cardiac disease. In Hiller et al.’s study, there would be selection bias if sunlight exposure increases the risk of the other diseases, namely diabetic retinopathy, myopia and optic nerve disease.
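The direction of the bias in the lung-cancer illustration can be checked with a small calculation. Every number below is hypothetical, chosen only to show the mechanism: smoking prevalence 0.3, lung-cancer risk 0.10 for smokers and 0.01 for nonsmokers, and, among people without lung cancer, a cardiac-disease risk of 0.10 for smokers and 0.05 for nonsmokers. Using all noncases recovers the population odds ratio, as Cornfield's result says; using only cardiac patients as noncases raises the odds of smoking among noncases and shrinks the odds ratio.

```r
p_smoke   <- 0.3                                   # hypothetical exposure prevalence
p_cancer  <- c(smoker = 0.10, nonsmoker = 0.01)    # hypothetical lung-cancer risks
p_cardiac <- c(smoker = 0.10, nonsmoker = 0.05)    # hypothetical cardiac risks among noncases
weights   <- c(smoker = p_smoke, nonsmoker = 1 - p_smoke)

odds <- function(p) p / (1 - p)
cohort_or <- odds(p_cancer[["smoker"]]) / odds(p_cancer[["nonsmoker"]])

# Odds of smoking among cases
case_w    <- weights * p_cancer
case_odds <- case_w[["smoker"]] / case_w[["nonsmoker"]]
# Odds of smoking among all noncases
noncase_w <- weights * (1 - p_cancer)
all_odds  <- noncase_w[["smoker"]] / noncase_w[["nonsmoker"]]
# Odds of smoking among noncases selected because they have cardiac disease
cardiac_w    <- noncase_w * p_cardiac
cardiac_odds <- cardiac_w[["smoker"]] / cardiac_w[["nonsmoker"]]

c(cohort           = cohort_or,
  all_noncases     = case_odds / all_odds,      # equals the cohort odds ratio
  cardiac_noncases = case_odds / cardiac_odds)  # smaller by the factor 0.10 / 0.05
```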
If there is no hidden bias in treatment assignment and no selection bias in the selection of noncases, then a comparison of two noncase groups $S = s$ and $S = s'$ should show no evidence of a treatment effect. Informally, if adjustment for $x$ suffices to estimate the treatment effect $\psi(x)$, then the noncase groups will not differ from one another after adjustment for $x$, in the sense that

$$\frac{\Pr(Z = 1 \mid R = 0, S = s, X = x) \,/\, \Pr(Z = 0 \mid R = 0, S = s, X = x)}{\Pr(Z = 1 \mid R = 0, S = s', X = x) \,/\, \Pr(Z = 0 \mid R = 0, S = s', X = x)} = 1 \quad \text{for all } s, s' \text{ and all } x.$$
We can test for the combination of hidden bias/selection bias by comparing noncase groups. Diabetic retinopathy noncases (S = 1) vs. myopia noncases (S = 2):

diabetic.retinopathy.vs.myopia.array <- array(c(9,11,96,56, 3,6,54,29, 18,16,172,79, 13,7,222,95,
  3,9,90,36, 13,3,185,48, 2,8,41,22, 14,5,123,50), dim = c(2, 2, 8),
  dimnames = list(S = c(1, 2), Z = c(1, 0), Strata = c("20-44 Male", "20-44 Female",
  "46-64 Male", "46-64 Female", "65-74 Male", "65-74 Female", "75+ Male", "75+ Female")))
mantelhaen.test(diabetic.retinopathy.vs.myopia.array)
    Mantel-Haenszel chi-squared test with continuity correction

data:  diabetic.retinopathy.vs.myopia.array
Mantel-Haenszel X-squared = 13.4261, df = 1, p-value = 0.0002481
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.3533570 0.7222181
sample estimates:
common odds ratio 
         0.505174 

Diabetic retinopathy noncases (S = 1) vs. optic nerve disease noncases (S = 3):

diabetic.retinopathy.vs.optic.nerve.array <- array(c(9,45,96,204, 3,26,54,139, 18,48,172,226, 13,25,222,134,
  3,11,90,84, 13,11,185,49, 2,15,41,70, 14,13,123,64), dim = c(2, 2, 8),
  dimnames = list(S = c(1, 3), Z = c(1, 0), Strata = c("20-44 Male", "20-44 Female",
  "46-64 Male", "46-64 Female", "65-74 Male", "65-74 Female", "75+ Male", "75+ Female")))
mantelhaen.test(diabetic.retinopathy.vs.optic.nerve.array)
    Mantel-Haenszel chi-squared test with continuity correction

data:  diabetic.retinopathy.vs.optic.nerve.array
Mantel-Haenszel X-squared = 40.4149, df = 1, p-value = 2.054e-10
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.2872970 0.5211056
sample estimates:
common odds ratio 
        0.3869265 

Myopia noncases (S = 2) vs. optic nerve disease noncases (S = 3):

myopia.vs.optic.nerve.array <- array(c(11,45,56,204, 6,26,29,139, 16,48,79,226, 7,25,95,134,
  9,11,36,84, 3,11,48,49, 8,15,22,70, 5,13,50,64), dim = c(2, 2, 8),
  dimnames = list(S = c(2, 3), Z = c(1, 0), Strata = c("20-44 Male", "20-44 Female",
  "46-64 Male", "46-64 Female", "65-74 Male", "65-74 Female", "75+ Male", "75+ Female")))
mantelhaen.test(myopia.vs.optic.nerve.array)
    Mantel-Haenszel chi-squared test with continuity correction

data:  myopia.vs.optic.nerve.array
Mantel-Haenszel X-squared = 1.2541, df = 1, p-value = 0.2628
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.6131166 1.1287240
sample estimates:
common odds ratio 
        0.8318891 
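The three comparisons above can also be run from a single array of noncase counts. In this sketch the array name noncases and the loop over combn(3, 2) are illustrative choices, not part of the original analysis; the counts are the R = 0, S = 1, 2, 3 columns of the table above, arranged as S x Z x stratum, so each pair reproduces one of the three mantelhaen.test calls.

```r
# Noncase counts arranged as S (3 groups) x Z (1, 0) x 8 age-sex strata
noncases <- array(c(9,11,45,96,56,204,    3,6,26,54,29,139,
                    18,16,48,172,79,226,  13,7,25,222,95,134,
                    3,9,11,90,36,84,      13,3,11,185,48,49,
                    2,8,15,41,22,70,      14,5,13,123,50,64),
                  dim = c(3, 2, 8),
                  dimnames = list(S = c("diabetic retinopathy", "myopia", "optic nerve disease"),
                                  Z = c(1, 0),
                                  Strata = c("20-44 Male", "20-44 Female", "46-64 Male",
                                             "46-64 Female", "65-74 Male", "65-74 Female",
                                             "75+ Male", "75+ Female")))

pairs <- combn(3, 2)   # columns are the pairs (1, 2), (1, 3), (2, 3)
results <- apply(pairs, 2, function(p) {
  test <- mantelhaen.test(noncases[p, , ])  # 2 x 2 x 8 sub-array for groups p[1] and p[2]
  c(s = p[1], s_prime = p[2],
    odds_ratio = unname(test$estimate), p_value = test$p.value)
})
pairwise <- t(results)   # one row per pair of noncase groups
pairwise
```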
This study shows evidence of hidden bias and/or selection bias.
The diabetic retinopathy noncases differ significantly from both the myopia noncases (p = 0.00025) and the optic nerve disease noncases (p = 2.1e-10), whereas the myopia and optic nerve disease noncases do not differ significantly from each other (p = 0.26).
To reject the hypothesis of combined no hidden bias/selection bias, as has been done here, is not to conclude that the treatment has no effect, nor to conclude that the study was poorly conducted, nor to conclude that the study’s results are uninteresting or undeserving of publication. Rather, to reject the hypothesis of no hidden bias/selection bias is to conclude that adjustment for x alone is insufficient to remove bias, and therefore that the conventional estimates and significance levels cannot be taken at face value. Indeed, a good observational study will be designed to permit several tests of the hypothesis of no hidden bias/selection bias. The results of such tests are simply part of the record of the study’s results, intended to aid sober interpretation.
Course Summary
I. Potential Outcomes Model
The potential outcomes model provides a framework for defining the causal effect of a treatment.
II. Randomized Experiments
1. Randomized experiments are the ideal way to estimate the causal effects of a treatment.
2. Randomization inference provides a way to analyze experiments that makes no assumptions beyond randomization.
III. Observational Studies
A. Adjusting for Overt Bias
1. Propensity Scores
2. Pair Matching
3. Matching with multiple controls, full matching
B. Sensitivity analysis for hidden bias
C. Instrumental variables methods
D. Tests for hidden bias
1. Known Effects
2. Multiple Control Groups