Chapter 3-8. Bias And Confounding

We consider error to be the difference between the unknown correct effect measure value, such
as an incidence rate ratio, and the study’s observed effect measure value (Rothman, 2002, p.94).
If the difference True IRR − Estimated IRR is "small", we say that the study is accurate and has little error; if it is "large", we say the study is inaccurate and has large error.
Error in a research study can be classified either as random error or systematic error.
Random errors are those that can be reduced to zero if the sample size becomes infinitely large,
such as the inaccuracy of an incidence proportion estimate. This is a principle called statistical
regularity, which is demonstrated in the K30 Intro Biostatistics course.
Systematic errors are those that remain even when the sample size is infinitely increased.
For example, taking measurements with an improperly calibrated heart monitor, which always
measures the heart rate too low, would be a systematic error that would remain no matter how
large the sample size became (called bias due to instrumental error; Last, 1995).
Bias is another term for systematic error.
Bias can result from the investigator’s attitude, the way in which subjects have been selected, or
the way study variables were measured.
A study can also be biased due to some confounding factor that is not completely controlled, which Rothman and a few other experts refer to as confounding bias.
It is useful to think of confounding as something separate from bias, however, since different
approaches are used to avoid these errors in a research study.
Unlike Rothman, Vandenbroucke et al (2007, p.W-172), in the STROBE statement for
reporting observational studies (STROBE_Observational_Studies_AnnIternMed2007.pdf), insist
on the distinction between bias and confounding:
“Bias and confounding are not synonymous. Bias arises from flawed information or
subject selection so that a wrong association is found. Confounding produces relations
that are factually right, but that cannot be interpreted causally because some underlying,
unaccounted-for factor is associated with both exposure and outcome.”
Some other examples of bias are listed in the following box.
_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 3-8 (revision 16 May 2010)
p. 1
A Sample of Specific Biases (Last, 1995)
Here are the definitions of a few specific biases selected from a much larger list that Last
included in his epidemiology dictionary.
Bias in assumption: (synonym: conceptual bias) Error arising from faulty logic or premises or
mistaken beliefs on the part of the investigator.
Design bias: The difference between a true value and that obtained as a result of faulty design of
study.
Detection bias: Bias due to systematic error(s) in methods of ascertainment, diagnosis, or
verification of cases in an epidemiologic study.
Bias of Interpretation: Error arising from inference and speculation.
Recall bias: Systematic error due to differences in accuracy or completeness of recall to memory
of past events or experiences.
Response bias: Systematic error due to differences in characteristics between those who choose
or volunteer to take part in a study and those who do not.
Bias due to withdrawals: A difference between the true value and that actually observed in a
study due to the characteristics of those subjects who choose to withdraw.
Bias can be classified into three broad categories:
selection bias
information bias
confounding
Exercise Look at the article by Delgado-Rodríguez and Llorca (2004).
1) Notice the long list of sources of bias, beginning on the second page. Sackett (1979) also
published a list of biases, which has been widely cited.
2) Notice that the biases are categorized into the three broad categories.
Exercise Look at the article by Grimes and Schulz (2002).
1) Notice these three broad categories of bias used as section headings.
Selection Bias
Selection bias is a systematic error in a study resulting from the procedures used to select
subjects and from factors that influence study participation (Rothman, 2002, p.96).
For example, if the association between exposure and disease, such as that measured with the
odds ratio, is different between participants and non-participants in a study, then selection bias is
introduced into the study.
An example of a selection bias is what is often referred to as the healthy worker effect. When
workers of a specific occupation are compared to the general population, the occupation tends to
have a lower overall death rate. This is because people in the occupation are healthy, while
people in the general population include many people who cannot work due to ill health. The
correct way to design such a study is to make a comparison with workers in another occupation.
Berkson’s Bias
This is a selection bias, a classic bias found in nearly all epidemiology textbooks, that can occur
in a case-control design that uses hospital controls.
Definition: Berkson’s bias (Last, 1995, p.15)
A form of selection bias that leads hospital cases and controls in a case control study to be
systematically different from one another.1 This occurs when the combination of
exposure and disease under study increases the risk of admission to hospital, leading to a
systematically higher exposure rate among the hospital cases than the hospital controls;
this, in turn, systematically distorts the odds ratio.
_______________________
1. Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bull 1946;2:47-53.
Of course the bias can go in the other direction as well, leading to a systematically higher
exposure rate among the hospital controls.
This bias is also referred to as "Berkson's fallacy", "Berksonian bias", and "selection bias".
Here is how the bias could operate.
Suppose a researcher came up with the idea to study whether or not water pills (diuretics) were a
risk factor for bladder cancer. The researcher finds N=500 cases of bladder cancer from a
cancer registry. For controls, he decides to use N=500 patients from the hospital he works at,
admitted for any reason but cancer.
Suppose that in the general population, the incidence of bladder cancer is equal among those who
use diuretics and those who don't (odds ratio = 1, or no association).
Designing a case-control study, now, the researcher begins with the 500 bladder cancer cases.
Starting with the cases (row totals shown)

                       Past or Current Diuretic Use
                       Yes            No              Row
Bladder Cancer         (exposed)      (unexposed)     Total
  Yes                  75 (15%)       425 (85%)       500
  No
If there really is no association, a random sample of N=500 controls from the general population
(not hospitalized controls) would have the same distribution of diuretic use.
If Had Used General Population Controls (row totals shown)

                       Past or Current Diuretic Use
                       Yes            No              Row
Bladder Cancer         (exposed)      (unexposed)     Total
  Yes                  75 (15%)       425 (85%)       500
  No                   75 (15%)       425 (85%)       500

Odds Ratio = (75 × 425)/(75 × 425) = 1.0
Hypertension (or high blood pressure) is an early stage of heart disease. Diuretics are usually
given as the initial treatment for hypertension, and many patients are still using diuretics when
later hospitalized for heart disease events.
Since heart disease makes up a large share of hospitalizations, we should expect a random
sample of hospital controls to be more frequent users of diuretics. Thus, our data might look
like:
Using Hospital Controls (row totals shown)

                       Past or Current Diuretic Use
                       Yes            No              Row
Bladder Cancer         (exposed)      (unexposed)     Total
  Yes                  75 (15%)       425 (85%)       500
  No                   100 (20%)      400 (80%)       500

Odds Ratio = (75 × 400)/(425 × 100) = 0.71
Chi-square test, p = 0.038
Since no association exists in the population, this observed protective effect is attributable to
Berkson’s bias.
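The arithmetic behind these tables can be checked with a short Python sketch (Python is used here only for illustration, since the manual's own software is Stata; the odds_ratio helper is ours, and the counts are the hypothetical ones above):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as [[a, b], [c, d]]: ad/bc."""
    return (a * d) / (b * c)

# Cases: 75 exposed (diuretic users), 425 unexposed.
# General-population controls mirror the cases' exposure distribution.
or_population = odds_ratio(75, 425, 75, 425)

# Hospital controls are enriched for diuretic use (100 of 500, or 20%).
or_hospital = odds_ratio(75, 425, 100, 400)

print(round(or_population, 2))  # 1.0  (no association)
print(round(or_hospital, 2))    # 0.71 (spurious "protective" effect)
```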
Such a spurious association will not arise if either (Kraus, 1954; Lilienfeld and Stolley, 1994,
p.233):
1) the exposure does not affect hospitalization, that is, no person is hospitalized simply
because of the exposure; or
2) the rate of admission to the hospital for those persons with the disease is equal to those
without the disease.
Lilienfeld and Stolley (1994, p.233) clarify that condition 1) need only be an association, which
makes it difficult to judge, and they provide this example:
If the exposure is eye color, it could easily be assumed that this could not influence the
probability of hospitalization. It is possible, however, that persons with a particular eye
color more frequently belong to an ethnic group whose members are more frequently of a
particular social class, which in turn may influence the probability of hospitalization.
Lilienfeld and Stolley (1994, p.233) point out that condition 2 is rarely, if ever, met since
different diseases usually have different probabilities of hospitalization.
Exercise (good example of selecting controls to avoid Berkson’s bias)
Look at the case-control study published by Zhang et al (2005) in the American Journal
of Epidemiology. These researchers studied the following association:

                                       Breast Cancer
                                       Yes            No
                                       (cases)        (controls)
Use of Nonsteroidal
Antiinflammatory Drugs      Yes
(NSAID) [includes aspirin,
ibuprofen, etc.]            No
Under the section heading Controls (p.166) they state:
“Controls were selected from a pool of 3,906 women aged 30-79 years with no
history of cancer who had been admitted to the hospital for nonmalignant diseases
that we considered unrelated to NSAID use. Eligible diagnoses included
appendicitis, hernia of the abdominal cavity, and traumatic injury.”
Notice they are avoiding the exposure-hospitalization association (meeting condition 1
above).
In the Discussion section, 3rd paragraph, page 169, they state:
“Several characteristics of our study are noteworthy. Controls were selected from
women whose diagnoses were unrelated to NSAID use. The distributions of
NSAID use among subgroups of the controls were similar, suggesting the absence
of selection bias.”
Information Bias
Information bias results when a systematic error is made in information collected on study
subjects.
For a categorical scale measurement, this biased information is often referred to as misclassified,
since the study subject will be placed in an incorrect category.
For example, a heavy smoker who is categorized as a light smoker is misclassified.
Misclassification for either exposure or disease can be differential or nondifferential, which refer
to the mechanism for misclassification.
For exposure misclassification, the misclassification is nondifferential if it is unrelated to the
occurrence or presence of disease; if the misclassification of exposure is different for those with
and without disease, it is differential.
Similarly, misclassification of disease is nondifferential if it is unrelated to exposure; otherwise,
it is differential.
A common type of information bias is recall bias, which occurs in case-control studies where a
subject is interviewed to obtain exposure information after disease has occurred.
An example is a case-control study where mothers of babies with birth defects are asked to recall
exposures during pregnancy, such as taking nonprescription drugs. Given the stimulus of an
adverse pregnancy outcome, these mothers recall vividly what they wondered might have caused
the birth defect. Mothers of normal babies, however, have no stimulus for recall and thus forget
such exposures. This particular version of recall bias is known as maternal recall bias.
Exercise
1) Maternal recall bias is an example of:
   a) Nondifferential misclassification of disease
   b) Differential misclassification of disease
   c) Nondifferential misclassification of exposure
   d) Differential misclassification of exposure
2) Consider the following data table.

                                    Disease (Birth Defect)
                                    Yes        No
Exposure (Extra Strength     Yes    a          b
Tylenol used in 1st
Trimester)                   No     c          d

Would we expect the OR (OR = ad/bc) to be too large or too small?
Nondifferential Misclassification
With nondifferential misclassification, either exposure or disease (or both) is misclassified, but
the misclassification does not depend on a person’s status for the other variable.
In contrast to maternal recall bias, all people to some extent have difficulty remembering when
responding to survey questions. This tends to result in non-differential misclassification.
Nondifferential misclassification of a dichotomous exposure will always bias an effect, if there is
one, toward the null value. (Rothman, 2002, p.100)
Nondifferential misclassification in a hypothetical case-control study

                      Correct           Nondifferential Misclassification
                      Classification    20% of No → Yes*   20% of No → Yes,
                                                           20% of Yes → No
                      High-Fat Diet     High-Fat Diet      High-Fat Diet
                      Yes      No       Yes      No        Yes      No
Heart attack cases    250      450      340      360       290      410
Controls              100      900      280      720       260      740
Odds Ratio                5.0               2.4                2.0

*At first this column might appear as differential misclassification. It is
nondifferential, however, since both cases and controls are misclassified equally
by 20%.
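The table's arithmetic can be reproduced with a short Python sketch (the misclassify and odds_ratio helpers are ours, for illustration; the counts are the hypothetical ones above). Note that the same 20% error rate is applied to cases and controls alike, which is what makes the misclassification nondifferential:

```python
def misclassify(yes, no, p_no_to_yes=0.0, p_yes_to_no=0.0):
    """Move the stated fractions of subjects between exposure categories."""
    moved_in = no * p_no_to_yes    # No -> Yes
    moved_out = yes * p_yes_to_no  # Yes -> No
    return yes + moved_in - moved_out, no + moved_out - moved_in

def odds_ratio(case_yes, case_no, ctrl_yes, ctrl_no):
    return (case_yes * ctrl_no) / (case_no * ctrl_yes)

# Correctly classified data (high-fat diet exposure)
cases, controls = (250, 450), (100, 900)
print(odds_ratio(*cases, *controls))   # 5.0

# 20% of No -> Yes, applied equally to cases and controls
c1, k1 = misclassify(*cases, 0.2), misclassify(*controls, 0.2)
print(round(odds_ratio(*c1, *k1), 1))  # 2.4

# 20% of No -> Yes and 20% of Yes -> No
c2, k2 = misclassify(*cases, 0.2, 0.2), misclassify(*controls, 0.2, 0.2)
print(round(odds_ratio(*c2, *k2), 1))  # 2.0
```

Each step pulls the odds ratio from 5.0 toward the null value of 1.0, illustrating the bias toward the null stated above.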
If the exposure is not dichotomous, there may be bias toward the null value; but there may also
be bias away from the null value, depending on the categories to which individuals are
misclassified.
In general, nondifferential misclassification between two exposure categories will make the
effect estimates for those two categories converge toward one another.
In contrast, differential misclassification can either exaggerate (take the effect away from the null
in either the protective or deleterious direction) or underestimate an effect (take the effect
towards the null). (Rothman, 2002, p.99)
Exercise. Look at the article by Millard (1999).
1) Look at the possible sources of bias listed in the Data extraction section.
2) Look at the differences in RR between the studies judged likely biased and likely not biased in
the second half of the Main Results section.
3) Finally, look at the 2nd paragraph of Millard's commentary, where he suggests the possibility
of "diagnostic access bias".
Exercise (design-related bias in studies of diagnostic tests)
Look at the article by Lijmer et al (1999). Notice in the same pdf file that there is a correction to
their DOR formula. Using the diagnostic odds ratio (DOR), which is another measure of the
diagnostic accuracy of a test [see box], they show that studies of diagnostic tests are subject to
design-related bias.
Some Diagnostic Test Definitions

With the data in the required form for Stata:

                                   Test “probable value”
                                   disease present (+)   disease absent (−)
Gold Standard “true value”
  disease present (+)              a (true positives)    b (false negatives)    a+b
  disease absent (−)               c (false positives)   d (true negatives)     c+d
                                   a+c                   b+d
We define the following terminology, expressed as percents:

sensitivity = (true positives)/(true positives plus false negatives)
            = (true positives)/(all those with the disease)
            = a / (a + b) × 100

specificity = (true negatives)/(true negatives plus false positives)
            = (true negatives)/(all those without the disease)
            = d / (c + d) × 100

likelihood ratio positive (LR+) = sensitivity / (1 − specificity)
            = odds that a positive test result would be found in a patient
              with, versus without, a disease

likelihood ratio negative (LR−) = (1 − sensitivity) / specificity
            = odds that a negative test result would be found in a patient
              without, versus with, a disease

diagnostic odds ratio (DOR) = LR+ / LR− = (a/b)/(c/d) = ad/bc
            = odds of a positive test result in diseased persons relative to the
              odds of a positive test in nondiseased persons
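These definitions translate directly into code. Here is a Python sketch (the function and the counts a=90, b=10, c=20, d=80 are ours, for illustration only):

```python
def diagnostic_measures(a, b, c, d):
    """a = true positives, b = false negatives,
       c = false positives, d = true negatives."""
    sens = a / (a + b)
    spec = d / (c + d)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    dor = lr_pos / lr_neg  # algebraically equal to (a*d)/(b*c)
    return sens, spec, lr_pos, lr_neg, dor

sens, spec, lr_pos, lr_neg, dor = diagnostic_measures(90, 10, 20, 80)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # 90%, 80%
print(round(lr_pos, 2), round(lr_neg, 3), round(dor, 1))  # 4.5 0.125 36.0
```

Note that the DOR comes out the same whether computed as LR+/LR− or as ad/bc (90 × 80 / (10 × 20) = 36).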
Sensitivity Analysis
Since the researcher cannot be sure of the extent to which information bias has influenced the
study results, if at all, a correction for the bias cannot be made in the data analysis.
What clever researchers do in their papers, then, when the potential for information bias is a
concern, is to present a sensitivity analysis in the Discussion section. Otherwise, the reader
might just dismiss the paper altogether, being concerned that the results were too biased to draw
conclusions.
Greenland (1998, p.343) advises always including some level of sensitivity analysis in a
scientific paper:
“Potential biases due to unmeasured confounders, classification errors, and selection bias
need to be addressed in any thorough discussion of study results.”
Example: You are publishing a paper on a novel approach to convince patients, who visit your
clinic for medical reasons, to quit smoking. You randomly present the approach to
half of your patients, and then find that 15% of your study group quit smoking,
compared to 0% of the control group (RR = 0.85 for continued smoking, p < 0.001).

You recognize that the reader may think this result is too good to be true, the reader
perhaps wondering if some large part of the 15%, perhaps 14.5%, of the study group
was lying to you.

A clever thing to include in your paper is a sensitivity analysis, reporting the RR that
would have been the result if 5% lied and if 10% lied.
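A minimal Python sketch of that sensitivity analysis, reading RR = 0.85 as the risk ratio for continued smoking (85% in the study group vs 100% in controls); the function name and the assumed lying fractions are ours, not from the source:

```python
def rr_continued_smoking(quit_prop_treated, quit_prop_control=0.0):
    """Risk ratio for continued smoking, study group vs control group."""
    return (1 - quit_prop_treated) / (1 - quit_prop_control)

# Observed result: 15% of the study group reported quitting, 0% of controls
print(round(rr_continued_smoking(0.15), 2))  # 0.85

# Sensitivity analysis: suppose some reported quitters actually lied
for lied in (0.05, 0.10):
    true_quit = 0.15 - lied
    print(f"if {lied:.0%} lied: RR = {rr_continued_smoking(true_quit):.2f}")
```

Even under the worst assumption considered (10% lied), the RR stays below 1, which is the kind of reassurance a sensitivity analysis is meant to give the reader.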
How to conduct a sensitivity analysis is presented in its own chapter of this course manual.
Confounding
A simple definition of confounding would be
the confusion, or mixing, of effects
This definition implies that the effect of the exposure is mixed together with the effect of another
variable, leading to a bias.
We call such variables confounding variables or confounders.
Example of Confounding
Rothman (2002, p.101) provides a classic example of confounding, which is the relation between
birth order and the occurrence of Down syndrome.
[Figure: prevalence of Down syndrome (affected babies per 1,000 live births) by birth order (1, 2, 3, 4, 5+); prevalence rises with birth order, to about 1.7 per 1,000 at birth order 5+.]
These data suggest that the prevalence of Down syndrome is associated with (perhaps “causally”)
birth order.
The effect of birth order, however, is a blend of whatever effect birth order has by itself and the
effect of another variable that is closely correlated with birth order.
The other variable is the age of the mother.
[Figure: prevalence of Down syndrome (affected babies per 1,000 live births) by maternal age (<20, 20-24, 25-29, 30-34, 35-39, 40+); prevalence rises steeply with maternal age, to about 8.5 per 1,000 at age 40+.]
This figure gives the relation between mother’s age and the occurrence of Down syndrome from
the same data.
It indicates a much stronger relationship (8.5 per 1000 for the highest maternal-age category vs
1.7 per 1000 for the highest birth-order category in the previous graph; notice the difference in
the scale of the Y axis).
[Figure: the two crude displays side by side: affected babies per 1,000 live births by birth order (1 to 5+) and by maternal age (<20 to 40+); note the difference in Y-axis scale.]
Because birth order and the age of the mother are highly correlated, we can expect that mothers
giving birth to their fifth baby are, as a group, considerably older than mothers giving birth to
their first baby.
Thus, the birth order effect is mixed with the mother’s age effect. We call this mixing of effects
confounding. In this example, the birth order effect is confounded with maternal age.
We can resolve this confounding by considering both effects simultaneously.
In the stratified graph (a two-way display of birth order and maternal age), we see a striking
trend with maternal age (cases increase with maternal age within each birth order), while there is
no trend with birth order (cases are essentially constant within each maternal age).
This last graph is an example of a stratified display, in contrast to the previous graphs which are
examples of a crude display.
The stratified display reveals that the erroneous birth order effect was due to confounding with
maternal age.
Properties of a confounding factor
A confounding factor must have an effect on disease and it must be imbalanced between the
exposure groups to be compared.
That is, a confounding factor must have two associations:
1) A confounder must be associated with the disease
2) A confounder must be associated with exposure.
Diagrammatically, the two necessary associations for confounding are:

                    Confounder
                   /          \
        association            association
                 /              \
        Exposure ---------------- Disease
                confounded effect
There is also a third requirement.
A factor that is an effect of the exposure and an intermediate step in the causal pathway from
exposure to disease will have the above associations, but causal intermediates are not
confounders; they are part of the effect that we wish to study.
Thus, the third property of a confounder is as follows:
3) A confounder must not be an effect of the exposure.
Rothman (2002, p.164) points out that the degree of confounding is not dependent upon
statistical significance, but rather upon the strength of the associations between the confounder
and both exposure and disease. He advocates that a better way to evaluate confounding is to
statistically control for the potential confounder, using stratification or regression analyses, and
determine whether the unconfounded result (adjusted model) differs from the potentially
confounded result (the unadjusted model). If they differ, then confounding is present in the
unadjusted model.
Example (Stoddard’s hypothetical data)
A researcher wants to study the association of a diet high in pizza and the disease outcome skin
acne. The following data are collected using a cross-sectional study design:
                        Eat Pizza (at least twice per month)
                        Yes            No
Noticeable     Yes      80             40
Skin Acne      No       120            149

Odds Ratio = 2.48, p < 0.001
Thus, an association between pizza consumption and skin acne is observed in this crude analysis.
One might consider, however, if the association is confounded by age. The data are stratified and
these are the results:
Teenager (age 13-19)

                        Eat Pizza (at least twice per month)
                        Yes            No
Noticeable     Yes      60             10
Skin Acne      No       20             4

Odds Ratio = 1.20, p = 0.78
Adult (age 20+)

                        Eat Pizza (at least twice per month)
                        Yes            No
Noticeable     Yes      20             30
Skin Acne      No       100            145

Odds Ratio = 0.97, p = 0.91
Now we see that the association does not hold in either age group.
Since the crude display gives a different result than the stratified display of the data, the result is
confounded. That is, the pizza-acne association observed in the crude display is confounded by
age.
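The crude and stratum-specific odds ratios above are easy to verify with a short Python sketch (the odds_ratio helper is ours; the counts are the hypothetical data above):

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc for a 2x2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Crude table: acne yes/no (rows) by pizza yes/no (columns)
print(round(odds_ratio(80, 40, 120, 149), 2))  # 2.48

# Stratified by age, the association disappears in both strata
print(round(odds_ratio(60, 10, 20, 4), 2))     # 1.2  (teenagers)
print(round(odds_ratio(20, 30, 100, 145), 2))  # 0.97 (adults)
```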
Filling in the labels to our diagram,

                    Confounder
                   /          \
        association            association
                 /              \
        Exposure ---------------- Disease
                confounded effect

we have

                       Age
                   /          \
        association            association
                 /              \
        Pizza ------------------- Acne
                confounded effect
There is an association between age and pizza, as 80/94, or 85%, of teenagers eat pizza regularly,
while 120/295, or 41%, of adults eat pizza regularly. This “imbalance” represents an
“association” (as age increases, pizza eating decreases).
                       Age
                   /          \
        association            association
        (85% vs 41%)
                 /              \
        Pizza ------------------- Acne
                confounded effect
There is a well-known association between age and acne, acne being mostly a teenager disease.
In these data, 70/94, or 74%, of teenagers have acne, while only 50/295, or 17%, of adults have
acne. This imbalance represents an association (as age increases, acne decreases).

                       Age
                   /          \
        association            association
        (85% vs 41%)           (74% vs 17%)
                 /              \
        Pizza ------------------- Acne
                confounded effect
We therefore have the two associations that must be present for age to be a confounder:
1) A confounder must be associated with the disease (age-acne association)
2) A confounder must be associated with exposure (age-pizza association)
We also satisfied the third property of a confounder:
3) A confounder must not be an effect of the exposure.
(This holds, since eating pizza does not cause a person to be a teenager.)
This example illustrates what is usually the case in a confounded relationship. What we consider
to be the putative cause of the disease is simply a surrogate for something else, the confounder,
that is the cause, or more directly related to the cause, of disease.
That is, pizza is a surrogate for teenager, since teenagers are the big pizza consumers; and
something that occurs in the teenage years produces acne, perhaps related to rising hormonal
levels.
Control of Confounding
Confounding is a systematic error that investigators aim either to prevent or to remove from a
study.
There are two common methods to prevent confounding.
One of them, randomization, or the random assignment of subjects to experimental groups, can
only be used in experiments. Randomization produces study groups with nearly the same
distribution of characteristics (creates balance), and so removes one of the two required
associations present in confounding (the exposure-confounder association).
The other, restriction, involves selecting subjects for a study who have the same value, or nearly
the same value, for a variable that might be a confounder, thus achieving balance.
Rothman (2002, pp.20,110-111) provides a very convincing argument to support restriction.
Basically, the argument proceeds as follows. The idea of "representativeness" comes from
survey sampling, where the sample is supposed to look like the general population. So, when
researchers restrict their sample to a more limited group of subjects, there is an impression in
many people's minds that this is a bad thing, where they think the study results do not
generalize to the population as a whole. However, the idea is not to achieve representativeness,
as a survey sample does. The idea is to test a theory, which can only be done if confounding is
removed. By restriction, confounders are removed because the subjects have the same value on
the confounder (balance on the confounder). This takes the study closer to the counterfactual
ideal, so that a causal inference is more tenable.
Matching
A third method to prevent confounding, matching, is deferred to the Regression Models course,
where the analysis of a matched study is discussed in the conditional logistic regression chapter.
Sometimes it works better than just using regression models to control for confounding, while
other times it performs worse. For example, in a case-control study, matching can introduce
confounding into the study when there was none to begin with. (Rothman and Greenland, p. 151)
Exercise
In the Rauscher (2000) article:
1) Notice the use of restriction in the fourth line of the Abstract.
2) Look at the last two sentences of the Selection of Controls section. Notice the use of
interview type matching to minimize recall bias.
3) Look at Table 2:
a) Notice how Table 2 is a collection of two-way stratifications (something analogous to the
two-way histogram we used in the Down syndrome example above).
b) Notice one result this approach produced (reported in the BMI As a Categorical Variable
section).
4) Look at the 2nd paragraph of the 2nd column of the Discussion section. Notice they discuss a
sensitivity analysis to account for bias (although their actual sensitivity analysis was poorly
presented).
Stata Exercise
Evans County Dataset (evans.xls)
Data are from a cohort study in which n=609 white males were followed for 7
years, with coronary heart disease as the outcome of interest.
Codebook (n = 609)

outcome
  chd   coronary heart disease (1=presence, 0=absence)

predictors
  cat   catecholamine level (1=high, 0=normal)
  age   age in years (continuous)
  chl   cholesterol (continuous)
  smk   smoker (1=ever smoked, 0=never smoked)
  ecg   electrocardiogram abnormality (1=presence, 0=absence)
  dbp   diastolic blood pressure (continuous)
  sbp   systolic blood pressure (continuous)
  hpt   high blood pressure (1=presence, 0=absence)
        defined as: SBP ≥ 160 or DBP ≥ 95
We will use the evans.dta dataset to illustrate confounding.
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on evans.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\evans.dta", clear
* which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use evans.dta, clear
Recall, these data were produced using a cohort study design, so the risk ratio is an appropriate
measure of effect. We will illustrate with the odds ratio, however, so we can compare the results
to a logistic regression approach.
The exercise is to see if the smoking-CHD association is confounded by age in this dataset.
First fitting a univariable (one predictor variable) logistic regression,
Statistics
Binary outcomes
Logistic regression (reporting odds ratios)
Model tab: Dependent variable: chd
Independent variables: smk
OK
logistic chd smk
Logistic regression                               Number of obs   =        609
                                                  LR chi2(1)      =       5.75
                                                  Prob > chi2     =     0.0165
Log likelihood = -216.40647                       Pseudo R2       =     0.0131

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.955485   .5708642     2.30   0.022     1.103477    3.465337
------------------------------------------------------------------------------
we see that there is a significant smoking-CHD association (OR=1.96, p=0.022).
Next, fitting a multivariable (two or more predictor variables) logistic regression, adjusting for
the continuous variable age,
Statistics
Binary outcomes
Logistic regression (reporting odds ratios)
Model tab: Dependent variable: chd
Independent variables: smk age
OK
logistic chd smk age
Logistic regression                               Number of obs   =        609
                                                  LR chi2(2)      =      19.93
                                                  Prob > chi2     =     0.0000
Log likelihood = -209.31516                       Pseudo R2       =     0.0454

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |    2.30354   .6892501     2.79   0.005     1.281459    4.140823
         age |   1.052105   .0141545     3.78   0.000     1.024725    1.080216
------------------------------------------------------------------------------
Checking to see if the adjusted effect changed the unadjusted effect by more than 10%,
display 1.96*1.10
2.156
We see that the adjusted odds ratio (adjusted OR = 2.30) differs from the unadjusted odds ratio
by more than 10% (the 10% threshold is 1.96 × 1.1 = 2.16, and 2.30 > 2.16), so by definition, the
unadjusted smoking-CHD association was confounded by age.
Here we are using the “10% change in estimate” variable selection rule that has been proposed
for determining if the putative confounder needs to be adjusted for in a regression model. (see
box)
“10% change in estimate” variable selection rule
A variable selection rule consistent with this definition of confounding is the change-in-estimate
method of variable selection. In this method, a potential confounder is included in the model if it
changes the coefficient, or effect estimate, of the primary exposure variable by 10%. This
method has been shown to produce more reliable models than variable selection methods based
on statistical significance [Greenland, 1989].
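The rule in the box can be expressed as a small Python function (the function name and the relative-change formulation are ours; only the 10% threshold comes from the rule itself):

```python
def confounding_by_change_in_estimate(est_crude, est_adjusted, threshold=0.10):
    """Flag confounding if adjusting for the potential confounder changes
    the effect estimate by more than the threshold (10% by default)."""
    relative_change = abs(est_adjusted - est_crude) / est_crude
    return relative_change > threshold

# Smoking-CHD example: crude OR 1.96, age-adjusted OR 2.30
print(confounding_by_change_in_estimate(1.96, 2.30))  # True
```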
In practice, we could stop here. For illustration, however, let's assess confounding by the "two
associations" definition. We could compute odds ratios between 1) smoking and age, and
between 2) age and CHD, to test for the associations. More quickly, we can just use Pearson
correlation coefficients, another way to test for associations.
Statistics
Summaries, tables, & tests
Summary and descriptive statistics
Pairwise correlations
Main tab: Variables: chd age smk
Options: Print significance level for each entry
OK
pwcorr chd age smk, sig
             |      chd      age      smk
-------------+---------------------------
         chd |   1.0000
             |
             |
         age |   0.1393   1.0000
             |   0.0006
             |
         smk |   0.0944  -0.1391   1.0000
             |   0.0198   0.0006
We see that both associations are present, indicating confounding by the “two associations”
definition.
Note: In epidemiologic studies, effect measures such as the odds ratio are preferred to ordinary
correlation coefficients. Although we can see from the p-values that the associations are
significant, it is difficult to get a feel for the size of the effect from a correlation coefficient.
The odds ratio, on the other hand, provides such a feel.
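To illustrate the point, an odds ratio (with a Woolf, or logit-scale, 95% confidence interval) can be computed directly from the cell counts of a 2×2 table; the counts below are hypothetical, not taken from this dataset:

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
                 disease+  disease-
    exposed          a         b
    unexposed        c         d
    """
    return (a * d) / (b * c)

def or_95ci(a, b, c, d):
    """Woolf (logit-scale) 95% confidence interval for the odds ratio."""
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio(a, b, c, d))
    return math.exp(log_or - 1.96 * se_log), math.exp(log_or + 1.96 * se_log)

# Hypothetical counts: 10 exposed cases, 20 exposed non-cases,
# 5 unexposed cases, 40 unexposed non-cases
print(odds_ratio(10, 20, 5, 40))  # → 4.0
print(or_95ci(10, 20, 5, 40))
```

Unlike a correlation of, say, 0.09, an OR of 4.0 immediately conveys “fourfold higher odds of disease among the exposed.”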
To investigate the association further, let’s next try a “stratified” analysis, where we test the
smoking-CHD association within homogeneous age subgroups, or age strata.
First, creating a categorical age variable,
recode age 40/49=1 50/59=2 60/69=3 70/76=4, gen(agecat)
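For readers less familiar with Stata’s recode syntax, the command above corresponds to a simple binning function; this sketch (the function name is mine) mirrors the four categories:

```python
def agecat(age):
    """Mirror of the Stata recode: 40-49 -> 1, 50-59 -> 2,
    60-69 -> 3, 70-76 -> 4; ages outside 40-76 are left missing."""
    if 40 <= age <= 49:
        return 1
    if 50 <= age <= 59:
        return 2
    if 60 <= age <= 69:
        return 3
    if 70 <= age <= 76:
        return 4
    return None  # missing, like Stata's '.'

print([agecat(a) for a in (45, 55, 65, 75)])  # → [1, 2, 3, 4]
```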
Then, requesting a logistic regression for each of these age categories,
Statistics
Binary outcomes
Logistic regression (reporting odds ratios)
Model tab: Dependent variable: chd
Independent variables: smk
by/if/in tab: Repeat command by groups
Variables that define groups: agecat
OK
by agecat, sort : logistic chd smk

     - or -

bysort agecat: logistic chd smk
-> agecat = 1

Logistic regression                               Number of obs   =        247
                                                  LR chi2(1)      =       4.33
                                                  Prob > chi2     =     0.0375
Log likelihood = -67.277892                       Pseudo R2       =     0.0311

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   3.849057   2.922623     1.78   0.076     .8690193    17.04823
------------------------------------------------------------------------------

-> agecat = 2

Logistic regression                               Number of obs   =        203
                                                  LR chi2(1)      =       5.19
                                                  Prob > chi2     =     0.0227
Log likelihood = -62.733206                       Pseudo R2       =     0.0398

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   3.675676   2.368044     2.02   0.043     1.039807    12.99336
------------------------------------------------------------------------------

-> agecat = 3

Logistic regression                               Number of obs   =        115
                                                  LR chi2(1)      =       0.17
                                                  Prob > chi2     =     0.6809
Log likelihood = -58.821111                       Pseudo R2       =     0.0014

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   1.208081   .5559709     0.41   0.681     .4901901    2.977333
------------------------------------------------------------------------------

-> agecat = 4

Logistic regression                               Number of obs   =         44
                                                  LR chi2(1)      =       3.63
                                                  Prob > chi2     =     0.0569
Log likelihood = -17.466348                       Pseudo R2       =     0.0940

------------------------------------------------------------------------------
         chd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         smk |   6.333333    7.15092     1.63   0.102     .6927028    57.90522
------------------------------------------------------------------------------
In Chapter 10, we will see how to do a more sophisticated stratified analysis, where the age-specific ORs are combined into a single summary measure.
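As a preview of that summary-measure idea, the Mantel-Haenszel estimator combines stratum-specific 2×2 tables into a single odds ratio; here is a minimal sketch (the stratum counts in the example are hypothetical, not the values from this dataset):

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata.
    Each stratum is a 2x2 table (a, b, c, d) with a = exposed cases,
    b = exposed non-cases, c = unexposed cases, d = unexposed non-cases:
        OR_MH = sum(a_i * d_i / n_i) / sum(b_i * c_i / n_i)
    where n_i is the stratum total."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two hypothetical age strata
print(mantel_haenszel_or([(10, 20, 5, 40), (8, 15, 4, 30)]))
```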
References
Delgado-Rodríguez M, Llorca J. (2004). Bias. J Epidemiol Community Health 58:635-641.
Greenland S. (1989). Modeling and variable selection in epidemiologic analysis. Am J Public
Health 79(3):340-349.
Greenland S. (1998). Chapter 19. Basic methods for sensitivity analysis and external adjustment.
In, Rothman KJ, Greenland S. Modern Epidemiology, 2nd ed. Philadelphia PA,
Lippincott-Raven Publishers, pp. 343-357.
Grimes DA, Schulz KF. (2002). Bias and causal associations in observational research. Lancet
359:248-52.
Kraus AS. (1954). The use of hospital data in studying the association between a characteristic
and a disease. Pub Health Rep 69:1211-1214.
Last JM. (1995). A Dictionary of Epidemiology. 3rd ed. New York, Oxford University Press.
Lilienfeld DE, Stolley PD. (1994). Foundations of Epidemiology, 3rd ed, New York,
Oxford University Press.
Millard PS. (1999). Review: bias may contribute to association of vasectomy with prostate
cancer. West J Med 171:91.
Rauscher GH, Mayne ST, Janerich DT. (2000). Relation between body mass index and lung
cancer risk in men and women never and former smokers. Am J Epidemiol 152:506-13.
Rothman KJ. (2002). Epidemiology: An Introduction. New York, Oxford University Press.
Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia PA,
Lippincott-Raven Publishers.
Vandenbroucke JP, von Elm E, Altman DG, et al. (2007). Strengthening the reporting of
        observational studies in epidemiology (STROBE): explanation and elaboration. Ann
        Intern Med 147(8):W-163 to W-194.