The Odds Ratio: Impact of Study Design

RANDOM BYTES
Editor: Garrett Fitzmaurice, ScD
Garrett Fitzmaurice, ScD
From the Department of Biostatistics, Harvard School of Public Health,
Boston, Massachusetts, USA
In the previous column [1] we considered
the “two-by-two” contingency table, a
simple square array used to provide a
tabular display of the relationship between
two categorical variables, each having only
two levels. Figure 1 shows such a table,
where the rows correspond to nutritional
status and the columns correspond to disease status. We showed that the relationship
between the two variables can be described
in terms of the odds ratio,
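\[
\mathrm{OR} = \frac{\Pr(\text{disease} \mid \text{malnourished}) \,/\, \Pr(\text{no disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished}) \,/\, \Pr(\text{no disease} \mid \text{well nourished})} = \frac{a \times d}{b \times c}
\]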
where OR denotes the odds ratio and Pr
denotes probability. In the previous column,
we also showed that the odds ratio has many
appealing properties that account for its
widespread use in practice. First, the odds
ratio can often be interpreted as an approximation to the relative risk (or risk ratio) of
disease when the disease is rare [2]. Second, the odds ratio is
invariant to reversals of the orientation of
the two-by-two contingency table. That is,
the odds ratio remains the same when the rows
and columns of the table are interchanged, a
property that is not shared by other measures of association, e.g., the relative risk.
This latter property implies that it is not
necessary to distinguish which of the two
variables is considered to be the outcome
and which is considered to be the predictor
to estimate the odds ratio. In this column,
we will see that a very appealing feature of
the odds ratio is that it is equally valid
regardless of whether the study design is
prospective or retrospective. This property is not shared by
other measures of association and has implications for the design of studies that examine the relationship between disease and
a hypothesized risk factor (e.g., nutritional
status).
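The invariance to interchanging rows and columns described above can be checked numerically. The following is a minimal sketch in Python. The counts are hypothetical, the helper names (`odds_ratio`, `risk_ratio`, `transpose`) are illustrative, and the cell layout (rows for nutritional status, columns for disease status, cells a, b, c, d read row by row) is an assumption inferred from the formula OR = (a × d)/(b × c).

```python
from fractions import Fraction

def odds_ratio(table):
    """Cross-product ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return Fraction(a * d, b * c)

def risk_ratio(table):
    """Ratio of the row-wise proportions: [a/(a+b)] / [c/(c+d)]."""
    (a, b), (c, d) = table
    return Fraction(a, a + b) / Fraction(c, c + d)

def transpose(table):
    """Interchange the rows and columns of the table."""
    (a, b), (c, d) = table
    return [[a, c], [b, d]]

# Hypothetical counts (assumed for illustration only):
# rows = malnourished / well nourished, columns = disease / no disease.
table = [[30, 70], [10, 90]]
flipped = transpose(table)

print("OR:", odds_ratio(table), "after interchange:", odds_ratio(flipped))
print("RR:", risk_ratio(table), "after interchange:", risk_ratio(flipped))
```

With these counts the odds ratio is 27/7 for both orientations, whereas the ratio of row proportions changes from 3 to 12/7 once the rows and columns are interchanged.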
Correspondence to: Garrett Fitzmaurice, ScD,
Department of Biostatistics, Harvard School of
Public Health, Room 423, 665 Huntington Avenue, Boston, MA 02115, USA. E-mail: fitzmaur@hsph.harvard.edu
STUDY DESIGN
To examine the relationship between disease and a hypothesized risk factor, e.g.,
nutritional status, we ordinarily must obtain
data on both disease and nutritional status in
a sample of individuals. There are two common designs for observational studies of the
association between disease and a specific
risk factor: the prospective and the retrospective study. The prospective study,
sometimes known as the cohort study, attempts to mimic a designed experiment.
That is, in a prospective study individuals
are selected into the study on the basis of
their nutritional status. In the simplest case,
the greatest power for detecting an effect of
nutritional status on disease is obtained by
choosing an equal number of malnourished
and well-nourished individuals. The individuals in the study are then followed for a
specified period to determine the development of disease in each of these two groups.
Note that in a prospective study we can
express the relationship between nutritional
status and subsequent disease in terms of the
odds ratio,
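\[
\mathrm{OR} = \frac{\Pr(\text{disease} \mid \text{malnourished}) \,/\, \Pr(\text{no disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished}) \,/\, \Pr(\text{no disease} \mid \text{well nourished})}
\]
or in terms of the relative risk (RR),
\[
\mathrm{RR} = \frac{\Pr(\text{disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished})}
\]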
Both of these measures of association attempt to explain the same phenomenon,
namely whether nutritional status has any
effect on the probability of disease.
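Because both measures can be computed in a prospective study, they can be compared directly. The sketch below holds the relative risk fixed at 2 and varies how common the disease is. The risks are hypothetical values chosen for illustration, and the function names are not from the column.

```python
def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)

def compare(p_malnourished, p_well_nourished):
    """Return (RR, OR) for two disease probabilities from a cohort."""
    rr = p_malnourished / p_well_nourished
    or_ = odds(p_malnourished) / odds(p_well_nourished)
    return rr, or_

# Hypothetical risks: the malnourished group always has twice the risk.
for p1, p0 in [(0.002, 0.001), (0.02, 0.01), (0.2, 0.1), (0.6, 0.3)]:
    rr, or_ = compare(p1, p0)
    print(f"risk {p1} vs {p0}: RR = {rr:.3f}, OR = {or_:.3f}")
```

The relative risk stays at 2 throughout, while the odds ratio rises from about 2.002 for the rarest disease to 3.5 for the most common one. This is the sense in which the odds ratio approximates the relative risk only when the disease is rare.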
The prospective design is generally the
method of choice if the disease outcome can
be observed relatively soon after the commencement of the study. However, in many
instances, particular diseases may develop
decades after initial exposure to specific risk
factors. In such instances the prospective
study would take decades to complete, making it very costly, if not entirely infeasible.
The retrospective study, also known as the
case-control study, in some sense takes the
opposite design approach. In a case-control
study individuals are selected into the study
on the basis of their disease status. Often an
equal number of diseased individuals
(called cases) and non-diseased individuals
(called controls) are used. When the disease
under study is relatively rare, the cases may
include all diseased subjects in a clinic or
registry. The controls, however, should be
drawn from the same population as the cases. Cases and
controls are then interviewed to determine
their nutritional status; thus, data
on nutritional status are typically obtained retrospectively, after disease status has been determined.
Nutrition 16:1114–1115, 2000
©Elsevier Science Inc., 2000. Printed in the United States. All rights reserved.
It should be intuitively clear that finding
elevated rates of malnutrition among diseased cases compared with controls provides evidence of an association between disease and nutritional status. However,
because of the sampling design, data
from a case-control study allow us to estimate only Pr(malnourished | disease status) and
not Pr(disease | nutritional status). As a result, we cannot directly estimate the relative
risk with data from a case-control study.
However, the association between nutritional status and disease can still be expressed in
terms of the odds ratio. Using Bayes' rule [3]
(a fundamental theorem that the reader may
have encountered in an introductory statistics course) and a little algebra, it can
be shown that the odds ratio can be defined
not only in terms of Pr(disease | nutritional
status) but also in terms of Pr(malnourished | disease status). That is,
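\[
\mathrm{OR} = \frac{\Pr(\text{malnourished} \mid \text{disease}) \,/\, \Pr(\text{well nourished} \mid \text{disease})}{\Pr(\text{malnourished} \mid \text{no disease}) \,/\, \Pr(\text{well nourished} \mid \text{no disease})}
\]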
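The algebra behind this identity can be sketched in a few lines. The shorthand symbols D, \bar{D}, M, and W are introduced here for brevity and do not appear in the column; the steps use only Bayes' rule as stated in standard texts.

```latex
% Shorthand (assumed here): D = disease, \bar{D} = no disease,
% M = malnourished, W = well nourished.
\[
\Pr(D \mid M) = \frac{\Pr(M \mid D)\,\Pr(D)}{\Pr(M)}, \qquad
\Pr(\bar{D} \mid M) = \frac{\Pr(M \mid \bar{D})\,\Pr(\bar{D})}{\Pr(M)}
\]
% Dividing cancels Pr(M); the same argument holds with W in place of M.
\[
\frac{\Pr(D \mid M)}{\Pr(\bar{D} \mid M)} = \frac{\Pr(M \mid D)\,\Pr(D)}{\Pr(M \mid \bar{D})\,\Pr(\bar{D})}, \qquad
\frac{\Pr(D \mid W)}{\Pr(\bar{D} \mid W)} = \frac{\Pr(W \mid D)\,\Pr(D)}{\Pr(W \mid \bar{D})\,\Pr(\bar{D})}
\]
% In the ratio of the two odds, Pr(D) and Pr(\bar{D}) cancel as well.
\[
\mathrm{OR} = \frac{\Pr(M \mid D)/\Pr(M \mid \bar{D})}{\Pr(W \mid D)/\Pr(W \mid \bar{D})}
= \frac{\Pr(M \mid D)/\Pr(W \mid D)}{\Pr(M \mid \bar{D})/\Pr(W \mid \bar{D})}
\]
```

The marginal probabilities of disease cancel, which is why the result does not depend on how many cases and controls are sampled.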
As a result, the odds ratio can be estimated
regardless of whether the study design is
prospective or retrospective.
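This consequence can be illustrated with a small calculation. The sketch below assumes a hypothetical population, keeps every case, and retains a fixed fraction of the non-diseased as controls. It works with expected counts rather than random sampling, and all counts and function names are illustrative assumptions.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c); a, b = malnourished, c, d = well nourished."""
    return Fraction(a) * d / (Fraction(b) * c)

def risk_ratio(a, b, c, d):
    """[a/(a+b)] / [c/(c+d)]; a valid risk ratio only under cohort sampling."""
    return (Fraction(a) / (a + b)) / (Fraction(c) / (c + d))

def case_control_counts(a, b, c, d, control_fraction):
    """Expected counts when all cases are kept and a fraction of non-cases become controls."""
    f = Fraction(control_fraction)
    return a, b * f, c, d * f

# Hypothetical population: (disease, no disease) counts for the malnourished,
# followed by (disease, no disease) counts for the well nourished.
population = (40, 20_000, 60, 80_000)

# Keep all 100 cases and 1 in 1000 of the non-diseased (100 expected controls).
sample = case_control_counts(*population, control_fraction=Fraction(1, 1000))

print("OR, population:  ", odds_ratio(*population))
print("OR, case-control:", odds_ratio(*sample))
print("RR, population:  ", float(risk_ratio(*population)))
print("RR, case-control:", float(risk_ratio(*sample)))
```

Under these assumptions the odds ratio is 8/3 in both tables, while the naive risk ratio falls from about 2.66 in the population to 14/9 in the case-control table, so only the odds ratio survives the change of design.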
CONCLUSION
In summary, the odds ratio is often considered to be the measure of choice for quantifying the association between two dichotomous variables. The
widespread adoption of the odds ratio is
due, at least in part, to its unique mathematical properties. The odds ratio, unlike other
measures of association, can be defined in
terms of the conditional probabilities of either one of the two variables, given the
other. As a result, the odds ratio can be
estimated from either a prospective or a case-control (retrospective) design. Furthermore,
this attractive property readily generalizes
when, for example, the risk factor is a continuous variable (e.g., urinary nitrogen as a biomarker for protein intake) rather than a dichotomous one, and/or when there are
additional confounding variables to control
for in the analysis. In this case, a logistic
regression model [4] can be used, with disease
status treated as the outcome and the risk
factor as the predictor, regardless of whether
the design is prospective or retrospective.
That is, case-control data can be treated as if
they were prospective data in a logistic regression analysis to determine the odds ratio
relating disease and exposure to the hypothesized risk factor [5].
0899-9007/00/$20.00
PII S0899-9007(00)00437-8
FIG. 1. Illustration of a two-by-two contingency table.
REFERENCES
1. Fitzmaurice G. Some aspects of interpretation of the odds ratio. Nutrition 2000;16:462
2. Cornfield J. A method of estimating comparative rates from clinical data: applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 1951;11:1269
3. Pagano M, Gauvreau K. Principles of biostatistics. Belmont, CA: Duxbury Press, 1993
4. Pagano M. Logistic regression. Nutrition 1996;12:135
5. Prentice RL, Pyke R. Logistic disease incidence models and case-control studies. Biometrika 1979;66:403