RANDOM BYTES
Editor: Garrett Fitzmaurice, ScD

The Odds Ratio: Impact of Study Design

Garrett Fitzmaurice, ScD
From the Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA

In the previous column [1] we considered the “two-by-two” contingency table, a simple square array used to provide a tabular display of the relationship between two categorical variables, each having only two levels. Figure 1 shows such a table, where the rows correspond to nutritional status and the columns correspond to disease status. We showed that the relationship between the two variables can be described in terms of the odds ratio,

\[
\mathrm{OR} = \frac{\Pr(\text{disease} \mid \text{malnourished}) / \Pr(\text{no disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished}) / \Pr(\text{no disease} \mid \text{well nourished})} = \frac{a \cdot d}{b \cdot c}
\]

where OR denotes the odds ratio and Pr denotes probability.

In the previous column, we also showed that the odds ratio has many appealing properties that account for its widespread use in practice. First, the odds ratio can often be interpreted as an approximation to the relative risk (or risk ratio) of disease when the disease is rare, that is, when the probability of disease is small [2]. Second, the odds ratio is invariant to reversals of the orientation of the two-by-two contingency table. That is, the odds ratio remains the same when the rows and columns of the table are interchanged, a property that is not shared by other measures of association, e.g., the relative risk. This latter property implies that, to estimate the odds ratio, it is not necessary to distinguish which of the two variables is considered to be the outcome and which is considered to be the predictor.

In this column, we will see that a very appealing feature of the odds ratio is that it is equally valid regardless of whether the study design is prospective or retrospective.
This unique property of the odds ratio is not shared by other measures of association and has implications for the design of studies that examine the relationship between disease and a hypothesized risk factor (e.g., nutritional status).

Correspondence to: Garrett Fitzmaurice, ScD, Department of Biostatistics, Harvard School of Public Health, Room 423, 665 Huntington Avenue, Boston, MA 02115, USA. E-mail: fitzmaur@hsph.harvard.edu

STUDY DESIGN

To examine the relationship between disease and a hypothesized risk factor, e.g., nutritional status, we ordinarily must obtain data on both disease and nutritional status in a sample of individuals. There are two common designs for observational studies of the association between disease and a specific risk factor: the prospective study and the retrospective study.

The prospective study, sometimes known as the cohort study, attempts to mimic a designed experiment. That is, in a prospective study individuals are selected into the study on the basis of their nutritional status. In the simplest case, the greatest power for detecting an effect of nutritional status on disease is obtained by choosing an equal number of malnourished and well-nourished individuals. The individuals in the study are then followed for a specified period to determine the development of disease in each of these two groups. Note that in a prospective study we can express the relationship between nutritional status and subsequent disease in terms of the odds ratio,

\[
\mathrm{OR} = \frac{\Pr(\text{disease} \mid \text{malnourished}) / \Pr(\text{no disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished}) / \Pr(\text{no disease} \mid \text{well nourished})}
\]

or in terms of the relative risk (RR),

\[
\mathrm{RR} = \frac{\Pr(\text{disease} \mid \text{malnourished})}{\Pr(\text{disease} \mid \text{well nourished})}
\]

Both of these measures of association address the same question, namely whether nutritional status has any effect on the probability of disease.
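
As a rough illustration of the two measures just defined, the following sketch computes both from a prospective two-by-two table. The counts are hypothetical and chosen only for illustration, and the cell labels are an assumption, since Figure 1 is not reproduced here: a and b are taken to be the malnourished individuals with and without disease, and c and d the well-nourished individuals with and without disease, which is consistent with OR = (a * d)/(b * c).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio (a/b) / (c/d), which simplifies to (a*d) / (b*c)."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Pr(disease | malnourished) / Pr(disease | well nourished)."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical cohort (assumed counts, not from the column):
# 1000 malnourished and 1000 well-nourished individuals followed for disease.
a, b = 30, 970  # malnourished: disease, no disease
c, d = 10, 990  # well nourished: disease, no disease

cohort_or = odds_ratio(a, b, c, d)
cohort_rr = relative_risk(a, b, c, d)

print(f"odds ratio    = {cohort_or:.3f}")
print(f"relative risk = {cohort_rr:.3f}")
```

With these assumed counts the disease is rare in both groups (3% and 1%), so the odds ratio (about 3.06) is close to the relative risk (3.0), in line with the rare-disease approximation mentioned earlier.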
The prospective design is generally the method of choice if the disease outcome can be observed relatively soon after the commencement of the study. However, in many instances, particular diseases may develop decades after initial exposure to specific risk factors. In such instances the prospective study would take decades to complete, making it very costly, if not entirely infeasible.

Nutrition 16:1114–1115, 2000. ©Elsevier Science Inc., 2000. Printed in the United States. All rights reserved.

The retrospective study, also known as the case-control study, in some sense takes the opposite design approach. In a case-control study individuals are selected into the study on the basis of their disease status. Often an equal number of diseased individuals (called cases) and non-diseased individuals (called controls) are used. When the disease under study is relatively rare, the cases may include all diseased subjects in a clinic or registry. The controls, however, should be drawn from the same population. Cases and controls are then interviewed to determine their nutritional status; thus, data on nutritional status are typically obtained retrospectively, after determination of disease status.

It should be intuitively clear that finding elevated rates of malnutrition among diseased cases compared with controls provides evidence of an association between disease and nutritional status. However, because of the sample design, with data from a case-control study we can only estimate Pr(malnourished | disease status) and not Pr(disease | nutritional status). As a result, we cannot directly estimate the relative risk with data from a case-control study. However, the association between nutritional status and disease can be expressed in terms of the odds ratio.
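
The contrast between the two measures under case-control sampling can be sketched numerically. The example below starts from a hypothetical population table, then forms the expected case-control table obtained by sampling cases and controls at different rates. The population counts, the sampling fractions, and the function names are all assumptions made for illustration, and expected counts are used in place of random sampling to keep the arithmetic exact.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio (a*d) / (b*c) for a two-by-two table."""
    return (a * d) / (b * c)


def naive_risk_ratio(a, b, c, d):
    """Row-wise 'risk' ratio computed directly from the table counts."""
    return (a / (a + b)) / (c / (c + d))


def expected_case_control(a, b, c, d, case_frac, control_frac):
    """Expected counts when cases (a, c) and controls (b, d) are sampled
    at different rates, as happens in a case-control design."""
    return (a * case_frac, b * control_frac, c * case_frac, d * control_frac)


# Hypothetical population (assumed counts):
# (malnourished & disease, malnourished & no disease,
#  well nourished & disease, well nourished & no disease)
population = (3000, 97000, 4000, 396000)

# Sample 10% of cases but only 0.1% of controls (assumed fractions).
sample = expected_case_control(*population, case_frac=0.1, control_frac=0.001)

or_population = odds_ratio(*population)
or_sample = odds_ratio(*sample)
rr_population = naive_risk_ratio(*population)
rr_sample = naive_risk_ratio(*sample)

print(f"odds ratio: population {or_population:.3f}, case-control {or_sample:.3f}")
print(f"risk ratio: population {rr_population:.3f}, case-control {rr_sample:.3f}")
assert math.isclose(or_population, or_sample)
```

Under these assumptions the odds ratio is the same in both tables (about 3.06), because the sampling fractions cancel in (a * d)/(b * c). The risk ratio computed naively from the case-control table (about 1.5) no longer matches the population value of 3.0, which is why the relative risk cannot be estimated directly from such data.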
Using Bayes’ rule [3] (a fundamental theorem that the reader may have encountered in an introductory statistics course) and a little bit of algebra, it can be shown that the odds ratio can be defined not only in terms of Pr(disease | nutritional status) but also in terms of Pr(malnourished | disease status). That is,

\[
\mathrm{OR} = \frac{\Pr(\text{malnourished} \mid \text{disease}) / \Pr(\text{well nourished} \mid \text{disease})}{\Pr(\text{malnourished} \mid \text{no disease}) / \Pr(\text{well nourished} \mid \text{no disease})}
\]

As a result, the odds ratio can be estimated regardless of whether the study design is prospective or retrospective.

CONCLUSION

In summary, the odds ratio is often considered to be the measure of choice for quantifying the association between two dichotomous variables. The widespread adoption of the odds ratio is due, at least in part, to its unique mathematical properties. The odds ratio, unlike other measures of association, can be defined in terms of the conditional probabilities of either one of the two variables, given the other. As a result, the odds ratio can be estimated from either a prospective or a case-control (retrospective) design. Furthermore, this attractive property readily generalizes when, for example, the risk factor is a continuous (e.g., urinary nitrogen as a biomarker for protein intake) rather than dichotomous variable, and/or when there are additional confounding variables to control for in the analysis. In this case, a logistic regression model [4] can be used, with disease status treated as the outcome and the risk factor as the predictor, regardless of whether the design is prospective or retrospective. That is, case-control data can be treated as if they were prospective data in a logistic regression analysis to determine the odds ratio relating disease and exposure to the hypothesized risk factor [5].

FIG. 1. Illustration of a two-by-two contingency table.

Nutrition Volume 16, Numbers 11/12, 2000. 0899-9007/00/$20.00 PII S0899-9007(00)00437-8

REFERENCES

1. Fitzmaurice G. Some aspects of interpretation of the odds ratio. Nutrition 2000;16:462
2. Cornfield J. A method of estimating comparative rates from clinical data: applications to cancer of the lung, breast, and cervix. J Natl Cancer Inst 1951;11:1269
3. Pagano M, Gauvreau K. Principles of biostatistics. Belmont, CA: Duxbury Press, 1993
4. Pagano M. Logistic regression. Nutrition 1996;12:135
5. Prentice RL, Pyke R. Logistic disease incidence models and case-control studies. Biometrika 1979;66:403