Advanced Course in Epidemiological Analysis 2011

Session 1
Measures of disease and effect

Helen Weiss

Objectives:
- To introduce the main measures of disease occurrence and exposure effect used in the statistical analysis of epidemiological studies.
- To explain the difference between risks and rates, and the advantages of using rates in the analysis of cohort studies.
- To explain why odds ratios are usually used in the analysis of case-control and cross-sectional (prevalence) studies.

1. Introduction

Many of the statistical methods used in the analysis of epidemiological studies describe and evaluate the extent to which the chance of disease varies according to different levels of some factor (exposure). Note that the chance (or probability) of disease refers to the unknown outcome for future subjects, not to the actual outcomes in a study. The data from a study are used to estimate this probability.

Note also that “disease” is used here as a general term to refer to the outcome of interest, whether this is infection, disease onset, death or any other well-defined event (e.g. pregnancy).

2. Measures of disease outcome

The choice of the outcome measure used in a study will often depend on the study design:
- For cohort studies, the outcome measure is usually the risk or rate of disease.
- For cross-sectional studies, the outcome measure is usually the risk (prevalence) or odds of disease.

In this session we will assume that there are two groups of individuals under study, an exposed group who possess the risk factor of interest, and an unexposed group who do not. We are interested in comparing the amount of disease in the exposed group to that in the unexposed group.

Whatever the outcome measure used (e.g. risk, rate, odds), we assume that there is some true or underlying value for the population from which the data were sampled. We wish to estimate this value, using the data from our study.

3. Risk measures

Consider a group of N disease-free individuals who join a cohort at a particular point in time and are then followed for a fixed period. The following table shows disease outcomes by exposure:

Disease   Exposed   Unexposed   Total
Yes       $D_1$     $D_0$       $D$
No        $H_1$     $H_0$       $H$
Total     $N_1$     $N_0$       $N$

The risk of disease is defined as the probability that an individual experiences the disease during the fixed period of time. For exposed individuals, we can estimate the risk of disease ($\pi_1$) as $D_1/N_1$. The risk of disease in the unexposed group ($\pi_0$) is estimated as $D_0/N_0$.

The effect of exposure on disease is often assessed by taking the ratio of the risk in exposed individuals to the risk in unexposed individuals.

Risk ratio = $\pi_1/\pi_0$

Another measure of effect is the difference between the risks of disease in exposed and unexposed individuals:

Risk difference = $\pi_1 - \pi_0$

Exercise 1: As an example, consider a group of 30 000 subjects observed for 10 years, after which time 50 were diseased.

Disease   Exposed   Unexposed   Total
Yes       30        20          50
No        9 970     19 980      29 950
Total     10 000    20 000      30 000

i) Estimate the risk of disease separately for exposed and unexposed individuals and obtain the risk ratio for exposed vs. unexposed individuals.

Risk in exposed =
Risk in unexposed =
Risk ratio =

ii) Calculate the risk difference for exposed vs. unexposed individuals.

Risk difference =

4. Rate measures

4.1 Disadvantages of risks

The risk of disease will generally increase with the length of the time interval considered: the longer an individual is observed, the greater their chance of developing the disease. It follows that in comparing exposed and non-exposed groups a 5-year risk should only be compared with other 5-year risks. It would be misleading to compare a 5-year risk with a 10-year risk, etc.

Some further disadvantages of risks are:

1. They are more complicated to calculate if not all individuals are followed for the same length of time.
This complicates the analysis of cohort studies where individuals may enter or exit at various points during the period of observation: for example, individuals may be lost to follow-up through emigration or cease to be at risk of a particular disease (e.g. because they have died from some other cause).

2. For individuals who develop the disease, no account is taken of when the disease occurred during the study period.

3. They do not provide a suitable approach for examining the effect of exposures that change with time, e.g. where individuals who are initially unexposed become exposed at some point during the follow-up period. Studying the effect of any time-dependent variable using risks is similarly difficult. An important example is the effect of age, since individuals inevitably increase in age with increasing follow-up.

4. They may not be useful for common diseases with recurrent episodes (e.g. malaria, respiratory disease). In some settings everyone will experience an episode of disease, so everyone’s risk is 1.

Most of these disadvantages can be overcome by measuring disease occurrence using rates.

4.2 Definition of a rate

Consider one individual at a particular point in time. As discussed above, the risk that the individual experiences an event in an interval of time depends on the length of that time interval. For a short interval of time, $h$, over which the risk is $\pi$, the risk per unit time is $\pi/h$, where $\pi$ depends on $h$ and gets smaller as $h$ gets smaller. We define the rate, $\lambda$, as the limiting value of $\pi/h$ as $h$ gets very small:

$\lambda = \pi/h$, $h$ small.

The rate is an instantaneous measure of the subject's liability to the disease and thus, in contrast with risk, does not depend on the length of follow-up.

When expressing a rate, the unit of time is arbitrary and could be per year, per week, per day, etc. Whatever the unit used, a rate is the instantaneous risk of disease, rather than the risk over that unit of time. The rate is also called the force of disease.
For example, a rate of 0.6 per year could also be expressed as 0.6/12 = 0.05 per month, as 0.6/365 = 0.00164 per day, or even as 6 per 10 years.

4.3 Estimation of the rate

Although defined in terms of an individual's instantaneous risk of disease, the disease rate for one individual is not observable in practice. We estimate disease rates by observing a group of individuals who are assumed to have the same disease rate $\lambda$. Examples of such groups might be individuals in a particular age group in a particular time period, of the same sex and living in a particular geographical area.

In analysing data on outcomes that only occur once in an individual (e.g. measles or death), or in analysing the time to the first episode of a disease that may recur (e.g. malaria, common cold), each individual contributes a certain amount of observation time, starting when they join the study and finishing when they develop the disease, or are lost to follow-up, or the follow-up period ends, whichever happens first. The observation time is therefore the time during which, were the subject to experience an event, the event would be recorded in the study.

$Y$ is defined as the sum of the observation times for all individuals (the person-time-at-risk). If measured in years it is called person-years-at-risk (or pyar).

$D$ is defined as the total number of events observed among all individuals.

The rate $\lambda$ is estimated as $D/Y$ (see Appendix A for further explanation). The estimate of $\lambda$ obtained in this way is called the incidence rate.

Exercise 2: In the Whitehall study of UK civil servants, individuals were grouped as never/ex-smokers or current smokers. Complete the table (which shows the total deaths D and person-years Y in each group) by calculating the rate per 1000 person-years in each group.
                    D      Y          Rate per 1000 person-years
Never/ex-smokers    359    32145.4
Current smokers     484    22713.3

4.4 Effect measures based on rates

We may compare disease occurrence in exposed and unexposed individuals by calculating the ratio or difference of their disease rates:

Rate ratio = $\lambda_1/\lambda_0$
Rate difference = $\lambda_1 - \lambda_0$

where $\lambda_1$ and $\lambda_0$ are the disease rates in exposed and unexposed individuals respectively.

Exercise 3: Calculate the rate ratio and rate difference for current smokers (exposed) compared to never/ex-smokers in the Whitehall study.

Rate ratio =
Rate difference =          per 1000 person-years

(Note that a rate ratio is a number and has no units, whereas a rate difference has units.)

4.5 Converting rates into risks

When the time interval is short, the risk of disease is approximately the rate of disease multiplied by time (see 4.2 above). But if the time interval is not short, we need to use another formula to estimate risk.

If the rate of disease during a time interval 0 to $t$ is $\lambda$, divide the interval into $t/\delta t$ short sub-intervals of length $\delta t$, in each of which the risk is approximately $\lambda\,\delta t$. Then the risk of an individual developing the disease at some time in the interval is

$1 - \Pr(\text{escaping disease}) = 1 - (1 - \lambda\,\delta t)^{t/\delta t} \approx 1 - e^{-\lambda t}$

Example: If the disease rate is 0.001/day, the risk of developing the disease during a 1000-day period is

$1 - e^{-0.001 \times 1000} = 1 - e^{-1} = 0.6321$

(which is also approximately 1 - probability of escaping disease = $1 - 0.999^{1000}$).

If $R$ is the risk in time $(0, t)$, then

$R = 1 - e^{-\lambda t}$ or $\lambda = [-\ln(1 - R)]/t$

5. Odds measures

5.1 Odds

The odds of disease, $\Omega$, is defined as the probability that an individual experiences the disease divided by the probability that they do not:

$\Omega = \pi/(1 - \pi)$

[Note that if the risk is known, the odds can be calculated ($\Omega = \pi/(1 - \pi)$), and vice versa ($\pi = \Omega/(1 + \Omega)$).]
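
The conversions in sections 4.5 and 5.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical (they are not part of the course materials), and the rate-to-risk conversion assumes the rate is constant over the interval. The input values are chosen so as not to overlap with the exercises.

```python
import math


def rate_to_risk(rate, time):
    """Risk over (0, time) assuming a constant rate: 1 - exp(-rate * time)."""
    return 1.0 - math.exp(-rate * time)


def risk_to_rate(risk, time):
    """Inverse of rate_to_risk: rate = -ln(1 - risk) / time."""
    return -math.log(1.0 - risk) / time


def risk_to_odds(risk):
    """Odds = risk / (1 - risk); risk must be strictly less than 1."""
    return risk / (1.0 - risk)


def odds_to_risk(odds):
    """Risk = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Worked example from section 4.5: rate 0.001 per day over 1000 days.
print(round(rate_to_risk(0.001, 1000), 4))  # 0.6321

# Illustrative (hypothetical) values for the risk/odds conversions.
print(risk_to_odds(0.2))   # a risk of 0.2 corresponds to odds of about 0.25
print(odds_to_risk(3.0))   # odds of 3 correspond to a risk of 0.75
```

Note that the rate and the time must be expressed in the same unit (here, days) before they are multiplied.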
Exercise 4:

i) Calculate the odds of disease, if the risk is (a) 0.01 (b) 0.1 (c) 0.5 (d) 0.8

(a)          (b)          (c)          (d)

ii) Calculate the risk of disease, if the odds are (a) 0.05 (b) 1 (c) 2 (d) 99

(a)          (b)          (c)          (d)

5.2 Odds ratio

Consider a cohort study as below:

Disease   Exposed   Unexposed   Total
Yes       $D_1$     $D_0$       $D$
No        $H_1$     $H_0$       $H$
Total     $N_1$     $N_0$       $N$

We have seen how to calculate risks, risk ratios and risk differences from such tables. We can also calculate odds and the ratio of the odds among exposed and non-exposed, that is, the odds ratio.

For exposed individuals in the table, $\pi_1 = D_1/N_1$ and $1 - \pi_1 = H_1/N_1$ (since $N_1 = D_1 + H_1$), and the odds in exposed individuals is estimated as

$\Omega_1 = (D_1/N_1)/(H_1/N_1) = D_1/H_1$

Similarly, the odds in unexposed individuals is estimated as $\Omega_0 = D_0/H_0$.

The odds ratio is defined as $\pi_1/(1 - \pi_1)$ divided by $\pi_0/(1 - \pi_0)$, and this is estimated as:

$\dfrac{D_1/H_1}{D_0/H_0} = \dfrac{D_1 \times H_0}{D_0 \times H_1}$

5.3 Odds ratios in case-control studies

In case-control studies it is usually not possible to calculate disease rates or ratios of those rates, as the study design is essentially cross-sectional and the control group (non-cases) consists of a (usually unknown) fraction of the total population at risk. Indeed, the case group may consist of an unknown fraction of all cases.

Thus, suppose we are conducting a case-control study in a population in which the total expected distribution between cases/non-cases and exposed/non-exposed is as below:

Disease   Exposed                  Unexposed                Total
Yes       $D_1 = N_1\pi_1$         $D_0 = N_0\pi_0$         $D$
No        $H_1 = N_1(1 - \pi_1)$   $H_0 = N_0(1 - \pi_0)$   $H$
Total     $N_1$                    $N_0$                    $N$

In a case-control study we sample a fraction, $f_1$ (often unknown), of cases and a different fraction, $f_0$ (often unknown), of non-cases.
Thus the expected results of a case-control study may be represented as:

Disease   Exposed                      Unexposed                    Total
Yes       $d_1 = N_1\pi_1 f_1$         $d_0 = N_0\pi_0 f_1$         $d = f_1 D$
No        $h_1 = N_1(1 - \pi_1) f_0$   $h_0 = N_0(1 - \pi_0) f_0$   $h = f_0 H$

Because $f_1$ and $f_0$ are unknown, we cannot estimate risks or rates. What can be estimated is the odds ratio, since the sampling fractions cancel:

$(d_1/d_0)/(h_1/h_0) = (d_1 h_0)/(d_0 h_1)$

gives an estimate of

$[\pi_1/(1 - \pi_1)] / [\pi_0/(1 - \pi_0)]$

5.4 Advantages of odds ratios

For most people, odds and odds ratios are more difficult concepts than risks and risk ratios. However, there are two reasons why odds and odds ratios are used in the analysis of epidemiological studies:

(i) In case-control studies, we cannot calculate the absolute risk (or odds) of disease, because we have selected people according to disease status. However, it is possible to estimate the odds of exposure and, hence, the exposure odds ratio. The interpretation of this will be discussed further in Session 4.

(ii) Odds and log(odds) have useful mathematical properties that risks do not. Risks must lie between 0 and 1, odds must lie between 0 and $\infty$, and log(odds) can lie between $-\infty$ and $+\infty$. The fact that log(odds) are unbounded is an advantage when fitting statistical models. We shall return to this point in the sessions on logistic regression.

Thus, the analysis of case-control studies is usually based on odds ratios.

6. Approximations between rate ratios, risk ratios and odds ratios

If a disease is rare (i.e. $\pi \ll 1$), then $\pi_1/(1 - \pi_1) \approx \pi_1$ and $\pi_0/(1 - \pi_0) \approx \pi_0$, so the ratio of odds will be approximately equal to the ratio of risks. The more common the disease, the less close will be the agreement between the risk ratio and the odds ratio. In these circumstances, the odds ratio will be more extreme (further from 1) than the risk ratio.

Furthermore, if a disease is rare (small $\lambda$), then $\pi = 1 - \exp(-\lambda \times \text{Time}) \approx \lambda \times \text{Time}$, so the ratio of risks will be approximately equal to the ratio of rates.
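
A small numerical check of these approximations, written as a Python sketch. The rates below are hypothetical values chosen only for illustration, the function name `effect_ratios` is not from the course materials, and a constant rate over the follow-up period is assumed so that risk = 1 - exp(-rate x time).

```python
import math


def effect_ratios(rate_exposed, rate_unexposed, time):
    """Return rate, risk and odds ratios, assuming constant rates over `time`."""
    # Convert each rate into a risk over the follow-up period.
    risk1 = 1.0 - math.exp(-rate_exposed * time)
    risk0 = 1.0 - math.exp(-rate_unexposed * time)
    # Convert each risk into odds.
    odds1 = risk1 / (1.0 - risk1)
    odds0 = risk0 / (1.0 - risk0)
    return {
        "rate_ratio": rate_exposed / rate_unexposed,
        "risk_ratio": risk1 / risk0,
        "odds_ratio": odds1 / odds0,
    }


# Rare disease (hypothetical rates per year, one year of follow-up):
# all three ratios are close to 2.
rare = effect_ratios(0.002, 0.001, 1.0)

# Common disease (hypothetical rates): same rate ratio of 2, but the risk
# ratio falls below it and the odds ratio rises above it.
common = effect_ratios(0.8, 0.4, 1.0)

for label, result in (("rare", rare), ("common", common)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

With the same rate ratio in both scenarios, the three measures agree closely only when the disease is rare over the follow-up period; when it is common, the odds ratio is the furthest from 1 and the risk ratio the closest.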
In summary, for diseases that are rare over the course of the follow-up time:

Risk ratio $\approx$ Odds ratio $\approx$ Rate ratio.

7. Measures of exposure effect: ratios versus differences

We have mentioned two ways of assessing the effect of exposure on disease: ratio measures and difference measures. In general, ratio measures give a better measure of the strength of association between exposure and disease and are commonly used for studying the aetiology of disease (i.e. epidemiology). Difference measures (risk or rate differences) are usually more appropriate for assessing the public health implications of an exposure. The two are illustrated in the following example:

Example: Mortality of British male physicians 1951-1961

Death rate per 1000 person-years

                  Non-smokers   Smokers   Rate ratio   Rate difference
Lung cancer       0.07          2.27      32.4         2.20
Cardiovascular    7.32          9.93      1.4          2.61

The importance of smoking as an aetiological factor for lung cancer is clearly demonstrated by the large rate ratio (RR = 32.4). In contrast, the association between smoking and cardiovascular disease is less strong (RR = 1.4). From a public health viewpoint, however, smoking is just as important for cardiovascular disease as for lung cancer, because the absolute increase in death rate is similar for both diseases (2.61 and 2.20 per 1000 person-years respectively).

8. Summary

The following table shows, for each type of epidemiological study, what measure of disease and effect should be obtained, and what method may be used to adjust for a confounder:

Study design        Measure of disease   (Ratio) measure of effect
Cohort              rate, risk           risk ratio, rate ratio
Cross-sectional¹    risk, odds           risk ratio, odds ratio
Case-control                             odds ratio

¹ Note that in cross-sectional studies, risk is usually interpreted as prevalence.

Appendix A. Derivation of why D/Y estimates the rate of disease

It may not be intuitively obvious why D/Y estimates the rate of disease.
The illustration below gives a justification for this (which may be ignored by the less mathematically inclined).

Suppose the rate of disease is $\lambda$. The risk of disease in a small interval $\delta t$ is $\lambda\,\delta t$. The probability of escaping disease from time 0 to time $t$ and then developing disease in the interval $t$ to $t + \delta t$ is:

$(1 - \lambda\,\delta t)^{t/\delta t}\,\lambda\,\delta t \approx e^{-\lambda t}\,\lambda\,\delta t$ (if $\delta t$ is small)

If everyone is followed up until they get disease, then the expected time to disease is:

$\int_0^\infty t\,\lambda e^{-\lambda t}\,dt = 1/\lambda$

Thus the expected total follow-up time, $Y$, if $N$ individuals are followed until they all get disease ($D = N$) is:

$Y = D/\lambda$

Thus $\lambda = D/Y$.

Appendix B

The Mantel-Haenszel risk ratio is calculated as follows. Suppose that there are $J$ levels of a confounder. Then the association between exposure and disease in the $j$th level of the confounder may be summarised in the following table, where the subscripts 0, 1 and $j$ represent unexposed, exposed and $j$th level of the confounder respectively:

Disease   Exposed    Unexposed   Total
Yes       $D_{1j}$   $D_{0j}$    $D_j$
No        $H_{1j}$   $H_{0j}$    $H_j$
Total     $N_{1j}$   $N_{0j}$    $N_j$

If each level of the confounder is summarised in this way, then the Mantel-Haenszel risk ratio is given by:

$RR_{MH} = \dfrac{\sum_j D_{1j} N_{0j} / N_j}{\sum_j D_{0j} N_{1j} / N_j}$

where the summation is for $j = 1, 2, \ldots, J$.

A similar formula is available for a Mantel-Haenszel rate ratio (Session 2) and a Mantel-Haenszel odds ratio (Session 4).
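
As an illustration of the Appendix B formula, here is a minimal Python sketch. The function name and the two-stratum data set are hypothetical, invented only to show the arithmetic; each stratum is given as the number of diseased and the total number of individuals in the exposed and unexposed groups.

```python
def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio over confounder strata.

    `strata` is an iterable of (d1, n1, d0, n0) tuples, one per level of the
    confounder: diseased and total among exposed (d1, n1) and among
    unexposed (d0, n0). The stratum total is taken as n1 + n0.
    """
    strata = list(strata)  # allow generators, since we iterate twice
    numerator = sum(d1 * n0 / (n1 + n0) for d1, n1, d0, n0 in strata)
    denominator = sum(d0 * n1 / (n1 + n0) for d1, n1, d0, n0 in strata)
    return numerator / denominator


# Hypothetical data: the stratum-specific risk ratios are both 2
# (0.10 / 0.05 in the first stratum, 0.20 / 0.10 in the second).
example_strata = [
    (10, 100, 5, 100),
    (40, 200, 10, 100),
]
print(mantel_haenszel_risk_ratio(example_strata))
```

When every stratum has the same risk ratio, as in this made-up example, the Mantel-Haenszel estimate equals that common value; in general it is a weighted average of the stratum-specific risk ratios.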