Chapter 3-5. Disease Frequency and Effect Measures

(Note: Half of this chapter is paraphrased directly from Rothman (2002), done that way to quickly prepare a lecture while teaching out of the Rothman text. It needs to be further turned into my own work.)

Given that epidemiology is the study of the occurrence of disease, we are interested in measurements of disease frequency. The four measures of disease frequency we discuss in this chapter are risk, incidence rate, prevalence, and hazard rate. We will then combine these to construct measures of effect: risk difference, risk ratio, prevalence ratio, hazard ratio, and attributable fraction. These effect measures are used to quantify potential causal effects.

How Frequently Are Epidemiology Statistics Used

Horton and Switzer (2006) surveyed the statistical methods used in research articles published in N Engl J Med. They found that 35% of research articles published in 2004-2005 reported epidemiologic statistics.

Risk (Incidence Proportion)

Using Rothman’s notation (Rothman, 2002, p.24), we measure risk as:

$$\text{risk} = \frac{A}{N} = \frac{\text{number of subjects developing disease during a time period}}{\text{number of subjects followed for the time period}}$$

which is the proportion of subjects developing disease during a time period. That is,

$$\text{risk} = \frac{\text{\# cases}}{\text{sample size}}, \quad \text{defined for a specific time period}$$

Like proportions in general, this proportion ranges between 0 and 1:

0 = no one develops disease
up to
1 = everyone develops disease

_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine. Chapter 3-5. (Accessed February 14, 2012, at http://www.ccts.utah.edu/biostats/?pageId=5385).
Chapter 3-5 (revision 14 Feb 2012) p. 1

While we measure risk at the group level, we use it as an estimate of the probability that an individual will develop disease. To consider this proportion to be a probability estimate is consistent with probability theory.
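The risk formula can be sketched in a few lines of Python, offered here only as an arithmetic illustration alongside the Stata material. The function name `risk` and the cohort numbers are hypothetical, invented for the example; they do not come from Rothman or from any study cited in this chapter.

```python
def risk(cases: int, n_followed: int) -> float:
    """Incidence proportion A / N for one fixed follow-up period."""
    if n_followed <= 0:
        raise ValueError("need at least one subject followed")
    if not 0 <= cases <= n_followed:
        raise ValueError("cases must lie between 0 and N")
    return cases / n_followed


# Hypothetical cohort (numbers invented for illustration only):
# 1,000 subjects followed for one year, 50 of whom develop disease.
one_year_risk = risk(50, 1000)
print(f"1-year risk = {one_year_risk:.3f}")
```

The guard clauses simply enforce what the text states: the numerator cannot exceed the denominator, so the result always falls between 0 and 1.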
Definition: Probability = long-run average of occurrences of an event = “the actual probability existing in nature”

Formally stated as,

$$\text{Probability} = \lim_{N \to \infty} \frac{A}{N}$$

In epidemiology, we of course use N’s smaller than infinity, so our risks are probability estimates rather than “the exact probability existing in nature”.

Having now clarified that risk is a probability, this is a good time to point out why we use the term “risk factor” to denote a variable thought to be “causally related” (i.e., a possible component cause) to a disease outcome. We say “risk factor”, rather than “causal factor”, to convey our uncertainty that we have indeed identified an actual component cause.

If we were to measure risk in a single person, applying the risk equation, risk = A/N, risk would be either 0 or 1.

risk = 0/1 = 0 , if the person did not get disease
or
risk = 1/1 = 1 , if the person did get disease

This is rather pointless, however, since it is unlikely that a person surely will not get disease (risk=0) or surely will get disease (risk=1). When speaking about risk applying to an individual, then, we are describing the probability that a person will develop a given disease, not whether disease did or did not develop in that individual. We compute this probability from the group data as

risk = A / N, for our given follow-up period

The risk in a group is also referred to as the incidence proportion (Rothman, 2002, p.25). From the sufficient/component cause theory perspective, the incidence proportion is the proportion of study subjects that complete a sufficient cause for disease during the follow-up period.

The word “incidence”, which is a synonym for “occurrence”, is a natural choice here. An “incident” is something that occurs as the consequence of something, giving the causal implication which epidemiologists are so fond of.
Clarifying Example (Rothman, 2002, p.25)

Suppose you read that “women who are 60 years old have a 2% risk of dying from cardiovascular disease”. What does that mean? It certainly does not apply to the next 24 hours. Likewise, the risk of developing and dying from cardiovascular disease over a woman’s remaining lifetime would likely be greater than 2%. There might be a specific period of time over which the 2% figure would be correct, but any other period would have a different value of risk. The only way to interpret a risk, then, is to know the time period for which the risk applies.

Competing Risks

Risk is an excellent measure of disease frequency because everyone understands it, even without epidemiology training. It does have a drawback, however. Over a long time interval, it is impossible to accurately measure risk, because some people will die from other causes which are not being studied. This removal of people from a study due to death from other causes is called competing risks. The only outcome which does not have this problem is “death from all causes”, since there is no possibility of anyone dying from a cause not under study.

Clarifying Example (Rothman, 2002, p.27)

Suppose you wish to measure the incidence proportion of domestic violence in a population of 10,000 married women over a 30-year period. Clearly, not all of the 10,000 women will survive the 30-year period, due to deaths from cardiovascular disease or cancer. It is likely that some of the women who died from competing risks would have experienced domestic violence if they had remained in the study. The numerator of the incidence proportion is therefore too small, while the denominator still contains all 10,000 women. The result is that the incidence proportion is an underestimate of the 30-year risk.

Losses to Follow-up

A related problem with long-term follow-up is losses to follow-up.
This occurs when people drop out of the study because they move away, or because they choose to decline further participation. Losses to follow-up create the same problem as competing risks, in that the incidence proportion is underestimated when this occurs.

If no competing risks or losses to follow-up occur, the incidence proportion is a reasonable estimate of disease frequency. Generally this only occurs with very short study periods, where losses due to competing risks and losses to follow-up are rare.

Stata Exercise 1 (calculating incidence proportions with a crosstabulation table)

Look at the Bergstrom et al (2004) article, BergstromArth&Rheum2004.pdf. Notice on page 1960, second column, paragraph 2, they state:

“The cumulative incidence of symptomatic coccidioidomycosis was obtained both in patients receiving TNF antagonist therapy and in those receiving other therapies, as a group.”

Cumulative incidence is identical to incidence proportion. Even Rothman, in his earlier edition of Modern Epidemiology (Rothman, 1986, pp.23-31), used to call it that.

The task is to check the “Incidence of infection” column of Bergstrom’s Table 2, using Stata.
Bergstrom’s Table 2 contains the following information:

                     coccidioidomycosis
                     Yes       No      Total    Incidence of infection
    Infliximab         7      240        247    0.028
    Other therapy      4      734        738    0.005
    Total             11      974        985    0.011

Using Stata’s menus,

    Statistics > Summaries, tables & tests > Tables > Table calculator
    Main tab: User supplied cell frequencies: 7 240 \ 4 734
    Cell contents: within-row relative frequencies
    OK

which is identical to running the following (tabulate immediate) command in the Command Window,

    tabi 7 240 \ 4 734 , row

               |          col
           row |         1          2 |     Total
    -----------+----------------------+----------
             1 |         7        240 |       247
               |      2.83      97.17 |    100.00
    -----------+----------------------+----------
             2 |         4        734 |       738
               |      0.54      99.46 |    100.00
    -----------+----------------------+----------
         Total |        11        974 |       985
               |      1.12      98.88 |    100.00

Converting the percents (2.83% and 0.54%) to proportions (0.028 and 0.005), we see that Bergstrom’s calculations were correct.

Bergstrom never discusses competing risks or losses to follow-up in her article. Her study design used all of the patients receiving TNF antagonist therapy (infliximab) during a three-year interval, with a control group of 3 times as many patients receiving other therapies. The data were from medical records. Her three-year study period is certainly long enough that she will have competing risks and losses to follow-up, although she never discusses this potential problem. This study has the potential to be flawed by loss-to-follow-up bias, where bias means simply the deviation of results or inferences from the truth.

Incidence Rate and Mortality Rate

To address the problem of competing risks and losses to follow-up, epidemiologists often use incidence rate in place of incidence proportion (Rothman, 2002, p.28). Similar to incidence proportion, incidence rate uses the number of cases, A, as the numerator. We use a different denominator, however.
Instead of dividing by the number of people at the start of follow-up, we divide by a measure of time. This time measure is the summation, across all individuals, of the follow-up time.

$$\text{Incidence rate} = \frac{A}{\text{Time}} = \frac{\text{number of subjects developing disease}}{\text{total time experienced for the subjects followed}}$$

with the denominator usually referred to as person-time.

A mortality rate is an incidence rate in which the event being measured is death (Rothman, 2002, p.30).

$$\text{Mortality rate} = \frac{A}{\text{Time}} = \frac{\text{number of subjects dying}}{\text{total time experienced for the subjects followed}}$$

The person-time denominator represents the total “time at risk” for disease for the study group. In the following diagram, we see the time at risk for five people being followed to measure the mortality rate of leukemia (Rothman, 2002, p.30).

[Figure: follow-up timelines for five people along a Time (Years) axis marked 1 to 5. The outcomes labeled, in order, are: leukemia death, death from automobile crash, lost to follow-up, end of follow-up, end of follow-up.]

The first person died from leukemia after 3 years, and so contributed 3 years of time-at-risk to the denominator.

The second person died from another cause (a competing risk), at which time the person was no longer at risk for leukemia, contributing 4 years of time-at-risk to the denominator.

The third person was lost to follow-up. Although this person could still die from leukemia, we would never know about it, so it could not be counted in the numerator. Thus the time-at-risk ended upon loss to follow-up, contributing 2 years to the denominator.

The fourth and fifth persons were followed to the end of the study’s follow-up period, each contributing 5 years of time-at-risk to the denominator.

Therefore, the mortality rate is:

$$\text{MR} = \frac{\text{cases}}{\text{person-time}} = \frac{1}{3+4+2+5+5} = \frac{1 \text{ case}}{19 \text{ person-years}} = 0.0526 \ \frac{\text{cases}}{\text{person-year}} = 5.26 \text{ cases per 100 person-years}$$

Notice that incidence rates treat one unit of time as equivalent to another, regardless of whether these time units come from the same person or from different people.
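The person-time bookkeeping for the five-person leukemia example can be sketched as follows. This is a minimal Python illustration of the arithmetic only; the variable names are assumptions made for the example, and the follow-up times are the ones described in the text.

```python
# (years at risk, died of leukemia?) for the five people described above
follow_up = [
    (3, True),   # leukemia death after 3 years
    (4, False),  # death from automobile crash (a competing risk)
    (2, False),  # lost to follow-up
    (5, False),  # followed to the end of the study
    (5, False),  # followed to the end of the study
]

cases = sum(1 for _, died_of_leukemia in follow_up if died_of_leukemia)
person_years = sum(years for years, _ in follow_up)

mortality_rate = cases / person_years        # cases per person-year
rate_per_100 = 100 * mortality_rate          # cases per 100 person-years

print(f"{cases} case / {person_years} person-years")
print(f"MR = {mortality_rate:.4f} per person-year")
print(f"MR = {rate_per_100:.2f} per 100 person-years")
```

Each person contributes time to the denominator regardless of how follow-up ended, but only the leukemia death contributes to the numerator.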
If you allow cases in the same person to contribute more than once to the numerator, something you might do in an upper respiratory tract infection study, then the denominator should likewise include all of the time each person is at risk of getting any episode of the infection. Ideally, the follow-up for a specific individual should temporarily end when an episode begins, and resume after recovery from each episode.

If a person can experience an event only once, such as death from leukemia, the person ceases to contribute follow-up time after the event occurs.

If you are studying events that can occur more than once, but you only count the first occurrence of the event (such as first heart attack) in the numerator, then follow-up ends with the first event.

In all of these cases, the time that goes into the denominator is the time experienced by the people being followed during which the disease or event studied could have occurred. For this reason, the time tallied in the denominator of an incidence rate is often referred to as the time at risk of disease.

From the sufficient/component cause theory perspective, the best measure of person-time is the induction time, since any other time measurement will contribute excess time to the measure. However, we also learned in Chapter 2 that induction time is defined for each specific component cause. The way researchers allow for differing times for specific component causes (specific exposure variables of interest) in their analysis is by using either Poisson regression or Cox regression with time-dependent covariates.

Aside: Are researchers consistent with labeling of proportions and rates?

Incidence = number of new cases of a disease in a specified population within a specified time period.
Incidence proportion = cases / N = risk
Incidence rate = cases / person-time

Rothman makes a clear distinction between proportions and rates, but is such a distinction found in the literature? Are incidence, risk, and rate used synonymously?

Exercise. Take out the Sulkowski (2000) article, SulkowskiJAMA2000.pdf

Find on page 77, bottom of first column,

“Use of ritonavir was associated with a higher incidence of toxicity (30%, 95% CI, 17.9%-44.6%).”

The 30% comes from the Ritonavir lines of Table 3, (6+9)/(22+28)=30%.

Now go to column 2 on the same page, 2nd sentence under the Hepatotoxicity and Chronic Viral Hepatitis heading:

“Rate of severe toxicity with any PI in coinfected patients was 12.2% (13/107; 95% CI, 6.6%-19.9%).”

From these two examples, we see that the authors are using incidence and rate interchangeably.

Next, find the caption to Figure 1 on the same page,

“Incidence Rate (Cases per Persons Exposed) of Hepatotoxicity During Antiretroviral Therapy, by Drug Regimen”

The authors are using incidence rate for what Rothman refers to as incidence proportion.

Finally, look at the column headings for Table 3 on the same page, “Incidence (Cases/Persons Exposed)” and “Incidence (Cases/100 Person-Months)”.

In this article, the authors did not use the names incidence proportion and incidence rate to make the distinction, but at least they were always careful to inform the reader what type of denominator was being used.

Such inconsistency is not surprising. Even epidemiologists are inconsistent with each other when it comes to effect measure terminology, so confusion abounds. To see this, look at the Granados (1997) article.
Stata Exercise 2 (calculating incidence proportions with epitab)

The classical form for displaying an exposure-disease relationship in a 2 × 2 table (2 rows and 2 columns) is:

    Incidence Proportion Layout
                        Exposure
                    Exposed   Unexposed
    Disease
      Cases            a          b
      Noncases         c          d

There is a suite of commands in Stata, called epitab (tables for epidemiologists), used for epidemiologic statistics.

To use epitab with Bergstrom’s (2004) Table 2,

                     coccidioidomycosis
                     Yes       No      Total    Incidence of infection
    Infliximab         7      240        247    0.028
    Other therapy      4      734        738    0.005
    Total             11      974        985    0.011

we first must put the data in the order of the classical incidence proportion layout:

                        Exposure
                    Exposed   Unexposed
    Disease
      Cases            7          4
      Noncases       240        734

Now, using Stata’s epitab feature

    Statistics > Epidemiology and related > Tables for epidemiologists > Cohort study risk ratio etc. calculator
      7    4
    240  734
    OK

which is identical to running the command

    csi 7 4 240 734

where “csi” is cohort study immediate (this name comes from the incidence proportion being the measure of choice, and being a correct statistic, for a cohort study).

Note: we did not have to use the “\” for a carriage return in the csi command, as was done with tabi, because Stata knows it will be a 2 × 2 table read in row order.

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
               Cases |         7           4  |         11
            Noncases |       240         734  |        974
    -----------------+------------------------+------------
               Total |       247         738  |        985
                     |                        |
                Risk |  .0283401    .0054201  |   .0111675
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |         .02292         |    .0015582    .0442818
          Risk ratio |         5.228745       |    1.543692    17.71065
     Attr. frac. ex. |         .8087495       |    .3522022    .9435368
     Attr. frac. pop |         .5146588       |
                     +-------------------------------------------------
                                   chi2(1) =     8.80  Pr>chi2 = 0.0030

Reading from the “Risk” line, we see the proportions 0.028 and 0.005, which are identical to what we calculated using the crosstabulation approach above.
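As a rough cross-check of the “Risk” line, the same three proportions can be recomputed by hand from the four cell counts. The sketch below uses Python purely for the arithmetic; the variable names follow the a, b, c, d layout above and are otherwise assumptions of the example, not part of Stata.

```python
# Cell counts from Bergstrom's Table 2 in the incidence proportion layout
a, b = 7, 4        # cases: exposed (infliximab), unexposed (other therapy)
c, d = 240, 734    # noncases: exposed, unexposed

n_exposed = a + c                  # column total, exposed
n_unexposed = b + d                # column total, unexposed

risk_exposed = a / n_exposed
risk_unexposed = b / n_unexposed
risk_total = (a + b) / (n_exposed + n_unexposed)

print(f"Risk, exposed   = {risk_exposed:.7f}")
print(f"Risk, unexposed = {risk_unexposed:.7f}")
print(f"Risk, total     = {risk_total:.7f}")
```

The printed values should agree with the csi “Risk” row to the displayed precision.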
Stata Exercise 3 (calculating incidence rates with epitab)

The classical form for displaying an exposure-disease relationship when person-time denominators are used is:

    Incidence Rate Layout
                        Exposure
                    Exposed    Unexposed
    Disease
      Cases            a           b
      Person-Time    PT(exp)    PT(unexp)

In the Sulkowski (2000) article, Table 3, we find

    Antiretroviral                  Severe            Person-Time       Incidence Rate
    Drug Regimen                    Hepatotoxicity    (Person-Months)   (Cases/100 Person-Months)
                                    cases
    Dual nucleoside analog
      (referent regimen)                 5                246               2.0
    Protease inhibitor all
      (study regimen)                   26                795               3.3

To replicate these rates, we put the data in the Incidence Rate Layout

    Incidence Rate Layout
                        Exposure
                    Exposed    Unexposed
    Disease
      Cases           26           5
      Person-Time    795         246

Now, using Stata’s epitab feature

    Statistics > Epidemiology and related > Tables for epidemiologists > Incidence-rate ratios calculator
     26    5
    795  246
    OK

which is identical to running the command

    iri 26 5 795 246

where “iri” is incidence rate immediate.

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
               Cases |        26           5  |         31
         Person-time |       795         246  |       1041
    -----------------+------------------------+------------
                     |                        |
      Incidence Rate |  .0327044    .0203252  |   .0297791
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Inc. rate diff. |         .0123792       |   -.0094249    .0341833
     Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
     Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
     Attr. frac. pop |         .3174666       |
                     +-------------------------------------------------
                        (midp)   Pr(k>=26) =                    0.1684 (exact)
                        (midp) 2*Pr(k>=26) =                    0.3368 (exact)

Since Sulkowski was expressing his rates per 100 person-months (3.3 and 2.0), we multiply our incidence rates by 100 to replicate his result.

Person-Time Units

Researchers generally express their rates in whatever person-time units provide at least one digit to the left of the decimal place.
Notice that rate = 3.3 cases per 100 person-years is much easier to grasp than rate = 0.033 cases per person-year.

Aside: Changing units of person-time

To change the units of anything, we divide by the new units. For example, if we have 30 eggs and we want to express it in dozens, we divide by a dozen (or 12),

eggs in dozens = 30/12 = 2.5 dozen

The general form for a rate is:

$$\text{rate} = \frac{\text{cases}}{\text{person-years}}$$

To express this in 100 person-years units, or per 100 person-years, which is person-years in units of 100, we simply divide the person-years by 100

$$\text{rate per 100 person-years} = \frac{\text{cases}}{\text{person-years}/100} = \frac{\text{cases}}{\text{person-years} \times \frac{1}{100}} = \frac{\text{cases} \times 100}{\text{person-years}} = \text{rate} \times 100$$

which we see is simply multiplying the original rate by 100, the unit of person-years we desire.

Prevalence Proportion

Both incidence proportion and incidence rate measure the frequency of disease onset. In contrast, prevalence proportion, or simply prevalence, does not measure disease onset, but instead is a measure of disease status (Rothman, 2002, p.40). It is the proportion of people in a population that has a specific disease.

For a population of size N, where P individuals have the disease at a given point in time,

prevalence proportion = P/N.

For example, suppose that among 10,000 female residents of a town on July 1, 2001, 1200 have hypertension. Then,

prevalence proportion = 1200/10,000 = 0.12, or 12%

Prevalence proportion is affected by disease occurrence, as the greater the incidence of disease, the more people will have it. Prevalence is also related to the length of time that a person has disease, as the longer the duration of disease once it occurs, the higher the prevalence.

Because prevalence is a mixture of incidence rate and disease duration, it is not as useful for studying the cause of disease.
It is useful, however, as a measure of disease burden (such as what proportion of the population at any given point in time have the disease and thus contribute to health care costs).

Life Tables and Hazard Rates

Although the incidence rate is a more sensitive analysis than the incidence proportion, because its denominator allows for attrition due to competing risks and losses to follow-up, it is usually too simplistic. That is, it makes no distinction between cases that occur early on and cases that occur later. In its simplicity, it assumes that the risk is constant over the follow-up period.

It is most often the case that the incidence rate changes over the follow-up period. To account for the changing incidence rate, as well as the shrinking denominator, we can calculate the risk separately for subintervals of the time period. This is called a life-table.

This will be demonstrated with data provided in Lee (1980, Table 3.5, p.31), which originally came from Myers (1969). The data represent a cohort of male patients with localized cancer of the rectum diagnosed in Connecticut from 1935 to 1944.

In life-table format,

Life Table for male patients with localized cancer of the rectum diagnosed in Connecticut from 1935 to 1954 (LeeLife dataset)

    Interval   Beginning              Lost to
    (years)        N       Deaths    Follow-up    Hazard
       1          388        167         2        0.5502
       2          219         45         1        0.2296
       3          173         45         1        0.3000
       4          127         19         1        0.1624
       5          107         17         1        0.1735
       6           89         11         1        0.1325
       7           77          8         1        0.1103
       8           68          0         1        0.0000
       9           67          6         1        0.0945
      10           60          7         1        0.1250

The “Hazard” is the interval-specific risk (year-specific risk in this example).

The traditional calculation for life tables is to use the actuarial method of adjustment for deaths and censored observations.
In the actuarial method, deaths and losses are assumed to occur in the middle of the time interval, so the formula is:

interval-specific hazard = deaths / [(beginning N) - (1/2)(deaths + lost)]

For the first interval, we get

year 1 hazard = 167 / [388 - (1/2)(167 + 2)]
              = 167 / [388 - 84.5]
              = 167 / 303.5
              = 0.5502

If we added a cumulative failure column, without using the actuarial adjustment, based on

interval-specific hazard = deaths / (beginning N)

it would be the Kaplan-Meier failure curve. We will do this in Chapter 10, as well as defer the calculation of a life-table in Stata until then.

Kaplan-Meier curves, accompanied by Cox regression, are a much better and much more often used approach than incidence proportions and incidence rates. Kaplan-Meier curves and Cox regression are survival analysis methods which, like life-tables, are methods of calculating risks over a time period with changing incidence and shrinking denominators.

The interval-specific hazard described above can be extended into an effect measure called the hazard rate, usually expressed without the actuarial adjustment. The hazard rate is the theoretical limit approached by an incidence rate as the time interval is narrowed toward zero (Rothman and Greenland, 1998, p.35).

Measures of Causal Effects

The primary goal of epidemiologic research is to study the causes of disease.

To determine whether an exposure causes disease is problematic, however, because of the following essential epidemiologic principle (Rothman, 2002, p.44),

Essential epidemiologic principle

A person may be exposed to an agent and then develop disease without there being any causal connection between exposure and disease.

That is, the exposure-disease relationship may only be an association, where the disease seemed to follow the exposure in time, but was actually caused by something else.
For this reason (the relationship may be merely an association), we cannot consider the incidence proportion or the incidence rate among exposed people to measure a causal effect.

To measure a causal effect, we have to compare the experience of exposed people with what would have happened without the exposure. This is the counterfactual ideal.

The Counterfactual Ideal

If we compare risks or incidence rates between exposed and unexposed people, we cannot be certain that the differences observed are attributable to the exposure. The differences could be attributable to other factors that differ between exposed and unexposed people, some being factors that we have not even measured.

The ideal comparison would be comparing people to themselves, in both an exposed and unexposed state, at a single instant in time when nothing else varied (the perfect situation of varying a single factor while holding all else constant). If this impossible goal were achievable, we could determine the effect of the exposure, because the only difference between the two settings would be the exposure. Because this situation is not realistic, it is called counterfactual (Rothman, 2002, p.45).

A crossover study does use the same person, and so comes close to the counterfactual ideal, but does not quite achieve it because a person can only be in one treatment group at a time, and thus uncontrolled factors can vary.

Although not achievable, the counterfactual ideal provides a reference point, or best case, for judging a measure of causal effect. That is, the closer our study approximates the counterfactual, the better our effect measure represents an actual causal effect.

Effect Measures

We cannot achieve the counterfactual ideal in a research study, but we can strive to come as close as possible. What we do is compare an exposed group to an unexposed group, where the two groups are as close as possible except for the exposure.
We might perform a crossover study, or perform a randomized experiment to balance the effect of other factors, or perhaps choose an unexposed group with a similar risk profile. We might also use statistical methods such as stratification and regression, which are discussed later, to help assure comparability.

For a crude analysis, in particular, which does not use stratification or regression, we must make the following assumption (Rothman, 2002, p.47),

“Otherwise Comparable” Assumption

If we can assume that the exposed and unexposed groups are otherwise comparable with regard to risk for disease, we can compare measures of disease occurrence to assess the effect of the exposure.

In other words, we must assume that there is no confounding.

Risk Difference

The risk difference (or incidence proportion difference) is simply the difference in incidence proportions between the exposed and unexposed groups.

$$RD = R_1 - R_0 = \frac{A_1}{N_1} - \frac{A_0}{N_0}$$

where 1 denotes exposed group and 0 denotes unexposed group

We subtract in this order so that if risk is greater for the exposed group, the number is positive. A positive number implies risk is increasing, while a negative number implies risk is decreasing, with exposure.

We computed this above for Bergstrom’s (2004) data,

    csi 7 4 240 734

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
               Cases |         7           4  |         11
            Noncases |       240         734  |        974
    -----------------+------------------------+------------
               Total |       247         738  |        985
                     |                        |
                Risk |  .0283401    .0054201  |   .0111675
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |         .02292         |    .0015582    .0442818
          Risk ratio |         5.228745       |    1.543692    17.71065
     Attr. frac. ex. |         .8087495       |    .3522022    .9435368
     Attr. frac. pop |         .5146588       |
                     +-------------------------------------------------
                                   chi2(1) =     8.80  Pr>chi2 = 0.0030
Incidence Rate Difference

The incidence rate difference is simply the difference in incidence rates between the exposed and unexposed groups.

$$IRD = IR_1 - IR_0 = \frac{A_1}{PT_1} - \frac{A_0}{PT_0} = \frac{a}{PT_1} - \frac{b}{PT_0}$$

where 1 denotes exposed group and 0 denotes unexposed group

We computed this above for Sulkowski’s (2000) data.

    iri 26 5 795 246

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
               Cases |        26           5  |         31
         Person-time |       795         246  |       1041
    -----------------+------------------------+------------
                     |                        |
      Incidence Rate |  .0327044    .0203252  |   .0297791
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Inc. rate diff. |         .0123792       |   -.0094249    .0341833
     Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
     Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
     Attr. frac. pop |         .3174666       |
                     +-------------------------------------------------
                        (midp)   Pr(k>=26) =                    0.1684 (exact)
                        (midp) 2*Pr(k>=26) =                    0.3368 (exact)

Difference measures such as risk difference and incidence rate difference measure the absolute effect of an exposure. It is also possible to measure the relative effect.

Risk Ratio or Relative Risk

The risk ratio, also called relative risk, is simply the ratio of incidence proportions for the exposed and unexposed groups.

$$RR = \frac{R_1}{R_0}$$

where 1 denotes exposed group and 0 denotes unexposed group

We place the exposed group risk on top, so that if risk is greater for the exposed group, RR > 1. An RR > 1 indicates that the exposure is a risk factor, while an RR < 1 indicates that the exposure is a protective factor. We sometimes refer to factors as deleterious or protective, respectively.

If the factor has no effect, then the numerator and denominator risks are equal, so RR = 1 indicates no effect, in contrast to an RD = 0 indicating no effect.
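To see the absolute and relative measures side by side, the risk difference and risk ratio for Bergstrom’s (2004) counts can be recomputed directly from their definitions. This is a minimal Python sketch of the arithmetic only; it does not attempt the confidence intervals that Stata reports, and the variable names are assumptions of the example.

```python
# Bergstrom (2004): cases and group sizes, exposed = infliximab
cases_exposed, n_exposed = 7, 247
cases_unexposed, n_unexposed = 4, 738

r1 = cases_exposed / n_exposed        # risk in the exposed, R1
r0 = cases_unexposed / n_unexposed    # risk in the unexposed, R0

risk_difference = r1 - r0             # absolute effect; 0 means no effect
risk_ratio = r1 / r0                  # relative effect; 1 means no effect

print(f"RD = {risk_difference:.5f}")
print(f"RR = {risk_ratio:.6f}")
```

Both numbers describe the same pair of risks: the difference is small in absolute terms because the disease is uncommon in both groups, while the ratio is large because the exposed risk is several times the unexposed risk.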
We computed this above for Bergstrom’s (2004) data,

    csi 7 4 240 734

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
               Cases |         7           4  |         11
            Noncases |       240         734  |        974
    -----------------+------------------------+------------
               Total |       247         738  |        985
                     |                        |
                Risk |  .0283401    .0054201  |   .0111675
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |         .02292         |    .0015582    .0442818
          Risk ratio |         5.228745       |    1.543692    17.71065
     Attr. frac. ex. |         .8087495       |    .3522022    .9435368
     Attr. frac. pop |         .5146588       |
                     +-------------------------------------------------
                                   chi2(1) =     8.80  Pr>chi2 = 0.0030

Rate Ratio, or Incidence Rate Ratio, or Relative Rate

The rate ratio, also called incidence rate ratio or relative rate, is simply the ratio of incidence rates for the exposed and unexposed groups.

$$IRR \text{ (or } RR\text{)} = \frac{IR_1}{IR_0}$$

where 1 denotes exposed group and 0 denotes unexposed group

As with the risk ratio, we place the exposed group rate on top, so that if the rate is greater for the exposed group, IRR > 1. An IRR > 1 indicates that the exposure is a risk factor, while an IRR < 1 indicates that the exposure is a protective factor.

If the factor has no effect, then the numerator and denominator rates are equal, so IRR = 1 indicates no effect, in contrast to an IRD = 0 indicating no effect.

We computed this above for Sulkowski’s (2000) data.

    iri 26 5 795 246

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+------------
               Cases |        26           5  |         31
         Person-time |       795         246  |       1041
    -----------------+------------------------+------------
                     |                        |
      Incidence Rate |  .0327044    .0203252  |   .0297791
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Inc. rate diff. |         .0123792       |   -.0094249    .0341833
     Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
     Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
     Attr. frac. pop |         .3174666       |
                     +-------------------------------------------------
                        (midp)   Pr(k>=26) =                    0.1684 (exact)
                        (midp) 2*Pr(k>=26) =                    0.3368 (exact)

How to Interpret Sizes of Effect

Risk Difference

We call the risk difference an absolute effect.

RD = R_exposed - R_unexposed
which has a range of [-1, 1]
and equals 0 when there is no effect (R_exposed = R_unexposed)

Some example interpretations are:

RD = 0       “there was no effect of exposure”
RD = 0.75    “there was an absolute 75% increase in risk with the exposure”

Suppose we are studying an intervention, such as a new therapy, that decreases risk. We would call our new therapy the exposed group and the standard therapy the unexposed group. Then we would say,

RD = -0.25   “there was an absolute 25% decrease in risk with the new therapy”

Incidence Rate Difference

Similarly, we call the incidence rate difference an absolute effect.

IRD = IR_exposed - IR_unexposed
which has a range of (-∞, +∞)
and equals 0 when there is no effect (IR_exposed = IR_unexposed)

The incidence rate difference has an infinite range because an incidence rate itself has no upper bound: its range is [0, +∞), so the difference of two rates can be any number in (-∞, +∞). This is due to the 1/time units of the rate. We can greatly influence the value of the rate by adjusting the units of time.

For example,

IR = 0.025 per person-day
   = 9.13 per person-year (0.025 × 365.25, where we use the “0.25” fraction to allow for 366 days in a leap year, which occurs every fourth year)
   = 91,312.5 per 10,000 person-years (0.025 × 365.25 × 10,000)

Some example interpretations are:

IRD = 0                “there was no effect of exposure”
IRD = 5.5/person-day   “there was an absolute 5.5 cases per person-day increase in disease rate with exposure”

Risk Ratio

We call the risk ratio a relative effect.
    RR = Rexposed / Runexposed
       which has a range of [0, +∞)
       = 1 when there is no effect (Rexposed = Runexposed)

Some example interpretations are:

    RR = 1      “there was no effect of exposure”
    RR = 3.2    “there was a 3.2-fold increase in risk with the exposure”
    RR = 3.2    “there was a 220% increase in risk with the exposure”
                (this comes from RR - 1 = 3.2 - 1 = 2.2, or 220%, since RR = 1 is no effect)

Suppose we are studying an intervention, such as a new therapy, that decreases risk. We would call our new therapy the exposed group and the standard therapy the unexposed group. An algebraically similar magnitude of decrease as the RR = 3.2 example would be 1/RR = 1/3.2 = 0.31, so our protective RR would be

    RR = 0.31

To express this as a proportion decrease we use 1 - RR = 1 - 0.31 = 0.69, or 69%:

    “there was a 69% decrease in risk with the new therapy”

It hardly seems fair that a deleterious effect can range from 1 to +∞, and so can explode into a really large effect, while a protective effect is crunched up between 0 and 1, and so can never get bigger than a 100% decrease. It makes sense, however, since the best you can do is eliminate all cases of disease, which is a 100% reduction.

Incidence Rate Ratio

We call the incidence rate ratio a relative effect.

    IRR = IRexposed / IRunexposed
       which has a range of [0, +∞)
       = 1 when there is no effect (IRexposed = IRunexposed)

Some example interpretations are:

    IRR = 1      “there was no effect of exposure”
    IRR = 3.2    “there was a 3.2-fold increase in disease rate with the exposure”
    IRR = 3.2    “there was a 220% increase in disease rate with the exposure”
                 (this comes from IRR - 1 = 3.2 - 1 = 2.2, or 220%, since IRR = 1 is no effect)

Notice that with IRR, we no longer have to include the units of time. This is because the units of time cancel out of the IRR equation.
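As a rough numerical check of this cancellation, the sketch below computes the same rate ratio twice with Stata's display command, once with person-time in years and once in days. The counts (5 cases in 88 person-years, 3 cases in 50 person-years) are hypothetical values used only for illustration, and the conversion assumes 365.25 days per year.

```stata
* Hypothetical counts: 5 cases in 88 person-years (exposed),
* 3 cases in 50 person-years (unexposed)
* Rate ratio with person-time measured in years
display (5/88)/(3/50)

* Same person-time re-expressed in person-days (assumed 365.25 days per year)
display (5/(88*365.25))/(3/(50*365.25))

* The individual rates, by contrast, do depend on the time unit
display 5/88
display 5/(88*365.25)
```

The first two commands should display the same ratio (about 0.947), while the last two show the exposed rate changing by a factor of 365.25 when the time unit changes.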
To illustrate this,

$$ \mathrm{IRR} = \frac{c / PT_1}{d / PT_0} = \frac{5 / 88\ \text{person-years}}{3 / 50\ \text{person-years}} \quad \text{, for example} $$

$$ = \frac{\frac{5}{88} \cdot \frac{1}{\text{person-years}}}{\frac{3}{50} \cdot \frac{1}{\text{person-years}}} = \frac{5/88}{3/50} \quad \text{, since we can cancel the units (just as we cancel numbers)} $$

Attributable Fraction and Preventable Fraction

If we divide the risk difference by the risk in the exposed, we get the attributable fraction (Rothman, 2002, p.53).

$$ AF = \frac{RD}{R_1} = \frac{R_1 - R_0}{R_1} = \frac{R_1}{R_1} - \frac{R_0}{R_1} = 1 - \frac{1}{RR} = \frac{RR - 1}{RR} $$

If the risk difference reflects a causal effect that is not distorted by any bias (including confounding bias), then the attributable fraction is the proportion of the disease burden among exposed people that is caused by the exposure (Rothman, 2002, p.53).

If the exposure is protective, we can calculate the preventable fraction (Rothman and Greenland, 1998, p.55).

$$ PF = \frac{-RD}{R_0} = \frac{R_0 - R_1}{R_0} = \frac{R_0}{R_0} - \frac{R_1}{R_0} = 1 - \frac{R_1}{R_0} = 1 - RR $$

If the risk difference reflects a causal effect that is not distorted by any bias (including confounding bias), then the preventable fraction is the proportion of the disease burden among nonexposed people that could be prevented by exposure (Rothman and Greenland, 1998, p.55).

To obtain the overall attributable fraction for the population, we multiply the attributable fraction by the proportion of all cases in the total population that are exposed (Rothman, 2002, p.54).

For example, if AF = 0.8 and there are 1400 cases in the entire population, of whom 500 are exposed (proportion of exposed cases is 500/1400 = 0.357), then the overall attributable fraction for the population is 0.8 × 0.357 = 0.286. That is, 28.6% of all cases in the population are attributable to the exposure.
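The arithmetic in these formulas can be sketched with Stata's display command. The risk ratio of 5 below is a hypothetical value chosen only because it yields AF = 0.8. The 500 exposed cases out of 1400 are the figures from the example above, and RR = 0.31 is the protective risk ratio from the Risk Ratio interpretation examples.

```stata
* Attributable fraction among the exposed: AF = (RR - 1)/RR
* RR = 5 is a hypothetical value that gives AF = 0.8
display (5 - 1)/5

* Population attributable fraction:
* AF x (proportion of all cases that are exposed)
display 0.8*(500/1400)

* Preventable fraction for a protective exposure: PF = 1 - RR
display 1 - 0.31
```

The second command should return roughly 0.286, the 28.6% quoted in the example.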
We computed this above for Bergstrom’s (2004) data, based on incidence proportions,

    csi 7 4 240 734

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+-----------
               Cases |         7           4  |         11
            Noncases |       240         734  |        974
    -----------------+------------------------+-----------
               Total |       247         738  |        985
                     |                        |
                Risk |  .0283401    .0054201  |   .0111675
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |         .02292         |    .0015582    .0442818
          Risk ratio |        5.228745        |    1.543692    17.71065
     Attr. frac. ex. |        .8087495        |    .3522022    .9435368
     Attr. frac. pop |        .5146588        |
                     +-------------------------------------------------
                                   chi2(1) =     8.80  Pr>chi2 = 0.0030

and again with Sulkowski’s (2000) data, based on incidence rates,

    iri 26 5 795 246

                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+-----------
               Cases |        26           5  |         31
         Person-time |       795         246  |       1041
    -----------------+------------------------+-----------
                     |                        |
      Incidence Rate |  .0327044    .0203252  |   .0297791
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Inc. rate diff. |        .0123792        |   -.0094249    .0341833
     Inc. rate ratio |        1.609057        |    .6080283     5.36572 (exact)
     Attr. frac. ex. |        .3785178        |   -.6446602    .8136317 (exact)
     Attr. frac. pop |        .3174666        |
                     +-------------------------------------------------
                        (midp)   Pr(k>=26) =                    0.1684 (exact)
                        (midp) 2*Pr(k>=26) =                    0.3368 (exact)
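The two attributable fraction rows in each output can be reproduced by hand from the printed ratio and the case counts. The following is a minimal sketch using display, assuming the formulas AF = (RR - 1)/RR and population AF = AF × (exposed cases / all cases) given earlier in this section. Because it starts from the rounded ratios printed in the output, the last digit may differ slightly.

```stata
* Bergstrom data (csi output): risk ratio 5.228745, 7 of 11 cases exposed
display (5.228745 - 1)/5.228745
display ((5.228745 - 1)/5.228745)*(7/11)

* Sulkowski data (iri output): rate ratio 1.609057, 26 of 31 cases exposed
display (1.609057 - 1)/1.609057
display ((1.609057 - 1)/1.609057)*(26/31)
```

These four results should agree, up to rounding, with the "Attr. frac. ex." and "Attr. frac. pop" rows shown above (.8087495 and .5146588 for the risk data, .3785178 and .3174666 for the rate data).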
If the exposure is categorized into more than two levels, we can use an extension of our formula which takes into account each of the exposure levels (Rothman, 2002, p.54),

$$ \text{total attributable fraction} = \sum_i (AF_i \times P_i) $$

where i represents an exposure category and P_i is the proportion of all cases that fall in category i.

Hypothetical data giving 1-year disease risk for people at three levels of exposure

                                            Exposure
                                  None       Low      High      Total
    Disease                        100     1,200     1,200      2,500
    No disease                   9,900    58,800    28,800     97,500
    Total                       10,000    60,000    30,000    100,000
    Risk                          0.01      0.02      0.04      0.025
    Risk ratio                    1.00      2.00      4.00
    Proportion of all cases       0.04      0.48      0.48

The attributable fraction for the group with no exposure is 0. For the low-exposure group, the attributable fraction is (RR-1)/RR = (2-1)/2 = 0.5. For the high-exposure group, AF = (4-1)/4 = 0.75. The total attributable fraction is:

    total AF = Σ (AFi × Pi) = 0 + 0.5(0.48) + 0.75(0.48) = 0.60

Published Population Attributable Fraction Example

Lee et al (N Engl J Med, 2006) report the following:

Statistical Analysis Section (p.140, second to last paragraph)

    We estimated the population attributable risk (PAR) for heart failure associated with parental occurrence of the condition as a function of the proportion of cases occurring in those with a parent with heart failure (pd) and the multivariable-adjusted relative risk (RR, equivalent to hazard ratio from models with clinical covariates), calculated[19] as

    $$ PAR = pd \times \frac{RR - 1}{RR} \times 100. $$

Results Section (p.143, first paragraph)

    The population-attributable risk of heart failure that was due to the presence of the condition in a parent was 17.8 percent.

Discussion Section (p.145, end of first paragraph)

    Approximately 18 percent of the heart-failure burden in the offspring was attributable to parental heart failure. Our findings that heart failure in their parents predisposes people to both left ventricular systolic dysfunction and overt heart failure underscore the contribution of familial factors to development of the condition.
This was calculated using the following,

    disp 39/(39+51)*(1.70-1)/1.70*100

which evaluates to approximately 17.8, matching the reported value. Here the 39 and 51 come from the 2nd row of Table 5 (the number of cases for the two groups) and the 1.70 is the multivariable-adjusted HR shown on row 6.

Lee et al are using their own sample data to estimate the proportion of all cases in the total population that is exposed, which is the “pd” in their PAR equation. Generally, you would not do this, because a study sample is generally restricted in some way and so is not representative of the population (see the Note at the end of Stata Exercise 4 below). Lee’s estimate might be sufficiently accurate, however, since the sample is a community-based sample (it comes from the Framingham Offspring Study, begun in 1971 with the enrollment of children of the original Framingham Heart Study cohort).

In Lee’s Statistical Analysis Section, reference 19 is cited. This is Rockhill (1998), which is a very helpful paper on the subject. This paper is well worth reading before publishing a population attributable fraction.

Stata Exercise 4 (using variables with epitab)

We will practice using the following data file.

Evans County Dataset (evans.dta)

Source
    dataset to accompany Kleinbaum and Klein (K&K chapter 2)
    http://www.sph.emory.edu/~dkleinb/logreg2.htm#data

Brief Description
    Data are from a cohort study in which n=609 white males were followed for 7 years, with coronary heart disease as the outcome of interest.
Codebook (n = 609)

    outcome
        chd    coronary heart disease (1=presence, 0=absence)

    predictors
        cat    catecholamine level (1=high, 0=normal)
        age    age in years (continuous)
        chl    cholesterol (continuous)
        smk    smoker (1=ever smoked, 0=never smoked)
        ecg    electrocardiogram abnormality (1=presence, 0=absence)
        dbp    diastolic blood pressure (continuous)
        sbp    systolic blood pressure (continuous)
        hbp    high blood pressure (1=presence, 0=absence),
               defined as: SBP ≥ 160 or DBP ≥ 95

    data management
        id     subject identifier (unique #, one observation per subject)

Start the Stata program and read in the data,

    File
      Open
        Find the directory where you copied the course CD
        Change to the subdirectory datasets & do-files
        Single click on evans.dta
        Open

    use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\evans.dta", clear

    * which must be all on one line, or use:

    cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
    cd "Biostats & Epi With Stata\datasets & do-files"
    use evans.dta, clear

To look at the association between CHD and Smoking, we use

    Statistics
      Epidemiology and related
        Tables for epidemiologists
          Cohort study risk ratio etc.
            Main tab: Case variable: chd
                      Exposed variable: smk
            OK

    cs chd smk

                     | smk                    |
                     |   Exposed   Unexposed  |      Total
    -----------------+------------------------+-----------
               Cases |        54          17  |         71
            Noncases |       333         205  |        538
    -----------------+------------------------+-----------
               Total |       387         222  |        609
                     |                        |
                Risk |  .1395349    .0765766  |   .1165846
                     |                        |
                     |      Point estimate    |    [95% Conf. Interval]
                     |------------------------+------------------------
     Risk difference |        .0629583        |    .0138116     .112105
          Risk ratio |        1.822161        |    1.083858    3.063382
     Attr. frac. ex. |        .4512012        |    .0773703    .6735634
     Attr. frac. pop |        .3431671        |
                     +-------------------------------------------------
                                   chi2(1) =     5.43  Pr>chi2 = 0.0198

Note: We see from the “Attr. frac. pop” line that the overall attributable fraction for the population is 0.34, or 34%.
Recall that this is computed by AF × (proportion of all cases in the total population that are exposed), or 0.4512012 × (54/71). We can verify this using the display command in Stata’s command window,

    display (54/71)*.4512012
    .34316711

Note. Stata’s “prevented fraction for the population” is of little interest, because it depends on an estimate of the “proportion of all cases in the total population that is exposed” being made from your sample data. It is hard to convince your reader that this estimate is reliable, since research samples are rarely representative of the “total population”. Research samples in medical research are usually restricted in some way in order to eliminate confounding and to better test theories.

References

Blossfeld H-P, Hamerle A, Mayer KU. (1989). Event History Analysis: Statistical Theory and Application in the Social Sciences. Hillsdale NJ, Lawrence Erlbaum Associates.

Granados JAT. (1997). On the terminology and dimensions of incidence. J Clin Epidemiol 50(8):891-897.

Horton NJ, Switzer SS. (2005). Statistical methods in the Journal. [letter] N Engl J Med 353(18):1977-1979.

Lee ET. (1980). Statistical Methods for Survival Data Analysis. Belmont CA, Lifetime Learning Publications.

Myers MH. (1969). A Computing Procedure for a Significance Test of the Difference Between Two Survival Curves, Methodological Note No. 18 in Methodological Notes compiled by the End Results Sections, National Cancer Institute, National Institutes of Health, Bethesda, Maryland.

Rockhill B, Newman B, Weinberg C. (1998). Use and misuse of population attributable fractions. Am J Public Health 88(1):15-19.

Rothman KJ. (1986). Modern Epidemiology. Boston, Little, Brown.

Rothman KJ. (2002). Epidemiology: An Introduction. Oxford, Oxford University Press.

Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA, Lippincott-Raven Publishers.

Sulkowski MS, Thomas DL, Chaisson RE, Moore RD. (2000).
Hepatotoxicity associated with antiretroviral therapy in adults infected with human immunodeficiency virus and the role of hepatitis C or B virus infection. JAMA 283(1):74-80.