Disease Frequency and Effect Measures

Chapter 3-5. Disease Frequency and Effect Measures
<< Half of this chapter is paraphrase right out of Rothman (2002), done that way
to quickly prepare a lecture while teaching out of the Rothman text. It needs to be
further turned into my own work. >>
Given that epidemiology is the study of the occurrence of disease, we are interested in
measurements of disease frequency.
Four measures of disease frequency we discuss in this chapter are: risk, incidence rate,
prevalence, and hazard rate.
We will then combine these to construct measures of effect: risk difference, risk ratio,
prevalence ratio, hazard ratio, and attributable fraction.
These effect measures are used to quantify potential causal effects.
How Frequently Are Epidemiology Statistics Used
Horton and Switzer (2006) surveyed what statistical methods are used in research articles
published in N Engl J Med. They found that 35% of research articles published in 2004-2005
reported epidemiologic statistics.
Risk (Incidence Proportion)
Using Rothman’s notation (Rothman, 2002, p.24), we measure risk as:
$$\text{risk} = \frac{A}{N} = \frac{\text{number of subjects developing disease during a time period}}{\text{number of subjects followed for the time period}}$$
which is the proportion of subjects developing disease during a time period.
That is,
$$\text{risk} = \frac{\text{\# cases}}{\text{sample size}}, \quad \text{defined for a specific time period}$$
Like proportions in general, this proportion ranges between 0 and 1:
0 = no one develops disease
up to
1 = everyone develops disease
_____________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah
School of Medicine. Chapter 3-5. (Accessed February 14, 2012, at http://www.ccts.utah.edu/biostats/
?pageId=5385).
Chapter 3-5 (revision 14 Feb 2012)
p. 1
While we measure risk at the group level, we use it as an estimate of the probability that an
individual will develop disease.
To consider this proportion to be a probability estimate is consistent with probability theory.
Definition: Probability = long run average of occurrences of an event
= “the actual probability existing in nature”
Formally stated as,
$$\text{Probability} = \frac{A}{N} \quad \text{as } N \to \infty$$
In epidemiology, we, of course, use N’s smaller than infinity, so our risks are probability
estimates, rather than “the exact probability existing in nature”.
Having now clarified that risk is a probability, this is a good time to point out why we use the
term “risk factor” to denote a variable thought to be “causally related” (i.e., a possible component
cause) to a disease outcome. We say “risk factor”, rather than “causal factor”, to acknowledge our
uncertainty about whether we have indeed identified an actual component cause.
If we were to measure risk in a single person, applying the risk equation, risk = A/N,
risk would be either 0 or 1:
risk = 0/1 = 0, if the person did not get disease
or
risk = 1/1 = 1, if the person did get disease
This is rather pointless, however, since it is unlikely that a person surely will not get disease
(risk=0) or surely will get disease (risk=1).
When speaking about risk applying to an individual, then, we are describing the probability that a
person will develop a given disease—not whether disease did or did not develop in an
individual—which we compute from the group data as
risk = A / N, for our given follow-up period
The risk in a group is also referred to as the incidence proportion (Rothman, 2002, p.25).
From the sufficient/component cause theory perspective, the incidence proportion is the
proportion of study subjects that complete a sufficient cause for disease during the follow-up
period.
The word “incidence”, which is a synonym for “occurrence”, is a natural choice here. An
“incident” is something that occurs as the consequence of something, giving the causal
implication which epidemiologists are so fond of.
Clarifying Example (Rothman, 2002, p.25)
Suppose you read that “women who are 60 years old have a 2% risk of dying from cardiovascular
disease”.
What does that mean?
It certainly does not apply to the next 24 hours.
Likewise, the risk of developing and dying from cardiovascular disease over a woman's remaining
lifetime would likely be greater than 2%.
There might be a specific period of time over which the 2% figure would be correct, but any
other period would have a different value of risk.
The only way to interpret a risk, then, is to know the time period for which the risk applies.
Competing Risks
Risk is an excellent measure of disease frequency because everyone understands it, even without
epidemiology training.
It does have a drawback, however. Over a long time interval, it is impossible to accurately
measure risk, because some people will die from other causes which are not being studied.
We call this phenomenon, in which people are removed from a study by death from other causes,
competing risks.
The only outcome which does not have this problem is “death from all causes”, since there is no
possibility of anyone dying from a cause not under study.
Clarifying Example (Rothman, 2002, p.27)
Suppose you wish to measure the incidence proportion of domestic violence in a population of
10,000 married women over a 30-year period.
Clearly, not all of the 10,000 women will survive the 30-year period, due to deaths from
cardiovascular disease or cancer.
It is likely that some of these women who died from competing risks would have experienced
domestic violence if they had remained in the study.
The numerator of the incidence proportion is underestimated, then, while the denominator still
contains all 10,000 women. The result is that the incidence proportion is an underestimate of the
30-year risk.
Losses to Follow-up
A related problem with long-term follow-up is losses to follow-up.
This occurs when people drop out of the study because they move away, or because they choose
to decline further participation.
Losses to follow-up create the same problem as competing risks, in that the incidence proportion
is underestimated when this occurs.
If no competing risks or losses to follow-up occur, the incidence proportion is a reasonable
estimate of disease frequency.
Generally this only occurs with very short study periods, where losses due to competing risks and
losses to follow-up are rare.
Stata Exercise 1 (calculating incidence proportions with a crosstabulation table)
Look at the Bergstrom et al (2004) article, BergstromArth&Rheum2004.pdf. Notice on page
1960, second column, paragraph 2, they state:
“The cumulative incidence of symptomatic coccidioidomycosis was obtained both in
patients receiving TNF antagonist therapy and in those receiving other therapies, as a
group.”
Cumulative incidence is identical to incidence proportion. Even Rothman, in his earlier edition
of Modern Epidemiology (Rothman, 1986, pp.23-31), used to call it that.
The task is to check the “Incidence of infection” column of Bergstrom’s Table 2, using Stata.
Bergstrom’s Table 2 contains the following information:
                              Infliximab   Other therapy    Total
coccidioidomycosis   Yes           7             4            11
                     No          240           734           974
Total                            247           738           985
Incidence of infection         0.028         0.005         0.011
Using Stata’s menus,
Statistics
Summaries, tables & tests
Tables
Table calculator
Main tab: User supplied cell frequencies: 7 240 \ 4 734
Cell contents: within-row relative frequencies
OK
which is identical to running the following (tabulate immediate) command in the Command
Window,
tabi 7 240 \ 4 734 , row
           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         7        240 |       247
           |      2.83      97.17 |    100.00
-----------+----------------------+----------
         2 |         4        734 |       738
           |      0.54      99.46 |    100.00
-----------+----------------------+----------
     Total |        11        974 |       985
           |      1.12      98.88 |    100.00
Converting the percents (2.83% and 0.54%) to proportions (0.028 and 0.005), we see that
Bergstrom’s calculations were correct.
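The same check can be done outside Stata. The following is a minimal Python sketch, not part of the original exercise. The function name incidence_proportion is hypothetical; the cell counts are the ones in Bergstrom's Table 2.

```python
def incidence_proportion(cases, n_followed):
    """Risk = A / N for one follow-up period."""
    if n_followed <= 0:
        raise ValueError("n_followed must be positive")
    return cases / n_followed

# (cases, total followed) taken from Bergstrom's Table 2
groups = {
    "Infliximab": (7, 247),
    "Other therapy": (4, 738),
    "Total": (11, 985),
}

risks = {name: incidence_proportion(a, n) for name, (a, n) in groups.items()}

for name, risk in risks.items():
    # Three decimals, to match the "Incidence of infection" row
    print(f"{name:14s} {risk:.3f}")
```

Rounded to three decimals, the values agree with the 0.028, 0.005, and 0.011 reported in the table.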
Bergstrom never discusses competing risks or losses to follow-up in her article. Her study design
used all of the patients receiving TNF antagonist therapy (infliximab) during a three-year
interval, with a control group of 3 times as many patients receiving other therapies. The data
were from medical records. A three-year study period is long enough that competing risks and
losses to follow-up are likely to have occurred.
This study has the potential to be flawed by loss-to-follow-up bias, where bias means simply the
deviation of results or inferences from the truth.
Incidence Rate and Mortality Rate
To address the problem of competing risks and losses to follow-up, epidemiologists often use
incidence rate in place of incidence proportion (Rothman, 2002, p.28).
Similar to incidence proportion, incidence rate uses the number of cases, A, as the numerator.
We use a different denominator, however. Instead of dividing by the number of people at the
start of follow-up, we divide by a measure of time.
This time measure is the summation, across all individuals, of the follow-up time.
$$\text{Incidence rate} = \frac{A}{\text{Time}} = \frac{\text{number of subjects developing disease}}{\text{total time experienced for the subjects followed}}$$
with the denominator usually referred to as person-time.
A mortality rate is an incidence rate in which the event being measured is death (Rothman, 2002,
p.30).
$$\text{Mortality rate} = \frac{A}{\text{Time}} = \frac{\text{number of subjects dying}}{\text{total time experienced for the subjects followed}}$$
The person-time denominator represents the total “time at risk” for disease for the study group.
In the following diagram, we see the time at risk for five people being followed to measure the
mortality rate of leukemia (Rothman, 2002, p.30).
[Figure: follow-up lines for five people on a time axis of 1 to 5 years. Person 1: leukemia death; person 2: death from automobile crash; person 3: lost to follow-up; persons 4 and 5: end of follow-up.]
The first person died from leukemia after 3 years, and so contributed 3 years of time-at-risk to the
denominator.
The second person died from another cause (a competing risk) at which time the person was no
longer at risk for leukemia, contributing 4 years of time-at-risk to the denominator.
The third person was lost to follow-up. Although this person could still die from leukemia, we
would never know about it so it could not be counted in the numerator. Thus the time-at-risk
ended upon loss to follow-up, contributing 2 years to the denominator.
The fourth and fifth persons were followed to the end of the study’s follow-up period, each
contributing 5 years of time-at-risk to the denominator.
Therefore, the mortality rate is:
$$\text{MR} = \frac{\text{cases}}{\text{person-time}} = \frac{1}{3 + 4 + 2 + 5 + 5} = \frac{1 \text{ case}}{19 \text{ person-years}} = 0.0526 \text{ cases per person-year} = 5.26 \text{ cases per 100 person-years}$$
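The person-time arithmetic can be written out as a short Python sketch. It is offered only as an illustration; the variable names are hypothetical, and the follow-up times are the five described in the text.

```python
# Years of time-at-risk contributed by each of the five people
followup_years = [3, 4, 2, 5, 5]
# 1 = died of leukemia (the event under study), 0 = any other end of follow-up
leukemia_death = [1, 0, 0, 0, 0]

cases = sum(leukemia_death)          # numerator
person_years = sum(followup_years)   # denominator (person-time)
mortality_rate = cases / person_years

print(f"{cases} case / {person_years} person-years")
print(f"{mortality_rate:.4f} cases per person-year")
print(f"{100 * mortality_rate:.2f} cases per 100 person-years")
```

The person who died in the automobile crash and the person lost to follow-up add time to the denominator but nothing to the numerator, which is how the rate accommodates competing risks and losses.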
Notice that incidence rates treat one unit of time as equivalent to another, regardless of whether
these time units come from the same person or from different people.
If you allow cases in the same person to contribute more than once to the numerator, something
you might do in an upper respiratory tract infection study, then the denominator should likewise
include all of the time each person is at risk of getting any episode of the infection. Ideally,
the follow-up for a specific individual should temporarily end when an episode begins, and
resume after recovery from each episode.
If a person can experience an event only once, such as death from leukemia, the person ceases to
contribute follow-up time after the event occurs.
If you are studying events that can occur more than once, but you only count the first occurrence
of the event (such as first heart attack) in the numerator, then follow-up ends with the first event.
In all of these cases, the time that goes into the denominator is the time experienced by the
people being followed during which the disease or event studied could have occurred.
For this reason, the time tallied in the denominator of an incidence rate is often referred to as the
time at risk of disease.
From the sufficient/component cause theory perspective, the best measure of person-time is the
induction time, since any other time measurement will contribute excess time to the measure.
However, we also learned in Chapter 2 that induction time is defined for each specific
component cause. The way researchers allow for differing times for specific component causes
(specific exposure variables of interest) in their analysis is by using either Poisson regression or
Cox regression with time-dependent covariates.
Aside: Are researchers consistent with labeling of proportions and rates?
Incidence = number of new cases of a disease in a specified population within a specified time
period.
Incidence proportion = cases / N = risk
Incidence rate = cases / person-time
Rothman makes a clear distinction between proportions and rates, but is such a distinction found
in the literature? Are incidence, risk, and rate used synonymously?
Exercise. Take out the Sulkowski (2000) article, SulkowskiJAMA2000.pdf
Find on page 77, bottom of first column, “Use of ritonavir was associated with a higher incidence
of toxicity (30%, 95% CI, 17.9%-44.6%).” The 30% comes from the Ritonavir lines of Table 3,
(6+9)/(22+28)=30%.
Now go to column 2 on the same page, 2nd sentence under the Hepatotoxicity and Chronic Viral
Hepatitis heading: “Rate of severe toxicity with any PI in coinfected patients was 12.2%
(13/107; 95% CI, 6.6%-19.9%).”
From these two examples, we see that the authors are using incidence and rate interchangeably.
Next, find the caption to Figure 1 on the same page, “Incidence Rate (Cases per Persons
Exposed) of Hepatotoxicity During Antiretroviral Therapy, by Drug Regimen”
The authors are using incidence rate for what Rothman refers to as incidence proportion.
Finally, look at column headings for Table 3 on the same page, “Incidence (Cases/Persons
Exposed)” and “Incidence (Cases/100 Person-Months)”.
In this article, the authors did not use the names incidence proportion and incidence rate to make
the distinction, but at least they were always careful to inform the reader what type of
denominator was being used.
Such inconsistency is not surprising. Even epidemiologists are inconsistent with each other
when it comes to effect measure terminology, so confusion abounds. To see this, look at the
Granados (1997) article.
Stata Exercise 2 (calculating incidence proportions with epitab)
The classical form for displaying an exposure-disease relationship in a 2 × 2 table (2 rows and 2
columns) is:
Incidence Proportion Layout
                           Exposure
                     Exposed    Unexposed
Disease   Cases         a           b
          Noncases      c           d
There is a suite of commands in Stata, called epitab (tables for epidemiologists), used for
epidemiologic statistics.
To use epitab with Bergstrom’s (2004) Table 2,
                              Infliximab   Other therapy    Total
coccidioidomycosis   Yes           7             4            11
                     No          240           734           974
Total                            247           738           985
Incidence of infection         0.028         0.005         0.011
we first must put the data in the order of the classical incidence proportion layout:
                           Exposure
                     Exposed    Unexposed
Disease   Cases         7           4
          Noncases    240         734
Now, using Stata’s epitab feature
Statistics
Epidemiology and related
Tables for epidemiologists
Cohort study risk ratio etc. calculator
7 4
240 734
OK
csi 7 4 240 734
where “csi” is cohort study immediate (this name comes from the incidence
proportion being the effect measure of choice, and being a correct
statistic, for a cohort study)
Note: we did not have to use the “\” for a carriage return, as was done with tabi,
in the csi command because Stata knows it will be a 2 × 2 table read in row
order.
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         7           4  |         11
        Noncases |       240         734  |        974
-----------------+------------------------+------------
           Total |       247         738  |        985
                 |                        |
            Risk |  .0283401    .0054201  |   .0111675
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .02292         |    .0015582    .0442818
      Risk ratio |         5.228745       |    1.543692    17.71065
 Attr. frac. ex. |         .8087495       |    .3522022    .9435368
 Attr. frac. pop |         .5146588       |
                 +-------------------------------------------------
                               chi2(1) =     8.80  Pr>chi2 = 0.0030
Reading from the “Risk” line, we see the proportions 0.028 and 0.005, which are identical to
what we calculated using the crosstabulation approach above.
Stata Exercise 3 (calculating incidence rates with epitab)
The classical form for displaying an exposure-disease relationship when person-time
denominators are used is:
Incidence Rate Layout
                              Exposure
                        Exposed    Unexposed
Disease   Cases            a           b
          Person-Time   PT(exp)    PT(unexp)
In the Sulkowski (2000) article, Table 3, we find
Antiretroviral Drug                          Severe Hepatotoxicity   Person-Time       Incidence Rate
Regimen                                      cases                   (person-months)   (per 100 person-months)
Dual nucleoside analog (referent regimen)             5                   246                2.0
Protease inhibitor all (study regimen)               26                   795                3.3
To replicate these rates, we put the data in the Incidence Rate Layout
                              Exposure
                        Exposed    Unexposed
Disease   Cases           26           5
          Person-Time    795         246
Now, using Stata’s epitab feature
Statistics
Epidemiology and related
Tables for epidemiologists
Incidence-rate ratios calculator
26 5
795 246
OK
iri 26 5 795 246
where “iri” is incidence rate immediate
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26           5  |         31
     Person-time |       795         246  |       1041
-----------------+------------------------+------------
                 |                        |
  Incidence Rate |  .0327044    .0203252  |   .0297791
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |         .0123792       |   -.0094249    .0341833
 Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
 Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
 Attr. frac. pop |         .3174666       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=26) =                    0.1684 (exact)
                    (midp) 2*Pr(k>=26) =                    0.3368 (exact)
Since Sulkowski was expressing his rates per 100 person-months (3.3 and 2.0), we multiply our
incidence rates by 100 to replicate his result.
Person-Time Units
Researchers generally express their rates in whatever person-time units provide at least one digit
to the left of the decimal place. Notice that
rate = 3.3 cases per 100 person-years
is much easier to grasp than
rate = 0.033 cases per person-year.
Aside: Changing units of person-time
To change the units of anything, we divide by the new units. For example, if we have 30 eggs
and we want to express it in dozens, we divide by a dozen (or 12),
eggs in dozens = 30/12 = 2.5 dozen
The general form for a rate is:
$$\text{rate} = \frac{\text{cases}}{\text{person-years}}$$
To express this in 100 person-years units, or per 100 person-years, which is person-years in units
of 100, we simply divide the person-years by 100
$$\text{rate per 100 person-years} = \frac{\text{cases}}{\text{person-years}/100} = \frac{\text{cases}}{\text{person-years} \times \frac{1}{100}} = \frac{\text{cases} \times 100}{\text{person-years}} = \text{rate} \times 100$$
which we see is simply multiplying the original rate by 100, the unit of person-years we desire.
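As a quick numerical check of this identity, here is a small Python sketch. The function name rate_per and the counts (33 cases in 1,000 person-years) are hypothetical, chosen only so that the result matches the 0.033 and 3.3 figures used earlier.

```python
def rate_per(cases, person_time, per=1):
    """Rate expressed per `per` units of person-time: cases / (person_time / per)."""
    if person_time <= 0 or per <= 0:
        raise ValueError("person_time and per must be positive")
    return cases / (person_time / per)

# Hypothetical counts, for illustration only
cases, person_years = 33, 1000

base = rate_per(cases, person_years)             # cases per person-year
per_100 = rate_per(cases, person_years, per=100) # cases per 100 person-years

print(f"{base:.3f} per person-year")
print(f"{per_100:.1f} per 100 person-years")
```

Dividing the denominator by 100 and multiplying the rate by 100 give the same number, which is all the derivation says.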
Prevalence Proportion
Both incidence proportion and incidence rate measure the frequency of disease onset.
In contrast, prevalence proportion, or simply prevalence, does not measure disease onset, but
instead is a measure of disease status (Rothman, 2002, p.40).
It is the proportion of people in a population who have a specific disease.
For a population of size N, where P individuals have the disease at a given point in time,
prevalence proportion = P/N.
For example, suppose that among 10,000 female residents of a town on July 1, 2001, 1200
have hypertension. Then,
prevalence proportion = 1200/10,000 = 0.12, or 12%
Prevalence proportion is affected by disease occurrence, as the greater the incidence of disease,
the more people will have it.
Prevalence is also related to the length of time that a person has disease, as the longer the
duration of disease once it occurs, the higher the prevalence.
Because prevalence is a mixture of incidence rate and disease duration, it is not as useful for
studying the cause of disease. It is useful, however, as a measure of disease burden (such as what
proportion of the population at any given point in time have the disease and thus contribute to
health care costs).
Life Tables and Hazard Rates
Although the incidence rate is a more sensitive analysis than the incidence proportion, because its
denominator allows for attrition due to competing risks and losses to follow-up, it is usually too
simplistic.
That is, it makes no distinction between cases that occur early on and cases that occur later.
Because a single rate is computed for the whole follow-up period, it implicitly assumes that the
risk is constant over that period.
It is most often the case that the incidence rate changes over the follow-up period.
To account for the changing incidence rate, as well as the shrinking denominator, we can
calculate the risk separately for subintervals of the time period. This is called a life-table.
This will be demonstrated with data provided in Lee (1980, Table 3.5, p.31), which originally
came from Myers (1969). The data represent a cohort of male patients with localized cancer of
the rectum diagnosed in Connecticut from 1935 to 1944. In life-table format,
Life Table for male patients with localized cancer of the
rectum diagnosed in Connecticut from 1935 to 1954
(LeeLife dataset)
Interval    Beginning               Lost to
(years)         N        Deaths    Follow-up    Hazard
   1           388         167         2        0.5502
   2           219          45         1        0.2296
   3           173          45         1        0.3000
   4           127          19         1        0.1624
   5           107          17         1        0.1735
   6            89          11         1        0.1325
   7            77           8         1        0.1103
   8            68           0         1        0.0000
   9            67           6         1        0.0945
  10            60           7         1        0.1250
The “Hazard” is the interval-specific risk (year-specific risk in this example). The traditional
calculation for life tables is to use the actuarial method of adjustment for deaths and censored
observations. In the actuarial method, deaths and losses are assumed to occur in the middle of
the time interval, so the formula is:
interval-specific hazard = deaths/[(beginning N) - (1/2)(deaths + lost)]
For the first interval, we get
year 1 hazard = 167 / [ 388 – (1/2)(167 + 2)] = 167 / [ 388 – 84.5 ] = 167 / 303.5 = 0.5502
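The same formula can be applied to every row of the life table. The Python sketch below is an illustration only; the function name actuarial_hazard is hypothetical, and the rows are copied from the table above.

```python
# (interval, beginning N, deaths, lost to follow-up) from the life table
rows = [
    (1, 388, 167, 2), (2, 219, 45, 1), (3, 173, 45, 1), (4, 127, 19, 1),
    (5, 107, 17, 1), (6, 89, 11, 1), (7, 77, 8, 1), (8, 68, 0, 1),
    (9, 67, 6, 1), (10, 60, 7, 1),
]

def actuarial_hazard(n_begin, deaths, lost):
    """deaths / [N - (1/2)(deaths + lost)], the actuarial adjustment."""
    return deaths / (n_begin - 0.5 * (deaths + lost))

hazards = [actuarial_hazard(n, d, lost) for _, n, d, lost in rows]

for (interval, n, d, lost), h in zip(rows, hazards):
    print(f"{interval:2d}  {n:3d}  {d:3d}  {lost:1d}  {h:.4f}")
```

Each printed value, rounded to four decimals, should agree with the Hazard column. Note also that each row's beginning N equals the previous row's N minus its deaths and losses, which is the shrinking denominator the life table is designed to handle.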
If we added a cumulative failure column, without using the actuarial adjustment, based on
interval-specific hazard = deaths/(beginning N)
it would be the Kaplan-Meier failure curve. We will do this in Chapter 10, as well as defer the
calculation of a life-table in Stata until then.
Kaplan-Meier curves, accompanied by Cox regression, are a much better and much more often
used approach than incidence proportions and incidence rates. Kaplan-Meier curves and Cox
regression are survival analysis methods, which like life-tables, are methods of calculating risks
over a time period with changing incidence and shrinking denominators.
The interval-specific hazard described above can be extended into a measure of disease frequency
called the hazard rate, usually expressed without the actuarial adjustment. The hazard rate is the
theoretical limit approached by an incidence rate as the time interval is narrowed toward zero
(Rothman and Greenland, 1998, p.35).
Measures of Causal Effects
The primary goal of epidemiologic research is to study the causes of disease.
To determine whether an exposure causes disease is problematic, however, because of the
following essential epidemiologic principle (Rothman, 2002, p.44),
Essential epidemiologic principle
A person may be exposed to an agent and then develop disease
without there being any causal connection between exposure and
disease.
That is, the exposure-disease relationship may only be an association, where the disease seemed
to follow the exposure in time, but was actually caused by something else.
Because an observed relationship may be merely an association, we cannot consider the incidence
proportion or the incidence rate among exposed people to measure a causal effect.
To measure a causal effect, we have to compare the experience of exposed people with what
would have happened without the exposure.
This is the counterfactual ideal.
The Counterfactual Ideal
If we compare risks or incidence rates between exposed and unexposed people, we cannot be
certain that the differences observed are attributable to the exposure.
The differences could be attributable to other factors that differ between exposed and unexposed
people, some being factors that we have not even measured.
The ideal comparison would be comparing people to themselves, in both an exposed and
unexposed state, at a single instance in time when nothing else varied (the perfect situation of
varying a single factor while holding all else constant).
If this impossible goal were achievable, we could determine the effect of the exposure, because
the only difference between the two settings would be the exposure.
Because this situation is not realistic, it is called counterfactual (Rothman, 2002, p.45).
A crossover study does use the same person, and so comes close to the counterfactual ideal, but
does not quite achieve it because a person can only be in one treatment group at a time, and thus
uncontrolled factors can vary.
Although not achievable, the counterfactual ideal provides a reference point, or best case, for
judging a measure of causal effect. That is, the closer our study approximates the counterfactual,
the better our effect measure represents an actual causal effect.
Effect Measures
We cannot achieve the counterfactual ideal in a research study, but we can strive to come as close
as possible.
What we do is compare an exposed group to an unexposed group, where the two groups are as
close as possible except for the exposure.
We might perform a crossover study, or perform a randomized experiment to balance the effect
of other factors, or perhaps choose an unexposed group with a similar risk profile.
We might also use statistical methods such as stratification and regression, which are discussed
later, to help assure comparability.
For a crude analysis, in particular, which does not use stratification or regression, we must make
the following assumption (Rothman, 2002, p.47),
“Otherwise Comparable” Assumption
If we can assume that the exposed and unexposed groups are otherwise
comparable with regard to risk for disease, we can compare measures of
disease occurrence to assess the effect of the exposure.
In other words, we must assume that there is no confounding.
Risk Difference
The risk difference (or incidence proportion difference) is simply the difference in incidence
proportions between the exposed and unexposed groups.
$$RD = R_1 - R_0 = \left(\frac{A}{N}\right)_1 - \left(\frac{A}{N}\right)_0$$
where 1 denotes exposed group and 0 denotes unexposed group
We subtract in this order, so that if risk is greater for the exposed group, the number is positive.
A positive number implies risk is increasing, while a negative number implies risk is decreasing
with exposure.
We computed this above for Bergstrom’s (2004) data,
csi 7 4 240 734
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         7           4  |         11
        Noncases |       240         734  |        974
-----------------+------------------------+------------
           Total |       247         738  |        985
                 |                        |
            Risk |  .0283401    .0054201  |   .0111675
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .02292         |    .0015582    .0442818
      Risk ratio |         5.228745       |    1.543692    17.71065
 Attr. frac. ex. |         .8087495       |    .3522022    .9435368
 Attr. frac. pop |         .5146588       |
                 +-------------------------------------------------
                               chi2(1) =     8.80  Pr>chi2 = 0.0030
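To see where the “Risk difference” line comes from, the point estimate and a confidence interval can be recomputed by hand. The Python sketch below assumes the usual large-sample (Wald) standard error for a difference of two proportions. That choice is an assumption on our part, not a statement of how Stata computes the interval internally, although it reproduces the values shown. Variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

a, n1 = 7, 247   # exposed (infliximab): cases, total
b, n0 = 4, 738   # unexposed (other therapy): cases, total

r1, r0 = a / n1, b / n0
rd = r1 - r0     # RD = R1 - R0

# Assumed Wald standard error for a difference of two independent proportions
se = sqrt(r1 * (1 - r1) / n1 + r0 * (1 - r0) / n0)
z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
rd_ci = (rd - z * se, rd + z * se)

print(f"RD = {rd:.5f}, 95% CI ({rd_ci[0]:.7f}, {rd_ci[1]:.7f})")
```

The interval excludes 0, the no-effect value for a difference measure.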
Incidence Rate Difference
The incidence rate difference is simply the difference in incidence rates between the exposed and
unexposed groups.
$$IRD = IR_1 - IR_0 = \left(\frac{A}{PT}\right)_1 - \left(\frac{A}{PT}\right)_0 = \frac{a}{PT_1} - \frac{b}{PT_0}$$
where 1 denotes exposed group and 0 denotes unexposed group
We computed this above for Sulkowski’s (2000) data.
iri 26 5 795 246
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26           5  |         31
     Person-time |       795         246  |       1041
-----------------+------------------------+------------
                 |                        |
  Incidence Rate |  .0327044    .0203252  |   .0297791
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |         .0123792       |   -.0094249    .0341833
 Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
 Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
 Attr. frac. pop |         .3174666       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=26) =                    0.1684 (exact)
                    (midp) 2*Pr(k>=26) =                    0.3368 (exact)
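The “Inc. rate diff.” line can be recomputed in the same way. The Python sketch below assumes a large-sample (Wald) standard error of sqrt(a/PT1^2 + b/PT0^2), which treats the case counts as Poisson. This is our assumption rather than a documented description of Stata's method, although it reproduces the interval shown. Variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

a, pt1 = 26, 795   # exposed (protease inhibitor): cases, person-time
b, pt0 = 5, 246    # unexposed (dual nucleoside analog): cases, person-time

ir1, ir0 = a / pt1, b / pt0
ird = ir1 - ir0    # IRD = IR1 - IR0

# Assumed Wald standard error, treating case counts as Poisson
se = sqrt(a / pt1**2 + b / pt0**2)
z = NormalDist().inv_cdf(0.975)
ird_ci = (ird - z * se, ird + z * se)

print(f"IRD = {ird:.7f}, 95% CI ({ird_ci[0]:.7f}, {ird_ci[1]:.7f})")
```

Here the interval includes 0, so these data are compatible with no difference in rates.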
Difference measures such as risk difference and incidence rate difference measure the absolute
effect of an exposure.
It is also possible to measure the relative effect.
Risk Ratio or Relative Risk
The risk ratio, also called relative risk, is simply the ratio of incidence proportions for the
exposed and unexposed groups.
$$RR = \frac{R_1}{R_0}$$
where 1 denotes exposed group and 0 denotes unexposed group
We place the exposed group risk on top, so that if risk is greater for the exposed group, RR > 1.
An RR > 1 indicates that the exposure is a risk factor, while an RR < 1 indicates that the exposure
is a protective factor. We sometimes refer to factors as deleterious or protective, respectively.
If the factor has no effect, then the numerator and denominator risks are equal, so RR = 1
indicates no effect, in contrast to an RD = 0 indicating no effect.
We computed this above for Bergstrom’s (2004) data,
csi 7 4 240 734
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |         7           4  |         11
        Noncases |       240         734  |        974
-----------------+------------------------+------------
           Total |       247         738  |        985
                 |                        |
            Risk |  .0283401    .0054201  |   .0111675
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .02292         |    .0015582    .0442818
      Risk ratio |         5.228745       |    1.543692    17.71065
 Attr. frac. ex. |         .8087495       |    .3522022    .9435368
 Attr. frac. pop |         .5146588       |
                 +-------------------------------------------------
                               chi2(1) =     8.80  Pr>chi2 = 0.0030
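The “Risk ratio” line can be recomputed as well. The Python sketch below assumes the common large-sample interval built on the log scale, with standard error sqrt(1/a - 1/N1 + 1/b - 1/N0). This is an assumption on our part about the method, although it reproduces the values shown. Variable names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

a, n1 = 7, 247   # exposed: cases, total
b, n0 = 4, 738   # unexposed: cases, total

rr = (a / n1) / (b / n0)   # RR = R1 / R0

# Assumed standard error of log(RR)
se_log_rr = sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)
z = NormalDist().inv_cdf(0.975)
rr_ci = (exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr))

print(f"RR = {rr:.6f}, 95% CI ({rr_ci[0]:.6f}, {rr_ci[1]:.5f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the RR itself. It excludes 1, the no-effect value for a ratio measure.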
Rate Ratio, or Incidence Rate Ratio, or Relative Rate
The rate ratio, also called incidence rate ratio or relative rate, is simply the ratio of incidence
rates for the exposed and unexposed groups.
$$IRR \text{ or } RR = \frac{IR_1}{IR_0}$$
where 1 denotes exposed group and 0 denotes unexposed group
As with the risk ratio, we place the exposed group rate on top, so that if rate is greater for the
exposed group, IRR > 1. An IRR > 1 indicates that the exposure is a risk factor, while an IRR < 1
indicates that the exposure is a protective factor.
If the factor has no effect, then the numerator and denominator rates are equal, so IRR = 1
indicates no effect, in contrast to an IRD = 0 indicating no effect.
We computed this above for Sulkowski’s (2000) data.
iri 26 5 795 246
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+------------
           Cases |        26           5  |         31
     Person-time |       795         246  |       1041
-----------------+------------------------+------------
                 |                        |
  Incidence Rate |  .0327044    .0203252  |   .0297791
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |         .0123792       |   -.0094249    .0341833
 Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
 Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
 Attr. frac. pop |         .3174666       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=26) =                    0.1684 (exact)
                    (midp) 2*Pr(k>=26) =                    0.3368 (exact)
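The point estimate on the “Inc. rate ratio” line is easy to recompute. The Python sketch below is an illustration only, with hypothetical variable names. It does not attempt the interval, which Stata labels as exact. It also shows that rescaling both rates to “per 100” units leaves the ratio unchanged.

```python
a, pt1 = 26, 795   # exposed: cases, person-time
b, pt0 = 5, 246    # unexposed: cases, person-time

ir1, ir0 = a / pt1, b / pt0
irr = ir1 / ir0                            # IRR = IR1 / IR0

# The same ratio with both rates expressed per 100 units of person-time
irr_per_100 = (100 * ir1) / (100 * ir0)

print(f"IRR = {irr:.6f}")
print(f"IRR from per-100 rates = {irr_per_100:.6f}")
```

Unlike a rate difference, a rate ratio has no units, so the choice of person-time units does not affect it.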
How to Interpret Sizes of Effect
Risk Difference
We call the risk difference an absolute effect.
RD = Rexposed – Runexposed which has a range of [-1, 1]
= 0 when there is no effect (Rexposed = Runexposed)
Some example interpretations are:
RD = 0       “there was no effect of exposure”
RD = 0.75    “there was an absolute 75% increase in risk with the exposure”
Suppose we are studying an intervention, such as a new therapy, that decreases risk. We would
call our new therapy the exposed group and the standard therapy the unexposed group. Then we
would say,
RD = -0.25   “there was an absolute 25% decrease in risk with the new therapy”
Incidence Rate Difference
Similarly, we call the incidence rate difference an absolute effect.
IRD = IRexposed - IRunexposed which has a range of (-∞, +∞)
= 0 when there is no effect (IRexposed = IRunexposed)
The incidence rate difference has an infinite range because an incidence rate, itself, has no
upper bound: its range is [0, +∞), so the difference of two rates can fall anywhere in (-∞, +∞).
This is due to the 1/time units of the rate. We can greatly influence the value of the rate by
adjusting the units of time.
For example,
IR = 0.025 per person-day
= 9.13 per person-year (0.025 × 365.25, where we use the “0.25” fraction to allow for 366 days
in a leap year, which occurs every fourth year)
= 91,312.5 per 10,000 person-years (0.025 × 365.25 × 10,000)
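The unit conversions in this example can be checked with display. This is a minimal sketch; the scalar name ir_day is illustrative.

```stata
* The same incidence rate expressed in three different units of person-time
scalar ir_day = 0.025
display "per person-day:          " ir_day
display "per person-year:         " ir_day*365.25
display "per 10,000 person-years: " ir_day*365.25*10000
```

The underlying disease occurrence is identical in all three lines; only the time unit, and therefore the numeric value, changes (9.13125 and 91,312.5 before rounding).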
Some example interpretations are:
IRD = 0                “there was no effect of exposure”
IRD = 5.5/person-day   “there was an absolute 5.5 cases per person-day increase in
                       disease rate with exposure”
Risk Ratio
We call the risk ratio a relative effect.
RR = Rexposed / Runexposed which has a range of [0, +∞)
= 1 when there is no effect (Rexposed = Runexposed)
Some example interpretations are:
RR = 1      “there was no effect of exposure”
RR = 3.2    “there was a 3.2-fold increase in risk with the exposure”
RR = 3.2    “there was a 220% increase in risk with the exposure”
            (this comes from RR - 1 = 3.2 - 1 = 2.2, or 220%, since RR = 1 is no effect)
Suppose we are studying an intervention, such as a new therapy, that decreases risk. We would
call our new therapy the exposed group and the standard therapy the unexposed group. An
algebraically similar magnitude of decrease as the RR = 3.2 example would be
1/ RR = 1/3.2 = 0.31, so our protective RR would be RR = 0.31
To express this as a proportion decrease we use
1- RR = 1 - 0.31 = 0.69 or 69% “there was a 69% decrease in risk with the new therapy”
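The arithmetic behind these interpretations can be sketched with display. The scalar names rr_harm and rr_prot are illustrative only.

```stata
* Harmful exposure: percent increase implied by RR = 3.2
scalar rr_harm = 3.2
display "percent increase = " %5.1f 100*(rr_harm - 1)
* Protective effect of reciprocal size, and the percent decrease it implies
scalar rr_prot = 1/rr_harm
display "protective RR    = " %6.4f rr_prot
display "percent decrease = " %5.1f 100*(1 - rr_prot)
```

Without intermediate rounding the reciprocal is 0.3125, so the decrease is just under 69%; the text's 69% comes from first rounding the RR to 0.31.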
It hardly seems fair that a deleterious effect can range from 1 to +∞, and so can explode into a
really large effect, while a protective effect is crunched up between 0 and 1, so can never get
bigger than a 100% decrease.
It makes sense, however, since the best you can do is eliminate all cases of disease, which is a
100% reduction.
Incidence Rate Ratio
We call the incidence rate ratio a relative effect.
IRR = IRexposed / IRunexposed which has a range of [0, +∞)
= 1 when there is no effect (IRexposed = IRunexposed)
Some example interpretations are:
IRR = 1      “there was no effect of exposure”
IRR = 3.2    “there was a 3.2-fold increase in disease rate with the exposure”
IRR = 3.2    “there was a 220% increase in disease rate with the exposure”
             (this comes from IRR - 1 = 3.2 - 1 = 2.2, or 220%, since IRR = 1 is no effect)
Notice that with IRR, we no longer have to include the units of time. This is because the units of
time cancel out of the IRR equation.
To illustrate this,
$$IRR = \frac{c / PT_1}{d / PT_0} = \frac{5 / (88 \text{ person-years})}{3 / (50 \text{ person-years})} \quad \text{, for example}$$
$$= \frac{\left(\frac{5}{88}\right)\left(\frac{1}{\text{person-years}}\right)}{\left(\frac{3}{50}\right)\left(\frac{1}{\text{person-years}}\right)} = \frac{5/88}{3/50}$$
since we can cancel the units (just as we cancel numbers).
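To see the cancellation of time units numerically, the incidence rate ratio from the 5/88 versus 3/50 example can be computed twice, once with person-time in years and once in days. This is a sketch under the assumption that one year is taken as 365.25 days, as elsewhere in this chapter.

```stata
* IRR with person-time measured in years
display (5/88)/(3/50)
* IRR with the same follow-up measured in days: each rate changes, the ratio does not
display (5/(88*365.25))/(3/(50*365.25))
```

Both lines give the same ratio (about 0.947), which is why an IRR can be reported without time units.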
Attributable Fraction and Preventable Fraction
If we divide the risk difference by the risk in the exposed, we get the attributable fraction
(Rothman, 2002, p.53).
$$AF = \frac{RD}{R_1} = \frac{R_1 - R_0}{R_1} = \frac{R_1}{R_1} - \frac{R_0}{R_1} = 1 - \frac{1}{RR} = \frac{RR - 1}{RR}$$
If the risk difference reflects a causal effect that is not distorted by any bias (including
confounding bias), then the attributable fraction is the proportion of the disease burden among
exposed people that is caused by the exposure (Rothman, 2002, p.53).
If the exposure is protective, we can calculate the preventable fraction (Rothman and Greenland,
1998, p.55).
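$$PF = \frac{-RD}{R_0} = \frac{R_0 - R_1}{R_0} = \frac{R_0}{R_0} - \frac{R_1}{R_0} = 1 - \frac{R_1}{R_0} = 1 - RR$$
where RD = R1 - R0 is negative for a protective exposure, so that -RD is the size of the risk
reduction.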
If the risk difference reflects a causal effect that is not distorted by any bias (including
confounding bias), then the preventable fraction is the proportion of the disease burden among
nonexposed people that could be prevented by exposure (Rothman and Greenland, 1998, p.55).
To obtain the overall attributable fraction for the population, we multiply the attributable
fraction by the proportion of all cases in the total population that are exposed (Rothman, 2002,
p.54).
For example, if AF = 0.8 and there are 1400 cases in the entire population, of whom 500 are
exposed (proportion of exposed cases is 500/1400 = 0.357), then the overall attributable fraction
for the population is 0.8 × 0.357 = 0.286. That is, 28.6% of all cases in the population are
attributable to the exposure.
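The population attributable fraction in this example can be reproduced with display. This is a minimal sketch; the scalar names af and p_exposed_cases are illustrative.

```stata
* Attributable fraction among the exposed, and proportion of all cases that are exposed
scalar af = 0.8
scalar p_exposed_cases = 500/1400
* Population attributable fraction = AF x proportion of cases exposed
display "population AF = " %5.3f af*p_exposed_cases
```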
We computed this above for Bergstrom’s (2004) data, based on incidence proportions,
csi 7 4 240 734
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |         7           4  |         11
        Noncases |       240         734  |        974
-----------------+------------------------+-----------
           Total |       247         738  |        985
                 |                        |
            Risk |  .0283401    .0054201  |   .0111675
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |          .02292        |    .0015582    .0442818
      Risk ratio |         5.228745       |    1.543692    17.71065
 Attr. frac. ex. |         .8087495       |    .3522022    .9435368
 Attr. frac. pop |         .5146588       |
                 +-------------------------------------------------
                               chi2(1) =     8.80  Pr>chi2 = 0.0030
and again with Sulkowski’s (2000) data, based on incidence rates,
iri 26 5 795 246
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |        26           5  |         31
     Person-time |       795         246  |       1041
-----------------+------------------------+-----------
                 |                        |
  Incidence Rate |  .0327044    .0203252  |   .0297791
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |         .0123792       |   -.0094249    .0341833
 Inc. rate ratio |         1.609057       |    .6080283     5.36572 (exact)
 Attr. frac. ex. |         .3785178       |   -.6446602    .8136317 (exact)
 Attr. frac. pop |         .3174666       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=26) =                    0.1684 (exact)
                    (midp) 2*Pr(k>=26) =                    0.3368 (exact)
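The "Attr. frac. ex." and "Attr. frac. pop" lines in both outputs can be reproduced from the formulas given earlier, AF = (RR - 1)/RR and population AF = AF × (exposed cases / all cases). The following is a sketch of that arithmetic; the scalar names are illustrative, and the use of the incidence rate ratio in place of the risk ratio for the second dataset is inferred from the iri output agreeing with it.

```stata
* Bergstrom (2004) counts, based on risks
scalar rr_risk = (7/247)/(4/738)
scalar af_risk = (rr_risk - 1)/rr_risk
display "Attr. frac. ex. = " af_risk "   Attr. frac. pop = " af_risk*(7/11)
* Sulkowski (2000) counts, based on incidence rates
scalar rr_rate = (26/795)/(5/246)
scalar af_rate = (rr_rate - 1)/rr_rate
display "Attr. frac. ex. = " af_rate "   Attr. frac. pop = " af_rate*(26/31)
```

To rounding, the first line should match .8087495 and .5146588, and the second .3785178 and .3174666.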
If the exposure is categorized into more than two levels, we can use an extension of our formula
which takes into account each of the exposure levels (Rothman, 2002, p.54):
total attributable fraction = Σ (AFi × Pi)
where i represents an exposure category, AFi is the attributable fraction for that category, and
Pi is the proportion of all cases that fall in that category.
Hypothetical data giving 1-year disease risk for people at three levels of exposure

                                       Exposure
                            None       Low      High      Total
Disease                      100     1,200     1,200      2,500
No disease                 9,900    58,800    28,800     97,500
Total                     10,000    60,000    30,000    100,000
Risk                        0.01      0.02      0.04      0.025
Risk ratio                  1.00      2.00      4.00
Proportion of all cases     0.04      0.48      0.48
The attributable fraction for the group with no exposure is 0. For the low-exposure group, the
attributable fraction is (RR-1)/RR = (2-1)/2 = 0.5. For the high-exposure group, AF = (4-1)/4 = 0.75.
The total attributable fraction is:
total AF = Σ (AFi × Pi)
= 0 + 0.5(.48) + 0.75(.48) = 0.60
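The weighted sum can be checked with display, along with a cross-check that is not in the text above: the standard identity that the population attributable fraction equals (overall risk - risk in the unexposed) / overall risk. The scalar names are illustrative.

```stata
* Category-specific attributable fractions, (RR - 1)/RR, from the hypothetical table
scalar af_none = (1 - 1)/1
scalar af_low  = (2 - 1)/2
scalar af_high = (4 - 1)/4
* Weight each by its proportion of all cases and sum
display "total AF    = " af_none*0.04 + af_low*0.48 + af_high*0.48
* Cross-check from the Risk row: overall risk 0.025, risk with no exposure 0.01
display "cross-check = " (0.025 - 0.01)/0.025
```

Both lines should give 0.60.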
Published Population Attributable Fraction Example
Lee et al (N Engl J Med, 2006) report the following:
Statistical Analysis Section (p.140, second to last paragraph)
We estimated the population attributable risk (PAR) for heart failure associated with
parental occurrence of the condition as a function of the proportion of cases occurring in
those with a parent with heart failure (pd) and the multivariable-adjusted relative risk
(RR, equivalent to hazard ratio from models with clinical covariates), calculated [19] as
$$PAR = p_d \, \frac{RR - 1}{RR} \times 100.$$
Results Section (p.143, first paragraph)
The population-attributable risk of heart failure that was due to the presence of the
condition in a parent was 17.8 percent.
Discussion Section (p.145, end of first paragraph)
Approximately 18 percent of the heart-failure burden in the offspring was attributable to
parental heart failure. Our findings that heart failure in their parents predisposes people
to both left ventricular systolic dysfunction and overt heart failure underscore the
contribution of familial factors to development of the condition.
This was calculated using the following,
disp 39/(39+51)*(1.70-1)/1.70*100
where the 39 and 51 come from the 2nd row of Table 5 (the number of cases for the two groups)
and the 1.70 is the multivariable-adjusted HR shown on row 6.
Lee et al. are using their own sample data to estimate the proportion of all cases in the total
population that is exposed, which is the “pd” in their PAR equation. Generally, you would not do
this because a study sample is generally restricted in some way and so is not representative of
the population (see the second Note at the end of Stata Exercise 4 below). Their estimate might be
sufficiently accurate, however, since their sample is a community-based sample (it comes from the
Framingham Offspring Study, which began in 1971 with the enrollment of children of the original
Framingham Heart Study cohort).
In Lee’s Statistical Analysis Section, reference 19 is cited. This is Rockhill (1998), which is a
very helpful paper on the subject. This paper is well worth reading before publishing a
population attributable fraction.
Stata Exercise 4 (using variables with epitab)
We will practice using the following data file.
Evans County Dataset (evans.dta)
Source
dataset to accompany Kleinbaum and Klein (K&K chapter 2)
http://www.sph.emory.edu/~dkleinb/logreg2.htm#data
Brief Description
Data are from a cohort study in which n=609 white males were followed for 7
years, with coronary heart disease as the outcome of interest.
Codebook
n = 609
outcome
    chd    coronary heart disease (1=presence, 0=absence)
predictors
    cat    catecholamine level (1=high, 0=normal)
    age    age in years (continuous)
    chl    cholesterol (continuous)
    smk    smoker (1=ever smoked, 0=never smoked)
    ecg    electrocardiogram abnormality (1=presence, 0=absence)
    dbp    diastolic blood pressure (continuous)
    sbp    systolic blood pressure (continuous)
    hbp    high blood pressure (1=presence, 0=absence)
           defined as: SBP ≥ 160 or DBP ≥ 95
data management
    id     subject identifier (unique #, one observation per subject)
Start the Stata program and read in the data,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on evans.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\Biostats & Epi With Stata\datasets & do-files\evans.dta", clear
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use evans.dta, clear
To look at the association between CHD and Smoking, we use
Statistics
Epidemiology and related
Tables for epidemiologists
Cohort study risk ratio etc.
Main tab: Case variable: chd
Exposed variable: smk
OK
cs chd smk
                 | smk                    |
                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |        54          17  |         71
        Noncases |       333         205  |        538
-----------------+------------------------+-----------
           Total |       387         222  |        609
                 |                        |
            Risk |  .1395349    .0765766  |   .1165846
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Risk difference |         .0629583       |    .0138116     .112105
      Risk ratio |         1.822161       |    1.083858    3.063382
 Attr. frac. ex. |         .4512012       |    .0773703    .6735634
 Attr. frac. pop |         .3431671       |
                 +-------------------------------------------------
                               chi2(1) =     5.43  Pr>chi2 = 0.0198
Note: We see from the “Attr. frac. pop” line that the overall attributable fraction for the
population is 0.34, or 34%. Recall that this is computed by AF × (proportion of all cases
in the total population that is exposed), or 0.4512012 × (54/71). We can verify this using
the display command in Stata’s command window,
display (54/71)*.4512012
.34316711
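The other point estimates in the cs output can be verified the same way from the four cell counts. The sketch below uses illustrative scalar names (risk1, risk0); the final line uses Stata's immediate command csi, which should reproduce the same table from the counts without the dataset in memory.

```stata
* Risks in the exposed (smokers) and unexposed, from the cs chd smk table
scalar risk1 = 54/387
scalar risk0 = 17/222
display "Risk difference = " risk1 - risk0
display "Risk ratio      = " risk1/risk0
display "Attr. frac. ex. = " (risk1 - risk0)/risk1
display "Attr. frac. pop = " ((risk1 - risk0)/risk1)*(54/71)
* Immediate form: exposed cases, unexposed cases, exposed noncases, unexposed noncases
csi 54 17 333 205
```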
Note. Stata’s “prevented fraction for the population” is of little interest, because it
depends on an estimate of the “proportion of all cases in the total population that is
exposed” being made by your sample data. It is hard to convince your reader that this
estimate is reliable, since research samples are rarely representative of the “total
population”. Research samples in medical research are usually restricted in some way in
order to eliminate confounding and to better test theories.
References
Blossfeld H-P, Hamerle A, Mayer KU. (1989). Event History Analysis: Statistical Theory and
Application in the Social Sciences. Hillsdale NJ, Lawrence Erlbaum Associates.
Granados JAT. (1997). On the terminology and dimensions of incidence. J Clin Epidemiol
50(8):891-897.
Horton NJ, Switzer SS. (2005). Statistical methods in the Journal. [letter] NEJM 353(18):1977-79.
Lee ET. (1980). Statistical Methods for Survival Data Analysis. Belmont CA, Lifetime Learning
Publications.
Myers MH. (1969). A Computing Procedure for a Significance Test of the Difference Between
Two Survival Curves, Methodological Note No. 18 in Methodological Notes compiled by
the End Results Sections, National Cancer Institute, National Institutes of Health,
Bethesda, Maryland.
Rockhill B, Newman B, Weinberg C. (1998). Use and misuse of population attributable
fractions. Am J Public Health 88(1):15-19.
Rothman KJ. (1986). Modern Epidemiology. Boston, Little, Brown.
Rothman KJ. (2002). Epidemiology: An Introduction. Oxford, Oxford University Press.
Rothman KJ, Greenland S. (1998). Modern Epidemiology, 2nd ed. Philadelphia, PA,
Lippincott-Raven Publishers.
Sulkowski MS, Thomas DL, Chaisson RE, Moore RD. (2000). Hepatotoxicity associated with
antiretroviral therapy in adults infected with human immunodeficiency virus and the role
of hepatitis C or B virus infection. JAMA 283(1):74-80.