Session 1 Lecture

Advanced Course in Epidemiological Analysis 2011
Session 1
Measures of disease and effect
Helen Weiss
Objectives:

To introduce the main measures of disease occurrence and exposure effect used in the
statistical analysis of epidemiological studies.

To explain the difference between risks and rates, and the advantages of using rates in the
analysis of cohort studies.

To explain why odds ratios are usually used in the analysis of case-control and cross-sectional (prevalence) studies.
1. Introduction
Many of the statistical methods used in the analysis of epidemiological studies describe and
evaluate the extent to which the chance of disease varies according to different levels of some
factor (exposure).
Note that the chance (or probability) of disease refers to the unknown outcome for future
subjects, not to the actual outcomes in a study. The data from a study are used to estimate this
probability. Note also that “disease” is used here as a general term to refer to the outcome of
interest, whether this is infection, disease onset, death or any other well-defined event (e.g.
pregnancy).
2. Measures of disease outcome
The choice of the outcome measure used in a study will often depend on the study design:

For cohort studies, the outcome measure is usually the risk or rate of disease.

For cross-sectional studies, the outcome measure is usually the risk (prevalence) or odds of
disease.
In this session we will assume that there are two groups of individuals under study, an exposed
group who possess the risk factor of interest, and an unexposed group who do not. We are
interested in comparing the amount of disease in the exposed group to that in the unexposed
group. Whatever the outcome measure used (e.g. risk, rate, odds), we assume that there is some
true or underlying value for the population from which the data were sampled. We wish to
estimate this value, using the data from our study.
3. Risk measures
Consider a group of N disease-free individuals who join a cohort at a particular point in time and
are then followed for a fixed period. The following table shows disease outcomes by exposure:
Disease    Exposed    Unexposed    Total
Yes        D1         D0           D
No         H1         H0           H
Total      N1         N0           N
The risk of disease is defined as the probability that an individual experiences the disease during
the fixed period of time. For exposed individuals, we can estimate the risk of disease ($\pi_1$) as
$D_1/N_1$.
The risk of disease in the unexposed group ($\pi_0$) is estimated as $D_0/N_0$.
The effect of exposure on disease is often assessed by taking the ratio of the risk in exposed
individuals to the risk in unexposed individuals:
Risk ratio = $\pi_1/\pi_0$
Another measure of effect is the difference between the risks of disease in exposed and
unexposed individuals:
Risk difference = $\pi_1 - \pi_0$
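The definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `risk_measures` and the counts passed to it are hypothetical and are not taken from the course data.

```python
def risk_measures(d1, n1, d0, n0):
    """Return (risk in exposed, risk in unexposed, risk ratio, risk difference).

    d1, d0: numbers diseased among exposed / unexposed
    n1, n0: total numbers of exposed / unexposed at the start of follow-up
    """
    risk1 = d1 / n1  # estimated risk in exposed, D1/N1
    risk0 = d0 / n0  # estimated risk in unexposed, D0/N0
    return risk1, risk0, risk1 / risk0, risk1 - risk0


# Hypothetical counts, for illustration only
risk1, risk0, risk_ratio, risk_difference = risk_measures(d1=40, n1=1000, d0=10, n0=1000)
print(f"risk in exposed   = {risk1:.3f}")
print(f"risk in unexposed = {risk0:.3f}")
print(f"risk ratio        = {risk_ratio:.2f}")
print(f"risk difference   = {risk_difference:.3f}")
```

Both risks refer to the same fixed follow-up period, so the ratio and difference are only meaningful for that period.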
Exercise 1: As an example, consider a group of 30,000 subjects observed for 10 years, after
which time 50 were diseased.
Disease    Exposed    Unexposed    Total
Yes             30           20       50
No           9 970       19 980   29 950
Total       10 000       20 000   30 000

i) Estimate the risk of disease separately for exposed and unexposed individuals and obtain
the risk ratio for exposed vs. unexposed individuals.
Risk in exposed =
Risk in unexposed =
Risk ratio =
ii) Calculate the risk difference for exposed vs. unexposed individuals.
Risk difference =
4. Rate measures
4.1 Disadvantages of risks
The risk of disease will generally increase with the length of the time interval considered: the
longer an individual is observed the greater their chance of developing the disease. It follows
that in comparing exposed and non-exposed groups a 5-year risk should only be compared with
other 5-year risks. It would be misleading to compare a 5-year risk with a 10-year risk, etc.
Some further disadvantages of risks are:
1. They are more complicated to calculate if not all individuals are followed for the
same length of time. This complicates the analysis of cohort studies where individuals
may enter or exit at various points during the period of observation: for example,
individuals may be lost to follow-up through emigration or cease to be at risk of a
particular disease (e.g. because they have died from some other cause).
2. For individuals who develop the disease, no account is taken of when the disease
occurred during the study period.
3. They do not provide a suitable approach for examining the effect of exposures
that change with time, e.g. where individuals who are initially unexposed become
exposed at some point during the follow-up period. Studying the effect of any
time-dependent variable using risks is similarly difficult. An important example is the effect
of age, since individuals inevitably increase in age with increasing follow-up.
4. They may not be useful for common diseases with recurrent episodes (e.g. malaria,
respiratory disease). In some settings everyone will experience an episode of disease, so
everyone's risk is 1.
Most of these disadvantages can be overcome by measuring disease occurrence using rates.
4.2 Definition of a rate
Consider one individual at a particular point in time. As discussed above, the risk that the
individual experiences an event in an interval of time depends on the length of that time interval.
For a short interval of time, h, over which the risk is $\pi$, the risk per unit time is $\pi/h$, where $\pi$
depends on h and gets smaller as h gets smaller. We define the rate, $\lambda$, as the limiting value of
$\pi/h$ as h gets very small:
$\lambda = \pi/h$, h small.
The rate is an instantaneous measure of the subject's liability to the disease and, thus, in contrast
with risk, does not depend on the length of follow-up. When expressing a rate, the unit of time is
arbitrary and could be per year, per week, per day, etc. Whatever the unit used, a rate is the
instantaneous risk of disease, rather than the risk over that unit of time. The rate is also called
the force of disease. For example, a rate of 0.6 per year could also be expressed as 0.6/12 = 0.05 per
month, as 0.6/365 = 0.00164 per day, or even as 6 per 10 years.
4.3 Estimation of the rate $\lambda$
Although defined in terms of an individual's instantaneous risk of disease, the disease rate for
one individual is not observable in practice. We estimate disease rates by observing a group of
individuals who are assumed to have the same disease rate $\lambda$. Examples of such groups might be
individuals in a particular age group in a particular time period, of the same sex and living in a
particular geographical area.
In analysing data on outcomes that only occur once in an individual (e.g. measles or death) or in
analysing the time to the first episode of disease that may recur (e.g. malaria, common cold)
each individual contributes a certain amount of observation time, starting when they join the
study and finishing when they develop the disease, or are lost to follow-up, or the follow-up
period ends, whichever happens first.
The observation time is therefore the time during which, were the subject to experience an event,
the event would be recorded in the study. Y is defined as the sum of the observation times for all
individuals (the person-time-at-risk). If measured in years it is called person-years-at-risk (or
pyar). D is defined as the total number of events observed among all individuals. The rate $\lambda$ is
estimated as D/Y (see Appendix A for further explanation). The estimate of $\lambda$ obtained in this
way is called the incidence rate.
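As a rough sketch of how D and Y are accumulated from individual records, the following Python fragment sums observation times and events. The follow-up records and the name `incidence_rate` are hypothetical, chosen only to illustrate the calculation.

```python
# Hypothetical follow-up records: (years of observation, developed disease?)
# Observation ends at disease onset, loss to follow-up, or end of study.
follow_up = [(2.0, False), (5.0, True), (3.5, False), (1.5, True), (8.0, False)]


def incidence_rate(records):
    """Estimate the rate as D/Y from (observation time, event) records."""
    person_time = sum(years for years, _ in records)        # Y, person-years-at-risk
    events = sum(1 for _, diseased in records if diseased)  # D, number of events
    return events / person_time


rate = incidence_rate(follow_up)
print(f"rate = {rate:.3f} per person-year")
print(f"     = {1000 * rate:.0f} per 1000 person-years")
```

Individuals followed for different lengths of time simply contribute different amounts to Y, which is the main practical advantage over a risk calculation.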
Exercise 2: In the Whitehall study of UK civil servants, individuals were grouped as never/ex-smokers
or current smokers. Complete the table (which shows the total deaths D and
person-years Y in each group) by calculating the rate per 1000 person-years in each group.

                    D      Y          Rate per 1000 person-years
Never/ex smokers    359    32145.4
Current smokers     484    22713.3
4.4 Effect measures based on rates
We may compare disease occurrence in exposed and unexposed individuals by calculating the
ratio or difference of their disease rates:
Rate ratio = $\lambda_1/\lambda_0$
Rate difference = $\lambda_1 - \lambda_0$
where $\lambda_1$ and $\lambda_0$ are the disease rates in exposed and unexposed individuals respectively.
Exercise 3: Calculate the rate ratio and rate difference for smokers (exposed) compared to
never/ex-smokers in the Whitehall study.
Rate ratio =
Rate difference =           per 1000 person-years
(Note that a rate ratio is a number and has no units, whereas a rate difference has units)
4.5 Converting rates into risks
When the time interval is short, the risk of disease is approximately the rate of disease multiplied
by time (see 4.2 above). But if the time interval is not short, we need to use another formula to
estimate risk.
If the rate of disease during a time interval 0 to t is $\lambda$, then, dividing the interval into
$t/\delta t$ short sub-intervals of length $\delta t$, the risk of an individual developing the
disease at some time in the interval
= 1 - Probability(escaping disease)
= $1 - (1 - \lambda\,\delta t)^{t/\delta t}$
= $1 - e^{-\lambda t}$ (as $\delta t$ becomes small)
Example
If the disease rate is 0.001/day, the risk of developing the disease during a 1000 day period
= $1 - e^{-0.001 \times 1000}$
= $1 - e^{-1}$
= 0.6321
(also approximately = 1 - probability of escaping disease = $1 - 0.999^{1000}$)
If R is the risk in time (0,t), then $R = 1 - e^{-\lambda t}$ or $\lambda = [-\ln(1-R)]/t$
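The conversion between rates and risks can be checked numerically. The sketch below reuses the example of 0.001/day over 1000 days; the function names are illustrative assumptions, and a constant rate over the interval is assumed throughout.

```python
import math


def risk_from_rate(rate, t):
    """Risk over an interval of length t, assuming a constant rate."""
    return 1.0 - math.exp(-rate * t)


def rate_from_risk(risk, t):
    """Constant rate that would produce the given risk over an interval of length t."""
    return -math.log(1.0 - risk) / t


risk = risk_from_rate(0.001, 1000)      # rate per day, time in days
recovered_rate = rate_from_risk(risk, 1000)
naive = 0.001 * 1000                    # rate x time, only valid for short intervals

print(f"risk over 1000 days = {risk:.4f}")
print(f"rate x time         = {naive:.4f} (not a usable approximation here)")
print(f"recovered rate      = {recovered_rate:.6f} per day")
```

The comparison with rate x time shows why the short-interval approximation cannot be used when the product of rate and time is not small.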
5. Odds measures
5.1 Odds
The odds of disease, $\Omega$, is defined as the probability that an individual experiences the disease
divided by the probability that they do not:
$\Omega = \pi/(1-\pi)$
[Note that if the risk is known, the odds can be calculated ($\Omega = \pi/(1-\pi)$), and vice-versa
($\pi = \Omega/(1+\Omega)$).]
Exercise 4:
i) Calculate the odds of disease, if the risk is (a) 0.01, (b) 0.1 (c) 0.5 (d) 0.8
(a)
(b)
(c)
(d)
ii) Calculate the risk of disease, if the odds are (a) 0.05 (b) 1 (c) 2 (d) 99
(a)
(b)
(c)
(d)
5.2 Odds ratio
Consider a cohort study as below:
Disease    Exposed    Unexposed    Total
Yes        D1         D0           D
No         H1         H0           H
Total      N1         N0           N
We have seen how to calculate risks, risk ratios and risk differences from such tables. We can
also calculate odds and the ratio of the odds among exposed and non-exposed - that is, the odds
ratio.
For exposed individuals in the table, $\pi_1 = D_1/N_1$, and $1-\pi_1 = H_1/N_1$ (since $N_1 = D_1 + H_1$), and the
odds in exposed individuals is estimated as
$\Omega_1 = \dfrac{D_1/N_1}{H_1/N_1} = D_1/H_1$
Similarly, the odds in unexposed individuals is estimated as $\Omega_0 = D_0/H_0$.
The odds ratio is defined as $\pi_1/(1-\pi_1)$ divided by $\pi_0/(1-\pi_0)$ and this is estimated as:
$\dfrac{D_1/H_1}{D_0/H_0} = \dfrac{D_1 \times H_0}{D_0 \times H_1}$
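A minimal sketch of the cross-product calculation follows. The counts are hypothetical and the function name `odds_ratio` is an assumption made for illustration.

```python
def odds_ratio(d1, h1, d0, h0):
    """Estimate the odds ratio as (D1 x H0) / (D0 x H1) from a 2x2 table."""
    return (d1 * h0) / (d0 * h1)


# Hypothetical cohort: 1000 exposed (40 diseased), 1000 unexposed (10 diseased)
d1, h1 = 40, 960
d0, h0 = 10, 990

estimate = odds_ratio(d1, h1, d0, h0)
risk_ratio = (d1 / (d1 + h1)) / (d0 / (d0 + h0))
print(f"odds ratio = {estimate:.3f}")
print(f"risk ratio = {risk_ratio:.3f}")
```

With these counts the odds ratio is slightly further from 1 than the risk ratio computed from the same table.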
5.3 Odds ratios in case-control studies
In case-control studies it is usually not possible to calculate disease rates or ratios of those rates
as the study design is essentially cross-sectional and the control group (non-cases) consists of a
(usually unknown) fraction of the total population at risk. Indeed, the case group may consist of
an unknown fraction of all cases.
Thus, suppose we are conducting a case-control study in a population in which the total expected
distribution between cases/non-cases and exposed/non-exposed is as below:
Disease    Exposed                   Unexposed                 Total
Yes        $D_1 = N_1\pi_1$          $D_0 = N_0\pi_0$          D
No         $H_1 = N_1(1-\pi_1)$      $H_0 = N_0(1-\pi_0)$      H
Total      $N_1$                     $N_0$                     N
In a case-control study we sample a fraction, f1 (often unknown) of cases and a different fraction,
f0 (often unknown) of non-cases. Thus the expected results of a case-control study may be
represented as:
Disease    Exposed                      Unexposed                    Total
Yes        $d_1 = N_1\pi_1 f_1$         $d_0 = N_0\pi_0 f_1$         $d = f_1 D$
No         $h_1 = N_1(1-\pi_1) f_0$     $h_0 = N_0(1-\pi_0) f_0$     $h = f_0 H$
Because $f_1$ and $f_0$ are unknown we cannot estimate risks or rates. What can be estimated is the
odds ratio, since the sampling fractions cancel: $(d_1/d_0) / (h_1/h_0) = (d_1 h_0) / (d_0 h_1)$ gives an
estimate of $[\pi_1/(1-\pi_1)] / [\pi_0/(1-\pi_0)]$.
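The cancellation of the sampling fractions can be illustrated numerically. In the sketch below the population sizes, true risks and sampling fractions are all hypothetical assumptions; only the structure of the expected table comes from the text.

```python
def expected_case_control_counts(n1, n0, pi1, pi0, f1, f0):
    """Expected cell counts when a fraction f1 of cases and f0 of non-cases is sampled."""
    d1 = n1 * pi1 * f1        # exposed cases
    d0 = n0 * pi0 * f1        # unexposed cases
    h1 = n1 * (1 - pi1) * f0  # exposed non-cases (controls)
    h0 = n0 * (1 - pi0) * f0  # unexposed non-cases (controls)
    return d1, d0, h1, h0


# Hypothetical population: sizes and true risks are illustrative assumptions
n1, n0, pi1, pi0 = 10_000, 20_000, 0.04, 0.01
true_odds_ratio = (pi1 / (1 - pi1)) / (pi0 / (1 - pi0))

results = []
for f1, f0 in [(1.0, 1.0), (0.8, 0.01), (0.5, 0.002)]:
    d1, d0, h1, h0 = expected_case_control_counts(n1, n0, pi1, pi0, f1, f0)
    sample_or = (d1 * h0) / (d0 * h1)
    naive_risk = d1 / (d1 + h1)  # depends on f1 and f0, so it is not a valid risk estimate
    results.append((sample_or, naive_risk))
    print(f"f1={f1}, f0={f0}: odds ratio = {sample_or:.3f}, "
          f"apparent 'risk' in exposed = {naive_risk:.3f}")

print(f"true odds ratio = {true_odds_ratio:.3f}")
```

The odds ratio is the same for every pair of sampling fractions, whereas the apparent risk in the exposed changes with them.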
5.4 Advantages of odds ratios
For most people, odds and odds ratios are more difficult concepts than risks and risk ratios.
However, there are two reasons why odds and odds ratios are used in the analysis of
epidemiological studies:
(i) In case-control studies, we cannot calculate the absolute risk (or odds) of disease,
because we have selected people according to disease status. However, it is possible to
estimate the odds of exposure and, hence, the exposure odds ratio. The interpretation of
this will be discussed further in Session 4.
(ii) Odds and log(odds) have useful mathematical properties that risks do not. Risks must lie
between 0 and 1, odds must lie between 0 and $\infty$, and log(odds) can lie between $-\infty$ and
$+\infty$. The fact that log(odds) are unbounded is an advantage when fitting statistical
models. We shall return to this point in the sessions on logistic regression.
Thus, the analysis of case-control studies is usually based on odds ratios.
6. Approximations between rate ratios, risk ratios and odds ratios
If a disease is rare (i.e. $\pi \ll 1$), then $\pi_1/(1-\pi_1) \approx \pi_1$ and $\pi_0/(1-\pi_0) \approx \pi_0$, so the ratio of odds will be
approximately equal to the ratio of risks. The more common the disease, the less close will be
the agreement between the risk ratio and the odds ratio. In these circumstances, the odds ratio
will be more extreme (further from 1) than the risk ratio.
Furthermore, if a disease is rare (small $\pi$), then $\pi = 1 - \exp(-\lambda \times \text{Time}) \approx \lambda \times \text{Time}$, so the ratio
of risks will be approximately equal to the ratio of rates.
In summary, for diseases that are rare over the course of the follow-up time:
Risk ratio $\approx$ Odds ratio $\approx$ Rate ratio.
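The quality of these approximations can be explored with a short calculation. The rates and follow-up time below are hypothetical, and constant rates are assumed so that risks follow from the formula in section 4.5.

```python
import math


def compare_ratios(rate1, rate0, time):
    """Return (rate ratio, risk ratio, odds ratio) for constant rates over a follow-up time."""
    risk1 = 1 - math.exp(-rate1 * time)
    risk0 = 1 - math.exp(-rate0 * time)
    rate_ratio = rate1 / rate0
    risk_ratio = risk1 / risk0
    odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
    return rate_ratio, risk_ratio, odds_ratio


# Hypothetical scenarios with the same rate ratio of 2
rare = compare_ratios(rate1=0.002, rate0=0.001, time=1.0)
common = compare_ratios(rate1=1.0, rate0=0.5, time=1.0)

for label, (rate_r, risk_r, odds_r) in [("rare", rare), ("common", common)]:
    print(f"{label:>6}: rate ratio = {rate_r:.3f}, "
          f"risk ratio = {risk_r:.3f}, odds ratio = {odds_r:.3f}")
```

In the rare scenario the three ratios almost coincide; in the common scenario the risk ratio falls below the rate ratio and the odds ratio lies above it.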
7. Measures of exposure effect: ratios versus differences
We have mentioned two ways of assessing the effect of exposure on disease - ratio measures and
difference measures. In general, ratio measures give a better measure of the strength of
association between exposure and disease and are commonly used for studying the aetiology of
disease (i.e. epidemiology). Difference measures (risk or rate differences) are usually more
appropriate for assessing the public health implications of an exposure. The two are illustrated
in the following example:
Example: Mortality of British male physicians 1951-1961
Death rate per 1000 person-years
                  Non-smokers   Smokers   Rate ratio   Rate difference
Lung cancer              0.07      2.27         32.4              2.20
Cardiovascular           7.32      9.93          1.4              2.61
The importance of smoking as an aetiological factor for lung cancer is clearly demonstrated by
the large rate ratio (RR=32.4). In contrast, the association between smoking and cardiovascular
disease is less strong (RR=1.4). From a public health viewpoint, however, smoking is just as
important for cardiovascular disease as lung cancer because the absolute increase in death rate is
similar for both diseases.
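The ratio and difference measures in this example can be reproduced directly from the four death rates. The sketch below uses the rates quoted in the table; the variable names are illustrative.

```python
# Death rates per 1000 person-years: (non-smokers, smokers), from the example above
death_rates = {
    "Lung cancer": (0.07, 2.27),
    "Cardiovascular": (7.32, 9.93),
}

effects = {}
for cause, (non_smokers, smokers) in death_rates.items():
    rate_ratio = smokers / non_smokers        # strength of association, no units
    rate_difference = smokers - non_smokers   # absolute excess, per 1000 person-years
    effects[cause] = (rate_ratio, rate_difference)
    print(f"{cause}: rate ratio = {rate_ratio:.1f}, "
          f"rate difference = {rate_difference:.2f} per 1000 person-years")
```

The rate ratios differ by more than a factor of 20 between the two causes, while the rate differences are of similar size.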
8. Summary
The following table shows, for each type of epidemiological study, what measure of disease
and effect should be obtained, and what method may be used to adjust for a confounder:
                      Measure of disease    (Ratio) Measure of Effect
Cohort                rate, risk            risk ratio, rate ratio
Cross-sectional (1)   risk, odds            risk ratio, odds ratio
Case-control                                odds ratio
(1) Note that in cross-sectional studies, risk is usually interpreted as prevalence
Appendix A. Derivation of why D/Y estimates the rate of disease
It may not be intuitively obvious why D/Y estimates the rate of disease. The illustration
below gives a justification for this (which may be ignored by the less mathematically
inclined).
Suppose the rate of disease is $\lambda$.
The risk of disease in a small interval $\delta t$ is $\lambda\,\delta t$.
The probability of escaping disease from time 0 to time t and then developing disease in the
interval t to $t+\delta t$ is:
$(1-\lambda\,\delta t)^{t/\delta t}\,\lambda\,\delta t = e^{-\lambda t}\,\lambda\,\delta t$ (if $\delta t$ is small)
If everyone is followed up until they get disease then the expected time to disease is:
$\int_0^\infty t\,\lambda\,e^{-\lambda t}\,dt$
which is $1/\lambda$.
Thus the expected total follow-up time, Y, if N individuals are followed until they all get disease
(D = N) is:
$Y = D/\lambda$
Thus
$\lambda = D/Y$
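Both steps of this argument can be checked numerically. The sketch below uses a hypothetical rate of 0.5 per unit time and a simple midpoint sum in place of the integral; the function name and the integration settings are assumptions made for illustration.

```python
import math

rate = 0.5  # hypothetical constant disease rate per unit time
t = 3.0     # hypothetical follow-up time

# Step 1: (1 - rate*dt)^(t/dt) approaches exp(-rate*t) as dt shrinks
for dt in (0.1, 0.01, 0.001):
    discrete = (1 - rate * dt) ** (t / dt)
    print(f"dt = {dt}: discrete escape probability = {discrete:.5f}, "
          f"exp(-rate*t) = {math.exp(-rate * t):.5f}")


# Step 2: expected time to disease, approximating the integral of
# t * rate * exp(-rate * t) from 0 to a large upper limit with a midpoint sum
def expected_time_to_disease(rate, upper=100.0, steps=200_000):
    width = upper / steps
    total = 0.0
    for i in range(steps):
        mid = (i + 0.5) * width
        total += mid * rate * math.exp(-rate * mid) * width
    return total


mean_time = expected_time_to_disease(rate)
print(f"expected time to disease = {mean_time:.4f}, 1/rate = {1 / rate:.4f}")
```

The upper limit of 100 stands in for infinity, which is reasonable here because the integrand is negligible beyond that point for this rate.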
Appendix B
The Mantel-Haenszel risk ratio is calculated as follows. Suppose that there are J levels of a
confounder, then the association between exposure and disease in the jth level of the
confounder may be summarised in the following table, where the subscripts 0, 1, and j
represent unexposed, exposed, and jth level of the confounder respectively:
Disease    Exposed    Unexposed    Total
Yes        D1j        D0j          Dj
No         H1j        H0j          Hj
Total      N1j        N0j          Nj

If each level of the confounder is summarised in this way, then the Mantel-Haenszel risk ratio is
given by:
$RR_{MH} = \dfrac{\sum_j D_{1j} N_{0j} / N_j}{\sum_j D_{0j} N_{1j} / N_j}$
where the summation is over j = 1, 2, ..., J.
A similar formula is available for a Mantel-Haenszel rate ratio (Session 2) and a Mantel-Haenszel odds ratio (Session 4).
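The formula can be sketched as a short function. The strata below are hypothetical counts constructed so that the risk ratio is 2 within each level of the confounder, and the function name is an assumption made for illustration.

```python
def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel risk ratio from per-stratum counts.

    strata: list of (d1, n1, d0, n0) tuples, one per level of the confounder,
    where d = number diseased and n = group total for exposed (1) / unexposed (0).
    """
    numerator = sum(d1 * n0 / (n1 + n0) for d1, n1, d0, n0 in strata)
    denominator = sum(d0 * n1 / (n1 + n0) for d1, n1, d0, n0 in strata)
    return numerator / denominator


# Hypothetical data: risk ratio is 2 in both strata, but exposure is more
# common in the stratum with the higher baseline risk
strata = [
    (20, 100, 40, 400),   # stratum 1: risks 0.2 vs 0.1
    (160, 400, 20, 100),  # stratum 2: risks 0.4 vs 0.2
]

rr_mh = mantel_haenszel_risk_ratio(strata)

# Crude risk ratio, ignoring the confounder
d1 = sum(s[0] for s in strata)
n1 = sum(s[1] for s in strata)
d0 = sum(s[2] for s in strata)
n0 = sum(s[3] for s in strata)
crude_rr = (d1 / n1) / (d0 / n0)

print(f"Mantel-Haenszel risk ratio = {rr_mh:.2f}")
print(f"crude risk ratio           = {crude_rr:.2f}")
```

In this constructed example the crude risk ratio is larger than the stratum-specific value, while the Mantel-Haenszel estimate recovers the common within-stratum risk ratio.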