Lecture: Prevalence and Incidence, and 95% Confidence Intervals

Dr Richard Crossman, Research Fellow, Health Sciences, Warwick Medical
School
This lecture will: (1) introduce two important measures of health and disease –
prevalence and incidence, and (2) introduce confidence intervals. Please refer to
the webpage for the Social and Population Perspective theme for a fuller lecture
synopsis.
Learning Outcomes
This lecture will contribute to the following two learning outcomes:


• To distinguish between, calculate and interpret measures of health and disease found in the medical literature (such as prevalence, incidence, incidence rate ratio, odds ratio, standardised mortality ratio, attributable risk) (TD12a)

• To interpret 95% confidence intervals and p values from statistical tests, and distinguish between statistical and clinical significance (TD12a)
Sub-Outcomes
You should be able to:

Define and differentiate between the terms ‘incidence’ and ‘prevalence’, and
describe their inter-relationship

Distinguish between ‘observed’ epidemiological quantities (incidence, prevalence
etc.) and their ‘true’ or ‘underlying’ values.

Discuss how ‘observed’ epidemiological quantities depart from their ‘true’ values
because of random variation.

Describe how ‘observed’ values help us towards a knowledge of the ‘true’ values
by:
(a) allowing us to test hypotheses about the ‘true’ value
(b) allowing us to calculate a confidence interval that gives a range which
includes the ‘true’ value with a specific probability.
Relevant Reading
Ben-Shlomo Y, Brookes ST, Hickman M. Epidemiology, Evidence-Based Medicine
and Public Health. Lecture Notes. 6th Edition. Wiley-Blackwell, 2013, Chapters 2 & 4.
Lecture Synopsis
1. The ‘Extent’ of Disease: Incidence and Prevalence
To examine the extent of disease in a population we need to know about the number
of new cases arising in a given time (incidence), and the number of people who have
the disease at any given moment (prevalence).
We are interested in knowing how many new cases arise for many reasons:
(1) Seeing whether new cases of an infectious disease are becoming more frequent can help us decide whether an epidemic is in progress.
(2) To monitor the effect of prevention programmes (if they are working, the first
effect should be a fall in the frequency of new cases).
(3) To compare people exposed to some potential hazard with unexposed people, to
help us decide if the exposure really is dangerous.
On the other hand, numbers of new cases do not always give us much of an idea
about the ‘burden of disease’ (the extent to which a disease is a problem to a
community), so it is important to also know about numbers of existing cases. This
information helps us know the extent of need for particular health services. This is
especially true for conditions which exist throughout life.
a) Measuring New Cases: the Incidence Rate
A simple count of new cases is of little use; it is necessary to have information on
population size (i.e. the number at risk) and the time period. The best measure of
the population at risk is the product of (multiplication of) the number of people
observed and the number of years of observation. This is called the person-time at
risk, or the number of person-years (p-y). Dividing the number of new cases
observed (events) by the number of person-years gives the incidence rate. This
useful measure answers the question ‘how many new cases per year, per head of
population?’
A mortality rate is a special case of an incidence rate where the event is death
rather than onset of disease. Mortality rates may be calculated for specific diseases
(e.g. the malignant melanoma mortality rate in England and Wales is 20 per million
person-years) or for all diseases combined (the all-cause mortality rate).
Incidence and mortality rates can be compared between different populations to see whether individuals in one population are at higher risk than those in another; the populations and the periods of observation then need not be the same size.
For example, if one were to observe 300 myocardial infarctions (MIs) in a population
of 50,000 over an 18 month period, the incidence rate would be 300 ÷ (50,000 x 1.5),
i.e. 0.004 MIs per person per year, or 0.004 MIs per person-year. This is not a very
intuitive way of expressing the rate. ‘4 per 1,000 person-years’ is better – you can
see straight away that 4 MIs per year would be expected in a population of 1,000.
For rare diseases, rates are often expressed per 100,000 person-years. Note that,
for the purpose of calculating an incidence rate in this manner, observing 50,000
people for 18 months is equivalent to observing 25,000 people for 3 years, or any
other combination resulting in 75,000 person-years.
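The worked example above can be sketched in a few lines of Python, using the figures given in the text:

```python
# Worked example from the text: 300 MIs observed in a population
# of 50,000 over an 18-month period.
new_cases = 300
population = 50_000
years_observed = 1.5

person_years = population * years_observed   # 75,000 person-years at risk
incidence_rate = new_cases / person_years    # 0.004 MIs per person-year

print(f"{incidence_rate:.3f} MIs per person-year")
print(f"{incidence_rate * 1_000:.0f} per 1,000 person-years")
```

Note that swapping in 25,000 people observed for 3 years gives the same 75,000 person-years and hence the same rate.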
b) Measuring Existing Sufferers: the Point Prevalence
For health service planning, especially for incurable or long-standing diseases, it is
often more useful to know the number of people who currently have the disease
rather than the incidence rate. Once again the number of sufferers is not particularly
useful unless we know how many people are at risk of the disease. In this case
there is no time period because we are simply interested in the proportion of the
population who are affected by the disease. The number of sufferers divided by the
number at risk is called the point prevalence of the disease (usually referred to as
the prevalence). For example, if 80 members of a population of 1,500 have cancer
at a particular time, prevalence = 0.053 or 53 per 1,000 or 5.3%.
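A minimal Python sketch of the cancer example above:

```python
# 80 members of a population of 1,500 have cancer at a particular time.
cases = 80
population = 1_500

prevalence = cases / population              # proportion currently affected

print(f"{prevalence:.3f}")                   # 0.053
print(f"{prevalence * 1_000:.0f} per 1,000") # 53 per 1,000
print(f"{prevalence:.1%}")                   # 5.3%
```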
c) Relationship between Incidence and Prevalence
Prevalence and incidence are related because all the prevalent cases must at some
time have been incident cases; other things being equal, higher incidence will imply
higher prevalence. However the relationship is not quite this simple. The number of
prevalent cases is constantly being added to by new (incident) cases, and constantly
being depleted by patients dying or recovering:
Figure: relationship between incidence, prevalence and rates of death/cure
Prevalence is influenced by the death rate and the cure rate, as well as the incidence
rate. If a new treatment is found which keeps people with the disease alive longer,
prevalence will increase; if more patients are cured or die, prevalence will fall. When
the incidence rate and the rates of recovery and death are constant, then
P  ( I x L)
where P = prevalence, I = incidence rate and L = mean duration of disease. The
symbol ‘’ means ‘is approximately equal to’.
2. Confidence Intervals
a) Sampling Variation and Statistical Models
Often we would like to draw conclusions about a population, but it is not feasible to assess the whole population. Therefore we draw a representative sample (ideally a random sample) from the population. However, different samples will most likely give us at least slightly different answers; this is called sampling variation.
Statistical models are introduced and critically reflected upon. The idea that models need not be strictly true in order to be useful is discussed.
b) Normal Distribution and Normal Approximation
The Normal distribution (also called Gaussian distribution) is introduced. The Normal
distribution has two parameters, namely mean and standard deviation. The normal
distribution with mean 0 and standard deviation 1 is called the standard normal
distribution. The normal distribution is important because it can be used to
approximate other distributions if sample sizes are not too small.
c) Proportions: Estimates, Confidence Intervals, and Hypothesis Tests
Using the prevalence of hypertension as an example, estimates, confidence
intervals, and hypothesis tests for proportions are presented.
The observed value of a quantity of interest (e.g. prevalence, incidence rate) is the
best estimate of the quantity’s true value. However, estimates are subject to
sampling variation. Their precision can be described by their standard errors (SE).
For instance the prevalence of hypertension in a population can be estimated by the
observed prevalence in a random sample from this population. Hereby the observed
prevalence is calculated as the number of subjects with hypertension divided by the
number of subjects in the sample. The standard error of an observed proportion (e.g. prevalence) is √(p(1 − p)/n), where p denotes the proportion (prevalence) and n the number of subjects.
Estimated proportions from several samples from the same population follow
(approximately) a Normal distribution, which has two parameters, namely mean and
standard deviation. The mean of this normal distribution can be estimated from a
sample by the observed proportion and the standard deviation by the standard error
of the observed proportion. It is a feature of the Normal distribution that 95% of
values are in the range “mean ± 1.96 x standard deviation”. Plugging in observed
proportion and its standard error for mean and standard deviation we obtain the
interval “observed proportion ± 1.96 x SE of observed proportion”. This range is
known as the 95% confidence interval (CI). If we sample repeatedly from the same
population and calculate the 95% confidence interval for each sample, then we
expect that 95% of these confidence intervals include the true value of the
proportion. In 5% of the cases the range of the confidence interval would not include
the true value.
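The standard error and 95% confidence interval calculations can be sketched in Python; the sample below (300 of 1,000 subjects with hypertension) is a hypothetical example, not data from the lecture:

```python
from math import sqrt

# Hypothetical sample: 300 of 1,000 randomly sampled subjects have hypertension.
n = 1_000
cases = 300

p = cases / n                        # observed prevalence = 0.30
se = sqrt(p * (1 - p) / n)           # standard error of the proportion

# 95% of a Normal distribution lies within mean ± 1.96 standard deviations.
lower = p - 1.96 * se
upper = p + 1.96 * se

print(f"prevalence = {p:.2f}, 95% CI {lower:.3f} to {upper:.3f}")
```

A larger sample shrinks the standard error (it falls with √n), so the confidence interval narrows: the estimate becomes more precise.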
Many analyses involve a comparison, either between different groups or between a sample and a known quantity. The numerical value corresponding to the comparison is called the effect (e.g. a difference of means). The hypothesis of no effect is called the ‘null hypothesis’; a common null hypothesis is “the difference of means is 0”. Whether data are consistent with a given null hypothesis can be tested by means of a confidence interval. If the 95% CI includes the null value (e.g. a difference of 0), then the data are consistent with the null hypothesis of no effect at the 5% significance level. If not, the effect is called statistically significant.
An alternative approach to check whether data are consistent with a null hypothesis
is the hypothesis test. The probability that we could have obtained the observed data
or more extreme data if the null hypothesis were true is calculated. This probability is
known as the p-value. If a p-value is very small, then either something very unlikely has occurred or the null hypothesis is wrong. Usually p-values smaller than 5% (0.05) are considered ‘small’, although this threshold is of course somewhat arbitrary.
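A p-value for a single proportion can be sketched using the normal approximation described above; the counts and the null value of 25% below are hypothetical:

```python
from math import sqrt, erf

def p_value_one_proportion(cases: int, n: int, p0: float) -> float:
    """Two-sided p-value for H0: true proportion = p0 (normal approximation)."""
    p = cases / n
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error under the null
    z = (p - p0) / se0                     # standardised test statistic
    # Tail probability of the standard Normal, via the error function:
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Hypothetical: 300 of 1,000 subjects have hypertension.
# Is this consistent with a true prevalence of 25%?
pv = p_value_one_proportion(300, 1_000, 0.25)
print(f"p-value = {pv:.4f}")
```

Here the observed 30% is several standard errors away from 25%, so the p-value falls well below 0.05 and the data are inconsistent with that null hypothesis.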
d) Difference between Two Proportions
In a study of about 1,300 adolescents, boys and girls were asked whether they always use seat belts. A difference of 9 percentage points between boys and girls was observed, suggesting that boys are more likely to use seat belts. The methods described above for confidence intervals and hypothesis tests are extended to differences between two proportions, to investigate whether such findings could be attributable to chance.
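The extension to two proportions can be sketched as follows. The lecture only reports the total (~1,300) and the 9-percentage-point difference, so the group sizes and counts below are assumptions chosen to match that summary:

```python
from math import sqrt

# Hypothetical counts consistent with the study's summary
# (~1,300 adolescents, a 9-percentage-point difference).
boys_yes, boys_n = 390, 650      # 60% of boys always use seat belts (assumed)
girls_yes, girls_n = 332, 650    # ~51% of girls (assumed)

p1 = boys_yes / boys_n
p2 = girls_yes / girls_n
diff = p1 - p2                   # observed difference in proportions

# SE of a difference: the squared standard errors of the two proportions add.
se = sqrt(p1 * (1 - p1) / boys_n + p2 * (1 - p2) / girls_n)

lower = diff - 1.96 * se
upper = diff + 1.96 * se
print(f"difference = {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

If the 95% CI excludes 0 (the null value of "no difference"), the difference is statistically significant at the 5% level, i.e. unlikely to be attributable to chance alone.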