Basic measures and tools
of descriptive epidemiology
Piyawat Saipan (DVM., M.Sc., Ph.D)
Department of Veterinary Public Health, KhonKaen U.
Topics:
Importance of describing an event
 Continuous VS categorical data
 Ratios, proportions, rates; Incidence VS
prevalence
 Other measures: AR, CFR, etc.
 Basic terminology: accuracy, precision,
bias

Epidemiologic Research Assumes

Disease occurrence is not random

Systematic investigation of different
populations can identify causal and preventive
factors

Making comparisons is the cornerstone of
systematic investigations
Definition of Epidemiology

The study of the distribution and
determinants of disease frequency
in human or animal populations and
the application of this study to control
health problems
Key Words in Definition




Disease frequency - count cases, need system,
records
Disease distribution - who, when, where
Frequency, distribution, other factors generate
hypotheses about determinants
A determinant is a characteristic that influences
whether or not disease occurs
Natural Progression in Epidemiologic
Reasoning
1st – Suspicion that a factor influences disease
occurrence. Arises from clinical practice, lab
research, examining disease patterns by person,
place and time, prior epidemiologic studies
2nd – Formulation of a specific hypothesis
3rd – Conduct epidemiologic study to determine the
relationship between the exposure and the
disease. Need to consider chance, bias,
confounding when interpreting the study results.
4th – Judge whether association may be causal.
Need to consider other research, strength of
association, time directionality
Importance of describing an event



Descriptive epidemiology involves observing and
recording diseases and possible causal factors.
Descriptive epidemiology counts the frequency of
cases and describes distribution patterns of
disease among different groups in the population
for further analysis (who, what, when, where).
It is necessary to know which types of data are
suitable to collect for a particular investigation.
Epidemiologic measures:
Overview
● Types of data
 ● Measures of occurrence or health
 ● Measures of association
 ● Measures of attribution

Types of Data
Continuous data
 Categorical data



Variable: any observable event that can vary.
Variables may be either continuous or discrete.
Raw data: the initial measurements that form the
basis of analysis
Categorical data
Qualitative data describes to which
category an animal belongs.
 Nominal and Ordinal scale
 Nominal data: sex (male, female),
categories of animals (pig, dog, cattle)
 Ordinal data: a series of clinical signs
(mild, moderate, serious, death)

Continuous data
Quantitative (numerical) data consisting of
numerical values on a well defined scale:
 A discrete (discontinuous) scale: data can
take only particular integer values,
typically count e.g., litter size, parity
 A continuous scale: for which all values
are theoretically possible e.g. height,
weight, concentration of chemical in blood
etc.

Some descriptive statistics
Tables to exhibit features of the data
Diagrams to illustrate patterns:
 • Categorical data: bar chart, pie chart
 • Numerical data: dot diagram, histogram, stem-and-leaf
diagram, etc.

Numerical measures
 • Measures of central tendency or location
 • Measures of dispersion
Measures of central tendency

Arithmetic mean
Example: the distribution of body weight of 11 dogs:
10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 20 kg.
Arithmetic mean = 164 ÷ 11 = 14.91 kg.
Cont.

Geometric mean: the average of the
logarithms of the values, converted back
(antilog) to the original scale. Equivalently,
the nth root of the product of n values.
Example: four HI titers are given as the following dilutions:
4, 8, 16, and 32
GM = (4 × 8 × 16 × 32)^(1/4) = 11.3
Cont.


Median is the middle value when the observations are
arranged in order of size (with an even number of
observations, the mean of the two middle values).
Mode is the value that occurs most frequently; it is
used to highlight a common data point.
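The measures of central tendency above can be checked with a short Python sketch. It uses only the standard-library `statistics` module (`geometric_mean` needs Python 3.8 or later) and the dog body weights and HI titers from the examples; the variable names are illustrative.

```python
import statistics

# Data taken from the worked examples: 11 dog body weights (kg)
# and four HI titers (reciprocal dilutions).
weights_kg = [10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 20]
hi_titers = [4, 8, 16, 32]

arithmetic_mean = statistics.mean(weights_kg)          # sum / n
geometric_mean = statistics.geometric_mean(hi_titers)  # nth root of the product
median_weight = statistics.median(weights_kg)          # middle ordered value
mode_weight = statistics.mode(weights_kg)              # most frequent value

print(f"Arithmetic mean: {arithmetic_mean:.2f} kg")
print(f"Geometric mean titer: {geometric_mean:.1f}")
print(f"Median: {median_weight} kg, mode: {mode_weight} kg")
```

For titers, which double at each dilution step, the geometric mean is usually the more natural summary than the arithmetic mean.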
Measures of dispersion (spread)





Range is defined as the difference between the largest
and smallest observations.
The interquartile range is the range of values which
encloses the central 50% of the observations.
Variance is the average of the squared deviations of
the observations from the mean.
Standard deviation is the square root of the variance; it
may be regarded as a kind of average of the deviations
of the observations from the arithmetic mean.
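As a rough illustration, the dispersion measures can be computed for the same 11 dog weights with Python's `statistics` module. Two choices here are assumptions rather than anything stated in the slides: the sample (n − 1) form of the variance, and the default "exclusive" quartile method of `statistics.quantiles`.

```python
import statistics

weights_kg = [10, 11, 12, 13, 14, 15, 16, 17, 18, 18, 20]

# Range: largest minus smallest observation.
value_range = max(weights_kg) - min(weights_kg)

# Quartiles (default 'exclusive' method); the IQR spans the central 50%.
q1, q2, q3 = statistics.quantiles(weights_kg, n=4)
iqr = q3 - q1

# Sample variance (divides by n - 1) and its square root.
variance = statistics.variance(weights_kg)
std_dev = statistics.stdev(weights_kg)

print(f"Range: {value_range} kg, IQR: {iqr} kg")
print(f"Variance: {variance:.2f} kg^2, SD: {std_dev:.2f} kg")
```

Other quartile conventions can give slightly different IQR values on small data sets.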
Interval estimation
Measures of Health (Occurrence)
• Ratio, proportion, and rate
• Prevalence
• Incidence
   • Incidence risk and incidence rate
• Other measures of health
   • Attack rate, secondary attack rate, mortality,
fatality, proportional mortality
• Adjusted measures of health
Ratio

A ratio defines the relative size of two
quantities expressed by dividing one
(numerator) by the other (denominator).
Example: we have a herd of 100 cattle
and 58 are found to be diseased. Ratio?
 The ratio of diseased to non-diseased cattle in this
herd is 58:42 or 1.4 to 1.

Proportion
A proportion is a fraction in which the
numerator is included in the denominator.
 Say we have a herd of 100 cattle and 58
are found to be diseased. Proportion ?
 The proportion of diseased animals in this
herd is 58 ÷ 100 = 0.58 = 58%.
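The difference between a ratio and a proportion for the 100-cattle herd can be sketched in a few lines of Python (variable names are illustrative):

```python
herd_size = 100
diseased = 58
non_diseased = herd_size - diseased

# Ratio: numerator is NOT part of the denominator (diseased : non-diseased).
ratio = diseased / non_diseased

# Proportion: numerator IS included in the denominator (diseased / all animals).
proportion = diseased / herd_size

print(f"Ratio: {diseased}:{non_diseased} (about {ratio:.1f} to 1)")
print(f"Proportion: {proportion:.2f}")
```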

Rate

A rate is derived from three pieces of
information:
(1) a numerator: the number of individuals diseased
or dead,
 (2) a denominator: the total number of animals (or
animal time) in the study group and/or period; and
 (3) a specified time period.


Example, we might say that the rate of disease
in our herd over a 12-month period was 58
cases per 100 cattle.
Epidemiologic Measures
The term morbidity is used to refer to the
extent of disease or disease frequency
within a defined population.
 Two important measures of morbidity are
prevalence and incidence.

Prevalence

Prevalence is the proportion of a population that has
a specific disease or attribute at a specified
point in time.
Prevalence (cont.)
Two types of prevalence:
 (1) point prevalence:
 The number of disease cases in the population at a
single point in time, divided by the population at
risk at that time.

(2) period prevalence:
 The number of cases present at the beginning of the
study period + the number of new cases that occurred
during the remainder of the study period, divided by
the population at risk.
Example


Of the 216 dogs examined in Kingston, 192 had
evidence of tooth decay. Of the 184 dogs
examined in Newburgh 116 had evidence of
tooth decay. Assuming complete survey
coverage, there were 192 prevalent cases of
tooth decay among dogs in Kingston at the time
of the study. Prevalence ?
The prevalence of tooth decay was 192 ÷ 216 =
89% in Kingston and 116 ÷ 184 = 63% in
Newburgh.
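A minimal Python sketch of the prevalence calculation for the two towns follows. The helper name `prevalence` is illustrative only.

```python
def prevalence(cases: int, population: int) -> float:
    """Proportion of the examined population that has the condition."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population

kingston = prevalence(192, 216)
newburgh = prevalence(116, 184)

print(f"Kingston: {kingston:.0%}")
print(f"Newburgh: {newburgh:.0%}")
```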
Incidence
Incidence describes the number of new
cases that arise in a population over a
specified period of time.
 There are two ways to express incidence:
 Incidence risk and Incidence rate.

Incidence risk

Incidence risk (as cumulative incidence) is
the proportion of initially susceptible
individuals in a population who become
new cases during a defined time period.
For example:



Last year a herd of 121 cattle were tested for
tuberculosis using the tuberculin test and all tested
negative. This year the same 121 cattle were
tested and 25 tested positive. Incidence risk?
The incidence risk would then be 21 cases per 100
cattle for the 12-month period.
We can also say that the risk of cattle becoming
positive to the tuberculin test for the 12-month
period was 21%. This is an expression of average
risk applied to an individual (but estimated from
the population).
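The tuberculin example can be written out as a short Python calculation, assuming a closed herd in which all 121 cattle were susceptible at the start:

```python
# Closed herd: all 121 cattle tested negative (susceptible) at the start.
initially_susceptible = 121
new_cases = 25  # tuberculin-positive one year later

incidence_risk = new_cases / initially_susceptible
cases_per_100 = incidence_risk * 100

print(f"Incidence risk: {incidence_risk:.2f} "
      f"({cases_per_100:.0f} cases per 100 cattle per 12 months)")
```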
Epidemiologic Measures
Occurrence of disease can be measured
in a static way or in a dynamic way.
 Static way: Did the cow have mastitis or
not at the moment of measurement?
 Dynamic way: Did the cow get mastitis
during the study period?
 With respect to disease: prevalence is the
static way and incidence is the dynamic
way

Population at Risk
A population may be closed (no animals enter or leave
during the study) or open [animals are recruited
(births, purchase) and leave (sale, death)].
 When the population is open, incidence
risk cannot be measured directly; the denominator is
adjusted:

Denominator = population size at the mid-point of
the study period
For example:

Last year a herd of 121 cattle were tested for
tuberculosis using the tuberculin test and all tested
negative. This year the herd had been reduced to 101
cattle, of which 25 tested positive. Incidence
risk?

Mid-point population = (121 + 101) ÷ 2 = 111.
The incidence risk would then be 25 ÷ 111 = 23 cases per 100
cattle for the 12-month period.
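The mid-point adjustment for an open population can be sketched as follows. It approximates the mid-period population as the average of the start and end herd sizes, which is how the example's 111 is obtained.

```python
start_size = 121  # herd size at the first test
end_size = 101    # herd size at the second test
new_cases = 25

# Approximate the population at the mid-point of the study period.
midpoint_population = (start_size + end_size) / 2

approx_incidence_risk = new_cases / midpoint_population

print(f"Mid-point population: {midpoint_population:.0f}")
print(f"Approximate incidence risk: {approx_incidence_risk:.2f}")
```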
Incidence rate

Incidence rate (incidence density) is the
number of new cases of disease that
occur per unit of individual time at risk,
during a defined time period.
Table 1
Note that incidence rate:




Accounts for individuals that enter and leave the
population throughout the period of study.
Can account for multiple disease events in the
same individuals (Table 1).
For example: on the basis of the data presented
in Table 1, the incidence rate of clinical mastitis
for the 12-month period is 5 cases per 825
cow-days at risk. Incidence rate?
Incidence rate = 5 ÷ 825 × 365 = 2.2 cases of clinical
mastitis per cow-year at risk.
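The conversion from cases per cow-day to cases per cow-year can be sketched in Python. The 365-day year is an assumption of this sketch.

```python
new_cases = 5
cow_days_at_risk = 825
DAYS_PER_YEAR = 365  # assumed conversion factor

rate_per_cow_day = new_cases / cow_days_at_risk
rate_per_cow_year = rate_per_cow_day * DAYS_PER_YEAR

print(f"{rate_per_cow_day:.4f} cases per cow-day at risk")
print(f"{rate_per_cow_year:.1f} cases per cow-year at risk")
```

Because the denominator is animal-time rather than animals, a cow that leaves the herd early or has repeated episodes is handled naturally.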
Summary:
Comparison of prevalence, incidence risk, and incidence rate
The relationship between P and I


Provided the incidence rate is constant, incidence risk can
be estimated as follows:
 Closed population: incidence risk = incidence rate ×
length of study period
 Open population: incidence risk = 1 − exp(−incidence rate ×
length of study period)
Provided the incidence rate is constant, prevalence can
be estimated as follows:
 P = (incidence rate × duration of disease) ÷ (incidence
rate × duration of disease + 1)
For example:
The incidence rate of disease is estimated
to be 0.006 cases per cow-day at risk. The
mean duration of disease is 7 days. The
estimated prevalence of disease is…
 (0.006 × 7) ÷ (0.006 × 7 + 1) = 0.040
 The estimated prevalence is 4.0 cases per
100 cows.
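The relationships between incidence rate, incidence risk and prevalence can be sketched as three small Python functions. The function names and the 30-day study period are illustrative assumptions; the rate (0.006 per cow-day) and duration (7 days) come from the example.

```python
import math

def risk_closed(rate: float, period: float) -> float:
    """Incidence risk in a closed population: rate x length of period."""
    return rate * period

def risk_open(rate: float, period: float) -> float:
    """Incidence risk in an open population: 1 - exp(-rate x period)."""
    return 1 - math.exp(-rate * period)

def prevalence_from_rate(rate: float, duration: float) -> float:
    """Prevalence: (rate x duration) / (rate x duration + 1)."""
    return (rate * duration) / (rate * duration + 1)

rate = 0.006          # cases per cow-day at risk (from the example)
disease_duration = 7  # days (from the example)
period_days = 30      # hypothetical study period, for illustration only

estimated_prevalence = prevalence_from_rate(rate, disease_duration)
closed_risk = risk_closed(rate, period_days)
open_risk = risk_open(rate, period_days)

print(f"Estimated prevalence: {estimated_prevalence:.3f}")
print(f"30-day risk (closed): {closed_risk:.3f}, (open): {open_risk:.3f}")
```

All three formulas assume a constant incidence rate over the period.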

Other measures of health
Attack rate
 Secondary attack rate
 Fatality rate
 Mortality

Attack rate
It is defined as the number of cases
divided by the number of individuals
exposed.
 Attack rates are usually used in outbreak
situations where the period of risk is
limited and all cases arising from
exposure are likely to occur within the risk
period.

Secondary attack rate
It is used to describe infectiousness.
 Secondary attack rate: the number of
cases at the end of the study period less
the number of initial (primary) cases,
divided by the size of the population that
was initially at risk.
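The attack rate and secondary attack rate definitions can be sketched in Python. The slides give no numbers for these measures, so the kennel outbreak figures below are entirely hypothetical and serve only to show the arithmetic.

```python
# HYPOTHETICAL outbreak in a kennel (numbers invented for illustration).
population_at_risk = 50  # dogs exposed / initially at risk
primary_cases = 5        # initial cases
total_cases_at_end = 20  # all cases by the end of the study period

# Attack rate: cases divided by individuals exposed.
attack_rate = total_cases_at_end / population_at_risk

# Secondary attack rate as defined above: (final cases - primary cases)
# divided by the population initially at risk.
secondary_attack_rate = (total_cases_at_end - primary_cases) / population_at_risk

print(f"Attack rate: {attack_rate:.0%}")
print(f"Secondary attack rate: {secondary_attack_rate:.0%}")
```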

Mortality
Mortality risk or rate is an example of
incidence where death is the outcome of
interest.
 The denominator includes both prevalent
cases of the disease and individuals who
are at risk of developing the disease.

Proportional mortality

It is simply the proportion of all deaths that
are due to a particular cause for a
specified population and time period
Adjusted measures of health
Adjusted rates are used when we want to
compare the level of disease in different
populations.
 In veterinary medicine, age, breed, and
population type are commonly used
adjustment variables.
 Two methods for adjusting disease measures: direct
and indirect adjustment

For example:


If we have two colonies of mice and observe them for one
day we might find the mortality rate in the first colony is 10
per 1,000 and the mortality rate in the second colony is 20
per 1,000.
We might initially think that this difference is due to a
difference in management, but it might also transpire that
the first colony is comprised of mainly young mice and the
second colony is comprised of mainly older mice. The two
colonies might be exactly the same in terms of standards
of care and housing quality and the difference in mortality
solely due to a difference in age composition of the two
populations.
Direct adjustment

With direct adjustment the observed
stratum-specific rates are known and an
estimated (standard) population distribution is used
as the basis for adjustment.
Direct adjusted rate = Σ (OBS Ri × STD Pi) ÷ Σ STD Pi
Where: STD Pi: the size of the standard population in the ith stratum
OBS Ri: the observed rate in the ith stratum
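A minimal Python sketch of direct adjustment is given below, as the weighted average of observed stratum rates using a standard population as weights. The function name and all stratum rates and population sizes are hypothetical: they were chosen so that two cities with identical sex-specific rates but different sex structures show different crude rates and equal adjusted rates.

```python
def direct_adjusted_rate(observed_rates: dict, standard_population: dict) -> float:
    """Sum of (OBS Ri x STD Pi) divided by the sum of STD Pi."""
    total = sum(standard_population.values())
    weighted = sum(observed_rates[s] * standard_population[s]
                   for s in standard_population)
    return weighted / total

# HYPOTHETICAL data: same sex-specific rates in both cities...
rates_city_1 = {"male": 0.30, "female": 0.20}
rates_city_2 = {"male": 0.30, "female": 0.20}
# ...but different sex structure in the sampled populations.
pop_city_1 = {"male": 100, "female": 300}
pop_city_2 = {"male": 300, "female": 100}
# Hypothetical standard population used for adjustment.
standard = {"male": 200, "female": 200}

# Crude rate = stratum rates weighted by each city's own structure.
crude_1 = direct_adjusted_rate(rates_city_1, pop_city_1)
crude_2 = direct_adjusted_rate(rates_city_2, pop_city_2)
adjusted_1 = direct_adjusted_rate(rates_city_1, standard)
adjusted_2 = direct_adjusted_rate(rates_city_2, standard)

print(f"Crude:    {crude_1:.3f} vs {crude_2:.3f}")
print(f"Adjusted: {adjusted_1:.3f} vs {adjusted_2:.3f}")
```

With these made-up inputs the crude rates differ only because of population structure, and the difference disappears after adjustment.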
Stratum-specific rates
• Stratum-specific rates are recommended for comparing
defined subgroups between or within populations when
rates are strongly stratum-dependent.
• Stratum-specific rates are recommended when specific
causal or protective factors or the prevalence of risk
exposures are different for different levels of strata.
For example:
Table 1. Seroprevalence of leptospirosis in urban
dogs, stratified by city.
City     Positive   Sample   Seroprevalence
E        61         260      23%
G        69         251      27%
Total    130        511      25%
Cont.:
Table 2 Seroprevalence of leptospirosis in urban dogs, stratified by city and sex
Table 3 Directly adjusted seroprevalence of leptospirosis in urban dogs, stratified
by city
The difference between the cities is due to the different sex structure
of the 2 populations
Indirect adjustment


Indirect adjustment provides an estimate of the
expected number of cases, given the stratum-specific
population size.
Expected number of cases = Σ (STD Ri × OBS Pi)
It is usual to divide the observed number of disease
cases by the expected number to yield a
standardised morbidity/mortality ratio (SMR).
SMR = observed number of cases ÷ expected number of cases
Where: STD Ri: the standard rate in the ith stratum of the population
OBS Pi: the observed population size in the ith stratum
For example:



We know that the prevalence of a given disease throughout
a country is 0.01%. If we are presented with a region with
20,000 animals the expected number of cases of disease in
this region will be 0.01% × 20,000 = 2.
If the actual number of cases of disease in this region is 5,
then the standardised mortality (morbidity) ratio is 5 ÷ 2 =
2.5.
That is, there were 2.5 times more cases of disease in this
region, compared with the number of cases we were
expecting.
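The regional example can be reproduced with a few lines of Python. It treats the whole region as a single stratum, as the example does, and the variable names are illustrative.

```python
national_prevalence = 0.0001  # 0.01% expressed as a proportion
region_population = 20_000
observed_cases = 5

# Expected cases = standard rate x observed population size.
expected_cases = national_prevalence * region_population

# Standardised morbidity ratio = observed / expected.
smr = observed_cases / expected_cases

print(f"Expected cases: {expected_cases:.1f}")
print(f"SMR: {smr:.1f}")
```

With several strata, the expected count would be the sum of the stratum-specific products before dividing.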
Example:
Measures of association


Risk is the probability that an event will happen.
A characteristic or factor that influences whether
or not an event occurs, is called a risk factor.
Associations between putative risk factors
(exposures) and an outcome (a disease) can be
investigated using analytical observational
studies.
Measures of association

When both exposure and outcome are binary
variables (yes or no), the results can be
presented as a 2 × 2 table.
               Exposed   Non-exposed   Total
Diseased       a         c             a+c
Non-diseased   b         d             b+d
Total          a+b       c+d           a+b+c+d

Incidence risk in the exposed population: a ÷ (a+b)
Incidence risk in the non-exposed population: c ÷ (c+d)
Incidence risk in the total population: (a+c) ÷ (a+b+c+d)
Measures of association

Case-control study
           Exposed   Non-exposed   Total
Case       a         b             a+b
Control    c         d             c+d

Odds of disease in the exposed population: a ÷ c
Odds of disease in the non-exposed population: b ÷ d
Cont.
Three main categories:
 (1) measures of strength
 (2) measures of effect
 (3) measures of total effect
Measures of strength
Risk ratio
 Incidence rate ratio
 Odds ratio

Risk ratio (RR)

Defined as the ratio of the risk of disease (i.e.
incidence risk) in the exposed group to the risk
of disease in the unexposed group.
If RR = 1: the risk of disease in the exposed and non-exposed groups is equal.
If RR < 1: exposure reduces the risk of disease and exposure is said to be
protective.
If RR > 1: exposure increases the risk of disease.
RR ranges between 0 and infinity. It cannot be estimated in a case-control
study.
Incidence rate ratio (IRR)
This is the ratio of the incidence rate in the
exposed group to that in the non-exposed
group.
 The term relative risk is used as a
synonym for both risk ratio and incidence
rate ratio.
 IRR is interpreted in the same way as risk
ratio.

Odds ratio (OR)

OR is the ratio of the odds of disease in the
exposed group to the odds of disease in the
non-exposed group.
Odds ratio (cont.)

When the No. of cases is low relative to
the No. of non-cases (i.e. the disease is
rare), the OR approximates the RR.
Measures of effect in the exposed
population
Attributable risk (rate)
 Attributable fraction

Attributable risk (AR)

AR is defined as the increase or decrease
in the risk (or rate) of disease in the
exposed group that is attributable to
exposure.
Attributable fraction (AF)

Attributable fraction (the attributable
proportion in exposed subjects) is the
proportion of disease in the exposed
group that is due to exposure.
For example

In vaccine trials (in foxes), vaccine efficacy is defined as the proportion
of disease prevented by the vaccine in vaccinated individuals, which is
the attributable fraction. The following results were obtained:
            Vaccination -   Vaccination +   Total
Rabies +    18              12              30
Rabies -    30              46              76
Total       48              58              106
The odds of rabies in the unvaccinated group were 2.3 times the
odds of rabies in the vaccinated group (OR = 2.3). Fifty-six percent
of rabies cases in unvaccinated foxes were due to not being
vaccinated (AF = 0.56).
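The odds ratio in the fox example can be recomputed with a short Python sketch. The attributable fraction line assumes AF is estimated from the odds ratio as (OR − 1) ÷ OR; that is an assumption of this sketch, chosen because it gives roughly the quoted value, and the slides do not state which estimator was used.

```python
# Counts from the fox vaccine trial table.
rabies_unvaccinated, no_rabies_unvaccinated = 18, 30
rabies_vaccinated, no_rabies_vaccinated = 12, 46

# Odds of rabies in each group (cases divided by non-cases).
odds_unvaccinated = rabies_unvaccinated / no_rabies_unvaccinated
odds_vaccinated = rabies_vaccinated / no_rabies_vaccinated

# "Exposure" here is NOT being vaccinated.
odds_ratio = odds_unvaccinated / odds_vaccinated

# ASSUMPTION: attributable fraction estimated from the odds ratio.
af_from_or = (odds_ratio - 1) / odds_ratio

print(f"OR: {odds_ratio:.1f}")
print(f"AF estimated from OR: {af_from_or:.3f}")
```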
Measures of effect in the total
population

Population attributable risk or rate (PAR)
is the increase or decrease in risk (or rate)
of disease in the population that is
attributable to exposure.
Measures of effect in the total
population

Population attributable fraction (PAF) is
the proportion of disease in the population
that is due to the exposure.
For example:

A study investigating the relationship between DCF and
FUS was conducted. The following results were
obtained:
          DCF +   DCF -   Total
FUS +     13      5       18
FUS -     2163    3349    5512
Total     2176    3354    5530
Questions ?
The incidence risk in the exposed group?
 The incidence risk in the non-exposed group?
 RR?
 AR?
 AF?
 PAR?
 PAF?

Answers:



The incidence risk of FUS in the DCF+ group was
5.97 cases per 1000
The incidence risk of FUS in the DCF- group was
1.49 cases per 1000
The incidence risk of FUS in the DCF exposed
group was 4.01 times the incidence
risk of FUS in the DCF non-exposed group (RR =
4.01)
Answers:




The incidence risk of FUS in DCF+ group that may be
attributed to DCF is 4.5 per 1000 (AR = 0.0045)
In DCF+ group 75% of FUS is attributable to DCF (AF =
0.75).
The incidence risk of FUS in the population that may be
attributed to DCF is 1.8 per 1000. That is, we would
expect the risk of FUS to decrease by 1.8 cases per 1000
if DCF were not fed (PAR = 0.0018).
Fifty-four percent of FUS cases in the population are
attributable to DCF (PAF = 0.54).
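The answers above can be reproduced with a short Python sketch that follows the definitions given earlier (AR as a risk difference, AF as AR over the exposed risk, PAR as total risk minus non-exposed risk, PAF as PAR over total risk). The function name and the dictionary keys are illustrative.

```python
def two_by_two_measures(a: int, b: int, c: int, d: int) -> dict:
    """a: exposed diseased, b: exposed non-diseased,
    c: non-exposed diseased, d: non-exposed non-diseased."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_total = (a + c) / (a + b + c + d)
    ar = risk_exposed - risk_unexposed  # attributable risk
    par = risk_total - risk_unexposed   # population attributable risk
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "RR": risk_exposed / risk_unexposed,
        "AR": ar,
        "AF": ar / risk_exposed,
        "PAR": par,
        "PAF": par / risk_total,
    }

# DCF/FUS counts: DCF+ with FUS, DCF+ without, DCF- with FUS, DCF- without.
m = two_by_two_measures(13, 2163, 5, 3349)

print(f"Risk per 1000: exposed {m['risk_exposed'] * 1000:.2f}, "
      f"non-exposed {m['risk_unexposed'] * 1000:.2f}")
print(f"RR {m['RR']:.2f}, AR {m['AR']:.4f}, AF {m['AF']:.2f}")
print(f"PAR {m['PAR']:.4f}, PAF {m['PAF']:.2f}")
```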
Other basic terminology
Accuracy: the accuracy of a test relates to
its ability to give a true measure of the
substance being measured.
 To be accurate, a test need not always be
close to the true value, but if repeat tests
are run, the average of the results should
be close to the true value.

Cont.


The precision of a test relates to how
consistent the results of the test are. If a test
always gives the same value for a sample
(regardless of whether or not it is the correct
value), it is said to be precise.
Variability among test results might be due to
variability among results obtained from running
the same sample within the same laboratory
(repeatability) or variability between laboratories
(reproducibility).
Cont.

Bias is caused by systematic error, a
systematic error being one that is inherent
to the technique being used that results in
a predictable and repeatable error for
each observation.
Cont.
The most common types of bias are:
Bias due to confounding
 Measurement bias
 Interview bias
 Selection bias

Thank You