CPH EXAM REVIEW – EPIDEMIOLOGY
Lina Lander, Sc.D.
Associate Professor
Department of Epidemiology, College of Public Health
University of Nebraska Medical Center
January 24, 2014
• Review of basic topics covered in the
epidemiology section of the exam
• Materials covered cannot replace basic
epidemiology course
• This review will be archived on the NBPHE
website under Study Resources
www.nbphe.org
Outline
• Overview
• Terminology
• Study design
• Causation and validity
• Screening
Populations
Group of people with a common characteristic
– Place of residence, age, gender, religion
– People who live in Omaha, Nebraska in January, 2014
– Occurrence of a life event (undergoing cancer
treatment, giving birth)
Populations
• Membership can be permanent or transient
– Population with permanent membership is
referred to as “Fixed” or “Closed”
• People present at Hiroshima
• Passengers on an airplane
– Population with transient membership is referred
to as “Dynamic” or “Open”
• Population of Omaha
Measures of Morbidity
Measures of Frequency
“Count” - the most basic epidemiologic measure
– Expressed as integers (1, 2, 3, …)
– Answers the question, “How many people have this
disease?”
– Often the numerator of many measures
– Important to distinguish between incident (new) and
prevalent (existing) cases
Forecast of cancer deaths
Ratio
• One number (x) divided by another (y): x / y
• Range: zero (0) to infinity (∞)
• (x) and (y) may be related or completely
independent
• Sex of children attending a clinic: females / males
• Females among all children attending a clinic: females / all
Which of the following terms is expressed as a ratio
(as distinguished from a proportion)?
(A) Male Births / (Male + Female Births)
(B) Female Births / (Male + Female Births)
(C) Male Births / Female Births
(D) Stillbirths / (Male + Female Births)
Proportion
• Ratio in which the numerator (x) is included in the
denominator (x+y): x / (x + y)
• Range: zero (0) to one (1)
• Often expressed as a percentage (e.g., among all children who
attended a clinic, what proportion was female?): females / all
Rate
• Can be expressed as (a/T) where (a) = cases and (T)
involves a component of time
• Range: zero (0) to infinity (∞)
• Measures speed at which things happen
• Rate of at-risk females coming to a clinic (per unit time):
females (at-risk) / [females (all) * time]
Prevalence
• Proportion
• Not a rate – no time component in the calculation
• Measures proportion of existing disease in the
population at a given time
• “Snapshot”
• Dimensionless, positive number (0 to 1)
Prevalence proportion
\[ \text{Prevalence} = \frac{A}{N} = \frac{A}{A + B} \]
Where:
A = number of existing cases
B = number of non-cases
N = total population
Incidence
• Measures the occurrence of new cases in a population at
risk over time
• Can be measured as a proportion or a rate
• The most fundamental epidemiologic indicator
• Measures force of morbidity (as a rate)
• Measures conversion of health status (proportion /rate)
Incidence proportion
• Synonyms: incidence, cumulative incidence,
risk
• Measures probability (risk) of developing
disease during period of observation
• Dimensionless, positive number (0 to 1)
Incidence Proportion
\[ IP = \frac{a}{N} \]
Where:
a = number of new onset cases (events)
N = population-at-risk at beginning of follow-up
Incidence Proportion
• Appropriate for fixed (closed) populations and
short follow-up
• Must specify time period of observation
because risk changes with time
• Not appropriate for long-term follow-up due
to potential loss of subjects
• Assuming: complete follow-up, same risk over
time
Follow 2000 newborns at monthly intervals to
measure development of respiratory infection in the
first year
• Suppose 50 infants develop respiratory infection in
first year of life
\[ IP = \frac{50}{2000} = 0.025 = 2.5\% \]
• The risk (probability) of developing a respiratory
infection in the first year of life is ~ 2.5%
• 25 of 1000 infants in this population or 1 in 40 will
develop infection in the first year of life.
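The newborn example can be checked with a few lines of Python. This is only a minimal sketch; the function name `incidence_proportion` is chosen here for illustration and does not come from the slides.

```python
def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """Incidence proportion (risk): new cases / population at risk at baseline."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return new_cases / population_at_risk


# Numbers from the newborn example: 50 new cases among 2000 infants.
ip = incidence_proportion(50, 2000)
print(f"IP = {ip:.3f} = {ip:.1%}")  # IP = 0.025 = 2.5%
```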
Incidence Rate
• Measures how rapidly new cases develop
during specified time period
• Cases per person-time
• Synonyms: incidence, incidence density, rate
• Follow-up may be incomplete
• Risk period not the same for all subjects
Incidence Rate
\[ IR = \frac{a}{T} \]
Where:
a = number of new onset cases
T = person-time at risk during study period
(follow-up)
Person-time
• Accounts for all the time each person is in the
population at risk
• The length of time for each person is called
person-time
• Sum of person-times is called the total person-time at risk for the population
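Person-time
[Figure: follow-up timelines for five subjects, one of whom died during follow-up; the subjects contribute 1, 5, 5, 3, and 2 person-years]
T = person-time at risk during study period
= 1+5+5+3+2 = 16 person-years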
Person-time Assumption
• 100 persons followed 10 years = 1000 person years
• 1000 persons followed for 1 year = 1000 person years
Follow 2000 newborns at monthly intervals to
measure development of respiratory infection in the
first year
50 infants develop respiratory infection
1900 complete the first year disease free
25 complete 3 months (0.25 years) before infection
25 complete 6 months (0.5 years) before infection
Calculate incidence rate:
\[ IR = \frac{a}{T} = \frac{50}{(1900 \times 1) + (25 \times 0.25) + (25 \times 0.5)} = \frac{50}{1918.75} \]
= 2.6 per 100 person-years
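The person-time calculation above can be sketched in Python as follows. The helper name `incidence_rate` and the (count, years) grouping are illustrative assumptions, not part of the slides; the numbers are those of the newborn example.

```python
def incidence_rate(new_cases, person_time_groups):
    """Incidence rate = new cases / total person-time at risk.

    person_time_groups is a list of (number_of_people, years_each) pairs.
    """
    total_person_time = sum(n * years for n, years in person_time_groups)
    return new_cases / total_person_time


# 1900 infants contribute a full year; 25 contribute 0.25 years
# and 25 contribute 0.5 years before infection.
groups = [(1900, 1.0), (25, 0.25), (25, 0.5)]
ir = incidence_rate(50, groups)
print(f"IR = {ir * 100:.1f} per 100 person-years")  # IR = 2.6 per 100 person-years
```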
Incidence, Prevalence, Duration
• Prevalence increases as new cases added to the
existing cases (i.e., incidence)
• Prevalence decreases as people are cured or die
• Prevalence = Incidence * Duration
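As a rough numerical illustration of the relation above, the sketch below uses hypothetical values (an incidence rate of 2 per 1000 person-years and a mean duration of 5 years). It assumes a steady-state population, in which the exact relation is prevalence odds = incidence × duration; the simple product is a close approximation when prevalence is low.

```python
def steady_state_prevalence(incidence_rate: float, mean_duration: float) -> float:
    """Steady-state relation: P / (1 - P) = incidence rate * mean duration."""
    prevalence_odds = incidence_rate * mean_duration
    return prevalence_odds / (1 + prevalence_odds)


incidence = 0.002   # hypothetical: 2 new cases per 1000 person-years
duration = 5.0      # hypothetical: mean disease duration in years

approx = incidence * duration                         # simple product
exact = steady_state_prevalence(incidence, duration)  # via prevalence odds
print(f"approximate prevalence = {approx:.4f}, steady-state prevalence = {exact:.4f}")
```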
Measures of Mortality
Mortality
• Measures the occurrence of death
• Can be measured as a proportion or a rate
• Can measure disease severity or
effectiveness of treatment
Mortality Rate
• Measures rate of death in the population over
a specified amount of time
• Positive number (0 to ∞)
• Can be a measure of incidence rate (risk)
when disease is severe and fatal, e.g.
pancreatic cancer
• Synonym: fatality rate
Mortality Rate
\[ MR = \frac{d}{N \times T} \]
Where:
d = number of deaths
N = total population at mid-point of time period
T = follow-up time (usually one year)
Cancer Death Rates*, for Men, US, 1930-2003
*Age-adjusted to the 2000 US standard population.
Source: US Mortality Public Use Data Tapes 1960-1999, US Mortality Volumes 1930-1959,
National Center for Health Statistics, Centers for Disease Control and Prevention, 2002.
Case Fatality Rate
• This is not a rate, this is a proportion
• Proportion of deaths from a specific illness
Case Fatality Rate
\[ \text{Case fatality rate} = \frac{a}{N} \]
Where:
a = Number of deaths from an illness
N = Number of people with that illness
What percentage of people diagnosed as having a
disease die within a certain time after diagnosis?
Case-fatality rate
• Case-fatality – a measure of the severity of the
disease
• Case-fatality – can be used to measure benefits of a
new therapy
– As therapy improves - the case-fatality rate would be
expected to decline
– e.g. AIDS deaths with the invention of ARVs
Proportionate Mortality
• Of all deaths, the proportion caused by a certain
disease
• Can determine the leading causes of death
• Proportion of cause-specific death is dependent on all
other causes of death
• This does not tell us the risk of dying from a disease
Proportionate mortality from cardiovascular disease in the U.S. in 2013
= (# of U.S. deaths from cardiovascular diseases in 2013 / Total deaths in the U.S. for 2013) x 1,000
Which measure of mortality would you calculate to
determine the proportion of all deaths that is
caused by heart disease?
(A) Case fatality
(B) Cause-specific mortality rate
(C) Crude mortality rate
(D) Proportionate mortality ratio
(E) Potential years of life lost
Other Mortality Rates
• Crude Mortality Rate
– Includes all deaths, total population, in a time period
• Cause-Specific Mortality Rate
– Includes deaths from a specific cause, total
population, in a time period
• Age-Specific Mortality Rate
– Includes all deaths in specific age group, population in
the specific age group, in a time period
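To keep the mortality measures apart, the sketch below computes four of them from a single set of counts. All counts are hypothetical and chosen only to make the arithmetic easy to follow; "disease X" is a placeholder.

```python
# Hypothetical one-year counts for a single population.
population = 100_000   # mid-year population
total_deaths = 900     # deaths from all causes
deaths_from_x = 90     # deaths from disease X
cases_of_x = 300       # people with disease X

crude_rate = total_deaths / population * 1000            # per 1000 per year
cause_specific_rate = deaths_from_x / population * 1000  # per 1000 per year
proportionate_mortality = deaths_from_x / total_deaths   # share of all deaths
case_fatality = deaths_from_x / cases_of_x               # share of cases who die

print(f"Crude mortality rate:          {crude_rate:.1f} per 1000 per year")
print(f"Cause-specific mortality rate: {cause_specific_rate:.1f} per 1000 per year")
print(f"Proportionate mortality:       {proportionate_mortality:.0%}")
print(f"Case fatality:                 {case_fatality:.0%}")
```

The same 90 deaths yield a 10% proportionate mortality but a 30% case fatality, because the denominators differ (all deaths versus all cases).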
Age-Specific Mortality Rates
Mortality Rates (Year = 2000)
Panama
• Population = 2,899,513
• Deaths = 13,483
• Mortality Rate = 4.65 per 1000 per year
Sweden
• Population = 8,923,569
• Deaths = 93,430
• Mortality Rate = 10.47 per 1000 per year
Why do you think Sweden has a mortality rate more than 2x as high?
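The two crude rates can be reproduced directly from the population and death counts on the slide. The function name `crude_mortality_rate` is an illustrative choice.

```python
def crude_mortality_rate(deaths: int, population: int, per: int = 1000) -> float:
    """Crude mortality rate for one year, expressed per `per` people."""
    return deaths / population * per


panama = crude_mortality_rate(13_483, 2_899_513)
sweden = crude_mortality_rate(93_430, 8_923_569)
print(f"Panama: {panama:.2f} per 1000 per year")  # Panama: 4.65 per 1000 per year
print(f"Sweden: {sweden:.2f} per 1000 per year")  # Sweden: 10.47 per 1000 per year
print(f"Ratio (Sweden / Panama): {sweden / panama:.2f}")
```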
Differences in Mortality Rates
Crude mortality rates do not take into account differences
between populations such as age
Can we remove this confounding by age?
• Separate (stratify) the population into age groups and
calculate rates for each age
– Compare age-specific mortality rates
• If two different populations, adjust (standardize) the
mortality rates of the two populations, taking into
account the age structures
– Results in comparable rates between populations
or in the same population over time
Direct Standardization
• If the age composition of the populations were the
same, would there be any differences in mortality
rates?
• Direct age adjustment is used to remove the effects
of age structure on mortality rates in two different
populations
• Apply actual age-specific rates to a standard
population (US population 2000)
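A minimal sketch of direct standardization follows. The age groups, age-specific rates, and standard population are all hypothetical (they are not the US 2000 standard); the point is only that each population's own age-specific rates are weighted by the same standard age structure.

```python
def directly_standardized_rate(age_specific_rates, standard_population):
    """Weight each age-specific rate by the standard population's age structure."""
    total_standard = sum(standard_population.values())
    expected_deaths = sum(
        age_specific_rates[group] * standard_population[group]
        for group in standard_population
    )
    return expected_deaths / total_standard


# Hypothetical standard population and age-specific death rates (per person-year).
standard = {"0-39": 600, "40-64": 300, "65+": 100}
rates_a = {"0-39": 0.001, "40-64": 0.005, "65+": 0.040}
rates_b = {"0-39": 0.002, "40-64": 0.006, "65+": 0.050}

adj_a = directly_standardized_rate(rates_a, standard)
adj_b = directly_standardized_rate(rates_b, standard)
print(f"Age-adjusted rate A: {adj_a * 1000:.1f} per 1000")
print(f"Age-adjusted rate B: {adj_b * 1000:.1f} per 1000")
```

Because both adjusted rates use the same weights, any remaining difference between them is not explained by age structure.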
Indirect Standardization
• When age-specific rates are not available for the study population – use age-specific mortality rates from the general population to
calculate expected number of deaths
Standardized mortality ratio (SMR)
= observed deaths / expected deaths
• If the age composition of the populations were the
same, would there be any differences in mortality
rates?
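Indirect standardization can be sketched the same way. The reference rates, study population counts, and observed deaths below are hypothetical; expected deaths are obtained by applying the general population's age-specific rates to the study population's age structure.

```python
def standardized_mortality_ratio(observed_deaths, reference_rates, study_population):
    """SMR = observed deaths / expected deaths under the reference rates."""
    expected_deaths = sum(
        reference_rates[group] * study_population[group]
        for group in study_population
    )
    return observed_deaths / expected_deaths


# Hypothetical inputs.
reference_rates = {"<50": 0.002, "50+": 0.010}  # general population, per person-year
study_population = {"<50": 5000, "50+": 2000}   # people in the study group
observed = 45                                    # deaths observed in the study group

smr = standardized_mortality_ratio(observed, reference_rates, study_population)
print(f"SMR = {smr:.2f}")  # expected deaths = 10 + 20 = 30, so SMR = 1.50
```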
Study Design
• Experimental studies (Clinical Trial, Randomized
Controlled Trial)
• Observational studies
– Cohort
– Case-control
– Cross-sectional
– Ecological
Experimental studies are characterized by:
• The population under study: who is eligible for study
entry?
• The intervention(s) being used or compared: what
treatment(s) are being used? (Therapeutic (e.g., drug)
or preventive (e.g., education))
• The method of treatment assignment: how are subjects
assigned to intervention(s)?
• The outcomes of interest: how will success be
measured?
Randomized Controlled Trials
• A randomized controlled trial is a type of
experimental research design for comparing different
treatments, in which the assignment of treatments
to patients is made by a random mechanism.
• Customary to present table of patient characteristics
to show that the randomization resulted in a balance
in patient characteristics.
Randomized Controlled Trials
Steps in carrying out a clinical trial
1. Select a sample from the population
2. Ethical considerations
3. Measure baseline variables
4. Randomize
5. Apply interventions
6. Follow up the cohorts
7. Measure outcome variables (blindly, if possible)
Use of “Blinding”
• Important when knowing treatment could influence
the interpretation of results
• Especially important when outcomes are subjective
(pain, functional status) and/or when placebo is
employed (either alone or to mask actual treatment)
• Placebo: ensures control and treatment groups have
the same “experience”
• May not be necessary if the outcome is an objective
measure (death, blood glucose)
• Blinding participants to which
treatment was used helps avoid bias
– Single-blind: patient does not know which
treatment they are receiving
– Double-blind: neither patient nor investigator
knows which treatment is being received (cannot be used for some
treatments, e.g., surgery)
It All Comes Down to…
• Obtaining groups that are comparable for
everything except the treatment…
• So that differences in outcome can fairly be
ascribed only to the difference between the
groups (i.e., to the treatment).
Cohort Studies
• Definition: groups, defined on the basis of some
characteristics (often exposure and non-exposure) are
typically prospectively followed to see whether an outcome
of interest occurs (may also be retrospective)
• Comparison of interest: Compare the proportion of persons
with the disease in the exposed group to the proportion
with the disease in the unexposed group.
• Motivation: If the exposure is associated with the disease,
we expect that the proportion of persons with the disease
in the exposed group will be greater than the proportion
with disease in the unexposed group.
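Cohort Studies
[Figure: timelines for prospective and retrospective cohort studies. In both, exposed and not-exposed groups are followed to diseased / not diseased. Prospective: exposure is classified now and outcomes occur in the future. Retrospective: exposure occurred in the past and outcomes are assessed now.]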
Prospective cohort studies
• Define sample free of the disease/outcome of
interest, measure the exposure and classify to
exposed vs unexposed at time zero, then follow up
at fixed time point to ascertain outcome
• Measure the ratio of outcome between the
exposed and unexposed (Relative Risk)
Retrospective cohort studies
• Synonyms: historical cohort study, historical
prospective study, non-concurrent prospective
study
• Retrospective cohort studies are not designed a priori –
the question is always asked in retrospect
• Exposures and outcomes have already occurred; data on the relevant exposures and outcomes must
have been collected
Cohort study strengths
• May be used to define incidence / natural history
• Known temporal sequence
• Efficient in investigating rare exposures
• Permits study of multiple exposures AND outcomes
• Fewer biases or bias can be limited or evaluated
– Homogeneous sample population
– Accurate measurement of important variables
– No bias in ascertainment of outcome
Cohort study limitations
• Expensive and inefficient – especially for rare diseases
or outcomes (large sample size, long-term follow up)
• Associations may be due to confounding
– Can adjust statistically for any measured potential
confounders
• Must exclude subjects with outcome present at onset
• Disease with long pre-clinical phase may not be
detected
• Sensitive to follow-up bias (loss of diseased subjects)
Case-control Studies
• Definition: compare various characteristics (past
exposure) for cases (subjects with disease) to those of
controls (subjects without the disease)
• Comparison of interest: Compare the proportion with
the exposure in the cases to the proportion with the
exposure in the control group.
• Motivation: If the exposure is associated with the
disease, we expect that the proportion of persons with
the exposure in the cases will be greater than the
proportion with the exposure in the control group.
Case-control Studies
[Figure: cases with disease and controls without disease are each classified, looking back in time, as exposed in past or not exposed in past]
Case-Control Studies
• Efficient for rare diseases
• Efficient for diseases with long latency
• Can evaluate multiple exposures
• Lower costs of exposure assessment relative to
cohort studies
• Incident cases preferable for causal research
• Challenges of control selection
• Challenges of retrospective exposure assessment
Case-control studies are among the best
observational designs to study diseases of:
(A) High prevalence
(B) High validity
(C) Low case fatality
(D) Low prevalence
Example
• Is smoking associated with coronary heart disease (CHD)?
• 3000 smokers and 5000 nonsmokers were followed to see if
they developed CHD

Exposure      Developed CHD   No CHD   Total
Smoker                   84     2916    3000
Non-smoker               87     4913    5000
Total                   171     7829    8000

• Case control or cohort study?
Cross-Sectional Studies
• Prevalence studies
• All measurements of exposure and outcome are made
simultaneously (snapshot)
• Disease proportions are determined and compared among
those with or without the exposure or at varying level of
the exposure
• Examine association – determination of associations with
outcomes; generates hypotheses that are the basis for
further studies
• Most appropriate for studying the associations between
chronic diseases and chronic exposure
• Sometimes useful for common acute diseases of short
duration
Cross-Sectional Studies
[Figure: at a single time point (T0), data on exposure (cause) and disease (effect / outcome) are gathered from a defined population, giving four groups: exposed with disease, exposed with no disease, not exposed with disease, not exposed with no disease]
Ecological
• The unit of observation is the population or
community
• Disease rates and exposures are measured in each of
a series of populations
• Disease and exposure information may be abstracted
from published statistics and therefore does not
require expensive or time consuming data collection
Study Design

Category      Type            Characteristics
Analytic      Experimental    Investigates prevention and treatment
Analytic      Observational   Investigates causes, prevention and treatment
Descriptive

Type            Subtype           Characteristics
Observational   Cohort            Investigates health effects of exposure
Observational   Cross-Sectional   Examines exposure-disease associations at a point in time
Observational   Case-Control      Investigates risk factors for disease
Observational   Ecological        Examines exposure-disease association at the population level
Descriptive                       Describes health of populations
Probabilities
• Denote the probability of an event by p, where p
ranges from 0 to 1.
• Notation:
p = probability that event occurred
1-p = probability that event did not occur
Example
• What is the probability of CHD?
• 171/8000 = 0.02

Exposure      Developed CHD   No CHD   Total
Smoker                   84     2916    3000
Non-smoker               87     4913    5000
Total                   171     7829    8000
Relative Risk
• RR = incidence among exposed / incidence among unexposed
• Approximates how much more likely it is for the outcome
to be present among a certain group of subjects than
another group
• RR = 1 implies that the risk is the same in the two groups
• RR < 1 implies that the risk is higher in the unexposed
• RR > 1 implies that the risk is higher in the exposed
Example
Sleeping Position and Crib Death

Usual sleeping position   Crib death: YES     NO   TOTAL
Prone                                   9    837     846
Other                                   6   1755    1761
Total                                  15   2592    2607

1-year cumulative incidence prone = 9/846 = 10.64 per 1000
1-year cumulative incidence other = 6/1761 = 3.41 per 1000
Risk Ratio = (10.64 per 1000) / (3.41 per 1000) = 3.1
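The risk ratio in the crib death example can be reproduced with a short Python sketch. The function name and argument names are illustrative choices; the counts come from the table.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk = cumulative incidence in exposed / in unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Prone sleepers: 9 deaths among 846; other positions: 6 deaths among 1761.
rr = risk_ratio(9, 846, 6, 1761)
print(f"Risk ratio = {rr:.1f}")  # Risk ratio = 3.1
```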
Odds Ratios
• Relative risk requires an estimate of the incidence
of the disease
• For case control studies, we do not know the
incidence of disease because we determine the
number of cases and controls when the study is
designed
• For case control studies, use the odds ratio (OR)
Odds
• Odds are another way of representing a
probability
• The odds is the ratio of probability that the
event of interest occurs to the probability that
it does not.
• The odds are often estimated by the ratio of
the number of times that the event occurs to
the number of times that it does not.
Odds Ratio
\[ \text{odds} = \frac{p}{1 - p} \]
\[ \text{odds ratio} = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)} = \frac{p_1 (1 - p_2)}{(1 - p_1)\, p_2} \]
Odds Ratio Example

              CHD Cases   Controls   Total
Smokers             112        176     288
Nonsmokers           88        224     312
Total               200        400     600

• Case control study of 200 CHD cases and 400 controls to
examine association of smoking with CHD
(Note: now we are examining the probability of exposure)
• What is the probability of smoking among CHD cases?
p = 112/200 = 0.56
• What are the odds of smoking among CHD cases?
p/(1-p) = 0.56/0.44 = 1.27
Odds Ratio Example

              CHD Cases   Controls   Total
Smokers             112        176     288
Nonsmokers           88        224     312
Total               200        400     600

• What is the probability of smoking among controls?
p = 176/400 = 0.44
• What are the odds of smoking among controls?
p/(1-p) = 0.44/0.56 = 0.79
• The odds ratio is 1.27 / 0.79 ≈ 1.62 (computed from the unrounded odds)
• Interpretation: The odds of smoking are 1.62 times as high
for CHD cases as for controls
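The odds ratio from the smoking and CHD case-control table can be verified as follows. This is a small sketch with illustrative function names; it also confirms that the ratio of exposure odds equals the cross-product of the table cells.

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1 - p)


def exposure_odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    p_cases = exposed_cases / total_cases
    p_controls = exposed_controls / total_controls
    return odds(p_cases) / odds(p_controls)


# 112 of 200 CHD cases smoked; 176 of 400 controls smoked.
or_chd = exposure_odds_ratio(112, 200, 176, 400)
cross_product = (112 * 224) / (88 * 176)
print(f"Odds ratio = {or_chd:.2f}")  # Odds ratio = 1.62
```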
Odds Ratio vs. Relative Risk

              Disease   No Disease   Total
Exposed       a         b            a+b
Not Exposed   c         d            c+d
Total         a+c       b+d          a+b+c+d

\[ \text{Relative risk} = \frac{\text{incidence in exposed}}{\text{incidence in unexposed}} = \frac{a/(a+b)}{c/(c+d)} \]
\[ \text{Odds ratio} = \frac{\frac{a}{a+c} \big/ \frac{c}{a+c}}{\frac{b}{b+d} \big/ \frac{d}{b+d}} = \frac{a/c}{b/d} = \frac{ad}{bc} \]
Odds ratio
• Odds ratio = odds of exposure in cases / odds of exposure in controls
OR=1 exposure is not associated with the disease
OR>1 exposure is positively associated with the disease
OR<1 exposure is negatively associated with the
disease
Odds Ratio vs. Relative Risk
• Both compare the likelihood of an event between two
groups
• OR compares the relative odds of an event in each group
• RR compares the probability of an event in each group
– More ‘natural’ interpretation because risk measured in terms of
probability
– Cannot always be computed
– Can lead to ambiguous interpretations
A case-control study comparing ovarian cancer cases with
community controls found an odds ratio of 2.0 in relation to
exposure to radiation. Which is the correct interpretation of the
measure of association?
(A) Women exposed to radiation had 2.0 times the risk of ovarian
cancer when compared to women not exposed to radiation
(B) Women exposed to radiation had 2.0 times the risk of ovarian
cancer when compared to women without ovarian cancer
(C) Ovarian cancer cases had 2.0 times the odds of exposure to
radiation when compared to controls
(D) Ovarian cancer cases had 2.0 times the odds of exposure to
radiation when compared to women with other cancers
Odds Ratios vs. Relative Risks
• In general, odds ratios summarize associations from case-control
studies and cross-sectional studies, and relative risks can be used to
summarize associations in cohort studies.
• Odds ratio can be used to estimate the relative risk in a case
control study when:
1. Cases are representative of people with the disease in the population
with respect to history of exposure AND
2. The controls are representative of people without the disease in the
population with respect to history of exposure AND
3. The disease is rare
Odds ratio estimates relative risk when disease is rare
• When the disease is rare, the number of people with the disease (a
and c) is small so that a+b≈b and c+d≈d
\[ RR = \frac{a/(a+b)}{c/(c+d)} \approx \frac{a/b}{c/d} = \frac{ad}{bc} = OR \]
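The rare-disease approximation can be seen numerically. The sketch below uses the smoking and CHD cohort table from earlier in this review (about 2% of subjects developed CHD) and contrasts it with a hypothetical table in which the disease is common; the second table's counts are invented for illustration.

```python
def risk_ratio(a, b, c, d):
    """RR from a 2x2 table: a, b = exposed with/without disease; c, d = unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """OR from the same 2x2 table, using the cross-product ad / bc."""
    return (a * d) / (b * c)


# Rare outcome: smoking and CHD cohort table (84, 2916, 87, 4913).
rare = (84, 2916, 87, 4913)
# Common outcome: hypothetical counts, 40% vs 20% risk.
common = (400, 600, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

With the rare outcome the two measures nearly coincide, while with the common outcome the odds ratio is noticeably farther from 1 than the relative risk.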
Odds Ratios for matched case control
studies
• Often, cases are matched with a control based on age, sex,
etc.
• For a matched study, describe the results for each pair
• Concordant pairs: both case and control exposed or both not
exposed
• Discordant pairs: Case exposed/control unexposed or case
unexposed/control exposed
Odds Ratios for matched case control
studies

                    Controls: Exposed   Controls: Unexposed
Cases: Exposed      a                   b
Cases: Unexposed    c                   d

OR is based on the discordant pairs:
OR = b/c
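A short sketch of the matched-pair calculation follows. The pair counts are hypothetical; each pair is recorded as (case exposed?, control exposed?), and only discordant pairs enter the odds ratio.

```python
def matched_pairs_odds_ratio(pairs):
    """OR = b / c, where b = case exposed / control unexposed pairs
    and c = case unexposed / control exposed pairs."""
    b = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)
    c = sum(1 for case_exp, ctrl_exp in pairs if not case_exp and ctrl_exp)
    if c == 0:
        raise ValueError("no pairs with case unexposed and control exposed")
    return b / c


# Hypothetical study of 100 matched pairs.
pairs = (
    [(True, True)] * 20      # concordant: both exposed
    + [(True, False)] * 30   # discordant: case exposed only (b)
    + [(False, True)] * 10   # discordant: control exposed only (c)
    + [(False, False)] * 40  # concordant: neither exposed
)
print(f"Matched OR = {matched_pairs_odds_ratio(pairs):.1f}")  # Matched OR = 3.0
```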
Cohort study is to risk ratio as:
(A) Ecologic fallacy is to cross-sectional study
(B) Case-control study is to odds ratio
(C) Genetics is to environment
(D) Rate ratio is to ecologic study
Measures of Effect and Association
Risk
• RR and OR measure strength of the
association
• How much of the disease can be attributed to
the exposure? How much of the CHD risk
experienced by smokers can be attributed to
smoking?
• OR and RR do not address this.
Measures of Association
• Contrast measure of occurrence in two
populations
– Cancer incidence rates in males and females in
Canada
– Incidence rate of dental caries in children within a
community before and after fluoridation
– Both of these are measures of association
Absolute Measures
Causal rate difference
Incidence rate difference
Incidence density difference
Rate difference
Attributable rate
Causal risk difference
Incidence proportion difference
Cumulative incidence difference
Risk difference
Excess risk
Attributable risk
Risk Difference
• Most often referred to as “attributable risk”
– Refers to the amount of risk attributable to the
exposure of interest
– For example, in the birth cohort analysis, where
exposure = prenatal care in the first 5 months
RD = R1 – R0 = Excess risk of preterm birth attributable to prenatal care
Absolute Excess Measures
[Figure: bar chart of incidence proportion (or rate) in the unexposed and the exposed. The unexposed level is the background risk (incidence not due to exposure); the part of the exposed bar above that level is the excess risk (or rate) in the exposed (incidence due to exposure)]
If E is thought to cause D: Among persons exposed to E,
what amount of the incidence of D is E responsible for?
Example
Sleeping Position and Crib Death

Usual sleeping position   Crib death: YES     NO   TOTAL
Prone                                   9    837     846
Other                                   6   1755    1761
Total                                  15   2592    2607

1-year cumulative incidence prone = 9/846 = 10.64 per 1000
1-year cumulative incidence other = 6/1761 = 3.41 per 1000
Risk difference = 10.64 per 1000 – 3.41 per 1000 = 7.23 per 1000
Added risk due to exposure
Attributable Risk Percent
\[ \%AR_E = \frac{IP_1 - IP_0}{IP_1} \times 100 \]
(Risk difference / Risk in Exposed) x 100
What proportion of occurrence of disease in exposed persons
is due to the exposure?
Example
Sleeping Position and Crib Death

Usual sleeping position   Crib death: YES     NO   TOTAL
Prone                                   9    837     846
Other                                   6   1755    1761
Total                                  15   2592    2607

1-year cumulative incidence prone = 9/846 = 10.64 per 1000
1-year cumulative incidence other = 6/1761 = 3.41 per 1000
Risk difference = 10.64 per 1000 – 3.41 per 1000 = 7.23 per 1000
Attributable risk percent = [(10.64 per 1000 – 3.41 per 1000) / (10.64 per 1000)] x 100 = 68.0%
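The risk difference and attributable risk percent for the crib death data can be checked with the sketch below; the function names are illustrative.

```python
def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Excess risk in the exposed: IP1 - IP0."""
    return risk_exposed - risk_unexposed


def attributable_risk_percent(risk_exposed: float, risk_unexposed: float) -> float:
    """Share of the risk in the exposed that is due to exposure, in percent."""
    return (risk_exposed - risk_unexposed) / risk_exposed * 100


risk_prone = 9 / 846    # 1-year cumulative incidence, prone
risk_other = 6 / 1761   # 1-year cumulative incidence, other positions

rd = risk_difference(risk_prone, risk_other)
ar_pct = attributable_risk_percent(risk_prone, risk_other)
print(f"Risk difference = {rd * 1000:.2f} per 1000")  # Risk difference = 7.23 per 1000
print(f"Attributable risk percent = {ar_pct:.1f}%")   # Attributable risk percent = 68.0%
```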
Population Attributable Risk
\[ PAR = IP_T - IP_0 \]
What is the excess risk in the population
caused by exposure E?
Population Attributable Risk Percent
\[ PAR\% = \frac{IP_T - IP_0}{IP_T} \times 100 \]
What proportion of occurrence of disease in the population is
due to the exposure?
Population Attributable Risk
[Figure: bar chart of incidence proportion (or rate) for the unexposed, the exposed, and the total population]
Should resources be allocated to controlling E or, instead, to
exposures causing greater health problems in the population?
Example
Sleeping Position and Crib Death

Usual sleeping position   Crib death: YES     NO   TOTAL
Prone                                   9    837     846
Other                                   6   1755    1761
Total                                  15   2592    2607

1-year cumulative incidence total = 15/2607 = 5.75 per 1000
1-year cumulative incidence other = 6/1761 = 3.41 per 1000
Population attributable risk (PAR) = 5.75 per 1000 – 3.41 per 1000
= 2.35 per 1000 (computed from the unrounded incidences)
Example
Sleeping Position and Crib Death

Usual sleeping position   Crib death: YES     NO   TOTAL
Prone                                   9    837     846
Other                                   6   1755    1761
Total                                  15   2592    2607

1-year cumulative incidence total = 15/2607 = 5.75 per 1000
1-year cumulative incidence other = 6/1761 = 3.41 per 1000
Population attributable risk percent (PAR%)
= [(5.75 per 1000 – 3.41 per 1000) / (5.75 per 1000)] x 100 = 40.8% (computed from the unrounded incidences)
Population Attributable Risk Percent
\[ PAR\% = \frac{P_E (RR - 1)}{P_E (RR - 1) + 1} \times 100 \]
Affected by the prevalence of exposure in the
population (P_E) and the relative risk (RR)
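The two expressions for PAR% should agree when applied to the same data. The sketch below checks this on the crib death table, taking the prevalence of exposure as the share of infants who usually slept prone (846 of 2607); function names are illustrative.

```python
def par_percent_direct(risk_total: float, risk_unexposed: float) -> float:
    """PAR% = (IP_T - IP_0) / IP_T x 100."""
    return (risk_total - risk_unexposed) / risk_total * 100


def par_percent_from_rr(prevalence_exposure: float, rr: float) -> float:
    """PAR% = P_E (RR - 1) / (P_E (RR - 1) + 1) x 100."""
    excess = prevalence_exposure * (rr - 1)
    return excess / (excess + 1) * 100


risk_total = 15 / 2607   # incidence in the whole population
risk_prone = 9 / 846     # incidence in the exposed
risk_other = 6 / 1761    # incidence in the unexposed
p_exposed = 846 / 2607   # prevalence of exposure
rr = risk_prone / risk_other

direct = par_percent_direct(risk_total, risk_other)
via_rr = par_percent_from_rr(p_exposed, rr)
print(f"PAR% (direct) = {direct:.1f}%")   # PAR% (direct) = 40.8%
print(f"PAR% (via RR) = {via_rr:.1f}%")   # PAR% (via RR) = 40.8%
```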
Absolute Measures

Measure: Risk difference (attributable risk to the exposed)
Abbrev.: RD, AR
Formula: I1 – I0
Helps answer the question: If E is thought to cause D: Among
persons exposed to E, what amount of the incidence of D is E
responsible for?

Measure: Attributable risk percent
Abbrev.: AR%
Formula: [(I1 – I0)/I1] X 100
Helps answer the question: What proportion of occurrence of
disease in exposed persons was due to the exposure?

Measure: Attributable risk to the population
Abbrev.: PAR
Formula: IT – I0
Helps answer the question: Should resources be allocated to
controlling E or, instead, to exposures causing greater health
problems in the population?

Measure: Attributable risk to the population (%)
Abbrev.: PAR%
Formula: [(IT – I0)/IT] X 100
Helps answer the question: What portion of D in the population
is caused by E? Should resources allocated for D be directed toward
etiologic research or E?
Summary of Measures
• Absolute measures address questions about
public health impact of an exposure
– Excess risk in the exposed or population
attributable to the exposure
• Relative measures address questions about
etiology and relations between exposure and
outcome
– Relative difference in risk between exposed and
unexposed populations
Causal Inference
Definition of a cause
• “That which produces an effect, result or
consequence or the one such as a person,
event or condition that is responsible for an
action or result” (American Heritage Dictionary)
• Implies reason and occasion
Key Characteristics of a Cause
1. Essential attributes: association, time order and
direction
2. Causes include:
– host and environmental factors
– active agents and static conditions
3. Causes may be either positive (presence
induces disease) or negative (absence induces
disease)
The Epidemiologic Triad
HOST
AGENT
ENVIRONMENT
Factors involved in the Natural History of Disease
Agent
Vector
Host
Environment
Risk factors vs. causes
• Risk factors often used in epidemiology instead
of causes
• A cautious way of making causal inference
• Risk factors are not direct causes of disease
• Serve to identify proximate causes
Causal Inference
• During the 1950s-1960s epidemiologists developed a set of
postulates for causal inferences regarding non-infectious
diseases of unknown etiology
• Response to the discovery of association between smoking and
lung cancer
• Debates by many epidemiologists yielded 5 criteria in the 1964
Report of the Advisory Committee to the US Surgeon General on
Smoking and Health
• Sir Austin Bradford Hill came up with the best known criteria or guidelines
in 1965
• In 1976 Rothman presented a view of causation now known as
the “Sufficient-Component Theory of Causation”
Hill Criteria
1. Strength of Association
2. Consistency
3. Specificity of the Association
4. Temporal relationship
5. Biological gradient
6. Biologic plausibility
7. Coherence
8. Experiment
9. Analogy
Sufficient-Component Cause Model
• Sufficient cause is a complete causal mechanism that
inevitably produces disease
• Sufficient cause is not a single factor but rather a minimal
set of factors that inevitably produce disease
– Sufficient cause for AIDS may include: exposure - HIV
infection, susceptibility, lack of preventive exposures-absence
of ARVs
• Each participating factor in a sufficient cause is termed a
component cause
Disease Causation – 2 components
• Sufficient Cause
– precedes the disease
– if the cause is present, the disease always occurs
• Necessary Cause
– precedes the disease
– if the cause is absent, the disease cannot occur
Disease causation:
Types of causal relationships
1. Necessary and sufficient: Without that factor, the disease
never develops, and in the presence of that factor, the
disease always develops
2. Necessary but not sufficient: Without that factor, the
disease never develops, but other factors are needed as well
3. Sufficient but not necessary: The factor can produce the
disease, but so can other factors
4. Neither sufficient nor necessary: The factor itself cannot
cause the disease but plays a role; multiple factors interact
to cause the disease
Sufficient-Component Cause Model - attributes
1. Blocking the action of a single component stops the
completion of the sufficient cause, thus prevents the
disease from occurring by that pathway
2. Completion of a sufficient cause is synonymous with
the biologic onset of disease
3. Some component causes may be distant causes and others
may be proximate causes
From Study to Causation
• Associations between ‘exposures’ and outcomes identified in
observational studies may or may not be ‘causal’
• There is need to pay attention to valid assessment of exposure
and outcome in order to think about causality
– Reliability
– Validity
• External validity
• Internal validity – three concepts are considered
– Bias
– Confounding
– Chance (Random error)
Validity
• Implies that a measure actually measures what it purports to
measure:
– Appropriate
– Accurate (has same numerical value as the phenomenon being investigated,
i.e. free of systematic error or bias)
– Precise (minimal variations are only because of chance or random error)
• Validity of a study implies that the findings are the “truth”
• The degree to which a measurement or study reaches a correct
conclusion
• Two types of validity: Internal validity,
External validity
External validity: generalizability
• The extent to which the results of a study are applicable to the
general population
– Do the study results apply to other patients?
• A representative sample is drawn from the population (usually
randomly)
• Individuals have equal chance to participate in the study
• Usually involves a sampling frame
• Inference is made back to the population
Internal validity
• Is the extent to which the results of the study accurately reflect
the true situation of the study population
• Is influenced by:
– Chance
• The probability that an observation occurred unpredictably
without discernible human intention or observable cause
– Bias
• Any systematic error (not random or due to chance) in a study which
leads to an incorrect estimate of the association between exposure
and disease
– Confounding
• The influence of other variables in a study which leads to an
incorrect estimate of the association between exposure and disease
Random error
• Chance
• “That part of our experience that we cannot
predict” (Rothman and Greenland)
• Usually most easily conceptualized as sampling
variability and can be influenced by sample size
Random error can be problematic, but . . .
• Influence can be reduced
– increase sample size
– change design of sampling
– improve precision of instrument
• Probability of some types of influence can be
quantified (e.g., confidence interval width)
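As a rough illustration of how sample size influences random error, the sketch below computes a 95% confidence interval for a proportion using the simple normal (Wald) approximation. The function names, the observed proportion of 20%, and the sample sizes are assumptions chosen for illustration; they do not come from this review.

```python
import math

def wald_ci(p_hat, n, z=1.96):
    """Approximate 95% CI for a proportion (normal/Wald approximation)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error shrinks as n grows
    return p_hat - z * se, p_hat + z * se

def ci_width(p_hat, n, z=1.96):
    """Width of the interval: a simple way to quantify precision."""
    lower, upper = wald_ci(p_hat, n, z)
    return upper - lower

# Hypothetical observed proportion of 20% at increasing sample sizes
for n in (100, 400, 1600):
    lower, upper = wald_ci(0.20, n)
    print(f"n={n:5d}  95% CI = ({lower:.3f}, {upper:.3f})  width = {upper - lower:.3f}")
```

Under this approximation, quadrupling the sample size halves the interval width, which is why larger samples reduce the influence of chance but never remove it entirely.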
I. Bias - Definition
• Any systematic error (not random or due to
chance) in a study which leads to an incorrect
estimate of the association between exposure
and disease or outcome
• Therefore:
– Bias is a systematic error that results in an incorrect
(invalid) estimate of the measure of association
I. Bias - Definition
1. Can create a spurious association when there is none (bias away
from the null)
2. Can mask an association when there is one (bias towards the null)
3. Bias is primarily introduced by the investigator or study
participants
4. Bias does not mean that the investigator is “prejudiced”
5. Can occur in all study types: experimental, cohort, case-control
6. Occurs in the design and conduct of a study
7. Bias can be evaluated but not “fixed” in the analysis phase
8. Two main types are selection and observation bias
Direction of bias
• Bias towards the null – observed value is closer to 1.0 than is
the true value
[Figure: number line showing, in order, Null, Observed, True]
• Bias away from the null – observed value is farther from 1.0
than is the true value
[Figure: number line showing, in order, Observed 1, Null, True, Observed 2]
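The two definitions above can be sketched as a small helper for ratio measures (RR or OR), where the null value is 1.0. Comparing distances on the log scale, and reporting an observed value on the opposite side of the null as a separate case, are assumptions of this sketch rather than rules stated in this review; the function name is hypothetical.

```python
import math

def bias_direction(true_ratio, observed_ratio, tol=1e-12):
    """Classify the direction of bias for a ratio measure (null value = 1.0)."""
    true_log = math.log(true_ratio)          # distance of the truth from the null
    observed_log = math.log(observed_ratio)  # distance of the estimate from the null
    if abs(observed_log - true_log) <= tol:
        return "no bias"
    if true_log * observed_log < 0:
        # Observed value lies on the opposite side of 1.0 from the truth
        return "crosses the null"
    if abs(observed_log) < abs(true_log):
        return "towards the null"
    return "away from the null"

print(bias_direction(2.0, 1.5))   # true RR 2.0, observed 1.5
print(bias_direction(2.0, 3.0))   # true RR 2.0, observed 3.0
print(bias_direction(0.5, 0.8))   # protective effect, observed closer to 1.0
```

Note that for a protective exposure (true value below 1.0), bias towards the null means the observed value is larger than the truth, not smaller.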
Types of bias
• Selection bias
– Refusals, exclusions, non-participants
– Failure to enumerate the entire population
– Loss to follow up
• Observation/Information bias
– Diagnostic (lead time) surveillance bias
– Interviewer bias
– Recall bias
– Classification of exposure and outcome
• Misclassification bias (is part of information bias)
– Non-differential
– Differential
II. Selection bias
• Any systematic error that occurs in the process of
identifying study populations
• The error that occurs whenever the identification and
selection of individual subjects for inclusion into study
is not independent of outcome (cohort) or exposure
(case-control)
• Error due to systematic difference between those
selected for study versus those not selected for the
study
II. Selection Bias
1. Results from procedures used to select subjects into a study
that lead to a result different from what would have been
obtained from the entire population targeted for the study
2. Most likely to occur in case-control or retrospective cohort
studies because exposure and outcome have occurred at the time of
study selection
3. Selection bias can also occur in prospective cohort and
experimental studies from differential loss to follow-up
- impacts which subjects are “selected” for the analysis
II. Selection bias
• Occurs when there is a systematic difference between those
selected for study versus those who were not
– Refusers, non-participants, non-response, exclusions
– Failure to enumerate the entire population
– Those lost to follow up if related to exposure or outcome
– Differential selection of exposed/unexposed groups, or cases and
controls
– Volunteers
– Healthy workers - example
II. Selection bias- cohort study
• Selection bias occurs when selection of exposed and unexposed
subjects is not independent of the outcome (so this type can only
occur in a retrospective cohort study)
• Examples:
• A retrospective study of an occupational exposure to asbestos and lung
disease in a factory setting
• The exposed and unexposed groups are enrolled on the basis of prior
employment records
• The records are old and many are lost, so the complete cohort working
in the plant is not available for study.
• If people who did not develop the disease and were exposed were
more likely to have their records lost, then there will be an
overestimate of association between the exposure and disease
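A minimal numeric sketch of the lost-records example, assuming made-up counts: the complete cohort has a true risk ratio of 2.0, and half of the records for exposed workers who stayed healthy are lost. All numbers and names are hypothetical.

```python
def risk_ratio(exp_diseased, exp_healthy, unexp_diseased, unexp_healthy):
    """Risk in the exposed divided by risk in the unexposed."""
    risk_exposed = exp_diseased / (exp_diseased + exp_healthy)
    risk_unexposed = unexp_diseased / (unexp_diseased + unexp_healthy)
    return risk_exposed / risk_unexposed

# Complete (hypothetical) cohort: risks of 40/200 and 20/200
true_rr = risk_ratio(40, 160, 20, 180)

# Half of the records for exposed, non-diseased workers are lost (160 -> 80)
observed_rr = risk_ratio(40, 80, 20, 180)

print(f"True RR     = {true_rr:.2f}")
print(f"Observed RR = {observed_rr:.2f}")
```

Losing exposed workers who did not develop disease shrinks the denominator of the exposed risk, so the observed risk ratio (about 3.3 here) overestimates the true value of 2.0.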
II. Selection bias- cohort study
Solutions:
– Increase participation
– Get relevant information on refusers
– Develop follow-up mechanisms
– Use comparable populations
– Valid assessment of outcome in prospective cohort
studies (for example: blinding)
II. Selection bias: case-control study
• Sources of selection bias
– Decisions about selecting incident or prevalent (survival)
cases
– When controls do not reflect the population that gave rise to
the cases
• The selection of cases and controls must be independent of the
exposure status
– Do controls in the study have higher or lower prevalence of
exposure than controls not selected for the study?
– Cases and controls should have the same exposure
opportunities (e.g., welders and general population)
II. Selection bias: case-control study
1. Occurs when controls or cases are more or less likely to be
included in a study if they have been exposed –
inclusion in the study is not independent of exposure
2. Result: the relationship between exposure and disease observed
among study participants is different from the relationship between
exposure and disease in eligible individuals who were not included
3. The odds ratio from a study that suffers from selection bias will
incorrectly represent the relationship between exposure and
disease in the overall study population
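To make these points concrete, the sketch below applies different participation probabilities to a hypothetical source population in which the true odds ratio is 2.0. Exposed controls are assumed to be twice as likely to take part as unexposed controls; every count and probability here is invented for illustration.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

# Hypothetical source population
population = {"exp_cases": 200, "unexp_cases": 100,
              "exp_controls": 2000, "unexp_controls": 2000}

# Assumed participation probabilities: controls' inclusion depends on exposure
participation = {"exp_cases": 0.90, "unexp_cases": 0.90,
                 "exp_controls": 0.10, "unexp_controls": 0.05}

# Expected number of participants in each cell
selected = {cell: population[cell] * participation[cell] for cell in population}

true_or = odds_ratio(**population)
observed_or = odds_ratio(**selected)

print(f"True OR     = {true_or:.2f}")
print(f"Observed OR = {observed_or:.2f}")
```

In this scenario the over-representation of exposed controls pulls the observed odds ratio down to about 1.0, masking a true association. If participation had been equal within cases and within controls, the odds ratio would have been unchanged.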
II. Selection bias: cross-sectional study
• Selection bias can occur when
– Sampling frame does not represent the true
underlying population of interest
• Voter registration lists
• Driver’s license records
• Telephone lists
II. Selection bias: cross-sectional study
– Estimates of association do not take into account
the sampling structure
– There are sufficient numbers of refusers that the
underlying sampling structure is compromised
– When the “sample” is a convenience sample
(sampling fraction)
II. Selection Bias: solutions?
• Little or nothing can be done to fix this bias once it has
occurred in cross-sectional studies
• Need to avoid it during design and implementation:
– using the same criteria for selecting cases and controls
– obtaining all relevant subject records
– obtaining high participation rates
– taking into account diagnostic patterns of disease
III. Observation/information bias
• An error that arises from systematic differences in the way
information on exposure or disease is obtained from the
study groups
• Results in participants who are incorrectly classified as
either exposed or unexposed or as diseased or not diseased
• Occurs after the subjects have entered the study
• Several types of observation bias: recall bias, interviewer
bias, and differential and non-differential misclassification
III. Observation/Information bias
• Recall bias
• People with disease remember or report exposures
differently (more/less accurately) than those without disease
– Differential ability of subject to remember previous activities
and exposures, e.g. in serious diseases
– Cases search their memory to understand their illness
– E.g., birth defects
• Can result in over- or under-estimation of the measure of
association
III. Observation/Information bias
• Recall bias
• Solutions:
– Use controls who are themselves sick
– Use standardized questionnaires that obtain complete
information
– Mask subjects to study hypothesis
III. Observation/Information bias
• Interviewer bias
• Systematic difference in soliciting, recording, interpreting
information
• Can occur whenever exposure information is sought when
outcome is known (as in case-control) or when outcome
information is sought when exposure is known (as in cohort
study)
III. Observation/Information bias
• Interviewer bias
– The way the interviewer asks questions, including the possibility of
probing (e.g., in in-person and telephone interviews),
especially where the outcome has already occurred (case-control and retrospective cohort studies)
– Solutions:
• Mask interviewers to study hypothesis and disease or
exposure status of subjects
• Use standardized questionnaires, or standardized
methods of outcome or exposure ascertainment
• Use biomarkers to compare when possible
• Surrogates tend to underreport exposures
III. Observation/Information bias
• Classification of exposure and outcome
– Leads to misclassification bias
– If exposure status is known in cohort studies, or
outcome status is known in case-control studies
– Solution- blinding of data collectors to
exposure/outcome status
III. Observation/Information bias –
Misclassification bias
• A type of information bias
• Error arising from inaccurate measurement or classification of
study subjects or variables
• Subject’s exposure or disease status is erroneously classified
• Happens at the assessment of exposure or outcome in both
cohort and case-control studies
• Two types: non-differential and differential
A. Non-differential misclassification
• Inaccuracies with respect to disease classification are
independent of exposure
• Inaccuracies with respect to exposure are independent of disease
status
• The probability of exposure (or of outcome) misclassification is
the same for cases and controls (or in study/comparison groups)
• Bias results towards the null - if the exposure has two categories,
will make groups more similar (Type II error)
• Solution: Use multiple measurements, most accurate sources of
information
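The pull towards the null can be sketched with expected counts in a hypothetical case-control study. Exposure is measured with the same sensitivity (0.8) and specificity (0.9) in cases and controls; the counts, error rates, and function names are assumptions for illustration.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def misclassify_exposure(exposed, unexposed, sensitivity, specificity):
    """Expected counts classified as exposed/unexposed by an imperfect measure."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (exposed + unexposed) - observed_exposed
    return observed_exposed, observed_unexposed

# Hypothetical true exposure counts: 60/40 in cases, 30/70 in controls
true_or = odds_ratio(60, 40, 30, 70)

# Non-differential: identical error rates in cases and controls
case_exp, case_unexp = misclassify_exposure(60, 40, sensitivity=0.8, specificity=0.9)
ctrl_exp, ctrl_unexp = misclassify_exposure(30, 70, sensitivity=0.8, specificity=0.9)
observed_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(f"True OR     = {true_or:.2f}")
print(f"Observed OR = {observed_or:.2f}")
```

With these numbers the true odds ratio of 3.5 is observed as roughly 2.4: the two groups look more alike than they really are, but the association does not reverse.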
B. Differential Misclassification
• Differential misclassification
– Probability of misclassification of disease status differs for
exposed and unexposed persons (cohort), or probability of
misclassification of exposure status differs for cases and controls (case-control)
– Probability of misclassification is different for cases and
controls or for levels of exposure within cases and controls
– Direction of bias is unknown, i.e. overestimation or
underestimation of the true risk
– Know that the observed OR deviates from truth, but direction
is unknown
B. Differential Misclassification
• Also known as systematic misclassification:
– The probability of misclassification of disease or exposure
status is correlated with presence or absence of characteristic
in the study or control group.
– Thus, misclassification of presence or absence of disease
differs for exposed and unexposed persons, or of presence or
absence of exposure in cases and controls: it is differential
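A minimal sketch of why the direction of bias cannot be predicted in general: the same hypothetical case-control data are misclassified twice, once with cases reporting exposure more completely than controls and once with the reverse. The counts, reporting sensitivities, and function names are assumptions for illustration; specificity is taken to be perfect to keep the example short.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds in cases divided by exposure odds in controls."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def reported_exposure(exposed, unexposed, sensitivity):
    """Expected reported counts when only some truly exposed subjects report exposure."""
    reported_exposed = sensitivity * exposed
    return reported_exposed, (exposed + unexposed) - reported_exposed

def observed_or(case_sensitivity, control_sensitivity):
    # Hypothetical true exposure counts: 60/40 in cases, 30/70 in controls
    case_exp, case_unexp = reported_exposure(60, 40, case_sensitivity)
    ctrl_exp, ctrl_unexp = reported_exposure(30, 70, control_sensitivity)
    return odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

true_or = odds_ratio(60, 40, 30, 70)
or_cases_recall_better = observed_or(case_sensitivity=0.95, control_sensitivity=0.70)
or_controls_recall_better = observed_or(case_sensitivity=0.70, control_sensitivity=0.95)

print(f"True OR                          = {true_or:.2f}")
print(f"Cases report more completely     = {or_cases_recall_better:.2f}")
print(f"Controls report more completely  = {or_controls_recall_better:.2f}")
```

The first scenario, which resembles recall bias, inflates the odds ratio; the second deflates it. Without knowing which group is measured less accurately, the direction of the error is unknown.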
Confounding
Definition and Impact
• An alternate explanation for the observed association
between exposure and disease
• “A mixing of effects”: the association between
exposure and disease is distorted because it is mixed
with the effects of another factor that is associated
with the disease
• Result of confounding is to distort the true association
toward the null (negative confounding) or away from
the null (positive confounding)
Confounder
• A confounder is associated with the exposure and
independently of that exposure is a risk factor for the
disease
• A confounder can distort the observed effect of the exposure on
the outcome by:
– overestimating it (positive confounding),
– underestimating it (negative confounding), or even
– changing the direction of the observed effect, producing a spurious
relationship
Criteria for a variable to be a
confounder
• The third variable must not be an intermediate link in the
causal chain between exposure and outcome (i.e., is not
an intermediate or intervening variable)
• The third variable must cause the outcome event (i.e.,
must be an independent predictor of disease with or
without exposure)
• The third variable must be associated (correlated) with
exposure (but not caused by the exposure)
CONFOUNDING
[Diagram: exposure (E), disease (D), and confounder (C)]
Example:
– Smoking is a confounder of the effect of occupational exposures (to dyes) on bladder
cancer
– Age is a confounder of the effect of DDT pesticide exposure on breast cancer
Opportunities for confounding
• In experimental designs:
– Participation differs in study and control groups
– There is no evidence for randomization
– There is evidence of residual confounding
• In cohort and case-control studies:
– When selection of comparison group differs by subject
characteristics
– When risk factors other than the exposure are distributed
differently between the exposed and unexposed groups
– There is evidence of residual confounding
Controlling for confounding
In the design phase:
• Goal is to eliminate or reduce variation in the
level of the confounding factor between
compared groups
• Remember, a variable can only be a
confounder if it is different between
compared groups.
Control for confounding- design phase
– Randomization
• With sufficient sample size, randomization is likely to control for both
known and unknown confounders- but not guaranteed
– Restriction
• Restrict admissibility criteria for study subjects and limit entrance to
individuals who fall within a specified category of the confounder
– Matching
• Select study subjects so that the potential confounders are
distributed in an identical manner among the exposed and
unexposed groups (cohort study) or among the cases and controls
(case-control study)
Control for confounding- analysis phase
– Standardization: by age, race, gender, or calendar time in order
to make fair comparisons between populations
– Stratified analysis: test for homogeneity between strata to check
if it is confounding or effect modification, only pool for summary
measure if evidence of homogeneity
– Matched analysis: implement analysis of matched design
– Restriction: Restrict during data analysis
– Multivariate analysis: To enable controlling for several potential
confounders simultaneously
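As a sketch of stratified analysis, the example below compares a crude odds ratio with stratum-specific odds ratios and a pooled summary. The Mantel-Haenszel estimator is used here as one common pooling method; it is not named in this review, and the counts are invented so that the confounder is more common among the exposed and raises disease risk on its own.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Pooled OR across strata: sum(a*d/n) / sum(b*c/n)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Hypothetical strata of the confounder: (a, b, c, d)
strata = {
    "confounder present": (40, 60, 8, 12),
    "confounder absent": (2, 18, 10, 90),
}

# Crude table ignores the confounder: add the strata cell by cell
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}
pooled_or = mantel_haenszel_or(strata.values())

print(f"Crude OR = {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"OR, {name} = {value:.2f}")
print(f"Mantel-Haenszel OR = {pooled_or:.2f}")
```

Here the crude odds ratio is about 3.1, yet the odds ratio is 1.0 within each stratum and after pooling, so the crude association is explained entirely by the confounder. Pooling is reasonable in this example because the stratum-specific estimates agree.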
Effect modification
• Interaction
• The strength of the association between an
exposure and disease differs according to the level
of another variable.
• Modification of the relationship between exposure
and a disease by a third variable.
• If the association changes according to the level of
the third variable, then effect modification is present
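A small sketch of how effect modification shows up in stratified data, using made-up counts: the risk ratio is computed separately at each level of the third variable. The level labels, counts, and function name are hypothetical.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Hypothetical counts at two levels of a third variable:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "third variable level A": (30, 100, 10, 100),
    "third variable level B": (12, 100, 10, 100),
}

stratum_rr = {level: risk_ratio(*counts) for level, counts in strata.items()}

for level, rr in stratum_rr.items():
    print(f"RR at {level} = {rr:.1f}")
```

The risk ratio is about 3.0 at level A and about 1.2 at level B. Because the association itself differs across levels, a single pooled estimate would hide this pattern, and the stratum-specific results should be reported separately.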
Measurement Error
Measurement
• Measurement of exposure, outcome, and other
relevant characteristics are a key part of
epidemiologic studies
• Almost all tests and measures are imperfect!
Knowledge of how well a measure performs helps
to:
– Choose alternative measures
– Interpret results of studies using a specific measure
Reliability
• How closely do duplicate measurements of the same
characteristic agree with each other
• Examples:
– Test-retest reliability: agreement between responses on a
questionnaire that is administered two (or more) times to the
same person
– Intra-observer: agreement of a given interpreter with
him/herself
– Inter-observer: agreement among different interpreters
• Reliability is usually higher with more standardized or
automated measurement procedures, lower when more
complex judgments are required by human observers (e.g.,
x-ray reading)
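Inter-observer agreement can be quantified in several ways. The sketch below computes simple percent agreement and Cohen's kappa, a commonly used chance-corrected statistic that is not named in this review; the two readers and their ten ratings are invented for illustration.

```python
def percent_agreement(ratings_1, ratings_2):
    """Proportion of subjects on which the two observers agree."""
    matches = sum(x == y for x, y in zip(ratings_1, ratings_2))
    return matches / len(ratings_1)

def cohens_kappa(ratings_1, ratings_2):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(ratings_1)
    observed = percent_agreement(ratings_1, ratings_2)
    categories = set(ratings_1) | set(ratings_2)
    # Expected agreement if the two observers rated independently
    expected = sum((ratings_1.count(c) / n) * (ratings_2.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical x-ray readings by two readers (1 = abnormal, 0 = normal)
reader_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(f"Percent agreement = {percent_agreement(reader_1, reader_2):.2f}")
print(f"Cohen's kappa     = {cohens_kappa(reader_1, reader_2):.2f}")
```

The readers agree on 8 of 10 films, but part of that agreement would be expected by chance, so kappa (about 0.58) is lower than the raw 80% agreement.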
Validity
• The degree to which an instrument measures
what it sets out to measure
• For many epidemiologic applications, the
underlying characteristic being measured is
dichotomous (e.g., diseased vs. non diseased;
exposed vs. not exposed). Then, validity can be
regarded as having two components:
– Sensitivity
– Specificity
Sensitivity
• The probability of testing positive if the disease is truly present
• Sensitivity = a / (a + c)

Results of           True Disease Status
Screening Test          +         -
      +                 a         b
      -                 c         d
Specificity
• The probability of screening negative if the disease is truly absent
• Specificity = d / (b + d)

Results of           True Disease Status
Screening Test          +         -
      +                 a         b
      -                 c         d
                        Disease
Test                    PRESENT (+)    ABSENT (-)
Test positive (+)       TP             FP
Test negative (-)       FN             TN

Sens = TP / (TP + FN)
Spec = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
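The four formulas can be checked with a short calculation. The sketch below uses a hypothetical screening result of 1,000 people; the counts and the function name are assumptions for illustration.

```python
def screening_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # among the truly diseased
        "specificity": tn / (tn + fp),  # among the truly non-diseased
        "ppv": tp / (tp + fp),          # among those who test positive
        "npv": tn / (tn + fn),          # among those who test negative
    }

# Hypothetical screening of 1,000 people, 100 of whom truly have the disease
measures = screening_measures(tp=90, fp=40, fn=10, tn=860)

for name, value in measures.items():
    print(f"{name:12s} = {value:.3f}")
```

Sensitivity and specificity are calculated down the columns of the table (by true disease status), while the predictive values are calculated across the rows (by test result).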
Relationship between Sensitivity and Specificity
• Lowering the criterion of positivity results in an
increased sensitivity, but at the expense of
decreased specificity
• Making the criterion of positivity more stringent
increases the specificity, but at the expense of
decreased sensitivity
• The goal is to have a high sensitivity and high
specificity, but this is often not possible or
feasible
Relationship between Sensitivity and Specificity
• The decision for the cut-point involves weighing the
consequences of leaving cases undetected (false negatives)
against erroneously classifying healthy persons as diseased
(false positives)
• Sensitivity should be increased when the penalty associated
with missing a case is high
– When the disease can be spread
– When subsequent diagnostic evaluations are associated with
minimal cost and risk
• Specificity should be increased when the costs or risks
associated with further diagnostic techniques are substantial
(minimize false positives)
– Example: positive screen requires that a biopsy be performed
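The trade-off between sensitivity and specificity can be sketched by moving the cut-point on a continuous test. The biomarker values below are invented, and the convention that a value at or above the cut-point counts as a positive screen is an assumption of this example.

```python
def sens_spec_at_cutoff(diseased_values, healthy_values, cutoff):
    """Sensitivity and specificity when 'value >= cutoff' is a positive screen."""
    true_positives = sum(v >= cutoff for v in diseased_values)
    true_negatives = sum(v < cutoff for v in healthy_values)
    sensitivity = true_positives / len(diseased_values)
    specificity = true_negatives / len(healthy_values)
    return sensitivity, specificity

# Hypothetical biomarker values; the two distributions overlap at 6 and 7
diseased = [6, 7, 8, 9, 10]
healthy = [3, 4, 5, 6, 7]

for cutoff in (8, 7, 6):
    sensitivity, specificity = sens_spec_at_cutoff(diseased, healthy, cutoff)
    print(f"cut-point {cutoff}: sensitivity = {sensitivity:.1f}, specificity = {specificity:.1f}")
```

Lowering the cut-point from 8 to 6 raises sensitivity from 0.6 to 1.0 while specificity falls from 1.0 to 0.6. Because the two groups overlap, no cut-point achieves both at once.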
A screening test is used in the same way in two similar
populations, but the proportion of false-positive
results among those who test positive in population B
is higher than that among those who test positive in
population A. What is the most likely explanation for
this finding?
(A) The specificity of the test is higher in population A
(B) The specificity of the test is lower in population A
(C) The prevalence of disease is higher in population A
(D) The prevalence of disease is lower in population A
Performance Yield
• Predictive Value Positive (PV+)
– Probability that individuals with a positive screening test result
will also test positive on the diagnostic test
• Predictive Value Negative (PV-)
– Probability that individuals with a negative screening test result
are actually free of disease
Performance Yield
• Predictive Value Positive (PV+)
– The probability that a person actually has a disease given that
he/she tests positive
– PV+ = a / (a + b)
• Predictive Value Negative (PV-)
– The probability that a person is truly disease free given that
he/she tests negative
– PV- = d / (c + d)

Results of           True Disease Status
Screening Test          +         -
      +                 a         b
      -                 c         d
Performance Yield
• Factors that influence PV+ and PV-
1. The more specific the test, the higher the PV+
2. The higher the prevalence of preclinical disease in
the screened population, the higher the PV+
3. The more sensitive the test, the higher the PV-
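These three factors can be sketched by computing the predictive values directly from sensitivity, specificity, and prevalence (an application of Bayes' theorem). The test characteristics and prevalences below are hypothetical values chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PV+, PV-) for a test applied in a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    pv_pos = true_pos / (true_pos + false_pos)
    pv_neg = true_neg / (true_neg + false_neg)
    return pv_pos, pv_neg

# Same hypothetical test (sensitivity 0.90, specificity 0.95) at two prevalences
for prevalence in (0.01, 0.10):
    pv_pos, pv_neg = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence = {prevalence:.2f}: PV+ = {pv_pos:.3f}, PV- = {pv_neg:.3f}")
```

With these inputs, PV+ rises from roughly 15% at a prevalence of 1% to roughly 67% at a prevalence of 10%, even though the test itself has not changed. When disease is rare, most positive results are false positives.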
Sampling errors
                                          Reality
Decision                                  Treatments not different    Treatments are different
                                          (Ho true)                   (Ho false)
Conclude treatments are not different     Correct decision            Type II error
(Fail to reject Ho)                                                   (probability = β)
Conclude treatments are different         Type I error                Correct decision
(Reject Ho)                               (probability = α)           (Power, probability = 1 - β)
Correct :
Reject the null hypothesis when it is false
Do not reject the null hypothesis when it is true
Errors:
Reject the null hypothesis when it is true (Type I error = α)
Do not reject the null hypothesis when it is false (Type II error = β)
Power = probability of detecting a difference if one truly exists, i.e. probability that a study
will find a statistically significant difference, when a difference of a given magnitude truly
exists.
Power = 1 - β, where β is the probability of declaring a difference not statistically
significant, when a difference truly exists.
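As a rough sketch of how power depends on sample size, the function below approximates the power of a two-sided z-test comparing two proportions with equal group sizes. The normal approximation, the function name, and the example proportions (10% versus 20%) are assumptions for illustration, not a method prescribed in this review.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in proportions."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value set by the Type I error
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)  # SE if Ho is true
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)  # SE if Ho is false
    # Probability of rejecting Ho in the direction of the true difference;
    # the very small chance of rejecting in the wrong direction is ignored.
    return std_normal.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

# Hypothetical outcome risks of 10% and 20% in the two treatment groups
for n in (50, 200, 500):
    power = power_two_proportions(0.10, 0.20, n)
    print(f"n per group = {n:3d}: power = {power:.2f}, beta = {1 - power:.2f}")
```

For this hypothetical difference, about 200 subjects per group gives power of roughly 80%; smaller studies are more likely to commit a Type II error.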
Outbreak
• Endemic: Habitual presence of disease within a
given geographic area
• Epidemic: Occurrence in a community of a group
of illnesses of similar nature in excess of what
would normally be expected. Amount of disease
depends on number susceptible (at risk) and
number not susceptible by way of immunization,
or genetics (immune).
• Pandemic: worldwide epidemic
Outbreak Types
• Common source: group of persons exposed to
common agent
– Point: exposed over a brief period of time
– Intermittent: exposed over a long period of time
• Propagated: spreads gradually from person to person
• Mixed epidemic: common source and from person to
person
How do you find outbreaks?
• Surveillance
– Track disease/injury rates over time
– Only for reported diseases
– Time delay: primary purpose is to examine trends
over time
• Laboratory reports
• Healthcare institutions
• Public health office
• Observant healthcare personnel
Epidemic Investigation
1. Establish the presence of an epidemic – case
definition
2. Communicate/Control
3. Analyze the outbreak
4. Form a hypothesis
5. Test the hypothesis
6. Complete the investigation
GOOD LUCK!!!