Cohort, case-control & cross-sectional studies
Kostas Danis
EPIET Introductory course
Menorca, Spain
16/9-12/10/2012
Source: Alain Moren, EPIET Introductory courses
Two types
Observation
Experiment
Exposure assigned
Exposed
Not exposed
Unethical to perform experiments on people if exposure is harmful
Disease occurrence
If exposure not harmful
Treatment
Preventive measure (vaccination)
Randomised Controlled Trial (RCT)
Blinded
Doses
Time period
Risk - effect
No bias
If RCT not possible
Left with observation of experiments designed by Nature
Cohort studies
Cross-sectional studies
Case control studies
marching towards outcomes
One of 10 divisions of a Roman legion
Group of individuals
sharing same experience
followed up for specified period of time
Examples
EPIET cohort 2012
birth cohort
cohort of guests at barbecue
occupational cohort of chemical plant workers
influenza vaccinated in 2011-12
follow-up period
end of follow-up
Cumulative incidence
incidence proportion
attack rate (outbreak)
Incidence rate
Purpose
Study if an exposure is associated with outcome(s)
Estimate risk of outcome in exposed and unexposed cohorts
Compare risk of outcome in two cohorts
Cohort membership
Being at risk of outcome(s) studied
Being alive and
Being free of outcome at start of follow-up
Exposed: incidence among exposed
Unexposed: incidence among unexposed

                   ill   not ill   Total
ate ham             a       b       a+b
did not eat ham     c       d       c+d
Risk in exposed = a/(a+b)
Risk in unexposed = c/(c+d)
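As a minimal sketch of the two risk formulas above, the Python below computes the risk in exposed and unexposed from the cells of the 2 x 2 table. The function name and the counts are hypothetical, chosen only for illustration. The denominators are parenthesised on purpose: written without parentheses, a/a+b would evaluate as (a/a)+b.

```python
def risks_from_2x2(a, b, c, d):
    """Return (risk in exposed, risk in unexposed) for a cohort 2 x 2 table.

    a: exposed and ill        b: exposed and not ill
    c: unexposed and ill      d: unexposed and not ill
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed, risk_unexposed


# Hypothetical counts, not taken from the slides
r_e, r_ue = risks_from_2x2(a=30, b=70, c=10, d=90)
print(f"Risk in exposed:   {r_e:.0%}")
print(f"Risk in unexposed: {r_ue:.0%}")
```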
Incidence rate
$$\text{Rate} = \frac{\text{Number of NEW cases of disease}}{\text{Total person-time of observation}}$$
Denominator:
- is a measure of time
- is the sum of each individual’s time at risk and free from disease
Person-time
[Figure: follow-up lines for subjects A to E on a time axis from 90 to 00; -- time followed, x disease onset]

Subject   Time at risk (years)
A           6.0
B           6.0
C          10.0
D           8.5
E           5.0
Total years at risk   35.5

Incidence rate (IR) (incidence density)
IR = 2 cases / 35.5 person-years
   = 0.056 cases / person-year
   = 5.6 cases / 100 person-years
   = 56 cases / 1000 person-years
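The person-time arithmetic can be reproduced in a few lines of Python. This is only a sketch: the assignment of each follow-up time to a particular subject is an assumption read off the flattened diagram, and the count of two cases is inferred from the two disease onsets marked x.

```python
# Years at risk per subject (assignment to A-E is an assumption)
years_at_risk = {"A": 6.0, "B": 6.0, "C": 10.0, "D": 8.5, "E": 5.0}
new_cases = 2  # two disease onsets marked "x" in the diagram

total_person_years = sum(years_at_risk.values())
incidence_rate = new_cases / total_person_years

print(f"Total years at risk: {total_person_years}")
print(f"IR = {incidence_rate:.3f} cases / person-year")
print(f"   = {incidence_rate * 100:.1f} cases / 100 person-years")
print(f"   = {incidence_rate * 1000:.0f} cases / 1000 person-years")
```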
Tobacco smoking and lung cancer,
England & Wales, 1951
Person-years Cases
Smoke 102,600 133
Do not smoke 42,800 3
Source: Doll & Hill
Daily number of        Person-years   Lung cancer
cigarettes smoked      at risk        cases
> 25                     25,100          57
15 - 24                  38,900          54
1 - 14                   38,600          22
none                     42,800           3
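[Figure: three timelines showing when the study starts relative to exposure and disease occurrence]
Exposure -> Study starts -> Disease occurrence
Study starts -> Exposure -> Disease occurrence
Exposure -> Disease occurrence -> Study starts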
Identify group of
exposed subjects
unexposed subjects
Measure incidence of disease
Compare incidence between exposed and unexposed group
Absolute measures
Risk difference (RD): $RD = I_e - I_{ue}$
Relative measures
Relative risk (RR), also called rate ratio or risk ratio: $RR = I_e / I_{ue}$
$I_e$ = incidence in exposed
$I_{ue}$ = incidence in unexposed
              Total   Cases   Non cases   Risk %
Exposed        100     50        50        50 %
Not exposed    100     10        90        10 %
Risk ratio 50% / 10% = 5
                   ill   not ill   Total   Incidence
ate ham             49     49        98      50 %
did not eat ham      4      6        10      40 %
Risk difference 50% - 40% = 10%
Relative risk 50% / 40% = 1.25
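The absolute and relative measures for the ham example can be checked with a short Python sketch. The function names are illustrative only; the counts are the ones shown in the table.

```python
def risk_difference(i_exposed, i_unexposed):
    """Absolute measure: excess risk in the exposed group."""
    return i_exposed - i_unexposed


def risk_ratio(i_exposed, i_unexposed):
    """Relative measure: how many times higher the risk is in the exposed."""
    return i_exposed / i_unexposed


i_e = 49 / 98   # incidence among those who ate ham
i_ue = 4 / 10   # incidence among those who did not eat ham

rd = risk_difference(i_e, i_ue)
rr = risk_ratio(i_e, i_ue)
print(f"Risk difference: {rd:.0%}")
print(f"Relative risk:   {rr:.2f}")
```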
Risk factor: RR > 1
No association: RR = 1
Protective factor: RR < 1
Exposure   Population (f/u 2 years)   Cases   Incidence (%)   Relative Risk
HIV +             215                   8         3.7              11
HIV -             298                   1         0.3
Status         Pop.      Cases   Cases per 1,000   RR
Vaccinated     301,545    150        0.49          0.28
Unvaccinated   298,655    515        1.72          Ref.
Total          600,200    665        1.11
VE = 1 - RR = 1 - 0.28 = 72%
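A hedged sketch of the vaccine effectiveness calculation, using the population and case counts from the table. Variable names are illustrative. Working from the unrounded counts gives an RR close to 0.29 and a VE close to 71%, so the 0.28 and 72% shown above appear to come from dividing the rounded per-1,000 figures (0.49 / 1.72).

```python
vaccinated_pop, vaccinated_cases = 301_545, 150
unvaccinated_pop, unvaccinated_cases = 298_655, 515

risk_vaccinated = vaccinated_cases / vaccinated_pop
risk_unvaccinated = unvaccinated_cases / unvaccinated_pop

rr = risk_vaccinated / risk_unvaccinated
ve = 1 - rr  # vaccine effectiveness

print(f"Cases per 1,000 (vaccinated):   {risk_vaccinated * 1000:.2f}")
print(f"Cases per 1,000 (unvaccinated): {risk_unvaccinated * 1000:.2f}")
print(f"RR = {rr:.3f}")
print(f"VE = {ve:.1%}")
```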
Exposure level   Population at risk   Cases   Incidence   RR
High                  N1               a1        I1       RR1
Medium                N2               a2        I2       RR2
Low                   N3               a3        I3       RR3
Unexposed             Nne              c         Iue      Reference
Tobacco smoking and lung cancer,
England & Wales, 1951
Cigarettes smoked/d   Person-years at risk   Cases   Rate per 1000 p-y   Rate ratio
> 25                       25,100              57         2.27              32.4
15 - 24                    38,900              54         1.39              19.8
1 - 14                     38,600              22         0.57               8.1
none                       42,800               3         0.07              Ref.
Source: Doll & Hill
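The dose-response table can be recomputed from person-years and cases, taking non-smokers as the reference group. The sketch below uses the figures from the table; the helper name `rate_per_1000` is an illustrative choice.

```python
# (cigarettes per day, person-years at risk, lung cancer cases)
groups = [
    ("> 25", 25_100, 57),
    ("15 - 24", 38_900, 54),
    ("1 - 14", 38_600, 22),
    ("none", 42_800, 3),  # reference group
]


def rate_per_1000(cases, person_years):
    """Incidence rate expressed per 1000 person-years."""
    return cases / person_years * 1000


_, ref_py, ref_cases = groups[-1]
ref_rate = rate_per_1000(ref_cases, ref_py)

rate_ratios = {}
for label, person_years, cases in groups:
    rate = rate_per_1000(cases, person_years)
    rate_ratios[label] = rate / ref_rate
    print(f"{label:>8}: rate = {rate:.2f} per 1000 p-y, "
          f"rate ratio = {rate_ratios[label]:.1f}")
```

The rate ratio rises steadily with the number of cigarettes smoked, which is the dose-response pattern the table is meant to show.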
Large sample size
Latency period
Cost
Time-consuming
Loss to follow-up
Exposure can change
Multiple exposure = difficult
Ethical considerations
Can directly measure
incidence in exposed and unexposed groups
true relative risk
Well suited for rare exposure
Temporal relationship exposure-disease is clear
Less subject to selection biases
outcome not known (prospective)
Can examine multiple effects for a single exposure
Population        Outcome 1   Outcome 2   Outcome 3
exposed Ne          Ie1         Ie2         Ie3
unexposed Nne       Iue1        Iue2        Iue3
                    RR1         RR2         RR3
A cohort study allows the calculation of indicators that have a clear, precise meaning.
The results are immediately understandable.
Cross-sectional (prevalence) studies
Cross-sectional studies
Observation of a cross-section of a population at a single point in time
Recruitment of study participants
Population
Population sample
Observation for the presence of:
One or more outcomes
One or more exposures
Sampling
Sampling
Population
Sample
Target Population
Uses of cross-sectional surveys in public health
Estimate prevalence of diseases or their risk factors
Estimate burden
Measure health status in a defined population
Plan health care services delivery
Set priorities for disease control
Generate hypotheses
Examine evolving trends
Before / after surveys
Iterative cross-sectional surveys
Potential objectives of a cross sectional study
Descriptive
Estimate prevalence
Analytic
Compare the prevalence of a disease in various subgroups, exposed and unexposed
Compare the prevalence of an exposure in various subgroups, affected and unaffected
Presentation of the data of an analytical cross sectional study in a 2 x 2 table
            Exposed   Non exposed
Ill            a           c
Non ill        b           d
Total         a+b         c+d
Simultaneous measurement of outcomes and exposures
              Total   Cases   Non cases   Prevalence %
Exposed       1,000    500       500         50 %
Not exposed   1,000    100       900         10 %
Prevalence ratio (PR) 50% / 10% = 5
Measuring association in analytical cross-sectional surveys
Prevalence among exposed / prevalence among unexposed
Prevalence ratio
Formula equivalent to risk ratio
Concept different
No incidence
Only prevalence
• depends on both occurrence of new cases & duration of disease
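To illustrate why a prevalence ratio is conceptually different from a risk ratio, the sketch below uses the standard steady-state approximation (prevalence odds = incidence rate x mean duration), which is an assumption not stated on the slides. The two groups and their numbers are hypothetical: they have the same incidence but different disease duration, yet the prevalence ratio is far from 1.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Prevalence assuming a steady state:
    prevalence odds = incidence rate x mean duration of disease."""
    odds = incidence_rate * mean_duration
    return odds / (1 + odds)


# Hypothetical groups: identical incidence (per person-year),
# different mean duration of disease (years)
incidence = 0.01
p_exposed = steady_state_prevalence(incidence, mean_duration=10)
p_unexposed = steady_state_prevalence(incidence, mean_duration=2)

rate_ratio = incidence / incidence
prevalence_ratio = p_exposed / p_unexposed

print(f"Rate ratio:       {rate_ratio:.2f}")
print(f"Prevalence ratio: {prevalence_ratio:.2f}")
```

In this hypothetical setting the exposure does not change the occurrence of new cases at all; the higher prevalence among the exposed reflects longer duration only.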
Prevalence of West Nile virus (WNV) infection by place of residence, Central Macedonia, Greece, 2010
         Infected   Total   Prevalence   Prevalence ratio
Rural       38       491       7.7%           5.9
Urban        3       232       1.3%           Ref
Prevalence of HIV infection by socioeconomic status,
African country X, 1999
             Infected   Total   Prevalence   Prevalence ratio
High class      15       235       6.4%           2.6
Low class       11       450       2.4%           Ref
Prevalence of hepatitis C (HCV) infection by quantity of therapeutic injections, Hafizabad, Pakistan, 1993
No. of injections   Infected   Total   Prevalence   Prevalence ratio
>10                     9        41       22%            22
1-10                    4        52        8%             8
0                       1        82        1%            Ref
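Prevalence ratios across several exposure levels can be recomputed from the infected and total counts, using the group with no injections as reference. This is a sketch with illustrative labels. Worked from the unrounded counts, the ratios come out at about 18 and 6.3, so the 22 and 8 shown in the table appear to be ratios of the rounded percentages (22% / 1% and 8% / 1%).

```python
# (exposure level, infected, total); labels are illustrative
levels = [
    ("more than 10 injections", 9, 41),
    ("10 or fewer injections", 4, 52),
    ("no injections", 1, 82),  # reference group
]

_, ref_infected, ref_total = levels[-1]
ref_prevalence = ref_infected / ref_total

prevalence_ratios = {}
for label, infected, total in levels:
    prevalence = infected / total
    prevalence_ratios[label] = prevalence / ref_prevalence
    print(f"{label:<24} prevalence = {prevalence:.1%}, "
          f"PR = {prevalence_ratios[label]:.1f}")
```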
Advantages of cross-sectional surveys
Fairly quick
Easy to perform
Less expensive
Adapted to chronic diseases
Limitations of cross-sectional surveys
Limited capacity to document causality (exposure and outcome measured at the same time; difficult to establish time sequence of events)
Not useful to study disease etiology
Not suitable for the study of rare / short diseases
Not adapted to severe / acute diseases
Not adapted to incidence measurement.
Limitations of causal inference in analytical cross sectional studies
• Prevalent cases
• Exposure and outcome examined at the same time
Ideally, we would compare the incidence rate in the exposed population with the rate that would have been observed in the same population, at the same time, had it not been exposed
[Figure, built up over four slides: a source population divided into exposed and unexposed; cases arise from the source population; controls are drawn from it as a sample]
Controls:
Sample of the denominator
Representative with regard to exposure
Intuitively, if the frequency of exposure is higher among cases than among controls, then the incidence rate will probably be higher among exposed than among non-exposed
Exposure
?
?
Disease
Controls
Retrospective nature
Distribution of cases and controls according to exposure in a case control study
            Exposed   Not exposed   Total   % exposed
Cases          a           c        a + c    a/(a+c)
Controls       b           d        b + d    b/(b+d)
Distribution of myocardial infarction by oral contraceptive use in cases and controls
                         Oral contraceptives
                         Yes    No     Total   % exposed
Myocardial infarction    693    307    1000     69.3%
Controls                 320    680    1000     32%
Distribution of myocardial infarction by amount of physical activity in cases and controls
                         Physical activity
                         >= 2500 Kcal   < 2500 Kcal   Total   % exposed
Myocardial infarction        190            176        366      51.9%
Controls                     230            136        366      62.8%
Volvo factory, Sweden, 3000 employees
Cohort study
200 cases of gastroenteritis
            Water consumption
            YES    NO    Total
Cases       150    50     200
Controls     ?      ?     200
Exploratory
New disease
New risk factors
Several exposures
"Fishing expedition"
Analytical
Define a single hypothesis
Dose response
Rate/risk
Rate/risk difference
Rate Ratio/Risk ratio (strength of association)
No calculation of rates/risks
Proportion of exposure
Any way of estimating measures of association?
Odds
$$\text{Odds} = \frac{\text{Probability that an event will happen}}{\text{Probability that an event will not happen}}$$
$$\text{Odds of exposure} = \frac{\text{Probability that cases/controls will be exposed}}{\text{Probability that cases/controls will not be exposed}}$$
                   Cases     Controls
Exposed              a          b
Not exposed          c          d
Total               a+c        b+d
% exposed          a/(a+c)    b/(b+d)
% unexposed        c/(a+c)    d/(b+d)
Odds of exposure    a/c        b/d
Odds ratio
OR = (a/c) / (b/d)
= ad / bc
                   Cases     Controls   Odds ratio
Exposed             50 (a)    20 (b)        4
Not exposed         50 (c)    80 (d)
Total              100       100
Odds of exposure   50/50     20/80
OR= (a/c) / (b/d)
= ad / bc
= (50x80) / (20x50)
= 4
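The worked example can be verified with a small Python sketch that computes the odds ratio both ways, as a ratio of the two odds of exposure and as the cross-product ad / bc. The function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a case-control 2 x 2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    odds_exposure_cases = a / c
    odds_exposure_controls = b / d
    return odds_exposure_cases / odds_exposure_controls


a, b, c, d = 50, 20, 50, 80
or_from_odds = odds_ratio(a, b, c, d)
or_cross_product = (a * d) / (b * c)

print(f"OR as ratio of odds:  {or_from_odds:.1f}")
print(f"OR as cross-product:  {or_cross_product:.1f}")
```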
      Cases   Controls
E       a        b
Ē       c        d
$$\text{Odds ratio} = \frac{a / c}{b / d} = \frac{a \times d}{b \times c}$$
Frequency of chicken consumption in campylobacter cases and controls, Republic of Ireland and Northern Ireland, 2003
                      Cases   Controls   Odds ratio
Ate chicken            181      251         2.1
Did not eat chicken     15       44         Ref
Frequency of contact with a dog in campylobacter cases and controls, Republic of Ireland and Northern Ireland, 2003
Contact with dog   Cases   Controls   Odds ratio
Yes                  29       93        0.40
No                  158      201        Ref
Distribution of myocardial infarction by recent oral contraceptive use in cases and controls
Oral contraceptives   Myocardial infarction   Controls    OR
Yes                         693                 320       4.8
No                          307                 680       Ref.
Total                      1000                1000
Odds of exposure       693/307 = 2.2       320/680 = 0.5
Distribution of myocardial infarction by amount of physical activity in cases and controls
Physical activity     Myocardial infarction   Controls    OR
>= 2500 Kcal                190                 230       0.64
< 2500 Kcal                 176                 136       Ref.
Total                       366                 366
Odds of exposure       190/176 = 1.1       230/136 = 1.7
Distribution of cases of endometrial cancer by oestrogen use in cases and controls
Oestrogen use   Cases   Controls   Odds ratio
High             a1       b1       a1d/b1c
Low              a2       b2       a2d/b2c
None             c        d        Reference
Relation of hepatocellular adenoma to duration of oral contraceptive use in 79 cases and 220 controls
Months of OC use   Cases   Controls   Odds ratio
0-12                  7      121        Ref.
13-36                11       49         3.9
37-60                20       23        15.0
61-84                21       20        18.1
>= 85                20        7        49.7
Total                79      220
Source: Rooks et al. 1979
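The odds ratio for each duration category can be recomputed against the 0-12 months reference group using the cross-product a_i d / b_i c. The sketch below uses the case and control counts from the table. The crude value for the highest category comes out at about 49.4, slightly below the 49.7 shown, which may reflect rounding or adjustment in the original source.

```python
months_of_oc_use = ["0-12", "13-36", "37-60", "61-84", ">= 85"]
cases = [7, 11, 20, 21, 20]
controls = [121, 49, 23, 20, 7]

# Reference category: 0-12 months of use
ref_cases, ref_controls = cases[0], controls[0]

odds_ratios = [
    (a_i * ref_controls) / (b_i * ref_cases)
    for a_i, b_i in zip(cases, controls)
]

for label, or_i in zip(months_of_oc_use, odds_ratios):
    print(f"{label:>6} months: OR = {or_i:.1f}")
```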
Rare diseases
Several exposures
Long latency
Rapidity
Low cost
Small sample size
Available data
No ethical problem
Cannot compute risk directly
Not suitable for rare exposure
Temporal relationship exposure-disease difficult to establish
Biases +++
control selection
recall biases when collecting data
Loss of precision due to sampling
The cohort study is the gold-standard of analytical epidemiology
CASE-CONTROL STUDIES HAVE THEIR PLACE
IN EPIDEMIOLOGY, but if cohort study possible, do not settle for second best
Thank you!
Back-up slides
      Cases   Population denominator
E       a        P1
Ē       c        P0
$$I_1 = a / P_1, \qquad I_0 = c / P_0, \qquad \frac{I_1}{I_0} = \frac{a / P_1}{c / P_0}$$

      Cases   Population sample
E       a        P1 / 10
Ē       c        P0 / 10
$$I_1 = \frac{a}{P_1 / 10}, \qquad I_0 = \frac{c}{P_0 / 10}, \qquad \frac{I_1}{I_0} = \frac{a / P_1}{c / P_0}$$

Source population              Sample
      Cases   Pop.                   Cases   Controls
E       a      P1              E       a        b
Ē       c      P0              Ē       c        d
$$\frac{P_1}{P_0} = \frac{b}{d}$$

$$\frac{I_1}{I_0} = \frac{a / P_1}{c / P_0} = \frac{a \cdot P_0}{c \cdot P_1} = \frac{a \cdot d}{c \cdot b} = \frac{a / c}{b / d}$$
Since $d / b = P_0 / P_1$
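The argument of these back-up slides can be checked numerically: when controls are a sample of the source population that preserves the exposed-to-unexposed ratio, the odds ratio equals the incidence ratio. All numbers below are hypothetical, and the one-in-100 sampling of controls is an assumption made purely for illustration.

```python
# Hypothetical source population
P1, P0 = 10_000, 40_000   # exposed / unexposed persons
a, c = 100, 80            # cases among exposed / unexposed

# Controls: a 1-in-100 sample of the source population,
# representative with regard to exposure (assumption)
one_in = 100
b = P1 // one_in          # exposed controls
d = P0 // one_in          # unexposed controls

incidence_ratio = (a / P1) / (c / P0)   # I1 / I0 from the full population
odds_ratio = (a * d) / (b * c)          # from cases and controls only

print(f"P1/P0 = {P1 / P0:.2f}, b/d = {b / d:.2f}")
print(f"I1/I0 = {incidence_ratio:.2f}")
print(f"OR    = {odds_ratio:.2f}")
```

The sampling fraction cancels out, so any other fraction would give the same odds ratio as long as the controls remain representative with regard to exposure.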