Case-control study
Yimin Zhu
Dept. Epidemiology & Biostatistics
School of Public Health, ZJU
Dec 4, 2023
Outline
• Principle of case-control study
• Design of case-control study
• Data analysis
• Strengths and limitations
Methods in clinical epidemiology
[Figure: classification of study designs]
• Observational study: the investigator observes exposure; no intervention
  - Descriptive study (generates a hypothesis): cross-sectional study (surveys the prevalence), ecological study
  - Analytical study (examines a hypothesis): case-control study, cohort study
• Experimental study: the investigator assigns exposure (intervention) to prove a hypothesis: clinical trial, field trial
Question
In the early 1940s, Alton Ochsner observed that most of the patients on whom he was operating for lung cancer had a history of cigarette smoking. He then hypothesized that cigarette smoking might be linked to lung cancer. Based only on his observations of lung cancer cases, was this conclusion valid? How could this hypothesis be tested?
1. Principle of a case-control study
To test the association between a specific exposure (e.g., smoking) and a certain disease (e.g., lung cancer):
• Recruit two groups of subjects: a case group with the particular health problem or outcome, and a control group without this outcome.
• Investigate past exposure status (before the onset of disease in the cases) in both case and control subjects.
• Compare the rates (or strength) of past exposure between the case and control groups, and then determine whether the exposure could account for the health condition of the cases.
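Principle
[Figure: cases (disease: yes) and controls (disease: no) are drawn from the target population. Looking back at exposure status in the past, a cases are exposed and b are not (exposure rate a/(a+b)); c controls are exposed and d are not (exposure rate c/(c+d)).]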
Overall analysis of a case-control study

            Exposure   Non-exposure   Exposure rate (%)
Cases       A          B              A/(A+B)
Controls    C          D              C/(C+D)

If A/(A+B) ≠ C/(C+D) and the difference is statistically significant, we can infer that the exposure is associated with the risk of the disease of interest.
The smoking status of lung cancer patients and control subjects

                       Smokers   Non-smokers   Total   Smoking rate (%)
Lung cancer patients   688       21            709     97.0
Controls               650       59            709     91.7

$$\chi^2 = \frac{(ad-bc)^2\, n}{(a+b)(c+d)(a+c)(b+d)} = 19.13$$

With 1 degree of freedom: χ² > 3.84, P < 0.05; χ² > 6.63, P < 0.01. Here, P < 0.001.
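The χ² value above can be checked directly from the four cell counts. A minimal Python sketch, assuming the layout used on the slide (a, b = exposed/unexposed cases; c, d = exposed/unexposed controls) and the Pearson statistic without continuity correction; for 1 degree of freedom the upper-tail P value equals erfc(√(χ²/2)). The function names are illustrative only.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table, no continuity correction.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    # For df = 1, chi2 = Z**2, so P(X > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Smoking data from the table above
chi2 = chi2_2x2(688, 21, 650, 59)
p = p_value_df1(chi2)
print(f"chi2 = {chi2:.2f}, P = {p:.1e}")
```

The same helper reproduces the critical values quoted on the slide: `p_value_df1(3.84)` is about 0.05 and `p_value_df1(6.63)` about 0.01.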
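Basic characteristics of a case-control study
• Observational, analytical study
• Retrospective design: inquiry runs from effect to cause, so temporality (exposure preceding disease) is harder to establish
• Uses a comparison (control) group
[Figure: direction of inquiry. From the start point of the study (time 0), past exposure status is determined for cases and controls by looking back in time (years −1 to −7).]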
2. How to design a case-control study
Key steps of a case-control study
• Define and recruit subjects (cases and controls)
• Collect exposure information from the subjects
• Compare the exposure rates between the case and control groups
Selection of cases
• Definition: subjects with the specific disease or outcome of interest. The disease needs a clear definition.
• Diagnostic criteria: set inclusion and exclusion criteria to avoid misclassification.
Example of case selection
Selection of cases
• Representativeness: the cases are one sample of the case population and should be representative of all cases.
[Figure: a sample is drawn from the population; sample statistics (x) are used to infer population parameters such as μ, ρ, RR, or HR.]
• Sources of cases: cases may be recruited from hospitals, clinics, disease registries, screening programs or the community.
• Types of cases: incident cases (newly diagnosed), prevalent cases (people who may have had the disease for some time), and death cases.
Incident cases are preferable to prevalent or death cases because they offer:
1. Lower recall bias
2. Better representativeness
3. Less influence of survival-related factors
4. Better cooperation
5. Lower probability that exposure has changed since disease onset
Control selection
• Definition: subjects without the disease of interest; they may be healthy persons or patients with other diseases.
Controls are not necessarily healthy persons. Patients with "other diseases" should not share risk factors with the disease of interest.
• Representativeness: controls should be a representative subgroup of the same source (target) population that gave rise to the cases.
• Multiple controls: used to increase statistical power when cases (e.g., of a rare disease) are difficult to obtain: n1 cases and n2 controls (n2 > n1).
• Multiple control groups: using more than one control group allows the consistency of the results to be checked.
[Figure: one case group compared with control groups 1, 2 and 3.]
• Sources of controls
1. Population of a defined area
2. Hospital patients (with other diseases)
3. Probability sample of the total population
4. Neighbors, friends or associates of cases
5. Siblings, spouses or other relatives
Two types of case-control study based on the source of subjects
• Population-based case-control study: high representativeness and results easy to extrapolate to the target population, but subjects are difficult to recruit
• Hospital-based case-control study: subjects are easy to recruit, but representativeness is lower
Comparability: cases and controls should have similar distributions of other characteristics, which reduces confounding bias.
Matching: the process of selecting controls so that they are similar to the cases in certain characteristics such as age, sex, race, socioeconomic status, and occupation. Matching factors are usually potential confounding factors.
Two types of matching
1. Frequency matching (group matching)
2. Individual matching
Frequency matching
Control subjects are selected according to the distribution of potential confounding factors in the case group, so that the proportions of controls with certain characteristics are similar to those of the cases (e.g., the same distribution of age, sex and race).
[Figure: grouped case-control study. N1 cases are drawn from the case population and N2 controls from the control population.]
Data in the grouped case-control study

            Exposure   Non-exposure   Total   Exposure rate (%)
Cases       a          b              n1      a/(a+b)
Controls    c          d              n2      c/(c+d)
Example
Green tea consumption and the risk of endometrial cancer: a population-based case-control study in urban Shanghai
• Objective: to assess the effect of tea consumption on the risk of endometrial cancer.
• Methods: a population-based case-control study was conducted in urban Shanghai. Face-to-face interviews were completed for 995 incident cases aged 30–69, identified from Jan 1997 to Dec 2002, and 1,087 female controls frequency-matched to the cases on age. Unconditional logistic regression models were used for the analysis.
Age distribution between case and control groups

Age (years)   Cases, n (%)   Controls, n (%)   Total
<45           104 (10.45)    124 (11.41)       228
45–49         163 (16.38)    172 (15.82)       335
50–54         230 (23.12)    255 (23.46)       485
55–59         156 (15.68)    165 (15.18)       321
60–64         160 (16.08)    186 (17.11)       346
≥65           182 (18.29)    185 (17.02)       367
P value = 0.918

The control group was comparable to the case group in age distribution.
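The P value of 0.918 can be reproduced with a chi-square test of homogeneity on the 6 × 2 table of counts. The slide does not name the test, so this is an assumption. The sketch below uses only the Python standard library: the chi-square tail probability comes from the series expansion of the regularized lower incomplete gamma function, and the helper names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution.

    Computes 1 - P(s, z) with s = df/2, z = x/2, where P is the regularized
    lower incomplete gamma function evaluated by its power series. Precision
    degrades for very large x because of cancellation in 1 - P.
    """
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def chi2_homogeneity(table):
    """Pearson chi-square test for an r x k table of counts; returns (chi2, df, P)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, chi2_sf(chi2, df)


# Age groups (rows) x [cases, controls], counts from the table above
ages = [[104, 124], [163, 172], [230, 255], [156, 165], [160, 186], [182, 185]]
chi2, df, p = chi2_homogeneity(ages)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.3f}")
```

With these counts the statistic is small (about 1.45 on 5 degrees of freedom), which is what "comparable in age distribution" means numerically.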
Individual matching
Control subjects are selected according to the characteristics of each individual case.
[Figure: each case is matched to its own control(s), forming Pair No. 1, Pair No. 2, and so on.]
• Paired case-control study: 1 case : 1 control
• Matched case-control study: 1 case : R controls
R > 1 increases the efficiency of the study when the number of cases is limited.
R            1     2     3     4     5     6     ∞
Efficiency   1     1.33  1.50  1.60  1.67  1.71  2.00

Well suited to etiological studies of rare diseases.
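The efficiency row above follows the formula 2R/(R + 1). A minimal sketch, assuming the usual approximation that the variance of the effect estimate is proportional to (1 + 1/R) when the number of cases is fixed, so efficiency relative to 1:1 pairing is 2 / (1 + 1/R) = 2R/(R + 1):

```python
def relative_efficiency(r: int) -> float:
    """Efficiency of a 1:R matched design relative to 1:1 pairing.

    Assumes the variance of the effect estimate is proportional to
    (1 + 1/R) for a fixed number of cases, giving 2R / (R + 1).
    """
    return 2 * r / (r + 1)


for r in range(1, 7):
    print(r, round(relative_efficiency(r), 2))
```

The gain flattens quickly: going from R = 4 to R = 6 adds only about 0.1, and the limit as R grows is 2, which is why more than about four controls per case is rarely worthwhile.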
Data in a paired case-control study

Type of pair   Case   Control   Number of pairs
1              +      +         A
2              +      -         C
3              -      +         B
4              -      -         D

               Case +   Case -
Control +      A        B
Control -      C        D
Example
• Vaginal cancer is rare, accounting for only about 2% of tumors of the female reproductive system, and adenocarcinoma accounts for only 5–10% of vaginal cancers.
• Vaginal cancer usually occurs as epidermoid carcinoma in women over 50 years of age.
• However, between 1966 and 1969, seven girls aged 15 to 22 years with vaginal adenocarcinoma were seen at the Vincent Memorial Hospital. Such young patients had rarely been recorded before.
• The histological type, age at onset, and clustering of cases all differed from the usual pattern.
Adenocarcinoma of the vagina and maternal ingestion of stilbestrol
• A matched case-control study was used to uncover the risk factors for this cancer.
• Eight patients were recruited as cases (one additional case came from another Boston hospital).
• Four matched controls were selected for each case: females born within five days of the case and on the same type of service as the case.
• Dozens of variables were investigated by personal interview of cases and controls.
Main findings
(For each variable, the first entry is the case and the second is the number of that case's 4 matched controls with the factor.)

Case  Maternal age  Maternal   Bleeding in     Any prior       Estrogen given     Breast     Intrauterine
      (case/ctrl)   smoking    this pregnancy  pregnancy loss  in this pregnancy  feeding    X-ray exposure
1     25 / 32       yes  2/4   no   0/4        yes  1/4        yes  0/4           no   0/4   no   1/4
2     30 / 30       yes  3/4   no   0/4        yes  1/4        yes  0/4           no   1/4   no   0/4
3     22 / 31       yes  1/4   yes  0/4        no   1/4        yes  0/4           yes  0/4   no   0/4
4     33 / 30       yes  3/4   yes  0/4        yes  0/4        yes  0/4           yes  2/4   no   0/4
5     22 / 27       yes  3/4   no   1/4        no   1/4        no   0/4           no   0/4   no   0/4
6     21 / 29       yes  3/4   yes  0/4        yes  0/4        yes  0/4           no   0/4   no   1/4
7     30 / 27       no   3/4   yes  0/4        yes  1/4        yes  0/4           yes  0/4   no   1/4
8     26 / 28       yes  3/4   yes  0/4        yes  0/4        yes  0/4           no   0/4   yes  1/4
Total               7/8 21/32  5/8  1/32       6/8  5/32       7/8  0/32          3/8  3/32  1/8  4/32
χ²                  0          4.52            7.16            23.22              2.35       0.20
P                              <0.05           <0.01           <0.0001
OR                             8.0             10.5            28.0               10.0
Conclusion:
Maternal ingestion of stilbestrol during early pregnancy appears to have enhanced the risk of vaginal adenocarcinoma developing years later in the exposed offspring.
The main roles of case-control studies:
• To screen for risk factors among multiple candidate factors
• To explore the causes of both common and rare diseases
Data collection (exposure measurement)
• Collect information on exposure before the onset of disease.
• Methods: historical records, measurements of environmental or biological samples, interviews, health check-ups.
• Exposure is usually an estimate unless past measurements are available.
• Each factor should have a clear definition, for example:
Never smokers were defined as subjects who had never smoked or
had smoked fewer than 100 cigarettes in his or her lifetime.
Former smokers reported a history of smoking but had stopped at
least 1 year before being diagnosed with lung cancer (or 1 year
before enrollment in the study, for control subjects).
Current smokers were currently smoking or had stopped smoking
less than 1 year before being diagnosed with lung cancer (or less
than 1 year before enrollment in the study, for control subjects).
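Because these three definitions are mutually exclusive, they can be applied as a simple decision rule when coding questionnaire data. A minimal sketch, assuming each record provides the lifetime number of cigarettes and the years since quitting measured to the reference date (diagnosis for cases, enrollment for controls); the function and parameter names are hypothetical:

```python
from typing import Optional


def smoking_status(lifetime_cigarettes: int, years_since_quit: Optional[float]) -> str:
    """Classify a subject as 'never', 'former' or 'current' smoker.

    lifetime_cigarettes: cigarettes smoked over the subject's lifetime.
    years_since_quit: years between quitting and the reference date
        (diagnosis for cases, enrollment for controls); None if still smoking.
    """
    # Never smoked, or fewer than 100 cigarettes in a lifetime
    if lifetime_cigarettes < 100:
        return "never"
    # Stopped at least 1 year before the reference date
    if years_since_quit is not None and years_since_quit >= 1:
        return "former"
    # Still smoking, or stopped less than 1 year before the reference date
    return "current"


print(smoking_status(0, None), smoking_status(5000, 3.0), smoking_status(5000, 0.5))
```

Using the same reference-date rule for cases and controls keeps the exposure window comparable between the two groups.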
Interview with a questionnaire: face to face, by telephone, or over the internet.
Take care to avoid recall bias!
Recall bias

                                              Mothers of infants with     Mothers of infants without    P value
                                              congenital malformations    congenital malformations
Assumed true incidence of infection (%)       15                          15                            1.00
Infections recalled (%)                       60                          10
Infection rate ascertained by interview (%)   9.0                         1.5                           <0.01
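The arithmetic behind the table shows how differential recall alone creates a spurious association. A minimal sketch, assuming recall is the only source of error (no false-positive reports), so the observed rate is the true rate multiplied by the proportion of infections recalled:

```python
def odds(p: float) -> float:
    """Convert a proportion to odds."""
    return p / (1 - p)


true_rate = 0.15          # assumed true infection rate in both groups
recall_cases = 0.60       # proportion of infections recalled by case mothers
recall_controls = 0.10    # proportion of infections recalled by control mothers

# Observed rate = true rate x proportion recalled (no false positives assumed)
observed_cases = true_rate * recall_cases        # 0.09, i.e. 9.0%
observed_controls = true_rate * recall_controls  # 0.015, i.e. 1.5%

true_or = odds(true_rate) / odds(true_rate)
observed_or = odds(observed_cases) / odds(observed_controls)
print(f"true OR = {true_or:.2f}, OR after differential recall = {observed_or:.2f}")
```

The true OR is 1, but the interview data suggest an OR of roughly 6.5, entirely produced by the difference in recall between the groups.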
Sample size estimation
• Adequate sample size and statistical power
For a grouped (unmatched) case-control study, the sample size of each group can be calculated with the following formula:

$$N = \frac{\left(K_\alpha\sqrt{2\bar{p}\bar{q}} + K_\beta\sqrt{p_1q_1 + p_0q_0}\right)^2}{(p_1 - p_0)^2}$$

α, β: the Type I and Type II error rates; K_α, K_β: the corresponding standard normal deviates
p1, p0: the expected exposure rates in the case and control groups
q1 = 1 − p1, q0 = 1 − p0; p̄ = (p0 + p1)/2, q̄ = 1 − p̄
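The formula translates directly into code. A minimal sketch, assuming a two-sided α (so K_α = z at 1 − α/2) and that p1 is derived from the control exposure rate p0 and a target OR through p1 = OR·p0 / (1 + p0(OR − 1)), a common way to specify the expected effect; the example inputs p0 = 0.30 and OR = 2 are hypothetical:

```python
import math
from statistics import NormalDist


def case_control_sample_size(p0: float, p1: float, alpha: float = 0.05, beta: float = 0.10) -> int:
    """Subjects needed per group for an unmatched case-control study.

    p0, p1: expected exposure rates in controls and cases.
    alpha is treated as two-sided; beta = 1 - power.
    """
    k_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    k_beta = NormalDist().inv_cdf(1 - beta)
    p_bar = (p0 + p1) / 2
    q_bar = 1 - p_bar
    numerator = (k_alpha * math.sqrt(2 * p_bar * q_bar)
                 + k_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    # Round up: a fractional subject cannot be recruited
    return math.ceil(numerator / (p1 - p0) ** 2)


def exposure_rate_in_cases(p0: float, odds_ratio: float) -> float:
    """Exposure rate in cases implied by the control rate p0 and a target OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


p0 = 0.30                              # hypothetical exposure rate in controls
p1 = exposure_rate_in_cases(p0, 2.0)   # hypothetical target OR = 2
n = case_control_sample_size(p0, p1)
print(round(p1, 3), n)
```

Smaller differences between p1 and p0 enter the denominator squared, so halving the expected difference roughly quadruples the required sample size.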
Data analysis
Grouped case-control study

            Exposure   Non-exposure   Total   Exposure rate (%)
Cases       a          b              n1      a/(a+b)
Controls    c          d              n2      c/(c+d)

1. Is there a statistically significant difference in exposure between the case and control groups?

$$\chi^2 = \frac{(ad-bc)^2\, n}{(a+b)(c+d)(a+c)(b+d)}$$

χ² > 3.84, P < 0.05; χ² > 6.63, P < 0.01
2. Calculate the strength of association

            +     -
Cases       A     B
Controls    C     D

Odds ratio (OR)
Probability of exposure for cases = A/(A+B)
Odds of exposure for cases = probability of exposure / probability of non-exposure = [A/(A+B)] / [B/(A+B)] = A/B
Similarly, odds of exposure for controls = C/D
The odds ratio (OR) is defined as the odds of exposure for cases divided by the odds of exposure for controls:
OR = (A/B) / (C/D) = AD/BC
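The OR estimates the relative risk (RR); the approximation is good when the disease is rare.
[Figure: OR scale. OR = 1 is the null value; values between 0 and 1 (e.g., 0.5) indicate protective factors, and values above 1 (e.g., 2, 3) indicate risk factors.]
OR < 1: exposure reduces disease risk (protective factor)
OR > 1: exposure increases disease risk (risk factor)
OR = 1: no association between exposure and disease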
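The derivation above can be written as two small functions. A minimal sketch using the slide's cell names (A, B exposed/unexposed cases; C, D exposed/unexposed controls); the `interpret` helper only reads the direction of the point estimate, and whether an OR really differs from 1 still has to be judged with the χ² test or a confidence interval:

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """OR = (A/B) / (C/D) = AD/BC for the 2x2 layout above."""
    odds_cases = a / b      # odds of exposure among cases
    odds_controls = c / d   # odds of exposure among controls
    return odds_cases / odds_controls


def interpret(or_value: float) -> str:
    """Direction of the point estimate only (no significance test)."""
    if or_value > 1:
        return "risk factor"
    if or_value < 1:
        return "protective factor"
    return "no association"


or_smoking = odds_ratio(688, 21, 650, 59)   # lung cancer data from the earlier table
print(round(or_smoking, 2), interpret(or_smoking))
```
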
The OR value and the strength of association

OR (< 1)     OR (> 1)     Association
0.9–1.0      1.0–1.1      None
0.7–0.8      1.2–1.4      Weak
0.4–0.6      1.5–2.9      Moderate
0.1–0.3      3.0–9.0      Strong
<0.1         >10.0        Very strong
95% confidence interval (95% CI) of the OR (Miettinen's test-based method):

$$\text{OR}^{\left(1 \pm 1.96/\sqrt{\chi^2}\right)}$$
The smoking status of lung cancer patients and healthy controls

              Lung cancer   Control
Smokers       688           650
Non-smokers   21            59
n             709           709

P < 0.01; OR = (688 × 59)/(21 × 650) ≈ 2.97
The risk of lung cancer in smokers was estimated to be nearly 3 times that in non-smokers (about 1.97 times greater). Smoking was associated with an increased risk of lung cancer.
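Applying the test-based formula to these counts gives the confidence interval that goes with the point estimate. A minimal sketch, assuming the uncorrected Pearson χ² is used in the exponent (as in the grouped analysis) and z = 1.96 for a 95% interval:

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def miettinen_ci(or_value: float, chi2: float, z: float = 1.96):
    """Test-based CI: OR ** (1 - z/sqrt(chi2)), OR ** (1 + z/sqrt(chi2))."""
    exponent = z / math.sqrt(chi2)
    return or_value ** (1 - exponent), or_value ** (1 + exponent)


a, b, c, d = 688, 21, 650, 59   # smokers/non-smokers among cases, then among controls
or_value = a * d / (b * c)
chi2 = chi2_2x2(a, b, c, d)
low, high = miettinen_ci(or_value, chi2)
print(f"OR = {or_value:.2f}, chi2 = {chi2:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

The interval (roughly 1.8 to 4.8) excludes 1, which agrees with the significant χ² test.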
Dose-response association

Average daily cigarettes   Lung cancer patients   Control group   OR (95% CI)
0                          7                      61              1 (reference)
1–4                        55                     129             3.7 (1.6–8.6)
5–14                       489                    570             7.5 (3.4–16.5)
15–24                      475                    431             9.6 (4.4–21.2)
25–49                      293                    154             16.6 (7.4–37.1)
50+                        38                     12              27.6 (10.0–76.2)
Total                      1357                   1357
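Each OR in the dose-response table compares one smoking category with non-smokers. The slide does not say how the intervals were computed; Woolf's log-based method, with SE = √(1/a + 1/b + 1/c + 1/d), reproduces them closely, so the sketch below assumes that method:

```python
import math

# Counts from the dose-response table: (category, cases, controls)
rows = [("0", 7, 61), ("1-4", 55, 129), ("5-14", 489, 570),
        ("15-24", 475, 431), ("25-49", 293, 154), ("50+", 38, 12)]

ref_cases, ref_controls = rows[0][1], rows[0][2]   # non-smokers as the reference


def or_with_woolf_ci(cases: int, controls: int, z: float = 1.96):
    """OR versus the reference category with a Woolf (log-based) 95% CI."""
    or_value = (cases * ref_controls) / (controls * ref_cases)
    se = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    log_or = math.log(or_value)
    return or_value, math.exp(log_or - z * se), math.exp(log_or + z * se)


results = {}
for category, cases, controls in rows[1:]:
    results[category] = or_with_woolf_ci(cases, controls)
    or_value, low, high = results[category]
    print(f"{category:>6}: OR {or_value:.1f} ({low:.1f}-{high:.1f})")
```

The steadily rising ORs across categories are the dose-response pattern that supports a causal interpretation; the wide intervals reflect the small reference group (only 7 non-smoking cases).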
Stratification analysis

              Esophageal cancer   Control   Total
Drinking      a                   b         a+b
No drinking   c                   d         c+d
Total         a+c                 b+d       n

OR = ad/bc
Is there a confounding effect from smoking?

Smoking subjects
              Esophageal cancer   Control   Total
Drinking      a1                  b1        a1+b1
No drinking   c1                  d1        c1+d1
Total         a1+c1               b1+d1     N1
OR1 = a1d1/(b1c1)

Non-smoking subjects
              Esophageal cancer   Control   Total
Drinking      a2                  b2        a2+b2
No drinking   c2                  d2        c2+d2
Total         a2+c2               b2+d2     N2
OR2 = a2d2/(b2c2)

If the crude OR differs from the stratum-specific OR1 and OR2 (while OR1 and OR2 are similar to each other), smoking has a confounding effect.
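A small numerical example makes the comparison concrete. The counts below are hypothetical, chosen so that drinking has no effect within either smoking stratum but appears harmful in the crude table. The code also computes the Mantel-Haenszel pooled OR, Σ(a·d/n) / Σ(b·c/n), a standard way to summarize stratum-specific ORs that the slide does not mention:

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    return a * d / (b * c)


# Hypothetical counts (a, b, c, d) in the slide's layout:
# a = drinking cases, b = drinking controls, c = non-drinking cases, d = non-drinking controls
strata = {
    "smokers": (40, 20, 20, 10),
    "non-smokers": (10, 20, 30, 60),
}

# Crude table: add the corresponding cells of the two strata
crude = tuple(sum(cells[i] for cells in strata.values()) for i in range(4))
crude_or = odds_ratio(*crude)
stratum_or = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over strata
num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = num / den

print(f"crude OR = {crude_or:.2f}, stratum ORs = {stratum_or}, MH OR = {mh_or:.2f}")
```

Here the crude OR is 1.75 while both stratum-specific ORs (and the pooled OR) are 1.0: smoking is linked to both drinking and esophageal cancer, so the crude association is confounded.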
Paired case-control study (individual matching)

Type of pair   Case   Control   Number of pairs
1              +      +         a
2              +      -         c
3              -      +         b
4              -      -         d

               Case +   Case -
Control +      a        b
Control -      c        d

$$\chi^2 = \frac{(|b-c|-1)^2}{b+c}$$

OR = c/b
95% CI for OR: $\text{OR}^{\left(1 \pm 1.96/\sqrt{\chi^2}\right)}$
Paired case-control study of esophageal cancer
To examine the association between smoking and esophageal cancer with a paired case-control design, how would you select the control subjects?

               Case +   Case -
Control +      55       6
Control -      26       6
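Only the discordant pairs carry information in a paired analysis. A minimal sketch of the McNemar χ² (with continuity correction), OR = c/b, and the test-based CI, assuming the table is read like the deck's paired-data layout (columns = case exposure, rows = control exposure). Under that reading, c = 26 pairs have the case exposed and the control unexposed, and b = 6 pairs the reverse; if the table were transposed, the OR would invert.

```python
import math


def paired_analysis(b: int, c: int, z: float = 1.96):
    """Matched-pair analysis using the slide's labels.

    c: pairs with the case exposed and the control unexposed
    b: pairs with the case unexposed and the control exposed
    Returns (OR = c/b, McNemar chi2 with continuity correction, test-based CI).
    """
    or_value = c / b
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    exponent = z / math.sqrt(chi2)
    return or_value, chi2, (or_value ** (1 - exponent), or_value ** (1 + exponent))


# Assumed orientation: c = 26 (case +, control -), b = 6 (case -, control +)
or_value, chi2, ci = paired_analysis(b=6, c=26)
print(f"OR = {or_value:.2f}, chi2 = {chi2:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

The 55 concordant exposed pairs and 6 concordant unexposed pairs do not enter any of the three quantities.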
Advantages of case-control studies
• Requires relatively little time, a smaller sample size and lower cost; easy to complete, since no follow-up is involved
• Most efficient design for rare diseases (efficiency can be raised by enlarging the number of control subjects)
• Multiple risk factors can be studied simultaneously, so the design is often used to screen risk factors for complex diseases such as cancer and diabetes
Limitations of case-control studies
• The time sequence between exposure and outcome is sometimes uncertain
• Risk (incidence) and the rate ratio cannot be estimated directly; the OR serves as a proxy
• Possible bias in the selection of subjects and in the measurement of risk factors (e.g., recall bias)
What's the difference between a case-control study and a cohort study?

                              Case-control study                    Cohort study
Classification of methods     Observational/analytical              Observational/analytical
Intervention                  No                                    No
Criterion for grouping        Disease                               Exposure
Groups compared               Cases/controls                        Exposed/unexposed
Temporality                   Retrospective                         Prospective
Causal sequence               From effect to cause                  From cause to effect
Comparison                    Exposure rates between case and       Incidence rates between exposed and
                              control groups                        unexposed groups
Index of association          Odds ratio                            Relative risk
Purpose in causal             Test hypotheses; rare diseases;       Test hypotheses; one risk factor;
exploration                   multiple risk factors                 common diseases and common factors
Homework
Children with higher birth weights (greater than 8 lbs) are at increased risk for certain childhood cancers. To test this hypothesis with a case-control study:
• How would you select the case and control subjects?
• What is the exposure in this study?
• What data should be collected, and how would you obtain this information?
• Calculate and interpret the main results:

Type of pair   Case   Control   Number of pairs
1              +      +         8
2              +      -         18
3              -      +         7
4              -      -         38

+: birth weight greater than 8 lbs; -: birth weight lower than 8 lbs
Please submit your homework before next Monday!