Analysis of matched case control studies

Arranged marriage
Dr Yvan Hutin – WHO
Competency to be gained from this lecture
Design and analyze a matched case control study
Key elements
• The concept of matching
• The matched analysis
• Pros and cons of matching
Controlling a confounding factor
• Stratification
• Restriction
• Matching
• Randomization
• Multivariate analysis
Matching: Definition
• Creation of a link between cases and controls
• This link is:
  - Based upon common characteristics
  - Created when the study is designed
  - Kept through the analysis
Unmatched control group
• Cases: a bag of cases
• Controls: a bag of controls
Matched control group
• Sets of cases and controls that cannot be dissociated
Matching: Concept
• Confounding is anticipated
  - Adjustment will be necessary
• Preparation of the strata a priori
  - Recruitment of cases and controls by strata
• To ensure sufficient strata size
Why matching?
• If cases and controls are similar for the matching variables,
• then differences between cases and controls must be explained by factors other than the matching (confounding) variables.
Consequences...
• The problem: confounding
• Is solved with another problem:
  - Introduction of more confounding,
  - so that stratified analysis can eliminate it.
Matching criteria
• Potential confounding factors
  - Associated with exposure
  - Associated with the outcome
• Criteria
  - Unique
  - Multiple
  - Always justified
Types of matching strategies
• Frequency matching
  - Large strata
• Set matching
  - Small strata
  - Sometimes very small (1:1 pairs)
Example of frequency matching: Attack rate of acute hepatitis (E) by residence, Baripada, Orissa, India, 2004
[Map. Legend: attack rate per 1000 (0-0.9, 1-9.9, 10-19.9, 20+); water supply (underground water supply, pump from river bed)]
Frequency matching example
• Residence is a possible confounding factor
  - Associated with disease
  - Probably associated with exposure
• Frequency matching by area
  - In each area, number of controls = number of cases
• Analysis stratified by area
Distribution of cholera cases by households, Parpatia Village, Orissa, India, 2003
[Map. Legend: water sources (tube well, protected well, unprotected well); households with no cases, 1 case (triangle: index case), 2 cases, 3 cases]
Set matching example
• Residence is a possible confounding factor
  - Associated with disease
  - Probably associated with exposure
• Set matching by neighbourhood
  - For each case, two controls in adjacent houses
• Analysis matched by set
Matching: False preconceived ideas
• Matching is necessary for all case control studies
• Matching needs to be done on age and sex
• Matching is a way to adjust the number of controls to the number of cases
Matching: True statements
• Matching can put you in trouble
• Matching can be useful to quickly recruit controls
Calculation of measures of association
• Relative risk
• Odds ratio
  - Approximation of relative risk if disease is rare
  - Formula if unmatched: OR = ad/bc
  - More complex if matched
• Prevalence ratio
Calculation of the odds ratio in a case control study
               Case    Control   Total
Exposed         a        b        N/A
Non-exposed     c        d        N/A
Total          a+c      b+d       N/A
Odds of exposure among cases = a/c
Odds of exposure among controls = b/d
Odds ratio = (a/c) / (b/d) = ad/bc
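The ad/bc formula can be sketched in a few lines of Python. The function name `odds_ratio` and the example counts are illustrative assumptions, not data from the lecture.

```python
def odds_ratio(a, b, c, d):
    """Unmatched odds ratio from a 2 x 2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only
print(odds_ratio(30, 10, 20, 40))  # (30 * 40) / (10 * 20) = 6.0
```

This calculation ignores any matching, so it applies only to an unmatched design.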
Matching strategies and corresponding analysis
• Strategies
  - Frequency matching: large strata
  - Set matching: small strata, sometimes very small (1:1 pairs)
• Analysis
  - Stratified analysis, as for confounding factors
  - If pair matching, then strata size is 2
Mantel-Haenszel adjusted odds ratio applied to stratified analysis
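$$\mathrm{OR}_{M\text{-}H} = \frac{\sum_i \left[ (a_i d_i) / T_i \right]}{\sum_i \left[ (b_i c_i) / T_i \right]}$$
where a_i, b_i, c_i and d_i are the cells of the 2 x 2 table in stratum i and T_i is the total of that stratum.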
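As a minimal sketch, the Mantel-Haenszel adjusted odds ratio can be computed by summing a·d/T and b·c/T over the strata. The function name `mantel_haenszel_or` and the two example strata are assumptions made for illustration.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel adjusted odds ratio.

    strata: iterable of (a, b, c, d) tuples, one 2 x 2 table per stratum, with
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        total = a + b + c + d
        if total == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / total
        denominator += b * c / total
    if denominator == 0:
        raise ZeroDivisionError("no stratum contributes to the denominator")
    return numerator / denominator


# Hypothetical strata, for illustration only
strata = [(10, 5, 5, 10), (8, 4, 2, 6)]
print(round(mantel_haenszel_or(strata), 2))
```

With strata of size 2 (one case and one control), only discordant pairs add to either sum, which is why the pair-matched analysis reduces to a ratio of discordant pairs.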
Matched analysis by set (pairs of 1 case / 1 control)
• Concordant pairs
  - Cases and controls have the same exposure
  - ad = 0 and bc = 0: no input to the calculation

Case and control both exposed:
               Cases   Controls   Total
Exposed          1        1         2
Non-exposed      0        0         0
Total            1        1         2
No effect

Case and control both unexposed:
               Cases   Controls   Total
Exposed          0        0         0
Non-exposed      1        1         2
Total            1        1         2
No effect
Matched analysis by set (pairs of 1 case / 1 control)
• Discordant pairs
  - Cases and controls have different exposures
  - ad or bc is non-zero: input to the calculation

Case exposed, control unexposed:
               Cases   Controls   Total
Exposed          1        0         1
Non-exposed      0        1         1
Total            1        1         2
Positive association

Case unexposed, control exposed:
               Cases   Controls   Total
Exposed          0        1         1
Non-exposed      1        0         1
Total            1        1         2
Negative association
…becomes the matched odds ratio
$$\mathrm{OR}_{M\text{-}H} = \frac{\text{Number of discordant sets with case exposed (control unexposed)}}{\text{Number of discordant sets with control exposed (case unexposed)}}$$
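The ratio of discordant pairs can be sketched as follows. The function name `matched_pairs_or` and the pair counts are hypothetical. Each pair is written as (case exposed, control exposed).

```python
def matched_pairs_or(pairs):
    """Matched odds ratio for 1:1 pairs.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Returns (f, g, f / g), where f counts pairs with only the case exposed
    and g counts pairs with only the control exposed.
    """
    pairs = list(pairs)
    f = sum(1 for case, control in pairs if case and not control)
    g = sum(1 for case, control in pairs if control and not case)
    if g == 0:
        raise ZeroDivisionError("no discordant pair with the control exposed")
    return f, g, f / g


# Hypothetical data: 6 and 2 discordant pairs, 12 concordant pairs
pairs = (
    [(True, False)] * 6
    + [(False, True)] * 2
    + [(True, True)] * 5
    + [(False, False)] * 7
)
print(matched_pairs_or(pairs))  # (6, 2, 3.0)
```

The 12 concordant pairs do not change the result, whatever their number.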
…and the analysis can be done with paper clips!
• Concordant questionnaires: trash
• Discordant questionnaires: on the scale
  - The "exposed case" pairs weigh for a positive association
  - The "exposed control" pairs weigh for a negative association
Analysis of matched case control studies with >1 control per case
• Sort out the sets according to the exposure status of the cases and controls
  Example for 1 case / 2 controls:
  - Sets with case exposed: +/++, +/+-, +/--
  - Sets with case unexposed: -/++, -/+-, -/--
• Count reconstituted case-control pairs for each type of set
• Multiply the number of discordant pairs in each type of set by the number of sets
• Calculate the odds ratio using the f/g formula
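The counting steps above can be sketched as follows, assuming every set has the same number of controls (in that case f/g on reconstituted pairs coincides with the Mantel-Haenszel estimate, since the stratum total is constant). The function name and the counts of sets are hypothetical.

```python
def matched_or_one_to_m(sets):
    """Matched odds ratio for sets of 1 case and M controls.

    sets: iterable of (case_exposed, exposed_controls, total_controls).
    Each case is paired in turn with each of its controls; only the
    discordant reconstituted pairs are counted.
    """
    f = 0  # reconstituted pairs: case exposed, control unexposed
    g = 0  # reconstituted pairs: case unexposed, control exposed
    for case_exposed, exposed_controls, total_controls in sets:
        if case_exposed:
            f += total_controls - exposed_controls
        else:
            g += exposed_controls
    if g == 0:
        raise ZeroDivisionError("no discordant pair with the control exposed")
    return f, g, f / g


# Hypothetical 1:2 study: (case exposed, exposed controls) -> number of sets
counts = {
    (True, 2): 3,   # +/++  no discordant pair
    (True, 1): 4,   # +/+-  1 discordant pair per set
    (True, 0): 5,   # +/--  2 discordant pairs per set
    (False, 2): 1,  # -/++  2 discordant pairs per set
    (False, 1): 3,  # -/+-  1 discordant pair per set
    (False, 0): 6,  # -/--  no discordant pair
}
sets = [(case, k, 2) for (case, k), n in counts.items() for _ in range(n)]
f, g, matched_or = matched_or_one_to_m(sets)
print(f, g, matched_or)  # f = 4*1 + 5*2 = 14, g = 1*2 + 3*1 = 5
```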
The old 2 x 2 table...
               Cases   Controls   Total
Exposed          a        b        L1
Unexposed        c        d        L0
Total            C1       C0       T
Odds ratio: ad/bc
... is difficult to recognize once the data are arranged in pairs!
                          Controls
                     Exposed   Unexposed   Total
Cases   Exposed         e          f         a
        Unexposed       g          h         c
        Total           b          d       P (T/2)
Odds ratio: f/g
The McNemar chi-square
$$\chi^2_{McN} = \frac{(f - g)^2}{f + g}$$
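A minimal sketch of the McNemar chi-square, where f and g are the two counts of discordant pairs from the paired table. The function name and the example counts are assumptions.

```python
def mcnemar_chi2(f, g):
    """McNemar chi-square (no continuity correction) from discordant pairs.

    f: pairs with case exposed and control unexposed
    g: pairs with case unexposed and control exposed
    """
    if f + g == 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    return (f - g) ** 2 / (f + g)


# Hypothetical counts, for illustration only
print(mcnemar_chi2(6, 2))  # (6 - 2)^2 / (6 + 2) = 2.0
```

The statistic has one degree of freedom, so a value above 3.84 corresponds to p < 0.05.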
Analysis of matched case control studies: From concepts to database management
• Link between sets of cases and controls in the study design
• Use of this link in the data analysis
  - Stratification (frequency matching)
  - Matched analysis (set matching)
  - Paired analysis (case control pairs)
• Need to materialize this link in the dataset to prepare the data analysis
The matched set variable
• Extra variable
• Identifier for each case-control grouping
  - Stratum (frequency matching)
  - Set (case control sets)
  - Pair (case control pairs)
• All cases and controls belonging to the same stratum/set/pair share the same value for the matched set variable
• Each stratum/set/pair has a unique value
Database structure for 2 sets of one case and three controls
ID   Set   Status   Age   Sex   SES
1    1     1        12    1     3
2    1     2        13    1     3
3    1     2        11    1     3
4    1     2        10    1     3
5    2     1        42    2     2
6    2     2        44    2     2
7    2     2        40    2     2
8    2     2        41    2     2
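As a sketch of how the matched set variable is used, the rows of the 1:3 example can be grouped by their Set value before any matched analysis. The coding Status 1 = case, 2 = control is an assumption inferred from the example (one Status 1 row per set), and `group_by_set` is a hypothetical helper name.

```python
from collections import defaultdict

# Rows: (ID, Set, Status, Age, Sex, SES)
# Assumed coding: Status 1 = case, Status 2 = control
rows = [
    (1, 1, 1, 12, 1, 3),
    (2, 1, 2, 13, 1, 3),
    (3, 1, 2, 11, 1, 3),
    (4, 1, 2, 10, 1, 3),
    (5, 2, 1, 42, 2, 2),
    (6, 2, 2, 44, 2, 2),
    (7, 2, 2, 40, 2, 2),
    (8, 2, 2, 41, 2, 2),
]


def group_by_set(rows):
    """Group record IDs by matched set, separating cases from controls."""
    sets = defaultdict(lambda: {"cases": [], "controls": []})
    for row_id, set_id, status, *_ in rows:
        key = "cases" if status == 1 else "controls"
        sets[set_id][key].append(row_id)
    return dict(sets)


grouped = group_by_set(rows)
for set_id, members in sorted(grouped.items()):
    print(set_id, members)
```

Checking that each set holds exactly one case and the expected number of controls is a cheap way to catch data entry errors in the set variable.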
Database structure for 4 pairs of one case and one control
ID   Pair   Status   Age   Sex   SES
1    1      1        12    1     3
2    1      2        13    1     3
3    2      1        22    2     7
4    2      2        22    2     7
5    3      1        42    2     2
6    3      2        44    2     2
7    4      1        54    2     1
8    4      2        55    2     1
Matching: Advantages
• Is easy to communicate
• Is useful for strong confounding factors
• Can increase the power of small studies
• Can ease control recruitment
• Is useful if only one factor is studied
• Allows looking for effect modification with matching criteria
Matching: Disadvantages
• Must be understood by the author
• Is deleterious in the absence of confounding
• Can decrease power
• Can complicate control recruitment
• Is limiting if more than one factor is studied
• Does not allow examining the association with the matching criteria
Matching with a variable associated with exposure, but not with illness (overmatching)
• Reduces variability
• Increases the number of concordant pairs
• Has deleterious consequences:
  - If matched analysis: reduction of power
  - If match broken: odds ratio biased towards one
Hidden matching ("crypto-matching")
• Some control recruitment strategies consist de facto in matching
  - Neighbourhood controls
  - Friend controls
• Matching must be identified and taken into account in the analysis
Matching for operational reasons
• Outbreak investigation setting
• Friend or neighbour controls are a common choice
• Advantages:
  - Allows identifying controls fast
  - Will take care of gross confounding factors
  - May result in some overmatching, which places the investigator on "the safe side"
Breaking the match
• Rationale
  - Matching may limit the analysis
  - Matching may have been decided for operational purposes only
• Procedure
  - Conduct matched analysis
  - Conduct unmatched analysis
  - Break the match if the results are unchanged
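The procedure can be sketched for 1:1 pairs by computing both estimates from the same paired table (cells e, f, g, h). The function name and the counts are hypothetical, and how close the two results must be to count as "unchanged" is left to the investigator's judgement; the lecture gives no threshold.

```python
def compare_matched_unmatched(e, f, g, h):
    """Matched and unmatched odds ratios from a table of 1:1 pairs.

    e: both exposed            f: only the case exposed
    g: only the control exposed    h: neither exposed
    """
    matched = f / g
    a, c = e + f, g + h  # exposed and unexposed cases
    b, d = e + g, f + h  # exposed and unexposed controls
    unmatched = (a * d) / (b * c)
    return matched, unmatched


# Hypothetical pair counts, for illustration only
matched, unmatched = compare_matched_unmatched(e=5, f=6, g=2, h=7)
print(round(matched, 2), round(unmatched, 2))
```

In this made-up example the unmatched estimate is closer to 1 than the matched one, so the match would be kept.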
Take home messages
• Matching is a difficult technique
• Matching design means matched analysis
• Matching can always be avoided
Matched analysis for ≥ 1:1 matching for a continuous or integer variable
• Use ANOVA with stratification by the matching variable
• Use conditional logistic regression with one variable
Additional issues
• Stratification
  - Can be done
  - Cut the data set by selection
  - Can even be done on the matching variable
• Handling more confounding
  - Impossible with traditional methods
  - Conditional logistic regression
• Handling > 1 risk factor
  - Matched analysis only looks at them individually
  - Use restriction
  - Use conditional logistic regression
Analysis of matched pairs for a continuous or integer variable
Pair   Value 1   Value 2   Difference (2 - 1)
1      2.45      2.01      -0.44
2      3.33      2.99      -0.34
3      1.25      2.04      +0.79
4      1.56      1.65      +0.09
5      2.71      2.90      +0.19
6      2.09      1.76      -0.33
7      1.98      2.56      +0.58
8      1.42      2.07      +0.65
9      1.73      2.65      +0.92
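The nine value pairs above can be analysed through their within-pair differences. The sketch below uses a paired t statistic, a swapped-in technique chosen here because it is equivalent to ANOVA stratified by pair when each stratum holds two observations. It is not necessarily the calculation shown on the original slide. Which column holds the cases is not identifiable from the slide, so the names `first` and `second` are neutral placeholders.

```python
from math import sqrt
from statistics import mean, stdev

# Values as they appear on the slide; case/control assignment is unknown
first = [2.45, 3.33, 1.25, 1.56, 2.71, 2.09, 1.98, 1.42, 1.73]
second = [2.01, 2.99, 2.04, 1.65, 2.90, 1.76, 2.56, 2.07, 2.65]


def paired_summary(first, second):
    """Within-pair differences, their mean, and the paired t statistic."""
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return diffs, mean_diff, mean_diff / standard_error


diffs, mean_diff, t_stat = paired_summary(first, second)
print([round(d, 2) for d in diffs])
print(round(mean_diff, 3), round(t_stat, 2))
```

The t statistic is compared with a t distribution with n - 1 = 8 degrees of freedom.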