Arranged marriage: Analysis of matched case control studies
Dr Yvan Hutin, WHO

Competency to be gained from this lecture
Design and analyze a matched case control study

Key elements
• The concept of matching
• The matched analysis
• Pros and cons of matching

Controlling a confounding factor
• Stratification
• Restriction
• Matching
• Randomization
• Multivariate analysis

Matching: Definition
• Creation of a link between cases and controls
• This link is:
  - Based upon common characteristics
  - Created when the study is designed
  - Kept through the analysis

Unmatched control group
[Figure: a bag of cases and a separate bag of controls]

Matched control group
[Figure: sets of cases and controls that cannot be dissociated]

Matching: Concept
• Confounding is anticipated
  - Adjustment will be necessary
• Preparation of the strata a priori
• Recruitment of cases and controls
  - By strata
  - To ensure sufficient strata size

Why matching?
• If cases and controls are similar for the matching variables,
• then differences between them must be explained by something other than the matching (confounding) variables.

Consequences
• The problem: confounding
• Is solved with another problem:
  - Introduction of more confounding, so that stratified analysis can eliminate it

Matching criteria
• Potential confounding factors
  - Associated with exposure
  - Associated with the outcome
• Criteria
  - Unique
  - Multiple
  - Always justified

Types of matching strategies
• Frequency matching
  - Large strata
• Set matching
  - Small strata
  - Sometimes very small (1/1: pairs)

Example of frequency matching
[Map: Attack rate of acute hepatitis (E) by residence, Baripada, Orissa, India, 2004. Legend: attack rate 0-0.9, 1-9.9, 10-19.9 and 20+ per 1000; underground water supply; pump from river bed]

Frequency matching example
• Residence is a possible confounding factor
  - Associated with disease
  - Probably associated with exposure
• Frequency matching by area
  - In each area, number of controls = number of cases
• Analysis stratified by area

[Map: Distribution of cholera cases by households, Parpatia Village, Orissa, India, 2003. Legend: tube well; protected well; unprotected well; household with no cases; household with 1 case (triangle: index); household with 2 cases; household with 3 cases]

Set matching example
• Residence is a possible confounding factor
  - Associated with disease
  - Probably associated with exposure
• Set matching by neighbourhood
  - For each case, two controls in adjacent houses
• Analysis matched by set

Matching: False preconceived ideas
• Matching is necessary for all case-control studies
• Matching needs to be done on age and sex
• Matching is a way to adjust the number of controls to the number of cases

Matching: True statements
• Matching can put you in trouble
• Matching can be useful to quickly recruit controls

Calculation of measures of association
• Relative risk
• Odds ratio
  - Approximation of the relative risk if the disease is rare
  - Formula = ad/bc if unmatched
  - More complex if matched
• Prevalence ratio
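As a minimal sketch of the unmatched formula OR = ad/bc, the function below takes the four cells of a 2 x 2 table, with a = exposed cases, b = exposed controls, c = non-exposed cases and d = non-exposed controls. The function name and the counts are hypothetical and serve only as an illustration; they do not come from the lecture.

```python
def odds_ratio(a, b, c, d):
    """Unmatched odds ratio ad/bc.

    a: exposed cases        b: exposed controls
    c: non-exposed cases    d: non-exposed controls
    """
    if b == 0 or c == 0:
        # ad/bc is undefined when b or c is zero.
        raise ZeroDivisionError("odds ratio undefined when b or c is 0")
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
a, b, c, d = 30, 10, 20, 40
odds_cases = a / c        # odds of exposure among cases
odds_controls = b / d     # odds of exposure among controls
print(odds_cases / odds_controls, odds_ratio(a, b, c, d))
```

Both expressions give the same value because (a/c) / (b/d) = ad/bc. This calculation ignores any matching, so it applies only to an unmatched design.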
Calculation of the odds ratio in a case control study

               Case    Control   Total
Exposed        a       b         N/A
Non-exposed    c       d         N/A
Total          a+c     b+d       N/A

Odds of exposure among cases = a/c
Odds of exposure among controls = b/d
Odds ratio: ad/bc

Matching strategies and corresponding analysis
• Strategies
  - Frequency matching: large strata
  - Set matching: small strata, sometimes very small (1/1: pairs)
• Analysis
  - Stratified analysis, as for confounding factors
  - If pair matching, then strata size is 2

Mantel-Haenszel adjusted odds ratio applying to stratified analysis

$$\mathrm{OR}_{M\text{-}H} = \frac{\sum_i \left[ (a_i d_i) / T_i \right]}{\sum_i \left[ (b_i c_i) / T_i \right]}$$

Matched analysis by set (pairs of 1 case / 1 control)
• Concordant pairs
  - Case and control have the same exposure
  - No ad and bc: no input to the calculation

Both exposed:
               Cases   Controls  Total
Exposed        1       1         2
Non-exposed    0       0         0
Total          1       1         2
No effect

Both non-exposed:
               Cases   Controls  Total
Exposed        0       0         0
Non-exposed    1       1         2
Total          1       1         2
No effect

Matched analysis by set (pairs of 1 case / 1 control)
• Discordant pairs
  - Case and control have different exposures
  - ad's and bc's: input to the calculation

Case exposed, control non-exposed:
               Cases   Controls  Total
Exposed        1       0         1
Non-exposed    0       1         1
Total          1       1         2
Positive association

Case non-exposed, control exposed:
               Cases   Controls  Total
Exposed        0       1         1
Non-exposed    1       0         1
Total          1       1         2
Negative association

…becomes the matched odds ratio
With one case and one control per stratum (T_i = 2), concordant pairs contribute 0 to both sums, each discordant pair with the case exposed adds 1/2 to the numerator, and each discordant pair with the control exposed adds 1/2 to the denominator. The Mantel-Haenszel formula therefore reduces to:

$$\mathrm{OR}_{M\text{-}H} = \frac{\text{Discordant sets, case exposed}}{\text{Discordant sets, control exposed}}$$

…and the analysis can be done with paper clips!
• Concordant questionnaires: trash
• Discordant questionnaires: on the scale
  - The "exposed case" pairs weigh for a positive association
  - The "exposed control" pairs weigh for a negative association

Analysis of matched case control studies with >1 control per case
• Sort out the sets according to the exposure status of the cases and controls
  Example for 1 case / 2 controls:
  - Sets with case exposed: +/++, +/+-, +/--
  - Sets with case unexposed: -/++, -/+-, -/--
• Count reconstituted case-control pairs for each type of set
• Multiply the number of discordant pairs in each type of set by the number of sets
• Calculate the odds ratio using the f/g formula

The old 2 x 2 table...

               Cases   Controls  Total
Exposed        a       b         L1
Unexposed      c       d         L0
Total          C1      C0        T

Odds ratio: ad/bc

... is difficult to recognize!
                     Controls
Cases          Exposed   Unexposed   Total
Exposed        e         f           a
Unexposed      g         h           c
Total          b         d           P (T/2)

Odds ratio: f/g

The McNemar chi-square

$$\chi^2_{\text{McN}} = \frac{(f - g)^2}{f + g}$$

Analysis of matched case control studies: From concepts to database management
• Link between sets of cases and controls in the study design
• Use of this link in the data analysis
  - Stratification (frequency matching)
  - Matched analysis (set matching)
  - Paired analysis (case control pairs)
• Need to materialize this link in the dataset to prepare the data analysis

The matched set variable
• Extra variable
• Identifier for each case-control grouping
  - Stratum (frequency matching)
  - Set (case control sets)
  - Pair (case control pairs)
• All cases and controls belonging to the same stratum / set / pair share the same value for the matched set variable
• Each stratum / set / pair has a unique value

Database structure for 2 sets of one case and three controls

ID   Set   Status   Age   Sex   SES
1    1     1        12    1     3
2    1     2        13    1     3
3    1     2        11    1     3
4    1     2        10    1     3
5    2     1        42    2     2
6    2     2        44    2     2
7    2     2        40    2     2
8    2     2        41    2     2

(The layout implies Status 1 = case, 2 = control.)

Database structure for 4 pairs of one case and one control

ID   Pair   Status   Age   Sex   SES
1    1      1        12    1     3
2    1      2        13    1     3
3    2      1        22    2     7
4    2      2        22    2     7
5    3      1        42    2     2
6    3      2        44    2     2
7    4      1        54    2     1
8    4      2        55    2     1

Matching: Advantages
• Is easy to communicate
• Is useful for strong confounding factors
• Can increase the power of small studies
• Can ease control recruitment
• Is useful if only one factor is studied
• Allows looking for effect modification with matching criteria

Matching: Disadvantages
• Must be understood by the author
• Is deleterious in the absence of confounding
• Can decrease power
• Can complicate control recruitment
• Is limiting if more than one factor is studied
• Does not allow examining the association with the matching criteria

Matching with a variable associated with exposure, but not with illness (Overmatching)
• Reduces variability
• Increases the number of concordant pairs
• Has deleterious consequences:
  - If matched analysis: reduction of power
  - If match broken: odds ratio biased towards one

Hidden matching ("Crypto-matching")
• Some control recruitment strategies amount to de facto matching
  - Neighbourhood controls
  - Friend controls
• Matching must be identified and taken into account in the analysis

Matching for operational reasons
• Outbreak investigation setting
• Friend or neighbour controls are a common choice
• Advantages:
  - Allows identifying controls fast
  - Will take care of gross confounding factors
  - May result in some overmatching, which places the investigator on "the safe side"

Breaking the match
• Rationale
  - Matching may limit the analysis
  - Matching may have been decided for operational purposes only
• Procedure
  - Conduct matched analysis
  - Conduct unmatched analysis
  - Break the match if the results are unchanged

Take home messages
• Matching is a difficult technique
• Matching design means matched analysis
• Matching can always be avoided

Matched analysis for ≥ 1:1 matching for a continuous or integer variable
• Use ANOVA with stratification by the matching variable
• Use conditional logistic regression with one variable

Additional issues
• Stratification
  - Can be done
  - Cut the data set by selection
  - Can even be done on the matching variable
• Handling more confounding
  - Impossible with traditional methods
  - Conditional logistic regression
• Handling > 1 risk factor
  - Matched analysis only looks at them individually
  - Use restriction
  - Use conditional logistic regression

Analysis of matched pairs for a continuous or integer variable

Pair   Value 1   Value 2   Difference (value 2 - value 1)
1      2.45      2.01      -0.44
2      3.33      2.99      -0.34
3      1.25      2.04      +0.79
4      1.56      1.65      +0.09
5      2.71      2.90      +0.19
6      2.09      1.76      -0.33
7      1.98      2.56      +0.58
8      1.42      2.07      +0.65
9      1.73      2.65      +0.92
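The within-pair differences above can be analyzed with a paired t-test, sketched below. For 1:1 pairs this is equivalent to ANOVA stratified by the pair identifier (the F statistic equals t squared), but the slide does not name a specific test, so treat the choice as an assumption. The slide data do not show which value belongs to the case and which to the control, so the tuples are labelled only as first and second value; the function name `paired_t` is hypothetical.

```python
import math
import statistics

# (first value, second value) for each matched pair, transcribed from the
# slide. Which member is the case and which the control is an assumption
# left open here: the slide data do not say.
pairs = [
    (2.45, 2.01), (3.33, 2.99), (1.25, 2.04), (1.56, 1.65),
    (2.71, 2.90), (2.09, 1.76), (1.98, 2.56),
    (1.42, 2.07), (1.73, 2.65),
]


def paired_t(pairs):
    """Return mean difference, SD of differences, t statistic, and df."""
    diffs = [second - first for first, second in pairs]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)            # sample SD (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, sd_diff, t_stat, n - 1


mean_diff, sd_diff, t_stat, dof = paired_t(pairs)
print(f"mean difference = {mean_diff:.3f}, t = {t_stat:.3f}, df = {dof}")
```

The t statistic is compared with a t distribution on n - 1 degrees of freedom. The analysis keeps the link between the two members of each pair, in line with the rule that a matching design means a matched analysis.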