Analysis of matched data
HRP 261, 02/02/04
Chapter 9 Agresti: read sections 9.1 and 9.2

Pair Matching: Why match?
Pairing can control for extraneous sources of variability and increase the power of a statistical test.
Match 1 control to 1 case based on potential confounders, such as age, gender, and smoking.

Example
Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin's patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient's. They presented the data as:

                 Tonsillectomy   None
Hodgkin's             41          44
Sib control           33          52

OR = 1.47; chi-square = 1.53 (NS)
From John A. Rice, "Mathematical Statistics and Data Analysis."

Example
But several letters to the editor pointed out that those investigators had made an error by ignoring the pairings. These are not independent samples because the sibs are paired. It is better to analyze the data like this:

                          Control (sibling)
Case (patient)        Tonsillectomy   None   Total
Tonsillectomy              26          15      41
None                        7          37      44
Total                      33          52      85

OR = 15/7 = 2.14; chi-square = 2.91 (p = .09)
From John A. Rice, "Mathematical Statistics and Data Analysis."

Pair Matching: Agresti example
Match each MI case to an MI control based on age and gender. Ask about history of diabetes to find out if diabetes increases your risk for MI.

                      MI controls
MI cases         Diabetes   No diabetes   Total
Diabetes             9          37          46
No diabetes         16          82          98
Total               25         119         144

Which cells are informative? Just the discordant cells are informative!

Pair Matching
The OR estimate comes only from discordant pairs! The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs "favor" the case, this indicates OR > 1.
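As a rough check on the tonsillectomy example, the Python sketch below recomputes the unpaired and the paired statistics, then applies the same discordant-pair odds ratio to the MI table. The function names are illustrative only. It assumes the unpaired chi-square is the Pearson statistic without continuity correction and the paired chi-square is the discordant-pair statistic (b - c)^2 / (b + c), which reproduce the reported 1.53 and 2.91.

```python
def unpaired_2x2(a, b, c, d):
    """Odds ratio and Pearson chi-square (no continuity correction)
    for an ordinary 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2


def paired_2x2(b, c):
    """Odds ratio and chi-square from discordant pair counts only:
    b = pairs with only the case exposed, c = pairs with only the control exposed."""
    return b / c, (b - c) ** 2 / (b + c)


# Hodgkin's data, ignoring the pairing (rows: patients, sibling controls)
or_unpaired, chi2_unpaired = unpaired_2x2(41, 44, 33, 52)

# Hodgkin's data, respecting the pairing:
# 15 pairs where only the patient had a tonsillectomy, 7 where only the sibling did
or_paired, chi2_paired = paired_2x2(15, 7)

# MI and diabetes: 37 pairs with only the case diabetic, 16 with only the control
or_mi, chi2_mi = paired_2x2(37, 16)

print(f"unpaired: OR={or_unpaired:.2f}, chi-square={chi2_unpaired:.2f}")
print(f"paired:   OR={or_paired:.2f}, chi-square={chi2_paired:.2f}")
print(f"MI pairs: OR={or_mi:.2f}")
```

The concordant pairs never enter `paired_2x2`, which is the point of the slide: only b and c carry information about the odds ratio.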
Among the discordant pairs:

\[
P(\text{"favors" case} \mid \text{discordant pair}) =
\frac{P(E \mid D)\,P(\sim E \mid \sim D)}
     {P(E \mid D)\,P(\sim E \mid \sim D) + P(\sim E \mid D)\,P(E \mid \sim D)}
\]

where
P(E|D) P(~E|~D) = the probability of observing a case-control pair with only the case exposed
P(~E|D) P(E|~D) = the probability of observing a case-control pair with only the control exposed

With b = 37 pairs in which only the case has diabetes and c = 16 pairs in which only the control has diabetes:

\[
\hat{p} = \frac{b}{b+c} = \frac{37}{37+16} = \frac{37}{53}
\]

\[
\text{odds("favors" case} \mid \text{discordant pair)} = \widehat{OR} = \frac{b}{c} = \frac{37}{16}
\]

OR estimate comes only from discordant pairs!!
OR = 37/16 = 2.31
Makes sense!

McNemar's Test
Null hypothesis: P("favors" case | discordant pair) = .5
(note: equivalent to OR = 1.0 or cell b = cell c)

\[
p\text{-value} = \binom{53}{37}(.5)^{37}(.5)^{16} + \binom{53}{38}(.5)^{38}(.5)^{15} + \binom{53}{39}(.5)^{39}(.5)^{14} + \dots
\]

By normal approximation to binomial:

\[
Z = \frac{37 - 53\left(\tfrac{1}{2}\right)}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88; \quad p < .01
\]

McNemar's Test: generally

                 controls
cases         exp     No exp
exp            a        b
No exp         c        d

By normal approximation to binomial:

\[
Z = \frac{b - \frac{b+c}{2}}{\sqrt{(b+c)(.5)(.5)}} = \frac{\frac{b-c}{2}}{\sqrt{\frac{b+c}{4}}} = \frac{b-c}{\sqrt{b+c}}
\]

Equivalently:

\[
\chi^2_1 = \left(\frac{b-c}{\sqrt{b+c}}\right)^2 = \frac{(b-c)^2}{b+c}
\]

95% CI for difference in dependent proportions

\[
Var(p_1 - p_2) = Var(p_1) + Var(p_2) - 2\,Cov(p_1, p_2)
\]

\[
Var(p_{E|D} - p_{E|\sim D}) = \frac{p_{E|D}(1 - p_{E|D})}{n_{cases}} + \frac{p_{E|\sim D}(1 - p_{E|\sim D})}{n_{controls}} - 2\,Cov(p_{E|D}, p_{E|\sim D})
\]

With 144 cases and 144 controls:

\[
= \frac{(.32)(.68) + (.17)(.83) - 2(.06 \times .57 - .26 \times .11)}{144} = .0024
\]

\[
95\%\ CI: \ .32 - .17 = .15 \pm 1.96\sqrt{.0024} = (.05,\ .24)
\]

Each pair is its own "age-gender" stratum

Example: the 144 pairs fall into four types of stratum.

Concordant for exposure (cell "a" from before), x 9:
                Case (MI)   Control
Diabetes            1          1
No diabetes         0          0

Only the case has diabetes, x 37:
                Case (MI)   Control
Diabetes            1          0
No diabetes         0          1

Only the control has diabetes, x 16:
                Case (MI)   Control
Diabetes            0          1
No diabetes         1          0

Neither has diabetes, x 82:
                Case (MI)   Control
Diabetes            0          0
No diabetes         1          1

Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and MI controlling for age and gender. Mantel-Haenszel methods apply.

RECALL: The Mantel-Haenszel Summary Odds Ratio

\[
OR_{MH} = \frac{\sum_{i=1}^{k} \frac{a_i d_i}{T_i}}{\sum_{i=1}^{k} \frac{b_i c_i}{T_i}}
\]

                Case   Control
Exposed           a       b
Not Exposed       c       d

For each type of pair (T = 2):

Stratum type                   Diabetes row      No diabetes row    ad/T    bc/T
                               (case, control)   (case, control)
Both have diabetes                  1, 1              0, 0            0       0
Only the case has diabetes          1, 0              0, 1           1/2      0
Only the control has diabetes       0, 1              1, 0            0      1/2
Neither has diabetes                0, 0              1, 1            0       0

Mantel-Haenszel Summary OR

\[
OR_{MH} = \frac{\sum_{i=1}^{144} \frac{a_i d_i}{2}}{\sum_{i=1}^{144} \frac{b_i c_i}{2}} = \frac{37 \times \frac{1}{2}}{16 \times \frac{1}{2}} = \frac{37}{16}
\]

Mantel-Haenszel Test Statistic (same as McNemar's)

Recall:

\[
E(n_{11k}) = \mu_{11k} = \frac{n_{1+k}\, n_{+1k}}{n_{++k}}, \qquad
Var(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)}
\]

Concordant cells contribute 0.
Discordant cells:

\[
\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad Var(n_{11k}) = \frac{(1)(1)(1)(1)}{2^2(2-1)} = \frac{1}{4}
\]

\[
\chi^2_{CMH} = \frac{\left[\sum_{\text{case-exposed disc. cells}} (.5) \; - \sum_{\text{control-exposed disc. cells}} (.5)\right]^2}{\sum_{\text{disc. cells}} .25}
= \frac{[.5(b) - .5(c)]^2}{(b+c)(.25)} = \frac{(b-c)^2}{b+c}
\]

Example: Salmonella Outbreak in France, 1993
From: "Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study" BMJ 312: 91-94; Jan 1996.

[Figure: Epidemic Curve]

Matched Case Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for:
• age group (< 1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or >= 65 years)
• gender
• city of residence

Results

In 2x2 table form: any goat's cheese

                          Controls
Cases             Goat's cheese   None   Total
Goat's cheese          23          23      46
None                    6           7      13
Total                  29          30      59

OR = b/c = 23/6 = 3.8

In 2x2 table form: Brand B goat's cheese

                          Controls
Cases             Brand B   None   Total
Brand B               8      24      32
None                  2      25      27
Total                10      49      59

OR = b/c = 24/2 = 12.0

Each pair is its own stratum, giving four types of table:

Both ate Brand B, x 8:
              Case   Control
Brand B         1       1
None            0       0

Only the case ate Brand B, x 24:
              Case   Control
Brand B         1       0
None            0       1

Only the control ate Brand B, x 2:
              Case   Control
Brand B         0       1
None            1       0

Neither ate Brand B, x 25:
              Case   Control
Brand B         0       0
None            1       1

8 concordant exposed:

\[
\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_{++k}} = \frac{2 \times 1}{2} = 1; \qquad
\text{Observed}(n_{11k}) - \mu_{11k} = 1 - 1 = 0
\]

\[
Var(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{2 \times 1 \times 0 \times 1}{4(2-1)} = 0
\]

Summary: 8 concordant-exposed pairs (= strata) contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).

25 concordant unexposed:

\[
\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\, n_{+1k}}{n_{++k}} = \frac{0 \times 1}{2} = 0; \qquad
\text{Observed}(n_{11k}) - \mu_{11k} = 0 - 0 = 0
\]

\[
Var(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{0 \times 1 \times 2 \times 1}{4(2-1)} = 0
\]

Summary: 25 concordant-unexposed pairs contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).

2 discordant cells favor control:

\[
\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad
\text{Observed}(n_{11k}) - \mu_{11k} = 0 - .5 = -.5
\]

\[
Var(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}
\]

Summary: 2 discordant "control-exposed" pairs contribute -.5 each to the numerator (observed - expected = -.5) and .25 each to the denominator (variance = .25).

24 discordant cells favor case:

\[
\mu_{11k} = \frac{(1)(1)}{2} = \frac{1}{2}; \qquad
\text{Observed}(n_{11k}) - \mu_{11k} = 1 - .5 = .5
\]

\[
Var(n_{11k}) = \frac{n_{1+k}\, n_{+1k}\, n_{2+k}\, n_{+2k}}{n_{++k}^2\,(n_{++k} - 1)} = \frac{1 \times 1 \times 1 \times 1}{4(2-1)} = \frac{1}{4}
\]

Summary: 24 discordant "case-exposed" pairs contribute +.5 each to the numerator (observed - expected = +.5) and .25 each to the denominator (variance = .25).
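The four summaries above can be tallied mechanically. The following Python sketch treats every Brand B pair as its own stratum of size 2, accumulates observed minus expected and the hypergeometric variance, and compares the result with the discordant-pair statistic (b - c)^2 / (b + c). The function name and the dictionary layout are hypothetical choices made for this illustration.

```python
def cmh_from_pairs(pair_counts):
    """Mantel-Haenszel chi-square for 1:1 matched pairs.

    pair_counts maps (case_exposed, control_exposed), each 0 or 1,
    to the number of pairs (strata) of that type.
    """
    numerator = 0.0
    denominator = 0.0
    for (case_exposed, control_exposed), count in pair_counts.items():
        n_exposed = case_exposed + control_exposed   # row total n_{1+k}
        n_unexposed = 2 - n_exposed                  # row total n_{2+k}
        n_cases, n_controls, n_total = 1, 1, 2       # column totals and stratum size
        expected = n_exposed * n_cases / n_total
        variance = (n_exposed * n_cases * n_unexposed * n_controls) / (
            n_total ** 2 * (n_total - 1)
        )
        numerator += count * (case_exposed - expected)
        denominator += count * variance
    return numerator ** 2 / denominator


# Brand B goat's cheese: 8 both exposed, 24 case only, 2 control only, 25 neither
brand_b_pairs = {(1, 1): 8, (1, 0): 24, (0, 1): 2, (0, 0): 25}

cmh = cmh_from_pairs(brand_b_pairs)
mcnemar = (24 - 2) ** 2 / (24 + 2)
print(f"CMH = {cmh:.2f}, McNemar = {mcnemar:.2f}")
```

Changing the counts for the two concordant keys leaves `cmh` unchanged, since those strata add zero to both sums.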
Putting the four contributions together:

\[
\chi^2_{CMH} = \frac{[8(0) + 25(0) + 24(.5) + 2(-.5)]^2}{0 + 0 + 24(.25) + 2(.25)}
= \frac{22^2(.25)}{26(.25)} = \frac{22^2}{26} = \frac{(24-2)^2}{24+2} = \frac{(b-c)^2}{b+c}
\]

M:1 matched studies
One-to-one pair matching provides the most cost-effective design when cases and controls are equally scarce.
But when cases are the limiting factor, as with rare diseases, statistical power may be increased by selecting more than 1 control matched to each case.
But with diminishing returns…

2:1 matched study of colorectal cancer.
Background: Carcinoembryonic antigen (CEA) is the classical tumor marker for colorectal cancer. This study investigated whether the plasma levels of carcinoembryonic antigen and/or CA 242 were elevated BEFORE clinical diagnosis of colorectal cancer.
From: Palmqvist R et al. Prediagnostic Levels of Carcinoembryonic Antigen and CA 242 in Colorectal Cancer: A Matched Case-Control Study. Diseases of the Colon & Rectum. 46(11):1538-1544, November 2003.

Study design: A so-called "nested case-control study."
Idea: Study subjects who were members of an ongoing prospective cohort study in Sweden had given blood at baseline, when they had no disease. Years later, blood can be thawed and tested for the presence of prediagnostic antigens.
Key innovation: The cohort is large, the disease is rare, and it's too costly to test everyone's blood; so only test stored blood of cases and matched controls from the cohort.

Two cancer-free controls were randomly selected for each case from the corresponding cohort at the time of diagnosis of the matched case.
Matched for:
• gender
• age at recruitment (±12 months)
• date of blood sampling (±2 months)
• fasting time (<4 hours, 4–8 hours, >8 hours)
2:1 matching:
• stratum = matching group
• 3 subjects per stratum
• 6 possible 2x2 tables…

Each table has the form

            Case (CRC)   Controls
CEA +
CEA -

and the six possibilities, with the number of matched sets observed for each, are:

Table type                                  CEA + row           CEA - row           Matched sets
                                            (case, controls)    (case, controls)
Everyone exposed; non-informative                1, 2                0, 0                  0
Case exposed; 1 control unexposed                1, 1                0, 1                  2
Case exposed; both controls unexposed            1, 0                0, 2                 12
Case unexposed; both controls exposed            0, 2                1, 0                  0
Case unexposed; 1 control exposed                0, 1                1, 1                  1
Everyone unexposed; non-informative              0, 0                1, 2                102

The four middle rows represent all possible discordant tables (either 2 or 1 total exposed).

2 tables with 2 exposed

First table (case unexposed; both controls exposed), 0 observed:
            Case (CRC)   Controls   Total
CEA +           0           2         2
CEA -           1           0         1

Second table (case exposed; 1 control exposed), 2 observed:
            Case (CRC)   Controls   Total
CEA +           1           1         2
CEA -           0           1         1

\[
P(\text{first table}) = (1 - p_{E|D}) \binom{2}{2} p_{E|\sim D}^2 (1 - p_{E|\sim D})^0
\]

\[
P(\text{second table}) = p_{E|D} \binom{2}{1} p_{E|\sim D} (1 - p_{E|\sim D})
\]

\[
\begin{aligned}
P(\text{case exposed} \mid 2 \text{ total exposed})
&= \frac{2\, p_{E|D}\, p_{E|\sim D} (1 - p_{E|\sim D})}{(1 - p_{E|D})\, p_{E|\sim D}^2 + 2\, p_{E|D}\, p_{E|\sim D} (1 - p_{E|\sim D})} \\
&= \frac{2\, p_{E|D} (1 - p_{E|\sim D})}{(1 - p_{E|D})\, p_{E|\sim D} + 2\, p_{E|D} (1 - p_{E|\sim D})} \\
&= \frac{2\, p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D} + 2\, p_{E|D}\, p_{\sim E|\sim D}} \\
&= \frac{2\, \dfrac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}}}{\dfrac{p_{\sim E|D}\, p_{E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}} + 2\, \dfrac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}}}
= \frac{2\,OR}{2\,OR + 1}
\end{aligned}
\]

13 tables with 1 exposed

First table (case exposed; both controls unexposed), 12 observed:
            Case (CRC)   Controls   Total
CEA +           1           0         1
CEA -           0           2         2

Second table (case unexposed; 1 control exposed), 1 observed:
            Case (CRC)   Controls   Total
CEA +           0           1         1
CEA -           1           1         2

\[
P(\text{first table}) = p_{E|D} \binom{2}{0} p_{E|\sim D}^0 (1 - p_{E|\sim D})^2
\]

\[
P(\text{second table}) = (1 - p_{E|D}) \binom{2}{1} p_{E|\sim D} (1 - p_{E|\sim D})
\]

\[
\begin{aligned}
P(\text{case exposed} \mid 1 \text{ total exposed})
&= \frac{p_{E|D} (1 - p_{E|\sim D})^2}{p_{E|D} (1 - p_{E|\sim D})^2 + 2 (1 - p_{E|D})\, p_{E|\sim D} (1 - p_{E|\sim D})} \\
&= \frac{p_{E|D}\, p_{\sim E|\sim D}^2}{p_{E|D}\, p_{\sim E|\sim D}^2 + 2\, p_{\sim E|D}\, p_{E|\sim D}\, p_{\sim E|\sim D}} \\
&= \frac{p_{E|D}\, p_{\sim E|\sim D}}{p_{E|D}\, p_{\sim E|\sim D} + 2\, p_{\sim E|D}\, p_{E|\sim D}} \\
&= \frac{\dfrac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}}}{\dfrac{p_{E|D}\, p_{\sim E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}} + 2\, \dfrac{p_{\sim E|D}\, p_{E|\sim D}}{p_{\sim E|D}\, p_{E|\sim D}}}
= \frac{OR}{OR + 2}
\end{aligned}
\]

Summary
P(case exposed | 2 total exposed) = 2OR/(2OR+1)
P(case unexposed | 2 total exposed) = 1 - 2OR/(2OR+1)
P(case exposed | 1 total exposed) = OR/(OR+2)
P(case unexposed | 1 total exposed) = 1 - OR/(OR+2)

Therefore, we can write a likelihood equation for our data that is a function of the OR, and use MLE to solve for the OR.

Applying to example data

\[
\begin{aligned}
P(\text{data} \mid OR)
&= \left(\frac{2\,OR}{2\,OR+1}\right)^{2} \left(1 - \frac{2\,OR}{2\,OR+1}\right)^{0} \left(\frac{OR}{OR+2}\right)^{12} \left(1 - \frac{OR}{OR+2}\right)^{1} \\
&= \left(\frac{2\,OR}{2\,OR+1}\right)^{2} \left(\frac{1}{2\,OR+1}\right)^{0} \left(\frac{OR}{OR+2}\right)^{12} \left(\frac{2}{OR+2}\right)^{1}
\end{aligned}
\]

A little complicated to solve further…

Applying to example data
BD give a simpler, robust estimate of the OR for 2:1 matching:

\[
\widehat{OR} = \frac{1(\#\text{ where 2 total exposed \& case exposed}) + 2(\#\text{ where 1 total exposed \& case exposed})}{2(\#\text{ where 2 total exposed \& 2 controls exposed}) + 1(\#\text{ where 1 total exposed \& control exposed})}
= \frac{1(2) + 2(12)}{2(0) + 1(1)} = 26.0
\]
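Although the likelihood is awkward to solve by hand, it is easy to maximize numerically. The Python sketch below is one possible way to do so, not the method used in the slides: it evaluates the log-likelihood built from the four conditional probabilities in the summary and maximizes it with a plain ternary search, then computes the simple 26.0 estimate for comparison. The function names, the search interval [0.1, 200], and the tolerance are assumptions made for this illustration.

```python
from math import log

# Discordant matched sets from the 2:1 CEA example
N_2EXP_CASE_EXPOSED = 2      # 2 of 3 exposed, the case is one of them
N_2EXP_CASE_UNEXPOSED = 0    # 2 of 3 exposed, both are controls
N_1EXP_CASE_EXPOSED = 12     # 1 of 3 exposed, and it is the case
N_1EXP_CASE_UNEXPOSED = 1    # 1 of 3 exposed, and it is a control


def log_likelihood(odds_ratio):
    """Conditional log-likelihood of the discordant sets as a function of the OR."""
    p_case_given_2 = 2 * odds_ratio / (2 * odds_ratio + 1)
    p_case_given_1 = odds_ratio / (odds_ratio + 2)
    return (
        N_2EXP_CASE_EXPOSED * log(p_case_given_2)
        + N_2EXP_CASE_UNEXPOSED * log(1 - p_case_given_2)
        + N_1EXP_CASE_EXPOSED * log(p_case_given_1)
        + N_1EXP_CASE_UNEXPOSED * log(1 - p_case_given_1)
    )


def ternary_search_max(f, lo, hi, tol=1e-6):
    """Locate the maximum of a function that has a single peak on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


or_mle = ternary_search_max(log_likelihood, 0.1, 200.0)

# Simple estimate from the slide: weights 1 and 2 in the numerator, 2 and 1 in the denominator
or_simple = (1 * N_2EXP_CASE_EXPOSED + 2 * N_1EXP_CASE_EXPOSED) / (
    2 * N_2EXP_CASE_UNEXPOSED + 1 * N_1EXP_CASE_UNEXPOSED
)

print(f"MLE of OR    = {or_mle:.2f}")
print(f"simple OR    = {or_simple:.1f}")
```

For these counts the maximum lies near 25.1, close to the simple estimate of 26.0. Setting the derivative of the log-likelihood to zero gives the quadratic 2 OR^2 - 49 OR - 28 = 0, whose positive root matches the numerical answer.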