Analysis of matched data; plus, diagnostic testing

Correlated Observations
Correlated data arise when pairs or clusters of observations are related, which makes them more similar to each other than to other observations in the dataset. Ignoring the correlation will:
– overestimate p-values for within-person or within-cluster comparisons
– underestimate p-values for between-person or between-cluster comparisons

Pair Matching: Why match?
Pairing can control for extraneous sources of variability and increase the power of a statistical test. Match one control to each case based on potential confounders, such as age, gender, and smoking.

Example
Johnson and Johnson (NEJM 287: 1122-1125, 1972) selected 85 Hodgkin's patients who had a sibling of the same sex who was free of the disease and whose age was within 5 years of the patient's. They presented the data as:

                 Tonsillectomy   None
Hodgkin's              41          44
Sib control            33          52

OR = 1.47; chi-square = 1.53 (NS)

From John A. Rice, "Mathematical Statistics and Data Analysis."

Example (continued)
However, several letters to the editor pointed out that the investigators had made an error by ignoring the pairing. These are not independent samples, because each sib is paired with a patient. It is better to analyze the data like this:

                                Control
                        Tonsillectomy   None
Case   Tonsillectomy          26          15
       None                    7          37

OR = 15/7 = 2.14; chi-square = 2.91 (p = .09)

Pair Matching: example
Match each MI case to a control without MI based on age and gender. Ask about history of diabetes to find out whether diabetes increases the risk of MI.

                              MI controls
                        Diabetes   No diabetes
MI cases  Diabetes          9          37          46
          No diabetes      16          82          98
                           25         119         144

Which cells are informative? Just the discordant cells (b = 37, c = 16) are informative! The OR estimate comes only from the discordant pairs.

The question is: among the discordant pairs, what proportion are discordant in the direction of the case vs. the direction of the control? If more discordant pairs "favor" the case, this indicates OR > 1.

$$P(\text{"favors" case} \mid \text{discordant pair}) = \hat p = \frac{b}{b+c} = \frac{37}{37+16} = \frac{37}{53}$$

$$\text{odds}(\text{"favors" case} \mid \text{discordant pair}) = \frac{b}{c} = \widehat{OR} = \frac{37}{16} = 2.31$$

Makes sense!

McNemar's Test
Null hypothesis: P("favors" case | discordant pair) = .5 (equivalent to OR = 1.0, or cell b = cell c).

Exact binomial calculation (probability that 37 or more of the 53 discordant pairs favor the case):
$$p\text{-value} = \binom{53}{37}(.5)^{37}(.5)^{16} + \binom{53}{38}(.5)^{38}(.5)^{15} + \binom{53}{39}(.5)^{39}(.5)^{14} + \cdots + \binom{53}{53}(.5)^{53}$$

By normal approximation to the binomial:
$$Z = \frac{37 - 53(.5)}{\sqrt{53(.5)(.5)}} = \frac{10.5}{3.64} = 2.88; \quad p < .01$$

McNemar's Test: in general

                     controls
                  Exp     No exp
cases  Exp         a        b
       No exp      c        d

By normal approximation to the binomial:
$$Z = \frac{b - \frac{b+c}{2}}{\sqrt{(b+c)(.5)(.5)}} = \frac{\frac{b-c}{2}}{\frac{\sqrt{b+c}}{2}} = \frac{b-c}{\sqrt{b+c}}$$

Equivalently:
$$\chi^2_1 = Z^2 = \frac{(b-c)^2}{b+c}$$

McNemar's Test: MI data
$$\chi^2_1 = \frac{(37-16)^2}{53} = \frac{21^2}{53} = 8.32 = 2.88^2; \quad p < .01$$

Example: McNemar's EXACT test
Split-face trial:
– Researchers assigned 56 subjects to apply SPF 85 sunscreen to one side of their faces and SPF 50 to the other before engaging in 5 hours of outdoor sports during mid-day. The outcome is sunburn (yes/no).
– Unit of observation = side of a face
– Are the observations correlated? Yes.
Russak JE et al. JAAD 2010; 62: 348-349.
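A minimal Python sketch (standard library only) of the paired calculations in this section. The function names are hypothetical, and the cell labels follow the matched-pair table above (a = both exposed, b = only the case exposed, c = only the control exposed, d = neither). It reproduces the discordant-pair OR and McNemar's statistic for the paired Hodgkin's and MI/diabetes tables, the exact two-sided p-value for the split-face pairs (b = 0, c = 7 in the results that follow), and the 95% CI for a difference in dependent proportions using the variance formula with the covariance term given further on. As on the slides, no continuity correction is applied.

```python
from math import comb, erfc, sqrt

# Cell labels follow the matched-pair table on the slides (counts of PAIRS):
#   a = both exposed, b = only the case exposed,
#   c = only the control exposed, d = neither exposed.


def discordant_or(b, c):
    """Matched-pair odds ratio; only the discordant pairs contribute."""
    return b / c


def mcnemar_chi2(b, c):
    """McNemar's statistic (b - c)^2 / (b + c), no continuity correction."""
    return (b - c) ** 2 / (b + c)


def mcnemar_normal_p(b, c):
    """Two-sided p-value for Z = (b - c) / sqrt(b + c), Z ~ N(0, 1) under H0."""
    z = (b - c) / sqrt(b + c)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))


def mcnemar_exact_p(b, c):
    """Two-sided exact p-value: under H0, b ~ Binomial(b + c, 0.5).
    The null distribution is symmetric, so double the smaller tail (cap at 1)."""
    n = b + c
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def paired_diff_ci(a, b, c, d, z=1.96):
    """CI for P(exposed | case) - P(exposed | control) from n matched pairs:
    Var = [p1(1 - p1) + p2(1 - p2) - 2(a*d - b*c)/n^2] / n."""
    n = a + b + c + d
    p1, p2 = (a + b) / n, (a + c) / n
    var = (p1 * (1 - p1) + p2 * (1 - p2) - 2 * (a * d - b * c) / n ** 2) / n
    diff = p1 - p2
    half_width = z * sqrt(var)
    return diff - half_width, diff + half_width


if __name__ == "__main__":
    # Paired Hodgkin's table: b = 15, c = 7
    print("Hodgkin's:", round(discordant_or(15, 7), 2), round(mcnemar_chi2(15, 7), 2),
          round(mcnemar_normal_p(15, 7), 3))
    # MI/diabetes pairs: a = 9, b = 37, c = 16, d = 82
    print("MI:", round(discordant_or(37, 16), 2), round(mcnemar_chi2(37, 16), 2),
          round(mcnemar_normal_p(37, 16), 4), round(mcnemar_exact_p(37, 16), 4))
    lo, hi = paired_diff_ci(9, 37, 16, 82)
    print("MI 95% CI:", round(lo, 2), round(hi, 2))
    # Split-face trial: b = 0, c = 7 discordant subjects
    print("Split-face exact p:", round(mcnemar_exact_p(0, 7), 4))
```

The values to compare against the slides are 2.14 and 2.91 (Hodgkin's), 2.31 and 8.32 (MI), a CI of roughly (0.05, 0.24), and an exact p-value of .0156 for the split-face trial.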
Results ignoring correlation:
Table I. Dermatologist grading of sunburn after an average of 5 hours of skiing/snowboarding (P = .03; Fisher's exact test)

Sun protection factor   Sunburned   Not sunburned
85                          1            55
50                          8            48

Fisher's exact test compares the proportions 1/56 versus 8/56. Note that individuals are being counted twice!

Correct analysis of data:
Table 1. Correct presentation of the data (P = .016; McNemar's exact test)

                                 SPF-50 side
                          Sunburned   Not sunburned
SPF-85 side  Sunburned        1             0
             Not sunburned    7            48

McNemar's exact test: only the 7 discordant pairs are informative.
Null hypothesis: X ~ binomial(n = 7, p = .5)
$$P(X = 0) = \binom{7}{0}(.5)^0(.5)^7 = .0078$$
$$P(X = 7) = \binom{7}{7}(.5)^7(.5)^0 = .0078$$
Two-sided p-value = .0156

RECALL: 95% confidence interval for a difference in INDEPENDENT proportions
The standard error of a proportion can be estimated by:
$$\sqrt{\frac{\hat p(1-\hat p)}{n}}$$
Standard error of the difference of two proportions:
$$\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$
95% confidence interval for the difference between two proportions:
$$(\hat p_1 - \hat p_2) \pm 1.96\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

95% CI for difference in dependent proportions
The variance of the difference of two random variables is the sum of their variances minus 2 × their covariance:
$$\mathrm{Var}(\hat p_1 - \hat p_2) = \mathrm{Var}(\hat p_1) + \mathrm{Var}(\hat p_2) - 2\,\mathrm{Cov}(\hat p_1, \hat p_2)$$
For matched case-control data with n pairs (n cases and n controls), let $p_{E/D}$ be the proportion of cases exposed and $p_{E/\sim D}$ the proportion of controls exposed:
$$\mathrm{Var}(p_{E/D}) = \frac{p_{E/D}(1-p_{E/D})}{n}, \qquad \mathrm{Var}(p_{E/\sim D}) = \frac{p_{E/\sim D}(1-p_{E/\sim D})}{n}$$
$$\mathrm{Cov}(p_{E/\sim D}, p_{E/D}) = \frac{p_{E\&D}\,p_{\sim E\&\sim D} - p_{\sim E\&D}\,p_{E\&\sim D}}{n}$$
where the joint proportions come from the matched-pair table: $p_{E\&D} = a/n$ (both members of the pair exposed), $p_{\sim E\&\sim D} = d/n$ (neither exposed), and $p_{E\&\sim D}$, $p_{\sim E\&D}$ are the two discordant-cell proportions b/n and c/n. Therefore:
$$\mathrm{Var}(p_{E/D} - p_{E/\sim D}) = \frac{p_{E/D}(1-p_{E/D})}{n} + \frac{p_{E/\sim D}(1-p_{E/\sim D})}{n} - \frac{2\,(p_{E\&D}\,p_{\sim E\&\sim D} - p_{\sim E\&D}\,p_{E\&\sim D})}{n}$$

95% CI for difference in dependent proportions: MI/diabetes pairs (n = 144)
$$p_{E/D} - p_{E/\sim D} = \frac{46}{144} - \frac{25}{144} = .32 - .17 = .15$$
$$\mathrm{Var}(p_{E/D} - p_{E/\sim D}) = \frac{\frac{46}{144}\left(1-\frac{46}{144}\right) + \frac{25}{144}\left(1-\frac{25}{144}\right) - 2\left(\frac{9}{144}\cdot\frac{82}{144} - \frac{37}{144}\cdot\frac{16}{144}\right)}{144} = .0024$$
$$95\%\ \text{CI}: 0.15 \pm 1.96\sqrt{.0024} = (0.05,\ 0.24)$$

The connection between McNemar and Cochran-Mantel-Haenszel Tests
View each pair as its own "age-gender" stratum. Each of the 144 MI/diabetes pairs is one of four possible 2x2 tables:

Concordant, both exposed (cell "a" from before), x 9 pairs:
               Case (MI)   Control
Diabetes           1          1
No diabetes        0          0

Discordant, case exposed (cell "b"), x 37 pairs:
               Case (MI)   Control
Diabetes           1          0
No diabetes        0          1

Discordant, control exposed (cell "c"), x 16 pairs:
               Case (MI)   Control
Diabetes           0          1
No diabetes        1          0

Concordant, neither exposed (cell "d"), x 82 pairs:
               Case (MI)   Control
Diabetes           0          0
No diabetes        1          1

Mantel-Haenszel for pair-matched data
We want to know the relationship between diabetes and MI controlling for age and gender (the matching variables). Mantel-Haenszel methods apply.

RECALL: The Mantel-Haenszel Summary Odds Ratio
$$OR_{MH} = \frac{\sum_{i=1}^{k} \frac{a_i d_i}{T_i}}{\sum_{i=1}^{k} \frac{b_i c_i}{T_i}}$$
where $T_i$ is the total of stratum i, and each stratum is laid out as:

               Case   Control
Exposed          a       b
Not exposed      c       d

Contribution of each pair type (T = 2 for every pair):
– Both exposed (x 9): ad/T = 0, bc/T = 0
– Case exposed only (x 37): ad/T = 1/2, bc/T = 0
– Control exposed only (x 16): ad/T = 0, bc/T = 1/2
– Neither exposed (x 82): ad/T = 0, bc/T = 0

Mantel-Haenszel Summary OR
$$OR_{MH} = \frac{\sum_{i=1}^{144} \frac{a_i d_i}{2}}{\sum_{i=1}^{144} \frac{b_i c_i}{2}} = \frac{37 \times \frac12}{16 \times \frac12} = \frac{37}{16} = 2.31$$

Mantel-Haenszel Test Statistic (same as McNemar's)
$$\chi^2_{MH} = \frac{\left[\sum_{k}(a_k - E(a_k))\right]^2}{\sum_{k}\mathrm{Var}(a_k)} \sim \chi^2_1$$
recall:
$$E(a_k) = \frac{(a_k+b_k)(a_k+c_k)}{n_k} = \frac{(\text{row 1})(\text{col 1})}{n_k}$$
$$\mathrm{Var}(a_k) = \frac{(a_k+b_k)(c_k+d_k)(a_k+c_k)(b_k+d_k)}{n_k^2(n_k-1)} = \frac{(\text{row 1})(\text{row 2})(\text{col 1})(\text{col 2})}{n_k^2(n_k-1)}$$

Concordant cells contribute nothing to the Mantel-Haenszel statistic (observed = expected).
Both exposed:
$$E(a_k) = \frac{(2)(1)}{2} = 1, \quad a_k - E(a_k) = 1 - 1 = 0, \quad \mathrm{Var}(a_k) = \frac{(2)(0)(1)(1)}{2^2(2-1)} = 0$$
Neither exposed:
$$E(a_k) = \frac{(0)(1)}{2} = 0, \quad a_k - E(a_k) = 0 - 0 = 0, \quad \mathrm{Var}(a_k) = \frac{(0)(2)(1)(1)}{2^2(2-1)} = 0$$

Discordant cells
Case exposed:
$$E(a_k) = \frac{(1)(1)}{2} = \frac12, \quad a_k - E(a_k) = 1 - \frac12 = \frac12, \quad \mathrm{Var}(a_k) = \frac{(1)(1)(1)(1)}{2^2(2-1)} = \frac14$$
Control exposed:
$$E(a_k) = \frac{(1)(1)}{2} = \frac12, \quad a_k - E(a_k) = 0 - \frac12 = -\frac12, \quad \mathrm{Var}(a_k) = \frac{(1)(1)(1)(1)}{2^2(2-1)} = \frac14$$

For the MI data:
$$\chi^2_{MH} = \frac{[37(.5) + 16(-.5)]^2}{(37+16)(.25)} = \frac{[.5(37-16)]^2}{(53)(.25)} = \frac{.5^2(37-16)^2}{.25(53)} = \frac{(37-16)^2}{53} = 8.32; \quad p < .01$$

In general:
$$\chi^2_{CMH} = \frac{\left[\sum_{\text{case disc. cells}} .5 + \sum_{\text{control disc. cells}} (-.5)\right]^2}{\sum_{\text{disc. cells}} .25} = \frac{[.5(b) - .5(c)]^2}{(b+c)(.25)} = \frac{.5^2(b-c)^2}{.25(b+c)} = \frac{(b-c)^2}{b+c} = \text{McNemar's} \sim \chi^2_1$$

Example: Salmonella Outbreak in France, 1993
From: "Large outbreak of Salmonella enterica serotype paratyphi B infection caused by a goats' milk cheese, France, 1993: a case finding and epidemiological study." BMJ 312: 91-94; Jan 1996.

[Figure: Epidemic curve]

Matched Case-Control Study
Case = Salmonella gastroenteritis.
Community controls (1:1) matched for:
– age group (< 1, 1-4, 5-14, 15-34, 35-44, 45-54, 55-64, or >= 65 years)
– gender
– city of residence

Results
In 2x2 table form: any goat's cheese

                              Controls
                        Goat's cheese   None
Cases  Goat's cheese          23          23        46
       None                    6           7        13
                              29          30        59

OR = b/c = 23/6 = 3.8

In 2x2 table form: brand A goat's cheese

                           Controls
                        Brand A   None
Cases  Brand A             8        24        32
       None                2        25        27
                          10        49        59

OR = b/c = 24/2 = 12.0

Each pair as its own stratum (brand A):

Both exposed, x 8 pairs:
               Case   Control
Brand A          1       1
None             0       0

Case exposed only, x 24 pairs:
               Case   Control
Brand A          1       0
None             0       1

Control exposed only, x 2 pairs:
               Case   Control
Brand A          0       1
None             1       0

Neither exposed, x 25 pairs:
               Case   Control
Brand A          0       0
None             1       1

Using Agresti notation here! $n_{11k}$ = exposed cases in stratum k; $n_{1+k}$, $n_{2+k}$ = row totals; $n_{+1k}$, $n_{+2k}$ = column totals; $n_{++k}$ = stratum total.

8 concordant exposed:
$$E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}} = \frac{2 \cdot 1}{2} = 1, \quad n_{11k} - E(n_{11k}) = 1 - 1 = 0$$
$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^2(n_{++k}-1)} = \frac{2 \cdot 0 \cdot 1 \cdot 1}{4(2-1)} = 0$$
Summary: the 8 concordant-exposed pairs (= strata) contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).

25 concordant unexposed:
$$E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}} = \frac{0 \cdot 1}{2} = 0, \quad n_{11k} - E(n_{11k}) = 0 - 0 = 0$$
$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^2(n_{++k}-1)} = \frac{0 \cdot 2 \cdot 1 \cdot 1}{4(2-1)} = 0$$
Summary: the 25 concordant-unexposed pairs contribute nothing to the numerator (observed - expected = 0) and nothing to the denominator (variance = 0).
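The per-pair bookkeeping in these slides can be checked numerically. This sketch (hypothetical function names; Python standard library only) expands a matched-pair table into one 2x2 stratum per pair, laid out as on the slides (rows: exposed / not exposed; columns: case / control), then applies the Mantel-Haenszel summary OR and the CMH statistic with E(a_k) and Var(a_k) as recalled above. Run on the MI/diabetes pairs and the Salmonella brand A pairs, the CMH statistic should equal McNemar's (b - c)²/(b + c).

```python
def pairs_as_strata(both, case_only, control_only, neither):
    """Expand matched-pair counts into one 2x2 stratum (a, b, c, d) per pair.
    Rows: exposed / not exposed; columns: case / control."""
    return ([(1, 1, 0, 0)] * both            # concordant, both exposed
            + [(1, 0, 0, 1)] * case_only     # discordant, only the case exposed
            + [(0, 1, 1, 0)] * control_only  # discordant, only the control exposed
            + [(0, 0, 1, 1)] * neither)      # concordant, neither exposed


def mantel_haenszel(strata):
    """Return (MH summary OR, CMH chi-square statistic with 1 df)."""
    or_num = or_den = 0.0
    obs_minus_exp = variance = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d                      # stratum total (2 for every pair)
        or_num += a * d / t
        or_den += b * c / t
        expected = (a + b) * (a + c) / t       # (row 1)(col 1) / n_k
        obs_minus_exp += a - expected
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (t ** 2 * (t - 1))
    return or_num / or_den, obs_minus_exp ** 2 / variance


for label, (both, case_only, control_only, neither) in [
    ("MI/diabetes", (9, 37, 16, 82)),
    ("Salmonella, brand A", (8, 24, 2, 25)),
]:
    or_mh, chi2_cmh = mantel_haenszel(pairs_as_strata(both, case_only, control_only, neither))
    mcnemar = (case_only - control_only) ** 2 / (case_only + control_only)
    print(f"{label}: OR_MH = {or_mh:.2f}, CMH = {chi2_cmh:.2f}, McNemar = {mcnemar:.2f}")
```

Looping over every pair is deliberately naive: it mirrors the slides' argument that concordant pairs add zero to both the numerator and the denominator, rather than jumping straight to b and c.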
2 discordant pairs favoring the control:
$$n_{11k} = 0, \quad E(n_{11k}) = \frac{(1)(1)}{2} = .5, \quad n_{11k} - E(n_{11k}) = 0 - .5 = -.5$$
$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^2(n_{++k}-1)} = \frac{1 \cdot 1 \cdot 1 \cdot 1}{4(2-1)} = \frac14$$
Summary: the 2 discordant "control-exposed" pairs contribute -.5 each to the numerator (observed - expected = -.5) and .25 each to the denominator (variance = .25).

24 discordant pairs favoring the case:
$$n_{11k} = 1, \quad E(n_{11k}) = \frac{(1)(1)}{2} = .5, \quad n_{11k} - E(n_{11k}) = 1 - .5 = .5$$
$$\mathrm{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^2(n_{++k}-1)} = \frac{1 \cdot 1 \cdot 1 \cdot 1}{4(2-1)} = \frac14$$
Summary: the 24 discordant "case-exposed" pairs contribute +.5 each to the numerator (observed - expected = +.5) and .25 each to the denominator (variance = .25).

$$\chi^2_{CMH} = \frac{[8(0) + 25(0) + 24(.5) + 2(-.5)]^2}{0 + 0 + 24(.25) + 2(.25)} = \frac{22^2(.25)}{26(.25)} = \frac{22^2}{26} = \frac{(24-2)^2}{26} = \frac{(b-c)^2}{b+c} = 18.6$$

Diagnostic Testing and Screening Tests

Characteristics of a diagnostic test
Sensitivity = probability that, if you truly have the disease, the diagnostic test will catch it (test positive).
Specificity = probability that, if you truly do not have the disease, the test will register negative.

Calculating sensitivity and specificity from a 2x2 table

                             Screening test
                              +        -
Truly have disease    +       a        b       a+b
                      -       c        d       c+d

$$\text{Sensitivity} = \frac{a}{a+b}$$
Among those with true disease, how many test positive?
$$\text{Specificity} = \frac{d}{c+d}$$
Among those without the disease, how many test negative?

Hypothetical Example

                                   Mammography
                                   +        -
Breast cancer (on biopsy)    +     9        1       10
                             -   109      881      990

Sensitivity = 9/10 = .90 (1 false negative out of 10 cases)
Specificity = 881/990 = .89 (109 false positives out of 990)

What factors determine the effectiveness of screening?
– The prevalence (risk) of disease.
– The effectiveness of screening in preventing illness or death:
  – Is the test any good at detecting disease or its precursor (sensitivity of the test)?
  – Is the test detecting a clinically relevant condition?
  – Is there anything we can do if disease (or pre-disease) is detected (cures, treatments)?
  – Does detecting and treating disease at an earlier stage really result in a better outcome?
– The risks of screening, such as false positives and radiation.
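As a quick check of the sensitivity and specificity formulas, here is a small sketch that uses the slides' layout (rows = true disease status, columns = screening test result). The `ScreeningTable` class is a hypothetical helper for illustration; the counts are those of the hypothetical mammography example.

```python
from dataclasses import dataclass


@dataclass
class ScreeningTable:
    """Counts in the slides' 2x2 layout (hypothetical helper):
    rows = true disease status, columns = screening test result.
      a = diseased, test +        b = diseased, test -
      c = not diseased, test +    d = not diseased, test -
    """
    a: int
    b: int
    c: int
    d: int

    @property
    def sensitivity(self) -> float:
        return self.a / (self.a + self.b)  # true positives / all with disease

    @property
    def specificity(self) -> float:
        return self.d / (self.c + self.d)  # true negatives / all without disease


mammo = ScreeningTable(a=9, b=1, c=109, d=881)
print(f"Sensitivity = {mammo.sensitivity:.2f} ({mammo.b} false negative of {mammo.a + mammo.b})")
print(f"Specificity = {mammo.specificity:.2f} ({mammo.c} false positives of {mammo.c + mammo.d})")
```

Because both quantities condition on true disease status (they are row proportions), neither depends on how many diseased people are in the table, which is why they stay fixed when prevalence changes.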
Positive predictive value
The probability that, if you test positive for the disease, you actually have the disease. It depends on the characteristics of the test (sensitivity, specificity) and on the prevalence of disease.

Example: Mammography
Mammography uses ionizing radiation to image breast tissue. The examination is performed by compressing the breast firmly between a plastic plate and an x-ray cassette that contains special x-ray film. Mammography can identify breast cancers too small to detect on physical examination. Early detection and treatment of breast cancer (before metastasis) can improve a woman's chances of survival. Studies show that, among women aged 50-69, screening results in 20-35% reductions in mortality from breast cancer.

Mammography
Controversy exists over the efficacy of mammography in reducing mortality from breast cancer in women aged 40-49. Mammography has a high rate of false-positive tests, which cause anxiety and necessitate further costly diagnostic procedures. Mammography exposes a woman to some radiation, which may slightly increase the risk of mutations in breast tissue.

Example
A 60-year-old woman has an abnormal mammogram; what is the chance that she has breast cancer? That is, what is the positive predictive value?

Calculating PPV and NPV from a 2x2 table

                             Screening test
                              +        -
Truly have disease    +       a        b
                      -       c        d
                             a+c      b+d

$$\text{PPV} = \frac{a}{a+c}$$
Among those who test positive, how many truly have the disease?
$$\text{NPV} = \frac{d}{b+d}$$
Among those who test negative, how many truly do not have the disease?

Hypothetical Example

                                   Mammography
                                   +        -
Breast cancer (on biopsy)    +     9        1
                             -   109      881
                                 118      882

PPV = 9/118 = 7.6%
NPV = 881/882 = 99.9%
Prevalence of disease = 10/1000 = 1%

What if the disease were twice as prevalent in the population?

                                   Mammography
                                   +        -
Breast cancer (on biopsy)    +    18        2       20
                             -   108      872      980

Sensitivity = 18/20 = .90
Specificity = 872/980 = .89
Sensitivity and specificity are characteristics of the test, so they don't change!

What if the disease were more prevalent?
                                   Mammography
                                   +        -
Breast cancer (on biopsy)    +    18        2
                             -   108      872
                                 126      874

PPV = 18/126 = 14.3%
NPV = 872/874 = 99.8%
Prevalence of disease = 20/1000 = 2%

Conclusions
Positive predictive value increases with increasing prevalence of disease, or if the diagnostic test is changed to improve its accuracy.
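The prevalence effect can be made explicit with Bayes' rule, PPV = sens × prev / [sens × prev + (1 - spec)(1 - prev)], which is the same ratio a/(a + c) written in terms of the test characteristics and the prevalence. The sketch below (hypothetical helper names `ppv` and `npv`) first recomputes PPV and NPV directly from the two slide tables, then applies the rounded test characteristics (sensitivity .90, specificity .89) at a few prevalences; the 5% and 10% values are illustrative and do not come from the source.

```python
def ppv(sens, spec, prev):
    """Bayes' rule: P(disease | test +) = true positives / all positives."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens, spec, prev):
    """P(no disease | test -) = true negatives / all negatives."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Directly from the slide tables (PPV = a/(a+c), NPV = d/(b+d))
print(f"Prevalence 1%: PPV = {9 / 118:.1%}, NPV = {881 / 882:.1%}")
print(f"Prevalence 2%: PPV = {18 / 126:.1%}, NPV = {872 / 874:.1%}")

# Rounded slide test characteristics at other prevalences (5% and 10% are illustrative)
for prev in (0.01, 0.02, 0.05, 0.10):
    print(f"Prevalence {prev:.0%}: PPV = {ppv(0.90, 0.89, prev):.1%}, "
          f"NPV = {npv(0.90, 0.89, prev):.1%}")
```

Writing PPV through Bayes' rule separates what belongs to the test (sensitivity, specificity) from what belongs to the population (prevalence), which is why PPV nearly doubles between the two slide tables while sensitivity and specificity stay fixed.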