Comparing Population Parameters (Z-test, t-tests and Chi-Square test)

Dr. M. H. Rahbar
Professor of Biostatistics, Department of Epidemiology
Director, Data Coordinating Center
College of Human Medicine, Michigan State University

Is there an association between Drinking and Lung Cancer?

Suppose a case-control study is conducted to test the above hypothesis.

QUESTION: Is there a difference between the proportion of drinkers among cases and controls?

Group 1 (Disease): P1 = proportion of drinkers
Group 2 (No Disease): P2 = proportion of drinkers

Elements of Testing Hypothesis

• Null hypothesis
• Alternative hypothesis
• Level of significance
• Test statistic
• P-value
• Conclusion

Case Control Study of Drinking and Lung Cancer

Null Hypothesis: There is no association between drinking and lung cancer, P1 = P2 or P1 - P2 = 0.

Alternative Hypothesis: There is some kind of association between drinking and lung cancer, P1 ≠ P2 or P1 - P2 ≠ 0.

Based on the data in the following contingency table, we estimate the proportion of drinkers among those who develop lung cancer and those without the disease.

                 Lung Cancer
Drinker      Case        Control      Total
Yes          A = 33      B = 27          60
No           C = 1667    D = 2273      3940
Total        1700        2300          4000

eP1 = 33/1700 (estimated proportion of drinkers among cases)
eP2 = 27/2300 (estimated proportion of drinkers among controls)

Test Statistic

How many standard deviations has our estimate deviated from the hypothesized value, if the null hypothesis were true?

$$Z = \frac{(eP_1 - eP_2) - 0}{\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) p(1-p)}}$$

where p is the pooled proportion of drinkers:

$$p = \frac{33 + 27}{1700 + 2300} = \frac{60}{4000} = \frac{3}{200} = 0.015$$

$$Z = \frac{(33/1700 - 27/2300) - 0}{\sqrt{\left(\frac{1}{1700} + \frac{1}{2300}\right)(0.015)(0.985)}} \approx 1.97$$

P-value for a two-tailed test

P-value = 2 P[Z > 1.97] = 2(0.024) = 0.048

How does this p-value compare with α = 0.05?

Since p-value = 0.048 < α = 0.05, reject the null hypothesis H0 in favor of the alternative hypothesis Ha.

Conclusion: There is an association between drinking and lung cancer. Is this relationship causal?
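The Z-test arithmetic above can be checked with a short script. This is a minimal sketch using only Python's standard library. The function name `two_proportion_z_test` and its argument names are illustrative assumptions, not part of the original slides. It uses the pooled-proportion standard error from the test statistic formula and the identity 2 P[Z > |z|] = erfc(|z| / sqrt(2)) for the two-tailed p-value.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-tailed p-value).

    x1, x2: number of drinkers in each group (hypothetical argument names)
    n1, n2: group sizes
    """
    p1_hat = x1 / n1                      # eP1, proportion of drinkers among cases
    p2_hat = x2 / n2                      # eP2, proportion of drinkers among controls
    pooled = (x1 + x2) / (n1 + n2)        # p, pooled proportion under H0
    se = math.sqrt((1 / n1 + 1 / n2) * pooled * (1 - pooled))
    z = (p1_hat - p2_hat - 0) / se        # hypothesized difference is 0
    # Two-tailed p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts from the drinking and lung cancer contingency table
z, p_value = two_proportion_z_test(33, 1700, 27, 2300)
print(f"Z = {z:.3f}, two-tailed p-value = {p_value:.3f}")
```

Run on the table's counts, this gives Z of about 1.97 and a two-tailed p-value of about 0.048, which is below α = 0.05.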
Chi-Square Test of Independence (based on a Contingency Table)

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}, \qquad df = (r-1)(c-1)$$

From the following contingency table, estimate the proportion of drinkers among those who develop lung cancer and those without the disease.

                 Lung Cancer
Drinker      Case          Control        Total
Yes          O11 = 33      O12 = 27       R1 = 60
No           O21 = 1667    O22 = 2273     R2 = 3940
Total        C1 = 1700     C2 = 2300      n = 4000

Expected counts (row total × column total / n):

E11 = 1700(60)/4000 = 25.5
E12 = 34.5
E21 = 1674.5
E22 = 2265.5

$$\chi^2_{obs} = \sum_{k=1}^{4} \frac{(\text{Observed}_k - \text{Expected}_k)^2}{\text{Expected}_k} = \frac{(33 - 25.5)^2}{25.5} + \frac{(27 - 34.5)^2}{34.5} + \frac{(1667 - 1674.5)^2}{1674.5} + \frac{(2273 - 2265.5)^2}{2265.5} \approx 3.89$$

with df = (2 - 1)(2 - 1) = 1.

How do we calculate P-value?

• Statistical packages such as SPSS and Epi-Info can be used to calculate the p-value for various tests, including the Chi-Square test
• If the p-value is less than 0.05, then reject the null hypothesis that the row and column variables are independent

Testing Hypothesis When Two Population Means are Compared

H0: μ1 = μ2
Ha: μ1 ≠ μ2

QUESTION: Is there an association between age and Lung Cancer?

Group 1 (Disease): mean age of the cases
Group 2 (No Disease): mean age of the controls

Use Two-sample t-test when both samples are independent

• H0: μ1 = μ2 vs Ha: μ1 ≠ μ2
• H0: μ1 - μ2 = 0 vs Ha: μ1 - μ2 ≠ 0
• t = (difference in sample means - hypothesized difference) / (SE of the difference in means)
• Statistical packages provide p-values and degrees of freedom
• Conclusion: If the p-value is less than 0.05, then reject the equality of the means

Paired t-test for Matched case control study

• H0: μ1 = μ2 vs Ha: μ1 ≠ μ2
• H0: μ1 - μ2 = 0 vs Ha: μ1 - μ2 ≠ 0
• Paired t = (mean of the differences - 0) / (SE of the mean difference)
• Statistical packages provide p-values for the paired t-test
• Conclusion: If the p-value is less than 0.05, then reject the equality of the means
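The chi-square and t statistics described above can be sketched in Python with the standard library alone. Several things here are assumptions rather than content from the slides: the function names, the age lists (hypothetical values invented purely for illustration), and the choice of the pooled-variance (equal variances) form of the two-sample t-test. The chi-square p-value uses the fact that, for 1 degree of freedom, P[χ² > x] = erfc(sqrt(x / 2)). P-values for the t statistics are left to a statistical package, as the slides suggest.

```python
import math
from statistics import mean, stdev


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi2, p_value); the p-value formula is valid for df = 1 only.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))  # SE of the difference in means
    return (mean(a) - mean(b) - 0) / se, na + nb - 2


def paired_t(a, b):
    """Paired t statistic on within-pair differences; returns (t, df)."""
    diffs = [x - y for x, y in zip(a, b)]
    se = stdev(diffs) / math.sqrt(len(diffs))       # SE of the mean difference
    return (mean(diffs) - 0) / se, len(diffs) - 1


# Observed counts from the drinking and lung cancer table
chi2, chi2_p = chi_square_2x2([[33, 27], [1667, 2273]])
print(f"chi-square = {chi2:.2f}, p-value = {chi2_p:.3f}")

# HYPOTHETICAL ages, invented for illustration only (not study data)
cases_age = [61, 58, 66, 70, 63, 59]
controls_age = [55, 60, 57, 62, 54, 58]

t_ind, df_ind = two_sample_t(cases_age, controls_age)
print(f"two-sample t = {t_ind:.3f}, df = {df_ind}")

# Treating the same lists as matched case-control pairs, again only as an illustration
t_pair, df_pair = paired_t(cases_age, controls_age)
print(f"paired t = {t_pair:.3f}, df = {df_pair}")
```

For a 2x2 table the chi-square statistic equals the square of the pooled two-proportion Z statistic, so both tests give the same p-value for the drinking and lung cancer data. Which t-test applies depends on the design: independent samples call for the two-sample form, and matched pairs call for the paired form.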