Test for Independence (multinomial sampling)

Two-Way Contingency Tables: Tests for Independence and Homogeneity

1. Test for Homogeneity (product multinomial sampling):

Example 1 (2X2 table; test for equality of two population proportions, independent samples, both large). Based on the following case-control data:

                 Smoker    Non-Smoker
  Stroke           15          35
  No Stroke         8          12

(a) Please test at the significance level of 0.05 whether smoking is a risk factor for stroke.
(b) Please write up the entire SAS program necessary to answer the question raised in (a), including the data step.

Solution:

(a)
                         Smoker    Non-Smoker    Total
  Stroke (case)            15          35          50
  No Stroke (control)       8          12          20

$$H_0: P(\text{smokers among cases (stroke)}) = P(\text{smokers among controls (no stroke)})$$
$$H_a: P(\text{smokers among cases (stroke)}) \neq P(\text{smokers among controls (no stroke)})$$

With $\hat{p}_1 = 15/50 = 0.30$, $\hat{p}_2 = 8/20 = 0.40$, $n_1 = 50$ and $n_2 = 20$, the pooled proportion is

$$\hat{p} = \frac{15 + 8}{50 + 20} \approx 0.33$$

and the test statistic is

$$Z_0 = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.30 - 0.40}{\sqrt{0.33(1 - 0.33)\left(\frac{1}{50} + \frac{1}{20}\right)}} \approx -0.80$$

$$|Z_0| \approx 0.80 < Z_{0.025} = 1.96$$

We cannot reject $H_0$ at $\alpha = 0.05$ and conclude that, based on the given data, smoking is not confirmed as a risk factor for stroke.

(b) SAS code:

    /* One record per table cell; count holds the cell frequency */
    Data smoking;
    Input stroke $ smoker $ count;
    Datalines;
    stroke yes 15
    stroke no 35
    nostrk yes 8
    nostrk no 12
    ;
    Run;

    /* Weight by count so each record stands for count subjects */
    Proc freq data=smoking;
    Tables stroke*smoker/chisq;
    Weight count;
    Run;

Discussion: We now examine the relation between the two possible Z-tests and the Chi-square test for homogeneity.
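As a numerical sanity check on Example 1, the following Python sketch may help. It is not part of the original SAS solution, and the function names `pooled_z` and `pearson_chi2_2x2` are illustrative. It computes the pooled Z statistic in both directions (conditioning on stroke status, then on smoking status) together with the Pearson chi-square statistic for the same 2X2 table. With these counts the two Z values should agree, and their square should match the chi-square value.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for H0: p1 = p2 (large samples)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Counts from Example 1: rows = stroke / no stroke, columns = smoker / non-smoker
a, b, c, d = 15, 35, 8, 12

z_rows = pooled_z(a, a + b, c, c + d)  # smokers among cases vs. controls
z_cols = pooled_z(a, a + c, b, b + d)  # strokes among smokers vs. non-smokers
chi2 = pearson_chi2_2x2(a, b, c, d)

# Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z_rows) / math.sqrt(2))

print(f"Z (by stroke status)  = {z_rows:.4f}")
print(f"Z (by smoking status) = {z_cols:.4f}")
print(f"Z^2 = {z_rows ** 2:.4f}, chi-square = {chi2:.4f}")
print(f"two-sided p-value = {p_value:.4f}")
```

The p-value uses only `math.erfc`, so no statistics library is assumed. The chi-square value here is the uncorrected Pearson statistic, which is the one that corresponds to the square of the pooled Z.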
The general format of the data is:

               Smoker      Non-Smoker    Total
  Stroke       a (15)      b (35)        a+b (50)
  No stroke    c (8)       d (12)        c+d (20)
  Total        a+c (23)    b+d (47)      a+b+c+d (70)

The two possible Z-tests are given below. The one shown in part (a) is the reasonable choice, but both lead to the same test statistic, which in turn is equivalent to the Chi-square test for homogeneity in this 2X2 table.

First Z-test (compare the proportion of smokers between the two rows):

$$H_0: P(\text{smokers among cases (stroke)}) = P(\text{smokers among controls (no stroke)})$$
$$H_a: P(\text{smokers among cases (stroke)}) \neq P(\text{smokers among controls (no stroke)})$$

$$Z_0 = \frac{\hat{P}_1 - \hat{P}_2 - 0}{\sqrt{\hat{P}(1 - \hat{P})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \overset{H_0}{\sim} N(0,1) \quad \text{(large sample sizes)}$$

Since

$$\hat{P}_1 = \frac{a}{a+b}, \quad \hat{P}_2 = \frac{c}{c+d}, \quad \hat{P} = \frac{a+c}{a+b+c+d}, \quad n_1 = a+b, \quad n_2 = c+d,$$

then

$$Z_0 = \frac{\frac{a}{a+b} - \frac{c}{c+d} - 0}{\sqrt{\frac{a+c}{a+b+c+d} \cdot \frac{b+d}{a+b+c+d}\left(\frac{1}{a+b} + \frac{1}{c+d}\right)}}$$

Second Z-test (compare the proportion of strokes between the two columns):

$$H_0: P(\text{strokes among smokers}) = P(\text{strokes among non-smokers})$$
$$H_a: P(\text{strokes among smokers}) \neq P(\text{strokes among non-smokers})$$

$$Z_0 = \frac{\hat{P}_1^* - \hat{P}_2^* - 0}{\sqrt{\hat{P}^*(1 - \hat{P}^*)\left(\frac{1}{n_1^*} + \frac{1}{n_2^*}\right)}} \overset{H_0}{\sim} N(0,1)$$

Since

$$\hat{P}_1^* = \frac{a}{a+c}, \quad \hat{P}_2^* = \frac{b}{b+d}, \quad \hat{P}^* = \frac{a+b}{a+b+c+d}, \quad n_1^* = a+c, \quad n_2^* = b+d,$$

then

$$Z_0 = \frac{\frac{a}{a+c} - \frac{b}{b+d} - 0}{\sqrt{\frac{a+b}{a+b+c+d} \cdot \frac{c+d}{a+b+c+d}\left(\frac{1}{a+c} + \frac{1}{b+d}\right)}}$$

Both expressions simplify to the same quantity:

$$Z_0 = \cdots = (ad - bc)\sqrt{\frac{a+b+c+d}{(a+c)(b+d)(a+b)(c+d)}}$$

Now we denote the probabilities of the four table cells as follows:

               Smoker            Non-Smoker
  Stroke       p11 (= p1)        p12 (= 1 - p1)
  No stroke    p21 (= p2)        p22 (= 1 - p2)

The original null hypothesis of equal population proportions, $H_0: p_1 = p_2$, is equivalent to the null hypothesis for the homogeneity test:

$$H_0: p_{11} = p_{21}, \quad p_{12} = p_{22}$$

The test statistic is:

$$\chi_0^2 = \frac{[a - \hat{P}(a+b)]^2}{\hat{P}(a+b)} + \frac{[b - (1-\hat{P})(a+b)]^2}{(1-\hat{P})(a+b)} + \frac{[c - \hat{P}(c+d)]^2}{\hat{P}(c+d)} + \frac{[d - (1-\hat{P})(c+d)]^2}{(1-\hat{P})(c+d)}$$

$$= \frac{\left[a - \frac{(a+c)(a+b)}{a+b+c+d}\right]^2}{\frac{(a+c)(a+b)}{a+b+c+d}} + \frac{\left[b - \frac{(b+d)(a+b)}{a+b+c+d}\right]^2}{\frac{(b+d)(a+b)}{a+b+c+d}} + \frac{\left[c - \frac{(a+c)(c+d)}{a+b+c+d}\right]^2}{\frac{(a+c)(c+d)}{a+b+c+d}} + \frac{\left[d - \frac{(b+d)(c+d)}{a+b+c+d}\right]^2}{\frac{(b+d)(c+d)}{a+b+c+d}} = Z_0^2$$

where $\hat{P} = \frac{a+c}{a+b+c+d}$, and

$$\chi_0^2 \overset{H_0}{\sim} \chi^2_{(2-1)(2-1)}$$

[For an $m_1 \times m_2$ table, $df = (m_1 - 1)(m_2 - 1)$.]

The above example can be easily extended to other two-way contingency tables for testing homogeneity as follows:

  # of Children    0      1      2      ≥3
  Stroke           p11    p12    p13    p14
  No stroke        p21    p22    p23    p24

$$H_0: p_{11} = p_{21}, \quad p_{12} = p_{22}, \quad p_{13} = p_{23}, \quad p_{14} = p_{24}$$

Note: The test for homogeneity is indeed a test of independence as well; that is why the two tests have the same Chi-square test statistic.

2. Test for Independence (multinomial sampling):

Example 2. A university conducted a study concerning faculty teaching evaluation classification by students. A sample of 467 faculty is randomly selected, and each person is classified according to rank (Instructor, Assistant Professor, etc.) and teaching evaluation (Above Average, Average, Below Average).

                                          Rank
  Teaching                        Assistant    Associate                         Relative
  Evaluation         Instructor   Professor    Professor    Professor    Sum     Frequency
  Above Average          36           62           45           50       193      0.413
  Average                48           50           35           43       176      0.377
  Below Average          30           13           20           35        98      0.210
  Sum                   114          125          100          128       467      1.000
  Relative Frequency   0.244        0.268        0.214        0.274     1.000

H0: Teaching Evaluation and Rank are independent variables.

                                          Rank
  Teaching                        Assistant    Associate                 Sum     Relative
  Evaluation         Instructor   Professor    Professor    Professor    n_i.    Frequency
  Above Average         p11          p12          p13          p14       193      p1.
  Average               p21          p22          p23          p24       176      p2.
  Below Average         p31          p32          p33          p34        98      p3.
  Sum n_.j              114          125          100          128       467 (n)  1.000
  Relative Frequency    p.1          p.2          p.3          p.4      1.000

The independence assumption:

$$p_{ij} = p_{i\cdot}\, p_{\cdot j} \quad \text{for all } i, j$$

Observed: $n_{ij}$

Expected: $E_{ij} = n\,\hat{p}_{ij} = n\,\hat{p}_{i\cdot}\,\hat{p}_{\cdot j} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{n}$

$$\chi_0^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r-1)(c-1)$$

For this example: r = # rows = 3, c = # cols = 4 (3X4 table).

Expected counts:

                                          Rank
  Teaching                        Assistant    Associate
  Evaluation         Instructor   Professor    Professor    Professor    Sum
  Above Average        47.113       51.660       41.328       52.899     193
  Average              42.964       47.109       37.687       48.240     176
  Below Average        23.923       26.231       20.985       26.861      98
  Sum                  114          125          100          128        467

Individual cell chi-square values:

  Teaching                        Assistant    Associate
  Evaluation         Instructor   Professor    Professor    Professor
  Above Average        2.6215       2.0698       0.3263       0.1589
  Average              0.5904       0.1774       0.1916       0.5692
  Below Average        1.5438       6.6740       0.0462       2.4663

$$\chi_0^2 = 2.62 + \cdots + 2.47 = 17.44 > \chi^2_{6,\,0.95} = 12.59$$

Reject H0. There is evidence of an association between rank and evaluation.

SAS Program:

    data eval;
    input job $ rating $ number;
    datalines;
    Instructor Above 36
    Instructor Average 48
    Instructor Below 30
    Assistant Above 62
    Assistant Average 50
    Assistant Below 13
    Associate Above 45
    Associate Average 35
    Associate Below 20
    Professor Above 50
    Professor Average 43
    Professor Below 35
    ;
    run;

    proc freq data=eval;
    weight number;
    tables job*rating / chisq ;
    run;

SAS output:

    The FREQ Procedure

    Statistics for Table of job by rating

    Statistic                        DF       Value      Prob
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Chi-Square                        6     17.4354    0.0078
    Likelihood Ratio Chi-Square       6     18.7430    0.0046
    Mantel-Haenszel Chi-Square        1     10.8814    0.0010
    Phi Coefficient                          0.1932
    Contingency Coefficient                  0.1897
    Cramer's V                               0.1366

    Sample Size = 467
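The hand calculation for Example 2 can be cross-checked with a short Python sketch. This is an illustration under stated assumptions, not a replacement for the SAS program: the helper names `chi2_independence` and `chi2_sf_even_df` are hypothetical. The p-value helper relies on the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom (here df = 6).

```python
import math


def chi2_independence(table):
    """Return (statistic, df, expected counts) for an r x c table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    # Expected count under independence: E_ij = n_i. * n_.j / n
    expected = [[ri * cj / n for cj in col_sums] for ri in row_sums]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_sums) - 1) * (len(col_sums) - 1)
    return stat, df, expected


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-square with EVEN df only.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form applies only to positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Rows: Above Average, Average, Below Average
# Columns: Instructor, Assistant, Associate, Professor
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]

stat, df, expected = chi2_independence(observed)
p_value = chi2_sf_even_df(stat, df)

print(f"chi-square = {stat:.4f}, df = {df}, p-value = {p_value:.4f}")
for row in expected:
    print("  ".join(f"{e:8.3f}" for e in row))
```

The statistic and p-value can be compared against the Chi-Square row of the PROC FREQ output (17.4354 with 6 degrees of freedom, Prob 0.0078). For tables with odd degrees of freedom a different tail computation would be needed.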
