Stat 101L – Lecture 39

Categorical Data

In Chapter 3 we introduced the idea of categorical data. In Chapter 15 we explored probability rules and when events are independent. In Chapter 26 we put these two ideas together to compare counts.

Categorical Data
National Opinion Research Center’s General Social Survey
In 2006 a sample of 1928 adults in the U.S. was asked the question “When is premarital sex wrong?” The participants were also asked which religion they were affiliated with.

Who?/What?
Who?
– A sample of 1928 adults.
What?
– Attitude towards premarital sex.
– Religious affiliation.

What?
When is premarital sex wrong?
– Categorical: Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All

What?
What is your religious affiliation?
– Categorical: Catholic, Jewish, Protestant, None, Other

When is Premarital Sex Wrong? (observed counts)

Religion    Always Wrong   Almost Always Wrong   Sometimes Wrong   Not Wrong at All   Total
Catholic    83             47                    105               249                484
Jewish      4              2                     9                 20                 35
Protestant  364            97                    190               341                992
None        27             12                    52                219                310
Other       28             8                     20                51                 107
Total       506            166                   376               880                1928

When is Premarital Sex Wrong? (percentages within each religion)

Religion    Always Wrong   Almost Always Wrong   Sometimes Wrong   Not Wrong at All   Total
Catholic    17.2%          9.7%                  21.7%             51.4%              100%
Jewish      11.4%          5.7%                  25.7%             57.2%              100%
Protestant  36.7%          9.8%                  19.1%             34.4%              100%
None        8.7%           3.9%                  16.8%             70.6%              100%
Other       26.2%          7.5%                  18.7%             47.6%              100%

[Mosaic Plot]

Comparing Counts
People who have no religion or are Jewish are more likely to say premarital sex is Not Wrong at All (70.6% and 57.2%, compared with 880/1928 = 45.6% of all respondents). Protestants are much more likely to say premarital sex is Always Wrong (36.7%, compared with 506/1928 = 26.2% overall).

Comparing Counts
Are these differences statistically significant? Or are they due to chance variation, so that religion and attitude towards premarital sex are independent?

Comparing Counts
If religion and attitude towards premarital sex are independent, then $\Pr(A \text{ and } B) = \Pr(A) \cdot \Pr(B)$, where A is a religion category and B is an attitude category.
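The percentages within each religion can be recomputed directly from the observed counts. Below is a minimal Python sketch, assuming the counts are typed in by hand; the names OBSERVED, row_percentages and overall_percentages are hypothetical and used only for illustration. It also prints the overall distribution (column totals divided by 1928), which is what every religion's row would be expected to resemble, apart from chance variation, if religion and attitude were independent. Because of rounding, a few entries may differ from the slide's table by 0.1 percentage point.

```python
# Observed counts typed in from the table above (hypothetical variable names).
# Column order: Always Wrong, Almost Always Wrong, Sometimes Wrong, Not Wrong at All
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def row_percentages(table):
    """Percent of each religion's total that falls in each attitude category."""
    return {
        religion: [100 * count / sum(counts) for count in counts]
        for religion, counts in table.items()
    }


def overall_percentages(table):
    """Percent of all respondents in each attitude category (column total / n)."""
    column_totals = [sum(column) for column in zip(*table.values())]
    n = sum(column_totals)
    return [100 * total / n for total in column_totals]


if __name__ == "__main__":
    for religion, pcts in row_percentages(OBSERVED).items():
        print(f"{religion:<11}", "  ".join(f"{p:5.1f}%" for p in pcts))
    print(f"{'Overall':<11}", "  ".join(f"{p:5.1f}%" for p in overall_percentages(OBSERVED)))
```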
Expected Count
If religion and attitude towards premarital sex are independent, we would expect to see $n \cdot \Pr(A) \cdot \Pr(B)$ people in religion category A and attitude category B, where n = 1928 is the number of people surveyed.

Religion    Always Wrong   Almost Always Wrong   Sometimes Wrong   Not Wrong at All   Total
Catholic                                                                              484
Jewish                                                                                35
Protestant                                                                            992
None                                                                                  310
Other                                                                                 107
Total       506            166                   376               880                1928

Expected Count
Catholic and Always Wrong:
$E = 1928 \times \frac{484}{1928} \times \frac{506}{1928} = \frac{484 \times 506}{1928} = 127.0$
In general, each expected count is (row total × column total) / grand total.

Expected Counts

Religion    Always Wrong   Almost Always Wrong   Sometimes Wrong   Not Wrong at All   Total
Catholic    127.0          41.7                  94.4              220.9              484
Jewish      9.2            3.0                   6.8               16.0               35
Protestant  260.3          85.4                  193.5             452.8              992
None        81.4           26.7                  60.4              141.5              310
Other       28.1           9.2                   20.9              48.8               107
Total       506            166                   376               880                1928

Observed = Expected?
– Take the difference between the observed and expected counts in a cell.
– Square the difference.
– Divide by the expected count.
– Sum up over all the cells.

Chi-square Test Statistic
$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}, \qquad df = (r - 1)(c - 1)$
where O and E are the observed and expected counts in a cell, r is the number of rows and c is the number of columns.

Cell contribution for Catholic and Always Wrong
$\frac{(83 - 127.0)^2}{127.0} = \frac{(-44.0)^2}{127.0} = 15.24$

Test of Independence
H0: Religion and attitude towards premarital sex are independent.
HA: Religion and attitude towards premarital sex are not independent.
$\chi^2 = 184.51$, $df = (5 - 1)(4 - 1) = 12$
P-value < 0.0001

Test of Independence
Because the P-value is so small, we reject the null hypothesis. The data give strong evidence that religion and attitude towards premarital sex are not independent.

Comment
Look at the cells with the largest contributions to the test statistic. None and Always Wrong has far fewer people than expected (27 observed vs. 81.4 expected), and None and Not Wrong at All has many more people than expected (219 vs. 141.5).

Comment
Look at the cells with the largest contributions to the test statistic. Protestant and Always Wrong has many more people than expected (364 vs. 260.3), and Protestant and Not Wrong at All has far fewer people than expected (341 vs. 452.8).

Summary
Protestants are much more likely to say Always Wrong and much less likely to say Not Wrong at All.
People with no religion are much more likely to say Not Wrong at All and much less likely to say Always Wrong.

JMP
Data table: one row per combination of religion and attitude, with Count giving the number of people in that cell.

Religion      Attitude              Count
1 Catholic    1 Always Wrong        83
2 Jewish      1 Always Wrong        4
3 Protestant  1 Always Wrong        364
4 None        1 Always Wrong        27
5 Other       1 Always Wrong        28
...
5 Other       4 Not Wrong at All    51

JMP
Fit Y by X
– Y, Response: Attitude
– X, Factor: Religion
– Freq: Count

Test               ChiSquare   Prob>ChiSq
Likelihood Ratio   193.959     <.0001*
Pearson            184.510     <.0001*

Comment
Remember that JMP only does the calculations for you (Step 3). You have to provide all the other steps in the test of hypothesis.
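The hand calculations for Step 3 (expected counts, cell contributions, the test statistic and its P-value) can be checked with a short Python sketch. It assumes the observed counts are typed in from the table of observed counts; the names ATTITUDES, OBSERVED, expected_counts, cell_contributions, chi_square and chi_square_upper_tail are hypothetical. For the P-value it avoids a statistics library and instead uses the closed form for an even number of degrees of freedom, $P(\chi^2_{df} > x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} \frac{(x/2)^k}{k!}$, which is exact only when df is even (here df = 12).

```python
import math

# Observed counts typed in from the table of observed counts (hypothetical names).
ATTITUDES = ["Always Wrong", "Almost Always Wrong", "Sometimes Wrong", "Not Wrong at All"]
OBSERVED = {
    "Catholic":   [83, 47, 105, 249],
    "Jewish":     [4, 2, 9, 20],
    "Protestant": [364, 97, 190, 341],
    "None":       [27, 12, 52, 219],
    "Other":      [28, 8, 20, 51],
}


def expected_counts(table):
    """E = (row total * column total) / grand total for every cell."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    return {r: [sum(counts) * ct / n for ct in col_totals] for r, counts in table.items()}


def cell_contributions(table):
    """(O - E)^2 / E for each (religion, attitude) cell."""
    expected = expected_counts(table)
    return {
        (religion, attitude): (o - e) ** 2 / e
        for religion, counts in table.items()
        for attitude, o, e in zip(ATTITUDES, counts, expected[religion])
    }


def chi_square(table):
    """Pearson chi-square statistic and df = (r - 1)(c - 1)."""
    stat = sum(cell_contributions(table).values())
    return stat, (len(table) - 1) * (len(ATTITUDES) - 1)


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x); closed form, even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    term = total = 1.0  # the k = 0 term of the sum
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    stat, df = chi_square(OBSERVED)
    print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 184.51, df = 12
    print("P-value < 0.0001:", chi_square_upper_tail(stat, df) < 0.0001)
    contributions = cell_contributions(OBSERVED)
    for cell in sorted(contributions, key=contributions.get, reverse=True)[:4]:
        print(cell, round(contributions[cell], 2))
```

Run as a script, the four largest contributions it lists are the None and Protestant cells discussed in the Comment slides. Like JMP, it only covers the calculations; the hypotheses and the conclusion still have to be written out.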
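JMP's Fit Y by X takes the data in stacked form, one row per religion and attitude with Count as the frequency column. The Python sketch below rebuilds the two-way table from such rows and computes both statistics that appear in the JMP output. Assumptions: STACKED is a hypothetical re-entry of all 20 rows (the rows not shown on the slide are filled in from the table of observed counts, without JMP's numeric ordering prefixes), tabulate and association_statistics are hypothetical helper names, and the likelihood-ratio statistic is taken to be the usual $G^2 = 2 \sum O \ln(O/E)$. With these counts the results agree with JMP's 193.959 and 184.510 up to rounding.

```python
import math
from collections import defaultdict

# Hypothetical stacked layout mirroring the JMP data table:
# (Religion, Attitude, Count), one row per cell of the two-way table.
STACKED = [
    ("Catholic", "Always Wrong", 83), ("Jewish", "Always Wrong", 4),
    ("Protestant", "Always Wrong", 364), ("None", "Always Wrong", 27),
    ("Other", "Always Wrong", 28),
    ("Catholic", "Almost Always Wrong", 47), ("Jewish", "Almost Always Wrong", 2),
    ("Protestant", "Almost Always Wrong", 97), ("None", "Almost Always Wrong", 12),
    ("Other", "Almost Always Wrong", 8),
    ("Catholic", "Sometimes Wrong", 105), ("Jewish", "Sometimes Wrong", 9),
    ("Protestant", "Sometimes Wrong", 190), ("None", "Sometimes Wrong", 52),
    ("Other", "Sometimes Wrong", 20),
    ("Catholic", "Not Wrong at All", 249), ("Jewish", "Not Wrong at All", 20),
    ("Protestant", "Not Wrong at All", 341), ("None", "Not Wrong at All", 219),
    ("Other", "Not Wrong at All", 51),
]


def tabulate(rows):
    """Sum the Count column into a {(religion, attitude): count} table."""
    cells = defaultdict(int)
    for religion, attitude, count in rows:
        cells[(religion, attitude)] += count
    return dict(cells)


def association_statistics(cells):
    """Return (likelihood-ratio G^2, Pearson X^2) for a two-way table of counts."""
    row_tot, col_tot = defaultdict(int), defaultdict(int)
    for (r, c), o in cells.items():
        row_tot[r] += o
        col_tot[c] += o
    n = sum(row_tot.values())
    g2 = x2 = 0.0
    for r in row_tot:
        for c in col_tot:
            o = cells.get((r, c), 0)
            e = row_tot[r] * col_tot[c] / n  # expected count under independence
            x2 += (o - e) ** 2 / e
            if o > 0:  # an empty cell contributes 0 to G^2
                g2 += 2 * o * math.log(o / e)
    return g2, x2


if __name__ == "__main__":
    g2, x2 = association_statistics(tabulate(STACKED))
    print(f"Likelihood Ratio {g2:.3f}")
    print(f"Pearson          {x2:.3f}")
```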