Datasets with counts

150 people were surveyed about their soft drink preferences. They were asked, “Do you prefer Coke, Pepsi, or Sprite?” 50 people said “Coke”, 70 people said “Pepsi”, and 30 people said “Sprite”. Test the hypothesis that people differ in their soda preferences.

Pepsi   Coke   Sprite
70      50     30

Solution:
H0: $p_{pepsi} = p_{coke} = p_{sprite} = 1/3$
HA: not all p’s are equal
Use a one-way chi-square test.

One-way Chi-Square Test ($\chi^2$)
• Used when your dependent variable (DV) is counts within categories (# Pepsi lovers, # Coke lovers, # Sprite lovers)
• Used when your DV has two or more mutually exclusive categories
• Compares the counts you got in your sample to those you would expect under the null hypothesis
• Also called the chi-square “goodness of fit” test

One-way $\chi^2$ example

Which power would you rather have: flight, invisibility, or x-ray vision?

Flight      Invisibility   X-ray vision
18 people   14 people      10 people

Is this difference significant, or is it just due to chance?

Step 1: Write hypotheses
H0: $p_{fly} = p_{invis} = p_{xray} = 1/3$
HA: not all p’s are equal

Step 2: Write the observed frequencies ($f_o$), and also the frequencies that would be expected under the null hypothesis ($f_e$). With N = 42 and three equally likely categories, $f_e = 42/3 = 14$ in every cell.

          Flight   Invisibility   X-ray vision
$f_o$     18       14             10
$f_e$     14       14             14
N = 42

Step 3: Compute the relative squared discrepancies, $(f_o - f_e)^2 / f_e$, and sum them up

                         Flight   Invisibility   X-ray vision
$(f_o - f_e)^2 / f_e$    1.143    0              1.143

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 1.143 + 0 + 1.143 = 2.286$
$df = k - 1$

Step 4: Compare to the critical value of $\chi^2$. Observed $\chi^2 = 2.286$; $\chi^2_{crit} =$
5.99 ($df = 2$). Since 2.286 < 5.99, retain the null!

Calculating one-way $\chi^2$
Steps:
1) State hypotheses
2) Write observed and expected frequencies
3) Get $\chi^2$ by summing up relative squared deviations
4) Use Table I to get critical $\chi^2$

Practice

Suppose we ask 200 randomly selected people if they think that voting should be made compulsory. The data come out like this:

          No     Yes
$f_o$     84     116
$f_e$     100    100

Is there evidence for a clear preference?

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.56 + 2.56 = 5.12$
$df = 1$, $\chi^2_{crit} = 3.84$
Reject null!

Other null hypotheses
• We tested the H0 that all cell frequencies are equal
• But we can test any set of expected frequencies
• Example: political affiliation among psych grad students:

  Democrat   Republican   Independent
  9          5            18

• Political affiliation in the U.S. (Gallup):

  Democrat   Republican   Independent
  46%        43%          11%

Is the distribution for psych grad students different from the distribution for the U.S.? If not, then 46% of the 32 students would be Democrats, 43% would be Republicans, and 11% would be Independents.

                         Democrat   Republican   Independent
$f_o$                    9          5            18
$f_e$                    14.7       13.8         3.5
$(f_o - f_e)^2 / f_e$    2.21       5.61         60.07
N = 32

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 2.21 + 5.61 + 60.07 = 67.89$
$df = k - 1 = 2$, $\chi^2_{crit} = 5.99$
Reject null!

Points of interest about $\chi^2$
1. $\chi^2$ cannot be negative
2. $\chi^2$ will be zero only if each observed frequency exactly equals the expected frequency
3. The larger the discrepancies, the larger the $\chi^2$
4. The greater the number of groups, the larger the $\chi^2$ tends to be. That’s why the $\chi^2$ distribution is a family of curves with $df = k - 1$.

Two-Factor Chi-Square

A 1999 New Jersey poll sampled people’s opinions concerning the use of the death penalty for murder when given the option of life in prison instead.
800 people were polled, and the numbers of men and women supporting each penalty were tabulated.

          Preferred Penalty
          Death Penalty   Life in Prison   No Opinion
Female    151             179              80
Male      201             117              72

Contingency table: shows the contingency between two variables.
Are these two variables (gender, penalty preference) independent?

Two-Factor Chi-Square Test
• Used to test whether two nominal variables are independent or related
• E.g., is gender related to socio-economic class?
• Compares the observed frequencies to the frequencies expected if the variables were independent
• Called a chi-square test of independence
• Fundamentally testing, “Do these variables interact?”

Example

For the death penalty poll above:
H0: distribution of female preferences matches distribution of male preferences
HA: female proportions do not match male proportions

What should the expected frequencies be? A first guess is to spread the 800 people evenly over the six cells, giving $f_e = 800/6 = 133.3$ in every cell.

WRONG: this is saying there is an equal # of men and women, and an equal preference for each penalty (i.e., no main effects). We are willing to let there be main effects. We just want to test whether the distribution of preferences for men and women is the same (i.e.,
no interaction effects).

We need to look at the marginal totals to get our expected frequencies.

Example

Observed frequencies ($f_o$) with marginal totals:

            Death Penalty   Life in Prison   No Opinion   $f_{row}$
Female      151             179              80           410
Male        201             117              72           390
$f_{col}$   352             296              152          n = 800

Column proportions: $p_{death} = 352/800 = .44$, $p_{life} = 296/800 = .37$, $p_{none} = 152/800 = .19$

Expected frequencies ($f_e$ = column proportion × row total):

          Death Penalty       Life in Prison      No Opinion
Female    .44(410) = 180.4    .37(410) = 151.7    .19(410) = 77.9
Male      .44(390) = 171.6    .37(390) = 144.3    .19(390) = 74.1

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 4.79 + 4.91 + 0.06 + 5.04 + 5.16 + 0.06 = 20.02$
$df = (k_{gender} - 1)(k_{preference} - 1) = (1)(2) = 2$
$\chi^2_{crit} = 5.99$
Reject null!

Calculating two-way $\chi^2$
Steps:
1) State hypotheses
2) Get expected frequencies: $f_e = \frac{f_{col}}{n}(f_{row})$
3) Get $\chi^2$ by summing up relative squared deviations
4) Use table to get critical $\chi^2$

Practice

Suppose we want to determine if there is any relationship between level of education and the medium through which one follows current events. We ask a random sample of high school graduates and a random sample of college graduates whether they keep up with the news mostly by reading the paper, by listening to the radio, or by watching television.
Observed frequencies ($f_o$) with marginal totals:

            radio   paper   TV     $f_{row}$
HS          10      29      61     100
college     24      44      32     100
$f_{col}$   34      73      93     N = 200

$f_e = \frac{f_{col}}{n}(f_{row})$
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Column proportions: $p_{radio} = .17$, $p_{paper} = .365$, $p_{TV} = .465$

Expected frequencies ($f_e$):

          radio   paper   TV
HS        17      36.5    46.5
college   17      36.5    46.5

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = 17.89$
$df = (3 - 1)(2 - 1) = (2)(1) = 2$
$\chi^2_{crit} = 5.99$
Since 17.89 > 5.99, reject null!

Assumptions of Chi-Square Test
1. Categories are mutually exclusive
   – A subject cannot be counted in more than one cell
2. Expected frequency in each cell must be
   – at least 10 when kA and kB are less than or equal to 2
   – at least 5 when kA or kB is greater than 2 (e.g., a 2x3 design)
   – N must be sufficiently large to ensure that this is true
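
The hand calculations in these slides can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original slides: the function names `chi_square_gof` and `chi_square_independence` and the `CRITICAL_05` lookup are hypothetical names chosen for illustration. The critical values are copied from the worked examples (they are the usual .05-level values for df = 1 and df = 2). Expected frequencies are computed without intermediate rounding, so results can differ slightly from hand calculations that round $f_e$ first.

```python
from typing import Optional, Sequence, Tuple


def chi_square_gof(observed: Sequence[float],
                   expected_props: Optional[Sequence[float]] = None
                   ) -> Tuple[float, int]:
    """One-way (goodness-of-fit) chi-square: returns (statistic, df)."""
    n = sum(observed)
    k = len(observed)
    if expected_props is None:
        # Default null hypothesis: all categories equally likely.
        expected_props = [1.0 / k] * k
    expected = [p * n for p in expected_props]
    # Sum of relative squared discrepancies, (fo - fe)^2 / fe.
    stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return stat, k - 1


def chi_square_independence(table: Sequence[Sequence[float]]
                            ) -> Tuple[float, int]:
    """Two-way chi-square test of independence: returns (statistic, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, f_row in zip(table, row_totals):
        for fo, f_col in zip(row, col_totals):
            fe = f_col / n * f_row  # expected count from the marginal totals
            stat += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Critical values copied from the worked examples (alpha = .05).
CRITICAL_05 = {1: 3.84, 2: 5.99}

# Superpower example (one-way) and death penalty poll (two-way).
superpower_stat, superpower_df = chi_square_gof([18, 14, 10])
penalty_stat, penalty_df = chi_square_independence([[151, 179, 80],
                                                    [201, 117, 72]])

for label, stat, df in [("superpowers", superpower_stat, superpower_df),
                        ("death penalty", penalty_stat, penalty_df)]:
    decision = "reject" if stat > CRITICAL_05[df] else "retain"
    print(f"{label}: chi2 = {stat:.3f}, df = {df}, {decision} null")
```

To test a null hypothesis other than equal proportions, pass the hypothesized proportions as the second argument, for example `chi_square_gof([9, 5, 18], [0.46, 0.43, 0.11])` for the political affiliation data.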