Topics for Today
- Introduction to Non-parametric Significance tests
- 1-Way Chi-square test
- 2-Way Chi-square test

Stat203 Fall 2011 – Week 9 Lecture 3 Page 1 of 23

Non-Parametric is a big word
But it makes life easier! (sometimes)

Recall that certain assumptions were necessary to use the t-tests for comparing means and the z-tests for comparing proportions. We might need a non-parametric test if these assumptions (or conditions) are not met.

There are also some scientific questions about nominal or ordinal data that can’t be answered using the t-tests or z-tests.

First off, a recap of the assumptions/conditions necessary for the t-tests and z-tests we’ve looked at already.

Assumptions: t-tests for means
Hypotheses involving means of interval or ratio level data.
Assumptions required to use a t-test for one or more samples:
- Test variable(s) normally distributed, or the sample(s) large enough (ie: > 50) so that the sampling distribution of the mean is normally distributed
- Interval or Ratio level data

Assumptions: z-tests for proportions
Hypotheses involving 1 or 2 proportions.
Assumptions required to use a z-test:
- 𝑛𝑝0 > 10 and 𝑛(1 − 𝑝0) > 10 (1-sample)
- 𝑛1 > 10 & 𝑛2 > 10 and 𝑛1𝑝1 > 5 & 𝑛2𝑝2 > 5 (2-samples)

1-Way Chi-Square Test
The objective of this test is to determine how similar an observed set of frequencies (or relative frequencies), fo, is to an expected set of frequencies, fe.

A typical research hypothesis would indicate that individuals are more (or less) likely than expected to select some categories than others.

… and the most common research hypothesis is that the relative frequency of responses is similar for all categories, which has the following statistical hypotheses:
H0: fo = fe
Ha: fo ≠ fe

… but what are fo and fe?
We’ve seen fo before: it’s just the observed frequency (or relative frequency) for each category of a nominal or ordinal variable!

The new item is fe … think of this as some expected relative frequency, or %. What does that mean?
- What is the expected relative frequency of men and women going into a men’s room?
- What is the expected relative frequency of heads and tails out of 100 flips of a fair coin?
- In Vancouver, what is the expected relative frequency of rainy and sunny days?

The Chi-Square Test Statistic
Here’s the formula that we’ll need to calculate the Chi-square test statistic:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

…so it’s a bit more complicated than the t-statistic for means and the z-statistic for proportions.

Once we calculate this, we then look up the value in Table E. As with the t-distribution, though, we need a ‘degrees of freedom’ for this test statistic. Unlike the t-test, the degrees of freedom for the 1-way Chi-square is the number of categories (k) minus 1.

Example (1-way Chi-Square test):
To determine whether dogs are colour blind, a student sets up an experiment where she provides food to a dog in 4 differently coloured dishes and records the colour of the dish the dog chooses to eat from first. She does this for a total of 80 dogs, randomly ordering the dishes each time.

If dogs are truly colour blind, each colour dish should be selected about the same number of times. The Chi-Square test allows us to formally test this research hypothesis.
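Before filling in the worksheet, here is a minimal Python sketch of the test statistic and degrees of freedom described above. The function names and the coin counts (58 heads, 42 tails) are made up for illustration and are not from the lecture; only the 80 dogs and 4 dishes come from the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per category")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def degrees_of_freedom(k):
    """1-way test: number of categories (k) minus 1."""
    return k - 1


# Hypothetical data (an assumption, not from the lecture):
# 100 flips of a coin give 58 heads and 42 tails.
observed = [58, 42]
expected = [50, 50]  # a fair coin: half heads, half tails

stat = chi_square_statistic(observed, expected)
df = degrees_of_freedom(len(observed))
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 2.56, df = 1

# Dog example setup: 80 dogs, 4 dishes, equal preference if colour blind.
dog_expected = [80 / 4] * 4
print(dog_expected, degrees_of_freedom(4))  # [20.0, 20.0, 20.0, 20.0] 3
```

The statistic would then be looked up in a Chi-square table such as Table E, using the degrees of freedom.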
Research Hypothesis:
Individuals:
Population:
Variable:
Parameter:
Statistical Hypotheses:

The observed frequency of each colour from the 80 dogs is below, as is the expected frequency if the dogs were colour blind:

Colour   Brown   Orange   Yellow   Green
fo        25      18       19       18
fe        20      20       20       20

And we have,
N = 80
k = 4 (# of categories)

Now, let’s calculate our test statistic:

p-value:
Reject H0 at α = 0.05?
Conclusion:

2-Way Chi-Square Test
We used the 1-way test to determine whether the observed relative frequency distribution was different than some ‘expected’ distribution.
- note the similarity to a 1-sample test for a mean or proportion, where we are testing whether the mean or proportion is different than some ‘null’ value

We can use a 2-way test to determine whether relative frequency distributions from two samples are the same or different from one another.
- note the similarity to a 2-sample test for means or proportions, where we are testing whether the means or proportions are different from one another.

Research questions that require a 2-way chi-square test are based on relative frequencies (like the 1-way test), but compare two populations (or samples).
- Is the relative frequency of sunny days in a year different between Vancouver and Seattle?
- Is the relative frequency of female students different between UBC and SFU?
- Is the relative frequency of job type (white vs blue vs service) the same for women and men?

Example (Q24, pg 338):
A radio executive considering a switch in his station’s format collects data on the radio preferences of various age groups of 78 listeners.

Does radio format preference differ by age group?
Research Hypothesis:
Individuals:
Populations:
Variables:
Parameters:
Statistical Hypotheses:

The observed frequency (fo) of age group and radio format preference is below:

                        Age Group
Format       Young Adult   Middle Age   Older Adult   Total
Music            14            10            3          27
News-talk         4            15           11          30
Sports            7             9            5          21
Total            25            34           19          78

And we have,
N = 78
k = 9 (# of categories, ie: cells in the table)

but … we need fe! Here’s the formula for each cell, with row and column totals associated with that cell:

$$f_e = \frac{(\text{column total}) \times (\text{row total})}{\text{grand total}}$$

So, the table with fe is:

                        Age Group
Format       Young Adult   Middle Age   Older Adult   Total
Music         25*27/78      34*27/78     19*27/78       27
News-talk     25*30/78      34*30/78     19*30/78       30
Sports        25*21/78      34*21/78     19*21/78       21
Total            25            34           19          78

Which, after you do the arithmetic, gives:

                        Age Group
Format       Young Adult   Middle Age   Older Adult   Total
Music            8.7          11.8          6.6         27
News-talk        9.6          13.1          7.3         30
Sports           6.7           9.2          5.1         21
Total            25            34           19          78

… now that we have all the components, we can calculate our test statistic (try the arithmetic on your own):

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{28.1}{8.7} + \frac{3.2}{11.8} + \frac{13.0}{6.6} + \frac{31.4}{9.6} + \frac{3.6}{13.1} + \frac{13.7}{7.3} + \frac{0.09}{6.7} + \frac{0.04}{9.2} + \frac{0.01}{5.1} = 10.9$$

p-value:
Reject H0 at α = 0.05?
Conclusion:

One snag …
There is still an assumption necessary for us to be able to use any of the Chi-Square tests.

All the cells in the table must have an expected frequency (fe) of at least 5 (ie: this must hold for every category).

Other names for Chi-square tests:
1-Way:
o One-sample Chi-square
o Chi-square goodness of fit
2-Way:
o 2-sample Chi-square
o 2x2 Chi-square
o r by c Chi-square
o Chi-square test for independence

Nice page describing how to do Chi-Square tests in SPSS.
http://academic.uofs.edu/department/psych/methods/cannon99/level2d.html

So, a decision tree for choosing hypothesis tests:

Single sample?
- Interval or Ratio data?
  o One-sample t-test
- Nominal or Ordinal data?
  o Proportion (ie: 2 categories)? 1-sample z-test for proportions
  o Distribution (ie: several categories)? 1-way Chi-square

Two samples?
- Interval or Ratio data?
  o Individuals measured twice/Matched? Paired t-test
  o Variances equal? 2-sample t-test w/equal variances
  o Variances not equal? 2-sample t-test w/unequal variances
- Nominal or Ordinal data?
  o Proportion (ie: 2 categories) & conditions met? 2-sample z-test for proportions
  o Distribution (ie: several categories)? 2-way Chi-square

Today’s Topics
Chi-Square tests
- for comparing distributions of nominal or ordinal data
- 1-way: compare distribution in a single sample to some expected distribution
- 2-way: compare distributions for two populations

New Reading
Chapter 10 up to pg 352
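
As a recap, the arithmetic in the radio format example can be checked with a short Python sketch. The function names are made up for illustration. Two things here are not stated on the slides and are standard results assumed for the sketch: the degrees of freedom for a 2-way table, (rows − 1) × (columns − 1), and the Chi-square table value 9.488 for α = 0.05 with 4 degrees of freedom.

```python
def expected_counts(table):
    """fe for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Return the Chi-square statistic, degrees of freedom and fe table."""
    expected = expected_counts(table)
    stat = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    # Assumed standard rule (not on the slides): (rows - 1) * (columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Observed frequencies (fo) from the radio example.
# Rows: Music, News-talk, Sports
# Columns: Young Adult, Middle Age, Older Adult
radio = [
    [14, 10, 3],
    [4, 15, 11],
    [7, 9, 5],
]

stat, df, expected = chi_square_two_way(radio)

# Assumed table value for alpha = 0.05 and df = 4.
CRITICAL_05_DF4 = 9.488

for row in expected:
    print([round(fe, 1) for fe in row])
print(f"chi-square = {stat:.2f}, df = {df}")
print("reject H0 at alpha = 0.05:", stat > CRITICAL_05_DF4)
```

With unrounded expected frequencies the statistic comes out near 10.96 rather than the 10.9 on the slide, because the slide rounds each fe to one decimal place before squaring.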