Chapter 15
Nonparametric Methods: Chi-Square Applications

Nonparametric
One-Look.Com Definition:
   adjective: not involving an estimation of the parameters of a statistic
   adjective: not requiring knowledge of underlying distribution: used to describe or relating to statistical methods that do not require assumptions about the form of the underlying distribution
You mean we can test without assuming a normal curve? Yes!

Goals
1. Conduct a test of hypothesis comparing an observed set of frequencies to an expected set of frequencies
   i. Goodness-of-fit tests:
      1) Equal Expected Frequencies
      2) Unequal Expected Frequencies
2. List the characteristics of the Chi-square distribution
We can test a hypothesis without assuming the data distribution is normal!

Chi-square (χ²) Applications
1. Testing method where we don’t need assumptions about the shape of the data
2. Testing methods for nominal data
   Data with no natural order
   Examples: Gender, Brand preference, Color
There will be two differences from earlier tests when we do our hypothesis testing:
   Look up the critical value of Chi-square in Appendix B
   Use a new formula for the calculated test statistic

Conduct A Test Of Hypothesis Comparing An Observed Set Of Frequencies To An Expected Set Of Frequencies
1. Goodness-of-fit tests:
   1. Equal Expected Frequencies
   2. Unequal Expected Frequencies

Purpose Of Goodness-of-fit Tests:
1. Compare an observed distribution (sample) to an expected distribution (population)
2. We will ask the question:
   Is the difference between the observed values and the expected values:
      Due to chance (sampling error): The observed distribution is the same as the expected distribution
      Not due to chance: The observed distribution is not the same as the expected distribution

Hypothesis Testing: Equal Expected Frequencies
Step 1: State null and alternate hypotheses
   Ho: There is no significant difference between the set of observed frequencies and the set of expected frequencies
   H1: There is a difference between the observed and expected frequencies
Step 2: Select a level of significance
   α = .01 or .05…

Hypothesis Testing
Step 3: Identify the test statistic (Chi Square = χ²) and draw the curve with the critical value
   Right-tail test
   Use α and df to look up the critical value in Appendix B
   k = number of categories
   (k - 1) = degrees of freedom (df)

Critical values of χ²
Rows: degrees of freedom (df) = k - 1, where k = # of categories
Columns: Alpha = risk that a true Ho will be rejected

df     0.10      0.05      0.02      0.01
 1     2.706     3.841     5.412     6.635
 2     4.605     5.991     7.824     9.210
 3     6.251     7.815     9.837    11.345
 4     7.779     9.488    11.668    13.277
 5     9.236    11.070    13.388    15.086
 6    10.645    12.592    15.033    16.812
 7    12.017    14.067    16.622    18.475
 8    13.362    15.507    18.168    20.090
 9    14.684    16.919    19.679    21.666
10    15.987    18.307    21.161    23.209
11    17.275    19.675    22.618    24.725
12    18.549    21.026    24.054    26.217
13    19.812    22.362    25.472    27.688
14    21.064    23.685    26.873    29.141
15    22.307    24.996    28.259    30.578
16    23.542    26.296    29.633    32.000
17    24.769    27.587    30.995    33.409
18    25.989    28.869    32.346    34.805

Hypothesis Testing
Step 4: Formulate a decision rule
   Example (df = 10, α = .05, so the critical value is 18.307): If our calculated test statistic is greater than 18.307, we reject Ho and accept H1; otherwise we fail to reject Ho

Hypothesis Testing
Step 5: Take a random sample, compute the calculated test statistic, compare it to the critical value, and make the decision to reject or not reject the null hypothesis
   Chi Square = χ²
   fo = observed (sample data) frequency in a particular category
   fe = expected (population data) frequency in a particular category
   k = number of categories
   df = k - 1 = degrees of freedom
   n = sample size

   Equal Expected Frequencies
      1st: $f_e = \frac{\sum f_o}{k}$
      2nd: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
   Unequal Expected Frequencies
      1st: fe will be given, or fe = n × (% for the cell)
      2nd: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

Hypothesis Testing
Step 5: Conclude with one of the following:
   The sample evidence suggests that there is not a difference between the observed and expected frequencies
      The observed distribution is the same as the expected distribution
   The sample evidence suggests that there is a difference between the observed and expected frequencies
      The observed distribution is not the same as the expected distribution

List The Characteristics Of The Chi-square Distribution
   It is positively skewed
      However, as the degrees of freedom increase, the curve approaches normal
   It is non-negative
      Because (fo - fe)² is never negative
   There is a family of chi-square distributions
      df determines which curve to use
      df = k - 1
      k = # of categories

χ² Distribution
[Figure: chi-square curves for df = 3, df = 5, and df = 10; horizontal axis χ²]

Limitations Of Chi-Square
Because fe is used in the denominator, a very small fe could result in a very large calculated test statistic
In general:
1. If there are only two cells, the expected frequency in each cell should be at least 5 (fe >= 5)
2. If there are more than two cells, avoid using Chi-Square when more than 20% of the fe cells contain values less than 5
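
Worked sketch of the goodness-of-fit procedure
The five steps above can be traced in a few lines of Python. This is only a minimal sketch, not part of the original slides: the function names and the sample counts are hypothetical, and the critical values are the α = .05 column of the table above. It computes fe for the equal and unequal cases, the calculated test statistic, the right-tail decision, and the small-fe check from the limitations slide.

```python
# Critical values for alpha = 0.05, copied from the table above.
# Index 0 holds df = 1, index 17 holds df = 18.
CRITICAL_05 = [3.841, 5.991, 7.815, 9.488, 11.070, 12.592,
               14.067, 15.507, 16.919, 18.307, 19.675, 21.026,
               22.362, 23.685, 24.996, 26.296, 27.587, 28.869]


def expected_equal(observed):
    """Equal expected frequencies: fe = (sum of fo) / k for every category."""
    n, k = sum(observed), len(observed)
    return [n / k] * k


def expected_unequal(n, proportions):
    """Unequal expected frequencies: fe = n * (expected share of the cell)."""
    return [n * p for p in proportions]


def chi_square(observed, expected):
    """Calculated test statistic: sum over categories of (fo - fe)^2 / fe."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def small_fe_problem(expected):
    """Apply the limitation rule: True means chi-square should be avoided."""
    small = sum(1 for fe in expected if fe < 5)
    if len(expected) == 2:
        return small > 0                      # two cells: every fe must be >= 5
    return small / len(expected) > 0.20       # more than 20% of cells below 5


def goodness_of_fit(observed, expected):
    """Return (statistic, df, critical value, reject Ho?) at alpha = 0.05."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    df = len(observed) - 1
    if not 1 <= df <= len(CRITICAL_05):
        raise ValueError("df is outside the range of the table")
    statistic = chi_square(observed, expected)
    critical = CRITICAL_05[df - 1]
    return statistic, df, critical, statistic > critical   # right-tail test


if __name__ == "__main__":
    # Hypothetical counts, equal expected frequencies (k = 4, n = 100).
    fo = [30, 20, 25, 25]
    print(goodness_of_fit(fo, expected_equal(fo)))

    # Hypothetical counts, unequal expected frequencies (60% / 30% / 10%).
    fo = [50, 30, 20]
    fe = expected_unequal(sum(fo), [0.6, 0.3, 0.1])
    print(small_fe_problem(fe), goodness_of_fit(fo, fe))
```

With these made-up counts, the first case gives a statistic of 2.0 against a critical value of 7.815 (df = 3), so we fail to reject Ho. The second gives about 11.67 against 5.991 (df = 2), so we reject Ho.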