“Chi-Square Statistics”
By Namrata Khemka

Table of Contents
1. What is Chi-Square?
2. When and why is Chi-Square used?
3. Limitations/Restrictions of Chi-Square
4. Examples
5. References

What is “Chi Square”
• Invented by Pearson
• Test for “Goodness of fit”
• Tests for independence of variables
• Non-parametric test

Parametric vs. Non-Parametric Data
Parametric data
1. Numerical scores
2. Scores can be manipulated arithmetically
3. Example
   • Average height of people in 10 cities
Non-parametric data
1. Nominal data
2. Scores are not manipulated
3. Example
   • How many people are over 6 ft and how many are below in 2 cities

What is “Chi Square” - Continued
• Analyzes categorical or measurement data
• Can be computed in SPSS or Excel

Goodness of Fit
1. Null Hypothesis
2. Observed frequencies
3. Expected frequencies
4. Good Fit
5. Poor Fit
6. Sum of observed frequencies = sum of expected frequencies

Computational Steps
• Scenario

Scenario:
• A movie theater owner would like to know which factors influence people's movie selection.
• A sample of 50 people was asked which of the following was important to them.
• They could choose one of the following:
  1. Actors
  2. Directors
  3. Time the movie is playing
  4. Genre

Question
• Do any of these factors play a greater role than the others?

Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis

Null Hypothesis
• There is no difference in the importance of these 4 factors in determining which movie is selected.

Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis
• Observed Frequencies
• Expected Frequencies
• p-value

Interpret the Results
• Since p < 0.05, we reject the null hypothesis.
• Therefore, some of the factors are mentioned more often than others in response to movie selection.

Test of Independence
• Examines the extent to which two variables are related
• Example

Scenario:
• University of Calgary is interested in determining whether or not there is a relationship between educational level and the number of flights taken each year.
• 150 travelers in the airport were interviewed and the results are:

Scenario - Continued
                       2 or fewer flights a year   More than 2 flights a year
University Student               53                           22
High School Student              37                           38

Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis

Null Hypothesis
• The educational level of the travelers and the number of flights are independent of one another.

Computational Steps
• Scenario
• Threshold Value = 0.05
• Null Hypothesis
• Observed Frequencies
• Expected Frequencies
• p-value

Interpret the Results
• Since p < 0.05, we reject the null hypothesis.
• These 2 variables are not independent of one another.
• Thus, the educational level of travelers and the number of flights they take are related.

Requirements and Limitations
• Random sampling
• Data must be in raw frequencies
• Independence of observations
• Size of the expected frequencies
• Collapsing values

Collapsing Values
          Leather Shoes   Sandals   Boots   Runners
Men            18            5        12       16
Women          20           19         6       10

Calculation - Details
(fo = observed frequency, fe = expected frequency, R = number of rows, C = number of columns)
• fe = (row total × column total) / grand total; here fe = 45 for the “2 or fewer flights” column and fe = 30 for the “more than 2 flights” column
• fo - fe
• (fo - fe)²
• (fo - fe)² / fe
• Chi-square = Σ (fo - fe)² / fe
• Calculate the degrees of freedom = (R - 1)(C - 1)

Calculation - fo - fe
                       2 or fewer flights a year   More than 2 flights a year
University Student                8                           -8
High School Student              -8                            8

Calculation - (fo - fe)²
                       2 or fewer flights a year   More than 2 flights a year
University Student               64                           64
High School Student              64                           64

Calculation - (fo - fe)² / fe
                       2 or fewer flights a year   More than 2 flights a year
University Student              1.42                         2.13
High School Student             1.42                         2.13

Calculation - Continued
• Chi-square = Σ (fo - fe)² / fe = 7.1111
• Degrees of freedom = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1

Distribution Table
(critical values of chi-square by degrees of freedom and right-tail probability)
df    0.9     0.1     0.05    0.025    0.01
1     0.016   2.706   3.841   5.024    6.635
2     0.211   4.605   5.991   7.378    9.210
3     0.584   6.251   7.815   9.348    11.345
4     1.064   7.779   9.488   11.143   13.277

Interpretation
• With df = 1, the computed chi-square of 7.1111 exceeds the 0.05 critical value of 3.841, so p < 0.05 and the null hypothesis is rejected.

Conclusion
• What is chi-square
• When should chi-square be used
• Limitations of chi-square
• Examples
• Resources

References
• www.ling.upenn.edu/courses/Summer_2002/ling102/chisq.html
• Statistical Techniques in Business and Economics by Lind, Marchal and Mason
• Statistics for the Behavioral Sciences by Frederick J. Gravetter and Larry B. Wallnau

Questions?
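
Supplementary sketch: the test-of-independence calculation for the flights example can be reproduced in a few lines of Python. This is only an illustrative sketch, not part of the original slides. The names `observed`, `CRITICAL_0_05`, and `chi_square_independence` are hypothetical, and the code assumes the plain Pearson statistic with no continuity correction, which is what the slide's value of 7.1111 corresponds to. The critical values are copied from the 0.05 column of the distribution table.

```python
# Observed frequencies from the flights example.
# Rows: education level; columns: "2 or fewer flights", "more than 2 flights".
observed = [
    [53, 22],  # university students
    [37, 38],  # high school students
]

# Critical values at the 0.05 level, copied from the distribution table
# (degrees of freedom -> critical chi-square value).
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected frequency of each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (fo - fe)^2 / fe over every cell of the table.
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )

    # Degrees of freedom = (R - 1)(C - 1).
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square_independence(observed)

# Reject the null hypothesis when the statistic exceeds the critical value.
reject_null = statistic > CRITICAL_0_05[df]

print(f"expected frequencies: {expected}")
print(f"chi-square = {statistic:.4f}, df = {df}")
print(f"reject null hypothesis at 0.05: {reject_null}")
```

Comparing the statistic against the tabulated critical value gives the same decision as checking p < 0.05, without needing a statistics library to compute the p-value itself.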