YMS - 13.1 Test for Goodness of Fit

Intro to Three Chi-Squared Tests
  Goodness of Fit - used to determine whether a specified population distribution is valid (testing yours against a stated claim)
  Homogeneity - organize data in a two-way table and compare two or more population proportions (Ho: the proportions are the same in every population)
  Independence/Association - also organizes data in a two-way table but then determines whether the distribution of one variable is related to the other

Chi-Square Basics
  The more the observed counts differ from the expected counts, the more evidence we have to reject Ho
  Plot the data before testing (segmented bar graphs)
  Test statistic - sum over all categories of (observed - expected)^2 / expected
  Reasoning behind degrees of freedom and P-value is the same as it has been for every other test (top paragraph on p731)

Chi-Square Distributions
  Total area under the curve is equal to 1 (just like any other density curve)
  Each curve (except when df = 1) begins at 0 on the horizontal axis, peaks, and then approaches the horizontal axis asymptotically from above.
  Each curve is skewed right. As df increases, the curve becomes more symmetrical and looks more like a normal curve.
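
The statistic and the right-tail P-value described above can be sketched in a few lines of Python. This is only an illustration, not part of the YMS text: the observed counts and hypothesized proportions are made up, the helper names `chi_square_statistic` and `chi_square_upper_tail` are hypothetical, and df = (number of categories - 1) is assumed for the goodness-of-fit setting.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_upper_tail(x, df):
    """Area to the right of x under the chi-square curve (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed forms for the two smallest df values...
    if df % 2 == 0:
        area, k = math.exp(-half), 2                  # df = 2
    else:
        area, k = math.erfc(math.sqrt(half)), 1       # df = 1
    # ...then step df up by 2 until the requested df is reached.
    while k < df:
        area += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return area

# Made-up data: 100 observations in four categories, claimed to be equally likely.
observed = [30, 20, 25, 25]
proportions = [0.25, 0.25, 0.25, 0.25]
n = sum(observed)
expected = [n * p for p in proportions]

statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1
p_value = chi_square_upper_tail(statistic, df)
print(f"X^2 = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
```

A large P-value, as with these made-up counts, means the observed counts are consistent with the claimed distribution; a small one is evidence against Ho.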

Goodness of Fit Test
  Ho - “The actual population proportions are equal to the hypothesized proportions” or list with proportions such as in Example 13.1 on p729
  Ha - “The actual population proportions are different from the hypothesized proportions” or “At least one of the proportions differs from the stated values.”
  Conditions
    all individual expected counts are at least 1
    no more than 20% of the expected counts are less than 5
  Test statistic - can be found using lists
  P-value - χ²cdf(test statistic, upper bound, df) found in distribution menu (use a very large upper bound such as 1E99)
  Some calculators have this test, but most don’t (they will all do the other ones)
  p736 #13.1, 13.2, 13.7 and M&M activity

Simulations
  Use if we don’t have the resources to gather a representative sample

Follow-Up Analysis
  Which component contributed most to the test statistic?
  Calculate (O - E)^2/E for each observed count to determine which one is furthest from expected.

YMS - 13.2 Inference for Two-Way Tables

Problem of Multiple Comparisons
  How do we do many comparisons at once with some overall measure of confidence in all of our conclusions?
  If we used two-sample z procedures many times, it would tell how different each pair is, but not how likely it is that we get n sample proportions spread so far apart.

Two-Way Tables
  Gives counts for both successes and failures
  r x c table showing the relationship between two categorical variables
  Example: Create an r x c table and find the expected counts in each cell.
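
One way to work this kind of example is sketched below in Python. The 2 x 3 table of counts is hypothetical (it is not from the textbook), the function name `expected_counts` is an illustrative choice, and the sketch assumes the usual rule for two-way tables: expected count = row total x column total / table total.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of observed counts (rows = groups, columns = responses).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

expected = expected_counts(observed)
for row in expected:
    print(row)
```

Each row of expected counts adds up to the same row total as the observed table, which is a quick check on the arithmetic.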

Expected Count
  row total x column total divided by table total (finding proportion and multiplying by count)
  p748 #13.14-13.15

Homogeneity of populations
  Chi-Square statistic, conditions and follow-up are the same as for G of F test
  Degrees of freedom equal (r - 1)(c - 1)
  Ho states that the distribution of the response variable is the same in all c populations of the r x c two-way table (Example: All treatments for cocaine addicts are equally effective)
  Ha says there is at least one population whose distribution is different
  Chi-Square test in calculator
    Enter observed counts into matrix [A] and TI-83 will generate expected counts
  Practice: p756 #13.16-13.17
  Homework: p761 #13.19 and 13.21

Association/Independence
  Two-way table classifies observations from a single population in two ways (2 categories... Not just success/failure)
  Ho states “There is no relationship between the two categorical variables” or “The two variables are independent.” (Remember to put in context)
  Ha says there is a relationship or they are not independent
  Expected counts will equal row total times column total divided by table total

Distinguishing Between Tests
  Goodness of Fit is the only one not in a two-way table
  Homogeneity is in a two-way table with samples from two or more populations
  Association/Independence is another two-way table but it comes from a single sample of a single population

Different Hypotheses
  Goodness of Fit
    Null: p(br) = 0.13, p(y) = 0.14, p(r) = 0.13, p(bl) = 0.24, p(o) = 0.20, p(g) = 0.16
    Alt: At least one of the color proportions differs from the stated proportion
  Homogeneity
    Null: All of the treatments to quit smoking are equally effective.
    Alt: At least one of the treatments has a different rate of effectiveness.
  Association/Independence
    Null: There is no relationship between student smoking habits and parent smoking habits.
    Alt: There is a relationship between student smoking habits and parent smoking habits.
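
The homogeneity and independence tests differ in design and hypotheses, but the arithmetic on the two-way table is the same. The Python sketch below is a rough illustration under stated assumptions: the 3 x 2 table (three treatments by success/failure) is made up, and the names `two_way_chi_square` and `conditions_met` are hypothetical helpers, not calculator or textbook functions.

```python
def two_way_chi_square(table):
    """Return (statistic, df, expected counts, per-cell components)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    expected = [[r * c / table_total for c in col_totals] for r in row_totals]
    # Each cell contributes (O - E)^2 / E to the statistic.
    components = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    statistic = sum(sum(row) for row in components)
    df = (len(table) - 1) * (len(table[0]) - 1)   # (r - 1)(c - 1)
    return statistic, df, expected, components

def conditions_met(expected):
    """All expected counts >= 1 and no more than 20% of them below 5."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < 5)
    return min(cells) >= 1 and small <= 0.20 * len(cells)

# Made-up counts: rows are three treatments, columns are (success, failure).
observed = [
    [30, 20],
    [20, 30],
    [15, 35],
]

statistic, df, expected, components = two_way_chi_square(observed)

# Follow-up analysis: which cell contributes most to the statistic?
cells = [(i, j) for i in range(len(observed)) for j in range(len(observed[0]))]
largest = max(cells, key=lambda ij: components[ij[0]][ij[1]])

print(f"X^2 = {statistic:.3f}, df = {df}, conditions met: {conditions_met(expected)}")
print(f"largest component at row {largest[0]}, column {largest[1]}")
```

The P-value is then the area to the right of the statistic under the chi-square curve with the computed df, which is what the calculator's chi-square test reports.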

Chi-Square and Z-test
  For a 2 x 2 table, the chi-square test and the two-sided two-proportion z test give the same P-value; the chi-square statistic is just the square of the z statistic
  Use the z test to compare just two proportions because you can choose for it to be one-sided and it has a related confidence interval for the difference in the proportions
  p768 #13.29-13.30
  Classify p770 #31-39
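
The 2 x 2 relationship can be checked numerically. The sketch below is illustrative only: the success counts and sample sizes are made up, the function names are hypothetical, and the z statistic is assumed to use the pooled proportion, as in the usual two-proportion z test of Ho: p1 = p2.

```python
import math

def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Pooled two-sample z statistic for Ho: p1 = p2."""
    p_hat_1 = successes_1 / n_1
    p_hat_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_hat_1 - p_hat_2) / std_error

def chi_square_2x2(successes_1, n_1, successes_2, n_2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    table = [
        [successes_1, n_1 - successes_1],
        [successes_2, n_2 - successes_2],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / table_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Made-up data: 30 of 50 successes in group 1, 20 of 50 in group 2.
z = two_proportion_z(30, 50, 20, 50)
chi_square = chi_square_2x2(30, 50, 20, 50)

# Two-sided normal P-value and chi-square (df = 1) right-tail P-value.
p_value_z = math.erfc(abs(z) / math.sqrt(2))
p_value_chi_square = math.erfc(math.sqrt(chi_square / 2))

print(f"z = {z:.4f}, z^2 = {z ** 2:.4f}, X^2 = {chi_square:.4f}")
print(f"P-values: z test {p_value_z:.4f}, chi-square {p_value_chi_square:.4f}")
```

The two P-values agree only for the two-sided alternative; a one-sided z test has no chi-square counterpart, which is one reason to prefer z when comparing just two proportions.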