Chi-square test for goodness of fit -> to perform a test for the distribution of a categorical variable with two or more categories
Chi-square test for homogeneity -> to perform a test to compare the distribution of a categorical variable for two or more populations or treatments (where the variable can have two or more categories)
Chi-square test for independence -> to perform a test to determine if there is convincing evidence for an association between two variables in a population

11.1: Chi-square test for goodness of fit
We compare the observed counts from our sample with the counts that would be expected if the null was true
● **We assume that the null is true when performing a significance test
● The more the observed counts differ from the expected counts -> the more evidence we have against the null hypothesis and for the alternative hypothesis
Candy example:
● Company's claims: brown, red, yellow, and green are 12.5% each, and orange and blue are 25% each
● Sample of 60: brown = 12, red = 3, yellow = 7, green = 9, orange = 9, blue = 20
p̂ of brown = 12/60 = 0.20, which is not the 0.125 that the company claimed -> might use a one-sample z test for a proportion
● p = true proportion of M&M's in the large bag that are brown
● Hₒ: p = 0.125, Hₐ: p ≠ 0.125
  ○ Perform a test like this for all 6 colors
This method would be inefficient and lead to the problem of multiple tests
● Wouldn't tell us how likely it is to get a random sample of 60 candies with a color DISTRIBUTION that differs as much from the one claimed by the company as the sample does (taking all colors into consideration at one time)
Stating Hypotheses
● Null hypothesis: should state a claim about the distribution of a single categorical variable in the population of interest
● Alternative hypothesis: should state that the categorical variable does not have the specified distribution
  ○ Don't state the alternative hypothesis in a way that suggests that all the proportions in the hypothesized distribution are wrong
Back to candy example: categorical variable is color, population of interest is all M&Ms in the large bag
● Hₒ: the distribution of color in the large bag of M&Ms is the same as the claimed distribution
  ○ Hₒ: p_brown = 0.125, p_red = 0.125, p_yellow = 0.125, p_green = 0.125, p_orange = 0.25, p_blue = 0.25
● Hₐ: the distribution of color in the large bag of M&Ms is NOT the same as the claimed distribution
  ○ Hₐ: at least two of the p_color's are incorrect
    ■ Where p_color = the true proportion of M&Ms in the large bag of that color
  ○ This shouldn't be written as "all the p_color's are not equal to the claimed values"
  ○ We don't say "at least one" because if the stated proportion in one category is wrong -> the stated proportion in at least one other category must be wrong, because the p_color's must sum to 1
Calculating Expected Counts in a Chi-Square Test for GOF
The expected count for category i in the distribution of a categorical variable is npᵢ
● Where pᵢ is the relative frequency for category i specified by the null hypothesis
● The number of counts in a specific category in a random sample is a binomial random variable
  ○ Whose expected value is npᵢ = the average number of counts in that category
  ○ Expected count is not likely to be a whole number + shouldn't be rounded to a whole number
Back to candy example: expected counts for the random sample of 60 candies in all color categories
● Red: 60(0.125) = 7.5
● Yellow: 60(0.125) = 7.5
● Green: 60(0.125) = 7.5
● Brown: 60(0.125) = 7.5
● Orange: 60(0.25) = 15
● Blue: 60(0.25) = 15
To see if the data give convincing evidence for the alternative -> compare observed counts with expected counts
● If observed counts are far from expected counts -> that is the evidence we are looking for
We can see some pretty big differences between the observed and expected counts in several color categories
● How likely is it that differences this large or larger would occur just by chance in random samples of size 60 from the population distribution claimed by the website?
  ○ Calculate a statistic that measures how far apart the observed and expected counts are -> chi-square test statistic
Chi-Square Test Statistic
Chi-square test statistic: a measure of how far the observed counts are from the expected counts
● $\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}$
  ○ Where the sum is over all possible values of the categorical variable
Back to candy example:
● $\chi^2 = \frac{(12-7.5)^2}{7.5} + \frac{(3-7.5)^2}{7.5} + \dots + \frac{(20-15)^2}{15} = 2.7 + 2.7 + 0.03 + 0.30 + 2.4 + 1.67 = 9.8$
Fair die example: Carrie made a 6-sided die and rolled it 90 times to test whether each side was equally likely to show up
Outcome of roll   1    2    3    4    5    6    Total
Frequency         12   28   12   13   10   15   90
State the hypotheses
● Hₒ: the sides of Carrie's die are equally likely to show up
  ○ Carrie's die is fair
  ○ The distribution of outcomes for Carrie's die is uniform
● Hₐ: the sides of Carrie's die are not equally likely to show up
Calculate the expected count for each of the possible outcomes
● If the null is true, each of the 6 sides should show up ⅙ of the time
● The expected count is 90(⅙) = 15 for each side
Calculate the value of the chi-square statistic
● $\chi^2 = \frac{(12-15)^2}{15} + \frac{(28-15)^2}{15} + \dots + \frac{(15-15)^2}{15} = 0.6 + 11.27 + 0.6 + 0.27 + 1.67 + 0 \approx 14.4$
Chi-Square Distributions and P-values
χ² is a measure of the distance of the observed counts from the expected counts
● Distance is always zero or positive
● It is zero only when the observed counts are exactly equal to the expected counts
● Large values of χ² -> stronger evidence for the alternative (observed counts are far from what we would expect if the null was true)
● Small values of χ² -> the data are consistent with the null
Back to candy example: is the value from the candy sample, χ² = 9.8, a large value?
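One way to judge whether 9.8 is "large" is the simulation the notes describe next. Below is a minimal Python sketch of that idea, assuming only the standard library; the names `chi_square_gof` and `simulate_p_value`, the 10,000 repetitions, and the seed are illustrative choices, not part of the course material. It computes the expected counts npᵢ and the χ² statistic for the candy sample, then estimates the P-value by drawing many samples of 60 from the claimed distribution and counting how often χ² comes out at least as large.

```python
import random

# Company's claimed color distribution and the observed sample of 60 candies
CLAIMED = {"brown": 0.125, "red": 0.125, "yellow": 0.125,
           "green": 0.125, "orange": 0.25, "blue": 0.25}
OBSERVED = {"brown": 12, "red": 3, "yellow": 7,
            "green": 9, "orange": 9, "blue": 20}


def chi_square_gof(observed, probs):
    """Return (chi-square statistic, expected counts) for a goodness-of-fit test."""
    n = sum(observed.values())
    expected = {k: n * p for k, p in probs.items()}  # expected count = n * p_i
    stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in probs)
    return stat, expected


def simulate_p_value(probs, n, observed_stat, reps=10_000, seed=1):
    """Estimate P(chi-square >= observed_stat) by sampling from the null model."""
    rng = random.Random(seed)
    categories = list(probs)
    weights = [probs[c] for c in categories]
    hits = 0
    for _ in range(reps):
        sample = rng.choices(categories, weights=weights, k=n)
        counts = {c: sample.count(c) for c in categories}
        stat, _ = chi_square_gof(counts, probs)
        if stat >= observed_stat - 1e-9:  # small tolerance for floating-point ties
            hits += 1
    return hits / reps


stat, expected = chi_square_gof(OBSERVED, CLAIMED)
print(f"chi-square = {stat:.2f}")   # chi-square = 9.80
print(expected)
print("simulated P-value:", simulate_p_value(CLAIMED, 60, stat))
```

The same `chi_square_gof` call works for the fair die example by passing the six frequencies and a probability of 1/6 for each side.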
● In a simulation of 1000 random samples of size 60, only 87 of the 1000 simulated samples resulted in a chi-square test statistic of 9.8 or higher
● So the estimated P-value is 87/1000 = 0.087 > ⍺ = 0.05 -> we fail to reject the null -> we don't have convincing evidence that the color distribution in the large bag is different from the distribution claimed by the company
The sampling distribution of the chi-square test statistic is NOT a Normal distribution
● It's right-skewed and allows only nonnegative values (χ² can't be negative)
● The sampling distribution depends on the number of possible values for the categorical variable (# of categories)
When the expected counts are all at least 5 -> the sampling distribution of the χ² test statistic is modeled well by a chi-square distribution with degrees of freedom = number of categories minus 1
● Chi-square distribution: defined by a density curve that takes only nonnegative values and is skewed to the right
A particular chi-square distribution is specified by its degrees of freedom
● As the degrees of freedom increase -> the density curves become less skewed -> larger values become more probable
  ○ Mean of a chi-square distribution = degrees of freedom
  ○ When df > 2, the mode (peak) of the chi-square density curve is at df − 2
Back to candy example: χ² = 9.8; because all the expected counts are at least 5 -> the χ² test statistic will be modeled well by a chi-square distribution when the null is true
● P-value is the probability of getting a value of χ² as large as or larger than 9.8 when the null is true (using 6 − 1 = 5 degrees of freedom)
● P-value = 0.081
What conclusions should we draw about Hₒ: the distribution of color in the large bag of M&Ms is the same as the claimed distribution?
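The P-value of 0.081 above is an area under a chi-square density curve. As a sketch of what the calculator computes, assuming integer degrees of freedom, that right-tail area can be built with only Python's `math` module using the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1). The function name `chi2_sf` is hypothetical; if SciPy is available, `scipy.stats.chi2.sf(x, df)` gives the same area.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) under a chi-square density curve with integer df >= 1.

    Uses the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = e^(-x/2).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        # Work on the log scale so large x or df do not overflow
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


# Candy example: chi-square = 9.8 with 6 - 1 = 5 degrees of freedom
print(round(chi2_sf(9.8, 5), 3))   # 0.081
# Fair die example: chi-square = 14.4 with 5 degrees of freedom
print(round(chi2_sf(14.4, 5), 4))
```

The odd/even split is only there because the recurrence steps df by 2; the two starting cases (df = 1 and df = 2) have closed forms.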
● Because our P-value of 0.081 > ⍺ = 0.05 -> we fail to reject the null
● We don't have convincing evidence that the distribution of color in the large bag of M&Ms differs from the claimed distribution
  ○ **Remember -> failing to reject the null does not mean that the null is true; we can only say that the sample data did not provide convincing evidence to reject the null
In calc -> go to DISTR -> χ²cdf, use χ² as the lower bound, infinity as the upper bound, and enter the df
Back to fair die example: find the P-value
● df = 6 − 1 = 5
● Using χ² = 14.4, P-value ≈ 0.013
● We reject the null because the P-value of 0.013 < ⍺ = 0.05. There is convincing evidence that Carrie's die is unfair
Carrying Out a Test
The chi-square test for goodness of fit uses an approximation that becomes more accurate as we take larger samples
Conditions for Performing a Chi-Square Test for Goodness of Fit
● Random: the data come from a random sample from the population of interest
● 10%: when sampling without replacement, n < 0.10N
● Large Counts: all EXPECTED counts are at least 5
  ○ Expected, not observed
To compare the distribution of a categorical variable in one population to a claimed distribution -> use a chi-square test for goodness of fit
The Chi-Square Test for Goodness of Fit
To test a hypothesized model for the distribution of a categorical variable
Suppose the conditions are met.
To perform a test of Hₒ: the stated distribution of a categorical variable in the population of interest is correct
● Compute the chi-square test statistic $\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}$ (where the sum is over the k different categories)
● P-value is the area to the right of χ² under the chi-square density curve with k − 1 degrees of freedom
Hockey birthday example: a random sample of 80 NHL players from recent seasons was selected and their birthdays recorded -> to see if birth date is related to success (judged by whether a player makes it into the NHL). Do these data provide convincing evidence that the birthdays of NHL players are not uniformly distributed across the four quarters of the year?
Birthday            Jan-Mar   Apr-Jun   Jul-Sep   Oct-Dec
Number of players   32        20        16        12
State
● Hₒ: the birthdays of all NHL players are uniformly distributed across the four quarters of the year
● Hₐ: the birthdays of all NHL players are NOT uniformly distributed across the four quarters of the year
● We'll use ⍺ = 0.05
Plan
● Chi-square test for goodness of fit
  ○ Random: data come from a random sample of NHL players
  ○ 10%: must assume that 80 < 10% of all NHL players
  ○ Large Counts: all expected counts = 80(¼) = 20 ≥ 5
Do
● Some evidence in favor of the alternative because the observed counts differ from the expected counts
● $\chi^2 = \frac{(32-20)^2}{20} + \frac{(20-20)^2}{20} + \dots + \frac{(12-20)^2}{20} = 7.2 + 0 + 0.8 + 3.2 = 11.2$
● P-value with df = 3 is equal to 0.011
Conclude
● Because the P-value of 0.011 < ⍺ = 0.05, we reject the null. We have convincing evidence that the birthdays of NHL players are not uniformly distributed across the four quarters of the year
In calc -> go to TESTS -> χ² GOF-Test
● When using the calculator, write out the first few terms of the chi-square calculation, name the procedure, and report the test statistic, degrees of freedom, and P-value
How to investigate HOW the distribution is different?
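One way to start answering that question is to compute each category's component (O − E)²/E and rank them. A minimal Python sketch (standard library only; variable names are illustrative) that rebuilds the hockey calculation, checks the Large Counts condition, and lists the quarters from largest to smallest contribution along with the direction O − E:

```python
# Hockey birthday example: observed counts by quarter (n = 80), uniform null model
quarters = ["Jan-Mar", "Apr-Jun", "Jul-Sep", "Oct-Dec"]
observed = [32, 20, 16, 12]
null_props = [0.25, 0.25, 0.25, 0.25]

n = sum(observed)
expected = [n * p for p in null_props]  # 80 * 1/4 = 20 in every quarter

# Large Counts condition: every EXPECTED count must be at least 5
if min(expected) < 5:
    raise ValueError("Large Counts condition not met")

# Component (O - E)^2 / E for each category; chi-square is their sum
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(components)
df = len(observed) - 1

print(f"chi-square = {chi_sq:.1f}, df = {df}")   # chi-square = 11.2, df = 3

# Follow-up: rank categories by contribution, showing the direction O - E
ranked = sorted(zip(components, quarters, observed, expected), reverse=True)
for comp, quarter, o, e in ranked:
    print(f"{quarter}: O - E = {o - e:+g}, component = {comp:.2f}")
```

The sign of O − E gives the direction of the difference, which is what the follow-up description needs alongside the size of each component.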
● Identify the categories that contribute the most to the chi-square statistic
● Describe how the observed and expected counts differ in those categories, noting the direction of the difference
Back to hockey birthday example:
Birthday   Observed   Expected   O − E   (O − E)²/E
Jan-Mar    32         20         12      7.20
Apr-Jun    20         20         0       0.00
Jul-Sep    16         20         −4      0.80
Oct-Dec    12         20         −8      3.20
The last column shows the components (contributions) of the chi-square test statistic
● The two biggest contributions to the chi-square statistic came from Jan-Mar and Oct-Dec
  ○ Jan-Mar -> 12 more players were born than expected
  ○ Oct-Dec -> 8 fewer players were born than expected
11.2 Inference for Two-Way Tables
If we want to compare the proportions of successes in more than two samples or groups OR compare the distributions of a single categorical variable across several populations or treatments -> chi-square test for homogeneity
● Present data in a two-way table
  ○ Can be used to compare distributions of a single categorical variable
  ○ Can also be used to summarize relationships between two categorical variables
To determine if there is convincing evidence of an association between two categorical variables -> chi-square test for independence
Tests for Homogeneity
Stating Hypotheses
● Null hypothesis (in general) says that there is no difference in the true distributions of a categorical variable in the populations of interest or for the treatments in an experiment
● Alternative hypothesis (in general) says that there is a difference in the distributions but does not specify the nature of that difference
  ○ The alternative does not state that each distribution is different from each of the others
    ■ The alternative will be true even if just one of the true distributions is different from the others
Restaurant example: does background music influence what customers buy?
-> an experiment in a restaurant compared 3 randomly assigned treatments (no music, French accordion music, Italian string music) + recorded the # of customers who ordered French, Italian, and Other entrees
● Hₒ: there is no difference in the true distributions of entrees ordered at this restaurant when no music, French accordion music, or Italian string music is played
  ○ Can also say -> the distribution of a categorical variable is the same for each of several populations or treatments
  ○ Prefer "no difference" because it's more consistent with the language we used in previous significance tests
● Hₐ: there is a difference in the true distributions of entrees ordered at this restaurant when no music, French accordion music, or Italian string music is played
  ○ Any difference among the three observed distributions of entrees ordered is evidence against the null and for the alternative
Relative frequency bar graphs comparing the distributions of entrees ordered under the different music conditions
● The type of entree that customers order seems to differ considerably across the three music treatments
● Orders of Italian entrees are very low (1.3%) when French music is playing but higher when Italian music (22.6%) or no music (13.1%) is playing
● French entrees seem popular, as they are ordered often under all music conditions, but more often when French music is playing
● For all three music treatments, the percent of Other entrees ordered was similar
Do the differences in these distributions provide convincing evidence that background music affects customer behavior at this restaurant? OR Is it plausible that the background music has no effect on customer behavior and that these differences are due to the chance involved in the random assignment of treatments?
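One way to probe the second question directly is a randomization (permutation) simulation: if music has no effect, each customer would have ordered the same entree under any treatment, so we can re-randomize the treatment labels and see how often χ² comes out at least as large as observed. This is a swapped-in illustration, not the method these notes use (they go on to use the chi-square distribution). The observed table below is an assumption reconstructed from figures elsewhere in these notes (column totals 84, 75, 84; entree totals 99, 31, 113; the Italian-entree percentages; and the χ² components 0.52, 2.33, and 0.42). The function names, the 2,000 repetitions, and the seed are illustrative.

```python
import random

# Observed entree counts (rows) by music condition (columns: none, French, Italian).
# Reconstructed from totals, percentages, and chi-square components in these notes.
OBSERVED = [
    [30, 39, 30],   # French entrees
    [11, 1, 19],    # Italian entrees
    [43, 35, 35],   # Other entrees
]


def chi_square_two_way(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / grand  # (row total)(column total)/(table total)
            stat += (obs - exp) ** 2 / exp
    return stat


def randomization_p_value(table, reps=2000, seed=1):
    """Reshuffle which treatment each customer received, keeping group sizes fixed."""
    rng = random.Random(seed)
    # One entry per customer: the row index of the entree they ordered
    entrees = [i for i, row in enumerate(table) for _ in range(sum(row))]
    group_sizes = [sum(c) for c in zip(*table)]
    observed_stat = chi_square_two_way(table)
    hits = 0
    for _ in range(reps):
        rng.shuffle(entrees)
        counts = [[0] * len(group_sizes) for _ in table]
        start = 0
        for j, size in enumerate(group_sizes):
            for i in entrees[start:start + size]:
                counts[i][j] += 1
            start += size
        if chi_square_two_way(counts) >= observed_stat - 1e-9:
            hits += 1
    return observed_stat, hits / reps


stat, p_sim = randomization_p_value(OBSERVED)
print(f"chi-square = {stat:.2f}, simulated P-value = {p_sim}")
```

Because both the entree totals and the group sizes stay fixed in every shuffle, the expected counts never change; only the random assignment varies, which mirrors the "chance involved in the random assignment" in the question.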
● We have to know how likely it is to get differences this big or bigger when the null is true -> need a P-value
We could compare many pairs of proportions -> ending up with many tests and many P-values
● This is a bad idea -> performing multiple tests on the same data increases the probability that we make a Type I error in at least one of the tests
It's also a bad idea to pick out one large difference from the two-way table and then perform a significance test as if it were the only comparison we had in mind -> P-hacking
● A test comparing the proportions of French entrees ordered under the no music and French accordion music treatments shows that the difference is statistically significant (z = 2.06, P = 0.039)
● But the proportions of Italian entrees ordered for the no music and Italian string music treatments do not differ significantly (z = 1.61, P = 0.107)
  ○ Reporting the results of the first test alone wouldn't tell the whole story
The problem of how to do many comparisons at once without increasing the overall probability of a Type I error is common in stats
We can:
● Perform an overall test to see if there is convincing evidence of any differences among the parameters that we want to compare
  ○ Uses the chi-square test statistic
● When the overall test shows there IS convincing evidence of a difference -> perform a detailed follow-up analysis to decide which of the parameters differ + estimate how large the differences are
Expected Counts and the Chi-Square Test Statistic
● Hₒ: there is no difference in the distribution of a categorical variable for several populations or treatments
● Hₐ: there is a difference in the distribution of a categorical variable for several populations or treatments
Compare the observed counts in a two-way table with the counts we would expect if the null was true
Calculating Expected Counts for a Chi-Square Test Based on Data in a Two-Way Table
When the null is true, the expected count in any cell of a two-way table is
● $\text{Expected count} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$
We then compare the observed counts with the expected counts using the chi-square statistic
● $\chi^2 = \sum \frac{(\text{Observed count} - \text{Expected count})^2}{\text{Expected count}}$
Back to restaurant example: null -> no difference in the distribution of entrees ordered when no music, French accordion music, or Italian string music is played
Find expected counts by assuming that the null is true
● Out of the 243 entrees ordered, 99 were French -> so we expect the same proportion of French entrees to be ordered under all 3 music conditions (99/243 = 0.4074)
  ○ No music (84 entrees): 84(0.4074) = 34.22 expected French entrees
  ○ French music (75 entrees): 75(0.4074) = 30.56 expected French entrees
  ○ Italian music (84 entrees): 84(0.4074) = 34.22 expected French entrees
● Out of the 243 entrees ordered, 31 were Italian -> so we expect the same proportion of Italian entrees to be ordered under all 3 music conditions (31/243 = 0.1276)
  ○ No music (84 entrees): 84(0.1276) = 10.72 expected Italian entrees
  ○ French music (75 entrees): 75(0.1276) = 9.57 expected Italian entrees
  ○ Italian music (84 entrees): 84(0.1276) = 10.72 expected Italian entrees
● Out of the 243 entrees ordered, 113 were Other -> so we expect the same proportion of Other entrees to be ordered under all 3 music conditions (113/243 = 0.4650)
  ○ No music (84 entrees): 84(0.4650) = 39.06 expected Other entrees
  ○ French music (75 entrees): 75(0.4650) = 34.88 expected Other entrees
  ○ Italian music (84 entrees): 84(0.4650) = 39.06 expected Other entrees
● The expected counts for no music and Italian music are the same -> 84 total entrees were ordered under each condition -> we expect the distributions of entree choice to be the same
Example: we found the expected count of French entrees when no music was playing by computing 84(99/243) = 34.22
  ○ Rewrite as $\frac{(99)(84)}{243} = \frac{(\text{row total})(\text{column total})}{\text{table total}} = 34.22$
Chi-square statistic
● Sum is over all the cells (9 total)
● $\chi^2 = \frac{(30-34.22)^2}{34.22} + \frac{(39-30.56)^2}{30.56} + \dots + \frac{(35-39.06)^2}{39.06} = 0.52 + 2.33 + \dots + 0.42 = 18.28$
Female president example: does the gender of an interviewer affect the responses to a survey question? Half of the 100 males were randomly assigned to be asked "Would you vote for a female president?" by a female interviewer, the other half by a male interviewer
State the appropriate null and alternative hypotheses
● Hₒ: there is no difference in the true distributions of response to this question when asked by a male interviewer and when asked by a female interviewer for subjects like these
● Hₐ: there is a difference in the true distributions of response to this question when asked by a male interviewer and when asked by a female interviewer for subjects like these
Show the calculation for the expected count in the Male/Yes cell + provide a complete table of expected counts
● Expected count for the Male/Yes cell is $\frac{(69)(50)}{100} = 34.5$
Calculate the value of the chi-square test statistic
● $\chi^2 = \frac{(30-34.5)^2}{34.5} + \frac{(39-34.5)^2}{34.5} + \dots = 4.25$
Conditions and P-values
Conditions for Performing a Chi-Square Test for Homogeneity
● Random: the data come from independent random samples from the populations of interest (or from groups in a randomized experiment)
● 10%: when sampling without replacement, n < 0.10N for each sample
● Large Counts: all EXPECTED counts are at least 5
Think of the chi-square test statistic χ² as a measure of how much the observed counts deviate from the expected counts
● Large values of χ² are evidence against the null and in favor of the alternative
The P-value measures the strength of the evidence
● When conditions are met -> P-values for a chi-square test for homogeneity come from a chi-square distribution
● df = (number of rows − 1)(number of columns − 1)
In calc: use matrices
● 2nd -> x⁻¹ (MATRIX) -> EDIT -> choose [A]
● Enter the dimensions of the matrix as the dimensions of the table
● Enter the observed counts in the same locations as in the table
● STAT -> TESTS -> χ²-Test -> Observed is matrix [A] and Expected is matrix [B]
  ○ When using the calculator, be sure to name the procedure (χ² test for homogeneity), and report the test statistic, degrees of freedom, and P-value
Back to restaurant example: conditions are met
● Random: the three treatments were assigned at random
● Large Counts: all expected counts are at least 5 (see table)
● 10%: doesn't need to be checked because the researchers were not sampling without replacement from some population of interest -> they performed an experiment using customers who happened to be in the restaurant at the time
What are the degrees of freedom for the distribution?
● df = (3 − 1)(3 − 1) = 4
What is the P-value?
● P(χ² > 18.28) with df = 4 is 0.0011
  ○ Because the P-value of 0.0011 < ⍺ = 0.05, we reject the null. There is convincing evidence that there is a difference in the true distributions of entrees ordered at this restaurant when no music, French accordion music, or Italian string music is played
Back to female president example:
Verify that the conditions for inference are met
● Random: treatments were randomly assigned
● Large Counts: all expected counts are greater than or equal to 5 (see table of expected counts)
● 10%: don't need to check because they aren't randomly selecting subjects from some population
Find the P-value
● df = (3 − 1)(2 − 1) = 2
● P(χ² > 4.25) = 0.119
Interpret the P-value
● Assuming that the gender of the interviewer doesn't affect responses to this question, there is a 0.119 probability of observing differences in the distributions of responses as large as or larger than those in this study by chance alone
What conclusion would you draw?
● Because the P-value of 0.119 > ⍺ = 0.05, we fail to reject the null. There is not convincing evidence of a difference in the true distributions of response to this question when asked by a male interviewer and when asked by a female interviewer for subjects like these
The Chi-Square Test for Homogeneity
To compare the distribution of a categorical variable in several populations or for several treatments
Suppose the conditions are met.
To perform a test of Hₒ: there is no difference in the distribution of a categorical variable for several populations or treatments
● Compute the chi-square test statistic
  ○ Sum is over all cells in the two-way table
● P-value is the area to the right of χ² under the chi-square density curve with degrees of freedom = (number of rows − 1)(number of columns − 1)
Speaking English example: residents of Australia, the UK, and the US were surveyed: "How important do you think it is to be able to speak English?" Do these data provide convincing evidence at the ⍺ = 0.05 level that the distributions of opinion about speaking English differ for residents of Australia, the UK, and the US?
State:
● Hₒ: there is no difference in the true distributions of opinion about speaking English for residents of Australia, the UK, and the US
● Hₐ: there is a difference in the true distributions of opinion about speaking English for residents of Australia, the UK, and the US
● We'll use ⍺ = 0.05
Plan: Chi-square test for homogeneity
● Random: independent random samples of residents from the three countries
● 10%: 1000 < 10% of all Australian residents, 1460 < 10% of all UK residents, 1003 < 10% of all US residents
● Large Counts: all expected counts are at least 5
  ○ Can't just say this but have to SHOW the expected counts
Do:
● Test statistic: $\chi^2 = \frac{(690-741.8)^2}{741.8} + \frac{(1177-1083.1)^2}{1083.1} + \dots = 68.57$
● df = (4 − 1)(3 − 1) = 6
● P(χ² > 68.57) with df = 6 is approximately 0
Conclude:
● Because the P-value of approximately 0 < ⍺ = 0.05, we reject the null. There is convincing evidence that there is a difference in the true distributions of opinion about speaking English for residents of Australia, the UK, and the US
  ○ There is some evidence for the alternative because the observed counts differ from the expected counts
What if we want to compare several populations?
● Many studies involve comparing the proportion of successes for each of several populations or treatments
  ○ The two-sample z test allows us to test the null hypothesis p1 = p2 (the proportions of successes for the two populations or treatments are the same)
  ○ The chi-square test for homogeneity allows us to test the null hypothesis p1 = p2 = … = pk (no difference in the proportions of successes for the k populations or treatments) against the alternative hypothesis (at least two of the true proportions are different)
    ■ It's not that ALL the proportions are different
    ■ The opposite of "all the proportions are equal" is "some of the proportions are not equal"
● The chi-square test for homogeneity compares the distribution of a categorical variable for any number of populations or treatments
  ○ If the test allows us to reject the null hypothesis of no difference -> it's time to examine the differences in detail
    ■ Identify the cells that contribute the most to the chi-square statistic
    ■ Describe how the observed and expected counts differ in those cells (note the direction of the difference)
● Restaurant example: significant differences among the distributions of entrees ordered under each of the three music conditions
  ○ The two components that contribute the most to the chi-square statistic are Italian entrees with French music and Italian entrees with Italian music -> orders of Italian entrees are far below what we expect when French music is playing and far above what we expect when Italian music is playing
    ■ Orders of Italian entrees are strongly affected by Italian and French music
Relationships Between Two Categorical Variables
Two-way tables can summarize data from different types of studies. We can compare the distribution of a categorical variable for several populations or treatments and use the chi-square test for homogeneity to perform inference in such settings.
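That homogeneity workflow, applied to the restaurant example (expected counts from row and column totals, χ², df, and the follow-up components), can be sketched in Python using only built-ins. The observed counts are an assumption: they are reconstructed from figures in these notes (entree and condition totals, the Italian-entree percentages, and the χ² components 0.52, 2.33, and 0.42) rather than copied from the original table. Variable names are illustrative.

```python
# Restaurant example: rows are entree types, columns are music conditions.
# Observed counts are a reconstruction from figures in these notes.
rows = ["French", "Italian", "Other"]
cols = ["No music", "French music", "Italian music"]
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]

row_tot = [sum(r) for r in observed]          # 99, 31, 113
col_tot = [sum(c) for c in zip(*observed)]    # 84, 75, 84
grand = sum(row_tot)                          # 243

# Expected count = (row total)(column total) / (table total)
expected = [[rt * ct / grand for ct in col_tot] for rt in row_tot]

# Component (O - E)^2 / E for every cell; chi-square is the sum over all 9 cells
components = [[(o - e) ** 2 / e for o, e in zip(orow, erow)]
              for orow, erow in zip(observed, expected)]
chi_sq = sum(sum(r) for r in components)
df = (len(rows) - 1) * (len(cols) - 1)

print(f"chi-square = {chi_sq:.2f}, df = {df}")   # chi-square = 18.28, df = 4
print("Large Counts met:", all(e >= 5 for r in expected for e in r))

# Follow-up analysis: the two cells that contribute most, with direction
cells = [(components[i][j], rows[i], cols[j], observed[i][j], expected[i][j])
         for i in range(len(rows)) for j in range(len(cols))]
for comp, r, c, o, e in sorted(cells, reverse=True)[:2]:
    direction = "above" if o > e else "below"
    print(f"{r} entrees, {c}: observed {o} is {direction} expected {e:.2f} "
          f"(component {comp:.2f})")
```

The two largest components come out as Italian entrees with French music and Italian entrees with Italian music, matching the follow-up analysis in the notes.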
A two-way table can also be made when a SINGLE random sample of individuals is chosen from a SINGLE population and then classified based on TWO categorical variables -> we want to analyze the relationship between the variables
Stating Hypotheses
We are interested in whether the sample data provide convincing evidence that the variables have an association in the population
● Does knowing the value of one variable help predict the value of the other variable for individuals in the population?
● To determine if evidence from a sample is convincing -> perform a chi-square test for independence
Null -> there is NO association between the two categorical variables in the population of interest
● No association = knowing the value of one variable does not help us predict the value of the other (the variables are INDEPENDENT)
Alternative -> there IS an association between the variables
Expected Counts
Compare the observed counts in a two-way table with the expected counts if the null is true
● If we assume the null is true -> we assume the two categorical variables of interest are independent
● Therefore, we can use the definition of independent events to calculate the expected counts
  ○ P(A | B) = P(A)
● $\text{Expected count} = \frac{(\text{row total})(\text{column total})}{\text{table total}}$
Conditions and Calculations
The 10% and Large Counts conditions for the chi-square test for independence are the same as for the test for homogeneity
● Difference in the Random condition -> a test for independence uses data from a single random sample (but a test for homogeneity uses data from two or more independent random samples or from two or more groups in a randomized experiment)
Conditions for Performing a Chi-Square Test for Independence
● Random: the data come from a random sample from the population of interest
● 10%: when sampling without replacement, n < 0.10N
● Large Counts: all EXPECTED counts are at least 5
When the conditions are met -> use the familiar χ² test statistic to measure the strength of the association between the two variables in the sample
● P-values come from a chi-square distribution with
  ○ df = (number of rows − 1) × (number of columns − 1)
The Chi-Square Test for Independence
When we want to test for an association between two categorical variables in a population, we use a chi-square test for independence
Suppose the conditions are met. To perform a test of Hₒ: there is no association between two categorical variables in the population of interest
● Compute the chi-square test statistic
  ○ Sum is over all cells in the two-way table
● P-value is the area to the right of χ² under the chi-square density curve with degrees of freedom = (number of rows − 1)(number of columns − 1)
Anger and heart disease example: are people who are prone to sudden anger more likely to develop heart disease? Observational study with a random sample of 8474 people with normal blood pressure who were free of heart disease at the beginning of the study; they took the Spielberger Trait Anger Scale to measure how prone they are to sudden anger, and researchers recorded whether each individual developed coronary heart disease (CHD)
● As the anger score increases, so does the percent who suffer heart disease
● A much higher percent of people in the high anger category developed CHD (4.27%) than in the moderate (2.33%) and low (1.70%) anger categories
Do the data provide convincing evidence of an association between the variables in the larger population? Or is it plausible that there is no association between the variables in the population and that we observed an association in the sample by chance alone?
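The steps that follow (expected counts under independence, χ², and the P-value) can be sketched in Python. The full table is an assumption reconstructed from numbers in these notes: n = 8474 with 190 CHD cases; 3110 low-anger people with 53 cases; 110 moderate-anger cases with expected count 106.08 (which implies about 4731 moderate-anger people); and the remaining 633 high-anger people with 27 cases (27/633 ≈ 4.27%). For df = 2, the area to the right of x under the chi-square curve has the closed form e^(−x/2), which the sketch uses in place of a calculator. Variable names are illustrative.

```python
import math

# Anger/CHD example. Columns: anger level; rows: developed CHD (yes / no).
# Counts reconstructed from figures in these notes (see lead-in).
levels = ["low", "moderate", "high"]
group_size = [3110, 4731, 633]
chd_yes = [53, 110, 27]
chd_no = [g - y for g, y in zip(group_size, chd_yes)]
observed = [chd_yes, chd_no]

n = sum(group_size)                 # 8474
p_yes = sum(chd_yes) / n            # P(Yes) = 190/8474

# Under independence, P(Yes | anger level) = P(Yes), so
# expected "Yes" count = P(Yes) * (number of people at that anger level)
expected_yes = [p_yes * g for g in group_size]
expected_no = [g - e for g, e in zip(group_size, expected_yes)]
expected = [expected_yes, expected_no]

chi_sq = sum((o - e) ** 2 / e
             for orow, erow in zip(observed, expected)
             for o, e in zip(orow, erow))
df = (len(observed) - 1) * (len(levels) - 1)

# For df = 2 the area to the right of x under the chi-square curve is exp(-x/2)
p_value = math.exp(-chi_sq / 2)

print(f"expected low-anger CHD cases = {expected_yes[0]:.2f}")   # 69.73
print(f"chi-square = {chi_sq:.3f}, df = {df}, P-value = {p_value:.5f}")
```

Computing P(Yes) × (column total) is the same as (row total)(column total)/(table total), which is why the independence reasoning and the homogeneity formula give identical expected counts.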
Testing the hypotheses:
● Hₒ: there is no association between anger level and heart-disease status in the population of people with normal blood pressure
  ○ OR anger level and heart-disease status are independent in the population of people with normal blood pressure
● Hₐ: there is an association between anger level and heart-disease status in the population of people with normal blood pressure
  ○ OR anger level and heart-disease status are not independent in the population of people with normal blood pressure
Expected counts:
● Null -> there is no association between anger level and heart-disease status in the population of interest
  ○ If we assume that this is true -> anger level and CHD status are independent
  ○ Use the definition of independent events
  ○ The chance process in this case is randomly selecting a person and recording their anger level and CHD status
● Start with "Yes" and "low anger"
  ○ 190 of the 8474 people developed CHD, so P(Yes) = 190/8474
  ○ If the null is true and anger level and CHD status are independent -> knowing that the selected individual is low anger doesn't change the probability that this person develops CHD
    ■ P(Yes | low anger) = P(Yes) = 190/8474 = 0.02242
    ■ 3110 low-anger people × 0.02242 = 69.73 -> we expect that 69.73 of the 3110 low-anger people in the study would get CHD
    ■ $\frac{(\text{row total})(\text{column total})}{\text{table total}} = \frac{(190)(3110)}{8474} = 69.73$
Conditions
● Random: the data come from a random sample of 8474 people with normal blood pressure
● 10%: it is reasonable to assume that 8474 < 10% of all people with normal blood pressure
● Large Counts: all the expected counts are at least 5 (see previous table)
Test statistic and P-value
● $\chi^2 = \frac{(53-69.73)^2}{69.73} + \frac{(110-106.08)^2}{106.08} + \dots = 16.077$
● With df = (2 − 1)(3 − 1) = 2 -> P-value = 0.00032
Because the P-value of 0.00032 < ⍺ = 0.05, we reject the null.
We have convincing evidence of an association between anger level and heart-disease status in the population of people with normal blood pressure
Follow-up analysis
● The two cells that contributed most to the chi-square test statistic were Low anger + Yes (4.014) and High anger + Yes (11.564)
● A much smaller number of low-anger people developed CHD than expected and a much larger number of high-anger people got CHD than expected
● BUT we cannot conclude that proneness to anger causes heart disease
  ○ The anger and heart-disease study is an observational study (not an experiment)
  ○ It's not surprising that some other variables are confounded with anger level (ex: people prone to anger are more likely to be men who drink and smoke)
    ■ We don't know if the increased rate of heart disease among those with high anger levels in the study is because of their anger, their drinking and smoking, or their gender
Snowmobiles example: a random sample of 1526 winter visitors to Yellowstone were asked "Do you belong to an environmental club?" and "What is your experience with a snowmobile: own, rent, never used?" Do the data provide convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Yellowstone National Park?
State:
● Hₒ: there is no association between environmental club status and type of snowmobile use in the population of winter visitors to Yellowstone
○ OR environmental club status and type of snowmobile use are independent in the population of winter visitors to Yellowstone
● Hₐ: there is an association between environmental club status and type of snowmobile use in the population of winter visitors to Yellowstone
○ OR environmental club status and type of snowmobile use are not independent in the population of winter visitors to Yellowstone
● We'll use ⍺ = 0.05
Plan: chi-square test for independence
● Random: random sample of 1526 winter visitors to Yellowstone
● 10%: it is reasonable to assume that 1526 < 10% of all winter visitors to Yellowstone
● Large Counts: all expected counts are at least 5
○ Can't just say this; you have to SHOW the expected counts
Do:
● Because the observed counts differ from the expected counts, there is some evidence for the alternative hypothesis
● Test statistic: $\chi^2 = \frac{(445-525.7)^2}{525.7} + \frac{(212-131.3)^2}{131.3} + \cdots = 116.6$
● df = (3 - 1)(2 - 1) = 2
● P-value: $P(\chi^2 \ge 116.6)$ with df = 2 is $4.82 \times 10^{-26}$
○ Remember that P-values are probabilities that MUST be between 0 and 1, so double-check the calculator if it looks like your P-value is greater than 1 (a value like 4.82 × 10⁻²⁶ is displayed in scientific notation, e.g. 4.82E-26)
Conclude:
● Because the P-value of approximately 0 < ⍺ = 0.05, we reject the null.
We have convincing evidence of an association between environmental club status and type of snowmobile use in the population of winter visitors to Yellowstone National Park.
○ There is some evidence for the alternative because the observed counts differ from the expected counts
Using Chi-Square Tests Wisely
The chi-square test for homogeneity and the chi-square test for independence both start with a two-way table of observed counts and calculate the test statistic, degrees of freedom, and P-value in the same way, BUT
● A chi-square test for homogeneity tests whether the distribution of a categorical variable is the same for each of several populations or treatments
● A chi-square test for independence tests whether two categorical variables are associated in some population of interest
Examples:
In tests for homogeneity -> one set of totals is known by the researchers before the data are collected
● Only the other set of totals is left to vary
● Select independent random samples (or randomly assign treatments) and compare the distribution of a single categorical variable
○ Gender of interviewer: Abby and Mia decided in advance to randomly assign 50 subjects to each treatment
○ English speaking: researchers knew in advance that they would survey 1000 people from Australia, 1460 from the U.K., and 1003 from the U.S.
In tests for independence -> neither set of totals is known in advance
● Select one sample and record the values of two variables for each member
● Yellowstone: researchers didn't know anything about either variable ahead of time; they only knew that they would survey 1526 visitors
○ This is tricky because it is common to see questions about association when a test for homogeneity applies, and questions about differences between proportions or the distribution of a variable when a test for independence applies
● **Consider how the data were produced**
● If the data come from two or more independent random samples or treatment groups in a randomized experiment -> do a chi-square test for homogeneity
● If the data come from a single random sample with individuals classified according to two categorical variables -> use a chi-square test for independence
Scary movies: are men and women equally likely to suffer lingering fear from watching scary movies as children? Researchers asked a random sample of 117 college students to write narrative accounts of their exposure to scary movies before the age of 13; more than ¼ said that some of the fright symptoms are still present when they are awake. Assume that the conditions for performing inference are met. Output for a chi-square test using these data is shown.
Should a chi-square test for independence or a chi-square test for homogeneity be used in this setting?
● Chi-square test for independence -> the data were produced using a single random sample of college students, who were then classified according to two variables (gender and whether they had lingering fright symptoms)
○ A chi-square test for homogeneity requires independent random samples from each population
State an appropriate pair of hypotheses for researchers to test in this setting.
● Hₒ: there is no association between gender and whether or not college students have lingering fright symptoms
● Hₐ: there is an association between gender and whether or not college students have lingering fright symptoms
Which cell contributes the most to the chi-square test statistic? In what way does this cell differ from what the null hypothesis suggests?
● Men + having fright symptoms accounts for the largest component of the chi-square test statistic
○ Far fewer men in the sample admitted to lingering fright symptoms than we would expect if the null were true
Interpret the P-value. What conclusion would you draw at ⍺ = 0.01?
● If there is no association between gender and whether or not college students have lingering fright symptoms, there is a 0.045 probability of getting a random sample of 117 students with an association as strong as or stronger than the one observed
● Because the P-value of 0.045 > ⍺ = 0.01, we fail to reject the null. We do not have convincing evidence that there is an association between gender and whether or not college students have lingering fright symptoms
What if we want to compare two proportions?
Shopping example: secondhand stores have become popular; a study of customers' attitudes toward secondhand stores interviewed separate random samples of shoppers at two secondhand stores of the same chain in different cities
● Do the data provide convincing evidence of a difference in the distributions of gender for shoppers at these two stores? -> chi-square test for homogeneity
Hypotheses:
● Hₒ: there is no difference in the distributions of gender for shoppers at these two stores
● Hₐ: there is a difference in the distributions of gender for shoppers at these two stores
A difference in the distributions of gender -> a difference in the true proportions of female shoppers at the two stores -> can use a two-sample z test
Hypotheses:
● Hₒ: $p_A - p_B = 0$
● Hₐ: $p_A - p_B \ne 0$
○ $p_A$ and $p_B$ are the true proportions of female shoppers at Store A and Store B
When performing both tests, the textbook obtained the same P-value
● The chi-square test statistic was the same as the square of the two-sample z statistic
The chi-square test for homogeneity based on a 2 x 2 table is equivalent to a two-sample z test for $p_1 - p_2$ with a two-sided alternative hypothesis. HOWEVER
● If the two-way table is larger than 2 x 2 -> use the chi-square test (the two-sample z test can only compare two proportions)
● If the table is 2 x 2 and the alternative hypothesis is one-sided -> use a two-sample z test for a difference in proportions rather than a chi-square test
● If the table is 2 x 2 and you want to construct a confidence interval for a difference in proportions -> the only option is a two-sample z interval
Grouping Quantitative Data Into Categories
We can convert a quantitative variable to a categorical variable by grouping together intervals of values
Example: researchers surveyed independent random samples of shoppers at two secondhand stores of the same chain in different cities; the table summarizes data on the incomes of the shoppers in the two samples
● Personal income
is a quantitative variable, but by grouping the values of this variable we created a categorical variable
● We can then carry out a chi-square test for homogeneity because the data come from independent random samples of shoppers at the two stores
○ Comparing the distributions of income for shoppers at the two stores gives more information than just comparing mean incomes
What if some of the expected cell counts are less than 5?
One strategy is to collapse the table by combining two or more rows or columns so that all expected counts are at least 5 -> this allows us to run the chi-square test we want to run
● Ries also said it is OK if fewer than 20% of the cells have an expected count that is less than 5
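The notes above say that for a 2 x 2 table the chi-square statistic equals the square of the two-sample z statistic, and that small expected counts may call for collapsing the table. A minimal Python sketch (standard library only) can check both. The tables below are hypothetical made-up counts, not the textbook's shopping or income data, and the helper names are illustrative.

```python
import math

def chi_square_stat(table):
    """Chi-square statistic and expected counts for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # expected count = (row total)(column total) / (table total)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, expected

def two_sample_z(x1, n1, x2, n2):
    """Two-sample z statistic for p1 - p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def share_small_expected(expected, cutoff=5):
    """Fraction of cells whose expected count is below the cutoff."""
    cells = [e for row in expected for e in row]
    return sum(e < cutoff for e in cells) / len(cells)

# Hypothetical counts: rows = Store A, Store B; columns = female, male
table = [[40, 20], [30, 30]]
chi_sq, expected = chi_square_stat(table)
z = two_sample_z(40, 60, 30, 60)
print(round(chi_sq, 4), round(z ** 2, 4))    # 3.4286 3.4286

# Chi-square upper tail with df = 1 vs. two-sided z test P-value
p_chi = math.erfc(math.sqrt(chi_sq / 2))
p_z = math.erfc(abs(z) / math.sqrt(2))
print(round(p_chi, 4), round(p_z, 4))        # identical

print(share_small_expected(expected))        # 0.0 -> every expected count is at least 5

# Hypothetical table with a small expected count
_, small_expected = chi_square_stat([[1, 9], [9, 81]])
print(share_small_expected(small_expected))  # 0.25 -> more than 20% of cells below 5
```

Because a chi-square distribution with df = 1 is the distribution of the square of a standard normal variable, the two P-values agree exactly. Squaring also throws away the sign of z, which is why a one-sided alternative needs the z test instead.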