The Statistical Imagination
• Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions
© 2008 McGraw-Hill Higher Education

The Chi-Square Test
• Chi-square is a test for a relationship between two nominal variables
• Calculations are made using a cross-tabulation (or “crosstab”) table, which reports frequencies of joint occurrences of attributes

Crosstab Tables
• Cross-tabulation or “crosstab” tables are designed to compare the frequencies of two nominal/ordinal variables at once

Sample Crosstab Table
• Spent night on streets in last 2 weeks by gender among homeless persons

              On streets
Gender      Yes     No    Total
Male         28     79      107
Female       10     44       54
Total        38    123      161

Reading a Crosstab Table
• The number in a cell is the frequency of joint occurrences, where a joint occurrence is the combination of categories of the two variables for a single individual
• From the cell, look up then look to the left
• E.g., in the table above, the joint occurrence of “male and on-street” is 28, the number in the sample who are both male and spent a night on the streets

Reading a Crosstab Table (cont.)
• The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category
• The grand total (n, the sample size) is presented in the bottom right-hand corner

Crosstab Tables and the Chi-Square Test
• For the chi-square test, the categories of the independent variable (X) go in the columns of the table, and those of the dependent variable (Y), in the rows
• E.g.: Is gender a good predictor of who among homeless persons is likely to spend a night on the streets?
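As a rough illustration of how the margins of the sample crosstab arise, the sketch below recomputes the row totals, column totals, and grand total from the four observed cells. It is written in Python; the dictionary layout and the names `observed`, `row_totals`, and `col_totals` are assumptions made for this example, not notation from the text.

```python
# Observed joint frequencies from the sample crosstab
# (gender by whether a night was spent on the streets).
observed = {
    "Male": {"Yes": 28, "No": 79},
    "Female": {"Yes": 10, "No": 44},
}

# Row marginal totals: number of subjects of each gender.
row_totals = {gender: sum(cells.values()) for gender, cells in observed.items()}

# Column marginal totals: number of subjects in each on-streets category.
col_totals = {}
for cells in observed.values():
    for answer, count in cells.items():
        col_totals[answer] = col_totals.get(answer, 0) + count

# Grand total (n, the sample size).
n = sum(row_totals.values())

print(row_totals, col_totals, n)
```

The computed margins should match the Total row and Total column printed in the sample table.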
Calculating Expected Frequencies
• In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell
• The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right), divided by the grand total

Using Expected Frequencies to Test the Hypothesis
• The expected frequencies are those that would occur if there is no relationship between the two nominal/ordinal variables
• The chi-square statistic measures the gap between expected and observed frequencies
• If there is no relationship, then the expected and observed frequencies are the same and chi-square computes to zero

The Chi-Square Statistic
• The sampling distribution is generated using the chi-square equation: $\chi^2 = \sum \left[ (O - E)^2 / E \right]$, where O is the observed frequency of a cell, and E is the expected frequency
• Chi-square tells us whether the summed squared differences between the observed and expected cell frequencies are so great that they are not simply the result of sampling error

When to Use the Chi-Square Statistic
1) There is one population with a representative sample from it
2) There are two variables, both of a nominal/ordinal level of measurement
3) The expected frequency of each cell in the crosstab table is at least five

Features of the Chi-Square Hypothesis Test
• Step 1. The $H_0$ states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error
• This null hypothesis asserts no difference in observed and expected frequencies

Features of the Chi-Square Hypothesis Test (cont.)
• Step 2. The sampling distribution is the chi-square distribution.
It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y
• Degrees of freedom are determined by the number of rows (r) and columns (c) in the crosstab table: $df = (r - 1)(c - 1)$

Features of the Chi-Square Hypothesis Test (cont.)
• Step 4. The test effects are the differences between expected and observed frequencies
• The test statistic is the chi-square statistic
• The p-value is obtained by comparing the calculated chi-square value to the critical values of the chi-square distribution in Statistical Table G of Appendix B

The Existence of a Relationship for the Chi-Square Test
• Existence: Test the $H_0$ that $\chi^2 = 0$; that is, there is no relationship between X and Y
• If the $H_0$ is rejected, a relationship exists

Direction and Strength of a Relationship for Chi-Square
• Direction: Not applicable (because the variables are nominal level)
• Strength: Measures of strength exist but are seldom reported because they are prone to misinterpretation

Nature of a Relationship for the Chi-Square Test
• Nature: Report the differences between the observed and expected cell frequencies for a couple of outstanding cells
• Calculate column percentages for selected cells

Column and Row Percentages
• A column percentage is a cell’s frequency as a percentage of the column marginal total
• A row percentage is a cell’s frequency as a percentage of the row marginal total

Chi-Square as a Difference of Proportions Test
• The chi-square test is frequently used to compare proportions of categories of a nominal/ordinal variable for two or more groups of a second nominal/ordinal variable
• Thus, it may be viewed as a difference of proportions test, as illustrated in Figure 13-2 in the text
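The expected-frequency rule, the chi-square equation, and the degrees-of-freedom formula can be tried on the sample crosstab of gender by nights on the streets. The following is a minimal Python sketch, not the textbook's procedure: the function and variable names are invented for illustration, and the p-value uses the closed form erfc(sqrt(chi-square / 2)), which holds only for df = 1 and stands in for looking up critical values in Statistical Table G.

```python
import math

def chi_square(table):
    """Return (chi-square, df, expected) for a list-of-rows crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E = (row marginal total * column marginal total) / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Rows: Male, Female. Columns: on streets Yes, No.
observed = [[28, 79], [10, 44]]
chi2, df, expected = chi_square(observed)

# Check the requirement that every expected frequency is at least five.
min_expected = min(min(row) for row in expected)

# Assumption: closed-form upper-tail probability, valid only when df == 1.
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None

# Percentage of each gender who spent a night on the streets.
pct_on_streets = [100 * row[0] / sum(row) for row in observed]

print(round(chi2, 3), df, round(min_expected, 2), p_value, pct_on_streets)
```

For these data the statistic comes out near 1.16 with one degree of freedom, which is below the conventional .05 critical value of 3.84, so under this sketch the H0 of no relationship would not be rejected.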
The Binomial Distribution
• The binomial distribution test is a small single-sample proportions test. Contrast it to the large single-sample proportions test of Chapter 10
• The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$

When to Use the Binomial Distribution
1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]
2) There is a single, representative sample from one population
3) Sample size is such that $(p_{smaller})(n) < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to which we may compare the sample proportion

Expansion of the Binomial Distribution Equation
• Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling distribution for dichotomous events. That is, the equation describes all possible sampling outcomes and the probability of each, where there are only two possible categories of a nominal variable

An Example of an Expanded Binomial Equation
• The equation reveals, for example, the possible outcomes of the tossing of 4 coins
• P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins
• $(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$
• Add the coefficients to get the total number of possible outcomes = 16
• Because P and Q are both .5, the probability of 3 heads and 1 tail is the coefficient of $P^3Q^1$ over the sum of coefficients = 4 over 16 = .25

Pascal’s Triangle
• Pascal’s Triangle provides a shortcut method for expanding the binomial equation
• It provides the coefficients for small samples and allows a quick computation of the probabilities of all possible outcomes when P and Q are equal to .5
• See Table 13-7 in the text

Features of the Binomial Distribution Test
• Step 1. $H_0$: $P_u$ = a target value
• Step 2.
The sampling distribution is an expanded binomial equation for the given sample size

Features of the Binomial Distribution Test (cont.)
• Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tail is represented by the term $4P^3Q^1$)
• The test statistic is the expanded binomial equation
• The p-value is taken directly from the equation (not from a statistical table)

Statistical Follies: Statistical Power and Sample Size
• For a given level of significance, statistical power is a test statistic’s probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)
• Low statistical power can result from having too small a sample size
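To make the binomial expansion from the earlier slides concrete, the sketch below generates the coefficients and term probabilities of (P + Q)^n for the four-coin example. It is an illustrative Python sketch under stated assumptions: the helper names `binomial_coefficients` and `binomial_terms` are hypothetical, the coefficients come from `math.comb` rather than from Pascal's Triangle as tabulated in the text, and the one-tailed sum at the end is one common way of reading a p-value off the expansion, not necessarily the textbook's.

```python
import math

def binomial_coefficients(n):
    """Coefficients of (P + Q)^n, ordered from P^n down to Q^n."""
    return [math.comb(n, k) for k in range(n, -1, -1)]

def binomial_terms(n, p):
    """Map number of successes k to the term C(n, k) * P^k * Q^(n - k)."""
    q = 1 - p
    return {k: math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

# Four coins with P = p[heads] = .5 and Q = p[tails] = .5.
coefficients = binomial_coefficients(4)
terms = binomial_terms(4, 0.5)

total_outcomes = sum(coefficients)
p_three_heads = terms[3]

# Assumption for illustration: a one-tailed p-value taken as the sum of the
# terms for the observed number of successes and anything more extreme.
p_at_least_three = sum(prob for k, prob in terms.items() if k >= 3)

print(coefficients, total_outcomes, p_three_heads, p_at_least_three)
```

Unlike the coefficient-over-16 shortcut, `binomial_terms` also works when P and Q are unequal, because each term is weighted by the powers of P and Q.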