The Statistical Imagination

Chapter 13: Nominal Variables: The Chi-Square and Binomial Distributions

© 2008 McGraw-Hill Higher Education

The Chi-Square Test
• Chi-square is a test for a relationship between two nominal variables
• Calculations are made using a cross-tabulation (or “crosstab”) table, which reports frequencies of joint occurrences of attributes

Crosstab Tables
• Cross-tabulation or “crosstab” tables are designed to compare the frequencies of two nominal/ordinal variables at once

Sample Crosstab Table
• Spent night on streets in last 2 weeks by gender among homeless persons

            On streets
            Yes     No    Total
Male         28     79      107
Female       10     44       54
Total        38    123      161

Reading a Crosstab Table
• The number in a cell is the frequency of joint occurrences, where a joint occurrence is the combination of categories of the two variables for a single individual
• From the cell, look up then look to the left
• E.g., in the table above, the joint occurrence of “male and on-street” is 28, the number in the sample who are both male and spent a night on the streets

Reading a Crosstab Table (cont.)
• The numbers in the margins on the right side and the bottom present marginal totals, the total number of subjects in a category
• The grand total (n, the sample size) is presented in the bottom right-hand corner

Crosstab Tables and the Chi-Square Test
• For the chi-square test, the categories of the independent variable (X) go in the columns of the table, and those of the dependent variable (Y), in the rows
• E.g.: Is gender a good predictor of who among homeless persons is likely to spend a night on the streets?
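The marginal totals and grand total of the sample crosstab table can be recomputed from the four cell frequencies. The following is a minimal sketch for checking the table by hand; the names `observed` and `marginal_totals` are illustrative assumptions and do not come from the text.

```python
# Observed joint frequencies from the sample crosstab table.
# Outer keys: gender; inner keys: spent a night on the streets (Yes/No).
observed = {
    "Male": {"Yes": 28, "No": 79},
    "Female": {"Yes": 10, "No": 44},
}


def marginal_totals(table):
    """Return (row_totals, column_totals, grand_total) for a nested-dict crosstab."""
    # Row marginal totals: sum across each row.
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    # Column marginal totals: accumulate each column's cells over all rows.
    column_totals = {}
    for cells in table.values():
        for col, freq in cells.items():
            column_totals[col] = column_totals.get(col, 0) + freq
    # Grand total n: the sample size.
    grand_total = sum(row_totals.values())
    return row_totals, column_totals, grand_total


row_totals, column_totals, n = marginal_totals(observed)
print(row_totals)     # {'Male': 107, 'Female': 54}
print(column_totals)  # {'Yes': 38, 'No': 123}
print(n)              # 161
```

The printed totals match the margins of the sample table, which is a quick way to confirm that the cell frequencies were entered correctly.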

Calculating Expected Frequencies
• In addition to the observed joint frequencies, the chi-square test involves calculating the expected frequency of each table cell
• The expected frequency of a cell is equal to the column marginal total for the cell (look down) times the row marginal total for the cell (look to the right) divided by the grand total

Using Expected Frequencies to Test the Hypothesis
• The expected frequencies are those that would occur if there is no relationship between the two nominal/ordinal variables
• The chi-square statistic measures the gap between expected and observed frequencies
• If there is no relationship, then the expected and observed frequencies are the same and chi-square computes to zero

The Chi-Square Statistic
• The sampling distribution is generated using the chi-square equation: $\chi^2 = \sum [(O - E)^2 / E]$, where O is the observed frequency of a cell, and E is the expected frequency
• Chi-square tells us whether the summed squared differences between the observed and expected cell frequencies are so great that they are not simply the result of sampling error

When to Use the Chi-Square Statistic
1) There is one population with a representative sample from it
2) There are two variables, both of a nominal/ordinal level of measurement
3) The expected frequency of each cell in the crosstab table is at least five

Features of the Chi-Square Hypothesis Test
• Step 1. The $H_0$ states that there is no relationship between the two variables. When this is the case, chi-square calculates to a value of zero, give or take some sampling error
• This null hypothesis asserts no difference in observed and expected frequencies

Features of the Chi-Square Hypothesis Test (cont.)
• Step 2. The sampling distribution is the chi-square distribution.
It describes all possible outcomes of the chi-square statistic with repeated sampling when there is no relationship between X and Y
• Degrees of freedom are determined by the number of rows (r) and columns (c) in the crosstab table: $df = (r - 1)(c - 1)$

Features of the Chi-Square Hypothesis Test (cont.)
• Step 4. The test effects are the differences between expected and observed frequencies
• The test statistic is the chi-square statistic
• The p-value is obtained by comparing the calculated chi-square value to the critical values of the chi-square distribution in Statistical Table G of Appendix B

The Existence of a Relationship for the Chi-Square Test
• Existence: Test the $H_0$ that $\chi^2 = 0$; that is, there is no relationship between X and Y
• If the $H_0$ is rejected, a relationship exists

Direction and Strength of a Relationship for Chi-Square
• Direction: Not applicable (because the variables are nominal level)
• Strength: These measures exist but are seldom reported because they are prone to misinterpretation

Nature of a Relationship for the Chi-Square Test
• Nature: Report the differences between the observed and expected cell frequencies for a couple of outstanding cells
• Calculate column percentages for selected cells

Column and Row Percentages
• A column percentage is a cell’s frequency as a percentage of the column marginal total
• A row percentage is a cell’s frequency as a percentage of the row marginal total

Chi-Square as a Difference of Proportions Test
• The chi-square test is frequently used to compare proportions of categories of a nominal/ordinal variable for two or more groups of a second nominal/ordinal variable
• Thus, it may be viewed as a difference of proportions test as illustrated in Figure 13-2 in the text
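The expected-frequency rule, the chi-square equation, and the degrees-of-freedom formula can be tried out on the homeless-persons table (28, 79, 10, 44). This is a minimal sketch using only the Python standard library; the function name `chi_square` and the list-of-rows layout are assumptions made for illustration, not part of the text.

```python
# Observed frequencies from the sample crosstab table.
observed = [
    [28, 79],  # Male:   Yes, No
    [10, 44],  # Female: Yes, No
]


def chi_square(table):
    """Return (chi2, df, expected) for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell:
    # (row marginal total * column marginal total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Chi-square: sum over all cells of (O - E)^2 / E.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (r - 1)(c - 1).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square(observed)
print(round(chi2, 2), df)                       # 1.16 1
print(min(min(row) for row in expected) >= 5)   # True: every expected frequency is at least five
```

With df = 1, the commonly tabled critical value at the .05 level is 3.84, so a chi-square of about 1.16 would not lead to rejecting the null hypothesis of no relationship for this sample.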

The Binomial Distribution
• The binomial distribution test is a small single-sample proportions test. Contrast it to the large single-sample proportions test of Chapter 10
• The test hinges on mathematically expanding the binomial distribution equation, $(P + Q)^n$

When to Use the Binomial Distribution
1) There is only one nominal variable and it is dichotomous, with P = p [of success] and Q = p [of failure]
2) There is a single, representative sample from one population
3) Sample size is such that $[(p_{smaller})(n)] < 5$, where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to which we may compare the sample proportion

Expansion of the Binomial Distribution Equation
• Expansion of the binomial distribution equation, $(P + Q)^n$, provides the sampling distribution for dichotomous events. That is, the equation describes all possible sampling outcomes and the probability of each, where there are only two possible categories of a nominal variable

An Example of an Expanded Binomial Equation
• The equation reveals, for example, the possible outcomes of the tossing of 4 coins
• P = p [heads] = .5; Q = p [tails] = .5; n = 4 coins
• $(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$
• Add the coefficients to get the total number of possible outcomes = 16
• The probability of 3 heads and 1 tail is the coefficient of $P^3Q^1$ over the sum of coefficients = 4 over 16 = .25

Pascal’s Triangle
• Pascal’s Triangle provides a shortcut method for expanding the binomial equation
• It provides the coefficients for small samples and allows a quick computation of the probabilities of all possible outcomes when P and Q are equal to .5
• See Table 13-7 in the text

Features of the Binomial Distribution Test
• Step 1. $H_0$: $P_u$ = a target value
• Step 2.
The sampling distribution is an expanded binomial equation for the given sample size

Features of the Binomial Distribution Test (cont.)
• Step 4. The effect is the observed combination of successes and failures, which corresponds to a term in the equation (e.g., 3 heads and 1 tail is represented by the term $4P^3Q^1$)
• The test statistic is the expanded binomial equation
• The p-value is taken directly from the equation (not from a statistical table)

Statistical Follies: Statistical Power and Sample Size
• For a given level of significance, statistical power is a test statistic’s probability of not incurring a Type II error (i.e., unknowingly making the incorrect decision of failing to reject a false null hypothesis)
• Low statistical power can result from having too small a sample size
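Returning to the binomial test, the four-coin expansion can be reproduced numerically. The sketch below uses `math.comb` (Python 3.8+) to generate the coefficients and the probability of each term. The helper names `binomial_terms` and `upper_tail` are hypothetical, and summing the observed term with all more extreme terms is one common convention for a one-tailed p-value, assumed here because the slides do not spell out the summation.

```python
from math import comb


def binomial_terms(n, p):
    """Probability of k successes, k = 0..n, from the terms of (P + Q)^n."""
    q = 1 - p
    # Each term is coefficient * P^k * Q^(n - k).
    return {k: comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)}


def upper_tail(n, p, k_observed):
    """Assumed convention: probability of k_observed or more successes."""
    terms = binomial_terms(n, p)
    return sum(prob for k, prob in terms.items() if k >= k_observed)


# Coefficients for n = 4, i.e. the matching row of Pascal's Triangle.
coefficients = [comb(4, k) for k in range(5)]
probs = binomial_terms(4, 0.5)

print(coefficients, sum(coefficients))  # [1, 4, 6, 4, 1] 16
print(probs[3])                         # 0.25, i.e. 4 over 16 for 3 heads and 1 tail
print(upper_tail(4, 0.5, 3))            # 0.3125, i.e. (4 + 1) over 16
```

Because the terms are computed with general P and Q, the same sketch also covers cases where P and Q are not both .5, where the simple "coefficient over sum of coefficients" shortcut no longer applies.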
