The Statistical Imagination

• Chapter 13:
Nominal Variables: The
Chi-Square and Binomial
Distributions
© 2008 McGraw-Hill Higher Education
The Chi-Square Test
• Chi-Square is a test for a relationship
between two nominal variables
• Calculations are made using a cross-tabulation (or “crosstab”) table, which
reports frequencies of joint
occurrences of attributes
Crosstab Tables
• Cross-tabulation or “crosstab” tables
are designed to compare the
frequencies of two nominal/ordinal
variables at once
Sample Crosstab Table
• Spent night on streets in last 2 weeks
by gender among homeless persons
On streets    Male    Female    Total
Yes             28        10       38
No              79        44      123
Total          107        54      161
Reading a Crosstab Table
• The number in a cell is the frequency of
joint occurrences, where a joint occurrence
is the combination of categories of the two
variables for a single individual
• From the cell, look up then look to the left
• E.g., in the table above, the joint
occurrence of “male and on-street” is 28,
the number in the sample who are both
male and spent a night on the streets
Reading a Crosstab Table (cont.)
• The numbers in the margins on the
right side and the bottom present
marginal totals, the total number of
subjects in a category
• The grand total (n, the sample size) is
presented in the bottom right-hand
corner
Crosstab Tables and
the Chi-Square Test
• For the chi-square test, the categories
of the independent variable (X) go in
the columns of the table, and those of
the dependent variable (Y), in the rows
• E.g.: Is gender a good predictor of who
among homeless persons is likely to
spend a night on the streets?
Calculating Expected
Frequencies
• In addition to the observed joint
frequencies, the chi-square test involves
calculating the expected frequency of each
table cell
• The expected frequency of a cell is equal
to the column marginal total for the cell
(look down) times the row marginal total for
the cell (look to the right) divided by the grand
total
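As a minimal sketch of the rule above, the following Python computes the expected frequency of every cell in the sample crosstab of gender by nights on the streets. The function name `expected_frequencies` and the list-of-rows layout are illustrative assumptions, not part of the text.

```python
# Observed frequencies from the sample crosstab (homeless persons).
# Rows: spent a night on the streets (Yes, No); columns: gender (Male, Female).
observed = [
    [28, 10],  # Yes
    [79, 44],  # No
]

def expected_frequencies(table):
    """Return E = (row total * column total) / grand total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(("Yes", "No"), expected):
    print(label, [round(e, 2) for e in row])
```

The expected frequencies reproduce the marginal totals of the observed table, so they also sum to the grand total of 161.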
Using Expected Frequencies
to Test the Hypothesis
• The expected frequencies are those that
would occur if there is no relationship
between the two nominal/ordinal variables
• The chi-square statistic measures the gap
between expected and observed
frequencies
• If there is no relationship, then the
expected and observed frequencies are the same,
give or take sampling error, and chi-square
computes to about zero
The Chi-Square Statistic
• The sampling distribution is generated using the
chi-square equation:
$\chi^2 = \sum \left[ (O - E)^2 / E \right]$
where O is the observed frequency of a cell,
and E is the expected frequency
• Chi-square tells us whether the summed squared
differences between the observed and expected
cell frequencies are so great that they are not
simply the result of sampling error
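To make the summation concrete, here is a hedged sketch that applies the chi-square equation to the sample crosstab of gender by nights on the streets. The helper name `chi_square` is an assumption for illustration; the expected frequencies are derived from the marginal totals inside the loop.

```python
# Observed frequencies from the sample crosstab (homeless persons).
# Rows: spent a night on the streets (Yes, No); columns: gender (Male, Female).
observed = [
    [28, 10],  # Yes
    [79, 44],  # No
]

def chi_square(table):
    """Sum (O - E)^2 / E over all cells, with E taken from the marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (o - e) ** 2 / e
    return statistic

chi2 = chi_square(observed)
print(round(chi2, 3))
```

For this table the statistic comes to roughly 1.16, a small gap between observed and expected frequencies.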
When to Use the
Chi-Square Statistic
1) There is one population with a
representative sample from it
2) There are two variables, both of a
nominal/ordinal level of
measurement
3) The expected frequency of each cell
in the crosstab table is at least five
Features of the Chi-Square
Hypothesis Test
• Step 1. The H0 states that there is no
relationship between the two
variables. When this is the case, chi-square calculates to a value of zero,
give or take some sampling error
• This null hypothesis asserts no
difference in observed and expected
frequencies
Features of the Chi-Square
Hypothesis Test (cont.)
• Step 2. The sampling distribution is the chi-square distribution. It describes all
possible outcomes of the chi-square
statistic with repeated sampling when there
is no relationship between X and Y
• Degrees of freedom are determined by the
number of columns and rows in the
crosstab table: df = (r - 1)(c - 1)
Features of the Chi-Square
Hypothesis Test (cont.)
• Step 4. The test effects are the differences
between expected and observed
frequencies
• The test statistic is the chi-square statistic
• The p-value is obtained by comparing the
calculated chi-square value to the critical
values of the chi-square distribution in
Statistical Table G of Appendix B
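The text obtains the p-value from a statistical table. As a rough cross-check for the special case of a 2 x 2 table (df = 1), the sketch below computes the degrees of freedom and an upper-tail probability directly. The function names are hypothetical, the shortcut `erfc(sqrt(x / 2))` holds only for df = 1, and the value 1.165 is used as an illustrative statistic (approximately what the gender-by-streets sample table yields).

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for a crosstab with r rows and c columns."""
    return (n_rows - 1) * (n_cols - 1)

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value when df = 1 only.

    A chi-square variable with 1 df is the square of a standard normal
    score, so P(chi-square >= x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))

df = degrees_of_freedom(2, 2)  # a 2 x 2 crosstab
critical_05 = 3.841            # standard tabled critical value for df = 1, alpha = .05
chi2 = 1.165                   # illustrative statistic (assumption, see lead-in)
p = p_value_df1(chi2)
print(df, round(p, 2), chi2 >= critical_05)
```

Because the illustrative statistic falls below the critical value, the null hypothesis of no relationship would not be rejected at the .05 level.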
The Existence of a Relationship
for the Chi-Square Test
• Existence: Test the H0 that $\chi^2 = 0$;
that is, there is no relationship
between X and Y
• If the H0 is rejected, a relationship
exists
Direction and Strength of a
Relationship for Chi-Square
• Direction: Not applicable (because
the variables are nominal level)
• Strength: Measures of strength exist but
are seldom reported because they
are prone to misinterpretation
Nature of a Relationship for
the Chi-Square Test
• Nature: Report the differences
between the observed and expected
cell frequencies for a couple of
outstanding cells
• Calculate column percentages for
selected cells
Column and Row
Percentages
• A column percentage is a cell’s
frequency as a percentage of the
column marginal total
• A row percentage is a cell’s frequency
as a percentage of the row marginal
total
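The two definitions above can be sketched in a few lines of Python, using the sample crosstab of gender by nights on the streets. The function names and the list-of-rows layout are assumptions made for illustration.

```python
# Observed frequencies from the sample crosstab (homeless persons).
# Rows: spent a night on the streets (Yes, No); columns: gender (Male, Female).
observed = [
    [28, 10],  # Yes
    [79, 44],  # No
]

def column_percentages(table):
    """Each cell as a percentage of its column marginal total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]

def row_percentages(table):
    """Each cell as a percentage of its row marginal total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

col_pct = column_percentages(observed)
row_pct = row_percentages(observed)
print([round(p, 1) for p in col_pct[0]])  # "Yes" row: percent of males, percent of females
```

Read as column percentages, about 26.2% of the males and 18.5% of the females in the sample spent a night on the streets.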
Chi-Square as a Difference
of Proportions Test
• The chi-square test is frequently used to
compare proportions of categories of a
nominal/ordinal variable for two or more
groups of a second nominal/ordinal
variable
• Thus, it may be viewed as a difference
of proportions test as illustrated in
Figure 13-2 in the text
The Binomial Distribution
• The binomial distribution test is a small
single-sample proportions test. Contrast it
to the large single-sample proportions test
of Chapter 10
• The test hinges on mathematically
expanding the binomial distribution
equation, $(P + Q)^n$
When to Use the
Binomial Distribution
1) There is only one nominal variable and it
is dichotomous, with P = p [of success]
and Q = p [of failure]
2) There is a single, representative sample
from one population
3) Sample size is such that $[(p_{smaller})(n)] < 5$,
where $p_{smaller}$ = the smaller of $P_u$ and $Q_u$
4) There is a target value of the variable to
which we may compare the sample
proportion
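Criterion 3 is a simple arithmetic check. A minimal sketch follows, assuming a hypothetical helper `small_sample_condition` that takes the target proportion and the sample size; it covers only the sample-size criterion, not the other three.

```python
def small_sample_condition(p_u, n):
    """Return True when (p_smaller)(n) < 5, with p_smaller = min(P_u, Q_u)."""
    q_u = 1 - p_u  # Q_u is the probability of failure
    return min(p_u, q_u) * n < 5

print(small_sample_condition(0.5, 4))    # 4 coin tosses: (.5)(4) = 2
print(small_sample_condition(0.5, 100))  # 100 tosses: (.5)(100) = 50
```

When the check returns False, the sample is large enough that the criterion for the binomial distribution test is not met.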
Expansion of the Binomial
Distribution Equation
• Expansion of the binomial distribution
equation, $(P + Q)^n$, provides the sampling
distribution for dichotomous events. That
is, the equation describes all possible
sampling outcomes and the probability of
each, where there are only two possible
categories of a nominal variable
An Example of an
Expanded Binomial Equation
• The equation reveals, for example, the
possible outcomes of the tossing of 4 coins
• P = p [heads] = .5; Q = p [tails] = .5; n = 4
coins
• $(P + Q)^4 = P^4 + 4P^3Q^1 + 6P^2Q^2 + 4P^1Q^3 + Q^4$
• Add the coefficients to get the total number
of possible outcomes = 16
• With P = Q = .5, the probability of 3 heads and 1 tail is the
coefficient of $P^3Q^1$ over the sum of
coefficients = 4 over 16 = .25
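The coin-toss expansion can be reproduced with a short sketch. It uses the standard binomial coefficient (`math.comb`, available in Python 3.8 and later); the function name `binomial_terms` is an illustrative assumption.

```python
from math import comb

def binomial_terms(n, p):
    """Return (successes, coefficient, probability) for each term of (P + Q)^n."""
    q = 1 - p
    return [(k, comb(n, k), comb(n, k) * p**k * q**(n - k))
            for k in range(n, -1, -1)]  # from P^n down to Q^n

terms = binomial_terms(4, 0.5)  # 4 coins, P = p[heads] = .5
coefficients = [c for _, c, _ in terms]
prob_3_heads = next(prob for k, _, prob in terms if k == 3)
print(coefficients, sum(coefficients), prob_3_heads)
# prints: [1, 4, 6, 4, 1] 16 0.25
```

The probability computed from the full term, coefficient times P^k times Q^(n-k), also works when P and Q are not equal, whereas the coefficient-over-sum shortcut applies only when both are .5.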
Pascal’s Triangle
• Pascal’s Triangle provides a shortcut
method for expanding the binomial
equation
• It provides the coefficients for small
samples and allows a quick computation of
the probabilities of all possible outcomes
when P and Q are equal to .5
• See Table 13-7 in the text
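As an illustration of the shortcut, the sketch below builds rows of Pascal's Triangle by adding neighbouring entries and converts each row to probabilities for the case P = Q = .5. The function name `pascal_row` is hypothetical, and the output is not a copy of the table in the text.

```python
def pascal_row(n):
    """Build row n of Pascal's Triangle by repeated addition of neighbours."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

for n in range(5):
    row = pascal_row(n)
    total = sum(row)  # number of possible outcomes, 2**n
    print(n, row, [c / total for c in row])
```

Row 4 gives the coefficients 1, 4, 6, 4, 1 used for four coin tosses.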
Features of the
Binomial Distribution Test
• Step 1. H0: $P_u$ = a target value
• Step 2. The sampling distribution is
an expanded binomial equation for
the given sample size
Features of the Binomial
Distribution Test (cont.)
• Step 4. The effect is the observed
combination of successes and failures,
which corresponds to a term in the
equation (e.g., 3 heads and 1 tail is
represented by the term $4P^3Q^1$)
• The test statistic is the expanded binomial
equation
• The p-value is taken directly from the
equation (not from a statistical table)
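One way to read a p-value off the expanded equation is to add the probability of the observed term to the probabilities of the more extreme terms. The sketch below does this in the upper direction only; treating the test as one-tailed in that direction, and the name `upper_tail_p_value`, are assumptions for illustration rather than the text's stated procedure.

```python
from math import comb

def upper_tail_p_value(successes, n, p_target):
    """Sum the terms of (P + Q)^n for `successes` or more successes."""
    q = 1 - p_target
    return sum(comb(n, k) * p_target**k * q**(n - k)
               for k in range(successes, n + 1))

# 3 heads in 4 tosses of a fair coin: terms 4P^3Q^1 and P^4.
p_value = upper_tail_p_value(3, 4, 0.5)
print(p_value)  # 0.3125
```

Here the two terms contribute 4/16 and 1/16, so the probability of 3 or more heads is 5/16.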
Statistical Follies:
Statistical Power and Sample Size
• For a given level of significance,
statistical power is a test statistic’s
probability of not incurring a Type II
error (i.e., unknowingly making the
incorrect decision of failing to reject a
false null hypothesis)
• Low statistical power can result from
having too small a sample size