H - Georgia Highlands College

Chapter 11
Chi-Square and Analysis of
Variance (ANOVA)
© McGraw-Hill, Bluman, 5th ed., Chapter 11
Chapter 11 Overview

Introduction
11-1 Test for Goodness of Fit

11-2 Tests Using Contingency Tables

11-3 Analysis of Variance (ANOVA)
Chapter 11 Objectives
1. Test a distribution for goodness of fit, using
chi-square.
2. Test two variables for independence, using
chi-square.
3. Test proportions for homogeneity, using
chi-square.
4. Use the ANOVA technique to determine if
there is a significant difference among three or
more means.
11-1 Test for Goodness of Fit

The chi-square statistic can be used to
see whether a frequency distribution fits
a specific pattern. This is referred to as
the chi-square goodness-of-fit test.
Test for Goodness of Fit
Formula for the test for goodness of fit:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
d.f. = number of categories minus 1
O = observed frequency
E = expected frequency
Assumptions for Goodness of Fit
1. The data are obtained from a random sample.
2. The expected frequency for each category
must be 5 or more.
Section 11-1
Example 11-1
Page #592
Example 11-1: Fruit Soda Flavors
A market analyst wished to see whether consumers have
any preference among five flavors of a new fruit soda. A
sample of 100 people provided the following data. Is there
enough evidence to reject the claim that there is no
preference in the selection of fruit soda flavors, using the
data shown below? Let α = 0.05.

            Cherry   Strawberry   Orange   Lime   Grape
Observed      32         28         16      14     10
Expected      20         20         20      20     20
Step 1: State the hypotheses and identify the claim.
H0: Consumers show no preference (claim).
H1: Consumers show a preference.
Step 2: Find the critical value.
D.f. = 5 – 1 = 4, and α = 0.05. CV = 9.488.
Step 3: Compute the test value.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(32 - 20)^2}{20} + \frac{(28 - 20)^2}{20} + \frac{(16 - 20)^2}{20} + \frac{(14 - 20)^2}{20} + \frac{(10 - 20)^2}{20} \\
&= 18.0
\end{aligned}
$$
Step 4: Make the decision.
The decision is to reject the null hypothesis, since
18.0 > 9.488.
Step 5: Summarize the results.
There is enough evidence to reject the claim that
consumers show no preference for the flavors.
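The five steps of Example 11-1 can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_gof` is made up for this sketch, and the critical value is the table value quoted in the example rather than something the code computes.

```python
def chi_square_gof(observed, expected):
    """Return the goodness-of-fit statistic, sum((O - E)^2 / E)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Fruit soda data from Example 11-1 (cherry, strawberry, orange, lime, grape).
observed = [32, 28, 16, 14, 10]

# "No preference" means every flavor is expected equally often: 100 / 5 = 20.
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1          # number of categories minus 1
critical_value = 9.488          # table value quoted in the example (d.f. = 4, alpha = 0.05)

reject_null = chi2 > critical_value
print(f"chi-square = {chi2:.1f}, d.f. = {df}, reject H0: {reject_null}")
```

The test is right-tailed, so the null hypothesis is rejected only when the statistic exceeds the critical value.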
Section 11-1
Example 11-2
Page #594
Example 11-2: Retirees
The Russel Reynold Association surveyed retired senior
executives who had returned to work. They found that
after returning to work, 38% were employed by another
organization, 32% were self-employed, 23% were either
freelancing or consulting, and 7% had formed their own
companies. To see if these percentages are consistent
with those of Allegheny County residents, a local
researcher surveyed 300 retired executives who had
returned to work and found that 122 were working for
another company, 85 were self-employed, 76 were either
freelancing or consulting, and 17 had formed their own
companies. At α = 0.10, test the claim that the
percentages are the same for those people in Allegheny
County.
            New Company       Self-Employed     Freelancing       Owns Company
Observed    122               85                76                17
Expected    0.38(300) = 114   0.32(300) = 96    0.23(300) = 69    0.07(300) = 21
Step 1: State the hypotheses and identify the claim.
H0: The retired executives who returned to work
are distributed as follows: 38% are employed
by another organization, 32% are self-employed, 23% are either freelancing or
consulting, and 7% have formed their own
companies (claim).
H1: The distribution is not the same as stated in
the null hypothesis.
Step 2: Find the critical value.
D.f. = 4 – 1 = 3, and α = 0.10. CV = 6.251.
Step 3: Compute the test value.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(122 - 114)^2}{114} + \frac{(85 - 96)^2}{96} + \frac{(76 - 69)^2}{69} + \frac{(17 - 21)^2}{21} \\
&= 3.2939
\end{aligned}
$$
Step 4: Make the decision.
Since 3.2939 < 6.251, the decision is not to reject
the null hypothesis.
Step 5: Summarize the results.
There is not enough evidence to reject the claim.
It can be concluded that the percentages are not
significantly different from those given in the null
hypothesis.
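When the null hypothesis gives percentages rather than counts, the expected frequencies are the percentages times the sample size. A minimal Python sketch of Example 11-2 follows; the helper name `expected_counts` is hypothetical, and the critical value is copied from the example.

```python
def expected_counts(proportions, n):
    """Expected frequency for each category: hypothesized proportion times sample size."""
    return [p * n for p in proportions]


# Example 11-2: hypothesized distribution and the local survey of 300 executives.
proportions = [0.38, 0.32, 0.23, 0.07]
observed = [122, 85, 76, 17]

n = sum(observed)                            # 300
expected = expected_counts(proportions, n)   # approximately 114, 96, 69, 21

# Assumption from the slides: every expected frequency must be 5 or more.
assumption_ok = all(e >= 5 for e in expected)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 6.251                       # table value quoted in the example (d.f. = 3, alpha = 0.10)

reject_null = chi2 > critical_value
print(f"chi-square = {chi2:.4f}, assumption met: {assumption_ok}, reject H0: {reject_null}")
```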
Section 11-1
Example 11-3
Page #595
Example 11-3: Firearm Deaths
A researcher read that firearm-related deaths for people
aged 1 to 18 were distributed as follows: 74% were
accidental, 16% were homicides, and 10% were
suicides. In her district, there were 68 accidental deaths,
27 homicides, and 5 suicides during the past year. At
α = 0.10, test the claim that the percentages in her district
are the same as those stated.

            Accidental   Homicides   Suicides
Observed        68           27          5
Expected        74           16         10
Step 1: State the hypotheses and identify the claim.
H0: Deaths due to firearms for people aged 1
through 18 are distributed as follows: 74%
accidental, 16% homicides, and 10% suicides
(claim).
H1: The distribution is not the same as stated in
the null hypothesis.
Step 2: Find the critical value.
D.f. = 3 – 1 = 2, and α = 0.10. CV = 4.605.
Step 3: Compute the test value.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(68 - 74)^2}{74} + \frac{(27 - 16)^2}{16} + \frac{(5 - 10)^2}{10} \\
&= 10.549
\end{aligned}
$$
Step 4: Make the decision.
Reject the null hypothesis, since 10.549 > 4.605.
Step 5: Summarize the results.
There is enough evidence to reject the claim that
the distribution is 74% accidental, 16% homicides,
and 10% suicides.
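The slides take critical values from a table. For the special case of 2 degrees of freedom, as in Example 11-3, the right-tail area of the chi-square distribution has the closed form exp(-x/2), so both the critical value and a P-value can be checked by hand. This shortcut is not part of the slides and does not extend to other degrees of freedom; the function names below are made up for the sketch.

```python
import math


def chi2_critical_df2(alpha):
    """Right-tail critical value for chi-square with 2 d.f.: solves exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)


def chi2_p_value_df2(chi2):
    """Right-tail area beyond chi2 for a chi-square distribution with 2 d.f."""
    return math.exp(-chi2 / 2.0)


alpha = 0.10
critical_value = chi2_critical_df2(alpha)   # agrees with the table value 4.605 to three decimals
p_value = chi2_p_value_df2(10.549)          # test value from Example 11-3

print(f"critical value = {critical_value:.3f}")
print(f"reject H0 by P-value: {p_value < alpha}")
```

Both routes lead to the same decision: the test value exceeds the critical value exactly when the P-value is below α.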
Test for Normality (Optional)



The chi-square goodness-of-fit test can be used
to test a variable to see if it is normally
distributed.
The hypotheses are:
H0: The variable is normally distributed.
H1: The variable is not normally distributed.
This procedure is somewhat complicated. The
calculations are shown in Example 11-4 on
page 597 in the text.
11-2 Tests Using Contingency Tables

When data can be tabulated in table form in terms of
frequencies, several types of hypotheses can be tested
by using the chi-square test.

The test of independence of variables is used to
determine whether two variables are independent of or
related to each other when a single sample is selected.

The test of homogeneity of proportions is used to
determine whether the proportions for a variable are
equal when several samples are selected from
different populations.
Test for Independence



The chi-square independence test can be used
to test the independence of two variables.
The hypotheses are:
H0: There is no relationship between the two
variables.
H1: There is a relationship between the two
variables.
If the null hypothesis is rejected, there is some
relationship between the variables.


In order to test the null hypothesis, one must
compute the expected frequencies, assuming
the null hypothesis is true.
When data are arranged in table form for the
independence test, the table is called a
contingency table.
Contingency Tables

The degrees of freedom for any contingency
table are d.f. = (rows – 1) (columns – 1) =
(R – 1)(C – 1).
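A small helper makes the degrees-of-freedom rule concrete. This is a sketch with a hypothetical function name; the table passed in must hold only the observed cells, without the "Total" row or column, otherwise R and C are each counted one too high.

```python
def contingency_df(table):
    """Degrees of freedom (R - 1)(C - 1) for a table given as a list of rows."""
    rows = len(table)
    cols = len(table[0])
    if any(len(row) != cols for row in table):
        raise ValueError("every row must have the same number of columns")
    return (rows - 1) * (cols - 1)


# Placeholder tables: only the shape matters for the degrees of freedom.
three_by_three = [[0] * 3 for _ in range(3)]
two_by_three = [[0] * 3 for _ in range(2)]

print(contingency_df(three_by_three))  # (3 - 1)(3 - 1) = 4
print(contingency_df(two_by_three))    # (2 - 1)(3 - 1) = 2
```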
Test for Independence
The formula for the test for independence:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
d.f. = (R – 1)(C – 1)
O = observed frequency
E = expected frequency:
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$
Section 11-2
Example 11-5
Page #606
Example 11-5: College Education and
Place of Residence
A sociologist wishes to see whether the number of years
of college a person has completed is related to her or his
place of residence. A sample of 88 people is selected
and classified as shown. At α = 0.05, can the sociologist
conclude that a person’s location is dependent on the
number of years of college?
Location    No College   Four-Year Degree   Advanced Degree   Total
Urban           15              12                 8            35
Suburban         8              15                 9            32
Rural            6               8                 7            21
Total           29              35                24            88
Step 1: State the hypotheses and identify the claim.
H0: A person’s place of residence is independent
of the number of years of college completed.
H1: A person’s place of residence is dependent on
the number of years of college completed
(claim).
Step 2: Find the critical value.
The critical value is 9.488, since the degrees of
freedom are (3 – 1)(3 – 1) = (2)(2) = 4.
Compute the expected values.
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}, \qquad E_{1,1} = \frac{(35)(29)}{88} = 11.53$$

Location    No College    Four-Year Degree   Advanced Degree   Total
Urban       15 (11.53)    12 (13.92)          8 (9.55)          35
Suburban     8 (10.55)    15 (12.73)          9 (8.73)          32
Rural        6 (6.92)      8 (8.35)           7 (5.73)          21
Total       29            35                 24                 88
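Step 3: Compute the test value.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(15 - 11.53)^2}{11.53} + \frac{(12 - 13.92)^2}{13.92} + \frac{(8 - 9.55)^2}{9.55} \\
&\quad + \frac{(8 - 10.55)^2}{10.55} + \frac{(15 - 12.73)^2}{12.73} + \frac{(9 - 8.73)^2}{8.73} \\
&\quad + \frac{(6 - 6.92)^2}{6.92} + \frac{(8 - 8.35)^2}{8.35} + \frac{(7 - 5.73)^2}{5.73} \\
&= 3.01
\end{aligned}
$$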
Step 4: Make the decision.
Do not reject the null hypothesis, since 3.01 < 9.488.
Step 5: Summarize the results.
There is not enough evidence to support the claim
that a person’s place of residence is dependent on
the number of years of college completed.
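The expected-value table and the test value of Example 11-5 can be reproduced with a short Python sketch. The function names are hypothetical, and the critical value is the table value for 4 degrees of freedom at α = 0.05. Because the code keeps the expected values unrounded, its statistic can differ from the hand calculation in the third decimal place.

```python
def expected_table(observed):
    """Expected count for each cell: (row sum)(column sum) / grand total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    expected = expected_table(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Example 11-5: rows are Urban, Suburban, Rural; columns are
# No College, Four-Year Degree, Advanced Degree (totals excluded).
observed = [
    [15, 12, 8],
    [8, 15, 9],
    [6, 8, 7],
]

chi2, df, expected = chi_square_independence(observed)
critical_value = 9.488  # table value for d.f. = 4, alpha = 0.05

print(f"E(1,1) = {expected[0][0]:.2f}, chi-square = {chi2:.2f}, d.f. = {df}")
print(f"reject H0: {chi2 > critical_value}")
```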
Section 11-2
Example 11-6
Page #608
Example 11-6: Alcohol and Gender
A researcher wishes to determine whether there is a
relationship between the gender of an individual and the
amount of alcohol consumed. A sample of 68 people is
selected, and the following data are obtained. At α = 0.10,
can the researcher conclude that alcohol consumption is
related to gender?
            Alcohol Consumption
Gender      Low   Moderate   High   Total
Male         10       9        8      27
Female       13      16       12      41
Total        23      25       20      68
Step 1: State the hypotheses and identify the claim.
H0: The amount of alcohol that a person
consumes is independent of the individual’s
gender.
H1: The amount of alcohol that a person
consumes is dependent on the individual’s
gender (claim).
Step 2: Find the critical value.
The critical value is 4.605, since the degrees of
freedom are (2 – 1)(3 – 1) = (1)(2) = 2.
Compute the expected values.
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}, \qquad E_{1,1} = \frac{(27)(23)}{68} = 9.13$$

            Alcohol Consumption
Gender      Low          Moderate     High         Total
Male        10 (9.13)     9 (9.93)     8 (7.94)     27
Female      13 (13.87)   16 (15.07)   12 (12.06)    41
Total       23           25           20            68
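Step 3: Compute the test value.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(10 - 9.13)^2}{9.13} + \frac{(9 - 9.93)^2}{9.93} + \frac{(8 - 7.94)^2}{7.94} \\
&\quad + \frac{(13 - 13.87)^2}{13.87} + \frac{(16 - 15.07)^2}{15.07} + \frac{(12 - 12.06)^2}{12.06} \\
&= 0.283
\end{aligned}
$$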
Step 4: Make the decision.
Do not reject the null hypothesis, since
0.283 < 4.605.
Step 5: Summarize the results.
There is not enough evidence to support the claim
that the amount of alcohol a person consumes is
dependent on the individual’s gender.
Test for Homogeneity of
Proportions

The test of homogeneity of proportions is used
when samples are selected from several
different populations and the researcher is
interested in determining whether the
proportions of elements that have a common
characteristic are the same for each population.


The hypotheses are:
 H0: p1 = p2 = p3 = … = pn
 H1: At least one proportion is different from
the others.
When the null hypothesis is rejected, it can be
assumed that the proportions are not all equal.
Assumptions for Homogeneity of
Proportions
1. The data are obtained from a random sample.
2. The expected frequency for each category
must be 5 or more.
Section 11-2
Example 11-7
Page #610
Example 11-7: Lost Luggage
A researcher selected 100 passengers from each of 3
airlines and asked them if the airline had lost their
luggage on their last flight. The data are shown in the
table. At α = 0.05, test the claim that the proportion of
passengers from each airline who lost luggage on the
flight is the same for each airline.
         Airline 1   Airline 2   Airline 3   Total
Yes         10           7           4         21
No          90          93          96        279
Total      100         100         100        300
Step 1: State the hypotheses.
H0: p1 = p2 = p3 (claim)
H1: At least one proportion differs from the others.
Step 2: Find the critical value.
The critical value is 5.991, since the degrees of
freedom are (2 – 1 )(3 – 1) = (1)(2) = 2.
Compute the expected values.
$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}, \qquad E_{1,1} = \frac{(21)(100)}{300} = 7$$

         Airline 1   Airline 2   Airline 3   Total
Yes       10 (7)       7 (7)       4 (7)       21
No        90 (93)     93 (93)     96 (93)     279
Total    100         100         100          300
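Step 3: Compute the test value.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(10 - 7)^2}{7} + \frac{(7 - 7)^2}{7} + \frac{(4 - 7)^2}{7} \\
&\quad + \frac{(90 - 93)^2}{93} + \frac{(93 - 93)^2}{93} + \frac{(96 - 93)^2}{93} \\
&= 2.765
\end{aligned}
$$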
Step 4: Make the decision.
Do not reject the null hypothesis, since
2.765 < 5.991.
Step 5: Summarize the results.
There is not enough evidence to reject the claim
that the proportions are equal. Hence it seems that
there is no difference in the proportions of the
luggage lost by each airline.
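For a homogeneity test, the expected counts can also be read as "sample size times the pooled proportion", which gives the same numbers as (row sum)(column sum)/grand total. The Python sketch below works Example 11-7 that way; the variable names are illustrative, and the critical value is the table value quoted in the example.

```python
# Example 11-7: passengers reporting lost luggage, 100 sampled per airline.
lost = [10, 7, 4]
sample_sizes = [100, 100, 100]

# Under H0 all airlines share one proportion, estimated by pooling the samples.
pooled = sum(lost) / sum(sample_sizes)          # 21 / 300 = 0.07

chi2 = 0.0
for yes, n in zip(lost, sample_sizes):
    no = n - yes
    expected_yes = n * pooled                   # 7 for each airline
    expected_no = n * (1 - pooled)              # 93 for each airline
    chi2 += (yes - expected_yes) ** 2 / expected_yes
    chi2 += (no - expected_no) ** 2 / expected_no

df = (2 - 1) * (len(lost) - 1)                  # 2 rows (Yes/No), 3 airlines
critical_value = 5.991                          # table value for d.f. = 2, alpha = 0.05

print(f"chi-square = {chi2:.3f}, d.f. = {df}, reject H0: {chi2 > critical_value}")
```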
11-3 Analysis of Variance (ANOVA)



The F test, used to compare two variances,
can also be used to compare three or more
means.
This technique is called analysis of variance
or ANOVA.
For three groups, the F test can only show
whether or not a difference exists among the
three means, not where the difference lies.
Analysis of Variance (ANOVA)
When an F test is used to test a
hypothesis concerning the means of
three or more populations, the technique
is called analysis of variance
(commonly abbreviated as ANOVA).
 Although the t test is commonly used to
compare two means, it should not be
used to compare three or more.

Assumptions for the F Test
The following assumptions apply when
using the F test to compare three or more
means.
1. The populations from which the samples
were obtained must be normally or
approximately normally distributed.
2. The samples must be independent of each
other.
3. The variances of the populations must be
equal.
The F Test
In the F test, two different estimates of
the population variance are made.
The first estimate is called the between-group
variance, and it involves finding
the variance of the means.
The second estimate, the within-group
variance, is made by computing the
variance using all the data and is not
affected by differences in the means.

If there is no difference in the means, the
between-group variance will be
approximately equal to the within-group
variance, and the F test value will be
close to 1, so the null hypothesis is not rejected.
However, when the means differ
significantly, the between-group variance
will be much larger than the within-group
variance; the F test value will be significantly
greater than 1, and the null hypothesis is rejected.

Section 11-3
Example 11-8
Page #620
Example 11-8: Lowering Blood Pressure
A researcher wishes to try three different techniques to
lower the blood pressure of individuals diagnosed with
high blood pressure. The subjects are randomly assigned
to three groups; the first group takes medication, the
second group exercises, and the third group follows a
special diet. After four weeks, the reduction in each
person’s blood pressure is recorded. At α = 0.05, test the
claim that there is no difference among the means.
Step 1: State the hypotheses and identify the claim.
H0: μ1 = μ2 = μ3 (claim)
H1: At least one mean is different from the others.
Step 2: Find the critical value.
Since k = 3, N = 15, and α = 0.05,
d.f.N. = k – 1 = 3 – 1 = 2
d.f.D. = N – k = 15 – 3 = 12
The critical value is 3.89, obtained from Table H.
Step 3: Compute the test value.
a. Find the mean and variance of each sample
(these were provided with the data).
b. Find the grand mean, the mean of all
values in the samples.
$$\bar{X}_{GM} = \frac{\sum X}{N} = \frac{10 + 12 + 9 + \cdots + 4}{15} = \frac{116}{15} = 7.73$$
c. Find the between-group variance, $s_B^2$.
$$s_B^2 = \frac{\sum n_i (\bar{X}_i - \bar{X}_{GM})^2}{k - 1}$$
$$s_B^2 = \frac{5(11.8 - 7.73)^2 + 5(3.8 - 7.73)^2 + 5(7.6 - 7.73)^2}{3 - 1} = \frac{160.13}{2} = 80.07$$
d. Find the within-group variance, $s_W^2$.
$$s_W^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)} = \frac{4(5.7) + 4(10.2) + 4(10.3)}{4 + 4 + 4} = \frac{104.80}{12} = 8.73$$
e. Compute the F value.
$$F = \frac{s_B^2}{s_W^2} = \frac{80.07}{8.73} = 9.17$$
Step 4: Make the decision.
Reject the null hypothesis, since 9.17 > 3.89.
Step 5: Summarize the results.
There is enough evidence to reject the claim and
conclude that at least one mean is different from
the others.
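The F computation in Example 11-8 needs only each group's mean, variance, and size. The Python sketch below follows steps b through e under that assumption; the function name `anova_from_summary` is hypothetical, and the grand mean is rebuilt as the size-weighted average of the group means, so results may differ slightly from the hand calculation, which rounds the grand mean to 7.73.

```python
def anova_from_summary(means, variances, sizes):
    """One-way ANOVA F value from per-group means, variances, and sample sizes."""
    k = len(means)
    n_total = sum(sizes)

    # Grand mean: total of all values divided by N, rebuilt from the group means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    # Between-group variance: sum of n_i (mean_i - grand mean)^2, divided by k - 1.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ms_between = ss_between / (k - 1)

    # Within-group variance: sum of (n_i - 1) s_i^2, divided by sum of (n_i - 1) = N - k.
    ss_within = sum((n - 1) * v for n, v in zip(sizes, variances))
    ms_within = ss_within / (n_total - k)

    return ms_between, ms_within, ms_between / ms_within


# Example 11-8 summary statistics (medication, exercise, diet), as used in step 3.
means = [11.8, 3.8, 7.6]
variances = [5.7, 10.2, 10.3]
sizes = [5, 5, 5]

ms_between, ms_within, f_value = anova_from_summary(means, variances, sizes)
critical_value = 3.89  # Table H value quoted in the example (d.f.N. = 2, d.f.D. = 12)

print(f"F = {f_value:.2f}, reject H0: {f_value > critical_value}")
```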
ANOVA




The between-group variance is sometimes
called the mean square, MSB.
The numerator of the formula to compute MSB
is called the sum of squares between
groups, SSB.
The within-group variance is sometimes called
the mean square, MSW.
The numerator of the formula to compute MSW
is called the sum of squares within groups,
SSW.
ANOVA Summary Table
Source           Sum of Squares   d.f.    Mean Square   F
Between          SSB              k – 1   MSB           MSB / MSW
Within (error)   SSW              N – k   MSW
Total
ANOVA Summary Table for
Example 11-8
Source           Sum of Squares   d.f.   Mean Square   F
Between          160.13            2     80.07         9.17
Within (error)   104.80           12      8.73
Total            264.93           14
Section 11-3
Example 11-9
Page #622
Example 11-9: Toll Road Employees
A state employee wishes to see if there is a significant
difference in the number of employees at the interchanges
of three state toll roads. The data are shown. At α = 0.05,
can it be concluded that there is a significant difference in
the average number of employees at each interchange?
Step 1: State the hypotheses and identify the claim.
H0: μ1 = μ2 = μ3
H1: At least one mean is different from the others
(claim).
Step 2: Find the critical value.
Since k = 3, N = 18, and α = 0.05,
d.f.N. = 2, d.f.D. = 15
The critical value is 3.68, obtained from Table H.
Step 3: Compute the test value.
a. Find the mean and variance of each sample
(these were provided with the data).
b. Find the grand mean, the mean of all
values in the samples.
$$\bar{X}_{GM} = \frac{\sum X}{N} = \frac{7 + 14 + 32 + \cdots + 11}{18} = \frac{152}{18} = 8.4$$
c. Find the between-group variance, $s_B^2$.
$$s_B^2 = \frac{\sum n_i (\bar{X}_i - \bar{X}_{GM})^2}{k - 1}$$
$$s_B^2 = \frac{6(15.5 - 8.4)^2 + 6(4 - 8.4)^2 + 6(5.8 - 8.4)^2}{3 - 1} = \frac{459.18}{2} = 229.59$$
d. Find the within-group variance, $s_W^2$.
$$s_W^2 = \frac{\sum (n_i - 1) s_i^2}{\sum (n_i - 1)} = \frac{5(81.9) + 5(25.6) + 5(29.0)}{5 + 5 + 5} = \frac{682.5}{15} = 45.5$$
e. Compute the F value.
$$F = \frac{s_B^2}{s_W^2} = \frac{229.59}{45.5} = 5.05$$
Step 4: Make the decision.
Reject the null hypothesis, since 5.05 > 3.68.
Step 5: Summarize the results.
There is enough evidence to support the claim
that there is a difference among the means.
ANOVA Summary Table for
Example 11-9
Source           Sum of Squares   d.f.   Mean Square   F
Between           459.18           2     229.59        5.05
Within (error)    682.5           15      45.5
Total            1141.68          17
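Every entry in the summary table follows from SSB, SSW, k, and N. The Python sketch below rebuilds the table for Example 11-9 from those four numbers; the function name and dictionary layout are assumptions made for illustration.

```python
def anova_summary(ssb, ssw, k, n_total):
    """Build the ANOVA summary table entries from the two sums of squares."""
    df_between = k - 1
    df_within = n_total - k
    msb = ssb / df_between          # mean square between groups
    msw = ssw / df_within           # mean square within groups (error)
    return {
        "between": {"ss": ssb, "df": df_between, "ms": msb, "f": msb / msw},
        "within": {"ss": ssw, "df": df_within, "ms": msw},
        "total": {"ss": ssb + ssw, "df": n_total - 1},
    }


# Example 11-9: sums of squares from step 3, with k = 3 groups and N = 18 values.
table = anova_summary(459.18, 682.5, k=3, n_total=18)

for source, row in table.items():
    cells = ", ".join(f"{name} = {value:.2f}" for name, value in row.items())
    print(f"{source}: {cells}")
```

The total row serves as a check: its sum of squares and degrees of freedom equal the sums of the two rows above it.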