Page 1
Chapter 10
Math 170/171-V1
10-Chi-square (χ²) Tests and the F-Distribution
10.1 – Goodness of Fit
In the previous sections we first tested the proportion (p) of one population, and then we compared the proportions of two populations (p1 − p2). In this section we would like to test the proportions of three or more categories at once. This may not sound like a comparison of a few proportions; it is more like testing whether an observed frequency distribution fits a claimed one.
Example 1: Suppose we have three different kinds of soda (A, B, C) at Richland. A cafeteria employee claims that students at Richland have no preference for soda. This means that the proportions of students who drink soda A, B, and C are the same. You doubt this claim and take a random sample of 100 RCC students who drink soda. In your data, 25 students prefer soda A, 35 prefer soda B, and 40 prefer soda C. Test the claim. Let α = .05.
Solution:
There are two ways that you can write the null and the alternative hypotheses.
If you prefer using p, then you should write:
$H_0: p_A = p_B = p_C = \frac{1}{3}$ (claim)
$H_a$: At least one p is different from the others.
The other way you can write this is to use the text of the given problem:
$H_0$: Students at Richland have no preference for soda. (claim)
$H_a$: Students at Richland have a preference for soda.
Note: All the tests in this section are right-tailed tests.
Find the critical value and the critical region: df = k − 1 = 3 − 1 = 2, where k is the number of categories (the number of different sodas in this problem).
Look at Table 6 on page A21: $\chi^2(2, .05) = 5.991$; this is the critical value. If the test statistic is greater than 5.991, reject $H_0$.
In order to find the test statistic, we must first find the expected value for each category.
The expected value for the ith category is found by $E_i = n p_i$.
Since $p_1 = p_2 = p_3 = \frac{1}{3}$, then $E_1 = E_2 = E_3 = (100)\left(\frac{1}{3}\right) = 33.33$.
The formula is
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(25 - 33.33)^2}{33.33} + \frac{(35 - 33.33)^2}{33.33} + \frac{(40 - 33.33)^2}{33.33} = 3.5.$$
Decision: Fail to reject $H_0$.
Conclusion: We cannot reject the claim.
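The arithmetic of Example 1 can be checked with a short script. The following is a minimal Python sketch added for illustration, not part of the original notes. The function name `chi_square_gof` is made up here, and the critical value 5.991 is copied from the table lookup rather than computed.

```python
def chi_square_gof(observed, probs):
    """Return (test statistic, expected counts) for a goodness-of-fit test.

    observed: observed counts O, one per category
    probs:    proportions p_i claimed under H0 (should sum to 1)
    """
    n = sum(observed)
    expected = [n * p for p in probs]          # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Example 1: 25, 35, and 40 students prefer sodas A, B, and C.
observed = [25, 35, 40]
stat, expected = chi_square_gof(observed, [1 / 3, 1 / 3, 1 / 3])

critical_value = 5.991   # chi-square(2, .05), taken from the table
df = len(observed) - 1   # k - 1

print("df =", df)
print("expected counts:", [round(e, 2) for e in expected])
print("test statistic:", round(stat, 2))
print("reject H0" if stat > critical_value else "fail to reject H0")
```

The same function can be reused for any claimed distribution by changing the list of proportions.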
Example 2:
Mr. X, who is a pet owner, claims that 40% of homeowners in Illinois have no pet, 30% have one pet, 20% have two pets, and 10% own three or more pets. Mr. Y, who doubts this claim, took a random sample of 100 Illinois homeowners and observed the following:
# of pets     Freq.
0             52
1             20
2             15
3 or more     13
Let α = .10 and check Mr. X’s claim.
Solution:
$H_0: p_1 = .4,\ p_2 = .3,\ p_3 = .2,\ p_4 = .1$ (claim)
$H_a$: At least one p is different from the stated value.
Find the expected values:
$E_1 = .4(100) = 40,\ E_2 = .3(100) = 30,\ E_3 = .2(100) = 20,\ E_4 = .1(100) = 10$
Find the test statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(52 - 40)^2}{40} + \frac{(20 - 30)^2}{30} + \frac{(15 - 20)^2}{20} + \frac{(13 - 10)^2}{10} = 9.08$$
Find the critical value: df = k − 1 = 4 − 1 = 3, so $\chi^2(3, .10) = 6.251$. If the test statistic is greater than the critical value, reject $H_0$.
Decision: Reject $H_0$, since 9.08 is greater than 6.251.
Conclusion: We must reject Mr. X’s claim.
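When $H_0$ is rejected, it can be useful to see which categories contribute most to the test statistic. The sketch below (Python, added for illustration; all variable names are hypothetical) recomputes Example 2 term by term. The critical value 6.251 is again copied from the table, not computed.

```python
# Example 2: observed counts and the proportions claimed by Mr. X.
categories = ["0", "1", "2", "3 or more"]
observed = [52, 20, 15, 13]
claimed = [0.4, 0.3, 0.2, 0.1]

n = sum(observed)                      # sample size, 100
expected = [n * p for p in claimed]    # E_i = n * p_i

# Each category's contribution (O - E)^2 / E to the chi-square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
stat = sum(contributions)

for cat, o, e, c in zip(categories, observed, expected, contributions):
    print(f"{cat:>10} pets: O = {o:3d}, E = {e:5.1f}, (O-E)^2/E = {c:.2f}")

critical_value = 6.251                 # chi-square(3, .10), from the table
print("test statistic:", round(stat, 2))
print("reject H0" if stat > critical_value else "fail to reject H0")
```

The largest term comes from the "0 pets" category, where the sample has 52 homeowners against an expected 40.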
10.2 – Independence
In this section we will use the $\chi^2$ test to find out if two variables are independent of each other.
We will test claims such as:
$H_0$: Smoking and education are independent of each other. (claim)
$H_a$: Smoking and education are not independent of each other.
The two variables are smoking and education.
For this section, the null and the alternative hypotheses are usually written this way. The only things that you have to change are the variables.
As in the previous section, all the tests are right-tailed tests.
Note that when we say smoking and education are independent of each other, we mean that a person's level of education does not affect how likely that person is to smoke.
Example: Professor X at Richland claims that taking Statistics is independent of gender. Mr. Y took a random sample of 100 Richland students and obtained the table below (this table is called a contingency table). At the .05 level of significance, test the professor’s claim.
                     Male   Female   Row total
Took stat               5       15          20
Did not take stat      35       45          80
Column total           40       60         100
For this question you have 2 rows and 2 columns; the four cell counts are your observed values. You must find the expected value for each cell.
For row 1 column 1: $E_{11} = \frac{r_1 c_1}{n} = \frac{(20)(40)}{100} = 8$; this is the expected value for the cell with row 1 and column 1. The other expected values are obtained the same way.
$E_{12} = \frac{r_1 c_2}{n} = \frac{(20)(60)}{100} = 12$. Note that 8 + 12 = 20, which was the sum for row 1.
$E_{21} = \frac{r_2 c_1}{n} = \frac{(80)(40)}{100} = 32$
$E_{22} = \frac{r_2 c_2}{n} = \frac{(80)(60)}{100} = 48$
Note again that the sum of the expected values for row 2 is 80.
Step 1:
$H_0$: Taking statistics is independent of gender. (claim)
$H_a$: Taking statistics is not independent of gender.
Step 2: Find the critical region.
df = (# rows − 1)(# columns − 1) = (2 − 1)(2 − 1) = 1. As in the previous section, find the critical value: $\chi^2(1, .05) = 3.841$. If the test statistic is greater than the critical value, then we must reject the null hypothesis.
Step 3: Find the test statistic. It is found exactly as in the previous section.
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(5 - 8)^2}{8} + \frac{(15 - 12)^2}{12} + \frac{(35 - 32)^2}{32} + \frac{(45 - 48)^2}{48} = 2.344$$
Step 4: Make a decision: Since the test statistic is less than the critical value, we can’t reject $H_0$.
Step 5: Write a conclusion: We don’t have enough evidence to reject the professor’s claim.
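The expected counts and the test statistic for the contingency table can be reproduced with a few lines of code. This is a minimal Python sketch under stated assumptions: the function names `expected_counts` and `chi_square_independence` are invented for this illustration, and the critical value 3.841 is taken from the table lookup in Step 2.

```python
def expected_counts(table):
    """Expected cell counts E_ij = (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (test statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: took stat / did not take stat. Columns: male / female.
observed = [[5, 15],
            [35, 45]]

stat, df = chi_square_independence(observed)
critical_value = 3.841   # chi-square(1, .05), from the table

print("expected counts:", expected_counts(observed))
print("df =", df, " test statistic =", round(stat, 3))
print("reject H0" if stat > critical_value else "fail to reject H0")
```

Because the functions work from row and column totals, the same sketch should also handle tables with more than 2 rows or 2 columns.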
10.3 – Comparing Two Variances
Earlier, we tested the variance of one population. In this section, we compare the variances or standard deviations of two populations. In order to compare the variances of two populations, we must first talk about a new distribution: the F-distribution.
The F-distribution in many ways is like the $\chi^2$ distribution.
1. The F-distribution is a family of curves,
2. The F-distribution is never negative and it is skewed to the right,
3. The area underneath each F curve is exactly one (like any other distribution with a continuous random variable).
Example: Mr. X claims that the standard deviation of the height of students at Richland is the same as the standard deviation of the height of students at Parkland. Mr. Y took a random sample of 20 RCC students and found the variance to be 11 square inches. Then he took a random sample of 24 Parkland students and found the variance to be 16 square inches. Check Mr. X’s claim. Let α = .05.
Step 1:
$H_0: \sigma_1 = \sigma_2$ (claim)
$H_a: \sigma_1 \neq \sigma_2$
Step 2: Find the test statistic: $F = \frac{s_2^2}{s_1^2} = \frac{16}{11} = 1.455$
If you look at the formula, you see $s_2^2$ on the top. This is because the variance of the second sample is larger than the variance of the first one. You should do the same: always put the larger sample variance in the numerator.
Step 3: Find the critical value(s):
Even though this is a two-tailed test, you only need to find the critical value on the right tail. When you put the larger variance on the top, you do not need the left critical value.
With the F-table we have degrees of freedom for the numerator ($df_N$) and degrees of freedom for the denominator ($df_D$).
$df_N = n_2 - 1 = 24 - 1 = 23$
$df_D = n_1 - 1 = 20 - 1 = 19$
Now, go to the F-table with α = .025. This is on page A24. A nice notation is $F(df_N, df_D, \alpha) = F(23, 19, .025) = 2.51$.
Note that $df_N = 23$ is not on the table. In this case, go to the left of this number and select 20 for $df_N$.
Note also that I put .025 for α. This is because it is a two-tailed test, so α = .05 is split evenly between the two tails.
Step 4: Make a decision: Fail to reject $H_0$ (since the test statistic, 1.455, is less than the critical value, 2.51).
Step 5: Conclusion: We don’t have enough evidence to reject Mr. X’s claim.
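The rule "larger variance on top" and the matching degrees of freedom are easy to get backwards, so here is a small Python sketch of the steps in this example. It is an illustration only: the helper name `f_statistic` is hypothetical, and the critical value 2.51 is the table value read in Step 3 (using 20 in place of 23 for the numerator), not something the code computes.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Sample 1 (Richland): variance 11, n = 20. Sample 2 (Parkland): variance 16, n = 24.
F, df_n, df_d = f_statistic(11, 20, 16, 24)

alpha = 0.05
table_alpha = alpha / 2      # two-tailed test: look up alpha/2 in the F-table
critical_value = 2.51        # F(20, 19, .025), read from the table as in Step 3

print(f"F = {F:.3f}, df_N = {df_n}, df_D = {df_d}, table alpha = {table_alpha}")
print("reject H0" if F > critical_value else "fail to reject H0")
```

Note that the degrees of freedom follow the variances: whichever sample supplies the numerator variance also supplies $df_N$.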
Note: If we have a one-tailed test, always use the greater-than sign for the alternative hypothesis.
Example: The variance of the weight of students at Richland is less than the variance of the weight of students at Parkland. This is how you should write the null and the alternative hypotheses.
$H_0: \sigma_P^2 \le \sigma_R^2$
$H_a: \sigma_P^2 > \sigma_R^2$ (claim)
Note: Compare these two statements:
a. Five is less than 6.
b. Six is more than 5.
They imply the same thing.
10.4 – Analysis of Variance (ANOVA)
A few weeks ago we tested the mean of one population. Then we tested (compared) the means of two populations. In this section we will test the equality of the means of three or more populations. In this text, we will deal with one-way analysis of variance.
Let me give a situation in which one-way ANOVA can be used. Suppose you are a farmer and you have four choices of fertilizer: A, B, C, or D. You would like to know if they are all the same, or if maybe one type of fertilizer is better or worse than the others. You randomly select n plots and plant corn in all of them (here n = 5 + 4 + 8 + 6 = 23). In 5 plots you apply fertilizer A, in 4 plots fertilizer B, in 8 plots fertilizer C, and in 6 plots fertilizer D. You must keep everything the same for all plots except the type of fertilizer. That means factors like the amount of water, the amount of sunlight, and so on must be the same. That is why it is called one-way ANOVA: only one factor is different in each plot. When you harvest the corn at the end of the season, you must find the yield for each plot and perform an F-test. Things must be done randomly (the selection of a plot for a particular fertilizer, for example). Suppose we have the following table at the end of the season.
Yield for Type of fertilizer
A    B    C    D
4    5    3    6
3    5    4    5
6    4    2    7
7    6    3    8
5         2    7
          3    8
          5
          4
You have to complete this table by including the sums and other things that I have done below.
          A       B       C       D
∑x       25      20      26      41
x̄         5       5    3.25    6.83
s      1.58     .82    1.04    1.17
n         5       4       8       6
Let your calculator do the work.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (claim)
$H_a$: At least one mean is different from the others.
Before you can compute the test statistic, you must do some calculations:
1. Find the grand mean.
$$\bar{\bar{x}} = \frac{\sum x}{n} = \frac{\sum x_A + \sum x_B + \sum x_C + \sum x_D}{n_A + n_B + n_C + n_D} = \frac{25 + 20 + 26 + 41}{5 + 4 + 8 + 6} = \frac{112}{23} = 4.87$$
2. Find the mean square between.
$$MS_B = \frac{\sum n_i \left(\bar{x}_i - \bar{\bar{x}}\right)^2}{k - 1} = \frac{5(5 - 4.87)^2 + 4(5 - 4.87)^2 + 8(3.25 - 4.87)^2 + 6(6.83 - 4.87)^2}{4 - 1} = \frac{44.1969}{3} = 14.73$$
This is the variance between the categories (fertilizers). k is the number of categories.
3. Find the mean square within.
$$MS_W = \frac{\sum (n_i - 1) s_i^2}{n - k} = \frac{(5 - 1)1.58^2 + (4 - 1).82^2 + (8 - 1)1.04^2 + (6 - 1)1.17^2}{23 - 4} = \frac{26.42}{19} = 1.39$$
This is the variance within the samples (fertilizers).
Now we can find the test statistic.
$$F = \frac{MS_B}{MS_W} = \frac{14.73}{1.39} = 10.60.$$
Find the critical value: Let α = .05.
Note: For this section, all the tests are right-tailed tests. Do not divide α by 2.
$df_N = k - 1 = 4 - 1 = 3$
$df_D = n - k = 23 - 4 = 19$
Look at the F-table on page A25. The critical value is F(3, 19, .05) = 3.13.
Decision: Reject $H_0$, since the test statistic (10.60) is greater than the critical value (3.13).
Conclusion: There is at least one mean which is different from the others.
Note: There are follow-up tests that can show which mean is different from the others.
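The whole one-way ANOVA calculation can also be done directly from the raw yields. The following Python sketch is an illustration under stated assumptions. The function name `one_way_anova` is made up. The yield lists are read from the fertilizer table, with values assigned to columns so that they match the sums, means, and standard deviations listed in the summary. The critical value 3.13 is taken from the F-table.

```python
def one_way_anova(groups):
    """Return (F, df_numerator, df_denominator) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups, then mean square between.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ms_between = ss_between / (k - 1)

    # Sum of squares within groups, equal to the sum of (n_i - 1) * s_i^2.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n - k)

    return ms_between / ms_within, k - 1, n - k


yields = {
    "A": [4, 3, 6, 7, 5],
    "B": [5, 5, 4, 6],
    "C": [3, 4, 2, 3, 2, 3, 5, 4],
    "D": [6, 5, 7, 8, 7, 8],
}

F, df_n, df_d = one_way_anova(list(yields.values()))
critical_value = 3.13    # F(3, 19, .05), from the table

print(f"F = {F:.2f}, df_N = {df_n}, df_D = {df_d}")
print("reject H0" if F > critical_value else "fail to reject H0")
```

Because the script carries full precision instead of the rounded means and standard deviations, it gives F of about 10.65 rather than 10.60. The decision is the same.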
Homework assignment for chapter 10.
Due: 5/6/02
10.1- # 8, 10, 12*
10.2- # 8
10.3- # 12, 16
10.4- # 4
Each question is worth 4 points.
* This question is worth 10 points. You will learn how to check if a random sample is selected
from a normal distribution. Read the Extending the Basics on page 475 and practice with
question # 11 first.