252solnE2 10/25/05 (Open this document in 'Page Layout' view!)
E. CHI-SQUARED AND RELATED TESTS.
1. Tests of Homogeneity and Independence
Text 12.18, 12.19 - 21, 12.26 [12.23*, 12.24, 12.27] (12.22, 12.27) E1, E2, E3
2. Tests of Goodness of Fit
Text 12.51, 12.54 [12.49*, 12.52*. Both on CD12_5], E4, E5, E6
a. Uniform Distribution
b. Poisson Distribution
c. Normal Distribution
3. Kolmogorov-Smirnov Test
E7, E8, E9, E10, E11
a. Kolmogorov-Smirnov One-Sample Test
b. Lilliefors Test.
This document includes Exercises 12.49, 12.27 and Problems E4-E6.
----------------------------------------------------------------------------------------------------------------------------- ----
Problems Involving Tests of Goodness of Fit.
Exercise 12.51 [12.49 in 9th] (On CD, not in 8th edition): The manager of a computer network has the
following data on service interruptions per day over the last 500 days. Does it follow a Poisson
distribution? ($\alpha = .01$)

Interruptions per day    Number of days
0                        160
1                        175
2                         86
3                         41
4                         18
5                         12
6                          8
Total                    500
Solution: $H_0$: Poisson. Our first step is to find a mean for the distribution.

x        O      xO
0      160       0
1      175     175
2       86     172
3       41     123
4       18      72
5       12      60
6        8      48
Total  500     650

This gives us $m = \frac{\sum xO}{n} = \frac{650}{500} = 1.3$. We would then have to use a Poisson table with
a mean of 1.3.

x      f          E = fn    O      O^2/E
0      .272532    136.27    160    187.862
1      .354291    177.15    175    172.876
2      .230289    115.14     86     64.235
3      .099792     49.90     41     33.687
4      .032432     16.22     18     19.975
5      .008432      4.22     12     34.123
6      .002230      1.11      8     57.658
Total  1.0000     500.01    500    570.414
We have only 5 degrees of freedom (7 cells - 1 - 1), since we lost one degree of freedom because we
estimated the mean from the data. $\chi^{2(5)}_{.01} = 15.0863$. Since our computed chi-square of
570.414 - 500 = 70.414 is above the table chi-square, we reject the null hypothesis. However, the small
values of E in the last two rows make me suspicious, and I will compute
$\frac{(O-E)^2}{E} = \frac{(12-4.22)^2}{4.22} = 14.34$ and $\frac{(O-E)^2}{E} = \frac{(8-1.11)^2}{1.11} = 42.77$.
These account for more than all the difference between the table value and the computed value, so let's
merge the last two rows and try again.
252solnE2 10/30/03
x      f          E = fn    O      O^2/E
0      .272532    136.27    160    187.862
1      .354291    177.15    175    172.876
2      .230289    115.14     86     64.235
3      .099792     49.90     41     33.687
4      .032432     16.22     18     19.975
5+     .010662      5.33     20     75.047
Total  1.0000     500.01    500    553.680

With six cells and one estimated parameter we now have 6 - 1 - 1 = 4 degrees of freedom.
$\chi^{2(4)}_{.01} = 13.2767$. Since our computed chi-square of 553.680 - 500 = 53.680 is above the table
chi-square, we reject the null hypothesis.
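The Poisson calculation above can be sketched in Python. This is only an illustration under stated assumptions, not part of the original solution: the helper names `poisson_cells` and `chi_square_shortcut` are hypothetical, the probabilities are computed directly instead of being read from a table, and the last cell is treated as open-ended (6 or more) so that the probabilities sum to 1. Because E is not rounded to two places here, the statistics differ slightly from the hand-computed 70.414 and 53.680.

```python
import math

def poisson_cells(mean, last):
    """Poisson probabilities for x = 0..last-1 plus an open-ended 'last or more' cell."""
    probs = [math.exp(-mean) * mean**x / math.factorial(x) for x in range(last)]
    probs.append(1.0 - sum(probs))  # everything left over goes in the top cell
    return probs

def chi_square_shortcut(observed, expected):
    """Shortcut formula used in the text: sum(O^2 / E) - n."""
    return sum(o * o / e for o, e in zip(observed, expected)) - sum(observed)

observed = [160, 175, 86, 41, 18, 12, 8]                # days with 0..6 interruptions
n = sum(observed)
mean = sum(x * o for x, o in enumerate(observed)) / n   # 650 / 500

expected = [n * p for p in poisson_cells(mean, last=6)]
chi2_seven_cells = chi_square_shortcut(observed, expected)

# Merge the last two cells (5 and 6) into a single "5+" cell and try again.
observed_merged = observed[:5] + [sum(observed[5:])]
expected_merged = expected[:5] + [sum(expected[5:])]
chi2_six_cells = chi_square_shortcut(observed_merged, expected_merged)

print(round(mean, 2), round(chi2_seven_cells, 3), round(chi2_six_cells, 3))
```

Both statistics remain far above the table values quoted in the text, so merging the cells changes the size of the statistic but not the decision.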
Exercise 12.54 [12.52 in 9th] (On CD, not in 8th edition): A random sample of 500 car batteries revealed
the following distribution of battery life in years. If $\bar{x} = 2.80$ and $s = 0.97$, does it follow a Normal
distribution? ($\alpha = .05$)

Life           Frequency
0 - under 1       12
1 - under 2       94
2 - under 3      170
3 - under 4      188
4 - under 5       28
5 - under 6        8
Total            500
Solution: Let's try to get the probabilities by subtracting $\bar{x} = 2.80$ from the upper limit $x$ of each
class and dividing by $s = 0.97$ to get $z$. Then get $F(z)$ from the Normal table. The cell probability $f$
is the difference between successive values of $F(z)$.

Life in years   x    z       F(z)      f         E = fn    O      O^2/E
Under 1         1    -1.86   0.0314    0.0314     15.70     12      9.171
1-2             2    -0.82   0.2061    0.1747     87.35     94    101.156
2-3             3     0.21   0.5832    0.3771    188.55    170    153.275
3-4             4     1.24   0.8925    0.3093    154.65    188    228.542
4-5             5     2.27   0.9884    0.0959     47.95     28     16.350
Over 5                       1.0000    0.0116      5.80      8     11.034
Sum                                              500.00    500    519.528
We have only 6 - 1 - 2 = 3 degrees of freedom since we lost two degrees of freedom because we estimated
the mean and variance from the data. $\chi^{2(3)}_{.05} = 7.8147$. Since our computed chi-square of
519.528 - 500 = 19.528 is above the table value, reject the null hypothesis.
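As a cross-check, the Normal goodness-of-fit steps can be sketched with Python's standard library. This is a minimal sketch, not the author's method: `statistics.NormalDist` stands in for the printed Normal table, and z and F(z) are rounded to two and four places on the assumption that this mimics a table lookup. The critical value 7.8147 is copied from the text rather than computed.

```python
from statistics import NormalDist

upper_limits = [1, 2, 3, 4, 5]            # top of each class; the last class is open-ended
observed = [12, 94, 170, 188, 28, 8]
n = sum(observed)
x_bar, s = 2.80, 0.97

std_normal = NormalDist()                 # standard Normal: mean 0, standard deviation 1
z = [round((x - x_bar) / s, 2) for x in upper_limits]
F = [round(std_normal.cdf(zi), 4) for zi in z] + [1.0]

# Cell probability = difference between successive cumulative values.
f = [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]
expected = [n * p for p in f]

chi2 = sum(o * o / e for o, e in zip(observed, expected)) - n
df = len(observed) - 1 - 2                # mean and standard deviation were estimated
critical = 7.8147                         # 5% table value for 3 degrees of freedom
reject = chi2 > critical
print(z, round(chi2, 3), df, reject)
```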
Problem E4: Check to see if the following 1000 tax payments come from the distribution N(25000,
10000).

Amount in thousands    Number
Below 10                 40
10-15                    60
15-20                   170
20-25                   140
25-30                   180
30-35                   210
35-40                    80
Above 40                120
Solution: $H_0: N(25000, 10000)$, $\alpha = .01$.

To find $E$, write the top of each interval in the $x$ column. Then find
$z = \frac{x - \mu}{\sigma} = \frac{x - 25}{10}$. $F(z)$ is the cumulative distribution; for example, .1587 is
$F(-1.0) = P(z \le -1.0)$. The $F(z)$ column can be found by looking up the value for $z$ in the normal table
and either adding the table value to .5 (if $z$ is positive) or subtracting the table value from .5 (if $z$ is
negative). Each cell probability $f$ is the difference between successive values of $F(z)$, and $E = fn$ with
$n = 1000$.

Income in thousands   x     z       F(z)      f         E = fn
Below 10              10    -1.5    0.0668    0.0668     66.8
10-15                 15    -1.0    0.1587    0.0919     91.9
15-20                 20    -0.5    0.3085    0.1498    149.8
20-25                 25     0.0    0.5000    0.1915    191.5
25-30                 30     0.5    0.6915    0.1915    191.5
30-35                 35     1.0    0.8413    0.1498    149.8
35-40                 40     1.5    0.9332    0.0919     91.9
40 up                               1.0000    0.0668     66.8

Since there are eight cells, DF = 8 - 1 = 7. From the chi-squared table $\chi^{2(7)}_{.01} = 18.475$, but the
$\chi^2$ we compute is 107.1918, so we reject $H_0$. Note that this problem could be done using a
Kolmogorov-Smirnov test.
Income in thousands   E         O       E - O    (E - O)^2    (E - O)^2 / E
Below 10               66.8      40      26.8      718.24       10.7521
10-15                  91.9      60      31.9     1017.61       11.0730
15-20                 149.8     170     -20.2      408.04        2.7239
20-25                 191.5     140      51.5     2652.25       13.8499
25-30                 191.5     180      11.5      132.25        0.6906
30-35                 149.8     210     -60.2     3624.04       24.1925
35-40                  91.9      80      11.9      141.61        1.5409
40 up                  66.8     120     -53.2     2830.24       42.3689
Total                1000.0    1000       0.0                  107.1918
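This problem uses the long form $\sum \frac{(E-O)^2}{E}$, while the other problems in this document use the shortcut $\sum \frac{O^2}{E} - n$. Expanding the square gives $\sum \frac{O^2}{E} - 2\sum O + \sum E$, which equals the shortcut whenever $\sum E = \sum O = n$. The short Python check below, using the E and O columns from the table, is a sketch added for illustration; the variable names are not from the original.

```python
expected = [66.8, 91.9, 149.8, 191.5, 191.5, 149.8, 91.9, 66.8]
observed = [40, 60, 170, 140, 180, 210, 80, 120]
n = sum(observed)

# Long form used in the table: sum of (E - O)^2 / E.
long_form = sum((e - o) ** 2 / e for e, o in zip(expected, observed))

# Shortcut used elsewhere in the document: sum of O^2 / E, minus n.
shortcut = sum(o * o / e for e, o in zip(expected, observed)) - n

print(round(long_form, 4), round(shortcut, 4))
```

The two forms agree here because the expected counts add up to the sample size of 1000.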
Problem E5: See if Frunzi earthquakes fit a Poisson distribution with parameter of 1.

Earthquakes per day    Number of days observed
0                      25
1                      17
2                       5
3                       2
4 or more               1
Solution: $H_0$: Poisson(1), $\alpha = .05$. $f$ comes from the Poisson table.

x      f         E = fn    O
0      .3679     18.395    25
1      .3679     18.395    17
2      .1839      9.195     5
3      .0613      3.065     2
4+     .0190      0.950     1
Total  1.0000    50.000    50

Due to the small size of the E's in the last two cells we must merge the last three cells to get the
table below.

x      E = fn    O     O^2/E
0      18.395    25    33.977
1      18.395    17    15.711
2+     13.210     8     4.845
Total  50.000    50    54.533

$\chi^2 = \sum \frac{O^2}{E} - n = 54.533 - 50 = 4.533$. There are only three cells now. DF = 3 - 1 = 2. From the
chi-squared table $\chi^{2(2)}_{.05} = 5.991$.
So we do not reject $H_0$. Note that this problem could be done using a Kolmogorov-Smirnov test.
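The merging step can be automated. The sketch below is a hypothetical helper (`pool_small_tail` is not from the original) that pools cells from the upper tail until the last expected count is at least 5, which is the rule of thumb the solution appears to apply. It uses the Poisson(1) table probabilities quoted in the text.

```python
def pool_small_tail(expected, observed, minimum=5.0):
    """Merge cells from the upper tail until the last expected count reaches `minimum`."""
    expected, observed = list(expected), list(observed)
    while len(expected) > 1 and expected[-1] < minimum:
        e_last, o_last = expected.pop(), observed.pop()
        expected[-1] += e_last   # fold the small cell into its neighbour
        observed[-1] += o_last
    return expected, observed

f = [0.3679, 0.3679, 0.1839, 0.0613, 0.0190]   # Poisson(1) table values for 0, 1, 2, 3, 4+
observed = [25, 17, 5, 2, 1]
n = sum(observed)
expected = [n * p for p in f]

expected_pooled, observed_pooled = pool_small_tail(expected, observed)
chi2 = sum(o * o / e for o, e in zip(observed_pooled, expected_pooled)) - n
df = len(observed_pooled) - 1                  # no parameters were estimated
print(observed_pooled, [round(e, 3) for e in expected_pooled], round(chi2, 3), df)
```

Note that this helper only pools from the top of the distribution; a small expected count in a lower cell would need separate handling.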
Note: What if our $H_0$ was simply $H_0$: Poisson? Our first step would be to find a mean for the
distribution.

x       O     xO
0      25      0
1      17     17
2       5     10
3       2      6
4+      1      4
Total  50     37

This would give us $m = \frac{\sum xO}{n} = \frac{37}{50} = .74$. We would then have to use a Poisson table with
a mean of 0.7 unless a computer was available to generate a Poisson table with a mean of 0.74. In
any case we would have lost a degree of freedom from estimating the mean. For example,
using Minitab to generate the 0.74 table we would get:

x     f        E = fn
0     .4771    23.86
1     .3531    17.66
2     .1305     6.53
3     .0322     1.61
4     .0060     0.30
5     .0009     0.05
6     .0001     0.01

But, once again we would have to merge our lowest cells, so that the last line would be 2+
with an E of 8.51. Once again we would have only three cells, but, because of our estimation of
the mean, DF = 3 - 1 - 1 = 1.
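If Minitab is not at hand, a Poisson table for a mean of 0.74 can be generated in a few lines of Python. This is a sketch under stated assumptions: `poisson_table` is a hypothetical helper name, and the open-ended "4+" class is counted as exactly 4 when estimating the mean, as the hand calculation does.

```python
import math

def poisson_table(mean, n, max_x):
    """Rows of (x, f, E) where f = P(X = x) for a Poisson mean and E = f * n."""
    rows = []
    for x in range(max_x + 1):
        f = math.exp(-mean) * mean**x / math.factorial(x)
        rows.append((x, f, n * f))
    return rows

observed = [25, 17, 5, 2, 1]                            # days with 0, 1, 2, 3, 4+ earthquakes
n = sum(observed)
mean = sum(x * o for x, o in enumerate(observed)) / n   # 37 / 50

table = poisson_table(mean, n, max_x=6)
for x, f, e in table:
    print(f"{x}  {f:.4f}  {e:6.2f}")
```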
Problem E6: Check to see if the earthquakes in Frunzi fit a Poisson distribution.

Earthquakes per day    Number of days observed
0                      40
1                      45
2                       7
3                       4
4                       2
5                       0
6                       1
7                       1
Solution: $H_0$: Poisson, $\alpha = .05$. The table below shows our calculation of the mean for the sample,
as was done in the previous problem.

x        O     xO
0       40      0
1       45     45
2        7     14
3        4     12
4        2      8
5        0      0
6        1      6
7        1      7
Total  100     92

When we find that the mean is $m = \frac{\sum xO}{n} = \frac{92}{100} = .92$, we use the Poisson table
with a mean of 0.9, but find values of E that are too small.

x      f        E = fn
0      .4066     40.66
1      .3659     36.59
2      .1647     16.47
3      .0494      4.94
4      .0111      1.11
5      .0020      0.20
6      .0003      0.03
7      .0000      0.00
Total           100.00

To get an expected count of at least 5, we must merge the last 5 rows.

x        O     E         O^2/E
0       40     40.66     39.351
1       45     36.59     55.343
2        7     16.47      2.975
3+       8      6.28     10.191
Total  100    100.00    107.860

$\chi^2 = \sum \frac{O^2}{E} - n = 107.860 - 100 = 7.860$. There are only four cells now. DF = 4 - 1 - 1 = 2.
From the chi-squared table $\chi^{2(2)}_{.05} = 5.991$. So we reject $H_0$.
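The degrees-of-freedom rule used throughout these problems (cells, minus 1, minus the number of parameters estimated from the data) and the final comparison can be sketched as a small helper. The function name `goodness_of_fit_decision` and the lookup dictionary are illustrative assumptions; the critical values are only those quoted in this document, not a general chi-squared table.

```python
# Critical values quoted in this document, keyed by (alpha, degrees of freedom).
CRITICAL = {
    (0.05, 2): 5.991,
    (0.05, 3): 7.8147,
    (0.01, 5): 15.0863,
    (0.01, 7): 18.475,
}

def goodness_of_fit_decision(chi2, cells, estimated_params, alpha):
    """Return (df, critical value, reject?) for a chi-squared goodness-of-fit test."""
    df = cells - 1 - estimated_params
    critical = CRITICAL[(alpha, df)]   # raises KeyError if the value is not tabulated above
    return df, critical, chi2 > critical

# Problem E6: four cells after merging, mean estimated from the data.
df_e6, crit_e6, reject_e6 = goodness_of_fit_decision(7.860, cells=4, estimated_params=1, alpha=0.05)

# Problem E5: three cells after merging, mean of 1 specified by the hypothesis.
df_e5, crit_e5, reject_e5 = goodness_of_fit_decision(4.533, cells=3, estimated_params=0, alpha=0.05)

print(df_e6, crit_e6, reject_e6)
print(df_e5, crit_e5, reject_e5)
```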
Exercise 17.9 from McClave et al.: (Not assigned but left in because it's a good example of a test for
uniformity.) The problem states that the 1997 Equifax/Harris Consumer Privacy Survey asked 328 Internet
users to indicate the level of their agreement with the statement "The government needs to be able to scan
Internet messages and user communications to prevent fraud and other crimes." Of the respondents, 59
agreed strongly, 108 agreed somewhat, 82 disagreed somewhat and 79 disagreed strongly.
a) Specify the null and alternate hypotheses you would use to determine if the opinions of Internet users are
evenly divided among the four categories.
b) Test the hypothesis in a) using $\alpha = .05$.
c) In the context of this exercise, what are Type I and Type II errors?
Solution: a) $H_0$: Uniformity, or $H_0: p_1 = p_2 = p_3 = p_4$, where $p_i$ is the probability of being in
category $i$. The alternate hypothesis is that the $p_i$ are not all equal.
b) $\alpha = .05$. Since there are 4 categories, each $p_i$ must be $\frac{1}{4}$. We divide 328 into 4 equal parts
of 82.

Row      O      E     O^2/E
1        59     82     42.4512
2       108     82    142.2439
3        82     82     82.0000
4        79     82     76.1098
Total   328    328    342.8049

$\chi^2 = \sum \frac{O^2}{E} - n = 342.8049 - 328 = 14.8049$. $df = 4 - 1 = 3$, so $\chi^{2(3)}_{.05} = 7.8147$.
Since the computed $\chi^2$ of 14.8049 is greater than the table $\chi^2$, reject the null hypothesis.
c) Type I error: wrongly reject uniformity.
Type II error: wrongly fail to reject uniformity.
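The uniformity test is the simplest goodness-of-fit case, since every expected count is n divided by the number of categories. A minimal Python sketch follows; the variable names are illustrative, and the critical value 7.8147 is the 5% table value for 3 degrees of freedom quoted earlier in this document for Exercise 12.54.

```python
observed = [59, 108, 82, 79]   # agree strongly, agree somewhat, disagree somewhat, disagree strongly
n = sum(observed)
k = len(observed)

expected = [n / k] * k         # uniformity: each category expects n / k responses
chi2 = sum(o * o / e for o, e in zip(observed, expected)) - n
df = k - 1                     # no parameters estimated from the data
critical = 7.8147              # 5% table value for 3 degrees of freedom
reject = chi2 > critical

print(n, expected[0], round(chi2, 4), df, reject)
```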