252solnE1 10/23/06
(Open this document in 'Page Layout' view!)
Roger Even Bove
E. CHI-SQUARED AND RELATED TESTS.
1. Tests of Homogeneity and Independence
Text 12.18, 12.19 - 21, 12.26 [12.23*, 12.24, 12.27] (12.22, 12.27) E1, E2, E3
2. Tests of Goodness of Fit
Text 12.51, 12.54 [12.49*, 12.52*. Both on CD12_5], E4, E5, E6
a. Uniform Distribution
b. Poisson Distribution
c. Normal Distribution
3. Kolmogorov-Smirnov Test
E7, E8, E9, E10, E11
a. Kolmogorov-Smirnov One-Sample Test
b. Lilliefors Test.
Solutions to outline point 1 are in this document.
Problems involving Tests of Homogeneity and Independence.
Exercise 12.18 [12.23 in 9th]: The results of a Gallup phone survey appear below. Consumers were asked if they objected to having their medical records shared with different types of organizations. Results follow.

O       Ins Cos   Pharm   Research
Yes       820      590      670
No        180      410      330

a) Is the proportion of people who object different for different institutions? $\alpha = .05$.
b) If appropriate, use the Marascuilo procedure to determine which organizations are different. Discuss.
Solution: a) We are testing $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$, where $p_1$ is the proportion saying 'yes' to an insurance company, $p_2$ is the proportion saying 'yes' to a pharmacy, etc.

O       Ins Cos   Pharm   Research   Total    p_r
Yes       820      590      670      2080    .6933
No        180      410      330       920    .3067
Total    1000     1000     1000      3000   1.0000

The row proportions are obtained by dividing the row totals by the overall total, for example $.6933 = \frac{2080}{3000}$. We now get our expected table by using the row proportions to multiply the column totals, for example we replace 820 by $.6933(1000) = 693.3$. The expected array is

E       Ins Cos   Pharm   Research   Total    p_r
Yes      693.3    693.3    693.3     2080    .6933
No       306.7    306.7    306.7      920    .3067
Total   1000     1000     1000       3000   1.0000

The formula for the chi-squared statistic is $\chi^2 = \sum \frac{(O-E)^2}{E}$ or $\chi^2 = \sum \frac{O^2}{E} - n$. The first of these two formulas is shown below.
   E        O      E - O    (E - O)^2   (E - O)^2 / E
 693.3     820    -126.7    16052.89     23.15432
 306.7     180     126.7    16052.89     52.34069
 693.3     590     103.3    10670.89     15.39145
 306.7     410    -103.3    10670.89     34.79260
 693.3     670      23.3      542.89      0.78305
 306.7     330     -23.3      542.89      1.77010
3000.0    3000       0.0                128.23221
(The Instructor's Solution Manual gets $\chi^2_{calc} = 128.24$.) The degrees of freedom for this application are $(r-1)(c-1) = (2-1)(3-1) = 1(2) = 2$. Since $\alpha = .05$, we compare the calculated chi-square with $\chi^{2(2)}_{.05} = 5.9915$. Since our $\chi^2_{calc}$ is larger than the table value we reject $H_0$. The Instructor's Solution Manual puts it this way:
$H_0: p_1 = p_2 = p_3$
$H_1$: at least one proportion differs
where population 1 = insurance companies, 2 = pharmacies, 3 = medical researchers
Decision rule: df = (c – 1) = (3 – 1) = 2. If $\chi^2 > 5.9915$, reject $H_0$.
Test statistic: $\chi^2 = 128.24$
Decision: Since $\chi^2_{calc} = 128.24$ is above the upper critical bound of 5.9915, reject $H_0$. There is enough evidence to show that there is a significant difference in the proportion of people who object to their medical records being shared.
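The calculation in part a) can be checked with a few lines of code. The following is a minimal sketch in plain Python, not part of the original solution; the function name `chi_squared` is a hypothetical helper introduced here. It builds the expected table from the row and column totals and applies $\chi^2 = \sum (O-E)^2/E$ without rounding the expected counts, which is presumably why it lands on the Instructor's Solution Manual figure rather than the hand-rounded one.

```python
def chi_squared(observed):
    """Chi-squared statistic for a table given as a list of rows.

    Returns (statistic, expected table, degrees of freedom).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, expected, df


# Observed table from Exercise 12.18 (rows: Yes, No)
observed = [[820, 590, 670], [180, 410, 330]]
stat, expected, df = chi_squared(observed)
critical = 5.9915  # table value for alpha = .05 and 2 degrees of freedom

print(round(stat, 2), df)   # about 128.24 with 2 degrees of freedom
print("reject H0" if stat > critical else "do not reject H0")
```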
b) The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected and (ii) $|p_a - p_b| > \sqrt{\chi^2}\, s_{\Delta p}$, where a and b represent 2 groups, the chi-squared has $c-1$ degrees of freedom and the standard deviation is $s_{\Delta p} = \sqrt{\frac{p_a q_a}{n_a} + \frac{p_b q_b}{n_b}}$, you can say that you have a significant difference between $p_a$ and $p_b$.

For the three column proportions saying 'yes,' we have $p_1 = \frac{820}{1000} = .820$, $p_2 = \frac{590}{1000} = .590$, $p_3 = \frac{670}{1000} = .670$ and the variances $\frac{p_1 q_1}{n_1} = \frac{.820(.180)}{1000} = .0001476$, $\frac{p_2 q_2}{n_2} = \frac{.590(.410)}{1000} = .0002419$, $\frac{p_3 q_3}{n_3} = \frac{.670(.330)}{1000} = .0002211$. We get the following table.

Pair     |p_a - p_b|              Critical Range sqrt(chi^2) s_Δp
1 to 2   |.820 - .590| = .230     sqrt(5.9915(.0001476 + .0002419)) = .048
2 to 3   |.590 - .670| = .080     sqrt(5.9915(.0002419 + .0002211)) = .053
1 to 3   |.820 - .670| = .150     sqrt(5.9915(.0001476 + .0002211)) = .047

Since, in every case, the difference between the proportions exceeds the critical range, we can say that there is a significant difference between each pair of proportions.
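The pairwise comparisons can be reproduced with a short script. This is a sketch under the assumptions stated in the text (critical range $\sqrt{\chi^2}\, s_{\Delta p}$ with the table value 5.9915 for 2 degrees of freedom); the function name `marascuilo` and its return layout are hypothetical choices made for this illustration.

```python
from itertools import combinations
from math import sqrt


def marascuilo(successes, sizes, chi2_crit):
    """Compare every pair of sample proportions against its critical range.

    Returns a list of tuples (group a, group b, |difference|, critical range,
    significant?) with groups numbered from 1.
    """
    props = [x / n for x, n in zip(successes, sizes)]
    results = []
    for a, b in combinations(range(len(props)), 2):
        diff = abs(props[a] - props[b])
        std_err = sqrt(
            props[a] * (1 - props[a]) / sizes[a]
            + props[b] * (1 - props[b]) / sizes[b]
        )
        crit_range = sqrt(chi2_crit) * std_err
        results.append((a + 1, b + 1, diff, crit_range, diff > crit_range))
    return results


# 'Yes' counts for insurance companies, pharmacies, researchers (n = 1000 each)
results = marascuilo([820, 590, 670], [1000, 1000, 1000], chi2_crit=5.9915)
for a, b, diff, crit_range, significant in results:
    print(f"{a} to {b}: diff={diff:.3f} range={crit_range:.3f} significant={significant}")
```

Note that `combinations` yields the pairs in the order 1-2, 1-3, 2-3, which differs from the order of the rows in the table above.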
Exercise 12.19 - 12.21 [12.24 in 9th] (12.22 in 8th edition): On the basis of the first table of each pair posted below, are there significant differences between cities in these characteristics? Provide a p-value for each.
Solution: The solution is repeated from the Instructor's Solution Manual.
(a) Item #1: Use the guest's name:
$H_0: p_1 = p_2 = p_3$
$H_1$: Not all $p_j$ are equal
where population 1 = Hong Kong, 2 = New York, 3 = Paris

Observed Frequencies:
                        City
Finding    Hong Kong   New York   Paris   Total
Yes            26         39        28      93
No             74         61        72     207
Total         100        100       100     300

Expected Frequencies:
                        City
Finding    Hong Kong   New York   Paris   Total
Yes            31         31        31      93
No             69         69        69     207
Total         100        100       100     300

Level of Significance       0.05
Number of Rows              2
Number of Columns           3
Degrees of Freedom          2
Critical Value              5.991476
Chi-Square Test Statistic   4.581581
p-Value                     0.101186
Do not reject the null hypothesis

Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 4.582$
Decision: Since the measured test statistic of 4.582 is smaller than the critical value of 5.991, we do not reject the null hypothesis. There is not enough evidence to conclude that there is a difference in the proportion of hotels that use the guest's name among the three cities.
(b) The p value is 0.101. The probability of obtaining a sample that gives rise to a test statistic more extreme than 4.582 is 0.101 if the null hypothesis is true.
(c) Item #2: Minibar charges correctly posted at check-out:
$H_0: p_1 = p_2 = p_3$
$H_1$: Not all $p_j$ are equal

Observed Frequencies:
                                       City
Minibar Charges Posted    Hong Kong   New York   Paris   Total
Yes                           86         76        78     240
No                            14         24        22      60
Total                        100        100       100     300

Expected Frequencies:
                                       City
Minibar Charges Posted    Hong Kong   New York   Paris   Total
Yes                           80         80        80     240
No                            20         20        20      60
Total                        100        100       100     300

Level of Significance       0.05
Number of Rows              2
Number of Columns           3
Degrees of Freedom          2
Critical Value              5.991476
Chi-Square Test Statistic   3.499998
p-Value                     0.173774
Do not reject the null hypothesis

Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 3.50$
Decision: Since the measured test statistic of 3.5 is smaller than the critical value of 5.991, we do not reject the null hypothesis. There is not sufficient evidence to conclude that there is a difference in the proportion of hotels that correctly post Minibar charges among the three cities.
(d) The p value is 0.174. The probability of obtaining a sample that gives rise to a test statistic more extreme than 3.5 is 0.174 if the null hypothesis is true.
(e) Item #3: Bathroom tub and shower spotlessly clean:
$H_0: p_1 = p_2 = p_3$
$H_1$: Not all $p_j$ are equal

Observed Frequencies:
                                           City
Bathroom and Shower Clean    Hong Kong   New York    Paris     Total
Yes                              81         76         79       236
No                               19         24         21        64
Total                           100        100        100       300

Expected Frequencies:
                                           City
Bathroom and Shower Clean    Hong Kong   New York    Paris     Total
Yes                           78.6667     78.6667    78.6667    236
No                            21.3333     21.3333    21.3333     64
Total                           100         100        100      300

Level of Significance       0.05
Number of Rows              2
Number of Columns           3
Degrees of Freedom          2
Critical Value              5.991476
Chi-Square Test Statistic   0.754766
p-Value                     0.685653
Do not reject the null hypothesis

Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 0.755$
Decision: Since the measured test statistic of 0.755 is smaller than the critical value of 5.991, we do not reject the null hypothesis and conclude that there is no significant relationship between item #3 and the city.
(f) The p value is 0.686. The probability of obtaining a sample that gives rise to a test statistic more extreme than 0.755 is 0.686 if the null hypothesis is true.
(g)
Since the null hypothesis is not rejected for any of the 3 items, it is not necessary to perform the Marascuilo procedure.
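The three p-values quoted above can be checked without a statistics library, because all three tests have 2 degrees of freedom. For that special case only, the upper-tail probability of the chi-squared distribution reduces to $P(\chi^2 > x) = e^{-x/2}$. The sketch below relies on that identity; the helper name `chi2_pvalue_df2` is hypothetical and the function should not be used for any other number of degrees of freedom.

```python
from math import exp


def chi2_pvalue_df2(statistic):
    """Upper-tail probability of a chi-squared variable with exactly 2 df."""
    return exp(-statistic / 2)


# Test statistics reported for the three hotel items
statistics = {
    "Item #1 (guest's name)": 4.581581,
    "Item #2 (minibar charges)": 3.499998,
    "Item #3 (bathroom clean)": 0.754766,
}
p_values = {item: chi2_pvalue_df2(x) for item, x in statistics.items()}

for item, p in p_values.items():
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{item}: p = {p:.3f}, {decision}")
```

Comparing each p-value with $\alpha = .05$ gives the same decisions as comparing the test statistic with the critical value.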
The tests are repeated with the sample size doubled (200 hotels per city):
(a) Item #1: Use the guest's name:
$H_0: p_1 = p_2 = p_3$
$H_1$: Not all $p_j$ are equal
Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 9.163$
Decision: Since the measured test statistic of 9.163 is greater than the critical value of 5.991, we reject the null hypothesis and conclude that there is a significant difference in the proportion of hotels that use the guest's name among the 3 cities.
(b) The p value is 0.01. The probability of obtaining a sample that gives rise to a test statistic more extreme than 9.163 is 0.01 if the null hypothesis is true.
(c) Item #2: Minibar charges correctly posted at check-out:
$H_0: p_1 = p_2 = p_3$
$H_1$: Not all $p_j$ are equal
Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 7.0$
Decision: Since the measured test statistic of 7.0 is greater than the critical value of 5.991, we reject the null hypothesis and conclude that there is a significant relationship between item #2 and the city.
(d) The p value is 0.03. The probability of observing a sample that gives rise to a test statistic more extreme than 7.0 is 0.03 if the null hypothesis is true.
(e) Item #3: Bathroom tub and shower spotlessly clean:
$H_0: p_1 = p_2 = p_3$
$H_1$: Not all $p_j$ are equal
Test statistic: $\chi^2 = \sum_{\text{All cells}} \frac{(f_o - f_e)^2}{f_e} = 1.51$
Decision: Since the measured test statistic of 1.51 is smaller than the critical value of 5.991, we do not reject the null hypothesis and conclude that there is no significant relationship between item #3 and the city.
(f) The p value is 0.470. The probability of obtaining a sample that gives rise to a test statistic more extreme than 1.51 is 0.470 if the null hypothesis is true.
(g) Marascuilo procedure for Item #1:
$\sqrt{\chi^2_U} = 2.4478$; Critical range = $\sqrt{\chi^2_U}\sqrt{\frac{p_{S_j}(1-p_{S_j})}{n_j} + \frac{p_{S_{j'}}(1-p_{S_{j'}})}{n_{j'}}}$

Group   Sample Proportion   Sample Size
1            0.26               200
2            0.39               200
3            0.28               200

Comparison           Absolute Difference   Std. Error of Difference   Critical Range   Results
Group 1 to Group 2          0.13                 0.04638426               0.114        Proportions are different
Group 1 to Group 3          0.02                 0.04438468               0.109        Proportions are not different
Group 2 to Group 3          0.11                 0.0468775                0.115        Proportions are not different

There is a difference between Hong Kong and New York in the proportion of hotels that use
the guest’s name.
Marascuilo procedure for Item #2:

Group   Sample Proportion   Sample Size
1            0.86               200
2            0.76               200
3            0.78               200

Comparison           Absolute Difference   Std. Error of Difference   Critical Range   Results
Group 1 to Group 2          0.10                 0.03891015               0.095        Proportions are different
Group 1 to Group 3          0.08                 0.03820995               0.094        Proportions are not different
Group 2 to Group 3          0.02                 0.04207137               0.103        Proportions are not different

There is a difference between Hong Kong and New York in the proportion of hotels that
correctly post Minibar charges.
(i) The larger the sample size, the higher the power of the test. When the sample size is doubled, the test becomes better able to detect a difference among the 3 cities in the proportion of hotels that use the guest's name and in the proportion of hotels that correctly post Minibar charges. However, we still cannot conclude that there is a significant difference in the proportion of hotels with a spotless bathroom tub and shower among the 3 cities at the 0.05 level of significance.
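The effect of sample size can be seen directly in the arithmetic: if every observed count is doubled while the proportions stay the same, every expected count doubles too, and each term $(O-E)^2/E$ doubles. The sketch below illustrates this with the Item #1 counts, assuming (as the sample sizes of 200 suggest) that the larger tables are exact doublings of the originals; the helper `chi_squared_stat` is a hypothetical name.

```python
def chi_squared_stat(observed):
    """Chi-squared statistic, sum of (O - E)^2 / E, for a list-of-rows table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


# Item #1 (use the guest's name): rows are Yes / No, columns are the 3 cities
original = [[26, 39, 28], [74, 61, 72]]
doubled = [[2 * o for o in row] for row in original]  # assumed doubling

stat_original = chi_squared_stat(original)
stat_doubled = chi_squared_stat(doubled)
critical = 5.991  # alpha = .05, 2 degrees of freedom

print(round(stat_original, 3), stat_original > critical)
print(round(stat_doubled, 3), stat_doubled > critical)
```

The same proportions that fail to reach significance with 100 hotels per city cross the critical value with 200 per city.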
Exercise 12.26 [12.27 in 9th] (12.27 in 8th edition): On the basis of the first table shown below, are the time of
year and numbers selected independent?
Solution: The solution is repeated from the Instructor’s Solution Manual . This is a test of independence.
12.27
(a) Decision: Since $\chi^2_{calc} = 20.680$ is above the critical bound of 12.592, reject $H_0$. There is evidence of a relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities during the Vietnam War.
(b) It appears that the results of the lottery drawing are different from what would be expected if the lottery were random.
(c) (a) $H_0$: There is no relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities during the Vietnam War.
$H_1$: There is a relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities.
Decision rule: If $\chi^2 > 12.592$, reject $H_0$.
Test statistic: $\chi^2 = 9.803$
Decision: Since $\chi^2_{calc} = 9.803$ is below the critical bound of 12.592, do not reject $H_0$. There is not enough evidence to conclude there is any relationship between the quarter of the year in which draftable-aged men were born and the numbers assigned as their draft eligibilities during the Vietnam War.
(b) It appears that the results of the lottery drawing are consistent with what would be expected if the lottery were random.
Problem E1: (Sincich) An Ernst and Young survey of 126 warehouses operated by retail stores tests the independence of the number of deliveries to stores per week from warehouse size. Use $\alpha = .05$ for a test of independence.

                        Size (thousands of square feet)
Deliveries/week    Below 100   100-249.9   250-400   Above 400
1 or fewer              5          13          9          5
2-3                    12          11         13          6
4-5                     9          14         13         16
Solution: This is an extremely basic chi-squared problem with $H_0$: Independence. First we total rows and columns and then compute the fraction of data in each row. For example, the first row contains 32 warehouses out of a total of 126, so the fraction in the first row is 32/126 = .2540, which is the first element in $p_r$. O (Observed) is the array in the frame.
O                       Size (thousands of square feet)
Deliveries/week    Below 100   100-249.9   250-400   Above 400   Total    p_r
1 or fewer              5          13          9          5        32    .2540
2-3                    12          11         13          6        42    .3333
4-5                     9          14         13         16        52    .4127
Total                  26          38         35         27       126   1.0000
We use $p_r$ to get E (Expected) by multiplying the column totals. For example we get 6.6032 by multiplying the first column total, 26, by .2540. Row and column totals remain the same except for rounding error.

E                       Size (thousands of square feet)
Deliveries/week    Below 100   100-249.9   250-400   Above 400   Total    p_r
1 or fewer           6.6032      9.6508     8.8889     6.8571      32    .2540
2-3                  8.6667     12.6667    11.6667     9.0000      42    .3333
4-5                 10.7302     15.6825    14.4444    11.1429      52    .4127
Total               26.0001     38.0000    35.0000    27.0000     126   1.0000
We now place corresponding values of O and E together to get $\sum \frac{O^2}{E}$. Degrees of freedom are $(r-1)(c-1) = (3-1)(4-1) = 6$, where r is the number of rows and c is the number of columns.

   O         E        O^2 / E
   5       6.6032     3.7860
  12       8.6667    16.6153
   9      10.7302     7.5488
  13       9.6508    17.5115
  11      12.6667     9.5526
  14      15.6825    12.4960
   9       8.8889     9.1125
  13      11.6667    14.4857
  13      14.4444    11.7000
   5       6.8571     3.6459
   6       9.0000     4.0000
  16      11.1429    22.9743
 126     126.0001   133.428

$\chi^2 = \sum \frac{O^2}{E} - n = 133.428 - 126 = 7.428$. We compare this with $\chi^{2(6)}_{.05} = 12.5916$ from the $\chi^2$ table. Since our computed $\chi^2$ is less than the table $\chi^2$, we cannot reject $H_0$.
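The shortcut formula $\chi^2 = \sum \frac{O^2}{E} - n$ used for this problem is easy to verify numerically. The sketch below is an illustration, not the original worksheet: the function name `chi_squared_shortcut` is hypothetical, and the bottom-right observed cell is taken as 16, the value implied by the column total of 27, the row total of 52 and the tabulated $O^2/E$ of 22.9743. Because the code does not round the expected counts, its result differs from the hand calculation in the third decimal place.

```python
def chi_squared_shortcut(observed):
    """Chi-squared via sum(O^2 / E) - n, plus the degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = sum(
        o * o / (r * c / n)          # O^2 / E with E = row total * col total / n
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return total - n, df


# Rows: 1 or fewer, 2-3, 4-5 deliveries; columns: warehouse size classes
warehouses = [
    [5, 13, 9, 5],
    [12, 11, 13, 6],
    [9, 14, 13, 16],   # last cell assumed to be 16 (see lead-in)
]
stat, df = chi_squared_shortcut(warehouses)
critical = 12.5916  # table value for alpha = .05 and 6 degrees of freedom

print(round(stat, 2), df)
print("reject H0" if stat > critical else "cannot reject H0")
```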
Problem E2: A random sample of 64 cans of each of 3 brands of canned fruit is examined. The proportion that are not as labeled is .1094 for brand 1, .0781 for brand 2 and .1563 for brand 3. Is the proportion the same for each brand? $\alpha = .01$
Solution: We are testing $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$, where $p_1$ is the proportion not as labeled in brand 1, $p_2$ is the proportion not as labeled in brand 2, etc.
Since the quantities in O must be whole numbers, we get the first row of O by multiplying 64, the sample size for each brand, by .1094 for brand 1, .0781 for brand 2 and .1563 for brand 3 and taking the nearest integer. The first value is thus $.1094(64) = 7.0016$ and we use 7. Our O table is thus

O                 Brand 1   Brand 2   Brand 3   Total    p_r
Not as labeled       7         5        10        22    .1146
As labeled          57        59        54       170    .8854
Total               64        64        64       192   1.0000

We use our row proportions to create E.

E                 Brand 1   Brand 2   Brand 3     Total      p_r
Not as labeled    7.3333    7.3333    7.3333     21.9999    .1146
As labeled       56.6667   56.6667   56.6667    170.0001    .8854
Total            64        64        64         192.0000   1.0000

We now place corresponding values of O and E together to get $\sum \frac{O^2}{E}$. Degrees of freedom are $(r-1)(c-1) = (2-1)(3-1) = 2$, where r is the number of rows and c is the number of columns.

   O        E        O^2 / E
   7      7.3333     6.6818
  57     56.6667    57.3353
   5      7.3333     3.4091
  59     56.6667    61.4294
  10      7.3333    13.6364
  54     56.6667    51.4588
 192    192        193.9508

$\chi^2 = \sum \frac{O^2}{E} - n = 193.9508 - 192 = 1.9508$. We compare this with $\chi^{2(2)}_{.01} = 9.21034$ from the $\chi^2$ table. Since our computed $\chi^2$ is less than the table $\chi^2$, we cannot reject $H_0$.
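Problem E2 starts from reported proportions rather than counts, so the first step is to recover whole-number counts. The sketch below shows one way to do that and then run the test; the variable names are illustrative assumptions, and rounding to the nearest integer follows the approach described in the solution.

```python
sample_size = 64
proportions = [0.1094, 0.0781, 0.1563]   # not as labeled, brands 1 to 3

# Recover whole-number counts from the rounded proportions
not_as_labeled = [round(p * sample_size) for p in proportions]
as_labeled = [sample_size - x for x in not_as_labeled]
observed = [not_as_labeled, as_labeled]

# Expected counts use the pooled row proportions
n = 3 * sample_size
row_totals = [sum(row) for row in observed]
expected = [[r * sample_size / n] * 3 for r in row_totals]

# Shortcut formula: chi-squared = sum(O^2 / E) - n
stat = sum(
    o * o / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
) - n
critical = 9.21034  # table value for alpha = .01 and 2 degrees of freedom

print(not_as_labeled, round(stat, 4))
print("reject H0" if stat > critical else "cannot reject H0")
```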
Problem E3: A real estate firm wants to check whether selling price is related to the number of days a
home is on the market. A random sample of 100 homes is taken and divided into three classes according to
selling price. The realtor discovers that 57% of the 30 homes in the under $100,000 class were on the
market for 60 days or fewer. 38% of the 50 homes in the $100,000 - $200,000 class were on the market for
60 days or fewer. Finally, in the above $200,000 class, 35% of 20 homes were on the market for 60 days or
fewer.
a. Do a test of the equality of proportions for the $100,000-$200,000 class and the above $200,000
class. Repeat this test as a chi-squared test.
b. Do a test of equality of proportions for all three classes.
Solution: Our data are $n = 100$, $\bar p_1 = .57$, $\bar p_2 = .38$, $\bar p_3 = .35$ and $n_1 = 30$, $n_2 = 50$, $n_3 = 20$.
a) $H_0: p_2 = p_3$, $H_1: p_2 \ne p_3$. Let $\Delta p = p_2 - p_3$ and $\Delta \bar p = \bar p_2 - \bar p_3 = .38 - .35 = .03$. Then our hypotheses are $H_0: \Delta p = 0$ and $H_1: \Delta p \ne 0$.
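Interval for: Difference between proportions, $\Delta p = p_1 - p_2$, with $q = 1 - p$.
Confidence Interval: $\Delta p = \Delta \bar p \pm z_{\alpha/2}\, s_{\Delta p}$, where $\Delta \bar p = \bar p_1 - \bar p_2$ and $s_{\Delta p} = \sqrt{\frac{\bar p_1 \bar q_1}{n_1} + \frac{\bar p_2 \bar q_2}{n_2}}$.
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$.
Test Ratio: $z = \frac{\Delta \bar p - \Delta p_0}{\sigma_{\Delta p}}$.
If $\Delta p_0 = 0$: $\sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $\bar p_0 = \frac{n_1 \bar p_1 + n_2 \bar p_2}{n_1 + n_2}$.
If $\Delta p_0 \ne 0$: $\sigma_{\Delta p} = \sqrt{\frac{p_{01} q_{01}}{n_1} + \frac{p_{02} q_{02}}{n_2}}$, or use $s_{\Delta p}$.
Critical Value: $\Delta \bar p_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$.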
We should replace 1 with 2 and 2 with 3 in these formulas.
$\bar p_0 = \frac{n_2 \bar p_2 + n_3 \bar p_3}{n_2 + n_3} = \frac{50(.38) + 20(.35)}{50 + 20} = \frac{26}{70} = .3714$
Note: $\bar q_0 = 1 - \bar p_0 = 1 - .3714 = .6286$
$\sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0 \left(\frac{1}{n_2} + \frac{1}{n_3}\right)} = \sqrt{.3714(.6286)\left(\frac{1}{50} + \frac{1}{20}\right)} = \sqrt{.3714(.6286)(.07)} = \sqrt{.016343} = .12784$
$z = \frac{\Delta \bar p - \Delta p_0}{\sigma_{\Delta p}} = \frac{.03 - 0}{.12784} = 0.2347$. Since this is between $\pm z_{\alpha/2} = \pm z_{.025} = \pm 1.96$ we cannot reject $H_0$.
To do a $\chi^2$ test, use the numbers in columns 2 and 3 below, but instead of using $\chi^2 = \sum \frac{(O-E)^2}{E}$, use $\chi^2 = \sum \frac{\left(|O-E| - \frac{1}{2}\right)^2}{E}$.
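The z test in part a) can be reproduced in a few lines. This is a minimal sketch of the pooled two-proportion test for the case $\Delta p_0 = 0$; the function name `pooled_z` is hypothetical, and the inputs are the sample proportions and sizes of the two higher price classes.

```python
from math import sqrt


def pooled_z(p_a, n_a, p_b, n_b):
    """Two-sample z ratio for H0: p_a = p_b using the pooled proportion.

    Returns (z, pooled proportion, standard error).
    """
    p0 = (n_a * p_a + n_b * p_b) / (n_a + n_b)
    std_err = sqrt(p0 * (1 - p0) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err, p0, std_err


# $100,000 - $200,000 class (38% of 50) versus above $200,000 class (35% of 20)
z, p0, std_err = pooled_z(0.38, 50, 0.35, 20)
z_critical = 1.96  # two-sided test at alpha = .05

print(round(p0, 4), round(std_err, 5), round(z, 4))
print("reject H0" if abs(z) > z_critical else "cannot reject H0")
```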
b) $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$
Solution: We are testing $H_0$: Homogeneity or $H_0: p_1 = p_2 = p_3$, where $p_1$ is the proportion sold in 60 days among the cheapest homes, $p_2$ is the same proportion among midrange homes, etc.
To get the top line of O, multiply $\bar p_1 = .57$, $\bar p_2 = .38$ and $\bar p_3 = .35$ by $n_1 = 30$, $n_2 = 50$ and $n_3 = 20$ respectively. As in the previous problem the results must be whole numbers.

O                  Cheapest   Midrange   Expensive   Total    p_r
Sold in 60 days       17         19          7         43     .43
Not sold              13         31         13         57     .57
Total                 30         50         20        100    1.00

We use our row proportions to create E.

E                  Cheapest   Midrange   Expensive   Total    p_r
Sold in 60 days      12.9       21.5        8.6       43.0    .43
Not sold             17.1       28.5       11.4       57.0    .57
Total                30.0       50.0       20.0      100.0   1.00

We now place corresponding values of O and E together to get $\sum \frac{O^2}{E}$. Degrees of freedom are $(r-1)(c-1) = (2-1)(3-1) = 2$, where r is the number of rows and c is the number of columns.

   O       E      O^2 / E
  17     12.9     22.4031
  19     21.5     16.7907
   7      8.6      5.6977
  13     17.1      9.8830
  31     28.5     33.7193
  13     11.4     14.8246
 100    100.0    103.3184

$\chi^2 = \sum \frac{O^2}{E} - n = 103.3184 - 100 = 3.31837$. We compare this with $\chi^{2(2)}_{.05} = 5.9915$ from the $\chi^2$ table. Since our computed $\chi^2$ is less than the table $\chi^2$, we cannot reject $H_0$.