5/5/00 252z0043
2. A sales manager wishes to predict unit sales by salespersons (y) on the basis of number of calls made (x1) and number of different products the salesperson pushes (x2). The data is below. (Use α = .01)
Row   units   calls   products
        y      x1       x2      x1·y
 1     28      14        2       392
 2     71      35        5      2485
 3     38      22        4       836
 4     70      29        5      2030
 5     22       6        2       132
 6     27      15        3       405
 7     28      17        3       476
 8     47      20        5       940
 9     14      12        1       168
10     70      30        5      2100
                        Σx1y =  9964
Many people were too lazy or ignorant to compute Σx1y. There is no way in the universe to get Σx1y from Σx1 and Σy, nor to get Σx1x2 from Σx1 and Σx2. You will always be asked to compute a sum of this sort on an exam, so figure out how to do it in advance.
The quantities below were given:
Σy = 415, Σy² = 21471, Σx1 = 200, Σx1² = 4740, Σx2 = 35, Σx2² = 143,
Σx1y = ?, Σx2y = 1721, Σx1x2 = 806 and n = 10. You do not need all of these.
a. Compute a simple regression of units against calls. (8)
b. Compute R². (4)
c. Compute se. (3)
d. Compute sb1 (the standard deviation of the slope) and do a confidence interval for β1. (3)
e. Do a prediction interval for units when the salesperson makes 5 calls. (3) Why is this interval likely to be larger than other prediction intervals we might compute for numbers of calls that we have actually observed? (1)
Solution: a) Σx1y = 28(14) + 71(35) + ... + 70(30) = 9964. See the computation above.

Spare Parts Computation:
x̄1 = Σx1/n = 200/10 = 20.000
ȳ = Σy/n = 415/10 = 41.500
SSx1 = Σx1² − n·x̄1² = 4740 − 10(20.000)² = 740.000
Sx1y = Σx1y − n·x̄1·ȳ = 9964 − 10(20.000)(41.500) = 1664.000
SSy = Σy² − n·ȳ² = 21471 − 10(41.500)² = 4248.500 = TSS

b1 = Sx1y/SSx1 = 1664.000/740.000 = 2.24865
b0 = ȳ − b1·x̄1 = 41.500 − 2.24865(20.000) = −3.4730
Ŷ = b0 + b1·x1 becomes Ŷ = −3.4730 + 2.24865x1. Lots of people found b2 instead. They hadn't read the question!
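As a quick sanity check (added for verification, not part of the original exam), the spare-parts arithmetic above can be reproduced in a few lines of Python, using only the sums given in the problem:

```python
# Check of the simple-regression hand computation (sums taken from the problem).
n = 10
sum_x1, sum_x1_sq = 200, 4740
sum_y, sum_y_sq = 415, 21471
sum_x1y = 9964

x1_bar, y_bar = sum_x1 / n, sum_y / n      # 20.0 and 41.5
SSx1 = sum_x1_sq - n * x1_bar ** 2         # 740.0
Sx1y = sum_x1y - n * x1_bar * y_bar        # 1664.0
SSy = sum_y_sq - n * y_bar ** 2            # 4248.5

b1 = Sx1y / SSx1                           # 2.24865
b0 = y_bar - b1 * x1_bar                   # -3.4730
print(f"Y-hat = {b0:.4f} + {b1:.5f} x1")
```

This matches the hand values above to rounding.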
 x1 y  nx1 y 
Sx1 y 2
1664 .000 2



 0.8807
SSx1 SSy  x12  nx12  y 2  ny 2  740 .000 4248 .500 
2
b) R
2
( 0  R 2  1 always!)
SSR  b1 Sx1 y  b1
 x y  nx y   2.24865 1664 .000   3741 .75
1
1
R2 
SSR 3741 .75

 0.8807 could be
SST 4248 .50
used in b) or SSR  R 2 SST   .88074248.5  3741.65 could be used in c).
c) SSE  SST  SSR  4248 .5  3741 .75  506 .75
s e  63 .3436  7.95887
d) s b21 
s e2

SSx1

 nx12
8
tn2  t.005
 3.355
SSE 506 .75

 63 .3436
n2
8
( s e2 is always positive!)
s e2
x12
s e2 


63 .3436
 0.08560
740 .000
sb1  0.08560  0.29257
so 1  b1  tsb1  2.24865 3.3550.29257  2.25  0.98 . Note: Some
2
1
b0
b
x 2 
versions of this exam asked for  0  b0  tsb0 , s b20  s e2  
,
or t1  1 . You have to
 n SSx  t 0  s
s b1
1
b0

read the question to find out which one is wanted. Many people didn't.
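The arithmetic in b) through d) can be checked the same way (a verification sketch added to this writeup, using only the quantities computed above):

```python
import math

# Check of parts b) through d): R-squared, se, s_b1 and the CI for beta1.
n, SSx1, Sx1y, SSy = 10, 740.0, 1664.0, 4248.5
b1 = Sx1y / SSx1

r_sq = Sx1y ** 2 / (SSx1 * SSy)        # 0.8807
SSR = b1 * Sx1y                        # 3741.75
SSE = SSy - SSR                        # 506.75
se = math.sqrt(SSE / (n - 2))          # 7.9589
s_b1 = math.sqrt(se ** 2 / SSx1)       # 0.2926

t = 3.355                              # t(8, .005) from the t table
half = t * s_b1                        # about 0.98
print(f"beta1 = {b1:.2f} +/- {half:.2f}")
```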
e) If Ŷ = −3.4730 + 2.24865x1 and x0 = 5, then Ŷ0 = −3.4730 + 2.24865(5) = 7.770.
From the regression formula outline the Prediction Interval is Y0 = Ŷ0 ± t·sY, where
sY² = se²[1/n + (x0 − x̄1)²/(Σx1² − n·x̄1²) + 1] = se²[1/n + (x0 − x̄1)²/SSx1 + 1]
= 63.3436[1/10 + (5 − 20)²/740 + 1] = 63.3436(1.4041) = 88.9400, so sY0 = √88.94 = 9.4308.
So Y0 = Ŷ0 ± t·sY = 7.770 ± 3.355(9.4308) = 7.77 ± 31.64. This interval will be smallest when x0 = x̄1 = 20. Because 5 is below any values of x1 that we actually have, the prediction interval will be relatively gigantic, since the (x0 − x̄1)² term is large for values of x0 that are far from the mean.
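A short verification of the interval arithmetic (added check; the coefficients and t value are the ones derived above):

```python
import math

# Check of part e): prediction interval for units when x0 = 5 calls are made.
n, x1_bar, SSx1 = 10, 20.0, 740.0
se_sq = 506.75 / (n - 2)               # 63.34
b0, b1 = -3.4730, 2.24865
t = 3.355                              # t(8, .005)

x0 = 5
y_hat = b0 + b1 * x0                                      # 7.770
sY_sq = se_sq * (1 / n + (x0 - x1_bar) ** 2 / SSx1 + 1)   # 88.94
half = t * math.sqrt(sY_sq)                               # 31.64
print(f"Y0 = {y_hat:.2f} +/- {half:.2f}")
```

Note how the (x0 − x̄1)² term contributes 225/740 = 0.304 of the bracket, which is why the interval widens away from the mean.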
3. Data from problem 2 is repeated. (Use α = .01)

Row   units   calls   products
        y      x1       x2
 1     28      14        2
 2     71      35        5
 3     38      22        4
 4     70      29        5
 5     22       6        2
 6     27      15        3
 7     28      17        3
 8     47      20        5
 9     14      12        1
10     70      30        5

Σy = 415, Σy² = 21471, Σx1 = 200, Σx1² = 4740, Σx2 = 35, Σx2² = 143,
Σx1y = ?, Σx2y = 1721, Σx1x2 = 806 and n = 10.
a. Do a multiple regression of units against calls and products. (12)
b. Compute R² and R² adjusted for degrees of freedom for both this and the previous problem. Compare the values of R² adjusted between this and the previous problem. Use an F test to compare R² here with the R² from the previous problem. (6)
c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict the number of units sold when a salesperson makes 20 calls and pushes 5 products. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)
Solution: a) First, we compute Ȳ = 41.500, X̄1 = 20.000 and X̄2 = 35/10 = 3.500. Second, we compute or copy ΣX1Y = 9964, ΣX2Y = 1721, ΣY² = 21471, ΣX1² = 4740, ΣX2² = 143 and ΣX1X2 = 806. Third, we compute or copy our spare parts:
SSy = ΣY² − nȲ² = 4248.500*
Sx1y = ΣX1Y − nX̄1Ȳ = 9964 − 10(20.000)(41.500) = 1664.00
Sx2y = ΣX2Y − nX̄2Ȳ = 1721 − 10(3.500)(41.500) = 268.5
SSx1 = ΣX1² − nX̄1² = 4740 − 10(20.000)² = 740.00*
SSx2 = ΣX2² − nX̄2² = 143 − 10(3.5)² = 20.500*
Sx1x2 = ΣX1X2 − nX̄1X̄2 = 806 − 10(20.000)(3.500) = 106.000
* indicates quantities that must be positive. (Note that some of these were computed for the last problem.)
Fourth, we substitute these numbers into the Simplified Normal Equations:
ΣX1Y − nX̄1Ȳ = b1(ΣX1² − nX̄1²) + b2(ΣX1X2 − nX̄1X̄2)
ΣX2Y − nX̄2Ȳ = b1(ΣX1X2 − nX̄1X̄2) + b2(ΣX2² − nX̄2²),
which are
1664.00 = 740.00 b1 + 106.00 b2
 268.50 = 106.00 b1 +  20.50 b2
and solve them as two equations in two unknowns for b1 and b2. We do this by multiplying the second equation by 6.9811, which is 740.00 divided by 106.00. The purpose of this is to make the coefficients of b1 equal in both equations. We could do just as well by multiplying the second equation by 20.5 divided by 106 and making the coefficients of b2 equal. So the two equations become
1664.00 = 740.00 b1 + 106.00 b2
1874.43 = 740.00 b1 + 143.11 b2.
We then subtract the first equation from the second to get 210.43 = 37.11 b2, so that b2 = 210.43/37.11 = 5.6704. The first of the two normal equations can now be rearranged to get 740 b1 = 1664.00 − 106.00(5.6704) = 1062.94, which gives us b1 = 1.4364. Finally we get b0 by solving b0 = Ȳ − b1X̄1 − b2X̄2 = 41.500 − 1.4364(20.000) − 5.6704(3.500) = −7.0742. Thus our equation is Ŷ = b0 + b1X1 + b2X2 = −7.0742 + 1.4364X1 + 5.6704X2.
Note: An alternate way of solving the Simplified Normal Equations is to multiply the second equation by 5.1707, which is 106 divided by 20.5. The resulting equations are
1664.00 = 740.00 b1 + 106.00 b2
1388.34 = 548.10 b1 + 106.00 b2.
We then subtract the second equation from the first to get 275.66 = 191.90 b1, so that b1 = 275.66/191.90 = 1.436. If we then solve for b2, we get essentially the same answer.
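An added check of the elimination above. This sketch solves the same two equations exactly by Cramer's rule, so the small differences from the rounded hand values (e.g. 1.4365 vs 1.4364) are expected:

```python
# Solve the simplified normal equations exactly:
#   1664.00 = 740.00 b1 + 106.00 b2
#    268.50 = 106.00 b1 +  20.50 b2
a11, a12, c1 = 740.00, 106.00, 1664.00
a21, a22, c2 = 106.00, 20.50, 268.50

det = a11 * a22 - a12 * a21             # 3934.0
b1 = (c1 * a22 - a12 * c2) / det        # about 1.4365
b2 = (a11 * c2 - c1 * a21) / det        # about 5.6701
b0 = 41.500 - b1 * 20.000 - b2 * 3.500  # about -7.074
```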
b) The Regression Sum of Squares is
SSR = b1(ΣX1Y − nX̄1Ȳ) + b2(ΣX2Y − nX̄2Ȳ) = b1(Sx1y) + b2(Sx2y)
= 1.4364(1664.00) + 5.6704(268.500) = 3912.672*
and is used in the ANOVA below. The coefficient of determination is
R² = SSR/SST = [b1(Sx1y) + b2(Sx2y)]/SSy = 3912.672/4248.50 = .9210*.
R̄², which is R² adjusted for degrees of freedom, has the formula R̄² = [(n − 1)R² − k]/(n − k − 1), where k is the number of independent variables. Our results can be summarized below as:

            previous   this problem
n              10           10
k               1            2
R²            .8807        .9210
R̄²            .8658        .8984

R² adjusted for degrees of freedom seems to show that our second regression is better.
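An added check of the adjusted R² comparison, written as a small function so both regressions go through the same formula:

```python
# Check of the adjusted R-squared comparison above.
def r_bar_sq(r_sq, n, k):
    """R-squared adjusted for degrees of freedom, as defined in the text."""
    return ((n - 1) * r_sq - k) / (n - k - 1)

one_x = r_bar_sq(0.8807, n=10, k=1)    # 0.8658
two_x = r_bar_sq(0.9210, n=10, k=2)    # 0.8984
print(one_x < two_x)                   # the two-variable regression wins
```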
One way to do the F test is to note that the total sum of squares is SSy = ΣY² − nȲ² = 4248.500. For the regression with one independent variable the regression sum of squares is SSR = b1(Sx1y) = 2.24865(1664.000) = 3741.75*. For the regression with two independent variables the regression sum of squares was computed in b) as 3912.672. The difference between these is 170.922. The remaining unexplained variation is SSE = SST − SSR = 4248.500 − 3912.672 = 335.828*.
The ANOVA table is

Source      SS*       DF*    MS*       F*      F.01
X1         3741.75     1    3741.75
X2          170.922    1     170.922   3.563   F.01(1,7) = 12.25
Error       335.828    7      47.9755
Total      4248.500    9

Since our computed F is smaller than the table F, we do not reject our null hypothesis that X2 has no effect.
A faster way to do this is to use the R²s directly. The difference between R² = 88.07% and R² = 92.10% is 4.03%.

Source      SS*     DF*    MS*       F*     F.01
X1          88.07    1     88.07
X2           4.03    1      4.03     3.57   F.01(1,7) = 12.25
Error        7.90    7      1.12857
Total      100.00    9

The numbers are a bit different because of rounding, but the conclusion is the same.
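The 'faster way' can be verified in two lines (an added sketch; the R² values are the ones from the two regressions above):

```python
# Check of the 'faster way': an F test on the R-squared difference.
n = 10
r_sq_1, r_sq_2 = 0.8807, 0.9210        # one and two independent variables

ms_added = (r_sq_2 - r_sq_1) / 1       # 4.03% explained by the one added variable
ms_error = (1 - r_sq_2) / (n - 2 - 1)  # 7.90% spread over the 7 error df
F = ms_added / ms_error                # about 3.57, below F.01(1,7) = 12.25
```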
c) We computed the regression sum of squares in the previous section.

Source       SS        DF    MS        F       F.01
X1, X2     3912.672     2   1956.33    40.78   F.01(2,7) = 9.55
Error       335.828     7     47.975
Total      4248.500     9

Since our computed F is larger than the table F, we reject our null hypothesis that X1 and X2 do not explain Y.
d) Ŷ = b0 + b1X1 + b2X2 = −7.0742 + 1.4364X1 + 5.6704X2. Since the last few digits don't seem to mean a lot I used Ŷ = −7.07 + 1.44(20) + 5.67(5) = 50.08.
e) From the ANOVA above, SSE = 335.828, so
se² = [ΣY² − nȲ² − b1(ΣX1Y − nX̄1Ȳ) − b2(ΣX2Y − nX̄2Ȳ)]/(n − 3) = (ΣY² − nȲ²)(1 − R²)/(n − 3)
= SSE/(n − k − 1) = 335.828/7 = 47.975*. This can be read from the MS in the ANOVA above.
se = √47.975 = 6.926.
According to the outline, "An approximate confidence interval is Y0 = Ŷ0 ± t·se/√n and an approximate prediction interval is Y0 = Ŷ0 ± t·se."
Use t(n−k−1, .005) = t(7, .005) = 3.499. So the Confidence Interval is
Y0 = Ŷ0 ± t·se/√n = 50.08 ± 3.499(6.926)/√10 = 50.08 ± 7.66
and the Prediction Interval is
Y0 = Ŷ0 ± t·se = 50.08 ± 3.499(6.926) = 50.08 ± 24.2.
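A short added check of the two intervals (the prediction 50.08 uses the rounded coefficients from part d):

```python
import math

# Check of the approximate confidence and prediction intervals in part e).
n, se, t = 10, 6.926, 3.499            # t(7, .005)
y_hat = 50.08                          # prediction from part d)

ci_half = t * se / math.sqrt(n)        # about 7.66
pi_half = t * se                       # about 24.2
```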
4. Your country's tourist office reports the following tourist arrivals over a 20-year period.

year   arrivals (thousands)
  0        11.75
  1        78.93
  2       203.04
  3       268.95
  4       380.49
  5       457.32
  6       525.51
  7       596.56
  8       640.74
  9       710.67
 10       748.02
 11       795.13
 12       845.21
 13       843.08
 14       922.58
 15       945.22
 16       934.72
 17       945.67
 18       952.38
 19       933.86

Your assistant fits the following equations to the data (standard errors in parentheses):

arrivals = 162 + 50.0 year
          (39.5) (3.55)
R-sq = 91.7%   Durbin-Watson statistic = 0.19

arrivals = -3.34 + 105 year - 2.90 yearsq
          (8.94)  (2.18)   (0.111)
R-sq = 99.8%   Durbin-Watson statistic = 2.48

Do the following:
a. Using only the R²s given above: (α = .01)
(i) Show that R² adjusted for degrees of freedom rises between the first and second regression. (2)
(ii) Fake an F test to show that the addition of the year squared improves the regression. (4)
(iii) Test the correlation between arrivals and year for significance. (3)
(iv) Test the hypothesis that the correlation between arrivals and year is .99. (4)
b. Compute a rank correlation between arrivals and year (Note: if you can't get Σd² > 0 you are wasting both our time) (3) and
(i) Test it for significance. (2)
(ii) Explain why it is higher than the correlation you computed in part a above. (1)
c. Explain what the values of the Durbin-Watson statistics show. (4)
Solution: a) (i) R̄², which is R² adjusted for degrees of freedom, has the formula R̄² = [(n − 1)R² − k]/(n − k − 1), where k is the number of independent variables. For the first regression, n = 20 and k = 1, so R̄² = [19(0.917) − 1]/18 = 0.912, and for the second one, n = 20 and k = 2, so R̄² = [19(0.998) − 2]/17 = 0.998.
(ii) The difference between R² = 91.7% and R² = 99.8% is 8.1%.

Source      SS      DF    MS        F       F.01
X1          91.7     1    91.7
X2           8.1     1     8.1      688.5   F.01(1,17) = 8.40
Error        0.2    17     0.01176
Total      100.0    19

Since our computed F is larger than the table F, we reject our null hypothesis that X2 has no effect.
(iii) The simple sample correlation coefficient is
r = (ΣXY − nX̄Ȳ)/√[(ΣX² − nX̄²)(ΣY² − nȲ²)],
the square root of
R² = (ΣXY − nX̄Ȳ)²/[(ΣX² − nX̄²)(ΣY² − nȲ²)] = .917.
Since this was given by the printout, we don't need to compute it, so r = √.917 = .9576. From the outline, if we want to test H0: ρxy = 0 against H1: ρxy ≠ 0 and x and y are normally distributed, we use
t(n−2) = r/√[(1 − r²)/(n − 2)] = .9576/√[(1 − (.9576)²)/(20 − 2)] = 14.1018.
Since t(18, .005) = 2.878, we reject H0.
(iv) The outline says, if ρ0 ≠ 0 and we want to test H0: ρxy = ρ0 against H1: ρxy ≠ ρ0, "we need to use Fisher's z-transformation. Let z̃ = ½ln[(1 + r)/(1 − r)]. This has an approximate mean of μz = ½ln[(1 + ρ0)/(1 − ρ0)] and a standard deviation of sz = √[1/(n − 3)], so that t(n−2) = (z̃ − μz)/sz." So if r = .9576, n = 20 and ρ0 = .99,
z̃ = ½ln[(1 + .9576)/(1 − .9576)] = 1.91616,
μz = ½ln[(1 + .99)/(1 − .99)] = 2.64665,
sz = √(1/17) = 0.242536 and
t = (z̃ − μz)/sz = (1.91616 − 2.64665)/0.242536 = −3.011.
Since t(18, .005) = 2.878, this is a 2-sided test and |−3.011| > 2.878, we reject H0.
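The Fisher transformation arithmetic is easy to get backwards, so here is an added check (r and ρ0 as above):

```python
import math

# Check of the Fisher z-transformation test of H0: rho = .99.
r, rho0, n = 0.9576, 0.99, 20

z_tilde = 0.5 * math.log((1 + r) / (1 - r))      # 1.9162  (from the sample r)
mu_z = 0.5 * math.log((1 + rho0) / (1 - rho0))   # 2.6467  (from the hypothesized rho)
s_z = 1 / math.sqrt(n - 3)                       # 0.2425
t = (z_tilde - mu_z) / s_z                       # about -3.01
```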
b) (i) The data is repeated below with the calculations for rank correlation.

year   arrivals   r1   r2   d = r1 − r2   d²
  0      11.75     1    1        0         0
  1      78.93     2    2        0         0
  2     203.04     3    3        0         0
  3     268.95     4    4        0         0
  4     380.49     5    5        0         0
  5     457.32     6    6        0         0
  6     525.51     7    7        0         0
  7     596.56     8    8        0         0
  8     640.74     9    9        0         0
  9     710.67    10   10        0         0
 10     748.02    11   11        0         0
 11     795.13    12   12        0         0
 12     845.21    13   14       −1         1
 13     843.08    14   13        1         1
 14     922.58    15   15        0         0
 15     945.22    16   18       −2         4
 16     934.72    17   17        0         0
 17     945.67    18   19       −1         1
 18     952.38    19   20       −1         1
 19     933.86    20   16        4        16
                                 Σd² =    24

rs = 1 − 6Σd²/[n(n² − 1)] = 1 − 6(24)/[20(20² − 1)] = 1 − 144/7980 = 0.9820.
If we want a 2-sided test at the 99% confidence level of H0: ρs = 0, compare rs with the 0.5% value from the rank correlation coefficient table. Since the table value is .4451, reject the null hypothesis. We conclude that the rank correlation is significant.
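An added check of the rank correlation, using the d column from the table above:

```python
# Check of the Spearman rank correlation from the d = r1 - r2 column above.
d = [0] * 12 + [-1, 1, 0, -2, 0, -1, -1, 4]      # one entry per year
n = len(d)                                       # 20
sum_d_sq = sum(x * x for x in d)                 # 24
r_s = 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))      # 0.9820
```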
(ii) The second regression shows that there is a slight curvature in the relation between the two variables.
Since correlation tests for a linear relationship, it is not quite appropriate, but rank correlation will detect a
slightly curved but generally positive relationship.
c) A Durbin-Watson test is a test for autocorrelation. For α = .01, k = 1 and n = 20, the text table gives dL = .95 and dU = 1.15. The null hypothesis is 'No Autocorrelation' and our rejection region is d < dL = .95 or d > 4 − dL = 3.05. We really should use the α = .005 value for dL, but a check of the α = .05 table leaves us sure that it is somewhat below .95. Thus the D-W statistic of 0.19 is probably in the rejection region. For α = .01, k = 2 and n = 20, the text table gives dL = .86 and dU = 1.27. The 'do not reject' region is between dU = 1.27 and 4 − dU = 2.73. 2.48 is in this region, but this is really for α = .02. We can't be sure if we actually use α = .01.
5. An analysis of a sample of 200 prisoners of their adjustment to civil life after release from prison reveals the following: (α = .01)

Residence          Adjustment to Civil Life
after release    Outstanding   Good   Fair   Poor   Total
Hometown              27        34     34     25     120
Not Hometown          15        16     24     25      80
Total                 42        50     58     50     200

Do statistical tests of the following:
a. The proportion in each adjustment category was the same for both 'hometown' and 'not hometown' groups. (8)
b. The proportion in the combined 'outstanding' and 'good' categories was higher in the 'hometown' group than in the 'not hometown' group. (5)
c. The combined proportion of the whole group of 200 that made an 'outstanding' or 'good' adjustment was 50%. (4)

Solution: Note!! A test of multiple proportions is a χ² test! Every year I see people trying to compare more than two proportions by a method appropriate for b) below. It doesn't work! Δp is defined as a difference between two proportions; when you have more than two, that definition doesn't work. Also, simply computing the proportions and telling me that they are different is just a way of making me suspect that you don't know what a statistical test is.
a) The data is copied below. The pr's are found by dividing the row sums in O by the grand total. The pr's are then used to multiply the column totals to get the material in E.

O        O    G    F    P   Total  pr     E        O     G     F     P    Total  pr
H       27   34   34   25    120   .60    H      25.2  30.0  34.8  30.0   120   .60
NH      15   16   24   25     80   .40    NH     16.8  20.0  23.2  20.0    80   .40
Total   42   50   58   50    200          Total  42.0  50.0  58.0  50.0   200

This is a chi-squared test of homogeneity. Our null hypothesis is 'Homogeneity'. The calculations are done in two ways below. Save time by computing only one of the last two columns.

Row     O     E      O − E   (O − E)²   (O − E)²/E     O²/E
 1     27    25.2     1.8      3.2400    0.12857     28.9286
 2     34    30.0     4.0     16.0000    0.53333     38.5333
 3     34    34.8    −0.8      0.6400    0.01839     33.2184
 4     25    30.0    −5.0     25.0000    0.83333     20.8333
 5     15    16.8    −1.8      3.2400    0.19286     13.3929
 6     16    20.0    −4.0     16.0000    0.80000     12.8000
 7     24    23.2     0.8      0.6400    0.02759     24.8276
 8     25    20.0     5.0     25.0000    1.25000     31.2500
Total 200   200.0                        3.78406    203.7841

χ² = Σ(O − E)²/E = 3.7841 or χ² = Σ(O²/E) − n = 203.7841 − 200 = 3.7841.
df = (r − 1)(c − 1) = (1)(3) = 3 and χ²(3, .01) = 11.3449, so do not reject the null hypothesis. We conclude that, except for random variations, the proportion in each category is the same for both groups.
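An added check of the χ² statistic, computed both ways from the O and E tables above:

```python
# Check of the chi-squared statistic from the O and E tables above.
observed = [27, 34, 34, 25, 15, 16, 24, 25]
expected = [25.2, 30.0, 34.8, 30.0, 16.8, 20.0, 23.2, 20.0]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))    # 3.7841
shortcut = sum(o * o / e for o, e in zip(observed, expected)) - 200   # same thing
```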
b) From Table 3 (difference between proportions, Δp = p1 − p2, q = 1 − p):
Confidence Interval: Δp = Δp̄ ± z(α/2)·sΔp, where Δp̄ = p̄1 − p̄2 and sΔp = √(p̄1q̄1/n1 + p̄2q̄2/n2).
Hypotheses: H0: Δp = Δp0 against H1: Δp ≠ Δp0, where Δp0 = p01 − p02 (often Δp0 = 0).
Test Ratio: z = (Δp̄ − Δp0)/σΔp. If Δp0 = 0, σΔp = √[p̄0q̄0(1/n1 + 1/n2)] with p̄0 = (n1p̄1 + n2p̄2)/(n1 + n2); if Δp0 ≠ 0, σΔp = √(p01q01/n1 + p02q02/n2).
Critical Value: Δpcv = Δp0 ± z(α/2)·σΔp.

Our hypotheses are H0: Δp ≤ 0 and H1: Δp > 0, or H0: p1 ≤ p2 and H1: p1 > p2, where Δp = p1 − p2. If we use the test ratio method, we need to find p̄1 = 61/120 = .5083, p̄2 = 31/80 = .3875 and p̄0 = (61 + 31)/(120 + 80) = 92/200 = .46. So Δp̄ = p̄1 − p̄2 = .5083 − .3875 = .1208 and
σΔp = √[p̄0q̄0(1/n1 + 1/n2)] = √[(.46)(.54)(1/120 + 1/80)] = √.005175 = .07193. So
z = (Δp̄ − Δp0)/σΔp = .1208/.07193 = 1.680.
Since z < z.01 = 2.327, do not reject H0. (For this one-sided test we reject the null hypothesis only if z > 2.327.)
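An added check of the two-proportion z ratio (the 61 and 31 are the outstanding + good counts from the table):

```python
import math

# Check of the one-sided two-proportion z test above.
x1, n1 = 61, 120                       # hometown: outstanding + good
x2, n2 = 31, 80                        # not hometown: outstanding + good

p1, p2 = x1 / n1, x2 / n2              # .5083 and .3875
p0 = (x1 + x2) / (n1 + n2)             # pooled proportion, .46
sigma = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))   # .0719
z = (p1 - p2) / sigma                  # about 1.68, below z.01 = 2.327
```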
c) Table 3 says the following (proportion, q = 1 − p):
Confidence Interval: p = p̄ ± z(α/2)·sp̄, where sp̄ = √(p̄q̄/n).
Hypotheses: H0: p = p0 against H1: p ≠ p0.
Test Ratio: z = (p̄ − p0)/σp̄, where σp̄ = √(p0q0/n).
Critical Value: pcv = p0 ± z(α/2)·σp̄.

In the last part of the problem, we found that the proportion of people in the 'outstanding' or 'good' categories was p̄ = .46. Thus, if we use the test ratio method with H0: p = .50 and H1: p ≠ .50,
z = (p̄ − p0)/σp̄ = (.46 − .50)/√[(.50)(.50)/200] = −.04/.03536 = −1.1314.
We reject H0 if z is not between ±z.005 = ±2.576. It is between these values, so we do not reject H0.
6. In an effort to teach safety principles to a group of your employees, 22 employees were randomly assigned to one of four groups. After the sessions they took a test that was scored from 0 to 10, with the following results:

Lecture   Videotape   Discussion   Programmed Instruction
   7          8            7                 8
   6          5            9                 5
   5          8            6                 6
   6          6            8                 6
   6          9            5                 5
   8                                        10

Do statistical tests of the following: (α = .01) (Assume that the underlying distribution is Normal.)
a. Is there a difference between the means? (7)
b. Does column 4 have a Normal distribution with a population mean of 7.2 and a population standard deviation of 1.5? (5)
c. At the same time we gave the managers a test on safety and then a day of training. Scores were not reported, but of 15 managers 11 performed better after the day of training. Use a sign test to show if the day of training was successful. (α = .05) (4)
Solution: Note!! A test of multiple means is an Analysis of Variance! Every year I see people trying to compare more than two means by a method appropriate for comparing two means. It doesn't work! μ1 − μ2 is defined as a difference between two means; when you have more than two, that definition doesn't work. Also, simply computing the means and telling me that they are different is just a way of making me suspect that you don't know what a statistical test is.
a) Because we are comparing means under the assumption that the underlying distribution is Normal, this is an ANOVA.

         x1     x2     x3     x4
          7      8      7      8
          6      5      9      5
          5      8      6      6
          6      6      8      6
          6      9      5      5
          8                   10
Sum      38  +  36  +  35  +  40  =  149 = Σx
nj        6  +   5  +   5  +   6  =   22 = n
x̄j    6.3333  7.2000  7.0000  6.6667
SS      246  + 270  + 255  + 286  = 1057 = Σx²

x̄ = Σx/n = 149/22 = 6.7727
SST = Σxij² − n·x̄² = 1057 − 22(6.7727)² = 47.8636
SSB = Σnj·x̄j² − n·x̄² = 6(6.3333)² + 5(7.2000)² + 5(7.0000)² + 6(6.6667)² − 22(6.7727)²
= 1011.5335 − 1009.1282 = 2.4052
Source      SS       DF    MS       F      F.01               H0
Between    2.4052     3   0.8017   0.32   F.01(3,18) = 5.09 ns  Column means equal
Within    45.4584    18   2.5255
Total     47.8636    21

H0: μ1 = μ2 = μ3 = μ4; H1: Not all means equal.
Explanation: Since the Sum of Squares (SS) column must add up, 45.4584 is found by subtracting 2.4052 from 47.8636. Since n = 22, the total degrees of freedom are n − 1 = 21. Since there are 4 random samples or columns, the degrees of freedom for Between is 4 − 1 = 3. Since the Degrees of Freedom (DF) column must add up, 18 = 21 − 3. The Mean Square (MS) column is found by dividing the SS column by the DF column. 0.8017 is MSB and 2.5255 is MSW. F = MSB/MSW, and is compared with F.01 from the F table (df1 = 3, df2 = 18). Because our computed F is less than the table F, do not reject H0.
b) Because the mean and variance are known and the sample is small, the only test that is practical is the Kolmogorov-Smirnov test. H0: x4 ~ N(7.2, 1.5).
The column Fe is the cumulative distribution computed from the Normal table, using z = (x − μ)/σ = (x − 7.2)/1.5. Fo is the cumulative O divided by n = 6. D = |Fo − Fe|.

 x    O   Cumulative O    Fo       z      Fe      D
 5    1        1         .1667   −1.47   .0708   .0959
 5    1        2         .3333   −1.47   .0708   .2625
 6    1        3         .5000   −0.80   .2119   .2881
 6    1        4         .6667   −0.80   .2119   .4548
 8    1        5         .8333    0.53   .7019   .1314
10    1        6        1.0000    1.87   .9693   .0307

From the Kolmogorov-Smirnov table, the critical value for a 95% confidence level is .4050. Since the largest number in D is above this value, we reject H0.
c) Our hypotheses are H0: p ≤ .5 and H1: p > .5, where p is the proportion of managers who improve. We get the p-value for this result by using the binomial table with p = .5 and n = 15:
p-value = P(x ≥ 11) = 1 − P(x ≤ 10) = 1 − .94077 = .05923.
Since this is greater than α = .05, we do not reject H0 and thus cannot conclude that the training was successful.
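An added check of the binomial p-value, computed directly rather than read from the table:

```python
from math import comb

# Check of the sign-test p-value: P(X >= 11) when X ~ Binomial(15, 0.5).
n, k_min = 15, 11
p_value = sum(comb(n, k) for k in range(k_min, n + 1)) / 2 ** n   # 0.0592
```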
7. Three groups of executives are given a test on management principles. We will assume that the underlying distribution is not Normal. (M&L p627)

Manufacturing Executives    Finance Executives    Trade Executives
 Score    Rank               Score    Rank         Score    Rank
  51        9                 15        2           89      19
  31        7                 32        8           20       3.5
  14        1                 68       13           60      11
  69       14                 87       18           72      15
  86       17                 20        3.5         56      10
  62       12                 28        6           22       5
  96       20                 77       16          Sum      63.5
 Sum       80                 97       21
                             Sum       87.5

Using rank tests, test the following:
a. The distributions of scores are the same for all three groups. (7)
b. Taken as a single group, nonmanufacturing executives do worse on the test than manufacturing executives. (7)
c. The median score for Finance executives is 60. (Do not use a sign test if you used it in the last problem.) (4 points for a sign test, 5 for a better method)
d. 45 days after you get back from Cancun, your doctor orders a runs test. If + indicates days when you had the runs and − indicates days when you did not, there were 27 + days and 18 − days, and a total of 18 runs of either plusses or minuses. Was the sequence random? (5)
Solution: a) Since this involves comparing three apparently random samples from a non-normal distribution, we use a Kruskal-Wallis test. The null hypothesis is H0: columns come from the same distribution, or the medians are equal.
Sums of ranks were given above. To check the ranking, note that the sum of the three rank sums is 80 + 87.5 + 63.5 = 231, that the total number of items is 7 + 8 + 6 = 21, and that the sum of the first n numbers is n(n + 1)/2 = 21(22)/2 = 231. Now, compute the Kruskal-Wallis statistic
H = [12/(n(n + 1))]·Σ(SRi²/ni) − 3(n + 1) = [12/(21·22)][80²/7 + 87.5²/8 + 63.5²/6] − 3(22)
= (12/462)(914.2857 + 957.0312 + 672.0417) − 66 = .025974(2543.3586) − 66 = 0.0613.
If we try to look up this result in the (7, 8, 6) section of the Kruskal-Wallis table (Table 9), we find that the problem is too large for the table. Thus we must use the chi-squared table with 2 degrees of freedom. Since χ²(2, .05) = 5.9915, do not reject H0.
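An added check of the H statistic from the rank sums above:

```python
# Check of the Kruskal-Wallis statistic from the rank sums above.
rank_sums = [80.0, 87.5, 63.5]
sizes = [7, 8, 6]
n = sum(sizes)                                       # 21

H = 12 / (n * (n + 1)) * sum(sr ** 2 / m for sr, m in zip(rank_sums, sizes)) - 3 * (n + 1)
# H is about 0.0613, far below the chi-squared(2, .05) value of 5.9915
```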
b) Because we are comparing two random samples from a non-normal distribution, we use the Wilcoxon-Mann-Whitney method. If we designate manufacturing as sample 1 and nonmanufacturing as sample 2, our hypotheses are H0: η1 ≤ η2 and H1: η1 > η2. The sum of ranks for manufacturing is 80. The sum of ranks for nonmanufacturing is 87.5 + 63.5 = 151. As in part a), their sum is 231, and this checks out as equal to n(n + 1)/2.
We designate the smaller of the two rank sums, 80, as W. We are unable to find critical values or p-values for a 5% test with n1 = 7 and n2 = 14 on either of the Wilcoxon-Mann-Whitney tables, since n2 = 14 is too high. The outline says that for values of n1 and n2 that are too large for the tables, W has the Normal distribution with mean μW = ½n1(n1 + n2 + 1) = ½(7)(7 + 14 + 1) = 77 and variance σW² = (n2/6)μW = (14/6)(77) = 179.6667, so σW = √179.6667 = 13.4040. Note that our value of W is above the mean. This is because the average rank of sample 1 is higher than the average rank of sample 2, as it would have to be if nonmanufacturing executives do worse on the test. This means that we are doing a right-sided test. z = (W − μW)/σW = (80 − 77)/13.4040 = 0.2238. Since this is below z.05 = 1.645, we do not reject H0.
c) The Wilcoxon signed rank test for paired data was used in class as a powerful test of the median. Our hypotheses are H0: η = 60 and H1: η ≠ 60. The difference column is x* = x − 60.

  x    x* = x − 60   rank
 15       −45         8−
 32       −28         4−
 68         8         1+
 87        27         3+
 20       −40         7−
 28       −32         5−
 77        17         2+
 97        37         6+

If we total negative and positive ranks separately, we get T− = 24 and T+ = 12. According to the Wilcoxon signed rank test table, the 2.5% value for n = 8 is 4. Since the smaller of the two rank sums, 12, is above this critical value, do not reject the null hypothesis.
d) This is, of course, a runs test. n = 45 is the total number of items, n1 = 27, n2 = 18 and r = 18.
To test the null hypothesis of randomness for a small sample, assume that the significance level is 5% and use the table entitled 'Critical Values of r in the Runs Test.' Unfortunately, n1 = 27 is too high for the table. According to the outline, for a larger problem (if n1 and n2 are too large for the table), r follows the Normal distribution with
μ = 2n1n2/n + 1 = 2(27)(18)/45 + 1 = 21.6 + 1 = 22.6 and
σ² = 2n1n2(2n1n2 − n)/[n²(n − 1)] = (μ − 1)(μ − 2)/(n − 1) = (21.6)(20.6)/44 = 10.1127.
So z = (r − μ)/σ = (18 − 22.6)/√10.1127 = −4.6/3.1800 = −1.45. Since this value of z is between ±z.025 = ±1.960, we do not reject H0: Randomness.
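An added check of the large-sample runs-test arithmetic:

```python
import math

# Check of the large-sample runs test: 27 plus days, 18 minus days, 18 runs.
n1, n2, r = 27, 18, 18
n = n1 + n2                                                  # 45

mu = 2 * n1 * n2 / n + 1                                     # 22.6
var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))   # 10.1127
z = (r - mu) / math.sqrt(var)                                # about -1.45
```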