252y0141 5/4/01
ECO252 QBA2
Name KEY
FINAL EXAM
Hour of Class Registered (Circle)
May 1, 2001
Read "Things that You Should Never Do on a Statistics Exam (or Anywhere Else)!" before taking exams!
I. (16+ points) Do all the following.
1.
Hand in your fourth regression problem (3 points)
Remember: Y = Company profit in millions of dollars, X1 = CEO's yearly income in thousands of
dollars (X1 = 1000 means a million-dollar annual income), X2 = Percentage of stock owned by the CEO
(X2 = 3 means the CEO owns 3.0% of the stock), and X3 = X1·X2, the interaction of income and stock ownership.
Use a significance level of 10% in this problem.
2.
Answer the following questions.
a. For the regression of Y against X1 and X2 only, what does the ANOVA tell us? Which of the
coefficients are significant? What tells you this? (3)
b. Do an F test to show if the addition of X2 and X3 improves the regression over your results with X1
alone. (4)
c. Based on your regression of Y against X1, X2, and X3,
(i) What evidence is there that CEO income and stock percentage interact? (1)
(ii) What change does this equation predict for every one thousand dollars of CEO income when
the CEO owns 3% of the company's stock? (3)
(iii) What profit does the equation predict for a firm where the CEO earns $1 million and owns
44% of the stock? What might this lead you to suspect about this equation? (2)
(iv) Based only on the adjusted R-squared and the significance of the coefficients, is there an
equation that seems to work better than the equation with three independent variables? Why? (3)
Solution: a) The regression of Y against X1 and X2 is probably useless. The ANOVA gives a p-value of
0.685 to the null hypothesis that there is no relation between Y and the Xs. This leads us to expect
insignificant coefficients, and the high p-values, well above α = .10, confirm that. I'm sure that if you
compute or copy the t-ratios and compare them against t.05(9), you will find that they are in the 'accept' zone.
b) From my printout, if we regress Y against X1, X2, and X3, we find:

Analysis of Variance
SOURCE       DF        SS        MS     F      p
Regression    3  56926416  18975472  3.56  0.067
Error         8  42689452   5336182
Total        11  99615872

SOURCE   DF    SEQ SS
x1        1   2063334
x2        1   5962759
x3        1  48900324

Compare this to the original regression with X1 only:

Analysis of Variance
SOURCE       DF        SS       MS     F      p
Regression    1   2063334  2063334  0.21  0.655
Error        10  97552536  9755254
Total        11  99615872
If we use either the sequential sum of squares in the top printout or the regression sum of squares in the
bottom printout, we find that X1 explains 2063334, leaving 56926416 - 2063334 = 54863082 for X2 and X3.
If we combine these two we get

SOURCE      DF        SS        MS     F    F.10(2,8)
x1           1   2063334   2063334
x2 and x3    2  54863082  27431541  5.14    3.11
Error        8  42689452   5336182
Total       11  99615872
Since our F of 5.14 is larger than F.10(2,8) = 3.11, we reject the null hypothesis of no relation between Y and X2 and X3.
c) Our regression reads Ŷ = -1159 + 0.12198*X1 - 61.0 X2 - 0.03534*X3, where the numbers in parentheses on the printout (982.7, 0.04232, 61.16 and 0.01167) are standard deviations, and asterisks indicate coefficients significant at the 10% level.
(i) One of the coefficients that is significant is that of X3, the interaction term. This tells us that X1 and X2 interact.
(ii) If we substitute X2 = 3 and X3 = X1·X2 = 3X1, our equation becomes Ŷ = -1159 + 0.12198 X1 - 61.0(3) - 0.03534(3X1), so that X1 is multiplied by .12198 - 3(.03534) = .016. This is the amount that Y will rise every time X1 goes up by one.
(iii) Substitute 1000 for X1 and 44 for X2. You should get a value for Y of about -5280. It is doubtful that either a million-dollar salary or high stock ownership by the CEO would produce a loss. We should realize that in our data million-dollar salaries and really high stock ownership do not appear together, so we might suspect that this equation has poor predictive powers.
(iv) Given the fact that the coefficient of X2 is not significant, perhaps we ought to limit ourselves to the equation with X1 and X3 alone. It has a higher R-squared adjusted and the coefficients of both the remaining Xs are significant at the 10% level.
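If you want to check the partial F computation by machine, here is a minimal Python sketch (variable names are mine) using only the sums of squares quoted from the printouts above:

```python
# Partial F test for adding X2 and X3 to a regression that already has X1,
# using the sums of squares quoted from the printouts above.
ss_reg_full = 56926416   # regression SS with X1, X2, X3
ss_err_full = 42689452   # error SS with X1, X2, X3 (8 df)
ss_reg_x1   = 2063334    # regression SS with X1 alone

extra_ss = ss_reg_full - ss_reg_x1          # SS explained by adding X2 and X3
f_stat = (extra_ss / 2) / (ss_err_full / 8) # 2 added variables, 8 error df
print(round(f_stat, 2))                     # about 5.14, versus F.10(2,8) = 3.11
```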
II. Do at least 4 of the following 7 problems (at least 15 points each), or do sections adding to at least 60 points. Anything extra you do helps, and grades wrap around. Show your work! State H0 and H1 where
applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing
appropriate statistical tests.
1. (Black, p532) A researcher wishes to predict the price of a meal in New Orleans (y) on the basis of
location (x1 - a dummy variable, 1 if the restaurant is in the French Quarter, 0 otherwise) and the
probability of being seated on arrival (x2). The data is below. (Use α = .10)

Row   price y   FQ x1   prob x2
 1      8.52      0       0.62
 2     21.45      1       0.43
 3     16.18      1       0.58
 4      6.21      0       0.74
 5     12.19      1       0.19
 6     25.62      1       0.49
 7     13.90      0       0.80
 8     18.66      1       0.75
 9      5.25      0       0.37
10      7.98      0       0.63

The following are given to help you: Σy = 135.96, Σy² = 2270.68, Σx1 = 5, Σx1² = 5, Σx2 = 5.6,
Σx2² = 3.4658, Σx1y = ?, Σx2y = 75.4405, Σx1x2 = ? and n = 10.
You do not need all of these.
a. Compute a simple regression of price against x1. (7)
b. On the basis of this regression, what price do you expect to pay for a meal in the French Quarter? Outside
the French Quarter? (2)
c. Compute R². (4)
d. Compute s_e. (3)
e. Compute s_b0 (the standard deviation of the intercept) and do a confidence interval for β0. (3)
f. Do a confidence interval for the price of a meal in the French Quarter. (3)
 x y  94 .1 See computation at end of problem 2.
Solution: a)
1
Spare Parts Computation:
x1
x

y
1
n
SSx1 
5

 0.5
10
 y  135 .96  13.596
n
10
 2.50
Sx1 y 
Sx1 y

SSx1
 x y  nx y  26.12  10.44
 x  nx 2.50
1
1
2
1
1
2
2
1
 nx12  5  10 0.52
 x y  nx y  94.1  100.513.596 
1
1
 26 .12
SSy 
b1 
x
y
2
 ny  2270 .68  10 13 .596 2
2
 422 .166
b0  y  b1 x1  13.596  10.44  0.5  8.376
Yˆ  b0  b1 x1 becomes Yˆ  8.376  10.44 x1 .
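As a quick check of the arithmetic, a short Python sketch (my own variable names) that recomputes the slope and intercept from the given sums:

```python
# Simple regression of price on the French Quarter dummy, from the given sums.
n = 10
sum_x1, sum_x1sq = 5.0, 5.0
sum_y = 135.96
sum_x1y = 94.10

xbar, ybar = sum_x1 / n, sum_y / n
SSx1 = sum_x1sq - n * xbar**2        # 2.50
Sx1y = sum_x1y - n * xbar * ybar     # 26.12
b1 = Sx1y / SSx1                     # about 10.45 before rounding
b0 = ybar - b1 * xbar                # about 8.37
print(round(b1, 3), round(b0, 3))
```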
b) If x1 = 1, Ŷ = 8.376 + 10.44(1) = 18.82. If x1 = 0, Ŷ = 8.376.
c) SSR = b1·Sx1y = 10.44(26.12) = 272.69, so R² = SSR/SST = 272.69/422.166 = 0.646,
or R² = (Sx1y)²/(SSx1·SSy) = (26.12)²/[2.50(422.166)] = 0.646.   (0 ≤ R² ≤ 1 always!)
d) SSE = SST - SSR = 422.166 - 272.69 = 149.476
s_e² = SSE/(n - 2) = 149.476/8 = 18.685, so s_e = √18.685 = 4.323.   (s_e² is always positive!)
1
1
x 2 
x2
e) s b20  s e2  
 s e2  
 n SSx 
n
x12  nx12
1


8
sb1  3.737  1.933 tn2  t.05
 1.860



2
  18 .685  1  0.5   18 .685 0.2  3.737

 10 2.50 



2
f) .
s e2 
so  0  b0  tsb0  8.376  1.8601.933  8.38  3.60 .
f) We have already found that if x1 = 1, Ŷ = 8.376 + 10.44(1) = 18.82. From the regression formula outline, the Confidence Interval for the mean price is μ_Y0 = Ŷ0 ± t·s_Ŷ, where
s_Ŷ² = s_e²[1/n + (x10 - x̄1)²/SSx1] = 18.685[1/10 + (1 - 0.5)²/2.50] = 18.685(0.2) = 3.737,
so s_Ŷ = √3.737 = 1.933.
So μ_Y0 = Ŷ0 ± t·s_Ŷ = 18.82 ± 1.860(1.933) = 18.82 ± 3.60.
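For anyone who wants to verify these interval half-widths, a minimal Python sketch (names are mine) following the formulas above:

```python
import math

# Standard error of estimate, s_b0, and the two 90% interval half-widths above.
n, SSx1, SSy, Sx1y = 10, 2.50, 422.166, 26.12
b1 = Sx1y / SSx1
SSR = b1 * Sx1y                      # about 272.9 with the unrounded slope
SSE = SSy - SSR
se2 = SSE / (n - 2)                  # s_e squared
t_05_8 = 1.860                       # t.05 with 8 df, from the t table

s_b0 = math.sqrt(se2 * (1/n + 0.5**2 / SSx1))
s_yhat = math.sqrt(se2 * (1/n + (1 - 0.5)**2 / SSx1))       # same value here
print(round(t_05_8 * s_b0, 2), round(t_05_8 * s_yhat, 2))   # both about 3.6
```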
2. Data from the previous problem is repeated below. (Use α = .10)

Row   price y   FQ x1   prob x2
 1      8.52      0       0.62
 2     21.45      1       0.43
 3     16.18      1       0.58
 4      6.21      0       0.74
 5     12.19      1       0.19
 6     25.62      1       0.49
 7     13.90      0       0.80
 8     18.66      1       0.75
 9      5.25      0       0.37
10      7.98      0       0.63

The following are given to help you: Σy = 135.96, Σy² = 2270.68, Σx1 = 5, Σx1² = 5, Σx2 = 5.6,
Σx2² = 3.4658, Σx1y = ?, Σx2y = 75.4405, Σx1x2 = ? and n = 10.
a. Do a multiple regression of price against x1 and x2. (12)
b. Compute R² and R² adjusted for degrees of freedom for both this and the previous problem. Compare
the values of R² adjusted between this and the previous problem. Use an F test to compare R² here with
the R² from the previous problem. (4)
c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict the price of a meal in the French Quarter sold when the probability of
being seated on arrival is 40%. (2)
e. Use the directions in the outline to make this estimate into a confidence interval and a prediction interval. (4)
Solution: Note: Deciding that, since b1 = Sx1y/SSx1 in simple regression, it must be true that b2 = Sx2y/SSx2 in
multiple regression won't get you an ounce of credit for this type of problem.
a) First, we compute Ȳ = 13.596, X̄1 = 0.50 and X̄2 = 5.6/10 = 0.56. Second, we compute or copy
ΣX1Y = 94.1, ΣX2Y = 75.4405, ΣY² = 2270.68, ΣX1² = 5, ΣX2² = 3.4658 and ΣX1X2 = 2.44.
Third, we compute or copy our spare parts:
SSy = ΣY² - n·Ȳ² = 2270.68 - 10(13.596)² = 422.166*
Sx1y = ΣX1Y - n·X̄1·Ȳ = 94.1 - 10(0.5)(13.596) = 26.12
Sx2y = ΣX2Y - n·X̄2·Ȳ = 75.4405 - 10(0.56)(13.596) = -0.6971
SSx1 = ΣX1² - n·X̄1² = 5 - 10(0.5)² = 2.50*
SSx2 = ΣX2² - n·X̄2² = 3.4658 - 10(0.56)² = 0.3298*
Sx1x2 = ΣX1X2 - n·X̄1·X̄2 = 2.44 - 10(0.50)(0.56) = -0.36
* indicates quantities that must be positive. (Note that some of these were computed for the last problem.)
Fourth, we substitute these numbers into the Simplified Normal Equations:
ΣX1Y - nX̄1Ȳ = b1(ΣX1² - nX̄1²) + b2(ΣX1X2 - nX̄1X̄2)
ΣX2Y - nX̄2Ȳ = b1(ΣX1X2 - nX̄1X̄2) + b2(ΣX2² - nX̄2²)
which are
26.12 = 2.50 b1 - 0.36 b2
-0.6971 = -0.36 b1 + 0.3298 b2
and solve them as two equations in two unknowns for b1 and b2. We do this by multiplying the first
equation by 0.144, which is 0.36 divided by 2.50. The purpose of this is to make the coefficients of
b1 equal in both equations. We could do just as well by multiplying the second equation by 0.36 divided by
0.3298 and making the coefficients of b2 equal.
So the two equations become
3.7613 = 0.36 b1 - 0.0518 b2
-0.6971 = -0.36 b1 + 0.3298 b2
We then add the equations to get 3.0642 = 0.278 b2, so that b2 = 3.0642/0.278 = 11.022. The first of the two
normal equations can now have our new value substituted into it to get 26.12 = 2.50 b1 - 0.36(11.022), or
26.12 + 3.97 = 2.50 b1, which gives us b1 = 12.035. Finally we get b0 by solving
b0 = Ȳ - b1·X̄1 - b2·X̄2 = 13.596 - 12.035(0.50) - 11.022(0.56) = 1.4062. Thus our equation is
Ŷ = b0 + b1·X1 + b2·X2 = 1.4062 + 12.035 X1 + 11.022 X2.
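The same two-equation system can be solved by machine. A minimal Python sketch (my own variable names) using Cramer's rule on the spare parts above:

```python
# Solve the two simplified normal equations for b1 and b2, then recover b0.
SSx1, SSx2, Sx1x2 = 2.50, 0.3298, -0.36
Sx1y, Sx2y = 26.12, -0.6971
ybar, x1bar, x2bar = 13.596, 0.50, 0.56

det = SSx1 * SSx2 - Sx1x2**2             # Cramer's rule on the 2x2 system
b1 = (Sx1y * SSx2 - Sx1x2 * Sx2y) / det
b2 = (SSx1 * Sx2y - Sx1x2 * Sx1y) / det
b0 = ybar - b1 * x1bar - b2 * x2bar
print(round(b0, 3), round(b1, 3), round(b2, 3))   # about 1.41, 12.03, 11.02
```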
b) The Regression Sum of Squares is
SSR = b1(ΣX1Y - nX̄1Ȳ) + b2(ΣX2Y - nX̄2Ȳ) = b1·Sx1y + b2·Sx2y = 12.035(26.12) + 11.022(-0.6971) = 306.671*
and is used in the ANOVA below. The coefficient of determination is
R² = SSR/SST = [b1·Sx1y + b2·Sx2y]/SSy = 306.671/422.166 = .726*.
Our results can be summarized below as:

 k    n     R²*     R̄²
 1   10    .646    .607
 2   10    .726    .647

R̄², which is R² adjusted for degrees of freedom, has the formula R̄² = [(n - 1)R² - k]/(n - k - 1), where k is the
number of independent variables. R² adjusted for degrees of freedom went up and seems to show that our
second regression is better.
Our previous regression had SSR at 272.69. If it rose to 306.671, the new variable must explain
306.671 - 272.69 = 33.981. SSE = SST - SSR = 422.166 - 306.671 = 115.495.
The ANOVA table is

Source    SS*       DF*   MS*       F*      F.10
X1        272.69     1    272.69
X2         33.981    1     33.981   2.0595  F.10(1,7) = 3.59
Error     115.495    7     16.4993
Total     422.166    9

Since our computed F is smaller than the table F, we do not reject our null hypothesis that X2 has no
effect.
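A short Python sketch (names are mine) reproducing this partial F statistic from the sums of squares above:

```python
# F test for whether adding x2 helps, from the sums of squares above.
SSR_with_x2, SSR_x1_only, SST = 306.671, 272.69, 422.166
SSE = SST - SSR_with_x2                       # 115.495, with 7 df
extra = SSR_with_x2 - SSR_x1_only             # SS explained by adding x2
f_stat = (extra / 1) / (SSE / 7)
print(round(f_stat, 3))                       # about 2.06, versus F.10(1,7) = 3.59
```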
A faster way to do this is to use the R²s directly. The difference between R² = 72.6% and R² = 64.6% is
8.0%.

Source    SS*    DF*   MS*       F*      F.10
X1        72.6    1    72.6
X2         8.0    1     8.0      2.0438  F.10(1,7) = 3.59
Error     27.4    7     3.91428
Total    100.0    9

The numbers are a bit different because of rounding, but the conclusion is the same.
c) We computed the regression sum of squares in the previous section.

Source    SS        DF   MS         F      F.10
X1, X2    306.671    2   153.3355   9.293  F.10(2,7) = 3.26
Error     115.495    7    16.49939
Total     422.166    9

Since our computed F is larger than the table F, we reject our null hypothesis that X1 and X2 do not
explain Y.
d) Ŷ = b0 + b1·X1 + b2·X2 = 1.4062 + 12.035 X1 + 11.022 X2 = 1.4062 + 12.035(1) + 11.022(.40) = 17.85.
e) From the ANOVA table, s_e = √16.49939 = 4.062. Since k = 2, t(n-k-1)_.05 = t(7)_.05 = 1.895. The outline says
that an approximate confidence interval is μ_Y0 = Ŷ0 ± t·s_e/√n = 17.85 ± 1.895(4.062)/√10 = 17.85 ± 2.43 and an
approximate prediction interval is Y0 = Ŷ0 ± t·s_e = 17.85 ± 1.895(4.062) = 17.85 ± 7.70.
Computation of sums follows.

Row   price    FQ    prob   x1sq    x2sq     ysq       x1y     x2y      x1x2
 1      8.52    0    0.62    0     0.3844    72.590    0.00    5.2824   0.00
 2     21.45    1    0.43    1     0.1849   460.103   21.45    9.2235   0.43
 3     16.18    1    0.58    1     0.3364   261.792   16.18    9.3844   0.58
 4      6.21    0    0.74    0     0.5476    38.564    0.00    4.5954   0.00
 5     12.19    1    0.19    1     0.0361   148.596   12.19    2.3161   0.19
 6     25.62    1    0.49    1     0.2401   656.384   25.62   12.5538   0.49
 7     13.90    0    0.80    0     0.6400   193.210    0.00   11.1200   0.00
 8     18.66    1    0.75    1     0.5625   348.196   18.66   13.9950   0.75
 9      5.25    0    0.37    0     0.1369    27.562    0.00    1.9425   0.00
10      7.98    0    0.63    0     0.3969    63.680    0.00    5.0274   0.00
Sum   135.96    5    5.60    5     3.4658  2270.68    94.10   75.4405   2.44
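If you prefer to let the machine build these column sums from the raw data, a minimal Python sketch (my own variable names):

```python
# Rebuild the column sums in the table above from the raw data.
price = [8.52, 21.45, 16.18, 6.21, 12.19, 25.62, 13.90, 18.66, 5.25, 7.98]
fq    = [0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
prob  = [0.62, 0.43, 0.58, 0.74, 0.19, 0.49, 0.80, 0.75, 0.37, 0.63]

sums = {
    "x1y":  sum(f * y for f, y in zip(fq, price)),     # 94.10
    "x2y":  sum(p * y for p, y in zip(prob, price)),   # 75.4405
    "x1x2": sum(f * p for f, p in zip(fq, prob)),      # 2.44
    "ysq":  sum(y * y for y in price),                 # 2270.68
}
print({k: round(v, 4) for k, v in sums.items()})
```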
3. An airline wants to select a computer package for its reservation system. Over 20 weeks it tries the four
commercially available reservation system packages and records as x1, x2, x3 and x4 the number of
passengers bumped by each system. It will choose the package with the smallest average bumps, assuming
that there is a significant difference between the median or average number of bumps. The data below are
in the columns labeled x, the original numbers, and, in the r columns, their ranks on a 1 to 20 scale. Below
this I have given you the sums of the columns, the number of items in each column, the means for each
column and the sums of the squared numbers (ssq) in each column. The columns are independent samples.
Use a 5% significance level.

Row    P1 x1   r1     P2 x2   r2     P3 x3   r3     P4 x4   r4
 1      12    16.0      2     2.0     10    12.0      7     7.5
 2      14    18.0      4     4.0      9     9.5      6     5.5
 3       9     9.5      7     7.5      6     5.5     15    19.0
 4      11    14.0      3     3.0     10    12.0     12    16.0
 5      16    20.0      1     1.0     12    16.0
 6                                    10    12.0
sum    62.0           17.0           57.0            40
count   5.0            5.0            6.0             4
mean   12.4            3.4            9.5            10
ssq   798.0           79.0          561.0           454
a. Assume that the underlying distribution is Normal and test for a significant difference between the means. (7)
b. Assume that the underlying distribution is not normal and test for a significant difference between the
medians. (5)
c. Find the mean and standard deviation for column P3 and test column P3 for a Normal distribution. (5)
Solution: I followed the posted solution to Exercise 14.24. α = .05

Column        1        2        3       4      Sum
Σx_ij        62   +   17   +   57  +   40   =  176 = Σx
n_j           5   +    5   +    6  +    4   =   20 = n
x̄_j        12.4      3.4      9.5      10            x̄ = Σx/n = 176/20 = 8.8
Σx²_ij      798   +   79   +  561  +  454   = 1892
x̄_j²     153.76    11.56    90.25     100            (this sum is not useful)

SST = Σx²_ij - n·x̄² = 1892 - 20(8.8)² = 1892 - 1548.8 = 343.20
SSB = Σ n_j·x̄_j² - n·x̄² = 5(12.4)² + 5(3.4)² + 6(9.5)² + 4(10)² - 20(8.8)²
    = 768.8 + 57.8 + 541.5 + 400 - 1548.8 = 219.3
Source     SS      DF    MS        F       F.05                 H0
Between   219.3     3    73.10     9.440   F.05(3,16) = 3.24 s  Column means equal
Within    123.9    16     7.74375
Total     343.2    19

Since the value of F we calculated is more than the table value, we reject the null hypothesis and conclude
that there is a significant difference between column means.
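A minimal Python sketch (names are mine) that reproduces the one-way ANOVA above from the raw columns:

```python
# One-way ANOVA from the four columns, following the SST/SSB computation above.
cols = [[12, 14, 9, 11, 16], [2, 4, 7, 3, 1], [10, 9, 6, 10, 12, 10], [7, 6, 15, 12]]
n = sum(len(c) for c in cols)                       # 20
grand_mean = sum(sum(c) for c in cols) / n          # 8.8

SST = sum(x**2 for c in cols for x in c) - n * grand_mean**2
SSB = sum(len(c) * (sum(c) / len(c))**2 for c in cols) - n * grand_mean**2
SSW = SST - SSB
F = (SSB / 3) / (SSW / (n - 4))
print(round(SSB, 1), round(SSW, 1), round(F, 3))    # about 219.3, 123.9, 9.44
```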
b) Since this involves comparing four apparently random samples from a non-normal distribution, we use a
Kruskal-Wallis test. The null hypothesis is H0: Columns come from the same distribution or medians are
equal.

Row    r1     r2     r3     r4
 1    16      2     12      7.5
 2    18      4      9.5    5.5
 3     9.5    7.5    5.5   19
 4    14      3     12     16
 5    20      1     16
 6                  12
Sum   77.5   17.5   67     48

Sums of ranks are given above. To check the ranking, note that the sum of the four rank sums is
77.5 + 17.5 + 67 + 48 = 210, that the total number of items is 5 + 5 + 6 + 4 = 20 and that the sum of the first n
numbers is n(n + 1)/2 = 20(21)/2 = 210. Now, compute the Kruskal-Wallis statistic
H = [12/(n(n + 1))]·Σ(SRi²/ni) - 3(n + 1)
  = [12/(20·21)]·[(77.5)²/5 + (17.5)²/5 + (67)²/6 + (48)²/4] - 3(21)
  = (12/420)(1201.25 + 61.25 + 748.17 + 576) - 63 = 10.905.
If we try to look up this result in the (5, 5, 6, 4) section of the Kruskal-Wallis table (Table 9), we find that the
problem is too large for the table. Thus we must use the chi-squared table with 3 degrees of freedom. Since
our H of 10.905 is above χ².05(3) = 7.8147, reject H0.
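A short Python sketch (names are mine) that evaluates the same H statistic from the rank sums quoted above:

```python
# Kruskal-Wallis H from the rank sums quoted above.
rank_sums = [77.5, 17.5, 67, 48]
sizes = [5, 5, 6, 4]
n = sum(sizes)                                      # 20

H = 12 / (n * (n + 1)) * sum(sr**2 / m for sr, m in zip(rank_sums, sizes)) - 3 * (n + 1)
print(round(H, 3))            # about 10.9, versus chi-squared(.05, 3) = 7.8147
```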
c) H0: x ~ N(?, ?)   H1: Not Normal
Because the mean and standard deviation are unknown, this is a Lilliefors problem.
From the data we can find that x̄ = 9.5 and s = 1.9748. t = (x - x̄)/s. F(t) actually is computed from the
Normal table. For example, F(-0.25) = P(z ≤ -0.25) = P(z ≤ 0) - P(-0.25 ≤ z ≤ 0) = .5 - .0987 = .4013 and
F(0.25) = P(z ≤ 0.25) = P(z ≤ 0) + P(0 ≤ z ≤ 0.25) = .5 + .0987 = .5987.

 x      t       F(t)    O    O/n      Fo       D = |F(t) - Fo|
 6    -1.77    .0384    1   0.1667   0.1667    .1283
 9    -0.25    .4013    1   0.1667   0.3333    .0680
10     0.25    .5987    1   0.1667   0.5000    .0987
10     0.25    .5987    1   0.1667   0.6667    .0680
10     0.25    .5987    1   0.1667   0.8333    .2346
12     1.26    .8962    1   0.1667   1.0000    .1038

MaxD = .2346 and ΣO = n = 6. Since the critical value for α = .05 is .319, do not reject H0.
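A minimal Python sketch of the same Lilliefors calculation (my own function names; it mirrors the table above by comparing the Normal cdf with the cumulative fraction Fo):

```python
import math

# Lilliefors statistic for column P3: largest gap between the Normal cdf
# (using the sample mean and s) and the cumulative fraction Fo.
data = sorted([10, 9, 6, 10, 12, 10])
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean)**2 for x in data) / (n - 1))

def phi(z):                                   # standard Normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

D = max(abs(phi((x - mean) / s) - (i + 1) / n) for i, x in enumerate(data))
print(round(D, 4))                            # about .23, versus the .319 critical value
```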
4. The data from the previous page is repeated. Use a 5% significance level.

Row    P1 x1   r1     P2 x2   r2     P3 x3   r3     P4 x4   r4
 1      12    16.0      2     2.0     10    12.0      7     7.5
 2      14    18.0      4     4.0      9     9.5      6     5.5
 3       9     9.5      7     7.5      6     5.5     15    19.0
 4      11    14.0      3     3.0     10    12.0     12    16.0
 5      16    20.0      1     1.0     12    16.0
 6                                    10    12.0
sum    62.0           17.0           57.0            40
count   5.0            5.0            6.0             4
mean   12.4            3.4            9.5            10
ssq   798.0           79.0          561.0           454
a. Assume that the underlying distribution is Normal and test columns 1 and 3 for differences in means.
Assume identical variances. Use (i) a test ratio, (ii) a critical value and (iii) a confidence interval. (6)
b. Assume that the underlying distribution is not normal and test for a significant difference between the
medians of columns 1 and 3. (4)
c. Assume again that the distributions are Normal and test that the variances are the same. (3)
d. Test column P3 to see if its standard deviation is 5. (3)
Solution: a) First, we need the variances of x1 and x3. We know that x̄1 = 12.4, n1 = 5 and x̄3 = 9.5, n3 = 6.
Recall that
s1² = (Σx1² - n1·x̄1²)/(n1 - 1) = (798 - 5(12.4)²)/4 = 7.3 and
s3² = (Σx3² - n3·x̄3²)/(n3 - 1) = (561 - 6(9.5)²)/5 = 3.9.
We wish to test H0: μ1 = μ3 against H1: μ1 ≠ μ3.
From Table 3 in the Syllabus Supplement (methods for comparing two sample means differ greatly
from methods for comparing one sample mean with a population mean!), for the difference between two
means (σ unknown, variances assumed equal):
Confidence Interval: Δμ = d̄ ± t(α/2)·s_d
Hypotheses: H0: Δμ = Δμ0, H1: Δμ ≠ Δμ0
Test Ratio: t = (d̄ - Δμ0)/s_d
Critical Value: d̄_cv = Δμ0 ± t(α/2)·s_d
where d̄ = x̄1 - x̄2, s_d = ŝ_p·√(1/n1 + 1/n2), ŝ_p² = [(n1 - 1)s1² + (n2 - 1)s2²]/(n1 + n2 - 2) and DF = n1 + n2 - 2.
Here d̄ = x̄1 - x̄3 = 12.4 - 9.5 = 2.9, DF = n1 + n3 - 2 = 5 + 6 - 2 = 9 and α = .05.
ŝ_p² = [(n1 - 1)s1² + (n3 - 1)s3²]/(n1 + n3 - 2) = [4(7.3) + 5(3.9)]/9 = 5.4111
s_d = ŝ_p·√(1/n1 + 1/n3) = √[5.4111(1/5 + 1/6)] = √1.984 = 1.4085
t(9)_.025 = 2.262
Our hypotheses are H0: μ1 = μ3 (that is, H0: μ1 - μ3 = 0) against H1: μ1 ≠ μ3 (H1: μ1 - μ3 ≠ 0).
Use one of the following methods.
(i) Test Ratio: t = (d̄ - Δμ0)/s_d = (2.9 - 0)/1.4085 = 2.059. The 'do not reject' region is between
-t(9)_.025 = -2.262 and t(9)_.025 = 2.262. Since our t is between these two numbers, do not reject the null hypothesis.
(ii) Critical Value: d̄_cv = Δμ0 ± t(α/2)·s_d = 0 ± 2.262(1.4085) = ±3.186. The 'do not reject' region is between
-3.186 and 3.186. Since d̄ = 2.9 is between these values, we do not reject H0.
(iii) Confidence Interval: The 2-sided interval is Δμ = d̄ ± t(α/2)·s_d
= 2.9 ± 2.262(1.4085) = 2.9 ± 3.19. Since this interval includes zero, we do not reject H0.
b) Since we have two independent samples from non-Normal populations, we use the Wilcoxon-Mann-Whitney
Test for Two Independent Samples. H0: the medians of columns 1 and 3 are equal; H1: the medians differ.

 x1    r1     r1*       x3    r3     r3*
 12    3.5    8.5       10    7      5
 14    2     10          9    9.5    2.5
  9    9.5    2.5        6   11      1
 11    5      7         10    7      5
 16    1     11         12    3.5    8.5
                        10    7      5
Sum   21     39         Sum  45     27

For the purposes of this test, n2 = 6 is the size of the larger sample (actually sample 3), n1 = 5 is the size of
the smaller sample and we wish to compare their medians.
Our first step is to rank the numbers from 1 to n = n1 + n2 = 5 + 6 = 11. Note that there are a number of ties
that must receive an average rank. The numbers can be ordered from the largest to the smallest or from the
smallest to the largest. To decide which to do, look at the smaller sample. If the smallest number is in the
smaller sample, order from smallest to largest; if the largest number is in the smaller sample, order from the
largest to the smallest. Since 16 is the largest number, let that be 1. Now compute the sums of the ranks:
SR1 = 21, SR2 = 45. As a check, note that these two rank sums must add to the sum of the first n numbers,
which is n(n + 1)/2 = 11(12)/2 = 66, and that SR1 + SR2 = 21 + 45 = 66.
The smaller of SR1 and SR2 is called W and is compared with Table 5 or 6. To use Table 5, first find the
part for n2 = 6, and then the column for n1 = 5. Then try to locate W = 21 in that column. In this case, the
p-value is .0628, which should be doubled for a two-sided test. Since this is above the significance level, we
cannot reject the null hypothesis. This can also be compared against the critical values for TL and TU (TU is
actually only needed for a 2-sided test) in Table 14a; these are 19 and 41. Since W = 21 is between these
values, we cannot reject the null hypothesis. The starred rankings (r1*, r3*) are what you would get if you
ranked from the bottom to the top. They are incorrect, but would cause you to correctly not reject the null
hypothesis.
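A minimal Python sketch (my own helper names) of the ranking described above, from the largest value down with averaged ties:

```python
# Rank sums for the Wilcoxon-Mann-Whitney test, ranking from the largest value
# (rank 1) downward and averaging tied ranks, as described above.
x1 = [12, 14, 9, 11, 16]
x3 = [10, 9, 6, 10, 12, 10]
combined = sorted(x1 + x3, reverse=True)

def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(combined) if v == value]
    return sum(positions) / len(positions)

SR1 = sum(avg_rank(v) for v in x1)
SR3 = sum(avg_rank(v) for v in x3)
print(SR1, SR3)        # 21.0 and 45.0; W = 21 is the smaller rank sum
```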
c) From Table 3 in the Syllabus Supplement, for the ratio of variances:
Confidence Interval: (s1²/s2²)·[1/F(α/2)(DF1, DF2)] ≤ σ1²/σ2² ≤ (s1²/s2²)·F(α/2)(DF2, DF1)
Hypotheses: H0: σ1² = σ2², H1: σ1² ≠ σ2²
Test Ratios: F(DF1, DF2) = s1²/s2² and F(DF2, DF1) = s2²/s1², where DF1 = n1 - 1 and DF2 = n2 - 1.
Since this is a 2-sided test we use both F(4,5) = s1²/s3² = 7.3/3.9 = 1.8718 and F(5,4) = s3²/s1² = 3.9/7.3 = 0.534. We
find that F.025(4,5) = 7.39 and F.025(5,4) = 9.36. Since our computed F's are less than the corresponding table F's,
we cannot reject the null hypothesis. (Because the smaller F is below 1 and there are no values on the F
table below 1, we actually only need to look at F.025(4,5) = 7.39.)
d) From Table 3, for a variance (small sample):
Confidence Interval: (n - 1)s²/χ²(α/2) ≤ σ² ≤ (n - 1)s²/χ²(1 - α/2)
Hypotheses: H0: σ² = σ0², H1: σ² ≠ σ0²
Test Ratio: χ² = (n - 1)s²/σ0²
Critical Value: s²_cv = χ²(α/2 or 1 - α/2)·σ0²/(n - 1)
Our hypotheses are H0: σ = 5 (σ² = 25) against H1: σ ≠ 5 (σ² ≠ 25). We know that x̄3 = 9.5, n3 = 6 and
s3² = (Σx3² - n3·x̄3²)/(n3 - 1) = (561 - 6(9.5)²)/5 = 3.9. Here χ² = (n - 1)s²/σ0² = 5(3.9)/25 = 0.78 and DF = n - 1 = 5. From
the chi-squared table, we find χ².975(5) = 0.8312 and χ².025(5) = 12.8325. We would not reject the null hypothesis
if our χ² were between these values. It is not, so reject the null hypothesis.
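A short Python sketch (names are mine) of the chi-squared test ratio for this standard deviation:

```python
# Chi-squared test that the standard deviation of column P3 is 5.
x3 = [10, 9, 6, 10, 12, 10]
n = len(x3)
mean = sum(x3) / n
s2 = sum((x - mean)**2 for x in x3) / (n - 1)       # 3.9

chi2 = (n - 1) * s2 / 5**2                          # 0.78
print(round(chi2, 2))   # compare with chi-squared(.975, 5) = 0.8312 and chi-squared(.025, 5) = 12.8325
```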
5. a. A machine fills a sample of 100 one-pound boxes of a product and they are later tested to see how
many are over or under the desired one-pound size. The manufacturer wishes to test whether exactly half of
the population of boxes is over the one-pound mark and that the occurrence of boxes that are 'over' and
'under' is random. In the sample there are 55 boxes that are 'over' and 45 that are under and there are 45 runs
of 'overs' or 'unders'.
(i) Test that the proportion of 'overs' is 50%. (2)
(ii) Test that the sequence of 'overs' and 'unders' is random. (5)
b. A series of 24 observations is used to calculate a simple regression with only one independent variable.
We calculate a Durbin-Watson statistic of 0.471. Is autocorrelation present? Is it positive or negative? (3)
c. We are testing to see if the mean of a normally distributed population with a known standard deviation of 20 is 5.
We take a sample of 100 and find that the mean is 9. Given these results, what is the p-value of our result if
(i) the Null hypothesis is H0: μ ≥ 5, (ii) the Null hypothesis is H0: μ ≤ 5, (iii) the Null hypothesis is
H0: μ = 5? (6)
Solution: a) (i) From Table 3, for a proportion:
Confidence Interval: p = p̄ ± z(α/2)·s_p̄, where s_p̄ = √(p̄q̄/n) and q̄ = 1 - p̄
Hypotheses: H0: p = p0, H1: p ≠ p0
Test Ratio: z = (p̄ - p0)/σ_p̄, where σ_p̄ = √(p0q0/n)
Critical Value: p̄_cv = p0 ± z(α/2)·σ_p̄
Our hypotheses are H0: p = .5 and H1: p ≠ .5. n = 100, p̄ = 55/100 = .55 and α = .05.
σ_p̄ = √(p0q0/n) = √[.5(.5)/100] = .05. If we use a test ratio, z = (p̄ - p0)/σ_p̄ = (.55 - .5)/.05 = 1. Since this is
between -1.96 and 1.96, we do not reject the null hypothesis. If we use a critical value,
p̄_cv = p0 ± z(α/2)·σ_p̄ = .5 ± 1.960(.05), or .402 to .598. Since .55 lies between these numbers, do not reject the null
hypothesis.
(ii) This is a runs test. The null hypothesis is randomness. r = 45, n1 = 55, n2 = 45 and n = n1 + n2 = 100.
Our values are too large for the runs test table, but we know that for a larger problem (if n1 and n2 are too
large for the table), r follows the normal distribution with
μ = 2n1n2/n + 1 = 2(55)(45)/100 + 1 = 50.5 and σ² = 2n1n2(2n1n2 - n)/[n²(n - 1)] = (49.5)(48.5)/99 = 24.25. So
z = (r - μ)/σ = (45 - 50.5)/√24.25 = -1.12. Since this value of z is between ±z(.025) = ±1.960, we do not reject
H0: Randomness.
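A minimal Python sketch (names are mine) of the normal approximation used for this runs test:

```python
import math

# Normal approximation for the runs test in part (ii).
r, n1, n2 = 45, 55, 45
n = n1 + n2

mu = 2 * n1 * n2 / n + 1                                    # 50.5
var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))    # 24.25
z = (r - mu) / math.sqrt(var)
print(round(z, 2))                                          # about -1.12
```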
b) This is a Durbin-Watson test and we are given the Durbin-Watson statistic, DW = 0.471. Use a Durbin-Watson
table with n = 24 and k = 1 to fill in the diagram below.

   r > 0    |    ?    |        r ≈ 0        |    ?    |   r < 0
 0 ------- dL ------ dU ------ 2 ------ (4 - dU) ---- (4 - dL) ------ 4

If you used the 5% table, you got dL = 1.27 and dU = 1.45. If you used the 1% table, you got
dL = 1.04 and dU = 1.20. In either case the given value of DW = 0.471 falls well below dL, indicating
positive autocorrelation.
c) From Table 3, for a mean (σ known):
Confidence Interval: μ = x̄ ± z(α/2)·σ_x̄
Hypotheses: H0: μ = μ0, H1: μ ≠ μ0
Test Ratio: z = (x̄ - μ0)/σ_x̄
Critical Value: x̄_cv = μ0 ± z(α/2)·σ_x̄
The problem states σ = 20, n = 100 and x̄ = 9, so σ_x̄ = σ/√n = 20/√100 = 2. To get a p-value, we must use a
test ratio: z = (x̄ - μ0)/σ_x̄ = (9 - 5)/2 = 2.00.
(i) H0: μ ≥ 5, H1: μ < 5:
pval = P(x̄ ≤ 9) = P(z ≤ 2) = P(z ≤ 0) + P(0 ≤ z ≤ 2.00) = .5 + .4772 = .9772
(ii) H0: μ ≤ 5, H1: μ > 5:
pval = P(x̄ ≥ 9) = P(z ≥ 2) = P(z ≥ 0) - P(0 ≤ z ≤ 2.00) = .5 - .4772 = .0228
(iii) H0: μ = 5, H1: μ ≠ 5: In the case of a 2-sided hypothesis, find the probability to the nearer tail and double it:
pval = 2P(x̄ ≥ 9) = 2P(z ≥ 2) = 2(.0228) = .0456.
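For checking these p-values numerically, a short Python sketch (function name is mine) using the standard Normal cdf:

```python
import math

# One- and two-sided p-values for part (c), using the standard Normal cdf.
z = (9 - 5) / 2            # test ratio from above

def phi(x):                # standard Normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(round(phi(z), 4))            # (i)   H0: mu >= 5 -> about .9772
print(round(1 - phi(z), 4))        # (ii)  H0: mu <= 5 -> about .0228
print(round(2 * (1 - phi(z)), 4))  # (iii) H0: mu = 5  -> about .0455 (table value .0456)
```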
6. An electronics chain reports the following data on number of households, sales volume and number of
customers for 10 stores.

Row   hshlds x1   sales x2   cust x3
 1       161         157        305
 2        99          93         55
 3       135         136        205
 4       120         123        105
 5       164         153        255
 6       221         241        505
 7       179         201        355
 8       204         207        455
 9       214         230        405
10       101         135        155

Σx1 = 1598.0, Σx1² = 273738, Σx2 = 1676.0, Σx2² = 302788, Σx1x2 = 287019

a) Compute the correlation between households and sales and test it for significance. (5)
b) Test the same correlation to see if it is .9. (5)
c) Compute the rank correlation between households and sales and test it for significance. (5)
d) Compute Kendall's W for households, sales and customers and test it for significance. (6)
Solution: a) From the outline the simple sample correlation coefficient is
r = Sxy/√(SSx·SSy) = [Σxy - n·x̄·ȳ]/√[(Σx² - n·x̄²)(Σy² - n·ȳ²)]. In this case we want
r = [Σx1x2 - n·x̄1·x̄2]/√[(Σx1² - n·x̄1²)(Σx2² - n·x̄2²)]
  = [287019 - 10(159.8)(167.6)]/√[(273738 - 10(159.8)²)(302788 - 10(167.6)²)]
  = 19194.2/√[(18377.6)(21890.4)] = .9570,   so r² = .9158.
If we want to test H0: ρ = 0 against H1: ρ ≠ 0 and x and y are normally distributed, we use
t(n - 2) = r/s_r = r/√[(1 - r²)/(n - 2)] = .9570/√[(1 - .9158)/(10 - 2)] = .9570/.10259 = 9.328.
Compare this with ±t(8)_.025 = ±2.306. Since
9.328 does not lie between these two values, reject the null hypothesis.
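A minimal Python sketch (names are mine) recomputing r and its t ratio from the given sums:

```python
import math

# Correlation between households and sales from the given sums, and its t test.
n = 10
sx1, sx1sq = 1598.0, 273738
sx2, sx2sq = 1676.0, 302788
sx1x2 = 287019

Sxy = sx1x2 - n * (sx1 / n) * (sx2 / n)             # 19194.2
SSx = sx1sq - n * (sx1 / n)**2                      # 18377.6
SSy = sx2sq - n * (sx2 / n)**2                      # 21890.4
r = Sxy / math.sqrt(SSx * SSy)                      # about .957
t = r / math.sqrt((1 - r**2) / (n - 2))             # about 9.3, versus t.025(8) = 2.306
print(round(r, 4), round(t, 3))
```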
b) If we are testing H0: ρ = ρ0 against H1: ρ ≠ ρ0, and ρ0 ≠ 0, we use Fisher's z-transformation. Let
z̃ = (1/2)·ln[(1 + r)/(1 - r)] = (1/2)·ln[(1 + .9570)/(1 - .9570)] = (1/2)·ln(45.5116) = 1.90898. This has an approximate mean of
μ_z = (1/2)·ln[(1 + ρ0)/(1 - ρ0)] = (1/2)·ln[(1 + .9)/(1 - .9)] = (1/2)·ln(19) = 1.47222 and a standard deviation of
s_z = 1/√(n - 3) = 1/√7 = 0.37796, so that t = (z̃ - μ_z)/s_z = (1.90898 - 1.47222)/0.37796 = 1.156. Compare this with
±t(8)_.025 = ±2.306. Since 1.156 lies between these two values, do not reject the null hypothesis.
Note: To do the above with logarithms to the base 10, try
z̃10 = (1/2)·log[(1 + r)/(1 - r)] = (1/2)·log(45.5116) = 0.82906. This has an approximate mean of
μ_z10 = (1/2)·log[(1 + .9)/(1 - .9)] = (1/2)·log(19) = 0.63938 and a standard deviation of
s_z10 = (log e)/√(n - 3) = 0.4343/√7 = 0.16415, so that t = (0.82906 - 0.63938)/0.16415 = 1.156.
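A short Python sketch (names are mine) of the natural-log version of this Fisher z test:

```python
import math

# Fisher z-transformation test of H0: rho = .9.
r, rho0, n = 0.9570, 0.9, 10

z_tilde = 0.5 * math.log((1 + r) / (1 - r))         # about 1.909
mu_z = 0.5 * math.log((1 + rho0) / (1 - rho0))      # about 1.472
s_z = 1 / math.sqrt(n - 3)                          # about 0.378
print(round((z_tilde - mu_z) / s_z, 3))             # about 1.16, versus t.025(8) = 2.306
```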
c) The data given is repeated below with the ranking necessary for the remainder of the problem. r1, r2 and r3
are bottom-to-top rankings within each column, d = r1 - r2 and d² are needed for Spearman's rank
correlation, SR is the sum of the ranks in the given row, and SR² is required for Kendall's W.

Row   hshlds x1   sales x2   cust x3    r1   r2   r3   d = r1-r2   d²    SR    SR²
 1       161         157        305      5    6    6      -1        1    17    289
 2        99          93         55      1    1    1       0        0     3      9
 3       135         136        205      4    4    4       0        0    12    144
 4       120         123        105      3    2    2       1        1     7     49
 5       164         153        255      6    5    5       1        1    16    256
 6       221         241        505     10   10   10       0        0    30    900
 7       179         201        355      7    7    7       0        0    21    441
 8       204         207        455      8    8    9       0        0    25    625
 9       214         230        405      9    9    8       0        0    26    676
10       101         135        155      2    3    3      -1        1     8     64
Total                                                      0        4   165   3453
To compute Spearman's Rank Correlation Coefficient, take a set of n points (x, y) and rank both
x and y from 1 to n to get rx and ry. Do not attempt to compute a rank correlation without replacing
the original numbers by ranks. Then compute d = rx - ry, and then
rs = 1 - 6Σd²/[n(n² - 1)] = 1 - 6(4)/[10(100 - 1)] = 0.9758.
In this case, we have a 1-sided test of H0: ρs ≤ 0 against H1: ρs > 0. Note that Σd = 0 and Σd² = 4.
If we check the table 'Critical Values of rs, the Spearman Rank Correlation
Coefficient,' we find that the critical value for n = 10 and α = .05 is .5515 (for a two-sided test one would use
the 2.5% value). Since rs exceeds this, we must reject the null hypothesis and conclude that we can say that the
rankings have significant agreement.
d) For Kendall's Coefficient of Concordance, take k columns with n items in each and rank each column
from 1 to n. The null hypothesis is that the rankings disagree.
Compute a sum of ranks SRi for each row. Then S = ΣSR² - n·(SR̄)² = 3453 - 10(16.5)² = 730.5,
where SR̄ = ΣSR/n = 165/10 = 16.5 = (n + 1)k/2 is the mean of the SRs.
W = S/[(1/12)k²(n³ - n)] = 730.5/[(1/12)(3²)(10³ - 10)] = 730.5/742.5 = .9838 is the Kendall Coefficient of Concordance and must
be between 0 and 1.
H0 is disagreement. Since n is too large for the table, use χ²(n - 1) = k(n - 1)W = 3(9)(.9838) = 26.56.
Since χ².05(9) = 16.919, we reject the null hypothesis and say that there is significant agreement.
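A minimal Python sketch (names are mine) that recomputes Spearman's rs and Kendall's W from the rankings tabulated above:

```python
# Spearman rank correlation for x1 vs x2 and Kendall's W for all three rankings.
r1 = [5, 1, 4, 3, 6, 10, 7, 8, 9, 2]
r2 = [6, 1, 4, 2, 5, 10, 7, 8, 9, 3]
r3 = [6, 1, 4, 2, 5, 10, 7, 9, 8, 3]
n, k = 10, 3

d2 = sum((a - b)**2 for a, b in zip(r1, r2))
rs = 1 - 6 * d2 / (n * (n**2 - 1))                  # about .976

SR = [a + b + c for a, b, c in zip(r1, r2, r3)]
S = sum(sr**2 for sr in SR) - n * (sum(SR) / n)**2
W = S / (k**2 * (n**3 - n) / 12)
print(round(rs, 4), round(W, 4), round(k * (n - 1) * W, 2))
```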
7. A producer of filters is getting complaints about the quality of the filters it is producing. It thus examines
1000 filters from each of its three shifts and discovers 36 defects for shift 1, 40 defects for shift 2 and 55
defects for shift 3.
a) Test the hypothesis that the proportion of defective filters is the same for all three shifts at the 95% level. (7)
b) Test the hypothesis that the defect rate is higher for the third shift than the first. (3)
c) Find a p-value for your result in b). (2)
d) Do a confidence interval for the difference between the proportion defective for shift 1 and shift 2. (4)
Solution: a) H0: Homogeneous (or p1 = p2 = p3); H1: Not homogeneous (not all ps are equal).
DF = (r - 1)(c - 1) = 1(2) = 2 and χ².05(2) = 5.9915.

O           Shift 1   Shift 2   Shift 3   Total     pr
Defective        36        40        55     131   .04367
Not             964       960       945    2869   .95633
Total          1000      1000      1000    3000   1.0000

E           Shift 1   Shift 2   Shift 3   Total     pr
Defective     43.67     43.67     43.67     131   .04367
Not          956.33    956.33    956.33    2869   .95633
Total       1000.00   1000.00   1000.00    3000   1.0000

The proportions in rows, pr, are used with column totals to get the items in E. Note that row and column
sums in E are the same as in O. (Note that χ² is computed two different ways here - only one way is needed.)

Row     O        E        E - O     (O - E)²   (O - E)²/E       O²/E
 1      36     43.67      7.6700     58.829      1.34712      29.677
 2      40     43.67      3.6700     13.469      0.30842      36.638
 3      55     43.67    -11.3300    128.369      2.93952      69.270
 4     964    956.33     -7.6700     58.829      0.06151     971.732
 5     960    956.33     -3.6700     13.469      0.01408     963.684
 6     945    956.33     11.3300    128.369      0.13423     933.804
      3000   3000.00      0.0000                 4.8049     3004.8049

χ² = Σ(O - E)²/E = 4.8049, or χ² = Σ(O²/E) - n = 3004.8049 - 3000 = 4.8049.
Since this is less than 5.9915, do not reject H0.
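A short Python sketch (names are mine) of this chi-squared test of homogeneity:

```python
# Chi-squared test of homogeneity for the three shifts.
defective = [36, 40, 55]
total_per_shift = 1000
n_shifts = len(defective)

overall_p = sum(defective) / (n_shifts * total_per_shift)      # .04367
chi2 = 0.0
for d in defective:
    for obs, cell_p in ((d, overall_p), (total_per_shift - d, 1 - overall_p)):
        expected = total_per_shift * cell_p
        chi2 += (obs - expected)**2 / expected
print(round(chi2, 4))          # about 4.80, versus chi-squared(.05, 2) = 5.9915
```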
b) We are comparing p̄1 = .036 (n1 = 1000) and p̄3 = .055 (n3 = 1000). From Table 3, for the difference between
proportions:
Confidence Interval: Δp = Δp̄ ± z(α/2)·s_Δp̄, where Δp̄ = p̄1 - p̄2 and s_Δp̄ = √(p̄1q̄1/n1 + p̄2q̄2/n2)
Hypotheses: H0: Δp = Δp0, H1: Δp ≠ Δp0, where Δp0 = p01 - p02 (usually 0)
Test Ratio: z = (Δp̄ - Δp0)/σ_Δp̄; if Δp0 = 0, σ_Δp̄ = √[p̄0q̄0(1/n1 + 1/n2)] with p̄0 = (n1p̄1 + n2p̄2)/(n1 + n2)
Critical Value: Δp̄_cv = Δp0 ± z(α/2)·σ_Δp̄
Our hypotheses are H0: p3 ≤ p1 and H1: p3 > p1, the same as H0: Δp ≤ 0 and H1: Δp > 0 where Δp = p3 - p1.
Note that Δp̄ = p̄3 - p̄1 = .019,
p̄0 = (n1p̄1 + n3p̄3)/(n1 + n3) = [1000(.036) + 1000(.055)]/(1000 + 1000) = (36 + 55)/2000 = .0455,
α = .05, z(.05) = 1.645 and z(.025) = 1.960. Note that q̄ = 1 - p̄ and that q and p are between 0 and 1.
σ_Δp̄ = √[p̄0q̄0(1/n1 + 1/n3)] = √[.0455(.9545)(1/1000 + 1/1000)] = √.00008686 = .0093198.
(Only one of the following methods is needed!)
Test Ratio: z = (Δp̄ - Δp0)/σ_Δp̄ = (.019 - 0)/.0093198 = 2.03. Make a diagram showing a 'reject' region above
1.645. Since 2.03 is above this, reject H0.
or Critical Value: Δp̄_cv = Δp0 + z(.05)·σ_Δp̄ = 0 + 1.645(.0093198) = .01533. Make a diagram showing a
'reject' region above .01533. Since .019 is above this, reject H0.
or Confidence Interval: Δp ≥ Δp̄ - z(.05)·s_Δp̄ (probably not worth doing). In all cases reject H0.
c) z = (Δp̄ - Δp0)/σ_Δp̄ = (.019 - 0)/.0093198 = 2.03, so pval = P(Δp̄ ≥ .019) = P(z ≥ 2.03) = .5 - .4788 = .0212.
d) Let Δp̄ = p̄1 - p̄2 = .036 - .040 = -.004 and
s_Δp̄ = √(p̄1q̄1/n1 + p̄2q̄2/n2) = √[.036(.964)/1000 + .040(.960)/1000] = √(.000034704 + .0000384) = .00855.
Then Δp = Δp̄ ± z(.025)·s_Δp̄ = -.004 ± 1.960(.00855) = -.004 ± .017, or -.021 to .013.
© 2001, R. E. Bove