6/10/99 252zz9943
5. Data from problem 4 is repeated below. (Use $\alpha = .01$.)
$\sum y = 272$, $\sum y^2 = 5128$, $\sum x_1 = 187$, $\sum x_1^2 = 2469$, $\sum x_1 y = 3549$, $\sum x_2 y = 759.090$, $\sum x_2 = 40.29$, $\sum x_2^2 = 113.339$, $\sum x_1 x_2 = 525.38$ and $n = 15$.
a. Do a multiple regression of passengers against advertising and National Income. (12)
b. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem. Compare
the values of $R^2$ adjusted between this and the previous problem. Use an F test to compare $R^2$ here with
the $R^2$ from the previous problem. (5)
c. Compute the regression sum of squares and use it in an F test to test the usefulness of this regression. (5)
d. Use your regression to predict the number of passengers when we spend $13 (thousand) on advertising
and National Income is $3.5 (trillion).(2)
e. The regression on the previous page was run with the command
MTW > regress C1 on 1 C3;
SUBC > dw.
As a result, the last line of the regression read
Durbin-Watson statistic = 0.71
Solution: a) First, we compute $\bar{Y} = 18.1333$, $\bar{X}_1 = \frac{187}{15} = 12.4667$ and $\bar{X}_2 = 2.6860$. Second, we compute
$\sum X_1 Y = 3549$, $\sum X_2 Y = 759.090$, $\sum Y^2 = 5128$, $\sum X_1^2 = 2469$, $\sum X_2^2 = 113.339$ and
$\sum X_1 X_2 = 525.38$. Third, we compute our spare parts:
$SS_y = \sum Y^2 - n\bar{Y}^2 = 5128 - 15(18.1333)^2 = 195.733$,
$S_{x_1 y} = \sum X_1 Y - n\bar{X}_1\bar{Y} = 3549 - 15(12.4667)(18.1333) = 158.067$,
$S_{x_2 y} = \sum X_2 Y - n\bar{X}_2\bar{Y} = 759.09 - 15(2.6860)(18.1333) = 28.4979$,
$SS_{x_1} = \sum X_1^2 - n\bar{X}_1^2 = 2469 - 15(12.4667)^2 = 137.733$,
$SS_{x_2} = \sum X_2^2 - n\bar{X}_2^2 = 113.339 - 15(2.6860)^2 = 5.11975$ and
$S_{x_1 x_2} = \sum X_1 X_2 - n\bar{X}_1\bar{X}_2 = 525.38 - 15(12.4667)(2.6860) = 23.0979$.
(Note that some of these were computed for the last problem.) Fourth, we substitute these
numbers into the Simplified Normal Equations:
$\sum X_1 Y - n\bar{X}_1\bar{Y} = b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1 X_2 - n\bar{X}_1\bar{X}_2\right)$
$\sum X_2 Y - n\bar{X}_2\bar{Y} = b_1\left(\sum X_1 X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right)$,
which are
$158.067 = 137.733\,b_1 + 23.0979\,b_2$
$28.4979 = 23.0979\,b_1 + 5.11975\,b_2$
and solve them as two equations in two unknowns for $b_1$ and $b_2$. We do this by multiplying the second
equation by 4.5115, which is 23.0979 divided by 5.11975, so that the two equations become
$158.067 = 137.733\,b_1 + 23.0979\,b_2$
$128.569 = 104.207\,b_1 + 23.0979\,b_2$.
We then subtract the second equation from the first to get $29.498 = 33.526\,b_1$, so that $b_1 = 0.8799$.
The rescaled second equation can now be rearranged to get $23.0979\,b_2 = 128.569 - 104.207(0.8799)$,
which gives us $b_2 = 1.5963$. Finally we get $b_0$ by solving
$b_0 = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 18.1333 - 0.8799(12.4667) - 1.5963(2.6860) = 2.8762$. Thus our
equation is $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 = 2.8762 + 0.8799X_1 + 1.5963X_2$.
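The arithmetic in part a) can be checked with a short script. This is only a sketch: it takes the sums from the problem statement, builds the same spare parts, and solves the two simplified normal equations with Cramer's rule rather than the elimination steps used above. The variable names and the `predict` helper are illustrative choices, not part of the original solution, and the last digits can differ slightly from the hand computation because the script does not round intermediate values.

```python
# Sums copied from the problem statement (n = 15 observations).
n = 15
sum_y, sum_y2 = 272.0, 5128.0
sum_x1, sum_x1_sq = 187.0, 2469.0
sum_x2, sum_x2_sq = 40.29, 113.339
sum_x1y, sum_x2y, sum_x1x2 = 3549.0, 759.090, 525.38

y_bar, x1_bar, x2_bar = sum_y / n, sum_x1 / n, sum_x2 / n

# "Spare parts": corrected sums of squares and cross products.
ss_y = sum_y2 - n * y_bar ** 2
ss_x1 = sum_x1_sq - n * x1_bar ** 2
ss_x2 = sum_x2_sq - n * x2_bar ** 2
s_x1y = sum_x1y - n * x1_bar * y_bar
s_x2y = sum_x2y - n * x2_bar * y_bar
s_x1x2 = sum_x1x2 - n * x1_bar * x2_bar

# Solve the two simplified normal equations by Cramer's rule
# (a substitute for the elimination done by hand in the text).
det = ss_x1 * ss_x2 - s_x1x2 ** 2
b1 = (s_x1y * ss_x2 - s_x2y * s_x1x2) / det
b2 = (s_x2y * ss_x1 - s_x1y * s_x1x2) / det
b0 = y_bar - b1 * x1_bar - b2 * x2_bar


def predict(x1, x2):
    """Fitted value from the estimated equation (hypothetical helper)."""
    return b0 + b1 * x1 + b2 * x2


print(round(b0, 4), round(b1, 4), round(b2, 4))
print(round(predict(13, 3.5), 3))
```

The same `predict` helper can be reused for part d) by passing the advertising and National Income values given there.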
b) The coefficient of determination is
$R^2 = \frac{b_1\left(\sum X_1 Y - n\bar{X}_1\bar{Y}\right) + b_2\left(\sum X_2 Y - n\bar{X}_2\bar{Y}\right)}{\sum Y^2 - n\bar{Y}^2} = \frac{0.8799(158.067) + 1.5963(28.4979)}{195.7333} = .9430$.
(The standard error is
$s_e^2 = \frac{\sum Y^2 - n\bar{Y}^2 - b_1\left(\sum X_1 Y - n\bar{X}_1\bar{Y}\right) - b_2\left(\sum X_2 Y - n\bar{X}_2\bar{Y}\right)}{n-3} = \frac{\left(\sum Y^2 - n\bar{Y}^2\right)\left(1 - R^2\right)}{n-3}$, but we don't
need it yet.) Our results can be summarized below as:

n     k    R^2      R^2 adjusted
15    1    .8104    .7958
15    2    .9430    .9335

$\bar{R}^2$, which is $R^2$ adjusted for degrees of freedom, has the formula $\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1}$, where
$k$ is the number of independent variables. $R^2$ adjusted for degrees of freedom seems to show that our second
regression is better.
The easiest way to do the F test and have it look right is to note that $\sum Y^2 - n\bar{Y}^2 = 195.733$. For the
regression with one independent variable the regression sum of squares is
$R^2\left(\sum Y^2 - n\bar{Y}^2\right) = .8104(195.733) = 158.622$. For the regression with two independent variables the
regression sum of squares is $R^2\left(\sum Y^2 - n\bar{Y}^2\right) = .9430(195.733) = 184.576$. The difference between these
is 25.954. The remaining unexplained variation is 195.733 - 184.576 = 11.157. The ANOVA table is

Source    SS         DF    MS         F          F.01
X2        158.622     1    158.622
X1         25.954     1     25.954    27.9105    $F_{.01}^{(1,12)} = 9.33$
Error      11.157    12      0.9299
Total     195.733    14

Since our computed F is larger than the table F, we reject our null hypothesis that $X_1$ has no effect.
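The adjusted $R^2$ values and the F ratio for the added variable can be reproduced in a few lines. The sketch below uses the rounded $R^2$ values and the total sum of squares quoted in the solution; the function name `adjusted_r2` is an illustrative choice, and the F ratio can differ from 27.9105 in the later digits because of rounding.

```python
def adjusted_r2(r2, n, k):
    """R-squared adjusted for degrees of freedom, k = number of independent variables."""
    return ((n - 1) * r2 - k) / (n - k - 1)


n = 15
ss_total = 195.733               # sum of Y^2 minus n * Ybar^2
r2_one, r2_two = 0.8104, 0.9430  # previous problem, this problem

adj_one = adjusted_r2(r2_one, n, 1)
adj_two = adjusted_r2(r2_two, n, 2)

# Regression sums of squares and the extra variation explained by the added variable.
ss_reg_one = r2_one * ss_total
ss_reg_two = r2_two * ss_total
ss_added = ss_reg_two - ss_reg_one
ss_error = ss_total - ss_reg_two

# F ratio with 1 and n - 3 degrees of freedom.
ms_error = ss_error / (n - 3)
f_added = (ss_added / 1) / ms_error

print(round(adj_one, 4), round(adj_two, 4), round(f_added, 2))
```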
c) We computed the regression sum of squares in the previous section.

Source      SS         DF    MS         F         F.01
X1, X2      184.576     2    92.288     99.245    $F_{.01}^{(2,12)} = 6.93$
Error        11.157    12     0.9299
Total       195.733    14

Since our computed F is larger than the table F, we reject our null hypothesis that $X_1$ and $X_2$ do not
explain $Y$.
d) $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 = 2.8762 + 0.8799X_1 + 1.5963X_2 = 2.8762 + 0.8799(13) + 1.5963(3.5) = 2.8762 + 11.4387 + 5.5871 = 19.902$.
e) A Durbin-Watson Test is a test for autocorrelation. For $\alpha = .01$, $k = 2$ and $n = 15$, the test table gives
$d_L = .70$ and $d_U = 1.25$. According to the text, the null hypothesis is 'No Autocorrelation' and our
rejection region is $d < d_L^{\alpha/2}$ or $(4 - d) < d_L^{\alpha/2}$. We really should use the $\alpha = .005$ value for $d_L$, but a
check of the $\alpha = .05$ table leaves us sure that it is below .70. Thus the D-W statistic of 0.71 is not in the
rejection region. Check the examples to see that it could be in the "possibly significant" region.
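The decision rule described in part e) can be written out as a small function. This is a sketch of the usual two-sided Durbin-Watson rule with an inconclusive zone, not code from the course materials; the function names are hypothetical, and the bounds passed in are the table values quoted above.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Classify d using table bounds d_lower and d_upper (two-sided rule)."""
    if d < d_lower or (4 - d) < d_lower:
        return "reject H0"
    if d_upper < d < 4 - d_upper:
        return "do not reject H0"
    return "inconclusive"


# Table bounds and reported statistic from the solution above.
print(dw_decision(0.71, 0.70, 1.25))
```

With the reported statistic the value falls between the two bounds, which corresponds to the "possibly significant" region mentioned in the solution.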
6/14/99 252zz9943
6. (Watch it!) Three methods are used to train candidates for the FAA pilots exam. Scores for trainees are
shown below classified by method.
a. Assume that the data is normal and compare the means for the first two methods (Assume unequal variances) (5)
b. Do the same for all three methods (You may assume equal variances now) (7)
c. Test column 1 to see if it has the normal distribution (5)

              Method
Video        Audio
Cassette     Cassette     Classroom
   72           73           68
   86           75           83
   80           60           50
   91           52           91
   46           84           84
   68           76           77
   75                        94
                             81
                             92
                             90

Note: For the first column:
$x_1$                              72      86      80      91      46      68      75
$\frac{x_1 - \bar{x}_1}{s_1}$    -0.14    0.82    0.41    1.16   -1.91   -0.41    0.07
Note: $\sum x_1 = 518$, $\sum x_1^2 = 39626$, $\sum x_3 = 810$, $\sum x_3^2 = 67240$.
Note: In spite of the words "Watch it!", many people assumed that this was identical to a problem
with similar data on an earlier exam. You have to read the question before answering it!
Solution: Note: $n_1 = 7$, $n_2 = 6$, $\sum x_2 = 420$, $\sum x_2^2 = 30090$, $n_3 = 10$.


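a) Assume unequal variances. From Table 3 of the Syllabus Supplement:

Interval for:          Difference between Two Means ($\sigma$ unknown, variances assumed unequal)
Confidence Interval:   $\Delta = d \pm t_{\alpha/2}\, s_d$
Hypotheses:            $H_0: \Delta = \Delta_0$, $H_1: \Delta \ne \Delta_0$
                       (same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $\Delta_0 = 0$)
Test Ratio:            $t = \frac{d - \Delta_0}{s_d}$
Critical Value:        $d_{cv} = \Delta_0 \pm t_{\alpha/2}\, s_d$

where $d = \bar{x}_1 - \bar{x}_2$, $\Delta = \mu_1 - \mu_2$, $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and
$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$.

$\bar{x}_1 = \frac{518}{7} = 74$, $s_1^2 = \frac{\sum x_1^2 - n_1\bar{x}_1^2}{n_1 - 1} = \frac{39626 - 7(74)^2}{6} = 215.6667$, $s_1 = 14.68559$
$\bar{x}_2 = \frac{420}{6} = 70$, $s_2^2 = \frac{\sum x_2^2 - n_2\bar{x}_2^2}{n_2 - 1} = \frac{30090 - 6(70)^2}{5} = 138.0000$, $s_2 = 11.74734$
$d = \bar{x}_1 - \bar{x}_2 = 4$
$\frac{s_1^2}{n_1} = \frac{215.6667}{7} = 30.8095$, $\frac{s_2^2}{n_2} = \frac{138.0000}{6} = 23.0000$, $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 53.8095$
$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{53.8095} = 7.3355$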
$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{(53.8095)^2}{\frac{(30.8095)^2}{6} + \frac{(23.0000)^2}{5}} = 10.9675$, so use 10 degrees of freedom.
$t_{.025}^{(10)} = 2.228$, so, using a test ratio, $t = \frac{d - \Delta_0}{s_d} = \frac{4 - 0}{7.3355} = 0.545$. Since this is between $\pm 2.228$, do not
reject $H_0$; or, using a critical value, $d_{cv} = \Delta_0 \pm t_{\alpha/2}\, s_d = 0 \pm 2.228(7.3355) = \pm 16.344$. Since $d = 4$ is
between these values, do not reject $H_0$.
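The unequal-variance comparison in part a) can be verified from the raw scores. The sketch below uses only the Python standard library; the critical value 2.228 is the table value quoted in the solution (10 degrees of freedom, after rounding the computed DF down), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

x1 = [72, 86, 80, 91, 46, 68, 75]   # video cassette
x2 = [73, 75, 60, 52, 84, 76]       # audio cassette

# Per-sample variance of the mean, s^2 / n (variance() is the sample variance).
v1 = variance(x1) / len(x1)
v2 = variance(x2) / len(x2)

d = mean(x1) - mean(x2)
s_d = sqrt(v1 + v2)

# Approximate degrees of freedom for unequal variances.
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))

t_ratio = (d - 0) / s_d
t_crit = 2.228                      # table value quoted in the solution
reject = abs(t_ratio) > t_crit

print(round(s_d, 4), round(df, 4), round(t_ratio, 3), reject)
```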
b) 1-way ANOVA

                      Method
              x1          x2          x3
              72          73          68
              86          75          83
              80          60          50
              91          52          91
              46          84          84
              68          76          77
              75                      94
                                      81
                                      92
                                      90
Sum          518   +     420   +     810    = 1748   = $\sum x$
$n_j$          7   +       6   +      10    = 23     = $n$
$\bar{x}_j$   74          70          81
SS         39626   +   30090   +   67240    = 136956 = $\sum x_{ij}^2$
$\bar{x}_j^2$ 5476        4900        6561

Note that $\bar{x}$ is not a sum, but is $\bar{x} = \frac{\sum x}{n} = \frac{1748}{23} = 76$.
$SST = \sum x_{ij}^2 - n\bar{x}^2 = 136956 - 23(76)^2 = 4108$.
$SSB = \sum n_j \bar{x}_j^2 - n\bar{x}^2 = 7(5476) + 6(4900) + 10(6561) - 23(76)^2 = 494$.
($SSW = SST - SSB = 3614$)

Source               SS      DF    MS     F        F.05                              H0
Between (Methods)    494      2    247    1.365    $F_{.05}^{(2,20)} = 3.49$ ns    Column means equal
Within (Error)       3614    20    181
Total                4108    22

Because our computed F is smaller than $F_{.05}^{(2,20)} = 3.49$, we cannot reject $H_0$.
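The one-way ANOVA sums of squares can be recomputed directly from the three columns. This is a minimal sketch using the same computational formulas as the solution; the dictionary keys and variable names are illustrative.

```python
groups = {
    "video": [72, 86, 80, 91, 46, 68, 75],
    "audio": [73, 75, 60, 52, 84, 76],
    "classroom": [68, 83, 50, 91, 84, 77, 94, 81, 92, 90],
}

all_scores = [x for scores in groups.values() for x in scores]
n = len(all_scores)
grand_mean = sum(all_scores) / n

# Total, between-group and within-group sums of squares.
sst = sum(x * x for x in all_scores) - n * grand_mean ** 2
ssb = sum(len(g) * (sum(g) / len(g)) ** 2 for g in groups.values()) - n * grand_mean ** 2
ssw = sst - ssb

df_between = len(groups) - 1
df_within = n - len(groups)
f_ratio = (ssb / df_between) / (ssw / df_within)

print(round(sst), round(ssb), round(ssw), round(f_ratio, 3))
```

Without rounding the within mean square to 181, the F ratio comes out near 1.367 rather than 1.365; the conclusion is unchanged.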
c) $H_0$: Normal
We use the Lilliefors method because we are testing for the Normal distribution, we
have a small sample and the population mean and variance are unknown. The column $F_e$ is the cumulative
distribution computed from the Normal table. $t$ or $z$ is $\frac{x_1 - \bar{x}_1}{s_1}$, which was computed for you.

x1     O    Cumulative O    Fo         t or z    Fe       D
46     1    1                .14286    -1.91     .0281    .1148
68     1    2                .28571    -0.41     .3409    .0552
72     1    3                .42857    -0.14     .4443    .0157
75     1    4                .57142     0.07     .5279    .0435
80     1    5                .71428     0.41     .6591    .0552
86     1    6                .85714     0.82     .7939    .0632
91     1    7               1.00000     1.16     .8770    .1230
Sum    7

From the Lilliefors Table, the critical value for a 95% confidence level is .300. Since the largest number in
D is not above this value, we do not reject $H_0$.
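The Lilliefors table can be rebuilt with the standard library's normal distribution. The sketch below follows the layout of the solution, comparing each cumulative relative frequency with the Normal cumulative probability at the same point; it is an illustration of that table, not a full library-grade test. Because the script does not round z to two decimals, the values of D differ slightly in the third or fourth decimal place. The critical value .300 is the table value quoted in the solution.

```python
from statistics import NormalDist, mean, stdev

x1 = sorted([72, 86, 80, 91, 46, 68, 75])
n = len(x1)
x_bar, s = mean(x1), stdev(x1)   # sample mean and sample standard deviation

rows = []
for i, x in enumerate(x1, start=1):
    z = (x - x_bar) / s
    f_o = i / n                      # observed cumulative relative frequency
    f_e = NormalDist().cdf(z)        # expected cumulative probability under H0
    rows.append((x, z, f_o, f_e, abs(f_o - f_e)))

d_max = max(row[4] for row in rows)
critical = 0.300                     # Lilliefors table value quoted in the solution
reject = d_max > critical

for x, z, f_o, f_e, d in rows:
    print(x, round(z, 2), round(f_o, 5), round(f_e, 4), round(d, 4))
print(round(d_max, 4), reject)
```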
7. (Watch it!) Three methods are used to train candidates for the FAA pilots exam. Scores for trainees are
shown below classified by method.
a. Using a sign test, check $x_3$ to see if it has a median of 85. (4)
b. Repeat the test on $x_3$ using a more powerful method. (5)
c. Apply the Runs Test as follows:
Write down the numbers in $x_1$ and $x_3$ together in order.
Underneath the numbers write down A if the number comes from
$x_1$ and C if it comes from $x_3$. You will have a sequence like AACACC
... . In case of a tie remove both tying numbers from your test.
Do a runs test on the resulting sequence to see if the A's and C's
appear randomly.
Congratulations! You have just done a Wald-Wolfowitz Test for the
equality of means in two (nonnormal) samples. If the sequence is
random, the means are equal. (6)

              Method
Video        Audio
Cassette     Cassette     Classroom
  x1           x2           x3
  72           73           68
  86           75           83
  80           60           50
  91           52           91
  46           84           84
  68           76           77
  75                        94
                            81
                            92
                            90

Solution:

x3     x3 - 85    r     r (corrected)
68     -17         9     9-
83      -2         2     2-
50     -35        10    10-
91       6         5     5+
84      -1         1     1-
77      -8         7     7-
94       9         8     8+
81      -4         3     3-
92       7         6     6+
90       5         4     4+

$T^+ = 23$
$T^- = 32$

a) $H_0$: median $= 85$. To do a sign test, note that there
are 6 numbers below 85 and 4 above. Using the
binomial table with $n = 10$, $p = .5$,
p-value $= 2P(x \ge 6) = 2P(x \le 4) = 2(.37695) = .7539$.
If $\alpha = .05$, this p-value is above the significance
level and we do not reject $H_0$.
b) To do a Wilcoxon Signed Rank Sum Test,
rank the differences from 85 and put the sign of
the difference next to these ranks. To check,
note that $T^+ + T^- = 55 = \frac{n(n+1)}{2}$. From
the table the $2\frac{1}{2}\%$ critical value is 8. Since
both $T$s are above this value, do not reject $H_0$.
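Both tests on $x_3$ can be checked with a few lines of standard-library Python. This is a sketch with illustrative variable names: the sign-test p-value is computed from exact binomial probabilities instead of the table, and the rank sums assume no tied absolute differences, which holds for these data.

```python
from math import comb

x3 = [68, 83, 50, 91, 84, 77, 94, 81, 92, 90]
median_0 = 85

# Differences from the hypothesized median (values equal to it would be dropped).
diffs = [x - median_0 for x in x3 if x != median_0]
n = len(diffs)

# Sign test: two-sided p-value from the binomial distribution with p = .5.
n_above = sum(d > 0 for d in diffs)
n_below = n - n_above
k = min(n_above, n_below)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

# Wilcoxon signed rank sums: rank |d| from smallest to largest
# (assumes no ties among the absolute differences).
ordered = sorted(diffs, key=abs)
t_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
t_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

print(n_below, n_above, round(p_value, 4), t_plus, t_minus)
```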
c) The numbers written out in order are:

46  50  68  68  72  75  77  80  81  83  84  86  90  91  91  92  94
A   C   C   A   A   A   C   A   C   C   C   A   C   C   A   C   C

If we eliminate ties we get:

46  50  72  75  77  80  81  83  84  86  90  92  94
A   C   A   A   C   A   C   C   C   A   C   C   C

The number of As is $n_1 = 5$ and the number of Cs is $n_2 = 8$ and there are $r = 8$ runs. If we look this up
in the table entitled "Critical Values of r for the Runs Test," we find that the upper critical value is 11 and
the lower critical value is 3. Since 8 lies between these values we do not reject the null hypothesis of
randomness. Our final conclusion is that the means of the populations from which the two samples come are
equal.
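The construction of the A/C sequence and the run count can be reproduced as follows. This sketch assumes, as in the instructions for part c, that any value appearing in both samples is removed from both; the critical values 3 and 11 are the table values quoted in the solution, and the variable names are illustrative.

```python
x1 = [72, 86, 80, 91, 46, 68, 75]                    # label A
x3 = [68, 83, 50, 91, 84, 77, 94, 81, 92, 90]        # label C

# Remove values that occur in both samples (the ties).
ties = set(x1) & set(x3)
labeled = [(x, "A") for x in x1 if x not in ties]
labeled += [(x, "C") for x in x3 if x not in ties]

# Order the combined sample and read off the labels.
sequence = "".join(label for _, label in sorted(labeled))

n_a = sequence.count("A")
n_c = sequence.count("C")

# A new run starts wherever the label changes.
runs = 1 + sum(sequence[i] != sequence[i - 1] for i in range(1, len(sequence)))

lower, upper = 3, 11                                 # table values quoted in the solution
reject = runs <= lower or runs >= upper

print(sequence, n_a, n_c, runs, reject)
```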