    14

252y0771 11/27/07 (Page layout view!)
ECO252 QBA2
THIRD EXAM
November 29, 2007
Version 1
Name ______Key____________
Student number_______________
Class Day and hour____________
I. (8 points) Do all the following (2 points each unless noted otherwise). Make Diagrams! Show your
work! All probabilities must be between zero and (positive) 1.
$x \sim N(26, 14)$
1. $P(20 \le x \le 38) = P\left(\frac{20-26}{14} \le z \le \frac{38-26}{14}\right) = P(-0.43 \le z \le 0.86) = P(-0.43 \le z \le 0) + P(0 \le z \le 0.86) = .1664 + .3051 = .4715$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line!
Shade the entire area between -0.43 and 0.86. Because this is on both sides of zero we must add the area
between -0.43 and zero to the area between zero and 0.86. If you wish, make a completely separate diagram
for x. Draw a Normal curve with a mean at 26. Indicate the mean by a vertical line! Shade the entire
area between 20 and 38. This area is on both sides of the mean (26) so we add to get our answer.
2. $P(x \ge 0) = P\left(z \ge \frac{0-26}{14}\right) = P(z \ge -1.86) = P(-1.86 \le z \le 0) + P(z \ge 0) = .4686 + .5 = .9686$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line!
Shade the entire area above -1.86. Because this is on both sides of zero we must add the area between
-1.86 and zero to the area above zero. If you wish, make a completely separate diagram for x. Draw a
Normal curve with a mean at 26. Indicate the mean by a vertical line! Shade the entire area above zero,
remembering that zero is below the mean. This area is on both sides of the mean (26) so we add to get our
answer.
3. $P(32 \le x \le 76) = P\left(\frac{32-26}{14} \le z \le \frac{76-26}{14}\right) = P(0.43 \le z \le 3.57) = P(0 \le z \le 3.57) - P(0 \le z \le 0.43) = .4998 - .1664 = .3334$
For z make a diagram. Draw a Normal curve with a mean at 0. Indicate the mean by a vertical line!
Shade the area between 0.43 and 3.57. Because this is on one side of zero we must subtract the area
between zero and 0.43 from the larger area between zero and 3.57. If you wish, make a completely separate
diagram for x. Draw a Normal curve with a mean at 26. Indicate the mean by a vertical line! Shade the
area between 32 and 76. This area is on one side of the mean (26) so we subtract to get our answer.
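As a cross-check on the table lookups in problems 1 to 3, here is a minimal Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`, and the helper names `z_score`, `prob_between` and `prob_above` are made up for illustration. Each z is rounded to two decimals, as when reading a printed table, so the results should match the hand answers.

```python
from statistics import NormalDist

MU, SIGMA = 26, 14          # x ~ N(26, 14)
STD_NORMAL = NormalDist()   # z ~ N(0, 1)

def z_score(x):
    """Standardize x and round to two decimals, the way a printed z table is read."""
    return round((x - MU) / SIGMA, 2)

def prob_between(lo, hi):
    """P(lo <= x <= hi), computed from the standardized endpoints."""
    return STD_NORMAL.cdf(z_score(hi)) - STD_NORMAL.cdf(z_score(lo))

def prob_above(lo):
    """P(x >= lo)."""
    return 1 - STD_NORMAL.cdf(z_score(lo))

p1 = prob_between(20, 38)   # problem 1
p2 = prob_above(0)          # problem 2
p3 = prob_between(32, 76)   # problem 3
print(f"{p1:.4f} {p2:.4f} {p3:.4f}")  # 0.4715 0.9686 0.3334
```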
4. $x_{.075}$ (Do not try to use the t table to get this.) For z make a diagram. Draw a Normal curve with a
mean at 0. $z_{.075}$ is the value of z with 7.5% of the distribution above it. Since 100 – 7.5 = 92.5, it is also
the 92.5th percentile. Since 50% of the standardized Normal distribution is below zero, your diagram should
show that the probability between zero and $z_{.075}$ is 92.5% - 50% = 42.5%, or $P(0 \le z \le z_{.075}) = .4250$. The
closest we can come to this is $P(0 \le z \le 1.44) = .4251$. So $z_{.075} = 1.44$ (or something slightly smaller). To
get from $z_{.075}$ to $x_{.075}$, use the formula $x = \mu + z\sigma$, which is the inverse of $z = \frac{x - \mu}{\sigma}$:
$x_{.075} = 26 + 1.44(14) = 46.16$. If you wish, make a completely separate diagram for x. Draw a Normal curve
with a mean at 26. Show that 50% of the distribution is below the mean (26). If 7.5% of the distribution is
above $x_{.075}$, it must be above the mean and have 42.5% of the distribution between it and the mean.
Check: $P(x \ge 46.16) = P\left(z \ge \frac{46.16-26}{14}\right) = P(z \ge 1.44) = P(z \ge 0) - P(0 \le z \le 1.44) = .5 - .4251 = .0749 \approx .075$
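Problem 4 uses the table in reverse. As a rough check, `NormalDist.inv_cdf` from Python's standard library gives the 92.5th percentile of z directly. This is a tooling assumption, since the exam expects the printed table. The percentile rounds to the table value 1.44, and the last step repeats the check above.

```python
from statistics import NormalDist

MU, SIGMA = 26, 14
STD_NORMAL = NormalDist()

z_exact = STD_NORMAL.inv_cdf(1 - 0.075)   # 92.5th percentile of z, about 1.4395
z_table = round(z_exact, 2)                # what the table lookup gives: 1.44
x_075 = MU + z_table * SIGMA               # x = mu + z * sigma
upper_tail = 1 - NormalDist(MU, SIGMA).cdf(x_075)   # should be close to .075
print(z_table, round(x_075, 2), round(upper_tail, 4))  # 1.44 46.16 0.0749
```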


II. (22+ points) Do all the following (2 points each unless noted otherwise). Do not answer a question
‘yes’ or ‘no’ without giving reasons. Show your work when appropriate. Use a 5% significance level except
where indicated otherwise. Note that this is extremely long and that no one will do all the problems, so look
them over!
1. Turn in your computer problems 2 and 3 marked as requested in the Take-home. (5 points, 2 point
penalty for not doing.)
2. In an ordinary 1-way ANOVA, if the computed F statistic is below the value from the F table at the
given significance level, we can
a. Reject the null hypothesis because the difference between the means is not significant
b. Reject the null hypothesis because there is evidence of a significant difference between some of
the means.
c. *Not reject the null hypothesis because the difference between the means is not significant.
d. Not reject the null hypothesis because the difference between the means is significant.
e. Not reject the null hypothesis because the difference between the variances is not significant.
f. Not reject the null hypothesis because the difference between the variances is significant.
g. None of the above.
[7]
3. After an analysis of variance, you would use the Tukey-Kramer procedure or similar confidence
intervals to check
a. For Normality
b. For equality of variances
c. For independence of error terms
d. *For pairwise differences in means
e. For all of the above
f. For none of the above
4. If an ordinary one-way ANOVA has 25 columns and 17 rows, so that there are 17 × 25 = 425 observations, the degrees of freedom
for the F test are
a. 400 and 24
b. 408 and 16
c.* 24 and 400
d. 16 and 408
e. 400 and 424
f. 408 and 424
g. 424 and 400
h. 424 and 408
i. 16 and 24
j. None of the above. The correct answer is _______.
Explanation: This is a one-way ANOVA. The total number of observations is n = 425 and the number
of columns is m = 25. This means there are 425 - 1 = 424 total degrees of freedom and that between the
columns there are 25 - 1 = 24 degrees of freedom. This leaves 424 - 24 = 400 degrees of freedom for the
error (within) term. Numbers are filled in below.
Source    SS    DF              MS                   F              F.05
Between   SSB   m - 1 = 24      MSB = SSB/(m - 1)    F = MSB/MSW    F.05(24, 400) = 1.54
Within    SSW   n - m = 400     MSW = SSW/(n - m)
Total     SST   n - 1 = 424
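The degrees-of-freedom bookkeeping is mechanical, so it can be written as a tiny function. `one_way_anova_df` is a hypothetical helper, not a library call. The critical value 1.54 still has to come from an F table.

```python
def one_way_anova_df(n_obs, n_columns):
    """Return (between, within, total) degrees of freedom for a one-way ANOVA."""
    between = n_columns - 1        # m - 1
    within = n_obs - n_columns     # n - m
    total = n_obs - 1              # n - 1
    assert between + within == total   # the DF column must add up
    return between, within, total

print(one_way_anova_df(17 * 25, 25))  # (24, 400, 424)
```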
5. Assuming that your answer to 4 is correct and that the significance level is 5%, the correct value of F
from the table is _1.54_____. (This may have to be approximate. If so, what did you use?) (1) [12]
Note: I will check your answer against what you said in the previous question. The answer above is
wrong if you did not say something close to $F_{.05}^{(24,400)}$.
Exhibit 1 A realtor believes that the selling price of a home (in $ thousands) is related to the condition of
the home (on a 1 to 10 scale) and the size of the home (in hundreds of square feet). He runs the data below
on Minitab and gets the following.

Row  Price  Size  Condition
  1    360    23          5
  2    200    11          2
  3    340    20          9
  4    280    17          3
  5    280    15          8
  6    330    21          4
  7    380    24          7
  8    250    13          6

MTB > regress c1 2 c2 c3
Regression Analysis: Price versus Size, Condition
The regression equation is
Price = 64.5 + 11.7 Size + 4.88 Condition

Predictor     Coef  SE Coef      T      P
Constant    64.539    4.228  15.27  0.000
Size       11.7282   0.2317  50.62  0.000
Condition   4.8826   0.4494  _____  _____

S = 2.75997   R-Sq = 99.9%   R-Sq(adj) = 99.8%

Analysis of Variance
Source          DF     SS     MS        F      P
Regression       2  25712  12856  1687.70  0.000
Residual Error   5     38      8
Total            7  25750

Source  DF  Seq SS
Size     1   24813
Cond     1     899

The sum of the 'Price' column is 2420 and the sum of the squared numbers in the Price column is not needed.
The sum of the 'Size' column is 144 and the sum of the squared numbers in the Size column is 2750.
The sum of the 'Condition' column is 44 and the sum of the squared numbers in the Condition column is
284.
If Price is the dependent variable and Size and Condition are the independent variables, we have found that
the sum of x1y is 45540 and the sum of x1x2 is 818. The sum of x2y has not been computed.
6 and 7. In the multiple regression, are the coefficients of size and condition significant at the 5%
significance level? Give reasons. Do not do unneeded computations. (3)
[15]
Solution: 11.7282, the coefficient of 'Size,' has a p-value below .05 and thus must be significant.
For the coefficient of 'Condition,' we can compute $t = \frac{4.8826}{0.4494} = 10.865$. Our table says that $t_{.025}^{(5)} = 2.571$.
Since the computed value of t exceeds the table value, we reject the null hypothesis ($H_0: \beta_2 = 0$) and say that this
coefficient is significant.
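The blank T in the printout is the coefficient divided by its standard error. Here is a minimal Python check. The table value $t_{.025}^{(5)} = 2.571$ is typed in by hand rather than computed.

```python
coef_condition = 4.8826
se_condition = 0.4494
t_crit = 2.571   # t_.025 with n - k - 1 = 8 - 2 - 1 = 5 df, from a t table

t_stat = coef_condition / se_condition
significant = abs(t_stat) > t_crit   # two-sided test of H0: beta_2 = 0 at 5%
print(round(t_stat, 3), significant)   # 10.865 True
```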
8. Assuming that the coefficients in the multiple regression are correct, what price would we predict for a
home with 20(hundred) square feet and a condition score of 9? (1)
Price = 64.539 + 11.7282 Size + 4.8826 Condition = 64.539 + 11.7282(20) + 4.8826(9)
= 64.539 + 234.564 + 43.943 = 343.046 (thousand)
9. Using the information in the multiple regression printout, make your result in 8) into a rough prediction
interval. (2)
Solution: The outline says that an approximate prediction interval is $Y_0 = \hat{Y}_0 \pm t s_e$. Remember
$df = n - k - 1 = 8 - 2 - 1 = 5$. In the printout, $s_e = S = 2.75997 = \sqrt{MSE} \approx \sqrt{8}$. So
$Y_0 = 343.046 \pm t_{.025}^{(5)}(2.75997) = 343.046 \pm 2.571(2.75997) = 343.05 \pm 7.10$.
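Problems 8 and 9 reduce to one short computation. Plug the new home into the fitted equation, then widen the prediction by $t s_e$. This sketch follows the outline's rough interval only. An exact prediction interval would also add terms for sampling error in the coefficients, so it would be somewhat wider.

```python
b0, b_size, b_cond = 64.539, 11.7282, 4.8826   # coefficients from the Minitab printout
s_e = 2.75997                                   # S in the printout
t_025_5 = 2.571                                 # t_.025 with 5 df, from a t table

size, condition = 20, 9
y_hat = b0 + b_size * size + b_cond * condition
half_width = t_025_5 * s_e                      # rough interval: y_hat +/- t * s_e
print(round(y_hat, 3), round(half_width, 2))
print(round(y_hat - half_width, 2), round(y_hat + half_width, 2))
```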
10. Using the information in the printout, what is the value of R-squared for a regression of ‘Price’ against
‘Size’ alone? (2)
[20]
Solution: Looking at the sequential sums of squares, the regression sum of squares is 24813 for 'Size' alone.
The total sum of squares is 25750, so we have $R^2 = \frac{24813}{25750} = .9636$.
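Minitab entered 'Size' first, so its sequential sum of squares is the regression sum of squares for 'Size' alone. Adding the 'Cond' line recovers the full regression SS. Here is a small check that uses only numbers from the printout:

```python
sst = 25750
seq_ss = {"Size": 24813, "Cond": 899}   # Seq SS, in the order Minitab entered them

r2_size_alone = seq_ss["Size"] / sst
ssr_both = sum(seq_ss.values())          # should equal the Regression SS, 25712
r2_both = ssr_both / sst
print(round(r2_size_alone, 4), ssr_both, round(r2_both, 4))  # 0.9636 25712 0.9985
```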
11. Do a simple regression of 'Price' against 'Condition' alone. Before you do something ridiculous, see
252blunders!
a) Compute the sum $\sum xy$ that you will need for this regression. Show your work! (2)
Don't compute stuff that has already been done for you!
Solution: The only column that you should have computed is x2y (Condition times Price); every other column was already given in the exhibit.
Row  Price  Size  Cond     Ysq  x1sq  x2sq    x1y    x2y  x1x2
  1    360    23     5  129600   529    25   8280   1800   115
  2    200    11     2   40000   121     4   2200    400    22
  3    340    20     9  115600   400    81   6800   3060   180
  4    280    17     3   78400   289     9   4760    840    51
  5    280    15     8   78400   225    64   4200   2240   120
  6    330    21     4  108900   441    16   6930   1320    84
  7    380    24     7  144400   576    49   9120   2660   168
  8    250    13     6   62500   169    36   3250   1500    78
Sum   2420   144    44  757800  2750   284  45540  13820   818
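The sums in the table, including the x2y column, can be reproduced from the eight rows of Exhibit 1. Here is a plain-Python sketch. The dictionary keys simply mirror the column headings above.

```python
price = [360, 200, 340, 280, 280, 330, 380, 250]   # Y
size = [23, 11, 20, 17, 15, 21, 24, 13]            # x1
cond = [5, 2, 9, 3, 8, 4, 7, 6]                    # x2

sums = {
    "Y": sum(price),
    "x1": sum(size),
    "x2": sum(cond),
    "Ysq": sum(y * y for y in price),
    "x1sq": sum(x * x for x in size),
    "x2sq": sum(x * x for x in cond),
    "x1y": sum(x * y for x, y in zip(size, price)),
    "x2y": sum(x * y for x, y in zip(cond, price)),   # the only new column
    "x1x2": sum(a * b for a, b in zip(size, cond)),
}
print(sums)
```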
b) The exhibit says that you do not need to know the sum of squares in the Price column. You do,
however, need the spare part $SS_y = \sum Y^2 - n\bar{Y}^2$. Without doing any computing, tell
what its value is. (1)
Solution: The ANOVA in the computer output says that the total sum of squares is 25750.
Of course, if you like to waste time, $SS_y = \sum Y^2 - n\bar{Y}^2 = 757800 - 8\left(\frac{2420}{8}\right)^2 = 25750$ ($\bar{Y} = 302.5$).
c) Compute the coefficients of the equation $\hat{Y} = b_0 + b_1 x$ to predict the value of 'Price' on
the basis of 'Condition.' (4)
[27]
Solution: First copy $n = 8$, $\sum X = 44$, $\sum Y = 2420$, $\sum XY = 13820$ (you computed this), $\sum X^2 = 284$, and
$\sum Y^2$ is not needed. (It's 757800.)
Then compute means: $\bar{X} = \frac{\sum X}{n} = \frac{44}{8} = 5.5$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{2420}{8} = 302.5$.
The 'Spare Parts' are as follows: $SS_x = \sum X^2 - n\bar{X}^2 = 284 - 8(5.5)^2 = 42$.
You already found $SS_y = \sum Y^2 - n\bar{Y}^2 = 25750 = SST$ (Total Sum of Squares).
$S_{xy} = \sum XY - n\bar{X}\bar{Y} = 13820 - 8(5.5)(302.5) = 510$.
So $b_1 = \frac{S_{xy}}{SS_x} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{510}{42} = 12.14286$ and
$b_0 = \bar{Y} - b_1\bar{X} = 302.5 - 12.14286(5.5) = 235.71429$, which means $\hat{Y} = 235.714 + 12.143X$ or
$Y = 235.714 + 12.143X + e$.
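The spare-parts route from the sums to $b_0$ and $b_1$ translates almost line for line into code. This sketch starts from the sums copied above, so it is only as good as they are. It also carries full precision instead of rounding to five decimals.

```python
n = 8
sum_x, sum_y = 44, 2420                    # Condition (X) and Price (Y)
sum_xy, sum_x2, sum_y2 = 13820, 284, 757800

x_bar = sum_x / n
y_bar = sum_y / n
ss_x = sum_x2 - n * x_bar ** 2             # 42
ss_y = sum_y2 - n * y_bar ** 2             # 25750 = SST
s_xy = sum_xy - n * x_bar * y_bar          # 510

b1 = s_xy / ss_x
b0 = y_bar - b1 * x_bar
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")    # Y-hat = 235.714 + 12.143 X
```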
d) Compute $R^2$. (3)
Solution: $SSR = b_1 S_{xy} = 12.14286(510) = 6192.859$. We can say $R^2 = \frac{SSR}{SST} = \frac{6192.859}{25750} = .2405$,
or $R^2 = \frac{b_1 S_{xy}}{SS_y} = \frac{12.14286(510)}{25750} = .2405$, or $R^2 = \frac{S_{xy}^2}{SS_x SS_y} = \frac{510^2}{42(25750)} = .2405$.
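The three $R^2$ formulas are algebraically identical, so computing all of them is a cheap consistency check. At full precision SSR comes out as 6192.857. The 6192.859 above reflects the rounded $b_1$.

```python
ss_x, ss_y, s_xy = 42.0, 25750.0, 510.0   # spare parts from part c)
b1 = s_xy / ss_x

ssr = b1 * s_xy
r2_a = ssr / ss_y                      # SSR / SST
r2_b = b1 * s_xy / ss_y                # b1 * Sxy / SSy
r2_c = s_xy ** 2 / (ss_x * ss_y)       # Sxy^2 / (SSx * SSy)
print(round(ssr, 3), round(r2_a, 4), round(r2_b, 4), round(r2_c, 4))
```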
e) Is the slope of the simple regression significant at the 5% level? Do not answer this question
without appropriate calculations! (4)
Solution: We can compute $SSE = SST - SSR = 25750 - 6192.859 = 19557.141$. Then
$s_e^2 = \frac{SSE}{n-2} = \frac{19557.141}{6} = 3259.5235$, or $s_e^2 = \frac{SS_y - b_1 S_{xy}}{n-2} = \frac{25750 - 12.14286(510)}{6} = 3259.5235$, so
$s_e = \sqrt{3259.5235} = 57.0922$. So $s_{b_1}^2 = s_e^2\left(\frac{1}{SS_x}\right) = 3259.5235\left(\frac{1}{42}\right) = 77.6077$ and
$s_{b_1} = \sqrt{77.6077} = 8.80952$. The outline says to test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \ne 0$ using $t = \frac{b_1 - 0}{s_{b_1}}$; if the null
hypothesis is rejected, we say that $\beta_1$ is significant. If $\alpha = .05$, $t_{.025}^{(6)} = 2.447$, so our 'do not reject' zone is between
-2.447 and 2.447. Our calculated $t = \frac{12.14286 - 0}{8.80952} = 1.378$ is between these two
values, so we cannot reject the null hypothesis and must conclude that the coefficient is
insignificant.
f) Predict the price of an average home with a condition of 9 and make your estimate into an
appropriate 99% interval. (4)
Solution: We found $\hat{Y} = 235.714 + 12.143X = 235.714 + 12.143(9) = 345.001$. The outline says
that the Confidence Interval is $\mu_{Y_0} = \hat{Y}_0 \pm t s_{\hat{Y}}$, where $s_{\hat{Y}}^2 = s_e^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_x}\right)$.
We have $s_e^2 = 3259.5235$, $SS_x = 42$ and $\bar{X} = 5.5$. So $s_{\hat{Y}}^2 = 3259.5235\left(\frac{1}{8} + \frac{(9 - 5.5)^2}{42}\right)$
$= 3259.5235(0.12500 + 0.29167) = 3259.5235\left(\frac{5}{12}\right) = 1358.135$. We will use
$s_{\hat{Y}} = \sqrt{1358.135} = 36.8529$ and $t_{.005}^{(6)} = 3.707$. So $\mu_{Y_0} = \hat{Y}_0 \pm t s_{\hat{Y}} = 345.001 \pm 3.707(36.8529)$
$= 345.001 \pm 136.614$, or 208.387 to 481.615.
g) Do an analysis of variance using your SST, SSE and SSR for this equation or using 1,
$R^2$ and $1 - R^2$. What have you already done that makes this table redundant? If you
don't know what redundant means, ask! (3)
[43]
Solution: We actually have almost all of this done. We have already found $SS_y = 25750 = SST$,
$SSR = b_1 S_{xy} = 12.14286(510) = 6192.859$ and $SSE = SST - SSR = 25750 - 6192.859 = 19557.141$.
So our ANOVA table will be as below.
Source             SS  DF          MS      F  F.05
Regression   6192.859   1    6192.859  1.900  F.05(1,6) = 5.99
Error       19557.141   6   3259.5235
Total           25750   7

If we recall $R^2 = .2405$ for this regression, we can rewrite the table as below.
Source          R²  DF     'MS'      F  F.05
Regression  0.2405   1   0.2405  1.900  F.05(1,6) = 5.99
Error       0.7595   6  0.12658
Total       1.0000   7
Just for reassurance, here is the Minitab output
Regression Analysis: Price versus Condition
The regression equation is
Price = 236 + 12.1 Condition

Predictor    Coef  SE Coef     T      P
Constant   235.71    52.49  4.49  0.004
Condition  12.143    8.810  1.38  0.217

S = 57.0922   R-Sq = 24.0%   R-Sq(adj) = 11.4%

Analysis of Variance
Source          DF     SS    MS     F      P
Regression       1   6193  6193  1.90  0.217
Residual Error   6  19557  3260
Total            7  25750
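This is redundant because we have already shown that the coefficient of 'Condition' is
insignificant. Because there is only one independent variable, the F test is equivalent to the t test on the slope: $F = t^2 = (1.378)^2 \approx 1.90$, and both tests have the same p-value (0.217).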
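With one independent variable, the ANOVA F and the slope's t carry the same information. This quick sketch confirms $F = t^2$ from the spare parts and compares F with the table value $F_{.05}^{(1,6)} = 5.99$, which is typed in by hand.

```python
import math

n = 8
ss_x, sst, s_xy = 42.0, 25750.0, 510.0      # spare parts from part c)
f_crit = 5.99                                # F_.05 with (1, 6) df, from an F table

b1 = s_xy / ss_x
ssr = b1 * s_xy
sse = sst - ssr
msr = ssr / 1                                # 1 regression df
mse = sse / (n - 2)                          # 6 error df
f_stat = msr / mse
t_stat = b1 / math.sqrt(mse / ss_x)          # slope t from part e)
print(round(f_stat, 3), round(t_stat ** 2, 3), f_stat > f_crit)   # 1.9 1.9 False
```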
h) Using the information on Regression Sums of squares or $R^2$ and $1 - R^2$ in the
ANOVA that you just did and from the multiple regression, do an F test to see if adding
'Size' to the regression of 'Price' against 'Condition' is worthwhile. Do not waste our time by
repeating stuff that has already been done. (3)
[46]
Solution: In view of the original printout, we can rewrite our ANOVA tables above for the
multiple regression.
So our ANOVA table will be as below.
Source
SS
DF
MS
F
F
Regression
25912
2
12856
1687
Error
38
5
7.6
Total
25750
7
If we recall R 2  .9985 for this regression, we can rewrite the table as below.
Source
DF
‘MS’
F
F
R2
Regression
0.9985
2
0.49925
1642
Error
0.0015
5
0.0003
Total
1.0000
7
For the regression against ‘Condition’ alone, we had R 2  .2495 and SSR  6192 .859 . So that if
we itemize the regressions above, we get the tables below.
Source
SS
DF
MS
F
F
Regression
25912
2
Condition
6193
1
1,5   6.61
F.05
Size
19719
1
19719
2595
Error
38
5
7.6
Total
25750
7
If we use R 2 instead for this regression, we can rewrite the table as below.
Source
DF
‘MS’
F
F
R2
Regression
0.9985
2
Condition
0.2495
1
1,5   6.61
F.05
Size
0.7490
1
0.7490
2497
Error
0.0015
5
0.0003
Total
1.0000
7
In spite of the gigantic rounding error in the table using R 2 , the results are the same as in the ttest on the coefficient of ‘years’ in the second regression, the calculated F is far above the table F
so that adding ‘Size’ significantly improves the results.
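The incremental F test in h) can be sketched from the sums of squares alone. Carrying $R^2$ at full precision instead of four decimals makes the R-squared version agree with the SS version. That shows the discrepancy in the tables is rounding and nothing more. SSR for 'Condition' alone is taken here as $S_{xy}^2/SS_x = 510^2/42$.

```python
sst = 25750.0
ssr_full, sse_full, df_error_full = 25712.0, 38.0, 5   # Price on Size and Condition
ssr_cond = 510.0 ** 2 / 42.0                             # Price on Condition alone
f_crit = 6.61                                            # F_.05 with (1, 5) df, from a table

# Sums-of-squares version: extra SS for Size divided by MSE of the full model
f_ss = ((ssr_full - ssr_cond) / 1) / (sse_full / df_error_full)

# R-squared version, with R-squared carried at full precision
r2_full = ssr_full / sst
r2_cond = ssr_cond / sst
f_r2 = ((r2_full - r2_cond) / 1) / ((1 - r2_full) / df_error_full)

print(round(f_ss, 1), round(f_r2, 1), f_ss > f_crit)
```

For comparison, the square of the t ratio for 'Size' in the printout, $50.62^2 \approx 2562$, is close to this F. The small gap comes from rounding in the printed sums of squares.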
Exhibit 2 (Groebner) A product is being produced on 3 different lines using 3 different layouts for the
lines. A sample of 36 observations is taken on various days over a period of four weeks so that there are
12 observations of the daily output for each line, evenly divided between the three possible layouts. Assume
$\alpha = .05$.
MTB > Twoway c4 c2 c3;
SUBC>   Means c2 c3.
Two-way ANOVA: output 1 versus line, layout
Source        DF        SS       MS      F      P
line           2     187.1     93.5   0.22  0.804
layout         2   28263.4  14131.7  33.21  0.000
Interaction   __   _______    _____   ____  _____
Error         __   11489.0    425.5
Total         35   41874.6
S = 20.63   R-Sq = 72.56%   R-Sq(adj) = 64.43%

Individual 95% CIs For Mean Based on Pooled StDev
line      Mean
1      132.583
2      128.167
3      127.417
[Character plot of the three intervals on an axis from 120.0 to 144.0]

Individual 95% CIs For Mean Based on Pooled StDev
layout    Mean
1      116.667
2      168.250
3      103.250
[Character plot of the three intervals on an axis from 100 to 175]
12. Fill in the missing degrees of freedom, the missing sum of squares and the missing mean square. (2)
Solution: We find the interaction degrees of freedom by multiplying the degrees of freedom for the factors that
interact: 2 × 2 = 4. The error degrees of freedom are whatever is needed to make the DF column add up:
35 - 2 - 2 - 4 = 27. Similarly, the interaction sum of squares is whatever makes the SS column add up:
41874.6 - 187.1 - 28263.4 - 11489.0 = 1935.1. Each MS is SS divided by DF, so the interaction mean square
is 1935.1/4 = 483.78. The corrected table reads as below.
Two-way ANOVA: output 1 versus line, layout
Source        DF        SS       MS      F      P
line           2     187.1     93.5   0.22  0.804
layout         2   28263.4  14131.7  33.21  0.000
Interaction    4    1935.1   483.78  1.136  _____
Error         27   11489.0    425.5
Total         35   41874.6
S = 20.63   R-Sq = 72.56%   R-Sq(adj) = 64.43%
13. Is there significant interaction between 'line' and 'layout'? Don't answer unless you can tell me what the
evidence is. (2)
Solution: We can look up $F_{.05}^{(4,27)} = 2.73$. It is larger than the computed F of 1.136. This means that we
cannot reject the null hypothesis that the interaction is insignificant.
14. Is the difference between lines significant? Why? (1)
Solution: We can look up $F_{.05}^{(2,27)} = 3.35$ if we like to work. It is larger than the computed F of 0.22. Or we
can simply note that since the p-value of 0.804 is well above any significance level that we are likely to use,
we cannot reject the null hypothesis that the difference between the line means is insignificant.
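The fill-in rules in 12 and the F comparisons in 13 and 14 can be scripted as follows. The critical values 2.73 and 3.35 are typed in from an F table, not computed. The dictionaries are just a convenient layout, not any particular library's ANOVA object.

```python
df = {"line": 2, "layout": 2, "total": 35}
ss = {"line": 187.1, "layout": 28263.4, "error": 11489.0, "total": 41874.6}

# Interaction df is the product; error df and interaction SS make the columns add up
df["interaction"] = df["line"] * df["layout"]
df["error"] = df["total"] - df["line"] - df["layout"] - df["interaction"]
ss["interaction"] = ss["total"] - ss["line"] - ss["layout"] - ss["error"]

mse = ss["error"] / df["error"]
f_crit = {"line": 3.35, "layout": 3.35, "interaction": 2.73}   # F_.05 values from a table
f_stats = {k: (ss[k] / df[k]) / mse for k in f_crit}
for k, f in f_stats.items():
    print(f"{k}: F = {f:.2f}, significant at 5%: {f > f_crit[k]}")
```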
15. Do a confidence interval of your choice for the difference between layout 1 and layout 3. Tell what kind
of interval you are using, what its characteristics are and whether it shows a significant difference. (4) [55]
Solution: The outline gives us a choice. There are 3 rows, 3 columns and 4 observations per cell: $R = 3$,
$C = 3$, $P = 4$, $\bar{x}_{1..} = 116.667$, $\bar{x}_{3..} = 103.250$ and $RC(P-1) = 3(3)(3) = 27$. The error (within) mean square is
$MSW = 425.5$. $\bar{x}_{1..} - \bar{x}_{3..} = 116.667 - 103.250 = 13.417$ and
$\sqrt{\frac{2MSW}{PC}} = \sqrt{\frac{2(425.5)}{12}} = \sqrt{70.9167} = 8.4212$.
i. A Single Confidence Interval
If we desire a single interval, we use the formula for a Bonferroni Confidence Interval below with
$m = 1$. For rows this would be $\mu_1 - \mu_3 = (\bar{x}_1 - \bar{x}_3) \pm t_{\alpha/2}^{(RC(P-1))}\sqrt{\frac{2MSW}{PC}}$
$= 13.417 \pm t_{.025}^{(27)}(8.4212) = 13.417 \pm 2.052(8.4212) = 13.417 \pm 17.280$.
ii. Scheffé Confidence Interval
If we desire intervals that will simultaneously be valid for a given confidence level for all possible
intervals between means, for row means, we use
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm \sqrt{(R-1)F_{\alpha}^{(R-1,\,RC(P-1))}}\sqrt{\frac{2MSW}{PC}}$, which here is
$13.417 \pm \sqrt{2F_{.05}^{(2,27)}}(8.4212) = 13.417 \pm \sqrt{2(3.35)}(8.4212) = 13.417 \pm 21.798$.
iii. Bonferroni Confidence Interval
If we use this for row means, $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/(2m)}^{(RC(P-1))}\sqrt{\frac{2MSW}{PC}}$, but it is usually
impractical.
iv. Tukey Confidence Interval
This is of similar meaning to the Scheffé. For row means, we use
$\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm q_{\alpha}^{(R,\,RC(P-1))}\sqrt{\frac{MSW}{PC}}$. Here $\sqrt{\frac{MSW}{PC}} = \sqrt{\frac{425.5}{12}} = 5.9547$, so the interval is
$13.417 \pm q_{.05}^{(3,27)}(5.9547) = 13.417 \pm 3.51(5.9547) = 13.417 \pm 20.901$. As expected, the Tukey
interval is narrower than the Scheffé interval.
Part of the Tukey table appears below.
                   m =  2     3     4     5     6     7     8     9    10
df = 24  α = .05     2.92  3.53  3.90  4.17  4.37  4.54  4.68  4.81  4.92
         α = .01     3.96  4.55  4.91  5.17  5.37  5.54  5.69  5.81  5.92
df = 30  α = .05     2.89  3.49  3.85  4.10  4.30  4.46  4.60  4.72  4.82
         α = .01     3.89  4.45  4.80  5.05  5.24  5.40  5.54  5.65  5.76
It looks to me as if $q^{(3,27)}$ ought to be halfway between $q_{.05}^{(3,24)} = 3.53$ and $q_{.05}^{(3,30)} = 3.49$, so
$q^{(3,27)} \approx 3.51$. Note that, since all of these intervals include zero, none of these differences is
significant.
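The three intervals in 15 can be recomputed in a few lines. This sketch takes the difference directly from the printed layout means 116.667 and 103.250. It uses $\sqrt{2MSW/(PC)}$ for the t and Scheffé intervals and $\sqrt{MSW/(PC)}$ for Tukey's q. The table values 2.052 and 3.35 and the interpolated 3.51 are typed in by hand.

```python
import math

msw = 425.5                 # within (error) mean square
p, c = 4, 3                 # observations per cell and columns, so PC = 12
diff = 116.667 - 103.250    # layout 1 mean minus layout 3 mean

se_two = math.sqrt(2 * msw / (p * c))   # sqrt(2 MSW / PC), used by t and Scheffe
se_one = math.sqrt(msw / (p * c))       # sqrt(MSW / PC), used by Tukey's q

half_widths = {
    "single": 2.052 * se_two,                  # t_.025 with 27 df
    "Scheffe": math.sqrt(2 * 3.35) * se_two,   # sqrt((R - 1) F_.05(2, 27))
    "Tukey": 3.51 * se_one,                    # q_.05(3, 27), interpolated
}
for name, h in half_widths.items():
    print(f"{name}: {diff:.3f} +/- {h:.3f}, includes zero: {diff - h < 0 < diff + h}")
```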
16. (Groebner) An industrial firm analyses the amount of breakage (in dollar cost) that occurs using 3
different shipping methods. There is a strong likelihood that the data do not come from the Normal
distribution. The purpose of the test is to see if the three shipping methods differ in breakage. The columns
can be considered random samples.
Rail  Plane  Truck
7960   8053   8818
8399   7764   9432
9429   9196   9260
6022   5821   5676
The most appropriate method for doing this test is:
a) The Friedman Test
b) *The Kruskal-Wallis Test
c) One-way ANOVA
d) Two-way ANOVA
e) The sign test
f) Another test (Name it!)
[57]
17. Assume that your decision is correct in 16. What is your null hypothesis or hypotheses? Be specific! Are
you talking about rows or columns or both? Are you comparing means, medians, proportions or variances?
Solution: The null hypothesis is that the medians of the columns are equal.
18. OK. Let's see you do the test. (4)
Ranks of the 12 observations, from smallest (1) to largest (12):
     Rail  Plane  Truck
        5      6      8
        7      4     12
       11      9     10
        3      2      1
Sum    26     21     31
[63]
Check: the sum of the first n numbers is $\frac{n(n+1)}{2} = \frac{12(13)}{2} = 78$, and 26 + 21 + 31 = 78.
Now, compute the Kruskal-Wallis statistic:
$H = \left[\frac{12}{n(n+1)}\sum_i \frac{SR_i^2}{n_i}\right] - 3(n+1) = \left[\frac{12}{12(13)}\left(\frac{26^2}{4} + \frac{21^2}{4} + \frac{31^2}{4}\right)\right] - 3(13) = \frac{1}{13}\left(\frac{676 + 441 + 961}{4}\right) - 39 = \frac{1}{13}(519.50) - 39 = 0.9615$
The 4,4,4 part of the K-W table follows. We can only conclude that the p-value is above .104. This value is
above any commonly used significance level, so we cannot reject our null hypothesis.
n1  n2  n3        H      p
 4   4   4   7.6538   .008
             7.5385   .011
             5.6923   .049
             5.6538   .054
             4.6539   .097
             4.5001   .104
Minitab output follows for reassurance.
MTB > Kruskal-Wallis c1 c2.
Alternate 3
Kruskal-Wallis Test: breakage versus mode
Kruskal-Wallis Test on breakage
                        Ave
mode      N  Median    Rank      Z
plane     4    7909     5.3  -0.85
rail      4    8180     6.5   0.00
truck     4    9039     7.8   0.85
Overall  12             6.5
H = 0.96  DF = 2  P = 0.618
* NOTE * One or more small samples
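The Kruskal-Wallis computation in 18 can be reproduced from the raw breakage figures. This sketch assumes, as is true here, that all 12 values are distinct. Plain ranks therefore suffice and no tie correction is needed.

```python
rail = [7960, 8399, 9429, 6022]
plane = [8053, 7764, 9196, 5821]
truck = [8818, 9432, 9260, 5676]
groups = {"Rail": rail, "Plane": plane, "Truck": truck}

pooled = sorted(v for g in groups.values() for v in g)
assert len(set(pooled)) == len(pooled), "plain ranks assume no tied values"
rank = {v: i + 1 for i, v in enumerate(pooled)}   # rank 1 = smallest breakage
n = len(pooled)

rank_sums = {name: sum(rank[v] for v in g) for name, g in groups.items()}
weighted = sum(r ** 2 / len(groups[name]) for name, r in rank_sums.items())
h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
print(rank_sums, round(h, 4))   # {'Rail': 26, 'Plane': 21, 'Truck': 31} 0.9615
```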
ECO252 QBA2
THIRD EXAM
Nov 26-29, 2007
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Class days and time : _________________________
Please Note: Computer problems 2 and 3 should be turned in with the exam (2). In problem 2, the 2 way
ANOVA table should be checked. The three F tests should be done with a 1% significance level and you
should note whether there was (i) a significant difference between drivers, (ii) a significant difference
between cars and (iii) significant interaction. In problem 3, you should show on your third graph where the
regression line is. You should explain whether the coefficients are significant at the 1% level. Check what
your text says about normal probability plots and analyze the plot you did. Explain the results of the t and F
tests using a 5% significance level. (3)
III Do the following. (22+ points) Note: Look at 252thngs (252thngs) on the syllabus supplement part of
the website before you start (and before you take exams). Show your work! State $H_0$ and $H_1$ where
appropriate. You have not done a hypothesis test unless you have stated your hypotheses, run the
numbers and stated your conclusion. (Use a 95% confidence level unless another level is specified.)
Answers without reasons or accompanying calculations usually are not acceptable. Neatness and
clarity of explanation are expected. This must be turned in when you take the in-class exam. Note
that from now on neatness means paper neatly trimmed on the left side if it has been torn, multiple
pages stapled and paper written on only one side. Show your work!
1) The Lees, in their book on statistics for Finance majors, ask about the relationship of gasoline prices (y)
in cents per gallon to crude oil prices ($x_1$) in dollars per barrel and present the data for the years 1975-1988. I have obtained most of the data for the years 1980 – 2007. It is presented below.
Row  Year  GasPrice  CrudePrice  Yr-1979
  1  1980      1.25       26.07        1
  2  1981      1.38       35.24        2
  3  1982      1.30       31.87        3
  4  1983      1.24       26.99        4
  5  1984      1.21       28.63        5
  6  1985      1.20       26.25        6
  7  1986      0.93       14.55        7
  8  1987      0.95       17.90        8
  9  1988      0.96       14.67        9
 10  1989      1.02       17.97       10
 11  1990      1.16       22.22       11
 12  1991      1.14       19.06       12
 13  1992      1.13       18.43       13
 14  1993      1.11       16.41       14
 15  1994      1.11       15.59       15
 16  1995      1.15       17.23       16
 17  1996      1.23       20.71       17
 18  1997      1.23       19.04       18
 19  1998      1.06       12.52       19
 20  1999      1.17       17.51       20
 21  2000      1.51       28.26       21
 22  2001      1.46       22.95       22
 23  2002      1.36       24.10       23
 24  2003      1.59       28.53       24
 25  2004      1.88       36.98       25
 26  2005      2.30       50.23       26
 27  2006         *           *       27
 28  2007      3.10       90.00       28
This data set also contains the year with 1979 subtracted from it ($x_2$). You may need to use this later.
Ignore it in Problem 1. Note that the numbers for 2006 have not yet been published in my source, Statistical
Abstract of the United States, and that the numbers for 2007 are my estimates for third quarter prices. These
are unleaded prices, which the Lees did not use. You are supposed to use only the numbers for 1990
through 2006 and one other observation for your data. You will thus have n = 17 observations. The other
observation is the value for the year 1980 + a, where a is the second to last digit of your student number. If
you are unsure of the data that you are using or if you want help with the sums that you need to do the
regression, go to 3takehome072a.
Show your work – it is legitimate to check your results by running the problem on the computer. (In fact, I
will give you 2 points extra credit for checking it and annotating the output for significance tests etc.) But I
expect to see hand computations for every part of this problem.
a. Compute the regression equation Y  b0  b1 x to predict the price of gasoline on the basis of
crude oil prices. (3)
b. Compute R 2 . (2)
c. Compute s e . (2)
d. Compute s b1 and do a significance test on b1 (2)
e. Compute a confidence interval for b0 . (2)
f. You have a crude price for 2007. Using this, predict the gasoline price for 2007 and create a
prediction interval for the price of gasoline for that year. Explain why a confidence interval for the
price is inappropriate and check to see if my estimated price is in the interval. (3)
g. Do an ANOVA for this regression. (3)
h. Make a graph of the data. Show the trend line and the data points clearly. If you are not willing
to do this neatly and accurately, don’t bother. (2)
[19]
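A minimal Python sketch of the hand formulas for parts a through g, meant only for checking your arithmetic. Assumptions: A_DIGIT is a placeholder for a; the problem states no significance level, so α = .05 and the t table value 2.131 for 15 degrees of freedom are assumed; the confidence interval at $x_0$ is for the mean of Y, while the prediction interval is for a single new Y.

```python
import math

# (gas price, crude price) by year, copied from the data table above
DATA = {
    1980: (1.25, 26.07), 1981: (1.38, 35.24), 1982: (1.30, 31.87),
    1983: (1.24, 26.99), 1984: (1.21, 28.63), 1985: (1.20, 26.25),
    1986: (0.93, 14.55), 1987: (0.95, 17.90), 1988: (0.96, 14.67),
    1989: (1.02, 17.97), 1990: (1.16, 22.22), 1991: (1.14, 19.06),
    1992: (1.13, 18.43), 1993: (1.11, 16.41), 1994: (1.11, 15.59),
    1995: (1.15, 17.23), 1996: (1.23, 20.71), 1997: (1.23, 19.04),
    1998: (1.06, 12.52), 1999: (1.17, 17.51), 2000: (1.51, 28.26),
    2001: (1.46, 22.95), 2002: (1.36, 24.10), 2003: (1.59, 28.53),
    2004: (1.88, 36.98), 2005: (2.30, 50.23),
}
A_DIGIT = 0      # hypothetical; substitute your own digit a
T_CRIT = 2.131   # t table, df = 17 - 2 = 15, alpha = .05 two-sided (assumed level)
X_2007 = 90.00   # estimated 2007 crude price from the table


def simple_regression(a):
    """Least-squares fit of gas price on crude price from raw sums."""
    years = [1980 + a] + list(range(1990, 2006))
    xs = [DATA[t][1] for t in years]
    ys = [DATA[t][0] for t in years]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar
    sxx = sum(x * x for x in xs) - n * xbar ** 2
    sst = sum(y * y for y in ys) - n * ybar ** 2
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ssr = b1 * sxy            # regression sum of squares
    sse = sst - ssr           # error sum of squares
    mse = sse / (n - 2)       # s_e squared
    return {
        "n": n, "xbar": xbar, "sxx": sxx, "b0": b0, "b1": b1,
        "R2": ssr / sst, "se": math.sqrt(mse),
        "s_b1": math.sqrt(mse / sxx),
        "s_b0": math.sqrt(mse * (1 / n + xbar ** 2 / sxx)),
        "SSR": ssr, "SSE": sse, "SST": sst,
        "F": ssr / mse,       # ANOVA F = MSR / MSE with 1 and n - 2 df
    }


def intervals_at(fit, x0, t=T_CRIT):
    """Point estimate, confidence interval and prediction interval at x0."""
    y0 = fit["b0"] + fit["b1"] * x0
    h = 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]
    half_ci = t * fit["se"] * math.sqrt(h)       # for the mean of Y at x0
    half_pi = t * fit["se"] * math.sqrt(1 + h)   # for one new Y at x0
    return y0, (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)


if __name__ == "__main__":
    fit = simple_regression(A_DIGIT)
    print(f"Y-hat = {fit['b0']:.4f} + {fit['b1']:.4f}x   R^2 = {fit['R2']:.4f}")
    print(f"s_e = {fit['se']:.4f}   s_b1 = {fit['s_b1']:.5f}")
    print(f"t for b1 = {fit['b1'] / fit['s_b1']:.3f}, compare with {T_CRIT}")
    lo_b0 = fit["b0"] - T_CRIT * fit["s_b0"]
    hi_b0 = fit["b0"] + T_CRIT * fit["s_b0"]
    print(f"CI for b0: ({lo_b0:.4f}, {hi_b0:.4f})")
    y0, _, pi = intervals_at(fit, X_2007)
    print(f"2007: Y-hat = {y0:.3f}, PI = ({pi[0]:.3f}, {pi[1]:.3f})")
    print(f"ANOVA: SSR = {fit['SSR']:.4f}, SSE = {fit['SSE']:.4f}, F = {fit['F']:.2f}")
```

In simple regression the ANOVA F equals the square of the t ratio for $b_1$, which is a quick consistency check on your hand work.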
2) Now we can use the date to see if there is a trend line in addition to the effect of crude oil.
a. Do a multiple regression of the price of gasoline against crude prices and the date variable,
which has been massaged to make 1980 year 1. This involves a simultaneous equation solution.
Attempting to recycle $b_1$ from the previous page won’t work. (7)
b. Compute the regression sum of squares and use it in an ANOVA F test to test the usefulness of
this regression. (4)
c. Compute $R^2$ and $R^2$ adjusted for degrees of freedom for both this and the previous problem.
Compare the values of $R^2$ adjusted between this and the previous problem. Use an F test to
compare $R^2$ here with the $R^2$ from the previous problem. This F test checks whether adding a
new independent variable improves the regression. It can also be done by modifying the
ANOVAs in b. (4)
d. Use your regression to predict the price of gasoline in 2007. Is this closer to the estimated
gasoline price than the prediction from Problem 1? Do a confidence interval and a prediction interval. (3)
[37]
e. Again there is extra credit (2) for checking your results on the computer. Use the pull-down menu or
try the command
Regress GasPrice on 2 CrudePrice Yr-1979
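The simultaneous equations in part a can be solved in deviation form. The Python sketch below is one way to check the hand solution, not a prescribed method: it solves the two normal equations by Cramer's rule, takes $x_2$ as the year minus 1979, and computes $R^2$, adjusted $R^2$, the ANOVA F and the F for adding $x_2$. A_DIGIT is a hypothetical placeholder for a; critical values still come from an F table.

```python
# (gas price, crude price) by year, copied from the data table above
DATA = {
    1980: (1.25, 26.07), 1981: (1.38, 35.24), 1982: (1.30, 31.87),
    1983: (1.24, 26.99), 1984: (1.21, 28.63), 1985: (1.20, 26.25),
    1986: (0.93, 14.55), 1987: (0.95, 17.90), 1988: (0.96, 14.67),
    1989: (1.02, 17.97), 1990: (1.16, 22.22), 1991: (1.14, 19.06),
    1992: (1.13, 18.43), 1993: (1.11, 16.41), 1994: (1.11, 15.59),
    1995: (1.15, 17.23), 1996: (1.23, 20.71), 1997: (1.23, 19.04),
    1998: (1.06, 12.52), 1999: (1.17, 17.51), 2000: (1.51, 28.26),
    2001: (1.46, 22.95), 2002: (1.36, 24.10), 2003: (1.59, 28.53),
    2004: (1.88, 36.98), 2005: (2.30, 50.23),
}
A_DIGIT = 0  # hypothetical; substitute your own digit a


def mean(v):
    return sum(v) / len(v)


def centered(u, v):
    """Sum of u*v minus n * mean(u) * mean(v), as in the hand method."""
    return sum(p * q for p, q in zip(u, v)) - len(u) * mean(u) * mean(v)


def two_variable_fit(a):
    years = [1980 + a] + list(range(1990, 2006))
    y = [DATA[t][0] for t in years]
    x1 = [DATA[t][1] for t in years]   # crude price
    x2 = [t - 1979 for t in years]     # the Yr-1979 column
    n, k = len(y), 2

    s11, s22, s12 = centered(x1, x1), centered(x2, x2), centered(x1, x2)
    s1y, s2y, syy = centered(x1, y), centered(x2, y), centered(y, y)

    # Normal equations in deviation form, solved by Cramer's rule:
    #   s1y = b1 * s11 + b2 * s12
    #   s2y = b1 * s12 + b2 * s22
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

    ssr = b1 * s1y + b2 * s2y          # SSR with both variables
    ssr_1 = s1y ** 2 / s11             # SSR with crude price only (Problem 1)
    sse = syy - ssr
    mse = sse / (n - k - 1)
    r2, r2_1 = ssr / syy, ssr_1 / syy
    return {
        "b": (b0, b1, b2),
        "R2": r2, "R2_adj": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "R2_1": r2_1, "R2_adj_1": 1 - (1 - r2_1) * (n - 1) / (n - 2),
        "F": (ssr / k) / mse,            # usefulness: df k and n - k - 1
        "F_added": (ssr - ssr_1) / mse,  # does x2 help? df 1 and n - k - 1
    }


if __name__ == "__main__":
    fit = two_variable_fit(A_DIGIT)
    b0, b1, b2 = fit["b"]
    print(f"Y-hat = {b0:.4f} + {b1:.4f}x1 + {b2:.4f}x2")
    print(f"R^2 = {fit['R2']:.4f} (adj {fit['R2_adj']:.4f}); "
          f"Problem 1: {fit['R2_1']:.4f} (adj {fit['R2_adj_1']:.4f})")
    print(f"F = {fit['F']:.2f}, F for adding x2 = {fit['F_added']:.2f}")
    # 2007 point prediction: x1 = 90.00, x2 = 2007 - 1979 = 28
    print(f"2007 prediction: {b0 + b1 * 90.00 + b2 * 28:.3f}")
```

The F for adding $x_2$ is the same number you get from $\frac{(R^2 - R^2_1)/1}{(1 - R^2)/(n-3)}$, where $R^2_1$ is the value from Problem 1.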
3) According to Russell Langley, three sopranos were discussing their recent performances. Fifi noted that
she got 36 curtain calls at La Scala last week, but Adelina put her down with the fact that she got 39. Could
one of the singers really say that she had more curtain calls than another or could the differences just be due
to chance?
Personalize the data below by adding the last digit of your student number to each number in the
first row. Use a 10% significance level throughout this question.
Row  Fifi  Adelina  Maria
1    36    39       21
2    22    14       32
3    19    20       28
4    16    18       22
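Personalizing the table is simple arithmetic; a minimal Python sketch follows. The last digit 3 in the usage line is hypothetical, and the row means are included only because they may help in judging which opera house was least enthusiastic.

```python
# Curtain calls: rows are performances (row 1 = La Scala), columns are singers
SINGERS = ("Fifi", "Adelina", "Maria")
TABLE = [
    [36, 39, 21],
    [22, 14, 32],
    [19, 20, 28],
    [16, 18, 22],
]


def personalize(table, last_digit):
    """Add the last digit of the student number to every entry in row 1 only."""
    return [[v + last_digit for v in table[0]]] + [list(row) for row in table[1:]]


def column_means(table):
    """Mean curtain calls for each singer (column)."""
    return [sum(col) / len(col) for col in zip(*table)]


def row_means(table):
    """Mean curtain calls for each row (performance)."""
    return [sum(row) / len(row) for row in table]


if __name__ == "__main__":
    data = personalize(TABLE, 3)  # 3 is a hypothetical last digit
    for name, m in zip(SINGERS, column_means(data)):
        print(f"{name}: {m:.2f}")
    print("Row means:", [round(m, 2) for m in row_means(data)])
```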
a) State your hypotheses and use a method to compare means, assuming that each column represents a
random sample of curtain calls at La Scala. (4)
b) Still assuming that these are random samples, use a method that compares medians instead. (3)
c) Actually, these were not random samples. Row 1 represents curtain calls at La Scala (Milan), but row
2 was in Venice, row 3 in Naples and row 4 in Rome. Will this affect our results? Does this show anything
about audiences in the four cities? Use an appropriate method to compare means. (5)
d) Construct two different types of confidence intervals for the difference in mean curtain calls between
Milan and the least enthusiastic opera house. Explain the difference between the intervals. (2)
e) Assume that we want to compare medians instead. How does the fact that these data were collected at
four opera houses affect the results? (3)
f) Do you prefer the methods that compare medians or means? Don’t answer this unless you can
demonstrate an informed opinion. (1)
g) (Extra credit) Do a Levene test on these data and explain what it tests and shows. (3)
h) (Extra credit) Check your work on the computer. This is pretty easy to do. Use the same format as in
Computer Problem 2, but instead of car and driver numbers use the singers’ and cities’ names. You can use
the Stat and ANOVA pull-down menus for one-way ANOVA, two-way ANOVA and comparison of
variances of the columns. You can use the Stat and Nonparametrics pull-down menus for Friedman and
Kruskal-Wallis. You also ought to test the columns for Normality. Use the Stat pull-down menu
and Basic Statistics to find the normality tests. The Kolmogorov-Smirnov option is actually Lilliefors. The
ANOVA menu can check for equality of variances. In light of these tests, was ANOVA appropriate? You
can get descriptions of unfamiliar tests by using the Help menu and the alphabetic command list or the Stat
guide. (Up to 7) [58]
You should note conclusions on the printout – tell what was tested and what your conclusions are using a
10% significance level.
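The test statistics for parts a), b), c) and e) can also be cross-checked with a short Python sketch. It is a minimal illustration under stated assumptions, not the course's prescribed procedure: it uses the textbook formulas, gives tied values the average of their ranks, applies no tie correction to the Kruskal-Wallis H, and LAST_DIGIT is a hypothetical placeholder. Critical values at the 10% level still have to come from the F, chi-square or Kruskal-Wallis tables.

```python
# Curtain-call table (rows: Milan, Venice, Naples, Rome; columns: singers)
TABLE = [
    [36, 39, 21],
    [22, 14, 32],
    [19, 20, 28],
    [16, 18, 22],
]
LAST_DIGIT = 0  # hypothetical; row 1 gets your last digit added


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def one_way_f(table):
    """One-way ANOVA F treating each column as an independent sample."""
    cols = [list(c) for c in zip(*table)]
    allv = [v for c in cols for v in c]
    n, k = len(allv), len(cols)
    grand = sum(allv) / n
    sst = sum((v - grand) ** 2 for v in allv)
    ssb = sum(len(c) * (sum(c) / len(c) - grand) ** 2 for c in cols)
    return (ssb / (k - 1)) / ((sst - ssb) / (n - k))


def kruskal_wallis_h(table):
    """Kruskal-Wallis H (no tie correction) comparing the columns."""
    cols = [list(c) for c in zip(*table)]
    allv = [v for c in cols for v in c]
    ranks = average_ranks(allv)
    n, total, start = len(allv), 0.0, 0
    for c in cols:
        total += sum(ranks[start:start + len(c)]) ** 2 / len(c)
        start += len(c)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def two_way_f(table):
    """Two-way ANOVA, one observation per cell: (F for columns, F for rows)."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    sst = sum((v - grand) ** 2 for row in table for v in row)
    ss_cols = r * sum((sum(col) / r - grand) ** 2 for col in zip(*table))
    ss_rows = c * sum((sum(row) / c - grand) ** 2 for row in table)
    mse = (sst - ss_cols - ss_rows) / ((r - 1) * (c - 1))
    return (ss_cols / (c - 1)) / mse, (ss_rows / (r - 1)) / mse


def friedman_chi2(table):
    """Friedman statistic: rank within each row (city), compare column rank sums."""
    r, c = len(table), len(table[0])
    col_sums = [sum(col) for col in zip(*(average_ranks(row) for row in table))]
    return 12 / (r * c * (c + 1)) * sum(s ** 2 for s in col_sums) - 3 * r * (c + 1)


if __name__ == "__main__":
    data = [[v + LAST_DIGIT for v in TABLE[0]]] + TABLE[1:]
    print(f"one-way F = {one_way_f(data):.3f}")
    print(f"Kruskal-Wallis H = {kruskal_wallis_h(data):.3f}")
    f_cols, f_rows = two_way_f(data)
    print(f"two-way F: singers {f_cols:.3f}, cities {f_rows:.3f}")
    print(f"Friedman chi-square = {friedman_chi2(data):.3f}")
```

Ranking within rows is what lets the Friedman test, like the two-way ANOVA, remove the differences between the four cities before the singers are compared.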