252y0343 1/12/04
ECO252 QBA2
Name KEY
FINAL EXAM
Hour of Class Registered _______
DEC 11, 2003
I. (25+ points) Do all the following. Note that answers without reasons receive no credit. Most answers
require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing
a table value or a p-value.
The fourth computer problem involved the regression of the Y variable below against some, but not all,
of the X variables.
Variables in Data Set

Column   Variable   Description
C2       X2         Type (1 = Private, 0 = Public)
C3       X1         First Quartile SAT
C4       X5         3rd Quartile SAT
C5       X4         Room and Board Cost
C6       Y          Annual Total Cost
C7       X6         Average Indebtedness at Graduation
C8       X3         Interaction = X1 * X2
You were directed to hand in the computer output (4 points) and your answer to Problem 14.37 or the
equivalent problem in the 8th edition (up to 7 points). I ran the same problem you did, but went on to add
X4 and X5 to the input. The output appears on pages 1-7.
My first two regressions were stepwise regressions. The second stepwise regression is set up to force the
dummy variable designating 'type of university' into the equation. This means that the first equation in
regression 2 is essentially $\hat{Y} = b_0 + b_2 X_2$.
a) According to the first regression in regression 2, what are the mean annual total costs for public
and private universities, and how does the printout show us that they are significantly different? (2)
b) Regressions 3 and 4 are the regressions you supposedly did. According to this regression, for a
public university the constant in the regression equation is $b_0 = 1013$ and the slope, relative to the first
quartile SAT, is $b_1 = 11.339$. The equation relating annual total costs to the first quartile SAT for a private
university effectively has both a different intercept and a different slope; what is the equation? Are the intercepts
and slopes for public and private universities in fact significantly different? What tells us this? (3) (Extra credit:
at what SAT level do public and private universities have the same cost? (2))
c) Regression 6 should be the best of all the regressions, because it has the most independent
variables and the highest R-squared, but it isn't. (i) Look at the coefficients of the independent variables and
ignore their significance; one of those coefficients is wildly unreasonable. Which one is it? (1) (ii)
Which coefficients are significant at the 1% level, and why? (2) What about the 10% level? (1) Compare the
adjusted R-squares with the other regressions; what do they tell us? (1) Look at the VIFs; what do they
imply? (2)
d) Do an F test to tell whether adding X3, X4 and X5 as a package to equation 3 (which has only X1 and
X2) was useful. What is your conclusion? (4)
e) I didn’t follow directions when I did a prediction interval for equation 3, so it should disagree
with yours. I added some guesses as to (median?) values for X3, X4 and X5. What does the printout say I
used? What would you expect should happen to the size of the prediction interval if our addition of new
variables gives us a better estimate of Y? Did it happen? Cite numbers.(3)
f) Use the method suggested in the text, using the standard error $s_e$, to compute a prediction
interval for the same values of the independent variables and equation 3. How accurate is it? (3) 32
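For part d), everything the F test needs is already in the printed ANOVA tables below: regression 3 (X1 and X2 only) has SSE = 596492779 with 77 degrees of freedom, and regression 6 (which adds X3, X4 and X5) has SSE = 561087507 with 74. The short Python sketch that follows is not part of the exam; it only shows that arithmetic, with scipy supplying the 5% critical value in place of an F table.

from scipy import stats

sse_reduced, df_reduced = 596492779.0, 77   # regression 3: X1 and X2 only
sse_full, df_full = 561087507.0, 74         # regression 6: adds X3, X4, X5
added = df_reduced - df_full                # three variables added as a package

F = ((sse_reduced - sse_full) / added) / (sse_full / df_full)
crit = stats.f.ppf(0.95, added, df_full)    # 5% critical value of F(3, 74)
print(round(F, 2), round(crit, 2))          # F is about 1.56, below the critical value (about 2.7)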
————— 12/5/2003 7:15:10 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive
D\MINITAB\Colleges2002.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My
Documents\Drive D\MINITAB\Colleges2002.MTW
# Worksheet was saved on Fri Dec 05 2003
Results for: Colleges2002.MTW
MTB > Stepwise c6 c3 c2 c8 c5 c4;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.
1) Stepwise Regression: Annual Total versus First quarti, Type of Scho, ...
Alpha-to-Enter: 0.15
Alpha-to-Remove: 0.15
Response is Annual T on 5 predictors, with N = 80

Step            1        2
Constant    12198    -1021

inter       10.42     8.35
T-Value     17.40    11.97
P-Value     0.000    0.000

First qu              13.3
T-Value               4.62
P-Value              0.000

S            3058     2724
R-Sq        79.51    83.95
R-Sq(adj)   79.25    83.54
C-p          20.2      1.3
More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no
MTB > Stepwise c6 c3 c2 c8 c5 c4;
SUBC>   Force c2;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.
2) Stepwise Regression: Annual Total versus First quarti, Type of Scho, ...
Alpha-to-Enter: 0.15
Alpha-to-Remove: 0.15
Response is Annual T on 5 predictors, with N = 80

Step            1        2        3
Constant    12478    -7264     1013

Type of     11646     8732    -3016
T-Value     13.97    11.57    -0.48
P-Value     0.000    0.000    0.630

First qu              19.5     11.3
T-Value               7.37     2.25
P-Value              0.000    0.027

inter                          11.2
T-Value                        1.90
P-Value                       0.061

S            3610     2783     2737
R-Sq        71.44    83.24    84.00
R-Sq(adj)   71.07    82.81    83.37
C-p          58.1      4.7      3.1

More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no
MTB > Name c18 = 'RESI1'
MTB > Regress c6 2 c3 c2;
SUBC>   Residuals 'RESI1';
SUBC>   GHistogram;
SUBC>   GNormalplot;
SUBC>   GFits;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10;
SUBC>   Brief 2.
3) Regression Analysis: Annual Total versus First quarti, Type of Scho
The regression equation is
Annual Total Cost = - 7264 + 19.5 First quartile SAT + 8732 Type of School
Predictor       Coef    SE Coef        T      P    VIF
Constant       -7264       2728    -2.66  0.009
First qu      19.524      2.651     7.37  0.000    1.4
Type of       8732.4      754.7    11.57  0.000    1.4

S = 2783      R-Sq = 83.2%      R-Sq(adj) = 82.8%
Analysis of Variance

Source            DF          SS          MS        F      P
Regression         2  2963313624  1481656812   191.26  0.000
Residual Error    77   596492779     7746659
Total             79  3559806404

Source       DF      Seq SS
First qu      1  1926306635
Type of       1  1037006989
Unusual Observations
Obs   First qu   Annual T      Fit   SE Fit   Residual   St Resid
 27       1040      21484    13041      514       8443      3.09R
 56       1010      15722    21188      560      -5466     -2.00R
 61       1320      17526    27240      578      -9714     -3.57R

R denotes an observation with a large standardized residual
Predicted Values for New Observations
New Obs      Fit   SE Fit        95.0% CI              95.0% PI
      1    20993      579   ( 19839,  22147)      ( 15332,  26654)

Values of Predictors for New Observations
New Obs   First qu   Type of
      1       1000      1.00
Residual Histogram for Annual T
Normplot of Residuals for Annual T
Residuals vs Fits for Annual T
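Part f) of the question asks for the rough interval $\hat{Y}_0 \pm t\,s_e$ built from the regression 3 output above (Fit = 20993, S = 2783, 77 error degrees of freedom). A minimal Python sketch of that arithmetic, for comparison with Minitab's exact prediction interval (15332, 26654); it is a check, not part of the original session:

from scipy import stats

fit, s_e, error_df = 20993, 2783, 77        # from the regression 3 printout
t = stats.t.ppf(0.975, error_df)            # about 1.9913
half_width = t * s_e
print(round(fit - half_width), round(fit + half_width))
# roughly (15451, 26535), close to Minitab's exact (15332, 26654)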
MTB > %Resplots c18 c2;
SUBC>   Title "Residuals vs Type".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs Type of Scho
MTB > %Resplots c18 c3;
SUBC>   Title "Residuals vs Type".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs First quarti
MTB > Name c19 = 'RESI2'
MTB > Regress c6 3 c3 c2 c8;
SUBC>   Residuals 'RESI2';
SUBC>   GHistogram;
SUBC>   GNormalplot;
SUBC>   GFits;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10 c11;
SUBC>   Brief 2.
4) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = 1013 + 11.3 First quartile SAT - 3016 Type of School
+ 11.2 inter
Predictor       Coef    SE Coef        T      P     VIF
Constant        1013       5120     0.20  0.844
First qu      11.339      5.039     2.25  0.027     5.2
Type of        -3016       6234    -0.48  0.630    97.2
inter         11.177      5.889     1.90  0.061   120.6

S = 2737      R-Sq = 84.0%      R-Sq(adj) = 83.4%
Analysis of Variance

Source            DF          SS         MS        F      P
Regression         3  2990309581  996769860   133.02  0.000
Residual Error    76   569496823    7493379
Total             79  3559806404

Source       DF      Seq SS
First qu      1  1926306635
Type of       1  1037006989
inter         1    26995957
Unusual Observations
Obs   First qu   Annual T      Fit   SE Fit   Residual   St Resid
  3        800       9476    10084     1176       -608     -0.25 X
  9       1250      13986    15186     1303      -1200     -0.50 X
 27       1040      21484    12805      520       8679      3.23R
 61       1320      17526    27718      622     -10192     -3.82R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
Predicted Values for New Observations
New Obs      Fit   SE Fit        95.0% CI              95.0% PI
      1    20513      623   ( 19271,  21755)      ( 14921,  26105)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter
      1       1000      1.00    1000

MTB > Name c20 = 'RESI3'
MTB > Regress c6 4 c3 c2 c8 c5;
SUBC>   Residuals 'RESI3';
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10 c11 c12;
SUBC>   Brief 2.
5) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = - 13 + 11.4 First quartile SAT - 3053 Type of School
+ 10.9 inter + 0.165 Room and Board
Predictor       Coef    SE Coef        T      P     VIF
Constant         -13       5483    -0.00  0.998
First qu      11.382      5.064     2.25  0.028     5.2
Type of        -3053       6263    -0.49  0.627    97.3
inter         10.928      5.934     1.84  0.069   121.3
Room and      0.1655     0.3062     0.54  0.591     1.9

S = 2750      R-Sq = 84.1%      R-Sq(adj) = 83.2%
Analysis of Variance

Source            DF          SS         MS       F      P
Regression         4  2992518033  748129508   98.91  0.000
Residual Error    75   567288370    7563845
Total             79  3559806404

Source       DF      Seq SS
First qu      1  1926306635
Type of       1  1037006989
inter         1    26995957
Room and      1     2208452
Unusual Observations
Obs   First qu   Annual T      Fit   SE Fit   Residual   St Resid
  9       1250      13986    15174     1309      -1188     -0.49 X
 27       1040      21484    12880      541       8604      3.19R
 61       1320      17526    27621      650     -10095     -3.78R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
Predicted Values for New Observations
New Obs      Fit   SE Fit        95.0% CI              95.0% PI
      1    20071     1030   ( 18020,  22122)      ( 14221,  25922)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter   Room and
      1       1000      1.00    1000       5000
MTB > Name c21 = 'RESI4'
MTB > Regress c6 5 c3 c2 c8 c5 c4;
SUBC>   Residuals 'RESI4';
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10 c11 c12 c13;
SUBC>   Brief 2.
6) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = 5873 + 26.2 First quartile SAT - 4605 Type of School
+ 12.2 inter + 0.150 Room and Board - 17.0 Third quartile SAT
Predictor       Coef    SE Coef        T      P     VIF
Constant        5873       8515     0.69  0.493
First qu       26.23      17.18     1.53  0.131    59.2
Type of        -4605       6502    -0.71  0.481   104.5
inter         12.162      6.096     2.00  0.050   127.7
Room and      0.1503     0.3070     0.49  0.626     1.9
Third qu      -17.01      18.81    -0.90  0.369    58.0

S = 2754      R-Sq = 84.2%      R-Sq(adj) = 83.2%
Analysis of Variance

Source            DF          SS         MS       F      P
Regression         5  2998718897  599743779   79.10  0.000
Residual Error    74   561087507    7582264
Total             79  3559806404

Source       DF      Seq SS
First qu      1  1926306635
Type of       1  1037006989
inter         1    26995957
Room and      1     2208452
Third qu      1     6200863
Unusual Observations
Obs   First qu   Annual T      Fit   SE Fit   Residual   St Resid
  3        800       9476     9606     1323       -130     -0.05 X
  9       1250      13986    15381     1331      -1395     -0.58 X
 27       1040      21484    13192      642       8292      3.10R
 61       1320      17526    27217      789      -9691     -3.67R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
Predicted Values for New Observations
New Obs      Fit   SE Fit        95.0% CI              95.0% PI
      1    20002     1034   ( 17942,  22061)      ( 14141,  25862)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter   Room and   Third qu
      1       1000      1.00    1000       5000       1200
MTB > Name c22 = 'RESI5'
MTB > Regress c6 2 c3 c5 ;
SUBC>   Residuals 'RESI5';
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c12;
SUBC>   Brief 2.
7) Regression Analysis: Annual Total versus First quarti, Room and Boa
The regression equation is
Annual Total Cost = - 24258 + 27.9 First quartile SAT + 1.84 Room and Board
Predictor       Coef    SE Coef        T      P    VIF
Constant      -24258       3686    -6.58  0.000
First qu      27.927      3.532     7.91  0.000    1.2
Room and      1.8439     0.3534     5.22  0.000    1.2

S = 3959      R-Sq = 66.1%      R-Sq(adj) = 65.2%
Analysis of Variance

Source            DF          SS          MS       F      P
Regression         2  2352968982  1176484491   75.06  0.000
Residual Error    77  1206837422    15673213
Total             79  3559806404

Source       DF      Seq SS
First qu      1  1926306635
Room and      1   426662346
Unusual Observations
Obs   First qu   Annual T      Fit   SE Fit   Residual   St Resid
 14        920       7210    16752     1006      -9542     -2.49R
 16       1120       9451    18272      593      -8821     -2.25R
 41       1060      25865    14472      849      11393      2.95R
 53        900      17886    18758     1441       -872     -0.24 X
 61       1320      17526    26398      845      -8872     -2.29R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
Predicted Values for New Observations
New Obs      Fit   SE Fit        95.0% CI              95.0% PI
      1    12889      820   ( 11255,  14522)      (  4838,  20939)

Values of Predictors for New Observations
New Obs   First qu   Room and
      1       1000       5000
Note:
The assignment of the fourth computer problem was given as follows.
This problem is Problem 14.37 in the 9th edition of the textbook. The data are on the next two
pages. As far as I can see, the problem is identical to Problem 15.10 in the 8th edition, but you
should use the 9th edition data, which can be downloaded from the website at
http://courses.wcupa.edu/rbove/eco252/Colleges2002.MTP or the disk for the 9th edition. To
download from the website, enter Minitab, use the File pull-down menu, pick 'Open Worksheet'
and copy the URL into 'File name.' If you can, save the worksheet on the computer on which you
are working.
Read through the document 252soln K1 and use the analysis there to help you with this problem.
In particular, Exercise 14.35 is almost identical to this problem and my description is very similar.
Once you get the data loaded, you need a column for the interaction variable and two
columns for the input to the prediction interval.
The problem says "Develop a model to predict the annual total cost based on SAT score and whether the
school is public or private."
The assignment was amplified as follows.
Dec 5 – I finally got around to running the last computer problem. I have suggested that you
carefully and neatly write up a solution to Problem 14.37, using my write-up of problems in that
section as a model, and turn it in with your computer output and the exam. When I ran it I used
the following allocation of columns:

Column   Variable   Description
C2       X2         Type (1 = Private)
C3       X1         First Quartile SAT
C4       X5         3rd Quartile SAT
C5       X4         Room and Board Cost
C6       Y          Annual Total Cost
C7       X6         Average Indebtedness at Graduation
C8       X3         Interaction = X1 * X2

I used the columns beyond column 8 for the data for the prediction interval; for example, I put
1000 in column 9. Your inputs for the interval should be in the same order that you name the
predictors in the pull-down menu regression instruction. Of course I didn't actually use X6,
which is part of a very different problem, though I did experiment with some of the other
variables.
1) There was no reason for the majority of you to conclude that Y, the dependent variable, was either X2
(Type: 1 = Private) or X1 (First Quartile SAT). Does it make sense to explain First Quartile SAT by how
much the school costs?
2) To be able to explain the results or even do the problem, you have to have read some of the posted
problem solutions. I don’t see any evidence that many of you had read them.
Unfortunately, I do not have time to write the solution I wanted to see. The following comes from the text
solution manual – but I hardly expected anything this complete.
14.37
(a) $\hat{Y} = -7263.5561 + 19.5239X_1 + 8732.4175X_2$, where $X_1$ = first quartile SAT
score and $X_2$ = type of institution (public = 0, private = 1).
(b) Holding constant the effect of type of institution, for each point increase in the
first quartile SAT, the total cost is estimated to increase on average by $19.53.
For a given first quartile SAT score, a private college or university is estimated
to have an average total cost $8732.42 above that of a public institution.
(c) $\hat{Y} = -7263.5561 + 19.5239(1000) + 8732.4175(0) = \$12260.38$
(d) [Figure: First quartile SAT residual plot, residuals plotted against first quartile SAT.]
Based on a residual analysis, the model shows departure from the
homoscedasticity assumption caused by the first quartile SAT score. From the
normal probability plot, the residuals appear to be normally distributed with the
exception of a single outlier in each of the two tails.
(e) $F = 191.26 > F_{2,77} = 3.1154$. Reject $H_0$. There is evidence of a relationship
between total cost and the two independent variables.
(f) For $X_1$: $t = 7.3654 > t_{77} = 1.9913$. Reject $H_0$. First quartile SAT score makes a
significant contribution and should be included in the model.
For $X_2$: $t = 11.5700 > t_{77} = 1.9913$. Reject $H_0$. Type of institution makes a
significant contribution and should be included in the model.
Based on these results, the regression model with the two independent variables
should be used.
(g) $14.2456 \le \beta_1 \le 24.8023$, $7229.5242 \le \beta_2 \le 10235.3109$
[Figure: Normal probability plot of the residuals, residuals plotted against Z value.]
(h) $r^2_{Y.12} = 0.8324$. 83.24% of the variation in total cost can be explained by
variation in first quartile SAT score and variation in type of institution.
(i) $r^2_{adj} = 0.8281$
(j) $r^2_{Y1.2} = 0.4133$. Holding constant the effect of type of institution, 41.33% of the
variation in total cost can be explained by variation in first quartile SAT score.
(k) $r^2_{Y2.1} = 0.6348$. Holding constant the effect of first quartile SAT score, 63.48%
of the variation in total cost can be explained by variation in type of institution.
The slope of total cost with first quartile SAT score is assumed to be the same regardless of
whether the institution is public or private.
(l) $\hat{Y} = 1013.1354 + 11.3386X_1 - 3015.6623X_2 + 11.1768X_1X_2$
(m) For $X_1X_2$: the p-value is 0.0615. Do not reject $H_0$. There is no evidence that the
interaction term makes a contribution to the model.
The two-variable model in (a) should be used.
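The text-manual coefficients above can be checked numerically. A minimal Python sketch, assuming the Colleges2002 worksheet has been exported to a CSV file; the file name and the column headers used below are hypothetical, not part of the assignment:

import numpy as np

data = np.genfromtxt("Colleges2002.csv", delimiter=",", names=True)
x1 = data["FirstQuartileSAT"]          # hypothetical header: first quartile SAT
x2 = data["Type"]                      # hypothetical header: 1 = private, 0 = public
y = data["AnnualTotalCost"]            # hypothetical header: annual total cost

# Two-variable model (parts a-k): Y = b0 + b1*X1 + b2*X2
X = np.column_stack([np.ones_like(x1), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("two-variable model:", b)        # should be close to -7263.56, 19.52, 8732.42

# Interaction model (parts l-m): Y = b0 + b1*X1 + b2*X2 + b3*X1*X2
Xi = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
bi, *_ = np.linalg.lstsq(Xi, y, rcond=None)
print("interaction model:", bi)        # should be close to 1013.1, 11.34, -3015.7, 11.18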
II. Do at least 4 of the following 6 problems (at least 13 points each), or do sections adding to at least 50 points.
Anything extra you do helps, and grades wrap around. Show your work! State $H_0$ and $H_1$ where
applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing
appropriate statistical tests; that is, explain your hypotheses and what values from what table were used to
test them.
1. A marketing analyst collects data on the screen size and price of the two models produced by a
competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe
model (zero for the regular model) and rx1 is the column in which you may rank x1 .
Row     y=price   x1=size  x2=model     x1²   x2²       y²      x1y      x2y   x1x2   rx1
  1      371.69        13         0     169     0   138153     4832     0.00      0     1
  2      403.61        19         0     361     0   162901     7669     0.00      0     3
  3      484.41        21         0     441     0   234653    10173     0.00      0
  4      492.89        17         1     289     1   242941     8379   492.89     17
  5      606.25        25         0     625     0   367539    15156     0.00      0
  6      634.41        21         1     441     1   402476    13323   634.41     21
  7      651.00       210         0   44100     0   423801   136710     0.00      0
  8      806.25        25         1     625     1   650039    20156   806.25     25
  9     1131.00       290         0   84100     0  1279161   327990     0.00      0   9.0
 10     1739.00       370         1  136900     1  3024121   643430  1739.00    370  10.0
Sum     7320.51      1011         4  268051     4  6925785  1187817  3672.55    433

a. Fill in the x1x2 column. (2) Most people did this and just about everyone got credit.
b. Compute the simple regression of price against size. (6)
c. Compute R squared and R squared adjusted for degrees of freedom. (3)
d. Compute the standard error $s_e$. (3)
e. Compute $s_{b_1}$ and make it into a confidence interval for $\beta_1$. (3)
f. Do a prediction interval for the price of a model with a 19 inch screen. (4) 21
Solution: a) Fill in the x1x2 column. (2) The x1x2 values appear in the table above.
b) Compute the simple regression of price against size.
From above $n = 10$, $\sum x_1 = 1011$, $\sum y = 7320.51$, $\sum x_1y = 1187817$, $\sum x_1^2 = 268051$ and
$\sum y^2 = 6925785$. (In spite of the fact that most column computations were done for you, many of you
wasted time and energy doing them over again. Then there were those who, instead of using the $\sum x_1y$
that was computed for you, decided that $\sum x_1y = \left(\sum x_1\right)\left(\sum y\right) = (1011)(7320.51)$. Anyone
who did this should be sentenced to repeat ECO251.)
Spare Parts Computation:
$\bar{x}_1 = \frac{\sum x_1}{n} = \frac{1011}{10} = 101.100$ and $\bar{y} = \frac{\sum y}{n} = \frac{7320.51}{10} = 732.051$.
$SSx_1 = \sum x_1^2 - n\bar{x}_1^2 = 268051 - 10(101.100)^2 = 165838.9$*
$Sx_1y = \sum x_1y - n\bar{x}_1\bar{y} = 1187817 - 10(101.100)(732.051) = 447713.439$
$SSy = \sum y^2 - n\bar{y}^2 = 6925785 - 10(732.051)^2 = 1566799$*
Note that the starred quantities are sums of squares and must be positive.
$b_1 = \frac{Sx_1y}{SSx_1} = \frac{\sum x_1y - n\bar{x}_1\bar{y}}{\sum x_1^2 - n\bar{x}_1^2} = \frac{447713.439}{165838.9} = 2.6997$
$b_0 = \bar{y} - b_1\bar{x}_1 = 732.051 - 2.6997(101.100) = 459.11$
$\hat{Y} = b_0 + b_1x$ becomes $\hat{Y} = 459.11 + 2.700x$.
c) Compute R squared and R squared adjusted for degrees of freedom. (3)
We already know that $SST = SSy = \sum y^2 - n\bar{y}^2 = 1566799$.
$SSR = b_1Sx_1y = b_1\left(\sum x_1y - n\bar{x}_1\bar{y}\right) = 2.6997(447713.439) = 1208692$, so
$R^2 = \frac{SSR}{SSy} = \frac{1208692}{1566799} = .7714$. ($R^2$ must be between zero and one!)
We could also try $R^2 = \frac{(Sx_1y)^2}{SSx_1\,SSy} = \frac{\left(\sum x_1y - n\bar{x}_1\bar{y}\right)^2}{\left(\sum x_1^2 - n\bar{x}_1^2\right)\left(\sum y^2 - n\bar{y}^2\right)} = \frac{(447713.439)^2}{(165838.9)(1566799)} = .7714$,
so that $SSR = b_1Sx_1y = R^2\,SST = .7714(1566799) = 1208629$.
$\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1} = \frac{9(.7714) - 1}{8} = .7428$. Here $k$ is the number of independent variables. R squared adjusted
for degrees of freedom must be below R squared.
d) Compute the standard error $s_e$. (3)
$s_e^2 = \frac{SSE}{n-k-1} = \frac{358107}{8} = 44763$ and $s_e = \sqrt{44763} = 211.57$,
where $SSE = SST - SSR = 1566799 - 1208692 = 358107$.
e) Compute $s_{b_1}$ and make it into a confidence interval for $\beta_1$. (3) Using the formula from the outline,
$s_{b_1}^2 = \frac{s_e^2}{\sum X_1^2 - n\bar{X}_1^2} = \frac{s_e^2}{SSx_1} = \frac{44763}{165838.9} = 0.2699$, so $s_{b_1} = \sqrt{0.2699} = 0.5195$.
With $t_{.025}^{n-k-1} = t_{.025}^{8} = 2.306$,
$\beta_1 = b_1 \pm t_{.025}^{n-k-1}s_{b_1} = 2.6997 \pm 2.306(0.5195) = 2.70 \pm 1.20$.
f) Do a prediction interval for the price of a model with a 19 inch screen. (4)
From the outline, the prediction interval is $Y_0 = \hat{Y}_0 \pm t\,s_Y$, where
$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum X_1^2 - n\bar{X}_1^2} + 1\right]$. In this formula, for some specific $X_0$,
$\hat{Y}_0 = b_0 + b_1X_0$. Here $X_0 = 19$ and $\hat{Y} = 459.11 + 2.700x$, so
$\hat{Y}_0 = 459.11 + 2.700(19) = 510.41$, $\bar{X} = 101.1$ and $n = 10$. Then
$s_Y^2 = s_e^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SSx_1} + 1\right] = 44763\left[\frac{1}{10} + \frac{(19 - 101.1)^2}{165838.9} + 1\right] = 44763(1.1 + .0406) = 51059$
and $s_Y = \sqrt{51059} = 225.96$, so that, if $t_{.025}^{n-k-1} = t_{.025}^{8} = 2.306$, the prediction interval is
$Y_0 = \hat{Y}_0 \pm t\,s_Y = 510.41 \pm 2.306(225.96) = 510.41 \pm 521.06$. This represents a confidence interval for a
particular value that $Y$ will take when $x = 19$ and is proportionally rather gigantic because we have picked
a point fairly far from the mean of the data that was actually experienced.
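The hand computations in b) through f) use only the column sums from the table, so they are easy to verify. A minimal Python check of that arithmetic (not a required part of the answer):

from math import sqrt

n = 10
sum_x, sum_y = 1011.0, 7320.51
sum_x2, sum_y2, sum_xy = 268051.0, 6925785.0, 1187817.0

xbar, ybar = sum_x / n, sum_y / n
SSx = sum_x2 - n * xbar**2             # 165838.9
Sxy = sum_xy - n * xbar * ybar         # 447713.4
SSy = sum_y2 - n * ybar**2             # 1566799

b1 = Sxy / SSx                         # about 2.6997
b0 = ybar - b1 * xbar                  # about 459.1
SSR = b1 * Sxy
SSE = SSy - SSR
r2 = SSR / SSy                         # about .7714
r2_adj = ((n - 1) * r2 - 1) / (n - 2)  # about .7428
se = sqrt(SSE / (n - 2))               # about 211.6
sb1 = sqrt(se**2 / SSx)                # about 0.52

# Prediction interval at X0 = 19 with t(.025, 8) = 2.306 from the t table
X0, t = 19, 2.306
yhat0 = b0 + b1 * X0
sY = sqrt(se**2 * (1 + 1 / n + (X0 - xbar)**2 / SSx))
print(round(yhat0, 2), round(t * sY, 2))   # about 510.4 +/- 521.1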
2. A marketing analyst collects data on the screen size and price of the two models produced by a
competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe
model (zero for the regular model) and rx1 is the column in which you will rank x1 .
Row     y=price   x1=size  x2=model     x1²   x2²       y²      x1y      x2y   x1x2   rx1
  1      371.69        13         0     169     0   138153     4832     0.00      0     1
  2      403.61        19         0     361     0   162901     7669     0.00      0     3
  3      484.41        21         0     441     0   234653    10173     0.00      0
  4      492.89        17         1     289     1   242941     8379   492.89     17
  5      606.25        25         0     625     0   367539    15156     0.00      0
  6      634.41        21         1     441     1   402476    13323   634.41     21
  7      651.00       210         0   44100     0   423801   136710     0.00      0
  8      806.25        25         1     625     1   650039    20156   806.25     25
  9     1131.00       290         0   84100     0  1279161   327990     0.00      0   9.0
 10     1739.00       370         1  136900     1  3024121   643430  1739.00    370  10.0
Sum     7320.51      1011         4  268051     4  6925785  1187817  3672.55    433
a. Do a multiple regression of price against size and model. (10)
b. Compute R-squared and R-squared adjusted for degrees of freedom for this regression and compare them with the
values for the previous problem. (4)
c. Using either R-squares or SST, SSR and SSE, do F tests (ANOVA). First check the usefulness of the simple
regression and then the value of 'model' as an improvement to the regression. (6)
d. Predict the price of a deluxe model with a 19 inch screen. How much change is there from your last prediction? (2) 22
Solution: a) Do a multiple regression of price against size and model. (10)
We have the following spare parts from the last problem:
$\bar{x}_1 = \frac{\sum x_1}{n} = \frac{1011}{10} = 101.100$, $\bar{y} = \frac{\sum y}{n} = \frac{7320.51}{10} = 732.051$,
$SSx_1 = \sum x_1^2 - n\bar{x}_1^2 = 268051 - 10(101.100)^2 = 165838.9$*,
$Sx_1y = \sum x_1y - n\bar{x}_1\bar{y} = 1187817 - 10(101.100)(732.051) = 447713.439$,
$SSy = \sum y^2 - n\bar{y}^2 = 6925785 - 10(732.051)^2 = 1566799$*.
And from above $n = 10$, $\sum x_2 = 4$, $\sum x_2^2 = 4$*, $\sum x_2y = 3672.55$ and $\sum x_1x_2 = 433.00$, so that
$\bar{x}_2 = \frac{\sum x_2}{n} = \frac{4}{10} = 0.400$. About half of you decided that $\sum x_2y = \left(\sum x_2\right)\left(\sum y\right)$.
Perhaps you thought that I was crazy to do all these computations for you. Note that the starred quantities
are sums of squares and must be positive.
We need
$Sx_2y = \sum X_2Y - n\bar{X}_2\bar{Y} = 3672.55 - 10(0.40)(732.051) = 744.346$,
$SSx_2 = \sum X_2^2 - n\bar{X}_2^2 = 4.00 - 10(0.4)^2 = 2.400$*
and $Sx_1x_2 = \sum X_1X_2 - n\bar{X}_1\bar{X}_2 = 433.00 - 10(101.1)(0.400) = 28.6$.
Then we substitute these numbers into the Simplified Normal Equations:
$\sum X_1Y - n\bar{X}_1\bar{Y} = b_1\left(\sum X_1^2 - n\bar{X}_1^2\right) + b_2\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right)$
$\sum X_2Y - n\bar{X}_2\bar{Y} = b_1\left(\sum X_1X_2 - n\bar{X}_1\bar{X}_2\right) + b_2\left(\sum X_2^2 - n\bar{X}_2^2\right)$
or $Sx_1y = SSx_1\,b_1 + Sx_1x_2\,b_2$ and $Sx_2y = Sx_1x_2\,b_1 + SSx_2\,b_2$, which become
$447713.439 = 165838.9\,b_1 + 28.6\,b_2$
$744.346 = 28.6\,b_1 + 2.4\,b_2$
We solve the Normal Equations as two equations in two unknowns for $b_1$ and $b_2$. These are a fairly tough
pair of equations to solve until we notice that, if we multiply 2.4 by 11.91667 we get 28.6. Multiplying the
second equation through by 11.91667 gives
$447713.439 = 165838.9\,b_1 + 28.6\,b_2$
$8870.000 = 340.82\,b_1 + 28.6\,b_2$
If we subtract these, we get $438843 = 165498.1\,b_1$. This means that
$b_1 = \frac{438843}{165498.1} = 2.6547$. Now remember that $744.346 = 28.6\,b_1 + 2.4\,b_2$, and this means
$744.346 = 28.6(2.6547) + 2.4\,b_2$, or $2.4\,b_2 = 744.346 - 75.8386 = 668.5074$. So $b_2 = \frac{668.5074}{2.4} = 278.54$.
Finally we get $b_0$ by solving $b_0 = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 = 732.051 - 2.6547(101.1) - 278.54(0.4) = 352.55$.
Thus our equation is $\hat{Y} = b_0 + b_1X_1 + b_2X_2 = 352.55 + 2.6547X_1 + 278.54X_2$.
b. Compute R-squared and R-squared adjusted for degrees of freedom for this regression and
compare them with the values for the previous problem. (4)
On the previous pages $R^2 = .7714$ and $\bar{R}^2 = .7428$. ($R^2$ must be between zero and one! R squared
adjusted for degrees of freedom must be below R squared.)
(The way I did it) $SSE = SST - SSR$, with $b_1 = 2.6547$, $b_2 = 278.54$, $Sx_1y = 447713.439$,
$Sx_2y = 744.346$ and $SST = SSy = 1566799$*.
$SSR = b_1Sx_1y + b_2Sx_2y = 2.6547(447713) + 278.54(744.346) = 1188544 + 207330 = 1395874$*,
so $SSE = SST - SSR = 1566799 - 1395874 = 170925$*. Note that the starred quantities are sums of squares
and must be positive.
$R^2 = \frac{SSR}{SST} = \frac{1395874}{1566799} = 0.891$. If we use $\bar{R}^2$, which is $R^2$ adjusted for degrees of
freedom, $\bar{R}^2 = \frac{(n-1)R^2 - k}{n-k-1} = \frac{9(0.891) - 2}{7} = .860$.
Both of these have risen, so it looks like we did well by adding the new independent variable.
c. Using either R-squares or SST, SSR and SSE, do F tests (ANOVA). First check the
usefulness of the simple regression and then the value of 'model' as an improvement to the
regression. (6)
For this regression, the ANOVA reads:

Source            DF        SS        MS       F      F.05
Regression         2   1395874    697937   28.58   F.05(2,7) = 4.74
Residual Error     7    170925     24418
Total              9   1566799

Since our computed F is larger than the table F, we reject the hypothesis that X and Y are unrelated.
For the previous regression we had
$SSR = b_1Sx_1y = b_1\left(\sum x_1y - n\bar{x}_1\bar{y}\right) = 2.6997(447713.439) = 1208692$.
For the previous regression, the ANOVA reads:

Source            DF        SS        MS       F      F.05
Regression         1   1208692   1208692   27.00   F.05(1,8) = 5.32
Residual Error     8    358107     44763
Total              9   1566799

Since our computed F is larger than the table F, we reject the hypothesis that X and Y are unrelated.
The change in the regression sum of squares is 1395874 - 1208692 = 187182, so we have

Source            DF        SS        MS       F      F.05
Size               1   1208692
Model              1    187182    187182    7.67   F.05(1,7) = 5.59
Residual Error     7    170925     24418
Total              9   1566799

Since our computed F is larger than the table F, we reject the hypothesis that Model does not contribute to
the explanation of Y.
Recall that now $R^2 = \frac{SSR}{SST} = \frac{1395874}{1566799} = 0.891$ and before $R^2 = .771$. We can rewrite the analysis with
R-squared as follows.

Source            DF   'SS'                  MS       F       F.05
Size               1   .771
Model              1   .891 - .771 = .120    .120     7.692   F.05(1,7) = 5.59
Residual Error     7   1 - .891 = .109       .0156
Total              9   1.000

This is identical with the previous ANOVA except for rounding error.
d. Predict the price of a deluxe model with a 19 inch screen. How much change is there from
your last prediction? (2)
Our equation is $\hat{Y} = b_0 + b_1X_1 + b_2X_2 = 352.55 + 2.6547X_1 + 278.54X_2$. So we have
$\hat{Y} = 352.55 + 2.6547(19) + 278.54(1) = 681.53$. Our previous prediction was 510.41, which is
roughly 25% lower; this indicates that the new variable is making a difference.
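A small sketch of the Problem 2 arithmetic: numpy solves the same pair of normal equations, and the incremental F for 'model' then follows from the resulting sums of squares. Small differences from the hand values above come only from rounding.

import numpy as np

SSx1, SSx2, Sx1x2 = 165838.9, 2.4, 28.6
Sx1y, Sx2y, SSy = 447713.439, 744.346, 1566799.0
n, xbar1, xbar2, ybar = 10, 101.1, 0.4, 732.051

A = np.array([[SSx1, Sx1x2],
              [Sx1x2, SSx2]])
c = np.array([Sx1y, Sx2y])
b1, b2 = np.linalg.solve(A, c)        # about 2.65 and 278.5
b0 = ybar - b1 * xbar1 - b2 * xbar2   # about 352.5

SSR2 = b1 * Sx1y + b2 * Sx2y          # two-variable model, about 1.39 million (thousands of dollars squared)
SSE2 = SSy - SSR2
SSR1 = 2.6997 * Sx1y                  # simple regression SSR from Problem 1
F_model = (SSR2 - SSR1) / (SSE2 / (n - 3))
print(round(b0, 1), round(b1, 3), round(b2, 1), round(F_model, 2))
# F_model is about 7.5, above the 5% table value F.05(1,7) = 5.59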
3. A marketing analyst collects data on the screen size and price of the two models produced by a
competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe
model (zero for the regular model) and rx1 is the column in which you will rank x1 .
Row     y=price   x1=size  x2=model     x1²   x2²       y²      x1y      x2y   x1x2   rx1
  1      371.69        13         0     169     0   138153     4832     0.00      0    1
  2      403.61        19         0     361     0   162901     7669     0.00      0    3
  3      484.41        21         0     441     0   234653    10173     0.00      0    4.5
  4      492.89        17         1     289     1   242941     8379   492.89     17    2
  5      606.25        25         0     625     0   367539    15156     0.00      0    6.5
  6      634.41        21         1     441     1   402476    13323   634.41     21    4.5
  7      651.00       210         0   44100     0   423801   136710     0.00      0    8.0
  8      806.25        25         1     625     1   650039    20156   806.25     25    6.5
  9     1131.00       290         0   84100     0  1279161   327990     0.00      0    9.0
 10     1739.00       370         1  136900     1  3024121   643430  1739.00    370   10.0
Sum     7320.51      1011         4  268051     4  6925785  1187817  3672.55    433

a. Compute the correlation between price and size and check to see if it is significant, using the
spare parts from Problem 1 if you have them. (5)
b. Use the same correlation to test the hypothesis that the correlation is .85. (4)
c. Do ranks for the values of 'size' in the rx1 column, compute a rank correlation between price
and size and test it for significance using the rank correlation table if possible. (5) 14
Solution: a) $r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}}$, but the easiest way to compute it is to remember that
$R^2 = \frac{SSR}{SSy} = \frac{b_1Sx_1y}{\sum Y^2 - n\bar{Y}^2} = \frac{1208692}{1566799} = .7714$ and that the slope was positive, so that
we can take the positive square root and get $r = \sqrt{.7714} = .8783$. The outline says that if we want to
test $H_0: \rho_{xy} = 0$ against $H_1: \rho_{xy} \ne 0$ and $x$ and $y$ are normally distributed, we use
$t^{(n-2)} = \frac{r}{s_r} = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}}$.
If we use this we get $t = \frac{r}{\sqrt{\frac{1-r^2}{n-2}}} = \frac{.8783}{\sqrt{\frac{1-.7714}{8}}} = \frac{.8783}{.1690} = 5.20$. Our rejection region is below
$-t_{.025}^{8} = -2.306$ and above $t_{.025}^{8} = 2.306$. Since our computed value of t falls in the reject region, we reject
the null hypothesis.
b) The outline says that if we are testing $H_0: \rho_{xy} = \rho_0$ against $H_1: \rho_{xy} \ne \rho_0$, and $\rho_0 \ne 0$, the test is quite
different. We need to use Fisher's z-transformation. Let $\tilde{z} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$. This has an approximate mean of
$\mu_z = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right)$ and a standard deviation of $s_z = \sqrt{\frac{1}{n-3}}$, so that $t^{(n-2)} = \frac{\tilde{z} - \mu_z}{s_z}$. We know
$r = \sqrt{.7714} = .8783$ and $\rho_0 = .85$. So
$\tilde{z} = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \frac{1}{2}\ln\left(\frac{1.8783}{.1217}\right) = \frac{1}{2}\ln(15.4338) = \frac{1}{2}(2.7366) = 1.3683$ and $s_z = \sqrt{\frac{1}{n-3}} = \sqrt{\frac{1}{7}} = 0.3780$.
$\mu_z = \frac{1}{2}\ln\left(\frac{1+\rho_0}{1-\rho_0}\right) = \frac{1}{2}\ln\left(\frac{1.85}{.15}\right) = \frac{1}{2}\ln(12.3333) = \frac{1}{2}(2.5123) = 1.2562$. Finally
$t^{(n-2)} = \frac{\tilde{z} - \mu_z}{s_z} = \frac{1.3683 - 1.2562}{0.3780} = .2965$. Our rejection region is below $-t_{.025}^{8} = -2.306$ and above
$t_{.025}^{8} = 2.306$. Since our computed value of t does not fall in the reject region, we do not reject the null
hypothesis.
c. Do ranks for the values of 'size' in the rx1 column, compute a rank correlation between price and size
and test it for significance using the rank correlation table if possible. (5)
The ranking for size appears above, and the ranking of price is obvious. We now have the following
columns.

Row   rprice   rsize      d      d²
  1      1       1.0     0.0    0.00
  2      2       3.0    -1.0    1.00
  3      3       4.5    -1.5    2.25
  4      4       2.0     2.0    4.00
  5      5       6.5    -1.5    2.25
  6      6       4.5     1.5    2.25
  7      7       8.0    -1.0    1.00
  8      8       6.5     1.5    2.25
  9      9       9.0     0.0    0.00
 10     10      10.0     0.0    0.00
                         0.0   15.00

Since $n = 10$, $r_s = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(15)}{10(10^2-1)} = 1 - \frac{90}{990} = 1 - .090909 = .9091$. If we check the table
'Critical Values of $r_s$, the Spearman Rank Correlation Coefficient,' we find that the critical value for
$n = 10$ and $\alpha = .05$ is .5515. Since $r_s = .9091$ exceeds the critical value, we reject the null hypothesis of no
rank correlation and conclude that the rankings agree.
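A sketch of the three Problem 3 tests in Python; the Pearson correlation is rebuilt from the Problem 1 sums, and scipy's spearmanr is used only as a cross-check of the hand-ranked value:

from math import sqrt, log
from scipy import stats

n, r2 = 10, 1208692.0 / 1566799.0
r = sqrt(r2)                                   # about .8783

# a) t test of rho = 0
t_r = r / sqrt((1 - r2) / (n - 2))             # about 5.2, compare with t(.025, 8) = 2.306

# b) Fisher z test of rho = .85
z_r = 0.5 * log((1 + r) / (1 - r))             # about 1.368
mu_z = 0.5 * log((1 + 0.85) / (1 - 0.85))      # about 1.256
t_f = (z_r - mu_z) / sqrt(1.0 / (n - 3))       # about 0.30

# c) Spearman rank correlation between price and size
price = [371.69, 403.61, 484.41, 492.89, 606.25, 634.41, 651.00, 806.25, 1131.00, 1739.00]
size = [13, 19, 21, 17, 25, 21, 210, 25, 290, 370]
rs, p = stats.spearmanr(price, size)           # close to the hand value of .91
print(round(t_r, 2), round(t_f, 2), round(rs, 4), round(p, 4))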
4. Explain the following.
a. Under what circumstances could you use a Chi-squared method to test for Normality but not a
Kolmogorov-Smirnov test? (2)
b. Under what circumstances could you use a Lilliefors test to test for Normality but not a
Kolmogorov – Smirnov? (2)
c. Under what circumstances could you use a Kruskal – Wallis test to test whether four
distributions are similar but not a one – way ANOVA? (2)
d. What 2 tests can be used to test for the equality of two medians? Which is more powerful? (2)
f. A random sample of 21 Porsche drivers were asked how many miles they had driven in the last
year and a frequency table was constructed of the data.

Miles            Observed frequency
0 – 4000                2
4000 – 8000             7
8000 – 12000            7
Over 12000              5

Does the data follow a Normal distribution with a mean of 8000 and a standard deviation of 2000? Do not
cut the number of groups below what is presented here. Find the appropriate E or cumulative E and do the
test. (6)
Solution:
a. When the parameters of the distribution must be computed from the data.
b. When the mean and standard deviation had to be computed from the data.
c. When the distributions are not approximately Normal.
d. This was misstated and almost any answer that showed you knew how to do a test for 2 medians was
accepted.
f.

Miles            Values of z         O      Fo      Fe                  D = |Fo - Fe|
0 – 4000         -4.00 to -2.00      2    .0952   .5 - .4772 = .0228       .0724
4000 – 8000      -2.00 to 0          7    .4286   .5                       .0714
8000 – 12000      0 to 2.00          7    .7619   .9772                    .2153
Over 12000        2.00 and up        5   1.0000  1.0000                    0
                                    21

This is a K-S test where $F_e$ is the cumulative probability under the Normal distribution. The
tabulated probabilities are $P(z < -2)$, $P(z < 0)$, $P(z < 2)$ and $P(z < \infty)$. According to the K-S table, the
5% critical value for $n = 21$ is .287. Since the maximum discrepancy does not exceed the critical value, do
not reject $H_0$: $N(8000, 2000)$.
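A sketch of the same Kolmogorov-Smirnov comparison in Python, using the Normal cumulative probabilities rather than table values:

from scipy import stats

upper = [4000, 8000, 12000, float("inf")]   # class upper limits
obs = [2, 7, 7, 5]
n = sum(obs)

cum, max_D = 0, 0.0
for o, u in zip(obs, upper):
    cum += o
    Fo = cum / n                                    # observed cumulative proportion
    Fe = stats.norm.cdf(u, loc=8000, scale=2000)    # N(8000, 2000) cumulative probability
    max_D = max(max_D, abs(Fo - Fe))

print(round(max_D, 4))   # about .215, below the 5% K-S critical value of .287 for n = 21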
5. (Ullman) A Latin Square is an extremely effective way of doing a 3-way ANOVA. In this example the
data is arranged in 4 rows and 4 columns, and there are 3 factors. Factor A is rows (machines). Factor B is
columns (operators), and Factor C (materials) is shown by a tag C1, C2, C3 or C4; each material appears
once in each row and in each column. These are times to do a job categorized by machine, operator and cutting
material. The rules are just the same as in any ANOVA: degrees of freedom add up and sums of squares
add up. I am going to set this up as a 2-way ANOVA with one measurement per cell. There is no
interaction. We, of course, assume that the parent distribution is Normal.
             B1      B2      B3      B4    Sum x_i.   n_i   mean x_i.   Sum x²(row)   (mean x_i.)²
A1          7 C1    4 C2    5 C3    3 C4      19       4      4.75          99
A2          6 C4    9 C1    4 C2    2 C3      21       4      5.25         137
A3          5 C3    1 C4    6 C1    1 C2      13       4      3.25          63
A4          6 C2    3 C3    4 C4   10 C1      23       4      5.75         161
Sum x_.j      24      17      19      16      76      16     (    )       (    )         (    )
n_j            4       4       4       4      16
mean x_.j   6.00    4.25    4.75    4.00    (    )
Sum x²(col)  146     107      93     114    (    )
(mean x_.j)²                                (    )
You now have a choice. a) If you are a real wimp, you will pretend that each column is a random sample
and compare the means of each operator. (5) The table will look like that below.

Source           SS     DF     MS      F      F.05
Between
Within
Total

b) If you are less wimpy, you will pretend that this is a 2-way ANOVA and your table will look like that
below. (8) A choice means you don't get full credit for doing more than one of these!

Source           SS     DF     MS      F      F.05
Rows A
Columns B
Within
Total

c) If you are very daring, you will try the table below. (11) To do this you need to know that the means for
the 4 materials are 8, 3.75, 3.75 and 3.50 and that the factor C sum of squares is
$SSC = 4\sum \bar{x}_{..k}^2 - n\bar{\bar{x}}^2 = 4\left[8^2 + 3.75^2 + 3.75^2 + 3.50^2\right] - 16\bar{\bar{x}}^2 = ?$. I think the degrees of freedom
should be obvious. Please don't make the same mistakes you made on the last exam! You have 3 null
hypotheses. Tell me what they are and whether you reject them.

Source           SS     DF     MS      F      F.05
Rows A
Columns B
Materials C
Within
Total

d) Assuming that your data is cross-classified, compare the means of columns 1 and 4 using a 2-sample
method. (3)
e) Assume that this is the equivalent of a 2-way one-measurement-per-cell ANOVA, but that the underlying
distribution is not Normal, and do an appropriate rank test. (5)
Solution: Let's start by playing 'fill in the blanks.' I am using the same format as was given in the 2-way
ANOVA example with one measurement per cell. No SS can be negative, ever, and saying that an SS is
zero implies that the variable it describes is a constant! We, of course, assume that the parent distribution is
Normal.
             B1      B2      B3      B4    Sum x_i.   n_i   mean x_i.   Sum x²(row)   (mean x_i.)²
A1          7 C1    4 C2    5 C3    3 C4      19       4      4.75          99          22.5625
A2          6 C4    9 C1    4 C2    2 C3      21       4      5.25         137          27.5625
A3          5 C3    1 C4    6 C1    1 C2      13       4      3.25          63          10.5625
A4          6 C2    3 C3    4 C4   10 C1      23       4      5.75         161          33.0625
Sum x_.j      24      17      19      16      76      16     (4.75)        460          93.7500
n_j            4       4       4       4      16
mean x_.j   6.00    4.25    4.75    4.00    (4.75)
Sum x²(col)  146     107      93     114     460
(mean x_.j)² 36.0 18.0625 22.5625  16.0     92.625
Since most of this has been done for you, the only mystery is $\bar{\bar{x}} = 4.75$, which should be found by dividing
$\sum x = 76$ by $n = 16$. In this particular case, because the row sizes and the column sizes are equal, the
overall mean can also be found by averaging the row means or averaging the column means. There are
$R = 4$ rows, $C = 4$ columns and $K = 4$ materials, with $\sum x_{ijk}^2 = 460$, $\sum \bar{x}_{i.}^2 = 93.75$ and
$\sum \bar{x}_{.j}^2 = 92.625$. All the formulas that we need to use end with $-n\bar{\bar{x}}^2 = -16(4.75)^2 = -361$.
So $SST = \sum x_{ijk}^2 - n\bar{\bar{x}}^2 = 460 - 361 = 99$.
If we use SSA for rows, we can use
$SS(\text{Rows}) = SSA = 4\sum \bar{x}_{i.}^2 - n\bar{\bar{x}}^2 = 4(93.75) - 16(4.75)^2 = 375 - 361 = 14$.
If we use SSB for columns, we can use
$SS(\text{Columns}) = SSB = 4\sum \bar{x}_{.j}^2 - n\bar{\bar{x}}^2 = 4(92.625) - 16(4.75)^2 = 370.5 - 361 = 9.5$. This is SSB in a
one-way ANOVA.
Finally, I gave you
$SSC = 4\sum \bar{x}_{..k}^2 - n\bar{\bar{x}}^2 = 4\left[8^2 + 3.75^2 + 3.75^2 + 3.50^2\right] - 16(4.75)^2 = 4(104.375) - 361 = 417.5 - 361 = 56.5$.
Because there are 4 items in each of the sums of squares, the degrees of freedom for each is 3. The total
degrees of freedom are $n - 1 = 16 - 1 = 15$. If we remember that sums of squares and degrees of freedom
must add up, that MS is SS divided by DF and that F is MS divided by the Within MS, we get the following
tables.
a) If you are a real wimp, you will pretend that each column is a random sample and compare the means of
each operator. (5) The table will look like that below.
$H_0$: Column means are equal. We do not reject this hypothesis because our computed F is less than the
table F.

Source           SS    DF      MS       F          F.05
Between         9.5     3    3.167    0.425ns     F.05(3,12) = 3.49
Within         89.5    12    7.458
Total          99.0    15
b) If you are less wimpy, you will pretend that this is a 2-way ANOVA and your table will look like that
below. (8)
$H_{01}$: Row means are equal. We do not reject this hypothesis because our computed F is less than the table
F.
$H_{02}$: Column means are equal. We do not reject this hypothesis because our computed F is less than the
table F.

Source           SS    DF      MS       F          F.05
Rows A         14.0     3    4.667    0.556ns     F.05(3,9) = 3.86
Columns B       9.5     3    3.167    0.377ns     F.05(3,9) = 3.86
Within         75.5     9    8.389
Total          99      15
c) If you are very daring, you will try the table below. (11) To do this you need to know that the means for
the 4 materials are 8, 3.75, 3.75 and 3.50 and that the factor C sum of squares is
$SSC = 4\sum \bar{x}_{..k}^2 - n\bar{\bar{x}}^2 = 4\left[8^2 + 3.75^2 + 3.75^2 + 3.50^2\right] - 16(4.75)^2 = 56.5$. I think the degrees of freedom
should be obvious. You have 3 null hypotheses. Tell me what they are and whether you reject them.
$H_{01}$: Row means are equal. We do not reject this hypothesis because our computed F is less than the table
F.
$H_{02}$: Column means are equal. We do not reject this hypothesis because our computed F is less than the
table F.
$H_{03}$: Material means are equal. We reject this hypothesis because our computed F is larger than the table
F.

Source           SS    DF      MS        F          F.05
Rows A         14.0     3    4.6667    1.474ns     F.05(3,6) = 3.76
Columns B       9.5     3    3.1667    1.000ns     F.05(3,6) = 3.76
Materials C    56.5     3    18.833    5.947s      F.05(3,6) = 3.76
Within         19.0     6    3.1667
Total          99      15
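A sketch of the sums of squares behind these tables, using the material layout as reconstructed in the data table above (the C1-C4 tags); it is a numerical check, not part of the required answer:

import numpy as np

x = np.array([[7, 4, 5, 3],
              [6, 9, 4, 2],
              [5, 1, 6, 1],
              [6, 3, 4, 10]], dtype=float)
# material labels matching the C1-C4 tags in the table
mat = np.array([[1, 2, 3, 4],
                [4, 1, 2, 3],
                [3, 4, 1, 2],
                [2, 3, 4, 1]])

n = x.size
grand = x.mean()
SST = (x**2).sum() - n * grand**2                       # 99
SSA = 4 * (x.mean(axis=1)**2).sum() - n * grand**2      # rows (machines), 14
SSB = 4 * (x.mean(axis=0)**2).sum() - n * grand**2      # columns (operators), 9.5
mat_means = np.array([x[mat == k].mean() for k in (1, 2, 3, 4)])
SSC = 4 * (mat_means**2).sum() - n * grand**2           # materials, 56.5
SSW = SST - SSA - SSB - SSC                             # within, 19
print(SST, SSA, SSB, SSC, SSW)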
d) Assuming that your data is cross-classified, compare the means of columns 1 and 4 using a 2-sample
method. (3)
Solution: This is an easy one. Let $d = x_1 - x_4$.

  x1   x4     d    d²
   7    3     4    16
   6    2     4    16
   5    1     4    16
   6   10    -4    16
  24   16     8    64

From the document 252solnD2 we have the following. To do this problem we do not need
statistics on $x_1$ or $x_4$, but only on the difference $d = x_1 - x_4$, which is displayed above. You
should be able to compute $n = 4$, $\sum d = 8$ and $\sum d^2 = 64$.
So we have $\bar{d} = \frac{\sum d}{n} = \frac{8}{4} = 2$ and $s_d^2 = \frac{\sum d^2 - n\bar{d}^2}{n-1} = \frac{64 - 4(2)^2}{3} = 16$, which gives
$s_d = \sqrt{16} = 4$. We need $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{4}{\sqrt{4}} = 2$.
If the paired data problem were on the formula table, it would appear as below.

Interval for: Difference between Two Means (paired data), $D = \mu_1 - \mu_2$, $d = x_1 - x_2$
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}s_{\bar{d}}$, where $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$ (the same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$)
Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}s_{\bar{d}}$

$t_{.025}^{n-1} = t_{.025}^{3} = 3.182$. We can do one of the following.
(i) Confidence interval: $D = \bar{d} \pm t_{\alpha/2}s_{\bar{d}} = 2 \pm 3.182(2) = 2 \pm 6.364$. Because this interval
includes zero, we do not reject our null hypothesis and conclude that there is no significant difference between
the means of the populations from which the two data columns come.
(ii) Test ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{2 - 0}{2} = 1.00$. Our reject zone is below -3.182 and above 3.182. Since
the t-ratio does not fall in a reject zone, we do not reject our null hypothesis and conclude that there is no
significant difference between the means of the populations from which the two data columns come.
(iii) Critical value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}s_{\bar{d}} = 0 \pm 3.182(2) = \pm 6.364$. Since $\bar{d} = 2$ falls
between these 2 limits, we do not reject the null hypothesis and conclude that there is no significant difference
between the means of the populations from which the two data columns come.
e) Assume that this is the equivalent of a 2-way one-measurement-per-cell ANOVA, but that the underlying
distribution is not Normal, and do an appropriate rank test. (5)
In general, if the parent distribution is Normal use ANOVA; if it's not Normal, use Friedman or
Kruskal-Wallis. If the samples are independent random samples use 1-way ANOVA or Kruskal-Wallis.
If they are cross-classified, use Friedman or 2-way ANOVA. So the other method that allows for
cross-classification is Friedman, and we use it if the underlying distribution is not Normal.
The null hypothesis is $H_0$: Columns come from the same distribution, or $H_0: m_1 = m_2 = m_3 = m_4$. We use a
Friedman test because the data is cross-classified by row (machine). This time we rank our data only within rows.
There are $c = 4$ columns and $r = 4$ rows.

Row    x1   x2   x3   x4      r1    r2    r3    r4
 1      7    4    5    3       4     2     3     1
 2      6    9    4    2       3     4     2     1
 3      5    1    6    1       3   1.5     4   1.5
 4      6    3    4   10       3     1     2     4
Sum                           13   8.5    11   7.5

To check the ranking, note that the sum of the four rank sums is 13 + 8.5 + 11 + 7.5 = 40, and that
the sum of the rank sums should be $\sum SR_i = \frac{rc(c+1)}{2} = \frac{4(4)(5)}{2} = 40$.
Now compute the Friedman statistic
$\chi_F^2 = \frac{12}{rc(c+1)}\sum_i SR_i^2 - 3r(c+1) = \frac{12}{4(4)(5)}\left[13^2 + 8.5^2 + 11^2 + 7.5^2\right] - 3(4)(5)$
$= 0.15(169 + 72.25 + 121 + 56.25) - 60 = 0.15(418.5) - 60 = 2.775$.
If we check the Friedman table for $c = 4$ and $r = 4$, we find that the p-value is between .508 (for 2.7) and
.432 (for 3.0). Since 2.775 is about halfway between 2.7 and 3, the p-value must be above 5% and we do not
reject the null hypothesis. Alternately, since the table says that 7.5 has a p-value of .052 and 7.8 has a
p-value of .036, the 5% critical value must be slightly above 7.5. Since 2.775 is well below the critical value,
do not reject the null hypothesis.
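For comparison, scipy's Friedman test can be run on the same four columns. A minimal sketch; scipy applies a tie correction, so its statistic may differ slightly from the hand value of 2.775.

from scipy import stats

b1 = [7, 6, 5, 6]     # column B1 across the four rows
b2 = [4, 9, 1, 3]     # column B2
b3 = [5, 4, 6, 4]     # column B3
b4 = [3, 2, 1, 10]    # column B4
stat, p = stats.friedmanchisquare(b1, b2, b3, b4)
print(round(stat, 3), round(p, 3))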
6.
a. A Stock moves up and down as follows. In 36 days it goes up 14 times and down 22 times.
UDDDDUUUDUDDDUUDDDDUDDDUUDDDUDUUDDDU
(i) Test these movements for randomness. (5)
(ii) Take the first half of the series and test it for randomness (and don't repeat exactly what you did in
part (i)). (4)
b. Explain, briefly, why I did not bother with a Durbin-Watson test in the regression that began the exam.
(2)
c. Test the hypothesis that the population the D's and U's above came from is evenly split between D's and
U's. (4)
Solution: a) (i) Mark off the runs:

U  DDDD  UUU  D  U  DDD  UU  DDDD  U  DDD  UU  DDD  U  D  UU  DDD  U
1    2    3   4  5   6    7    8   9   10  11   12 13 14  15   16 17

$n = 36$, $n_1 = 14$, $n_2 = 22$, $r = 17$. The outline says that for a larger problem (if $n_1$ and $n_2$ are too large
for the table), $r$ follows the Normal distribution with
$\mu = \frac{2n_1n_2}{n} + 1 = \frac{2(14)(22)}{36} + 1 = 18.111$ and $\sigma^2 = \frac{(\mu - 1)(\mu - 2)}{n - 1} = \frac{(17.111)(16.111)}{35} = 7.877$.
So $z = \frac{r - \mu}{\sigma} = \frac{17 - 18.111}{\sqrt{7.877}} = -0.396$. Since this value of z
is between $\pm z_{.025} = \pm 1.960$, we do not reject $H_0$: Randomness.
(ii) For half the series, $n = 18$, $n_1 = 7$, $n_2 = 11$, $r = 8$. The Runs Test table says that the critical values are
5 and 14. Since 8 is between these numbers, we do not reject the null hypothesis.
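A sketch of the part a(i) runs-test arithmetic on the sequence given above:

from math import sqrt

seq = "UDDDDUUUDUDDDUUDDDDUDDDUUDDDUDUUDDDU"
n = len(seq)
n1 = seq.count("U")
n2 = seq.count("D")
r = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)   # number of runs, 17

mu = 2.0 * n1 * n2 / n + 1                # about 18.11
var = (mu - 1) * (mu - 2) / (n - 1)       # about 7.88
z = (r - mu) / sqrt(var)                  # about -0.40, inside +/-1.960
print(n1, n2, r, round(z, 3))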
b. The Durbin-Watson statistic is a test for serial correlation of the residuals. It is useful in problems that
have a time dimension, which this problem does not have.
c. We are testing $H_0: p = .5$, where $p$ is the proportion of D's in the population underlying the sample.
Our data give $x = 22$, $n = 36$ and $\bar{p} = \frac{22}{36} = 0.61$. The formula table has the following for a proportion.
Confidence Interval: $p = \bar{p} \pm z_{\alpha/2}s_{\bar{p}}$, where $s_{\bar{p}} = \sqrt{\frac{\bar{p}\bar{q}}{n}}$ and $\bar{q} = 1 - \bar{p}$.
Hypotheses: $H_0: p = p_0$, $H_1: p \ne p_0$.
Test Ratio: $z = \frac{\bar{p} - p_0}{\sigma_p}$.
Critical Value: $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\sigma_p$, where $\sigma_p = \sqrt{\frac{p_0q_0}{n}}$ and $q_0 = 1 - p_0$.
For the test ratio or critical value method, $\sigma_p = \sqrt{\frac{p_0q_0}{n}} = \sqrt{\frac{.5(.5)}{36}} = \sqrt{.0069} = .08333$, with $H_0: p = .5$.
Critical value method: $\bar{p}_{cv} = p_0 \pm z_{\alpha/2}\sigma_p = .5 \pm 1.960(.08333) = .5 \pm 0.16$, or .34 to .66. Since
$\bar{p} = \frac{22}{36} = 0.61$ falls between these limits, we cannot reject the null hypothesis.
Test ratio method: $z = \frac{\bar{p} - p_0}{\sigma_p} = \frac{.61 - .5}{.08333} = 1.32$. Critical values for z are $\pm 1.96$. Since our computed z
falls between these limits, we cannot reject the null hypothesis.
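A sketch of the part (c) proportion-test arithmetic:

from math import sqrt

x, n, p0 = 22, 36, 0.5
pbar = x / n                              # about 0.611
sigma_p = sqrt(p0 * (1 - p0) / n)         # about 0.0833
z = (pbar - p0) / sigma_p                 # about 1.33 (1.32 with the rounded p-bar), inside +/-1.960
lo, hi = p0 - 1.960 * sigma_p, p0 + 1.960 * sigma_p
print(round(z, 2), round(lo, 2), round(hi, 2))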