252x0343 12/06/03
ECO252 QBA2
FINAL EXAM
DEC 11, 2003
Name
Hour of Class Registered _______
I. (25+ points) Do all the following. Note that answers without reasons receive no credit. Most answers
require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing
a table value or a p-value.
The fourth computer problem involved the regression of the Y variable below against some, but not all of
the X values.
Variables in Data Set

Column   Variable   Description
C2       X2         Type: 1 = Private, 0 = Public
C3       X1         First Quartile SAT
C4       X5         3rd Quartile SAT
C5       X4         Room and Board Cost
C6       Y          Annual Total Cost
C7       X6         Average Indebtedness at Graduation
C8       X3         Interaction = X1 * X2
You were directed to hand in the computer output (4 points) and your answer to problem 14.37 or the
equivalent problem in the 8th edition. (Up to 7 points). I ran the same problem you did, but went on to add
X4 and X5 to the input. The output appears on pages 1-7.
My first two regressions were stepwise regressions. The second stepwise regression is set up to force the
dummy variable designating ‘type of university’ into the equation. This means that our first equation in
regression 2 is essentially $\hat{Y} = b_0 + b_2 X_2$.
a) According to the first equation in regression 2, what are the mean annual total costs for public
and private universities and how does the printout show us that they are significantly different? (2)
b) Regressions 3 and 4 are the regressions you supposedly did. According to regression 4, for a
public university the constant in the regression equation is $b_0 = 1013$ and the slope relative to the first
quartile SAT is $b_1 = 11.339$. For a private university, the equation relating annual total costs to the first
quartile SAT effectively has both a different intercept and a different slope; what is the equation? Are the
intercepts and slopes for public and private universities in fact significantly different? What tells us this? (3)
(Extra credit: at what SAT level do public and private universities have the same cost? (2))
c) Regression 6 should be the best of all the regressions, because it has the most independent
variables and the highest R-squared, but it isn’t. (i) Look at the coefficients of the independent variables and
ignore their significance. One of those coefficients is incredibly unreasonable; which one is it? (1) (ii)
Which coefficients are significant at the 1% level, and why? (2) What about the 10% level? (1) Compare the
adjusted R-squares with those of the other regressions. What do they tell us? (1) Look at the VIFs. What do they
imply? (2)
d) Do an F test to tell whether adding X3, X4 and X5 as a package to equation 3 with only X1 and
X2 was useful? What is your conclusion? (4)
e) I didn’t follow directions when I did a prediction interval for equation 3, so it should disagree
with yours. I added some guesses as to (median?) values for X3, X4 and X5. What does the printout say I
used? What would you expect should happen to the size of the prediction interval if our addition of new
variables gives us a better estimate of Y? Did it happen? Cite numbers.(3)
f) Use the method suggested in the text, using the standard error $s_e$, to compute a prediction
interval for the same values of the independent variables and equation 3. How accurate is it? (3) 32
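Parts d) and f) both reduce to arithmetic on numbers already in the printout. The sketch below is only an illustration under stated assumptions: it takes the error sums of squares from regressions 3 and 6 for a partial F statistic, and uses fit ± t·s_e as the rough prediction interval. The function name `partial_f` is hypothetical, and the t value 1.991 is an assumed table value for 77 degrees of freedom, not something printed by Minitab.

```python
# Values copied from the Minitab printout (regressions 3 and 6).
sse_reduced, df_reduced = 596492779, 77   # regression 3: X1 and X2 only
sse_full, df_full = 561087507, 74         # regression 6: X1 through X5


def partial_f(sse_r, df_r, sse_f, df_f):
    """F statistic for the block of predictors added to the reduced model."""
    extra = df_r - df_f                   # number of added predictors
    return ((sse_r - sse_f) / extra) / (sse_f / df_f)


f_stat = partial_f(sse_reduced, df_reduced, sse_full, df_full)

# Rough prediction interval for regression 3: fit +/- t * s_e.
fit, s_e = 20993, 2783                    # X1 = 1000, X2 = 1
t_crit = 1.991                            # ASSUMED table value, t(.025) with 77 df
approx_pi = (fit - t_crit * s_e, fit + t_crit * s_e)

print(round(f_stat, 3))
print(tuple(round(v) for v in approx_pi))
```

The F statistic has 3 and 74 degrees of freedom and would be compared with the 5% value from the F table (roughly 2.7). The rough interval can be set beside the printed 95% PI of (15332, 26654).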
————— 12/5/2003 7:15:10 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > Retrieve "C:\Documents and Settings\RBOVE.WCUPANET\My Documents\Drive
D\MINITAB\Colleges2002.MTW".
Retrieving worksheet from file: C:\Documents and Settings\RBOVE.WCUPANET\My
Documents\Drive D\MINITAB\Colleges2002.MTW
# Worksheet was saved on Fri Dec 05 2003
Results for: Colleges2002.MTW
MTB > Stepwise c6 c3 c2 c8 c5 c4;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.
1) Stepwise Regression: Annual Total versus First quarti, Type of Scho, ...
Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15
Response is Annual T on 5 predictors, with N = 80

Step              1       2
Constant      12198   -1021

inter         10.42    8.35
T-Value       17.40   11.97
P-Value       0.000   0.000

First qu               13.3
T-Value                4.62
P-Value               0.000

S              3058    2724
R-Sq          79.51   83.95
R-Sq(adj)     79.25   83.54
C-p            20.2     1.3
More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no
MTB > Stepwise c6 c3 c2 c8 c5 c4;
SUBC>   Force c2;
SUBC>   AEnter 0.15;
SUBC>   ARemove 0.15;
SUBC>   Constant.
2) Stepwise Regression: Annual Total versus First quarti, Type of Scho, ...
Alpha-to-Enter: 0.15   Alpha-to-Remove: 0.15
Response is Annual T on 5 predictors, with N = 80

Step              1       2       3
Constant      12478   -7264    1013

Type of       11646    8732   -3016
T-Value       13.97   11.57   -0.48
P-Value       0.000   0.000   0.630

First qu               19.5    11.3
T-Value                7.37    2.25
P-Value               0.000   0.027

inter                          11.2
T-Value                        1.90
P-Value                       0.061

S              3610    2783    2737
R-Sq          71.44   83.24   84.00
R-Sq(adj)     71.07   82.81   83.37
C-p            58.1     4.7     3.1
More? (Yes, No, Subcommand, or Help)
SUBC> yes
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> no
MTB > Name c18 = 'RESI1'
MTB > Regress c6 2 c3 c2;
SUBC>   Residuals 'RESI1';
SUBC>   GHistogram;
SUBC>   GNormalplot;
SUBC>   GFits;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10;
SUBC>   Brief 2.
3) Regression Analysis: Annual Total versus First quarti, Type of Scho
The regression equation is
Annual Total Cost = - 7264 + 19.5 First quartile SAT + 8732 Type of School
Predictor        Coef   SE Coef       T       P     VIF
Constant        -7264      2728   -2.66   0.009
First qu       19.524     2.651    7.37   0.000     1.4
Type of        8732.4     754.7   11.57   0.000     1.4

S = 2783    R-Sq = 83.2%    R-Sq(adj) = 82.8%

Analysis of Variance
Source            DF           SS           MS        F       P
Regression         2   2963313624   1481656812   191.26   0.000
Residual Error    77    596492779      7746659
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
 27       1040      21484   13041      514       8443      3.09R
 56       1010      15722   21188      560      -5466     -2.00R
 61       1320      17526   27240      578      -9714     -3.57R
R denotes an observation with a large standardized residual

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20993      579   (19839, 22147)     (15332, 26654)

Values of Predictors for New Observations
New Obs   First qu   Type of
1             1000      1.00
Residual Histogram for Annual T
Normplot of Residuals for Annual T
Residuals vs Fits for Annual T
MTB > %Resplots c18 c2;
SUBC>   Title "Residuals vs Type".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs Type of Scho
MTB > %Resplots c18 c3;
SUBC>   Title "Residuals vs Type".
Executing from file: W:\wminitab13\MACROS\Resplots.MAC
Macro is running ... please wait
Residual Plots: RESI1 vs First quarti
MTB > Name c19 = 'RESI2'
MTB > Regress c6 3 c3 c2 c8;
SUBC>   Residuals 'RESI2';
SUBC>   GHistogram;
SUBC>   GNormalplot;
SUBC>   GFits;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10 c11;
SUBC>   Brief 2.
4) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = 1013 + 11.3 First quartile SAT - 3016 Type of School
+ 11.2 inter
Predictor        Coef   SE Coef       T       P     VIF
Constant         1013      5120    0.20   0.844
First qu       11.339     5.039    2.25   0.027     5.2
Type of         -3016      6234   -0.48   0.630    97.2
inter          11.177     5.889    1.90   0.061   120.6

S = 2737    R-Sq = 84.0%    R-Sq(adj) = 83.4%

Analysis of Variance
Source            DF           SS           MS        F       P
Regression         3   2990309581    996769860   133.02   0.000
Residual Error    76    569496823      7493379
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989
inter        1     26995957

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
  3        800       9476   10084     1176       -608     -0.25 X
  9       1250      13986   15186     1303      -1200     -0.50 X
 27       1040      21484   12805      520       8679      3.23R
 61       1320      17526   27718      622     -10192     -3.82R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20513      623   (19271, 21755)     (14921, 26105)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter
1             1000      1.00    1000
MTB > Name c20 = 'RESI3'
MTB > Regress c6 4 c3 c2 c8 c5;
SUBC>   Residuals 'RESI3';
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10 c11 c12;
SUBC>   Brief 2.
5) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = - 13 + 11.4 First quartile SAT - 3053 Type of School
+ 10.9 inter + 0.165 Room and Board
Predictor        Coef   SE Coef       T       P     VIF
Constant          -13      5483   -0.00   0.998
First qu       11.382     5.064    2.25   0.028     5.2
Type of         -3053      6263   -0.49   0.627    97.3
inter          10.928     5.934    1.84   0.069   121.3
Room and       0.1655    0.3062    0.54   0.591     1.9

S = 2750    R-Sq = 84.1%    R-Sq(adj) = 83.2%

Analysis of Variance
Source            DF           SS           MS        F       P
Regression         4   2992518033    748129508    98.91   0.000
Residual Error    75    567288370      7563845
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989
inter        1     26995957
Room and     1      2208452

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
  9       1250      13986   15174     1309      -1188     -0.49 X
 27       1040      21484   12880      541       8604      3.19R
 61       1320      17526   27621      650     -10095     -3.78R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20071     1030   (18020, 22122)     (14221, 25922)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter   Room and
1             1000      1.00    1000       5000
MTB > Name c21 = 'RESI4'
MTB > Regress c6 5 c3 c2 c8 c5 c4;
SUBC>   Residuals 'RESI4';
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c10 c11 c12 c13;
SUBC>   Brief 2.
6) Regression Analysis: Annual Total versus First quarti, Type of Scho, ...
The regression equation is
Annual Total Cost = 5873 + 26.2 First quartile SAT - 4605 Type of School
+ 12.2 inter + 0.150 Room and Board - 17.0 Third quartile SAT
Predictor        Coef   SE Coef       T       P     VIF
Constant         5873      8515    0.69   0.493
First qu        26.23     17.18    1.53   0.131    59.2
Type of         -4605      6502   -0.71   0.481   104.5
inter          12.162     6.096    2.00   0.050   127.7
Room and       0.1503    0.3070    0.49   0.626     1.9
Third qu       -17.01     18.81   -0.90   0.369    58.0

S = 2754    R-Sq = 84.2%    R-Sq(adj) = 83.2%

Analysis of Variance
Source            DF           SS           MS        F       P
Regression         5   2998718897    599743779    79.10   0.000
Residual Error    74    561087507      7582264
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Type of      1   1037006989
inter        1     26995957
Room and     1      2208452
Third qu     1      6200863

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
  3        800       9476    9606     1323       -130     -0.05 X
  9       1250      13986   15381     1331      -1395     -0.58 X
 27       1040      21484   13192      642       8292      3.10R
 61       1320      17526   27217      789      -9691     -3.67R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         20002     1034   (17942, 22061)     (14141, 25862)

Values of Predictors for New Observations
New Obs   First qu   Type of   inter   Room and   Third qu
1             1000      1.00    1000       5000       1200
MTB > Name c22 = 'RESI5'
MTB > Regress c6 2 c3 c5 ;
SUBC>   Residuals 'RESI5';
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Predict c9 c12 ;
SUBC>   Brief 2.
7) Regression Analysis: Annual Total versus First quarti, Room and Boa
The regression equation is
Annual Total Cost = - 24258 + 27.9 First quartile SAT + 1.84 Room and Board
Predictor        Coef   SE Coef       T       P     VIF
Constant       -24258      3686   -6.58   0.000
First qu       27.927     3.532    7.91   0.000     1.2
Room and       1.8439    0.3534    5.22   0.000     1.2

S = 3959    R-Sq = 66.1%    R-Sq(adj) = 65.2%

Analysis of Variance
Source            DF           SS           MS        F       P
Regression         2   2352968982   1176484491    75.06   0.000
Residual Error    77   1206837422     15673213
Total             79   3559806404

Source      DF       Seq SS
First qu     1   1926306635
Room and     1    426662346

Unusual Observations
Obs   First qu   Annual T     Fit   SE Fit   Residual   St Resid
 14        920       7210   16752     1006      -9542     -2.49R
 16       1120       9451   18272      593      -8821     -2.25R
 41       1060      25865   14472      849      11393      2.95R
 53        900      17886   18758     1441       -872     -0.24 X
 61       1320      17526   26398      845      -8872     -2.29R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Predicted Values for New Observations
New Obs     Fit   SE Fit         95.0% CI            95.0% PI
1         12889      820   (11255, 14522)     ( 4838, 20939)

Values of Predictors for New Observations
New Obs   First qu   Room and
1             1000       5000
II. Do at least 4 of the following 6 Problems (at least 13 each) (or do sections adding to at least 50 points. Anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where
applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing
appropriate statistical tests; that is, explain your hypotheses and what values from what table were used to
test them.
1. A marketing analyst collects data on the screen size and price of the two models produced by a
competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe
model (zero for the regular model) and rx1 is the column in which you may rank x1.

Row    price   size  model
           y     x1     x2    x1^2  x2^2       y^2     x1 y   x2 y  x1 x2    rx1
  1   371.69     13      0     169     0    138153     4832             0      1
  2   403.61     19      0     361     0    162901     7669             0      3
  3   484.41     21      0     441     0    234653    10173             0
  4   492.89     17      1     289     1    242941     8379            17
  5   606.25     25      0     625     0    367539    15156             0
  6   634.41     21      1     441     1    402476    13323            21
  7   651.00    210      0   44100     0    423801   136710             0
  8   806.25     25      1     625     1    650039    20156            25
  9  1131.00    290      0   84100     0   1279161   327990             0    9.0
 10  1739.00    370      1  136900     1   3024121   643430           370   10.0
Sum  7320.51   1011      4  268051     4   6925785  1187817           433
a. Fill in the x2 y column. (2)
b. Compute the simple regression of price against size. (6)
c. Compute R squared and R squared adjusted for degrees of freedom. (3)
d. Compute the standard error $s_e$. (3)
e. Compute $s_{b_1}$ and make it into a confidence interval for $\beta_1$. (3)
f. Do a prediction interval for the price of a model with a 19 inch screen. (4)
21
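The column totals in the table are all that parts b and c need. A minimal sketch, assuming the usual computational formulas for the spare parts (the names `s_xx`, `s_xy` and `s_yy` are illustrative, not taken from the exam):

```python
# Column totals from the data table for problem 1.
n = 10
sum_x, sum_y = 1011, 7320.51
sum_x2, sum_y2, sum_xy = 268051, 6925785, 1187817

x_bar, y_bar = sum_x / n, sum_y / n

# "Spare parts": corrected sums of squares and cross products.
s_xx = sum_x2 - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar
s_yy = sum_y2 - n * y_bar ** 2        # total sum of squares (SST)

b1 = s_xy / s_xx                      # slope of price on size
b0 = y_bar - b1 * x_bar               # intercept
r_sq = b1 * s_xy / s_yy               # SSR / SST

print(round(b0, 2), round(b1, 4), round(r_sq, 4))
```

Because the y^2 and x1 y columns are rounded to whole numbers, results will differ slightly from a regression run on the unrounded data.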
2. A marketing analyst collects data on the screen size and price of the two models produced by a
competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe
model (zero for the regular model) and rx1 is the column in which you will rank x1.

Row    price   size  model
           y     x1     x2    x1^2  x2^2       y^2     x1 y   x2 y  x1 x2    rx1
  1   371.69     13      0     169     0    138153     4832             0      1
  2   403.61     19      0     361     0    162901     7669             0      3
  3   484.41     21      0     441     0    234653    10173             0
  4   492.89     17      1     289     1    242941     8379            17
  5   606.25     25      0     625     0    367539    15156             0
  6   634.41     21      1     441     1    402476    13323            21
  7   651.00    210      0   44100     0    423801   136710             0
  8   806.25     25      1     625     1    650039    20156            25
  9  1131.00    290      0   84100     0   1279161   327990             0    9.0
 10  1739.00    370      1  136900     1   3024121   643430           370   10.0
Sum  7320.51   1011      4  268051     4   6925785  1187817           433
a. Do a multiple regression of price against size and model.(10)
b. Compute R-squared and R-squared adjusted for degrees of freedom for this regression and
compare them with the values for the previous problem. (4)
c. Using either R-squares or SST, SSR and SSE, do F tests (ANOVA). First check the
usefulness of the simple regression and then the value of ‘model’ as an improvement to the
regression. (6)
d. Predict the price of a deluxe model with a 19 inch screen – how much change is there from
your last prediction? (2)
22
3. A marketing analyst collects data on the screen size and price of the two models produced by a
competitor. Here ‘price’ is the price in dollars, ‘size’ is screen size in inches, ‘model’ is 1 for the deluxe
model (zero for the regular model) and rx1 is the column in which you will rank x1.

Row    price   size  model
           y     x1     x2    x1^2  x2^2       y^2     x1 y   x2 y  x1 x2    rx1
  1   371.69     13      0     169     0    138153     4832             0      1
  2   403.61     19      0     361     0    162901     7669             0      3
  3   484.41     21      0     441     0    234653    10173             0
  4   492.89     17      1     289     1    242941     8379            17
  5   606.25     25      0     625     0    367539    15156             0
  6   634.41     21      1     441     1    402476    13323            21
  7   651.00    210      0   44100     0    423801   136710             0
  8   806.25     25      1     625     1    650039    20156            25
  9  1131.00    290      0   84100     0   1279161   327990             0    9.0
 10  1739.00    370      1  136900     1   3024121   643430           370   10.0
Sum  7320.51   1011      4  268051     4   6925785  1187817           433
a. Compute the correlation between price and size and check to see if it is significant using the
spare parts from problem 1 if you have them. (5)
b. Use the same correlation to test the hypothesis that the correlation is .85 (4)
c. Do ranks for the values of ‘size’ in the rx1 column, compute a rank correlation between price
and size and test it for significance using the rank correlation table if possible. (5) 14
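For part c, tied sizes share the average of the ranks they occupy. The sketch below is one way to fill the rx1 column and compute a rank correlation; the helper `average_ranks` is a hypothetical name, and the formula 1 − 6Σd²/(n(n² − 1)) is used as is, which is only approximate when ties are present.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


price = [371.69, 403.61, 484.41, 492.89, 606.25,
         634.41, 651.00, 806.25, 1131.00, 1739.00]
size = [13, 19, 21, 17, 25, 21, 210, 25, 290, 370]

r_price = average_ranks(price)
r_size = average_ranks(size)           # the rx1 column
n = len(price)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r_price, r_size))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(r_size)
print(round(r_s, 4))
```

The first two and last two ranks agree with the entries already printed in the rx1 column (1, 3, 9.0, 10.0).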
4. Explain the following.
a. Under what circumstances could you use a Chi-squared method to test for Normality but not a
Kolmogorov-Smirnov? (2)
b. Under what circumstances could you use a Lilliefors test to test for Normality but not a
Kolmogorov-Smirnov? (2)
c. Under what circumstances could you use a Kruskal-Wallis test to test whether four
distributions are similar but not a one-way ANOVA? (2)
d. What 2 tests can be used to test for the equality of two medians? Which is more powerful? (2)
f. A random sample of 21 Porsche drivers were asked how many miles they had driven in the last
year and a frequency table was constructed of the data.
Miles            Observed frequency
0 – 4000                 2
4000 – 8000              7
8000 – 12000             7
Over 12000               5
Does the data follow a Normal distribution with a mean of 8000 and a standard deviation of 2000? Do not
cut the number of groups below what is presented here. Find the appropriate E or cumulative E and do the
test. (6)
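The expected frequencies E and the cumulative E for part f come from the Normal(8000, 2000) distribution function evaluated at the class boundaries. A minimal sketch, assuming the first class is treated as everything below 4000 and using `math.erf` for the Normal CDF in place of the table (the helper name `normal_cdf` is illustrative):

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a Normal(mean, sd) variable at x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


upper_bounds = [4000, 8000, 12000]     # last class is open-ended
observed = [2, 7, 7, 5]
n = sum(observed)

# Cumulative expected proportions at each upper class boundary.
cum_expected = [normal_cdf(u, 8000, 2000) for u in upper_bounds] + [1.0]
# Expected frequency E in each class.
expected = [n * (c - p) for c, p in zip(cum_expected, [0.0] + cum_expected[:-1])]

# Cumulative observed proportions.
cum_observed = []
running = 0
for o in observed:
    running += o
    cum_observed.append(running / n)

# Largest gap between the cumulative distributions (Kolmogorov-Smirnov D).
max_diff = max(abs(o - e) for o, e in zip(cum_observed, cum_expected))

print([round(e, 2) for e in expected])
print(round(max_diff, 4))
```

The size of the expected frequencies in the end classes is what decides between a chi-squared test and a test based on the cumulative distribution.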
5. (Ullman) A Latin Square is an extremely effective way of doing a 3-way ANOVA. In this example the
data is arranged in 4 rows and 4 columns. There are 3 factors. Factor A is rows (machines), Factor B is
columns (operators), and Factor C (materials) is shown by a tag C1, C2, C3, or C4. Each material appears
once in each row or column. These are times to do a job categorized by machines, operators and cutting
material. The rules are just the same as in any ANOVA: degrees of freedom add up and sums of squares
add up. I am going to set this up as a 2-way ANOVA with one measurement per cell. There is no
interaction. We, of course, assume that the parent distribution is Normal.
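           B1      B2      B3      B4     Sum    n_i    Mean     SS   Mean^2
A1       7 C1    6 C4    5 C3    6 C2      24      4    6.00    146
A2       4 C2    9 C1    1 C4    3 C3      17      4    4.25    107
A3       5 C3    4 C2    6 C1    4 C4      19      4    4.75     93
A4       3 C4    2 C3    1 C2   10 C1      16      4    4.00    114
Sum        19      21      13      23      76     16    (  )
n_j         4       4       4       4      16
Mean     4.75    5.25    3.25    5.75    (  )
SS         99     137      63     161
Mean^2

Here Sum is the row or column total, Mean is the row mean $\bar{x}_{i\cdot\cdot}$ or column mean $\bar{x}_{\cdot j\cdot}$, SS is the sum of squared observations $\sum x_{ijk}^2$ for that row or column, and Mean^2 is the squared mean. Blank cells and (  ) are left for you to fill in; the totals $\sum x_{ijk}^2$, $\sum \bar{x}_{i\cdot\cdot}^2$ and $\sum \bar{x}_{\cdot j\cdot}^2$ go in the margins.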
You now have a choice. a) If you are a real wimp, you will pretend that each column is a random sample
and compare the means of each operator. (5) The table will look like that below.

Source       SS    DF    MS    F    F.05
Between
Within
Total
b) If you are less wimpy, you will pretend that this is a 2-way ANOVA and your table will look like that
below. (8)

Source       SS    DF    MS    F    F.05
Rows A
Columns B
Within
Total
c) If you are very daring, you will try the table below. (11) To do this you need to know that the means for
the 4 materials are 8, 3.75, 3.75 and 3.50 and that the factor C sum of squares is
$SSC = 4\sum \bar{x}_{\cdot\cdot k}^2 - n\bar{x}^2 = 4\left(8^2 + 3.75^2 + 3.75^2 + 3.50^2\right) - 16\bar{x}^2 = ?$
I think that the degrees of freedom
should be obvious. Please don’t make the same mistakes you made on the last exam! You have 3 null
hypotheses. Tell me what they are and whether you reject them.

Source        SS    DF    MS    F    F.05
Rows A
Columns B
Materials C
Within
Total
d) Assuming that your data is cross classified, compare the means of columns 1 and 4 using a 2-sample
method. (3)
e) Assume that this is the equivalent of a 2-way one-measurement per cell ANOVA, but that the underlying
distribution is not Normal and do an appropriate rank test. (5)
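The sums of squares for part c all follow the same pattern as the SSC formula: 4 times the sum of squared means, minus n times the squared grand mean. The sketch below illustrates that arithmetic only. It assumes the materials in row A4 run C4, C3, C2, C1 across B1 to B4, which is the assignment that reproduces the stated material means of 8, 3.75, 3.75 and 3.50; the helper `factor_ss` is a hypothetical name.

```python
# Times by machine (rows A1..A4) and operator (columns B1..B4).
times = [[7, 6, 5, 6],
         [4, 9, 1, 3],
         [5, 4, 6, 4],
         [3, 2, 1, 10]]
# Material tag for each cell. ASSUMPTION: row A4 is C4, C3, C2, C1.
material = [[1, 4, 3, 2],
            [2, 1, 4, 3],
            [3, 2, 1, 4],
            [4, 3, 2, 1]]

n = 16
grand_mean = sum(sum(row) for row in times) / n
correction = n * grand_mean ** 2                       # n * xbar^2


def factor_ss(means):
    """4 * (sum of squared level means) - n * xbar^2."""
    return 4 * sum(m ** 2 for m in means) - correction


row_means = [sum(row) / 4 for row in times]
col_means = [sum(times[i][j] for i in range(4)) / 4 for j in range(4)]
mat_means = [sum(times[i][j] for i in range(4) for j in range(4)
                 if material[i][j] == k) / 4 for k in (1, 2, 3, 4)]

sst = sum(x ** 2 for row in times for x in row) - correction
ssa, ssb, ssc = factor_ss(row_means), factor_ss(col_means), factor_ss(mat_means)
ssw = sst - ssa - ssb - ssc                            # what is left over

print(mat_means)
print(sst, ssa, ssb, ssc, ssw)
```

Each factor has 3 degrees of freedom out of the total of 15, and the remainder belongs to the Within line.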
6.
a. A stock moves up and down as follows. In 36 days it goes up 14 times and down 22 times.
UDDDDUUUDUDDDUUDDDDUDDDUUDDDUDUUDDDU
(i) Test these movements for randomness. (5)
(ii) Take the first half of the series and test it for randomness (and don’t repeat exactly what you did in
part (i)). (4)
b. Explain, briefly, why I did not bother with a Durbin – Watson test in the regression that began the exam
(2)
c. Test the hypothesis that the population the D’s and U’s above came from is evenly split between D’s and
U’s (4).
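Parts a(i) and c can both be checked by counting on the string of U's and D's. The following is a sketch under the assumption that the large-sample Normal approximations are acceptable here: a runs test for a(i) and a z test of p = .5 for c. The variable names are illustrative.

```python
import math

moves = "UDDDDUUUDUDDDUUDDDDUDDDUUDDDUDUUDDDU"
n1, n2 = moves.count("U"), moves.count("D")
n = n1 + n2

# A new run starts every time the symbol changes.
runs = 1 + sum(a != b for a, b in zip(moves, moves[1:]))

# Normal approximation for the number of runs under randomness.
mean_runs = 2 * n1 * n2 / n + 1
var_runs = (mean_runs - 1) * (mean_runs - 2) / (n - 1)
z_runs = (runs - mean_runs) / math.sqrt(var_runs)

# z test of H0: p = .5 for the proportion of U's.
p_hat = n1 / n
z_prop = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)

print(n1, n2, runs)
print(round(z_runs, 3), round(z_prop, 3))
```

Both z values would be compared with ±1.96 at the 5% significance level.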