252x0781

252x0781 12/05/07
ECO252 QBA2
Final Exam
December 12-15, 2007
Version 1
Name and Class hour:_________________________
I. (25+ points) Do all the following. Note that answers without reasons and/or citation of appropriate
statistical tests receive no credit. Most answers require a statistical test, that is, stating or implying a
hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven’t done it
lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere
Else). There are over 150 possible points, but the exam is normed on 75 points.
In their 2000 text, the Lees noted that before 1979 the Federal Reserve targeted interest rates, letting the
money supply grow in such a way that interest rates would remain stable. After 1979, the Fed switched
to targeting the money supply. The Lees regressed the money supply against GNP (I had to replace
this with GDP), the prime rate (PrRt) and a dummy variable (Dummy) that is 1 before 1979 and zero from
1979 until 1990, when their analysis stops. They report a high R-squared and extremely significant
coefficients for the prime rate, GNP and the dummy variable, which seems to tell us that the Fed’s change
of regime had a real effect on the money supply. Later in the text they suggest adding an interaction
variable (GDPPr), which is the product of the prime rate and GDP, and a second interaction variable
(GDPdum), which is the product of GDP and the dummy variable. I added the year (measured from 1958)
and its square, population, and GDP squared. My attempt to update the Lees’ results was terribly
discouraging. The dependent variable is M1 or its logarithm (logM1).
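As a rough illustration of how the constructed columns in the worksheet relate to the raw series, the sketch below rebuilds them for three rows copied from the data display. The function name `derive` and the dictionary keys are illustrative choices, not part of the Minitab session; GDPdum is taken to be GDP times Dummy because that is what the printed column shows.

```python
import math

# Three rows copied from the worksheet printout: (calendar year, M1, prime rate, GDP).
rows = [
    (1959, 140.0, 4.50, 506.6),
    (1978, 357.3, 8.63, 2294.7),
    (1979, 381.8, 11.65, 2563.3),
]

def derive(year, m1, prrt, gdp):
    """Build the constructed regressors for one observation (illustrative helper)."""
    dummy = 1 if year < 1979 else 0   # 1 while the Fed targeted interest rates
    t = year - 1958                   # 'year' is measured from 1958
    return {
        "Dummy": dummy,
        "GDPPr": gdp * prrt,          # prime rate x GDP interaction
        "GDPdum": gdp * dummy,        # GDP x dummy interaction
        "year": t,
        "yearsq": t ** 2,
        "GDPsq": gdp ** 2,
        "logM1": math.log(m1),        # natural log, matching Minitab's loge
    }

derived = [derive(*r) for r in rows]
for r, d in zip(rows, derived):
    print(r[0], d)
```

The printed worksheet rounds GDPPr and GDPsq to whole numbers, so small differences from these unrounded products are expected.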
————— 12/3/2007 11:31:46 PM ————————————————————
Welcome to Minitab, press F1 for help.
MTB > WOpen "C:\Documents and Settings\RBOVE\My Documents\Minitab\M1PrRGDP.MTW".
Retrieving worksheet from file: 'C:\Documents and Settings\RBOVE\My
Documents\Minitab\M1PrRGDP.MTW'
Worksheet was saved on Mon Dec 03 2007
MTB > print c5 c2 c4 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15
Data Display
Row    C5      M1   PrRt         GDP  Dummy   GDPPr  GDPdum  year  yearsq
  1  1959   140.0   4.50     $506.60      1    2280   506.6     1       1
  2  1960   140.7   5.00     $526.40      1    2632   526.4     2       4
  3  1961   145.2   4.50     $544.70      1    2451   544.7     3       9
  4  1962   147.8   4.50     $585.60      1    2635   585.6     4      16
  5  1963   153.3   4.50     $617.70      1    2780   617.7     5      25
  6  1964   160.3   4.50     $663.60      1    2986   663.6     6      36
  7  1965   167.8   4.50     $719.10      1    3236   719.1     7      49
  8  1966   172.0   5.52     $787.80      1    4349   787.8     8      64
  9  1967   183.3   5.50     $832.60      1    4579   832.6     9      81
 10  1968   197.4   6.50     $910.00      1    5915   910.0    10     100
 11  1969   203.9   8.23     $984.60      1    8103   984.6    11     121
 12  1970   214.4   8.00   $1,038.50      1    8308  1038.5    12     144
 13  1971   228.3   5.50   $1,127.10      1    6199  1127.1    13     169
 14  1972   249.2   5.04   $1,238.30      1    6241  1238.3    14     196
 15  1973   262.9   7.49   $1,382.70      1   10356  1382.7    15     225
 16  1974   274.2  11.54   $1,500.00      1   17310  1500.0    16     256
 17  1975   287.1   7.07   $1,638.30      1   11583  1638.3    17     289
 18  1976   306.2   7.20   $1,825.30      1   13142  1825.3    18     324
 19  1977   330.9   6.75   $2,030.90      1   13709  2030.9    19     361
 20  1978   357.3   8.63   $2,294.70      1   19803  2294.7    20     400
 21  1979   381.8  11.65   $2,563.30      0   29862     0.0    21     441
 22  1980   408.5  12.63   $2,789.50      0   35231     0.0    22     484
 23  1981   436.7  20.03   $3,128.40      0   62662     0.0    23     529
 24  1982   474.8  16.50   $3,255.00      0   53708     0.0    24     576
 25  1983   521.4  10.50   $3,536.70      0   37135     0.0    25     625
 26  1984   551.6  12.60   $3,933.20      0   49558     0.0    26     676
 27  1985   619.8   9.78   $4,220.30      0   41275     0.0    27     729
 28  1986   724.7   8.50   $4,462.80      0   37934     0.0    28     784
 29  1987   750.2   8.25   $4,739.50      0   39101     0.0    29     841
 30  1988   786.7   9.00   $5,103.80      0   45934     0.0    30     900
 31  1989   792.9  11.07   $5,484.40      0   60712     0.0    31     961
 32  1990   824.7  10.00   $5,803.10      0   58031     0.0    32    1024
 33  1991   896.9   8.50   $5,995.90      0   50965     0.0    33    1089
 34  1992  1024.8   6.50   $6,337.70      0   41195     0.0    34    1156
 35  1993  1129.7   6.00   $6,657.40      0   39944     0.0    35    1225
 36  1994  1150.7   7.25   $7,072.20      0   51273     0.0    36    1296
 37  1995  1127.4   9.00   $7,397.70      0   66579     0.0    37    1369
 38  1996  1081.4   8.25   $7,816.90      0   64489     0.0    38    1444
 39  1997  1072.8   8.50   $8,304.30      0   70587     0.0    39    1521
 40  1998  1095.9   8.50   $8,747.00      0   74350     0.0    40    1600
 41  1999  1123.0   7.75   $9,268.40      0   71830     0.0    41    1681
 42  2000  1087.7   9.50   $9,817.00      0   93262     0.0    42    1764
 43  2001  1182.0   6.98  $10,128.00      0   70693     0.0    43    1849
 44  2002  1219.5   4.75  $10,469.60      0   49731     0.0    44    1936
 45  2003  1305.5   4.22  $10,960.80      0   46255     0.0    45    2025
 46  2004  1375.2   4.01  $11,685.90      0   46860     0.0    46    2116
 47  2005  1373.2   6.01  $12,433.90      0   74728     0.0    47    2209
 48  2006  1365.9   8.02  $13,194.70      0  105821     0.0    48    2304

Row     Pop      GDPsq   log M1   logM1l
  1  176289     256644  4.94164  4.89222
  2  179979     277097  4.94663  4.94164
  3  182992     296698  4.97811  4.94663
  4  185771     342927  4.99586  4.97811
  5  188483     381553  5.03240  4.99586
  6  191141     440365  5.07705  5.03240
  7  193526     517105  5.12277  5.07705
  8  195576     620629  5.14749  5.12277
  9  197457     693223  5.21112  5.14749
 10  199399     828100  5.28523  5.21112
 11  201385     969437  5.31763  5.28523
 12  203984    1078482  5.36784  5.31763
 13  206827    1270354  5.43066  5.36784
 14  209284    1533387  5.51826  5.43066
 15  211357    1911859  5.57177  5.51826
 16  213342    2250000  5.61386  5.57177
 17  215465    2684027  5.65983  5.61386
 18  217583    3331720  5.72424  5.65983
 19  219760    4124555  5.80182  5.72424
 20  222095    5265648  5.87858  5.80182
 21  224567    6570507  5.94490  5.87858
 22  227225    7781310  6.01249  5.94490
 23  229466    9786887  6.07925  6.01249
 24  231664   10595025  6.16289  6.07925
 25  233792   12508247  6.25652  6.16289
 26  235825   15470062  6.31282  6.25652
 27  237924   17810932  6.42940  6.31282
 28  240133   19916584  6.58576  6.42940
 29  242289   22462860  6.62034  6.58576
 30  244499   26048774  6.66785  6.62034
 31  246819   30078643  6.67570  6.66785
 32  249623   33675970  6.71502  6.67570
 33  252981   35950817  6.79894  6.71502
 34  256514   40166441  6.93225  6.79894
 35  259919   44320975  7.02971  6.93225
 36  263126   50016013  7.04813  7.02971
 37  266278   54725965  7.02767  7.04813
 38  269394   61103926  6.98601  7.02767
 39  272647   68961398  6.97803  6.98601
 40  275854   76510009  6.99933  6.97803
 41  279040   85903239  7.02376  6.99933
 42  282217   96373489  6.99182  7.02376
 43  285226  102576384  7.07496  6.99182
 44  288126  109612524  7.10620  7.07496
 45  290796  120139137  7.17434  7.10620
 46  293638  136560259  7.22635  7.17434
 47  296507  154601869  7.22490  7.22635
 48  299398  174100108  7.21957  7.22490
I followed the course suggested by the textbook to find what variables were actually important in predicting
the money supply.
Results for: M1PrRGDP.MTW
MTB > Regress c2 5 c4 c6 c7 c10 c12;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Brief 2.

Regression 1

Regression Analysis: M1 versus PrRt, GDP, Dummy, year, Pop

The regression equation is
M1 = 2874 - 19.1 PrRt + 0.0714 GDP - 115 Dummy + 46.2 year - 0.0149 Pop

Predictor       Coef   SE Coef      T      P      VIF
Constant        2874      1232   2.33  0.025
PrRt         -19.116     3.941  -4.85  0.000    2.241
GDP          0.07138   0.01762   4.05  0.000   62.461
Dummy        -114.81     48.62  -2.36  0.023    8.260
year           46.23     15.57   2.97  0.005  668.523
Pop        -0.014888  0.007176  -2.07  0.044  917.418

S = 57.7863   R-Sq = 98.4%   R-Sq(adj) = 98.2%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       5  8498077  1699615  508.98  0.000
Residual Error  42   140249     3339
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454
year     1    80187
Pop      1    14371

Unusual Observations
Obs  PrRt       M1     Fit  SE Fit  Residual  St Resid
 23  20.0   436.70  361.08   37.33     75.62      1.71 X
 35   6.0  1129.70  982.60   18.35    147.10      2.68R
 36   7.3  1150.70  986.80   14.01    163.90      2.92R
 37   9.0  1127.40  975.89   11.81    151.51      2.68R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
So the regression above was my first attempt. There are several questions that can be asked at this point.
1) Why does this regression look awfully good as far as significance and the amount of the variation in the
Y variable that is explained by the equation? (3)
2) There are only two coefficients here whose sign you can predict in advance. What are they, what did you
predict and why and were you right? (2)
3) What does the Analysis of Variance tell us? What hypothesis did it cause you to reject?(1)
MTB > Regress c2 4 c4 c6 c7 c10;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Brief 2.

Regression 2

Regression Analysis: M1 versus PrRt, GDP, Dummy, year

The regression equation is
M1 = 321 - 20.7 PrRt + 0.0415 GDP - 174 Dummy + 14.5 year

Predictor     Coef  SE Coef      T      P     VIF
Constant    321.24    66.06   4.86  0.000
PrRt       -20.668    4.016  -5.15  0.000   2.160
GDP        0.04152  0.01055   3.94  0.000  20.791
Dummy      -173.71    40.96  -4.24  0.000   5.444
year        14.530    3.077   4.72  0.000  24.254

S = 59.9651   R-Sq = 98.2%   R-Sq(adj) = 98.0%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       4  8483706  2120927  589.83  0.000
Residual Error  43   154620     3596
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454
year     1    80187

Unusual Observations
Obs  PrRt       M1     Fit  SE Fit  Residual  St Resid
 23  20.0   436.70  371.34   38.39     65.36      1.42 X
 35   6.0  1129.70  982.21   19.04    147.49      2.59R
 36   7.3  1150.70  988.13   14.53    162.57      2.79R
 37   9.0  1127.40  980.00   12.08    147.40      2.51R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
MTB > Regress c2 3 c4 c6 c7;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   Brief 2.

Regression 3

Regression Analysis: M1 versus PrRt, GDP, Dummy

The regression equation is
M1 = 451 - 14.3 PrRt + 0.0865 GDP - 240 Dummy

Predictor      Coef   SE Coef      T      P    VIF
Constant     450.99     73.19   6.16  0.000
PrRt        -14.269     4.605  -3.10  0.003  1.914
GDP        0.086456  0.005548  15.58  0.000  3.875
Dummy       -239.76     46.90  -5.11  0.000  4.809

S = 73.0515   R-Sq = 97.3%   R-Sq(adj) = 97.1%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       3  8403519  2801173  524.91  0.000
Residual Error  44   234807     5337
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454

Unusual Observations
Obs  PrRt      M1    Fit  SE Fit  Residual  St Resid
 23  20.0   436.7  435.7    43.7       1.0      0.02 X
 35   6.0  1129.7  941.0    20.6     188.7      2.69R
 36   7.3  1150.7  959.0    16.0     191.7      2.69R
 37   9.0  1127.4  962.1    14.0     165.3      2.30R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.
4) What did I do to get from Regression 1 to regression 3 and why? (2)
5) Why was I now ready to quit dropping variables and do a ‘best subsets’ regression? (1) [9]
6) What would the money supply be that would be predicted for 1970 assuming that the numbers given for
1970 are correct? By what percent is it off the actual value? (2)
7) Can you make this into a rough prediction interval? Does this include the actual value for 1970? (2) [13]
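A minimal sketch of the arithmetic behind questions 6 and 7, assuming the Regression 3 coefficients and the 1970 worksheet row (PrRt = 8.00, GDP = 1038.5, Dummy = 1, M1 = 214.4). The interval uses fit plus or minus 2 standard errors of the regression; the factor 2 is an assumed stand-in for the t value with 44 degrees of freedom, so treat the interval as rough.

```python
# Coefficients copied from the Regression 3 printout (M1 versus PrRt, GDP, Dummy).
b0, b_prrt, b_gdp, b_dummy = 450.99, -14.269, 0.086456, -239.76
s = 73.0515  # S from the printout: standard error of the regression

# 1970 values taken from row 12 of the worksheet.
prrt, gdp, dummy, actual = 8.00, 1038.5, 1, 214.4

# Point prediction from the fitted equation.
fit = b0 + b_prrt * prrt + b_gdp * gdp + b_dummy * dummy

# Percentage by which the prediction misses the actual value.
pct_error = 100 * (fit - actual) / actual

# Rough prediction interval: fit +/- 2 * s (the multiplier 2 is an approximation).
lower, upper = fit - 2 * s, fit + 2 * s

print(f"fit = {fit:.2f}, error = {pct_error:.1f}%")
print(f"rough interval = ({lower:.1f}, {upper:.1f})")
```

A statistical conclusion still has to be written out by hand; the code only checks the numbers.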
MTB > BReg c2 c4 c6 c7;
SUBC>   NVars 1 3;
SUBC>   Best 2;
SUBC>   Constant.

Regression 4

Best Subsets Regression: M1 versus PrRt, GDP, Dummy

Response is M1

                                            D
                                        P   u
                                        r G m
                       Mallows          R D m
Vars  R-Sq  R-Sq(adj)       Cp       S  t P y
   1  95.6       95.6     26.5  90.432    X
   1  67.8       67.1    477.7  246.02      X
   2  96.7       96.5     11.6  79.727    X X
   2  95.7       95.5     28.1  91.197  X X
   3  97.3       97.1      4.0  73.051  X X X
8) What is Regression 4 telling me to do? Why can you say that? (2)
MTB > Regress c2 3 c4 c6 c7;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 5

Regression Analysis: M1 versus PrRt, GDP, Dummy

The regression equation is
M1 = 451 - 14.3 PrRt + 0.0865 GDP - 240 Dummy

Predictor      Coef   SE Coef      T      P    VIF
Constant     450.99     73.19   6.16  0.000
PrRt        -14.269     4.605  -3.10  0.003  1.914
GDP        0.086456  0.005548  15.58  0.000  3.875
Dummy       -239.76     46.90  -5.11  0.000  4.809

S = 73.0515   R-Sq = 97.3%   R-Sq(adj) = 97.1%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       3  8403519  2801173  524.91  0.000
Residual Error  44   234807     5337
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454

Unusual Observations
Obs  PrRt      M1    Fit  SE Fit  Residual  St Resid
 23  20.0   436.7  435.7    43.7       1.0      0.02 X
 35   6.0  1129.7  941.0    20.6     188.7      2.69R
 36   7.3  1150.7  959.0    16.0     191.7      2.69R
 37   9.0  1127.4  962.1    14.0     165.3      2.30R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.445619

[Figure: Residual Plots for M1]
9) Regression 5 is just a repeat of Regression 3, but now I am doing residual analysis. What are the
Durbin-Watson statistic and the plot of residuals vs. order telling me is present? What 2 conditions for
regression seem to be violated? (3) [18]
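For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below applies that definition to two made-up residual series (hypothetical numbers, not the Regression 5 residuals, which are not printed): one that drifts steadily and one that flips sign every period.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals that trend slowly: neighbours are alike, so DW is near 0.
trending = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Hypothetical residuals that alternate in sign: neighbours differ, so DW is above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(durbin_watson(trending))
print(durbin_watson(alternating))
```

Values well below 2 go with positively correlated neighbouring residuals and values well above 2 with negatively correlated ones; the formal decision still requires the tabled lower and upper bounds.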
MTB > Regress c2 4 c4 c6 c7 c13;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 6

Regression Analysis: M1 versus PrRt, GDP, Dummy, GDPsq

The regression equation is
M1 = 131 - 13.1 PrRt + 0.187 GDP - 26.3 Dummy - 0.000007 GDPsq

Predictor         Coef     SE Coef      T      P     VIF
Constant        131.36       64.18   2.05  0.047
PrRt           -13.142       3.050  -4.31  0.000   1.919
GDP            0.18659     0.01370  13.62  0.000  53.994
Dummy           -26.33       41.88  -0.63  0.533   8.764
GDPsq      -0.00000671  0.00000088  -7.59  0.000  33.120

S = 48.3231   R-Sq = 98.8%   R-Sq(adj) = 98.7%

Analysis of Variance
Source          DF       SS       MS       F      P
Regression       4  8537916  2134479  914.07  0.000
Residual Error  43   100410     2335
Total           47  8638326

Source  DF   Seq SS
PrRt     1     3746
GDP      1  8260319
Dummy    1   139454
GDPsq    1   134396

Unusual Observations
Obs  PrRt       M1      Fit  SE Fit  Residual  St Resid
 23  20.0   436.70   386.21   29.65     50.49      1.32 X
 35   6.0  1129.70   997.46   15.53    132.24      2.89R
 36   7.3  1150.70  1020.24   13.32    130.46      2.81R
 37   9.0  1127.40  1026.39   12.53    101.01      2.16R
 42   9.5  1087.70  1191.94   14.93   -104.24     -2.27R
 48   8.0  1365.90  1320.38   30.91     45.52      1.23 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.551845

[Figure: Residual Plots for M1]
10) I now felt free to add the square of GDP as a new independent variable. What happened to the VIFs?
Do I care? Why? (2)
11) What did adding the square of GDP do to the significance of my coefficients and the fraction of the
variation of Y that is explained by the equation? (2) [22]
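As background for reading the VIF column, a variance inflation factor is 1/(1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below inverts the formula for the VIFs printed in Regression 6; the function name `implied_r2` is an illustrative choice.

```python
def implied_r2(vif):
    """R-squared of a predictor regressed on the other predictors, recovered from its VIF."""
    return 1.0 - 1.0 / vif

# VIFs copied from the Regression 6 printout.
vifs = {"PrRt": 1.919, "GDP": 53.994, "Dummy": 8.764, "GDPsq": 33.120}

for name, vif in vifs.items():
    print(f"{name:6s} VIF = {vif:7.3f}  implied R-sq = {implied_r2(vif):.4f}")
```

A VIF near 54 for GDP corresponds to roughly 98% of its variation being explained by the other predictors, which is unsurprising once GDP squared is in the model.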
MTB > let c14 = loge (c2)
MTB > Regress c14 4 c4 c6 c7 c13;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 7

Regression Analysis: log M1 versus PrRt, GDP, Dummy, GDPsq

The regression equation is
log M1 = 4.79 + 0.00846 PrRt + 0.000453 GDP + 0.0289 Dummy - 0.000000 GDPsq

Predictor         Coef     SE Coef       T      P     VIF
Constant        4.7882      0.1358   35.26  0.000
PrRt          0.008461    0.006453    1.31  0.197   1.919
GDP         0.00045309  0.00002899   15.63  0.000  53.994
Dummy          0.02889     0.08862    0.33  0.746   8.764
GDPsq      -0.00000002  0.00000000  -11.66  0.000  33.120

S = 0.102246   R-Sq = 98.5%   R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       4  29.3981  7.3495  703.01  0.000
Residual Error  43   0.4495  0.0105
Total           47  29.8476

Source  DF   Seq SS
PrRt     1   1.2680
GDP      1  25.6375
Dummy    1   1.0725
GDPsq    1   1.4202

Unusual Observations
Obs  PrRt  log M1     Fit  SE Fit  Residual  St Resid
 23  20.0  6.0792  6.1618  0.0627   -0.0826     -1.02 X
 42   9.5  6.9918  7.2158  0.0316   -0.2239     -2.30R
 48   8.0  7.2196  7.0393  0.0654    0.1803      2.29RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.306367

[Figure: Residual Plots for log M1]
12) I just replaced the money supply by its logarithm. The residual analysis tells me this was sort of a good
idea. What does that mean? (1) [23]
13) What is really weird about these coefficients? Which one has the wrong sign? (1)
MTB > Regress c14 3 c4 c6 c13;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 8

Regression Analysis: log M1 versus PrRt, GDP, GDPsq

The regression equation is
log M1 = 4.83 + 0.00732 PrRt + 0.000445 GDP - 0.000000 GDPsq

Predictor         Coef     SE Coef       T      P     VIF
Constant       4.83016     0.04310  112.06  0.000
PrRt          0.007316    0.005359    1.37  0.179   1.351
GDP         0.00044536  0.00001650   26.99  0.000  17.854
GDPsq      -0.00000002  0.00000000  -15.60  0.000  18.176

S = 0.101203   R-Sq = 98.5%   R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       3  29.3970  9.7990  956.75  0.000
Residual Error  44   0.4506  0.0102
Total           47  29.8476

Source  DF   Seq SS
PrRt     1   1.2680
GDP      1  25.6375
GDPsq    1   2.4915

Unusual Observations
Obs  PrRt  log M1     Fit  SE Fit  Residual  St Resid
 23  20.0  6.0792  6.1606  0.0620   -0.0814     -1.02 X
 42   9.5  6.9918  7.2104  0.0267   -0.2186     -2.24R
 48   8.0  7.2196  7.0413  0.0644    0.1783      2.28RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.289829

[Figure: Residual Plots for log M1]
MTB > Regress c14 2 c6 c13;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 9

Regression Analysis: log M1 versus GDP, GDPsq

The regression equation is
log M1 = 4.87 + 0.000457 GDP - 0.000000 GDPsq

Predictor         Coef     SE Coef       T      P     VIF
Constant       4.87027     0.03184  152.96  0.000
GDP         0.00045654  0.00001446   31.58  0.000  13.455
GDPsq      -0.00000002  0.00000000  -18.76  0.000  13.455

S = 0.102169   R-Sq = 98.4%   R-Sq(adj) = 98.4%

Analysis of Variance
Source          DF      SS      MS        F      P
Regression       2  29.378  14.689  1407.18  0.000
Residual Error  45   0.470   0.010
Total           47  29.848

Source  DF  Seq SS
GDP      1  25.705
GDPsq    1   3.673

Unusual Observations
Obs    GDP  log M1     Fit  SE Fit  Residual  St Resid
 42   9817  6.9918  7.1988  0.0256   -0.2070     -2.09R
 47  12434  7.2249  7.0925  0.0478    0.1324      1.47 X
 48  13195  7.2196  7.0041  0.0590    0.2154      2.58RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.208342

[Figure: Residual Plots for log M1]
MTB > Regress c14 3 c6 c13 c8;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 10

Regression Analysis: log M1 versus GDP, GDPsq, GDPPr

The regression equation is
log M1 = 4.87 + 0.000465 GDP - 0.000000 GDPsq - 0.000001 GDPPr

Predictor         Coef     SE Coef       T      P     VIF
Constant       4.86787     0.03240  150.23  0.000
GDP         0.00046548  0.00002208   21.08  0.000  30.892
GDPsq      -0.00000002  0.00000000  -16.38  0.000  17.958
GDPPr      -0.00000070  0.00000130   -0.54  0.593   5.889

S = 0.102985   R-Sq = 98.4%   R-Sq(adj) = 98.3%

Analysis of Variance
Source          DF       SS      MS       F      P
Regression       3  29.3810  9.7937  923.42  0.000
Residual Error  44   0.4667  0.0106
Total           47  29.8476

Source  DF   Seq SS
GDP      1  25.7052
GDPsq    1   3.6727
GDPPr    1   0.0031

Unusual Observations
Obs    GDP  log M1     Fit  SE Fit  Residual  St Resid
 42   9817  6.9918  7.1826  0.0396   -0.1908     -2.01R
 48  13195  7.2196  6.9803  0.0741    0.2393      3.35RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 0.196041
14) What has happened to significance and the fraction of the variation in the dependent variable explained
by the regression in Regressions 8, 9 and 10? In terms of significance etc., which of these 3 is the ‘best’
regression? Why would the Chairman of the FRB be very annoyed? (3) [27]
Residual Plots for log M1: not shown.

MTB > Regress c14 4 c6 c13 c8 c15;
SUBC>   GFourpack;
SUBC>   RType 1;
SUBC>   Constant;
SUBC>   VIF;
SUBC>   DW;
SUBC>   Brief 2.

Regression 11

Regression Analysis: log M1 versus GDP, GDPsq, GDPPr, logM1l

The regression equation is
log M1 = - 0.174 + 0.000001 GDP - 0.000000 GDPsq - 0.000001 GDPPr + 1.04 logM1l

Predictor         Coef     SE Coef      T      P      VIF
Constant       -0.1738      0.2820  -0.62  0.541
GDP         0.00000085  0.00002708   0.03  0.975  383.407
GDPsq      -0.00000000  0.00000000  -0.36  0.723  136.981
GDPPr      -0.00000109  0.00000045  -2.39  0.021    5.902
logM1l         1.04474     0.05838  17.89  0.000   80.236

S = 0.0358443   R-Sq = 99.8%   R-Sq(adj) = 99.8%

Analysis of Variance
Source          DF       SS      MS        F      P
Regression       4  29.7924  7.4481  5797.02  0.000
Residual Error  43   0.0552  0.0013
Total           47  29.8476

Source  DF   Seq SS
GDP      1  25.7052
GDPsq    1   3.6727
GDPPr    1   0.0031
logM1l   1   0.4114

Unusual Observations
Obs    GDP   log M1      Fit   SE Fit  Residual  St Resid
 28   4463  6.58576  6.49641  0.00824   0.08935      2.56R
 37   7398  7.02767  7.09767  0.00984  -0.07000     -2.03R
 38   7817  6.98601  7.07589  0.00849  -0.08988     -2.58R
 48  13195  7.21957  7.18793  0.02830   0.03164      1.44 X

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic = 1.17315

Residual Plots for log M1: not displayed.
15) So what problem did this fix? Incidentally, what I added to the independent variables was the (log of
the) money supply of the previous period. (1) [28]
II. Do at least 4 of the following 8 problems (at least 12 points each), or do sections adding to at least 50
points. (Anything extra you do helps, and grades wrap around.) It is especially important to do more if you
have skipped much of parts I or II. Show your work! State $H_0$ and $H_1$ where applicable. Use a
significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate
statistical tests. That is, explain your hypotheses and what values from what table were used to test them.
Clearly label what section of each problem you are doing! The entire test has about 160 points, but 70 is
considered a perfect score. Don’t waste our time by telling me that two means, proportions, variances
or medians don’t look the same to you. You need statistical tests! There are some blank pages below.
Put your name on as many loose pages as possible! Mark sections of your answer clearly.
1). Multiple choice.
a) If I want to test to see if the mean of $x_1$ is larger than the mean of $x_2$, my null hypothesis is:
(Note: $D = \mu_1 - \mu_2$) Only check one answer! (2)
i) 1   2 and D  0
ii) 1   2 and D  0
v) 1   2 and D  0
vi) 1   2 and D  0
iii) 1   2 and D  0
iv) 1   2 and D  0
vii) 1   2 and D  0
viii) 1   2 and D  0
b) Compared to multiple regression, simple regression is different in having only one
i) Observation
ii) Parameter
iii) Dependent variable
iv) Independent variable
v) Y-intercept
vi) All of the above
c) For the following quantities, mark their lines with yes (Y) or no (N) as to whether they must be positive
___ $R^2$ adjusted for degrees of freedom
___ The correlation $r_{x_1 x_2}$ between two independent variables $x_1$ and $x_2$
___ $S_{xy} = \sum xy - n\bar{x}\bar{y}$
___ The coefficient $b_0$ in a multiple regression.
d) Assume that we wish to test the hypothesis that a mean is greater than 3 and we compute the ratio
$t = \frac{\bar{x} - 3}{s_{\bar{x}}}$, where our sample statistics are computed from a sample of 29.
If $\alpha = .05$, we reject the null hypothesis if
i) t is above 1.645 or below -1.645
ii) t is above 1.960 or below -1.960
iii) t is below – 1.645
iv) t is below -1.960
v) t is above 1.645
vi) t is above 1.960
vii) None of the above. (Fill in a more appropriate answer!)
e) Consumers are asked to take the Pepsi Challenge. They were asked which cola they preferred, and
the number that preferred Pepsi was recorded. Sample 1 was males and sample 2 was females. The
following was run on Minitab.

MTB > PTwo 109 46 52 13;
SUBC>   Pooled.

Test and CI for Two Proportions

Sample   X    N  Sample p
1       46  109  0.422018
2       13   52  0.250000

Difference = p (1) - p (2)
Estimate for difference:  0.172018
95% CI for difference:  (0.0221925, 0.321844)
Test for difference = 0 (vs not = 0):  Z = 2.12  P-Value = 0.034
On the basis of the printout above we can say one of the following.
i) At a 99% confidence level we can say that we have enough evidence to state that the proportion
of men that prefer Pepsi differs from the proportion of women that prefer Pepsi
ii) At a 95% confidence level we can say that we have enough evidence to state that the proportion
of men that prefer Pepsi differs from the proportion of women that prefer Pepsi
iii) At a 99% confidence level we can say that we have enough evidence to state that the proportion
of men that prefer Pepsi equals the proportion of women that prefer Pepsi.
iv) At a 96% confidence level there is insufficient evidence to indicate that the proportion of men
that prefer Pepsi differs from the proportion of women that prefer Pepsi
f) A researcher is comparing room temperatures preferred by random samples of 135 adults and 80
children. The Minitab output follows.
MTB > TwoT 135 77.5 4.5 80 76.5 2.5;
SUBC>   Alternative 1.

Two-Sample T-Test and CI

Sample    N   Mean  StDev  SE Mean
1       135  77.50   4.50     0.39
2        80  76.50   2.50     0.28

Difference = mu (1) - mu (2)
Estimate for difference:  1.000
95% lower bound for difference:  0.211
T-Test of difference = 0 (vs >): T-Value = 2.09  P-Value = 0.019  DF = 212
On the basis of what you see here and the way we have stated null-alternate hypothesis pairs in class we
come to the following conclusion if we use a 99% confidence level.
i) Do not reject H 0 : 1   2
ii) Do not reject H 0 : 1   2
iii) Do not reject H 0 : 1   2
iv) Reject H 0 : 1   2
v) Reject H 0 : 1   2
vi) Reject H 0 : 1   2
vii) None of the above (Fill in a more appropriate answer!)
2) The data below represent the sales of Friendly Autos for 7 randomly selected months. They believe that
the number of cars sold depends on the average price for that month (in $ thousands), the number of
advertising spots that appeared on the local TV station, and whether other types of advertising were used
in that month (a dummy variable that is 1 if other types of advertising were used in a given month).

Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

Sum of Sold = 81, Sum of Price = 198.4, Sum of Adv = 97, Sum of Sold squared = 979, Sum of Price
squared = 5624.44, Sum of Adv squared = 1517, Sum of Sold * Price = 2297.4, Sum of Sold * Adv = 1206,
Sum of Price * Adv = 2751.4.
a) If advertising (Adv) is $x_5$ (it isn’t) and Type is $x_6$, compute $\sum x_5 x_6$. (2)
b) Compute the coefficients of the equation $\hat{Y} = b_0 + b_1 x$ to predict the value of ‘Sold’ on the
basis of ‘Price.’ (5)
c) Compute $R^2$ and $R^2$ adjusted for degrees of freedom. (4)
d) Compute the standard error $s_e$. (3)
e) Is the slope of the simple regression significant at the 1% level? Do not answer this question without
appropriate calculations! (3)
[17]
f) Is the sign of the coefficient of Price what you expected? Why or why not? (1)
g) Predict the average number of cars that will be sold when the price is $30 thousand using the equation
you got and make it into an appropriate interval. (4)
h) Do a confidence interval at the 1% significance level (99% confidence) for $\beta_0$, the y-intercept. (3)
[24, 36]
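The sums given with the data are enough to get the simple regression of ‘Sold’ on ‘Price’ without touching the raw rows. A minimal sketch, assuming the usual spare-parts formulas $S_{xx} = \sum x^2 - n\bar{x}^2$, $S_{xy} = \sum xy - n\bar{x}\bar{y}$ and $S_{yy} = \sum y^2 - n\bar{y}^2$; the variable names are illustrative.

```python
# Sums copied from the problem statement (y = Sold, x = Price).
n = 7
sum_y, sum_x = 81.0, 198.4
sum_yy, sum_xx, sum_xy = 979.0, 5624.44, 2297.4

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and cross products.
s_xx = sum_xx - n * x_bar ** 2
s_xy = sum_xy - n * x_bar * y_bar
s_yy = sum_yy - n * y_bar ** 2

# Least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fraction of the variation in y explained by the line.
r_sq = s_xy ** 2 / (s_xx * s_yy)

print(f"Sxx = {s_xx:.4f}, Sxy = {s_xy:.4f}, Syy = {s_yy:.4f}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}, R-sq = {r_sq:.4f}")
```

The value of $S_{yy}$ can be checked against the total sum of squares (41.714) in the Minitab output for the full regression in problem 4.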
3) The data below represent the sales of Friendly Autos for 7 randomly selected months. They believe that
the number of cars sold depends on the average price for that month (in $ thousands), the number of
advertising spots that appeared on the local TV station, and whether other types of advertising were used
in that month (a dummy variable that is 1 if other types of advertising were used in a given month).

Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

Sum of Sold = 81, Sum of Price = 198.4, Sum of Adv = 97, Sum of Sold squared = 979, Sum of Price
squared = 5624.44, Sum of Adv squared = 1517, Sum of Sold * Price = 2297.4, Sum of Sold * Adv = 1206,
Sum of Price * Adv = 2751.4.
a) Do a multiple regression of ‘Sold’ against ‘Price’ and ‘Advertising.’ Attempts to recycle b1 from the
previous page or to compute b2 by using a simple regression formula won’t work and won’t get any credit.
(12)
b) Compute $R^2$ and $R^2$ adjusted for degrees of freedom. (3)
c) i) Do an ANOVA for the simple regression using either your regression sum of squares or $R^2$. (2)
ii) Do a similar ANOVA for the multiple regression. (2) iii) Combine the two ANOVAs to do an F test to
see if the addition of ‘Adv’ was worthwhile. (2) [21]
d) Predict the average number of cars that will be sold when the price is $30 thousand and there are 15 spots
using the equation you got and make it into an appropriate interval. (3) [24, 60]
4) The data below represent the sales of Friendly Autos for 7 randomly selected months. They believe that
the number of cars sold depends on the average price for that month (in $ thousands), the number of
advertising spots that appeared on the local TV station, and whether other types of advertising were used
in that month (a dummy variable that is 1 if other types of advertising were used in a given month).

Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

The Minitab output below gives the full regression of ‘Sold’ against all three independent variables.

Regression Analysis: Sold versus Adv, Price, Type

The regression equation is
Sold = 8.46 + 0.487 Adv - 0.153 Price + 0.982 Type

Predictor     Coef  SE Coef      T      P
Constant     8.457    6.990   1.21  0.313
Adv        0.48699  0.01696  28.72  0.000
Price      -0.1530  …………  …………  0.586
Type        0.9815   0.2297   4.27  0.024

S = 0.218501   R-Sq = 99.7%   R-Sq(adj) = 99.3%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       3  41.571  13.857  290.24  0.000
Residual Error   3   0.143   0.048
Total            6  41.714

Source  DF  Seq SS
Adv      1  40.404
Price    1   0.295
Type     1   0.872

a) Using the material in this output, find the value of $R^2$ for a regression against ‘Adv’ alone. (2)
b) Look at the line that represents the coefficient of ‘Price.’ What about the coefficient makes me happy?
What about the coefficient makes me sad? (2)
c) Find the partial correlation of ‘Type’ with ‘Sold.’ (2)
d) Since you now have enough information to do it, use an F test to see whether the addition of the two
advertising independent variables as a pair was worthwhile. (4) [10]
Row  Sold  Price  Adv  Type
  1    10   28.2   10     1
  2     8   28.7    6     1
  3    12   27.9   14     1
  4    13   27.8   18     0
  5     9   28.1   10     0
  6    14   28.8   19     1
  7    15   28.9   20     1

Sum of Sold = 81, Sum of Price = 198.4, Sum of Adv = 97, Sum of Sold squared = 979, Sum of Price
squared = 5624.44, Sum of Adv squared = 1517, Sum of Sold * Price = 2297.4, Sum of Sold * Adv = 1206,
Sum of Price * Adv = 2751.4.
e) Compute the correlation between ‘Adv’ and ‘Price’ and test it for significance. Try to use the spare parts
that you already have. (4) [14]
f) Test the same correlation to see if it is 0.2. (4) [18, 78]
g) Don’t forget to hand in your last computer problem. Check here if you did. __________________.
(2 to 7)
[78+]
5) The manager of a computer network has the following data on the 200 service interruptions that have
occurred over the last 100 days (x is the number of interruptions in a day; O is the observed number of
days).

x    0   1   2   3   4   5   6   7  Total
O    2  51  18  12  11   4   1   1    100

a) Test to see if these follow a Poisson distribution. (6)
b) Use another method to test whether this has a Poisson distribution with a parameter of 1.8. (5)
c) A coin is to be tested to see if it is fair. In order to test it, the coin is given 5 flips 100 times and the
number of heads in 5 flips is recorded in the table below. This means that there are a total of 500 flips and
the coin has come up heads 255 times. Construct a 99% confidence interval for the proportion of times it
comes up heads. Test the hypothesis that the proportion is 50% using this interval. (4)

x    0   1   2   3   4   5
O    3  16  30  29  18   4

d) The distribution shown here should be a binomial distribution with $n = 5$ and $p = .5$. A more
powerful test of the fairness of the coin should be to use probabilities from your cumulative binomial
table to check whether this distribution is correct. (4) [19, 97]
e) Assume that a coin is flipped 20 times and comes up heads half the time. If the sequence of heads and
tails is HHHTTTHHHTTTHHHTTTTH, can we say that the sequence is random? (This is not a yes or no
question. I want a statistical test for randomness!) (2)
f) Now assume that there are 5 times as many flips and 5 times as many runs and heads half the time. Can
we say that the sequence is random now? (3) [24, 102]
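The bookkeeping for a runs test can be sketched as follows: count heads, tails and runs, then compare the number of runs with its mean $2n_1n_2/n + 1$ and variance $(\mu - 1)(\mu - 2)/(n - 1)$. The helper names are illustrative, and the z approximation is normally reserved for larger samples; for the 20-flip sequence a runs table would usually be consulted instead.

```python
import math

def count_runs(sequence):
    """Number of maximal blocks of identical symbols in the sequence."""
    return 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)

def runs_z(n1, n2, runs):
    """Mean, variance and z statistic for the number of runs (large-sample approximation)."""
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = (mu - 1) * (mu - 2) / (n - 1)
    return mu, var, (runs - mu) / math.sqrt(var)

seq = "HHHTTTHHHTTTHHHTTTTH"
n_heads, n_tails = seq.count("H"), seq.count("T")
r = count_runs(seq)
mu_small, var_small, z_small = runs_z(n_heads, n_tails, r)
print(n_heads, n_tails, r, mu_small, round(z_small, 3))

# Part f: 5 times as many flips and runs, heads half the time.
mu_big, var_big, z_big = runs_z(5 * n_heads, 5 * n_tails, 5 * r)
print(mu_big, round(z_big, 3))
```

Scaling the counts by 5 leaves the proportion of runs unchanged but shrinks the relative standard deviation, which is why the two parts can lead to different conclusions.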
6) Do the following. Use a 1% significance level in this problem!
a) (Multiple choice) I wish to test to see if a distribution is Normal, but I must first use my data to figure out
the mean and standard deviation. I have 100 data points divided into 0 to under 20, 20 to under 40, 40 to
under 60, 60 to under 80 and 80 to under 100. Assume that my expected frequency is 5 or larger for each
class. I could use
(i) A chi-squared test with 4 degrees of freedom or a Kolmogorov – Smirnov test.
(ii) A chi-squared test with 2 degrees of freedom or a Kolmogorov – Smirnov test.
(iii) A chi-squared test with 4 degrees of freedom or a Lilliefors test.
(iv) A chi-squared test with 2 degrees of freedom or a Lilliefors test.
(v) Only a Lilliefors test.
(vi) Only a Kolmogorov – Smirnov test.
(vii) Only a chi-squared test.
(2)
b) (Bassett et al) An industrial process is run at 4 different temperatures on four different days. A random
sample of 3 units is taken and scored. The results are as follows. Do the scores differ according to
temperature?

100C degrees  120C degrees  140C degrees  160C degrees
          41            54            50            38
          44            56            52            36
          48            53            48            41

Minitab has computed the following.
Sum of 100C = 133, Sum of 120C = 163, Sum of 140C = 150, Sum of 160C = 115,
Sum of squares of 100C = 5921, Sum of squares of 120C = 8861, Sum of squares of
140C = 7508, Sum of squares of 160C = 4421,
Bartlett's Test - Test statistic = 1.22, p-value = 0.748 and
Levene's Test - Test statistic = 0.43, p-value = 0.736.
Assume that the scores are not considered to come from the Normal distribution, state your null hypothesis
and test it. (5).
c) Assume that the scores are considered to come from the Normal distribution, state your null hypothesis
and test it. (6)
d) Why were the Bartlett and Levene tests run? Which of the two is correct here if the underlying
distribution is Normal? What do they tell us? (2) [15]
e) Ignore everything that has gone before. Assume that the Normal distribution applies and test the
hypothesis that the mean of the 120C population is larger than the mean of the 100C population. Assume
that the underlying distributions are Normal and have equal variances (4) or assume that the underlying
distributions are Normal and do not necessarily have equal variances. (6) Do not do both! [19, 116]
7) The following are tests of proportions. (Bassett et al). You must do legitimate tests at the 10%
significance level.
a) Is there any association between Forecasted and observed rainfall? 141 forecasts are considered.
Observed Rainfall  No Rain Forecasted  Light Rain Forecasted  Heavy Rain Forecasted
None                               34                     24                     17
Light Rain                         21                      4                      3
Heavy Rain                         23                      9                     38
State your null and alternative hypotheses and test it. (7)
b) Are there significant differences in the proportions of female insects in 3 different locations?
In location 1, 44% of 100 bugs are female. In location 2, 43% of 200 bugs are female. In location 3, 55% of
200 bugs are female. First test to see if there is a significant difference between the proportions in locations
1 and 2. (4)
c) In b, test whether proportions of females are independent of location using all three proportions. (5) [16,
132]
8) The following are odds and ends that don’t fit anywhere else. We are selling our production in an
imperfect market. x1 is the number of units produced and x2 is our revenue. r1 and r2 are the ranks of the
items in x1 and x2. ($\alpha = .05$)

Row   x1   x2  r1  r2
  1  330  221   7   7
  2  263  194   5   5
  3  428  245   9  10
  4  584  243  10   8
  5  423  244   8   9
  6  219  171   4   4
  7  308  213   6   6
  8  123  108   1   1
  9  173  143   3   3
 10  140  120   2   2

Minitab has computed the following: Sum of x1 = 2991, Sum of x2 = 1902, Sum of squares of x1 = 1088721,
Sum of squares of x2 = 386210 and Sum of x1 x2 = 767524.
a) Test x1 to see if its median is 200. Do not use the sign test or compute any medians. (4)
b) Assuming that x1 and x2 are both random samples from a nonnormal distribution, test to see if they have
similar medians. (4)
c) Compute the correlation between x1 and x2 and the rank correlation between them. Why is the rank
correlation higher? (6)
d) Test the rank correlation for significance. (2) [16]
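For the rank correlation, a minimal sketch using the shortcut formula $r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$, which applies here because neither rank column contains ties. The rank lists are copied from the r1 and r2 columns above; the variable names are illustrative.

```python
# Ranks copied from the r1 and r2 columns of the table.
r1 = [7, 5, 9, 10, 8, 4, 6, 1, 3, 2]
r2 = [7, 5, 10, 8, 9, 4, 6, 1, 3, 2]

n = len(r1)

# Sum of squared rank differences.
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

# Spearman rank correlation via the no-ties shortcut formula.
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.4f}")
```

Testing the result for significance still means comparing it with the tabled critical value for n = 10.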