252y0582h 12/5/05 Student Number: _________________________ Class days and time : _________________________

advertisement
252y0582h 12/5/05
ECO252 QBA2
Final EXAM
December 14-16, 2005
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________
Class days and time : _________________________
III Take-home Exam (20+ points)
A) 4th computer problem (5+)
This is an internet project. You should do only one of the following 2 problems.
Problem 1: In his book, Statistics for Economists: An Intuitive Approach (New York, HarperCollins,
1992), Alan S. Caniglia presents data for 50 states and the District of Columbia. These data are presented as
an appendix at the end of this section.
The Data Consists of six variables.
The dependent variable, MIM, the mean income of males (having income) who are 18 years of age or older.
PMHS, the percent of males 18 and older who are high school graduates.
PURBAN, the percent of total population living in an urban area.
MAGE, the median age of males.
Using his data, I got the results below.
Regression Analysis: MIM versus PMHS
The regression equation is
MIM = 2736 + 180 PMHS
Predictor
Constant
PMHS
Coef
2736
180.08
S = 1430.91
SE Coef
2174
31.31
R-Sq = 40.3%
T
1.26
5.75
P
0.214
0.000
R-Sq(adj) = 39.1%
Analysis of Variance
Source
DF
SS
Regression
1
67720854
Residual Error 49 100328329
Total
50 168049183
MS
67720854
2047517
F
33.07
P
0.000
Unusual Observations
Obs PMHS
MIM
Fit SE Fit Residual St Resid
1 69.1 12112 15180
200
-3068
-2.17R
3 71.6 12711 15630
215
-2919
-2.06R
50 81.9 21552 17485
447
4067
2.99R
R denotes an observation with a large standardized residual.
His only comment is that a 1% increase in the percent of males that are college graduates results is
associated with about a $180 increase in male income and that there is evidence here that the relationship is
significant.
He then describes three dummy variables: NE = 1 if the state is in the Northeast (Maine through
Pennsylvania in his listing); MW = 1 if the state is in the Midwest (Ohio through Kansas) and SO = 1 if the
state is in the South (Delaware through Texas). If all of the dummy variables are zero, the state is in the
West (Montana through Hawaii). I ran the regression with all six independent variables. To check these
variables, look at his data.
MTB > regress c2 6 c3-c8;
SUBC> VIF;
SUBC> brief 2.
Regression Analysis: MIM versus PMHS, PURBAN, MAGE, NE, MW, SO
The regression equation is
MIM = - 1294 + 198 PMHS + 49.4 PURBAN - 42 MAGE + 247 NE + 757 MW + 1269 SO
1
252y0582h 12/5/05
Predictor
Constant
PMHS
PURBAN
MAGE
NE
MW
SO
Coef
-1294
198.13
49.36
-42.1
246.6
756.7
1268.9
S = 1271.71
SE Coef
5394
53.97
14.27
151.6
723.7
608.2
863.0
R-Sq = 57.7%
T
-0.24
3.67
3.46
-0.28
0.34
1.24
1.47
DF
1
1
1
1
1
1
VIF
3.8
1.4
1.5
2.4
2.1
5.2
R-Sq(adj) = 51.9%
Analysis of Variance
Source
DF
SS
Regression
6
96890414
Residual Error 44
71158768
Total
50 168049183
Source
PMHS
PURBAN
MAGE
NE
MW
SO
P
0.811
0.001
0.001
0.783
0.735
0.220
0.149
MS
16148402
1617245
F
9.99
P
0.000
Seq SS
67720854
23781889
281110
1416569
193443
3496549
Unusual Observations
Obs PMHS
MIM
Fit SE Fit Residual St Resid
50 81.9 21552 16999
543
4553
3.96R
R denotes an observation with a large standardized residual.
He has asked whether region affects the independent variable, on the strength of the significance tests in the
output above, he concludes that the regional variables do not have any affect on male income. (Median Age
looks pretty bad too.)
There are two ways to confirm these conclusions. Caniglia does one of these, an F test that shows whether
the regional variables as a group have any effect. He says that they do not. Another way to test this is by
using a stepwise regression.
MTB > stepwise c2 c3-c8
Stepwise Regression: MIM versus PMHS, PURBAN, MAGE, NE, MW, SO
Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15
Response is MIM on 6 predictors, with N = 51
Step
Constant
1
2736
2
2528
PMHS
T-Value
P-Value
180
5.75
0.000
134
4.46
0.000
PURBAN
T-Value
P-Value
S
R-Sq
R-Sq(adj)
Mallows C-p
50
3.86
0.000
1431
40.30
39.08
15.0
1263
54.45
52.55
2.3
More? (Yes, No, Subcommand, or Help)
SUBC> y
No variables entered or removed
More? (Yes, No, Subcommand, or Help)
SUBC> n
What happens is that the computer picks PMHS as the most valuable independent variable, and gets the
same result that appeared in the simple regression above. It then adds PURBAN and gets
MIM = 2528 + 134 PMHS + 50 PURBAN. The coefficients of the 2 independent variables are significant, the
adjusted R-Sq is higher than the adjusted R-sq with all 6 predictors and the computer refuses to add any
more independent variables. So it looks like we have found our ‘best’ regression. (See the text for
interpretation VIFs and C-p’s.)
2
252y0582h 12/5/05
So here is your job. Update this work. Use any income per person variable, a mean or a median for men,
women or everybody. Find measures of urbanization or median age. Fix the categorization of states if you
don’t like it. Regress state incomes against the revised data. Remove the variables with insignificant
coefficients. If you can think of new variables add them. (Last year I suggested trying percent of output or
labor force in manufacturing.) Make sure that you pick variables that can be compared state to state.
Though you can legitimately ask whether size of a state affects per capita income, using total amount
produced in manufacturing is poor because it’s just going to be big for big states. Similarly the fraction of
the workforce with a certain education level is far better then the number. For instructions on how to do a
regression, try the material in Doing a Regression. For data sources, try the sites mentioned in 252Datalinks.
Use F tests for adding the regional variables and use stepwise regression. Don’t give me anything you don’t
understand.
Problem 2: Recently the Heritage Foundation produced the graph below.
What I want to know is if you can develop an equation relating per capita income (the dependent variable)
and Economic freedom x  . Because it is pretty obvious that a straight line won’t work, you will probably
need to create a x 2 variable too. But I would like to know what parts of ‘economic freedom’ affect per
capita income. In addition to the Heritage Foundation Sources, the CIFP site mentioned in 252datalinks,
and the CIA Factbook might provide some interesting independent variables. You should probably use a
sample of no more than 50 countries and it’s up to you what variables to use. You are, of course, looking
for significant coefficients and high R-squares. For instructions on how to do a regression, try the material
in Doing a Regression.
3
252y0582h 12/5/05
B. Do only Problem 1 or problem 2. (Problem Due to Donald R Byrkit). Four different job candidates are
interviewed by seven executives. These are rated for 7 traits on a scale of 1-10 and the scores are added
together to create a total score for each candidate-rater pair that is between 0 and 70. The results appear
below.
Row
1
2
3
4
5
6
7
Sum
Sum
Sum
Sum
Sum
Sum
Raters
Moore
Gaston
Heinrich
Seldon
Greasy
Waters
Pierce
of
of
of
of
of
of
Lee
52
38
54
43
58
36
52
Candidates
Jacobs
25
31
38
30
44
28
41
Wilkes
29
24
40
31
46
22
37
Delap
33
29
39
28
47
25
45
Jacobs = 237
squares (uncorrected) of Jacobs = 8331
Wilkes = 229
squares (uncorrected) of Wilkes = 7947
Delap = 246
squares (uncorrected) of Delap = 9094
Personalize the data by adding the second to last digit of your student number to Lee’s column. For example
Roland Dough’s student number is 123689, so he uses 52 + 8 = 60, 38 + 8 = 46, 62 etc. If the second to
last digit of your student number is zero, add 10.
Problem 1: a) Assume that a Normal distribution applies and use a statistical procedure to compare the
column means, treating each column as an independent random sample. If you conclude that there is a
difference between the column means, use an individual confidence interval to see if there is a significant
difference between the best and second-best candidate. If you conclude that there is no difference between
the means, use an individual confidence interval to see if there is a significant difference between the best
and worst candidate. (6)
b) Now assume that a Normal distribution does not apply but that the columns are still independent random
samples and use an appropriate procedure to compare the column medians. (4)
[16]
Problem 2: a) Assume that a Normal distribution applies and use a statistical procedure to compare the
column means, taking note of the fact that each row represents one executive. If you conclude that there is a
difference between the column means, use an individual confidence interval to see if there is a significant
difference between the best and second-best candidate. If you conclude that there is no difference between
the column means, use an individual confidence interval to see if there is a significant difference between
the kindest and least kind executive. (8)
b) Now assume that a Normal distribution does not apply but that each row represents the opinion of one
rater and use an appropriate procedure to compare the column medians. (4)
c) Use Kendall’s coefficient of concordance to show how the raters differ and do a significance test. (3)
Problem 3: (Extra Credit) Decide between the methods used in Problem 1 and Problem 2. To do this test
for equal variances and for Normality on the computer. What is your decision? Why?
(4)
You can do most of this with the following commands in Minitab if you put your data in 3 columns of
Minitab with A, B, C and D above them.
MTB >
MTB >
SUBC>
SUBC>
MTB >
MTB >
AOVOneway A B C D
stack A B C D C11;
subscripts C12;
UseNames.
rank C11 C13
vartest C11 C12
MTB > Unstack (c13);
SUBC>
Subscripts c12;
SUBC>
After;
SUBC>
VarNames.
MTB > NormTest 'A';
SUBC>
KSTest.
#Does a 1-way ANOVA
# Stacks the data in c12, col.no. in c12.
#Puts the ranks of the stacked data in c13
#Does a bunch of tests, including Levene’s
On stacked data in c11 with IDs in c12.
#Unstacks the ranks in the next available
# columns. Uses IDs in c12.
#Does a test (apparently Lilliefors)for Normality
# on column A.
4
252y0582h 12/5/05
C. You may do both problems. These are intended to be done by hand. A table version of the data for
problem 2 is provided in 2005data1 which can be downloaded to Minitab. I do not want Minitab results for
these data except for Problem 2e.
Problem 1: Using data from the 1970s and 1980s, Alan S. Caniglia calculated a regression of
nonresidential investment on the change in level of final sales to verify the accelerator model of investment.
This theory says that because capital stock must be approximately proportional to production, investment
will be driven by changes in output. In order to check his work I put together a data set 2005series. The last
two years of the series are in Exhibit C1 below.
Exhibit C1
Row Date
73 1988 01
74 1988 02
75 1988 03
76 1988 04
77 1989 01
78 1989 02
79 1989 03
80 1989 04
RPFI
862.406
879.330
882.704
891.502
900.401
901.643
917.375
902.298
Sales
6637.22
6716.38
6749.47
6835.07
6873.33
6933.55
7015.34
7026.76
Sales-4Q
6344.41
6431.37
6510.82
6542.55
6637.22
6716.38
6749.47
6835.07
Change
292.815
285.006
238.644
292.522
236.106
217.171
265.876
191.695
DEFL %Y
2.897
3.318
3.699
3.724
4.013
4.016
3.596
3.537
MINT %
9.88
9.67
9.96
9.51
9.62
9.79
8.93
8.92
RINT
6.983
6.352
6.261
5.786
5.607
5.774
5.334
5.383
‘Date’ consists of the year and the quarter. ‘RPFI’ consists of real fixed private investment from
2005InvestSeries1. ‘Sales’ consists of sales data (actually a version of gross domestic product) from
2005SalesSeries1. ‘Sales-4Q’ (Sales 4 Quarters earlier’ is also sales data from 2005SalesSeries1, but is the
data of one year earlier. (Note that the 1989 numbers in ‘Sales-4Q’ are identical to the 1988 numbers in
‘Sales.’ ‘Change’ is ‘Sales’ – ‘Sales-4Q. ‘DEFL %Y’ is the percent change in the gross domestic deflator
over the last year (a measure of inflation) taken from 2005deflSeries1. ‘MINT %’ is an estimate of the
percent return on Aaa bonds taken from 2005intSeries1. Only the values for January, April, July and
October are used since quarterly data was not available. ‘RINT’ (an estimate of the real interest rate) is
‘MINT %’ - ‘DEFL %Y’.
These are manipulated in the input to the regression program as in Exhibit C2 below.
Exhibit C2
Row Time
73 1988 01
74 1988 02
75 1988 03
76 1988 04
77 1989 01
78 1989 02
79 1989 03
80 1989 04
Y
86.2406
87.9330
88.2704
89.1502
90.0401
90.1643
91.7375
90.2298
X1
29.2815
28.5006
23.8644
29.2522
23.6106
21.7171
26.5876
19.1695
X2
6.98
6.35
6.26
5.79
5.61
5.77
5.33
5.38
Here Y is ‘RFPI’ divided by 10. X1 is ‘Change’ divided by 10. X2 is ‘RINT’ rounded to eliminate the last
decimal place. If you don’t understand how I got Exhibit C2 from Exhibit C1 find out before you go
any further,
Personalize the data by adding one year (four values) to the data in 2005 series. Pick the year to be added
by adding the last digit of your student number to 1990. Make sure that I know the year you are using. Then
get, for your year, ‘RPFI’ from 2005InvestSeries1, ‘Sales’ from 2005SalesSeries1, ‘Sales-4Q’ from
2005SalesSeries1 (Make sure that you use the sales of one year earlier, not 1989 unless your year is 1990.),
‘DEFL %Y’ 2005deflSeries1 and ‘MINT %’ from 2005intSeries1. Calculate ‘Change’ by subtracting
‘Sales-4Q ’ from ‘Sales.’ If you are going to do Problem 2, calculate ‘RINT’ by subtracting ‘DEFL %Y’
from ‘MINT %.’ Present your four rows of new values in the format of Exhibit C1. Now manipulate your
numbers to the form in Exhibit C2 and again present your four rows of numbers. These are observations 81
through 84.
Now it’s time to compute your spare parts. The following are computed for you from the data for 1970
through 1989.
5
252y0582h 12/5/05
Sum of Y = 5323.20
Sum of X1 = 1283.42
Sum of X2 = 328.33
Sum of Ysq = 371032
Sum of X1sq = 30307.57
Sum of X2sq = 2080.65
Sum of X1Y = 92676.9
Sum of X2Y = 24188.2
Sum of X1X2 = 6324.09
 Y  5323 .2
 X 1  1283.42
 X 2  328.33
 Y  371032
 X  30307 .6
 X  2080 .65
 X 1 Y  92676.9
 X 2 Y 24188.2
 X 1 X 2  6324.09
n  80
2
2
1
2
2
Add the results of your data to these sums (You only need the sums involving X1 and Y if you are not doing
Problem 2.) (Show your work!) and do the following.
a. Compute the regression equation Yˆ  b0  b1 x1 to predict investment on the basis of change in
sales only. (2)
b. Compute R 2 . (2)
c. Compute s e . (2)
d. Compute s b0 and do a significance test on b0 (1.5)
e. Compute s b1 and do a significance test on b1 (2)
f. In the first quarter of 2001 sales were 9883.167, the interest rate was 7.15% and the gdp inflation
rate was 2.176%. In the first quarter of 2000 sales were 9668.827. Get values of Y and X1 from
this and predict the level of investment for 2001. Using this create a confidence interval or a
prediction interval for investment in 2001 as appropriate. (3)
g. Do an ANOVA table for the regression. What conclusion can you draw from the hypothesis test
in the ANOVA? (2)
[30]
Problem 2: Continue with the data in problem 1.
a. Compute the regression equation Yˆ  b0  b1 x1  b2 x 2 to predict investment on the basis of real
interest rates and change in sales. Do not attempt to use the value of b1 you got in problem 1. Is
the sign of the coefficient what you expected? Why? (5)
b. Compute R-squared and R-squared adjusted for degrees of freedom for this regression and
compare them with the values for the previous problem. (2)
c. Using either R – squares or SST, SSR and SSE do an F tests (ANOVA). First check the
usefulness of the multiple regression and show whether the use of real interest rates gives a
significant improvement in explanatory power of the regression? (Don’t say a word without
referring to a statistical test.) (3)
d. Use the values in 1f to compute a predicition for 2001 investment. By what percent does the
predicted investment change if you add real interest rates. (2)
e. If you are prepared to explain the results of VIF and Durbin-Watson (Check the text!), run the
regression of Y on X1 and X2 using
MTB > Regress Y 2 X1 X2;
SUBC>
VIF;
SUBC>
DW;
SUBC>
Brief 2.
Explain your results. (2)
[44]
6
252y0582h 12/5/05
Caniglia’s Original Data
Data Display
Row
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
STATE
ME
NH
VT
MA
RI
CT
NY
NJ
PA
OH
IN
IL
MI
WI
MN
IA
MO
ND
SD
NE
KS
DE
MD
DC
VA
WV
NC
SC
GA
FL
KY
TN
AL
MS
AR
LA
OK
TX
MT
ID
WY
CO
NM
AZ
UT
NV
WA
OR
CA
AK
HI
MIM
12112
14505
12711
15362
13911
17938
15879
17639
15225
16164
15793
17551
17137
15417
15878
15249
14743
13835
12406
14873
15504
16081
17321
15861
15506
13998
12529
12660
13966
14651
13328
13349
13301
11968
12274
15365
14818
16135
14256
14297
17615
16672
14057
15269
15788
16820
17042
15833
17128
21552
15268
PMHS
69.1
73.0
71.6
74.0
65.1
71.8
68.9
70.0
68.0
69.0
68.8
68.9
69.3
70.9
73.5
71.9
66.2
68.0
68.3
74.2
74.5
70.4
69.2
67.9
64.3
58.6
58.2
58.2
60.4
68.0
55.8
59.0
59.9
57.2
58.3
61.3
68.7
65.3
73.8
73.5
77.9
79.1
70.6
73.4
80.4
76.0
77.5
75.1
74.3
81.9
76.9
PURBAN
47.5
52.2
33.8
83.8
87.0
78.8
84.6
89.0
69.3
73.3
64.2
83.3
70.7
64.2
66.9
58.6
68.1
48.8
46.4
62.9
66.7
70.6
80.3
100.0
66.0
36.2
48.0
54.1
62.4
84.3
50.9
60.4
60.0
47.3
51.6
68.6
67.3
79.6
52.9
54.0
62.7
80.6
72.1
83.8
84.4
85.3
73.5
67.9
91.3
64.3
86.5
MAGE
29.2
29.2
28.4
29.6
30.1
30.6
30.3
30.7
30.4
28.6
28.0
28.6
27.8
28.3
28.3
28.7
29.3
27.5
27.9
28.6
28.7
28.7
29.2
29.9
28.6
29.1
28.1
26.7
27.3
32.9
27.8
28.7
27.8
26.1
29.2
26.2
28.6
27.1
28.4
27.0
26.7
27.9
26.6
28.2
23.8
30.0
29.0
29.5
28.9
26.3
27.6
NE
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
MW
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
SO
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
7
Download