Econometrics Problems & Answers

Professor Òscar Jordà
ECN 140 – ECONOMETRICS
MIDTERM 2
Name:
Spring 2003
Student ID:
Instructions
Completion of this test only requires a writing instrument, scratch paper, and a calculator. Put away all
other materials. Answer the questions in the space provided in the test. You will not receive credit for any
work done in the scratch area or in any papers that you provide. Budget your time appropriately. Remember
to write down YOUR NAME and ID number. Good luck!
Multiple Choice Questions: [20 points]
Please write the answers in the template provided here:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
1) Heteroskedasticity means that
a. homogeneity cannot be assumed automatically for the model.
b. the variance of the error term is not constant.
c. the observed units have different preferences.
d. agents are not all rational.
Answer: b
2) E(ui | Xi) = 0 says that
a. dividing the error by the explanatory variable results in a zero (on average).
b. the sample regression function residuals are unrelated to the explanatory variable.
c. the sample mean of the Xs is much larger than the sample mean of the errors.
d. the conditional distribution of the error given the explanatory variable has a zero mean.
Answer: d
3) In the simple linear regression model, the regression slope
a. indicates by how many percent Y increases, given a one percent increase in X.
b. when multiplied with the explanatory variable will give you the predicted Y.
c. indicates by how many units Y increases, given a one unit increase in X.
d. represents the elasticity of Y on X.
Answer: c
4) Under imperfect multicollinearity
a. the OLS estimator cannot be computed.
b. two or more of the regressors are highly correlated.
c. the OLS estimator is biased even in samples of n > 100.
d. the error terms are highly, but not perfectly, correlated.
Answer: b
5) If you had a two regressor regression model, then omitting one variable which is relevant
a. will have no effect on the coefficient of the included variable if the correlation between
the excluded and the included variable is negative.
b. will always bias the coefficient of the included variable upwards.
c. can result in a negative value for the coefficient of the included variable, even though the
coefficient will have a significant positive effect on Y if the omitted variable were
included.
d. makes the sum of the product between the included variable and the residuals different
from 0.
Answer: c
6) If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. a series of t-tests may or may not give you the same conclusion.
b. the regression is always significant.
c. all of the hypotheses are always simultaneously rejected.
d. the F-statistic must be negative.
Answer: a
7) A nonlinear function
a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent variable and one of
the explanatory variables.
c. is a concept that only applies to the case of a single or two explanatory variables since
you cannot draw a line in four dimensions.
d. is a function with a slope that is not constant.
Answer: d
8) In the regression model $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$, where X is a continuous variable
and D is a dummy variable, $\beta_3$
a. indicates the slope of the regression when D=1.
b. has a standard error that is not normally distributed even in large samples since D is not a
normally distributed variable.
c. indicates the difference in the slopes of the two regressions.
d. has no meaning since $(X_i \times D_i) = 0$ when $D_i = 0$.
Answer: c
9) In the model $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u_i$, the expected effect $\Delta Y / \Delta X_1$ is
a. $\beta_1 + \beta_3 X_2$.
b. $\beta_1$.
c. $\beta_1 + \beta_3$.
d. $\beta_1 + \beta_3 X_1$.
Answer: a
10) By including another variable in the regression, you will
a. decrease the regression $R^2$ if that variable is important.
b. eliminate the possibility of omitted variable bias from excluding that variable.
c. look at the t-statistic of the coefficient of that variable and include the variable only if the
coefficient is statistically significant at the 1% level.
d. decrease the variance of the estimator of the coefficients of interest.
Answer: b
Analytical Questions: [40 points]
1. You have collected data for 104 countries to address the difficult question of what determines
differences in the standard of living among the countries of the world. You recall from your
macroeconomics lectures that the neoclassical growth model suggests that output per worker (per capita
income) levels are determined by, among others, the saving rate and population growth rate. To test the
predictions of this growth model, you run the following regression:
$\widehat{RelPersInc} = \underset{(0.068)}{0.339} - \underset{(3.177)}{12.894} \times n + \underset{(0.229)}{1.397} \times s_K$, $R^2 = 0.621$, $SER = 0.177$
where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate,
1980-1990, and sK is the average investment share of GDP from 1960 to 1990 (remember investment equals
saving). Numbers in parentheses are heteroskedasticity-robust standard errors.
Hint: in the Solow growth model, the steady state level of savings (and therefore investment) is such that
the level of capital per worker is kept constant. Therefore, investment is such that it replaces the capital that
depreciates and covers the capital for the new workers added by population growth. Countries far away
from steady state require heavy rates of savings (since their capital per worker levels are low). Similarly,
developed countries will exhibit savings rates that are close to the steady state level. With declining
marginal product rates, productivity will be higher the further away a country is from steady state, all else
being equal.
(a)
Interpret the results by interpreting the slope coefficients of the regression, commenting on the fit,
and any other characteristics you deem relevant. How does your interpretation of the regression
results square with the predictions of the Solow growth model?
Answer: The Solow growth model predicts higher productivity with higher saving rates and lower
population growth. The signs therefore correspond to prior expectations. A 10 percentage
point increase in the saving rate results in a roughly 14 percentage point increase in per
capita income relative to the United States (1.397 × 0.10 ≈ 0.14). Lowering the population
growth rate by 1 percentage point results in a 13 percentage point higher per capita income
relative to the United States (12.894 × 0.01 ≈ 0.13). It is best not to interpret the
intercept. The regression explains approximately 62 percent of the variation in per capita
income among the 104 countries of the world.
(b)
Calculate the t-statistics and test (say at the 95% confidence level) whether or not each of the
population parameters is significantly different from zero.
Answer: The t-statistics for population growth and the saving rate are –4.06 and 6.10, making
both coefficients significantly different from zero at conventional levels of
significance.
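As a minimal sketch of the arithmetic behind this answer, each t-statistic is the coefficient divided by its standard error, compared in absolute value with 1.96. The dictionary layout and variable names are illustrative assumptions; the numbers are taken from the regression above.

```python
# Coefficients and heteroskedasticity-robust standard errors from the
# first regression in question 1 (names are illustrative).
estimates = {
    "intercept": (0.339, 0.068),
    "n": (-12.894, 3.177),
    "sK": (1.397, 0.229),
}

# Assumed two-sided 5% critical value from the large-sample normal distribution.
CRITICAL_5PCT = 1.96

# t-statistic for H0: coefficient = 0 is coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > CRITICAL_5PCT for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, significant at 5%: {significant[name]}")
```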
(c)
You remember that human capital in addition to physical capital also plays a role in determining
the standard of living of a country. You therefore collect additional data on the average
educational attainment in years for 1985, and add this variable (Educ) to the above regression.
This results in the modified regression output:
$\widehat{RelPersInc} = \underset{(0.079)}{0.046} - \underset{(2.238)}{5.869} \times n + \underset{(0.294)}{0.738} \times s_K + \underset{(0.010)}{0.055} \times Educ$, $R^2 = 0.775$, $SER = 0.1377$
How has the inclusion of Educ affected your interpretation of the slope estimates, r-squared and
other properties of the regression?
Answer: The coefficient on the population growth rate is roughly half of what it was originally
(–5.869 versus –12.894), and the coefficient on the saving rate has also roughly halved
(0.738 versus 1.397). The regression R2 has increased substantially, from 0.621 to 0.775.
(d)
Brazil has the following values in your sample: RelPersInc = 0.30, n = 0.021, sK = 0.169, Educ =
3.5. Does your equation overpredict or underpredict the relative GDP per worker? What would
happen to this result if Brazil managed to double the average educational attainment?
Answer: The predicted value for Brazil is 0.240. Hence the regression underpredicts Brazil’s per
capita income. Increasing Educ to 7.0 would result in a predicted per capita income of
0.43, which is a substantial increase from both its current actual position and the
previously predicted value.
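The prediction for Brazil can be reproduced with a short sketch. The function name `predict` and the variable names are illustrative assumptions; the coefficients are those of the regression that includes Educ.

```python
# Coefficients from the regression with Educ added (part c).
COEF = {"const": 0.046, "n": -5.869, "sK": 0.738, "Educ": 0.055}

def predict(n, sK, educ):
    """Predicted GDP per worker relative to the United States."""
    return COEF["const"] + COEF["n"] * n + COEF["sK"] * sK + COEF["Educ"] * educ

actual = 0.30
pred = predict(n=0.021, sK=0.169, educ=3.5)          # Brazil as observed
pred_doubled = predict(n=0.021, sK=0.169, educ=7.0)  # Educ doubled

# A positive residual means the regression underpredicts.
residual = actual - pred

print(f"predicted: {pred:.3f}, residual: {residual:.3f}")
print(f"predicted with Educ = 7.0: {pred_doubled:.3f}")
```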
2.
After analyzing the age-earnings profile for 1,744 workers as shown in the figure, it becomes clear
to you that the relationship cannot be approximately linear.
[Figure: age-earnings profile, EARNINGS (vertical axis, 0 to 2000) plotted against AGE (horizontal axis, 0 to 100).]
You estimate the following polynomial regression model, controlling for the effect of gender by
using a dummy variable that takes on the value of one for females and is zero otherwise:
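$\widehat{Earn} = \underset{(283.11)}{-795.90} + \underset{(29.29)}{82.93} \times Age - \underset{(1.06)}{1.69} \times Age^2 + \underset{(0.016)}{0.015} \times Age^3 - \underset{(0.0009)}{0.0005} \times Age^4$
$\qquad - \underset{(12.45)}{163.19} \times Female$, $R^2 = 0.225$, $SER = 259.78$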
(a)
Test for the significance of the Age4 coefficient. Describe one strategy you can think of to
determine the appropriate degree of the polynomial.
Answer: The coefficient has a t-statistic of –0.56 (–0.0005/0.0009) and hence is not statistically
significant at conventional levels. The strategy is described in section 6.2 of the
textbook. Starting with a polynomial of degree r, the coefficient on the highest power,
$Age^r$, is tested for significance. If it is not significant, that term is dropped and the
test is repeated for the polynomial of degree r – 1, and so on, following a sequential
hypothesis testing procedure.
(b)
You run two further regressions. Present an argument as to which one you should use for further
analysis.
$\widehat{Earn} = \underset{(120.13)}{-683.21} + \underset{(9.27)}{65.83} \times Age - \underset{(0.22)}{1.05} \times Age^2 + \underset{(0.002)}{0.005} \times Age^3 - \underset{(12.45)}{163.23} \times Female$, $R^2 = 0.225$, $SER = 259.73$
$\widehat{Earn} = \underset{(51.58)}{-344.88} + \underset{(2.64)}{41.48} \times Age - \underset{(0.03)}{0.45} \times Age^2 - \underset{(12.47)}{163.81} \times Female$, $R^2 = 0.222$, $SER = 260.22$
Answer: The coefficient of $Age^3$ is statistically significant at the 1% level using a one-sided
hypothesis. The polynomial of degree three therefore seems the appropriate regression.
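The sequential testing idea from parts (a) and (b) can be sketched as follows, using the reported coefficient and standard error of the highest-order term in each of the three regressions. The function name `choose_degree` and the use of a two-sided 1.96 critical value are assumptions made for illustration; the answer above uses a one-sided test at the 1% level, which leads to the same choice here.

```python
# (coefficient, standard error) of the highest power of Age in the
# degree-4, degree-3 and degree-2 regressions reported above.
highest_terms = {
    4: (-0.0005, 0.0009),
    3: (0.005, 0.002),
    2: (-0.45, 0.03),
}

def choose_degree(terms, critical=1.96):
    """Start at the highest degree and step down until the top term is significant."""
    for degree in sorted(terms, reverse=True):
        coef, se = terms[degree]
        if abs(coef / se) > critical:
            return degree
    return 1  # no higher-order term is significant: fall back to linear

chosen = choose_degree(highest_terms)
print(f"chosen polynomial degree: {chosen}")
```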
(c)
Sketch the graph of fitted earnings of males against age of your preferred regression. Does this
make sense? Are you concerned about the negative coefficient on the regression intercept? What is
the implication for female earners in this sample?
Answer:
[Figure: Predicted Earnings and Age. Earnings (vertical axis, 0 to 700) against Years (horizontal axis, 15 to 65) for the fitted polynomials of degree 3 and degree 2.]
There is little difference between the two fits for values between the age of 25 and 60. The inverted
U-shape is well known to exist for age-earnings profiles, and hence the plot makes sense. There is no
interpretation for the intercept, since there is no data close to the origin. Females earn significantly less
at every age level.
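As a rough check on the plot, the fitted values of the two regressions can be tabulated directly. This is a sketch using the rounded coefficients reported above, so the numbers are approximate; the function names are illustrative assumptions.

```python
def earn_cubic(age, female=0):
    """Fitted earnings from the degree-3 regression."""
    return -683.21 + 65.83 * age - 1.05 * age**2 + 0.005 * age**3 - 163.23 * female

def earn_quadratic(age, female=0):
    """Fitted earnings from the degree-2 regression."""
    return -344.88 + 41.48 * age - 0.45 * age**2 - 163.81 * female

# The quadratic b1*Age + b2*Age^2 peaks where its derivative is zero: Age = -b1 / (2*b2).
peak_age_quadratic = 41.48 / (2 * 0.45)

for age in range(25, 66, 10):
    print(f"age {age}: cubic {earn_cubic(age):7.2f}, quadratic {earn_quadratic(age):7.2f}")
print(f"quadratic peak at about age {peak_age_quadratic:.1f}")
```

Because Female enters only as an intercept shift, the female profile is the male profile moved down by the same amount at every age.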
(d)
Calculate the effect of changing age from 30 to 31 on earnings, holding constant the gender
variable and using the model with polynomial of degree 3 regressors. Finally, calculate the
standard errors of the estimated effect (you do not have to calculate this numerically, just explain
the procedure in detail).
Answer: Since this is a nonlinear relationship, the effect will depend on the age level. This is
described in section 6.1 of the textbook. In essence, the predicted earnings value for
one age level has to be computed first. Next, the same has to be done for the age level
plus one. Finally the two values are differenced to find the change in earnings
associated with the age level.
For the polynomial of degree 3, the first task is to consider the estimated change in
earnings associated with a change in age by one year, say from 30 to 31. This is given
by $\Delta\hat{Y} = \hat{\beta}_1 \times (31 - 30) + \hat{\beta}_2 (31^2 - 30^2) + \hat{\beta}_3 (31^3 - 30^3)$, or
$\Delta\hat{Y} = \hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3$. The standard error of the estimated effect is then given
from $SE(\Delta\hat{Y}) = |\Delta\hat{Y}| / \sqrt{F}$, where
$F = [(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3) / SE(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3)]^2$
is the F-statistic for testing the null hypothesis $\beta_1 + 61\beta_2 + 2791\beta_3 = 0$. A 95% confidence
interval for the change in the expected value of earnings is
$(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3) \pm 1.96 \times SE(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3)$. Obviously these
expressions get quite complicated once you go beyond a quadratic.
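The point estimate of the effect can be computed from the degree-3 regression as a sketch. It uses the rounded coefficients reported above, so the result is approximate, and the function name `delta_earn` is an illustrative assumption. The standard error is not computed here because it requires the F-statistic (or the covariance matrix of the estimates), which the output above does not report.

```python
# Rounded slope coefficients on Age, Age^2 and Age^3 from the degree-3 regression.
b1, b2, b3 = 65.83, -1.05, 0.005

def delta_earn(age_from, age_to):
    """Change in fitted earnings when age moves from age_from to age_to, gender held fixed."""
    return (
        b1 * (age_to - age_from)
        + b2 * (age_to**2 - age_from**2)
        + b3 * (age_to**3 - age_from**3)
    )

# The weights on b2 and b3 for a move from 30 to 31.
weight_b2 = 31**2 - 30**2
weight_b3 = 31**3 - 30**3

effect = delta_earn(30, 31)
print(f"weights: {weight_b2}, {weight_b3}; estimated effect: {effect:.3f}")
```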
Empirical Question: [40 points]
1.
The following output is based on a sample of 401 12th-grade students and the variables:
DRUGS: Index of how much drug selling there is in the student's neighborhood.
ENROLLMENT: Enrollment at the school attended by the student.
MATH87: Math test score in 8th Grade, in standard deviations from the mean.
MATH91: Math test score in 12th Grade, in standard deviations from the mean.
SES: Socio-economic status = a combination of parent's education, income, and goods in the household.
SES is measured in standard deviations from the mean.
URBAN: % of people in the school's zip code that live in an urban area.
Three models are estimated with these data, whose EViews output is reported below:
Model 1
Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 402
Excluded observations: 5
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                  0.012374     0.083436      0.148309   0.8822
MATH87             0.641167     0.037062      17.29973   0.0000
SES                0.133049     0.039321      3.383634   0.0008
URBAN              0.000446     0.000880      0.507469   0.6121
ENROLLMENT         6.76E-05     7.72E-05      0.876704   0.3812
DRUGS             -0.916503     0.436590     -2.099230   0.0364

R-squared             0.548663    Mean dependent var      -0.025206
Adjusted R-squared    0.542964    S.D. dependent var       0.902533
S.E. of regression    0.610152    Akaike info criterion    1.864597
Sum squared resid     147.4253    Schwarz criterion        1.924246
Log likelihood       -368.7840    F-statistic              96.27848
Durbin-Watson stat    1.970192    Prob(F-statistic)        0.000000
Model 2
Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 405
Excluded observations: 2
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                  0.070319     0.071595      0.982178   0.3266
MATH87             0.639761     0.036826      17.37269   0.0000
SES                0.141094     0.039050      3.613192   0.0003
DRUGS             -0.735341     0.413901     -1.776612   0.0764

R-squared             0.542521    Mean dependent var      -0.026229
Adjusted R-squared    0.539099    S.D. dependent var       0.899902
S.E. of regression    0.610941    Akaike info criterion    1.862194
Sum squared resid     149.6727    Schwarz criterion        1.901738
Log likelihood       -373.0943    F-statistic              158.5144
Durbin-Watson stat    1.954262    Prob(F-statistic)        0.000000
Model 3
Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 406
Excluded observations: 1
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                 -0.044097     0.030527     -1.444501   0.1494
MATH87             0.647725     0.036618      17.68890   0.0000
SES                0.132900     0.038857      3.420206   0.0007

R-squared             0.540066    Mean dependent var      -0.023767
Adjusted R-squared    0.537784    S.D. dependent var       0.900158
S.E. of regression    0.611986    Akaike info criterion    1.863147
Sum squared resid     150.9343    Schwarz criterion        1.892750
Log likelihood       -375.2188    F-statistic              236.6066
Durbin-Watson stat    1.969220    Prob(F-statistic)        0.000000
Answer the following questions:
(a) Choose the “best” model and explain how you based your choice.
(b) Test the null hypothesis that the coefficients of the variables URBAN and ENROLLMENT are
jointly zero in Model 1 using the information in the output for any of the three models. Hint: the
F-statistic can be computed as $F = \frac{SSR_R - SSR_u}{SSR_u} \cdot \frac{n - k - 1}{q} \sim F_{q,\infty}$. The 95% confidence
level critical values are: $F_{1,\infty} = 3.84$; $F_{2,\infty} = 3.00$; $F_{3,\infty} = 2.60$; $F_{4,\infty} = 2.37$; $F_{5,\infty} = 2.21$
(c) Test the null hypothesis that the coefficient on the variable DRUGS in Model 2 is zero
without using the t-statistic reported in that output. Compare your results to the t-statistic
reported in Model 2. Hint: The F ratio is the square of the t-ratio when only one
constraint is being tested.
(d) What do you learn (from a policy point of view) from the output in these three models? In
particular, discuss the coefficient on the variables MATH87, DRUGS and SES and I am not just
referring to their sign.
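The hint in (b) and (c) can be turned into a short sketch of the arithmetic. The function name `f_stat` is an illustrative assumption, and n and k are taken from the unrestricted model in each comparison. Note that the three models are estimated on slightly different samples (402, 405 and 406 included observations), so their sums of squared residuals are not strictly comparable and the results should be read as approximate.

```python
def f_stat(ssr_r, ssr_u, n, k, q):
    """F = (SSR_R - SSR_u) / SSR_u * (n - k - 1) / q, as given in the hint."""
    return (ssr_r - ssr_u) / ssr_u * (n - k - 1) / q

# (b) Unrestricted: Model 1 (5 regressors, 402 observations).
#     Restricted: Model 2, which drops URBAN and ENROLLMENT (q = 2).
f_b = f_stat(ssr_r=149.6727, ssr_u=147.4253, n=402, k=5, q=2)

# (c) Unrestricted: Model 2 (3 regressors, 405 observations).
#     Restricted: Model 3, which drops DRUGS (q = 1).
f_c = f_stat(ssr_r=150.9343, ssr_u=149.6727, n=405, k=3, q=1)

# Square of the t-statistic on DRUGS reported in Model 2, for comparison with f_c.
t_drugs_sq = (-1.776612) ** 2

print(f"(b) F = {f_b:.3f}  (critical value F(2, inf) = 3.00)")
print(f"(c) F = {f_c:.3f}  (critical value F(1, inf) = 3.84), t^2 = {t_drugs_sq:.3f}")
```

With identical samples the F-statistic in (c) would equal the squared t-statistic exactly; any gap between the two printed values reflects the differing numbers of included observations.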
SCRATCH AREA