Answer Key - Mercer University

Fall 2013: Mercer University
Econ 353: Introduction to Econometrics
Exam 1: Answers
Section 1: Multiple Choice Questions
1) The residual from a standard regression model is defined as:
a) The difference between the actual value, y, and the mean, y-bar
b) The difference between the fitted value, y-hat, and the mean, y-bar
c) The difference between the actual value, y, and the fitted value, y-hat
d) The square of the difference between the fitted value, y-hat, and the mean, y-bar
Answer: C
2) Which one of the following statements best describes the algebraic representation of
the fitted regression line?
a)
b)
c)
d)
Answer: B
3) Which of the following statements is true concerning the population regression
function (PRF) and sample regression function (SRF)?
a) The PRF is the estimated model.
b) The PRF is used to infer likely values of the SRF.
c) Whether the model is good can be determined by comparing the SRF and the PRF
d) The PRF is a description of the process thought to be generating the data.
Answer: D
4) Which of the following models can be estimated using OLS, following suitable
transformations if necessary? (Note that "e" denotes the exponential).
i)
ii)
iii)
iv)
a) (i) only
b) (i) and (iii) only
c) (i), (iii), and (iv) only
d) (i), (ii), (iii), and (iv)
Answer: D
5) If an estimator is said to have minimum variance, which of the following statements
is NOT implied?
a) The probability that the estimate is a long way away from its true value is
minimized.
b) The estimator is efficient.
c) Such an estimator would be termed "best".
d) Such an estimator will always be unbiased
Answer: D
6) Consider the following 2 regression models:
Model 1:
Model 2:
Which of the following statements are true?
i) Model 2 must have an R2 at least as high as that of model 1.
ii) Model 2 must have an adjusted R2 at least as high as that of model 1
iii) Models 1 and 2 would have identical values of R2 if the estimated coefficient on
α3 is zero
iv) Models 1 and 2 would have identical values of adjusted R2 if the estimated
coefficient on α3 is zero.
a) (ii) and (iv) only
b) (i) and (iii) only
c) (i), (ii), and (iii) only
d) (i), (ii), (iii), and (iv)
Answer: B
7) What is the meaning of the term "heteroscedasticity"?
a) The variance of the errors is not constant
b) The variance of the dependent variable is not constant
c) The errors are not linearly independent of one another
d) The errors have non-zero mean
Answer: A
8) If an estimator is a linear estimator, this means that:
a) you can calculate its value using linear regression methods
b) Its formula can be expressed as a linear function of the values of the random variable that
appear in any particular sample.
c) As sample size increases, the accuracy of the estimator increases linearly with sample size.
d) As the population from which the sample is drawn gets larger, the size of the sample you
must take gets larger as a linear function of population size
Answer: B
9) In order to estimate the parameters of a simple ordinary least squares regression model, we
need all but which of the following conditions to be met?
a) expected value of the error term is zero
b) homoscedasticity (constant variance of the errors)
c) no correlations among the error terms
d) no correlations between the error term and the explanatory variable(s)
e) normally distributed error term
Answer: B
10) Which of the following statements is correct concerning the conditions required for
OLS to be a usable estimation technique?
a) The model must be linear in the parameters
b) The model must be linear in the variables
c) The model must be linear in the variables and the parameters
d) The model must be linear in the residuals.
Answer: A
11) Which of the following is NOT a good reason for including a disturbance term in a
regression equation?
a) It captures omitted determinants of the dependent variable
b) To allow for the non-zero mean of the dependent variable
c) To allow for errors in the measurement of the dependent variable
d) To allow for random influences on the dependent variable
Answer: B
12) Which one of the following is the most appropriate as a definition of R2 in the
context that the term is usually used?
a) It is the proportion of the total variability of y that is explained by the model
b) It is the proportion of the total variability of y about its mean value that is explained
by the model
c) It is the correlation between the fitted values and the residuals
d) It is the correlation between the fitted values and the mean.
Answer: B
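The definition of R2 as the share of the variability of y about its mean that the model explains can be checked numerically. The sketch below is only an illustration: the data are made up (they are not from the exam datasets), and the function names `fit_simple_ols` and `residuals_and_r2` are hypothetical helpers written for this example. It fits a one-regressor line, computes the residuals (actual minus fitted), and shows that R2 equals one when every point lies on the line.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals_and_r2(x, y):
    """Return (residuals, R2), with R2 = 1 - residual SS / total SS about the mean."""
    intercept, slope = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    # Residual = actual value minus fitted value.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_ss = sum(e ** 2 for e in residuals)
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    return residuals, 1.0 - residual_ss / total_ss


# Made-up illustrative data (an assumption, not the exam data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]
y_exact = [3.0 + 2.0 * xi for xi in x]  # every point lies exactly on a line

resid_noisy, r2_noisy = residuals_and_r2(x, y_noisy)
resid_exact, r2_exact = residuals_and_r2(x, y_exact)
print("noisy data: R2 =", r2_noisy)
print("exact line: R2 =", r2_exact, "residuals =", resid_exact)
```

With an intercept in the model the residuals sum to zero, which is why the total variability is measured about the mean of y.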
13) Suppose that the value of R2 for an estimated regression model is exactly one.
Which of the following are true?
(i) All of the data points must lie exactly on the line
(ii) All of the residuals must be zero
(iii) All of the variability of y about its mean has been explained by the model
(iv) The fitted line will be horizontal with respect to all of the explanatory variables
a) (ii) and (iv) only
b) (i) and (iii) only
c) (i), (ii), and (iii) only
d) (i), (ii), (iii), and (iv)
Answer: C
14) Consider the following two regressions
y_t = β_1 + β_2 x_{2t} + β_3 y_{t-1} + u_t
log(y_t) = β_1 + β_2 log(x_{2t}) + β_3 log(y_{t-1}) + u_t
Which of the following statements are true?
(i) The SSR will be the same for the two models
(ii) The R2 will be the same for the two models
(iii) The SSE will be different for the two models
(iv) The estimated regression coefficients would be identical for the two models.
(a) (ii) and (iv) only
(b) (i) and (iii) only
(c) (i), (ii), and (iii) only
(d) (i), (ii), (iii), and (iv).
Answer: None of the options is correct
^
15) What does the following condition imply? E     
 
a) The estimated coefficients may be smaller or larger, depending on the sample that is the
result of a random draw.
b) On average, the estimated coefficients will be equal to the values that characterize the true
relationship between y and x in the population.
c) In a given sample, estimates may differ considerably from true values.
d) All of the above.
Answer: D
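The three statements in question 15 can be illustrated with a small simulation. The sketch below is a hedged illustration, not part of the exam: the true intercept and slope, the error distribution, the sample size, and the helper name `ols_slope` are all assumptions chosen for the example. It draws many samples from the same population relationship, estimates the slope in each, and compares the average estimate with the true value.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


random.seed(0)
TRUE_INTERCEPT = 1.0  # assumed population values, for illustration only
TRUE_SLOPE = 2.0
x = [float(i) for i in range(20)]

estimates = []
for _ in range(1000):
    # Each replication is a new random sample from the same population model.
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + random.gauss(0.0, 1.0) for xi in x]
    estimates.append(ols_slope(x, y))

mean_estimate = sum(estimates) / len(estimates)
print("smallest estimate:", min(estimates))
print("largest estimate: ", max(estimates))
print("average estimate: ", mean_estimate)
```

Individual estimates fall on both sides of the true slope, yet their average is very close to it, which is what unbiasedness means.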
Section 2: Computer Applied Questions
1) Download the dataset “experiment.dta” from Blackboard. The dataset contains GDP data for the USA
and its various components. Answer the following questions:
Run the following two regressions:
𝐺𝐷𝑃𝑡 = 𝛽0 + 𝛽1 𝐶𝑡 + 𝛽2 𝐼𝑡 + 𝛽3 𝐺𝑡 + 𝑈𝑡 (Model 1)
𝐿𝑜𝑔(𝐺𝐷𝑃𝑡 ) = 𝛽0 + 𝛽1 𝐿𝑜𝑔(𝐶𝑡 ) + 𝛽2 𝐿𝑜𝑔(𝐼𝑡 ) + 𝛽3 𝐿𝑜𝑔(𝐺𝑡 ) + 𝑈𝑡 (Model 2)
Now answer the following questions:
a) Report the values of the estimated coefficients from Model 1 and Model 2. How do you interpret
them? (7 points)
Answer:
Model 1: 𝛽̂1 = .0148398 , 𝛽̂2 = 2.916066 , 𝛽̂3 = .8198977 . Since both GDP and its
components are measured in dollars and the variables in the model are expressed in levels, each
regression coefficient estimate indicates by how many dollars GDP would change when the
corresponding component changes by 1 dollar, holding the other components fixed.
Model 2: 𝛽̂1 = 1.049151 , 𝛽̂2 = −.1250103 , 𝛽̂3 = −.4991365 . GDP and its
components are measured in dollars, but the variables in the model are expressed in logs, so
each regression coefficient estimate is the elasticity of GDP with respect to the corresponding
component. This means that it indicates by what percentage GDP would change when that
component changes by 1%, holding the other components fixed.
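The elasticity reading of a log-log coefficient can be verified on data with a known elasticity. The following is a minimal sketch under stated assumptions: the data are generated from the hypothetical relationship y = 3 * x^0.8 (not from experiment.dta), and `ols_slope` is a helper written for this example. Regressing log(y) on log(x) recovers the exponent 0.8, whereas the level-level slope is measured in units of y per unit of x.

```python
import math


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data with a constant elasticity of 0.8 (an assumption).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.8 for xi in x]

log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]

elasticity = ols_slope(log_x, log_y)  # log-log: percent change in y per 1% change in x
level_slope = ols_slope(x, y)         # level-level: units of y per unit of x

print("log-log slope (elasticity):", elasticity)
print("level-level slope:         ", level_slope)
```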
b) Report R2. Are they very different from each other? Why or why not? (7 points)
Model 1: R2 = 0.9875
Model 2: R2 = 0.9889
The R-squared values, which measure the goodness of fit of the regression models, are very
similar because both models use the same set of explanatory variables and, within each model,
the same transformation (logs in the log-log model, levels in the level-level model) is applied
to every variable.
c) Report standard errors for each of the coefficients in both models. What can you infer from the
size of the standard errors? (7 points)
Model 1: 𝑆𝐸(𝛽̂1) = 0.2150444 , 𝑆𝐸(𝛽̂2) = .3166968 , 𝑆𝐸(𝛽̂3) = .3009458
Model 2: 𝑆𝐸(𝛽̂1) = .0872508 , 𝑆𝐸(𝛽̂2) = .0339217 , 𝑆𝐸(𝛽̂3) = .0678367
It is clear that the standard errors in model 2 are much smaller than those in model 1. This means
that the regression coefficients are estimated more precisely in model 2, and we should have more
confidence in the estimated coefficients of model 2 than in those of model 1.
d) Is there multicollinearity in these models? Report the result of a test that might tell whether
there is any multicollinearity. (7 points)
The multicollinearity in the model would be detected by the variance inflation factor.
Model 1: The variance inflation factor for each of the variables is as follows:
C= 602.60
I= 286.40
G= 77.77
Since the VIF for each of the explanatory variables in the model exceeds 10, there is a high
degree of multicollinearity in the model.
Model 2: The variance inflation factor for each of the variables is as follows:
C= 789.20
I= 496.59
G= 110.81
Since the VIF for each of the explanatory variables in the model exceeds 10, there is a high
degree of multicollinearity in the model.
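The variance inflation factor for a regressor is 1 / (1 - R2_j), where R2_j comes from regressing that regressor on all the other explanatory variables. The sketch below illustrates the calculation under stated assumptions: the three regressors are simulated (x2 is built to be almost a copy of x1, x3 is unrelated), and the helper names `solve`, `r_squared`, and `vif` are hypothetical functions written for this example, not Stata's implementation.

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def r_squared(y, regressors):
    """R2 from regressing y on a constant plus the given regressor columns."""
    n = len(y)
    cols = [[1.0] * n] + regressors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bj * col[i] for bj, col in zip(beta, cols)) for i in range(n)]
    y_bar = sum(y) / n
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - residual_ss / total_ss


def vif(regressors, j):
    """VIF of regressor j: 1 / (1 - R2) from the auxiliary regression on the others."""
    others = regressors[:j] + regressors[j + 1:]
    return 1.0 / (1.0 - r_squared(regressors[j], others))


# Simulated regressors (an assumption for illustration, not the exam data).
random.seed(1)
n_obs = 200
x1 = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
x2 = [v + 0.05 * random.gauss(0.0, 1.0) for v in x1]  # nearly collinear with x1
x3 = [random.gauss(0.0, 1.0) for _ in range(n_obs)]   # unrelated to x1 and x2

vifs = [vif([x1, x2, x3], j) for j in range(3)]
print("VIFs for x1, x2, x3:", vifs)
```

In this simulated example x1 and x2 show VIFs far above 10, while x3 stays close to 1.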
e) Is there heteroskedasticity in these models? Report the result of a test that might tell whether
there is any heteroskedasticity. (7 points)
The heteroskedasticity test in our model is a test performed by Stata for which the code was
provided. Its null hypothesis is homoskedasticity (constant error variance). The p-value of the
test in model 1 is 0.9108, which is very high. This means that we cannot reject the hypothesis of
homoskedasticity, so there is no evidence of heteroskedasticity in model 1. The p-value of the
test in model 2 is 0.0, which is very low. This means that we reject the hypothesis of
homoskedasticity, so there is heteroskedasticity in model 2.
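The logic of such a test can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the Stata output: it implements one common variant (the n times R2 form of the Breusch-Pagan test with a single regressor), which is not necessarily the variant Stata computes by default; the data are simulated and the function names are hypothetical. The null hypothesis is constant error variance, so a low p-value is evidence of heteroskedasticity.

```python
import math
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R2 of the simple regression of y on x."""
    a, b = fit_simple_ols(x, y)
    y_bar = sum(y) / len(y)
    residual_ss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - residual_ss / total_ss


def bp_p_value(x, y):
    """p-value of the n*R2 Breusch-Pagan statistic (null: constant error variance)."""
    a, b = fit_simple_ols(x, y)
    squared_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression of squared residuals on x; LM = n * R2.
    lm = max(len(x) * r_squared(x, squared_resid), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(lm / 2.0))


# Simulated data (an assumption for illustration, not the exam data).
random.seed(2)
x = [1.0 + 9.0 * i / 399 for i in range(400)]
y_homo = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]   # constant variance
y_hetero = [1.0 + 2.0 * xi + random.gauss(0.0, xi) for xi in x]  # spread grows with x

p_homo = bp_p_value(x, y_homo)
p_hetero = bp_p_value(x, y_hetero)
print("p-value, constant-variance errors:", p_homo)
print("p-value, errors that grow with x: ", p_hetero)
```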
2) Download the dataset “wage2.dta” from Blackboard. The dataset contains wage and its many
determinants. The relevant variables included in the dataset are as follows:
a) Wage = monthly earnings of a person.
b) Hours= average weekly hours of work.
c) IQ=IQ scores
d) Educ = years of education
e) Exper =years of experience.
f) Age = age in years
g) Sibs = number of siblings
h) Meduc = mother’s years of education
i) Feduc = father’s years of education
Now do the following:
1) Run the following regression:
𝑒𝑑𝑢𝑐𝑡 = 𝛽0 + 𝛽1 𝑠𝑖𝑏𝑠𝑡 + 𝛽2 𝑚𝑒𝑑𝑢𝑐𝑡 + 𝛽3 𝑓𝑒𝑑𝑢𝑐𝑡 +𝑈𝑡
Interpret the signs of 𝛽1 and 𝛽2. Do they have the expected signs? Explain. (10 Points)
Answer:
The regression results are as follows:
𝛽̂1 = −.0936359 , 𝛽̂2 = .1307872 , 𝛽̂3 = .2100041 .
The regression results suggest that the number of siblings has a negative impact on the years of
education attained by a person in the sample, while the mother’s and father’s education levels
have a positive impact. This makes sense because siblings compete with each other for the
financial resources of the parents. As the number of siblings increases, the resources allocated
to a single child go down, which might reduce the number of years of schooling the child can acquire.
2) Run the following regression:
𝑤𝑎𝑔𝑒𝑡 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐𝑡 + 𝛽2 𝑒𝑥𝑝𝑒𝑟𝑡 + 𝛽3 ℎ𝑜𝑢𝑟𝑠𝑡 +𝛽4 𝐼𝑄𝑡 + 𝑈𝑡
Interpret the signs of 𝛽1, 𝛽2 and 𝛽3. Do they have the expected signs? Explain. (10 Points)
Answer:
The regression results are as follows:
𝛽̂1 = 58.5538 , 𝛽̂2 = 17.31687 , 𝛽̂3 = −2.286972 , 𝛽̂4 = 5.109458 .
The regression results suggest that years of education, experience and IQ have a positive impact
on the monthly wage, while the number of hours worked has a negative impact. Since the
former indicate skill and productivity, they should all have a positive impact on wage. On the
other hand, people who work more hours are usually hourly-contracted workers
whose monthly wage is usually low. Therefore, all the signs make sense.
3) Suppose you believe that educ and exper might reflect the same information about a person. You
suspect that there might be multicollinearity between these variables. How do you test for that?
Report the value of that test and interpret. (7.5 Points)
We can use the variance inflation factor to determine the degree of collinearity between the
variables of interest. We see that the VIF for educ is 1.64 and the VIF for exper is about 1.26.
Since both values are less than 10, we can argue that there is no strong multicollinearity
between these two variables.
4) Suppose that, without running a test for multicollinearity, you just drop one of the two variables,
educ or exper. Report the regression results by reporting the R-squared. Did the goodness of fit of
your model improve? Why or why not? By trying to solve the multicollinearity problem, you
might have created another problem. Please explain that. (7.5 Points)
If we drop educ, the regression model now looks like:
𝑤𝑎𝑔𝑒𝑡 = 𝛽0 + 𝛽1 𝑒𝑥𝑝𝑒𝑟𝑡 + 𝛽2 ℎ𝑜𝑢𝑟𝑠𝑡 +𝛽3 𝐼𝑄𝑡 + 𝑈𝑡
In this model the goodness of fit looks like:
R2 = 0.1018
If we drop exper, the regression model now looks like:
𝑤𝑎𝑔𝑒𝑡 = 𝛽0 + 𝛽1 𝑒𝑑𝑢𝑐𝑡 + 𝛽2 ℎ𝑜𝑢𝑟𝑠𝑡 +𝛽3 𝐼𝑄𝑡 + 𝑈𝑡
In this model the goodness of fit looks like:
R2 = 0.1358
In both cases, we see that the value of the R-squared goes down compared to its value in the
original model, which was 0.1636. This indicates that by dropping either of the
above-mentioned variables, we lose some goodness of fit. That means that we mistakenly
omitted a variable which was relevant for explaining the variation of our dependent variable. This
kind of error is called omitted variable bias.
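Both effects, the loss of fit and the bias in the remaining coefficient, can be seen in a simulation. The sketch below is a hedged illustration: the data are simulated with assumed true coefficients (it does not use wage2.dta), and `solve` and `ols` are hypothetical helpers written for this example. The regressors x1 and x2 are correlated, and y depends on both; dropping x2 lowers R2 and pushes the coefficient on x1 away from its true value of 2.

```python
import random


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef


def ols(y, regressors):
    """Regress y on a constant plus the regressor columns; return (coefficients, R2)."""
    n = len(y)
    cols = [[1.0] * n] + regressors
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bj * col[i] for bj, col in zip(beta, cols)) for i in range(n)]
    y_bar = sum(y) / n
    residual_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    total_ss = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - residual_ss / total_ss


# Simulated data with assumed true model y = 1 + 2*x1 + 3*x2 + u.
random.seed(3)
n_obs = 2000
x1 = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
x2 = [0.5 * v + random.gauss(0.0, 1.0) for v in x1]  # correlated with x1
y = [1.0 + 2.0 * a + 3.0 * b + random.gauss(0.0, 1.0) for a, b in zip(x1, x2)]

beta_full, r2_full = ols(y, [x1, x2])
beta_short, r2_short = ols(y, [x1])  # x2 wrongly omitted

print("full model:  coefficient on x1 =", beta_full[1], " R2 =", r2_full)
print("x2 omitted:  coefficient on x1 =", beta_short[1], " R2 =", r2_short)
```

Because x2 is correlated with x1, the short regression attributes part of the effect of x2 to x1, so its coefficient on x1 is biased upward in this setup.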
Codes required for the test
1) regress: regresses the dependent variable on the set of independent variables listed after it.
2) summarize: summarizes the variables.
3) estat hettest: tests for the presence of heteroskedasticity. The null hypothesis is constant
variance (homoskedasticity). If the prob value is low, you reject the assumption of
homoskedasticity, so your model has heteroskedasticity.
4) vif: calculates the variance inflation factor.