1.) Which of the following is not an assumption of the CLRM?
a.) The model is correctly specified
b.) The independent variables are exogenous
c.) The errors are normally distributed
d.) The errors have mean zero
e.) The errors have constant variance
2.) For a model to be correctly specified under the CLRM, it must:
a.) be linear in the coefficients
b.) include all relevant independent variables and their associated transformations
c.) have an additive error term
d.) all of the above
e.) none of the above
3.) In order for our independent variables to be labelled “exogenous” which of the following must be true:
a.) E(εi) = 0
b.) Cov(Xi,εi) = 0
c.) Cov(εi,εj) = 0
d.) Var(εi) = σ²
e.) none of the above
4.) The OLS estimator is said to be BLUE when:
a.) Assumptions 1 through 2 are satisfied
b.) Assumptions 1 through 3 are satisfied
c.) Assumptions 1 through 4 are satisfied
d.) Assumptions 1 through 5 are satisfied
e.) Assumptions 1 through 6 are satisfied
5.) The Gauss-Markov Theorem says that when the 6 classical assumptions are satisfied:
a.) The least squares estimator is unbiased
b.) The least squares estimator has the smallest variance of all linear unbiased estimators
c.) The least squares estimator has an approximately normal sampling distribution
d.) The least squares estimator is consistent
e.) None of the above
6.) The central limit theorem tells us that the sampling distribution of least squares regression coefficients:
a.) is always normal
b.) is always normal in large samples
c.) approaches normality as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above
7.) The sampling variance of the slope coefficient in the regression model with one independent variable:
a.) will be smaller when there is more variation in ε
b.) will be larger when there is less variation in ε
c.) will be smaller when there is more variation in X
d.) will be larger when there is more co-variation in ε and X
e.) none of the above
8.) Suppose you want to test the following hypothesis at the 5% level of significance:
H0: β1 = 0
H1: β1 ≠ 0
Which of the following statements is/are true?
a.) the probability of a Type I error is 0.025
b.) the probability of a Type I error is 0.05
c.) the probability of a Type II error is 0.95
d.) the probability of a Type II error is 0.05
e.) none of the above
9.) The F test of overall significance
a.) is based on a test statistic that has an F distribution with k and n-k-1 degrees of freedom
b.) is based on a test statistic that has an F distribution with n-k-1 and k degrees of freedom
c.) helps to detect whether relevant variables have been omitted from the model
d.) a and c
e.) b and c
10.) Given the equation for the F statistic, we can say that it is
a.) decreasing in R², decreasing in n, and decreasing in k
b.) increasing in R², increasing in n, and increasing in k
c.) decreasing in R², increasing in n, and decreasing in k
d.) increasing in R², increasing in n, and decreasing in k
e.) none of the above
11.) The F test of overall significance is valid if:
a.) Assumptions 1 through 6 of the CLRM are true
b.) the Central Limit Theorem can be invoked
c.) the Gauss-Markov Theorem is true
d.) the errors of the regression are normally distributed
e.) none of the above
12.) From a gravity model of trade, you estimate that Pr[−0.9828 ≤ βdistance ≤ −0.7982] = 95%. This
allows you to state that:
a.) there is a 95% chance that all potential estimates of the coefficient on distance are in this range
b.) you can reject the null hypothesis that the true coefficient on distance is equal to zero at
the 5% level of significance
c.) there is a 5% chance that some of the potential estimates of the coefficient on distance fall
outside of this range
d.) all of the above
e.) none of the above
13.) Omitting a relevant explanatory variable that is correlated with the other independent variables
causes:
a.) no bias and no change in variance
b.) no bias and an increase in variance
c.) no bias and a decrease in variance
d.) bias
e.) none of the above
14.) Adding an irrelevant explanatory variable that is uncorrelated with the other independent variables
causes:
a.) bias and no change in variance
b.) bias and an increase in variance
c.) no bias and no change in variance
d.) no bias and an increase in variance
e.) none of the above
15.) The RESET test is designed to detect problems associated with:
a.) specification error of a known form
b.) heteroskedasticity
c.) multicollinearity
d.) serial correlation
e.) none of the above
16.) Omitting a constant term from our regression will likely lead to:
a.) a lower R2, a lower F statistic, and biased estimates of the independent variables when β0 = 0
b.) a higher R2, a lower F statistic, and biased estimates of the independent variables when β0 = 0
c.) a higher R2, a lower F statistic, and biased estimates of the independent variables when β0 = 0
d.) a higher R2, a higher F statistic, and biased estimates of the independent variables when β0 = 0
e.) none of the above
17.) In the regression model ln Yi = β0 + β1 ln Xi + εi,
a.) β1 measures the elasticity of Y with respect to X
b.) β1 measures the elasticity of X with respect to Y
c.) β1 measures the percentage change in Y for a one unit change in X
d.) the marginal effect of X on Y is constant
e.) none of the above
18.) Suppose we estimate the linear regression Yˆi  10  2 X i  5Fi where Yi is person i’s hourly wage, Xi
is years of employment experience, and Fi equals one for women and zero for men. If we redefined the
variable Di to equal one for men and zero for women instead, the least squares estimates would be:
a.) Yˆi  5  2 X i  5Di
b.) Yˆi  5  2 X i  5Di
c.) Yˆi  10  2 X i  5Di
d.) Yˆi  10  2 X i  5Di
e.) none of the above
19.) The consequences of multicollinearity are that the OLS estimates:
a.) will be biased while the standard errors will remain unaffected
b.) will be biased while the standard errors will be smaller
c.) will be unbiased while the standard errors will remain unaffected
d.) will be unbiased while the standard errors will be smaller
e.) none of the above
20.) Which of the following pairs of independent variables would violate Assumption VI?
That is, which pairs of variables are perfect linear functions of each other?
a.) Right shoe size and left shoe size of students enrolled in BUEC 333
b.) Consumption and disposable income in Canada over the last 30 years
c.) Xi and 2Xi
d.) Xi and Xi²
e.) none of the above
21.) Which of the following is not an appropriate remedy for serial correlation?
a) Generalized least squares
b) Ordinary least squares and Newey-West standard errors
c) Drop a redundant variable
d) Add an omitted variable
e) None of the above
22.) The Durbin-Watson test:
a) is pretty useless
b) tests for positive serial correlation
c) tests for first-order autocorrelation
d) tests for multicollinearity
e) none of the above
23.) Impure serial correlation:
a) is the same as pure serial correlation
b) can be detected with residual plots
c) is caused by mis-specification of the regression model
d) b and c
e) none of the above
24.) If regression errors are serially correlated, then:
a) least squares coefficient estimates are biased
b) GLS coefficient estimates are biased
c) least squares standard errors are wrong, but we can get consistent estimates of them
using the Newey-West formula
d) the BLUE is weighted least squares
e) none of the above
1.) Economists have long tried to understand the determinants of stock market performance. One
possibility is that the stock market is affected by the weather on Wall Street. Using daily data for 28
years, one researcher estimated an equation with the following variables (standard errors in parentheses):
D̂Jt = β̂0 + 0.10DJt−1 + 0.0010Jt − 0.017Mt + 0.0005Ct
            (0.01)       (0.0006)   (0.004)   (0.0002)
N = 6,911
R² = 0.02
Where: DJt = the percentage change in the Dow Jones Industrial Average on day t
Jt = a dummy variable equal to 1 if the t-th day was in January and 0 if otherwise
Mt = a dummy variable equal to 1 if the t-th day was a Monday and 0 if otherwise
Ct = a variable equal to 1 if the cloud cover from sunrise to sunset on the t-th day was 20% or
less, equal to −1 if the cloud cover was 100%, and 0 if otherwise
a) The researcher did not include an estimate of the constant term in the published regression
results. Which of the Classical Assumptions supports the conclusion that you should
not spend too much time analysing estimates of the constant term? Explain.
Assumption II. This effectively tells us that an estimate of the constant typically includes not
only the true parameter value, but also the sample mean of the residuals (if this is not equal to
zero).
b) Which of the Classical Assumptions would be violated if you decided to add a dummy
variable to the equation that was equal to 1 if the t-th day was a Tuesday, Wednesday,
Thursday, or Friday and equal to 0 if otherwise?
Assumption VI as such a dummy variable would be perfectly correlated with M as stock markets
are closed on Saturdays and Sundays.
c) Carefully state the meaning of the coefficients on DJt-1 and Mt.
DJt-1: A one-unit (in other words, a one-percentage-point) increase in yesterday’s percentage
change in the Dow is associated with a 0.10-percentage-point increase in today’s percentage
change, holding constant the other independent variables in the equation.
M: The percentage change in the Dow is 0.017 percentage points lower on Mondays than on
other days, holding constant the other independent variables in the equation.
d) The variable C reflects the fact that approximately 85% of all New York City’s rain falls
on days with 100% cloud cover. In this case, is C a dummy variable? What
assumptions did the author have to make to use this variable?
Technically, C is not a dummy variable because it can take on three different values (and not
two, as a true dummy variable does). The researcher assumed (at least implicitly) that all levels
of cloud cover between 0% and 20% have the same impact on the Dow and also that all levels
of cloud cover between 21% and 99% have the same impact on the Dow. In addition, by using
the same variable to represent both sunny and cloudy days, he constrained the effects of
sun and cloud to be equal in size and opposite in sign.
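The coding of C described above can be sketched in a few lines. This is only an illustration: the function name `cloud_variable` is made up here, and the value 0.0005 is used simply as the size of the reported coefficient on C.

```python
def cloud_variable(cloud_cover_pct: float) -> int:
    """Code cloud cover (in percent, 0 to 100) the way C is defined in the question."""
    if not 0 <= cloud_cover_pct <= 100:
        raise ValueError("cloud cover must be between 0 and 100 percent")
    if cloud_cover_pct <= 20:
        return 1       # every level from 0% to 20% is treated alike
    if cloud_cover_pct == 100:
        return -1      # fully overcast
    return 0           # everything strictly between 20% and 100% is lumped together


# With a single coefficient b on C, the implied effects can only be +b, 0 or -b.
b = 0.0005  # size of the reported coefficient on C
effects = {pct: b * cloud_variable(pct) for pct in (0, 20, 50, 99, 100)}
```

Because one coefficient multiplies both the +1 and the −1 codes, the sunny-day and overcast-day effects are forced to be mirror images of each other.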
e) The researcher concludes that these findings cast doubt on what is known as the “efficient
markets hypothesis”. Do some quick research on this topic (hello, Wikipedia). Based
on the results above, would you agree? Why or why not?
This particular equation does little to support the researcher’s conclusion. The poor fit and
the constrained specification combine to outweigh the significant coefficients of DJt−1 and M.
Although the efficient markets hypothesis does not have a lot going for it in other instances,
this regression alone is not evidence to reject it since it is so poorly specified.
2.) Prescription drug prices vastly differ across countries, leading to charges of international price
discrimination on the part of drug companies. Researchers have estimated a model of prices of
pharmaceuticals in a cross section of 32 countries. Their estimates are given below (standard errors in
parentheses).
P̂i = 38.22 + 1.43GDPNi − 0.60CVNi + 7.31PPi − 15.63DPCi − 11.38IPCi
             (0.21)       (0.32)      (6.52)    (3.96)       (7.16)
N = 32
R² = 0.78
Where: Pi = the level of pharmaceutical prices in the i-th country divided by that of the US
GDPNi = per capita domestic product in the i-th country divided by that of the US
CVNi = per capita consumption of pharmaceuticals divided by that of the US
PPi = a dummy variable equal to 1 if patents for pharmaceutical products are recognized in
the i-th country and 0 if otherwise
DPCi = a dummy variable equal to 1 if the i-th country applied strict price controls and 0 if
otherwise
IPCi = a dummy variable equal to 1 if the i-th country encouraged price competition and 0 if otherwise
a) Explain what your expectations are for the sign of each of the independent variables.
If price discrimination exists, then the coefficient of per capita income in a properly specified
price equation should be positive: the higher the ability to pay for drugs, the lower (in
absolute value) the price elasticity of demand for pharmaceuticals and the higher the price a
price discriminator could conceivably charge. In addition, prices will be higher if
pharmaceutical patents are allowed (as this discourages competition) and prices will be lower if
price controls exist, if competition is encouraged, or if the pharmaceutical market in a country is
relatively large (as measured by CVN).
b) Formulate and test appropriate hypotheses concerning the regression coefficients using the
t-test at the 5% level of significance.
All five tests are one-sided with 26 degrees of freedom, so tc = 1.7056 throughout.
GDPN: H0: β ≤ 0, HA: β > 0. Reject H0 because |+6.81| > 1.7056 and +6.81 is positive.
CVN: H0: β ≥ 0, HA: β < 0. Reject H0 because |−1.88| > 1.7056 and −1.88 is negative.
PP: H0: β ≤ 0, HA: β > 0. Do not reject H0 because |+1.12| < 1.7056.
DPC: H0: β ≥ 0, HA: β < 0. Reject H0 because |−3.95| > 1.7056 and −3.95 is negative.
IPC: H0: β ≥ 0, HA: β < 0. Do not reject H0 because |−1.59| < 1.7056.
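The t-scores used in these tests are just the reported coefficients divided by their standard errors. A minimal sketch of the calculation follows; the critical value 1.7056 is hard-coded from the t table rather than computed, and the helper name `one_sided_test` is made up for illustration.

```python
# Reported coefficient estimates and standard errors from the price equation.
estimates = {
    "GDPN": (1.43, 0.21),
    "CVN": (-0.60, 0.32),
    "PP": (7.31, 6.52),
    "DPC": (-15.63, 3.96),
    "IPC": (-11.38, 7.16),
}
# Sign expected under the alternative hypothesis for each coefficient.
expected_sign = {"GDPN": +1, "CVN": -1, "PP": +1, "DPC": -1, "IPC": -1}
T_CRITICAL = 1.7056  # one-sided 5% critical value with 26 d.f. (from a t table)


def one_sided_test(coef, se, sign, t_crit=T_CRITICAL):
    """Return the t-score and whether H0 is rejected in the expected direction."""
    t_stat = coef / se
    reject = abs(t_stat) > t_crit and t_stat * sign > 0
    return t_stat, reject


results = {
    name: one_sided_test(coef, se, expected_sign[name])
    for name, (coef, se) in estimates.items()
}
for name, (t_stat, reject) in results.items():
    print(f"{name}: t = {t_stat:+.2f}, reject H0: {reject}")
```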
c) Set up and interpret 90% confidence intervals for the coefficients on GDPN and PP.
Our confidence interval equation is β̂ ± tc*SE(β̂), and the 10% two-sided critical value with
26 degrees of freedom is 1.7056 (the same as a one-sided 5% tc), so the confidence interval
equals β̂ ± 1.7056*SE(β̂), or:
GDPN: β̂ ± 1.7056*SE(β̂) = 1.43 ± 1.7056*0.21 ⇒ 1.07 ≤ β ≤ 1.79
CVN: β̂ ± 1.7056*SE(β̂) = −0.60 ± 1.7056*0.32 ⇒ −1.15 ≤ β ≤ −0.05
PP: β̂ ± 1.7056*SE(β̂) = 7.31 ± 1.7056*6.52 ⇒ −3.81 ≤ β ≤ 18.43
DPC: β̂ ± 1.7056*SE(β̂) = −15.63 ± 1.7056*3.96 ⇒ −22.38 ≤ β ≤ −8.88
IPC: β̂ ± 1.7056*SE(β̂) = −11.38 ± 1.7056*7.16 ⇒ −23.59 ≤ β ≤ 0.83
In general, a confidence interval is an interval estimator. That is, it is an interval defined by
our critical value and standard errors. A set of similarly constructed confidence intervals will
contain the true beta with a known probability (namely 90%).
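The interval arithmetic can be reproduced with a short script. This is a sketch under the same inputs as the answer: the critical value 1.7056 is taken from the t table, and `confidence_interval` is a made-up helper name.

```python
T_CRITICAL_90 = 1.7056  # two-sided 10% critical value with 26 d.f. (from a t table)

# Reported coefficient estimates and standard errors.
estimates = {
    "GDPN": (1.43, 0.21),
    "CVN": (-0.60, 0.32),
    "PP": (7.31, 6.52),
    "DPC": (-15.63, 3.96),
    "IPC": (-11.38, 7.16),
}


def confidence_interval(coef, se, t_crit=T_CRITICAL_90):
    """Return (lower, upper) for coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


intervals = {name: confidence_interval(c, s) for name, (c, s) in estimates.items()}
contains_zero = {name: lo <= 0 <= hi for name, (lo, hi) in intervals.items()}

for name, (lo, hi) in intervals.items():
    print(f"{name}: {lo:.2f} <= beta <= {hi:.2f}")
```

The intervals that contain zero (PP and IPC) are the same coefficients for which the one-sided 5% tests fail to reject, since both use the same critical value.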
d) Do you think the researchers should conclude that international price discrimination exists?
Why or why not?
Yes, this is a reasonable conclusion. The signs of the most relevant variables were as
expected and the coefficients of the most relevant variables were statistically significant. Finally,
the overall fit of the regression was good.
e) How would the estimated results have differed if the authors had not divided each
country’s prices, per capita income, and per capita consumption by that of the US?
Explain your answer.
The values of the coefficients would change, but not their signs or significance. Since we are
normalizing by the United States (that is, making all the variables factors of the levels of things
like consumption observed in the US), we are simply dividing by a constant, which should affect
the calculation of the beta-hats, but nothing else.
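A small numerical illustration of this point, using a one-regressor model and made-up data (the numbers and the normalizing constants below are hypothetical and have nothing to do with the pharmaceutical data set): dividing Y and X by constants rescales the slope and its standard error by the same factor, so the t-score is unchanged.

```python
import math


def simple_ols(x, y):
    """Return (slope, standard error of the slope) for y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return slope, se_slope


# Hypothetical data, made up purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
X_US, Y_US = 4.0, 8.0  # hypothetical "US levels" used as normalizing constants

b_raw, se_raw = simple_ols(x, y)
b_norm, se_norm = simple_ols([xi / X_US for xi in x], [yi / Y_US for yi in y])

t_raw = b_raw / se_raw
t_norm = b_norm / se_norm
print(f"raw slope {b_raw:.4f}, normalized slope {b_norm:.4f}")
print(f"raw t {t_raw:.4f}, normalized t {t_norm:.4f}")
```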
3.) There are at least two different possible approaches to the problem of building a model of the costs of
production of electric power. Model I hypothesizes that per-unit costs (C) as a function of the number of
kilowatt-hours produced (Q) continually and smoothly falls as production is increased, but it falls at a
decreasing rate. Model II hypothesizes that per-unit costs (C) decrease fairly steadily as production (Q)
increases across plant type, but costs start at a higher level for hydroelectric plants than for other kinds of
facilities.
a) Referring to Lecture 11b, what functional form would you recommend for estimating
Model I? Be sure to write out a specific equation.
A number of forms are possible, but a log-log form would perhaps be the most appropriate:
ln(Ct) = β0 + β1*ln(Qt) + εt
Whatever functional form is chosen, it has to satisfy the conditions that the first and second
derivatives of C with respect to Q are negative and positive, respectively. In the case of the
equation above, with β1 < 0,
∂C/∂Q = (∂C/∂ln C)*(∂ln C/∂ln Q)*(∂ln Q/∂Q) = C*β1*(1/Q) = β1*C/Q < 0
∂²C/∂Q² = β1*(β1 − 1)*C/Q² > 0
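As a sanity check on the shape implied by the log-log form, the sketch below evaluates C = exp(β0)*Q^β1, for which ∂C/∂Q = β1*C/Q and ∂²C/∂Q² = β1*(β1 − 1)*C/Q², and compares the analytic derivatives with finite differences. The parameter values are hypothetical; any β1 < 0 gives the same signs.

```python
import math

BETA0, BETA1 = 2.0, -0.5  # hypothetical values; the only requirement is BETA1 < 0


def cost(q):
    """Per-unit cost implied by ln C = BETA0 + BETA1 * ln Q (error term set to zero)."""
    return math.exp(BETA0) * q ** BETA1


def first_derivative(q):
    return BETA1 * cost(q) / q


def second_derivative(q):
    return BETA1 * (BETA1 - 1) * cost(q) / q ** 2


def numeric_derivative(f, q, h=1e-5):
    """Central finite difference approximation to f'(q)."""
    return (f(q + h) - f(q - h)) / (2 * h)


for q in (1.0, 5.0, 20.0):
    print(
        f"Q = {q:5.1f}: dC/dQ = {first_derivative(q):.5f}, "
        f"d2C/dQ2 = {second_derivative(q):.5f}"
    )
```

Costs fall as Q rises (negative first derivative) but at a decreasing rate (positive second derivative), which is what Model I requires.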
b) Referring to Lecture 11b, what functional form would you recommend for estimating Model
II? Be sure to write out a specific equation.
A number of forms are possible, but a linear form with a dummy variable (Dt) capturing a
different intercept term for hydroelectric plants would perhaps be the most appropriate:
Ct = β0 + β1Qt + β2Dt + εt
Whatever functional form is chosen, it has to satisfy the conditions that the first derivative
of C with respect to Q is constant and negative (β1 < 0) and that β2 > 0.
c) Would R² be a reasonable way to compare the overall fits of the two equations? Why or
why not?
Answers may vary depending on the functional form indicated. In the example above, R² is
not appropriate for comparing the overall fits of the two equations because the dependent
variable differs across them (ln C versus C) and, thus, so does the TSS.
4.) The Director of Admissions has contracted you to review the annual numbers for the number of
applications that SFU received from high school seniors. Evaluate the following regression results
(standard errors in parentheses):
N̂t = 15000 + 18000At + 150 ln Tt + 3000Pt
             (9000)     (150)       (6000)
N = 22
R² = 0.50
where: Nt = the number of high school seniors who apply for admission in year t
At = the number of people on the admission staff who visit high schools full time
spreading information about the school in year t
Tt = dollars of tuition in year t
Pt = the percent of the faculty that had PhDs in year t
a) Discuss the expected signs of the coefficients.
More promotion activities should result in a higher number of applications; higher tuition
should dissuade people from applying to SFU; and more qualified faculty should lead to a
higher number of applications.
b) Compare these expectations with the estimated coefficients using the appropriate tests,
assuming a 5% level of significance.
tc = 1.734 (18 d.f.; one-sided at 5% level of significance)
Coefficient:              βA            βT                               βP
Hypothesized sign:        +             −                                +
Calculated t-statistic:   2.00          1.00                             0.50
Significance:             Significant   Insignificant; unexpected sign   Insignificant
c) Evaluate the possible econometric problems that could have caused any observed
differences between the estimated coefficients and what you expected.
There is a potential problem related to omitted variables. It is puzzling why higher tuition
would lead to higher applications. Given that this is time series data, there are probably long
run trends in the economy driving both variables. Furthermore, the percent of the faculty with
PhDs may be an irrelevant variable, given its low t-value and the fact that most applicants
probably do not care that much.
d) Determine whether the semilog function for T makes theoretical sense.
The semilog function for T might be problematic. This β measures the change in the numbers
of applications for a given percentage change in tuition costs. Potential students are
probably more responsive to the actual levels of tuition and fees (they only have so much
money to spend on school) rather than the relative changes in tuition and fees from year to
year.
e) How would you improve this specification?
The Percent PhDs variable is intended to proxy for the quality of the school. If we could find
a better proxy for quality, then it would make sense to substitute it for Pt. If the unexpected
sign for ln T is being caused by an omitted variable, that omitted variable must be causing positive
bias. Thus, any suggested omitted variable must be either positively correlated with tuition and
have a positive expected coefficient (like the number of high school seniors in the country or in
the region) or else have a negative correlation with tuition and have a negative expected
coefficient (like the age and quality of a school’s facilities).
5.) Given your good work on the contract above, the undergraduate dean now hires you to figure out
how to reduce damage done to SFU dorms by rowdy students. Your first step is to build a model of last
term’s damage to each dorm as a function of the attributes of that dorm (standard errors in parentheses):
D̂i = 210 + 733Fi − 0.805Si + 74.0Ai
           (253)   (0.752)    (12.4)
N = 33
R² = 0.84
where: Di = the amount of damage in dollars done to the i-th dorm last term
Fi = the percentage of the i-th dorm residents who are freshmen
Si = the number of students who live in the i-th dorm
Ai = the number of incidents involving alcohol that were reported to Ancillary Services from the
i-th dorm last term (where such incidents involving alcohol may or may not involve damage
to the dorm)
a) Hypothesize signs, calculate t-scores, and test hypotheses for these results at the 5% level.
tc = 1.699 (29 d.f.; one-sided at 5% level of significance)
Coefficient:              βF            βS                               βA
Hypothesized sign:        +             +                                +
Calculated t-statistic:   2.90          −1.07                            5.97
Significance:             Significant   Insignificant; unexpected sign   Significant
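The t-scores follow directly from the reported coefficients and standard errors, so they are easy to recompute. The sketch below does so under the stated one-sided test with all three hypothesized signs positive; the critical value 1.699 is taken from the t table.

```python
# Reported coefficients and standard errors from the dorm damage equation.
estimates = {"F": (733.0, 253.0), "S": (-0.805, 0.752), "A": (74.0, 12.4)}
T_CRITICAL = 1.699  # one-sided 5% critical value with 29 d.f. (from a t table)

t_scores = {name: coef / se for name, (coef, se) in estimates.items()}
# All three hypothesized signs are positive, so reject H0 only for large positive t.
significant = {name: t > T_CRITICAL for name, t in t_scores.items()}

for name, t in t_scores.items():
    print(f"beta_{name}: t = {t:+.2f}, significant: {significant[name]}")
```

Note that 74.0 / 12.4 is roughly 5.97, so the coefficient on A is significant in the expected direction.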
b) What problems appear to exist in this equation? Omitted variables, irrelevant variables, or
multicollinearity? Why?
All three are possibilities; any reasonable answer is acceptable.
c) Suppose you were now told that the simple correlation coefficient between S and A was
0.94. Would that change your answer? How?
Multicollinearity is a stronger possibility/concern now.
d) Is it possible that the unexpected signs in your regression could have been caused by
multicollinearity? Why?
Yes; even though beta-hat remains unbiased in the presence of multicollinearity, the
distribution of beta-hats is wider with multicollinearity.
6.) In your role as a market analyst, you decide to investigate the relationship between stock market
returns in Canada and oil prices. You estimate the following regression on annual data for the period 1965
to 2005 (standard errors in parentheses):
Yˆt  7.05  0.11Pt  3.2 Rt  1.5Gt
(1.25) (0.02) (0.1) (0.3)
R² = 0.55, n = 40, F = 14.7, DW = 0.9
where: Yt is the percentage return of the Toronto Stock Exchange in year t,
Pt is the average price of oil (in Dollars per barrel) in year t,
Rt is the average Canadian interest rate (in percent) in year t,
Gt is the percentage change in Canadian GDP in year t.
a) Test the null hypothesis that, all else equal, oil prices have no effect on stock market
returns, at the 5% level of significance.
The null of oil prices having no effect on stock market returns is equivalent to βP = 0. The
t-statistic in this instance is equal to 0.11 / 0.02 = 5.50, a very large value. Consulting one of our
trusty tables, we find that the critical value for a t-statistic with 36 (= n – k – 1 = 40 – 3 – 1)
degrees of freedom and a two-sided test at the 5% level of significance is 2.028. Thus, we can
easily reject the null hypothesis that oil prices have no effect on stock market returns.
b) What is the marginal effect of GDP growth on stock market returns?
This is equal to 1.5 and is interpreted as follows: holding all other variables constant, if we were
to observe one percentage point increase in Canadian GDP for a given year, we would expect
returns on the TSX to be 1.5 percentage points higher.
c) What proportion of variation in stock market returns growth is explained by variation in oil
prices, the interest rate, and GDP growth over this period?
This is 0.55 and comes from the very definition of R-squared.
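One way to check that the reported summary statistics hang together is the standard link between the overall F statistic and R-squared, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below plugs in the reported values; the function name `overall_f` is made up for illustration.

```python
def overall_f(r_squared, n, k):
    """Overall-significance F statistic implied by R-squared, n observations, k regressors."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


# Reported values: R-squared = 0.55, n = 40, k = 3 independent variables.
f_stat = overall_f(0.55, 40, 3)
print(f"Implied F statistic: {f_stat:.2f}")
```

The implied value is in line with the reported F = 14.7 when n = 40 and k = 3.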
d) Does anything appear to be wrong with your regression specification? Do you think any
assumptions of the CLRM are violated? If yes, what do you think the cause is? If not,
explain why not. Does this affect your answer to part a) in any way?
The key insight here is that when the value of the Durbin-Watson statistic is far from 2.0,
there may be problems with serial correlation, and the reported value of 0.9 is very far from 2.0.
Consulting a table of DW critical values, you will find that the lower-bound critical value
for 40 observations and three independent variables is 1.34 at the 5% level of significance and
1.15 at the 1% level of significance. In either case, we must reject the null hypothesis that ρ ≤ 0
(that is, the null of no positive serial correlation). In this case, OLS is most likely
underestimating the size of the standard errors and, thus,
inflating the value of our t-statistics. It remains possible that upon correcting the standard errors
for serial correlation we may fail to reject the null that oil prices have no effect on stock market
returns.
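For reference, the Durbin-Watson statistic is the sum of squared changes in the residuals divided by the sum of squared residuals, and it is approximately 2(1 − ρ̂), where ρ̂ is the estimated first-order autocorrelation of the residuals. The sketch below applies that approximation to the reported DW = 0.9; the residual series are made up only to show how the statistic behaves, and the function names are illustrative.

```python
def durbin_watson(residuals):
    """DW = sum of squared first differences of residuals / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def implied_rho(dw):
    """First-order autocorrelation implied by the approximation DW = 2 * (1 - rho)."""
    return 1 - dw / 2


D_LOWER_5PCT = 1.34  # lower-bound critical value quoted above (n = 40, k = 3)
reported_dw = 0.9

rho_hat = implied_rho(reported_dw)
positive_serial_correlation = reported_dw < D_LOWER_5PCT

# Hypothetical residuals: a slowly drifting series versus a sign-flipping one.
smooth = [1.0, 0.8, 0.9, 0.5, 0.2, -0.1, -0.4, -0.6, -0.5, -0.8]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(f"implied rho: {rho_hat:.2f}")
print(f"DW (smooth): {durbin_watson(smooth):.2f}")
print(f"DW (alternating): {durbin_watson(alternating):.2f}")
```

Residuals that drift slowly give a DW well below 2 (positive serial correlation), while residuals that flip sign each period give a DW well above 2.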