Given name:____________________ Family name:___________________

Student #:______________________
Section #:______________________
BUEC 333 FINAL
Multiple Choice (2 points each)
1.) Suppose that in the simple linear regression model Yi = β0 + β1Xi + εi on 100 observations, you
calculate that R2= 0.5, the sample covariance of X and Y is 10, and the sample variance of X is 15. Then
the least squares estimator of β1 is:
a) not calculable using the information given
b) 1/3
c) 1 / 3
d) 2/3
e) none of the above
2.) The Durbin-Watson test is only valid:
a) with models that include an intercept
b) with models that include a lagged dependent variable
c) with models displaying multiple orders of autocorrelation
d) all of the above
e) none of the above
3.) Suppose you have a random sample of 10 observations from a normal distribution with mean = 10 and
variance = 2. The sample mean (x-bar) is 8 and the sample variance is 3. The sampling distribution of x-bar has
a.) mean 8 and variance 3
b.) mean 8 and variance 0.3
c.) mean 10 and variance 0.2
d.) mean 10 and variance 2
e.) none of the above
4.) From a gravity model of trade, you estimate that Pr[−0.9828 ≤ β_distance ≤ −0.7982] = 95%. This
allows you to state that:
a.) there is a 95% chance that all potential estimates of the coefficient on distance are in this range
b.) you can reject the null hypothesis that the true coefficient on distance is equal to zero at
the 5% level of significance.
c.) there is a 5% chance that some of the potential estimates of the coefficient on distance fall
outside of this range
d.) all of the above
e.) none of the above
5.) Suppose you compute a sample statistic q to estimate a population quantity Q. Which of the following
is/are false?
[1] the variance of Q is zero
[2] if q is an unbiased estimator of Q, then q = Q
[3] if q is an unbiased estimator of Q, then q is the mean of the sampling distribution of Q
[4] a 95% confidence interval for q contains Q with 95% probability
a.) 2 only
b.) 3 only
c.) 2 and 3
d.) 2, 3, and 4
e.) 1, 2, 3, and 4
6.) In order for our independent variables to be labelled “exogenous” which of the following must be true:
a.) E(εi) = 0
b.) Cov(Xi,εi) = 0
c.) Cov(εi,εj) = 0
d.) Var(εi) = σ2
e.) none of the above
7.) If (correlated) omitted independent variables are serially correlated, then:
a) least squares coefficient estimates are biased
b) GLS coefficient estimates are biased
c) least squares standard errors are wrong
d) ordinary least squares is not BLUE
e) all of the above
8.) We saw the claim that the value of X1 is an unbiased estimator of the population mean because E(X1) = μ.
Now consider the estimator (X1 + X2)*2. Is this another unbiased estimator of the population mean?
a.) answer depends on the underlying distribution of X
b.) this is a biased estimator of the population mean
c.) this is an unbiased estimator of the population mean
d.) there is insufficient information to answer this question
e.) none of the above
9.) To be useful for hypothesis testing, a test statistic must:
a.) be computable using sample data
b.) have a known sampling distribution when the null hypothesis is true
c.) have a known sampling distribution when the null hypothesis is false
d.) a and b only
e.) none of the above
10.) Adding an irrelevant explanatory variable that is uncorrelated with the other independent variables
causes:
a.) bias and no change in variance
b.) bias and an increase in variance
c.) no bias and no change in variance
d.) no bias and an increase in variance
e.) none of the above
11.) A newspaper reports a poll estimating the proportion u of the adult population in favour of legalizing
marijuana as 65%, but qualifies this result by saying that “this result is accurate within plus or minus 3
percentage points (19 times out of twenty).” What does this mean?
a.) the probability is 95% that u lies between 62% and 68%
b.) the probability is 95% that u is equal to 65%
c.) 95% of estimates calculated from samples of this size will lie between 62% and 68%
d.) not enough information
e.) none of the above
12.) Omitting a relevant explanatory variable that is correlated with the other independent variables
causes:
a.) no bias and no change in variance
b.) no bias and an increase in variance
c.) no bias and a decrease in variance
d.) bias
e.) none of the above
13.) The OLS estimator of the variance of the slope coefficient in the regression model with one
independent variable:
a.) will be smaller when there is less variation in ei
b.) will be smaller when there are fewer observations
c.) will be smaller when there is less variation in X
d.) will be smaller when there are more independent variables
e.) none of the above
14.) The central limit theorem tells us that the sampling distribution of the sample mean:
a.) is always normal
b.) is always normal in large samples
c.) approaches normality as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above
15.) Suppose you compute a sample statistic q to estimate a population quantity Q. Which of the
following is/are true?
[1] the variance of Q is zero
[2] if q is an unbiased estimator of Q, then q = Q
[3] if q is an unbiased estimator of Q, then q is the mean of the sampling distribution of Q
[4] a 95% confidence interval for q contains Q with 95% probability
a.) 1 only
b.) 2 only
c.) 2 and 3
d.) 2, 3, and 4
e.) 1, 2, 3, and 4
16.) If the covariance between two random variables X and Y is zero then
a.) X and Y are independent
b.) Knowing the value of X provides no information about the value of Y
c.) E(X) = E(Y) = 0
d.) a and b are true
e.) none of the above
17.) Given the equation for the F statistic, we can say that it is
a.) decreasing in R2, decreasing in n, and decreasing in k
b.) increasing in R2, increasing in n, and increasing in k
c.) decreasing in R2, increasing in n, and decreasing in k
d.) increasing in R2, increasing in n, and decreasing in k
e.) none of the above
18.) In the Capital Asset Pricing Model (CAPM),
a.) β measures the sensitivity of the expected return of a portfolio to systematic risk
b.) β measures the sensitivity of the expected return of a portfolio to specific risk
c.) β is greater than one
d.) α is less than zero
e.) R2 is meaningless
19.) If a random variable X has a normal distribution with mean μ and variance σ² then:
a.) X takes positive values only
b.) (X − μ)/σ² has a standard normal distribution
c.) (X − μ)²/σ² has a chi-squared distribution with n degrees of freedom
d.) (X − μ)/(s/√n) has a t distribution with n−1 degrees of freedom
e.) none of the above
20.) Suppose the assumptions of the CLRM apply and you have used OLS to estimate a slope
coefficient as 2.43. If the true value of this slope is 3.05, then the OLS estimator
a.) has bias of 0.62
b.) has bias of –0.62
c.) is unbiased
d.) not enough information
e.) none of the above
Short Answer #1 (10 points)
According to the Canada Revenue Agency, the average length of time for an individual to complete a
CRA Income Tax Return is 10.53 hours with a standard deviation of 2.00 hours. The distribution of this
variable, however, is unknown. Suppose we randomly sample 360 taxpayers.
a.) In words, explain what Xi equals.
b.) In words, explain what X-bar equals.
c.) Now, tell me how X-bar is distributed—that is, tell me the type of distribution and its parameters.
d.) Would you be surprised if the 360 taxpayers finished their Income Tax Return in an average of more
than 12 hours? Explain why or why not in complete sentences.
e.) Would you be surprised if one taxpayer out of the 360 taxpayers finished his Income Tax Return in
more than 12 hours? Explain why or why not in complete sentences.
a.) Xi is simply one of the observations underlying the sample of individuals filling out an Income Tax
Return.
b.) X-bar is simply the sample average calculated from the individual Xi’s.
c.) Because our sample size exceeds 30, we can invoke the Central Limit Theorem. Therefore, we can
state with reasonable assurance that X-bar will be approximately normally distributed with a mean equal
to mu (that is, the population mean or 10.53 hours) and a variance equal to sigma-squared divided by n
(that is, the variance of the underlying individual observations in the population divided by 360, or
4/360 = 1/90 ≈ 0.0111 hours squared). The standard deviation of X-bar is therefore 2/√360 ≈ 0.105
hours, or about 6.3 minutes. Thus, X-bar ~ N(10.53, 0.0111).
d.) We would be very surprised by this result as X-bar is very tightly distributed around the population
mean in this particular case. An average of more than 12 hours would be more than about 14 standard
deviations from the population mean (=1.47/0.105).
e.) We would not be very surprised by this result as the Xi’s are distributed fairly widely around the
population mean in this particular case. A value of more than 12 hours would only be about 0.74
standard deviations from the population mean (=1.47/2.00).
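The arithmetic behind parts c.) through e.) can be checked with a short sketch. The variable names below are illustrative only; the inputs (10.53, 2.00, 360, and the 12-hour threshold) come from the question.

```python
import math

# Inputs taken from the question.
mu = 10.53        # population mean, in hours
sigma = 2.00      # population standard deviation, in hours
n = 360           # sample size
threshold = 12.0  # hours

# Standard deviation of the sample mean X-bar: sigma / sqrt(n).
se_xbar = sigma / math.sqrt(n)

# Distance of the threshold from the mean, in standard deviations,
# for the sample mean (part d) and for a single taxpayer (part e).
z_mean = (threshold - mu) / se_xbar
z_single = (threshold - mu) / sigma

print(f"sd of X-bar: {se_xbar:.4f} hours")
print(f"part d: {z_mean:.1f} standard deviations")
print(f"part e: {z_single:.3f} standard deviations")
```

The contrast between the two distances is the whole point of parts d.) and e.): the same 1.47-hour gap is enormous relative to the spread of X-bar but unremarkable relative to the spread of an individual Xi.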
Short Answer #2 (10 points)
Suppose we have a linear regression model with one independent variable and no intercept:
Yi = βXi + εi
Suppose also that εi satisfies the six classical assumptions.
a.) Verbally, explain the steps necessary to derive the least squares estimator.
b.) Formally, derive a mathematical expression for this estimator given your answer in part a).
For part a)
1.) We first have to define our residual as the difference between that which is observed and that
which is predicted by the regression. In this way, the residual is best thought of as a prediction error, that
is, something we would like to make as small as possible.
2.) Next, we need to define a minimization problem. Because our residuals will likely be both positive and
negative, simply considering their sum is unsatisfactory as these will tend to cancel one another out.
Additionally, minimizing the sum of residuals does not generally yield a unique answer. A better way
forward is to minimize the sum of the squared “prediction errors” which will definitely yield a unique
answer and which will penalize us for making big errors.
3.) We must take the derivatives of the sum of squared residuals with respect to the beta-hats and set them
equal to zero. These first order conditions establish the values of beta-hat for which the sum of squared
residuals “bottoms out” and is, thus, minimized.
4.) Finally, we must solve for the values of the beta-hats which are consistent with these first order
conditions, thus, yielding our least squares estimators.
For part b)
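\[ e_i = Y_i - \hat{\beta} X_i \]
\[ \min_{\hat{\beta}} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{\beta} X_i \right)^2 = \sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n} 2 Y_i \hat{\beta} X_i + \sum_{i=1}^{n} \left( \hat{\beta} X_i \right)^2 \]
This allows us to derive the following first order condition:
\[ \frac{\partial \sum_{i=1}^{n} e_i^2}{\partial \hat{\beta}} = -2 \sum_{i=1}^{n} Y_i X_i + 2 \hat{\beta} \sum_{i=1}^{n} X_i^2 = 0 \]
\[ -\sum_{i=1}^{n} Y_i X_i + \hat{\beta} \sum_{i=1}^{n} X_i^2 = 0 \]
\[ \hat{\beta} \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} Y_i X_i \]
\[ \hat{\beta} = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2} \]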
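As a numerical check on this result, the following sketch computes the no-intercept estimator as the sum of Yi*Xi divided by the sum of Xi squared and confirms that the first order condition holds at that value. The data and the function name are made up for illustration and are not part of the exam.

```python
def slope_through_origin(x, y):
    """Least squares slope for Y = beta*X + error (no intercept)."""
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return sum_xy / sum_xx


def sum_squared_residuals(beta, x, y):
    """Sum of squared residuals e_i = Y_i - beta*X_i."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, chosen only to illustrate the formula.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

beta_hat = slope_through_origin(x, y)

# First order condition: the sum of X_i * e_i should be zero at beta_hat.
foc = sum(xi * (yi - beta_hat * xi) for xi, yi in zip(x, y))

# Moving away from beta_hat in either direction should raise the SSR.
ssr_at_min = sum_squared_residuals(beta_hat, x, y)
ssr_above = sum_squared_residuals(beta_hat + 0.1, x, y)
ssr_below = sum_squared_residuals(beta_hat - 0.1, x, y)

print(beta_hat, foc, ssr_at_min, ssr_above, ssr_below)
```

The check on either side of the estimate mirrors step 3 of part a): the first order condition locates the point where the sum of squared residuals bottoms out.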
Short Answer #3 (10 points)
There are at least two different possible approaches to the problem of building a model of the costs of
production of electric power. Model I hypothesizes that per-unit costs (C), as a function of the number of
kilowatt-hours produced (Q), fall continually and smoothly as production is increased, but at a
decreasing rate. Model II hypothesizes that per-unit costs (C) decrease fairly steadily as production (Q)
increases across plant type, but costs start at a higher level for hydroelectric plants than for other kinds of
facilities.
a.) What functional form would you recommend for estimating Model I? Write out a specific equation.
b.) What functional form would you recommend for estimating Model II? Write out a specific equation.
c.) Would R2 be a reasonable way to compare the overall fits of the two equations? Why or why not?
a.) A number of forms are possible, but a log-log form would perhaps be the most appropriate:
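\[ \ln(C_t) = \beta_0 + \beta_1 \ln(Q_t) + \varepsilon_t \]
Whatever functional form is chosen, it has to satisfy the conditions that the first and second derivatives
of C with respect to Q are, respectively, negative and positive. In the case of the equation above,
\[ \frac{\partial C}{\partial Q} = \frac{\partial C}{\partial \ln C} \cdot \frac{\partial \ln C}{\partial \ln Q} \cdot \frac{\partial \ln Q}{\partial Q} = C \cdot \beta_1 \cdot \frac{1}{Q} = \beta_1 \frac{C}{Q} < 0 \iff \beta_1 < 0 \]
\[ \frac{\partial^2 C}{\partial Q^2} = \beta_1 (\beta_1 - 1) \frac{C}{Q^2} > 0 \quad \text{whenever } \beta_1 < 0 \]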
b.) A number of forms are possible, but a linear form with a dummy variable (Dt) capturing a different
intercept term for hydroelectric plants would perhaps be the most appropriate:
\[ C_t = \beta_0 + \beta_1 Q_t + \beta_2 D_t + \varepsilon_t \]
Whatever functional form is chosen, it has to satisfy the conditions that the first derivative of C
with respect to Q is constant and negative (β1 < 0) and that β2 > 0.
c.) Answers may vary depending on the functional form indicated. In the example above, R2 is not
appropriate for comparing the overall fits of the two equations because the dependent variable differs
between them (ln C versus C) and, thus, so does the value of TSS.
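To make part c.) concrete, the sketch below fits the log-log form to a small set of invented cost data and shows that TSS measured in logs differs from TSS measured in levels. It also computes a fit measure in levels by converting the log predictions back with exp(), which is one common workaround and an assumption of this sketch, not something the answer key prescribes. All data and helper names are hypothetical.

```python
import math


def tss(values):
    """Total sum of squares around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def ols_simple(x, y):
    """Intercept and slope for a one-regressor least squares fit."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


# Hypothetical data: per-unit cost falling at a decreasing rate.
Q = [1.0, 2.0, 4.0, 8.0, 16.0]
C = [10.0, 7.1, 5.0, 3.5, 2.5]
ln_Q = [math.log(q) for q in Q]
ln_C = [math.log(c) for c in C]

b0, b1 = ols_simple(ln_Q, ln_C)

# R2 of the log-log regression: measured against TSS of ln(C).
rss_log = sum((lc - (b0 + b1 * lq)) ** 2 for lq, lc in zip(ln_Q, ln_C))
r2_log = 1 - rss_log / tss(ln_C)

# Fit in levels: convert predictions back to C, measure against TSS of C.
rss_level = sum((c - math.exp(b0 + b1 * lq)) ** 2 for lq, c in zip(ln_Q, C))
r2_level = 1 - rss_level / tss(C)

print(f"slope: {b1:.3f}")
print(f"TSS of ln(C): {tss(ln_C):.3f}, TSS of C: {tss(C):.3f}")
print(f"R2 in logs: {r2_log:.4f}, fit in levels: {r2_level:.4f}")
```

Because the two TSS values are on different scales, the two R2 values answer different questions, which is why they should not be compared directly.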
Short Answer #4 (10 points)
Consider the regression results below where the dependent variable is the amount of time in minutes that
individuals spend traveling from home to work. The sample consists of workers across Canada. It also
contains information on individuals’ earnings, years of schooling, age, sex, and place of birth.
Dependent variable: Canadian commuting times

Independent variables                 Coefficient    p-value
Total earnings in 2012 (in $1000s)         0.0225      0.000
Years of schooling                        -0.0344      0.162
Age                                        0.0183      0.007
Female                                    -3.1650      0.000
Africa                                     4.0390      0.000
Asia                                       1.1200      0.000
Australasia                                1.2630      0.066
Europe                                    -0.5527      0.635
Latin America                              2.039       0.000
Intercept                                 27.64        0.000

R2                                         0.0081
Adjusted R2                                0.0080
F-statistic                              104.2
p-value of F                               0.000
Observations                             115089
a.) How many of the independent variables are statistically significant? Which ones are not?
b.) Does the whole set of independent variables have a reliable collective effect on the dependent
variable? Explain your answer.
c.) Consider the values of the R2 and adjusted R2 of the regression. Tell me what these mean individually
and collectively.
d.) Interpret the coefficient associated with total earnings in 2012. Do the sign and magnitude
of this coefficient seem reasonable? Why or why not?
e.) The sample for this regression includes individuals in many different cities of Canada. Does this seem
like a good idea? Why or why not? What would you suggest as an alternative?
a.) (ANSWERS MAY VARY DEPENDING ON SIGNIFICANCE LEVEL SPECIFIED – THEREFORE,
FULL MARKS ONLY TO THOSE WHO TELL US WHAT VALUE OF ALPHA THEY ARE USING)
The regression contains nine independent variables of which six have p-values less than or equal to .05
and are, thus, statistically significant. “Years of schooling”, “Australasia” and “Europe” are not
statistically significant.
b.) Yes. The p-value associated with the F-statistic is less than .05, indicating that the ensemble of nine
explanatory variables has a reliable collective effect on the dependent variable.
c.) Individually, the students need to define the two in terms of the amount of variation in commuting
times explained by the independent variables (adjusted or not adjusted for the degrees of freedom).
Collectively, the results imply that the number of observations far exceeds the number of explanatory
variables. Consequently, adjusting the R2 for the number of explanatory variables has very little effect.
d.) The slope associated with earnings in $1,000 is .0225. Its p-value is less than .05, so this slope is
statistically significant. It indicates that each increase of $1,000 in annual earnings is associated with
an increase in the commuting time to work of a little more than two one-hundredths of a minute, or
slightly more than a second. In other words, an increase of roughly $44,000 (= $1,000/0.0225) in annual
earnings is associated with an increase of one minute in commuting time. This may be plausible because
people with higher earnings typically live in more expensive housing, lots of expensive housing is in the
suburbs, and suburban residents typically have longer commutes than city residents.
e.)(ANSWERS MAY VARY – FULL CREDIT FOR ANY WELL-REASONED ARGUMENT) This seems like
a potentially bad idea. The transportation systems in different cities can be very different. Two people
with similar characteristics living in different cities might therefore have very different commuting times.
Consequently, it is probably misleading to have a regression for commuting times that does not,
somehow, account for the differences in commuting times across metropolitan areas.
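The counts and magnitudes used in answers a.) and d.) can be reproduced with a short sketch. The p-values and the earnings coefficient are copied from the table; the choice of alpha = 0.05 and the variable names are assumptions made for illustration.

```python
ALPHA = 0.05  # assumed significance level; the answer depends on this choice

# p-values for the nine independent variables, as reported in the table
# (values shown as 0.000 are rounded).
p_values = {
    "Total earnings in 2012": 0.000,
    "Years of schooling": 0.162,
    "Age": 0.007,
    "Female": 0.000,
    "Africa": 0.000,
    "Asia": 0.000,
    "Australasia": 0.066,
    "Europe": 0.635,
    "Latin America": 0.000,
}

significant = [name for name, p in p_values.items() if p <= ALPHA]
not_significant = [name for name, p in p_values.items() if p > ALPHA]

# Coefficient on earnings: minutes of commuting per $1,000 of earnings.
earnings_coef = 0.0225
seconds_per_thousand = earnings_coef * 60
thousands_per_minute = 1 / earnings_coef

print(len(significant), not_significant)
print(f"{seconds_per_thousand:.2f} seconds per $1,000")
print(f"${thousands_per_minute:.1f} thousand per extra minute")
```

Rerunning the sketch with ALPHA = 0.10 would move “Australasia” into the significant group, which is why the marking note insists that students state the alpha they use.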
Short Answer #5 (10 points)
Consider the regression results below where the dependent variable is the natural log of annual earnings
for single, childless men with high-school education or less. The sample consists of workers in
Vancouver over the years from 2003 to 2012. It also contains information on individuals’ ages (as a set of
dummy variables capturing a range of ages), their status as a “visible minority” (that is, whether or not
they are Caucasian), their status as “Aboriginal” (that is, whether or not they are of First Nations
origin), and whether or not they possess a “High School Degree”.
Dependent variable: natural log of annual earnings

Independent variables       OLS       OLS with          OLS with
                                      Newey-West SEs    Newey-West SEs
Age from 30-34              0.29       0.29              0.26
  standard error            0.16       0.13              0.14
  t-statistic               1.80       2.20              1.88
Age from 35-39              0.19       0.19              0.21
  standard error            0.18       0.17              0.17
  t-statistic               1.10       1.13              1.23
Age from 40-44              0.25       0.25              0.24
  standard error            0.16       0.15              0.15
  t-statistic               1.51       1.66              1.65
Age from 45-49              0.12       0.12              0.13
  standard error            0.17       0.17              0.17
  t-statistic               0.71       0.70              0.79
Age from 50-54              0.08       0.08              0.06
  standard error            0.17       0.18              0.18
  t-statistic               0.46       0.43              0.32
Age from 55-59              0.37       0.37              0.38
  standard error            0.19       0.20              0.21
  t-statistic               2.02       1.83              1.84
Age from 60-64              0.35       0.35              0.33
  standard error            0.30       0.26              0.27
  t-statistic               1.15       1.34              1.24
Visible minority            0.07       0.07              0.02
  standard error            0.19       0.19              0.09
  t-statistic               0.36       0.73              0.29
Aboriginal                 -0.54      -0.54             -0.49
  standard error            0.20       0.27              0.26
  t-statistic              -2.74      -2.01             -1.88
High school degree                                       0.33
  standard error                                         3.09
  t-statistic                                            0.00
Intercept                  10.18      10.18              9.96
  standard error            0.12       0.11              0.14
  t-statistic              84.44      89.22             73.62

R2                          0.03       0.03              0.05
F-statistic                89.21      89.21            108.98
DW statistic                0.98       1.78              1.78
p-value of F                0.000      0.000             0.000
Observations                4160       4160              4160
For a.) through e.), consider only the output in the first and second columns and assume that with 4160
observations, the t distribution is functionally the same as the standard normal distribution.
a.) Why are the coefficients the same, but the standard errors different in the first and second column?
b.) Which set of estimates do you think are more reliable? Explain.
c.) What is the test statistic for the hypothesis that Aboriginal and Caucasian men have the same earnings
against the alternative that they do not? Can you reject this hypothesis? Hint: use the “rule of thumb” that
2.00 is a sufficiently large critical value.
d.) Do you reject the hypothesis that these two groups have the same log-earnings against the alternative
that Aboriginal men have lower log-earnings?
e.) Are the R-squared values too low? Should we ignore these results?
f.) The third column reports the results for a regression just like that reported in the second column, but it
adds a dummy variable equal to 1 if an individual has a high school diploma. Why is the coefficient on
“Aboriginal” now smaller in absolute value than in the second column?
a.) The second column simply corrects for pure serial correlation. Inherently, this is a problem with the
standard errors and not the values of the coefficients themselves as OLS remains unbiased.
b.) The results in the first column indicate that serial correlation is a potential problem as the
Durbin-Watson test statistic is 0.98. This is far below the value of about 2.00 that we expect when there
is no serial correlation, which points to positive serial correlation.
Therefore, we prefer the results in the second column.
c.) This is simply the t-statistic right off the table as Caucasian men are the omitted category. In the first
column, this is -2.74. In the second column, this is -2.01. These are larger in absolute value than the rule of
thumb for the 5% critical value of 2.00, so one can reject with either.
d.) The 5% critical value for a one-sided test must be even lower than 2.00, so we still reject this
hypothesis regardless of whether we consider the first or second column.
e.) R-squared measures the explained variation in the regression. A low R-squared suggests that there is
a lot of other stuff going on which we are not accounting for. However, this does not mean that our
independent variables do not matter. In fact, the p-value of the F statistic strongly suggests otherwise.
f.) “High school degree” must be correlated with both the natural log of earnings and “Aboriginal”.
Given that “High school degree”=1 is for high-school completion and “High school degree”=0 is for
non-completion, the coefficient on “High school degree” should be positive. So, it must be negatively
correlated with Aboriginal status to make it so that adding it to the regression reduces the measured
disparity. Alternatively put, high school completion is less likely for Aboriginal men, so controlling for it
reduces the amount of disparity in earnings we see in comparison with not controlling for it.
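The decisions in parts b.) through d.) can be laid out in a few lines. The t-statistics and the Durbin-Watson value are taken from the table; the one-sided critical value of 1.645 is the standard normal 5% value, and the approximation d ≈ 2(1 − ρ) is the usual link between the Durbin-Watson statistic and first-order serial correlation. Variable names are illustrative.

```python
# Reported t-statistics on "Aboriginal" in columns 1 and 2 of the table.
t_stats = {"OLS": -2.74, "OLS with Newey-West SEs": -2.01}

TWO_SIDED_CRIT = 2.00   # rule of thumb given in the question
ONE_SIDED_CRIT = 1.645  # standard normal 5% one-sided critical value

# Part c: two-sided test, reject if |t| exceeds the critical value.
reject_two_sided = {k: abs(t) > TWO_SIDED_CRIT for k, t in t_stats.items()}

# Part d: one-sided test (alternative: lower earnings), reject if t is
# below the negative of the critical value.
reject_one_sided = {k: t < -ONE_SIDED_CRIT for k, t in t_stats.items()}

# Part b: implied first-order serial correlation from d ~ 2(1 - rho).
dw_ols = 0.98
rho_hat = 1 - dw_ols / 2

print(reject_two_sided)
print(reject_one_sided)
print(f"implied rho: {rho_hat:.2f}")
```

An implied correlation of about one half between adjacent errors is a rough reading only, but it is consistent with preferring the Newey-West standard errors in the second column.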
Short Answer #6 (10 points)
The first half of the course was dedicated to developing the least squares estimator. The rest of the course
was dedicated to considering those instances when problems with the least squares estimator arise.
Underlying the discussion, there were the six assumptions of the classical linear model.
a.) Name the six assumptions and explain what each of them mean.
b.) Some of these assumptions are necessary for the OLS estimator to be unbiased. Some of these
assumptions are necessary for the OLS estimator to be “best”. Explain the distinction between
these two concepts.
c.) Indicate which of the six assumptions are necessary for the OLS estimator to be unbiased and
which of the six assumptions are necessary for the OLS estimator to be “best”.
d.) In general, would you prefer your estimates to be biased but efficient or unbiased but not efficient?
Explain your answer.
a) The regression model is: a.) linear in the coefficients, b.) is correctly specified, and c.) has an additive
error term.
The error term has zero population mean or E(εi) = 0.
All independent variables are uncorrelated with the error term, or Cov(Xi,εi) = 0 for each independent
variable Xi (we say there is no endogeneity).
Errors are uncorrelated across observations, or Cov(εi,εj) = 0 for two observations i and j (we say there
is no serial correlation).
The error term has a constant variance, or Var(εi) = σ2 for every i (we say there is no heteroskedasticity).
No independent variable is a perfect linear function of any other independent variable (we say there is no
perfect collinearity).
b) Unbiasedness relates to the property whereby the expected value of an estimator is equal to the
population parameter of interest.
“Best” relates to the size of the sampling variance of the estimator: among all linear unbiased
estimators, the “best” one has the smallest sampling variance (the lower, the better).
c) Of the assumptions listed above, the first three are required for unbiasedness. Four through six are
necessary for the OLS estimator to be “best”.
d) It is probably better to have an indication that your estimator is centered on the population parameter
“on average” rather than be “wrong” but very precisely estimated. Thus, bias is the greater sin than
inefficiency (although I am open to students persuasively arguing the opposite if we think of a small bias
versus large variance case).
Useful Formulas:
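\[ E(X) = \sum_{i=1}^{k} p_i x_i \]
\[ Var(X) = E\left[ (X - \mu_X)^2 \right] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i \]
\[ \Pr(X = x) = \sum_{i=1}^{k} \Pr(X = x, Y = y_i) \]
\[ \Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)} \]
\[ E(Y) = \sum_{i=1}^{m} E(Y \mid X = x_i) \Pr(X = x_i) \]
\[ E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x) \]
\[ Var(Y \mid X = x) = \sum_{i=1}^{k} \left[ y_i - E(Y \mid X = x) \right]^2 \Pr(Y = y_i \mid X = x) \]
\[ E(a + bX + cY) = a + bE(X) + cE(Y) \]
\[ Cov(X, Y) = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i) \]
\[ Corr(X, Y) = \rho_{XY} = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}} \]
\[ Var(a + bY) = b^2 Var(Y) \]
\[ Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y) \]
\[ E(Y^2) = Var(Y) + E(Y)^2 \]
\[ Cov(a + bX + cV, Y) = b\,Cov(X, Y) + c\,Cov(V, Y) \]
\[ E(XY) = Cov(X, Y) + E(X)E(Y) \]
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \]
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
\[ s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
\[ r_{XY} = s_{XY} / (s_X s_Y) \]
For the linear regression model \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \):
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \quad \text{and} \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki} \]
\[ R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_i e_i^2}{\sum_i (Y_i - \bar{Y})^2} \]
\[ \bar{R}^2 = 1 - \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (Y_i - \bar{Y})^2 / (n - 1)} \]
\[ \widehat{Var}[\hat{\beta}_1] = \frac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (X_i - \bar{X})^2} \]
\[ s^2 = \frac{\sum_i e_i^2}{n - k - 1} \quad \text{where } E[s^2] = \sigma^2 \]
\[ Z = \frac{\hat{\beta}_j - \beta_H}{\sqrt{Var[\hat{\beta}_j]}} \sim N(0, 1) \]
\[ t = \frac{\hat{\beta}_1 - \beta_H}{s.e.(\hat{\beta}_1)} \sim t_{n-k-1} \]
\[ \Pr\left[ \hat{\beta}_j - t^*_{\alpha/2} \, s.e.(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t^*_{\alpha/2} \, s.e.(\hat{\beta}_j) \right] = 1 - \alpha \]
\[ d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho) \]
\[ F = \frac{ESS / k}{RSS / (n - k - 1)} = \frac{ESS \, (n - k - 1)}{RSS \cdot k} \]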