Given name:____________________
Student #:______________________
Family name:___________________
Section #:______________________
BUEC 333 FINAL
Multiple Choice (2 points each)
1) The Durbin-Watson test is only valid:
a) with models that exclude an intercept
b) with models that include a lagged dependent variable
c) with models displaying multiple orders of autocorrelation
d) all of the above
e) none of the above
2) If q is an unbiased estimator of Q, then:
a) Q is the mean of the sampling distribution of q
b) q is the mean of the sampling distribution of Q
c) Var[q] = Var[Q] / n where n = the sample size
d) q = Q
e) a and c
3) The OLS estimator of the variance of the slope coefficient in the regression model with one
independent variable:
a) will be smaller when there is less variation in ei
b) will be smaller when there are fewer observations
c) will be smaller when there is less variation in X
d) will be smaller when there are more independent variables
e) none of the above
4) Suppose the assumptions of the CLRM apply and you have used OLS to estimate a slope
coefficient as 2.43. If the true value of this slope is 3.05, then the OLS estimator:
a) has bias of 0.62
b) has bias of –0.62
c) is unbiased
d) not enough information
e) none of the above
5) Which of the following is not a linear regression model:
a) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
b) $Y_i = \beta_0 + \beta_1 X_i^2 + \varepsilon_i$
c) $\log(Y_i) = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$
d) $Y_i = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$
e) none of the above
6) To be useful for hypothesis testing, a test statistic must:
a) be computable using sample data
b) have a known sampling distribution when the alternative hypothesis is true
c) have a known sampling distribution when the null hypothesis is false
d) a and b only
e) none of the above
7) Impure serial correlation:
a) is the same as pure serial correlation
b) can be detected with residual plots
c) is caused by mis-specification of the regression model
d) b and c
e) none of the above
8) The power of a test statistic should become larger as the
a) sample size becomes larger
b) type II error becomes larger
c) null becomes closer to being true
d) significance level becomes larger
e) none of the above
9) Suppose you want to test the following hypothesis at the 5% level of significance:
H0: μ = μ0
H1: μ ≠ μ0
Which of the following statements is/are true?
a) the probability of erroneously failing to reject the null hypothesis when it is true is 0.05
b) the probability of erroneously failing to reject the null hypothesis when it is false is 0.05
c) the probability of erroneously rejecting the null hypothesis when it is true is 0.05
d) the probability of erroneously rejecting the null hypothesis when it is false is 0.05
e) none of the above
10) Suppose upon running a regression, EViews reports a value of the explained sum of squares as 1648
and an R2 of 0.80. What is the value of the residual sum of squares in this case?
a) 0
b) 412
c) 1318.4
d) unknown as it is incalculable
e) none of the above
11) The power of a test is the probability that you:
a) reject the null when it is true
b) reject the null when it is false
c) fail to reject the null when it is false
d) fail to reject the null when it is true
e) none of the above
12) In a regression explaining earnings, you include a single independent variable, an individual's number of
years of education, and nothing else. You know that more educated people earn more. You also know that
more educated people drink more. In this case, the OLS estimate of the effect of education on earnings
will likely be:
a) negatively biased
b) positively biased
c) unbiased
d) not enough information
e) none of the above
13) The consequences of multicollinearity are that the OLS estimates:
a) will be biased while the standard errors will remain unaffected
b) will be biased while the standard errors will be smaller
c) will be unbiased while the standard errors will remain unaffected
d) will be unbiased while the standard errors will be smaller
e) none of the above
14) In the regression specification $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, which of the following is a justification for
including epsilon?
a) it accounts for potential non-linearity in the functional form
b) it captures the influence of all omitted explanatory variables
c) it incorporates measurement error in Y
d) it reflects randomness in outcomes
e) all of the above
15) In order for our independent variables to be labelled “exogenous”, which of the following must be
true:
a) E(εi) = 0
b) Cov(Xi,εi) = 0
c) Cov(εi,εj) = 0
d) Var(εi) = σ2
e) none of the above
16) The F test of overall significance
a) is based on a test statistic that has an F distribution with k and n-k-1 degrees of
freedom
b) is based on a test statistic that has an F distribution with n-k-1 and k degrees of freedom
c) helps to detect whether relevant variables have been omitted from the model
d) a and c
e) b and c
17) The OLS estimator is said to be BUE when:
a) Assumptions 1 through 6 are satisfied and errors are normally distributed
b) Assumptions 1 through 3 are satisfied and errors are normally distributed
c) Assumptions 1 through 6 are satisfied
d) Assumptions 1 through 3 are satisfied
e) errors are normally distributed
18) The RESET test is designed to detect problems associated with:
a) specification error of an unknown form
b) heteroskedasticity
c) multicollinearity
d) serial correlation
e) none of the above
19) Omitting a constant term from our regression will likely lead to:
a) a lower R2, a lower F statistic, and unbiased estimates of the independent variables
b) a higher R2, a lower F statistic, and biased estimates of the independent variables
c) a higher R2, a lower F statistic, and unbiased estimates of the independent variables
d) a higher R2, a higher F statistic, and biased estimates of the independent variables
e) none of the above
20) If two random variables X and Y are independent,
a) their joint distribution equals the product of their conditional distributions
b) the conditional distribution of X given Y equals the marginal distribution of X
c) their variance is zero
d) a and c
e) a, b, and c
Short Answer #1 (10 points)
Suppose we specify the following regression model on the determination of incomes in British Columbia:
$\ln(w_i) = \beta_0 + \beta_1 Education_i + \beta_2 FirstNations_i + \beta_3 Male_i + \beta_4 North_i + \beta_5 (Education_i \times Male_i) + \beta_6 (Male_i \times North_i) + \varepsilon_i$
The dependent variable is the log of wages for individual i.
Education is years of education for individual i.
FirstNations is a dummy variable equal to 1 if an individual self-identifies as being of First Nations origin
and equal to 0 if an individual does not.
Male is a dummy variable equal to 1 if an individual self-identifies as being male and equal to 0 if an
individual self-identifies as being female.
North is a dummy variable equal to 1 if an individual resides in northern British Columbia and equal to 0
if an individual resides in southern British Columbia.
Verbally explain the following:
a) What null hypothesis are you testing when you test β2 = 0?
b) What null hypothesis are you testing when you test β3 = 0?
c) What null hypothesis are you testing when you test β5 = 0?
d) What null hypothesis are you testing when you test β6 = 0?
It is best to think of this in terms of the possible combinations of ethnic, gender, and regional origins (suppressing the Education terms, which enter through β1 and, for males, β5):
First Nations, male, northern British Columbia: β0 + β2 + β3 + β4 + β6
First Nations, male, southern British Columbia: β0 + β2 + β3
First Nations, female, northern British Columbia: β0 + β2 + β4
First Nations, female, southern British Columbia: β0 + β2
Non- First Nations, male, northern British Columbia: β0 + β3 + β4 + β6
Non- First Nations, male, southern British Columbia: β0 + β3
Non- First Nations, female, northern British Columbia: β0 + β4
Non- First Nations, female, southern British Columbia: β0
a) The impact of ethnic origin on wages is zero for everyone.
b) The impact of gender identification for someone in southern British Columbia is zero.
c) The impact of education for a male is the same as for a female.
d) The impact of living in northern British Columbia for a male is the same as for a female.
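For concreteness, a short sketch of how the dummies and interactions combine across the eight groups above. The coefficient values in the dictionary are made-up placeholders, not estimates from any data; the function name predicted_log_wage is hypothetical.

```python
# Minimal sketch (hypothetical coefficient values, not estimates) of how the dummy
# and interaction terms in the specification combine for each group.
b = {"b0": 1.0, "b1": 0.08, "b2": -0.10, "b3": 0.15, "b4": -0.05, "b5": 0.02, "b6": 0.03}

def predicted_log_wage(education, first_nations, male, north, b=b):
    """E[ln(w)] implied by the specification for one individual."""
    return (b["b0"]
            + b["b1"] * education
            + b["b2"] * first_nations
            + b["b3"] * male
            + b["b4"] * north
            + b["b5"] * education * male
            + b["b6"] * male * north)

# The eight ethnicity/gender/region combinations at a common education level:
for fn in (1, 0):
    for male in (1, 0):
        for north in (1, 0):
            print(fn, male, north, round(predicted_log_wage(12, fn, male, north), 3))
```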
Short Answer #2 (10 points)
The first half of the course was dedicated to developing the least squares estimator. The rest of the course
was dedicated to considering those instances when problems with the least squares estimator arise.
Underlying the discussion were the six assumptions of the classical linear model.
a.) Name the six assumptions and explain what each of them means.
b.) Some of these assumptions are necessary for the OLS estimator to be unbiased. Some of these
assumptions are necessary for the OLS estimator to be “best”. Explain the distinction between these two
concepts.
c.) Indicate which of the six assumptions are necessary for the OLS estimator to be unbiased and
which of the six assumptions are necessary for the OLS estimator to be “best”.
d.) In general, would you prefer your estimates to be biased with small sampling variance or unbiased
with a larger sampling variance? Explain your answer.
a) The six assumptions of the classical linear model are:
1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
2. The error term has zero population mean, or E(εi) = 0.
3. All independent variables are uncorrelated with the error term, or Cov(Xi,εi) = 0 for each independent variable Xi (we say there is no endogeneity).
4. Errors are uncorrelated across observations, or Cov(εi,εj) = 0 for any two observations i and j (we say there is no serial correlation).
5. The error term has a constant variance, or Var(εi) = σ2 for every i (we say there is no heteroskedasticity).
6. No independent variable is a perfect linear function of any other independent variable (we say there is no perfect collinearity).
b) Unbiasedness relates to the property whereby the expected value of an estimator is equal to the
population parameter of interest.
“Best” relates to the size of the sampling variance of any such estimator: the lower the sampling variance, the better.
c) Of the assumptions listed above, the first three are required for unbiasedness. Assumptions four through six are
necessary for the OLS estimator to be “best”.
d) It is probably better to have an indication that your estimator is centered on the population parameter
“on average” rather than be “wrong” but very precisely estimated. Thus, bias is the greater sin than
inefficiency (although I am open to students persuasively arguing the opposite if we think of a small bias
versus large variance case).
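A small simulation can make the trade-off in d) concrete. The numbers below are purely illustrative: the shrinkage factor of 0.8 is an arbitrary choice used only to manufacture a biased, lower-variance estimator, not anything from the course.

```python
# Small simulation (illustrative numbers only) contrasting an unbiased estimator
# with higher variance and a biased estimator with lower variance.
import random

random.seed(0)
mu, sigma, n, reps = 5.0, 2.0, 25, 5000

unbiased, shrunk = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    unbiased.append(xbar)          # unbiased: centred on mu
    shrunk.append(0.8 * xbar)      # shrinking toward 0 lowers variance but adds bias

def mean(v): return sum(v) / len(v)
def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

print("unbiased: mean %.3f, variance %.4f" % (mean(unbiased), var(unbiased)))
print("shrunk:   mean %.3f, variance %.4f" % (mean(shrunk), var(shrunk)))
```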
Short Answer #3 (10 points)
Consider the results from running a regression of SCORE (out of 100) on a BUEC 333 exam on GPA and
a dummy variable equal to 1 if a student was born in Tennessee (USA) and 0 if a student was born
elsewhere.
Dependent variable: SCORE
Method: Least Squares
Sample: 1 100
Included observations: 100
Variable                  Coefficient   Std. Error   t-Statistic   Prob.
GPA                           8.03           ?           2.00       0.00
TENNESSEE                    -3.69         3.03         -1.22       0.23
C                            55.00         6.67          8.25       0.00

R-squared                     0.12
Adjusted R-squared            0.10
Mean dependent variable      76.22
S.D. dependent variable      12.78
a) How would you interpret the coefficient estimate for TENNESSEE?
b) Approximately what number should appear in the Std. Error column for GPA?
c) What score would you forecast for a student born in Sydney, Australia with a GPA of 3.0? Explain
your calculation.
d) If approximately 20% of the data are from Tennessee, what is the approximate average GPA of the
sample? Explain your thinking (you can also round up to the first decimal place in your
explanation).
a) Holding GPA constant, students from Tennessee score on average 3.69 points lower.
b) The t-value is given by
$t = \dfrac{\hat\beta_{GPA} - \beta_H}{s.e.(\hat\beta_{GPA})} = \dfrac{\hat\beta_{GPA} - 0}{s.e.(\hat\beta_{GPA})} = \dfrac{\hat\beta_{GPA}}{s.e.(\hat\beta_{GPA})} = 2.00$
In this case, this means that the value of the standard error for GPA should be equal to 8.03/2.00 = 4.015.
c) $\widehat{SCORE} = 55.00 + 8.03 \times 3 = 79.09$ (a student born in Sydney is not from Tennessee, so TENNESSEE = 0).
d) By construction, the regression line passes through the sample averages. So,
$76.2 = 55.0 + 8.0 \times \overline{GPA} - 3.7 \times 0.2$
$76.2 = 54.3 + 8.0 \times \overline{GPA}$
$8.0 \times \overline{GPA} = 21.9$
$\overline{GPA} = 2.74$
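A quick numerical check of parts b) through d), using only the figures reported in the output. The small rounding difference in d) relative to the key comes from carrying 3.7 × 0.2 exactly rather than rounding to 54.3 first.

```python
# Quick check of the arithmetic in parts b), c), and d) using the reported output.
coef_gpa, t_gpa = 8.03, 2.00
se_gpa = coef_gpa / t_gpa                     # b) implied standard error
print("s.e.(GPA) =", se_gpa)                  # 4.015

intercept, coef_tenn = 55.00, -3.69
forecast = intercept + coef_gpa * 3.0 + coef_tenn * 0   # c) Sydney => TENNESSEE = 0
print("forecast score =", forecast)           # 79.09

# d) the fitted line passes through the sample means:
#    mean(SCORE) = 55.0 + 8.0*mean(GPA) - 3.7*mean(TENNESSEE)
mean_score, share_tenn = 76.2, 0.2
mean_gpa = (mean_score - 55.0 + 3.7 * share_tenn) / 8.0
print("implied mean GPA =", round(mean_gpa, 2))   # about 2.74
```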
Short Answer #4 (10 points)
In this course, we have repeatedly considered the linear regression model with one independent variable:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
We have also seen that OLS defines the set of estimators that minimize the sum of squared residuals:
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
a) Suppose that β1-hat = -2. What must be the sign of the sample covariance between X and Y. Explain
your reasoning.
b) Now, derive an expression for β1-hat as a function of the following sample statistics: the correlation
between X and Y (rx,y); the standard deviation of Y (sY); and the standard deviation of X (sX).
c) Given your answer in b), suppose that β1-hat = -2 and sY = 3. Is it possible that sX = 2? Explain.
d) Given your answer in b), suppose that β1-hat = -2 and sY = 3. Is it possible that sX = 1? Explain.
e) The estimated variance of β1-hat is generally given as
$\widehat{Var}(\hat\beta_1) = \dfrac{s^2}{\sum_i (X_i - \bar X)^2}$
Explain why the estimated variance of β1-hat can also be written as
$\widehat{Var}(\hat\beta_1) = \dfrac{s^2}{(n-1)\, s_X^2}$
a) The first step is in recognizing that
$\hat\beta_1 = \dfrac{\widehat{Cov}(X, Y)}{\widehat{Var}(X)} = \dfrac{s_{XY}}{s_X^2}$
So if β1-hat = -2, then the sample covariance must be negative since variances are positive for our
purposes.
b) Using the expression above,
ˆ1 
Cov( X , Y ) s XY
 2
Var ( X )
sX
We need to make use of the definition of the sample correlation given on the last page of the exam, or
namely:
rXY 
s XY
s
rXY * s X sY
sY
 s XY  rXY * s X sY  ˆ1  XY


r
*
XY
s X sY
s X2
s X2
sX
c) If $\hat\beta_1 = -2$ and $s_Y = 3$, then
$\hat\beta_1 = r_{XY}\, \dfrac{s_Y}{s_X} \;\Rightarrow\; -2 = r_{XY}\, \dfrac{3}{s_X} \;\Rightarrow\; r_{XY} = -\dfrac{2}{3}\, s_X$
In this case, the standard deviation of X could not be 2, since this would imply that the correlation was
-4/3. This value for the correlation is less than -1, which is impossible by construction/definition.
d) If $\hat\beta_1 = -2$ and $s_Y = 3$, then
$\hat\beta_1 = r_{XY}\, \dfrac{s_Y}{s_X} \;\Rightarrow\; -2 = r_{XY}\, \dfrac{3}{s_X} \;\Rightarrow\; r_{XY} = -\dfrac{2}{3}\, s_X$
In this case, the standard deviation of X could be 1, since this would imply that the correlation was
-2/3. This value for the correlation falls in the feasible range of [-1, +1].
e) From the last page of the final exam,
$s_X^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2 \;\Rightarrow\; \sum_{i=1}^{n} (X_i - \bar X)^2 = (n-1)\, s_X^2$
We are also given the information that
$\widehat{Var}(\hat\beta_1) = \dfrac{s^2}{\sum_i (X_i - \bar X)^2}$
Substituting the first expression into the denominator of the second,
$\widehat{Var}(\hat\beta_1) = \dfrac{s^2}{\sum_i (X_i - \bar X)^2} = \dfrac{s^2}{(n-1) \cdot \frac{1}{n-1} \sum_i (X_i - \bar X)^2} = \dfrac{s^2}{(n-1)\, s_X^2}$
Short Answer #5 (10 points)
Suppose that for all fourth year seminar courses in Economics the length of term papers is uniformly
distributed from 10 to 14 pages (also assume that page length is discrete). Suppose we also survey a
random sample of 55 term papers across classes, as we are interested in the average length of the research
papers.
a) In words, define what Xi will be in this case.
b) Calculate the value of µX and σX in this case.
c) How should we characterize the distribution of X-bar in this case?
d) Describe how you would find the probability that an individual paper is longer than 12 pages. Use a
diagram to support your answer.
e) Describe how you would find the probability that the average length of the 55 papers is longer than 12
pages. Use a diagram to support your answer.
a) Xi will be the number of pages in one individual paper.
b) Since it is a uniform distribution across the outcomes of 10, 11, 12, 13, and 14, we know the associated
probability for any outcome is 0.20. So,
$\mu_X = \dfrac{1}{5} \times 10 + \dfrac{1}{5} \times 11 + \dfrac{1}{5} \times 12 + \dfrac{1}{5} \times 13 + \dfrac{1}{5} \times 14 = \dfrac{60}{5} = 12$
$\sigma_X^2 = \sum_{i=1}^{k} (X_i - \mu_X)^2 p_i = \dfrac{1}{5} \sum_{i=1}^{5} (X_i - 12)^2 = \dfrac{1}{5}\left[(10-12)^2 + (11-12)^2 + (12-12)^2 + (13-12)^2 + (14-12)^2\right]$
$\sigma_X^2 = \dfrac{1}{5}\left[(-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2\right] = \dfrac{1}{5}\left[4 + 1 + 0 + 1 + 4\right] = \dfrac{1}{5}(10) = 2$
$\sigma_X = \sqrt{2}$
c) We know that, regardless of the underlying distribution, X-bar will be approximately normally distributed (by the Central Limit Theorem) with a mean equal to µX and a sampling variance equal to σX2/n. That is,
$\bar X \sim N\!\left(12, \dfrac{2}{55}\right)$
d) This would amount to consulting the probability density function for the distribution of individual
paper lengths which we know is uniform and should look something like the figure below.
It would then be a matter of adding up the area of the pdf to the right of 12 (that is, 13 and 14 pages). In
this case, we can see that this would be equal to 40%.
e) This would amount to consulting the probability density function for the distribution of the average of
paper lengths, which we know is approximately normal and should look something like the figure below.
It would then be a matter of adding up the area of the pdf to the right of 12. In this case, we can see that
this would be equal to 50%.
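The calculations in b), d), and e) can be verified with a few lines of code. The normal tail in e) uses the N(12, 2/55) approximation from part c); the variable names are illustrative.

```python
# Check of the mean, variance, and tail probabilities in parts b), d), and e).
from math import sqrt, erf

pages = [10, 11, 12, 13, 14]
p = 1 / 5
mu = sum(p * x for x in pages)                       # 12
var = sum(p * (x - mu) ** 2 for x in pages)          # 2
print(mu, var)

# d) P(one paper is longer than 12 pages) = P(13) + P(14)
print(sum(p for x in pages if x > 12))               # 0.40

# e) P(average of 55 papers > 12) under the normal approximation N(12, 2/55)
n = 55
z = (12 - mu) / sqrt(var / n)
prob = 1 - 0.5 * (1 + erf(z / sqrt(2)))              # 1 - Phi(0) = 0.50
print(prob)
```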
Short Answer #6 (10 points)
Consider the following set of regression results generated from a gravity model of international trade for
the period from 1950 to 2000:
REGRESSION A

Linear regression                               Number of obs =   6628
                                                F(6, 6622)    =      .
                                                Prob > F      = 0.0000
                                                R-squared     = 0.9970
                                                Root MSE      = 2.2473

                            Robust
    logtrade      Coef.    Std. Err.       t    P>|t|    [95% Conf. Interval]
  loggdpprod   2.018092    .0089598   225.24    0.000     2.000528   2.035656
  ldist        -1.27416    .0277619   -45.90    0.000    -1.328583  -1.219738
  fixed         .161583    .0986606     1.64    0.102    -.0318237   .3549896
  language     .4565153    .0845662     5.40    0.000     .2907382   .6222923
  empire       1.002045    .1541489     6.50    0.000     .6998638   1.304227
  border       1.514522    .0884167    17.13    0.000     1.341197   1.687848

REGRESSION B

Linear regression                               Number of obs =    6628
                                                F(6, 6621)    = 3328.98
                                                Prob > F      =  0.0000
                                                R-squared     =  0.7385
                                                Root MSE      =  1.8071

                            Robust
    logtrade      Coef.    Std. Err.       t    P>|t|    [95% Conf. Interval]
  loggdpprod   1.407351    .0134916   104.31    0.000     1.380903   1.433799
  ldist       -1.694913    .0232049   -73.04    0.000    -1.740402  -1.649424
  fixed        1.205402    .0933823    12.91    0.000     1.022342   1.388461
  language     .6835998    .0714326     9.57    0.000     .5435689   .8236306
  empire       .6023074    .1348587     4.47    0.000     .3379409   .8666739
  border       .6093824    .0795588     7.66    0.000     .4534216   .7653432
  _cons        18.84188    .3180509    59.24    0.000     18.21839   19.46536
The dependent and independent variables are the same as those from Homework #2. Namely,
logtrade = the natural log of the product of trade12 and trade21
loggdpprod = the natural log of the product of gdp1 and gdp2
ldist = the natural log of distance separating country 1 and country 2
fixed= a dummy variable equal to 1 if country 1 and country 2 have a fixed nominal exchange
rate and 0 if otherwise
language = a dummy variable equal to 1 if country 1 and country 2 share the same language and 0
if otherwise (e.g., if Canada is country 1 and the US is country 2, then language = 1; if
Canada is country 1 and India is country 2, then language = 1)
empire = a dummy variable equal to 1 if country 1 and country 2 were in the same empire either
now or in the past and 0 if otherwise (e.g., if Canada is country 1 and the US is country 2,
then empire = 0; if Canada is country 1 and India is country 2, then empire = 1)
border = a dummy variable equal to 1 if country 1 and country 2 share a border and 0 if otherwise
(e.g., if Canada is country 1 and the US is country 2, then border = 1; if Canada is
country 1 and India is country 2, then border = 0)
a) From Regression A, interpret the coefficient on ldist.
b) From Regression A, interpret the confidence interval for ldist. Your answer should include reference to
both population parameters and statistical significance.
c) From Regression B, perform a test of joint significance for the model, using a 1% level of
significance and explain the results. Do the results change at the 5% level of significance?
d) From Regression A, interpret the value of the R-squared. Are the R-squareds from Regression A and
Regression B comparable?
e) Of the two candidate regressions, which should be your preferred specification and why?
a) This is simply the elasticity of bilateral trade with respect to distance. It says that, holding all else
constant, for every 1% increase in distance separating two countries the level of bilateral trade between
them falls by 1.27416%.
b) There is a 95% probability that confidence intervals constructed in this fashion over repeated samples
will include the true value of the population parameter βDISTANCE. Also, the fact that the CI does not extend
into positive values suggests that we reject the null hypothesis at the 5% level that the true coefficient is
equal to zero.
c) This test is implicitly being performed in the calculation of the F-statistic and, in particular, its
associated p-value. That Prob > F is 0.0000 suggests we can reject the null hypothesis of joint
insignificance at the 1% level. As to the 5% level, the logic of hypothesis testing suggests that any null
which is rejected at 1% will be rejected at 5% as well.
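To make the mechanics of part c) explicit, the reported F(6, 6621) = 3328.98 can be compared with critical values. The 2.80 and 2.10 figures below are the usual approximate F(6, large df) critical values at the 1% and 5% levels; they are assumptions stated from memory, not values taken from the exam's tables.

```python
# Decision rule for the test of joint significance in Regression B.
# The critical values are approximate F(6, large df) values at the 1% and 5% levels.
f_stat = 3328.98
crit_1pct, crit_5pct = 2.80, 2.10   # approximate; df = (6, 6621)

for level, crit in (("1%", crit_1pct), ("5%", crit_5pct)):
    decision = "reject" if f_stat > crit else "fail to reject"
    print(f"{level} level: F = {f_stat} vs critical value {crit} -> {decision} H0 (all slopes = 0)")
```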
d) This is the proportion of variation in bilateral trade explained by variation in our independent
variables. Thus, we ostensibly capture nearly 100% of the variation in bilateral trade in this
specification. However, this specification is suspect as it contains no constant term. Because of this, we
also know that the TSS changes across Regression A and Regression B. Therefore, the R-squareds are not
comparable even though they have the same dependent variable.
e) Regression B as it contains a constant term, even though it registers a lower R-squared. We have
argued that all specifications should contain a constant. This is primarily because we know that by
excluding a constant term we are potentially biasing the estimates of the coefficients attached to our
independent variables. This result is clearly seen in how the values of these estimates diverge across the
two specifications.
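The non-comparability point in d) can be illustrated with simulated data (not the trade data above): when the constant is dropped, regression software typically measures fit against an uncentred total sum of squares, sum(y^2), rather than sum((y - ybar)^2), which inflates the reported R-squared. All numbers below are arbitrary assumptions for illustration.

```python
# Sketch (simulated data) of why the R-squared from a regression without a constant
# is not comparable: the baseline switches to the uncentred total sum of squares.
import random

random.seed(2)
n = 200
x = [random.gauss(20, 1) for _ in range(n)]             # regressor with a large mean
y = [10 + 0.5 * xi + random.gauss(0, 1) for xi in x]    # large intercept in truth

# OLS slope with no constant: b = sum(x*y) / sum(x^2)
b_nocons = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
resid_nc = [yi - b_nocons * xi for xi, yi in zip(x, y)]

ybar = sum(y) / n
tss_centred = sum((yi - ybar) ** 2 for yi in y)
tss_uncentred = sum(yi ** 2 for yi in y)
rss_nc = sum(e ** 2 for e in resid_nc)

print("uncentred R2 (no constant):", 1 - rss_nc / tss_uncentred)   # looks very high
print("centred R2 with same RSS:  ", 1 - rss_nc / tss_centred)     # roughly zero here
```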
Useful Formulas:
$\mu_X = E(X) = \sum_{i=1}^{k} p_i x_i$
$\sigma_X^2 = Var(X) = E[(X - \mu_X)^2] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$
$\Pr(X = x) = \sum_{i=1}^{k} \Pr(X = x, Y = y_i)$
$\Pr(Y = y \mid X = x) = \dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$
$E(Y \mid X = x) = \sum_{i=1}^{m} y_i \Pr(Y = y_i \mid X = x)$
$E(Y) = \sum_{i=1}^{k} E(Y \mid X = x_i) \Pr(X = x_i)$
$Var(Y \mid X = x) = \sum_{i=1}^{m} [y_i - E(Y \mid X = x)]^2 \Pr(Y = y_i \mid X = x)$
$E[a + bX + cY] = a + bE(X) + cE(Y)$
$Var(a + bY) = b^2 Var(Y)$
$\sigma_{XY} = Cov(X, Y) = \sum_{i=1}^{m} \sum_{j=1}^{k} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i)$
$Corr(X, Y) = \rho_{XY} = \dfrac{Cov(X, Y)}{\sqrt{Var(X)\, Var(Y)}}$
$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\, Cov(X, Y)$
$E(Y^2) = Var(Y) + [E(Y)]^2$
$E(XY) = Cov(X, Y) + E(X)E(Y)$
$Cov(a + bX + cV, Y) = b\, Cov(X, Y) + c\, Cov(V, Y)$
$\bar X = \dfrac{1}{n} \sum_{i=1}^{n} X_i$
$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)^2$
$s_{XY} = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)$
$r_{XY} = s_{XY} / (s_X s_Y)$
$t = \dfrac{\bar X - \mu}{s / \sqrt{n}}$
$\bar X \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$
For the linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,
$\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$ and $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$
$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \cdots + \hat\beta_k X_{ki}$
$R^2 = \dfrac{ESS}{TSS} = \dfrac{TSS - RSS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum_i e_i^2}{\sum_i (Y_i - \bar Y)^2}$
$\bar R^2 = 1 - \dfrac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (Y_i - \bar Y)^2 / (n - 1)}$
$\widehat{Var}(\hat\beta_1) = \dfrac{\sum_i e_i^2 / (n - k - 1)}{\sum_i (X_i - \bar X)^2} = \dfrac{s^2}{\sum_i (X_i - \bar X)^2}$ where $s^2 = \dfrac{\sum_i e_i^2}{n - k - 1}$ and $E[s^2] = \sigma^2$
$Z = \dfrac{\hat\beta_j - \beta_H}{\sqrt{Var[\hat\beta_j]}} \sim N(0, 1)$
$t = \dfrac{\hat\beta_1 - \beta_H}{s.e.(\hat\beta_1)} \sim t_{n-k-1}$
$\Pr[\hat\beta_j - t^*_{\alpha/2} \cdot s.e.(\hat\beta_j) \le \beta_j \le \hat\beta_j + t^*_{\alpha/2} \cdot s.e.(\hat\beta_j)] = 1 - \alpha$
$d = \dfrac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho)$
$F = \dfrac{ESS / k}{RSS / (n - k - 1)} = \dfrac{ESS}{RSS} \cdot \dfrac{(n - k - 1)}{k}$