Given name:____________________ Family name:___________________
Student #:______________________ Section #:______________________
BUEC 333 MIDTERM
Multiple Choice (2 points each)
1.) The Gauss-Markov Theorem says that when the 6 classical assumptions are satisfied:
a.) The least squares estimator is unbiased
b.) The least squares estimator has the smallest variance of all linear unbiased estimators
c.) The least squares estimator has an approximately normal sampling distribution
d.) The least squares estimator is consistent
e.) None of the above
2.) Which of the following is not a linear regression model:
a.) Yi    X i  X i2   i
2
b.) $Y_i = \alpha + \beta \cos(X_i) + \gamma \exp(X_i) + \varepsilon_i$
c.) $\log(Y_i) = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$
d.) $Y_i = \beta_0 + \beta_1 \log(X_i) + \varepsilon_i$
e.) none of the above
3.) The distribution of X when Y is known is called the _____ distribution of X, and is written as _____.
These blanks are best filled with the following
a.) conditional, p(X)
b.) conditional, p(X|Y)
c.) marginal, p(X)
d.) marginal, p(X|Y)
e.) none of the above
4.) In the linear regression model, the degrees of freedom
a.) affects the precision of the coefficient estimates
b.) is equal to the number of observations (n) minus 1
c.) affects the value of the coefficient estimates
d.) all of the above
e.) none of the above
5.) The power of a test statistic should become larger as the
a.) probability of a Type II error becomes smaller
b.) null becomes closer to being true
c.) significance level becomes larger
d.) sample size becomes larger
e.) none of the above
(note: typo in exam – credit given for either answer)
6.) The central limit theorem tells us that the sampling distribution of the sample mean:
a.) is always normal
b.) is always normal in large samples
c.) approaches normality as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above
7.) Suppose [L(X), U(X)] is a 90% confidence interval for a population mean. Which of the following
is/are true?
a.) Pr  L  X     U  X   0.90
b.) Pr  L  X     U  X   0.90
c.) Pr    L  X   Pr U  X      0.10
d.) a and c
e.) none of the above
8.) The sampling variance of the slope coefficient in the regression model with one independent variable:
a.) will be smaller when there is less variation in ε
b.) will be larger when there is less variation in ε
c.) will be smaller when there is less variation in X
d.) will be larger when there is less co-variation in ε and X
e.) none of the above
9.) The central limit theorem tells us that the sampling distribution of the least squares regression coefficient:
a.) is always normal
b.) is always normal in large samples
c.) approaches a uniform distribution as the sample size increases
d.) is normal in Monte Carlo simulations
e.) none of the above
10.) In order for our independent variables to be labelled “exogenous” which of the following must be
true:
a.) E(εi) = 0
b.) Cov(Yi,εi) = 0
c.) Cov(εi,εj) = 0
d.) Var(εi) = σ²
e.) none of the above
11.) Which of the following statements is false regarding the Central Limit Theorem:
a.) when the sample size is large, the mean of X-bar is approximately equal to the mean of X.
b.) when the sample size is large, X-bar is approximately normally distributed.
c.) when the sample size is large, the standard deviation of X-bar is approximately the same
as the standard deviation of X.
d.) all of the above
e.) none of the above
12.) If the covariance between two random variables X and Y is zero then
a.) X and Y are independent
b.) Knowing the value of X provides no information about the value of Y
c.) E(X) = E(Y) = 0
d.) a and b are true
e.) none of the above
13.) If two random variables X and Y are independent,
a.) their joint distribution equals the product of their conditional distributions
b.) the conditional distribution of X given Y equals the joint distribution of X
c.) their covariance is zero
d.) a and c
e.) a, b, and c
14.) If a random variable X has a normal distribution with mean μ and variance σ² then:
a.) X takes positive values only
b.) $(X - \mu)/\sigma^2$ has a standard normal distribution
c.) $(\bar{X} - \mu)/(s/\sqrt{n})$ has a t distribution with n-1 degrees of freedom
d.) $(X - \mu)^2/\sigma^2$ has a chi-squared distribution with n degrees of freedom
e.) none of the above
15.) Suppose you want to test the following hypothesis at the 5% level of significance:
H0: μ = μ0
H1: μ ≠ μ0
Which of the following is/are true?
a.) the probability of a Type I error is 0.05
b.) the probability of a Type I error is 0.025
c.) the t statistic for this test has a t distribution with n degrees of freedom
d.) a and c
e.) b and c
Short Answer #1 (10 points – show your work!)
Consider the case of a uniformly distributed random variable where each outcome (1, 2, 3, 4) has
an equal chance of occurring. It can be easily shown that the population mean and variance of
this random variable are 2.50 and 1.25, respectively.
a.) Suppose that a random number generator provides the following sequence of numbers,
2-1-4-1. What are the mean and variance of this sample?
b.) What is the sampling distribution of the sample mean calculated above? Provide a verbal
interpretation of the sampling distribution of the sample mean.
c.) Compute the value of the t-statistic for testing the null hypothesis that μ = 2.5. Hint: the
square root of two is approximately equal to 1.40.
d.) The critical value for a t distribution with 3 degrees of freedom and a 0.20 level of
significance in the presence of a two-sided alternative is equal to 1.638. Can you reject
the null hypothesis that μ = 2.5 at the 20% level of significance? What about at the 10%
level of significance?
a.) $\bar{X} = \dfrac{1}{n}\sum X_i = \dfrac{2 + 1 + 4 + 1}{4} = \dfrac{8}{4} = 2$
$s^2 = \dfrac{1}{n-1}\sum (X_i - \bar{X})^2 = \dfrac{(2-2)^2 + (1-2)^2 + (4-2)^2 + (1-2)^2}{3} = \dfrac{(0)^2 + (-1)^2 + (2)^2 + (-1)^2}{3} = \dfrac{0 + 1 + 4 + 1}{3} = \dfrac{6}{3} = 2$
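The arithmetic in part a.) can be checked with Python's standard `statistics` module. This is a minimal sketch: `variance` divides by n − 1, matching $s^2$ above, while `pvariance` divides by the number of outcomes and reproduces the population mean and variance quoted in the question.

```python
from statistics import mean, pvariance, variance

population = [1, 2, 3, 4]   # equally likely outcomes
sample = [2, 1, 4, 1]       # the draw 2-1-4-1 from part a.)

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance: divides by N
x_bar = mean(sample)             # sample mean
s2 = variance(sample)            # sample variance: divides by n - 1

print(f"population: mean = {mu}, variance = {sigma2}")
print(f"sample:     mean = {x_bar}, variance = {s2}")
```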
b.) The sampling distribution of the sample mean has a mean equal to the population mean of 2.50 and a variance equal to the population variance divided by the number of observations, 1.25/4 = 0.3125. It describes the set of possible values that the sample mean might take, and the probability associated with each of them. It measures our uncertainty about the value the sample mean would take in repeated samples of the same size from the same population.
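One way to see the sampling distribution described in part b.) is to enumerate it exactly. The sketch below assumes the four draws are independent (sampling with replacement), so all 4^4 = 256 ordered samples are equally likely; exact fractions avoid any rounding error.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]   # each outcome has probability 1/4
n = 4                       # sample size in the question

# Every ordered sample of size n is equally likely when draws are independent.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(draw), n) for draw in samples)
dist = {x_bar: Fraction(c, len(samples)) for x_bar, c in sorted(counts.items())}

# Mean and variance of the sampling distribution of X-bar.
mean_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in dist.items())

for x_bar, p in dist.items():
    print(f"X-bar = {float(x_bar):.2f}   Pr = {float(p):.4f}")
print(f"E(X-bar) = {float(mean_xbar)},  Var(X-bar) = {float(var_xbar)}")
```

The enumerated mean and variance should equal 2.5 and 0.3125. The printed table also shows that, with only four draws, the distribution is discrete and symmetric about 2.5 rather than exactly normal.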
c.) $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} = \dfrac{2.0 - 2.5}{\sqrt{2}/\sqrt{4}} = \dfrac{-0.5}{1.4/2} = \dfrac{-1.0}{1.4} \approx -0.7 \sim t_3$
d.) Since |t| ≈ 0.7 is smaller than the critical value of 1.638, we fail to reject the null that μ = 2.5 at the 20% level of significance. Because the critical value only increases as the level of significance decreases, we also fail to reject at the 10% level (where the two-sided critical value with 3 degrees of freedom is 2.353).
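Parts c.) and d.) can be reproduced as follows. This sketch uses the exact value of √2 rather than the exam's 1.4, so it gives t ≈ −0.707 instead of −0.7; the 10% critical value of 2.353 is a standard t-table value, not one given in the question.

```python
import math

sample = [2, 1, 4, 1]
mu_0 = 2.5                      # value of mu under the null hypothesis
n = len(sample)
x_bar = sum(sample) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-sided critical values for a t distribution with n - 1 = 3 degrees of freedom.
# 1.638 is given in the question; 2.353 is the standard table value at the 10% level.
critical = {0.20: 1.638, 0.10: 2.353}

for alpha, c in critical.items():
    decision = "reject" if abs(t_stat) > c else "fail to reject"
    print(f"alpha = {alpha:.2f}: |t| = {abs(t_stat):.3f}, critical value = {c} -> {decision} H0")
```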
Short Answer #2 (20 points – show your work!)
A researcher is using data for a sample of 1,000 male wage-earners to investigate the relationship between
hourly wage rates, Yi (measured in dollars per hour), and length of work experience with a particular firm,
Xi (measured in years). Analysis of the data in Excel produces the following sample information:
$\sum Y_i = 12{,}500$
$\sum X_i = 3{,}500$
$\sum e_i^2 = 15{,}000$
$n = 1{,}000$
$\operatorname{Cov}(X_i, Y_i) = 36$
$\operatorname{Var}(X_i) = 60$
$\operatorname{Var}(Y_i) = 60$
Use the information above to answer the following questions. Show all formulas and calculations, using the approximation n ≈ (n − 1) ≈ (n − 2).
a.) What are the OLS estimates of the constant term (β0) and the slope coefficient (β1)?
b.) Interpret the estimate of the slope coefficient you calculated in part a.).
c.) Calculate an estimate of the variance of the error term in the population regression model.
d.) Calculate an estimate of the variance of the estimated slope coefficient.
e.) Compute the value of R2 and briefly explain what the calculated value of R2 means.
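a.) The easiest way forward is to remember that
$\hat\beta_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})/(n-1)}{\sum (X_i - \bar{X})^2/(n-1)} = \dfrac{\operatorname{Cov}(X_i, Y_i)}{\operatorname{Var}(X_i)} = \dfrac{36}{60} = 0.6$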
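From the formula sheet, we also know that
$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = \dfrac{\sum Y_i}{n} - 0.6 \cdot \dfrac{\sum X_i}{n} = \dfrac{12{,}500}{1{,}000} - 0.6 \cdot \dfrac{3{,}500}{1{,}000}$
$\hat\beta_0 = 12.5 - 0.6 \cdot 3.5 = 12.5 - 2.1 = 10.4$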
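A sketch of part a.) in Python. `ols_from_summary` works from the summary statistics given in the question; `ols_from_data` is a hypothetical helper that works from raw observations, and the small toy dataset (not from the question) only confirms that the (n − 1) divisors cancel, so both routes give the same estimates.

```python
def ols_from_summary(n, sum_x, sum_y, cov_xy, var_x):
    """Slope and intercept from summary statistics.

    beta1 = Cov(X, Y) / Var(X) equals sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    because the (n - 1) divisors cancel; beta0 = y_bar - beta1 * x_bar.
    """
    beta1 = cov_xy / var_x
    beta0 = sum_y / n - beta1 * (sum_x / n)
    return beta0, beta1


def ols_from_data(xs, ys):
    """Slope and intercept computed directly from raw observations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    return y_bar - beta1 * x_bar, beta1


# Summary statistics from the question.
b0, b1 = ols_from_summary(n=1_000, sum_x=3_500, sum_y=12_500, cov_xy=36, var_x=60)
print(f"beta0_hat = {b0:.2f}, beta1_hat = {b1:.2f}")

# Hypothetical toy data, used only to confirm that both routes agree.
xs, ys = [1.0, 2.0, 4.0, 5.0], [11.0, 11.5, 12.9, 13.2]
m = len(xs)
x_bar, y_bar = sum(xs) / m, sum(ys) / m
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (m - 1)
var = sum((x - x_bar) ** 2 for x in xs) / (m - 1)
direct = ols_from_data(xs, ys)
via_summary = ols_from_summary(m, sum(xs), sum(ys), cov, var)
print(direct, via_summary)
```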
b.) The estimate of 0.6 means that an increase in length of work experience with the firm by one year is associated, on average, with an increase in male wage-earners' hourly wage rate of 0.60 dollars per hour, or 60 cents per hour. Critically, we are not holding constant the variation in other potential independent variables.
c.) Here, we make use of the equation for
$s^2 = \dfrac{\sum e_i^2}{n - k - 1} = \dfrac{15{,}000}{1{,}000 - 2} \approx \dfrac{15{,}000}{1{,}000} = 15.0$
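A quick comparison of the exact degrees-of-freedom correction with the exam's approximation, assuming k = 1 independent variable as in the single-regressor model here.

```python
n = 1_000        # observations
k = 1            # number of independent variables (slope coefficients)
rss = 15_000     # sum of squared residuals from the question

s2_exact = rss / (n - k - 1)   # degrees-of-freedom-corrected estimate
s2_approx = rss / n            # with the exam's approximation n - 2 ≈ n
print(f"exact: {s2_exact:.4f}, approximate: {s2_approx:.1f}")
```

With n = 1,000 the two differ only in the second decimal place, which is why the approximation is harmless here.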
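d.) Here, we make use of the equation for
$\widehat{\operatorname{Var}}(\hat\beta_1) = \dfrac{\sum e_i^2/(n - k - 1)}{\sum (X_i - \bar{X})^2} = \dfrac{15.0}{\sum (X_i - \bar{X})^2}$
where $\sum (X_i - \bar{X})^2 = (n - 1)\operatorname{Var}(X_i) \approx 1{,}000 \times 60 = 60{,}000$, so
$\widehat{\operatorname{Var}}(\hat\beta_1) = \dfrac{15}{60{,}000} = \dfrac{1}{4{,}000}$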
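The same calculation for part d.), with the standard error (the square root of the estimated variance) added for reference; `sxx` is an illustrative name for $\sum (X_i - \bar{X})^2$.

```python
import math

s2 = 15.0              # estimate of the error variance from part c.)
n, var_x = 1_000, 60   # sample size and Var(X_i) from the question
sxx = n * var_x        # sum of squared deviations of X, using (n - 1) ≈ n

var_beta1 = s2 / sxx               # estimated sampling variance of the slope
se_beta1 = math.sqrt(var_beta1)    # its standard error
print(f"Var(beta1_hat) = {var_beta1}, s.e.(beta1_hat) = {se_beta1:.4f}")
```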
e.) Here, we make use of the equation for
$R^2 = \dfrac{ESS}{TSS} = \dfrac{TSS - RSS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum e_i^2}{\sum (Y_i - \bar{Y})^2}$
where $\sum (Y_i - \bar{Y})^2 = (n - 1)\operatorname{Var}(Y_i) \approx 1{,}000 \times 60 = 60{,}000$, so
$R^2 = 1 - \dfrac{15{,}000}{60{,}000} = 1 - 0.25 = 0.75$
The value of 0.75 indicates that 75% of the observed variation in Yi is “explained” by variation in workers’ experience with a particular firm.
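Part e.) as code, computing R² both as 1 − RSS/TSS and as ESS/TSS to show that the two forms on the formula sheet agree under the exam's (n − 1) ≈ n approximation.

```python
n = 1_000
rss = 15_000         # sum of squared residuals
var_y = 60           # Var(Y_i) from the question
tss = n * var_y      # total sum of squares, using (n - 1) ≈ n
ess = tss - rss      # explained sum of squares

r2 = 1 - rss / tss
print(f"R^2 = {r2}, ESS/TSS = {ess / tss}")
```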
Short Answer #3 (20 points – show your work!)
Consider the standard univariate population regression model:
Yi  0  1 X i   i
Assume that all of the classical assumptions are satisfied. Show that the OLS estimator $\hat\beta_1$ is an unbiased estimator of $\beta_1$. Hint: you should make use of the fact that $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \Rightarrow \bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{\varepsilon}$.
From the formula sheet, we know that
$\hat\beta_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
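Here, we substitute the hint into the second factor of the numerator, $(Y_i - \bar{Y})$:
$\hat\beta_1 = \dfrac{\sum (X_i - \bar{X})\left[(\beta_0 + \beta_1 X_i + \varepsilon_i) - (\beta_0 + \beta_1 \bar{X} + \bar{\varepsilon})\right]}{\sum (X_i - \bar{X})^2}$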
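Collecting terms, we find that
$\hat\beta_1 = \dfrac{\sum (X_i - \bar{X})\left[\beta_1 (X_i - \bar{X}) + (\varepsilon_i - \bar{\varepsilon})\right]}{\sum (X_i - \bar{X})^2}$
$\hat\beta_1 = \beta_1 \dfrac{\sum (X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2} + \dfrac{\sum (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum (X_i - \bar{X})^2}$
$\hat\beta_1 = \beta_1 + \dfrac{\sum (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum (X_i - \bar{X})^2}$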
Finally, unbiasedness means that the expected value of an estimator equals the true parameter value we are interested in. In this case,
$E(\hat\beta_1) = E(\beta_1) + E\left[\dfrac{\sum (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum (X_i - \bar{X})^2}\right]$
The first term is the expectation of a constant, so it simply returns that constant: $E(\beta_1) = \beta_1$.
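The second term can be expressed as
$E\left[\dfrac{\sum (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum (X_i - \bar{X})^2}\right] = E\left[\dfrac{\sum (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})/(n-1)}{\sum (X_i - \bar{X})^2/(n-1)}\right] = E\left[\dfrac{\operatorname{Cov}(X_i, \varepsilon_i)}{\operatorname{Var}(X_i)}\right]$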
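Since all of the classical assumptions are satisfied, including #3 (the independent variables are exogenous, so $\operatorname{Cov}(X_i, \varepsilon_i) = 0$), we have
$E(\hat\beta_1) = \beta_1 + E\left[\dfrac{\operatorname{Cov}(X_i, \varepsilon_i)}{\operatorname{Var}(X_i)}\right] = \beta_1 + E\left[\dfrac{0}{\operatorname{Var}(X_i)}\right] = \beta_1$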
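The unbiasedness result can also be illustrated by Monte Carlo simulation. This sketch assumes hypothetical parameter values (β0 = 10, β1 = 0.6, normally distributed errors with standard deviation 4, and 50 observations per sample) and holds X fixed across replications. With exogenous errors the average slope estimate should sit close to 0.6; adding a component of the error that moves with X (controlled by the illustrative parameter `gamma`) violates assumption #3 and shifts the average estimate by about `gamma`.

```python
import random


def ols_slope(xs, ys):
    """OLS slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def mean_slope(beta0=10.0, beta1=0.6, gamma=0.0, sigma=4.0, n=50, reps=5_000, seed=333):
    """Average OLS slope across simulated samples.

    gamma = 0: the error is independent of X, so exogeneity holds.
    gamma != 0: the error contains gamma * (x - x_mean), so Cov(X, error) != 0.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]  # X held fixed across replications
    x_mean = sum(xs) / n
    total = 0.0
    for _ in range(reps):
        ys = [beta0 + beta1 * x + gamma * (x - x_mean) + rng.gauss(0.0, sigma) for x in xs]
        total += ols_slope(xs, ys)
    return total / reps


exogenous = mean_slope()
endogenous = mean_slope(gamma=0.5)
print(f"true beta1 = 0.6, mean estimate (exogenous) = {exogenous:.3f}")
print(f"mean estimate when Cov(X, error) != 0 = {endogenous:.3f}")
```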
Short Answer #4 (20 points – show your work!)
You wish to determine whether the application of fertilizer and water affects plant growth. To that end, you run
an experiment where you randomly apply different amounts of fertilizer and water to your hemp plants.
You then use regression analysis to determine how they affect the yield of a plant. Yield is
measured in grams. Fertilizer is measured in kilograms and ranges in value from 0.0 to 1.0, while water
is measured in liters per week. The standard errors of the regression coefficients are reported in
parentheses. You get the following results:
$Yield_i = 12.1 + 5.5 \cdot Fertilizer_i + 12 \cdot Water_i$
(0.4)   (0.5)   (2.7)
$n = 140$, RSS = 1234, $R^2 = 0.76$
a.) Do you think this type of analysis will give you an unbiased estimate of how much adding fertilizer
increases your crop yield? Why or why not?
b.) How do you interpret the constant in this case? Explain why, in most instances, we ignore such
results.
c.) How much does fertilizer increase plant growth?
d.) What is the regression’s predicted yield for a plant exposed to 50 kilograms of fertilizer? Do you think
this prediction is reliable? Why or why not?
a.) The unbiasedness of our estimator depends on the first three assumptions of the classical linear
regression model being satisfied. That is, we need our model to be correctly specified, the error term to
have a zero mean, and our independent variables to be exogenous with respect to the error term. Our
estimates are likely to be unbiased given the controlled setting of the experiment: because fertilizer and
water are assigned randomly, they should be uncorrelated with omitted variables that also affect yield.
b.) The constant suggests a predicted yield of 12.1 grams when no fertilizer is applied and a plant is
completely deprived of water. Even an economist's basic understanding of biology suggests that this
interpretation should not be pushed too far: water, in particular, is necessary for plant growth, so it is
unreasonable to expect any yield in its absence. This reflects the fact that the constant term absorbs the
constant effect of any omitted variables.
c.) The results suggest that, holding water constant, applying an entire kilogram of fertilizer leads on
average to an additional 5.5 grams of yield from hemp plants.
d.) The predicted value would be 12.1 + 5.5 × 50 = 287.1 grams. But we probably should not take this at
face value. First, this is for a level of fertilizer that is well outside our sample of fertilizer values,
which is bounded by zero and one. Second, we would expect diminishing marginal returns to set in above
some threshold level, so this is unlikely to be a very accurate prediction. Finally, this prediction also
implicitly assumes no application of water, which, even without being botanists, we can assume is bad for
plant yields.
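The prediction in part d.) can be written as a small function. The names `predicted_yield` and `inside_sample` are illustrative; the range check covers only fertilizer, since the question gives the 0.0 to 1.0 kg range for fertilizer but no range for water, and water is set to zero to match the calculation above.

```python
def predicted_yield(fertilizer_kg, water):
    """Fitted value from the estimated equation: 12.1 + 5.5*Fertilizer + 12*Water."""
    return 12.1 + 5.5 * fertilizer_kg + 12 * water


def inside_sample(fertilizer_kg, low=0.0, high=1.0):
    """True if the fertilizer level lies within the range observed in the sample."""
    return low <= fertilizer_kg <= high


for fert in (0.5, 50):
    yhat = predicted_yield(fert, water=0)
    note = "" if inside_sample(fert) else "  (extrapolation: outside the 0-1 kg sample range)"
    print(f"fertilizer = {fert} kg, water = 0: predicted yield = {yhat:.1f} g{note}")
```

Flagging out-of-sample inputs does not fix the prediction; it only makes the extrapolation problem described in part d.) visible.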
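Useful Formulas:
$E(X) = \sum_i p_i x_i$
$\operatorname{Var}(X) = E\left[(X - \mu_X)^2\right] = \sum_i (x_i - \mu_X)^2 p_i$
$\Pr(Y = y \mid X = x) = \dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)}$
$\Pr(X = x) = \sum_{i=1}^{k} \Pr(X = x, Y = y_i)$
$E(Y) = \sum_{i=1}^{m} E(Y \mid X = x_i) \Pr(X = x_i)$
$E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$
$\operatorname{Var}(Y \mid X = x) = \sum_{i=1}^{k} \left[y_i - E(Y \mid X = x)\right]^2 \Pr(Y = y_i \mid X = x)$
$E(a + bX + cY) = a + bE(X) + cE(Y)$
$\operatorname{Cov}(X, Y) = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i)$
$\operatorname{Corr}(X, Y) = \rho_{XY} = \dfrac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$
$\operatorname{Cov}(a + bX + cV, Y) = b\operatorname{Cov}(X, Y) + c\operatorname{Cov}(V, Y)$
$E(XY) = \operatorname{Cov}(X, Y) + E(X)E(Y)$
$\operatorname{Var}(a + bY) = b^2 \operatorname{Var}(Y)$
$\operatorname{Var}(aX + bY) = a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) + 2ab\operatorname{Cov}(X, Y)$
$E(Y^2) = \operatorname{Var}(Y) + E(Y)^2$
$\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
$s_{XY} = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$r_{XY} = s_{XY}/(s_X s_Y)$
$\bar{X} \sim N\left(\mu_X, \dfrac{\sigma_X^2}{n}\right)$
$Z = \dfrac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}}$
$t = \dfrac{\bar{X} - \mu_X}{s/\sqrt{n}}$
For the linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\hat\beta_1 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$ & $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$
$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \cdots + \hat\beta_k X_{ki}$
$R^2 = \dfrac{ESS}{TSS} = \dfrac{TSS - RSS}{TSS} = 1 - \dfrac{RSS}{TSS} = 1 - \dfrac{\sum_i e_i^2}{\sum_i (Y_i - \bar{Y})^2}$
$\bar{R}^2 = 1 - \dfrac{\sum_i e_i^2/(n - k - 1)}{\sum_i (Y_i - \bar{Y})^2/(n - 1)}$
$\widehat{\operatorname{Var}}(\hat\beta_1) = \dfrac{\sum_i e_i^2/(n - k - 1)}{\sum_i (X_i - \bar{X})^2}$
$s^2 = \dfrac{\sum_i e_i^2}{n - k - 1}$, where $E(s^2) = \sigma^2$
$Z = \dfrac{\hat\beta_j - \beta_H}{\sqrt{\operatorname{Var}[\hat\beta_j]}} \sim N(0, 1)$
$t = \dfrac{\hat\beta_1 - \beta_H}{s.e.(\hat\beta_1)} \sim t_{n-k-1}$
$\Pr\left[\hat\beta_j - t^*_{\alpha/2} \cdot s.e.(\hat\beta_j) \le \beta_j \le \hat\beta_j + t^*_{\alpha/2} \cdot s.e.(\hat\beta_j)\right] = 1 - \alpha$
$F = \dfrac{ESS/k}{RSS/(n - k - 1)} = \dfrac{ESS\,(n - k - 1)}{RSS \cdot k}$
$d = \dfrac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho)$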