ECN 405
Exam I Key
1.
Population Regression Model: $y = \beta_0 + \beta_1 x + u$
Systematic Component: $\beta_0 + \beta_1 x$
Stochastic Component: $u$
Parameters: $\beta_0$ (the intercept) and $\beta_1$ (the slope)
2.
Experimental data are obtained via a controlled experiment. In such a setting, the
researcher/econometrician varies the level of the explanatory variable (x) randomly and
observes the resulting value of the dependent variable (y). Since x values are assigned
randomly, they should not be correlated with other determinants of y lurking in the error term
(u). Thus experimental data should typically satisfy the zero conditional mean assumption.
Unfortunately, however, we almost never have experimental data to work with in econometric
problems. Instead, econometric problems usually involve observational data. The problem with
observational data is that the researcher has no control over the value of x. Its value is
determined simultaneously with y. As a consequence, x values may be correlated with other
factors that determine y but which are not observed directly and which, consequently, are in
the error term. If x is correlated with any element of the error term, then the zero conditional mean assumption is violated and $\beta_1$, the pure effect of x on y, cannot be estimated without bias.
Problem Set #1, question 5 and Wooldridge end-of-chapter question 2.11 give good illustrations
of the difference between the two types of data.
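The contrast can also be illustrated with a small simulation (my addition, not part of the original key; the numbers and variable names are made up). An unobserved factor a sits in the error term. In the observational regime x co-moves with a, so OLS cannot recover the true slope of 1; in the experimental regime x is randomized and OLS recovers it:

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved ability-like factor that ends up in the error term.
a = rng.normal(size=n)

# Observational data: x is chosen by agents, so it co-moves with a.
x_obs = 0.8 * a + rng.normal(size=n)
y_obs = 2 + 1.0 * x_obs + a + rng.normal(size=n)   # true beta1 = 1

# Experimental data: x is assigned randomly, independent of a.
x_exp = rng.normal(size=n)
y_exp = 2 + 1.0 * x_exp + a + rng.normal(size=n)

slope = lambda x, y: np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print("observational slope:", slope(x_obs, y_obs))  # biased, ~1.49
print("experimental slope: ", slope(x_exp, y_exp))  # ~1.0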
3.
This is a slightly recycled version of Wooldridge end-of-chapter question 2.1 (p. 61). See the Chapter 2 Solutions on the course web page for one possible way to answer the question.
4.
First, note that:
$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} [x_i y_i - x_i \bar{y}] = \sum x_i y_i - \bar{y} \sum x_i,$$
by summation rules 3 and 2. Then multiplying the last term by $n/n$, we get:
$$= \sum x_i y_i - n \bar{y} \cdot \frac{1}{n} \sum x_i = \sum x_i y_i - n \bar{x} \bar{y}.$$
Second, note that:
$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} [x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x} \bar{y}] = \sum x_i y_i - \bar{y} \sum x_i - \bar{x} \sum y_i + n \bar{x} \bar{y},$$
where summation rule 3 was applied to distribute the sum sign, summation rule 2 was applied in order to get terms 2 and 3, and summation rule 1 was applied in order to obtain the last term. Then multiplying both terms 2 and 3 by $n/n$, we get:
$$= \sum x_i y_i - n \bar{y} \cdot \frac{1}{n} \sum x_i - n \bar{x} \cdot \frac{1}{n} \sum y_i + n \bar{x} \bar{y} = \sum x_i y_i - n \bar{x} \bar{y} - n \bar{x} \bar{y} + n \bar{x} \bar{y} = \sum x_i y_i - n \bar{x} \bar{y}.$$
QED.
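A quick numerical check of the identity, added here for the curious (not part of the original key): all three forms of the sum of cross products agree.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = rng.normal(size=50)
n = len(x)

# The three expressions from the proof above, evaluated directly.
print(np.sum((x - x.mean()) * (y - y.mean())))
print(np.sum(x * (y - y.mean())))
print(np.sum(x * y) - n * x.mean() * y.mean())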
5.
$R^2 = .9$
$R^2 = .1$
6.
True. By definition, errors are homoskedastic if $Var(u|x) = \sigma^2$. This means that the variation of observed y from predicted y is the same regardless of the value of x. See Figure 2.8 in the text. [A two-dimensional, hand-drawn rendering appeared here in the original; figure omitted.]
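In place of the drawing, a minimal simulated illustration (my addition; the numbers are arbitrary). Under homoskedasticity the spread of u is the same wherever we look along the x axis:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=10_000)
u = rng.normal(scale=3.0, size=10_000)   # Var(u|x) = 9, regardless of x
y = 1 + 2 * x + u

# The error spread is roughly equal in the low-x and high-x halves.
print("sd of u for x < 5: ", u[x < 5].std())
print("sd of u for x >= 5:", u[x >= 5].std())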
7.
Definitely false. The variance of $\hat{\beta}_1$ decreases as sample size, n, increases. From the formula sheet, $Var(\hat{\beta}_1) = \dfrac{\sigma^2}{SST_x}$. Furthermore, $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$. All terms in this sum are positive by virtue of the fact that they are squared. Therefore, if the sample is increased, that will put additional positive terms in the sum of squared deviations, thereby increasing $SST_x$. Since $SST_x$ is in the denominator of $Var(\hat{\beta}_1)$, an increase in $SST_x$ induced by an increase in sample size necessarily shrinks the variance of $\hat{\beta}_1$. This result
should seem quite intuitive. A larger sample size implies you have more information about the
population. More information about the population should, of course, buy you more precise
estimates of the population parameters.
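A small Monte Carlo sketch of this result (my addition; the parameter values are arbitrary): the sampling standard deviation of $\hat{\beta}_1$ falls as n grows.

import numpy as np

rng = np.random.default_rng(3)

def slope_sd(n, reps=2000):
    # Standard deviation of the OLS slope across repeated samples of size n.
    slopes = []
    for _ in range(reps):
        x = rng.normal(size=n)
        y = 5 + 0.5 * x + rng.normal(size=n)
        slopes.append(np.polyfit(x, y, 1)[0])   # element [0] is the slope
    return np.std(slopes)

for n in (10, 100, 1000):
    print(n, slope_sd(n))   # shrinks roughly like 1/sqrt(n)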
8.
Log-level model: $\log(y) = \beta_0 + \beta_1 x$.
Here y is measured in log terms, while x is measured in "level" terms. We use the log-level model
whenever we believe a change in x will induce a constant percentage change in variable y,
rather than a constant absolute change in y (as is the case in a level-level model). A log-level
model allows us to capture a non-linear relationship between y and x using a linear regression
model. Hence, linear regression allows for much more flexibility in functional form than one
might initially presume.
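A brief simulated example of the log-level interpretation (my addition; the data are made up): with $\log(y) = \beta_0 + \beta_1 x$, each 1-unit increase in x changes y by roughly $100 \cdot \beta_1$ percent.

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 20, size=50_000)
y = np.exp(1.0 + 0.08 * x + rng.normal(scale=0.1, size=50_000))

# Regress log(y) on x; the slope is the proportional effect of x on y.
slope, intercept = np.polyfit(x, np.log(y), 1)
print(f"a 1-unit increase in x raises y by about {100 * slope:.1f}%")  # ~8.0%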
9.
a.
SLR.1 requires that the model be “linear in the parameters” which is the case in the
model presented. SLR.1 does not require that the model be linear in the dependent and
independent variables.
b.
The slope coefficient of the log-log model is the elasticity of the predicted value of the
dependent variable with respect to the regressor. In the problem at hand, this implies
that the elasticity of predicted CEO salary with respect to firm sales is .257. So this
would imply that a 1% increase in firm sales is associated with a .257% increase in
predicted CEO salary. You could also say based on this elasticity that a 10% increase in
firm sales is associated with a 2.57% increase in predicted CEO salary.
10.
Partially differentiating $\hat{y}$ with respect to $x_1$ yields $\hat{\beta}_1$, which is the change in the predicted value of y due to a 1-unit change in $x_1$, holding $x_2$ constant. It measures the pure effect of $x_1$ on y. Similarly, partial differentiation of $\hat{y}$ with respect to $x_2$ yields $\hat{\beta}_2$, which is interpreted as the change in predicted y due to a 1-unit change in $x_2$, holding $x_1$ constant. We no doubt know from other econ classes that such ceteris paribus effects are of the utmost importance in economic analysis. Multiple regression analysis allows us to conduct ceteris paribus analysis, and this is a key reason why we view regression as such a powerful tool in the economist's tool kit.
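A simulated illustration of the ceteris paribus point (my addition; all coefficients are made up). Even though $x_1$ and $x_2$ are correlated, multiple regression recovers each partial effect, while a simple regression of y on $x_1$ alone mixes the two effects together:

import numpy as np

rng = np.random.default_rng(5)
n = 50_000
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)          # correlated regressors
y = 1 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

# Multiple regression: each slope is a ceteris paribus effect.
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b)                        # ~[1, 2, -3]

# Simple regression of y on x1 alone picks up x2's influence too.
print(np.polyfit(x1, y, 1)[0])  # ~-0.1, not the pure effect of 2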
11.
a.
Given k = 5, there are six parameters to be estimated (don't forget $\beta_0$!). Hence the SSR is a function of $\{\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_5\}$. As a consequence, SSR must be partially differentiated with respect to each of these choice variables. By implication, there will be six normal equations (or first order conditions). They are listed on p. 74 for the curious.
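For reference, the six normal equations take the following form (my LaTeX rendering of the first order conditions listed on p. 74):
$$\sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_5 x_{i5} \right) = 0$$
$$\sum_{i=1}^{n} x_{ij} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_5 x_{i5} \right) = 0, \qquad j = 1, \ldots, 5.$$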
b.
SST measures variation in the dependent variable about its own mean. It is a constant;
the number of regressors has no effect on SST.
An additional regressor will reduce SSR (or possibly leave it the same, but reduction in
SSR is most likely). The reason this must be so is that when the sixth regressor is added, it must at the very least be the case that the same SSR that was found with five regressors could once again be achieved. This is because the computer has the option of choosing $\hat{\beta}_6 = 0$ and using the same set of $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_5$ as in the five-regressor case. This would then result in exactly the same SSR as in the five-regressor case. So what this says is that if the sixth regressor has no explanatory power whatsoever, then SSR will be the same as it was when only five regressors were included. On the other hand, if the sixth regressor has even a small amount of explanatory power, then $\hat{\beta}_6 \neq 0$. In this event, the sixth regressor explains some of the variation in y, which implies some reduction in the unexplained variation in y measured by SSR.
If a sixth regressor is added, then $R^2$ cannot decrease and will most likely increase rather than stay the same. $R^2$ will increase as long as SSR decreases when the sixth regressor is added. This follows because $R^2 = 1 - \dfrac{SSR}{SST}$. Clearly, if SSR decreases and SST is constant, then the right-hand side must increase. Adding regressors will generally drive $R^2$ up; the real question is whether $R^2$ increases meaningfully.
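A simulated check of this claim (my addition; all numbers are arbitrary): adding a pure-noise regressor never lowers $R^2$.

import numpy as np

rng = np.random.default_rng(6)
n = 200
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)        # a regressor with no true explanatory power

def r_squared(X, y):
    # R^2 from OLS of y on X (X already includes the constant column).
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - ssr / sst

ones = np.ones(n)
print(r_squared(np.column_stack([ones, x1]), y))
print(r_squared(np.column_stack([ones, x1, junk]), y))   # never smaller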
Longer Problems
1.
a.
Note that $SST_x = \sum_i (x_i - \bar{x})^2$ and $y_i = \beta_0 + \beta_1 x_i + u_i$. Substituting for $y_i$ in the numerator and for $\sum_i (x_i - \bar{x})^2$ in the denominator of the term after the second equality yields:
$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{SST_x}.$$
Doing the multiplication implied inside the summation gives:
$$\hat{\beta}_1 = (1/SST_x) \sum_{i=1}^{n} \left\{ \beta_0 (x_i - \bar{x}) + \beta_1 x_i (x_i - \bar{x}) + (x_i - \bar{x}) u_i \right\}.$$
Next, invoke summation rule 3 to distribute the summation operator to the terms inside the sum and invoke summation rule 2 to move multiplicative constants outside of the relevant summation operator. These actions give the following result:
$$\hat{\beta}_1 = (1/SST_x) \left\{ \beta_0 \sum_{i=1}^{n} (x_i - \bar{x}) + \beta_1 \sum x_i (x_i - \bar{x}) + \sum (x_i - \bar{x}) u_i \right\}.$$
Note that the first sum is 0 since $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$. Furthermore, note that $\beta_1 \sum x_i (x_i - \bar{x}) = \beta_1 \sum (x_i - \bar{x})^2 = \beta_1 SST_x$. Substituting these results into $\hat{\beta}_1$ and re-arranging yields the desired expression:
$$\hat{\beta}_1 = \beta_1 \frac{SST_x}{SST_x} + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x} = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x}.$$
b.
To prove that $\hat{\beta}_1$ is unbiased, we must show that $E(\hat{\beta}_1) = \beta_1$. Assume a given sample of data; hence we'll take expectations conditional on the sampled values of the regressor. Now, take the expectation on both sides of the most recent expression for $\hat{\beta}_1$:
$$E(\hat{\beta}_1) = E\left[ \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) u_i}{SST_x} \right].$$
From the formula sheet, apply rule of expectation #2 to the right-hand-side expression. This yields:
$$E(\hat{\beta}_1) = E(\beta_1) + E\left[ \frac{\sum (x_i - \bar{x}) u_i}{SST_x} \right].$$
$\beta_1$ is a parameter and, therefore, is constant. Therefore $E(\beta_1) = \beta_1$, by expectation rule #1. Moreover, given that we're conditioning on the sampled values of x, $SST_x$ is treated as a constant, as is any $(x_i - \bar{x})$ in the numerator of the second term on the right-hand side. Now apply rule of expectation #3 to the second term on the right-hand side. This results in:
$$E(\hat{\beta}_1) = \beta_1 + (1/SST_x) \sum_{i=1}^{n} (x_i - \bar{x}) E(u_i).$$
By SLR.4, $E(u|x) = E(u) = 0$, therefore:
$$E(\hat{\beta}_1) = \beta_1 + (1/SST_x) \left\{ 0 \cdot \sum_{i=1}^{n} (x_i - \bar{x}) \right\} = \beta_1.$$
Invoking SLR.2, the above result holds for any randomly selected sample, hence $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$.
c.
Answers will vary, but here's how I would answer. $\hat{\beta}_1$ is unlikely to equal $\beta_1$ for any particular random sample of size n that we might draw. The unbiasedness property implies, however, that if we repeatedly drew random samples of size n from the population and ran the regression for each such sample, we would then find that the average value of $\hat{\beta}_1$ across the many samples of size n would equal the parameter $\beta_1$.
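A Monte Carlo rendition of that thought experiment (my addition; the true values are arbitrary):

import numpy as np

rng = np.random.default_rng(7)
beta1, n, reps = 0.5, 30, 20_000

slopes = []
for _ in range(reps):
    x = rng.normal(size=n)
    y = 2 + beta1 * x + rng.normal(size=n)
    slopes.append(np.polyfit(x, y, 1)[0])

# Individual estimates scatter around beta1; their average matches it.
print("average slope estimate:", np.mean(slopes))   # ~0.5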
2.
a.
By SLR.5, $Var(u|x) = \sigma^2$. Furthermore, it follows from the general definition of variance that $Var(u|x) = E[(u - E(u|x))^2 | x] = E(u^2|x) - [E(u|x)]^2$. By virtue of SLR.4, $[E(u|x)]^2 = 0$, therefore $Var(u|x) = \sigma^2 = E(u^2|x)$. Since the conditional variance of u given any x is the constant $\sigma^2$, it follows that the unconditional expectation of $u^2$, i.e., $E(u^2)$, is equal to $\sigma^2$. In other words, the error variance, $\sigma^2$, is the average squared error in the population. We don't observe the population; we observe the sample. Furthermore, we don't observe the errors in the sample; rather, we observe the residuals. We nevertheless think of the residuals as estimates of the errors. Hence a reasonable approach to estimating the average squared error in the population is to use the average squared residual in the sample.
b.
To show that the proposed estimator is biased, show that $E\left[ \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n} \right] \neq \sigma^2$:
$$E\left[ \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n} \right] = (1/n) E\left[ \sum_{i=1}^{n} \hat{u}_i^2 \right] = (1/n)(n-2)\sigma^2 = \frac{n-2}{n}\sigma^2 \neq \sigma^2.$$
What this shows is that the average of the average squared residual across repeated samples of size n does not equal the parameter it is intended to estimate. Therefore such an estimator is deemed to be biased.
c.
$\frac{n-2}{n}\sigma^2 < \sigma^2$; therefore, the proposed estimator would systematically under-estimate the true error variance in repeated sampling.
d.
As n increases, $\frac{n-2}{n} \to 1$; hence the degree of bias decreases as the sample size gets larger. Recall the two examples given in class. First, consider a really small sample, say, n = 3. If we repeatedly drew samples of n = 3 from the population, the average value of the proposed estimator would be $\frac{3-2}{3}\sigma^2 = (1/3)\sigma^2 = .333\sigma^2$. In other words, in this scenario the estimator is very likely to be much smaller than the true error variance. Second, consider a relatively large sample of, say, n = 1000. If we repeatedly drew samples of this size from the population, the average value of the proposed estimator across such samples would be $\frac{1000-2}{1000}\sigma^2 = (998/1000)\sigma^2 = .998\sigma^2 \approx \sigma^2$. In the second scenario, the average value of the proposed estimator and the true error variance are virtually indistinguishable. A moral of this story is that the degrees-of-freedom correction is important in small-sample settings but not in large-sample settings.
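A simulation of parts (b)-(d) (my addition; $\sigma^2 = 4$ and the other numbers are arbitrary): dividing SSR by n under-estimates $\sigma^2$ by the factor $(n-2)/n$, while dividing by $n - 2$ gets it right on average.

import numpy as np

rng = np.random.default_rng(8)
sigma2, n, reps = 4.0, 10, 50_000

biased, unbiased = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    y = 1 + 2 * x + rng.normal(scale=np.sqrt(sigma2), size=n)
    b1, b0 = np.polyfit(x, y, 1)
    ssr = np.sum((y - (b0 + b1 * x)) ** 2)
    biased.append(ssr / n)           # the proposed estimator
    unbiased.append(ssr / (n - 2))   # degrees-of-freedom corrected

print(np.mean(biased))     # ~ (n-2)/n * 4 = 3.2
print(np.mean(unbiased))   # ~ 4.0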
3.
a.
$\widehat{math10} = 37.36 + .00246\, expend$
$n = 408$, $R^2 = .033$
b.
$\hat{\beta}_1 = .00246$ implies that one extra dollar spent per student in the school district is associated with an increase of .00246 of a percentage point in the pass rate of the test. Not much!
c.
According to the table of regression results, $SE(\hat{\beta}_1) = .00066$. In other words, if we repeatedly sampled, we estimate the typical amount of discrepancy between $\beta_1$ and $\hat{\beta}_1$ to be .00066. This seems like a pretty good degree of precision!
d.
$R^2 = .033$ implies that 3.3% of the total variation in the pass rate on the test is explained by variation in school district spending per student. So variation in the pass rate is explained largely by factors other than school district spending. Hmmm.
e.
$\hat{\sigma} = 10.332$. This can be read directly from the regression output at the entry for Root MSE. Alternatively, the value could be calculated from the ANOVA table as the square root of 106.748728, which is the entry corresponding to the Mean Square Residual. $\hat{\sigma}$ is otherwise known as the standard error of the regression. It aims to estimate the typical amount of vertical distance between the observed and the predicted value of the dependent variable.
f.
Recall from Problem Set 2, Question 7 that linear transformation of the dependent variable to $c_1 y$ and of the independent variable to $c_2 x$ would yield a slope coefficient of $\tilde{\beta}_1 = \frac{c_1}{c_2} \hat{\beta}_1$ and an intercept coefficient of $\tilde{\beta}_0 = c_1 \hat{\beta}_0$, where $\hat{\beta}_1$ and $\hat{\beta}_0$ are the relevant regression coefficients from the original regression. In the problem at hand, the dependent variable is not transformed, hence $c_1 = 1$. The independent variable expend is transformed by dividing by 100, putting expend in hundred-dollar units rather than dollar units. Accordingly, $c_2 = 1/100$. The implication is that the intercept in the regression using the transformed independent variable is the same as in the original regression, i.e., $\tilde{\beta}_0 = 1 \cdot \hat{\beta}_0 = 37.36$. In other words, whether we measure spending per student in dollar units or hundred-dollar units, the predicted pass rate is 37.36% if expenditure per student is 0. The implication for the slope coefficient in the new regression is $\tilde{\beta}_1 = \frac{1}{(1/100)} \times .00246 = 100 \times .00246 = .246$. In other words, dividing expend by 100 has the effect of moving the decimal point over two places. The interpretation of the slope coefficient in the transformed regression is that a $100 increase in spending per student is associated with a .246 percentage point increase in the test's pass rate. It should, on reflection, seem that the best way to define the spending variable here is to use the measure set in hundred-dollar units. With the variable defined as it was originally, the interpretation should seem a bit strained. Relative to the mean of expend (= $4,376.58), a $1 change is infinitesimal, as is the implied impact on the pass rate of .00246 of a percentage point. The moral of this story is that proper choice of units of measurement for the variables can lead to simpler, more straightforward interpretation of resulting regression coefficients.
$R^2$ remains at .033 in the transformed regression, since linear transformation of the variables does not alter the relative amount of variation in the dependent variable explained by variation in the independent variable.
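A simulated check of the rescaling results (my addition; the data below are fabricated for illustration, not the actual course data set):

import numpy as np

rng = np.random.default_rng(9)
n = 408
expend = rng.uniform(3000, 7000, size=n)                        # dollars
math10 = 37.36 + 0.00246 * expend + rng.normal(scale=10, size=n)

b1, b0 = np.polyfit(expend, math10, 1)
b1_new, b0_new = np.polyfit(expend / 100, math10, 1)
print(b0, b1)          # intercept ~37.36, slope ~0.00246
print(b0_new, b1_new)  # same intercept, slope ~0.246 (100 times larger)

# R^2 is unchanged by the rescaling (in simple regression, R^2 = corr^2).
for x in (expend, expend / 100):
    print(np.corrcoef(x, math10)[0, 1] ** 2)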