Economics 405
Due—October 16 at 6:30 pm.
Problem Set #2
Instructions: Show all of your work and give complete explanations. As noted on the
course syllabus, the problem set will be graded on a credit/no-credit basis. You
must give a substantive answer to each question/problem. Non-attempts and weak
attempts will be docked accordingly. Group work is discouraged. To the extent
that you do work with a colleague, be absolutely sure to give answers in your own
words. Duplicate answers will automatically be assigned a 0.0.
1.
The total sum of squares is defined as $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$.
a.
Show that $SST = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + 2\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$.
$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2 = \sum [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$$
$$= \sum \{[(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})][(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]\}$$
$$= \sum \{(y_i - \hat{y}_i)^2 + (\hat{y}_i - \bar{y})^2 + 2(y_i - \hat{y}_i)(\hat{y}_i - \bar{y})\}$$
Then applying Property Sum.3 (p.708), we obtain:
$$SST = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + 2\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$$
b.
Show that the last term on the right-hand-side of a.), $2\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$, equals 0. (Hint: First show that $\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = \sum_{i=1}^{n}\hat{u}_i(\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y})$, then apply the algebraic properties of OLS in conjunction with the rules of summation.)
By definition, $\hat{u}_i = y_i - \hat{y}_i$ and $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. Substituting the residual into the expression inside the first set of brackets and the predicted value of y into the second set of brackets, we get the hint. Therefore:
$$\sum_{i=1}^{n}\hat{u}_i(\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y}) = \sum_{i=1}^{n}\{\hat{\beta}_0\hat{u}_i + \hat{\beta}_1 x_i\hat{u}_i - \bar{y}\hat{u}_i\}.$$
Recognizing that $\hat{\beta}_0$, $\hat{\beta}_1$, and $\bar{y}$ are constants in a given sample and applying Property Sum.3, we obtain:
$$\sum_{i=1}^{n}\{\hat{\beta}_0\hat{u}_i + \hat{\beta}_1 x_i\hat{u}_i - \bar{y}\hat{u}_i\} = \hat{\beta}_0\sum_{i=1}^{n}\hat{u}_i + \hat{\beta}_1\sum_{i=1}^{n}x_i\hat{u}_i - \bar{y}\sum_{i=1}^{n}\hat{u}_i$$
The sum of the OLS residuals equals zero (see p. 40 of the text), therefore terms 1 and 3 must both equal 0. Furthermore, $\sum_{i=1}^{n}x_i\hat{u}_i = 0$ follows from the second first-order condition for the minimization of SSR (see p. 40 or p. 30), therefore the second term in the expression above must also equal zero. Thus:
$$2\sum_{i=1}^{n}(y_i - \hat{y}_i)(\hat{y}_i - \bar{y}) = 2 \cdot 0 = 0.$$
2.
The OLS estimator of the slope coefficient is
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$
a.
Show algebraically that
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n}(x_i - \bar{x})u_i}{SST_x}.$$
See pp. 53-54 in the text or lecture notes from the second half of class on
September 25.
b.
Using the expression in a.), explain why it is unlikely that $\hat{\beta}_1 = \beta_1$ for a given sample of data.
There is nothing to force the second element on the right hand side of
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n}(x_i - \bar{x})u_i}{SST_x}$$
to add up to zero. $u_i$ is the population error.
While on average it is equal to zero, any given value can be positive or
negative. Therefore, over a sample of n observations, the sum of the
product of the mean deviations of the regressor and the population errors
can be positive or negative depending on the sample of data.
c.
Given the Simple Linear Regression Assumptions (SLR.1 – SLR.4), show
that ˆ1 is an unbiased estimator of  1 .
Proof is given in Theorem 2.1 on p. 54 of the text and in the lecture notes
from the second half of class on September 25.
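To see parts b.) and c.) in action, here is a small simulation sketch (illustrative only; it assumes numpy and a made-up data-generating process, not anything from the text). In every simulated sample the identity from part a.) holds exactly and its second term is generally nonzero, yet the slope estimates average out to the true $\beta_1$:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, n = 1.0, 0.5, 100
slopes = []
for _ in range(5000):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 2, size=n)          # E(u|x) = 0 holds here
    y = beta0 + beta1 * x + u
    sst_x = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
    # identity from part a.): b1 = beta1 + sum((x - xbar) * u) / SST_x
    assert np.isclose(b1, beta1 + np.sum((x - x.mean()) * u) / sst_x)
    slopes.append(b1)

print(np.mean(slopes))   # close to 0.5: unbiased on average (part c.)
print(slopes[:3])        # but any individual estimate differs from 0.5 (part b.)
```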
d.
Under what circumstances is the zero conditional mean assumption not
valid? If the zero conditional mean assumption is not valid, does that
imply that ˆ1 is a biased estimator of  1 ? Why or why not?
The zero conditional mean assumption is not valid when the covariance
between the regressor and the error is not equal to zero. The error contains
factors that determine the dependent variable which have not been
included in the systematic component of the regression model. To the
extent that any of these factors are correlated with the regressor, the zero
conditional mean assumption will not be valid.
If the zero conditional mean assumption is not valid, then $E(u|x) \neq 0$. As a result,
$$E(\hat{\beta}_1) = \beta_1 + (1/SST_x)\sum_{i=1}^{n}(x_i - \bar{x})E(u_i|x) \neq \beta_1,$$
since the sum term does not equal zero when the zero conditional mean assumption is not valid. Specifically, if x and u are
positively correlated, then the products inside the sum will tend to be
positive. Likewise, if correlation between x and u is negative, then the
products inside the sum will tend to be negative.
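A companion sketch (again illustrative, with a made-up data-generating process and numpy assumed) shows the bias when the error is built to be positively correlated with the regressor; the average slope estimate settles above the true value, matching the sign argument above:

```python
import numpy as np

rng = np.random.default_rng(2)
beta1, n = 0.5, 100
slopes = []
for _ in range(5000):
    x = rng.uniform(0, 10, size=n)
    # Violate the zero conditional mean assumption: u depends on x
    u = 0.3 * (x - 5) + rng.normal(0, 2, size=n)
    y = 1.0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b1)

print(np.mean(slopes))   # roughly 0.8 = 0.5 + 0.3: biased upward
```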
3.
a.
If the errors, ui, have constant variance regardless of the value of X, then
we say that they are homoskedastic. True or false? Explain and illustrate
with the appropriate graph.
True. By definition, the errors are homoskedastic if $Var(u|x) = \sigma^2$. This
says that the variance of the errors is the same regardless of the value of x.
See Figure 2.8 on p. 58.
b.
If the errors, $u_i$, are heteroskedastic, then the OLS estimators, $\hat{\beta}_0$ and $\hat{\beta}_1$,
are biased. True or false? Explain.
This statement is false. Unbiasedness of the OLS estimators requires
assumptions SLR.1-SLR.4. The homoskedasticity assumption SLR.5 is
not necessary to show unbiasedness of the OLS estimators.
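To illustrate part b.) concretely, the same kind of simulation sketch can be rerun with heteroskedastic errors (made-up data, numpy assumed); the slope estimates still average out to the true value because unbiasedness only relies on SLR.1 – SLR.4:

```python
import numpy as np

rng = np.random.default_rng(3)
beta1, n = 0.5, 100
slopes = []
for _ in range(5000):
    x = rng.uniform(1, 10, size=n)
    u = rng.normal(0, 0.5 * x)    # heteroskedastic: sd grows with x, but E(u|x) = 0
    y = 1.0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    slopes.append(b1)

print(np.mean(slopes))   # still close to 0.5: heteroskedasticity does not cause bias
```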
4.
Show that $\frac{\sum_{i=1}^{n}\hat{u}_i^2}{n}$ is a biased estimator of the error variance, $Var(u)$. Is this estimator biased in the downward direction or the upward direction? Explain.
The claim was made in class on October 2 (the claim’s proof is part of the proof of Theorem 2.3 on p. 62 of the text) that $E\left(\sum_{i=1}^{n}\hat{u}_i^2\right) = (n-2)\sigma^2$. Therefore,
$$E\left(\frac{\sum_{i=1}^{n}\hat{u}_i^2}{n}\right) = \frac{1}{n}E\left(\sum_{i=1}^{n}\hat{u}_i^2\right) = \frac{1}{n}(n-2)\sigma^2 = \frac{n-2}{n}\sigma^2 < \sigma^2.$$
The punch line here is that the sample average of the squared residuals is biased in the downward direction, i.e., it would systematically underestimate the true population variance. Note that with a small sample, this would be a big problem. For example, n = 3 => (n-2)/n = 1/3 = .333. With a large sample, there wouldn’t be much bias at all. For example, n = 1000 => (n-2)/n = .998 ≈ 1.
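The (n-2)/n factor also shows up in a short simulation sketch (illustrative only, made-up data, numpy assumed): with a small n, averaging $\sum\hat{u}_i^2/n$ across many samples lands well below the true $\sigma^2$, while dividing by n - 2 recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, n = 4.0, 10                     # true error variance and a (small) sample size
biased, unbiased = [], []
for _ in range(20000):
    x = rng.uniform(0, 10, size=n)
    y = 1.0 + 0.5 * x + rng.normal(0, np.sqrt(sigma2), size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    uhat = y - (y.mean() - b1 * x.mean()) - b1 * x      # OLS residuals
    biased.append(np.sum(uhat ** 2) / n)                # divides by n
    unbiased.append(np.sum(uhat ** 2) / (n - 2))        # divides by n - 2

print(np.mean(biased))     # about (n-2)/n * 4 = 3.2: underestimates sigma^2
print(np.mean(unbiased))   # about 4.0
```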
5.
I obtained a sample of 88 home prices (measured in thousands of dollars). I
regressed the house sale price (price) on house size (sqrft, measured in hundreds
of square feet, e.g., 2300 square feet implies sqrft = 23.0). Here’s what I
found:
Dependent Variable: PRICE
Method: Least Squares
Sample: 1 88
Included observations: 88

Variable                Coefficient     Std. Error      t-Statistic     Prob.
SQRFT                   14.02110        ________        ________        0.0000
C                       11.20414        24.74261        0.452828        0.6518

R-squared               0.620797     Mean dependent var      293.5460
Adjusted R-squared      0.616387     S.D. dependent var      102.7134
S.E. of regression      ________     Akaike info criterion   11.16611
Sum squared resid       348053.4     Schwarz criterion       11.22241
Log likelihood         -489.3087     F-statistic             140.7913
Durbin-Watson stat      1.728723     Prob(F-statistic)       0.000000

a.
Write down the fitted model.
$$\widehat{price} = 11.204 + 14.021\,sqrft, \qquad R^2 = .621$$
b.
Determine $\hat{\sigma}$. Interpret the value you obtain (be sure to make your interpretation relative to the standard deviation of the dependent variable).
$$\hat{\sigma} = \sqrt{\frac{SSR}{n-2}} = \sqrt{\frac{348053.4}{88-2}} = 63.617$$
The standard deviation of the dependent variable is 102.713, which, for
this sample, implies that the typical amount of deviation from the mean for
a given observation is $102,713 (since the dependent variable is measured
in thousands of dollars). The standard error of the regression, at 63.617,
implies that the typical amount of discrepancy between an observed value
and the predicted value based on the regression relationship is $63,617. In
other words, the typical amount of deviation in an observation due to
unobservable factors is $63,617.
c.
It turns out that $SST_x = 2898.406$. Determine $SE(\hat{\beta}_1)$. Interpret the value you obtain.
$$SE(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{SST_x}} = \frac{63.617}{\sqrt{2898.406}} = 1.182.$$
The point estimate of $\beta_1$ is 14.021. The standard error of $\hat{\beta}_1$ gives us an idea of how precise the estimate of the parameter is. The closer the standard error is to 0, the more precise the estimate. In the problem at hand, we have estimated that an extra hundred square feet in house size is associated with a $14,021 increase in house price. The standard error of the estimate indicates that the true impact of a hundred square foot increase in house size on price could be as much as $15,000+ or as little as a bit less than $13,000.
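For reference, the two blanked-out quantities in the EViews table can be reproduced from the reported numbers with a couple of lines (a sketch; it only uses the SSR and n from the output and the SSTx value given in part c.)):

```python
import numpy as np

ssr = 348053.4        # Sum squared resid from the EViews output
n = 88                # Included observations
sst_x = 2898.406      # given in part c.)

sigma_hat = np.sqrt(ssr / (n - 2))      # standard error of the regression
se_b1 = sigma_hat / np.sqrt(sst_x)      # standard error of the slope

print(round(sigma_hat, 3))   # 63.617
print(round(se_b1, 3))       # 1.182
```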
6.
Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the intercept and slope from the simple regression of $y_i$ on $x_i$, using n observations. Let $c_1$ and $c_2$, with $c_2 \neq 0$, be constants. Let $\tilde{\beta}_0$ and $\tilde{\beta}_1$ be the intercept and slope from the regression of $c_1 y_i$ on $c_2 x_i$.
a.
Show that $\tilde{\beta}_1 = (c_1/c_2)\hat{\beta}_1$ and $\tilde{\beta}_0 = c_1\hat{\beta}_0$. [Hint: To obtain $\tilde{\beta}_1$, plug $c_1 y_i$ and $c_2 x_i$ into equation (2.19) from the text. Then, use equation (2.17) from the text for $\tilde{\beta}_0$, being sure to plug in $c_1 y_i$ and $c_2 x_i$ and $\tilde{\beta}_1$.]
The formula for the OLS slope coefficient estimator is $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$. In this problem the dependent variable is $c_1 y_i$ and the regressor is $c_2 x_i$. Note that the mean of the dependent variable is $\frac{\sum c_1 y_i}{n} = c_1\frac{\sum y_i}{n} = c_1\bar{y}$ and, by similar reasoning, the mean of the regressor is $c_2\bar{x}$. Plugging into the formula for the OLS estimator we get:
$$\tilde{\beta}_1 = \frac{\sum (c_2 x_i - c_2\bar{x})(c_1 y_i - c_1\bar{y})}{\sum (c_2 x_i - c_2\bar{x})^2} = \frac{\sum c_1 c_2 (x_i - \bar{x})(y_i - \bar{y})}{\sum c_2^2 (x_i - \bar{x})^2} = \frac{c_1 c_2}{c_2^2}\cdot\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{c_1}{c_2}\hat{\beta}_1$$
The general formula for the OLS intercept estimator is $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$. Recalling the definitions of the dependent variable and the regressor in the case at hand and plugging into the general formula, the intercept estimator is
$$\tilde{\beta}_0 = c_1\bar{y} - \tilde{\beta}_1 c_2\bar{x} = c_1\bar{y} - \frac{c_1}{c_2}\hat{\beta}_1 c_2\bar{x} = c_1\bar{y} - c_1\hat{\beta}_1\bar{x} = c_1(\bar{y} - \hat{\beta}_1\bar{x}) = c_1\hat{\beta}_0.$$
b.
Using the Ceosal1 data from the textbook’s data files, I regressed salary
on roe and obtained
$$\widehat{salary} = 963.19 + 18.501\,roe, \qquad R^2 = .013$$
Let c1 = 1000 and c2 = 1/100. What do the results from part a.) imply
regarding the estimated slope and intercept coefficients from a regression
of c1 yi on c2 xi ? With the proposed transformations of x and y, what units
are the variables measured in?
$$\tilde{\beta}_1 = \left(\frac{c_1}{c_2}\right)\hat{\beta}_1 = \left(\frac{1000}{1/100}\right)\times 18.501 = 1,850,100$$
$$\tilde{\beta}_0 = c_1\hat{\beta}_0 = 1000 \times 963.19 = 963,190$$
Since salary is measured in thousands of dollars, the transformation puts
the dependent variable into dollars. Since roe is measured in percentage
points, the transformation puts the regressor into decimal terms. For
example, if roe = 20.0, then the transformation implies c2xi = .20.
$\hat{\beta}_1 = 18.501$ implies that a 1 percentage point increase in roe raises CEO salary by $18,501. $\tilde{\beta}_1 = 1,850,100$ implies that a 1 unit change in $c_2 x$ raises CEO salary (measured in dollars) by $1,850,100. A 1 unit change in the transformed regressor, however, is much too large a change to consider. That would be like going from a return on equity of 20% to 120%! What’s a more reasonable change to look at? A 1 percentage point change in the transformed regressor would be .01 (e.g., .20 to .21). Therefore, the predicted increase in salary (measured in dollars) due to a .01 change in $c_2 x$ is $\Delta(c_1 y) = \tilde{\beta}_1\cdot\Delta(c_2 x) = 1,850,100 \times .01 = 18,501$ dollars. If this looks like the interpretation for $\hat{\beta}_1$, it should, since they are the same! In other words, the transformations employed here do not alter the fundamental relationship between the dependent and independent variables.
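The scaling results from part a.), and the R² claim that comes up in part d.), can be checked on any data set; below is a minimal sketch with made-up data standing in for the Ceosal1 variables (numpy assumed, names illustrative):

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope, R-squared) for a simple regression of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

rng = np.random.default_rng(5)
x = rng.uniform(0, 30, size=200)                 # stand-in for roe
y = 900 + 20 * x + rng.normal(0, 500, size=200)  # stand-in for salary

c1, c2 = 1000, 1 / 100
b0, b1, r2 = ols(x, y)
b0_t, b1_t, r2_t = ols(c2 * x, c1 * y)

print(np.isclose(b1_t, (c1 / c2) * b1))   # True: slope scaled by c1/c2
print(np.isclose(b0_t, c1 * b0))          # True: intercept scaled by c1
print(np.isclose(r2_t, r2))               # True: R-squared unchanged
```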
c.
Use Eviews to actually run the regression of c1 yi on c2 xi proposed in b.
Note that you’ll have to GENR the transformed variables. Attach your
computer output. Does the regression confirm your claim in part b? (It
should.)
Dependent Variable: C1Y
Method: Least Squares
Date: 10/15/07   Time: 06:13
Sample: 1 209
Included observations: 209

Variable                Coefficient     Std. Error      t-Statistic     Prob.
C2X                     1850119.        1112325.        1.663290        0.0978
C                       963191.3        213240.3        4.516930        0.0000

R-squared               0.013189     Mean dependent var      1281120.
Adjusted R-squared      0.008421     S.D. dependent var      1372345.
S.E. of regression      1366555.     Akaike info criterion   31.10301
Sum squared resid       3.87E+14     Schwarz criterion       31.13499
Log likelihood         -3248.264     F-statistic             2.766532
Durbin-Watson stat      2.104990     Prob(F-statistic)       0.097768
The claims in b.) are confirmed.
d.
Compare the R2 of the new regression with the original regression. Which
is greater? Explain.
The R²s of the two regressions are the same. This shows that changing the
definitions of the dependent and independent variables by linear
transformation has no impact on the goodness of fit of the model. The
underlying amount of variation in the dependent variable explained by
variation in the regressor remains constant.
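One way to see this directly (a short sketch using the scaling results from part a.), under which the fitted values of the transformed regression are $c_1\hat{y}_i$): both the residual and total sums of squares pick up the same factor of $c_1^2$, which cancels in R²:
$$\tilde{R}^2 = 1 - \frac{\sum_{i=1}^{n}(c_1 y_i - c_1\hat{y}_i)^2}{\sum_{i=1}^{n}(c_1 y_i - c_1\bar{y})^2} = 1 - \frac{c_1^2\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{c_1^2\sum_{i=1}^{n}(y_i - \bar{y})^2} = R^2.$$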
7.
Use Eviews to access Wage2 from the textbook’s data files. The variables of
interest are monthly salary (wage) and IQ score (IQ).
a.
Use EViews to generate the descriptive statistics table for wage and IQ.
Attach the printout. What are average wage and average IQ in the
sample? What is the sample standard deviation of IQ?
Date: 10/15/07   Time: 06:24
Sample: 1 460

                        WAGE            IQ
Mean                    1005.246        104.7196
Median                  960.0000        106.0000
Maximum                 3078.000        145.0000
Minimum                 233.0000        59.00000
Std. Dev.               409.6735        13.82357
Skewness                1.316589       -0.295697
Kurtosis                6.029875        2.884268

Jarque-Bera             308.8473        6.960182
Probability             0.000000        0.030805

Observations            460             460

The average wage in the sample is $1005.25 per month and the average IQ is 104.72. The sample standard deviation of IQ is 13.82.
b.
Estimate a level-level model with wage as the dependent variable and IQ
as the regressor. Use your estimated model to determine the predicted
increase in wage for a 15 point increase in IQ. Does IQ account for most
of the variation in wage? Explain.
Dependent Variable: WAGE
Method: Least Squares
Date: 10/15/07   Time: 06:27
Sample: 1 460
Included observations: 460

Variable                Coefficient     Std. Error      t-Statistic     Prob.
IQ                      8.096739        1.332109        6.078137        0.0000
C                       157.3586        140.7054        1.118355        0.2640

R-squared               0.074642     Mean dependent var      1005.246
Adjusted R-squared      0.072622     S.D. dependent var      409.6735
S.E. of regression      394.5175     Akaike info criterion   14.79754
Sum squared resid       71284974     Schwarz criterion       14.81550
Log likelihood         -3401.435     F-statistic             36.94375
Durbin-Watson stat      1.794438     Prob(F-statistic)       0.000000
The fitted model is:
$$\widehat{wage} = 157.36 + 8.097\,IQ, \qquad R^2 = .075$$
$\Delta\widehat{wage} = \hat{\beta}_1\,\Delta IQ = 8.097 \times 15 = 121.45$. A 15 point increase in IQ raises predicted monthly earnings by $121.45. Since $R^2 = .075$, variation in IQ scores does not account for much variation in monthly earnings.
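The arithmetic behind the $121.45 figure, spelled out as a tiny sketch (coefficients taken from the output above; the starting IQ value is just the sample mean from part a.) and does not affect the difference):

```python
b0, b1 = 157.3586, 8.096739   # intercept and IQ coefficient from the level-level regression
iq0 = 104.7196                # sample mean IQ (illustrative starting point)
wage_lo = b0 + b1 * iq0
wage_hi = b0 + b1 * (iq0 + 15)
print(round(wage_hi - wage_lo, 2))   # 121.45: predicted change in monthly wage (dollars)
```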
c.
GENR a new variable that gives the logarithm of wage. Call this new
variable logwage. (Note I want you to perform this step despite the fact
that the data file already contains the logarithm of wage (lwage)). Attach
the descriptive statistics table for logwage.
Date: 10/15/07   Time: 06:47
Sample: 1 460

                        LOGWAGE         WAGE
Mean                    6.835596        1005.246
Median                  6.866931        960.0000
Maximum                 8.032035        3078.000
Minimum                 5.451038        233.0000
Std. Dev.               0.396488        409.6735
Skewness               -0.138713        1.316589
Kurtosis                3.399628        6.029875

Jarque-Bera             4.536122        308.8473
Probability             0.103513        0.000000

Observations            460             460
d.
Estimate a log-level model, using logwage as the dependent variable and
IQ as the regressor. Attach the printout and write down the fitted model.
If IQ increases by 15 points, what is the approximate percentage increase
in predicted wage? Explain.
Dependent Variable: LOGWAGE
Method: Least Squares
Date: 10/15/07   Time: 06:50
Sample: 1 460
Included observations: 460

Variable                Coefficient     Std. Error      t-Statistic     Prob.
IQ                      0.007968        0.001287        6.188747        0.0000
C                       6.001208        0.135991        44.12959        0.0000

R-squared               0.077172     Mean dependent var      6.835596
Adjusted R-squared      0.075157     S.D. dependent var      0.396488
S.E. of regression      0.381298     Akaike info criterion   0.913866
Sum squared resid       66.58771     Schwarz criterion       0.931828
Log likelihood         -208.1892     F-statistic             38.30059
Durbin-Watson stat      1.795664     Prob(F-statistic)       0.000000
The fitted model is:
$$\widehat{\log(wage)} = 6.001 + 0.00797\,IQ, \qquad R^2 = .077$$
In a log-level model, the approximate percent change in the dependent variable due to a 1-unit change in the regressor is $100\cdot\hat{\beta}_1$. Therefore, the regression model here implies that monthly earnings increase by about .797% for a 1 point increase in IQ. Accordingly, a 15 point increase in IQ is predicted to increase earnings by approximately $.797\% \times 15 \approx 11.95\%$ (i.e., $100\,\hat{\beta}_1\cdot\Delta IQ \approx \%\Delta wage$).
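Note that the 11.95% figure uses the $100\cdot\hat{\beta}_1$ approximation; for a change this large, the exact percentage change implied by the fitted log-level model, $100\cdot[\exp(\hat{\beta}_1\,\Delta IQ) - 1]$, is slightly bigger. A short sketch comparing the two (coefficient taken from the output above):

```python
import numpy as np

b1 = 0.007968                          # IQ coefficient from the log-level regression
d_iq = 15
approx = 100 * b1 * d_iq               # approximate percent change in wage
exact = 100 * (np.exp(b1 * d_iq) - 1)  # exact percent change implied by the fitted model
print(round(approx, 2))   # 11.95
print(round(exact, 2))    # about 12.7: the approximation understates the increase slightly
```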