Lecture 8

Econ 140
Classical Regression II
The story so far...
• We learned how to compute least squares estimates
• We talked about the assumptions underlying the CLRM:
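1) Y and ε are random variables
2) $X_i$ is nonrandom (it’s given)
3) $E(\varepsilon_i) = E(\varepsilon_i \mid X_i) = 0$
4) $V(\varepsilon_i) = V(\varepsilon_i \mid X_i) = \sigma^2$
5) $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$
Be clear about the difference between the error term $\varepsilon_i$ and the residual $e_i$.
Note that a and b (also denoted $\hat{\alpha}$ and $\hat{\beta}$) are estimates
of $\alpha$ and $\beta$; they are also random variables and have
sampling distributions.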
Today’s Plan
• Inference with the classical linear regression model
– Calculating the standard error
– Calculating the t-ratio
– Root-mean square error
– 95% confidence intervals
– ANOVA tables (ANOVA stands for analysis of variance)
Variation around the regression line
• The errors are iid and assumed normal:
[Figure: observed values Y1, Y2, Y3 scattered around the fitted line Ŷ at X1, X2, X3]
Sum of Squares Identity
• Let’s take one point, X1, and look at it graphically:
[Figure: at X1, three vertical distances are marked]
1. Total: $(Y - \bar{Y})$
2. Model, explained: $(\hat{Y} - \bar{Y})$
3. Residual, unexplained: $(Y - \hat{Y}) = e$
Sum of Squares Identity (2)
• The Sum of Squares Identity is
Total = Explained + Unexplained
or
$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$
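A sketch of why the identity holds, assuming the fitted values come from a least squares line with an intercept, so that the residuals $e = Y - \hat{Y}$ satisfy $\sum e = 0$ and $\sum Xe = 0$. The cross term drops out because $\hat{Y} - \bar{Y} = b(X - \bar{X})$:

```latex
\begin{align*}
\sum (Y - \bar{Y})^2
  &= \sum \left[ (\hat{Y} - \bar{Y}) + (Y - \hat{Y}) \right]^2 \\
  &= \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2
     + 2 \sum (\hat{Y} - \bar{Y})\, e \\
\sum (\hat{Y} - \bar{Y})\, e
  &= b \sum (X - \bar{X})\, e
   = b \left( \sum X e - \bar{X} \sum e \right) = 0
\end{align*}
```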
Sum of Squares Identity (3)
• $\sum (\hat{Y} - \bar{Y})^2$ reveals how much of the variation is explained by the regression line
• $\sum (Y - \hat{Y})^2$ reveals how much of the variation is not explained by the regression line, or is left over
– Notice that this is also equal to $\sum e^2$
• $\sum (Y - \bar{Y})^2$ reveals how much total variation there is
– remember in a previous lecture we said that $\sum (Y - \bar{Y}) = 0$
How to calculate sum of squares
• We can write the total sum of squares as
$\sum (Y - \bar{Y})^2 = \sum y^2$
– We’re given the Y values so we can compute $\bar{Y}$
• We can write the explained sum of squares as
$\sum (\hat{Y} - \bar{Y})^2$
– Calculating the ESS: $b \sum xy$
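A minimal sketch that checks the sum of squares identity and the shortcut ESS = b Σxy numerically. The data are toy values invented for illustration (not the lecture's spreadsheet), and the variable names are assumptions:

```python
# Toy data invented for illustration; these are not the lecture's numbers.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Lowercase x and y are deviations from the sample means.
x = [Xi - x_bar for Xi in X]
y = [Yi - y_bar for Yi in Y]

sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Least squares slope and intercept.
b = sum_xy / sum_x2
a = y_bar - b * x_bar
Y_hat = [a + b * Xi for Xi in X]

tss = sum(yi ** 2 for yi in y)                          # total
ess = sum((Yh - y_bar) ** 2 for Yh in Y_hat)            # explained
rss = sum((Yi - Yh) ** 2 for Yi, Yh in zip(Y, Y_hat))   # unexplained

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
print(f"ESS = {ess:.4f}, b * sum(xy) = {b * sum_xy:.4f}")
```

Both printed pairs should agree up to floating-point rounding.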
How to calculate sum of squares (4)
• We can calculate the unexplained variation (the unexplained sum of squares) as the difference between the total and the explained sum of squares:
$\sum e^2 = \sum y^2 - b \sum xy$
• Because we have to consider degrees of freedom when calculating each variance term, we divide each term of the sum of squares identity by its corresponding degrees of freedom:
– Total: $\frac{\sum (Y - \bar{Y})^2}{n - 1}$
– Explained: $\frac{\sum (\hat{Y} - \bar{Y})^2}{1}$
– Unexplained: $\frac{\sum (Y - \hat{Y})^2}{n - 2}$
How to calculate sum of squares (5)
• The residual variance of the regression line is
$\hat{\sigma}^2_{yx} = \frac{\sum e_i^2}{n - 2} = \frac{\sum (Y - \hat{Y})^2}{n - 2}$
• If we take the square root we get the root mean square error (root MSE):
$\sqrt{\hat{\sigma}^2_{yx}} = \hat{\sigma}_{yx}$
Calculating test statistics
• We can calculate test statistics from the sum of squares statistics
• The variance of b, the slope coefficient, is
$\hat{\sigma}_b^2 = \frac{\hat{\sigma}^2_{yx}}{\sum x^2}$
– Where $x = (X - \bar{X})$
Calculating test statistics (2)
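• The standard error of b is
$\hat{\sigma}_b = \sqrt{\hat{\sigma}_b^2} = \sqrt{\frac{\hat{\sigma}^2_{yx}}{\sum x^2}}$
• The variance of the intercept a is
$\hat{\sigma}_a^2 = \frac{\sum X^2}{n \sum x^2} \cdot \hat{\sigma}^2_{yx}$
• The standard error of a is
$\hat{\sigma}_a = \sqrt{\hat{\sigma}_a^2} = \sqrt{\frac{\sum X^2}{n \sum x^2} \cdot \hat{\sigma}^2_{yx}}$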
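The formulas for the root MSE and the standard errors of b and a can be collected into one small function. This is a sketch under the bivariate setup above; the function name `ols_standard_errors` and the toy data are hypothetical, chosen only for illustration:

```python
import math


def ols_standard_errors(X, Y):
    """Return slope, intercept, root MSE and standard errors for Y = a + bX.

    Requires more than two observations, since the residual variance
    divides by n - 2.
    """
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n

    # Sums in deviation form (lowercase x and y in the slides).
    sum_x2 = sum((Xi - x_bar) ** 2 for Xi in X)
    sum_y2 = sum((Yi - y_bar) ** 2 for Yi in Y)
    sum_xy = sum((Xi - x_bar) * (Yi - y_bar) for Xi, Yi in zip(X, Y))

    b = sum_xy / sum_x2
    a = y_bar - b * x_bar

    # Residual variance: unexplained sum of squares over n - 2.
    resid_var = (sum_y2 - b * sum_xy) / (n - 2)

    var_b = resid_var / sum_x2
    var_a = resid_var * sum(Xi ** 2 for Xi in X) / (n * sum_x2)

    return {
        "b": b,
        "a": a,
        "root_mse": math.sqrt(resid_var),
        "se_b": math.sqrt(var_b),
        "se_a": math.sqrt(var_a),
    }


# Toy data invented for illustration; these are not the lecture's numbers.
result = ols_standard_errors([1.0, 2.0, 3.0, 4.0, 5.0],
                             [2.0, 4.0, 5.0, 4.0, 5.0])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```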
Confidence intervals
• Once we have the standard errors, we can do two things:
– form a confidence interval
– perform a hypothesis test
• A confidence interval for the population slope β:
$\beta = b \pm t_{\alpha/2,\,df} \cdot \hat{\sigma}_b$
– Where df in a bivariate model is n − 2
• As with univariate cases, we can calculate a confidence interval for β in a bivariate case
Hypothesis testing
• Set up your null hypothesis and alternative
• Determine the critical region: choose a significance level (α)
• Using the relevant distribution, determine your critical (tabled) value ($Z_{\alpha/2}$ or $t_{\alpha/2}$ for the moment; $F_{df_1, df_2}$ and $\chi^2_n$ soon)
• For a given sample, compute the numeric value of the test statistic: Z*, t*, F* or χ²*
• Given the decision rule, determine whether or not to reject the null hypothesis
Hypothesis testing (2)
• For standard statistical packages, the null hypothesis is that the population parameter is zero, or
$H_0: \beta = 0$
• Most of the time we only have a sample and an estimate b
– we don’t know the actual population value β
• Sometimes the value of β is dictated by economic theory
– in that case, a value will be imposed on β, such as β = 1:
$H_0: \beta = 1$
Hypothesis testing (3)
• The standard t-ratio or t statistic is
$t = \frac{b - \beta}{\hat{\sigma}_b}$
• So if the null hypothesis dictates β = 0, the t-ratio becomes
$t = \frac{b - 0}{\hat{\sigma}_b}$
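The testing steps and the t-ratio can be sketched as a small function for a two-sided test. The function name and the example numbers are hypothetical; the critical value is assumed to be looked up in a t table for the chosen significance level and degrees of freedom:

```python
def two_sided_t_test(estimate, null_value, std_error, critical_value):
    """Compute t = (estimate - null_value) / std_error and apply the rule
    'reject the null if |t| exceeds the tabled critical value'."""
    t_stat = (estimate - null_value) / std_error
    reject = abs(t_stat) > critical_value
    return t_stat, reject


# Hypothetical numbers, for illustration only.
t_stat, reject = two_sided_t_test(estimate=0.5, null_value=0.0,
                                  std_error=0.2, critical_value=2.048)
print(f"t = {t_stat:.2f}, reject H0: {reject}")

# Same estimate with a larger standard error: the evidence is weaker.
t_weak, reject_weak = two_sided_t_test(estimate=0.5, null_value=0.0,
                                       std_error=0.4, critical_value=2.048)
print(f"t = {t_weak:.2f}, reject H0: {reject_weak}")
```

Passing `null_value=1.0` instead covers the case where economic theory imposes a value of one on the slope.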
Example
• Data on female earnings in Illinois {spreadsheet L8.xls}
• The variables include earnings, earnings weights, and years of education
• In this example, the first three columns represent the ‘population’. Select two samples of 30 at random from that population
• For the first sample:
– create log earnings (ln Y)
– compute the means of X and Y
– multiply ln Y by years of education (XY)
– square years of education (X²)
– sum XY and sum X²
• These provide all the statistics you need to calculate the least squares line
Example (2)
• I have also included an example of how to use Excel’s LINEST to calculate the regression line
• On the web you’ll find some output from Stata using the population and sample regressions from the Illinois data. Try the LINEST function and check that your output agrees with the output from Stata
• Let’s look at a graph of the sample and population regression lines
Example (3)
• From the spreadsheet we calculated the following:
– Sample size: n = 30
– $\sum XY = 2010.61$
– $\sum X^2 = 4480$
– $\bar{X} = 12.07$
– $\bar{Y} = 5.48$
• We use these numbers to calculate b
$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{2010.61 - 30(12.07 \times 5.48)}{4480 - 30(145.61)} = \frac{26.30}{111.84} = 0.235$
Example (4)
• And to calculate a
$a = \bar{Y} - b\bar{X} = 5.48 - 0.235(12.07) = 2.645$
• Compare our estimates with the Stata output
• Now let’s use the numbers from the spreadsheet to calculate the regression line variance
$\hat{\sigma}^2_{yx} = \frac{\sum y^2 - b \sum xy}{n - 2} = \frac{16.11 - 0.235(26.30)}{28} = 0.355$
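The arithmetic for b, a and the regression line variance can be retraced as below. This is only a sketch: it starts from the deviation-form sums as reported on the slides (26.30, 111.84 and 16.11) rather than recomputing them from the rounded means, because differences of large rounded numbers are sensitive to rounding. The variable names are made up for illustration:

```python
# Values as reported on the slides for the sample of n = 30.
n = 30
x_bar = 12.07     # mean years of education
y_bar = 5.48      # mean of ln(Y)
sum_xy = 26.30    # sum of XY minus n * x_bar * y_bar
sum_x2 = 111.84   # sum of X squared minus n * x_bar squared
sum_y2 = 16.11    # total sum of squares of ln(Y) about its mean

# Slope and intercept of the least squares line.
b = sum_xy / sum_x2
a = y_bar - b * x_bar

# Regression line variance: unexplained sum of squares over n - 2.
resid_var = (sum_y2 - b * sum_xy) / (n - 2)

print(f"b = {b:.3f}")
print(f"a = {a:.3f}")
print(f"residual variance = {resid_var:.3f}")
```

Small differences in the third decimal place relative to the slides come from carrying the unrounded b through the later steps.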
Example (5)
• The variance of b is
$\hat{\sigma}_b^2 = \frac{\hat{\sigma}^2_{yx}}{\sum x^2} = \frac{0.355}{111.84} = 0.0032$
• Thus the standard error of b is
$\hat{\sigma}_b = \sqrt{0.0032} = 0.056$
Example (6)
• We can calculate a confidence interval for β:
$\beta = b \pm t_{\alpha/2,\,df} \cdot \hat{\sigma}_b = 0.235 \pm 2.048(0.056)$
• For a 95% confidence interval, β is bounded between
$0.120 < \beta < 0.350$
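A short sketch that retraces the standard error and the 95% confidence interval from the slide values. The critical value 2.048 is the tabled t value used on the slide (28 degrees of freedom); the variable names are assumptions:

```python
import math

# Values as reported on the slides.
b = 0.235           # estimated slope
resid_var = 0.355   # regression line variance
sum_x2 = 111.84     # sum of squared deviations of X
t_crit = 2.048      # tabled two-sided 5% critical value, df = n - 2 = 28

var_b = resid_var / sum_x2
se_b = math.sqrt(var_b)

# Interval: b plus or minus t_crit * se_b.
half_width = t_crit * se_b
lower = b - half_width
upper = b + half_width

print(f"var(b) = {var_b:.4f}, se(b) = {se_b:.3f}")
print(f"95% CI: {lower:.3f} < beta < {upper:.3f}")
```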
Example (7)
• Now the hypothesis test: The Stata output gives a t-ratio of 4.06. Our null and alternative hypotheses are
$H_0: \beta = 0$
$H_1: \beta \neq 0$
• Our t statistic:
$t = \frac{b - \beta}{\hat{\sigma}_b} = \frac{0.235 - 0}{0.056} = 4.19$
• Since $|t| > t_{\alpha/2,\,df}$ (4.19 > 2.048), we reject the null hypothesis.
• Thus, at the 5% significance level, the population slope is significantly different from zero
A word on modeling
• The model we’ve been using is Y = a + bX
• In our spreadsheet example, our model is ln Y = a + bX
• This suggests an underlying model of $Y = e^{a + bX}$
• Sometimes it is better to take logs of variables to make the relationship between Y and X linear
• Because of outliers, the underlying relationship will sometimes look more like an upward sloping curve
• Logging earnings and then comparing them with years of education gives you a far more linear relationship; it does not change your conclusions
A word on modeling (2)
• We are asking the question:
• What is the increase in earnings for an additional year of education?
• It is the differential
$\frac{d(\ln Y)}{dX} = b$
• More simply we can write
$\ln Y_1 = a + bX_1$
$\ln Y_2 = a + bX_2$
$\ln Y_2 - \ln Y_1 = a - a + b(X_2 - X_1)$
A word on modeling (3)
• The difference between X1 and X2 is a discrete change in years of education, so for one additional year the difference ΔX = X2 − X1 is one
• So we can write:
$\ln Y_2 - \ln Y_1 = \ln\left(\frac{Y_2}{Y_1}\right) = b\,\Delta X$
$\ln(1 + \%\Delta Y) = b\,\Delta X$
$\%\Delta Y = e^{b\,\Delta X} - 1 = e^{b} - 1$
• On the spreadsheet, calculate an additional year of school:
$\%\Delta Y = e^{0.235} - 1$ = approximately 26%
Enter into Excel: =exp(0.235)-1
• So in a semi-log equation ln Y = a + bX, b ≈ %Δ of Y
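The same calculation as the Excel formula, sketched in Python. The function name `pct_change_from_semilog` is hypothetical; the slope 0.235 is the estimate from the example:

```python
import math


def pct_change_from_semilog(b, delta_x=1.0):
    """Proportional change in Y implied by ln Y = a + bX when X rises by delta_x."""
    return math.exp(b * delta_x) - 1.0


b = 0.235  # slope estimate from the worked example
exact = pct_change_from_semilog(b)

print(f"exact proportional change:  {exact:.4f}")
print(f"approximation using b only: {b:.4f}")
```

The gap between the two printed values shows how rough it is to read b directly as a percentage change; the approximation is closer when b is small.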