
Mathematical Properties of the Least Squares Regression
The least squares regression line obeys certain mathematical properties which are useful to
know in practice. The following properties can be established algebraically:
a)
The least squares regression line passes through the point of sample means of Y and
X. This can be easily seen from (4.9), which can be rewritten as follows,
\bar{Y} = b_1 + b_2 \bar{X}    (4.12)
b)
The mean of the fitted (predicted) values of Y is equal to the mean of the Y values.
Let \hat{Y}_i = b_1 + b_2 X_i; then we have
\bar{\hat{Y}} = \frac{1}{n} \sum (b_1 + b_2 X_i) = \frac{1}{n} \sum (\bar{Y} - b_2 \bar{X} + b_2 X_i) = \bar{Y} - b_2 \bar{X} + b_2 \bar{X} = \bar{Y}    (4.13)
c)
The residuals of the regression line sum up to zero:
\frac{1}{n} \sum e_i = \frac{1}{n} \sum (Y_i - \hat{Y}_i) = \bar{Y} - \bar{\hat{Y}} = 0    (4.14)
d)
The residuals ei are uncorrelated with the Xi values:
\sum e_i X_i = \sum e_i X_i - \bar{X} \sum e_i \quad \text{since } \sum e_i = 0
             = \sum e_i (X_i - \bar{X})
             = \sum (Y_i - \bar{Y})(X_i - \bar{X}) - b_2 \sum (X_i - \bar{X})^2 \quad \text{since } b_1 = \bar{Y} - b_2 \bar{X}
             = 0 \quad \text{since } b_2 = \sum (Y_i - \bar{Y})(X_i - \bar{X}) / \sum (X_i - \bar{X})^2
e)
The residuals ei are uncorrelated with the fitted values Ŷi. This property follows
logically from the previous one, since each fitted value Ŷi is a linear function of the
corresponding Xi value.
f)
The least squares regression splits the variation in the Y variable into two components
- the explained variation due to the variation in Xi and the residual variation:
TSS = RSS + ESS    (4.15)
where,
TSS = \sum (Y_i - \bar{Y})^2
ESS = \sum (\hat{Y}_i - \bar{\hat{Y}})^2    (4.16)
RSS = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2
TSS is the total variation observed in the dependent variable Y. It is called the total
sum of squares. ESS, the Explained Sum of Squares, is the variation of the predicted
values (b1 + b2·Xi). This is the variation in Y accounted for by the variation in the
explanatory variable X. What is left is the RSS, the Residual Sum of Squares. The
reason why the ESS and RSS add up neatly to the TSS is that the residuals are
uncorrelated with the fitted Y values and, hence, the cross-product (covariance) term
vanishes.
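
By way of illustration, the short Python sketch below (using numpy and a small made-up data set; the data, like the code itself, is purely illustrative and not taken from the text) computes the least squares estimates from the formulas above and checks properties (c) to (f) numerically: the residuals sum to zero, they are uncorrelated with the Xi and with the fitted values, and TSS = ESS + RSS.

    import numpy as np

    # Small illustrative data set (not from the text)
    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])

    # Least squares estimates: b2 = Sxy/Sxx and b1 = Ybar - b2*Xbar
    Xbar, Ybar = X.mean(), Y.mean()
    b2 = np.sum((Y - Ybar) * (X - Xbar)) / np.sum((X - Xbar) ** 2)
    b1 = Ybar - b2 * Xbar

    Yhat = b1 + b2 * X            # fitted values
    e = Y - Yhat                  # residuals

    TSS = np.sum((Y - Ybar) ** 2)
    ESS = np.sum((Yhat - Yhat.mean()) ** 2)
    RSS = np.sum(e ** 2)

    print("sum of residuals   :", e.sum())           # ~0, property (c)
    print("sum of e_i * X_i   :", np.sum(e * X))     # ~0, property (d)
    print("sum of e_i * Yhat_i:", np.sum(e * Yhat))  # ~0, property (e)
    print("TSS and ESS + RSS  :", TSS, ESS + RSS)    # equal, property (f)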
This last property suggests a useful way to measure the goodness of fit of the
estimated sample regression. This is done as follows,
R2 = ESS/TSS    (4.17)
where R2, called R-square, is the coefficient of determination. It gives us the proportion of
the total sum of squares of the dependent variable explained by the variation in the
explanatory variable. In fact, the R2 equals the square of the linear correlation coefficient
between the observed and the predicted values of the dependent variable Y, computed as
follows,
r = \frac{\mathrm{Cov}(Y, \hat{Y})}{\sqrt{V(Y) \cdot V(\hat{Y})}} = \frac{\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})}{\sqrt{\sum (Y_i - \bar{Y})^2 \cdot \sum (\hat{Y}_i - \bar{\hat{Y}})^2}}    (4.17a)
A correlation coefficient measures the degree of linear association between two variables.
Note, however, that if the underlying relation between the variables is non-linear, the
correlation coefficient may perform poorly, notwithstanding the fact that a strong non-linear
association exists between the two variables.
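
Continuing in the same spirit, a minimal sketch (same illustrative data as before) confirms that R2 = ESS/TSS coincides with the square of the correlation coefficient (4.17a) between the observed and the fitted values; numpy's corrcoef is used for the correlation.

    import numpy as np

    # Same illustrative data as in the previous sketch
    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])

    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
    b1 = Y.mean() - b2 * X.mean()
    Yhat = b1 + b2 * X

    R2 = np.sum((Yhat - Yhat.mean()) ** 2) / np.sum((Y - Y.mean()) ** 2)  # ESS/TSS, (4.17)
    r = np.corrcoef(Y, Yhat)[0, 1]                                        # correlation of (4.17a)
    print(R2, r ** 2)   # the two values agree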
Statistical Properties of LS Linear Regression
We briefly review the main points without much further elaboration, apart from a few specific
points which concern regression only. We shall merely remind you of the results of formal
derivations without bothering about proofs which can be found in most introductory texts on
statistics or econometrics.
Standard Errors
Given the assumptions of the classical linear regression model, the variances of the least
squares estimators are given by,
1
var( b1 ) = σ 2  
n
(4.19)
σ
∑ ( X i - X )2
2
var( b2 ) =
(4.20)
Furthermore, an unbiased estimator of σ2 is given by s2 as follows:
s^2 = \frac{\sum (Y_i - b_1 - b_2 X_i)^2}{n-2}    (4.21)
where s, the square root of s², is called the standard error of the regression, since σ² is the
variance of the error term, which measures the deviation of individual points from the
regression line. Replacing σ² by s² in (4.19) and (4.20), we get unbiased estimates of the
variances of b1 and b2. Obviously, the estimated standard errors are the square roots of these
variances.
The total sum of squares of X, ∑(Xi - X̄)², which features in the denominator of the
variances of both the intercept and the slope coefficient, is a measure of the total variation in
the X values. Thus, other things being equal, the higher the variation in the X values, the
lower the variances of the estimators, and hence the higher the precision of estimation. In
other words, the range of observed X plays a crucial role in the reliability of
the estimates. Think about this. It would indeed be difficult to measure the response of Y on
X if X hardly varies at all. The greater the range over which X varies, the easier it is to
capture its impact on the variation in Y.
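
As a numerical illustration of (4.19)-(4.21), the sketch below (Python/numpy, with the same made-up data as earlier) computes s² and the estimated standard errors of b1 and b2 obtained by replacing σ² with s²; the quantity Sxx in the code is the total sum of squares of X that appears in both denominators.

    import numpy as np

    # Illustrative data (not from the text)
    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)    # total sum of squares of X
    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b1 = Y.mean() - b2 * X.mean()

    e = Y - (b1 + b2 * X)
    s2 = np.sum(e ** 2) / (n - 2)        # (4.21): unbiased estimator of sigma^2

    var_b1_hat = s2 * (1.0 / n + X.mean() ** 2 / Sxx)   # s^2 in place of sigma^2 in (4.19)
    var_b2_hat = s2 / Sxx                                # s^2 in place of sigma^2 in (4.20)
    print("s =", np.sqrt(s2))
    print("se(b1) =", np.sqrt(var_b1_hat), " se(b2) =", np.sqrt(var_b2_hat))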
Sampling Distributions
To construct confidence intervals and to perform tests of hypotheses we need the
probability distribution of the errors, for which we invoke the normality assumption of
the error terms. Under this assumption, the least squares estimators b1 and b2 each follow a
normal distribution. However, since we generally do not know the variance of the error term,
we cannot make use of the normal distribution directly. Instead, we use the t-distribution
defined as follows in the case of b2,
t = \frac{b_2 - \beta_2}{\mathrm{se}(b_2)} \sim t(n-2)    (4.22)
where se(b2), the standard error of b2, is given by,
\mathrm{se}(b_2) = \frac{s}{\left[ \sum (X_i - \bar{X})^2 \right]^{1/2}}    (4.23)
using (4.20) and (4.21). The notation t(n-2) denotes Student's t-distribution with (n-2)
degrees of freedom. The reason why we now have only (n-2) degrees of freedom is that, in
simple regression, we use the sample data to estimate 2 coefficients: the slope and the
intercept of the line. In the case of the sample mean, in contrast, we only estimated one
parameter (the mean itself) from the sample.
Similarly, for b1, we get,
t = \frac{b_1 - \beta_1}{\mathrm{se}(b_1)} \sim t(n-2)    (4.24)
where se(b1), the standard error of b1, is given by,
1
X
se( b1 ) = s  +
2
n
∑
(
Xi - X

2


)
1
2
(4.25)
using (4.19) and (4.21).
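
A minimal sketch of how (4.22) and (4.24) are used in practice is given below (Python, with numpy and scipy.stats, same illustrative data); the null values β2 = 0 and β1 = 0 are merely a common choice for illustration, not something prescribed by the text.

    import numpy as np
    from scipy import stats

    # Illustrative data (not from the text)
    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)
    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b1 = Y.mean() - b2 * X.mean()
    e = Y - (b1 + b2 * X)
    s = np.sqrt(np.sum(e ** 2) / (n - 2))                 # (4.21)

    se_b2 = s / np.sqrt(Sxx)                              # (4.23)
    se_b1 = s * np.sqrt(1.0 / n + X.mean() ** 2 / Sxx)    # (4.25)

    # t ratios of (4.22) and (4.24) under the nulls beta2 = 0 and beta1 = 0
    t_b2 = (b2 - 0.0) / se_b2
    t_b1 = (b1 - 0.0) / se_b1
    p_b2 = 2 * stats.t.sf(abs(t_b2), df=n - 2)            # two-sided p-value
    print(t_b2, p_b2, t_b1)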
Confidence Intervals for the Parameters β1 and β2
The confidence limits for β2 and β1 with confidence coefficient (1-α) (say, 95 per cent, in
which case α = 0.05) are given by,
b_2 \pm t(n-2, \alpha/2) \cdot \mathrm{se}(b_2)    (4.26)
b_1 \pm t(n-2, \alpha/2) \cdot \mathrm{se}(b_1)    (4.27)
respectively, where t(n-2,α/2) is the (1-α/2) percentile of a t-distribution with (n-2) degrees of
freedom, and se(b2) and se(b1) are given by (4.23) and (4.25) respectively.
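
For a concrete version of (4.26) and (4.27), the sketch below obtains the critical value t(n-2, α/2) from scipy.stats.t.ppf and builds 95 per cent intervals around b1 and b2 (same illustrative data as before).

    import numpy as np
    from scipy import stats

    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])   # illustrative data
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)
    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b1 = Y.mean() - b2 * X.mean()
    s = np.sqrt(np.sum((Y - b1 - b2 * X) ** 2) / (n - 2))

    se_b2 = s / np.sqrt(Sxx)
    se_b1 = s * np.sqrt(1.0 / n + X.mean() ** 2 / Sxx)

    alpha = 0.05
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)    # t(n-2, alpha/2) in the text's notation

    print("95% CI for beta2:", (b2 - tcrit * se_b2, b2 + tcrit * se_b2))   # (4.26)
    print("95% CI for beta1:", (b1 - tcrit * se_b1, b1 + tcrit * se_b1))   # (4.27)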
Confidence Interval for the Conditional Mean of Y
At times, we may be interested to construct a confidence interval for the conditional mean.
For example, after fitting a regression of household savings on income, we may want to
construct a confidence interval for average savings given the level of income in order to
assess the savings potential of a certain type of household. Suppose,
\mu_0 = \beta_1 + \beta_2 X_0    (4.28)
i.e. µ0 is the conditional mean of Y given X = X0. The point estimate of µ0 is given by
\hat{\mu}_0 = b_1 + b_2 X_0
while a (1-α) confidence interval for µ0 can be obtained as follows,
\hat{\mu}_0 \pm t(n-2, \alpha/2) \cdot \mathrm{se}(\hat{\mu}_0)    (4.29)
where,
\mathrm{se}(\hat{\mu}_0) = s \left[ \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2}    (4.30)
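
To make (4.29) and (4.30) concrete, the following sketch computes a 95 per cent interval for the conditional mean at an arbitrarily chosen level X0 = 6 (the data and the choice of X0 are illustrative only).

    import numpy as np
    from scipy import stats

    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])   # illustrative data
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)
    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b1 = Y.mean() - b2 * X.mean()
    s = np.sqrt(np.sum((Y - b1 - b2 * X) ** 2) / (n - 2))

    X0 = 6.0                                  # chosen level of X (hypothetical)
    mu0_hat = b1 + b2 * X0                    # point estimate of the conditional mean
    se_mu0 = s * np.sqrt(1.0 / n + (X0 - X.mean()) ** 2 / Sxx)   # (4.30)

    tcrit = stats.t.ppf(0.975, df=n - 2)
    print("95% CI for E[Y | X = X0]:",
          (mu0_hat - tcrit * se_mu0, mu0_hat + tcrit * se_mu0))  # (4.29)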
Confidence Interval for the Predicted Y Values
There are other occasions where we might be interested in the uncertainty in prediction on the
basis of the estimated regression. For example, when estimating a regression of paddy yield
(physical output per unit area) on annual rainfall, we may want to predict next year's yield
given the anticipated rainfall. In this case, our interest is not to obtain a confidence interval of
the conditional mean of the yield i.e. the mean yield at a given level of rainfall. Rather, we
want to find a confidence interval for the yield (Y0) itself, given the rainfall (X0). Obviously,
in this case,
Y_0 = \beta_1 + \beta_2 X_0 + \varepsilon = \mu_0 + \varepsilon
where µ0 is given by (4.28). Writing Ŷ0 = b1 + b2·X0 for the point prediction, the (1-α)
confidence interval for Y0 given X = X0 is then obtained as follows,
\hat{Y}_0 \pm t(n-2, \alpha/2) \cdot \mathrm{se}(Y_0)    (4.31)
where,
\mathrm{se}(Y_0) = s \left[ 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2}    (4.32)
The standard error in (4.32) is therefore larger than that in (4.30): the latter concerns only
the conditional mean of the yield at a given level of rainfall, while the former concerns an
individual yield, which also includes the error ε around that mean. In both cases, (4.30) and
(4.32), the confidence intervals become wider the farther X0 lies from the sample mean of X.
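
The sketch below contrasts the two intervals at two hypothetical values of X0, one at the sample mean of X and one well outside the observed range: the interval based on (4.32) is always wider than the one based on (4.30), and both widen as X0 moves away from the mean (illustrative data only).

    import numpy as np
    from scipy import stats

    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])   # illustrative data
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 11.3])
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)
    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b1 = Y.mean() - b2 * X.mean()
    s = np.sqrt(np.sum((Y - b1 - b2 * X) ** 2) / (n - 2))
    tcrit = stats.t.ppf(0.975, df=n - 2)

    for X0 in (6.0, 12.0):                           # at the mean of X, and far from it
        Y0_hat = b1 + b2 * X0                        # point prediction
        se_mu0 = s * np.sqrt(1.0 / n + (X0 - X.mean()) ** 2 / Sxx)        # (4.30)
        se_Y0 = s * np.sqrt(1.0 + 1.0 / n + (X0 - X.mean()) ** 2 / Sxx)   # (4.32)
        print(X0,
              (Y0_hat - tcrit * se_mu0, Y0_hat + tcrit * se_mu0),   # interval for the mean
              (Y0_hat - tcrit * se_Y0, Y0_hat + tcrit * se_Y0))     # wider interval for Y0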
Standard Error of a Residual
Finally, the residuals ei are the estimators of errors εi (see (4.7) and (4.8)). The standard error
of ei is obtained as follows,
\mathrm{se}(e_i) = s \sqrt{1 - h_i}, \quad \text{where } h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum (X_i - \bar{X})^2}    (4.33)
where s is given by (4.21). Note that while the errors are assumed to be homoscedastic
(constant variance), equation (4.33) shows that the residuals of the regression line are
heteroscedastic in nature. The standard error of each residual depends on the value of hi.
The statistic hi is called the hat statistic: hi will be larger, the greater the distance of Xi
from its mean. A value of X which is far away from its mean (for example, an outlier in the
univariate analysis of X) will produce a large hat statistic which, as we shall see in section
4.7, can exert undue influence on the location of a regression line. A data point with a large
hat statistic is said to exert leverage on the least squares regression line, the importance of
which will be shown in section 4.7.
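
A short numerical sketch of (4.33) is given below (illustrative data in which the last X value lies far from the others): the outlying observation has by far the largest hat statistic, and correspondingly the smallest residual standard error, which is exactly the leverage effect discussed above.

    import numpy as np

    # Illustrative data; the last X value is an outlier in X
    X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 25.0])
    Y = np.array([3.1, 5.4, 6.0, 8.2, 8.9, 26.0])
    n = len(X)

    Sxx = np.sum((X - X.mean()) ** 2)
    b2 = np.sum((Y - Y.mean()) * (X - X.mean())) / Sxx
    b1 = Y.mean() - b2 * X.mean()
    e = Y - (b1 + b2 * X)
    s = np.sqrt(np.sum(e ** 2) / (n - 2))

    h = 1.0 / n + (X - X.mean()) ** 2 / Sxx   # hat statistics of (4.33)
    se_e = s * np.sqrt(1.0 - h)               # standard error of each residual

    for Xi, hi, se in zip(X, h, se_e):
        print(Xi, round(hi, 3), round(se, 3)) # largest hat value at the outlying X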