Uploaded by aprajeet Singh

Chapter2-Regression-SimpleLinearRegressionAnalysis

advertisement
Chapter 2
Simple Linear Regression Analysis
The simple linear regression model
We consider the modelling between the dependent and one independent variable. When there is only one
independent variable in the linear regression model, the model is generally termed as a simple linear
regression model. When there are more than one independent variable in the model, then the linear model is
termed as the multiple linear regression model.
The linear model
Consider a simple linear regression model
y   0  1 X  
where y is termed as the dependent or study variable and X is termed as the independent or explanatory
variable. The terms  0 and 1 are the parameters of the model. The parameter  0 is termed as an intercept
term, and the parameter 1 is termed as the slope parameter. These parameters are usually called as
regression coefficients. The unobservable error component  accounts for the failure of data to lie on a
straight line and represents the difference between the true and observed realization of y . There can be
several reasons for such difference, e.g., the effect of all deleted variables in the model, variables may be
qualitative, inherent randomness in the observations etc. We assume that  is observed as an independent
and identically distributed random variable with mean zero and constant variance  2 . Later, we will
additionally assume that  is normally distributed.
The independent variables are viewed as controlled by the experimenter, so it is considered as non-stochastic
whereas y is viewed as random variable with
E ( y )   0  1 X
and
Var ( y )   2 .
Sometimes
X can also be a random variable. In such a case, instead of the sample mean and sample
variance of y , we consider the conditional mean of y given X  x as
E ( y | x)   0  1 x
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
1
and the conditional variance of y given X  x as
Var ( y | x)   2 .
When the values of  0 , 1 and  2 are known, the model is completely described. The parameters  0 , 1 and
 2 are generally unknown in practice and  is unobserved. The determination of the statistical model
y   0  1 X   depends on the determination (i.e., estimation ) of  0 , 1 and  2 . In order to know the
values of these parameters, n pairs of observations ( xi , yi )(i  1,..., n) on ( X , y ) are observed/collected and
are used to determine these unknown parameters.
Various methods of estimation can be used to determine the estimates of the parameters. Among them, the
methods of least squares and maximum likelihood are the popular methods of estimation.
Least squares estimation
Suppose a sample of n sets of paired observations ( xi , yi ) (i  1, 2,..., n) is available. These observations
are assumed to satisfy the simple linear regression model, and so we can write
yi   0  1 xi   i (i  1, 2,..., n).
The principle of least squares estimates the parameters  0 and 1 by minimizing the sum of squares of the
difference between the observations and the line in the scatter diagram. Such an idea is viewed from different
perspectives. When the vertical difference between the observations and the line in the scatter diagram is
considered, and its sum of squares is minimized to obtain the estimates of  0 and 1 , the method is known
as direct regression.
yi
(xi,
Y   0  1 X
(Xi,
xi
Direct regression
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
2
Alternatively, the sum of squares of the difference between the observations and the line in the horizontal
direction in the scatter diagram can be minimized to obtain the estimates of  0 and 1 . This is known as a
reverse (or inverse) regression method.
yi
Y   0  1 X
(xi, yi)
(Xi, Yi)
x i,
Reverse regression method
Instead of horizontal or vertical errors, if the sum of squares of perpendicular distances between the
observations and the line in the scatter diagram is minimized to obtain the estimates of  0 and 1 , the
method is known as orthogonal regression or major axis regression method.
yi
(xi
Y   0  1 X
(Xi
)
xi
Major axis regression method
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
3
Instead of minimizing the distance, the area can also be minimized. The reduced major axis regression
method minimizes the sum of the areas of rectangles defined between the observed data points and the
nearest point on the line in the scatter diagram to obtain the estimates of regression coefficients. This is
shown in the following figure:
yi
(xi yi)
Y   0  1 X
(Xi, Yi)
xi
Reduced major axis method
The method of least absolute deviation regression considers the sum of the absolute deviation of the
observations from the line in the vertical direction in the scatter diagram as in the case of direct regression to
obtain the estimates of  0 and 1 .
No assumption is required about the form of the probability distribution of  i in deriving the least squares
estimates. For the purpose of deriving the statistical inferences only, we assume that  i ' s are random
variable with
E ( i )  0, Var ( i )   2 and Cov ( i ,  j )  0 for all i  j (i, j  1, 2,..., n).
This assumption is
needed to find the mean, variance and other properties of the least-squares estimates. The assumption that
 i ' s are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals
of the parameters.
Based on these approaches, different estimates of  0 and 1 are obtained which have different statistical
properties. Among them, the direct regression approach is more popular. Generally, the direct regression
estimates are referred to as the least-squares estimates or ordinary least squares estimates.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
4
Direct regression method
This method is also known as the ordinary least squares estimation. Assuming that a set of n paired
observations on ( xi , yi ), i  1, 2,..., n are available which satisfy the linear regression model y   0  1 X   .
So we can write the model for each observation as yi   0  1 xi   i , (i  1, 2,..., n) .
The direct regression approach minimizes the sum of squares
n
n
i 1
i 1
S (  0 , 1 )    i2   ( yi   0  1 xi ) 2
with respect to  0 and 1 .
The partial derivatives of S (  0 , 1 ) with respect to  0 is
n
S (  0 , 1 )
 2 ( yt   0  1 xi )
 0
i 1
and the partial derivative of S (  0 , 1 ) with respect to 1 is
n
S (  0 , 1 )
 2 ( yi   0  1 xi )xi .
1
i 1
The solutions of  0 and 1 are obtained by setting
S (  0 , 1 )
0
 0
S (  0 , 1 )
 0.
1
The solutions of these two equations are called the direct regression estimators, or usually called as the
ordinary least squares (OLS) estimators of  0 and 1 .
This gives the ordinary least squares estimates b0 of  0 and b1 of 1 as
b0  y  b1 x
b1 
sxy
sxx
where
n
n
i 1
i 1
sxy   ( xi  x )( yi  y ), sxx  ( xi  x )2 , x 
1 n
1 n
,
x
y

i
 yi .
n i 1
n i 1
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
5
Further, we have
n
 2 S (  0 , 1 )


2
(1)  2n,

 02
i 1
n
 2 S (  0 , 1 )

2
xi2

12
i 1
n
 2 S (  0 , 1 )
 2 xt  2nx .
 0 1
i 1
The Hessian matrix which is the matrix of second-order partial derivatives, in this case, is given as
  2 S (  0 , 1 )

 02
H*   2
  S ( ,  )
0
1





0
1

 2 S (  0 , 1 ) 

 0 1 
 2 S (  0 , 1 ) 

12

nx 
n


n
2
 nx  xi2 


i 1


 '
 2    , x 
 x '
where   (1,1,...,1) ' is a n -vector of elements unity and x  ( x1 ,..., xn ) ' is a n -vector of observations on X .
The matrix H * is positive definite if its determinant and the element in the first row and column of H * are
positive. The determinant of H * is given by
 n

H *  4  n xi2  n 2 x 2 
 i 1

n
 4n ( xi  x ) 2
i 1
 0.
n
The case when
 (x  x )
i 1
2
i
 0 is not interesting because all the observations, in this case, are identical, i.e.
xi  c (some constant). In such a case, there is no relationship between x and y in the context of regression
n
analysis. Since
 (x  x )
i 1
i
2
 0, therefore H  0. So H is positive definite for any (  0 , 1 ) , therefore,
S (  0 , 1 ) has a global minimum at (b0 , b1 ).
The fitted line or the fitted linear regression model is
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
6
y  b0  b1 x.
The predicted values are
yˆi  b0  b1 xi (i  1, 2,..., n).
The difference between the observed value yi and the fitted (or predicted) value yˆi is called a residual. The
i th residual is defined as
ei  yi ~ yˆi (i  1, 2,..., n)
 yi  yˆi
 yi  (b0  b1 xi ).
Properties of the direct regression estimators:
Unbiased property:
Note that b1 
sxy
and b0  y  b1 x are the linear combinations of yi (i  1,..., n).
sxx
Therefore
n
b1   ki yi
i 1
where ki  ( xi  x ) / sxx . Note that
n
k
i 1
i
 0 and
n
k x
i 1
i i
 1, so
n
E (b1 )   ki E ( yi )
i 1
n
  ki (  0  1 xi ) .
i 1
 1.
This b1 is an unbiased estimator of 1 . Next
E (b0 )  E  y  b1 x 
 E   0  1 x    b1 x 
  0  1 x  1 x
 0 .
Thus b0 is an unbiased estimator of  0 .
Variances:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
7
Using the assumption that yi ' s are independently distributed, the variance of b1 is
n
Var (b1 )   ki2Var ( yi )   ki k j Cov( yi , y j )
i 1
2
=
=
i
 (x  x )
j i
2
i
i
sxx2
(Cov( yi , y j )  0 as y1 ,..., yn are independent)
 2 sxx
sxx2
2
.
sxx
The variance of b0 is
Var (b0 )  Var ( y )  x 2 Var (b1 )  2 xCov( y , b1 ).
First, we find that
Cov( y , b1 )  E  y  E ( y )b1  E (b1 )


 E  ( ci yi  1 ) 
 i

1 

 E (  i )(  0  ci  1  ci xi   ci i )  1   i 
n  i
i
i
i
i

1
 0  0  0  0
n
0

So
 1 x2 
Var (b0 )    
.
 n sxx 
2
Covariance:
The covariance between b0 and b1 is
Cov(b0 , b1 )  Cov( y , b1 )  xVar (b1 )

x 2
 .
sxx
It can further be shown that the ordinary least squares estimators b0 and b1 possess the minimum variance
in the class of linear and unbiased estimators. So they are termed as the Best Linear Unbiased Estimators
(BLUE). Such a property is known as the Gauss-Markov theorem, which is discussed later in multiple
linear regression model.
Residual sum of squares:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
8
The residual sum of squares is given as
n
n
i 1
i 1
SS res   ei2   ( yi  yˆi ) 2
n
  ( yi  b0  b1 xi ) 2
i 1
n
   yi  y  b1 x  b1 xi 
2
i 1
n
   ( yi  y )  b1 ( xi  x ) 
2
i 1
n
n
n
i 1
i 1
  ( yi  y ) 2  b12  ( xi  x ) 2  2b1  ( xi  x )( yi  y )
i 1
 s yy  b s  2b s
2
1 xx
2
1 xx
 s yy  b12 sxx
2
s 
 s yy   xy  sxx
 sxx 
sxy2
 s yy 
sxx
 s yy  b1sxy .
n
where s yy   ( yi  y ) 2 , y 
i 1
1 n
 yi .
n i 1
Estimation of  2
The estimator of  2 is obtained from the residual sum of squares as follows. Assuming that yi is normally
distributed, it follows that SS res has a  2 distribution with (n  2) degrees of freedom, so
SSres
2
~  2 (n  2).
Thus using the result about the expectation of a chi-square random variable, we have
E ( SSres )  (n  2) 2 .
Thus an unbiased estimator of  2 is
SS res
.
n2
Note that SS res has only (n  2) degrees of freedom. The two degrees of freedom are lost due to estimation
s2 
of b0 and b1 . Since s 2 depends on the estimates b0 and b1 , so it is a model-dependent estimate of  2 .
Estimates of variances of b0 and b1 :
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
9
The estimators of variances of b0 and b1 are obtained by replacing  2 by its estimate ˆ 2  s 2 as follows:
2
 (b )  s 2  1  x 
Var


0
 n sxx 
and
2
 (b )  s .
Var
1
sxx
n
It is observed that since
 ( y  yˆ )  0,
i
i 1
i
n
so
e
i 1
i
 0. In the light of this property, ei can be regarded as an
estimate of unknown  i (i  1,..., n) . This helps in verifying the different model assumptions on the basis of
the given sample ( xi , yi ), i  1, 2,..., n.
Further, note that
n
(i)
xe
i 1
n
(ii)
i i
 yˆ e
i 1
n
(iii)
i i
 0,
 0,
n
 yi   yˆi and
i 1
i 1
(iv) the fitted line always passes through ( x , y ).
Centered Model:
Sometimes it is useful to measure the independent variable around its mean. In such a case, the model
yi   0  1 X i   i has a centred version as follows:
yi   0  1 ( xi  x )  1 x  
(i  1, 2,..., n)
  0*  1 ( xi  x )   i
where  0*   0  1 x . The sum of squares due to error is given by
n
n
i 1
i 1
2
S (  0* , 1 )    i2    yi   0*   1 ( xi  x )  .
Now solving
S (  0* , 1 )
0
 0*
S (  0* , 1 )
 0,
1*
we get the direct regression least squares estimates of  0* and 1 as
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
10
b0*  y
and
b1 
sxy
sxx
respectively.
,
Thus the form of the estimate of slope parameter 1 remains the same in the usual and centered model
whereas the form of the estimate of intercept term changes in the usual and centered models.
Further, the Hessian matrix of the second order partial derivatives of
is positive definite at
S ( 0* , 1 ) with respect to  0* and 1
0*  b0* and 1  b1 which ensures that S ( 0* , 1 ) is minimized at 0*  b0* and
1  b1 .
Under the assumption that E ( i )  0, Var ( i )   2 and Cov( i j )  0 for all i  j  1, 2,..., n , it follows that
E (b0* )   0* , E (b1 )  1 ,
Var (b0* ) 
2
n
, Var (b1 ) 
2
sxx
.
In this case, the fitted model of yi   0*  1 ( xi  x )   i is
y  y  b1 ( x  x ),
and the predicted values are
yˆi  y  b1 ( xi  x ) (i  1,..., n).
Note that in the centered model
Cov(b0* , b1 )  0.
No intercept term model:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
11
Sometimes in practice, a model without an intercept term is used in those situations when xi  0  yi  0 for
all i  1, 2,..., n . A no-intercept model is
yi  1 xi   i (i  1, 2,.., n).
For example, in analyzing the relationship between the velocity ( y ) of a car and its acceleration ( X ) , the
velocity is zero when acceleration is zero.
Using the data ( xi , yi ), i  1, 2,..., n, the direct regression least-squares estimate of
n
n
i 1
i 1
1 is obtained by
minimizing S ( 1 )    i2   ( yi  1 xi ) 2 and solving
S ( 1 )
0
1
gives the estimator of 1 as
n
b 
*
1
yx
i 1
n
i i
.
x
i 1
2
i
The second-order partial derivative of S ( 1 ) with respect to 1 at 1  b1 is positive which insures that b1
minimizes S ( 1 ).
Using the assumption that E ( i )  0, Var ( i )   2 and Cov( i j )  0 for all i  j  1, 2,..., n , the properties
of b1* can be derived as follows:
n
E (b1* ) 
 x E( y )
i
i 1
i
n
x
2
i
i 1
n

x 
i 1
n
2
i
1
x
i 1
2
i
 1
This b1* is an unbiased estimator of 1 . The variance of b1* is obtained as follows:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
12
n
Var (b1* ) 
 x Var ( y )
2
i
i
i 1
 n 2
  xi 
 i 1 
n
2

2
x
i 1
2
i
 n 2
  xi 
 i 1 
2
2
n
x
i 1
2
i
and an unbiased estimator of  2 is obtained as
n
n
 yi2  b1  yi xi
i 1
i 1
n 1
.
Maximum likelihood estimation
We assume that  i ' s (i  1, 2,..., n) are independent and identically distributed
following a normal
distribution N (0,  2 ). Now we use the method of maximum likelihood to estimate the parameters of the
linear regression model
yi   0  1 xi   i (i  1, 2,..., n),
the observations yi (i  1, 2,..., n) are independently distributed with N (  0  1 xi ,  2 ) for all i  1, 2,..., n.
The likelihood function of the given observations ( xi , yi ) and unknown parameters  0 , 1 and  2 is
1/ 2
 1 
L( xi , yi ;  0 , 1 ,  )   
2 
i 1  2 
2
n
 1

exp   2 ( yi   0  1 xi ) 2 .
 2

The maximum likelihood estimates of  0 , 1 and  2 can be obtained by maximizing L( xi , yi ;  0 , 1 ,  2 ) or
equivalently in ln L( xi , yi ;  0 , 1 ,  2 ) where
n
ln L( xi , yi ;  0 , 1 ,  2 )     ln 2
2
n
 1  n
   ln  2   2   ( yi   0  1 xi )2 .
2
 2  i 1
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
13
The normal equations are obtained by partial differentiation of log-likelihood with respect to  0 , 1 and  2
and equating them to zero as follows:
 ln L( xi , yi ;  0 , 1 ,  2 )
1
 2
 0

 ln L( xi , yi ;  0 , 1 ,  2 )
1
 2
1

n
(y  
i 1
i
0
 1 xi )  0
0
 1 xi )xi  0
n
(y  
i 1
i
and
 ln L( xi , yi ;  0 , 1 ,  2 )
n
1 n



 ( yi  0  1 xi )2  0.
 2
2 2 2 4 i 1
The solution of these normal equations give the maximum likelihood estimates of  0 , 1 and  2 as
b0  y  b1 x
n
b1 
 ( x  x )( y  y )
i
i 1
i
n
 (x  x )
i 1
2

sxy
sxx
i
and
n
s 2 
 ( y  b
i 1
i
0
 b1 xi ) 2
n
respectively.
It can be verified that the Hessian matrix of second-order partial derivation of ln L with respect to  0 , 1 ,
and  2 is negative definite at  0  b0 , 1  b1 , and  2  s 2 which ensures that the likelihood function is
maximized at these values.
Note that the least-squares and maximum likelihood estimates of  0 and 1 are identical. The least-squares
and maximum likelihood estimates of  2 are different. In fact, the least-squares estimate of  2 is
1 n
( yi  y ) 2

n  2 i 1
so that it is related to the maximum likelihood estimate as
s2 
n2 2
s .
n
Thus b0 and b1 are unbiased estimators of  0 and 1 whereas s 2 is a biased estimate of  2 , but it is
asymptotically unbiased. The variances of b0 and b1 are same as of b0 and b1 respectively but
s 2 
Var ( s 2 )  Var ( s 2 ).
Testing of hypotheses and confidence interval estimation for slope parameter:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
14
Now we consider the tests of hypothesis and confidence interval estimation for the slope parameter of the
model under two cases, viz., when  2 is known and when  2 is unknown.
Case 1: When  2 is known:
Consider the simple linear regression model yi   0  1 xi   i
(i  1, 2,..., n) . It is assumed that  i ' s are
independent and identically distributed and follow N (0,  2 ).
First, we develop a test for the null hypothesis related to the slope parameter
H 0 : 1  10
where 10 is some given constant.
Assuming
 2 to be known, we know that E (b1 )  1 , Var (b1 ) 
2
sxx
and b1 is a linear combination of
normally distributed yi ' s . So
 2 
b1 ~ N  1 , 
sxx 

and so the following statistic can be constructed
Z1 
b1  10
2
sxx
which is distributed as N (0,1) when H 0 is true.
A decision rule to test H1 : 1  10 can be framed as follows:
Reject H 0 if
Z1  Z / 2
where Z /2 is the  / 2 percent points on the normal distribution.
Similarly, the decision rule for one-sided alternative hypothesis can also be framed.
The 100 (1   )% confidence interval for 1 can be obtained using the Z1 statistic as follows:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
15
P   z / 2  Z1  z /2   1  




b1  1

P  z /2 
 z /2   1  


2



sxx



2
2 
P b1  z /2
 1  b1  z /2
  1 .
sxx
sxx 

So 100 (1   )% confidence interval for 1 is

2
2 
, b1  z / 2
b1  z / 2

sxx
sxx 

where z / 2 is the  / 2 percentage point of the N (0,1) distribution.
Case 2: When  2 is unknown:
When  2 is unknown then we proceed as follows. We know that
SS res

2
~  2 (n  2)
and
 SS 
E  res    2 .
n2
Further, SSres /  2 and b1 are independently distributed. This result will be proved formally later in the next
module on multiple linear regression. This result also follows from the result that under normal distribution,
the maximum likelihood estimates, viz., the sample mean (estimator of population mean) and the sample
variance (estimator of population variance) are independently distributed, so b1 and s 2 are also
independently distributed.
Thus the following statistic can be constructed:
t0 
b1  1
ˆ 2
sxx

b1  1
SS res
(n  2) sxx
which follows a t -distribution with (n  2) degrees of freedom, denoted as tn  2 , when H 0 is true.
A decision rule to test H1 : 1  10 is to
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
16
reject H 0 if t0  tn  2, / 2
where tn  2, / 2 is the  / 2 percent point of the t -distribution with (n  2) degrees of freedom. Similarly, the
decision rule for the one-sided alternative hypothesis can also be framed.
The 100 (1   )% confidence interval of 1 can be obtained using the t0 statistic as follows:
Consider
P  t /2  t0  t / 2   1  




b1  1

P t / 2 
 t /2   1  


2
ˆ


sxx



ˆ 2
ˆ 2 
 1  b1  t / 2
P b1  t /2
  1.
sxx
sxx 

So the 100 (1   )% confidence interval 1 is

SSres
SSres 
, b1  tn  2, /2
b1  tn  2, /2
.
(n  2) sxx
(n  2) sxx 

Testing of hypotheses and confidence interval estimation for intercept term:
Now, we consider the tests of hypothesis and confidence interval estimation for intercept term under two
cases, viz., when  2 is known and when  2 is unknown.
Case 1: When  2 is known:
Suppose the null hypothesis under consideration is
H 0 :  0   00 ,
 1 x2 
using the result that E (b0 )   0 , Var (b0 )   2    and b0 is a linear
 n sx 
combination of normally distributed random variables, the following statistic
b0   00
Z0 
 1 x2 
2  
 n sxx 
where  2 is known, then
has a N (0,1) distribution when H 0 is true.
A decision rule to test H1 :  0   00 can be framed as follows:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
17
Reject H 0 if
Z 0  Z /2
where Z /2 is the  / 2 percentage points on the normal distribution. Similarly, the decision rule for onesided alternative hypothesis can also be framed.
The 100 (1   )% confidence intervals for  0 when  2 is known can be derived using the Z 0 statistic as
follows:
P   z /2  Z 0  z /2   1  




b0   0


 z /2   1  
P   z / 2 
2


1
x


2  


 n sxx 



 1 x2 
 1 x2
P b0  z /2  2      0  b0  z /2  2  

 n sxx 
 n sxx
So the 100 (1   )% of confidential interval of  0 is

 1 x2 
 1 x2
b0  z / 2  2    , b0  z / 2  2  

 n sxx 
 n sxx




  1 .


.
 
Case 2: When  2 is unknown:
When  2 is unknown, then the following statistic is constructed
t0 
b0   00
SS res  1 x 2 
 

n  2  n sxx 
which follows a t -distribution with (n  2) degrees of freedom, i.e., tn  2 when H 0 is true.
A decision rule to test H1 :  0   00 is as follows:
Reject H 0 whenever t0  tn  2, / 2
where tn  2, / 2 is the  / 2 percentage point of the t -distribution with (n  2) degrees of freedom. Similarly,
the decision rule for one-sided alternative hypothesis can also be framed.
The 100 (1   )% confidence interval of  0 can be obtained as follows:
Consider
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
18
P tn  2, / 2  t0  tn  2, /2   1  




b0   0


P tn  2, / 2 
 tn  2, /2   1  
SSres  1 x 2 


  


n
2
n
s

xx 




SSres  1 x 2 
SSres  1 x 2

P b0  tn  2, /2
b
t






 
0
0
n  2, /2
n  2  n sxx 
n  2  n sxx


   1.
 
So 100(1   )% confidence interval for  0 is

SSres  1 x 2 
SS res  1 x 2
b0  tn  2, / 2


,
b
t

 0 n  2, / 2
 
n  2  n sxx 
n  2  n sxx


.
 
Test of hypothesis for  2
We have considered two types of test statistics for testing the hypothesis about the intercept term and slope
parameter- when  2 is known and when  2 is unknown. While dealing with the case of known  2 , the
value of  2 is known from some external sources like past experience, long association of the experimenter
with the experiment, past studies etc. In such situations, the experimenter would like to test the hypothesis
like H 0 :  2   02 against H 0 :  2   02 where  02 is specified. The test statistic is based on the result
SSr es

2
~  n2 2 . So the test statistic is
C0 
SS r es
~  n2 2 under H 0 .

The decision rule is to reject H 0 if C0   n2 2, / 2 or C0   n2 2,1 / 2 .
2
0
Confidence interval for  2
A confidence interval for  2 can also be derived as follows. Since SSres /  2 ~  n2 2 , thus consider
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
19
SS


P   n2 2, /2  res
  n2 2,1 /2   1  
2



 SS
SS 
P  2 res   2  2 res   1   .
 n 2, / 2 
  n  2,1 / 2
The corresponding 100(1   )% confidence interval for  2 is
 SS res
SS 
, 2 res  .
 2
  n  2,1 / 2  n  2, / 2 
Joint confidence region for  0 and 1 :
A joint confidence region for  0 and 1 can also be found. Such a region will provide a 100(1   )%
confidence that both the estimates of  0 and 1 are correct. Consider the centered version of the linear
regression model
yi   0*  1 ( xi  x )   i
where  0*   0  1 x . The least squares estimators of  0* and 1 are
b0*  y and b1 
sxy
sxx
,
respectively.
Using the results that
E (b0* )   0* ,
E (b1 )  1 ,
Var (b0* ) 
Var (b1 ) 
2
n
2
sxx
,
.
When  2 is known, then the statistic
b0*   0*
2
n
~ N (0,1)
and
b1  1
2
~ N (0,1).
sxx
Moreover, both statistics are independently distributed. Thus
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
20
2




b1  1 

and
~ 12

2 
  
 s 
xx 

are also independently distributed because b0* and b1 are independently distributed. Consequently, the sum
2


 *
* 
 b0   0  ~ 12
 2 


n 

of these two
n(b0*   o* ) 2

2

sxx (b1  1 ) 2

2
~  22 .
Since
SSres

2
~  n2 2
and SS res is independently distributed of b0* and b1 , so the ratio
 n(b0*   0* ) 2 sxx (b1  1 ) 2 


 2
2
2


~ F2,n  2 .
SS
 res 
 2  (n  2)
  
*
Substituting b0  b0  b1 x and  0*   0  1 x , we get
 n  2   Qf 



 2   SS res 
where
n
n
i 1
i 1
Q f  n(b0   0 ) 2  2 xt (b0  1 )(b1  1 )   xi2 (b1  1 ) 2 .
Since
 n  2  Q f

 F2, n  2   1  
P 

 2  SS res

holds true for all values of  0 and 1 , so the 100 (1   ) % confidence region for  0 and 1 is
 n  2  Qf
 F2,n  2;1 . .

.
 2  SS res
This confidence region is an ellipse which gives the 100 (1   )% probability that  0 and 1 are contained
simultaneously in this ellipse.
Analysis of variance:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
21
The technique of analysis of variance is usually used for testing the hypothesis related to equality of more
than one parameters, like population means or slope parameters. It is more meaningful in case of multiple
regression model when there are more than one slope parameters. This technique is discussed and illustrated
here to understand the related basic concepts and fundamentals which will be used in developing the analysis
of variance in the next module in multiple linear regression model where the explanatory variables are more
than two.
A test statistic for testing H 0 : 1  0 can also be formulated using the analysis of variance technique as
follows.
On the basis of the identity
yi  yˆi  ( yi  y )  ( yˆi  y ),
the sum of squared residuals is
n
S (b)   ( yi  yˆi ) 2
i 1
n
n
n
i 1
i 1
i 1
  ( yi  y ) 2   ( yˆi  yi ) 2  2 ( yi  y )( yˆi  y ).
Further, consider
n
n
i 1
i 1
 ( yi  y )( yˆi  y )   ( yi  y )b1 ( xi  x )
n
 b12  ( xi  x ) 2
i 1
n
  ( yˆi  y ) 2 .
i 1
Thus we have
n
n
n
i 1
i 1
i 1
 ( yi  y )2   ( yi  yˆi )2   ( yˆi  y )2 .
n
The term
(y y)
2
is called the sum of squares about the mean, corrected sum of squares of y (i.e.,
i
i 1
SScorrected), total sum of squares, or s yy .
n
The term
 ( y  yˆ )
i 1
i
2
i
describes the deviation: observation minus predicted value, viz., the residual sum of
n
squares, i.e., SS res   ( yi  yˆi ) 2
i 1
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
22
n
 ( yˆ  y )
whereas the term
i 1
2
i
describes the proportion of variability explained by the regression,
n
SS r e g   ( yˆi  y ) 2 .
i 1
n
If all observations
 ( y  yˆ )
yi are located on a straight line, then in this case
i 1
i
2
i
 0 and thus
SScorrected  SSr e g .
Note that SSr e g is completely determined by b1 and so has only one degree of freedom. The total sum of
n
squares s yy   ( yi  y ) 2 has (n  1) degrees of freedom due to constraint
i 1
n
( y  y)  0
i 1
i
and SS res has
(n  2) degrees of freedom as it depends on the determination of b0 and b1 .
All sums of squares are mutually independent and distributed as  df2 with df degrees of freedom if the
errors are normally distributed.
The mean square due to regression is
MSr e g 
SSr e g
1
and mean square due to residuals is
MSE 
SSres
.
n2
The test statistic for testing H 0 : 1  0 is
F0 
MSr e g
MSE
.
If H 0 : 1  0 is true, then MSr e g and MSE are independently distributed and thus
F0 ~ F1,n  2 .
The decision rule for H1 : 1  0 is to reject H 0 if
F0  F1,n  2;1
at  level of significance. The test procedure can be described in an Analysis of variance table.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
23
Analysis of variance for testing H 0 : 1  0
Source of variation
Sum of squares
Degrees of freedom
Mean square
Regression
SSr e g
1
MSr e g
Residual
SS res
n2
MSE
Total
s yy
n 1
F
MSr e g / MSE
Some other forms of SSreg , SSres and s yy can be derived as follows:
The sample correlation coefficient then may be written as
sxy
rxy 
sxx
s yy
.
Moreover, we have
b1 
sxy
sxx
s yy
 rxy
sxx
.
The estimator of  2 in this case may be expressed as
1 n 2
 ei
n  2 i 1
1

SSres .
n2
s2 
Various alternative formulations for SS res are in use as well:
n
SS res   [ yi  (b0  b1 xi )]2
i 1
n
  [( yi  y )  b1 ( xi  x )]2
i 1
 s yy  b12 sxx  2b1sxy
 s yy  b12 sxx
 s yy 
( sxy ) 2
sxx
.
Using this result, we find that
SScorrected  s yy
and
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
24
SS r e g  s yy  SS res

( sxy ) 2
sxx
 b12 sxx
 b1sxy .
Goodness of fit of regression
It can be noted that a fitted model can be said to be good when residuals are small. Since SS res is based on
residuals, so a measure of the quality of a fitted model can be based on SS res . When the intercept term is
present in the model, a measure of goodness of fit of the model is given by
R2  1 
SSres SS r e g

.
s yy
s yy
This is known as the coefficient of determination. This measure is based on the concept that how much
variation in y ’s stated by s yy is explainable by SSreg and how much unexplainable part is contained in
SS res . The ratio SSr e g / s yy describes the proportion of variability that is explained by regression in relation
to the total variability of y . The ratio SSres / s yy describes the proportion of variability that is not covered
by the regression.
It can be seen that
R 2  rxy2
where rxy is the simple correlation coefficient between x and y. Clearly 0  R 2  1 , so a value of R 2 closer
to one indicates the better fit and value of R 2 closer to zero indicates the poor fit.
Prediction of values of study variable
An important use of linear regression modeling is to predict the average and actual values of the study
variable. The term prediction of the value of study variable corresponds to knowing the value of E ( y ) (in
case of average value) and value of y (in case of actual value) for a given value of the explanatory variable.
We consider both cases.
Case 1: Prediction of average value
Under the linear regression model, y   0  1 x   , the fitted model is y  b0  b1 x where b0 and b1 are the
OLS estimators of  0 and 1 respectively.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
25
Suppose we want to predict the value of E ( y ) for a given value of x  x0 . Then the predictor is given by

E
( y | x0 )  ˆ y / x0  b0  b1 x0 .
Predictive bias
Then the prediction error is given as
ˆ y|x  E ( y )  b0  b1 x0  E (  0  1 x0   )
0
 b0  b1 x0  (  0  1 x0 )
 (b0   0 )  (b1  1 ) x0 .
Then
E  ˆ y| x0  E ( y )   E (b0   0 )  E (b1  1 ) x0
 00  0
Thus the predictor  y / x0 is an unbiased predictor of E ( y ).
Predictive variance:
The predictive variance of ˆ y|x0 is
PV ( ˆ y| x0 )  Var (b0  b1 x0 )
 Var  y  b1 ( x0  x ) 
 Var ( y )  ( x0  x ) 2 Var (b1 )  2( x0  x )Cov( y , b1 )

2
n

 2 ( x0  x )2
sxx
0
 1 ( x  x )2 
2   0
.
sxx 
n
Estimate of predictive variance
The predictive variance can be estimated by substituting  2 by ˆ 2  MSE as
2
 ( ˆ )  ˆ 2  1  ( x0  x ) 
PV


y| x0
sxx 
n
 1 ( x  x )2 
 MSE   0
.
sxx 
n
Prediction interval estimation:
The 100(1-  )% prediction interval for E ( y / x0 ) is obtained as follows:
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
26
The predictor ˆ y|x0 is a linear combination of normally distributed random variables, so it is also normally
distributed as

ˆ y|x ~ N  0  1 x0 , PV  ˆ y| x
0
0
 .
So if  2 is known, then the distribution of
ˆ y| x  E ( y | x0 )
0
PV ( ˆ y|x0 )
is N (0,1). So the 100(1-  )% prediction interval is obtained as


ˆ y| x0  E ( y | x0 )
P   z /2 
 z /2   1  
PV ( ˆ y|x0 )


which gives the prediction interval for E ( y / x0 ) as

 1 ( x  x )2 
( x0  x ) 2  
2 1
ˆ
 ˆ y| x0  z /2  2   0
 ,  y|x0  z /2   
 .
n
s
n
s

xx
xx



 
When  2 is unknown, it is replaced by ˆ 2  MSE and in this case the sampling distribution of
ˆ y| x  E ( y | x0 )
0
 1 ( x0  x ) 2 
MSE  
sxx 
n
is t -distribution with (n  2) degrees of freedom, i.e., tn  2 .
The 100(1-  )% prediction interval in this case is




ˆ y|x0  E ( y | x0 )


 t   1   .
P  t /2,n  2 
,n 2
 1 ( x  x )2 
2


MSE   0



sxx 
n


which gives the prediction interval as

 1 ( x  x )2 
 1 ( x0  x ) 2  
ˆ
 ˆ y| x0  t /2, n  2 MSE   0
,
t
MSE



 
 .
y| x0
 /2, n  2
sxx 
sxx

n
n
 
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
27
Note that the width of the prediction interval E ( y | x0 ) is a function of x0 . The interval width is minimum
for x0  x and widens as x0  x increases. This is also expected as the best estimates of y to be made at
x -values lie near the center of the data and the precision of estimation to deteriorate as we move to the
boundary of the x -space.
Case 2: Prediction of actual value
If x0 is the value of the explanatory variable, then the actual value predictor for y is
ŷ0  b0  b1 x0 .
The true value of y in the prediction period is given by y0   0  1 x0   0 where  0 indicates the value that
would be drawn from the distribution of random error in the prediction period. Note that the form of
predictor is the same as of average value predictor, but its predictive error and other properties are different.
This is the dual nature of predictor.
Predictive bias:
The predictive error of ŷ0 is given by
yˆ 0  y0  b0  b1 x0  (  0  1 x0   0 )
 (b0   0 )  (b1  1 ) x0   0 .
Thus, we find that
E ( yˆ 0  y0 )  E (b0   0 )  E (b1  1 ) x0  E ( 0 )
 000  0
which implies that ŷ0 is an unbiased predictor of y0 .
Predictive variance
Because the future observation y0 is independent of ŷ0 , the predictive variance of ŷ0 is
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
28
PV ( yˆ 0 )  E ( yˆ 0  y0 ) 2
 E[(b0   0 )  ( x0  x )(b1  1 )  (b1  1 ) x   0 ]2
 Var (b0 )  ( x0  x ) 2 Var (b1 )  x 2Var (b1 )  Var ( 0 )  2( x0  x )Cov(b0 , b1 )  2 xCov(b0 , b1 )  2( x0  x )Var (b1 )
[rest of the terms are 0 assuming the independence of  0 with 1 ,  2 ,...,  n ]
 Var (b0 )  [( x0  x ) 2  x 2  2( x0  x )]Var (b1 )  Var ( )  2[( x0  x )  2 x ]Cov(b0 , b1 )
 Var (b0 )  x02Var (b1 )  Var ( 0 )  2 x0Cov(b0 , b1 )
1 x2 
2
x 2
  2     x02
  2  2 x0
sxx
sxx
 n sxx 
 1 ( x  x )2 
  2 1   0
.
sxx 
 n
Estimate of predictive variance
The estimate of predictive variance can be obtained by replacing  2 by its estimate ˆ 2  MSE as

 1 ( x  x )2 
PV ( yˆ 0 )  ˆ 2 1   0

sxx 
 n
 1 ( x  x )2 
 MSE 1   0
.
sxx 
 n
Prediction interval:
If  2 is known, then the distribution of
yˆ 0  y0
PV ( yˆ 0 )
is N (0,1). So the 100(1-  )% prediction interval is obtained as


yˆ  y0
P   z /2  0
 z /2   1  
PV ( yˆ 0 )


which gives the prediction interval for y0 as

 1 ( x  x )2 
1 ( x0  x ) 2  
2
 yˆ 0  z /2  2 1   0
 , yˆ 0  z /2  1  
.
sxx 
sxx  

 n
 n

When  2 is unknown, then
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
29
yˆ 0  y0
 ( yˆ )
PV
0
follows a t -distribution with (n  2) degrees of freedom. The 100(1-  )% prediction interval for ŷ0 in this
case is obtained as


yˆ  y0
P  t / 2, n  2  0
 t /2,n  2   1  
 ( yˆ )


PV
0


which gives the prediction interval

 1 ( x  x )2 
 1 ( x0  x ) 2  
ˆ
 yˆ 0  t /2,n  2 MSE 1   0
 , y0  t / 2,n  2 MSE 1  
 .
n
s
n
s

xx
xx



 
The prediction interval is of minimum width at x0  x and widens as x0  x increases.
The prediction interval for ŷ0 is wider than the prediction interval for ˆ y / x0 because the prediction interval
for
ŷ0 depends on both the error from the fitted model as well as the error associated with the future
observations.
Reverse regression method
The reverse (or inverse) regression approach minimizes the sum of squares of horizontal distances between
the observed data points and the line in the following scatter diagram to obtain the estimates of regression
parameters.
yi
Y   0  1 X
(xi,
(Xi,
)
x,
Reverse regression
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
30
The reverse regression has been advocated in the analysis of gender (or race) discrimination in salaries. For
example, if y denotes salary and x denotes qualifications, and we are interested in determining if there is
gender discrimination in salaries, we can ask:
“Whether men and women with the same qualifications (value of x) are getting the same salaries
(value of y). This question is answered by the direct regression.”
Alternatively, we can ask:
“Whether men and women with the same salaries (value of y) have the same qualifications (value of
x). This question is answered by the reverse regression, i.e., regression of x on y.”
The regression equation in case of reverse regression can be written as
xi   0*  1* yi   i (i  1, 2,..., n)
where  i ’s are the associated random error components and satisfy the assumptions as in the case of the
usual simple linear regression model. The reverse regression estimates ˆOR of  0* and ˆ1R of  1* for the
model are obtained by interchanging the x and y in the direct regression estimators of  0 and 1 . The
estimates are obtained as
ˆOR  x  ˆ1R y
and
s
ˆ1R  yy
sxy
for  0 and 1 respectively. The residual sum of squares in this case is
SS res  sxx 
*
sxy2
s yy
.
Note that
ˆ1Rb1 
sxy2
sxx s yy
 rxy2
where b1 is the direct regression estimator of the slope parameter and rxy is the correlation coefficient
between x and y. Hence if rxy2 is close to 1, the two regression lines will be close to each other.
An important application of the reverse regression method is in solving the calibration problem.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
31
Orthogonal regression method (or major axis regression method)
The direct and reverse regression methods of estimation assume that the errors in the observations are either
in x -direction or y -direction. In other words, the errors can be either in the dependent variable or
independent variable. There can be situations when uncertainties are involved in dependent and independent
variables both. In such situations, the orthogonal regression is more appropriate. In order to take care of
errors in both the directions, the least-squares principle in orthogonal regression minimizes the squared
perpendicular distance between the observed data points and the line in the following scatter diagram to
obtain the estimates of regression coefficients. This is also known as the major axis regression method.
The estimates obtained are called orthogonal regression estimates or major axis regression estimates of
regression coefficients.
yi
(xi, yi)
Y   0  1 X
(Xi, Yi)
xi
Orthogonal or major axis regression
If we assume that the regression line to be fitted is Yi   0  1 X i , then it is expected that all the observations
( xi , yi ), i  1, 2,..., n lie on this line. But these points deviate from the line, and in such a case, the squared
perpendicular distance of observed data ( xi , yi ) (i  1, 2,..., n) from the line is given by
di2  ( X i  xi )2  (Yi  yi ) 2
where ( X i , Yi ) denotes the i th pair of observation without any error which lies on the line.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
32
n
The objective is to minimize the sum of squared perpendicular distances given by

i 1
di2 to obtain the
estimates of  0 and 1 . The observations ( xi , yi ) (i  1, 2,..., n) are expected to lie on the line
Yi   0  1 X i ,
so let
Ei  Yi   0  1 X i  0.
n
The regression coefficients are obtained by minimizing
d
i 1
2
i
under the constraints Ei ' s using the
Lagrangian’s multiplier method. The Lagrangian function is
n
n
i 1
i 1
L0   di2  2 i Ei
where 1 ,..., n are the Lagrangian multipliers. The set of equations are obtained by setting
L0
L
L
L
 0, 0  0, 0  0 and 0  0 (i  1, 2,..., n).
X i
Yi
 0
1
Thus we find
L0
 ( X i  xi )  i 1  0
X i
L0
 (Yi  yi )  i  0
Yi
L0

 0
L0

1
n

i 1
i
0
n
 X
i 1
i
i
 0.
Since
X i  xi  i 1
Yi  yi  i ,
so substituting these values is Ei , we obtain
Ei  ( yi  i )   0  1 ( xi  i 1 )  0
 i 
 0  1 xi  yi
.
1  12
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
33
Also using this i in the equation
n
 (
i 1
0
 1 xi  yi )
1  12
n

i
i 1
 0 , we get
0
and using ( X i  xi )  i 1  0 and
n
 X
i 1
i
i
 0 , we get
n
  ( x    )  0.
i 1
i
i
i
1
Substituting i in this equation, we get
n
 ( x   x
i 1
2
1 i
0 i
 yi xi )
(1  i2 )

1 (  0  1 xi  yi )2
 0.
(1  12 ) 2
Using i in the equation and using the equation
n
 (
i 1
0
 1 xi  yi )
1  12
n

i 1
i
(1)
 0 , we solve
 0.
The solution provides an orthogonal regression estimate of  0 as
ˆ0OR  y  ˆ1OR x
where ˆ1OR is an orthogonal regression estimate of 1.
Now, substituting  0OR in equation (1), we get
2
 (1   )  yxi  1 xxi  1 xi2  xi yi   1   y  1 x  1 xi  yi   0
n
i 1
n
2
1
i 1
or
(1  12 )
n
n
i 1
i 1
2
 xi  yi  y  1 ( xi  x )  1   ( yi  y )  1 ( xi  x )  0
or
n
n
i 1
i 1
(1  12 ) (ui  x )(vi  1ui )  1  (vi  1ui ) 2  0
where
ui  xi  x ,
vi  yi  y .
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
34
Since
n
n
 ui   ui  0, so
i 1
i 1
n
  
u v  1 (ui2  vi2 )  ui vi   0
2
1 i i
i 1
or
12 sxy  1 ( sxx  s yy )  sxy  0.
Solving this quadratic equation provides the orthogonal regression estimate of 1 as
ˆ
1OR

s
yy
 sxx   sign  sxy  ( sxx  s yy ) 2  4s 2xy
2sxy
where sign( sxy ) denotes the sign of sxy which can be positive or negative. So
1 if sxy  0
sign( sxy )  
.
1 if sxy  0.
Notice that this gives two solutions for
n
solution maximizes
d
i 1
2
i
ˆ1OR . We choose the solution which minimizes
n
d
i 1
2
i
. The other
and is in the direction perpendicular to the optimal solution. The optimal solution
can be chosen with the sign of sxy .
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
35
Reduced major axis regression method:
The direct, reverse and orthogonal methods of estimation minimize the errors in a particular direction which
is usually the distance between the observed data points and the line in the scatter diagram. Alternatively,
one can consider the area extended by the data points in a certain neighbourhood and instead of distances, the
area of rectangles defined between the corresponding observed data point and the nearest point on the line in
the following scatter diagram can also be minimized. Such an approach is more appropriate when the
uncertainties are present in the study and explanatory variables both. This approach is termed as reduced
major axis regression.
yi
(xi yi)
Y   0  1 X
(Xi, Yi)
xi
Reduced major axis method
Suppose the regression line is Yi   0  1 X i on which all the observed points are expected to lie. Suppose
the points ( xi , yi ), i  1, 2,..., n are observed which lie away from the line. The area of rectangle extended
between the i th observed data point and the line is
Ai  ( X i ~ xi )(Yi ~ yi ) (i  1, 2,..., n)
where ( X i , Yi ) denotes the i th pair of observation without any error which lies on the line.
The total area extended by n data points is
n
n
i 1
i 1
 Ai   ( X i ~ xi )(Yi ~ yi ).
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
36
All observed data points ( xi , yi ), (i  1, 2,..., n) are expected to lie on the line
Yi   0  1 X i
and let
Ei*  Yi   0  1 X i  0.
So now the objective is to minimize the sum of areas under the constraints Ei* to obtain the reduced major
axis estimates of regression coefficients. Using the Lagrangian multiplier method, the Lagrangian function is
n
n
i 1
i 1
LR   Ai   i Ei*
n
n
i 1
i 1
  ( X i  xi )(Yi  yi )   i Ei*
where 1 ,..., n are the Lagrangian multipliers. The set of equations are obtained by setting
LR
L
L
L
 0, R  0, R  0, R  0 (i  1, 2,..., n).
X i
Yi
 0
1
Thus
LR
 (Yi  yi )  1i  0
X i
LR
 ( X i  xi )  i  0
Yi
LR n
  i  0
 0 i 1
LR n
  i X i  0.
1 i 1
Now
X i  xi  i
Yi  yi  1i
 0  1 X i  yi  1i
 0  1 ( xi  i )  yi  1i
y   0  1 xi
 i  i
.
2 1
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
37
n

i in
Substituting
i 1
i
0,
the reduced major axis regression estimate of
 0 is obtained as
ˆ0 RM  y  ˆ1RM x
where ˆ1RM is the reduced major axis regression estimate of 1 . Using X i  xi  i , i and ˆ0 RM in
n
 X
i 1
i
i
 0 , we get
 yi  y  1 x  1 xi  
yi  y  1 x  1 xi
  xi 
2 1
2 1
i 1 

n


  0.

Let ui  xi  x and vi  yi  y , then this equation can be re-expressed as
n
 (v   u )(v   u  2 x )  0.
Using
1 i
i
i 1
n
n
i 1
i 1
i
1 i
1
 ui   ui  0, we get
n
v
i 1
2
i
n
 12  ui2  0.
i 1
Solving this equation, the reduced major axis regression estimate of 1 is obtained as
ˆ1RM  sign( sxy )
s yy
sxx
1 if sxy  0
where sign ( sxy )  
1 if sxy  0.
We choose the regression estimator which has same sign as of sxy .
Least absolute deviation regression method
The least-squares principle advocates the minimization of the sum of squared errors. The idea of squaring the
errors is useful in place of simple errors because random errors can be positive as well as negative. So
consequently their sum can be close to zero indicating that there is no error in the model and which can be
misleading. Instead of the sum of random errors, the sum of absolute random errors can be considered which
avoids the problem due to positive and negative random errors.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
38
In the method of least squares, the estimates of the parameters  0 and
yi   0  1 xi   i . (i  1, 2,..., n) are chosen such that the sum of squares of deviations
1 in the model
n

i 1
2
i
is minimum. In
the method of least absolute deviation (LAD) regression, the parameters  0 and 1 are estimated such that
n
the sum of absolute deviations

i 1
i
is minimum. It minimizes the absolute vertical sum of errors as in the
following scatter diagram:
yi
(xi, yi)
Y   0  1 X
(Xi, Yi)
xi
Least absolute deviation regression
The LAD estimates ˆ0 L and ˆ1L are the estimates of  0 and 1 , respectively which minimize
n
LAD(  0 , 1 )   yi   0  1 xi
i 1
for the given observations ( xi , yi ) (i  1, 2,..., n).
Conceptually, LAD procedure is more straightforward than OLS procedure because e (absolute residuals)
is a more straightforward measure of the size of the residual than e 2 (squared residuals). The LAD
regression estimates of
 0 and 1
are not available in closed form. Instead, they can be obtained
numerically based on algorithms. Moreover, this creates the problems of non-uniqueness and degeneracy in
the estimates. The concept of non-uniqueness relates to that more than one best line pass through a data
point. The degeneracy concept describes that the best line through a data point also passes through more than
one other data points. The non-uniqueness and degeneracy concepts are used in algorithms to judge the
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
39
quality of the estimates. The algorithm for finding the estimators generally proceeds in steps. At each step,
the best line is found that passes through a given data point. The best line always passes through another
data point, and this data point is used in the next step. When there is non-uniqueness, then there is more than
one best line. When there is degeneracy, then the best line passes through more than one other data point.
When either of the problems is present, then there is more than one choice for the data point to be used in the
next step and the algorithm may go around in circles or make a wrong choice of the LAD regression line.
The exact tests of hypothesis and confidence intervals for the LAD regression estimates can not be derived
analytically. Instead, they are derived analogously to the tests of hypothesis and confidence intervals related
to ordinary least squares estimates.
Estimation of parameters when X is stochastic
In a usual linear regression model, the study variable is supped to be random and explanatory variables are
assumed to be fixed. In practice, there may be situations in which the explanatory variable also becomes
random.
Suppose both dependent and independent variables are stochastic in the simple linear regression model
y   0  1 X  
where  is the associated random error component. The observations ( xi , yi ), i  1, 2,..., n are assumed to be
jointly distributed. Then the statistical inferences can be drawn in such cases which are conditional on X .
Assume the joint distribution of X and y to be bivariate normal N (  x ,  y ,  x2 ,  y2 ,  ) where  x and  y
are the means of X and y; x2 and  y2 are the variances of X and y; and  is the correlation coefficient
between X and
y . Then the conditional distribution of
y given X  x is the univariate normal
conditional mean
E ( y | X  x)   y|x  0  1 x
and the conditional variance of y given X  x is
Var ( y | X  x)   y2| x   y2 (1   2 )
where
 0   y   x 1
and
1 
y
.
x
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
40
When both X and y are stochastic, then the problem of estimation of parameters can be reformulated as
follows. Consider a conditional random variable y | X  x having a normal distribution with mean as
conditional mean  y|x and variance as conditional variance Var ( y | X  x)   y2| x . Obtain n independently
distributed observation yi | xi , i  1, 2,..., n from N (  y|x ,  y2| x ) with nonstochastic X . Now the method of
maximum likelihood can be used to estimate the parameters which yield the estimates of  0 and 1 as
earlier in the case of nonstochastic X as
b  y  b1 x
and
s
b1  xy ,
sxx
respectively.
Moreover, the correlation coefficient

E ( y   y )( X   x )
 y x
can be estimated by the sample correlation coefficient
n
ˆ 
 ( y  y )( x  x )
i 1
i
n
 ( xi  x )2
i 1

i
n
 ( y  y)
i 1
2
i
sxy
sxx
s yy
s
 b1 xx .
s yy
Thus
ˆ 2  b12
sxx
s yy
sxy
 b1
s yy
n

s yy   ˆi2
i 1
s yy
 R2
which is same as the coefficient of determination. Thus R 2 has the same expression as in the case when X
is fixed. Thus R 2 again measures the goodness of the fitted model even when X is stochastic.
Regression Analysis | Chapter 2 | Simple Linear Regression Analysis | Shalabh, IIT Kanpur
41
Download