Simple Linear Regression Analysis

IT233: Applied Statistics
TIHE 2005
Lecture 07
In this section we do an in-depth analysis of the linear association between two
variables X (called an independent variable or regressor) and Y (called a
dependent variable or response).
Simple linear regression makes three basic assumptions:
1. Given a value x_0 of X, the corresponding value of Y is a random variable whose mean μ_{Y|x_0} (the mean of Y given that X = x_0) is a linear function of x_0, i.e.
   μ_{Y|x_0} = α + β x_0, or E(Y | X = x_0) = α + β x_0
2. The distribution of Y about this mean is Normal.
3. The variance of Y is the same for all given values of X, i.e.
   σ²_{Y|x_0} = σ², for any x_0
Example: In simple linear regression, suppose the variance of Y when X = 4 is 16. What is the variance of Y when X = 5?
Answer: The same, 16, by the equal-variance assumption (3).
Simple Linear Regression Model:
Using the above assumptions, we can write the model as:
   Y = α + βx + ε
where ε is a random variable (or error term) that follows a normal distribution with E(ε) = 0 and Var(ε) = σ², i.e.
   ε ~ N(0, σ²)
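The model above can be sketched with a short simulation; the parameter values here (α = 2, β = 0.5, σ = 1) are illustrative assumptions, not values from the lecture.

```python
import random

# Illustrative parameter choices (assumptions, not from the lecture)
alpha, beta, sigma = 2.0, 0.5, 1.0

random.seed(0)  # reproducible error terms
xs = [float(x) for x in range(1, 11)]
# Each response is the linear mean alpha + beta*x plus a N(0, sigma^2) error
ys = [alpha + beta * x + random.gauss(0.0, sigma) for x in xs]
```

Repeating this simulation for a fixed x_0 would show the y-values scattering normally around the mean α + β x_0, exactly as assumptions 1–3 describe.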
Illustration of Linear Regression:
Let X = the height of the father
    Y = the height of the son
For fathers whose height is x_0, the heights of the sons will vary randomly. Linear regression states that the height of the son is, on average, a linear function of the height of the father, i.e.
   μ_{Y|x} = α + βx
Scatter Diagrams:
A scatter diagram will suggest whether a simple linear regression model would
be a good fit.
Figure (A) suggests that a simple linear regression model seems OK, though the fit is not very good (wide scatter around the fitted line).
Figure (B) suggests that a simple linear regression model fits well (points are close to the fitted line).
In (C) a straight line could be fitted, but the relationship is not linear.
In (D) there is no relationship between Y and X .
Fitting a Linear Regression Equation:
Keep in mind that there are two lines:
   Y = α + βx   (true line)
   Ŷ = α̂ + β̂x   (estimated line)
Notation:
   a (or α̂) = estimate of α
   b (or β̂) = estimate of β
   y_i = the observed value of Y corresponding to x_i
   ŷ_i = the fitted value of Y corresponding to x_i
   e_i = y_i − ŷ_i = the residual
Residual: The Error in Fit:
The residual, denoted by e_i = y_i − ŷ_i, is the difference between the observed and fitted values of Y. It estimates ε_i.
[Figure: the residual e_i is the vertical distance between the observed point (x_i, y_i) and the fitted line Ŷ = α̂ + β̂x.]
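As a small numeric sketch of the residual definition, suppose we already have a fitted line ŷ = a + bx; the values of a, b, and the data below are made up for illustration:

```python
# Hypothetical fitted line and data, purely for illustration
a, b = 1.0, 2.0
xs = [1.0, 2.0, 3.0]
ys = [3.2, 4.9, 7.1]

# Fitted values: yhat_i = a + b * x_i
yhats = [a + b * x for x in xs]
# Residual e_i = y_i - yhat_i, the vertical distance to the fitted line
residuals = [y - yh for y, yh in zip(ys, yhats)]
```

Here the fitted values are 3, 5, and 7, so the residuals are about 0.2, −0.1, and 0.1: each one estimates the unobservable error ε_i for that observation.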
The Method of Least Squares:
We shall find a (or α̂) and b (or β̂), the estimates of α and β, so that the sum of the squares of the residuals is a minimum. The residual sum of squares is often called the sum of squares of the errors about the regression line and is denoted by SSE. This minimization procedure for estimating the parameters is called the method of least squares. Hence, we shall find a and b so as to minimize
   SSE = Σ e_i² = Σ (y_i − ŷ_i)² = Σ (y_i − a − b x_i)²
where all sums run over i = 1, …, n.
Differentiating SSE with respect to a and b and setting the partial derivatives equal to zero, we obtain the equations (called the normal equations):
   na + b Σ x_i = Σ y_i
   a Σ x_i + b Σ x_i² = Σ x_i y_i
which may be solved simultaneously for a and b.
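Solving the two normal equations simultaneously gives closed-form expressions for a and b. Here is a minimal sketch; the example data, chosen to lie exactly on y = 1 + 2x, are an assumption for illustration:

```python
def least_squares(xs, ys):
    """Solve the normal equations
         n*a + b*sum(x)        = sum(y)
         a*sum(x) + b*sum(x^2) = sum(x*y)
    simultaneously for the intercept a and slope b."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a from the two equations yields the slope, then the intercept
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

# Points lying exactly on y = 1 + 2x should recover a = 1, b = 2
a, b = least_squares([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

Because the four points fall exactly on a line, every residual is zero and SSE attains its minimum value of 0 at a = 1, b = 2.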