Regression equation of X on Y

REGRESSION
Meaning of regression: A British biometrician, Sir Francis
Galton, defined regression as ‘stepping back towards the
average’. He found that the offspring of abnormally tall or
short parents tend to regress, or step back, towards the average.
Line of regression: The straight line that best fits the
points of a scatter diagram is called the line of
regression.
Regression Equation: A regression equation expresses a
linear mathematical relationship between two
variables.
Use of regression equation: Regression equations are used to
estimate or predict values of one variable from the known
values of the other.
Let the regression line be represented by the equation
Y = A + BX or Y = a + bX
Here X is called the independent variable and Y is called the
dependent variable.
The equation of the regression line is obtained by
minimizing the sum of squared errors in Y about this line
(the principle of least squares). Doing so gives the
following two NORMAL EQUATIONS.
$$\sum y_i = n a + b \sum x_i \qquad (1)$$
$$\sum x_i y_i = a \sum x_i + b \sum x_i^2 \qquad (2)$$
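The normal equations for Y on X can be traced back to the least-squares criterion. The sketch below is a standard derivation, assuming the error being minimized is the sum of squared vertical deviations; the symbol S for that sum is introduced here for illustration and does not appear in the slides.

```latex
% S is the sum of squared errors in Y (notation assumed for this sketch)
S(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Setting the partial derivative with respect to a to zero:
\frac{\partial S}{\partial a} = -2 \sum (y_i - a - b x_i) = 0
\;\Longrightarrow\; \sum y_i = n a + b \sum x_i

% Setting the partial derivative with respect to b to zero:
\frac{\partial S}{\partial b} = -2 \sum x_i (y_i - a - b x_i) = 0
\;\Longrightarrow\; \sum x_i y_i = a \sum x_i + b \sum x_i^2
```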
These are two equations in two
unknowns a and b. On solving them
simultaneously, we get,
$$b = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum x_i^2 - \bar{x}^2}$$
$$a = \bar{y} - b\,\bar{x}$$
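The formulas for a and b can be evaluated directly from data. The following is a minimal sketch in Python; the function name `regression_y_on_x` and the sample data are illustrative assumptions, not part of the slides.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b = (mean of x*y - x_bar*y_bar) / (mean of x^2 - x_bar^2)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_xx = sum(x * x for x in xs) / n
    b = (mean_xy - x_bar * y_bar) / (mean_xx - x_bar ** 2)
    # a = y_bar - b * x_bar
    a = y_bar - b * x_bar
    return a, b


# Hypothetical sample data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_y_on_x(xs, ys)
print(f"y = {a:.3f} + {b:.3f} x")
```

For this sample the fitted values satisfy both normal equations, which is a quick way to check a hand calculation.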
Regression equation of Y on X
y = a + bx
Or
$$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$$
where
$$b_{yx} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum (x_i - \bar{x})^2}$$
Let the regression line be represented by the equation
X = a’ + b’Y or x = a’ + b’y
Here Y is called the independent variable and X is called
the dependent variable.
The equation of the regression line is obtained by
minimizing the sum of squared errors in X about this
line (the principle of least squares). Doing so gives the
following two NORMAL EQUATIONS.
$$\sum x_i = n a' + b' \sum y_i \qquad (1)$$
$$\sum x_i y_i = a' \sum y_i + b' \sum y_i^2 \qquad (2)$$
These are two equations in two
unknowns a’ and b’. On solving them
simultaneously, we get,
$$b' = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum y_i^2 - \bar{y}^2}$$
$$a' = \bar{x} - b'\,\bar{y}$$
Regression equation of X on Y
x = a’ + b’y
Or
$$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$$
where
$$b_{xy} = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum (y_i - \bar{y})^2}$$
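The X-on-Y coefficients can be computed the same way from the deviation form. This is a minimal sketch; the function name `regression_x_on_y` and the sample data are assumptions made for illustration only.

```python
def regression_x_on_y(xs, ys):
    """Return (a_prime, b_xy) for the least-squares line x = a' + b_xy*y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: (1/n) * sum of products of deviations from the means
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # Denominator: (1/n) * sum of squared deviations of y (not x)
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    b_xy = cov / var_y
    a_prime = x_bar - b_xy * y_bar
    return a_prime, b_xy


# Hypothetical sample data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_prime, b_xy = regression_x_on_y(xs, ys)
print(f"x = {a_prime:.3f} + {b_xy:.3f} y")
```

The only change from the Y-on-X calculation is the denominator, which uses the spread of y instead of the spread of x.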
Why are there two lines of regression?
There are two lines of regression: One of Y on X and the
other of X on Y.
The line of regression of Y on X is used to estimate or
predict the value of Y for any given value of X i.e. when Y
is a dependent variable and X is an independent variable.
The estimate so obtained will be best in the sense that it will
have the minimum possible error as defined by the principle
of least squares.
We can also estimate X for any given value of Y by using the
same equation Y = a + bX but the estimate so obtained will
not be best since this equation is obtained by minimizing the
sum of squares of errors of estimate in Y and not in X.
Hence, to estimate X for any given value of Y, we use the
regression equation of X on Y, which is derived by minimizing
the sum of squares of errors of estimate in X.
Here X is a dependent variable and Y is an independent
variable.
The two regression equations are not reversible or
interchangeable because the basis and assumptions for
deriving them are quite different.
The regression equation of Y on X is obtained by
minimizing the sum of squares of the errors parallel to
the Y-axis, while the regression equation of X on Y is
obtained by minimizing the sum of squares of the errors
parallel to the X-axis.
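A small numerical check can make the point about non-interchangeability concrete. The sketch below fits both lines to the same hypothetical data, then compares the squared error in X obtained from the X-on-Y line against the error obtained by algebraically inverting the Y-on-X line. The helper `fit` and the data are illustrative assumptions.

```python
def fit(u, v):
    """Least-squares line v = intercept + slope*u; returns (intercept, slope)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    cov = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / n
    var_u = sum((ui - u_bar) ** 2 for ui in u) / n
    slope = cov / var_u
    return v_bar - slope * u_bar, slope


# Hypothetical sample data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit(xs, ys)               # Y on X: y = a + b*x
a_prime, b_prime = fit(ys, xs)   # X on Y: x = a' + b'*y

# Squared error in X when X is estimated from the X-on-Y line
sse_direct = sum((x - (a_prime + b_prime * y)) ** 2 for x, y in zip(xs, ys))
# Squared error in X when the Y-on-X line is solved for x instead
sse_inverted = sum((x - (y - a) / b) ** 2 for x, y in zip(xs, ys))

print("X on Y line:          ", sse_direct)
print("inverted Y on X line: ", sse_inverted)
```

Unless the points lie exactly on one straight line, the inverted line gives a larger squared error in X, which is why a separate X-on-Y equation is used.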
Properties of Regression Coefficients
1. Regression coefficients represent the slopes of the
respective lines of regression.
2. The point of intersection of the two regression lines is
at $(\bar{x}, \bar{y})$.
3. A regression coefficient represents the increment in the
value of the dependent variable corresponding to a unit
change in the value of the independent variable.
4. $r = \pm\sqrt{b_{yx}\, b_{xy}}$
Note: The sign of r depends on the sign of the regression
coefficients. If the regression coefficients are positive, r
will be positive and if the regression coefficients are
negative, r will be negative.
5. If one of the regression coefficients is greater than unity,
the other must be less than unity, because their product
byx bxy = r² cannot exceed 1.
i.e. if byx > 1 then bxy < 1 and if bxy > 1 then byx < 1
6. The arithmetic mean of the regression coefficients is
always greater than or equal to the correlation coefficient
(for positive coefficients; in general the comparison holds
in magnitude).
i.e. ½ (bxy + byx) ≥ r
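The properties listed above can be checked numerically on any data set. The following is a minimal sketch, assuming a small hypothetical sample; all variable names are illustrative.

```python
import math

# Hypothetical sample data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n

b_yx = cov / var_x   # slope of the Y-on-X line
b_xy = cov / var_y   # slope of the X-on-Y line

# Correlation coefficient computed directly, and from the two slopes
r_direct = cov / math.sqrt(var_x * var_y)
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Both lines pass through the point of means
a = y_bar - b_yx * x_bar
a_prime = x_bar - b_xy * y_bar
means_on_y_on_x = math.isclose(y_bar, a + b_yx * x_bar)
means_on_x_on_y = math.isclose(x_bar, a_prime + b_xy * y_bar)

print("r agrees with sqrt(b_yx * b_xy):", math.isclose(r_direct, r_from_slopes))
print("both lines pass through the means:", means_on_y_on_x and means_on_x_on_y)
print("product of slopes <= 1:", b_yx * b_xy <= 1)
print("mean of slopes >= r:", 0.5 * (b_yx + b_xy) >= r_direct)
```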