REGRESSION Meaning of regression: A British biometrician, Sir Francis Galton, defined regression as ‘stepping back towards the average’. He found that the offspring of abnormally tall or short parents tends to regress or step back to average. Line of regression: If a straight line can be drawn through the points of a scatter diagram, it is called as the line of regression. 1 Regression Equation: Regression equation establishes a linear mathematical relationship between the two variables. Use of regression equation: Regression equations are used to estimate/ predict values of one variable from the known values of the other. 2 Let the regression line be represented by the equation Y = A + BX or Y = a + bX Here X is called the independent variable and Y is called the dependent variable. The equation of the regression line can be obtained by minimization of error from the above mentioned equation. While doing so we get the following two NORMAL EQUATIONS. 3 Σyi = n a + b Σxi --------------- 1 Σxiyi = a Σxi + b Σxi2 -------------- 2 These are two equations in two unknowns a and b. On solving them simultaneously, we get, 4 1 xi yi xy n b 2 xi x 2 n a y bx 5 Regression equation of Y on X y = a + bx Or _ _ (y-y) = byx(x-x) _ _ where byx = 1/n ∑(xi – x)(yi – y) _ 1/n∑(xi – x)2 6 Let the regression line be represented by the equation X = a’ + b’Y or x = a’ + yb’ Here Y is called the independent variable and X is called the dependent variable. The equation of the regression line can be obtained by minimization of error from the above mentioned equation. While doing so we get the following two NORMAL EQUATIONS. 7 Σxi = n a’ + b’ Σyi --------------- 1 Σxiyi = a’ Σyi + b’ Σyi2 -------------- 2 These are two equations in two unknowns a’ and b’. On solving them simultaneously, we get, 8 1 xi yi xy n b' 2 yi y 2 n a ' x b' y 9 Regression equation of X on Y x = a’ + b’y _ _ (x – x) = bxy(y – y) _ _ where bxy = 1/n ∑(xi – x)(yi – y) _ 1/n∑(yi – y)2 10 Why are there two line of regression? There are two lines of regression: One of Y on X and the other of X on Y. The line of regression of Y on X is used to estimate or predict the value of Y for any given value of X i.e. when Y is a dependent variable and X is an independent variable. The estimate so obtained will be best in the sense that it will have the minimum possible error as defined by the principle of least squares. 11 We can also estimate X for any given value of Y by using the same equation Y = a + bX but the estimate so obtained will not be best since this equation is obtained by minimizing the sum of squares of errors of estimate in Y and not in X. Hence to estimate X for any given value of Y we use the regression equation of X on Y which is derived on minimizing the sum of squares of errors of estimates in X. Here X is a dependent variable and Y is an independent variable. 12 The two regression equations are not reversible or interchangeable because the basis and assumption for deriving these equations are quite different. The regression equation of Y on X is obtained on minimizing the sum of squares of the errors parallel to the Y–axis while the regression equation of X on Y is obtained on minimizing the sum of squares of the errors parallel to the X-axis. 13 Properties of Regression Coefficients 1. Regression coefficients represent the slopes of the respective lines of regression. 2. The point of intersection of the two regression lines is ____ ____ at (X, Y). 3. A regression coefficient represents the increment in the value of the dependent variable corresponding to a unit change in the value of the independent variable. 4. r byxbxy 14 Note: The sign of r depends on the sign of the regression coefficients. If the regression coefficients are positive, r will be positive and if the regression coefficients are negative, r will be negative. 4. If one of the regression coefficients is less than unity, the other must be greater than unity. i.e. if byx < 1 then bxy > 1 and if bxy < 1 then byx > 1 5. The arithmetic mean of the regression coefficients is always greater than or equal to the correlation coefficient. i.e. ½ (bxy + byx) > r 15