British Biometrician Sir Francis Galton was the one who used the term Regression in the later part of 19 century. Regression means the estimation or prediction of the unknown value of dependent variable from the known value of independent variable. In other words, regression analysis is a mathematical measure of the average relationship between two or more variables. Definition of Regression By M.M. Blair- Regression is the measure of the average relationship between two or more variables. By Hirsch- regression analysis measures the nature and extent of the relation between two or more variables, thus enables us to make predictions. 1. Regression analysis is essential for planning and policy making: It is widely used to estimate the coefficient of economic relationship. These coefficient helps in the formulation of economic policy of govt. 2. Testing Economic Theory: Regression analysis is one of the basic tool to test the accuracy of economic theory. 3. Prediction: By regression analysis, the value of dependent variable can be predicted on the basis of the value of an independent variable. For example, if price of a commodity rises, what will be the probable fall in demand, this can be predicted by regression. 4. Regression is used to study the functional relationship: In agriculture, it is important to study the response of fertilizer. Basis Correlation Regression 1. Degree and nature of Relationship Correlation is a measure of degree of relationship between X and Y. Regression studies the relationship between two variables so that one may be able to predict the value of one variable on the basis of another. 2. Cause and Effect Relationship Correlation does not Regression analysis always assume cause and expresses the cause effect relationship. and effect relationship. 3. Prediction Correlation does not make Regression analysis any prediction. enable us to make prediction. 4. Non-sense Correlation: In Correlation analysis, sometimes there is a nonsense correlation like between rise in income and rise in weight In regression analysis there is nothing like nonsense regression. 1. Simple and Multiple Regression: In simple regression analysis, we study only two variables at a time, in which one variable is dependent and another is independent. The relationship between income and expenditure is an example of simple regression. In multiple regression analysis, we study more than two variables at a time in which one is dependent variable and others are independent variable. The study of effect of rain and irrigation on yield of wheat is an example of multiple regression. 2. Linear and Non-linear Regression: When one variable changes with other variable in fixed ratio this is called linear regression. When one variable varies with other variable in changing ratio, then it is called as non-linear regression. 3. Partial and Total Regression: When two or more variables are studied for functional relationship but at a time, relationship between two variables is studied and the other variables are constant, it is known as partial regression. When all the variables are studied simultaneously for the relationship among them is called total regression. Regression Lines Regression Equations Regression Coefficients 1. 2. The Regression Line shows the average relationship between two variables. This is also known as the line of Best Fit. If two variables X and Y are given, then there are two regression lines related to them: Regression Line of X on Y: The regression line of X on Y gives the best estimate for the value of X for any given value of Y Regression Line of Y on X: The regression line of Y on X gives the best estimate for the value of Y for any value of X. 1. Scatter diagram method 2. Least Square method This is the simplest method of constructing regression lines. In this method, values of related variables are plotted on a graph. A straight line is drawn passing through the plotted points. The straight line is drawn with freehand. The shape of regression line can be linear or non-linear. Regression Lines can also be constructed by this method. Under this method a regression line is fitted through different points in such a way that the sum of squares of the deviations of the observed values from the fitted line shall be least. The line drawn by this method is called Line of Best Fit. Under this method two lines regression lines are drawn in such a way that sum of squared deviations becomes minimum. Regression equations are the algebraic formulation of regression lines. Regression Equations represent regression lines. There are two regression equations: Regression Equation of Y on X Regression equation of X on Y This equation is used to estimate the probable value of Y on the basis of the given values of X. This equation can be expressed in the following way: Y=a+bX Here a and b are constants Regression equation of Y on X can also be presented in another way: Y−Y¯=byx(X—X¯) This equation is used to estimate the probable values of X on the basis of the given values of Y. This equation is expressed in the following: X=a+bY Here a and b are constants Regression equation of X on Y can also be written in the following way: X—X¯=bxy(Y—Y¯¯) There are two regression coefficients. Regression coefficients measures the average change in the value of one variable for q unit change in the value of another variables. Regression coefficient represents the slope of a regression line. There are two regression coefficients: Regression coefficient of Y on X Regression coefficient of X on Y This coefficient shows that with a unit change in the value of X variable, what will be the average change in the value of Y variable. This is represented byx. Its formula is as follows: byx=r. y∕ x Regression coefficient of X on Y This coefficient shows that with a unit change in the value of Y variable, what will be the average change in the value of X-variable. It is represented by bxy. Its formula is as follows: bxy=r. x∕ y Following are the properties of regression coefficients: 1. Coefficient of correlation is the geometric mean of the regression coefficients. r=√bxy×byx 2. Both the regression coefficients must have same sign- This means either both regression coefficients will either be positive or negative. In other words when one regression coefficient is negative, the other would also be negative. It is never possible that one regression coefficient is negative while the other is postive. 3. The coefficient of correlation will have the same sign as that of regression coefficients- If both regression coefficients are negative, then the correlation would be negative. And if byx and bxy have positive signs, then r will also take plus sign. 4. Both regression coefficients cannot be greater than unity- If one regression coefficient of y on x is greater than unity, then the regression coefficient of x on y must be less than unity. This is because r=√byx.bxy=±1 5. Arithmetic mean of two regression coefficients is either equal to or greater than the correlation coefficient. In terms of the formula: byx+bxy÷2≥r 6. Shift of origin does not affect regression coefficients but shift in scale does affect regression coefficient- Regression coefficients are independent of the change of origin but not of scale. Regression Equations in case of Individual Series Regression Equations in case of Grouped Data In Individual series, regression equations can be worked out by two methods: Using Normal Equations Using regression Coefficients This method is also called Least Square Method. Under this method, computation of regression equations is done by solving two normal equations. Regression Equation of Y on X Y=a+bX Under least square method, the values of a and b are obtained by using the following two normal equations: ∑Y=Na+b∑X ∑XY=a∑X+b∑X² Solving these equations, we get the following value of a and b: bxy= N.∑XY−∑X.∑Y÷N.∑X²−(∑X)² a=Y-bX Finally a and b are put in the equation: Y=a+bX Regression Equation of X on Y is expressed as follows: X=a+bY Under least Square Method the value of a and b are obtained by using the following normal equations: ∑X=Na+b∑Y ∑XY=a∑Y+b∑Y² Solving these equations we get the value of a and b The calculated value of a and b are put in the equation: X=a+bY Calculate the regression equation of X on Y from the following by least square method X Y X² Y² XY 1 2 1 4 2 2 5 4 25 10 3 3 9 9 9 4 8 16 64 32 5 7 25 49 35 N=5 ∑X=15 ∑Y=25 ∑X²=55 ∑Y²=151 ∑XY=88 Regression Equation of X on Y X=a+bY The two normal equations are: ∑X=Na+b∑Y ∑XY=a∑Y+b∑Y² Substituting the values we get 15=5a+25b 88=25a+151b Multiplying 1 and 2 88=25a+151b 75=25a+125b − − − 13=26b b=13÷26=0.5 Putting the value of b in equation1 15=5a+25×0.5 15=5a+12.5 5a=2.5 a=0.50 Therefore: X=0.5+0.5Y Regression equations can also be computed with the help of regression coefficients. For this we will have to find out, X‾, Y‾, byx, bxy from the data. Regression equations can be computed from the regression coefficients by following method: 1. Using actual values of X and Y 2. Using deviations from Actual Means 3. Using deviations from Assumed Means 4. Using r, x, y, and X‾, Y‾. In this method , actual values of X and Y are used to determine regression equations. Regression equation are put in the following way: Regression Equation of Y on X Y-Y‾=byx(X-X‾) Using the actual values, the value of byx can be calculated as: byx=N.∑XY−∑X.∑Y÷N.∑X²−(∑X)² Regression Equation of X on Y X−X‾=bxy(Y−Y‾) Using the actual values bxy can be calculated as : bxy=N.∑XY−∑X.∑Y÷N.∑Y²−(Y)² Obtain two lines of equations from the following : X Y XY X² Y² 2 5 10 4 25 4 7 28 16 49 6 9 54 36 81 8 8 64 64 64 10 11 110 100 121 ∑X=30 ∑Y=40 ∑XY=266 ∑X²=220 ∑Y²=340 Regression coefficient of Y on X byx=N.∑XY−(∑X) (∑Y)÷N.∑X²−(∑X)² =5×266−30×40÷5×220−(30)² =1330−1200÷1100−900 =130÷200 =0.65 Regression coefficient of X on Y bxy=N.∑XY−∑X.∑Y÷N.∑Y²−(Y)² =5×266−30×40÷5×340−(40)² =1330−1200÷1700−1600 =130÷100 =1.30 Y−Y‾=byx(X−X‾) Y‾=∑Y÷N =40÷5 =8 X‾=∑X÷N =30÷5 =6 Regression equation of Y on X Y−Y‾=byx(X−X‾) Y−8=0.65(X− 6) Y=4.1+0.65X Regression equation of X on Y X−X‾=bxy(Y−Y‾) X−6=1.30(Y−8) X=−4.40+1.30Y When the size of the values of X and Y is very large, then the method using actual values becomes very difficult to use. In such case, deviations taken from arithmetic means are used. Regression equation of Y on X Y−Y‾=byx(X−X‾) Using deviations from actual means: byx=∑xy÷∑x² Where, x=X−X‾, y=Y−Y‾ Regression equation of X on Y X−X‾=bxy(Y−Y‾) Using deviations from actual means: bxy=∑xy÷∑y² Obtain the two regression equations from the following: X Y X‾=7 X−X‾ x x² Y‾=5 Y−Y‾ y Y² xy 2 4 −5 25 −1 1 5 4 2 −3 9 −3 9 9 6 5 −1 1 0 0 0 8 10 1 1 5 25 5 10 3 3 9 −2 4 −6 12 6 5 25 1 1 5 ∑x²=7 0 ∑y=0 ∑y²=4 0 ∑xy=1 8 ∑X=42 ∑Y=30 ∑x=0 N=6 Since the actual means of X and Y are whole numbers we should take deviations from X‾and Y‾ byx=∑xy÷∑x² =18÷70 =0.257 bxy=∑xy÷∑y² =18÷40 =0.45 Regression equation of Y on X Y−Y‾=byx(X−X‾) Y−5=0.257(X−7) Y−5=0.257X−7 Y=0.257X+3.201 Regression equation of X on Y X−X‾=bxy(Y−Y‾) X−7=0.45(Y−5) X−7=0.45Y−2.25 X=0.45Y=4.75 When actual means turn out to be in fractions rather than the whole number like 24.14, 56.89 etc, then deviations from assumed means rather than actual means are used. Following are the regression equations: Regression equation of Y on X Y−Y‾=byx(X−X‾) Using deviations from the assumed means, value of byx can be calculated as: byx=N×∑dxdy−∑dx.∑dy÷N.∑dx²−(∑dx)² Regression equation of X on Y X−X¯=bxy(Y−Y¯) Using deviations from assumed means the value of bxy can be calculated: bxy=N.∑dxdy−∑dx.∑dy÷N.∑dy²−(∑dy)² Where dx=X−Ax, dy=Y−Ay Obtain the two regression equations for the following data: X Y A=69 dx dx² A=112 dy dy² dxdy 78 125 9 81 13 169 117 89 137 20 400 25 625 500 97 156 28 784 44 1936 1232 69=A 112=A 0 0 0 0 0 59 107 −10 100 −5 25 50 79 136 10 100 24 576 240 68 124 −1 1 12 144 −12 61 108 −8 64 −4 16 32 ∑dx²= 1530 ∑dy=1 09 ∑dy²= 3491 ∑dxdy =2159 N=8 ∑Y=10 ∑dx=4 ∑X=60 05 8 0 byx=N.∑dxdy−∑dx.∑dy÷N.∑dx²−(dx)² =8×2159−(48)(109)÷8×1530−(48)² =17272−5232÷12240−2304 =12040÷9936 =1.212 Regression equation of Y on X Y−Y¯=byx(X−X¯) Y−125.625=1.212(X−75) Y−125.625=1.212X−90.9 Y=1.212X+34.725 bxy=N.∑dxdy−∑dx.∑dy÷N.∑dy²−(∑ dy)² =8×2159−(48)(109)÷8×3491−(1005)² =17272−5232÷27928−1010025 =12040÷−9827097 =−0.001 Regression equation of X on Y X−X¯=bxy(Y−Y¯) X−75=−0.001(Y−125.625) X−75=−0.001Y+0.125 X=−0.001Y+9.375 When the values of X¯, Y, x, y and r of X and Y series are given then regression equations are expressed in the following way: 1. Regression Equation Of Y on X Y−Y¯=byx(X−X¯) Where, byx=r. y x 2. Regression equation of X on Y X−X¯=bxy(Y−Y¯) Where, bxy=r. x y You are given the following information: Obtain two regression equations: X Y Arithmetic Mean: 5 12 Standard deviation: 2.6 3.6 Correlation Coefficient: r=0.7 1. Regression Equation Of X on Y: X−X¯=r. x y(Y−Y¯) Putting the values in the equation we get: X−5=0.7×2.6÷3.6(Y−12) X−5=0.51(Y−12) X−5=0.51Y−6.12 X=0.51Y−1.12 2. Regression equation of Y on X Y−Y¯=r. y x(X−X¯) Putting the values in the equation, we get Y−12=0.7×3.6÷2.6(X−5) Y−12=0.97(X−5) Y−12=0.97X−4.85 Y=0.97X+7.15 For obtaining regression equations from grouped data, first of all we have to construct a correlation table. Special adjustments must be made while calculating the value of regression coefficients because regression coefficients are independent of change of origin but not of scale. In grouped data the regression coefficients are computed by using the formula: 1.bxy=N×∑fdxdy−∑fdx.∑fdy÷N×∑fdy²−(∑fdy)²×ix÷iy 2.byx=N×∑fdxdy−∑fdx.∑fdy÷N×∑fdx²−(∑fdx)²×iy÷ix 1. To find the Mean Values from the Regression Equations: Two regression lines intersect each other at mean value(X¯ and Y‾ points. 2. To find the Coefficient of Correlation from two Regression Equations: Correlation coefficient can be worked out from the regression coefficients bxy and byx. r=√byx.bxy From the following two regression equations, identify which one is X on Y and which one is Y on X 2X+3Y=42 X+2Y=26 Solution In the absence of any clear cut indication, Let us assume that equation first to be Y on X and equation second to be X on Y Let equation 1 be regression equation of Y on X 2X+3Y=42 3Y=42−2X Y=42÷3−2X÷3 From this it follows that byx=Coefficient of X in 1=−2÷3 Now equation 2 be regression equation on X on Y X+2Y=26 X=26−2Y From this it follows that bxy=coefficient of Y in 2=−2 Now, we calculate ‘r’ on the basis of the above values of two regression coefficients we get: r²=byx.bxy =−2÷3×−2 = 4÷3>1 Here, r²>1 which is impossible as r²≤1. So our assumption is wrong. We now choose equation 1 as regression of X on Y and 2 as regression equation of Y on X: Assuming the first equation as on X on Y, we have 2X+3Y=42 2X=42−3Y X=42÷2−3Y÷2 From this, it follows that bxy=Coefficient of Y in above equation=−3÷2 Now, assuming the second equation as Y on X, we have, X+2Y=26 2Y=26−X Y=26÷2−1÷2X From this it follows that byx=Coefficient of X in above equation=−1÷2 Now, r²=bxy.byx =−3÷2×−1÷2 =3÷4 =0.75 Here, r²<1 which is possible. r² is within the limit , r²≤1. Hence, it is proved that the first equation is of X on Y and the second equation is of Y on X