Slide 1

Chapter 6: Intercept, A, and Gradient of Regression Line, B

Slide 2

[Figure: the regression line $y = A + Bx$, with the dependent variable (DV) on the y-axis and the independent variable (IV) on the x-axis. A is the y-intercept or constant term; B is the gradient or slope of the regression line.]

Slide 3

$y = A + Bx$ is sometimes called a deterministic model; it gives an exact relationship between y and x.

In reality, the observed value $y_{obs}$ differs slightly from the value $y_{pre}$ predicted by the model. So

$$y = A + Bx + e$$

where e is a random error term that accounts for this difference (see slide 5 if you do not understand this concept).

A and B are population parameters, and the regression line is called the population regression line. The values of A and B in the population are called the true values of the y-intercept and slope.

Population data are difficult to obtain, so we use sample data to estimate the population parameters. The values calculated from sample data are estimates, so the y-intercept and slope for the sample data are denoted 'a' and 'b', and $\hat{y}$ (also written $y_{pre}$) denotes the predicted or estimated value of y for a given x.

$$\hat{y} = a + bx$$

This equation is called the estimated regression model; it gives the regression of y on x based on sample data.

Slide 4

Example 1 (values in hundreds of RM)

| Income | Food Expenditure |
|---|---|
| 35 | 9 |
| 49 | 15 |
| 21 | 7 |
| 39 | 11 |
| 15 | 5 |
| 28 | 8 |
| 25 | 9 |

Slide 5

Scatter Plot for Example 1

[Figure: scatter plot of the Example 1 data with the regression line. At a given x value $x_1$, $y_{obs}$ is the actual y value obtained and $y_{pre}$ is the y value predicted by the regression line (the best straight line). The vertical gap between the two is the error e.]

e (error) $= y_{obs} - y_{pre}$, or equivalently $e = y - \hat{y}$

Slide 6

Error Sum of Squares, SSE

The sum of errors is always zero for the best straight line (the least squares line), i.e.

$$\sum e = \sum (y - \hat{y}) = 0$$

So to find the line that best fits the points, we cannot minimize the sum of errors, since it will always be zero. Instead we minimize the error sum of squares, SSE:

$$SSE = \sum e^2 = \sum (y - \hat{y})^2$$

The values of 'a' and 'b' that give the minimum SSE are called the least squares estimates of A and B, and the regression line obtained with these estimates is called the least squares regression line.
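The point about the sum of errors can be checked numerically. The sketch below uses the Example 1 data and a few hypothetical candidate slopes (the values 0.10 to 0.40 are chosen here for illustration only and are not from the slides). Each candidate line is forced through the point of means, which makes its sum of errors zero, yet the SSE values differ. This is why SSE, rather than the sum of errors, is used to pick the best line.

```python
# Example 1 data: income (x) and food expenditure (y), in hundreds of RM.
x = [35, 49, 21, 39, 15, 28, 25]
y = [9, 15, 7, 11, 5, 8, 9]


def errors(a, b):
    """Return the errors e = y - (a + b*x) for the line with intercept a and slope b."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sse(a, b):
    """Return the error sum of squares for the line with intercept a and slope b."""
    return sum(e ** 2 for e in errors(a, b))


x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Hypothetical candidate slopes (illustration only). Choosing a = y_mean - b * x_mean
# makes each line pass through the point of means, so its errors sum to zero.
for b in (0.10, 0.20, 0.30, 0.40):
    a = y_mean - b * x_mean
    print(f"b = {b:.2f}  sum of errors = {sum(errors(a, b)):+.6f}  SSE = {sse(a, b):.4f}")
```

The printed sums of errors are zero up to floating-point rounding for every candidate, while the SSE changes with the slope.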
For the least squares regression line $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of y,

$$b = \frac{SS_{xy}}{SS_{xx}} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$

where $\bar{y}$ is the mean of the y scores and $\bar{x}$ is the mean of the x scores.

Slide 7

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$SS_{xy}$ can be positive or negative.

$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$

$SS_{xx}$ is always positive (provided the x values are not all identical).

Slide 8

Example 1 (values in hundreds of RM)

| Income, x | Food Expenditure, y | xy | x² |
|---|---|---|---|
| 35 | 9 | 315 | 1225 |
| 49 | 15 | 735 | 2401 |
| 21 | 7 | 147 | 441 |
| 39 | 11 | 429 | 1521 |
| 15 | 5 | 75 | 225 |
| 28 | 8 | 224 | 784 |
| 25 | 9 | 225 | 625 |
| Σx = 212 | Σy = 64 | Σxy = 2150 | Σx² = 7222 |

Step 1: Compute $\sum x$, $\sum y$, $\bar{x}$ and $\bar{y}$.

$\sum x = 212$, $\sum y = 64$

$\bar{x} = \sum x / n = 212 / 7 = 30.2857$

$\bar{y} = \sum y / n = 64 / 7 = 9.1429$

Step 2: Compute $\sum xy$ and $\sum x^2$ (the totals of the last two columns of the table): $\sum xy = 2150$, $\sum x^2 = 7222$.

Slide 9

Step 3: Compute $SS_{xy}$ and $SS_{xx}$.

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 2150 - \frac{(212)(64)}{7} = 211.7143$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7222 - \frac{(212)^2}{7} = 801.4286$$

Step 4: Compute 'a' and 'b'.

$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{211.7143}{801.4286} = .2642$$

$$a = \bar{y} - b\bar{x} = 9.1429 - (.2642)(30.2857) = 1.1414$$

The estimated regression model $y_{pre} = a + bx$ is

$$y_{pre} = 1.1414 + .2642x$$

Slide 10

This gives the regression of food expenditure on income. Using this estimated regression model, we can find the predicted value of y for any specific value of x.

E.g. if the monthly income is RM3500, then x = 35 (in hundreds of RM) and

$$y_{pre} = 1.1414 + (.2642)(35) = 10.3884 \text{ (hundred RM)} = \text{RM}1038.84$$

But the actual y value when x = 35 is RM900, so the error in the prediction is $e = 900 - 1038.84 = -\text{RM}138.84$. This negative error indicates that the predicted value of y is greater than the actual value of y. Thus, if we use the regression model, the household food expenditure is overestimated by RM138.84.

What does the model predict when income = RM0?

Slide 11

Exercise 1

Calculate the regression equation for the Maths (x scores) and Science (y scores) marks.

| Maths | Science |
|---|---|
| 32 | 45 |
| 67 | 56 |
| 23 | 12 |
| 86 | 79 |
| 65 | 73 |
| 55 | 65 |
| 32 | 40 |
| 67 | 77 |
| 90 | 87 |
| 31 | 40 |
| 56 | 49 |
| 77 | 82 |
| 10 | 13 |
| 75 | 76 |
| 67 | 68 |
| 77 | 79 |
| 34 | 45 |
| 28 | 31 |
| 44 | 49 |

Slide 12

Exercise 2

Calculate the regression equation for the Maths (x scores) and History (y scores) marks.
| Maths | History |
|---|---|
| 32 | 70 |
| 67 | 30 |
| 23 | 80 |
| 86 | 45 |
| 65 | 35 |
| 55 | 65 |
| 32 | 65 |
| 67 | 35 |
| 90 | 10 |
| 31 | 70 |
| 56 | 49 |
| 77 | 42 |
| 10 | 90 |
| 75 | 40 |
| 67 | 59 |
| 77 | 51 |
| 18 | 81 |
| 28 | 55 |
| 44 | 49 |

Slide 13

Exercise 3

Calculate the regression equation for the graph of IQ ranges (x-axis) against the correlation coefficients (r) between Overall Creativity (OC) and Overall Achievement (OA) (y-axis), based on the data on page 77 (Graph 7.1) (Palaniappan, 2006).
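For checking hand calculations in the exercises, the four steps of Example 1 can be sketched in Python as follows. The function name `least_squares` and the variable names are assumptions made for this illustration and are not part of the chapter. The formulas are the shortcut forms of SSxy and SSxx given earlier, and the check uses only the Example 1 data.

```python
def least_squares(x, y):
    """Return (a, b) for the line y_hat = a + b*x, using the shortcut SSxy and SSxx formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    # Steps 1 and 2: the sums needed by the formulas.
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # Step 3: SSxy and SSxx.
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    if ss_xx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    # Step 4: slope b, then intercept a = y_mean - b * x_mean.
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    return a, b


# Check against Example 1 (income and food expenditure, in hundreds of RM).
income = [35, 49, 21, 39, 15, 28, 25]
food = [9, 15, 7, 11, 5, 8, 9]
a, b = least_squares(income, food)
print(f"y_hat = {a:.4f} + {b:.4f}x")
print(f"predicted y at x = 35: {a + b * 35:.4f}")
```

Because the code carries full precision through every step, the intercept comes out at about 1.1422. The hand calculation gives 1.1414 because it rounds b to .2642 before computing a. The slope agrees to four decimal places, and the prediction at x = 35 differs only slightly.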