“Teach A Level Maths” Vol. 2: A2 Core Modules
Calculating Residuals
© Christine Crisp
Statistics 1: AQA, Edexcel, OCR

"Certain images and/or photos on this presentation are the copyrighted property of JupiterImages and are being used with permission under license. These images and/or photos may not be copied or downloaded without permission from JupiterImages."

Once we have found a regression line, we may need to know how close a particular observation is to the line. To do this, we find a residual.

For the height and foot length data:

[Figure: Foot length and height of UK children. Scatter diagram with axes foot length (cm) and height (cm), showing the y on x regression line and a point $(x_A, y_A)$.]

To find the residual for the point $(x_A, y_A)$, we find $y_A - \hat{y}$, where $\hat{y}$ is the value given by the regression line at $x = x_A$.

e.g. The marks for 10 students in Maths and Physics are as follows:

Student      A   B   C   D   E   F   G   H   I   J
Maths, x    41  37  38  39  47  42  34  35  48  49
Physics, y  36  20  31  24  35  42  26  27  29  37

The regression line for y on x is $\hat{y} = 1.81 + 0.70x$.

Residual of point A $= y_A - \hat{y}$. (The residual is negative if the point is below the line.)

To find $\hat{y}$, substitute the value of x at point A into the regression line:

$\hat{y} = 1.81 + 0.70(41) = 30.51$
$y_A - \hat{y} = 36 - 30.51 = 5.49$

SUMMARY
To find the residual for a particular observation, A:
• calculate $\hat{y}$, the y-coordinate of the point $(x_A, \hat{y})$ on the regression line with the same x-value as A;
• find $y_A - \hat{y}$.
• Since $\hat{y} = a + bx$, the residual at A is also given by $y_A - (a + bx_A)$.
• The residual is negative if the point is below the line.

Outliers
Outliers are points that lie well away from the regression line. Since a residual measures the vertical distance of a point from the line, residuals are used to identify outliers. Outliers can have a considerable effect on a regression line and make it unreliable.

e.g. Consider a scatter diagram of the data in the table below. If we were to draw the line “by eye”, the 1st point would lie well away from the line we would want to draw.

x   1   2   3   4   5   6   7   8
y   5  18  12  14  12  11   7   3

However, the calculation of the regression line includes the 1st point, and this distorts the position of the line. The y on x regression line for all the data is

$\hat{y} = 14.21 - 0.88x$

[Figure: scatter diagram of the data with this regression line; the residuals are drawn as vertical lines from each point to the line.]

The left-hand end of the line is further down than it would be without the 1st point, and the 1st point has the largest residual in size. Removing the 1st point gives

$\hat{y} = 21.36 - 2.07x$

The sum of the squares of the residuals is $\sum r^2 = 139$ for the line fitted to all the data, but only $\sum r^2 = 19.9$ for the line fitted without the 1st point. Without the 1st point, we have a regression line that is a much better fit.

Exercise
1. The table shows the number of accidents to children as a percentage of those to adults, y, in 9 areas of London, together with the percentage of open space in those areas, x.

Area                          A     B     C     D     E     F     G     H     I
Open space, x (%)             5     1.3   1.4   7     4.5   5.2   6.3   14.6  14.8
Children's accidents, y (%)   46.3  42.9  40    38.2  37    33.6  30.8  23.8  17.1

(a) Find the equation of the regression line of y on x.
(b) Estimate the percentage of accidents to children in an area with 10% open space.
(c) Find the residual for A.
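The figures quoted in the outlier example (the two regression lines and the sums of squared residuals) can be checked numerically. This is a minimal sketch in plain Python, assuming the standard least-squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, since the slides do not show how the lines were computed. The helper names `fit_line`, `residuals` and `describe` are illustrative only.

```python
# Outlier example: fit the y on x least-squares line with and without
# the 1st point, and compare the sums of squared residuals.


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def residuals(xs, ys, a, b):
    """Observed y minus fitted y; negative when the point is below the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def describe(a, b):
    """Format the line as 'y = a + bx' or 'y = a - |b|x' to 2 d.p."""
    sign = "-" if b < 0 else "+"
    return f"y = {a:.2f} {sign} {abs(b):.2f}x"


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 18, 12, 14, 12, 11, 7, 3]

# Line fitted to all eight points.
a_all, b_all = fit_line(xs, ys)
r_all = residuals(xs, ys, a_all, b_all)
ssr_all = sum(r ** 2 for r in r_all)
# Position (0-based) of the residual that is largest in size.
worst = max(range(len(r_all)), key=lambda i: abs(r_all[i]))

# Line fitted after removing the 1st point.
a_cut, b_cut = fit_line(xs[1:], ys[1:])
ssr_cut = sum(r ** 2 for r in residuals(xs[1:], ys[1:], a_cut, b_cut))

print("All points:", describe(a_all, b_all), f"sum of r^2 = {ssr_all:.1f}")
print(f"Largest residual: point {worst + 1}, r = {r_all[worst]:.2f}")
print("Without 1st:", describe(a_cut, b_cut), f"sum of r^2 = {ssr_cut:.1f}")
```

Run as written, it should report $\hat{y} = 14.21 - 0.88x$ with a sum of squared residuals of about 138.9 (139 to 3 s.f.), identify the 1st point as having the largest residual (about -8.33), and give $\hat{y} = 21.36 - 2.07x$ with a sum of about 19.9 once that point is removed.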
Solutions
(a) The equation of the regression line of y on x is $\hat{y} = 45.40 - 1.65x$.
(b) When $x = 10$, $\hat{y} = 45.40 - 1.65(10) = 28.9$. So, in an area with 10% open space, accidents to children are estimated to be nearly 29% of those to adults.
(c) At A(5, 46.3), $x = 5$, so $\hat{y} = 45.40 - 1.65(5) = 37.15$.
Residual $= y_A - \hat{y} = 46.3 - 37.15 = 9.15$
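The exercise answers, and the residual in the Maths and Physics worked example, can be checked with a short script. This is a minimal sketch in plain Python, assuming the standard least-squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$; `fit_line` and `residual` are illustrative helper names, not taken from the slides.

```python
# Check the exercise answers and the Maths/Physics worked example.


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    return mean_y - b * mean_x, b


def residual(x, y, a, b):
    """Observed y minus the value on the line y = a + b*x at the same x."""
    return y - (a + b * x)


# Exercise data for areas A to I: open space x (%), children's accidents y (%).
space = [5, 1.3, 1.4, 7, 4.5, 5.2, 6.3, 14.6, 14.8]
accidents = [46.3, 42.9, 40, 38.2, 37, 33.6, 30.8, 23.8, 17.1]
a, b = fit_line(space, accidents)
print(f"(a) y = {a:.2f} - {-b:.2f}x")  # the slope is negative for this data

# Work with coefficients rounded to 2 d.p., as the written solution does.
a2, b2 = round(a, 2), round(b, 2)
print(f"(b) estimate at x = 10: {a2 + b2 * 10:.1f}")
print(f"(c) residual for A: {residual(5, 46.3, a2, b2):.2f}")

# Worked example data for students A to J: Maths x, Physics y.
maths = [41, 37, 38, 39, 47, 42, 34, 35, 48, 49]
physics = [36, 20, 31, 24, 35, 42, 26, 27, 29, 37]
am, bm = fit_line(maths, physics)
print(f"Marks line: y = {am:.2f} + {bm:.3f}x")
print(f"Residual for A, line 1.81 + 0.70x: {residual(41, 36, 1.81, 0.70):.2f}")
print(f"Residual for A, unrounded line: {residual(41, 36, am, bm):.2f}")
```

It should print $\hat{y} = 45.40 - 1.65x$, an estimate of 28.9 and a residual of 9.15 for A, agreeing with the solutions. For the marks data it gives $a \approx 1.81$ and $b \approx 0.705$. Because $x_A = 41$ is the mean Maths mark and a least-squares line passes through the point of means, the unrounded line gives $\hat{y} = 30.7$ (the mean Physics mark) and a residual of 5.30 for A; the 5.49 in the worked example comes from rounding the gradient to 0.70. For the same reason, the unrounded exercise coefficients would give a residual of about 9.13 for A rather than 9.15.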