Module 13: SSE and Residual Plots

Sum of Squared Errors (SSE)

The estimate made from a model is the predicted value (denoted ŷ).

Residual = observed - predicted = y - ŷ

Which line is better? What's a way we can determine which is better?

Each yellow square has a side whose length is the size of a residual, so its area is that residual squared. The SSE is the sum of these areas. When we compare the sums of the areas of the yellow squares, the line on the left has an SSE of 57.8 (using computer technology). The line on the right has a smaller SSE of 43.9. So the line on the right fits the points better, but is it the best fit? Computer technology finds the best-fit line, which is the line where the SSE is at its minimum.

Residuals

To create a residual plot, we take the residuals and plot these errors as distances from a baseline described by the explanatory variable, or x-variable. Recall that the error, or residual, is the vertical distance between the data point and the regression line, which is given by y - ŷ.

Plot each of these distances as a vertical distance above or below the baseline at the point's x-value. Here the graph shows each point with an attached line segment marking its distance. Finished residual plots do not include these connecting lines. If it helps, draw the lines while you get used to marking the distances, then erase them to get your completed residual plot.

Example A

[Residual plot: Residual on the vertical axis (ticks from -3 to 2), x on the horizontal axis (ticks from 2.5 to 15)]

If there is NO PATTERN in the residual plot, then the linear model is a good fit.

Example B

[Residual plot: Residual on the vertical axis (ticks from -2 to 2), x on the horizontal axis (ticks from 2.5 to 15)]

If there IS A PATTERN in the residual plot, then the linear model is not the best fit, and perhaps another equation would be a better model for the data.
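
The SSE comparison and the residual calculation can be tried out with a short Python sketch. The data points and the two candidate lines here are hypothetical, invented for illustration. They are not the points from the figures in this module, so the SSE values will not match 57.8 or 43.9. The function names are also made up for this sketch, and the least-squares formulas used are the standard ones for a line.

```python
# Hypothetical data, invented for illustration (not the points from the figures).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def residuals(xs, ys, slope, intercept):
    """Residual = observed - predicted = y - y_hat, one per data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sse(xs, ys, slope, intercept):
    """Sum of squared errors: add up the squared residuals."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))


def least_squares_line(xs, ys):
    """Slope and intercept of the line that minimizes the SSE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Two hypothetical candidate lines, compared by their SSE.
sse_a = sse(xs, ys, slope=1.0, intercept=1.0)
sse_b = sse(xs, ys, slope=0.5, intercept=2.5)

# The least-squares line has the smallest SSE of any line.
best_slope, best_intercept = least_squares_line(xs, ys)
sse_best = sse(xs, ys, best_slope, best_intercept)
best_residuals = residuals(xs, ys, best_slope, best_intercept)

print(f"SSE of candidate line A: {sse_a:.1f}")
print(f"SSE of candidate line B: {sse_b:.1f}")
print(f"SSE of least-squares line: {sse_best:.1f}")

# Each (x, residual) pair is one point on the residual plot.
for x, r in zip(xs, best_residuals):
    print(f"x = {x:.1f}, residual = {r:+.1f}")
```

With these made-up points, the candidate lines have SSEs of 4.0 and 2.5, and the least-squares line has an SSE of 2.4. Line B fits better than line A, but it is still not the best fit. The printed (x, residual) pairs are the points you would mark on a residual plot.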