Residuals
Residuals – vertical difference between points in your data and points predicted by a line of fit for the data.
residual = $y_i - \hat{y}_i$, where $y_i$ is the y-value of the data point and $\hat{y}_i$ is the y-value of the predicted point on the line of fit
A residual is a signed distance
a positive residual indicates its data point is above the line of fit
a negative residual indicates its data point is below the line of fit
A good line of fit should have about as many points above the line of fit as below it
The sum of the residuals of a good line of fit is close to zero
Calculating residuals is one way to evaluate a line of fit before using it to make predictions. If the sum of the residuals is close to zero relative to the magnitude of the y-values, then the line of fit is a good model.
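The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `residuals` and the sample numbers are hypothetical, chosen to show the sign convention.

```python
def residuals(actual, predicted):
    """Return y_i - y_hat_i for each pair of actual and predicted y-values."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]

# Hypothetical y-values, for illustration only.
actual = [10, 14, 15]
predicted = [11, 13, 15]

for r in residuals(actual, predicted):
    if r > 0:
        position = "above the line of fit"
    elif r < 0:
        position = "below the line of fit"
    else:
        position = "on the line of fit"
    print(r, position)

print("sum of residuals:", sum(residuals(actual, predicted)))
```

Here one point lies below the line, one above, and one on it, so the residuals cancel and their sum is zero.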
The manager of Big K Pizza must order supplies for the month of November. The number of pizzas sold in November over four years is shown in the table and graph below.
November (200__)    Number of Pizzas Sold
1                   512
2                   603
3                   642
4                   775
Step 1. Enter the data into your calculator and find the median-median line.
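If no calculator is at hand, the median-median line can be sketched in Python. The grouping rule below (a single leftover point goes to the middle group, two leftover points go to the outer groups) is an assumption based on the common textbook convention, and the function name `median_median_line` is hypothetical. Your calculator may round or group slightly differently.

```python
from statistics import median

def median_median_line(xs, ys):
    """Return (slope, intercept) of the median-median line.

    Assumes at least three points with distinct x-values in the outer groups.
    """
    points = sorted(zip(xs, ys))
    n = len(points)
    base, extra = divmod(n, 3)
    # Outer groups get the extra points only when two are left over.
    outer_size = base + (1 if extra == 2 else 0)
    groups = [
        points[:outer_size],
        points[outer_size:n - outer_size],
        points[n - outer_size:],
    ]
    # Summary point of each group: (median of x-values, median of y-values).
    summary = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    (x_left, y_left), _, (x_right, y_right) = summary
    # Slope comes from the two outer summary points.
    slope = (y_right - y_left) / (x_right - x_left)
    # Intercept: average of the intercepts of the three parallel lines
    # through the summary points.
    intercept = sum(y - slope * x for x, y in summary) / 3
    return slope, intercept

slope, intercept = median_median_line([1, 2, 3, 4], [512, 603, 642, 775])
print(round(slope, 2), round(intercept, 2))
```

For the pizza data this gives a slope of about 87.67 and an intercept of about 417.33.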
Step 2. Find predicted values ($\hat{y}$) from your model using the x-values from your data. Enter into the table below.
$\hat{y}_i = a x_i + b$
$\hat{y}_1 =$ ______
$\hat{y}_2 =$ ______
$\hat{y}_3 =$ ______
$\hat{y}_4 =$ ______
Step 3. Calculate the residual ($y_i - \hat{y}_i$) for each data point. Enter into the table below.
$y_1 - \hat{y}_1 =$ ______
$y_2 - \hat{y}_2 =$ ______
$y_3 - \hat{y}_3 =$ ______
$y_4 - \hat{y}_4 =$ ______
Step 4. Calculate the sum of the residuals. Enter into the table below.
Sum of the residuals: $\sum_{i=1}^{4} (y_i - \hat{y}_i) =$ ______
Year (200__)   Number of Pizzas Sold   Predicted Number of Pizzas Sold   Residuals
($x_i$)        ($y_i$)                 ($\hat{y}_i$)                     ($y_i - \hat{y}_i$)
1              512                     ______                            ______
2              603                     ______                            ______
3              642                     ______                            ______
4              775                     ______                            ______
                                       $\sum_{i=1}^{4} (y_i - \hat{y}_i)$ =   ______
Compare the sum of the residuals to the number of pizzas sold each November. Is the sum of the residuals close to zero when compared to the number of pizzas sold? Would you consider this model a good line of fit?
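Steps 2 through 4 can be checked with a short Python sketch. The coefficients `a` and `b` below are an assumption: they are inferred from the predicted values tabulated later in this worksheet, not stated in it, and your calculator's values may differ slightly.

```python
# Assumed coefficients of the line of fit y_hat = a*x + b (inferred, see above).
a, b = 87.67, 417.34

years = [1, 2, 3, 4]
sold = [512, 603, 642, 775]

# Step 2: predicted values from the model.
predicted = [round(a * x + b, 2) for x in years]
# Step 3: residual = actual - predicted.
residuals = [round(y - y_hat, 2) for y, y_hat in zip(sold, predicted)]
# Step 4: sum of the residuals.
residual_sum = round(sum(residuals), 2)

mean_sold = sum(sold) / len(sold)
print(predicted)      # [505.01, 592.68, 680.35, 768.02]
print(residuals)      # [6.99, 10.32, -38.35, 6.98]
print(residual_sum)   # -14.06
print(mean_sold)      # 633.0
```

Under these assumptions the sum of the residuals is about –14 pizzas, which is small next to sales of several hundred pizzas each November.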
Root Mean Square Error
As previously discussed, when the sum of the residuals for a set of data is close to zero, it is an indication that the line of fit is a good model. However, as seen in the graph to the right, it is possible to have a poor line of fit even when the sum of the residuals is close to zero.
A good line of fit should follow the general direction of the points for a given set of data. This means that each residual should also be as close to zero as possible.
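A small hypothetical example, not taken from the pizza data, shows how residuals can cancel. The points below rise steadily, but a flat line through their middle still has residuals that sum to zero.

```python
# Hypothetical data that rises steadily: y = x.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]

# A poor line of fit: the horizontal line y = 3 ignores the trend.
flat_predicted = [3 for _ in xs]
flat_residuals = [y - y_hat for y, y_hat in zip(ys, flat_predicted)]

print(flat_residuals)                       # [-2, -1, 0, 1, 2]
print(sum(flat_residuals))                  # 0
print(max(abs(r) for r in flat_residuals))  # 2
```

The positive and negative residuals cancel exactly, yet the line misses the lowest and highest points by 2 units each.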
Because the sum of the residuals does not guarantee a good line of fit, we need a better measure to judge how accurate predictions from a model will be. One useful measure is called the root mean square error.
Root Mean Square Error – a measure of the spread of data points from a model line of fit
Step 5. Calculate the square of each residual. Enter into the table below.
$(y_1 - \hat{y}_1)^2 =$ ______
$(y_2 - \hat{y}_2)^2 =$ ______
$(y_3 - \hat{y}_3)^2 =$ ______
$(y_4 - \hat{y}_4)^2 =$ ______
Step 6. Calculate the sum of the square residuals. Enter into the table below.
Sum of square residuals: $\sum_{i=1}^{4} (y_i - \hat{y}_i)^2 =$ ______
Year (200__)   Number of Pizzas Sold   Predicted Number of Pizzas Sold   Residuals             Square Residuals
($x_i$)        ($y_i$)                 ($\hat{y}_i$)                     ($y_i - \hat{y}_i$)   ($(y_i - \hat{y}_i)^2$)
1              512                     505.01                            6.99                  ______
2              603                     592.68                            10.32                 ______
3              642                     680.35                            –38.35                ______
4              775                     768.02                            6.98                  ______
                                                                         $\sum_{i=1}^{4} (y_i - \hat{y}_i)^2$ =   ______
The sum of the square residuals is large compared to the number of pizzas sold, so we calculate an average square residual in the predictions to provide a more useful measure.
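Steps 5 and 6 can be checked with a short Python sketch that uses the residuals from the table above. The variable names are illustrative only.

```python
# Residuals copied from the worksheet table.
residuals = [6.99, 10.32, -38.35, 6.98]

# Step 5: square each residual.
square_residuals = [r ** 2 for r in residuals]
# Step 6: add the square residuals.
sum_square_residuals = sum(square_residuals)

print([round(sq, 4) for sq in square_residuals])
print(round(sum_square_residuals, 4))
```

The sum comes to roughly 1674.8 square pizzas, larger than any single November's sales. The third data point, with the largest residual, contributes most of it.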
We calculate the average square residual of a model line of fit by dividing the sum of the square residuals by n – 2, where n is the total number of data points.
Why do we divide by n – 2? It takes only two data points to determine the equation of a line, and a line through just two points has no spread at all. Only the remaining n – 2 data points tell us how far the data spread from the model line of fit. Therefore, we divide the sum of the square residuals by n – 2 rather than by n to calculate the average square residual.
The average of the square residuals is measured in square units (e.g., pizzas²). Since we want our measure of average error to be in number of pizzas, rather than pizzas², we take the square root of the average of the square residuals to find the root mean square residual. We call this statistic the root mean square error.
Root Mean Square Error: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$
A small root mean square error indicates that the line of fit is good.
That is, the closer the root mean square error is to zero relative to the actual y-values, the better the line of fit.
Step 7. Calculate the root mean square error for the pizza data.
$s = \sqrt{\dfrac{\sum_{i=1}^{4} (y_i - \hat{y}_i)^2}{4 - 2}} =$ ______
Does this value indicate the line of fit is good? Why or why not?
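Step 7 can be checked with a short Python sketch of the formula, dividing by n – 2 as described above. The function name `root_mean_square_error` is hypothetical. The actual and predicted values are copied from the worksheet table.

```python
import math

def root_mean_square_error(actual, predicted):
    """Square root of the sum of square residuals divided by n - 2."""
    n = len(actual)
    if n <= 2:
        raise ValueError("need more than two data points")
    sum_square = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sum_square / (n - 2))

sold = [512, 603, 642, 775]
predicted = [505.01, 592.68, 680.35, 768.02]

s = root_mean_square_error(sold, predicted)
print(round(s, 2))  # 28.94
```

A root mean square error of about 29 pizzas is small next to sales of 512 to 775 pizzas.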
Step 8. This value means that the actual number of pizzas sold should be within ________ pizzas of the predicted value from our line of fit. Now predict how many pizzas Big K Pizza will sell in November 2008.
Step 9. Using your prediction and the root mean square error, what is the range of pizzas the manager of Big K Pizza can expect to sell in November 2008?
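Steps 8 and 9 can be sketched in Python under two assumptions that the worksheet does not state outright. The first is that the years 2001 to 2004 correspond to x = 1 to 4, so November 2008 is x = 8. The second is that the line's coefficients can be recovered from the tabulated predicted values.

```python
import math

# Values copied from the worksheet table.
predicted = [505.01, 592.68, 680.35, 768.02]
residuals = [6.99, 10.32, -38.35, 6.98]

# Assumption: slope and intercept inferred from the predictions at x = 1 and x = 4.
a = (predicted[-1] - predicted[0]) / (4 - 1)
b = predicted[0] - a * 1

# Root mean square error, dividing by n - 2.
n = len(residuals)
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Assumption: November 2008 corresponds to x = 8.
x_2008 = 8
prediction = a * x_2008 + b
low, high = prediction - s, prediction + s

print(round(prediction, 2))            # 1118.7
print(round(low, 2), round(high, 2))   # 1089.76 1147.64
```

Under these assumptions the manager could expect to sell roughly 1,090 to 1,148 pizzas in November 2008.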