Outliers, Residuals, and Influential Points

Residuals, outliers, influential observations
AP Statistics
Residuals
• A residual is the difference between an observed value of the
response variable and the value predicted by the regression
line. That is,
• Residual = observed y – predicted y
• Or, in symbols, residual = y – ŷ
• THE MEAN OF THE LEAST-SQUARES RESIDUALS IS
ALWAYS ZERO.
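A quick numerical check of this fact, as a minimal sketch. The five data points are made up for illustration, and `least_squares_line` is a hypothetical helper name, not something from the textbook.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)

# Residual = observed y - predicted y, one per data point.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

# Zero up to floating-point rounding error.
print(f"mean residual = {mean_residual:.10f}")
```

Any other data set fitted the same way should give the same result, since the least-squares line always passes through the point (mean of x, mean of y).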
Example 3.14
The scatterplot of the data (not reproduced here) shows the regression
line
Predicted y = 109.8738 – 1.1270 (age at first word)
Example 3.14 Continued

For child 1, who first spoke at 15 months,
we predict the score:

Predicted score = 109.8738 – 1.1270 (15)
= 92.97

The child’s actual score was 95. The
residual is

Residual = observed y – predicted y = 95 –
92.97 = 2.03

This residual is positive because the data point lies above
the line.
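The arithmetic for child 1 can be reproduced with a short sketch. The intercept, slope, age, and observed score come from the example; the names `predicted_score`, `INTERCEPT`, and `SLOPE` are illustrative only.

```python
# Regression line from the example:
# predicted y = 109.8738 - 1.1270 (age at first word)
INTERCEPT = 109.8738
SLOPE = -1.1270


def predicted_score(age_months):
    """Score predicted by the regression line for a given age at first word."""
    return INTERCEPT + SLOPE * age_months


age = 15        # child 1 first spoke at 15 months
observed = 95   # child 1's actual score

predicted = predicted_score(age)
residual = observed - predicted  # observed y - predicted y

print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
# prints "predicted = 92.97, residual = 2.03"
```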
Residual Plot
• A residual plot is a scatterplot of the regression residuals against the
explanatory variable. Residual plots help us assess the fit
of the regression line.
INTERPRETING RESIDUAL PLOTS:
• A residual plot with a curved pattern shows that the
relationship is not linear. A straight line is not
a good summary for such data.
• Increasing or decreasing spread about the line as x increases
indicates that prediction of y will be less accurate where the
spread is larger: for larger x if the spread increases, for
smaller x if it decreases.
• A residual plot with a uniform scatter of points about the
horizontal line at zero and no unusual observations
tells us that our linear model (regression line) will give
us a good prediction of the data.
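The curved-pattern case can be seen numerically without drawing a plot. This is a minimal sketch on made-up data (y = x squared, chosen only because it is clearly not linear): fitting a straight line leaves residuals that are positive at both ends and negative in the middle.

```python
# Made-up, deliberately curved data: y = x^2.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# A text stand-in for the residual plot: residual against x.
for x, r in zip(xs, residuals):
    print(f"x = {x}  residual = {r:+.1f}")
```

The residuals fall and then rise as x increases, which is the U-shaped pattern that signals a straight line is the wrong model.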
Outliers and Influential Points
• Outlier: an observation that lies outside the overall pattern of the other
observations.
• Influential points: points that, when removed, would markedly change the result
of your calculations. Points that are outliers in the x direction of a scatterplot
are often influential for the least-squares regression line.
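One way to check whether a point is influential is to fit the line with and without it and compare. The sketch below uses made-up data in which the last point is an outlier in the x direction; `lsrl` is a hypothetical helper name.

```python
def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


# Made-up data: five points on a straight line plus one point far out in x.
xs = [1, 2, 3, 4, 5, 12]
ys = [2, 3, 4, 5, 6, 3]

_, slope_with = lsrl(xs, ys)
_, slope_without = lsrl(xs[:-1], ys[:-1])  # drop the suspect point

print(f"slope with the point:    {slope_with:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

A large change in slope when one point is dropped marks that point as influential for the regression line.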
Residual plots on the calculator:

3.45 p. 166-167 Beavers and Beetles
Ecologists sometimes find rather strange relationships in our environment. One study seems to show that beavers benefit beetles. The
researchers laid out 23 circular plots, each four meters in diameter, in an area where beavers were cutting down cottonwood trees. In each plot
they counted the number of stumps from trees cut down by beavers and the number of clusters of beetle larvae. Here are the data:

Stumps:          2   2   1   3   3   4   3   1   2   5
Beetle larvae:  10  30  12  24  36  40  43  11  27  56

Stumps:          1   3   2   1   2   2   1   1   4   1
Beetle larvae:  18  40  25   8  21  14  16   6  54   9

Stumps:          2   1   4
Beetle larvae:  13  14  50

NOW, calculate the LSRL
Highlight List 3
2nd STAT
Go down to the RESID list and hit ENTER
2nd Y=, turn Plot 1 on, scatterplot
For x list use List 1, but for y list use List 3
Zoom 9 (ZoomStat)
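For readers without the calculator, the same LSRL and residual list can be sketched in Python. The pairs below are the 23 (stumps, beetle larvae) observations from the exercise; the variable names are illustrative, and `residuals` plays the role of the calculator's RESID list.

```python
# (stumps, beetle larvae) for the 23 plots in the exercise.
stumps = [2, 2, 1, 3, 3, 4, 3, 1, 2, 5,
          1, 3, 2, 1, 2, 2, 1, 1, 4, 1,
          2, 1, 4]
larvae = [10, 30, 12, 24, 36, 40, 43, 11, 27, 56,
          18, 40, 25, 8, 21, 14, 16, 6, 54, 9,
          13, 14, 50]

n = len(stumps)
x_bar = sum(stumps) / n
y_bar = sum(larvae) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(stumps, larvae))
         / sum((x - x_bar) ** 2 for x in stumps))
intercept = y_bar - slope * x_bar

# Counterpart of the RESID list: observed y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(stumps, larvae)]

print(f"predicted larvae = {intercept:.3f} + {slope:.3f} (stumps)")
# prints "predicted larvae = -1.286 + 11.894 (stumps)"

# Text stand-in for the residual plot: residual against stumps.
for x, r in zip(stumps, residuals):
    print(f"stumps = {x}  residual = {r:+.2f}")
```

Plotting `residuals` against `stumps` gives the same picture as the calculator's scatterplot of List 3 against List 1.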