Statistics 305 ASSESSING GOODNESS OF A LEAST SQUARES FIT

• Basically, the magnitude of the residuals indicates the relative success or failure of
a fit. Remember, however, that in every least squares fit that includes an intercept term
the residuals sum to zero. This is a mathematical fact. Thus, to gauge the magnitude of the
residuals we square each one, and often use the sum of squared residuals as a point indicator.
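
As a small illustration of the point above, the following sketch fits a straight line by least squares and checks that the residuals sum to zero while the sum of squared residuals does not. The data values and the helper name `fit_line` are made up for this example, and the fit assumes a line with an intercept.

```python
def fit_line(x, y):
    """Least squares line y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_resid = sum(residuals)            # zero up to rounding error
sse = sum(e ** 2 for e in residuals)  # sum of squared residuals

print(f"sum of residuals         = {sum_resid:.2e}")
print(f"sum of squared residuals = {sse:.4f}")
```

The plain sum carries no information about the quality of the fit, which is why the squared residuals are used instead.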
• A somewhat related point indicator is the coefficient of determination, R², defined as

$$R^2 = \frac{\sum (y_i - \bar{y})^2 - \sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}.$$

The second term in the numerator is the sum of squared residuals; the first term is a
measure of total variation in the sample. R² is interpreted as “the proportion of
the variation in the data which is explained by the fitted function.” An R² close to 1 is the
result of a sum of squared residuals that is small relative to the total variation, which is
itself the result of the fitted function being “close” to the data points. This is evidence
of a “desirable” fit.
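
The definition of the coefficient of determination can be sketched directly from its two sums. The function name `r_squared` and the data are assumptions made for this example; the fitted values come from a least squares line computed inline.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: (SSTot - SSE) / SSTot."""
    y_bar = sum(y) / len(y)
    ss_total = sum((yi - y_bar) ** 2 for yi in y)             # total variation
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # squared residuals
    return (ss_total - ss_resid) / ss_total

# Hypothetical data, chosen only for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least squares line y = b0 + b1*x
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.4f}")
```

Two limiting cases help with interpretation: fitted values equal to the data give a value of 1, and fitted values all equal to the sample mean give 0.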
• Graphical indicators of a good fit are:
1. The normal plot of residuals is strongly linear.
2. The plot of residuals against observation number shows a random-looking scatter
about zero, i.e., no pattern.
3. The plot of residuals vs. the ŷ’s shows no pattern.
4. The plot of residuals vs. the x’s shows no pattern.
All of the above help one decide whether a modification of the general form of the fitted
function seems to be indicated.
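
The graphical checks listed above can be prepared numerically before handing the points to any plotting tool. The sketch below is only an illustration under stated assumptions: the data are made up (a straight line fitted to exactly quadratic data), the helper names are hypothetical, and the plotting position (i - 0.5)/n used for the normal plot is one common convention among several.

```python
from statistics import NormalDist

def fit_line(x, y):
    """Least squares line y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def normal_plot_points(residuals):
    """Pair each sorted residual with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, an assumed convention.
    """
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), r)
            for i, r in enumerate(sorted(residuals), start=1)]

# Hypothetical data: y is exactly quadratic in x, but a line is fitted
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [xi ** 2 for xi in x]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Residuals vs. x and vs. fitted values: look for a pattern
for xi, fi, ei in zip(x, fitted, residuals):
    print(f"x = {xi:4.1f}   fitted = {fi:6.1f}   residual = {ei:5.1f}")

# Points for a normal plot: (normal quantile, ordered residual)
points = normal_plot_points(residuals)
```

Here the residuals are positive at both ends of the x range and negative in the middle, a clear pattern suggesting that the general form of the fitted function (a straight line) should be modified.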
Don’t lose sight of the fact that there is not a correct (as opposed to an incorrect) answer.
A least squares fit can be judged to be pretty good or not so pretty good. In the latter case
one looks for improvements.
Cautions given in your textbook include:
1. Don’t infer causality. The good fit across x values does not mean that the x
variable “causes” the behavior observed in the y’s.
2. A least squares fit can be strongly influenced by one or more outlying data points.
In a worst case, the fitted function can essentially be determined by a single
erroneous y value.
3. Do not expect that the fitted function will be a good predictor outside the region
of x values where observations were made. Quite often it won’t be.
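
Cautions 2 and 3 can be illustrated with a short sketch. All data here are made up for the example, and `fit_line` is a hypothetical helper: the first part changes a single y value in otherwise perfectly linear data, and the second part uses a line fitted to quadratic data to predict far outside the observed x range.

```python
def fit_line(x, y):
    """Least squares line y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Caution 2: one erroneous y value (hypothetical data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [2.0, 4.0, 6.0, 8.0, 10.0]   # exactly y = 2x
y_bad = [2.0, 4.0, 6.0, 8.0, 30.0]     # last value recorded incorrectly

_, clean_slope = fit_line(x, y_clean)
_, bad_slope = fit_line(x, y_bad)
print(f"slope with clean data:    {clean_slope:.1f}")
print(f"slope with one bad value: {bad_slope:.1f}")

# Caution 3: extrapolating beyond the observed x values (hypothetical data)
xq = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yq = [xi ** 2 for xi in xq]            # true relationship is quadratic
b0, b1 = fit_line(xq, yq)

x_new = 20.0                            # far outside the range 0 to 6
predicted = b0 + b1 * x_new
actual = x_new ** 2
print(f"prediction at x = {x_new:.0f}: {predicted:.0f}, actual value: {actual:.0f}")
```

In this made-up case a single bad point triples the fitted slope, and the extrapolated prediction falls far short of the true value even though the line tracks the data reasonably within the observed range.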