Regression Residual Analysis ppt

TODAY we will
• Review what we have learned so far about Regression
• Develop the ability to use Residual Analysis to assess if a model (LSRL) is appropriate for predictions
• Understand how the Standard error (Se) is used in regression Analysis
Review
How to describe a scatterplot
Correlation Coefficient ( r )
Math Vs. Stats
Equation of Line vs. LSRL
Interpret Slope and y-intercept
What is a residual (or error)?
Review
How to describe a scatterplot
• Trend ~ Positive or Negative
• Form ~ Linear or non-linear
• Strength ~ weak, moderate or strong
• Correlation Coefficient ( r )
  • -1 ≤ r ≤ 1
  • Strength
    • r close to 1 or -1 ~ Strong linear association
    • r close to 0 ~ Weak or no linear association
  • Trend
    • Positive association (as the x variable increases, the y variable also tends to increase)
    • Negative association (as the x variable increases, the y variable tends to decrease)
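As a rough illustration of how r captures both trend (its sign) and strength (its closeness to 1 or -1), here is a minimal sketch. The data values and the function name `correlation` are made up for this example and are not from the slides.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one increasing pattern, one decreasing pattern
x = [1, 2, 3, 4, 5]
y_up = [2.1, 3.9, 6.2, 8.1, 9.8]
y_down = [9.8, 8.1, 6.2, 3.9, 2.1]

r_up = correlation(x, y_up)      # close to +1: strong positive association
r_down = correlation(x, y_down)  # close to -1: strong negative association
print(round(r_up, 3), round(r_down, 3))
```

A value of r near 0 would only say that there is little linear association; a curved pattern can still be present.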
Review
• Math vs. Stats
• Equation of Line vs. LSRL
  Line in Math: y = mx + b
  Line in Stats (LSRL): ŷ = a + bx  or  ŷ = b0 + b1x
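The slides do not show how a and b are obtained. A minimal sketch, assuming the usual least-squares formulas b = Sxy / Sxx and a = ȳ - b·x̄, could look like this. The function name `fit_lsrl` and the data are hypothetical.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

a, b = fit_lsrl(x, y)
prediction = a + b * 6   # predicted y when x = 6
print(a, b, prediction)
```

Because the made-up points fall exactly on a line, the fit recovers that line; with real data the points scatter around the LSRL.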
Review
Interpret Slope and y-intercept
• Slope:
  • For every one-unit increase in x, the predicted y increases (or decreases) on average by the slope.
• Y-intercept:
  • When the value of the variable x = 0, the predicted value of the variable y is "a".
Review
What is a residual (or Error)?
[Figure: the residual is the vertical gap between the observed y and the predicted y]
Error = Residual = Observed y value - Predicted y value
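To make the definition concrete, the sketch below computes residuals as observed minus predicted. The data are made up, and the line ŷ = 2.2 + 0.6x is assumed to be the LSRL already fitted to them.

```python
# Hypothetical data; a and b are assumed to be the fitted LSRL coefficients
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

predicted = [a + b * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # observed - predicted

for xi, yi, pi, ei in zip(x, y, predicted, residuals):
    print(f"x={xi}  observed={yi}  predicted={pi:.1f}  residual={ei:+.1f}")

# For a least-squares line with an intercept, the residuals sum to zero
total = sum(residuals)
```

A positive residual means the point lies above the line (the model under-predicts); a negative residual means the point lies below it.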
Use Residual Analysis to assess if
the model (LSRL) is appropriate for
making predictions
Correlation and Linearity and
Outliers
Only use linear correlation to interpret the data
when there is a linear relationship
An outlier can strongly influence the correlation.
Fitting a Model for Prediction
or
Fitting the LSRL for Prediction
Allow Random Variation
A model is not the reality
MESSAGES: Signal (deterministic) and Noise (stochastic)
All models are wrong but some are useful
Residual analysis directly addresses the problem of Signal and Noise
Signal and Noise
Types of Residual plots
Different plots can highlight different
departures or problems in the prediction
model.
1) Residual vs. Fitted
2) Histogram
3) PP plot (normal probability plot)
4) Residual vs. Order
Note: these plots are from software output (Minitab)
Residual vs. Fitted value plot
Three common defects may be revealed by plotting residuals vs. fitted values:
• 1) Outliers
• 2) Progressive change in the variance:
  • Band of uniform width = equal variance (OK)
  • Funnel shape = unequal variance: consider a transformation
• 3) Inadequacy of the model:
  • Curvature ~ wrong model
  • Linear trend going up ~ wrong calculation
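Why would a linear trend point to a wrong calculation? For a correctly computed least-squares line with an intercept, the residuals are uncorrelated with the fitted values, so the plot cannot tilt up or down. The sketch below checks this on made-up data. The function names and the deliberately wrong slope of 0.3 are assumptions for illustration only.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residual_trend(xs, ys, a, b):
    """Sum of residual * (fitted - mean fitted); zero means no linear trend."""
    fitted = [a + b * x for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    mean_f = sum(fitted) / len(fitted)
    return sum(e * (f - mean_f) for e, f in zip(resid, fitted))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_lsrl(x, y)
trend_correct = residual_trend(x, y, a, b)    # essentially zero
trend_wrong = residual_trend(x, y, a, 0.3)    # slope mis-calculated on purpose
print(trend_correct, trend_wrong)
```

With the mis-calculated slope the quantity is clearly positive, which is what an upward-tilting residual vs. fitted plot would show.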
Residual vs. Fitted
Let's look at an example to see what a "well-behaved" residual plot looks like.
Scatterplot
Some researchers (Urbano-Marquez, et al., 1989) were interested in determining whether or not alcohol consumption was linearly related to muscle strength. The researchers measured the total lifetime consumption of alcohol (x) on a random sample of n = 50 alcoholic men. They also measured the strength (y) of the deltoid muscle in each person's nondominant arm. A fitted line plot of the resulting data (alcoholarm.txt) looks like:
[Figure: Scatterplot with fitted line]
[Figure: Residual plot]
Residual vs. Fitted
Let's look at an example to see what a "not so well-behaved" residual plot looks like.
What do you notice in this scatterplot?
[Figures: Scatterplot and Residual plot; axis labels: Foot length, Predicted or Fitted; one point marked OUTLIER]
[Figure: Residual plot with the outlier removed; x-axis: Predicted or Fitted]
Let's look at another example to see what a "not well-behaved" residual plot looks like.
Heteroscedasticity
• When the requirement of a constant variance is violated we have a condition of heteroscedasticity.
• Diagnose heteroscedasticity by plotting the residuals against the predicted y (ŷ).
[Figure: Residuals vs. ŷ. The spread increases with ŷ]
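A funnel shape can also be checked numerically. The sketch below is only an illustration under stated assumptions: the data are simulated so that the noise grows with x, and comparing the residual spread in the lower and upper halves of the fitted values is an informal check chosen for this example, not a method from the slides.

```python
import random
import statistics

def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Simulated (hypothetical) data: the noise standard deviation is 0.5 * x
rng = random.Random(0)
x = [float(i) for i in range(1, 201)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

a, b = fit_lsrl(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Sort residuals by fitted value and compare the spread in each half
pairs = sorted(zip(fitted, resid))
half = len(pairs) // 2
spread_low = statistics.pstdev([e for _, e in pairs[:half]])
spread_high = statistics.pstdev([e for _, e in pairs[half:]])
print(round(spread_low, 1), round(spread_high, 1))
```

A much larger spread among the high fitted values is the numerical counterpart of the funnel opening to the right.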
Signal and Noise
Residuals plots
fitted vs. residuals
Homoscedasticity vs. Heteroscedasticity
Homoscedasticity
• A residual plot is a scatterplot of the standardized residuals against the fitted values
Let's look at an example to see what a "not well-behaved" residual plot looks like.
How does a non-linear regression function show up on a residual vs. fits plot?
The answer: The residuals depart from 0 in some systematic manner, such as being positive for small x values, negative for medium x values, and positive again for large x values. Any systematic (non-random) pattern is sufficient to suggest that the regression function is not linear.
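The pattern described above can be reproduced with a small sketch: fit a straight line to data that are actually curved and look at the signs of the residuals. The data (y = x squared, no noise) are made up for illustration.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical curved data: y = x^2 for x = 0, 1, ..., 10
x = list(range(11))
y = [xi ** 2 for xi in x]

a, b = fit_lsrl(x, y)   # straight line forced onto a curve
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Sign of each residual, in order of increasing x
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

The signs come out positive at both ends and negative in the middle, which is the systematic (non-random) pattern that signals a non-linear relationship.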
2) The random errors are normally distributed and centered at zero
• Histograms + PP plots ~ Normality assumption
• A histogram of the residuals shows whether they are centered at zero and bell shaped
• QQ plots are better for checking the normal shape because the histogram bins can be manipulated, so the normal shape may be difficult to see in some cases.
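A normal probability plot pairs the sorted residuals with the quantiles expected from a normal distribution; points close to a straight line support the normality assumption. The sketch below summarizes that straightness with a correlation. The simulated samples, the plotting positions (i + 0.5)/n, and the function name are assumptions made for this example.

```python
import math
import random
from statistics import NormalDist

def normal_plot_correlation(residuals):
    """Correlation between sorted residuals and theoretical normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_r = sum(ordered) / n
    mean_q = sum(quantiles) / n
    srq = sum((r - mean_r) * (q - mean_q) for r, q in zip(ordered, quantiles))
    srr = sum((r - mean_r) ** 2 for r in ordered)
    sqq = sum((q - mean_q) ** 2 for q in quantiles)
    return srq / math.sqrt(srr * sqq)

# Simulated (hypothetical) residuals: one bell-shaped set, one right-skewed set
rng = random.Random(1)
bell = [rng.gauss(0.0, 1.0) for _ in range(200)]
skewed = [rng.expovariate(1.0) - 1.0 for _ in range(200)]

r_bell = normal_plot_correlation(bell)
r_skewed = normal_plot_correlation(skewed)
print(round(r_bell, 3), round(r_skewed, 3))
```

The bell-shaped residuals give a correlation very close to 1, while the skewed residuals bend away from the line and give a lower value. Unlike a histogram, this summary does not depend on a choice of bins.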
Histograms of residuals
What to look for?
• Centered at zero
• Bell shaped
• No outliers
How strict?
What does it mean when the histogram is skewed?
r, R-squared, Se
4-in-1 residual plots
Look at this graph: are the residuals normal?
Here's the corresponding normal
probability plot of the residuals:
Residuals vs. order plot
Use the "residuals vs. order plot" as a way of detecting a particular form of non-independence of the error terms, namely serial correlation. If the data are obtained in a time (or space) sequence, a residuals vs. order plot helps to see if there is any correlation between the error terms that are near each other in the sequence.
The plot is only appropriate if you know the order in which the data were collected! Highlight this, underline this, circle this, ..., er, on second thought, don't do that if you are reading it on a computer screen. Do whatever it takes to remember it though: it is a very common mistake made by people new to regression analysis.
So, what is this residuals vs. order plot all about? As its name suggests, it is a scatter plot with residuals on the y axis and the order in which the data were collected on the x axis. Here's an example of a well-behaved residuals vs. order plot:
Residual Vs. Order
The residuals bounce randomly around the residual = 0 line, as we would hope. In general, residuals exhibiting normal random noise around the residual = 0 line suggest that there is no serial correlation.
Residual Vs. Order
A residuals vs. order plot that exhibits a (positive) trend suggests that the error terms are not independent: some of the variation in the response is due to the order (time) in which the data were collected.
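One common numerical companion to the residuals vs. order plot is the Durbin-Watson statistic. It is not mentioned in the slides and is used here only as an illustration. Values near 2 suggest no lag-1 serial correlation, values near 0 suggest positive serial correlation, and values near 4 suggest negative serial correlation. The residual sequences below are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in collection order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, in the order the data were collected
trending = [-3, -2, -1, 0, 1, 2, 3]       # steady upward drift
alternating = [1, -1, 1, -1, 1, -1, 1]    # flips sign at every step

dw_trend = durbin_watson(trending)        # well below 2
dw_alt = durbin_watson(alternating)       # well above 2
print(round(dw_trend, 3), round(dw_alt, 3))
```

As with the plot itself, this is only meaningful when the residuals are listed in the order in which the data were collected.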
R2 (R-Squared)
Residuals
Se (Standard Error)
Residual analysis is more important than a high R2
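Both summary numbers come from the residuals. A minimal sketch, assuming the usual simple-regression definitions Se = sqrt(SSE / (n - 2)) and R2 = 1 - SSE / SST, where SSE is the sum of squared residuals and SST is the total sum of squares of y around its mean. The data are made up.

```python
import math

def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

a, b = fit_lsrl(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

mean_y = sum(y) / n
sse = sum(e ** 2 for e in residuals)        # unexplained variation
sst = sum((yi - mean_y) ** 2 for yi in y)   # total variation in y

se = math.sqrt(sse / (n - 2))   # typical size of a residual, in units of y
r_squared = 1 - sse / sst       # share of variation explained by the line
print(round(se, 3), round(r_squared, 3))
```

Se is read in the units of y as the typical prediction error. R2 is a proportion, and a high value does not guarantee that the residual plots are well-behaved.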
Residual Activity
https://www.causeweb.org/repository/StarLibrary/activities/miller2001/