TODAY we will
• Review what we have learned so far about regression
• Develop the ability to use residual analysis to assess whether a model (LSRL) is appropriate for predictions
• Understand how the standard error (Se) is used in regression analysis

Review
• How to describe a scatterplot
• Correlation coefficient (r)
• Math vs. Stats: equation of a line vs. LSRL
• Interpret slope and y-intercept
• What is a residual (or error)?

Review: How to describe a scatterplot
• Trend ~ positive or negative
• Form ~ linear or nonlinear
• Strength ~ weak, moderate, or strong

Correlation Coefficient (r)
• -1 ≤ r ≤ 1
• Strength
  • r close to 1 or -1 ~ strong linear association
  • r close to 0 ~ weak or no linear association
• Trend
  • Positive association (as the x variable increases, the y variable also tends to increase)
  • Negative association (as the x variable increases, the y variable tends to decrease)

Review: Math vs. Stats, Equation of a Line vs. LSRL
• Line in math: y = mx + b
• Line in stats: ŷ = a + bx or ŷ = b0 + b1x

Review: Interpret Slope and y-intercept
Slope:
• For every one-unit increase in x, the predicted y increases (or decreases) on average by the slope.
Y-intercept:
• When x = 0, the predicted value of y is "a".

Review: What is a residual (or error)?
• Error = residual = observed y value − predicted y value
(Figure: the residual is the vertical gap between the observed y and the predicted y.)

Use Residual Analysis to assess if the model (LSRL) is appropriate for making predictions

Correlation, Linearity, and Outliers
• Only use linear correlation to interpret the data when there is a linear relationship.
• An outlier can strongly influence the correlation.

Fitting a Model for Prediction (Fitting the LSRL for Prediction)
• Allow for random variation
• A model is not the reality
• Signal ~ deterministic
• Noise ~ stochastic
• All models are wrong, but some are useful

Residual Analysis
• Addresses directly the problem of signal and noise

Types of Residual Plots
Different plots can highlight different departures or problems in the prediction model.
1) Residual vs. Fitted
2) Histogram
3) P-P plot
4) Residual vs. Order
Note: these plots are from software output (Minitab).

Residual vs. Fitted Value Plot
Three common defects may be revealed by plotting residuals vs. fitted values:
1) Outliers
2) Progressive change in the variance
  • Band of uniform width = constant variance
  • Funnel shape = unequal variance; consider a transformation
3) Inadequacy of the model
  • Curvature ~ wrong model
  • Linear trend going up ~ wrong calculation (least-squares residuals are uncorrelated with the fitted values)

Residual vs. Fitted
Let's look at an example to see what a "well-behaved" residual plot looks like.

Scatterplot
Some researchers (Urbano-Marquez et al., 1989) were interested in determining whether or not alcohol consumption was linearly related to muscle strength. The researchers measured the total lifetime consumption of alcohol (x) on a random sample of n = 50 alcoholic men. They also measured the strength (y) of the deltoid muscle in each person's nondominant arm. A fitted line plot of the resulting data (alcoholarm.txt) looks like:
(Figures: scatterplot with fitted line; residual plot.)

Residual vs. Fitted
Let's look at an example to see what a "not so well-behaved" residual plot looks like. What do you notice in this scatterplot?
(Figures: scatterplot against foot length; residual plot against predicted or fitted values with an outlier marked; residual plot against predicted or fitted values with the outlier removed.)

Let's look at an example to see what a "not well-behaved" residual plot looks like.

Heteroscedasticity
When the requirement of constant variance is violated, we have a condition of heteroscedasticity. Diagnose heteroscedasticity by plotting the residuals against the predicted y (ŷ).
(Figure: residuals plotted against ŷ; the spread increases with ŷ.)

Signal and Noise
• Residual plots: fitted vs. residuals
• Homoscedasticity vs. heteroscedasticity

Homoscedasticity
• A residual plot is a scatterplot of the standardized residuals against the fitted values.

Let's look at an example to see what a "not well-behaved" residual plot looks like.
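The funnel-shape check described above can also be sketched numerically. The following is a minimal illustration, not the method used to produce the slides' plots: the data are made up, and the helper name fit_lsrl and the "compare the two halves" rule are assumptions chosen for simplicity. It fits the LSRL, computes residuals as observed y minus predicted y, and compares the typical residual size for small and large fitted values.

```python
def fit_lsrl(x, y):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up data (an assumption for illustration): the scatter around
# the line y = 2x grows as x grows, which should produce a funnel.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.1, 0.5, -0.5, 1.0, -1.0, 2.0, -2.0]
y = [2 * xi + e for xi, e in zip(x, noise)]

a, b = fit_lsrl(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]  # observed y - predicted y

# Order the residuals by fitted value and split them into two halves.
pairs = sorted(zip(fitted, resid))
half = len(pairs) // 2
lower_spread = sum(abs(r) for _, r in pairs[:half]) / half
upper_spread = sum(abs(r) for _, r in pairs[half:]) / (len(pairs) - half)

print("mean |residual| for small fitted values:", round(lower_spread, 3))
print("mean |residual| for large fitted values:", round(upper_spread, 3))
```

A clearly larger spread in the upper half is the numeric counterpart of a funnel opening to the right. A plot of resid against fitted remains the primary diagnostic; this comparison is only a rough summary.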
How does a non-linear regression function show up on a residual vs. fits plot?
The answer: The residuals depart from 0 in some systematic manner, such as being positive for small x values, negative for medium x values, and positive again for large x values. Any systematic (non-random) pattern is sufficient to suggest that the regression function is not linear.

2) The random errors are normally distributed and centered at zero
• Histograms and P-P plots check the normality assumption.
• A histogram shows whether the residuals are centered at zero and bell shaped.
• Q-Q plots are better for judging the normal shape, because histogram bins can be manipulated and the normal shape may therefore be difficult to see in some cases.

Histograms of residuals
What to look for?
• Centered at zero
• Bell shaped
• No outliers
How strict?
What does it mean when the histogram is skewed?

r, R-squared, Se

Four-in-one residual plots
Look at this graph: are the residuals normal?
Here's the corresponding normal probability plot of the residuals:
(Figure: normal probability plot of the residuals.)

Residuals vs. order plot
A residuals vs. order plot is a way of detecting a particular form of non-independence of the error terms, namely serial correlation. If the data are obtained in a time (or space) sequence, a residuals vs. order plot helps to see if there is any correlation between the error terms that are near each other in the sequence.
The plot is only appropriate if you know the order in which the data were collected! Highlight this, underline this, circle this, ..., er, on second thought, don't do that if you are reading it on a computer screen. Do whatever it takes to remember it though: it is a very common mistake made by people new to regression analysis.
So, what is this residuals vs. order plot all about? As its name suggests, it is a scatter plot with residuals on the y axis and the order in which the data were collected on the x axis.
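As a rough numeric companion to the residuals vs. order plot, one can compute the correlation between each residual and the next one in collection order (a lag-1 autocorrelation). This is a swapped-in summary, not something the slides or the Minitab output use; the function name lag1_autocorrelation and both residual lists are made up for illustration.

```python
def lag1_autocorrelation(resid):
    """Correlation between each residual and the next one in sequence."""
    n = len(resid)
    mean = sum(resid) / n
    num = sum((resid[i] - mean) * (resid[i + 1] - mean) for i in range(n - 1))
    den = sum((r - mean) ** 2 for r in resid)
    return num / den


# Hypothetical residuals, listed in the order the data were collected.
# "Tracking": neighbours tend to share a sign (long runs above or below 0).
tracking = [2, 2, 1, 1, -1, -1, -2, -2]
# "Bouncing": the same values with no obvious run pattern.
bouncing = [2, -1, -2, 1, 1, -2, -1, 2]

r_tracking = lag1_autocorrelation(tracking)
r_bouncing = lag1_autocorrelation(bouncing)

print("lag-1 autocorrelation, tracking residuals:", round(r_tracking, 2))
print("lag-1 autocorrelation, bouncing residuals:", round(r_bouncing, 2))
```

A value well above 0 says that neighbouring residuals tend to move together, which is what serial correlation means. As with the plot itself, the number is only meaningful when the residuals really are listed in collection order.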
Here's an example of a well-behaved residuals vs. order plot:
(Figure: residuals vs. order.)
The residuals bounce randomly around the residual = 0 line, as we would hope. In general, residuals exhibiting normal random noise around the residual = 0 line suggest that there is no serial correlation.

Residual vs. Order
A residuals vs. order plot that exhibits a (positive) trend, as the following plot does, suggests that the error terms are not independent:
(Figure: residuals vs. order with a positive trend.)

R-squared, Residuals, Se
• R-squared
• Residual standard error (Se)
• Residual analysis is more important than a high R-squared.

Residual Activity
https://www.causeweb.org/repository/StarLibrary/activities/miller2001/
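The point that residual analysis matters more than a high R-squared can be illustrated with a small sketch. The data below are made up (y = x squared, a deliberately curved relationship), and the formulas assumed are the usual ones for simple linear regression: Se = sqrt(SSE / (n − 2)) and R-squared = 1 − SSE / SST.

```python
import math

# Made-up curved data (an assumption for illustration): y = x^2.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]
n = len(x)

# Least-squares slope and intercept for y_hat = a + b*x.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed y - predicted y.
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

sse = sum(r ** 2 for r in resid)            # sum of squared residuals
sst = sum((yi - y_bar) ** 2 for yi in y)    # total variation in y
se = math.sqrt(sse / (n - 2))               # residual standard error
r_squared = 1 - sse / sst

print("R-squared:", round(r_squared, 3))
print("Se:", round(se, 3))
print("residuals in x order:", resid)
```

Here R-squared comes out above 0.95, yet the residuals are positive at both ends and negative in the middle. That systematic pattern shows the straight line is the wrong model even though R-squared looks high.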