Chapter Eight Linear Regression © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Linear Regression Linear regression is a statistical procedure that uses relationships to predict unknown Y scores based on the X scores from a correlated variable. © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 2 Linear Regression Line The linear regression line is the straight line that summarizes the linear relationship in the scatterplot by, on average, passing through the center of the Y scores at each X. © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 3 Predicted Y Scores • The symbol Y stands for a predicted Y score • Each Y is our best prediction of the Y score at a corresponding X, based on the linear relationship that is summarized by the regression line © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 4 Slope and Intercept • The slope is a number that indicates how slanted the regression line is and the direction in which it slants • The Y intercept is the value of Y at the point where the regression line intercepts, or crosses, the Y axis (that is, when X equals 0) © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 5 Regression Lines Having Different Slopes and Y Intercepts © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 6 The Linear Regression Equation • The linear regression equation indicates the predicted Y values are equal to the slope (b) times a given X value and this product then is added to the Y intercept (a) Y bX a © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 7 Computing the Slope • The formula for the slope (b) is N (XY ) (X )(Y ) b 2 2 N (X ) (X ) • Since we usually first compute r, the values of the elements of this formula already are known © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 8 Computing the Y-Intercept • The formula for the Y-intercept (a) is a Y (b)( X ) © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 9 Summary of Computations for the Linear Regression Equation © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 10 Describing Errors in Prediction © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 11 The Variance of Y Scores Y Around • The variance of the Y scores around Y is the average squared difference between the actual Y scores and their corresponding predicted Y scores • The formula for defining the variance of the Y scores around Y is S S (1 r ) 2 Y 2 Y 2 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 12 The Standard Error of the Estimate The standard error of the estimate is the clearest way to describe the “average” error when using Y to predict Y scores. SY SY 1 r 2 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 13 Interpreting the Standard Error of the Estimate © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 14 Assumption 1 of Linear Regression • The first assumption of linear regression is that the data are homoscedastic © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 15 Homoscedasticity • Homoscedasticity occurs when the Y scores are spread out to the same degree at every X © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 16 Heteroscedasticity • Heteroscedasticity occurs when the spread in Y is not equal throughout the relationship © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 17 Assumption 2 of Linear Regression • The second assumption of linear regression is that the Y scores at each X form an approximately normal distribution © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 18 Scatterplot Showing Normal Distribution of Y Scores at Each X © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 19 The Strength of a Relationship and Prediction Error © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 20 The Strength of a Relationship As the strength of the relationship increases, the actual Y scores are closer to their corresponding Y scores, producing less prediction error and 2 smaller values of SY and SY . © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 21 Scatterplot of a Strong Relationship © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 22 Scatterplot of a Weak Relationship © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 23 Computing the Proportion of Variance Accounted For © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 24 Proportion of Variance Accounted For The proportion of variance accounted for is the proportional improvement in the accuracy of our predictions produced by using a relationship to predict Y scores, compared to our accuracy when we do not use the relationship. © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 25 Proportion of Variance Accounted For • When we do not use the relationship, we use the overall mean of the Y scores (Y ) as everyone’s predicted Y • The error here is the difference between the actual Y scores and the Y we predict they got (Y Y ) • When we do not use the relationship to 2 predict scores, our error is SY © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 26 Proportion of Variance Accounted For • When we do use the relationship, we use the corresponding Y as determined by the linear regression equation as our predicted value • The error here is the difference between the actual Y scores and the Y that we predict they got (Y Y ) • When we do use the relationship to predict scores, our error is SY2 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 27 Proportion of Variance Accounted For The computational formula for the proportion of variance in Y that is accounted for by a 2 linear relationship with X is r . © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 28 Proportion of Variance Not Accounted For The computational formula for the proportion of variance in Y that is not accounted for by a linear relationship with X is 1 r . 2 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 29 Example 1 • For the following data set, calculate the linear regression equation. X Y 1 8 2 6 3 6 4 5 5 1 6 3 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 30 Example 1 Linear Regression Equation N (XY ) (X )(Y ) b 2 2 N (X ) (X ) 6(81) (21)( 29) 2 6(91) (21) 123 1.171 105 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 31 Example 1 Linear Regression Equation a Y (b)( X ) 4.833 (1.171)(3.500) 8.933 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 32 Example 1 Linear Regression Equation Y bX a 1.17 X 8.93 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 33 Example 2 Predicted Y Value • Using the linear regression equation from example 1, determine the predicted Y score for X = 4. Y 1.17 X 8.93 1.17(4) 8.93 4.25 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 34 Example 3 • Using the same data set, calculate the standard error of the estimate. X Y 1 8 2 6 3 6 4 5 5 1 6 3 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 35 Example 3 Standard Error of the Estimate SY SY 1 r 2 2 2 ) 29 ( ) Y ( 171 Y 2 6 5.139 N SY2 6 N SY 5.139 2.267 r 0.88 SY 2.267 1 (0.88) 1.08 2 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 36 Example 4 • Using the same data set, calculate the proportion of variance accounted for and the proportion of variance not accounted for. X Y 1 8 2 6 3 6 4 5 5 1 6 3 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 37 Example 4—Proportion of Variance Accounted For The proportion of variance in Y that is accounted 2 for by a linear relationship with X is r . The proportion of variance in Y that is not accounted for by a linear relationship with X is 1 r 2. r 0.88 r 2 0.774 1 r 2 0.226 © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 38 Key Terms • coefficient of alienation • coefficient of determination • criterion variable • heteroscedasticity • homoscedasticity • linear regression equation • linear regression line • multiple correlation coefficient • multiple regression equation • predicted Y score • predictor variable • proportion of the variance accounted for © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 39 Key Terms (Cont’d) • slope • standard error of the estimate • variance of the Y scores around Y • Y intercept © 2011 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website for classroom use. Chapter 8 - 40