Reasoning in Psychology Using Statistics
Psychology 138, 2015

Announcements
• Quiz 8 due Fri. Apr 17
  – Includes both correlation and regression
• Final Project due Wed. April 29th (you should get your cases assigned to you in labs today)

Exam(s) 3
• Lecture Exam 3
  – Mean 53.6 (53.6/75 = 71.4%)
• Lab Exam 3
  – Mean 61.0 (61.0/75 = 81.3%)
• Combined Exam 3
  – Mean 116.1 (116.1/150 = 77.4%)

Regression
• Regression procedures can be used to predict the response variable based on the explanatory variable(s)
• Suppose that you notice that the more you study for an exam, the better your score typically is.
  – This suggests that there is a relationship between the variables.
  – You can use this relationship to predict test performance based on study time.
[Figure: test performance plotted against study time (15 mins vs. 115 mins)]
• Regression
  – Describing the nature of the relationship between variables for the purposes of prediction
• Decision tree: two variables → relationship between variables → quantitative variables → regression

Making predictions based on the form of the relationship
• For correlation: “it doesn’t matter which variable goes on the X-axis or the Y-axis”
• For regression this is NOT the case
  – The variable that you are predicting (response variable) goes on the Y-axis
  – The variable that you are making the prediction based on (explanatory variable) goes on the X-axis
[Figure: Quiz 4 performance (predicted variable, Y) plotted against Hours of study (predictor variable, X)]
• For correlation: “Imagine a line through the points”
• But there are lots of possible lines
• One line is the “best fitting line”
• Today: learn how to compute the equation corresponding to this “best fitting line”

A brief review of geometry
• Y = (X)(slope) + (intercept)
  – Intercept: the value of Y when X = 0 (here, 2.0)
  – Slope: (change in Y)/(change in X) (here, a change of 1 in X goes with a change of 0.5 in Y, so the slope is 0.5)
• Y = (X)(0.5) + 2.0
  – In regression analysis this line (or the equation that describes it) represents our predicted values of Y given particular values of X
[Figure: the line Y = (X)(0.5) + 2.0, with X and Y axes running from 0 to 6]
• Consider a perfect correlation
  – Can make specific predictions about Y based on X
  – For X = 5: Y = (5)(0.5) + 2.0 = 2.5 + 2.0 = 4.5
• Consider a less than perfect correlation
  – The line still represents the predicted values of Y given X
  – For X = 5, the predicted value is still Y = (5)(0.5) + 2.0 = 4.5
• The “best fitting line” is the one that minimizes the differences (error, or residuals) between the predicted scores (the line) and the actual scores (the points); specifically, it minimizes the sum of the squared differences
• Rather than compare the errors from different lines and pick the best, we will directly compute the equation for the best fitting line
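The geometry review can be sketched in a few lines of Python, assuming the line Y = (X)(0.5) + 2.0 used on the slides. `predict` is a hypothetical helper written for illustration, not a function from any statistics package. The sketch shows that the intercept is the prediction at X = 0, that the slope is the change in the prediction for a one-unit change in X, and it reproduces the X = 5 prediction.

```python
def predict(x, slope, intercept):
    """Predicted Y for a given X on the line Y = (X)(slope) + (intercept)."""
    return x * slope + intercept


slope, intercept = 0.5, 2.0

# The intercept is the predicted Y when X = 0.
print(predict(0, slope, intercept))  # 2.0

# The slope is the change in predicted Y for a one-unit change in X.
print(predict(3, slope, intercept) - predict(2, slope, intercept))  # 0.5

# Prediction for X = 5, matching the worked example: 2.5 + 2.0 = 4.5
print(predict(5, slope, intercept))  # 4.5
```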
Example
• Using the dataset from our correlation lecture: suppose that you notice that the more you study for an exam (X = hours of study), the better your exam score typically is (Y = exam score). Compute the regression equation predicting exam score with study time.

Person  X    Y    (X - X̄)  (X - X̄)²  (Y - Ȳ)  (Y - Ȳ)²  (X - X̄)(Y - Ȳ)
A       6    6     2.4      5.76      2.0     4.0        4.8
B       1    2    -2.6      6.76     -2.0     4.0        5.2
C       5    6     1.4      1.96      2.0     4.0        2.8
D       3    4    -0.6      0.36      0.0     0.0        0.0
E       3    2    -0.6      0.36     -2.0     4.0        1.2
mean    3.6  4.0   0.0                0.0
sum                        15.20              16.0       14.0
                           (SSX)              (SSY)      (SP)

• Slope:
$$b = \frac{SP}{SS_X} = \frac{14.0}{15.2} \approx 0.92$$
• Intercept:
$$a = \bar{Y} - b\bar{X} = 4.0 - (0.92)(3.6) = 0.688$$
• Regression equation: Ŷ = 0.92X + 0.688 (slope b = 0.92, intercept a = 0.688)
[Figure: scatterplot of the five data points with the line Ŷ = 0.92X + 0.688]
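The hand computation above can be sketched in plain Python with no statistics library assumed; the variable names are illustrative only. One design choice differs from the slides: the code keeps the full slope (14/15.2 ≈ 0.921) when computing the intercept, so it gets a ≈ 0.684 rather than the slide's 0.688, which used the rounded slope 0.92.

```python
# Hours of study (X) and exam scores (Y) for persons A-E, from the example
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)

mean_x = sum(x) / n  # 3.6
mean_y = sum(y) / n  # 4.0

# Sums of squared deviations (SSX, SSY) and sum of products of deviations (SP)
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sp / ss_x                    # b = SP / SSX
intercept = mean_y - slope * mean_x  # a = mean(Y) - b * mean(X)

print(round(ss_x, 2), round(ss_y, 2), round(sp, 2))  # 15.2 16.0 14.0
print(round(slope, 3), round(intercept, 3))          # 0.921 0.684
```

As a check, `slope * mean_x + intercept` equals `mean_y` (4.0) up to floating-point error, which is the point that the two means lie on the line.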
• The two means will be on the line: (0.92)(3.6) + 0.688 = 3.312 + 0.688 = 4.0, so the point (X̄, Ȳ) = (3.6, 4.0) falls on Ŷ = 0.92X + 0.688
• Hypothesis testing: two tests, one on each of these (the slope and the intercept)

Hypothesis testing with Regression
• SPSS Regression output gives you a lot of stuff
• Make sure you put the variables in the correct role (the response variable goes in the Dependent box, the explanatory variable in the Independent(s) box)
• Unstandardized coefficients
  – “(Constant)” = intercept
  – Variable name = slope
• These t-tests test the hypotheses
  – H0: Intercept (constant) = 0
  – H0: Slope = 0

Measures of Error in Regression
• The linear equation isn’t the whole thing
• Also need a measure of error
  – Y = (X)(0.5) + 2.0 + error
• Same line, but different relationships (strength difference)
[Figure: two scatterplots sharing the line Y = (X)(0.5) + 2.0 but differing in the strength of the relationship]
• Three common measures of error
  – r² (r-squared)
  – Sum of the squared residuals = SSresidual = SSerror
  – Standard error of estimate

• R-squared (r²) represents the percent of variance in Y accounted for by X
  – r = 0.8, r² = 0.64: 64% of the variance in Y is explained by X
  – r = 0.5, r² = 0.25: 25% of the variance in Y is explained by X
[Figure: scatterplots illustrating r = 0.8 and r = 0.5]

• Sum of the squared residuals = SSresidual = SSerror
  – Compute the difference between the predicted values and the observed values (“residuals”)
  – Square the differences
  – Add up the squared differences
• Predicted values of Y (points on the line), using Ŷ = 0.92X + 0.688 (rounded):

X      Y      Ŷ
6      6      6.2  = (0.92)(6) + 0.688
1      2      1.6  = (0.92)(1) + 0.688
5      6      5.3  = (0.92)(5) + 0.688
3      4      3.45 = (0.92)(3) + 0.688
3      2      3.45 = (0.92)(3) + 0.688
mean   3.6    4.0

[Figure: the data points with their predicted values (6.2, 5.3, 3.45, 1.6) marked on the line]

• Residuals (Y - Ŷ):

X   Y   Ŷ     Y - Ŷ
6   6   6.2   6 - 6.2  = -0.20
1   2   1.6   2 - 1.6  =  0.40
5   6   5.3   6 - 5.3  =  0.70
3   4   3.45  4 - 3.45 =  0.55
3   2   3.45  2 - 3.45 = -1.45
Quick check: the residuals sum to 0.00
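A short Python sketch of the predicted values and residuals, assuming the slide's rounded equation Ŷ = 0.92X + 0.688 (variable names are illustrative). The slides round Ŷ to one or two decimals before subtracting; this sketch keeps three decimals, so the residuals differ slightly (for example -0.208 instead of -0.20), but the quick check still holds: the residuals sum to zero, up to floating-point error.

```python
# Example data (persons A-E) and the regression equation from the slides
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
slope, intercept = 0.92, 0.688  # rounded values used on the slides

# Predicted values: the points on the line directly above each X
y_hat = [slope * xi + intercept for xi in x]

# Residuals: observed minus predicted
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print([round(v, 3) for v in y_hat])      # [6.208, 1.608, 5.288, 3.448, 3.448]
print([round(r, 3) for r in residuals])  # [-0.208, 0.392, 0.712, 0.552, -1.448]

# Quick check: residuals sum to zero (compare with a tolerance, not ==)
print(abs(sum(residuals)) < 1e-9)  # True
```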
• Squared residuals and SSerror (computed from the rounded values above):

X   Y   Ŷ     (Y - Ȳ)²   Y - Ŷ   (Y - Ŷ)²
6   6   6.2   4.0        -0.20   0.04
1   2   1.6   4.0         0.40   0.16
5   6   5.3   4.0         0.70   0.49
3   4   3.45  0.0         0.55   0.30
3   2   3.45  4.0        -1.45   2.10
sum           16.0 (SSY)  0.00   3.09 (SSerror)

• Standard error of the estimate represents the average deviation from the line, using df = n - 2:
$$\text{standard error of estimate} = \sqrt{\frac{SS_{error}}{df}} = \sqrt{\frac{SS_{error}}{n - 2}} = \sqrt{\frac{3.09}{3}} \approx 1.01$$
• SPSS Regression output gives you a lot of stuff
  – r²: percent variance in Y accounted for by X (“R Square” in the Model Summary table)
  – Standard error of the estimate: the average deviation from the line (“Std. Error of the Estimate” in the Model Summary table)
  – SSresidual or SSerror (the “Residual” row of the ANOVA table)

In lab
• You’ll practice computing the regression equation and error for the “best fitting line” (by hand and using SPSS)
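The three error measures can be sketched end to end in Python, refitting the least-squares line from the example data without rounding (variable names are illustrative). Because the slides round Ŷ and each squared residual along the way, full precision gives slightly different numbers: SSerror ≈ 3.11 and standard error of estimate ≈ 1.02, versus 3.09 and 1.01 on the slides. The last line also checks a standard identity for the least-squares line that the slides do not state: r² computed from the correlation, r = SP/√(SSX·SSY), matches 1 - SSerror/SSY.

```python
import math

# Example data (persons A-E): X = hours of study, Y = exam score
x = [6, 1, 5, 3, 3]
y = [6, 2, 6, 4, 2]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares line, with no intermediate rounding
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sp / ss_x
a = mean_y - b * mean_x

# SSerror: sum of squared residuals (observed minus predicted)
ss_error = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))

# Standard error of estimate uses df = n - 2
std_error_estimate = math.sqrt(ss_error / (n - 2))

# r-squared two ways: from the correlation r, and as 1 - SSerror/SSY
r = sp / math.sqrt(ss_x * ss_y)
r_squared = r ** 2

print(round(ss_error, 2))            # 3.11
print(round(std_error_estimate, 2))  # 1.02
print(round(r_squared, 3), round(1 - ss_error / ss_y, 3))  # 0.806 0.806
```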