Stat 401 B – Lecture 12

Multiple Regression
A single numerical response variable, Y.
Multiple numerical explanatory variables, X1, X2, …, Xk.

Multiple Regression
$Y = \mu_{Y \mid x_1, x_2, \ldots, x_k} + \varepsilon$
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

Example
Y, Response – Effectiveness score based on experienced teachers' evaluations.
Explanatory – Test 1, Test 2, Test 3, Test 4.

Student Teacher   Eval   Test1   Test2   Test3   Test4
      1            489      81     151      46      44
      2            423      68     156      46      45
      3            507      80     165      76      55
      4            467     107     149      56      43
      5            340      43     134      49      49
      6            524     129     163      72      50
    ...
     23            434      76     141      54      58

JMP
Analyze – Fit Model
Pick Role Variables
    Y – EVAL
Construct Model Effects
    Add – Test1, Test2, Test3, Test4

JMP
Analyze – Fit Model
Personality – Standard Least Squares
Emphasis – Minimal Report

Response EVAL

Summary of Fit
RSquare                       0.802861
RSquare Adj                   0.759052
Root Mean Square Error        37.53627
Mean of Response              444.4783
Observations (or Sum Wgts)    23

Analysis of Variance
Source      DF   Sum of Squares   Mean Square   F Ratio   Prob > F
Model        4        103286.25       25821.6   18.3265   <.0001*
Error       18         25361.49        1409.0
C. Total    22        128647.74

Parameter Estimates
Term          Estimate   Std Error   t Ratio   Prob>|t|
Intercept    -193.4994    125.3074     -1.54   0.1399
Test1        1.1158539    0.319746      3.49   0.0026*
Test2         2.243267    0.628449      3.57   0.0022*
Test3        -1.367001    0.563965     -2.42   0.0261*
Test4        6.0482387    1.202281      5.03   <.0001*

Prediction Equation
Predicted Evaluation = -193.50 + 1.116*Test1 + 2.243*Test2 - 1.367*Test3 + 6.048*Test4

Conditions
The random error term, ε, is
    Independent
    Identically distributed
    Normally distributed with standard deviation, σ.
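
The conditions on ε are usually examined through the residuals, observed minus predicted evaluation. As a minimal sketch in Python (not part of the JMP workflow in these slides), the code below applies the fitted coefficients from the Parameter Estimates table to the seven teachers shown in the data listing. It assumes the listing's columns read, in order, Eval, Test1, Test2, Test3, Test4; the names COEF, TEACHERS, predict, and residual are illustrative only.

```python
# Fitted coefficients copied from the JMP Parameter Estimates table.
COEF = {
    "Intercept": -193.4994,
    "Test1": 1.1158539,
    "Test2": 2.243267,
    "Test3": -1.367001,
    "Test4": 6.0482387,
}

# Teachers shown in the data listing (teachers 7-22 are not shown there).
# Assumption: each row is (teacher, eval, test1, test2, test3, test4).
TEACHERS = [
    (1, 489, 81, 151, 46, 44),
    (2, 423, 68, 156, 46, 45),
    (3, 507, 80, 165, 76, 55),
    (4, 467, 107, 149, 56, 43),
    (5, 340, 43, 134, 49, 49),
    (6, 524, 129, 163, 72, 50),
    (23, 434, 76, 141, 54, 58),
]


def predict(test1, test2, test3, test4):
    """Predicted evaluation from the fitted prediction equation."""
    return (
        COEF["Intercept"]
        + COEF["Test1"] * test1
        + COEF["Test2"] * test2
        + COEF["Test3"] * test3
        + COEF["Test4"] * test4
    )


def residual(observed, test1, test2, test3, test4):
    """Residual = observed evaluation minus predicted evaluation."""
    return observed - predict(test1, test2, test3, test4)


# Print observed, predicted, and residual for each listed teacher.
for teacher, observed, *tests in TEACHERS:
    y_hat = predict(*tests)
    print(f"Teacher {teacher:2d}: observed={observed}, "
          f"predicted={y_hat:.1f}, residual={observed - y_hat:.1f}")
```

For Teacher 1 the predicted evaluation is about 438.9, so the residual is about 50. With all 23 residuals in hand, a plot of residuals against predicted values and a normal quantile plot would be the usual checks of the conditions.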

Estimate of Error Variance, $\sigma^2$
$MS_{Error} = \dfrac{SS_{Error}}{df_{Error}} = \dfrac{\sum (y - \hat{y})^2}{n - (k + 1)}$
$MS_{Error} = \dfrac{25361.49}{18} = 1409.0$

Estimate of Error Std Dev, $\sigma$
Root Mean Square Error
$RMSE = \sqrt{MS_{Error}}$
$RMSE = \sqrt{1409.0} = 37.54$

Multiple $R^2$
$R^2 = \dfrac{SS_{Model}}{SS_{Total}} = 1 - \dfrac{SS_{Error}}{SS_{Total}}$
$R^2 = \dfrac{103286.25}{128647.74} = 0.802861$

Interpretation
80.3% of the variation in the evaluation scores can be explained by the model, i.e. the relationship with the explanatory variables.

Caution
Including additional explanatory variables in a model can never decrease the value of $R^2$ (and will almost always increase it), even if those explanatory variables have nothing to do with the response variable.

Adjusted $R^2$
$adjR^2 = 1 - \dfrac{MS_{Error}}{MS_{Total}}$
$adjR^2 = 1 - \dfrac{25361.49 / 18}{128647.74 / 22} = 0.75905$

Test of Model Utility
Is there any explanatory variable in the model that is helping to explain significant amounts of variation in the response?

Step 1: Hypotheses
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_A$: at least one parameter is not zero

Step 2: Test Statistic
$F = \dfrac{MS_{Model}}{MS_{Error}}$
$F = \dfrac{25821.6}{1409.0} = 18.3265$
P-value < 0.0001

Step 3: Decision
Reject the null hypothesis because the P-value is so small.

Step 4: Conclusion
At least one of the tests is providing statistically significant information about the evaluation score.
The model is useful. Maybe not the best, but useful.

Alternative Form
$F = \dfrac{R^2 / k}{(1 - R^2) / (n - (k + 1))}$
$F = \dfrac{0.802861 / 4}{0.197139 / 18} = 18.3265$
P-value < 0.0001
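
The summary quantities on these slides (MSError, RMSE, R2, adjusted R2, and both forms of the F statistic) can all be reproduced from the three sums of squares in the Analysis of Variance table. The following is a minimal Python sketch of that arithmetic, not part of the JMP workflow; the variable names are illustrative, and the P-value is left to JMP because the Python standard library has no F distribution.

```python
import math

# Values copied from the JMP Analysis of Variance table.
SS_MODEL = 103286.25
SS_ERROR = 25361.49
SS_TOTAL = 128647.74
N = 23  # number of observations
K = 4   # number of explanatory variables

# Degrees of freedom.
df_model = K
df_error = N - (K + 1)
df_total = N - 1

# Mean squares = sum of squares / degrees of freedom.
ms_model = SS_MODEL / df_model
ms_error = SS_ERROR / df_error   # estimate of the error variance
ms_total = SS_TOTAL / df_total

rmse = math.sqrt(ms_error)       # estimate of the error standard deviation
r2 = SS_MODEL / SS_TOTAL
adj_r2 = 1 - ms_error / ms_total

# F statistic for the test of model utility, in both forms.
f_ratio = ms_model / ms_error
f_from_r2 = (r2 / K) / ((1 - r2) / df_error)

print(f"MSError   = {ms_error:.1f}")
print(f"RMSE      = {rmse:.5f}")
print(f"R2        = {r2:.6f}")
print(f"adj R2    = {adj_r2:.6f}")
print(f"F         = {f_ratio:.4f}")
print(f"F (R2)    = {f_from_r2:.4f}")
```

The two F values agree because dividing the numerator and denominator of MSModel / MSError by SSTotal turns the sums of squares into R2 and 1 - R2.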