Multiple Regression

Extension of simple linear regression – using multiple predictors.
Each predictor could help predict or explain additional variability in the response/criterion variable.

However: what should be the effect of using additional predictors?
  Logically, unless its correlation with the DV is 0, each predictor will improve prediction (explain additional variance in the DV).
  So just adding variables as predictors at random will usually "improve" the model – which creates potential for misuse of the strategy.

IDEALLY each predictor should be
  - correlated with the DV
  - uncorrelated with the other predictors (r over .8 is undesirable)
  - able to explain some unique variability in the DV
  - sensible!
Best situation: CLEAR THEORY or LOGIC determines the predictors selected.

Examples
  Relationship Commitment
    satisfaction with outcomes (+)
    investments in relationship (+)
    attractiveness of available alternatives (-)
  Job Satisfaction
    salary
    physical conditions
    social conditions

The model
  Simple linear regression:  Yp = a + bX (+ residuals)
  Multiple regression:       Yp = a + b1X1 + b2X2 (+ residuals)
  a is the value of Y when all X = 0 (the regression constant)
  the b's are 'partial regression coefficients' – the slope for each predictor when the other predictors are held constant
  With two predictors, the graph of the relationship is three-dimensional: we now fit a plane rather than a line to minimize the errors of prediction.
  With three predictors:     Yp = a + b1X1 + b2X2 + b3X3 (+ residuals)
    e.g., Commitment = a + b1(satisfaction) + b2(investments) - b3(alternatives) (+ residuals)
  The model is a weighted linear combination of predictors – comparable to an ANOVA model with main effects only.

Example: predicting Exam 2 grades using multiple predictors
  • Undergraduate GPA (0-4 scale)
  • GRE Verbal (200-800 scale)
  • GRE Quantitative (200-800 scale)
  • Exam 1 grade (0-100 scale)
  • Mean homework grade (0-10 scale)
  Note the variety of scales for the predictors; the weights (partial regression coefficients) will vary to take those into account.
  Ideally, all predictors are related to the criterion and are unrelated to each other.
  Using just the Exam 1 score, the correlation between Exam 1 and Exam 2 was r = .637, r2 = .406.

SPSS output (Method: Enter – all requested variables entered; DV = exam2)

Model Summary
  R = .735   R2 = .540   Adjusted R2 = .518   Std. Error of the Estimate = 4.61879
  Now R, between the set of predictors and Exam 2, is .735, and R2 = .540.

ANOVA (DV = exam2)
  Source       SS         df    MS        F        Sig.
  Regression   2609.715     5   521.943   24.466   .000
  Residual     2218.658   104    21.333
  Total        4828.373   109

Coefficients (DV = exam2)
  Predictor    B          SE      Beta    t       Sig.   Zero-order  Partial   Part
  (Constant)   16.059     8.406           1.911   .059
  gpatot         .192     1.226    .011    .157   .876    .127        .015      .010
  grev         -.000074    .006   -.001   -.011   .991    .137       -.001     -.001
  greq           .001      .007    .006    .086   .932    .078        .008      .006
  exam1          .440      .067    .500   6.538   .000    .637        .540      .435
  homework      3.673      .681    .390   5.394   .000    .564        .468      .359

Exam 2 (pred) = 16.06 + .19(gpa) - .00(grev) + .00(greq) + .44(exam1) + 3.67(homework)

Since gpatot, grev, and greq were all not significant, should they be excluded from the equation?
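For readers who want to reproduce this kind of output outside SPSS, here is a minimal sketch of fitting the five-predictor Exam 2 model with statsmodels. The file name and column names (gpatot, grev, greq, exam1, homework, exam2) are assumptions standing in for the slide's data set, not the course's actual file.

```python
# Minimal sketch: multiple regression of exam2 on five predictors with statsmodels.
# "exam_data.csv" and its column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("exam_data.csv")

fit = smf.ols("exam2 ~ gpatot + grev + greq + exam1 + homework", data=df).fit()

print(fit.rsquared, fit.rsquared_adj)  # R^2 (.540 on the slide) and adjusted R^2 (.518)
print(fit.params)                      # a (Intercept) and the partial regression coefficients b
print(fit.pvalues)                     # t test of each b against 0
```

The coefficients, R2, and p-values printed here correspond to the B column, Model Summary, and Sig. column of the SPSS tables above.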
Assumptions – never likely to satisfy all
  Essentially the same as for r, but at a multivariate level:
  Independent observations
  Interval/ratio data – or at least pretend
  Normality – all predictors (X's) and the response (Y)
    - errors of prediction are normally distributed
  Linearity – all X's have a linear relationship with Y
    - errors of prediction/predicted scores are linear
  Equality of variances (homoscedasticity)
    - variability of the errors of Y is the same at all values of X

Assumptions can be evaluated within SPSS at the multivariate level. In the Regression window, choose Plots and request zresid (Y) and zpred (X). The plots (from Tabachnick & Fidell, 2007, Using multivariate statistics, 5th ed., Boston: Allyn & Bacon) demonstrate the patterns that would indicate each violation – although deciding when there is 'enough' discrepancy is still subjective.

Example to follow: predicting Rated Distress when a partner is emotionally unfaithful – rated (1) none to (9) extreme – using Age and Rated Distress over Sexual Infidelity as predictors. All 3 variables are skewed.

Other Considerations in Multiple Regression
-- Truncated range – same as with r, can lead to a poor assessment of the 'real' R
-- Outliers due to multivariate deviation
   Discrepancy (distance) – outlier on the criterion
   Leverage – outlier on the predictors
   Influence – combines discrepancy and leverage to assess influence on the solution (the change in the regression coefficients if the case is deleted)
   (Figures from Tabachnick & Fidell, 2007, show how these would appear in a simple linear regression situation.)
   A simple diagnostic for influence is to request the Cook's Distance statistic in the Regression window, Save option. Values over 1 would suggest potentially strong influence.
   [Scatterplots of pubs vs. years, with the regression line fit with and without an outlier: the residual for the outlier is not great, but it has strong influence on the solution; Cook's distance = 92.6.]

Other Considerations in Multiple Regression – Sample size
  If the sample is too small, you may get good but meaningless prediction – too little variability.
  Minimum sample sizes recommended (to detect moderate effect sizes, 13%, with power of approximately .80), from Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510:
  • For a test of the model: n = 50 + 8p
  • For tests of individual predictors in the model: n = 104 + p
    (p = number of predictors)
  Can also conduct a power analysis based on the effect size you desire to select your sample size.

Other Considerations in Multiple Regression – Multicollinearity or Singularity
  • Singularity – when one predictor is a combination of the other predictors included
  • Multicollinearity – when the other predictors can account for a high degree of variability in a predictor

Diagnostics for multicollinearity or singularity (a code sketch follows below)
  Tolerance is used as the diagnostic statistic: if the other predictors are used to predict a predictor, what variance is shared? It is reported as 1 - R2, so closer to 1 is better; less than .2 indicates a problem.
  The Variance Inflation Factor (VIF) is also used. It is the reciprocal of Tolerance, so it can range from 1 up. It reflects the degree to which the standard error of b is increased due to correlations among the predictors. A value of 4 is cause for some concern; a value of 10 is a serious problem.
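A sketch of the influence and multicollinearity diagnostics just described (Cook's distance, Tolerance, VIF), assuming the same hypothetical exam data frame as in the earlier sketch. The cutoffs in the comments follow the slides (Cook's D over 1; Tolerance under .2).

```python
# Diagnostics sketch: Cook's distance, Tolerance, and VIF for the exam model.
# Assumes `df` holds the (hypothetical) exam data used earlier.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["gpatot", "grev", "greq", "exam1", "homework"]])
fit = sm.OLS(df["exam2"], X).fit()

# Cook's distance for each case; values over 1 suggest potentially strong influence
cooks_d = fit.get_influence().cooks_distance[0]
print("largest Cook's D:", cooks_d.max())

# Tolerance (1 - R^2 of a predictor regressed on the others) and VIF (1 / Tolerance)
for i, name in enumerate(X.columns[1:], start=1):   # skip the constant in column 0
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")

# Green (1991) minimum-n rules of thumb from the slides, for p predictors
p = 5
print("test of model:", 50 + 8 * p, "   tests of individual predictors:", 104 + p)
```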
Assessing the Outcome – testing the overall model as a single outcome
  How well does the set of predictors (X's) predict the criterion (Y)?
  Ho: all b's = 0 (all partial regression coefficients = 0)
  Or Ho: R = 0, where R is the Multiple Correlation Coefficient
  R = the correlation of the actual Y with the weighted linear combination of predictors (X's)
  Or – since the weighted linear combination yields predicted scores –
  R = the correlation of the actual Y with the predicted Yp

Reminder: partitioning the variability in Y
  SStotal = Sum (Y - Mean Y)2 – variability of the Y scores from the mean
  Separated into:
  SSregression = Sum (Yp - Mean Y)2 – improvement in predictions when using X (variability in Y explained by X), rather than assuming everyone gets the mean
  SSresidual = Sum (Y - Yp)2 – degree to which the predictions do not match the actual scores (the prediction errors that have been minimized)

  [Example scatterplot from simple linear regression: iq = 53.05 + 16.99*gpa, R-square = .69; Mean GPA = 3.06, Mean IQ = 105. The mean IQ would be your best 'guess' for every person if you had no useful predictor; the improvement in prediction using GPA is the distance from the mean to the regression line, and the residual is the distance from the point to the prediction line – much greater for some cases than others.]

Test using F – similar to simple linear regression
  Partition SStotal into
  • SSregression (explained by the weighted combination)
  • SSresidual (unexplained)

  F = (SSregression / df regression) / (SSresidual / df residual) = MSregression / MSresidual
    = explained (systematic + unsystematic) / unexplained (unsystematic)
  df regression = p, the number of predictors (the model has p + 1 parameters – the predictors plus the intercept – but there is always only one a, so df = p + 1 - 1 = p)
  df residual = n - p - 1

  Was R reliably different from 0? Yes, if F is significant.
  Recall: Standard Error of the Estimate = SQRT(MSresidual)

  R2 = SSregression / SStotal = explained variability / total variability
     = % of variance accounted for by the model (see the ANOVA example below)
  R2 can be used for describing a sample.
  Adjusted R2 gives a better estimate for the population, adjusted based on the number of predictors and the sample size – so it is lower with a small sample but many predictors:
  Adjusted R2 = 1 - ((1 - R2)(n - 1) / (n - p - 1))

Tests of Between-Subjects Effects (Dependent Variable: Sensitive)
  Source            Type III SS    df    MS          F          Sig.   Partial Eta Squared
  Corrected Model      83.200a       7     11.886       3.290   .003   .132
  Intercept          6451.600        1   6451.600    1785.585   .000   .922
  GENDER               14.400        1     14.400       3.985   .048   .026
  RELATE               44.600        3     14.867       4.115   .008   .075
  GENDER * RELATE      24.200        3      8.067       2.233   .087   .042
  Error               549.200      152      3.613
  Total              7084.000      160
  Corrected Total     632.400      159
  a. R Squared = .132 (Adjusted R Squared = .092)

  This is the test of a model in which 3 predictors (GENDER, RELATE, and GENDER * RELATE) are used to predict the rating on "Sensitive", the DV. Example from Handout Packet, page approx. 47.

In some cases, the purpose of the regression analysis is simply to see if the model "works": does it explain variance in the criterion? Can it be used to make predictions? In those cases, the overall test of the model is all you need, and you can interpret R2 or adjusted R2, and the SEE if you plan to make predictions.

In other cases, you might want to know how the individual predictors contributed to the overall model.

Assessing the contribution of individual predictors – dependent upon the set of predictors included!
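Before turning to the individual predictors, here is a worked sketch of the overall-model quantities above (variance partition, F, R2, adjusted R2, SEE), plugging in the SPSS numbers shown earlier for the five-predictor exam model.

```python
# Worked example of the overall-model formulas using SSreg = 2609.715,
# SSres = 2218.658, n = 110, p = 5 from the exam-model ANOVA table.
ss_reg, ss_res = 2609.715, 2218.658
n, p = 110, 5

df_reg, df_res = p, n - p - 1                       # 5 and 104
ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res   # 521.94 and 21.33

F = ms_reg / ms_res                                 # about 24.5
r2 = ss_reg / (ss_reg + ss_res)                     # about .540
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)       # about .518
see = ms_res ** 0.5                                 # about 4.62

print(F, r2, adj_r2, see)
```

These reproduce the F, R2, adjusted R2, and Standard Error of the Estimate reported in the SPSS output.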
Assessing individual predictors

Partial regression coefficient (b) – can test whether b = 0
  Is b = 0 (slope = 0) when the other predictors are held constant?
  Tested using a t test with df = n - p - 1.
  Beta – the partial regression coefficient when all variables are standardized (the standardized slope).
  If b is significant, so is beta.
  The test of a partial regression coefficient is like a typical statistical test of significance – it is or is not significant, and it is influenced by sample size.

Can also evaluate predictors with "effect size" measures (practical significance); these would be "significant" if b is significant.

Partial correlation (pr) – as described in the simple covariation section
  The correlation of a predictor (X1) with the DV (Y) after removing the variance in both that is explained by the other predictors.
  So both X1 and Y are adjusted before the correlation is calculated – all other X's are 'partialed' out of X1 and Y.
  pr2 – shared variance within context: what % of the variability in Y does X1 explain after the other variables' contributions to explaining both are removed?
  There is less than 100% of the variability of Y left for X1 to explain.

Semi-partial (part) correlation (sr)
  The correlation of a predictor (X1) with the DV (Y) after removing only the variance of X1 shared with the other predictors.
  So X1 is adjusted by removing variance shared with the other X's, but all the variability in Y is left to be explained.
  Assesses the 'unique' contribution of X1 to explaining Y – there is 100% of the variability in Y to explain for each X in the model.
  sr2 is considered the best measure of individual predictor importance (practical significance): R2 will drop by the predictor's sr2 when it is removed from the model.
  (BOTH pr and sr ARE STILL DEPENDENT ON THE MODEL USED.) Why?

  [Venn diagram of the variability of the DV (Y) overlapping with two predictors: a = variability of Y shared only with X1, b = variability of Y shared with both X1 and X2, c = variability of Y shared only with X2, d = variability of Y explained by neither predictor.]
  Squared partial correlation for X1 = a / (a + d)
  Squared semi-partial correlation for X1 = a / (a + b + c + d)

Types of Multiple Regression

Standard – all predictors are entered together
  The contribution of each depends on the others in the group.
  Assumes the other variables would usually be there and/or are relevant.
  Example predictor sets: the four humor styles, the Investment Model variables, the Big Five personality dimensions.

Hierarchical regression – enter predictors in a planned sequence (see the sketch after this section)
  Can enter individual predictors one at a time, or enter groups of variables at separate steps.
  As new predictors are added, each one can only explain the variability that is left.
  Assess the change in R2 at each step (does it increase significantly?) and the overall model when done.
  Example – predicting adult IQ: parental IQ, prenatal experience, early infant experience, education.

Statistical methods – let the data determine inclusion in the model, not a logical or theoretical 'plan'
  Assess each step by evaluating the change in R or R2.
  Usually an exploratory tool in possible model building.
  Requires a larger sample to have confidence (40 cases per predictor).
  Stepwise
    Begins with the single best predictor; adds the next best and assesses whether the model is better.
    At each step, each variable is reassessed and might be kept or removed.
    Stops when adding additional variables does not significantly improve the model (R).
  Forward inclusion
    Begins with the single best predictor; adds the next best and assesses the improvement.
    A variable enters only if it improves the model (R), but once in, it stays in.
  Backward exclusion
    Begins with the full model; removes the weakest contributor and assesses the loss.
    Keeps removing unless there is a significant drop in R.

Research questions using Multiple Regression
  Assess the overall model
  Assess the individual predictors
  Assess the effects of adding or changing predictors – on the overall model and on the other individual predictors
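A minimal sketch of the hierarchical (R2-change) strategy described above, done by fitting nested models rather than through the SPSS Blocks dialog. The variable names are hypothetical placeholders for a two-block entry, and `df` is assumed to hold the data.

```python
# Hierarchical regression sketch: enter block 1, then block 2, and test the R^2 change.
import statsmodels.formula.api as smf

step1 = smf.ols("commitment ~ satisfaction", data=df).fit()
step2 = smf.ols("commitment ~ satisfaction + investments + alternatives", data=df).fit()

r2_change = step2.rsquared - step1.rsquared   # variance explained by the added block
k_added = 2                                   # predictors added at step 2
F_change = (r2_change / k_added) / ((1 - step2.rsquared) / step2.df_resid)

print("R^2 change:", round(r2_change, 3), "F change:", round(F_change, 2))
```

The F-change statistic here uses df = (predictors added, n - p - 1 for the full model), matching the "assess the change in R2 at each step" idea on the slide.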
Predictions in a new sample.

Other Multiple Regression issues/applications

Suppressor variables – variables that improve the model due to their correlations with other predictors, not with the criterion. They 'suppress' variance in another predictor that is 'noise'.
  Evident when the simple r with the criterion is very low but the variable still contributes to the model (sr is higher).
  Can also produce a change in sign from r to b (i.e., a positive r but a negative b).

Mediation models – the relationship of X to Y is mediated by some other variable.
  [Path diagram: Positive use of Humor for self (high self-enhancing / low self-defeating) → Perceived Stress is the direct path c; Humor → Positive Personality (optimistic, hopeful, happy) is path a; Positive Personality → Perceived Stress is path b; with the mediator in the model, the Humor → Perceived Stress path is c'.]
  Humor use (H) predicts Perceived Stress (c) – the direct path.
  Humor use predicts Positive Personality (PP) (a).
  Positive Personality predicts Perceived Stress, with Humor in the model (b).
  In a hierarchical model, enter PP first, then H. If PP mediates H, then H no longer 'contributes' to the model – the c' path is not significant.

Moderator models – the relationship of a predictor with the criterion depends upon some other variable (just like an interaction in ANOVA).
  Yp = a + b1X1 + b2X2 + b3(X1X2) + residuals
  The main effects, plus an interaction term added to the equation.
  Often requires some modification of the data prior to the analysis – centering the variables to avoid multicollinearity (if the predictors do not have true 0 scores). (A brief code sketch follows the commitment-model output below.)

Best situation: CLEAR THEORY to be tested – the Relationship Commitment example (in Handout Packet)
  Relationship Commitment (low 8 – 72 high)
    satisfaction with outcomes (+) (low 3 – 21 high)
    investments in relationship (+) (subjective: low 6 – 54 high; objective: none 0 – ?? lots)
    attractiveness of available alternatives (-) (low 6 – 48 high)

Begin by examining the individual variables for normality, outliers, etc.
  Can request Cook's D to assess outlier influence.
  Can check assumptions using the plots from the regression analysis.

Then look at the simple correlations (r). Expect the predictors to correlate with the criterion, but not a lot with each other.

Correlations (Pearson r, N = 75; one-tailed Sig. in parentheses)
                           Commit.        Satisf.        Altern.        Obj. Inv.      Subj. Inv.
  Global Commitment        1.000           .310 (.003)   -.422 (.000)    .395 (.000)    .551 (.000)
  Global Satisfaction       .310 (.003)   1.000           -.237 (.020)    .157 (.089)    .339 (.001)
  Global Alternatives      -.422 (.000)   -.237 (.020)   1.000           -.257 (.013)   -.440 (.000)
  Objective Investments     .395 (.000)    .157 (.089)   -.257 (.013)   1.000            .408 (.000)
  Subjective Investments    .551 (.000)    .339 (.001)   -.440 (.000)    .408 (.000)    1.000

Check to see how well the model worked: R and R2 and the test of significance, plus the standard error of the estimate.
  R2 describes the sample; adjusted R2 generalizes to the population; the SEE is the typical residual.

Model Summary (predictors: Global Alternatives, Global Satisfaction, Objective Investments, Subjective Investments)
  R = .619   R2 = .384   Adjusted R2 = .348   Std. Error of the Estimate = 6.34228

ANOVA (DV = Global Commitment)
  Source       SS         df   MS        F        Sig.
  Regression   1751.967    4   437.992   10.889   .000
  Residual     2815.713   70    40.224
  Total        4567.680   74
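Returning briefly to the moderator-model slide above: a minimal sketch of centering the predictors and building the product (interaction) term before fitting. The variable names (x1, x2, y) are hypothetical, and `df` is assumed to hold the data; this is not the handout's SPSS procedure.

```python
# Moderation sketch: center predictors, form the interaction term, and test b3.
import statsmodels.formula.api as smf

df["x1_c"] = df["x1"] - df["x1"].mean()   # centering to reduce multicollinearity
df["x2_c"] = df["x2"] - df["x2"].mean()
df["x1x2"] = df["x1_c"] * df["x2_c"]      # interaction (product) term

fit = smf.ols("y ~ x1_c + x2_c + x1x2", data=df).fit()
print(fit.params, fit.pvalues)            # b3 for x1x2 tests the moderation effect
```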
Now look at the individual predictors in the commitment model:
  check collinearity
  see which predictors are individually significant
  look at the individual contributions (semi-partial, or part, r2)

Coefficients (DV = Global Commitment)
  Predictor                B        SE      Beta    t        Sig.   95% CI for B       Zero-order  Partial   Part    Tolerance   VIF
  (Constant)               19.247   6.372           3.021    .004   [6.539, 31.955]
  Global Satisfaction        .247    .214    .116   1.153    .253   [-.180, .674]       .310        .137      .108    .875       1.143
  Global Alternatives       -.179    .098   -.192  -1.823    .073   [-.376, .017]      -.422       -.213     -.171    .791       1.264
  Objective Investments      .034    .019    .183   1.773    .081   [-.004, .072]       .395        .207      .166    .826       1.211
  Subjective Investments     .346    .113    .353   3.071    .003   [.121, .571]        .551        .345      .288    .668       1.496

• Go through the example in SPSS
• Look at G*Power
• Stepwise example in Handouts
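For the "individual contributions" step above, a small sketch of how the squared part (semi-partial) correlations can be recovered from the t values in the coefficients table, using sr2 = t2 * (1 - R2) / df residual. The numbers below are taken from the commitment-model output shown above (R2 = .384, df residual = 70).

```python
# sr^2 (unique variance explained) from each predictor's t statistic.
r2, df_res = 0.384, 70
t_values = {"Global Satisfaction": 1.153, "Global Alternatives": -1.823,
            "Objective Investments": 1.773, "Subjective Investments": 3.071}

for name, t in t_values.items():
    sr2 = t**2 * (1 - r2) / df_res
    print(name, "sr^2 =", round(sr2, 3))
# e.g. Subjective Investments: 3.071^2 * .616 / 70 ≈ .083, which matches Part = .288 squared
```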