Week 12 (November 17–21): Four Mini-Lectures
QMM 510, Fall 2014

ML 12.1

Chapter 13: Multiple Regression

Chapter Contents
13.1 Multiple Regression
13.2 Assessing Overall Fit
13.3 Predictor Significance
13.4 Confidence Intervals for Y
13.5 Categorical Predictors
13.6 Tests for Nonlinearity and Interaction
13.7 Multicollinearity
13.8 Violations of Assumptions
13.9 Other Regression Topics

Much of this is like Chapter 12, except that we have more than one predictor.

Simple or Multivariate?
• Multiple regression extends simple regression to include more than one independent variable.
• Limitations of simple regression:
  • often simplistic
  • biased estimates if relevant predictors are omitted
  • lack of fit does not show that X is unrelated to Y if the true model is multivariate

Visualizing a Multiple Regression
• With one predictor the fitted model is a line; with two predictors it is a plane in three dimensions.

Regression Terminology
• Y is the response variable and is assumed to be related to the k predictors (X1, X2, …, Xk) by a linear equation called the population regression model (use Greek letters for population parameters):
  Y = β0 + β1X1 + β2X2 + … + βkXk + ε
• The estimated (fitted) regression equation is (use Roman letters for sample estimates):
  ŷ = b0 + b1x1 + b2x2 + … + bkxk

Fitted Regression: Simple versus Multivariate
• Simple: ŷ = b0 + b1x. Multivariate: ŷ = b0 + b1x1 + b2x2 + … + bkxk.
• If we have more than two predictors, there is no way to visualize the fitted surface.

Data Format
• The data consist of n observed values of the response variable Y and of its proposed predictors X1, X2, …, Xk, presented as a column of n values of Y alongside an n × k matrix of predictor values.

Common Misconceptions about Fit
• A common mistake is to assume that the model with the best fit is preferred.
• Sometimes a model with a low R² gives useful predictions, while a model with a high R² conceals problems.
• Thoroughly analyze the results before choosing a model.

Four Criteria for Regression Assessment
• Logic: Is there an a priori reason to expect a causal relationship between the predictors and the response variable?
• Fit: Does the overall regression show a significant relationship between the predictors and the response variable?
• Parsimony: Does each predictor contribute significantly to the explanation? Are some predictors not worth the trouble?
• Stability: Are the predictors related to one another so strongly that the regression estimates become erratic?

Assessing Overall Fit

F Test for Significance
• For a regression with k predictors, the hypotheses to be tested are
  H0: all the true coefficients are zero
  H1: at least one of the coefficients is nonzero
• In other words,
  H0: β1 = β2 = … = βk = 0
  H1: at least one βj ≠ 0
• The ANOVA calculations for a k-predictor model resemble those for a simple regression, except for the degrees of freedom:
  Fcalc = MSR/MSE, where MSR = SSR/k and MSE = SSE/(n − k − 1),
  so the test has df1 = k and df2 = n − k − 1.

Coefficient of Determination (R²)
• R², the coefficient of determination, is a common measure of overall fit.
• It can be calculated in either of two ways (always done by computer):
  R² = SSR/SST = 1 − SSE/SST
• For example, the text applies both formulas to the home price data.

Adjusted R²
• It is generally possible to raise the coefficient of determination R² by including additional predictors, so the adjusted coefficient of determination penalizes the inclusion of useless predictors.
• For n observations and k predictors:
  R²adj = 1 − (1 − R²)(n − 1)/(n − k − 1)
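The fit statistics above come straight from the sums of squares. Below is a minimal Python sketch of the slide formulas on a small hypothetical dataset (not the textbook's home price data): it fits a two-predictor model by least squares and then computes R², adjusted R², and Fcalc.

```python
import numpy as np

# Hypothetical data: n = 12 observations, k = 2 predictors
rng = np.random.default_rng(42)
n, k = 12, 2
X = np.column_stack([np.ones(n),                 # intercept column
                     rng.uniform(1, 5, n),       # predictor X1
                     rng.uniform(10, 50, n)])    # predictor X2
y = 3 + 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 1, n)

# Least squares estimates b0, b1, b2
b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b

# Sums of squares
SSE = np.sum((y - y_hat) ** 2)         # error sum of squares
SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSR = SST - SSE                        # regression sum of squares

# Slide formulas
R2 = 1 - SSE / SST                               # = SSR / SST
R2_adj = 1 - (1 - R2) * (n - 1) / (n - k - 1)    # adjusted R²
F_calc = (SSR / k) / (SSE / (n - k - 1))         # MSR/MSE, df = (k, n-k-1)

print(f"R2 = {R2:.4f}, adjusted R2 = {R2_adj:.4f}, Fcalc = {F_calc:.2f}")
```

With simulated data that truly depend on both predictors, Fcalc will be large and R²adj will sit just below R², as the penalty term implies.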
How Many Predictors?
• Limit the number of predictors based on the sample size; a large sample size permits many predictors.
• When n/k is small, R² no longer gives a reliable indication of fit.
• Suggested rules:
  Evan's Rule (conservative): n/k ≥ 10 (at least 10 observations per predictor)
  Doane's Rule (relaxed): n/k ≥ 5 (at least 5 observations per predictor)
• These are just guidelines – use your judgment.

Predictor Significance
• Test each fitted coefficient to see whether it is significantly different from zero.
• The hypothesis tests for the coefficient of predictor Xj are
  H0: βj = 0 versus H1: βj ≠ 0
• If we cannot reject the hypothesis that a coefficient is zero, then the corresponding predictor does not contribute to the prediction of Y.

Test Statistic
• Excel reports the test statistic for the coefficient of predictor Xj:
  tcalc = bj/sbj with d.f. = n − k − 1
• Find the critical value tα for the chosen level of significance α from Appendix D, or from Excel using =T.INV.2T(α, d.f.) for a two-tailed test.
• To reject H0, compare tcalc with tα for the stated hypotheses (or reject if the p-value < α).
• The 95% confidence interval for the coefficient βj is bj ± t.025 sbj.

Confidence Intervals for Y

Standard Error
• The standard error of the regression (se) is another important measure of fit. Except for the degrees of freedom, the formula for se resembles the one for simple regression.
• For n observations and k predictors:
  se = √(SSE/(n − k − 1))
• If all predictions were perfect (SSE = 0), then se = 0.

Approximate Confidence and Prediction Intervals for Y
• Approximate 95% confidence interval for the conditional mean of Y:
  ŷ ± t.025 se/√n
• Approximate 95% prediction interval for an individual Y value:
  ŷ ± t.025 se

Quick 95 Percent Confidence and Prediction Intervals for Y
• The t-values for 95% confidence are typically near 2 (as long as n is not too small).
• Very quick confidence and prediction intervals for Y, without using a t table:
  confidence interval: ŷ ± 2se/√n
  prediction interval: ŷ ± 2se

ML 12.2

Unusual Observations

Standardized Residuals
• Use Excel, MINITAB, MegaStat, or other software to compute standardized residuals.
• If the absolute value of any standardized residual is at least 2, the observation is classified as unusual (as in simple regression).

Leverage and Influence
• A high leverage statistic indicates unusual X values in one or more predictors.
• Such observations are influential because they are near the edge(s) of the fitted regression plane.
• Leverage for observation i is denoted hi (computed by MegaStat).

Leverage
• For a regression model with k predictors, an observation whose leverage exceeds 2(k + 1)/n is unusual.
• In Chapter 12, the leverage rule was 4/n. With k = 1 predictor, we get 2(k + 1)/n = 2(1 + 1)/n = 4/n, so this leverage criterion includes simple regression as a special case.

Example: Heart Death Rate in 50 States
• n = 50 states, k = 3 predictors, standard error se = 27.422.
• Four states (FL, HI, OK, WV) have unusual residuals (> 2se), highlighted by MegaStat.
• The high leverage criterion is 2(k + 1)/n = 2(3 + 1)/50 = 0.160; MegaStat highlights the high leverage observations (> .160).
• Note: only the unusual observations are shown (there were n = 50 observations in all).
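The unusual-observation rules above can be checked without MegaStat. Here is a minimal Python sketch on hypothetical data (not the heart death rate file); note that packages differ slightly in how they define standardized residuals, and this sketch uses the common form eᵢ/(se·√(1 − hᵢ)).

```python
import numpy as np

# Hypothetical data: n = 50 observations, k = 3 predictors
rng = np.random.default_rng(7)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([10.0, 2.0, -1.0, 0.5]) + rng.normal(0, 5, n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
se = np.sqrt(resid @ resid / (n - k - 1))   # standard error of the regression

# Leverage h_i = i-th diagonal element of the hat matrix H = X(X'X)^-1 X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

# Standardized residuals: e_i / (se * sqrt(1 - h_i))
std_resid = resid / (se * np.sqrt(1 - h))

# Apply the slide criteria
print("unusual residuals (|std resid| >= 2):", np.where(np.abs(std_resid) >= 2)[0])
print("high leverage (h > 2(k+1)/n):", np.where(h > 2 * (k + 1) / n)[0])
```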
ML 12.3

Categorical Predictors

What Is a Binary or Categorical Predictor?
• A binary predictor has two values (usually 0 and 1) to denote the presence or absence of a condition.
• For example, for n graduates from an MBA program: Employed = 1, Unemployed = 0.
• These variables are also called dummy, dichotomous, or indicator variables.
• For easy interpretability, name the binary variable for the characteristic that corresponds to the value 1.

Effects of a Binary Predictor
• A binary predictor is sometimes called a shift variable because it shifts the regression plane up or down.
• Suppose X1 is a binary predictor that can take on only the values 0 or 1.
• Its contribution to the regression is either b1 or nothing, resulting in an intercept of either b0 (when X1 = 0) or b0 + b1 (when X1 = 1).
• The slope does not change; only the intercept is shifted.

Testing a Binary for Significance
• In multiple regression, binary predictors require no special treatment: they are tested like any other predictor, using a t test.

More Than One Binary
• More than one binary is needed when the number of categories to be coded exceeds two.
• For example, to code class level in a model of GPA, each category gets its own binary variable:
  Freshman = 1 if a freshman, 0 otherwise
  Sophomore = 1 if a sophomore, 0 otherwise
  Junior = 1 if a junior, 0 otherwise
  Senior = 1 if a senior, 0 otherwise
  Masters = 1 if a master's candidate, 0 otherwise
  Doctoral = 1 if a PhD candidate, 0 otherwise
• With c categories, include only c − 1 of these binaries in the model; the omitted category serves as the baseline.

What if I Forget to Exclude One Binary?
• Including binaries for all categories introduces a serious collinearity problem in the regression estimation. Collinearity occurs when there are redundant independent variables.
• When the value of one independent variable can be determined from the values of the other independent variables, one column in the X data matrix will be a perfect linear combination of the other column(s).
• Least squares estimation then fails because the data matrix is singular (i.e., has no inverse), as the sketch at the end of this section demonstrates.

Other Regression Problems
• Outliers? (omit only if clearly errors)
• Missing predictors? (usually you can't tell)
• Ill-conditioned data (adjust decimals or take logs)
• Significance in large samples? (if n is huge, almost any regression will be significant)
• Model specification errors? (may show up in residual patterns)
• Missing data? (we may have to live without it)
• Binary response? (if Y = 0, 1, we use logistic regression)
• Stepwise and best subsets regression (MegaStat does these)
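To make the singularity concrete, here is a minimal Python sketch (hypothetical data, dummies coded by hand): with an intercept plus a binary for every one of the c categories, the dummy columns sum to the intercept column, so X′X has no inverse; dropping one binary restores full rank.

```python
import numpy as np

# Hypothetical data: 6 observations in 3 categories (e.g., class levels)
levels = np.array([0, 0, 1, 1, 2, 2])
dummies = np.eye(3)[levels]          # one 0/1 column per category

X_bad = np.column_stack([np.ones(6), dummies])         # intercept + ALL 3 binaries
X_ok  = np.column_stack([np.ones(6), dummies[:, 1:]])  # drop one: it becomes the baseline

# The dummy columns of X_bad sum to the intercept column, so X'X is singular
print(np.linalg.matrix_rank(X_bad.T @ X_bad))  # 3, but X_bad has 4 columns -> no inverse
print(np.linalg.matrix_rank(X_ok.T @ X_ok))    # 3, equal to the column count -> invertible
```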