Multiple Regression Model: Concepts & Applications

Ch. 14: The Multiple Regression Model

Building idea: examine the linear relationship between one dependent variable (Y) and two or more independent variables (X_i).

Multiple regression model with k independent variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

where β0 is the Y-intercept, β1, …, βk are the population slopes, and ε_i is the random error.

• The coefficients of the multiple regression model are estimated from sample data. With k independent variables, the estimated (predicted) value of Y is

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$$

where b0 is the estimated intercept and b1, …, bk are the estimated slope coefficients.
• Interpretation of the slopes (each is referred to as a net regression coefficient):
  – b1 = the change in the mean of Y per unit change in X1, holding X2 constant (that is, net of the effect of X2)
  – b0 = the Y-intercept, interpreted the same way as in simple regression

Graph of a Two-Variable Model
• With two X variables, the fitted equation $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$ is a plane in three dimensions.
[Figure: fitted regression plane plotted against axes Y, X1, and X2]

Example: appraised value regressed on lot size alone, then on lot size and number of rooms

Simple regression results:

                   Coefficient    Standard Error    t Stat
  Intercept (b0)   165.0333581    16.50316094       10.000106
  Lotsize (b1)     6.931792143    2.203156234       3.1463008

  F-value = 9.89    Adjusted R Square = 0.108    Standard Error = 36.34

Multiple regression results:

                   Coefficient    Standard Error    t Stat
  Intercept        59.32299284    20.20765695       2.935669
  Lotsize          3.580936283    1.794731507       1.995249
  Rooms            18.25064446    2.681400117       6.806386

  F-value = 31.23    Adjusted R Square = 0.453    Standard Error = 28.47

• Compare the size and significance of the coefficients, the F-value, and adjusted R² across the two models to see the “net of” effects. Once Rooms is added, the Lotsize coefficient drops from about 6.93 to about 3.58 and its t statistic drops from 3.15 to 2.00, while adjusted R² rises from 0.108 to 0.453.

Using the Equation to Make Predictions
• Predict the appraised value at the average lot size (7.24) and the average number of rooms (7.12). Appraised values are in thousands of dollars:
  Appraised value = 59.32 + 3.58(7.24) + 18.25(7.12) = 215.18, or $215,180
• What is the total effect of a 2,000 sq ft increase in lot size and 2 additional rooms? Lot size is measured in thousands of square feet, so a 2,000 sq ft increase is 2 units:
  Increase in appraised value = (3.58)(2) + (18.25)(2) = 43.66, or $43,660

Coefficient of Multiple Determination, r² and Adjusted r²
• r² reports the proportion of the total variation in Y explained by all X variables taken together (the model):

$$r^2_{Y.12\ldots k} = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$

• r² never decreases when a new X variable is added to the model
  – This can be a disadvantage when comparing models
• What is the net effect of adding a new variable?
  – We lose a degree of freedom when a new X variable is added
  – Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
• Adjusted r² shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:

$$r^2_{adj} = 1 - \left(1 - r^2_{Y.12\ldots k}\right)\frac{n-1}{n-k-1}$$

  (where n = sample size, k = number of independent variables)
  – Penalizes excessive use of unimportant independent variables
  – Is never larger than r² (it can even be negative)
  – Is useful for comparing models

Multiple Regression Assumptions
• The errors are normally distributed
• The errors have a constant variance
• The model errors are independent
• Errors (residuals) from the regression model: $e_i = Y_i - \hat{Y}_i$
• These residual plots are used in multiple regression:
  – Residuals vs. $\hat{Y}_i$
  – Residuals vs. X1i
  – Residuals vs. X2i
  – Residuals vs. time (if time-series data)
[Figure: two-variable model showing a sample observation Y_i above the fitted plane $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$ at (x1i, x2i); the vertical distance between them is the residual $e_i = Y_i - \hat{Y}_i$]
The best-fit equation, $\hat{Y}$, is found by minimizing the sum of squared errors, $\sum e_i^2$.

Are Individual Variables Significant?
• Use t tests of the individual variable slopes
• These show whether there is a linear relationship between the variable X_i and Y. Hypotheses:
  H0: βi = 0 (no linear relationship)
  H1: βi ≠ 0 (a linear relationship does exist between Xi and Y)
• Test statistic, with n − k − 1 degrees of freedom:

$$t_{n-k-1} = \frac{b_i - 0}{S_{b_i}}$$

• Confidence interval for the population slope βi, where t is the critical value for the chosen confidence level:

$$b_i \pm t_{n-k-1}\, S_{b_i}$$

Is the Overall Model Significant?
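Before taking up that question, you can check the prediction arithmetic from the appraisal example in a few lines of Python. This minimal sketch hard-codes the rounded coefficients from the multiple regression table, with value in $1,000s and lot size in 1,000 sq ft. The function names `predict_value` and `change_in_value` are hypothetical and chosen only for illustration.

```python
# Rounded coefficients from the multiple regression results above
# (appraised value in $1,000s; lot size in 1,000 sq ft).
B0, B_LOT, B_ROOMS = 59.32, 3.58, 18.25


def predict_value(lot_size: float, rooms: float) -> float:
    """Predicted appraised value (in $1,000s) from the fitted equation."""
    return B0 + B_LOT * lot_size + B_ROOMS * rooms


def change_in_value(d_lot: float, d_rooms: float) -> float:
    """Change in predicted value for given changes in lot size and rooms.

    The intercept cancels when two predictions are subtracted, so the change
    is the sum of each slope times the change in its variable.
    """
    return B_LOT * d_lot + B_ROOMS * d_rooms


if __name__ == "__main__":
    print(round(predict_value(7.24, 7.12), 2))  # 215.18 -> $215,180
    print(round(change_in_value(2, 2), 2))      # 43.66  -> $43,660
```

Note that the 2,000 sq ft increase enters as `d_lot=2`, because lot size is measured in thousands of square feet.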
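You can compute the least-squares fit, individual t tests, slope confidence intervals, r², and adjusted r² from the earlier sections with NumPy and SciPy. This sketch assumes the X matrix excludes the intercept column, which is added internally. `fit_ols` is a hypothetical helper name. The chapter's appraisal data set is not reproduced here, so the usage block runs on simulated data, and its output will not match the tables above.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y, alpha=0.05):
    """Least-squares multiple regression of y on the columns of X plus an intercept.

    X: (n, k) array of independent variables (no intercept column); y: (n,) array.
    Returns b, S_b, t statistics, p-values, (1 - alpha) confidence intervals,
    residuals, fitted values, sums of squares, r2 and adjusted r2.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend the intercept column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)      # minimizes the sum of e_i^2
    fitted = A @ b
    resid = y - fitted                             # e_i = Y_i - Yhat_i
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    ssr = sst - sse
    df = n - k - 1
    mse = sse / df
    se = np.sqrt(np.diag(mse * np.linalg.inv(A.T @ A)))  # S_{b_i}
    t = b / se                                     # (b_i - 0) / S_{b_i}
    p = 2 * stats.t.sf(np.abs(t), df)              # two-sided p-values
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    r2 = ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / df
    return {
        "b": b, "se": se, "t": t, "p": p,
        "ci": np.column_stack([b - t_crit * se, b + t_crit * se]),
        "resid": resid, "fitted": fitted,
        "ssr": ssr, "sse": sse, "sst": sst, "r2": r2, "r2_adj": r2_adj,
    }


if __name__ == "__main__":
    # Simulated, hypothetical data; NOT the chapter's appraisal data.
    rng = np.random.default_rng(0)
    n = 60
    lot = rng.uniform(4, 11, n)        # lot size, 1,000 sq ft
    rooms = rng.integers(5, 10, n)     # number of rooms
    value = 60 + 3.5 * lot + 18 * rooms + rng.normal(0, 25, n)
    res = fit_ols(np.column_stack([lot, rooms]), value)
    for name, b, se, t in zip(["Intercept", "Lotsize", "Rooms"],
                              res["b"], res["se"], res["t"]):
        print(f"{name:10s} b = {b:8.3f}  S_b = {se:7.3f}  t = {t:6.2f}")
    print(f"r2 = {res['r2']:.3f}, adjusted r2 = {res['r2_adj']:.3f}")
```

The returned residuals are the quantities plotted in the residual checks listed under the assumptions. Adjusted r² is computed from r² with the (n − 1)/(n − k − 1) penalty, so it can never exceed r².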
• F-Test for Overall Significance of the Model
• Shows whether there is a linear relationship between all of the X variables considered together and Y
• Use the F test statistic. Hypotheses:
  H0: β1 = β2 = … = βk = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)
• Test statistic, with k and n − k − 1 degrees of freedom:

$$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$$

Testing Portions of the Multiple Regression Model
• Used to find out whether including an individual Xj, or a set of Xs, significantly improves the model, given that the other independent variables are already included
• Two measures:
  1. Partial F-test criterion
  2. The coefficient of partial determination

Contribution of a Single Independent Variable Xj

$$SSR(X_j \mid \text{all variables except } X_j) = SSR(\text{all variables}) - SSR(\text{all variables except } X_j)$$

• Measures the contribution of Xj in explaining the total variation in Y (SST)
• For example, in a 3-variable model:

$$SSR(X_1 \mid X_2, X_3) = \underbrace{SSR(X_1, X_2, X_3)}_{SSR_{UR}} - \underbrace{SSR(X_2, X_3)}_{SSR_{R}}$$

  where UR denotes the unrestricted (full) model and R the restricted model.

The Partial F-Test Statistic
• Consider the hypothesis test:
  H0: variable Xj does not significantly improve the model after all other variables are included
  H1: variable Xj significantly improves the model after all other variables are included

$$F = \frac{(SSR_{UR} - SSR_R)/q}{MSE_{UR}} = \frac{(SSR_{UR} - SSR_R)/q}{SSE_{UR}/(n-k-1)}$$

  where q, the numerator degrees of freedom, is the number of restrictions (the number of variables being tested, so q = 1 for a single Xj).
Note that the numerator is the contribution of Xj to the regression.
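The overall F statistic, the partial F statistic, and the coefficient of partial determination all come from the same sums of squares. The regression is fit once for the unrestricted (full) model and once for the restricted model. This minimal NumPy/SciPy sketch assumes an intercept is always included. The helper names are hypothetical, and the usage block runs on simulated data.

```python
import numpy as np
from scipy import stats


def _as_2d(X):
    """Treat a 1-D array as a single column."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def sums_of_squares(X, y):
    """Return (SSR, SSE, SST) for a least-squares fit of y on X plus an intercept."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), _as_2d(X)])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ b
    sse = e @ e
    sst = np.sum((y - y.mean()) ** 2)
    return sst - sse, sse, sst


def overall_f_test(X, y):
    """Overall F test: F = MSR/MSE = (SSR/k) / (SSE/(n - k - 1))."""
    X = _as_2d(X)
    n, k = X.shape
    ssr, sse, _ = sums_of_squares(X, y)
    f = (ssr / k) / (sse / (n - k - 1))
    return f, stats.f.sf(f, k, n - k - 1)          # statistic, p-value


def partial_f_test(X, tested, y, alpha=0.05):
    """Partial F test for the columns listed in `tested`, given all other columns.

    Returns (F, p-value, coefficient of partial determination, reject H0?).
    """
    X = _as_2d(X)
    n, k = X.shape
    q = len(tested)                                  # number of restrictions
    others = [j for j in range(k) if j not in tested]
    ssr_ur, sse_ur, sst = sums_of_squares(X, y)      # unrestricted (full) model
    ssr_r, _, _ = sums_of_squares(X[:, others], y)   # restricted model
    df_err = n - k - 1
    f = ((ssr_ur - ssr_r) / q) / (sse_ur / df_err)
    p = stats.f.sf(f, q, df_err)
    r2_partial = (ssr_ur - ssr_r) / (sst - ssr_r)
    reject = f > stats.f.ppf(1 - alpha, q, df_err)   # actual F > critical F
    return f, p, r2_partial, reject


if __name__ == "__main__":
    # Simulated, hypothetical data with three independent variables.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(80, 3))
    y = 5 + 2 * X[:, 0] + 1 * X[:, 1] + rng.normal(size=80)  # X3 has no effect
    print("overall F, p:", overall_f_test(X, y))
    # Does X1 improve the model given X2 and X3?  Uses SSR(X1 | X2, X3).
    print("partial F for X1:", partial_f_test(X, [0], y))
```

For a single variable (q = 1), the partial F statistic equals the square of that variable's t statistic. The partial F test and the t test therefore reach the same conclusion.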
If the actual F statistic is greater than the critical F, the conclusion is: reject H0. Adding Xj (X1 in the 3-variable example) does improve the model.

Coefficient of Partial Determination for One or a Set of Variables
• Measures the proportion of the variation in Y left unexplained by the other explanatory variables that is explained by Xj, while controlling for (holding constant) those other variables:

$$r^2_{Yj.(\text{all variables except } j)} = \frac{SSR_{UR} - SSR_R}{SST - SSR_R}$$

  (SST is the same for both models, so SST − SSR_R is the error sum of squares of the restricted model.)

Using Dummy Variables
• A dummy variable is a categorical explanatory variable with two levels:
  – yes or no, on or off, male or female
  – coded as 0 or 1
• The regression intercepts differ between the two groups if the dummy variable is significant
• Assumes equal slopes for the other variables
• If a categorical variable has more than two levels, the number of dummy variables needed is (number of levels − 1)
• Different intercepts, same slope (X2 = 1 if the house has a fireplace, 0 otherwise):
  Fire Place: $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$
  No Fire Place: $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$
[Figure: parallel fitted lines of Y (sales) against X1, with intercept b0 + b2 for Fire Place and b0 for No Fire Place]
• If H0: β2 = 0 is rejected, then “Fire Place” has a significant effect on values.

Interaction Between Explanatory Variables
• Hypothesizes interaction between pairs of X variables
  – The response to one X variable may vary at different levels of another X variable
• Contains two-way cross-product terms, with X3 = X1X2:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 = b_0 + b_1 X_1 + b_2 X_2 + b_3 (X_1 X_2)$$

• Effect of interaction:
  – Without an interaction term, the effect of X1 on Y is measured by β1
  – With an interaction term, the effect of X1 on Y is measured by β1 + β3X2
  – The effect changes as X2 changes
• Example: suppose X2 is a dummy variable and the estimated regression equation is $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1X_2$
  X2 = 1: $\hat{Y} = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
  X2 = 0: $\hat{Y} = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two fitted lines plotted for X1 from 0 to 1.5]
• The slopes differ when the effect of X1 on Y depends on the value of X2.
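The rule that a categorical variable with L levels needs L − 1 dummy variables can be shown with a small coding helper. The level left out (the reference level) is absorbed into the intercept b0. Each dummy's coefficient then shifts the intercept for its own level, as in the Fire Place example. `dummy_code` and the heating-type data are hypothetical and used only for illustration.

```python
import numpy as np


def dummy_code(values, reference):
    """Code a categorical variable as 0/1 dummy columns.

    One column is created for every level except `reference`, so a variable
    with L levels yields L - 1 dummies. Returns (levels, matrix).
    """
    values = np.asarray(values)
    levels = sorted(set(values.tolist()) - {reference})
    if not levels:
        raise ValueError("need at least one level other than the reference")
    matrix = np.column_stack([(values == lev).astype(int) for lev in levels])
    return levels, matrix


if __name__ == "__main__":
    # Two levels (hypothetical fireplace data): one dummy, 1 = fireplace.
    print(dummy_code(["yes", "no", "yes"], reference="no"))
    # Three levels (hypothetical heating types): two dummies.
    levels, m = dummy_code(["gas", "electric", "oil", "gas"], reference="gas")
    print(levels)      # ['electric', 'oil']
    print(m.tolist())  # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Leaving one level out avoids perfect collinearity with the intercept column. If all L dummies were included, they would always sum to 1.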
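In the interaction example, holding X2 fixed turns $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1X_2$ into a straight line in X1. That line has intercept $b_0 + b_2X_2$ and slope $b_1 + b_3X_2$. This short sketch evaluates both lines for the dummy X2. The function name `line_at` is hypothetical.

```python
def line_at(x2, b0=1.0, b1=2.0, b2=3.0, b3=4.0):
    """Intercept and slope in X1 of Yhat = b0 + b1*X1 + b2*X2 + b3*X1*X2 at fixed X2.

    Defaults are the coefficients of the example equation in the text.
    """
    intercept = b0 + b2 * x2   # X2 shifts the intercept
    slope = b1 + b3 * x2       # interaction: the effect of X1 depends on X2
    return intercept, slope


if __name__ == "__main__":
    for x2 in (0, 1):
        a, s = line_at(x2)
        print(f"X2 = {x2}: Yhat = {a:g} + {s:g} X1")
    # X2 = 0: Yhat = 1 + 2 X1
    # X2 = 1: Yhat = 4 + 6 X1
```

With b3 = 0 the two slopes would be equal, which gives the "different intercepts, same slope" dummy-variable model.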
