Part I – MULTIVARIATE ANALYSIS
C3 Multiple Linear Regression II
© Angel A. Juan & Carles Serrat - UPC 2007/2008

1.3.1: The General Linear Model

Model building is the process of developing an estimated regression equation that describes the relationship between a dependent variable and one or more predictors. Two major issues in model building are:

a) To find the proper functional form (linear, quadratic, etc.) of the relationship
b) To select the predictors to be included in the model

Suppose we have data for one response y and k predictors x1, x2, ..., xk. The General Linear Model (GLM) involving p terms is

  y = β0 + β1·z1 + β2·z2 + ... + βp·zp,  where zj = fj(x1, x2, ..., xk), j = 1, 2, ..., p

Note: Regression techniques are not limited to linear relationships. The word "linear" in the term GLM refers only to the fact that β0, β1, ..., βp all have exponents of one; it does not imply that the relationship between y and the xi's is linear.

Note: Multicollinearity in a regression model results from the common mistake of putting too many predictors into the model. Such a model is said to be "over-fit": inevitably many of the predictors will have effects that are so correlated that they cannot be separately estimated.

1.3.2: Modeling Curvilinear Relationships

Curvilinear relationships in regression analysis can be handled by using powers of the predictors.

File: SALESPEOPLE.MTW. We want to investigate the relationship between months of employment of the salespeople and the number of units sold. Although the output shows that a linear relationship explains a high percentage of the variability in sales, the standardized residual plot suggests that a curvilinear relationship is needed. After a squared term is added to the model, the standardized residual plot shows that the curvilinear pattern has been removed (a code sketch follows §1.3.3).

[Figures: standardized residuals versus fitted values (response is Sold) for the first-order and the second-order model; the curved pattern in the first plot disappears in the second.]

1.3.3: Interaction between Predictors

When interaction between two predictors is present, we cannot study the effect of one predictor on the response y independently of the other predictor.

File: SALESPRICE.MTW. We want to investigate the relationship between price, advertising expenditure and the number of units sold of a product. Note that, at higher selling prices, the effect of increased advertising expenditure diminishes. This fact suggests the existence of interaction between the price and advertising predictors.

[Figure: scatterplot of SALES vs PRICE, panel variable: ADVERTISING.]

We will use the created predictor price*advertising (P*A) to account for the effect of interaction. Since the p-value corresponding to the t test for P*A is 0.000, we conclude that the interaction is significant, i.e., the effect of advertising expenditure on sales depends on the price.
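For readers who want to reproduce the idea outside Minitab, here is a minimal Python sketch of the second-order fit of §1.3.2. The SALESPEOPLE.MTW worksheet is not reproduced in these notes, so the data below (and the names months and sold, and all numeric constants) are synthetic stand-ins, not the course data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Synthetic stand-in for SALESPEOPLE.MTW: months employed vs. units sold,
# generated with a genuinely curvilinear (quadratic) mean function.
months = rng.uniform(10, 100, 60)
sold = 80 + 9 * months - 0.05 * months**2 + rng.normal(0, 15, 60)

# First-order model: Sold = b0 + b1*Months
fit1 = sm.OLS(sold, sm.add_constant(months)).fit()

# Second-order model: Sold = b0 + b1*Months + b2*Months^2
X2 = sm.add_constant(np.column_stack([months, months**2]))
fit2 = sm.OLS(sold, X2).fit()

print(f"first-order  R-Sq(adj) = {fit1.rsquared_adj:.3f}")
print(f"second-order R-Sq(adj) = {fit2.rsquared_adj:.3f}")
# Plotting fit2's standardized residuals against its fitted values should
# show no systematic curvature, mirroring the Minitab plots above.
```

Note that the second-order model is still a general linear model in the sense of §1.3.1: the squared term is simply a created predictor zj, and every βj still enters with exponent one.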
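In the same spirit, a sketch of the interaction model of §1.3.3, again on synthetic stand-in data (for SALESPRICE.MTW) in which the effect of advertising is built to weaken as the price rises:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Synthetic stand-in for SALESPRICE.MTW: a negative price*advertising
# coefficient makes advertising less effective at higher prices.
price = rng.uniform(2.0, 3.0, 80)
adv = rng.uniform(50, 100, 80)
sales = 300 + 100 * adv - 50 * price - 30 * price * adv + rng.normal(0, 25, 80)

# Model with the created predictor P*A: Sales = b0 + b1*P + b2*A + b3*(P*A)
X = sm.add_constant(np.column_stack([price, adv, price * adv]))
fit = sm.OLS(sales, X).fit()

print(fit.params)   # order: const, P, A, P*A; the P*A coefficient is negative
print(fit.pvalues)  # a small p-value for P*A signals significant interaction
```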
1.3.4: Transformations of the Response

Often the problem of non-constant variance can be corrected by transforming the response variable to a different scale, using a logarithmic transformation, log(y), or a reciprocal transformation, 1/y.

File: KMWEIGHT.MTW. We want to investigate the relationship between km per liter and the vehicle weight. For the untransformed response, the standardized residual plot shows that the variability of the residuals appears to increase as the fitted value increases; furthermore, there is a large standardized residual. After the logarithmic transformation of the response, the wedge-shaped pattern has disappeared; moreover, there is no longer any large standardized residual (a code sketch follows §1.3.5).

[Figures: standardized residuals versus fitted values for response Km (wedge-shaped spread) and for response LogKm (constant spread).]

1.3.5: "Linearizing" Nonlinear Models

Models in which the β parameters have exponents other than one are called nonlinear models. In some cases we can easily transform the nonlinear model into a linear one, as in the case of the exponential model

  E[y] = β0·β1^x

Taking logarithms on both sides gives

  log E[y] = log β0 + x·log β1

so, setting y′ = log E[y], β0′ = log β0 and β1′ = log β1, we obtain the linear model y′ = β0′ + β1′·x.

[Figures: scatterplot of E[y] versus x (exponential curve) and of log E[y] versus x (straight line).]

Many nonlinear models cannot be transformed into an equivalent linear model; however, such models have had limited use in practical applications.
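As with the earlier sketches, the following Python fragment mimics §1.3.4 on synthetic stand-in data for KMWEIGHT.MTW; multiplicative errors produce the wedge-shaped residual pattern, and statsmodels' internally studentized residuals play the role of Minitab's standardized residuals:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Synthetic stand-in for KMWEIGHT.MTW: km per liter vs. vehicle weight,
# with error variance that grows with the mean (hence the wedge shape).
weight = rng.uniform(800, 2000, 70)
km = (40 - 0.015 * weight) * np.exp(rng.normal(0, 0.08, 70))

X = sm.add_constant(weight)
fit_raw = sm.OLS(km, X).fit()           # response on the original scale
fit_log = sm.OLS(np.log(km), X).fit()   # response on the log scale

# On the log scale the spread of the standardized residuals should look
# constant across the fitted values, as in the second plot above.
for name, fit in [("Km", fit_raw), ("LogKm", fit_log)]:
    r = fit.get_influence().resid_studentized_internal
    print(f"{name}: max |standardized residual| = {np.abs(r).max():.2f}")
```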
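And a sketch of the linearization of §1.3.5: regress log y on x, then exponentiate the fitted coefficients to recover β0 and β1. The "true" values b0_true and b1_true are assumptions used only to generate the data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Exponential model E[y] = b0 * b1^x with multiplicative error, so that
# log(y) = log(b0) + x*log(b1) + noise is an ordinary linear model.
b0_true, b1_true = 5.0, 1.8
x = rng.uniform(0, 8, 50)
y = b0_true * b1_true**x * np.exp(rng.normal(0, 0.1, 50))

fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()
log_b0, log_b1 = fit.params

# Back-transform the estimates from the log scale to the original scale.
print(f"b0 ≈ {np.exp(log_b0):.2f}  (true value {b0_true})")
print(f"b1 ≈ {np.exp(log_b1):.2f}  (true value {b1_true})")
```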
1.3.6: Model Building: Initial Steps

File: MODELBUILDING.MTW. We want to investigate the relationship between Sales and the eight predictors. The Correlation Matrix provides the sample correlation coefficient between each pair of variables and the p-value for the corresponding test of significant correlation. The best predictors for Sales seem to be Accts (R2 = 57%), Time, Poten and Adv. There is multicollinearity between Time and Accts; hence, if Accts were used as a predictor, Time would not add much more explanatory power to the model. A similar problem occurs between Change and Rating.

1.3.7: Model Building: Predictors Selection

Objective: to select those predictors that provide the "best" regression model. Alternative methods for selecting the predictors:

1. Stepwise (forward and backward) regression: At each step, a single variable can be added or deleted. The process continues until no more improvement can be obtained by adding or deleting a variable.
2. Forward selection: At each step, a single variable can be added, but variables are never deleted. The process continues until no more improvement can be obtained by adding a variable.
3. Backward elimination: The procedure starts with a model involving all the possible predictors. At each step a variable is deleted. The procedure stops when no more improvement can be obtained by deleting a variable.
4. Best-subsets regression: This is not a one-variable-at-a-time method. It evaluates regression models involving different subsets of the predictors.

Methods 1 to 3 are iterative, with a stopping criterion based on the F statistic. They provide a "good" regression model with few multicollinearity problems, but not necessarily the "best" model (the one with the highest R2). Best-subsets regression, in contrast, provides the "best" regression model for the given data.

Functions of the predictors (e.g., z = x1 * x2) can be used to create new predictors for use with any of the methods presented here.

1.3.8: Model Building: Stepwise Regression

File: MODELBUILDING.MTW. Stat > Regression > Stepwise...

At each step:
1. The already-in-the-model predictor with the highest non-significant p-value (p-value > α) is deleted from the model.
2. The not-already-in-the-model predictor with the lowest significant p-value (p-value <= α) enters the model.
3. P-values are recalculated.

In this example, Stepwise Regression takes 5 steps. At the end of the procedure, five predictors (Accts, Adv, Poten, Share and Change) have been selected for the regression model. Note that R-Sq(adj) = 88.94% for the last model. In Stepwise (Forward and Backward) Regression, a significance level of α = 0.15 is recommended both for adding variables to and for deleting variables from the model.

1.3.9: Model Building: Forward Selection

File: MODELBUILDING.MTW. Stat > Regression > Stepwise...

At each step:
1. The not-already-in-the-model predictor with the lowest significant p-value (p-value <= α) enters the model.
2. P-values are recalculated.

In this example, Forward Selection takes 6 steps. At the end of the procedure, six predictors (Accts, Adv, Poten, Share, Change and Time) have been selected for the regression model. Note that R-Sq(adj) = 89.38% for the last model. In Forward Selection, a significance level of α = 0.25 is recommended for adding new variables to the model (a Python sketch of this procedure is given after §1.3.11).

Mallows' Cp statistic represents a subsetting criterion to be used in selecting a reduced model without multicollinearity problems. A rule of thumb is to select a model in which the value of Cp is close to the number of terms in the model, including any constant term.

1.3.10: Model Building: Backward Elimination

File: MODELBUILDING.MTW. Stat > Regression > Stepwise...

At each step:
1. The already-in-the-model predictor with the highest non-significant p-value (p-value > α) is deleted from the model.
2. P-values are recalculated.

In this example, Backward Elimination takes 4 steps. At the end of the procedure, five predictors (Time, Poten, Adv, Share and Change) have been selected for the regression model. Note that R-Sq(adj) = 89.27% for the last model. In Backward Elimination, a significance level of α = 0.1 is recommended for deleting variables from the model.

1.3.11: Model Building: Best-Subsets Regression

File: MODELBUILDING.MTW. Stat > Regression > Best Subsets...

The Best-Subsets output identifies the two best one-predictor models, the two best two-predictor models, and so on; the criterion used to decide which models are best for any given number of predictors is the value of R2. In this example, the adjusted coefficient of determination is largest for the model with six predictors (Time, Poten, Adv, Share, Change and Accts). However, the best model with four independent variables (Poten, Adv, Share and Accts) has an adjusted coefficient of determination almost as high. All other things being equal, a simpler model with fewer variables is usually preferred.
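Minitab runs these procedures for us; purely as an illustration of the mechanics of §1.3.9, here is a minimal Python sketch of forward selection. The function name forward_selection and the demo data are assumptions for the example, and MODELBUILDING.MTW is not reproduced here:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection(X: pd.DataFrame, y, alpha=0.25):
    """At each step, add the not-yet-included predictor with the lowest
    p-value, provided that p-value is <= alpha; stop when none qualifies."""
    selected = []
    while True:
        candidates = [c for c in X.columns if c not in selected]
        if not candidates:
            break
        # p-value of each candidate when added to the current model
        pvals = {}
        for c in candidates:
            fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            pvals[c] = fit.pvalues[c]
        best = min(pvals, key=pvals.get)
        if pvals[best] > alpha:
            break
        selected.append(best)
    return selected

# Hypothetical demo data with two informative and two noise predictors.
rng = np.random.default_rng(5)
X = pd.DataFrame(rng.normal(size=(60, 4)),
                 columns=["Poten", "Adv", "Share", "Noise"])
y = 3 * X["Poten"] + 2 * X["Adv"] + rng.normal(size=60)
print(forward_selection(X, y))  # Poten and Adv enter first; with alpha=0.25
                                # a noise column may occasionally slip in
```

Backward elimination (§1.3.10) is the mirror image: start from the full model and repeatedly delete the included predictor with the highest p-value above α; full stepwise regression (§1.3.8) alternates the two moves.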
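Finally, a sketch of best-subsets regression (§1.3.11), reporting R2, adjusted R2 and Mallows' Cp for each subset, with Cp computed as SSEp/s² − (n − 2p), where s² is the full model's mean squared error and p counts the terms including the constant. Exhaustive search is only feasible for a modest number of predictors, since there are 2^k − 1 subsets to evaluate:

```python
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

def best_subsets(X: pd.DataFrame, y, n_best=2):
    """Evaluate every subset of the columns of X, Minitab-style: keep the
    n_best models (by R-Sq) for each number of predictors k."""
    full = sm.OLS(y, sm.add_constant(X)).fit()
    s2_full = full.ssr / full.df_resid  # MSE of the full model
    n = len(y)
    rows = []
    for k in range(1, len(X.columns) + 1):
        for subset in combinations(X.columns, k):
            fit = sm.OLS(y, sm.add_constant(X[list(subset)])).fit()
            p = k + 1                   # terms including the constant
            cp = fit.ssr / s2_full - (n - 2 * p)
            rows.append((k, subset, fit.rsquared, fit.rsquared_adj, cp))
    out = pd.DataFrame(rows, columns=["k", "vars", "RSq", "RSqAdj", "Cp"])
    return (out.sort_values("RSq", ascending=False)
               .groupby("k").head(n_best).sort_values("k"))

# Demo on hypothetical data; by the rule of thumb, look for Cp close to p.
rng = np.random.default_rng(6)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["Poten", "Adv", "Share"])
y = 4 * X["Poten"] + 2 * X["Adv"] + rng.normal(size=50)
print(best_subsets(X, y))
```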