P1-3: Multiple Regression II

Part I – MULTIVARIATE ANALYSIS
C3 Multiple Linear Regression II
© Angel A. Juan & Carles Serrat - UPC 2007/2008
1.3.1: The General Linear Model

Model building is the process of developing an
estimated regression equation that describes the
relationship between a dependent variable and one
or more predictors.

Two major issues in model building are:
a) To find the proper functional form (linear, quadratic, etc.) of the relationship
b) To select the predictors to be included in the model
Suppose we have data for one response y and k
predictors x1, x2, …, xk. The General Linear Model
(GLM) involving p predictors is:
$$y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \dots + \beta_p z_p + \varepsilon$$
where $z_j = f(x_1, x_2, \dots, x_k)$ for $j = 1, 2, \dots, p$.

Note: Multicollinearity in a regression model results from the common mistake of putting too many predictors into the model. Such a model is said to be "over-fit": inevitably, many of the predictors will have effects that are too correlated and cannot be separately estimated.

Note: Regression techniques are not limited to linear relationships. The word "linear" in the term GLM refers only to the fact that β0, β1, …, βp all have exponents of one; it does not imply that the relationship between y and the xi's is linear.
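To make this concrete, here is a minimal Python sketch (statsmodels, with made-up data; all names are illustrative) of fitting a GLM whose z-terms are functions of the raw predictors:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: one response y, two raw predictors x1 and x2
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 5, 50)
y = 3 + 2 * x1 - 0.5 * x1**2 + x1 * x2 + rng.normal(0, 1, 50)

# z_j = f(x1, x2): here z1 = x1, z2 = x1^2, z3 = x1*x2.
# The model is linear in the betas even though it is curvilinear in the x's.
Z = sm.add_constant(np.column_stack([x1, x1**2, x1 * x2]))

fit = sm.OLS(y, Z).fit()
print(fit.params)    # estimates of beta_0, beta_1, ..., beta_p
```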
1.3.2: Modeling Curvilinear Relationships

Curvilinear relationships in regression analysis can be handled by using powers of the predictors.

File: SALESPEOPLE.MTW
We want to investigate the relationship between months of employment of the salespeople and the number of units sold.
Although the output shows that a linear relationship explains a high percentage of the variability in sales, the standardized residual plot suggests that a curvilinear relationship is needed.
[Figure: Residuals versus the fitted values (response is Sold) for the linear model and for the curvilinear model; x-axis: Fitted Value, y-axis: Standardized Residual.]
The second standardized residual plot shows that the previous curvilinear pattern has been removed.
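A minimal Python sketch of the same approach, with made-up data and assumed column names (Months, Sold) standing in for SALESPEOPLE.MTW:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data in the spirit of SALESPEOPLE.MTW (column names are assumptions)
df = pd.DataFrame({
    "Months": [41, 106, 76, 10, 22, 12, 85, 111, 40, 51, 9, 12, 6, 56, 19],
    "Sold":   [275, 296, 317, 110, 215, 182, 366, 314, 286, 275, 83, 115, 57, 270, 185],
})

linear    = smf.ols("Sold ~ Months", data=df).fit()
quadratic = smf.ols("Sold ~ Months + I(Months**2)", data=df).fit()

# If the linear fit leaves a curved residual pattern, the squared term
# should show up as significant in the quadratic fit.
print(linear.rsquared, quadratic.rsquared)
print(quadratic.pvalues)
```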
1.3.3: Interaction between Predictors

When interaction between two predictors
is present, we cannot study the effect of
one predictor on the response y
independently of the other predictor.

File: SALESPRICE.MTW
We want to investigate the relationship between price, advertising expenditure and the number of units sold of a product.

[Figure: Scatterplot of SALES vs PRICE; panel variable: ADVERTISING.]

Note that, at higher selling prices, the effect of increased advertising expenditure diminishes. This suggests the existence of an interaction between the price and advertising predictors.

We will use the predictor price*advertising (P*A) to account for the effect of interaction. Since the p-value corresponding to the t test for P*A is 0.000, we conclude that the interaction is significant, i.e., the effect of advertising expenditure on sales depends on the price.
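A minimal Python sketch of the interaction model, with made-up data and assumed column names (Price, Advertising, Sales) standing in for SALESPRICE.MTW:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data in the spirit of SALESPRICE.MTW (column names are assumptions)
rng = np.random.default_rng(3)
price = rng.uniform(2.0, 3.0, 60)
adv = rng.choice([50.0, 100.0], 60)
sales = 1000 - 200 * price + 5 * adv - 1.5 * price * adv + rng.normal(0, 20, 60)
df = pd.DataFrame({"Price": price, "Advertising": adv, "Sales": sales})

# Price:Advertising adds the interaction term on top of the two main effects
fit = smf.ols("Sales ~ Price + Advertising + Price:Advertising", data=df).fit()

# A small p-value for the interaction term means the effect of advertising
# expenditure on sales depends on the selling price.
print(fit.pvalues["Price:Advertising"])
```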
1.3.4: Transformations of the Response

Often the problem of non-constant variance can be corrected by transforming the response variable to a different scale, using a logarithmic transformation, Log(y), or a reciprocal transformation, 1/y.
File: KMWEIGHT.MTW
We want to investigate the relationship between km per liter and the vehicle weight.
This standardized residual plot shows that the variability in the residuals appears to increase as the fitted value increases. Furthermore, there is a large standardized residual.
[Figure: Residuals versus the fitted values for the original response (Km) and for the transformed response (LogKm); x-axis: Fitted Value, y-axis: Standardized Residual.]
After the transformation, the wedge-shaped pattern has disappeared. Moreover, there is no longer any large standardized residual.
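A minimal Python sketch of the response transformation, with made-up data and assumed column names (Weight, Km) standing in for KMWEIGHT.MTW:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data in the spirit of KMWEIGHT.MTW (column names are assumptions)
rng = np.random.default_rng(4)
weight = rng.uniform(800, 2000, 50)                  # vehicle weight
km = 40000 / weight * rng.lognormal(0, 0.08, 50)     # km per liter
df = pd.DataFrame({"Weight": weight, "Km": km})

raw = smf.ols("Km ~ Weight", data=df).fit()
logfit = smf.ols("np.log10(Km) ~ Weight", data=df).fit()

# Crude heteroscedasticity check: correlation between the absolute
# standardized residuals and the fitted values. It should be clearly
# positive for the raw fit and near zero after the transformation.
for fit in (raw, logfit):
    sr = fit.get_influence().resid_studentized_internal
    print(np.corrcoef(np.abs(sr), fit.fittedvalues)[0, 1])
```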
1.3.5: “Linearizing” Nonlinear Models

Models in which the β parameters
have exponents other than one are
called nonlinear models.
In some cases we can easily
transform the nonlinear model into a
linear one, as in the case of the
exponential model:
$$E[y] = \beta_0 \beta_1^x$$

Taking logarithms on both sides:

$$\log E[y] = \log \beta_0 + x \log \beta_1$$

With $y' = \log E[y]$, $\beta_0' = \log \beta_0$ and $\beta_1' = \log \beta_1$, this becomes the linear model:

$$y' = \beta_0' + \beta_1' x$$

[Figure: Scatterplot of E[y] vs β0·β1^x (exponential curve) and scatterplot of Log(E[y]) vs Log(β0) + x·Log(β1) (straight line).]

Many nonlinear models cannot be transformed into an equivalent linear model. However, such models have had limited use in practical applications.
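A minimal Python sketch of this linearization, with made-up data generated from an exponential model (the true parameter values 2.0 and 1.8 are arbitrary choices):

```python
import numpy as np
import statsmodels.api as sm

# Made-up data following the exponential model E[y] = b0 * b1**x
rng = np.random.default_rng(7)
x = np.linspace(0, 9, 40)
y = 2.0 * 1.8**x * rng.lognormal(0.0, 0.1, x.size)

# Fit the linearized model: log(y) = log(b0) + x * log(b1)
fit = sm.OLS(np.log(y), sm.add_constant(x)).fit()

b0 = np.exp(fit.params[0])   # back-transformed intercept, close to 2.0
b1 = np.exp(fit.params[1])   # back-transformed slope, close to 1.8
print(b0, b1)
```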
1.3.6: Model Building: Initial Steps

File: MODELBUILDING.MTW
We want to investigate the relationship
between Sales and the eight predictors.
The correlation matrix provides the sample correlation coefficient between each pair of variables, together with the p-value of the corresponding test for significant correlation.
The best predictors for Sales seem to be Accts (R² = 57%), Time, Poten and Adv. There is multicollinearity between Time and Accts; hence, if Accts were used as a predictor, Time would not add much more explanatory power to the model. A similar problem occurs between Change and Rating.
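A minimal Python sketch of computing a correlation matrix and one significance test; the data are made up, simulating only Sales, Accts and Time out of the variables mentioned above:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up stand-in for MODELBUILDING.MTW with three of its variables
rng = np.random.default_rng(6)
accts = rng.uniform(50, 200, 25)
time = 0.5 * accts + rng.normal(0, 10, 25)     # deliberately correlated with Accts
sales = 20 * accts + rng.normal(0, 300, 25)
df = pd.DataFrame({"Sales": sales, "Accts": accts, "Time": time})

print(df.corr())                               # sample correlation matrix

# Significance test for one pair, e.g. Sales vs Accts
r, p = stats.pearsonr(df["Sales"], df["Accts"])
print(r, r**2, p)                              # r, R-squared, and the p-value
```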
1.3.7: Model Building: Predictors Selection

Objective: to select those predictors that provide the
“best” regression model.

Alternative methods for selecting the predictors:
1. Stepwise (forward and backward) regression: At each step, a single variable can be added or deleted. The process continues until no more improvement can be obtained by adding or deleting a variable.
2. Forward selection: At each step, a single variable can be added, but variables are never deleted. The process continues until no more improvement can be obtained by adding a variable.
3. Backward elimination: The procedure starts with a model involving all the possible predictors. At each step a variable is deleted. The procedure stops when no more improvement can be obtained by deleting a variable.
4. Best-subsets regression: This is not a one-variable-at-a-time method. It evaluates regression models involving different subsets of the predictors.
Functions of the predictors (e.g., z = x1 * x2) can be used to create new predictors for use with any of the methods presented here.

Methods 1–3 are iterative, and their stopping criterion is based on the F statistic. They provide a "good" regression model with few multicollinearity problems, but not necessarily the "best" model (the one with the highest R²). Best-subsets regression, by contrast, provides the "best" regression model for the given data.
1.3.8: Model Building: Stepwise Regression

File: MODELBUILDING.MTW

Stat > Regression > Stepwise...
At each step:
1. The already-in-the-model predictor with the highest non-significant p-value (p-value > α) is deleted from the model.
2. The not-yet-in-the-model predictor with the lowest significant p-value (p-value ≤ α) enters the model.
3. P-values are recalculated.
In this example, stepwise regression takes 5 steps. At the end of the procedure, five predictors (Accts, Adv, Poten, Share and Change) have been selected for the regression model. Note that R-Sq(adj) = 88.94% for the last model.
In stepwise (forward and backward) regression, a significance level of α = 0.15 is recommended both for adding variables to and for deleting variables from the model.
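The following Python sketch implements the same idea on p-values; it is a naive illustration of the procedure, not Minitab's exact algorithm (a real implementation would also guard against add/delete cycling):

```python
import pandas as pd
import statsmodels.api as sm

def stepwise(df: pd.DataFrame, response: str, alpha: float = 0.15) -> list:
    """Naive stepwise selection: alternate forward and backward steps on
    p-values until neither changes the model."""
    selected = []
    candidates = [c for c in df.columns if c != response]
    while True:
        changed = False
        # Forward step: add the candidate with the lowest significant p-value
        pvals = {}
        for c in candidates:
            X = sm.add_constant(df[selected + [c]])
            pvals[c] = sm.OLS(df[response], X).fit().pvalues[c]
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] <= alpha:
                selected.append(best)
                candidates.remove(best)
                changed = True
        # Backward step: delete the selected predictor with the highest
        # non-significant p-value
        if selected:
            X = sm.add_constant(df[selected])
            p = sm.OLS(df[response], X).fit().pvalues.drop("const")
            worst = p.idxmax()
            if p[worst] > alpha:
                selected.remove(worst)
                candidates.append(worst)
                changed = True
        if not changed:
            return selected
```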
1.3.9: Model Building: Forward Selection

File: MODELBUILDING.MTW

Stat > Regression > Stepwise...
At each step:
1. The not-yet-in-the-model predictor with the lowest significant p-value (p-value ≤ α) enters the model.
2. P-values are recalculated.
In this example, forward selection takes 6 steps. At the end of the procedure, six predictors (Accts, Adv, Poten, Share, Change and Time) have been selected for the regression model. Note that R-Sq(adj) = 89.38% for the last model.
Mallows' Cp statistic is a subsetting criterion that can be used to select a reduced model without multicollinearity problems. A rule of thumb is to select a model in which the value of Cp is close to the number of terms in the model, including any constant term.
In forward selection, a significance level of α = 0.25 is recommended for adding new variables to the model.
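A Python sketch of computing Mallows' Cp for a candidate subset, using the usual definition Cp = SSE_subset / MSE_full − (n − 2p), where p counts the terms including the constant; this is an illustration, not Minitab's implementation:

```python
import pandas as pd
import statsmodels.api as sm

def mallows_cp(df: pd.DataFrame, response: str,
               subset: list, all_predictors: list) -> float:
    """Mallows' Cp of a candidate subset relative to the full model.
    A good subset has Cp close to p (terms including the constant)."""
    y = df[response]
    n = len(y)
    full = sm.OLS(y, sm.add_constant(df[all_predictors])).fit()
    sub = sm.OLS(y, sm.add_constant(df[subset])).fit()
    p = len(subset) + 1
    return sub.ssr / full.mse_resid - (n - 2 * p)
```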
1.3.10: Model Building: Backward Elimination

File: MODELBUILDING.MTW

Stat > Regression > Stepwise...
At each step:
1. The already-in-the-model predictor with the highest non-significant p-value (p-value > α) is deleted from the model.
2. P-values are recalculated.
In this example, backward elimination takes 4 steps. At the end of the procedure, five predictors (Time, Poten, Adv, Share and Change) have been selected for the regression model. Note that R-Sq(adj) = 89.27% for the last model.
In backward elimination, a significance level of α = 0.10 is recommended for deleting variables from the model.
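A Python sketch of backward elimination on p-values (an illustration of the procedure, not Minitab's exact algorithm):

```python
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(df: pd.DataFrame, response: str,
                       alpha: float = 0.10) -> list:
    """Start from the model with all predictors and repeatedly delete the
    predictor with the highest non-significant p-value."""
    selected = [c for c in df.columns if c != response]
    while selected:
        X = sm.add_constant(df[selected])
        pvals = sm.OLS(df[response], X).fit().pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:    # every predictor is significant: stop
            break
        selected.remove(worst)
    return selected
```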
1.3.11: Model Building: Best-Subsets Regression

File: MODELBUILDING.MTW

Stat > Regression > Best Subsets...
In this example, the adjusted coefficient of determination is largest for the model with six predictors (Time, Poten, Adv, Share, Change and Accts). However, the best model with four independent variables (Poten, Adv, Share and Accts) has an adjusted coefficient of determination that is almost as high.
The Best Subsets output identifies the two best one-predictor models, the two best two-predictor models, and so on. The criterion used to decide which models are best for any given number of predictors is the value of R².
All other things being equal, a
simpler model with fewer
variables is usually preferred.
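A Python sketch of an exhaustive best-subsets search ranked by R², in the spirit of Minitab's Best Subsets output (illustrative, and feasible only for a moderate number of predictors):

```python
from itertools import combinations

import pandas as pd
import statsmodels.api as sm

def best_subsets(df: pd.DataFrame, response: str, top: int = 2) -> None:
    """For each subset size, print the `top` models with the highest R^2,
    along with their adjusted R^2."""
    predictors = [c for c in df.columns if c != response]
    y = df[response]
    for k in range(1, len(predictors) + 1):
        fits = []
        for subset in combinations(predictors, k):
            X = sm.add_constant(df[list(subset)])
            m = sm.OLS(y, X).fit()
            fits.append((m.rsquared, m.rsquared_adj, subset))
        for r2, r2_adj, subset in sorted(fits, reverse=True)[:top]:
            print(k, subset, round(r2, 3), round(r2_adj, 3))
```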