Multiple Regression (I)

6.1 Multiple Regression Models
Let Y = (Y1 , · · · , Yn ) be the response (or dependent) variable. Let Xj = (X1j , · · · , Xnj ) for j = 1, · · · , p − 1 be the predictor (or covariate, independent) variables. In a multiple regression model, we assume:
• The relation between response and predictor variables is linear.
• Predictor variables are not random.
• Response variable is random.
• The error terms are independent and identically normally distributed (iid).
• The expected value of the error term is 0.
The model can be written as
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \epsilon_i = \beta_0 + \sum_{j=1}^{p-1} X_{ij}\beta_j + \epsilon_i, \qquad (1)$$
where ϵi ∼ iid N(0, σ²).
Model (1) can include:
• Categorical predictor variables. A dummy variable may be defined as
$$X_{ij} = \begin{cases} 1 & \text{if the } i\text{-th observation is female,} \\ 0 & \text{if the } i\text{-th observation is male.} \end{cases}$$
We need to use k − 1 dummy variables if a categorical predictor variable has k levels.
• Ordinal predictor variables. If the levels of a categorical predictor variable can be ranked, we can generally assign the predictor variable the values 1 to k, from the lowest level to the highest.
• Polynomial regression. The model can be written as
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i1}^2 + \epsilon_i = \beta_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \epsilon_i,$$
where Zi1 = Xi1 and Zi2 = Xi1².
• Transformation. The model may also take the form
$$\log Y_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij} + \epsilon_i.$$
The Box-Cox transformation may also be used, as in
$$\frac{Y_i^{\lambda} - 1}{\lambda} = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij} + \epsilon_i,$$
where the conventional values of λ are λ = 2, 1, 0.5, 0, −0.5, −1.0, −2, with λ = 0 interpreted as the log transformation.
• Interaction effects. The model may also be
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \epsilon_i.$$
• Combination of cases. A model may contain a combination of the above cases, as in the sketch below.
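The cases above map directly onto columns of a single design matrix. The following Python sketch (entirely hypothetical, simulated data and coefficients, not from the textbook) builds dummy, polynomial, and interaction columns and fits the model by least squares with numpy:

import numpy as np

rng = np.random.default_rng(0)
n = 50
sex = rng.integers(0, 2, size=n)    # dummy variable: 1 = female, 0 = male
x1 = rng.normal(10.0, 2.0, size=n)  # continuous predictor Xi1
x2 = rng.normal(5.0, 1.0, size=n)   # continuous predictor Xi2

# Design matrix: intercept, one dummy column (k - 1 = 1 for a 2-level factor),
# a linear term, a polynomial term, and an interaction term.
X = np.column_stack([np.ones(n), sex, x1, x1**2, x1 * x2])

# Simulated response from known coefficients plus N(0, 1) errors.
y = X @ np.array([1.0, 0.5, 2.0, -0.1, 0.3]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # least squares estimates of the five coefficients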
6.2-6.6 General Linear Regression Model in Matrix Terms
Let
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{1,p-1} \\ 1 & X_{21} & \cdots & X_{2,p-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & \cdots & X_{n,p-1} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad \text{and} \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}. \qquad (2)$$
Then, the matrix expression of the model is
$$Y = X\beta + \epsilon,$$
where Y is an n-dimensional random vector, X is an n × p matrix, β is the p-dimensional parameter vector defined in (2), ϵ ∼ N(0, σ²I) is the error term, and σ² is an unknown parameter. Assume the rank of X is p.
We have shown
$$\hat\beta = (X'X)^{-1} X'Y.$$
Thus,
$$E(\hat\beta) = E[(X'X)^{-1}X'Y] = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'(X\beta) = \beta,$$
and
$$\mathrm{Cov}(\hat\beta, \hat\beta) = (X'X)^{-1}X'\,\mathrm{Cov}(Y, Y)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$
The model residual is
$$e_i = Y_i - \hat Y_i = Y_i - \Big(\hat\beta_0 + \sum_{j=1}^{p-1} \hat\beta_j X_{ij}\Big).$$
Let e = (e1 , · · · , en )′ . Then
$$e = [I - X(X'X)^{-1}X']Y = (I - H)Y,$$
where H = X(X′X)⁻¹X′ is the hat matrix.
The estimator of σ² is
$$\hat\sigma^2 = \frac{SSE}{n-p} = \frac{1}{n-p}\, Y'[I - X(X'X)^{-1}X']Y,$$
where
$$SSE = \sum_{i=1}^{n} (\hat Y_i - Y_i)^2 = Y'[I - X(X'X)^{-1}X']Y.$$
Comment: σ̂² is not the MLE of σ², but it is the UMVU (uniformly minimum variance unbiased) estimator. The MLE is SSE/n.
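These matrix formulas are straightforward to compute directly. Below is a minimal Python sketch (assuming numpy, a full-rank design matrix X, and a response vector y, for example from the sketch in Section 6.1; ols_fit is a hypothetical helper name reused in later sketches):

import numpy as np

def ols_fit(X, y):
    """Return beta-hat, residuals, SSE, sigma^2-hat, and Cov(beta-hat)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)   # (X'X)^{-1}
    beta_hat = XtX_inv @ X.T @ y       # beta-hat = (X'X)^{-1} X'Y
    H = X @ XtX_inv @ X.T              # hat matrix H = X(X'X)^{-1}X'
    e = y - H @ y                      # residuals e = (I - H)Y
    sse = e @ e                        # SSE = Y'(I - H)Y
    sigma2_hat = sse / (n - p)         # unbiased (UMVU) estimator of sigma^2
    cov_beta = sigma2_hat * XtX_inv    # estimated Cov(beta-hat)
    return beta_hat, e, sse, sigma2_hat, cov_beta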
The Analysis of Variance table may be given as

Source       df      SS    MS                F-value    p-value
Regression   p − 1   SSR   MSR = SSR/(p−1)   MSR/MSE    P[F(p−1, n−p) > MSR/MSE]
Error        n − p   SSE   MSE = SSE/(n−p)
Total        n − 1   SST
where SSR may be partitioned. The p-value tests
H0 : β1 = · · · = βp−1 = 0.
In addition, we can use the coefficient of multiple determination
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$
The value of R² tells us how much of the variation in the data is explained by the linear function. Its value is always between 0 and 1; if the value is large, then more of the variation is explained by the linear function.
There is an adjusted R² given by
$$R^2_{adj} = 1 - \frac{n-1}{n-p}\,(1 - R^2),$$
which can be used in the same way as R².
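The ANOVA decomposition and both R² measures follow from SSE and SST alone. A short continuation of the earlier sketch (hypothetical ols_fit helper from above; scipy assumed available for the F distribution):

import numpy as np
from scipy import stats

def anova_summary(X, y):
    n, p = X.shape
    _, _, sse, _, _ = ols_fit(X, y)
    sst = np.sum((y - y.mean()) ** 2)            # total sum of squares
    ssr = sst - sse                              # regression sum of squares
    msr, mse = ssr / (p - 1), sse / (n - p)
    f_value = msr / mse
    p_value = stats.f.sf(f_value, p - 1, n - p)  # P[F(p-1, n-p) > MSR/MSE]
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (n - 1) / (n - p) * (1.0 - r2)
    return f_value, p_value, r2, r2_adj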
We have the following important properties (a computational sketch follows this list):
• E(β̂) = β.
• The standardized coefficient satisfies
$$\frac{\hat\beta_j - \beta_j}{s(\hat\beta_j)} \sim t_{n-p}.$$
Therefore, the (1 − α)-level confidence interval for βj is
$$\hat\beta_j \pm t_{\alpha/2,\, n-p}\, s(\hat\beta_j).$$
To test H0 : βj = 0, we can look at the p-value of βj given by
$$P\left[|t_{n-p}| > \left|\frac{\hat\beta_j}{s(\hat\beta_j)}\right|\right].$$
If the p-value is small, then we reject H0.
• To compute joint confidence intervals for (βj1 , · · · , βjk ), we can use the Bonferroni intervals
$$\hat\beta_j \pm t_{\alpha/(2k),\, n-p}\, s(\hat\beta_j), \qquad j = j_1, \cdots, j_k.$$
•
$$SSE \sim \sigma^2 \chi^2_{n-p}.$$
Therefore, the (1 − α)-level confidence interval for σ² is
$$\left[\frac{SSE}{\chi^2_{\alpha/2,\, n-p}},\ \frac{SSE}{\chi^2_{1-\alpha/2,\, n-p}}\right].$$
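The sketch referred to above (same hypothetical ols_fit helper; scipy assumed) covers the t-based tests and intervals, the Bonferroni adjustment, and the χ² interval for σ²:

import numpy as np
from scipy import stats

def coef_inference(X, y, alpha=0.05, k_joint=None):
    n, p = X.shape
    beta_hat, _, sse, _, cov_beta = ols_fit(X, y)
    se = np.sqrt(np.diag(cov_beta))                  # s(beta_j-hat)
    t_stat = beta_hat / se                           # statistic for H0: beta_j = 0
    p_vals = 2 * stats.t.sf(np.abs(t_stat), n - p)   # P[|t_{n-p}| > |t*|]
    # Bonferroni: replace alpha/2 by alpha/(2k) for k joint intervals.
    crit = stats.t.ppf(1 - alpha / (2 * (k_joint or 1)), n - p)
    ci = np.column_stack([beta_hat - crit * se, beta_hat + crit * se])
    # Chi-square interval for sigma^2: divide SSE by the upper and lower
    # critical values chi2_{alpha/2, n-p} and chi2_{1-alpha/2, n-p}.
    s2_ci = (sse / stats.chi2.ppf(1 - alpha / 2, n - p),
             sse / stats.chi2.ppf(alpha / 2, n - p))
    return t_stat, p_vals, ci, s2_ci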
6.7 Estimation of Mean Response and Prediction of New Observation
Let x0 = (x01 , · · · , x0,p−1 ) be a new observation of the predictor variables. Then, the prediction of the response is
$$\hat y_0 = \hat\beta_0 + \sum_{j=1}^{p-1} \hat\beta_j x_{0j}.$$
The estimated variance of the mean response is
$$V(\hat y_0) = MSE\,(1, x_{01}, \cdots, x_{0,p-1})\,(X'X)^{-1} \begin{pmatrix} 1 \\ x_{01} \\ \vdots \\ x_{0,p-1} \end{pmatrix}.$$
The (1 − α)-level confidence interval for the mean response is
$$\hat y_0 \pm t_{\alpha/2,\, n-p} \sqrt{V(\hat y_0)},$$
and the (1 − α)-level prediction interval for a new observed value is
$$\hat y_0 \pm t_{\alpha/2,\, n-p} \sqrt{V(\hat y_0) + MSE}.$$
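The same quantities in a short sketch (hypothetical ols_fit helper again; the new row x0 must include the leading 1):

import numpy as np
from scipy import stats

def predict_at(X, y, x0, alpha=0.05):
    n, p = X.shape
    beta_hat, _, _, mse, _ = ols_fit(X, y)            # mse = SSE / (n - p)
    x0 = np.asarray(x0, dtype=float)                  # x0 = (1, x01, ..., x0,p-1)
    y0_hat = x0 @ beta_hat                            # point prediction
    v_mean = mse * x0 @ np.linalg.inv(X.T @ X) @ x0   # V(y0-hat)
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    half_mean = t_crit * np.sqrt(v_mean)              # mean-response half-width
    half_pred = t_crit * np.sqrt(v_mean + mse)        # prediction half-width
    return (y0_hat,
            (y0_hat - half_mean, y0_hat + half_mean),
            (y0_hat - half_pred, y0_hat + half_pred))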
6.8 Diagnostics and Remedial Measures
We have the following methods.
• Scatter plots: plot the response against each predictor variable.
• Residual plot: plot the residuals against the fitted values.
• Normal probability plot: plot the ordered residuals against their expected values under normality.
• Test for constancy of error variance.
– Modified Levene test. Divide the residuals into two groups with n1 and n2 observations, respectively, where n = n1 + n2. Let ẽ1 and ẽ2 be the medians of the residuals in the two groups. Let
$$d_{i1} = |e_{i1} - \tilde e_1|, \qquad d_{i2} = |e_{i2} - \tilde e_2|.$$
Use the pooled t-test statistic
$$t^*_L = \frac{\bar d_1 - \bar d_2}{s\sqrt{1/n_1 + 1/n_2}},$$
where
$$s^2 = \frac{\sum (d_{i1} - \bar d_1)^2 + \sum (d_{i2} - \bar d_2)^2}{n - 2}.$$
We reject constancy of variance if |t∗L| > tα/2,n−2. (A sketch implementing this test follows the list.)
– Breusch-Pagan test. Fit the model
$$\log e_i^2 = \gamma_0 + \gamma_1 X_i$$
and test H0 : γ1 = 0.
• F-test for lack of fit. The method requires replicated values of the independent variables, so it is most useful in ANOVA-type models.
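A sketch of the modified Levene test described above (e and x are numpy arrays holding the residuals and a predictor used to split the data into two groups; scipy assumed):

import numpy as np
from scipy import stats

def modified_levene(e, x, alpha=0.05):
    n = len(e)
    low = x <= np.median(x)                     # group 1: small x; group 2: large x
    d1 = np.abs(e[low] - np.median(e[low]))     # d_i1 = |e_i1 - median of group 1|
    d2 = np.abs(e[~low] - np.median(e[~low]))   # d_i2 = |e_i2 - median of group 2|
    n1, n2 = len(d1), len(d2)
    s2 = (np.sum((d1 - d1.mean())**2) + np.sum((d2 - d2.mean())**2)) / (n - 2)
    t_l = (d1.mean() - d2.mean()) / np.sqrt(s2 * (1/n1 + 1/n2))
    reject = abs(t_l) > stats.t.ppf(1 - alpha/2, n - 2)  # |t*_L| > t_{alpha/2, n-2}
    return t_l, reject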
6.9 Examples: Multiple Regression with Two Predictor Variables
Example from the textbook (Section 6.9). In this example, Dwaine Studios operates portrait studios in 21 medium-sized cities. The studios specialize in portraits of children. The company is considering an expansion and wishes to investigate whether the sales (S) (in thousands of dollars) in a community can be predicted from the number of persons (N) (in thousands of persons) aged 16 or younger and the per capita disposable personal income (I) (in thousands of dollars) in the community. A summary of the SAS output is given in Table 1.

Table 1: Model summaries for the Dwaine Studios example, SST = 26196. The intercept is always included in each model.

Model      R²       SSE
S = N      0.8922   2824.40
S = I      0.6986   7896.43
S = I + N  0.9167   2180.92

The plots do not show anything significantly abnormal. In the model S = I + N, all the predictors are significant (p-values less than 0.05). Thus the regression model S = I + N is the final fitted model.