Ch 3: Multiple Linear Regression
1. Multiple Linear Regression Model
• A multiple regression model has more than one regressor.
• For example, suppose we have one response variable and two regressor variables:
  1. delivery time (y)
  2. the number of cases of product stocked (x1)
  3. the distance walked by the route driver (x2)
• With these variables, the regression model is
  y = β0 + β1 x1 + β2 x2 + ε,
where we assume the errors ε are i.i.d. N(0, σ²).
• In general, a multiple linear regression model with k regressor variables is
  y = β0 + β1 x1 + · · · + βk xk + ε,
where the errors ε are i.i.d. N(0, σ²).
• Example with the delivery time data. Plot scatter diagram (matrix)
  [Scatterplot matrix of the delivery time data: pairwise plots of d.time, case, and distance]
• Data for multiple linear regression:
  Observation i | Response yi | Regressors x1, x2, ..., xk
  --------------+-------------+---------------------------
  1             | y1          | x11   x12   ...   x1k
  2             | y2          | x21   x22   ...   x2k
  ...           | ...         | ...   ...         ...
  n             | yn          | xn1   xn2   ...   xnk
• With the data above, the multiple regression model is
yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik + εi ,
where i = 1, 2, . . . , n
• Multiple regression model (matrix notation):
– The model is
    y = Xβ + ε,
  where

        | y1 |       | 1  x11  x12  ...  x1k |       | β0 |       | ε1 |
        | y2 |       | 1  x21  x22  ...  x2k |       | β1 |       | ε2 |
    y = | .. |,  X = | .   .    .         .  |,  β = | .. |,  ε = | .. |
        | yn |       | 1  xn1  xn2  ...  xnk |       | βk |       | εn |

– In general, y is an n × 1 vector of the observations, X is an n × p (= k + 1)
  matrix of the regressor variables, β is a p (= k + 1) × 1 vector of the regression
  coefficients, and ε is an n × 1 vector of random errors.
– The assumption on the errors is
    ε ∼ N(0, σ² In)
2. Least-Squares Estimation
• In general, the multiple linear regression model is
  yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik + εi,  (i = 1, 2, . . . , n),
or, in matrix form,
  y = Xβ + ε,  where ε ∼ N(0, σ² In)
• Wish to find β̂ that minimizes
  S(β) = Σ_{i=1}^{n} εi² = ε'ε = (y − Xβ)'(y − Xβ)
• S(β) can be expanded as
  S(β) = (y − Xβ)'(y − Xβ)
       = y'y − β'X'y − y'Xβ + β'X'Xβ
       = y'y − 2β'X'y + β'X'Xβ,
since β'X'y = y'Xβ is a scalar.
• The LS estimator must satisfy
  ∂S/∂β |_β̂ = −2X'y + 2X'Xβ̂ = 0,
which gives
  X'Xβ̂ = X'y.
These are called the least-squares normal equations.
• The least-squares estimator of β is
  β̂ = (X'X)⁻¹X'y,
provided that (X'X)⁻¹ exists.
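As a minimal sketch (assuming NumPy is available; the data below are synthetic, not the delivery time data), the estimator can be obtained by solving the normal equations directly:

```python
import numpy as np

# Sketch of the least-squares estimator beta_hat = (X'X)^{-1} X'y.
# Synthetic data generated so the true coefficients are known.
rng = np.random.default_rng(0)
n, k = 25, 2
x = rng.uniform(0, 10, size=(n, k))        # regressor values
X = np.column_stack([np.ones(n), x])       # add intercept column: n x (k+1)
beta_true = np.array([2.0, 1.5, 0.3])
y = X @ beta_true + rng.normal(0, 1.0, n)  # response with N(0,1) errors

# Solve X'X beta = X'y; solve() is preferred over explicitly
# inverting X'X for numerical stability.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

The estimate should land near (2.0, 1.5, 0.3), up to sampling error.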
• The fitted regression model is
  ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy,
where the n × n matrix H = X(X'X)⁻¹X' is called the hat matrix.
• The residual vector is
  e = y − ŷ = y − Xβ̂ = y − Hy = (I − H)y
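A small sketch (synthetic data, assuming NumPy) that forms H explicitly and checks its standard properties — symmetry, idempotence, and residuals orthogonal to the columns of X:

```python
import numpy as np

# Hat matrix H = X (X'X)^{-1} X' and residual vector e = (I - H) y.
rng = np.random.default_rng(1)
n = 10
X = np.column_stack([np.ones(n), rng.uniform(size=n), rng.uniform(size=n)])
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # n x n hat matrix
y_hat = H @ y                          # fitted values
e = (np.eye(n) - H) @ y                # residuals

# H is symmetric and idempotent, and the residuals satisfy X'e = 0.
print(np.allclose(H, H.T), np.allclose(H @ H, H), np.allclose(X.T @ e, 0))
```

Forming H is fine for illustration; in practice one computes ŷ = Xβ̂ without building the n × n matrix.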
• Example (the delivery time data)
  – The multiple regression model is
      yi = β0 + β1 xi1 + β2 xi2 + εi,  (i = 1, 2, . . . , 25),
    or, in matrix form, y = Xβ + ε.
  – The least-squares estimate of β is
          | β̂0 |   | 2.3412 |
      β̂ = | β̂1 | = | 1.6159 |
          | β̂2 |   | 0.0143 |
  – The least-squares fit is
      ŷ = 2.3412 + 1.6159 x1 + 0.0143 x2
Properties of the Least-Squares Estimators
• Under the assumptions E(ε) = 0 and V(ε) = σ² In,
  E(β̂) = β  and  V(β̂) = σ²(X'X)⁻¹
Estimation of σ²
• The residual sum of squares is
  SSE = Σ_{i=1}^{n} ei² = e'e
      = (y − Xβ̂)'(y − Xβ̂)
      = y'y − 2β̂'X'y + β̂'X'Xβ̂
      = y'y − β̂'X'y,
where the last step uses the normal equations X'Xβ̂ = X'y.
• SSE has n − (k + 1) degrees of freedom.
• The estimator of σ² is
  σ̂² = MSE = SSE / (n − k − 1)
Note that MSE is model dependent.
• MSE is an unbiased estimator of σ², that is,
  E(MSE) = σ²
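The two forms of SSE above are algebraically equal; a quick numerical sketch (synthetic data, assuming NumPy) confirms this and computes σ̂²:

```python
import numpy as np

# Check SSE = e'e = y'y - beta_hat' X' y, and sigma2_hat = MSE.
rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.2, 2.0]) + rng.normal(0, 2.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
sse_residual = e @ e                            # e'e
sse_shortcut = y @ y - beta_hat @ (X.T @ y)     # y'y - beta_hat' X' y
mse = sse_residual / (n - k - 1)                # unbiased estimator of sigma^2
print(np.isclose(sse_residual, sse_shortcut), mse)
```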
• Example (the delivery time data)
  SSE = y'y − β̂'X'y = 18310.63 − 18076.90 = 233.726
So the estimate is
  σ̂² = 233.726 / (25 − 2 − 1) = 10.6239
Output from R (the delivery time data)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231   1.096730   2.135 0.044170 *
case        1.615907   0.170735   9.464 3.25e-09 ***
distance    0.014385   0.003613   3.981 0.000631 ***

Residual standard error: 3.259 on 22 degrees of freedom
3. Hypothesis testing in multiple linear regression
Test for significance of regression
• The appropriate hypotheses are
  H0: β1 = β2 = · · · = βk = 0
  H1: βj ≠ 0 for at least one j
• Rejecting the null hypothesis means that at least one of the regressors x1, x2, . . . , xk
contributes significantly to the model.
• Perform the ANOVA:
  1. Decompose SST into SSR and SSE. That is,
     y'y − (Σ_{i=1}^{n} yi)²/n = { β̂'X'y − (Σ_{i=1}^{n} yi)²/n } + { y'y − β̂'X'y }
  2. Consider the degrees of freedom for each source of variation:
     dfSST = dfSSR + dfSSE
     (n − 1) = k + (n − k − 1)
  3. Compute the mean squares
     MSR = SSR / k  and  MSE = SSE / (n − k − 1)
  4. Obtain the F statistic
     F0 = MSR / MSE
  5. Decision rule: reject H0 if
     F0 > Fα,k,(n−k−1)
• ANOVA table

  Source of Variation   SS                   DF          MS     F0
  Regression            β̂'X'y − (Σ yi)²/n    k           MSR    MSR/MSE
  Error                 y'y − β̂'X'y          n − k − 1   MSE
  Total                 y'y − (Σ yi)²/n      n − 1
• Example (the delivery time data)

  Source of Variation   SS          DF   MS         F0
  Regression            5550.8166    2   2775.408   261.24
  Error                  233.726    22     10.6239
  Total                 5784.54     24

Note that F0.05,2,22 = 3.44
Tests on individual regression coefficients
• Test for the significance of any individual regression coefficient:
  H0: βj = 0
  H1: βj ≠ 0
• The test statistic for this hypothesis is
  t0 = β̂j / √(σ̂² Cjj) = β̂j / se(β̂j),
where Cjj is the diagonal element of (X'X)⁻¹ corresponding to β̂j.
• Decision rule: reject H0 if
  |t0| > tα/2,n−k−1
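A sketch of the t statistics from C = (X'X)⁻¹ (synthetic data, assuming NumPy; here the true β2 is 0, so its t statistic should be small):

```python
import numpy as np

# Individual-coefficient t statistics t0 = beta_hat_j / sqrt(sigma2_hat * C_jj).
rng = np.random.default_rng(4)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, k))])
y = X @ np.array([2.0, 1.6, 0.0]) + rng.normal(size=n)   # true beta_2 = 0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sse = y @ y - beta_hat @ (X.T @ y)
sigma2_hat = sse / (n - k - 1)
C = np.linalg.inv(X.T @ X)
se = np.sqrt(sigma2_hat * np.diag(C))       # standard errors se(beta_hat_j)
t0 = beta_hat / se
print(t0)    # compare each |t0| with t_{alpha/2, n-k-1}
```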
• Output from R (the delivery time data)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.341231   1.096730   2.135 0.044170 *
case        1.615907   0.170735   9.464 3.25e-09 ***
distance    0.014385   0.003613   3.981 0.000631 ***

Residual standard error: 3.259 on 22 degrees of freedom
Multiple R-Squared: 0.9596,    Adjusted R-squared: 0.9559
F-statistic: 261.2 on 2 and 22 DF,  p-value: 4.441e-16
Coefficient of multiple determination
• The R² is defined as
  R² = SSR/SST = 1 − SSE/SST
• In general, R² always increases when a regressor is added to the model, regardless
of the contribution of that variable.
• Use the adjusted R², defined as
  R²adj = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]
• Example (the delivery time data)
  R²adj = 1 − (24/22)(233.726/5784.54) = 0.9559
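A sketch (synthetic data, assuming NumPy) contrasting the two measures: adding an unrelated regressor cannot lower R², but it can lower R²adj:

```python
import numpy as np

# R^2 and adjusted R^2 for a model with and without a noise regressor.
rng = np.random.default_rng(5)
n = 25
x1 = rng.uniform(0, 10, n)
x_noise = rng.uniform(0, 10, n)            # regressor unrelated to y
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def r2_pair(X, y):
    """Return (R^2, adjusted R^2) for design matrix X (with intercept column)."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    sse = y @ y - beta @ (X.T @ y)
    sst = y @ y - y.sum() ** 2 / n
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return r2, r2_adj

r2_small = r2_pair(np.column_stack([np.ones(n), x1]), y)
r2_big = r2_pair(np.column_stack([np.ones(n), x1, x_noise]), y)
print(r2_small, r2_big)
```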
4. Confidence Intervals in Multiple Regression
C.I. on the regression coefficients
• The statistic
  (β̂j − βj) / √(σ̂² Cjj),   j = 0, 1, . . . , k,
is distributed as t with n − (k + 1) df.
• The 100(1 − α) percent C.I. for βj is
  β̂j − tα/2,n−(k+1) se(β̂j) ≤ βj ≤ β̂j + tα/2,n−(k+1) se(β̂j)
• Example (the delivery time data): find a C.I. for β1. With β̂1 = 1.6159, C11 = 0.00274378,
and σ̂² = 10.6239, obtain
  1.26181 ≤ β1 ≤ 1.97001
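The interval above can be reproduced from the quoted numbers (a sketch; t0.025,22 ≈ 2.074 is hard-coded from a t table to avoid a SciPy dependency):

```python
import math

# Reproduce the 95% C.I. for beta_1 from the values given in the text.
beta1_hat = 1.6159
c11 = 0.00274378        # diagonal element of (X'X)^{-1} for beta_1
sigma2_hat = 10.6239
t_crit = 2.074          # t_{0.025, 22}, from a t table

se = math.sqrt(sigma2_hat * c11)       # se(beta_1_hat) ~ 0.1707
lower = beta1_hat - t_crit * se
upper = beta1_hat + t_crit * se
print(round(lower, 4), round(upper, 4))
```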
C.I. on the mean response at a particular point x0' = (1, x01, . . . , x0k)
• The fitted value at this point is
  ŷ0 = x0'β̂
• E(ŷ0) = x0'β = E(y|x0) and Var(ŷ0) = σ² x0'(X'X)⁻¹x0
• The statistic
  [ŷ0 − E(y|x0)] / [σ̂ √(x0'(X'X)⁻¹x0)]
is distributed as t with n − (k + 1) df.
• The 100(1 − α) percent C.I. on the mean response at the point x0 is
  ŷ0 − tα/2,n−(k+1) σ̂ √(x0'(X'X)⁻¹x0) ≤ E(y|x0) ≤ ŷ0 + tα/2,n−(k+1) σ̂ √(x0'(X'X)⁻¹x0)
• Example (the delivery time data): find a C.I. on the mean response at the point
x0' = (1, 8, 275).
  17.66 ≤ E(y|x0) ≤ 20.78
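A sketch of the interval formula (synthetic data, assuming NumPy, not the delivery time data; with n = 25 and k = 2 the df is 22, so t0.025,22 ≈ 2.074 is hard-coded):

```python
import numpy as np

# 95% C.I. on the mean response at a point x0, using
# Var(y0_hat) = sigma^2 * x0' (X'X)^{-1} x0.
rng = np.random.default_rng(6)
n = 25
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 1000, n)])
y = X @ np.array([2.0, 1.6, 0.01]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = (y @ y - beta_hat @ (X.T @ y)) / (n - 3)
x0 = np.array([1.0, 8.0, 275.0])           # point of interest (intercept first)
y0_hat = x0 @ beta_hat                     # fitted mean response at x0
half_width = 2.074 * np.sqrt(sigma2_hat * x0 @ np.linalg.inv(X.T @ X) @ x0)
print(y0_hat - half_width, y0_hat + half_width)
```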
5. Extra sum of squares method for testing
Introduction
• The goal of this section is to investigate the contribution of a subset of the regressor
variables to the model.
• Let the vector of regression coefficients be partitioned as follows:
      | β1 |
  β = | β2 |
• Wish to test the hypotheses
  H0: β2 = 0
  H1: β2 ≠ 0
• The model may be written as
  y = Xβ + ε = X1 β1 + X2 β2 + ε
• Under the assumption that the null hypothesis is true, we have the reduced model
  y = X1 β1 + ε
• Note that the full model is
  y = Xβ + ε = X1 β1 + X2 β2 + ε
• In the reduced model, the model sum of squares (SSR) is
  SSR(β1) = β̂1'X1'y
• The SSR due to β2, given that β1 is already in the model, is
  SSR(β2|β1) = SSR(β) − SSR(β1),
where this sum of squares is called the extra sum of squares due to β2.
• The test statistic for the null hypothesis is
  F0 = [SSR(β2|β1)/r] / MSE,
which follows the Fr,n−(k+1) distribution under the null hypothesis, where r is the
number of parameters in β2.
Partial F test
• Consider the model
  y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
• The sums of squares
  SSR(β1|β0, β2, β3),  SSR(β2|β0, β1, β3),  SSR(β3|β0, β1, β2)
represent the partial contribution of each regressor to the model.
• SST = SSR(β1, β2, β3|β0) + SSE
• Decompose the 3-df SSR into sums of squares with 1 df each, as follows:
  SSR(β1, β2, β3|β0) = SSR(β1|β0) + SSR(β2|β1, β0) + SSR(β3|β1, β2, β0)
• Example (the delivery time data): investigate the contribution of the distance (x2) to
the model.
• The hypothesis is
  H0: β2 = 0
  H1: β2 ≠ 0
• The extra SS due to β2 is
  SSR(β2|β1, β0) = SSR(β1, β2, β0) − SSR(β1, β0)
                 = SSR(β1, β2|β0) − SSR(β1|β0)
• The term SSR(β1, β2|β0) = 5550.8166 can be obtained from the SSR of the model
y = β0 + β1 x1 + β2 x2 + ε.
• The term SSR(β1|β0) = 5382.4077 can be obtained from the SSR of the model
y = β0 + β1 x1 + ε.
• SSR(β2|β1, β0) = 168.4078, with 1 df.
• The test statistic is
  F0 = [SSR(β2|β1, β0)/1] / MSE = 168.4078 / 10.6239 = 15.85
• Reject H0 because 15.85 > F0.05,1,22 = 4.30.
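The same comparison of full and reduced models can be sketched numerically (synthetic data, assuming NumPy; coefficients chosen to mimic the delivery-time setting, but the data are not the real data):

```python
import numpy as np

# Extra-sum-of-squares (partial F) test: compare SSR of the full model
# (x1, x2) with that of the reduced model (x1 only).
rng = np.random.default_rng(7)
n = 25
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 1000, n)
y = 2.0 + 1.6 * x1 + 0.014 * x2 + rng.normal(size=n)

def ssr(X, y):
    """Regression sum of squares, corrected for the mean."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta @ (X.T @ y) - y.sum() ** 2 / len(y)

full = np.column_stack([np.ones(n), x1, x2])
reduced = np.column_stack([np.ones(n), x1])
extra = ssr(full, y) - ssr(reduced, y)     # SSR(beta2 | beta1, beta0), 1 df
beta_f = np.linalg.solve(full.T @ full, full.T @ y)
mse = (y @ y - beta_f @ (full.T @ y)) / (n - 3)
f0 = (extra / 1) / mse
print(f0)     # compare with F_{alpha, 1, n-3}
```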
Testing the general linear hypothesis
• Consider the full model
  y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
• Wish to test the hypothesis
  H0: β1 = β3,
which can be written as
  H0: Tβ = 0,  where T = (0, 1, 0, −1).
• Under the null hypothesis, the full model reduces to
  y = β0 + β1 x1 + β2 x2 + β1 x3 + ε
    = β0 + β1 (x1 + x3) + β2 x2 + ε
    = γ0 + γ1 z1 + γ2 z2 + ε,
where γ0 = β0, γ1 = β1 = β3, z1 = x1 + x3, γ2 = β2, and z2 = x2.
• The sum of squares due to the hypothesis is
  SSH = SSE(RM) − SSE(FM)
• The df is (n − 3) − (n − 4) = 1
• The test statistic is
  F0 = (SSH/1) / [SSE(FM)/(n − 4)]
• Consider the full model
  y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
• Wish to test the hypothesis
  H0: β1 = β3, β2 = 0,
which can be written as
  H0: Tβ = 0,  where
      | 0  1  0  −1 |
  T = | 0  0  1   0 |
• Under the null hypothesis, the full model reduces to
  y = β0 + β1 x1 + β1 x3 + ε
    = β0 + β1 (x1 + x3) + ε
    = γ0 + γ1 z1 + ε,
where γ0 = β0, γ1 = β1 = β3, and z1 = x1 + x3.
• The sum of squares due to the hypothesis is
  SSH = SSE(RM) − SSE(FM)
• The df is (n − 2) − (n − 4) = 2
• The test statistic is
  F0 = (SSH/2) / [SSE(FM)/(n − 4)]
• In general, the test statistic for the hypothesis Tβ = 0 is
  F0 = (SSH/r) / [SSE(FM)/(n − k − 1)],
where r is the number of degrees of freedom due to the hypothesis; it equals the number
of independent equations in Tβ = 0.
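The first case (H0: β1 = β3) can be sketched by fitting the full and reduced models directly (synthetic data, assuming NumPy; here H0 is actually true, so F0 should be small):

```python
import numpy as np

# General linear hypothesis H0: T beta = 0 for H0: beta1 = beta3,
# tested via full-model vs. reduced-model SSE.
rng = np.random.default_rng(8)
n = 40
x = rng.normal(size=(n, 3))
y = 1.0 + 0.8 * x[:, 0] + 0.5 * x[:, 1] + 0.8 * x[:, 2] + rng.normal(size=n)

def sse(X, y):
    """Residual sum of squares for design matrix X."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return y @ y - beta @ (X.T @ y)

full = np.column_stack([np.ones(n), x])                        # beta0..beta3
reduced = np.column_stack([np.ones(n), x[:, 0] + x[:, 2], x[:, 1]])  # z1 = x1 + x3
ssh = sse(reduced, y) - sse(full, y)       # 1 df: (n-3) - (n-4)
f0 = (ssh / 1) / (sse(full, y) / (n - 4))
print(f0)    # compare with F_{alpha, 1, n-4}
```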