CHAPTER 3

Multiple Linear Regression

A regression model that involves more than one regressor variable is called a multiple regression model. Fitting and analyzing these models is discussed in this chapter. The results are extensions of those in Chapter 2 for simple linear regression.
3.1 MULTIPLE REGRESSION MODELS
Suppose that the yield in pounds of conversion in a chemical process depends on the temperature and the catalyst concentration. A multiple regression model that might describe this relationship is

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \tag{3.1} $$

where y denotes the yield, x₁ denotes the temperature, and x₂ denotes the catalyst concentration. This is a multiple linear regression model with two regressor variables. The term linear is used because Eq. (3.1) is a linear function of the unknown parameters β₀, β₁, and β₂.
The regression model in Eq. (3.1) describes a plane in the three-dimensional space of y, x₁, and x₂. Figure 3.1a shows this regression plane for the model

$$ E(y) = 50 + 10x_1 + 7x_2 $$

where we have assumed that the expected value of the error term ε in Eq. (3.1) is zero. The parameter β₀ is the intercept of the regression plane. If the range of the data includes x₁ = x₂ = 0, then β₀ is the mean of y when x₁ = x₂ = 0. Otherwise β₀ has no physical interpretation. The parameter β₁ indicates the expected change in response (y) per unit change in x₁ when x₂ is held constant. Similarly, β₂ measures the expected change in y per unit change in x₂ when x₁ is held constant. Figure 3.1b shows a contour plot of the regression model, that is, lines of constant expected response E(y) as a function of x₁ and x₂.
Figure 3.1 (a) The regression plane for the model E(y) = 50 + 10x₁ + 7x₂. (b) The contour plot.
Notice that the contour lines in this plot are parallel straight lines.
In general, the response y may be related to k regressor or predictor variables. The model

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{3.2} $$

is called a multiple linear regression model with k regressors. The parameters β_j, j = 0, 1, ..., k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables x_j. The parameter β_j represents the expected change in the response y per unit change in x_j when all of the remaining regressor variables x_i (i ≠ j) are held constant. For this reason the parameters β_j, j = 1, 2, ..., k, are often called partial regression coefficients.

Multiple linear regression models are often used as empirical models or approximating functions. That is, the true functional relationship between y and x₁, x₂, ..., x_k is unknown, but over certain ranges of the regressor variables the linear regression model is an adequate approximation to the true unknown function.
Models that are more complex in structure than Eq. (3.2) may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model

$$ y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \tag{3.3} $$

If we let x₁ = x, x₂ = x², and x₃ = x³, then Eq. (3.3) can be written as

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \tag{3.4} $$

which is a multiple linear regression model with three regressor variables. Polynomial models will be discussed in more detail in Chapter 7.

Models that include interaction effects may also be analyzed by multiple linear regression methods. For example, suppose that the model is

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon \tag{3.5} $$
Figure 3.2 (a) Three-dimensional plot of the regression model E(y) = 50 + 10x₁ + 7x₂ + 5x₁x₂. (b) The contour plot.

If we let x₃ = x₁x₂ and β₃ = β₁₂, then Eq. (3.5) can be written as

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \tag{3.6} $$

which is a linear regression model.
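Fitting the interaction model of Eq. (3.5) therefore amounts to adding a column x₃ = x₁x₂ to the model matrix before applying ordinary least squares. A minimal NumPy sketch follows; the x₁ and x₂ values are made up, and y is generated (without noise) from the Figure 3.2 mean function E(y) = 50 + 10x₁ + 7x₂ + 5x₁x₂, so the fit simply recovers those coefficients:

```python
import numpy as np

# Hypothetical regressor values; y generated from E(y) = 50 + 10*x1 + 7*x2 + 5*x1*x2.
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y  = np.array([87.0, 171.0, 184.0, 365.0, 378.0])

# Model matrix for Eq. (3.6): intercept, x1, x2, and the new regressor x3 = x1*x2.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Ordinary least squares on the augmented matrix fits the interaction model (3.5).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # ~[50., 10., 7., 5.] since y was generated without noise
```

The same device handles the polynomial model (3.3): the columns x, x², and x³ are simply computed and stacked into X before solving.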
Figure 3.2a shows the three-dimensional plot of the regression model

$$ E(y) = 50 + 10x_1 + 7x_2 + 5x_1x_2 $$

and Figure 3.2b the corresponding two-dimensional contour plot. Notice that, although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 3.2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable (x₁, say) depends on the level of the other variable (x₂). For example, Figure 3.2 shows that changing x₁ from 2 to 8 produces a much smaller change in E(y) when x₂ = 2 than when x₂ = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.
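A quick numeric check of this interaction effect, using the mean function plotted in Figure 3.2 (the small helper function below is only for illustration):

```python
def mean_response(x1, x2):
    # Mean function of the interaction model plotted in Figure 3.2.
    return 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2

# Change in E(y) as x1 goes from 2 to 8, at two levels of x2.
print(mean_response(8, 2) - mean_response(2, 2))    # 120
print(mean_response(8, 10) - mean_response(2, 10))  # 360
```

The same six-unit change in x₁ shifts the mean response three times as much at x₂ = 10 as at x₂ = 2, which is exactly what the non-parallel contours in Figure 3.2b display.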
As a final example, consider the second-order model with interaction

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \beta_{12} x_1 x_2 + \varepsilon \tag{3.7} $$

If we let x₃ = x₁², x₄ = x₂², x₅ = x₁x₂, β₃ = β₁₁, β₄ = β₂₂, and β₅ = β₁₂, then Eq. (3.7) can be written as a multiple linear regression model as follows:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \varepsilon $$

Figure 3.3 shows the three-dimensional plot and the corresponding contour plot for

$$ E(y) = 800 + 10x_1 + 7x_2 - 8.5x_1^2 - 5x_2^2 + 4x_1x_2 $$
Figure 3.3 (a) Three-dimensional plot of the regression model E(y) = 800 + 10x₁ + 7x₂ − 8.5x₁² − 5x₂² + 4x₁x₂. (b) The contour plot.
These plots indicate that the expected change in y when x₁ is changed by one unit (say) is a function of both x₁ and x₂. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

In most real-world problems, the values of the parameters (the regression coefficients β_j) and the error variance σ² will not be known, and they must be estimated from sample data. The fitted regression equation or model is typically used in prediction of future observations of the response variable y or for estimating the mean response at particular levels of the regressor variables.
3.2 ESTIMATION OF THE MODEL PARAMETERS
3.2.1 Least-Squares Estimation of the Regression Coefficients

The method of least squares can be used to estimate the regression coefficients in Eq. (3.2). Suppose that n > k observations are available, and let yᵢ denote the ith observed response and xᵢⱼ denote the ith observation or level of regressor xⱼ. The data will appear as in Table 3.1. We assume that the error term ε in the model has E(ε) = 0 and Var(ε) = σ², and that the errors are uncorrelated.

Throughout this chapter we will assume that the regressor variables x₁, x₂, ..., x_k are fixed (i.e., mathematical or nonrandom) variables, measured without error.
TABLE 3.1 Data for Multiple Linear Regression

Observation, i    Response, y    x₁      x₂      ...    x_k
1                 y₁             x₁₁     x₁₂     ...    x₁ₖ
2                 y₂             x₂₁     x₂₂     ...    x₂ₖ
⋮                 ⋮              ⋮       ⋮              ⋮
n                 yₙ             xₙ₁     xₙ₂     ...    xₙₖ
However, just as was discussed in Section 2.11 for the simple linear regression model, all of our results are still valid for the case where the regressors are random variables. This is certainly important, because when regression data arise from an observational study, some or most of the regressors will be random variables. When the data result from a designed experiment, it is more likely that the x's will be fixed variables. When the x's are random variables, it is only necessary that the observations on each regressor be independent and that the distribution not depend on the regression coefficients (the β's) or on σ². When testing hypotheses or constructing confidence intervals, we will also have to assume that the conditional distribution of y given x₁, x₂, ..., x_k is normal with mean β₀ + β₁x₁ + β₂x₂ + ⋯ + β_kx_k and variance σ².

We may write the sample regression model corresponding to Eq. (3.2) as
$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i
      = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, 2, \ldots, n \tag{3.8} $$

The least-squares function is

$$ S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2
   = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Bigr)^2 \tag{3.9} $$
The function S must be minimized with respect to β₀, β₁, ..., β_k. The least-squares estimators of β₀, β₁, ..., β_k must satisfy

$$ \left. \frac{\partial S}{\partial \beta_0} \right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k}
   = -2 \sum_{i=1}^{n} \Bigl( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \Bigr) = 0 \tag{3.10a} $$

and

$$ \left. \frac{\partial S}{\partial \beta_j} \right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k}
   = -2 \sum_{i=1}^{n} \Bigl( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \Bigr) x_{ij} = 0,
   \qquad j = 1, 2, \ldots, k \tag{3.10b} $$

Simplifying Eq. (3.10), we obtain the least-squares normal equations
$$ \begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} &= \sum_{i=1}^{n} y_i \\
\hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \hat\beta_2 \sum_{i=1}^{n} x_{i1}x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1}x_{ik} &= \sum_{i=1}^{n} x_{i1}y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{ik}x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{ik}x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2 &= \sum_{i=1}^{n} x_{ik}y_i
\end{aligned} \tag{3.11} $$

Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least-squares estimators β̂₀, β̂₁, ..., β̂ₖ.
It is more convenient to deal with multiple regression models if they are expressed in matrix notation. This allows a very compact display of the model, data, and results. In matrix notation, the model given by Eq. (3.8) is

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$

where

$$ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
   \mathbf{X} = \begin{bmatrix}
   1 & x_{11} & x_{12} & \cdots & x_{1k} \\
   1 & x_{21} & x_{22} & \cdots & x_{2k} \\
   \vdots & \vdots & \vdots & & \vdots \\
   1 & x_{n1} & x_{n2} & \cdots & x_{nk}
   \end{bmatrix}, \qquad
   \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \qquad
   \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} $$

In general, y is an n × 1 vector of the observations, X is an n × p matrix of the levels of the regressor variables, β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of random errors.

We wish to find the vector of least-squares estimators, β̂, that minimizes

$$ S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon}
   = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$
Note that S(β) may be expressed as

$$ S(\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - \boldsymbol{\beta}'\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}
   = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} $$

since β'X'y is a 1 × 1 matrix, or a scalar, and its transpose (β'X'y)' = y'Xβ is the same scalar. The least-squares estimators must satisfy

$$ \left. \frac{\partial S}{\partial \boldsymbol{\beta}} \right|_{\hat{\boldsymbol{\beta}}}
   = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0} $$

which simplifies to

$$ \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} \tag{3.12} $$

Equations (3.12) are the least-squares normal equations. They are the matrix analogue of the scalar presentation in (3.11).
To solve the normal equations, multiply both sides of (3.12) by the inverse of X'X. Thus, the least-squares estimator of β is

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \tag{3.13} $$

provided that the inverse matrix (X'X)⁻¹ exists. The (X'X)⁻¹ matrix will always exist if the regressors are linearly independent, that is, if no column of the X matrix is a linear combination of the other columns.

It is easy to see that the matrix form of the normal equations (3.12) is identical to the scalar form (3.11). Writing out (3.12) in detail, we obtain

$$ \begin{bmatrix}
n & \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik} \\
\sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{ik} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik}x_{i1} & \sum_{i=1}^{n} x_{ik}x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik}^2
\end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik}y_i \end{bmatrix} $$

If the indicated matrix multiplication is performed, the scalar form of the normal equations (3.11) is obtained. In this display we see that X'X is a p × p symmetric matrix and X'y is a p × 1 column vector. Note the special structure of the X'X matrix. The diagonal elements of X'X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross products of the elements in the columns of X. Furthermore, note that the elements of X'y are the sums of cross products of the columns of X and the observations yᵢ.
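A minimal NumPy sketch of Eqs. (3.12) and (3.13); the regressor and response values below are made up for illustration, not taken from the text:

```python
import numpy as np

# Hypothetical data: n = 6 observations on k = 2 regressors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([8.0, 5.0, 9.0, 3.0, 7.0, 4.0])
y  = np.array([20.1, 17.8, 25.2, 16.9, 26.0, 21.1])

# Model matrix X with a leading column of ones for the intercept.
X = np.column_stack([np.ones(len(y)), x1, x2])

# Normal equations (3.12): (X'X) beta_hat = X'y, solved directly.
XtX = X.T @ X          # p x p symmetric matrix of sums of squares and cross products
Xty = X.T @ y          # p x 1 vector of cross products of the columns of X with y
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)
```

Solving the normal equations directly mirrors Eq. (3.13); in practice a QR- or SVD-based routine such as np.linalg.lstsq(X, y) is usually preferred numerically because it avoids forming X'X explicitly, but it yields the same least-squares estimate.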
The fitted regression model corresponding to the levels of the regressor variables x' = [1, x₁, x₂, ..., x_k] is

$$ \hat{y} = \mathbf{x}'\hat{\boldsymbol{\beta}} = \hat\beta_0 + \sum_{j=1}^{k} \hat\beta_j x_j $$

The vector of fitted values ŷᵢ corresponding to the observed values yᵢ is

$$ \hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y} \tag{3.14} $$

The n × n matrix H = X(X'X)⁻¹X' is usually called the hat matrix. It maps the vector of observed values into a vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The difference between the observed value yᵢ and the corresponding fitted value ŷᵢ is the residual eᵢ = yᵢ − ŷᵢ. The n residuals may be conveniently written in matrix notation as

$$ \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} \tag{3.15a} $$

There are several other ways to express the vector of residuals e that will prove useful, including

$$ \mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y} \tag{3.15b} $$
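The hat matrix is easy to compute and inspect for a small data set. The sketch below, again with made-up numbers rather than data from the text, forms H, the fitted values, and the residuals of Eqs. (3.14)-(3.15b) and checks two standard properties of H (symmetry and idempotence, HH = H) that underlie its role in residual analysis:

```python
import numpy as np

# Hypothetical model matrix (intercept plus two regressors) and response.
X = np.array([[1.0, 2.0, 50.0],
              [1.0, 3.0, 40.0],
              [1.0, 5.0, 65.0],
              [1.0, 7.0, 30.0],
              [1.0, 8.0, 55.0]])
y = np.array([12.0, 11.5, 17.3, 14.2, 18.9])

# Hat matrix H = X (X'X)^{-1} X', Eq. (3.14).
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ y                        # fitted values
e = (np.eye(len(y)) - H) @ y         # residuals, Eq. (3.15b)

print(np.allclose(H, H.T))           # True: H is symmetric
print(np.allclose(H @ H, H))         # True: H is idempotent (a projection matrix)
print(np.allclose(e, y - y_hat))     # True: same residuals as y - y_hat
```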
Example 3.1 The Delivery Time Data

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x₁) and the distance walked by the route driver (x₂). The engineer has collected 25 observations on delivery time, which are shown in Table 3.2. (Note that this is an expansion of the data set used in Example 2.9.) We will fit the multiple linear regression model

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon $$

to the delivery time data in Table 3.2.
TABLE 3.2 Delivery Time Data for Example 3.1

Observation Number    Delivery Time, y (min)    Number of Cases, x₁    Distance, x₂ (ft)
1                     16.68                      7                      560
2                     11.50                      3                      220
3                     12.03                      3                      340
4                     14.88                      4                       80
5                     13.75                      6                      150
6                     18.11                      7                      330
7                      8.00                      2                      110
8                     17.83                      7                      210
9                     79.24                     30                     1460
10                    21.50                      5                      605
11                    40.33                     16                      688
12                    21.00                     10                      215
13                    13.50                      4                      255
14                    19.75                      6                      462
15                    24.00                      9                      448
16                    29.00                     10                      776
17                    15.35                      6                      200
18                    19.00                      7                      132
19                     9.50                      3                       36
20                    35.10                     17                      770
21                    17.90                     10                      140
22                    52.32                     26                      810
23                    18.75                      9                      450
24                    19.83                      8                      635
25                    10.75                      4                      150
Figure 3.4 Scatterplot matrix for the delivery time data from Example 3.1.
Graphics can be very useful in fitting multiple regression models. Figure 3.4 is a scatterplot matrix of the delivery time data. This is just a two-dimensional array of two-dimensional plots, where (except for the diagonal) each frame contains a scatter diagram. Thus, each plot is an attempt to shed light on the relationship between a pair of variables. This is often a better summary of the relationships than a numerical summary (such as displaying the correlation coefficients between each pair of variables) because it gives a sense of the linearity or nonlinearity of the relationship and some awareness of how the individual data points are arranged over the region.
Figure 3.5 Three-dimensional scatterplot of the delivery time data from Example 3.1.
When there are only two regressors, sometimes a three-dimensional scatter diagram is useful in visualizing the relationship between the response and the regressors. Figure 3.5 presents this plot for the delivery time data. By spinning these plots, some software packages permit different views of the point cloud. This view provides an indication that a multiple linear regression model may provide a reasonable fit to the data.

To fit the multiple regression model we first form the X matrix and y vector (the full data are listed in Table 3.2):

$$ \mathbf{X} = \begin{bmatrix}
1 & 7 & 560 \\
1 & 3 & 220 \\
1 & 3 & 340 \\
\vdots & \vdots & \vdots \\
1 & 8 & 635 \\
1 & 4 & 150
\end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} 16.68 \\ 11.50 \\ 12.03 \\ \vdots \\ 19.83 \\ 10.75 \end{bmatrix} $$

The X'X matrix is

$$ \mathbf{X}'\mathbf{X} = \begin{bmatrix}
25 & 219 & 10{,}232 \\
219 & 3{,}055 & 133{,}899 \\
10{,}232 & 133{,}899 & 6{,}725{,}688
\end{bmatrix} $$

and the X'y vector is

$$ \mathbf{X}'\mathbf{y} = \begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix} $$
The least-squares estimator of β is

$$ \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} $$

or

$$ \hat{\boldsymbol{\beta}} = \begin{bmatrix}
25 & 219 & 10{,}232 \\
219 & 3{,}055 & 133{,}899 \\
10{,}232 & 133{,}899 & 6{,}725{,}688
\end{bmatrix}^{-1}
\begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix}
= \begin{bmatrix}
0.11321518 & -0.00444859 & -0.00008367 \\
-0.00444859 & 0.00274378 & -0.00004786 \\
-0.00008367 & -0.00004786 & 0.00000123
\end{bmatrix}
\begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix}
= \begin{bmatrix} 2.34123115 \\ 1.61590712 \\ 0.01438483 \end{bmatrix} $$
TABLE 3.3 Observations, Fitted Values, and Residuals for Example 3.1

Observation Number    yᵢ        ŷᵢ         eᵢ
1                     16.68     21.7081    -5.0281
2                     11.50     10.3536     1.1464
3                     12.03     12.0798    -0.0498
4                     14.88      9.9556     4.9244
5                     13.75     14.1944    -0.4444
6                     18.11     18.3996    -0.2896
7                      8.00      7.1554     0.8446
8                     17.83     16.6734     1.1566
9                     79.24     71.8203     7.4197
10                    21.50     19.1236     2.3764
11                    40.33     38.0925     2.2375
12                    21.00     21.5930    -0.5930
13                    13.50     12.4730     1.0270
14                    19.75     18.6825     1.0675
15                    24.00     23.3288     0.6712
16                    29.00     29.6629    -0.6629
17                    15.35     14.9136     0.4364
18                    19.00     15.5514     3.4486
19                     9.50      7.7068     1.7932
20                    35.10     40.8880    -5.7880
21                    17.90     20.5142    -2.6142
22                    52.32     56.0065    -3.6865
23                    18.75     23.3576    -4.6076
24                    19.83     24.4028    -4.5728
25                    10.75     10.9626    -0.2126
TABLE 3.4 MINITAB Output for Soft Drink Delivery Time Data

Regression Analysis: Time versus Cases, Distance

The regression equation is
Time = 2.34 + 1.62 Cases + 0.0144 Distance

Predictor    Coef        SE Coef     T       P
Constant     2.341       1.097       2.13    0.044
Cases        1.6159      0.1707      9.46    0.000
Distance     0.014385    0.003613    3.98    0.001

S = 3.25947    R-Sq = 96.0%    R-Sq(adj) = 95.6%

Analysis of Variance
Source            DF    SS        MS        F         P
Regression         2    5550.8    2775.4    261.24    0.000
Residual Error    22     233.7      10.6
Total             24    5784.5

Source      DF    Seq SS
Cases        1    5382.4
Distance     1     168.4
The least-squares fit (with the regression coefficients reported to five decimals) is

$$ \hat{y} = 2.34123 + 1.61591x_1 + 0.01438x_2 $$

Table 3.3 shows the observations yᵢ along with the corresponding fitted values ŷᵢ and the residuals eᵢ from this model.
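The fit in Example 3.1 is easy to reproduce with a few lines of NumPy. This sketch is not part of the original example; it simply re-enters the 25 observations from Table 3.2 and applies Eq. (3.13):

```python
import numpy as np

# Delivery time data from Table 3.2.
cases = np.array([7, 3, 3, 4, 6, 7, 2, 7, 30, 5, 16, 10, 4,
                  6, 9, 10, 6, 7, 3, 17, 10, 26, 9, 8, 4], dtype=float)
distance = np.array([560, 220, 340, 80, 150, 330, 110, 210, 1460, 605, 688, 215, 255,
                     462, 448, 776, 200, 132, 36, 770, 140, 810, 450, 635, 150], dtype=float)
time = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11, 8.00, 17.83, 79.24, 21.50,
                 40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00, 9.50, 35.10,
                 17.90, 52.32, 18.75, 19.83, 10.75])

# Model matrix and least-squares estimate, Eq. (3.13).
X = np.column_stack([np.ones(25), cases, distance])
beta_hat = np.linalg.solve(X.T @ X, X.T @ time)
print(beta_hat)   # approximately [2.34123, 1.61591, 0.01438]

# Fitted values and residuals (compare with Table 3.3).
y_hat = X @ beta_hat
residuals = time - y_hat
```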
Computer Output

Table 3.4 presents a portion of the MINITAB output for the soft drink delivery time data in Example 3.1. While the output format differs from one computer program to another, this display contains the information typically generated. Most of the output in Table 3.4 is a straightforward extension to the multiple regression case of the computer output for simple linear regression. In the next few sections we will provide explanations of this output information.
3.2.2 A Geometrical Interpretation of Least Squares

An intuitive geometrical interpretation of least squares is sometimes helpful. We may think of the vector of observations y' = [y₁, y₂, ..., yₙ] as defining a vector from the origin to the point A in Figure 3.6. Note that y₁, y₂, ..., yₙ form the coordinates of an n-dimensional sample space. The sample space in Figure 3.6 is three-dimensional.

The X matrix consists of p (n × 1) column vectors, for example, 1 (a column vector of 1's), x₁, x₂, ..., x_k. Each of these columns defines a vector from the origin in the sample space. These p vectors form a p-dimensional subspace called the estimation space.
Figure 3.6 A geometrical interpretation of least squares.

The estimation space for p = 2 is shown in Figure 3.6. We may represent any point in this subspace by a linear combination of the vectors 1, x₁, ..., x_k. Thus, any point in the estimation space is of the form Xβ. Let the vector Xβ determine the point B in Figure 3.6. The squared distance from B to A is just

$$ S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$

Therefore, minimizing the squared distance of point A defined by the observation vector y to the estimation space requires finding the point in the estimation space that is closest to A. The squared distance will be a minimum when the point in the estimation space is the foot of the line from A normal (or perpendicular) to the estimation space. This is point C in Figure 3.6. This point is defined by the vector ŷ = Xβ̂. Therefore, since y − ŷ = y − Xβ̂ is perpendicular to the estimation space, we may write

$$ \mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} \qquad \text{or} \qquad \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} $$

which we recognize as the least-squares normal equations.
3.2.3 Properties of the Least-Squares Estimators

The statistical properties of the least-squares estimator β̂ may be easily demonstrated. Consider first bias:

$$ E(\hat{\boldsymbol{\beta}}) = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}]
   = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon})]
   = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}]
   = \boldsymbol{\beta} $$

since E(ε) = 0 and (X'X)⁻¹X'X = I. Thus, β̂ is an unbiased estimator of β.
The variance property of β̂ is expressed by the covariance matrix

$$ \operatorname{Cov}(\hat{\boldsymbol{\beta}}) = E\bigl\{ [\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})][\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})]' \bigr\} $$

which is a p × p symmetric matrix whose jth diagonal element is the variance of β̂_j and whose (ij)th off-diagonal element is the covariance between β̂_i and β̂_j. The covariance matrix of β̂ is found by applying a variance operator to β̂:

$$ \operatorname{Cov}(\hat{\boldsymbol{\beta}}) = \operatorname{Var}(\hat{\boldsymbol{\beta}}) = \operatorname{Var}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}] $$

Now (X'X)⁻¹X' is a matrix of constants, and the variance of y is σ²I, so

$$ \operatorname{Var}(\hat{\boldsymbol{\beta}})
   = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\operatorname{Var}(\mathbf{y})\,[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}']'
   = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}
   = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1} $$

Therefore, if we let C = (X'X)⁻¹, the variance of β̂_j is σ²C_jj and the covariance between β̂_i and β̂_j is σ²C_ij.

Appendix C.4 establishes that the least-squares estimator β̂ is the best linear unbiased estimator of β (the Gauss-Markov theorem). If we further assume that the errors εᵢ are normally distributed, then as we will see in Section 3.2.6, β̂ is also the maximum-likelihood estimator of β. The maximum-likelihood estimator is the minimum variance unbiased estimator of β.
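With σ² replaced by the estimate σ̂² developed in the next subsection, the diagonal of σ̂²(X'X)⁻¹ gives the squared standard errors of the coefficients. The small sketch below (a generic helper, not code from the text) packages this calculation; applied to the delivery time model matrix and response from Example 3.1 it should reproduce, approximately, the SE Coef column of Table 3.4:

```python
import numpy as np

def coef_standard_errors(X, y):
    """Least-squares estimates and standard errors se(b_j) = sqrt(sigma2_hat * C_jj)."""
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)              # C = (X'X)^{-1}
    beta_hat = C @ X.T @ y                  # Eq. (3.13)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # residual mean square, Eqs. (3.17)-(3.18)
    cov_beta = sigma2_hat * C               # estimated covariance matrix of beta_hat
    return beta_hat, np.sqrt(np.diag(cov_beta))

# For the delivery time data of Example 3.1 the second return value is
# approximately [1.097, 0.1707, 0.003613], the SE Coef column of Table 3.4.
```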
3.2.4 Estimation of σ²

As in simple linear regression, we may develop an estimator of σ² from the residual sum of squares

$$ SS_{\text{Res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} $$

Substituting e = y − Xβ̂, we have

$$ \begin{aligned}
SS_{\text{Res}} &= (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \\
&= \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} \\
&= \mathbf{y}'\mathbf{y} - 2\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}}
\end{aligned} $$

Since X'Xβ̂ = X'y, this last equation becomes

$$ SS_{\text{Res}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} \tag{3.16} $$

Appendix C.3 shows that the residual sum of squares has n − p degrees of freedom associated with it, since p parameters are estimated in the regression model. The residual mean square is

$$ MS_{\text{Res}} = \frac{SS_{\text{Res}}}{n - p} \tag{3.17} $$

Appendix C.3 also shows that the expected value of MS_Res is σ², so an unbiased estimator of σ² is given by

$$ \hat{\sigma}^2 = MS_{\text{Res}} \tag{3.18} $$

As noted in the simple linear regression case, this estimator of σ² is model dependent.
Example 3.2 The Delivery Time Data

We estimate the error variance σ² for the multiple regression model fit to the soft drink delivery time data in Example 3.1. Since

$$ \mathbf{y}'\mathbf{y} = \sum_{i=1}^{25} y_i^2 = 18{,}310.6290 $$

and

$$ \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} = [\,2.34123115 \quad 1.61590712 \quad 0.01438483\,]
   \begin{bmatrix} 559.60 \\ 7{,}375.44 \\ 337{,}072.00 \end{bmatrix} = 18{,}076.9030 $$

the residual sum of squares is

$$ SS_{\text{Res}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}
   = 18{,}310.6290 - 18{,}076.9030 = 233.7260 $$

Therefore, the estimate of σ² is the residual mean square

$$ \hat{\sigma}^2 = \frac{SS_{\text{Res}}}{n - p} = \frac{233.7260}{25 - 3} = 10.6239 $$

The MINITAB output in Table 3.4 reports the residual mean square as 10.6.
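A quick check of this arithmetic (a sketch only; the inputs are the quantities quoted above):

```python
# Quantities quoted in Example 3.2.
yty = 18310.6290            # y'y
bXty = 18076.9030           # beta_hat' X'y
n, p = 25, 3

ss_res = yty - bXty         # Eq. (3.16): 233.7260
ms_res = ss_res / (n - p)   # Eq. (3.17): about 10.62, the estimate of sigma^2
print(ss_res, ms_res)
```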
The model-dependent nature of this estimate σ̂² may be easily demonstrated. Figure 2.13 displays the computer output from a least-squares fit to the delivery time data using only one regressor, cases (x₁). The residual mean square for this model is 17.5, which is considerably larger than the result obtained above for the two-regressor model. Which estimate is "correct"? Both estimates are in a sense correct, but they depend heavily on the choice of model. Perhaps a better question is which model is correct. Since σ² is the variance of the errors (the unexplained noise about the regression line), we would usually prefer a model with a small residual mean square to a model with a large one.
3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression

We saw in Chapter 2 that the scatter diagram is an important tool in analyzing the relationship between y and x in simple linear regression. We also saw in Example 3.1 that a matrix of scatterplots was useful in visualizing the relationship between y and two regressors.
Figure 3.7 A matrix of scatterplots.
It is tempting to conclude that this is a general concept; that is, examining scatter diagrams of y versus x₁, y versus x₂, ..., y versus x_k is always useful in assessing the relationships between y and each of the regressors x₁, x₂, ..., x_k. Unfortunately, this is not true in general.

Following Daniel and Wood [1980], we illustrate the inadequacy of scatter diagrams for a problem with two regressors. Consider the data shown in Figure 3.7. These data were generated from the equation

$$ y = 8 - 5x_1 + 12x_2 $$

The matrix of scatterplots is shown in Figure 3.7. The y-versus-x₁ plot does not exhibit any apparent relationship between the two variables. The y-versus-x₂ plot indicates that a linear relationship exists, with a slope of approximately 8. Note that both scatter diagrams convey erroneous information. Since in this data set there are two pairs of points that have the same x₂ values (x₂ = 2 and x₂ = 4), we could measure the x₁ effect at fixed x₂ from both pairs. This gives β̂₁ = (17 − 27)/(3 − 1) = −5 for x₂ = 2 and β̂₁ = (26 − 16)/(6 − 8) = −5 for x₂ = 4, the correct results. Knowing β̂₁, we could now estimate the x₂ effect. This procedure is not generally useful, however, because many data sets do not have duplicate points.
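The slope-at-fixed-x₂ computation above can be spelled out numerically. The four points used below are reconstructed from the equation y = 8 − 5x₁ + 12x₂ and the x₂ levels mentioned in the text, so treat them as an illustration rather than the exact points plotted in Figure 3.7:

```python
# Two pairs of points sharing the same x2 value, generated from y = 8 - 5*x1 + 12*x2.
# Each tuple is (x1, x2, y).
pair_x2_equals_2 = [(1, 2, 27), (3, 2, 17)]
pair_x2_equals_4 = [(8, 4, 16), (6, 4, 26)]

def slope_at_fixed_x2(p, q):
    # Change in y per unit change in x1, holding x2 fixed.
    return (q[2] - p[2]) / (q[0] - p[0])

print(slope_at_fixed_x2(*pair_x2_equals_2))  # (17 - 27) / (3 - 1) = -5.0
print(slope_at_fixed_x2(*pair_x2_equals_4))  # (26 - 16) / (6 - 8) = -5.0
```

Both pairs recover the true coefficient β₁ = −5 even though the marginal y-versus-x₁ scatter plot suggests no relationship at all.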
This example illustrates that constructing scatter diagrams of y versus x_j (j = 1, 2, ..., k) can be misleading, even in the case of only two regressors operating in a perfectly additive fashion with no noise. A more realistic regression situation with several regressors and error in the y's would confuse the situation even further. If there is only one (or a few) dominant regressor, or if the regressors operate nearly independently, the matrix of scatterplots is most useful.
However, if several important regressors are themselves interrelated, then these scatter diagrams can be very misleading. Analytical methods for sorting out the relationships between several regressors and a response are discussed in Chapter 9.
3.2.6 Maximum-Likelihood Estimation

Just as in the simple linear regression case, we can show that the maximum-likelihood estimators for the model parameters in multiple linear regression when the model errors are normally and independently distributed are also least-squares estimators. The model is

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$

and the errors are normally and independently distributed with constant variance σ², so that ε is distributed as N(0, σ²I). The normal density function for the errors is

$$ f(\varepsilon_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\Bigl( -\frac{1}{2\sigma^2}\varepsilon_i^2 \Bigr) $$

The likelihood function is the joint density of ε₁, ε₂, ..., εₙ, that is, ∏ᵢ f(εᵢ). Therefore, the likelihood function is

$$ L(\boldsymbol{\varepsilon}, \boldsymbol{\beta}, \sigma^2) = \prod_{i=1}^{n} f(\varepsilon_i)
   = \frac{1}{(2\pi)^{n/2}\sigma^{n}} \exp\!\Bigl( -\frac{1}{2\sigma^2}\boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} \Bigr) $$

Since we can write ε = y − Xβ, the likelihood function becomes

$$ L(\mathbf{y}, \mathbf{X}, \boldsymbol{\beta}, \sigma^2)
   = \frac{1}{(2\pi)^{n/2}\sigma^{n}} \exp\!\Bigl( -\frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \Bigr) $$

As in the simple linear regression case, it is convenient to work with the log of the likelihood,

$$ \ln L(\mathbf{y}, \mathbf{X}, \boldsymbol{\beta}, \sigma^2)
   = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$

It is clear that for a fixed value of σ the log-likelihood is maximized when the term

$$ (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$

is minimized. Therefore, the maximum-likelihood estimator of β under normal errors is equivalent to the least-squares estimator β̂ = (X'X)⁻¹X'y. The maximum-likelihood estimator of σ² is

$$ \hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n} $$
These are multiple linear regression generalizations of the results given for simple linear regression in Section 2.10. The statistical properties of the maximum-likelihood estimators are summarized in Section 2.10.
3.3 HYPOTHESIS TESTING IN MULTIPLE LINEAR REGRESSION

Once we have estimated the parameters in the model, we face two immediate questions:

1. What is the overall adequacy of the model?
2. Which specific regressors seem important?

Several hypothesis testing procedures prove useful for addressing these questions. The formal tests require that our random errors be independent and follow a normal distribution with mean E(εᵢ) = 0 and variance Var(εᵢ) = σ².
3.3.1 Test for Significance of Regression

The test for significance of regression is a test to determine if there is a linear relationship between the response y and any of the regressor variables x₁, x₂, ..., x_k. This procedure is often thought of as an overall or global test of model adequacy. The appropriate hypotheses are

$$ H_0\colon \beta_1 = \beta_2 = \cdots = \beta_k = 0 \qquad\qquad
   H_1\colon \beta_j \neq 0 \ \text{for at least one } j $$

Rejection of this null hypothesis implies that at least one of the regressors x₁, x₂, ..., x_k contributes significantly to the model.
The test procedure is a generalization of the analysis of variance used in simple linear regression. The total sum of squares SS_T is partitioned into a sum of squares due to regression, SS_R, and a residual sum of squares, SS_Res. Thus,

$$ SS_T = SS_R + SS_{\text{Res}} $$

Appendix C.3 shows that if the null hypothesis is true, then SS_R/σ² follows a χ²_k distribution, which has the same number of degrees of freedom as the number of regressor variables in the model. Appendix C.3 also shows that SS_Res/σ² follows a χ²_{n−k−1} distribution and that SS_Res and SS_R are independent. By the definition of an F statistic given in Appendix C.1,

$$ F_0 = \frac{SS_R / k}{SS_{\text{Res}} / (n - k - 1)} = \frac{MS_R}{MS_{\text{Res}}} $$

follows the F_{k,n−k−1} distribution. Appendix C.3 shows that
$$ E(MS_{\text{Res}}) = \sigma^2, \qquad
   E(MS_R) = \sigma^2 + \frac{\boldsymbol{\beta}^{*\prime}\mathbf{X}_c'\mathbf{X}_c\boldsymbol{\beta}^{*}}{k} $$

where β* = (β₁, β₂, ..., β_k)' and X_c is the centered model matrix, so that E(MS_R) exceeds σ² whenever at least one β_j is nonzero.
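The F₀ statistic is easy to assemble from the ANOVA decomposition. The sketch below uses the sums of squares reported in the MINITAB output of Table 3.4 for the delivery time data; SciPy's F distribution is used only to attach a p-value, and the specific numbers are the ones quoted in that table:

```python
from scipy import stats

# ANOVA quantities from Table 3.4 (delivery time data, k = 2 regressors, n = 25).
ss_r, ss_res = 5550.8, 233.7
k, n = 2, 25

ms_r = ss_r / k                          # regression mean square
ms_res = ss_res / (n - k - 1)            # residual mean square
f0 = ms_r / ms_res                       # about 261, matching Table 3.4
p_value = stats.f.sf(f0, k, n - k - 1)   # upper-tail probability of F_{k, n-k-1}
print(f0, p_value)
```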