CHAPTER 3

Multiple Linear Regression

A regression model that involves more than one regressor variable is called a multiple regression model. Fitting and analyzing these models is discussed in this chapter. The results are extensions of those in Chapter 2 for simple linear regression.

3.1 MULTIPLE REGRESSION MODELS

Suppose that the yield in pounds of conversion in a chemical process depends on the temperature and the catalyst concentration. A multiple regression model that might describe this relationship is

    y = β0 + β1 x1 + β2 x2 + ε                                        (3.1)

where y denotes the yield, x1 denotes the temperature, and x2 denotes the catalyst concentration. This is a multiple linear regression model with two regressor variables. The term linear is used because Eq. (3.1) is a linear function of the unknown parameters β0, β1, and β2.

The regression model in Eq. (3.1) describes a plane in the three-dimensional space of y, x1, and x2. Figure 3.1a shows this regression plane for the model

    E(y) = 50 + 10x1 + 7x2

where we have assumed that the expected value of the error term ε in Eq. (3.1) is zero. The parameter β0 is the intercept of the regression plane. If the range of the data includes x1 = x2 = 0, then β0 is the mean of y when x1 = x2 = 0. Otherwise β0 has no physical interpretation. The parameter β1 indicates the expected change in the response (y) per unit change in x1 when x2 is held constant. Similarly β2 measures the expected change in y per unit change in x2 when x1 is held constant. Figure 3.1b shows a contour plot of the regression model, that is, lines of constant expected response E(y) as a function of x1 and x2. Notice that the contour lines in this plot are parallel straight lines.

Figure 3.1  (a) The regression plane for the model E(y) = 50 + 10x1 + 7x2. (b) The contour plot.

In general, the response y may be related to k regressor or predictor variables. The model

    y = β0 + β1 x1 + β2 x2 + ⋯ + βk xk + ε                            (3.2)

is called a multiple linear regression model with k regressors. The parameters βj, j = 0, 1, ..., k, are called the regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables xj. The parameter βj represents the expected change in the response y per unit change in xj when all of the remaining regressor variables xi (i ≠ j) are held constant. For this reason the parameters βj, j = 1, 2, ..., k, are often called partial regression coefficients.

Multiple linear regression models are often used as empirical models or approximating functions. That is, the true functional relationship between y and x1, x2, ..., xk is unknown, but over certain ranges of the regressor variables the linear regression model is an adequate approximation to the true unknown function.

Models that are more complex in structure than Eq. (3.2) may often still be analyzed by multiple linear regression techniques. For example, consider the cubic polynomial model

    y = β0 + β1 x + β2 x² + β3 x³ + ε                                 (3.3)

If we let x1 = x, x2 = x², and x3 = x³, then Eq. (3.3) can be written as

    y = β0 + β1 x1 + β2 x2 + β3 x3 + ε                                (3.4)

which is a multiple linear regression model with three regressor variables. Polynomial models will be discussed in more detail in Chapter 7.
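The substitution used in Eqs. (3.3)-(3.4) is easy to carry out numerically: the polynomial terms simply become extra columns of the design matrix, and an ordinary linear least-squares routine fits the model. The short sketch below illustrates this idea with NumPy; the data values and the use of numpy.linalg.lstsq are illustrative assumptions and are not part of the text.

```python
import numpy as np

# Hypothetical data for a single regressor x and response y (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 9.8, 28.9, 66.1, 126.0, 215.2])

# Cubic model y = b0 + b1*x + b2*x^2 + b3*x^3 + e, rewritten as a linear model
# in the "regressors" x1 = x, x2 = x^2, x3 = x^3 (Eqs. 3.3-3.4)
X = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Ordinary linear least squares recovers (b0, b1, b2, b3)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

The point is that the model is linear in the unknown coefficients even though it is a cubic function of x, so no special nonlinear fitting machinery is needed.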
Models that include interaction effects may also be analyzed by multiple linear regression methods. For example, suppose that the model is

    y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε                            (3.5)

If we let x3 = x1 x2 and β3 = β12, then Eq. (3.5) can be written as

    y = β0 + β1 x1 + β2 x2 + β3 x3 + ε                                (3.6)

which is a linear regression model. Figure 3.2a shows the three-dimensional plot of the regression model

    E(y) = 50 + 10x1 + 7x2 + 5x1x2

and Figure 3.2b the corresponding two-dimensional contour plot. Notice that, although this model is a linear regression model, the shape of the surface that is generated by the model is not linear. In general, any regression model that is linear in the parameters (the β's) is a linear regression model, regardless of the shape of the surface that it generates.

Figure 3.2  (a) Three-dimensional plot of the regression model E(y) = 50 + 10x1 + 7x2 + 5x1x2. (b) The contour plot.

Figure 3.2 provides a nice graphical interpretation of an interaction. Generally, interaction implies that the effect produced by changing one variable (x1, say) depends on the level of the other variable (x2). For example, Figure 3.2 shows that changing x1 from 2 to 8 produces a much smaller change in E(y) when x2 = 2 than when x2 = 10. Interaction effects occur frequently in the study and analysis of real-world systems, and regression methods are one of the techniques that we can use to describe them.
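The interaction effect described above can be verified directly from the model equation. The minimal sketch below (an illustration, not part of the text) evaluates E(y) = 50 + 10x1 + 7x2 + 5x1x2 at the two levels of x2 mentioned and prints the change in mean response as x1 moves from 2 to 8.

```python
def mean_response(x1, x2):
    # E(y) for the interaction model plotted in Figure 3.2
    return 50 + 10 * x1 + 7 * x2 + 5 * x1 * x2

for x2 in (2, 10):
    change = mean_response(8, x2) - mean_response(2, x2)
    print(f"x2 = {x2:2d}: change in E(y) as x1 goes from 2 to 8 is {change}")

# x2 = 2 gives a change of 120, while x2 = 10 gives 360, so the effect of x1
# depends on the level of x2 -- the signature of an interaction.
```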
As a final example, consider the second-order model with interaction

    y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε        (3.7)

If we let x3 = x1², x4 = x2², x5 = x1x2, β3 = β11, β4 = β22, and β5 = β12, then Eq. (3.7) can be written as a multiple linear regression model as follows:

    y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + β5 x5 + ε

Figure 3.3 shows the three-dimensional plot and the corresponding contour plot for

    E(y) = 800 + 10x1 + 7x2 - 8.5x1² - 5x2² + 4x1x2

Figure 3.3  (a) Three-dimensional plot of the regression model E(y) = 800 + 10x1 + 7x2 - 8.5x1² - 5x2² + 4x1x2. (b) The contour plot.

These plots indicate that the expected change in y when x1 is changed by one unit (say) is a function of both x1 and x2. The quadratic and interaction terms in this model produce a mound-shaped function. Depending on the values of the regression coefficients, the second-order model with interaction is capable of assuming a wide variety of shapes; thus, it is a very flexible regression model.

In most real-world problems, the values of the parameters (the regression coefficients βj) and the error variance σ² will not be known, and they must be estimated from sample data. The fitted regression equation or model is typically used in prediction of future observations of the response variable y or for estimating the mean response at particular levels of the x's.

3.2 ESTIMATION OF THE MODEL PARAMETERS

3.2.1 Least-Squares Estimation of the Regression Coefficients

The method of least squares can be used to estimate the regression coefficients in Eq. (3.2). Suppose that n > k observations are available, and let yi denote the ith observed response and xij denote the ith observation or level of regressor xj. The data will appear as in Table 3.1. We assume that the error term ε in the model has E(ε) = 0 and Var(ε) = σ² and that the errors are uncorrelated.

TABLE 3.1  Data for Multiple Linear Regression

    Observation, i    Response, y        Regressors
                                         x1      x2      ...     xk
    1                 y1                 x11     x12     ...     x1k
    2                 y2                 x21     x22     ...     x2k
    ⋮                 ⋮                  ⋮       ⋮               ⋮
    n                 yn                 xn1     xn2     ...     xnk

Throughout this chapter we will assume that the regressor variables x1, x2, ..., xk are fixed (i.e., mathematical or nonrandom) variables, measured without error. However, just as was discussed in Chapter 2 for the simple linear regression model, all of our results are still valid for the case where the regressors are random variables. This is certainly important, because when regression data arise from an observational study, some or most of the regressors will be random variables. When the data result from a designed experiment, it is more likely that the x's will be fixed variables. When the x's are random variables, it is only necessary that the observations on each regressor be independent and that the distribution not depend on the regression coefficients (the β's) or on σ². When testing hypotheses or constructing confidence intervals, we will also have to assume that the conditional distribution of y given x1, x2, ..., xk is normal with mean β0 + β1x1 + β2x2 + ⋯ + βkxk and variance σ².

We may write the sample regression model corresponding to Eq. (3.2) as

    yi = β0 + β1 xi1 + β2 xi2 + ⋯ + βk xik + εi
       = β0 + Σ_{j=1}^{k} βj xij + εi,    i = 1, 2, ..., n            (3.8)

The least-squares function is

    S(β0, β1, ..., βk) = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} (yi - β0 - Σ_{j=1}^{k} βj xij)²      (3.9)

The function S must be minimized with respect to β0, β1, ..., βk. The least-squares estimators of β0, β1, ..., βk must satisfy

    ∂S/∂β0 evaluated at β̂0, β̂1, ..., β̂k:  -2 Σ_{i=1}^{n} (yi - β̂0 - Σ_{j=1}^{k} β̂j xij) = 0          (3.10a)

and

    ∂S/∂βj evaluated at β̂0, β̂1, ..., β̂k:  -2 Σ_{i=1}^{n} (yi - β̂0 - Σ_{j=1}^{k} β̂j xij) xij = 0,    j = 1, 2, ..., k      (3.10b)

Simplifying Eq. (3.10), we obtain the least-squares normal equations

    n β̂0      + β̂1 Σ xi1      + β̂2 Σ xi2      + ⋯ + β̂k Σ xik      = Σ yi
    β̂0 Σ xi1  + β̂1 Σ xi1²     + β̂2 Σ xi1 xi2  + ⋯ + β̂k Σ xi1 xik  = Σ xi1 yi
        ⋮
    β̂0 Σ xik  + β̂1 Σ xik xi1  + β̂2 Σ xik xi2  + ⋯ + β̂k Σ xik²     = Σ xik yi      (3.11)

where all sums run over i = 1, 2, ..., n. Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations will be the least-squares estimators β̂0, β̂1, ..., β̂k.

It is more convenient to deal with multiple regression models if they are expressed in matrix notation. This allows a very compact display of the model, data, and results. In matrix notation, the model given by Eq. (3.8) is

    y = Xβ + ε

where

    y = [y1, y2, ..., yn]'

    X = [ 1   x11   x12   ...   x1k ]
        [ 1   x21   x22   ...   x2k ]
        [ ⋮    ⋮     ⋮            ⋮  ]
        [ 1   xn1   xn2   ...   xnk ]

    β = [β0, β1, ..., βk]'    and    ε = [ε1, ε2, ..., εn]'

In general, y is an n × 1 vector of the observations, X is an n × p matrix of the levels of the regressor variables, β is a p × 1 vector of the regression coefficients, and ε is an n × 1 vector of random errors.

We wish to find the vector of least-squares estimators, β̂, that minimizes

    S(β) = Σ_{i=1}^{n} εi² = ε'ε = (y - Xβ)'(y - Xβ)
Note that S(β) may be expressed as

    S(β) = y'y - β'X'y - y'Xβ + β'X'Xβ
         = y'y - 2β'X'y + β'X'Xβ

since β'X'y is a 1 × 1 matrix, or a scalar, and its transpose (β'X'y)' = y'Xβ is the same scalar. The least-squares estimators must satisfy

    ∂S/∂β evaluated at β̂:  -2X'y + 2X'Xβ̂ = 0

which simplifies to

    X'Xβ̂ = X'y                                                        (3.12)

Equations (3.12) are the least-squares normal equations. They are the matrix analogue of the scalar presentation in (3.11).

To solve the normal equations, multiply both sides of (3.12) by the inverse of X'X. Thus, the least-squares estimator of β is

    β̂ = (X'X)⁻¹X'y                                                    (3.13)

provided that the inverse matrix (X'X)⁻¹ exists. The (X'X)⁻¹ matrix will always exist if the regressors are linearly independent, that is, if no column of the X matrix is a linear combination of the other columns.

It is easy to see that the matrix form of the normal equations (3.12) is identical to the scalar form (3.11). Writing out (3.12) in detail, we obtain

    [ n         Σ xi1        Σ xi2        ...   Σ xik      ] [ β̂0 ]   [ Σ yi     ]
    [ Σ xi1     Σ xi1²       Σ xi1 xi2    ...   Σ xi1 xik  ] [ β̂1 ]   [ Σ xi1 yi ]
    [ ⋮           ⋮            ⋮                  ⋮         ] [ ⋮  ] = [ ⋮        ]
    [ Σ xik     Σ xik xi1    Σ xik xi2    ...   Σ xik²     ] [ β̂k ]   [ Σ xik yi ]

If the indicated matrix multiplication is performed, the scalar form of the normal equations (3.11) is obtained. In this display we see that X'X is a p × p symmetric matrix and X'y is a p × 1 column vector. Note the special structure of the X'X matrix: the diagonal elements of X'X are the sums of squares of the elements in the columns of X, and the off-diagonal elements are the sums of cross products of the elements in the columns of X. Furthermore, note that the elements of X'y are the sums of cross products of the columns of X and the observations yi.

The fitted regression model corresponding to the levels of the regressor variables x' = [1, x1, x2, ..., xk] is

    ŷ = x'β̂ = β̂0 + Σ_{j=1}^{k} β̂j xj

The vector of fitted values ŷi corresponding to the observed values yi is

    ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy                                         (3.14)

The n × n matrix H = X(X'X)⁻¹X' is usually called the hat matrix. It maps the vector of observed values into a vector of fitted values. The hat matrix and its properties play a central role in regression analysis.

The difference between the observed value yi and the corresponding fitted value ŷi is the residual ei = yi - ŷi. The n residuals may be conveniently written in matrix notation as

    e = y - ŷ                                                         (3.15a)

There are several other ways to express the vector of residuals e that will prove useful, including

    e = y - Xβ̂ = y - Hy = (I - H)y                                    (3.15b)
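Before turning to a worked example, the quantities in Eqs. (3.13)-(3.15) are easy to compute with any matrix package. The sketch below is a minimal NumPy illustration on a small made-up data set (the numbers are not from the text); it solves the normal equations directly rather than forming (X'X)⁻¹ explicitly, which is the numerically preferable route, and checks the basic hat-matrix properties.

```python
import numpy as np

# Small hypothetical data set (illustrative values only): columns of X are 1, x1, x2
X = np.array([[1.0, 2.0, 50.0],
              [1.0, 3.0, 40.0],
              [1.0, 5.0, 70.0],
              [1.0, 7.0, 65.0],
              [1.0, 8.0, 90.0]])
y = np.array([12.0, 13.5, 20.1, 24.3, 28.0])

# Least-squares estimate from the normal equations X'X b = X'y (Eqs. 3.12-3.13)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^-1 X' (Eq. 3.14), fitted values, and residuals (Eq. 3.15)
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
e = (np.eye(len(y)) - H) @ y

print(beta_hat)
print(np.allclose(H, H.T), np.allclose(H @ H, H))   # H is symmetric and idempotent
print(np.allclose(e, y - X @ beta_hat))             # e = y - X beta_hat = (I - H) y
```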
Example 3.1  The Delivery Time Data

A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time (y) are the number of cases of product stocked (x1) and the distance walked by the route driver (x2). The engineer has collected 25 observations on delivery time, which are shown in Table 3.2. (Note that this is an expansion of the data set used in Example 2.9.) We will fit the multiple linear regression model

    y = β0 + β1 x1 + β2 x2 + ε

to the delivery time data in Table 3.2.

TABLE 3.2  Delivery Time Data for Example 3.1

    Observation    Delivery Time,    Number of    Distance,
    Number         y (min)           Cases, x1    x2 (ft)
    1              16.68              7             560
    2              11.50              3             220
    3              12.03              3             340
    4              14.88              4              80
    5              13.75              6             150
    6              18.11              7             330
    7               8.00              2             110
    8              17.83              7             210
    9              79.24             30            1460
    10             21.50              5             605
    11             40.33             16             688
    12             21.00             10             215
    13             13.50              4             255
    14             19.75              6             462
    15             24.00              9             448
    16             29.00             10             776
    17             15.35              6             200
    18             19.00              7             132
    19              9.50              3              36
    20             35.10             17             770
    21             17.90             10             140
    22             52.32             26             810
    23             18.75              9             450
    24             19.83              8             635
    25             10.75              4             150

Graphics can be very useful in fitting multiple regression models. Figure 3.4 is a scatterplot matrix of the delivery time data. This is just a two-dimensional array of two-dimensional plots, where (except for the diagonal) each frame contains a scatter diagram. Thus, each plot is an attempt to shed light on the relationship between a pair of variables. This is often a better summary of the relationships than a numerical summary (such as displaying the correlation coefficients between each pair of variables) because it gives a sense of linearity or nonlinearity of the relationship and some awareness of how the individual data points are arranged over the region.

Figure 3.4  Scatterplot matrix for the delivery time data from Example 3.1.

When there are only two regressors, sometimes a three-dimensional scatter diagram is useful in visualizing the relationship between the response and the regressors. Figure 3.5 presents this plot for the delivery time data. By spinning these plots, some software packages permit different views of the point cloud. This view provides an indication that a multiple linear regression model may provide a reasonable fit to the data.

Figure 3.5  Three-dimensional scatterplot of the delivery time data from Example 3.1.
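A plot like Figure 3.4 is straightforward to produce with common data-analysis tools. The sketch below is one possible way to do it, assuming the data of Table 3.2 have been placed in a pandas DataFrame; pandas and matplotlib are assumed to be available, and only the first few rows are typed in here for brevity.

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# First few observations of Table 3.2; the remaining rows are entered the same way
delivery = pd.DataFrame({
    "time":     [16.68, 11.50, 12.03, 14.88, 13.75],   # delivery time, min
    "cases":    [7, 3, 3, 4, 6],                       # number of cases, x1
    "distance": [560, 220, 340, 80, 150],              # distance walked, ft, x2
})

# Scatterplot matrix analogous to Figure 3.4
scatter_matrix(delivery, diagonal="hist")
plt.show()
```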
To fit the multiple regression model we first form the X matrix and y vector:

    X = [ 1    7    560 ]        y = [ 16.68 ]
        [ 1    3    220 ]            [ 11.50 ]
        [ 1    3    340 ]            [ 12.03 ]
        [ 1    4     80 ]            [ 14.88 ]
        [ 1    6    150 ]            [ 13.75 ]
        [ 1    7    330 ]            [ 18.11 ]
        [ 1    2    110 ]            [  8.00 ]
        [ 1    7    210 ]            [ 17.83 ]
        [ 1   30   1460 ]            [ 79.24 ]
        [ 1    5    605 ]            [ 21.50 ]
        [ 1   16    688 ]            [ 40.33 ]
        [ 1   10    215 ]            [ 21.00 ]
        [ 1    4    255 ]            [ 13.50 ]
        [ 1    6    462 ]            [ 19.75 ]
        [ 1    9    448 ]            [ 24.00 ]
        [ 1   10    776 ]            [ 29.00 ]
        [ 1    6    200 ]            [ 15.35 ]
        [ 1    7    132 ]            [ 19.00 ]
        [ 1    3     36 ]            [  9.50 ]
        [ 1   17    770 ]            [ 35.10 ]
        [ 1   10    140 ]            [ 17.90 ]
        [ 1   26    810 ]            [ 52.32 ]
        [ 1    9    450 ]            [ 18.75 ]
        [ 1    8    635 ]            [ 19.83 ]
        [ 1    4    150 ]            [ 10.75 ]

The X'X matrix is

    X'X = [    25       219      10,232 ]
          [   219     3,055     133,899 ]
          [ 10,232   133,899   6,725,688 ]

and the X'y vector is

    X'y = [    559.60 ]
          [  7,375.44 ]
          [ 337,072.00 ]

The least-squares estimator of β is

    β̂ = (X'X)⁻¹X'y

or

    β̂ = [    25       219      10,232 ]⁻¹ [    559.60 ]
        [   219     3,055     133,899 ]   [  7,375.44 ]
        [ 10,232   133,899   6,725,688 ]  [ 337,072.00 ]

      = [  0.11321518   -0.00444859   -0.00008367 ] [    559.60 ]   [ 2.34123115 ]
        [ -0.00444859    0.00274378   -0.00004786 ] [  7,375.44 ] = [ 1.61590712 ]
        [ -0.00008367   -0.00004786    0.00000123 ] [ 337,072.00 ]   [ 0.01438483 ]

The least-squares fit (with the regression coefficients reported to five decimals) is

    ŷ = 2.34123 + 1.61591x1 + 0.01438x2

Table 3.3 shows the observations yi along with the corresponding fitted values ŷi and the residuals ei from this model.

TABLE 3.3  Observations, Fitted Values, and Residuals for Example 3.1

    Observation Number    yi        ŷi         ei
    1                     16.68     21.7081    -5.0281
    2                     11.50     10.3536     1.1464
    3                     12.03     12.0798    -0.0498
    4                     14.88      9.9556     4.9244
    5                     13.75     14.1944    -0.4444
    6                     18.11     18.3996    -0.2896
    7                      8.00      7.1554     0.8446
    8                     17.83     16.6734     1.1566
    9                     79.24     71.8203     7.4197
    10                    21.50     19.1236     2.3764
    11                    40.33     38.0925     2.2375
    12                    21.00     21.5930    -0.5930
    13                    13.50     12.4730     1.0270
    14                    19.75     18.6825     1.0675
    15                    24.00     23.3288     0.6712
    16                    29.00     29.6629    -0.6629
    17                    15.35     14.9136     0.4364
    18                    19.00     15.5514     3.4486
    19                     9.50      7.7068     1.7932
    20                    35.10     40.8880    -5.7880
    21                    17.90     20.5142    -2.6142
    22                    52.32     56.0065    -3.6865
    23                    18.75     23.3576    -4.6076
    24                    19.83     24.4028    -4.5728
    25                    10.75     10.9626    -0.2126

Computer Output  Table 3.4 presents a portion of the MINITAB output for the soft drink delivery time data in Example 3.1. While the output format differs from one computer program to another, this display contains the information typically generated. Most of the output in Table 3.4 is a straightforward extension to the multiple regression case of the computer output for simple linear regression. In the next few sections we will provide explanations of this output information.

TABLE 3.4  MINITAB Output for Soft Drink Delivery Time Data

    Regression Analysis: Time versus Cases, Distance

    The regression equation is
    Time = 2.34 + 1.62 Cases + 0.0144 Distance

    Predictor     Coef        SE Coef     T       P
    Constant      2.341       1.097       2.13    0.044
    Cases         1.6159      0.1707      9.46    0.000
    Distance      0.014385    0.003613    3.98    0.001

    S = 3.25947    R-Sq = 96.0%    R-Sq(adj) = 95.6%

    Analysis of Variance

    Source            DF    SS        MS        F         P
    Regression         2    5550.8    2775.4    261.24    0.000
    Residual Error    22     233.7      10.6
    Total             24    5784.5

    Source      DF    Seq SS
    Cases        1    5382.4
    Distance     1     168.4
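The hand computation shown above is easy to reproduce with a matrix package. The sketch below (an illustration using NumPy, not part of the book) rebuilds X and y from the data of Table 3.2 and checks X'X, X'y, and the fitted coefficients against the values reported above.

```python
import numpy as np

# Delivery time data from Table 3.2: y = delivery time (min), x1 = cases, x2 = distance (ft)
y  = np.array([16.68, 11.50, 12.03, 14.88, 13.75, 18.11,  8.00, 17.83, 79.24, 21.50,
               40.33, 21.00, 13.50, 19.75, 24.00, 29.00, 15.35, 19.00,  9.50, 35.10,
               17.90, 52.32, 18.75, 19.83, 10.75])
x1 = np.array([ 7,  3,  3,  4,  6,  7,  2,  7, 30,  5, 16, 10,  4,  6,  9, 10,  6,  7,
                3, 17, 10, 26,  9,  8,  4], dtype=float)
x2 = np.array([560, 220, 340,  80, 150, 330, 110, 210, 1460, 605, 688, 215, 255, 462,
               448, 776, 200, 132,  36, 770, 140, 810, 450, 635, 150], dtype=float)

X = np.column_stack([np.ones_like(y), x1, x2])

print(X.T @ X)                                   # X'X: compare with the 3 x 3 matrix above
print(X.T @ y)                                   # X'y: [559.60, 7375.44, 337072.00]
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                                  # approximately [2.34123, 1.61591, 0.01438]
```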
3.2.2 A Geometrical Interpretation of Least Squares

An intuitive geometrical interpretation of least squares is sometimes helpful. We may think of the vector of observations y' = [y1, y2, ..., yn] as defining a vector from the origin to the point A in Figure 3.6. Note that y1, y2, ..., yn form the coordinates of an n-dimensional sample space. The sample space in Figure 3.6 is three-dimensional.

Figure 3.6  A geometrical interpretation of least squares.

The X matrix consists of p (n × 1) column vectors, for example, 1 (a column vector of 1's), x1, x2, ..., xk. Each of these columns defines a vector from the origin in the sample space. These p vectors form a p-dimensional subspace called the estimation space. The estimation space for p = 2 is shown in Figure 3.6. We may represent any point in this subspace by a linear combination of the vectors 1, x1, ..., xk. Thus, any point in the estimation space is of the form Xβ. Let the vector Xβ determine the point B in Figure 3.6. The squared distance from B to A is just

    S(β) = (y - Xβ)'(y - Xβ)

Therefore, minimizing the squared distance of point A defined by the observation vector y to the estimation space requires finding the point in the estimation space that is closest to A. The squared distance will be a minimum when the point in the estimation space is the foot of the line from A normal (or perpendicular) to the estimation space. This is point C in Figure 3.6. This point is defined by the vector ŷ = Xβ̂. Therefore, since y - ŷ = y - Xβ̂ is perpendicular to the estimation space, we may write

    X'(y - Xβ̂) = 0    or    X'Xβ̂ = X'y

which we recognize as the least-squares normal equations.

3.2.3 Properties of the Least-Squares Estimators

The statistical properties of the least-squares estimator β̂ may be easily demonstrated. Consider first bias:

    E(β̂) = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'(Xβ + ε)]
         = E[(X'X)⁻¹X'Xβ + (X'X)⁻¹X'ε] = β

since E(ε) = 0 and (X'X)⁻¹X'X = I. Thus, β̂ is an unbiased estimator of β.
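Unbiasedness is easy to see empirically with a small simulation. The sketch below (an illustration with made-up design and coefficient values, not from the text) repeatedly generates responses from a known β, refits the model each time, and averages the estimates; the average is close to the true β, as E(β̂) = β implies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed design matrix and "true" coefficients (illustrative values only)
X = np.column_stack([np.ones(20), rng.uniform(0, 10, 20), rng.uniform(0, 100, 20)])
beta_true = np.array([5.0, 2.0, -0.3])
sigma = 2.0

# Repeatedly simulate y = X beta + e with normal errors and refit by least squares
estimates = []
for _ in range(5000):
    y = X @ beta_true + rng.normal(0.0, sigma, size=len(X))
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))

print(np.mean(estimates, axis=0))   # close to [5.0, 2.0, -0.3]
```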
The residual mean squareis MS*", : scrc'-lrs thr rq SinceX'XP : X'y, this last equationbecomes SS*",: y'y - B'x'y \B '"::tDU ! mf: SS*",:(v-I^B)'(v-xB) th :3i:-s : -Ir r;;lre -Fr-r of Sce r {}gs: (3.r7) h:*c: r |tt t( I thr t'i rf scel ESTIMATION D ficndix -tor 77 OF THE MODEL PARAMETERS C.3 also showsthat the expected value of MS*", is o2, so an unbiased of o2 is given by 62 : MS*", (3.18) fi ooted in the simple linear regressioncase, this estimator of c2 is model f:rdent. hple 3.2 The DeliveryTime Data estimatethe error varianceaz for the multiple regressionmodel fit to the triU J drink deliverytime data in Example3.1. Since 25 y ' y : D y ? : 18,310.6290" i:r ' 559.601 | ^ : p7s.44 7 r.6l5s0721 0.01438483I B'X'y lz.z+tzztts | | L337,072.00 ) : 18,076.90304 t residualsum of squaresis SS*"r:Y'Y-B'x'Y :233.7260 : 18,3L0 .6290- 18,076.9030 Drefore, the estimateof o2 is the residualmean square SS*", 62: n-p - 233.7260 : 25-3 10.6239 MINITAB output in Table 3.4 reports the residualmean squareas 10.6. The model-dependentnature of this estimate a2 may be easilydemonstrated. fture 2.13 displaysthe computer output from a least-squaresfit to the delivery be data using only one regressor,cases(xr). The residual mean squarefor this Ddel is 17.5,which is considerablylarger than the result obtained abovefor the bregressor model. Which estimateis "correct"? Both estimatesare in a sense Grect, but they depend heavily on the choice of model. Perhapsa better question I shich model is correct?Since o2 is the varianceof the errors (the unexplained fise about the regressionline), we would usually prefer a model with a small rrilual mean squareto a model with a large one. llc 12.5 Inadequacyof ScatterDiagrams in Multiple Regression Je sawin Chapter 2 that the scatter diagram is an important tool in analyzingthe dationship between y and r in simple linear regression.We also saw in Example t,l that a matrix of scatterplotswas useful in visualizing the relationship between 78 : MULTIPLE LINEAR REGRESSION X1 1021 1732 4845 27 12 5556 2664 973co 1684 X2 r--l tJtl lf. ][ ,' t r--l tJ l .J [----]rI l..l ' o (f) I '^ o br rs in the simple I Lrr:( srtimators for tl -rt. crTors are nol r*fr..i:rrf,\. .l '.h r:f c:rf rt t " l tl mOde Srrors are nol is distributed ', [,', 10 The tiL"l r-----ll tl r'1 \laximum-Likel F l."lti l-. OF T f,E:arc- nearly indep r?so s€r'eral importz o:rra:lms can be very Grc. r =L'rn'eensevera o ro ' | FJTIMATION ] 30 Figure 3.7 50 A matrix of scatterplots. y and two regressors.It is tempting to concludethat this is a generalconcept;that is, examiningscatter diagramsof y versus xp ! versus xz,. .., y versus Jk is alwaysuseful in assessing the relationshipsbetween y and each of the regressors xp x2t. . ., xk. Unfortunately,this is not true in general. Following Daniel and Wood [1980],we illustrate the inadequacyof scatter diagramsfor a problemwith t'woregressors.Considerthe data shownin Figure 3.7. These data were generatedfrom the equation Y:8-5xr*L2x, The matrix of scatterplotsis shown in Figure 3.7.The y-versus-x,plot does not exhibit any apparentrelationshipbetweenthe two variables.The y-versus-x,plot indicatesthat a linear relationship exists,with a slope of approximately8. Note that both scatter diagramsconveyerroneousinformation. Since in this data set there are two pairs of points that have the samex, values(xz: 2 and xz:^ 4), we could measure the x, effect at fixed x, from both pairs. This gives, Fr: - 8): -5for xz:4 (tl -27)/(3 - 1): -5for xz ^2and Ft:QS-L6)/(6 the correct results. 
Knowing Fp we could now estimate the x2 effect. This procedure is not generallyuseful, however,becausemany data sets do not have duplicatepoints. This example illustrates that constructing scatter diagrams of y versus xj (j : 1,2, . . . , k) can be misleading, even in the case of only two regressors operatingin a perfectly additivefashionwith no noise.A more realisticrejression situation with severalregressorsand error in the y's would confusethe situation evenfurther. If there is only one (or a few) dominant regressor,or if the regressors * *, tis*:..xxt functior r€ ielihood fur / - tr . p . o b rr[= Lc jan u'ritt .rr.I. '|tr F.o.) s'-;'13 linear * l.[n I p. o:l = n:r' 1"r3a t-Uer : Tkretort a:lJ;*1 'ti.s-r.lrj 3^Cnt est 79 ESTIMATION OF THE MODEL PARAMETERS nearly independently,the matrix of scatterplots is most useful. Flowever, severalimportant regressorsare themselvesinterrelated,then these scatter can be very misleading.Analytical methods for sorting out the relationbetweenseveralregressorsand a responseare discussedin Chapter 9. Maximum-Likelihood Estimation c in the simple linear regressioncase,we can show that the maximum-likeliestimatorsfor the model parametersin multiple linear regressionwhen the errors are normally and independentlydistributed are also least-squares . The model is E Y:XBT the errors are normally and independently distributed with constant variance '. q e is distributed as N(0, o2I). The normal density function for the errors is 1 f(",): 1 I _\ , ^ e x p-l ^z a -, r e f l av z7r \ I likelihood function is the joint density of a1, 82,. . . 1 Ent or I-Il:r fG). the likelihood function is : n f (",): ^+r, -*pf L(t,F,.-') ^ +r'r ' (2n)"'o' i:1. \ 2o' There- ) I sincewe can write r : y - X P, the likelihood function becomes L(y,x, F,o\' : -+exp[(2o)'/'on ' \ - xp)'(y - xp)) '-F F ) \r +(v 20" I in the simple linear regressioncase,it is convenientto work with the log of the l n z ( vx, , B , . - ' ) : - ; h ( 2 n ) - n t n ( o )- fift-xD '(y- xp) I b clear that for a fixed value of o the log-likelihoodis maximizedwhen the term (v-xF)'0-xB) i minimized. Therefore, the maximum-likelihoodestimator of B under normal errors is equivalent to the least-squaresestimator F : (X'X)-tX'y. The Drimum-liketihood estimator of o2 is -) o.- - (y-xF)'(v-xB) 80 MULTIPLE LINEAR REGRESSION . , II | H E S I S T E I These are multiple linear regression generalizations of the results given for simple linear regression in Section 2.10. The statistical properties of the maximum-likelihood estimators are summarized in Section 2.L0. 3.3 I{YPOTHESIS TESTING IN MULTIPLE lj :(Ft,F LINEAR RBGRESSION Once we have estimatedthe parametersin the model, we face two immediate questions: 1. What is the overalladequacyof the model? 2. Which specificregressors seemimportant? --rcd me r;ir that . :hcn F( l:rcJomI Severalhypothesistestingproceduresprove usefulfor addressingthesequestions. The formal tests require that our random errors be independentand follow a normaldistribution with mean E(e,):0 andvarianceVar(e,): o2. 3.3.1 Test for Significance of Regression The test for significance of regression is a test to determine if there is a linear relationship between the response y and any of the regressor variables x12x.t xo. This procedure is often thought of as an overall or global test of model adequacy.The appropriate hypothesesare Ho: Fr: :9*:0 Fr: Hr: B, + 0 ::.:litrpi '.-.1\t On : ] l ' \ L l t ct for at least one / . 
Rejection of this null hypothesis implies that at least one of the regressors x11x2 xo contributes significantly to the model. The test procedure is a generalization of the analysis of variance used in simple linear regression. The total sum of squares SS., is partitioned into a sum of squares due to regression, SSp, and a residual sum of squares, SS*"r. Thus, 1.,) |\ ' ':t.rl f, SSr:SS*+SSo", Appendix C.3 shows that if the null hypothesis is true, then SS*/o2 follows & X;: distribution, which has the same number of degrees of freedom as number of regressorvariables in the model. Appendix C.3 also shows that SS*.,/ o' - Xn2-tand that SS*". and SS* are independent. By the definition of an F statistic given in Appendix C.1, rtro - ssR/k MS* SS*../(n-k-1) MS*", r.j. 1;f follows the Fo.n-k-l distribution. Appendix C.3 shows that . . tt ,) E(MSR..): o' E(MSR): o'+ F* F*'X'rX.. kd 'il'irnr'!3,.' "{HEFflIll