Chapter 7 Multivariate Linear Regression Models

7.1-7.3: Least Squares Estimation

Data available:
$$(Y_1, z_{11}, z_{12}, \ldots, z_{1r}),\ (Y_2, z_{21}, z_{22}, \ldots, z_{2r}),\ \ldots,\ (Y_n, z_{n1}, z_{n2}, \ldots, z_{nr}).$$
The multiple linear regression model for the above data is
$$Y_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \cdots + \beta_r z_{ir} + \varepsilon_i, \quad i = 1, \ldots, n,$$
where the error terms are assumed to have the following properties:
1. $E(\varepsilon_i) = 0$;
2. $\mathrm{Var}(\varepsilon_i) = \sigma^2$;
3. $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \neq j$.

The above data can be represented in matrix form. Let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
Z = \begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$
Then,
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 z_{11} + \cdots + \beta_r z_{1r} + \varepsilon_1 \\ \beta_0 + \beta_1 z_{21} + \cdots + \beta_r z_{2r} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 z_{n1} + \cdots + \beta_r z_{nr} + \varepsilon_n \end{bmatrix}
= \begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= Z\beta + \varepsilon,$$
where the error terms satisfy
1. $E(\varepsilon) = 0$;
2. $\mathrm{Cov}(\varepsilon) = E(\varepsilon \varepsilon^t) = \sigma^2 I$.

Least squares method: The least squares method finds the estimate of $\beta$ minimizing the sum of squares of the residuals,
$$S(\beta) = S(\beta_0, \beta_1, \ldots, \beta_r) = \sum_{i=1}^n \varepsilon_i^2 = \varepsilon^t \varepsilon = (Y - Z\beta)^t (Y - Z\beta), \quad \text{since } \varepsilon = Y - Z\beta.$$
Expanding $S(\beta)$ gives
$$S(\beta) = (Y - Z\beta)^t (Y - Z\beta) = (Y^t - \beta^t Z^t)(Y - Z\beta) = Y^t Y - Y^t Z \beta - \beta^t Z^t Y + \beta^t Z^t Z \beta = Y^t Y - 2\beta^t Z^t Y + \beta^t Z^t Z \beta,$$
since $\beta^t Z^t Y = (\beta^t Z^t Y)^t = Y^t Z \beta$ is a real number.

Note: For two matrices $A$ and $B$, $(AB)^t = B^t A^t$ and $(A^t)^{-1} = (A^{-1})^t$.

Similar to the procedure for finding the minimum of a function in calculus, the least squares estimate $\hat\beta$ can be found by solving the equation based on the first derivative of $S(\beta)$:
$$\frac{\partial S}{\partial \beta} = \begin{bmatrix} \partial S / \partial \beta_0 \\ \partial S / \partial \beta_1 \\ \vdots \\ \partial S / \partial \beta_r \end{bmatrix}
= \frac{\partial}{\partial \beta}\left( Y^t Y - 2\beta^t Z^t Y + \beta^t Z^t Z \beta \right)
= -2 Z^t Y + 2 Z^t Z \beta = 0$$
$$\Longrightarrow \quad Z^t Z \beta = Z^t Y \quad \Longrightarrow \quad \hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_r \end{bmatrix} = (Z^t Z)^{-1} Z^t Y.$$

The fitted values (in vector form):
$$\hat Y = \begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix} = Z \hat\beta.$$
The residuals (in vector form):
$$\hat\varepsilon = \begin{bmatrix} \hat\varepsilon_1 \\ \hat\varepsilon_2 \\ \vdots \\ \hat\varepsilon_n \end{bmatrix} = Y - \hat Y = Y - Z\hat\beta.$$

Note:
(i) $\dfrac{\partial (a^t \beta)}{\partial \beta} = \dfrac{\partial \left( \sum_{i=0}^r a_i \beta_i \right)}{\partial \beta} = a$, where $a = [a_0, a_1, \ldots, a_r]^t$ and $\beta = [\beta_0, \beta_1, \ldots, \beta_r]^t$ are $(r+1) \times 1$ vectors.
(ii) $\dfrac{\partial (\beta^t A \beta)}{\partial \beta} = \dfrac{\partial \left( \sum_{i=0}^r \sum_{j=0}^r a_{ij} \beta_i \beta_j \right)}{\partial \beta} = 2 A \beta$, where $A$ is any symmetric $(r+1) \times (r+1)$ matrix.

Note: Since $(Z^t Z)^t = Z^t (Z^t)^t = Z^t Z$, $Z^t Z$ is a symmetric matrix. Also, $\left[ (Z^t Z)^{-1} \right]^t = \left[ (Z^t Z)^t \right]^{-1} = (Z^t Z)^{-1}$, so $(Z^t Z)^{-1}$ is a symmetric matrix as well.

Note: $Z^t Z \beta = Z^t Y$ is called the normal equation.

Note:
$$\hat\varepsilon^t Z = (Y^t - \hat\beta^t Z^t) Z = Y^t Z - Y^t Z (Z^t Z)^{-1} Z^t Z = Y^t Z - Y^t Z = 0.$$
Therefore, if there is an intercept, then the first column of $Z$ is $[1, 1, \ldots, 1]^t$. Then,
$$\hat\varepsilon^t Z = [\hat\varepsilon_1\ \hat\varepsilon_2\ \cdots\ \hat\varepsilon_n]
\begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix} = 0
\quad \Longrightarrow \quad \sum_{i=1}^n \hat\varepsilon_i = 0.$$

Note: For the linear regression model without the intercept, $\sum_{i=1}^n \hat\varepsilon_i$ might not be equal to 0.

Note:
$$\hat Y = Z\hat\beta = Z (Z^t Z)^{-1} Z^t Y = HY, \quad \text{where } H = Z (Z^t Z)^{-1} Z^t$$
is called the "hat" matrix (or projection matrix). Thus,
$$\hat\varepsilon = Y - Z\hat\beta = Y - HY = (I - H) Y.$$

Example 1: Heller Company manufactures lawn mowers and related lawn equipment. The managers believe the quantity of lawn mowers sold depends on the price of the mower and the price of a competitor's mower. We have the following data:

Competitor's Price z_i1    Heller's Price z_i2    Quantity sold y_i
        120                       100                   102
        140                       110                   100
        190                        90                   120
        130                       150                    77
        155                       210                    46
        175                       150                    93
        125                       250                    26
        145                       270                    69
        180                       300                    65
        150                       250                    85

The regression model for the above data is
$$Y_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \varepsilon_i.$$
The data in matrix form are
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_{10} \end{bmatrix} = \begin{bmatrix} 102 \\ 100 \\ \vdots \\ 85 \end{bmatrix}, \quad
Z = \begin{bmatrix} 1 & z_{11} & z_{12} \\ 1 & z_{21} & z_{22} \\ \vdots & \vdots & \vdots \\ 1 & z_{10,1} & z_{10,2} \end{bmatrix}
= \begin{bmatrix} 1 & 120 & 100 \\ 1 & 140 & 110 \\ \vdots & \vdots & \vdots \\ 1 & 150 & 250 \end{bmatrix}.$$
The least squares estimate is
$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = (Z^t Z)^{-1} Z^t Y = \begin{bmatrix} 66.518 \\ 0.414 \\ -0.269 \end{bmatrix}.$$
The fitted regression equation is
$$\hat y = \hat\beta_0 + \hat\beta_1 z_1 + \hat\beta_2 z_2 = 66.518 + 0.414\, z_1 - 0.269\, z_2.$$
The fitted equation implies that an increase in the competitor's price of 1 unit is associated with an increase of 0.414 unit in expected quantity sold, and an increase in Heller's own price of 1 unit is associated with a decrease of 0.269 unit in expected quantity sold.
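Before listing the fitted values and residuals, here is a minimal S-Plus/R sketch that reproduces the estimate above by solving the normal equations directly and then via lm(). The object names (z1, z2, y, Z, betahat) are illustrative, not part of the original data file.

> # Heller data from the table above
> z1 = c(120, 140, 190, 130, 155, 175, 125, 145, 180, 150)   # competitor's price
> z2 = c(100, 110,  90, 150, 210, 150, 250, 270, 300, 250)   # Heller's price
> y  = c(102, 100, 120,  77,  46,  93,  26,  69,  65,  85)   # quantity sold
> Z  = cbind(1, z1, z2)                     # design matrix with an intercept column
> betahat = solve(t(Z) %*% Z, t(Z) %*% y)   # solves the normal equations Z'Z b = Z'y
> betahat                                   # approximately (66.518, 0.414, -0.269)
> lm(y ~ z1 + z2)                           # same coefficients via lm()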
The fitted values and residuals are
$$\hat Y^t = (Z\hat\beta)^t = [89.21,\ 94.79,\ 120.88,\ 79.86,\ 74.02,\ 98.49,\ 50.81,\ 53.69,\ 60.09,\ 61.16]$$
and
$$\hat\varepsilon^t = (Y - \hat Y)^t = [12.79,\ 5.21,\ -0.88,\ -2.86,\ -28.02,\ -5.49,\ -24.81,\ 15.31,\ 4.91,\ 23.84].$$

Suppose now we want to predict the quantity sold in a city where Heller prices its mower at $160 and the competitor prices its mower at $170. The predicted quantity sold is
$$\hat y = 66.518 + 0.414(170) - 0.269(160) \approx 93.718$$
(using the unrounded coefficient estimates).

Geometry of Least Squares:
$$S(\beta) = (Y - Z\beta)^t (Y - Z\beta).$$
In linear algebra,
$$Z\beta = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1 \begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix}
+ \cdots
+ \beta_r \begin{bmatrix} z_{1r} \\ z_{2r} \\ \vdots \\ z_{nr} \end{bmatrix}$$
is a linear combination of the column vectors of $Z$. That is, $Z\beta \in R(Z)$, the column space of $Z$. Then,
$$S(\beta) = \| Y - Z\beta \|^2 = \text{the squared distance between } Y \text{ and } Z\beta.$$
The least squares method is to find the appropriate $Z\hat\beta$ such that the distance between $Y$ and $Z\hat\beta$ is smaller than the distance between $Y$ and any other linear combination of the column vectors of $Z$, for example $Za$, $Zb$, $Zc$, and so on. Intuitively, $Z\beta$ is the information provided by the covariates $z_1, z_2, \ldots, z_r$ to interpret the response $Y$. Thus, $Z\hat\beta$ is the information which interprets $Y$ most accurately.

Further,
$$\begin{aligned}
S(\beta) &= (Y - Z\beta)^t (Y - Z\beta) \\
&= \left[ (Y - Z\hat\beta) + (Z\hat\beta - Z\beta) \right]^t \left[ (Y - Z\hat\beta) + (Z\hat\beta - Z\beta) \right] \\
&= (Y - Z\hat\beta)^t (Y - Z\hat\beta) + 2 (Y - Z\hat\beta)^t (Z\hat\beta - Z\beta) + (Z\hat\beta - Z\beta)^t (Z\hat\beta - Z\beta) \\
&= \| Y - Z\hat\beta \|^2 + 2 (Y - Z\hat\beta)^t Z (\hat\beta - \beta) + \| Z\hat\beta - Z\beta \|^2.
\end{aligned}$$
If we choose the estimate $\hat\beta$ of $\beta$ such that $Y - Z\hat\beta$ is orthogonal to every vector in $R(Z)$, that is,
$$(Y - Z\hat\beta)^t Z = 0,$$
then
$$S(\beta) = \| Y - Z\hat\beta \|^2 + \| Z\hat\beta - Z\beta \|^2 \geq \| Y - Z\hat\beta \|^2 = S(\hat\beta).$$
That is, if $\hat\beta$ satisfies $(Y - Z\hat\beta)^t Z = 0$, then $S(\beta) \geq S(\hat\beta)$ for any other estimate $\beta$, and therefore $\hat\beta$ is the least squares estimate. Therefore,
$$(Y - Z\hat\beta)^t Z = 0 \iff Z^t (Y - Z\hat\beta) = 0 \iff Z^t Y = Z^t Z \hat\beta \iff \hat\beta = (Z^t Z)^{-1} Z^t Y.$$
Since
$$\hat Y = Z\hat\beta = Z (Z^t Z)^{-1} Z^t Y = HY,$$
$H = Z (Z^t Z)^{-1} Z^t$ is called the projection matrix or hat matrix: it projects the response vector onto the space spanned by the covariate vectors. The vector of residuals is
$$\hat\varepsilon = Y - \hat Y = Y - Z\hat\beta = Y - HY = (I - H) Y.$$

We have the following two important results.

Properties of the least squares estimate:
1. The least squares estimate is unbiased:
$$E(\hat\beta) = \begin{bmatrix} E(\hat\beta_0) \\ E(\hat\beta_1) \\ \vdots \\ E(\hat\beta_r) \end{bmatrix}
= \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix} = \beta.$$
2. The variance-covariance matrix of the least squares estimate $\hat\beta$ is
$$\mathrm{Cov}(\hat\beta) = \begin{bmatrix}
\mathrm{Var}(\hat\beta_0) & \mathrm{cov}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_0, \hat\beta_r) \\
\mathrm{cov}(\hat\beta_1, \hat\beta_0) & \mathrm{Var}(\hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_1, \hat\beta_r) \\
\vdots & \vdots & & \vdots \\
\mathrm{cov}(\hat\beta_r, \hat\beta_0) & \mathrm{cov}(\hat\beta_r, \hat\beta_1) & \cdots & \mathrm{Var}(\hat\beta_r)
\end{bmatrix} = \sigma^2 (Z^t Z)^{-1}.$$

[Derivation:]
$$E(\hat\beta) = E\left[ (Z^t Z)^{-1} Z^t Y \right] = (Z^t Z)^{-1} Z^t E(Y) = (Z^t Z)^{-1} Z^t Z \beta = \beta,$$
since
$$E(Y) = E(Z\beta + \varepsilon) = Z\beta + E(\varepsilon) = Z\beta.$$
Also,
$$\begin{aligned}
\mathrm{Cov}(\hat\beta) &= \mathrm{Cov}\left[ (Z^t Z)^{-1} Z^t Y \right] = (Z^t Z)^{-1} Z^t\, \mathrm{Cov}(Y)\, Z (Z^t Z)^{-1} \\
&= (Z^t Z)^{-1} Z^t (\sigma^2 I) Z (Z^t Z)^{-1} = \sigma^2 (Z^t Z)^{-1} Z^t Z (Z^t Z)^{-1} = \sigma^2 (Z^t Z)^{-1},
\end{aligned}$$
since
$$\mathrm{Cov}(Y) = \mathrm{Cov}(Z\beta + \varepsilon) = \mathrm{Cov}(\varepsilon) = \sigma^2 I.$$

Denote
$$s^2 = \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{n - r - 1} = \frac{(Y - \hat Y)^t (Y - \hat Y)}{n - r - 1} = \frac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1},$$
the mean residual sum of squares (the residual sum of squares divided by $n - r - 1$). Note that
$$\frac{\sum_{i=1}^n \hat\varepsilon_i^2}{n - r - 1} = \frac{\sum_{i=1}^n (\hat\varepsilon_i - \bar{\hat\varepsilon})^2}{n - r - 1},
\quad \text{where } \bar{\hat\varepsilon} = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i = 0 \text{ when the model contains an intercept},$$
which is analogous to the sample variance estimate $\sum_{i=1}^n (\varepsilon_i - \bar\varepsilon)^2 / (n - 1)$ of the errors. Thus $s^2$ can be used to estimate $\sigma^2$.

Properties of the mean residual sum of squares:
1. $HH = H$ and $(I - H)(I - H) = I - H$.
2. $(I - H) Z = 0$.
3. $\mathrm{tr}(H) = r + 1$ and $\mathrm{tr}(I - H) = n - r - 1$.
4. $E(\text{mean residual sum of squares}) = E(s^2) = E\left[ \dfrac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1} \right] = E\left[ \dfrac{(Y - \hat Y)^t (Y - \hat Y)}{n - r - 1} \right] = \sigma^2$.

[proof:]
1. $HH = Z (Z^t Z)^{-1} Z^t Z (Z^t Z)^{-1} Z^t = Z (Z^t Z)^{-1} Z^t = H$, and
$(I - H)(I - H) = I - H - H + HH = I - H - H + H = I - H$.
2. $(I - H) Z = Z - HZ = Z - Z (Z^t Z)^{-1} Z^t Z = Z - Z = 0$.
3. $\mathrm{tr}(H) = \mathrm{tr}\left[ Z (Z^t Z)^{-1} Z^t \right] = \mathrm{tr}\left[ (Z^t Z)^{-1} Z^t Z \right] = \mathrm{tr}\left( I_{(r+1) \times (r+1)} \right) = r + 1$, using $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Similarly, $\mathrm{tr}(I - H) = \mathrm{tr}(I) - \mathrm{tr}(H) = n - r - 1$, using $\mathrm{tr}(A \pm B) = \mathrm{tr}(A) \pm \mathrm{tr}(B)$.
4.
$$\begin{aligned}
\hat\varepsilon^t \hat\varepsilon &= (Y - \hat Y)^t (Y - \hat Y) = (Y - Z\hat\beta)^t (Y - Z\hat\beta) = (Y - HY)^t (Y - HY) \\
&= Y^t (I - H)^t (I - H) Y = Y^t (I - H) Y \quad \left[ \text{since } (I - H)^t (I - H) = (I - H)(I - H) = I - H \right] \\
&= (Z\beta + \varepsilon)^t (I - H) (Z\beta + \varepsilon) = \varepsilon^t (I - H) \varepsilon \quad \left[ \text{since } (I - H) Z = 0 \right] \\
&= \mathrm{tr}\left[ \varepsilon^t (I - H) \varepsilon \right] = \mathrm{tr}\left[ (I - H) \varepsilon \varepsilon^t \right].
\end{aligned}$$
Thus,
$$E(\hat\varepsilon^t \hat\varepsilon) = E\left\{ \mathrm{tr}\left[ (I - H) \varepsilon \varepsilon^t \right] \right\}
= \mathrm{tr}\left[ (I - H) E(\varepsilon \varepsilon^t) \right]
= \mathrm{tr}\left[ (I - H) \sigma^2 I \right]
= \sigma^2\, \mathrm{tr}(I - H) = \sigma^2 (n - r - 1).$$
Therefore,
$$E(\text{mean residual sum of squares}) = E\left[ \frac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1} \right] = \sigma^2.$$
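The algebraic properties above are easy to check numerically. The following S-Plus/R sketch reuses the illustrative objects Z and y defined for the Heller example (n = 10, r = 2); it is only a verification aid, not part of the original notes.

> H = Z %*% solve(t(Z) %*% Z) %*% t(Z)      # hat matrix H = Z (Z'Z)^{-1} Z'
> max(abs(H %*% H - H))                     # essentially 0: H is idempotent
> max(abs((diag(10) - H) %*% Z))            # essentially 0: (I - H) Z = 0
> sum(diag(H))                              # trace of H = r + 1 = 3
> ehat = (diag(10) - H) %*% y               # residuals (I - H) y
> s2 = sum(ehat^2)/(10 - 3)                 # s^2 = residual SS / (n - r - 1)
> s2 * solve(t(Z) %*% Z)                    # estimated Cov(betahat) = s^2 (Z'Z)^{-1}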
Gauss's Least Squares Theorem: Let $Y = Z\beta + \varepsilon$, where $E(\varepsilon) = 0$, $\mathrm{Cov}(\varepsilon) = \sigma^2 I$, and $Z$ has full rank $r + 1$. For any $c$, the estimator
$$c^t \hat\beta = c^t (Z^t Z)^{-1} Z^t Y = c_0 \hat\beta_0 + c_1 \hat\beta_1 + \cdots + c_r \hat\beta_r$$
of $c^t \beta$ has the smallest possible variance among all linear estimators of the form
$$a^t Y = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$$
that are unbiased for $c^t \beta$.

[proof:]
Write $c^t \hat\beta = c^t (Z^t Z)^{-1} Z^t Y = a^{*t} Y$, where $a^* = Z (Z^t Z)^{-1} c$. Let $a^t Y$ be any unbiased estimator of $c^t \beta$, i.e., $E(a^t Y) = c^t \beta$ for all $\beta$. Then,
$$E(a^t Y) = E\left[ a^t (Z\beta + \varepsilon) \right] = a^t Z \beta + a^t E(\varepsilon) = a^t Z \beta = c^t \beta \ \text{ for all } \beta
\quad \Longrightarrow \quad (a^t Z - c^t)\beta = 0 \ \text{ for all } \beta
\quad \Longrightarrow \quad c^t = a^t Z.$$
That is, $c = Z^t a$. Thus, writing $a = a^* + (a - a^*)$,
$$\begin{aligned}
\mathrm{Var}(a^t Y) &= \mathrm{Var}(a^t Z \beta + a^t \varepsilon) = \mathrm{Var}(a^t \varepsilon) = a^t\, \mathrm{Cov}(\varepsilon)\, a = \sigma^2 a^t a \\
&= \sigma^2 \left[ a^* + (a - a^*) \right]^t \left[ a^* + (a - a^*) \right] \\
&= \sigma^2 \left[ a^{*t} a^* + 2 (a - a^*)^t a^* + (a - a^*)^t (a - a^*) \right].
\end{aligned}$$
Since $Z^t a = c$ and $Z^t a^* = Z^t Z (Z^t Z)^{-1} c = c$,
$$(a - a^*)^t a^* = (a - a^*)^t Z (Z^t Z)^{-1} c = (a^t Z - a^{*t} Z)(Z^t Z)^{-1} c = (c^t - c^t)(Z^t Z)^{-1} c = 0,$$
and therefore
$$\mathrm{Var}(a^t Y) = \sigma^2 \left[ a^{*t} a^* + (a - a^*)^t (a - a^*) \right] \geq \sigma^2 a^{*t} a^* = \mathrm{Var}(a^{*t} Y) = \mathrm{Var}(c^t \hat\beta),$$
with equality if and only if $a = a^*$. Hence $c^t \hat\beta$ has the smallest variance among all linear unbiased estimators of $c^t \beta$.

Useful Splus Commands:

>estate=matrix(scan("E:\\T7-1.dat"),ncol=3,byrow=T)
>estatelm=lm(estate[,3]~estate[,1]+estate[,2])
>estatelm
>summary(estatelm)
>anova(estatelm)

Useful SAS Commands:

title 'Regression Analysis';
data estate;
  infile 'E:\T7-1.dat';
  input z1 z2 y;
proc reg data=estate;
  model y = z1 z2;
run;
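As a follow-up to the Splus commands above, the quantities derived in this section can also be recovered from the fitted lm object. The sketch below assumes the estatelm object created above and that summary() exposes the cov.unscaled component, i.e. (Z'Z)^{-1}, as it does in R; treat it as illustrative rather than as the only way to obtain these quantities.

> coef(estatelm)                          # betahat = (Z'Z)^{-1} Z'Y
> fitted(estatelm)                        # fitted values Yhat = H Y
> resid(estatelm)                         # residuals (I - H) Y
> sum(resid(estatelm))                    # essentially 0, since the model has an intercept
> n = length(resid(estatelm)); p = length(coef(estatelm))
> s2 = sum(resid(estatelm)^2)/(n - p)     # s^2 = residual SS / (n - r - 1)
> s2 * summary(estatelm)$cov.unscaled     # estimated Cov(betahat) = s^2 (Z'Z)^{-1}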