Chapter 7 Multivariate Linear Regression Models

7.1-7.3: Least Squares Estimation

Data available:
$$(Y_1, z_{11}, z_{12}, \ldots, z_{1r}),\ (Y_2, z_{21}, z_{22}, \ldots, z_{2r}),\ \ldots,\ (Y_n, z_{n1}, z_{n2}, \ldots, z_{nr}).$$
The multiple linear regression model for the above data is
$$Y_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \cdots + \beta_r z_{ir} + \varepsilon_i, \quad i = 1, \ldots, n,$$
where the error terms are assumed to have the following properties:
1. $E(\varepsilon_i) = 0$;
2. $\mathrm{Var}(\varepsilon_i) = \sigma^2$;
3. $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \neq j$.

The above data can be represented in matrix form. Let
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
Z = \begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$
Then,
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} \beta_0 + \beta_1 z_{11} + \cdots + \beta_r z_{1r} + \varepsilon_1 \\ \beta_0 + \beta_1 z_{21} + \cdots + \beta_r z_{2r} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 z_{n1} + \cdots + \beta_r z_{nr} + \varepsilon_n \end{bmatrix}
= \begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= Z\beta + \varepsilon,$$
where the error terms satisfy
1. $E(\varepsilon) = 0$;
2. $\mathrm{Cov}(\varepsilon) = E(\varepsilon \varepsilon^t) = \sigma^2 I$.

Least squares method: The least squares method finds the estimate of $\beta$ minimizing the sum of squares of the residuals,
$$S(\beta) = S(\beta_0, \beta_1, \ldots, \beta_r) = \sum_{i=1}^n \varepsilon_i^2 = \varepsilon^t \varepsilon = (Y - Z\beta)^t (Y - Z\beta), \quad \text{since } \varepsilon = Y - Z\beta.$$
Expanding $S(\beta)$ gives
$$S(\beta) = (Y - Z\beta)^t (Y - Z\beta) = (Y^t - \beta^t Z^t)(Y - Z\beta) = Y^t Y - Y^t Z \beta - \beta^t Z^t Y + \beta^t Z^t Z \beta = Y^t Y - 2\beta^t Z^t Y + \beta^t Z^t Z \beta,$$
since $\beta^t Z^t Y = (\beta^t Z^t Y)^t = Y^t Z \beta$ is a real number.

Note: For two matrices $A$ and $B$, $(AB)^t = B^t A^t$ and $(A^t)^{-1} = (A^{-1})^t$.

Similar to the procedure for finding the minimum of a function in calculus, the least squares estimate $\hat\beta$ can be found by solving the equation based on the first derivative of $S(\beta)$:
$$\frac{\partial S}{\partial \beta} = \begin{bmatrix} \partial S / \partial \beta_0 \\ \partial S / \partial \beta_1 \\ \vdots \\ \partial S / \partial \beta_r \end{bmatrix}
= \frac{\partial}{\partial \beta}\left( Y^t Y - 2\beta^t Z^t Y + \beta^t Z^t Z \beta \right)
= -2 Z^t Y + 2 Z^t Z \beta = 0$$
$$\Longrightarrow \quad Z^t Z \beta = Z^t Y \quad \Longrightarrow \quad \hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_r \end{bmatrix} = (Z^t Z)^{-1} Z^t Y.$$

The fitted values (in vector form):
$$\hat Y = \begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix} = Z \hat\beta.$$
The residuals (in vector form):
$$\hat\varepsilon = \begin{bmatrix} \hat\varepsilon_1 \\ \hat\varepsilon_2 \\ \vdots \\ \hat\varepsilon_n \end{bmatrix} = Y - \hat Y = Y - Z\hat\beta.$$

Note:
(i) $\dfrac{\partial (a^t \beta)}{\partial \beta} = \dfrac{\partial \left( \sum_{i=0}^r a_i \beta_i \right)}{\partial \beta} = a$, where $a = [a_0, a_1, \ldots, a_r]^t$ and $\beta = [\beta_0, \beta_1, \ldots, \beta_r]^t$ are $(r+1) \times 1$ vectors.
(ii) $\dfrac{\partial (\beta^t A \beta)}{\partial \beta} = \dfrac{\partial \left( \sum_{i=0}^r \sum_{j=0}^r a_{ij} \beta_i \beta_j \right)}{\partial \beta} = 2 A \beta$, where $A$ is any symmetric $(r+1) \times (r+1)$ matrix.

Note: Since $(Z^t Z)^t = Z^t (Z^t)^t = Z^t Z$, $Z^t Z$ is a symmetric matrix. Also, $\left[ (Z^t Z)^{-1} \right]^t = \left[ (Z^t Z)^t \right]^{-1} = (Z^t Z)^{-1}$, so $(Z^t Z)^{-1}$ is a symmetric matrix as well.

Note: $Z^t Z \beta = Z^t Y$ is called the normal equation.

Note:
$$\hat\varepsilon^t Z = (Y^t - \hat\beta^t Z^t) Z = Y^t Z - Y^t Z (Z^t Z)^{-1} Z^t Z = Y^t Z - Y^t Z = 0.$$
Therefore, if there is an intercept, then the first column of $Z$ is $[1, 1, \ldots, 1]^t$. Then,
$$\hat\varepsilon^t Z = [\hat\varepsilon_1\ \hat\varepsilon_2\ \cdots\ \hat\varepsilon_n]
\begin{bmatrix} 1 & z_{11} & \cdots & z_{1r} \\ 1 & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ 1 & z_{n1} & \cdots & z_{nr} \end{bmatrix} = 0
\quad \Longrightarrow \quad \sum_{i=1}^n \hat\varepsilon_i = 0.$$

Note: For the linear regression model without the intercept, $\sum_{i=1}^n \hat\varepsilon_i$ might not be equal to 0.

Note:
$$\hat Y = Z\hat\beta = Z (Z^t Z)^{-1} Z^t Y = HY, \quad \text{where } H = Z (Z^t Z)^{-1} Z^t$$
is called the "hat" matrix (or projection matrix). Thus,
$$\hat\varepsilon = Y - Z\hat\beta = Y - HY = (I - H) Y.$$

Example 1: Heller Company manufactures lawn mowers and related lawn equipment. The managers believe the quantity of lawn mowers sold depends on the price of the mower and the price of a competitor's mower. We have the following data:

Competitor's Price z_i1    Heller's Price z_i2    Quantity sold y_i
        120                       100                   102
        140                       110                   100
        190                        90                   120
        130                       150                    77
        155                       210                    46
        175                       150                    93
        125                       250                    26
        145                       270                    69
        180                       300                    65
        150                       250                    85

The regression model for the above data is
$$Y_i = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \varepsilon_i.$$
The data in matrix form are
$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_{10} \end{bmatrix} = \begin{bmatrix} 102 \\ 100 \\ \vdots \\ 85 \end{bmatrix}, \quad
Z = \begin{bmatrix} 1 & z_{11} & z_{12} \\ 1 & z_{21} & z_{22} \\ \vdots & \vdots & \vdots \\ 1 & z_{10,1} & z_{10,2} \end{bmatrix}
= \begin{bmatrix} 1 & 120 & 100 \\ 1 & 140 & 110 \\ \vdots & \vdots & \vdots \\ 1 & 150 & 250 \end{bmatrix}.$$
The least squares estimate is
$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \end{bmatrix} = (Z^t Z)^{-1} Z^t Y = \begin{bmatrix} 66.518 \\ 0.414 \\ -0.269 \end{bmatrix}.$$
The fitted regression equation is
$$\hat y = \hat\beta_0 + \hat\beta_1 z_1 + \hat\beta_2 z_2 = 66.518 + 0.414\, z_1 - 0.269\, z_2.$$
The fitted equation implies that an increase in the competitor's price of 1 unit is associated with an increase of 0.414 unit in expected quantity sold, and an increase in Heller's own price of 1 unit is associated with a decrease of 0.269 unit in expected quantity sold.
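Before listing the fitted values and residuals, here is a minimal S-Plus/R sketch that reproduces the estimate above by solving the normal equations directly and then via lm(). The object names (z1, z2, y, Z, betahat) are illustrative, not part of the original data file.

> # Heller data from the table above
> z1 = c(120, 140, 190, 130, 155, 175, 125, 145, 180, 150)   # competitor's price
> z2 = c(100, 110,  90, 150, 210, 150, 250, 270, 300, 250)   # Heller's price
> y  = c(102, 100, 120,  77,  46,  93,  26,  69,  65,  85)   # quantity sold
> Z  = cbind(1, z1, z2)                     # design matrix with an intercept column
> betahat = solve(t(Z) %*% Z, t(Z) %*% y)   # solves the normal equations Z'Z b = Z'y
> betahat                                   # approximately (66.518, 0.414, -0.269)
> lm(y ~ z1 + z2)                           # same coefficients via lm()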
The fitted values and residuals are
$$\hat Y^t = (Z\hat\beta)^t = [89.21,\ 94.79,\ 120.88,\ 79.86,\ 74.02,\ 98.49,\ 50.81,\ 53.69,\ 60.09,\ 61.16]$$
and
$$\hat\varepsilon^t = (Y - \hat Y)^t = [12.79,\ 5.21,\ -0.88,\ -2.86,\ -28.02,\ -5.49,\ -24.81,\ 15.31,\ 4.91,\ 23.84].$$

Suppose now we want to predict the quantity sold in a city where Heller prices its mower at $160 and the competitor prices its mower at $170. The predicted quantity sold is
$$\hat y = 66.518 + 0.414(170) - 0.269(160) \approx 93.718$$
(using the unrounded coefficient estimates).

Geometry of Least Squares:
$$S(\beta) = (Y - Z\beta)^t (Y - Z\beta).$$
In linear algebra,
$$Z\beta = \beta_0 \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1 \begin{bmatrix} z_{11} \\ z_{21} \\ \vdots \\ z_{n1} \end{bmatrix}
+ \cdots
+ \beta_r \begin{bmatrix} z_{1r} \\ z_{2r} \\ \vdots \\ z_{nr} \end{bmatrix}$$
is a linear combination of the column vectors of $Z$. That is, $Z\beta \in R(Z)$, the column space of $Z$. Then,
$$S(\beta) = \| Y - Z\beta \|^2 = \text{the squared distance between } Y \text{ and } Z\beta.$$
The least squares method is to find the appropriate $Z\hat\beta$ such that the distance between $Y$ and $Z\hat\beta$ is smaller than the distance between $Y$ and any other linear combination of the column vectors of $Z$, for example $Za$, $Zb$, $Zc$, and so on. Intuitively, $Z\beta$ is the information provided by the covariates $z_1, z_2, \ldots, z_r$ to interpret the response $Y$. Thus, $Z\hat\beta$ is the information which interprets $Y$ most accurately.

Further,
$$\begin{aligned}
S(\beta) &= (Y - Z\beta)^t (Y - Z\beta) \\
&= \left[ (Y - Z\hat\beta) + (Z\hat\beta - Z\beta) \right]^t \left[ (Y - Z\hat\beta) + (Z\hat\beta - Z\beta) \right] \\
&= (Y - Z\hat\beta)^t (Y - Z\hat\beta) + 2 (Y - Z\hat\beta)^t (Z\hat\beta - Z\beta) + (Z\hat\beta - Z\beta)^t (Z\hat\beta - Z\beta) \\
&= \| Y - Z\hat\beta \|^2 + 2 (Y - Z\hat\beta)^t Z (\hat\beta - \beta) + \| Z\hat\beta - Z\beta \|^2.
\end{aligned}$$
If we choose the estimate $\hat\beta$ of $\beta$ such that $Y - Z\hat\beta$ is orthogonal to every vector in $R(Z)$, that is,
$$(Y - Z\hat\beta)^t Z = 0,$$
then
$$S(\beta) = \| Y - Z\hat\beta \|^2 + \| Z\hat\beta - Z\beta \|^2 \geq \| Y - Z\hat\beta \|^2 = S(\hat\beta).$$
That is, if $\hat\beta$ satisfies $(Y - Z\hat\beta)^t Z = 0$, then $S(\beta) \geq S(\hat\beta)$ for any other estimate $\beta$, and therefore $\hat\beta$ is the least squares estimate. Therefore,
$$(Y - Z\hat\beta)^t Z = 0 \iff Z^t (Y - Z\hat\beta) = 0 \iff Z^t Y = Z^t Z \hat\beta \iff \hat\beta = (Z^t Z)^{-1} Z^t Y.$$
Since
$$\hat Y = Z\hat\beta = Z (Z^t Z)^{-1} Z^t Y = HY,$$
$H = Z (Z^t Z)^{-1} Z^t$ is called the projection matrix or hat matrix: it projects the response vector onto the space spanned by the covariate vectors. The vector of residuals is
$$\hat\varepsilon = Y - \hat Y = Y - Z\hat\beta = Y - HY = (I - H) Y.$$

We have the following two important results.

Properties of the least squares estimate:
1. The least squares estimate is unbiased:
$$E(\hat\beta) = \begin{bmatrix} E(\hat\beta_0) \\ E(\hat\beta_1) \\ \vdots \\ E(\hat\beta_r) \end{bmatrix}
= \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \end{bmatrix} = \beta.$$
2. The variance-covariance matrix of the least squares estimate $\hat\beta$ is
$$\mathrm{Cov}(\hat\beta) = \begin{bmatrix}
\mathrm{Var}(\hat\beta_0) & \mathrm{cov}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_0, \hat\beta_r) \\
\mathrm{cov}(\hat\beta_1, \hat\beta_0) & \mathrm{Var}(\hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_1, \hat\beta_r) \\
\vdots & \vdots & & \vdots \\
\mathrm{cov}(\hat\beta_r, \hat\beta_0) & \mathrm{cov}(\hat\beta_r, \hat\beta_1) & \cdots & \mathrm{Var}(\hat\beta_r)
\end{bmatrix} = \sigma^2 (Z^t Z)^{-1}.$$

[Derivation:]
$$E(\hat\beta) = E\left[ (Z^t Z)^{-1} Z^t Y \right] = (Z^t Z)^{-1} Z^t E(Y) = (Z^t Z)^{-1} Z^t Z \beta = \beta,$$
since
$$E(Y) = E(Z\beta + \varepsilon) = Z\beta + E(\varepsilon) = Z\beta.$$
Also,
$$\begin{aligned}
\mathrm{Cov}(\hat\beta) &= \mathrm{Cov}\left[ (Z^t Z)^{-1} Z^t Y \right] = (Z^t Z)^{-1} Z^t\, \mathrm{Cov}(Y)\, Z (Z^t Z)^{-1} \\
&= (Z^t Z)^{-1} Z^t (\sigma^2 I) Z (Z^t Z)^{-1} = \sigma^2 (Z^t Z)^{-1} Z^t Z (Z^t Z)^{-1} = \sigma^2 (Z^t Z)^{-1},
\end{aligned}$$
since
$$\mathrm{Cov}(Y) = \mathrm{Cov}(Z\beta + \varepsilon) = \mathrm{Cov}(\varepsilon) = \sigma^2 I.$$

Denote
$$s^2 = \frac{\sum_{i=1}^n (Y_i - \hat Y_i)^2}{n - r - 1} = \frac{(Y - \hat Y)^t (Y - \hat Y)}{n - r - 1} = \frac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1},$$
the mean residual sum of squares (the residual sum of squares divided by $n - r - 1$). Note that
$$\frac{\sum_{i=1}^n \hat\varepsilon_i^2}{n - r - 1} = \frac{\sum_{i=1}^n (\hat\varepsilon_i - \bar{\hat\varepsilon})^2}{n - r - 1},
\quad \text{where } \bar{\hat\varepsilon} = \frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i = 0 \text{ when the model contains an intercept},$$
which is analogous to the sample variance estimate $\sum_{i=1}^n (\varepsilon_i - \bar\varepsilon)^2 / (n - 1)$ of the errors. Thus $s^2$ can be used to estimate $\sigma^2$.

Properties of the mean residual sum of squares:
1. $HH = H$ and $(I - H)(I - H) = I - H$.
2. $(I - H) Z = 0$.
3. $\mathrm{tr}(H) = r + 1$ and $\mathrm{tr}(I - H) = n - r - 1$.
4. $E(\text{mean residual sum of squares}) = E(s^2) = E\left[ \dfrac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1} \right] = E\left[ \dfrac{(Y - \hat Y)^t (Y - \hat Y)}{n - r - 1} \right] = \sigma^2$.

[proof:]
1. $HH = Z (Z^t Z)^{-1} Z^t Z (Z^t Z)^{-1} Z^t = Z (Z^t Z)^{-1} Z^t = H$, and
$(I - H)(I - H) = I - H - H + HH = I - H - H + H = I - H$.
2. $(I - H) Z = Z - HZ = Z - Z (Z^t Z)^{-1} Z^t Z = Z - Z = 0$.
3. $\mathrm{tr}(H) = \mathrm{tr}\left[ Z (Z^t Z)^{-1} Z^t \right] = \mathrm{tr}\left[ (Z^t Z)^{-1} Z^t Z \right] = \mathrm{tr}\left( I_{(r+1) \times (r+1)} \right) = r + 1$, using $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Similarly, $\mathrm{tr}(I - H) = \mathrm{tr}(I) - \mathrm{tr}(H) = n - r - 1$, using $\mathrm{tr}(A \pm B) = \mathrm{tr}(A) \pm \mathrm{tr}(B)$.
4.
$$\begin{aligned}
\hat\varepsilon^t \hat\varepsilon &= (Y - \hat Y)^t (Y - \hat Y) = (Y - Z\hat\beta)^t (Y - Z\hat\beta) = (Y - HY)^t (Y - HY) \\
&= Y^t (I - H)^t (I - H) Y = Y^t (I - H) Y \quad \left[ \text{since } (I - H)^t (I - H) = (I - H)(I - H) = I - H \right] \\
&= (Z\beta + \varepsilon)^t (I - H) (Z\beta + \varepsilon) = \varepsilon^t (I - H) \varepsilon \quad \left[ \text{since } (I - H) Z = 0 \right] \\
&= \mathrm{tr}\left[ \varepsilon^t (I - H) \varepsilon \right] = \mathrm{tr}\left[ (I - H) \varepsilon \varepsilon^t \right].
\end{aligned}$$
Thus,
$$E(\hat\varepsilon^t \hat\varepsilon) = E\left\{ \mathrm{tr}\left[ (I - H) \varepsilon \varepsilon^t \right] \right\}
= \mathrm{tr}\left[ (I - H) E(\varepsilon \varepsilon^t) \right]
= \mathrm{tr}\left[ (I - H) \sigma^2 I \right]
= \sigma^2\, \mathrm{tr}(I - H) = \sigma^2 (n - r - 1).$$
Therefore,
$$E(\text{mean residual sum of squares}) = E\left[ \frac{\hat\varepsilon^t \hat\varepsilon}{n - r - 1} \right] = \sigma^2.$$
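The algebraic properties above are easy to check numerically. The following S-Plus/R sketch reuses the illustrative objects Z and y defined for the Heller example (n = 10, r = 2); it is only a verification aid, not part of the original notes.

> H = Z %*% solve(t(Z) %*% Z) %*% t(Z)      # hat matrix H = Z (Z'Z)^{-1} Z'
> max(abs(H %*% H - H))                     # essentially 0: H is idempotent
> max(abs((diag(10) - H) %*% Z))            # essentially 0: (I - H) Z = 0
> sum(diag(H))                              # trace of H = r + 1 = 3
> ehat = (diag(10) - H) %*% y               # residuals (I - H) y
> s2 = sum(ehat^2)/(10 - 3)                 # s^2 = residual SS / (n - r - 1)
> s2 * solve(t(Z) %*% Z)                    # estimated Cov(betahat) = s^2 (Z'Z)^{-1}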
Gauss's Least Squares Theorem: Let $Y = Z\beta + \varepsilon$, where $E(\varepsilon) = 0$, $\mathrm{Cov}(\varepsilon) = \sigma^2 I$, and $Z$ has full rank $r + 1$. For any $c$, the estimator
$$c^t \hat\beta = c^t (Z^t Z)^{-1} Z^t Y = c_0 \hat\beta_0 + c_1 \hat\beta_1 + \cdots + c_r \hat\beta_r$$
of $c^t \beta$ has the smallest possible variance among all linear estimators of the form
$$a^t Y = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$$
that are unbiased for $c^t \beta$.

[proof:]
Write $c^t \hat\beta = c^t (Z^t Z)^{-1} Z^t Y = a^{*t} Y$, where $a^* = Z (Z^t Z)^{-1} c$. Let $a^t Y$ be any unbiased estimator of $c^t \beta$, i.e., $E(a^t Y) = c^t \beta$ for all $\beta$. Then,
$$E(a^t Y) = E\left[ a^t (Z\beta + \varepsilon) \right] = a^t Z \beta + a^t E(\varepsilon) = a^t Z \beta = c^t \beta \ \text{ for all } \beta
\quad \Longrightarrow \quad (a^t Z - c^t)\beta = 0 \ \text{ for all } \beta
\quad \Longrightarrow \quad c^t = a^t Z.$$
That is, $c = Z^t a$. Thus, writing $a = a^* + (a - a^*)$,
$$\begin{aligned}
\mathrm{Var}(a^t Y) &= \mathrm{Var}(a^t Z \beta + a^t \varepsilon) = \mathrm{Var}(a^t \varepsilon) = a^t\, \mathrm{Cov}(\varepsilon)\, a = \sigma^2 a^t a \\
&= \sigma^2 \left[ a^* + (a - a^*) \right]^t \left[ a^* + (a - a^*) \right] \\
&= \sigma^2 \left[ a^{*t} a^* + 2 (a - a^*)^t a^* + (a - a^*)^t (a - a^*) \right].
\end{aligned}$$
Since $Z^t a = c$ and $Z^t a^* = Z^t Z (Z^t Z)^{-1} c = c$,
$$(a - a^*)^t a^* = (a - a^*)^t Z (Z^t Z)^{-1} c = (a^t Z - a^{*t} Z)(Z^t Z)^{-1} c = (c^t - c^t)(Z^t Z)^{-1} c = 0,$$
and therefore
$$\mathrm{Var}(a^t Y) = \sigma^2 \left[ a^{*t} a^* + (a - a^*)^t (a - a^*) \right] \geq \sigma^2 a^{*t} a^* = \mathrm{Var}(a^{*t} Y) = \mathrm{Var}(c^t \hat\beta),$$
with equality if and only if $a = a^*$. Hence $c^t \hat\beta$ has the smallest variance among all linear unbiased estimators of $c^t \beta$.

Useful Splus Commands:

>estate=matrix(scan("E:\\T7-1.dat"),ncol=3,byrow=T)
>estatelm=lm(estate[,3]~estate[,1]+estate[,2])
>estatelm
>summary(estatelm)
>anova(estatelm)

Useful SAS Commands:

title 'Regression Analysis';
data estate;
  infile 'E:\T7-1.dat';
  input z1 z2 y;
proc reg data=estate;
  model y = z1 z2;
run;
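As a follow-up to the Splus commands above, the quantities derived in this section can also be recovered from the fitted lm object. The sketch below assumes the estatelm object created above and that summary() exposes the cov.unscaled component, i.e. (Z'Z)^{-1}, as it does in R; treat it as illustrative rather than as the only way to obtain these quantities.

> coef(estatelm)                          # betahat = (Z'Z)^{-1} Z'Y
> fitted(estatelm)                        # fitted values Yhat = H Y
> resid(estatelm)                         # residuals (I - H) Y
> sum(resid(estatelm))                    # essentially 0, since the model has an intercept
> n = length(resid(estatelm)); p = length(coef(estatelm))
> s2 = sum(resid(estatelm)^2)/(n - p)     # s^2 = residual SS / (n - r - 1)
> s2 * summary(estatelm)$cov.unscaled     # estimated Cov(betahat) = s^2 (Z'Z)^{-1}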