Some Different Perspectives on Linear Least Squares
A standard problem in statistics is to measure a response or dependent variable, $y$, at fixed values of one or more independent variables. Sometimes there exists a deterministic model $y = f(x, \theta)$, where $x$ stands collectively for all of the independent variables of the system and $\theta$ stands collectively for all of the model parameters. In the probabilistic model it is often assumed that each measured $y$ value can be expressed as $y_i = f(x_i, \theta) + \varepsilon_i$, where $i$ is an index that labels the fixed input condition $x_i$ and $\varepsilon_i$ represents the random error associated with the given measurement. For a valid model the mean random error should be zero. For a given set of $n$ data points the problem of choosing the "best" values for the parameters is typically solved via the method of least squares: the parameters are chosen to minimize the function
$$F(\theta) = \sum_{i=1}^{n} \left[ y_i - f(x_i, \theta) \right]^2 .$$
If the function $f(x_i, \theta)$ is linear in $\theta$, the procedure is referred to as a linear least squares or linear regression problem. The purpose of these notes is to examine the least squares solution from several different points of view.
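To make the objective concrete, here is a minimal Python/NumPy sketch that evaluates $F(\theta)$ for a straight-line model $f(x, \theta) = \theta_1 x + \theta_0$; the data points and trial parameter values are invented purely for illustration.

    import numpy as np

    # Invented data: fixed inputs x_i and measured responses y_i (hypothetical values).
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    def F(theta, x, y):
        """Sum of squared residuals F(theta) = sum_i [y_i - f(x_i, theta)]^2
        for the straight-line model f(x, theta) = theta[0]*x + theta[1]."""
        residuals = y - (theta[0] * x + theta[1])
        return np.sum(residuals ** 2)

    # Evaluate the objective at two trial parameter vectors; the least squares
    # solution is the theta that makes F as small as possible.
    print(F(np.array([2.0, 1.0]), x, y))
    print(F(np.array([1.5, 0.5]), x, y))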
The Least Squares Approximation to a Vector in an n-Dimensional Euclidean Space
Consider an $n$-dimensional Euclidean space $V$ with the inner product of vectors $u$ and $v$ denoted as $(u, v)$. Let $W$ be a subspace of $V$; then $V = W \oplus W^{\perp}$, where $W^{\perp}$ is the orthogonal complement of $W$. Let $v$ be any element of $V$; then $v = w + u$, where $w \in W$ and $u \in W^{\perp}$. The vector $w$ is called the projection of $v$ onto $W$. Let $g_v(\omega) = \| v - \omega \|^2 = (v - \omega, v - \omega)$. The following argument shows that $\omega = w$ is the unique minimum of this function on $W$. First, by expanding the inner product, $g_v(\omega)$ can be expressed as
$$g_v(\omega) = \| (v - w) - (\omega - w) \|^2 = \| v - w \|^2 - 2\,(v - w, \omega - w) + \| \omega - w \|^2 .$$
If $\omega \in W$, then $\omega - w \in W$ and $(v - w, \omega - w) = (u, \omega - w) = 0$. So on $W$, $g_v(\omega) = \| u \|^2 + \| \omega - w \|^2 \geq \| u \|^2$, with equality only at $\omega = w$. If $V = \mathbb{R}^n$ and $W$ has the orthonormal basis $\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_m$ with $m \leq n$, then the column vector which is the projection of $v$ onto $W$ is given by $w = \sum_{j=1}^{m} (\hat{C}_j, v)\, \hat{C}_j$. Define the $n \times m$ matrix $Q$ as $Q = [\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_m]$, that is, $Q_{i,j} = (\hat{C}_j)_{i,1}$. The $n \times n$ matrix $Q Q^T$, called the "outer product", projects $v$ onto $W$:
$$w_{i,1} = \sum_{j=1}^{m} (\hat{C}_j, v)\, (\hat{C}_j)_{i,1} = \sum_{j=1}^{m} (\hat{C}_j)_{i,1}\, (Q^T v)_{j,1} = \sum_{j=1}^{m} Q_{i,j}\, (Q^T v)_{j,1} = (Q Q^T v)_{i,1} .$$
Thus, the vector $Q Q^T v$ minimizes the squared distance from $v$ to $W$.
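As a numerical illustration of this projection property, the following sketch (assuming NumPy; the subspace and the vector $v$ are arbitrary choices, and numpy.linalg.qr is used only to manufacture an orthonormal basis) verifies that $v - QQ^Tv$ is orthogonal to $W$ and that the projection is the closest point of $W$ to $v$.

    import numpy as np

    # Columns of A span a 2-dimensional subspace W of R^4 (arbitrary example vectors).
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
    Q, _ = np.linalg.qr(A)      # columns of Q: an orthonormal basis for W

    v = np.array([2.0, -1.0, 0.5, 3.0])
    w = Q @ (Q.T @ v)           # projection of v onto W via the outer product Q Q^T
    u = v - w                   # remaining component, which should lie in W-perp

    print(Q.T @ u)              # ~ [0, 0]: u is orthogonal to every basis vector of W
    # Any other element of W is farther from v than the projection w is:
    other = Q @ np.array([1.0, -2.0])
    print(np.linalg.norm(v - w) <= np.linalg.norm(v - other))   # True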
Linear Least Squares in Matrix Notation: The Generalized Inverse
If a model is linear in its $m$ parameters designated by the vector $b$ in $\mathbb{R}^m$, then all the $n$ estimated $y$ responses can be generated as a vector in $\mathbb{R}^n$ given by $Ab$, where $A$ is an $n \times m$ matrix. We will assume that $A$ has rank $m$ with $m \leq n$. If the rank of $A$ is less than $m$, then the model parameters are not all independent and need to be reduced in number so as to make a "design matrix" with all of its columns linearly independent. Denote the $k$'th column of $A$ as $A_k$, i.e., $A = [A_1, A_2, \ldots, A_m]$. Since the columns of $A$ are linearly independent vectors in $\mathbb{R}^n$, $Ab = \sum_{k=1}^{m} b_k A_k = 0_n$ if and only if each $b_k = 0$. Thinking of $A$ as a linear transformation from $\mathbb{R}^m$ to $\mathbb{R}^n$, the image of $A$, $W$, is the span of $\{A_1, A_2, \ldots, A_m\}$ and has dimension $m$, while the kernel of $A$ has dimension zero and consists of only the zero vector $0_m$. Every vector in $W^{\perp}$ is orthogonal to each column of $A$; stated differently, if $u \in W^{\perp}$, then $A^T u = 0_m$. Conversely, if $A^T u = 0_m$, then every column of $A$ is perpendicular to $u$, so $u \in W^{\perp}$. Thus, the kernel of $A^T$ is the orthogonal complement of the column space of $A$. Let $y \in \mathbb{R}^n$; then $y = w + u$, where $w \in W$ and $u \in W^{\perp}$. Unless $u = 0_n$, the system of equations $Ab = y$ has no solutions. However, $Ab = w$ has a solution, $\beta$, and since $\{A_1, A_2, \ldots, A_m\}$ is a linearly independent set, this solution is unique. Furthermore, $w$ is the closest vector in the column space of $A$ to $y$, and the minimum value of $F_y(b) = \| y - Ab \|^2$ as $b$ varies over $\mathbb{R}^m$ is $F_y(\beta) = \| y - w \|^2 = \| y - A\beta \|^2$. Now, $A\beta = w$, so
$$A^T A \beta = A^T w = A^T (y - u) = A^T y - A^T u = A^T y, \quad \text{since } \ker A^T = W^{\perp} .$$
Suppose that for $b \in \mathbb{R}^m$, $A^T A b = 0_m$; then $b^T A^T A b = (Ab)^T (Ab) = (Ab, Ab) = \| Ab \|^2 = 0$. So, if $A^T A b = 0_m$, then $Ab = 0_n$, which implies that $b = 0_m$ since the kernel of $A$ consists of only $0_m$. Therefore, $A^T A$ is nonsingular and the least squares solution minimizing $F_y(b)$ is given by $\beta = (A^T A)^{-1} A^T y$. The $m \times n$ matrix $(A^T A)^{-1} A^T$ is sometimes called the generalized inverse of the $n \times m$ matrix $A$.
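A minimal sketch of the generalized-inverse solution on invented data follows, with NumPy's own least squares routine used only as an independent check.

    import numpy as np

    # Hypothetical n x m design matrix (n = 5, m = 2) with linearly independent columns,
    # and a hypothetical response vector y.
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0],
                  [3.0, 1.0],
                  [4.0, 1.0],
                  [5.0, 1.0]])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Generalized inverse (A^T A)^{-1} A^T applied to y gives the least squares solution.
    beta = np.linalg.inv(A.T @ A) @ A.T @ y

    # Independent check with NumPy's least squares routine.
    beta_check, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(beta, beta_check)            # the two solutions agree

    # The residual y - A beta lies in the orthogonal complement of col(A): A^T r ~ 0.
    print(A.T @ (y - A @ beta))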
An Alternate Formulation of Linear Least Squares Using Multivariable Calculus
The least squares solution is the vector $\beta$ in $\mathbb{R}^m$ that minimizes the function of $m$ variables
$$F_y(b) = \| y - Ab \|^2 = (y - Ab, y - Ab) = (y, y) - 2\,(Ab, y) + (Ab, Ab)$$
as $b$ varies over $\mathbb{R}^m$. Rewriting this function using sigma notation gives the following expression.
$$F_y(b) = (y, y) - 2 \sum_{i=1}^{n} \sum_{j=1}^{m} A_{i,j}\, b_j\, y_i + \sum_{\ell=1}^{m} \sum_{j=1}^{m} b_\ell\, b_j\, (A^T A)_{\ell,j}$$
Now a necessary condition for $F_y(b)$ to be at a minimum is that all of the partial derivatives $\dfrac{\partial F_y}{\partial b_k}$ vanish for every integer $k$ with $1 \leq k \leq m$. Since $\dfrac{\partial b_j}{\partial b_k} = \delta_{j,k}$, the following simplification results.
$$\frac{\partial F_y}{\partial b_k} = 0 - 2 \sum_{i=1}^{n} \sum_{j=1}^{m} A_{i,j}\, \frac{\partial b_j}{\partial b_k}\, y_i + \sum_{\ell=1}^{m} \sum_{j=1}^{m} \left( \frac{\partial b_\ell}{\partial b_k}\, b_j + b_\ell\, \frac{\partial b_j}{\partial b_k} \right) (A^T A)_{\ell,j} = -2 \sum_{i=1}^{n} A_{i,k}\, y_i + \sum_{j=1}^{m} (A^T A)_{k,j}\, b_j + \sum_{\ell=1}^{m} b_\ell\, (A^T A)_{\ell,k} = -2\, (A^T y)_{k,1} + 2\, (A^T A b)_{k,1}$$
The condition $\dfrac{\partial F_y}{\partial b_k} = 0$ for every integer $k$ with $1 \leq k \leq m$ requires that $A^T A b = A^T y$. This has the unique solution $\beta = (A^T A)^{-1} A^T y$, which is identical to the result obtained in the last section.
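The gradient formula $\partial F_y / \partial b_k = -2 (A^T y)_k + 2 (A^T A b)_k$ can be checked numerically against finite differences; the data and trial point below are invented for illustration.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [2.0, 1.0],
                  [3.0, 1.0],
                  [4.0, 1.0]])
    y = np.array([1.9, 4.1, 6.0, 8.2])

    def F(b):
        r = y - A @ b
        return r @ r                      # F_y(b) = ||y - A b||^2

    b = np.array([0.7, -0.3])             # arbitrary trial point
    grad_formula = -2 * A.T @ y + 2 * A.T @ A @ b

    # Central finite-difference approximation of each partial derivative.
    h = 1e-6
    grad_numeric = np.array([
        (F(b + h * e) - F(b - h * e)) / (2 * h)
        for e in np.eye(2)
    ])
    print(grad_formula, grad_numeric)     # the two gradients agree closely

    # Setting the gradient to zero gives the normal equations A^T A b = A^T y.
    beta = np.linalg.solve(A.T @ A, A.T @ y)
    print(-2 * A.T @ y + 2 * A.T @ A @ beta)   # ~ zero vector at the minimizer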
A Formulation of Linear Least Squares Using QR Factorization
Since the columns of $A$ are linearly independent, a Gram-Schmidt procedure on the columns yields an orthonormal basis for $W$. Designate this basis as $\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_m$. Specifically, this orthonormal basis is defined recursively for $j \geq 2$ as
$$\hat{C}_j = \frac{C_j}{\| C_j \|}, \quad \text{where} \quad C_j = A_j - \sum_{k=1}^{j-1} (A_j, \hat{C}_k)\, \hat{C}_k \quad \text{and} \quad \hat{C}_1 = \frac{A_1}{\| A_1 \|} .$$
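A short sketch of this recursion (classical Gram-Schmidt, as written above, applied to an invented matrix; for ill-conditioned columns a modified Gram-Schmidt or a library QR routine would be numerically preferable):

    import numpy as np

    def gram_schmidt(A):
        """Return Q with orthonormal columns spanning the column space of A,
        built column by column: C_j = A_j - sum_k (A_j, C_k_hat) C_k_hat."""
        n, m = A.shape
        Q = np.zeros((n, m))
        for j in range(m):
            c = A[:, j].copy()
            for k in range(j):
                c -= (A[:, j] @ Q[:, k]) * Q[:, k]   # remove components along earlier basis vectors
            Q[:, j] = c / np.linalg.norm(c)          # normalize
        return Q

    A = np.array([[1.0, 1.0],
                  [2.0, 1.0],
                  [3.0, 1.0],
                  [4.0, 1.0]])
    Q = gram_schmidt(A)
    print(np.round(Q.T @ Q, 12))     # identity matrix: the columns are orthonormal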
The column vector which is the projection of $y$ onto $W$ is given by $w = Q Q^T y$, where the $n \times m$ matrix $Q$ is defined as $Q_{i,j} = (\hat{C}_j)_{i,1}$, i.e., $Q = [\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_m]$. Now, $A = Q Q^T A$, since the projection of each column of matrix $A$ onto $A$'s column space is just that original column of $A$. Let the $m \times m$ matrix $R$ be defined as $R = Q^T A$. Now, $R_{i,j} = (\hat{C}_i, A_j)$. By the Gram-Schmidt procedure, if $j \leq m$ the span of $\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_j$ is equal to the span of $A_1, A_2, \ldots, A_j$. So $A_j = \sum_{k=1}^{j} \hat{C}_k\, (\hat{C}_k, A_j)$. Now, if $i > j$,
$$R_{i,j} = (\hat{C}_i, A_j) = \sum_{k=1}^{j} (\hat{C}_i, \hat{C}_k)(\hat{C}_k, A_j) = \sum_{k=1}^{j} 0 \cdot (\hat{C}_k, A_j) = 0,$$
i.e., if $i > j$, $\hat{C}_i$ is orthogonal to the subspace spanned by $A_1, A_2, \ldots, A_j$. Therefore, the matrix $R$ is upper triangular. Consider, for $m$ real numbers $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_m$, the sum
$$\sum_{j=1}^{m} \alpha_j\, R_{i,j} = \sum_{j=1}^{m} \alpha_j\, (\hat{C}_i, A_j) = (\hat{C}_i, z), \quad \text{where} \quad z = \sum_{j=1}^{m} \alpha_j A_j \in W .$$
If $\sum_{j=1}^{m} \alpha_j R_{i,j} = 0$ for every integer $i$, $1 \leq i \leq m$, then $z = \sum_{i=1}^{m} (\hat{C}_i, z)\, \hat{C}_i = 0$. But since the columns of $A$ are linearly independent, this requires that $\alpha_1 = \alpha_2 = \alpha_3 = \cdots = \alpha_m = 0$. Hence, the columns of the matrix $R$ are linearly independent and $R$ is therefore nonsingular. This factorization of $A$, $A = Q Q^T A = Q R$, is generically called the QR factorization. Since the unique solution of the linear least squares problem solves $A\beta = w = Q Q^T y$, we have $Q R \beta = Q Q^T y$. Now, $Q^T Q = I_m$, i.e.,
$$(Q^T Q)_{i,j} = \left( Q^T [\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_m] \right)_{i,j} = \hat{C}_i^{\,T} \hat{C}_j = (\hat{C}_i, \hat{C}_j) = \delta_{i,j} .$$
So $R\beta = Q^T Q R \beta = Q^T Q Q^T y = Q^T y$, and the unique linear least squares solution is given by $\beta = R^{-1} Q^T y$.
The following steps show that this solution is the same as that obtained by the generalized inverse.
$$A^T = (Q R)^T = R^T Q^T$$
$$A^T A = R^T Q^T Q R = R^T I_m R = R^T R$$
$$(A^T A)^{-1} = (R^T R)^{-1} = R^{-1} (R^T)^{-1}$$
$$(A^T A)^{-1} A^T = R^{-1} (R^T)^{-1} R^T Q^T = R^{-1} Q^T$$
Calculating the Parameter Variances
From the theory of the probability distributions of random variables we have the following fundamental result. If $y_1, y_2, y_3, \ldots, y_n$ are $n$ statistically independent random variables with the variance of $y_j$ being $\sigma_j^2$, and $h = \sum_{j=1}^{n} \alpha_j\, y_j$, then $\sigma_h^2 = \sum_{j=1}^{n} \alpha_j^2\, \sigma_j^2$. Now, using the generalized inverse, the linear least squares solution for parameter $j$ is given by
$$\beta_j = \left[ (A^T A)^{-1} A^T y \right]_{j,1} = \sum_{k=1}^{n} \left[ \sum_{i=1}^{m} \left[ (A^T A)^{-1} \right]_{j,i} A^T_{i,k} \right] y_k = \sum_{k=1}^{n} \left[ (A^T A)^{-1} A^T \right]_{j,k} y_k .$$
Hence,
$$\sigma_{\beta_j}^2 = \sum_{k=1}^{n} \sigma_k^2 \left[ \sum_{i=1}^{m} A_{k,i} \left[ (A^T A)^{-1} \right]_{j,i} \right]^2 .$$
In the special case where all of the random variables have a common variance, $\sigma^2$, this simplifies to
$$\sigma_{\beta_j}^2 = \sigma^2 \sum_{k=1}^{n} \left[ \sum_{i=1}^{m} A_{k,i} \left[ (A^T A)^{-1} \right]_{j,i} \right]^2 = \sigma^2 \sum_{k=1}^{n} \left\{ \left[ (A^T A)^{-1} A^T \right]_{j,k} \right\}^2 .$$
In the QR factorization solution of the linear least squares problem,
$$\beta_j = \left( R^{-1} Q^T y \right)_{j,1} = \sum_{k=1}^{n} \left( R^{-1} Q^T \right)_{j,k} y_k = \sum_{k=1}^{n} y_k \sum_{i=1}^{m} R^{-1}_{j,i}\, Q^T_{i,k} = \sum_{k=1}^{n} y_k \sum_{i=1}^{m} R^{-1}_{j,i}\, (\hat{C}_i)_{k,1}, \quad \text{so that} \quad \sigma_{\beta_j}^2 = \sum_{k=1}^{n} \sigma_k^2 \left[ \sum_{i=1}^{m} R^{-1}_{j,i}\, (\hat{C}_i)_{k,1} \right]^2 .$$
In the special case where all of the random variables have a common variance, this simplifies as follows.
$$\sigma_{\beta_j}^2 = \sigma^2 \sum_{k=1}^{n} \left[ \sum_{i=1}^{m} R^{-1}_{j,i}\, (\hat{C}_i)_{k,1} \right]^2 = \sigma^2 \sum_{i=1}^{m} \sum_{\ell=1}^{m} R^{-1}_{j,i}\, R^{-1}_{j,\ell} \sum_{k=1}^{n} (\hat{C}_i)_{k,1} (\hat{C}_\ell)_{k,1} = \sigma^2 \sum_{i=1}^{m} \sum_{\ell=1}^{m} R^{-1}_{j,i}\, R^{-1}_{j,\ell}\, (\hat{C}_\ell, \hat{C}_i) = \sigma^2 \sum_{i=1}^{m} \sum_{\ell=1}^{m} R^{-1}_{j,i}\, R^{-1}_{j,\ell}\, \delta_{\ell,i} = \sigma^2 \sum_{i=1}^{m} \left( R^{-1}_{j,i} \right)^2$$
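Both variance formulas can be checked against each other numerically; the design matrix and the value of $\sigma$ below are arbitrary choices. Each reduces to the diagonal of the parameter covariance matrix $\sigma^2 (A^T A)^{-1}$, which is also computed directly.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [2.0, 1.0],
                  [3.0, 1.0],
                  [4.0, 1.0],
                  [5.0, 1.0]])
    sigma = 0.3                                   # common standard deviation of the y_i (arbitrary)

    G = np.linalg.inv(A.T @ A) @ A.T              # generalized inverse
    var_gi = sigma**2 * np.sum(G**2, axis=1)      # sigma^2 * sum_k G[j, k]^2

    Q, R = np.linalg.qr(A)
    Rinv = np.linalg.inv(R)
    var_qr = sigma**2 * np.sum(Rinv**2, axis=1)   # sigma^2 * sum_i (R^{-1})[j, i]^2

    # Both match the diagonal of the parameter covariance matrix sigma^2 (A^T A)^{-1}.
    var_direct = sigma**2 * np.diag(np.linalg.inv(A.T @ A))
    print(var_gi, var_qr, var_direct)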
i 1
An Example of the Methods: The Linear Model in One Independent Variable x
 
If the model is that $y_{\text{estimate}} = \beta_1 x + \beta_0$, and the $2 \times 1$ parameter vector is defined as $\beta = \begin{pmatrix} \beta_1 \\ \beta_0 \end{pmatrix}$, then the $n \times 2$ "design" matrix is given by
$$A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} . \quad \text{Hence,} \quad A^T A = \begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix} \quad \text{and} \quad (A^T A)^{-1} = \frac{1}{n \left( \langle x^2 \rangle - \langle x \rangle^2 \right)} \begin{pmatrix} 1 & -\langle x \rangle \\ -\langle x \rangle & \langle x^2 \rangle \end{pmatrix} = \frac{1}{SS_{xx}} \begin{pmatrix} 1 & -\langle x \rangle \\ -\langle x \rangle & \langle x^2 \rangle \end{pmatrix} .$$
Here the average value of a variable is denoted as $\langle \alpha \rangle = \frac{1}{n} \sum_{i=1}^{n} \alpha_i$, while the covariation of two variables, $SS_{\alpha\gamma}$, is defined as
$$SS_{\alpha\gamma} = \sum_{i=1}^{n} \left( \alpha_i - \langle \alpha \rangle \right) \left( \gamma_i - \langle \gamma \rangle \right) = \sum_{i=1}^{n} \alpha_i \gamma_i - n \langle \alpha \rangle \langle \gamma \rangle = \sum_{i=1}^{n} \alpha_i \gamma_i - \frac{1}{n} \left( \sum_{i=1}^{n} \alpha_i \right) \left( \sum_{i=1}^{n} \gamma_i \right), \quad \text{so that} \quad SS_{xx} = n \left( \langle x^2 \rangle - \langle x \rangle^2 \right) .$$
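For a small invented data set, these closed forms for $A^T A$, $(A^T A)^{-1}$, and $SS_{xx}$ can be verified directly (a sketch assuming NumPy).

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])       # invented x values
    n = len(x)

    A = np.column_stack([x, np.ones(n)])          # n x 2 design matrix [x, 1]
    SS_xx = np.sum(x**2) - (np.sum(x)**2) / n     # = n * (<x^2> - <x>^2)

    AtA = A.T @ A
    print(AtA)                                    # [[sum x^2, sum x], [sum x, n]]

    inv_closed = (1.0 / SS_xx) * np.array([[1.0,         -np.mean(x)],
                                           [-np.mean(x),  np.mean(x**2)]])
    print(np.allclose(np.linalg.inv(AtA), inv_closed))   # True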
Also,
$$A^T y = \begin{pmatrix} \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} y_i \end{pmatrix} ,$$
so the least squares solution is given by the following expression.
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_0 \end{pmatrix} = (A^T A)^{-1} A^T y = \frac{1}{SS_{xx}} \begin{pmatrix} \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} y_i \right) \left( \sum_{i=1}^{n} x_i \right) \\[1ex] \langle x^2 \rangle \sum_{i=1}^{n} y_i - \langle x \rangle \sum_{i=1}^{n} x_i y_i \end{pmatrix}$$
These are the "regression" equations, which are sometimes expressed as
$$\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{SS_{xx}} = \frac{SS_{xy}}{SS_{xx}}$$
and
$$\beta_0 = \frac{\left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i \right) - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} x_i y_i \right)}{n\, SS_{xx}} = \frac{\left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right)^2 \right] \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left[ \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right) \right]}{SS_{xx}} = \langle y \rangle - \langle x \rangle \frac{SS_{xy}}{SS_{xx}} = \langle y \rangle - \beta_1 \langle x \rangle .$$
Assuming a common variance $\sigma^2$ for each independent random variable $y_i$, the variances of the model parameters are computed as follows.
$$\sigma_{\beta_1}^2 = \sigma^2 \sum_{k=1}^{n} \left\{ \left[ (A^T A)^{-1} A^T \right]_{1,k} \right\}^2 = \frac{\sigma^2}{SS_{xx}^2} \sum_{k=1}^{n} \left( x_k - \langle x \rangle \right)^2 = \frac{\sigma^2\, n \left( \langle x^2 \rangle - \langle x \rangle^2 \right)}{SS_{xx}^2} = \frac{\sigma^2\, SS_{xx}}{SS_{xx}^2} = \frac{\sigma^2}{SS_{xx}}$$
$$\sigma_{\beta_0}^2 = \sigma^2 \sum_{k=1}^{n} \left\{ \left[ (A^T A)^{-1} A^T \right]_{2,k} \right\}^2 = \frac{\sigma^2}{SS_{xx}^2} \sum_{k=1}^{n} \left( \langle x^2 \rangle - \langle x \rangle x_k \right)^2 = \frac{\sigma^2}{SS_{xx}^2} \sum_{k=1}^{n} \left[ \left( \langle x^2 \rangle - \langle x \rangle^2 \right) - \langle x \rangle \left( x_k - \langle x \rangle \right) \right]^2$$
$$= \frac{\sigma^2}{SS_{xx}^2} \left[ n \left( \langle x^2 \rangle - \langle x \rangle^2 \right)^2 - 2 \left( \langle x^2 \rangle - \langle x \rangle^2 \right) \langle x \rangle \sum_{k=1}^{n} \left( x_k - \langle x \rangle \right) + \langle x \rangle^2 \sum_{k=1}^{n} \left( x_k - \langle x \rangle \right)^2 \right] = \frac{\sigma^2}{SS_{xx}^2} \left[ \frac{SS_{xx}^2}{n} - 0 + \langle x \rangle^2\, SS_{xx} \right] = \sigma^2 \left( \frac{1}{n} + \frac{\langle x \rangle^2}{SS_{xx}} \right)$$


Using the QR formulation to calculate the parameters requires an orthonormal basis for the column space of $A$. A Gram-Schmidt procedure results in the following vectors.
$$\hat{C}_1 = \frac{1}{\sqrt{\sum_{i=1}^{n} x_i^2}} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \frac{1}{\sqrt{n \langle x^2 \rangle}} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$$
$$C_2 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} - \left( A_2, \hat{C}_1 \right) \hat{C}_1 = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} - \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \frac{1}{\sum_{i=1}^{n} x_i^2} \begin{pmatrix} \sum_{i=1}^{n} x_i^2 - x_1 \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i^2 - x_2 \sum_{i=1}^{n} x_i \\ \vdots \\ \sum_{i=1}^{n} x_i^2 - x_n \sum_{i=1}^{n} x_i \end{pmatrix} = \frac{1}{\langle x^2 \rangle} \begin{pmatrix} \langle x^2 \rangle - x_1 \langle x \rangle \\ \langle x^2 \rangle - x_2 \langle x \rangle \\ \vdots \\ \langle x^2 \rangle - x_n \langle x \rangle \end{pmatrix}$$
Normalizing $C_2$ requires the previously encountered sum
$$\sum_{i=1}^{n} \left( \langle x^2 \rangle - x_i \langle x \rangle \right)^2 = n \langle x^2 \rangle \left( \langle x^2 \rangle - \langle x \rangle^2 \right) = \langle x^2 \rangle\, SS_{xx}, \quad \text{so that} \quad \left( \hat{C}_2 \right)_i = \frac{\langle x^2 \rangle - x_i \langle x \rangle}{\sqrt{\langle x^2 \rangle\, SS_{xx}}} .$$
Thus,
$$Q = \begin{pmatrix} \dfrac{x_1}{\sqrt{n \langle x^2 \rangle}} & \dfrac{\langle x^2 \rangle - x_1 \langle x \rangle}{\sqrt{\langle x^2 \rangle\, SS_{xx}}} \\ \vdots & \vdots \\ \dfrac{x_n}{\sqrt{n \langle x^2 \rangle}} & \dfrac{\langle x^2 \rangle - x_n \langle x \rangle}{\sqrt{\langle x^2 \rangle\, SS_{xx}}} \end{pmatrix}$$
and
$$R = Q^T A = \begin{pmatrix} (\hat{C}_1, A_1) & (\hat{C}_1, A_2) \\ (\hat{C}_2, A_1) & (\hat{C}_2, A_2) \end{pmatrix} = \begin{pmatrix} \sqrt{n \langle x^2 \rangle} & \dfrac{n \langle x \rangle}{\sqrt{n \langle x^2 \rangle}} \\ 0 & \sqrt{\dfrac{SS_{xx}}{\langle x^2 \rangle}} \end{pmatrix}, \qquad R^{-1} = \begin{pmatrix} \dfrac{1}{\sqrt{n \langle x^2 \rangle}} & \dfrac{-\langle x \rangle}{\sqrt{\langle x^2 \rangle\, SS_{xx}}} \\ 0 & \sqrt{\dfrac{\langle x^2 \rangle}{SS_{xx}}} \end{pmatrix} .$$
Also,
$$Q^T y = \begin{pmatrix} \dfrac{1}{\sqrt{n \langle x^2 \rangle}} \sum_{i=1}^{n} x_i y_i \\[1ex] \dfrac{1}{\sqrt{\langle x^2 \rangle\, SS_{xx}}} \left( \langle x^2 \rangle \sum_{i=1}^{n} y_i - \langle x \rangle \sum_{i=1}^{n} x_i y_i \right) \end{pmatrix} .$$
Proceeding, computing the parameter vector gives
$$\beta = \begin{pmatrix} \beta_1 \\ \beta_0 \end{pmatrix} = R^{-1} Q^T y = \begin{pmatrix} \dfrac{1}{n \langle x^2 \rangle} \sum_{i=1}^{n} x_i y_i - \dfrac{\langle x \rangle}{\langle x^2 \rangle\, SS_{xx}} \left( \langle x^2 \rangle \sum_{i=1}^{n} y_i - \langle x \rangle \sum_{i=1}^{n} x_i y_i \right) \\[1ex] \dfrac{1}{n\, SS_{xx}} \left[ \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i \right) - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} x_i y_i \right) \right] \end{pmatrix} .$$
The expression for $\beta_0$ agrees with the earlier derivation, while the previous expression for $\beta_1$ is recovered as shown by the following steps.
$$\beta_1 = \frac{1}{n \langle x^2 \rangle} \sum_{i=1}^{n} x_i y_i - \frac{\langle x \rangle}{\langle x^2 \rangle\, SS_{xx}} \left( \langle x^2 \rangle \sum_{i=1}^{n} y_i - \langle x \rangle \sum_{i=1}^{n} x_i y_i \right) = \frac{SS_{xx} \sum_{i=1}^{n} x_i y_i - n \langle x \rangle \left( \langle x^2 \rangle \sum_{i=1}^{n} y_i - \langle x \rangle \sum_{i=1}^{n} x_i y_i \right)}{n \langle x^2 \rangle\, SS_{xx}}$$
$$= \frac{\left( \sum_{i=1}^{n} x_i^2 - n \langle x \rangle^2 \right) \sum_{i=1}^{n} x_i y_i - n \langle x \rangle \langle x^2 \rangle \sum_{i=1}^{n} y_i + n \langle x \rangle^2 \sum_{i=1}^{n} x_i y_i}{n \langle x^2 \rangle\, SS_{xx}} = \frac{\left( \sum_{i=1}^{n} x_i^2 \right) \left[ \sum_{i=1}^{n} x_i y_i - \frac{1}{n} \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right) \right]}{\left( \sum_{i=1}^{n} x_i^2 \right) SS_{xx}} = \frac{SS_{xy}}{SS_{xx}}$$
Assuming a common variance $\sigma^2$ for each independent random variable $y_i$, in the QR factorization the variances of the model parameters are computed as follows.
$$\sigma_{\beta_1}^2 = \sigma^2 \sum_{i=1}^{2} \left( R^{-1}_{1,i} \right)^2 = \sigma^2 \left[ \frac{1}{n \langle x^2 \rangle} + \frac{\langle x \rangle^2}{\langle x^2 \rangle\, SS_{xx}} \right] = \frac{\sigma^2 \left( SS_{xx} + n \langle x \rangle^2 \right)}{n \langle x^2 \rangle\, SS_{xx}} = \frac{\sigma^2\, n \langle x^2 \rangle}{n \langle x^2 \rangle\, SS_{xx}} = \frac{\sigma^2}{SS_{xx}}$$
$$\sigma_{\beta_0}^2 = \sigma^2 \sum_{i=1}^{2} \left( R^{-1}_{2,i} \right)^2 = \sigma^2 \left[ 0^2 + \frac{\langle x^2 \rangle}{SS_{xx}} \right] = \sigma^2\, \frac{\left( \langle x^2 \rangle - \langle x \rangle^2 \right) + \langle x \rangle^2}{SS_{xx}} = \sigma^2 \left( \frac{1}{n} + \frac{\langle x \rangle^2}{SS_{xx}} \right)$$
Both of these results agree with the previous analysis that used the generalized inverse.
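As a final sketch, the explicit $Q$, $R$, and $R^{-1}$ derived above for the straight-line model can be built directly and compared with the regression formulas; the data and $\sigma$ are invented, and in practice a library QR routine would normally be used instead of these closed forms.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 5.0, 7.0])            # invented data
    y = np.array([1.8, 4.2, 5.9, 10.1, 14.2])
    n = len(x)

    mx, mx2 = x.mean(), np.mean(x**2)
    SS_xx = n * (mx2 - mx**2)

    # Orthonormal columns and upper-triangular R from the closed forms derived above.
    C1 = x / np.sqrt(n * mx2)
    C2 = (mx2 - mx * x) / np.sqrt(mx2 * SS_xx)
    Q = np.column_stack([C1, C2])
    R = np.array([[np.sqrt(n * mx2), n * mx / np.sqrt(n * mx2)],
                  [0.0,              np.sqrt(SS_xx / mx2)]])

    A = np.column_stack([x, np.ones(n)])
    print(np.allclose(Q @ R, A))                        # A = Q R holds

    beta = np.linalg.solve(R, Q.T @ y)                  # beta = R^{-1} Q^T y
    slope = np.sum((x - mx) * (y - y.mean())) / SS_xx   # SS_xy / SS_xx
    print(beta, (slope, y.mean() - slope * mx))         # slope and intercept agree

    # Parameter variances sigma^2 * sum_i (R^{-1})_{j,i}^2, with an arbitrary sigma.
    sigma = 0.25
    print(sigma**2 * np.sum(np.linalg.inv(R)**2, axis=1))
    print(sigma**2 / SS_xx, sigma**2 * (1/n + mx**2 / SS_xx))   # same two numbers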