Lecture XXVII
Orthonormal Bases and Projections
Suppose that a set of vectors {x1,…,xr} forms a basis for some subspace S of Rm, where r ≤ m. For mathematical simplicity, we may want to form an orthonormal basis for this space. One way to form such a basis is Gram-Schmidt orthonormalization. In this procedure, we want to generate a new set of vectors {y1,…,yr} that are orthogonal, and then scale them to unit length.
The Gram-Schmidt process is:
$$y_1 = x_1$$

$$y_2 = x_2 - \frac{x_2'y_1}{y_1'y_1}\,y_1$$

$$y_3 = x_3 - \frac{x_3'y_1}{y_1'y_1}\,y_1 - \frac{x_3'y_2}{y_2'y_2}\,y_2$$

$$z_i = \frac{y_i}{(y_i'y_i)^{1/2}}$$
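As an illustrative sketch of the procedure in NumPy (the function name gram_schmidt and the use of NumPy are my own choices, not part of the original slides):

```python
import numpy as np

def gram_schmidt(X):
    """Return (Y, Z): orthogonal columns Y and orthonormal columns Z
    built from the columns of X, assumed linearly independent."""
    X = np.asarray(X, dtype=float)
    m, r = X.shape
    Y = np.zeros((m, r))
    for i in range(r):
        y = X[:, i].copy()
        # subtract the projection of x_i onto each earlier y_j
        for j in range(i):
            y -= (X[:, i] @ Y[:, j]) / (Y[:, j] @ Y[:, j]) * Y[:, j]
        Y[:, i] = y
    Z = Y / np.sqrt((Y * Y).sum(axis=0))   # z_i = y_i / (y_i'y_i)^{1/2}
    return Y, Z
```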
Example
 1
9
 
 
x1   3 , x2   7 
 4
16 
 
 
 1
 
9 7 16 3 
 70 
1
9


 4     13 
 
  3    50 
y2   7  
  
13
1


16 
 4   20 


  1 3 4  3    13 
 
 4
 
The vectors can then be normalized to length one. First, to test for orthogonality:

$$\begin{pmatrix} 1 & 3 & 4 \end{pmatrix}\begin{pmatrix} \tfrac{70}{13} \\[2pt] -\tfrac{50}{13} \\[2pt] \tfrac{20}{13} \end{pmatrix} = \frac{70 - 150 + 80}{13} = 0$$
 Theorem 2.13 Every r-dimensional vector space, except
the zero-dimensional space {0}, has an orthonormal
basis.
Theorem 2.14 Let {z1,…,zr} be an orthonormal basis for some vector space S of Rm. Then each x ∈ Rm can be expressed uniquely as

$$x = u + v$$

where u ∈ S and v is a vector that is orthogonal to every vector in S.
Definition 2.10 Let S be a vector subspace of Rm. The orthogonal complement of S, denoted S⊥, is the collection of all vectors in Rm that are orthogonal to every vector in S; that is, S⊥ = {x : x ∈ Rm and x'y = 0 for all y ∈ S}.
Theorem 2.15 If S is a vector subspace of Rm, then its orthogonal complement S⊥ is also a vector subspace of Rm.
Projection Matrices
 The orthogonal projection of an m x 1 vector x onto a
vector space S can be expressed in matrix form.
Let {z1,…,zr} be any orthonormal basis for S while {z1,…,zm} is an orthonormal basis for Rm. Any vector x can be written as:

$$x = \alpha_1 z_1 + \cdots + \alpha_r z_r + \alpha_{r+1} z_{r+1} + \cdots + \alpha_m z_m = u + v$$
Aggregating the coefficients as $\alpha = (\alpha_1', \alpha_2')'$, where $\alpha_1 = (\alpha_1,\ldots,\alpha_r)'$ and $\alpha_2 = (\alpha_{r+1},\ldots,\alpha_m)'$, and assuming a similar decomposition $Z = [Z_1 \; Z_2]$, the vector x can be written as:

$$x = Z\alpha = Z_1\alpha_1 + Z_2\alpha_2$$

$$u = Z_1\alpha_1, \qquad v = Z_2\alpha_2$$

Given orthonormality, we know that $Z_1'Z_1 = I_r$ and $Z_1'Z_2 = 0$, and so

$$Z_1Z_1'x = Z_1Z_1'\begin{bmatrix} Z_1 & Z_2 \end{bmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{bmatrix} Z_1 & 0 \end{bmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = Z_1\alpha_1 = u$$
Theorem 2.17 Suppose the columns of the m x r matrix Z1 form an orthonormal basis for the vector space S, which is a subspace of Rm. If x ∈ Rm, the orthogonal projection of x onto S is given by Z1Z1'x.
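As a small sketch of Theorem 2.17, reusing the example vectors and the gram_schmidt sketch from above (the test vector x is arbitrary):

```python
import numpy as np

# Orthonormal basis for S = span{(1,3,4)', (9,7,16)'}
_, Z1 = gram_schmidt(np.array([[1.0, 9.0],
                               [3.0, 7.0],
                               [4.0, 16.0]]))

x = np.array([1.0, 2.0, 3.0])    # an arbitrary vector in R^3
u = Z1 @ Z1.T @ x                # orthogonal projection of x onto S
v = x - u                        # component of x orthogonal to S

print(np.allclose(Z1.T @ v, 0))  # True: v is orthogonal to every vector in S
```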
Projection matrices allow the division of the space into a spanned space and a set of orthogonal deviations from the spanning set. One such separation involves the Gram-Schmidt system.
In general, if we define the m x r matrix X1 = (x1,…,xr) and let A be the linear transformation of this matrix that produces an orthonormal basis, so that:

$$Z_1 = X_1 A$$

we are left with the result that:

$$Z_1'Z_1 = A'X_1'X_1A = I_r$$
Given that the matrix A is nonsingular, the projection matrix that maps any vector x onto the spanning set then becomes:

$$P_S = Z_1Z_1' = X_1AA'X_1' = X_1(X_1'X_1)^{-1}X_1'$$
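A brief numerical check that the projection matrix built from the raw basis X1 equals the one built from the orthonormal basis Z1 (again reusing the gram_schmidt sketch and the earlier example vectors):

```python
import numpy as np

X1 = np.array([[1.0, 9.0],
               [3.0, 7.0],
               [4.0, 16.0]])
_, Z1 = gram_schmidt(X1)

P_from_Z = Z1 @ Z1.T                              # P_S = Z1 Z1'
P_from_X = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T   # P_S = X1 (X1'X1)^{-1} X1'

print(np.allclose(P_from_Z, P_from_X))            # True: both give the projection onto S
```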
Ordinary least squares is also a spanning decomposition. In the traditional linear model:

$$y = Xb + \varepsilon$$

$$\hat{y} = X\hat{b}$$

within this formulation b is chosen to minimize the error between y and the estimated y:

$$(y - Xb)'(y - Xb)$$
This problem implies minimizing the distance between the observed y and the predicted plane Xb, which implies orthogonality. If X has full column rank, the projection matrix becomes $X(X'X)^{-1}X'$ and the projection then becomes:

$$Xb = X(X'X)^{-1}X'y$$
Premultiplying each side by X' yields:

$$X'Xb = X'X(X'X)^{-1}X'y$$

$$b = (X'X)^{-1}X'X(X'X)^{-1}X'y$$

$$b = (X'X)^{-1}X'y$$
An idempotent matrix can be defined as any matrix A such that AA = A.
Note that the sum of squared errors under the projection can be expressed as:

$$\begin{aligned}
SSE &= (y - Xb)'(y - Xb) \\
&= \left(y - X(X'X)^{-1}X'y\right)'\left(y - X(X'X)^{-1}X'y\right) \\
&= \left(\left(I_n - X(X'X)^{-1}X'\right)y\right)'\left(I_n - X(X'X)^{-1}X'\right)y \\
&= y'\left(I_n - X(X'X)^{-1}X'\right)\left(I_n - X(X'X)^{-1}X'\right)y
\end{aligned}$$
In general, the matrix $I_n - X(X'X)^{-1}X'$ is referred to as an idempotent matrix, that is, a matrix for which AA = A:

$$\begin{aligned}
\left(I_n - X(X'X)^{-1}X'\right)\left(I_n - X(X'X)^{-1}X'\right)
&= I_n - X(X'X)^{-1}X' - X(X'X)^{-1}X' + X(X'X)^{-1}X'X(X'X)^{-1}X' \\
&= I_n - X(X'X)^{-1}X'
\end{aligned}$$
Thus, the SSE can be expressed as:

$$SSE = y'\left(I_n - X(X'X)^{-1}X'\right)y = y'y - y'X(X'X)^{-1}X'y = v'v$$

which is the sum of the orthogonal errors from the regression.
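Continuing the artificial example from the previous sketch, a short check of idempotency and of the equivalent expressions for the SSE:

```python
import numpy as np

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # I_n - X(X'X)^{-1}X', the "residual maker"

print(np.allclose(M @ M, M))                       # idempotent: MM = M

v = M @ y                                          # orthogonal errors from the regression
SSE = (y - X @ b) @ (y - X @ b)
print(np.allclose(SSE, y @ M @ y))                 # SSE = y'(I_n - X(X'X)^{-1}X')y
print(np.allclose(SSE, v @ v))                     # SSE = v'v
```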
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors (or, more appropriately, latent roots and characteristic vectors) are defined by the solution

$$Ax = \lambda x$$

for a nonzero x. Mathematically, we can solve for the eigenvalue by rearranging the terms:

$$Ax - \lambda x = 0$$

$$(A - \lambda I)x = 0$$

Solving for λ then involves solving the characteristic equation that is implied by:

$$\left|A - \lambda I\right| = 0$$
Again using the matrix in the previous example:

$$\left|\begin{pmatrix} 1 & 9 & 5 \\ 3 & 7 & 8 \\ 2 & 3 & 5 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\right| = \begin{vmatrix} 1-\lambda & 9 & 5 \\ 3 & 7-\lambda & 8 \\ 2 & 3 & 5-\lambda \end{vmatrix} = -5 + 14\lambda + 13\lambda^2 - \lambda^3 = 0$$
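The characteristic polynomial and its roots can be checked numerically; note that np.poly returns the coefficients of det(λI − A), which is the negative of the polynomial written above:

```python
import numpy as np

A = np.array([[1.0, 9.0, 5.0],
              [3.0, 7.0, 8.0],
              [2.0, 3.0, 5.0]])

print(np.poly(A))             # approximately [1, -13, -14, 5]: lambda^3 - 13 lambda^2 - 14 lambda + 5
print(np.linalg.eigvals(A))   # the three roots of the characteristic equation
```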
In general, there are m roots to the characteristic equation. Some of these roots may be the same. In the above case, the roots are real but irrational. Turning to another example:
5  3 3


A  4  2 3    {1,2,5}
4  4 5
The eigenvectors are then determined by the linear dependence in the A - λI matrix. Taking the last example with λ = 1:

$$A - I = \begin{pmatrix} 4 & -3 & 3 \\ 4 & -3 & 3 \\ 4 & -4 & 4 \end{pmatrix}$$

Obviously, the first and second rows are identical. The reduced system then implies that as long as x2 = x3 and x1 = 0, the resulting vector (A - I)x is zero, so (0, 1, 1)' is an eigenvector associated with λ = 1.
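A numerical check of this example (NumPy may scale or flip the sign of the eigenvector, so it is rescaled before printing):

```python
import numpy as np

A = np.array([[5.0, -3.0, 3.0],
              [4.0, -2.0, 3.0],
              [4.0, -4.0, 5.0]])

vals, vecs = np.linalg.eig(A)
print(np.sort(vals.real))                   # approximately [1, 2, 5]

# Eigenvector associated with lambda = 1, rescaled so its largest entry is 1:
v = vecs[:, np.argmin(np.abs(vals - 1.0))]
print(v / v[np.abs(v).argmax()])            # approximately (0, 1, 1)'
```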
Theorem 11.5.1 For any symmetric matrix A, there exists an orthogonal matrix H (that is, a square matrix satisfying H'H = I) such that:

$$H'AH = L$$

where L is a diagonal matrix. The diagonal elements of L are called the characteristic roots (or eigenvalues) of A. The ith column of H is called the characteristic vector (or eigenvector) of A corresponding to the ith characteristic root of A.
The proof follows directly from the definition of eigenvalues. Letting H be the matrix with the eigenvectors as its columns, it is obvious that

$$AH = HL$$

$$H'AH = H'HL = L$$
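A sketch of Theorem 11.5.1 in NumPy, using an arbitrary symmetric matrix chosen only for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])                  # a symmetric matrix

vals, H = np.linalg.eigh(A)                      # columns of H are orthonormal eigenvectors

print(np.allclose(H.T @ H, np.eye(3)))           # H is orthogonal: H'H = I
print(np.allclose(H.T @ A @ H, np.diag(vals)))   # H'AH = L, a diagonal matrix
```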
Kronecker Products
Two special matrix operations that you will encounter are the Kronecker product and the vec() operator.
The Kronecker product multiplies each element of the first matrix by the entire second matrix:
$$A \otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}$$
 The vec(.) operator then involves stacking the columns
of a matrix on top of one another.
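A short sketch of both operations in NumPy; the matrices A and B are arbitrary examples, and reshaping with order='F' stacks columns, matching the vec definition here:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

print(np.kron(A, B))                  # Kronecker product: each a_ij multiplies the entire matrix B

vec_A = A.reshape(-1, 1, order='F')   # vec(A): stack the columns of A on top of one another
print(vec_A.ravel())                  # [1, 3, 2, 4]
```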