Notes

Linear Algebra
Take the model with L endogenous and K exogenous variables. The jth equation for the ith observation is
$$\sum_{l=1}^{L} v_{lj}\, y_{il} = \sum_{k=1}^{K} \beta_{kj}\, x_{ik} + \varepsilon_{ij}$$
We can write this (for j = 1, ..., L) as
$$y_i' V = x_i' B + \varepsilon_i'$$
where $y_i' = (y_{i1}, \ldots, y_{iL})$ and $x_i' = (x_{i1}, \ldots, x_{iK})$ are observation vectors. $V$ and $B$ are coefficient matrices of orders L×L and K×L:
$$V = \begin{pmatrix} v_{11} & \cdots & v_{1L} \\ \vdots & & \vdots \\ v_{L1} & \cdots & v_{LL} \end{pmatrix}, \qquad B = \begin{pmatrix} \beta_{11} & \cdots & \beta_{1L} \\ \vdots & & \vdots \\ \beta_{K1} & \cdots & \beta_{KL} \end{pmatrix}$$
where there are N observations, i = 1, ..., N. Combining all observations,
$$Y V = X B + E \qquad (1)$$
where Y and X are of orders N×L and N×K:
$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1L} \\ \vdots & & \vdots \\ y_{N1} & \cdots & y_{NL} \end{pmatrix}, \quad X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{N1} & \cdots & x_{NK} \end{pmatrix}, \quad E = \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1L} \\ \vdots & & \vdots \\ \varepsilon_{N1} & \cdots & \varepsilon_{NL} \end{pmatrix}$$
If  is nonsingular, we can postmultiply above by  1
Y   X 1  E 1
(2)
and get the reduced form while the earlier is the structural equation
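As a small numerical sketch of the relation between (1) and (2), the following numpy snippet (all matrices made up for illustration) generates Y from the reduced form and checks that it satisfies the structural form:

```python
import numpy as np

rng = np.random.default_rng(14)
N, L, K = 50, 3, 4
V = rng.normal(size=(L, L)) + 3 * np.eye(L)   # made-up nonsingular coefficient matrix
B = rng.normal(size=(K, L))                   # made-up K x L coefficient matrix
X = rng.normal(size=(N, K))
E = rng.normal(size=(N, L))

Vinv = np.linalg.inv(V)
Y = (X @ B + E) @ Vinv                  # reduced form (2): Y = X B V^{-1} + E V^{-1}
print(np.allclose(Y @ V, X @ B + E))    # structural form (1): Y V = X B + E
```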
Variance Matrix
Let r be a column vector of random variables $r_1, \ldots, r_N$. Its variance (covariance) matrix is
$$E\left[(r - E(r))(r - E(r))'\right] = \begin{pmatrix} \operatorname{var}(r_1) & \operatorname{cov}(r_1, r_2) & \cdots & \operatorname{cov}(r_1, r_N) \\ \operatorname{cov}(r_2, r_1) & \operatorname{var}(r_2) & & \vdots \\ \vdots & & \ddots & \\ \operatorname{cov}(r_N, r_1) & \cdots & & \operatorname{var}(r_N) \end{pmatrix} = \Sigma$$
Least Squares
y  x  
b   x x  x y
1
(3)
because ee  y y  2 y xb  bx xb is minimized by changing b.
b     x x  x 
1
if E    is 0 and x is nonrandom, then b is unbiased.
var(b)   x x  x  var    x  x x    2  x x 
1
1
1
since var     2  . Substitution of (3) into e  y  xb yields the protection matrix e  My ,
where M is symmetric and idempotent.
M  I  X  X X  X 
1
since MX=0, e  My  M  x      M  . Also M 2 =M, causing the residual sum of squares
ee   M M    M 2 
ee   M 
Trace:
The trace of a square matrix is the sum of its diagonal elements.
tr(AB) = tr(BA) if both products exist
tr(A+B) = tr(A) + tr(B) if A and B are of the same order
Example:
$$E(e'e) = E(\varepsilon'M\varepsilon) = \operatorname{tr} E(M\varepsilon\varepsilon')$$
$$= \operatorname{tr} E(\varepsilon\varepsilon') - \operatorname{tr}\left[X(X'X)^{-1}X'E(\varepsilon\varepsilon')\right]$$
$$= \sigma^2 \operatorname{tr}(I) - \sigma^2 \operatorname{tr}\left[(X'X)^{-1}X'X\right]$$
$$= \sigma^2 n - \sigma^2 k = \sigma^2(n - k)$$
Generalized Least Squares
If $\operatorname{cov}(\varepsilon) = \sigma^2 V$ rather than $\sigma^2 I$, where V is nonsingular, then
$$\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y$$
and
$$\operatorname{var}(\hat\beta) = \sigma^2 (X'V^{-1}X)^{-1}$$
(derived later in Aitken's Theorem).
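A minimal numpy sketch of the GLS formula under an assumed AR(1)-style covariance V (all data made up), compared with plain OLS:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
X = rng.normal(size=(n, k))
beta = np.array([1.0, 2.0])

# Made-up nonsingular V: an AR(1)-type correlation structure
rho = 0.7
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(V)
y = X @ beta + L @ rng.normal(size=n)            # eps with covariance proportional to V

Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_gls, beta_ols)
```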
Partitioned matrices:
Take the system in (2) and rewrite it as $Y = [Y_1 \;\; Y_2]$, where
$$Y_1 = \begin{pmatrix} y_{11} & y_{12} \\ \vdots & \vdots \\ y_{n1} & y_{n2} \end{pmatrix}, \qquad Y_2 = \begin{pmatrix} y_{13} & \cdots & y_{1L} \\ \vdots & & \vdots \\ y_{n3} & \cdots & y_{nL} \end{pmatrix}$$
We partition by sets of columns.
Rules:
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} + \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}+B_{11} & A_{12}+B_{12} \\ A_{21}+B_{21} & A_{22}+B_{22} \end{pmatrix}$$
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11}+A_{12}B_{21} & A_{11}B_{12}+A_{12}B_{22} \\ A_{21}B_{11}+A_{22}B_{21} & A_{21}B_{12}+A_{22}B_{22} \end{pmatrix}$$
Inverse of a symmetric partitioned matrix:
1
 D

 DBC 1
A B

 1
 B C 
1
1
1 


 C BD C  C BDBC 
 A1  A1BEBA1  A1BE 
=

 EBA1
E 

where D   A  BC 1 B   and E   C  B A1 B  . We use the first when C is nonsingular and
1
second when A is nonsingular.
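A short numpy check of the partitioned-inverse formula on a made-up symmetric positive definite matrix, comparing the block formula against a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 3, 2
G = rng.normal(size=(p + q, p + q))
S = G @ G.T + (p + q) * np.eye(p + q)    # made-up symmetric positive definite matrix

A, B, C = S[:p, :p], S[:p, p:], S[p:, p:]
Cinv = np.linalg.inv(C)
D = A - B @ Cinv @ B.T                   # D = A - B C^{-1} B'
Dinv = np.linalg.inv(D)

top = np.hstack([Dinv, -Dinv @ B @ Cinv])
bottom = np.hstack([-Cinv @ B.T @ Dinv, Cinv + Cinv @ B.T @ Dinv @ B @ Cinv])
block_inverse = np.vstack([top, bottom])

print(np.allclose(block_inverse, np.linalg.inv(S)))   # True
```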
Application: deviation from mean
i i
 x i

ix 
x x 
bottom right of inverse
 X MX 
1
1
M    ii 
n
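A small numpy sketch (made-up X) showing that M = I − ιι'/n demeans the columns of X and that the bottom-right block of the inverse equals (X'MX)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 30, 2
X = rng.normal(size=(n, k)) + 5.0        # made-up data with nonzero means
iota = np.ones((n, 1))

M = np.eye(n) - iota @ iota.T / n
print(np.allclose(M @ X, X - X.mean(axis=0)))          # deviations from means

Z = np.hstack([iota, X])                               # regressors with a constant
full_inv = np.linalg.inv(Z.T @ Z)
print(np.allclose(full_inv[1:, 1:], np.linalg.inv(X.T @ M @ X)))
```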
Kronecker Product
A special form of partitioning when all submatrices are scalar multiples of the same matrix.
$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}$$
We refer to this as the Kronecker product of A and B. If A is m×n and B is p×q, then the order of $A \otimes B$ is mp×nq.
$$(A \otimes B)(C \otimes D) = AC \otimes BD$$
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \quad \text{since} \quad (A \otimes B)(A^{-1} \otimes B^{-1}) = AA^{-1} \otimes BB^{-1} = I$$
$$(A \otimes B)' = A' \otimes B'$$
$$A \otimes (B + C) = A \otimes B + A \otimes C$$
$$(B + C) \otimes A = B \otimes A + C \otimes A$$
Application: Joint GLS
yj  xj  j   j
 X1 0
 y1  
  0
  
 y2  
 0
j=1,......,L for L endogenous variable
0
0  1   1 
   
    
0    
   
X L 
  L    L 
Y  Z 
If we assume the n disturbances of each of the L equations have equal variance and are uncorrelated over observations, so that $\operatorname{var}(\varepsilon_j) = \sigma_{jj} I$, and that for $j \neq i$, $E(\varepsilon_j \varepsilon_i') = \sigma_{ji} I$ contains only contemporaneous covariances, then the full covariance matrix is
$$\operatorname{var}(\varepsilon) = \begin{pmatrix} \sigma_{11}I & \sigma_{12}I & \cdots & \sigma_{1L}I \\ \sigma_{21}I & \sigma_{22}I & & \vdots \\ \vdots & & \ddots & \\ \sigma_{L1}I & \cdots & & \sigma_{LL}I \end{pmatrix} = \Sigma \otimes I$$
where $\Sigma = [\sigma_{ij}]$.
Assuming $\Sigma$ is nonsingular, application of GLS results in
$$\hat\beta = \left(Z'(\Sigma^{-1} \otimes I)Z\right)^{-1} Z'(\Sigma^{-1} \otimes I)y$$
and
$$\operatorname{var}(\hat\beta) = \left(Z'(\Sigma^{-1} \otimes I)Z\right)^{-1}$$
$\hat\beta$ is superior to least squares unless the $X_j$ are all identical, $X_j = X$, in which case
$$Z = \begin{pmatrix} X & 0 & \cdots & 0 \\ 0 & X & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & X \end{pmatrix} = I \otimes X$$
$$Z'(\Sigma^{-1} \otimes I)Z = (I \otimes X)'(\Sigma^{-1} \otimes I)(I \otimes X) = \Sigma^{-1} \otimes (X'X)$$
so var GLS = var OLS; or unless $\Sigma$ is diagonal, indicating no covariance between equations j and i.
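A minimal numpy sketch of this joint GLS estimator for L = 2 equations; the regressor blocks, coefficients and Σ are all made up, and the true Σ is used directly in the formula:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X1 = rng.normal(size=(n, 2))
X2 = rng.normal(size=(n, 3))
b1, b2 = np.array([1.0, -1.0]), np.array([0.5, 2.0, -0.5])

Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])                    # made-up contemporaneous covariances
E = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
y1, y2 = X1 @ b1 + E[:, 0], X2 @ b2 + E[:, 1]

# Stack: y = Z beta + eps, with Z block-diagonal
y = np.concatenate([y1, y2])
Z = np.block([[X1, np.zeros((n, 3))],
              [np.zeros((n, 2)), X2]])

Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))   # Sigma^{-1} kron I
beta_gls = np.linalg.solve(Z.T @ Omega_inv @ Z, Z.T @ Omega_inv @ y)
print(beta_gls)                                        # close to (b1, b2)
```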
Vectorization
Sometimes we need to work with vectors rather than matrices, e.g. when finding the variance of $\hat\beta_k$ when the β's are arranged in a matrix B. We therefore vectorize the parameters. Let $A = [a_1 \; \ldots \; a_q]$ be a p×q matrix, $a_i$ being the i'th column of A. Then
$$\operatorname{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_q \end{pmatrix}$$
which is a pq-element column vector.
$$\operatorname{vec}(A + B) = \operatorname{vec}(A) + \operatorname{vec}(B)$$
$$\operatorname{vec}(AB) = (I \otimes A)\operatorname{vec}(B) = (B' \otimes I)\operatorname{vec}(A)$$
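A quick numpy check of the vec identities on made-up matrices (numpy stores arrays row-major, so vec, which stacks columns, is flatten with order="F"):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

vec = lambda M: M.flatten(order="F")          # stack the columns

lhs = vec(A @ B)
print(np.allclose(lhs, np.kron(np.eye(B.shape[1]), A) @ vec(B)))   # (I kron A) vec(B)
print(np.allclose(lhs, np.kron(B.T, np.eye(A.shape[0])) @ vec(A))) # (B' kron I) vec(A)
```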
Definiteness
A matrix A is said to be positive definite if the quadratic form $x'Ax$ is positive for any $x \neq 0$. The covariance matrix $\Sigma$ of any random vector is always positive semidefinite.
Example: b being BLUE implies that $\operatorname{var}(\tilde\beta) - \operatorname{var}(b)$ is positive semidefinite for any other linear unbiased estimator $\tilde\beta$.
Proof: $\tilde\beta$ is linear in y, so $\tilde\beta = By$ for some matrix B. Define $C = B - (X'X)^{-1}X'$, so we can write $\tilde\beta = By$ as
$$\tilde\beta = \left[C + (X'X)^{-1}X'\right]y = \left[C + (X'X)^{-1}X'\right](X\beta + \varepsilon)$$
$$= (CX + I)\beta + \left[C + (X'X)^{-1}X'\right]\varepsilon$$
Unbiasedness implies $CX = 0$.
$$\operatorname{var}(\tilde\beta) = \left[C + (X'X)^{-1}X'\right]\operatorname{var}(\varepsilon)\left[C + (X'X)^{-1}X'\right]'$$
$$= \sigma^2 CC' + \sigma^2 (X'X)^{-1} + \underbrace{\sigma^2 CX(X'X)^{-1}}_{0} + \underbrace{\sigma^2 (X'X)^{-1}X'C'}_{0}$$
The difference $\sigma^2 CC'$ is positive semidefinite because $\sigma^2 w'CC'w = (\sigma C'w)'(\sigma C'w)$ is the non-negative squared length of the vector $\sigma C'w$ (an inner product). We use definiteness in the evaluation of minima and maxima; e.g. for maximization of utility the Hessian should be negative definite. If a matrix is definite and block diagonal, the principal submatrices are also definite.
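A numpy sketch illustrating the result above: for an arbitrary (made-up) linear unbiased estimator with coefficient matrix $(X'X)^{-1}X' + C$ and CX = 0, the difference of variance matrices is σ²CC' and is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma2 = 40, 3, 1.0
X = rng.normal(size=(n, k))
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T

# Any C with CX = 0 gives another linear unbiased estimator (XtX_inv X' + C) y
C = rng.normal(size=(k, n)) @ M
var_b = sigma2 * XtX_inv
var_alt = sigma2 * (XtX_inv @ X.T + C) @ (XtX_inv @ X.T + C).T

diff = var_alt - var_b                               # should equal sigma2 * C C'
print(np.allclose(diff, sigma2 * C @ C.T))
print(np.all(np.linalg.eigvalsh(diff) >= -1e-10))    # positive semidefinite
```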
Diagonalization
For some n×n matrix A, we seek a vector x such that Ax equals λx (λ a scalar). This is trivially satisfied by x = 0, so we impose $x'x = 1$, implying $x \neq 0$.
$$Ax = \lambda x \;\Rightarrow\; (A - \lambda I)x = 0, \qquad x'x = 1 \qquad (4)$$
$$\Rightarrow\; A - \lambda I \text{ is singular}$$
$$\Rightarrow\; |A - \lambda I| = 0 \quad \text{(characteristic equation)}$$
If A is diagonal, then its diagonal elements are the solutions of the characteristic equation. The determinant is a polynomial of degree n whose roots $\lambda_i$ are the latent roots. The product of the $\lambda_i$ equals the determinant of A, and the sum of the $\lambda_i$ equals the trace of A. The corresponding vectors x are called characteristic (eigen)vectors.
If A is symmetric, $x'Ay = y'Ax$, so assume $\lambda_i$ and $\lambda_j$ are two different roots with vectors $x_i$ and $x_j$. Premultiplying $Ax_i = \lambda_i x_i$ by $x_j'$ and $Ax_j = \lambda_j x_j$ by $x_i'$ and then subtracting:
$$x_j'Ax_i - x_i'Ax_j = (\lambda_i - \lambda_j)x_i'x_j$$
For a symmetric matrix the left side vanishes and $(\lambda_i - \lambda_j)$ is nonzero, implying orthogonality of the characteristic vectors:
$$X'X = I, \qquad \text{where } X = [x_1 \; x_2 \; \ldots \; x_n]$$
$$AX = [Ax_1 \; Ax_2 \; \ldots \; Ax_n] = [\lambda_1 x_1 \; \ldots \; \lambda_n x_n]$$
$$AX = X\Lambda$$
where $\Lambda$ is diagonal with $\lambda_i$ on the diagonal. Premultiplying by X',
$$X'AX = X'X\Lambda = \Lambda$$
This double multiplication diagonalizes A (symmetric). Also, postmultiplying by X',
$$AXX' = A = X\Lambda X' = \sum_{i=1}^{n} \lambda_i x_i x_i'$$
Special cases: if $Ax = \lambda x$, then
$$A^2 x = \lambda Ax = \lambda^2 x$$
which shows that the roots of $A^2$ are the squares of the roots of A. For symmetric and nonsingular A, premultiply both sides of $Ax = \lambda x$ by $(\lambda A)^{-1}$ to obtain
$$A^{-1}x = \frac{1}{\lambda}x$$
All latent roots of a positive definite matrix are positive.
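A numpy illustration of these facts for a made-up symmetric matrix: spectral decomposition A = XΛX' = Σλᵢxᵢxᵢ', the trace and determinant identities, and the roots of A² and A⁻¹:

```python
import numpy as np

rng = np.random.default_rng(9)
G = rng.normal(size=(4, 4))
A = G + G.T                                   # made-up symmetric matrix

lam, X = np.linalg.eigh(A)                    # latent roots and orthonormal vectors
print(np.allclose(X.T @ X, np.eye(4)))        # X'X = I
print(np.allclose(X.T @ A @ X, np.diag(lam))) # X'AX = Lambda
print(np.allclose(A, sum(l * np.outer(x, x) for l, x in zip(lam, X.T))))

print(np.isclose(lam.prod(), np.linalg.det(A)))   # product of roots = determinant
print(np.isclose(lam.sum(), np.trace(A)))         # sum of roots = trace
print(np.allclose(np.sort(np.linalg.eigvalsh(A @ A)), np.sort(lam**2)))
print(np.allclose(np.sort(np.linalg.eigvalsh(np.linalg.inv(A))), np.sort(1 / lam)))
```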
Aitken’s theorem:
Any symmetric positive definite matrix A can be written as $A = QQ'$ where Q is some nonsingular matrix. For example, consider the covariance matrix $\sigma^2 V$. This matrix is positive definite, so its inverse is as well, and we can decompose it as $V^{-1} = QQ'$. We premultiply both sides of
$$y = X\beta + \varepsilon$$
by Q':
$$Q'y = (Q'X)\beta + Q'\varepsilon$$
$$\hat\beta = \left[(Q'X)'(Q'X)\right]^{-1}(Q'X)'Q'y = (X'QQ'X)^{-1}X'QQ'y = (X'V^{-1}X)^{-1}X'V^{-1}y \qquad \text{(GLS)}$$
$$\operatorname{var}(Q'\varepsilon) = \sigma^2 Q'VQ = \sigma^2 Q'(QQ')^{-1}Q = \sigma^2 Q'(Q')^{-1}Q^{-1}Q = \sigma^2 I$$
so the transformed disturbances satisfy the classical assumptions.
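A numpy sketch (made-up data) verifying that OLS on the Q'-transformed model reproduces the GLS estimator, with V⁻¹ = QQ' obtained here from a Cholesky factor:

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 50, 2
X = rng.normal(size=(n, k))
rho = 0.5
V = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # made-up V
y = X @ np.array([1.0, -1.0]) + np.linalg.cholesky(V) @ rng.normal(size=n)

Q = np.linalg.cholesky(np.linalg.inv(V))      # V^{-1} = Q Q'
Xs, ys = Q.T @ X, Q.T @ y                     # transform the model by Q'
beta_transformed_ols = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(beta_transformed_ols, beta_gls))    # True
```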
Cholesky decomposition
Rather than using an orthogonal matrix X, as in the previous diagonalization, it is also possible to use a triangular matrix. For instance, consider a diagonal D and an upper triangular C with units on the diagonal:
$$C = \begin{pmatrix} 1 & c_{12} & c_{13} \\ 0 & 1 & c_{23} \\ 0 & 0 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{pmatrix}$$
yielding
$$C'DC = \begin{pmatrix} d_1 & d_1 c_{12} & d_1 c_{13} \\ d_1 c_{12} & d_1 c_{12}^2 + d_2 & d_1 c_{12}c_{13} + d_2 c_{23} \\ d_1 c_{13} & d_1 c_{12}c_{13} + d_2 c_{23} & d_1 c_{13}^2 + d_2 c_{23}^2 + d_3 \end{pmatrix}$$
Any symmetric positive definite matrix $A = [a_{ij}]$ can be uniquely written as $C'DC$ ($d_1 = a_{11}$, $c_{12} = a_{12}/a_{11}$, etc.), which is referred to as the Cholesky decomposition.
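A numpy sketch recovering the unit-diagonal factor C and the diagonal D from numpy's Cholesky factor (numpy returns A = LL' with L lower triangular; writing L as a unit-diagonal triangle times diag(√dᵢ) gives A = C'DC), on a made-up positive definite matrix:

```python
import numpy as np

rng = np.random.default_rng(11)
G = rng.normal(size=(3, 3))
A = G @ G.T + 3 * np.eye(3)            # made-up symmetric positive definite matrix

L = np.linalg.cholesky(A)              # A = L L', L lower triangular
s = np.diag(L)                         # positive diagonal of L
C = (L / s).T                          # upper triangular with unit diagonal
D = np.diag(s**2)

print(np.allclose(C.T @ D @ C, A))                     # A = C'DC
print(np.isclose(D[0, 0], A[0, 0]))                    # d1 = a11
print(np.isclose(C[0, 1], A[0, 1] / A[0, 0]))          # c12 = a12 / a11
```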
Simultaneous diagonalization of two matrices
$$(A - \lambda B)x = 0, \qquad x'Bx = 1 \qquad (5)$$
where A and B are symmetric n×n matrices and B is positive definite. $B^{-1}$ is then also positive definite, so $B^{-1} = QQ'$. Writing $y = Q^{-1}x$,
$$(Q'AQ - \lambda I)y = 0, \qquad y'y = 1$$
This shows that (5) can be reduced to (4) when A in (4) is interpreted as Q'AQ. If A is symmetric, so is Q'AQ. (5) has n solutions $\lambda_1, \ldots, \lambda_n$, and if they are distinct, the vectors $x_i$ are unique. Write $(A - \lambda B)x = 0$ as
$$Ax_i = \lambda_i Bx_i$$
and
$$AX = BX\Lambda$$
Premultiplication by X' gives
$$X'AX = X'BX\Lambda$$
Therefore
$$X'AX = \Lambda \qquad \text{and} \qquad X'BX = I,$$
which shows that the two matrices are diagonalized simultaneously, A into the latent root matrix and B into the identity matrix.
Example: Constrained extremum
Maximize xAx subject to xBx  1 with respect to x
Lagrangian  x AX   ( x Bx  1)
L
 2 Ax  2  Bx  Ax   Bx
x
which shows that  must be a root. Next premultiply by x   x Ax   x Bx   shows that
the largest root i is the maximum of xAx st xBx  1
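A sketch using scipy's generalized symmetric eigensolver to illustrate the simultaneous diagonalization X'AX = Λ, X'BX = I, and the constrained maximum (A and B are made up):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(12)
G, H = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
A = G + G.T                          # symmetric
B = H @ H.T + 4 * np.eye(4)          # symmetric positive definite

lam, X = eigh(A, B)                  # solves (A - lambda B) x = 0 with X'BX = I
print(np.allclose(X.T @ B @ X, np.eye(4)))        # X'BX = I
print(np.allclose(X.T @ A @ X, np.diag(lam)))     # X'AX = Lambda

# The largest root is the maximum of x'Ax subject to x'Bx = 1
x = X[:, -1]                         # roots are returned in ascending order
print(np.isclose(x @ A @ x, lam[-1]), np.isclose(x @ B @ x, 1.0))
```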
Principal components
Consider an n×k observation matrix Z. The objective is to approximate Z by a matrix of unit rank, $vc'$, where v is an n-element vector and c is a k-element coefficient vector. The objective is to minimize the discrepancy matrix $Z - vc'$ by minimizing the sum of squares of all kn discrepancies, and we also impose $v'v = 1$ to be able to solve for c. The solution when $v = v_1$ and $c = c_1$ is
$$(ZZ' - \lambda_1 I)v_1 = 0, \qquad c_1 = Z'v_1$$
so $\lambda_1$ is the largest latent root of $ZZ'$. Next, to approximate Z by $v_1c_1' + v_2c_2'$, we again minimize the discrepancies subject to $v_2'v_2 = 1$ and $v_2'v_1 = 0$. The solution again is
$$(ZZ' - \lambda_2 I)v_2 = 0, \qquad c_2 = Z'v_2$$
where $\lambda_2$ is the second largest root and $v_2$ the corresponding vector. This generalizes to i roots.
Derivation for one root:
Since the sum of squares of the elements of any matrix A equals tr(A'A), the discrepancy sum of squares is
$$\operatorname{tr}\left[(Z - vc')'(Z - vc')\right] = \operatorname{tr}Z'Z - \operatorname{tr}(cv'Z) - \operatorname{tr}(Z'vc') + \operatorname{tr}(cv'vc')$$
The two middle terms are equal scalar inner products, so this equals
$$\operatorname{tr}Z'Z - 2v'Zc + (v'v)(c'c) = \operatorname{tr}Z'Z - 2v'Zc + c'c$$
using $v'v = 1$. The derivative with respect to c is $-2Z'v + 2c$, so $c = Z'v$. Substituting this back in, we get $\operatorname{tr}Z'Z - v'ZZ'v$, so to minimize the discrepancy we maximize $v'ZZ'v$ with respect to v subject to $v'v = 1$.
$$\text{Lagrangian:} \quad v'ZZ'v - \lambda(v'v - 1)$$
$$ZZ'v = \lambda v \;\Rightarrow\; v'ZZ'v = \lambda v'v = \lambda$$
This shows that taking λ to be the largest root of $ZZ'$ minimizes the discrepancy. Extend to i components.
Z'Z need not be diagonal; the observed variables can be correlated. But the principal components are all uncorrelated because $v_i'v_j = 0$ for $i \neq j$. Therefore, these components can be viewed as “uncorrelated linear combinations of correlated variables”.
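A numpy sketch of the unit-rank approximation on a made-up Z: the leading eigenvector of ZZ' gives v₁, c₁ = Z'v₁, and v₁c₁' matches the rank-one truncation of the SVD:

```python
import numpy as np

rng = np.random.default_rng(13)
n, k = 20, 5
Z = rng.normal(size=(n, k))            # made-up observation matrix

lam, V = np.linalg.eigh(Z @ Z.T)       # latent roots of ZZ' (ascending order)
v1 = V[:, -1]                          # vector for the largest root
c1 = Z.T @ v1
approx = np.outer(v1, c1)              # unit-rank approximation v1 c1'

# Compare with the rank-one truncation of the SVD
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
svd_rank1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.allclose(approx, svd_rank1))  # same rank-one matrix (signs cancel)

# Component weight vectors are orthogonal: v_i'v_j = 0 for i != j
v2 = V[:, -2]
print(np.isclose(v1 @ v2, 0.0))
```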