Section 6  Applications

1. Differential Operators

Definition (differential operator): Let
$$ f(x) = f(x_1, x_2, \ldots, x_m) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}. $$
Then
$$ \frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\dfrac{\partial f_1(x)}{\partial x_1} & \dfrac{\partial f_2(x)}{\partial x_1} & \cdots & \dfrac{\partial f_n(x)}{\partial x_1} \\
\dfrac{\partial f_1(x)}{\partial x_2} & \dfrac{\partial f_2(x)}{\partial x_2} & \cdots & \dfrac{\partial f_n(x)}{\partial x_2} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_1(x)}{\partial x_m} & \dfrac{\partial f_2(x)}{\partial x_m} & \cdots & \dfrac{\partial f_n(x)}{\partial x_m}
\end{bmatrix}_{m \times n}. $$

Example 1: Let $f(x) = f(x_1, x_2, x_3) = 3x_1 + 4x_2 + 5x_3$. Then
$$ \frac{\partial f(x)}{\partial x} = \begin{bmatrix} \partial f(x)/\partial x_1 \\ \partial f(x)/\partial x_2 \\ \partial f(x)/\partial x_3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}. $$

Example 2: Let
$$ f(x) = f(x_1, x_2, x_3) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{bmatrix} = \begin{bmatrix} 2x_1 + 6x_2 + x_3 \\ 3x_1 + 2x_2 + 4x_3 \\ 3x_1 + 4x_2 + 7x_3 \end{bmatrix}. $$
Then
$$ \frac{\partial f(x)}{\partial x} = \begin{bmatrix}
\partial f_1/\partial x_1 & \partial f_2/\partial x_1 & \partial f_3/\partial x_1 \\
\partial f_1/\partial x_2 & \partial f_2/\partial x_2 & \partial f_3/\partial x_2 \\
\partial f_1/\partial x_3 & \partial f_2/\partial x_3 & \partial f_3/\partial x_3
\end{bmatrix} = \begin{bmatrix} 2 & 3 & 3 \\ 6 & 2 & 4 \\ 1 & 4 & 7 \end{bmatrix}. $$

Note: In Example 2,
$$ f(x) = \begin{bmatrix} 2 & 6 & 1 \\ 3 & 2 & 4 \\ 3 & 4 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = Ax, \qquad \text{where } A = \begin{bmatrix} 2 & 6 & 1 \\ 3 & 2 & 4 \\ 3 & 4 & 7 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, $$
and
$$ \frac{\partial f(x)}{\partial x} = \frac{\partial (Ax)}{\partial x} = A^{t}. $$

Theorem: If $f(x) = A_{m \times n}\, x_{n \times 1}$, then $\dfrac{\partial f(x)}{\partial x} = A^{t}$.

Theorem: Let $A$ be an $n \times n$ matrix and $x$ an $n \times 1$ vector. Then
$$ \frac{\partial\, x^{t}Ax}{\partial x} = \left(A + A^{t}\right)x. $$

[proof:] Write $x = (x_1, x_2, \ldots, x_n)^{t}$ and $A = [a_{ij}]$, so that $x^{t}Ax = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j$. The $k$'th element of $\dfrac{\partial\, x^{t}Ax}{\partial x}$ is
$$ \frac{\partial\, x^{t}Ax}{\partial x_k}
= \frac{\partial}{\partial x_k}\left[\sum_{j=1}^{n} a_{kj}x_kx_j + \sum_{i \neq k}\sum_{j=1}^{n} a_{ij}x_ix_j\right]
= 2a_{kk}x_k + \sum_{j \neq k} a_{kj}x_j + \sum_{i \neq k} a_{ik}x_i $$
$$ = \left(a_{kk}x_k + \sum_{j \neq k} a_{kj}x_j\right) + \left(a_{kk}x_k + \sum_{i \neq k} a_{ik}x_i\right)
= \sum_{j=1}^{n} a_{kj}x_j + \sum_{i=1}^{n} a_{ik}x_i
= \mathrm{row}_k(A)\,x + \mathrm{row}_k\!\left(A^{t}\right)x, $$
which is the $k$'th element of $\left(A + A^{t}\right)x$. Therefore $\dfrac{\partial\, x^{t}Ax}{\partial x} = \left(A + A^{t}\right)x$. ◆

Corollary: Let $A$ be an $n \times n$ symmetric matrix. Then $\dfrac{\partial\, x^{t}Ax}{\partial x} = 2Ax$.

Example 3: Let
$$ x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 3 & 5 \\ 3 & 4 & 7 \\ 5 & 7 & 9 \end{bmatrix}. $$
Then $x^{t}Ax = x_1^{2} + 6x_1x_2 + 10x_1x_3 + 4x_2^{2} + 14x_2x_3 + 9x_3^{2}$ and
$$ \frac{\partial\, x^{t}Ax}{\partial x} = \begin{bmatrix} 2x_1 + 6x_2 + 10x_3 \\ 6x_1 + 8x_2 + 14x_3 \\ 10x_1 + 14x_2 + 18x_3 \end{bmatrix}
= \begin{bmatrix} 2 & 6 & 10 \\ 6 & 8 & 14 \\ 10 & 14 & 18 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 2Ax. $$

Example 4: For the standard linear regression model $Y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \epsilon_{n \times 1}$,
$$ Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}, $$
the least squares estimate $b$ is the minimizer of $S(\beta) = (Y - X\beta)^{t}(Y - X\beta)$. To find $b$, we need to solve
$$ \frac{\partial S}{\partial \beta} = \left(\frac{\partial S}{\partial \beta_1}, \frac{\partial S}{\partial \beta_2}, \ldots, \frac{\partial S}{\partial \beta_p}\right)^{t} = 0. $$
Since
$$ S(\beta) = Y^{t}Y - \beta^{t}X^{t}Y - Y^{t}X\beta + \beta^{t}X^{t}X\beta = Y^{t}Y - 2Y^{t}X\beta + \beta^{t}X^{t}X\beta, $$
$$ \frac{\partial S}{\partial \beta} = -2X^{t}Y + 2X^{t}X\beta = 0
\;\Longrightarrow\; X^{t}Xb = X^{t}Y
\;\Longrightarrow\; b = \left(X^{t}X\right)^{-1}X^{t}Y. $$

Theorem: Let $A(x) = \left[a_{ij}(x)\right]$ be a square, nonsingular matrix whose elements are functions of a scalar $x$. Then
$$ \frac{\partial A^{-1}(x)}{\partial x} = -A^{-1}(x)\,\frac{\partial A(x)}{\partial x}\,A^{-1}(x),
\qquad \text{where } \frac{\partial A(x)}{\partial x} = \left[\frac{\partial a_{ij}(x)}{\partial x}\right]. $$

Note: This is the matrix version of the scalar rule: if $a(x)$ is a nonzero function of $x$, then
$$ \frac{\partial a^{-1}(x)}{\partial x} = -\frac{a'(x)}{a^{2}(x)} = -a^{-1}(x)\,a'(x)\,a^{-1}(x). $$

Example 5: Let $A = X^{t}X + \lambda I$, where $X$ is an $m \times n$ matrix, $I$ is the $n \times n$ identity matrix, and $\lambda$ is a constant. Then
$$ \frac{\partial A^{-1}}{\partial \lambda}
= -A^{-1}\,\frac{\partial A}{\partial \lambda}\,A^{-1}
= -\left(X^{t}X + \lambda I\right)^{-1}\frac{\partial\left(X^{t}X + \lambda I\right)}{\partial \lambda}\left(X^{t}X + \lambda I\right)^{-1}
= -\left(X^{t}X + \lambda I\right)^{-1}I\left(X^{t}X + \lambda I\right)^{-1}
= -\left(X^{t}X + \lambda I\right)^{-2}. $$
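The matrix-calculus identities above lend themselves to a quick numerical sanity check. The following sketch is an illustration, not part of the original notes; it assumes NumPy is available, and the matrices, random seed, and finite-difference step are arbitrary choices. It compares the closed-form gradient $(A + A^{t})x$ with a central-difference approximation, and checks that the normal-equations solution $b = (X^{t}X)^{-1}X^{t}Y$ agrees with a generic least squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gradient of the quadratic form x'Ax: closed form (A + A')x vs. finite differences.
A = np.array([[1., 3., 5.],
              [3., 4., 7.],
              [5., 7., 9.]])          # the matrix from Example 3
x = rng.normal(size=3)
closed_form = (A + A.T) @ x

h = 1e-6
numeric = np.array([
    ((x + h * e) @ A @ (x + h * e) - (x - h * e) @ A @ (x - h * e)) / (2 * h)
    for e in np.eye(3)
])
print(np.allclose(closed_form, numeric, atol=1e-4))   # True

# Least squares: b = (X'X)^{-1} X'Y agrees with a generic solver.
n, p = 50, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
b_normal_eq = np.linalg.solve(X.T @ X, X.T @ Y)       # solve (X'X) b = X'Y
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(b_normal_eq, b_lstsq))              # True
```

Solving the normal equations directly, rather than forming the inverse explicitly, is the usual numerical choice; the algebra is the same as in Example 4.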
2. Vectors of Random Variables

In this section, the following topics will be discussed:
- expectation and covariance of vectors of random variables;
- mean and variance of quadratic forms;
- independence of random variables and the chi-square distribution.

2.1 Expectation and covariance

Let $Z_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, be random variables and let
$$ Z = \begin{bmatrix} Z_{11} & Z_{12} & \cdots & Z_{1n} \\ Z_{21} & Z_{22} & \cdots & Z_{2n} \\ \vdots & \vdots & & \vdots \\ Z_{m1} & Z_{m2} & \cdots & Z_{mn} \end{bmatrix} $$
be the corresponding random matrix.

Definition:
$$ E(Z) = \begin{bmatrix} E(Z_{11}) & E(Z_{12}) & \cdots & E(Z_{1n}) \\ E(Z_{21}) & E(Z_{22}) & \cdots & E(Z_{2n}) \\ \vdots & \vdots & & \vdots \\ E(Z_{m1}) & E(Z_{m2}) & \cdots & E(Z_{mn}) \end{bmatrix} = \left[E(Z_{ij})\right]_{m \times n}. $$

Let $X = (X_1, X_2, \ldots, X_m)^{t}$ and $Y = (Y_1, Y_2, \ldots, Y_n)^{t}$ be $m \times 1$ and $n \times 1$ random vectors, respectively. The covariance matrix is
$$ C_{X,Y} = \begin{bmatrix} \mathrm{Cov}(X_1, Y_1) & \mathrm{Cov}(X_1, Y_2) & \cdots & \mathrm{Cov}(X_1, Y_n) \\ \mathrm{Cov}(X_2, Y_1) & \mathrm{Cov}(X_2, Y_2) & \cdots & \mathrm{Cov}(X_2, Y_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_m, Y_1) & \mathrm{Cov}(X_m, Y_2) & \cdots & \mathrm{Cov}(X_m, Y_n) \end{bmatrix} = \left[\mathrm{Cov}(X_i, Y_j)\right]_{m \times n}, $$
and the variance matrix is
$$ V(X) = C_{X,X} = \begin{bmatrix} \mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_m) \\ \vdots & \vdots & & \vdots \\ \mathrm{Cov}(X_m, X_1) & \mathrm{Cov}(X_m, X_2) & \cdots & \mathrm{Cov}(X_m, X_m) \end{bmatrix}. $$

Theorem: If $A_{l \times m} = [a_{ij}]$ and $B_{n \times p} = [b_{ij}]$ are constant matrices, then $E(AZB) = A\,E(Z)\,B$.

[proof:] Let $W = [w_{ij}]_{l \times p} = AZB$ and $T = [t_{ij}]_{l \times n} = AZ$, so that $W = TB$. Since $t_{ir} = \sum_{s=1}^{m} a_{is}Z_{sr}$,
$$ w_{ij} = \sum_{r=1}^{n} t_{ir}b_{rj} = \sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}Z_{sr}b_{rj},
\qquad\text{so}\qquad
E(w_{ij}) = E\!\left[\sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}Z_{sr}b_{rj}\right] = \sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}E(Z_{sr})b_{rj}. $$
On the other hand, the $(i, j)$ element of $\widetilde{W} = \left[\widetilde{w}_{ij}\right]_{l \times p} = A\,E(Z)\,B$ is, by the same calculation with $E(Z_{sr})$ in place of $Z_{sr}$,
$$ \widetilde{w}_{ij} = \sum_{r=1}^{n}\sum_{s=1}^{m} a_{is}E(Z_{sr})b_{rj}. $$
Since $E(w_{ij}) = \widetilde{w}_{ij}$ for every $i, j$, it follows that $E(AZB) = E(W) = \widetilde{W} = A\,E(Z)\,B$. ◆

Results:
- $E\!\left(X_{m \times n} + Z_{m \times n}\right) = E\!\left(X_{m \times n}\right) + E\!\left(Z_{m \times n}\right)$;
- $E\!\left(A_{m \times n}X_{n \times 1} + B_{m \times n}Y_{n \times 1}\right) = A\,E\!\left(X_{n \times 1}\right) + B\,E\!\left(Y_{n \times 1}\right)$.

2.2 Mean and variance of quadratic forms

Theorem: Let $Y = (Y_1, Y_2, \ldots, Y_n)^{t}$ be an $n \times 1$ vector of random variables and let $A_{n \times n} = [a_{ij}]$ be an $n \times n$ symmetric matrix. If $E(Y) = 0$ and $V(Y) = \Sigma = [\sigma_{ij}]_{n \times n}$, then
$$ E\!\left(Y^{t}AY\right) = \mathrm{tr}(A\Sigma), $$
where $\mathrm{tr}(M)$ is the sum of the diagonal elements of the matrix $M$.

[proof:]
$$ Y^{t}AY = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}Y_jY_i,
\qquad\text{so}\qquad
E\!\left(Y^{t}AY\right) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}E(Y_jY_i) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}\,\mathrm{Cov}(Y_j, Y_i) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji}\sigma_{ij}. $$
On the other hand, the $i$'th diagonal element of $\Sigma A$ is $\sum_{j=1}^{n}\sigma_{ij}a_{ji}$, so
$$ \mathrm{tr}(A\Sigma) = \mathrm{tr}(\Sigma A) = \sum_{j=1}^{n}\sigma_{1j}a_{j1} + \sum_{j=1}^{n}\sigma_{2j}a_{j2} + \cdots + \sum_{j=1}^{n}\sigma_{nj}a_{jn} = \sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}a_{ji} = E\!\left(Y^{t}AY\right). \;◆ $$

Theorem: $E\!\left(Y^{t}AY\right) = \mathrm{tr}(A\Sigma) + \mu^{t}A\mu$, where $V(Y) = \Sigma$ and $E(Y) = \mu$.

Note: For a random variable $X$ with $\mathrm{Var}(X) = \sigma^{2}$ and $E(X) = \mu$,
$$ E\!\left(aX^{2}\right) = a\,E\!\left(X^{2}\right) = a\left[\mathrm{Var}(X) + \left(E(X)\right)^{2}\right] = a\sigma^{2} + a\mu^{2}, $$
which is the $1 \times 1$ case of the theorem.

Corollary: If $Y_1, Y_2, \ldots, Y_n$ are independently normally distributed with common variance $\sigma^{2}$, then
$$ E\!\left(Y^{t}AY\right) = \sigma^{2}\,\mathrm{tr}(A) + \mu^{t}A\mu. $$

Theorem: If $Y_1, Y_2, \ldots, Y_n$ are independently normally distributed with common variance $\sigma^{2}$, then
$$ \mathrm{Var}\!\left(Y^{t}AY\right) = 2\sigma^{4}\,\mathrm{tr}\!\left(A^{2}\right) + 4\sigma^{2}\mu^{t}A^{2}\mu. $$

2.3 Independence of random variables and the chi-square distribution

Definition of independence: Let $X = (X_1, \ldots, X_m)^{t}$ and $Y = (Y_1, \ldots, Y_n)^{t}$ be $m \times 1$ and $n \times 1$ random vectors with density functions $f_X(x_1, \ldots, x_m)$ and $f_Y(y_1, \ldots, y_n)$, respectively. $X$ and $Y$ are said to be (statistically) independent if the joint density function factors as
$$ f(x_1, \ldots, x_m, y_1, \ldots, y_n) = f_X(x_1, \ldots, x_m)\,f_Y(y_1, \ldots, y_n). $$

Chi-square distribution: $Y \sim \chi^{2}_{k} = \mathrm{gamma}\!\left(\tfrac{k}{2},\, 2\right)$ has the density function
$$ f(y) = \frac{1}{2^{k/2}\,\Gamma\!\left(\tfrac{k}{2}\right)}\,y^{k/2 - 1}\exp\!\left(-\frac{y}{2}\right), \qquad y > 0, $$
where $\Gamma(\cdot)$ is the gamma function. The moment generating function is
$$ M_Y(t) = E\!\left[\exp(tY)\right] = (1 - 2t)^{-k/2}, \qquad t < \tfrac{1}{2}, $$
and the cumulant generating function is
$$ k_Y(t) = \log M_Y(t) = -\frac{k}{2}\log(1 - 2t). $$
Thus
$$ E(Y) = \left.\frac{\partial k_Y(t)}{\partial t}\right|_{t=0} = \left.\frac{k}{1 - 2t}\right|_{t=0} = k,
\qquad
\mathrm{Var}(Y) = \left.\frac{\partial^{2}k_Y(t)}{\partial t^{2}}\right|_{t=0} = \left.\frac{2k}{(1 - 2t)^{2}}\right|_{t=0} = 2k. $$

Theorem: If $Q_1 \sim \chi^{2}_{r_1}$, $Q_2 \sim \chi^{2}_{r_2}$ with $r_1 > r_2$, and $Q = Q_1 - Q_2$ is statistically independent of $Q_2$, then $Q \sim \chi^{2}_{r_1 - r_2}$.

[proof:]
$$ (1 - 2t)^{-r_1/2} = M_{Q_1}(t) = E\!\left[\exp(tQ_1)\right] = E\!\left[\exp\!\left(t(Q_2 + Q)\right)\right]
= E\!\left[\exp(tQ_2)\right]E\!\left[\exp(tQ)\right] = (1 - 2t)^{-r_2/2}\,M_Q(t), $$
where the factorization uses the independence of $Q_2$ and $Q$. Thus
$$ M_Q(t) = (1 - 2t)^{-(r_1 - r_2)/2}, $$
which is the moment generating function of $\chi^{2}_{r_1 - r_2}$. Therefore $Q \sim \chi^{2}_{r_1 - r_2}$. ◆
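The moment formulas for quadratic forms are convenient to verify by simulation. The sketch below is illustrative only (NumPy assumed; the particular $A$, $\mu$, and $\sigma^{2}$ are arbitrary values, not taken from the notes): it draws $Y \sim N(\mu, \sigma^{2}I)$ many times and compares the empirical mean and variance of $Y^{t}AY$ with $\mathrm{tr}(A\Sigma) + \mu^{t}A\mu$ and $2\sigma^{4}\mathrm{tr}(A^{2}) + 4\sigma^{2}\mu^{t}A^{2}\mu$.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 4
A = np.array([[2., 1., 0., 0.],
              [1., 3., 1., 0.],
              [0., 1., 1., 2.],
              [0., 0., 2., 5.]])        # symmetric matrix
mu = np.array([1.0, -1.0, 0.5, 2.0])
sigma2 = 1.5
Sigma = sigma2 * np.eye(n)

# Monte Carlo estimate of E(Y'AY) and Var(Y'AY) for Y ~ N(mu, sigma^2 I)
Y = rng.multivariate_normal(mu, Sigma, size=200_000)
q = np.einsum("ij,jk,ik->i", Y, A, Y)   # Y_i' A Y_i for each draw

mean_theory = np.trace(A @ Sigma) + mu @ A @ mu
var_theory = 2 * sigma2**2 * np.trace(A @ A) + 4 * sigma2 * mu @ A @ A @ mu

print(q.mean(), mean_theory)            # empirical vs. tr(A Sigma) + mu'A mu
print(q.var(), var_theory)              # empirical vs. 2 sigma^4 tr(A^2) + 4 sigma^2 mu'A^2 mu
```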
3. Multivariate Normal Distribution

In this chapter, the following topics will be discussed:
- definition;
- moment generating function and independence of normal variables;
- quadratic forms in normal variables.

3.1 Definition

Intuition: Let $Y \sim N(\mu, \sigma^{2})$. Then the density function is
$$ f(y) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left[-\frac{(y - \mu)^{2}}{2\sigma^{2}}\right]
= (2\pi)^{-1/2}\left[\mathrm{Var}(Y)\right]^{-1/2}\exp\!\left[-\frac{1}{2}(y - \mu)\left(\mathrm{Var}(Y)\right)^{-1}(y - \mu)\right]. $$

Definition (multivariate normal random variable): A random vector $Y = (Y_1, Y_2, \ldots, Y_n)^{t} \sim N(\mu, \Sigma)$ with $E(Y) = \mu$ and $V(Y) = \Sigma$ has the density function
$$ f(y) = f(y_1, y_2, \ldots, y_n) = (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}(y - \mu)^{t}\Sigma^{-1}(y - \mu)\right]. $$

Theorem:
$$ Q = (Y - \mu)^{t}\Sigma^{-1}(Y - \mu) \sim \chi^{2}_{n}. $$

[proof:] Since $\Sigma$ is positive definite, $\Sigma = T\Lambda T^{t}$, where $T$ is a real orthogonal matrix ($TT^{t} = T^{t}T = I$) and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ with $\lambda_i > 0$. Then $\Sigma^{-1} = T\Lambda^{-1}T^{t}$ with $\Lambda^{-1} = \mathrm{diag}(1/\lambda_1, \ldots, 1/\lambda_n)$. Thus
$$ Q = (Y - \mu)^{t}\Sigma^{-1}(Y - \mu) = (Y - \mu)^{t}T\Lambda^{-1}T^{t}(Y - \mu) = X^{t}\Lambda^{-1}X, \qquad X = T^{t}(Y - \mu), $$
and further
$$ Q = X^{t}\Lambda^{-1}X = \sum_{i=1}^{n}\frac{X_i^{2}}{\lambda_i}. $$
Therefore, if we can prove that $X_i \sim N(0, \lambda_i)$ and that the $X_i$ are mutually independent, then $X_i/\sqrt{\lambda_i} \sim N(0, 1)$ and
$$ Q = \sum_{i=1}^{n}\left(\frac{X_i}{\sqrt{\lambda_i}}\right)^{2} \sim \chi^{2}_{n}. $$
The joint density function of $X_1, X_2, \ldots, X_n$ is $g(x) = g(x_1, x_2, \ldots, x_n) = f(y)\,|J|$, where $y = \mu + Tx$ (since $X = T^{t}(Y - \mu)$ is equivalent to $Y = \mu + TX$) and
$$ J = \det\!\left(\frac{\partial y_i}{\partial x_j}\right) = \det(T). $$
Because $\left[\det(T)\right]^{2} = \det(T)\det\!\left(T^{t}\right) = \det\!\left(TT^{t}\right) = \det(I) = 1$, we have $|J| = |\det(T)| = 1$. Therefore the density function of $X_1, X_2, \ldots, X_n$ is
$$ g(x) = f(y)
= (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}(y - \mu)^{t}\Sigma^{-1}(y - \mu)\right]
= (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}x^{t}\Lambda^{-1}x\right]
= (2\pi)^{-n/2}\left[\det(\Sigma)\right]^{-1/2}\exp\!\left[-\frac{1}{2}\sum_{i=1}^{n}\frac{x_i^{2}}{\lambda_i}\right]. $$
Since $\det(\Sigma) = \det\!\left(T\Lambda T^{t}\right) = \det(\Lambda)\det\!\left(TT^{t}\right) = \det(\Lambda) = \prod_{i=1}^{n}\lambda_i$,
$$ g(x) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\!\left(-\frac{x_i^{2}}{2\lambda_i}\right). $$
Therefore $X_i \sim N(0, \lambda_i)$ and the $X_i$ are mutually independent. ◆

3.2 Moment generating function and independence of normal random variables

Moment generating function of a multivariate normal random variable: Let $Y = (Y_1, \ldots, Y_n)^{t} \sim N(\mu, \Sigma)$ and $t = (t_1, t_2, \ldots, t_n)^{t}$. Then the moment generating function of $Y$ is
$$ M_Y(t) = M_Y(t_1, t_2, \ldots, t_n) = E\!\left[\exp\!\left(t^{t}Y\right)\right] = E\!\left[\exp\!\left(t_1Y_1 + t_2Y_2 + \cdots + t_nY_n\right)\right] = \exp\!\left(t^{t}\mu + \frac{1}{2}t^{t}\Sigma t\right). $$

Theorem: If $Y \sim N(\mu, \Sigma)$ and $C$ is a $p \times n$ matrix of rank $p$, then $CY \sim N\!\left(C\mu, C\Sigma C^{t}\right)$.

[proof:] Let $X = CY$ and $s = C^{t}t$. Then
$$ M_X(t) = E\!\left[\exp\!\left(t^{t}X\right)\right] = E\!\left[\exp\!\left(t^{t}CY\right)\right] = E\!\left[\exp\!\left(s^{t}Y\right)\right]
= \exp\!\left(s^{t}\mu + \frac{1}{2}s^{t}\Sigma s\right)
= \exp\!\left(t^{t}C\mu + \frac{1}{2}t^{t}C\Sigma C^{t}t\right). $$
Since $M_X(t)$ is the moment generating function of $N\!\left(C\mu, C\Sigma C^{t}\right)$, $CY \sim N\!\left(C\mu, C\Sigma C^{t}\right)$. ◆

Corollary: If $Y \sim N(\mu, \sigma^{2}I)$, then $TY \sim N\!\left(T\mu, \sigma^{2}I\right)$, where $T$ is an orthogonal matrix.

Theorem: If $Y \sim N(\mu, \Sigma)$, then the marginal distribution of any subset of the elements of $Y$ is also multivariate normal. That is, if $Y = (Y_1, \ldots, Y_n)^{t} \sim N(\mu, \Sigma)$ and $Y_{(s)} = (Y_{i_1}, Y_{i_2}, \ldots, Y_{i_m})^{t}$ with $m \leq n$ and $\{i_1, i_2, \ldots, i_m\} \subset \{1, 2, \ldots, n\}$, then
$$ Y_{(s)} \sim N\!\left(\mu_{(s)}, \Sigma_{(s)}\right), \qquad
\mu_{(s)} = \begin{bmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{bmatrix}, \qquad
\Sigma_{(s)} = \begin{bmatrix}
\sigma_{i_1i_1} & \sigma_{i_1i_2} & \cdots & \sigma_{i_1i_m} \\
\sigma_{i_2i_1} & \sigma_{i_2i_2} & \cdots & \sigma_{i_2i_m} \\
\vdots & \vdots & & \vdots \\
\sigma_{i_mi_1} & \sigma_{i_mi_2} & \cdots & \sigma_{i_mi_m}
\end{bmatrix}, $$
where $\sigma_{jk} = \mathrm{Cov}(Y_j, Y_k)$.

Theorem: $Y$ has a multivariate normal distribution if and only if $a^{t}Y$ is univariate normal for all real vectors $a$.

[proof:]
($\Leftarrow$) Suppose $E(Y) = \mu$, $V(Y) = \Sigma$, and $a^{t}Y$ is univariate normal for every $a$. Since $E\!\left(a^{t}Y\right) = a^{t}E(Y) = a^{t}\mu$ and $V\!\left(a^{t}Y\right) = a^{t}V(Y)a = a^{t}\Sigma a$, we have $a^{t}Y \sim N\!\left(a^{t}\mu, a^{t}\Sigma a\right)$. For a univariate normal variable $Z \sim N\!\left(\mu_Z, \sigma_Z^{2}\right)$, $M_Z(1) = E\!\left[\exp(Z)\right] = \exp\!\left(\mu_Z + \tfrac{1}{2}\sigma_Z^{2}\right)$. Hence
$$ M_Y(a) = E\!\left[\exp\!\left(a^{t}Y\right)\right] = M_{a^{t}Y}(1) = \exp\!\left(a^{t}\mu + \frac{1}{2}a^{t}\Sigma a\right). $$
Since $\exp\!\left(a^{t}\mu + \tfrac{1}{2}a^{t}\Sigma a\right)$ is the moment generating function of $N(\mu, \Sigma)$ evaluated at $a$, $Y$ has the multivariate normal distribution $N(\mu, \Sigma)$.
($\Rightarrow$) This follows from the previous theorem with $C = a^{t}$, a $1 \times n$ matrix. ◆
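The closure of the multivariate normal family under linear maps is easy to check in simulation. Below is a small sketch (illustrative only; NumPy assumed, and $C$, $\mu$, $\Sigma$ are arbitrary choices) that draws $Y \sim N(\mu, \Sigma)$, forms $X = CY$, and compares the sample mean and covariance of $X$ with $C\mu$ and $C\Sigma C^{t}$.

```python
import numpy as np

rng = np.random.default_rng(2)

mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])      # positive definite
C = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 2.0]])         # 2 x 3 matrix of rank 2

Y = rng.multivariate_normal(mu, Sigma, size=100_000)   # rows are draws of Y
X = Y @ C.T                                            # each row is C y

print(X.mean(axis=0), C @ mu)                    # sample mean vs. C mu
print(np.cov(X, rowvar=False), C @ Sigma @ C.T)  # sample covariance vs. C Sigma C'
```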
3.3 Quadratic forms in normal variables

Theorem: Let $Y \sim N(\mu, \sigma^{2}I)$ and let $P$ be an $n \times n$ symmetric matrix of rank $r$. Then
$$ Q = \frac{(Y - \mu)^{t}P(Y - \mu)}{\sigma^{2}} $$
is distributed as $\chi^{2}_{r}$ if and only if $P^{2} = P$ (i.e., $P$ is idempotent).

[proof:]
($\Leftarrow$) Suppose $P^{2} = P$ and $\mathrm{rank}(P) = r$. Then $P$ has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. Thus, without loss of generality,
$$ P = T\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}T^{t}, $$
where $T$ is an orthogonal matrix. Then
$$ Q = \frac{(Y - \mu)^{t}P(Y - \mu)}{\sigma^{2}}
= \frac{(Y - \mu)^{t}T\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}T^{t}(Y - \mu)}{\sigma^{2}}
= \frac{Z^{t}\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}Z}{\sigma^{2}}
= \frac{Z_1^{2} + Z_2^{2} + \cdots + Z_r^{2}}{\sigma^{2}},
\qquad Z = T^{t}(Y - \mu) = (Z_1, Z_2, \ldots, Z_n)^{t}. $$
Since $Z = T^{t}(Y - \mu)$ and $Y \sim N(\mu, \sigma^{2}I)$, we have $Z \sim N\!\left(T^{t}0,\; T^{t}\sigma^{2}IT\right) = N(0, \sigma^{2}I)$; that is, $Z_1, Z_2, \ldots, Z_n$ are i.i.d. normal random variables with common variance $\sigma^{2}$. Therefore
$$ Q = \frac{Z_1^{2} + Z_2^{2} + \cdots + Z_r^{2}}{\sigma^{2}} = \left(\frac{Z_1}{\sigma}\right)^{2} + \left(\frac{Z_2}{\sigma}\right)^{2} + \cdots + \left(\frac{Z_r}{\sigma}\right)^{2} \sim \chi^{2}_{r}. $$

($\Rightarrow$) Since $P$ is symmetric, $P = T\Lambda T^{t}$, where $T$ is an orthogonal matrix and $\Lambda$ is a diagonal matrix whose nonzero diagonal elements are $\lambda_1, \lambda_2, \ldots, \lambda_r$ (the remaining $n - r$ diagonal elements are 0). Let $Z = T^{t}(Y - \mu)$. Since $Y \sim N(\mu, \sigma^{2}I)$, $Z \sim N\!\left(T^{t}0,\; T^{t}\sigma^{2}IT\right) = N(0, \sigma^{2}I)$; that is, $Z_1, Z_2, \ldots, Z_n$ are independent normal random variables with variance $\sigma^{2}$. Then
$$ Q = \frac{(Y - \mu)^{t}T\Lambda T^{t}(Y - \mu)}{\sigma^{2}} = \frac{Z^{t}\Lambda Z}{\sigma^{2}} = \sum_{i=1}^{r}\frac{\lambda_iZ_i^{2}}{\sigma^{2}}. $$
The moment generating function of $Q$ is
$$ E\!\left[\exp\!\left(t\sum_{i=1}^{r}\frac{\lambda_iZ_i^{2}}{\sigma^{2}}\right)\right]
= \prod_{i=1}^{r}\int_{-\infty}^{\infty}\exp\!\left(\frac{t\lambda_iz_i^{2}}{\sigma^{2}}\right)\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{z_i^{2}}{2\sigma^{2}}\right)dz_i
= \prod_{i=1}^{r}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{z_i^{2}(1 - 2\lambda_it)}{2\sigma^{2}}\right)dz_i $$
$$ = \prod_{i=1}^{r}\left(1 - 2\lambda_it\right)^{-1/2}\int_{-\infty}^{\infty}\sqrt{\frac{1 - 2\lambda_it}{2\pi\sigma^{2}}}\exp\!\left(-\frac{z_i^{2}(1 - 2\lambda_it)}{2\sigma^{2}}\right)dz_i
= \prod_{i=1}^{r}\left(1 - 2\lambda_it\right)^{-1/2}. $$
Also, since $Q$ is distributed as $\chi^{2}_{r}$, its moment generating function is also equal to $(1 - 2t)^{-r/2}$. Thus, for every $t$ in a neighborhood of 0,
$$ (1 - 2t)^{-r/2} = \prod_{i=1}^{r}\left(1 - 2\lambda_it\right)^{-1/2},
\qquad\text{and hence}\qquad
(1 - 2t)^{r} = \prod_{i=1}^{r}\left(1 - 2\lambda_it\right). $$
By the uniqueness of polynomial roots, we must have $\lambda_i = 1$ for every $i$. Then $P^{2} = P$ by the following result: a symmetric matrix $P$ is idempotent of rank $r$ if and only if it has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0. ◆

Important result: Let $Y \sim N(0, \sigma^{2}I)$ and let $Q_1 = Y^{t}P_1Y/\sigma^{2}$ and $Q_2 = Y^{t}P_2Y/\sigma^{2}$ both be distributed as chi-square. Then $Q_1$ and $Q_2$ are independent if and only if $P_1P_2 = 0$.

Useful lemma: If $P_1^{2} = P_1$, $P_2^{2} = P_2$, and $P_1 - P_2$ is positive semidefinite, then $P_1P_2 = P_2P_1 = P_2$ and $P_1 - P_2$ is idempotent.

Theorem: Suppose $Y \sim N(\mu, \sigma^{2}I)$ and let
$$ Q_1 = \frac{(Y - \mu)^{t}P_1(Y - \mu)}{\sigma^{2}}, \qquad Q_2 = \frac{(Y - \mu)^{t}P_2(Y - \mu)}{\sigma^{2}}. $$
If $Q_1 \sim \chi^{2}_{r_1}$, $Q_2 \sim \chi^{2}_{r_2}$, and $Q_1 - Q_2 \geq 0$, then $Q_1 - Q_2$ and $Q_2$ are independent and $Q_1 - Q_2 \sim \chi^{2}_{r_1 - r_2}$.

[proof:] We first prove $Q_1 - Q_2 \sim \chi^{2}_{r_1 - r_2}$. Since $Q_1 - Q_2 \geq 0$,
$$ Q_1 - Q_2 = \frac{(Y - \mu)^{t}(P_1 - P_2)(Y - \mu)}{\sigma^{2}} \geq 0, $$
and because $Y - \mu \sim N(0, \sigma^{2}I)$, $Y - \mu$ can be any vector in $R^{n}$. Therefore $P_1 - P_2$ is positive semidefinite. By the useful lemma above, $P_1 - P_2$ is idempotent. Further, by the previous theorem,
$$ Q_1 - Q_2 = \frac{(Y - \mu)^{t}(P_1 - P_2)(Y - \mu)}{\sigma^{2}} \sim \chi^{2}_{r_1 - r_2}, $$
since
$$ \mathrm{rank}(P_1 - P_2) = \mathrm{tr}(P_1 - P_2) = \mathrm{tr}(P_1) - \mathrm{tr}(P_2) = \mathrm{rank}(P_1) - \mathrm{rank}(P_2) = r_1 - r_2. $$
We now prove that $Q_1 - Q_2$ and $Q_2$ are independent. Since $P_1P_2 = P_2P_1 = P_2$,
$$ (P_1 - P_2)P_2 = P_1P_2 - P_2P_2 = P_2 - P_2 = 0. $$
By the previous important result, the proof is complete. ◆

4. Linear Regression

Let $Y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^{2}I)$. Denote $S(\beta) = (Y - X\beta)^{t}(Y - X\beta)$.

In linear algebra,
$$ X\beta = \beta_0\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
+ \beta_1\begin{bmatrix} X_{11} \\ X_{21} \\ \vdots \\ X_{n1} \end{bmatrix}
+ \cdots
+ \beta_{p-1}\begin{bmatrix} X_{1,p-1} \\ X_{2,p-1} \\ \vdots \\ X_{n,p-1} \end{bmatrix} $$
is a linear combination of the column vectors of $X$; that is, $X\beta \in R(X)$, the column space of $X$. Then $S(\beta) = \left\|Y - X\beta\right\|^{2}$ is the squared distance between $Y$ and $X\beta$.

Intuitively, $X\beta$ is the information provided by the covariates $X_1, X_2, \ldots, X_{p-1}$ to interpret the response $Y$. The least squares method is to find the appropriate $b$ such that the distance between $Y$ and $Xb$ is smaller than the distance between $Y$ and any other linear combination $X\beta$ of the column vectors of $X$. That is, $Xb$ is the information that interprets $Y$ most accurately.

Further,
$$ S(\beta) = (Y - X\beta)^{t}(Y - X\beta)
= \left[(Y - Xb) + (Xb - X\beta)\right]^{t}\left[(Y - Xb) + (Xb - X\beta)\right]
= \left\|Y - Xb\right\|^{2} + \left\|Xb - X\beta\right\|^{2} + 2(Y - Xb)^{t}X(b - \beta). $$
If we choose the estimate $b$ such that $Y - Xb$ is orthogonal to every vector in $R(X)$, that is, $(Y - Xb)^{t}X = 0$, then
$$ S(\beta) = \left\|Y - Xb\right\|^{2} + \left\|Xb - X\beta\right\|^{2} \geq \left\|Y - Xb\right\|^{2} = S(b). $$
Thus $b$ satisfying $(Y - Xb)^{t}X = 0$ is the least squares estimate, since $S(b) \leq S(\beta)$ for any other estimate $\beta$. Therefore
$$ 0 = X^{t}(Y - Xb) \;\Longrightarrow\; X^{t}Y = X^{t}Xb \;\Longrightarrow\; b = \left(X^{t}X\right)^{-1}X^{t}Y. $$
Since $\hat{Y} = Xb = X\left(X^{t}X\right)^{-1}X^{t}Y = PY$, where $P = X\left(X^{t}X\right)^{-1}X^{t}$, the matrix $P$ is called the projection matrix or hat matrix: it projects the response vector $Y$ onto the space spanned by the covariate vectors, i.e., the column space of $X$.
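The projection interpretation can be checked directly on a small design matrix. The sketch below is illustrative (NumPy assumed; the design matrix and coefficients are arbitrary choices): it forms the hat matrix $P = X(X^{t}X)^{-1}X^{t}$ and verifies that $P$ is symmetric and idempotent, that $\mathrm{tr}(P) = p$, and that the residuals $(I - P)Y$ are orthogonal to the columns of $X$.

```python
import numpy as np

rng = np.random.default_rng(3)

n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # intercept + 2 covariates
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix X (X'X)^{-1} X'

print(np.allclose(P, P.T))                 # P is symmetric
print(np.allclose(P @ P, P))               # P is idempotent
print(np.isclose(np.trace(P), p))          # tr(P) = rank(P) = p
e = (np.eye(n) - P) @ Y                    # residual vector (I - P)Y
print(np.allclose(X.T @ e, 0))             # residuals orthogonal to the column space of X
```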
The vector of residuals is
$$ e = Y - \hat{Y} = Y - Xb = Y - PY = (I - P)Y. $$
We have the following two important theorems.

Theorem:
1. $P$ and $I - P$ are idempotent.
2. $\mathrm{rank}(I - P) = \mathrm{tr}(I - P) = n - p$.
3. $(I - P)X = 0$.
4. $E(\text{mean residual sum of squares}) = E\!\left(s^{2}\right) = E\!\left[\dfrac{(Y - \hat{Y})^{t}(Y - \hat{Y})}{n - p}\right] = \sigma^{2}$.

[proof:]
1. $PP = X\left(X^{t}X\right)^{-1}X^{t}X\left(X^{t}X\right)^{-1}X^{t} = X\left(X^{t}X\right)^{-1}X^{t} = P$, and $(I - P)(I - P) = I - P - P + PP = I - P - P + P = I - P$.
2. Since $P$ is idempotent, $\mathrm{rank}(P) = \mathrm{tr}(P)$. Thus
$$ \mathrm{rank}(P) = \mathrm{tr}(P) = \mathrm{tr}\!\left[X\left(X^{t}X\right)^{-1}X^{t}\right] = \mathrm{tr}\!\left[\left(X^{t}X\right)^{-1}X^{t}X\right] = \mathrm{tr}\!\left(I_p\right) = p, $$
using $\mathrm{tr}(AB) = \mathrm{tr}(BA)$. Similarly, since $I - P$ is idempotent,
$$ \mathrm{rank}(I - P) = \mathrm{tr}(I - P) = \mathrm{tr}(I) - \mathrm{tr}(P) = n - p, $$
using $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$.
3. $(I - P)X = X - PX = X - X\left(X^{t}X\right)^{-1}X^{t}X = X - X = 0$.
4. The residual sum of squares is
$$ \mathrm{RSS}_{\text{model } p} = e^{t}e = (Y - \hat{Y})^{t}(Y - \hat{Y}) = (Y - Xb)^{t}(Y - Xb) = (Y - PY)^{t}(Y - PY) = Y^{t}(I - P)^{t}(I - P)Y = Y^{t}(I - P)Y, $$
since $I - P$ is symmetric and idempotent. Thus, by the earlier result $E\!\left(Z^{t}AZ\right) = \mathrm{tr}\!\left[A\,V(Z)\right] + \left[E(Z)\right]^{t}A\,E(Z)$,
$$ E\!\left(\mathrm{RSS}_{\text{model } p}\right) = E\!\left[Y^{t}(I - P)Y\right] = \mathrm{tr}\!\left[(I - P)\sigma^{2}I\right] + (X\beta)^{t}(I - P)(X\beta) = \sigma^{2}\,\mathrm{tr}(I - P) + 0 = (n - p)\sigma^{2}, $$
because $(I - P)X = 0$. Therefore
$$ E(\text{mean residual sum of squares}) = E\!\left[\frac{\mathrm{RSS}_{\text{model } p}}{n - p}\right] = \sigma^{2}. \;◆ $$

Theorem: If $Y \sim N\!\left(X\beta, \sigma^{2}I\right)$, where $X$ is an $n \times p$ matrix of rank $p$, then
1. $b \sim N\!\left(\beta, \sigma^{2}\left(X^{t}X\right)^{-1}\right)$;
2. $\dfrac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}} \sim \chi^{2}_{p}$;
3. $\dfrac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \dfrac{(n - p)s^{2}}{\sigma^{2}} \sim \chi^{2}_{n - p}$;
4. $\dfrac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}}$ is independent of $\dfrac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \dfrac{(n - p)s^{2}}{\sigma^{2}}$.

[proof:]
1. Since for a normal random vector $Z \sim N(\mu, \Sigma)$ we have $CZ \sim N\!\left(C\mu, C\Sigma C^{t}\right)$, it follows for $Y \sim N\!\left(X\beta, \sigma^{2}I\right)$ that
$$ b = \left(X^{t}X\right)^{-1}X^{t}Y
\sim N\!\left(\left(X^{t}X\right)^{-1}X^{t}X\beta,\;\; \left(X^{t}X\right)^{-1}X^{t}\left(\sigma^{2}I\right)X\left(X^{t}X\right)^{-1}\right)
= N\!\left(\beta, \sigma^{2}\left(X^{t}X\right)^{-1}\right). $$
2. $b - \beta \sim N\!\left(0, \sigma^{2}\left(X^{t}X\right)^{-1}\right)$. Thus, setting $Z = \dfrac{\left(X^{t}X\right)^{1/2}(b - \beta)}{\sigma} \sim N(0, I_p)$,
$$ \frac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}} = Z^{t}Z \sim \chi^{2}_{p}. $$
3. $(I - P)(I - P) = I - P$ and $\mathrm{rank}(I - P) = n - p$; thus, by the fact that for $A^{2} = A$ with $\mathrm{rank}(A) = r$ and $Z \sim N(\mu, \sigma^{2}I)$ we have $(Z - \mu)^{t}A(Z - \mu)/\sigma^{2} \sim \chi^{2}_{r}$,
$$ \frac{(Y - X\beta)^{t}(I - P)(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n - p}. $$
Since $(I - P)X = 0$, the cross terms $Y^{t}(I - P)X\beta$ and $(X\beta)^{t}(I - P)(Y - X\beta)$ vanish, so
$$ \frac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \frac{(n - p)s^{2}}{\sigma^{2}} = \frac{Y^{t}(I - P)Y}{\sigma^{2}} = \frac{(Y - X\beta)^{t}(I - P)(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n - p}. $$
4. Let
$$ Q_1 = \frac{(Y - X\beta)^{t}(Y - X\beta)}{\sigma^{2}}
= \frac{\left[(Y - Xb) + (Xb - X\beta)\right]^{t}\left[(Y - Xb) + (Xb - X\beta)\right]}{\sigma^{2}}
= \frac{Y^{t}(I - P)Y}{\sigma^{2}} + \frac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}}
= Q_2 + (Q_1 - Q_2), $$
where the cross terms vanish because $(Y - Xb)^{t}X = 0$, and
$$ Q_2 = \frac{(Y - Xb)^{t}(Y - Xb)}{\sigma^{2}} = \frac{(Y - PY)^{t}(Y - PY)}{\sigma^{2}} = \frac{Y^{t}(I - P)Y}{\sigma^{2}},
\qquad
Q_1 - Q_2 = \frac{(Xb - X\beta)^{t}(Xb - X\beta)}{\sigma^{2}} = \frac{(b - \beta)^{t}X^{t}X(b - \beta)}{\sigma^{2}} = \frac{\left\|Xb - X\beta\right\|^{2}}{\sigma^{2}} \geq 0. $$
Since
$$ Q_1 = \frac{(Y - X\beta)^{t}I(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n}
\qquad\left(Z = \frac{Y - X\beta}{\sigma} \sim N(0, I),\; I \text{ is idempotent},\; Z^{t}IZ = Z^{t}Z \sim \chi^{2}_{n}\right), $$
and, by the previous result,
$$ Q_2 = \frac{\mathrm{RSS}_{\text{model } p}}{\sigma^{2}} = \frac{(Y - X\beta)^{t}(I - P)(Y - X\beta)}{\sigma^{2}} \sim \chi^{2}_{n - p}, $$
the theorem of Section 3.3 ($Q_1 \sim \chi^{2}_{r_1}$, $Q_2 \sim \chi^{2}_{r_2}$, $Q_1 - Q_2 \geq 0$, and $Q_1$, $Q_2$ quadratic forms in a multivariate normal vector imply that $Q_2$ is independent of $Q_1 - Q_2$) shows that $\mathrm{RSS}_{\text{model } p}/\sigma^{2}$ is independent of $(b - \beta)^{t}X^{t}X(b - \beta)/\sigma^{2}$. ◆

5. Principal Component Analysis

5.1 Definition

Suppose the data
$$ X_i = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{bmatrix}, \qquad i = 1, \ldots, n, $$
are generated by the random variable $Z = (Z_1, Z_2, \ldots, Z_p)^{t}$. Suppose the covariance matrix of $Z$ is
$$ \Sigma = \begin{bmatrix}
\mathrm{Var}(Z_1) & \mathrm{Cov}(Z_1, Z_2) & \cdots & \mathrm{Cov}(Z_1, Z_p) \\
\mathrm{Cov}(Z_2, Z_1) & \mathrm{Var}(Z_2) & \cdots & \mathrm{Cov}(Z_2, Z_p) \\
\vdots & \vdots & & \vdots \\
\mathrm{Cov}(Z_p, Z_1) & \mathrm{Cov}(Z_p, Z_2) & \cdots & \mathrm{Var}(Z_p)
\end{bmatrix}. $$
Let $a = (s_1, s_2, \ldots, s_p)^{t}$. Then $a^{t}Z = s_1Z_1 + s_2Z_2 + \cdots + s_pZ_p$ is a linear combination of $Z_1, Z_2, \ldots, Z_p$, and
$$ \mathrm{Var}\!\left(a^{t}Z\right) = a^{t}\Sigma a, \qquad \mathrm{Cov}\!\left(b^{t}Z, a^{t}Z\right) = b^{t}\Sigma a, $$
where $b = (b_1, b_2, \ldots, b_p)^{t}$.

The principal components are those uncorrelated linear combinations $Y_1 = a_1^{t}Z,\; Y_2 = a_2^{t}Z,\; \ldots,\; Y_p = a_p^{t}Z$ whose variances $\mathrm{Var}(Y_i)$ are as large as possible, where $a_1, a_2, \ldots, a_p$ are $p \times 1$ vectors. The procedure to obtain the principal components is as follows (a numerical sketch is given after this list):
- First principal component: the linear combination $a_1^{t}Z$ that maximizes $\mathrm{Var}\!\left(a^{t}Z\right)$ subject to $a^{t}a = 1$; that is, $a_1^{t}a_1 = 1$ and $\mathrm{Var}\!\left(a_1^{t}Z\right) \geq \mathrm{Var}\!\left(b^{t}Z\right)$ for any $b$ with $b^{t}b = 1$.
- Second principal component: the linear combination $a_2^{t}Z$ that maximizes $\mathrm{Var}\!\left(a^{t}Z\right)$ subject to $a^{t}a = 1$ and $\mathrm{Cov}\!\left(a_1^{t}Z, a_2^{t}Z\right) = 0$; that is, $a_2^{t}a_2 = 1$, and $a_2^{t}Z$ maximizes the variance among unit-norm linear combinations uncorrelated with the first principal component.
- At the $i$'th step, the $i$'th principal component is the linear combination $a_i^{t}Z$ that maximizes $\mathrm{Var}\!\left(a^{t}Z\right)$ subject to $a^{t}a = 1$ and $\mathrm{Cov}\!\left(a_i^{t}Z, a_k^{t}Z\right) = 0$ for $k < i$; that is, $a_i^{t}a_i = 1$, and $a_i^{t}Z$ maximizes the variance among unit-norm linear combinations uncorrelated with the first $i - 1$ principal components.
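As a concrete illustration of this procedure (a sketch, not part of the notes: NumPy is assumed and the covariance matrix $\Sigma$ below is an arbitrary example), the principal component directions of a given $\Sigma$ are obtained from its eigendecomposition, and one can check numerically that no unit vector attains a larger variance than the leading eigenvector, which is exactly the defining property of the first principal component.

```python
import numpy as np

rng = np.random.default_rng(4)

# An example covariance matrix Sigma for Z = (Z1, Z2, Z3)'
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])

# Eigendecomposition: columns of T are orthonormal eigenvectors a_1, ..., a_p
eigvals, T = np.linalg.eigh(Sigma)           # returned in ascending order
order = np.argsort(eigvals)[::-1]            # reorder so lambda_1 >= lambda_2 >= ...
eigvals, T = eigvals[order], T[:, order]

a1 = T[:, 0]                                 # first principal component direction
print(eigvals[0], a1 @ Sigma @ a1)           # Var(a1'Z) equals the largest eigenvalue

# No random unit vector b achieves a larger variance b' Sigma b
b = rng.normal(size=(1000, 3))
b /= np.linalg.norm(b, axis=1, keepdims=True)
print(np.max(np.einsum("ij,jk,ik->i", b, Sigma, b)) <= eigvals[0] + 1e-12)  # True
```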
Intuitively, the principal components with large variance contain the "important" information, while those with small variance might be "redundant". For example, suppose we have 4 variables $Z_1, Z_2, Z_3, Z_4$ with $Z_3 = Z_4$, $\mathrm{Var}(Z_1) = 4$, $\mathrm{Var}(Z_2) = 3$, $\mathrm{Var}(Z_3) = 2$, and suppose $Z_1, Z_2, Z_3$ are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required, since two of them are the same. Applying the procedure above, the first two principal components (each with variance 4) are
$$ \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_1
\qquad\text{and}\qquad
\begin{bmatrix} 0 & 0 & \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = \frac{Z_3 + Z_4}{\sqrt{2}}, $$
the third principal component (with variance 3) is
$$ \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = Z_2, $$
and the fourth principal component is
$$ \begin{bmatrix} 0 & 0 & \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{bmatrix} = \frac{Z_3 - Z_4}{\sqrt{2}} = 0, $$
which has variance 0 because $Z_3 = Z_4$. Therefore the fourth principal component is redundant; that is, only 3 "important" pieces of information are hidden in $Z_1, Z_2, Z_3$ and $Z_4$.

Theorem: $a_1, a_2, \ldots, a_p$ are the orthonormal eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$. In addition, the variances of the principal components are the eigenvalues; that is,
$$ \mathrm{Var}(Y_i) = \mathrm{Var}\!\left(a_i^{t}Z\right) = \lambda_i, \qquad i = 1, \ldots, p. $$

[justification:] Since $\Sigma$ is symmetric and nonsingular, $\Sigma = P\Lambda P^{t}$, where $P$ is an orthonormal matrix whose $i$'th column is $a_i$ ($a_i^{t}a_j = a_j^{t}a_i = 0$ for $i \neq j$, $a_i^{t}a_i = 1$), $\Lambda$ is a diagonal matrix with diagonal elements $\lambda_1, \lambda_2, \ldots, \lambda_p$, and $\lambda_i$ is the eigenvalue of $\Sigma$ corresponding to $a_i$. Thus
$$ \Sigma = \lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}. $$
For any unit vector $b = c_1a_1 + c_2a_2 + \cdots + c_pa_p$ (since $a_1, a_2, \ldots, a_p$ is a basis of $R^{p}$), with $c_1, c_2, \ldots, c_p \in R$ and $\sum_{i=1}^{p}c_i^{2} = 1$,
$$ \mathrm{Var}\!\left(b^{t}Z\right) = b^{t}\Sigma b = b^{t}\left(\lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}\right)b = c_1^{2}\lambda_1 + c_2^{2}\lambda_2 + \cdots + c_p^{2}\lambda_p \leq \lambda_1, $$
and
$$ \mathrm{Var}\!\left(a_1^{t}Z\right) = a_1^{t}\Sigma a_1 = a_1^{t}\left(\lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}\right)a_1 = \lambda_1. $$
Thus $a_1^{t}Z$ is the first principal component and $\mathrm{Var}\!\left(a_1^{t}Z\right) = \lambda_1$.

Similarly, for any unit vector $c$ satisfying $\mathrm{Cov}\!\left(c^{t}Z, a_1^{t}Z\right) = c^{t}\Sigma a_1 = 0$, we can write $c = d_2a_2 + \cdots + d_pa_p$, where $d_2, d_3, \ldots, d_p \in R$ and $\sum_{i=2}^{p}d_i^{2} = 1$. Then
$$ \mathrm{Var}\!\left(c^{t}Z\right) = c^{t}\Sigma c = c^{t}\left(\lambda_1a_1a_1^{t} + \lambda_2a_2a_2^{t} + \cdots + \lambda_pa_pa_p^{t}\right)c = d_2^{2}\lambda_2 + \cdots + d_p^{2}\lambda_p \leq \lambda_2, $$
and
$$ \mathrm{Var}\!\left(a_2^{t}Z\right) = a_2^{t}\Sigma a_2 = \lambda_2. $$
Thus $a_2^{t}Z$ is the second principal component and $\mathrm{Var}\!\left(a_2^{t}Z\right) = \lambda_2$. The other principal components can be justified similarly. ◆

5.2 Estimation

The principal components above are the theoretical principal components. To find the "estimated" principal components, we estimate the theoretical variance-covariance matrix $\Sigma$ by the sample variance-covariance matrix $\hat{\Sigma}$,
$$ \hat{\Sigma} = \begin{bmatrix}
\hat{V}(Z_1) & \hat{C}(Z_1, Z_2) & \cdots & \hat{C}(Z_1, Z_p) \\
\hat{C}(Z_2, Z_1) & \hat{V}(Z_2) & \cdots & \hat{C}(Z_2, Z_p) \\
\vdots & \vdots & & \vdots \\
\hat{C}(Z_p, Z_1) & \hat{C}(Z_p, Z_2) & \cdots & \hat{V}(Z_p)
\end{bmatrix}, $$
where
$$ \hat{V}(Z_j) = \frac{\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^{2}}{n - 1}, \qquad
\hat{C}(Z_j, Z_k) = \frac{\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)\left(X_{ik} - \bar{X}_k\right)}{n - 1}, \qquad j, k = 1, \ldots, p, $$
and $\bar{X}_j = \dfrac{\sum_{i=1}^{n}X_{ij}}{n}$. Then, if $e_1, e_2, \ldots, e_p$ are the orthonormal eigenvectors of $\hat{\Sigma}$ corresponding to the eigenvalues $\hat{\lambda}_1 \geq \hat{\lambda}_2 \geq \cdots \geq \hat{\lambda}_p$, the $i$'th estimated principal component is
$$ \hat{Y}_i = e_i^{t}Z, \qquad i = 1, \ldots, p, $$
and the estimated variance of the $i$'th estimated principal component is $\hat{V}(\hat{Y}_i) = \hat{\lambda}_i$.
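A minimal sketch of this estimation step (NumPy assumed; the simulated data below are only for illustration): form the sample variance-covariance matrix, take its eigendecomposition, and read off the estimated principal component directions and variances.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data matrix: n observations of p = 3 correlated variables
n, p = 500, 3
true_Sigma = np.array([[4.0, 1.5, 0.5],
                       [1.5, 3.0, 0.8],
                       [0.5, 0.8, 2.0]])
X = rng.multivariate_normal(np.zeros(p), true_Sigma, size=n)

# Sample variance-covariance matrix (divisor n - 1)
Xc = X - X.mean(axis=0)
Sigma_hat = Xc.T @ Xc / (n - 1)          # same as np.cov(X, rowvar=False)

# Orthonormal eigenvectors e_1, ..., e_p and eigenvalues lambda_1 >= ... >= lambda_p
lam_hat, E = np.linalg.eigh(Sigma_hat)
order = np.argsort(lam_hat)[::-1]
lam_hat, E = lam_hat[order], E[:, order]

scores = Xc @ E                          # i'th column: i'th estimated principal component
print(lam_hat)                           # estimated variances of the principal components
print(scores.var(axis=0, ddof=1))        # agree with lam_hat
```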