STAT 501 Formula Sheet

Moments:
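\[
E(X) = \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix},
\qquad
V(X) = \Sigma = \begin{bmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{bmatrix}
\]
\[ E(AX) = A\mu \]
\[ V(AX) = A \Sigma A' \]
\[ \mathrm{Cov}(AX, BX) = A \Sigma B' \]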
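The three rules for linear transformations can be checked numerically. The sketch below is illustrative only: the values of `mu`, `Sigma`, `A`, and `B` are made-up inputs, and the helper names are hypothetical.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Multiply matrices A and B, both given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Return the transpose of A."""
    return [list(col) for col in zip(*A)]

# Made-up mean vector and covariance matrix for a 2-dimensional X.
mu = [1.0, 2.0]
Sigma = [[1.0, 0.5], [0.5, 2.0]]
A = [[1.0, 1.0]]    # AX = X1 + X2
B = [[1.0, -1.0]]   # BX = X1 - X2

mean_AX = matvec(A, mu)                             # A mu
var_AX = matmul(matmul(A, Sigma), transpose(A))     # A Sigma A'
cov_AX_BX = matmul(matmul(A, Sigma), transpose(B))  # A Sigma B'
```

Here `var_AX` agrees with the scalar identity Var(X1 + X2) = Var(X1) + Var(X2) + 2 Cov(X1, X2).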
Multivariate Normal Distributions:
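\[ f(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\} \]
\[ L(\mu, \Sigma) = (2\pi)^{-np/2} |\Sigma|^{-n/2} \, e^{-\frac{1}{2} \mathrm{tr}(\Sigma^{-1} A)} \, e^{-\frac{n}{2} (\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu)}
\quad \text{where } A = \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})' \]
\[ \text{Volume} = \frac{2 \pi^{p/2}}{p \, \Gamma(p/2)} \left[ \chi^2_{p}(\alpha) \right]^{p/2} |\Sigma|^{1/2} \]
\[ |\Sigma| = \lambda_1 \lambda_2 \cdots \lambda_p \]
\[ \mathrm{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} \]
where $\Sigma e_i = \lambda_i e_i$ and $\|e_i\| = 1$ and $e_i' e_j = 0$ $(i \neq j)$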
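As a small illustration of the density formula, the sketch below evaluates f(x) for p = 2 with an explicit 2 x 2 inverse. The function name `bvn_density` and the example inputs are hypothetical; restricting to p = 2 is an assumption made only to keep the example free of a linear algebra library.

```python
import math

def bvn_density(x, mu, Sigma):
    """Multivariate normal density for p = 2, using the closed-form 2x2 inverse."""
    p = 2
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    inv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
           [-Sigma[1][0] / det, Sigma[0][0] / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)' Sigma^{-1} (x - mu)
    quad = sum(d[i] * inv[i][k] * d[k] for i in range(p) for k in range(p))
    return (2.0 * math.pi) ** (-p / 2.0) * det ** (-0.5) * math.exp(-0.5 * quad)

# At the mean of a standard bivariate normal the density is 1 / (2 pi).
peak = bvn_density([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```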
Conditional Moments:
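\[ \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) \]
\[ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \]
\[ r_{ij \cdot k} = \frac{r_{ij} - r_{ik} r_{jk}}{\sqrt{1 - r_{ik}^2} \, \sqrt{1 - r_{jk}^2}} \]
\[ t = \frac{r \sqrt{n - 2 - k}}{\sqrt{1 - r^2}} \quad \text{on } (n - 2 - k) \text{ d.f.} \]
\[ Z = \frac{1}{2} \ln\left( \frac{1 + r}{1 - r} \right) \;\dot\sim\; N\left( \frac{1}{2} \ln\left( \frac{1 + \rho}{1 - \rho} \right), \; \frac{1}{n - 3 - k} \right) \]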
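The partial correlation, its t statistic, and the Fisher transformation are all scalar formulas, so a short sketch may help when checking hand calculations. The function names are hypothetical, and `k` is taken here to be the number of variables held fixed, which is an assumption about the sheet's notation.

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation of variables i and j, adjusting for variable k."""
    return (r_ij - r_ik * r_jk) / (math.sqrt(1.0 - r_ik ** 2) * math.sqrt(1.0 - r_jk ** 2))

def t_statistic(r, n, k):
    """t statistic for a (partial) correlation r, on n - 2 - k d.f."""
    return r * math.sqrt(n - 2 - k) / math.sqrt(1.0 - r ** 2)

def fisher_z(r):
    """Fisher's Z transformation of a correlation r."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

r_partial = partial_corr(0.6, 0.5, 0.5)   # (0.6 - 0.25) / 0.75
t_value = t_statistic(0.5, n=30, k=1)     # 0.5 * sqrt(27) / sqrt(0.75)
```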
Sample Moments:
\[ \bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j
\qquad
S = \frac{1}{n - 1} \sum_{j=1}^{n} (x_j - \bar{x})(x_j - \bar{x})' \]
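A minimal sketch of the sample mean vector and sample covariance matrix, written in plain Python. The function name `sample_moments` and the three-row data set are made up for illustration.

```python
def sample_moments(rows):
    """Return (mean vector, sample covariance matrix S) for a list of observation vectors."""
    n = len(rows)
    p = len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(p)]
    # S uses the divisor n - 1, as in the formula above.
    S = [[sum((r[i] - mean[i]) * (r[k] - mean[k]) for r in rows) / (n - 1)
          for k in range(p)] for i in range(p)]
    return mean, S

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]]
xbar, S = sample_moments(data)
```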
Percentage Points for t-Distribution:
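\[ \left( t_{v, \alpha/2} \right)^{-1} = -.0953 - .631 f(v) + .81 g(\alpha) + .076 h(v, \alpha) \]
where $f(v) = \dfrac{1}{v + 1}$ and $h(v, \alpha) = \left( \alpha \sqrt{2 \pi v} \right)^{1/v}$ and $g(\alpha) = \left[ -\ln\left( \alpha (2 - \alpha) \right) \right]^{-1/2}$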
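The approximation for t percentage points can be coded in a few lines. This sketch assumes the reading of the formula in which the left side is the reciprocal of the upper alpha/2 point, f(v) = 1/(v + 1), g(alpha) = [-ln(alpha(2 - alpha))]^(-1/2), and h(v, alpha) = (alpha sqrt(2 pi v))^(1/v); the function name is hypothetical.

```python
import math

def approx_t_upper(v, alpha):
    """Approximate the upper alpha/2 percentage point of t with v d.f."""
    f = 1.0 / (v + 1.0)
    g = (-math.log(alpha * (2.0 - alpha))) ** -0.5
    h = (alpha * math.sqrt(2.0 * math.pi * v)) ** (1.0 / v)
    inverse_t = -0.0953 - 0.631 * f + 0.81 * g + 0.076 * h
    return 1.0 / inverse_t

# Tabled values for comparison: t(5, .025) is about 2.571 and t(10, .025) about 2.228.
t5 = approx_t_upper(5, 0.05)
t10 = approx_t_upper(10, 0.05)
```

Under this reading the approximation lands within about 0.01 of the tabled values for these two cases.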
Inferences for Mean Vectors:
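\[ T^2 = n (\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0) \quad \text{and} \quad F = \frac{n - p}{p (n - 1)} T^2 \quad \text{on } (p, \, n - p) \text{ d.f.} \]
\[ a'\bar{x} \pm t_{(n-1), \, \alpha/(2k)} \sqrt{\frac{a' S a}{n}} \]
\[ a'\bar{x} \pm \sqrt{\frac{p (n - 1)}{n - p} F_{(p, \, n-p), \, \alpha}} \; \sqrt{\frac{a' S a}{n}} \]
\[ T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)' S^{-1} (\bar{x}_1 - \bar{x}_2) \quad \text{and} \quad F = \frac{n_1 + n_2 - p - 1}{p (n_1 + n_2 - 2)} T^2 \quad \text{on } (p, \, n_1 + n_2 - p - 1) \text{ d.f.} \]
\[ a'(\bar{x}_1 - \bar{x}_2) \pm t_{(n_1 + n_2 - 2), \, \alpha/(2k)} \sqrt{a' S a \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
\[ a'(\bar{x}_1 - \bar{x}_2) \pm \sqrt{\frac{p (n_1 + n_2 - 2)}{n_1 + n_2 - p - 1} F_{(p, \, n_1 + n_2 - p - 1), \, \alpha}} \; \sqrt{a' S a \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
\[ (\bar{x}_1 - \bar{x}_2)' \left[ \frac{1}{n_1} S_1 + \frac{1}{n_2} S_2 \right]^{-1} (\bar{x}_1 - \bar{x}_2) \]
\[ T^2 = n (C\bar{x} - C\mu_0)' (C S C')^{-1} (C\bar{x} - C\mu_0) \quad \text{and} \quad F = \frac{n - r}{r (n - 1)} T^2 \quad \text{on } (r, \, n - r) \text{ d.f.}; \quad r = \mathrm{rank}(C) \]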
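For the one-sample case, T^2 and its F transformation can be computed directly. The sketch below is restricted to p = 2 so that the inverse of S has a closed form; the function name and all numeric inputs are made up.

```python
def hotelling_one_sample(xbar, S, mu0, n):
    """One-sample Hotelling T^2 and the corresponding F statistic for p = 2."""
    p = 2
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    d = [xbar[0] - mu0[0], xbar[1] - mu0[1]]
    # Quadratic form (xbar - mu0)' S^{-1} (xbar - mu0)
    quad = sum(d[i] * S_inv[i][k] * d[k] for i in range(p) for k in range(p))
    t2 = n * quad
    f = (n - p) / (p * (n - 1)) * t2   # refer to F with (p, n - p) d.f.
    return t2, f

t2, f = hotelling_one_sample(xbar=[1.0, 1.0], S=[[2.0, 1.0], [1.0, 2.0]],
                             mu0=[0.0, 0.0], n=10)
```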
Inferences for Covariance Matrices:
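\[ S = \frac{\sum_{j=1}^{g} (n_j - 1) S_j}{\sum_{j=1}^{g} (n_j - 1)} \]
\[ C^{-1} = 1 - \frac{2p^2 + 3p - 1}{6 (p + 1)(g - 1)} \left[ \sum_{j=1}^{g} \frac{1}{n_j - 1} - \frac{1}{\sum_{j=1}^{g} (n_j - 1)} \right] \]
\[ M = \left[ \sum_{j=1}^{g} (n_j - 1) \right] \ln|S| - \sum_{j=1}^{g} (n_j - 1) \ln|S_j| \]
\[ X^2 = M C^{-1} \quad \text{with d.f.} = \tfrac{1}{2} p (p + 1)(g - 1) \]
\[ X^2 = \left[ 1 - \frac{1}{6 (n - 1)} \left( 2p + 1 - \frac{2}{p + 1} \right) \right] (n - 1) \left[ \ln|\Sigma_0| - \ln|S| + \mathrm{tr}(\Sigma_0^{-1} S) - p \right] \quad \text{on } \frac{p (p + 1)}{2} \text{ d.f.} \]
\[ X^2 = -\left[ (n - 1) - \frac{2p + 5}{6} \right] \ln|R| \quad \text{on } p (p - 1)/2 \text{ d.f.} \]
\[ X^2 = -\left[ (n - 1) - \frac{p (p + 1)^2 (2p - 3)}{6 (p - 1)(p^2 + p - 4)} \right] \left[ \ln|S| - p \ln(w) - (p - 1) \ln(1 - r) - \ln\left( 1 + (p - 1) r \right) \right] \]
on $\tfrac{1}{2} p (p + 1) - 2$ d.f., where $w = \dfrac{1}{p} \mathrm{tr}(S)$ and $r = \dfrac{1}{p (p - 1) w} \sum\sum_{i \neq j} S_{ij}$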
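The statistic involving ln|R| is the easiest of these to compute by hand or in code. The sketch below assumes it is the usual large-sample test that the population correlation matrix is the identity; the determinant helper and the example matrix are illustrative, not part of the sheet.

```python
import math

def determinant(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[piv][c] == 0.0:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            d = -d                      # a row swap flips the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= m * A[c][k]
    return d

def correlation_identity_x2(R, n):
    """X^2 = -[(n - 1) - (2p + 5)/6] ln|R| and its d.f. p(p - 1)/2."""
    p = len(R)
    x2 = -((n - 1) - (2 * p + 5) / 6.0) * math.log(determinant(R))
    df = p * (p - 1) // 2
    return x2, df

x2, df = correlation_identity_x2([[1.0, 0.5], [0.5, 1.0]], n=21)
```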
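\[ X^2 = \frac{n - 1}{(1 - \bar{r})^2} \left[ \sum\sum_{i < k} (r_{ik} - \bar{r})^2 - \hat{\mu} \sum_{k=1}^{p} (\bar{r}_k - \bar{r})^2 \right] \quad \text{on } \tfrac{1}{2} (p + 1)(p - 2) \text{ d.f.} \]
where
\[ \bar{r} = \frac{2}{p (p - 1)} \sum\sum_{i < k} r_{ik}; \qquad
\bar{r}_k = \frac{1}{p - 1} \sum_{i=1, \, i \neq k}^{p} r_{ik}; \qquad
\hat{\mu} = \frac{(p - 1)^2 \left[ 1 - (1 - \bar{r})^2 \right]}{p - (p - 2)(1 - \bar{r})^2} \]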
MANOVA:
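\[ X_{n \times p} = A_{n \times r} \, \beta_{r \times p} + \epsilon_{n \times p} \]
\[ H_0 : C_{k \times r} \, \beta_{r \times p} \, M_{p \times u} = O_{k \times u} \]
\[ H = M' X' A (A'A)^{-1} C' \left[ C (A'A)^{-1} C' \right]^{-1} C (A'A)^{-1} A' X M \]
\[ E = M' X' \left[ I - A (A'A)^{-1} A' \right] X M \]
Wilks Criterion:
\[ \Lambda = |E| / |H + E| \quad \text{and} \quad F = \left( \frac{1 - \Lambda^{1/b}}{\Lambda^{1/b}} \right) \left( \frac{ab - c}{uk} \right) \quad \text{on } (uk, \, ab - c) \text{ d.f.} \]
where
\[ a = (n - r) - \frac{u - k + 1}{2}, \qquad b = \sqrt{\frac{u^2 k^2 - 4}{u^2 + k^2 - 5}}, \qquad c = \frac{uk - 2}{2} \]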
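Converting Wilks' criterion to the F statistic is pure arithmetic once a, b, and c are known. The sketch below follows the formulas above; the function name is hypothetical, and setting b = 1 when u^2 + k^2 - 5 = 0 is a common convention that is assumed here, since the sheet does not address that case.

```python
import math

def wilks_to_f(wilks, n, r, u, k):
    """Return (F, numerator d.f., denominator d.f.) from Wilks' criterion."""
    a = (n - r) - (u - k + 1) / 2.0
    denom = u ** 2 + k ** 2 - 5.0
    if denom == 0.0:
        b = 1.0   # assumed convention for the 0/0 case (u, k) = (1, 2) or (2, 1)
    else:
        b = math.sqrt((u ** 2 * k ** 2 - 4.0) / denom)
    c = (u * k - 2.0) / 2.0
    root = wilks ** (1.0 / b)
    df1 = u * k
    df2 = a * b - c
    f = (1.0 - root) / root * df2 / df1
    return f, df1, df2

# Made-up inputs: n = 20 observations, r = 3 model columns, u = k = 2.
f_stat, df1, df2 = wilks_to_f(0.25, n=20, r=3, u=2, k=2)
```

With these inputs a = 16.5, b = 2, and c = 1, so the denominator d.f. are 32.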
Principal Components:
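\[ \Sigma = \sum_{i=1}^{p} \lambda_i e_i e_i' \qquad S = \sum_{i=1}^{p} \hat{\lambda}_i \hat{e}_i \hat{e}_i' \]
k-th estimated component is $\hat{y}_k = \hat{e}_k' x$ with sample variance $\hat{\lambda}_k$
Total variance:
\[ \sum_{i=1}^{p} s_{ii} = \mathrm{tr}(S) = \sum_{i=1}^{p} \hat{\lambda}_i \quad \text{or} \quad \mathrm{tr}(R) = p = \sum_{i=1}^{p} \hat{\lambda}_i \]
\[ r_{\hat{y}_k, x_i} = \frac{\hat{e}_{ik} \sqrt{\hat{\lambda}_k}}{\sqrt{s_{ii}}} : \text{ correlation between scores for the k-th component and the i-th trait.} \]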
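For a 2 x 2 covariance matrix the eigenvalues and eigenvectors have a closed form, which makes the total-variance identity and the component-trait correlation easy to verify. This sketch assumes a symmetric S with a nonzero off-diagonal element; the function name and the example matrix are made up.

```python
import math

def pca_2x2(S):
    """Eigenvalues (largest first) and unit eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    if b == 0:
        raise ValueError("this sketch assumes a nonzero off-diagonal element")
    mid = (a + d) / 2.0
    half = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    components = []
    for lam in (mid + half, mid - half):
        v = [b, lam - a]                 # solves (S - lam I) v = 0
        norm = math.hypot(v[0], v[1])
        components.append((lam, [v[0] / norm, v[1] / norm]))
    return components

S = [[2.0, 1.0], [1.0, 2.0]]
components = pca_2x2(S)
lam1, e1 = components[0]
total_variance = sum(lam for lam, _ in components)        # equals tr(S)
r_y1_x1 = e1[0] * math.sqrt(lam1) / math.sqrt(S[0][0])    # correlation of y1 with x1
```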
Factor Analysis:
$X - \mu = LF + \epsilon$ where $F$ and $\epsilon$ are independent; $E(F) = 0$; $V(F) = I$
$E(\epsilon) = 0$; $V(\epsilon) = \Psi$ = a diagonal matrix.
Then
$V(X) = \Sigma = LL' + \Psi$ (or the covariance matrix may be replaced with the correlation matrix.)
$\mathrm{Cov}(X, F) = L$
Discriminant Analysis:
linear discriminant: $d_k(x) = -\tfrac{1}{2} \bar{x}_k' S^{-1} \bar{x}_k + \bar{x}_k' S^{-1} x + \ln(p_k)$
quadratic discriminant: $d_k(x) = -\tfrac{1}{2} \ln|S_k| - \tfrac{1}{2} (x - \bar{x}_k)' S_k^{-1} (x - \bar{x}_k) + \ln(p_k)$
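The linear discriminant score can be evaluated for each group and compared. The sketch below is limited to p = 2 and assumes the usual rule of assigning x to the group with the largest score, which the sheet does not state explicitly; all names and numbers are illustrative.

```python
import math

def inverse_2x2(S):
    """Closed-form inverse of a 2x2 matrix."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]

def linear_score(x, xbar_k, S_inv, prior_k):
    """d_k(x) = -1/2 xbar_k' S^{-1} xbar_k + xbar_k' S^{-1} x + ln(p_k)."""
    quad = sum(xbar_k[i] * S_inv[i][j] * xbar_k[j] for i in range(2) for j in range(2))
    cross = sum(xbar_k[i] * S_inv[i][j] * x[j] for i in range(2) for j in range(2))
    return -0.5 * quad + cross + math.log(prior_k)

def classify(x, means, S, priors):
    """Return the 1-based index of the group with the largest linear score (assumed rule)."""
    S_inv = inverse_2x2(S)
    scores = [linear_score(x, m, S_inv, pk) for m, pk in zip(means, priors)]
    return scores.index(max(scores)) + 1

means = [[0.0, 0.0], [2.0, 0.0]]
S_pooled = [[1.0, 0.0], [0.0, 1.0]]
priors = [0.5, 0.5]
group_a = classify([0.5, 0.0], means, S_pooled, priors)
group_b = classify([1.5, 0.0], means, S_pooled, priors)
```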
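ECM = (expected cost of misclassification) = $c(2|1) \, p(2|1) \, p_1 + c(1|2) \, p(1|2) \, p_2$ is minimized by
classifying into group 1 if
\[ \frac{f_1(x)}{f_2(x)} \geq \frac{c(1|2) \, p_2}{c(2|1) \, p_1} \]
Canonical discriminants: maximize
\[
\frac{\sum_{i=1}^{g} \sum_{j=1}^{n_i} \left[ \ell' \bar{x}_{i\cdot} - \ell' \bar{x}_{\cdot\cdot} \right]^2}
     {\sum_{i} \sum_{j} \left[ \ell' x_{ij} - \ell' \bar{x}_{i\cdot} \right]^2}
= \frac{\ell' \left[ \sum_{i=1}^{g} n_i (\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot})' \right] \ell}
       {\ell' \left[ \sum_{i} \sum_{j} (x_{ij} - \bar{x}_{i\cdot})(x_{ij} - \bar{x}_{i\cdot})' \right] \ell}
= \frac{\ell' B \ell}{\ell' W \ell}
\]
by computing eigenvalues and eigenvectors of $W^{-1} B$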
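Canonical Correlation:
\[
\begin{bmatrix} y \\ x \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_y \\ \mu_x \end{bmatrix}, \; \begin{bmatrix} \Sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{bmatrix} \right);
\]
maximize correlation between $u = a'y$ and $v = b'x$ by solving
\[ (S_{yx} S_{xx}^{-1} S_{xy} - c_i S_{yy}) \, a_i = 0 \quad \text{where } c_i \text{ is the i-th root of } |S_{yx} S_{xx}^{-1} S_{xy} - c_i S_{yy}| = 0 \]
\[ (S_{xy} S_{yy}^{-1} S_{yx} - c_i S_{xx}) \, b_i = 0 \quad \text{where } c_i \text{ is the i-th root of } |S_{xy} S_{yy}^{-1} S_{yx} - c_i S_{xx}| = 0 \]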
Logistic Regression:
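$p_i = \Pr(\text{success} \mid \text{conditions determined by } X_{1i}, X_{2i}, \ldots, X_{pi})$
$1 - p_i = \Pr(\text{failure} \mid \text{conditions determined by } X_{1i}, X_{2i}, \ldots, X_{pi})$
The model relating $p_i$ to the explanatory variables is:
\[
\ln\left( \frac{p_i}{1 - p_i} \right) = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi}
\quad \text{or} \quad
p_i = \frac{\exp\{ \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} \}}{1 + \exp\{ \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} \}}
\]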
Data are of the form:
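\[
\begin{array}{ccccc}
Y_1 & X_{11} & X_{21} & \cdots & X_{p1} \\
Y_2 & X_{12} & X_{22} & \cdots & X_{p2} \\
\vdots & \vdots & \vdots & & \vdots \\
Y_n & X_{1n} & X_{2n} & \cdots & X_{pn}
\end{array}
\]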
where $Y_i$ is a binary variable coded 1 for success and 0 for failure. Maximum likelihood estimates
of $(\beta_0, \beta_1, \ldots, \beta_p)$ and, consequently, of the $p_i$'s are obtained by finding the parameter values that
maximize the likelihood function
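\[
\prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
= \prod_{i=1}^{n}
\left[ \frac{\exp\{ \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} \}}{1 + \exp\{ \beta_0 + \cdots + \beta_p X_{pi} \}} \right]^{y_i}
\left[ \frac{1}{1 + \exp\{ \beta_0 + \cdots + \beta_p X_{pi} \}} \right]^{1 - y_i}
\]
This must be done numerically. There is no closed form solution.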
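One common way to carry out the numerical maximization is Newton-Raphson iteration on the log-likelihood. The sketch below uses that method for a single explanatory variable; the choice of Newton-Raphson, the function names, the fixed iteration count, and the six-point data set are all assumptions made for illustration, not something the sheet prescribes.

```python
import math

def probs(x, b0, b1):
    """p_i = exp(b0 + b1 x_i) / (1 + exp(b0 + b1 x_i)) for each x_i."""
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

def score(x, y, b0, b1):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    p = probs(x, b0, b1)
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    return g0, g1

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson for a one-predictor logistic model, starting from (0, 0)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = probs(x, b0, b1)
        g0, g1 = score(x, y, b0, b1)
        w = [pi * (1.0 - pi) for pi in p]
        # Entries of the 2x2 information matrix X'WX
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data in which successes and failures overlap, so finite estimates exist.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(x, y)
final_score = score(x, y, b0_hat, b1_hat)
```

At the maximum the score vector is zero, which gives a simple convergence check. A production fit would also guard against perfectly separated data, where the estimates diverge.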