Sample Geometry and
Random Sampling
Shyh-Kang Jeng
Department of Electrical Engineering/
Graduate Institute of Communication/
Graduate Institute of Networking and
Multimedia
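Array of Data
$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
$$
*a sample of size n from a p-variate population
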
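Row-Vector View
$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
= \begin{bmatrix}
\mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_j' \\ \vdots \\ \mathbf{x}_n'
\end{bmatrix}
$$
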
Example 3.1
$$
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}
$$

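Column-Vector View
$$
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots & & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
= [\, \mathbf{y}_1 \mid \mathbf{y}_2 \mid \cdots \mid \mathbf{y}_p \,]
$$
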
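Example 3.2
$$
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}, \quad
\mathbf{y}_1 = [4, -1, 3]', \quad \mathbf{y}_2 = [1, 3, 5]'
$$
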
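Geometrical Interpretation of
Sample Mean and Deviation
$$
\mathbf{1}' = [1, 1, \ldots, 1]
$$
$$
\left( \mathbf{y}_i' \frac{1}{\sqrt{n}} \mathbf{1} \right) \frac{1}{\sqrt{n}} \mathbf{1}
= \frac{x_{1i} + x_{2i} + \cdots + x_{ni}}{n} \mathbf{1} = \bar{x}_i \mathbf{1}
$$
$$
\bar{x}_i = \frac{\mathbf{y}_i' \mathbf{1}}{n}
$$
$$
\mathbf{d}_i = \mathbf{y}_i - \bar{x}_i \mathbf{1}
= \begin{bmatrix} x_{1i} - \bar{x}_i \\ x_{2i} - \bar{x}_i \\ \vdots \\ x_{ni} - \bar{x}_i \end{bmatrix}
$$
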
Decomposition of Column Vectors
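Example 3.3
$$
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}, \quad \bar{x}_1 = 2, \quad \bar{x}_2 = 3
$$
$$
\bar{x}_1 \mathbf{1} = 2[1, 1, 1]' = [2, 2, 2]'
$$
$$
\bar{x}_2 \mathbf{1} = 3[1, 1, 1]' = [3, 3, 3]'
$$
$$
\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1 \mathbf{1} = [4, -1, 3]' - [2, 2, 2]' = [2, -3, 1]'
$$
$$
\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2 \mathbf{1} = [1, 3, 5]' - [3, 3, 3]' = [-2, 0, 2]'
$$
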
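The decomposition in Example 3.3 can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names (`columns`, `means`, `deviations`) are illustrative choices, not part of the slides.

```python
# Data matrix from Example 3.3 (rows are observations, columns are variables).
X = [[4, 1],
     [-1, 3],
     [3, 5]]

n = len(X)
# Columns y_i of X, each viewed as a vector in n-dimensional space.
columns = [list(col) for col in zip(*X)]
# Sample mean of each variable: xbar_i = y_i' 1 / n.
means = [sum(col) / n for col in columns]
# Deviation vectors d_i = y_i - xbar_i * 1.
deviations = [[x - m for x in col] for col, m in zip(columns, means)]

print(means)       # [2.0, 3.0]
print(deviations)  # [[2.0, -3.0, 1.0], [-2.0, 0.0, 2.0]]
```
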
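Lengths and Angles of
Deviation Vectors
$$
L_{\mathbf{d}_i}^2 = \mathbf{d}_i' \mathbf{d}_i = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2 = n s_{ii}
$$
$$
\mathbf{d}_i' \mathbf{d}_k = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k) = n s_{ik}
= L_{\mathbf{d}_i} L_{\mathbf{d}_k} \cos\theta_{ik}
$$
$$
\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
= \sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2} \, \sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2} \, \cos\theta_{ik}
$$
$$
\cos\theta_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}} \sqrt{s_{kk}}} = r_{ik}
$$
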
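Example 3.4
$$
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}, \quad
\mathbf{d}_1 = [2, -3, 1]', \quad \mathbf{d}_2 = [-2, 0, 2]'
$$
$$
\mathbf{d}_1' \mathbf{d}_1 = 14 = 3 s_{11}, \quad
\mathbf{d}_2' \mathbf{d}_2 = 8 = 3 s_{22}, \quad
\mathbf{d}_1' \mathbf{d}_2 = -2 = 3 s_{12}
$$
$$
r_{12} = \frac{s_{12}}{\sqrt{s_{11}} \sqrt{s_{22}}} = -0.189
$$
$$
\mathbf{S}_n = \begin{bmatrix} 14/3 & -2/3 \\ -2/3 & 8/3 \end{bmatrix}, \quad
\mathbf{R} = \begin{bmatrix} 1 & -0.189 \\ -0.189 & 1 \end{bmatrix}
$$
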
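As a rough check of Example 3.4, the sketch below recomputes the entries of S_n (divisor n, as on the slide) and the correlation r12 from the two deviation vectors. The helper `dot` is an illustrative name introduced here.

```python
import math

# Deviation vectors from Example 3.4.
d1 = [2, -3, 1]
d2 = [-2, 0, 2]
n = len(d1)

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Entries of S_n use the divisor n: d_i' d_k = n * s_ik.
s11 = dot(d1, d1) / n   # 14/3
s22 = dot(d2, d2) / n   # 8/3
s12 = dot(d1, d2) / n   # -2/3

# The correlation is the cosine of the angle between d1 and d2.
r12 = s12 / (math.sqrt(s11) * math.sqrt(s22))
print(round(r12, 3))  # -0.189
```
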
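Random Matrix
$$
\mathbf{X} = \begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}
= \begin{bmatrix} \mathbf{X}_1' \\ \mathbf{X}_2' \\ \vdots \\ \mathbf{X}_n' \end{bmatrix}
$$
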
Random Sample
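Row vectors $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ represent independent observations from a common joint distribution with density function $f(\mathbf{x}) = f(x_1, x_2, \ldots, x_p)$
Mathematically, the joint density function of $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ is
$$
f(\mathbf{x}_1) f(\mathbf{x}_2) \cdots f(\mathbf{x}_n), \quad
f(\mathbf{x}_j) = f(x_{j1}, x_{j2}, \ldots, x_{jp})
$$
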
Random Sample
Measurements of a single trial, such as $\mathbf{X}_j' = [X_{j1}, X_{j2}, \ldots, X_{jp}]$, will usually be correlated
The measurements from different trials must be independent
The independence of measurements from trial to trial may not hold when the variables are likely to drift over time

Geometric Interpretation of
Randomness
Column vector $\mathbf{Y}_k' = [X_{1k}, X_{2k}, \ldots, X_{nk}]$ regarded as a point in n dimensions
The location is determined by the joint probability distribution $f(\mathbf{y}_k) = f(x_{1k}, x_{2k}, \ldots, x_{nk})$
For a random sample, $f(\mathbf{y}_k) = f_k(x_{1k}) f_k(x_{2k}) \cdots f_k(x_{nk})$
Each coordinate $x_{jk}$ contributes equally to the location through the same marginal distribution $f_k(x_{jk})$

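Result 3.1
If $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ are a random sample from a joint distribution that has mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, then
$$
E(\bar{\mathbf{X}}) = \boldsymbol{\mu} \quad (\bar{\mathbf{X}} \text{ as an unbiased point estimate of } \boldsymbol{\mu})
$$
$$
\mathrm{Cov}(\bar{\mathbf{X}}) = \frac{1}{n} \boldsymbol{\Sigma}, \quad
E(\mathbf{S}_n) = \frac{n-1}{n} \boldsymbol{\Sigma}
$$
$$
E\left( \frac{n}{n-1} \mathbf{S}_n \right) = \boldsymbol{\Sigma} \quad
\left( \mathbf{S} = \frac{n}{n-1} \mathbf{S}_n \text{ as an unbiased point estimate of } \boldsymbol{\Sigma} \right)
$$
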
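Proof of Result 3.1
$$
E(\bar{\mathbf{X}}) = E\left( \frac{1}{n} \mathbf{X}_1 + \frac{1}{n} \mathbf{X}_2 + \cdots + \frac{1}{n} \mathbf{X}_n \right)
= \frac{1}{n} E(\mathbf{X}_1) + \frac{1}{n} E(\mathbf{X}_2) + \cdots + \frac{1}{n} E(\mathbf{X}_n) = \boldsymbol{\mu}
$$
$$
(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'
= \left( \frac{1}{n} \sum_{j=1}^{n} (\mathbf{X}_j - \boldsymbol{\mu}) \right)
\left( \frac{1}{n} \sum_{\ell=1}^{n} (\mathbf{X}_\ell - \boldsymbol{\mu}) \right)'
= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{\ell=1}^{n} (\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})'
$$
$$
\mathrm{Cov}(\bar{\mathbf{X}}) = E(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'
= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{\ell=1}^{n} E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})'
$$
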
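Proof of Result 3.1
$E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})' = \mathbf{0}$ for $j \neq \ell$ because of independence.
$$
\mathrm{Cov}(\bar{\mathbf{X}}) = \frac{1}{n^2} \sum_{j=1}^{n} E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_j - \boldsymbol{\mu})'
= \frac{1}{n^2} n \boldsymbol{\Sigma} = \frac{1}{n} \boldsymbol{\Sigma}
$$
$$
E(\mathbf{S}_n) = E\left( \frac{1}{n} \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})' \right)
$$
$$
\sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})'
= \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}}) \mathbf{X}_j' + \left( \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}}) \right) (-\bar{\mathbf{X}})'
= \sum_{j=1}^{n} \mathbf{X}_j \mathbf{X}_j' - n \bar{\mathbf{X}} \bar{\mathbf{X}}'
$$
$$
E(\mathbf{X}_j \mathbf{X}_j') = E\left( (\mathbf{X}_j - \boldsymbol{\mu} + \boldsymbol{\mu})(\mathbf{X}_j - \boldsymbol{\mu} + \boldsymbol{\mu})' \right)
= \boldsymbol{\Sigma} + \boldsymbol{\mu} \boldsymbol{\mu}'
$$
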
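Proof of Result 3.1
$$
E(\bar{\mathbf{X}} \bar{\mathbf{X}}') = E\left( (\bar{\mathbf{X}} - \boldsymbol{\mu} + \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu} + \boldsymbol{\mu})' \right)
= \frac{1}{n} \boldsymbol{\Sigma} + \boldsymbol{\mu} \boldsymbol{\mu}'
$$
$$
E(\mathbf{S}_n) = \frac{1}{n} E\left( \sum_{j=1}^{n} \mathbf{X}_j \mathbf{X}_j' - n \bar{\mathbf{X}} \bar{\mathbf{X}}' \right)
= \frac{1}{n} \left( n \boldsymbol{\Sigma} + n \boldsymbol{\mu} \boldsymbol{\mu}' - n \left( \frac{1}{n} \boldsymbol{\Sigma} + \boldsymbol{\mu} \boldsymbol{\mu}' \right) \right)
= \frac{n-1}{n} \boldsymbol{\Sigma}
$$
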
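Result 3.1 can be illustrated exactly in the univariate case (p = 1) by enumerating every possible random sample from a small discrete population. The sketch below is an illustration under stated assumptions, not part of the proof: the population `values` (each value equally likely) and the sample size `n = 3` are hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: each value is drawn with equal probability.
values = [0, 1, 3]
n = 3  # hypothetical sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)

# All equally likely i.i.d. samples of size n.
samples = list(product(values, repeat=n))
prob = Fraction(1, len(samples))

def mean(xs):
    return Fraction(sum(xs), len(xs))

def s_n(xs):
    """Sample variance with divisor n (the scalar version of S_n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Exact expectations obtained by summing over all samples.
E_mean = sum(prob * mean(s) for s in samples)
Var_mean = sum(prob * (mean(s) - mu) ** 2 for s in samples)
E_Sn = sum(prob * s_n(s) for s in samples)

print(E_mean == mu)                     # True
print(Var_mean == sigma2 / n)           # True
print(E_Sn == sigma2 * (n - 1) / n)     # True
```

Exact rational arithmetic via `fractions.Fraction` is used so that the three identities hold with equality rather than up to rounding error.
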
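Some Other Estimators
The expectation of the (i, k)th entry of $\frac{n}{n-1} \mathbf{S}_n$:
$$
E\left( \frac{n}{n-1} s_{ik} \right) = E\left( \frac{1}{n-1} \sum_{j=1}^{n} (X_{ji} - \bar{X}_i)(X_{jk} - \bar{X}_k) \right) = \sigma_{ik}
$$
$$
E(\sqrt{s_{ii}}) \neq \sqrt{\sigma_{ii}}, \quad E(r_{ik}) \neq \rho_{ik}
$$
Biases $E(\sqrt{s_{ii}}) - \sqrt{\sigma_{ii}}$ and $E(r_{ik}) - \rho_{ik}$ can usually be ignored if size n is moderately large
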
Generalized Sample Variance
Generalized Sample Variance = $|\mathbf{S}|$
Example 3.7: Employees and profits per employee for 16 largest publishing firms in US
$$
\mathbf{S} = \begin{bmatrix} 252.04 & -68.43 \\ -68.43 & 123.67 \end{bmatrix}, \quad
|\mathbf{S}| = 26{,}487
$$

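For a 2 by 2 covariance matrix the generalized sample variance is just the determinant s11 s22 - s12^2. A minimal sketch for the matrix of Example 3.7 (the name `gen_var` is an illustrative choice):

```python
# Sample covariance matrix from Example 3.7.
S = [[252.04, -68.43],
     [-68.43, 123.67]]

# Determinant of a 2 x 2 matrix: s11 * s22 - s12 * s21.
gen_var = S[0][0] * S[1][1] - S[0][1] * S[1][0]
print(round(gen_var))  # 26487
```
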
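Geometric Interpretation for
Bivariate Case
Area generated by two deviation vectors $\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1 \mathbf{1}$, $\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2 \mathbf{1}$ is
$$
\text{area} = L_{\mathbf{d}_1} L_{\mathbf{d}_2} |\sin\theta| = L_{\mathbf{d}_1} L_{\mathbf{d}_2} \sqrt{1 - \cos^2\theta}
$$
$$
L_{\mathbf{d}_1} = \sqrt{\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2} = \sqrt{(n-1) s_{11}}, \quad
L_{\mathbf{d}_2} = \sqrt{\sum_{j=1}^{n} (x_{j2} - \bar{x}_2)^2} = \sqrt{(n-1) s_{22}}
$$
$$
\cos\theta = r_{12}, \quad \text{area} = (n-1) \sqrt{s_{11} s_{22} (1 - r_{12}^2)}
$$
$$
|\mathbf{S}| = \left| \begin{bmatrix} s_{11} & \sqrt{s_{11}} \sqrt{s_{22}} \, r_{12} \\ \sqrt{s_{11}} \sqrt{s_{22}} \, r_{12} & s_{22} \end{bmatrix} \right|
= s_{11} s_{22} (1 - r_{12}^2) = (\text{area})^2 / (n-1)^2
$$
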
Generalized Sample Variance for
Multivariate Cases
$$
|\mathbf{S}| = (n-1)^{-p} (\text{volume})^2
$$

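Interpretation in
p-space Scatter Plot
Equation for points within a constant distance c from the sample mean
$$
(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \le c^2
$$
$$
\text{Volume of } \{ (\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \le c^2 \} = k_p |\mathbf{S}|^{1/2} c^p
$$
A large volume corresponds to a large generalized variance
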
Example 3.8: Scatter Plots
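Example 3.8: Sample Mean and
Variance-Covariance Matrices
$$
\mathbf{S} = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}, \; r = 0.8; \quad
\mathbf{S} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \; r = 0; \quad
\mathbf{S} = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}, \; r = -0.8
$$
$\bar{\mathbf{x}}' = [2, 1]$, $|\mathbf{S}| = 9$ for all three cases
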
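Example 3.8:
Eigenvalues and Eigenvectors
$$
\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}: \; \lambda_1 = 9, \; \lambda_2 = 1, \quad
\mathbf{e}_1' = [1/\sqrt{2}, \; 1/\sqrt{2}], \; \mathbf{e}_2' = [1/\sqrt{2}, \; -1/\sqrt{2}]
$$
$$
\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}: \; \lambda_1 = 3, \; \lambda_2 = 3, \quad
\mathbf{e}_1' = [1, 0], \; \mathbf{e}_2' = [0, 1]
$$
$$
\begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}: \; \lambda_1 = 9, \; \lambda_2 = 1, \quad
\mathbf{e}_1' = [1/\sqrt{2}, \; -1/\sqrt{2}], \; \mathbf{e}_2' = [1/\sqrt{2}, \; 1/\sqrt{2}]
$$
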
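Example 3.8:
Mean-Centered Ellipse
$$
(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) = c^2
$$
$\mathbf{S}^{-1}$: eigenvalues $1/\lambda_1$, $1/\lambda_2$; eigenvectors $\mathbf{e}_1$, $\mathbf{e}_2$ ($\mathbf{S}\mathbf{e} = \lambda \mathbf{e} \Rightarrow \frac{1}{\lambda} \mathbf{e} = \mathbf{S}^{-1} \mathbf{e}$)
$$
(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2}, \quad
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1' \\ \mathbf{e}_2' \end{bmatrix}
\begin{bmatrix} x_1 - \bar{x}_1 \\ x_2 - \bar{x}_2 \end{bmatrix}
$$
Choose $c^2 = 5.99$ to cover approximately 95% of the observations
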
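Example 3.8:
Semi-major and Semi-minor Axes
$$
\mathbf{S} = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}, \quad a = 3\sqrt{5.99}, \; b = \sqrt{5.99}
$$
$$
\mathbf{S} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \quad a = \sqrt{3}\sqrt{5.99}, \; b = \sqrt{3}\sqrt{5.99}
$$
$$
\mathbf{S} = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}, \quad a = 3\sqrt{5.99}, \; b = \sqrt{5.99}
$$
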
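The eigenvalues and axis half-lengths of Example 3.8 can be reproduced with the closed-form eigenvalue formula for a symmetric 2 by 2 matrix. This is a sketch only; the function name `sym2_eigenvalues` is hypothetical, and the half-lengths are taken as c times the square root of each eigenvalue, with c^2 = 5.99 as on the slides.

```python
import math

def sym2_eigenvalues(S):
    """Eigenvalues (largest first) of a symmetric 2 x 2 matrix [[a, b], [b, d]]."""
    a, b, d = S[0][0], S[0][1], S[1][1]
    mid = (a + d) / 2
    half_gap = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid + half_gap, mid - half_gap

c2 = 5.99  # squared distance chosen on the slides to cover about 95% of points

for S in ([[5, 4], [4, 5]], [[3, 0], [0, 3]], [[5, -4], [-4, 5]]):
    lam1, lam2 = sym2_eigenvalues(S)
    # Half-lengths of the ellipse axes: c * sqrt(lambda_i).
    a_len = math.sqrt(c2 * lam1)
    b_len = math.sqrt(c2 * lam2)
    print(lam1, lam2, round(a_len, 3), round(b_len, 3))
```

The first and third matrices share eigenvalues 9 and 1, so their ellipses have the same shape and differ only in orientation, while the second matrix gives a circle.
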
Example 3.8:
Scatter Plots with Major Axes
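Result 3.2
The generalized variance is zero when the columns of the following matrix are linearly dependent
$$
\begin{bmatrix} \mathbf{x}_1' - \bar{\mathbf{x}}' \\ \mathbf{x}_2' - \bar{\mathbf{x}}' \\ \vdots \\ \mathbf{x}_n' - \bar{\mathbf{x}}' \end{bmatrix}
= \begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
= \mathbf{X} - \mathbf{1} \bar{\mathbf{x}}'
$$
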
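Proof of Result 3.2
$$
\mathbf{0} = a_1 \mathrm{col}_1(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') + \cdots + a_p \mathrm{col}_p(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')
= (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a}, \quad \mathbf{a} \neq \mathbf{0}
$$
$$
(n-1) \mathbf{S} = (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')
= [\, \mathbf{x}_1 - \bar{\mathbf{x}}, \; \mathbf{x}_2 - \bar{\mathbf{x}}, \; \ldots, \; \mathbf{x}_n - \bar{\mathbf{x}} \,]
\begin{bmatrix} \mathbf{x}_1' - \bar{\mathbf{x}}' \\ \mathbf{x}_2' - \bar{\mathbf{x}}' \\ \vdots \\ \mathbf{x}_n' - \bar{\mathbf{x}}' \end{bmatrix}
= \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'
$$
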
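Proof of Result 3.2
$$
(n-1) \mathbf{S} \mathbf{a} = (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a} = \mathbf{0}
$$
$$
a_1 \mathrm{col}_1(\mathbf{S}) + \cdots + a_p \mathrm{col}_p(\mathbf{S}) = \mathbf{0} \Rightarrow |\mathbf{S}| = 0
$$
If $|\mathbf{S}| = 0$, there exists $\mathbf{a} \neq \mathbf{0}$ such that $\mathbf{S} \mathbf{a} = \mathbf{0}$
$$
\mathbf{0} = (n-1) \mathbf{S} \mathbf{a} = (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a}
$$
$$
\mathbf{a}' (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a} = 0
$$
$$
L^2_{(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a}} = 0 \Rightarrow (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a} = \mathbf{0}
$$
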
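Example 3.9
$$
\mathbf{X} = \begin{bmatrix} 1 & 2 & 5 \\ 4 & 1 & 6 \\ 4 & 0 & 4 \end{bmatrix}, \quad
\bar{\mathbf{x}}' = [3, 1, 5], \quad
\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}' = \begin{bmatrix} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \end{bmatrix}
$$
$$
\mathbf{d}_1' = [-2, 1, 1], \quad \mathbf{d}_2' = [1, 0, -1], \quad \mathbf{d}_3' = [0, 1, -1]
$$
$$
\mathbf{d}_3 = \mathbf{d}_1 + 2 \mathbf{d}_2 \Rightarrow |\mathbf{S}| = 0
$$
$$
\text{check: } \mathbf{S} = \begin{bmatrix} 3 & -3/2 & 0 \\ -3/2 & 1 & 1/2 \\ 0 & 1/2 & 1 \end{bmatrix}, \quad |\mathbf{S}| = 0
$$
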
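The check in Example 3.9 can be carried out exactly. The sketch below builds S with the divisor n - 1 and evaluates its determinant by cofactor expansion; `det3` is a hypothetical helper written for this illustration, and rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

# Data matrix from Example 3.9.
X = [[1, 2, 5],
     [4, 1, 6],
     [4, 0, 4]]
n, p = len(X), len(X[0])

# Sample mean of each column.
means = [Fraction(sum(row[k] for row in X), n) for k in range(p)]
# Rows of the mean-corrected matrix X - 1 xbar'.
D = [[row[k] - means[k] for k in range(p)] for row in X]
# S = D'D / (n - 1).
S = [[sum(D[j][i] * D[j][k] for j in range(n)) / (n - 1) for k in range(p)]
     for i in range(p)]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

print(S[0][1], det3(S))  # -3/2 0
```
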
Example 3.9
Examples That Cause Zero
Generalized Variance
Example 1
– Data are test scores
– Included variables that are sums of others
– e.g., algebra score and geometry score were combined to total math score
– e.g., class midterm and final exam scores summed to give total points
Example 2
– Total weight of chemicals was included along with that of each component

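Example 3.10
$$
\mathbf{X} = \begin{bmatrix} 1 & 9 & 10 \\ 4 & 12 & 16 \\ 2 & 10 & 12 \\ 5 & 8 & 13 \\ 3 & 11 & 14 \end{bmatrix}, \quad
\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}' = \begin{bmatrix} -2 & -1 & -3 \\ 1 & 2 & 3 \\ -1 & 0 & -1 \\ 2 & -2 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad
\mathbf{S} = \begin{bmatrix} 2.5 & 0 & 2.5 \\ 0 & 2.5 & 2.5 \\ 2.5 & 2.5 & 5.0 \end{bmatrix}
$$
$$
|\mathbf{S}| = 0 \Rightarrow \mathbf{S} \mathbf{a} = \mathbf{0} \text{ for some } \mathbf{a} \neq \mathbf{0}
$$
Eigenvector corresponding to the zero eigenvalue of S: $\mathbf{a}' = [1, 1, -1]$
$$
1 (x_{j1} - \bar{x}_1) + 1 (x_{j2} - \bar{x}_2) - (x_{j3} - \bar{x}_3) = 0
$$
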
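The linear dependency in Example 3.10 can be verified directly. The following sketch multiplies S by the vector a' = [1, 1, -1] and also confirms that the third column of X is the sum of the first two; the names `Sa` and `third_is_sum` are illustrative.

```python
# Covariance matrix and data matrix from Example 3.10.
S = [[2.5, 0.0, 2.5],
     [0.0, 2.5, 2.5],
     [2.5, 2.5, 5.0]]
X = [[1, 9, 10],
     [4, 12, 16],
     [2, 10, 12],
     [5, 8, 13],
     [3, 11, 14]]
a = [1, 1, -1]

# S a should be the zero vector when a spans the null space of S.
Sa = [sum(s * x for s, x in zip(row, a)) for row in S]
print(Sa)  # [0.0, 0.0, 0.0]

# The dependency comes from the data: column 3 equals column 1 plus column 2.
third_is_sum = all(row[0] + row[1] == row[2] for row in X)
print(third_is_sum)  # True
```
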
Result 3.3
If the sample size is less than or equal to the number of variables ($n \le p$), then $|\mathbf{S}| = 0$ for all samples

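Proof of Result 3.3
The n row vectors of $\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}'$ sum to the zero vector because $\sum_{j=1}^{n} x_{jk} = \sum_{j=1}^{n} \bar{x}_k$
Thus the rank of $\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}'$ is less than or equal to $n - 1$, i.e., less than or equal to $p - 1$, because $n \le p$
Since $(n-1) \mathbf{S} = (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')$,
$$
(n-1) \, \mathrm{col}_k(\mathbf{S}) = (\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' \, \mathrm{col}_k(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')
= (x_{1k} - \bar{x}_k) \, \mathrm{row}_1(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')' + \cdots + (x_{nk} - \bar{x}_k) \, \mathrm{row}_n(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')'
$$
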
Proof of Result 3.3
$\mathrm{row}_1(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}')'$ is a linear combination of the remaining row vectors
$\mathrm{col}_k(\mathbf{S})$ is a linear combination of at most $n - 1$ linearly independent transposed row vectors
The rank of S is thus less than or equal to $n - 1$, i.e., less than or equal to $p - 1$.
Since S is a p by p matrix, $|\mathbf{S}| = 0$

Result 3.4
Let the p by 1 vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, where $\mathbf{x}_j'$ is the jth row of the data matrix X, be realizations of the independent random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$.
If the linear combination $\mathbf{a}' \mathbf{X}_j$ has positive variance for each non-zero constant vector a, then, provided that $p < n$, S has full rank with probability 1 and $|\mathbf{S}| > 0$
If, with probability 1, $\mathbf{a}' \mathbf{X}_j$ is a constant c for all j, then $|\mathbf{S}| = 0$

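Proof of Part 2 of Result 3.4
If $\mathbf{a}' \mathbf{X}_j = a_1 X_{j1} + a_2 X_{j2} + \cdots + a_p X_{jp} = c$ with probability 1, then $\mathbf{a}' \mathbf{x}_j = c$ for all j. The sample mean for it is
$$
\sum_{j=1}^{n} (a_1 x_{j1} + a_2 x_{j2} + \cdots + a_p x_{jp}) / n = \mathbf{a}' \bar{\mathbf{x}} = c
$$
$$
(\mathbf{X} - \mathbf{1} \bar{\mathbf{x}}') \mathbf{a}
= a_1 \begin{bmatrix} x_{11} - \bar{x}_1 \\ \vdots \\ x_{n1} - \bar{x}_1 \end{bmatrix} + \cdots + a_p \begin{bmatrix} x_{1p} - \bar{x}_p \\ \vdots \\ x_{np} - \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \mathbf{a}' \mathbf{x}_1 - \mathbf{a}' \bar{\mathbf{x}} \\ \vdots \\ \mathbf{a}' \mathbf{x}_n - \mathbf{a}' \bar{\mathbf{x}} \end{bmatrix}
= \begin{bmatrix} c - c \\ \vdots \\ c - c \end{bmatrix} = \mathbf{0}
\Rightarrow |\mathbf{S}| = 0
$$
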
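Generalized Sample Variance of
Standardized Variables
Generalized sample variance of the standardized variables = $|\mathbf{R}|$
$$
\frac{\mathbf{y}_i - \bar{x}_i \mathbf{1}}{\sqrt{s_{ii}}}
= \left[ \frac{x_{1i} - \bar{x}_i}{\sqrt{s_{ii}}}, \; \frac{x_{2i} - \bar{x}_i}{\sqrt{s_{ii}}}, \; \ldots, \; \frac{x_{ni} - \bar{x}_i}{\sqrt{s_{ii}}} \right]'
$$
$$
|\mathbf{R}| = (n-1)^{-p} (\text{volume})^2, \quad |\mathbf{S}| = (s_{11} s_{22} \cdots s_{pp}) |\mathbf{R}|
$$
$|\mathbf{R}|$ is large when all $r_{ik}$ are nearly zero, and is small when one or more $r_{ik}$ are nearly +1 or -1
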
Volume Generated by Deviation
Vectors of Standardized Variables
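Example 3.11
$$
\mathbf{S} = \begin{bmatrix} 4 & 3 & 1 \\ 3 & 9 & 2 \\ 1 & 2 & 1 \end{bmatrix}, \quad
\mathbf{R} = \begin{bmatrix} 1 & 1/2 & 1/2 \\ 1/2 & 1 & 2/3 \\ 1/2 & 2/3 & 1 \end{bmatrix}
$$
$$
s_{11} = 4, \quad s_{22} = 9, \quad s_{33} = 1
$$
$$
|\mathbf{S}| = 14, \quad |\mathbf{R}| = \frac{7}{18}, \quad |\mathbf{S}| = s_{11} s_{22} s_{33} |\mathbf{R}|
$$
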
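The relation |S| = s11 s22 s33 |R| in Example 3.11 can be confirmed with exact arithmetic. This is a minimal sketch; `det3` is a hypothetical helper defined here for the illustration.

```python
from fractions import Fraction

# S and R from Example 3.11.
S = [[4, 3, 1],
     [3, 9, 2],
     [1, 2, 1]]
R = [[1, Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), 1, Fraction(2, 3)],
     [Fraction(1, 2), Fraction(2, 3), 1]]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

det_S = det3(S)
det_R = det3(R)
print(det_S, det_R)  # 14 7/18

# |S| equals the product of the variances times |R|.
print(det_S == S[0][0] * S[1][1] * S[2][2] * det_R)  # True
```
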
Total Sample Variance
Total Sample Variance = $s_{11} + s_{22} + \cdots + s_{pp}$
Pays no attention to the orientation of the residual vectors
$$
\text{Example 3.7: } \mathbf{S} = \begin{bmatrix} 252.04 & -68.43 \\ -68.43 & 123.67 \end{bmatrix}, \quad
\text{total sample variance} = 375.71
$$
$$
\text{Example 3.9: } \mathbf{S} = \begin{bmatrix} 3 & -3/2 & 0 \\ -3/2 & 1 & 1/2 \\ 0 & 1/2 & 1 \end{bmatrix}, \quad
\text{total sample variance} = 5
$$

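Sample Mean as Matrix Operation
$$
\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \mathbf{y}_1' \mathbf{1} / n \\ \mathbf{y}_2' \mathbf{1} / n \\ \vdots \\ \mathbf{y}_p' \mathbf{1} / n \end{bmatrix}
= \frac{1}{n} \begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1} \\
x_{12} & x_{22} & \cdots & x_{n2} \\
\vdots & \vdots & & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
= \frac{1}{n} \mathbf{X}' \mathbf{1}
$$
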
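Covariance as Matrix Operation
$$
\mathbf{1} \bar{\mathbf{x}}' = \frac{1}{n} \mathbf{1} \mathbf{1}' \mathbf{X}
= \begin{bmatrix}
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\vdots & \vdots & & \vdots \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p
\end{bmatrix}
$$
$$
\mathbf{X} - \frac{1}{n} \mathbf{1} \mathbf{1}' \mathbf{X}
= \begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
$$
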
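Covariance as Matrix Operation
$$
(n-1) \mathbf{S} = \begin{bmatrix}
x_{11} - \bar{x}_1 & x_{21} - \bar{x}_1 & \cdots & x_{n1} - \bar{x}_1 \\
x_{12} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{n2} - \bar{x}_2 \\
\vdots & \vdots & & \vdots \\
x_{1p} - \bar{x}_p & x_{2p} - \bar{x}_p & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
$$
$$
= \left( \mathbf{X} - \frac{1}{n} \mathbf{1} \mathbf{1}' \mathbf{X} \right)' \left( \mathbf{X} - \frac{1}{n} \mathbf{1} \mathbf{1}' \mathbf{X} \right)
$$
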
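Covariance as Matrix Operation
$$
(n-1) \mathbf{S} = \left( \mathbf{X} - \frac{1}{n} \mathbf{1} \mathbf{1}' \mathbf{X} \right)' \left( \mathbf{X} - \frac{1}{n} \mathbf{1} \mathbf{1}' \mathbf{X} \right)
= \mathbf{X}' \left( \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' \right)' \left( \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' \right) \mathbf{X}
$$
$$
\left( \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' \right)' \left( \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' \right)
= \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' - \frac{1}{n} \mathbf{1} \mathbf{1}' + \frac{1}{n^2} \mathbf{1} \mathbf{1}' \mathbf{1} \mathbf{1}'
= \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' \quad (\mathbf{1}' \mathbf{1} = n)
$$
$$
\mathbf{S} = \frac{1}{n-1} \mathbf{X}' \left( \mathbf{I} - \frac{1}{n} \mathbf{1} \mathbf{1}' \right) \mathbf{X}
$$
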
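The matrix formula for S can be tried on the small data matrix used in Examples 3.1 to 3.4. The sketch below is one possible illustration in plain Python: `matmul`, `transpose`, and `H` (the centering matrix I - (1/n) 1 1') are names introduced here, and the divisor is n - 1, so the result differs from the S_n of Example 3.4 by the factor n/(n - 1).

```python
from fractions import Fraction

# Data matrix from Examples 3.1 to 3.4.
X = [[4, 1],
     [-1, 3],
     [3, 5]]
n, p = len(X), len(X[0])

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Centering matrix H = I - (1/n) 1 1'.
H = [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
     for i in range(n)]

# H is idempotent: H H = H, which is the key step on the slide.
print(matmul(H, H) == H)  # True

# S = X' H X / (n - 1).
XtHX = matmul(matmul(transpose(X), H), X)
S = [[v / (n - 1) for v in row] for row in XtHX]
print([[str(v) for v in row] for row in S])  # [['7', '-1'], ['-1', '4']]
```
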
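Sample Standard Deviation Matrix
$$
\mathbf{D}^{1/2} = \begin{bmatrix}
\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{s_{pp}}
\end{bmatrix}, \quad
\mathbf{D}^{-1/2} = \begin{bmatrix}
1/\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & 1/\sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/\sqrt{s_{pp}}
\end{bmatrix}
$$
$$
\mathbf{R} = \begin{bmatrix}
\dfrac{s_{11}}{\sqrt{s_{11}} \sqrt{s_{11}}} & \dfrac{s_{12}}{\sqrt{s_{11}} \sqrt{s_{22}}} & \cdots & \dfrac{s_{1p}}{\sqrt{s_{11}} \sqrt{s_{pp}}} \\
\dfrac{s_{21}}{\sqrt{s_{22}} \sqrt{s_{11}}} & \dfrac{s_{22}}{\sqrt{s_{22}} \sqrt{s_{22}}} & \cdots & \dfrac{s_{2p}}{\sqrt{s_{22}} \sqrt{s_{pp}}} \\
\vdots & \vdots & & \vdots \\
\dfrac{s_{p1}}{\sqrt{s_{pp}} \sqrt{s_{11}}} & \dfrac{s_{p2}}{\sqrt{s_{pp}} \sqrt{s_{22}}} & \cdots & \dfrac{s_{pp}}{\sqrt{s_{pp}} \sqrt{s_{pp}}}
\end{bmatrix}
= \mathbf{D}^{-1/2} \mathbf{S} \mathbf{D}^{-1/2}
$$
$$
\mathbf{S} = \mathbf{D}^{1/2} \mathbf{R} \mathbf{D}^{1/2}
$$
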
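Because D^{1/2} is diagonal, the products D^{-1/2} S D^{-1/2} and D^{1/2} R D^{1/2} reduce to entrywise scaling. The sketch below applies this to the S of Example 3.11; the names `sd`, `R`, and `S_back` are illustrative.

```python
import math

# Covariance matrix from Example 3.11.
S = [[4, 3, 1],
     [3, 9, 2],
     [1, 2, 1]]
p = len(S)

# Diagonal entries of D^{1/2}: the sample standard deviations.
sd = [math.sqrt(S[i][i]) for i in range(p)]

# R = D^{-1/2} S D^{-1/2}, i.e. r_ik = s_ik / (sqrt(s_ii) * sqrt(s_kk)).
R = [[S[i][k] / (sd[i] * sd[k]) for k in range(p)] for i in range(p)]

# S = D^{1/2} R D^{1/2} recovers the covariance matrix.
S_back = [[sd[i] * R[i][k] * sd[k] for k in range(p)] for i in range(p)]

print(R[0][1], R[0][2], round(R[1][2], 4))  # 0.5 0.5 0.6667
```
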
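Result 3.5
$$
\mathbf{b}' \mathbf{X} = b_1 X_1 + b_2 X_2 + \cdots + b_p X_p
$$
$$
\mathbf{c}' \mathbf{X} = c_1 X_1 + c_2 X_2 + \cdots + c_p X_p
$$
Sample mean of $\mathbf{b}' \mathbf{X}$ = $\mathbf{b}' \bar{\mathbf{x}}$
Sample variance of $\mathbf{b}' \mathbf{X}$ = $\mathbf{b}' \mathbf{S} \mathbf{b}$
Sample covariance of $\mathbf{b}' \mathbf{X}$ and $\mathbf{c}' \mathbf{X}$ = $\mathbf{b}' \mathbf{S} \mathbf{c}$
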
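Proof of Result 3.5
$$
\mathbf{b}' \mathbf{x}_j = b_1 x_{j1} + b_2 x_{j2} + \cdots + b_p x_{jp}
$$
$$
\text{Sample mean} = \frac{\mathbf{b}' \mathbf{x}_1 + \mathbf{b}' \mathbf{x}_2 + \cdots + \mathbf{b}' \mathbf{x}_n}{n} = \mathbf{b}' \bar{\mathbf{x}}
$$
$$
(\mathbf{b}' \mathbf{x}_j - \mathbf{b}' \bar{\mathbf{x}})^2 = \mathbf{b}' (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' \mathbf{b}
$$
$$
\text{Sample variance} = \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{b}' \mathbf{x}_j - \mathbf{b}' \bar{\mathbf{x}})^2
= \mathbf{b}' \left[ \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' \right] \mathbf{b}
= \mathbf{b}' \mathbf{S} \mathbf{b}
$$
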
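Proof of Result 3.5
$$
\text{Sample covariance} = \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{b}' \mathbf{x}_j - \mathbf{b}' \bar{\mathbf{x}})(\mathbf{c}' \mathbf{x}_j - \mathbf{c}' \bar{\mathbf{x}})
= \mathbf{b}' \left[ \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' \right] \mathbf{c}
= \mathbf{b}' \mathbf{S} \mathbf{c}
$$
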
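Result 3.5 can be checked on a small data set by computing the linear combinations observation by observation and comparing with the matrix expressions. In the sketch below the data matrix is taken from Example 3.9, while the coefficient vectors `b` and `c` are hypothetical values chosen only for illustration; `dot` and `sample_cov` are helper names introduced here.

```python
from fractions import Fraction

# Data matrix from Example 3.9; b and c are hypothetical coefficient vectors.
X = [[1, 2, 5],
     [4, 1, 6],
     [4, 0, 4]]
b = [2, 2, -1]
c = [1, -1, 3]
n, p = len(X), len(X[0])

def dot(u, v):
    return sum(a * w for a, w in zip(u, v))

def sample_cov(u, v):
    """Sample covariance of two equal-length lists, divisor n - 1."""
    ubar, vbar = Fraction(sum(u), len(u)), Fraction(sum(v), len(v))
    return sum((a - ubar) * (w - vbar) for a, w in zip(u, v)) / (len(u) - 1)

# Sample mean vector and covariance matrix of the original variables.
xbar = [Fraction(sum(row[k] for row in X), n) for k in range(p)]
cols = [list(col) for col in zip(*X)]
S = [[sample_cov(cols[i], cols[k]) for k in range(p)] for i in range(p)]

# Observed values of the linear combinations b'x_j and c'x_j.
bx = [dot(b, row) for row in X]
cx = [dot(c, row) for row in X]

mean_ok = Fraction(sum(bx), n) == dot(b, xbar)                       # b' xbar
var_ok = sample_cov(bx, bx) == dot(b, [dot(row, b) for row in S])    # b' S b
cov_ok = sample_cov(bx, cx) == dot(b, [dot(row, c) for row in S])    # b' S c
print(mean_ok, var_ok, cov_ok)  # True True True
```
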
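Result 3.6
$$
\mathbf{A} \mathbf{X} = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}
$$
Sample mean of $\mathbf{A} \mathbf{X}$ = $\mathbf{A} \bar{\mathbf{x}}$
Sample covariance matrix = $\mathbf{A} \mathbf{S} \mathbf{A}'$
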