Sample Geometry and Random Sampling
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia
Array of Data
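\[
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
\]
The array $\mathbf{X}$ is a sample of size $n$ from a $p$-variate population.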
Row-Vector View
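\[
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_j' \\ \vdots \\ \mathbf{x}_n'
\end{bmatrix}
\]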
Example 3.1
\[
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}
\]
Column-Vector View
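\[
\mathbf{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
= [\, \mathbf{y}_1 \mid \mathbf{y}_2 \mid \cdots \mid \mathbf{y}_p \,]
\]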
Example 3.2
\[
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}
\]
Geometrical Interpretation of
Sample Mean and Deviation
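\[
\mathbf{1}' = [1, 1, \ldots, 1]
\]
Projection of $\mathbf{y}_i$ on the unit vector $\frac{1}{\sqrt{n}}\mathbf{1}$:
\[
\left( \mathbf{y}_i' \frac{1}{\sqrt{n}} \mathbf{1} \right) \frac{1}{\sqrt{n}} \mathbf{1}
= \frac{x_{1i} + x_{2i} + \cdots + x_{ni}}{n}\, \mathbf{1}
= \bar{x}_i \mathbf{1},
\qquad
\bar{x}_i = \frac{\mathbf{y}_i' \mathbf{1}}{n}
\]
Deviation vector:
\[
\mathbf{d}_i = \mathbf{y}_i - \bar{x}_i \mathbf{1}
= \begin{bmatrix} x_{1i} - \bar{x}_i \\ x_{2i} - \bar{x}_i \\ \vdots \\ x_{ni} - \bar{x}_i \end{bmatrix}
\]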
Decomposition of Column Vectors
Example 3.3
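\[
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix},
\quad \bar{x}_1 = 2, \quad \bar{x}_2 = 3
\]
\[
\bar{x}_1 \mathbf{1} = 2\,[1, 1, 1]' = [2, 2, 2]', \qquad
\bar{x}_2 \mathbf{1} = 3\,[1, 1, 1]' = [3, 3, 3]'
\]
\[
\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1 \mathbf{1} = [4, -1, 3]' - [2, 2, 2]' = [2, -3, 1]'
\]
\[
\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2 \mathbf{1} = [1, 3, 5]' - [3, 3, 3]' = [-2, 0, 2]'
\]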
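The decomposition of each column into a mean part and a deviation part can be checked numerically. The following is a minimal sketch using NumPy, with the data matrix whose rows are (4, 1), (-1, 3), (3, 5); the variable names are illustrative and not taken from the slides.

```python
import numpy as np

# Data matrix: n = 3 observations (rows), p = 2 variables (columns).
X = np.array([[4.0, 1.0],
              [-1.0, 3.0],
              [3.0, 5.0]])
n = X.shape[0]
ones = np.ones(n)

x_bar = X.T @ ones / n                # sample means, one per column
mean_parts = np.outer(ones, x_bar)    # column i is x_bar_i * 1
D = X - mean_parts                    # column i is the deviation vector d_i

# Each deviation vector is perpendicular to the vector of ones.
orthogonality = D.T @ ones

print(x_bar)
print(D)
print(orthogonality)
```

Each column of `X` is the sum of the matching columns of `mean_parts` and `D`, and the zero entries of `orthogonality` reflect that the deviation vectors are perpendicular to the vector of ones.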
Lengths and Angles of
Deviation Vectors
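Here $s_{ik}$ denotes the entries of $\mathbf{S}_n$, computed with divisor $n$.
\[
L_{\mathbf{d}_i}^2 = \mathbf{d}_i' \mathbf{d}_i = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2 = n s_{ii}
\]
\[
\mathbf{d}_i' \mathbf{d}_k = \sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k) = n s_{ik}
= L_{\mathbf{d}_i} L_{\mathbf{d}_k} \cos\theta_{ik}
\]
\[
\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
= \sqrt{\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)^2}\, \sqrt{\sum_{j=1}^{n} (x_{jk} - \bar{x}_k)^2}\, \cos\theta_{ik}
\]
\[
\cos\theta_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}} = r_{ik}
\]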
Example 3.4
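\[
\mathbf{X} = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix},
\quad \mathbf{d}_1 = [2, -3, 1]', \quad \mathbf{d}_2 = [-2, 0, 2]'
\]
\[
\mathbf{d}_1' \mathbf{d}_1 = 14 = 3 s_{11}, \qquad
\mathbf{d}_2' \mathbf{d}_2 = 8 = 3 s_{22}, \qquad
\mathbf{d}_1' \mathbf{d}_2 = -2 = 3 s_{12}
\]
\[
r_{12} = \frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} = \frac{-2/3}{\sqrt{14/3}\,\sqrt{8/3}} = -0.189
\]
\[
\mathbf{S}_n = \begin{bmatrix} 14/3 & -2/3 \\ -2/3 & 8/3 \end{bmatrix},
\qquad
\mathbf{R} = \begin{bmatrix} 1 & -0.189 \\ -0.189 & 1 \end{bmatrix}
\]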
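The numbers in Example 3.4 can be reproduced from the two deviation vectors alone. This is a small sketch assuming NumPy and the divisor-$n$ convention used on this slide; variable names are illustrative.

```python
import numpy as np

d1 = np.array([2.0, -3.0, 1.0])
d2 = np.array([-2.0, 0.0, 2.0])
n = d1.size

D = np.column_stack([d1, d2])
Sn = D.T @ D / n                        # divisor n, as in d_i' d_k = n s_ik
lengths = np.sqrt(np.diag(D.T @ D))     # lengths of d_1 and d_2
cos_theta = (d1 @ d2) / (lengths[0] * lengths[1])
r12 = Sn[0, 1] / np.sqrt(Sn[0, 0] * Sn[1, 1])

print(Sn)
print(cos_theta, r12)
```

The cosine of the angle between the deviation vectors and the sample correlation coefficient come out equal, which is the geometric statement of the previous slide.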
Random Matrix
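\[
\mathbf{X} =
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots &        & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}_1' \\ \mathbf{X}_2' \\ \vdots \\ \mathbf{X}_n'
\end{bmatrix}
\]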
Random Sample
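Row vectors $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ represent independent observations from a common joint distribution with density function $f(\mathbf{x}) = f(x_1, x_2, \ldots, x_p)$.
Mathematically, the joint density function of $\mathbf{X}_1', \mathbf{X}_2', \ldots, \mathbf{X}_n'$ is
\[
f(\mathbf{x}_1) f(\mathbf{x}_2) \cdots f(\mathbf{x}_n),
\qquad
f(\mathbf{x}_j) = f(x_{j1}, x_{j2}, \ldots, x_{jp})
\]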
Random Sample
Measurements of a single trial, such as $\mathbf{X}_j' = [X_{j1}, X_{j2}, \ldots, X_{jp}]$, will usually be correlated.
The measurements from different trials must be independent.
The independence of measurements from trial to trial may not hold when the variables are likely to drift over time.
Geometric Interpretation of
Randomness
Column vector $\mathbf{Y}_k' = [X_{1k}, X_{2k}, \ldots, X_{nk}]$ is regarded as a point in $n$ dimensions.
The location is determined by the joint probability distribution $f(\mathbf{y}_k) = f(x_{1k}, x_{2k}, \ldots, x_{nk})$.
For a random sample, $f(\mathbf{y}_k) = f_k(x_{1k}) f_k(x_{2k}) \cdots f_k(x_{nk})$.
Each coordinate $x_{jk}$ contributes equally to the location through the same marginal distribution $f_k(x_{jk})$.
Result 3.1
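If $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ are a random sample from a joint distribution that has mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, then
\[
E(\bar{\mathbf{X}}) = \boldsymbol{\mu}
\quad (\bar{\mathbf{X}} \text{ is an unbiased point estimate of } \boldsymbol{\mu})
\]
\[
\operatorname{Cov}(\bar{\mathbf{X}}) = \frac{1}{n} \boldsymbol{\Sigma},
\qquad
E(\mathbf{S}_n) = \frac{n-1}{n} \boldsymbol{\Sigma}
\]
\[
E\!\left( \frac{n}{n-1} \mathbf{S}_n \right) = \boldsymbol{\Sigma}
\quad \left( \mathbf{S} = \frac{n}{n-1} \mathbf{S}_n \text{ is an unbiased point estimate of } \boldsymbol{\Sigma} \right)
\]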
Proof of Result 3.1
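\[
E(\bar{\mathbf{X}}) = E\!\left( \frac{1}{n}\mathbf{X}_1 + \frac{1}{n}\mathbf{X}_2 + \cdots + \frac{1}{n}\mathbf{X}_n \right)
= \frac{1}{n}E(\mathbf{X}_1) + \frac{1}{n}E(\mathbf{X}_2) + \cdots + \frac{1}{n}E(\mathbf{X}_n) = \boldsymbol{\mu}
\]
\[
(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'
= \left( \frac{1}{n}\sum_{j=1}^{n} (\mathbf{X}_j - \boldsymbol{\mu}) \right)
  \left( \frac{1}{n}\sum_{\ell=1}^{n} (\mathbf{X}_\ell - \boldsymbol{\mu}) \right)'
= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{\ell=1}^{n} (\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})'
\]
\[
\operatorname{Cov}(\bar{\mathbf{X}}) = E(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'
= \frac{1}{n^2} \sum_{j=1}^{n} \sum_{\ell=1}^{n} E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})'
\]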
Proof of Result 3.1
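$E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})' = \mathbf{0}$ for $j \ne \ell$ because of independence.
\[
\operatorname{Cov}(\bar{\mathbf{X}}) = \frac{1}{n^2} \sum_{j=1}^{n} E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_j - \boldsymbol{\mu})'
= \frac{1}{n^2}\, n \boldsymbol{\Sigma} = \frac{1}{n} \boldsymbol{\Sigma}
\]
\[
E(\mathbf{S}_n) = \frac{1}{n} E\!\left( \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})' \right)
\]
\[
\sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})'
= \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}})\mathbf{X}_j' + \left( \sum_{j=1}^{n} (\mathbf{X}_j - \bar{\mathbf{X}}) \right)(-\bar{\mathbf{X}})'
= \sum_{j=1}^{n} \mathbf{X}_j \mathbf{X}_j' - n \bar{\mathbf{X}} \bar{\mathbf{X}}'
\]
\[
E(\mathbf{X}_j \mathbf{X}_j') = E\big( (\mathbf{X}_j - \boldsymbol{\mu} + \boldsymbol{\mu})(\mathbf{X}_j - \boldsymbol{\mu} + \boldsymbol{\mu})' \big) = \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}'
\]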
Proof of Result 3.1
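\[
E(\bar{\mathbf{X}} \bar{\mathbf{X}}') = E\big( (\bar{\mathbf{X}} - \boldsymbol{\mu} + \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu} + \boldsymbol{\mu})' \big) = \frac{1}{n} \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}'
\]
\[
E(\mathbf{S}_n) = \frac{1}{n} E\!\left( \sum_{j=1}^{n} \mathbf{X}_j \mathbf{X}_j' - n \bar{\mathbf{X}} \bar{\mathbf{X}}' \right)
= \frac{1}{n} \left[ n \boldsymbol{\Sigma} + n \boldsymbol{\mu}\boldsymbol{\mu}' - n \left( \frac{1}{n} \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}' \right) \right]
= \frac{n-1}{n} \boldsymbol{\Sigma}
\]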
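Result 3.1 can also be illustrated by simulation. The sketch below is only an illustration under assumptions that are not in the slides: a bivariate normal population, a hypothetical mean vector and covariance matrix, and a fixed random seed. The result itself does not require normality.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the run is repeatable
mu = np.array([1.0, -2.0])               # hypothetical mean vector
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])           # hypothetical covariance matrix
n, reps = 5, 20000

# samples[r] is one random sample: n independent rows drawn from N(mu, Sigma).
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
x_bars = samples.mean(axis=1)                        # shape (reps, p)
dev = samples - x_bars[:, None, :]
Sn_all = np.einsum('rji,rjk->rik', dev, dev) / n     # S_n for every replication

mean_of_x_bar = x_bars.mean(axis=0)                  # approximates mu
cov_of_x_bar = np.cov(x_bars, rowvar=False)          # approximates Sigma / n
mean_of_Sn = Sn_all.mean(axis=0)                     # approximates (n - 1) / n * Sigma

print(mean_of_x_bar)
print(cov_of_x_bar)
print(mean_of_Sn)
```

With a small sample size such as $n = 5$ the factor $(n-1)/n = 0.8$ is clearly visible in the average of $\mathbf{S}_n$, which is why the divisor $n - 1$ is used for an unbiased estimate.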
Some Other Estimators
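The expectation of the $(i, k)$th entry of $\frac{n}{n-1} \mathbf{S}_n$ is
\[
E\!\left( \frac{n}{n-1} s_{ik} \right)
= E\!\left( \frac{1}{n-1} \sum_{j=1}^{n} (X_{ji} - \bar{X}_i)(X_{jk} - \bar{X}_k) \right) = \sigma_{ik}
\]
\[
E(\sqrt{s_{ii}}) \ne \sqrt{\sigma_{ii}}, \qquad E(r_{ik}) \ne \rho_{ik}
\]
The biases $E(\sqrt{s_{ii}}) - \sqrt{\sigma_{ii}}$ and $E(r_{ik}) - \rho_{ik}$ can usually be ignored if the sample size $n$ is moderately large.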
Generalized Sample Variance
Generalized sample variance $= |\mathbf{S}|$
Example 3.7: Employees and profits per employee for the 16 largest publishing firms in the US
\[
\mathbf{S} = \begin{bmatrix} 252.04 & -68.43 \\ -68.43 & 123.67 \end{bmatrix},
\qquad |\mathbf{S}| = 26{,}487
\]
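The determinant in Example 3.7 is a one-line computation. A minimal sketch with NumPy follows; the off-diagonal entries are taken as negative, and the determinant does not depend on their sign.

```python
import numpy as np

S = np.array([[252.04, -68.43],
              [-68.43, 123.67]])

generalized_variance = np.linalg.det(S)                  # |S|
by_hand = S[0, 0] * S[1, 1] - S[0, 1] * S[1, 0]          # s11 s22 - s12^2

print(round(generalized_variance))
```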
Geometric Interpretation for
Bivariate Case
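The area generated by the two deviation vectors $\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1 \mathbf{1}$ and $\mathbf{d}_2 = \mathbf{y}_2 - \bar{x}_2 \mathbf{1}$ is
\[
\text{area} = L_{\mathbf{d}_1} L_{\mathbf{d}_2} \sin\theta = L_{\mathbf{d}_1} L_{\mathbf{d}_2} \sqrt{1 - \cos^2\theta}
\]
\[
L_{\mathbf{d}_1} = \sqrt{\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)^2} = \sqrt{(n-1) s_{11}},
\qquad
L_{\mathbf{d}_2} = \sqrt{\sum_{j=1}^{n} (x_{j2} - \bar{x}_2)^2} = \sqrt{(n-1) s_{22}}
\]
\[
\cos\theta = r_{12}, \qquad \text{area} = (n-1) \sqrt{s_{11} s_{22} (1 - r_{12}^2)}
\]
\[
|\mathbf{S}| =
\begin{vmatrix}
s_{11} & \sqrt{s_{11}}\sqrt{s_{22}}\, r_{12} \\
\sqrt{s_{11}}\sqrt{s_{22}}\, r_{12} & s_{22}
\end{vmatrix}
= s_{11} s_{22} (1 - r_{12}^2) = \frac{(\text{area})^2}{(n-1)^2}
\]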
Generalized Sample Variance for
Multivariate Cases
\[
|\mathbf{S}| = (n-1)^{-p} (\text{volume})^2
\]
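For $p = 2$ the relation between $|\mathbf{S}|$ and the area spanned by the deviation vectors can be verified directly. The sketch below assumes NumPy and reuses the small data matrix with rows (4, 1), (-1, 3), (3, 5); here $\mathbf{S}$ uses the divisor $n - 1$.

```python
import numpy as np

X = np.array([[4.0, 1.0],
              [-1.0, 3.0],
              [3.0, 5.0]])
n, p = X.shape
D = X - X.mean(axis=0)                  # columns are d_1 and d_2
S = D.T @ D / (n - 1)                   # sample covariance matrix, divisor n - 1

L1, L2 = np.linalg.norm(D, axis=0)      # lengths of the deviation vectors
cos_theta = (D[:, 0] @ D[:, 1]) / (L1 * L2)
area = L1 * L2 * np.sqrt(1.0 - cos_theta**2)

det_S = np.linalg.det(S)
from_area = area**2 / (n - 1)**p        # (n - 1)^(-p) * area^2

print(det_S, from_area)
```

Both quantities equal 27 for this data set, since the squared area is $14 \cdot 8 - (-2)^2 = 108$ and $(n-1)^2 = 4$.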
Interpretation in
p-space Scatter Plot
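Equation for points within a constant distance $c$ from the sample mean:
\[
(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \le c^2
\]
\[
\text{Volume of } \{ \mathbf{x} : (\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \le c^2 \} = k_p |\mathbf{S}|^{1/2} c^p
\]
A large volume corresponds to a large generalized variance.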
Example 3.8: Scatter Plots
Example 3.8: Sample Mean and
Variance-Covariance Matrices
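\[
\mathbf{S} = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}, \quad r = 0.8
\]
\[
\mathbf{S} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}, \quad r = 0
\]
\[
\mathbf{S} = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}, \quad r = -0.8
\]
$\bar{\mathbf{x}}' = [2, 1]$ and $|\mathbf{S}| = 9$ for all three cases.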
Example 3.8:
Eigenvalues and Eigenvectors
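\[
\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}: \quad \lambda_1 = 9, \ \lambda_2 = 1, \quad
\mathbf{e}_1' = [1/\sqrt{2},\ 1/\sqrt{2}], \quad \mathbf{e}_2' = [1/\sqrt{2},\ -1/\sqrt{2}]
\]
\[
\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}: \quad \lambda_1 = 3, \ \lambda_2 = 3, \quad
\mathbf{e}_1' = [1,\ 0], \quad \mathbf{e}_2' = [0,\ 1]
\]
\[
\begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}: \quad \lambda_1 = 9, \ \lambda_2 = 1, \quad
\mathbf{e}_1' = [1/\sqrt{2},\ -1/\sqrt{2}], \quad \mathbf{e}_2' = [1/\sqrt{2},\ 1/\sqrt{2}]
\]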
Example 3.8:
Mean-Centered Ellipse
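\[
(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) = c^2
\]
$\mathbf{S}^{-1}$ has eigenvalues $1/\lambda_1, 1/\lambda_2$ and eigenvectors $\mathbf{e}_1, \mathbf{e}_2$ (since $\mathbf{S}\mathbf{e} = \lambda \mathbf{e}$ implies $\frac{1}{\lambda}\mathbf{e} = \mathbf{S}^{-1}\mathbf{e}$), so
\[
(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2},
\qquad
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} \mathbf{e}_1' \\ \mathbf{e}_2' \end{bmatrix}
\begin{bmatrix} x_1 - \bar{x}_1 \\ x_2 - \bar{x}_2 \end{bmatrix}
\]
Choose $c^2 = 5.99$ to cover approximately 95% of the observations.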
Example 3.8:
Semi-major and Semi-minor Axes
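\[
\mathbf{S} = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}: \quad a = 3\sqrt{5.99}, \quad b = \sqrt{5.99}
\]
\[
\mathbf{S} = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}: \quad a = \sqrt{3}\sqrt{5.99}, \quad b = \sqrt{3}\sqrt{5.99}
\]
\[
\mathbf{S} = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}: \quad a = 3\sqrt{5.99}, \quad b = \sqrt{5.99}
\]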
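The half-axis lengths follow from the eigenvalues as $c\sqrt{\lambda_i}$. The sketch below, assuming NumPy, recomputes them for the three covariance matrices of Example 3.8; the dictionary labels are illustrative.

```python
import numpy as np

c2 = 5.99   # value of c^2 used on the slides
cases = {
    "r = 0.8":  np.array([[5.0, 4.0], [4.0, 5.0]]),
    "r = 0":    np.array([[3.0, 0.0], [0.0, 3.0]]),
    "r = -0.8": np.array([[5.0, -4.0], [-4.0, 5.0]]),
}

half_axes = {}
for label, S in cases.items():
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    # Half-length along eigenvector e_i is c * sqrt(lambda_i).
    b, a = np.sqrt(c2 * eigvals)             # b: semi-minor, a: semi-major
    half_axes[label] = (a, b)
    print(label, "a =", a, "b =", b, "|S| =", np.linalg.det(S))
```

`np.linalg.eigh` is used because the matrices are symmetric; it returns eigenvalues in ascending order, so the semi-major axis comes from the last one. All three ellipses have the same area because $|\mathbf{S}| = 9$ in each case, even though their orientations differ.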
Example 3.8:
Scatter Plots with Major Axes
Result 3.2
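The generalized variance is zero when, and only when, the columns of the following matrix of deviations are linearly dependent:
\[
\begin{bmatrix}
\mathbf{x}_1' - \bar{\mathbf{x}}' \\
\mathbf{x}_2' - \bar{\mathbf{x}}' \\
\vdots \\
\mathbf{x}_n' - \bar{\mathbf{x}}'
\end{bmatrix}
=
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
= \mathbf{X} - \mathbf{1}\bar{\mathbf{x}}'
\]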
Proof of Result 3.2
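\[
\mathbf{0} = a_1 \operatorname{col}_1(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}') + \cdots + a_p \operatorname{col}_p(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')
= (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a}, \qquad \mathbf{a} \ne \mathbf{0}
\]
\[
(n-1)\mathbf{S} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')
\]
\[
(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')
= \begin{bmatrix} \mathbf{x}_1 - \bar{\mathbf{x}} & \mathbf{x}_2 - \bar{\mathbf{x}} & \cdots & \mathbf{x}_n - \bar{\mathbf{x}} \end{bmatrix}
\begin{bmatrix}
\mathbf{x}_1' - \bar{\mathbf{x}}' \\
\mathbf{x}_2' - \bar{\mathbf{x}}' \\
\vdots \\
\mathbf{x}_n' - \bar{\mathbf{x}}'
\end{bmatrix}
= \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'
\]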
Proof of Result 3.2
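If $(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a} = \mathbf{0}$ with $\mathbf{a} \ne \mathbf{0}$:
\[
(n-1)\mathbf{S}\mathbf{a} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a} = \mathbf{0}
\]
\[
a_1 \operatorname{col}_1(\mathbf{S}) + \cdots + a_p \operatorname{col}_p(\mathbf{S}) = \mathbf{0} \ \Rightarrow\ |\mathbf{S}| = 0
\]
Conversely, if $|\mathbf{S}| = 0$, there exists $\mathbf{a} \ne \mathbf{0}$ such that $\mathbf{S}\mathbf{a} = \mathbf{0}$:
\[
\mathbf{0} = (n-1)\mathbf{S}\mathbf{a} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a}
\]
\[
\mathbf{a}' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a} = 0
\ \Rightarrow\ L^2_{(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a}} = 0
\ \Rightarrow\ (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a} = \mathbf{0}
\]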
Example 3.9
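\[
\mathbf{X} = \begin{bmatrix} 1 & 2 & 5 \\ 4 & 1 & 6 \\ 4 & 0 & 4 \end{bmatrix},
\quad \bar{\mathbf{x}}' = [3, 1, 5],
\quad \mathbf{X} - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \end{bmatrix}
\]
\[
\mathbf{d}_1' = [-2, 1, 1], \quad \mathbf{d}_2' = [1, 0, -1], \quad \mathbf{d}_3' = [0, 1, -1]
\]
\[
\mathbf{d}_3 = \mathbf{d}_1 + 2\mathbf{d}_2 \ \Rightarrow\ |\mathbf{S}| = 0
\]
\[
\text{Check: } \mathbf{S} = \begin{bmatrix} 3 & -3/2 & 0 \\ -3/2 & 1 & 1/2 \\ 0 & 1/2 & 1 \end{bmatrix}, \quad |\mathbf{S}| = 0
\]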
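Example 3.9 can be checked numerically. The following sketch assumes NumPy and the data matrix with rows (1, 2, 5), (4, 1, 6), (4, 0, 4); the explicit rank tolerance is a choice made here to keep the check robust to rounding.

```python
import numpy as np

X = np.array([[1.0, 2.0, 5.0],
              [4.0, 1.0, 6.0],
              [4.0, 0.0, 4.0]])
n = X.shape[0]
D = X - X.mean(axis=0)                  # X - 1 x_bar'; columns are d_1, d_2, d_3
d1, d2, d3 = D.T

S = D.T @ D / (n - 1)
det_S = np.linalg.det(S)
rank_D = np.linalg.matrix_rank(D, tol=1e-10)

print(D)
print(S)
print(det_S, rank_D)
```

The deviation matrix has rank 2 rather than 3, which is the linear dependence $\mathbf{d}_3 = \mathbf{d}_1 + 2\mathbf{d}_2$ seen from the other side.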
Example 3.9
Examples That Cause Zero
Generalized Variance
Example 1
– Data are test scores
– Variables that are sums of others were included
– e.g., algebra score and geometry score were combined to give a total math score
– e.g., class midterm and final exam scores were summed to give total points
Example 2
– Total weight of chemicals was included along with that of each component
Example 3.10
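\[
\mathbf{X} = \begin{bmatrix} 1 & 9 & 10 \\ 4 & 12 & 16 \\ 2 & 10 & 12 \\ 5 & 8 & 13 \\ 3 & 11 & 14 \end{bmatrix},
\quad
\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}' = \begin{bmatrix} -2 & -1 & -3 \\ 1 & 2 & 3 \\ -1 & 0 & -1 \\ 2 & -2 & 0 \\ 0 & 1 & 1 \end{bmatrix},
\quad
\mathbf{S} = \begin{bmatrix} 2.5 & 0 & 2.5 \\ 0 & 2.5 & 2.5 \\ 2.5 & 2.5 & 5.0 \end{bmatrix}
\]
\[
|\mathbf{S}| = 0 \ \Rightarrow\ \mathbf{S}\mathbf{a} = \mathbf{0} \text{ for some } \mathbf{a} \ne \mathbf{0}
\]
The eigenvector corresponding to the zero eigenvalue of $\mathbf{S}$ is $\mathbf{a}' = [1, 1, -1]$, so
\[
1\,(x_{j1} - \bar{x}_1) + 1\,(x_{j2} - \bar{x}_2) - (x_{j3} - \bar{x}_3) = 0
\]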
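The linear dependency in Example 3.10 can be recovered from the data by looking at the eigenvector of $\mathbf{S}$ for its smallest eigenvalue. This is a sketch assuming NumPy; rescaling the eigenvector so that its first entry is 1 is a presentation choice made here.

```python
import numpy as np

X = np.array([[1.0, 9.0, 10.0],
              [4.0, 12.0, 16.0],
              [2.0, 10.0, 12.0],
              [5.0, 8.0, 13.0],
              [3.0, 11.0, 14.0]])
S = np.cov(X, rowvar=False)              # divisor n - 1

eigvals, eigvecs = np.linalg.eigh(S)     # ascending eigenvalues
a = eigvecs[:, 0]                        # eigenvector of the smallest eigenvalue
a = a / a[0]                             # rescale so the first entry is 1

D = X - X.mean(axis=0)
residual = D @ a                         # (X - 1 x_bar') a, zero for every row

print(S)
print(eigvals)
print(a)
```

The recovered vector shows that the third variable is the sum of the first two, which is the kind of redundancy that produces a zero generalized variance.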
Result 3.3
If the sample size is less than or equal to the number of variables ($n \le p$), then $|\mathbf{S}| = 0$ for all samples.
Proof of Result 3.3
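The $n$ row vectors of $\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}'$ sum to the zero vector because, for each $k$,
\[
\sum_{j=1}^{n} x_{jk} = \sum_{j=1}^{n} \bar{x}_k
\]
Thus the rank of $\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}'$ is less than or equal to $n - 1$, i.e., less than or equal to $p - 1$, because $n \le p$.
Since $(n-1)\mathbf{S} = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')$,
\[
(n-1) \operatorname{col}_k(\mathbf{S}) = (\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' \operatorname{col}_k(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')
= (x_{1k} - \bar{x}_k) \operatorname{row}_1(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')' + \cdots + (x_{nk} - \bar{x}_k) \operatorname{row}_n(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'
\]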
Proof of Result 3.3
$\operatorname{row}_1(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')'$ is a linear combination of the remaining row vectors.
$\operatorname{col}_k(\mathbf{S})$ is therefore a linear combination of at most $n - 1$ linearly independent transposed row vectors.
The rank of $\mathbf{S}$ is thus less than or equal to $n - 1$, i.e., less than or equal to $p - 1$.
Since $\mathbf{S}$ is a $p$ by $p$ matrix, $|\mathbf{S}| = 0$.
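Result 3.3 can be illustrated with any data set that has no more observations than variables. The sketch below uses hypothetical sizes $n = 3$, $p = 4$ and randomly generated data with a fixed seed; none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 4                              # hypothetical sizes with n <= p
X = rng.normal(size=(n, p))

D = X - X.mean(axis=0)                   # rows of D sum to the zero vector
S = D.T @ D / (n - 1)

row_sum = D.sum(axis=0)
rank_S = np.linalg.matrix_rank(S, tol=1e-10)
det_S = np.linalg.det(S)

print(row_sum)
print(rank_S, det_S)
```

The rank of $\mathbf{S}$ cannot exceed $n - 1 = 2$ here, so the $4 \times 4$ matrix is singular no matter which numbers are drawn.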
Result 3.4
Let the $p$ by 1 vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, where $\mathbf{x}_j'$ is the $j$th row of the data matrix $\mathbf{X}$, be realizations of the independent random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$.
If the linear combination $\mathbf{a}'\mathbf{X}_j$ has positive variance for each non-zero constant vector $\mathbf{a}$, then, provided that $p < n$, $\mathbf{S}$ has full rank with probability 1 and $|\mathbf{S}| > 0$.
If, with probability 1, $\mathbf{a}'\mathbf{X}_j$ is a constant $c$ for all $j$, then $|\mathbf{S}| = 0$.
Proof of Part 2 of Result 3.4
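If $\mathbf{a}'\mathbf{X}_j = a_1 X_{j1} + a_2 X_{j2} + \cdots + a_p X_{jp} = c$ with probability 1, then $\mathbf{a}'\mathbf{x}_j = c$ for all $j$. The sample mean of this linear combination is
\[
\sum_{j=1}^{n} (a_1 x_{j1} + a_2 x_{j2} + \cdots + a_p x_{jp}) / n = \mathbf{a}'\bar{\mathbf{x}} = c
\]
\[
(\mathbf{X} - \mathbf{1}\bar{\mathbf{x}}')\mathbf{a}
= a_1 \begin{bmatrix} x_{11} - \bar{x}_1 \\ \vdots \\ x_{n1} - \bar{x}_1 \end{bmatrix}
+ \cdots +
a_p \begin{bmatrix} x_{1p} - \bar{x}_p \\ \vdots \\ x_{np} - \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \mathbf{a}'\mathbf{x}_1 - \mathbf{a}'\bar{\mathbf{x}} \\ \vdots \\ \mathbf{a}'\mathbf{x}_n - \mathbf{a}'\bar{\mathbf{x}} \end{bmatrix}
= \begin{bmatrix} c - c \\ \vdots \\ c - c \end{bmatrix}
= \mathbf{0}
\ \Rightarrow\ |\mathbf{S}| = 0
\]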
Generalized Sample Variance of
Standardized Variables
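Generalized sample variance of the standardized variables $= |\mathbf{R}|$
Deviation vectors of the standardized variables:
\[
\frac{\mathbf{y}_i - \bar{x}_i \mathbf{1}}{\sqrt{s_{ii}}}
= \left[ \frac{x_{1i} - \bar{x}_i}{\sqrt{s_{ii}}},\ \frac{x_{2i} - \bar{x}_i}{\sqrt{s_{ii}}},\ \ldots,\ \frac{x_{ni} - \bar{x}_i}{\sqrt{s_{ii}}} \right]'
\]
\[
|\mathbf{R}| = (n-1)^{-p} (\text{volume})^2, \qquad |\mathbf{S}| = (s_{11} s_{22} \cdots s_{pp}) |\mathbf{R}|
\]
$|\mathbf{R}|$ is large when all $r_{ik}$ are nearly zero, and is small when one or more $r_{ik}$ are nearly $1$ or $-1$.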
Volume Generated by Deviation
Vectors of Standardized Variables
Example 3.11
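\[
\mathbf{S} = \begin{bmatrix} 4 & 3 & 1 \\ 3 & 9 & 2 \\ 1 & 2 & 1 \end{bmatrix},
\qquad
\mathbf{R} = \begin{bmatrix} 1 & 1/2 & 1/2 \\ 1/2 & 1 & 2/3 \\ 1/2 & 2/3 & 1 \end{bmatrix}
\]
\[
s_{11} = 4, \quad s_{22} = 9, \quad s_{33} = 1
\]
\[
|\mathbf{S}| = 14, \qquad |\mathbf{R}| = \frac{7}{18}, \qquad |\mathbf{S}| = s_{11} s_{22} s_{33} |\mathbf{R}|
\]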
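The determinant identity of Example 3.11 is easy to confirm. A minimal sketch with NumPy, entering $\mathbf{S}$ and $\mathbf{R}$ as given in the example:

```python
import numpy as np

S = np.array([[4.0, 3.0, 1.0],
              [3.0, 9.0, 2.0],
              [1.0, 2.0, 1.0]])
R = np.array([[1.0, 1 / 2, 1 / 2],
              [1 / 2, 1.0, 2 / 3],
              [1 / 2, 2 / 3, 1.0]])

det_S = np.linalg.det(S)
det_R = np.linalg.det(R)
product = np.prod(np.diag(S)) * det_R    # s11 * s22 * s33 * |R|

print(det_S, det_R, product)
```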
Total Sample Variance
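Total sample variance $= s_{11} + s_{22} + \cdots + s_{pp}$
It pays no attention to the orientation of the residual vectors.
\[
\text{Example 3.7: } \mathbf{S} = \begin{bmatrix} 252.04 & -68.43 \\ -68.43 & 123.67 \end{bmatrix}, \quad \text{total sample variance} = 375.71
\]
\[
\text{Example 3.9: } \mathbf{S} = \begin{bmatrix} 3 & -3/2 & 0 \\ -3/2 & 1 & 1/2 \\ 0 & 1/2 & 1 \end{bmatrix}, \quad \text{total sample variance} = 5
\]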
Sample Mean as Matrix Operation
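\[
\bar{\mathbf{x}} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \mathbf{y}_1'\mathbf{1}/n \\ \mathbf{y}_2'\mathbf{1}/n \\ \vdots \\ \mathbf{y}_p'\mathbf{1}/n \end{bmatrix}
= \frac{1}{n}
\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1} \\
x_{12} & x_{22} & \cdots & x_{n2} \\
\vdots & \vdots & & \vdots \\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
= \frac{1}{n} \mathbf{X}'\mathbf{1}
\]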
Covariance as Matrix Operation
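\[
\mathbf{1}\bar{\mathbf{x}}' = \frac{1}{n} \mathbf{1}\mathbf{1}'\mathbf{X}
= \begin{bmatrix}
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p \\
\vdots & \vdots & & \vdots \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p
\end{bmatrix}
\]
\[
\mathbf{X} - \frac{1}{n} \mathbf{1}\mathbf{1}'\mathbf{X}
= \begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
\]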
Covariance as Matrix Operation
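\[
(n-1)\mathbf{S} =
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{21} - \bar{x}_1 & \cdots & x_{n1} - \bar{x}_1 \\
x_{12} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{n2} - \bar{x}_2 \\
\vdots & \vdots & & \vdots \\
x_{1p} - \bar{x}_p & x_{2p} - \bar{x}_p & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
\begin{bmatrix}
x_{11} - \bar{x}_1 & x_{12} - \bar{x}_2 & \cdots & x_{1p} - \bar{x}_p \\
x_{21} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{2p} - \bar{x}_p \\
\vdots & \vdots & & \vdots \\
x_{n1} - \bar{x}_1 & x_{n2} - \bar{x}_2 & \cdots & x_{np} - \bar{x}_p
\end{bmatrix}
= \left( \mathbf{X} - \frac{1}{n} \mathbf{1}\mathbf{1}'\mathbf{X} \right)' \left( \mathbf{X} - \frac{1}{n} \mathbf{1}\mathbf{1}'\mathbf{X} \right)
\]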
Covariance as Matrix Operation
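\[
\left( \mathbf{X} - \frac{1}{n} \mathbf{1}\mathbf{1}'\mathbf{X} \right)' \left( \mathbf{X} - \frac{1}{n} \mathbf{1}\mathbf{1}'\mathbf{X} \right)
= \mathbf{X}' \left( \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' \right)' \left( \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' \right) \mathbf{X}
\]
\[
\left( \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' \right)' \left( \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' \right)
= \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' - \frac{1}{n} \mathbf{1}\mathbf{1}' + \frac{1}{n^2} \mathbf{1}\mathbf{1}'\mathbf{1}\mathbf{1}'
= \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' \qquad (\mathbf{1}'\mathbf{1} = n)
\]
\[
\mathbf{S} = \frac{1}{n-1} \mathbf{X}' \left( \mathbf{I} - \frac{1}{n} \mathbf{1}\mathbf{1}' \right) \mathbf{X}
\]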
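The matrix expressions for the sample mean and covariance translate almost symbol for symbol into array code. The sketch below assumes NumPy and uses the small data matrix with rows (4, 1), (-1, 3), (3, 5); the name `H` for the centering matrix $\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}'$ is introduced here for convenience.

```python
import numpy as np

X = np.array([[4.0, 1.0],
              [-1.0, 3.0],
              [3.0, 5.0]])
n = X.shape[0]
one = np.ones((n, 1))

x_bar = X.T @ one / n                    # (1/n) X' 1, a p x 1 column
H = np.eye(n) - one @ one.T / n          # I - (1/n) 1 1'
S = X.T @ H @ X / (n - 1)                # (1/(n-1)) X' (I - (1/n) 1 1') X

print(x_bar.ravel())
print(S)
```

`H` is symmetric and idempotent, which is the property used in the simplification step above. For large $n$, forming the $n \times n$ matrix explicitly is wasteful, and subtracting column means directly gives the same $\mathbf{S}$.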
Sample Standard Deviation Matrix
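\[
\mathbf{D}^{1/2} = \begin{bmatrix}
\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & \sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sqrt{s_{pp}}
\end{bmatrix},
\qquad
\mathbf{D}^{-1/2} = \begin{bmatrix}
1/\sqrt{s_{11}} & 0 & \cdots & 0 \\
0 & 1/\sqrt{s_{22}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/\sqrt{s_{pp}}
\end{bmatrix}
\]
\[
\mathbf{R} = \begin{bmatrix}
\dfrac{s_{11}}{\sqrt{s_{11}}\sqrt{s_{11}}} & \dfrac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{1p}}{\sqrt{s_{11}}\sqrt{s_{pp}}} \\
\dfrac{s_{21}}{\sqrt{s_{22}}\sqrt{s_{11}}} & \dfrac{s_{22}}{\sqrt{s_{22}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{2p}}{\sqrt{s_{22}}\sqrt{s_{pp}}} \\
\vdots & \vdots & & \vdots \\
\dfrac{s_{p1}}{\sqrt{s_{pp}}\sqrt{s_{11}}} & \dfrac{s_{p2}}{\sqrt{s_{pp}}\sqrt{s_{22}}} & \cdots & \dfrac{s_{pp}}{\sqrt{s_{pp}}\sqrt{s_{pp}}}
\end{bmatrix}
= \mathbf{D}^{-1/2} \mathbf{S} \mathbf{D}^{-1/2}
\]
\[
\mathbf{S} = \mathbf{D}^{1/2} \mathbf{R} \mathbf{D}^{1/2}
\]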
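The conversion between $\mathbf{S}$ and $\mathbf{R}$ through the standard deviation matrix can be sketched as follows, assuming NumPy and using the covariance matrix of Example 3.11 as input; the variable names are illustrative.

```python
import numpy as np

S = np.array([[4.0, 3.0, 1.0],
              [3.0, 9.0, 2.0],
              [1.0, 2.0, 1.0]])

D_half = np.diag(np.sqrt(np.diag(S)))            # D^{1/2}
D_neg_half = np.diag(1.0 / np.sqrt(np.diag(S)))  # D^{-1/2}

R = D_neg_half @ S @ D_neg_half                  # correlation matrix
S_back = D_half @ R @ D_half                     # recovers S

print(R)
```

Multiplying by diagonal matrices on both sides simply rescales row $i$ and column $k$ by $1/\sqrt{s_{ii}}$ and $1/\sqrt{s_{kk}}$, so the diagonal of $\mathbf{R}$ is always 1.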
Result 3.5
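\[
\mathbf{b}'\mathbf{X} = b_1 X_1 + b_2 X_2 + \cdots + b_p X_p,
\qquad
\mathbf{c}'\mathbf{X} = c_1 X_1 + c_2 X_2 + \cdots + c_p X_p
\]
Sample mean of $\mathbf{b}'\mathbf{X}$ $= \mathbf{b}'\bar{\mathbf{x}}$
Sample variance of $\mathbf{b}'\mathbf{X}$ $= \mathbf{b}'\mathbf{S}\mathbf{b}$
Sample covariance of $\mathbf{b}'\mathbf{X}$ and $\mathbf{c}'\mathbf{X}$ $= \mathbf{b}'\mathbf{S}\mathbf{c}$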
Proof of Result 3.5
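\[
\mathbf{b}'\mathbf{x}_j = b_1 x_{j1} + b_2 x_{j2} + \cdots + b_p x_{jp}
\]
\[
\text{Sample mean} = \frac{\mathbf{b}'\mathbf{x}_1 + \mathbf{b}'\mathbf{x}_2 + \cdots + \mathbf{b}'\mathbf{x}_n}{n} = \mathbf{b}'\bar{\mathbf{x}}
\]
\[
(\mathbf{b}'\mathbf{x}_j - \mathbf{b}'\bar{\mathbf{x}})^2 = \mathbf{b}'(\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})'\mathbf{b}
\]
\[
\text{Sample variance} = \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{b}'\mathbf{x}_j - \mathbf{b}'\bar{\mathbf{x}})^2
= \mathbf{b}' \left[ \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' \right] \mathbf{b}
= \mathbf{b}'\mathbf{S}\mathbf{b}
\]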
Proof of Result 3.5
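\[
\text{Sample covariance} = \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{b}'\mathbf{x}_j - \mathbf{b}'\bar{\mathbf{x}})(\mathbf{c}'\mathbf{x}_j - \mathbf{c}'\bar{\mathbf{x}})
= \mathbf{b}' \left[ \frac{1}{n-1} \sum_{j=1}^{n} (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' \right] \mathbf{c}
= \mathbf{b}'\mathbf{S}\mathbf{c}
\]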
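Result 3.5 can be checked by computing the statistics of the linear combinations in two ways: directly from the derived observations $\mathbf{b}'\mathbf{x}_j$ and $\mathbf{c}'\mathbf{x}_j$, and from $\bar{\mathbf{x}}$ and $\mathbf{S}$. The sketch below assumes NumPy, reuses the data matrix of Example 3.9, and picks hypothetical coefficient vectors `b` and `c` that do not appear in the slides.

```python
import numpy as np

X = np.array([[1.0, 2.0, 5.0],
              [4.0, 1.0, 6.0],
              [4.0, 0.0, 4.0]])
b = np.array([2.0, 2.0, -1.0])           # hypothetical coefficient vector
c = np.array([1.0, -1.0, 3.0])           # hypothetical coefficient vector

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)              # divisor n - 1

bx = X @ b                               # observations b' x_j
cx = X @ c                               # observations c' x_j
direct = (bx.mean(), bx.var(ddof=1), np.cov(bx, cx)[0, 1])
via_result = (b @ x_bar, b @ S @ b, b @ S @ c)

print(direct)
print(via_result)
```

The two tuples agree, so the summary statistics of any linear combination can be obtained from $\bar{\mathbf{x}}$ and $\mathbf{S}$ without revisiting the raw data.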
Result 3.6
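\[
\mathbf{A}\mathbf{X} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{bmatrix}
\]
Sample mean of $\mathbf{A}\mathbf{X}$ $= \mathbf{A}\bar{\mathbf{x}}$
Sample covariance matrix of $\mathbf{A}\mathbf{X}$ $= \mathbf{A}\mathbf{S}\mathbf{A}'$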
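Result 3.6 extends the same check to $q$ linear combinations at once. The following sketch assumes NumPy, the data matrix of Example 3.9, and a hypothetical $2 \times 3$ coefficient matrix `A` chosen only for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0, 5.0],
              [4.0, 1.0, 6.0],
              [4.0, 0.0, 4.0]])
A = np.array([[2.0, 2.0, -1.0],
              [1.0, -1.0, 3.0]])         # hypothetical q x p matrix, q = 2

Z = X @ A.T                              # row j is (A x_j)'
x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

mean_direct = Z.mean(axis=0)
cov_direct = np.cov(Z, rowvar=False)
mean_formula = A @ x_bar                 # A x_bar
cov_formula = A @ S @ A.T                # A S A'

print(mean_formula)
print(cov_formula)
```

Because observations are stored as rows, the transformed data matrix is `X @ A.T` rather than `A @ X`; the formulas $\mathbf{A}\bar{\mathbf{x}}$ and $\mathbf{A}\mathbf{S}\mathbf{A}'$ are unaffected by that storage choice.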