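Sample Geometry and Random Sampling

Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia

Array of Data

$$
\mathbf{X}=\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2k} & \cdots & x_{2p}\\
\vdots & \vdots & & \vdots & & \vdots\\
x_{j1} & x_{j2} & \cdots & x_{jk} & \cdots & x_{jp}\\
\vdots & \vdots & & \vdots & & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nk} & \cdots & x_{np}
\end{bmatrix}
$$

*A sample of size $n$ from a $p$-variate population.

Row-Vector View

$$
\mathbf{X}=\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p}\\
x_{21} & x_{22} & \cdots & x_{2p}\\
\vdots & \vdots & & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
=\begin{bmatrix}\mathbf{x}_1'\\ \mathbf{x}_2'\\ \vdots\\ \mathbf{x}_j'\\ \vdots\\ \mathbf{x}_n'\end{bmatrix},
\qquad \mathbf{x}_j'=[x_{j1},x_{j2},\ldots,x_{jp}]
$$

Example 3.1

$$
\mathbf{X}=\begin{bmatrix}4 & 1\\ -1 & 3\\ 3 & 5\end{bmatrix}
$$

Column-Vector View

$$
\mathbf{X}=[\,\mathbf{y}_1 \mid \mathbf{y}_2 \mid \cdots \mid \mathbf{y}_p\,],
\qquad \mathbf{y}_i'=[x_{1i},x_{2i},\ldots,x_{ni}]
$$

Example 3.2

$$
\mathbf{X}=\begin{bmatrix}4 & 1\\ -1 & 3\\ 3 & 5\end{bmatrix}
$$

Geometrical Interpretation of Sample Mean and Deviation

Let $\mathbf{1}'=[1,1,\ldots,1]$ ($n$ entries). The projection of $\mathbf{y}_i$ on the unit vector $\frac{1}{\sqrt{n}}\mathbf{1}$ is

$$
\left(\mathbf{y}_i'\frac{1}{\sqrt{n}}\mathbf{1}\right)\frac{1}{\sqrt{n}}\mathbf{1}
=\frac{x_{1i}+x_{2i}+\cdots+x_{ni}}{n}\,\mathbf{1}=\bar{x}_i\mathbf{1},
\qquad \bar{x}_i=\frac{1}{n}\mathbf{y}_i'\mathbf{1}
$$

The deviation vector is

$$
\mathbf{d}_i=\mathbf{y}_i-\bar{x}_i\mathbf{1}
=\begin{bmatrix}x_{1i}-\bar{x}_i\\ x_{2i}-\bar{x}_i\\ \vdots\\ x_{ni}-\bar{x}_i\end{bmatrix}
$$

Decomposition of Column Vectors

[Figure: each column vector $\mathbf{y}_i$ decomposed into $\bar{x}_i\mathbf{1}$ and the deviation vector $\mathbf{d}_i$, which is perpendicular to $\mathbf{1}$.]

Example 3.3

$$
\mathbf{X}=\begin{bmatrix}4 & 1\\ -1 & 3\\ 3 & 5\end{bmatrix},\qquad \bar{x}_1=2,\quad \bar{x}_2=3
$$

$$
\bar{x}_1\mathbf{1}=2[1,1,1]'=[2,2,2]',\qquad \bar{x}_2\mathbf{1}=3[1,1,1]'=[3,3,3]'
$$

$$
\mathbf{d}_1=\mathbf{y}_1-\bar{x}_1\mathbf{1}=[4,-1,3]'-[2,2,2]'=[2,-3,1]'
$$

$$
\mathbf{d}_2=\mathbf{y}_2-\bar{x}_2\mathbf{1}=[1,3,5]'-[3,3,3]'=[-2,0,2]'
$$

Lengths and Angles of Deviation Vectors

Here $s_{ik}$ uses divisor $n$ (the entries of $\mathbf{S}_n$):

$$
L_{\mathbf{d}_i}^2=\mathbf{d}_i'\mathbf{d}_i=\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2=n s_{ii}
$$

$$
\mathbf{d}_i'\mathbf{d}_k=\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)=n s_{ik}=L_{\mathbf{d}_i}L_{\mathbf{d}_k}\cos\theta_{ik}
$$

$$
\cos\theta_{ik}=\frac{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)(x_{jk}-\bar{x}_k)}{\sqrt{\sum_{j=1}^{n}(x_{ji}-\bar{x}_i)^2}\sqrt{\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)^2}}
=\frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}}=r_{ik}
$$

Example 3.4

$$
\mathbf{d}_1=[2,-3,1]',\qquad \mathbf{d}_2=[-2,0,2]'
$$

$$
\mathbf{d}_1'\mathbf{d}_1=14=3s_{11},\qquad \mathbf{d}_2'\mathbf{d}_2=8=3s_{22},\qquad \mathbf{d}_1'\mathbf{d}_2=-2=3s_{12}
$$

$$
r_{12}=\frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}}=-0.189
$$

$$
\mathbf{S}_n=\begin{bmatrix}14/3 & -2/3\\ -2/3 & 8/3\end{bmatrix},\qquad
\mathbf{R}=\begin{bmatrix}1 & -0.189\\ -0.189 & 1\end{bmatrix}
$$

Random Matrix

$$
\mathbf{X}=\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p}\\
X_{21} & X_{22} & \cdots & X_{2p}\\
\vdots & \vdots & & \vdots\\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}
=\begin{bmatrix}\mathbf{X}_1'\\ \mathbf{X}_2'\\ \vdots\\ \mathbf{X}_n'\end{bmatrix}
$$

Random Sample

Row vectors $\mathbf{X}_1',\mathbf{X}_2',\ldots,\mathbf{X}_n'$ represent independent observations from a common joint distribution with density function $f(\mathbf{x})=f(x_1,x_2,\ldots,x_p)$.

Mathematically, the joint density function of $\mathbf{X}_1',\mathbf{X}_2',\ldots,\mathbf{X}_n'$ is

$$
f(\mathbf{x}_1)f(\mathbf{x}_2)\cdots f(\mathbf{x}_n),\qquad f(\mathbf{x}_j)=f(x_{j1},x_{j2},\ldots,x_{jp})
$$

Random Sample (continued)

Measurements of a single trial, such as $\mathbf{X}_j'=[X_{j1},X_{j2},\ldots,X_{jp}]$, will usually be correlated.
The measurements from different trials must be independent.
The independence of measurements from trial to trial may not hold when the variables are likely to drift over time.

Geometric Interpretation of Randomness

The column vector $\mathbf{Y}_k'=[X_{1k},X_{2k},\ldots,X_{nk}]$ is regarded as a point in $n$ dimensions.
Its location is determined by the joint probability distribution $f(\mathbf{y}_k)=f(x_{1k},x_{2k},\ldots,x_{nk})$.
For a random sample, $f(\mathbf{y}_k)=f_k(x_{1k})f_k(x_{2k})\cdots f_k(x_{nk})$.
Each coordinate $x_{jk}$ contributes equally to the location through the same marginal distribution $f_k(x_{jk})$.

Result 3.1

Let $\mathbf{X}_1,\mathbf{X}_2,\ldots,\mathbf{X}_n$ be a random sample from a joint distribution that has mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$, and let $\mathbf{S}_n=\frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})'$. Then

$$
E(\bar{\mathbf{X}})=\boldsymbol{\mu}\qquad(\bar{\mathbf{X}}\text{ is an unbiased point estimate of }\boldsymbol{\mu})
$$

$$
\operatorname{Cov}(\bar{\mathbf{X}})=\frac{1}{n}\boldsymbol{\Sigma},\qquad
E(\mathbf{S}_n)=\frac{n-1}{n}\boldsymbol{\Sigma}
$$

$$
E\!\left(\frac{n}{n-1}\mathbf{S}_n\right)=\boldsymbol{\Sigma}\qquad\left(\mathbf{S}=\frac{n}{n-1}\mathbf{S}_n\text{ is an unbiased point estimate of }\boldsymbol{\Sigma}\right)
$$

Proof of Result 3.1

$$
E(\bar{\mathbf{X}})=E\!\left(\frac{1}{n}\mathbf{X}_1+\frac{1}{n}\mathbf{X}_2+\cdots+\frac{1}{n}\mathbf{X}_n\right)
=\frac{1}{n}E(\mathbf{X}_1)+\frac{1}{n}E(\mathbf{X}_2)+\cdots+\frac{1}{n}E(\mathbf{X}_n)=\boldsymbol{\mu}
$$

$$
(\bar{\mathbf{X}}-\boldsymbol{\mu})(\bar{\mathbf{X}}-\boldsymbol{\mu})'
=\left(\frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j-\boldsymbol{\mu})\right)\left(\frac{1}{n}\sum_{\ell=1}^{n}(\mathbf{X}_\ell-\boldsymbol{\mu})\right)'
=\frac{1}{n^2}\sum_{j=1}^{n}\sum_{\ell=1}^{n}(\mathbf{X}_j-\boldsymbol{\mu})(\mathbf{X}_\ell-\boldsymbol{\mu})'
$$

$$
\operatorname{Cov}(\bar{\mathbf{X}})=E(\bar{\mathbf{X}}-\boldsymbol{\mu})(\bar{\mathbf{X}}-\boldsymbol{\mu})'
=\frac{1}{n^2}\sum_{j=1}^{n}\sum_{\ell=1}^{n}E(\mathbf{X}_j-\boldsymbol{\mu})(\mathbf{X}_\ell-\boldsymbol{\mu})'
$$

Proof of Result 3.1 (continued)

$E(\mathbf{X}_j-\boldsymbol{\mu})(\mathbf{X}_\ell-\boldsymbol{\mu})'=\mathbf{0}$ for $j\neq\ell$ because of independence.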
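$$
\operatorname{Cov}(\bar{\mathbf{X}})=\frac{1}{n^2}\sum_{j=1}^{n}E(\mathbf{X}_j-\boldsymbol{\mu})(\mathbf{X}_j-\boldsymbol{\mu})'=\frac{1}{n^2}\,n\boldsymbol{\Sigma}=\frac{1}{n}\boldsymbol{\Sigma}
$$

$$
E(\mathbf{S}_n)=E\!\left(\frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})'\right)
$$

Since $\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})=\mathbf{0}$,

$$
\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})'
=\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})\mathbf{X}_j'+\left(\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})\right)(-\bar{\mathbf{X}}')
=\sum_{j=1}^{n}\mathbf{X}_j\mathbf{X}_j'-n\bar{\mathbf{X}}\bar{\mathbf{X}}'
$$

$$
E(\mathbf{X}_j\mathbf{X}_j')=E\big[(\mathbf{X}_j-\boldsymbol{\mu}+\boldsymbol{\mu})(\mathbf{X}_j-\boldsymbol{\mu}+\boldsymbol{\mu})'\big]=\boldsymbol{\Sigma}+\boldsymbol{\mu}\boldsymbol{\mu}'
$$

Proof of Result 3.1 (continued)

$$
E(\bar{\mathbf{X}}\bar{\mathbf{X}}')=\operatorname{Cov}(\bar{\mathbf{X}})+\boldsymbol{\mu}\boldsymbol{\mu}'=\frac{1}{n}\boldsymbol{\Sigma}+\boldsymbol{\mu}\boldsymbol{\mu}'
$$

$$
E(\mathbf{S}_n)=\frac{1}{n}E\!\left(\sum_{j=1}^{n}\mathbf{X}_j\mathbf{X}_j'-n\bar{\mathbf{X}}\bar{\mathbf{X}}'\right)
=\frac{1}{n}\left[n(\boldsymbol{\Sigma}+\boldsymbol{\mu}\boldsymbol{\mu}')-n\left(\frac{1}{n}\boldsymbol{\Sigma}+\boldsymbol{\mu}\boldsymbol{\mu}'\right)\right]
=\frac{n-1}{n}\boldsymbol{\Sigma}
$$

Some Other Estimators

The $(i,k)$th entry of $\frac{n}{n-1}\mathbf{S}_n$ is $s_{ik}=\frac{1}{n-1}\sum_{j=1}^{n}(X_{ji}-\bar{X}_i)(X_{jk}-\bar{X}_k)$, and its expectation is

$$
E(s_{ik})=\sigma_{ik}
$$

However,

$$
E(\sqrt{s_{ii}})\neq\sqrt{\sigma_{ii}},\qquad E(r_{ik})\neq\rho_{ik}
$$

The biases $E(\sqrt{s_{ii}})-\sqrt{\sigma_{ii}}$ and $E(r_{ik})-\rho_{ik}$ can usually be ignored if the sample size $n$ is moderately large. From here on, $\mathbf{S}=\frac{n}{n-1}\mathbf{S}_n$ (divisor $n-1$).

Generalized Sample Variance

Generalized sample variance $=|\mathbf{S}|$

Example 3.7: Employees and profits per employee for the 16 largest publishing firms in the US

$$
\mathbf{S}=\begin{bmatrix}252.04 & -68.43\\ -68.43 & 123.67\end{bmatrix},\qquad |\mathbf{S}|=26{,}487
$$

Geometric Interpretation for Bivariate Case

The area generated by the two deviation vectors $\mathbf{d}_1=\mathbf{y}_1-\bar{x}_1\mathbf{1}$ and $\mathbf{d}_2=\mathbf{y}_2-\bar{x}_2\mathbf{1}$ is

$$
\text{area}=L_{\mathbf{d}_1}L_{\mathbf{d}_2}\sin\theta=L_{\mathbf{d}_1}L_{\mathbf{d}_2}\sqrt{1-\cos^2\theta}
$$

$$
L_{\mathbf{d}_1}=\sqrt{\sum_{j=1}^{n}(x_{j1}-\bar{x}_1)^2}=\sqrt{(n-1)s_{11}},\qquad
L_{\mathbf{d}_2}=\sqrt{\sum_{j=1}^{n}(x_{j2}-\bar{x}_2)^2}=\sqrt{(n-1)s_{22}},\qquad
\cos\theta=r_{12}
$$

$$
\text{area}=(n-1)\sqrt{s_{11}s_{22}(1-r_{12}^2)}
$$

$$
|\mathbf{S}|=\begin{vmatrix}s_{11} & \sqrt{s_{11}}\sqrt{s_{22}}\,r_{12}\\ \sqrt{s_{11}}\sqrt{s_{22}}\,r_{12} & s_{22}\end{vmatrix}
=s_{11}s_{22}(1-r_{12}^2)=\frac{(\text{area})^2}{(n-1)^2}
$$

Generalized Sample Variance for Multivariate Cases

$$
|\mathbf{S}|=(n-1)^{-p}(\text{volume})^2
$$

where the volume is the one generated by the $p$ deviation vectors $\mathbf{d}_1,\ldots,\mathbf{d}_p$.

Interpretation in p-space Scatter Plot

The points within a constant distance $c$ from the sample mean satisfy

$$
(\mathbf{x}-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}})\le c^2
$$

$$
\text{Volume of }\{\mathbf{x}:(\mathbf{x}-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}})\le c^2\}=k_p|\mathbf{S}|^{1/2}c^p
$$

where $k_p=2\pi^{p/2}/\big(p\,\Gamma(p/2)\big)$ is the volume of the unit ball in $p$ dimensions. A large volume corresponds to a large generalized variance.

Example 3.8: Scatter Plots

[Figure: scatter plots for the three cases of Example 3.8.]

Example 3.8: Sample Mean and Variance-Covariance Matrices

$$
\mathbf{S}=\begin{bmatrix}5 & 4\\ 4 & 5\end{bmatrix},\ r=0.8;\qquad
\mathbf{S}=\begin{bmatrix}3 & 0\\ 0 & 3\end{bmatrix},\ r=0;\qquad
\mathbf{S}=\begin{bmatrix}5 & -4\\ -4 & 5\end{bmatrix},\ r=-0.8
$$

$\bar{\mathbf{x}}'=[2,1]$ and $|\mathbf{S}|=9$ for all three cases.

Example 3.8: Eigenvalues and Eigenvectors

$\begin{bmatrix}5 & 4\\ 4 & 5\end{bmatrix}$: $\lambda_1=9,\ \lambda_2=1$; $\mathbf{e}_1'=[1/\sqrt{2},\,1/\sqrt{2}]$, $\mathbf{e}_2'=[1/\sqrt{2},\,-1/\sqrt{2}]$
$\begin{bmatrix}3 & 0\\ 0 & 3\end{bmatrix}$: $\lambda_1=3,\ \lambda_2=3$; $\mathbf{e}_1'=[1,0]$, $\mathbf{e}_2'=[0,1]$
$\begin{bmatrix}5 & -4\\ -4 & 5\end{bmatrix}$: $\lambda_1=9,\ \lambda_2=1$; $\mathbf{e}_1'=[1/\sqrt{2},\,-1/\sqrt{2}]$, $\mathbf{e}_2'=[1/\sqrt{2},\,1/\sqrt{2}]$

Example 3.8: Mean-Centered Ellipse

$$
(\mathbf{x}-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}})\le c^2
$$

If $\mathbf{S}$ has eigenvalues $\lambda_1,\lambda_2$ with eigenvectors $\mathbf{e}_1,\mathbf{e}_2$, then $\mathbf{S}^{-1}$ has eigenvalues $1/\lambda_1,1/\lambda_2$ with the same eigenvectors ($\mathbf{S}\mathbf{e}=\lambda\mathbf{e}\Rightarrow\mathbf{e}=\lambda\mathbf{S}^{-1}\mathbf{e}$). With

$$
\begin{bmatrix}y_1\\ y_2\end{bmatrix}=\begin{bmatrix}\mathbf{e}_1'\\ \mathbf{e}_2'\end{bmatrix}\begin{bmatrix}x_1-\bar{x}_1\\ x_2-\bar{x}_2\end{bmatrix},
\qquad
(\mathbf{x}-\bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}})=\frac{y_1^2}{\lambda_1}+\frac{y_2^2}{\lambda_2}
$$

so the boundary is an ellipse with axes along $\mathbf{e}_1,\mathbf{e}_2$ and half-lengths $c\sqrt{\lambda_1},c\sqrt{\lambda_2}$. Choose $c^2=5.99$ (the upper 5% point of the $\chi^2$ distribution with 2 degrees of freedom) to cover approximately 95% of the observations.

Example 3.8: Semi-major and Semi-minor Axes

$\mathbf{S}=\begin{bmatrix}5 & 4\\ 4 & 5\end{bmatrix}$: $a=3\sqrt{5.99}$, $b=\sqrt{5.99}$
$\mathbf{S}=\begin{bmatrix}3 & 0\\ 0 & 3\end{bmatrix}$: $a=\sqrt{3}\sqrt{5.99}$, $b=\sqrt{3}\sqrt{5.99}$
$\mathbf{S}=\begin{bmatrix}5 & -4\\ -4 & 5\end{bmatrix}$: $a=3\sqrt{5.99}$, $b=\sqrt{5.99}$

Example 3.8: Scatter Plots with Major Axes

[Figure: the three scatter plots of Example 3.8 with the ellipse axes drawn.]

Result 3.2

The generalized variance is zero when, and only when, the columns of the following matrix are linearly dependent:

$$
\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}'=
\begin{bmatrix}\mathbf{x}_1'-\bar{\mathbf{x}}'\\ \mathbf{x}_2'-\bar{\mathbf{x}}'\\ \vdots\\ \mathbf{x}_n'-\bar{\mathbf{x}}'\end{bmatrix}
=\begin{bmatrix}
x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1p}-\bar{x}_p\\
x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2p}-\bar{x}_p\\
\vdots & \vdots & & \vdots\\
x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{np}-\bar{x}_p
\end{bmatrix}
$$

Proof of Result 3.2

If the columns are linearly dependent, then for some $\mathbf{a}\neq\mathbf{0}$

$$
\mathbf{0}=a_1\operatorname{col}_1(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')+\cdots+a_p\operatorname{col}_p(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')=(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')\mathbf{a}
$$

Moreover,

$$
(n-1)\mathbf{S}=(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')
=[\mathbf{x}_1-\bar{\mathbf{x}},\,\mathbf{x}_2-\bar{\mathbf{x}},\,\ldots,\,\mathbf{x}_n-\bar{\mathbf{x}}]
\begin{bmatrix}(\mathbf{x}_1-\bar{\mathbf{x}})'\\ (\mathbf{x}_2-\bar{\mathbf{x}})'\\ \vdots\\ (\mathbf{x}_n-\bar{\mathbf{x}})'\end{bmatrix}
=\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'
$$

Proof of Result 3.2 (continued)

$$
(n-1)\mathbf{S}\mathbf{a}=(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')\mathbf{a}=\mathbf{0}
\;\Rightarrow\; a_1\operatorname{col}_1(\mathbf{S})+\cdots+a_p\operatorname{col}_p(\mathbf{S})=\mathbf{0}
\;\Rightarrow\; |\mathbf{S}|=0
$$

Conversely, if $|\mathbf{S}|=0$, there exists $\mathbf{a}\neq\mathbf{0}$ such that $\mathbf{S}\mathbf{a}=\mathbf{0}$. Then

$$
0=(n-1)\mathbf{a}'\mathbf{S}\mathbf{a}=\mathbf{a}'(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')\mathbf{a}
=L^2_{(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')\mathbf{a}}
\;\Rightarrow\;(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')\mathbf{a}=\mathbf{0}
$$

so the columns of $\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}'$ are linearly dependent.

Example 3.9

$$
\mathbf{X}=\begin{bmatrix}1 & 2 & 5\\ 4 & 1 & 6\\ 4 & 0 & 4\end{bmatrix},\qquad
\bar{\mathbf{x}}'=[3,1,5],\qquad
\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}'=\begin{bmatrix}-2 & 1 & 0\\ 1 & 0 & 1\\ 1 & -1 & -1\end{bmatrix}
$$

$$
\mathbf{d}_1'=[-2,1,1],\qquad \mathbf{d}_2'=[1,0,-1],\qquad \mathbf{d}_3'=[0,1,-1]
$$

$$
\mathbf{d}_3=\mathbf{d}_1+2\mathbf{d}_2\;\Rightarrow\;|\mathbf{S}|=0
$$

Check:

$$
\mathbf{S}=\begin{bmatrix}3 & -3/2 & 0\\ -3/2 & 1 & 1/2\\ 0 & 1/2 & 1\end{bmatrix},\qquad
|\mathbf{S}|=3\left(1\cdot1-\tfrac{1}{4}\right)-\left(-\tfrac{3}{2}\right)\left(-\tfrac{3}{2}\cdot1-\tfrac{1}{2}\cdot0\right)+0=\tfrac{9}{4}-\tfrac{9}{4}=0
$$

Example 3.9 (continued)

[Figure: deviation vectors for Example 3.9.]

Examples That Cause Zero Generalized Variance

Example 1 – Data are test scores
– Included variables that are sums of others
– e.g., algebra score and geometry score were combined to give a total math score
– e.g., class midterm and final exam scores were summed to give total points
Example 2 – Total weight of chemicals was included along with the weight of each component

Example 3.10

$$
\mathbf{X}=\begin{bmatrix}1 & 9 & 10\\ 4 & 12 & 16\\ 2 & 10 & 12\\ 5 & 8 & 13\\ 3 & 11 & 14\end{bmatrix},\qquad
\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}'=\begin{bmatrix}-2 & -1 & -3\\ 1 & 2 & 3\\ -1 & 0 & -1\\ 2 & -2 & 0\\ 0 & 1 & 1\end{bmatrix},\qquad
\mathbf{S}=\begin{bmatrix}2.5 & 0 & 2.5\\ 0 & 2.5 & 2.5\\ 2.5 & 2.5 & 5.0\end{bmatrix}
$$

$|\mathbf{S}|=0$, so $\mathbf{S}\mathbf{a}=\mathbf{0}$ has a nonzero solution: the eigenvector corresponding to the zero eigenvalue of $\mathbf{S}$, $\mathbf{a}'=[1,1,-1]$. Hence

$$
1(x_{j1}-\bar{x}_1)+1(x_{j2}-\bar{x}_2)+(-1)(x_{j3}-\bar{x}_3)=0\quad\text{for all }j
$$

that is, $x_{j3}=x_{j1}+x_{j2}$ in every row.

Result 3.3

If the sample size is less than or equal to the number of variables ($n\le p$), then $|\mathbf{S}|=0$ for all samples.

Proof of Result 3.3

The $n$ row vectors of $\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}'$ sum to the zero vector, because

$$
\sum_{j=1}^{n}(x_{jk}-\bar{x}_k)=\sum_{j=1}^{n}x_{jk}-n\bar{x}_k=0\qquad(k=1,\ldots,p)
$$

Thus the rank of $\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}'$ is at most $n-1$, and therefore at most $p-1$ because $n\le p$. Since $(n-1)\mathbf{S}=(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')$,

$$
(n-1)\operatorname{col}_k(\mathbf{S})=(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'\operatorname{col}_k(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')
=(x_{1k}-\bar{x}_k)\operatorname{row}_1(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'+\cdots+(x_{nk}-\bar{x}_k)\operatorname{row}_n(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'
$$

Proof of Result 3.3 (continued)

Because the rows sum to zero, $\operatorname{row}_1(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')'$ is a linear combination of the remaining transposed row vectors.
Hence each $\operatorname{col}_k(\mathbf{S})$ is a linear combination of at most $n-1$ linearly independent transposed row vectors.
The rank of $\mathbf{S}$ is thus at most $n-1$, i.e., at most $p-1$. Since $\mathbf{S}$ is a $p\times p$ matrix, $|\mathbf{S}|=0$.

Result 3.4

Let the $p\times1$ vectors $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$, where $\mathbf{x}_j'$ is the $j$th row of the data matrix $\mathbf{X}$, be realizations of the independent random vectors $\mathbf{X}_1,\mathbf{X}_2,\ldots,\mathbf{X}_n$.
(1) If the linear combination $\mathbf{a}'\mathbf{X}_j$ has positive variance for each nonzero constant vector $\mathbf{a}$, then, provided that $p<n$, $\mathbf{S}$ has full rank with probability 1 and $|\mathbf{S}|>0$.
(2) If, with probability 1, $\mathbf{a}'\mathbf{X}_j$ is a constant $c$ for all $j$, then $|\mathbf{S}|=0$.

Proof of Part 2 of Result 3.4

$$
\mathbf{a}'\mathbf{X}_j=a_1X_{j1}+a_2X_{j2}+\cdots+a_pX_{jp}=c\ \text{with probability 1}
\;\Rightarrow\;\mathbf{a}'\mathbf{x}_j=c\ \text{for all }j
$$

The sample mean of this linear combination is

$$
\frac{1}{n}\sum_{j=1}^{n}(a_1x_{j1}+a_2x_{j2}+\cdots+a_px_{jp})=\mathbf{a}'\bar{\mathbf{x}}=c
$$

$$
(\mathbf{X}-\mathbf{1}\bar{\mathbf{x}}')\mathbf{a}
=a_1\begin{bmatrix}x_{11}-\bar{x}_1\\ \vdots\\ x_{n1}-\bar{x}_1\end{bmatrix}+\cdots+a_p\begin{bmatrix}x_{1p}-\bar{x}_p\\ \vdots\\ x_{np}-\bar{x}_p\end{bmatrix}
=\begin{bmatrix}\mathbf{a}'\mathbf{x}_1-\mathbf{a}'\bar{\mathbf{x}}\\ \vdots\\ \mathbf{a}'\mathbf{x}_n-\mathbf{a}'\bar{\mathbf{x}}\end{bmatrix}
=\begin{bmatrix}c-c\\ \vdots\\ c-c\end{bmatrix}=\mathbf{0}
\;\Rightarrow\;|\mathbf{S}|=0
$$

by Result 3.2.

Generalized Sample Variance of Standardized Variables

Generalized sample variance of the standardized variables $=|\mathbf{R}|$. The deviation vector of the $i$th standardized variable is

$$
\frac{\mathbf{y}_i-\bar{x}_i\mathbf{1}}{\sqrt{s_{ii}}}=
\begin{bmatrix}(x_{1i}-\bar{x}_i)/\sqrt{s_{ii}}\\ (x_{2i}-\bar{x}_i)/\sqrt{s_{ii}}\\ \vdots\\ (x_{ni}-\bar{x}_i)/\sqrt{s_{ii}}\end{bmatrix}
$$

$$
|\mathbf{R}|=(n-1)^{-p}(\text{volume})^2,\qquad |\mathbf{S}|=(s_{11}s_{22}\cdots s_{pp})\,|\mathbf{R}|
$$

$|\mathbf{R}|$ is large when all $r_{ik}$ are nearly zero, and small when one or more $r_{ik}$ are nearly $+1$ or $-1$.

Volume Generated by Deviation Vectors of Standardized Variables

[Figure: volume generated by the deviation vectors of standardized variables.]

Example 3.11

$$
\mathbf{S}=\begin{bmatrix}4 & 3 & 1\\ 3 & 9 & 2\\ 1 & 2 & 1\end{bmatrix},\qquad s_{11}=4,\quad s_{22}=9,\quad s_{33}=1,\qquad |\mathbf{S}|=14
$$

$$
\mathbf{R}=\begin{bmatrix}
1 & \frac{3}{\sqrt{4}\sqrt{9}} & \frac{1}{\sqrt{4}\sqrt{1}}\\
\frac{3}{\sqrt{4}\sqrt{9}} & 1 & \frac{2}{\sqrt{9}\sqrt{1}}\\
\frac{1}{\sqrt{4}\sqrt{1}} & \frac{2}{\sqrt{9}\sqrt{1}} & 1
\end{bmatrix}
=\begin{bmatrix}1 & 1/2 & 1/2\\ 1/2 & 1 & 2/3\\ 1/2 & 2/3 & 1\end{bmatrix},\qquad |\mathbf{R}|=\frac{7}{18}
$$

$$
|\mathbf{S}|=s_{11}s_{22}s_{33}|\mathbf{R}|=(4)(9)(1)\left(\tfrac{7}{18}\right)=14
$$

Total Sample Variance

Total sample variance $=s_{11}+s_{22}+\cdots+s_{pp}$

It pays no attention to the orientation of the residual (deviation) vectors.

Example 3.7:

$$
\mathbf{S}=\begin{bmatrix}252.04 & -68.43\\ -68.43 & 123.67\end{bmatrix},\qquad \text{total sample variance}=252.04+123.67=375.71
$$

Example 3.9:

$$
\mathbf{S}=\begin{bmatrix}3 & -3/2 & 0\\ -3/2 & 1 & 1/2\\ 0 & 1/2 & 1\end{bmatrix},\qquad \text{total sample variance}=3+1+1=5
$$

Sample Mean as Matrix Operation

$$
\bar{\mathbf{x}}=\begin{bmatrix}\bar{x}_1\\ \bar{x}_2\\ \vdots\\ \bar{x}_p\end{bmatrix}
=\begin{bmatrix}\mathbf{y}_1'\mathbf{1}/n\\ \mathbf{y}_2'\mathbf{1}/n\\ \vdots\\ \mathbf{y}_p'\mathbf{1}/n\end{bmatrix}
=\frac{1}{n}\begin{bmatrix}
x_{11} & x_{21} & \cdots & x_{n1}\\
x_{12} & x_{22} & \cdots & x_{n2}\\
\vdots & \vdots & & \vdots\\
x_{1p} & x_{2p} & \cdots & x_{np}
\end{bmatrix}\begin{bmatrix}1\\ 1\\ \vdots\\ 1\end{bmatrix}
=\frac{1}{n}\mathbf{X}'\mathbf{1}
$$

Covariance as Matrix Operation

$$
\mathbf{1}\bar{\mathbf{x}}'=\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}=
\begin{bmatrix}
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p\\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p\\
\vdots & \vdots & & \vdots\\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_p
\end{bmatrix}
$$

$$
\mathbf{X}-\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}=
\begin{bmatrix}
x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1p}-\bar{x}_p\\
x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2p}-\bar{x}_p\\
\vdots & \vdots & & \vdots\\
x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{np}-\bar{x}_p
\end{bmatrix}
$$

Covariance as Matrix Operation (continued)

$$
(n-1)\mathbf{S}=\left(\mathbf{X}-\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)'\left(\mathbf{X}-\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)
=\begin{bmatrix}
x_{11}-\bar{x}_1 & x_{21}-\bar{x}_1 & \cdots & x_{n1}-\bar{x}_1\\
x_{12}-\bar{x}_2 & x_{22}-\bar{x}_2 & \cdots & x_{n2}-\bar{x}_2\\
\vdots & \vdots & & \vdots\\
x_{1p}-\bar{x}_p & x_{2p}-\bar{x}_p & \cdots & x_{np}-\bar{x}_p
\end{bmatrix}
\begin{bmatrix}
x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1p}-\bar{x}_p\\
x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2p}-\bar{x}_p\\
\vdots & \vdots & & \vdots\\
x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{np}-\bar{x}_p
\end{bmatrix}
$$

Covariance as Matrix Operation (continued)

$$
\left(\mathbf{X}-\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)'\left(\mathbf{X}-\frac{1}{n}\mathbf{1}\mathbf{1}'\mathbf{X}\right)
=\mathbf{X}'\left(\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'\right)'\left(\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}
$$

$$
\left(\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'\right)'\left(\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'\right)
=\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'-\frac{1}{n}\mathbf{1}\mathbf{1}'+\frac{1}{n^2}\mathbf{1}\mathbf{1}'\mathbf{1}\mathbf{1}'
=\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'\qquad(\text{since }\mathbf{1}'\mathbf{1}=n)
$$

$$
\mathbf{S}=\frac{1}{n-1}\mathbf{X}'\left(\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}
$$

Sample Standard Deviation Matrix

$$
\mathbf{D}^{1/2}=\begin{bmatrix}
\sqrt{s_{11}} & 0 & \cdots & 0\\
0 & \sqrt{s_{22}} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sqrt{s_{pp}}
\end{bmatrix},\qquad
\mathbf{D}^{-1/2}=\begin{bmatrix}
1/\sqrt{s_{11}} & 0 & \cdots & 0\\
0 & 1/\sqrt{s_{22}} & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1/\sqrt{s_{pp}}
\end{bmatrix}
$$

$$
\mathbf{R}=\begin{bmatrix}
\frac{s_{11}}{\sqrt{s_{11}}\sqrt{s_{11}}} & \frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} & \cdots & \frac{s_{1p}}{\sqrt{s_{11}}\sqrt{s_{pp}}}\\
\frac{s_{21}}{\sqrt{s_{22}}\sqrt{s_{11}}} & \frac{s_{22}}{\sqrt{s_{22}}\sqrt{s_{22}}} & \cdots & \frac{s_{2p}}{\sqrt{s_{22}}\sqrt{s_{pp}}}\\
\vdots & \vdots & & \vdots\\
\frac{s_{p1}}{\sqrt{s_{pp}}\sqrt{s_{11}}} & \frac{s_{p2}}{\sqrt{s_{pp}}\sqrt{s_{22}}} & \cdots & \frac{s_{pp}}{\sqrt{s_{pp}}\sqrt{s_{pp}}}
\end{bmatrix}
=\mathbf{D}^{-1/2}\mathbf{S}\mathbf{D}^{-1/2},\qquad
\mathbf{S}=\mathbf{D}^{1/2}\mathbf{R}\mathbf{D}^{1/2}
$$

Result 3.5

For the linear combinations of the variables $X_1,X_2,\ldots,X_p$

$$
\mathbf{b}'\mathbf{X}=b_1X_1+b_2X_2+\cdots+b_pX_p,\qquad
\mathbf{c}'\mathbf{X}=c_1X_1+c_2X_2+\cdots+c_pX_p
$$

Sample mean of $\mathbf{b}'\mathbf{X}$ $=\mathbf{b}'\bar{\mathbf{x}}$
Sample variance of $\mathbf{b}'\mathbf{X}$ $=\mathbf{b}'\mathbf{S}\mathbf{b}$
Sample covariance of $\mathbf{b}'\mathbf{X}$ and $\mathbf{c}'\mathbf{X}$ $=\mathbf{b}'\mathbf{S}\mathbf{c}$

Proof of Result 3.5

With $\mathbf{b}'\mathbf{x}_j=b_1x_{j1}+b_2x_{j2}+\cdots+b_px_{jp}$,

$$
\text{Sample mean}=\frac{\mathbf{b}'\mathbf{x}_1+\mathbf{b}'\mathbf{x}_2+\cdots+\mathbf{b}'\mathbf{x}_n}{n}=\mathbf{b}'\bar{\mathbf{x}}
$$

$$
\text{Sample variance}=\frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{b}'\mathbf{x}_j-\mathbf{b}'\bar{\mathbf{x}})^2
=\frac{1}{n-1}\sum_{j=1}^{n}\mathbf{b}'(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\mathbf{b}
=\mathbf{b}'\left[\frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\right]\mathbf{b}
=\mathbf{b}'\mathbf{S}\mathbf{b}
$$

Proof of Result 3.5 (continued)

$$
\text{Sample covariance}=\frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{b}'\mathbf{x}_j-\mathbf{b}'\bar{\mathbf{x}})(\mathbf{c}'\mathbf{x}_j-\mathbf{c}'\bar{\mathbf{x}})
=\frac{1}{n-1}\sum_{j=1}^{n}\mathbf{b}'(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'\mathbf{c}
=\mathbf{b}'\mathbf{S}\mathbf{c}
$$

Result 3.6

$$
\mathbf{A}\mathbf{X}=\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1p}\\
a_{21} & a_{22} & \cdots & a_{2p}\\
\vdots & \vdots & & \vdots\\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{bmatrix}
\begin{bmatrix}X_1\\ X_2\\ \vdots\\ X_p\end{bmatrix}
=\begin{bmatrix}
a_{11}X_1+a_{12}X_2+\cdots+a_{1p}X_p\\
a_{21}X_1+a_{22}X_2+\cdots+a_{2p}X_p\\
\vdots\\
a_{q1}X_1+a_{q2}X_2+\cdots+a_{qp}X_p
\end{bmatrix}
$$

Sample mean of $\mathbf{A}\mathbf{X}$ $=\mathbf{A}\bar{\mathbf{x}}$
Sample covariance matrix of $\mathbf{A}\mathbf{X}$ $=\mathbf{A}\mathbf{S}\mathbf{A}'$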
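Several of the formulas and worked examples above can be checked numerically. The following is a minimal NumPy sketch, assuming NumPy is installed. It covers the deviation-vector geometry of Examples 3.1 to 3.4, a simulation that illustrates (but does not prove) Result 3.1 for normal data, the generalized and total variances of Examples 3.7 to 3.11, Result 3.3, and the matrix forms of $\mathbf{S}$, $\mathbf{R}$, and Results 3.5 and 3.6. The helper names `sample_cov` and `corr_from_cov`, the simulation settings (`mu`, `Sigma`, `n_mc`, `reps`, the seed), and the coefficients `b` and `A` are hypothetical choices for illustration, not part of the slides.

```python
import numpy as np

# Examples 3.1-3.4: deviation vectors and S_n (divisor n)
X = np.array([[4.0, 1.0], [-1.0, 3.0], [3.0, 5.0]])
n = X.shape[0]
xbar = X.mean(axis=0)                       # [x̄1, x̄2]
D = X - xbar                                # column i is d_i = y_i - x̄_i 1
Sn = D.T @ D / n                            # S_n: entries d_i'd_k / n
d1, d2 = D[:, 0], D[:, 1]
cos12 = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))  # cos(theta_12) = r_12

# Result 3.1 by simulation (hypothetical mu, Sigma, sample size and replicate count)
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
n_mc, reps = 5, 20000
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n_mc))   # shape (reps, n_mc, 2)
centered = samples - samples.mean(axis=1, keepdims=True)
Sn_all = np.einsum("rji,rjk->rik", centered, centered) / n_mc     # one S_n per replicate
mean_Sn = Sn_all.mean(axis=0)               # close to (n-1)/n * Sigma (biased)
mean_S = mean_Sn * n_mc / (n_mc - 1)        # close to Sigma (unbiased)


def sample_cov(X):
    """S = X'(I - 11'/n)X / (n - 1), the centering-matrix form of the sample covariance."""
    n = X.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n     # I - (1/n) 1 1'
    return X.T @ C @ X / (n - 1)


def corr_from_cov(S):
    """R = D^{-1/2} S D^{-1/2}, where D^{1/2} holds the sample standard deviations."""
    D_inv_half = np.diag(1.0 / np.sqrt(np.diag(S)))
    return D_inv_half @ S @ D_inv_half


# Example 3.7: generalized variance |S| and total sample variance tr(S)
S37 = np.array([[252.04, -68.43], [-68.43, 123.67]])
gen_var_37, total_var_37 = np.linalg.det(S37), np.trace(S37)

# Example 3.8: semi-axes c*sqrt(lambda_i) with c^2 = 5.99, listed as [a, b]
S38 = [np.array([[5.0, 4.0], [4.0, 5.0]]), 3.0 * np.eye(2), np.array([[5.0, -4.0], [-4.0, 5.0]])]
axes_38 = [np.sqrt(5.99 * np.linalg.eigvalsh(S))[::-1] for S in S38]  # eigvalsh is ascending

# Example 3.9: d3 = d1 + 2 d2, so |S| = 0
S39 = sample_cov(np.array([[1.0, 2, 5], [4, 1, 6], [4, 0, 4]]))

# Example 3.10: x3 = x1 + x2, so a = [1, 1, -1] satisfies S a = 0
X310 = np.array([[1.0, 9, 10], [4, 12, 16], [2, 10, 12], [5, 8, 13], [3, 11, 14]])
S310 = sample_cov(X310)
a = np.array([1.0, 1.0, -1.0])

# Example 3.11: |S| = s11 s22 s33 |R|
S311 = np.array([[4.0, 3, 1], [3, 9, 2], [1, 2, 1]])
R311 = corr_from_cov(S311)

# Result 3.3: with n = 3 <= p = 4, S has rank at most n - 1
rank_small = np.linalg.matrix_rank(sample_cov(rng.normal(size=(3, 4))))

# Results 3.5 and 3.6 on the Example 3.10 data (b and A are hypothetical coefficients)
b = np.array([1.0, -2.0, 0.5])
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, -1.0]])
var_direct, var_formula = np.var(X310 @ b, ddof=1), b @ S310 @ b
cov_direct, cov_formula = np.cov(X310 @ A.T, rowvar=False), A @ S310 @ A.T

print(xbar, d1, d2, round(cos12, 3))
print(gen_var_37, total_var_37, np.linalg.det(S39))
print(S310 @ a, np.linalg.det(S311), np.linalg.det(R311))
```

The centering-matrix form in `sample_cov` is written out only to mirror the slides; for real data, `np.cov(X, rowvar=False)` gives the same $\mathbf{S}$ without building the $n\times n$ matrix $\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}'$.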