Object Orie’d Data Analysis, Last Time
• Gene Cell Cycle Data
• Microarrays and HDLSS visualization
• DWD bias adjustment
• NCI 60 Data

Today: More NCI 60 Data & Detailed (math’cal) look at PCA

Last Time: Checked Data Combo, using DWD Dir’ns

DWD Views of NCI 60 Data
Interesting Question: Which clusters are really there?
Issues:
• DWD great at finding dir’ns of separation
• And will do so even if no real structure
• Is this happening here?
• Or: which clusters are important?
• What does “important” mean?

Real Clusters in NCI 60 Data
Simple Visual Approach:
• Randomly relabel data (Cancer Types)
• Recompute DWD dir’ns & visualization
• Get heuristic impression from this
Deeper Approach:
• Formal Hypothesis Testing (Done later)

Random Relabelling #1
Random Relabelling #2
Random Relabelling #3
Random Relabelling #4

Revisit Real Data
Revisit Real Data (Cont.)

Heuristic Results:
• Strong Clust’s: Melanoma, Leukemia, Renal
• Weak Clust’s: CNS, Ovarian, Colon
• Not Clust’s: NSCLC, Breast
Later: will find a way to quantify these ideas, i.e. develop statistical significance

NCI 60 Controversy
• Can NCI 60 Data be normalized?
• Negative Indication:
  Kuo et al. (2002) Bioinformatics, 18, 405-412.
  – Based on Gene by Gene Correlations
• Resolution: Gene by Gene Data View vs. Multivariate Data View

Resolution of Paradox: Toy Data, Gene View
Resolution: Correlations suggest “no chance”
Resolution: Toy Data, PCA View
Resolution: PCA & DWD direct’ns
Resolution: DWD Adjusted
Resolution: DWD Adjusted, PCA view
Resolution: DWD Adjusted, Gene view
Resolution: Correlations & PC1 Projection Correl’n

Needed: final verification of Cross-platform Normal’n
• Is statistical power actually improved?
• Will study later

DWD: Why does it work?
Rob Tibshirani Query:
• Really need that complicated stuff? (DWD is complex)
• Can’t we just use means?
• Empirical Fact (Joel Parker): DWD better than simple methods

DWD: Why does it work?
Xuxin Liu Observation:
• Key is unbalanced sub-sample sizes (e.g. biological subtypes)
• Mean methods strongly affected
• DWD much more robust
• Toy Example

DWD: Why does it work?
Xuxin Liu Example
• Goals:
  – Bring colors together
  – Keep symbols distinct (interesting biology)
• Study varying sub-sample proportions:
  – Ratio = 1: Both methods great
  – Ratio = 0.61: Mean degrades, DWD good
  – Ratio = 0.35: Mean poor, DWD still OK
  – Ratio = 0.11: DWD degraded, still better
• Later: will find underlying theory

PCA: Rediscovery – Renaming
Statistics: Principal Component Analysis (PCA)
Social Sciences: Factor Analysis (PCA is a subset)
Probability / Electrical Eng.: Karhunen–Loève expansion
Applied Mathematics: Proper Orthogonal Decomposition (POD)
Geo-Sciences: Empirical Orthogonal Functions (EOF)

An Interesting Historical Note
The 1st (?) application of PCA to Functional Data Analysis:
Rao, C. R. (1958) Some statistical methods for comparison of growth curves, Biometrics, 14, 1-17.
1st paper with “Curves as Data” viewpoint

Detailed Look at PCA
Three important (and interesting) viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review linear alg. and multivar. prob.

Review of Linear Algebra
Vector Space:
• a set of “vectors”, $x$,
• and “scalars” (coefficients), $a$,
• “closed” under “linear combination” ($\sum_i a_i x_i$ is in the space)
• e.g. $\mathbb{R}^d = \{ x = (x_1, \dots, x_d)^t : x_i \in \mathbb{R} \}$, “$d$ dim Euclid’n space”

Review of Linear Algebra (Cont.)
Subspace:
• subset that is again a vector space
• i.e. closed under linear combination
• e.g. lines through the origin
• e.g. planes through the origin
• e.g. subspace “generated by” a set of vectors (all linear combos of them = containing hyperplane through origin)
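As a small numerical aside on the subspace idea just reviewed (an illustration I am adding, not part of the slides): a minimal numpy sketch, with hypothetical generating vectors, showing that a linear combination of the generators stays inside the subspace they “generate” (checked via the rank of the stacked matrix).

```python
import numpy as np

# Minimal sketch (assumed example, not from the slides): the subspace of R^5
# "generated by" two vectors is the set of all their linear combinations.
rng = np.random.default_rng(0)
v1 = rng.normal(size=5)        # hypothetical generating vectors in R^5
v2 = rng.normal(size=5)

a1, a2 = 2.0, -3.5             # arbitrary scalar coefficients
x = a1 * v1 + a2 * v2          # a linear combination, so x lies in span{v1, v2}

# Adding x to the generating set adds no new direction:
rank_basis = np.linalg.matrix_rank(np.column_stack([v1, v2]))
rank_with_x = np.linalg.matrix_rank(np.column_stack([v1, v2, x]))
print(rank_basis, rank_with_x)  # both 2, i.e. closure under linear combination
```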
Review of Linear Algebra (Cont.)
Basis of subspace: set of vectors that:
• span, i.e. everything is a lin. combo of them
• are linearly indep’t, i.e. the lin. combo is unique
• e.g. the “unit vector basis” of $\mathbb{R}^d$: $\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \dots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$
• since $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + x_d \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$

Review of Linear Algebra (Cont.)
Basis Matrix, of subspace of $\mathbb{R}^d$:
Given a basis $v_1, \dots, v_n$, create the matrix of columns:
$B = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & & \vdots \\ v_{d1} & \cdots & v_{dn} \end{pmatrix}$  ($d \times n$)

Review of Linear Algebra (Cont.)
Then a “linear combo” is a matrix multiplicat’n:
$\sum_{i=1}^n a_i v_i = B a$, where $a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$
Check sizes: $(d \times 1) = (d \times n)(n \times 1)$

Review of Linear Algebra (Cont.)
Aside on matrix multiplication (linear transformat’n):
For matrices
$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & & \vdots \\ a_{k,1} & \cdots & a_{k,m} \end{pmatrix}$,  $B = \begin{pmatrix} b_{1,1} & \cdots & b_{1,n} \\ \vdots & & \vdots \\ b_{m,1} & \cdots & b_{m,n} \end{pmatrix}$,
define the “matrix product”
$AB = \begin{pmatrix} \sum_{i=1}^m a_{1,i} b_{i,1} & \cdots & \sum_{i=1}^m a_{1,i} b_{i,n} \\ \vdots & & \vdots \\ \sum_{i=1}^m a_{k,i} b_{i,1} & \cdots & \sum_{i=1}^m a_{k,i} b_{i,n} \end{pmatrix}$
(“inner products” of rows of $A$ with columns of $B$)
(composition of linear transformations)
Often useful to check sizes: $(k \times n) = (k \times m)(m \times n)$

Review of Linear Algebra (Cont.)
Matrix trace:
• For a square matrix $A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & & \vdots \\ a_{m,1} & \cdots & a_{m,m} \end{pmatrix}$, define $\mathrm{tr}(A) = \sum_{i=1}^m a_{i,i}$
• Trace commutes with matrix multiplication: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$

Review of Linear Algebra (Cont.)
Dimension of subspace (a notion of “size”):
• number of elements in a basis (unique)
• e.g. $\dim(\mathbb{R}^d) = d$ (use basis above)
• e.g. dim of a line is 1
• e.g. dim of a plane is 2
• dimension is “degrees of freedom”

Review of Linear Algebra (Cont.)
Norm of a vector:
• in $\mathbb{R}^d$, $\|x\| = \left( \sum_{j=1}^d x_j^2 \right)^{1/2} = (x^t x)^{1/2}$
• Idea: “length” of the vector
• Note: strange properties for high $d$, e.g. “length of diagonal of unit cube” $= \sqrt{d}$

Review of Linear Algebra (Cont.)
Norm of a vector (cont.):
• “length normalized vector”: $\frac{x}{\|x\|}$ (has length one, thus on surf. of unit sphere & is a direction vector)
• get “distance” as: $d(x, y) = \|x - y\| = \left( (x - y)^t (x - y) \right)^{1/2}$

Review of Linear Algebra (Cont.)
Inner (dot, scalar) product:
• for vectors $x$ and $y$, $\langle x, y \rangle = \sum_{j=1}^d x_j y_j = x^t y$
• related to norm, via $\|x\| = \langle x, x \rangle^{1/2} = (x^t x)^{1/2}$

Review of Linear Algebra (Cont.)
Inner (dot, scalar) product (cont.):
• measures “angle between $x$ and $y$” as:
  $\mathrm{angle}(x, y) = \cos^{-1}\!\left( \frac{\langle x, y \rangle}{\|x\| \, \|y\|} \right) = \cos^{-1}\!\left( \frac{x^t y}{\sqrt{x^t x}\,\sqrt{y^t y}} \right)$
• key to “orthogonality”, i.e. “perpendicul’ty”: $x \perp y$ if and only if $\langle x, y \rangle = 0$

Review of Linear Algebra (Cont.)
Orthonormal basis $v_1, \dots, v_n$:
• All ortho to each other, i.e. $\langle v_i, v_{i'} \rangle = 0$, for $i \neq i'$
• All have length 1, i.e. $\langle v_i, v_i \rangle = 1$, for $i = 1, \dots, n$

Review of Linear Algebra (Cont.)
Orthonormal basis $v_1, \dots, v_n$ (cont.):
• “Spectral Representation”: $x = \sum_{i=1}^n a_i v_i$ where $a_i = \langle x, v_i \rangle$
  (check: $\langle x, v_i \rangle = \left\langle \sum_{i'=1}^n a_{i'} v_{i'}, v_i \right\rangle = \sum_{i'=1}^n a_{i'} \langle v_{i'}, v_i \rangle = a_i$)
• Matrix notation: $x = B a$ where $a^t = x^t B$, i.e. $a = B^t x$
• $a$ is called the “transform (e.g. Fourier, wavelet) of $x$”

Review of Linear Algebra (Cont.)
Parseval identity, for $x$ in subsp. gen’d by o.n. basis $v_1, \dots, v_n$:
$\|x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 = \|a\|^2$
• Pythagorean theorem
• “Decomposition of Energy”
• ANOVA – sums of squares
• Transform, $a$, has same length as $x$, i.e. “rotation in $\mathbb{R}^d$”

Review of Linear Algebra (Cont.)
Gram-Schmidt Ortho-normalization
Idea: Given a basis $v_1, \dots, v_n$, find an orthonormal version, by subtracting off the non-ortho parts:
$u_1 = v_1 / \|v_1\|$
$u_2 = \left( v_2 - \langle v_2, u_1 \rangle u_1 \right) / \left\| v_2 - \langle v_2, u_1 \rangle u_1 \right\|$
$u_3 = \left( v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 \right) / \left\| v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 \right\|$
$\vdots$
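A minimal numpy sketch of the Gram-Schmidt recursion above (my illustration, not part of the slides; the toy basis is a hypothetical example): each $v_i$ has its components along the previously built $u$’s subtracted off, then is normalized, and the result is checked to be orthonormal.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    ortho = []
    for v in vectors:
        w = np.array(v, dtype=float)
        # subtract the non-orthogonal part: components along earlier u's
        for u in ortho:
            w -= np.dot(v, u) * u
        ortho.append(w / np.linalg.norm(w))   # normalize to length 1
    return np.column_stack(ortho)

# toy example: a hypothetical basis of a 3-dim subspace of R^4
rng = np.random.default_rng(1)
V = [rng.normal(size=4) for _ in range(3)]
U = gram_schmidt(V)
print(np.allclose(U.T @ U, np.eye(3)))   # True: columns are orthonormal
```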
Review of Linear Algebra (Cont.)
Projection of a vector $x$ onto a subspace $V$:
• Idea: the member of $V$ that is closest to $x$ (i.e. the best “approx’n”)
• Find $P_V x \in V$ that solves: $\min_{v \in V} \|x - v\|$ (“least squares”)
• For an inner product (Hilbert) space: $P_V x$ exists and is unique

Review of Linear Algebra (Cont.)
Projection of a vector onto a subspace (cont.):
• General solution in $\mathbb{R}^d$: for basis matrix $B_V$,
  $P_V x = B_V \left( B_V^t B_V \right)^{-1} B_V^t x$
• So the “proj’n operator” is a “matrix mult’n”: $P_V = B_V \left( B_V^t B_V \right)^{-1} B_V^t$
  (thus projection is another linear operation)
  (note: the same operation underlies least squares)

Review of Linear Algebra (Cont.)
Projection using orthonormal basis $v_1, \dots, v_n$:
• Basis matrix is “orthonormal”:
  $B_V^t B_V = \begin{pmatrix} \langle v_1, v_1 \rangle & \cdots & \langle v_1, v_n \rangle \\ \vdots & & \vdots \\ \langle v_n, v_1 \rangle & \cdots & \langle v_n, v_n \rangle \end{pmatrix} = \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix} = I_{n \times n}$
• So $P_V x = B_V B_V^t x$ = Recon(Coeffs of $x$ “in $V$ dir’n”)

Review of Linear Algebra (Cont.)
Projection using orthonormal basis (cont.):
• For the “orthogonal complement” $V^{\perp}$:
  $x = P_V x + P_{V^{\perp}} x$  and  $\|x\|^2 = \|P_V x\|^2 + \|P_{V^{\perp}} x\|^2$
• Parseval inequality:
  $\|P_V x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 = \|a\|^2 \leq \|x\|^2$

Review of Linear Algebra (Cont.)
(Real) Unitary Matrices: $U_{d \times d}$ with $U^t U = I$
• Orthonormal basis matrix (so all of the above applies)
• Follows that $U U^t = I$ (since $U$ has full rank, so $U^{-1}$ exists ...)
• Lin. trans. (mult. by $U$) is like a “rotation” of $\mathbb{R}^d$
• But also includes “mirror images”

Review of Linear Algebra (Cont.)
Singular Value Decomposition (SVD):
For a matrix $X_{d \times n}$:
Find a diagonal matrix $S_{d \times n}$, with entries $s_1, \dots, s_{\min(d,n)}$ called singular values,
and unitary (rotation) matrices $U_{d \times d}$, $V_{n \times n}$ (recall $U^t U = V^t V = I$),
so that $X = U S V^t$

Review of Linear Algebra (Cont.)
Intuition behind Singular Value Decomposition:
• For $X$ a “linear transf’n” (via matrix multi’n): $X v = U S V^t v = U \left( S \left( V^t v \right) \right)$
• First rotate (by $V^t$)
• Second rescale coordinate axes (by the $s_i$)
• Third rotate again (by $U$)
• i.e. have diagonalized the transformation

Review of Linear Algebra (Cont.)
SVD Compact Representation:
Useful Labeling: Singular Values in Decreasing Order, $s_1 \geq \cdots \geq s_{\min(n,d)}$
Note: singular values $= 0$ can be omitted
Let $r$ = # of positive singular values
Then: $X = U_{d \times r} S_{r \times r} V_{n \times r}^t$,
where $U_{d \times r}, S_{r \times r}, V_{n \times r}$ are truncations of $U, S, V$

Review of Linear Algebra (Cont.)
Eigenvalue Decomposition:
For a (symmetric) square matrix $X_{d \times d}$:
Find a diagonal matrix $D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}$
and an orthonormal matrix $B_{d \times d}$ (i.e. $B^t B = B B^t = I_{d \times d}$),
so that: $X B = B D$, i.e. $X = B D B^t$

Review of Linear Algebra (Cont.)
Eigenvalue Decomposition (cont.):
• Relation to Singular Value Decomposition (looks similar?)
• Eigenvalue decomposition is “harder”
• Since it needs $U = V$
• Price is that the eigenvalue decomp’n is generally complex valued
• Except for $X$ square and symmetric
• Then the eigenvalue decomp. is real valued
• Thus it is the sing’r value decomp. with: $U = V = B$
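To close the linear algebra review, a small numpy sketch (my illustration, not part of the slides) contrasting the two decompositions: `np.linalg.svd` for a general rectangular matrix and `np.linalg.eigh` for a symmetric one. The example matrix is a hypothetical symmetric positive semi-definite matrix, for which the eigenvalue decomposition is also the SVD with $U = V = B$ (up to column order and sign).

```python
import numpy as np

rng = np.random.default_rng(2)

# SVD of a general d x n matrix: X = U S V^t
X = rng.normal(size=(5, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)   # compact form, r = min(d, n)
print(np.allclose(X, U @ np.diag(s) @ Vt))         # True: reconstruction

# Eigendecomposition of a symmetric (here positive semi-definite) matrix:
# C = B D B^t with B orthonormal and real eigenvalues.
C = X.T @ X                                        # symmetric PSD, 3 x 3
lam, B = np.linalg.eigh(C)                         # eigh: ascending eigenvalues
print(np.allclose(C, B @ np.diag(lam) @ B.T))      # True: reconstruction

# For a PSD matrix the singular values equal the eigenvalues, i.e. the
# eigendecomposition coincides with the SVD (up to ordering of columns).
sC = np.linalg.svd(C, compute_uv=False)            # singular values, descending
print(np.allclose(np.sort(sC), np.sort(lam)))      # True
```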