Object Orie’d Data Analysis, Last Time
• Gene Cell Cycle Data
• Microarrays and HDLSS visualization
• DWD bias adjustment
• NCI 60 Data
Today: More NCI 60 Data &
Detailed (math’cal) look at PCA
Last Time: Checked Data
Combo, using DWD Dir’ns
DWD Views of NCI 60 Data
Interesting Question:
Which clusters are really there?
Issues:
• DWD great at finding dir’ns of separation
• And will do so even if no real structure
• Is this happening here?
• Or: which clusters are important?
• What does “important” mean?
Real Clusters in NCI 60 Data
Simple Visual Approach:
• Randomly relabel data (Cancer Types)
• Recompute DWD dir’ns & visualization
• Get heuristic impression from this (see the code sketch below)
Deeper Approach
• Formal Hypothesis Testing
(Done later)
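A minimal Python sketch of the relabelling idea above (numpy assumed). DWD needs a dedicated solver, so a simple mean-difference direction stands in for it here, and the data are simulated placeholders rather than NCI 60; only the relabel-and-recompute loop is the point.

```python
import numpy as np

def mean_diff_direction(X, labels):
    # Stand-in for a DWD direction: unit vector between the two class means.
    # (Real DWD solves a second-order cone program; this is only illustrative.)
    d = X[labels == 1].mean(axis=0) - X[labels == 0].mean(axis=0)
    return d / np.linalg.norm(d)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 1000))    # placeholder for 60 cases x many genes
labels = np.repeat([0, 1], 30)     # placeholder for two cancer types

# Projections on the direction found from the *real* labels
real_scores = X @ mean_diff_direction(X, labels)

# Randomly relabel, recompute the direction, and compare separation
for i in range(4):
    perm = rng.permutation(labels)
    scores = X @ mean_diff_direction(X, perm)
    gap = scores[perm == 1].mean() - scores[perm == 0].mean()
    print(f"relabelling #{i + 1}: mean gap = {gap:.2f}")
```

If the separation seen for the real labels is comparable to what random relabellings produce, the "cluster" is suspect.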
Random Relabelling #1
Random Relabelling #2
Random Relabelling #3
Random Relabelling #4
Revisit Real Data
Revisit Real Data (Cont.)
Heuristic Results:
• Strong Clust's: Melanoma, Leukemia, Renal, Colon
• Weak Clust's: CNS, Ovarian
• Not Clust's: NSCLC, Breast
Later: will find way to quantify these ideas
i.e. develop statistical significance
NCI 60 Controversy
• Can NCI 60 Data be normalized?
• Negative Indication:
• Kuo, et al (2002) Bioinformatics, 18, 405-412.
– Based on Gene by Gene Correlations
• Resolution:
Gene by Gene Data View
vs.
Multivariate Data View
Resolution of Paradox: Toy Data, Gene View
Resolution: Correlations suggest “no chance”
Resolution: Toy Data, PCA View
Resolution: PCA & DWD direct’ns
Resolution: DWD Adjusted
Resolution: DWD Adjusted, PCA view
Resolution: DWD Adjusted, Gene view
Resolution: Correlations & PC1 Projection Correl’n
Needed final verification of Cross-platform Normal’n
• Is statistical power actually improved?
• Will study later
DWD: Why does it work?
Rob Tibshirani Query:
• Really need that complicated stuff?
(DWD is complex)
• Can’t we just use means?
• Empirical Fact (Joel Parker):
(DWD better than simple methods)
DWD: Why does it work?
Xuxin Liu Observation:
• Key is unbalanced sub-sample sizes
(e.g. biological subtypes)
• Mean methods strongly affected
• DWD much more robust
• Toy Example
DWD: Why does it work?
Xuxin Liu Example
• Goals:
– Bring colors together
– Keep symbols distinct (interesting biology)
• Study varying sub-sample proportions:
– Ratio = 1
– Ratio = 0.61
– Ratio = 0.35
– Ratio = 0.11
Ratio = 1: Both methods great
Ratio = 0.61: Mean degrades, DWD good
Ratio = 0.35: Mean poor, DWD still OK
Ratio = 0.11: DWD degraded, still better
• Later: will find underlying theory
PCA: Rediscovery – Renaming
Statistics:
Principal Component Analysis (PCA)
Social Sciences:
Factor Analysis (PCA is a subset)
Probability / Electrical Eng:
Karhunen-Loève expansion
Applied Mathematics:
Proper Orthogonal Decomposition (POD)
Geo-Sciences:
Empirical Orthogonal Functions (EOF)
An Interesting Historical Note
The 1st (?) application of PCA to Functional
Data Analysis:
Rao, C. R. (1958) Some statistical methods
for comparison of growth curves,
Biometrics, 14, 1-17.
1st Paper with “Curves as Data” viewpoint
Detailed Look at PCA
Three important (and interesting) viewpoints:
1. Mathematics
2. Numerics
3. Statistics
1st: Review linear alg. and multivar. prob.
Review of Linear Algebra
Vector Space:
• set of “vectors”, $x$
• and “scalars” (coefficients), $a$
• “closed” under “linear combination”
  ($\sum_i a_i x_i$ in space)
• e.g. $\mathbb{R}^d = \left\{ (x_1, \ldots, x_d)^t : x_1, \ldots, x_d \in \mathbb{R} \right\}$,
  “$d$ dim Euclid’n space”
Review of Linear Algebra (Cont.)
Subspace:
• subset that is again a vector space
• i.e. closed under linear combination
• e.g. lines through the origin
• e.g. planes through the origin
• e.g. subsp. “generated by” a set of vectors
  (all linear combos of them
  = containing hyperplane through origin)
Review of Linear Algebra (Cont.)
Basis of subspace: set of vectors that:
• span, i.e. everything is a lin. com. of them
• are linearly indep’t, i.e. lin. com. is unique
• e.g. “unit vector basis” of $\mathbb{R}^d$:
  $\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \ldots, \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$
• since
  $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix} + \cdots + x_d \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}$
Review of Linear Algebra (Cont.)
Basis Matrix, of subspace of $\mathbb{R}^d$:
Given a basis $v_1, \ldots, v_n$, create matrix of columns:
$B = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} v_{11} & \cdots & v_{1n} \\ \vdots & \ddots & \vdots \\ v_{d1} & \cdots & v_{dn} \end{pmatrix}_{d \times n}$
Review of Linear Algebra (Cont.)
Then “linear combo” is a matrix multiplicat’n:
$\sum_{i=1}^n a_i v_i = B a$, where $a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$
Check sizes: $(d \times 1) = (d \times n) \cdot (n \times 1)$
Review of Linear Algebra (Cont.)
Aside on matrix multiplication: (linear transformat’n)
For matrices
$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & \ddots & \vdots \\ a_{k,1} & \cdots & a_{k,m} \end{pmatrix}$,  $B = \begin{pmatrix} b_{1,1} & \cdots & b_{1,n} \\ \vdots & \ddots & \vdots \\ b_{m,1} & \cdots & b_{m,n} \end{pmatrix}$
Define the “matrix product”
$AB = \begin{pmatrix} \sum_{i=1}^m a_{1,i} b_{i,1} & \cdots & \sum_{i=1}^m a_{1,i} b_{i,n} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^m a_{k,i} b_{i,1} & \cdots & \sum_{i=1}^m a_{k,i} b_{i,n} \end{pmatrix}$
(“inner products” of rows with columns)
(composition of linear transformations)
Often useful to check sizes: $(k \times n) = (k \times m) \cdot (m \times n)$
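A small numpy illustration of the entrywise definition and the size check (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 6))   # k x m
B = rng.normal(size=(6, 2))   # m x n
AB = A @ B                    # k x n, per the size check above
assert AB.shape == (4, 2)

# Entry (j, l) is the inner product of row j of A with column l of B
assert np.isclose(AB[1, 0], A[1, :] @ B[:, 0])
```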
Review of Linear Algebra (Cont.)
Matrix trace:
• For a square matrix $A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,m} \end{pmatrix}$
• Define $\operatorname{tr}(A) = \sum_{i=1}^m a_{i,i}$
• Trace commutes with matrix multiplication: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
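A one-line numpy check of the commutation property (random square matrices):

```python
import numpy as np

rng = np.random.default_rng(3)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))   # tr(AB) = tr(BA)
```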
Review of Linear Algebra (Cont.)
Dimension of subspace (a notion of “size”):
• number of elements in a basis (unique)
• $\dim(\mathbb{R}^d) = d$ (use basis above)
• e.g. dim of a line is 1
• e.g. dim of a plane is 2
• dimension is “degrees of freedom”
Review of Linear Algebra (Cont.)
Norm of a vector:
• in $\mathbb{R}^d$:  $\|x\| = \left( \sum_{j=1}^d x_j^2 \right)^{1/2} = \left( x^t x \right)^{1/2}$
• Idea: “length” of the vector
• Note: strange properties for high $d$,
  e.g. “length of diagonal of unit cube” $= \sqrt{d}$
Review of Linear Algebra (Cont.)
Norm of a vector (cont.):
• “length normalized vector”: $\dfrac{x}{\|x\|}$
  (has length one, thus on surf. of unit sphere & is a direction vector)
• get “distance” as: $d(x, y) = \|x - y\| = \left( (x - y)^t (x - y) \right)^{1/2}$
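A numpy sketch of both points: the $\sqrt{d}$ diagonal of the unit cube, and length normalization:

```python
import numpy as np

d = 100
diag = np.ones(d)                        # diagonal of the unit cube in R^d
print(np.linalg.norm(diag), np.sqrt(d))  # both 10.0: length grows like sqrt(d)

x = np.array([3.0, 4.0])
u = x / np.linalg.norm(x)                # direction vector on the unit sphere
print(u, np.linalg.norm(u))              # [0.6 0.8], length 1.0
```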
Review of Linear Algebra (Cont.)
Inner (dot, scalar) product:
• for vectors $x$ and $y$:  $\langle x, y \rangle = \sum_{j=1}^d x_j y_j = x^t y$
• related to norm, via  $\|x\| = \langle x, x \rangle^{1/2} = \left( x^t x \right)^{1/2}$
Review of Linear Algebra (Cont.)
Inner (dot, scalar) product (cont.):
• measures “angle between $x$ and $y$” as:
  $\operatorname{angle}(x, y) = \cos^{-1} \left( \dfrac{\langle x, y \rangle}{\|x\| \, \|y\|} \right) = \cos^{-1} \left( \dfrac{x^t y}{\sqrt{x^t x} \sqrt{y^t y}} \right)$
• key to “orthogonality”, i.e. “perpendicul’ty”:
  $x \perp y$ if and only if $\langle x, y \rangle = 0$
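A numpy sketch of the angle formula and the orthogonality criterion:

```python
import numpy as np

def angle(x, y):
    # angle(x, y) = arccos(<x, y> / (||x|| ||y||)), in radians
    return np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])
print(angle(x, y), np.pi / 2)   # equal: <x, y> = 0, so x is perpendicular to y
```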
Review of Linear Algebra (Cont.)
Orthonormal basis $v_1, \ldots, v_n$:
• All ortho to each other, i.e. $\langle v_i, v_{i'} \rangle = 0$, for $i \neq i'$
• All have length 1, i.e. $\langle v_i, v_i \rangle = 1$, for $i = 1, \ldots, n$
Review of Linear Algebra (Cont.)
Orthonormal basis $v_1, \ldots, v_n$ (cont.):
• “Spectral Representation”:  $x = \sum_{i=1}^n a_i v_i$, where $a_i = \langle x, v_i \rangle$
  check:  $\langle x, v_i \rangle = \left\langle \sum_{i'=1}^n a_{i'} v_{i'}, \, v_i \right\rangle = \sum_{i'=1}^n a_{i'} \langle v_{i'}, v_i \rangle = a_i$
• Matrix notation:  $x = B a$, where $a^t = x^t B$, i.e. $a = B^t x$
• $a$ is called “transform (e.g. Fourier, wavelet) of $x$”
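A numpy sketch of the spectral representation and its matrix form (a toy orthonormal basis of a 2-dim subspace of $\mathbb{R}^3$):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])      # orthonormal columns v_1, v_2
x = np.array([2.0, -3.0, 0.0])  # x lies in the subspace they span

a = B.T @ x                     # transform: a_i = <x, v_i>
assert np.allclose(B @ a, x)    # reconstruction: x = sum_i a_i v_i
```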
Review of Linear Algebra (Cont.)
Parseval identity, for $x$ in subsp. gen’d by o.n. basis $v_1, \ldots, v_n$:
$\|x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 = \|a\|^2$
• Pythagorean theorem
• “Decomposition of Energy”
• ANOVA - sums of squares
• Transform, $a$, has same length as $x$, i.e. “rotation in $\mathbb{R}^d$”
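A numpy check of Parseval, using a random orthonormal basis of all of $\mathbb{R}^4$ (from a QR factorization):

```python
import numpy as np

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # orthonormal basis as columns
x = rng.normal(size=4)
a = Q.T @ x                                    # transform of x

# Transform has the same length as x: the map is a rotation of R^4
assert np.isclose(np.linalg.norm(a), np.linalg.norm(x))
```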
Review of Linear Algebra (Cont.)
Gram-Schmidt Ortho-normalization
Idea: Given a basis $v_1, \ldots, v_n$, find an orthonormal version, by subtracting non-ortho parts:
$u_1 = v_1 / \|v_1\|$
$u_2 = \left( v_2 - \langle v_2, u_1 \rangle u_1 \right) / \left\| v_2 - \langle v_2, u_1 \rangle u_1 \right\|$
$u_3 = \left( v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 \right) / \left\| v_3 - \langle v_3, u_1 \rangle u_1 - \langle v_3, u_2 \rangle u_2 \right\|$
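A short Python implementation of the scheme above (the modified variant, which subtracts each non-ortho part as it goes; numerically steadier but the same idea):

```python
import numpy as np

def gram_schmidt(V):
    # Columns of V form a basis; returns orthonormal columns, same span.
    U = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        u = V[:, j].astype(float)          # copy of v_j
        for i in range(j):
            u -= (u @ U[:, i]) * U[:, i]   # subtract non-ortho part
        U[:, j] = u / np.linalg.norm(u)    # normalize to length one
    return U

V = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
U = gram_schmidt(V)
print(np.round(U.T @ U, 12))               # identity: columns orthonormal
```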
Review of Linear Algebra (Cont.)
Projection of a vector $x$ onto a subspace $V$:
• Idea: member of $V$ that is closest to $x$ (i.e. “approx’n”)
• Find $P_V x \in V$ that solves:  $\min_{v \in V} \|x - v\|$  (“least squares”)
• For inner product (Hilbert) space: $P_V x$ exists and is unique
Review of Linear Algebra (Cont.)
Projection of a vector onto a subspace (cont.):
• General solution in $\mathbb{R}^d$: for basis matrix $B_V$,
  $P_V x = B_V \left( B_V^t B_V \right)^{-1} B_V^t x$
• So “proj’n operator” is “matrix mult’n”:
  $P_V = B_V \left( B_V^t B_V \right)^{-1} B_V^t$
  (thus projection is another linear operation)
  (note same operation underlies least squares)
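A numpy sketch of the projection formula, checked against numpy's own least squares solver (toy sizes):

```python
import numpy as np

rng = np.random.default_rng(5)
B_V = rng.normal(size=(4, 2))   # basis matrix of a 2-dim subspace V of R^4
x = rng.normal(size=4)

# P_V x = B_V (B_V^t B_V)^{-1} B_V^t x
P_V = B_V @ np.linalg.inv(B_V.T @ B_V) @ B_V.T
proj = P_V @ x

# Same operation underlies least squares: fit x on the columns of B_V
coef, *_ = np.linalg.lstsq(B_V, x, rcond=None)
assert np.allclose(proj, B_V @ coef)
```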
Review of Linear Algebra (Cont.)
Projection using orthonormal basis $v_1, \ldots, v_n$:
• Basis matrix is “orthonormal”:
  $B_V^t B_V = \begin{pmatrix} v_1^t \\ \vdots \\ v_n^t \end{pmatrix} \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = \begin{pmatrix} \langle v_1, v_1 \rangle & \cdots & \langle v_1, v_n \rangle \\ \vdots & \ddots & \vdots \\ \langle v_n, v_1 \rangle & \cdots & \langle v_n, v_n \rangle \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} = I_{n \times n}$
• So  $P_V x = B_V B_V^t x$ = Recon(Coeffs of $x$ “in $V$ dir’n”)
Review of Linear Algebra (Cont.)
Projection using orthonormal basis (cont.):
• For “orthogonal complement” $V^{\perp}$:
  $x = P_V x + P_{V^{\perp}} x$  and  $\|x\|^2 = \|P_V x\|^2 + \|P_{V^{\perp}} x\|^2$
• Parseval inequality:
  $\|P_V x\|^2 = \sum_{i=1}^n \langle x, v_i \rangle^2 = \sum_{i=1}^n a_i^2 \leq \|x\|^2$
Review of Linear Algebra (Cont.)
(Real) Unitary Matrices: $U_{d \times d}$ with $U^t U = I$
• Orthonormal basis matrix (so all of above applies)
• Follows that $U U^t = I$
  (since $U$ has full rank, so $U^{-1}$ exists ...)
• Lin. trans. (mult. by $U$) is like “rotation” of $\mathbb{R}^d$
• But also includes “mirror images”
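A numpy sketch: a rotation and a mirror image of $\mathbb{R}^2$, both unitary and both length preserving:

```python
import numpy as np

theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],   # rotation
              [np.sin(theta),  np.cos(theta)]])
M = np.array([[1.0, 0.0],
              [0.0, -1.0]])                      # mirror image

x = np.array([1.0, 2.0])
for U in (R, M, R @ M):
    assert np.allclose(U.T @ U, np.eye(2))                       # U^t U = I
    assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))  # length kept
```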
Review of Linear Algebra (Cont.)
Singular Value Decomposition (SVD):
For a matrix $X_{d \times n}$:
Find a diagonal matrix $S_{d \times n}$, with entries $s_1, \ldots, s_{\min(d, n)}$ called singular values,
and unitary (rotation) matrices $U_{d \times d}$, $V_{n \times n}$
(recall $U^t U = V^t V = I$)
so that  $X = U S V^t$
Review of Linear Algebra (Cont.)
Intuition behind Singular Value Decomposition:
• For $X$ a “linear transf’n” (via matrix multi’n):
  $X \cdot v = U \cdot S \cdot V^t \cdot v = U \cdot \left( S \cdot \left( V^t \cdot v \right) \right)$
• First rotate (by $V^t$)
• Second rescale coordinate axes (by $s_i$)
• Third rotate again (by $U$)
• i.e. have diagonalized the transformation
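A numpy check of the factorization and of the rotate-rescale-rotate reading (toy $d = 5$, $n = 3$):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(5, 3))

U, s, Vt = np.linalg.svd(X, full_matrices=True)   # U: 5x5, Vt: 3x3
S = np.zeros((5, 3))
S[:3, :3] = np.diag(s)                            # diagonal d x n matrix

assert np.allclose(X, U @ S @ Vt)                 # X = U S V^t

v = rng.normal(size=3)
# First rotate (V^t v), then rescale (S ...), then rotate again (U ...)
assert np.allclose(X @ v, U @ (S @ (Vt @ v)))
```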
Review of Linear Algebra (Cont.)
SVD Compact Representation:
Useful Labeling: Singular Values in Decreasing Order:
$s_1 \geq \cdots \geq s_{\min(n, d)}$
Note: singular values = 0 can be omitted
Let $r$ = # of positive singular values
Then:  $X = U_{d \times r} S_{r \times r} V_{n \times r}^t$,
where $U_{d \times r}$, $S_{r \times r}$, $V_{n \times r}$ are truncations of $U, S, V$
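A numpy sketch of the compact form, on a matrix built to have rank $r = 3$:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 4))   # 5x4, rank 3

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))           # number of positive singular values
U_r, S_r, V_r = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

# Zero singular values omitted, yet X is recovered exactly
assert r == 3
assert np.allclose(X, U_r @ S_r @ V_r.T)
```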
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition:
For a (symmetric) square matrix $X_{d \times d}$:
Find a diagonal matrix  $D = \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_d \end{pmatrix}$
and an orthonormal matrix $B_{d \times d}$
(i.e. $B^t B = B B^t = I_{d \times d}$)
so that:  $X B = B D$,  i.e.  $X = B D B^t$
Review of Linear Algebra (Cont.)
Eigenvalue Decomposition (cont.):
• Relation to Singular Value Decomposition (looks similar?):
• Eigenvalue decomposition “harder”
• Since needs $U = V$
• Price is eigenvalue decomp’n is generally complex valued
• Except for $X$ square and symmetric
• Then eigenvalue decomp. is real valued
• Thus is the sing’r value decomp. with: $U = V = B$