18.338 Course Project
Miriam Huntley
SEAS, Harvard University
May 15, 2013
Real World Data
“When it comes to RMT in the real world, we know close to nothing.”
-Prof. Alan Edelman, last week
RMT
Who Cares about
Covariance Matrices?
• Basic assumption in many areas of data analysis: multivariate data $X = Y\Sigma^{1/2}$
• You get $X$, want to find $\Sigma$
• $X^T X / n$ can be a very bad estimator if $n$ is finite
• Current standard using PCA (=SVD): distinguish from $X = YI$
• In RMT language: any eigenvalues which lie very far away from the distribution expected for a white Wishart matrix should be considered signal
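A minimal sketch of the RMT criterion above, assuming Gaussian noise and a hypothetical single-spike covariance. The names `mp_edges`, `spike` and the tolerance `tol` are illustrative and not from the slides. For white noise the eigenvalues of $X^T X / n$ should stay inside the Marchenko-Pastur bulk $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$ with $\gamma = p/n$; an eigenvalue well beyond the right edge gets flagged as signal.

```python
import numpy as np

def mp_edges(gamma):
    """Edges of the white Marchenko-Pastur bulk for gamma = p/n <= 1."""
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2

def sample_cov_eigs(X):
    """Eigenvalues of the sample covariance X^T X / n (X is n x p)."""
    n = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / n)

rng = np.random.default_rng(0)
n, p = 2000, 500
lo, hi = mp_edges(p / n)

# Pure noise: X = Y I
Y = rng.standard_normal((n, p))
noise_eigs = sample_cov_eigs(Y)

# Hypothetical signal: Sigma = I except one eigenvalue equal to `spike`
spike = 10.0
sigma_sqrt = np.ones(p)
sigma_sqrt[0] = np.sqrt(spike)
X = Y * sigma_sqrt            # same as Y @ diag(sigma_sqrt), i.e. X = Y Sigma^{1/2}
spiked_eigs = sample_cov_eigs(X)

tol = 0.1                     # assumed slack for finite-n fluctuations at the edge
n_signal_noise = int(np.sum(noise_eigs > hi + tol))
n_signal_spiked = int(np.sum(spiked_eigs > hi + tol))
print(f"bulk edges: [{lo:.2f}, {hi:.2f}]")
print("eigenvalues flagged as signal (noise, spiked):", n_signal_noise, n_signal_spiked)
```

The threshold `hi + tol` is a crude choice; the criterion only works when the noise really is white, which is the limitation the later slides address.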
Who Cares about
Covariance Matrices?
[Figure: Gene Expression Data. Horizontal axis: Samples.]
Data from:
Alizadeh A, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511.
Why adventure beyond white Wishart?
• $X = YI$: Can we do better?
• Noise with structure
Example: Financial data, $x_i = \sigma_t x_i$
What if there is no right edge?
• $\Sigma$: can we recover it from empirical data?
Approach:
General MP Law
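• $X$ is $n \times p$, $X = Y\Sigma_p^{1/2}$
• $Y$ entries are iid (real or complex) with $E(Y_{i,j}) = 0$, $E(|Y_{i,j}|^2) = 1$; $\gamma = p/n$
• Let $H_p$ be the spectral distribution of $\Sigma_p$ and assume $H_p$ converges weakly to $H_\infty$
• Let $F_p$ be the spectral distribution of $XX^T/n$ and $v_{F_p}$ its Stieltjes transform
• Then: $v_{F_p}(z) \to v_\infty(z)$, where
$$-\frac{1}{v_\infty(z)} = z - \gamma \int \frac{\lambda \, dH_\infty(\lambda)}{1 + \lambda v_\infty(z)}, \quad \forall z \in \mathbb{C}^+$$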
See: Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. J. Multivariate Anal. 54, 2, 175–192.
El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36, 2757–2790.
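A worked special case as a sanity check (not from the slides). Writing the law as $-1/v = z - \gamma \int \lambda \, dH_\infty(\lambda) / (1 + \lambda v)$ and taking $H_\infty = \delta_1$, i.e. $\Sigma_p = I$, should recover the white Wishart case:

```latex
-\frac{1}{v} = z - \frac{\gamma}{1+v}
\;\Longrightarrow\;
z v^2 + (z + 1 - \gamma)\, v + 1 = 0
\;\Longrightarrow\;
v(z) = \frac{-(z+1-\gamma) \pm \sqrt{(z+1-\gamma)^2 - 4z}}{2z},

(z+1-\gamma)^2 - 4z
  = z^2 - 2(1+\gamma)\, z + (1-\gamma)^2
  = \bigl(z - (1-\sqrt{\gamma})^2\bigr)\bigl(z - (1+\sqrt{\gamma})^2\bigr).
```

For real $z$ the discriminant is negative exactly between $(1-\sqrt{\gamma})^2$ and $(1+\sqrt{\gamma})^2$, so $v$ picks up an imaginary part only there. These are the usual Marchenko-Pastur bulk edges.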
Numerical Solutions of
General MP
Single, True Covariance Matrix
True Covariance Matrix Spectral Distribution
$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \quad \forall z \in \mathbb{C}^+$
Discretize in z
Numerically Solve
Empirical Spectral Distribution
Live Demos…
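One way the forward computation could be sketched, assuming $H$ is a finite mixture of point masses (this is an illustration, not necessarily the method used in the live demos). With atoms `lam[k]` and weights `w[k]`, clearing denominators in $-1/v = z - \gamma \sum_k w_k \lambda_k / (1 + \lambda_k v)$ gives a polynomial in $v$, and for $\mathrm{Im}\, z > 0$ the wanted solution is the single root in the upper half-plane. The function names, the offset `eta` and the two-atom spectrum are all hypothetical choices.

```python
import numpy as np

def silverstein_v(z, lam, w, gamma):
    """Root in C+ of -1/v = z - gamma * sum_k w_k*lam_k / (1 + lam_k*v)."""
    lam = np.asarray(lam, dtype=float)
    w = np.asarray(w, dtype=float)
    # Polynomial form (coefficients, highest degree first):
    # (1 + z v) prod_j (1 + lam_j v) - gamma v sum_k w_k lam_k prod_{j != k} (1 + lam_j v) = 0
    prod_all = np.array([1.0 + 0.0j])
    for l_j in lam:
        prod_all = np.polymul(prod_all, [l_j, 1.0])
    poly = np.polymul([z, 1.0], prod_all)
    for k in range(len(lam)):
        prod_rest = np.array([1.0 + 0.0j])
        for j in range(len(lam)):
            if j != k:
                prod_rest = np.polymul(prod_rest, [lam[j], 1.0])
        term = gamma * w[k] * lam[k] * np.polymul(prod_rest, [1.0, 0.0])  # times v
        poly = np.polysub(poly, term)
    roots = np.roots(poly)
    return roots[np.argmax(roots.imag)]   # the unique root with Im v > 0

def sample_density(x_grid, lam, w, gamma, eta=1e-3):
    """Approximate limiting eigenvalue density of X^T X / n on x_grid."""
    dens = np.empty(len(x_grid))
    for i, x in enumerate(x_grid):
        z = x + 1j * eta                      # discretize in z just above the real axis
        v = silverstein_v(z, lam, w, gamma)   # transform for the n x n matrix X X^T / n
        m = (v + (1.0 - gamma) / z) / gamma   # transform for the p x p matrix X^T X / n
        dens[i] = m.imag / np.pi              # Stieltjes inversion
    return dens

# Hypothetical two-atom spectrum: half the eigenvalues of Sigma are 1, half are 4
gamma = 0.25
x_grid = np.linspace(0.01, 12.0, 1200)
density = sample_density(x_grid, lam=[1.0, 4.0], w=[0.5, 0.5], gamma=gamma)
```

Root finding avoids choosing a starting point for an iterative solver, but it only scales to a handful of atoms; a fixed-point or Newton iteration would be the natural alternative for a finely discretized $H$.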
Inverse Solutions of
General MP?
Single, True Covariance Matrix
True Covariance Matrix Spectral Distribution
$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \quad \forall z \in \mathbb{C}^+$
Discretize in z
Numerically Solve
Empirical Spectral Distribution
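A small sketch of why the inverse direction is at least well-posed on paper, under strong assumptions. If $H$ is restricted to point masses on a candidate grid, then for known pairs $(z_j, v(z_j))$ the equation $1/v + z = \gamma \sum_k w_k \lambda_k / (1 + \lambda_k v)$ is linear in the weights $w_k$. El Karoui (2008), cited above, poses the inversion as a convex optimization problem over a discretized $H$; the plain least squares below is a swapped-in simplification, and it uses exact, noise-free pairs generated from a hypothetical two-atom $H$ rather than eigenvalues of real data. All names and grid choices are illustrative.

```python
import numpy as np

gamma = 0.25
lam_true = np.array([1.0, 4.0])   # hypothetical true atoms of H
w_true = np.array([0.5, 0.5])     # and their weights

def z_of_v(v, lam, w, gamma):
    """The same equation solved for z: z = -1/v + gamma * sum_k w_k*lam_k / (1 + lam_k*v)."""
    return -1.0 / v + gamma * np.sum(w * lam / (1.0 + lam * v))

# Exact (z, v) pairs: pick v in C+, map to z, and check that z lands in C+,
# so that v is the solution of the equation at that z.
v_pts = np.linspace(-0.3, 0.3, 25) + 0.2j
z_pts = np.array([z_of_v(v, lam_true, w_true, gamma) for v in v_pts])
assert np.all(z_pts.imag > 0)

# Candidate grid for the atoms of H (assumed to contain the true atoms)
lam_grid = np.array([1.0, 2.0, 3.0, 4.0])

# Linear system A w = b, stacked into real and imaginary parts so w stays real
A = gamma * lam_grid[None, :] / (1.0 + lam_grid[None, :] * v_pts[:, None])
b = 1.0 / v_pts + z_pts
A_real = np.vstack([A.real, A.imag])
b_real = np.concatenate([b.real, b.imag])
w_hat, *_ = np.linalg.lstsq(A_real, b_real, rcond=None)

print(dict(zip(lam_grid.tolist(), np.round(w_hat, 6).tolist())))
```

With a finite sample, $v$ would be replaced by the Stieltjes transform of the observed eigenvalues, the candidate grid would be much finer, and the system becomes badly conditioned, so the weights need constraints such as $w_k \ge 0$ and $\sum_k w_k = 1$.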
Toy Example:
Block Covariance Matrix
?
Warning: Don’t try this at home
Thanks!
This was fun.
• Colwell LJ, Qin Y, Manta A and Brenner MP (2013). Signal identification from Sample Covariance Matrices with Correlated Noise. Under review.
• El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36, 2757–2790.
• Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72, 507–536.
• Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. J. Multivariate Anal. 54, 2, 175–192.