18.338 Course Project

Numerical Methods for Empirical Covariance Matrix Analysis

Miriam Huntley

SEAS, Harvard University

May 15, 2013

Real World Data

“When it comes to RMT in the real world, we know close to nothing.”

-Prof. Alan Edelman, last week

RMT

Who Cares about Covariance Matrices?

• Basic assumption in many areas of data analysis: the data are multivariate, $X = Y \Sigma^{1/2}$. You get $X$ and want to find $\Sigma$. The sample covariance $\frac{1}{n} X^T X$ can be a very bad estimator of $\Sigma$ when $n$ is finite.

• Current standard using PCA (= SVD): distinguish the data from the null model $X = Y I$

• In RMT language: any eigenvalues which lie very far away from the distribution expected for a white Wishart matrix should be considered signal
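The last bullet can be illustrated with a minimal sketch. For $\Sigma = I$ the eigenvalues of $\frac{1}{n} X^T X$ concentrate on the Marchenko-Pastur support $[(1-\sqrt{\gamma})^2, (1+\sqrt{\gamma})^2]$ with $\gamma = p/n$, so eigenvalues well above the upper edge are candidates for signal. The Gaussian entries, the matrix sizes, the single population eigenvalue of 10, and the 10% margin above the edge are all illustrative assumptions, not values from the slides.

```python
import numpy as np

def mp_edges(gamma):
    """Support edges of the Marchenko-Pastur law for identity covariance."""
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2

def sample_cov_eigs(X):
    """Eigenvalues of the p x p sample covariance X^T X / n."""
    n = X.shape[0]
    return np.linalg.eigvalsh(X.T @ X / n)

rng = np.random.default_rng(0)
n, p = 2000, 500                      # illustrative sizes, gamma = p/n = 0.25
gamma = p / n

# Null model X = Y I, with iid N(0, 1) entries (an assumption for this sketch).
Y = rng.standard_normal((n, p))
eigs_white = sample_cov_eigs(Y)

# Alternative: one population eigenvalue raised to 10 (a hypothetical signal).
sigma_diag = np.ones(p)
sigma_diag[0] = 10.0
X = Y * np.sqrt(sigma_diag)           # X = Y Sigma^{1/2} for a diagonal Sigma
eigs_spiked = sample_cov_eigs(X)

lo, hi = mp_edges(gamma)
# The 10% margin above the edge is an arbitrary threshold chosen for illustration.
n_signal = int(np.sum(eigs_spiked > 1.1 * hi))

print(f"MP support: [{lo:.3f}, {hi:.3f}]")
print(f"white data: min eig = {eigs_white.min():.3f}, max eig = {eigs_white.max():.3f}")
print(f"spiked data: eigenvalues flagged as signal = {n_signal}")
```

Even in the null case every population eigenvalue equals 1 while the sample eigenvalues spread over the whole support, which is the sense in which the sample covariance is a poor estimator at finite $n$.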

Who Cares about Covariance Matrices?

Gene Expression Data

[Figure: heat map of gene expression data; horizontal axis "Samples" (ticks 20 to 80), vertical axis ticks 500 to 4000.]

Data from:

Alizadeh A, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511.

Why adventure beyond white Wishart?

• Null model $X = Y I$: can we do better?

• Noise with structure

  - Example: financial data, $x_i = \sigma_t \xi_i$

  - What if there is no right edge?

• $\Sigma$: can we recover it from empirical data?

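Approach: General MP Law

• $X$ is $n \times p$, with $X = Y \Sigma_p^{1/2}$

• The entries of $Y$ are iid (real or complex), with $E(Y_{i,j}) = 0$ and $E(|Y_{i,j}|^2) = 1$

• $\gamma = p/n$

• Let $H_p$ be the spectral distribution of $\Sigma_p$, and assume $H_p$ converges weakly to $H_\infty$

• Let $F_p$ be the spectral distribution of $XX^T/n$ and $v_{F_p}$ its Stieltjes transform

• Then $v_{F_p}(z) \to v_\infty(z)$, where

$$-\frac{1}{v_\infty(z)} = z - \gamma \int \frac{\lambda \, dH_\infty(\lambda)}{1 + \lambda v_\infty(z)}, \qquad \forall z \in \mathbb{C}^+$$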

See: Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. J. Multivariate Anal. 54(2), 175–192.

El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36, 2757–2790.
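As a sanity check on the general MP equation, here is a short worked special case (an illustration added here, not taken from the slides). Take $H_\infty = \delta_1$, i.e. the white case $\Sigma_p = I$. The integral collapses to a single term and the equation becomes a quadratic in $v_\infty$, whose discriminant vanishes exactly at the classical Marchenko-Pastur edges.

```latex
\begin{aligned}
-\frac{1}{v_\infty} &= z - \frac{\gamma}{1 + v_\infty}
  && \text{(set } H_\infty = \delta_1\text{)} \\
0 &= z\, v_\infty^2 + (z + 1 - \gamma)\, v_\infty + 1
  && \text{(multiply by } v_\infty (1 + v_\infty)\text{)} \\
v_\infty(z) &= \frac{-(z + 1 - \gamma) \pm \sqrt{(z + 1 - \gamma)^2 - 4z}}{2z}
  && \text{(branch with } \operatorname{Im} v_\infty > 0\text{)} \\
(z + 1 - \gamma)^2 - 4z &= (z - a)(z - b),
  && a = (1 - \sqrt{\gamma})^2,\quad b = (1 + \sqrt{\gamma})^2
\end{aligned}
```

For real $z = x$ between $a$ and $b$ the square root is purely imaginary, so $\operatorname{Im} v_\infty(x) = \sqrt{(b - x)(x - a)}/(2x)$, and dividing by $\pi\gamma$ gives the usual Marchenko-Pastur density for the eigenvalues of $\frac{1}{n} X^T X$.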

Numerical Solutions of General MP

[Diagram: Single, True Covariance Matrix; True Covariance Matrix Spectral Distribution; Empirical Spectral Distribution]

$$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \qquad \forall z \in \mathbb{C}^+$$

Discretize in z

Numerically Solve

Live Demos…
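A minimal sketch of the "discretize in z, numerically solve" step, under stated assumptions: $H$ is taken to be a discrete measure with atoms `t` and weights `w`, and the equation is solved at each $z = x + i\varepsilon$ by a damped fixed-point iteration. The slides do not say which solver the live demos used, so the iteration, the damping factor, the value of `eps`, and the two-atom example are all choices made for this illustration. Since $v$ is the Stieltjes transform for $XX^T/n$, the density of the eigenvalues of $\frac{1}{n} X^T X$ at $x > 0$ is approximated by $\operatorname{Im} v(x + i\varepsilon)/(\pi\gamma)$.

```python
import numpy as np

def solve_mp(z, t, w, gamma, damping=0.5, tol=1e-10, max_iter=20000):
    """Solve -1/v = z - gamma * sum_k w_k t_k / (1 + t_k v) for v(z) in C+.

    z    : complex array of points with Im z > 0
    t, w : atoms and weights of a discrete H = sum_k w_k delta_{t_k}
    """
    z = np.asarray(z, dtype=complex)
    t = np.asarray(t, dtype=float)
    w = np.asarray(w, dtype=float)
    v = np.full(z.shape, 1j)                       # any starting point in C+
    for _ in range(max_iter):
        integral = np.sum(w * t / (1.0 + t * v[..., None]), axis=-1)
        v_new = -1.0 / (z - gamma * integral)      # one fixed-point step
        v_next = (1.0 - damping) * v + damping * v_new
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v                                       # may not have converged

def limiting_density(x, t, w, gamma, eps=1e-2):
    """Approximate density of the eigenvalues of X^T X / n at real x > 0."""
    v = solve_mp(np.asarray(x, dtype=float) + 1j * eps, t, w, gamma)
    return v.imag / (np.pi * gamma)

# Hypothetical example: half the population eigenvalues at 1, half at 5.
gamma = 0.25
x_grid = np.linspace(0.05, 12.0, 400)
density = limiting_density(x_grid, t=[1.0, 5.0], w=[0.5, 0.5], gamma=gamma)
print(f"density peaks near x = {x_grid[np.argmax(density)]:.2f}")
```

The damping keeps every iterate in the upper half-plane, because a convex combination of two points in $\mathbb{C}^+$ stays in $\mathbb{C}^+$. A smaller `eps` sharpens the edges of the density at the cost of more iterations.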

Inverse Solutions of General MP?

[Diagram: Single, True Covariance Matrix; True Covariance Matrix Spectral Distribution; Empirical Spectral Distribution]

$$-\frac{1}{v(z)} = z - \gamma \int \frac{\lambda \, dH(\lambda)}{1 + \lambda v(z)}, \qquad \forall z \in \mathbb{C}^+$$

Discretize in z

Numerically Solve
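One way the inverse direction could be attempted is sketched below, in the spirit of El Karoui (2008) but not a reproduction of that paper or of the author's method. Assumptions: $H$ is restricted to a fixed grid of candidate atoms `t_grid`, $v$ is estimated from the sample eigenvalues at a set of points $z_j$ off the real axis, and the weights are fit by non-negative least squares (El Karoui uses a different loss and a convex program). The equation is linear in the weights because $z + 1/v(z) = \gamma \sum_k w_k t_k / (1 + t_k v(z))$. The grid, the points $z_j$, the penalty enforcing $\sum_k w_k = 1$, and the two-level diagonal $\Sigma$ are all hypothetical choices for this sketch.

```python
import numpy as np
from scipy.optimize import nnls

def stieltjes_from_eigs(eigs, z, gamma):
    """Empirical v(z) for XX^T/n, built from the p eigenvalues of X^T X/n."""
    m = np.mean(1.0 / (eigs[None, :] - z[:, None]), axis=1)
    return -(1.0 - gamma) / z + gamma * m

def estimate_weights(eigs, gamma, t_grid, z, sum_penalty=10.0):
    """Fit weights w_k >= 0 on the atoms t_grid by non-negative least squares."""
    v = stieltjes_from_eigs(eigs, z, gamma)
    # Row j, column k: gamma * t_k / (1 + t_k v_j); target: z_j + 1 / v_j.
    A = gamma * t_grid[None, :] / (1.0 + t_grid[None, :] * v[:, None])
    b = z + 1.0 / v
    # Stack real and imaginary parts, plus a soft constraint sum(w) = 1.
    ones_row = sum_penalty * np.ones((1, t_grid.size))
    A_real = np.vstack([A.real, A.imag, ones_row])
    b_real = np.concatenate([b.real, b.imag, [sum_penalty]])
    w, _ = nnls(A_real, b_real, maxiter=1000)
    return w

# Hypothetical test case: diagonal Sigma with half its eigenvalues 1, half 5.
rng = np.random.default_rng(1)
n, p = 2000, 500
gamma = p / n
sigma_diag = np.repeat([1.0, 5.0], p // 2)
X = rng.standard_normal((n, p)) * np.sqrt(sigma_diag)
eigs = np.linalg.eigvalsh(X.T @ X / n)

z = np.linspace(0.2, 12.0, 60) + 0.5j      # points in C+ away from the real axis
t_grid = np.linspace(0.5, 8.0, 16)          # candidate population eigenvalues
w_hat = estimate_weights(eigs, gamma, t_grid, z)

for t_k, w_k in zip(t_grid, w_hat):
    if w_k > 0.01:
        print(f"t = {t_k:.1f}: weight {w_k:.3f}")
```

The problem is ill-conditioned: nearby atoms produce nearly identical columns, so the recovered weights can smear across neighbouring grid points even when the overall split of mass is right.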

Toy Example: Block Covariance Matrix

Warning: Don’t try this at home


Thanks!

This was fun.

• Colwell, L. J., Qin, Y., Manta, A. and Brenner, M. P. (2013). Signal identification from sample covariance matrices with correlated noise. Under review.

• El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36, 2757–2790.

• Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.) 72, 507–536.

• Silverstein, J. W. and Bai, Z. D. (1995). On the empirical distribution of eigenvalues of a class of large-dimensional random matrices. J. Multivariate Anal. 54(2), 175–192.