On the optimal stacking of noisy observations
Øyvind Ryan
SUPELEC 2010, June 2010
Given A, B two n × n independent square Hermitian (or symmetric) random matrices:

1. Can one derive the eigenvalue distribution of A from the ones of A + B and B?
2. Can one derive the eigenvalue distribution of A from the ones of AB and B?

More generally, such questions can be asked starting with any functional of the involved random matrices. If 1. or 2. can be answered for given random matrices A and B, the corresponding operation for finding the eigenvalue distribution is called deconvolution.
- Deconvolution can be easier to perform in the large-n limit, in particular for Vandermonde matrices [1, 2] and Gaussian matrices [3] (freeness and almost sure convergence).
- For Gaussian matrices, there also exist results in the finite regime. [4] provides a moment-based framework for deconvolution for such matrices.
- The p-th moment of an n × n random matrix A is A_p = E[\mathrm{tr}(A^p)], where E[·] is the expectation and tr the normalized trace. The moments can be used to retrieve the spectrum (a small empirical illustration follows below).
- More generally, we write

  A_{p_1, \ldots, p_k} = A_{p_1} A_{p_2} \cdots A_{p_k}    (1)

  for what we call mixed moments.
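As a small illustration (not from the talk), the moment A_p = E[tr(A^p)] can be approximated empirically by averaging the normalized trace of A^p over independent draws of A; the symmetric Gaussian model below is only an illustrative assumption.

```python
import numpy as np

# Empirical approximation of the p-th moment A_p = E[tr(A^p)],
# with tr the normalized trace. The symmetric Gaussian matrix model
# below is just an illustrative choice.
def normalized_trace_moment(A, p):
    return np.trace(np.linalg.matrix_power(A, p)).real / A.shape[0]

rng = np.random.default_rng(0)
n, draws = 50, 200
samples = []
for _ in range(draws):
    G = rng.standard_normal((n, n))
    A = (G + G.T) / np.sqrt(2 * n)      # symmetric random matrix, for illustration
    samples.append(normalized_trace_moment(A, 2))
print(np.mean(samples))                  # Monte Carlo estimate of E[tr(A^2)]
```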
How can the framework [4] best be adapted for deconvolution/spectrum estimation once we have L observations of a given random matrix model?
A random matrix model takes the form

  Y = f(D, X_1, X_2, \ldots, X_n),

where D is the unknown matrix, the X_i are random matrices with known statistical properties, and f is a known function.

[5] considers the particular case where Y = f(D, X) = D + X, and how the following result from [4] can be adapted to obtain a "best possible" estimator of the spectrum of D from L observations Y_1, ..., Y_L of Y:
Theorem
Let X be an n × N standard, complex, Gaussian matrix, D a deterministic n × N matrix, and set D_p = \mathrm{tr}\bigl(\bigl(\tfrac{1}{N} D D^H\bigr)^p\bigr). We have that

  E\Bigl[\mathrm{tr}\Bigl(\Bigl(\tfrac{1}{N}(D + X)(D + X)^H\Bigr)^p\Bigr)\Bigr]
    = \sum_{\substack{\pi \in SP_p \\ \pi = \pi(\rho_1, \rho_2, q)}}
      \frac{1}{(nN)^{|\rho_1|}}
      N^{k(\rho(\pi)) - kd(\rho(\pi))} n^{l(\rho(\pi)) - ld(\rho(\pi))}
      n^{|\sigma(\pi)|} \prod_i D_{|\sigma(\pi)_i|/2},    (2)

where \sigma(\pi), \rho(\pi), k(\rho(\pi)), kd(\rho(\pi)), l(\rho(\pi)), ld(\rho(\pi)) are as defined in [4], and the l_i are the cardinalities of the blocks of \sigma(\pi) divided by 2.
Theorem 1 can be turned around to the following [5], which can be used for deconvolution:

Theorem
Let Y = D + X be an observation, and let Y_p = \mathrm{tr}\bigl(\bigl(\tfrac{1}{N} Y Y^H\bigr)^p\bigr) be the moments, D_p as before. Then

  \widehat{D}_{p_1, \ldots, p_k}
    = \sum_{\substack{\pi \in SP_p \\ \pi = \pi(\rho_1, \rho_2, q)}}
      \frac{(-1)^{|\rho_1|} n^{|\sigma(\pi)| - k}}{N^{|\rho_1|}}
      N^{k(\rho(\pi)) - kd(\rho(\pi))} n^{l(\rho(\pi)) - ld(\rho(\pi))}
      Y_{l_1, \ldots, l_r}    (3)

is an unbiased estimator for D_{p_1, \ldots, p_k} for all p, i.e. E\bigl[\widehat{D}_{p_1, \ldots, p_k}\bigr] = D_{p_1, \ldots, p_k}. In particular, \widehat{D}_p is an unbiased estimator for D_p.
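As a minimal numerical sanity check (ours, not from the talk), consider the lowest-order case p = 1, assuming the entries of X are i.i.d. CN(0,1): the estimator (3) then reduces to \widehat{D}_1 = Y_1 − 1, since E[tr((1/N) Y Y^H)] = D_1 + 1 for the additive model. The general case requires the set-partition machinery of [4] and is not sketched here.

```python
import numpy as np

# Sanity check of the p = 1 case of (3), assuming i.i.d. CN(0,1) noise entries:
# D_1_hat = Y_1 - 1 should be unbiased for D_1 = tr((1/N) D D^H).
rng = np.random.default_rng(0)
n, N = 4, 8
D = rng.standard_normal((n, N))                      # deterministic once drawn
D1 = np.trace(D @ D.T) / (n * N)                     # normalized trace of (1/N) D D^H

def D1_hat(Y):
    Y1 = np.trace(Y @ Y.conj().T).real / (n * N)     # Y_1 = tr((1/N) Y Y^H)
    return Y1 - 1.0                                  # remove the noise contribution

trials = 20000
estimates = []
for _ in range(trials):
    X = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
    estimates.append(D1_hat(D + X))
print(D1, np.mean(estimates))                        # the two numbers should be close
```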
- From L = L_1 L_2 observations Y_1, ..., Y_L of Y, we can form the (nL_1) × (NL_2) compound observation matrix, denoted Y_{L_1, L_2}, by stacking the observations into an L_1 × L_2 block matrix in a given order (see the sketch below).
- Similarly, if D is non-random, we will denote by D_{L_1, L_2} the compound matrix formed in the same way from D.
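A minimal sketch of the stacking itself (the row-by-row placement below is one arbitrary but fixed choice of "given order"; the function name is ours):

```python
import numpy as np

def stack_observations(obs, L1, L2):
    """Form the (n*L1) x (N*L2) compound observation matrix from a list of
    L = L1*L2 observations of size n x N, placed in an L1 x L2 grid row by row."""
    assert len(obs) == L1 * L2
    rows = [np.hstack(obs[i * L2:(i + 1) * L2]) for i in range(L1)]
    return np.vstack(rows)

# Example with L = 6 observations of size 3 x 4:
rng = np.random.default_rng(1)
obs = [rng.standard_normal((3, 4)) for _ in range(6)]
print(stack_observations(obs, 6, 1).shape)   # (18, 4)  vertical stacking (V)
print(stack_observations(obs, 1, 6).shape)   # (3, 24)  horizontal stacking (H)
print(stack_observations(obs, 2, 3).shape)   # (6, 12)  rectangular stacking (R)
```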
Some questions arise:
- Can the compound observation matrix be put into the framework of [4]?
- We can stack observations in any way we want: horizontally (H), vertically (V), or rectangularly (R). Is one stacking better than another? Which stacking is "optimal"?
- There are many other ways of combining observations than stacking (averaging etc.). Is there a general way of finding which way is best?
Since Y_{L_1, L_2} = D_{L_1, L_2} + X_{L_1, L_2}, X_{L_1, L_2} is again Gaussian, and since the moments of D_{L_1, L_2} and D are directly related, we can prove:

Theorem
Let instead Y_p be the moments

  Y_p = \mathrm{tr}\Bigl(\Bigl(\tfrac{1}{NL_2} Y_{L_1, L_2} Y_{L_1, L_2}^H\Bigr)^p\Bigr).    (4)

Then

  \widehat{D}_{p_1, \ldots, p_k, L_1, L_2}
    = L_1^{k - p_1 - \cdots - p_k}
      \sum_{\substack{\pi \in SP_p \\ \pi = \pi(\rho_1, \rho_2, q)}}
      \frac{(-1)^{|\rho_1|} (nL_1)^{|\sigma(\pi)| - k}}{(NL_2)^{|\rho_1|}}
      (L_2 N)^{k(\rho(\pi)) - kd(\rho(\pi))} (L_1 n)^{l(\rho(\pi)) - ld(\rho(\pi))}
      Y_{l_1, \ldots, l_r}    (5)

is also an unbiased estimator for D_{p_1, \ldots, p_k}, for any L_1, L_2. In particular, \widehat{D}_{p, L_1, L_2} is an unbiased estimator for D_p.
- We will denote by v_{p, ·, L} the variance of \widehat{D}_{p, L_1, L_2}, where · denotes the stacking (H, V, R, A).
- The following is the main result in [5]:
The v_{p, ·, L} are all O(L^{-1}). Moreover,

  \lim_{L \to \infty} L v_{1, ·, L} = \frac{2}{nN} D_1 + \frac{1}{nN},

and for p ≥ 2

  \lim_{L \to \infty} L v_{p, R, L} = \frac{2p^2}{nN} D_{2p-1} + Q(D_{2p-3}, \ldots, D_1),
  \lim_{L \to \infty} L v_{p, V, L} = \frac{2p^2}{nN} D_{2p-1} + \frac{p^2}{N^2} D_{2p-2},
  \lim_{L \to \infty} L v_{p, H, L} = \frac{2p^2}{nN} D_{2p-1} + \frac{p^2}{n^2} D_{2p-2},
  \lim_{L \to \infty} L v_{p, A, L} = \frac{2p^2}{nN} D_{2p-1} + \left(\frac{p^2}{n^2} + \frac{p^2}{N^2} + \frac{p^2}{nN}\right) D_{2p-2}.
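A Monte Carlo sanity check (ours, not from the talk) of the p = 1 limit above, assuming i.i.d. CN(0,1) noise entries: for p = 1 the estimator reduces to Y_1 − 1 computed on the compound matrix, and L · var(\widehat{D}_1) should be close to (2 D_1 + 1)/(nN).

```python
import numpy as np

# Monte Carlo check of  lim L * v_{1,.,L} = (2*D_1 + 1) / (n*N),
# assuming i.i.d. CN(0,1) noise entries and horizontal stacking.
rng = np.random.default_rng(3)
n, N, L = 4, 8, 25
D = rng.standard_normal((n, N))
D1 = np.trace(D @ D.T) / (n * N)
Dc = np.hstack([D] * L)                          # compound deterministic part, n x (N*L)

def D1_hat():
    Xc = (rng.standard_normal((n, N * L)) + 1j * rng.standard_normal((n, N * L))) / np.sqrt(2)
    Y = Dc + Xc
    return np.trace(Y @ Y.conj().T).real / (n * N * L) - 1.0

estimates = [D1_hat() for _ in range(20000)]
print(L * np.var(estimates), (2 * D1 + 1) / (n * N))   # these should agree closely
```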
In particular, all rectangular stackings asymptotically have the same variance, and

  \lim_{L \to \infty} L v_{p, R, L} \le \lim_{L \to \infty} L v_{p, V, L} \le \lim_{L \to \infty} L v_{p, A, L},
  \lim_{L \to \infty} L v_{p, R, L} \le \lim_{L \to \infty} L v_{p, H, L} \le \lim_{L \to \infty} L v_{p, A, L}.

Also,
- the variance decreases with L for a fixed stacking aspect ratio,
- for a given L and any rectangular stackings R_1, R_2 into L = L_1^{(1)} × L_2^{(1)} and L = L_1^{(2)} × L_2^{(2)} observations, respectively, v_{p, R_1, L} < v_{p, R_2, L} if and only if the (nL_1^{(1)}) × (NL_2^{(1)}) compound observation matrix is more square than the (nL_1^{(2)}) × (NL_2^{(2)}) compound observation matrix,
- v_{p, ·, L} < v_{p, A, L} for any stacking.
- We find concrete expressions for the variances v_{p, ·, L}, similar to the ones obtained for the estimators themselves (5).
- In these expressions we find that the polynomials Q in the main theorem have only positive coefficients, and can be written as sums of terms of the form

    \left(\frac{nL_1}{NL_2}\right)^k + \left(\frac{nL_1}{NL_2}\right)^{-k}.

- The function c^k + c^{-k} attains its minimum on (0, ∞) at c = 1.
- c = 1 corresponds to nL_1 = NL_2, i.e. to a square compound observation matrix.
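A small helper (ours, not from [5]) that applies this "as square as possible" criterion: among the factorizations L = L_1 · L_2, pick the one whose aspect ratio c = nL_1/(NL_2) is closest to 1.

```python
import numpy as np

# Pick the rectangular stacking L = L1 * L2 whose compound matrix (n*L1) x (N*L2)
# is closest to square, i.e. whose ratio c = n*L1 / (N*L2) is closest to 1.
# This minimizes terms of the form c**k + c**(-k), whose minimum on (0, inf) is at c = 1.
def best_rectangular_stacking(n, N, L):
    factorizations = [(L1, L // L1) for L1 in range(1, L + 1) if L % L1 == 0]
    return min(factorizations, key=lambda f: abs(np.log(n * f[0] / (N * f[1]))))

print(best_rectangular_stacking(4, 8, 50))   # (10, 5): here n*L1 = N*L2 = 40 exactly
```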
- The figure displays L v_{3, ·, L} for the different estimators for the additive model, for different numbers of observations L.
- A diagonal matrix D with entries 2, 1, 1, 0.5 on the diagonal has been chosen.
- The three straight lines are the theoretical limits \lim_{L \to \infty} L v_{3, ·, L} for rectangular stacking, horizontal stacking, and averaging, as predicted by the main theorem, in increasing order.
- It is seen that an aspect ratio near 1 gives the lowest variance, and that the variances decrease towards the theoretical limits predicted by the main theorem as L increases.
Figure: Simulation for the additive model. (a) L = 5. (b) L = 50; empirical variances are also shown for c = 0.02, 0.08, 0.5, 2.
- The model Y = f(D, X_1, X_2) = D X_1 + X_2, with X_1, X_2 Gaussian, also fits into the deconvolution framework of [4].
- Since the compound matrix Y_{1, L} = D X_{1, 1, L} + X_{2, 1, L} follows the same model, we can also apply the framework to this matrix (as illustrated in the sketch below).
- We have compared the empirical variances of the corresponding moment estimators in the case of stacking with the case where we perform averaging.
- We have not computed the mathematical expression for the variance itself, as this is more involved.
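A minimal sketch (ours) of why horizontal stacking preserves this model: stacking the observations horizontally equals D times the stacked X_1 plus the stacked X_2, so Y_{1,L} again has the form D X_1 + X_2 with standard Gaussian factors. The dimensions below are illustrative assumptions.

```python
import numpy as np

# Horizontally stacked observations of Y = D X1 + X2 follow the same model:
# hstack(Y_i) = D @ hstack(X1_i) + hstack(X2_i).
rng = np.random.default_rng(2)
n, N, L = 3, 4, 5
D = rng.standard_normal((n, n))

def cgauss(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

X1 = [cgauss((n, N)) for _ in range(L)]
X2 = [cgauss((n, N)) for _ in range(L)]
Y = [D @ X1[i] + X2[i] for i in range(L)]

Y_1L = np.hstack(Y)                                           # compound matrix, n x (N*L)
print(np.allclose(Y_1L, D @ np.hstack(X1) + np.hstack(X2)))   # True
```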
Figure: The empirical variances for the estimators for the third moment of D D^H for the model Y = D X_1 + X_2 (horizontal stacking vs. averaging). The same matrix D as in Figure 1 was chosen. For each L the estimator was run 50 times on a set of L observations, and the empirical variance was computed from this. It seems that the empirical variance is lower for the case of horizontal stacking, suggesting that results on stacking valid for the additive model may have validity for more general models also.
- What is the best way, in more general terms, of combining observations? Initial calculations suggest that averaging observations Y_i = D + X_i before the deconvolution framework is applied has variance comparable with that of stacking.
- Finding expressions for the variance for more general models f(D, X_1, X_2, ..., X_n), and analyzing these in the same way.
- How can more general models be stacked?
- We have only considered estimators which perform averaging or stacking of observations. Future work could consider non-linear or other ways of combining observations, and compare the results with those obtained here.
- The case when the noise is not Gaussian (the main result has some significance here as well), also in the finite regime.
- This talk is available at http://folk.uio.no/oyvindry/talks.shtml
- My publications are listed at http://folk.uio.no/oyvindry/publications.shtml

THANK YOU!
[1] Ø. Ryan and M. Debbah, "Asymptotic behaviour of random Vandermonde matrices with entries on the unit circle," IEEE Trans. on Information Theory, vol. 55, no. 7, pp. 3115–3148, 2009.
[2] Ø. Ryan and M. Debbah, "Convolution operations arising from Vandermonde matrices," submitted to IEEE Trans. on Information Theory, 2009.
[3] Ø. Ryan and M. Debbah, "Free deconvolution for signal processing applications," submitted to IEEE Trans. on Information Theory, 2007, http://arxiv.org/abs/cs.IT/0701025.
[4] Ø. Ryan, A. Masucci, S. Yang, and M. Debbah, "Finite dimensional statistical inference," submitted to IEEE Trans. on Information Theory, 2009.
[5] Ø. Ryan, "On the optimal stacking of noisy observations," submitted to IEEE Trans. Signal Process., 2010.