On the optimal stacking of noisy observations
Øyvind Ryan
SUPELEC 2010, June 2010
Given A, B two n × n independent square Hermitian (or symmetric) random matrices:

1. Can one derive the eigenvalue distribution of A from the ones of A + B and B?
2. Can one derive the eigenvalue distribution of A from the ones of AB and B?

More generally, such questions can be asked starting with any functional of the involved random matrices. If 1. or 2. can be answered for given random matrices A and B, the corresponding operation for finding the eigenvalue distribution is called deconvolution.
- Deconvolution can be easier to perform in the large-n limit, in particular for Vandermonde matrices [1, 2] and Gaussian matrices [3] (freeness and almost sure convergence).
- For Gaussian matrices, there also exist results in the finite regime. [4] provides a moment-based framework for deconvolution for such matrices.
- The p-th moment of an n × n random matrix A is A_p = E[\mathrm{tr}(A^p)], where E[·] is the expectation and tr the normalized trace. The moments can be used to retrieve the spectrum (a small empirical illustration follows below).
- More generally, we write

  A_{p_1, \ldots, p_k} = A_{p_1} A_{p_2} \cdots A_{p_k}    (1)

  for what we call mixed moments.
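As a small illustration (not from the talk), the moment A_p = E[tr(A^p)] can be approximated empirically by averaging the normalized trace of A^p over independent draws of A; the symmetric Gaussian model below is only an illustrative assumption.

```python
import numpy as np

# Empirical approximation of the p-th moment A_p = E[tr(A^p)],
# with tr the normalized trace. The symmetric Gaussian matrix model
# below is just an illustrative choice.
def normalized_trace_moment(A, p):
    return np.trace(np.linalg.matrix_power(A, p)).real / A.shape[0]

rng = np.random.default_rng(0)
n, draws = 50, 200
samples = []
for _ in range(draws):
    G = rng.standard_normal((n, n))
    A = (G + G.T) / np.sqrt(2 * n)      # symmetric random matrix, for illustration
    samples.append(normalized_trace_moment(A, 2))
print(np.mean(samples))                  # Monte Carlo estimate of E[tr(A^2)]
```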
How can the framework [4] best be adapted for deconvolution/spectrum estimation once we have L observations of a given random matrix model?
A random matrix model takes the form

  Y = f(D, X_1, X_2, \ldots, X_n),

where D is the unknown matrix, the X_i are random matrices with known statistical properties, and f is a known function.

[5] considers the particular case where Y = f(D, X) = D + X, and how the following result from [4] can be adapted to obtain a "best possible" estimator of the spectrum of D from L observations Y_1, ..., Y_L of Y:
Theorem
Let X be an n × N standard, complex, Gaussian matrix, D a deterministic n × N matrix, and set D_p = \mathrm{tr}\bigl(\bigl(\tfrac{1}{N} D D^H\bigr)^p\bigr). We have that

  E\Bigl[\mathrm{tr}\Bigl(\Bigl(\tfrac{1}{N}(D + X)(D + X)^H\Bigr)^p\Bigr)\Bigr]
    = \sum_{\substack{\pi \in SP_p \\ \pi = \pi(\rho_1, \rho_2, q)}}
      \frac{1}{(nN)^{|\rho_1|}}
      N^{k(\rho(\pi)) - kd(\rho(\pi))} n^{l(\rho(\pi)) - ld(\rho(\pi))}
      n^{|\sigma(\pi)|} \prod_i D_{|\sigma(\pi)_i|/2},    (2)

where \sigma(\pi), \rho(\pi), k(\rho(\pi)), kd(\rho(\pi)), l(\rho(\pi)), ld(\rho(\pi)) are as defined in [4], and the l_i are the cardinalities of the blocks of \sigma(\pi) divided by 2.
Theorem 1 can be turned around to the following [5], which can be used for deconvolution:

Theorem
Let Y = D + X be an observation, and let Y_p = \mathrm{tr}\bigl(\bigl(\tfrac{1}{N} Y Y^H\bigr)^p\bigr) be the moments, D_p as before. Then

  \widehat{D}_{p_1, \ldots, p_k}
    = \sum_{\substack{\pi \in SP_p \\ \pi = \pi(\rho_1, \rho_2, q)}}
      \frac{(-1)^{|\rho_1|} n^{|\sigma(\pi)| - k}}{N^{|\rho_1|}}
      N^{k(\rho(\pi)) - kd(\rho(\pi))} n^{l(\rho(\pi)) - ld(\rho(\pi))}
      Y_{l_1, \ldots, l_r}    (3)

is an unbiased estimator for D_{p_1, \ldots, p_k} for all p, i.e. E\bigl[\widehat{D}_{p_1, \ldots, p_k}\bigr] = D_{p_1, \ldots, p_k}. In particular, \widehat{D}_p is an unbiased estimator for D_p.
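As a minimal numerical sanity check (ours, not from the talk), consider the lowest-order case p = 1, assuming the entries of X are i.i.d. CN(0,1): the estimator (3) then reduces to \widehat{D}_1 = Y_1 − 1, since E[tr((1/N) Y Y^H)] = D_1 + 1 for the additive model. The general case requires the set-partition machinery of [4] and is not sketched here.

```python
import numpy as np

# Sanity check of the p = 1 case of (3), assuming i.i.d. CN(0,1) noise entries:
# D_1_hat = Y_1 - 1 should be unbiased for D_1 = tr((1/N) D D^H).
rng = np.random.default_rng(0)
n, N = 4, 8
D = rng.standard_normal((n, N))                      # deterministic once drawn
D1 = np.trace(D @ D.T) / (n * N)                     # normalized trace of (1/N) D D^H

def D1_hat(Y):
    Y1 = np.trace(Y @ Y.conj().T).real / (n * N)     # Y_1 = tr((1/N) Y Y^H)
    return Y1 - 1.0                                  # remove the noise contribution

trials = 20000
estimates = []
for _ in range(trials):
    X = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
    estimates.append(D1_hat(D + X))
print(D1, np.mean(estimates))                        # the two numbers should be close
```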
- From L = L_1 L_2 observations Y_1, ..., Y_L of Y, we can form the (nL_1) × (NL_2) compound observation matrix, denoted Y_{L_1, L_2}, by stacking the observations into an L_1 × L_2 block matrix in a given order (see the sketch below).
- Similarly, if D is non-random, we will denote by D_{L_1, L_2} the compound matrix formed in the same way from D.
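A minimal sketch of the stacking itself (the row-by-row placement below is one arbitrary but fixed choice of "given order"; the function name is ours):

```python
import numpy as np

def stack_observations(obs, L1, L2):
    """Form the (n*L1) x (N*L2) compound observation matrix from a list of
    L = L1*L2 observations of size n x N, placed in an L1 x L2 grid row by row."""
    assert len(obs) == L1 * L2
    rows = [np.hstack(obs[i * L2:(i + 1) * L2]) for i in range(L1)]
    return np.vstack(rows)

# Example with L = 6 observations of size 3 x 4:
rng = np.random.default_rng(1)
obs = [rng.standard_normal((3, 4)) for _ in range(6)]
print(stack_observations(obs, 6, 1).shape)   # (18, 4)  vertical stacking (V)
print(stack_observations(obs, 1, 6).shape)   # (3, 24)  horizontal stacking (H)
print(stack_observations(obs, 2, 3).shape)   # (6, 12)  rectangular stacking (R)
```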
Some questions arise:
- Can the compound observation matrix be put into the framework of [4]?
- We can stack observations in any way we want: horizontally (H), vertically (V), or rectangularly (R). Is one stacking better than another? Which stacking is "optimal"?
- There are many other ways of combining observations than stacking (averaging etc.). Is there a general way of finding which way is best?
Since Y_{L_1, L_2} = D_{L_1, L_2} + X_{L_1, L_2}, X_{L_1, L_2} is again Gaussian, and since the moments of D_{L_1, L_2} and D are directly related, we can prove:

Theorem
Let instead Y_p be the moments

  Y_p = \mathrm{tr}\Bigl(\Bigl(\tfrac{1}{NL_2} Y_{L_1, L_2} Y_{L_1, L_2}^H\Bigr)^p\Bigr).    (4)

Then

  \widehat{D}_{p_1, \ldots, p_k, L_1, L_2}
    = L_1^{k - p_1 - \cdots - p_k}
      \sum_{\substack{\pi \in SP_p \\ \pi = \pi(\rho_1, \rho_2, q)}}
      \frac{(-1)^{|\rho_1|} (nL_1)^{|\sigma(\pi)| - k}}{(NL_2)^{|\rho_1|}}
      (L_2 N)^{k(\rho(\pi)) - kd(\rho(\pi))} (L_1 n)^{l(\rho(\pi)) - ld(\rho(\pi))}
      Y_{l_1, \ldots, l_r}    (5)

is also an unbiased estimator for D_{p_1, \ldots, p_k}, for any L_1, L_2. In particular, \widehat{D}_{p, L_1, L_2} is an unbiased estimator for D_p.
- We will denote by v_{p, ·, L} the variance of \widehat{D}_{p, L_1, L_2}, where · denotes the stacking (H, V, R, A).
- The following is the main result in [5]:
The v_{p, ·, L} are all O(L^{-1}). Moreover,

  \lim_{L \to \infty} L v_{1, ·, L} = \frac{2}{nN} D_1 + \frac{1}{nN},

and for p ≥ 2

  \lim_{L \to \infty} L v_{p, R, L} = \frac{2p^2}{nN} D_{2p-1} + Q(D_{2p-3}, \ldots, D_1),
  \lim_{L \to \infty} L v_{p, V, L} = \frac{2p^2}{nN} D_{2p-1} + \frac{p^2}{N^2} D_{2p-2},
  \lim_{L \to \infty} L v_{p, H, L} = \frac{2p^2}{nN} D_{2p-1} + \frac{p^2}{n^2} D_{2p-2},
  \lim_{L \to \infty} L v_{p, A, L} = \frac{2p^2}{nN} D_{2p-1} + \left(\frac{p^2}{n^2} + \frac{p^2}{N^2} + \frac{p^2}{nN}\right) D_{2p-2}.
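A Monte Carlo sanity check (ours, not from the talk) of the p = 1 limit above, assuming i.i.d. CN(0,1) noise entries: for p = 1 the estimator reduces to Y_1 − 1 computed on the compound matrix, and L · var(\widehat{D}_1) should be close to (2 D_1 + 1)/(nN).

```python
import numpy as np

# Monte Carlo check of  lim L * v_{1,.,L} = (2*D_1 + 1) / (n*N),
# assuming i.i.d. CN(0,1) noise entries and horizontal stacking.
rng = np.random.default_rng(3)
n, N, L = 4, 8, 25
D = rng.standard_normal((n, N))
D1 = np.trace(D @ D.T) / (n * N)
Dc = np.hstack([D] * L)                          # compound deterministic part, n x (N*L)

def D1_hat():
    Xc = (rng.standard_normal((n, N * L)) + 1j * rng.standard_normal((n, N * L))) / np.sqrt(2)
    Y = Dc + Xc
    return np.trace(Y @ Y.conj().T).real / (n * N * L) - 1.0

estimates = [D1_hat() for _ in range(20000)]
print(L * np.var(estimates), (2 * D1 + 1) / (n * N))   # these should agree closely
```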
In particular, all rectangular stackings asymptotically have the same variance, and

  \lim_{L \to \infty} L v_{p, R, L} \le \lim_{L \to \infty} L v_{p, V, L} \le \lim_{L \to \infty} L v_{p, A, L},
  \lim_{L \to \infty} L v_{p, R, L} \le \lim_{L \to \infty} L v_{p, H, L} \le \lim_{L \to \infty} L v_{p, A, L}.

Also,
- the variance decreases with L for a fixed stacking aspect ratio,
- for a given L and any rectangular stackings R_1, R_2 into L = L_1^{(1)} × L_2^{(1)} and L = L_1^{(2)} × L_2^{(2)} observations, respectively, v_{p, R_1, L} < v_{p, R_2, L} if and only if the (nL_1^{(1)}) × (NL_2^{(1)}) compound observation matrix is more square than the (nL_1^{(2)}) × (NL_2^{(2)}) compound observation matrix,
- v_{p, ·, L} < v_{p, A, L} for any stacking.
- We find concrete expressions for the variances v_{p, ·, L}, similar to the ones obtained for the estimators themselves (5).
- In these expressions we find that the polynomials Q in the main theorem have only positive coefficients, and can be written as sums of terms of the form

    \left(\frac{nL_1}{NL_2}\right)^k + \left(\frac{nL_1}{NL_2}\right)^{-k}.

- The function c^k + c^{-k} attains its minimum on (0, ∞) at c = 1.
- c = 1 corresponds to nL_1 = NL_2, i.e. to a square compound observation matrix.
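A small helper (ours, not from [5]) that applies this "as square as possible" criterion: among the factorizations L = L_1 · L_2, pick the one whose aspect ratio c = nL_1/(NL_2) is closest to 1.

```python
import numpy as np

# Pick the rectangular stacking L = L1 * L2 whose compound matrix (n*L1) x (N*L2)
# is closest to square, i.e. whose ratio c = n*L1 / (N*L2) is closest to 1.
# This minimizes terms of the form c**k + c**(-k), whose minimum on (0, inf) is at c = 1.
def best_rectangular_stacking(n, N, L):
    factorizations = [(L1, L // L1) for L1 in range(1, L + 1) if L % L1 == 0]
    return min(factorizations, key=lambda f: abs(np.log(n * f[0] / (N * f[1]))))

print(best_rectangular_stacking(4, 8, 50))   # (10, 5): here n*L1 = N*L2 = 40 exactly
```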
- The figure displays L v_{3, ·, L} for the different estimators for the additive model, for different numbers of observations L.
- A diagonal matrix D with entries 2, 1, 1, 0.5 on the diagonal has been chosen.
- The three straight lines are the theoretical limits \lim_{L \to \infty} L v_{3, ·, L} for rectangular stacking, horizontal stacking, and averaging, as predicted by the main theorem, in increasing order.
- It is seen that an aspect ratio near 1 gives the lowest variance, and that the variances decrease towards the theoretical limits predicted by the main theorem as L increases.
Figure: Simulation for the additive model. (a) L = 5. (b) L = 50; empirical variances are also shown for c = 0.02, 0.08, 0.5, 2.
- The model Y = f(D, X_1, X_2) = D X_1 + X_2, with X_1, X_2 Gaussian, also fits into the deconvolution framework of [4].
- Since the compound matrix Y_{1, L} = D X_{1, 1, L} + X_{2, 1, L} follows the same model, we can also apply the framework to this matrix (as illustrated in the sketch below).
- We have compared the empirical variances of the corresponding moment estimators in the case of stacking with the case where we perform averaging.
- We have not computed the mathematical expression for the variance itself, as this is more involved.
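A minimal sketch (ours) of why horizontal stacking preserves this model: stacking the observations horizontally equals D times the stacked X_1 plus the stacked X_2, so Y_{1,L} again has the form D X_1 + X_2 with standard Gaussian factors. The dimensions below are illustrative assumptions.

```python
import numpy as np

# Horizontally stacked observations of Y = D X1 + X2 follow the same model:
# hstack(Y_i) = D @ hstack(X1_i) + hstack(X2_i).
rng = np.random.default_rng(2)
n, N, L = 3, 4, 5
D = rng.standard_normal((n, n))

def cgauss(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

X1 = [cgauss((n, N)) for _ in range(L)]
X2 = [cgauss((n, N)) for _ in range(L)]
Y = [D @ X1[i] + X2[i] for i in range(L)]

Y_1L = np.hstack(Y)                                           # compound matrix, n x (N*L)
print(np.allclose(Y_1L, D @ np.hstack(X1) + np.hstack(X2)))   # True
```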
Figure: The empirical variances for the estimators for the third moment of D D^H for the model Y = D X_1 + X_2 (horizontal stacking vs. averaging). The same matrix D as in Figure 1 was chosen. For each L the estimator was run 50 times on a set of L observations, and the empirical variance was computed from this. It seems that the empirical variance is lower for the case of horizontal stacking, suggesting that results on stacking valid for the additive model may have validity for more general models also.
- What is the best way, in more general terms, of combining observations? Initial calculations suggest that averaging observations Y_i = D + X_i before the deconvolution framework is applied has variance comparable with that of stacking.
- Finding expressions for the variance for more general models f(D, X_1, X_2, ..., X_n), and analyzing these in the same way.
- How can more general models be stacked?
- We have only considered estimators which perform averaging or stacking of observations. Future work could consider non-linear or other ways of combining observations, and compare the results with those obtained here.
- The case when the noise is not Gaussian (the main result has some significance here as well), also in the finite regime.
- This talk is available at http://folk.uio.no/oyvindry/talks.shtml
- My publications are listed at http://folk.uio.no/oyvindry/publications.shtml

THANK YOU!
[1] Ø. Ryan and M. Debbah, "Asymptotic behaviour of random Vandermonde matrices with entries on the unit circle," IEEE Trans. on Information Theory, vol. 55, no. 7, pp. 3115–3148, 2009.
[2] Ø. Ryan and M. Debbah, "Convolution operations arising from Vandermonde matrices," submitted to IEEE Trans. on Information Theory, 2009.
[3] Ø. Ryan and M. Debbah, "Free deconvolution for signal processing applications," submitted to IEEE Trans. on Information Theory, 2007, http://arxiv.org/abs/cs.IT/0701025.
[4] Ø. Ryan, A. Masucci, S. Yang, and M. Debbah, "Finite dimensional statistical inference," submitted to IEEE Trans. on Information Theory, 2009.
[5] Ø. Ryan, "On the optimal stacking of noisy observations," submitted to IEEE Trans. Signal Process., 2010.