Applying multiplicative free deconvolution to find limiting eigenvalue distributions of random matrices

Øyvind Ryan and Merouane Debbah
Abstract
We reprove some well-known theorems on limiting eigenvalue distributions of random matrices using free probability methods. The method we use is multiplicative free deconvolution, a tool which we show can simplify both expressions for limiting eigenvalue distributions and expressions for estimators of the spectral function of covariance matrices. The method is also explained in a purely free probabilistic framework, i.e. independent of a random matrix setting.
Result of Dozier/Silverstein [1]
Let $F^{\mu_A}$ be the empirical distribution function of the eigenvalues of $A$ ($F^{\mu_A}(x)$ is the proportion of the eigenvalues of $A$ less than $x$). $\mu_A$ means the probability measure associated with the distribution of $A$. $F^{\mu_{A_n}} \overset{D}{\to} F^{\mu_A}$ will denote weak convergence. The Stieltjes transform of a matrix $A$ with distribution function $F^{\mu_A}(x)$ is defined by
$$G_{\mu_A}(z) = \int \frac{1}{\lambda - z} \, dF^{\mu_A}(\lambda). \qquad (1)$$
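Numerically, (1) is just the normalized trace of the resolvent $(A - zI)^{-1}$, so it can be evaluated directly from the eigenvalues. A minimal sketch in Python (the helper name is ours, not from any referenced library):

    import numpy as np

    def stieltjes_transform(A, z):
        # G_{mu_A}(z) = (1/n) sum_i 1/(lambda_i - z), the normalized trace
        # of the resolvent (A - zI)^{-1}, for Hermitian A and z off the spectrum.
        eigenvalues = np.linalg.eigvalsh(A)
        return np.mean(1.0 / (eigenvalues - z))

    # Example: for A = I, G_{mu_A}(z) = 1/(1 - z).
    print(stieltjes_transform(np.eye(4), 1j))   # approx 0.5 + 0.5j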
The following terminology and restrictions will be used:

1. For $n = 1, 2, \ldots$, $X_n = (X_{ij}^n)$ is $n \times N$, with entries i.d. for all $i, j, n$, independent across $i, j$ for each $n$, and $\mathrm{E}|X_{11}^1 - \mathrm{E}X_{11}^1|^2 = 1$.
2. $R_n$ is $n \times N$ and independent of $X_n$, with $F^{\mu_{\frac{1}{N} R_n R_n^*}} \overset{D}{\to} F^{\mu_R}$ a.s., where $F^{\mu_R}$ is a nonrandom p.d.f.
3. $\lim_{n \to \infty} \frac{n}{N} = c$.
4. $W_n = \frac{1}{N}(R_n + \sigma X_n)(R_n + \sigma X_n)^*$.

We will study the following theorem [1]:
Theorem 1 Under assumptions 1-4, $F^{\mu_{W_n}} \overset{D}{\to} F^{\mu_W}$ a.s., where $F^{\mu_W}$ is a nonrandom p.d.f. characterized by
$$G_{\mu_W}(z) = \int \frac{dF^{\mu_R}(t)}{\frac{t}{1 + \sigma^2 c\, G_{\mu_W}(z)} - \left(1 + \sigma^2 c\, G_{\mu_W}(z)\right) z + \sigma^2 (1 - c)}. \qquad (2)$$
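Equation (2) is a self-consistent equation for $G_{\mu_W}(z)$, so it lends itself to fixed-point iteration. A rough numerical sketch (our own illustration, not from [1]; damping and iteration counts are ad hoc, and convergence is only expected for $z$ away from the real axis):

    import numpy as np

    def solve_eq2(z, sigma, c, R_eigenvalues, n_iter=2000):
        # Fixed-point iteration for G_{mu_W}(z) in (2), with F^{mu_R} taken
        # as the empirical measure of the given eigenvalues of (1/N) R R^*.
        t = np.asarray(R_eigenvalues, dtype=complex)
        G = -1.0 / z                      # initial guess: transform of delta_0
        for _ in range(n_iter):
            denom = (t / (1 + sigma**2 * c * G)
                     - (1 + sigma**2 * c * G) * z + sigma**2 * (1 - c))
            G = 0.5 * G + 0.5 * np.mean(1.0 / denom)   # damped update
        return G

    # Pure-noise check (R = 0, sigma = 1): mu_W is then the Marchenko-Pastur law.
    print(solve_eq2(z=1.0 + 0.5j, sigma=1.0, c=0.5, R_eigenvalues=[0.0]))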
We will attempt to rewrite this expression in a simpler form using free probability constructs, introduced in the next section.
Free probability
Definition 1 Let $\phi$ be a normalized linear functional on the $*$-algebra $A$. $(A, \phi)$ is called a noncommutative probability space. A family of unital $*$-subalgebras $(A_i)_{i \in I} \subset A$ will be called a free family if
$$\left. \begin{array}{l} a_j \in A_{i_j} \\ i_1 \neq i_2, \; i_2 \neq i_3, \; \ldots, \; i_{n-1} \neq i_n \\ \phi(a_1) = \phi(a_2) = \cdots = \phi(a_n) = 0 \end{array} \right\} \Rightarrow \phi(a_1 \cdots a_n) = 0. \qquad (3)$$
An important noncommutative probability space is $(M_n(\mathbb{C}), \tau_n)$, where $\tau_n$ is the normalized trace on $M_n(\mathbb{C})$. This space is the stage for combining free probability and random matrix theory.
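As a quick illustration (a standard consequence of Definition 1, not spelled out in the original): mixed moments of free elements factor. Centering $a$ and $b$ and applying (3) to the product of the centered elements gives
$$\phi\big((a - \phi(a)1)(b - \phi(b)1)\big) = 0 \quad \Longrightarrow \quad \phi(ab) = \phi(a)\phi(b).$$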
Definition 2 If $a$ and $b$ are free in $(A, \phi)$, then the distributions of $a + b$ and $ab$ depend only on the distributions of $a$ and $b$, not on their realizations. We denote by $\mu_a \boxplus \mu_b$ the distribution of $a + b$, and by $\mu_a \boxtimes \mu_b$ the distribution of $ab$.

Given probability measures $\mu$ and $\mu_2$, when there is a unique probability measure $\mu_1$ such that $\mu = \mu_1 \boxtimes \mu_2$, we will write $\mu_1 = \mu \boxslash \mu_2$. We say that $\mu_1$ is the multiplicative free deconvolution of $\mu$ with $\mu_2$.
Analogies with classical probability
1. Freeness is the analogue of independence.
2. The R-transform: a transform on probability distributions which satisfies $R_{\mu_a \boxplus \mu_b}(z) = R_{\mu_a}(z) + R_{\mu_b}(z)$. This is the analogue of the logarithm of the Fourier transform for classical random variables.
3. The S-transform: a transform on probability distributions which satisfies $S_{\mu_a \boxtimes \mu_b}(z) = S_{\mu_a}(z) S_{\mu_b}(z)$.
4. Free central limit theorem: if $a_1, a_2, \ldots$ are free, $\phi(a_i) = 0$, $\phi(a_i^2) = 1$, and $\sup_i |\phi(a_i^k)| < \infty$ for all $k$, then the sequence $(a_1 + a_2 + \cdots + a_n)/\sqrt{n}$ converges in distribution to the probability measure with density $\frac{1}{2\pi}\sqrt{4 - x^2}$ on $[-2, 2]$. This is called the semicircular law, and is the analogue of the normal law of classical probability.
5. Poisson distributions in classical probability have their analogue in the free Poisson distributions. These are given by the so-called Marčenko-Pastur law (see the numerical sketch after this list). The Marčenko-Pastur law $\mu_c$ with parameter $c$ is characterized by the density
$$f^{\mu_c}(x) = \left(1 - \frac{1}{c}\right)^{\!+} \delta(x) + \frac{\sqrt{(x - a)^+ (b - x)^+}}{2 \pi c x}, \qquad (4)$$
where $(z)^+ = \max(0, z)$, $a = (1 - \sqrt{c})^2$ and $b = (1 + \sqrt{c})^2$.
6. Infinite divisibility: there exists an analogue of the Lévy-Hinčin formula for infinite divisibility in classical probability.
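The Marčenko-Pastur law of item 5 is easy to observe numerically (this is the sketch referenced there; parameter choices are ours): the eigenvalues of a Wishart-type matrix $\frac{1}{N} X X^*$ follow (4) for large $n$.

    import numpy as np

    n, N = 500, 1000
    c = n / N                                  # here c < 1, so no atom at 0
    X = np.random.randn(n, N)                  # i.i.d. entries, variance 1
    eigs = np.linalg.eigvalsh(X @ X.T / N)

    a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2   # support edges from (4)
    print("predicted support: [%.3f, %.3f]" % (a, b))
    print("observed range:    [%.3f, %.3f]" % (eigs.min(), eigs.max()))
    # A histogram of eigs tracks the density sqrt((x-a)(b-x)) / (2 pi c x).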
Rewriting theorem 1 in terms of free probability
Theorem 2 Under the same conditions 1-4, if
$$F^{\mu_{\frac{1}{N} R_n R_n^*}} \overset{D}{\to} F^{\mu_r \boxtimes \mu_c} \text{ a.s.,} \qquad (5)$$
then
$$F^{\mu_{W_n}} \overset{D}{\to} F^{\mu_{r + \sigma^2 I} \boxtimes \mu_c} \text{ a.s.} \qquad (6)$$
Proof: Multiplicative free convolution can be expressed in terms of the S-transform, and formulas connecting the Stieltjes transform and the S-transform exist. Using these is enough to rewrite the expression in theorem 1 into the formula above.
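For reference, the standard formulas behind this step (as in [4]; stated here for compactly supported $\mu$ with nonzero mean): with $\psi_\mu(z) = \int \frac{zt}{1 - zt}\, d\mu(t)$, one has
$$\psi_\mu(z) = -1 - \frac{1}{z} G_\mu\!\left(\frac{1}{z}\right), \qquad S_\mu(z) = \frac{1 + z}{z}\, \psi_\mu^{\langle -1 \rangle}(z),$$
where $\psi_\mu^{\langle -1 \rangle}$ denotes the inverse of $\psi_\mu$ under composition. Combining these with the multiplicativity of the S-transform turns (2) into the statement of theorem 2.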
In other words, after replacing with the limits $R$ and $W$, we get:
$$\mu_W \boxslash \mu_c = \left(\mu_R \boxslash \mu_c\right) \boxplus \mu_{\sigma^2 I}. \qquad (7)$$
So, multiplicative free deconvolution can simplify limiting eigenvalue expressions for random matrices. Advantages of using multiplicative free convolution to detect the limiting eigenvalue distribution (a first-moment sanity check of (7) is sketched after this list):

• One can apply tools from free probability,
• the process is reversible: one can use deconvolution to also go the opposite way in finding the limiting eigenvalue distribution,
• the noise contribution has been isolated in its own term ($\mu_{\sigma^2 I}$), so that it can be identified as well, based on knowledge of the system and on observations from the system.
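As a sanity check on (7) (our own, not part of the original argument): $\boxtimes$ with $\mu_c$ preserves first moments (since $m_1(\mu_c) = 1$ and $\phi(ab) = \phi(a)\phi(b)$ for free elements), while $\boxplus$ adds them, so (7) predicts $m_1(\mu_W) = m_1(\mu_R) + \sigma^2$. This is quickly confirmed:

    import numpy as np

    n, N, sigma = 400, 800, 0.5
    R = np.random.randn(n, N)              # stand-in for any R_n from assumption 2
    X = np.random.randn(n, N)
    W = (R + sigma * X) @ (R + sigma * X).T / N

    m1_W = np.trace(W) / n                 # first moment of mu_{W_n}
    m1_R = np.trace(R @ R.T / N) / n       # first moment of mu_{(1/N) R_n R_n^*}
    print(m1_W, m1_R + sigma**2)           # close for large n, N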
Can theorem 1 be reproved in a free probability
framework?
The proof of theorem 1 given in [1] does not use the free probability framework. We propose a method for reproving it within a free probability framework, based on the following free probability theorem, which we also prove:
Proposition 1 Suppose that $a$ and $\{p, b\}$ are $*$-free, with $a$ R-diagonal and $p$ a projection. In the reduced probability space $(pAp, \phi(p)^{-1}\phi)$, $\mu_{p(a+b)(a+b)^* p}$ is uniquely identified by the equation
$$\mu_{p(a+b)(a+b)^* p} \boxslash \mu_c = \left(\mu_{p a a^* p} \boxslash \mu_c\right) \boxplus \left(\mu_{p b b^* p} \boxslash \mu_c\right). \qquad (8)$$
In particular, $\mu_{p(a+b)(a+b)^* p}$ has no dependence on other moments than those of $p a a^* p$ and $p b b^* p$.
Note the similarity between (7) and proposition 1. We will not define R-diagonal pairs here, but just mention that they can be defined through their R-transforms: when $a$ is R-diagonal, its $*$-distribution has R-transform
$$R_{a,a^*}(X_1, X_2) = \sum_{n=1}^{\infty} \alpha_n \left( (X_1 X_2)^n + (X_2 X_1)^n \right).$$
The proof of proposition 1 depends heavily on a combinatorial description of freeness and the R-transform, introduced by Nica/Speicher [4]. Similar methods have been used in results for obtaining the distribution of $i(ab - ba)$ (the free commutator) when $a$ and $b$ are free. This can also be expressed in terms of multiplicative free deconvolution.
Sketch of free probability based proof of theorem 1, restricted to the case of compactly supported probability measure limits: The rectangular random matrices $R_n$ can be viewed as the $N \times N$ matrices $p_n S_n$, and $X_n$ can be viewed as the $N \times N$ matrices $p_n Y_n$, where

• the projection $p_n$ is a diagonal matrix, with the fraction of 1's on the diagonal equal to $c$,
• $S_n$ is an extension of the $n \times N$ matrix $R_n$ to an $N \times N$ matrix, obtained by adding zeros,
• $Y_n$ is an extension of the $n \times N$ matrix $X_n$ to an $N \times N$ matrix, obtained by adding independent entries with the same distribution.
It is well known that $\mu_{\frac{1}{N} Y_n Y_n^*}$ converges in distribution almost surely to $\mu_1$, the Marčenko-Pastur law with parameter 1 [3]. It is also well known that $Y_n$ is asymptotically free from $\{p_n, S_n\}$ in many cases (some generalizations of known results are needed in the proof, for instance how to cope with the fact that $S_n$ itself may not have a limit distribution). Therefore, proposition 1 fits our needs well, and the similarity between (7) and proposition 1 is what we need to finish the proof.
G-analysis
The general statistical analysis of observations, also called G-analysis [2], is a mathematical theory studying complex systems in which the number of parameters of the mathematical model can increase together with the growth of the number of observations of the system. The mathematical models which approach the system in some sense are called G-estimators, and the main difficulty in G-analysis is to find these estimators. We use $N$ for the number of observations of the system, and $n$ for the number of parameters of the mathematical model. The condition used in G-analysis expressing the growth of the number of observations vs. the number of parameters in the mathematical model is called the G-condition, which here is
$$\lim_{n \to \infty} \frac{n}{N} = c. \qquad (9)$$
We restrict ourselves to systems where a number of independent random vector observations are taken, and where the random vectors have identical distributions. If a random vector has length $n$, we will use the notation $r_n$ for its covariance. The $R_n$ we analyze in this section are more restrictive than in previous sections, since independence across samples is assumed. Girko calls estimators for the Stieltjes transform of covariance matrices $G_2$-estimators. He introduces the following expression as a candidate for a $G_2$-estimator:
$$G_{2n}(z) = \frac{\hat{\theta}(z)}{z}\, G_{\mu_{R_n}}(\hat{\theta}(z)), \qquad (10)$$
where the function $\hat{\theta}(z)$ is the solution to the equation
$$\hat{\theta}(z)\, c\, G_{\mu_{R_n}}(\hat{\theta}(z)) - (1 - c) + \frac{\hat{\theta}(z)}{z} = 0. \qquad (11)$$
Girko claims that a function $G_{2n}(z)$ satisfying (10) and (11) is a good approximation for the Stieltjes transform of the covariance matrices, $G_{r_n}(z) = \frac{1}{n} \mathrm{Tr}\left\{ (r_n - z I_n)^{-1} \right\}$.
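A direct implementation sketch of (10)-(11) (our own; restricted to real $z < 0$ and $c < 1$ so that all quantities stay real, with ad hoc root bracketing):

    import numpy as np
    from scipy.optimize import brentq

    def g2_estimator(scm_eigenvalues, c, z):
        # G_{2n}(z) of (10), with theta_hat obtained by solving (11) on the
        # negative half-line, where G_{mu_Rn} is real and well behaved.
        lam = np.asarray(scm_eigenvalues, dtype=float)
        G = lambda theta: np.mean(1.0 / (lam - theta))       # G_{mu_Rn}(theta)
        eq = lambda theta: theta * c * G(theta) - (1 - c) + theta / z
        theta_hat = brentq(eq, -1e8, -1e-12)                 # solve (11)
        return (theta_hat / z) * G(theta_hat)                # evaluate (10)

    # Identity covariance: G_{r_n}(z) = 1/(1 - z), so both numbers printed
    # below should be near 0.5.
    n, N = 200, 1000
    Y = np.random.randn(n, N)
    lam = np.linalg.eigvalsh(Y @ Y.T / N)
    print(g2_estimator(lam, n / N, z=-1.0), 1.0 / (1.0 - (-1.0)))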
We claim that:
Theorem 3 For the $G_2$-estimator given by (10), (11), the following holds:
$$G_{2n}(z) = G_{\mu_{R_n} \boxslash \mu_c}(z). \qquad (12)$$
Thus, in some circumstances, multiplicative free convolution can be used to estimate the covariance of systems, based on samples taken from the system. The proof of this is also based on utilizing the connection between the Stieltjes transform and the S-transform.
Example
In [5], systems of the form $y_n = A_n x_n + w_n$ are studied, where $x_n$ and $w_n$ are standard Gaussian random vectors, $A_n$ is an $n \times L$ matrix, and $n \to \infty$. The covariance of this system is
$$r_n = A_n A_n^H + I_n.$$
We estimate the system with $N$ independent observations $y_1, \ldots, y_N$, and obtain a sample covariance matrix (SCM)
$$R_n = \frac{1}{N} \sum_{i=1}^{N} y_i y_i^H = \frac{1}{N} Y_n Y_n^H,$$
where $Y_n = [y_1, \ldots, y_N]$. In this case one can write
$$R_n = r_n^{1/2}\, \frac{1}{N} X_n X_n^*\, r_n^{1/2}, \qquad (13)$$
where $X_n$ is an $n \times N$ matrix with independent standard Gaussian entries. The matrices $\frac{1}{N} X_n X_n^*$ are of similar type to the ones we have considered (and have limit $\mu_c$), so asymptotic freeness occurs, and in the limit $\mu_R = \mu_r \boxtimes \mu_c$, which is as expected from theorem 3.
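To see this relation at work numerically, one can compare moments: for $\mu_R = \mu_r \boxtimes \mu_c$, freeness gives $m_2(\mu_R) = m_2(\mu_r) + c\, m_1(\mu_r)^2$ (from expanding $\phi(abab)$). A small check (our construction of $A_n$ is illustrative, not from [5]):

    import numpy as np

    n, L, N = 300, 100, 900
    c = n / N
    A = np.random.randn(n, L) / np.sqrt(L)    # illustrative n x L mixing matrix
    r = A @ A.T + np.eye(n)                   # true covariance r_n = A_n A_n^H + I_n
    X = np.random.randn(n, N)
    sqrt_r = np.linalg.cholesky(r)            # any square root of r_n works here
    R = sqrt_r @ (X @ X.T / N) @ sqrt_r.T     # sample covariance matrix, as in (13)

    m1 = lambda M: np.trace(M) / n
    print(m1(R @ R), m1(r @ r) + c * m1(r)**2)   # close for large n, N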
Other systems of interest
Similar points can be made for systems of the form
$$W_n = X_n + H_n^H T_n H_n, \qquad (14)$$
where in many cases both multiplicative and additive free convolution can describe the limiting behaviour. One case considered in the literature is when $T_n$ is diagonal and $X_n$ positive definite. The limiting eigenvalue distribution of $T_n$ is often of interest, and additive and multiplicative free deconvolution can aid in finding it.
References
[1] B. Dozier and J.W. Silverstein. On the empirical distribution of eigenvalues of large dimensional information-plus-noise type matrices. Submitted, 2004.
[2] V. L. Girko. Ten years of general statistical analysis. http://general-statistical-analysis.girko.freewebspace.com/chapter14.pdf.
[3] F. Hiai and D. Petz. The Semicircle Law, Free Random Variables and Entropy. American Mathematical Society, 2000.
[4] A. Nica and R. Speicher. Lectures on the Combinatorics of Free Probability.
Cambridge University Press, 2006.
[5] N.R. Rao and A. Edelman. Free probability, sample covariance matrices and
signal processing. ICASSP, pages 1001–1004, 2006.