Applying multiplicative free deconvolution to find limiting eigenvalue distributions of random matrices

Øyvind Ryan and Merouane Debbah

Abstract

We reprove some well-known theorems on limiting eigenvalue distributions of random matrices using free probability methods. The tool we use is multiplicative free deconvolution, which we show can simplify both expressions for limiting eigenvalue distributions and expressions for estimators of the spectral function of covariance matrices. The method is also explained in a purely free probabilistic framework, i.e. independently of the random matrix setting.

Result of Dozier/Silverstein [1]

Let F^{\mu_A} be the empirical distribution function of the eigenvalues of A (F^{\mu_A}(x) is the proportion of the eigenvalues of A less than x). \mu_A denotes the probability measure associated with the distribution of A, and F^{\mu_{A_n}} \to_D F^{\mu_A} denotes weak convergence. The Stieltjes transform of a matrix A with distribution function F^{\mu_A}(x) is defined by

  G_{\mu_A}(z) = \int \frac{1}{\lambda - z} \, dF^{\mu_A}(\lambda).    (1)

The following terminology and restrictions will be used:

1. For n = 1, 2, ..., X_n = (X_{ij}^n) is n \times N, identically distributed for all i, j, n, independent across i, j for each n, and E|X_{11}^1 - E X_{11}^1|^2 = 1.
2. R_n is n \times N and independent of X_n, with F^{\mu_{\frac{1}{N} R_n R_n^*}} \to_D F^{\mu_R} a.s., where F^{\mu_R} is a nonrandom p.d.f.
3. \lim_{n \to \infty} n/N = c.
4. W_n = \frac{1}{N}(R_n + \sigma X_n)(R_n + \sigma X_n)^*.

We will study the following theorem [1]:

Theorem 1. Under assumptions 1-4, F^{\mu_{W_n}} \to_D F^{\mu_W} a.s., where F^{\mu_W} is a nonrandom p.d.f. characterized by

  G_{\mu_W}(z) = \int \frac{dF^{\mu_R}(t)}{\frac{t}{1 + \sigma^2 c G_{\mu_W}(z)} - (1 + \sigma^2 c G_{\mu_W}(z)) z + \sigma^2 (1 - c)}.    (2)

We will attempt to rewrite this expression to a simpler one using free probability constructs, introduced in the next section.

Free probability

Definition 1. Let \varphi be a normalized linear functional on the *-algebra A. (A, \varphi) is called a noncommutative probability space. A family of unital *-subalgebras (A_i)_{i \in I} \subset A will be called a free family if

  a_j \in A_{i_j},\ i_1 \ne i_2,\ i_2 \ne i_3,\ \ldots,\ i_{n-1} \ne i_n,\ \varphi(a_1) = \varphi(a_2) = \cdots = \varphi(a_n) = 0 \ \Rightarrow\ \varphi(a_1 \cdots a_n) = 0.    (3)

An important noncommutative probability space is (M_n(C), \tau_n), where \tau_n is the normalized trace on M_n(C). This space is the stage for combining free probability and random matrix theory.

Definition 2. If a and b are free in (A, \varphi), then the distributions of a + b and ab depend only on the distributions of a and b, not on their realizations. We denote by \mu_a ⊞ \mu_b the distribution of a + b, and by \mu_a ⊠ \mu_b the distribution of ab. Given probability measures \mu and \mu_2, when there is a unique probability measure \mu_1 such that \mu = \mu_1 ⊠ \mu_2, we will write \mu_1 = \mu ⊠^{-1} \mu_2. We say that \mu_1 is the multiplicative free deconvolution of \mu with \mu_2.

Analogies with classical probability

1. Freeness is the analogue of independence.
2. The R-transform: a transform on probability distributions which satisfies R_{\mu_a ⊞ \mu_b}(z) = R_{\mu_a}(z) + R_{\mu_b}(z). It is the analogue of the logarithm of the Fourier transform for classical random variables.
3. The S-transform: a transform on probability distributions which satisfies S_{\mu_a ⊠ \mu_b}(z) = S_{\mu_a}(z) S_{\mu_b}(z) (see the formulas after this list).
4. Free central limit theorem: if a_1, a_2, ... are free with \varphi(a_i) = 0, \varphi(a_i^2) = 1 and \sup_i |\varphi(a_i^k)| < \infty for all k, then the sequence (a_1 + a_2 + \cdots + a_n)/\sqrt{n} converges in distribution to the probability measure with density \frac{1}{2\pi}\sqrt{4 - x^2}. This is called the semicircle law, and it is the analogue of the normal law in classical probability.
5. Poisson distributions in classical probability have their analogue in the free Poisson distributions. These are given by the so-called Marchenko–Pastur laws. The Marchenko–Pastur law \mu_c with parameter c is characterized by the density

     f^{\mu_c}(x) = \left(1 - \frac{1}{c}\right)^+ \delta(x) + \frac{\sqrt{(x-a)^+ (b-x)^+}}{2\pi c x},    (4)

   where (z)^+ = \max(0, z), a = (1 - \sqrt{c})^2 and b = (1 + \sqrt{c})^2.
6. Infinite divisibility: there is an analogue of the Lévy–Hinčin formula for infinitely divisible distributions in classical probability.
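The rewriting in the next section passes between the Stieltjes transform and the S-transform. As a sketch (not part of the original text) of the formulas alluded to there, following the conventions of [3, 4] — and noting that the Stieltjes transform (1) used here differs by a sign from the Cauchy transform used in those references — one works with the moment series

  \psi_\mu(z) = \sum_{k \ge 1} \left( \int t^k \, d\mu(t) \right) z^k = \int \frac{tz}{1 - tz} \, d\mu(t) = -\frac{1}{z} \, G_\mu\!\left(\frac{1}{z}\right) - 1 .

When \int t \, d\mu(t) \ne 0, \psi_\mu has an inverse \chi_\mu under composition near 0, and the S-transform is

  S_\mu(z) = \frac{1+z}{z} \, \chi_\mu(z), \qquad S_{\mu_1 \boxtimes \mu_2}(z) = S_{\mu_1}(z) \, S_{\mu_2}(z) .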
Rewriting theorem 1 in terms of free probability

Theorem 2. Under the same conditions 1-4, if

  F^{\mu_{\frac{1}{N} R_n R_n^*}} \to_D F^{\mu_r \boxtimes \mu_c} \quad a.s.,    (5)

then

  F^{\mu_{W_n}} \to_D F^{\mu_{r + \sigma^2 I} \boxtimes \mu_c} \quad a.s.    (6)

Proof: (2) can be expressed in terms of the S-transform. Formulas connecting the Stieltjes transform and the S-transform exist (see the end of the previous section). Using these is enough to rewrite the expression in theorem 1 to the formula above. In other words, after replacing with the limits R and W, we get

  \mu_W \boxtimes^{-1} \mu_c = (\mu_R \boxtimes^{-1} \mu_c) \boxplus \mu_{\sigma^2 I}.    (7)

So, multiplicative free deconvolution can simplify limit eigenvalue expressions for random matrices. Advantages of using multiplicative free convolution to detect the limiting eigenvalue distribution:

• One can apply tools from free probability,
• the process is reversible: one can use deconvolution to also go the opposite way in finding the limiting eigenvalue distribution,
• the noise contribution has been identified in its own term (\mu_{\sigma^2 I}), so that this can be identified as well, based on knowledge of the system and on observations from the system.

Can theorem 1 be reproved in a free probability framework?

The proof of theorem 1 in [1] does not use the free probability framework. We propose a method for reproving it within a free probability framework, based on the following free probability result, which we also prove:

Proposition 1. Suppose that a and {p, b} are *-free, with a R-diagonal and p a projection (here \mu_c denotes the Marchenko–Pastur law with parameter c = \varphi(p)). In the reduced probability space (pAp, \varphi(p)^{-1} \varphi), \mu_{p(a+b)(a+b)^* p} is uniquely identified by the equation

  \mu_{p(a+b)(a+b)^* p} \boxtimes^{-1} \mu_c = (\mu_{p a a^* p} \boxtimes^{-1} \mu_c) \boxplus (\mu_{p b b^* p} \boxtimes^{-1} \mu_c).    (8)

In particular, \mu_{p(a+b)(a+b)^* p} has no dependence on other moments than those of p a a^* p and p b b^* p.

Note the similarity between (7) and proposition 1. We will not define R-diagonal pairs here, but just mention that they can be defined through their R-transforms: when a is R-diagonal, its *-distribution has R-transform

  R_{a,a^*}(X_1, X_2) = \sum_{n=1}^{\infty} \alpha_n \left( (X_1 X_2)^n + (X_2 X_1)^n \right).

The proof of proposition 1 depends heavily on a combinatorial description of freeness and the R-transform, introduced by Nica/Speicher [4]. Similar methods have been used in results for obtaining the distribution of i(ab - ba) (the free commutator) when a and b are free. This can also be expressed in terms of multiplicative free deconvolution.

Sketch of a free probability based proof of theorem 1, restricted to the case of compactly supported probability measure limits: the rectangular random matrices R_n can be viewed as the N \times N matrices p_n S_n, and X_n can be viewed as the N \times N matrices p_n Y_n, where

• the projection p_n is a diagonal matrix, with the fraction of 1's on the diagonal equal to c,
• S_n is an extension of the n \times N matrix R_n to an N \times N matrix, obtained by adding zeros,
• Y_n is an extension of the n \times N matrix X_n to an N \times N matrix, obtained by adding independent entries with the same distribution.

It is well known that \mu_{\frac{1}{N} Y_n Y_n^*} converges in distribution almost surely to \mu_1, the Marchenko–Pastur law with parameter 1 [3] (a small numerical sanity check of such Marchenko–Pastur limits is sketched below). It is also well known that Y_n is asymptotically free from {p_n, S_n} in many cases (some generalizations of known results are needed in the proof, for instance to cope with the fact that S_n itself may not have a limit distribution). Therefore, proposition 1 fits our needs well, and the similarity between (7) and proposition 1 is what we need to finish the proof.
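As a quick numerical sanity check of the Marchenko–Pastur limits used above (our own illustration, not part of the original argument; the dimensions and the real Gaussian entries are arbitrary choices), one can compare the empirical eigenvalue distribution of a matrix of the form \frac{1}{N} X_n X_n^* with the density (4):

    import numpy as np

    def mp_density(x, c):
        # Continuous part of the Marchenko-Pastur density (4) with parameter c.
        a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
        return np.sqrt(np.maximum(x - a, 0) * np.maximum(b - x, 0)) / (2 * np.pi * c * x)

    n, N = 500, 1000                          # aspect ratio c = n/N = 1/2
    c = n / N
    X = np.random.randn(n, N)                 # i.i.d. entries with variance 1
    eig = np.linalg.eigvalsh(X @ X.T / N)     # eigenvalues of (1/N) X X^*

    # Compare the empirical histogram with the limiting density at a few bin centres.
    hist, edges = np.histogram(eig, bins=30, density=True)
    centres = (edges[:-1] + edges[1:]) / 2
    for x, h in zip(centres[::6], hist[::6]):
        print(f"x = {x:5.2f}   empirical {h:6.3f}   Marchenko-Pastur {mp_density(x, c):6.3f}")

For matrices of this size the agreement is already close. The same kind of check could in principle be run for the information-plus-noise matrices W_n against the limit in theorem 2, but that would additionally require a numerical implementation of the free (de)convolutions.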
G-analysis

General statistical analysis of observations, also called G-analysis [2], is a mathematical theory studying complex systems in which the number of parameters of the mathematical model can increase together with the number of observations of the system. The mathematical models which approximate the system in some sense are called G-estimators, and the main difficulty in G-analysis is to find these estimators. We use N for the number of observations of the system, and n for the number of parameters of the mathematical model. The condition used in G-analysis to express the growth of the number of observations versus the number of parameters of the mathematical model is called the G-condition, which here is

  \lim_{n \to \infty} \frac{n}{N} = c.    (9)

We restrict ourselves to systems where a number of independent random vector observations are taken, and where the random vectors have identical distributions. If a random vector has length n, we will write r_n for its covariance. The R_n we analyze in this section are more restrictive than in previous sections, since independence across samples is assumed.

Girko calls estimators for the Stieltjes transform of covariance matrices G_2-estimators. He introduces the following expression as a candidate for a G_2-estimator:

  G_{2n}(z) = \frac{\hat{\theta}(z)}{z} \, G_{\mu_{R_n}}(\hat{\theta}(z)),    (10)

where the function \hat{\theta}(z) is the solution of the equation

  \hat{\theta}(z) \, c \, G_{\mu_{R_n}}(\hat{\theta}(z)) - (1 - c) + \frac{\hat{\theta}(z)}{z} = 0.    (11)

Girko claims that a function G_{2n}(z) satisfying (10) and (11) is a good approximation of the Stieltjes transform of the covariance matrices,

  G_{r_n}(z) = \frac{1}{n} \mathrm{Tr} \left( (r_n - z I_n)^{-1} \right).

We claim that:

Theorem 3. For the G_2-estimator given by (10), (11), the following holds:

  G_{2n}(z) = G_{\mu_{R_n} \boxtimes^{-1} \mu_c}(z).    (12)

Thus, in some circumstances, multiplicative free deconvolution can be used to estimate the covariance of systems, based on samples taken from the system. The proof of this is also based on utilizing the connection between the Stieltjes transform and the S-transform.

Example

In [5], systems of the form y_n = A_n x_n + w_n are studied, where x_n and w_n are standard Gaussian random vectors, A_n is an n \times L matrix, and n \to \infty. The covariance of this system is r_n = A_n A_n^H + I_n. We estimate the system from N independent observations y_1, ..., y_N and obtain the sample covariance matrix (SCM)

  R_n = \frac{1}{N} \sum_{i=1}^{N} y_i y_i^H = \frac{1}{N} Y_n Y_n^H,

where Y_n = [y_1, ..., y_N]. In this case one can write

  R_n = r_n^{1/2} \, \frac{1}{N} X_n X_n^* \, r_n^{1/2},    (13)

where X_n is an n \times N matrix with independent standard Gaussian entries. The matrices \frac{1}{N} X_n X_n^* are of the type considered earlier (and have limit \mu_c), so asymptotic freeness occurs, and in the limit \mu_R = \mu_r ⊠ \mu_c, as expected from theorem 3. A minimal simulation of this setup is sketched below.
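The following minimal simulation sketch (our own illustration, not from [5]; the dimensions, the seed and the particular A_n are arbitrary, and real Gaussians are used instead of complex ones) builds the model above, forms the SCM of (13), and shows how its eigenvalues deviate from those of the true covariance r_n for finite N — the discrepancy that the deconvolution-based estimators of theorem 3 aim to correct:

    import numpy as np

    rng = np.random.default_rng(0)
    n, L, N = 64, 8, 256                           # model size, signal rank, number of observations

    A = rng.standard_normal((n, L)) / np.sqrt(L)   # mixing matrix A_n (arbitrary choice)
    r = A @ A.T + np.eye(n)                        # true covariance r_n = A_n A_n^H + I_n

    # N independent observations y_i = A_n x_i + w_i with standard Gaussian x_i and w_i.
    x = rng.standard_normal((L, N))
    w = rng.standard_normal((n, N))
    Y = A @ x + w

    R = Y @ Y.T / N                                # sample covariance matrix (SCM), cf. (13)

    true_eig = np.sort(np.linalg.eigvalsh(r))[::-1]
    scm_eig = np.sort(np.linalg.eigvalsh(R))[::-1]
    print("largest true eigenvalues:", np.round(true_eig[:5], 2))
    print("largest SCM  eigenvalues:", np.round(scm_eig[:5], 2))

Letting N grow with n fixed makes the two eigenvalue lists agree, while keeping n/N fixed does not — which is exactly the regime where the G-estimators above are needed.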
Other systems of interest

Similar points can be made for systems of the form

  W_n = X_n + H_n^H T_n H_n,    (14)

where in many cases both multiplicative and additive free convolution can describe the limiting behaviour. One case considered in the literature is when T_n is diagonal and X_n is positive definite. The limiting eigenvalue distribution of T_n is often of interest, and additive and multiplicative free deconvolution can aid in finding it.

References

[1] B. Dozier and J. W. Silverstein. On the empirical distribution of eigenvalues of large dimensional information-plus-noise type matrices. Submitted, 2004.
[2] V. L. Girko. Ten years of general statistical analysis. http://general-statistical-analysis.girko.freewebspace.com/chapter14.pdf.
[3] F. Hiai and D. Petz. The Semicircle Law, Free Random Variables and Entropy. American Mathematical Society, 2000.
[4] A. Nica and R. Speicher. Lectures on the Combinatorics of Free Probability. Cambridge University Press, 2006.
[5] N. R. Rao and A. Edelman. Free probability, sample covariance matrices and signal processing. In Proc. ICASSP, pages 1001–1004, 2006.