CMA 2010 Vandermonde matrices, adjacency matrices and Fourier-Motzkin elimination Øyvind Ryan May 2010

Vandermonde matrices

A Vandermonde matrix is a matrix of the form

$$V = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{pmatrix}.$$

Classical linear algebra exercises on Vandermonde matrices:
1. Show that $V$ is invertible if and only if $x_1, \ldots, x_n$ are distinct.
2. Show that the determinant of $V$ is $\prod_{1 \le i < j \le n} (x_j - x_i)$.

The Vandermonde matrix and the companion matrix

If $p(t) = a_0 + a_1 t + \cdots + a_{n-1} t^{n-1} + t^n$ is a polynomial, and

$$C(p) = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{pmatrix}$$

is its companion matrix, then $C(p) = V \,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\, V^{-1}$, where $\lambda_1, \ldots, \lambda_n$ are the (distinct) roots of $p$ and $V$ is the Vandermonde matrix with $x_1 = \lambda_1, \ldots, x_n = \lambda_n$.
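The companion matrix identity can be checked numerically on a small case. The following is a minimal sketch, assuming a hypothetical cubic with roots 1, 2, 3; the variable names are illustrative only. It verifies the equivalent statement $C(p) V = V \,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.

```matlab
% Hypothetical example: p(t) = (t-1)(t-2)(t-3) = t^3 - 6t^2 + 11t - 6
lambda = [1 2 3];                 % distinct roots of p
n = numel(lambda);
c = poly(lambda);                 % coefficients of p in descending powers of t
a = fliplr(c(2:end));             % [a0 a1 ... a_{n-1}]

% Companion matrix: ones on the superdiagonal, last row -a0 ... -a_{n-1}
C = [zeros(n-1, 1), eye(n-1); -a];

% Vandermonde matrix with V(i,k) = lambda(k)^(i-1)
V = zeros(n);
for i = 1:n
    V(i, :) = lambda.^(i-1);
end

% C*V = V*diag(lambda) is equivalent to C = V*diag(lambda)*inv(V)
residual = norm(C*V - V*diag(lambda));
disp(residual)
```

Each column of $V$ is an eigenvector of $C(p)$, which is why the residual should vanish up to rounding.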

Vandermonde matrices in my research

- The matrices are not necessarily square ($N \times L$).
- The matrices are (very) large: $N, L \to \infty$ at the same rate, i.e. $\lim_{N \to \infty} L/N = c$ for some $c$.
- The $x_1, \ldots, x_n$ lie on the unit circle.
- The $x_1, \ldots, x_n$ are random quantities, typically i.i.d.

$$V = \frac{1}{\sqrt{N}} \begin{pmatrix} 1 & \cdots & 1 \\ e^{-j\omega_1} & \cdots & e^{-j\omega_L} \\ \vdots & \ddots & \vdots \\ e^{-j(N-1)\omega_1} & \cdots & e^{-j(N-1)\omega_L} \end{pmatrix} \qquad (1)$$

The $\omega_i$ are called phase distributions, and take values in $[0, 2\pi)$. The normalizing factor $\frac{1}{\sqrt{N}}$ is included to ensure limiting asymptotic behaviour.

When ω has a given distribution, we can plot the distribution of the singular values of V (here through the eigenvalues of $V^H V$).

Figure: Mean eigenvalue distribution for 640 realizations of (1), with N = 1600, L = 800, and ω uniformly distributed. (Horizontal axis: λ, from 0 to 10. Vertical axis: density, from 0 to 1.)

Matlab code

    N = 1600; L = 800;
    V = zeros(N, L);
    for k = 1:L
        % Column k: powers 0..N-1 of a random point on the unit circle
        V(:,k) = (exp(2*pi*1j*rand(1)).^(0:(N-1)))';
    end
    V = (1/N)*(V'*V);                  % L x L matrix V^H V, with the normalization from (1)
    [n, xout] = hist(eig(V), 30);
    n = n / (L*(xout(2) - xout(1)));   % scale the histogram to a density
    bar(xout, n)
    axis([0 10 0 1])
    xlabel('\lambda')
    ylabel('Density')

Main questions

1. What are the statistical properties of the singular value distribution/eigenvalue distribution of V?
2. What happens when $N, L \to \infty$?
3. For many random matrices, approximately the same eigenvalue distribution is seen for any realization, as long as the matrices are large. Do Vandermonde matrices also exhibit such non-random behaviour?

Moments of a matrix

Definition. The p'th moment of an $n \times n$ matrix $V$ is defined by

$$V_p = \mathrm{tr}(V^p) = \frac{1}{n}\left(\lambda_1^p + \cdots + \lambda_n^p\right),$$

where tr is the normalized trace and $\lambda_1, \ldots, \lambda_n$ denote the eigenvalues of $V$.

- The characteristic equation of a matrix can easily be retrieved from its moments.
- The moments give an alternative description of the eigenvalues, since the eigenvalues can be retrieved from the moments.

Closed form expressions for the moments of Vandermonde matrices can be found, as we will see, but it is more difficult to find expressions for the eigenvalues.
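One standard way to retrieve the characteristic polynomial from the moments is through Newton's identities. The sketch below is an illustration under stated assumptions rather than the method used in the talk: the test matrix A is hypothetical, and the moments are the normalized ones defined above.

```matlab
A = [2 1 0; 1 3 1; 0 1 4];      % hypothetical symmetric test matrix
n = size(A, 1);

% Normalized moments m(p) = tr(A^p)/n for p = 1..n
m = zeros(1, n);
for p = 1:n
    m(p) = trace(A^p) / n;
end
s = n * m;                      % power sums lambda_1^p + ... + lambda_n^p

% Newton's identities for det(t*I - A) = t^n + c_1 t^(n-1) + ... + c_n:
%   c_k = -(1/k) * sum_{i=1}^{k} c_{k-i} * s_i,  with c_0 = 1
c = zeros(1, n+1);
c(1) = 1;                       % c(k+1) stores c_k
for k = 1:n
    acc = 0;
    for i = 1:k
        acc = acc + c(k-i+1) * s(i);
    end
    c(k+1) = -acc / k;
end

disp(c)                         % compare with poly(A)
```

The first n moments therefore determine the characteristic polynomial, and hence the eigenvalues as its roots.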

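Main result [1]

Assume that ω is uniform and $N = L$. Define $P(p)$ as the set of partitions of $p$ elements, and write $\rho = \{W_1, \ldots, W_r\}$ for a partition with blocks $W_1, \ldots, W_r$. We have

$$\lim_{N, L \to \infty} E\left[\mathrm{tr}\left(\left(V^H V\right)^p\right)\right] = \sum_{\rho \in P(p)} K_\rho, \qquad (2)$$

where $K_\rho$ is the volume of the solution set of

$$\begin{aligned} \sum_{k \in W_1} x_{k-1} &= \sum_{k \in W_1} x_k \\ \sum_{k \in W_2} x_{k-1} &= \sum_{k \in W_2} x_k \\ &\;\;\vdots \\ \sum_{k \in W_r} x_{k-1} &= \sum_{k \in W_r} x_k \end{aligned} \qquad (3)$$

($r$ equations in $n = p$ unknowns), where $0 \le x_i \le 1$, $1 \le i \le p$, and the indices are taken cyclically ($x_0 = x_p$).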
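A small worked case may help make $K_\rho$ concrete. This is a sketch, assuming the indices in (3) are read cyclically, so that $x_0 = x_p$. Take $p = 4$ and the partition $\rho = \{\{1,3\},\{2,4\}\}$. Both equations in (3) then reduce to the same relation, and the volume is taken over the three free variables.

```latex
% Block W_1 = {1,3}:  x_0 + x_2 = x_1 + x_3,  i.e.  x_4 + x_2 = x_1 + x_3
% Block W_2 = {2,4}:  x_1 + x_3 = x_2 + x_4   (the same equation)
% Pivot variable: x_4 = x_1 + x_3 - x_2, which must lie in [0,1].
\begin{align*}
K_\rho &= \operatorname{vol}\bigl\{(x_1,x_2,x_3) \in [0,1]^3 :
          0 \le x_1 + x_3 - x_2 \le 1 \bigr\} \\
       &= 1 - \operatorname{vol}\{x_1 + x_3 < x_2\}
            - \operatorname{vol}\{x_1 + x_3 > 1 + x_2\} \\
       &= 1 - \int_0^1 \frac{x_2^2}{2}\,dx_2 - \int_0^1 \frac{x_2^2}{2}\,dx_2
        = 1 - \frac{1}{6} - \frac{1}{6} = \frac{2}{3}.
\end{align*}
```

The two excluded regions have equal volume by the symmetry $x_i \mapsto 1 - x_i$.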

Noncrossing partitions

- $K_\rho$ is always a rational number.
- If $k$ and $k+1$ are in the same block of ρ, the variable $x_k$ occurs on both sides of the corresponding equation in (3) and cancels.
- It is straightforward to show that $K_\rho = 1$ if ρ is an interval partition.
- More generally, it is not too hard to show that $K_\rho = 1$ if and only if ρ is a noncrossing partition:

Definition. ρ is called noncrossing if, whenever $i < j < k < l$ with $i, k$ in the same block and $j, l$ in the same block, then all of $i, j, k, l$ are in the same block.

Many results on moments of random matrices can be expressed in terms of noncrossing partitions.

Support of the eigenvalue distribution

- Exact expressions for $K_\rho$ are hard to find.
- Bounds for $K_\rho$ are easier to find. These bounds can be used to bound the moments, since the moments can be written as sums of the $K_\rho$.
- Using such bounds one can show that $(m_p)^{1/p} \to \infty$ as $p \to \infty$, where $m_p$ is the limit (2), which in turn implies that the eigenvalue distribution has unbounded support.
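The noncrossing condition can be tested directly from the definition. The sketch below assumes a hypothetical encoding in which a partition of $p \ge 4$ elements is a row vector b, with b(k) the label of the block containing k; the function names are illustrative.

```matlab
% q lists all index quadruples i < j < k < l, one per row.
% A crossing occurs when i,k share a block, j,l share a block,
% and the two blocks are different.
hasCrossing = @(b, q) any( b(q(:,1)) == b(q(:,3)) & ...
                           b(q(:,2)) == b(q(:,4)) & ...
                           b(q(:,1)) ~= b(q(:,2)) );
isNoncrossing = @(b) ~hasCrossing(b, nchoosek(1:numel(b), 4));

disp(isNoncrossing([1 2 1 2]))   % {{1,3},{2,4}}
disp(isNoncrossing([1 2 2 1]))   % {{1,4},{2,3}}
disp(isNoncrossing([1 1 2 2]))   % {{1,2},{3,4}}, an interval partition
```

Checking all quadruples costs on the order of $p^4$ operations, which is acceptable for the low-order moments considered here.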

Adjacency matrices

- The coefficient matrix of (3) can be viewed as the adjacency matrix of a directed graph with $r$ nodes and $p$ edges.
- Due to the sum over all ρ, all such graphs with any number of nodes and $p$ edges are considered.
- To find the volume of the solution sets, it is not enough to find just one solution: we must find them all, for all possible ρ. This is computationally intensive, and makes the problem computable only for the lower order moments.

For all ρ we perform the following procedure, called Fourier-Motzkin elimination [2]:
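The coefficient matrix of (3) can be assembled mechanically from a partition. This is a minimal sketch, assuming the indices in (3) are cyclic ($x_0 = x_p$) and using a hypothetical label vector b, where b(k) is the block containing k. Column m then describes an edge from block b(m) to the block containing m + 1.

```matlab
b = [1 2 1 2];              % hypothetical example: rho = {{1,3},{2,4}}
p = numel(b);               % number of variables (edges)
r = max(b);                 % number of blocks (nodes)
nxt = [2:p, 1];             % cyclic successor of each index

% Row j encodes  sum_{k in W_j} x_{k-1} - sum_{k in W_j} x_k = 0
A = zeros(r, p);
for m = 1:p
    A(b(nxt(m)), m) = A(b(nxt(m)), m) + 1;   % x_m on the left-hand side
    A(b(m), m)      = A(b(m), m) - 1;        % x_m on the right-hand side
end

disp(A)
disp(rank(A))               % number of pivot variables
```

Every column sums to zero, so the r equations are dependent. When m and m + 1 lie in the same block the two contributions cancel and the column is zero.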

Fourier-Motzkin elimination

1. Row-reduce the coefficient matrix of (3) to find the pivot variables.
2. Bring the equations into a standard form by expressing the $r - 1$ pivot variables by means of the free variables. Since all variables are between 0 and 1, we can write

$$\sum_{j=1}^{n-r+1} a_{ij} x_j \le 1, \qquad -\sum_{j=1}^{n-r+1} a_{ij} x_j \le 0, \qquad i = 1, \ldots, r-1,$$

(where we have re-indexed the variables so that $x_1, \ldots, x_{n-r+1}$ are the free variables and $x_{n-r+2}, \ldots, x_n$ are the pivot variables).

1. We can also write

$$x_j \le 1, \qquad -x_j \le 0, \qquad j = 1, \ldots, n-r+1,$$

for the free variables.
2. The coefficients $a_{ij}$ are taken from $\{-1, 0, 1\}$.

By reordering the inequalities, we get the standard form (sorted by the $x_1$-coefficient):

$$\begin{aligned} x_1 + \sum_{j=1}^{n-r} b_{sj} x_{j+1} &\le e_s, & s &= 1, \ldots, r_1, \\ \sum_{j=1}^{n-r} c_{sj} x_{j+1} &\le f_s, & s &= 1, \ldots, r_2, \\ -x_1 + \sum_{j=1}^{n-r} d_{sj} x_{j+1} &\le g_s, & s &= 1, \ldots, r_3. \end{aligned}$$

1. Consider all possible choices of the tightest upper bound $x_1 \le e_s - \sum_{j=1}^{n-r} b_{sj} x_{j+1}$ among the first $r_1$ inequalities.
2. Consider all possible choices of the tightest lower bound $x_1 \ge \sum_{j=1}^{n-r} d_{sj} x_{j+1} - g_s$ among the last $r_3$ inequalities.
3. The bounds chosen in 1. and 2. restrict $x_1$ to an interval of legal values, and the $x_1$-variable can be eliminated accordingly, giving one particular part of the solution set.
4. We now have $|\rho| - 1$ variables, and at most $p - 1$ equations. Work on these iteratively.
5. Finally we add together the volumes of the different parts.
6. A software implementation is available on my webpages [3]; it performs Fourier-Motzkin elimination for all graphs. In my publications, this implementation is applied for inference in wireless models. The first steps are suitable for parallel processing.

Finite matrices

- Replace the continuous variables $x_i$ with discrete variables $0 \le x_i \le N - 1$,
- define $K_\rho$ as the number of solutions to (3) divided by $N$ to the power of the number of free variables,
- the same result then holds for finite matrices also.

In other words, we can write down the exact moments of Vandermonde matrices once we use Fourier-Motzkin elimination on a discrete set.
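The discrete count can be illustrated on the smallest crossing case. The sketch below is not the implementation referenced in [3]; it assumes $p = 4$, the partition $\{\{1,3\},\{2,4\}\}$, cyclic indices in (3), and a hypothetical size N = 10. The variable $x_1$ is eliminated by taking the largest lower bound and the smallest upper bound, in the spirit of the elimination step above.

```matlab
% Both equations in (3) reduce to x1 + x3 = x2 + x4, so x1, x2, x3 are
% free and the pivot variable is x4 = x1 + x3 - x2, with 0 <= x4 <= N-1.
N = 10;
count = 0;
for x2 = 0:N-1
    for x3 = 0:N-1
        % Bounds on x1 from 0 <= x1 <= N-1 and 0 <= x1 + x3 - x2 <= N-1
        lower = max(0, x2 - x3);
        upper = min(N-1, N-1 + x2 - x3);
        % Number of integers x1 in [lower, upper]
        count = count + max(0, upper - lower + 1);
    end
end
K = count / N^3;            % three free variables
disp([count, K])
```

Summing the interval lengths gives $(2N^3 + N)/3$ solutions, so the discrete $K_\rho$ equals $2/3 + 1/(3N^2)$ and approaches the continuous volume as N grows.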

Stochastic eigen-inference

Consider a random matrix $Y = f(D, X_1, \ldots, X_n)$, where $D$ is an unknown matrix, and $X_1, \ldots, X_n$ are known (independent) random matrices.

- From observations $Y_1, \ldots, Y_L$ of $Y$, can we reliably make inference on the matrix $D$?
- Typically, such inference is on the eigenvalue distribution of $D$, rather than on the matrix itself, and is thus called eigen-inference.
- Some of my research focuses on what statistical properties are needed for $X_1, \ldots, X_n$ in order for such inference to be possible. Do the matrices $X_1, \ldots, X_n$ have to be unitarily invariant?
- The inference method depends highly on $f$ and on the properties of $X_1, \ldots, X_n$.

Random matrix library [4]

- Eigen-inference methods in my research are most often moment-based, in that the inference methods find expressions for the moments of $D$ (i.e. moment-based eigen-inference).
- I am developing a random matrix library [4], which contains stochastic eigen-inference methods for many types of $f$ and $X_1, \ldots, X_n$. The cases where the $X_i$ are Vandermonde matrices, Toeplitz matrices and Gaussian matrices are considered.
- The library is moment-based (in the literature, other approaches exist, such as that based on the Stieltjes transform).
- Many of the moment-based methods are based on summing certain expressions over classes of partitions.
- There is much room for optimizations and for simplifying expressions through combinatorics.

Applications

- A random matrix has the form $D V^H V$, where $D$ is an unknown diagonal matrix, and $V$ is a Vandermonde matrix. Can we infer the eigenvalue distribution of $D$ from that of $D V^H V$? The latter is observable.
- A random matrix has the form $V^H V + D$, where $D$ is an unknown diagonal matrix, and $V$ is a Vandermonde matrix. Can we infer the eigenvalue distribution of $D$ from that of $V^H V + D$? The latter is observable.

This inference is also based on finding expressions for the moments of $D V^H V$ and $D + V^H V$, which can be found using the same techniques.

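The moments of a Toeplitz matrix

$$T_n = \frac{1}{\sqrt{n}} \begin{pmatrix} X_0 & X_1 & X_2 & \cdots & X_{n-2} & X_{n-1} \\ X_1 & X_0 & X_1 & \ddots & & X_{n-2} \\ X_2 & X_1 & X_0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & X_1 & X_2 \\ X_{n-2} & & \ddots & X_1 & X_0 & X_1 \\ X_{n-1} & X_{n-2} & \cdots & X_2 & X_1 & X_0 \end{pmatrix},$$

where the $X_i$ are i.i.d., real-valued random variables with variance 1, are similar, in that they are volumes of solution sets to certain linear equations, and can be found through Fourier-Motzkin elimination. The same applies for Hankel and Markov matrices.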
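A realization of $T_n$ and its first empirical moments can be generated in a few lines. This is a sketch under assumptions: the entries are taken to be standard Gaussian (the text only requires i.i.d. real entries with variance 1), and n = 200 is an arbitrary illustrative size.

```matlab
n = 200;
x = randn(n, 1);               % X_0, ..., X_{n-1}, assumed standard Gaussian
T = toeplitz(x) / sqrt(n);     % symmetric Toeplitz matrix with T(i,j) = X_{|i-j|}/sqrt(n)

% Empirical moments as normalized traces of powers of T
m2 = trace(T^2) / n;
m4 = trace(T^4) / n;
disp([m2, m4])
```

Averaging such moments over many realizations gives estimates that can be compared with the values obtained from the solution-set volumes.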

Convergence of random matrices

Assume that $V_n$ is a sequence of random matrices. In random matrix theory, one is interested in convergence of the empirical eigenvalue distribution of $V_n$ as $n \to \infty$. There are several modes of convergence:

- Convergence in expectation: $\lim_{n \to \infty} E(\mathrm{tr}((V_n)^m))$ exists for all $m$.
- Convergence in probability: $\lim_{n \to \infty} \Pr(|\mathrm{tr}((V_n)^m) - v_m| \ge \epsilon) = 0$ for all $\epsilon > 0$, where $v_m$ denotes the limiting $m$'th moment.
- Almost sure convergence: $\lim_{n \to \infty} \mathrm{tr}((V_n)^m)$ exists almost surely. This is the strongest form of convergence.

Almost sure convergence holds for both Vandermonde matrices and Gaussian matrices.

Gaussian matrices

- When the entries in a random matrix are i.i.d. Gaussian, examples of the eigenvalue limit distributions we observe are shown on the next foils.
- The paper [5] builds a moment-based inference framework where general combinations of Gaussian matrices are involved.
- The formulas for the moments of Gaussian matrices are different from (3). They typically involve noncrossing partitions in some way.

Here for n = 500. (Figure: plot of the eigenvalues in the complex plane, both axes from −1 to 1.)

    plot(eig((1/sqrt(1000))*(randn(500,500)+j*randn(500,500))),'kx')

The semicircle law

When A is a selfadjoint, complex, Gaussian random matrix, (Figure: histogram of the eigenvalues of A, horizontal axis from −3 to 3.)

    A = (1/sqrt(2000)) * (randn(1000,1000) + j*randn(1000,1000));
    A = (sqrt(2)/2)*(A+A');
    hist(eig(A),40)

- This talk is available at
- My publications are listed at

THANK YOU!

[1] Ø. Ryan and M. Debbah, Asymptotic behaviour of random Vandermonde matrices with entries on the unit circle, IEEE Trans. on Information Theory, vol. 55, no. 7, pp. 3115-3148, 2009.
G.