Beta-Ensembles with Covariance

by

Alexander Dubbs

A.B., Harvard University (2009)

Submitted to the Department of Mathematics
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Applied Mathematics
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2014

© 2014 Alexander Dubbs. All rights reserved.

Author: (signature redacted)
Department of Mathematics
April 18, 2014

Certified by: (signature redacted)
Alan Edelman
Professor
Thesis Supervisor

Accepted by: (signature redacted)
Peter Shor
Chairman, Applied Mathematics Committee
Beta-Ensembles with Covariance
by
Alexander Dubbs
Submitted to the Department of Mathematics
on April 18, 2014, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy in Applied Mathematics
Abstract
This thesis presents analytic samplers for the β-Wishart and β-MANOVA ensembles
with diagonal covariance. These generalize the β-ensembles of Dumitriu-Edelman,
Lippert, Killip-Nenciu, Forrester-Rains, and Edelman-Sutton, as well as the classical
β = 1, 2, 4 ensembles of James, Li-Xue, and Constantine. Forrester discovered a
sampler for the β-Wishart ensemble around the same time, although our proof has
key differences. We also derive the largest eigenvalue pdf for the β-MANOVA case. In
infinite-dimensional random matrix theory, we find the moments of the Wachter law,
and the Jacobi parameters and free cumulants of the McKay and Wachter laws. We
also present an algorithm that uses complex analysis to solve "The Moment Problem."
It takes the first batch of moments of an analytic, compactly-supported distribution
as input, and it outputs a fine discretization of that distribution.
Thesis Supervisor: Alan Edelman
Title: Professor
Acknowledgments
I am grateful for the help of my adviser Alan Edelman. This thesis would not have
been possible without his patience and inspiration. Thanks to him I am a much better
researcher than I was when I arrived at MIT. It was a privilege to contribute to the
fields of β-ensembles and infinite random matrix theory.
I am also grateful for the help of my coauthors Plamen Koev and Praveen Venkataramana. Plamen's mhg software and Praveen's combinatorial skills helped push this
thesis across the finish line.
I would also like to thank Marcelo Magnasco and Christopher Jones. Marcelo let
me into his lab while I was still a high school student and taught me to do computational neuroscience research, culminating in a paper. Chris both kept me occupied
with "bonus" problems and allowed me the opportunity to learn independently.
To my friends: it has been a wonderful experience living with you in Cambridge
for the last nine years. You will all be missed.
Finally, I would like to thank my family, who encouraged me to study mathematics.
Chapter 1
Introduction
1.1 Work on beta-ensembles to date.
We define a β-ensemble to be a probability distribution with a continuous dimension
parameter β > 0 that adjusts the degree of Vandermonde repulsion among its variables.
β-ensembles are typically the eigenvalue, singular value, or generalized singular
value distributions of finite random matrices with Gaussian entries. The three main
ones are the Hermite, Laguerre, and Jacobi ensembles; see [12].
  Hermite:   c_H^β ∏_{i<j} |λ_i − λ_j|^β exp(−(1/2) Σ_{i=1}^n λ_i²)

  Laguerre:  c_L^{β,a} ∏_{i<j} |λ_i − λ_j|^β ∏_{i=1}^n λ_i^{a−p} exp(−(1/2) Σ_{i=1}^n λ_i)

  Jacobi:    c_J^{β,a_1,a_2} ∏_{i<j} |λ_i − λ_j|^β ∏_{i=1}^n λ_i^{a_1−p} (1 − λ_i)^{a_2−p}

(Here p = 1 + (β/2)(n − 1).)
The β = 1, 2, 4 cases of these distributions are the eigenvalue distributions of ensembles
of real (β = 1), complex (β = 2), and quaternionic (β = 4) random matrices of
Gaussians. Let X and Y denote Gaussian random matrices over the reals, complexes,
or quaternions, depending on β. In terms of the eigenvalue distribution, we
have the correspondence:
  Hermite:   eig((X + X†)/2)
  Laguerre:  eig(XX†)
  Jacobi:    eig(XX†(XX† + YY†)^{−1})
There also exist finite Gaussian random matrix ensembles over the reals, complexes,
or quaternions governed by a diagonal matrix of nonrandom tuning parameters, called
the ensemble's "covariance." The two known ones are below, where gsvdc indicates
the "cosine generalized singular values":

  gsvdc(Y, XΩ) = eig(Y†Y (Y†Y + ΩX†XΩ)^{−1})^{1/2}.

D and Ω are diagonal matrices of tuning parameters, Σ = diag(σ_1, …, σ_n) are singular
values, and C = diag(c_1, …, c_n) are cosine generalized singular values; see [10] and
[11].
  Wishart:  Σ = svd(D^{1/2} X), with density proportional to

    c_W^β det(D)^{−mβ/2} ∏_{i<j} |σ_i² − σ_j²|^β ∏_{i=1}^n σ_i^{(m−n+1)β−1} ₀F₀^{(β)}(−Σ²/2, D^{−1}).

  MANOVA:  C = gsvdc(Y, XΩ), with density, derived in Chapter 3, proportional to
  ∏_{i<j} |c_i² − c_j²|^β times a product of powers of the c_i and (1 − c_i²), times the
  hypergeometric factor ₁F₀^{(β)}(·; C²(C² − I)^{−1}, Ω²).
The hypergeometric functions pF_q^{(β)} are defined in Chapter 2, Section 2.5 (and that
definition uses the Jack functions, which are in Section 2.4).

It is a natural question to ask, "For continuous β > 0, is there a matrix ensemble
that has a given β-ensemble as its eigenvalue distribution?" In [12], Dumitriu and
Edelman were the first to answer yes, in the cases of the Hermite and Laguerre
ensembles. If we define the matrix B as it is below, eig(BB†) follows the Laguerre
ensemble, and it works for any β > 0. The χ_k's denote independent χ-distributed variables
with the correct degrees of freedom.
  B = [ χ_{2a}                                          ]
      [ χ_{β(m−1)}  χ_{2a−β}                            ]
      [       ⋱          ⋱                              ]
      [             χ_{β}      χ_{2a−β(m−1)}            ]
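As a concrete illustration, the bidiagonal model above is easy to sample numerically. The following is a minimal numpy sketch (the function name beta_laguerre_eigs is ours, not from the thesis); χ_k draws are obtained as square roots of χ² draws.

```python
import numpy as np

def beta_laguerre_eigs(m, a, beta, rng):
    """Sample beta-Laguerre eigenvalues via the Dumitriu-Edelman
    bidiagonal matrix B: eig(B B^T), valid for any beta > 0."""
    # Diagonal: chi_{2a}, chi_{2a-beta}, ..., chi_{2a-beta(m-1)}
    diag = np.sqrt(rng.chisquare([2 * a - beta * i for i in range(m)]))
    # Subdiagonal: chi_{beta(m-1)}, ..., chi_{beta}
    sub = np.sqrt(rng.chisquare([beta * (m - 1 - i) for i in range(m - 1)]))
    B = np.diag(diag) + np.diag(sub, -1)
    return np.sort(np.linalg.eigvalsh(B @ B.T))

eigs = beta_laguerre_eigs(5, 6.0, 2.5, np.random.default_rng(0))
```

The parameters must satisfy 2a − β(m − 1) > 0 so every χ² degree-of-freedom count is positive.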
This thesis' contributions to finite random matrix theory are analytic samplers for
the β-Wishart and β-MANOVA ensembles with covariance for general β > 0. They
are not as simple as finding the eigenvalues of a single matrix; instead, the eigenvalues of
many matrices are needed to produce the samples, which are proven to come from the
exactly correct distributions. In addition, we contribute the probability distribution
function of the largest eigenvalue of the β-MANOVA ensemble, which we check with
the software mhg [43]. Chapter 2 (originally in [10]) is concerned with the β-Wishart
ensemble, which was discovered around the same time by Forrester [27], and Chapter
3 (originally in [11]) is concerned with the β-MANOVA ensemble. Most work to date
on matrix models for β-ensembles is described below:
Laguerre/Wishart Models

           D = I (Laguerre)                      D general (Wishart)
  β = 1    Fisher [24] (1939), Hsu [33] (1939),  James [36] (1960)
           Roy [61] (1939)
  β = 2    James [37] (1964)                     James [37] (1964)
  β = 4    Li-Xue [45] (2009)                    Li-Xue [45] (2009)
  β > 0    Dumitriu-Edelman [12] (2002)          Forrester [27] (2011),
                                                 Dubbs-Edelman-Koev-Venkataramana [10] (2013)
Jacobi/MANOVA Models

           Q = I (Jacobi)                            Q general (MANOVA)
  β = 1    Fisher [24] (1939), Girshick [29] (1939), Constantine (unpublished,
           Hsu [33] (1939), Mood [51] (1951),        found in [37] (1964))
           Olkin-Roy [55] (1954), Roy [61] (1939)
  β = 2    James [37] (1964)                         James [37] (1964)
  β > 0    Lippert [46] (2003),                      Dubbs-Edelman [11] (2014)
           Killip-Nenciu [40] (2004),
           Forrester-Rains [28] (2005),
           Edelman-Sutton [21] (2008)
The Hermite ensemble does not, as of this writing, have a generalization using
a covariance matrix. In addition, a matrix model for the β-circular ensemble was
proven to work by [40]. The samplers for the β-Wishart and β-MANOVA
ensembles are described below. The Wishart covariance parameters are in D and its
singular values are in Σ; the MANOVA covariance parameters are in Ω and its cosine
generalized singular values are in C.
Beta-Wishart (Recursive) Model Pseudocode

Function Σ := BetaWishart(m, n, β, D)
  if n = 1 then
    Σ := χ_{mβ} D_{11}^{1/2}
  else
    Z_{1:n−1,1:n−1} := diag(BetaWishart(m, n − 1, β, D_{1:n−1,1:n−1}))
    Z_{n,1:n−1} := [0, …, 0]
    Z_{1:n−1,n} := [χ_β D_{nn}^{1/2}; …; χ_β D_{nn}^{1/2}]
    Z_{n,n} := χ_{(m−n+1)β} D_{nn}^{1/2}
    Σ := diag(svd(Z))
  end if
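The recursion above translates directly into a few lines of numpy. This is our own hedged sketch (the helper name beta_wishart_svals is ours); D is passed as the diagonal of the covariance, and χ_k draws are square roots of χ² draws.

```python
import numpy as np

def beta_wishart_svals(m, n, beta, D, rng):
    """Recursive beta-Wishart singular-value sampler, following the
    pseudocode above; D is the length-n diagonal of the covariance."""
    if n == 1:
        return np.array([np.sqrt(rng.chisquare(m * beta) * D[0])])
    tau = beta_wishart_svals(m, n - 1, beta, D[:n - 1], rng)
    Z = np.zeros((n, n))
    Z[:n - 1, :n - 1] = np.diag(tau)                 # previous singular values
    Z[:n - 1, n - 1] = np.sqrt(rng.chisquare(beta, n - 1) * D[n - 1])
    Z[n - 1, n - 1] = np.sqrt(rng.chisquare((m - n + 1) * beta) * D[n - 1])
    return np.linalg.svd(Z, compute_uv=False)        # descending order

svals = beta_wishart_svals(5, 3, 1.7, np.array([1.0, 2.0, 3.0]),
                           np.random.default_rng(1))
```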
Beta-MANOVA Model Pseudocode

Function C := BetaMANOVA(m, n, p, β, Ω)
  Λ := BetaWishart(m, n, β, Ω²)²
  M := BetaWishart(p, n, β, Λ^{−1})^{−2}
  C := (M + I)^{−1/2}
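A hedged numpy transcription of the two-stage sampler (function names are ours; it inlines the recursive BetaWishart from the previous pseudocode):

```python
import numpy as np

def beta_wishart_svals(m, n, beta, D, rng):
    # Recursive beta-Wishart singular-value sampler (previous pseudocode).
    if n == 1:
        return np.array([np.sqrt(rng.chisquare(m * beta) * D[0])])
    tau = beta_wishart_svals(m, n - 1, beta, D[:n - 1], rng)
    Z = np.zeros((n, n))
    Z[:n - 1, :n - 1] = np.diag(tau)
    Z[:n - 1, n - 1] = np.sqrt(rng.chisquare(beta, n - 1) * D[n - 1])
    Z[n - 1, n - 1] = np.sqrt(rng.chisquare((m - n + 1) * beta) * D[n - 1])
    return np.linalg.svd(Z, compute_uv=False)

def beta_manova_cosines(m, n, p, beta, Omega, rng):
    """Cosine generalized singular values via two nested Wishart draws."""
    Lam = beta_wishart_svals(m, n, beta, Omega ** 2, rng) ** 2   # Lambda
    M = beta_wishart_svals(p, n, beta, 1.0 / Lam, rng) ** -2.0   # M
    return 1.0 / np.sqrt(M + 1.0)                                # C

c = beta_manova_cosines(6, 3, 5, 2.0, np.array([1.0, 1.5, 2.0]),
                        np.random.default_rng(2))
```

Every cosine generalized singular value lies in (0, 1) by construction, since M > 0.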
The distributions of the largest and smallest β-Wishart eigenvalues are due to [41]
and included in Chapter 2. The distribution of the largest cosine generalized singular
value of the β-MANOVA distribution is new to this thesis and proved in Chapter 3.
It is:

Theorem 1. If t = (m − n + 1)β/2 − 1 ∈ Z≥0, then

  P(c_1 < x) = det(x²Ω²((1 − x²)I + x²Ω²)^{−1})^{mβ/2}
               × Σ_{k=0}^{nt} Σ_{κ ⊢ k, κ_1 ≤ t} [(pβ/2)_κ^{(β)} / k!] C_κ^{(β)}((1 − x²)((1 − x²)I + x²Ω²)^{−1}),   (1.1)

where the Jack polynomial C_κ^{(β)} and Pochhammer symbol (·)_κ^{(β)} are defined in Sections
2.4 and 2.5.
1.2 Ghost methods.
Dumitriu and Edelman's original paper on β-ensembles [12], as well as Chapters 2
and 3 of this thesis, make use of Ghost methods, a concept formally put forward
by Edelman [17]. There are two ways of looking at it. First, say you can use linear
algebra to reduce a complex or quaternionic matrix to a real matrix with the same
eigenvalues, and say that method works in the same way for an initially given random
real, complex, or quaternionic matrix. The derived similar real matrix will have a
tuning parameter β > 0 indicating whether it originally came from a real (β = 1),
complex (β = 2), or quaternionic (β = 4) matrix. Then, make that tuning
parameter β in the derived similar matrix continuous and find the matrix's eigenvalue
p.d.f., which will be a β-ensemble with Vandermonde β-repulsion. Now we have
accomplished two things: we have a proof of the eigenvalue p.d.f. for the initial real,
complex, or quaternionic random matrix, and we have additionally found a matrix
model for a generalizing β-ensemble.
Another way to look at Ghost methods, which has not yet been fully formalized, is to
pretend that an initial matrix for which we desire the eigenvalue p.d.f. is populated
by independent "Ghost Gaussians," and possibly some real covariance parameters.
Ghost matrices have the property that their eigenvalue distributions are invariant
under real orthogonal matrices and "Ghost Orthogonal Matrices," including but not
limited to diagonal matrices of "Ghost Signs." A Ghost Orthogonal Matrix or a real
orthogonal matrix times a vector of Ghost Gaussians leaves its distribution invariant.
Ghost Signs have the property that if they multiply their respective Ghost Gaussians,
the result is a real χ_β random variable.
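The β = 2 instance of the last property can be checked concretely: a Ghost Gaussian is then an ordinary complex Gaussian, its "sign" is its phase, and multiplying by the conjugate sign leaves the real quantity |G|, which is χ-distributed with 2 degrees of freedom. A hedged Monte Carlo sketch (ours, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal(200_000) + 1j * rng.standard_normal(200_000)
sign = np.conj(G) / np.abs(G)      # the complex "sign" (conjugate phase)
r = (sign * G).real                # equals |G|, a real chi_2 draw
err = abs(r.mean() - np.sqrt(np.pi / 2))   # chi_2 has mean sqrt(pi/2)
```

With 200,000 samples, err comes out well under 0.01.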
Let's consider the case of the 3 × 3 Wishart over the reals, complex numbers,
or quaternions with identity covariance. Let G_β represent an independent Gaussian
real, complex, or quaternion for β = 1, 2, 4, with mean zero and variance one. Let χ_d
be a χ-distributed real with d degrees of freedom. The following algorithm computes
the singular values, where all of the random variables in a given matrix are assumed
independent. We assume D = I for purposes of illustration, but the algorithm generalizes.
We proceed through a series of matrices related by orthogonal transformations
on the left and the right.
  [ G_β  G_β  G_β ]      [ χ_{3β}  G_β  G_β ]      [ χ_{3β}  G_β     G_β ]
  [ G_β  G_β  G_β ]  →   [ 0       G_β  G_β ]  →   [ 0       χ_{2β}  G_β ]
  [ G_β  G_β  G_β ]      [ 0       G_β  G_β ]      [ 0       0       G_β ]
To create the real, positive (1, 2) entry, we multiply the second column by a real
sign, or a complex or quaternionic phase. We then use a Householder reflector on the
bottom two rows to make the (2, 2) entry a χ_{2β}. Now we take the SVD of the 2 × 2
upper-left block:

  [ τ_1  0    G_β ]      [ τ_1  0    χ_β ]
  [ 0    τ_2  G_β ]  →   [ 0    τ_2  χ_β ]
  [ 0    0    G_β ]      [ 0    0    χ_β ]
We convert the third column to reals using a diagonal matrix of signs on both sides.
The process can be continued for a larger matrix, and can work with one that is taller
than it is wide. What it proves is that the second-to-last matrix,
  [ τ_1  0    χ_β ]
  [ 0    τ_2  χ_β ]
  [ 0    0    χ_β ]
has the same singular values as the first matrix, if β = 1, 2, 4. We call this new
matrix a "Broken-Arrow Matrix." The previously stated algorithm, "Beta-Wishart
(Recursive) Model Pseudocode," which generalizes the one above for the 3 × 3 case,
samples the singular values of the Wishart ensemble for general β and general D.
We can also use Ghost methods to derive the correctness of the previously stated
algorithm, "Beta-MANOVA Model Pseudocode," for β = 1, 2, 4, and conjecture that
the algorithm works for continuous β. Let X be m × n real, complex, quaternion,
or Ghost normal, Y be p × n real, complex, quaternion, or Ghost normal, and let
Ω be n × n diagonal real positive-definite. Let ΩX*XΩ have eigendecomposition UΛU*, and
ΩX*XΩ(Y*Y)^{−1} have eigendecomposition VMV*. We want to draw M so we can
draw C from (C, S) = gsvdc(Y, XΩ), C = (M + I)^{−1/2}. Let ~ mean "having the same
eigenvalue distribution." Then

  ΩX*XΩ(Y*Y)^{−1} ~ ΛU*(Y*Y)^{−1}U ~ Λ((U*Y*)(YU))^{−1} ~ Λ(Y*Y)^{−1},

from which we can draw the eigenvalues M using BetaWishart(p, n, β, Λ^{−1})^{−2}. Since
Λ can be drawn using BetaWishart(m, n, β, Ω²)², this completes the algorithm for
BetaMANOVA(m, n, p, β, Ω) and proves that it has the desired generalized singular
values in the β = 1, 2, 4 cases.
1.3 Infinite Random Matrix Theory and The Moment Problem.
Consider the "big" laws for asymptotic level densities for various random matrices:

  Wigner semicircle law [66]
  Marchenko-Pastur law [49]
  McKay law [50]
  Wachter law [65]
Their measures and supports are defined in the table below. (The McKay and
Wachter laws are related by the linear transform (2x_Wachter − 1)ν = x_McKay with
a = b = ν/2.)

  Wigner semicircle:   dμ_WS = √(4 − x²)/(2π) dx,
                       support I_WS = [−2, 2];  no parameters.

  Marchenko-Pastur:    dμ_MP = √((λ₊ − x)(x − λ₋))/(2πλx) dx,
                       support I_MP = [λ₋, λ₊], λ± = (1 ± √λ)²;  parameter λ ∈ (0, 1].

  McKay:               dμ_M = ν√(4(ν − 1) − x²)/(2π(ν² − x²)) dx,
                       support I_M = [−2√(ν − 1), 2√(ν − 1)];  parameter ν > 2.

  Wachter:             dμ_W = (a + b)√((λ₊ − x)(x − λ₋))/(2πx(1 − x)) dx,
                       support I_W = [λ₋, λ₊], λ± = ((√(a(a + b − 1)) ± √b)/(a + b))²;
                       parameters a, b ≥ 1.
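As a quick numeric sanity check of the Marchenko-Pastur entry (our own sketch, not from the thesis), one can integrate the density over its support and compare the low moments with the known values m_0 = 1, m_1 = 1, m_2 = 1 + λ:

```python
import numpy as np

lam = 0.5
lo, hi = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2
x = np.linspace(lo, hi, 400_001)
dx = x[1] - x[0]
rho = np.sqrt(np.maximum((hi - x) * (x - lo), 0.0)) / (2 * np.pi * lam * x)
# Riemann sums are accurate here since the density vanishes at both endpoints
moments = [float(np.sum(rho * x ** k) * dx) for k in range(3)]
```

The computed moments come out close to [1.0, 1.0, 1.5] for λ = 0.5.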
These four measures have other representations: as their Cauchy, R, and S transforms;
as their moments and free cumulants; and as their Jacobi parameters and
orthogonal polynomial sequences. In fact, their Jacobi parameters (α_i, β_i)_{i≥0} have
the property that they are "bordered Toeplitz": α_1 = α_2 = ⋯ and β_1 = β_2 = ⋯.
This motivates the two parts of Chapter 4:
1. We tabulate in one place key properties of the four laws, not all of which can
be found in the literature. These sections are expository, with the exception of
the as-of-yet unpublished Wachter moments, and the McKay and Wachter law
Jacobi parameters and free cumulants.
2. We describe a new algorithm to exploit the Toeplitz-with-length-k boundary
structure. In particular, we show how practical it is to approximate distributions
with incomplete information using distributions having nearly-Toeplitz
encodings. We can use the theory of Cauchy transforms to go from the first
batch of moments or Jacobi parameters of an analytic, compactly-supported
distribution to a fine discretization thereof which can be used computationally.
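The "first batch of moments → Jacobi parameters" direction can be sketched with a standard Hankel-Cholesky construction (this is our own illustration, not the thesis algorithm, which goes further and rebuilds a discretized density):

```python
import numpy as np

def jacobi_params_from_moments(mu):
    """Recover the three-term recurrence (Jacobi) parameters a_k, b_k from
    moments mu_0..mu_{2N}, via a Cholesky factorization H = R^T R of the
    Hankel moment matrix H_{ij} = mu_{i+j}."""
    N = (len(mu) - 1) // 2
    H = np.array([[mu[i + j] for j in range(N + 1)] for i in range(N + 1)],
                 dtype=float)
    R = np.linalg.cholesky(H).T          # upper triangular
    a = [R[k, k + 1] / R[k, k]
         - (R[k - 1, k] / R[k - 1, k - 1] if k else 0.0) for k in range(N)]
    b = [R[k + 1, k + 1] / R[k, k] for k in range(N)]
    return np.array(a), np.array(b)

# Semicircle moments are Catalan numbers at even order: 1, 0, 1, 0, 2, 0, 5, 0, 14
a, b = jacobi_params_from_moments([1, 0, 1, 0, 2, 0, 5, 0, 14])
# a = (0, 0, 0, 0), b = (1, 1, 1, 1): the semicircle's Jacobi parameters
# are Toeplitz from the very start, as claimed in the text.
```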
Studies of nearly Toeplitz matrices in random matrix theory have been pioneered
by Anshelevich [2, 3]. Other laws may be characterized as being asymptotically
Toeplitz or numerically Toeplitz fairly quickly, such as the limiting histogram of the
eigenvalues of (X/√m + μI)†(X/√m + μI), where X is m × n, n is O(m), and
m → ∞. Figure 4-3 shows that its histogram can be reconstructed from its first
batch of Jacobi parameters. Figure 4-2 shows distributions recovered from random
first batches of Jacobi parameters. Figure 4-4 shows the normal distribution (which is
not compactly supported) reasonably well recovered from its first 10 or 20 moments.
Chapter 2

A Matrix Model for the β-Wishart Ensemble
2.1 Introduction
The goal of this chapter is to prove that a random matrix Z has eig(Z†Z) distributed
with pdf equal to the β-ensemble below:

  c_W^β det(D)^{−mβ/2} Δ(λ)^β ∏_{i=1}^{n} λ_i^{(m−n+1)β/2−1} ₀F₀^{(β)}(−Λ/2, D^{−1}) dλ.

Z's singular values are said to be the √λ_i. Z is defined by the recursion in the box,
if n is a positive integer and m a real greater than n − 1.
Beta-Wishart (Recursive) Model, W^{(β)}(D, m, n)

  W^{(β)}(D, m, n) = [ diag(τ_1, …, τ_{n−1})   [χ_β D_{n,n}^{1/2}; …; χ_β D_{n,n}^{1/2}] ]
                     [ 0 ⋯ 0                   χ_{(m−n+1)β} D_{n,n}^{1/2}              ]

where {τ_1, …, τ_{n−1}} are the singular values of W^{(β)}(D_{1:n−1,1:n−1}, m, n − 1), with base case

  W^{(β)}(D, m, 1) = χ_{mβ} D_{1,1}^{1/2}.

The singular values of W^{(β)}(D, m, n) are the singular values of Z.
The critical aspect of the proof is changing variables from the τ_i's and the χ_β D_{n,n}^{1/2}
entries to the singular values of Z and the bottom row of its right singular vector matrix, q.
This requires the derivation of a Jacobian between the two sets of variables. In addition, to complete
the recursion, a theorem about Jack polynomials is needed. It is originally due to [54]
in a different form; the proof in this thesis is due to Praveen Venkataramana. Let dq
be the surface area measure of the first quadrant of the n-sphere. The theorem is:

  C_κ^{(β)}(Λ) ∝ ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}((I − qq†)Λ) dq.

The Jack polynomial C_κ^{(β)} is defined in Section 2.4.
For completeness we also include the distributions of the extreme eigenvalues,
which are due to [19], and we check them with the mhg software from [43].
2.2 Arrow and Broken-Arrow Matrix Jacobians
Define the (symmetric) arrow matrix

  A = [ d_1                     c_1     ]
      [       ⋱                 ⋮      ]
      [             d_{n−1}     c_{n−1} ]
      [ c_1   ⋯     c_{n−1}     c_n    ]

Let its eigenvalues be λ_1, …, λ_n. Let q be the last row of its eigenvector matrix, i.e.
q contains the n-th element of each eigenvector. q is by convention in the positive
quadrant.
Define the broken arrow matrix B by

  B = [ b_1                     a_1     ]
      [       ⋱                 ⋮      ]
      [             b_{n−1}     a_{n−1} ]
      [ 0     ⋯     0           a_n    ]

Let its singular values be σ_1, …, σ_n, and let q contain the bottom row of its right
singular vector matrix; i.e., with A = B†B, A is an arrow matrix and q is the last row
of its eigenvector matrix. q is by convention in the positive quadrant.

Define dq to be the surface-area element on the sphere in R^n.
Lemma 1. For an arrow matrix A, let f be the unique map f : (c, d) → (q, λ). The
Jacobian of f satisfies:

  dq dλ = [∏_{i=1}^{n} q_i / ∏_{i=1}^{n−1} c_i] dc dd.

The proof is after Lemma 3.

Lemma 2. For a broken arrow matrix B, let g be the unique map g : (a, b) → (q, σ).
The Jacobian of g satisfies:

  dq dσ = [∏_{i=1}^{n} q_i / ∏_{i=1}^{n−1} a_i] da db.

The proof is after Lemma 3.

Lemma 3. If all elements of a, b, q, σ are nonnegative, and b, d, λ, σ are ordered, then
f and g are bijections excepting sets of measure zero (if some b_i = b_j or some d_i = d_j
for i ≠ j).
Proof. We only prove it for f; the g case is similar. We show that f is a bijection
using results from Dumitriu and Edelman [12], who in turn cite Parlett [57]. Define
the tridiagonal matrix

  [ η_1  ε_1                    ]
  [ ε_1  η_2  ε_2               ]
  [            ⋱                ]
  [        ε_{n−2}  η_{n−1}     ]

to have eigenvalues d_1, …, d_{n−1} and bottom entries of the eigenvector matrix
u = (c_1, …, c_{n−1})/γ, where γ = (c_1² + ⋯ + c_{n−1}²)^{1/2}. Let the whole eigenvector matrix be
U. (d, u) ↔ (ε, η) is a bijection [12], [57], excepting sets of measure 0. Now we extend
the above tridiagonal matrix further and use ~ to indicate similar matrices:

  [ η_1  ε_1                         0   ]
  [ ε_1  η_2  ε_2                    0   ]
  [            ⋱                        ]
  [        ε_{n−2}  η_{n−1}         γ   ]
  [ 0  ⋯  0         γ               c_n ]   ~   A.

(c_1, …, c_{n−1}) ↔ (u, γ) is a bijection, as is (d, u) ↔ (ε, η), so we have constructed a
bijection (c_1, …, c_n, d_1, …, d_{n−1}) ↔ (c_n, γ, ε, η), excepting sets of measure
0. (c_n, γ, ε, η) defines a tridiagonal matrix which is in bijection with (q, λ) [12], [57].
Hence we have bijected (c, d) ↔ (q, λ). The proof that f is a bijection is complete. ∎
Proof of Lemma 1. By Dumitriu and Edelman [12], Lemma 2.9,

  dq dλ = [∏_{i=1}^{n} q_i / (γ ∏_{i=1}^{n−2} ε_i)] dc_n dγ dε dη.

Also by Dumitriu and Edelman [12], Lemma 2.9,

  dd du = [∏_{i=1}^{n−1} u_i / ∏_{i=1}^{n−2} ε_i] dε dη.

Together,

  dq dλ = [∏_{i=1}^{n} q_i / (γ ∏_{i=1}^{n−1} u_i)] dc_n dd du dγ.

The full spherical element is, using γ as the radius,

  dc_1 ⋯ dc_{n−1} = γ^{n−2} du dγ.

Hence, using c_i = γ u_i,

  dq dλ = [∏_{i=1}^{n} q_i / ∏_{i=1}^{n−1} c_i] dc dd. ∎
Proof of Lemma 2. Let A = B†B, so d_i = b_i² and c_i = a_i b_i for i < n, and
c_n = a_1² + ⋯ + a_n². Then dλ = ∏_{i=1}^{n} (2σ_i) dσ, and since

  ∏_{i=1}^{n} σ_i² = det(B†B) = det(B)² = a_n² ∏_{i=1}^{n−1} b_i²,

we have ∏_{i=1}^{n} σ_i = a_n ∏_{i=1}^{n−1} b_i. The full-matrix Jacobian ∂(c, d)/∂(a, b)
is block triangular up to a permutation of rows, and its determinant gives

  dc dd = 2^n a_n ∏_{i=1}^{n−1} b_i² da db.

So, by Lemma 1,

  dq dσ = [∏_{i=1}^{n} q_i / ∏_{i=1}^{n−1} c_i] · [2^n a_n ∏_{i=1}^{n−1} b_i² / ∏_{i=1}^{n} 2σ_i] da db
        = [∏_{i=1}^{n} q_i / ∏_{i=1}^{n−1} a_i] da db,

using c_i = a_i b_i and ∏_{i=1}^{n} σ_i = a_n ∏_{i=1}^{n−1} b_i. ∎
2.3 Further Arrow and Broken-Arrow Matrix Lemmas
Lemma 4.

  q_k = [ 1 + Σ_{j=1}^{n−1} c_j²/(λ_k − d_j)² ]^{−1/2}.

Proof. Let v be the eigenvector of A corresponding to λ_k. Temporarily fix v_n = 1.
Using Av = λ_k v, for j < n, v_j = c_j/(λ_k − d_j). Renormalizing v so that ‖v‖ = 1, we
get the desired value for v_n = q_k. ∎
Lemma 5. For a vector x of length l, define Δ(x) = ∏_{i<j} |x_i − x_j|. Then,

  Δ(λ) = Δ(d) ∏_{k=1}^{n−1} c_k ∏_{k=1}^{n} q_k^{−1}.

Proof. Using a result in Wilkinson [67], the characteristic polynomial of A is:

  p(λ) = ∏_{i=1}^{n} (λ_i − λ) = ∏_{i=1}^{n−1} (d_i − λ) [ c_n − λ − Σ_{j=1}^{n−1} c_j²/(d_j − λ) ].   (2.1)

Therefore, for k < n,

  p(d_k) = ∏_{i=1}^{n} (λ_i − d_k) = −c_k² ∏_{i=1, i≠k}^{n−1} (d_i − d_k).   (2.2)

Taking absolute values and a product over k on both sides,

  ∏_{k=1}^{n−1} ∏_{i=1}^{n} |λ_i − d_k| = Δ(d)² ∏_{k=1}^{n−1} c_k².

Also,

  p′(λ_k) = −∏_{i=1, i≠k}^{n} (λ_i − λ_k) = −∏_{i=1}^{n−1} (d_i − λ_k) [ 1 + Σ_{j=1}^{n−1} c_j²/(d_j − λ_k)² ].   (2.3)

Taking absolute values and a product over k on both sides,

  Δ(λ)² = ∏_{k=1}^{n} ∏_{i=1}^{n−1} |d_i − λ_k| · ∏_{k=1}^{n} [ 1 + Σ_{j=1}^{n−1} c_j²/(d_j − λ_k)² ].

Equating the two expressions for the double product ∏_{k=1}^{n−1} ∏_{i=1}^{n} |λ_i − d_k|, we get

  Δ(λ)² = Δ(d)² ∏_{k=1}^{n−1} c_k² ∏_{k=1}^{n} [ 1 + Σ_{j=1}^{n−1} c_j²/(d_j − λ_k)² ].

The desired result follows by the previous lemma. ∎
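Lemma 5 is easy to check numerically. The following sketch (ours, not from the thesis) builds a 3 × 3 arrow matrix and compares both sides of the identity:

```python
import numpy as np

# Arrow matrix with d = (2, 0.5), off-diagonal c = (0.7, 0.3), corner c_n = 1.0.
d = np.array([2.0, 0.5])
c = np.array([0.7, 0.3])
A = np.array([[d[0], 0.0,  c[0]],
              [0.0,  d[1], c[1]],
              [c[0], c[1], 1.0]])
lam, V = np.linalg.eigh(A)
q = np.abs(V[-1, :])                       # positive-quadrant convention
lhs = np.prod([abs(lam[i] - lam[j])
               for i in range(3) for j in range(i + 1, 3)])   # Delta(lambda)
rhs = abs(d[0] - d[1]) * c[0] * c[1] / np.prod(q)  # Delta(d) prod c_k / prod q_k
```

Both sides agree to machine precision.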
Lemma 6. For a vector x of length l, define Δ₂(x) = ∏_{i<j} |x_i² − x_j²|. The singular
values of B satisfy

  Δ₂(σ) = Δ₂(b) ∏_{k=1}^{n−1} a_k b_k ∏_{k=1}^{n} q_k^{−1}.

Proof. Follows from A = B†B, applying Lemma 5 with λ_k = σ_k², d_k = b_k², and
c_k = a_k b_k. ∎
2.4 Jack and Hermite Polynomials
The proof structure of this section, culminating in Theorem 2, is due to my collaborator
Praveen Venkataramana, as are several of the lemmas.

As in [14], if κ ⊢ k, then κ = (κ_1, κ_2, …) is nonnegative, ordered non-increasingly, and
it sums to k. Let α = 2/β. Let ρ_κ^α = Σ_i κ_i(κ_i − 1 − (2/α)(i − 1)). We define l(κ)
to be the number of nonzero elements of κ. We say that μ < κ in "lexicographic
ordering" if for the largest integer j such that μ_i = κ_i for all i < j, we have μ_j < κ_j.
Definition 1. As in Dumitriu, Edelman, and Shuman [14], we define the Jack polynomial
of a matrix argument, C_κ^{(β)}(X), as follows: Let x_1, …, x_n be the eigenvalues
of X. C_κ^{(β)}(X) is the only homogeneous polynomial eigenfunction of the Laplace-Beltrami-type operator:

  D* = Σ_{i=1}^{n} x_i² ∂²/∂x_i² + β Σ_{1≤i≠j≤n} [x_i²/(x_i − x_j)] ∂/∂x_i,

with eigenvalue ρ_κ^α + k(n − 1), having highest order monomial basis function in
lexicographic ordering (see [14], Section 2.4) corresponding to κ. In addition,

  Σ_{κ ⊢ k, l(κ) ≤ n} C_κ^{(β)}(X) = trace(X)^k.
Lemma 7. If we write C_κ^{(β)}(X) in terms of the eigenvalues x_1, …, x_n as
C_κ^{(β)}(x_1, …, x_n), then C_κ^{(β)}(x_1, …, x_{n−1}) = C_κ^{(β)}(x_1, …, x_{n−1}, 0) if
l(κ) < n. If l(κ) = n, C_κ^{(β)}(x_1, …, x_{n−1}, 0) = 0.

Proof. The l(κ) = n case follows from a formula in Stanley [63], Propositions 5.1 and
5.5, that only applies if κ_n > 0:

  C_κ^{(β)}(X) ∝ det(X) C_{(κ_1−1, …, κ_n−1)}^{(β)}(X).

If κ_n = 0, from Koev [43, (3.8)], C_κ^{(β)}(x_1, …, x_{n−1}) = C_κ^{(β)}(x_1, …, x_{n−1}, 0). ∎
Definition 2. The Hermite polynomials (of a matrix argument) are a basis for the
space of symmetric multivariate polynomials over the eigenvalues x_1, …, x_n of X which
are related to the Jack polynomials by (Dumitriu, Edelman, and Shuman [14], page
17)

  H_κ^{(β)}(X) = Σ_{σ ⊆ κ} c_{κσ} C_σ^{(β)}(X),

where σ ⊆ κ means σ_i ≤ κ_i for each i, and the coefficients c_{κσ} are given by
(Dumitriu, Edelman, and Shuman [14], page 17). Since Jack polynomials are homogeneous,
that means

  H_κ^{(β)}(X) ∝ C_κ^{(β)}(X) + L.O.T.

Furthermore, by (Dumitriu, Edelman, and Shuman [14], page 16), the Hermite polynomials
are orthogonal with respect to the measure

  exp(−(1/2) Σ_{i=1}^{n} x_i²) ∏_{i<j} |x_i − x_j|^β.
Lemma 8. Let

  A(μ, c) = [ μ_1                     c_1     ]
            [       ⋱                 ⋮      ]
            [             μ_{n−1}     c_{n−1} ]
            [ c_1   ⋯     c_{n−1}     c_n    ]

and let, for l(κ) ≤ n, with M = diag(μ_1, …, μ_{n−1}),

  Q(μ, c_n) = ∫_{[0,∞)^{n−1}} ∏_{i=1}^{n−1} c_i^{β−1} H_κ^{(β)}(A(μ, c)) exp(−c_1² − ⋯ − c_{n−1}²) dc_1 ⋯ dc_{n−1}.

Q is a symmetric polynomial in μ with leading term proportional to H_κ^{(β)}(M) plus
terms of order strictly less than |κ|.
Proof. If we exchange two c_i's, i < n, and the corresponding μ_i's, A(μ, c) has the same
eigenvalues, so H_κ^{(β)}(A(μ, c)) is unchanged. So, we can prove Q(μ, c_n) is symmetric
in μ by swapping two μ_i's, and seeing that the integral is invariant under swapping the
corresponding c_i's.

Now since H_κ^{(β)}(A(μ, c)) is a symmetric polynomial in the eigenvalues of A(μ, c),
we can write it in the power-sum basis, i.e. it is in the ring generated by
t_p = λ_1^p + ⋯ + λ_n^p, for p = 0, 1, 2, 3, …, if λ_1, …, λ_n are the eigenvalues of A(μ, c). But
t_p = trace(A(μ, c)^p), so it is a polynomial in μ and c:

  H_κ^{(β)}(A(μ, c)) = Σ_{i ≥ 0} Σ_{ε_1, …, ε_{n−1} ≥ 0} p_{i,ε}(μ) c_n^i c_1^{ε_1} ⋯ c_{n−1}^{ε_{n−1}}.

Its order in μ and c must be |κ|, the same as its order in λ. Integrating, it follows
that

  Q(μ, c_n) = Σ_{i ≥ 0} Σ_{ε_1, …, ε_{n−1} ≥ 0} p_{i,ε}(μ) c_n^i M_ε,

for constants M_ε. Since deg(H_κ^{(β)}(A(μ, c))) = |κ|, deg(p_{i,ε}(μ)) ≤ |κ| − |ε| − i. Writing

  Q(μ, c_n) = M_0 p_{0,0}(μ) + Σ_{(i,ε) ≠ (0,0)} p_{i,ε}(μ) c_n^i M_ε,

we see that the summation has degree at most |κ| − 1 in μ only, treating c_n as a
constant. Now

  p_{0,0}(μ) = H_κ^{(β)}(M) + r(μ),

where r(μ) has degree at most |κ| − 1. This follows from the expansion of H_κ^{(β)} in
Jack polynomials in Definition 2 and the fact about Jack polynomials in Lemma 7.
The lemma follows. ∎
Lemma 9. Let the arrow matrix below have eigenvalues in Λ = diag(λ_1, …, λ_n) and
have q be the last row of its eigenvector matrix, i.e. q contains the n-th element of
each eigenvector:

  A(Λ, q) = [ μ_1                     c_1     ]
            [       ⋱                 ⋮      ]
            [             μ_{n−1}     c_{n−1} ]
            [ c_1   ⋯     c_{n−1}     c_n    ]

By Lemma 3 this is a well-defined map except on a set of measure zero. Let
M = diag(μ_1, …, μ_{n−1}). Then, for U(X) a symmetric homogeneous polynomial of degree k
in the eigenvalues of X,

  V(Λ) = ∫ ∏_{i=1}^{n} q_i^{β−1} U(M) dq

is a symmetric homogeneous polynomial of degree k in λ_1, …, λ_n.

Proof. Let e_n be the column vector that is 0 everywhere except in the last entry, which
is 1. (I − e_n e_n†) A(Λ, q) (I − e_n e_n†) has eigenvalues {μ_1, …, μ_{n−1}, 0}. If the eigenvector
matrix of A(Λ, q) is Q, then Q†(I − e_n e_n†) Q Λ Q† (I − e_n e_n†) Q must
have those eigenvalues. But this is

  (I − qq†) Λ (I − qq†).

So

  U(M) = U(eig((I − qq†) Λ (I − qq†)) \ {0}).   (2.4)

It is well known that we can write U(M) in the power-sum ring: U(M) is made of
sums and products of functions of the form μ_1^p + ⋯ + μ_{n−1}^p, where p is a positive
integer. Therefore, the RHS is made of functions of the form

  μ_1^p + ⋯ + μ_{n−1}^p = trace(((I − qq†) Λ (I − qq†))^p),

which, if U(M) is order k in the μ_i's, must be order k in the λ_i's. So V(Λ) is a
polynomial of order k in the λ's. Switching λ_1 and λ_2 and also q_1 and q_2 leaves

  ∫ ∏_{i=1}^{n} q_i^{β−1} U(eig((I − qq†) Λ (I − qq†)) \ {0}) dq

invariant, so V(Λ) is symmetric. ∎
Theorem 2 is a new theorem about Jack polynomials.

Theorem 2. Let the arrow matrix below have eigenvalues in Λ = diag(λ_1, …, λ_n)
and have q be the last row of its eigenvector matrix, i.e. q contains the n-th element
of each eigenvector:

  A(Λ, q) = [ μ_1                     c_1     ]
            [       ⋱                 ⋮      ]
            [             μ_{n−1}     c_{n−1} ]
            [ c_1   ⋯     c_{n−1}     c_n    ]

By Lemma 3 this is a well-defined map except on a set of measure zero. Then, if for
a partition κ, l(κ) < n, and q on the first quadrant of the unit sphere, with
M = diag(μ_1, …, μ_{n−1}),

  C_κ^{(β)}(Λ) ∝ ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}(M) dq.
Proof. Define

  η_{κ⁽⁰⁾}^{(β)}(Λ) = ∫ ∏_{i=1}^{n} q_i^{β−1} H_{κ⁽⁰⁾}^{(β)}(M) dq.

This is a symmetric polynomial in n variables (Lemma 9). Thus it can be expanded
in Hermite polynomials with max order κ⁽⁰⁾ (Lemma 9):

  η_{κ⁽⁰⁾}^{(β)}(Λ) = Σ_{|κ'| ≤ |κ⁽⁰⁾|} c(κ⁽⁰⁾, κ') H_{κ'}^{(β)}(Λ),

where |κ| = κ_1 + κ_2 + ⋯ + κ_{l(κ)}. Using orthogonality, from the previous definition of
Hermite polynomials,

  c(κ⁽⁰⁾, κ') ∝ ∫_{λ ∈ R^n} ∫ ∏_{i=1}^{n} q_i^{β−1} H_{κ⁽⁰⁾}^{(β)}(M) H_{κ'}^{(β)}(Λ)
                × exp(−trace(Λ²)/2) ∏_{i<j} |λ_i − λ_j|^β dq dλ.

Using Lemmas 1 and 3,

  c(κ⁽⁰⁾, κ') ∝ ∫ ∏_{i=1}^{n} q_i^{β} H_{κ⁽⁰⁾}^{(β)}(M) H_{κ'}^{(β)}(Λ)
                × exp(−trace(Λ²)/2) ∏_{i<j} |λ_i − λ_j|^β ∏_{i=1}^{n−1} c_i^{−1} dμ dc.

Using Lemma 5,

  c(κ⁽⁰⁾, κ') ∝ ∫ ∏_{i=1}^{n−1} c_i^{β−1} H_{κ⁽⁰⁾}^{(β)}(M) H_{κ'}^{(β)}(Λ(μ, c))
                × exp(−trace(Λ(μ, c)²)/2) ∏_{i<j} |μ_i − μ_j|^β dμ dc.

Since trace(Λ(μ, c)²) = μ_1² + ⋯ + μ_{n−1}² + c_n² + 2(c_1² + ⋯ + c_{n−1}²), define

  Q(μ, c_n) = ∫_{[0,∞)^{n−1}} ∏_{i=1}^{n−1} c_i^{β−1} H_{κ'}^{(β)}(A(μ, c)) exp(−c_1² − ⋯ − c_{n−1}²) dc_1 ⋯ dc_{n−1};

this is a symmetric polynomial in μ (Lemma 8). Furthermore, by Lemma 8,

  Q(μ, c_n) ∝ H_{κ'}^{(β)}(M) + L.O.T.,

where the lower order terms are of lower order than |κ'| and are symmetric polynomials.
Hence they can be written in a basis of lower order Hermite polynomials,
and as

  c(κ⁽⁰⁾, κ') ∝ ∫ H_{κ⁽⁰⁾}^{(β)}(M) Q(μ, c_n) ∏_{i<j} |μ_i − μ_j|^β
                × exp(−(c_n² + μ_1² + ⋯ + μ_{n−1}²)/2) dμ dc_n,

we have by orthogonality

  c(κ⁽⁰⁾, κ') ∝ δ(κ⁽⁰⁾, κ'),

where δ is the Dirac Delta. So

  η_{κ⁽⁰⁾}^{(β)}(Λ) = ∫ ∏_{i=1}^{n} q_i^{β−1} H_{κ⁽⁰⁾}^{(β)}(M) dq ∝ H_{κ⁽⁰⁾}^{(β)}(Λ).

By Lemma 9, coupled with Definition 2,

  C_κ^{(β)}(Λ) ∝ ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}(M) dq. ∎
Corollary 1. Finding the proportionality constant: for l(κ) < n,

  C_κ^{(β)}(Λ) = [2^{n−1} Γ(nβ/2) / Γ(β/2)^n] · [C_κ^{(β)}(I_n) / C_κ^{(β)}(I_{n−1})]
                 × ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}((I − qq†)Λ) dq.

Proof. By Theorem 2 with Equation (2.4) (in the proof of Lemma 9),

  C_κ^{(β)}(Λ) ∝ ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}(eig((I − qq†) Λ (I − qq†)) \ {0}) dq,

which by Lemma 7 and properties of matrices is

  C_κ^{(β)}(Λ) ∝ ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}((I − qq†)Λ) dq.

Now to find the proportionality constant. Let Λ = I_n, and let c_p be the constant of
proportionality:

  C_κ^{(β)}(I_n) = c_p ∫ ∏_{i=1}^{n} q_i^{β−1} C_κ^{(β)}(I − qq†) dq.

Since I − qq† is a projection of rank n − 1, we can replace the term in the integral by
C_κ^{(β)}(I_{n−1}), which can be moved out. So

  C_κ^{(β)}(I_n) = c_p C_κ^{(β)}(I_{n−1}) ∫ ∏_{i=1}^{n} q_i^{β−1} dq.

Now, writing the Gaussian integral

  ∫_{[0,∞)^n} ∏_{i=1}^{n} x_i^{β−1} e^{−‖x‖²} dx = (Γ(β/2)/2)^n

in spherical coordinates as ∫ ∏_{i=1}^{n} q_i^{β−1} dq · ∫_0^∞ r^{nβ−1} e^{−r²} dr
= ∫ ∏_{i=1}^{n} q_i^{β−1} dq · Γ(nβ/2)/2, we get

  ∫ ∏_{i=1}^{n} q_i^{β−1} dq = Γ(β/2)^n / (2^{n−1} Γ(nβ/2)),

and the corollary follows. ∎
Corollary 2. The Jack polynomials can be defined recursively using Corollary 1 and
two results in the compilation [41].

Proof. By Stanley [63], Proposition 4.2, the Jack polynomial of one variable under
the J normalization is

  J_{(κ_1)}^{(β)}(λ_1) = λ_1^{κ_1} (1 + (2/β)) (1 + 2(2/β)) ⋯ (1 + (κ_1 − 1)(2/β)).

There exists another recursion for Jack polynomials under the J normalization:

  J_κ^{(β)}(Λ) = det(Λ) J_{(κ_1−1, …, κ_n−1)}^{(β)}(Λ) ∏_{i=1}^{n} (n − i + 1 + (2/β)(κ_i − 1)),

if κ_n > 0. Note that if κ_n > 0 we can use the above formula to reduce the size of κ in
a recursive expression for a Jack polynomial, and if κ_n = 0 we can use Corollary 1 to
reduce the number of variables in a recursive expression for a Jack polynomial. Using
those facts together and the conversion between C and J normalizations in [14], we
can define all Jack polynomials. ∎
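The one-variable base case is simple enough to state in code (a hedged sketch; the name jack_one_var is ours, with α = 2/β):

```python
def jack_one_var(k1, beta, x):
    """J_{(k1)}(x) = x^k1 (1 + alpha)(1 + 2 alpha) ... (1 + (k1 - 1) alpha),
    the one-variable Jack polynomial in the J normalization, alpha = 2/beta."""
    alpha = 2.0 / beta
    coef = 1.0
    for j in range(1, k1):
        coef *= 1.0 + j * alpha
    return coef * x ** k1

# beta = 2 gives alpha = 1, so J_{(3)}(x) = (1+1)(1+2) x^3 = 6 x^3.
```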
2.5 Hypergeometric Functions

Definition 3. We define the hypergeometric function of two matrix arguments and
parameter β, ₀F₀^{(β)}(X, Y), for n × n matrices X and Y, by

  ₀F₀^{(β)}(X, Y) = Σ_{k=0}^{∞} Σ_{κ ⊢ k, l(κ) ≤ n} C_κ^{(β)}(X) C_κ^{(β)}(Y) / (k! C_κ^{(β)}(I)),

as in Koev and Edelman [43]. It is efficiently calculated using the software described
in Koev and Edelman [43], mhg, which is available online [42]. The C's are Jack
polynomials under the C normalization; κ ⊢ k means that κ is a partition of the
integer k, so κ_1 ≥ κ_2 ≥ ⋯ ≥ 0 have |κ| = κ_1 + κ_2 + ⋯ = k.
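In the 1 × 1 case the only partition of k is (k) and C_{(k)}(x) = x^k, so the series collapses to exp(xy). A small sketch (ours) truncating the sum confirms this:

```python
import math

def of0_scalar(x, y, terms=40):
    # 0F0 for 1x1 arguments: sum_k C_(k)(x) C_(k)(y) / (k! C_(k)(1))
    #                      = sum_k (x y)^k / k! = exp(x y)
    return sum((x * y) ** k / math.factorial(k) for k in range(terms))

val = of0_scalar(0.3, -1.2)   # matches exp(0.3 * -1.2)
```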
Lemma 10.

  ₀F₀^{(β)}(X, Y) = exp(s · trace(X)) ₀F₀^{(β)}(X, Y − sI).

Proof. The claim holds for s = 1 by Baker and Forrester [4]. Now, using that fact
with the homogeneity of Jack polynomials,

  ₀F₀^{(β)}(X, Y − sI) = ₀F₀^{(β)}(X, s((1/s)Y − I)) = ₀F₀^{(β)}(sX, (1/s)Y − I)
    = exp(−trace(sX)) ₀F₀^{(β)}(sX, (1/s)Y) = exp(−s · trace(X)) ₀F₀^{(β)}(X, Y). ∎
Definition 4. We define the generalized Pochhammer symbol to be, for a partition
κ = (κ_1, …, κ_l),

  (a)_κ^{(β)} = ∏_{i=1}^{l} ∏_{j=1}^{κ_i} (a − (i − 1)β/2 + j − 1).
Definition 5. As in Koev and Edelman [43], we define the hypergeometric function
₁F₁ to be

  ₁F₁^{(β)}(a; b; X, Y) = Σ_{k=0}^{∞} Σ_{κ ⊢ k, l(κ) ≤ n} [(a)_κ^{(β)} / (b)_κ^{(β)}] C_κ^{(β)}(X) C_κ^{(β)}(Y) / (k! C_κ^{(β)}(I)).

The best software available to compute this function numerically is described in Koev
and Edelman [43], mhg.
Definition 6. We define the generalized Gamma function to be

  Γ_n^{(β)}(c) = π^{n(n−1)β/4} ∏_{i=1}^{n} Γ(c − (i − 1)β/2),

for ℜ(c) > (n − 1)β/2.
2.6 The β-Wishart ensemble, and its Spectral Distribution

The β-Wishart ensemble for m × n matrices is defined iteratively; we derive the m × n
case from the m × (n − 1) case.
Definition 7. We assume n is a positive integer and m is a real greater than n − 1.
Let D be a positive-definite diagonal n × n matrix. For n = 1, the β-Wishart ensemble
is

  Z = [ χ_{mβ} D_{1,1}^{1/2};  0;  ⋯;  0 ],

with n − 1 zeros, where χ_{mβ} represents a random positive real that is χ-distributed
with mβ degrees of freedom. For n > 1, the β-Wishart ensemble with positive-definite
diagonal n × n covariance matrix D is defined as follows: Let τ_1, …, τ_{n−1} be one
draw of the singular values of the m × (n − 1) β-Wishart ensemble with covariance
D_{1:(n−1),1:(n−1)}. Define the matrix Z by

  Z = [ diag(τ_1, …, τ_{n−1})   [χ_β D_{n,n}^{1/2}; …; χ_β D_{n,n}^{1/2}] ]
      [ 0 ⋯ 0                   χ_{(m−n+1)β} D_{n,n}^{1/2}              ]

All the χ-distributed random variables are independent. Let σ_1, …, σ_n be the singular
values of Z. They are one draw of the singular values of the m × n β-Wishart ensemble,
completing the recursion. λ_i = σ_i² are the eigenvalues of the β-Wishart ensemble.
Theorem 3. Let Σ = diag(σ_1, …, σ_n), σ_1 > σ_2 > ⋯ > σ_n. The singular values of
the β-Wishart ensemble with covariance D are distributed by a pdf proportional to

  det(D)^{−mβ/2} ∏_{i=1}^{n} σ_i^{(m−n+1)β−1} Δ₂(σ)^β ₀F₀^{(β)}(−Σ²/2, D^{−1}) dσ.

It follows from a simple change of variables that the ordered λ_i's are distributed as

  c_W^β det(D)^{−mβ/2} ∏_{i=1}^{n} λ_i^{(m−n+1)β/2−1} Δ(λ)^β ₀F₀^{(β)}(−Λ/2, D^{−1}) dλ.
Proof. First we need to check the n = 1 case: the one singular value σ_1 is distributed
as σ_1 = χ_{mβ} D_{1,1}^{1/2}, which has pdf proportional to

  D_{1,1}^{−mβ/2} σ_1^{mβ−1} exp(−σ_1²/(2D_{1,1})) dσ_1.

We use the fact that

  ₀F₀^{(β)}(−σ_1²/2, D_{1,1}^{−1}) = ₀F₀^{(β)}(−σ_1²/(2D_{1,1}), 1) = exp(−σ_1²/(2D_{1,1})).

The first equality comes from the expansion of ₀F₀ in terms of Jack polynomials and
the fact that Jack polynomials are homogeneous (see the definition of Jack polynomials
and ₀F₀ in this paper); the second comes from (2.1) in Koev [41], or in Forrester
[28]. We use that ₀F₀^{(β)}(X, I) = ₀F₀^{(β)}(X), by definition [43].
Now we assume n > 1. Let
1/2
rn-
XpDn,n
T_
1/2
X(m-n+1),Dn,n
34
a,
an
so the ai's are X-distributed with different parameters. By hypothesis, the Ti's are a Wishart draw. Therefore, the ai's and the Ti's are assumed to have joint distribution
proportional to
n-
det(nm//
JfJ
2
(
+
(m-n±2 )/ 3 -- lA(T13F(
i=1
a
where T = diag(i,
...
1
a m-n+10m exp
)
12
-- T 2I D-1:n1
(
4
2D
1,1:n 1)
dadr,
, Tn_1). Using Lemmas 2 and 3, we can change variables to
$$\det(D_{1:n-1,1:n-1})^{-m\beta/2} \prod_{i=1}^{n-1} \tau_i^{(m-n+2)\beta - 1}\, \Delta(\tau^2)^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\, D_{1:n-1,1:n-1}^{-1}\right) a_n^{(m-n+1)\beta - 1} \exp\!\left(-\frac{\sum_{i=1}^{n} a_i^2}{2 D_{n,n}}\right) d\sigma\, dq.$$
Using Lemma 6 this becomes:

$$\det(D_{1:n-1,1:n-1})^{-m\beta/2} \prod_{i=1}^{n} \sigma_i^{(m-n+1)\beta - 1}\, \Delta(\sigma^2)^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\, D_{1:n-1,1:n-1}^{-1}\right) \exp\!\left(-\frac{\sum_{i=1}^{n} a_i^2}{2 D_{n,n}}\right) \prod_{i=1}^{n-1} q_i^{\beta - 1}\, d\sigma\, dq.$$
Using properties of determinants this becomes:

$$\det(D)^{-m\beta/2} \prod_{i=1}^{n} \sigma_i^{(m-n+1)\beta - 1}\, \Delta(\sigma^2)^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\, D_{1:n-1,1:n-1}^{-1}\right) \exp\!\left(-\frac{\sum_{i=1}^{n} a_i^2}{2 D_{n,n}}\right) \prod_{i=1}^{n-1} q_i^{\beta - 1}\, d\sigma\, dq.$$
To complete the induction, we need to prove

$${}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Sigma^2,\, D^{-1}\right) \propto \int \prod_{i=1}^{n-1} q_i^{\beta - 1}\; e^{-\|a\|^2/(2 D_{n,n})}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\, D_{1:n-1,1:n-1}^{-1}\right) dq.$$
We can reduce this expression using ‖a‖² + Σᵢ τᵢ² = Σᵢ σᵢ², so that it suffices to show

$$\exp\!\left(\frac{\mathrm{trace}(\Sigma^2)}{2 D_{n,n}}\right) {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Sigma^2,\, D^{-1}\right) \propto \int \prod_{i=1}^{n-1} q_i^{\beta-1} \exp\!\left(\frac{\mathrm{trace}(T^2)}{2 D_{n,n}}\right) {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\, D_{1:n-1,1:n-1}^{-1}\right) dq,$$
or, moving some constants and signs around,

$$\exp\!\left(-\frac{1}{D_{n,n}}\,\mathrm{trace}\!\left(-\tfrac{1}{2}\Sigma^2\right)\right) {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Sigma^2,\, D^{-1}\right) \propto \int \prod_{i=1}^{n-1} q_i^{\beta-1} \exp\!\left(-\frac{1}{D_{n,n}}\,\mathrm{trace}\!\left(-\tfrac{1}{2}T^2\right)\right) {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\, D_{1:n-1,1:n-1}^{-1}\right) dq,$$
or, using Lemma 10,

$${}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Sigma^2,\; D^{-1} - \tfrac{1}{D_{n,n}} I_n\right) \propto \int \prod_{i=1}^{n-1} q_i^{\beta-1}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}T^2,\; D_{1:n-1,1:n-1}^{-1} - \tfrac{1}{D_{n,n}} I_{n-1}\right) dq.$$
We will prove this expression termwise using the expansion of ₀F₀ into infinitely many Jack polynomials. The (k, κ) term on the right-hand side is

$$\int \prod_{i=1}^{n-1} q_i^{\beta-1}\; \frac{C_\kappa^{(\beta)}\!\left(-\tfrac{1}{2}T^2\right) C_\kappa^{(\beta)}\!\left(D_{1:n-1,1:n-1}^{-1} - \tfrac{1}{D_{n,n}} I_{n-1}\right)}{k!\, C_\kappa^{(\beta)}(I_{n-1})}\, dq,$$
where κ ⊢ k and l(κ) < n. The (k, κ) term on the left-hand side is

$$\frac{C_\kappa^{(\beta)}\!\left(-\tfrac{1}{2}\Sigma^2\right) C_\kappa^{(\beta)}\!\left(D^{-1} - \tfrac{1}{D_{n,n}} I_{n}\right)}{k!\, C_\kappa^{(\beta)}(I_{n})},$$

where κ ⊢ k and l(κ) ≤ n. If l(κ) = n, the term is 0 by Lemma 7, so either it has a corresponding term on the right-hand side or it is zero. Hence, using Lemma 7 again, it suffices to show that for l(κ) < n,
$$C_\kappa^{(\beta)}(\Sigma^2) \propto \int \prod_{i=1}^{n-1} q_i^{\beta-1}\; C_\kappa^{(\beta)}(T^2)\, dq.$$
This follows by Theorem 2, and the proof of Theorem 3 is complete.
Corollary 3. The normalization constant, for λ_1 > λ_2 > ⋯ > λ_n, is C_W = (𝒦_{m,n}^{(β)})⁻¹, where

$$\mathcal{K}_{m,n}^{(\beta)} = \frac{2^{mn\beta/2}\; \Gamma_n^{(\beta)}(m\beta/2)\; \Gamma_n^{(\beta)}(n\beta/2)}{\pi^{n(n-1)\beta/2}\; \Gamma(\beta/2)^n}.$$
Proof. We have used the convention that elements of D do not move through ∞, so we may assume D is the identity. Using ₀F₀^{(β)}(−Λ/2, I) = exp(−trace(Λ)/2) (Koev [41], (2.1)), the model becomes the β-Laguerre model studied in Forrester [25]. □
Corollary 4. Using Definition 6 of the generalized Gamma, the distribution of λ_max for the β-Wishart ensemble with general covariance in diagonal D, P(λ_max < x), is:

$$\frac{\Gamma_n^{(\beta)}\!\left(1 + \tfrac{(n-1)\beta}{2}\right)}{\Gamma_n^{(\beta)}\!\left(1 + \tfrac{(m+n-1)\beta}{2}\right)}\, \det\!\left(\tfrac{x}{2} D^{-1}\right)^{m\beta/2}\; {}_1F_1^{(\beta)}\!\left(\frac{m\beta}{2};\ \frac{(m+n-1)\beta}{2} + 1;\ -\frac{x}{2} D^{-1}\right).$$
Proof. See page 14 of Koev [41], Theorem 6.1. A factor of β is lost due to differences in nomenclature. The best software to calculate this is described in Koev and Edelman [43], mhg. Convergence is improved using formula (2.6) in Koev [41]. □
Corollary 5. The distribution of λ_min for the β-Wishart ensemble with general covariance in diagonal D, P(λ_min < x), is:

$$1 - \exp\!\left(\mathrm{trace}\!\left(-\tfrac{x}{2} D^{-1}\right)\right) \sum_{k=0}^{nt}\ \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{C_\kappa^{(\beta)}\!\left(\tfrac{x}{2} D^{-1}\right)}{k!}.$$

It is only valid when t = (m − n + 1)β/2 − 1 is a nonnegative integer.
Proof. See pages 14-15 of Koev [41], Theorem 6.1. A factor of β is lost due to differences in nomenclature. The best software to calculate this is described in Koev and Edelman [43], mhg. □

Koev [41], Theorem 6.2 gives a formula for the distribution of the trace of the β-Wishart ensemble.
Figure 2-1: The line is the empirical cdf created from many draws of the maximum eigenvalue of the β-Wishart ensemble, with m = 4, n = 4, β = 2.5, and D = diag(1.1, 1.2, 1.4, 1.8). The x's are the analytically derived values of the cdf using Corollary 4 and mhg.
Figures 2-1, 2-2, 2-3, and 2-4 demonstrate the correctness of Corollaries 4 and 5, which are derived from Theorem 3.
2.7 The β-Wishart Ensemble and Free Probability
Given the eigenvalue distributions of two large random matrices, free probability allows one to analytically compute the eigenvalue distributions of the sum and product of those matrices (a good summary is Nadakuditi and Edelman [59]). In particular, we would like to compute the eigenvalue histogram for XᵗXD/(mβ), where X is a tall matrix of standard normal reals, complexes, quaternions, or Ghosts, and D is a positive definite diagonal matrix drawn from a prior. Dumitriu [13] proves that for the D = I and β = 1, 2, 4 case, the answer is the Marcenko-Pastur law, invariant over β. So it is reasonable to assume that the value of β does not figure into hist(eig(XᵗXD)), where D is random.

We use the methods of Olver and Nadakuditi [56] to analytically compute the product of the Marcenko-Pastur distribution for m/n → 10 and variance 1 with the semicircle distribution of width 2√2 centered at 3. Figure 2-5 demonstrates that
Figure 2-2: The line is the empirical cdf created from many draws of the maximum eigenvalue of the β-Wishart ensemble, with m = 6, n = 4, β = 0.75, and D = diag(1.1, 1.2, 1.4, 1.8). The x's are the analytically derived values of the cdf using Corollary 4 and mhg.
Figure 2-3: The line is the empirical cdf created from many draws of the minimum eigenvalue of the β-Wishart ensemble, with m = 4, n = 3, β = 5, and D = diag(1.1, 1.2, 1.4). The x's are the analytically derived values of the cdf using Corollary 5 and mhg.
Figure 2-4: The line is the empirical cdf created from many draws of the minimum eigenvalue of the β-Wishart ensemble, with m = 7, n = 4, β = 0.5, and D = diag(1, 2, 3, 4). The x's are the analytically derived values of the cdf using Corollary 5 and mhg.
the histogram of 1000 draws of XᵗXD/(mβ) for m = 1000, n = 100, and β = 3, represented as a bar graph, is equal to the analytically computed red line. The β-Wishart distribution allows us to draw the eigenvalues of XᵗXD/(mβ), even if we cannot sample the entries of the matrix for β = 3.
2.8 Acknowledgements

We acknowledge the support of the National Science Foundation through grants SOLAR Grant No. 1035400, DMS-1035400, and DMS-1016086. Alexander Dubbs was funded by the NSF GRFP.
We also acknowledge the partial support by the Woodward Fund for Applied
Mathematics at San Jose State University, a gift from the estate of Mrs. Marie Woodward in memory of her son, Henry Tynham Woodward. He was an alumnus of the
Mathematics Department at San Jose State University and worked with research
groups at NASA Ames.
Figure 2-5: The analytical product of the semicircle and Marcenko-Pastur laws is the red line; the histogram is 1000 draws of the β-Wishart (β = 3) with covariance drawn from the shifted semicircle distribution. They match perfectly.
Chapter 3

A Matrix Model for the β-MANOVA Ensemble
3.1 Introduction

Recall from the thesis introduction that:
Beta-MANOVA Model Pseudocode
Function C := BetaMANOVA(m, n, p, β, Ω)
    Λ := BetaWishart(m, n, β, Ω²)
    M := BetaWishart(p, n, β, Λ⁻¹)⁻¹
    C := (M + I)^(−1/2)
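The pseudocode can be sketched in pure Python. The code below is our illustration, not the thesis's software: it reuses the recursive β-Wishart eigenvalue model of Chapter 2 for the two draws, inverts the second draw, and maps through (M + I)^(−1/2); all function names are ours.

```python
import math
import random

def chi(r):
    # chi sample with r degrees of freedom: sqrt of a Gamma(r/2, scale=2) draw.
    return math.sqrt(random.gammavariate(r / 2.0, 2.0))

def sym_eigvals(A, sweeps=100):
    # Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations.
    n = len(A)
    A = [row[:] for row in A]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-14:
                    continue
                th = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(th), math.sin(th)
                for k in range(n):
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]

def beta_wishart_eigs(m, n, beta, D):
    # Eigenvalues (squared singular values) of the recursive m x n beta-Wishart
    # model with diagonal covariance D, as defined in Chapter 2.
    if n == 1:
        return [chi(m * beta) ** 2 * D[0]]
    tau2 = beta_wishart_eigs(m, n - 1, beta, D[:n - 1])
    Z = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        Z[i][i] = math.sqrt(tau2[i])
        Z[i][n - 1] = chi(beta) * math.sqrt(D[n - 1])
    Z[n - 1][n - 1] = chi((m - n + 1) * beta) * math.sqrt(D[n - 1])
    ZtZ = [[sum(Z[k][i] * Z[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return [max(e, 0.0) for e in sym_eigvals(ZtZ)]

def beta_manova(m, n, p, beta, omega):
    # The pseudocode above: Lambda := eig BetaWishart(m, n, beta, Omega^2),
    # M := BetaWishart(p, n, beta, Lambda^{-1})^{-1}, C := (M + I)^{-1/2}.
    lam = beta_wishart_eigs(m, n, beta, [w * w for w in omega])
    mu = [1.0 / e for e in beta_wishart_eigs(p, n, beta, [1.0 / l for l in lam])]
    return sorted((1.0 / math.sqrt(x + 1.0) for x in mu), reverse=True)

random.seed(7)
c = beta_manova(7, 4, 5, 2.5, [1.0, 2.0, 2.5, 2.7])
```

Since M > 0, every generalized singular value c_i = (μ_i + 1)^(−1/2) lies strictly between 0 and 1; empirical cdfs of c_1 from such draws are what Figures 3-1 and 3-2 compare against Theorem 5.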
Our main theorem is the joint distribution of the elements of C.

Theorem 4. The distribution of the generalized singular values diag(C) = (c_1, …, c_n), c_1 > c_2 > ⋯ > c_n, generated by the above algorithm for m, p > n, is equal to:

$$\frac{2^n\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}}\, \det(\Omega)^{p\beta} \prod_{i=1}^{n} c_i^{(p-n+1)\beta - 1}\, (1 - c_i^2)^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |c_i^2 - c_j^2|^{\beta}\; {}_1F_0^{(\beta)}\!\left(\frac{(m+p)\beta}{2};\ ;\ C^2 (C^2 - I)^{-1},\ \Omega^2\right) dc,$$

where ₁F₀^{(β)} and 𝒦^{(β)}_{m,n} are defined in the upcoming section, Preliminaries.
We also find the distributions of the largest generalized singular value in certain cases, generalizing Dumitriu and Koev's results on the Jacobi ensemble in [15].

Theorem 5. If t = (m − n + 1)β/2 − 1 ∈ ℤ≥0,

$$P(c_1 < x) = \det\!\left(x^2 \Omega^2 \left((1-x^2) I + x^2 \Omega^2\right)^{-1}\right)^{p\beta/2} \sum_{k=0}^{nt}\ \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}\, C_\kappa^{(\beta)}\!\left((1-x^2)\left((1-x^2) I + x^2 \Omega^2\right)^{-1}\right)}{k!}, \tag{3.1}$$

where the Jack polynomial C_κ^{(β)} and Pochhammer symbol (·)_κ^{(β)} are defined in the upcoming section, Preliminaries.
The following section contains preliminaries to the proofs of Theorems 4 and 5 in the general β case. Most important are several propositions concerning Jack polynomials and hypergeometric functions. Proposition 1 was conjectured by Macdonald [47] and proved by Baker and Forrester [4]; Proposition 3 is due to Kaneko, in a paper containing many results on Selberg-type integrals [39]; and the other propositions are found in [26, pp. 593-596].
3.2 Preliminaries

Definition 8. We define the generalized gamma function to be

$$\Gamma_n^{(\beta)}(c) = \pi^{n(n-1)\beta/4} \prod_{i=1}^{n} \Gamma\!\left(c - \frac{(i-1)\beta}{2}\right)$$

for ℜ(c) > (n − 1)β/2.
Definition 9.

$$\mathcal{K}_{m,n}^{(\beta)} = \frac{2^{mn\beta/2}\; \Gamma_n^{(\beta)}(m\beta/2)\; \Gamma_n^{(\beta)}(n\beta/2)}{\pi^{n(n-1)\beta/2}\; \Gamma(\beta/2)^n}.$$
Definition 10.

$$\Delta(\lambda) = \prod_{i<j} (\lambda_i - \lambda_j).$$

If X is a diagonal matrix,

$$\Delta(X) = \prod_{i<j} |X_{i,i} - X_{j,j}|.$$
As in [14], if κ = (κ_1, κ_2, …, κ_n) ⊢ k, then κ is nonnegative, ordered non-increasingly, and sums to k. Let α = 2/β. Let ρ_κ^α = Σ_{i=1}^{n} κ_i (κ_i − 1 − (2/α)(i − 1)). We define l(κ) to be the number of nonzero elements of κ. We say that μ < κ in "lexicographic ordering" if, for the largest integer j such that μ_i = κ_i for all i < j, we have μ_j < κ_j.
Definition 11. We define the Jack polynomial of a matrix argument, C_κ^{(β)}(X) (see, for example, [14]), as follows: Let x_1, …, x_n be the eigenvalues of X. C_κ^{(β)}(X) is the only homogeneous polynomial eigenfunction of the Laplace-Beltrami-type operator

$$D_n^{*} = \sum_{i=1}^{n} x_i^2 \frac{\partial^2}{\partial x_i^2} + \beta \sum_{1 \le i \ne j \le n} \frac{x_i^2}{x_i - x_j}\, \frac{\partial}{\partial x_i},$$

with eigenvalue ρ_κ^α + k(n − 1), having highest-order monomial basis function in lexicographic ordering (see Dumitriu, Edelman, Shuman, Section 2.4) corresponding to κ. In addition,

$$\sum_{\kappa \vdash k,\ l(\kappa) \le n} C_\kappa^{(\beta)}(X) = \mathrm{trace}(X)^k.$$
Definition 12. We define the generalized Pochhammer symbol to be, for a partition κ = (κ_1, …, κ_l),

$$(a)_\kappa^{(\beta)} = \prod_{i=1}^{l(\kappa)}\ \prod_{j=1}^{\kappa_i} \left(a - \frac{(i-1)\beta}{2} + j - 1\right).$$
Definition 13. As in Koev and Edelman [43], we define the hypergeometric function ₚF_q^{(β)} to be

$${}_pF_q^{(\beta)}(a_1, \dots, a_p;\ b_1, \dots, b_q;\ X, Y) = \sum_{k=0}^{\infty}\ \sum_{\kappa \vdash k} \frac{(a_1)_\kappa^{(\beta)} \cdots (a_p)_\kappa^{(\beta)}}{(b_1)_\kappa^{(\beta)} \cdots (b_q)_\kappa^{(\beta)}}\, \frac{C_\kappa^{(\beta)}(X)\, C_\kappa^{(\beta)}(Y)}{k!\, C_\kappa^{(\beta)}(I)}.$$

The best software available to compute this function numerically is described in Koev and Edelman, mhg, [43]. ₚF_q^{(β)}(a; b; X) = ₚF_q^{(β)}(a; b; X, I).
We will also need several theorems from the literature about integrals of Jack polynomials and hypergeometric functions. The first was conjectured by Macdonald [47] and proved by Baker and Forrester ([4], (6.1)) with the wrong constant. The correct constant is found using Special Functions [1, p. 406] (Corollary 8.2.2):
Proposition 1. Let Y be a diagonal matrix.

$$\int_{X \ge 0} {}_0F_0^{(\beta)}(-X, Y)\, \prod_i x_i^{a}\; C_\kappa^{(\beta)}(X)\, |\Delta(X)|^{\beta}\, dX = c_\kappa^{(\beta)}\; \frac{\Gamma_n^{(\beta)}\!\left(a + \tfrac{(n-1)\beta}{2} + 1\right) \left(a + \tfrac{(n-1)\beta}{2} + 1\right)_\kappa^{(\beta)} C_\kappa^{(\beta)}(Y^{-1})}{\det(Y)^{\,a + (n-1)\beta/2 + 1}},$$

where

$$c_\kappa^{(\beta)} = \frac{\pi^{-n(n-1)\beta/4}\, n!\; C_\kappa^{(\beta)}(I_n)}{\Gamma(\beta/2)^n} \prod_{i=1}^{n} \Gamma(i\beta/2).$$
From [26, p. 593],

Proposition 2. If X < I is diagonal,

$${}_1F_0^{(\beta)}(a;\ ;\ X) = \det(I - X)^{-a}.$$
Kaneko, Corollary 2 [39]:

Proposition 3. Let κ = (κ_1, …, κ_n) be nonincreasing and X be diagonal. Let a, b > −1 and β > 0.

$$\int_{0<X<I} C_\kappa^{(\beta)}(X)\, \Delta(X)^{\beta} \prod_i \left[x_i^{a} (1 - x_i)^{b}\right] dX = C_\kappa^{(\beta)}(I) \prod_{i=1}^{n} \frac{\Gamma\!\left(\tfrac{i\beta}{2} + 1\right) \Gamma\!\left(\kappa_i + a + \tfrac{\beta}{2}(n-i) + 1\right) \Gamma\!\left(b + \tfrac{\beta}{2}(n-i) + 1\right)}{\Gamma\!\left(\tfrac{\beta}{2} + 1\right) \Gamma\!\left(\kappa_i + a + b + \tfrac{\beta}{2}(2n - i - 1) + 2\right)}.$$
From [26, p. 595],

Proposition 4. Let X be diagonal,

$${}_2F_1^{(\beta)}(a, b;\ c;\ X) = {}_2F_1^{(\beta)}\!\left(c - a,\ b;\ c;\ -X(I - X)^{-1}\right) |I - X|^{-b} = {}_2F_1^{(\beta)}(c - a,\ c - b;\ c;\ X)\, |I - X|^{\,c - a - b}.$$
From [26, p. 596],

Proposition 5. If X is n × n diagonal and a or b is a nonpositive integer,

$${}_2F_1^{(\beta)}(a, b;\ c;\ X) = {}_2F_1^{(\beta)}(a, b;\ c;\ I)\; {}_2F_1^{(\beta)}\!\left(a, b;\ a + b + 1 + \tfrac{(n-1)\beta}{2} - c;\ I - X\right).$$
From [26, p. 594],

Proposition 6.

$${}_2F_1^{(\beta)}(a, b;\ c;\ I) = \frac{\Gamma_n^{(\beta)}(c)\, \Gamma_n^{(\beta)}(c - a - b)}{\Gamma_n^{(\beta)}(c - a)\, \Gamma_n^{(\beta)}(c - b)}.$$

3.3 Main Theorems

Proof of Theorem 4. Let m, p > n. We will draw M by drawing Λ ~ P(Λ) = BetaWishart(m, n, β, Ω²) and computing M by drawing M ~ P(M|Λ) = BetaWishart(p, n, β, Λ⁻¹)⁻¹. The distribution of M is ∫ P(M|Λ) P(Λ) dΛ. Then we will compute C by C = (M + I)^(−1/2). We use the convention that eigenvalues and generalized singular values are unordered. By the paper [10], the BetaWishart described in the introduction, we sample the diagonal Λ from
$$P(\Lambda) = \frac{\det(\Omega)^{-m\beta}}{n!\, \mathcal{K}^{(\beta)}_{m,n}} \prod_{i=1}^{n} \lambda_i^{\frac{(m-n+1)\beta}{2} - 1} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Lambda,\, \Omega^{-2}\right) d\Lambda.$$

Likewise, by inverting the answer to the [10] BetaWishart described in the introduction, we can sample diagonal M from
$$P(M \mid \Lambda) = \frac{\det(\Lambda)^{p\beta/2}}{n!\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \mu_i^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |\mu_i - \mu_j|^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2} M^{-1},\, \Lambda\right) d\mu.$$
To get P(M) we need to compute

$$P(M) = \frac{\det(\Omega)^{-m\beta}}{(n!)^2\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \mu_i^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |\mu_i - \mu_j|^{\beta} \int_{\lambda_1, \dots, \lambda_n \ge 0} \prod_{i=1}^{n} \lambda_i^{\frac{(m-n+p+1)\beta}{2} - 1} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Lambda,\, \Omega^{-2}\right) {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2} M^{-1},\, \Lambda\right) d\Lambda\; d\mu.$$

Expanding the hypergeometric function, this is

$$\frac{\det(\Omega)^{-m\beta}}{(n!)^2\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \mu_i^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |\mu_i - \mu_j|^{\beta} \sum_{k=0}^{\infty} \sum_{\kappa \vdash k} \frac{C_\kappa^{(\beta)}\!\left(-\tfrac{1}{2} M^{-1}\right)}{k!\, C_\kappa^{(\beta)}(I)} \left[\int_{\lambda_1, \dots, \lambda_n \ge 0} \prod_{i=1}^{n} \lambda_i^{\frac{(m-n+p+1)\beta}{2} - 1} \prod_{i<j} |\lambda_i - \lambda_j|^{\beta}\; {}_0F_0^{(\beta)}\!\left(-\tfrac{1}{2}\Lambda,\, \Omega^{-2}\right) C_\kappa^{(\beta)}(\Lambda)\, d\Lambda\right] d\mu.$$
Using Proposition 1,

$$\frac{\det(\Omega)^{-m\beta}}{(n!)^2\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \mu_i^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |\mu_i - \mu_j|^{\beta}\; n!\, \mathcal{K}^{(\beta)}_{m+p,n}\, 2^{-(m+p)n\beta/2} \det(2\Omega^2)^{\frac{(m+p)\beta}{2}} \sum_{k=0}^{\infty} \sum_{\kappa \vdash k} \frac{\left(\tfrac{(m+p)\beta}{2}\right)_\kappa^{(\beta)} C_\kappa^{(\beta)}\!\left(-\tfrac{1}{2} M^{-1}\right) C_\kappa^{(\beta)}(2\Omega^2)}{k!\, C_\kappa^{(\beta)}(I)}\, d\mu.$$
Cleaning things up,

$$\frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \mu_i^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |\mu_i - \mu_j|^{\beta} \sum_{k=0}^{\infty} \sum_{\kappa \vdash k} \frac{\left(\tfrac{(m+p)\beta}{2}\right)_\kappa^{(\beta)} C_\kappa^{(\beta)}\!\left(-\tfrac{1}{2} M^{-1}\right) C_\kappa^{(\beta)}(2\Omega^2)}{k!\, C_\kappa^{(\beta)}(I)}\, d\mu.$$
By the definition of the hypergeometric function, this is

$$\frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \mu_i^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |\mu_i - \mu_j|^{\beta}\; {}_1F_0^{(\beta)}\!\left(\frac{(m+p)\beta}{2};\ ;\ -M^{-1},\ \Omega^2\right) d\mu. \tag{3.2}$$
Converting to cosine form, with C = diag(c_1, …, c_n) = (M + I)^(−1/2), this is

$$\frac{2^n\, \mathcal{K}^{(\beta)}_{m+p,n}\, \det(\Omega)^{p\beta}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} c_i^{(p-n+1)\beta - 1}\, (1 - c_i^2)^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |c_i^2 - c_j^2|^{\beta}\; {}_1F_0^{(\beta)}\!\left(\frac{(m+p)\beta}{2};\ ;\ C^2(C^2 - I)^{-1},\ \Omega^2\right) dc. \tag{3.3}$$

□

Theorem 6. If we set Ω = I and u_i = c_i², the (u_1, …, u_n) obey the standard β-Jacobi density of [46], [40], [28], and [22]:
$$\frac{\mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} u_i^{\frac{(p-n+1)\beta}{2} - 1}\, (1 - u_i)^{\frac{(m-n+1)\beta}{2} - 1} \prod_{i<j} |u_i - u_j|^{\beta}\, du. \tag{3.4}$$
Proof. Proposition 2 applies to the statement of Theorem 4 because C²(C² − I)⁻¹ < I (we know that M > 0 from how it is sampled, so 0 < C² = (M + I)⁻¹ < I, and likewise for C² − I). The density becomes

$$\frac{2^n\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} c_i^{(p-n+1)\beta - 1}\, (1 - c_i^2)^{-\frac{(p+n-1)\beta}{2} - 1} \prod_{i<j} |c_i^2 - c_j^2|^{\beta}\, \det\!\left(I - C^2(C^2 - I)^{-1}\right)^{-\frac{(m+p)\beta}{2}} dc,$$

or equivalently

$$\frac{2^n\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} c_i^{(p-n+1)\beta - 1}\, (1 - c_i^2)^{\frac{(m-n+1)\beta}{2} - 1} \prod_{i<j} |c_i^2 - c_j^2|^{\beta}\, dc.$$

If we substitute u_i = c_i², by the change-of-variables theorem we get the desired result. □
Proof of Theorem 5. Let H = diag(η_1, …, η_n) = M⁻¹. Changing variables from (3.2) we get

$$\frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \prod_{i=1}^{n} \eta_i^{\frac{(p-n+1)\beta}{2} - 1} \prod_{i<j} |\eta_i - \eta_j|^{\beta}\; {}_1F_0^{(\beta)}\!\left(\frac{(m+p)\beta}{2};\ ;\ H,\ -\Omega^2\right) d\eta.$$
Taking the maximum eigenvalue, following [19],

$$P(H < xI) = \frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \int_{H < xI} \prod_{i=1}^{n} \eta_i^{\frac{(p-n+1)\beta}{2} - 1} \prod_{i<j} |\eta_i - \eta_j|^{\beta}\; {}_1F_0^{(\beta)}\!\left(\frac{(m+p)\beta}{2};\ ;\ H,\ -\Omega^2\right) d\eta.$$
Letting N = diag(ν_1, …, ν_n) = H/x and changing variables again, we get

$$P(H < xI) = \frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}}\; x^{\frac{np\beta}{2}} \int_{N < I} \prod_{i=1}^{n} \nu_i^{\frac{(p-n+1)\beta}{2} - 1} \prod_{i<j} |\nu_i - \nu_j|^{\beta}\; {}_1F_0^{(\beta)}\!\left(\frac{(m+p)\beta}{2};\ ;\ N,\ -x\Omega^2\right) d\nu.$$
Expanding the hypergeometric function we get

$$P(H < xI) = \frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}}\; x^{\frac{np\beta}{2}} \sum_{k=0}^{\infty} \sum_{\kappa \vdash k} \frac{\left(\tfrac{(m+p)\beta}{2}\right)_\kappa^{(\beta)} C_\kappa^{(\beta)}(-x\Omega^2)}{k!\, C_\kappa^{(\beta)}(I)} \int_{N < I} \prod_{i=1}^{n} \nu_i^{\frac{(p-n+1)\beta}{2} - 1} \prod_{i<j} |\nu_i - \nu_j|^{\beta}\, C_\kappa^{(\beta)}(N)\, d\nu. \tag{3.5}$$
Using Proposition 3 with a = (p − n + 1)β/2 − 1 and b = 0,

$$\int_{N<I} \prod_{i=1}^{n} \nu_i^{\frac{(p-n+1)\beta}{2} - 1} \prod_{i<j} |\nu_i - \nu_j|^{\beta}\, C_\kappa^{(\beta)}(N)\, d\nu = C_\kappa^{(\beta)}(I) \prod_{i=1}^{n} \frac{\Gamma\!\left(\tfrac{i\beta}{2} + 1\right) \Gamma\!\left(\kappa_i + \tfrac{\beta}{2}(p + 1 - i)\right) \Gamma\!\left(\tfrac{\beta}{2}(n - i) + 1\right)}{\Gamma\!\left(\tfrac{\beta}{2} + 1\right) \Gamma\!\left(\kappa_i + \tfrac{\beta}{2}(p + n - i) + 1\right)}.$$
Now

$$\prod_{i=1}^{n} \frac{\Gamma\!\left(\kappa_i + \tfrac{\beta}{2}(p+1-i)\right)}{\Gamma\!\left(\tfrac{\beta}{2}(p+1-i)\right)} = \prod_{i=1}^{n} \prod_{j=1}^{\kappa_i} \left(\tfrac{\beta}{2}(p+1-i) + j - 1\right) = \left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)},$$

and

$$\prod_{i=1}^{n} \frac{\Gamma\!\left(\kappa_i + \tfrac{\beta}{2}(p+n-i) + 1\right)}{\Gamma\!\left(\tfrac{\beta}{2}(p+n-i) + 1\right)} = \prod_{i=1}^{n} \prod_{j=1}^{\kappa_i} \left(\tfrac{\beta}{2}(p+n-i) + j\right) = \left(\tfrac{(p+n-1)\beta}{2} + 1\right)_\kappa^{(\beta)}.$$

Therefore,

$$\int_{N<I} \prod_{i=1}^{n} \nu_i^{\frac{(p-n+1)\beta}{2} - 1} \prod_{i<j} |\nu_i - \nu_j|^{\beta}\, C_\kappa^{(\beta)}(N)\, d\nu = \frac{C_\kappa^{(\beta)}(I)\, \Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2} + 1\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right) \Gamma_n^{(\beta)}\!\left(\tfrac{p\beta}{2}\right) \left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}}{\pi^{\frac{n(n-1)\beta}{2}}\, \Gamma\!\left(\tfrac{\beta}{2} + 1\right)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{(p+n-1)\beta}{2} + 1\right) \left(\tfrac{(p+n-1)\beta}{2} + 1\right)_\kappa^{(\beta)}}.$$
Using (3.5) and the definition of the hypergeometric function we get

$$P(H < xI) = \frac{\det(\Omega)^{p\beta}\, \mathcal{K}^{(\beta)}_{m+p,n}}{n!\, \mathcal{K}^{(\beta)}_{m,n}\, \mathcal{K}^{(\beta)}_{p,n}} \cdot \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2} + 1\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right) \Gamma_n^{(\beta)}\!\left(\tfrac{p\beta}{2}\right)}{\pi^{\frac{n(n-1)\beta}{2}}\, \Gamma\!\left(\tfrac{\beta}{2} + 1\right)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{(p+n-1)\beta}{2} + 1\right)}\; x^{\frac{np\beta}{2}}\; {}_2F_1^{(\beta)}\!\left(\frac{(m+p)\beta}{2},\ \frac{p\beta}{2};\ \frac{(p+n-1)\beta}{2} + 1;\ -x\Omega^2\right).$$
Rewriting the constant we get

$$\frac{\Gamma(\beta/2)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right)}{n!\, \Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2}\right)} \cdot \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2} + 1\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma(\beta/2 + 1)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{(p+n-1)\beta}{2} + 1\right)}.$$

Commuting some terms gives

$$\left(\frac{\Gamma(\beta/2)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2} + 1\right)}{n!\, \Gamma(\beta/2 + 1)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2}\right)}\right) \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(p+n-1)\beta}{2} + 1\right)}.$$

The left fraction in parentheses is

$$\frac{\Gamma(\beta/2)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2} + 1\right)}{n!\, \Gamma(\beta/2+1)^n\, \Gamma_n^{(\beta)}\!\left(\tfrac{n\beta}{2}\right)} = \frac{\prod_{i=1}^{n} \tfrac{i\beta}{2}}{n!\, \left(\tfrac{\beta}{2}\right)^n} = 1.$$
Hence

$$P(H < xI) = \det(\Omega)^{p\beta}\, \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)}\; x^{\frac{np\beta}{2}}\; {}_2F_1^{(\beta)}\!\left(\frac{(m+p)\beta}{2},\ \frac{p\beta}{2};\ \frac{(p+n-1)\beta}{2} + 1;\ -x\Omega^2\right). \tag{3.6}$$
Now H = M⁻¹ and C = (M + I)^(−1/2), so equivalently,

$$P(C < xI) = \det(\Omega)^{p\beta}\, \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)} \left(\frac{x^2}{1 - x^2}\right)^{\frac{np\beta}{2}} {}_2F_1^{(\beta)}\!\left(\frac{(m+p)\beta}{2},\ \frac{p\beta}{2};\ \frac{(p+n-1)\beta}{2} + 1;\ -\frac{x^2}{1 - x^2}\, \Omega^2\right). \tag{3.7}$$
Remark. Using U = diag(u_1, …, u_n) = C² and setting Ω = I this is

$$P(U < xI) = \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)} \left(\frac{x}{1 - x}\right)^{\frac{np\beta}{2}} {}_2F_1^{(\beta)}\!\left(\frac{(m+p)\beta}{2},\ \frac{p\beta}{2};\ \frac{(p+n-1)\beta}{2} + 1;\ -\frac{x}{1 - x}\, I\right),$$

so by using Proposition 4, this is

$$P(U < xI) = \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)}\; x^{\frac{np\beta}{2}}\; {}_2F_1^{(\beta)}\!\left(\frac{(n-m-1)\beta}{2} + 1,\ \frac{p\beta}{2};\ \frac{(p+n-1)\beta}{2} + 1;\ xI\right),$$

which is familiar from Dumitriu and Koev [15].
Now back to the proof of Theorem 5. If we use Proposition 4 on (3.7) we get

$$P(C < xI) = \det\!\left(x^2\Omega^2((1-x^2)I + x^2\Omega^2)^{-1}\right)^{\frac{p\beta}{2}} \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)}\; {}_2F_1^{(\beta)}\!\left(\frac{(n-m-1)\beta}{2} + 1,\ \frac{p\beta}{2};\ \frac{(p+n-1)\beta}{2} + 1;\ x^2\Omega^2((1-x^2)I + x^2\Omega^2)^{-1}\right). \tag{3.8}$$
Using the approach of Dumitriu and Koev [15], let t = (m − n + 1)β/2 − 1 ∈ ℤ≥0. We can prove that the series truncates: looking at (3.8), the hypergeometric function involves the term

$$(-t)_\kappa^{(\beta)} = \prod_{i=1}^{l(\kappa)} \prod_{j=1}^{\kappa_i} \left(-t - \frac{(i-1)\beta}{2} + j - 1\right),$$

which is zero when i = 1 and j − 1 = t, so a term vanishes whenever its κ has κ_1 − 1 ≥ t, or just κ_1 ≥ t + 1. This must happen if k > nt. Thus (3.8) is just a finite polynomial,
$$P(C < xI) = \det\!\left(x^2\Omega^2((1-x^2)I + x^2\Omega^2)^{-1}\right)^{\frac{p\beta}{2}} \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)} \sum_{k=0}^{nt} \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{(n-m-1)\beta}{2} + 1\right)_\kappa^{(\beta)} \left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}}{\left(\tfrac{(p+n-1)\beta}{2} + 1\right)_\kappa^{(\beta)}\, k!}\; C_\kappa^{(\beta)}\!\left(x^2\Omega^2((1-x^2)I + x^2\Omega^2)^{-1}\right). \tag{3.9}$$
Let Z be a positive-definite diagonal matrix, and ε a real with |ε| > 0. Define

$$f(Z, \epsilon) = \sum_{k=0}^{nt} \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{(n-m-1)\beta}{2} + 1\right)_\kappa^{(\beta)} \left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}}{\left(\tfrac{(p+n-1)\beta}{2} + 1 + \epsilon\right)_\kappa^{(\beta)}\, k!}\; C_\kappa^{(\beta)}(Z).$$
Using Proposition 5,

$$f(Z, \epsilon) = {}_2F_1^{(\beta)}\!\left(\tfrac{(n-m-1)\beta}{2} + 1,\ \tfrac{p\beta}{2};\ \tfrac{(p+n-1)\beta}{2} + 1 + \epsilon;\ I\right)\, {}_2F_1^{(\beta)}\!\left(\tfrac{(n-m-1)\beta}{2} + 1,\ \tfrac{p\beta}{2};\ \tfrac{(n-m-1)\beta}{2} + 1 - \epsilon;\ I - Z\right). \tag{3.10}$$
Using the definition of the hypergeometric function and the fact that the series must truncate,

$$f(Z, \epsilon) = {}_2F_1^{(\beta)}\!\left(\tfrac{(n-m-1)\beta}{2} + 1,\ \tfrac{p\beta}{2};\ \tfrac{(p+n-1)\beta}{2} + 1 + \epsilon;\ I\right) \sum_{k=0}^{nt} \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{(n-m-1)\beta}{2} + 1\right)_\kappa^{(\beta)} \left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}}{\left(\tfrac{(n-m-1)\beta}{2} + 1 - \epsilon\right)_\kappa^{(\beta)}\, k!}\; C_\kappa^{(\beta)}(I - Z). \tag{3.11}$$
Now the limit as ε → 0 is obvious:

$$f(Z, 0) = {}_2F_1^{(\beta)}\!\left(\tfrac{(n-m-1)\beta}{2} + 1,\ \tfrac{p\beta}{2};\ \tfrac{(p+n-1)\beta}{2} + 1;\ I\right) \sum_{k=0}^{nt} \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}}{k!}\; C_\kappa^{(\beta)}(I - Z). \tag{3.12}$$
Plugging this expression into (3.9),

$$P(C < xI) = \det\!\left(x^2\Omega^2((1-x^2)I + x^2\Omega^2)^{-1}\right)^{\frac{p\beta}{2}} \frac{\Gamma_n^{(\beta)}\!\left(\tfrac{(m+p)\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n-1)\beta}{2} + 1\right)}{\Gamma_n^{(\beta)}\!\left(\tfrac{m\beta}{2}\right) \Gamma_n^{(\beta)}\!\left(\tfrac{(n+p-1)\beta}{2} + 1\right)}\; {}_2F_1^{(\beta)}\!\left(\tfrac{(n-m-1)\beta}{2} + 1,\ \tfrac{p\beta}{2};\ \tfrac{(p+n-1)\beta}{2} + 1;\ I\right) \times \sum_{k=0}^{nt} \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}}{k!}\; C_\kappa^{(\beta)}\!\left((1-x^2)((1-x^2)I + x^2\Omega^2)^{-1}\right).$$
Cancelling via Proposition 6 gives

$$P(C < xI) = \det\!\left(x^2\Omega^2((1-x^2)I + x^2\Omega^2)^{-1}\right)^{\frac{p\beta}{2}} \sum_{k=0}^{nt} \sum_{\kappa \vdash k,\ \kappa_1 \le t} \frac{\left(\tfrac{p\beta}{2}\right)_\kappa^{(\beta)}\, C_\kappa^{(\beta)}\!\left((1-x^2)((1-x^2)I + x^2\Omega^2)^{-1}\right)}{k!}. \tag{3.13}$$

□
3.4 Numerical Evidence

The plots below are empirical cdf's of the greatest generalized singular value as sampled by the BetaMANOVA pseudocode in the introduction (the solid lines) against the Theorem 5 formula for them as calculated by mhg (the o's).
Figure 3-1: Empirical vs. analytic cdf of the greatest generalized singular value when m = 7, n = 4, p = 5, β = 2.5, and Ω = diag(1, 2, 2.5, 2.7).
CDF of Greatest Generalized Singular Value
0.9
-
0.8 6
0.7-
2
0.6-
2
aa 0.5
E 0.40.3-
0.2
0.1
0.4
.5
0.8
0.7
0.6
Greatest Generalized Singular Value
0.9
1
Figure 3-2: Empirical vs. analytic when m = 9, n = 4, p = 6, /3 = 3, and Q =
diag(1, 2,2.5, 2.7).
55
Acknowledgments
We acknowledge the support of the National Science Foundation through grants SOLAR Grant No. 1035400, DMS-1035400, and DMS-1016086. Alexander Dubbs was
funded by the NSF GRFP.
56
Chapter 4
Infinite Random Matrix Theory,
Tridiagonal Bordered Toeplitz
Matrices, and the Moment
Problem
4.1
Introduction
First, we list the Cauchy, R, and S transforms; moments and free cumulants; and
Jacobi parameters and orthogonal polynomial sequences; of the four major laws of
infinite random matrix theory, the Wigner semicircle law, the Marchenko-Pastur law,
the McKay law, and the Wachter law. We discuss special properties of the moments
and free cumulants.
Then, we present an algorithm that starts with the moments of an analytic,
compactly-supported measure and returns a fine discretization of the measure in
MATLAB. Moments are converted to Jacobi parameters via the continuous Lanczos
iteration, which are then placed in a continued fraction, the imaginary part of which
is nearly the original measure, see Theorem 7 and the algorithm before it.
57
4.2
The Jacobi Symmetric Tridiagonal Encoding
of Probability Distributions
All distributions have corresponding tridiagonal matrices of Jacobi parameters. They
may be computed, for example, by the continuous Lanczos iteration, described in [64,
p.286] and reproduced in Table 4.7.
We computed the Jacobi representations of the four laws providing the results in
Table 4.1. The Jacobi parameters (ai and
3
.j
.) are elements of an
for i = 0, 1, 2 ...
infinite Toeplitz tridiagonal representations bordered by the first row and column,
which may have different values from the Toeplitz part of the matrix.
a0
10
a1
31
,31 a
#1
00 =31
o 01
a1
3
f 1
ao = al
ao 7 al
Wigner semicircle
McKay
Marchenko-Pastur
Wachter
ai
ao
an, (n ; 1)
0
f3, (n ; 1)
Wigner Semicircle
0
0
1
1
Marchenko-Pastur
A
A+ 1
V
McKay
0
0
Measure
a
a+b
V/f
ab
a2 - a+ab+b
2
(a + b)3/ 2
(a + b)
V
V/o -I
Fab(a+ b -1)
(a + b)2
Table 4.1: Jacobi parameter encodings for the big level density laws. Upper left:
Symmetric Toeplitz Tridiagonal with 1-boundary , Upper Right: Laws Organized by
Toeplitz Property, Below: Specific Parameter Values
Ansehlovich
[3]
provides a complete table of six distributions that have Toeplitz
58
Jacobi structure. The first three of which are semicircle, Marchenko-Pastur, and
Wachter. The other three distributions occupy the same box as Wachter in Table 4.1.
Anshelovich casts the problem as the description of all distributions whose orthogonal
polynomials have generating functions of the form
00
1
1 (X) z
-1
- xu(z) + tv(z)'
n=O
which he calls Free Meixner distributions.
He includes the one and two atom forms of the Marchenko-Pastur and Wachter
laws which correspond in random matrix theory to the choices of tall-and-skinny vs.
short-and-fat matrices in the SVD or CS decompositions, respectively.
4.3
Infinite RMT Laws.
This section compares the properties of all four major infinite random matrix theory
laws, the Wigner semicircle law, the Marchenko-Pastur law, the McKay law, and the
Wachter law. Their Cauchy transforms are below.
Measure
Cauchy Transform
2
z - dz -_4
Wigner Semicircle
2
(1 - A+ z) 2 - 4z
2z
Marchenko-Pastur
- A+z -
McKay
(v- 2)z -v
Wachter
2(v2
4(1 -v) + z 2
- z2 )
1 - a + (a + b - 2)z - y'(a+ 1 - (a+ b)z) 2 - 4a(1 - z)
2z(1 - z)
Table 4.2: Cauchy transforms .
59
We can also write down the moments for each measure in Table 4.3, for Wigner
and Marchenko-Pastur see [18], for McKay see [50], and for Wachter see Theorem 6.1
in the Section 6. Remember the Catalan number C = n1
polynomial Nn(r)
=
1
Nn,jrj, where Nn, =
(
I()
n
(2n)
and the Narayana
), excepting No(r) = 1. The
coefficients of v3 (1 - v)n/ 2-j in the McKay moments form the Catalan triangle. We
discuss the pyramid created by the Wachter moments in Section 4.4.
Moment n
Measure
Wigner Semicircle
Cn/2 if n is even, 0 otherwise
Marchenko-Pastur
Nn(A)
McKay
Wachter
n
(
E n/2
j=+
a
a+b-
(
n- j
b n-2 [
(a + b) 1:
.
vj(v
-
-
1)n/2-j
a(a+b-
1)
if n is even, 0 otherwise
2j+4
a+bNj+1
b
aa+b-1
Table 4.3: Moments
Inverting the Cauchy transforms and subtracting 1/w, computes the R-transform,
see Table 4.4. If there are multiple roots, we pick one with a series expansion with
no pole at w = 0.
The free cumulants rn for each measure appear in Table 4.5 by expnding the Rtransform above (the generating function for the Narayana polynomials is given by
[48], the generating function for the Catalan numbers is well known).
It is widely known that the Catalan numbers are the moments of the semicircle
law, but we have not seen any mention that the same numbers figure prominently as
60
Measure
R-transform
S-transform
w
1
1-w
z
Wigner Semicircle
Marchenko-Pastur
-v+v
McKay Mc~ay
V+
-a-b+ w-+F
Wachter
1+4w2
v
V2
V -z
' +422
2w
2
(a+b)2 + 2(a-b)w+w
a-az-bz
2w
z2(z- 1)
Table 4.4: R-transforms and S-transforms computed as S(z)
R-l(z)/z
the free cumulants of the McKay Law. The Narayana Polynomials are prominent as
the moments of the Marchenko-Pastur Law, but they also figure clearly as the free
cumulants of the Wachter Law. There are well known relationships, involving Catalan
numbers, between the moments and free cumulants of any law [53], but we do not
know if the pattern is general enough to take the moments of one law, transform it
somewhat, and have them show up in the free cumulants in another law.
We compute an S-transform as S(z) = R- 1 (z)/z. See Table 4.4.
Each measure has a corresponding three-term recurrence for its orthonormal polynomial basis, with q_1
((x
-
an)qn(x) -
(x)
= 0, qo(x) = 1, 3_1 = 0, and for n > 0, qni+(x) =
/3_1qn_(x))/.
In the case of the Wigner semicircle, Marchenko-
Pastur, McKay, and Wachter laws, the Jacobi parameters an and f3 are constant for
n > 1 because they are all versions of the Meixner law [3] (a linear transformation
may be needed). The Wigner Semicircle case is given by simplifying the Meixner law
in [2], and the Marchenko-Pastur, McKay, and Wachter cases are given by taking two
iterations Lanczos algorithm symbolically to get a 1 and
#1. See Table
4.1.
Each measure also has an infinite sequence of monic polynomials qn(x) which are
61
Measure
rn
Wigner Semicircle
6n,2
Marchenko-Pastur
A
(-1)(n- 1 )/ 2vC(n-l)/
McKay
if n is odd, 0 otherwise
2
-Nn (b)
Wachter
( a)n+
a
(a +b)2n+1
Table 4.5: Free cumulants.
qn(x),
Measure
Wigner Semicircle
A(n- 1 )/ 2 (x - A)Un
Wachter
(v - 1)(n- 1)/ 2 xUn
(x
-
ab)
- An/ 2 Un-
1 (xA
1 (2
- v(v
ab(a+b-1)
Un-
a~b(a~)
a+b
a-b-1
(
ab(a+b-1)
(a+b) 2
1.
(')
Un
Marchenko-Pastur
McKay
>
U
2
(XA1)
1)(n- 2 )/ 2 Un-2
1
(
-b-a(a+b-l)+(a+b)2
2V/ab(a+b--1)
x
-b-a(a+b-1)+(a+b)2X
2Vab(a+b-1)
n-2
Table 4.6: Sequences of polynomials orthogonal over of the four major laws.
orthogonal with respect to that measure. They can be written as sums of Chebyshev
polynomials of the second kind, Un(x), which satisfy U
62
1
= 0, Uo(x) = 1, and Un(x) =
2xU,_ 1 (x)
-
Un- 2 (x) for n > 1, [35]. See Table 4.6. For n = 0, qo(x) = 1, and in
general for n > 1,
qn(x) =
3'--(x - ao)Un-1 ((x - a1)/(2/1))
-
/32,3n-
2
Un-
2
((X
-
a1)/(21)).
In the Wigner semicircle case the polynomials can be combined using the recursion
rule for Chebyshev polynomials.
4.4
The Wachter Law Moment Pyramid.
Using Mathematica we can extract an interesting number pyramid from the Wachter
moments, see Figure 4-1.
Each triangle in the pyramid is formed by taking the
coefficients of a and b in the i-th Wachter moment, with the row number within the
pyramid determined by the degree of the corresponding monomial in a and b. All
factors of (a + b) are removed from the numerator and denominator beforehand and
alternating signs are ignored.
Furtheremore, there are many patterns within the pyramid. The top row of each
triangle is a list of Narayana numbers, which sum to Catalan numbers. The bottom
entries of each pyramid are triangular numbers. The second-to-bottom entry on the
right of every pyramid is a sum of consecutive triangular numbers. The second to
both the left and right on the top row of every triangle are also triangular numbers.
4.5
Moments Build Nearly-Toeplitz Jacobi Matrices.
This section is concerned with recovering a probability distribution from its Jacobi
parameters, ac and
#3 such
that they are "Nearly Toeplitz," i.e. there exists a k such
that for i > k all ai are equal and all /3# are equal. Note that i ranges from k to oc.
The Jacobi parameters are found from a distribution by the Lanczos iteration.
We now state the continuous Lanczos iteration, replacing the matrix A by the
63
I
1
1
3
1
1
3
4
6
6
6
1
1
6
10
20
5
10
20
10
1
10
60
6
45
15
75
50
1
10
20
15
20
50
15
15
1
15
50
84
210
21
315
189
1
50
140
21
105
175
210
105
7
35
35
21
1
21
28
105
175
105
700
280
560
392
1176
588
56
1176
70
490
490
980
196
490
196
1
21
8
140
28
56
28
1
36
28
490
196
490
28
196
216
2520
1260
1890
504
36
720
5040
3360
2520
336
1344
84
4704
4704
2176
1512
3528
1764
1008
1176
1
9
126
126
84
336
36
1
36
336
1176
336
1176
1764
36
1
315
10
2520
7350
8820
840
4410
45
45
1215
8100
18900
17010
5670
540
120
2700
14400
25200
15120
2520
210
3780
15120
17640
5292
2520
252
3402
9072
5292
1890
120
540
210
45
Figure 4-1: A number pyramid from the coefficients of the Wachter law moments.
64
variable x and using p =pws,IptMP, ptM, pw to compute dot products. A good source
is [64]. For a given measure p on an interval I, let
(p(x), q(x)) =
p(x)q()dy,
and ||p(x)IH = V(p(x),p(x)). Then the Lanczos iteration is described by Table 4.7.
Lanczos on Measure p
/3 -i
= 0, q_ 1 (x) = 0, qo(x) = 1
for n= 0,1,2, ... do
v(x) = xq,(x)
an = (qn(x), v(x))
v(x) = v(x) - #2n-1qn-i(x) - anqn(x)
f3 = 11v(x)1|
qn+1(x)
=
VW #n
end for
Table 4.7: The Lanczos iteration produces the Jacobi parameters in a and /3.
There are two ways to compute the integrals numerically. The first is to sample x
and qa(x) at many points on the interval of support for qo(x) = 1 and discretize the
integrals on that grid. The second can be done if you know the moments of 1t. If r(x)
and s(x) are polynomials, (r(x), s(x)) can be computed given p's moments.
Since
the qn(x) are polynomials, every integral in the Lanczos iteration can be done in this
way. In that case, the qn(x) are stored by their coefficients of powers of x instead of
on a grid. Once we have reached k iterations, we have fully constructed the infinite
Jacobi matrix using the first batch of f's moments, or a discretization of P.
Step 1 can start with a general measure in which case Step 3 finds an approximate
measure with a nearly Toeplitz representation. Step 1 could also start with a sequence
of moments. It should be noted that the standard way to go from moments to Lanczos
coefficients uses a Hankel matrix of moments and its Cholesky factorization
((30],
(4.3)).
As an example, we apply the algorithm to the histogram of the eigenvalues of
(X/ II
+ pI)'(X/Vm + pI), where X is m x n, which has Jacobi parameters ai
65
Algorithm: Compute Measure from Nearly Toeplitz Jacobi Matrix.
1. Nearly Jacobi Toeplitz Representation: Run the continuous Lanczos algorithm
up to step k, after which all ai are equal and all 13i are equal, or very nearly so.
If they are equal, this algorithm will recover dp exactly, otherwise it will find
it approximately. The Lanczos algorithm may be run using a discretization of
the measure y, or its initial moments.
(ao:oo, /3o:oo) = Lanczos (dp(x)).
2. Cauchy transform: evaluate the finite continued fraction below on the interval
of x where it is imaginary.
1
g(x)
2
x
-
ao -
/32
-1
Xk- al
2
k-2
3. Inverse Cauchy Transform: divide the imaginary part by -wr, to compute the
desired measure.
1
dp(x)
-- Im (g(x)).
Table 4.8: Algorithm recovering or approximating an analytic measure by Toeplitz
matrices with boundary.
and 13i that converge asymptotically and quickly. We smooth the histogram using a
Gaussian kernel and then compute its Jacobi parameters. The reconstruction of the
histogram is in Figure 4-3 We also use the above algorithm to reconstruct a normal
distribution from its first sixty moments, see Figure 4-4.
The following theorem concerning continued fractions allows one to stably recover
a distribution from its Lanczos coefficients ac
and fi. As we have said, if the first
batch of p's moments are known, we can find all ai and
/J
from i = 0 to 00 using the
continuous Lanczos iteration.
Theorem 7. Let μ be a measure on an interval I ⊂ ℝ with Lanczos coefficients α_i and
β_i, with the property that all α_i are equal for i ≥ k and all β_i are equal for i ≥ k.
We can recover I = [α_k - 2β_k, α_k + 2β_k], and we can recover dμ(x) using a continued
fraction. This theorem combines Theorems 1.97 and 1.102 of [32]:

\[
g(x) = \cfrac{1}{x - \alpha_0 - \cfrac{\beta_0^2}{x - \alpha_1 - \cfrac{\beta_1^2}{\ddots \; - \; \cfrac{\beta_{k-1}^2}{x_k}}}},
\qquad
x_k = \frac{x - \alpha_k + \sqrt{(x - \alpha_k)^2 - 4\beta_k^2}}{2},
\]
\[
d\mu(x) = -\frac{1}{\pi}\,\mathrm{Im}\,(g(x)).
\]
Figure 4-2 illustrates curves recovered from random terminating continued fractions
g(x) such that the β_i are positive and greater in magnitude than the α_i. In
both cases, the above theorem allows correct recovery of the α_i and β_i (which is not
always numerically possible). In the first one, k = 5; in the second, k = 3.
If X is an m × n, m < n matrix of normals for m and n very large, (X/√m + μI)'(X/√m + μI)
has α_i and β_i which converge to a constant, making its eigenvalue
distribution recoverable up to a very small approximation. See Figure 4-3.
We also tried to reconstruct the normal distribution, whose Jacobi parameterization is not at all Toeplitz, and which is not compactly supported. Figure 4-4 plots
the approximations using 10 and 20 moments.
4.6 Direct computation of the Wachter law moments.
While the moments of the Wachter law may be obtained in a number of ways, including
expanding the Cauchy transform or applying the Möbius inversion formula to
the free cumulants, in this section we show that a direct computation of the integral
is possible.
Figure 4-2: Recovery of a distribution from random α_i and β_i using Theorem 7.
On top we use k = 5, on bottom we use k = 3.
Figure 4-3: Eigenvalues taken from (X/√m + μI)'(X/√m + μI), where X is m × n,
m = 10^4, n = 3m, μ = 5. The blue bar histogram is taken using hist.m; a better one
was taken by convolving the data with a Gaussian kernel. That convolution histogram
was used to initialize the continuous Lanczos algorithm, which produced five α's and
β's. They were put into a continued fraction as described above, assuming α_i and β_i
to be constant after i = 5. The continued fraction recreated the histogram, which is
the thick red line.
Figure 4-4: The normal distribution's Jacobi matrix is not well approximated by
Toeplitz plus boundary, but with sufficiently many moments good approximations are
possible. The above graph shows the normal distribution recovered by the method
in this paper using 10 and 20 moments. The thick line is the normal density
e^{-x²/2}/√(2π), and the thin lines on top of it use our algorithm.
Theorem 8. We find the moments of the Wachter law, m_k. Recall that the Wachter
law with parameters a and b has density \frac{a+b}{2\pi}\,\frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{x(1-x)}
on [λ₋, λ₊], where \sqrt{\lambda_\pm} = (\sqrt{a(a+b-1)} \pm \sqrt{b})/(a+b). For k ≥ 1,

\[
m_k = \frac{a}{a+b} \;-\; (a+b) \sum_{j=0}^{k-2} \left(\frac{\sqrt{a(a+b-1)}}{a+b}\right)^{2j+4} \frac{b}{a(a+b-1)}\, N_{j+1}\!\left(\frac{b}{a(a+b-1)}\right),
\]

where N_{j+1} is the (j+1)-st Narayana polynomial.
Proof. We start by integrating the following expression by comparing it to the Marchenko-Pastur law:

\[
J_1 = \frac{1}{2\pi} \int_{\lambda_-}^{\lambda_+} x^k \sqrt{(\lambda_+ - x)(x - \lambda_-)}\, dx.
\]

If x = su, dx = s du, and this integral becomes

\[
J_1 = \frac{s^{k+2}}{2\pi} \int_{\lambda_-/s}^{\lambda_+/s} u^k \sqrt{(\lambda_+/s - u)(u - \lambda_-/s)}\, du.
\]

To compare this expression to the Marchenko-Pastur law, we need to pick s and λ
such that \lambda_+/s = (1 + \sqrt{\lambda})^2 and \lambda_-/s = (1 - \sqrt{\lambda})^2. There is more than one
choice of each parameter, but we pick

\[
\sqrt{s} = \frac{\sqrt{\lambda_+} + \sqrt{\lambda_-}}{2} = \frac{\sqrt{a(a+b-1)}}{a+b},
\qquad
\sqrt{\lambda} = \frac{\sqrt{\lambda_+} - \sqrt{\lambda_-}}{\sqrt{\lambda_+} + \sqrt{\lambda_-}} = \frac{\sqrt{b}}{\sqrt{a(a+b-1)}}.
\]
Using the Narayana numbers, and the formula for the moments of the Marchenko-Pastur
law, the integral equals

\[
J_1 = s^{k+2}\,\lambda\, N_{k+1}(\lambda) = \left(\frac{\sqrt{\lambda_+} + \sqrt{\lambda_-}}{2}\right)^{2k+4} \lambda\, N_{k+1}(\lambda).
\]

Using a and b, this becomes

\[
J_1 = \left(\frac{\sqrt{a(a+b-1)}}{a+b}\right)^{2k+4} \frac{b}{a(a+b-1)}\, N_{k+1}\!\left(\frac{b}{a(a+b-1)}\right).
\]
We also need to integrate

\[
J_2 = \frac{1}{2\pi} \int_{\lambda_-}^{\lambda_+} \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{1 - x}\, dx.
\]

Let su = x - 1, s du = dx. This becomes

\[
J_2 = \frac{1}{2\pi} \int \frac{\sqrt{(\lambda_+ - 1 - su)(1 - \lambda_- + su)}}{-u}\, du,
\]

which by symmetry (replacing u by -u) is

\[
J_2 = \frac{s}{2\pi} \int \frac{\sqrt{\left(\frac{1-\lambda_-}{s} - u\right)\left(u - \frac{1-\lambda_+}{s}\right)}}{u}\, du.
\]

Using the same technique as previously, and noting that 1 - \lambda_\mp = (\sqrt{b(a+b-1)} \pm \sqrt{a})^2/(a+b)^2, we pick

\[
\sqrt{s} = \frac{\sqrt{1-\lambda_-} + \sqrt{1-\lambda_+}}{2},
\qquad
\sqrt{s\lambda} = \frac{\sqrt{1-\lambda_-} - \sqrt{1-\lambda_+}}{2} = \frac{\sqrt{a}}{a+b}.
\]

Using the fact that the Marchenko-Pastur law is normalized, the answer is

\[
J_2 = s\lambda = \frac{1}{4}\left(\sqrt{1-\lambda_-} - \sqrt{1-\lambda_+}\right)^2 = \frac{a}{(a+b)^2}.
\]
Now we are ready to find the moments of the Wachter law. Using the geometric
series formula \frac{x^{k-1}}{1-x} = \frac{1}{1-x} - \sum_{j=0}^{k-2} x^j,

\[
m_k = \frac{a+b}{2\pi} \int_{\lambda_-}^{\lambda_+} \frac{x^{k-1}}{1-x}\,\sqrt{(\lambda_+ - x)(x - \lambda_-)}\, dx
\]
\[
= \frac{a+b}{2\pi} \int_{\lambda_-}^{\lambda_+} \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{1-x}\, dx
\;-\; \frac{a+b}{2\pi} \sum_{j=0}^{k-2} \int_{\lambda_-}^{\lambda_+} x^j \sqrt{(\lambda_+ - x)(x - \lambda_-)}\, dx
\]
\[
= \frac{a}{a+b} \;-\; (a+b) \sum_{j=0}^{k-2} \left(\frac{\sqrt{a(a+b-1)}}{a+b}\right)^{2j+4} \frac{b}{a(a+b-1)}\, N_{j+1}\!\left(\frac{b}{a(a+b-1)}\right). \qquad \Box
\]
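As a sanity check on the closed form (our own script and naming, not part of the thesis), the formula can be compared against direct numerical integration of the Wachter density:

```python
import numpy as np
from math import comb, pi, sqrt

def narayana_poly(n, lam):
    """N_n(lam) = sum_{r=1}^{n} N(n, r) lam^{r-1}, with N(n, r) = C(n,r) C(n,r-1) / n."""
    return sum(comb(n, r) * comb(n, r - 1) / n * lam ** (r - 1) for r in range(1, n + 1))

def wachter_moment(k, a, b):
    """Theorem 8: k-th moment of the Wachter law with parameters a, b (k >= 1)."""
    s = a * (a + b - 1) / (a + b) ** 2          # s = (sqrt(a(a+b-1))/(a+b))^2
    lam = b / (a * (a + b - 1))
    return a / (a + b) - (a + b) * sum(
        s ** (j + 2) * lam * narayana_poly(j + 1, lam) for j in range(k - 1))

def wachter_moment_numeric(k, a, b, n=400_000):
    """Trapezoid quadrature of x^k against the Wachter density on its support."""
    lp = (sqrt(a * (a + b - 1)) + sqrt(b)) ** 2 / (a + b) ** 2
    lm = (sqrt(a * (a + b - 1)) - sqrt(b)) ** 2 / (a + b) ** 2
    x = np.linspace(lm, lp, n)
    rho = (a + b) / (2 * pi) * np.sqrt(np.maximum((lp - x) * (x - lm), 0)) / (x * (1 - x))
    y = x ** k * rho
    return np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

a, b = 2.0, 3.0
print(wachter_moment(1, a, b))   # mean a/(a+b) = 0.4
print(wachter_moment(2, a, b), wachter_moment_numeric(2, a, b))
```

For k = 2 the sum has a single term and reduces to m₂ = a/(a+b) - ab(a+b-1)/(a+b)³, which matches the quadrature to several digits.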
4.7 Acknowledgements
We would like to thank Michael LaCroix, Plamen Koev, Sheehan Olver and Bernie
Wang for interesting discussions. We gratefully acknowledge the support of the National Science Foundation: DMS-1312831, DMS-1016125, DMS-1016086.
Bibliography
[1] George E. Andrews, Richard Askey, and Ranjan Roy, Special Functions, Cambridge University Press, 1999.
[2] Michael Anshelevich, Wojciech Młotkowski, "The free Meixner class for pairs of measures," arXiv, 2011, http://arxiv.org/abs/1003.4025
[3] Michael Anshelevich, "Bochner-Pearson-type characterization of the free Meixner class," Adv. in Appl. Math. 46 (2011), 25-45.
[4] T. H. Baker and P. J. Forrester, "The Calogero-Sutherland model and generalized
classical polynomials," Communications in Mathematical Physics 188 (1997), no.
1, 175-216.
[5] A. Bekker and J.J.J. Roux, "Bayesian multivariate normal analysis with a Wishart prior," Communications in Statistics - Theory and Methods, 24:10, 2485-2497.
[6] James R. Bunch and Christopher P. Nielsen, "Updating the Singular Value Decomposition," Numerische Mathematik, 31, 111-129, 1978.
[7] Mireille Capitaine, Muriel Casalis, "Asymptotic freeness by generalized moments for Gaussian and Wishart matrices. Application to beta random matrices," Indiana University Mathematics Journal, 53 (2004), no. 2, 397-432.
[8] Djalil Chafai, "Singular Values of Random Matrices," notes available online.
[9] Percy Deift, Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert
Approach, Courant Lecture Notes in Mathematics, 1998.
[10] Alexander Dubbs, Alan Edelman, Plamen Koev, and Praveen Venkataramana,
"The Beta-Wishart Ensemble," 2013, http://arxiv.org/abs/1305.3561
[11] Alexander Dubbs, Alan Edelman, "The Beta-MANOVA Ensemble with General
Covariance," Random Matrices: Theory and Applications, Vol. 03, No. 1.
[12] Ioana Dumitriu and Alan Edelman, "Matrix Models for Beta Ensembles," Journal of Mathematical Physics, Volume 43, Number 11, November, 2002.
[13] Ioana Dumitriu, "Eigenvalue Statistics for Beta-Ensembles," Ph.D. Thesis, MIT, 2003.
[14] Ioana Dumitriu, Alan Edelman, Gene Shuman, "MOPS: Multivariate orthogonal polynomials (symbolically)," Journal of Symbolic Computation, 42, 2007.
[15] Ioana Dumitriu and Plamen Koev, "Distributions of the extreme eigenvalues of Beta-Jacobi random matrices," SIAM Journal on Matrix Analysis and Applications, Volume 30, Number 1, 2008.
[16] Freeman J. Dyson, "The Threefold Way. Algebraic Structure of Symmetry Groups and Ensembles in Quantum Mechanics," Journal of Mathematical Physics, Volume 3, Issue 6.
[17] Alan Edelman, "The Random Matrix Technique of Ghosts and Shadows,"
Markov Processes and Related Fields, 16, 2010, No. 4, 783-790.
[18] Alan Edelman, Random Matrix Theory, in preparation.
[19] Alan Edelman and Plamen Koev, "Eigenvalue distributions of beta-Wishart matrices," unpublished, available at: math.mit.edu/~plamen/files/mvs.pdf
[20] Alan Edelman, N. Raj Rao, "Random matrix theory," Acta Numerica, 2005.
[21] Alan Edelman and Brian D. Sutton, "The Beta-Jacobi Matrix Model, the CS Decomposition, and Generalized Singular Value Problems," Foundations of Computational Mathematics, 259-285 (2008).
[22] Alan Edelman and Brian Sutton, "The Beta-Jacobi Matrix Model, the CS decomposition, and generalized singular value problems," Foundations of Computational Mathematics, 2007.
[23] I. G. Evans, "Bayesian Estimation of Parameters of a Multivariate Normal Distribution," Journal of the Royal Statistical Society. Series B (Methodological), Vol. 27, No. 2 (1965), pp. 279-283.
[24] R. A. Fisher, "The sampling distribution of some statistics obtained from nonlinear equations," Ann. Eugenics 9, 238-249, 1939.
[25] Peter Forrester, "Exact results and universal asymptotics in the Laguerre random matrix ensemble," J. Math. Phys. 35, (1994).
[26] Peter Forrester, Log-gases and random matrices, Princeton University Press,
2010.
[27] Peter Forrester, "Probability densities and distributions for spiked Wishart β-ensembles," arXiv:1101.2261v1 (2011).
[28] Peter J. Forrester, Eric M. Rains, "Interpretations of some parameter dependent
generalizations of classical matrix ensembles," Probability Theory and Related
Fields, Volume 131, Issue 1, pp. 1-61, January, 2005.
[29] Girshick, M. A. "On the sampling theory of roots of determinantal equations,"
Ann. Math. Statist., 10, 203-224, (1939).
[30] Golub, G. H., and J. A. Welsch, "Calculation of Gauss Quadrature Rules," 1969,
Math. Comp. 23, 221.
[31] Ming Gu and Stanley Eisenstat, "A stable and fast algorithm for updating the singular value decomposition," Research Report YALEU/DCS/RR-966, Yale University, New Haven, CT, 1994.
[32] Akihito Hora and Nobuaki Obata, Quantum Probability and Spectral Analysis of Graphs, Theoretical and Mathematical Physics (Springer, Berlin Heidelberg 2007).
[33] P. L. Hsu, "On the distribution of roots of certain determinantal equations,"
Ann. Eugenics 9, 250-258, 1939.
[34] Suk-Geun Hwang, "Cauchy's Interlace Theorem for Eigenvalues of Hermitian
Matrices," The American Mathematical Monthly, Vol. 111, No. 2 (Feb., 2004),
pp. 157-159.
[35] Timothy Kusalik, James A. Mingo, Roland Speicher, "Orthogonal Polynomials
and Fluctuations of Random Matrices," (2005), on arXiv.
[36] Alan T. James, "The distribution of the latent roots of the covariance matrix,"
Ann. Math. Statist. Volume 31, 151-158, (1960).
[37] Alan T. James, "Distributions of matrix variates and latent roots derived from
normal samples," Ann. Math. Statist. Volume 35, Number 2, (1964).
[38] Kurt Johansson, Eric Nordenstam, "Eigenvalues of GUE minors," Electronic Journal of Probability, Vol. 11 (2006), pp. 1342-1371.
[39] Jyoichi Kaneko, "Selberg integrals and hypergeometric functions associated with
Jack polynomials," SIAM Journal on Mathematical Analysis, Volume 24, Issue
4, July 1993.
[40] Rowan Killip and Irina Nenciu, "Matrix models for circular ensembles," International Mathematics Research Notices, Volume 2004, Issue 50, pp. 2665-2701.
[41] Plamen Koev, "Computing Multivariate Statistics," online notes at http://math.mit.edu/~plamen/files/mvs.pdf
[42] Plamen Koev's web page: http://www-math.mit.edu/~plamen/software/mhgref.html
[43] Plamen Koev and Alan Edelman, "The Efficient Evaluation of the Hypergeometric Function of a Matrix Argument," Mathematics of Computation, Volume 75, Number 254, Pages 833-846, January 19, 2006. Code available at http://www-math.mit.edu/~plamen/software/mhgref.html
[44] Fei Li and Yifeng Xue, "Zonal polynomials and hypergeometric functions of quaternion matrix argument," Communications in Statistics: Theory and Methods, Volume 38, Number 8, January 2009.
[45] Fei Li and Yifeng Xue, "Zonal polynomials and hypergeometric functions of quaternion matrix argument," Communications in Statistics: Theory and Methods, Volume 38, Number 8, January 2009.
[46] Ross Lippert, "A matrix model for the β-Jacobi ensemble," Journal of Mathematical Physics 44(10), 2003.
[47] I. G. Macdonald, "Hypergeometric functions," unpublished manuscript.
[48] T. Mansour and Y. Sun, "Identities involving Narayana polynomials and Catalan
numbers," Disc. Math., 309:4079-4088, 2009.
[49] V. A. Marchenko and L. A. Pastur, "Distribution of the eigenvalues in certain
sets of random matrices," Matematicheskii Sbornik, 72 (114), 1967.
[50] Brendan D. McKay, "The Expected Eigenvalue Distribution of a Large Regular
Graph," Linear Algebra and its Applications, 40:203-216 (1981).
[51] Mood, A. M., "On the distribution of the characteristic roots of normal second-moment matrices," Ann. Math. Statist., 22, 266-273, 1951.
[52] Robb J. Muirhead, Aspects of Multivariate Statistical Theory, Wiley-Interscience, 1982.
[53] Alexandru Nica and Roland Speicher, Lectures on the Combinatorics of Free
Probability, Cambridge University Press, 2006.
[54] A. Okounkov and G. Olshanski, "Shifted Jack Polynomials, Binomial Formula, and Applications," Mathematical Research Letters 4, 69-78, (1997).
[55] I. Olkin and S. N. Roy, "On Multivariate Distribution Theory," Ann. Math.
Statist. Volume 25 Number 2 (1954), 329-339.
[56] S. Olver and R. Nadakuditi, "Numerical computation of convolutions in free probability theory," preprint on arXiv:1203.1958.
[57] B. N. Parlett, The Symmetric Eigenvalue Problem. SIAM Classics in Applied
Mathematics, 1998.
[58] Victor Perez-Abreu and Noriyoshi Sakuma, "Free infinite divisibility and free multiplicative mixtures of the Wigner distribution," Comunicaciones del CIMAT, No 1-09-07/15-10-2009.
[59] N. Raj Rao, Alan Edelman, "The Polynomial Method for Random Matrices,"
Foundations of Computational Mathematics, 2007.
[60] T. Ratnarajah, R. Vaillancourt, M. Alvo, "Complex Random Matrices and Applications," CRM-2938, January, 2004.
[61] S. N. Roy, "p-Statistics and some generalizations in analysis of variance appropriate to multivariate problems," Sankhya 4, 381-396, 1939.
[62] Luis Santalo, Integral Geometry and Geometric Probability, Addison-Wesley
Publishing Company, Inc. 1976.
[63] Richard P. Stanley, "Some combinatorial properties of Jack symmetric functions," Adv. Math. 77, 1989.
[64] Lloyd N. Trefethen and David Bau, III, Numerical Linear Algebra, SIAM, 1997.
[65] Kenneth W. Wachter, "The strong limits of random matrix spectra for sample matrices of independent elements," Annals of Probability, 6, 1978.
[66] Eugene P. Wigner, "Characteristic vectors of bordered matrices with infinite
dimensions," Annals of Mathematics, Vol. 62, 1955.
[67] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press,
1999.