Free probability: Basic concepts, tools, applications, and relations to other fields

Øyvind Ryan
February 21, 2008
Abstract
In (at least) two talks, I will formally define the concepts of free probability theory, state its main theorems, and present some of the useful
tools it provides which may be applicable to other fields. Free probability,
its connection with random matrix theory, and some applications, were
presented in my talk on the CMA seminar 13th of December. These talks
will go deeper, in that all concepts from the talk will be formally defined
and theorems proved, while many new facets of the theory are added.
Contents

1 Introduction to free probability and the concept of freeness
2 Free convolution, its analytical machinery, and its combinatorial facet
3 Connection to random matrix theory
4 Results from classical probability which have their analogue in free probability
5 The free central limit theorem
6 An important result from free probability, applicable in many situations
7 Applications to wireless communication
  7.1 Channel capacity estimation using free probability theory [6]
  7.2 Estimation of power and the number of users in CDMA systems [7]
8 Applications to portfolio optimization
  8.1 Interpretation of eigenvalues and eigenvectors
  8.2 Markowitz portfolio optimization
  8.3 Cleaning of correlation matrices
  8.4 Other ways of forming an empirical matrix
  8.5 Dynamics of the top eigenvalue and eigenvector
  8.6 Relation between the correlation matrix and the empirical correlation matrix

1 Introduction to free probability and the concept of freeness
A useful view of classical probability is in terms of the pair (C(Ω), E), where C(Ω) is the space of (real-valued) functions (i.e. the space of random variables) on the probability space (µ, Ω), and E is the linear functional E(f) = ∫ f(x) dµ(x) on C(Ω) (i.e. the expectation). In this classical setting, all random variables commute w.r.t. multiplication. Is it possible to find a useful theory where the space C(Ω) is replaced by an algebra (in particular matrix algebras) in which the random variables do not necessarily commute? And what does an expectation look like in such a theory? By a useful theory, we mean a theory where a concept analogous to independence exists, so that classical results hold within our new setting also (with random variables, expectation, and independence replaced by the new concepts). The candidate for our new theory will be as follows:
Definition 1. By a (noncommutative) probability space we mean a pair (A, φ),
where A is a unital ∗-algebra, and where φ (the expectation) is a unital linear
functional on A. The elements of A are called random variables. A family of
unital ∗-subalgebras (A_i)_{i∈I} is called a free family if

  a_j ∈ A_{i_j},  i_1 ≠ i_2, i_2 ≠ i_3, ..., i_{n−1} ≠ i_n,  φ(a_1) = φ(a_2) = ⋯ = φ(a_n) = 0  ⇒  φ(a_1 ⋯ a_n) = 0.   (1.1)

A family of random variables a_i is called a free family if the algebras they generate form a free family.
The algebra A will mostly be some subalgebra of the n × n matrices Mn (C)
(for instance the unitary matrices U(n), or diagonal matrices), or some subalgebra of n × n random matrices. The freeness relation will typically not be
found for small matrices, but, as we will see, it will be very useful in describing
relationships in the spectra of random matrices when the matrices get large.
Remark: Freeness is actually quite different from classical independence:
We have that E(a_1 a_2 a_1 a_2) = E(a_1^2) E(a_2^2) when a_1 and a_2 are independent, but

  φ(a_1 a_2 a_1 a_2) = φ((a_1' + φ(a_1)I)(a_2' + φ(a_2)I)(a_1' + φ(a_1)I)(a_2' + φ(a_2)I))
    = φ(a_1' a_2' a_1' a_2') + ⋯ + φ(a_1)φ(a_2)φ(a_1)φ(a_2) + φ((a_1')^2) φ(a_2)^2 + φ((a_2')^2) φ(a_1)^2

(where φ(a_1' a_2' a_1' a_2') and the other omitted terms vanish by the freeness condition (1.1))

    = φ(a_1)^2 φ(a_2)^2 + (φ(a_1^2) − φ(a_1)^2) φ(a_2)^2 + φ(a_1)^2 (φ(a_2^2) − φ(a_2)^2)
    = φ(a_1^2) φ(a_2)^2 + φ(a_2^2) φ(a_1)^2 − φ(a_1)^2 φ(a_2)^2
    ≠ φ(a_1^2) φ(a_2^2)
when a_1 and a_2 are free. Here we have used the shorthand notation a' = a − φ(a)I. The order of the random variables is highly relevant in free probability.
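Since independent large random matrices become asymptotically free (this is made precise in section 3), the difference between the two rules can be checked numerically. The following Matlab sketch (the particular matrices and shifts are our own choices, made only for illustration) compares tr(a_1 a_2 a_1 a_2)/n for two independent n × n matrices with the free and the classical predictions:

n = 1000;
a1 = randn(n); a1 = (a1 + a1')/sqrt(2*n) + eye(n);   % shifted Wigner matrix: phi(a1) ~ 1, phi(a1^2) ~ 2
[Q, R] = qr(randn(n) + 1i*randn(n));
Q = Q*diag(diag(R)./abs(diag(R)));                   % Haar-distributed unitary
a2 = Q*diag(1 + sign(randn(n,1)))*Q';                % eigenvalues 0 or 2: phi(a2) ~ 1, phi(a2^2) ~ 2
lhs = real(trace(a1*a2*a1*a2))/n                     % approaches the free value
free_value  = 2*1 + 2*1 - 1*1                        % phi(a1^2)phi(a2)^2 + phi(a2^2)phi(a1)^2 - phi(a1)^2 phi(a2)^2 = 3
indep_value = 2*2                                    % phi(a1^2)phi(a2^2) = 4 (classical independence)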
2 Free convolution, its analytical machinery, and its combinatorial facet
The definition of freeness is seen to give many combinatorial challenges in terms
of computation. We will here introduce the basic machinery on combinatorics
needed to prove some useful results from the definition of freeness. We denote
by P(n) the set of partitions of {1, ..., n}. In classical probability, any compactly
supported probability measure can be associated with its (classical) cumulants
cn :
Definition 2. The classical cumulants c_n of the random variable x are defined through the recursive relation

  E(x^n) = Σ_{π∈P(n)} c_π,   (2.1)

where c_π = Π_{i=1}^{|π|} c_{|π_i|} when π = {π_1, ..., π_{|π|}}. We also denote the cumulants of x by c_n[x].
The nice property about cumulants in the classical case is the following:

Theorem 1. If x and y are independent, then c_n[x + y] = c_n[x] + c_n[y] for all n.
One can show that

  log F(t) = Σ_{n=1}^∞ c_n (−it)^n / n!,

where F(t) = ∫_R e^{−itx} dν(x) is the Fourier transform of the probability measure ν. In other words, the classical cumulants are actually the coefficients in the power series expansion of the logarithm of the Fourier transform. This also
proves theorem 1, since log F is known to have the same linearizing property
for the sum of independent random variables.
In free probability, a similar definition of cumulants can be made, so that
the mentioned additivity property is maintained (with independence replaced by
freeness). One simply replaces the set of all partitions with the set of noncrossing
partitions in (2.1):
Definition 3. A partition is said to be noncrossing if, whenever i < j < k < l,
i and k are in the same block, and j and l are in the same block, then all i, j, k, l
are in the same block. The set of noncrossing partitions is denoted by N C(n).
In the following, we will also say that a partition (noncrossing or not) is a
pairing, if all blocks have exactly two elements.
The following provides the replacement of classical cumulants and log F for
the setting of free random variables:
Definition 4. The free cumulants α_n of the random variable x are defined through the recursive relation

  φ(x^n) = Σ_{π∈NC(n)} α_π,   (2.2)

where α_π = Π_{i=1}^{|π|} α_{|π_i|} when π = {π_1, ..., π_{|π|}}. We also denote the cumulants of x by α_n[x]. The power series

  R_x(z) = Σ_{i=1}^∞ α_i z^i

is also called the R-transform (of the distribution of x).
We will also have need for the more general definition of mixed cumulants:
Definition 5. The mixed (free) cumulants α[x_1, ..., x_k] for a sequence of random variables are defined through the recursive relation

  φ(x_{i_1} ⋯ x_{i_n}) = Σ_{π∈NC(n)} α_π,   (2.3)

where α_π = Π_{i=1}^{|π|} α[x_{π_{i1}}, ..., x_{π_{i|π_i|}}] when π = {π_1, ..., π_{|π|}} and π_i = {π_{i1}, ..., π_{i|π_i|}}.
One can show that the functional (also called the cumulant functional) (x_1, ..., x_n) → α[x_1, ..., x_n] is linear in all variables.
The formulas (2.1) and (2.2) are also called moment-cumulant formulas.
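The moment-cumulant formula (2.2) can be rewritten as the recursion m_n = Σ_{s=1}^n α_s Σ_{i_1+⋯+i_s = n−s} m_{i_1} ⋯ m_{i_s} (with m_0 = 1), which is convenient for computations. The following Matlab sketch (the function name is ours, not from any library) computes moments from a given sequence of free cumulants in this way:

function m = freecumulants2moments(alpha)
  % Sketch: compute the moments m_1,...,m_N of a distribution from its free
  % cumulants alpha_1,...,alpha_N via the recursion
  %   m_n = sum_{s=1}^n alpha_s * [coefficient of x^(n-s) in (m_0 + m_1 x + ...)^s],
  % which is equivalent to the moment-cumulant formula (2.2).
  N = length(alpha);
  m = zeros(1, N);
  mfull = [1, m];                       % mfull(k+1) = m_k, with m_0 = 1
  for n = 1:N
    total = 0;
    for s = 1:n
      p = 1;                            % coefficients of (m_0 + m_1 x + ...)^s, truncated
      for j = 1:s
        p = conv(p, mfull(1:n));
        p = p(1:min(end, n));
      end
      total = total + alpha(s)*p(n - s + 1);
    end
    m(n) = total;
    mfull(n + 1) = m(n);
  end
end

% Example: freecumulants2moments([0 1 0 0 0 0]) returns [0 1 0 2 0 5],
% the moments of a standard semicircular variable (the Catalan numbers).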
Analogous to theorem 1, we have the following result:
Theorem 2. If a and b are free, then αn [a + b] = αn [a] + αn [b] for all n. In
other words, Ra+b (z) = Ra (z) + Rb (z) (i.e. the R-transform takes the role of
log F in the free setting).
This theorem has a nice interpretation in terms of probability measures: We can associate with the moments of a and b two probability measures µ_a and µ_b (at least in the case of compactly supported probability measures). Since the definition of freeness really gives us a rule for computing the moments of a + b (when a and b are free), and thereby for finding a new probability measure, we can alternatively view addition of free random variables as a binary operation on probability measures. This operation is denoted ⊞. Using this new notation, theorem 2 can be rewritten as

  R_{µ_a ⊞ µ_b}(z) = R_{µ_a}(z) + R_{µ_b}(z),

which corresponds to the relationship between (classical) convolution of probability measures and the logarithm of the Fourier transform.
To prove theorem 2, we need the following result:
Theorem 3. Assume that a and b are free. Then the mixed cumulant α[x_1, ..., x_n] is zero whenever at least one x_i = a and at least one x_i = b (with all x_i taking their values in {a, b}). The converse also holds: Whenever all such mixed cumulants of a and b vanish, a and b must be free.
We will not give the proof for this in its entirety (since more background on
the combinatorics of partitions is needed), but only sketch how the proof goes:
1. The first step in the proof is noting that we can just as well assume that
φ(a) = φ(b) = 0 due to the linearity of the cumulant functional, and since
α[x1 , ..., xn ] = 0 whenever one of the xi is a scalar (this last part requires
a proof of its own, and goes by induction).
2. If x_1 ≠ x_2 ≠ ⋯ ≠ x_n, then the definition of freeness gives us (with a little extra argument) that α[x_1, ..., x_n] = 0.

3. If not x_1 ≠ x_2 ≠ ⋯ ≠ x_n, we can group together neighbouring a's and b's (at least once), so that we obtain an element y_1 ⋯ y_m with m < n, where each y_i is a power of a or a power of b. The proof goes by analyzing y_1 ⋯ y_m instead of the longer product x_1 ⋯ x_n, and using induction (the details are quite involved, however).
It is easy to prove theorem 2 once theorem 3 is proved: (2.3) says that

  φ((a + b)^n) = Σ_{π∈NC(n)} α_π[a + b]
             = Σ_{π∈NC(n)} Π_{i=1}^{|π|} α[a + b, ..., a + b]   (with |π_i| arguments)
             = Σ_{π∈NC(n)} (Π_{i=1}^{|π|} α[a, ..., a] + Π_{i=1}^{|π|} α[b, ..., b])
             = Σ_{π∈NC(n)} (α_π[a] + α_π[b]),

where we have used the vanishing of mixed cumulants (theorem 3) between the second and third equality. This proves that α_n[a + b] = α_n[a] + α_n[b], so that R_{a+b}(z) = R_a(z) + R_b(z), and this is the content of theorem 2.
One distribution will have a very special role as a replacement for the Gaussian law in free probability theory:

Definition 6. A random variable a is called (standard) semicircular if its R-transform is of the form R_a(z) = z^2.

In section 5, it will be explained why such random variables are called semicircular.
It remains to explain how the probability density can be recovered from the
moments and the cumulants. For this, a connection with the Cauchy-transform
can be used:
Definition 7. The Cauchy transform of a probability measure µ is defined by

  G_µ(z) = ∫_R dµ(t)/(z − t).
The probability measure µ can be recovered from its Cauchy transform G_µ via the Stieltjes inversion formula, which says that

  f_µ(t) = lim_{ε→0+} (−1/π) ℑ G_µ(t + iε)
for all t ∈ R. To see the connection between the Cauchy transform and the
R-transform, the following result can be used (we omit the proof, as it involves
more background on combinatorics):
  G_µ((1/z)(1 + R_µ(z))) = z.   (2.4)
In other words, the Cauchy transform can be found from the R-transform as the inverse function of (1/z)(1 + R_µ(z)). Once the Cauchy transform has been found, we can recover the density using the Stieltjes inversion formula. This will be exemplified for the semicircular and the free Poisson distributions (to be defined) in later sections.
There also exists a transform which does the same thing for multiplication of free random variables as the R-transform does for addition of free random variables. This transform is called the S-transform, and has the property

  S_{µ_a ⊠ µ_b}(z) = S_{µ_a}(z) S_{µ_b}(z),

where ⊠ is defined just as ⊞, but with addition replaced by multiplication. We will not go into details on this, but only explain that the same type of analytical machinery can be used to find the density of µ_a ⊠ µ_b as was the case above for µ_a ⊞ µ_b. To be more precise, one can show the relationship

  S(z) = (1/z) R^{-1}(z),

where R^{-1} denotes the inverse function of the R-transform. This enables one to compute the R-transform, from which we have already seen how to compute the Cauchy transform, and thus the density.
3 Connection to random matrix theory
We first need some more terminology:
Definition 8. A sequence of random variables a_{n1}, a_{n2}, ... in probability spaces (A_n, φ_n) is said to converge in distribution if, for any m_1, ..., m_r ∈ N and k_1, ..., k_r ∈ {1, 2, ...}, the limit lim_{n→∞} φ_n(a_{nk_1}^{m_1} ⋯ a_{nk_r}^{m_r}) exists. If also

  lim_{n→∞} φ_n(a_{nk_1}^{m_1} ⋯ a_{nk_r}^{m_r}) = φ(A_{k_1}^{m_1} ⋯ A_{k_r}^{m_r})

for any m_1, ..., m_r ∈ N and k_1, ..., k_r ∈ {1, 2, ...}, with A_1, A_2, ... free in some probability space (A, φ), then we say that a_{n1}, a_{n2}, ... are asymptotically free.
We will prove that independent Gaussian matrices are asymptotically free as the matrices grow in size. To be more precise, we will consider n × n matrices A_n = (1/√n) a_n = (1/√n)(a_n(i, j))_{1≤i,j≤n}, where

1. The entries a_n(i, j), 1 ≤ i ≤ j ≤ n, form a set of (1/2)n(n + 1) independent, complex-valued random variables.

2. a(k, k) is real valued with standard Gaussian distribution (i.e. mean 0, variance 1).

3. When i < j, the real and imaginary parts ℜ(a(i, j)) and ℑ(a(i, j)) are independent and identically distributed, each Gaussian with mean 0 and variance 1/2.

4. a(i, j) is the complex conjugate of a(j, i).
The following holds:
Theorem 4. Let S_p be the set of permutations of the p elements {1, 2, ..., p}. For π ∈ S_p, let π̂ be the order 2 permutation in S_{2p} defined by

  π̂(2j − 1) = 2π^{-1}(j),   π̂(2j) = 2π(j) − 1   (j ∈ {1, 2, ..., p}),   (3.1)

let ∼_π̂ denote the equivalence relation on {1, ..., 2p} generated by

  j ∼_π̂ π̂(j) + 1   (addition mod 2p),   (3.2)

and let d(π̂) denote the number of equivalence classes of ∼_π̂.

If A_{n1}, A_{n2}, ... are Gaussian, independent matrices as described above, then

  E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})] = Σ_{π̂ ≤ σ} n^{d(π̂) − p − 1},

where σ is the partition with blocks σ_j = {k | i_k = j}. Moreover, d(π̂) ≤ p + 1, with equality if and only if π̂ is noncrossing. Also, d(π̂) − p − 1 is always an even number.
We will not show this in its entirety, only sketch some parts of the proof. The proof builds heavily on the fact that the entries are Gaussian. The following fact makes things simpler when the entries are Gaussian: If f, g are real, standard, Gaussian and independent, then

  E((f + ig)^m (f − ig)^n) = 0 unless m = n.   (3.3)

One multiplies the matrices A_{ni_1} ⋯ A_{ni_{2p}} entry by entry, and only keeps certain terms using (3.3). One then considers all possible identifications of dependent (i.e. equal) entries from different matrices (this leads us to consider only π̂ which satisfy π̂ ≤ σ). These identifications give rise to the partition π̂ from the statement of the theorem. It turns out that, again due to the Gaussian property, a cancellation phenomenon occurs so that one only has to consider π which are pairings. The term n^{d(π̂)−p−1} follows from another careful count of the terms for such pairings.
An important consequence of theorem 4 is the following:

Corollary 1. The A_{n1}, A_{n2}, ... are asymptotically free as n → ∞. Moreover, each A_{ni} converges in distribution to a standard semicircular random variable A_i. Moreover, the convergence is almost everywhere, i.e.

  lim_{n→∞} tr_n(A_{ni}^{2p}) = φ(A_i^{2p})

almost everywhere.
Proof: We see from theorem 4 that

  E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})] = Σ_{π̂ ≤ σ, π̂ noncrossing} 1 + O(n^{-2}),   (3.4)

so that

  lim_{n→∞} E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})] = Σ_{π̂ ≤ σ, π̂ noncrossing} 1.

Denoting this limit by φ(A_{i_1} ⋯ A_{i_{2p}}), this says that α[A_{i_1}, A_{i_2}, ...] = 0 whenever two different i_j occur (i.e. the mixed cumulants vanish). This means that A_1, A_2, ... are free (theorem 3), hence the A_{n1}, A_{n2}, ... are asymptotically free. Moreover, since

  lim_{n→∞} E[tr_n(A_{ni}^{2p})] = Σ_{π̂ noncrossing pairing} 1,

we see that α_2[A_i] = 1, while all other α_j[A_i] = 0, since we only sum over noncrossing pairings. But this implies that A_i is a standard semicircular random variable.
Finally, the almost everywhere convergence follows from the fact that the deviation term in (3.4) is O(n^{-2}). We will not present a complete proof for this, but remark that it follows from the Borel-Cantelli lemma once we have shown that

  Σ_{n=1}^∞ P(|tr_n(A_{ni_1} ⋯ A_{ni_{2p}}) − E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})]| > ε) < ∞.

This in turn follows from the Chebyshev inequality if we show that

  Σ_{n=1}^∞ E[|tr_n(A_{ni_1} ⋯ A_{ni_{2p}}) − E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})]|^2]
    = Σ_{n=1}^∞ (E[|tr_n(A_{ni_1} ⋯ A_{ni_{2p}})|^2] − |E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})]|^2) < ∞.   (3.5)
We have already shown that the last term in the last expression satisfies

  |E[tr_n(A_{ni_1} ⋯ A_{ni_{2p}})]|^2 = |Σ_{π̂ ≤ σ, π̂ noncrossing} 1|^2 + O(n^{-2}).   (3.6)

A more complicated argument shows that the first term in the last expression satisfies

  E[|tr_n(A_{ni_1} ⋯ A_{ni_{2p}})|^2] = |Σ_{π̂ ≤ σ, π̂ noncrossing} 1|^2 + O(n^{-2}).   (3.7)

The proof is completed by putting (3.6) and (3.7) into (3.5).
In figure 1, a histogram is shown for the eigenvalues of a 1000 × 1000 self-adjoint standard Gaussian random matrix. The shape of a semicircle (an ellipse would be more correct to say) of radius 2 centered at the origin is clearly visible.

Figure 1: Histogram of the eigenvalues of a 1000 × 1000 self-adjoint standard Gaussian random matrix

The following Matlab code produces the plot:

A = (1/sqrt(2000)) * (randn(1000,1000) + j*randn(1000,1000));  % i.i.d. complex Gaussian entries
A = (sqrt(2)/2)*(A+A');                                        % make A self-adjoint with the scaling above
hist(eig(A),40)
Not only Gaussian matrices display the freeness property when the matrices get large. It would perhaps be more correct to say that this occurs for any random matrix system where the eigenvector structure has a uniform distribution (i.e. the eigenvectors point in each direction with equal probability); it can be shown that our Gaussian matrices have this property. Also, our Gaussian matrices are not only asymptotically free with other Gaussian matrices: One can show that, in a very general sense, the Gaussian matrices are asymptotically free from any other random matrix system which is independent from them and which converges in distribution.

Another type of random matrices frequently used, with similar properties as Gaussian matrices, is the standard unitary matrices. These are unitary random matrices with distribution equal to the Haar measure on U(n).
Another much used random matrix system which exhibits asymptotic freeness is the system (1/N) XX^H (which has the structure of a sample covariance matrix), where X is n × N with i.i.d. complex standard Gaussian entries (n is interpreted as the number of parameters in a system, N as the number of samples taken from the system). One here lets n and N go to infinity at a given ratio, say lim_{n→∞} n/N = c. With a slight generalization of theorem 4, one can show that

  lim_{n→∞} E[tr_n(((1/N) XX^H)^p)] = Σ_{π∈NC(p)} c^{p−|π|}
almost everywhere. We call this limit distribution the Marchenko-Pastur law. We quickly see that its free cumulants are 1, c, c^2, .... In the next section, we will introduce analytic methods to calculate the density of this law.
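Mirroring the Matlab snippet above, the following sketch (parameters chosen only for illustration) plots the eigenvalue histogram of such a sample covariance matrix with c = n/N = 0.5, to be compared with the Marchenko-Pastur density derived in the next section:

n = 500; N = 1000;                                   % c = n/N = 0.5
X = sqrt(1/2)*(randn(n,N) + j*randn(n,N));           % i.i.d. standard complex Gaussian entries
W = (1/N)*(X*X');
W = (W + W')/2;                                      % enforce exact Hermitian symmetry numerically
hist(eig(W), 40)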
4 Results from classical probability which have their analogue in free probability
Besides the free central limit theorem in the next section, several other results
and concepts can be generalized from a classical to a free probability setting.
For instance, the following is the analogue of the Poisson distribution:
Definition 9. Let λ ≥ 0 and α ∈ R. The limit distribution as N → ∞ of

  ν_N = ((1 − λ/N) δ_0 + (λ/N) δ_α)^{⊞N}   (4.1)

is called the free Poisson distribution with rate λ and jump size α.
Denote by m_n(ν_N) the n'th moment of (1 − λ/N) δ_0 + (λ/N) δ_α. It is clear that m_n(ν_N) = λα^n/N. It is not hard to show that the cumulants α_n(ν_N) of the approximation (4.1) are of the form λα^n + O(N^{-1}). Taking the limit gives that the n'th cumulant of the free Poisson distribution with rate λ and jump size α is λα^n, i.e. its R-transform is given by

  R(z) = Σ_{n=1}^∞ λα^n z^n = λαz/(1 − αz).
This corresponds nicely to the cumulants of the classical Poisson distribution.
Example. The limit distribution of (1/N) XX^H encountered above (which had cumulants 1, c, c^2, ...) is the free Poisson distribution with rate 1/c and jump size c. Let us compute the density of this law. First of all,

  R(z) = z + cz^2 + c^2 z^3 + ⋯ = z/(1 − cz).

Using (2.4), we can find the Cauchy transform as the inverse function of (1/z)(1 + z/(1 − cz)). We therefore solve

  w = (1/z)(1 + z/(1 − cz)),

which implies that wcz^2 + (1 − c − w)z + 1 = 0. We find the roots of this as

  z = (w + c − 1 ± √((1 − c − w)^2 − 4wc)) / (2wc)
    = (w + c − 1 ± √(w^2 − 2(1 + c)w + (1 − c)^2)) / (2wc)
    = (w + c − 1 ± √((w − (1 − √c)^2)(w − (1 + √c)^2))) / (2wc).

Using the Stieltjes inversion formula (i.e. taking the limit of the imaginary part of this as w → x, x real), we obtain the density

  f(x) = √((x − (1 − √c)^2)((1 + √c)^2 − x)) / (2πxc)

for (1 − √c)^2 ≤ x ≤ (1 + √c)^2 (note that we have to treat the case x = 0 separately). In figure 2, four different Marchenko-Pastur laws µ_c have been plotted.
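The analytical steps above can also be carried out numerically. The following Matlab sketch (parameters are only illustrative) solves the quadratic equation for the Cauchy transform at points x + iε and applies the Stieltjes inversion formula, reproducing the Marchenko-Pastur density of figure 2:

c = 0.5;
epsim = 1e-6;                                        % small imaginary part for Stieltjes inversion
x = linspace(0.001, (1 + sqrt(c))^2 + 0.5, 400);
f = zeros(size(x));
for k = 1:length(x)
  w = x(k) + 1i*epsim;
  z = roots([w*c, 1 - c - w, 1]);                    % candidates for G(w), from wcz^2 + (1-c-w)z + 1 = 0
  G = z(imag(z) < 0);                                % the Cauchy transform maps the upper half-plane to the lower
  if ~isempty(G)
    f(k) = -imag(G(1))/pi;
  end
end
plot(x, f)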
In classical probability, a probability measure µ is said to be infinitely divisible if, for any n, it can be written on the form

  µ = µ_n^{*(n)},

for some probability measure µ_n, where *(n) denotes n-fold convolution. In classical probability, the Lévy-Hinčin formula states which probability measures are infinitely divisible. A similar result exists in free probability (with * replaced by ⊞). To be more precise, a compactly supported probability measure µ is ⊞-infinitely divisible if and only if its R-transform has the form

  R_µ(z) = z (α_1[µ] + ∫_R (z/(1 − xz)) dρ(x))

for some finite measure ρ on R with compact support.
Figure 2: Four different Marchenko-Pastur laws µ_c (c = 0.5, 0.2, 0.1, 0.05)
5 The free central limit theorem
The free central limit theorem is analogous to the classical central limit theorem:
We simply need to replace independence with freeness, and the Gaussian law
with another law, called the semicircle law:
Definition 10. The semicircle law w_{m,r} (of radius r and centered at m) is the probability measure on R with density

  w_{m,r}(x) = (2/(πr^2)) √(r^2 − (x − m)^2)   if m − r ≤ x ≤ m + r,

and 0 otherwise. w_{0,2} is also called the standard semicircle law.
Note that the term semicircular is a bit misleading. More correct would be to say semielliptic, since it is only the case r = √(2/π) which yields a circular shape of the density. All other values of r give an elliptic shape of the density. The following lemma will be helpful when we connect to the semicircle law in the proof of the free central limit theorem:
Lemma 1. The k'th moment of the standard semicircle law is equal to the number of noncrossing pairings of {1, ..., k}. Equivalently, R_{w_{0,2}}(z) = z^2, i.e. the standard semicircle law is the same as the distribution of a standard semicircular random variable. In particular, the odd moments of the standard semicircle law are all 0.
Proof: That the odd moments of the standard semicircle law all are zero follows immediately by symmetry. Assume that k = 2s. Integration by parts gives

  m_{2s} = (1/(2π)) ∫_{−2}^{2} x^{2s} √(4 − x^2) dx
        = −(1/(2π)) ∫_{−2}^{2} (−x/√(4 − x^2)) x^{2s−1} (4 − x^2) dx
        = (1/(2π)) ∫_{−2}^{2} √(4 − x^2) (x^{2s−1}(4 − x^2))' dx
        = 4(2s − 1) m_{2s−2} − (2s + 1) m_{2s}.

This means that the recursion m_{2s} = (2(2s − 1)/(s + 1)) m_{2s−2} holds. By induction, we will prove that this implies that m_{2s} = (1/(s + 1)) (2s choose s) (these numbers are also called the Catalan numbers c_s). This obviously holds for s = 0. Assume that we have proved that m_{2(s−1)} = (1/s) (2(s−1) choose s−1). Then

  m_{2s} = (2(2s − 1)/(s + 1)) m_{2s−2} = (2(2s − 1)/((s + 1)s)) (2s − 2)!/((s − 1)!)^2
        = (2s)!/((s + 1) s! s!) = (1/(s + 1)) (2s choose s),

which proves the induction step. It now suffices to prove that m_{2s} = c_s = (1/(s + 1)) (2s choose s) equals the number r_{2s} of noncrossing pairings of {1, ..., 2s}. Note that these satisfy the equation

  r_{2s} = r_{2s−2} r_0 + r_{2s−4} r_2 + ⋯ + r_2 r_{2s−4} + r_0 r_{2s−2}

(with r_0 = 1). Setting g(x) = Σ_{s=0}^∞ r_{2s} x^{s+1}, this can equivalently be expressed as the power series equation

  g(x)^2 = g(x) − x.

It is easily checked that Σ_{s=0}^∞ m_{2s} x^{s+1} = Σ_{s=0}^∞ (1/(s + 1)) (2s choose s) x^{s+1} is the Taylor series of f(x) = (1/2)(1 − √(1 − 4x)), and that f(x) satisfies

  f(x)^2 = f(x) − x.

Therefore f(x) and g(x) are the same power series, so that m_{2s} = r_{2s}, which is what we had to show.
Here we could also have used (2.4) and the Stieltjes inversion formula to obtain the density from the R-transform R(z) = z^2: The Cauchy transform is simply the inverse function of

  w = (1/z)(z^2 + 1) = z + 1/z.

We thus need to solve z^2 − wz + 1 = 0, which implies that z = (w ± √(w^2 − 4))/2. Taking imaginary parts we obtain the density (1/(2π)) √(4 − x^2) on [−2, 2], which is the density of the standard semicircle law.
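A quick numerical sanity check of lemma 1 (a sketch, using numerical quadrature) compares the even moments of w_{0,2} with the Catalan numbers:

w = @(x) sqrt(4 - x.^2)/(2*pi);                      % density of the standard semicircle law w_{0,2}
for s = 0:5
  m2s = integral(@(x) x.^(2*s).*w(x), -2, 2);        % 2s'th moment
  Cs = nchoosek(2*s, s)/(s + 1);                     % Catalan number c_s
  fprintf('s=%d   moment=%.4f   Catalan=%d\n', s, m2s, Cs);
end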
The free central limit theorem goes as follows:

Theorem 5. If

• a_1, ..., a_n are free and self-adjoint,

• φ(a_i) = 0,

• φ(a_i^2) = 1,

• sup_i |φ(a_i^k)| < ∞ for all k,

then the sequence (a_1 + ⋯ + a_n)/√n converges in distribution to the standard semicircle law.
Proof: The proof goes by computing the moments of (a_1 + ⋯ + a_n)/√n, and comparing these with the moments of the semicircle law. Note that the k'th moment can be written

  φ((a_1 + ⋯ + a_n)^k / n^{k/2}) = Σ_{s=1}^{k} Σ_V Σ φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)}) / n^{k/2},   (5.1)
where the second summation is over all partitions V = {V1 , V2 , ..., Vs } (where the
blocks are the equivalence classes of the equivalence relation u ∼ v if i(u) = i(v)),
and the third summation is over all choices of (i(1), ..., i(k)) which give this
partition.
By repeatedly using definition 1, we see that each term in (5.1) can be written as a polynomial in φ(a_i^m), 1 ≤ m ≤ k, and the polynomial depends only on the partition V. The fourth assumption of the theorem now gives us that there exists a constant C_k such that |φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)})| ≤ C_k for all choices of i(1), ..., i(k). We now split the first summation of (5.1) into three parts: s < k/2, s = k/2, and s > k/2. For s < k/2,
  |Σ_{s<k/2} Σ_V Σ φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)}) / n^{k/2}| ≤ Σ_{s<k/2} Σ_V (n(n − 1) ⋯ (n − s + 1) / n^{k/2}) C_k,

and this tends to 0 as n → ∞. For s > k/2, it is easy to show that

  Σ_{s>k/2} Σ_V Σ φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)}) / n^{k/2} = 0,
since at least one of the blocks Vi must contain only one element (we here use
that φ(ai ) = 0 for all i). Therefore, only the case s = k/2 is of interest (in
particular, k must be even). We therefore must consider the terms
  Σ_{V: s=k/2} Σ φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)}) / n^{k/2}.
We will prove by induction that φ(ai(1) ai(2) · · · ai(k) ) = 0 if V has crossings, and
that it is 1 if V is noncrossing. Assume that this has been shown for k < l.
Assume first that there exists a j such that i(j) = i(j + 1). Then it is easy to see that

  φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)}) = φ(a_{i(1)} ⋯ a_{i(j−1)} ((a_{i(j)}^2)' + φ(a_{i(j)}^2)I) a_{i(j+2)} ⋯ a_{i(k)})
    = φ(a_{i(1)} ⋯ a_{i(j−1)} (a_{i(j)}^2)' a_{i(j+2)} ⋯ a_{i(k)}) + φ(a_{i(j)}^2) φ(a_{i(1)} ⋯ a_{i(j−1)} a_{i(j+2)} ⋯ a_{i(k)})
    = φ(a_{i(j)}^2) φ(a_{i(1)} ⋯ a_{i(j−1)} a_{i(j+2)} ⋯ a_{i(k)}).
Since the partition defined by i(1), ..., i(j − 1), i(j + 2), ..., i(k) is noncrossing if and only if V is noncrossing, and since φ(a_{i(j)}^2) = 1, we have by induction that φ(a_{i(1)} a_{i(2)} ⋯ a_{i(k)}) = 1 if and only if V is noncrossing (and zero otherwise) whenever there exists a j such that i(j) = i(j + 1).
Assume now that there exists no j such that i(j) = i(j + 1). Then condition
2 coupled with the definition of freeness gives that φ(ai(1) ai(2) · · · ai(k) ) = 0.
Note also that V always has crossings in this case.
Since the partitions with crossings do not contribute, we have proved that
(denoting by j(1), ..., j(k/2) representatives from the equivalence classes of V)
  lim_{n→∞} φ((a_1 + ⋯ + a_n)^k / n^{k/2})
    = lim_{n→∞} (1/n^{k/2}) Σ_{V noncrossing pairing} Σ_{j(1),...,j(k/2)} 1
    = lim_{n→∞} Σ_{V noncrossing pairing} n(n − 1) ⋯ (n − k/2 + 1) / n^{k/2}
    = Σ_{V noncrossing pairing} 1
    = #{noncrossing pairings},

since there are n(n − 1) ⋯ (n − k/2 + 1) choices of i(1), ..., i(k) which give rise to V. By lemma 1, the above is simply the k'th moment of the standard semicircle law. This concludes the proof of the free central limit theorem.
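The free central limit theorem can be illustrated with random matrices, since independent, randomly rotated matrices are asymptotically free (see section 3). The following Matlab sketch (the choice of ±1 spectra and the parameters are only illustrative) sums K independent randomly rotated matrices with φ(a) = 0 and φ(a^2) = 1 and plots the eigenvalue histogram of the normalized sum, which should resemble the standard semicircle law:

N = 500; K = 50;
S = zeros(N);
for k = 1:K
  D = diag(sign(randn(N,1)));                        % eigenvalues +/-1: phi(a) = 0, phi(a^2) = 1
  [Q, R] = qr(randn(N) + 1i*randn(N));
  Q = Q*diag(diag(R)./abs(diag(R)));                 % Haar-distributed unitary
  S = S + Q*D*Q';
end
S = (S + S')/2;                                      % enforce exact Hermitian symmetry numerically
hist(eig(S/sqrt(K)), 40)                             % compare with w_{0,2}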
6 An important result from free probability, applicable in many situations
Theorem 6. Assume that R_n and X_n are independent random matrices of dimension n × N, where X_n contains i.i.d. standard (i.e. mean 0, variance 1) complex Gaussian entries. Assume that the empirical eigenvalue distribution of Γ_n = (1/N) R_n R_n^H converges in distribution almost everywhere to a compactly supported probability measure η_Γ as n, N go to infinity with n/N → c. Then the empirical eigenvalue distribution of

  W_n = (1/N) (R_n + σX_n)(R_n + σX_n)^H   (6.1)

converges in distribution almost surely to a compactly supported probability measure η_W uniquely identified by

  η_W ⊘ µ_c = (η_Γ ⊘ µ_c) ⊞ δ_{σ^2},   (6.2)

where δ_{σ^2} is the Dirac measure (point mass) at σ^2.
Here ⊘ denotes "the opposite" of multiplicative free convolution ⊠ (multiplicative free deconvolution). (6.1) can be thought of as the sample covariance matrix of the random vectors r_n + σx_n. The vector r_n can be interpreted as containing the system characteristics (for instance the direction of arrival in radar applications, or portfolio assets in financial applications), while x_n represents additive noise, with σ a measure of the strength of the noise. Theorem 6 is important, since it opens up for estimating the information part of such systems by filtering out the noise.
7 Applications to wireless communication

7.1 Channel capacity estimation using free probability theory [6]

Theorem 6 can be used for estimation of the channel capacity in wireless systems, which is defined as follows:
Definition 11. The capacity per receiving antenna (in the case where the noise is spatially white additive Gaussian) of a channel with n × m channel matrix H and signal to noise ratio ρ = 1/σ^2 is given by

  C = (1/n) log2 det(I_n + (1/(mσ^2)) HH^H) = (1/n) Σ_{l=1}^n log2(1 + λ_l/σ^2),   (7.1)

where λ_l are the eigenvalues of (1/m) HH^H.
Assume that we have L observations Ĥ_i of the form

  Ĥ_i = H + σX_i   (7.2)

in a MIMO system. To adapt to theorem 6, we form the n × mL random matrices

  Ĥ_{1...L} = H_{1...L} + (σ/√L) X_{1...L}   (7.3)

with

  Ĥ_{1...L} = (1/√L) [Ĥ_1, Ĥ_2, ..., Ĥ_L],
  H_{1...L} = (1/√L) [H, H, ..., H],
  X_{1...L} = [X_1, X_2, ..., X_L].

Noting that H_{1...L} H_{1...L}^H = HH^H, theorem 6 now gives us the approximation
  ν_{(1/m) Ĥ_{1...L} Ĥ_{1...L}^H} ⊘ µ_{n/(mL)} ≈ (ν_{(1/m) HH^H} ⊘ µ_{n/(mL)}) ⊞ δ_{σ^2}.   (7.4)
Figure 3: Comparison of the classical capacity estimators for various numbers of observations. σ^2 = 0.1, n = 10 receive antennas, m = 10 transmit antennas. The rank of H was 3.

This can be used to obtain estimates of the moments of the channel matrix (1/m) HH^H from the observation matrices. These can in turn be used to obtain an estimate of the eigenvalues, and thus the capacity. The estimates prove to work
better than existing methods, at least for the common case when the channel
matrix has low rank (≤ 4). Existing methods for estimating the capacity from
observation matrices are
  C_1 = (1/(nL)) Σ_{i=1}^L log2 det(I_n + (1/(mσ^2)) Ĥ_i Ĥ_i^H),
  C_2 = (1/n) log2 det(I_n + (1/(Lσ^2 m)) Σ_{i=1}^L Ĥ_i Ĥ_i^H),   (7.5)
  C_3 = (1/n) log2 det(I_n + (1/(σ^2 m)) ((1/L) Σ_{i=1}^L Ĥ_i)((1/L) Σ_{i=1}^L Ĥ_i)^H).
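As a concrete illustration, the following Matlab sketch computes the true capacity (7.1) and the classical estimators (7.5) from simulated observations (the rank-3 channel construction and its normalization are our own choices, made only for illustration):

n = 10; m = 10; L = 20; r = 3; sigma2 = 0.1;
H = (randn(n,r) + 1i*randn(n,r))*(randn(r,m) + 1i*randn(r,m))/(2*sqrt(r));   % a rank-r channel matrix
Ctrue = log2(real(det(eye(n) + H*H'/(m*sigma2))))/n;                         % capacity (7.1)
C1 = 0; Hmean = zeros(n,m); HHmean = zeros(n,n);
for i = 1:L
  Hhat = H + sqrt(sigma2/2)*(randn(n,m) + 1i*randn(n,m));                    % observation (7.2)
  C1 = C1 + log2(real(det(eye(n) + Hhat*Hhat'/(m*sigma2))))/(n*L);
  Hmean = Hmean + Hhat/L;  HHmean = HHmean + Hhat*Hhat'/L;
end
C2 = log2(real(det(eye(n) + HHmean/(m*sigma2))))/n;
C3 = log2(real(det(eye(n) + Hmean*Hmean'/(m*sigma2))))/n;
[Ctrue, C1, C2, C3]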
These are compared with the free probability based estimator C_f defined through (7.4) in figure 3. The estimation is worse when the rank of the channel matrix
is increased, as can be seen in figure 4.
Figure 4: C_f for various numbers of observations. σ^2 = 0.1, n = 10 receive antennas, m = 10 transmit antennas. The rank of H was 3, 5 and 6.
7.2 Estimation of power and the number of users in CDMA systems [7]
In communication applications, one needs to determine the number of users in a cell in a CDMA type network, as well as the power with which they are received. Denoting by n the spreading length, the received vector at the base station in an uplink CDMA system is given by

  y_i = W P^{1/2} s_i + b_i,   (7.6)

where y_i, W, P, s_i and b_i are respectively the n × 1 received vector, the n × N spreading matrix with i.i.d. zero mean, variance 1/n entries, the N × N diagonal power matrix, the N × 1 vector of i.i.d. Gaussian unit variance modulation signals, and the n × 1 additive white zero mean Gaussian noise. Adapting theorem 6 to this case gives
  (((N/n)(µ_{N/n} ⊠ µ_P) + (1 − N/n) δ_0) ⊞ µ_{σ^2 I}) ⊠ µ_{n/L} ≈ µ_{Θ̂},   (7.7)

where Θ̂ denotes the sample covariance matrix formed from the received vectors.
We will show that this enables us to estimate the number of users N through a best-match procedure in the following way: Try all values of N with 1 ≤ N ≤ n, and choose the N which gives a best match between the left and right hand side in (7.7). By best match we mean the value of N which gives the smallest deviation in the first moments, when compared to the moments we observe in the sample covariance matrix. We use a 36 × 36 (N = 36) diagonal matrix as our power matrix P with µ_P = δ_1.
In this case, a common method exists that tries to estimate just the rank. This method counts the number of eigenvalues greater than σ^2. Some threshold is used in this process. We will set the threshold at 1.5σ^2, so that only eigenvalues larger than 1.5σ^2 are counted. There are no generally known rules for where the threshold should be set, so some guessing is inherent in this method. Also, choosing a wrong threshold can lead to a need for a very high number of observations for the method to be precise.
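The following Matlab sketch simulates this classical threshold method (the spreading length n = 128, unit powers, and real-valued signals are our own simplifying assumptions):

n = 128; N = 36; L = 1024; sigma2 = 0.1;
W = randn(n, N)/sqrt(n);                             % spreading matrix with i.i.d. variance 1/n entries
R = zeros(n);
for i = 1:L
  y = W*randn(N,1) + sqrt(sigma2)*randn(n,1);        % received vector (7.6) with P = I
  R = R + (y*y')/L;                                  % sample covariance matrix
end
Nhat = sum(eig(R) > 1.5*sigma2)                      % estimated number of users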
The two methods are tested with varying numbers of observations, from L = 1 to L = 4000. In figure 5, it is seen that when L increases, we get a prediction of N which is closer to the actual value 36. The classical method starts to predict values close to the right one only for a number of observations close to 4000. The method using free probability predicts values close to the right one for far fewer realizations.

Figure 5: Estimation of the number of users with a classical method, and with free convolution. L = 1024 observations have been used.
8 Applications to portfolio optimization
Certain companies exist which have specialized in automatic trading strategies based on research and results from random matrix theory. An example is Capital Fund Management (http://www.cfm.fr). This company employs 25 PhDs (mostly in physics). The publications from this company can be found at http://www.cfm.fr/us/publications.php. Two of the founders of Capital
Fund Management have written the book [1] and the paper [2], on which the topics of this section are based. We will follow the notation used in [2]:
• We assume that we have a portfolio with N assets with weight w_i on the i'th asset, i.e. w_i = n_i y_0^i / W, where n_i is the number of asset i in the portfolio, y_0^i is the initial price of asset i (at t = 0), and W is the total wealth invested in the portfolio (at time t = 0). We will also write w = (w_1, ..., w_N) for the portfolio.
• r_t^i will denote the daily return of asset i in our portfolio at time t, i.e. r_t^i = (y_{t+1}^i − y_t^i)/y_t^i, where y_t^i is the price of asset i at time t. The expected return of the portfolio is thus Σ_i w_i r_t^i.
• Denote by (σ_t^i)^2 the (daily) variance of r_t^i. We will denote the correlation matrix of the r_t^i by

  C_{ij}^t = (E(r_t^i r_t^j) − E(r_t^i) E(r_t^j)) / (σ_t^i σ_t^j).

  Similarly, the covariance matrix is defined by

  D_{ij}^t = E(r_t^i r_t^j) − E(r_t^i) E(r_t^j) = σ_t^i σ_t^j C_{ij}^t.
The correlation matrix may or may not evolve over time. We will mostly
assume it does not evolve over time, in which case we will write σi and
Cij , i.e. drop the subscript t. It is common to assume (without loss of
generality) that E(rti ) = 0.
• The (daily) variance/risk of the portfolio return is given by

  Σ_{ij} w_i σ_i C_{ij} σ_j w_j,   (8.1)

  where σ_i^2 is the (daily) variance of asset i.
• The empirical correlation matrix E is given by

  E_{ij} = (1/T) Σ_{t=1}^T x_t^i x_t^j,   (8.2)

  where x_t^i = r_t^i/σ_i. The empirical correlation matrix is typically very different from the true correlation matrix.
• The risk of our portfolio can be faithfully measured by

  (1/T) Σ_{ijt} w_i σ_i x_t^i x_t^j σ_j w_j = Σ_{ij} w_i σ_i ((1/T) Σ_t x_t^i x_t^j) σ_j w_j,   (8.3)

  which is (8.1) with the correlation matrix replaced with the empirical correlation matrix.
8.1 Interpretation of eigenvalues and eigenvectors
If the portfolio is given by weights from a normalized eigenvector w_a = (w_a^1, ..., w_a^N) of the covariance matrix with eigenvalue λ_a, then (8.1) says that the variance/risk of the portfolio return is

  Σ_{ij} w_a^i σ_i C_{ij} σ_j w_a^j = Σ_{ij} w_a^i D_{ij} w_a^j = w_a · Dw_a = λ_a.
Now consider two portfolios corresponding to (orthogonal) eigenvectors w_a and w_b. Then the covariance of their returns is given by

  E[(Σ_{i=1}^N w_a^i r_t^i)(Σ_{j=1}^N w_b^j r_t^j)] − E[Σ_{i=1}^N w_a^i r_t^i] E[Σ_{j=1}^N w_b^j r_t^j]
    = Σ_{ij} w_a^i E(r_t^i r_t^j) w_b^j − Σ_{ij} w_a^i E(r_t^i) E(r_t^j) w_b^j
    = Σ_{ij} w_a^i D_{ij} w_b^j
    = w_b · Dw_a
    = 0.
Therefore, the eigenvectors of the covariance matrix correspond to uncorrelated portfolios, and the corresponding eigenvalues correspond to the risks of these portfolios. We can now decompose the original portfolio as a sum of these uncorrelated portfolios:

  (w_1, ..., w_N) = Σ_a s_a w_a.

This decomposition is called a principal component analysis.
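In Matlab, this principal component analysis amounts to a single eigenvalue decomposition. The following sketch (with purely synthetic return data) computes the uncorrelated eigen-portfolios, their risks, and the decomposition of a given portfolio:

R = randn(250, 20);                                  % toy data: 250 days of returns for 20 assets
D = cov(R);                                          % sample covariance matrix
[V, Lam] = eig(D);                                   % columns of V are the portfolio weights w_a
risks = diag(Lam)                                    % the risk (variance) of each eigen-portfolio
w = ones(20,1)/20;                                   % an arbitrary portfolio
s = V'*w;                                            % coefficients in w = sum_a s_a w_a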
8.2 Markowitz portfolio optimization
Markowitz portfolio optimization helps us find weights for a portfolio which give us the maximal expected return for a given risk, or, equivalently, the minimum risk for a given return G:

  w_C = G C^{-1} g / (g^T C^{-1} g),   (8.4)

where g = (g_1, ..., g_N) are the predicted gains for the assets. Typically, the correlation matrix C is not known, so the Markowitz portfolio estimate (8.4) is computed from an empirical correlation matrix:

  w_E = G E^{-1} g / (g^T E^{-1} g).   (8.5)
What is the true risk of the Markowitz optimized portfolio? This is called the "true" minimal risk R_true^2. We assume that C is perfectly known, and we get

  R_true^2 = w_C^T C w_C = G^2 (g^T C^{-1} C C^{-1} g) / (g^T C^{-1} g)^2 = G^2 / (g^T C^{-1} g).
When C is not known, we use an empirical correlation matrix as in (8.5). The "in-sample" risk R_in^2 is defined as the risk of the optimized portfolio over the period used to construct it:

  R_in^2 = G^2 / (g^T E^{-1} g).
One can show that R_in^2 ≤ R_true^2. In particular, in the case where C = I, one has that

  R_in^2 = (1 − N/T) G^2 / (g^T g)   and   R_true^2 = G^2 / (g^T g);
see also figure 1 in [2]. Only when T is much larger than N will these values be close. Thus, using past returns (in the form of an empirical correlation matrix) to optimize a portfolio strategy leads to an over-optimistic estimate of the risk. To eliminate this bias in the estimation of risk, one can attempt to do some cleaning of the empirical correlation matrix. When this is done properly, we will see that the "cleaned" matrix can provide more reliable risk estimation.
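The bias is easy to reproduce numerically. The following Matlab sketch (toy data with C = I; the predicted gains g are arbitrary) computes the Markowitz weights from the empirical correlation matrix and compares the in-sample and true risks:

N = 100; T = 400; G = 1;
g = randn(N,1);                                      % predicted gains (arbitrary for the illustration)
X = randn(T,N);                                      % returns under the null hypothesis C = I
E = (X'*X)/T;                                        % empirical correlation matrix (8.2)
wE = G*(E\g)/(g'*(E\g));                             % Markowitz weights (8.5)
Rin2 = G^2/(g'*(E\g));                               % in-sample risk
Rtrue2 = G^2/(g'*g);                                 % true minimal risk when C = I
[Rin2, (1 - N/T)*Rtrue2, Rtrue2]                     % R_in^2 is close to (1 - N/T) R_true^2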
8.3 Cleaning of correlation matrices
A large part of the empirical correlation matrix must be considered as "noise", and cannot be trusted for risk management. To describe a useful way of cleaning this noise, let us first look at what happens when the prices of the assets are independent, identically distributed random variables. This is also called the null hypothesis of independent assets. When C = I, the empirical correlation matrix thus has the form (1/T) Σ_{t=1}^T x_t^i x_t^j, where the x_t^i are real, standard, Gaussian and independent. When N and T go to infinity at a given ratio, it is known that the eigenvalues are almost everywhere close to the Marchenko-Pastur law, as shown in figure 2. The derivation of the density of this law was given in section 4, using R-transform techniques. The Marchenko-Pastur law thus serves as a theoretical prediction under the assumption that the market is "all-noise". Deviations from this theoretical limit in the eigenvalue distribution should indicate non-noisy components, i.e. they should suggest information about the market: Most eigenvalues can be explained in terms of a purely random correlation matrix, except for the largest ones, which correspond to the fluctuations of the market as a whole, and of several industrial sectors.
In figure 1 in [2], the effect on the risk estimates is shown after cleaning
described as follows:
• Replace all ”low-lying” eigenvalues with a unique value.
• Keep all the high eigenvalues. These should correspond to meaningful
economic information (sectors).
This boils down to finding a k* (which is the number of meaningful economic sectors), and setting

  λ_c^k = 1 − δ if k > k*,   and   λ_c^k = λ_E^k if k ≤ k*,

where δ is chosen such that the trace of the correlation matrix is exactly preserved. Here the eigenvalues have been listed in descending order. k* is found by first finding the theoretical edge of the eigenvalues under the assumption that the null hypothesis holds (i.e. the upper limit of the support of the Marchenko-Pastur law), and choosing k* such that λ_E^{k*} is close to this edge.
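A minimal Matlab sketch of this clipping procedure could look as follows (the function name is ours; k* is assumed to have been chosen already):

function Ec = clean_clipping(E, kstar)
  % Keep the kstar largest eigenvalues of E and replace the remaining
  % ("low-lying") ones by the common value 1 - delta that preserves trace(E).
  [V, Lam] = eig(E);
  [lam, idx] = sort(diag(Lam), 'descend');
  V = V(:, idx);
  N = length(lam);
  common = (trace(E) - sum(lam(1:kstar)))/(N - kstar);    % this is 1 - delta
  Ec = V*diag([lam(1:kstar); common*ones(N - kstar, 1)])*V';
end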
Another cleaning strategy used in the literature is to shift the empirical matrix closer to the identity matrix, i.e. replace E with E_c, where

  E_c = αE + (1 − α)I.

The new eigenvalues are λ_c^k = 1 + α(λ_E^k − 1). This method of cleaning is also called the shrinkage estimator.
8.4 Other ways of forming an empirical matrix
In finance, it is standard practice to form an exponentially weighted moving average (EWMA) correlation matrix

  E_{ij}^T = (1/T) Σ_{t=1}^T (1 − 1/T)^{T−t} x_t^i x_t^j,   (8.6)

instead of the standard empirical matrix (8.2), where each sample has equal weight 1/T. This can also be written in matrix notation as

  E^T = Σ_{t=1}^T (1 − 1/T)^{T−t} δE^t,

where δE^t is the rank one matrix given by δE_{ij}^t = (1/T) x_t^i x_t^j. We can perform
cleaning of the correlation matrix (8.6) in a similar way, but now the null hypothesis gives us a different law. In order to compute this law, we will restrict
to a covariance matrix equal to the identity. Note first that
  E^T = Σ_{t=1}^T (1 − 1/T)^{T−t} δE^t
      = (1 − 1/T) (Σ_{t=1}^{T−1} (1 − 1/T)^{T−1−t} δE^t) + δE^T
      = (1 − 1/T) E^{T−1} + δE^T.   (8.7)
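The EWMA matrix (8.6) is conveniently built with the recursion (8.7). The following Matlab sketch (toy data under the null hypothesis) does this and plots the resulting eigenvalue histogram, to be compared with the law derived below:

T = 500; N = 50;
x = randn(T, N);                                     % normalized returns x_t^i under the null hypothesis
E = zeros(N);
for t = 1:T
  E = (1 - 1/T)*E + (x(t,:)'*x(t,:))/T;              % the recursion (8.7)
end
hist(eig(E), 40)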
It is easily checked that

  δE^T (x_t^1, ..., x_t^N) = (1/T) (Σ_i (x_t^i)^2) (x_t^1, ..., x_t^N) → c (x_t^1, ..., x_t^N),

so that δE^T has, for large N, one eigenvalue close to c, while all the other N − 1 eigenvalues are 0. This means that

  G_{µ_{δE^T}}(z) = (1/N) 1/(z − c) + ((N − 1)/N) (1/z).
The inverse function of this is found by solving

  w = (1/N) 1/(z − c) + ((N − 1)/N) (1/z),

i.e.

  wz^2 − wzc = (1/N) z + ((N − 1)/N)(z − c),
  wz^2 − (wc + 1)z + ((N − 1)/N) c = 0,

so that

  z = (wc + 1 ± √((wc + 1)^2 − 4cw (N − 1)/N)) / (2w)
    = (wc + 1 ± √((wc − 1)^2 + 4cw/N)) / (2w)
    ≈ 1/w − c/(N(wc − 1))
    = 1/w + c/(N(1 − wc)),

where we have used the Taylor expansion

  √((wc − 1)^2 + 4cw/N) = wc − 1 + 2cw/(N(wc − 1)) + ⋯.
Using (2.4), we see that

  R_{µ_{δE^T}}(z) ≈ z (1/z + c/(N(1 − cz))) − 1 = cz/(N(1 − cz)).
If we replace E^T and E^{T−1} in (8.7) with E, then, since the matrices δE^t are rotationally invariant, a result from free probability gives us that

  R_{µ_E}(z) ≈ R_{µ_{(1−1/T)E}}(z) + cz/(N(1 − cz))
            = R_{µ_E}((1 − 1/T)z) + cz/(N(1 − cz)),

where we have used the well-known R-transform property R_{aE}(z) = R_E(az).
Denoting R(z) = R_{µ_E}(z), we now have the equation

  R(z) = R((1 − 1/T)z) + cz/(N(1 − cz)),

which can also be written

  R((1 − 1/T)z) − R(z) + cz/(N(1 − cz)) = 0,
  (R((1 − 1/T)z) − R(z)) / (−z/T) = 1/(1 − cz),
  R'(z) = 1/(1 − cz),
  R(z) = −ln(1 − cz)/c

(here we used c = N/T, and then let T → ∞ so that the difference quotient becomes the derivative R'(z)).
To find the density of this, we must first find the Cauchy transform by solving

  G((1/z)(1 + R(z))) = G((1/z)(1 − ln(1 − cz)/c)) = z,

and then use the Stieltjes inversion formula to find the density. The density is shown in [2].
8.5 Dynamics of the top eigenvalue and eigenvector
The largest eigenvalue of the empirical correlation matrix is typically much larger than the value predicted from the null hypothesis. The interpretation of the corresponding eigenvector is "the market itself", i.e. it has roughly equal components on all the N assets.

A first approximation to the market could be that all stocks move up or down together. One way to state this is through the model

  r_t^i = β_i φ_t + ε_t^i,

where φ_t (the market mode) is common to all stocks, β_i is the market exposure, and ε_t^i is noise, uncorrelated from stock to stock. The covariance matrix for such a model is

  C_{ij} = β_i β_j σ_φ^2 + σ_i^2 δ_{ij},

where σ_φ^2 is the variance of φ and σ_i^2 is the variance of ε_t^i. When all σ_i are equal with common value σ, the largest eigenvalue of C is

  Λ_0 = (Σ_j β_j^2) σ_φ^2 + σ^2

and is of multiplicity 1 with eigenvector (β_1, ..., β_N). The other N − 1 eigenvalues are all equal to Λ_α = σ^2.
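A small Matlab sketch (the exposures β_i and the variances are arbitrary choices for illustration) confirms the form of the top eigenvalue:

N = 100; sigmaphi = 1; sigma = 0.5;
beta = 0.8 + 0.4*rand(N,1);                          % market exposures beta_i
C = sigmaphi^2*(beta*beta') + sigma^2*eye(N);        % one-factor covariance matrix
Lambda0 = sigmaphi^2*sum(beta.^2) + sigma^2          % predicted largest eigenvalue
max(eig(C))                                          % numerical check (eigenvector is proportional to beta)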
It would be very interesting to see how the top eigenvector and eigenvalue of the empirical covariance matrix fluctuate, and how this is related to the top eigenvector/eigenvalue of the covariance matrix itself (note that here we consider covariance matrices, not correlation matrices). We also consider here EWMA covariance matrices, i.e. in (8.7) we make the replacements

  δE_{ij}^t = (1/T) r_t^i r_t^j,
  E_{ij}^T = (1/T) Σ_{t=1}^T (1 − 1/T)^{T−t} r_t^i r_t^j.
It turns out that the largest eigenvalue λ_{0t} of E^t (with corresponding eigenvector ψ_{0t}) has the following relationship to the largest eigenvalue Λ_0 of the actual covariance matrix C:

  Var(λ_{0(t+τ)} − λ_{0t}) = (2/T)(1 − e^{−τ/T}),
  Var(ψ_{0(t+τ)} · ψ_{0t}) ≈ 1 − (Λ_1/(TΛ_0))(1 − e^{−τ/T}),   (8.8)

where Λ_1 is the second largest eigenvalue of C. We have assumed that the actual covariance matrix stays constant. Measurements suggesting deviations from (8.8) would suggest otherwise.
8.6 Relation between the correlation matrix and the empirical correlation matrix
Assume that

• N and T go to infinity at the ratio lim_{N,T→∞} N/T = c,

• the eigenvalue distribution of the empirical covariance matrix converges almost everywhere to a measure µ_E.

Then "very often" the eigenvalue distribution of the corresponding covariance matrices converges almost everywhere to a measure µ_C, and the following relationship determines the one from the other [8]:

  µ_E = µ_C ⊠ µ_c.

This also establishes the connection with theorem 6, since multiplicative free convolution ⊠ (through the deconvolution ⊘) is used there as well.
Further reading
Most of the material on the combinatorics of free probability presented here
is taken from [5]. For a survey of free probability covering many other facets
also, [3] is a good reference. For a survey of random matrix results, take a
look at [4]. The result for the exact moments of Gaussian matrices was taken
from [9]. [10] is a good reference for the connection between random matrices,
free probability, and wireless communication.
References
[1] J-P Bouchaud and M. Potters. Theory of Financial Risk and Derivative
Pricing - From Statistical Physics to Risk Management. Cambridge University Press, Cambridge, 2000.
[2] J-P Bouchaud and M. Potters. Financial applications of random matrix theory: Old laces and new pieces. pages 1–11, 2005. arxiv.org/abs/physics/0507111.
[3] F. Hiai and D. Petz. The Semicircle Law, Free Random Variables and
Entropy. American Mathematical Society, 2000.
[4] M. L. Mehta. Random Matrices. Academic Press, New York, 2nd edition,
1991.
[5] A. Nica and R. Speicher. Lectures on the Combinatorics of Free Probability.
Cambridge University Press, 2006.
[6] Ø. Ryan and M. Debbah. Channel capacity estimation using free
probability theory. Submitted to IEEE Trans. Signal Process., 2007.
http://arxiv.org/abs/0707.3095.
[7] Ø. Ryan and M. Debbah. Free deconvolution for signal processing applications. Submitted to IEEE Trans. on Information Theory, 2007.
http://arxiv.org/abs/cs.IT/0701025.
[8] Ø. Ryan and M. Debbah. Multiplicative free convolution and information-plus-noise type matrices. 2007. http://arxiv.org/abs/math.PR/0702342.
[9] S. Thorbjørnsen. Mixed moments of Voiculescu’s Gaussian random matrices. J. Funct. Anal., 176(2):213–246, 2000.
[10] A. M. Tulino and S. Verdú. Random Matrix Theory and Wireless Communications. www.nowpublishers.com, 2004.