Randomly Generating Computationally Stable Correlation Matrices
Mark Simon
Advisors: Jo Hardin, Stephan Garcia
Contents

List of Figures
Chapter 1. Motivation
Chapter 2. Derivation of the Algorithm
  1. The eigenvalues of E, Σ and S
Chapter 3. Simulations Conducted to Evaluate the Algorithm
  1. The effects of our algorithm on the condition numbers of matrices
  2. How different are S and Σ?
Chapter 4. Extensions of our algorithm
  1. Multiple δi
  2. Block Diagonal Correlation Matrices
  3. Make some ρij for Σ negative
  4. Some concluding remarks
Bibliography
List of Figures

1. Pairs plot of the dimension of our matrices (Dimension.S) versus the true value of κ(S) (True.K) and our estimate of the upper bound of κ(S), κmax(S) (Upper.Bound.K). The dimension of S was varied over [500, 1000] in increments of 50, and 100 matrices were generated from each set of parameter values.
2. Pairs plot of the dimension of our matrices (Dimension.S) versus R's estimate of κ(S) (R.Estimate.K) and our estimate of the upper bound for κ(S), κmax(S) (Upper.Bound.K). The dimension of S was varied over [500, 1000] in increments of 50, and 100 matrices were generated from each set of parameter values.
3. Pairs plot of the δ used to generate each matrix S (Delta.S) versus the true value of log(κ(S)) (Log.True.K) and log(κmax(S)) (Log.Upper.Bound.K). The plot uses log scales for condition numbers and estimates of condition numbers because of the wide range of observed condition numbers in these simulations.
4. Pairs plot of the δ used to generate each matrix S (Delta.S) versus R's estimate of κ(S) (R.Estimate.K) and log(κmax(S)) (Log.Upper.Bound.K). The plot uses log scales for condition numbers and estimates of condition numbers because of the wide range of observed condition numbers in these simulations.
5. Plot of the average true condition number for matrices constructed by increasing δ + ρ = c from c = .8 to c = .99 while letting δ, ρ be uniformly distributed random variables over [0.0, c].
6. Plot of the average true condition number for matrices constructed by varying δ from 0 to .19 while setting ρ = 0.8. Note that the values of c = ρ + δ still go from c = .8 to c = .99 in this set of simulations.
7. The average condition numbers found from matrices with randomly varied ρ + δ subject to the constraint ρ + δ = c (red) and the average condition numbers of matrices generated from ρ = .8 and δ increasing monotonically (blue), plotted against the values of c used to generate each set of 100 matrices, for c ∈ [.8, .99].
8. The Frobenius norm distance between S and Σ0 (Frobenius.Norm.Distance), the values of δ used to generate S (Delta.S), and the p-value of Box's M test for the hypothesis that S and Σ0 are different (Box.M.Test.pvalue).
9. The modulus norm distance between S and Σ0 (Frobenius.Norm.Distance), the values of δ used to generate S (Delta.S), and the p-value of Box's M test for the hypothesis that S and Σ0 are different (Box.M.Test.pvalue).
CHAPTER 1
Motivation
Correlation is a measure of the strength of the linear relationship between two variables. A correlation
matrix is a p × p matrix whose entries are pairwise correlations. They are important for studying the
relationships between random variables in many contexts ranging from modeling portfolio risk in finance
to understanding the interactions between genes in microarray data from genetics. Many applications of
statistics involve large data sets and the correlation structure of the data is sometimes a topic of primary
interest [2]. As a result, it would be useful to be able to randomly generate realistic correlation matrices to
perform simulation studies in order to gain a deeper understanding of correlation structures. One approach
to performing simulation studies on a particular correlation structure is to generate random variables and
determine the correlation structure of the random variables. However, generating random variables and
computing matrices from them is a highly inefficient way of randomly generating correlation matrices and
would not necessarily produce every possible correlation matrix because of distributional assumptions which
must be made about the random variables in order to generate the correlation matrices from them.
A different tactic is to try to generate a random correlation matrix directly. This is more difficult
than it might first appear. Correlation matrices have a number of required properties and using them for
simulations further requires that they be invertible in a computationally feasible manner. Four criteria
characterize correlation matrices:
(1) each entry of the matrix must have absolute value less than or equal to 1,
(2) each entry on the main diagonal must be 1,
(3) the matrix must be real symmetric, and
(4) the matrix must be positive semi-definite.
Further, every matrix with these properties is a correlation matrix (see Remark 2). We impose an additional
requirement, namely that a correlation matrix be computationally invertible. This creates the constraint
that our correlation matrix must have a smallest eigenvalue which is bounded away from zero and a largest
eigenvalue which is bounded above. Matrices which cannot be inverted computationally cannot be
used for simulation studies. We seek to design an algorithm which will allow us to satisfy the necessary
criteria as well as our additional desired property of computational invertibility.
The approach we take to generating random correlation matrices is to provide an algorithm to perturb
an initial correlation matrix with known eigenvalues. We derive this algorithm in Chapter 2. We then
evaluate the effectiveness of our algorithm by conducting simulations in Chapter 3. Finally, we conclude
with a number of natural generalizations of our method in Chapter 4.
CHAPTER 2
Derivation of the Algorithm
Our goal in this section is to describe a method which allows us to generate random computationally
invertible correlation matrices and to prove that the matrices which we create will have a controlled degree
of computational instability. First we consider what it means for a matrix to be positive.
Definition 1. Let T be a matrix or vector, then T ∗ is the conjugate transpose of T . For real symmetric
matrices T , T ∗ = T . A square matrix is self-adjoint if T ∗ = T .
Definition 2. A p × p self-adjoint matrix T is positive semi-definite if z∗T z ≥ 0 for all z ∈ C^p, and T is positive definite if z∗T z > 0 for all z ∈ C^p such that z ≠ 0.
Obviously, it would be impractical to check whether z∗T z > 0 for all z ≠ 0, so instead we rely on an alternative
characterization of positivity for matrices.
Definition 3. Let T be a p × p matrix. Denote the eigenvalues of T, repeated according to their multiplicity and listed in decreasing order, as
$$\lambda_1(T) \ge \cdots \ge \lambda_p(T).$$
Remark 1. [3, Observation 7.1.4] T is positive definite if and only if all eigenvalues of T are positive
real numbers.
Therefore, we can randomly generate positive definite matrices if we can control the eigenvalues of the
random matrix. We now consider in more detail the criteria which characterize correlation matrices. In
particular, we will derive a sufficient criterion to ensure that a matrix is a correlation matrix.
Remark 2. Every real symmetric positive semi-definite matrix with a main diagonal of 1s and off
diagonal entries with absolute value less than or equal to 1 is a correlation matrix.
Proof. First, observe that every positive semi-definite matrix is a covariance matrix [6]. Now, note
that every covariance matrix with unit variance is a correlation matrix. Matrices with unit variance have a
main diagonal of 1’s and off diagonal entries with absolute value less than or equal to 1.
Now that we understand the most relevant characterizations of positivity for matrices and the sufficient
criteria to ensure a matrix is a correlation matrix, we can begin to design our algorithm to generate correlation matrices. We begin with a correlation matrix having all off diagonal entries equal to ρ where 0 < ρ < 1.
Then, we perturb the matrix by adding noise of at most magnitude δ where 0 < δ < 1 to each term. Further,
we choose ρ and δ subject to the constraint ρ + δ < 1. Now, construct a matrix Σ ∈ Mp×p with all off
diagonal entries equal to ρ and all diagonal entries equal to 1 − δ. That is,
$$\Sigma = \begin{pmatrix} 1-\delta & \rho & \cdots & \rho \\ \rho & 1-\delta & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1-\delta \end{pmatrix}.$$
Then we randomly choose p vectors ei ∈ R^m such that ‖ei‖ = √δ and construct their Gram matrix E:
$$E = \begin{pmatrix} \delta & \langle e_1, e_2\rangle & \cdots & \langle e_1, e_p\rangle \\ \langle e_2, e_1\rangle & \delta & \cdots & \langle e_2, e_p\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle e_p, e_1\rangle & \langle e_p, e_2\rangle & \cdots & \delta \end{pmatrix}.$$
Note that choosing the ei from R^m with p > m will tend to produce a Gram matrix whose off-diagonal entries have magnitude closer to δ than choosing the ei with p < m, at the expense of introducing linear dependence among the rows of E. If instead we draw the p vectors ei from R^m with p < m, then ⟨ei, ej⟩ will tend to be closer to zero, because the ei are then more likely to be close to mutually orthogonal.
Now, define S = Σ + E. The absolute value of each off-diagonal entry of S is strictly less than 1: we have |⟨ei, ej⟩| ≤ δ for i ≠ j, so we add at most δ in absolute value to each off-diagonal entry ρ, and we chose δ such that ρ + δ < 1.
$$S = \begin{pmatrix} 1 & \rho + \langle e_1, e_2\rangle & \cdots & \rho + \langle e_1, e_p\rangle \\ \rho + \langle e_2, e_1\rangle & 1 & \cdots & \rho + \langle e_2, e_p\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \rho + \langle e_p, e_1\rangle & \rho + \langle e_p, e_2\rangle & \cdots & 1 \end{pmatrix}$$
Thus, all that remains in order to show that S is a correlation matrix is to establish that S is positive semi-definite. We will actually establish a stronger result: S is positive definite. To do so, we will establish a bound on λp(S) which is strictly greater than zero by considering the eigenvalues of E and Σ; a short sketch of the construction in R appears below.
1. The eigenvalues of E, Σ and S
In order to find a bound for λp (S) we will find bounds for each of λp (E) and λp (Σ). Consider the
matrix E first. We argue that all Gram matrices are positive semi-definite, so E must be at least positive
semi-definite. To see why all Gram matrices are positive semi-definite, consider that E can be factored as E = A∗A, where A is the m × p matrix whose columns are the vectors ei,
$$A = \begin{pmatrix} e_1 & e_2 & \cdots & e_p \end{pmatrix},$$
and A∗ is the matrix with the vectors ei as rows. Note that A∗A is positive semi-definite because, for all v ∈ C^p, ⟨A∗Av, v⟩ = ⟨Av, Av⟩ = ‖Av‖² ≥ 0 by positivity of the norm. Therefore, E is positive semi-definite, which implies that λp(E) ≥ 0.
Now, we argue that Σ is positive definite. First, we re-write Σ as
$$\Sigma = (1 - \rho - \delta) I_{p\times p} + P,$$
where P is the p × p matrix all of whose entries equal ρ:
$$P = \begin{pmatrix} \rho & \rho & \cdots & \rho \\ \rho & \rho & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & \rho \end{pmatrix}.$$
Further, rank(P) = 1, so p − 1 of the eigenvalues of P are zero, and the only non-zero eigenvalue of P is readily computed to be pρ. The eigenvalues of (1 − ρ − δ)Ip×p are equally easy to find; they are p copies of 1 − ρ − δ. Now we consider the eigenvalues of Σ = (1 − ρ − δ)Ip×p + P. We observe that adding a multiple of the identity to a matrix shifts each eigenvalue of the matrix by that multiple.
Remark 3. Let A be a p × p matrix with eigenvalues λi(A) for i = 1, 2, . . . , p, and let c ∈ R. Then
$$\lambda_i(A + cI) = \lambda_i(A) + c, \qquad i = 1, 2, \ldots, p.$$
Proof. If Avi = λi(A)vi for some vi ≠ 0, then
$$(A + cI)v_i = Av_i + cIv_i = \lambda_i(A)v_i + cv_i = (\lambda_i(A) + c)v_i.$$
Therefore, the eigenvalues of A + cI are λi(A) + c.
We apply this result to Σ = (1 − ρ − δ)Ip×p + P by taking A = P and c = 1 − ρ − δ. Thus, we know all of the eigenvalues of Σ: they are the eigenvalues of P shifted by 1 − ρ − δ.
Remark 4. The spectrum of Σ is:
λ1 (Σ) = pρ + 1 − ρ − δ,
λ2 (Σ) = . . . = λp (Σ) = 1 − ρ − δ
So, we find that the lowest eigenvalue of Σ is 1 − ρ − δ. Note that we chose ρ + δ < 1, so this is greater than
zero. Therefore, Σ is positive definite. We now argue, by establishing a lower bound on λp(S), that S is, in fact,
positive definite. In particular, we claim that 1 − ρ − δ ≤ λp (S). Our argument will utilize a form of Weyl’s
Inequalities [1, Cor. III.2.2].
Theorem 1 (Weyl’s Inequality). Let A, B be p × p self-adjoint matrices, then for 1 ≤ j ≤ p:
λj (A) + λp (B) ≤ λj (A + B) ≤ λj (A) + λ1 (B)
Result 1. Let S = Σ + E. Then 1 − ρ − δ ≤ λp(S).

Proof. In Weyl's Inequality, set A = Σ and B = E, so that the left-hand inequality of the theorem becomes
$$\lambda_j(\Sigma) + \lambda_p(E) \le \lambda_j(\Sigma + E) = \lambda_j(S).$$
Now set j = p, so that
$$\lambda_p(\Sigma) + \lambda_p(E) \le \lambda_p(S).$$
As previously shown, λp(Σ) = 1 − ρ − δ. Further, recall that λp(E) ≥ 0 because E is positive semi-definite. Combining these inequalities gives a lower bound for λp(S):
$$1 - \rho - \delta = \lambda_p(\Sigma) \le \lambda_p(\Sigma) + \lambda_p(E) \le \lambda_p(S).$$
We have now finished our proof of the bound on the smallest eigenvalue of S and demonstrated that S is positive definite, because this lower bound is strictly positive. This establishes that S is a correlation matrix, since it satisfies the required properties for correlation matrices (Remark 2). However, we have not yet formally defined what it means for a matrix S to be computationally stable, let alone shown that our S will be computationally stable; so, while we are guaranteed by the work thus far that S is a positive definite correlation matrix, it is not yet clear that S is computationally invertible. We first formalize notions of computational stability and then show that S is computationally stable in the next section.
1.1. Bounds for the Condition Number of S. The computational stability of a matrix can be
thought of intuitively as a means of measuring the impact of small errors on the solutions to a system of
linear equations. We provide an example to help build intuition.
Example 1 ([4]). Let
$$A = \begin{pmatrix} 11 & 10 & 14 \\ 12 & 11 & -13 \\ 14 & 13 & -66 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} -557 & 842 & -284 \\ 610 & -922 & 311 \\ 2 & -3 & 1 \end{pmatrix}.$$
Consider the solution to Ax = b for
$$b = \begin{pmatrix} 1.001 \\ 0.999 \\ 1.001 \end{pmatrix}, \qquad x = \begin{pmatrix} -0.683 \\ 0.843 \\ 0.006 \end{pmatrix}.$$
Now let
$$b' = \begin{pmatrix} 1.00 \\ 1.00 \\ 1.00 \end{pmatrix}, \qquad\text{then}\qquad x' = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}.$$
This is a large change in the solution from x to x′ for a seemingly minuscule change from b to b′. Matrices which are sensitive to rounding errors cannot be inverted computationally and are said to be computationally unstable.
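This example is easy to reproduce in R; the following snippet is purely illustrative and uses only base R functions.

```r
# Reproducing Example 1 numerically (illustrative only)
A <- matrix(c(11, 10,  14,
              12, 11, -13,
              14, 13, -66), nrow = 3, byrow = TRUE)
solve(A, c(1.001, 0.999, 1.001))   # approximately (-0.683, 0.843, 0.006)
solve(A, c(1, 1, 1))               # exactly (1, -1, 0)
kappa(A, exact = TRUE)             # large 2-norm condition number, reflecting the instability
```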
There is a measurable quantity which will allow us to determine the extent to which a matrix will exhibit
sensitivity to small changes such as the one in the previous example and whether or not the matrix can be
computationally inverted.
Definition 4. Let T be a positive semi-definite matrix. The condition number of T, denoted κ(T), is defined as
$$\kappa(T) = \frac{\lambda_1(T)}{\lambda_p(T)}.$$
Recall that S is positive definite. Thus, our construction of S will let us impose a bound on κ(S) if we
can bound both λp (S) and λ1 (S). We have already found that λp (S) is bounded below by 1 − ρ − δ. Thus,
all that remains is to bound λ1 (S) from above.
We claim that λ1 (S) ≤ pδ + pρ + (1 − δ − ρ). However, we will need to mimic the approach we used to
bound λp (S) and first estimate λ1 (Σ) and λ1 (E). We begin with λ1 (E) using Gerschgorin’s disk theorem [3,
Theorem 6.1.1] and the Cauchy-Schwarz-Buniakovsky Inequality [3, Theorem 0.6.3].
Definition 5. Let A be a p × p matrix with entries (aij). Then
$$R_i(A) = \sum_{j=1,\, j\neq i}^{p} |a_{ij}|.$$
Also, let the ith Gerschgorin disk of A be D(aii, Ri), where D(aii, Ri) denotes a disk centered at aii with radius Ri.
Theorem 2 (Cauchy-Schwarz-Buniakovsky Inequality). Let x, y be vectors in an inner product space V. Then
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\,\langle y, y\rangle.$$
Theorem 3 (Gerschgorin Disk Theorem [3]). For a p × p matrix A, every eigenvalue of A lies inside
at least one Gerschgorin disk of A.
Now we apply Gerschgorin's Disk Theorem to E to find an upper bound for λ1 (E).
Remark 5. The largest eigenvalue of E is bounded above. In particular, λ1 (E) ≤ pδ.
Proof. Applying Gerschgorin's Disk Theorem to E, we find that aii = ⟨ei, ei⟩ = δ and Ri(E) ≤ (p − 1)δ, because each of the p − 1 off-diagonal entries in row i of E has absolute value less than or equal to δ by Theorem 2. Therefore, if λi(E) ∈ σ(E), then λi(E) ∈ D(δ, Ri), so
$$|\delta - \lambda_i(E)| \le R_i(E) = \sum_{j=1,\, j\neq i}^{p} |\langle e_i, e_j\rangle| \le (p-1)\delta.$$
Recall that δ > 0, so |δ| = δ, and applying the triangle inequality to |λi(E) − δ| gives
$$|\lambda_i(E)| - \delta \le |\lambda_i(E) - \delta| \le (p-1)\delta \;\Rightarrow\; |\lambda_i(E)| \le p\delta.$$
Further, E is positive semi-definite, so |λi(E)| = λi(E), and we conclude that λi(E) ≤ pδ for every λi(E). In particular,
λ1(E) ≤ pδ.
Now that we have established a bound for λ1 (E), recall that we previously showed that λ1 (Σ) = pρ +
(1 − ρ − δ). Now that we have bounds for λ1 (E) and λ1 (Σ) we can move on to finding a bound for λ1 (S).
Result 2. Let S = Σ + E. Then λ1(S) ≤ pδ + pρ + (1 − δ − ρ).

Proof. We apply Weyl's Inequality with j = 1, A = Σ and B = E:
$$\lambda_1(S) \le \lambda_1(\Sigma) + \lambda_1(E) \le p\rho + (1 - \rho - \delta) + p\delta.$$
Applying Results 1 and 2, we now have bounds for both the smallest and the largest eigenvalue of S. Since λp(S) ≤ λ1(S) by definition, we have bounds for every λi ∈ σ(S):
$$0 < 1 - \rho - \delta \le \lambda_p(S) \le \lambda_1(S) \le p\delta + p\rho + (1 - \delta - \rho).$$
Further, we now have an upper bound on the condition number κ(S).
Result 3.
$$\kappa(S) \le \frac{p\delta + p\rho + (1 - \delta - \rho)}{1 - \rho - \delta} = \kappa_{\max}(S).$$
Also, denote the upper bound we derived for λ1(S) by λ1^max(S) and the lower bound we derived for λp(S) by λp^min(S).

κmax(S) scales linearly with the dimension of our matrix, p. This means that we can control how easily S can be inverted through our choice of ρ and δ for a particular matrix size. Further, κmax(S) is the optimal upper bound for κ(S).
Definition 6. An inequality is sharp if there exist values for which equality is attained.
Our inequality κ(S) ≤ κmax (S) is sharp because we can find values of S for which κ(S) = κmax (S). In the
two dimensional case it is easily shown:
Example 2. Fix ρ, δ such that ρ + δ < 1, and let
$$\Sigma = \begin{pmatrix} 1-\delta & \rho \\ \rho & 1-\delta \end{pmatrix}, \qquad E = \begin{pmatrix} \delta & \delta \\ \delta & \delta \end{pmatrix} \;\Rightarrow\; S = \begin{pmatrix} 1 & \rho+\delta \\ \rho+\delta & 1 \end{pmatrix}.$$
Then
$$\kappa(S) = \frac{2\rho + 2\delta + (1 - \rho - \delta)}{1 - \rho - \delta} = \frac{\rho + \delta + 1}{1 - \rho - \delta},$$
which is exactly our estimate κmax(S) of the condition number of S in this instance.
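A quick numerical check of the two-dimensional case in R; the particular values of ρ and δ below are illustrative and not part of the derivation.

```r
# Sharpness check in dimension 2 (illustrative values)
rho <- 0.6; delta <- 0.3
S <- matrix(c(1, rho + delta, rho + delta, 1), nrow = 2)
c(kappa_true = kappa(S, exact = TRUE),                                # exact 2-norm condition number
  kappa_max  = (2*delta + 2*rho + (1 - rho - delta)) / (1 - rho - delta))
# both equal (1 + rho + delta) / (1 - rho - delta) = 19
```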
Thus, our algorithm produces matrices which are positive definite and have bounded condition numbers. As a final note, we observe that S has controlled off-diagonal entries. We now proceed to perform simulations in order to estimate the effects of the inputs ρ, δ and p on κmax(S), as well as on the true condition number of the matrix.
CHAPTER 3
Simulations Conducted to Evaluate the Algorithm
1. The effects of our algorithm on the condition numbers of matrices
In order to evaluate the performance of our technique computationally, simulations were performed in R generating matrices with dimensions between 500 and 1000, with a step size of 50, with ρ = .8 and δ = .1. A second set of simulations was performed varying δ from δ = 0 to δ = .199999 with a step size of 9 · 10^−l for 2 ≤ l ≤ 10, with ρ = .8. For each combination of parameters, 100 matrices were generated, and we calculated three quantities for each matrix: our upper bound on the condition number, κmax(S); the true condition number of the matrix; and R's built-in estimate of the condition number of S.
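The structure of this workflow can be sketched as follows. This is an assumed reconstruction, not the original script; make_S() refers to the illustrative helper sketched in Chapter 2, and the counts can be reduced for a quick run.

```r
# Sketch of the dimension simulation (assumed structure)
set.seed(1)
rho <- 0.8; delta <- 0.1
dims <- seq(500, 1000, by = 50)
results <- do.call(rbind, lapply(dims, function(p) {
  t(replicate(100, {                      # 100 matrices per dimension
    S  <- make_S(p, rho, delta)           # make_S() from the Chapter 2 sketch
    ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
    c(Dimension.S   = p,
      True.K        = max(ev) / min(ev),  # true condition number
      R.Estimate.K  = kappa(S),           # R's built-in (approximate) estimate
      Upper.Bound.K = (p*delta + p*rho + (1 - delta - rho)) / (1 - rho - delta))
  }))
}))
pairs(results[, c("Dimension.S", "True.K", "Upper.Bound.K")])  # rough analogue of Figure 1
```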
Figure 1. Pairs plot of the dimension of our matrices (Dimension.S) versus the true value of κ(S) (True.K) and our estimate of the upper bound of κ(S), κmax(S) (Upper.Bound.K). The dimension of S was varied over [500, 1000] in increments of 50, and 100 matrices were generated from each set of parameter values.
Figure 2. Pairs plot of the dimension of our matrices (Dimension.S) versus R's estimate of κ(S) (R.Estimate.K) and our estimate of the upper bound for κ(S), κmax(S) (Upper.Bound.K). The dimension of S was varied over [500, 1000] in increments of 50, and 100 matrices were generated from each set of parameter values.
Figure 3. Pairs plot of the δ used to generate each matrix S (Delta.S) versus the true value
of log(κ(S)) (Log.True.K) and log(κmax (S)) (Log.Upper.Bound.K). The plot was done on
log scales for condition numbers and estimates of condition numbers because of the wide
range of observed condition numbers in these simulations.
Figure 4. Pairs plot of the δ used to generate each matrix S (Delta.S) versus R’s estimate
of κ(S) (R.Estimate.K) and log(κmax (S)) (Log.Upper.Bound.K). The plot was done on log
scales for condition numbers and estimates of condition numbers because of the wide range
of observed condition numbers in these simulations.
As can be clearly seen from the results of these simulations, p has a linear effect on the true condition number of the matrix, on our estimate κmax(S), and on R's built-in estimate, while varying δ has a non-linear effect on all of these quantities. This is what the theoretical work in Chapter 2 suggests ought to be true, so these results are consistent with our earlier theory.
We performed all of the simulations used to generate the previous figures by varying δ; we did not vary ρ independently in this set of simulations. This is justified for two reasons. First, our theory suggests that ρ + δ determines our bound on the condition number of S (Result 3). Second, the simulations reported in Figures 5, 6, and 7 indicate experimentally that ρ and δ have similar effects on κ(S). We begin by making the theoretical argument that ρ and δ have identical effects on κmax(S).
Consider λ1(S) and λp(S) separately in our estimate κmax(S). Our upper bound for λ1(S) was pδ + pρ + (1 − ρ − δ), which is obviously affected equally by changes in ρ and δ. Likewise, our lower bound for λp(S) is 1 − ρ − δ, which displays the same property.
Now, we move on to describing how we tested the actual effects of ρ, δ on κ(S). We performed a
simulation varying the sum ρ + δ = c from c = 0.8 to c = 0.99 using a step size of .001, while allowing
δ ∼ U [0, c] and making ρ = c − δ. 100 matrices were generated from each value of ρ + δ ∈ [0.8, 0.99] with each
matrix having a different randomly generated combination of ρ and δ. It is important to stress that in these
simulations each of the 100 different matrices generated for a particular value of c had a different value of ρ, δ.
Note that in this simulation, ρ, δ are completely dependent random variables and δ ∼ U [0, c], ρ ∼ U [0, c].
Thus, ρ, δ were varied in such a way that ρ and δ were equally likely to be any value in the interval [0.0, c] for
each c we tested in the interval [.8, .99]. The true condition number of each of these 100 matrices was then
calculated, with the results for each value of c averaged and presented in Figure 5. Compare these results
to those presented in Figure 6. In Figure 6, we created 100 matrices for each δ from δ = 0 to δ = .19,
with a step size of .001, setting ρ = 0.8 for all the matrices. We then calculated the condition numbers of
the matrices which resulted from this method of choosing ρ, δ and averaged them for each value of δ.
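A sketch of how the ρ + δ = c simulation might be organized follows. This is an assumed structure rather than the original script; the matrix dimension p = 100 is our assumption, as the text does not state the dimension used for these runs.

```r
# Sketch of the rho + delta = c simulation (assumed structure)
c_values <- seq(0.80, 0.99, by = 0.001)
avg_kappa <- sapply(c_values, function(cc) {
  mean(replicate(100, {
    delta <- runif(1, 0, cc)        # delta ~ U[0, c]
    rho   <- cc - delta             # so rho + delta = c exactly
    S  <- make_S(p = 100, rho = rho, delta = delta)   # make_S() from the Chapter 2 sketch
    ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
    max(ev) / min(ev)               # true condition number
  }))
})
plot(c_values, avg_kappa, type = "l",
     xlab = "c = rho + delta", ylab = "average true condition number")
```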
Figure 5. Plot of the average true condition number for matrices constructed by increasing δ + ρ = c from c = .8 to c = .99 while letting δ, ρ be uniformly distributed random variables over [0.0, c].
Figure 6. Plot of the average true condition number for matrices constructed by varying δ from 0 to .19 while setting ρ = 0.8. Note that the values of c = ρ + δ still go from c = .8 to c = .99 in this set of simulations.
While Figures 5 and 6 appear similar on inspection, a more informative graph might be constructed
by directly comparing the condition numbers of the matrices generated by randomly varying ρ and δ and
monotonically increasing δ for constant ρ. We did this in Figure 7. The results of this graph are interesting,
because it appears that those matrices with random ρ and δ have, on average, lower condition numbers than
matrices with fixed ρ and monotonically increasing δ (most red dots are below most blue dots). This may
seem odd given that our upper bound for the condition numbers of both sets of matrices is the same.
Figure 7. The average condition numbers found from matrices with randomly varied ρ + δ
subject to the constraint ρ + δ = c (red) and the average condition numbers of matrices
generated from ρ = .8 and δ increasing monotonically (blue), plotted against the values of c used to generate each set of 100 matrices, for c ∈ [.8, .99].
However, the results of this simulation do not contradict our theory. In fact, a matrix with small δ and large ρ will tend to have a higher condition number than a matrix with small ρ and large δ. To see why this is so, consider that a matrix S generated with a large δ has more variable smallest and largest eigenvalues, while a matrix generated with a large ρ has less variable smallest and largest eigenvalues, and the latter will tend to have a condition number much closer to κmax(S). This follows from the fact that if ρ is large, then δ is small, so E is small in some sense and the spectrum of S will be approximately equal to the spectrum of Σ. If ρ is large and δ is small, then the smallest and largest eigenvalues of Σ will be close to those predicted by the bounds in κmax(S) for all such matrices S.
Postulate 1. Let ρ1, δ1 be the parameters used to generate the matrix S1 = Σ1 + E1; in particular, let ρ1, δ1 be chosen so that δ1, ρ1 ∼ U(0, 1) subject to the constraint that ρ1 + δ1 < 1. Further, let S2 = Σ2 + E2 with ρ2 large relative to δ2, so that ρ2 > E[ρ1] and δ2 ≤ E[δ1] on average. Then E[κ(S1)] < E[κ(S2)].
In light of Postulate 1 we re-examine the results we reported in Figure 7. Now, we clearly see that
the average κ(S1 ) realized in this data is indeed less than the average κ(S2 ), which is perfectly consistent
with our postulate and earlier theory.
2. How different are S and Σ?
As we discussed in the previous section, our algorithm can produce matrices with widely varying condition numbers; however, we have not yet answered the question of whether our algorithm is useful. In order to be useful, our algorithm needs to generate a random matrix S which is discernibly different from the deterministic starting matrix Σ. We looked at a number of measures to evaluate the performance of our algorithm in generating discernibly different matrices. In particular, we performed hypothesis tests on our simulated correlation matrices S using Box's M test of equality for covariance matrices [5]. Box's M test is of the hypotheses

H0: S = Σ0
H1: S ≠ Σ0,

where Σ0 = Σ + δIp with Σ, δ as previously defined and Ip the p × p identity matrix. Σ0 is a transformation of Σ into a correlation matrix because it now has a main diagonal of 1s:
$$\Sigma_0 = \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$
The test statistic for Box's M is
$$X_0^2 = (1 - C)\,W,$$
where
$$C = \frac{2p^2 + 3p - 1}{6(n-1)(p+1)}, \qquad W = (n-1)\left[\log|\Sigma_0| - \log|S| + \operatorname{tr}(S\Sigma_0^{-1}) - p\right],$$
and n is the number of observations on which our estimated correlation matrix S is based. This test statistic does not follow any conventional distribution; however, it can be modified so that the p-value of the test can be estimated by
$$\alpha_p = P(-2\rho\log\lambda \ge X_0^2) = P(\chi^2_f \ge X_0^2) + \omega\left[P(\chi^2_{f+4} \ge X_0^2) - P(\chi^2_f \ge X_0^2)\right]/(1-C)^2 + O\!\left((n-1)^{-3}\right),$$
with
$$\omega = \frac{p(2p^4 + 6p^3 + p^2 - 12p - 13)}{288\,(n-1)^2\,(p+1)} \qquad\text{and}\qquad f = \frac{p(p+1)}{2}.$$
Box's M test is a statistical test for correlation matrices. It is designed for testing parameter estimates gathered from multivariate normal data. However, our correlation matrices are not estimates; they are exactly equal to the idealized population parameters, and the number of observations used in αp could be taken to be infinite. Thus, it may seem slightly odd to use the test, because we could simply look at the two matrices Σ0 and S and see whether they are different. However, using Box's M test allows us to determine whether the simulated matrix S and the defined Σ0 are different enough from one another that it would be possible to tell them apart if S had been estimated from real data and Σ0 were the matrix of true but unknown population correlations. The value of the number of observations n can be thought of as indicating the size of the data set from which we would need to estimate S in order to be sure it was in fact different from Σ0, given the degree of difference between the entries of Σ0 and S.
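For reference, the statistic and the p-value approximation above can be coded directly. The following R sketch is our own illustration of the formulas as written, not the implementation used for the reported simulations.

```r
# Sketch of the test statistic and approximate p-value described above (illustrative)
box_test <- function(S, Sigma0, n) {
  p <- nrow(S)
  # W = (n - 1) [ log|Sigma0| - log|S| + tr(S Sigma0^{-1}) - p ]; both matrices assumed positive definite
  W <- (n - 1) * (as.numeric(determinant(Sigma0)$modulus) -
                  as.numeric(determinant(S)$modulus) +
                  sum(diag(S %*% solve(Sigma0))) - p)
  C <- (2 * p^2 + 3 * p - 1) / (6 * (n - 1) * (p + 1))
  X0sq  <- (1 - C) * W
  f     <- p * (p + 1) / 2
  omega <- p * (2*p^4 + 6*p^3 + p^2 - 12*p - 13) / (288 * (n - 1)^2 * (p + 1))
  pval  <- pchisq(X0sq, f, lower.tail = FALSE) +
           omega * (pchisq(X0sq, f + 4, lower.tail = FALSE) -
                    pchisq(X0sq, f, lower.tail = FALSE)) / (1 - C)^2
  c(statistic = X0sq, p.value = pval)
}
# Usage (with S and Sigma0 built as in the earlier sketches): box_test(S, Sigma0, n = 10)
```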
While Box's M is a reasonable metric to use in evaluating whether our algorithm produces meaningfully different matrices, there are a number of other measures we could use. In particular, given the method we used to generate S, a natural question is how different S and Σ are under various matrix norms. The two norms for which it is most natural to ask this question are the Frobenius norm, which is a generalization of Euclidean distance and is easily understood intuitively, and the operator norm ‖·‖op, which we implicitly used in order to construct S.
Definition 7 (Frobenius Norm). For an m × n matrix T with entries (aij), the Frobenius norm, denoted ‖T‖2, is
$$\|T\|_2 = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} |a_{ij}|^2}.$$
Definition 8. Let A be a p × p positive matrix. Then λ1(A) equals the operator norm of A, which we denote ‖A‖op.
The values of ‖S − Σ0‖2 and ‖S − Σ0‖op were graphed versus δ from δ = 0 to δ = .19 for ρ = .8, with step size .001 and n = 10. Based on these graphs, it appears that the p-value of Box's M is related, though not linearly, to both the Frobenius and operator norms. The p-value of Box's M seems to undergo a radical jump at some critical value of the matrix norm for both the Frobenius and operator norms.
Further, it appears that increasing δ linearly increases both the Frobenius and operator norm distances between S and Σ0.
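A minimal R sketch of these two distance calculations; the parameter values are illustrative and make_S() is the helper sketched in Chapter 2.

```r
# Frobenius and operator norm distances between S and Sigma0 (illustrative)
p <- 50; rho <- 0.8; delta <- 0.1
Sigma <- matrix(rho, p, p); diag(Sigma) <- 1 - delta
S <- make_S(p, rho, delta)            # make_S() from the Chapter 2 sketch
Sigma0 <- Sigma + delta * diag(p)     # Sigma_0 as defined above
c(Frobenius = norm(S - Sigma0, type = "F"),
  Operator  = norm(S - Sigma0, type = "2"))   # spectral (operator) norm
```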
Figure 8. The Frobenius norm distance between S, Σ0 (Frobenius.Norm.Distance), the
values of δ used to generate S (Delta.S) and the P-value of Box’s M test for the hypotheses
that S and Σ0 are different (Box.M.Test.pvalue)
Figure 9. The Modulus norm distance between S, Σ0 (Frobenius.Norm.Distance), the values of δ used to generate S (Delta.S) and the P-value of Box’s M test for the hypotheses
that S and Σ0 are different (Box.M.Test.pvalue)
CHAPTER 4
Extensions of our algorithm
We explored a number of natural modifications and extensions of the algorithm developed in the previous chapters, which allow us to (1) add Gram matrices constructed with a different δi for each row of E to Σ, (2) construct block diagonal correlation matrices, and (3) construct matrices with both positive and negative starting values of ρij.
1. Multiple δi
The simplest modification to the algorithm we have already explored is to use a different δi for each of the p rows of E. Choose some maximum δ, denoted δmax, subject to the constraint that ρ + δmax < 1. Now choose, or randomly generate, up to p different δi subject to the additional constraint that δi ≤ δmax. Now create a Gram matrix from vectors ei with ‖ei‖ = √δi. Denote the Gram matrix which results from this method of choosing the δi by Ed:
$$E_d = \begin{pmatrix} \delta_1 & \langle e_1, e_2\rangle & \cdots & \langle e_1, e_p\rangle \\ \langle e_2, e_1\rangle & \delta_2 & \cdots & \langle e_2, e_p\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle e_p, e_1\rangle & \langle e_p, e_2\rangle & \cdots & \delta_p \end{pmatrix}.$$
We now argue that Ed will have bounded off-diagonal entries.
Remark 6. The off diagonal entries of Ed are bounded. In particular, each off diagonal entry has
absolute value less than or equal to δmax .
Proof. Consider an arbitrary off-diagonal entry of Ed; this entry is ⟨ei, ej⟩ with i ≠ j. Applying Theorem 2, we find that
$$|\langle e_i, e_j\rangle| \le \sqrt{\delta_i}\,\sqrt{\delta_j} \le \delta_{\max}.$$
It is also necessary to slightly modify the matrix Σ for use with multiple δi; label the new matrix Σd:
$$\Sigma_d = \begin{pmatrix} 1-\delta_1 & \rho & \cdots & \rho \\ \rho & 1-\delta_2 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1-\delta_p \end{pmatrix}.$$
By an argument identical to the one used in Chapter 2, it is apparent that Sd = Σd + Ed is a correlation matrix because it satisfies the criteria of Remark 2. Further, the upper bound for the condition number described and derived earlier in Result 3 still holds with δmax in place of δ, so
$$\kappa(S_d) \le \frac{p\rho + p\delta_{\max} + (1 - \rho - \delta_{\max})}{1 - \rho - \delta_{\max}} = \kappa_{\max}(S_d),$$
except that now the inequality is sharp if and only if δi = δmax for all i. In other words, the inequality can be sharp only if the different δi are all equal to one another, so that we return to the case discussed at length in Chapter 2. The proof is trivial, so we do not present a formal argument; however, observe that if δi < δmax for at least one δi, then λp(Σd) > 1 − ρ − δmax while the lowest possible eigenvalue of Ed remains zero, and thus κ(Sd) < κmax(Sd). A brief R sketch of this variant appears below; we then move on to the block diagonal case.
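The sketch below extends the illustrative make_S helper from Chapter 2 to a vector of δi; the helper name and parameter choices are ours.

```r
# Sketch of the multiple-delta_i variant (assumed helper)
make_S_multi <- function(p, rho, deltas, m = p) {
  stopifnot(length(deltas) == p, rho + max(deltas) < 1)
  Sigma_d <- matrix(rho, p, p); diag(Sigma_d) <- 1 - deltas
  A <- matrix(rnorm(m * p), nrow = m, ncol = p)
  A <- sweep(A, 2, sqrt(colSums(A^2)), "/")   # unit-norm columns
  A <- sweep(A, 2, sqrt(deltas), "*")         # column i now has norm sqrt(delta_i)
  Sigma_d + crossprod(A)                      # S_d = Sigma_d + E_d
}

S_d <- make_S_multi(p = 6, rho = 0.7, deltas = runif(6, 0, 0.2))
diag(S_d)                                 # all equal to 1
min(eigen(S_d, symmetric = TRUE)$values)  # strictly positive
```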
2. Block Diagonal Correlation Matrices
The theory which motivates the construction of block diagonal matrices is very similar to the theory behind the construction of matrices with a different δi for each row. Suppose we want a correlation matrix T with dimension a × a, and we would like this matrix to have larger entries in the diagonal blocks than in the off-diagonal blocks. To do this, we first create a block diagonal matrix A and then perturb it with a Gram matrix G. First, consider the construction of A.
Let A be a block diagonal a × a matrix with N diagonal blocks, each of which is a Σ matrix identical to those we have constructed in earlier chapters. We introduce the following notation.

Definition 9. Let A be a block diagonal matrix with dimension a × a. Denote each diagonal block by Σt for 1 ≤ t ≤ N, and let each Σt ∈ Mbt×bt. We define Σt to have off-diagonal entries ρt and main diagonal entries 1 − δt. Also, define δmaxN = max1≤t≤N(δt) and ρmaxN = max1≤t≤N(ρt). We further require that
$$\delta_{\max_N} + \rho_{\max_N} < 1.$$
Thus, each Σt is a matrix with the structure
$$\Sigma_t = \begin{pmatrix} 1-\delta_t & \rho_t & \cdots & \rho_t \\ \rho_t & 1-\delta_t & \cdots & \rho_t \\ \vdots & \vdots & \ddots & \vdots \\ \rho_t & \rho_t & \cdots & 1-\delta_t \end{pmatrix},$$
and A has the structure
$$A = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma_N \end{pmatrix}.$$
Now, we create an a × a Gram matrix G which mirrors the block structure of A. In order to construct G, we choose the requisite a vectors in N sets of cardinality bt, where $\sum_{t=1}^{N} b_t = a$. To emphasize the similarity of the process which produces G to the process which produces E in the single block case, denote each of these sets by Et. Thus, we have
$$E_t = \{e_{t,1}, e_{t,2}, \ldots, e_{t,b_t}\}.$$
We require that ‖et,i‖ = √δt for each et,i ∈ Et. Note that et,i is a vector in an a-dimensional vector space. After choosing all N of these sets of vectors, we form a Gram matrix from $\bigcup_{t=1}^{N} E_t$. Denote this matrix G:
$$G = \begin{pmatrix} \delta_1 & \langle e_{1,1}, e_{1,2}\rangle & \cdots & \langle e_{1,1}, e_{N,b_N}\rangle \\ \langle e_{1,2}, e_{1,1}\rangle & \delta_1 & \cdots & \langle e_{1,2}, e_{N,b_N}\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle e_{N,b_N}, e_{1,1}\rangle & \langle e_{N,b_N}, e_{1,2}\rangle & \cdots & \delta_N \end{pmatrix}.$$
Now define T = A + G. T is a correlation matrix because it satisfies the criteria of Remark 2, and it will be computationally invertible because it is still subject to the argument leading to Result 3:
$$T = \begin{pmatrix} 1 & \rho_1 + \langle e_{1,1}, e_{1,2}\rangle & \cdots & \langle e_{1,1}, e_{N,b_N}\rangle \\ \rho_1 + \langle e_{1,2}, e_{1,1}\rangle & 1 & \cdots & \langle e_{1,2}, e_{N,b_N}\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle e_{N,b_N}, e_{1,1}\rangle & \langle e_{N,b_N}, e_{1,2}\rangle & \cdots & 1 \end{pmatrix}.$$
To illustrate using our algorithm to generate a block diagonal matrix, we provide an example.
Example 3. We will generate a correlation matrix T with dimension a × a = 4 × 4. Further, we want two main diagonal blocks, the first of which has dimension b1 × b1 = 2 × 2 and the second of which has dimension b2 × b2 = 2 × 2. Thus, we choose ρ1 and ρ2 together with δ1 and δ2 subject to the constraints

(1) δ1 + ρ1 < 1
(2) δ1 + ρ2 < 1
(3) δ2 + ρ1 < 1
(4) δ2 + ρ2 < 1

or, equivalently,
$$\max_{t=1,2} \delta_t + \max_{t=1,2} \rho_t = \delta_{\max_N} + \rho_{\max_N} < 1.$$
Thus we have
$$A = \begin{pmatrix} \Sigma_1 & 0_{2\times 2} \\ 0_{2\times 2} & \Sigma_2 \end{pmatrix} = \begin{pmatrix} 1-\delta_1 & \rho_1 & 0 & 0 \\ \rho_1 & 1-\delta_1 & 0 & 0 \\ 0 & 0 & 1-\delta_2 & \rho_2 \\ 0 & 0 & \rho_2 & 1-\delta_2 \end{pmatrix}.$$
Now choose 4 vectors in two sets, E1 = {e1,1, e1,2} and E2 = {e2,1, e2,2}, subject to the constraint that ‖e1,1‖ = ‖e1,2‖ = √δ1 and ‖e2,1‖ = ‖e2,2‖ = √δ2. The Gram matrix of E1 ∪ E2 is
$$G = \begin{pmatrix} \delta_1 & \langle e_{1,1}, e_{1,2}\rangle & \langle e_{1,1}, e_{2,1}\rangle & \langle e_{1,1}, e_{2,2}\rangle \\ \langle e_{1,2}, e_{1,1}\rangle & \delta_1 & \langle e_{1,2}, e_{2,1}\rangle & \langle e_{1,2}, e_{2,2}\rangle \\ \langle e_{2,1}, e_{1,1}\rangle & \langle e_{2,1}, e_{1,2}\rangle & \delta_2 & \langle e_{2,1}, e_{2,2}\rangle \\ \langle e_{2,2}, e_{1,1}\rangle & \langle e_{2,2}, e_{1,2}\rangle & \langle e_{2,2}, e_{2,1}\rangle & \delta_2 \end{pmatrix}.$$
Thus, we have T = A + G, where
$$T = \begin{pmatrix} 1 & \rho_1 + \langle e_{1,1}, e_{1,2}\rangle & \langle e_{1,1}, e_{2,1}\rangle & \langle e_{1,1}, e_{2,2}\rangle \\ \rho_1 + \langle e_{1,2}, e_{1,1}\rangle & 1 & \langle e_{1,2}, e_{2,1}\rangle & \langle e_{1,2}, e_{2,2}\rangle \\ \langle e_{2,1}, e_{1,1}\rangle & \langle e_{2,1}, e_{1,2}\rangle & 1 & \rho_2 + \langle e_{2,1}, e_{2,2}\rangle \\ \langle e_{2,2}, e_{1,1}\rangle & \langle e_{2,2}, e_{1,2}\rangle & \rho_2 + \langle e_{2,2}, e_{2,1}\rangle & 1 \end{pmatrix}.$$
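The 4 × 4 example can also be checked numerically. The parameter values below are illustrative, and the construction of the vectors is our own choice.

```r
# Sketch of the 4x4 block diagonal example (illustrative; two 2x2 blocks)
rho <- c(0.6, 0.5); delta <- c(0.2, 0.3)      # max(delta) + max(rho) = 0.9 < 1
A <- matrix(0, 4, 4)
A[1:2, 1:2] <- rho[1]; A[3:4, 3:4] <- rho[2]
diag(A) <- rep(1 - delta, each = 2)           # diagonal blocks are Sigma_1 and Sigma_2

V <- matrix(rnorm(4 * 4), 4, 4)               # four vectors in R^4, one per column
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")
V <- sweep(V, 2, rep(sqrt(delta), each = 2), "*")   # columns 1-2 have norm sqrt(delta_1), 3-4 sqrt(delta_2)
G <- crossprod(V)

T_mat <- A + G
diag(T_mat)                                 # all equal to 1
min(eigen(T_mat, symmetric = TRUE)$values)  # strictly positive
```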
3. Make some ρij for Σ negative
A final extension of our algorithm allows us to incorporate both positive and negative initial correlations. Let U be a p × p diagonal matrix with entries of +1 or −1 on the main diagonal. Note that U∗ = U−1, so UΣU has the same structure as Σ, but with +/− signs intermixed among the off-diagonal entries. Further, UΣU is unitarily equivalent to Σ, so the eigenvalues of UΣU are the same as those of Σ. Now let S = UΣU + G and observe that S will be a correlation matrix with some off-diagonal entries positive and some negative.
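A brief R sketch of this sign-flip extension; the Gram matrix is built as in the earlier illustrative sketches, and the parameter values are ours.

```r
# Sketch of the sign-flip extension (illustrative)
p <- 6; rho <- 0.7; delta <- 0.2
Sigma <- matrix(rho, p, p); diag(Sigma) <- 1 - delta
U <- diag(sample(c(-1, 1), p, replace = TRUE))   # random +/- 1 on the diagonal
A <- matrix(rnorm(p * p), p, p)
A <- sweep(A, 2, sqrt(colSums(A^2)), "/") * sqrt(delta)
G <- crossprod(A)
S <- U %*% Sigma %*% U + G       # U Sigma U has the same eigenvalues as Sigma
range(S[upper.tri(S)])           # off-diagonal entries of mixed sign, magnitude below 1
diag(S)                          # all equal to 1
```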
4. Some concluding remarks
We have described an algorithm to generate random correlation matrices S with bounded condition numbers, which allows us to control how computationally stable the matrix S will be. The algorithm we have presented here has a number of natural applications. One such application is sensitivity testing of calculations which rely on correlation structure: our algorithm adds small random perturbations to a hypothesized correlation matrix, after which any correlation-related quantity can be recalculated.
Further, the algorithm has potential applications in fields where it is desirable to generate data with a
variety of correlation structures. If one is interested in finding a way to randomly create multivariate normal
data with a random collection of correlation matrices, our algorithm could easily allow one to generate any
number of different correlation structures. Further refinements of the methods we have presented here could
randomly perturb any starting positive definite matrix with known eigenvalues while bounding the condition
number of the perturbed matrix.
Bibliography
[1] Rajendra Bhatia. Matrix Analysis. Springer, 1997.
[2] Nicholas J. Higham. Computing the nearest correlation matrix-a problem from finance. IMA Journal of Numerical Analysis,
22(3), 2002.
[3] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[4] Garry J. Tee. A simple example of an ill-conditioned matrix. ACM SIGNUM Newsletter, 7(3), 1972.
[5] Neil H. Timm. Applied Multivariate Analysis. Springer, 2002.
[6] Wikipedia. Covariance matrix, 2010.