Chapter 2

GAUSSIAN RANDOM VECTORS

2.1 Introduction
Gaussian random variables and Gaussian random vectors (vectors whose components are
jointly Gaussian, as defined later) play a central role in detection and estimation. Part
of the reason for this is that noise-like quantities in many applications are reasonably modeled as Gaussian. Another, perhaps more important, reason is that Gaussian random
variables turn out to be remarkably easy to work with (after an initial period of learning
their peculiarities). Jointly Gaussian random variables are completely described by their
means and covariances, which is part of the simplicity of working with them. When we find
estimates or detection rules, and when we evaluate their performance, the answers then
involve only those means and covariances.
A third reason why Gaussian random variables and vectors are so important is that we
shall find, in many cases, that the performance measures we get for estimation and detection problems in the Gaussian case often bound the performance for other random
variables with the same means and covariances. For example, we will find that the minimum mean square estimator for Gaussian problems is the same, and has the same mean
square performance, as the linear least squares estimator for other problems with the same
mean and covariance. We will also find that this estimator is quite simple and is linear
in the observations. Finally, we will find that the minimum mean square estimator for
non-Gaussian problems always has a better performance than that for Gaussian problems
of the same mean and covariance, but that the estimator is frequently much more complex.
The point of this example is that non-Gaussian problems are often more easily and more
deeply understood if we first understand the corresponding Gaussian problem.
In this chapter, we develop the most important properties of Gaussian random variables
and vectors, namely the moment generating function, the moments, the joint densities, and
the conditional probability densities. We also develop the properties of covariance matrices (which apply to both Gaussian and non-Gaussian random variables and vectors), and review a number of results about linear algebra that will be used in subsequent chapters.

Figure 2.1: Graph of the density of a normalized Gaussian rv (the taller curve) and of a zero mean Gaussian rv with variance 4 (the flatter curve).
2.2 Gaussian Random Variables
A random variable (rv) W is defined to be a normalized Gaussian rv if it has the density
\[
p_W(w) = \frac{1}{\sqrt{2\pi}}\exp\Bigl(\frac{-w^2}{2}\Bigr) \tag{2.1}
\]
Exercise 2.1 shows that pW (w) integrates to 1 (i.e., it is a probability density), and that W
has mean 0 and variance 1. If we scale W by a positive constant σ to get the rv Z = σW ,
then the density of Z at z = σw satisfies pZ (z)dz = pW (w)dw. Since dz/dw = σ, the
density of Z is
\[
p_Z(z) = \frac{1}{\sigma}\,p_W\!\Bigl(\frac{z}{\sigma}\Bigr) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(\frac{-z^2}{2\sigma^2}\Bigr) \tag{2.2}
\]
Thus the density function for Z is scaled horizontally by the factor σ, and then scaled
vertically by 1/σ (see Figure 2.1). This scaling leaves the integral of the density unchanged
with value 1 and scales the variance by σ 2 . If we let σ approach 0, this density approaches
an impulse, i.e., Z becomes the atomic random variable for which Pr(Z=0) = 1. For
convenience in what follows, we use (2.2) as the density for Z for all σ ≥ 0, with the above
understanding about the σ = 0 case. A rv with the density in (2.2), for any σ ≥ 0, is defined
to be a zero mean Gaussian rv. The values Pr(|Z| < σ) = .682 and Pr(|Z| < 3σ) = .997 give us a sense of how small the tails of the Gaussian distribution are.
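These numbers are easy to check numerically. The following short Python sketch (assuming numpy and scipy are available; the value σ = 2 is an arbitrary choice) verifies the scaling in (2.2) and the two tail probabilities quoted above:

    # A quick numerical check of the scaling property and the tail values quoted
    # above (a minimal sketch; assumes numpy and scipy are available).
    import numpy as np
    from scipy.stats import norm

    sigma = 2.0
    z = np.linspace(-8, 8, 2001)
    # Density of Z = sigma*W obtained by scaling the normalized density, eq. (2.2)
    pZ_scaled = norm.pdf(z / sigma) / sigma
    # Density of N(0, sigma^2) directly
    pZ_direct = norm.pdf(z, loc=0.0, scale=sigma)
    assert np.allclose(pZ_scaled, pZ_direct)

    # Tail probabilities quoted in the text (the same for any sigma > 0)
    print(norm.cdf(1) - norm.cdf(-1))   # Pr(|Z| < sigma)   ~ 0.6827
    print(norm.cdf(3) - norm.cdf(-3))   # Pr(|Z| < 3*sigma) ~ 0.9973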
If we shift Z to U = Z + m, then the density shifts so as to be centered at m, the mean
becomes m, and the density satisfies pU (u) = pZ (u − m), so that
\[
p_U(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Bigl(\frac{-(u-m)^2}{2\sigma^2}\Bigr) \tag{2.3}
\]
A random variable U with this density, for arbitrary m and σ ≥ 0, is defined to be a
Gaussian random variable and is denoted U ∼ N (m, σ 2 ).
The added generality of a mean often obscures formulas; we will usually work with zero
mean rv’s and random vectors (r~v’s) and insert the means later. That is, any random
variable can be regarded as a constant (the mean) plus a zero mean random variable (called
the fluctuation). When necessary for notation, we denote $U = m_U + \tilde{U}$, where $\tilde{U}$ is the fluctuation of U around its mean $m_U$, and work with $\tilde{U}$.
The moment generating function (MGF) of an arbitrary rv Z is defined to be gZ (s) =
E[exp(sZ)]. We usually take s to be a real variable, but it can also be regarded as complex,
with real part α and imaginary part jω. The two sided Laplace transform of the density of
Z is equal to gZ (−s). Similarly, the Fourier transform of the density of Z, as a function of ω,
is gZ (−jω) and the characteristic function of Z is gZ (jω). The characteristic function and
Fourier transform have the advantage that, for real ω, they exist for all random variables,
whereas the MGF and Laplace transform (for real s ≠ 0) exist only if the tails of the
distribution approach 0 at least exponentially. For the rv’s of interest here, the MGF exists
for at least a region of real s around 0, and thus, if one calculates one of these transforms,
say the MGF, the others follow simply by substituting −s, jω, or −jω for s.
For the Gaussian rv Z ∼ N (0, σ 2 ), gZ (s) can be calculated as follows:
\[
g_Z(s) = E[\exp(sZ)] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\exp(sz)\exp\Bigl[\frac{-z^2}{2\sigma^2}\Bigr]\,dz
\]
\[
= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\exp\Bigl[\frac{-z^2 + 2\sigma^2 sz - s^2\sigma^4}{2\sigma^2} + \frac{s^2\sigma^2}{2}\Bigr]\,dz \tag{2.4}
\]
\[
= \exp\Bigl[\frac{s^2\sigma^2}{2}\Bigr]\left\{\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\exp\Bigl[\frac{-(z - s\sigma^2)^2}{2\sigma^2}\Bigr]\,dz\right\} \tag{2.5}
\]
\[
= \exp\Bigl[\frac{s^2\sigma^2}{2}\Bigr] \tag{2.6}
\]
We completed the square in the exponent in (2.4) and then recognized, in (2.5), that the
term in braces is the integral of a probability density and thus equal to 1. If the MGF of
Z exists for a region of real s around 0 (as it does for Gaussian rv’s), it can be used to
calculate all the moments of Z. As shown in Exercise 2.2, E[Z 2k ], for Z ∼ N (0, σ 2 ), is
given, for all integers k ≥ 1, by
\[
E[Z^{2k}] = \frac{(2k)!\,\sigma^{2k}}{k!\,2^k} = (2k-1)(2k-3)(2k-5)\cdots(3)(1)\,\sigma^{2k} \tag{2.7}
\]
Thus, E[Z 4 ] = 3σ 4 , E[Z 6 ] = 15σ 6 , etc. Since z 2k+1 is an odd function of z and the
Gaussian density is an even function, the odd moments of Z are all zero. For an arbitrary
Gaussian rv $U \sim N(m, \sigma^2)$, we have $U = m + \tilde{U}$, where $\tilde{U} \sim N(0, \sigma^2)$. Thus $g_U(s) = E[\exp(s(m + \tilde{U}))] = e^{sm}E[\exp(s\tilde{U})] = e^{sm}\exp(s^2\sigma^2/2)$. It turns out that the distribution function of a random variable is uniquely determined¹ by its moment generating function (if the MGF exists in an interval around 0), so we see that a random variable U is Gaussian with mean m and variance $\sigma^2$ if and only if (iff) its moment generating function is
\[
g_U(s) = \exp\Bigl(sm + \frac{s^2\sigma^2}{2}\Bigr) \tag{2.8}
\]

¹More precisely, the distribution function is uniquely determined except on a set of points of measure zero.
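The MGF in (2.8) and the even moments in (2.7) can be checked by simulation. The Python sketch below (the sample size, seed, and the values of m, σ, and s are arbitrary choices) estimates E[exp(sU)], E[Z⁴], and E[Z⁶] by Monte Carlo and compares them with the formulas:

    # Monte Carlo check of the MGF (2.8) and the even moments (2.7) for a
    # Gaussian rv (a sketch; sample size and parameter values are arbitrary).
    import numpy as np

    rng = np.random.default_rng(0)
    m, sigma, s = 1.5, 2.0, 0.3
    U = m + sigma * rng.standard_normal(1_000_000)

    mgf_est = np.mean(np.exp(s * U))
    mgf_formula = np.exp(s * m + s**2 * sigma**2 / 2)   # eq. (2.8)
    print(mgf_est, mgf_formula)

    Z = U - m                                           # zero mean fluctuation
    print(np.mean(Z**4), 3 * sigma**4)                  # eq. (2.7), k = 2
    print(np.mean(Z**6), 15 * sigma**6)                 # eq. (2.7), k = 3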
2.3 Gaussian Random Vectors and MGFs
An n by m matrix A is an array of nm elements arranged in n rows and m columns; aij
denotes the j th element in the ith row. Unless specified to the contrary, the elements will
be real numbers. The transpose AT of an n by m matrix A is an m by n matrix B with
bji = aij for all i, j. A matrix is square if n = m and a square matrix A is symmetric
if A = AT . If A and B are each n by m matrices, A + B is an n by m matrix C with
$c_{ij} = a_{ij} + b_{ij}$ for all i, j. If A is n by m and B is m by r, the matrix AB is an n by r matrix C with elements $c_{ik} = \sum_j a_{ij}b_{jk}$. A vector (or column vector) of dimension n is an
n by 1 matrix and a row vector of dimension n is a 1 by n matrix. Since the transpose of a
vector is a row vector, we denote a vector ~a as (a1 , . . . , an )T . The reader is expected to be
familiar with vector and matrix manipulations.
An n-dimensional random vector (an n-r~v) is a mapping from the sample space into the space $\mathbb{R}^n$ of n-dimensional real vectors. We could view an n-r~v simply as n individual
random variables, but vector notation allows us to state results much more compactly for
vectors than for the set of individual rv’s. Sampled time stochastic processes can be viewed
simply as random vectors (although the dimension is often infinite), and continuous time
stochastic processes are usually best studied by various expansions that transform them to
r~v’s (although again often of infinite dimension). Thus, a thorough understanding of r~v’s is
essential to everything that follows in this subject.
The probability density, $p_{\vec Z}(\vec z)$, of an n-r~v $\vec Z = (Z_1, Z_2, \ldots, Z_n)^T$ is simply the joint probability density of the components $Z_1, \ldots, Z_n$. The mean of $\vec Z$, denoted $\vec m_{\vec Z}$ or $E[\vec Z]$, is the real vector $(m_{Z_1}, m_{Z_2}, \ldots, m_{Z_n})^T$ where $m_{Z_i} = E[Z_i]$ for $1 \le i \le n$. Similarly the covariance matrix of $\vec Z$, denoted $K_{\vec Z}$, is defined as $E\bigl[(\vec Z - \vec m_{\vec Z})(\vec Z - \vec m_{\vec Z})^T\bigr]$. This is an n by n symmetric matrix whose element in the ith row, jth column, is the covariance of $Z_i$ and $Z_j$, i.e., $E[(Z_i - m_i)(Z_j - m_j)]$. Finally, the moment generating function (MGF) of an n-r~v $\vec Z$ is defined as $g_{\vec Z}(\vec s) = E[\exp(\vec s^{\,T}\vec Z)]$ where $\vec s = (s_1, \ldots, s_n)^T$ is an n-dimensional vector. The components of $\vec s$ could be taken to be complex, but we usually view them as real. If the components of a r~v are independent and identically distributed (IID), we call the vector an IID r~v.
Example 2.1. An example that will become very familiar is that of an IID n-r~v $\vec W$ where the components $W_i$, $1 \le i \le n$, are each normalized Gaussian, $W_i \sim N(0,1)$. By taking the product of the n densities given by (2.1), the density of $\vec W = (W_1, W_2, \ldots, W_n)^T$ is
\[
p_{\vec W}(\vec w) = \frac{1}{(2\pi)^{n/2}}\exp\Bigl(\frac{-w_1^2 - w_2^2 - \cdots - w_n^2}{2}\Bigr) = \frac{1}{(2\pi)^{n/2}}\exp\Bigl(\frac{-\vec w^{\,T}\vec w}{2}\Bigr) \tag{2.9}
\]
Figure 2.2: Contours of equal probability density for two normalized IID Gaussian rv's.

The joint density of $\vec W$ at a sample value $\vec w$ depends only on the squared distance $\vec w^{\,T}\vec w$ of the sample value from the origin. That is, $p_{\vec W}(\vec w)$ is spherically symmetric around the origin, and points of equal probability density lie on concentric spheres around the origin (see Figure 2.2).
The moment generating function of $\vec W$ is easily calculated as follows:
\[
g_{\vec W}(\vec s) = E[\exp(\vec s^{\,T}\vec W)] = E[\exp(s_1 W_1 + \cdots + s_n W_n)] = E\Bigl[\prod_i \exp(s_i W_i)\Bigr]
\]
\[
= \prod_i E[\exp(s_i W_i)] = \prod_i \exp\Bigl(\frac{s_i^2}{2}\Bigr) = \exp\Bigl[\frac{\vec s^{\,T}\vec s}{2}\Bigr], \tag{2.10}
\]
where we have used, first, the fact that the independence of {W1 , . . . , Wn } guarantees the
independence of {exp(s1 W1 ), . . . , exp(sn Wn )}, next, the fact that the expected value of a
product of independent rv’s is equal to the product of the expected values, and, finally, the
fact that (2.6) gives the MGF of each Wi .
We now go on to define the general class of Gaussian r~v’s.
Definition: $\{Z_1, Z_2, \ldots, Z_n\}$ is a set of jointly Gaussian random variables, and $\vec Z = (Z_1, \ldots, Z_n)^T$ is a Gaussian r~v, if, for all real vectors $\vec s = (s_1, \ldots, s_n)^T$, the linear combination $\vec s^{\,T}\vec Z = s_1 Z_1 + s_2 Z_2 + \cdots + s_n Z_n$ is a Gaussian random variable.
The intuitive idea here is that Gaussian rv’s arise in practice because of the addition of large
numbers of small essentially independent rv’s (the central limit theorem indicates that such
a sum can be approximated by a Gaussian rv). For example, when a broad band noise
waveform is passed through a narrow band linear filter, the output at any given time is
usually well approximated as a Gaussian rv. A linear combination of outputs at different
times is also a sum of the same set of small, essentially independent, underlying rv’s, and
that sum can again be approximated as Gaussian. Thus we would expect a set of outputs
at different times to be a jointly Gaussian set according to the above definition.
Note that if $\{Z_1, \ldots, Z_n\}$ is jointly Gaussian, then for each i, $1 \le i \le n$, $Z_i$ is a Gaussian random variable (as follows by choosing $s_i = 1$ and $s_j = 0$ for $j \neq i$ in the definition above). In the same way, each subset of $\{Z_1, \ldots, Z_n\}$ is also jointly Gaussian. However, if $Z_1, Z_2, \ldots, Z_n$ are each Gaussian rv's, it does not necessarily follow that the set $\{Z_1, \ldots, Z_n\}$ is jointly Gaussian. As a simple example, let $Z_1 \sim N(0,1)$, let X be +1 or −1, each with probability 1/2, and let $Z_2 = Z_1 X$. Then $Z_2 \sim N(0,1)$. The joint probability density, $p_{Z_1 Z_2}(z_1, z_2)$, is then impulsive on the diagonals where $z_2 = \pm z_1$ and is zero elsewhere. Then $Z_1 + Z_2$ cannot be Gaussian, since it takes on the value 0 with probability one half². Exercise 2.3 gives another example. The distinction between individually Gaussian and jointly Gaussian rv's is much more than mathematical nit-picking, since the remarkable properties of Gaussian random vectors follow largely from the jointly Gaussian property rather than merely the property of being individually Gaussian.
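The example above is easy to reproduce by simulation. The following Python sketch (the sample size and seed are arbitrary) confirms that Z2 is N(0,1) while Z1 + Z2 has an atom at 0:

    # Simulation of the example above: Z1 and Z2 = Z1*X are each N(0,1), but the
    # pair is not jointly Gaussian; Z1 + Z2 is 0 with probability 1/2 (a sketch).
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    Z1 = rng.standard_normal(n)
    X = rng.choice([-1.0, 1.0], size=n)   # +1 or -1, each with probability 1/2
    Z2 = Z1 * X

    print(np.mean(Z2), np.var(Z2))        # ~0 and ~1: Z2 is N(0,1)
    print(np.mean(Z1 + Z2 == 0.0))        # ~0.5: the sum has an atom at 0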
We now find the moment generating function (MGF) of the r~v $\vec Z = (Z_1, \ldots, Z_n)^T$ under the assumption that $\vec Z$ is a zero mean Gaussian r~v. The MGF, by definition, is
\[
g_{\vec Z}(\vec s) = E\bigl[\exp(\vec s^{\,T}\vec Z)\bigr] = E\Bigl[\exp\Bigl(\sum_i s_i Z_i\Bigr)\Bigr]
\]
For any given $\vec s$, let $X = \vec s^{\,T}\vec Z$. Since $\vec Z$ is a Gaussian r~v (i.e., $(Z_1, \ldots, Z_n)$ is jointly Gaussian), X is also Gaussian, and since $\vec Z$ is zero mean (i.e., all components are zero mean), X is zero mean. From (2.6), $g_X(r) = \exp[r^2\sigma_X^2/2]$, so that $g_X(1) = \exp[\sigma_X^2/2]$. For $X = \vec s^{\,T}\vec Z$, then,
\[
g_{\vec Z}(\vec s) = E[\exp(\vec s^{\,T}\vec Z)] = E[\exp(X)] = g_X(1) = \exp(\sigma_X^2/2)
\]
Finally $\sigma_X^2 = E[X^2] = E[\vec s^{\,T}\vec Z\vec Z^{\,T}\vec s\,] = \vec s^{\,T}K_{\vec Z}\,\vec s$, where $K_{\vec Z}$ is the covariance matrix of $\vec Z$. Thus, the moment generating function of an arbitrary zero mean Gaussian r~v $\vec Z$ is
\[
g_{\vec Z}(\vec s) = \exp\Bigl[\frac{\vec s^{\,T}K_{\vec Z}\,\vec s}{2}\Bigr] \tag{2.11}
\]
Next let $\vec U = (U_1, \ldots, U_n)^T$ be an arbitrary Gaussian r~v and, for $1 \le i \le n$, let $m_i = E[U_i]$ and let $Z_i = U_i - m_i$ be the fluctuation of $U_i$. Letting $\vec m = (m_1, \ldots, m_n)^T$ and $\vec Z = (Z_1, \ldots, Z_n)^T$, we have $\vec U = \vec m + \vec Z$. Thus the MGF of $\vec U$ is given by $g_{\vec U}(\vec s) = E[\exp(\vec s^{\,T}\vec m + \vec s^{\,T}\vec Z)] = \exp(\vec s^{\,T}\vec m)\,g_{\vec Z}(\vec s)$. Using (2.11) and recognizing that $\vec Z$ and $\vec U$ have the same covariance matrix,
\[
g_{\vec U}(\vec s) = \exp\Bigl(\vec s^{\,T}\vec m + \frac{\vec s^{\,T}K_{\vec U}\,\vec s}{2}\Bigr) \tag{2.12}
\]
Note that the MGF of a Gaussian r~v is completely specified by its mean and covariance. Because of this, we denote a Gaussian r~v $\vec U$ of mean $\vec m$ and covariance $K_{\vec U}$ as $\vec U \sim N(\vec m, K_{\vec U})$. Note also that if a r~v $\vec U$ has the MGF in (2.12), then, as shown in Exercise 2.5, each linear combination of the components of $\vec U$ is a Gaussian rv. Thus we have the theorem,

Theorem 2.1. A r~v $\vec U$ with mean $\vec m$ and covariance $K_{\vec U}$ is a Gaussian r~v iff $g_{\vec U}(\vec s)$ is given by (2.12).
²One frequently hears the erroneous statement that uncorrelated Gaussian rv's are independent. This is false, as the above example shows. The correct statement, as we see later, is that uncorrelated jointly Gaussian rv's are independent.
Note that the MGFs in (2.6), (2.8), and (2.11) are special cases of (2.12). Also the MGF of the IID r~v $\vec W$ in (2.10) agrees with (2.12), since the mean of $\vec W$ is zero and the covariance is the identity matrix I. Thus $\vec W$, not surprisingly, is a Gaussian r~v, i.e., $\vec W \sim N(0, I)$.
2.4 Joint Probability Densities for Gaussian Random Vectors

In this section, we start with an example of a zero mean Gaussian r~v that is defined as a linear transformation of the IID normalized Gaussian r~v $\vec W$ of Example 2.1. We show later that this example in fact covers all zero mean Gaussian r~v's.
Example 2.2. Let A be an n by n real matrix, let $\vec W$ be an IID normalized Gaussian n-r~v, and let the n-r~v $\vec Z$ be given by $\vec Z = A\vec W$. The covariance matrix of $\vec Z$ is
\[
K_{\vec Z} = E[\vec Z\vec Z^{\,T}] = E[A\vec W\vec W^{\,T}A^T] = AA^T \tag{2.13}
\]
since $E[\vec W\vec W^{\,T}]$ is the identity matrix, $I_n$. We can easily find the MGF of $\vec Z$ since
\[
g_{\vec Z}(\vec s) = E[\exp(\vec s^{\,T}\vec Z)] = E[\exp(\vec s^{\,T}A\vec W)] = E[\exp\{(A^T\vec s)^T\vec W\}] = g_{\vec W}(A^T\vec s)
= \exp\Bigl[\frac{\vec s^{\,T}AA^T\vec s}{2}\Bigr] = \exp\Bigl[\frac{\vec s^{\,T}K_{\vec Z}\,\vec s}{2}\Bigr] \tag{2.14}
\]
Comparing (2.14) with (2.11), we see that $\vec Z$ is a zero mean Gaussian r~v. We shall see shortly that any zero mean Gaussian r~v can be represented in the form $\vec Z = A\vec W$ for some real n by n matrix A and IID normalized Gaussian r~v $\vec W$.
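Equation (2.13) can be illustrated numerically. In the Python sketch below (the 3 by 3 matrix A and the sample size are arbitrary choices), the sample covariance of Z = AW is compared with AAᵀ:

    # Empirical check of (2.13): the sample covariance of Z = A W approaches
    # A A^T (a sketch with an arbitrary 3x3 matrix A).
    import numpy as np

    rng = np.random.default_rng(2)
    A = np.array([[1.0, 0.5, 0.0],
                  [0.0, 2.0, 0.3],
                  [0.4, 0.0, 1.5]])
    n = 3
    W = rng.standard_normal((n, 500_000))   # each column is an N(0, I_n) sample
    Z = A @ W

    K_sample = (Z @ Z.T) / Z.shape[1]       # zero mean, so this estimates E[Z Z^T]
    print(np.round(K_sample, 2))
    print(A @ A.T)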
We next find the joint probability density of $\vec Z = A\vec W$. First consider the corresponding transformation of real valued vectors, $\vec z = A\vec w$. Let $\vec e_i$ be the ith unit vector (i.e., the vector whose ith component is 1 and whose other components are 0). Then $A\vec e_i = \vec a_i$, where $\vec a_i$ is the ith column of A. Thus, $\vec z = A\vec w$ transforms the unit vectors $\vec e_i$ into the columns $\vec a_i$ of A. For n = 2, Figure 2.3 shows how this transformation carries the unit square with the corners $\vec 0, \vec e_1, \vec e_2$, and $(\vec e_1 + \vec e_2)$ into the parallelogram with corners $\vec 0, \vec a_1, \vec a_2$, and $(\vec a_1 + \vec a_2)$. For an arbitrary number of dimensions, the unit cube in the $\vec w$ space is the set of points $\vec w$ such that $0 \le w_i \le 1$ for $i = 1, \ldots, n$. There are $2^n$ corners of the unit cube, and each is some 0/1 combination of the unit vectors, i.e., each has the form $\vec e_{i_1} + \vec e_{i_2} + \cdots + \vec e_{i_k}$. The transformation $A\vec w$ carries the unit cube into a parallelepiped, where each corner of the cube, $\vec e_{i_1} + \vec e_{i_2} + \cdots + \vec e_{i_k}$, is carried into a corresponding corner $\vec a_{i_1} + \vec a_{i_2} + \cdots + \vec a_{i_k}$ of the parallelepiped. One of the most interesting and geometrically meaningful properties of the determinant, det(A), of a square matrix A is that the magnitude of that determinant, $|\det(A)|$, is equal to the volume of that parallelepiped (see Strang, Linear Algebra, sec. 4.4). If det(A) = 0, i.e., if A is singular, then the n-dimensional unit cube in the $\vec w$ space
is transformed into a smaller dimensional parallelepiped whose volume (as a region of n-dimensional space) is 0. We assume in what follows that $\det(A) \neq 0$, so that the inverse of A exists.

Figure 2.3: Unit square transformed to a parallelogram, illustrated for $A = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}$.
Now let $\vec z$ be a sample value of $\vec Z$, and let $\vec w = A^{-1}\vec z$ be the corresponding sample value of $\vec W$. The joint density at $\vec z$ must satisfy
\[
p_{\vec Z}(\vec z)\,|d\vec z| = p_{\vec W}(\vec w)\,|d\vec w| \tag{2.15}
\]
where $|d\vec w|$ is the volume of an incremental cube with dimension $\delta = dw_i$ on each side, and $|d\vec z|$ is the volume of that incremental cube transformed by A. Thus $|d\vec w| = \delta^n$ and $|d\vec z| = \delta^n|\det(A)|$, so that $|d\vec z|/|d\vec w| = |\det(A)|$. Using this in (2.15), and using (2.9) for $p_{\vec W}(\vec w) = p_{\vec W}(A^{-1}\vec z)$, we see that the density of a jointly Gaussian vector $\vec Z = A\vec W$ is
\[
p_{\vec Z}(\vec z) = \frac{\exp\bigl[-\frac{1}{2}\vec z^{\,T}(A^{-1})^T A^{-1}\vec z\bigr]}{(2\pi)^{n/2}\,|\det(A)|} \tag{2.16}
\]
From (2.13), we have $K_{\vec Z} = AA^T$, so $K_{\vec Z}^{-1} = (A^{-1})^T A^{-1}$ and $\det(K_{\vec Z}) = \det(A)\det(A^T) = [\det(A)]^2 > 0$. Thus (2.16) becomes
\[
p_{\vec Z}(\vec z) = \frac{\exp\bigl[-\frac{1}{2}\vec z^{\,T}K_{\vec Z}^{-1}\vec z\bigr]}{(2\pi)^{n/2}\sqrt{\det(K_{\vec Z})}} \tag{2.17}
\]
Eq. (2.17) doesn't have any meaning when A, and thus $K_{\vec Z}$, is singular, since $K_{\vec Z}^{-1}$ does not then exist. In the case where A is singular, $A\vec w$ maps the set of n-dimensional vectors $\vec w$ into a subspace of dimension less than n, and $p_{\vec Z}(\vec z)$ is 0 outside of that subspace and is impulsive inside. What this means is that some components of the random vector $\vec Z$ can be expressed as linear combinations of other components. In this case, one can avoid very messy notation by simply defining the components that are linearly independent as forming a random vector $\vec Z'$, and representing the other components as linear combinations of the components of $\vec Z'$. With this stratagem, we can always take the covariance of any such reduced r~v to be nonsingular.
Next consider $\vec U = \vec m + A\vec W$. Define $\tilde{\vec U} = A\vec W$ as the fluctuation in $\vec U$, and note that (2.17) can be used for the density of $\tilde{\vec U}$. Since $E[\tilde{\vec U}] = 0$, we see that $E[\vec U] = \vec m$. Assuming
$\det(A) \neq 0$, we can immediately write down the density for $\vec U$ as a translation of that for $\tilde{\vec U}$,
\[
p_{\vec U}(\vec u) = \frac{\exp\bigl[-\frac{1}{2}(\vec u - \vec m)^T K_{\vec U}^{-1}(\vec u - \vec m)\bigr]}{(2\pi)^{n/2}\sqrt{\det(K_{\vec U})}} \tag{2.18}
\]
where $K_{\vec U} = E[\tilde{\vec U}\tilde{\vec U}^T]$ is the covariance matrix of both $\vec U$ and $\tilde{\vec U}$. We soon show that this is the general form of density for a Gaussian random vector $\vec U \sim N(\vec m, K_{\vec U})$ if $\det(K_{\vec U}) \neq 0$. In fact, we have already shown that for any Gaussian r~v $\vec U \sim N(\vec m, K_{\vec U})$, the MGF satisfies (2.12), and it turns out that this uniquely specifies the distribution of $\vec U$. Thus, if there is a matrix A with $\det(A) \neq 0$ such that $K_{\vec U} = AA^T$, then $\vec U \sim N(\vec m, K_{\vec U})$ can be expressed as $\vec m + A\vec W$, and (2.18) must be the density of $\vec U$. We shall soon see that any covariance matrix $K_{\vec U}$ can be expressed as $AA^T$ for some square matrix A. In most of what follows, we restrict our attention to zero mean Gaussian r~v's, since, as we have shown, it is usually simpler to deal with the fluctuation $\tilde{\vec U}$ of $\vec U$, and then include the mean later.
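As a spot check of (2.18), the following Python sketch (assuming scipy is available; the mean, covariance, and evaluation point are arbitrary) evaluates the formula directly and compares it with a library implementation of the Gaussian density:

    # Spot check of (2.18) against a library implementation (a sketch; assumes
    # scipy is available and uses an arbitrary mean and covariance).
    import numpy as np
    from scipy.stats import multivariate_normal

    m = np.array([1.0, -2.0])
    K = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
    u = np.array([0.5, -1.0])

    d = u - m
    p_formula = np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / (
        (2 * np.pi) ** (len(u) / 2) * np.sqrt(np.linalg.det(K)))
    print(p_formula, multivariate_normal(mean=m, cov=K).pdf(u))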
For the 2-dimensional zero mean case, let $E[Z_1^2] = \sigma_1^2$, $E[Z_2^2] = \sigma_2^2$ and $E[Z_1 Z_2] = k_{12}$. Define the normalized covariance, $\rho$, as $k_{12}/(\sigma_1\sigma_2)$. Then $\det(K_{\vec Z}) = \sigma_1^2\sigma_2^2 - k_{12}^2 = \sigma_1^2\sigma_2^2(1-\rho^2)$. For A to be non-singular, we need $\det(K_{\vec Z}) = [\det(A)]^2 > 0$, so we need $|\rho| < 1$. We then have
\[
K_{\vec Z}^{-1} = \frac{1}{\sigma_1^2\sigma_2^2 - k_{12}^2}\begin{bmatrix} \sigma_2^2 & -k_{12} \\ -k_{12} & \sigma_1^2 \end{bmatrix}
= \frac{1}{1-\rho^2}\begin{bmatrix} 1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2) \\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2 \end{bmatrix}
\]
\[
p_{\vec Z}(\vec z) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2 - k_{12}^2}}\exp\Bigl(\frac{-z_1^2\sigma_2^2 + 2z_1 z_2 k_{12} - z_2^2\sigma_1^2}{2(\sigma_1^2\sigma_2^2 - k_{12}^2)}\Bigr)
\]
\[
= \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\Bigl(\frac{-(z_1/\sigma_1)^2 + 2\rho(z_1/\sigma_1)(z_2/\sigma_2) - (z_2/\sigma_2)^2}{2(1-\rho^2)}\Bigr) \tag{2.19}
\]
This is why we use vector notation! There are two lessons in this. First, hand calculation
is frequently messy for Gaussian r~v’s, and second, the vector equations are much simpler,
so we must learn to reason directly from the vector equations and use standard computer
programs to do the calculations.
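To illustrate the point, the following Python sketch (with arbitrary values of σ1, σ2, and ρ) evaluates the density once through the scalar formula (2.19) and once through the matrix formula (2.17):

    # The scalar formula (2.19) and the matrix formula (2.17) evaluated at the
    # same point agree; a sketch with arbitrary parameters.
    import numpy as np

    s1, s2, rho = 1.0, 2.0, 0.6
    k12 = rho * s1 * s2
    K = np.array([[s1**2, k12], [k12, s2**2]])
    z = np.array([0.7, -1.2])

    # Matrix form, eq. (2.17)
    p_matrix = np.exp(-0.5 * z @ np.linalg.inv(K) @ z) / (
        2 * np.pi * np.sqrt(np.linalg.det(K)))

    # Scalar form, eq. (2.19)
    q = (-(z[0]/s1)**2 + 2*rho*(z[0]/s1)*(z[1]/s2) - (z[1]/s2)**2) / (2*(1 - rho**2))
    p_scalar = np.exp(q) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

    print(p_matrix, p_scalar)   # identical up to rounding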
2.5 Properties of Covariance Matrices

In this section, we summarize some properties of covariance matrices that will be used frequently in what follows. A matrix K is a covariance matrix if a zero mean r~v $\vec Z$ exists such that $K = E[\vec Z\vec Z^{\,T}]$. It is important to realize that the properties developed here apply to non-Gaussian as well as Gaussian r~v's. An n by n matrix K is positive semi-definite if it is symmetric and if, for all real n-vectors $\vec b$, $\vec b^{\,T}K\vec b \ge 0$. It is positive definite if, in addition, $\vec b^{\,T}K\vec b > 0$ for all non-zero $\vec b$. Our objective in this section is to list the relationships between these types of matrices, and to state some other frequently useful properties of covariance matrices.
1) Every covariance matrix K is positive semi-definite. To see this, let $\vec Z$ be a zero mean n-r~v such that $K = E[\vec Z\vec Z^{\,T}]$. K is symmetric since $E[Z_i Z_j] = E[Z_j Z_i]$ for all i, j. Let $\vec b$ be an arbitrary real n-vector, and let $X = \vec b^{\,T}\vec Z$. Then $0 \le E[X^2] = E\bigl[\vec b^{\,T}\vec Z\vec Z^{\,T}\vec b\bigr] = \vec b^{\,T}K\vec b$.

2) A covariance matrix K is positive definite iff $\det(K) \neq 0$. To see this, define $\vec Z$ as above and note that if $\vec b^{\,T}K\vec b = 0$ for some $\vec b \neq 0$, then $X = \vec b^{\,T}\vec Z$ has zero variance, and therefore is zero with probability 1. Thus $E[X\vec Z^{\,T}] = 0$, so $\vec b^{\,T}E[\vec Z\vec Z^{\,T}] = 0$. Since $\vec b \neq 0$ and $\vec b^{\,T}K = 0$, we must have $\det(K) = 0$. Conversely, if $\det(K) = 0$, there is some $\vec b$ such that $K\vec b = 0$, so $\vec b^{\,T}K\vec b$ is also 0.
3) A complex number $\lambda$ is an eigenvalue of a matrix K if $K\vec q = \lambda\vec q$ for some non-zero vector $\vec q$; the corresponding $\vec q$ is called an eigenvector. The following results about the eigenvalues and eigenvectors of positive definite (semi-definite) matrices K are standard linear algebra results (see, for example, Strang, section 5.5)³: All eigenvalues of K are positive (non-negative). All the eigenvectors can be taken to be real. All eigenvectors of different eigenvalues are orthogonal, and if an eigenvalue has multiplicity j, then it has j orthogonal eigenvectors. Altogether, n orthogonal eigenvectors can be chosen, and they can be scaled to be orthonormal.
4) If K is positive semi-definite, there is an orthonormal matrix Q whose columns, $\vec q_1, \ldots, \vec q_n$, are the orthonormal eigenvectors above. Q satisfies $KQ = Q\Lambda$ where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix whose ith element, $\lambda_i$, is the eigenvalue of K corresponding to the eigenvector $\vec q_i$. This is simply the vector version of the eigenvector/eigenvalue relationship in property 3. Q also satisfies $Q^T Q = I$ where I is the identity matrix. This follows since $\vec q_i^{\,T}\vec q_j = \delta_{ij}$. We then also have $Q^{-1} = Q^T$.
5) If K is positive semi-definite, and if Q and Λ are the matrices above, then $K = Q\Lambda Q^T$. If K is positive definite, then also $K^{-1} = Q\Lambda^{-1}Q^T$. The first equality follows from property 4, and the second follows by multiplying the expression for K by that for $K^{-1}$, which exists because all the eigenvalues are positive.

6) For a symmetric n by n matrix K, $\det K = \prod_{i=1}^n \lambda_i$ where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of K repeated according to their multiplicity. Thus if K is positive definite, $\det K > 0$ and if K is positive semi-definite, $\det K \ge 0$.
³Strang also shows that K is positive definite iff all the upper left square submatrices have positive determinants. This is often called Sylvester's test. This test does not extend to positive semi-definite matrices (for example, the matrix $\begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}$ is not positive semi-definite even though the upper left square submatrices have non-negative determinants). K is positive semi-definite iff all the principal submatrices (i.e., the submatrices resulting from dropping a subset of rows and the corresponding subset of columns) have non-negative determinants.
7) If K is a positive definite (semi-definite) matrix, then there is a unique positive definite (semi-definite) square root matrix R satisfying $R^2 = K$. In particular, R is given by
\[
R = Q\Lambda^{1/2}Q^T \quad\text{where}\quad \Lambda^{1/2} = \mathrm{diag}\bigl(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\bigr) \tag{2.20}
\]
8) If K is positive semi-definite, then K is a covariance matrix. In particular, K is the covariance matrix of $\vec Z = R\vec W$ where R is the square root matrix in (2.20) and $\vec W$ is an IID normalized Gaussian r~v.

9) For any real n by n matrix A, the matrix $K = AA^T$ is a covariance matrix. In particular, K is the covariance matrix of $\vec Z = A\vec W$.
For any given covariance matrix K, there are usually many choices for A satisfying K =
AAT . The square root matrix R above is simply a convenient choice. The most important
of the results in this section are summarized in the following theorem:
Theorem 2.2. A real n by n matrix K is a covariance matrix iff it is positive semi-definite.
Also it is a covariance matrix iff K = AAT for some real n by n matrix A. One choice for
A is the square root matrix R in (2.20).
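The square root matrix in (2.20) is easy to compute with an eigendecomposition. The Python sketch below (the matrix K is an arbitrary positive definite example) constructs R and checks that it is symmetric with R² = K:

    # Construction of the square root matrix R in (2.20) from the eigensystem of
    # K, and a check that R is symmetric with R^2 = K (a sketch; K is arbitrary).
    import numpy as np

    K = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
    lam, Q = np.linalg.eigh(K)            # eigenvalues and orthonormal eigenvectors
    R = Q @ np.diag(np.sqrt(lam)) @ Q.T   # R = Q Lambda^{1/2} Q^T

    print(np.allclose(R, R.T))            # R is symmetric
    print(np.allclose(R @ R, K))          # R^2 = K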
We now see, as summarized in the following corollary, that example 2.2 above is completely
general.
COROLLARY 1: For any covariance matrix K, a zero mean Gaussian r~v $\vec Z \sim N(0, K)$ exists and $\vec Z$ can be viewed as $A\vec W$, where $AA^T = K$ and $\vec W \sim N(0, I_n)$. The MGF of $\vec Z$ is given by (2.11), and, if $\det(K) \neq 0$, the probability density is given by (2.17). The density for an arbitrary Gaussian r~v $\vec U \sim N(\vec m, K)$, with $\det(K) \neq 0$, is given by (2.18).
2.6 Geometry and Principal Axes for Gaussian Densities

Let $\vec Z$ be an arbitrary zero mean Gaussian n-r~v with non-singular covariance K. For any given c > 0, the points $\vec z$ for which $\vec z^{\,T}K^{-1}\vec z = c$ form a contour of equal probability density for $\vec Z$, as seen by (2.17). This is a quadratic equation in $\vec z$ and is the equation for an ellipsoid centered on the origin. The principal axes of this ellipsoid are given by the eigenvectors of K. In order to understand this, let Q be an orthonormal matrix whose columns, $\vec q_1, \ldots, \vec q_n$, are the eigenvectors of K and let Λ be the corresponding diagonal matrix of eigenvalues (as in property 4 above). Consider the transformation $\vec V = Q^T\vec Z$. The covariance matrix of $\vec V$ is then
\[
K_{\vec V} = E[\vec V\vec V^{\,T}] = E[Q^T\vec Z\vec Z^{\,T}Q] = Q^T K Q = \Lambda, \tag{2.21}
\]
where the last step follows from $K = Q\Lambda Q^T$ and $Q^T = Q^{-1}$. Since $\vec Z$ is a zero mean Gaussian r~v, $\vec V$ is also, so it has the joint probability density
\[
p_{\vec V}(\vec v) = \frac{\exp\bigl[-\vec v^{\,T}K_{\vec V}^{-1}\vec v/2\bigr]}{(2\pi)^{n/2}[\det(K_{\vec V})]^{1/2}}
= \frac{\exp\bigl[-\sum_i v_i^2/(2\lambda_i)\bigr]}{(2\pi)^{n/2}\sqrt{\prod_i \lambda_i}}
= \prod_{i=1}^{n}\frac{\exp[-v_i^2/(2\lambda_i)]}{\sqrt{2\pi\lambda_i}} \tag{2.22}
\]
Figure 2.4: Representation of a point in two different co-ordinate systems. Point $\vec x$ is represented by $(z_1, z_2)^T$ in the $\vec e_1, \vec e_2$ system and by $(v_1, v_2)^T$ in the $\vec q_1, \vec q_2$ system.
The transformation $\vec V = Q^T\vec Z$ carries each sample value $\vec z$ for the r~v $\vec Z$ into the sample value $\vec v = Q^T\vec z$ of $\vec V$. Visualize this transformation as a change of basis, i.e., $\vec z$ represents some point $\vec x$ of n-dimensional real space in terms of a given co-ordinate system and $\vec v$ represents the same $\vec x$ in a different co-ordinate system (see Figure 2.4). In particular, let $\vec e_1, \vec e_2, \ldots, \vec e_n$ be the basis vectors in co-ordinate system 1, and let $\vec q_1, \vec q_2, \ldots, \vec q_n$ be the basis vectors in co-ordinate system 2. Then
\[
\vec x = \sum_{i=1}^{n} z_i\vec e_i = \sum_{i=1}^{n} v_i\vec q_i \tag{2.23}
\]
In co-ordinate system 1, ~ei is simply the ith unit vector, containing 1 in position i and 0
elsewhere. In this co-ordinate system, ~x is represented by the n-tuple (z1 , . . . , zn )T . Also,
in co-ordinate system 1, ~qi is the ith normalized eigenvector of the matrix K. Since the
orthonormal matrix Q has the columns ~q1 , . . . , ~qn , the right hand side of (2.23) can be
written, in co-ordinate system 1, as Q~v , where ~v is the n-tuple (v1 , . . . , vn )T . Thus, in
co-ordinate system 1, ~z = Q~v , so ~v = Q−1~z = QT ~z , verifying that ~v = QT ~z actually
corresponds to the indicated change of basis. Note that ~q1 , . . . , ~qn , are orthonormal in both
co-ordinate systems, and ~e1 , . . . , ~en are also orthonormal in both systems. This means that
the transformation can be viewed as a rotation, i.e., distances are unchanged (see Exercise
2.7).
For any given real number c > 0, the set of vectors $\vec z$ for which $\vec z^{\,T}K^{-1}\vec z = c$ forms a contour of equal probability density for $\vec Z$ (in co-ordinate system 1). The corresponding contour in co-ordinate system 2, with $\vec z = Q\vec v$, is given by $\vec v^{\,T}Q^T K^{-1}Q\vec v = c$. This can be rewritten as $\vec v^{\,T}\Lambda^{-1}\vec v = c$, or $\sum_i v_i^2/\lambda_i = c$ (as can also be seen from (2.22)). This is the equation of an ellipsoid, but the principal axes of the ellipsoid are now aligned with co-ordinate system 2, using basis vectors $\vec q_1, \ldots, \vec q_n$ (see Figure 2.5). The distances from the origin to the ellipsoid in these principal axis directions are $\sqrt{c\lambda_i}$, $1 \le i \le n$.
Figure 2.5: Lines of equal probability density form ellipses. The principal axes of
these ellipses are the eigenvectors of K. The intersection of an ellipse with each axis is
proportional to the square root of the corresponding eigenvalue.
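These geometric facts can be checked directly. The Python sketch below (with an arbitrary 2 by 2 covariance K and the arbitrary choice c = 1) computes the eigensystem of K, verifies (2.21), and prints the distances √(cλi) along the principal axes:

    # Principal axes of the equal-density ellipses: in the eigenvector co-ordinate
    # system, V = Q^T Z has the diagonal covariance Lambda (a sketch).
    import numpy as np

    K = np.array([[2.0, 1.2],
                  [1.2, 1.0]])
    lam, Q = np.linalg.eigh(K)

    K_V = Q.T @ K @ Q                 # eq. (2.21): should equal diag(lam)
    print(np.round(K_V, 10))
    print(lam)

    c = 1.0
    print(np.sqrt(c * lam))           # distances to the ellipse along each principal axis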
2.7 Conditional Probabilities

Next consider the conditional probability $p_{X|Y}(x|y)$ for two jointly Gaussian zero mean rv's X and Y with a non-singular covariance matrix. From (2.19),
\[
p_{X,Y}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\Bigl[\frac{-(x/\sigma_X)^2 + 2\rho(x/\sigma_X)(y/\sigma_Y) - (y/\sigma_Y)^2}{2(1-\rho^2)}\Bigr],
\]
where $\rho = E[XY]/(\sigma_X\sigma_Y)$. Since
\[
p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}
\]
and $Y \sim N(0, \sigma_Y^2)$, we have
\[
p_{X|Y}(x|y) = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\exp\Bigl[\frac{-(x/\sigma_X)^2 + 2\rho(x/\sigma_X)(y/\sigma_Y) - \rho^2(y/\sigma_Y)^2}{2(1-\rho^2)}\Bigr]
\]
This simplifies to
\[
p_{X|Y}(x|y) = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\exp\Bigl[\frac{-\bigl[x - \rho(\sigma_X/\sigma_Y)y\bigr]^2}{2\sigma_X^2(1-\rho^2)}\Bigr] \tag{2.24}
\]
This says that, given any particular sample value y for the rv Y, the conditional density of X is Gaussian with variance $\sigma_X^2(1-\rho^2)$ and mean $\rho(\sigma_X/\sigma_Y)y$. Given Y = y, we can view X as a random variable in the restricted sample space where Y = y, and in that restricted sample space, X is $N\bigl(\rho(\sigma_X/\sigma_Y)y,\ \sigma_X^2(1-\rho^2)\bigr)$. This means that the fluctuation of X, in this restricted space, has the same density for all y. When we study estimation, we shall find that the facts that, first, the conditional mean is linear in y, and, second, the fluctuation is independent of y, are crucially important. These simplifications will lead to many important properties and insights in what follows. We now go on to show that the same kind of simplification occurs when we study the conditional density of one Gaussian random vector conditional on another Gaussian random vector, assuming that all the variables are jointly Gaussian.
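The scalar result (2.24) can be checked by simulation. The Python sketch below (the sample size, bin width, and the values of σ_X, σ_Y, ρ, and y are arbitrary) conditions on samples with Y near a chosen y and compares the empirical conditional mean and variance with the formulas:

    # Simulation check of (2.24): among samples with Y near a chosen y, X has
    # conditional mean rho*(sigma_X/sigma_Y)*y and variance sigma_X^2*(1 - rho^2)
    # (a sketch; sample size and bin width are arbitrary).
    import numpy as np

    rng = np.random.default_rng(3)
    sx, sy, rho = 1.0, 2.0, 0.8
    K = np.array([[sx**2,         rho * sx * sy],
                  [rho * sx * sy, sy**2        ]])
    samples = rng.multivariate_normal([0.0, 0.0], K, size=1_000_000)
    X, Y = samples[:, 0], samples[:, 1]

    y0 = 1.5
    sel = np.abs(Y - y0) < 0.05                      # samples with Y close to y0
    print(np.mean(X[sel]), rho * (sx / sy) * y0)     # conditional mean
    print(np.var(X[sel]),  sx**2 * (1 - rho**2))     # conditional variance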
Let $\vec U$ be an n + m dimensional Gaussian r~v. View $\vec U$ as a pair of r~v's $\vec X$, $\vec Y$ of dimensions m and n respectively. That is, $\vec U = (U_1, U_2, \ldots, U_{n+m})^T = (X_1, X_2, \ldots, X_m, Y_1, \ldots, Y_n)^T = (\vec X^T, \vec Y^T)^T$. $\vec X$ and $\vec Y$ are called jointly Gaussian r~v's. If the covariance matrix of $\vec U$ is non-singular, we say that $\vec X$ and $\vec Y$ are jointly non-singular. In what follows, we assume that $\vec X$ and $\vec Y$ are jointly Gaussian, jointly non-singular, and zero-mean⁴. The covariance matrix $K_{\vec U}$ of $\vec U$ can be partitioned into m rows on top and n rows on bottom, and then further partitioned into m and n columns, yielding:
\[
K_{\vec U} = \begin{bmatrix} K_{\vec X} & K_{\vec X\vec Y} \\ K_{\vec X\vec Y}^T & K_{\vec Y} \end{bmatrix};
\qquad
K_{\vec U}^{-1} = \begin{bmatrix} B & C \\ C^T & D \end{bmatrix} \tag{2.25}
\]
Here $K_{\vec X} = E[\vec X\vec X^T]$, $K_{\vec X\vec Y} = E[\vec X\vec Y^T]$, and $K_{\vec Y} = E[\vec Y\vec Y^T]$. The blocks $K_{\vec X}$, $K_{\vec Y}$, B, and D are all non-singular (see Exercise 2.11). We will evaluate the blocks B, C, and D in terms of $K_{\vec X}$, $K_{\vec Y}$, and $K_{\vec X\vec Y}$ later, but first we find the conditional density, $p_{\vec X|\vec Y}(\vec x\,|\,\vec y)$, in terms of these blocks. What we shall find is that for any given $\vec y$, this is a Gaussian density
with a conditional covariance matrix equal to $B^{-1}$. As in (2.24), where X and Y are one-dimensional, this covariance does not depend on $\vec y$. Also, the conditional mean of $\vec X$, given $\vec Y = \vec y$, will turn out to be $-B^{-1}C\vec y$. Thus, $\vec X$, conditional on $\vec Y = \vec y$, is $N(-B^{-1}C\vec y,\ B^{-1})$, i.e.,
\[
p_{\vec X|\vec Y}(\vec x|\vec y) = \frac{\exp\bigl[-\frac{1}{2}(\vec x + B^{-1}C\vec y)^T B\,(\vec x + B^{-1}C\vec y)\bigr]}{(2\pi)^{m/2}\sqrt{\det(B^{-1})}} \tag{2.26}
\]
In order to see this, express $p_{\vec X|\vec Y}(\vec x|\vec y)$ as $p_{\vec X\vec Y}(\vec x, \vec y)/p_{\vec Y}(\vec y)$. From (2.17),
\[
p_{\vec X,\vec Y}(\vec x, \vec y) = \frac{\exp\bigl[-\frac{1}{2}(\vec x^T, \vec y^T)\,K_{\vec U}^{-1}\,(\vec x^T, \vec y^T)^T\bigr]}{(2\pi)^{(n+m)/2}\sqrt{\det(K_{\vec U})}}
= \frac{\exp\bigl[-\frac{1}{2}\bigl(\vec x^T B\vec x + \vec x^T C\vec y + \vec y^T C^T\vec x + \vec y^T D\vec y\bigr)\bigr]}{(2\pi)^{(n+m)/2}\sqrt{\det(K_{\vec U})}}
\]
We note that $\vec x$ only appears in the first three terms of the exponent above, and that $\vec x$ does not appear at all in $p_{\vec Y}(\vec y)$. Thus we can express the dependence on $\vec x$ in $p_{\vec X|\vec Y}(\vec x|\vec y)$ by
\[
p_{\vec X|\vec Y}(\vec x\,|\,\vec y) = f(\vec y)\exp\Bigl\{\frac{-\bigl[\vec x^T B\vec x + \vec x^T C\vec y + \vec y^T C^T\vec x\bigr]}{2}\Bigr\} \tag{2.27}
\]
where $f(\vec y)$ is some function of $\vec y$. We now complete the square around B in the exponent above, getting
\[
p_{\vec X|\vec Y}(\vec x\,|\,\vec y) = f(\vec y)\exp\Bigl[\frac{-(\vec x + B^{-1}C\vec y)^T B\,(\vec x + B^{-1}C\vec y) + \vec y^T C^T B^{-1}C\vec y}{2}\Bigr]
\]
Since the last term in the exponent does not depend on $\vec x$, we can absorb it into $f(\vec y)$. The remaining expression is in the form of (2.26). Since $p_{\vec X|\vec Y}(\vec x|\vec y)$ must be a probability density for each $\vec y$, the modified coefficient $f(\vec y)$ must be the reciprocal of the denominator in (2.26), so that $p_{\vec X|\vec Y}(\vec x|\vec y)$ is given by (2.26).

⁴Exercise 2.12 generalizes this to the case with an arbitrary mean.
To interpret (2.26), note that for any sample value $\vec y$ for $\vec Y$, the conditional distribution of $\vec X$ has a mean given by $-B^{-1}C\vec y$ and a Gaussian fluctuation around the mean of covariance $B^{-1}$. This fluctuation has the same distribution for all $\vec y$ and thus can be represented as a r~v $\vec V$ that is independent of $\vec Y$. Thus we can represent $\vec X$ as
\[
\vec X = G\vec Y + \vec V; \qquad \vec Y, \vec V \text{ independent} \tag{2.28}
\]
where
\[
G = -B^{-1}C \quad\text{and}\quad \vec V \sim N(0, B^{-1}) \tag{2.29}
\]
$\vec V$ is often called an innovation because it is the part of $\vec X$ that is independent of $\vec Y$. It is also called a noise term for the same reason. We will call $K_{\vec V} = B^{-1}$ the conditional covariance of $\vec X$ given any sample value $\vec y$ for $\vec Y$. In summary, the unconditional covariance, $K_{\vec X}$, of $\vec X$ is given by the upper left block of $K_{\vec U}$ in (2.25), while the conditional covariance $K_{\vec V}$ is the inverse of the upper left block, B, of the inverse of $K_{\vec U}$.
From (2.28), we can express G and $K_{\vec V}$ in terms of the covariances of $\vec X$ and $\vec Y$. To do this, note that
\[
K_{\vec X\vec Y} = E[\vec X\vec Y^T] = E[G\vec Y\vec Y^T + \vec V\vec Y^T] = G K_{\vec Y} \tag{2.30}
\]
\[
K_{\vec X} = E\bigl[(G\vec Y + \vec V)(G\vec Y + \vec V)^T\bigr] = G K_{\vec Y}G^T + K_{\vec V} \tag{2.31}
\]
Solving these equations, we get
\[
G = K_{\vec X\vec Y}K_{\vec Y}^{-1} \tag{2.32}
\]
\[
K_{\vec V} = K_{\vec X} - G K_{\vec Y}G^T = K_{\vec X} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T \tag{2.33}
\]
where we have substituted the solution from (2.32) into (2.33). We can summarize these results in the following theorem.
Theorem 2.3. Let $\vec X$ and $\vec Y$ be zero mean, jointly Gaussian, and jointly non-singular. Then they can be represented in the form $\vec X = G\vec Y + \vec V$ where $\vec Y$ and $\vec V$ are independent, $\vec V \sim N(0, K_{\vec V})$, G is given by (2.32), $K_{\vec V}$ is non-singular and given by (2.33), and the conditional density of $\vec X$ given $\vec Y$ is
\[
p_{\vec X|\vec Y}(\vec x|\vec y) = \frac{\exp\bigl[-\frac{1}{2}(\vec x - G\vec y)^T K_{\vec V}^{-1}(\vec x - G\vec y)\bigr]}{(2\pi)^{m/2}\sqrt{\det(K_{\vec V})}} \tag{2.34}
\]
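Theorem 2.3 can be verified numerically for a particular covariance matrix. The Python sketch below (the matrix used to build K_U is arbitrary) computes G and K_V from (2.32) and (2.33) and checks that they agree with −B⁻¹C and B⁻¹ taken from the blocks of K_U⁻¹:

    # Numerical check of Theorem 2.3 on an arbitrary example: G and K_V computed
    # from (2.32)-(2.33) match -B^{-1}C and B^{-1} from the blocks of the inverse
    # covariance (a sketch; block sizes and entries are arbitrary).
    import numpy as np

    m_dim, n_dim = 2, 3
    A = np.array([[1.0, 0.2, 0.0, 0.5, 0.1],
                  [0.3, 1.5, 0.4, 0.0, 0.2],
                  [0.0, 0.1, 1.2, 0.3, 0.0],
                  [0.2, 0.0, 0.5, 1.1, 0.4],
                  [0.1, 0.3, 0.0, 0.2, 0.9]])
    K_U = A @ A.T + 0.1 * np.eye(m_dim + n_dim)   # a non-singular covariance matrix
    K_X  = K_U[:m_dim, :m_dim]
    K_XY = K_U[:m_dim, m_dim:]
    K_Y  = K_U[m_dim:, m_dim:]

    K_U_inv = np.linalg.inv(K_U)
    B = K_U_inv[:m_dim, :m_dim]
    C = K_U_inv[:m_dim, m_dim:]

    G   = K_XY @ np.linalg.inv(K_Y)                    # eq. (2.32)
    K_V = K_X - K_XY @ np.linalg.inv(K_Y) @ K_XY.T     # eq. (2.33)

    print(np.allclose(G,   -np.linalg.inv(B) @ C))     # G   = -B^{-1} C
    print(np.allclose(K_V,  np.linalg.inv(B)))         # K_V =  B^{-1}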
We can also solve for the matrix $B = K_{\vec V}^{-1}$ from (2.33),
\[
B = \bigl[K_{\vec X} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T\bigr]^{-1} \tag{2.35}
\]
Similarly, from (2.29) and (2.32),
\[
C = -K_{\vec V}^{-1}G = -K_{\vec V}^{-1}K_{\vec X\vec Y}K_{\vec Y}^{-1} \tag{2.36}
\]
From the symmetry between $\vec X$ and $\vec Y$, we can repeat the above arguments, reversing the roles of $\vec X$ and $\vec Y$. Thus, we can represent $\vec Y$ as
\[
\vec Y = H\vec X + \vec Z; \qquad \vec X, \vec Z \text{ independent} \tag{2.37}
\]
where
\[
H = -D^{-1}C^T \quad\text{and}\quad \vec Z \sim N(0, D^{-1}) \tag{2.38}
\]
Using (2.37), we express H and $K_{\vec Z}$ in terms of the covariances of $\vec X$ and $\vec Y$:
\[
K_{\vec X\vec Y} = E[\vec X(H\vec X + \vec Z)^T] = K_{\vec X}H^T \tag{2.39}
\]
\[
K_{\vec Y} = E\bigl[(H\vec X + \vec Z)(H\vec X + \vec Z)^T\bigr] = H K_{\vec X}H^T + K_{\vec Z} \tag{2.40}
\]
Solving, we get
\[
H = K_{\vec X\vec Y}^T K_{\vec X}^{-1} \tag{2.41}
\]
\[
K_{\vec Z} = K_{\vec Y} - H K_{\vec X}H^T = K_{\vec Y} - K_{\vec X\vec Y}^T K_{\vec X}^{-1}K_{\vec X\vec Y} \tag{2.42}
\]
The conditional density can now be expressed as
\[
p_{\vec Y|\vec X}(\vec y|\vec x) = \frac{\exp\bigl[-\frac{1}{2}(\vec y - H\vec x)^T K_{\vec Z}^{-1}(\vec y - H\vec x)\bigr]}{(2\pi)^{n/2}\sqrt{\det(K_{\vec Z})}} \tag{2.43}
\]
where H and $K_{\vec Z}$ are given in (2.41) and (2.42).
Finally, from (2.42), $D = K_{\vec Z}^{-1}$ is
\[
D = \bigl[K_{\vec Y} - K_{\vec X\vec Y}^T K_{\vec X}^{-1}K_{\vec X\vec Y}\bigr]^{-1} \tag{2.44}
\]
Similarly, from (2.38) and (2.41),
\[
C^T = -K_{\vec Z}^{-1}H = -K_{\vec Z}^{-1}K_{\vec X\vec Y}^T K_{\vec X}^{-1} \tag{2.45}
\]
The matrices B, C, and D can also be derived directly from setting $K_{\vec U}K_{\vec U}^{-1} = I$ in (2.25) (see Exercise 2.16).
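The block formulas can also be checked against a direct matrix inverse. The Python sketch below (with an arbitrary positive definite K_U and block sizes m = 2, n = 3) compares (2.35), (2.44), and (2.45) with the corresponding blocks of K_U⁻¹:

    # Check of the block formulas (2.35), (2.44), and (2.45) against a direct
    # inverse of the partitioned covariance (a sketch with arbitrary numbers).
    import numpy as np

    rng = np.random.default_rng(4)
    m_dim, n_dim = 2, 3
    A = rng.standard_normal((m_dim + n_dim, m_dim + n_dim))
    K_U = A @ A.T + 0.5 * np.eye(m_dim + n_dim)      # positive definite
    K_X  = K_U[:m_dim, :m_dim]
    K_XY = K_U[:m_dim, m_dim:]
    K_Y  = K_U[m_dim:, m_dim:]

    inv = np.linalg.inv
    K_U_inv = inv(K_U)
    B_blk = K_U_inv[:m_dim, :m_dim]
    C_blk = K_U_inv[:m_dim, m_dim:]
    D_blk = K_U_inv[m_dim:, m_dim:]

    B   = inv(K_X - K_XY @ inv(K_Y) @ K_XY.T)        # eq. (2.35)
    D   = inv(K_Y - K_XY.T @ inv(K_X) @ K_XY)        # eq. (2.44)
    H   = K_XY.T @ inv(K_X)                          # eq. (2.41)
    K_Z = K_Y - H @ K_X @ H.T                        # eq. (2.42)
    C_T = -inv(K_Z) @ H                              # eq. (2.45)

    print(np.allclose(B, B_blk), np.allclose(D, D_blk), np.allclose(C_T, C_blk.T))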
In many estimation problems, we start with the formulation $\vec Y = H\vec X + \vec Z$ and want to find G and $K_{\vec V}$ in the dual formulation $\vec X = G\vec Y + \vec V$. We can use (2.32) and (2.33) directly, but the following expressions are frequently much more useful. From (2.29), we have $G = -K_{\vec V}C$, and from (2.38), we have $H = -K_{\vec Z}C^T$, or equivalently, $C = -H^T K_{\vec Z}^{-1}$. Combining, we have
\[
G = K_{\vec V}H^T K_{\vec Z}^{-1} \tag{2.46}
\]
Next, we can multiply $K_{\vec V}$, as given by (2.33), by $K_{\vec V}^{-1}$ to get
\[
I = K_{\vec X}K_{\vec V}^{-1} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T K_{\vec V}^{-1} \tag{2.47}
\]
From (2.36) and (2.45), we can express $C^T$ in the following two ways:
\[
C^T = -K_{\vec Y}^{-1}K_{\vec X\vec Y}^T K_{\vec V}^{-1} = -K_{\vec Z}^{-1}H \tag{2.48}
\]
The first of these expressions appears at the end of (2.47), and replacing it with the second, (2.47) becomes
\[
I = K_{\vec X}K_{\vec V}^{-1} - K_{\vec X\vec Y}K_{\vec Z}^{-1}H \tag{2.49}
\]
Pre-multiplying all terms by $K_{\vec X}^{-1}$ and substituting $H^T$ for $K_{\vec X}^{-1}K_{\vec X\vec Y}$, we get the final result,
\[
K_{\vec V}^{-1} = K_{\vec X}^{-1} + H^T K_{\vec Z}^{-1}H \tag{2.50}
\]
We will interpret these equations in Chapter 4 on estimation.
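The dual-formulation identities are easy to confirm numerically. The Python sketch below (the matrices H, K_X, and K_Z are arbitrary choices) builds K_Y and K_XY from the model Y = HX + Z and checks (2.46) and (2.50):

    # Check of the dual-formulation identities (2.46) and (2.50), starting from an
    # observation model Y = H X + Z (a sketch; H, K_X, K_Z are arbitrary).
    import numpy as np

    inv = np.linalg.inv
    K_X = np.array([[1.0, 0.3],
                    [0.3, 2.0]])                     # covariance of X (2x2)
    K_Z = np.array([[0.5, 0.0, 0.1],
                    [0.0, 0.4, 0.0],
                    [0.1, 0.0, 0.6]])                # covariance of Z (3x3)
    H = np.array([[1.0, 0.0],
                  [0.5, 1.0],
                  [0.0, 2.0]])                       # Y = H X + Z, so Y is 3-dimensional

    K_Y  = H @ K_X @ H.T + K_Z                       # eq. (2.40)
    K_XY = K_X @ H.T                                 # eq. (2.39)

    G   = K_XY @ inv(K_Y)                            # eq. (2.32)
    K_V = K_X - K_XY @ inv(K_Y) @ K_XY.T             # eq. (2.33)

    print(np.allclose(G, K_V @ H.T @ inv(K_Z)))                    # eq. (2.46)
    print(np.allclose(inv(K_V), inv(K_X) + H.T @ inv(K_Z) @ H))    # eq. (2.50)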
2.8 Summary

A zero mean n-r~v $\vec Z$ with a covariance matrix $K_{\vec Z}$ is a Gaussian n-r~v (equivalently, the n components are jointly Gaussian) if $\vec b^{\,T}\vec Z$ is Gaussian for all real n-vectors $\vec b$. An equivalent condition is that $\vec Z$ has the MGF in (2.11). Another equivalent condition is that $\vec Z$ is a linear transformation of n IID normalized Gaussian rv's. For the case in which $K_{\vec Z}$ is non-singular, a final equivalent condition is that $\vec Z$ has the density in (2.17). If $\vec Z$ has a singular covariance matrix, it should be viewed as a k-dimensional r~v, k < n, with a non-singular covariance matrix, plus n − k variables that are linear combinations of the first k. If $\vec Z$ has a mean $\vec m$, it is a Gaussian n-r~v iff the fluctuation $\vec Z - \vec m$ is a zero mean Gaussian n-r~v.

For a Gaussian n-r~v, the regions of equal probability density form concentric ellipsoids around the mean. The principal axes for these ellipsoids are the eigenvectors of the covariance matrix and the length of each axis is proportional to the square root of the corresponding eigenvalue.
If $X_1, X_2, \ldots, X_m, Y_1, \ldots, Y_n$ are zero mean, jointly Gaussian, and have a non-singular overall covariance matrix, then the conditional density $p_{\vec X|\vec Y}(\vec x\,|\,\vec y)$ is a Gaussian m-r~v for each $\vec y$; it has a fixed covariance $K_{\vec X} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T$ for each $\vec y$ and has a mean $K_{\vec X\vec Y}K_{\vec Y}^{-1}\vec y$, which depends linearly on $\vec y$. This situation can be equivalently formulated as $\vec X = G\vec Y + \vec V$, where $\vec V$ is a zero mean Gaussian m-r~v independent of $\vec Y$. Equivalently it can be formulated as $\vec Y = H\vec X + \vec Z$ where $\vec Z$ is a zero mean Gaussian n-r~v independent of $\vec X$.
2.9 Exercises

Exercise 2.1. a) Let X, Y be IID rv's, each with density $p_X(x) = \alpha\exp(-x^2/2)$. In part (b), we show that $\alpha$ must be $1/\sqrt{2\pi}$ in order for $p_X(x)$ to integrate to 1, but in this part, we leave $\alpha$ undetermined. Let $S = X^2 + Y^2$. Find the probability density of S in terms of $\alpha$.

b) Prove from part (a) that $\alpha$ must be $1/\sqrt{2\pi}$ in order for S, and thus X and Y, to be random variables. Show that E[X] = 0 and that $E[X^2] = 1$.

c) Find the probability density of $R = \sqrt{S}$. R is called a Rayleigh rv.
Exercise 2.2. a) By expanding in a power series in $(1/2)s^2\sigma^2$, show that
\[
\exp\Bigl(\frac{s^2\sigma^2}{2}\Bigr) = 1 + \frac{s^2\sigma^2}{2} + \frac{s^4\sigma^4}{2^2\,2!} + \cdots + \frac{s^{2k}\sigma^{2k}}{2^k\,k!} + \cdots
\]
b) By expanding $e^{sZ}$ in a power series in sZ, show that
\[
g_Z(s) = E[e^{sZ}] = 1 + sE[Z] + \frac{s^2 E[Z^2]}{2} + \cdots + \frac{s^i E[Z^i]}{(i)!} + \cdots
\]
c) By matching powers of s between (a) and (b), show that for all integer $k \ge 1$,
\[
E[Z^{2k}] = \frac{(2k)!\,\sigma^{2k}}{k!\,2^k} = (2k-1)(2k-3)\cdots(3)(1)\,\sigma^{2k}; \qquad E[Z^{2k+1}] = 0
\]
Exercise 2.3. Let X and Z be IID normalized Gaussian random variables. Let Y =
|Z| Sgn(X), where Sgn(X) is 1 if X ≥ 0 and −1 otherwise. Show that X and Y are each
Gaussian, but are not jointly Gaussian. Sketch the contours of equal joint probability
density.
Exercise 2.4. a) Let $X_1 \sim N(0, \sigma_1^2)$ and let $X_2 \sim N(0, \sigma_2^2)$ be independent of $X_1$. Convolve the density of $X_1$ with that of $X_2$ to show that $X_1 + X_2$ is Gaussian.
b) Combine part (a) with induction to show that all linear combinations of IID normalized
Gaussian rv’s are Gaussian.
Exercise 2.5. a) Let $\vec U$ be a r~v with mean $\vec m$ and covariance K whose MGF is given by (2.12). Let $X = \vec s^{\,T}\vec U$ for an arbitrary real vector $\vec s$. Show that the MGF of X is given by $g_X(r) = \exp\bigl[rE[X] + r^2\sigma_X^2/2\bigr]$ and relate E[X] and $\sigma_X^2$ to $\vec m$ and K.

b) Show that $\vec U$ is a Gaussian r~v.
Exercise 2.6. a) Let $\vec Z \sim N(0, K)$ be n-dimensional. By expanding in a power series in $(1/2)\vec s^{\,T}K\vec s$, show that
\[
g_{\vec Z}(\vec s) = \exp\Bigl[\frac{\vec s^{\,T}K\vec s}{2}\Bigr] = 1 + \frac{\sum_{i,j}s_i s_j[K]_{i,j}}{2} + \cdots + \frac{\bigl(\sum_{i,j}s_i s_j[K]_{i,j}\bigr)^m}{2^m m!} + \cdots
\]
b) By expanding $e^{s_i Z_i}$ in a power series in $s_i Z_i$ for each i, show that
\[
g_{\vec Z}(\vec s) = E\Bigl[\exp\Bigl(\sum_i s_i Z_i\Bigr)\Bigr] = \sum_{j_1=0}^{\infty}\cdots\sum_{j_n=0}^{\infty}\frac{s_1^{j_1}}{(j_1)!}\cdots\frac{s_n^{j_n}}{(j_n)!}\,E[Z_1^{j_1}\cdots Z_n^{j_n}]
\]
c) Let $D = \{i_1, i_2, \ldots, i_{2m}\}$ be a set of 2m distinct integers each between 1 and n. Consider the term $s_{i_1}s_{i_2}\cdots s_{i_{2m}}E[Z_{i_1}Z_{i_2}\cdots Z_{i_{2m}}]$ in part (b). By comparing with the set of terms in part (a) containing the same product $s_{i_1}s_{i_2}\cdots s_{i_{2m}}$, show that
\[
E[Z_{i_1}Z_{i_2}\cdots Z_{i_{2m}}] = \frac{\sum_{j_1 j_2 \ldots j_{2m}}[K]_{j_1 j_2}[K]_{j_3 j_4}\cdots[K]_{j_{2m-1}j_{2m}}}{2^m m!}
\]
where the sum is over all permutations $(j_1, j_2, \ldots, j_{2m})$ of the set D.

d) Find the number of permutations of D that contain the same set of unordered pairs $(\{j_1, j_2\}, \ldots, \{j_{2m-1}, j_{2m}\})$. For example, $(\{1,2\},\{3,4\})$ is the same set of unordered pairs as $(\{3,4\},\{2,1\})$. Show that
\[
E[Z_{i_1}Z_{i_2}\cdots Z_{i_{2m}}] = \sum_{j_1, j_2, \ldots, j_{2m}}[K]_{j_1 j_2}[K]_{j_3 j_4}\cdots[K]_{j_{2m-1}j_{2m}} \tag{2.51}
\]
where the sum is over distinct sets of unordered pairs of the set D. Note: another way to say the same thing is that the sum is over the set of all permutations of D for which $j_{2i-1} < j_{2i}$ for $1 \le i \le m$ and $j_{2i-1} < j_{2i+1}$ for $1 \le i \le m-1$.
e) To find $E[Z_1^{j_1}\cdots Z_n^{j_n}]$, where $j_1 + j_2 + \cdots + j_n = 2m$, construct the random variables $U_1, \ldots, U_{2m}$, where $U_1, \ldots, U_{j_1}$ are all identically equal to $Z_1$, where $U_{j_1+1}, \ldots, U_{j_1+j_2}$ are identically equal to $Z_2$, etc., and use (2.51) to find $E[U_1 U_2\cdots U_{2m}]$. Use this formula to find $E[Z_1^2 Z_2 Z_3]$, $E[Z_1^2 Z_2^2]$, and $E[Z_1^4]$.
Exercise 2.7. Let Q be an orthonormal matrix. Show that the squared distance between
any two vectors ~z and ~y is equal to the squared distance between Q~z and Q~y .
Exercise 2.8. a) Let $K = \begin{bmatrix} .75 & .25 \\ .25 & .75 \end{bmatrix}$. Show that 1 and 1/2 are eigenvalues of K and find the normalized eigenvectors. Express K as $Q\Lambda Q^{-1}$ where $\Lambda$ is diagonal and Q is orthonormal.

b) Let $K' = \alpha K$ for real $\alpha \neq 0$. Find the eigenvalues and eigenvectors of $K'$. Don't use brute force—think!

c) Consider the mth power of K, $K^m$ for m > 0. Find the eigenvalues and eigenvectors of $K^m$.
Exercise 2.9. Let X and Y be jointly Gaussian with means $m_X$, $m_Y$, variances $\sigma_X^2$, $\sigma_Y^2$, and normalized covariance $\rho$. Find the conditional density $p_{X|Y}(x\,|\,y)$.
Exercise 2.10. a) Let X and Y be zero mean jointly Gaussian with variances $\sigma_X^2$, $\sigma_Y^2$, and normalized covariance $\rho$. Let $V = Y^3$. Find the conditional density $p_{X|V}(x\,|\,v)$. Hint: This requires no computation.

b) Let $U = Y^2$ and find the conditional density $p_{X|U}(x\,|\,u)$. Hint: first understand why this is harder than part a).
Exercise 2.11. a) Let $\vec U^T = (\vec X^T, \vec Y^T)$ have a non-singular covariance matrix $K_{\vec U}$. Show that $K_{\vec X}$ and $K_{\vec Y}$ are positive definite, and thus non-singular.

b) Show that the matrices B and D in (2.25) are also positive definite and thus non-singular.
Exercise 2.12. Let $\vec X$ and $\vec Y$ be jointly Gaussian r~v's with means $\vec m_{\vec X}$ and $\vec m_{\vec Y}$, covariance matrices $K_{\vec X}$ and $K_{\vec Y}$, and cross covariance matrix $K_{\vec X\vec Y}$. Find the conditional probability density $p_{\vec X|\vec Y}(\vec x\,|\,\vec y)$. Assume that the covariance of $(\vec X^T, \vec Y^T)$ is non-singular. Hint: think of the fluctuations of $\vec X$ and $\vec Y$.
Exercise 2.13. a) Let $\vec W$ be a normalized IID Gaussian n-r~v and let $\vec Y$ be a Gaussian m-r~v. Suppose we would like the cross covariance $E[\vec W\vec Y^T]$ to be some arbitrary real valued n by m matrix K. Find the matrix A such that $\vec Y = A\vec W$ achieves the desired cross covariance. Note: this shows that any real valued n by m matrix is the cross covariance matrix for some choice of random vectors.

b) Let $\vec Z$ be a zero mean Gaussian n-r~v with non-singular covariance $K_{\vec Z}$, and let $\vec Y$ be a Gaussian m-r~v. Suppose we would like the cross covariance $E[\vec Z\vec Y^T]$ to be some arbitrary real valued n by m matrix $K'$. Find the matrix B such that $\vec Y = B\vec Z$ achieves the desired cross covariance. Note: this shows that any real valued n by m matrix is the cross covariance matrix for some choice of random vectors $\vec Z$ and $\vec Y$ where $K_{\vec Z}$ is given (and non-singular).

c) Now assume that $\vec Z$ has a singular covariance matrix in part b). Explain the constraints this places on possible choices for the cross covariance $E[\vec Z\vec Y^T]$. Hint: your solution should involve the eigenvectors of $K_{\vec Z}$.
Exercise 2.14. a) Let $\vec W = (W_1, W_2, \ldots, W_{2n})^T$ be a 2n dimensional IID normalized Gaussian r~v. Let $S_{2n} = W_1^2 + W_2^2 + \cdots + W_{2n}^2$. Show that $S_{2n}$ is an nth order Erlang rv with parameter 1/2, i.e., that $p_{S_{2n}}(s) = 2^{-n}s^{n-1}e^{-s/2}/(n-1)!$. Hint: look at $S_2$ from Exercise 2.1.

b) Let $R_{2n} = \sqrt{S_{2n}}$. Find the probability density of $R_{2n}$.

c) Let $v_{2n}(r)$ be the volume of a 2n dimensional sphere of radius r and let $b_{2n}(r)$ be the surface area of that sphere, i.e., $b_{2n}(r) = dv_{2n}(r)/dr$. The point of this exercise is to show how to calculate these quantities. By considering an infinitesimally thin spherical shell of thickness $\delta$ at radius r, show that
\[
p_{R_{2n}}(r) = b_{2n}(r)\,p_{\vec W}(\vec w)\Bigl|_{\vec w\,:\,\vec w^T\vec w = r^2}
\]
d) Calculate $b_{2n}(r)$ and $v_{2n}(r)$. Note that for any fixed $\delta \ll r$, the ratio of the volume within $\delta$ of the surface of a sphere of radius r to the total volume of the sphere approaches 1 with increasing n.
Exercise 2.15. a) Let $\vec U = (X, Y)^T$. Solve directly for B, C, and D in (2.25) for this case, and show that (2.26) agrees with (2.24).

b) Show that your solution for B, C, and D agrees with (2.35), (2.36), and (2.44).
Exercise 2.16. Express B, C, and D in terms of $K_{\vec X}$, $K_{\vec Y}$, and $K_{\vec X\vec Y}$ by multiplying the block expression for $K_{\vec U}$ by that for $K_{\vec U}^{-1}$. Show that your answer agrees with that in (2.35), (2.36), and (2.44).