Chapter 2

GAUSSIAN RANDOM VECTORS

2.1 Introduction

Gaussian random variables and Gaussian random vectors (vectors whose components are jointly Gaussian, as defined later) play a central role in detection and estimation. Part of the reason for this is that noise-like quantities in many applications are reasonably modeled as Gaussian. Another, perhaps more important, reason is that Gaussian random variables turn out to be remarkably easy to work with (after an initial period of learning their peculiarities). Jointly Gaussian random variables are completely described by their means and covariances, which is part of the simplicity of working with them. When we find estimates or detection rules, and when we evaluate their performance, the answers involve only those means and covariances.

A third reason why Gaussian random variables and vectors are so important is that, in many cases, the performance measures we get for estimation and detection problems in the Gaussian case bound the performance for other random variables with the same means and covariances. For example, we will find that the minimum mean square estimator for Gaussian problems is the same, and has the same mean square performance, as the linear least squares estimator for other problems with the same mean and covariance. We will also find that this estimator is quite simple and is linear in the observations. Finally, we will find that the minimum mean square estimator for non-Gaussian problems always performs at least as well as that for Gaussian problems with the same mean and covariance, but that the estimator is frequently much more complex. The point of this example is that non-Gaussian problems are often more easily and more deeply understood if we first understand the corresponding Gaussian problem.

In this chapter, we develop the most important properties of Gaussian random variables and vectors, namely the moment generating function, the moments, the joint densities, and the conditional probability densities. We also develop the properties of covariance matrices (which apply to both Gaussian and non-Gaussian random variables and vectors), and review a number of results about linear algebra that will be used in subsequent chapters.

2.2 Gaussian Random Variables

A random variable (rv) W is defined to be a normalized Gaussian rv if it has the density

$$ p_W(w) = \frac{1}{\sqrt{2\pi}}\exp\!\left(\frac{-w^2}{2}\right) \tag{2.1} $$

Exercise 2.1 shows that $p_W(w)$ integrates to 1 (i.e., it is a probability density), and that W has mean 0 and variance 1.

If we scale W by a positive constant σ to get the rv Z = σW, then the density of Z at z = σw satisfies $p_Z(z)\,dz = p_W(w)\,dw$. Since dz/dw = σ, the density of Z is

$$ p_Z(z) = \frac{1}{\sigma}\,p_W\!\left(\frac{z}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(\frac{-z^2}{2\sigma^2}\right) \tag{2.2} $$

Thus the density function for Z is scaled horizontally by the factor σ, and then scaled vertically by 1/σ (see Figure 2.1). This scaling leaves the integral of the density unchanged with value 1 and scales the variance by σ².

[Figure 2.1: Graph of the density of a normalized Gaussian rv (the taller curve) and of a zero mean Gaussian rv with variance 4 (the flatter curve).]

If we let σ approach 0, this density approaches an impulse, i.e., Z becomes the atomic random variable for which Pr(Z = 0) = 1. For convenience in what follows, we use (2.2) as the density for Z for all σ ≥ 0, with the above understanding about the σ = 0 case. A rv with the density in (2.2), for any σ ≥ 0, is defined to be a zero mean Gaussian rv.
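The scaling argument behind (2.2) is easy to check numerically. Below is a minimal sketch (assuming numpy and scipy are available; the value σ = 2 and the grid of test points are arbitrary illustrative choices, not from the text) that evaluates (1/σ)p_W(z/σ) and compares it with a standard library implementation of the N(0, σ²) density.

```python
# Check that scaling the normalized density as in (2.2) gives the N(0, sigma^2) density.
import numpy as np
from scipy.stats import norm

sigma = 2.0                                         # arbitrary positive scale
z = np.linspace(-6, 6, 7)                           # arbitrary test points

p_W = lambda w: np.exp(-w**2 / 2) / np.sqrt(2 * np.pi)   # normalized density (2.1)
p_Z = p_W(z / sigma) / sigma                              # scaled density (2.2)

assert np.allclose(p_Z, norm.pdf(z, scale=sigma))         # agrees with the standard pdf
print(p_Z)
```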
The values Pr(|Z| < σ) = .682 and Pr(|Z| < 3σ) = .997 give us a sense of how small the tails of the Gaussian distribution are.

If we shift Z to U = Z + m, then the density shifts so as to be centered at m, the mean becomes m, and the density satisfies $p_U(u) = p_Z(u - m)$, so that

$$ p_U(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(\frac{-(u-m)^2}{2\sigma^2}\right) \tag{2.3} $$

A random variable U with this density, for arbitrary m and σ ≥ 0, is defined to be a Gaussian random variable and is denoted U ∼ N(m, σ²).

The added generality of a mean often obscures formulas; we will usually work with zero mean random variables and random vectors and insert the means later. That is, any random variable can be regarded as a constant (the mean) plus a zero mean random variable (called the fluctuation). When necessary for notation, we denote $U = m_U + \tilde U$, where $\tilde U$ is the fluctuation of U around its mean $m_U$, and work with $\tilde U$.

The moment generating function (MGF) of an arbitrary rv Z is defined to be $g_Z(s) = E[\exp(sZ)]$. We usually take s to be a real variable, but it can also be regarded as complex, with real part α and imaginary part jω. The two-sided Laplace transform of the density of Z is equal to $g_Z(-s)$. Similarly, the Fourier transform of the density of Z, as a function of ω, is $g_Z(-j\omega)$, and the characteristic function of Z is $g_Z(j\omega)$. The characteristic function and Fourier transform have the advantage that, for real ω, they exist for all random variables, whereas the MGF and Laplace transform (for real s ≠ 0) exist only if the tails of the distribution approach 0 at least exponentially. For the rv's of interest here, the MGF exists for at least a region of real s around 0, and thus, if one calculates one of these transforms, say the MGF, the others follow simply by substituting −s, jω, or −jω for s.

For the Gaussian rv Z ∼ N(0, σ²), $g_Z(s)$ can be calculated as follows:

$$ g_Z(s) = E[\exp(sZ)] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\exp(sz)\,\exp\!\left(\frac{-z^2}{2\sigma^2}\right)dz $$

$$ = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\exp\!\left(\frac{-z^2 + 2\sigma^2 sz - s^2\sigma^4}{2\sigma^2} + \frac{s^2\sigma^2}{2}\right)dz \tag{2.4} $$

$$ = \exp\!\left(\frac{s^2\sigma^2}{2}\right)\left\{\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty}\exp\!\left(\frac{-(z - s\sigma^2)^2}{2\sigma^2}\right)dz\right\} \tag{2.5} $$

$$ = \exp\!\left(\frac{s^2\sigma^2}{2}\right) \tag{2.6} $$

We completed the square in the exponent in (2.4) and then recognized, in (2.5), that the term in braces is the integral of a probability density and thus equal to 1.

If the MGF of Z exists for a region of real s around 0 (as it does for Gaussian rv's), it can be used to calculate all the moments of Z. As shown in Exercise 2.2, $E[Z^{2k}]$, for Z ∼ N(0, σ²), is given, for all integers k ≥ 1, by

$$ E[Z^{2k}] = \frac{(2k)!\,\sigma^{2k}}{k!\,2^k} = (2k-1)(2k-3)(2k-5)\cdots(3)(1)\,\sigma^{2k} \tag{2.7} $$

Thus, $E[Z^4] = 3\sigma^4$, $E[Z^6] = 15\sigma^6$, etc. Since $z^{2k+1}$ is an odd function of z and the Gaussian density is an even function, the odd moments of Z are all zero.
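A quick numerical sanity check of the even-moment formula (2.7) is sketched below, assuming scipy is available; the value of σ and the range of k are arbitrary illustrative choices, not from the text.

```python
# Compare (2k-1)(2k-3)...(1) * sigma^{2k} from (2.7) with scipy's numerically computed moments.
import math
from scipy.stats import norm

sigma = 1.7                                       # arbitrary standard deviation
for k in range(1, 5):
    double_factorial = math.prod(range(1, 2 * k, 2))   # (2k-1)(2k-3)...(3)(1)
    formula = double_factorial * sigma ** (2 * k)       # right side of (2.7)
    numeric = norm(scale=sigma).moment(2 * k)           # E[Z^{2k}] for Z ~ N(0, sigma^2)
    print(k, formula, numeric)                          # the two columns agree
```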
For an arbitrary Gaussian rv U ∼ N(m, σ²), we have $U = m + \tilde U$, where $\tilde U \sim N(0, \sigma^2)$. Thus $g_U(s) = E[\exp(s(m + \tilde U))] = e^{sm}E[\exp(s\tilde U)] = e^{sm}\exp(s^2\sigma^2/2)$. It turns out that the distribution function of a random variable is uniquely determined by its moment generating function (if the MGF exists in an interval around 0) [Footnote: More precisely, the distribution function is uniquely determined except on a set of points of measure zero.], so we see that a random variable U is Gaussian with mean m and variance σ² if and only if (iff) its moment generating function is

$$ g_U(s) = \exp\!\left(sm + \frac{s^2\sigma^2}{2}\right) \tag{2.8} $$

2.3 Gaussian Random Vectors and MGF's

An n by m matrix A is an array of nm elements arranged in n rows and m columns; $a_{ij}$ denotes the jth element in the ith row. Unless specified to the contrary, the elements will be real numbers. The transpose $A^T$ of an n by m matrix A is an m by n matrix B with $b_{ji} = a_{ij}$ for all i, j. A matrix is square if n = m, and a square matrix A is symmetric if $A = A^T$. If A and B are each n by m matrices, A + B is an n by m matrix C with $c_{ij} = a_{ij} + b_{ij}$ for all i, j. If A is n by m and B is m by r, the matrix AB is an n by r matrix C with elements $c_{ik} = \sum_j a_{ij}b_{jk}$. A vector (or column vector) of dimension n is an n by 1 matrix and a row vector of dimension n is a 1 by n matrix. Since the transpose of a vector is a row vector, we denote a vector $\vec a$ as $(a_1, \ldots, a_n)^T$. The reader is expected to be familiar with vector and matrix manipulations.

An n-dimensional random vector is a mapping from the sample space into the space $\mathbb{R}^n$ of n-dimensional real vectors. We could view an n-dimensional random vector simply as n individual random variables, but vector notation allows us to state results much more compactly for vectors than for the set of individual rv's. Sampled-time stochastic processes can be viewed simply as random vectors (although the dimension is often infinite), and continuous-time stochastic processes are usually best studied by various expansions that transform them to random vectors (although again often of infinite dimension). Thus, a thorough understanding of random vectors is essential to everything that follows in this subject.

The probability density, $p_{\vec Z}(\vec z)$, of an n-dimensional random vector $\vec Z = (Z_1, Z_2, \ldots, Z_n)^T$ is simply the joint probability density of the components $Z_1, \ldots, Z_n$. The mean of $\vec Z$, denoted $\vec m_{\vec Z}$ or $E[\vec Z]$, is the real vector $(m_{Z_1}, m_{Z_2}, \ldots, m_{Z_n})^T$ where $m_{Z_i} = E[Z_i]$ for 1 ≤ i ≤ n. Similarly the covariance matrix of $\vec Z$, denoted $K_{\vec Z}$, is defined as $E\big[(\vec Z - \vec m_{\vec Z})(\vec Z - \vec m_{\vec Z})^T\big]$. This is an n by n symmetric matrix whose element in the ith row, jth column, is the covariance of $Z_i$ and $Z_j$, i.e., $E[(Z_i - m_i)(Z_j - m_j)]$. Finally, the moment generating function (MGF) of an n-dimensional random vector $\vec Z$ is defined as $g_{\vec Z}(\vec s) = E[\exp(\vec s^T\vec Z)]$, where $\vec s = (s_1, \ldots, s_n)^T$ is an n-dimensional vector. The components of $\vec s$ could be taken to be complex, but we usually view them as real. If the components of a random vector are independent and identically distributed (IID), we call the vector an IID random vector.

Example 2.1. An example that will become very familiar is that of an IID n-dimensional random vector $\vec W$ where the components $W_i$, 1 ≤ i ≤ n, are each normalized Gaussian, $W_i \sim N(0, 1)$. By taking the product of the n densities given by (2.1), the density of $\vec W = (W_1, W_2, \ldots, W_n)^T$ is

$$ p_{\vec W}(\vec w) = \frac{1}{(2\pi)^{n/2}}\exp\!\left(\frac{-w_1^2 - w_2^2 - \cdots - w_n^2}{2}\right) = \frac{1}{(2\pi)^{n/2}}\exp\!\left(\frac{-\vec w^T\vec w}{2}\right) \tag{2.9} $$

[Figure 2.2: Contours of equal probability density for two normalized IID Gaussian rv's.]

The joint density of $\vec W$ at a sample value $\vec w$ depends only on the squared distance $\vec w^T\vec w$ of the sample value from the origin. That is, $p_{\vec W}(\vec w)$ is spherically symmetric around the origin, and points of equal probability density lie on concentric spheres around the origin (see Figure 2.2).
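A two-line check of this spherical symmetry is sketched below (the particular vectors are arbitrary; both have squared norm 9, so (2.9) assigns them the same density).

```python
# The IID density (2.9) depends on w only through w^T w.
import numpy as np

def p_iid_gaussian(w):
    n = len(w)
    return np.exp(-w @ w / 2) / (2 * np.pi) ** (n / 2)   # equation (2.9)

w1 = np.array([3.0, 0.0, 0.0])
w2 = np.array([1.0, 2.0, 2.0])                           # same squared distance from the origin
print(p_iid_gaussian(w1), p_iid_gaussian(w2))            # identical values
```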
The moment generating function of $\vec W$ is easily calculated as follows:

$$ g_{\vec W}(\vec s) = E[\exp(\vec s^T\vec W)] = E[\exp(s_1W_1 + \cdots + s_nW_n)] = E\!\left[\prod_i \exp(s_iW_i)\right] = \prod_i E[\exp(s_iW_i)] = \prod_i \exp\!\left(\frac{s_i^2}{2}\right) = \exp\!\left[\frac{\vec s^T\vec s}{2}\right] \tag{2.10} $$

where we have used, first, the fact that the independence of $\{W_1, \ldots, W_n\}$ guarantees the independence of $\{\exp(s_1W_1), \ldots, \exp(s_nW_n)\}$, next, the fact that the expected value of a product of independent rv's is equal to the product of the expected values, and, finally, the fact that (2.6) gives the MGF of each $W_i$. We now go on to define the general class of Gaussian random vectors.

Definition: $\{Z_1, Z_2, \ldots, Z_n\}$ is a set of jointly Gaussian random variables, and $\vec Z = (Z_1, \ldots, Z_n)^T$ is a Gaussian random vector, if, for all real vectors $\vec s = (s_1, \ldots, s_n)^T$, the linear combination $\vec s^T\vec Z = s_1Z_1 + s_2Z_2 + \cdots + s_nZ_n$ is a Gaussian random variable.

The intuitive idea here is that Gaussian rv's arise in practice because of the addition of large numbers of small, essentially independent rv's (the central limit theorem indicates that such a sum can be approximated by a Gaussian rv). For example, when a broadband noise waveform is passed through a narrowband linear filter, the output at any given time is usually well approximated as a Gaussian rv. A linear combination of outputs at different times is also a sum of the same set of small, essentially independent, underlying rv's, and that sum can again be approximated as Gaussian. Thus we would expect a set of outputs at different times to be a jointly Gaussian set according to the above definition.

Note that if $\{Z_1, \ldots, Z_n\}$ is jointly Gaussian, then for each i, 1 ≤ i ≤ n, $Z_i$ is a Gaussian random variable (as follows by choosing $s_i = 1$ and $s_j = 0$ for j ≠ i in the definition above). In the same way, each subset of $\{Z_1, \ldots, Z_n\}$ is also jointly Gaussian. However, if $Z_1, Z_2, \ldots, Z_n$ are each Gaussian rv's, it does not necessarily follow that the set $\{Z_1, \ldots, Z_n\}$ is jointly Gaussian. As a simple example, let $Z_1 \sim N(0, 1)$, let X be +1 or −1, each with probability 1/2 and independent of $Z_1$, and let $Z_2 = Z_1X$. Then $Z_2 \sim N(0, 1)$. The joint probability density, $p_{Z_1Z_2}(z_1, z_2)$, is then impulsive on the diagonals where $z_2 = \pm z_1$ and is zero elsewhere. Then $Z_1 + Z_2$ cannot be Gaussian, since it takes on the value 0 with probability one half. [Footnote: One frequently hears the erroneous statement that uncorrelated Gaussian rv's are independent. This is false, as the above example shows. The correct statement, as we see later, is that uncorrelated jointly Gaussian rv's are independent.] Exercise 2.3 gives another example. The distinction between individually Gaussian and jointly Gaussian rv's is much more than mathematical nit-picking, since the remarkable properties of Gaussian random vectors follow largely from the jointly Gaussian property rather than merely the property of being individually Gaussian.
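The counterexample above is easy to see in simulation. The sketch below (an illustration, not from the text; the seed and sample size are arbitrary) generates $Z_2 = Z_1X$, confirms that its standard deviation is close to 1, and shows that $Z_1 + Z_2$ has an atom at 0, which no Gaussian rv can have.

```python
# Z1 ~ N(0,1), X = +/-1 equiprobable and independent of Z1, Z2 = Z1 * X.
# Each of Z1, Z2 is N(0,1), but Z1 + Z2 equals 0 about half the time, so they are
# individually Gaussian without being jointly Gaussian.
import numpy as np

rng = np.random.default_rng(0)
z1 = rng.standard_normal(100_000)
x = rng.choice([-1.0, 1.0], size=z1.size)
z2 = z1 * x

print(z2.std())                     # close to 1, consistent with Z2 ~ N(0,1)
print(np.mean(z1 + z2 == 0.0))      # close to 0.5: an atom at 0, hence not Gaussian
```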
We now find the moment generating function of the random vector $\vec Z = (Z_1, \ldots, Z_n)^T$ under the assumption that $\vec Z$ is a zero mean Gaussian random vector. The MGF, by definition, is

$$ g_{\vec Z}(\vec s) = E[\exp(\vec s^T\vec Z)] = E\!\left[\exp\!\left(\sum_i s_iZ_i\right)\right] $$

For any given $\vec s$, let $X = \vec s^T\vec Z$. Since $\vec Z$ is a Gaussian random vector (i.e., $(Z_1, \ldots, Z_n)$ is jointly Gaussian), X is also Gaussian, and since $\vec Z$ is zero mean (i.e., all components are zero mean), X is zero mean. From (2.6), $g_X(r) = \exp[r^2\sigma_X^2/2]$, so that $g_X(1) = \exp[\sigma_X^2/2]$. For $X = \vec s^T\vec Z$, then,

$$ g_{\vec Z}(\vec s) = E[\exp(\vec s^T\vec Z)] = E[\exp(X)] = g_X(1) = \exp(\sigma_X^2/2) $$

Finally, $\sigma_X^2 = E[X^2] = E[\vec s^T\vec Z\vec Z^T\vec s] = \vec s^TK_{\vec Z}\vec s$, where $K_{\vec Z}$ is the covariance matrix of $\vec Z$. Thus, the moment generating function of an arbitrary zero mean Gaussian random vector $\vec Z$ is

$$ g_{\vec Z}(\vec s) = \exp\!\left[\frac{\vec s^TK_{\vec Z}\vec s}{2}\right] \tag{2.11} $$

Next let $\vec U = (U_1, \ldots, U_n)^T$ be an arbitrary Gaussian random vector and, for 1 ≤ i ≤ n, let $m_i = E[U_i]$ and let $Z_i = U_i - m_i$ be the fluctuation of $U_i$. Letting $\vec m = (m_1, \ldots, m_n)^T$ and $\vec Z = (Z_1, \ldots, Z_n)^T$, we have $\vec U = \vec m + \vec Z$. Thus the MGF of $\vec U$ is given by $g_{\vec U}(\vec s) = E[\exp(\vec s^T\vec m + \vec s^T\vec Z)] = \exp(\vec s^T\vec m)\,g_{\vec Z}(\vec s)$. Using (2.11) and recognizing that $\vec Z$ and $\vec U$ have the same covariance matrix,

$$ g_{\vec U}(\vec s) = \exp\!\left(\vec s^T\vec m + \frac{\vec s^TK_{\vec U}\vec s}{2}\right) \tag{2.12} $$

Note that the MGF of a Gaussian random vector is completely specified by its mean and covariance. Because of this, we denote a Gaussian random vector $\vec U$ of mean $\vec m$ and covariance $K_{\vec U}$ as $\vec U \sim N(\vec m, K_{\vec U})$. Note also that if a random vector $\vec U$ has the MGF in (2.12), then, as shown in Exercise 2.5, each linear combination of the components of $\vec U$ is a Gaussian rv. Thus we have the theorem:

Theorem 2.1. A random vector $\vec U$ with mean $\vec m$ and covariance $K_{\vec U}$ is a Gaussian random vector iff $g_{\vec U}(\vec s)$ is given by (2.12).

Note that the MGF's in (2.6), (2.8), and (2.11) are special cases of (2.12). Also the MGF of the IID random vector $\vec W$ in (2.10) agrees with (2.12), since the mean of $\vec W$ is zero and the covariance is the identity matrix I. Thus $\vec W$, not surprisingly, is a Gaussian random vector, i.e., $\vec W \sim N(0, I)$.

2.4 Joint Probability Densities for Gaussian Random Vectors

In this section, we start with an example of a zero mean Gaussian random vector that is defined as a linear transformation of the IID normalized Gaussian random vector $\vec W$ of Example 2.1. We show later that this example in fact covers all zero mean Gaussian random vectors.

Example 2.2. Let A be an n by n real matrix, let $\vec W$ be an IID normalized Gaussian n-dimensional random vector, and let the n-dimensional random vector $\vec Z$ be given by $\vec Z = A\vec W$. The covariance matrix of $\vec Z$ is

$$ K_{\vec Z} = E[\vec Z\vec Z^T] = E[A\vec W\vec W^TA^T] = AA^T \tag{2.13} $$

since $E[\vec W\vec W^T]$ is the identity matrix $I_n$. We can easily find the MGF of $\vec Z$, since

$$ g_{\vec Z}(\vec s) = E[\exp(\vec s^T\vec Z)] = E[\exp(\vec s^TA\vec W)] = E[\exp\{(A^T\vec s)^T\vec W\}] = g_{\vec W}(A^T\vec s) = \exp\!\left[\frac{\vec s^TAA^T\vec s}{2}\right] = \exp\!\left[\frac{\vec s^TK_{\vec Z}\vec s}{2}\right] \tag{2.14} $$

Comparing (2.14) with (2.11), we see that $\vec Z$ is a zero mean Gaussian random vector. We shall see shortly that any zero mean Gaussian random vector can be represented in the form $\vec Z = A\vec W$ for some real n by n matrix A and IID normalized Gaussian random vector $\vec W$.
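As a quick numerical illustration of Example 2.2 (a sketch only; the matrix A, sample size, and seed are arbitrary choices, not from the text), one can generate $\vec Z = A\vec W$ from IID normalized Gaussian samples and check that the sample covariance approaches $AA^T$, as in (2.13).

```python
# Sample covariance of Z = A W approaches A A^T, equation (2.13).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])                   # arbitrary non-singular choice
W = rng.standard_normal((2, 200_000))        # columns are IID normalized Gaussian 2-vectors
Z = A @ W                                    # Z = A W, as in Example 2.2

K_sample = Z @ Z.T / Z.shape[1]              # sample covariance of the zero mean Z
print(K_sample)
print(A @ A.T)                               # the two matrices agree to a few decimal places
```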
We next find the joint probability density of $\vec Z = A\vec W$. First consider the corresponding transformation of real-valued vectors, $\vec z = A\vec w$. Let $\vec e_i$ be the ith unit vector (i.e., the vector whose ith component is 1 and whose other components are 0). Then $A\vec e_i = \vec a_i$, where $\vec a_i$ is the ith column of A. Thus, $\vec z = A\vec w$ transforms the unit vectors $\vec e_i$ into the columns $\vec a_i$ of A. For n = 2, Figure 2.3 shows how this transformation carries the unit square with the corners $\vec 0$, $\vec e_1$, $\vec e_2$, and $(\vec e_1 + \vec e_2)$ into the parallelogram with corners $\vec 0$, $\vec a_1$, $\vec a_2$, and $(\vec a_1 + \vec a_2)$.

[Figure 2.3: Unit square in the $\vec w$ plane transformed to a parallelogram in the $\vec z$ plane, illustrated for A = [[1, 0.5], [0, 1]].]

For an arbitrary number of dimensions, the unit cube in the $\vec w$ space is the set of points $\vec w$ such that 0 ≤ $w_i$ ≤ 1 for i = 1, ..., n. There are $2^n$ corners of the unit cube, and each is some 0/1 combination of the unit vectors, i.e., each has the form $\vec e_{i_1} + \vec e_{i_2} + \cdots + \vec e_{i_k}$. The transformation $A\vec w$ carries the unit cube into a parallelepiped, where each corner of the cube, $\vec e_{i_1} + \vec e_{i_2} + \cdots + \vec e_{i_k}$, is carried into a corresponding corner $\vec a_{i_1} + \vec a_{i_2} + \cdots + \vec a_{i_k}$ of the parallelepiped. One of the most interesting and geometrically meaningful properties of the determinant, det(A), of a square matrix A is that the magnitude of that determinant, |det(A)|, is equal to the volume of that parallelepiped (see Strang, Linear Algebra, sec. 4.4). If det(A) = 0, i.e., if A is singular, then the n-dimensional unit cube in the $\vec w$ space is transformed into a lower-dimensional parallelepiped whose volume (as a region of n-dimensional space) is 0. We assume in what follows that det(A) ≠ 0, so that the inverse of A exists.

Now let $\vec z$ be a sample value of $\vec Z$, and let $\vec w = A^{-1}\vec z$ be the corresponding sample value of $\vec W$. The joint density at $\vec z$ must satisfy

$$ p_{\vec Z}(\vec z)\,|d\vec z| = p_{\vec W}(\vec w)\,|d\vec w| \tag{2.15} $$

where $|d\vec w|$ is the volume of an incremental cube with dimension $\delta = dw_i$ on each side, and $|d\vec z|$ is the volume of that incremental cube transformed by A. Thus $|d\vec w| = \delta^n$ and $|d\vec z| = \delta^n|\det(A)|$, so that $|d\vec z|/|d\vec w| = |\det(A)|$. Using this in (2.15), and using (2.9) for $p_{\vec W}(\vec w) = p_{\vec W}(A^{-1}\vec z)$, we see that the density of a jointly Gaussian vector $\vec Z = A\vec W$ is

$$ p_{\vec Z}(\vec z) = \frac{\exp\!\left[-\frac{1}{2}\vec z^T(A^{-1})^TA^{-1}\vec z\right]}{(2\pi)^{n/2}\,|\det(A)|} \tag{2.16} $$

From (2.13), we have $K_{\vec Z} = AA^T$, so $K_{\vec Z}^{-1} = (A^{-1})^TA^{-1}$ and $\det(K_{\vec Z}) = \det(A)\det(A^T) = [\det(A)]^2 > 0$. Thus (2.16) becomes

$$ p_{\vec Z}(\vec z) = \frac{\exp\!\left[-\frac{1}{2}\vec z^TK_{\vec Z}^{-1}\vec z\right]}{(2\pi)^{n/2}\sqrt{\det(K_{\vec Z})}} \tag{2.17} $$

Eq. (2.17) doesn't have any meaning when A, and thus $K_{\vec Z}$, is singular, since $K_{\vec Z}^{-1}$ does not then exist. In the case where A is singular, $A\vec w$ maps the set of n-dimensional vectors $\vec w$ into a subspace of dimension less than n, and $p_{\vec Z}(\vec z)$ is 0 outside of that subspace and is impulsive inside. What this means is that some components of the random vector $\vec Z$ can be expressed as linear combinations of other components. In this case, one can avoid very messy notation by simply defining the components that are linearly independent as forming a random vector $\vec Z'$, and representing the other components as linear combinations of the components of $\vec Z'$. With this stratagem, we can always take the covariance of any such reduced random vector to be nonsingular.
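A small check of (2.17), assuming scipy is available: evaluate the closed-form density for a non-singular $K_{\vec Z} = AA^T$ and compare it with a standard multivariate normal pdf routine. The matrix A and the test point below are arbitrary illustrative choices.

```python
# Closed form (2.17) versus scipy's multivariate normal density for K = A A^T.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.3],
              [0.3, 1.0]])                   # arbitrary non-singular matrix
K = A @ A.T                                  # K_Z = A A^T, as in (2.13)
z = np.array([0.7, -1.2])                    # arbitrary test point
n = len(z)

density = np.exp(-0.5 * z @ np.linalg.solve(K, z)) \
          / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(K)))   # equation (2.17)
print(density)
print(multivariate_normal(mean=np.zeros(n), cov=K).pdf(z))         # same value
```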
Next consider $\vec U = \vec m + A\vec W$. Define $\tilde U = A\vec W$ as the fluctuation in $\vec U$, and note that (2.17) can be used for the density of $\tilde U$. Since $E[\tilde U] = 0$, we see that $E[\vec U] = \vec m$. Assuming det(A) ≠ 0, we can immediately write down the density for $\vec U$ as a translation of that for $\tilde U$,

$$ p_{\vec U}(\vec u) = \frac{\exp\!\left[-\frac{1}{2}(\vec u - \vec m)^TK_{\vec U}^{-1}(\vec u - \vec m)\right]}{(2\pi)^{n/2}\sqrt{\det(K_{\vec U})}} \tag{2.18} $$

where $K_{\vec U} = E[\tilde U\tilde U^T]$ is the covariance matrix of both $\vec U$ and $\tilde U$. We soon show that this is the general form of density for a Gaussian random vector $\vec U \sim N(\vec m, K_{\vec U})$ if $\det(K_{\vec U}) \ne 0$. In fact, we have already shown that for any Gaussian random vector $\vec U \sim N(\vec m, K_{\vec U})$, the MGF satisfies (2.12), and it turns out that this uniquely specifies the distribution of $\vec U$. Thus, if there is a matrix A with det(A) ≠ 0 such that $K_{\vec U} = AA^T$, then $\vec U \sim N(\vec m, K_{\vec U})$ can be expressed as $\vec m + A\vec W$, and (2.18) must be the density of $\vec U$. We shall soon see that any covariance matrix $K_{\vec U}$ can be expressed as $AA^T$ for some square matrix A. In most of what follows, we restrict our attention to zero mean Gaussian random vectors, since, as we have seen, it is usually simpler to deal with the fluctuation $\tilde U$ of $\vec U$, and then include the mean later.

For the 2-dimensional zero mean case, let $E[Z_1^2] = \sigma_1^2$, $E[Z_2^2] = \sigma_2^2$, and $E[Z_1Z_2] = k_{12}$. Define the normalized covariance, ρ, as $k_{12}/(\sigma_1\sigma_2)$. Then $\det(K_{\vec Z}) = \sigma_1^2\sigma_2^2 - k_{12}^2 = \sigma_1^2\sigma_2^2(1 - \rho^2)$. For A to be non-singular, we need $\det(K_{\vec Z}) = [\det(A)]^2 > 0$, so we need |ρ| < 1. We then have

$$ K_{\vec Z}^{-1} = \frac{1}{\sigma_1^2\sigma_2^2 - k_{12}^2}\begin{bmatrix}\sigma_2^2 & -k_{12}\\ -k_{12} & \sigma_1^2\end{bmatrix} = \frac{1}{1-\rho^2}\begin{bmatrix}1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2)\\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2\end{bmatrix} $$

$$ p_{\vec Z}(\vec z) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2 - k_{12}^2}}\exp\!\left(\frac{-z_1^2\sigma_2^2 + 2z_1z_2k_{12} - z_2^2\sigma_1^2}{2(\sigma_1^2\sigma_2^2 - k_{12}^2)}\right) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\!\left(\frac{-(z_1/\sigma_1)^2 + 2\rho(z_1/\sigma_1)(z_2/\sigma_2) - (z_2/\sigma_2)^2}{2(1-\rho^2)}\right) \tag{2.19} $$

This is why we use vector notation! There are two lessons in this. First, hand calculation is frequently messy for Gaussian random vectors, and second, the vector equations are much simpler, so we must learn to reason directly from the vector equations and use standard computer programs to do the calculations.
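In the spirit of that remark, here is a minimal sketch (the parameter values and test point are arbitrary illustrations, not from the text) confirming that the hand-derived two-dimensional density (2.19) agrees with the general vector form (2.17).

```python
# Scalar form (2.19) versus vector form (2.17) for a 2-dimensional zero mean Gaussian.
import numpy as np

sigma1, sigma2, rho = 1.5, 0.8, 0.6              # arbitrary parameters with |rho| < 1
k12 = rho * sigma1 * sigma2
K = np.array([[sigma1**2, k12],
              [k12, sigma2**2]])
z1, z2 = 0.4, -1.1                               # arbitrary sample point
z = np.array([z1, z2])

vector_form = np.exp(-0.5 * z @ np.linalg.inv(K) @ z) \
              / (2 * np.pi * np.sqrt(np.linalg.det(K)))
scalar_form = np.exp((-(z1/sigma1)**2 + 2*rho*(z1/sigma1)*(z2/sigma2) - (z2/sigma2)**2)
                     / (2 * (1 - rho**2))) \
              / (2 * np.pi * sigma1 * sigma2 * np.sqrt(1 - rho**2))
print(vector_form, scalar_form)                  # identical up to rounding
```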
2.5 Properties of Covariance Matrices

In this section, we summarize some properties of covariance matrices that will be used frequently in what follows. A matrix K is a covariance matrix if a zero mean random vector $\vec Z$ exists such that $K = E[\vec Z\vec Z^T]$. It is important to realize that the properties developed here apply to non-Gaussian as well as Gaussian random vectors. An n by n matrix K is positive semi-definite if it is symmetric and if, for all real n-vectors $\vec b$, $\vec b^TK\vec b \ge 0$. It is positive definite if, in addition, $\vec b^TK\vec b > 0$ for all non-zero $\vec b$. Our objective in this section is to list the relationships between these types of matrices, and to state some other frequently useful properties of covariance matrices.

1) Every covariance matrix K is positive semi-definite. To see this, let $\vec Z$ be a zero mean n-dimensional random vector such that $K = E[\vec Z\vec Z^T]$. K is symmetric since $E[Z_iZ_j] = E[Z_jZ_i]$ for all i, j. Let $\vec b$ be an arbitrary real n-vector, and let $X = \vec b^T\vec Z$. Then $0 \le E[X^2] = E[\vec b^T\vec Z\vec Z^T\vec b] = \vec b^TK\vec b$.

2) A covariance matrix K is positive definite iff det(K) ≠ 0. To see this, define $\vec Z$ as above and note that if $\vec b^TK\vec b = 0$ for some $\vec b \ne 0$, then $X = \vec b^T\vec Z$ has zero variance, and therefore is zero with probability 1. Thus $E[X\vec Z^T] = 0$, so $\vec b^TE[\vec Z\vec Z^T] = \vec b^TK = 0$. Since $\vec b \ne 0$ and $\vec b^TK = 0$, we must have det(K) = 0. Conversely, if det(K) = 0, there is some $\vec b \ne 0$ such that $K\vec b = 0$, so $\vec b^TK\vec b$ is also 0.

3) A complex number λ is an eigenvalue of a matrix K if $K\vec q = \lambda\vec q$ for some non-zero vector $\vec q$; the corresponding $\vec q$ is called an eigenvector. The following results about the eigenvalues and eigenvectors of positive definite (semi-definite) matrices K are standard linear algebra results (see, for example, Strang, section 5.5) [Footnote: Strang also shows that K is positive definite iff all the upper left square submatrices have positive determinants. This is often called Sylvester's test. This test does not extend to positive semi-definite matrices (for example, the matrix [[0, 0], [0, −1]] is not positive semi-definite even though its upper left square submatrices have non-negative determinants). K is positive semi-definite iff all the principal submatrices (i.e., the submatrices resulting from dropping a subset of rows and the corresponding subset of columns) have non-negative determinants.]: All eigenvalues of K are positive (non-negative). All the eigenvectors can be taken to be real. All eigenvectors of different eigenvalues are orthogonal, and if an eigenvalue has multiplicity j, then it has j orthogonal eigenvectors. Altogether, n orthogonal eigenvectors can be chosen, and they can be scaled to be orthonormal.

4) If K is positive semi-definite, there is an orthonormal matrix Q whose columns, $\vec q_1, \ldots, \vec q_n$, are the orthonormal eigenvectors above. Q satisfies KQ = QΛ, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ is the diagonal matrix whose ith element, $\lambda_i$, is the eigenvalue of K corresponding to the eigenvector $\vec q_i$. This is simply the vector version of the eigenvector/eigenvalue relationship in property 3. Q also satisfies $Q^TQ = I$ where I is the identity matrix. This follows since $\vec q_i^T\vec q_j = \delta_{ij}$. We then also have $Q^{-1} = Q^T$.

5) If K is positive semi-definite, and if Q and Λ are the matrices above, then $K = Q\Lambda Q^T$. If K is positive definite, then also $K^{-1} = Q\Lambda^{-1}Q^T$. The first equality follows from property 4, and the second follows by multiplying the expression for K by that for $K^{-1}$, which exists because all the eigenvalues are positive.

6) For a symmetric n by n matrix K, $\det K = \prod_{i=1}^n \lambda_i$ where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of K repeated according to their multiplicity. Thus if K is positive definite, det K > 0, and if K is positive semi-definite, det K ≥ 0.

7) If K is a positive definite (semi-definite) matrix, then there is a unique positive definite (semi-definite) square root matrix R satisfying $R^2 = K$. In particular, R is given by

$$ R = Q\Lambda^{1/2}Q^T \qquad\text{where}\qquad \Lambda^{1/2} = \mathrm{diag}\big(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\big) \tag{2.20} $$

8) If K is positive semi-definite, then K is a covariance matrix. In particular, K is the covariance matrix of $\vec Z = R\vec W$, where R is the square root matrix in (2.20) and $\vec W$ is an IID normalized Gaussian random vector.

9) For any real n by n matrix A, the matrix $K = AA^T$ is a covariance matrix. In particular, K is the covariance matrix of $\vec Z = A\vec W$.

For any given covariance matrix K, there are usually many choices for A satisfying $K = AA^T$. The square root matrix R above is simply a convenient choice. The most important of the results in this section are summarized in the following theorem:

Theorem 2.2. A real n by n matrix K is a covariance matrix iff it is positive semi-definite. Also it is a covariance matrix iff $K = AA^T$ for some real n by n matrix A. One choice for A is the square root matrix R in (2.20).

We now see, as summarized in the following corollary, that Example 2.2 above is completely general.

Corollary 1: For any covariance matrix K, a zero mean Gaussian random vector $\vec Z \sim N(0, K)$ exists, and $\vec Z$ can be viewed as $A\vec W$, where $AA^T = K$ and $\vec W \sim N(0, I_n)$. The MGF of $\vec Z$ is given by (2.11), and, if det(K) ≠ 0, the probability density is given by (2.17). The density for an arbitrary Gaussian random vector $\vec U \sim N(\vec m, K)$, with det(K) ≠ 0, is given by (2.18).
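Property 7 and Theorem 2.2 are easy to exercise numerically. The sketch below (an illustration, not part of the text; the matrix K is an arbitrary positive definite choice) builds the square root matrix $R = Q\Lambda^{1/2}Q^T$ of (2.20) from an eigendecomposition and checks that $R^2 = K$ and that $\vec Z = R\vec W$ has sample covariance close to K.

```python
# Square root matrix (2.20) and properties 7 and 8.
import numpy as np

rng = np.random.default_rng(2)
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])                       # arbitrary positive definite covariance matrix

lam, Q = np.linalg.eigh(K)                       # eigenvalues and orthonormal eigenvectors (property 4)
R = Q @ np.diag(np.sqrt(lam)) @ Q.T              # square root matrix, (2.20)
print(np.allclose(R @ R, K))                     # True: R^2 = K (property 7)

W = rng.standard_normal((2, 200_000))            # IID normalized Gaussian vectors
Z = R @ W                                        # property 8: Z = R W has covariance K
print(Z @ Z.T / Z.shape[1])                      # close to K
```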
2.6 Geometry and Principal Axes for Gaussian Densities

Let $\vec Z$ be an arbitrary zero mean Gaussian n-dimensional random vector with non-singular covariance K. For any given c > 0, the points $\vec z$ for which $\vec z^TK^{-1}\vec z = c$ form a contour of equal probability density for $\vec Z$, as seen by (2.17). This is a quadratic equation in $\vec z$ and is the equation for an ellipsoid centered on the origin. The principal axes of this ellipsoid are given by the eigenvectors of K.

In order to understand this, let Q be an orthonormal matrix whose columns $\vec q_1, \ldots, \vec q_n$ are the eigenvectors of K and let Λ be the corresponding diagonal matrix of eigenvalues (as in property 4 above). Consider the transformation $\vec V = Q^T\vec Z$. The covariance matrix of $\vec V$ is then

$$ K_{\vec V} = E[\vec V\vec V^T] = E[Q^T\vec Z\vec Z^TQ] = Q^TKQ = \Lambda \tag{2.21} $$

where the last step follows from $K = Q\Lambda Q^T$ and $Q^T = Q^{-1}$. Since $\vec Z$ is a zero mean Gaussian random vector, $\vec V$ is also, so it has the joint probability density

$$ p_{\vec V}(\vec v) = \frac{\exp\!\left[-\vec v^TK_{\vec V}^{-1}\vec v/2\right]}{(2\pi)^{n/2}\,[\det(K_{\vec V})]^{1/2}} = \frac{\exp\!\left[-\sum_i v_i^2/(2\lambda_i)\right]}{(2\pi)^{n/2}\sqrt{\prod_i\lambda_i}} = \prod_{i=1}^n\frac{\exp[-v_i^2/(2\lambda_i)]}{\sqrt{2\pi\lambda_i}} \tag{2.22} $$

[Figure 2.4: Representation of a point in two different co-ordinate systems. Point $\vec x$ is represented by $(z_1, z_2)^T$ in the $\vec e_1$, $\vec e_2$ system and by $(v_1, v_2)^T$ in the $\vec q_1$, $\vec q_2$ system.]

The transformation $\vec V = Q^T\vec Z$ carries each sample value $\vec z$ for the random vector $\vec Z$ into the sample value $\vec v = Q^T\vec z$ of $\vec V$. Visualize this transformation as a change of basis, i.e., $\vec z$ represents some point $\vec x$ of n-dimensional real space in terms of a given co-ordinate system and $\vec v$ represents the same $\vec x$ in a different co-ordinate system (see Figure 2.4). In particular, let $\vec e_1, \vec e_2, \ldots, \vec e_n$ be the basis vectors in co-ordinate system 1, and let $\vec q_1, \vec q_2, \ldots, \vec q_n$ be the basis vectors in co-ordinate system 2. Then

$$ \vec x = \sum_{i=1}^n z_i\vec e_i = \sum_{i=1}^n v_i\vec q_i \tag{2.23} $$

In co-ordinate system 1, $\vec e_i$ is simply the ith unit vector, containing 1 in position i and 0 elsewhere. In this co-ordinate system, $\vec x$ is represented by the n-tuple $(z_1, \ldots, z_n)^T$. Also, in co-ordinate system 1, $\vec q_i$ is the ith normalized eigenvector of the matrix K. Since the orthonormal matrix Q has the columns $\vec q_1, \ldots, \vec q_n$, the right hand side of (2.23) can be written, in co-ordinate system 1, as $Q\vec v$, where $\vec v$ is the n-tuple $(v_1, \ldots, v_n)^T$. Thus, in co-ordinate system 1, $\vec z = Q\vec v$, so $\vec v = Q^{-1}\vec z = Q^T\vec z$, verifying that $\vec v = Q^T\vec z$ actually corresponds to the indicated change of basis. Note that $\vec q_1, \ldots, \vec q_n$ are orthonormal in both co-ordinate systems, and $\vec e_1, \ldots, \vec e_n$ are also orthonormal in both systems. This means that the transformation can be viewed as a rotation, i.e., distances are unchanged (see Exercise 2.7).

For any given real number c > 0, the set of vectors $\vec z$ for which $\vec z^TK^{-1}\vec z = c$ forms a contour of equal probability density for $\vec Z$ (in co-ordinate system 1). The corresponding contour in co-ordinate system 2, with $\vec z = Q\vec v$, is given by $\vec v^TQ^TK^{-1}Q\vec v = c$. This can be rewritten as $\vec v^T\Lambda^{-1}\vec v = c$, or $\sum_i v_i^2/\lambda_i = c$ (as can also be seen from (2.22)). This is the equation of an ellipsoid, but the principal axes of the ellipsoid are now aligned with co-ordinate system 2, using basis vectors $\vec q_1, \ldots, \vec q_n$ (see Figure 2.5). The distances from the origin to the ellipsoid in these principal-axis directions are $\sqrt{c\lambda_i}$, 1 ≤ i ≤ n.

[Figure 2.5: Lines of equal probability density form ellipses. The principal axes of these ellipses are the eigenvectors of K. The intersection of an ellipse with each axis is proportional to the square root of the corresponding eigenvalue.]
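To connect the algebra to the picture, the following sketch (with an arbitrary K and c, not from the text) computes the eigenvectors of K, verifies that $\vec V = Q^T\vec Z$ has the diagonal covariance Λ of (2.21), and prints the principal-axis distances $\sqrt{c\lambda_i}$.

```python
# Change of basis V = Q^T Z diagonalizes the covariance; axis lengths are sqrt(c * lambda_i).
import numpy as np

rng = np.random.default_rng(3)
K = np.array([[2.0, 0.8],
              [0.8, 1.0]])                       # arbitrary non-singular covariance
lam, Q = np.linalg.eigh(K)                       # Lambda and the orthonormal eigenvector matrix Q

Z = np.linalg.cholesky(K) @ rng.standard_normal((2, 200_000))   # samples with covariance K
V = Q.T @ Z                                      # change of basis, V = Q^T Z
print(V @ V.T / V.shape[1])                      # close to diag(lam), as in (2.21)

c = 1.0                                          # arbitrary contour level
print(np.sqrt(c * lam))                          # distances to the contour along q_1, q_2
```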
2.7 Conditional Probabilities

Next consider the conditional probability density $p_{X|Y}(x|y)$ for two jointly Gaussian zero mean rv's X and Y with a non-singular covariance matrix. From (2.19),

$$ p_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\!\left[\frac{-(x/\sigma_X)^2 + 2\rho(x/\sigma_X)(y/\sigma_Y) - (y/\sigma_Y)^2}{2(1-\rho^2)}\right] $$

where $\rho = E[XY]/(\sigma_X\sigma_Y)$. Since $p_{X|Y}(x|y) = p_{X,Y}(x, y)/p_Y(y)$ and $Y \sim N(0, \sigma_Y^2)$, we have

$$ p_{X|Y}(x|y) = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\exp\!\left[\frac{-(x/\sigma_X)^2 + 2\rho(x/\sigma_X)(y/\sigma_Y) - \rho^2(y/\sigma_Y)^2}{2(1-\rho^2)}\right] $$

This simplifies to

$$ p_{X|Y}(x|y) = \frac{1}{\sigma_X\sqrt{2\pi(1-\rho^2)}}\exp\!\left[\frac{-\big[x - \rho(\sigma_X/\sigma_Y)y\big]^2}{2\sigma_X^2(1-\rho^2)}\right] \tag{2.24} $$

This says that, given any particular sample value y for the rv Y, the conditional density of X is Gaussian with variance $\sigma_X^2(1-\rho^2)$ and mean $\rho(\sigma_X/\sigma_Y)y$. Given Y = y, we can view X as a random variable in the restricted sample space where Y = y, and in that restricted sample space, X is $N\big(\rho(\sigma_X/\sigma_Y)y,\ \sigma_X^2(1-\rho^2)\big)$. This means that the fluctuation of X, in this restricted space, has the same density for all y. When we study estimation, we shall find that the facts that, first, the conditional mean is linear in y, and, second, the fluctuation is independent of y, are crucially important. These simplifications will lead to many important properties and insights in what follows. We now go on to show that the same kind of simplification occurs when we study the conditional density of one Gaussian random vector conditioned on another Gaussian random vector, assuming that all the variables are jointly Gaussian.
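A sketch of (2.24) in use (assuming scipy; the numerical values are arbitrary illustrations): given $\sigma_X$, $\sigma_Y$, ρ and an observed y, the conditional density of X is Gaussian with mean $\rho(\sigma_X/\sigma_Y)y$ and variance $\sigma_X^2(1-\rho^2)$, which we can compare against a direct evaluation of $p_{X,Y}(x, y)/p_Y(y)$.

```python
# Conditional density (2.24) versus the ratio of joint to marginal densities.
import numpy as np
from scipy.stats import norm, multivariate_normal

sigma_x, sigma_y, rho = 1.2, 2.0, 0.7            # arbitrary parameters
y, x = 1.5, 0.3                                  # arbitrary sample values

cond_mean = rho * (sigma_x / sigma_y) * y        # conditional mean from (2.24)
cond_var = sigma_x**2 * (1 - rho**2)             # conditional variance from (2.24)

K = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
              [rho * sigma_x * sigma_y, sigma_y**2]])
ratio = multivariate_normal(mean=[0.0, 0.0], cov=K).pdf([x, y]) / norm(scale=sigma_y).pdf(y)

print(norm(loc=cond_mean, scale=np.sqrt(cond_var)).pdf(x))   # (2.24)
print(ratio)                                                  # same value
```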
Let $\vec U$ be an (n + m)-dimensional Gaussian random vector. View $\vec U$ as a pair of random vectors $\vec X$, $\vec Y$ of dimensions m and n respectively. That is, $\vec U = (U_1, U_2, \ldots, U_{n+m})^T = (X_1, X_2, \ldots, X_m, Y_1, \ldots, Y_n)^T = (\vec X^T, \vec Y^T)^T$. $\vec X$ and $\vec Y$ are called jointly Gaussian random vectors. If the covariance matrix of $\vec U$ is non-singular, we say that $\vec X$ and $\vec Y$ are jointly non-singular. In what follows, we assume that $\vec X$ and $\vec Y$ are jointly Gaussian, jointly non-singular, and zero mean. [Footnote: Exercise 2.12 generalizes this to the case with an arbitrary mean.] The covariance matrix $K_{\vec U}$ of $\vec U$ can be partitioned into m rows on top and n rows on bottom, and then further partitioned into m and n columns, yielding:

$$ K_{\vec U} = \begin{bmatrix} K_{\vec X} & K_{\vec X\vec Y} \\ K_{\vec X\vec Y}^T & K_{\vec Y} \end{bmatrix};\qquad K_{\vec U}^{-1} = \begin{bmatrix} B & C \\ C^T & D \end{bmatrix} \tag{2.25} $$

Here $K_{\vec X} = E[\vec X\vec X^T]$, $K_{\vec X\vec Y} = E[\vec X\vec Y^T]$, and $K_{\vec Y} = E[\vec Y\vec Y^T]$. The blocks $K_{\vec X}$, $K_{\vec Y}$, B, and D are all non-singular (see Exercise 2.11). We will evaluate the blocks B, C, and D in terms of $K_{\vec X}$, $K_{\vec Y}$, and $K_{\vec X\vec Y}$ later, but first we find the conditional density, $p_{\vec X|\vec Y}(\vec x\,|\,\vec y)$, in terms of these blocks. What we shall find is that for any given $\vec y$, this is a Gaussian density with a conditional covariance matrix equal to $B^{-1}$. As in (2.24), where X and Y are one-dimensional, this covariance does not depend on $\vec y$. Also, the conditional mean of $\vec X$, given $\vec Y = \vec y$, will turn out to be $-B^{-1}C\vec y$. Thus $\vec X$, conditional on $\vec Y = \vec y$, is $N(-B^{-1}C\vec y,\ B^{-1})$, i.e.,

$$ p_{\vec X|\vec Y}(\vec x|\vec y) = \frac{\exp\!\left[-\frac{1}{2}(\vec x + B^{-1}C\vec y)^TB\,(\vec x + B^{-1}C\vec y)\right]}{(2\pi)^{m/2}\sqrt{\det(B^{-1})}} \tag{2.26} $$

In order to see this, express $p_{\vec X|\vec Y}(\vec x|\vec y)$ as $p_{\vec X\vec Y}(\vec x, \vec y)/p_{\vec Y}(\vec y)$. From (2.17),

$$ p_{\vec X,\vec Y}(\vec x, \vec y) = \frac{\exp\!\left[-\frac{1}{2}(\vec x^T, \vec y^T)\,K_{\vec U}^{-1}\,(\vec x^T, \vec y^T)^T\right]}{(2\pi)^{(n+m)/2}\sqrt{\det(K_{\vec U})}} = \frac{\exp\!\left[-\frac{1}{2}\big(\vec x^TB\vec x + \vec x^TC\vec y + \vec y^TC^T\vec x + \vec y^TD\vec y\big)\right]}{(2\pi)^{(n+m)/2}\sqrt{\det(K_{\vec U})}} $$

We note that $\vec x$ appears only in the first three terms of the exponent above, and that $\vec x$ does not appear at all in $p_{\vec Y}(\vec y)$. Thus we can express the dependence on $\vec x$ in $p_{\vec X|\vec Y}(\vec x|\vec y)$ by

$$ p_{\vec X|\vec Y}(\vec x\,|\,\vec y) = f(\vec y)\exp\!\left\{\frac{-\big[\vec x^TB\vec x + \vec x^TC\vec y + \vec y^TC^T\vec x\big]}{2}\right\} \tag{2.27} $$

where $f(\vec y)$ is some function of $\vec y$. We now complete the square around B in the exponent above, getting

$$ p_{\vec X|\vec Y}(\vec x\,|\,\vec y) = f(\vec y)\exp\!\left[\frac{-(\vec x + B^{-1}C\vec y)^TB(\vec x + B^{-1}C\vec y) + \vec y^TC^TB^{-1}C\vec y}{2}\right] $$

Since the last term in the exponent does not depend on $\vec x$, we can absorb it into $f(\vec y)$. The remaining expression is in the form of (2.26). Since $p_{\vec X|\vec Y}(\vec x|\vec y)$ must be a probability density for each $\vec y$, the modified coefficient $f(\vec y)$ must be the reciprocal of the denominator in (2.26), so that $p_{\vec X|\vec Y}(\vec x|\vec y)$ is given by (2.26).

To interpret (2.26), note that for any sample value $\vec y$ for $\vec Y$, the conditional distribution of $\vec X$ has a mean given by $-B^{-1}C\vec y$ and a Gaussian fluctuation around the mean with covariance $B^{-1}$. This fluctuation has the same distribution for all $\vec y$ and thus can be represented as a random vector $\vec V$ that is independent of $\vec Y$. Thus we can represent $\vec X$ as

$$ \vec X = G\vec Y + \vec V;\qquad \vec Y,\ \vec V\ \text{independent} \tag{2.28} $$

where

$$ G = -B^{-1}C \qquad\text{and}\qquad \vec V \sim N(0, B^{-1}) \tag{2.29} $$

$\vec V$ is often called an innovation, because it is the part of $\vec X$ that is independent of $\vec Y$. It is also called a noise term for the same reason. We will call $K_{\vec V} = B^{-1}$ the conditional covariance of $\vec X$ given any sample value $\vec y$ for $\vec Y$. In summary, the unconditional covariance, $K_{\vec X}$, of $\vec X$ is given by the upper left block of $K_{\vec U}$ in (2.25), while the conditional covariance $K_{\vec V}$ is the inverse of the upper left block, B, of the inverse of $K_{\vec U}$.

From (2.28), we can express G and $K_{\vec V}$ in terms of the covariances of $\vec X$ and $\vec Y$. To do this, note that

$$ K_{\vec X\vec Y} = E[\vec X\vec Y^T] = E[G\vec Y\vec Y^T + \vec V\vec Y^T] = GK_{\vec Y} \tag{2.30} $$

$$ K_{\vec X} = E\big[(G\vec Y + \vec V)(G\vec Y + \vec V)^T\big] = GK_{\vec Y}G^T + K_{\vec V} \tag{2.31} $$

Solving these equations, we get

$$ G = K_{\vec X\vec Y}K_{\vec Y}^{-1} \tag{2.32} $$

$$ K_{\vec V} = K_{\vec X} - GK_{\vec Y}G^T = K_{\vec X} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T \tag{2.33} $$

where we have substituted the solution from (2.32) into (2.33). We can summarize these results in the following theorem.

Theorem 2.3. Let $\vec X$ and $\vec Y$ be zero mean, jointly Gaussian, and jointly non-singular. Then they can be represented in the form $\vec X = G\vec Y + \vec V$, where $\vec Y$ and $\vec V$ are independent, $\vec V \sim N(0, K_{\vec V})$, G is given by (2.32), $K_{\vec V}$ is non-singular and given by (2.33), and the conditional density of $\vec X$ given $\vec Y$ is

$$ p_{\vec X|\vec Y}(\vec x|\vec y) = \frac{\exp\!\left[-\frac{1}{2}(\vec x - G\vec y)^TK_{\vec V}^{-1}(\vec x - G\vec y)\right]}{(2\pi)^{m/2}\sqrt{\det(K_{\vec V})}} \tag{2.34} $$

We can also solve for the matrix $B = K_{\vec V}^{-1}$ from (2.33),

$$ B = \big[K_{\vec X} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T\big]^{-1} \tag{2.35} $$

Similarly, from (2.29) and (2.32),

$$ C = -K_{\vec V}^{-1}G = -K_{\vec V}^{-1}K_{\vec X\vec Y}K_{\vec Y}^{-1} \tag{2.36} $$

From the symmetry between $\vec X$ and $\vec Y$, we can repeat the above arguments, reversing the roles of $\vec X$ and $\vec Y$. Thus, we can represent $\vec Y$ as

$$ \vec Y = H\vec X + \vec Z;\qquad \vec X,\ \vec Z\ \text{independent} \tag{2.37} $$

where

$$ H = -D^{-1}C^T \qquad\text{and}\qquad \vec Z \sim N(0, D^{-1}) \tag{2.38} $$

Using (2.37), we express H and $K_{\vec Z}$ in terms of the covariances of $\vec X$ and $\vec Y$:

$$ K_{\vec X\vec Y} = E[\vec X(H\vec X + \vec Z)^T] = K_{\vec X}H^T \tag{2.39} $$

$$ K_{\vec Y} = E\big[(H\vec X + \vec Z)(H\vec X + \vec Z)^T\big] = HK_{\vec X}H^T + K_{\vec Z} \tag{2.40} $$

Solving, we get

$$ H = K_{\vec X\vec Y}^TK_{\vec X}^{-1} \tag{2.41} $$

$$ K_{\vec Z} = K_{\vec Y} - HK_{\vec X}H^T = K_{\vec Y} - K_{\vec X\vec Y}^TK_{\vec X}^{-1}K_{\vec X\vec Y} \tag{2.42} $$

The conditional density can now be expressed as

$$ p_{\vec Y|\vec X}(\vec y|\vec x) = \frac{\exp\!\left[-\frac{1}{2}(\vec y - H\vec x)^TK_{\vec Z}^{-1}(\vec y - H\vec x)\right]}{(2\pi)^{n/2}\sqrt{\det(K_{\vec Z})}} \tag{2.43} $$

where H and $K_{\vec Z}$ are given in (2.41) and (2.42). Finally, from (2.42), $D = K_{\vec Z}^{-1}$ is

$$ D = \big[K_{\vec Y} - K_{\vec X\vec Y}^TK_{\vec X}^{-1}K_{\vec X\vec Y}\big]^{-1} \tag{2.44} $$

Similarly, from (2.38) and (2.41),

$$ C^T = -K_{\vec Z}^{-1}H = -K_{\vec Z}^{-1}K_{\vec X\vec Y}^TK_{\vec X}^{-1} \tag{2.45} $$

The matrices B, C, and D can also be derived directly from setting $K_{\vec U}K_{\vec U}^{-1} = I$ in (2.25) (see Exercise 2.16).
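The formulas (2.32) and (2.33) translate directly into a few lines of linear algebra. The sketch below (an illustrative example with an arbitrarily chosen joint covariance, not from the text) computes G and $K_{\vec V}$ from the blocks of $K_{\vec U}$ and checks them against the blocks B and C of $K_{\vec U}^{-1}$, using $G = -B^{-1}C$ and $K_{\vec V} = B^{-1}$ from (2.29).

```python
# G and K_V from (2.32)-(2.33) versus the block inverse description (2.29).
import numpy as np

# Arbitrary jointly non-singular covariance for U = (X1, X2, Y1)^T, so m = 2, n = 1.
K_U = np.array([[2.0, 0.5, 0.7],
                [0.5, 1.5, 0.3],
                [0.7, 0.3, 1.0]])
m = 2
K_X, K_XY, K_Y = K_U[:m, :m], K_U[:m, m:], K_U[m:, m:]

G = K_XY @ np.linalg.inv(K_Y)                    # (2.32)
K_V = K_X - K_XY @ np.linalg.inv(K_Y) @ K_XY.T   # (2.33)

K_U_inv = np.linalg.inv(K_U)
B, C = K_U_inv[:m, :m], K_U_inv[:m, m:]
print(np.allclose(G, -np.linalg.inv(B) @ C))     # G = -B^{-1} C, from (2.29)
print(np.allclose(K_V, np.linalg.inv(B)))        # K_V = B^{-1}, the conditional covariance
```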
In many estimation problems, we start with the formulation $\vec Y = H\vec X + \vec Z$ and want to find G and $K_{\vec V}$ in the dual formulation $\vec X = G\vec Y + \vec V$. We can use (2.32) and (2.33) directly, but the following expressions are frequently much more useful. From (2.29), we have $G = -K_{\vec V}C$, and from (2.38), we have $H = -K_{\vec Z}C^T$, or equivalently, $C = -H^TK_{\vec Z}^{-1}$. Combining, we have

$$ G = K_{\vec V}H^TK_{\vec Z}^{-1} \tag{2.46} $$

Next, we can multiply $K_{\vec V}$, as given by (2.33), by $K_{\vec V}^{-1}$ to get

$$ I = K_{\vec X}K_{\vec V}^{-1} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^TK_{\vec V}^{-1} \tag{2.47} $$

From (2.36) and (2.45), we can express $C^T$ in the following two ways:

$$ C^T = -K_{\vec Y}^{-1}K_{\vec X\vec Y}^TK_{\vec V}^{-1} = -K_{\vec Z}^{-1}H \tag{2.48} $$

The first of these expressions appears at the end of (2.47), and replacing it with the second, (2.47) becomes

$$ I = K_{\vec X}K_{\vec V}^{-1} - K_{\vec X\vec Y}K_{\vec Z}^{-1}H \tag{2.49} $$

Pre-multiplying all terms by $K_{\vec X}^{-1}$ and substituting $H^T$ for $K_{\vec X}^{-1}K_{\vec X\vec Y}$, we get the final result,

$$ K_{\vec V}^{-1} = K_{\vec X}^{-1} + H^TK_{\vec Z}^{-1}H \tag{2.50} $$

We will interpret these equations in Chapter 4 on estimation.
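Equations (2.46) and (2.50) can likewise be checked numerically. Below is a sketch using the same arbitrary example covariance as above (again an illustration, not from the text): it computes H and $K_{\vec Z}$ from (2.41) and (2.42), G and $K_{\vec V}$ from (2.32) and (2.33), and confirms both identities.

```python
# Numerical check of the dual-formulation identities (2.46) and (2.50).
import numpy as np

K_U = np.array([[2.0, 0.5, 0.7],
                [0.5, 1.5, 0.3],
                [0.7, 0.3, 1.0]])                        # same arbitrary example as before
m = 2
K_X, K_XY, K_Y = K_U[:m, :m], K_U[:m, m:], K_U[m:, m:]

G = K_XY @ np.linalg.inv(K_Y)                            # (2.32)
K_V = K_X - K_XY @ np.linalg.inv(K_Y) @ K_XY.T           # (2.33)
H = K_XY.T @ np.linalg.inv(K_X)                          # (2.41)
K_Z = K_Y - K_XY.T @ np.linalg.inv(K_X) @ K_XY           # (2.42)

print(np.allclose(G, K_V @ H.T @ np.linalg.inv(K_Z)))    # (2.46)
print(np.allclose(np.linalg.inv(K_V),
                  np.linalg.inv(K_X) + H.T @ np.linalg.inv(K_Z) @ H))   # (2.50)
```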
2.8 Summary

A zero mean n-dimensional random vector $\vec Z$ with a covariance matrix $K_{\vec Z}$ is a Gaussian random vector (equivalently, the n components are jointly Gaussian) if $\vec b^T\vec Z$ is Gaussian for all real n-vectors $\vec b$. An equivalent condition is that $\vec Z$ has the MGF in (2.11). Another equivalent condition is that $\vec Z$ is a linear transformation of n IID normalized Gaussian rv's. For the case in which $K_{\vec Z}$ is non-singular, a final equivalent condition is that $\vec Z$ has the density in (2.17). If $\vec Z$ has a singular covariance matrix, it should be viewed as a k-dimensional random vector, k < n, with a non-singular covariance matrix, plus n − k variables that are linear combinations of the first k. If $\vec Z$ has a mean $\vec m$, it is a Gaussian random vector iff the fluctuation $\vec Z - \vec m$ is a zero mean Gaussian random vector.

For a Gaussian random vector, the regions of equal probability density form concentric ellipsoids around the mean. The principal axes of these ellipsoids are the eigenvectors of the covariance matrix, and the length of each axis is proportional to the square root of the corresponding eigenvalue.

If $X_1, X_2, \ldots, X_m, Y_1, \ldots, Y_n$ are zero mean, jointly Gaussian, and have a non-singular overall covariance matrix, then the conditional density $p_{\vec X|\vec Y}(\vec x\,|\,\vec y)$ is a Gaussian density for each $\vec y$; it has the fixed covariance $K_{\vec X} - K_{\vec X\vec Y}K_{\vec Y}^{-1}K_{\vec X\vec Y}^T$ for every $\vec y$, and its mean, $K_{\vec X\vec Y}K_{\vec Y}^{-1}\vec y$, depends linearly on $\vec y$. This situation can be equivalently formulated as $\vec X = G\vec Y + \vec V$, where $\vec V$ is a zero mean Gaussian m-dimensional random vector independent of $\vec Y$. Equivalently it can be formulated as $\vec Y = H\vec X + \vec Z$, where $\vec Z$ is a zero mean Gaussian n-dimensional random vector independent of $\vec X$.

2.9 Exercises

Exercise 2.1. a) Let X, Y be IID rv's, each with density $p_X(x) = \alpha\exp(-x^2/2)$. In part (b), we show that α must be $1/\sqrt{2\pi}$ in order for $p_X(x)$ to integrate to 1, but in this part, we leave α undetermined. Let $S = X^2 + Y^2$. Find the probability density of S in terms of α.

b) Prove from part (a) that α must be $1/\sqrt{2\pi}$ in order for S, and thus X and Y, to be random variables. Show that E[X] = 0 and that E[X²] = 1.

c) Find the probability density of $R = \sqrt{S}$. R is called a Rayleigh rv.

Exercise 2.2. a) By expanding in a power series in $(1/2)s^2\sigma^2$, show that

$$ \exp\!\left(\frac{s^2\sigma^2}{2}\right) = 1 + \frac{s^2\sigma^2}{2} + \frac{s^4\sigma^4}{2(2^2)} + \cdots + \frac{s^{2k}\sigma^{2k}}{2^k\,(k!)} + \cdots $$

b) By expanding $e^{sZ}$ in a power series in sZ, show that

$$ g_Z(s) = E[e^{sZ}] = 1 + sE[Z] + \frac{s^2E[Z^2]}{2} + \cdots + \frac{s^iE[Z^i]}{(i)!} + \cdots $$

c) By matching powers of s between (a) and (b), show that for all integers k ≥ 1,

$$ E[Z^{2k}] = \frac{(2k)!\,\sigma^{2k}}{k!\,2^k} = (2k-1)(2k-3)\cdots(3)(1)\,\sigma^{2k};\qquad E[Z^{2k+1}] = 0 $$

Exercise 2.3. Let X and Z be IID normalized Gaussian random variables. Let Y = |Z| Sgn(X), where Sgn(X) is 1 if X ≥ 0 and −1 otherwise. Show that X and Y are each Gaussian, but are not jointly Gaussian. Sketch the contours of equal joint probability density.

Exercise 2.4. a) Let $X_1 \sim N(0, \sigma_1^2)$ and let $X_2 \sim N(0, \sigma_2^2)$ be independent of $X_1$. Convolve the density of $X_1$ with that of $X_2$ to show that $X_1 + X_2$ is Gaussian.

b) Combine part (a) with induction to show that all linear combinations of IID normalized Gaussian rv's are Gaussian.

Exercise 2.5. a) Let $\vec U$ be a random vector with mean $\vec m$ and covariance K whose MGF is given by (2.12). Let $X = \vec s^T\vec U$ for an arbitrary real vector $\vec s$. Show that the MGF of X is given by $g_X(r) = \exp\!\big[rE[X] + r^2\sigma_X^2/2\big]$ and relate E[X] and $\sigma_X^2$ to $\vec m$ and K.

b) Show that $\vec U$ is a Gaussian random vector.

Exercise 2.6. a) Let $\vec Z \sim N(0, K)$ be n-dimensional. By expanding in a power series in $(1/2)\vec s^TK\vec s$, show that

$$ g_{\vec Z}(\vec s) = \exp\!\left[\frac{\vec s^TK\vec s}{2}\right] = 1 + \frac{\sum_{i,j}s_is_j[K]_{i,j}}{2} + \cdots + \frac{\big(\sum_{i,j}s_is_j[K]_{i,j}\big)^m}{2^m\,m!} + \cdots $$

b) By expanding $e^{s_iZ_i}$ in a power series in $s_iZ_i$ for each i, show that

$$ g_{\vec Z}(\vec s) = E\!\left[\exp\!\left(\sum_i s_iZ_i\right)\right] = \sum_{j_1=0}^{\infty}\cdots\sum_{j_n=0}^{\infty}\frac{s_1^{j_1}}{(j_1)!}\cdots\frac{s_n^{j_n}}{(j_n)!}\,E[Z_1^{j_1}\cdots Z_n^{j_n}] $$

c) Let $D = \{i_1, i_2, \ldots, i_{2m}\}$ be a set of 2m distinct integers each between 1 and n. Consider the term $s_{i_1}s_{i_2}\cdots s_{i_{2m}}E[Z_{i_1}Z_{i_2}\cdots Z_{i_{2m}}]$ in part (b). By comparing with the set of terms in part (a) containing the same product $s_{i_1}s_{i_2}\cdots s_{i_{2m}}$, show that

$$ E[Z_{i_1}Z_{i_2}\cdots Z_{i_{2m}}] = \frac{\sum_{j_1j_2\ldots j_{2m}}[K]_{j_1j_2}[K]_{j_3j_4}\cdots[K]_{j_{2m-1}j_{2m}}}{2^m\,m!} $$

where the sum is over all permutations $(j_1, j_2, \ldots, j_{2m})$ of the set D.

d) Find the number of permutations of D that contain the same set of unordered pairs $(\{j_1, j_2\}, \ldots, \{j_{2m-1}, j_{2m}\})$. For example, ({1, 2}, {3, 4}) is the same set of unordered pairs as ({3, 4}, {2, 1}). Show that

$$ E[Z_{i_1}Z_{i_2}\cdots Z_{i_{2m}}] = \sum_{j_1,j_2,\ldots,j_{2m}}[K]_{j_1j_2}[K]_{j_3j_4}\cdots[K]_{j_{2m-1}j_{2m}} \tag{2.51} $$

where the sum is over distinct sets of unordered pairs of the set D. Note: another way to say the same thing is that the sum is over the set of all permutations of D for which $j_{2i-1} < j_{2i}$ for 1 ≤ i ≤ m and $j_{2i-1} < j_{2i+1}$ for 1 ≤ i ≤ m − 1.

e) To find $E[Z_1^{j_1}\cdots Z_n^{j_n}]$, where $j_1 + j_2 + \cdots + j_n = 2m$, construct the random variables $U_1, \ldots, U_{2m}$, where $U_1, \ldots, U_{j_1}$ are all identically equal to $Z_1$, where $U_{j_1+1}, \ldots, U_{j_1+j_2}$ are identically equal to $Z_2$, etc., and use (2.51) to find $E[U_1U_2\cdots U_{2m}]$. Use this formula to find $E[Z_1^2Z_2Z_3]$, $E[Z_1^2Z_2^2]$, and $E[Z_1^4]$.

Exercise 2.7. Let Q be an orthonormal matrix. Show that the squared distance between any two vectors $\vec z$ and $\vec y$ is equal to the squared distance between $Q\vec z$ and $Q\vec y$.

Exercise 2.8. a) Let $K = \begin{bmatrix} .75 & .25 \\ .25 & .75 \end{bmatrix}$. Show that 1 and 1/2 are eigenvalues of K and find the normalized eigenvectors. Express K as $Q\Lambda Q^{-1}$ where Λ is diagonal and Q is orthonormal.

b) Let K′ = αK for real α ≠ 0. Find the eigenvalues and eigenvectors of K′. Don't use brute force; think!

c) Consider the mth power of K, $K^m$, for m > 0. Find the eigenvalues and eigenvectors of $K^m$.

Exercise 2.9. Let X and Y be jointly Gaussian with means $m_X$, $m_Y$, variances $\sigma_X^2$, $\sigma_Y^2$, and normalized covariance ρ. Find the conditional density $p_{X|Y}(x\,|\,y)$.
Exercise 2.10. a) Let X and Y be zero mean jointly Gaussian with variances $\sigma_X^2$, $\sigma_Y^2$, and normalized covariance ρ. Let $V = Y^3$. Find the conditional density $p_{X|V}(x\,|\,v)$. Hint: This requires no computation.

b) Let $U = Y^2$ and find the conditional density $p_{X|U}(x\,|\,u)$. Hint: first understand why this is harder than part a).

Exercise 2.11. a) Let $\vec U^T = (\vec X^T, \vec Y^T)$ have a non-singular covariance matrix $K_{\vec U}$. Show that $K_{\vec X}$ and $K_{\vec Y}$ are positive definite, and thus non-singular.

b) Show that the matrices B and D in (2.25) are also positive definite and thus non-singular.

Exercise 2.12. Let $\vec X$ and $\vec Y$ be jointly Gaussian random vectors with means $\vec m_{\vec X}$ and $\vec m_{\vec Y}$, covariance matrices $K_{\vec X}$ and $K_{\vec Y}$, and cross covariance matrix $K_{\vec X\vec Y}$. Find the conditional probability density $p_{\vec X|\vec Y}(\vec x\,|\,\vec y)$. Assume that the covariance of $(\vec X^T, \vec Y^T)$ is non-singular. Hint: think of the fluctuations of $\vec X$ and $\vec Y$.

Exercise 2.13. a) Let $\vec W$ be a normalized IID Gaussian n-dimensional random vector and let $\vec Y$ be a Gaussian m-dimensional random vector. Suppose we would like the cross covariance $E[\vec W\vec Y^T]$ to be some arbitrary real-valued n by m matrix K. Find the matrix A such that $\vec Y = A\vec W$ achieves the desired cross covariance. Note: this shows that any real-valued n by m matrix is the cross covariance matrix for some choice of random vectors.

b) Let $\vec Z$ be a zero mean Gaussian n-dimensional random vector with non-singular covariance $K_{\vec Z}$, and let $\vec Y$ be a Gaussian m-dimensional random vector. Suppose we would like the cross covariance $E[\vec Z\vec Y^T]$ to be some arbitrary real-valued n by m matrix K′. Find the matrix B such that $\vec Y = B\vec Z$ achieves the desired cross covariance. Note: this shows that any real-valued n by m matrix is the cross covariance matrix for some choice of random vectors $\vec Z$ and $\vec Y$ where $K_{\vec Z}$ is given (and non-singular).

c) Now assume that $\vec Z$ has a singular covariance matrix in part b). Explain the constraints this places on possible choices for the cross covariance $E[\vec Z\vec Y^T]$. Hint: your solution should involve the eigenvectors of $K_{\vec Z}$.

Exercise 2.14. a) Let $\vec W = (W_1, W_2, \ldots, W_{2n})^T$ be a 2n-dimensional IID normalized Gaussian random vector. Let $S_{2n} = W_1^2 + W_2^2 + \cdots + W_{2n}^2$. Show that $S_{2n}$ is an nth order Erlang rv with parameter 1/2, i.e., that $p_{S_{2n}}(s) = 2^{-n}s^{n-1}e^{-s/2}/(n-1)!$. Hint: look at $S_2$ from Exercise 2.1.

b) Let $R_{2n} = \sqrt{S_{2n}}$. Find the probability density of $R_{2n}$.

c) Let $v_{2n}(r)$ be the volume of a 2n-dimensional sphere of radius r and let $b_{2n}(r)$ be the surface area of that sphere, i.e., $b_{2n}(r) = dv_{2n}(r)/dr$. The point of this exercise is to show how to calculate these quantities. By considering an infinitesimally thin spherical shell of thickness δ at radius r, show that

$$ p_{R_{2n}}(r) = b_{2n}(r)\,p_{\vec W}(\vec w)\big|_{\vec w:\,\vec w^T\vec w = r^2} $$

d) Calculate $b_{2n}(r)$ and $v_{2n}(r)$. Note that for any fixed δ ≪ r, the ratio of the volume within δ of the surface of a sphere of radius r to the total volume of the sphere approaches 1 with increasing n.

Exercise 2.15. a) Let $\vec U = (X, Y)^T$. Solve directly for B, C, and D in (2.25) for this case, and show that (2.26) agrees with (2.24).

b) Show that your solution for B, C, and D agrees with (2.35), (2.36), and (2.44).

Exercise 2.16. Express B, C, and D in terms of $K_{\vec X}$, $K_{\vec Y}$, and $K_{\vec X\vec Y}$ by multiplying the block expression for $K_{\vec U}$ by that for $K_{\vec U}^{-1}$. Show that your answer agrees with (2.35), (2.36), and (2.44).