Chapter 3

PROBABILITY AND STOCHASTIC PROCESSES

God does not play dice with the universe.
—Albert Einstein

Not only does God definitely play dice, but He sometimes confuses us by throwing them where they cannot be seen.
—Stephen Hawking

Abstract
This chapter aims to provide a cohesive overview of basic probability concepts, starting from the axioms of probability and covering events, event spaces, and joint and conditional probabilities, leading to the introduction of random variables, discrete probability distributions and probability density functions (PDFs). Then follows an exploration of random variables and stochastic processes, including cumulative distribution functions (CDFs), moments, joint densities, marginal densities, transformations and the algebra of random variables. A number of useful univariate densities (Gaussian, chi-square, non-central chi-square, Rice, etc.) are then studied in turn. Finally, an introduction to multivariate statistics is provided, including the exterior product (used instead of the conventional determinant in matrix r.v. transformations) and Jacobians of random matrix transformations, culminating with the introduction of the Wishart distribution. Multivariate statistics are instrumental in characterizing multidimensional problems such as array processing operating over random vector or matrix channels. Throughout the chapter, an emphasis is placed on complex random variables, vectors and matrices, because of their unique importance in digital communications.

1. Introduction

The theory of probability and stochastic processes is of central importance in many core aspects of communication theory, including the modeling of information sources, of additive noise, and of channel characteristics and fluctuations. Ultimately, through such modeling, probability and stochastic processes are instrumental in assessing the performance of communication systems, wireless or otherwise. It is expected that most readers will have at least some familiarity with this vast topic. If that is not the case, the brief overview provided here may be less than satisfactory. Interested readers who wish for a fuller treatment of the subject from an engineering perspective may consult a number of classic textbooks [Papoulis and Pillai, 2002], [Leon-Garcia, 1994], [Davenport, 1970].

The subject matter of this book calls for the development of a perhaps lesser-known, but increasingly active, branch of statistics, namely multivariate statistical theory. It is an area whose applications — in the field of communication theory in general, and in array processing / MIMO systems in particular — have grown considerably in recent years, mostly thanks to the usefulness and versatility of the Wishart distribution. To supplement the treatment given here, readers are directed to the excellent textbooks [Muirhead, 1982] and [Anderson, 1958].

2. Probability

Experiments, events and probabilities

Definition 3.1. A probability experiment or statistical experiment consists in performing an action which may result in a number of possible outcomes, the actual outcome being randomly determined.

For example, rolling a die and tossing a coin are probability experiments. In the first case, there are 6 possible outcomes while in the second case, there are 2.

Definition 3.2.
The sample space S of a probability experiment is the set of all possible outcomes. In the case of a coin toss, we have S = {t, h} , (3.1) 93 Probability and stochastic processes where t denotes “tails” and h denotes “heads”. Another, slightly more sophisticated, probability experiment could consist in tossing a coin five times and defining the outcome as being the total number of heads obtained. Hence, the sample space would be S = {1, 2, 3, 4, 5} . (3.2) Definition 3.3. An event E occurs if the outcome of the experiment is part of a predetermined subset (as defined by E) of the sample space S. For example, let the event A correspond to the obtention of an even number of heads in the five-toss experiment. Therefore, we have A = {2, 4} . (3.3) A single outcome can also be considered an event. For instance, let the event B correspond to the obtention of three heads in the five-toss experiment, i.e. B = {3}. This is also called a single event or a sample point (since it is a single element of the sample space). Definition 3.4. The complement of an event X consists of all the outcomes (sample points) in the sample space S that are not in event X. For example, the complement of event A consists in obtaining an odd number of heads, i.e. Ā = {1, 3, 5} . (3.4) Two events are considered mutually exclusive if they have no outcome in common. For instance, A and Ā are mutually exclusive. In fact, an event and its complement must by definition be mutually exclusive. The events C = {1} and D = {3, 5} are also mutually exclusive, but B = {3} and D are not. Definition 3.5. Probability (simple definition): If an experiment has N possible and equally-likely exclusive outcomes, and M ≤ N of these outcomes constitute an event E, then the probability of E is P (E) = M . N (3.5) Example 3.1 Consider a die roll. If the die is fair, all sample point are equally likely. Defining event Ek as corresponding to outcome / sample point k, where k ranges from 1 to 6, we have 1 P (Ek ) = , k = 1 . . . 6. 6 Given an event F = {1, 2, 3, 4}, we find P (F ) = 4 2 = , 6 3 94 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING and 2 1 = . 6 3 Definition 3.6. The sum or union of two events is an event that contains all the outcomes in the two events. P (F̄ ) = For example, C ∪ D = {1} ∪ {3, 5} = {1, 3, 5} = Ā. (3.6) Therefore the event Ā corresponds to the occurence of event C or event D. Definition 3.7. The product or intersection of two events is an event that contains only the outcomes that are common in the two events. For example and Ā ∩ B = {1, 3, 5} ∩ {3} = {3} , (3.7) {1, 2, 3, 4} ∩ Ā = {1, 2, 3, 4} ∩ {1, 3, 5} = {1, 3} . (3.8) Therefore, the intersection corresponds to the occurence of one event and the other. It is noteworthy that the intersection of two mutually exclusive events yields the null event, e. g. E ∩ Ē = ∅, (3.9) where ∅ denotes the null event. A more rigorous definition of probability calls for the statement of four postulates. Definition 3.8. Probability (rigorous definition): Given that each event E is associated with a corresponding probability P (E), Postulate 1: The probability of a given event E is such that P (E) ≥ 0. Postulate 2: The probability associated with the null event is zero, i.e. P (∅) = 0. Postulate 3: The probability of the event corresponding to the entire sample space (referred to as the certain event) is 1, i.e. P (S) = 1. Postulate 4: Given a number N of mutually exclusive events X1 , X2 , . . . 
XN , then the probability of the union of these events is given by N � � � P ∪N X = P (Xi ). i=1 i i=1 (3.10) 95 Probability and stochastic processes From postulates 1, 3 and 4, it is easy to deduce that the probability of an event E must necessarily satisfy the condition P (X) ≤ 1. The proof is left as an exercise. Twin experiments It is often of interest to consider two separate experiments as a whole. For example, if one probability experiment consists of a coin toss with event space S = {h, t}, two consecutive (or simultaneous) coin tosses constitute twin experiments or a joint experiment. The event space of such a joint experiment is therefore S2 = {(h, h), (h, t), (t, h), (t, t)} . (3.11) Let Xi (i = 1, 2) correspond to an outcome on the first coin toss and Yj (j = 1, 2) correspond to an outcome on the second coin toss. To each joint outcome (Xi , Yj ) is associated a joint probability P (Xi , Yj ). This corresponds naturally to the probability that events Xi and Yj occurred and it can therefore also be written P (Xi ∩ Yj ). Suppose that given only the set of joint probabilities, we wish to find P (X1 ) and P (X2 ), that is, the marginal probabilities of events X1 and X2 . Definition 3.9. The marginal probability of an event E is, in the context of a joint experiment, the probability that event E occurs irrespective of the other constituent experiments in the joint experiment. In general, a twin experiment has outcomes Xi , i = 1, 2, . . . , N1 , for the first experiment and Yj , j = 1, 2, . . . , N2 for the second experiment. If all the Yj are mutually exclusive, the marginal probability of Xi is given by P (Xi ) = N2 � P (Xi , Yj ) . (3.12) j=1 By the same token, if all the Xi are mutually exclusive, we have P (Yj ) = N1 � P (Xi , Yj ) . (3.13) i=1 Now, suppose that the outcome of one of the two experiments is known, but not the other. Definition 3.10. The conditional probability of an event Xi given an event Yj is the probability that event Xi will occur given that Yj has occured. 96 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The conditional probability of Xi given Yj is defined as P (Xi |Yj ) = P (Xi , Yj ) . P (Yj ) (3.14) P (Yj |Xi ) = P (Xi , Yj ) . P (Xi ) (3.15) Likewise, we have A more general form of the above two relations is known as Bayes’ rule. Definition 3.11. Bayes’ rule: Given {Y1 , . . . , YN }, a set of N mutually exclusive events whose union forms the entire sample space S, and X is any arbitrary event in S with non-zero probability (P (X) ≥ 0), then P (Yj |X) = = = P (X, Yj ) P (X) P (Yj ) P (X|Yj ) P (X) P (Yj ) P (X|Yj ) . �N j=1 P (Yj ) P (X|Yj ) (3.16) (3.17) Another important concern in a twin experiment is whether or not the occurrence of one event (Xi ) influences the probability that another event (Yj ) will occur, i.e. whether the events Xi and Yj are independent. Definition 3.12. Two events X and Y are said to be statistically independent if P (X|Y ) = P (X) and P (Y |X) = P (Y ). For instance, in the case of the twin coin toss joint experiment, the events X1 = {h} and X2 = {t} — being the potential outcomes of the first coin toss — are independent from the events Y1 = {h} and Y2 = {t}, the potential outcomes of the second coin toss. However, X1 is certainly not independent from X2 since the occurence of X1 precludes the occurence of X2 . Typically, independence results if the two events considered are generated by physically separate probability experiments (e. g. two different coins). Example 3.2 Consider a twin die toss. 
The results on one die is independent from the other since there is no physical linkage between the two dice. However, if we reformulate the joint experiment as follows: Experiment 1: the outcome is the sum of the two dice and corresponds to the event Xi , i=1, 2, . . . , 11. The corresponding sample space is SX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Probability and stochastic processes 97 Experiment 2: the outcome is the magnitude of the difference between the two dice and corresponds to the event Yj , j=1, 2, . . . , 6. The corresponding sample space is SY = {0, 1, 2, 3, 4, 5}. In this case, the two experiments are linked since they are both derived from the same dice toss. For example, if Xi = 4, then Yj can only take the values {0, 2}. We therefore have statistical dependence. From (3.14), we find the following: Definition 3.13. Multiplicative rule: Given two events X and Y , the probability that they both occur is P (X, Y ) = P (X)P (Y |X) = P (Y )P (X|Y ). (3.18) Hence, the probability that X and Y occur is the probability that one of these events occurs times the probability that the other event occurs, given that the first one has occurred. Furthermore, should events X and Y be independent, the above reduces to (special multiplicative rule): P (X, Y ) = P (X)P (Y ). 3. (3.19) Random variables In the preceding section, it was seen that a statistical experiment is any operation or physical process by which one or more random measurements are made. In general, the outcome of such an experiment can be conveniently represented by a single number. Let the function X(s) constitute such a mapping, i.e. X(s) takes on a value on the real line as a function of s where s is an arbitrary sample point in the sample space S. Then X(s) is a random variable. Definition 3.14. A function whose value is a real number and is a function of an element chosen in a sample space S is a random variable or r.v.. Given a die toss, the sample space is simply S = {1, 2, 3, 4, 5, 6} . (3.20) In a straightforward manner, we can define a random variable X(s) = s which takes on a value determined by the number of spots found on the top surface of the die after the roll. A slightly less obvious mapping would be the following � −1 if s = 1, 3, 5, X(s) = (3.21) 1 if s = 2, 4, 6, 98 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING which is an r.v. that takes on a value of 1 if the die roll is even and -1 otherwise. Random variables need not be defined directly from a probability experiment, but can actually be derived as functions of other r.v.’s. Going back to example 3.2, and letting D1 (s) = s be an r.v. associated with the first die and D2 (s) = s with the second die, we can define the r.v’s X = D1 + D2 , Y = |D1 − D2 |, (3.22) (3.23) where X is an r.v. corresponding to the sum of the dice and defined as X(sX ) = sX , where sX is an element of the sample space SX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. The same holds for Y , but with respect to sample space SY = {0, 1, 2, 3, 4, 5}. Example 3.3 Consider a series of N consecutive coin tosses. Let us define the r.v. X as being the total number of heads obtained. Therefore, the sample space of X is SX = {0, 1, 2, 3, . . . , N } . (3.24) Furthermore, we define an r.v. Y = X N , it being the ratio of the number of heads to the total number of tosses. If N = 1, the sample � space �Y corresponds to the 1 set {0, 1}. If N = 2, the sample space becomes 0, , 1 , and if N = 10, it is 2 � 1 2 � 0, 10 , 10 , . . . , 1 . 
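The behaviour described in Example 3.3 is easy to reproduce numerically. The short sketch below is an added illustration (not part of the original text) and assumes only that NumPy is available; it draws many realizations of the N-toss experiment and counts how many distinct values the ratio Y = X/N actually takes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example 3.3 by simulation: X = number of heads in N fair coin tosses, Y = X / N.
for N in (1, 2, 10, 1000):
    tosses = rng.integers(0, 2, size=(100_000, N))   # 0 = tails, 1 = heads
    X = tosses.sum(axis=1)                           # discrete r.v. on {0, 1, ..., N}
    Y = X / N                                        # ratio of heads, confined to [0, 1]
    print(f"N = {N:5d}   distinct values of Y: {len(np.unique(Y)):4d}   mean = {Y.mean():.3f}")
```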
Hence, the variable Y is always constrained between 0 and 1; however, if we let N tend towards infinity, it can take an infinite number of values within this interval (its sample space is infinite) and it thus becomes a continuous r.v. Definition 3.15. A continuous random variable is an r.v. which is not restricted to a discrete set of values, i.e. it can take any real value within a predetermined interval or set of intervals. It follows that a discrete random variable is an r.v. restricted to a finite set of values (its sample space is finite) and corresponds to the type of r.v. and probability experiments discussed so far. Typically, discrete r.v.’s are used to represent countable data (number of heads, number of spots on a die, number of defective items in a sample set, etc.) while continuous r.v.’s are used to represent measurable data (heights, distances, temperatures, electrical voltages, etc.). Probability distribution Since each value that a discrete random variable can assume corresponds to an event or some quantification / mapping / function of an event, it follows that each such value is associated with a probability of occurrence. 99 Probability and stochastic processes Consider the variable X in example 3.3. If N = 4, we have the following: x P (X = x) 0 1 2 3 4 1 16 4 16 6 16 4 16 1 16 Assuming a fair coin, the underlying sample space is made up of sixteen outcomes S = {tttt, ttth, ttht, tthh, thtt, thth, thht, thhh, httt, htth, htht, hthh, hhtt, hhth, hhht, hhhh} , (3.25) 1 and each of its sample points is equally likely with probability 16 . Only one of 1 these outcomes (tttt) has no heads; it follows that P (X = 0) = 16 . However, four outcomes have one head and three tails (httt, thtt, ttht, ttth), leading to 4 to P (X = 1) = 16 . Furthermore, there are 6 possible combinations of 2 heads 6 and 2 tails, yielding P (X = 2) = 16 . By symmetry, we have P (X = 3) = 4 1 P (X = 1) = 16 and P (X = 4) = P (X = 0) = 16 . Knowing that the number of combinations of N distinct objects taken n at a time is � � N! N = , (3.26) n n!(N − n)! the probabilities tabulated above (for N = 4) can be expressed with a single formula, as a function of x: � � 1 4 P (X = x) = , x = 0, 1, 2, 3, 4. (3.27) x 16 Such a formula constitutes the discrete probability distribution of X. Suppose now that the coin is not necessarily fair and is characterized by a probability p of getting heads and a probability 1 − p of getting tails. For arbitrary N , we have � � N P (X = x) = px (1 − p)N −x , x ∈ {0, 1, . . . , N } , (3.28) x which is known as the binomial distribution. Probability density function Consider again example 3.3. We have found that the variable X has a discrete probability distribution of the form (3.28). Figure 3.1 shows histograms of P (X = x) for increasing values of N . It can be seen that as N gets large, the histogram approaches a smooth curve and its “tails” spread out. Ultimately, if we let N tend towards infinity, we will observe a continuous curve like the one in Figure 3.1d. 100 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The underlying r.v. is of the continuous variety, and Figure 3.1d is a graphical representation of its probability density function (PDF). While a PDF cannot be tabulated like a discrete probability distribution, it certainly can be expressed as a mathematical function. In the case of Figure 3.1d, the PDF is expressed 2 − x2 1 fX (x) = √ e 2σX , 2πσX (3.29) 2 is the variance of the r.v. X. 
This density function is the allwhere σX important normal or Gaussian distribution. See the Appendix for a derivation of this PDF which ties in with the binomial distribution. 0.2 0.25 0.15 P (X = x) P (X = x) 0.2 0.15 0.1 0.1 0.05 0.05 0 0 2 1 3 4 5 6 7 8 1 9 2 3 4 5 6 7 8 x (a) N = 8 (b) N = 16 0.14 0.14 0.12 0.12 0.1 0.1 P (X = x) P (X = x) 9 10 11 12 13 14 15 16 17 x 0.08 0.06 0.08 0.06 0.04 0.04 0.02 0.02 0 1 5 10 20 15 25 30 33 x (c) N = 32 0 0 5 10 20 15 25 30 x (d) Gaussian PDF Figure 3.1. Histograms of the discrete probability function of the binomial distributions with (a) N = 8; (b) N = 16; (c) N = 32; and (d) a Gaussian PDF with the same mean and variance as the binomial distribution with N = 32. A PDF has two important properties: 1. fX (x) ≥ 0 for all x, since a negative probability makes no sense; �∞ 2. −∞ fX (x)dx = 1 which is the continuous version of postulate 3 from section 2. 101 Probability and stochastic processes It is often of interest to determine the probability that an r.v. takes on a value which is smaller than a predetermined threshold. Mathematically, this is expressed � x FX (x) = P (X ≤ x) = fX (α)dα, −∞ < x < ∞, (3.30) −∞ where fX (x) is the PDF of variable X and FX (x) is its cumulative distribution function (CDF). A CDF has three outstanding properties: 1. FX (−∞) = 0 (obvious from (3.30)); 2. FX (∞) = 1 (a consequence of property 2 of PDFs); 3. FX (x) increases in a monotone fashion from 0 to 1 (from (3.30) and property 1 of PDFs). Furthermore, the relation (3.30) can be inverted to yield dFX (x) . (3.31) dx It is also often useful to determine the probability that an r.v. X takes on a value falling in an interval [x1 , x2 ]. This can be easily accomplished using the PDF of X as follows: � x2 � x2 � x1 P (x1 < X ≤ x2 ) = fX (x)dx = fX (x)dx − fX (x)dx fX (x) = x1 −∞ = FX (x2 ) − FX (x1 ). −∞ (3.32) This leads us to the following counter-intuitive observation. If the PDF is continuous and we wish to find P (X = x1 ), it is insightful to proceed as folllows: lim P (x1 < X ≤ x2 ) x2 → x1 � x2 lim = fX (x)dx x2 → x1 x1 � x2 = fX (x1 ) lim dx x2 → x1 x1 P (X = x1 ) = = fX (x1 ) lim (x2 − x1 ) x2 → x1 = 0. (3.33) 102 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Hence, the probability that X takes on exactly a given value x1 is null. Intuitively, this is a consequence of the fact that X can take on an infinity of values and the “sum” (integral) of the associated probabilities must be equal to 1. However, if the PDF is not continuous, the above observation doesn’t necessarily hold. A discrete r.v., for example, not only has a discrete probability distribution, but also a corresponding PDF. Since the r.v. can only take on a finite number of values, its PDF is made up of Dirac impulses at the locations of the said values. For the binomial distribution, we have � N � � N fX (x) = pn (1 − p)N −n δ(x − n), n (3.34) n=0 and, for a general discrete r.v. with a sample space of N elements, we have fX (x) = N � n=1 P (X = xn )δ(x − xn ). (3.35) Moments and characteristic functions What exactly is the mean of a random variable? One intuitive answer is that it is the “most likely” outcome of the underlying experiment. However, while this is not far from the truth for well-behaved PDFs (which have a single peak at, or in the vicinity of, their mean), it is misleading since the mean, in fact, may not even be a part of the PDF’s support (i.e. its sample space or range of possible values). Nonetheless, we refer to the mean of an r.v. 
as its expected value, denoted by the expectation operator �·�. It is defined �X� = µX = � ∞ xfX (x)dx, (3.36) −∞ and it also happens to be the first moment of the r.v. X. The expectation operator bears its name because it is indeed the best “educated guess” one can make a priori about the outcome of an experiment given the associated PDF. While it may indeed lie outside the set of allowable values of the r.v., it is the quantity that will on average minimize the error between X and its a priori estimate X̂ = �X�. Mathematically, this is expressed �X� = min X̂ �� �� � � �X − X̂ � . (3.37) 103 Probability and stochastic processes Although this is a circular definition, it is insightful, especially if we expand the right-hand side into its integral representation: �X� = min X̂ � ∞ −∞ � � � � �X̂ − x� fX (x)dx. (3.38) For another angle to this concept, consider a series of N trials of the same experiment. If each trial is unaffected by the others, the outcomes are associated with a set of N independent and identically distributed (i.i.d.) r.v.’s X1 , . . . , XN . The average of these outcomes is itself an r.v. given by Y = N � Xn n=1 N (3.39) . As N gets large, Y will tend towards the mean of the Xn ’s and in the limit Y = lim N →∞ N � Xn n=1 N = �X� , (3.40) where �X� = �X1 � = . . . = �XN �. This is known in statistics as the weak law of large numbers. In general, the k th raw moment is defined as � � � ∞ Xk = xk fX (x)dx, (3.41) −∞ where the law of large numbers still holds since � ∞ � � � ∞ k k X = x fX (x)dx = zfZ (z)dz = �Z� , −∞ (3.42) −∞ where fZ (z) is the PDF of Z = X k . In the same manner, we can find the expectation of any arbitrary function Y = g(X) as follows � ∞ �Y � = �g(X)� = g(x)fX (x)dx. (3.43) −∞ One such useful function is Y = (X − µX )k where µX = �X� is the mean of X. Definition 3.16. The expectation of Y = (X − µX )k is the k th central moment of the r.v. X. 104 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING From (3.43), we have � � � k �Y � = (X − µX ) = ∞ −∞ (x − µX )k fX (x)dx. (3.44) Definition 3.17. The 2nd central moment �X − µX �2 is called the variance and its square root is the standard deviation. 2 , is given by The variance of X, denoted σX � ∞ 2 σX = (x − µX )2 fX (x)dx, (3.45) −∞ and it is useful because (like the standard deviation), it provides a measure of the degree of dispersion of the r.v. X about its mean. Expanding the quadratic form (x − µX )2 in (3.45) and integrating term-byterm, we find � � 2 σX = X 2 − 2µX �X� + µ2X � � = X 2 − 2µ2X + µ2X � � = X 2 − µ2X . (3.46) In deriving the above, we have inadvertently exploited two of the properties of the expectation operator: Property 3.1. Given the expectation of a sum, where each term may or may not involve the same random variable, it is equal to the sum of the expectations, i.e. � �X + Y � = �X� + �Y � , � � � X + X 2 = �X� + X 2 . Property 3.2. Given the expectation of the product of an r.v. by a deterministic quantity α, the said quantity can be removed from the expectation, i.e. �αX� = α �X� . These two properties stem readily from the (3.43) and the properties of integrals. Theorem 3.1. Given the expectation of a product of random variables, it is equal to the product of expectations, i.e. �XY � = �X� �Y � , if and only if the two variables X and Y are independent. 105 Probability and stochastic processes Proof. 
We can readily generalize (3.43) for the multivariable case as follows: � � ∞ ∞ ··· �g(X1 , X2 , · · · , XN )� = g (X1 , X2 , · · · , XN ) × −∞ � �� −∞� N -fold fX1 ,··· ,XN (x1 , · · · xN ) dx1 · · · dxN . (3.47) Therefore, we have �XY � = � ∞ −∞ � ∞ xyfX,Y (x, y)dxdy, (3.48) −∞ where, if and only if X and Y are independent, the density factors to yield � ∞� ∞ �XY � = xyfX (x)fY (y)dxdy −∞ −∞ � ∞ � ∞ = xfX (x)dx yfY (y)dy −∞ ∞ = �X� �Y � . (3.49) The above theorem can be readily extended to the product of N independent variables or N expressions of independent random variables by applying (3.46). Definition 3.18. The characteristic function (C. F.) of an r.v. X is defined � ∞ � jtX � φX (jt) = e = ejtX fX (x)dx, (3.50) where j = √ −∞ −1. It can be seen that the integral is in fact an inverse Fourier transform in the variable t. It follows that the inverse of (3.50) is � ∞ 1 fX (x) = φX (jt)e−jtx dt. (3.51) 2π −∞ Characteristic functions play a role similar to the Fourier transform. In other words, some operations are easier to perform in the characteristic function domain than in the PDF domain. For example, there is a direct relationship between the C. F. and the moments of a random variable, allowing the latter to be obtained without integration (if the C. F. is known). 106 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The said relationship involves evaluation of the k th derivative of the C. F. at t = 0, i. e. � � � k � k k d φX (jt) � X = (−j) . (3.52) dtk �t=0 As will be seen later, characteristic functions are also useful in determining the PDF of sums of random variables. Furthermore, since the crossing into the C. F. domain is, in fact, an inverse Fourier transform, all properties of Fourier transforms hold. Functions of one r.v. Consider an r.v. Y defined as a function of another r.v. X, i.e. Y = g(X). (3.53) If this function is uniquely invertible, we have X = g −1 (Y ) and FY (y) = P (Y ≤ y) = P (g(X) ≤ y) = P (X ≤ g −1 (y)) = FX (g −1 (y)). (3.54) Differentiating with respect to y allows us to relate the PDFs of X and Y : � � d fX g −1 (y) dy � � � � d g −1 (y) d = FX g −1 (y) −1 d (g (y)) dy � � � −1 � d g −1 (y) d = FX g (y) d (g −1 (y)) dy � −1 � � � d g (y) = fX g −1 (y) dy fY (y) = (3.55) (3.56) In the general case, the equation Y = g(X) may have more than one root. If there are N real roots denoted x1 (y), x2 (y), . . . , xN (y), the PDF of Y is given by � � N � � ∂xn (y) � �. fY (y) = fX (xn (y)) �� (3.57) ∂y � n=1 Example 3.4 Let Y = AX + B where A and B are arbitrary constants. Therefore, we have X = Y −B A and � � ∂ y−B A ∂y = 1 A 107 Probability and stochastic processes . It follows that 1 fY (y) = fX A � y−B A � . (3.58) Suppose that X follows a Gaussian distribution with a mean of zero and a 2 . Hence, variance of σX 2 fX (x) = √ − x2 1 e 2σX , 2πσX (3.59) and (y−B)2 fY (y) = √ − 2 2 1 e 2A σX . 2πσX A (3.60) 2 A2 and its mean The r.v. Y is still a Gaussian variate, but its variance is σX is B. Pairs of random variables In performing multiple related experiments or trials, it becomes necessary to manipulate multiple r.v.’s. Consider two r.v.’s X1 and X2 which stem from the same experiment or from twin related experiments. The probability that X1 < x1 and X2 < x2 is determined from the joint CDF P (X1 < x1 , X2 < x2 ) = FX1 ,X2 (x1 , x2 ). (3.61) Furthermore, the joint PDF can be obtained by derivation: fX1 ,X2 (x1 , x2 ) = ∂ 2 FX1 ,X2 (x1 , x2 ) . ∂x1 ∂x2 (3.62) Following the concepts presented in section 2 for probabilities, the PDF of the r.v. 
X1 irrespective of X2 is termed the marginal PDF of X1 and is obtained by “averaging out” the contribution of X2 to the joint PDF, i.e. � ∞ fX1 (x1 ) = fX1 ,X2 (x1 , x2 )dx2 . (3.63) −∞ According to Bayes’ rule, the conditional PDF of X1 given that X2 = x2 is given by fX1 ,X2 (x1 , x2 ) fX1 |X2 (x1 |x2 ) = . (3.64) fX2 (x2 ) 108 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Transformation of two random variables Given 2 r.v.’s X1 and X2 with joint PDF fX1 ,X2 (x1 , x2 ), let Y1 = g(X1 , X2 ), Y2 = g2 (X1 , X2 ), where g1 (X1 , X2 ) and g2 (X1 , X2 ) are 2 arbitrary singlevalued continuous functions of X1 and X2 . Let us also assume that g1 and g2 are jointly invertible, i.e. X1 = h1 (Y1 , Y2 ), X2 = h2 (Y1 , Y2 ), (3.65) where h1 and h2 are also single-valued and continuous. The application of the transformation defined by g1 and g2 amounts to a change in the coordinate system. If we consider an infinitesimal rectangle of dimensions ∆y1 ×∆y2 in the new system, it will in general be mapped through the inverse transformation (defined by h1 and h2 ) to a four-sided curved region in the original system (see Figure 3.2). However, since the region is infinitesimally small, the curvature induced by the transformation can be abstracted out. We are left with a parallelogram and we wish to calculate its area. y2 ∆x2 w ∆y1 h1 , h2 α ∆x1 ∆y2 g1 , g2 v y1 Figure 3.2. Coordinate system change under transformation (g1 , g2 ). This can be performed by relying on the tangential vectors v and w. The sought-after area is then given by A = ∆x1 ∆x2 = �v�2 �w�2 sin α∆y1 ∆y2 � � v[1] w[1] = det ∆y1 ∆y2 , v[2] w[2] (3.66) where the scaling factor is denoted � � �� � A v[1] w[1] �� � J= = det , v[2] w[2] � ∆y1 ∆y2 � (3.67) and is the Jacobian J of the transformation. The Jacobian embodies the scaling effects of the transformation and ensures that the new PDF will integrate to unity. 109 Probability and stochastic processes Thus, the joint PDF of Y1 and Y2 is given by fY1 ,Y2 (y1 , y2 ) = JfX1 ,X2 (g1−1 (y1 , y2 ), g2−1 (y1 , y2 )). From the tangential vectors, the Jacobian is defined � � �� � � �� ∂h1 (x1 ,x2 ) ∂h2 (x1 ,x2 ) � � � ∂y �� � � ∂x ∂x J = �det ∂h1 (x11,x2 ) ∂h2 (x11,x2 ) � = ��det . � � ∂x � ∂x ∂x 2 (3.68) (3.69) 2 In the above, it has been assumed that the mapping between the original and the new coordinate system was one-to-one. What if several regions in the original domain map to the same rectangular area in the new domain? This implies that the system Y1 = h1 (X1 , X2 ), (3.70) Y2 = h2 (X1 , X2 ), � � � � � � (K) (K) (1) (1) (2) (2) . Then all has several solutions x1 , x2 , x1 , x2 , . . . , x1 , x2 these solutions (or roots) contribute equally to the new PDF, i.e. fY1 ,Y2 (y1 , y2 ) = K � k=1 � � � � (k) (k) (k) (k) fX1 ,X2 x1 , x2 J x1 , x2 . (3.71) Multiple random variables Transformation of N random variables It is often the case (and especially in multidimensional signal processing problems, which constitute a main focal point of the present book) that a relatively large set of random variables must be considered collectively. It can be considered then that such a set of variables represent the state of a single stochastic process. Given an arbitrary number N of random variables X1 , . . . , XN which collectively behave according to joint PDF fX1 ,X2 ,··· ,XN (x1 , x2 , . . . , xN ), a transformation is defined by the system of equations y = g(x), (3.72) where x and y are N × 1 vectors, and g is an N × 1 vector function of vector x. 
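Before turning to the inverse mapping in this general setting, the two-variable result (3.68) can be checked numerically. The sketch below is an added illustration (not from the text; plain NumPy assumed): two independent standard Gaussians are pushed through the invertible map (Y1, Y2) = (X1 + X2, X1 − X2), whose inverse has Jacobian 1/2, and the density predicted by (3.68) is compared with an empirical estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000

def npdf(t):                     # standard Gaussian PDF
    return np.exp(-t * t / 2) / np.sqrt(2 * np.pi)

# X1, X2 independent standard Gaussians; transform to Y1 = X1 + X2, Y2 = X1 - X2.
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y1, y2 = x1 + x2, x1 - x2

# Joint PDF of (Y1, Y2) from (3.68): inverse map h(y) = ((y1+y2)/2, (y1-y2)/2),
# with |det(dh/dy)| = 1/2.
def f_y(a, b):
    return 0.5 * npdf((a + b) / 2) * npdf((a - b) / 2)

# Empirical density near the point (0.5, -0.3), estimated from a small box.
a, b, half = 0.5, -0.3, 0.05
inside = (np.abs(y1 - a) < half) & (np.abs(y2 - b) < half)
print("empirical  :", inside.mean() / (2 * half) ** 2)   # ~0.073
print("via (3.68) :", f_y(a, b))                         # ~0.073
```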
The reverse transformation may have one or more solutions, i.e. x(k) = gk−1 (y), where K is the number of solutions. k ∈ [1, 2, · · · , K], (3.73) 110 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The Jacobian corresponding to the kth solution is given by � � �� � ∂gk−1 (x) �� � Jk = �det �, � � ∂x (3.74) which is simply a generalization (with a slight change in notation) of (3.69). This directly leads to fy (y) = K � k=1 � � fx gk−1 (y) Jk . (3.75) Joint characteristic functions Given a set of N random variables characterized by a joint PDF, it is relevant to define a corresponding joint characteristic function. Definition 3.19. The joint characteristic function of a set of r.v.’s X1 , X2 , . . . , XN with joint PDF fX1 ,X2 ,··· ,XN (x1 , x2 , . . . , xN ) is given by � � � φX1 ,X2 ,··· ,XN (t1 , t2 , . . . , tN ) = ej(t1 x1 +t2 x2 +···tN xN ) = ∞ � ∞ ··· ej(t1 x1 +···tN xN ) fX1 ,··· ,XN (x1 , . . . , xN )dx1 · · · dxN . −∞ −∞ � �� � N -fold Algebra of random variables Sum of random variables Suppose we want to characterize an r.v. which is defined as the sum of two other independent r.v.’s, i.e. Y = X1 + X2 . (3.76) One way to attack this problem is to fix one r.v., say X2 , and treat this as a transformation from X1 to Y . Given X2 = x2 , we have Y = g(X1 ), (3.77) g(X1 ) = X1 + x2 , (3.78) g −1 (Y ) = Y − x2 . (3.79) where and 111 Probability and stochastic processes It follows that ∂g −1 (y) ∂y = fX1 (Y − x2 ). fY |X2 (y|x2 ) = fX1 (y − x2 ) (3.80) Furthermore, we know that fY (y) = = � ∞ �−∞ ∞ −∞ fY,X2 (y, x2 )dx2 fY |X2 (y|x2 )fX2 (x2 )dx2 which, by substituting (3.80), becomes � ∞ fY (y) = fX1 (y − x2 )fX2 (x2 )dx2 . (3.81) (3.82) −∞ This is the general formula for the sum of two independent r.v.’s and it can be observed that it is in fact a Fourier convolution of the two underlying PDFs. It is common knowledge that a convolution becomes a simple multiplication in the Fourier transform domain. The same principle applies here with characteristic functions and it is easily demonstrated. Consider a sum of N independent random variables: Y = N � (3.83) Xn . n=1 The characteristic function of Y is given by � � � � PN φY (jt) = ejtY = ejt n=1 Xn . Assuming that the Xn ’s are independent, we have �N � N N � � � jtXn � � jtXn φY (jt) = e e = φXn (jt). = n=1 n=1 (3.84) (3.85) n=1 Therefore, the characteristic function of a sum of r.v.’s is the product of the constituent C.F.’s. The corresponding PDF is obtainable via the Fourier transform � ∞ � N 1 fY (y) = φXn (jt)e−jyt dt. (3.86) 2π −∞ n=1 112 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING What if the random variables are not independent and exhibit correlation? Consider again the case of two random variables; this time, the problem is conveniently addressed by an adequate joint transformation. Let Y = g1 (X1 , X2 ) = X1 + X2 , Z = g2 (X1 , X2 ) = X2 , (3.87) (3.88) where Z is not a useful r.v. per se, but was included to allow a 2 × 2 transformation. The corresponding Jacobian is � � � 1 0 � � � = 1. J =� (3.89) −1 1 � Hence, we have fY,Z (y, z) = fX1 ,X2 (y − z, z), (3.90) and we can find the marginal PDF of Y with the usual integration procedure, i.e. � ∞ fY (y) = fY,Z (y, z)fZ (z)dz −∞ � ∞ = fX1 ,X2 (y − x2 , x2 )fX2 (x2 )dx2 . (3.91) −∞ Products of random variables A similar approach applies to products of r.v.’s. Consider the following product of two r.v.’s: Z = X1 X2 . (3.92) We can again fix X2 and obtain the conditional PDF of Z given X2 = x2 as was done for the sum of two r.v.’s. 
Given that g(X1 ) = X1 x2 and g −1 (Z) = Z x2 , this approach yields � � � � z ∂ z fZ|X2 (z|x2 ) = fX1 x2 ∂z x2 � � 1 z = fX1 . (3.93) x2 x2 Therefore, we have fZ (z) = = � ∞ fZ|X2 (z|x2 )fX2 (x2 )dx2 � � z 1 fX1 fX2 (x2 ) dx2 . x2 x2 −∞ −∞ � ∞ (3.94) 113 Probability and stochastic processes This formula happens to be the Mellin convolution of fX1 (x) and fX2 (x) and it is related to the Mellin tranform in the same way that Fourier convolution is related to the Fourier transform (see the overview of Mellin transforms in subsection 3). Hence, given a product of N independent r.v.’s Z= N � Xn , (3.95) n=1 we can immediately conclude that the Mellin transform of fZ (z) is the product of the Mellin transforms of the constituent r.v.’s, i.e. MfZ (s) = N � n=1 MfX (s) = [MfX (s)]N . (3.96) It follows that we can find the corresponding PDF through the inverse Mellin transform. This is expressed � 1 fZ (z) = MfZ (s) x−s dx, (3.97) 2π L±∞ where one particular integration path was chosen, but others are possible (see chapter 2, section 3, subsection on Mellin transforms). 4. Stochastic processes Many observable parameters that are considered random in the world around us are actually functions of time, e.g. ambient temperature and pressure, stock market prices, etc. In the field of communications, actual useful message signals are typically considered random, although this might seem counterintuitive. The randomness here relates to the unpredictability that is inherent in useful communications. Indeed, if it is known in advance what the message is (the message is predetermined or deterministic), there is no point in transmitting it at all. On the other hand, the lack of any fore-knowledge about the message implies that from the point-of-view of the receiver, the said message is a random process or stochastic process. Moreover, the omnipresent white noise in communication systems is also a random process, as well as the channel gains in multipath fading channels, as will be seen in chapter 4. Definition 3.20. Given a random experiment with a sample space S, comprising outcomes λ1 , λ2 , . . . , λN , and a mapping between every possible outcome λ and a set of corresponding functions of time X(t, λ), then this family of functions, together with the mapping and the random experiment, constitutes a stochastic process. 114 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING In fact, a stochastic process is a function of two variables; the outcome variable λ which necessarily belongs to sample space S, and time t, which can take any value between −∞ and ∞. Definition 3.21. To a specific outcome λi in a stochastic process corresponds a single time function X(t, λi ) = xi (t) called a member function or sample function of the said process. Definition 3.22. The set of all sample functions in a stochastic process is called an ensemble. We have established that for a given outcome λi , X(t, λi ) = xi (t) is a predetermined function out of the ensemble of possible functions. If we fix t to some value t1 instead of fixing the outcome λ, then X(t1 , λ) becomes a random variable. At another time instant, we find another random variable X(t2 , λ) which is most likely correlated with X(t1 , λ). It follows that a stochastic process can also be seen as a succession of infinitely many joint random variables (one for each defined instant in time) with a given joint distribution. Any set of instances for these r.v.’s constitutes one of the member functions. 
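These definitions are easy to visualize numerically. In the added sketch below (not part of the original text; NumPy assumed), each row of the array is one member function xi(t) of a random-phase sinusoid (the same process reappears as Example 3.5); fixing a row gives a deterministic sample function, while fixing a column, i.e. freezing time, gives a random variable across the ensemble.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy parametric process: X(t, lambda) = sin(2*pi*fc*t + Phi(lambda)),
# with the single underlying r.v. Phi uniform on [0, 2*pi).
fc = 1.0
t = np.linspace(0.0, 2.0, 201)                         # time axis
phi = rng.uniform(0.0, 2 * np.pi, size=5000)           # one outcome lambda per row
ensemble = np.sin(2 * np.pi * fc * t + phi[:, None])   # each row is a member function

x0 = ensemble[0]              # fixed outcome: a deterministic sample function
x_at_t1 = ensemble[:, 50]     # fixed time instant: a random variable over the ensemble
print("ensemble mean at t1:", x_at_t1.mean())   # ~0
print("ensemble var  at t1:", x_at_t1.var())    # ~1/2
```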
While this view is conceptually helpful, it is hardly practical for manipulating processes given the infinite number of individual r.v.’s required. Instead of continuous time, it is often sufficient to consider only a predetermined set of time instants t1 , t2 , . . . , tN . In that case, the set of random variables X(t1 ), X(t2 ), . . . , X(tN ) becomes a random vector x with a PDF fx (x), where x = [X(t1 ), X(t2 ), · · · , X(tN )]T . Likewise, a CDF can be defined: Fx (x) = P (X(t1 ) ≤ x1 , X(t2 ) ≤ x2 , . . . , X(tN ) ≤ xN ) . (3.98) Sometimes, it is more productive to consider how the process is generated. Indeed, a large number of infinitely precise time functions can in some cases result from a relatively simple random mechanism, and such simple models are highly useful to the communications engineer, even when they are approximations or simplifications of reality. Such parametric modeling as a function of one or more random variables can be expressed as follows: X(t, λ) = g1 (Y1 , Y2 , · · · , YN , t), (3.99) where g1 is a function of the underlying r.v.’s Y1 , Y2 , . . . , YN and time t, and λ = g2 (Y1 , Y2 , · · · , YN ), (3.100) where g2 is a function of the r.v.’s {Yn }, which together uniquely determine the outcome λ. 115 Probability and stochastic processes Since at a specific time t a process is essentially a random variable, it follows that a mean can be defined as � ∞ �X(t)� = µX (t) = x(t)fX(t) (x)dx. (3.101) −∞ Other important statistics are defined below. Definition 3.23. The joint moment � RXX (t1 , t2 ) = �X(t1 )X(t2 )� = ∞ −∞ � ∞ −∞ x1 x2 fX(t1 ),X(t2 ) (x1 , x2 )dx1 dx2 , is known as the autocorrelation function. Definition 3.24. The joint central moment µXX (t1 , t2 ) = �(X(t1 ) − µX (t1 )) (X(t2 ) − µX (t2 ))� = RXX (t1 , t2 ) − µX (t1 )µX (t2 ). is known as the autocovariance function. The mean, autocorrelation, and autocovariance functions as defined above are designated ensemble statistics since the averaging is performed with respect to the ensemble of possible functions at a specific time instant. Other joint moments can be defined in a straightforward manner. Example 3.5 Consider the stochastic process defined by X(t) = sin(2πfc t + Φ), (3.102) where fc is a fixed frequency and Φ is a uniform random variable taking values between 0 and 2π. This is an instance of a parametric description where the process is entirely defined by a single random variable Φ. The ensemble mean is � ∞ µX (t) = xfX(t) (x)dx = −∞ 2π � 0 1 = 2π = 0, sin(2πfc t + x)fΦ (x)dx � 0 2π sin(2πfc t + x)dx (3.103) 116 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING and the autocorrelation function is � ∞� ∞ RXX (t1 , t2 ) = x1 x2 fX(t1 ),X(t2 ) (x1 , x2 )dx1 dx2 = −∞ 2π � −∞ sin(2πfc t1 + φ) sin(2πfc t2 + φ)fΦ (φ)dφ �� 2π 1 cos (2πfc (t2 − t1 )) dφ 4π 0 � � 2π − cos (2πfc (t1 + t2 ) + 2φ) dφ 0 = 0 = = 1 [2π cos (2πfc (t2 − t1 )) − 0] 4π 1 cos (2πfc (t2 − t1 )) . 2 (3.104) Definition 3.25. If all statistical properties of a stochastic process are invariant to a change in time origin, i.e. X(t, λ) is statistically equivalent to X(t + T, λ), for any t, and T is any arbitrary time shift, then the process is said to be stationary in the strict sense. Stationarity in the strict sense implies that for any set of N time instants t1 , t2 , . . . , tN , the joint PDF of X(t1 ), X(t2 ), . . . , X(tN ) is identical to the joint PDF of X(t1 + T ), X(t2 + T ), . . . , X(tN + T ). Equivalently, it can be said that a process is strictly stationary if � � � � X k (t) = X k (0) , for all t, k. 
(3.105) However, a less stringent definition of stationarity is often useful, since strict stationarity is both rare and difficult to determine. Definition 3.26. If the ensemble mean and autocorrelation function of a stochastic process are invariant to a change of time origin, i.e. �X(t)� = �X(t + T )� , RXX (t1 , t2 ) = RXX (t1 + T, t2 + T ), for any t, t1 , t2 and T is any arbitrary time shift, then the process is said to be stationary in the wide-sense or wide-sense stationary. In the field of communications, it is typically considered sufficient to satisfy the wide-sense stationarity (WSS) conditions. Hence, the expression “stationary process” in much of the literature (and in this book!) usually implies WSS. Gaussian processes constitute a special case of interest; indeed, a Gaussian process that is WSS is also automatically stationary in the strict sense. This is 117 Probability and stochastic processes by virtue of the fact that all high-order moments of the Gaussian distribution are functions solely of its mean and variance. Since the point of origin t1 becomes irrelevant, the autocorrelation function of a stationary process can be specified as a function of a single argument, i.e. RXX (t1 , t2 ) = RXX (t1 − t2 ) = RXX (τ ), (3.106) where τ is the delay variable. Property 3.3. The autocorrelation function of a stationary process is an even function (symmetric about τ = 0), i.e. RXX (τ ) = �X(t1 )X(t1 − τ )� = �X(t1 + τ )X(t1 )� = RXX (−τ ). (3.107) Property 3.4. The autocorrelation function with τ = 0 yields the average energy of the process, i.e. � � RXX (0) = X 2 (t) . Property 3.5. (Cauchy-Schwarz inequality) |RXX (τ )| ≤ RXX (0), . Property 3.6. If X(t) is periodic, then RXX (τ ) is also periodic, i.e. X(t) = X(t + T ) → RXX (τ ) = RXX (τ + T ). It is straightforward to verify whether a process is stationary or not if an analytical expression is available for the process. For instance, the process of example 3.5 is obviously stationary. Otherwise, typical data must be collected and various statistical tests performed on it to determine stationarity. Besides ensemble statistics, time statistics can be defined with respect to a given member function. Definition 3.27. The time-averaged mean of a stochastic process X(t) is given by � T 2 1 MX = lim X(t, λi )dt, T →∞ T − T 2 where X(t, λi ) is a member function. Definition 3.28. The time-averaged autocorrelation function is defined 1 RXX (τ ) = lim T →∞ T � T 2 − T2 X(t, λi )X(t + τ, λi )dt. 118 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Unlike the ensemble statistics, the time-averaged statistics are random variables; their actual values depend on which member function is used for time averaging. Obtaining ensemble statistics requires averaging over all member functions of the sample space S. This requires either access to all said member functions, or perfect knowledge of all the joint PDFs characterizing the process. This is obviously not possible when observing a real-world phenomenon behaving according to a stochastic process. The best that we can hope for is access to time recordings of a small set of member functions. This makes it possible to compute time-averaged statistics. The question that arises is: can a time-averaged statistic be employed as an approximation of the corresponding ensemble statistic? Definition 3.29. Ergodicity is the property of a stochastic process by virtue of which all its ensemble statistics are equal to its corresponding time-averaged statistics. Not all processes are ergodic. 
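A quick numerical contrast may help. The sketch below is an added illustration (not from the text; NumPy assumed): the random-phase sinusoid of Example 3.5 is ergodic in the mean, since the time average of any member function matches the ensemble mean found in (3.103), whereas a process that is simply a random constant in time is not, because each member's time average sticks to its own realization.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 200.0, 20001)   # a long observation window

# Ergodic in the mean: one member of the random-phase sinusoid has time average ~ 0,
# which equals the ensemble mean computed in (3.103).
phi = rng.uniform(0.0, 2 * np.pi)
print("time average, random-phase sinusoid  :", np.sin(2 * np.pi * t + phi).mean())

# Not ergodic: X(t, lambda) = A(lambda), constant in time with A ~ U(0, 1).
# The ensemble mean is <A> = 0.5, but a single member's time average is its own A.
A = rng.uniform(0.0, 1.0)
print("time average, random-constant member :", np.full_like(t, A).mean())
print("ensemble mean of the constant process:", 0.5)
```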
It is also very difficult to determine whether a process is ergodic in the strict sense, as defined above. However, it is sufficient in practice to determine a limited form of ergodicity, i.e. with respect to one or two basic statistics. For example, a process X(t) is said to be ergodic in the mean if µX = MX . Similarly, a process is said to be ergodic in the autocorrelation if RXX = RXX . An ergodic process is necessarily stationary, although stationarity does not imply ergodicity. Joint and complex stochastic processes Given two joint stochastic process X(t) and Y (t), they are fully defined at two respective sets of time instants {t1,1 , t1,2 , . . . , t1,M } and {t2,1 , t2,2 , . . . , t2,N } by the joint PDF fX(t1,1 ),X(t1,2 ),··· ,X(t1,M ),Y (t2,1 ),Y (t2,2 ),··· ,Y (t2,N ) (x1 , x2 , . . . , xM , y1 , y2 , . . . , yN ) . Likewise, a number of useful joint statistics can be defined. Definition 3.30. The joint moment RXY (t1 , t2 ) = �X(t1 )Y (t2 )� = � ∞ −∞ � ∞ −∞ xyfX(t1 ),Y (t2 ) (x, y)dxdy is the cross-correlation function of the processes X(t) and Y (t). Definition 3.31. The cross-covariance of X(t) and Y (t) is µXY (t1 , t2 ) = �X(t1 )Y (t2 )� = RXY (t1 , t2 ) − µX (t1 )µY (t2 ). 119 Probability and stochastic processes X(t) and Y (t) are jointly wide sense stationary if �X m (t)Y n (t)� = �X m (0)Y n (0)� , for all t, m and n. (3.108) If X(t) and Y (t) are individually and jointly WSS, then RXY (t1 , t2 ) = RXY (τ ), µXY (t1 , t2 ) = µXY (τ ). (3.109) (3.110) Property 3.7. If X(t) and Y (t) are individually and jointly WSS, then RXY (τ ) = RY X (−τ ), µXY (τ ) = µY X (−τ ). The above results from the fact that �X(t)Y (t + τ )� = �X(t − τ )Y (t)�. The two processes are said to be statistically independent if and only if their joint distribution factors, i.e. fX(t1 ),Y (t2 ) (x, y) = fX(t1 ) (x)fY (t2 ) (y). (3.111) They are uncorrelated if and only if µXY (τ ) = 0 and orthogonal if and only if RXY (τ ) = 0. Property 3.8. (triangle inequality) |RXY (τ )| ≤ 1 [RXX (0) + RY Y (0)] . 2 Definition 3.32. A complex stochastic process is defined Z(t) = X(t) + jY (t), where X(t) and Y (t) are joint, real stochastic processes. Definition 3.33. The complex autocorrelation function (autocorrelation function of a complex process) is defined 1 �Z(t1 )Z ∗ (t2 )� 2 1 = �[X(t1 ) + jY (t1 )] [X(t2 ) − jY (t2 )]� 2 1 = [RXX (t1 , t2 ) + RY Y (t1 , t2 )] 2 j + [RY X (t1 , t2 ) − RXY (t1 , t2 )] . 2 RZZ (t1 , t2 ) = Property 3.9. If Z(t) is WSS (implying that X(t) and Y (t) are individually and jointly WSS), then RZZ (t1 , t2 ) = RZZ (t1 − t2 ) = RZZ (τ ). (3.112) 120 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Definition 3.34. The complex crosscorrelation function of two complex random processes Z1 (t) = X1 (t) + jY1 (t) and Z2 (t) = X2 (t) + jY2 (t) is defined 1 RZ1 Z2 (t1 , t2 ) = �Z1 (t1 )Z2∗ (t2 )� 2 1 = [RX1 X1 (t1 , t2 ) + RX2 X2 (t1 , t2 )] 2 j + [RY1 X2 (t1 , t2 ) − RX1 Y2 (t1 , t2 )] . 2 Property 3.10. If the real and imaginary parts of two complex stochastic processes Z1 (t) and Z2 (t) are individually and pairwise WSS, then 1 ∗ ∗ RZ (τ ) = �Z (t)Z2 (t − τ )� 1 Z2 2 1 1 ∗ = �Z (t + τ )Z2 (t)� 2 1 = RZ2 Z1 (−τ ). (3.113) Property 3.11. If the real and imaginary parts of a complex random process Z(t) are individually and jointly WSS, then ∗ (τ ) = RZZ (−τ ). RZZ (3.114) Linear systems and power spectral densities How does a linear system behave when a stochastic process is applied at its input? 
Consider a linear, time-invariant system with impulse response h(t) having for input a stationary random process X(t), as depicted in Figure 3.3. It is logical to assume that its output will be another stationary stochastic process Y (t) and that it will be defined by the standard convolution integral � ∞ Y (t) = X(τ )h(t − τ )dτ. (3.115) −∞ X(t) Y (t) h(t) Figure 3.3. Linear system with impulse response h(t) and stochastic process X(t) at its input. The expectation of Y (t) is then given by �� ∞ � �Y (t)� = X(τ )h(t − τ )dτ −∞ � ∞ = �X(τ )� h(t − τ )dτ, −∞ (3.116) 121 Probability and stochastic processes where, by virtue of the stationarity of X(t), the remaining expectation is actually a constant, and we have � ∞ �Y (t)� = µX h(t − τ )dτ −∞ � ∞ = µX h(τ )dτ −∞ = µX H(0), (3.117) where H(f ) is the Fourier transform of h(t). Let us now determine the crosscorrelation function of Y (t) and X(t). We have RY X (t, τ ) = �Y (t)X ∗ (t − τ )� � � � ∞ ∗ = X (t − τ ) X(t − a)h(a)da −∞ � ∞ = �X ∗ (t − τ )X(t − a)� h(a)da −∞ � ∞ = RXX (τ − a)h(a)da. (3.118) −∞ Since the right-hand side of the above is independent of t, it can be deduced that X(t) and Y (t) are jointly stationary. Furthermore, the last line is also a convolution integral, i.e. RY X (τ ) = RXX (τ ) ∗ h(τ ). (3.119) The autocorrelation function of Y (t) can be derived in the same fashion: RY Y (τ ) = �Y (t)Y ∗ (t − τ )� � � � ∞ ∗ = Y (t − τ ) X(t − a)h(a)da −∞ � ∞ = �Y ∗ (t − τ )X(t − a)� h(a)da �−∞ ∞ = RY X (a − τ )h(a)da −∞ = RY X (−τ ) ∗ h(τ ) = RXX (−τ ) ∗ h(−τ ) ∗ h(τ ) = RXX (τ ) ∗ h(−τ ) ∗ h(τ ). (3.120) Given that X(t) at any given instant is a random variable, how can the spectrum of X(t) be characterized? Intuitively, it can be assumed that there is 122 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING a different spectrum X(f, λ) for every member function X(t, λ). However, stochastic processes are in general infinite energy signals which implies that their Fourier transform in the strict sense does not exist. In the time domain, a process is characterized essentially through its mean and autocorrelation function. In the frequency domain, we resort to the power spectral density. Definition 3.35. The power spectral density (PSD) of a random process X(t) is a spectrum giving the average (in the ensemble statistic sense) power in the process at every frequency f . The PSD can be found simply by taking the Fourier transform of the autocorrelation function, i.e. � ∞ SXX (f ) = RXX (τ )e−j2πf τ dτ, (3.121) −∞ which obviously implies that the autocorrelation function can be found from the PSD SXX (f ) by performing an inverse transform, i.e. � ∞ RXX (τ ) = SXX (f )ej2πf τ df. (3.122) −∞ This bilateral relation is known as the Wiener-Khinchin theorem and its proof is left as an exercise. Definition 3.36. The cross power spectral density (CPSD) between two random processes X(t) and Y (t) is a spectrum giving the ensemble average product between every frequency component of X(t) and every corresponding frequency component of Y (t). As could be expected, the CPSD can also be computed via a Fourier transform � ∞ SXY (f ) = RXY (τ )ej2πf τ dτ. (3.123) −∞ By taking the Fourier transform of properties 3.10 and 3.11, we find the following: Property 3.12. Property 3.13. ∗ SXY (f ) = SY X (f ). ∗ SXX (f ) = SXX (f ). The CPSD between the input process X(t) and the output process Y (t) of a linear system can be found by simply taking the Fourier transform of 3.117. Thus, we have SY X (f ) = H(f )SXX (f ). 
(3.124) 123 Probability and stochastic processes Likewise, the PSD of Y (t) is obtained by taking the Fourier transform of (3.119), which yields SY Y (f ) = H(f ) ∗ H ∗ (f )SXX (f ) = |H(f )|2 SXX (f ). (3.125) It is noteworthy that, since RXX (f ) is an even function, its Fourier transform SXX (f ) is necessarily real. This is logical, since according to the definition, the PSD yields the average power at every frequency (and complex power makes no sense). It also follows that the average total power of X(t) is given by � � P = X 2 (t) = RXX (0) � ∞ = SXX (f )df. (3.126) −∞ Discrete stochastic processes If a stochastic process is bandlimited (i.e. its PSD is limited to a finite interval of frequencies) either because of its very nature or because it results from passing another process through a bandlimiting filter, then it is possible to characterize it fully with a finite set of time instants by virtue of the sampling theorem. Given a deterministic signal x(t), it is bandlimited if X(f ) = F{x(t)} = 0, for |f | > W , (3.127) where W corresponds to the highest frequency in s(t). Recall that according to the sampling theorem, x(t) can be uniquely determined by the set of its samples taken at a rate of fs ≥ 2W samples / s, where the latter inequality constitutes the Nyquist criterion and the minimum rate 2W samples / s is known as the Nyquist rate. Sampling at the Nyquist rate, the sampled signal is � � � � ∞ � k k xs (t) = x δ t− , (3.128) 2W 2W k=−∞ and it corresponds to the discrete sequence � � k x[k] = x . 2W (3.129) Sampling theory tells us that x[k] contains all the necessary information to reconstruct x(t). Since the member functions of a stochastic process are individually deterministic, the same is true for each of them and, by extension, for the process itself. Hence, a bandlimited process X(t) is fully characterized 124 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING by the sequence of random variables X[k] corresponding to a sampling set at the Nyquist rate. Such a sequence X[k] is an instance of a discrete stochastic process. Definition 3.37. A discrete stochastic process X[k] is an ensemble of discrete sequences (member functions) x1 [k], x2 [k], . . . , xN [k] which are mapped to the outcomes λ1 , λ2 , . . . , λN making up the sample space S of a corresponding random experiment. The mth moment of X[k] is �X [k]� = m � ∞ −∞ xm fX[k] (x)dx, (3.130) its autocorrelation is defined � ∞� ∞ � 1� ∗ RXX [k1 , k2 ] = X k1 X k 2 = xy ∗ fXk1 ,Xk2 (x, y)dxdy, (3.131) 2 −∞ −∞ and its autocovariance is given by µXX [k1 , k2 ] = RXX [k1 , k2 ] − �X[k1 ]� �X[k2 ]� . (3.132) If the process is stationary, then we have RXX [k1 , k2 ] = RXX [k1 − k2 ], µXX [k1 , k2 ] = µXX [k1 − k2 ] = RXX [k1 − k2 ] − µ2X . (3.133) The power spectral density of a discrete process is, naturally enough, computed using the discrete Fourier transform, i.e. SXX (f ) = ∞ � RXX [k]e−j2πf k , (3.134) k=−∞ with the inverse relationship being � 1/2 RXX [k] = SXX (f )ej2πf k df. (3.135) −1/2 Hence, as with any discrete Fourier transform, the PSD SXX (f ) is periodic. More precisely, we have SXX (f ) = SXX (f + n) where n is any integer. Given a discrete-time linear time-invariant system with a discrete impulse response h[k] = h(tk ), the output process Y [k] of this system when a process X[k] is applied at its input is given by Y [k] = ∞ � n=−∞ h[n]X[k − n], (3.136) 125 Probability and stochastic processes which constitutes a discrete convolution. 
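As a numerical aside (an added sketch, not from the book; NumPy assumed), passing discrete white noise through a short FIR filter and averaging periodograms of the output reproduces the PSD-shaping behaviour of (3.125) in its discrete-time form, which is derived next.

```python
import numpy as np

rng = np.random.default_rng(4)

# Discrete white noise X[k] (zero mean, unit variance) through an FIR filter h[k]:
# Y[k] = sum_n h[n] X[k - n], the discrete convolution (3.136).
h = np.array([0.5, 1.0, 0.5])
x = rng.standard_normal(2**20)
y = np.convolve(x, h, mode="same")

# Averaged periodogram of Y versus |H(f)|^2 * S_XX(f) with S_XX(f) = 1,
# i.e. the discrete-time counterpart of (3.125).
nfft = 1024
S_yy = np.mean(np.abs(np.fft.rfft(y.reshape(-1, nfft), axis=1)) ** 2, axis=0) / nfft
H = np.fft.rfft(h, nfft)
print(np.allclose(S_yy, np.abs(H) ** 2, rtol=0.15, atol=0.02))   # True, up to estimation noise
```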
The mean of the output can be computed as follows: = �Y [k]� � ∞ � � = h[n]X[k − n] µY n=−∞ = ∞ � h[n] �X[k − n]� n=−∞ ∞ � = µX h[n] = µX H(0), (3.137) n=−∞ where H(0) is the DC component of the system’s frequency transfer function. Likewise, the autocorrelation function of the output is given by RY Y [k] = = 1 ∗ �Y [n]Y [n + k]� 2� � ∞ ∞ � � 1 ∗ ∗ h [m]X [n − m] h[l]X[k + n − l] 2 m=−∞ l=−∞ = = ∞ � ∞ � m=−∞ l=−∞ ∞ ∞ � � m=−∞ l=−∞ h∗ [m]h[l] �X ∗ [n − m]X[k + n − l]� h∗ [m]h[l]RXX [k + m − l], (3.138) which is in fact a double discrete convolution. Taking the discrete Fourier transform of the above expression, we obtain SY Y (f ) = SXX (f ) |Hs (f )|2 , (3.139) which is exactly the same as for continuous processes, except that SXX (f ) and SY Y (f ) are the periodic PSDs of discrete processes, and Hs (f ) is the periodic spectrum of the sampled version of h(t). Cyclostationarity The modeling of signals carrying digital information implies stochastic processes which are not quite stationary, although it is possible in a sense to treat them as stationary and thus obtain the analytical convenience associated with this property. Definition 3.38. A cyclostationary stochastic process is a process with nonconstant mean (it is therefore not stationary, neither in the strict sense nor the 126 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING wide-sense) such that the mean and autocorrelation function are periodic in time with a given period T . Consider a random process S(t) = ∞ � k=−∞ a[k]g(t − kT ), (3.140) where {a[k]} is a sequence of complex random variables having a mean µA and autocorrelation function RAA [n] (so that the sequence A[k] = {a[k]} is a stationary discrete random process), and g(t) is a real pulse-shaping function considered to be 0 outside the interval t ∈ [0, T ]. The mean of S(t) is given by µS = µA ∞ � k=−∞ g(t − nT ), (3.141) and it can be seen that it is periodic with period T . Likewise, the autocorrelation function is given by RSS (t, t + τ ) = = 1 ∗ �S (t)S(t + τ )� 2� � ∞ ∞ � � 1 ∗ a [k]g(t − kT ) a[l]g(t + τ − lT ) 2 k=−∞ = = 1 g(t − kT )g(t + τ − lT ) �a∗ [k]a[l]� 2 −∞ k=−∞ ∞ � ∞ � k=−∞ −∞ It is easy to see that l=−∞ ∞ � ∞ � g(t − kT )g(t + τ − lT )RAA [l − k]. (3.142) RSS (t, t + τ ) = RSS (t + kT, t + τ + kT ), for any integer k. (3.143) The fact that such processes are not stationary is inconvenient. For example, it is awkward to derive a PSD from the above autocorrelation function, because there are 2 variables involved (t and τ ) and this calls for a 2-dimensional Fourier transform. However, it is possible to sidestep this issue by observing that averaging the autocorrelation function over one period T removes any dependance upon t. Definition 3.39. The period-averaged autocorrelation function of a cyclostationary process is defined � 1 T /2 RSS (t, t + τ )dt. R̄SS (τ ) = T −T /2 127 Probability and stochastic processes It is noteworthy that the period-averaged autocorrelation function is an ensemble statistic and should not be confused with the time-averaged statistics defined earlier. Based on this, the power spectral density can be simply defined as � ∞ � � SSS (f ) = F R̄SS (τ ) = R̄SS (τ )ej2πf τ dτ. (3.144) −∞ 5. Typical univariate distributions In this section, we examine various types of random variables which will be found useful in the following chapters. We will start by studying three fundamental distributions: the binomial, Gaussian and uniform laws. 
Because complex numbers play a fundamental role in the modeling of communication systems, we will then introduce complex r.v.’s and the associated distributions. Most importantly, we will dwell on the complex Gaussian distribution, which will then serve as a basis for deriving other useful distributions: chi-square, F-distribution, Rayleigh and Rice. Finally, the Nakagami-m and lognormal distributions complete our survey. Binomial distribution We have already seen in section 2 that a discrete random variable following a binomial distribution is characterized by the PDF � N � � N fX (x) = pn (1 − p)N −n δ(x − n). n (3.145) n=0 We can find the CDF by simply integrating the above. Thus we have � N � � N pn (1 − p)N −n δ(α − n)dα n −∞ � FX (x) = x n=0 � N � = n=0 where It follows that � x −∞ N n � pn (1 − p)N −n δ(α − n)dα = FX (x) = � � x −∞ δ(α − n)dα, (3.146) 1, if n < x 0, otherwise � �x� � � N pn (1 − p)N −n , n n=0 (3.147) (3.148) 128 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING where �x� denotes the floor operator, i.e. it corresponds to the nearest integer which is smaller or equal to x. The characteristic function is given by � � ∞� N � N φX (jt) = pn (1 − p)N −n δ(x − n)ejtx dx n −∞ n=0 � N � = n=0 N n � pn (1 − p)N −n ejtn , (3.149) which, according to the binomial theorem, reduces to φX (jt) = (1 − p + pejt )N . (3.150) Furthermore, the Mellin transform of the binomial PDF is given by � � ∞� N � N MfX (s) = pn (1 − p)N −n δ(x − n)xs−1 dx n −∞ n=0 � N � = n=0 N n � pn (1 − p)N −n ns−1 , which implies that the kth moment is � N � � � � N k X = pn (1 − p)N −n nk . n (3.151) (3.152) n=0 If k = 1, we have � N � � N pn (1 − p)N −n n �X� = n n=0 N � N! pn (1 − p)N −n (N − n)!(n − 1)! n=0 � N −1 � � � N −1 = Np pn (1 − p)N −1−n � n n� =−1 � � N −1 � where n = n − 1 and, noting that = 0, we have −1 � N −1 � � � � N −1 �X� = N p pn (1 − p)N −1−n � n � = (3.153) n =0 = N p(1 − p + p)N −1 = N p. (3.154) 129 Probability and stochastic processes Likewise, if k = 2, we have � X 2 � � N � � N = pn (1 − p)N −n n2 n n=0 = Np N −1 � � n� =0 N −1 n� � = N p (N − 1)p N −1 � � n� =0 N −1 n� � N −2 � n�� =0 � � � pn (1 − p)N −1−n (n� + 1) � N −2 n�� n� � �� �� pn (1 − p)N −2−n + N −1−n� p (1 − p) � , (3.155) where n� = n − 1 and n�� = n − 2 and, applying the binomial theorem � 2� X = N (N − 1)p2 + N p = N p(1 − p) + N 2 p2 . If follows that the variance is given by � � 2 σX = X 2 − �X�2 = N p(1 − p) + N 2 p2 − N 2 p2 = N p(1 − p) (3.156) (3.157) Uniform distribution A continuous r.v. X characterized by a uniform distribution can take any value within an interval [a, b] and this is denoted X ∼ U (a, b). Given a smaller interval [∆a, ∆b] of width ∆ such that ∆a ≥ a and ∆b ≤ b, we find that, by definition P (X ∈ [∆a , ∆b ]) = P (∆a ≤ X ≤ ∆b ) = ∆ , b−a (3.158) regardless of the position of the interval [∆a , ∆b ] within the range [a, b]. The PDF of such an r.v. is simply � 1 b−a , if x ∈ [a, b], fX (x) = (3.159) 0, elsewhere, or, in terms of the step function u(x), fX (x) = 1 [u(x − a) − u(x − b)] . b−a (3.160) 130 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The corresponding CDF is FX (x) = 0, x b−a , 1, if x < a, if x ∈ [a, b] if x > b, (3.161) which, in terms of the step function is conveniently expressed x FX (x) = [u(x − a) − u(x − b)] + u(x − b). b−a The characteristic function is obtained as follows: � b 1 jtx φX (jt) = e dx a b−a � jtx �b 1 e = b − a jt a � � 1 = ejtb − ejta . 
jt(b − a) The Mellin transform is equally elementary: � b s−1 x MfX (s) = dx a b−a � s �b 1 x = b−a s a 1 bs − as = . b−a s It follows that � � 1 bk+1 − ak+1 Xk = . b−a k+1 (3.162) (3.163) (3.164) (3.165) Gaussian distribution From Appendix 3.A and Example 3.4, we know that the PDF of a Gaussian r.v. is (x−µX )2 − 1 2 √ fX (x) = e 2σX , (3.166) 2πσX 2 is the variance and µ is the mean of X. where σX X The corresponding CDF is � x FX (x) = fX (α)dα −∞ = √ 1 2πσX � x −∞ − e α−µX 2σ 2 X dα, (3.167) 131 Probability and stochastic processes α−µ √ X, 2σX which, with the variable substitution u = becomes � x−µ √ X 2σX 1 2 FX (x) = √ e−u du π �−∞ � �� x−µ 1 √ X 1 + erf , 2� � 2σX �� |x−µX | 1 √ = 2 1 − erf 2σX � � | = 1 erfc |x−µ √ X 2 2σ if x − µX > 0, (3.168) otherwise, X where erf(·) is the error function and erfc(·) is the complementary error function. Figure 3.4, shows the PDF and CDF of the Gaussian distribution. 0.12 0.1 fX (x) 0.08 0.06 0.04 0.02 0 −15 −10 −5 0 5 10 15 x (a) Probability density function fX (x) P (X ≤ x) = FX (x) 1 0.8 0.6 0.4 0.2 0 −15 −10 −5 0 5 10 15 x (b) Cumulative density function fX (x) Figure 3.4. The PDF and CDF of the real Gaussian distribution with zero mean and a variance of 10. 132 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The characteristic function of a Gaussian r.v. with mean µX and variance 2 is σX φX (jt) = ejtu− 2 t2 σX 2 (3.169) . The k th central moment is given by � (X − µX ) k � = � ∞ (x − µX ) √ k −∞ − 1 e 2πσX where the integral can be reduced by substituting u = � (X − µX )k (x−µX )2 2σ 2 X (3.170) , (x−µX )2 2 2σX to yield �k � � ∞ k−1 2 2 � 2σX k+1 = √ 1 − (−1) u 2 e−u du, π 0 � � (3.171) which was obtained by treating separately the cases (x − µX ) > 0 and (x − µX ) < 0. It can be observed that all odd-order central moments are null. For the even moments, we have � (X − µX ) k � � 2 2 σk = √X π � (X − µX ) k ∞ u k−1 2 e−u du, if k is even, (3.172) 0 which, according to the definition of the Gamma function, is k � k 2 2 σk = √ XΓ π � k+1 2 � . (3.173) By virtue of identity (2.197), the above becomes � (X − µX ) k � k 2 2 σ k 1 · 3 · 5 · · · (k − 1) √ = √X π, k π 22 if k is even, (3.174) which finally reduces to � � (X − µX )k = � 0, k σX � k2 −1 i=1 if k is odd, (2i − 1), if k is even. (3.175) The raw moments are best obtained as a function of the central moments. For the k th raw moment, assuming k an integer, we have: � X k � � − 1 = x √ e 2πσX −∞ ∞ k (x−µX )2 2σ 2 X dx. (3.176) 133 Probability and stochastic processes This can be expressed as a function of x−µX thanks to the binomial theorem as follows: � ∞ (x−µX )2 � � − 1 2 k k X = (x − µX + µX ) √ e 2σX dx 2πσX −∞ � � � k (x−µX )2 ∞ � k − 1 2 k−n n = (x − µX ) √ µX e 2σX dx n 2πσX −∞ n=0 � � k � � � k k (X − µ ) = µk−n X X n n=0 = � k2 � � � n−1 � � k k−2n� 2n� µX σX (2n − 1). � 2n � n =0 (3.177) i=1 The Mellin transform does not follow readily because the above assumes that k is an integer. The Mellin transform variable s, on the other hand, is typically not an integer and can assume any value in the complex plane. Furthermore, it may sometimes be useful to find fractional moments. From (3.176), we have � ∞ � � s−1 � � � � s−1 X = µs−1−n (X − µX )s−1 , (3.178) X n n=0 where the summation now runs form 0 to ∞ to account for the fact that s might not be an integer. If it is an integer, terms for n > s − 1 will be nulled because the combination operator has (s − 1 − n)! at the denominator and the factorial of a negative integer is infinite. 
It follows that the above equation is true regardless of the value of s. Applying the identity (2.197), we find � ∞ � � � Γ(1 − s + n) s−1−n � X s−1 = µX (X − µX )s−1 , Γ(1 − s)n! (3.179) n=0 which, by virtue of 3.173 and bearing in mind that the odd central moments are null, becomes � � � 2n� ∞ � s−1 � � Γ(1 − s + 2n� ) s−1−2n� 2n σX 1 � √ X = µ Γ n + . (3.180) Γ(1 − s)(2n� )! X 2 π � n =0 The Gamma function and the factorial in 2n� can be expanded by virtue of property 2.68 to yield � � � � � � 2n� ∞ � s−1 � � Γ n� + 1−s Γ n + 1 − 2s 2n σX 2 X = . (3.181) Γ(1 − s)n� ! � n =0 134 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Expanding Γ(1 − s) in the same way allows the formation of two Pochhammer functions by combination with the numerator Gamma functions; this results in � ∞ � 1−s � � s � � s−1 � 1 − � 2 n 2 n� n� 2n� X = 2 σX n! n� =0 � � 1−s s 2 = 2 F0 , 1 − ; 2σX . (3.182) 2 2 A unique property of this distribution is that a sum of Gaussian r.v.’s is itself a Gaussian variate. Consider Y = N � (3.183) Xi , i=1 where each Xi is a Gaussian r.v. with mean µi and variance σi2 . The corresponding characteristic function is φY (jt) = N � ejtµi − t2 σi2 2 i=1 = �N � jtµi e �� N � −t e i=1 i=1 P t2 PN 2 jt N µ − i=1 i i=1 σi 2 = e 2σ2 i 2 � (3.184) , which, of course, is the C. F. of a Gaussian r.v. with mean � 2 ance N i=1 σi . �N i=1 µi and vari- Complex Gaussian distribution In communications, quantities (channel gains, data symbols, filter coefficients) are typically complex thanks in no small part to the widespread use of complex baseband notation (see chapter 4). It follows that it is useful to handle complex r.v.’s. This is simple enough to do, at least in principle, given that a complex r.v. comprises two jointly-distributed real r.v.’s corresponding to the real and imaginary part of the complex quantity. Hence, a complex Gaussian variate is constructed from two jointly distributed real Gaussian variates. Consider Z = A + jB, (3.185) 135 Probability and stochastic processes where A and B are real Gaussian variates. It was found in [Goodman, 1963] that such a complex normal r.v. has desirable analytic properties if � � � � (A − µA )2 = (B − µB )2 , (3.186) �(A − µA )(B − µB )� = = 0. (3.187) The latter implies that A and B are uncorrelated and independent. Furthermore, the variance of Z is defined � � |Z − µZ |2 σZ2 = = �(Z − µZ )(Z − µZ )∗ � , (3.188) where µZ = µA + jµB . Hence, we have σZ2 = �[(A − µA ) + j(B − µB )] [(A − µA ) − j(B − µB )]� � � � � = (A − µA )2 + (B − µB )2 2 2 = σA + σB 2 = 2σA . (3.189) It follows that A and B obey the same marginal distribution which is − 1 fA (a) = √ e 2πσA (a−µA )2 2σ 2 A (3.190) . Hence, the joint distribution of A and B is fA,B (a, b) = = − 1 e 2πσA σB 1 − 2 e 2πσA (a−µA )2 2σ 2 A − e (b−µB )2 2σ 2 B (a−µA )2 +(b−µB )2 2σ 2 A , (3.191) which directly leads to 1 − fZ (z) = e πσZ2 (z−µZ )(z−µZ )∗ σ2 Z . (3.192) Figure 3.5 shows the PDF fZ (z) plotted as a surface above the complex plane. While we can conveniently express many complex PDFs as a function of a single complex r.v., it is important to remember that, from a formal standpoint, a complex PDF is actually a joint PDF since the real and imaginary parts are separate r.v.’s. Hence, we have fZ (z) = fA,B (a, b) = fRe{Z},Im{Z} (Re{z}, Im{z}). (3.193) 136 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING 0.3 fZ (z) 2 0.2 0.1 1 0 −2 0 −1 −1 0 Im{z} Re{z} 1 2 −2 Figure 3.5. 2 The bidimensional PDF of unit variance Z (σZ = 1) on the complex plane. 
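As a numerical aside, the complex Gaussian construction above is readily verified by simulation. The NumPy sketch below (the mean, variance and sample size are arbitrary values chosen for the example) draws Z = A + jB with independent, equal-variance real and imaginary parts and checks that the mean-square deviation equals σZ² = 2σA², that the two parts are uncorrelated, and that the density (3.192) agrees with the empirical fraction of samples falling near a chosen point of the complex plane.

```python
import numpy as np

# Illustrative sketch: Z = A + jB with independent real Gaussian parts of
# equal variance var_A = var_B, so that var_Z = 2 * var_A.  Values are arbitrary.
rng = np.random.default_rng(1)
mu_Z, var_A, N = 1.0 + 0.5j, 0.5, 200_000
A = mu_Z.real + np.sqrt(var_A) * rng.standard_normal(N)
B = mu_Z.imag + np.sqrt(var_A) * rng.standard_normal(N)
Z = A + 1j * B

# <|Z - mu_Z|^2> should equal var_Z = 2 * var_A, and Re/Im are uncorrelated.
var_Z = 2 * var_A
print(np.mean(np.abs(Z - mu_Z)**2), var_Z)
print(np.mean((A - mu_Z.real) * (B - mu_Z.imag)))        # close to 0

# Density (3.192) at a test point z0 vs. the fraction of samples in a small
# square of side eps centred on z0, divided by the square's area.
z0, eps = mu_Z + 0.3 - 0.2j, 0.05
f_theory = np.exp(-np.abs(z0 - mu_Z)**2 / var_Z) / (np.pi * var_Z)
inside = (np.abs(Z.real - z0.real) < eps / 2) & (np.abs(Z.imag - z0.imag) < eps / 2)
print(f_theory, inside.mean() / eps**2)                  # should roughly agree
```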
The fact that there are two underlying r.v.’s becomes inescapable when we consider the CDF of a complex r.v. Indeed, the event Z ≤ z is ambiguous. Therefore, the CDF must be expressed as a joint CDF of the real and imaginary parts, i.e. FRe{Z},Im{Z} (a, b) = P (Re{Z} ≤ a, Im{Z} ≤ b), (3.194) which, provided that the CDF is differentiable along its two dimensions, can lead us back to the complex PDF according to fZ (z) = fRe{Z},Im{Z} (a, b) = ∂2 F (a, b). ∂a∂b Re{Z},Im{Z} (3.195) Accordingly, the C. F. of a complex r.v. Z is actually a joint C.F. defined by � � φRe{Z},Im{Z} (jt1 , jt2 ) = ejt1 Re{Z}+jt2 Im{Z} . (3.196) In the case of a complex Gaussian variate, we have 2 φRe{Z},Im{Z} (jt1 , jt2 ) = ej(t1 Re{Z}+t2 Im{Z}) e−σZ (t1 +t2 ) . (3.197) Rayleigh distribution The Rayleigh distribution characterizes the amplitute of a narrowband noise process n(t), as originally derived by Rice [Rice, 1944], [Rice, 1945]. In the context of wireless communications, however, the Rayleigh distribution is most often associated with multipath fading (see chapter 4). 137 Probability and stochastic processes Consider that the noise process n(t) is made up of a real and imaginary part (as per complex baseband notation — see chapter 4): N (t) = A(t) + jB(t), (3.198) and that N (t) follows a central (zero-mean) complex Gaussian distribution. It follows that if N (t) is now expressed in phasor notation, i.e. N (t) = R(t)ejΘ(t) , (3.199) where R(t) is the modulus (amplitude) and Θ(t) is the phase, their respective PDFs can be derived through transformation techniques. We have 2 2 1 − a σ+b 2 N fA,B (a, b) = , (3.200) 2 e πσN 2 = 2σ 2 = 2σ 2 and the transformation is where σN A B � R = g1 (A, B) = A2 + B 2 , R ∈ [0, ∞], � � B Θ = g2 (A, B) = arctan , Θ ∈ [0, 2π), A (3.201) (3.202) being the conversion from cartesian to polar coordinates. The inverse transformation is A = g1−1 (R, Θ) = R cos Θ, B = g2−1 (R, Θ) = R sin Θ, A ∈ [−∞, ∞] B ∈ [−∞, ∞]. Hence, the Jacobian is given by � ∂R cos Θ ∂R sin Θ � � ∂R ∂R J = det ∂R cos Θ ∂R sin Θ = det ∂Θ ∂Φ = R(cos Θ + sin Θ) = R. 2 2 cos Θ sin Θ −R sin Θ R cos Θ (3.203) (3.204) � (3.205) It follows that 2 r − σr2 N u(r) [u(θ) − u(θ − 2π)] , fR,Θ (r, θ) = 2 e πσN and � 1 [u(θ) − u(θ − 2π)] , 2π 0 2 � 2π � 2π −r re σn2 fR (r) = fR,Θ (r, θ)dθ = dθ 2 u(r) πσN 0 0 fΘ (θ) = ∞ fR,Θ (r, θ)dr = (3.206) (3.207) 2 = 2r − σr2 N u(r). 2 e σN (3.208) 138 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Hence, the modulus R is Rayleigh-distributed and the phase is uniformly distributed. Furthermore, the two variables are independent since fR,Θ (r, θ) = fR (r)fΘ (θ). The Mellin tranform of fR (r) is � � � � s−1 s−1 Γ 1+ MfR (s) = Rs−1 = σN , (3.209) 2 and the kth central moment is given by � � � � k k k R = σN Γ 1 + . 2 (3.210) Consequently, the mean is �R� = σN Γ and the variance is (3.211) � � � (R − µR )2 = R2 − µ2R � � 2 π σN 4−π 2 2 = σN Γ(2) − = σN . 4 4 2 = σR The CDF is simply � � √ 3 σN π = , 2 2 � FR (r) = P (R < r) = � 0 r 2 (3.212) 2 − r2 2α − σα2 σ N dα = 1 − e N . e 2 σN (3.213) Figure 3.6 depicts typical PDFs and CDFs for Rayleigh-distributed variables. Finally, the characteristic function is given by � � � � � 2 t2 π σN − σN σN t 4 √ φR (jt) = 1 − te erfi −j , (3.214) 2 2 2 where erfi(x) = erf(jx) is the imaginary error function. Rice distribution If a signal is made up of the addition of a pure sinusoid and narrowband noise, i.e. 
its complex baseband representation is S(t) = C + A(t) + jB(t), (3.215) where C is a constant and N (t) = A(t) + jB(t) is the narrowband noise (as characterized in the previous subsection), then the envelope R(t) follows a Rice distribution. 139 Probability and stochastic processes 0.8 2 =1 σN fR (r) 0.6 2 =2 σN 0.4 2 =4 σN 0.2 0 0 2 4 6 8 10 r (a) Probability density function fR (r) 1 P (R ≤ r) = FR (r) 0.8 2 =1 σN 0.6 2 =2 σN 0.4 2 =3 σN 2 =4 σN 0.2 0 2 =6 σN 0 2 4 6 8 10 r (b) Cumulative density function FR (r) 2 Figure 3.6. ` 4−π ´ The PDF and CDF of the Rayleigh distribution with various variances σR = 2 σN 4 . We note that Re{S(t)} = C + A(t), Im{S(t)} = B(t) and R(t) = � (C + A(t))2 + B(t)2 � � B(t) Θ(t) = arctan . C + A(t) (3.216) (3.217) The inverse transformation is given by A(t) = R(t) cos Θ(t) − C B(t) = R(t) sin Θ(t) (3.218) (3.219) It follows that the Jacobian is identical to that found for the Rayleigh distribution, i.e. J = R. 140 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Therefore, we have fR,Θ (r, θ) = = r − 2 e πσN (r cos θ−C)2 +r 2 sin2 θ σ2 N r −r 2 e πσN 2 −2rC cos θ+C 2 σ2 N (3.220) , and the marginal distribution fR (r) is given by � 2π 2 θ+C 2 r − r −2rCσcos 2 N fR (r) = dθ 2 e πσN 0 2 2 � 2π 2rC cos θ r − r σ+C 2 2 N = e e σN dθ, 2 πσN 0 (3.221) (3.222) where the remaining integral can be reduced (by virtue of the definition of the modified Bessel function of the first kind) to yield � � 2 2 r − r σ+C 2rC 2 N 2I0 fR (r) = 2 e u(r). (3.223) 2 σN σN Finding the marginal PDF of Θ is a bit more challenging: � ∞ 2 θ+C 2 r − r −2rCσcos 2 N fΘ (θ) = e dr 2 πσN 0 2 = to − C2 σ N e 2 πσN � ∞ −r re 2 −2rC cos θ σ2 N (3.224) (3.225) dr. 0 According to [Prudnikov et al., 1986a, 2.3.15-3], the above integral reduces 2 fΘ (θ) = − C2 e σ N 2π exp � C 2 cos2 θ 2 2σN � D−2 � √ 2C cos θ − σN � (3.226) , where we can get rid of D−2 (z) (the parabolic cylinder function of order -2) by virtue of the following identity [Prudnikov et al., 1986b, App. II.8]: � � � z2 π z2 z D−2 (z) = ze 4 erfc √ + e− 4 (3.227) 2 2 to finally obtain 2 fΘ (θ) = − C2 e σ N 2π �√ πC cos(θ) C e σN × [u(θ) − u(θ − 2π)] . 2 cos2 θ σ2 N � C cos θ erfc − σN � +1 � (3.228) 141 Probability and stochastic processes In the derivation, we have assumed (without loss of generality and for analytical convenience) that the constant term was real where in fact, the constant term can have a phase (Cejθ0 ). This does not change the Rice amplitude PDF, but displaces the Rice phase distribution so that it is centered at θ0 . Therefore, in general, we have 2 fΘ (θ) = − C2 � √ C 2 cos2 (θ−θ ) 0 πC cos(θ − θ0 ) σ2 N × (3.229) 1+ e 2π σN � �� C cos (θ − θ0 ) erfc − [u(θ) − u(θ − 2π)] . σN e σ N It is noteworthy that the second raw moment (a quantity of some importance since it reflects the average power of Rician process R(t)) can easily be found without integration. Indeed, we have from (3.216) � 2� � � R = (C + A)2 + B 2 � � = C 2 + 2AC + A2 + B 2 � � � � = C 2 + A2 + B 2 2 = C 2 + σN . (3.230) One important parameter associated with Rice distributions is the so-called K factor which constitutes a measure of the importance of the random portion of the variable (N (t) = A + jB) relative to the constant factor C. It is defined simply as C2 K= 2 . (3.231) σN Figure (3.7) shows typical PDFs for R and Θ for various values of the K factor, as well as their geometric relation with the underlying complex � Gaussian � variate. For the PDF of R, the K factor was varied while keeping R2 = 1. 
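As a numerical aside, the Rice construction is readily verified by simulation. The NumPy sketch below (the values of C, σN² and the sample size are arbitrary choices for the example) forms the envelope R = |C + A + jB| from a complex Gaussian noise term, checks the second raw moment C² + σN² of (3.230) along with the K factor, and compares the amplitude density (3.223) with a histogram of the samples.

```python
import numpy as np

# Illustrative sketch: envelope of a constant C plus narrowband complex noise
# of variance var_N (real and imaginary parts each of variance var_N / 2).
rng = np.random.default_rng(2)
C, var_N, N = 2.0, 1.0, 500_000
A = np.sqrt(var_N / 2) * rng.standard_normal(N)
B = np.sqrt(var_N / 2) * rng.standard_normal(N)
R = np.abs(C + A + 1j * B)

# Second raw moment <R^2> = C^2 + var_N, eq. (3.230), and K = C^2 / var_N.
print(np.mean(R**2), C**2 + var_N, "K =", C**2 / var_N)

# Rice amplitude PDF (3.223) against an empirical histogram at a few points.
r = np.linspace(0.5, 4.0, 6)
f_rice = (2 * r / var_N) * np.exp(-(r**2 + C**2) / var_N) * np.i0(2 * r * C / var_N)
hist, edges = np.histogram(R, bins=200, range=(0, 5), density=True)
f_emp = np.interp(r, 0.5 * (edges[:-1] + edges[1:]), hist)
print(f_rice.round(3), f_emp.round(3))                   # should agree closely
```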
Central chi-square distribution Consider the r.v. Y = N � Xn2 , (3.232) n=1 where {X1 , X2 , . . . , XN } form a set of i.i.d. zero-mean real Gaussian variates. Then Y is a central chi-square variate with N degrees-of-freedom. This distribution is best derived in the characteristic function domain. We start with the case N = 1, i.e. Y = X 2, (3.233) 142 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING �{Z} r C θ0 θ Cejθ 0 �{Z} 0 (a) Geometry of the Rice amplitude and phase distributions in the complex plane 1.75 1.75 1.5 K = 10 K=2 1.25 K=2 1 fR (r) fΘ (θ) K = 10 1.5 1.25 K=1 0.75 0.5 K= 0.25 1 0.25 0 −2 −1 0 1 θ − θ0 (radians) 2 K=1 0.75 0.5 1 16 0 −3 K= 1 16 3 0 0.5 1 1.5 2 2.5 r(m) (b) Rice phase PDF (d) Rice amplitude PDF Figure 3.7. The Rice distributions and their relationship with the complex random variable Z = (C cos(θ) + A) + j (C sin(θ) + B); (a) the distribution of Z in the complex plane (a non-central complex Gaussian distribution pictured here with concentric isoprobability curves) and relations with variables R and θ; (b) PDF of R; (c) PDF of Θ. where the C. F. is simply � � � 2 ejtY = ejtX � ∞ 2 2 − x2 ejtx 2σ √ = e X dx 2πσX −∞ « � � ∞ „ jt− 12 x2 2 1 2σ X = e dx. π σX 0 φY (jt) = � (3.234) 143 � � The integral can be solved by applying the substitution u = − jt − 2σ12 x2 X which leads to � ∞ −u 1 1 e � √ du φY (jt) = √ u 1 2πσX 0 − jt 2 2σ � ∞ −u 1 e √ du. = � (3.235) u 2 ) 0 π(1 − 2jtσX Probability and stochastic processes According to the definition of the Gamma function, the latter reduces to � � � � Γ 12 2 −1/2 φY (jt) = � = 1 − 2jtσX . (3.236) 2 ) π(1 − 2jtσX Going back now to the general case of N degrees of freedom, we find that φY (jt) = N � � n=1 2 1 − 2jtσX �−1/2 � � 2 −N/2 = 1 − 2jtσX . (3.237) The PDF can be obtained by taking the transform of the above, i.e. � ∞ � � 1 2 −N/2 −jty fY (y) = 1 − 2jtσX e dt, (3.238) 2π −∞ which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to fY (y) = � 2 2σX 1 �N/2 2 Γ (N/2) y N/2−1 e−y/2σX u(y). (3.239) The shape of the chi-square PDF for 1 to 10 degrees-of-freedom is shown in Figure 3.8. It should be noted that for N = 2, the chi-square distribution reduces to the simpler exponential distribution. The chi-square distribution can also be derived from complex Gaussian r.v.’s. Let N � Y = |Zn |2 , (3.240) n=1 where {Z1 , Z2 , . . . , ZN } are i.i.d. complex Gaussian variates with variance σZ2 . If we start again with the case N = 1, we find that the C.F. is � � � � � � � �−1 2 2 2 , (3.241) φY (jt) = ejtY = ejt|Z1 | = ejt(A +B ) = 1 − jtσZ2 144 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING 2 N = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 fY (y) 1.5 1 0.5 0 0 4 2 6 8 10 y Figure 3.8. The chi-square PDF for values of N between 1 and 10. where A = �{Z} and B = �{Z}. Hence, is is a 2 degrees of freedom chi2 = 2σ 2 . square variate which can be related to (3.237) by noting that σZ2 = 2σA B It follows that in the general case of N variates, we have � �−N φY (jt) = 1 − jtσZ2 , (3.242) which naturally leads to f y(y) = 1 y (σZ2 )N Γ(N ) 2 N −1 −y/σZ e u(y). (3.243) which is a 2N degrees-of-freedom X 2 variate. Equivalently, it can also be said that this r.v. has N complex degrees-of-freedoms. The k th raw moment is given by � ∞ � � 1 2 k Y = y N −1+k e−y/σZ dy 2 N (σZ ) Γ(N ) 0 � ∞ 2 (σZ )N −1+k = uN −1+k e−u du (σZ2 )N −1 Γ(N ) 0 = (σZ2 )k Γ(N + k). 
Γ(N ) (3.244) It follows that the mean is µY = �Y � = 2 σX Γ(N + 1) = N σZ2 , Γ(N ) (3.245) 145 Probability and stochastic processes and the variance is given by � � σY2 = (Y − µY )2 � � = Y 2 − �Y �2 σZ4 Γ(N + 2) − N 2 σZ4 Γ(N ) = σZ4 (N + 1)N − N 2 σZ4 = σZ4 N. = (3.246) Non-central chi-square distributions The non-central chi-square distribution characterizes a sum of squared nonzero-mean Gaussian variates. Let Y = N � Xn2 , (3.247) n=1 where {X1 , X2 , . . . , XN } form a set of independant real Gaussian r.v.’s with 2 and means {µ , µ , . . . , µ } and equal variance σX 1 2 N N � n=1 |µn | = � 0, (3.248) i.e. at least one Gaussian r.v. has a non-zero mean. As a first step in deriving the distribution, consider Yn = Xn2 , (3.249) where −(x−µ )2 n 1 2 fXn (x) = √ e 2σX . (3.250) 2πσX √ According to (3.57) and since Xn = ± Yn , we find that � −(√y−µ )2 � √ −( y−µn )2 n 1 1 2σ 2 2σ 2 X X fYn (y) = √ e +e √ 2 y 2πσX � � √ √ µ2 yµn yµ − y2 − n2 − 2n 1 2σ 2σ σ2 σ √ = √ e Xe X e X +e X 2 2πσX y �√ � µ2 − y2 − n2 yµn 1 2σ 2σ = √ . (3.251) √ e X e X cosh 2 σX 2πσX y 146 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING To find the characteristic function, we start by expanding the hyperbolic cosine into an infinite series: � � φYn (jt) = ejtYn � ∞ ejtα 2 2 2 √ = √ e−α/2σX e−µn /2σX × 2πσX α 0 ∞ � (αµ2 )k � 1 �k n , (3.252) 4 (2k)! σX k=0 Applying the identity (2k)! = � − � e ejtYn = √ µn 2σ 2 X 2σX � 2k 2 √ Γ π k+ 1 2 � Γ(k + 1), the above becomes � 4 �−k � ∞ ∞ � − α2 µ2k n 4σX jtα k− 21 2σ X α � � e e dα, 1 Γ k + 2 k! 0 k=0 (3.253) where the resulting integral can be solved in a manner similar to that of subsection 6 to give � − � ejtYn = e √ µn 2σ 2 X 2σX ∞ � µ2k 2 )k+1/2 (2σX 2 t)k+1/2 k (1 − j2σX n k=0 2 e−µn /2σX = 2 t)1/2 (1 − j2σX e � �k µ2 n /2 1−j2σ 2 t X jtµ2 n 1−j2σ 2 t X e . 2 t)1/2 (1 − j2σX = 1 2 4σX (3.254) In the general case, the C.F. is given by � φY (jt) = e jtY � � = e jt PN n=1 Yn Substituting (3.253) in (3.255) yields φY (jt) = N � � = N � � ejtYn n=1 � (3.255) jtµ2 n 1−j2tσ 2 t X e 2 t)1/2 (1 − jt2σX n=1 jtU 2 = where e 1−jt2σX t , 2 t)N/2 (1 − jt2σX U= N � n=1 µ2n . (3.256) (3.257) 147 Probability and stochastic processes To find the corresponding PDF by taking the Fourier transform, we once again resort to an infinite series expansion, i.e. � ∞ 1 fY (y) = e−jyt φY (jt)dt 2π −∞ jtU � ∞ −jyt 1−j2σ 2 t X 1 e e = dt 2 t)N/2 2π −∞ (1 − j2σX 2 � U/2σX 1 −U/2σ2 ∞ e−jyt 1−j2σ 2 t X X dt = e e 2 N/2 2π −∞ (1 − j2σX t) � � 2 � �k � ∞ U/ 2σX 1 −U/2σ2 � 1 ∞ e−jyt X = e dt 2 t)N/2 1 − j2σ 2 t 2π k! −∞ (1 − j2σX X k=0 � 2�k� ∞ ∞ ] 1 −U/2σ2 � [U/ 2σX e−jyt X = e dt 2 t)N/2+k 2π k! (1 − j2σ −∞ X k=0 (3.258) which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to � 2�k 2 ∞ � [U/ 2σX ] y N/2+k−1 e−y/(2σX ) 2 −U/2σX fY (y) = e u(y). 2 )N/2+k Γ( N + k) k! (2σX 2 k=0 (3.259) We note, however, that � 4�k k ∞ � [U/ 4σX ] y k=0 k!Γ( N2 + k) = 0 F1 � N 2 which leads directly to y N/2−1 e−y/(2σX ) 0 F1 2 )N/2 (2σX 2 2 −U/2σX fY (y) = e � � � � Uy � � 4σ 4 , (3.260) X N 2 � � � Uy � � 4σ 4 . (3.261) X From (3.261), it is useful to observe that � 2�k ∞ � � − U2 � [U/ 2σX ] 2 2σ fY (y) = e X gN +2k y, σX , k! (3.262) k=0 2 ) is the PDF of a central chi-square variate with N + 2k where gN +2k (y, σX 2 . This formulation of a non-central degrees-of-freedom and a variance of σX χ2 PDF as a mixture of central χ2 PDFs is often useful to extend results from the central to the non-central case. 148 SPACE-TIME METHODS, VOL. 
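As a numerical aside, the mixture representation (3.262) also provides a convenient way to generate non-central chi-square samples. The NumPy sketch below (the means, variance and sample size are arbitrary values chosen for the example) draws the same variate in two ways, directly as a sum of squared non-zero-mean Gaussians and as a Poisson-indexed central chi-square variate, and compares their first two moments.

```python
import numpy as np

# Illustrative sketch of the mixture form (3.262): a non-central chi-square
# sample can be drawn by first picking a Poisson index k of mean U/(2*var_X)
# and then drawing a central chi-square with N + 2k degrees-of-freedom,
# scaled by var_X.  All numerical values below are arbitrary.
rng = np.random.default_rng(5)
N_dof, var_X, Ns = 4, 1.5, 300_000
mu = np.array([1.0, -0.5, 2.0, 0.0])          # hypothetical means; U = sum of mu_n^2
U = np.sum(mu**2)

# Direct construction: Y = sum_n X_n^2 with X_n ~ N(mu_n, var_X).
X = mu + np.sqrt(var_X) * rng.standard_normal((Ns, N_dof))
Y_direct = np.sum(X**2, axis=1)

# Mixture construction: K ~ Poisson(U / (2 var_X)), then a central chi-square
# with N + 2K degrees-of-freedom, scaled by var_X.
K = rng.poisson(U / (2 * var_X), size=Ns)
Y_mix = var_X * rng.chisquare(N_dof + 2 * K)

for Y in (Y_direct, Y_mix):                   # first two moments should agree
    print(np.mean(Y).round(3), np.var(Y).round(3))
```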
1: SPACE-TIME PROCESSING Like in the central case, the non-central χ2 distribution can be defined from complex normal r.v.’s. Given the sum of the moduli of N squared complex normal r.v.’s with the same variance σZ2 and means {µ1 , µ2 , . . . , µN }, the PDF is given by fY (y) = − e U σ2 Z y N −1 σZ2N − e y σ2 Z 0 F1 � It follows that the corresponding C.F. is � � �Uy � N � 4 u(y). σZ (3.263) jtU 2 φY (jt) = � e 1−jσZ t 1 − jσZ2 t �N . (3.264) From (3.262) and (3.243), the kth raw moment is � Y k � ∞ � [U/σZ2 ]m Γ (N + m + k) = e m! Γ (N + m) m=0 � � � 2 N + k �� U Γ(N + k) −U/σZ 2k = e σ Z 1 F1 , N � σZ2 Γ(N ) − U σ2 Z σZ2k which, by virtue of Kummer’s identity, becomes � � � � � U Γ(N + k) −k �� k 2k Y = σ Z 1 F1 − . N � σ2 Γ(N ) (3.265) (3.266) Z A most helpful fact at this point is that a hypergeometric function with a single negative integer upper argument is in fact a hypergeometric polynomial, i.e. it can be expressed as a finite sum. Indeed, we have: � � � � ∞ Γ(−k + m)Γ(n1 )(−A)m −k �� F − A = , (3.267) 1 1 n1 � Γ(n1 + m)Γ(−k)m! m=0 which, by virtue of the Gamma reflection formula, becomes 1 F1 (· · · ) ∞ � Γ(n1 )(−1)m Γ(1 + k)(−A)m Γ(n1 + m)Γ(1 + k − m)m! m=0 � k � � Γ(n1 )Am k = , m Γ(n1 + m) = (3.268) m=0 where the terms for m > k are zero because they are characterized by a Gamma function at the denominator (Γ(1 + k − m)) with an integer argument which is zero or negative. Indeed, such a Gamma function has an infinite magnitude. 149 Probability and stochastic processes Therefore, we have � Y k � = �m �� k � � U/σZ2 k + k) . m m! σZ2k Γ(N (3.269) m=0 From this finite sum representation, the mean is easily found to be � � U 2 �Y � = σZ N 1 + 2 . (3.270) σZ N Likewise, the second raw moment is � � � � � 2� 2U U2 4 Y = σZ (N + 1) N + 2 + 4 . σZ σZ It follows that the variance is σY2 � Y 2 − �Y �2 � � 2U 4 = σZ N + 2 . σZ = � (3.271) (3.272) Non-central F -distribution Consider Z1 /n1 , (3.273) Z2 /n2 where Z1 is a non-central χ2 variate with n1 complex degrees-of-freedom and a non-centrality parameter U , and Z2 is a central χ2 variate with n2 complex degrees-of-freedom, then Y is said to follow a non-central F -distribution parameterized on n1 , n2 and U . Given the distributions 1 fZ1 (x) = xn1 −1 e−x e−U 0 F1 (n1 |U x ) u(x), (3.274) Γ(n1 ) 1 fZ2 (x) = xn2 −1 e−x u(x), (3.275) Γ(n2 ) Y = we apply two simple univariate transformations: A1 = Z1 , n1 A2 = n2 , Z2 (3.276) such that Y = A1 A2 . It is straightforward to show that fA1 (x) = fA2 (x) = nn1 1 n1 −1 −n1 x −U x e e 0 F1 (n1 |U n1 x ) u(x), Γ(n1 ) nn2 2 −n2 −1 − n2 x e x u(x). Γ(n2 ) (3.277) (3.278) 150 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING It then becomes possible to compute the PDF of Y through the Mellin convolution, i.e. � �y� 1 fA1 (x) dx, y ≥ 0 x x 0 � ∞ n2 � �n2 +1 n x n1 n1 x n2 − 2 = e y xn1 −2 e−n1 x e−U × Γ(n2 ) y Γ(n1 ) 0 0 F1 (n1 |U n1 x ) dx “ ” � ∞ n2 nn2 2 nn1 1 e−U n1 +n2 −1 − y +n1 x = x e × y n2 +1 Γ(n1 )Γ(n2 ) 0 (3.279) 0 F1 (n1 |U n1 x ) dx, fY (y) = ∞ fA2 where the integral can be interpreted as the Laplace transform of xν 0 F1 (n |Kx ) (see property 2.79) and yields fY (y) = e−U Γ(n � + n2 ) � Γ(n1 )Γ(n2 ) 1 n1 n2 1+ �n1 n1 n2 y y n1 −1 �n1 +n2 1 F1 � � � � n1 n1 + n2 � n2 U y u(y). � n1 � 1 + nn12 y (3.280) It is noteworty that letting U = 0 in the above, it reduces to the density of a central F-distribution, it being the ratio of two central χ2 variates, i.e. � 1 fY (y) = � B(n1 , n2 ) n1 n2 1+ �n1 n1 n2 y y n1 −1 �n1 +n2 u(y). 
(3.281) Examples of the shape of the F-distribution are shown in Figure 3.9. The PDF (3.280) could also have been obtained by representing the PDF of Z1 as an infinite series of central chi-square PDFs according to (3.262) and proceeding to find the PDF of a ratio of central χ2 variates. Hence, we have: − fZ1 (z) = e U σ2 Z �k ∞ � � U/σZ2 hn1 +k (z, σZ2 ), k! (3.282) k=0 where we have included the variance parameter σZ2 (being the variance of the underlying complex gaussian variates; it was implicitely equal to 1 in the preσ2 ceding development leading to (3.280)) and hn (z, σZ2 ) = g2n (z, 2Z ) is the density in z of a central χ2 variate with n complex degrees-of-freedom and a variance of the underlying complex Gaussian variates of σZ2 . 151 Probability and stochastic processes 0.6 0.5 fY (y) 0.4 0.3 U = 0, 1, 3, 5 0.2 0.1 0 0 4 2 6 8 10 y Figure 3.9. The F-distribution PDF with n1 = n2 = 2 and various values of the non-centrality parameter U . Given that the PDF of Z2 is (3.275), we apply (for the sake of diversity with respect to the preceding development) the following joint transformation: Y = n2 Z1 , n1 Z2 (3.283) X = Z2 . The Jacobian is J = fY,X (y, x) = = n1 x n2 . n1 x fZ n2 1 It follows that the joint density of y and x is � n1 x − σU2 e Z n2 � n1 xy fZ2 (x) n2 �m ∞ � � U/σ 2 Z m=0 1 xn2 −1 e−x Γ(n2 ) m! hn1 +m � n1 xy, σZ2 n2 � × „ « � ∞ � n y 2 m −x 1+ 1 2 − U2 � U/σZ n1 x n2 −1 n2 σ σ Z e Z = x e × m! n2 Γ(n2 )σZ2 k=0 � n1 �n1 +m−1 1 n2 xy u(x)u(y). (3.284) 2 Γ (n1 + m) σZ 152 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING The marginal PDF of Y is, as usual, obtained by integrating over X. In this case, it is found convenient to exchange the order of the summation and integration operations, i.e. � ∞ fY (y) = fY,X (y, x)dx 0 � �m � n1 �n1 +m−1 ∞ − U2 � U/σZ2 n1 n2 y σ Z × = e 2 m!Γ(n1 + m) σZ2 Γ(n2 + 1)σZ m=0 « „ � ∞ n y −x 1+ 1 2 n2 σ Z dx. (3.285) xn2 +n1 +m−1 e 0 � � Letting v = x 1 + nn1σy2 , we find that the remaining integrand follows the 2 Z definition of the Gamma function. Hence, we have „ « � ∞ n y −x 1+ 1 2 n2 σ Z I = xn2 +n1 +m−1 e 0 � ∞ 1 = � v n1 +n2 +m−1 e−v dv �n1 +n2 +m n1 y 0 1 + n σ2 2 Z = � Γ (n1 + n2 + m) �n1 +n2 +m . n1 y 1 + n σ2 (3.286) 2 Z Substituting the above in (3.284), we obtain fY (y) = − U2 n1 σ Z × e (3.287) Γ(n2 + 1)σZ2 � �m � n1 �n1 +m−1 ∞ � U/σZ2 Γ(n1 + n2 + m) n2 y u(y), � �n1 +n2 +m σZ2 n y m=0 m!Γ(n1 + m) 1 + 1 2 n σ 2 Z where the resulting series defines a 1 F1 (·) (Kummer) confluent hypergeometric function, thus leading to the compact representation � �n1 � � � − U2 n1 n1 � σ y n1 −1 U y n2 e Z u(y) n1 + n2 � n . fY (y) = 2n1 � 4 2 2 n1 � � 1 F1 n1 � σZ + σZ n y σZ B(n1 , n2 ) 1 + n1 y n1 +n2 2 2 n2 σZ (3.288) It is noteworthy that (3.287) can be expressed as a mixture of central F distributions, i.e. � � � �� ∞ � 2 m − U2 � U/σZ m (F ) σ fY (y) = e Z hn1 +m,n2 y, σZ2 1 + , (3.289) m! n1 m=0 153 Probability and stochastic processes where ) h(F ν1 ,ν2 � y, σF2 � 1 = B (ν1 , ν2 ) � ν1 ν2 σF2 �ν 1 � 1+ y ν1 −1 �ν1 +ν2 u(y), ν1 y 2 ν 2 σF (3.290) is the PDF of a central F -distributed variate in y with ν1 and ν2 degrees-offreedom and where the numerator has a mean of σF2 1 . The raw moments are best calculated based on the series representation (3.287). Thus, after inverting the order of the summation and integration operations, we have: � �n1 � � � n �m n1 2 m 1 ∞ � � U � U/σ Γ(n1 + n2 + m) 2 − 2 Z n2 n2 σZ k σ Z Y = e × m!Γ(n1 + m) Γ(n2 )σZ2n1 m=0 � ∞ y k+n1 +m−1 (3.291) � �n1 +n2 +m dy. 
0 1 + nn1σy2 2 Z Letting u = n1 y n2 σ 2 Z n1 y 1+ n2 σ 2 Z , we find that y = dy = n2 σZ2 u , n1 (1 − u) n1 (1 − u)2 du, n2 σZ2 (3.292) (3.293) which leads to I = � 0 = 1 un1 +m+k−1 � n1 2 n2 σZ � �−k−n1 −m n1 2 n2 σZ du k+1−n2 (1 − u) �−k−n1 −m B (k + m + n1 , −k + n2 ) , n2 > k. Substituting (3.293) in (3.290), we find � 2 �k n2 σZ � �m ∞ � � − U2 � U/σZ2 n1 k σ = Y e Z × Γ(n2 ) m!Γ(n1 + m) (3.294) m=0 Γ (k + m + n1 ) Γ (−k + n2 ) , 1 The mean of the numerator is indeed ν 1 n2 > k, (3.295) times the variance of the underlying Gaussians by virtue of (3.245) 154 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING which also has a compact representation similar to (3.288): � 2 n2 σZ n1 �k − U σ2 Z � � k + n1 �� U Y = . 1 F1 n1 � σZ2 Γ(n1 )Γ(n2 ) (3.296) According to Kummer’s identity, the above can be transformed as follows: � k � � e � Γ(k + n1 )Γ(n2 − k) n1 2 n2 σZ �n1 +k−n2 � � � � U −k �� − Y = . 1 F1 n1 � σZ2 Γ(n1 )Γ(n2 ) (3.297) By virtue of (3.267) and (3.268), a hypergeometric function with an upper negative integer argument (and no lower negative integer argument) is representable as a finite sum. Thus, we have � k � Yk = � � Γ(k + n1 )Γ(n2 − k) 2 n2 σZ n1 �k k � � k m m=0 Γ(k + n1 )Γ(n2 − k) Γ(n2 ) � �m U � 2 σZ × . (3.298) n2 > 1. (3.299) Γ(n1 + m) It follows that the mean is �Y � = � n2 σZ2 n1 + U 2 σZ n1 (n2 − 1) � , In the same fashion, we can show that the second raw moment is �� � � � 2 � n22 σZ4 1 U U2 Y = 2 2 + n1 (n1 + 1) + 4 . n21 (n2 − 1)(n2 − 2) σZ σZ (3.300) It follows that the variance is � �2 U n + 2 4 1 2 n2 σ Z U σZ 2 σY = 2 + 2 2 + n1 . (3.301) n1 (n2 − 1)(n2 − 2) σZ n2 − 1 Beta distribution Let Y1 and Y2 be two independent central chi-square random variables with n1 and n2 complex degrees-of-freedom, respectively and where the variance of the underlying complex Gaussians is the same and is equal to σZ2 . Furthermore, 155 Probability and stochastic processes we impose that A2 = Y1 + Y2 , BA2 = Y1 . (3.302) (3.303) Then A and B are independent and B follows a beta distribution with PDF fB (b) = Γ(n1 + n2 ) n1 −1 b (1 − b)n2 −1 [u(b) − u(b − 1)] . Γ(n1 )Γ(n2 ) (3.304) As a first step in deriving this PDF, consider the following bivariate transformation: C1 = Y1 + Y2 , C2 = Y1 , (3.305) (3.306) where it can be verified that the corresponding Jacobian is 1. Since Y1 and Y2 are independent, their joint PDF is fY1 ,Y2 (y1 , y2 ) = − y1n1 −1 y2n2 −1 e 2(n1 +n2 ) σZ y1 +y2 σ2 Z Γ(n1 )Γ(n2 ) u(y1 )u(y2 ), (3.307) which directly leads to fC1 ,C2 (c1 , c2 ) = − cn2 1 −1 (c1 − c2 )n2 −1 e 2(n1 +n2 ) σZ c1 σ2 Z Γ(n1 )Γ(n2 ) u(c1 )u(c2 ). (3.308) Consider now a second bivariate transformation: T BT = C1 , = C2 , (3.309) (3.310) where the Jacobian is simply t. We find that − t 2 Γ(n1 + n2 ) n1 −1 tn1 +n2 −1 e σZ fB,T (b, t) = b (1 − b)n2 −1 2(n +n ) u(b)u(t). Γ(n1 )Γ(n2 ) σZ 1 2 Γ(n1 + n2 ) (3.311) Since this joint distribution factors, B and T are independent and B is said to follow a beta distribution with n1 and n2 complex degrees-of-freedom and PDF given by fB (b) = Γ(n1 + n2 ) n1 −1 b (1 − b)n2 −1 u(b), Γ(n1 )Γ(n2 ) (3.312) 156 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING 5 [2, 1], [3, 1], [4, 1], [5, 1] 4 fB (b) [4, 2] 3 [4, 4] [3, 2] [2, 2] 2 1 0 0 0.2 0.4 0.6 0.8 1 b Figure 3.10. The beta PDF for various integer combinations of [n1 , n2 ]. while T follows a central chi-square distribution with n1 +n2 complex degreesof-freedom and an underlying variance of σZ2 . Representative instances of the beta PDF are shown in Figure 3.10. 
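As a numerical aside, the construction of the beta variate from two central chi-square variates is easily verified by simulation. The NumPy sketch below (the degrees-of-freedom and sample size are arbitrary values chosen for the example) builds Y1 and Y2 as sums of squared moduli of unit-variance complex Gaussian terms, forms B = Y1/(Y1 + Y2), and checks the mean n1/(n1 + n2) of (3.319) below as well as the beta density (3.312) at a few points.

```python
import numpy as np
from math import gamma

# Illustrative sketch: Y1 and Y2 are central chi-square variates with n1 and
# n2 complex degrees-of-freedom (unit-variance complex Gaussian terms), and
# B = Y1 / (Y1 + Y2) should follow the beta density (3.312).  Values arbitrary.
rng = np.random.default_rng(3)
n1, n2, N = 3, 2, 200_000

def complex_chi2(n):
    # sum of n squared moduli of CN(0, 1) variates
    Z = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
    return np.sum(np.abs(Z)**2, axis=1)

Y1, Y2 = complex_chi2(n1), complex_chi2(n2)
B = Y1 / (Y1 + Y2)

# Mean <B> = n1 / (n1 + n2), cf. (3.319) below.
print(np.mean(B), n1 / (n1 + n2))

# Empirical density vs. the beta PDF (3.312) at a few points.
b = np.array([0.2, 0.4, 0.6, 0.8])
f_beta = gamma(n1 + n2) / (gamma(n1) * gamma(n2)) * b**(n1 - 1) * (1 - b)**(n2 - 1)
hist, edges = np.histogram(B, bins=100, range=(0, 1), density=True)
f_emp = np.interp(b, 0.5 * (edges[:-1] + edges[1:]), hist)
print(f_beta.round(3), f_emp.round(3))                   # should agree closely
```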
The CDF of a beta-distributed variate is given by Γ(n1 + n2 ) FB (x) = P (B < x) = Γ(n1 )Γ(n2 ) � 0 x bn1 −1 (1 − b)n2 −1 db. (3.313) Applying integration by parts, we obtain � � � Γ(n1 + n2 ) xn1 (1 − x)n2 −1 n2 − 1 x n1 n2 −2 FB (x) = + b (1 − b) db , Γ(n1 )Γ(n2 ) n1 n1 0 (3.314) where it can be observed that the remaining integral is of the same form as the original except that the exponent of b has been incremented while the exponent of (1 − b) has been decremented. It follows that integration by parts can be applied iteratively until the last integral term contains only a power of b and is 157 Probability and stochastic processes thus expressible in closed form: � Γ(n1 + n2 ) xn1 (1 − x)n2 −1 FB (x) = + Γ(n1 )Γ(n2 ) n1 � n2 − 1 xn1 −1 (1 − x)n2 −2 + ··· n1 n1 + 1 � n2 − 2 xn1 +n2 −1 (1 − x) ··· + + n1 + 1 n1 + n2 − 2 �� � xn1 +n2 −1 ··· . (n1 + n2 − 1)(n1 + n2 ) (3.315) By inspection, a pattern can be identified in the above, thus leading to the following finite series expression: FB (x) = xn1 Γ (n1 + n2 + 1) n� 2 −1 m=0 (1 − x)n2 −1−m xm . Γ(n1 + m + 1)Γ(n2 − m) (3.316) However, the above development is valid only if the degrees-of-freedom are integers. But the PDF exists even if that is not the case. In general, we have: � � � Γ(n1 + n2 ) n 1 − n2 �� n1 FB (x) = x 2 F1 (3.317) �x . 1 + n1 Γ(n1 + 1)Γ(n2 ) The kth raw moment is given by � � � Γ(n1 + n2 ) 1 k+n1 −1 k B = b (1 − b)n2 −1 db Γ(n1 )Γ(n2 ) 0 Γ(n1 + n2 )Γ(k + n1 ) = . Γ(n1 )Γ(k + n1 + n2 ) (3.318) Therefore, the mean is given by �B� = the second raw moment by � 2� B = and the variance by 2 σB = = = � n1 , n1 + n2 n1 (1 + n1 ) , (n1 + n2 )(1 + n1 + n2 ) (3.319) (3.320) � B 2 − �B�2 n1 (1 + n1 ) n21 − (n1 + n2 )(1 + n1 + n2 ) (n1 + n2 )2 n1 (1 + n1 − n1 n2 ) . (n1 + n2 )2 (1 + n1 + n2 ) (3.321) 158 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Nakagami-m distribution The Nakagami-m distribution was originally proposed by Nakagami [Nakagami, 1960] to model a wide range of multipath fading behaviors. Joining the Rayleigh and Rice PDFs, the Nakagami-m distribution is one of the three most encountered models for multipath fading (see chapter 4 for more details). Unlike the other two, however, the Nakagami PDF was obtained through empirical fitting with measured RF data. It is of interest that the Nakagami-m distribution is very flexible, being controlled by its m parameter. Thus, it includes the Rayleigh PDF as a special case, and can be made to approximate closely the Rician PDF. The Nakagami-m PDF is given by 2 � m �m 2m−1 −m y y e Ω u(y), fY (y) = (3.322) Γ(m) Ω where m is the distribution’s parameter which takes on any real value between 1 2 and ∞, and Ω is the second raw moment, i.e. � � Ω= Y2 . (3.323) In general, the kth raw moment of Y is given by � � Γ � k + m� � Ω �k/2 2 Yk = . Γ(m) m Likewise, the variance is given by � � σY2 = Y 2 − �Y �2 � � �� 1 Γ2 12 + m = Ω 1− . m Γ2 (m) (3.324) (3.325) � 1 The � Nakagami distribution reduces to a Rayleigh PDF if m = 1. If m ∈ , 1 , distributions which are more spread out (i.e. have longer tails) than 2 Rayleigh result. In fact, a half-Gaussian distribution is obtained with the minimum value m = 12 . Values of m above 1 result in distributions with a more compact support than the Rayleigh PDF. The distribution is illustrated in Figure 3.11 for various values of m. 
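As a numerical aside, Nakagami-m samples are conveniently generated by exploiting the fact that the square of a Nakagami-m variate is gamma-distributed with shape m and mean Ω. The NumPy sketch below (the values of m, Ω and the sample size are arbitrary choices for the example) uses this route, checks the moment formula (3.324), and illustrates the reduction to a Rayleigh envelope when m = 1.

```python
import numpy as np
from math import gamma

# Illustrative sketch: draw Nakagami-m samples as the square root of a gamma
# variate with shape m and scale Omega/m, so that <Y^2> = Omega.  Values arbitrary.
rng = np.random.default_rng(4)
m, Omega, N = 1.5, 2.0, 200_000
Y = np.sqrt(rng.gamma(shape=m, scale=Omega / m, size=N))

# kth raw moment, eq. (3.324): Gamma(k/2 + m) / Gamma(m) * (Omega / m)^(k/2).
for k in (1, 2, 3):
    moment_theory = gamma(k / 2 + m) / gamma(m) * (Omega / m)**(k / 2)
    print(k, np.mean(Y**k).round(4), round(moment_theory, 4))

# For m = 1 the construction reduces to a Rayleigh envelope; compare with the
# modulus of a zero-mean complex Gaussian of variance Omega.
R_nak = np.sqrt(rng.gamma(shape=1.0, scale=Omega, size=N))
R_ray = np.abs(np.sqrt(Omega / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N)))
print(np.mean(R_nak**2).round(3), np.mean(R_ray**2).round(3))   # both close to Omega
```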
It was found in [Nakagami, 1960] that a very good approximation of the Rician PDF can be obtained by relating the Rice factor K and the Nakagami parameter m as follows: K= m2 − m √ , m − m2 − m m > 1, (3.326) 159 Probability and stochastic processes with the inverse relationship being m= (K + 1)2 K2 =1+ . 2K + 1 2K + 1 fY (y) 2 (3.327) m = 12 , 34 , 1, 32 , 2, 52 , 3 1.5 1 0.5 0 0 1 2 3 4 5 6 y Figure 3.11. The Nakagami distribution for various values of the parameter m with Ω = 1. Lognormal distribution The lognormal distribution, while being analytically awkward to use, is highly important in wireless communications because it characterizes very well the shadowing phenomenon which impacts outdoor wireless links. Interestingly, other variates which follow the lognormal distribution include the weight and blood pressure of humans, and the number of words in the sentences of the works of George Bernard Shaw! When only path loss and shadowing are taken into account (see chapter 4), it is known that the received power Ω(dB) at the end of a wireless transmission over a certain distance approximately follows a noncentral normal distribution, i.e. � � 1 (p − µP )2 fΩ(dB) (p) = √ exp − , (3.328) 2σP2 2πσΩ where Ω(dB) = 10 log10 (P ). (3.329) 160 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING After an appropriate transformation of r.v.’s, the lognormal distribution of P is found to be � � η (10 log10 (x) − µP )2 fP (x) = √ exp − u(x), (3.330) 2σP2 2πσΩ x 10 where η = ln(10) . This density, however, is very difficult to integrate. Nonetheless, the kth raw moment can be found by using variable substitution to revert to the Gaussian form in the integrand: � ∞ � � k P = xk fP (x)dx 0 � � � ∞ η (10 log10 (p) − µP )2 k−1 √ dx = x exp − 2σP2 2πσΩ 0 � � � ∞ 1 (y − µP )2 = √ 10ky/10 exp − dy 2σP2 2πσΩ −∞ � � � ∞ 1 (y − µP )2 ky/η = √ e exp − dy 2σP2 2πσΩ −∞ � 2 � � ∞ η (z − µP /η)2 η kz = √ e exp − dz (3.331) 2σP2 2πσΩ −∞ which, by virtue of [Prudnikov et al., 1986a, 2.3.15-11], yields � 2 � � � 1 σP 2 µP k P = exp k + k . 2 η2 η It follows that the variance is given by � � σP2 = P 2 − �P �2 � 2 � � 2 � 2σP 2µP µP 2 σP = exp + − exp + η2 η 2η 2 η � 2 � � 2 � 2σP σP 2µP 2µP = exp + − exp 2 + η2 η η η � �� � 2� � 2 σ σ 2µP = exp + P2 exp P2 − 1 . η η η 6. (3.332) (3.333) Multivariate statistics Significant portions of this section follow the developments in the first few chapters of [Muirhead, 1982] with additions, omissions and alterations to suit our purposes. One notable divergence is the emphasis on complex quantities. 161 Probability and stochastic processes Random vectors Given an M × M random vector x, its mean is defined by the vector �x1 � �x2 � µ = �x� = .. . (3.334) . �xM � The central second moments of x, analogous to the variance of a scalar variate, are defined by the covariance matrix: � � ,M H Σx = [σmn ]m=1,··· (3.335) n=1,··· ,M = (x − µ) (x − µ) , where σmn = �(xm − µm )(xn − µn )∗ � . (3.336) Lemma 3.1. An M ×M matrix Σx is a covariance matrix iff it is non-negative definite (Σ ≥ 0). Proof. Consider the variance of aH x where a is considered constant. We have: � � Var(aH x) = aH (x − µx ) (x − µx )H a � � = aH (x − µx ) (x − µx )H a = aH Σx a. (3.337) � H �2 The quantity �a (x − µx )� is by definition always non-negative. The variance above is the expectation of this expression and is therefore also nonnegative. It follows that the quadratic form aH Σx a ≥ 0, proving that Σx is non-negative definite. The above does not rule out the possibility that aH Σx a = 0 for some vector a. 
This is only possible if there is some linear combination of the elements of x, defined by vector a, which always yields 0. This is in turn only possible if all elements of x are fully defined by x1 , i.e. there is only a single random degree of freedom. In such a case, the complex vector x is constrained to a fixed 2D hyperplane in R2M . If x is real, then it is constrained to a line in RM . Multivariate normal distribution Definition 3.40. An M × 1 vector has an M -variate real Gaussian (normal) distribution if the distribution of aT x is Gaussian for all a in RM where a �= 0. Theorem 3.2. Given an M × 1 vector x whose elements are real Gaussian variates with mean µx = �x� , (3.338) 162 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING and covariance matrix � � Σx = (x − µ) (x − µ)T , (3.339) then x is said to follow a multivariate Gaussian distribution, denoted x ∼ NM (µ, Σ), and the associated PDF is � � 1 1 −M − T −1 fx (x) = (2π) 2 det(Σ) 2 exp − (x − µ) Σ (x − µ) . (3.340) 2 Proof. Given a vector u where the elements are zero-mean i.i.d Gaussian variates with u ∼ N (0, I). We have the transformation x = Au + µ, (3.341) which implies that Σx = � (x − µ)(x − µ)T � � = (Au)(Au)T � � = A uuT AT � = AAT . (3.342) Assuming that A is nonsingular, the inverse transformation is u = B(x − µ), where B = A−1 . The Jacobian is J ∂u1 ∂x1 ··· .. . ··· .. . = det ∂uM ∂x1 = det(B) = det(A (3.343) ∂u1 ∂xM .. . ∂uM ∂xM −1 ) 1 = det(A)−1 = det(Σ)− 2 . (3.344) Given that fu (u) = M � 1 1 2 (2π)− 2 e− 2 um m=1 −M 2 = (2π) M = (2π)− 2 � M 1� 2 exp − um 2 m=1 � � 1 exp − uT u . 2 � (3.345) 163 Probability and stochastic processes Applying the transformation, we get � � � −1 �H −1 1 1 −M − T fx (x) = (2π) 2 det(Σ) 2 exp − (x − µ) A A (x − µ) , 2 (3.346) � �H which, noting that Σ−1 = A−1 A−1 , constitutes the sought-after result. Theorem 3.3. Given an M ×1 vector x whose elements are complex Gaussian variates with mean u and covariance matrix � � Σ = xxH , (3.347) then x is said to follow a complex multivariate Gaussian distribution and this is denoted x ∼ CN M (u, Σ). The associated PDF is � � fx (x) = (π)−M det(Σ)−1 exp −(x − u)H Σ−1 (x − u) . (3.348) The proof is almost identical to the real case and is left as an exercise. Theorem 3.4. If x ∼ CN M (u, Σ) and A is a fixed P × M matrix and b is a fixed K × 1 vector, then � � y = Bx + b ∼ CN P Bu + b, BΣBH . (3.349) Proof. From definition 3.40, it is clear that any linear combination of the elements of a Gaussian vector, real or complex, is a univariate Gaussian r.v. It directly follows that y is a multivariate normal vector. Furthermore, we have �y� = = = � H� yy = �AX + b� A �X� + b Au + b (3.350) � � (Ax + b − (Au + b)) (Ax + b − (Au + b))H � � = (A(x − u)) (A(x − u))H � � = A (x − u)(x − u)H AH = AΣAH , (3.351) which concludes the proof. Theorem 3.5. Consider x ∼ CN M (u, Σ) and the division of x, its mean vector, and covariance matrix as follows: � � � � � � x1 u1 Σ11 0 x= u= Σ= , (3.352) x2 u2 0 Σ22 164 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING where x1 is P × 1 and has mean vector u1 and covariance matrix Σ11 , x2 is (N − P ) × 1 and corresponds to mean vector u2 and covariance matrix Σ22 . Then x1 ∼ CN P (u1 , Σ11 ) and x2 ∼ CN M −P (u2 , Σ22 ). Furthermore, x1 and x2 are not only uncorrelated, they are also independent. Proof. To prove this theorem, it suffices to introduce the partitioning of x into x1 and x2 into the PDF of x. 
Hence: � � � � �� ��−1 �H x1 Σ11 0 (x1 − u1 ) −M fx = (π) det exp − × x2 0 Σ22 (x2 − u2 ) � �−1 � �� Σ11 0 (x1 − u1 )H 0 Σ22 (x2 − u2 )H = (π)−M det (Σ11 )−1 det (Σ22 )−1 × � � exp (x1 − u1 )H Σ−1 11 (x1 − u1 ) × � � exp (x2 − u2 )H Σ−1 22 (x2 − u2 ) = fx1 (x1 )fx2 (x2 ). (3.353) Since the density factors, independence is established. Furthermore, the PDFs of x1 and x2 have the expected forms, thus completing the proof. The preceding theorem establishes a very important property of Gaussian variates: if two (or more) Gaussian r.v.’s are uncorrelated, they are also automatically independant. While it may seem counterintuitive, this is not in general true for arbitrarily-distributed variables. Theorem 3.6. Consider x ∼ CN M (u, Σ) and the division of x, its mean vector, and covariance matrix as follows: � � � � � � Σ11 Σ12 x1 u1 Σ= , (3.354) x= u= Σ21 Σ22 x2 u2 where x1 is P × 1 and has mean vector u1 and covariance matrix Σ11 , x2 is (N − P ) × 1 and corresponds to mean vector u2 and covariance matrix Σ22 . Then + y = x − Σ12 Σ+ 22 x2 ∼ CN P (u1 − Σ12 Σ22 u2 ), Σ11.2 ), (3.355) where Σ11.2 = Σ11 − Σ12 Σ+ 22 Σ21 , and y is independent from x2 . Furthermore, the conditional distribution of x1 given x2 is also complex Gaussian, i.e. � � (x1 |x2 ) ∼ CN P u1 + Σ12 Σ+ (3.356) 22 (x2 − u2 ), Σ11.2 . Proof. To simplify the proof, it will be assumed that Σ22 is positive definite, −1 thus implying that Σ+ 22 = Σ22 . For a more general proof, see [Muirhead, 1982, theorem 1.2.11]. 165 Probability and stochastic processes Defining the matrix B= � theorem 3.4 indicates that y = Bx = is a Gaussian vector with mean v = �y� = � IP 0 −Σ12 Σ−1 22 IN −P � x1 − Σ12 Σ−1 22 x2 x2 � , (3.358) � u1 − Σ12 Σ−1 22 u2 u2 � , (3.359) (3.357) , and covariance matrix � � BΣBH = (y − v)(y − v)H � � �� �� IP 0 Σ11 Σ12 IP −Σ12 Σ−1 22 = Σ21 Σ22 −Σ−1 0 IN −P 22 Σ21 IN −P � � Σ11.2 0 = . (3.360) 0 Σ22 Random matrices Given an M × N matrix A = [a1 a2 · · · aN ], where �am � = 0 � am aH = Σ, m = 1, . . . , M m � � H am an = 0, m �= n. � To fully characterize the covariances of a random matrix, it is necessary to vectorize it, i.e. Σ 0 ··· 0 0 Σ ··· 0 � � vec(A)vec(A)H = .. .. . . . . . . .. 0 0 = IN ⊗ Σ, ··· Σ (3.361) where the above matrix has dimensions M N × M N and IN is the N × N identity matrix. It follows that if Σ = IM , the overall covariance is IN ⊗ IM = IM N . 166 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Given the transformation B = RAS, where P × M matrix R and N × Q matrix S are fixed, we have �B� = R �A� S. (3.362) From lemma (2.3), we have � � vec(B) = SH ⊗ R vec(B), which implies and � � � �vec(B)� = SH ⊗ R �vec(B)� , vec(B)vec(B)H � (3.363) (3.364) � �� � �H � SH ⊗ R vec(A) SH ⊗ B vec(A) �� H � � �� = S ⊗ R vec(A)vec(A)H S ⊗ RH � � � � = SH ⊗ R (IN ⊗ Σ) S ⊗ RH , (3.365) = �� where property 2.60 was applied to move from the first to the second line. Applying property 2.61 twice, we finally get � � Σvec(B) = vec(B)vec(B)H = SH S ⊗ RΣRH . (3.366) It follows that, if Σ = IM , the overall covariance matrix can be conveniently expressed as Σvec(B) = SH S ⊗ RRH , (3.367) where RRH is the column covariance matrix and SH S is the line covariance matrix. Gaussian matrices Theorem 3.7. Given a P × Q complex Gaussian matrix X such that X ∼ CN (M, C ⊗ D) where C is a Q × Q positive definite matrix, D is a P × P positive definite matrix, and �X� = M, then the PDF of X is fX (X) = 1 π P Q (det(C))P (det(D))Q × � � etr −C−1 (X − M)H D−1 (X − M) , (3.368) X > 0, where etr(X) = exp(tr(X)). Proof. 
Given the random vector x = vec(X) with mean m = �vec(X)�, we know from theorem 3.2 that its PDF is � � 1 fx (x) = P Q exp −(x − m)H (C ⊗ D)−1 (x − m) . π det (C ⊗ D) (3.369) 167 Probability and stochastic processes Equivalence of the above with the PDF stated in the theorem is demonstrated first by observing that det(C ⊗ D) = (det(C))P (det(D))Q , (3.370) which is a consequence of property 2.64. Second, we have � � (x−m)H (C ⊗ D)−1 (x−m) = (x−m)H C−1 ⊗ D−1 (x−m), (3.371) which, according to lemma 2.4(c), becomes � � � � (x − m)H C−1 ⊗ D−1 (x − m) = tr C−1 (X − M)H D−1 (X − M) , (3.372) which completes the proof. 7. Transformations, Jacobians and exterior products Given an M × 1 vector x which follows a PDF fx (x), we introduce a transformation y = g(x), thus finding that where the Jacobian is fy (y) = fx (g −1 (y)) |J(x → y)| , J(x → y) = det ∂x1 ∂y1 .. . ∂xM ∂y1 ··· .. . ··· ∂x1 ∂yM .. . ∂xM ∂yM (3.373) . (3.374) The above definition for the Jacobian, based on a determinant of partial derivatives, certainly works and was used in deriving the multivariate normal distribution. However, this definition is cumbersome for large numbers of variables. We will illustrate an alternative approach based on the multiple integral below. Our outline follows [James, 1954] as interpreted in [Muirhead, 1982]. We have � I= fx (x1 , . . . , xM )dx1 . . . dxM , (3.375) S M R where S is a subset of and I is the probability that vector x takes on a value in subset S. Given a one-to-one invertible transformation y = g(x), the integral becomes I= � S† fx (g −1 (x)) |J(x → y| dy1 . . . dyM , (3.376) (3.377) 168 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING where S † is the image of S under the transformation, i.e. S † = g(S). We wish to find an alternative expression of dx1 · · · dxM as a function of dy1 · · · dyM . Such an expression can be derived from the differential forms dxm = ∂xm ∂xm ∂xm dy1 + dy2 + · · · + dyM . ∂y1 ∂y2 ∂yM (3.378) These can be substituted directly in (3.377). Consider for example the case where M = 2. We get � �� � � ∂x1 ∂x1 ∂x2 ∂x2 −1 I= fx (g (y)) dy1 + dy2 dy1 + dy2 . ∂y1 ∂y2 ∂y1 ∂y2 S† (3.379) The problem at hand is to find a means of carrying out the product of the two differential form in order to fall back to the Jacobian. If the multiplication is carried out in the conventional fashion, we find � �� � ∂x1 ∂x1 ∂x2 ∂x2 dy1 + dy2 dy1 + dy2 = ∂y1 ∂y2 ∂y1 ∂y2 ∂x1 ∂x2 ∂x1 ∂x2 ∂x1 ∂x2 dy1 dy1 + dy1 dy2 + dy2 dy1 + ∂y1 ∂y1 ∂y1 ∂y2 ∂y2 ∂y1 ∂x1 ∂x2 dy2 dy2 . (3.380) ∂y2 ∂y2 Comparing the above to the Jacobian � J(x → y)dy1 dy2 = det ∂x1 ∂y1 ∂x2 ∂y1 ∂x1 ∂y2 ∂x2 ∂y2 � = � ∂x1 ∂x2 ∂x1 ∂x2 − ∂y1 ∂y2 ∂y2 ∂y1 � , (3.381) we find that the two expressions ((3.379) and (3.381)) can only be reconciled if we impose non-standard multiplication rules and use a noncommutative, alternating product where dym dyn = −dyn dym . (3.382) This implies that dym dym = −dym dym = 0. This product is termed the exterior product and denoted by the symbol ∧. According to this wedge product, the product (3.379) becomes � � ∂x1 ∂x2 ∂x1 ∂x2 dx1 dx2 = − dy1 ∧ dy2 . (3.383) ∂y1 ∂y2 ∂y2 ∂y1 Theorem 3.8. Given two N × 1 real random vectors x and y, as well as the transformation y = Ax where A is a nonsingular M × M matrix, we have 169 Probability and stochastic processes dy = Adx and M � dym = det(A) m=1 M � dxm . m=1 Proof. Given the properties of the exterior product, it is clear that M � dym = p(A) m=1 M � (3.384) dxm , m=1 where p(A) is a polynomial in the elements of A. 
Indeed, the elements of A are the coefficients of the elements of x and as such are extracted by the partial derivatives (see e.g. (3.383)). The following outstanding properties of p(A) can be observed: (i) If any row of A is multiplied by a scalar factor α, then p(A) is likewise increased by α. (ii) If the positions of two variates yr and ys are reversed, then the positions of dyr and dys are also reversed in the exterior product, leading to a change of sign (by virtue of (3.382). This, however, is equivalent to interchanging rows r and s in A. It follows that interchanging two rows of A reverses the sign of p(A). (iii) If A = I, then p(I) = 1 since this corresponds to the identity transformation. However, these three properties correspond to properties 2.29-2.31 of determinants. In fact, this set of properties is sufficiently restrictive to define determinants; they actually correspond to Weierstrass’ axiomatic definition of determinants [Knobloch, 1994]. Therefore, we must have p(A) = det(A). Rewriting (3.375) using exterior product notation, we have � I= fx (x1 , . . . , xM )dx1 ∧ dx2 . . . ∧ dxM , (3.385) S From (3.378), we know that dx = ∂x1 ∂y1 .. . ∂xM ∂y1 ··· .. . ··· ∂x1 ∂yM .. . ∂xM ∂yM . Therefore, theorem 3.8 implies that �� � M � M � ∂xm m=1,··· ,M � dxm = det dym , ∂yn n=1,··· ,M m=1 m=1 (3.386) (3.387) 170 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING where the Jacobian is the magnitude of the right-hand side determinant. In general, we have M � m=1 dxm = J(x → y) M � dym . (3.388) m=1 For an M × N real, arbitrary random matrix X, we have in general dx11 ··· dx1N .. .. , .. dX = (3.389) . . . dxM 1 · · · dxM N which is the matrix of differentials, and we denote (dX) = N � M � (3.390) dxmn , n=1 m=1 the exterior product of the differentials. If X is M × M and symmetric, it really has only 12 M (M + 1) distinct elements (the diagonal and above-diagonal elements). Therefore, we define (dX) = M � n � dxmn = n=1 m=1 � dxmn . (3.391) m≤n In the same fashion, if X is M × M and skew-symmetric, then there are only 12 M (M − 1) (above-diagonal) elements and (dX) = M n−1 � � n=1 m=1 dxmn = � dxmn . (3.392) m<n It should be noted that a transformation on a complex matrix is best treated by decomposing it into a real and imaginary part, i.e. X = �{X} + j�{X} = Xr + jXi which naturally leads to (dX) = (dXr )(dXi ), (3.393) since exterior product rules eliminate any cross-terms. We offer no detailed proof, but the proof of theorem (3.16) provides some insight into this matter. It is also noteworthy that d(AB) = AdB + dAB. (3.394) 171 Probability and stochastic processes Theorem 3.9. Given two N × 1 complex random vectors u and v, as well as the transformation v = Au where A is a nonsingular M × M matrix, we have dv = Adu and (dv) = det(A)2 (du) . Proof. Given the vector version of (3.393), we have (dv) = (d�{v}) (d�{v}) , (3.395) and, applying theorem 3.8, we obtain (d�{v}) = |det(A)| (d�{u}) (d�{v}) = |det(A)| (d�{u}) , which naturally leads to (dv) = (d�{v}) (d�{v}) = det(A)2 (d�{u}) (d�{u}) = det(A)2 (du) . (3.396) (3.397) Selected Jacobians What follows is a selection of Jacobians of random matrix transformations which will help to support forthcoming derivations. For convenience and clarity of notation, it is the inverse transformations that will be given. Portions of this section follow [Muirhead, 1982, chapter 2] and [Ratnarajah et al., 2004] (for complex matrices). Theorem 3.10. 
If X = BY where X and Y are N ×M real random matrices and B is a fixed positive definite N × N matrix, then which implies that (dX) = |det(B)|M (dY), (3.398) J(X → Y) = |det(B)|M . (3.399) Proof. The equation X = BY implies dX = BdY. Letting dX = [dx1 , · · · , dxM ] , dY = [dy1 , · · · , dyM ] , we find dxm = Bdym , m = 1, · · · , M, (3.400) 172 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING which, by virtue of theorem 3.8, implies that N � dxnm = det(B) n=1 N � dynm , n=1 m = 1, · · · , M. (3.401) Therefore, we have M � N � (dX) = = dxnm m=1 n=1 M � det(B) m=1 = det(B)M N � dynm n=1 M N � � dynm m=1 n=1 = det(B)M (dY). Theorem 3.11. If X = BYBT where X and Y are M × M symmetric real random matrices and B is a non-singular fixed M × M matrix, then (dX) = |det(B)|M +1 (dY). (3.402) Proof. The equation X = BYBT implies dX = BdYBT . It follows that � � (dX) = BdYBT = p(B)(dY). (3.403) where p(B) is a polynomial in the elements of B. Furthermore it can be shown that if B = B1 B2 , (3.404) then p(B) = p(B1 )p(B2 ). Indeed, we have p(B)(dY) = = = = where (3.403) was applied twice. (B1 B2 dY(B1 B2 )T ) (B1 B2 dYBT2 BT1 ) p(B1 )(B2 dYBT2 ) p(B1 )p(B2 )(dY), (3.405) 173 Probability and stochastic processes It turns out that the only polynomials in the elements of B that can be factorized as above are the powers of det(B) (see prop. 2.40). Therefore, we have p(B) = (det(B))k , where k is some integer. We can isolate k by letting B = det (β, 1, · · · , 1) such that 2 β y11 by12 · · · by1M βy12 y22 · · · by2M BdYBT = .. .. .. . . . . . . . by1M y2M (3.406) · · · yM M It follows that the exterior product of the distinct elements (diagonal and above-diagonal elements) of dX is (dX) = (BdYBT ) = β M +1 (dY). Given that p(B) = β M +1 = det(B)M +1 , the proof is complete. Theorem 3.12. If X = BYBT where X and Y are M × M skew-symmetric real random matrices and B is a non-singular fixed M × M matrix, then (dX) = |det(B)|M −1 (dY). (3.407) Proof. The proof follows that of theorem 3.11 except that by definition, we take the exterior product of the above diagonal elements only in (3.406) since X and Y are skew-symmetric. Theorem 3.13. If X = BYBH where X and Y are M × M positive definite complex random matrices and B is a non-singular fixed M × M matrix, then (dX) = |det(B)|2M (dY). (3.408) Proof. The proof follows from the fact that real and imaginary parts of a Hermitian matrix are symmetric and skew-symmetric, respectively. It follows that, by virtue of theorems 3.11 and 3.12, (dXr ) = det(B)M +1 (dYr ) (dXi ) = det(B)M −1 (dYi ). (3.409) (3.410) It follows that (dX) = (dXr )(dXi ) = det(B)2M (dYr )(dYi ) = det(B)2M (dY). 174 SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING Theorem 3.14. If X = Y−1 where X and Y are M × M complex nonsingular random matrices and Y is Hermitian, then (dX) = det(Y)−2M (dY), (3.411) Proof. Since XY = I and applying (3.394), we have d(XY) = XdY + dX · Y = dI = 0. (3.412) Therefore, dX · Y = −XdY, dX = −XdY · Y−1 , dX = −Y−1 dY · Y−1 , (3.413) (3.414) which implies that (dX) = (Y−1 dY · Y−1 ) = det(Y)2M (dY), by virtue of theorem 3.13. Theorem 3.15. If A is a real M × M positive definite random matrix, there exists a decomposition A = TT T (Cholesky decomposition) where T is upper triangular. Furthermore, we have (dA) = 2M M � −m+1 tM (dT), mm (3.415) m=1 where tmm is the mth element of the diagonal of T. Proof. The decomposition has the form a11 a12 · · · a1M a12 a22 · · · a2M .. .. .. = .. . . . . a1M a2M · · · aM M t11 t12 .. . 0 t22 .. . ··· ··· .. . 0 0 .. . 
Theorem 3.15. If A is a real M × M positive definite random matrix, there exists a decomposition A = T^T T (Cholesky decomposition), where T is upper triangular with positive diagonal elements. Furthermore, we have

(dA) = 2^M ∏_{m=1}^{M} t_{mm}^{M−m+1} (dT),     (3.415)

where t_{mm} is the mth element of the diagonal of T.

Proof. The decomposition has the form

[a_{11}, a_{12}, ..., a_{1M}; a_{12}, a_{22}, ..., a_{2M}; ...; a_{1M}, a_{2M}, ..., a_{MM}] = T^T T,   T = [t_{11}, t_{12}, ..., t_{1M}; 0, t_{22}, ..., t_{2M}; ...; 0, 0, ..., t_{MM}].     (3.416)

We proceed by expressing each distinct element of A as a function of the elements of T and then taking their respective differentials. Thus, we have

a_{11} = t_{11}^2,
a_{12} = t_{11} t_{12},
...
a_{1M} = t_{11} t_{1M},
a_{22} = t_{12}^2 + t_{22}^2,
...
a_{2M} = t_{12} t_{1M} + t_{22} t_{2M},
...
a_{MM} = t_{1M}^2 + ··· + t_{MM}^2,

where we note that

da_{11} = 2 t_{11} dt_{11},
da_{12} = t_{11} dt_{12} + t_{12} dt_{11},
...
da_{1M} = t_{11} dt_{1M} + t_{1M} dt_{11}.

When taking the exterior product of the above differentials, the second term in da_{12}, ..., da_{1M} disappears since dt_{11} ∧ dt_{11} = 0 (according to the product rules), so that we have

∏_{n=1}^{M} da_{1n} = 2 t_{11}^{M} ∏_{n=1}^{M} dt_{1n}.     (3.417)

In the same manner (retaining only the differentials that have not already appeared in previous rows), we find that

∏_{n=m}^{M} da_{mn} = 2 t_{mm}^{M−m+1} ∏_{n=m}^{M} dt_{mn},

thus leading naturally to

(dA) = ∏_{m≤n} da_{mn} = ∏_{m=1}^{M} ∏_{n=m}^{M} da_{mn} = 2^M ∏_{m=1}^{M} t_{mm}^{M−m+1} ∏_{m≤n} dt_{mn} = 2^M ∏_{m=1}^{M} t_{mm}^{M−m+1} (dT).     (3.418)

Theorem 3.16. If A is a complex M × M positive definite random matrix, there exists a decomposition A = T^H T (Cholesky decomposition, def. 2.53), where T is upper triangular with real positive diagonal elements. Furthermore, we have

(dA) = 2^M ∏_{m=1}^{M} t_{mm}^{2M−2m+1} (dT),     (3.419)

where t_{mm} is the mth element of the diagonal of T.

Proof. Letting a_{mn} = α_{mn} + jβ_{mn} and t_{mn} = τ_{mn} + jµ_{mn}, in accordance with def. 2.53 (where the diagonal imaginary elements µ_{mm} are set to zero), the real part of A = T^H T reads

[α_{11}, α_{12}, ..., α_{1M}; α_{12}, α_{22}, ..., α_{2M}; ...; α_{1M}, α_{2M}, ..., α_{MM}] = [τ_{11}^2, τ_{11}τ_{12}, ..., τ_{11}τ_{1M}; τ_{11}τ_{12}, τ_{12}^2 + µ_{12}^2 + τ_{22}^2, ...; ...],     (3.420)

while the imaginary part reads

[0, β_{12}, ..., β_{1M}; −β_{12}, 0, ...; ...] = [0, τ_{11}µ_{12}, ..., τ_{11}µ_{1M}; −τ_{11}µ_{12}, 0, ...; ...].     (3.421)

Taking the differentials of the diagonal and above-diagonal elements of the real part of A, we have

dα_{11} = 2 τ_{11} dτ_{11},
dα_{12} = τ_{11} dτ_{12},
...
dα_{MM} = 2 τ_{MM} dτ_{MM}.

Likewise, the imaginary part yields

dβ_{12} = τ_{11} dµ_{12},
dβ_{13} = τ_{11} dµ_{13},
...
dβ_{1M} = τ_{11} dµ_{1M},
...
dβ_{M−1,M} = τ_{M−1,M−1} dµ_{M−1,M} + ···

It is noteworthy that the differential expressions above do not contain all the terms they should. This is because they are meant to be multiplied together using exterior product rules (so that repeated differentials yield 0), and redundant terms have been removed. Hence, only differentials in τ_{mn} were taken into account for the real part, and only differentials in µ_{mn} were kept for the imaginary part. Likewise, the first appearance of a given differential form precludes its reappearance in subsequent expressions. For example, dτ_{11} appears in the expression for dα_{11}, so that all terms containing dτ_{11} are omitted in the subsequent expressions dα_{12}, ..., dα_{MM}. Hence, we have

(dA) = (dℜ{A})(dℑ{A}) = ∏_{m≤n} dα_{mn} ∏_{m<n} dβ_{mn}
     = 2^M t_{11}^{M} t_{22}^{M−1} ··· t_{MM} ∏_{m≤n} dτ_{mn} × t_{11}^{M−1} t_{22}^{M−2} ··· t_{M−1,M−1} ∏_{m<n} dµ_{mn}
     = 2^M ∏_{m=1}^{M} t_{mm}^{2M−2m+1} (dℜ{T})(dℑ{T})
     = 2^M ∏_{m=1}^{M} t_{mm}^{2M−2m+1} (dT).     (3.422)
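Theorem 3.15 can be checked numerically: the map from the free parameters of T to the distinct elements of A = T^T T is quadratic, so a central-difference Jacobian is essentially exact and its determinant magnitude can be compared with 2^M ∏ t_{mm}^{M−m+1}. The Python/NumPy sketch below is an illustration under assumed sizes and seed; it is not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 3
P = M * (M + 1) // 2                                    # number of distinct elements
iu = [(m, n) for m in range(M) for n in range(m, M)]    # upper-triangular index pairs

def T_from_params(p):
    """Build the upper-triangular T from its free parameters."""
    T = np.zeros((M, M))
    for k, (m, n) in enumerate(iu):
        T[m, n] = p[k]
    return T

def A_params(p):
    """Distinct elements of A = T^T T as a function of the parameters of T."""
    T = T_from_params(p)
    A = T.T @ T
    return np.array([A[m, n] for (m, n) in iu])

p0 = rng.standard_normal(P)
for k, (m, n) in enumerate(iu):                         # enforce a positive diagonal
    if m == n:
        p0[k] = abs(p0[k]) + 0.5

h = 1e-6                                                # central-difference Jacobian of T -> A
J = np.zeros((P, P))
for k in range(P):
    e = np.zeros(P); e[k] = h
    J[:, k] = (A_params(p0 + e) - A_params(p0 - e)) / (2 * h)

T0 = T_from_params(p0)
lhs = abs(np.linalg.det(J))
rhs = 2**M * np.prod([T0[m, m] ** (M - m) for m in range(M)])  # 2^M prod t_mm^(M-m+1); 0-based exponent is M-m
print(lhs, rhs)
assert np.isclose(lhs, rhs, rtol=1e-5)
```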
Theorem 3.17. Let a complex N × M matrix Z of full rank, where N ≥ M, be defined as the product Z = U_1 T, where U_1 is an N × M unitary matrix such that U_1^H U_1 = I_M, and T is an M × M upper triangular matrix with real positive diagonal elements (QR decomposition). Then we have

(dZ) = ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT)(U_1^H dU_1).     (3.423)

Proof. We postulate a matrix U_2, being a function of U_1, of dimensions N × (N − M), such that

U = [U_1 | U_2] = [u_1, u_2, ..., u_N]     (3.424)

is a unitary N × N matrix. Furthermore, we have

(U_1^H dU_1) = ∏_{m=1}^{M} ∏_{n=m}^{N} u_n^H du_m.     (3.425)

We note that, according to the chain rule, dZ = dU_1 T + U_1 dT, and

U^H dZ = [U_1^H; U_2^H] dZ = [U_1^H dU_1 T + U_1^H U_1 dT; U_2^H dU_1 T + U_2^H U_1 dT],     (3.426)

which, noting that U_1^H U_1 = I_M and U_2^H U_1 = 0, reduces to

U^H dZ = [U_1^H dU_1 T + dT; U_2^H dU_1 T].     (3.427)

The exterior product of the left-hand side of (3.427) is given by

(U^H dZ) = |det(U)|^{2M} (dZ) = (dZ),     (3.428)

by virtue of theorem 3.10 (in its complex form, cf. (3.393)) and the fact that U is unitary.

Considering now the lower part of the right-hand side of (3.427), the row of U_2^H dU_1 T associated with u_m is

[u_m^H du_1, ..., u_m^H du_M] T,   m ∈ [M+1, N].     (3.429)

Applying theorem 3.9, the exterior product of the elements of this row is

|det(T)|^2 ∏_{n=1}^{M} u_m^H du_n,     (3.430)

so that the exterior product of all elements of U_2^H dU_1 T is

(U_2^H dU_1 T) = ∏_{m=M+1}^{N} ( |det(T)|^2 ∏_{n=1}^{M} u_m^H du_n )     (3.431)
               = |det(T)|^{2(N−M)} ∏_{m=M+1}^{N} ∏_{n=1}^{M} u_m^H du_n.     (3.432)

We now turn our attention to the upper part of the right-hand side of (3.427), namely U_1^H dU_1 T + dT. Since U_1 is unitary, we have U_1^H U_1 = I_M. Taking the differential of this identity yields U_1^H dU_1 + dU_1^H U_1 = 0, which implies that

U_1^H dU_1 = −dU_1^H U_1 = −(U_1^H dU_1)^H.     (3.433)

For the above to hold, U_1^H dU_1 must be skew-Hermitian, i.e. its real part is skew-symmetric and its imaginary part is symmetric. It follows that the real part of U_1^H dU_1 can be written as

ℜ{U_1^H dU_1} = [0, −ℜ{u_2^H du_1}, ..., −ℜ{u_M^H du_1}; ℜ{u_2^H du_1}, 0, ..., −ℜ{u_M^H du_2}; ...; ℜ{u_M^H du_1}, ℜ{u_M^H du_2}, ..., 0].     (3.434)

Postmultiplying the above by T and retaining only the terms contributing to the exterior product, we find that the subdiagonal elements are

ℜ{u_2^H du_1} t_{11};  ℜ{u_3^H du_1} t_{11}, ℜ{u_3^H du_2} t_{22};  ...;  ℜ{u_M^H du_1} t_{11}, ℜ{u_M^H du_2} t_{22}, ..., ℜ{u_M^H du_{M−1}} t_{M−1,M−1}.     (3.435)

Thus, we find that the exterior product of the subdiagonal elements of ℜ{U_1^H dU_1 T} is given by

t_{11}^{M−1} ∏_{m=2}^{M} ℜ{u_m^H du_1} · t_{22}^{M−2} ∏_{m=3}^{M} ℜ{u_m^H du_2} ··· t_{M−1,M−1} ℜ{u_M^H du_{M−1}} = ∏_{m=1}^{M} t_{mm}^{M−m} ∏_{m=1}^{M} ∏_{n=m+1}^{M} ℜ{u_n^H du_m}.     (3.436)

In the same fashion, we find that the exterior product of the diagonal and subdiagonal elements of ℑ{U_1^H dU_1} T is

∏_{m=1}^{M} t_{mm}^{M−m+1} ∏_{m=1}^{M} ∏_{n=m}^{M} ℑ{u_n^H du_m}.     (3.437)

Clearly, the above-diagonal and diagonal elements of ℜ{U_1^H dU_1 T} contribute nothing to the exterior product, since they all involve an element of dU_1 and all such elements already appear in the subdiagonal portion. The same argument applies to the above-diagonal elements of ℑ{U_1^H dU_1 T}. Furthermore, it can be verified that the inclusion of dT in U_1^H dU_1 T + dT amounts to multiplying the exterior product by

(dT) = ∏_{m≤n} dt_{mn}.     (3.438)

Multiplying (3.432), (3.436), (3.437) and (3.438), we finally find that

(dZ) = ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT)(U_1^H dU_1).     (3.439)
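A useful by-product of theorem 3.17 (combined with the decoupled integral that appears in the proof of theorem 3.19 below, with a = N) is that, for a matrix Z with i.i.d. CN(0,1) entries, the squared diagonal elements |t_mm|^2 of the triangular factor behave as Gamma(N − m + 1, 1) variates, with mean N − m + 1. The Monte Carlo sketch below (Python/NumPy, assumed sizes, seed and trial count; not part of the original text) checks these means using the standard QR routine, after rotating phases so that the diagonal of T is real positive.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, trials = 6, 3, 20000

sq = np.zeros(M)
for _ in range(trials):
    # Z has i.i.d. CN(0,1) entries (unit variance per complex entry)
    Z = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)                    # Z = Q R, R upper triangular
    d = np.diagonal(R)
    ph = d / np.abs(d)                        # phases of the diagonal
    T = np.conj(ph)[:, None] * R              # rotate rows so that diag(T) is real positive
    sq += np.abs(np.diagonal(T)) ** 2

print("sample mean of t_mm^2 :", sq / trials)
print("expected N-m+1        :", [N - m for m in range(M)])   # 0-based m: N-m equals N-(m+1)+1
```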
Theorem 3.18. Let a real N × M matrix X of full rank, where N ≥ M, be defined as the product X = H_1 T, where H_1 is an N × M matrix with orthonormal columns such that H_1^T H_1 = I_M, and T is an M × M real upper triangular matrix with positive diagonal elements (QR decomposition). Then we have

(dX) = ∏_{m=1}^{M} t_{mm}^{N−m} (dT)(H_1^T dH_1),     (3.440)

where

(H_1^T dH_1) = ∏_{m=1}^{M} ∏_{n=m+1}^{N} h_n^T dh_m.     (3.441)

The proof is left as an exercise.

Multivariate Gamma function

Definition 3.41. The multivariate Gamma function is a generalization of the Gamma function and is given by the multivariate integral

Γ_M(a) = ∫_{A>0} etr(−A) det(A)^{a−(M+1)/2} (dA),     (3.442)

where the integral is carried out over the space of real positive definite (symmetric) matrices. It is a point of interest that it can be verified that Γ_1(a) = Γ(a).

Definition 3.42. The complex multivariate Gamma function is defined as

Γ̃_M(a) = ∫_{A>0} etr(−A) det(A)^{a−M} (dA),     (3.443)

where the integral is carried out over the space of Hermitian positive definite matrices.

Theorem 3.19. The complex multivariate Gamma function can be computed in terms of standard Gamma functions by virtue of

Γ̃_M(a) = π^{M(M−1)/2} ∏_{m=1}^{M} Γ(a − m + 1).     (3.444)

Proof. Given definition 3.42, we apply the transformation A = T^H T to obtain (by virtue of theorem 3.16)

Γ̃_M(a) = 2^M ∫_{T} etr(−T^H T) det(T^H T)^{a−M} ∏_{m=1}^{M} t_{mm}^{2M−2m+1} (dT),     (3.445)

where the integration is over upper triangular matrices T with positive diagonal elements, and where we note that a triangular matrix has its eigenvalues along its diagonal. Therefore,

det(T^H T) = ∏_{m=1}^{M} t_{mm}^2,     (3.446)

and

tr(T^H T) = ∑_{m≤n} |t_{mn}|^2.     (3.447)

Using (3.446) and (3.447), (3.445) can be rewritten in decoupled form, i.e.

Γ̃_M(a) = 2^M ∫_{T} exp(−∑_{m≤n} |t_{mn}|^2) ∏_{m=1}^{M} t_{mm}^{2a−2m+1} ∏_{m≤n} dt_{mn}
        = ∏_{m<n} ( ∫_{−∞}^{∞} e^{−ℜ{t_{mn}}^2} dℜ{t_{mn}} ∫_{−∞}^{∞} e^{−ℑ{t_{mn}}^2} dℑ{t_{mn}} ) × 2^M ∏_{m=1}^{M} ∫_{0}^{∞} e^{−t_{mm}^2} t_{mm}^{2a−2m+1} dt_{mm}.     (3.448)

Knowing that

∫_{−∞}^{∞} e^{−t^2} dt = √π,     (3.449)

and applying the substitution u_m = t_{mm}^2 in the diagonal integrals, we find

Γ̃_M(a) = π^{M(M−1)/2} ∏_{m=1}^{M} ∫_{0}^{∞} e^{−u_m} u_m^{a−m} du_m,     (3.450)

where the remaining integrals correspond to standard Gamma functions, thus yielding

Γ̃_M(a) = π^{M(M−1)/2} ∏_{m=1}^{M} Γ(a − m + 1).     (3.451)
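The product form (3.444) is also convenient to compute, preferably in the log domain since the factors grow quickly with M and a. The short Python/SciPy helper below is a sketch (the function name and test values are arbitrary); it sanity-checks the trivial case Γ̃_1(a) = Γ(a) and the one-step recursion implied by the product form.

```python
import numpy as np
from scipy.special import gammaln

def log_cmv_gamma(M, a):
    """log of the complex multivariate Gamma function, via theorem 3.19."""
    m = np.arange(1, M + 1)
    return 0.5 * M * (M - 1) * np.log(np.pi) + np.sum(gammaln(a - m + 1))

a, M = 5.0, 4
assert np.isclose(log_cmv_gamma(1, a), gammaln(a))            # Gamma_tilde_1(a) = Gamma(a)
# Recursion Gamma_tilde_M(a) = pi^(M-1) Gamma(a-M+1) Gamma_tilde_(M-1)(a), from the product form
lhs = log_cmv_gamma(M, a)
rhs = (M - 1) * np.log(np.pi) + gammaln(a - M + 1) + log_cmv_gamma(M - 1, a)
assert np.isclose(lhs, rhs)
print(np.exp(log_cmv_gamma(2, 3.0)))                          # Gamma_tilde_2(3) = pi * 2! * 1! = 2*pi
```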
Stiefel manifold

Consider the matrix H_1 in theorem 3.18. It is an N × M matrix with orthonormal columns. The space of all such matrices is called the Stiefel manifold and is denoted V_{M,N}. Mathematically, this is stated as

V_{M,N} = { H_1 ∈ R^{N×M} ; H_1^T H_1 = I_M }.     (3.452)

However, the complex counterpart of this concept will be found more useful in the study of space-time multidimensional communication systems.

Definition 3.43. The N × M complex Stiefel manifold is the space spanned by all N × M unitary matrices (denoted U_1) and is denoted Ṽ_{M,N}, i.e.

Ṽ_{M,N} = { U_1 ∈ C^{N×M} ; U_1^H U_1 = I_M }.

Theorem 3.20. The volume of the complex Stiefel manifold is

Vol(Ṽ_{M,N}) = ∫_{Ṽ_{M,N}} (U_1^H dU_1) = 2^M π^{MN} / Γ̃_M(N).

Proof. If the real and imaginary parts of all the elements of U_1 are treated as individual coordinates, then a given instance of U_1 defines a point in 2MN-dimensional Euclidean space. Furthermore, the constraining equation U_1^H U_1 = I_M can be decomposed into its real and imaginary parts. Since U_1^H U_1 is necessarily Hermitian, the real part is symmetric and leads to (1/2)M(M+1) constraints on the elements of U_1. The imaginary part is skew-symmetric and thus leads to (1/2)M(M−1) constraints. It follows that there are M^2 constraints on the position of the point in 2MN-dimensional Euclidean space corresponding to U_1. Hence, the point lies on a (2MN − M^2)-dimensional surface in 2MN-space. Moreover, the constraining equation implies

∑_{m=1}^{M} ∑_{n=1}^{N} |u_{nm}|^2 = M.     (3.453)

Geometrically, this means that the said surface is a portion of a sphere of radius √M.

Consider an N × M complex Gaussian matrix X with N ≥ M such that X ∼ CN(0, I_N ⊗ I_M). By virtue of theorem 3.7, its density is

f_X(X) = π^{−NM} etr(−X^H X).     (3.454)

Since a density function integrates to 1, it is clear that

∫_X etr(−X^H X)(dX) = π^{MN}.     (3.455)

Applying the transformation X = U_1 T, where U_1 ∈ Ṽ_{M,N} and T is upper triangular with positive diagonal elements (since it is nonsingular, in accordance with the definition of the QR decomposition), we have

tr(X^H X) = tr(T^H T) = ∑_{m≤n} |t_{mn}|^2,
(dX) = ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT)(U_1^H dU_1).     (3.456)

Then, eq. (3.455) becomes

∫_{T} exp(−∑_{m≤n}|t_{mn}|^2) ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT) ∫_{Ṽ_{M,N}} (U_1^H dU_1) = π^{MN}.     (3.457)

An integral having the same form as the integral over the elements of T above was solved in the proof of theorem 3.19. Applying this result, we find that

∫_{T} exp(−∑_{m≤n}|t_{mn}|^2) ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT) = Γ̃_M(N) / 2^M.     (3.458)

From the above and (3.455), it is obvious that

Vol(Ṽ_{M,N}) = 2^M π^{MN} / Γ̃_M(N).     (3.459)

Wishart matrices

As a preamble to this subsection, we introduce yet another theorem related to the complex multivariate Gamma function.

Theorem 3.21. Given a Hermitian positive definite M × M matrix C and a scalar a such that ℜ{a} > M − 1, then

∫_{A>0} etr(−C^{-1}A) det(A)^{a−M} (dA) = Γ̃_M(a) det(C)^a,

where integration is carried out over the space of Hermitian positive definite matrices.

Proof. Applying the transformation A = C^{1/2} B C^{1/2}, where C^{1/2} is the positive definite square root of C (e.g. obtained through eigenanalysis and taking the square root of the eigenvalues), theorem 3.13 implies that (dA) = det(C)^M (dB). Hence, the integral becomes

I = ∫_{A>0} etr(−C^{-1}A) det(A)^{a−M} (dA)
  = ∫_{B>0} etr(−C^{-1} C^{1/2} B C^{1/2}) det(B)^{a−M} (dB) det(C)^{a−M+M}
  = ∫_{B>0} etr(−B) det(B)^{a−M} (dB) det(C)^a.     (3.460)

Thus, the transformed integral now coincides with the definition of Γ̃_M(a), leading us directly to

I = Γ̃_M(a) det(C)^a.     (3.461)

From the above proof, it is easy to deduce that

f_A(A) = [Γ̃_M(a) det(C)^a]^{-1} etr(−C^{-1}A) det(A)^{a−M}     (3.462)

is a multivariate PDF, since it integrates to 1 and is positive given the constraint A > 0. This is an instance of the Wishart distribution, as detailed hereafter.

Definition 3.44. Given an N × M complex Gaussian matrix Z ∼ CN(0, I_N ⊗ Σ), where N ≥ M, whose PDF is given by

f_Z(Z) = [π^{MN} det(Σ)^N]^{-1} etr(−Σ^{-1} Z^H Z),

the matrix A = Z^H Z is of size M × M and follows a complex Wishart distribution with N degrees of freedom, denoted A ∼ CW_M(N, Σ).

Unless stated otherwise, it will be assumed in the following that the number of degrees of freedom N is equal to or greater than the Wishart matrix dimension M, i.e. the Wishart matrix is nonsingular.
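Sampling from definition 3.44 is straightforward: draw Z with N independent rows from CN(0, Σ) and form A = Z^H Z. As a quick sanity check one can use the elementary fact (which follows directly from A = Z^H Z) that E[A] = NΣ. The Python/NumPy sketch below is illustrative only; Σ, the sizes, the seed and the trial count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M, trials = 8, 3, 10000

# An arbitrary Hermitian positive definite scale matrix Sigma (assumed example)
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T / M + np.eye(M)
L = np.linalg.cholesky(Sigma)                  # Sigma = L L^H

acc = np.zeros((M, M), dtype=complex)
for _ in range(trials):
    # Z has N i.i.d. rows drawn from CN(0, Sigma): row = w L^H with w ~ CN(0, I_M)
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    acc += Z.conj().T @ Z                      # A = Z^H Z ~ CW_M(N, Sigma)

# Empirical mean should approach N * Sigma (up to Monte Carlo error)
print(np.max(np.abs(acc / trials - N * Sigma)))
```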
Theorem 3.22. If A ∼ CW_M(N, Σ), then its PDF is given by

f_A(A) = etr(−Σ^{-1}A) det(A)^{N−M} / [Γ̃_M(N) det(Σ)^N].

Proof. Given that A = Z^H Z in accordance with definition 3.44, we apply the transformation Z = U_1 T, where U_1 is N × M and unitary (i.e. U_1^H U_1 = I_M) and T is M × M and upper triangular with positive diagonal elements. From the PDF of Z given in definition 3.44 and theorem 3.17, the joint probability element of (U_1, T) is

f_{U_1,T}(U_1, T)(dT)(U_1^H dU_1) = [π^{MN} det(Σ)^N]^{-1} etr(−Σ^{-1} T^H T) ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT)(U_1^H dU_1),     (3.463)

where we note that A = Z^H Z = T^H U_1^H U_1 T = T^H T, so that U_1 can be removed by integration, i.e.

f_T(T)(dT) = [π^{MN} det(Σ)^N]^{-1} etr(−Σ^{-1} T^H T) ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT) ∫_{Ṽ_{M,N}} (U_1^H dU_1),     (3.464)

which, according to theorem 3.20, yields

f_T(T)(dT) = 2^M [Γ̃_M(N) det(Σ)^N]^{-1} etr(−Σ^{-1} T^H T) ∏_{m=1}^{M} t_{mm}^{2N−2m+1} (dT).     (3.465)

Applying a second transformation A = T^H T and invoking theorem 3.16, i.e. (dA) = 2^M ∏_{m=1}^{M} t_{mm}^{2M−2m+1} (dT), the factors 2^M cancel and we find that

f_A(A)(dA) = [Γ̃_M(N) det(Σ)^N]^{-1} etr(−Σ^{-1}A) ∏_{m=1}^{M} t_{mm}^{2(N−M)} (dA),     (3.466)

where ∏_{m=1}^{M} t_{mm}^{2(N−M)} = |det(T)|^{2(N−M)} = det(A)^{N−M}, which concludes the proof.

While the density above was derived assuming that N is an integer, the distribution exists for non-integer values of N as well.

Theorem 3.23. If A ∼ CW_M(N, Σ), then its characteristic function is

φ_A(jΘ) = ⟨etr(jΘA)⟩ = det(I − jΘΣ)^{−N},

where

Θ = [t_{mn}]_{m,n=1,...,M} + diag(t_{11}, t_{22}, ..., t_{MM}),

i.e. the off-diagonal elements of Θ are t_{mn} and its diagonal elements are 2t_{mm}, and t_{mn} is the variable in the characteristic function domain associated with element a_{mn} of the matrix A. Since A is Hermitian, t_{mn} = t_{nm}.

Several aspects of the above theorem are interesting. First, a new definition of characteristic functions — using the matrix Θ — is introduced for random matrices, allowing for convenient and concise C. F. expressions. Second, we note that if M = 1, the complex Wishart matrix reduces to a scalar a, and its characteristic function according to the above theorem is (1 − jtσ^2)^{−N}, i.e. the C. F. of a 2N degrees-of-freedom chi-square variate.

Proof. We have

φ_A(jΘ) = ⟨etr(jΘA)⟩ = ∫_{A>0} etr(jΘA) f_A(A)(dA)
        = ∫_{A>0} etr((jΘ − Σ^{-1})A) det(A)^{N−M} [Γ̃_M(N) det(Σ)^N]^{-1} (dA),     (3.467)

where the integral can be solved by virtue of theorem 3.21 to yield

φ_A(jΘ) = det(Σ)^{−N} det(Σ^{-1} − jΘ)^{−N} = det(I − jΘΣ)^{−N}.     (3.468)

Theorem 3.24. If A ∼ CW_M(N, Σ) and X is a K × M matrix of rank K (thus implying that K ≤ M), then the product XAX^H also follows a complex Wishart distribution, namely CW_K(N, XΣX^H).

Proof. The C. F. of XAX^H is given by

φ(jΘ) = ⟨etr(jΘXAX^H)⟩,     (3.469)

where, according to a property of traces, the order of the matrices can be rotated to yield

φ(jΘ) = ⟨etr(jX^HΘXA)⟩ = det(I_M − jX^HΘXΣ)^{−N},     (3.470)

which, according to property 2.35, is equivalent to

φ(jΘ) = det(I_K − jΘXΣX^H)^{−N}.     (3.471)

Since this is the C. F. of a CW_K(N, XΣX^H) variate, the proof is complete.

Theorem 3.25. If A ∼ CW_M(N, Σ) and given equivalent partitionings of A and Σ, i.e.

Σ = [Σ_{11}, Σ_{12}; Σ_{12}^H, Σ_{22}],   A = [A_{11}, A_{12}; A_{12}^H, A_{22}],

where A_{11} and Σ_{11} are of size M′ × M′, then the submatrix A_{11} follows a CW_{M′}(N, Σ_{11}) distribution.

Proof. Letting X = [I_{M′} 0] (an M′ × M matrix) in theorem 3.24, we find that XAX^H = A_{11} and XΣX^H = Σ_{11}, thus completing the proof.
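The matrix characteristic function of theorem 3.23 can be probed by Monte Carlo: for a (small) Hermitian Θ, the sample average of etr(jΘA) over draws of A ∼ CW_M(N, Σ) should approach det(I − jΘΣ)^{−N}. The Python/NumPy sketch below is illustrative and not from the original text; Σ, Θ, sizes, seed and trial count are assumed, and agreement is only up to Monte Carlo error (roughly 1/√trials).

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, trials = 6, 2, 100000

G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T / M + np.eye(M)            # assumed scale matrix
L = np.linalg.cholesky(Sigma)
H = 0.1 * (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
Theta = (H + H.conj().T) / 2                      # small Hermitian argument of the C.F.

# Batch of Wishart draws A = Z^H Z with rows of Z ~ CN(0, Sigma)
W = (rng.standard_normal((trials, N, M)) + 1j * rng.standard_normal((trials, N, M))) / np.sqrt(2)
Z = W @ L.conj().T
A = np.conj(np.transpose(Z, (0, 2, 1))) @ Z
tr = np.einsum('ij,kji->k', Theta, A)             # tr(Theta @ A_k) for each draw
mc = np.mean(np.exp(1j * tr))

closed_form = np.linalg.det(np.eye(M) - 1j * Theta @ Sigma) ** (-N)
print(mc, closed_form)                            # should agree within Monte Carlo error
```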
Theorem 3.26. If A ∼ CW_M(N, Σ) and x is any M × 1 random vector independent of A such that the probability that x = 0 is null, then

x^H A x / x^H Σ x ∼ χ^2_{2N},

and is independent of x.

Proof. The distribution derives naturally from applying theorem 3.24 with X = x^H therein. To show independence, let y = Σ^{1/2} x. Thus, we have

x^H A x / x^H Σ x = y^H B y / y^H y,     (3.472)

where B = Σ^{-1/2} A Σ^{-1/2} follows a CW_M(N, I_M) distribution. This is also equivalent to

y^H B y / y^H y = z^H B z,     (3.473)

where z = y / √(y^H y) is obviously a unit vector. It follows that the quadratic form is independent of the length of y. Likewise, B has the same distribution as U^H B U, where U is any M × M unitary matrix (i.e. a rotation in space). It follows that z^H B z is also independent of the angle of y in M-dimensional Hermitian space. Therefore, the quadratic form is totally independent of y and, by extension, of x.

Theorem 3.27. Suppose that A ∼ CW_M(N, Σ), where N ≥ M, and consider the partitionings

Σ = [Σ_{11}, Σ_{12}; Σ_{12}^H, Σ_{22}],   A = [A_{11}, A_{12}; A_{12}^H, A_{22}],

where A_{11} and Σ_{11} are P × P, A_{22} and Σ_{22} are Q × Q, and P + Q = M. Then:

the matrix A_{11.2} = A_{11} − A_{12} A_{22}^{-1} A_{12}^H follows a CW_P(N − Q, Σ_{11.2}) distribution, where Σ_{11.2} = Σ_{11} − Σ_{12} Σ_{22}^{-1} Σ_{12}^H;

A_{11.2} is independent of A_{12} and A_{22};

the PDF of A_{22} is CW_Q(N, Σ_{22});

the PDF of A_{12} conditioned on A_{22} is CN(Σ_{12} Σ_{22}^{-1} A_{22}, A_{22} ⊗ Σ_{11.2}).

Proof. Given the distribution of A, it can be expressed as A = XX^H, where X is M × N and distributed as CN(0, Σ ⊗ I_N). Furthermore, X is partitioned as follows:

X = [X_1^H  X_2^H]^H,     (3.474)

where X_1 is P × N and X_2 is Q × N. Since X_2 is made up of uncorrelated columns, its rank is Q (with probability 1). Therefore, theorem 2.6 guarantees the existence of a matrix B of size (N − Q) × N and of rank N − Q such that X_2 B^H = 0, BB^H = I_{N−Q}, and

Y = [X_2^H  B^H]^H     (3.475)

is nonsingular. Observe that

A_{22} = X_2 X_2^H,   A_{12} = X_1 X_2^H,     (3.476)

and

A_{11.2} = A_{11} − A_{12} A_{22}^{-1} A_{12}^H = X_1 [ I_N − X_2^H (X_2 X_2^H)^{-1} X_2 ] X_1^H.     (3.477)

Starting from

Y^H (YY^H)^{-1} Y = I_N,     (3.478)

which derives directly from the nonsingularity of Y, it can be shown that

X_2^H (X_2 X_2^H)^{-1} X_2 + B^H B = I_N,     (3.479)

by simply expanding the matrix Y into its partitions in (3.478) and exploiting the properties of X_2 and B to perform various simplifications. Substituting (3.479) into (3.477), we find that

A_{11.2} = X_1 B^H B X_1^H.     (3.480)

Given that X is CN(0, Σ ⊗ I_N), it follows according to theorem 3.6 that the distribution of X_1 conditioned on X_2 is CN(Σ_{12} Σ_{22}^{-1} X_2, I_N ⊗ Σ_{11.2}), where Σ_{11.2} = Σ_{11} − Σ_{12} Σ_{22}^{-1} Σ_{12}^H, and the corresponding density is

f_{X_1|X_2}(X_1|X_2) = [π^{PN} det(Σ_{11.2})^N]^{-1} etr( −Σ_{11.2}^{-1} (X_1 − M)(X_1 − M)^H ),     (3.481)

where M = Σ_{12} Σ_{22}^{-1} X_2.

Applying the transformation [A_{12}  C] = X_1 Y^H (so that C = X_1 B^H), the Jacobian is found to be

J(X_1 → A_{12}, C) = |det(Y^H)|^{2P} = det(YY^H)^P = det(A_{22})^P,     (3.482)

by exploiting theorems 3.10 and 3.9 and noting that YY^H is block diagonal with blocks A_{22} and I_{N−Q}. Furthermore, the argument of etr(·) in (3.481) can be expressed as follows:

(X_1 − M)(X_1 − M)^H = ( [A_{12}  C] − MY^H ) (YY^H)^{-1} ( [A_{12}  C] − MY^H )^H
                     = (A_{12} − M̄) A_{22}^{-1} (A_{12} − M̄)^H + CC^H,     (3.483)

where M̄ = Σ_{12} Σ_{22}^{-1} A_{22} (indeed, MY^H = [M X_2^H  M B^H] = [M̄  0] since X_2 B^H = 0), thus yielding

f_{A_{12},C|X_2}(A_{12}, C|X_2) = [π^{PN} det(Σ_{11.2})^N det(A_{22})^P]^{-1} etr( −Σ_{11.2}^{-1} CC^H − Σ_{11.2}^{-1} (A_{12} − M̄) A_{22}^{-1} (A_{12} − M̄)^H ).     (3.484)

Since the above density factors, this implies that C follows a CN(0, I_{N−Q} ⊗ Σ_{11.2}) distribution and that A_{12} conditioned on X_2 (equivalently, on A_{22}) follows a CN(Σ_{12} Σ_{22}^{-1} A_{22}, A_{22} ⊗ Σ_{11.2}) distribution. Furthermore, C is independently distributed from both A_{12} and X_2. This, in turn, implies that A_{11.2} = CC^H is independently distributed from A_{12} and A_{22}. It readily follows that A_{11.2} = CC^H ∼ CW_P(N − Q, Σ_{11.2}) and A_{22} ∼ CW_Q(N, Σ_{22}), while A_{12} conditioned on A_{22} follows a CN(Σ_{12} Σ_{22}^{-1} A_{22}, A_{22} ⊗ Σ_{11.2}) distribution, thus completing the proof.
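Theorem 3.27 can be spot-checked by Monte Carlo through the mean of the Schur complement: since A_{11.2} ∼ CW_P(N − Q, Σ_{11.2}), its expectation is (N − Q)Σ_{11.2} (using the same elementary Wishart-mean property as in the sketch following definition 3.44). The Python/NumPy code below is illustrative; Σ, the partition sizes, seed and trial count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
N, P, Q, trials = 10, 2, 2, 5000
M = P + Q

# Assumed scale matrix Sigma and its Schur complement Sigma_11.2
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T / M + np.eye(M)
S11, S12, S22 = Sigma[:P, :P], Sigma[:P, P:], Sigma[P:, P:]
Sigma_112 = S11 - S12 @ np.linalg.inv(S22) @ S12.conj().T

L = np.linalg.cholesky(Sigma)
acc = np.zeros((P, P), dtype=complex)
for _ in range(trials):
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    A = Z.conj().T @ Z                                      # A ~ CW_M(N, Sigma)
    A11, A12, A22 = A[:P, :P], A[:P, P:], A[P:, P:]
    acc += A11 - A12 @ np.linalg.inv(A22) @ A12.conj().T    # Schur complement A_11.2

# Small if A_11.2 ~ CW_P(N-Q, Sigma_11.2), whose mean is (N-Q) * Sigma_11.2
print(np.max(np.abs(acc / trials - (N - Q) * Sigma_112)))
```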
Theorem 3.28. If A ∼ CW_M(N, Σ) and X is a K × M matrix of rank K, where K ≤ M, then B = (X A^{-1} X^H)^{-1} follows a CW_K(N − M + K, (X Σ^{-1} X^H)^{-1}) distribution.

Proof. Applying the transformation A_2 = Σ^{-1/2} A Σ^{-1/2}, we have A_2 ∼ CW_M(N, I_M) as a consequence of theorem 3.24. The problem can thus be recast in terms of A_2 (which has the virtue of exhibiting no correlations, thus simplifying the subsequent proof), i.e.

(X A^{-1} X^H)^{-1} = (Y A_2^{-1} Y^H)^{-1},     (3.485)

where Y = X Σ^{-1/2}. It follows that

(X Σ^{-1} X^H)^{-1} = (Y Y^H)^{-1}.     (3.486)

Performing the SVD of Y according to (2.115), we have

Y = U [S  0] V^H = U S [I_K  0] V^H.     (3.487)

Therefore, we have

Y A_2^{-1} Y^H = U S [I_K  0] V^H A_2^{-1} V [I_K  0]^H S U^H = U S ( [I_K  0] D^{-1} [I_K  0]^H ) S U^H,     (3.488)

where D = V^H A_2 V and, according to theorem 3.24, its PDF is also CW_M(N, I_M). It is clear that the product inside the parentheses above yields the upper-left K × K partition of D^{-1}. According to lemma 2.1, this is equivalent to

(Y A_2^{-1} Y^H)^{-1} = U S^{-1} D_{11.2} S^{-1} U^H,     (3.489)

where D_{11.2} = D_{11} − D_{12} D_{22}^{-1} D_{21}, which, by virtue of theorem 3.27, follows a CW_K(N − M + K, I_K) distribution. It immediately follows that U S^{-1} D_{11.2} S^{-1} U^H is CW_K(N − M + K, U S^{-1} S^{-1} U^H), where

U S^{-1} S^{-1} U^H = (U S S U^H)^{-1} = (Y Y^H)^{-1} = (X Σ^{-1} X^H)^{-1},     (3.490)

thus completing the proof.

A highly useful consequence of the above theorem follows.

Theorem 3.29. Given a matrix A ∼ CW_M(N, Σ) and any M × 1 random vector x independent of A such that P(x = 0) = 0, then

x^H Σ^{-1} x / x^H A^{-1} x ∼ χ^2_{2N−2M+2},

and is independent of x.

The proof is left as an exercise.
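Before turning to the problems, the ratio in theorem 3.29 can be checked empirically. Under the unit-variance-per-complex-dimension normalization used above (each entry of Z has E|z|^2 = 1), the ratio x^H Σ^{-1} x / x^H A^{-1} x is a Gamma(N − M + 1, 1) variate, i.e. the complex χ^2_{2N−2M+2} law of the theorem, with mean N − M + 1. The Python/NumPy-SciPy sketch below is an illustration (Σ, x, sizes, seed and trial count are assumptions, and x is taken as a fixed nonzero vector, which the theorem allows).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
N, M, trials = 8, 3, 20000

G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T / M + np.eye(M)
L = np.linalg.cholesky(Sigma)
Sinv = np.linalg.inv(Sigma)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # any fixed nonzero vector

q = np.empty(trials)
for k in range(trials):
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    A = Z.conj().T @ Z
    q[k] = (x.conj() @ Sinv @ x).real / (x.conj() @ np.linalg.inv(A) @ x).real

print("sample mean:", q.mean(), "  expected:", N - M + 1)
print("KS p-value vs Gamma(N-M+1, 1):", stats.kstest(q, "gamma", args=(N - M + 1,)).pvalue)
```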
Problems

3.1. From the axioms of probability (see definition 3.8), demonstrate that the probability of an event E must necessarily satisfy P(E) ≤ 1.

3.2. What is the link between the transformation technique used to obtain the PDF of a transformed random variable and the basic calculus technique of variable substitution used in symbolic integration?

3.3. Given your answer in 3.2, show that Jacobians can be used to solve multiple integrals by permitting multivariate substitution.

3.4. Show that if Y follows a uniform distribution within arbitrary bounds a and b (Y ∼ U(a, b)), then it can be obtained as a transformation upon X ∼ U(0, 1). Find the transformation function.

3.5. Propose a proof for the Wiener-Khinchin theorem (eq. 3.121).

3.6. Given that the characteristic function of a central 2 degrees-of-freedom chi-square variate with variance P is given by

Ψ_x(jv) = 1 / (1 − jπvP),

derive the infinite-series representation for the Rice PDF (envelope).

3.7. Demonstrate (3.317).

3.8. From (3.317), derive the finite sum form which applies if n/2 is an integer.

3.9. Show that

f_x(x) = (2π)^{−M/2} det(Σ)^{−1/2} exp( −(1/2)(x − µ)^T Σ^{-1} (x − µ) )     (3.491)

is indeed a PDF. Hint: integrate over all elements of x to show that the area under the surface of f_x(x) is equal to 1. To do so, use a transformation of random variables to decouple the multiple integrals.

3.10. Given two χ^2 random variables Z_1 and Z_2 with PDFs

f_{Z_1}(x) = x^{n_1−1} e^{−x} / Γ(n_1),
f_{Z_2}(x) = x^{n_2−1} e^{−x/2.5} / (Γ(n_2) 2.5^{n_2}),

(a) What is the PDF of the ratio Y = Z_1 / Z_2?
(b) If n_1 = 2 and n_2 = 2, derive the PDF of Y = Z_1 + Z_2. Hint: use characteristic functions and expansion into partial fractions.

3.11. Write a proof for theorem 3.18.

3.12. If X = BY, where X and Y are N × M complex random matrices and B is a fixed positive definite N × N matrix, prove that the Jacobian is given by

(dX) = |det(B)|^{2M} (dY).     (3.492)

3.13. Show that if W ∼ CW_M(N, I_M), then its diagonal elements w_{11}, w_{22}, ..., w_{MM} are independent and identically distributed according to a χ^2 law with 2N degrees of freedom.

3.14. Write a proof for theorem 3.29. Hint: start by studying theorems 3.25 and 3.26.

APPENDIX 3.A: Derivation of the Gaussian distribution

Given the binomial discrete probability distribution

g(x) = P(X = x) = C(N, x) p^x (1 − p)^{N−x},     (3.A.1)

where x is an integer between 0 and N and C(N, x) denotes the binomial coefficient, we wish to show that it tends towards the Gaussian distribution as N gets large. In fact, Abraham de Moivre originally derived the Gaussian distribution as an approximation to the binomial distribution and as a means of quickly calculating cumulative probabilities (e.g. the probability that no more than 7 heads are obtained in 10 coin tosses). The first appearance of the normal distribution and the associated CDF (the probability integral) occurs in a Latin pamphlet published by de Moivre in 1733 [Daw and Pearson, 1972]. This original derivation hinges on the approximation to the Gamma function known as Stirling's formula (which was, in fact, discovered in a simpler form and used by de Moivre before Stirling). It is the approach we follow here. The normal distribution was rediscovered by Laplace in 1778 as he derived the central limit theorem, and rediscovered independently by Adrain in 1808 and Gauss in 1809 in the process of characterizing the statistics of errors in astronomical measurements.

We start by expanding log(g(x)) into its Taylor series representation about the point x_m where g(x) is maximum. Since the relation between g(x) and log(g(x)) is monotonic, log(g(x)) is also maximum at x = x_m. Hence, we have

log(g(x)) = log(g(x_m)) + [d log(g(x))/dx]_{x=x_m} (x − x_m) + (1/2!) [d^2 log(g(x))/dx^2]_{x=x_m} (x − x_m)^2 + (1/3!) [d^3 log(g(x))/dx^3]_{x=x_m} (x − x_m)^3 + ···
          = ∑_{k=0}^{∞} (1/k!) [d^k log(g(x))/dx^k]_{x=x_m} (x − x_m)^k.     (3.A.2)

Taking the logarithm of g(x), we find

log(g(x)) = log(N!) − log(x!) − log((N − x)!) + x log(p) + (N − x) log(1 − p).     (3.A.3)

Since N is large by hypothesis and the expansion is about x_m, which is distant from both 0 and N (since it corresponds to the maximum of g(x)), thus making x and N − x also large, Stirling's approximation formula can be applied to the factorials x! and (N − x)! to yield

log(x!) ≈ (1/2) log(2πx) + x(log(x) − 1) ≈ x(log(x) − 1).     (3.A.4)

Thus, we have

d log(x!)/dx ≈ (log(x) − 1) + 1 = log(x),     (3.A.5)

and

d log((N − x)!)/dx ≈ d/dx [(N − x)(log(N − x) − 1)] = −(log(N − x) − 1) + (N − x) · (−1)/(N − x) = −log(N − x),     (3.A.6)

which, combined with (3.A.3), leads to

d log(g(x))/dx ≈ −log(x) + log(N − x) + log(p) − log(1 − p).     (3.A.7)

The maximum point x_m can easily be found by setting the above derivative to 0 and solving for x. Hence,

log( p(N − x_m) / ((1 − p) x_m) ) = 0
p(N − x_m) / ((1 − p) x_m) = 1
(N − x_m) p = (1 − p) x_m
x_m (p + (1 − p)) = Np
x_m = Np.     (3.A.8)

We can now find the first coefficients of the Taylor series expansion. First, we have

[d log(g(x))/dx]_{x=x_m} = 0,     (3.A.9)

by definition, since x_m is the maximum. Using (3.A.8), we have

[d^2 log(g(x))/dx^2]_{x=x_m} = −1/x_m − 1/(N − x_m) = −1/(Np(1 − p)),     (3.A.10)

[d^3 log(g(x))/dx^3]_{x=x_m} = 1/x_m^2 − 1/(N − x_m)^2 = (1 − 2p)/(N^2 p^2 (1 − p)^2),     (3.A.11)

where we note that the third coefficient is much smaller (by a factor proportional to 1/N) than the second one as N and x_m grow large. Since their contribution is not significant, all terms in the Taylor expansion beyond k = 2 are neglected. Taking the exponential of (3.A.2), we find

g(x) ≈ g(x_m) e^{−(x − x_m)^2 / (2Np(1−p))},     (3.A.12)

which is a smooth function of x. It only needs to be normalized to become a PDF, i.e.

f_X(x) = K g(x),     (3.A.13)

where

K = [ ∫_{−∞}^{∞} g(x) dx ]^{-1},     (3.A.14)

and

∫_{−∞}^{∞} g(x) dx = g(x_m) ∫_{−∞}^{∞} e^{−(x − x_m)^2 / (2Np(1−p))} dx = g(x_m) ∫_{−∞}^{∞} e^{−u^2 / (2Np(1−p))} du = g(x_m) √(2πNp(1 − p)).     (3.A.15)

Therefore, the distribution is

f_X(x) = (1/(√(2π) σ)) e^{−(x − µ)^2 / (2σ^2)},     (3.A.16)

where the variance is

σ^2 = Np(1 − p),     (3.A.17)

and the mean is given by

µ = x_m = Np,     (3.A.18)

since the distribution is symmetric about its peak, which is situated at x_m = Np.
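The quality of the de Moivre approximation (3.A.16)-(3.A.18) is easily visualized numerically. The Python/SciPy sketch below (not part of the original text; N, p and the continuity correction in the second check are illustrative assumptions) compares the binomial pmf with the Gaussian density of matching mean and variance, and revisits the "no more than 7 heads in 10 tosses" example mentioned above.

```python
import numpy as np
from scipy import stats

N, p = 100, 0.3
x = np.arange(N + 1)

binom_pmf = stats.binom.pmf(x, N, p)
mu, sigma = N * p, np.sqrt(N * p * (1 - p))          # (3.A.17)-(3.A.18)
gauss_pdf = stats.norm.pdf(x, loc=mu, scale=sigma)   # (3.A.16) evaluated at the integers

print("max absolute difference:", np.max(np.abs(binom_pmf - gauss_pdf)))

# de Moivre's motivating example: P(no more than 7 heads in 10 fair tosses),
# approximated with a continuity correction (an assumption, not in the text)
print(stats.binom.cdf(7, 10, 0.5), stats.norm.cdf(7.5, 5, np.sqrt(2.5)))
```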
References

[Andersen, 1958] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New York: J. Wiley & Sons, 1958.

[Davenport, 1970] W. B. Davenport, Jr., Probability and Random Processes. New York: McGraw-Hill, 1970.

[Daw and Pearson, 1972] R. H. Daw and E. S. Pearson, "Studies in the history of probability and statistics XXX: Abraham de Moivre's 1733 derivation of the normal curve: a bibliographical note," Biometrika, vol. 59, pp. 677-680, 1972.

[Goodman, 1963] N. R. Goodman, "Statistical analysis based on a certain multivariate complex Gaussian distribution," Ann. Math. Statist., vol. 34, pp. 152-177, 1963.

[James, 1954] A. T. James, "Normal multivariate analysis and the orthogonal group," Ann. Math. Statist., vol. 25, pp. 40-75, 1954.

[Knobloch, 1994] E. Knobloch, "From Gauss to Weierstrass: determinant theory and its evaluations," in The Intersection of History and Mathematics, vol. 15 of Sci. Networks Hist. Stud., pp. 51-66, 1994.

[Leon-Garcia, 1994] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering, 2nd ed. Reading, MA: Addison-Wesley, 1994.

[Muirhead, 1982] R. J. Muirhead, Aspects of Multivariate Statistical Theory. New York: J. Wiley & Sons, 1982.

[Nakagami, 1960] M. Nakagami, "The m-distribution, a general formula of intensity distribution of rapid fading," in Statistical Methods in Radio Wave Propagation, W. G. Hoffman, Ed. Oxford: Pergamon, 1960.

[Papoulis and Pillai, 2002] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th ed. New York: McGraw-Hill, 2002.

[Prudnikov et al., 1986a] A. P. Prudnikov, Yu. A. Brychkov and O. I. Marichev, Integrals and Series, vol. 1: Elementary Functions. Amsterdam: Gordon and Breach, 1986.

[Prudnikov et al., 1986b] A. P. Prudnikov, Yu. A. Brychkov and O. I. Marichev, Integrals and Series, vol. 2: Special Functions. Amsterdam: Gordon and Breach, 1986.

[Ratnarajah et al., 2004] T. Ratnarajah, R. Vaillancourt and M. Alvo, "Jacobians and hypergeometric functions in complex multivariate analysis," to appear in Can. Appl. Math. Quarterly, 2004.

[Rice, 1944] S. O. Rice, "Mathematical analysis of random noise," Bell Syst. Tech. J., vol. 23, pp. 282-332, 1944.

[Rice, 1945] S. O. Rice, "Mathematical analysis of random noise," Bell Syst. Tech. J., vol. 24, pp. 46-156, 1945. [Reprinted, together with [Rice, 1944], in Selected Papers on Noise and Stochastic Processes, N. Wax, Ed. New York: Dover, pp. 133-294.]