
Chapter 3
PROBABILITY AND STOCHASTIC PROCESSES
God does not play dice with the universe.
—Albert Einstein
Not only does God definitely play dice, but He sometimes confuses us by throwing them
where they cannot be seen.
—Stephen Hawking
Abstract
This chapter aims to provide a cohesive overview of basic probability concepts,
starting from the axioms of probability, covering events, event spaces, joint and
conditional probabilities, leading to the introduction of random variables, discrete probability distributions and probability density functions (PDFs).
Then follows an exploration of random variables and stochastic processes including cumulative distribution functions (CDFs), moments, joint densities, marginal densities, transformations and algebra of random variables. A number of
useful univariate densities (Gaussian, chi-square, non-central chi-square, Rice,
etc.) are then studied in turn. Finally, an introduction to multivariate statistics is provided, including the exterior product (used instead of the conventional
determinant in matrix r.v. transformations), Jacobians of random matrix transformations and culminating with the introduction of the Wishart distribution.
Multivariate statistics are instrumental in characterizing multidimensional problems such as array processing operating over random vector or matrix channels.
Throughout the chapter, an emphasis is placed on complex random variables,
vectors and matrices, because of their unique importance in digital communications.
1. Introduction
The theory of probability and stochastic processes is of central importance
in many core aspects of communication theory, including modeling of information sources, of additive noise, and of channel characteristics and fluctuations.
Ultimately, through such modeling, probability and stochastic processes are instrumental in assessing the performance of communication systems, wireless
or otherwise.
It is expected that most readers will have at least some familiarity with this
vast topic. If that is not the case, the brief overview provided here may be
less than satisfactory. Interested readers who wish for a fuller treatment of
the subject from an engineering perspective may consult a number of classic
textbooks [Papoulis and Pillai, 2002], [Leon-Garcia, 1994], [Davenport, 1970].
The subject matter of this book calls for the development of a perhaps lesser-known, but increasingly active, branch of statistics, namely multivariate statistical theory. It is an area whose applications (in the field of communication theory in general, and in array processing / MIMO systems in particular) have grown considerably in recent years, mostly thanks to the usefulness and versatility of the Wishart distribution. To supplement the treatment given here, readers are directed to the excellent textbooks [Muirhead, 1982] and [Anderson, 1958].
2. Probability
Experiments, events and probabilities
Definition 3.1. A probability experiment or statistical experiment consists
in performing an action which may result in a number of possible outcomes,
the actual outcome being randomly determined.
For example, rolling a die and tossing a coin are probability experiments. In
the first case, there are 6 possible outcomes while in the second case, there are
2.
Definition 3.2. The sample space S of a probability experiment is the set of
all possible outcomes.
In the case of a coin toss, we have
S = {t, h} ,
(3.1)
where t denotes “tails” and h denotes “heads”.
Another, slightly more sophisticated, probability experiment could consist
in tossing a coin five times and defining the outcome as being the total number
of heads obtained. Hence, the sample space would be
S = {0, 1, 2, 3, 4, 5} .
(3.2)
Definition 3.3. An event E occurs if the outcome of the experiment is part of
a predetermined subset (as defined by E) of the sample space S.
For example, let the event A correspond to obtaining an even number of heads in the five-toss experiment. Therefore, we have
A = {0, 2, 4} .
(3.3)
A single outcome can also be considered an event. For instance, let the event
B correspond to obtaining three heads in the five-toss experiment, i.e.
B = {3}. This is also called a single event or a sample point (since it is a
single element of the sample space).
Definition 3.4. The complement of an event X consists of all the outcomes
(sample points) in the sample space S that are not in event X.
For example, the complement of event A consists in obtaining an odd number of heads, i.e.
Ā = {1, 3, 5} .
(3.4)
Two events are considered mutually exclusive if they have no outcome in
common. For instance, A and Ā are mutually exclusive. In fact, an event and
its complement must by definition be mutually exclusive. The events C = {1}
and D = {3, 5} are also mutually exclusive, but B = {3} and D are not.
Definition 3.5. Probability (simple definition): If an experiment has N possible and equally-likely exclusive outcomes, and M ≤ N of these outcomes
constitute an event E, then the probability of E is
P(E) = M/N. (3.5)
Example 3.1
Consider a die roll. If the die is fair, all sample points are equally likely. Defining event Ek as corresponding to outcome / sample point k, where k ranges
from 1 to 6, we have
P(Ek) = 1/6, k = 1, . . . , 6.
Given an event F = {1, 2, 3, 4}, we find
P(F) = 4/6 = 2/3,
and
P(F̄) = 2/6 = 1/3.
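The counting argument of example 3.1 can be reproduced directly in code. The following minimal Python sketch (an illustration added to the example; the helper name prob is arbitrary) enumerates the sample space and applies the simple definition P(E) = M/N of (3.5).

```python
# Enumerate the sample space of a fair die and apply P(E) = M/N (eq. 3.5).
S = {1, 2, 3, 4, 5, 6}          # sample space of the die roll
F = {1, 2, 3, 4}                # event F from example 3.1
F_bar = S - F                   # complement of F

def prob(event, sample_space):
    """Probability of an event under equally likely outcomes."""
    return len(event & sample_space) / len(sample_space)

print(prob(F, S))       # 4/6 = 0.666...
print(prob(F_bar, S))   # 2/6 = 0.333...
```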
Definition 3.6. The sum or union of two events is an event that contains all
the outcomes in the two events.
For example,
C ∪ D = {1} ∪ {3, 5} = {1, 3, 5} = Ā.
(3.6)
Therefore, the event Ā corresponds to the occurrence of event C or event D.
Definition 3.7. The product or intersection of two events is an event that
contains only the outcomes that are common in the two events.
For example,
Ā ∩ B = {1, 3, 5} ∩ {3} = {3} , (3.7)
and
{1, 2, 3, 4} ∩ Ā = {1, 2, 3, 4} ∩ {1, 3, 5} = {1, 3} . (3.8)
Therefore, the intersection corresponds to the occurrence of one event and
the other.
It is noteworthy that the intersection of two mutually exclusive events yields
the null event, e. g.
E ∩ Ē = ∅,
(3.9)
where ∅ denotes the null event.
A more rigorous definition of probability calls for the statement of four postulates.
Definition 3.8. Probability (rigorous definition): Given that each event E is
associated with a corresponding probability P (E),
Postulate 1: The probability of a given event E is such that P (E) ≥ 0.
Postulate 2: The probability associated with the null event is zero, i.e.
P (∅) = 0.
Postulate 3: The probability of the event corresponding to the entire sample
space (referred to as the certain event) is 1, i.e. P (S) = 1.
Postulate 4: Given a number N of mutually exclusive events X1 , X2 ,
. . . XN , then the probability of the union of these events is given by
P(∪_{i=1}^{N} Xi) = Σ_{i=1}^{N} P(Xi). (3.10)
From postulates 1, 3 and 4, it is easy to deduce that the probability of an
event E must necessarily satisfy the condition
P(E) ≤ 1.
The proof is left as an exercise.
Twin experiments
It is often of interest to consider two separate experiments as a whole. For
example, if one probability experiment consists of a coin toss with sample space S = {h, t}, two consecutive (or simultaneous) coin tosses constitute twin experiments or a joint experiment. The sample space of such a joint experiment is therefore
S2 = {(h, h), (h, t), (t, h), (t, t)} .
(3.11)
Let Xi (i = 1, 2) correspond to an outcome on the first coin toss and Yj (j =
1, 2) correspond to an outcome on the second coin toss. To each joint outcome
(Xi , Yj ) is associated a joint probability P (Xi , Yj ). This corresponds naturally
to the probability that events Xi and Yj occurred and it can therefore also be
written P (Xi ∩ Yj ). Suppose that given only the set of joint probabilities, we
wish to find P (X1 ) and P (X2 ), that is, the marginal probabilities of events X1
and X2 .
Definition 3.9. The marginal probability of an event E is, in the context of
a joint experiment, the probability that event E occurs irrespective of the other
constituent experiments in the joint experiment.
In general, a twin experiment has outcomes Xi , i = 1, 2, . . . , N1 , for the
first experiment and Yj , j = 1, 2, . . . , N2 for the second experiment. If all the
Yj are mutually exclusive, the marginal probability of Xi is given by
P(Xi) = Σ_{j=1}^{N2} P(Xi, Yj). (3.12)
By the same token, if all the Xi are mutually exclusive, we have
P(Yj) = Σ_{i=1}^{N1} P(Xi, Yj). (3.13)
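Equations (3.12) and (3.13) amount to summing a joint probability table along one of its dimensions. The sketch below illustrates this on an assumed 2 × 2 joint table for a twin coin-toss experiment; the numerical values are made up for the illustration.

```python
import numpy as np

# Joint probabilities P(Xi, Yj) for a twin coin-toss experiment
# (rows: X1 = heads, X2 = tails; columns: Y1 = heads, Y2 = tails).
# The numbers are illustrative only.
P_joint = np.array([[0.42, 0.28],
                    [0.18, 0.12]])

P_X = P_joint.sum(axis=1)   # eq. (3.12): sum over the Yj
P_Y = P_joint.sum(axis=0)   # eq. (3.13): sum over the Xi

print(P_X)             # [0.7 0.3]
print(P_Y)             # [0.6 0.4]
print(P_joint.sum())   # sanity check: total probability is 1
```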
Now, suppose that the outcome of one of the two experiments is known, but
not the other.
Definition 3.10. The conditional probability of an event Xi given an event
Yj is the probability that event Xi will occur given that Yj has occurred.
The conditional probability of Xi given Yj is defined as
P(Xi | Yj) = P(Xi, Yj) / P(Yj). (3.14)
Likewise, we have
P(Yj | Xi) = P(Xi, Yj) / P(Xi). (3.15)
A more general form of the above two relations is known as Bayes’ rule.
Definition 3.11. Bayes' rule: Given {Y1, . . . , YN}, a set of N mutually exclusive events whose union forms the entire sample space S, and X is any arbitrary event in S with non-zero probability (P(X) > 0), then
P(Yj | X) = P(X, Yj) / P(X) = P(Yj) P(X | Yj) / P(X) (3.16)
= P(Yj) P(X | Yj) / Σ_{i=1}^{N} P(Yi) P(X | Yi). (3.17)
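The last form of Bayes' rule, (3.17), maps directly onto a few lines of code. In the sketch below, the priors P(Yj) and likelihoods P(X|Yj) are assumed illustrative values for a two-hypothesis experiment; they are not taken from the text.

```python
import numpy as np

# Priors P(Yj) for N = 2 mutually exclusive hypotheses covering S,
# and likelihoods P(X | Yj) of an observed event X under each hypothesis.
prior = np.array([0.7, 0.3])        # illustrative values
likelihood = np.array([0.1, 0.8])   # illustrative values

# Eq. (3.17): posterior_j = P(Yj) P(X|Yj) / sum_i P(Yi) P(X|Yi)
posterior = prior * likelihood / np.sum(prior * likelihood)

print(posterior)         # approximately [0.226 0.774]
print(posterior.sum())   # posteriors sum to 1
```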
Another important concern in a twin experiment is whether or not the occurrence of one event (Xi ) influences the probability that another event (Yj )
will occur, i.e. whether the events Xi and Yj are independent.
Definition 3.12. Two events X and Y are said to be statistically independent
if P (X|Y ) = P (X) and P (Y |X) = P (Y ).
For instance, in the case of the twin coin toss joint experiment, the events X1 = {h} and X2 = {t}, being the potential outcomes of the first coin toss, are independent of the events Y1 = {h} and Y2 = {t}, the potential outcomes of the second coin toss. However, X1 is certainly not independent of X2 since the occurrence of X1 precludes the occurrence of X2.
Typically, independence results if the two events considered are generated
by physically separate probability experiments (e. g. two different coins).
Example 3.2
Consider a twin die toss. The result on one die is independent of the other since there is no physical linkage between the two dice. However, if we reformulate the joint experiment as follows:
since there is no physical linkage between the two dice. However, if we reformulate the joint experiment as follows:
Experiment 1: the outcome is the sum of the two dice and corresponds to
the event Xi , i=1, 2, . . . , 11. The corresponding sample space is SX =
{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
Experiment 2: the outcome is the magnitude of the difference between the
two dice and corresponds to the event Yj , j=1, 2, . . . , 6. The corresponding
sample space is SY = {0, 1, 2, 3, 4, 5}.
In this case, the two experiments are linked since they are both derived from
the same dice toss. For example, if Xi = 4, then Yj can only take the values
{0, 2}. We therefore have statistical dependence.
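The statistical dependence claimed in example 3.2 can be verified exhaustively, since the underlying sample space has only 36 equally likely outcomes. The sketch below builds the joint probability table of the sum X and the absolute difference Y, and checks whether P(X, Y) = P(X)P(Y) everywhere; it is an illustration only.

```python
import itertools
import numpy as np

# Joint distribution of X = sum and Y = |difference| for two fair dice.
P = np.zeros((13, 6))          # indexed by sum (2..12) and |diff| (0..5)
for d1, d2 in itertools.product(range(1, 7), repeat=2):
    P[d1 + d2, abs(d1 - d2)] += 1 / 36

P_X = P.sum(axis=1)            # marginal of the sum
P_Y = P.sum(axis=0)            # marginal of the absolute difference

# If X and Y were independent, P(X, Y) would equal P(X) P(Y) everywhere.
print(np.allclose(P, np.outer(P_X, P_Y)))   # False -> statistically dependent
print(P[4, :])   # sum = 4: only |diff| in {0, 2} have nonzero probability
```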
From (3.14), we find the following:
Definition 3.13. Multiplicative rule: Given two events X and Y , the probability that they both occur is
P (X, Y ) = P (X)P (Y |X) = P (Y )P (X|Y ).
(3.18)
Hence, the probability that X and Y occur is the probability that one of
these events occurs times the probability that the other event occurs, given that
the first one has occurred.
Furthermore, should events X and Y be independent, the above reduces to
(special multiplicative rule):
P(X, Y) = P(X)P(Y). (3.19)

3. Random variables
In the preceding section, it was seen that a statistical experiment is any operation or physical process by which one or more random measurements are
made. In general, the outcome of such an experiment can be conveniently
represented by a single number.
Let the function X(s) constitute such a mapping, i.e. X(s) takes on a value
on the real line as a function of s where s is an arbitrary sample point in the
sample space S. Then X(s) is a random variable.
Definition 3.14. A function whose value is a real number and is a function of
an element chosen in a sample space S is a random variable or r.v..
Given a die toss, the sample space is simply
S = {1, 2, 3, 4, 5, 6} .
(3.20)
In a straightforward manner, we can define a random variable X(s) = s
which takes on a value determined by the number of spots found on the top
surface of the die after the roll.
A slightly less obvious mapping would be the following
X(s) = −1 if s = 1, 3, 5,   and   X(s) = 1 if s = 2, 4, 6, (3.21)
which is an r.v. that takes on a value of 1 if the die roll is even and -1 otherwise.
Random variables need not be defined directly from a probability experiment, but can actually be derived as functions of other r.v.’s. Going back to
example 3.2, and letting D1 (s) = s be an r.v. associated with the first die and
D2 (s) = s with the second die, we can define the r.v’s
X = D1 + D2 , (3.22)
Y = |D1 − D2 |, (3.23)
where X is an r.v. corresponding to the sum of the dice and defined as X(sX ) =
sX , where sX is an element of the sample space SX = {2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12}. The same holds for Y , but with respect to sample space
SY = {0, 1, 2, 3, 4, 5}.
Example 3.3
Consider a series of N consecutive coin tosses. Let us define the r.v. X as
being the total number of heads obtained. Therefore, the sample space of X is
SX = {0, 1, 2, 3, . . . , N } .
(3.24)
Furthermore, we define an r.v. Y = X/N, it being the ratio of the number of heads to the total number of tosses. If N = 1, the sample space of Y corresponds to the set {0, 1}. If N = 2, the sample space becomes {0, 1/2, 1}, and if N = 10, it is {0, 1/10, 2/10, . . . , 1}.
Hence, the variable Y is always constrained between 0 and 1; however, if
we let N tend towards infinity, it can take an infinite number of values within
this interval (its sample space is infinite) and it thus becomes a continuous r.v.
Definition 3.15. A continuous random variable is an r.v. which is not restricted to a discrete set of values, i.e. it can take any real value within a
predetermined interval or set of intervals.
It follows that a discrete random variable is an r.v. restricted to a finite
set of values (its sample space is finite) and corresponds to the type of r.v. and
probability experiments discussed so far.
Typically, discrete r.v.’s are used to represent countable data (number of
heads, number of spots on a die, number of defective items in a sample set,
etc.) while continuous r.v.’s are used to represent measurable data (heights,
distances, temperatures, electrical voltages, etc.).
Probability distribution
Since each value that a discrete random variable can assume corresponds to
an event or some quantification / mapping / function of an event, it follows that
each such value is associated with a probability of occurrence.
Consider the variable X in example 3.3. If N = 4, we have the following:
x            0      1      2      3      4
P(X = x)    1/16   4/16   6/16   4/16   1/16
Assuming a fair coin, the underlying sample space is made up of sixteen
outcomes
S = {tttt, ttth, ttht, tthh, thtt, thth, thht, thhh, httt, htth, htht,
hthh, hhtt, hhth, hhht, hhhh} ,
(3.25)
and each of its sample points is equally likely with probability 1/16. Only one of these outcomes (tttt) has no heads; it follows that P(X = 0) = 1/16. However, four outcomes have one head and three tails (httt, thtt, ttht, ttth), leading to P(X = 1) = 4/16. Furthermore, there are 6 possible combinations of 2 heads and 2 tails, yielding P(X = 2) = 6/16. By symmetry, we have P(X = 3) = P(X = 1) = 4/16 and P(X = 4) = P(X = 0) = 1/16.
Knowing that the number of combinations of N distinct objects taken n at
a time is
(N choose n) = N! / (n!(N − n)!), (3.26)
the probabilities tabulated above (for N = 4) can be expressed with a single
formula, as a function of x:
P(X = x) = (4 choose x) · (1/16), x = 0, 1, 2, 3, 4. (3.27)
Such a formula constitutes the discrete probability distribution of X.
Suppose now that the coin is not necessarily fair and is characterized by
a probability p of getting heads and a probability 1 − p of getting tails. For
arbitrary N , we have
P(X = x) = (N choose x) p^x (1 − p)^{N−x}, x ∈ {0, 1, . . . , N}, (3.28)
which is known as the binomial distribution.
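The binomial distribution (3.28) is straightforward to evaluate numerically. The short sketch below (the function name binomial_pmf is arbitrary) recomputes the N = 4, p = 1/2 probabilities tabulated earlier.

```python
from math import comb

def binomial_pmf(x, N, p):
    """P(X = x) for the binomial distribution of eq. (3.28)."""
    return comb(N, x) * p**x * (1 - p)**(N - x)

# Reproduces 1/16, 4/16, 6/16, 4/16, 1/16 for a fair coin tossed 4 times.
print([binomial_pmf(x, 4, 0.5) for x in range(5)])
```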
Probability density function
Consider again example 3.3. We have found that the variable X has a discrete probability distribution of the form (3.28). Figure 3.1 shows histograms
of P (X = x) for increasing values of N . It can be seen that as N gets large,
the histogram approaches a smooth curve and its “tails” spread out. Ultimately,
if we let N tend towards infinity, we will observe a continuous curve like the
one in Figure 3.1d.
The underlying r.v. is of the continuous variety, and Figure 3.1d is a graphical representation of its probability density function (PDF). While a PDF
cannot be tabulated like a discrete probability distribution, it certainly can be
expressed as a mathematical function. In the case of Figure 3.1d, the PDF is
expressed
fX(x) = (1 / (√(2π) σX)) e^{−x² / (2σX²)}, (3.29)
where σX² is the variance of the r.v. X. This density function is the all-important normal or Gaussian distribution. See the Appendix for a derivation of this PDF which ties in with the binomial distribution.
Figure 3.1. Histograms of the discrete probability function of the binomial distributions with
(a) N = 8; (b) N = 16; (c) N = 32; and (d) a Gaussian PDF with the same mean and variance
as the binomial distribution with N = 32.
A PDF has two important properties:
1. fX (x) ≥ 0 for all x, since a negative probability makes no sense;
2. ∫_{−∞}^{∞} fX(x) dx = 1, which is the continuous version of postulate 3 from section 2.
It is often of interest to determine the probability that an r.v. takes on a
value which is smaller than a predetermined threshold. Mathematically, this is
expressed
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(α) dα, −∞ < x < ∞, (3.30)
where fX (x) is the PDF of variable X and FX (x) is its cumulative distribution function (CDF). A CDF has three outstanding properties:
1. FX (−∞) = 0 (obvious from (3.30));
2. FX (∞) = 1 (a consequence of property 2 of PDFs);
3. FX (x) increases in a monotone fashion from 0 to 1 (from (3.30) and property 1 of PDFs).
Furthermore, the relation (3.30) can be inverted to yield
fX(x) = dFX(x) / dx. (3.31)
It is also often useful to determine the probability that an r.v. X takes on a value falling in an interval [x1, x2]. This can be easily accomplished using the PDF of X as follows:
P(x1 < X ≤ x2) = ∫_{x1}^{x2} fX(x) dx = ∫_{−∞}^{x2} fX(x) dx − ∫_{−∞}^{x1} fX(x) dx = FX(x2) − FX(x1). (3.32)
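Relation (3.32) can be checked numerically for the Gaussian PDF (3.29): integrating fX over [x1, x2] should match FX(x2) − FX(x1). The sketch below uses simple trapezoidal integration and the error-function expression of the Gaussian CDF; the numbers chosen are illustrative.

```python
import numpy as np
from math import erf, sqrt

sigma = 1.5
f = lambda t: np.exp(-t**2 / (2 * sigma**2)) / (sqrt(2 * np.pi) * sigma)  # eq. (3.29)
F = lambda t: 0.5 * (1 + erf(t / (sigma * sqrt(2))))                      # Gaussian CDF

x1, x2 = -0.5, 1.0
x = np.linspace(x1, x2, 10001)
fx = f(x)

p_integral = np.sum(0.5 * (fx[1:] + fx[:-1]) * np.diff(x))  # left-hand side of (3.32)
p_cdf = F(x2) - F(x1)                                       # right-hand side of (3.32)

print(p_integral, p_cdf)   # the two values agree to several decimal places
```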
This leads us to the following counter-intuitive observation. If the PDF is continuous and we wish to find P(X = x1), it is insightful to proceed as follows:
P(X = x1) = lim_{x2→x1} P(x1 < X ≤ x2) = lim_{x2→x1} ∫_{x1}^{x2} fX(x) dx = fX(x1) lim_{x2→x1} ∫_{x1}^{x2} dx = fX(x1) lim_{x2→x1} (x2 − x1) = 0. (3.33)
Hence, the probability that X takes on exactly a given value x1 is null.
Intuitively, this is a consequence of the fact that X can take on an infinity of
values and the “sum” (integral) of the associated probabilities must be equal to
1.
However, if the PDF is not continuous, the above observation doesn’t necessarily hold. A discrete r.v., for example, not only has a discrete probability
distribution, but also a corresponding PDF. Since the r.v. can only take on a
finite number of values, its PDF is made up of Dirac impulses at the locations
of the said values.
For the binomial distribution, we have
fX(x) = Σ_{n=0}^{N} (N choose n) p^n (1 − p)^{N−n} δ(x − n), (3.34)
and, for a general discrete r.v. with a sample space of N elements, we have
fX(x) = Σ_{n=1}^{N} P(X = xn) δ(x − xn). (3.35)
Moments and characteristic functions
What exactly is the mean of a random variable? One intuitive answer is that
it is the “most likely” outcome of the underlying experiment. However, while
this is not far from the truth for well-behaved PDFs (which have a single peak
at, or in the vicinity of, their mean), it is misleading since the mean, in fact,
may not even be a part of the PDF’s support (i.e. its sample space or range of
possible values). Nonetheless, we refer to the mean of an r.v. as its expected
value, denoted by the expectation operator ⟨·⟩. It is defined
⟨X⟩ = µX = ∫_{−∞}^{∞} x fX(x) dx, (3.36)
and it also happens to be the first moment of the r.v. X.
The expectation operator bears its name because it is indeed the best “educated guess” one can make a priori about the outcome of an experiment given
the associated PDF. While it may indeed lie outside the set of allowable values
of the r.v., it is the quantity that will on average minimize the error between X
and its a priori estimate X̂ = ⟨X⟩. Mathematically, this is expressed
⟨X⟩ = arg min_{X̂} ⟨|X − X̂|⟩. (3.37)
Although this is a circular definition, it is insightful, especially if we expand
the right-hand side into its integral representation:
⟨X⟩ = arg min_{X̂} ∫_{−∞}^{∞} |X̂ − x| fX(x) dx. (3.38)
For another angle to this concept, consider a series of N trials of the same
experiment. If each trial is unaffected by the others, the outcomes are associated with a set of N independent and identically distributed (i.i.d.) r.v.’s
X1 , . . . , XN . The average of these outcomes is itself an r.v. given by
Y = (1/N) Σ_{n=1}^{N} Xn. (3.39)
As N gets large, Y will tend towards the mean of the Xn ’s and in the limit
Y = lim_{N→∞} (1/N) Σ_{n=1}^{N} Xn = ⟨X⟩, (3.40)
where ⟨X⟩ = ⟨X1⟩ = . . . = ⟨XN⟩. This is known in statistics as the weak law of large numbers.
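The weak law of large numbers in (3.40) is easy to illustrate by simulation: the running average of i.i.d. samples settles on the ensemble mean as N grows. In the sketch below, the exponential distribution and its mean are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.0
samples = rng.exponential(scale=true_mean, size=100_000)   # i.i.d. samples Xn

# Running average Y = (1/N) * sum of the first N samples, cf. eq. (3.39).
running_avg = np.cumsum(samples) / np.arange(1, samples.size + 1)

for N in (10, 1_000, 100_000):
    print(N, running_avg[N - 1])    # tends towards the true mean of 2.0
```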
In general, the kth raw moment is defined as
⟨X^k⟩ = ∫_{−∞}^{∞} x^k fX(x) dx, (3.41)
where the law of large numbers still holds since
⟨X^k⟩ = ∫_{−∞}^{∞} x^k fX(x) dx = ∫_{−∞}^{∞} z fZ(z) dz = ⟨Z⟩, (3.42)
where fZ(z) is the PDF of Z = X^k.
In the same manner, we can find the expectation of any arbitrary function
Y = g(X) as follows
⟨Y⟩ = ⟨g(X)⟩ = ∫_{−∞}^{∞} g(x) fX(x) dx. (3.43)
One such useful function is Y = (X − µX)^k where µX = ⟨X⟩ is the mean of X.
Definition 3.16. The expectation of Y = (X − µX)^k is the kth central moment of the r.v. X.
From (3.43), we have
⟨Y⟩ = ⟨(X − µX)^k⟩ = ∫_{−∞}^{∞} (x − µX)^k fX(x) dx. (3.44)
Definition 3.17. The 2nd central moment ⟨(X − µX)²⟩ is called the variance and its square root is the standard deviation.
The variance of X, denoted σX², is given by
σX² = ∫_{−∞}^{∞} (x − µX)² fX(x) dx, (3.45)
and it is useful because (like the standard deviation), it provides a measure of
the degree of dispersion of the r.v. X about its mean.
Expanding the quadratic form (x − µX)² in (3.45) and integrating term-by-term, we find
σX² = ⟨X²⟩ − 2µX⟨X⟩ + µX² = ⟨X²⟩ − 2µX² + µX² = ⟨X²⟩ − µX². (3.46)
In deriving the above, we have implicitly exploited two of the properties
of the expectation operator:
Property 3.1. Given the expectation of a sum, where each term may or may
not involve the same random variable, it is equal to the sum of the expectations,
i.e.
⟨X + Y⟩ = ⟨X⟩ + ⟨Y⟩,
⟨X + X²⟩ = ⟨X⟩ + ⟨X²⟩.
Property 3.2. Given the expectation of the product of an r.v. by a deterministic
quantity α, the said quantity can be removed from the expectation, i.e.
⟨αX⟩ = α⟨X⟩.
These two properties stem readily from (3.43) and the properties of integrals.
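As a small numerical check of (3.45) and (3.46), the sketch below estimates the variance of a uniform r.v. both from the central moment ⟨(X − µX)²⟩ and from the shortcut ⟨X²⟩ − µX²; the uniform distribution is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 3.0, size=1_000_000)   # samples of X ~ U(0, 3)

mu = x.mean()                    # sample estimate of <X>
second_moment = (x**2).mean()    # sample estimate of <X^2>

var_central = ((x - mu)**2).mean()      # <(X - mu)^2>, cf. eq. (3.45)
var_shortcut = second_moment - mu**2    # <X^2> - mu^2,  cf. eq. (3.46)

print(var_central, var_shortcut)   # both close to the exact value 3^2 / 12 = 0.75
```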
Theorem 3.1. Given the expectation of a product of random variables, it is
equal to the product of expectations, i.e.
⟨XY⟩ = ⟨X⟩⟨Y⟩,
if the two variables X and Y are independent.
Proof. We can readily generalize (3.43) for the multivariable case as follows:
⟨g(X1, X2, · · · , XN)⟩ = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(x1, x2, · · · , xN) fX1,··· ,XN(x1, · · · , xN) dx1 · · · dxN, (3.47)
where the integral is N-fold.
Therefore, we have
⟨XY⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX,Y(x, y) dx dy, (3.48)
where, if and only if X and Y are independent, the density factors to yield
⟨XY⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX(x) fY(y) dx dy = ∫_{−∞}^{∞} x fX(x) dx ∫_{−∞}^{∞} y fY(y) dy = ⟨X⟩⟨Y⟩. (3.49)
The above theorem can be readily extended to the product of N independent variables or N expressions of independent random variables by applying (3.47).
Definition 3.18. The characteristic function (C. F.) of an r.v. X is defined
φX(jt) = ⟨e^{jtX}⟩ = ∫_{−∞}^{∞} e^{jtx} fX(x) dx, (3.50)
where j = √−1.
It can be seen that the integral is in fact an inverse Fourier transform in the
variable t. It follows that the inverse of (3.50) is
fX(x) = (1/2π) ∫_{−∞}^{∞} φX(jt) e^{−jtx} dt. (3.51)
Characteristic functions play a role similar to the Fourier transform. In other
words, some operations are easier to perform in the characteristic function domain than in the PDF domain. For example, there is a direct relationship between the C. F. and the moments of a random variable, allowing the latter to be
obtained without integration (if the C. F. is known).
The said relationship involves evaluation of the kth derivative of the C. F. at t = 0, i.e.
⟨X^k⟩ = (−j)^k [d^k φX(jt) / dt^k]|_{t=0}. (3.52)
As will be seen later, characteristic functions are also useful in determining
the PDF of sums of random variables. Furthermore, since the crossing into the
C. F. domain is, in fact, an inverse Fourier transform, all properties of Fourier
transforms hold.
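Relation (3.52) can be illustrated for a zero-mean Gaussian r.v., whose characteristic function φX(jt) = e^{−σ²t²/2} is available in closed form: a central finite difference approximates the second derivative at t = 0 and recovers ⟨X²⟩ = σ². This is only a numerical sketch of the idea.

```python
import numpy as np

sigma = 2.0
phi = lambda t: np.exp(-0.5 * sigma**2 * t**2)   # C. F. of a zero-mean Gaussian r.v.

# Second derivative of phi at t = 0 by central finite differences.
h = 1e-4
d2phi = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2

# Eq. (3.52) with k = 2: <X^2> = (-j)^2 * d^2 phi / dt^2 |_{t=0} = -d2phi.
second_moment = -d2phi
print(second_moment)   # approximately sigma^2 = 4.0
```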
Functions of one r.v.
Consider an r.v. Y defined as a function of another r.v. X, i.e.
Y = g(X).
(3.53)
If this function is uniquely invertible, we have X = g −1 (Y ) and
FY(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{−1}(y)) = FX(g^{−1}(y)). (3.54)
Differentiating with respect to y allows us to relate the PDFs of X and Y :
fY(y) = (d/dy) FX(g^{−1}(y)) (3.55)
= [d FX(g^{−1}(y)) / d(g^{−1}(y))] · [d(g^{−1}(y)) / dy]
= fX(g^{−1}(y)) [d(g^{−1}(y)) / dy]. (3.56)
In the general case, the equation Y = g(X) may have more than one root.
If there are N real roots denoted x1 (y), x2 (y), . . . , xN (y), the PDF of Y is
given by
fY(y) = Σ_{n=1}^{N} fX(xn(y)) |∂xn(y)/∂y|. (3.57)
Example 3.4
Let Y = AX + B where A and B are arbitrary constants. Therefore, we have
X = (Y − B)/A and
∂[(y − B)/A] / ∂y = 1/A.
It follows that
fY(y) = (1/A) fX((y − B)/A). (3.58)
Suppose that X follows a Gaussian distribution with a mean of zero and a variance of σX². Hence,
fX(x) = (1 / (√(2π) σX)) e^{−x² / (2σX²)}, (3.59)
and
fY(y) = (1 / (√(2π) σX A)) e^{−(y−B)² / (2A²σX²)}. (3.60)
The r.v. Y is still a Gaussian variate, but its variance is σX²A² and its mean is B.
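Example 3.4 can be confirmed by simulation: generating zero-mean Gaussian samples X and forming Y = AX + B should yield a sample mean near B and a sample variance near A²σX², in agreement with (3.60). The constants below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_x, A, B = 1.0, 3.0, -2.0

x = rng.normal(loc=0.0, scale=sigma_x, size=1_000_000)   # X ~ N(0, sigma_x^2)
y = A * x + B                                            # Y = AX + B

print(y.mean(), y.var())   # close to B = -2 and A^2 sigma_x^2 = 9
```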
Pairs of random variables
In performing multiple related experiments or trials, it becomes necessary
to manipulate multiple r.v.’s. Consider two r.v.’s X1 and X2 which stem from
the same experiment or from twin related experiments. The probability that
X1 < x1 and X2 < x2 is determined from the joint CDF
P (X1 < x1 , X2 < x2 ) = FX1 ,X2 (x1 , x2 ).
(3.61)
Furthermore, the joint PDF can be obtained by differentiation:
fX1,X2(x1, x2) = ∂²FX1,X2(x1, x2) / (∂x1 ∂x2). (3.62)
Following the concepts presented in section 2 for probabilities, the PDF
of the r.v. X1 irrespective of X2 is termed the marginal PDF of X1 and is
obtained by “averaging out” the contribution of X2 to the joint PDF, i.e.
fX1(x1) = ∫_{−∞}^{∞} fX1,X2(x1, x2) dx2. (3.63)
According to Bayes’ rule, the conditional PDF of X1 given that X2 = x2
is given by
fX1|X2(x1|x2) = fX1,X2(x1, x2) / fX2(x2). (3.64)
Transformation of two random variables
Given 2 r.v.'s X1 and X2 with joint PDF fX1,X2(x1, x2), let Y1 = g1(X1, X2), Y2 = g2(X1, X2), where g1(X1, X2) and g2(X1, X2) are 2 arbitrary single-valued continuous functions of X1 and X2.
Let us also assume that g1 and g2 are jointly invertible, i.e.
X1 = h1 (Y1 , Y2 ),
X2 = h2 (Y1 , Y2 ),
(3.65)
where h1 and h2 are also single-valued and continuous.
The application of the transformation defined by g1 and g2 amounts to a
change in the coordinate system. If we consider an infinitesimal rectangle of
dimensions ∆y1 ×∆y2 in the new system, it will in general be mapped through
the inverse transformation (defined by h1 and h2 ) to a four-sided curved region
in the original system (see Figure 3.2). However, since the region is infinitesimally small, the curvature induced by the transformation can be abstracted out.
We are left with a parallelogram and we wish to calculate its area.
Figure 3.2. Coordinate system change under transformation (g1 , g2 ).
This can be performed by relying on the tangential vectors v and w. The
sought-after area is then given by
A = ∆x1∆x2 = ‖v‖2 ‖w‖2 sin α ∆y1∆y2 = det[ v[1] w[1] ; v[2] w[2] ] ∆y1∆y2, (3.66)
where the scaling factor is denoted
J = A / (∆y1∆y2) = |det[ v[1] w[1] ; v[2] w[2] ]|, (3.67)
and is the Jacobian J of the transformation. The Jacobian embodies the scaling
effects of the transformation and ensures that the new PDF will integrate to
unity.
Thus, the joint PDF of Y1 and Y2 is given by
fY1,Y2(y1, y2) = J fX1,X2(g1^{−1}(y1, y2), g2^{−1}(y1, y2)). (3.68)
From the tangential vectors, the Jacobian is defined
J = |det[ ∂h1(y1, y2)/∂y1   ∂h2(y1, y2)/∂y1 ; ∂h1(y1, y2)/∂y2   ∂h2(y1, y2)/∂y2 ]|. (3.69)
In the above, it has been assumed that the mapping between the original
and the new coordinate system was one-to-one. What if several regions in the
original domain map to the same rectangular area in the new domain? This
implies that the system
Y1 = g1(X1, X2),
Y2 = g2(X1, X2), (3.70)
has several solutions (x1^(1), x2^(1)), (x1^(2), x2^(2)), . . . , (x1^(K), x2^(K)). Then all these solutions (or roots) contribute equally to the new PDF, i.e.
fY1,Y2(y1, y2) = Σ_{k=1}^{K} fX1,X2(x1^(k), x2^(k)) J(x1^(k), x2^(k)). (3.71)
Multiple random variables
Transformation of N random variables
It is often the case (and especially in multidimensional signal processing
problems, which constitute a main focal point of the present book) that a relatively large set of random variables must be considered collectively. It can
be considered then that such a set of variables represent the state of a single
stochastic process.
Given an arbitrary number N of random variables X1 , . . . , XN which collectively behave according to joint PDF fX1 ,X2 ,··· ,XN (x1 , x2 , . . . , xN ), a transformation is defined by the system of equations
y = g(x),
(3.72)
where x and y are N × 1 vectors, and g is an N × 1 vector function of vector
x.
The reverse transformation may have one or more solutions, i.e.
x^(k) = gk^{−1}(y), k ∈ [1, 2, · · · , K], (3.73)
where K is the number of solutions.
The Jacobian corresponding to the kth solution is given by
Jk = |det( ∂gk^{−1}(y) / ∂y )|, (3.74)
which is simply a generalization (with a slight change in notation) of (3.69).
This directly leads to
fy(y) = Σ_{k=1}^{K} fx(gk^{−1}(y)) Jk. (3.75)
Joint characteristic functions
Given a set of N random variables characterized by a joint PDF, it is relevant
to define a corresponding joint characteristic function.
Definition 3.19. The joint characteristic function of a set of r.v.’s X1 , X2 ,
. . . , XN with joint PDF fX1 ,X2 ,··· ,XN (x1 , x2 , . . . , xN ) is given by
φX1,X2,··· ,XN(t1, t2, . . . , tN) = ⟨e^{j(t1X1 + t2X2 + ··· + tNXN)}⟩ = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} e^{j(t1x1 + ··· + tNxN)} fX1,··· ,XN(x1, . . . , xN) dx1 · · · dxN,
where the integral is N-fold.
Algebra of random variables
Sum of random variables
Suppose we want to characterize an r.v. which is defined as the sum of two
other independent r.v.’s, i.e.
Y = X1 + X2 .
(3.76)
One way to attack this problem is to fix one r.v., say X2 , and treat this as a
transformation from X1 to Y . Given X2 = x2 , we have
Y = g(X1), (3.77)
where
g(X1) = X1 + x2, (3.78)
and
g^{−1}(Y) = Y − x2. (3.79)
It follows that
fY|X2(y|x2) = fX1(y − x2) [∂g^{−1}(y)/∂y] = fX1(y − x2). (3.80)
Furthermore, we know that
fY(y) = ∫_{−∞}^{∞} fY,X2(y, x2) dx2 = ∫_{−∞}^{∞} fY|X2(y|x2) fX2(x2) dx2, (3.81)
which, by substituting (3.80), becomes
fY(y) = ∫_{−∞}^{∞} fX1(y − x2) fX2(x2) dx2. (3.82)
This is the general formula for the sum of two independent r.v.’s and it can
be observed that it is in fact a Fourier convolution of the two underlying PDFs.
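A numerical sketch of (3.82): for two independent uniform r.v.'s on [0, 1], the convolution of the two rectangular PDFs is the triangular PDF of their sum. Below, the convolution evaluated on a grid is compared against a histogram of simulated sums; the uniform densities are chosen purely for illustration.

```python
import numpy as np

# Grid and the two uniform PDFs on [0, 1].
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f1 = np.ones_like(x)          # PDF of X1 ~ U(0, 1)
f2 = np.ones_like(x)          # PDF of X2 ~ U(0, 1)

# Eq. (3.82): fY is the convolution of f1 and f2, evaluated numerically.
fY = np.convolve(f1, f2) * dx            # triangular PDF on [0, 2]
y_grid = np.arange(fY.size) * dx

# Monte Carlo check: histogram of sums of independent samples.
rng = np.random.default_rng(3)
samples = rng.uniform(size=(2, 500_000)).sum(axis=0)
hist, edges = np.histogram(samples, bins=50, range=(0.0, 2.0), density=True)

print(fY[np.searchsorted(y_grid, 1.0)])   # convolution value at y = 1, close to 1.0
print(hist.max())                         # empirical peak of the triangle, also close to 1.0
```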
It is common knowledge that a convolution becomes a simple multiplication in the Fourier transform domain. The same principle applies here with
characteristic functions and it is easily demonstrated.
Consider a sum of N independent random variables:
Y = Σ_{n=1}^{N} Xn. (3.83)
The characteristic function of Y is given by
φY(jt) = ⟨e^{jtY}⟩ = ⟨e^{jt Σ_{n=1}^{N} Xn}⟩. (3.84)
Assuming that the Xn's are independent, we have
φY(jt) = ⟨∏_{n=1}^{N} e^{jtXn}⟩ = ∏_{n=1}^{N} ⟨e^{jtXn}⟩ = ∏_{n=1}^{N} φXn(jt). (3.85)
Therefore, the characteristic function of a sum of r.v.’s is the product of
the constituent C.F.’s. The corresponding PDF is obtainable via the Fourier
transform
fY(y) = (1/2π) ∫_{−∞}^{∞} ∏_{n=1}^{N} φXn(jt) e^{−jyt} dt. (3.86)
What if the random variables are not independent and exhibit correlation?
Consider again the case of two random variables; this time, the problem is
conveniently addressed by an adequate joint transformation. Let
Y = g1(X1, X2) = X1 + X2, (3.87)
Z = g2(X1, X2) = X2, (3.88)
where Z is not a useful r.v. per se, but was included to allow a 2 × 2 transformation.
The corresponding Jacobian is
J = |det[ 1 0 ; −1 1 ]| = 1. (3.89)
Hence, we have
fY,Z (y, z) = fX1 ,X2 (y − z, z),
(3.90)
and we can find the marginal PDF of Y with the usual integration procedure,
i.e.
fY(y) = ∫_{−∞}^{∞} fY,Z(y, z) dz = ∫_{−∞}^{∞} fX1,X2(y − z, z) dz. (3.91)
Products of random variables
A similar approach applies to products of r.v.’s. Consider the following
product of two r.v.’s:
Z = X1 X2 .
(3.92)
We can again fix X2 and obtain the conditional PDF of Z given X2 = x2 as
was done for the sum of two r.v.'s. Given that g(X1) = X1 x2 and g^{−1}(Z) = Z/x2, this approach yields
fZ|X2(z|x2) = fX1(z/x2) [∂(z/x2)/∂z] = (1/x2) fX1(z/x2). (3.93)
Therefore, we have
fZ(z) = ∫_{−∞}^{∞} fZ|X2(z|x2) fX2(x2) dx2 = ∫_{−∞}^{∞} (1/x2) fX1(z/x2) fX2(x2) dx2. (3.94)
This formula happens to be the Mellin convolution of fX1(x) and fX2(x) and it is related to the Mellin transform in the same way that Fourier convolution is related to the Fourier transform (see the overview of Mellin transforms in subsection 3).
Hence, given a product of N independent r.v.’s
Z = ∏_{n=1}^{N} Xn, (3.95)
we can immediately conclude that the Mellin transform of fZ(z) is the product of the Mellin transforms of the constituent PDFs, i.e.
M_{fZ}(s) = ∏_{n=1}^{N} M_{fXn}(s), (3.96)
which reduces to [M_{fX}(s)]^N when the Xn are identically distributed.
It follows that we can find the corresponding PDF through the inverse Mellin
transform. This is expressed
fZ(z) = (1/(2πj)) ∫_{c−j∞}^{c+j∞} M_{fZ}(s) z^{−s} ds, (3.97)
where one particular integration path was chosen, but others are possible (see
chapter 2, section 3, subsection on Mellin transforms).
4. Stochastic processes
Many observable parameters that are considered random in the world around
us are actually functions of time, e.g. ambient temperature and pressure, stock
market prices, etc. In the field of communications, actual useful message signals are typically considered random, although this might seem counterintuitive. The randomness here relates to the unpredictability that is inherent in
useful communications. Indeed, if it is known in advance what the message
is (the message is predetermined or deterministic), there is no point in transmitting it at all. On the other hand, the lack of any fore-knowledge about the
message implies that from the point-of-view of the receiver, the said message
is a random process or stochastic process. Moreover, the omnipresent white
noise in communication systems is also a random process, as well as the channel gains in multipath fading channels, as will be seen in chapter 4.
Definition 3.20. Given a random experiment with a sample space S, comprising outcomes λ1 , λ2 , . . . , λN , and a mapping between every possible outcome
λ and a set of corresponding functions of time X(t, λ), then this family of
functions, together with the mapping and the random experiment, constitutes a
stochastic process.
In fact, a stochastic process is a function of two variables; the outcome
variable λ which necessarily belongs to sample space S, and time t, which can
take any value between −∞ and ∞.
Definition 3.21. To a specific outcome λi in a stochastic process corresponds
a single time function X(t, λi ) = xi (t) called a member function or sample
function of the said process.
Definition 3.22. The set of all sample functions in a stochastic process is called
an ensemble.
We have established that for a given outcome λi , X(t, λi ) = xi (t) is a
predetermined function out of the ensemble of possible functions. If we fix
t to some value t1 instead of fixing the outcome λ, then X(t1 , λ) becomes
a random variable. At another time instant, we find another random variable X(t2 , λ) which is most likely correlated with X(t1 , λ). It follows that
a stochastic process can also be seen as a succession of infinitely many joint
random variables (one for each defined instant in time) with a given joint distribution. Any set of instances for these r.v.’s constitutes one of the member
functions. While this view is conceptually helpful, it is hardly practical for
manipulating processes given the infinite number of individual r.v.’s required.
Instead of continuous time, it is often sufficient to consider only a predetermined set of time instants t1 , t2 , . . . , tN . In that case, the set of random
variables X(t1 ), X(t2 ), . . . , X(tN ) becomes a random vector x with a PDF
fx (x), where x = [X(t1 ), X(t2 ), · · · , X(tN )]T . Likewise, a CDF can be
defined:
Fx (x) = P (X(t1 ) ≤ x1 , X(t2 ) ≤ x2 , . . . , X(tN ) ≤ xN ) .
(3.98)
Sometimes, it is more productive to consider how the process is generated.
Indeed, a large number of infinitely precise time functions can in some cases
result from a relatively simple random mechanism, and such simple models
are highly useful to the communications engineer, even when they are approximations or simplifications of reality. Such parametric modeling as a function
of one or more random variables can be expressed as follows:
X(t, λ) = g1 (Y1 , Y2 , · · · , YN , t),
(3.99)
where g1 is a function of the underlying r.v.’s Y1 , Y2 , . . . , YN and time t, and
λ = g2 (Y1 , Y2 , · · · , YN ),
(3.100)
where g2 is a function of the r.v.’s {Yn }, which together uniquely determine
the outcome λ.
Since at a specific time t a process is essentially a random variable, it follows
that a mean can be defined as
⟨X(t)⟩ = µX(t) = ∫_{−∞}^{∞} x fX(t)(x) dx. (3.101)
Other important statistics are defined below.
Definition 3.23. The joint moment
RXX(t1, t2) = ⟨X(t1)X(t2)⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX(t1),X(t2)(x1, x2) dx1 dx2,
is known as the autocorrelation function.
Definition 3.24. The joint central moment
µXX(t1, t2) = ⟨(X(t1) − µX(t1))(X(t2) − µX(t2))⟩ = RXX(t1, t2) − µX(t1)µX(t2)
is known as the autocovariance function.
The mean, autocorrelation, and autocovariance functions as defined above
are designated ensemble statistics since the averaging is performed with respect to the ensemble of possible functions at a specific time instant. Other
joint moments can be defined in a straightforward manner.
Example 3.5
Consider the stochastic process defined by
X(t) = sin(2πfc t + Φ),
(3.102)
where fc is a fixed frequency and Φ is a uniform random variable taking values
between 0 and 2π. This is an instance of a parametric description where the
process is entirely defined by a single random variable Φ.
The ensemble mean is
µX(t) = ∫_{−∞}^{∞} x fX(t)(x) dx = ∫_{0}^{2π} sin(2πfc t + x) fΦ(x) dx = (1/2π) ∫_{0}^{2π} sin(2πfc t + x) dx = 0, (3.103)
and the autocorrelation function is
RXX(t1, t2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX(t1),X(t2)(x1, x2) dx1 dx2
= ∫_{0}^{2π} sin(2πfc t1 + φ) sin(2πfc t2 + φ) fΦ(φ) dφ
= (1/4π) [∫_{0}^{2π} cos(2πfc(t2 − t1)) dφ − ∫_{0}^{2π} cos(2πfc(t1 + t2) + 2φ) dφ]
= (1/4π) [2π cos(2πfc(t2 − t1)) − 0]
= (1/2) cos(2πfc(t2 − t1)). (3.104)
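The ensemble statistics of example 3.5 can be approximated by averaging over many realizations of the random phase Φ. The sketch below estimates µX(t) and RXX(t1, t2) by Monte Carlo and compares the latter with (1/2) cos(2πfc(t2 − t1)) from (3.104); the carrier frequency and time instants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
fc = 5.0                                            # arbitrary carrier frequency
phi = rng.uniform(0.0, 2 * np.pi, size=200_000)     # realizations of the phase Phi

t1, t2 = 0.013, 0.050
x1 = np.sin(2 * np.pi * fc * t1 + phi)              # X(t1) over the ensemble
x2 = np.sin(2 * np.pi * fc * t2 + phi)              # X(t2) over the ensemble

print(x1.mean())                                    # close to 0, cf. (3.103)
print((x1 * x2).mean())                             # empirical R_XX(t1, t2)
print(0.5 * np.cos(2 * np.pi * fc * (t2 - t1)))     # analytical value from (3.104)
```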
Definition 3.25. If all statistical properties of a stochastic process are invariant
to a change in time origin, i.e. X(t, λ) is statistically equivalent to X(t +
T, λ), for any t, and T is any arbitrary time shift, then the process is said to be
stationary in the strict sense.
Stationarity in the strict sense implies that for any set of N time instants t1 ,
t2 , . . . , tN , the joint PDF of X(t1 ), X(t2 ), . . . , X(tN ) is identical to the joint
PDF of X(t1 + T ), X(t2 + T ), . . . , X(tN + T ). Equivalently, it can be said
that a process is strictly stationary if
⟨X^k(t)⟩ = ⟨X^k(0)⟩, for all t, k. (3.105)
However, a less stringent definition of stationarity is often useful, since strict
stationarity is both rare and difficult to determine.
Definition 3.26. If the ensemble mean and autocorrelation function of a stochastic process are invariant to a change of time origin, i.e.
�X(t)� = �X(t + T )� ,
RXX (t1 , t2 ) = RXX (t1 + T, t2 + T ),
for any t, t1 , t2 and T is any arbitrary time shift, then the process is said to be
stationary in the wide-sense or wide-sense stationary.
In the field of communications, it is typically considered sufficient to satisfy
the wide-sense stationarity (WSS) conditions. Hence, the expression “stationary process” in much of the literature (and in this book!) usually implies WSS.
Gaussian processes constitute a special case of interest; indeed, a Gaussian
process that is WSS is also automatically stationary in the strict sense. This is
by virtue of the fact that all high-order moments of the Gaussian distribution
are functions solely of its mean and variance.
Since the point of origin t1 becomes irrelevant, the autocorrelation function
of a stationary process can be specified as a function of a single argument, i.e.
RXX (t1 , t2 ) = RXX (t1 − t2 ) = RXX (τ ),
(3.106)
where τ is the delay variable.
Property 3.3. The autocorrelation function of a stationary process is an even
function (symmetric about τ = 0), i.e.
RXX(τ) = ⟨X(t1)X(t1 − τ)⟩ = ⟨X(t1 + τ)X(t1)⟩ = RXX(−τ). (3.107)
Property 3.4. The autocorrelation function with τ = 0 yields the average power of the process, i.e.
RXX(0) = ⟨X²(t)⟩.
Property 3.5. (Cauchy-Schwarz inequality)
|RXX(τ)| ≤ RXX(0).
Property 3.6. If X(t) is periodic, then RXX (τ ) is also periodic, i.e.
X(t) = X(t + T ) → RXX (τ ) = RXX (τ + T ).
It is straightforward to verify whether a process is stationary or not if an
analytical expression is available for the process. For instance, the process of
example 3.5 is obviously stationary. Otherwise, typical data must be collected
and various statistical tests performed on it to determine stationarity.
Besides ensemble statistics, time statistics can be defined with respect to a
given member function.
Definition 3.27. The time-averaged mean of a stochastic process X(t) is
given by
MX = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} X(t, λi) dt,
where X(t, λi ) is a member function.
Definition 3.28. The time-averaged autocorrelation function is defined
𝓡XX(τ) = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} X(t, λi) X(t + τ, λi) dt.
Unlike the ensemble statistics, the time-averaged statistics are random variables; their actual values depend on which member function is used for time
averaging.
Obtaining ensemble statistics requires averaging over all member functions
of the sample space S. This requires either access to all said member functions,
or perfect knowledge of all the joint PDFs characterizing the process. This
is obviously not possible when observing a real-world phenomenon behaving
according to a stochastic process. The best that we can hope for is access to
time recordings of a small set of member functions. This makes it possible to
compute time-averaged statistics.
The question that arises is: can a time-averaged statistic be employed as an
approximation of the corresponding ensemble statistic?
Definition 3.29. Ergodicity is the property of a stochastic process by virtue of
which all its ensemble statistics are equal to its corresponding time-averaged
statistics.
Not all processes are ergodic. It is also very difficult to determine whether a
process is ergodic in the strict sense, as defined above. However, it is sufficient
in practice to determine a limited form of ergodicity, i.e. with respect to one
or two basic statistics. For example, a process X(t) is said to be ergodic in
the mean if µX = MX. Similarly, a process is said to be ergodic in the autocorrelation if RXX(τ) = 𝓡XX(τ). An ergodic process is necessarily stationary,
although stationarity does not imply ergodicity.
Joint and complex stochastic processes
Given two joint stochastic processes X(t) and Y(t), they are fully defined at
two respective sets of time instants {t1,1 , t1,2 , . . . , t1,M } and {t2,1 , t2,2 , . . . , t2,N }
by the joint PDF
fX(t1,1 ),X(t1,2 ),··· ,X(t1,M ),Y (t2,1 ),Y (t2,2 ),··· ,Y (t2,N ) (x1 , x2 , . . . , xM , y1 , y2 , . . . , yN ) .
Likewise, a number of useful joint statistics can be defined.
Definition 3.30. The joint moment
RXY(t1, t2) = ⟨X(t1)Y(t2)⟩ = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX(t1),Y(t2)(x, y) dx dy
is the cross-correlation function of the processes X(t) and Y (t).
Definition 3.31. The cross-covariance of X(t) and Y (t) is
µXY(t1, t2) = ⟨(X(t1) − µX(t1))(Y(t2) − µY(t2))⟩ = RXY(t1, t2) − µX(t1)µY(t2).
X(t) and Y (t) are jointly wide sense stationary if
⟨X^m(t) Y^n(t)⟩ = ⟨X^m(0) Y^n(0)⟩, for all t, m and n. (3.108)
If X(t) and Y (t) are individually and jointly WSS, then
RXY(t1, t2) = RXY(τ), (3.109)
µXY(t1, t2) = µXY(τ). (3.110)
Property 3.7. If X(t) and Y (t) are individually and jointly WSS, then
RXY (τ ) = RY X (−τ ),
µXY (τ ) = µY X (−τ ).
The above results from the fact that ⟨X(t)Y(t + τ)⟩ = ⟨X(t − τ)Y(t)⟩.
The two processes are said to be statistically independent if and only if
their joint distribution factors, i.e.
fX(t1 ),Y (t2 ) (x, y) = fX(t1 ) (x)fY (t2 ) (y).
(3.111)
They are uncorrelated if and only if µXY (τ ) = 0 and orthogonal if and
only if RXY (τ ) = 0.
Property 3.8. (triangle inequality)
|RXY(τ)| ≤ (1/2) [RXX(0) + RYY(0)].
Definition 3.32. A complex stochastic process is defined
Z(t) = X(t) + jY (t),
where X(t) and Y (t) are joint, real stochastic processes.
Definition 3.33. The complex autocorrelation function (autocorrelation function of a complex process) is defined
RZZ(t1, t2) = (1/2) ⟨Z(t1)Z*(t2)⟩
= (1/2) ⟨[X(t1) + jY(t1)][X(t2) − jY(t2)]⟩
= (1/2) [RXX(t1, t2) + RYY(t1, t2)] + (j/2) [RYX(t1, t2) − RXY(t1, t2)].
Property 3.9. If Z(t) is WSS (implying that X(t) and Y (t) are individually
and jointly WSS), then
RZZ (t1 , t2 ) = RZZ (t1 − t2 ) = RZZ (τ ).
(3.112)
Definition 3.34. The complex crosscorrelation function of two complex random processes Z1(t) = X1(t) + jY1(t) and Z2(t) = X2(t) + jY2(t) is defined
RZ1Z2(t1, t2) = (1/2) ⟨Z1(t1)Z2*(t2)⟩
= (1/2) [RX1X2(t1, t2) + RY1Y2(t1, t2)] + (j/2) [RY1X2(t1, t2) − RX1Y2(t1, t2)].
Property 3.10. If the real and imaginary parts of two complex stochastic processes Z1 (t) and Z2 (t) are individually and pairwise WSS, then
R*Z1Z2(τ) = (1/2) ⟨Z1*(t)Z2(t − τ)⟩ = (1/2) ⟨Z1*(t + τ)Z2(t)⟩ = RZ2Z1(−τ). (3.113)
Property 3.11. If the real and imaginary parts of a complex random process
Z(t) are individually and jointly WSS, then
R*ZZ(τ) = RZZ(−τ). (3.114)
Linear systems and power spectral densities
How does a linear system behave when a stochastic process is applied at
its input? Consider a linear, time-invariant system with impulse response h(t)
having for input a stationary random process X(t), as depicted in Figure 3.3. It
is logical to assume that its output will be another stationary stochastic process
Y (t) and that it will be defined by the standard convolution integral
Y(t) = ∫_{−∞}^{∞} X(τ) h(t − τ) dτ. (3.115)
Figure 3.3. Linear system with impulse response h(t) and stochastic process X(t) at its input.
The expectation of Y (t) is then given by
⟨Y(t)⟩ = ⟨∫_{−∞}^{∞} X(τ) h(t − τ) dτ⟩ = ∫_{−∞}^{∞} ⟨X(τ)⟩ h(t − τ) dτ, (3.116)
where, by virtue of the stationarity of X(t), the remaining expectation is actually a constant, and we have
⟨Y(t)⟩ = µX ∫_{−∞}^{∞} h(t − τ) dτ = µX ∫_{−∞}^{∞} h(τ) dτ = µX H(0), (3.117)
where H(f ) is the Fourier transform of h(t).
Let us now determine the crosscorrelation function of Y (t) and X(t). We
have
RYX(t, τ) = ⟨Y(t)X*(t − τ)⟩
= ⟨X*(t − τ) ∫_{−∞}^{∞} X(t − a) h(a) da⟩
= ∫_{−∞}^{∞} ⟨X*(t − τ)X(t − a)⟩ h(a) da
= ∫_{−∞}^{∞} RXX(τ − a) h(a) da. (3.118)
Since the right-hand side of the above is independent of t, it can be deduced
that X(t) and Y (t) are jointly stationary. Furthermore, the last line is also a
convolution integral, i.e.
RY X (τ ) = RXX (τ ) ∗ h(τ ).
(3.119)
The autocorrelation function of Y (t) can be derived in the same fashion:
RYY(τ) = ⟨Y(t)Y*(t − τ)⟩
= ⟨Y*(t − τ) ∫_{−∞}^{∞} X(t − a) h(a) da⟩
= ∫_{−∞}^{∞} ⟨Y*(t − τ)X(t − a)⟩ h(a) da
= ∫_{−∞}^{∞} RYX(a − τ) h(a) da
= RYX(−τ) ∗ h(τ)
= RXX(−τ) ∗ h(−τ) ∗ h(τ)
= RXX(τ) ∗ h(−τ) ∗ h(τ). (3.120)
Given that X(t) at any given instant is a random variable, how can the spectrum of X(t) be characterized? Intuitively, it can be assumed that there is
a different spectrum X(f, λ) for every member function X(t, λ). However,
stochastic processes are in general infinite energy signals which implies that
their Fourier transform in the strict sense does not exist. In the time domain, a
process is characterized essentially through its mean and autocorrelation function. In the frequency domain, we resort to the power spectral density.
Definition 3.35. The power spectral density (PSD) of a random process X(t)
is a spectrum giving the average (in the ensemble statistic sense) power in the
process at every frequency f .
The PSD can be found simply by taking the Fourier transform of the autocorrelation function, i.e.
SXX(f) = ∫_{−∞}^{∞} RXX(τ) e^{−j2πfτ} dτ, (3.121)
which obviously implies that the autocorrelation function can be found from
the PSD SXX (f ) by performing an inverse transform, i.e.
RXX(τ) = ∫_{−∞}^{∞} SXX(f) e^{j2πfτ} df. (3.122)
This bilateral relation is known as the Wiener-Khinchin theorem and its
proof is left as an exercise.
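As a small illustration of the Wiener-Khinchin relation (3.121), the sketch below assumes the autocorrelation RXX(τ) = e^{−|τ|} (an example not taken from the text), approximates its Fourier transform on a discrete grid, and compares the result against the corresponding closed-form PSD 2/(1 + (2πf)²).

```python
import numpy as np

# Autocorrelation R_XX(tau) = exp(-|tau|) sampled on a fine grid.
dtau = 0.001
tau = np.arange(-30.0, 30.0, dtau)
R = np.exp(-np.abs(tau))

f = np.linspace(-1.0, 1.0, 9)
# Eq. (3.121): S_XX(f) = integral of R_XX(tau) exp(-j 2 pi f tau) dtau.
S_numeric = np.array([np.sum(R * np.exp(-2j * np.pi * fi * tau)) * dtau
                      for fi in f]).real
S_exact = 2.0 / (1.0 + (2 * np.pi * f)**2)

print(np.max(np.abs(S_numeric - S_exact)))   # small numerical error
```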
Definition 3.36. The cross power spectral density (CPSD) between two random processes X(t) and Y (t) is a spectrum giving the ensemble average product between every frequency component of X(t) and every corresponding frequency component of Y (t).
As could be expected, the CPSD can also be computed via a Fourier transform
SXY(f) = ∫_{−∞}^{∞} RXY(τ) e^{−j2πfτ} dτ. (3.123)
By taking the Fourier transform of properties 3.10 and 3.11, we find the
following:
Property 3.12. S*XY(f) = SYX(f).
Property 3.13. S*XX(f) = SXX(f).
The CPSD between the input process X(t) and the output process Y (t) of
a linear system can be found by simply taking the Fourier transform of (3.119).
Thus, we have
SY X (f ) = H(f )SXX (f ).
(3.124)
Likewise, the PSD of Y(t) is obtained by taking the Fourier transform of (3.120), which yields
SYY(f) = H(f)H*(f)SXX(f) = |H(f)|² SXX(f). (3.125)
It is noteworthy that, since RXX(τ) is an even function, its Fourier transform SXX(f) is necessarily real. This is logical, since according to the definition, the PSD yields the average power at every frequency (and complex power makes no sense). It also follows that the average total power of X(t) is given by
P = ⟨X²(t)⟩ = RXX(0) = ∫_{−∞}^{∞} SXX(f) df. (3.126)
Discrete stochastic processes
If a stochastic process is bandlimited (i.e. its PSD is limited to a finite
interval of frequencies) either because of its very nature or because it results
from passing another process through a bandlimiting filter, then it is possible to
characterize it fully with a finite set of time instants by virtue of the sampling
theorem.
Given a deterministic signal x(t), it is bandlimited if
X(f ) = F{x(t)} = 0,
for |f | > W ,
(3.127)
where W corresponds to the highest frequency in x(t). Recall that according
to the sampling theorem, x(t) can be uniquely determined by the set of its
samples taken at a rate of fs ≥ 2W samples / s, where the latter inequality
constitutes the Nyquist criterion and the minimum rate 2W samples / s is
known as the Nyquist rate. Sampling at the Nyquist rate, the sampled signal
is
xs(t) = Σ_{k=−∞}^{∞} x(k/(2W)) δ(t − k/(2W)), (3.128)
and it corresponds to the discrete sequence
x[k] = x(k/(2W)). (3.129)
Sampling theory tells us that x[k] contains all the necessary information
to reconstruct x(t). Since the member functions of a stochastic process are
individually deterministic, the same is true for each of them and, by extension,
for the process itself. Hence, a bandlimited process X(t) is fully characterized
by the sequence of random variables X[k] corresponding to a sampling set at
the Nyquist rate. Such a sequence X[k] is an instance of a discrete stochastic
process.
Definition 3.37. A discrete stochastic process X[k] is an ensemble of discrete
sequences (member functions) x1 [k], x2 [k], . . . , xN [k] which are mapped to
the outcomes λ1 , λ2 , . . . , λN making up the sample space S of a corresponding
random experiment.
The mth moment of X[k] is
⟨X^m[k]⟩ = ∫_{−∞}^{∞} x^m fX[k](x) dx, (3.130)
its autocorrelation is defined
RXX[k1, k2] = (1/2) ⟨X[k1]X*[k2]⟩ = (1/2) ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y* fX[k1],X[k2](x, y) dx dy, (3.131)
and its autocovariance is given by
µXX[k1, k2] = RXX[k1, k2] − ⟨X[k1]⟩⟨X[k2]⟩. (3.132)
If the process is stationary, then we have
RXX [k1 , k2 ] = RXX [k1 − k2 ],
µXX[k1, k2] = µXX[k1 − k2] = RXX[k1 − k2] − µX². (3.133)
The power spectral density of a discrete process is, naturally enough, computed using the discrete Fourier transform, i.e.
SXX(f) = Σ_{k=−∞}^{∞} RXX[k] e^{−j2πfk}, (3.134)
with the inverse relationship being
RXX[k] = ∫_{−1/2}^{1/2} SXX(f) e^{j2πfk} df. (3.135)
Hence, as with any discrete Fourier transform, the PSD SXX (f ) is periodic.
More precisely, we have SXX (f ) = SXX (f + n) where n is any integer.
Given a discrete-time linear time-invariant system with a discrete impulse
response h[k] = h(tk ), the output process Y [k] of this system when a process
X[k] is applied at its input is given by
Y[k] = Σ_{n=−∞}^{∞} h[n] X[k − n], (3.136)
which constitutes a discrete convolution.
The mean of the output can be computed as follows:
µY = ⟨Y[k]⟩ = ⟨Σ_{n=−∞}^{∞} h[n] X[k − n]⟩ = Σ_{n=−∞}^{∞} h[n] ⟨X[k − n]⟩ = µX Σ_{n=−∞}^{∞} h[n] = µX H(0), (3.137)
where H(0) is the DC component of the system’s frequency transfer function.
Likewise, the autocorrelation function of the output is given by
RYY[k] = (1/2) ⟨Y*[n]Y[n + k]⟩
= (1/2) ⟨(Σ_{m=−∞}^{∞} h*[m] X*[n − m]) (Σ_{l=−∞}^{∞} h[l] X[k + n − l])⟩
= Σ_{m=−∞}^{∞} Σ_{l=−∞}^{∞} h*[m] h[l] (1/2) ⟨X*[n − m] X[k + n − l]⟩
= Σ_{m=−∞}^{∞} Σ_{l=−∞}^{∞} h*[m] h[l] RXX[k + m − l], (3.138)
which is in fact a double discrete convolution.
Taking the discrete Fourier transform of the above expression, we obtain
SY Y (f ) = SXX (f ) |Hs (f )|2 ,
(3.139)
which is exactly the same as for continuous processes, except that SXX (f ) and
SY Y (f ) are the periodic PSDs of discrete processes, and Hs (f ) is the periodic
spectrum of the sampled version of h(t).
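The discrete input-output relations can be checked by simulation for an FIR filter driven by white noise. For a real zero-mean white input of variance σ², so that RXX[k] = (σ²/2) δ[k] under the 1/2 convention of (3.131), equation (3.138) reduces to RYY[k] = (σ²/2) Σ_m h[m] h[m + k]; the sketch below compares this with an empirical estimate. The filter taps and σ² are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 1.5                                  # variance of the white input (assumed)
h = np.array([1.0, 0.5, -0.25])               # FIR impulse response (assumed)

x = rng.normal(0.0, np.sqrt(sigma2), size=2_000_000)   # white process X[k]
y = np.convolve(x, h, mode="valid")                    # discrete convolution, cf. (3.136)

# Empirical output autocorrelation, with the 1/2 convention of (3.138).
def ryy_empirical(k):
    return 0.5 * np.mean(y[:y.size - k] * y[k:])

# Theoretical value from (3.138) with R_XX[k] = (sigma2 / 2) * delta[k]:
# R_YY[k] = (sigma2 / 2) * sum_m h[m] h[m + k].
def ryy_theory(k):
    return 0.5 * sigma2 * np.sum(h[:h.size - k] * h[k:])

for k in range(3):
    print(k, ryy_empirical(k), ryy_theory(k))   # the pairs agree closely
```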
Cyclostationarity
The modeling of signals carrying digital information implies stochastic processes which are not quite stationary, although it is possible in a sense to treat
them as stationary and thus obtain the analytical convenience associated with
this property.
Definition 3.38. A cyclostationary stochastic process is a process with nonconstant mean (it is therefore not stationary, neither in the strict sense nor the
126
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
wide-sense) such that the mean and autocorrelation function are periodic in
time with a given period T .
Consider a random process
S(t) =
∞
�
k=−∞
a[k]g(t − kT ),
(3.140)
where {a[k]} is a sequence of complex random variables having a mean µA
and autocorrelation function RAA [n] (so that the sequence A[k] = {a[k]} is a
stationary discrete random process), and g(t) is a real pulse-shaping function
considered to be 0 outside the interval t ∈ [0, T ]. The mean of S(t) is given by
µS = µA
∞
�
k=−∞
g(t − nT ),
(3.141)
and it can be seen that it is periodic with period T .
Likewise, the autocorrelation function is given by
RSS (t, t + τ ) =
=
1 ∗
�S (t)S(t + τ )�
2�
�
∞
∞
�
�
1
∗
a [k]g(t − kT )
a[l]g(t + τ − lT )
2
k=−∞
=
=
1
g(t − kT )g(t + τ − lT ) �a∗ [k]a[l]�
2
−∞
k=−∞
∞ �
∞
�
k=−∞ −∞
It is easy to see that
l=−∞
∞ �
∞
�
g(t − kT )g(t + τ − lT )RAA [l − k]. (3.142)
RSS (t, t + τ ) = RSS (t + kT, t + τ + kT ),
for any integer k.
(3.143)
The fact that such processes are not stationary is inconvenient. For example, it is awkward to derive a PSD from the above autocorrelation function, because there are 2 variables involved (t and τ ) and this calls for a 2-dimensional
Fourier transform. However, it is possible to sidestep this issue by observing that averaging the autocorrelation function over one period T removes any
dependance upon t.
Definition 3.39. The period-averaged autocorrelation function of a cyclostationary process is defined
�
1 T /2
RSS (t, t + τ )dt.
R̄SS (τ ) =
T −T /2
127
Probability and stochastic processes
It is noteworthy that the period-averaged autocorrelation function is an ensemble statistic and should not be confused with the time-averaged statistics
defined earlier. Based on this, the power spectral density can be simply defined as
� ∞
�
�
SSS (f ) = F R̄SS (τ ) =
R̄SS (τ )ej2πf τ dτ.
(3.144)
−∞
5.
Typical univariate distributions
In this section, we examine various types of random variables which will
be found useful in the following chapters. We will start by studying three
fundamental distributions: the binomial, Gaussian and uniform laws. Because
complex numbers play a fundamental role in the modeling of communication
systems, we will then introduce complex r.v.’s and the associated distributions.
Most importantly, we will dwell on the complex Gaussian distribution, which
will then serve as a basis for deriving other useful distributions: chi-square,
F-distribution, Rayleigh and Rice. Finally, the Nakagami-m and lognormal
distributions complete our survey.
Binomial distribution
We have already seen in section 2 that a discrete random variable following
a binomial distribution is characterized by the PDF
�
N �
�
N
fX (x) =
pn (1 − p)N −n δ(x − n).
n
(3.145)
n=0
We can find the CDF by simply integrating the above. Thus we have
�
N �
�
N
pn (1 − p)N −n δ(α − n)dα
n
−∞
�
FX (x) =
x
n=0
�
N
�
=
n=0
where
It follows that
�
x
−∞
N
n
�
pn (1 − p)N −n
δ(α − n)dα =
FX (x) =
�
�
x
−∞
δ(α − n)dα, (3.146)
1, if n < x
0, otherwise
�
�x� �
�
N
pn (1 − p)N −n ,
n
n=0
(3.147)
(3.148)
128
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where �x� denotes the floor operator, i.e. it corresponds to the nearest integer
which is smaller or equal to x.
The characteristic function is given by
�
� ∞�
N �
N
φX (jt) =
pn (1 − p)N −n δ(x − n)ejtx dx
n
−∞
n=0
�
N
�
=
n=0
N
n
�
pn (1 − p)N −n ejtn ,
(3.149)
which, according to the binomial theorem, reduces to
φX (jt) = (1 − p + pejt )N .
(3.150)
Furthermore, the Mellin transform of the binomial PDF is given by
�
� ∞�
N �
N
MfX (s) =
pn (1 − p)N −n δ(x − n)xs−1 dx
n
−∞
n=0
�
N
�
=
n=0
N
n
�
pn (1 − p)N −n ns−1 ,
which implies that the kth moment is
�
N �
� � �
N
k
X =
pn (1 − p)N −n nk .
n
(3.151)
(3.152)
n=0
If k = 1, we have
�
N �
�
N
pn (1 − p)N −n n
�X� =
n
n=0
N
�
N!
pn (1 − p)N −n
(N − n)!(n − 1)!
n=0
�
N
−1 �
�
�
N −1
= Np
pn (1 − p)N −1−n
�
n
n� =−1
�
�
N −1
�
where n = n − 1 and, noting that
= 0, we have
−1
�
N
−1 �
�
�
�
N −1
�X� = N p
pn (1 − p)N −1−n
�
n
�
=
(3.153)
n =0
= N p(1 − p + p)N −1
= N p.
(3.154)
129
Probability and stochastic processes
Likewise, if k = 2, we have
�
X
2
�
�
N �
�
N
=
pn (1 − p)N −n n2
n
n=0
= Np
N
−1 �
�
n� =0
N −1
n�
�
= N p (N − 1)p
N
−1 �
�
n� =0
N −1
n�
�
N
−2
�
n�� =0
�
�
�
pn (1 − p)N −1−n (n� + 1)
�
N −2
n��
n�
�
��
��
pn (1 − p)N −2−n +
N −1−n�
p (1 − p)
�
,
(3.155)
where n� = n − 1 and n�� = n − 2 and, applying the binomial theorem
� 2�
X
= N (N − 1)p2 + N p
= N p(1 − p) + N 2 p2 .
If follows that the variance is given by
� �
2
σX
= X 2 − �X�2
= N p(1 − p) + N 2 p2 − N 2 p2
= N p(1 − p)
(3.156)
(3.157)
Uniform distribution
A continuous r.v. X characterized by a uniform distribution can take any
value within an interval [a, b] and this is denoted X ∼ U (a, b). Given a smaller
interval [∆a, ∆b] of width ∆ such that ∆a ≥ a and ∆b ≤ b, we find that, by
definition
P (X ∈ [∆a , ∆b ]) = P (∆a ≤ X ≤ ∆b ) =
∆
,
b−a
(3.158)
regardless of the position of the interval [∆a , ∆b ] within the range [a, b].
The PDF of such an r.v. is simply
� 1
b−a , if x ∈ [a, b],
fX (x) =
(3.159)
0,
elsewhere,
or, in terms of the step function u(x),
fX (x) =
1
[u(x − a) − u(x − b)] .
b−a
(3.160)
130
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
The corresponding CDF is
FX (x) =

 0,

x
b−a ,
1,
if x < a,
if x ∈ [a, b]
if x > b,
(3.161)
which, in terms of the step function is conveniently expressed
x
FX (x) =
[u(x − a) − u(x − b)] + u(x − b).
b−a
The characteristic function is obtained as follows:
� b
1 jtx
φX (jt) =
e dx
a b−a
� jtx �b
1
e
=
b − a jt a
�
�
1
=
ejtb − ejta .
jt(b − a)
The Mellin transform is equally elementary:
� b s−1
x
MfX (s) =
dx
a b−a
� s �b
1
x
=
b−a s a
1 bs − as
=
.
b−a s
It follows that
� �
1 bk+1 − ak+1
Xk =
.
b−a
k+1
(3.162)
(3.163)
(3.164)
(3.165)
Gaussian distribution
From Appendix 3.A and Example 3.4, we know that the PDF of a Gaussian
r.v. is
(x−µX )2
−
1
2
√
fX (x) =
e 2σX ,
(3.166)
2πσX
2 is the variance and µ is the mean of X.
where σX
X
The corresponding CDF is
� x
FX (x) =
fX (α)dα
−∞
=
√
1
2πσX
�
x
−∞
−
e
α−µX
2σ 2
X
dα,
(3.167)
131
Probability and stochastic processes
α−µ
√ X,
2σX
which, with the variable substitution u =
becomes
� x−µ
√ X
2σX
1
2
FX (x) = √
e−u du
π
 �−∞
�
��
x−µ
1

√ X
1
+
erf
,

 2�

� 2σX ��
|x−µX |
1
√
=
2 1 − erf
2σX

�
�


|
 = 1 erfc |x−µ
√ X
2
2σ
if x − µX > 0,
(3.168)
otherwise,
X
where erf(·) is the error function and erfc(·) is the complementary error function.
Figure 3.4, shows the PDF and CDF of the Gaussian distribution.
0.12
0.1
fX (x)
0.08
0.06
0.04
0.02
0
−15
−10
−5
0
5
10
15
x
(a) Probability density function fX (x)
P (X ≤ x) = FX (x)
1
0.8
0.6
0.4
0.2
0
−15
−10
−5
0
5
10
15
x
(b) Cumulative density function fX (x)
Figure 3.4. The PDF and CDF of the real Gaussian distribution with zero mean and a variance
of 10.
132
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
The characteristic function of a Gaussian r.v. with mean µX and variance
2 is
σX
φX (jt) = ejtu−
2
t2 σX
2
(3.169)
.
The k th central moment is given by
�
(X − µX )
k
�
=
�
∞
(x − µX ) √
k
−∞
−
1
e
2πσX
where the integral can be reduced by substituting u =
�
(X − µX )k
(x−µX )2
2σ 2
X
(3.170)
,
(x−µX )2
2
2σX
to yield
�k
� � ∞ k−1
2 2 �
2σX
k+1
= √
1 − (−1)
u 2 e−u du,
π
0
�
�
(3.171)
which was obtained by treating separately the cases (x − µX ) > 0 and (x −
µX ) < 0. It can be observed that all odd-order central moments are null. For
the even moments, we have
�
(X − µX )
k
�
�
2 2 σk
= √X
π
�
(X − µX )
k
∞
u
k−1
2
e−u du,
if k is even,
(3.172)
0
which, according to the definition of the Gamma function, is
k
�
k
2 2 σk
= √ XΓ
π
�
k+1
2
�
.
(3.173)
By virtue of identity (2.197), the above becomes
�
(X − µX )
k
�
k
2 2 σ k 1 · 3 · 5 · · · (k − 1) √
= √X
π,
k
π
22
if k is even,
(3.174)
which finally reduces to
�
�
(X − µX )k =
�
0,
k
σX
� k2 −1
i=1
if k is odd,
(2i − 1), if k is even.
(3.175)
The raw moments are best obtained as a function of the central moments.
For the k th raw moment, assuming k an integer, we have:
�
X
k
�
�
−
1
=
x √
e
2πσX
−∞
∞
k
(x−µX )2
2σ 2
X
dx.
(3.176)
133
Probability and stochastic processes
This can be expressed as a function of x−µX thanks to the binomial theorem
as follows:
� ∞
(x−µX )2
� �
−
1
2
k
k
X
=
(x − µX + µX ) √
e 2σX dx
2πσX
−∞
�
�
�
k
(x−µX )2
∞
� k
−
1
2
k−n
n
=
(x − µX ) √
µX
e 2σX dx
n
2πσX
−∞
n=0
�
�
k
�
�
�
k
k
(X
−
µ
)
=
µk−n
X
X
n
n=0
=
� k2 � �
�
n−1
�
�
k
k−2n� 2n�
µX
σX
(2n − 1).
�
2n
�
n =0
(3.177)
i=1
The Mellin transform does not follow readily because the above assumes
that k is an integer. The Mellin transform variable s, on the other hand, is typically not an integer and can assume any value in the complex plane. Furthermore, it may sometimes be useful to find fractional moments. From (3.176),
we have
�
∞ �
� s−1 � �
�
�
s−1
X
=
µs−1−n
(X − µX )s−1 ,
(3.178)
X
n
n=0
where the summation now runs form 0 to ∞ to account for the fact that s
might not be an integer. If it is an integer, terms for n > s − 1 will be nulled
because the combination operator has (s − 1 − n)! at the denominator and the
factorial of a negative integer is infinite. It follows that the above equation is
true regardless of the value of s.
Applying the identity (2.197), we find
�
∞
� �
�
Γ(1 − s + n) s−1−n �
X s−1 =
µX
(X − µX )s−1 ,
Γ(1 − s)n!
(3.179)
n=0
which, by virtue of 3.173 and bearing in mind that the odd central moments
are null, becomes
�
�
� 2n�
∞
� s−1 � �
Γ(1 − s + 2n� ) s−1−2n� 2n σX
1
�
√
X
=
µ
Γ
n
+
.
(3.180)
Γ(1 − s)(2n� )! X
2
π
�
n =0
The Gamma function and the factorial in 2n� can be expanded by virtue of
property 2.68 to yield
�
� � �
� � 2n�
∞
� s−1 � �
Γ n� + 1−s
Γ n + 1 − 2s 2n σX
2
X
=
.
(3.181)
Γ(1 − s)n� !
�
n =0
134
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Expanding Γ(1 − s) in the same way allows the formation of two Pochhammer functions by combination with the numerator Gamma functions; this results in
�
∞ � 1−s � �
s
�
� s−1 �
1
−
�
2 n
2 n� n� 2n�
X
=
2 σX
n!
n� =0
�
�
1−s
s
2
= 2 F0
, 1 − ; 2σX
.
(3.182)
2
2
A unique property of this distribution is that a sum of Gaussian r.v.’s is itself
a Gaussian variate. Consider
Y =
N
�
(3.183)
Xi ,
i=1
where each Xi is a Gaussian r.v. with mean µi and variance σi2 .
The corresponding characteristic function is
φY (jt) =
N
�
ejtµi −
t2 σi2
2
i=1
=
�N
�
jtµi
e
�� N
�
−t
e
i=1
i=1
P
t2 PN
2
jt N
µ
−
i=1 i
i=1 σi
2
= e
2σ2
i
2
�
(3.184)
,
which, of course, is the C. F. of a Gaussian r.v. with mean
�
2
ance N
i=1 σi .
�N
i=1 µi
and vari-
Complex Gaussian distribution
In communications, quantities (channel gains, data symbols, filter coefficients) are typically complex thanks in no small part to the widespread use of
complex baseband notation (see chapter 4). It follows that it is useful to handle
complex r.v.’s. This is simple enough to do, at least in principle, given that a
complex r.v. comprises two jointly-distributed real r.v.’s corresponding to the
real and imaginary part of the complex quantity.
Hence, a complex Gaussian variate is constructed from two jointly distributed real Gaussian variates. Consider
Z = A + jB,
(3.185)
135
Probability and stochastic processes
where A and B are real Gaussian variates. It was found in [Goodman, 1963]
that such a complex normal r.v. has desirable analytic properties if
�
�
�
�
(A − µA )2 = (B − µB )2 ,
(3.186)
�(A − µA )(B − µB )� = = 0.
(3.187)
The latter implies that A and B are uncorrelated and independent.
Furthermore, the variance of Z is defined
�
�
|Z − µZ |2
σZ2 =
= �(Z − µZ )(Z − µZ )∗ � ,
(3.188)
where µZ = µA + jµB . Hence, we have
σZ2
= �[(A − µA ) + j(B − µB )] [(A − µA ) − j(B − µB )]�
�
� �
�
= (A − µA )2 + (B − µB )2
2
2
= σA
+ σB
2
= 2σA .
(3.189)
It follows that A and B obey the same marginal distribution which is
−
1
fA (a) = √
e
2πσA
(a−µA )2
2σ 2
A
(3.190)
.
Hence, the joint distribution of A and B is
fA,B (a, b) =
=
−
1
e
2πσA σB
1 −
2 e
2πσA
(a−µA )2
2σ 2
A
−
e
(b−µB )2
2σ 2
B
(a−µA )2 +(b−µB )2
2σ 2
A
,
(3.191)
which directly leads to
1 −
fZ (z) =
e
πσZ2
(z−µZ )(z−µZ )∗
σ2
Z
.
(3.192)
Figure 3.5 shows the PDF fZ (z) plotted as a surface above the complex
plane.
While we can conveniently express many complex PDFs as a function of a
single complex r.v., it is important to remember that, from a formal standpoint,
a complex PDF is actually a joint PDF since the real and imaginary parts are
separate r.v.’s. Hence, we have
fZ (z) = fA,B (a, b) = fRe{Z},Im{Z} (Re{z}, Im{z}).
(3.193)
136
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
0.3
fZ (z)
2
0.2
0.1
1
0
−2
0
−1
−1
0
Im{z}
Re{z}
1
2 −2
Figure 3.5.
2
The bidimensional PDF of unit variance Z (σZ
= 1) on the complex plane.
The fact that there are two underlying r.v.’s becomes inescapable when we
consider the CDF of a complex r.v. Indeed, the event Z ≤ z is ambiguous.
Therefore, the CDF must be expressed as a joint CDF of the real and imaginary
parts, i.e.
FRe{Z},Im{Z} (a, b) = P (Re{Z} ≤ a, Im{Z} ≤ b),
(3.194)
which, provided that the CDF is differentiable along its two dimensions, can
lead us back to the complex PDF according to
fZ (z) = fRe{Z},Im{Z} (a, b) =
∂2
F
(a, b).
∂a∂b Re{Z},Im{Z}
(3.195)
Accordingly, the C. F. of a complex r.v. Z is actually a joint C.F. defined by
�
�
φRe{Z},Im{Z} (jt1 , jt2 ) = ejt1 Re{Z}+jt2 Im{Z} .
(3.196)
In the case of a complex Gaussian variate, we have
2
φRe{Z},Im{Z} (jt1 , jt2 ) = ej(t1 Re{Z}+t2 Im{Z}) e−σZ (t1 +t2 ) .
(3.197)
Rayleigh distribution
The Rayleigh distribution characterizes the amplitute of a narrowband noise
process n(t), as originally derived by Rice [Rice, 1944], [Rice, 1945]. In
the context of wireless communications, however, the Rayleigh distribution is
most often associated with multipath fading (see chapter 4).
137
Probability and stochastic processes
Consider that the noise process n(t) is made up of a real and imaginary part
(as per complex baseband notation — see chapter 4):
N (t) = A(t) + jB(t),
(3.198)
and that N (t) follows a central (zero-mean) complex Gaussian distribution. It
follows that if N (t) is now expressed in phasor notation, i.e.
N (t) = R(t)ejΘ(t) ,
(3.199)
where R(t) is the modulus (amplitude) and Θ(t) is the phase, their respective
PDFs can be derived through transformation techniques.
We have
2
2
1 − a σ+b
2
N
fA,B (a, b) =
,
(3.200)
2 e
πσN
2 = 2σ 2 = 2σ 2 and the transformation is
where σN
A
B
�
R = g1 (A, B) = A2 + B 2 , R ∈ [0, ∞],
� �
B
Θ = g2 (A, B) = arctan
, Θ ∈ [0, 2π),
A
(3.201)
(3.202)
being the conversion from cartesian to polar coordinates. The inverse transformation is
A = g1−1 (R, Θ) = R cos Θ,
B = g2−1 (R, Θ) = R sin Θ,
A ∈ [−∞, ∞]
B ∈ [−∞, ∞].
Hence, the Jacobian is given by
� ∂R cos Θ ∂R sin Θ �
�
∂R
∂R
J = det ∂R cos Θ ∂R sin Θ = det
∂Θ
∂Φ
= R(cos Θ + sin Θ) = R.
2
2
cos Θ
sin Θ
−R sin Θ R cos Θ
(3.203)
(3.204)
�
(3.205)
It follows that
2
r − σr2
N u(r) [u(θ) − u(θ − 2π)] ,
fR,Θ (r, θ) =
2 e
πσN
and
�
1
[u(θ) − u(θ − 2π)] ,
2π
0
2
� 2π
� 2π
−r
re σn2
fR (r) =
fR,Θ (r, θ)dθ =
dθ
2 u(r)
πσN
0
0
fΘ (θ) =
∞
fR,Θ (r, θ)dr =
(3.206)
(3.207)
2
=
2r − σr2
N u(r).
2 e
σN
(3.208)
138
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Hence, the modulus R is Rayleigh-distributed and the phase is uniformly
distributed. Furthermore, the two variables are independent since fR,Θ (r, θ) =
fR (r)fΘ (θ).
The Mellin tranform of fR (r) is
�
�
�
�
s−1
s−1
Γ 1+
MfR (s) = Rs−1 = σN
,
(3.209)
2
and the kth central moment is given by
�
�
� �
k
k
k
R = σN Γ 1 +
.
2
(3.210)
Consequently, the mean is
�R� = σN Γ
and the variance is
(3.211)
� � �
(R − µR )2 = R2 − µ2R
�
�
2 π
σN
4−π
2
2
= σN Γ(2) −
= σN
.
4
4
2
=
σR
The CDF is simply
� �
√
3
σN π
=
,
2
2
�
FR (r) = P (R < r) =
�
0
r
2
(3.212)
2
− r2
2α − σα2
σ
N dα = 1 − e
N .
e
2
σN
(3.213)
Figure 3.6 depicts typical PDFs and CDFs for Rayleigh-distributed variables.
Finally, the characteristic function is given by
�
�
�
�
�
2 t2
π σN − σN
σN t
4
√
φR (jt) = 1 −
te
erfi
−j ,
(3.214)
2 2
2
where erfi(x) = erf(jx) is the imaginary error function.
Rice distribution
If a signal is made up of the addition of a pure sinusoid and narrowband
noise, i.e. its complex baseband representation is
S(t) = C + A(t) + jB(t),
(3.215)
where C is a constant and N (t) = A(t) + jB(t) is the narrowband noise
(as characterized in the previous subsection), then the envelope R(t) follows a
Rice distribution.
139
Probability and stochastic processes
0.8
2 =1
σN
fR (r)
0.6
2 =2
σN
0.4
2 =4
σN
0.2
0
0
2
4
6
8
10
r
(a) Probability density function fR (r)
1
P (R ≤ r) = FR (r)
0.8
2 =1
σN
0.6
2 =2
σN
0.4
2 =3
σN
2 =4
σN
0.2
0
2 =6
σN
0
2
4
6
8
10
r
(b) Cumulative density function FR (r)
2
Figure
3.6.
` 4−π
´ The PDF and CDF of the Rayleigh distribution with various variances σR =
2
σN 4 .
We note that Re{S(t)} = C + A(t), Im{S(t)} = B(t) and
R(t) =
�
(C + A(t))2 + B(t)2
�
�
B(t)
Θ(t) = arctan
.
C + A(t)
(3.216)
(3.217)
The inverse transformation is given by
A(t) = R(t) cos Θ(t) − C
B(t) = R(t) sin Θ(t)
(3.218)
(3.219)
It follows that the Jacobian is identical to that found for the Rayleigh distribution, i.e. J = R.
140
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Therefore, we have
fR,Θ (r, θ) =
=
r −
2 e
πσN
(r cos θ−C)2 +r 2 sin2 θ
σ2
N
r −r
2 e
πσN
2 −2rC cos θ+C 2
σ2
N
(3.220)
,
and the marginal distribution fR (r) is given by
� 2π
2
θ+C 2
r − r −2rCσcos
2
N
fR (r) =
dθ
2 e
πσN
0
2
2 �
2π 2rC cos θ
r − r σ+C
2
2
N
=
e
e σN dθ,
2
πσN
0
(3.221)
(3.222)
where the remaining integral can be reduced (by virtue of the definition of the
modified Bessel function of the first kind) to yield
�
�
2
2
r − r σ+C
2rC
2
N 2I0
fR (r) = 2 e
u(r).
(3.223)
2
σN
σN
Finding the marginal PDF of Θ is a bit more challenging:
� ∞
2
θ+C 2
r − r −2rCσcos
2
N
fΘ (θ) =
e
dr
2
πσN
0
2
=
to
− C2
σ
N
e
2
πσN
�
∞
−r
re
2 −2rC cos θ
σ2
N
(3.224)
(3.225)
dr.
0
According to [Prudnikov et al., 1986a, 2.3.15-3], the above integral reduces
2
fΘ (θ) =
− C2
e
σ
N
2π
exp
�
C 2 cos2 θ
2
2σN
�
D−2
� √
2C cos θ
−
σN
�
(3.226)
,
where we can get rid of D−2 (z) (the parabolic cylinder function of order -2)
by virtue of the following identity [Prudnikov et al., 1986b, App. II.8]:
�
�
�
z2
π z2
z
D−2 (z) =
ze 4 erfc √
+ e− 4
(3.227)
2
2
to finally obtain
2
fΘ (θ) =
− C2
e
σ
N
2π
�√
πC cos(θ) C
e
σN
× [u(θ) − u(θ − 2π)] .
2 cos2 θ
σ2
N
�
C cos θ
erfc −
σN
�
+1
�
(3.228)
141
Probability and stochastic processes
In the derivation, we have assumed (without loss of generality and for analytical convenience) that the constant term was real where in fact, the constant
term can have a phase (Cejθ0 ). This does not change the Rice amplitude PDF,
but displaces the Rice phase distribution so that it is centered at θ0 . Therefore,
in general, we have
2
fΘ (θ) =
− C2
�
√
C 2 cos2 (θ−θ )
0
πC cos(θ − θ0 )
σ2
N
× (3.229)
1+
e
2π
σN
�
��
C cos (θ − θ0 )
erfc −
[u(θ) − u(θ − 2π)] .
σN
e
σ
N
It is noteworthy that the second raw moment (a quantity of some importance
since it reflects the average power of Rician process R(t)) can easily be found
without integration. Indeed, we have from (3.216)
� 2�
�
�
R
= (C + A)2 + B 2
�
�
= C 2 + 2AC + A2 + B 2
� � � �
= C 2 + A2 + B 2
2
= C 2 + σN
.
(3.230)
One important parameter associated with Rice distributions is the so-called
K factor which constitutes a measure of the importance of the random portion
of the variable (N (t) = A + jB) relative to the constant factor C. It is defined
simply as
C2
K= 2 .
(3.231)
σN
Figure (3.7) shows typical PDFs for R and Θ for various values of the K factor, as well as their geometric relation with the underlying complex
� Gaussian
�
variate. For the PDF of R, the K factor was varied while keeping R2 = 1.
Central chi-square distribution
Consider the r.v.
Y =
N
�
Xn2 ,
(3.232)
n=1
where {X1 , X2 , . . . , XN } form a set of i.i.d. zero-mean real Gaussian variates.
Then Y is a central chi-square variate with N degrees-of-freedom.
This distribution is best derived in the characteristic function domain. We
start with the case N = 1, i.e.
Y = X 2,
(3.233)
142
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
�{Z}
r
C
θ0
θ
Cejθ
0
�{Z}
0
(a) Geometry of the Rice amplitude and phase distributions in the complex
plane
1.75
1.75
1.5
K = 10
K=2
1.25
K=2
1
fR (r)
fΘ (θ)
K = 10
1.5
1.25
K=1
0.75
0.5
K=
0.25
1
0.25
0
−2
−1
0
1
θ − θ0 (radians)
2
K=1
0.75
0.5
1
16
0
−3
K=
1
16
3
0
0.5
1
1.5
2
2.5
r(m)
(b) Rice phase PDF
(d) Rice amplitude PDF
Figure 3.7. The Rice distributions and their relationship with the complex random variable
Z = (C cos(θ) + A) + j (C sin(θ) + B); (a) the distribution of Z in the complex plane (a
non-central complex Gaussian distribution pictured here with concentric isoprobability curves)
and relations with variables R and θ; (b) PDF of R; (c) PDF of Θ.
where the C. F. is simply
�
� �
2
ejtY = ejtX
� ∞
2
2
− x2
ejtx
2σ
√
=
e X dx
2πσX
−∞
«
�
� ∞ „
jt− 12 x2
2 1
2σ
X
=
e
dx.
π σX 0
φY (jt) =
�
(3.234)
143
�
�
The integral can be solved by applying the substitution u = − jt − 2σ12 x2
X
which leads to
� ∞ −u
1
1
e
�
√ du
φY (jt) = √
u
1
2πσX
0
−
jt
2
2σ
� ∞ −u
1
e
√ du.
= �
(3.235)
u
2 ) 0
π(1 − 2jtσX
Probability and stochastic processes
According to the definition of the Gamma function, the latter reduces to
� �
�
�
Γ 12
2 −1/2
φY (jt) = �
= 1 − 2jtσX
.
(3.236)
2 )
π(1 − 2jtσX
Going back now to the general case of N degrees of freedom, we find that
φY (jt) =
N
�
�
n=1
2
1 − 2jtσX
�−1/2
�
�
2 −N/2
= 1 − 2jtσX
.
(3.237)
The PDF can be obtained by taking the transform of the above, i.e.
� ∞
�
�
1
2 −N/2 −jty
fY (y) =
1 − 2jtσX
e
dt,
(3.238)
2π −∞
which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to
fY (y) = �
2
2σX
1
�N/2
2
Γ (N/2)
y N/2−1 e−y/2σX u(y).
(3.239)
The shape of the chi-square PDF for 1 to 10 degrees-of-freedom is shown
in Figure 3.8. It should be noted that for N = 2, the chi-square distribution
reduces to the simpler exponential distribution.
The chi-square distribution can also be derived from complex Gaussian
r.v.’s.
Let
N
�
Y =
|Zn |2 ,
(3.240)
n=1
where {Z1 , Z2 , . . . , ZN } are i.i.d. complex Gaussian variates with variance
σZ2 .
If we start again with the case N = 1, we find that the C.F. is
� �
� �
�
� �
�−1
2
2
2
, (3.241)
φY (jt) = ejtY = ejt|Z1 | = ejt(A +B ) = 1 − jtσZ2
144
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
2
N = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
fY (y)
1.5
1
0.5
0
0
4
2
6
8
10
y
Figure 3.8. The chi-square PDF for values of N between 1 and 10.
where A = �{Z} and B = �{Z}. Hence, is is a 2 degrees of freedom chi2 = 2σ 2 .
square variate which can be related to (3.237) by noting that σZ2 = 2σA
B
It follows that in the general case of N variates, we have
�
�−N
φY (jt) = 1 − jtσZ2
,
(3.242)
which naturally leads to
f y(y) =
1
y
(σZ2 )N Γ(N )
2
N −1 −y/σZ
e
u(y).
(3.243)
which is a 2N degrees-of-freedom X 2 variate. Equivalently, it can also be said
that this r.v. has N complex degrees-of-freedoms.
The k th raw moment is given by
� ∞
� �
1
2
k
Y
=
y N −1+k e−y/σZ dy
2
N
(σZ ) Γ(N )
0
� ∞
2
(σZ )N −1+k
=
uN −1+k e−u du
(σZ2 )N −1 Γ(N ) 0
=
(σZ2 )k
Γ(N + k).
Γ(N )
(3.244)
It follows that the mean is
µY = �Y � =
2
σX
Γ(N + 1) = N σZ2 ,
Γ(N )
(3.245)
145
Probability and stochastic processes
and the variance is given by
�
�
σY2 =
(Y − µY )2
� �
= Y 2 − �Y �2
σZ4 Γ(N + 2)
− N 2 σZ4
Γ(N )
= σZ4 (N + 1)N − N 2 σZ4 = σZ4 N.
=
(3.246)
Non-central chi-square distributions
The non-central chi-square distribution characterizes a sum of squared nonzero-mean Gaussian variates. Let
Y =
N
�
Xn2 ,
(3.247)
n=1
where {X1 , X2 , . . . , XN } form a set of independant real Gaussian r.v.’s with
2 and means {µ , µ , . . . , µ } and
equal variance σX
1 2
N
N
�
n=1
|µn | =
� 0,
(3.248)
i.e. at least one Gaussian r.v. has a non-zero mean.
As a first step in deriving the distribution, consider
Yn = Xn2 ,
(3.249)
where
−(x−µ )2
n
1
2
fXn (x) = √
e 2σX .
(3.250)
2πσX
√
According to (3.57) and since Xn = ± Yn , we find that
� −(√y−µ )2
�
√
−( y−µn )2
n
1
1
2σ 2
2σ 2
X
X
fYn (y) = √
e
+e
√
2 y
2πσX
�
�
√
√
µ2
yµn
yµ
− y2 − n2
− 2n
1
2σ
2σ
σ2
σ
√
=
√ e Xe X e X +e X
2 2πσX y
�√
�
µ2
− y2 − n2
yµn
1
2σ
2σ
= √
.
(3.251)
√ e X e X cosh
2
σX
2πσX y
146
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
To find the characteristic function, we start by expanding the hyperbolic
cosine into an infinite series:
�
�
φYn (jt) = ejtYn
� ∞
ejtα
2
2
2
√
=
√ e−α/2σX e−µn /2σX ×
2πσX α
0
∞
� (αµ2 )k � 1 �k
n
,
(3.252)
4
(2k)!
σX
k=0
Applying the identity (2k)! =
�
−
� e
ejtYn = √
µn
2σ 2
X
2σX
�
2k
2
√ Γ
π
k+
1
2
�
Γ(k + 1), the above becomes
� 4 �−k � ∞
∞
�
− α2
µ2k
n 4σX
jtα
k− 21
2σ
X α
�
�
e
e
dα,
1
Γ k + 2 k! 0
k=0
(3.253)
where the resulting integral can be solved in a manner similar to that of subsection 6 to give
�
−
�
ejtYn =
e
√
µn
2σ 2
X
2σX
∞
�
µ2k
2 )k+1/2
(2σX
2 t)k+1/2
k (1 − j2σX
n
k=0
2
e−µn /2σX
=
2 t)1/2
(1 − j2σX
e
�
�k
µ2
n /2
1−j2σ 2 t
X
jtµ2
n
1−j2σ 2 t
X
e
.
2 t)1/2
(1 − j2σX
=
1
2
4σX
(3.254)
In the general case, the C.F. is given by
�
φY (jt) = e
jtY
�
�
= e
jt
PN
n=1
Yn
Substituting (3.253) in (3.255) yields
φY (jt) =
N
�
�
=
N
�
�
ejtYn
n=1
�
(3.255)
jtµ2
n
1−j2tσ 2 t
X
e
2 t)1/2
(1 − jt2σX
n=1
jtU
2
=
where
e 1−jt2σX t
,
2 t)N/2
(1 − jt2σX
U=
N
�
n=1
µ2n .
(3.256)
(3.257)
147
Probability and stochastic processes
To find the corresponding PDF by taking the Fourier transform, we once
again resort to an infinite series expansion, i.e.
� ∞
1
fY (y) =
e−jyt φY (jt)dt
2π −∞
jtU
� ∞ −jyt 1−j2σ
2 t
X
1
e
e
=
dt
2 t)N/2
2π −∞ (1 − j2σX
2
�
U/2σX
1 −U/2σ2 ∞
e−jyt
1−j2σ 2 t
X
X dt
=
e
e
2 N/2
2π
−∞ (1 − j2σX t)
�
� 2 � �k
�
∞
U/ 2σX
1 −U/2σ2 � 1 ∞
e−jyt
X
=
e
dt
2 t)N/2 1 − j2σ 2 t
2π
k! −∞ (1 − j2σX
X
k=0
� 2�k� ∞
∞
]
1 −U/2σ2 � [U/ 2σX
e−jyt
X
=
e
dt
2 t)N/2+k
2π
k!
(1
−
j2σ
−∞
X
k=0
(3.258)
which, according to [Prudnikov et al., 1986a, 2.3.3-9], reduces to
� 2�k
2
∞
�
[U/
2σX ] y N/2+k−1 e−y/(2σX )
2
−U/2σX
fY (y) = e
u(y).
2 )N/2+k Γ( N + k)
k!
(2σX
2
k=0
(3.259)
We note, however, that
� 4�k k
∞
�
[U/ 4σX
] y
k=0
k!Γ( N2 + k)
= 0 F1
�
N
2
which leads directly to
y N/2−1 e−y/(2σX )
0 F1
2 )N/2
(2σX
2
2
−U/2σX
fY (y) = e
�
�
�
� Uy
�
� 4σ 4 ,
(3.260)
X
N
2
�
�
� Uy
�
� 4σ 4 .
(3.261)
X
From (3.261), it is useful to observe that
� 2�k
∞
�
�
− U2 � [U/ 2σX
]
2
2σ
fY (y) = e X
gN +2k y, σX
,
k!
(3.262)
k=0
2 ) is the PDF of a central chi-square variate with N + 2k
where gN +2k (y, σX
2 . This formulation of a non-central
degrees-of-freedom and a variance of σX
χ2 PDF as a mixture of central χ2 PDFs is often useful to extend results from
the central to the non-central case.
148
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Like in the central case, the non-central χ2 distribution can be defined from
complex normal r.v.’s. Given the sum of the moduli of N squared complex
normal r.v.’s with the same variance σZ2 and means {µ1 , µ2 , . . . , µN }, the PDF
is given by
fY (y) =
−
e
U
σ2
Z
y N −1
σZ2N
−
e
y
σ2
Z
0 F1
�
It follows that the corresponding C.F. is
�
�
�Uy
�
N � 4 u(y).
σZ
(3.263)
jtU
2
φY (jt) = �
e 1−jσZ t
1 − jσZ2 t
�N .
(3.264)
From (3.262) and (3.243), the kth raw moment is
�
Y
k
�
∞
�
[U/σZ2 ]m Γ (N + m + k)
= e
m!
Γ (N + m)
m=0
�
�
�
2
N + k �� U Γ(N + k)
−U/σZ
2k
= e
σ Z 1 F1
,
N � σZ2
Γ(N )
−
U
σ2
Z
σZ2k
which, by virtue of Kummer’s identity, becomes
�
�
�
� �
U Γ(N + k)
−k ��
k
2k
Y
= σ Z 1 F1
−
.
N � σ2
Γ(N )
(3.265)
(3.266)
Z
A most helpful fact at this point is that a hypergeometric function with a
single negative integer upper argument is in fact a hypergeometric polynomial,
i.e. it can be expressed as a finite sum. Indeed, we have:
�
�
� �
∞
Γ(−k + m)Γ(n1 )(−A)m
−k ��
F
−
A
=
,
(3.267)
1 1
n1 �
Γ(n1 + m)Γ(−k)m!
m=0
which, by virtue of the Gamma reflection formula, becomes
1 F1 (· · · )
∞
�
Γ(n1 )(−1)m Γ(1 + k)(−A)m
Γ(n1 + m)Γ(1 + k − m)m!
m=0
�
k �
�
Γ(n1 )Am
k
=
,
m Γ(n1 + m)
=
(3.268)
m=0
where the terms for m > k are zero because they are characterized by a
Gamma function at the denominator (Γ(1 + k − m)) with an integer argument which is zero or negative. Indeed, such a Gamma function has an infinite
magnitude.
149
Probability and stochastic processes
Therefore, we have
�
Y
k
�
=
�m
��
k �
�
U/σZ2
k
+ k)
.
m
m!
σZ2k Γ(N
(3.269)
m=0
From this finite sum representation, the mean is easily found to be
�
�
U
2
�Y � = σZ N 1 + 2
.
(3.270)
σZ N
Likewise, the second raw moment is
�
�
�
�
� 2�
2U
U2
4
Y = σZ (N + 1) N + 2 + 4 .
σZ
σZ
It follows that the variance is
σY2
�
Y 2 − �Y �2
�
�
2U
4
= σZ N + 2 .
σZ
=
�
(3.271)
(3.272)
Non-central F -distribution
Consider
Z1 /n1
,
(3.273)
Z2 /n2
where Z1 is a non-central χ2 variate with n1 complex degrees-of-freedom and
a non-centrality parameter U , and Z2 is a central χ2 variate with n2 complex
degrees-of-freedom, then Y is said to follow a non-central F -distribution parameterized on n1 , n2 and U .
Given the distributions
1
fZ1 (x) =
xn1 −1 e−x e−U 0 F1 (n1 |U x ) u(x),
(3.274)
Γ(n1 )
1
fZ2 (x) =
xn2 −1 e−x u(x),
(3.275)
Γ(n2 )
Y =
we apply two simple univariate transformations:
A1 =
Z1
,
n1
A2 =
n2
,
Z2
(3.276)
such that Y = A1 A2 .
It is straightforward to show that
fA1 (x) =
fA2 (x) =
nn1 1 n1 −1 −n1 x −U
x
e
e 0 F1 (n1 |U n1 x ) u(x),
Γ(n1 )
nn2 2 −n2 −1 − n2
x
e x u(x).
Γ(n2 )
(3.277)
(3.278)
150
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
It then becomes possible to compute the PDF of Y through the Mellin convolution, i.e.
�
�y�
1
fA1 (x) dx, y ≥ 0
x
x
0
� ∞ n2 � �n2 +1 n x n1
n1
x
n2
− 2
=
e y
xn1 −2 e−n1 x e−U ×
Γ(n2 ) y
Γ(n1 )
0
0 F1 (n1 |U n1 x ) dx
“
”
� ∞
n2
nn2 2 nn1 1 e−U
n1 +n2 −1 − y +n1 x
=
x
e
×
y n2 +1 Γ(n1 )Γ(n2 ) 0
(3.279)
0 F1 (n1 |U n1 x ) dx,
fY (y) =
∞
fA2
where the integral can be interpreted as the Laplace transform of xν 0 F1 (n |Kx )
(see property 2.79) and yields
fY (y) =
e−U Γ(n
�
+ n2 )
�
Γ(n1 )Γ(n2 )
1
n1
n2
1+
�n1
n1
n2 y
y n1 −1
�n1 +n2 1 F1
�
�
�
� n1
n1 + n2 � n2 U y
u(y).
�
n1
� 1 + nn12 y
(3.280)
It is noteworty that letting U = 0 in the above, it reduces to the density of a
central F-distribution, it being the ratio of two central χ2 variates, i.e.
�
1
fY (y) =
�
B(n1 , n2 )
n1
n2
1+
�n1
n1
n2 y
y n1 −1
�n1 +n2 u(y).
(3.281)
Examples of the shape of the F-distribution are shown in Figure 3.9.
The PDF (3.280) could also have been obtained by representing the PDF
of Z1 as an infinite series of central chi-square PDFs according to (3.262) and
proceeding to find the PDF of a ratio of central χ2 variates. Hence, we have:
−
fZ1 (z) = e
U
σ2
Z
�k
∞ �
�
U/σZ2
hn1 +k (z, σZ2 ),
k!
(3.282)
k=0
where we have included the variance parameter σZ2 (being the variance of the
underlying complex gaussian variates; it was implicitely equal to 1 in the preσ2
ceding development leading to (3.280)) and hn (z, σZ2 ) = g2n (z, 2Z ) is the
density in z of a central χ2 variate with n complex degrees-of-freedom and a
variance of the underlying complex Gaussian variates of σZ2 .
151
Probability and stochastic processes
0.6
0.5
fY (y)
0.4
0.3
U = 0, 1, 3, 5
0.2
0.1
0
0
4
2
6
8
10
y
Figure 3.9. The F-distribution PDF with n1 = n2 = 2 and various values of the non-centrality
parameter U .
Given that the PDF of Z2 is (3.275), we apply (for the sake of diversity with
respect to the preceding development) the following joint transformation:
Y
=
n2 Z1
,
n1 Z2
(3.283)
X = Z2 .
The Jacobian is J =
fY,X (y, x) =
=
n1 x
n2 .
n1 x
fZ
n2 1
It follows that the joint density of y and x is
�
n1 x − σU2
e Z
n2
�
n1
xy fZ2 (x)
n2
�m
∞ �
�
U/σ 2
Z
m=0
1
xn2 −1 e−x
Γ(n2 )
m!
hn1 +m
�
n1
xy, σZ2
n2
�
×
„
«
�
∞ �
n y
2 m
−x 1+ 1 2
− U2 � U/σZ
n1 x
n2 −1
n2 σ
σ
Z e
Z
=
x
e
×
m!
n2 Γ(n2 )σZ2
k=0
� n1 �n1 +m−1
1
n2 xy
u(x)u(y).
(3.284)
2
Γ (n1 + m)
σZ
152
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
The marginal PDF of Y is, as usual, obtained by integrating over X. In
this case, it is found convenient to exchange the order of the summation and
integration operations, i.e.
� ∞
fY (y) =
fY,X (y, x)dx
0
�
�m � n1 �n1 +m−1
∞
− U2 �
U/σZ2
n1
n2 y
σ
Z
×
=
e
2
m!Γ(n1 + m) σZ2
Γ(n2 + 1)σZ
m=0
«
„
� ∞
n y
−x 1+ 1 2
n2 σ
Z dx.
(3.285)
xn2 +n1 +m−1 e
0
�
�
Letting v = x 1 + nn1σy2 , we find that the remaining integrand follows the
2 Z
definition of the Gamma function. Hence, we have
„
«
� ∞
n y
−x 1+ 1 2
n2 σ
Z
I =
xn2 +n1 +m−1 e
0
� ∞
1
= �
v n1 +n2 +m−1 e−v dv
�n1 +n2 +m
n1 y
0
1 + n σ2
2 Z
=
�
Γ (n1 + n2 + m)
�n1 +n2 +m .
n1 y
1 + n σ2
(3.286)
2 Z
Substituting the above in (3.284), we obtain
fY (y) =
− U2
n1
σ
Z ×
e
(3.287)
Γ(n2 + 1)σZ2
�
�m
� n1 �n1 +m−1
∞
�
U/σZ2
Γ(n1 + n2 + m)
n2 y
u(y),
�
�n1 +n2 +m
σZ2
n y
m=0 m!Γ(n1 + m) 1 + 1 2
n σ
2 Z
where the resulting series defines a 1 F1 (·) (Kummer) confluent hypergeometric
function, thus leading to the compact representation
� �n1
�
�
�
− U2
n1
n1
�
σ
y n1 −1
U
y
n2
e Z u(y)
n1 + n2 �
n
.
fY (y) = 2n1
� 4 2 2 n1
�
�
1 F1
n1
� σZ + σZ n y
σZ B(n1 , n2 ) 1 + n1 y n1 +n2
2
2
n2 σZ
(3.288)
It is noteworthy that (3.287) can be expressed as a mixture of central F distributions, i.e.
�
�
�
��
∞ �
2 m
− U2 � U/σZ
m
(F )
σ
fY (y) = e Z
hn1 +m,n2 y, σZ2 1 +
,
(3.289)
m!
n1
m=0
153
Probability and stochastic processes
where
)
h(F
ν1 ,ν2
�
y, σF2
�
1
=
B (ν1 , ν2 )
�
ν1
ν2 σF2
�ν 1
�
1+
y ν1 −1
�ν1 +ν2 u(y),
ν1 y
2
ν 2 σF
(3.290)
is the PDF of a central F -distributed variate in y with ν1 and ν2 degrees-offreedom and where the numerator has a mean of σF2 1 .
The raw moments are best calculated based on the series representation
(3.287). Thus, after inverting the order of the summation and integration operations, we have:
� �n1
�
� � n �m
n1
2 m
1
∞
� �
U �
U/σ
Γ(n1 + n2 + m)
2
− 2
Z
n2
n2 σZ
k
σ
Z
Y
=
e
×
m!Γ(n1 + m)
Γ(n2 )σZ2n1
m=0
� ∞
y k+n1 +m−1
(3.291)
�
�n1 +n2 +m dy.
0
1 + nn1σy2
2 Z
Letting u =
n1
y
n2 σ 2
Z
n1 y
1+
n2 σ 2
Z
, we find that
y =
dy =
n2 σZ2 u
,
n1 (1 − u)
n1
(1 − u)2 du,
n2 σZ2
(3.292)
(3.293)
which leads to
I =
�
0
=
1
un1 +m+k−1
�
n1
2
n2 σZ
�
�−k−n1 −m
n1
2
n2 σZ
du
k+1−n2
(1 − u)
�−k−n1 −m
B (k + m + n1 , −k + n2 )
,
n2 > k.
Substituting (3.293) in (3.290), we find
� 2 �k
n2 σZ
�
�m
∞
� �
− U2 �
U/σZ2
n1
k
σ
=
Y
e Z
×
Γ(n2 )
m!Γ(n1 + m)
(3.294)
m=0
Γ (k + m + n1 ) Γ (−k + n2 ) ,
1 The mean of the numerator is indeed ν
1
n2 > k,
(3.295)
times the variance of the underlying Gaussians by virtue of (3.245)
154
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
which also has a compact representation similar to (3.288):
�
2
n2 σZ
n1
�k
−
U
σ2
Z
�
�
k + n1 �� U
Y
=
.
1 F1
n1 � σZ2
Γ(n1 )Γ(n2 )
(3.296)
According to Kummer’s identity, the above can be transformed as follows:
�
k
�
�
e
�
Γ(k + n1 )Γ(n2 − k)
n1
2
n2 σZ
�n1 +k−n2
�
�
�
�
U
−k ��
−
Y
=
.
1 F1
n1 � σZ2
Γ(n1 )Γ(n2 )
(3.297)
By virtue of (3.267) and (3.268), a hypergeometric function with an upper
negative integer argument (and no lower negative integer argument) is representable as a finite sum. Thus, we have
�
k
�
Yk
=
�
�
Γ(k + n1 )Γ(n2 − k)
2
n2 σZ
n1
�k
k �
�
k
m
m=0
Γ(k + n1 )Γ(n2 − k)
Γ(n2 )
� �m
U
�
2
σZ
×
.
(3.298)
n2 > 1.
(3.299)
Γ(n1 + m)
It follows that the mean is
�Y � =
�
n2 σZ2 n1 +
U
2
σZ
n1 (n2 − 1)
�
,
In the same fashion, we can show that the second raw moment is
��
�
�
� 2 � n22 σZ4
1
U
U2
Y =
2 2 + n1 (n1 + 1) + 4 .
n21 (n2 − 1)(n2 − 2)
σZ
σZ
(3.300)
It follows that the variance is
�

�2
U




n
+
2
4
1
2
n2 σ Z
U
σZ
2
σY = 2
+ 2 2 + n1 .
(3.301)

n1 (n2 − 1)(n2 − 2) 
σZ
 n2 − 1

Beta distribution
Let Y1 and Y2 be two independent central chi-square random variables with
n1 and n2 complex degrees-of-freedom, respectively and where the variance of
the underlying complex Gaussians is the same and is equal to σZ2 . Furthermore,
155
Probability and stochastic processes
we impose that
A2 = Y1 + Y2 ,
BA2 = Y1 .
(3.302)
(3.303)
Then A and B are independent and B follows a beta distribution with PDF
fB (b) =
Γ(n1 + n2 ) n1 −1
b
(1 − b)n2 −1 [u(b) − u(b − 1)] .
Γ(n1 )Γ(n2 )
(3.304)
As a first step in deriving this PDF, consider the following bivariate transformation:
C1 = Y1 + Y2 ,
C2 = Y1 ,
(3.305)
(3.306)
where it can be verified that the corresponding Jacobian is 1.
Since Y1 and Y2 are independent, their joint PDF is
fY1 ,Y2 (y1 , y2 ) =
−
y1n1 −1 y2n2 −1 e
2(n1 +n2 )
σZ
y1 +y2
σ2
Z
Γ(n1 )Γ(n2 )
u(y1 )u(y2 ),
(3.307)
which directly leads to
fC1 ,C2 (c1 , c2 ) =
−
cn2 1 −1 (c1 − c2 )n2 −1 e
2(n1 +n2 )
σZ
c1
σ2
Z
Γ(n1 )Γ(n2 )
u(c1 )u(c2 ).
(3.308)
Consider now a second bivariate transformation:
T
BT
= C1 ,
= C2 ,
(3.309)
(3.310)
where the Jacobian is simply t.
We find that
−
t
2
Γ(n1 + n2 ) n1 −1
tn1 +n2 −1 e σZ
fB,T (b, t) =
b
(1 − b)n2 −1 2(n +n )
u(b)u(t).
Γ(n1 )Γ(n2 )
σZ 1 2 Γ(n1 + n2 )
(3.311)
Since this joint distribution factors, B and T are independent and B is said
to follow a beta distribution with n1 and n2 complex degrees-of-freedom and
PDF given by
fB (b) =
Γ(n1 + n2 ) n1 −1
b
(1 − b)n2 −1 u(b),
Γ(n1 )Γ(n2 )
(3.312)
156
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
5
[2, 1], [3, 1], [4, 1], [5, 1]
4
fB (b)
[4, 2]
3
[4, 4]
[3, 2]
[2, 2]
2
1
0
0
0.2
0.4
0.6
0.8
1
b
Figure 3.10. The beta PDF for various integer combinations of [n1 , n2 ].
while T follows a central chi-square distribution with n1 +n2 complex degreesof-freedom and an underlying variance of σZ2 .
Representative instances of the beta PDF are shown in Figure 3.10.
The CDF of a beta-distributed variate is given by
Γ(n1 + n2 )
FB (x) = P (B < x) =
Γ(n1 )Γ(n2 )
�
0
x
bn1 −1 (1 − b)n2 −1 db.
(3.313)
Applying integration by parts, we obtain
�
�
�
Γ(n1 + n2 ) xn1 (1 − x)n2 −1 n2 − 1 x n1
n2 −2
FB (x) =
+
b (1 − b)
db ,
Γ(n1 )Γ(n2 )
n1
n1
0
(3.314)
where it can be observed that the remaining integral is of the same form as the
original except that the exponent of b has been incremented while the exponent
of (1 − b) has been decremented. It follows that integration by parts can be
applied iteratively until the last integral term contains only a power of b and is
157
Probability and stochastic processes
thus expressible in closed form:
�
Γ(n1 + n2 ) xn1 (1 − x)n2 −1
FB (x) =
+
Γ(n1 )Γ(n2 )
n1
�
n2 − 1 xn1 −1 (1 − x)n2 −2
+ ···
n1
n1 + 1
�
n2 − 2 xn1 +n2 −1 (1 − x)
··· +
+
n1 + 1
n1 + n2 − 2
�� �
xn1 +n2 −1
··· .
(n1 + n2 − 1)(n1 + n2 )
(3.315)
By inspection, a pattern can be identified in the above, thus leading to the
following finite series expression:
FB (x) = xn1 Γ (n1 + n2 + 1)
n�
2 −1
m=0
(1 − x)n2 −1−m xm
.
Γ(n1 + m + 1)Γ(n2 − m)
(3.316)
However, the above development is valid only if the degrees-of-freedom are
integers. But the PDF exists even if that is not the case. In general, we have:
� �
�
Γ(n1 + n2 )
n
1 − n2 ��
n1
FB (x) =
x 2 F1
(3.317)
�x .
1 + n1
Γ(n1 + 1)Γ(n2 )
The kth raw moment is given by
�
� �
Γ(n1 + n2 ) 1 k+n1 −1
k
B
=
b
(1 − b)n2 −1 db
Γ(n1 )Γ(n2 ) 0
Γ(n1 + n2 )Γ(k + n1 )
=
.
Γ(n1 )Γ(k + n1 + n2 )
(3.318)
Therefore, the mean is given by
�B� =
the second raw moment by
� 2�
B =
and the variance by
2
σB
=
=
=
�
n1
,
n1 + n2
n1 (1 + n1 )
,
(n1 + n2 )(1 + n1 + n2 )
(3.319)
(3.320)
�
B 2 − �B�2
n1 (1 + n1 )
n21
−
(n1 + n2 )(1 + n1 + n2 ) (n1 + n2 )2
n1 (1 + n1 − n1 n2 )
.
(n1 + n2 )2 (1 + n1 + n2 )
(3.321)
158
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Nakagami-m distribution
The Nakagami-m distribution was originally proposed by Nakagami [Nakagami, 1960] to model a wide range of multipath fading behaviors. Joining
the Rayleigh and Rice PDFs, the Nakagami-m distribution is one of the three
most encountered models for multipath fading (see chapter 4 for more details).
Unlike the other two, however, the Nakagami PDF was obtained through empirical fitting with measured RF data. It is of interest that the Nakagami-m
distribution is very flexible, being controlled by its m parameter. Thus, it includes the Rayleigh PDF as a special case, and can be made to approximate
closely the Rician PDF.
The Nakagami-m PDF is given by
2 � m �m 2m−1 −m y
y
e Ω u(y),
fY (y) =
(3.322)
Γ(m) Ω
where m is the distribution’s parameter which takes on any real value between
1
2 and ∞, and Ω is the second raw moment, i.e.
� �
Ω= Y2 .
(3.323)
In general, the kth raw moment of Y is given by
� � Γ � k + m� � Ω �k/2
2
Yk =
.
Γ(m)
m
Likewise, the variance is given by
� �
σY2 = Y 2 − �Y �2
�
�
��
1 Γ2 12 + m
= Ω 1−
.
m Γ2 (m)
(3.324)
(3.325)
� 1 The
� Nakagami distribution reduces to a Rayleigh PDF if m = 1. If m ∈
,
1
, distributions which are more spread out (i.e. have longer tails) than
2
Rayleigh result. In fact, a half-Gaussian distribution is obtained with the minimum value m = 12 . Values of m above 1 result in distributions with a more
compact support than the Rayleigh PDF. The distribution is illustrated in Figure 3.11 for various values of m.
It was found in [Nakagami, 1960] that a very good approximation of the
Rician PDF can be obtained by relating the Rice factor K and the Nakagami
parameter m as follows:
K=
m2 − m
√
,
m − m2 − m
m > 1,
(3.326)
159
Probability and stochastic processes
with the inverse relationship being
m=
(K + 1)2
K2
=1+
.
2K + 1
2K + 1
fY (y)
2
(3.327)
m = 12 , 34 , 1, 32 , 2, 52 , 3
1.5
1
0.5
0
0
1
2
3
4
5
6
y
Figure 3.11. The Nakagami distribution for various values of the parameter m with Ω = 1.
Lognormal distribution
The lognormal distribution, while being analytically awkward to use, is
highly important in wireless communications because it characterizes very
well the shadowing phenomenon which impacts outdoor wireless links. Interestingly, other variates which follow the lognormal distribution include the
weight and blood pressure of humans, and the number of words in the sentences of the works of George Bernard Shaw!
When only path loss and shadowing are taken into account (see chapter 4),
it is known that the received power Ω(dB) at the end of a wireless transmission
over a certain distance approximately follows a noncentral normal distribution,
i.e.
�
�
1
(p − µP )2
fΩ(dB) (p) = √
exp −
,
(3.328)
2σP2
2πσΩ
where
Ω(dB) = 10 log10 (P ).
(3.329)
160
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
After an appropriate transformation of r.v.’s, the lognormal distribution of P
is found to be
�
�
η
(10 log10 (x) − µP )2
fP (x) = √
exp −
u(x),
(3.330)
2σP2
2πσΩ x
10
where η = ln(10)
.
This density, however, is very difficult to integrate. Nonetheless, the kth raw
moment can be found by using variable substitution to revert to the Gaussian
form in the integrand:
� ∞
� �
k
P
=
xk fP (x)dx
0
�
�
� ∞
η
(10 log10 (p) − µP )2
k−1
√
dx
=
x
exp −
2σP2
2πσΩ 0
�
�
� ∞
1
(y − µP )2
= √
10ky/10 exp −
dy
2σP2
2πσΩ −∞
�
�
� ∞
1
(y − µP )2
ky/η
= √
e
exp −
dy
2σP2
2πσΩ −∞
� 2
�
� ∞
η (z − µP /η)2
η
kz
= √
e exp −
dz
(3.331)
2σP2
2πσΩ −∞
which, by virtue of [Prudnikov et al., 1986a, 2.3.15-11], yields
� 2
�
� �
1 σP 2 µP
k
P = exp
k +
k .
2 η2
η
It follows that the variance is given by
� �
σP2 = P 2 − �P �2
� 2
�
� 2
�
2σP
2µP
µP
2 σP
= exp
+
− exp
+
η2
η
2η 2
η
� 2
�
� 2
�
2σP
σP
2µP
2µP
= exp
+
− exp 2 +
η2
η
η
η
�
��
� 2�
�
2
σ
σ
2µP
= exp
+ P2
exp P2 − 1 .
η
η
η
6.
(3.332)
(3.333)
Multivariate statistics
Significant portions of this section follow the developments in the first few
chapters of [Muirhead, 1982] with additions, omissions and alterations to suit
our purposes. One notable divergence is the emphasis on complex quantities.
161
Probability and stochastic processes
Random vectors
Given an M × M random vector x, its mean is defined by the vector


�x1 �
 �x2 � 


µ = �x� =  ..  .
(3.334)
 . 
�xM �
The central second moments of x, analogous to the variance of a scalar
variate, are defined by the covariance matrix:
�
�
,M
H
Σx = [σmn ]m=1,···
(3.335)
n=1,··· ,M = (x − µ) (x − µ) ,
where
σmn = �(xm − µm )(xn − µn )∗ � .
(3.336)
Lemma 3.1. An M ×M matrix Σx is a covariance matrix iff it is non-negative
definite (Σ ≥ 0).
Proof. Consider the variance of aH x where a is considered constant. We have:
�
�
Var(aH x) =
aH (x − µx ) (x − µx )H a
�
�
= aH (x − µx ) (x − µx )H a
= aH Σx a.
(3.337)
� H
�2
The quantity �a (x − µx )� is by definition always non-negative. The variance above is the expectation of this expression and is therefore also nonnegative. It follows that the quadratic form aH Σx a ≥ 0, proving that Σx
is non-negative definite.
The above does not rule out the possibility that aH Σx a = 0 for some vector
a. This is only possible if there is some linear combination of the elements of
x, defined by vector a, which always yields 0. This is in turn only possible
if all elements of x are fully defined by x1 , i.e. there is only a single random
degree of freedom. In such a case, the complex vector x is constrained to a
fixed 2D hyperplane in R2M . If x is real, then it is constrained to a line in RM .
Multivariate normal distribution
Definition 3.40. An M × 1 vector has an M -variate real Gaussian (normal)
distribution if the distribution of aT x is Gaussian for all a in RM where a �= 0.
Theorem 3.2. Given an M × 1 vector x whose elements are real Gaussian
variates with mean
µx = �x� ,
(3.338)
162
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
and covariance matrix
�
�
Σx = (x − µ) (x − µ)T ,
(3.339)
then x is said to follow a multivariate Gaussian distribution, denoted x ∼
NM (µ, Σ), and the associated PDF is
�
�
1
1
−M
−
T
−1
fx (x) = (2π) 2 det(Σ) 2 exp − (x − µ) Σ (x − µ) .
(3.340)
2
Proof. Given a vector u where the elements are zero-mean i.i.d Gaussian variates with u ∼ N (0, I). We have the transformation
x = Au + µ,
(3.341)
which implies that
Σx =
�
(x − µ)(x − µ)T
�
�
= (Au)(Au)T
�
�
= A uuT AT
�
= AAT .
(3.342)
Assuming that A is nonsingular, the inverse transformation is
u = B(x − µ),
where B = A−1 .
The Jacobian is
J

∂u1
∂x1
···
..
.
···
..
.

= det 
∂uM
∂x1
= det(B) = det(A
(3.343)
∂u1
∂xM
..
.
∂uM
∂xM
−1
)



1
= det(A)−1 = det(Σ)− 2 .
(3.344)
Given that
fu (u) =
M
�
1
1
2
(2π)− 2 e− 2 um
m=1
−M
2
= (2π)
M
= (2π)− 2
�
M
1� 2
exp −
um
2
m=1
�
�
1
exp − uT u .
2
�
(3.345)
163
Probability and stochastic processes
Applying the transformation, we get
�
�
� −1 �H −1
1
1
−M
−
T
fx (x) = (2π) 2 det(Σ) 2 exp − (x − µ) A
A (x − µ) ,
2
(3.346)
�
�H
which, noting that Σ−1 = A−1 A−1 , constitutes the sought-after result.
Theorem 3.3. Given an M ×1 vector x whose elements are complex Gaussian
variates with mean u and covariance matrix
�
�
Σ = xxH ,
(3.347)
then x is said to follow a complex multivariate Gaussian distribution and this
is denoted x ∼ CN M (u, Σ). The associated PDF is
�
�
fx (x) = (π)−M det(Σ)−1 exp −(x − u)H Σ−1 (x − u) .
(3.348)
The proof is almost identical to the real case and is left as an exercise.
Theorem 3.4. If x ∼ CN M (u, Σ) and A is a fixed P × M matrix and b is a
fixed K × 1 vector, then
�
�
y = Bx + b ∼ CN P Bu + b, BΣBH .
(3.349)
Proof. From definition 3.40, it is clear that any linear combination of the elements of a Gaussian vector, real or complex, is a univariate Gaussian r.v. It
directly follows that y is a multivariate normal vector. Furthermore, we have
�y� =
=
=
� H�
yy
=
�AX + b�
A �X� + b
Au + b
(3.350)
�
�
(Ax + b − (Au + b)) (Ax + b − (Au + b))H
�
�
=
(A(x − u)) (A(x − u))H
�
�
= A (x − u)(x − u)H AH
= AΣAH ,
(3.351)
which concludes the proof.
Theorem 3.5. Consider x ∼ CN M (u, Σ) and the division of x, its mean
vector, and covariance matrix as follows:
�
�
�
�
�
�
x1
u1
Σ11 0
x=
u=
Σ=
,
(3.352)
x2
u2
0 Σ22
164
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where x1 is P × 1 and has mean vector u1 and covariance matrix Σ11 , x2 is
(N − P ) × 1 and corresponds to mean vector u2 and covariance matrix Σ22 .
Then x1 ∼ CN P (u1 , Σ11 ) and x2 ∼ CN M −P (u2 , Σ22 ). Furthermore, x1
and x2 are not only uncorrelated, they are also independent.
Proof. To prove this theorem, it suffices to introduce the partitioning of x into
x1 and x2 into the PDF of x. Hence:
� �
�
�
��
��−1
�H
x1
Σ11 0
(x1 − u1 )
−M
fx
= (π) det
exp −
×
x2
0 Σ22
(x2 − u2 )
�
�−1 �
��
Σ11 0
(x1 − u1 )H
0 Σ22
(x2 − u2 )H
= (π)−M det (Σ11 )−1 det (Σ22 )−1 ×
�
�
exp (x1 − u1 )H Σ−1
11 (x1 − u1 ) ×
�
�
exp (x2 − u2 )H Σ−1
22 (x2 − u2 )
= fx1 (x1 )fx2 (x2 ).
(3.353)
Since the density factors, independence is established. Furthermore, the
PDFs of x1 and x2 have the expected forms, thus completing the proof.
The preceding theorem establishes a very important property of Gaussian
variates: if two (or more) Gaussian r.v.’s are uncorrelated, they are also automatically independant. While it may seem counterintuitive, this is not in
general true for arbitrarily-distributed variables.
Theorem 3.6. Consider x ∼ CN M (u, Σ) and the division of x, its mean
vector, and covariance matrix as follows:
�
�
�
�
�
�
Σ11 Σ12
x1
u1
Σ=
,
(3.354)
x=
u=
Σ21 Σ22
x2
u2
where x1 is P × 1 and has mean vector u1 and covariance matrix Σ11 , x2 is
(N − P ) × 1 and corresponds to mean vector u2 and covariance matrix Σ22 .
Then
+
y = x − Σ12 Σ+
22 x2 ∼ CN P (u1 − Σ12 Σ22 u2 ), Σ11.2 ),
(3.355)
where Σ11.2 = Σ11 − Σ12 Σ+
22 Σ21 , and y is independent from x2 . Furthermore, the conditional distribution of x1 given x2 is also complex Gaussian,
i.e.
�
�
(x1 |x2 ) ∼ CN P u1 + Σ12 Σ+
(3.356)
22 (x2 − u2 ), Σ11.2 .
Proof. To simplify the proof, it will be assumed that Σ22 is positive definite,
−1
thus implying that Σ+
22 = Σ22 . For a more general proof, see [Muirhead,
1982, theorem 1.2.11].
165
Probability and stochastic processes
Defining the matrix
B=
�
theorem 3.4 indicates that
y = Bx =
is a Gaussian vector with mean
v = �y� =
�
IP
0
−Σ12 Σ−1
22
IN −P
�
x1 − Σ12 Σ−1
22 x2
x2
�
,
(3.358)
�
u1 − Σ12 Σ−1
22 u2
u2
�
,
(3.359)
(3.357)
,
and covariance matrix
�
�
BΣBH = (y − v)(y − v)H
�
�
��
��
IP
0
Σ11 Σ12
IP −Σ12 Σ−1
22
=
Σ21 Σ22
−Σ−1
0
IN −P
22 Σ21 IN −P
�
�
Σ11.2 0
=
.
(3.360)
0
Σ22
Random matrices
Given an M × N matrix A = [a1 a2 · · · aN ], where
�am � = 0
�
am aH
= Σ, m = 1, . . . , M
m
�
�
H
am an
= 0, m �= n.
�
To fully characterize the covariances of a random matrix, it is necessary to
vectorize it, i.e.


Σ 0 ··· 0
 0 Σ ··· 0 
�
�


vec(A)vec(A)H =  .. .. . .
. 
 . .
. .. 
0 0
= IN ⊗ Σ,
··· Σ
(3.361)
where the above matrix has dimensions M N × M N and IN is the N × N
identity matrix.
It follows that if Σ = IM , the overall covariance is IN ⊗ IM = IM N .
166
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
Given the transformation B = RAS, where P × M matrix R and N × Q
matrix S are fixed, we have
�B� = R �A� S.
(3.362)
From lemma (2.3), we have
�
�
vec(B) = SH ⊗ R vec(B),
which implies
and
�
�
�
�vec(B)� = SH ⊗ R �vec(B)� ,
vec(B)vec(B)H
�
(3.363)
(3.364)
�
��
�
�H �
SH ⊗ R vec(A) SH ⊗ B vec(A)
�� H
�
�
��
=
S ⊗ R vec(A)vec(A)H S ⊗ RH
�
�
�
�
= SH ⊗ R (IN ⊗ Σ) S ⊗ RH ,
(3.365)
=
��
where property 2.60 was applied to move from the first to the second line.
Applying property 2.61 twice, we finally get
�
�
Σvec(B) = vec(B)vec(B)H = SH S ⊗ RΣRH .
(3.366)
It follows that, if Σ = IM , the overall covariance matrix can be conveniently
expressed as
Σvec(B) = SH S ⊗ RRH ,
(3.367)
where RRH is the column covariance matrix and SH S is the line covariance
matrix.
Gaussian matrices
Theorem 3.7. Given a P × Q complex Gaussian matrix X such that X ∼
CN (M, C ⊗ D) where C is a Q × Q positive definite matrix, D is a P × P
positive definite matrix, and �X� = M, then the PDF of X is
fX (X) =
1
π P Q (det(C))P (det(D))Q
×
�
�
etr −C−1 (X − M)H D−1 (X − M) ,
(3.368)
X > 0,
where etr(X) = exp(tr(X)).
Proof. Given the random vector x = vec(X) with mean m = �vec(X)�, we
know from theorem 3.2 that its PDF is
�
�
1
fx (x) = P Q
exp −(x − m)H (C ⊗ D)−1 (x − m) .
π det (C ⊗ D)
(3.369)
167
Probability and stochastic processes
Equivalence of the above with the PDF stated in the theorem is demonstrated
first by observing that
det(C ⊗ D) = (det(C))P (det(D))Q ,
(3.370)
which is a consequence of property 2.64.
Second, we have
�
�
(x−m)H (C ⊗ D)−1 (x−m) = (x−m)H C−1 ⊗ D−1 (x−m), (3.371)
which, according to lemma 2.4(c), becomes
�
�
�
�
(x − m)H C−1 ⊗ D−1 (x − m) = tr C−1 (X − M)H D−1 (X − M) ,
(3.372)
which completes the proof.
7.
Transformations, Jacobians and exterior products
Given an M × 1 vector x which follows a PDF fx (x), we introduce a transformation y = g(x), thus finding that
where the Jacobian is
fy (y) = fx (g −1 (y)) |J(x → y)| ,


J(x → y) = det 
∂x1
∂y1
..
.
∂xM
∂y1
···
..
.
···
∂x1
∂yM
..
.
∂xM
∂yM
(3.373)


.
(3.374)
The above definition for the Jacobian, based on a determinant of partial
derivatives, certainly works and was used in deriving the multivariate normal
distribution. However, this definition is cumbersome for large numbers of variables.
We will illustrate an alternative approach based on the multiple integral below. Our outline follows [James, 1954] as interpreted in [Muirhead, 1982].
We have
�
I=
fx (x1 , . . . , xM )dx1 . . . dxM ,
(3.375)
S
M
R
where S is a subset of
and I is the probability that vector x takes on a
value in subset S.
Given a one-to-one invertible transformation
y = g(x),
the integral becomes
I=
�
S†
fx (g −1 (x)) |J(x → y| dy1 . . . dyM ,
(3.376)
(3.377)
168
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where S † is the image of S under the transformation, i.e. S † = g(S).
We wish to find an alternative expression of dx1 · · · dxM as a function of
dy1 · · · dyM . Such an expression can be derived from the differential forms
dxm =
∂xm
∂xm
∂xm
dy1 +
dy2 + · · · +
dyM .
∂y1
∂y2
∂yM
(3.378)
These can be substituted directly in (3.377). Consider for example the case
where M = 2. We get
�
��
�
�
∂x1
∂x1
∂x2
∂x2
−1
I=
fx (g (y))
dy1 +
dy2
dy1 +
dy2 .
∂y1
∂y2
∂y1
∂y2
S†
(3.379)
The problem at hand is to find a means of carrying out the product of the
two differential form in order to fall back to the Jacobian. If the multiplication
is carried out in the conventional fashion, we find
�
��
�
∂x1
∂x1
∂x2
∂x2
dy1 +
dy2
dy1 +
dy2 =
∂y1
∂y2
∂y1
∂y2
∂x1 ∂x2
∂x1 ∂x2
∂x1 ∂x2
dy1 dy1 +
dy1 dy2 +
dy2 dy1 +
∂y1 ∂y1
∂y1 ∂y2
∂y2 ∂y1
∂x1 ∂x2
dy2 dy2 .
(3.380)
∂y2 ∂y2
Comparing the above to the Jacobian
�
J(x → y)dy1 dy2 = det
∂x1
∂y1
∂x2
∂y1
∂x1
∂y2
∂x2
∂y2
�
=
�
∂x1 ∂x2 ∂x1 ∂x2
−
∂y1 ∂y2
∂y2 ∂y1
�
,
(3.381)
we find that the two expressions ((3.379) and (3.381)) can only be reconciled
if we impose non-standard multiplication rules and use a noncommutative, alternating product where
dym dyn = −dyn dym .
(3.382)
This implies that dym dym = −dym dym = 0. This product is termed the
exterior product and denoted by the symbol ∧. According to this wedge
product, the product (3.379) becomes
�
�
∂x1 ∂x2 ∂x1 ∂x2
dx1 dx2 =
−
dy1 ∧ dy2 .
(3.383)
∂y1 ∂y2
∂y2 ∂y1
Theorem 3.8. Given two N × 1 real random vectors x and y, as well as the
transformation y = Ax where A is a nonsingular M × M matrix, we have
169
Probability and stochastic processes
dy = Adx and
M
�
dym = det(A)
m=1
M
�
dxm .
m=1
Proof. Given the properties of the exterior product, it is clear that
M
�
dym = p(A)
m=1
M
�
(3.384)
dxm ,
m=1
where p(A) is a polynomial in the elements of A.
Indeed, the elements of A are the coefficients of the elements of x and as
such are extracted by the partial derivatives (see e.g. (3.383)).
The following outstanding properties of p(A) can be observed:
(i) If any row of A is multiplied by a scalar factor α, then p(A) is likewise
increased by α.
(ii) If the positions of two variates yr and ys are reversed, then the positions of
dyr and dys are also reversed in the exterior product, leading to a change
of sign (by virtue of (3.382). This, however, is equivalent to interchanging
rows r and s in A. It follows that interchanging two rows of A reverses the
sign of p(A).
(iii) If A = I, then p(I) = 1 since this corresponds to the identity transformation.
However, these three properties correspond to properties 2.29-2.31 of determinants. In fact, this set of properties is sufficiently restrictive to define determinants; they actually correspond to Weierstrass’ axiomatic definition of determinants [Knobloch, 1994]. Therefore, we must have p(A) = det(A).
Rewriting (3.375) using exterior product notation, we have
�
I=
fx (x1 , . . . , xM )dx1 ∧ dx2 . . . ∧ dxM ,
(3.385)
S
From (3.378), we know that


dx = 
∂x1
∂y1
..
.
∂xM
∂y1
···
..
.
···
∂x1
∂yM
..
.
∂xM
∂yM


.
Therefore, theorem 3.8 implies that
��
� M
�
M
�
∂xm m=1,··· ,M �
dxm = det
dym ,
∂yn n=1,··· ,M
m=1
m=1
(3.386)
(3.387)
170
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
where the Jacobian is the magnitude of the right-hand side determinant.
In general, we have
M
�
m=1
dxm = J(x → y)
M
�
dym .
(3.388)
m=1
For an M × N real, arbitrary random matrix X, we have in general


dx11
···
dx1N

..
..  ,
..
dX = 
(3.389)
.
.
. 
dxM 1 · · · dxM N
which is the matrix of differentials, and we denote
(dX) =
N �
M
�
(3.390)
dxmn ,
n=1 m=1
the exterior product of the differentials.
If X is M × M and symmetric, it really has only 12 M (M + 1) distinct
elements (the diagonal and above-diagonal elements). Therefore, we define
(dX) =
M �
n
�
dxmn =
n=1 m=1
�
dxmn .
(3.391)
m≤n
In the same fashion, if X is M × M and skew-symmetric, then there are
only 12 M (M − 1) (above-diagonal) elements and
(dX) =
M n−1
�
�
n=1 m=1
dxmn =
�
dxmn .
(3.392)
m<n
It should be noted that a transformation on a complex matrix is best treated by decomposing it into a real and imaginary part, i.e. X = \Re\{X\} + j\Im\{X\} = X_r + jX_i, which naturally leads to
(dX) = (dX_r)(dX_i),   (3.393)
since exterior product rules eliminate any cross-terms. We offer no detailed
proof, but the proof of theorem (3.16) provides some insight into this matter.
It is also noteworthy that
d(AB) = A\,dB + dA\,B.   (3.394)
Theorem 3.9. Given two M × 1 complex random vectors u and v, as well as the transformation v = Au where A is a nonsingular M × M matrix, we have dv = A\,du and
(dv) = |\det(A)|^2 (du).
Proof. Given the vector version of (3.393), we have
(dv) = (d\Re\{v\})(d\Im\{v\}),   (3.395)
and, applying theorem 3.8, we obtain
(d\Re\{v\}) = |\det(A)|\,(d\Re\{u\}), \qquad (d\Im\{v\}) = |\det(A)|\,(d\Im\{u\}),   (3.396)
which naturally leads to
(dv) = (d\Re\{v\})(d\Im\{v\}) = |\det(A)|^2 (d\Re\{u\})(d\Im\{u\}) = |\det(A)|^2 (du).   (3.397)
Selected Jacobians
What follows is a selection of Jacobians of random matrix transformations
which will help to support forthcoming derivations. For convenience and clarity of notation, it is the inverse transformations that will be given.
Portions of this section follow [Muirhead, 1982, chapter 2] and [Ratnarajah
et al., 2004] (for complex matrices).
Theorem 3.10. If X = BY where X and Y are N × M real random matrices and B is a fixed positive definite N × N matrix, then
(dX) = |\det(B)|^M (dY),   (3.398)
which implies that
J(X \rightarrow Y) = |\det(B)|^M.   (3.399)
Proof. The equation X = BY implies dX = B\,dY. Letting
dX = [dx_1, \cdots, dx_M], \qquad dY = [dy_1, \cdots, dy_M],
we find
dx_m = B\,dy_m, \qquad m = 1, \cdots, M,   (3.400)
which, by virtue of theorem 3.8, implies that
\prod_{n=1}^{N} dx_{nm} = \det(B) \prod_{n=1}^{N} dy_{nm}, \qquad m = 1, \cdots, M.   (3.401)
Therefore, we have
(dX) = \prod_{m=1}^{M} \prod_{n=1}^{N} dx_{nm}
     = \prod_{m=1}^{M} \det(B) \prod_{n=1}^{N} dy_{nm}
     = \det(B)^M \prod_{m=1}^{M} \prod_{n=1}^{N} dy_{nm}
     = \det(B)^M (dY).
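A quick numerical cross-check of theorem 3.10 (my own sketch, with an arbitrary random B and arbitrary dimensions): vectorizing X = BY column by column turns the transformation into vec(X) = (I_M ⊗ B) vec(Y), whose Jacobian determinant should therefore equal det(B)^M.

import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 3
B = rng.standard_normal((N, N))

# vec(BY) = (I_M kron B) vec(Y) for column-stacking vec(), so the Jacobian
# of the map Y -> X = BY is |det(I_M kron B)| = |det(B)|^M.
J = np.kron(np.eye(M), B)

lhs = abs(np.linalg.det(J))
rhs = abs(np.linalg.det(B)) ** M
print(lhs, rhs)                      # identical up to round-off
assert np.isclose(lhs, rhs)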
Theorem 3.11. If X = BYB^T where X and Y are M × M symmetric real random matrices and B is a non-singular fixed M × M matrix, then
(dX) = |\det(B)|^{M+1} (dY).   (3.402)
Proof. The equation X = BYB^T implies dX = B\,dY\,B^T. It follows that
(dX) = (B\,dY\,B^T) = p(B)(dY),   (3.403)
where p(B) is a polynomial in the elements of B. Furthermore, it can be shown that if
B = B_1 B_2,   (3.404)
then p(B) = p(B_1)p(B_2). Indeed, we have
p(B)(dY) = (B_1 B_2\, dY\, (B_1 B_2)^T) = (B_1 B_2\, dY\, B_2^T B_1^T) = p(B_1)(B_2\, dY\, B_2^T) = p(B_1)p(B_2)(dY),   (3.405)
where (3.403) was applied twice.
It turns out that the only polynomials in the elements of B that can be factorized as above are the powers of det(B) (see prop. 2.40). Therefore, we have
p(B) = (\det(B))^k,
where k is some integer.
We can isolate k by letting B = diag(β, 1, \cdots, 1) such that
B\,dY\,B^T = \begin{pmatrix} \beta^2\,dy_{11} & \beta\,dy_{12} & \cdots & \beta\,dy_{1M} \\ \beta\,dy_{12} & dy_{22} & \cdots & dy_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \beta\,dy_{1M} & dy_{2M} & \cdots & dy_{MM} \end{pmatrix}.   (3.406)
It follows that the exterior product of the distinct elements (diagonal and above-diagonal elements) of dX is
(dX) = (B\,dY\,B^T) = \beta^{M+1}(dY).
Given that p(B) = \beta^{M+1} = \det(B)^{M+1}, the proof is complete.
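The factorization argument above can also be checked numerically. The sketch below (an illustration under my own conventions, not the author's code) builds the matrix of the linear map Y ↦ BYB^T restricted to the M(M+1)/2 distinct coordinates of a symmetric Y and compares its determinant with |det(B)|^{M+1}, as claimed in (3.402).

import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(2)
M = 4
B = rng.standard_normal((M, M))

idx = list(combinations_with_replacement(range(M), 2))  # (i, j) with i <= j

def vech(S):
    # Stack the distinct entries s_ij, i <= j, of a symmetric matrix.
    return np.array([S[i, j] for i, j in idx])

# Build the matrix of the linear map Y -> B Y B^T acting on the
# M(M+1)/2 distinct coordinates of a symmetric Y.
cols = []
for i, j in idx:
    E = np.zeros((M, M))
    E[i, j] = E[j, i] = 1.0
    cols.append(vech(B @ E @ B.T))
L = np.column_stack(cols)

print(abs(np.linalg.det(L)))             # Jacobian of the transformation
print(abs(np.linalg.det(B)) ** (M + 1))  # |det(B)|^(M+1), as in (3.402)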
Theorem 3.12. If X = BYB^T where X and Y are M × M skew-symmetric real random matrices and B is a non-singular fixed M × M matrix, then
(dX) = |\det(B)|^{M-1} (dY).   (3.407)
Proof. The proof follows that of theorem 3.11 except that by definition, we
take the exterior product of the above diagonal elements only in (3.406) since
X and Y are skew-symmetric.
Theorem 3.13. If X = BYB^H where X and Y are M × M positive definite complex random matrices and B is a non-singular fixed M × M matrix, then
(dX) = |\det(B)|^{2M} (dY).   (3.408)
Proof. The proof follows from the fact that the real and imaginary parts of a Hermitian matrix are symmetric and skew-symmetric, respectively. It follows, by virtue of theorems 3.11 and 3.12, that
(dX_r) = |\det(B)|^{M+1} (dY_r),   (3.409)
(dX_i) = |\det(B)|^{M-1} (dY_i).   (3.410)
It follows that
(dX) = (dX_r)(dX_i) = |\det(B)|^{2M} (dY_r)(dY_i) = |\det(B)|^{2M} (dY).
Theorem 3.14. If X = Y^{-1} where X and Y are M × M complex nonsingular random matrices and Y is Hermitian, then
(dX) = \det(Y)^{-2M} (dY).   (3.411)
Proof. Since XY = I, applying (3.394) we have
d(XY) = X\,dY + dX\cdot Y = dI = 0.   (3.412)
Therefore,
dX\cdot Y = -X\,dY,   (3.413)
dX = -X\,dY\cdot Y^{-1} = -Y^{-1}\,dY\cdot Y^{-1},   (3.414)
which implies that
(dX) = (Y^{-1}\,dY\cdot Y^{-1}) = \det(Y)^{-2M} (dY),
by virtue of theorem 3.13.
Theorem 3.15. If A is a real M × M positive definite random matrix, there
exists a decomposition A = TT T (Cholesky decomposition) where T is upper
triangular. Furthermore, we have
(dA) = 2^M \prod_{m=1}^{M} t_{mm}^{M-m+1} (dT),   (3.415)
where tmm is the mth element of the diagonal of T.
Proof. The decomposition has the form
\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1M} \\ a_{12} & a_{22} & \cdots & a_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1M} & a_{2M} & \cdots & a_{MM} \end{pmatrix} = \begin{pmatrix} t_{11} & 0 & \cdots & 0 \\ t_{12} & t_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ t_{1M} & t_{2M} & \cdots & t_{MM} \end{pmatrix} \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1M} \\ 0 & t_{22} & \cdots & t_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & t_{MM} \end{pmatrix}.   (3.416)
We proceed by expressing each distinct element of A as a function of the
elements of T and then taking their respective differentials. Thus, we have
a_{11} = t_{11}^2,
a_{12} = t_{11} t_{12},
\;\;\vdots
a_{1M} = t_{11} t_{1M},
a_{22} = t_{12}^2 + t_{22}^2,
\;\;\vdots
a_{2M} = t_{12} t_{1M} + t_{22} t_{2M},
\;\;\vdots
a_{MM} = t_{1M}^2 + \cdots + t_{MM}^2,
where we note that
da11 = 2t11 dt11 ,
da12 = t11 dt12 + t12 dt11 ,
..
.
da1M = t11 dt1M + t1M dt11 .
When taking the exterior product of the above differentials, the second term in da_{12}, \cdots, da_{1M} disappears since dt_{11}\,dt_{11} = 0 (according to the product rules), so that we have
\prod_{n=1}^{M} da_{1n} = 2\,t_{11}^{M} \prod_{n=1}^{M} dt_{1n}.   (3.417)
In the same manner, we find that
\prod_{n=m}^{M} da_{mn} = 2\,t_{mm}^{M-m+1} \prod_{n=m}^{M} dt_{mn},
thus leading naturally to
(dA) = \prod_{m \leq n} da_{mn} = \prod_{m=1}^{M} \prod_{n=m}^{M} da_{mn}
     = 2^M \prod_{m=1}^{M} t_{mm}^{M-m+1} \prod_{m \leq n} dt_{mn}
     = 2^M \prod_{m=1}^{M} t_{mm}^{M-m+1} (dT).   (3.418)
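Since the map T ↦ A = T^T T is polynomial, the Jacobian in (3.415) can also be verified numerically. The sketch below (an illustrative check, not part of the text; the helper names are mine) forms the numerical Jacobian of the map from the distinct entries of an upper-triangular T to the distinct entries of A by central differences and compares its determinant with 2^M ∏_m t_mm^{M−m+1}.

import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(3)
M = 3
idx = list(combinations_with_replacement(range(M), 2))  # (i, j) with i <= j
P = len(idx)                                            # M(M+1)/2 parameters

def T_from_params(p):
    # Upper-triangular T with entries t_ij (i <= j) read from the vector p.
    T = np.zeros((M, M))
    for k, (i, j) in enumerate(idx):
        T[i, j] = p[k]
    return T

def a_from_params(p):
    # Distinct entries a_ij (i <= j) of A = T^T T.
    T = T_from_params(p)
    A = T.T @ T
    return np.array([A[i, j] for i, j in idx])

p0 = rng.random(P) + 0.5           # random T with positive diagonal entries
eps = 1e-6
J = np.zeros((P, P))
for k in range(P):                 # central differences (exact here: A is quadratic in T)
    dp = np.zeros(P)
    dp[k] = eps
    J[:, k] = (a_from_params(p0 + dp) - a_from_params(p0 - dp)) / (2 * eps)

T0 = T_from_params(p0)
predicted = 2**M * np.prod([T0[m, m]**(M - m) for m in range(M)])
print(abs(np.linalg.det(J)))       # numerical Jacobian of T -> A
print(predicted)                   # 2^M prod_m t_mm^{M-m+1}, eq. (3.415)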
Theorem 3.16. If A is a complex M × M positive definite random matrix,
there exists a decomposition A = TH T (Cholesky decomposition, def. 2.53)
where T is upper triangular. Furthermore, we have
(dA) = 2^M \prod_{m=1}^{M} t_{mm}^{2M-2m+1} (dT),   (3.419)
where tmm is the mth element of the diagonal of T.
Proof. Letting a_{mn} = \alpha_{mn} + j\beta_{mn} and t_{mn} = \tau_{mn} + j\mu_{mn}, and in accordance with def. 2.53 (where the diagonal imaginary elements \mu_{mm} are set to zero), we have
\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1M} \\ \alpha_{12} & \alpha_{22} & \cdots & \alpha_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{1M} & \alpha_{2M} & \cdots & \alpha_{MM} \end{pmatrix} = \begin{pmatrix} \tau_{11}^2 & \tau_{11}\tau_{12} & \cdots & \tau_{11}\tau_{1M} \\ \tau_{11}\tau_{12} & \tau_{12}^2 + \mu_{12}^2 + \tau_{22}^2 & \cdots & \cdots \\ \vdots & \vdots & \ddots & \vdots \\ \tau_{11}\tau_{1M} & \cdots & \cdots & \tau_{MM}^2 + \cdots \end{pmatrix}   (3.420)
and
\begin{pmatrix} 0 & \beta_{12} & \cdots & \beta_{1M} \\ -\beta_{12} & 0 & \cdots & \beta_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ -\beta_{1M} & -\beta_{2M} & \cdots & 0 \end{pmatrix} = \begin{pmatrix} 0 & \tau_{11}\mu_{12} & \cdots & \tau_{11}\mu_{1M} \\ -\tau_{11}\mu_{12} & 0 & \cdots & \tau_{12}\mu_{1M} + \tau_{22}\mu_{2M} - \mu_{12}\tau_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ -\tau_{11}\mu_{1M} & \cdots & \cdots & 0 \end{pmatrix}.   (3.421)
Taking the differentials of the diagonal and the upper-diagonal elements of
the real part of A, we have
dα11 = 2τ11 dτ11
dα12 = τ11 dτ12
..
.
dαM M = 2τM M dτM M .
Likewise, the imaginary part yields
dβ12 = τ11 dµ12
dβ13 = τ11 dµ13
..
.
dβ1M = τ11 dµ1M .
dβM −1,M = τM −1,M −1 dµM −1,M + · · ·
It is noteworthy that the differential expressions above do not contain all the
terms they should. This is because they are meant to be multiplied together
using exterior product rules (so that repeated differentials yield 0) and redundant terms have been removed. Hence, only differentials in τmn were taken
into account for the real part, and only differentials in µmn were kept for the
imaginary part. Likewise, the first appearance of a given differential form precludes its reappearance in successive expressions. For example, dτ11 appears
in the expression for dα11 so that all terms with dτ11 are omitted in successive
expressions dα12 , . . . dαM M .
Hence, we have
(dA) = (d\Re\{A\})(d\Im\{A\}) = \left(\prod_{m \leq n} d\alpha_{mn}\right)\left(\prod_{m < n} d\beta_{mn}\right)
     = 2^M\, t_{11}^{M} t_{22}^{M-1} \cdots t_{MM} \; t_{11}^{M-1} t_{22}^{M-2} \cdots t_{M-1,M-1} \left(\prod_{m \leq n} d\tau_{mn}\right)\left(\prod_{m < n} d\mu_{mn}\right)
     = 2^M \prod_{m=1}^{M} t_{mm}^{2M-2m+1} (d\Re\{T\})(d\Im\{T\})
     = 2^M \prod_{m=1}^{M} t_{mm}^{2M-2m+1} (dT).   (3.422)
Theorem 3.17. Given a complex N × M matrix Z of full rank, where N ≥ M, defined as the product Z = U_1 T where U_1 is an N × M unitary matrix such that U_1^H U_1 = I_M, and T is an M × M upper triangular matrix (QR decomposition), then we have
(dZ) = \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT)\left(U_1^H dU_1\right).   (3.423)
Proof. We postulate a matrix U2 , being a function of U1 , of dimensions N ×
(N − M ) such that
U = [ U1 | U2 ] = [u1 , u2 , · · · , uN ] ,
(3.424)
is a unitary N × N matrix.
Furthermore, we have
\left(U_1^H dU_1\right) = \prod_{m=1}^{M} \prod_{n=m}^{N} u_n^H du_m.   (3.425)
We note that, according to the chain rule,
dZ = dU_1\, T + U_1\, dT,   (3.426)
and
U^H dZ = \begin{pmatrix} U_1^H \\ U_2^H \end{pmatrix} dZ = \begin{pmatrix} U_1^H dU_1\, T + U_1^H U_1\, dT \\ U_2^H dU_1\, T + U_2^H U_1\, dT \end{pmatrix},
which, noting that U_1^H U_1 = I_M and U_2^H U_1 = 0, reduces to
U^H dZ = \begin{pmatrix} U_1^H dU_1\, T + dT \\ U_2^H dU_1\, T \end{pmatrix}.   (3.427)
The exterior product of the left-hand side of (3.427) is given by
\left(U^H dZ\right) = |\det(U)|^{2M} (dZ) = (dZ),   (3.428)
by virtue of theorem 3.10, (3.393) and the fact that U is unitary.
Considering now the lower part of the right-hand side of (3.427), the mth row of U_2^H dU_1\, T is
\left[ u_m^H du_1, \cdots, u_m^H du_M \right] T, \qquad m \in [M+1, N].   (3.429)
Applying theorem 3.9, the exterior product of the elements in this row is
|\det(T)|^2 \prod_{n=1}^{M} u_m^H du_n,   (3.430)
and the exterior product of all elements of U_2^H dU_1\, T is
\left(U_2^H dU_1\, T\right) = \prod_{m=M+1}^{N} \left( |\det(T)|^2 \prod_{n=1}^{M} u_m^H du_n \right)   (3.431)
= |\det(T)|^{2(N-M)} \prod_{m=M+1}^{N} \prod_{n=1}^{M} u_m^H du_n.   (3.432)
We now turn our attention to the upper part of the right-hand side of (3.427), the latter being U_1^H dU_1\, T + dT. Since U_1 is unitary, we have U_1^H U_1 = I_M. Taking the differential of the above yields U_1^H dU_1 + dU_1^H\, U_1 = 0, which implies that
U_1^H dU_1 = -dU_1^H\, U_1 = -\left( U_1^H dU_1 \right)^H.   (3.433)
For the above to hold, U_1^H dU_1 must be skew-Hermitian, i.e. its real part is skew-symmetric and its imaginary part is symmetric.
It follows that the real part of U_1^H dU_1 can be written as
\Re\{U_1^H dU_1\} = \begin{pmatrix} 0 & -\Re\{u_2^H du_1\} & \cdots & -\Re\{u_M^H du_1\} \\ \Re\{u_2^H du_1\} & 0 & \cdots & -\Re\{u_M^H du_2\} \\ \vdots & \vdots & \ddots & \vdots \\ \Re\{u_M^H du_1\} & \Re\{u_M^H du_2\} & \cdots & 0 \end{pmatrix}.   (3.434)
Postmultiplying the above by T and retaining only the terms contributing to the exterior product, we find that the subdiagonal elements are
\begin{pmatrix} 0 & \cdots & \cdots & \cdots & \cdots \\ \Re\{u_2^H du_1\}t_{11} & \cdots & \cdots & \cdots & \cdots \\ \Re\{u_3^H du_1\}t_{11} & \Re\{u_3^H du_2\}t_{22} & \cdots & \cdots & \cdots \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \Re\{u_M^H du_1\}t_{11} & \Re\{u_M^H du_2\}t_{22} & \cdots & \Re\{u_M^H du_{M-1}\}t_{M-1,M-1} & \cdots \end{pmatrix}.   (3.435)
Thus, we find that the exterior product of the subdiagonal elements of \Re\{(U_1^H dU_1)T\} is given by
t_{11}^{M-1} \prod_{m=2}^{M} \Re\{u_m^H du_1\} \; t_{22}^{M-2} \prod_{m=3}^{M} \Re\{u_m^H du_2\} \cdots t_{M-1,M-1}\, \Re\{u_M^H du_{M-1}\}
= \left( \prod_{m=1}^{M} t_{mm}^{M-m} \right) \prod_{m=1}^{M} \prod_{n=m+1}^{M} \Re\{u_n^H du_m\}.   (3.436)
In the same fashion, we find that the exterior product of the diagonal and subdiagonal elements of \Im\{(U_1^H dU_1)T\} is
\left( \prod_{m=1}^{M} t_{mm}^{M-m+1} \right) \prod_{m=1}^{M} \prod_{n=m}^{M} \Im\{u_n^H du_m\}.   (3.437)
Clearly, the above-diagonal and diagonal elements of \Re\{(U_1^H dU_1)T\} contribute nothing to the exterior product since they all involve an element of dU_1 and all such elements already appear in the subdiagonal portion. The same argument applies to the above-diagonal elements of \Im\{(U_1^H dU_1)T\}. Furthermore, it can be verified that the inclusion of dT in U_1^H dU_1\, T + dT amounts to multiplying the exterior product by
(dT) = \prod_{m \leq n} dt_{mn}.   (3.438)
Multiplying (3.432), (3.436), (3.437) and (3.438), we finally find that
(dZ) = \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT)\left(U_1^H dU_1\right).   (3.439)
Theorem 3.18. Given a real N × M matrix X of full rank, where N ≥ M, defined as the product X = H_1 T where H_1 is an N × M orthogonal matrix such that H_1^H H_1 = I_M, and T is an M × M real upper triangular matrix (QR decomposition), then we have
(dX) = \prod_{m=1}^{M} t_{mm}^{N-m} (dT)\left(H_1^H dH_1\right),   (3.440)
where
\left(H_1^H dH_1\right) = \prod_{m=1}^{M} \prod_{n=m+1}^{N} h_n^H dh_m.   (3.441)
The proof is left as an exercise.
Multivariate Gamma function
Definition 3.41. The multivariate Gamma function is a generalization of the
Gamma function and is given by the multivariate integral
\Gamma_M(a) = \int_{A>0} \mathrm{etr}(-A)\,\det(A)^{a-(M+1)/2}\,(dA),   (3.442)
where the integral is carried out over the space of real positive definite (symmetric) matrices.
It is a point of interest that it can be verified that Γ1 (a) = Γ(a).
Definition 3.42. The complex multivariate Gamma function is defined as
\tilde{\Gamma}_M(a) = \int_{A>0} \mathrm{etr}(-A)\,\det(A)^{a-M}\,(dA),   (3.443)
where the integral is carried out over the space of Hermitian matrices.
Theorem 3.19. The complex multivariate Gamma function can be computed
in terms of standard Gamma functions by virtue of
\tilde{\Gamma}_M(a) = \pi^{M(M-1)/2} \prod_{m=1}^{M} \Gamma(a-m+1).   (3.444)
Proof. Starting from definition 3.42, we apply the transformation A = T^H T to obtain (by virtue of theorem 3.16)
\tilde{\Gamma}_M(a) = 2^M \int_{T^H T > 0} \mathrm{etr}\left(-T^H T\right) \det(T^H T)^{a-M} \prod_{m=1}^{M} t_{mm}^{2M-2m+1}\,(dT),   (3.445)
where we note that a triangular matrix has its eigenvalues along its diagonal.
Therefore,
\det(T^H T) = \prod_{m=1}^{M} t_{mm}^2,   (3.446)
and
\mathrm{tr}\left(-T^H T\right) = -\sum_{m \leq n} |t_{mn}|^2.   (3.447)
Using (3.446) and (3.447), (3.445) can be rewritten in decoupled form, i.e.
\tilde{\Gamma}_M(a) = 2^M \int_{T^H T>0} \exp\left(-\sum_{m \leq n} |t_{mn}|^2\right) \prod_{m=1}^{M} t_{mm}^{2a-2m+1} \prod_{m \leq n} dt_{mn}
= 2^M \prod_{m<n} \left( \int_{-\infty}^{\infty} e^{-\Re\{t_{mn}\}^2}\, d\Re\{t_{mn}\} \right) \prod_{m<n} \left( \int_{-\infty}^{\infty} e^{-\Im\{t_{mn}\}^2}\, d\Im\{t_{mn}\} \right) \prod_{m=1}^{M} \left( \int_{0}^{\infty} e^{-t_{mm}^2}\, t_{mm}^{2a-2m+1}\, dt_{mm} \right).   (3.448)
Knowing that
\int_{-\infty}^{\infty} e^{-t^2}\, dt = \sqrt{\pi},   (3.449)
and applying the substitution u_m = t_{mm}^2, we find
\tilde{\Gamma}_M(a) = \pi^{M(M-1)/2} \prod_{m=1}^{M} \left( \int_{0}^{\infty} e^{-u_m} u_m^{a-m}\, du_m \right),   (3.450)
where the remaining integrals correspond to standard Gamma functions, thus yielding
\tilde{\Gamma}_M(a) = \pi^{M(M-1)/2} \prod_{m=1}^{M} \Gamma(a-m+1).   (3.451)
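For computations, (3.444) is most conveniently evaluated in the log domain. The following Python sketch (the function name ln_cmv_gamma and the example parameters are my own, not from the text) implements log Γ̃_M(a) from the product form, checks that it reduces to the ordinary log-Gamma function for M = 1, and shows the kind of log normalizing constant that appears in the Wishart densities discussed later in this section.

import numpy as np
from scipy.special import gammaln

def ln_cmv_gamma(a, M):
    # log of the complex multivariate Gamma function, eq. (3.444):
    # Gamma~_M(a) = pi^{M(M-1)/2} * prod_{m=1}^{M} Gamma(a - m + 1).
    m = np.arange(1, M + 1)
    return 0.5 * M * (M - 1) * np.log(np.pi) + gammaln(a - m + 1).sum()

# Sanity check: for M = 1 the definition reduces to the ordinary Gamma function.
print(np.isclose(ln_cmv_gamma(5.0, 1), gammaln(5.0)))   # True

# Example use: log[ Gamma~_M(N) |det(Sigma)|^N ] for an arbitrary Sigma,
# i.e. the kind of normalizing constant used with the Wishart density below.
M, N = 3, 6
Sigma = np.eye(M)
log_norm = ln_cmv_gamma(N, M) + N * np.linalg.slogdet(Sigma)[1]
print(log_norm)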
Stiefel manifold
Consider the matrix H_1 in theorem 3.18. It is an N × M matrix with orthonormal columns. The space of all such matrices is called the Stiefel manifold and it is denoted V_{M,N}. Mathematically, this is stated as
V_{M,N} = \left\{ H_1 \in \mathbb{R}^{N \times M};\; H_1^H H_1 = I_M \right\}.   (3.452)
However, the complex counterpart of this concept will be found more useful
in the study of space-time multidimensional communication systems.
Definition 3.43. The N × M complex Stiefel manifold is the space spanned by all N × M unitary matrices (denoted U_1) and it is denoted \tilde{V}_{M,N}, i.e.
\tilde{V}_{M,N} = \left\{ U_1 \in \mathbb{C}^{N \times M};\; U_1^H U_1 = I_M \right\}.
Theorem 3.20. The volume of the complex Stiefel manifold is
\mathrm{Vol}\left(\tilde{V}_{M,N}\right) = \int_{\tilde{V}_{M,N}} \left(U_1^H dU_1\right) = \frac{2^M \pi^{MN}}{\tilde{\Gamma}_M(N)}.
Proof. If the real parts and imaginary parts of all the elements of U_1 are treated as individual coordinates, then a given instance of U_1 defines a point in 2MN-dimensional Euclidean space.
Furthermore, the constraining equation U_1^H U_1 = I_M can be decomposed into its real and imaginary part. Since U_1^H U_1 is necessarily Hermitian, the real part is symmetric and leads to \tfrac{1}{2}M(M+1) constraints on the elements of U_1. The imaginary part is skew-symmetric and thus leads to \tfrac{1}{2}M(M-1) constraints. It follows that there are M^2 constraints on the position of the point in 2MN-dimensional Euclidean space corresponding to U_1. Hence, the point lies on a (2MN - M^2)-dimensional surface in 2MN space.
Moreover, the constraining equation implies
\sum_{m=1}^{M} \sum_{n=1}^{N} |u_{mn}|^2 = M.   (3.453)
Geometrically, this means that the said surface is a portion of a sphere of radius \sqrt{M}.
Consider an N × M complex Gaussian matrix X with N ≥ M such that X ∼ CN(0, I_N ⊗ I_M). By virtue of theorem 3.7, its density is
f_X(X) = \pi^{-NM}\, \mathrm{etr}\left(-X^H X\right).   (3.454)
Since a density function integrates to 1, it is clear that
\int_X \mathrm{etr}\left(-X^H X\right) (dX) = \pi^{MN}.   (3.455)
Applying the transformation X = U_1 T, where U_1 \in \tilde{V}_{M,N} and T is upper triangular with positive diagonal elements (since it is nonsingular in accordance with the definition of the QR decomposition), we have
\mathrm{tr}\left(X^H X\right) = \mathrm{tr}\left(T^H T\right) = \sum_{m \leq n} |t_{mn}|^2
and
(dX) = \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT)\left(U_1^H dU_1\right).   (3.456)
Then, eq. (3.455) becomes
\int_{T^H T>0} \exp\left(-\sum_{m \leq n} |t_{mn}|^2\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT) \int_{\tilde{V}_{M,N}} \left(U_1^H dU_1\right) = \pi^{MN}.   (3.457)
An integral having the same form as the integral above over the elements of T was solved in the proof of theorem 3.19. Applying this result, we find that
\int_{T^H T>0} \exp\left(-\sum_{m \leq n} |t_{mn}|^2\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT) = \frac{\tilde{\Gamma}_M(N)}{2^M}.   (3.458)
From the above and (3.455), it is obvious that
\mathrm{Vol}\left(\tilde{V}_{M,N}\right) = \frac{2^M \pi^{MN}}{\tilde{\Gamma}_M(N)}.   (3.459)
Wishart matrices
As a preamble to this subsection, we introduce yet another theorem related
to the complex multivariate Gamma function.
Theorem 3.21. Given an Hermitian M × M matrix C and a scalar a such that \Re\{a\} > M - 1, then
\int_{A>0} \mathrm{etr}\left(-C^{-1}A\right) \det(A)^{a-M} (dA) = \tilde{\Gamma}_M(a)\,\det(C)^a,
where integration is carried out over the space of Hermitian matrices.
Proof. Applying the transformation A = C^{1/2}BC^{1/2}, where C^{1/2} is the positive square root (e.g. obtained through eigenanalysis and taking the square root of the eigenvalues) of C, theorem 3.13 implies that (dA) = \det(C)^M (dB). Hence, the integral becomes
I = \int_{A>0} \mathrm{etr}\left(-C^{-1}A\right) \det(A)^{a-M} (dA)
  = \int_{B>0} \mathrm{etr}\left(-C^{-1}C^{1/2}BC^{1/2}\right) \det(B)^{a-M} (dB)\, \det(C)^{a-M+M}
  = \int_{B>0} \mathrm{etr}\left(-B\right) \det(B)^{a-M} (dB)\, \det(C)^{a}.   (3.460)
Thus, the transformed integral now coincides with the definition of \tilde{\Gamma}_M(a), leading us directly to
I = \tilde{\Gamma}_M(a)\,\det(C)^a.   (3.461)
From the above proof, it is easy to deduce that
f_A(A) = \frac{1}{\tilde{\Gamma}_M(a)\,\det(C)^a}\, \mathrm{etr}\left(-C^{-1}A\right) \det(A)^{a-M},   (3.462)
is a multivariate PDF since it integrates to 1 and is positive given the constraint A > 0. This is an instance of the Wishart distribution, as detailed hereafter.
Definition 3.44. Given an N × M complex Gaussian matrix Z ∼ CN(0, I_N ⊗ Σ) where N ≥ M, its PDF is given by
f_Z(Z) = \frac{1}{\pi^{MN} |\det(\Sigma)|^{N}}\, \mathrm{etr}\left(-\Sigma^{-1} Z^H Z\right);
then A = Z^H Z is of size M × M and follows a complex Wishart distribution with N degrees of freedom, denoted A ∼ CW_M(N, Σ).
Unless stated otherwise, it will be assumed in the following that the number of degrees of freedom N is greater than or equal to the Wishart matrix dimension M, i.e. that the Wishart matrix is nonsingular.
Theorem 3.22. If A ∼ CW_M(N, Σ), then its PDF is given by
f_A(A) = \frac{\mathrm{etr}\left(-\Sigma^{-1}A\right) |\det(A)|^{N-M}}{\tilde{\Gamma}_M(N)\, |\det(\Sigma)|^{N}}.
Proof. Given that A = Z^H Z in accordance with definition 3.44, we wish to apply a transformation Z = U_1 T where U_1 is N × M and unitary (i.e. U_1^H U_1 = I_M) and T is M × M and upper triangular.
From the PDF of Z given in definition 3.44 and theorem 3.17, we have
f_{U_1,T}(U_1, T) = \frac{1}{\pi^{MN} |\det(\Sigma)|^{N}}\, \mathrm{etr}\left(-\Sigma^{-1} T^H T\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT)\left(U_1^H dU_1\right),   (3.463)
where we note that A = Z^H Z = T^H U_1^H U_1 T = T^H T and U_1 can be removed by integration, i.e.
f_T(T) = \frac{1}{\pi^{MN} |\det(\Sigma)|^{N}}\, \mathrm{etr}\left(-\Sigma^{-1} T^H T\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT) \int_{\tilde{V}_{M,N}} \left(U_1^H dU_1\right),   (3.464)
which, according to theorem 3.20, yields
f_T(T) = \frac{2^M}{\tilde{\Gamma}_M(N)\, |\det(\Sigma)|^{N}}\, \mathrm{etr}\left(-\Sigma^{-1} T^H T\right) \prod_{m=1}^{M} t_{mm}^{2N-2m+1} (dT).   (3.465)
Applying a second transformation A = T^H T and by virtue of theorem 3.16, we find that
f_A(A) = \frac{1}{\tilde{\Gamma}_M(N)\, |\det(\Sigma)|^{N}}\, \mathrm{etr}\left(-\Sigma^{-1} A\right) \prod_{m=1}^{M} t_{mm}^{2(N-M)} (dA),   (3.466)
where \prod_{m=1}^{M} t_{mm}^{2(N-M)} = |\det(T)|^{2(N-M)} = |\det(A)|^{N-M}, which concludes the proof.
While the density above was derived assuming that N is an integer, the
distribution exists for non-integer values of N as well.
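As a simple sanity check of definition 3.44 and of the derivation above (a Monte Carlo sketch with arbitrary Σ, M, N of my own choosing, not part of the text), one can generate A = Z^H Z from rows drawn as CN(0, Σ) and confirm that the sample mean approaches NΣ, the first moment of the complex Wishart distribution.

import numpy as np

rng = np.random.default_rng(4)
M, N, trials = 2, 5, 20_000

# An arbitrary positive definite Sigma and its Cholesky factor.
Sigma = np.array([[2.0, 0.6 + 0.3j],
                  [0.6 - 0.3j, 1.0]])
L = np.linalg.cholesky(Sigma)

acc = np.zeros((M, M), dtype=complex)
for _ in range(trials):
    # Z is N x M with i.i.d. rows ~ CN(0, Sigma): unit-variance complex entries
    # (real and imaginary parts with variance 1/2 each) colored by L.
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    acc += Z.conj().T @ Z          # A = Z^H Z ~ CW_M(N, Sigma)

print(np.round(acc / trials, 2))   # empirical mean, should approach N * Sigma
print(N * Sigma)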
Theorem 3.23. If A ∼ CW_M(N, Σ), then its characteristic function is
\phi_A(j\Theta) = \left\langle \mathrm{etr}\left(j\Theta A\right) \right\rangle = \det\left(I - j\Theta\Sigma\right)^{-N},
where
\Theta = \begin{pmatrix} 2t_{11} & t_{12} & \cdots & t_{1M} \\ t_{21} & 2t_{22} & \cdots & t_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ t_{M1} & t_{M2} & \cdots & 2t_{MM} \end{pmatrix} = \left[ t_{mn} \right]_{m=1,\cdots,M}^{n=1,\cdots,M} + \mathrm{diag}\left(t_{11}, t_{22}, \cdots, t_{MM}\right),
and t_{mn} is the variable in the characteristic function domain associated with element a_{mn} of the matrix A. Since A is Hermitian, t_{mn} = t_{nm}.
Several aspects of the above theorem are interesting. First, a new definition of characteristic functions — using matrix Θ — is introduced for random
matrices, allowing for convenient and concise C. F. expressions. Second, we
note that if M = 1, the complex Wishart matrix reduces to a scalar a and its
characteristic function according to the above theorem is (1 − jtσ 2 )−N , i.e.
the C. F. of a 2N degrees-of-freedom chi-square variate.
Proof. We have
\phi_A(j\Theta) = \left\langle \mathrm{etr}\left(j\Theta A\right) \right\rangle = \int_A \mathrm{etr}\left(j\Theta A\right) f_A(A)\,(dA)
= \int_A \frac{\mathrm{etr}\left((j\Theta - \Sigma^{-1})A\right) |\det(A)|^{N-M}}{\tilde{\Gamma}_M(N)\,|\det(\Sigma)|^{N}}\,(dA),   (3.467)
where the integral can be solved by virtue of theorem 3.21 to yield
\phi_A(j\Theta) = \det(\Sigma)^{-N} \det\left(\Sigma^{-1} - j\Theta\right)^{-N} = \det\left(I - j\Theta\Sigma\right)^{-N}.   (3.468)
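The closed form of theorem 3.23 can be probed numerically: for a fixed Hermitian Θ, a Monte Carlo average of etr(jΘA) should approach det(I − jΘΣ)^{−N}. The sketch below uses small matrices, a real symmetric Θ and sample sizes that are arbitrary choices of mine; agreement is only up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(5)
M, N, trials = 2, 4, 100_000

Sigma = np.array([[1.0, 0.4], [0.4, 1.5]])         # arbitrary real PD Sigma
L = np.linalg.cholesky(Sigma)
Theta = 0.3 * np.array([[1.0, 0.2], [0.2, -0.5]])  # arbitrary small Hermitian Theta

acc = 0.0 + 0.0j
for _ in range(trials):
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    A = Z.conj().T @ Z                 # A ~ CW_M(N, Sigma)
    acc += np.exp(1j * np.trace(Theta @ A))

mc = acc / trials
cf = np.linalg.det(np.eye(M) - 1j * Theta @ Sigma) ** (-N)
print(mc)    # Monte Carlo estimate of <etr(j Theta A)>
print(cf)    # det(I - j Theta Sigma)^{-N}, theorem 3.23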
Theorem 3.24. If A ∼ CW_M(N, Σ) and given a K × M matrix X of rank K (thus implying that K ≤ M), then the product XAX^H also follows a complex Wishart distribution, denoted by CW_K\left(N, X\Sigma X^H\right).
187
Probability and stochastic processes
Proof. The C. F. of XAX^H is given by
\phi(j\Theta) = \left\langle \mathrm{etr}\left(jXAX^H \Theta\right) \right\rangle,   (3.469)
where, according to a property of traces, the order of the matrices can be rotated to yield
\phi(j\Theta) = \left\langle \mathrm{etr}\left(jAX^H \Theta X\right) \right\rangle = \det\left(I_M - jX^H \Theta X \Sigma\right)^{-N},   (3.470)
which, according to property 2.35, is equivalent to
\phi(j\Theta) = \det\left(I_K - j\Theta X\Sigma X^H\right)^{-N}.   (3.471)
Since this is the C. F. of a CW_K\left(N, X\Sigma X^H\right) variate, the proof is complete.
Theorem 3.25. If A ∼ CW M (N, Σ) and given equivalent partitionings of A
and Σ, i.e.
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^H & \Sigma_{22} \end{pmatrix}, \qquad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^H & A_{22} \end{pmatrix},
where A_{11} and \Sigma_{11} are of size M' × M', then submatrix A_{11} follows a CW_{M'}(N, \Sigma_{11}) distribution.
Proof. Letting X = [I_{M'}\; 0] (an M' × M matrix) in theorem 3.24, we find that XAX^H = A_{11} and X\Sigma X^H = \Sigma_{11}, thus completing the proof.
Theorem 3.26. If A ∼ CW_M(N, Σ) and x is any M × 1 random vector independent of A such that the probability that x = 0 is null, then
\frac{x^H A x}{x^H \Sigma x} \sim \chi^2_{2N},
and is independent of x.
Proof. The distribution naturally derives from applying theorem 3.24 and letting X = x^H therein.
To show independence, let y = \Sigma^{\frac{1}{2}} x. Thus, we have
\frac{x^H A x}{x^H \Sigma x} = \frac{y^H B y}{y^H y},   (3.472)
where B follows a CW_M(N, I_M) distribution. This is also equivalent to
\frac{y^H B y}{y^H y} = z^H B z,   (3.473)
where z = \frac{y}{\sqrt{y^H y}} is obviously a unit vector. It follows that the quadratic form is independent of the length of y.
Likewise, B has the same distribution as U^H B U where U is any M × M unitary matrix (i.e. a rotation in space). It follows that z^H B z is also independent of the angle of y in M-dimensional Hermitian space. Therefore, the quadratic form is totally independent of y and, by extension, of x.
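Theorem 3.26 lends itself to a direct Monte Carlo check. In the sketch below (my own, with arbitrary Σ and N and a fixed test vector x, which suffices here since the ratio is independent of x), the entries of Z are normalized so that a CN(0,1) variable has unit variance; under the usual real-valued convention, twice the ratio should then follow a chi-square law with 2N degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
M, N, trials = 3, 5, 20_000

Sigma = np.array([[1.5, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.8]])
L = np.linalg.cholesky(Sigma)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # any fixed nonzero x

ratios = np.empty(trials)
for t in range(trials):
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    A = Z.conj().T @ Z                                     # A ~ CW_M(N, Sigma)
    ratios[t] = (x.conj() @ A @ x).real / (x.conj() @ Sigma @ x).real

# With unit-variance CN(0,1) building blocks, the ratio is Gamma(N, 1);
# doubling it gives a chi-square variate with 2N degrees of freedom.
print(stats.kstest(2 * ratios, stats.chi2(df=2 * N).cdf).pvalue)  # not small
print(ratios.mean(), N)                                           # mean vs. N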
Theorem 3.27. Suppose that A ∼ CW_M(N, Σ) where N ≥ M and given the following partitionings,
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}^H & \Sigma_{22} \end{pmatrix}, \qquad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^H & A_{22} \end{pmatrix},
where A_{11} and \Sigma_{11} are P × P, A_{22} and \Sigma_{22} are Q × Q, and P + Q = M, then
the matrix A_{1.2} = A_{11} - A_{12}A_{22}^{-1}A_{12}^H follows a CW_P(N - Q, \Sigma_{1.2}) distribution, where \Sigma_{1.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}^H;
A_{1.2} is independent from A_{12} and A_{22};
the distribution of A_{22} is CW_Q(N, \Sigma_{22});
the distribution of A_{12} conditioned on A_{22} is CN\left(\Sigma_{12}\Sigma_{22}^{-1}A_{22},\, A_{22} \otimes \Sigma_{1.2}\right).
Proof. Given the distribution of A, it can be expressed as A = XX^H, where X is CN(0, \Sigma \otimes I_N) and M × N. Furthermore, X is partitioned as follows:
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},   (3.474)
where X_1 is P × N and X_2 is Q × N. Since X_2 is made up of uncorrelated columns, its rank is Q. Therefore, theorem 2.6 guarantees the existence of a matrix B of size (N - Q) × N of rank N - Q such that X_2 B^H = 0, BB^H = I_{N-Q}, and
Y = \begin{pmatrix} X_2 \\ B \end{pmatrix},   (3.475)
is nonsingular.
Observe that
A_{22} = X_2 X_2^H, \qquad A_{12} = X_1 X_2^H,   (3.476)
and
A_{11.2} = A_{11} - A_{12} A_{22}^{-1} A_{21} = X_1 \left( I_N - X_2^H \left( X_2 X_2^H \right)^{-1} X_2 \right) X_1^H.   (3.477)
Starting from
Y^H \left( Y Y^H \right)^{-1} Y = I_N,   (3.478)
which derives directly from the nonsingularity of Y, it can be shown that
X_2^H \left( X_2 X_2^H \right)^{-1} X_2 + B^H B = I_N,   (3.479)
by simply expanding the matrix Y into its partitions in (3.478) and exploiting the properties of X_2 and B to perform various simplifications.
Substituting (3.479) into (3.477), we find that
A_{11.2} = X_1 B^H B X_1^H.   (3.480)
Given that X is CN(0, \Sigma \otimes I_N), it follows according to theorem 3.6 that the distribution of X_1 conditioned on X_2 is CN\left(\Sigma_{12}\Sigma_{22}^{-1}X_2,\, I_N \otimes \Sigma_{1.2}\right), where \Sigma_{1.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}, and the corresponding density is
f_{X_1|X_2}(X_1|X_2) = \frac{1}{\pi^{PN} \det(\Sigma_{11.2})^{N}}\, \mathrm{etr}\left(-\Sigma_{11.2}^{-1}(X_1 - M)(X_1 - M)^H\right),   (3.481)
where M = \Sigma_{12}\Sigma_{22}^{-1}X_2.
Applying the transformation Z = \left[ A_{12}\; C \right] = X_1 Y^H, the Jacobian is found to be
J(X_1 \rightarrow A_{12}, C) = \left| \det\left( Y^H \right) \right|^{2P} = \det\left( Y Y^H \right)^{P} = \det\left( A_{22} \right)^{P},   (3.482)
by exploiting theorems 3.10 and 3.9. Furthermore, the argument of etr(·) in (3.481) can be expressed as follows:
(X_1 - M)(X_1 - M)^H = \left( \left[ A_{12}\; C \right] - MY^H \right) \left( Y Y^H \right)^{-1} \left( \left[ A_{12}\; C \right] - MY^H \right)^H   (3.483)
= (A_{12} - M') A_{22}^{-1} (A_{12} - M')^H + CC^H,
where M' = \Sigma_{12}\Sigma_{22}^{-1}A_{22},
thus yielding
f_{A_{12},C|X_2}(A_{12}, C|X_2) = \frac{1}{\pi^{PN} \det(\Sigma_{11.2})^{N} \det(A_{22})^{P}}\, \mathrm{etr}\left(-\Sigma_{11.2}^{-1} CC^H - \Sigma_{11.2}^{-1} (A_{12} - M') A_{22}^{-1} (A_{12} - M')^H \right).   (3.484)
Since the above density factors, this implies that C follows a CN(0, I_{N-Q} \otimes \Sigma_{1.2}) distribution and that A_{12} conditioned on X_2 follows a CN\left(\Sigma_{12}\Sigma_{22}^{-1}A_{22},\, A_{22} \otimes \Sigma_{1.2}\right) distribution. Furthermore, C is independently distributed from both A_{12} and X_2. This, in turn, implies that A_{1.2} = CC^H is independently distributed from A_{12} and A_{22}. It readily follows that A_{1.2} = CC^H \sim CW_P(N - Q, \Sigma_{1.2}), A_{22} \sim CW_Q(N, \Sigma_{22}) and A_{12} conditioned on A_{22} follows a CN\left(\Sigma_{12}\Sigma_{22}^{-1}A_{22},\, A_{22} \otimes \Sigma_{1.2}\right) distribution, thus completing the proof.
Theorem 3.28. If A ∼ CW_M(N, Σ) and given a K × M matrix X of rank K where K ≤ M, then B = \left( X A^{-1} X^H \right)^{-1} follows a CW_K\left(N - M + K,\, \left( X \Sigma^{-1} X^H \right)^{-1}\right) distribution.
Proof. Applying the transformation A_2 = \Sigma^{-\frac{1}{2}} A \Sigma^{-\frac{1}{2}}, then A_2 \sim CW_M(N, I_M) as a consequence of theorem 3.24. The problem can thus be recast in terms of A_2 (which has the virtue of exhibiting no correlations, thus simplifying the subsequent proof), i.e.
\left( X A^{-1} X^H \right)^{-1} = \left( Y A_2^{-1} Y^H \right)^{-1},   (3.485)
where Y = X \Sigma^{-\frac{1}{2}}. It follows that
\left( X \Sigma^{-1} X^H \right)^{-1} = \left( Y Y^H \right)^{-1}.   (3.486)
Performing the SVD of Y according to (2.115), we have
Y = U \left[ S \;\; 0 \right] V^H = U S \left[ I_K \;\; 0 \right] V^H.   (3.487)
Therefore, we have
\left( Y A_2^{-1} Y^H \right)^{-1} = \left( U S \left[ I_K \;\; 0 \right] V^H A_2^{-1} V \left[ I_K \;\; 0 \right]^H S U^H \right)^{-1}
= \left( U S \left[ I_K \;\; 0 \right] D^{-1} \left[ I_K \;\; 0 \right]^H S U^H \right)^{-1}
= U S^{-1} \left( \left[ I_K \;\; 0 \right] D^{-1} \left[ I_K \;\; 0 \right]^H \right)^{-1} S^{-1} U^H,   (3.488)
where D = V^H A_2 V and, according to theorem 3.24, its PDF is also CW_M(N, I_M).
It is clear that the product inside the parentheses above yields the upper-left
K × K partition of D−1 . According to lemma 2.1, this is equivalent to
\left( Y A_2^{-1} Y^H \right)^{-1} = U S^{-1} D_{11.2} S^{-1} U^H,   (3.489)
where D_{11.2} = D_{11} - D_{12} D_{22}^{-1} D_{21} which, by virtue of theorem 3.27, follows a CW_K(N - M + K, I_K) distribution. It immediately follows that U S^{-1} D_{11.2} S^{-1} U^H is CW_K\left(N - M + K,\, U S^{-1} S^{-1} U^H\right), where
U S^{-1} S^{-1} U^H = \left( Y Y^H \right)^{-1} = \left( X \Sigma^{-1} X^H \right)^{-1},   (3.490)
thus completing the proof.
A highly useful consequence of the above theorem follows.
Theorem 3.29. Given a matrix A ∼ CW_M(N, Σ) and any M × 1 random vector x independent of A such that P(x = 0) = 0, then
\frac{x^H \Sigma^{-1} x}{x^H A^{-1} x} \sim \chi^2_{2N-2M+2},
and is independent of x.
The proof is left as an exercise.
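Since the proof is left as an exercise, a numerical check may be reassuring. The sketch below (my own choices of Σ, M, N and test vector, not from the text) estimates the ratio of theorem 3.29 by Monte Carlo and, under the same normalization convention as before, compares twice the ratio with a chi-square law with 2N − 2M + 2 degrees of freedom.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
M, N, trials = 3, 6, 20_000

Sigma = np.array([[1.2, 0.4, 0.1],
                  [0.4, 1.0, 0.3],
                  [0.1, 0.3, 0.9]])
L = np.linalg.cholesky(Sigma)
Sigma_inv = np.linalg.inv(Sigma)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # fixed nonzero x

num = (x.conj() @ Sigma_inv @ x).real
samples = np.empty(trials)
for t in range(trials):
    W = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    Z = W @ L.conj().T
    A = Z.conj().T @ Z                                     # A ~ CW_M(N, Sigma)
    samples[t] = num / (x.conj() @ np.linalg.inv(A) @ x).real

# Doubling the ratio should give a chi-square with 2N - 2M + 2 degrees of freedom.
df = 2 * N - 2 * M + 2
print(stats.kstest(2 * samples, stats.chi2(df=df).cdf).pvalue)  # not small
print(samples.mean(), df / 2)      # sample mean vs. N - M + 1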
Problems
3.1. From the axioms of probability (see definition 3.8), demonstrate that the probability of an event E must necessarily satisfy P(E) ≤ 1.
3.2. What is the link between the transformation technique of a random variable used to obtain a new PDF and the basic calculus technique of variable
substitution used in symbolic integration?
3.3. Given your answer in 3.2, show that Jacobians can be used to solve multiple
integrals by permitting multivariate substitution.
3.4. Show that if Y follows a uniform distribution within arbitrary bounds a and b (Y ∼ U(a, b)), then it can be found as a transformation upon X ∼ U(0, 1). Find the transformation function.
3.5. Propose a proof for the Wiener-Khinchin theorem (eq. 3.121).
3.6. Given that the characteristic function of a central 2 degrees-of-freedom chi-square variate with variance P is given by
\Psi_x(jv) = \frac{1}{1 - j\pi v P},
derive the infinite-series representation for the Rice PDF (envelope).
3.7. Demonstrate (3.317).
192
SPACE-TIME METHODS, VOL. 1: SPACE-TIME PROCESSING
3.8. From (3.317), derive the finite sum form which applies if n2 is an integer.
3.9. Show that
f_x(x) = (2\pi)^{-M/2} \det(\Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)   (3.491)
is indeed a PDF. Hint: Integrate over all elements of x to show that the area
under the surface of fx (x) is equal to 1. To do so, use a transformation of
random variables to decouple the multiple integrals.
3.10. Given two χ² random variables Z_1 and Z_2 with PDFs
f_{Z_1}(x) = \frac{1}{\Gamma(n_1)}\, x^{n_1 - 1} e^{-x},
f_{Z_2}(x) = \frac{1}{\Gamma(n_2)\, 2.5^{n_2}}\, x^{n_2 - 1} e^{-x/2.5},
(a) What is the PDF of the ratio Y = Z_1/Z_2?
(b) If n_1 = 2 and n_2 = 2, derive the PDF of Y = Z_1 + Z_2. Hint: Use characteristic functions and expansions into partial fractions.
3.11. Write a proof for theorem 3.18.
3.12. If X = BY, where X and Y are N × M complex random matrices and B is a fixed positive definite N × N matrix, prove that the Jacobian is given by
(dX) = |\det(B)|^{2M} (dY).   (3.492)
3.13. Show that if W ∼ CW_M(N, I_M), then its diagonal elements w_{11}, w_{22}, \ldots, w_{MM} are independent and identically distributed according to a χ² law with 2N degrees of freedom.
3.14. Write a proof for theorem 3.29. Hint: start by studying theorems 3.25 and
3.26.
APPENDIX 3.A: Derivation of the Gaussian distribution
Given the binomial discrete probability distribution
g(x) = P(X = x) = \binom{N}{x} p^x (1 - p)^{N-x},   (3.A.1)
where x is an integer between 0 and N, we wish to show that it tends towards the Gaussian distribution as N gets large.
distribution as N gets large.
In fact, Abraham de Moivre originally derived the Gaussian distribution as an approximation
to the binomial distribution and as a means of quickly calculating cumulative probabilities (e.g.
the probability that no more than 7 heads are obtained in 10 coin tosses). The first appearance
of the normal distribution and the associated CDF (the probability integral) occurs in a latin
pamphlet published by de Moivre in 1733 [Daw and Pearson, 1972]. This original derivation
hinges on the approximation to the Gamma function known as Stirling’s formula (which was,
in fact, discovered in a simpler form and used by de Moivre before Stirling). It is the approach
we follow here. The normal distribution was rediscovered by Laplace in 1778 as he derived the
central limit theorem, and rediscovered independently by Adrian in 1808 and Gauss in 1809 in
the process of characterizing the statistics of errors in astronomical measurements.
We start by expanding log(g(x)) into its Taylor series representation about its point xm
where g(x) is maximum. Since the relation between g(x) and log(g(x)) is monotonic, log(g(x))
is also maximum at x = xm . Hence, we have:
\log(g(x)) = \log(g(x_m)) + \left[ \frac{d \log(g(x))}{dx} \right]_{x=x_m} (x - x_m) + \frac{1}{2!} \left[ \frac{d^2 \log(g(x))}{dx^2} \right]_{x=x_m} (x - x_m)^2 + \frac{1}{3!} \left[ \frac{d^3 \log(g(x))}{dx^3} \right]_{x=x_m} (x - x_m)^3 + \cdots
= \sum_{k=0}^{\infty} \frac{1}{k!} \left[ \frac{d^k \log(g(x))}{dx^k} \right]_{x=x_m} (x - x_m)^k.   (3.A.2)
Taking the logarithm of g(x), we find
log(g(x)) = log(N !) − log(x!) − log((N − x)!) + x log(p) + (N − x) log(1 − p). (3.A.3)
Since N is large by hypothesis and the expansion is about xm which is distant from both 0
and N (since it corresponds to the maximum of g(x)), thus making x and N − x also large,
Stirling’s approximation formula can be applied to the factorials x! and (N − x)! to yield
\log(x!) \approx \frac{1}{2}\log(2\pi x) + x(\log x - 1) \approx x(\log(x) - 1).   (3.A.4)
Thus, we have
\frac{d \log(x!)}{dx} \approx (\log(x) - 1) + 1 = \log(x),   (3.A.5)
and
\frac{d \log((N - x)!)}{dx} \approx \frac{d}{dx}\left[ (N - x)(\log(N - x) - 1) \right] = -(\log(N - x) - 1) + (N - x)\frac{-1}{N - x} = -\log(N - x),   (3.A.6)
which, combined with (3.A.3), leads to
\frac{d \log(g(x))}{dx} \approx -\log(x) + \log(N - x) + \log(p) - \log(1 - p).   (3.A.7)
The maximum point xm can be easily found by setting the above derivative to 0 and solving
it for x. Hence,
\log\left( \frac{p}{1 - p} \cdot \frac{N - x_m}{x_m} \right) = 0
\frac{p}{1 - p} \cdot \frac{N - x_m}{x_m} = 1
(N - x_m)p = (1 - p)x_m
x_m(p + (1 - p)) = Np
x_m = Np.   (3.A.8)
We can now find the first coefficients in the Taylor series expansion. First, we have
\left[ \frac{d \log(g(x))}{dx} \right]_{x=x_m} = 0,   (3.A.9)
by definition, since xm is the maximum.
Using (3.A.8), we have
» 2
–
d log(g(x))
1
1
1
= −
−
=−
dx2
x
N
−
x
N
p(1
− p)
m
m
x=xm
» 3
–
d log(g(x))
1
1
1 − 2p
=
−
= 2 2
,
dx3
x2m
(N − xm )2
p N (1 − p)2
x=xm
(3.A.10)
(3.A.11)
where we note that the 3rd coefficient is much smaller (by a factor proportional to N1 ) than the
second one as N and xm grow large.
Since their contribution is not significant, all terms in the Taylor expansion beyond k = 2
are neglected. Taking the exponential of (3.A.2), we find
g(x) = g(x_m)\, e^{-\frac{(x - x_m)^2}{2Np(1-p)}},   (3.A.12)
which is a smooth function of x. It only needs to be normalized to become a PDF, i.e.
f_X(x) = K g(x),   (3.A.13)
where
K = \left[ \int_{-\infty}^{\infty} g(x)\, dx \right]^{-1},   (3.A.14)
and
\int_{-\infty}^{\infty} g(x)\, dx = \int_{-\infty}^{\infty} g(x_m)\, e^{-\frac{(x - x_m)^2}{2Np(1-p)}}\, dx
= g(x_m) \int_{-\infty}^{\infty} e^{-\frac{u^2}{2Np(1-p)}}\, du
= 2 g(x_m) \int_{0}^{\infty} \frac{1}{2\sqrt{v}}\, e^{-\frac{v}{2Np(1-p)}}\, dv
= g(x_m) \sqrt{2\pi N p(1 - p)}.   (3.A.15)
Therefore, the distribution is
f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}},   (3.A.16)
where the variance is
\sigma^2 = Np(1 - p),   (3.A.17)
and the mean is given by
\mu = x_m = Np,   (3.A.18)
since the distribution is symmetric about its peak, which is situated at xm = N p.
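The quality of de Moivre's approximation is easy to check numerically. The short sketch below (parameters are arbitrary choices of mine) compares the binomial PMF of (3.A.1) with the Gaussian density defined by (3.A.16)-(3.A.18).

import numpy as np
from scipy import stats

# Compare the binomial PMF with its Gaussian approximation for a moderate N.
N, p = 100, 0.3
x = np.arange(N + 1)
mu, sigma = N * p, np.sqrt(N * p * (1 - p))

binom_pmf = stats.binom.pmf(x, N, p)
gauss_pdf = stats.norm.pdf(x, loc=mu, scale=sigma)

print(np.max(np.abs(binom_pmf - gauss_pdf)))   # small, and shrinking as N grows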
References
[Andersen,1958] T. W. Andersen, An introduction to multivariate statistical analysis. New
York: J. Wiley & Sons.
[Davenport, 1970] W. B. Davenport, Jr., Probability and random processes. New York:
McGraw-Hill.
[Ratnarajah et al., 2004] T. Ratnarajah, R. Vaillancourt and M. Alvo, “Jacobians and hypergeometric functions in complex multivariate analysis,” to appear in Can. Appl. Math. Quarterly.
[Daw and Pearson, 1972] R. H. Daw and E. S. Pearson, “Studies in the history of probability and statistics XXX: Abraham de Moivre’s 1733 derivation of the normal curve: a bibliographical note,” Biometrika, vol. 59, pp. 677-680.
[Goodman, 1963] N. R. Goodman, “Statistical analysis based on a certain multivariate complex Gaussian distribution,” Ann. Math. Statist., vol. 34, pp. 152-177.
[Knobloch, 1994] E. Knobloch, “From Gauss to Weierstrass: determinant theory and its evaluations,” in The intersection of history and mathematics, vol. 15 of Sci. Networks Hist.
Stud., pp. 51-66.
[Leon-Garcia, 1994] A. Leon-Garcia, Probability and random processes for electrical engineering, 2nd ed. Addison-Wesley.
[James, 1954] A. T. James, “Normal multivariate analysis and the orthogonal group,” Ann.
Math. Statist., vol. 25, pp. 40-75.
[Muirhead, 1982] R. J. Muirhead, Aspects of multivariate statistical theory. New York: J. Wiley & Sons.
[Nakagami, 1960] M. Nakagami, “The m-distribution, a general formula of intensity distribution of rapid fading,” in Statistical Methods in Radio Wave Propagation, W. G. Hoffman,
Ed., Oxford: Pergamon.
[Prudnikov et al., 1986a] A. P. Prudnikov, Y. U. Brychkov and O. I. Marichev, Integrals and Series, vol. 1: Elementary Functions, Amsterdam: Gordon and Breach.
[Prudnikov et al., 1986b] A. P. Prudnikov, Y. U. Brychkov and O. I. Marichev, Integrals and
Series, vol. 2: Special Functions, Amsterdam: Gordon and Breach.
[Papoulis and Pillai, 2002] A. Papoulis and S. U. Pillai, Probability, random variables, and
stochastic processes, 4th ed. New York: McGraw-Hill.
[Rice, 1944] S. O. Rice, “Mathematical analysis of random noise,” Bell Syst. Tech. J., vol. 23, pp. 282-332.
[Rice, 1945] S. O. Rice, “Mathematical analysis of random noise,” Bell Syst. Tech. J., vol. 24, pp. 46-156. [Reprinted with [Rice, 1944] in Selected papers on noise and stochastic processes, N. Wax, ed. New York: Dover, pp. 133-294].