Information Theoretic Bounds for Compound
MIMO Gaussian Channels
Stojan Z. Denic, Member, IEEE, Charalambos D. Charalambous, Senior Member, IEEE,
and Seddik M. Djouadi, Member, IEEE
Abstract—Achievable rates for compound Gaussian multiple-input multiple-output channels are derived. Two types of channels, modeled in the frequency domain, are considered when: 1)
the channel frequency response matrix H belongs to a subset
of H ∞ normed linear space, and 2) the power spectral density
(PSD) matrix of the Gaussian noise belongs to a subset of L1
space. The achievable rates of these two compound channels
are related to the maximin of the mutual information rate. The
minimum is with respect to the set of all possible H matrices
or all possible PSD matrices of the noise. The maximum is with
respect to all possible PSD matrices of the transmitted signal
with bounded power. For the compound channel modeled by the
set of H matrices, it is shown, under certain conditions, that the
code for the worst case channel can be used for the whole class of
channels. For the same model, the water-filling argument implies
that the larger the set of matrices H, the smaller the bandwidth
of the transmitted signal. For the second compound channel,
the explicit relation between the maximizing PSD matrix of the
transmitted signal and the minimizing PSD matrix of the noise is
found. The two PSD matrices are related through a Riccati equation, which also arises in Kalman filtering and linear-quadratic Gaussian control problems.
Index Terms—multiple-input multiple-output Gaussian channel, compound channel, robustness, universal code, power allocation, channel degrading
I. INTRODUCTION
A compound communication channel is used to model the
situation when the transmitter and receiver, although unaware
of the true communication channel, know that it belongs to a
certain set of channels. The underlying assumption in the case
of compound channels is that the true channel does not change
throughout the transmission. Specifically, it models the situation in which “nature” chooses one channel from the set of all
possible channels and keeps it unchanged from the beginning
to the end of the transmission. Thus, a compound channel
represents one way to model the uncertainty the transmitter
and receiver have regarding the communication channel. The
importance of compound channels has been pointed out in
[1] (Section VII), where it is argued that compound channels
S. Z. Denic is with the Telecommunications Research Lab, Toshiba Research Europe Limited, BS1 4ND, Bristol, UK (Email: stojan.denic@toshibatrel.com). Previously, he was with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, USA, and the Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, Cyprus.
C. D. Charalambous is with the Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, Cyprus; also with the School of Information Technology and Engineering, University of Ottawa.
S. M. Djouadi is with the Department of Electrical and Computer Engineering, University of Tennessee, Knoxville, USA (Email: djouadi@ece.utk.edu).
This work was supported by the European Commission under the project ICCCSYSTEMS and the NSERC under an operating grant (T-810-289-01).
may be used to model different combinations of slow/fast and flat/frequency-selective fading channels.
This paper is concerned with information theoretic bounds
for two types of compound multiple-input multiple-output
(MIMO) Gaussian channels. More precisely, the main goal
of this paper is to determine achievable transmission rates for
those channels. An achievable transmission rate is defined in the following way: if an encoder uses a code rate smaller than the achievable rate for a given communication channel, the probability of decoding error can be made arbitrarily small as the codeword length increases. Hence, this is different from the channel capacity, which is defined as the maximal achievable transmission rate for a particular channel. The results presented
here have been partially published in [2].
To derive the achievable rates, the mutual information rate formula for MIMO Gaussian channels, given in the frequency domain, is employed. The mutual information rate J(Wx, Wn) is given by ([3], page 146)

$$J(W_x, W_n) = \frac{1}{4\pi}\int_0^{2\pi} \log\det\big(I_p + H(e^{j\theta})W_x(\theta)H^*(e^{j\theta})W_n^{-1}(\theta)\big)\,d\theta, \quad (1)$$
where Wx (θ) is the PSD matrix of a transmitted signal x,
Wn (θ) is the PSD matrix of a Gaussian additive noise n,
H(ejθ ) is the channel frequency response matrix, and θ ∈
[0, 2π]. The notation (·)∗ means complex conjugate transpose,
and Ip stands for the identity matrix of dimension p. Thus, we deal with channels where the transmitted signal is constrained in power and then sent through a linear time-invariant filter
[4]. From (1), it can be noticed that there are two possible
sources of uncertainty; one is the lack of knowledge of the
channel matrix function H(ejθ ), while the other is the lack
of knowledge of the noise PSD matrix Wn (θ). In this paper,
these two types of uncertainties are modeled by two compound
channel models. The uncertainty about the true channel matrix
H(e^{jθ}) is due to imprecision in the channel model or the channel measurements, while the uncertainty about the true Wn(θ) may come from the inability to estimate the interference
from other users in a communication network. The difference
between these two models of uncertainty is explained by
Blachman in [5]. According to his terminology, the uncertainty
regarding the channel matrix H(ejθ ) is called “interference of
the second kind”, while the uncertainty regarding the noise
PSD matrix Wn (θ) is called “interference of the first kind”.
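Since both compound models below reduce to evaluating (1) over candidate channel and noise spectra, a small numerical sketch may help; the following Python fragment is our illustration, not part of the original paper, and approximates (1) by a Riemann sum over a frequency grid with made-up system matrices.

```python
import numpy as np

def mutual_info_rate(H, Wx, Wn):
    """Numerically approximate the mutual information rate (1).

    H, Wx, Wn: arrays of shape (L, p, m), (L, m, m), (L, p, p),
    sampled at L frequencies theta_k uniformly spaced on [0, 2*pi).
    Returns (1/(4*pi)) * integral of log det(I_p + H Wx H^* Wn^{-1}).
    """
    L, p, _ = H.shape
    vals = np.empty(L)
    for k in range(L):
        A = np.eye(p) + H[k] @ Wx[k] @ H[k].conj().T @ np.linalg.inv(Wn[k])
        # log det via slogdet for numerical stability
        vals[k] = np.linalg.slogdet(A)[1]
    # Riemann sum: dtheta = 2*pi / L, prefactor 1/(4*pi)
    return vals.sum() * (2 * np.pi / L) / (4 * np.pi)

# Illustrative example: a 2x2 channel with flat unit PSDs
theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
H = np.stack([np.array([[1 / (1 - 0.5 * np.exp(-1j * t)), 1.0],
                        [1.0, 1 / (1 - 0.5 * np.exp(-1j * t))]]) for t in theta])
Wx = np.stack([np.eye(2) for _ in theta])
Wn = np.stack([np.eye(2) for _ in theta])
print(mutual_info_rate(H, Wx, Wn))  # nats per channel use
```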
The achievable rates of the two compound channels will be related to

$$\sup_{W_x \in \mathcal{A}_1} \inf_{B} J(W_x, W_n), \quad (2)$$
where J(Wx , Wn ) is defined by (1), A1 is the set of PSD
matrices of the transmitted signal having bounded power, and
B is the set of all possible channels, defined by the two channel
models discussed above. In subsequent developments, the set
B will be referred to as “uncertainty” set.
The first type of compound channel is described by a set of channel matrices H(e^{jθ}) which is a subset of the space H^∞ of bounded and analytic matrix functions defined in the open right-half plane (for the definition of the space and the corresponding norm, see Appendix C and [6]). Specifically, the uncertainty set is represented by an additive model

$$H(e^{j\theta}) = H_{nom}(e^{j\theta}) + \Delta H(e^{j\theta}), \quad (3)$$
where Hnom (ejθ ) is the so-called nominal part, which is a
known matrix, and ∆H(ejθ ) is an unknown perturbation that
models the uncertainty and the size of the uncertainty set
[7]. In reality, the set of channel matrices H(ejθ ) can be
extracted from Nyquist or Bode plots, obtained from channel
measurements. An example of determining the capacity of
MIMO channels from measurement data was demonstrated in
[8].
Modeling the uncertainty in the frequency domain enables us to determine how the achievable rate and the optimal transmission bandwidth change when the size of the uncertainty set varies. The main contributions are the following. For a general form of ∆H(e^{jθ}), we derive necessary conditions for the maximin optimization problem given by (2). These necessary conditions provide a solution of (2) when there is uncertainty in both the singular values and the unitary matrices of the singular value decomposition of the channel frequency response matrix H(e^{jθ}). Further, the special case of ∆H(e^{jθ}) is considered in which the uncertainty is placed only on the singular values of H(e^{jθ}). We infer, based on the behavior of physical systems at high frequencies and the water-filling argument, that the optimal bandwidth shrinks as the size of the uncertainty set, i.e., the uncertainty, increases. The worst case
channel matrix is identified, as well. It can be seen that all
other channels from the uncertainty set may be “degraded”
to the worst case channel implying that the code for the
worst case channel may be used for all channels within the
uncertainty set. It is shown, under certain conditions, that the
worst case concept can be applied to the general form of
∆H(ejθ ). Also, from the derived formula for the achievable
rate, it follows that with the increase of uncertainty, the
transmission takes place only over the largest singular values
of the channel frequency response matrix Hnom (ejθ ).
It should be noted that the formulation of the additive model
implies partial knowledge of the channel state information
(CSI) and no channel distribution information (CDI) regarding
H(ejθ ). In this paper, CSI refers to the knowledge of the
channel matrix realization H(ejθ ), and CDI refers to the
knowledge of its probability distribution.
The second type of compound channel is described by the set of noise PSD matrices

$$\Big\{ W_n(\theta) : \int_0^{2\pi} \mathrm{Trace}(W_n(\theta))\,d\theta \le P_n \Big\}, \quad (4)$$
which is a subset of the L1 space, the space of absolutely
integrable functions. A similar problem was studied in [9], but
for the case of memoryless channels. Our main contributions
are the following. While in [9] the maximin optimization problem was solved numerically, in this paper it is solved analytically, providing a relation between the optimal PSD matrix of the transmitted signal Wx^o(θ), which maximizes the mutual information rate J(Wx, Wn), and the noise PSD matrix Wn^o(θ), which minimizes it. Specifically, the two matrices
are related through a Riccati equation which is common
in optimal estimation and control problems [10]. When the
channel matrix H(ejθ ) is square and invertible, it turns out
that the noise PSD matrix Wno (θ) is proportional to the optimal
PSD matrix of the transmitted signal Wxo (θ). This represents
a generalization of the work found in [11], for the case of
MIMO compound channels. The achievability of J(Wxo , Wno )
follows from the classical result [12], by employing “Gaussian
codebooks” and maximum-likelihood decoding.
A. Literature Review
In this section, we review the literature related to the problems considered in this paper.
1) Channels with Complete CSI and Uncertain Channel Matrix H: Gallager [4] was probably the first to study additive Gaussian channels where the transmitted signal, constrained in power, is sent through a linear time-invariant filter that is not band-limited. He proved a channel coding theorem and its converse in the case of continuous-time channels. His work was generalized by Root and Varaiya [13], who computed the capacity of different classes of Gaussian channels. They derived capacity formulas for the class of Gaussian MIMO memoryless channels and for the class of single-input single-output continuous-time channels when the uncertainty set is compact.
Brandenburg and Wyner [12] deal with discrete-time
MIMO Gaussian channels when the channel matrix H(ejθ )
and the noise PSD Wn (θ) are completely known to the
transmitter and the receiver. The optimal PSD matrix of the
transmitted signal is given in terms of “water-filling” over the
singular values of the channel matrix H(ejθ ) in the frequency
domain.
Médard, in [14], computed upper and lower bounds on
the mutual information for uncertain fading channels, when
the receiver uses a channel estimate in the decoding for
single-user and multiple-access channels. In [15], Palomar
et al. considered a memoryless compound MIMO Gaussian
channel, in the absence of CDI and with partial CSI. The
uncertainty set is defined as the set of channel matrices with
unconstrained right singular vectors. It is shown that the
optimal covariance matrix of the transmitted signal provides
uniform power allocation among the transmitting antennas.
The effect of the channel estimation error on the capacity and
outage probability of memoryless MIMO fading channels is
studied in [16], when the transmitter and the receiver are aware
of the CDI and use the estimate of the channel matrix in the
decoding. The optimal spatial and temporal power adaptation
strategies are provided. It is interesting to notice that the
solution for the optimal spatial power adaptation, found in
[16], is similar to the optimal power adaptation obtained in
this paper for the case where the uncertainty is imposed on
the channel matrix H(ejθ ). The difference is that, in our
case, we consider the compound channel, where the solution
is obtained by applying the worst case channel concept and
ignoring possible randomness of H(ejθ ), while in [16], the
authors consider the capacity subject to imperfect CSI where
the channel matrix has a Rayleigh distribution.
The impact of a training sequence on the capacity of MIMO fading channels is studied in [17]. Under a block-fading assumption, it is shown that the number of training symbols is equal to or greater than the number of transmitting antennas.
Other related work includes results on the capacity of MIMO fading channels with non-zero mean (Ricean fading channels) [18], [19], [20]. A Ricean fading channel is modeled by the sum of a deterministic matrix and a random matrix whose elements are independent and identically distributed zero-mean complex circularly-symmetric Gaussian random variables. In [19], it is shown that the capacity of this channel is monotonically non-decreasing in the singular values of the deterministic matrix when the CSI is available to the receiver but not to the transmitter. In [20], the authors consider a compound Ricean fading channel. It is assumed that the deterministic part of the channel matrix is uncertain. Under the assumption that the transmitter does not have CSI, the optimal signal matrix, which achieves the capacity, is equal to the product of three matrices: a T × T unitary matrix, a T × M real non-negative diagonal matrix, and an M × M unitary matrix. M denotes the number of transmitting antennas, and T is the wireless channel coherence time, measured in number of symbol periods.
In [21] and [22], the capacity of MIMO fading channels in the presence of CDI, but not CSI, is considered. The authors of [21] assume a block-fading channel model and take a geometric approach in computing the channel capacity. It is found that the channel capacity grows linearly with M^*(1 − M^*/T) for high signal-to-noise ratio (SNR), where M^* = min{M, P, ⌊T/2⌋}, T and M are defined as in [20], and P represents the number of receive antennas. Also, it is shown that the optimal number of transmitting antennas is M^*, i.e., using a larger number of antennas does not give a larger capacity gain. If the block-fading assumption is removed, then for high SNR it is proved that the channel capacity grows only double-logarithmically as a function of SNR [22]. For consideration of how the lack of CSI affects the design of a MIMO receiver see, for example, [23].
2) Uncertain Channel Noise: It appears that Blachman was the first to investigate the channel capacity subject to uncertain noise [11]. Using a two-player zero-sum game-theoretic framework, he defined "pure" and "mixed" strategies and considered the case where the transmitter and the noise (jammer) can decide between a finite number of strategies. Hence, communication subject to uncertain noise can be understood as communication in the presence of jamming, which is further treated in, for instance, [24] and [25] for single-input single-output channels. In [26] and [27], Baker and Chao provide the most general mathematical framework for the analysis of compound channels with uncertain noise. Finite-dimensional [26] and infinite-dimensional [27] communication channels are examined, and
the capacity and optimal transmitter and jammer strategies are
given as functions of eigenvalues of operators that model the
channel. In contrast to [27], we deal with stationary signals, so that the channel capacity is given in terms of the PSDs of the transmitted and noise signals, which could be more
desirable from the engineering point of view. In [28], the
capacity of memoryless MIMO channels, subject to noise
covariance constraints, is obtained. The authors establish many
interesting properties of the mutual information with respect
to the maximin optimization problem similar to that defined
in (2). Other references of interest for MIMO memoryless
uncertain noise channels include [29] and [30]. The former
gives the relation between the eigenvalues of the transmitted
signal covariance matrix and the covariance matrix of the
noise. The latter considers the case of jamming over MIMO
Gaussian Rayleigh channels when the jammer has access to
the transmitted signal. It is found that the knowledge of the
channel input does not bring any advantage to the jammer.
B. Paper Organization
Section II gives the definitions of two compound channels and corresponding achievable rates. Sections III and
IV provide explicit formulas for the achievable rates of the
compound channels defined in Section II, in terms of the
maximin of the mutual information rate. Section V discusses
the conditions under which the previously computed code rates
are achievable. Section VI contains examples.
C. Notation
Z represents the set of integers. C represents the set of complex numbers. 0 is a zero matrix. arg(w) stands for the phase of the transfer function w(e^{jθ}). σ̄(A) denotes the maximal singular value of a matrix A. If A is a matrix, the notation A ≥ 0 means that A is non-negative definite. [A]_{ij} denotes the element of matrix A located in the ith row and jth column.
II. PROBLEM DEFINITION
Consider a discrete-time MIMO channel defined by

$$y(t) = \sum_{j=-\infty}^{t} h(t-j)x(j) + n(t), \quad (5)$$
where x ≜ {x(t) : t ∈ Z} is an m-component complex stationary stochastic process representing the transmitted signal, n ≜ {n(t) : t ∈ Z} is a p-component complex Gaussian stochastic process representing an additive noise, y ≜ {y(t) : t ∈ Z} is a p-component complex stationary stochastic process representing the received signal, and h ≜ {h(t) : t ∈ Z} is a sequence of complex p × m matrices representing the impulse response of the MIMO communication channel. It is assumed that x generates a Hilbert space [3]. Here, H(e^{jθ}) represents the channel frequency response matrix, which is the discrete Fourier transform of h given by

$$H(e^{j\theta}) = \sum_{t=0}^{+\infty} h(t)e^{j\theta t}, \quad (6)$$
where θ is a normalized frequency. It is assumed that $\sum_{t=-\infty}^{+\infty} h(t)e^{j\theta t}$ converges in L^2(Fx) to a limit in L^2(Fx), denoted by H(e^{jθ}). L^2(Fx) is a Hilbert space of complex-valued Lebesgue-Stieltjes measurable functions H(e^{jθ}) of finite norm

$$\|H(e^{j\theta})\|_{L^2(F_x)} \triangleq \mathrm{Trace}\int_0^{2\pi} H(e^{j\theta})\,dF_x(\theta)\,H^*(e^{j\theta}). \quad (7)$$

Fx denotes the matrix spectral distribution of x [3], which is assumed to be absolutely continuous with respect to the Lebesgue measure on [0, 2π]. Hence, dFx(θ) = Wx(θ)dθ, where Wx(θ) represents the PSD matrix of x. Wn(θ) represents the PSD matrix of n.
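For illustration, the channel model (5) is straightforward to simulate; the sketch below is ours, with an arbitrary two-tap impulse response and dimensions, and convolves a transmitted sequence with a matrix impulse response before adding Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, T = 2, 2, 1000          # outputs, inputs, number of samples (illustrative)

# Example two-tap matrix impulse response h(0), h(1) (each p x m)
h = [np.array([[1.0, 0.3], [0.2, 0.8]]),
     np.array([[0.5, 0.1], [0.0, 0.4]])]

x = rng.standard_normal((T, m))           # transmitted signal
n = 0.1 * rng.standard_normal((T, p))     # additive Gaussian noise

# y(t) = sum_j h(t-j) x(j) + n(t), realized as a finite convolution
y = n.copy()
for t in range(T):
    for tau, h_tau in enumerate(h):
        if t - tau >= 0:
            y[t] += h_tau @ x[t - tau]
```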
Next, the two compound channels are defined as well as
the corresponding maximin optimization problems that will
be related to the achievable rates of these channels.
First problem: Channel unknown, noise known. The compound channel is defined by the set

$$\mathcal{A}_2 \triangleq \big\{ H \in H^\infty : H = H_{nom} + w\Delta, \;\; H, H_{nom}, w, \Delta \in H^\infty, \;\; \|\Delta\|_\infty \le 1 \big\}, \quad (8)$$

where: 1) H_nom(e^{jθ}) is a nominal channel frequency response matrix, which is stable and is the result of previous measurement or belief regarding the channel; 2) w(e^{jθ}) is a known stable scalar transfer function, which defines the size of the uncertainty set A2 at each frequency θ; 3) ∆(e^{jθ}) is a stable variable frequency response matrix, which accounts for the phase uncertainty and acts as a scaling factor on the magnitude of the perturbation (for the definition of the H^∞ space and the associated norm, see Appendix C). Thus, H(e^{jθ}) describes a disk centered at H_nom(e^{jθ}) with a radius determined by w(e^{jθ}). To explain this point better, observe that

$$\bar{\sigma}(H - H_{nom}) = \bar{\sigma}(w\Delta) \le |w|, \quad (9)$$

because ‖∆(e^{jθ})‖∞ ≤ 1 (for the definition of ‖·‖∞, see Appendix C). In other words, the maximum singular value of H(e^{jθ}) − H_nom(e^{jθ}) is bounded by |w(e^{jθ})|. This means that the uncertainty set is defined by putting a bound on the maximum singular value of the matrix, implying that all other, smaller singular values of the matrix are bounded as well. This is different from what is done in, for instance, [31], where the uncertainty is defined with respect to the sum of the singular values of the channel matrix.

Further, introduce the set of all possible PSD matrices of the transmitted signal,

$$\mathcal{A}_1 \triangleq \Big\{ W_x(\theta) : \int_0^{2\pi} \mathrm{Trace}(W_x(\theta))\,d\theta \le P_x \Big\}. \quad (10)$$

An achievable rate of the compound channel described by (8) is going to be related to the following maximin problem:

$$R_{a1} \triangleq \sup_{W_x \in \mathcal{A}_1} \inf_{H \in \mathcal{A}_2} J(W_x, W_n), \quad (11)$$

where J(Wx, Wn) is given by (1). It is also assumed that the transmitter and the receiver know the nominal channel frequency response matrix H_nom(e^{jθ}), as well as the size of the uncertainty set |w(e^{jθ})|. This implies that both the transmitter and the receiver are aware of the uncertainty set A2.

Second problem: Channel known, noise unknown. The noise uncertainty is defined through the uncertainty of the PSD matrix Wn(θ). It is assumed that, although unknown, Wn(θ) belongs to the set

$$\mathcal{A}_3 \triangleq \Big\{ W_n(\theta) : \int_0^{2\pi} \mathrm{Trace}(W_n(\theta))\,d\theta \le P_n \Big\}. \quad (12)$$

The same constraint is introduced for the transmitter, so that A1 = {Wx(θ) : ∫_0^{2π} Trace(Wx(θ))dθ ≤ Px}. An achievable rate of the compound channel described by A3 is defined by

$$R_{a2} \triangleq \sup_{W_x \in \mathcal{A}_1} \inf_{W_n \in \mathcal{A}_3} J(W_x, W_n). \quad (13)$$

In the next two sections, the solutions of (11) and (13) are presented.
III. FIRST PROBLEM: CHANNEL UNKNOWN, NOISE KNOWN

When the channel matrix is completely known, its singular value decomposition gives n independent parallel channels, where n is the number of nonzero singular values [32]. But when the channel matrix is partially known, as in (8), the application of the singular value decomposition is not possible. This causes a problem for both the maximization and the minimization in (11).

Since (11) is a double-constrained optimization problem, the Lagrange multiplier technique is applied, which gives the necessary conditions that relate the optimal Wx(θ) and ∆(e^{jθ}). For simplicity of the derivation, it is assumed that Wn(θ) = Ip, while the case Wn(θ) ≠ Ip can be treated by considering W_n^{-1/2}(θ)H(e^{jθ}) as an equivalent channel frequency response matrix, where W_n^{1/2}(θ)W_n^{1/2,*}(θ) = Wn(θ).

Theorem 3.1: The solution of the maximin optimization problem (11) is given by

$$\frac{1}{4\pi}\int_0^{2\pi} \log\det\big(I_p + (H_{nom} + w\Delta^o)W_x^o(H_{nom} + w\Delta^o)^*\big)\,d\theta,$$

where the optimal Wx(θ) and ∆(e^{jθ}) are the solutions of

$$\frac{1}{4\pi}wW_x^o(H_{nom} + w\Delta^o)^*\big(I_p + (H_{nom} + w\Delta^o)W_x^o(H_{nom} + w\Delta^o)^*\big)^{-1} + K\Delta^{o,*} = 0,$$

and

$$(H_{nom} + w\Delta^o)^*\big(I + (H_{nom} + w\Delta^o)W_x^o(H_{nom} + w\Delta^o)^*\big)^{-1}(H_{nom} + w\Delta^o) = 4\pi\lambda_1 I_m,$$

where

$$\int_0^{2\pi} \mathrm{Trace}(W_x^o(\theta))\,d\theta = P_x, \quad (14)$$

$$\Delta^{o,*}(e^{j\theta})\Delta^o(e^{j\theta}) - I_m = 0, \quad (15)$$

and the constant scalar λ1 > 0 and the constant matrix K > 0 are Lagrange multipliers.

Proof. The proof is given in Appendix A.
Remark 3.2: Notice that in (8), the scalar-valued function
w(ejθ ) may be replaced by a matrix-valued function W (ejθ ),
while the forms of expressions in Theorem 3.1 remain unchanged if w(ejθ ) is substituted by W (ejθ ).
Theorem 3.1 provides the optimal power allocation for
the general form of perturbation ∆H(ejθ ). In this case, the
uncertainty is imposed on the singular values of H(ejθ )
as well as on the unitary matrices Uc (ejθ ) and Vc (ejθ ),
which are the factors in the singular value decomposition
of H(ejθ ) = Uc (ejθ )Σc (θ)Vc∗ (ejθ ). On the other hand, the
computation of the optimal power allocation given by Theorem
3.1 requires the application of numerical methods, and the
achievability of Ra1 might be difficult to prove.
Hence, the closed-form solution of (11) will be given for a specific structure of the matrix ∆(e^{jθ}),

$$\Delta(e^{j\theta}) = U(e^{j\theta})\begin{bmatrix}\Delta_1(e^{j\theta}) & 0\\ 0 & 0\end{bmatrix}V^*(e^{j\theta}). \quad (16)$$
Here, ∆1(e^{jθ}) is an n × n diagonal matrix, and n is the rank of H_nom(e^{jθ}), n ≤ min(p, m). U(e^{jθ}) and V(e^{jθ}) are p × p and m × m unitary matrices, respectively, which correspond to the singular value decomposition of the nominal frequency response matrix H_nom(e^{jθ}) = U(e^{jθ})Σ(θ)V^*(e^{jθ}), where

$$\Sigma = \begin{bmatrix}\Sigma_1 & 0\\ 0 & 0\end{bmatrix}_{p\times m}, \quad (17)$$

and Σ1(θ) = diag(σ1(θ), ..., σn(θ)) is a diagonal matrix whose elements are called the singular values of H_nom(e^{jθ}).
As it turns out, this assumption enables the application of Hadamard's inequality for the maximization of the mutual information with respect to the PSD matrix Wx(θ), as in [32]. The previous assumption also means that the uncertainty is imposed only on the singular values of H(e^{jθ}), but not on the unitary matrices Uc(e^{jθ}) and Vc(e^{jθ}). Because the mutual information is a function of the singular values, (11) subject to (16) provides an achievable rate when there is uncertainty in the singular values of the channel matrix H(e^{jθ}).
Physically, if H(e^{jθ}) models a wireless communication channel, this situation could correspond to a fixed wireless link in which the transmit and receive antennas do not move. Then, the angles of arrival and departure are known, implying that the matrices Uc(e^{jθ}) and Vc(e^{jθ}) are known. However, the channel gains may change due to variations in the propagation environment. In this case, the uncertainty in the channel gains is described by the uncertainty in the singular values, and (11) subject to (16) gives an achievable rate for this particular case.
The achievability of Ra1 in the special and general case is
addressed in Section V.
Theorem 3.3: The solution of the maximin optimization problem (11) subject to (16) is given by

$$R_{a1} = \frac{1}{4\pi}\sum_{i=1}^{n}\int_{S_i}\log\big[\mu(\sigma_i(\theta) - |w(e^{j\theta})|)^2\big]\,d\theta, \quad (18)$$

$$\sum_{i=1}^{n}\int_{S_i}\big(\mu - (\sigma_i(\theta) - |w(e^{j\theta})|)^{-2}\big)\,d\theta = P_x, \quad (19)$$

where

$$S_i \triangleq \{\theta : \mu - (\sigma_i(\theta) - |w(e^{j\theta})|)^{-2} > 0\}, \quad (20)$$

i = 1, ..., n, and µ is a positive constant related to the Lagrange multiplier of the power constraint associated with the set A1.
A matrix ∆1^o(e^{jθ}) which minimizes the mutual information rate is given by

$$\Delta_1^o(e^{j\theta}) = \begin{bmatrix} e^{-j\arg(w)+j\pi} & 0 & \cdots & 0\\ 0 & e^{-j\arg(w)+j\pi} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & e^{-j\arg(w)+j\pi} \end{bmatrix}. \quad (21)$$
Proof. The proof is given in Appendix A.
Corollary 3.4: If |w(e^{jθ})| is equal to zero, (18) reduces to the channel capacity formula for the case with no uncertainty [12].
Remark 3.5: In Appendix A, it is proved that the optimal solutions of (11), Wx^o(θ) and ∆^o(e^{jθ}), satisfy a saddle point property, which is equivalent to saying that

$$\sup_{W_x\in\mathcal{A}_1}\inf_{H\in\mathcal{A}_2} J(W_x, W_n) = \inf_{H\in\mathcal{A}_2}\sup_{W_x\in\mathcal{A}_1} J(W_x, W_n). \quad (22)$$

This is shown by solving the maximin and minimax problems directly, rather than by using the Minimax Theorem of von Neumann [33].
Theorem 3.3 shows that the optimal transmitted power is
given in the form of modified water-filling which implies that
Ra1 will be different from zero only if there exists an interval
of frequencies such that σi (θ) − |w(ejθ )| > 0, i=1,...,n. If the
uncertainty |w(ejθ )| grows over the whole frequency range,
at one point it will reach the lowest singular value σn (θ) of
the nominal frequency response matrix Hnom (ejθ ). Then, the
optimal way of transmission is to concentrate the power on the
rest of the channel modes, and not to allocate the power to the
mode that vanished because of uncertainty. By the same token,
if the uncertainty keeps growing, the modes will vanish one
by one, and at the end only the strongest mode will remain. In
addition, the form of (18) indicates that we may deal with the
worst case channel that is characterized by the singular values
of the nominal channel frequency response matrix Hnom (ejθ )
reduced by the size of the uncertainty set |w(ejθ )|. More
discussions on these issues are given in Section V.
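For concreteness, the modified water-filling of Theorem 3.3 can be carried out numerically; the sketch below is our illustration, not a procedure from the paper. It bisects on the water level µ until the power constraint (19) is met and then evaluates (18) on a frequency grid.

```python
import numpy as np

def modified_waterfilling(sigma, w_abs, Px, dtheta, iters=100):
    """sigma: (n, L) singular values on a grid; w_abs: (L,) uncertainty size.
    Returns (Ra1, mu). Implements (18)-(20) by bisection on mu."""
    gap = sigma - w_abs[None, :]          # sigma_i(theta) - |w(e^{j theta})|
    wells = np.where(gap > 0, 1.0 / np.maximum(gap, 1e-12) ** 2, np.inf)

    def power(mu):                        # left side of (19)
        alloc = np.maximum(mu - wells, 0.0)
        return alloc.sum() * dtheta

    lo, hi = 0.0, 1.0
    while power(hi) < Px:                 # bracket the water level
        hi *= 2.0
    for _ in range(iters):                # bisect on mu
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if power(mid) < Px else (lo, mid)
    mu = 0.5 * (lo + hi)

    active = mu - wells > 0               # the sets S_i in (20)
    Ra1 = np.log(mu * gap[active] ** 2).sum() * dtheta / (4 * np.pi)
    return Ra1, mu
```

Modes whose wells are deeper than the final water level receive no power, which is exactly the vanishing of the weaker modes described above.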
Another interesting point obtained from the water-filling (18) concerns the dependence of the optimal bandwidth on the uncertainty. Theorem 3.3 implies that the optimal bandwidth depends on the frequencies over which µ − (σi(θ) − |w(e^{jθ})|)^{−2}, i = 1, ..., n, is positive. The optimal power allocation is found by pouring the power, constrained by Px, into the wells (σi(θ) − |w(e^{jθ})|)^{−2}, i = 1, ..., n, up to the level µ. The shapes of the wells depend on the size of the uncertainty set |w(e^{jθ})| at each frequency θ. On the other hand, physical systems are subject to more uncertainty at high frequencies than at low frequencies. Thus, |w(e^{jθ})| should be assigned larger values at high frequencies.
This aspect is explained by considering a wireless fading channel. The low-pass representation of the time-varying impulse response of a wireless fading channel is given by [34]

$$c(\tau, t) = \sum_n \alpha_n(t)\exp(-j2\pi f_c\tau_n(t))\,\delta(\tau - \tau_n(t)), \quad (23)$$

where αn(t) and τn(t) are attenuations and delays, respectively, and fc is a carrier frequency. The phase of the signal θn(t) is determined by 2πfcτn(t). This means that a change in τn(t) by 1/fc results in a change in θn(t) by 2π. Therefore, for large fc, 1/fc is so small that a small motion in the transmission medium may change θn(t) by 2π. Hence, as frequency increases, the distance between a perturbed well (σi(θ) − |w(e^{jθ})|)^{−2} and the nominal σi^{−2}(θ) increases. Consequently, the perturbed well is narrower than the nominal one, as illustrated in Fig. 1. From this, we may deduce that the optimal bandwidth for an uncertain channel is smaller than the optimal bandwidth for a completely known channel. One has to keep in mind that the two wells have to be filled using a total power of Px.

Fig. 1. Modified water-filling: optimal bandwidth and uncertainty.
IV. SECOND PROBLEM: CHANNEL KNOWN, NOISE UNKNOWN

In this section, the solution of (13) is considered. The problem may be treated as a game between the transmitter x and "nature," which picks the PSD of the noise Wn(θ); this PSD is unknown but belongs to the set A3. We solve the maximin problem directly, by applying the Lagrange multiplier technique.

Theorem 4.1: The solution of the maximin optimization problem (13) is given by

$$\frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta})W_n^{o,-1}(\theta)\big)\,d\theta,$$

where the optimal PSD matrix of the noise satisfies the matrix Riccati equation

$$W_n^{o2}(\theta) + \frac{1}{2}W_n^o(\theta)H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta}) + \frac{1}{2}H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta})W_n^o(\theta) - \frac{1}{4\pi\lambda_1^o}H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta}) = 0, \quad (24)$$

while the optimal PSD matrix of the transmitted signal satisfies

$$H^*(e^{j\theta})W_n^{o,-1}(\theta)\big(I_p + H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta})W_n^{o,-1}(\theta)\big)^{-1}H(e^{j\theta}) = 4\pi\lambda_2^o I_m, \quad (25)$$

and λ1^o > 0 and λ2^o > 0 are the Lagrange multipliers associated with the constraint sets A3 and A1, respectively, which are computed from

$$\int_0^{2\pi}\mathrm{Trace}(W_x^o(\theta))\,d\theta = P_x, \quad (26)$$

$$\int_0^{2\pi}\mathrm{Trace}(W_n^o(\theta))\,d\theta = P_n. \quad (27)$$

Proof. The proof is given in Appendix B.
Remark 4.2: Theorem 4.1 shows that the optimal matrices Wx^o and Wn^o are related through the matrix Riccati equation, which emerges in the solutions of many optimal estimation and control problems. For instance, the time evolution of the error covariance matrix in Kalman filtering is described by a Riccati equation [34]. Further, when H(e^{jθ}) is square and invertible, some manipulation of (24) and (25) shows that

$$W_x^o(\theta) = \frac{\lambda_1^o}{\lambda_2^o}H^*(e^{j\theta})W_n^o(\theta)H^{-*}(e^{j\theta}). \quad (28)$$

Moreover, the optimal PSD matrices of the communicator and the noise are given by

$$W_x^o = \frac{\lambda_1^o}{\lambda_2^o}H^*\big(4\pi\lambda_2^o I_p + 4\pi\lambda_1^o HH^*\big)^{-1}H, \quad (29)$$

$$W_n^o = H\big(4\pi\lambda_2^o I_p + 4\pi\lambda_1^o H^*H\big)^{-1}H^*. \quad (30)$$

The formula for the achievable rate becomes

$$R_{a2} = \frac{1}{4\pi}\int_0^{2\pi}\log\det\Big(I_p + \frac{\lambda_1^o}{\lambda_2^o}H(e^{j\theta})H^*(e^{j\theta})\Big)\,d\theta. \quad (31)$$

When the channel is single-input single-output, (28) gives

$$W_x^o(\theta) = \frac{\lambda_1^o}{\lambda_2^o}W_n^o(\theta). \quad (32)$$

Thus, the explicit solution of the optimization problem (13) provides the explicit relation between Wx^o(θ) and Wn^o(θ): in terms of the Riccati matrix equation (24) and condition (25) in the general case, and by (28) in the special case of a square and invertible channel matrix H(e^{jθ}). In the special case, the optimal PSD matrix of the noise is proportional to the optimal PSD matrix of the transmitted signal, demonstrating that the optimal solution for both players is to match the opponent's statistical properties. This shows the advantage of a direct solution of the optimization problem (13), since it directly relates the optimal transmitter and noise strategies, which cannot be seen if the problem is solved through numerical techniques.
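In the square invertible case, (29) and (30) make the optimal pair easy to compute once λ1^o and λ2^o are fixed; the sketch below is ours and evaluates them on a frequency grid, locating multipliers satisfying (26)-(27) by a crude grid search that stands in for a proper root-finding procedure.

```python
import numpy as np

def psd_pair(H, lam1, lam2):
    """Optimal PSDs (29)-(30) at each frequency, for square invertible H.
    H: (L, p, p). Returns Wx, Wn of shape (L, p, p)."""
    L, p, _ = H.shape
    Wx = np.empty_like(H)
    Wn = np.empty_like(H)
    for k in range(L):
        Hk = H[k]
        Wx[k] = (lam1 / lam2) * Hk.conj().T @ np.linalg.inv(
            4 * np.pi * lam2 * np.eye(p) + 4 * np.pi * lam1 * Hk @ Hk.conj().T) @ Hk
        Wn[k] = Hk @ np.linalg.inv(
            4 * np.pi * lam2 * np.eye(p) + 4 * np.pi * lam1 * Hk.conj().T @ Hk) @ Hk.conj().T
    return Wx, Wn

def total_power(W, dtheta):
    # integral of Trace(W(theta)) over [0, 2*pi), cf. (26)-(27)
    return np.trace(W, axis1=1, axis2=2).real.sum() * dtheta

def fit_multipliers(H, Px, Pn, dtheta, grid=np.logspace(-3, 2, 60)):
    """Crude search for (lam1, lam2) meeting the power constraints."""
    best, best_err = None, np.inf
    for lam1 in grid:
        for lam2 in grid:
            Wx, Wn = psd_pair(H, lam1, lam2)
            err = abs(total_power(Wx, dtheta) - Px) + abs(total_power(Wn, dtheta) - Pn)
            if err < best_err:
                best, best_err = (lam1, lam2), err
    return best
```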
Remark 4.3: From the solution given in Appendix B, the optimal solutions Wx^o(θ) and Wn^o(θ) constitute a saddle point of the optimization problem given by (13), meaning

$$J(W_x, W_n^o) \le J(W_x^o, W_n^o) \le J(W_x^o, W_n), \quad (33)$$

for any Wx(θ) ∈ A1, Wx(θ) ≠ Wx^o(θ), and any Wn(θ) ∈ A3, Wn(θ) ≠ Wn^o(θ). In addition, it follows from Appendix B that

$$\sup_{W_x\in\mathcal{A}_1}\inf_{W_n\in\mathcal{A}_3} J(W_x, W_n) = \inf_{W_n\in\mathcal{A}_3}\sup_{W_x\in\mathcal{A}_1} J(W_x, W_n). \quad (34)$$
Hence, when the transmitter does not know the PSD of the
noise, it will use the PSD matrix Wxo (θ), which guarantees
the transmission rate of at least J(Wxo , Wno ) according to (33).
Similarly, “nature” tends to use the PSD matrix Wno (θ), which
is a scaled version of the optimal PSD matrix Wxo (θ).
In addition, the derived formula can be used to model transmission over a communication channel subject to interference from other users in the network. One such example is a public cellular system where all the mobile and base stations use the same number of antennas. If the channel matrix between the interferer and the base station of the cellular system is denoted by G(e^{jθ}) (of dimension p × p), then the mutual information rate is given by

$$\frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + HW_xH^*(GW_nG^*)^{-1}\big)\,d\theta = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + H_1W_xH_1^*W_n^{-1}\big)\,d\theta, \quad (35)$$

where H1(e^{jθ}) ≜ G^{-1}(e^{jθ})H(e^{jθ}). Here, it is assumed that G(e^{jθ}) is an invertible matrix.
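The equality in (35) is a whitening argument: multiplying the received signal by G^{-1} converts interference with PSD GWnG^* into noise with PSD Wn and replaces H by H1 = G^{-1}H. The following sketch, ours and using randomly generated matrices, checks the equality of the two integrands at a single frequency.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2
H = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
G = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Wx = np.eye(p)
Wn = np.eye(p)

H1 = np.linalg.inv(G) @ H                      # equivalent channel in (35)

lhs = np.linalg.slogdet(np.eye(p) + H @ Wx @ H.conj().T
                        @ np.linalg.inv(G @ Wn @ G.conj().T))[1]
rhs = np.linalg.slogdet(np.eye(p) + H1 @ Wx @ H1.conj().T
                        @ np.linalg.inv(Wn))[1]
assert np.isclose(lhs, rhs)                    # the two integrands in (35) agree
```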
V. ACHIEVABLE RATES
A. Achievable Rates for Uncertain Channel Matrix
Special case of additive perturbation. For the compound channel described by the set A2, it is shown that the supremum and the infimum of the problem defined by (11) can be interchanged (see Appendix A). Moreover, the form of (18) and (19) subject to (16) suggests that we are dealing with the worst case channel, determined by the singular values of the nominal channel matrix H_nom(e^{jθ}) reduced by the size of the uncertainty set |w(e^{jθ})|. Hence, the MIMO capacity depends only on the singular values of the channel frequency response and not on the unitary matrices U(e^{jθ}) and V(e^{jθ}) [12]. Therefore, all channel matrices that have the same singular values constitute an equivalence class, where all members of the class have the same channel capacity.
For the specific structure of ∆(e^{jθ})

$$\Delta(e^{j\theta}) = U(e^{j\theta})\Delta_s(e^{j\theta})V^*(e^{j\theta}), \quad (36)$$

where

$$\Delta_s(e^{j\theta}) \triangleq \begin{bmatrix}\Delta_1(e^{j\theta}) & 0\\ 0 & 0\end{bmatrix}, \quad (37)$$

any channel frequency response matrix from A2 is given by

$$H(e^{j\theta}) = U(e^{j\theta})\big(\Sigma(\theta) + \Delta_s(e^{j\theta})w(e^{j\theta})\big)V^*(e^{j\theta}), \quad (38)$$

where Σ(θ) + ∆s(e^{jθ})w(e^{jθ}) is diagonal. In order to achieve the worst case channel capacity, it is enough to diagonalize the channel matrix H(e^{jθ}) by precoding the transmitted signal by V(e^{jθ}) and by shaping the received signal by U^*(e^{jθ}), as shown in Fig. 2. Consequently, n parallel one-dimensional compound channels are obtained, which enables the use of n one-dimensional codes. Each one-dimensional compound channel is represented by

$$\sigma_i(\theta) + \delta_{s,ii}(e^{j\theta})w(e^{j\theta}), \quad i = 1, ..., n, \quad (39)$$
Fig. 2. Transmission scheme for uncertain channel matrix.
where δ_{s,ii}(e^{jθ}) is an element of ∆s(e^{jθ}) = diag(δ_{s,11}(e^{jθ}), ..., δ_{s,nn}(e^{jθ})), |δ_{s,ii}(e^{jθ})| ≤ 1, i = 1, ..., n. For each of the n one-dimensional channels defined by (39), the worst case channel is determined by σi(θ) − |w(e^{jθ})|, which represents the channel with the smallest magnitude out of all possible channels defined by (39).
Thus, following Shannon's work [35] and the notion of "degrading" (for one practical example of degrading channels, see [36]), it is possible to use n one-dimensional codes for the compound channel defined by A2 subject to (16), each of which is tuned to the corresponding worst case channel σi(θ) − |w(e^{jθ})|, i = 1, ..., n. Namely, a one-dimensional code tuned to the worst case channel will perform well over the whole class of channels defined by (39), because all other channels from the class are less detrimental than the worst case channel, i.e., they have larger magnitude. Moreover, we say that each channel from the set {σi(θ) + δ_{s,ii}(e^{jθ})w(e^{jθ}) : |δ_{s,ii}(e^{jθ})| ≤ 1} can be "degraded" to the worst case channel σi(θ) − |w(e^{jθ})|, −π ≤ θ ≤ π. The expression "degrading" means that each channel from this set can be transformed into σi(θ) − |w(e^{jθ})|, −π ≤ θ ≤ π, by reducing the channel magnitude at each frequency θ to σi(θ) − |w(e^{jθ})|. This implies that the set of n one-dimensional codes is universal for the special case of the MIMO compound channel defined by (16). Therefore, Ra1 represents the channel capacity of the MIMO compound channel defined by the set A2 subject to (16).
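The diagonalizing scheme behind (38)-(39) is easy to verify numerically: precoding by V and shaping by U^* reduces any channel of the form (38) to n parallel scalar channels. A sketch of ours with an arbitrary 2 × 2 example follows.

```python
import numpy as np

rng = np.random.default_rng(2)
p = m = 2

# Nominal channel at one frequency and its SVD
Hnom = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))
U, s, Vh = np.linalg.svd(Hnom)

# A channel from the class (38): perturb only the singular values
w = 0.2                                              # size of the uncertainty set
delta = np.exp(1j * rng.uniform(0, 2 * np.pi, m))    # |delta_i| = 1
H = U @ (np.diag(s) + np.diag(delta) * w) @ Vh

# Precode by V, shape by U^*: the equivalent channel is diagonal, cf. (39)
Heq = U.conj().T @ H @ Vh.conj().T
assert np.allclose(Heq, np.diag(np.diag(Heq)))       # parallel scalar channels

# Worst case magnitude per mode is sigma_i - |w|
print(np.abs(np.diag(Heq)), s - w)
```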
General case of additive perturbation. In the general case, ∆(e^{jθ}) does not have the form of (36). Next, consider the general form of the additive uncertainty description given by

$$H(e^{j\theta}) = H_{nom}(e^{j\theta}) + \Delta(e^{j\theta})w(e^{j\theta}) \quad (40)$$
$$= U(e^{j\theta})\Sigma(\theta)V^*(e^{j\theta}) + \Delta(e^{j\theta})w(e^{j\theta}) \quad (41)$$
$$= U\big(\Sigma + U^*\Delta Vw\big)V^*. \quad (42)$$
Thus, the second term within the parentheses in (42) corresponds to (36), i.e.,

$$\Delta_s(e^{j\theta}) \triangleq U^*(e^{j\theta})\Delta(e^{j\theta})V(e^{j\theta}). \quad (43)$$

It follows that the most general case of (42) is equivalent to the case when ∆s(e^{jθ}) is not diagonal. Then, ∆s(e^{jθ}) can be written as the sum of a diagonal and an off-diagonal part, namely, ∆s(e^{jθ}) = ∆s,diag(e^{jθ}) + ∆s,off-diag(e^{jθ}). The effect of the off-diagonal elements of ∆s(e^{jθ}) may be viewed as an additive noise, as explained next. If we use the same strategy as before, by precoding the transmitted signal by V(e^{jθ}) and shaping the received signal by U^*(e^{jθ}) (see [16]),
the equivalent channel representation is given by

$$H_{eq}(e^{j\theta}) = U^*(e^{j\theta})H(e^{j\theta})V(e^{j\theta}) \quad (44)$$
$$= \Sigma + \Delta_{s,diag}w + \Delta_{s,off\text{-}diag}w, \quad (45)$$

in which the output of the ith channel is represented in the frequency domain by

$$y_i(e^{j\theta}) = \big(\sigma_i(\theta) + \delta_{s,ii}(e^{j\theta})w(e^{j\theta})\big)x_i(e^{j\theta}) + \sum_{k\ne i}\delta_{s,ik}(e^{j\theta})w(e^{j\theta})x_k(e^{j\theta}) + n_i(e^{j\theta}). \quad (46)$$

Here, δ_{s,ij}(e^{jθ}), 1 ≤ i ≤ p, 1 ≤ j ≤ m, are the entries of ∆s(e^{jθ}). The off-diagonal elements δ_{s,ij}(e^{jθ}), i ≠ j, are absorbed into an equivalent noise

$$n_{eq,i}(e^{j\theta}) = n_i(e^{j\theta}) + \sum_{k\ne i}\delta_{s,ik}(e^{j\theta})w(e^{j\theta})x_k(e^{j\theta}). \quad (47)$$
Thus, one can use n one-dimensional codes, each tuned to the corresponding σi(θ) − |w(e^{jθ})| channel, and include the additional term, created by the off-diagonal elements of ∆s(e^{jθ}), in the noise. This means that the channel capacity in the most general case will be less than the value found in Theorem 3.3. It is interesting to note that this interpretation of the uncertainty is equivalent to the "interference of the second kind" introduced by Blachman in [5], where the overall interference has two parts, one orthogonal to the transmitted signal and the other antiparallel to it.
Remark 5.1: Ra1 given in Theorem 3.3 represents an upper
bound on the capacity of the MIMO compound channel
defined by the set A2 .
An achievable rate in the general case can be inferred from (46). First, ‖∆(e^{jθ})‖∞ ≤ 1 and (43) imply ‖∆s(e^{jθ})‖∞ ≤ 1 and |δ_{s,ij}(e^{jθ})| ≤ 1, 1 ≤ i ≤ p, 1 ≤ j ≤ m. Therefore, the power of the equivalent noise n_{eq,i}(e^{jθ}) can be upper bounded by

$$W_{n_{eq,i}} \le W_{n_i} + \sum_{k\ne i}|w|^2 W_{x_k}, \quad (48)$$

where W_{n_{eq,i}}(θ), W_{n_i}(θ), and W_{x_k}(θ) are the PSDs of the equivalent noise n_{eq,i} of the ith channel, the noise n_i of the ith channel, and the signals x_k coming from the transmitters k ≠ i, respectively. Also, it is assumed that the signals transmitted from different antennas are independent. Having (48) in mind, an achievable rate for the general case can be computed; it is given by the following theorem.
Theorem 5.2: An achievable rate for the compound channel defined by the set A2 is the solution of the following optimization problem:

$$\sup\; \frac{1}{4\pi}\sum_{i=1}^{n}\int_0^{2\pi}\log\Bigg(1 + \frac{W_{x_i}(\sigma_i - |w|)^2}{W_{n_i} + \sum_{k\ne i}|w|^2W_{x_k}}\Bigg)\,d\theta \quad \text{s.t.}\quad \sum_{i=1}^{n}\int_0^{2\pi}W_{x_i}\,d\theta = P_x. \quad (49)$$
Proof. The proof follows from the achievability for single-input single-output channels.
From (47), the equivalent noise at the ith receiver consists of two parts: one that is the product of the thermal noise at the ith receiver, and the other that is the product of the interference coming from the signals generated at the transmitters k ≠ i.
Corollary 5.3: If it is assumed that $\sum_{k\ne i}|w|^2W_{x_k}$ is negligible compared to $W_{n_i}$ in Theorem 5.2, then the ensemble of n one-dimensional codes can achieve the transmission rate Ra1 given by Theorem 3.3.
In practice, this could happen if the noise n is used not only
to model the thermal noise, but also the interference from other
users in a communication network.
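The objective of (49) is simple to evaluate for any candidate power allocation; the sketch below is our illustration and computes the general-case rate for given per-mode PSDs, with the cross terms of (48) acting as extra noise.

```python
import numpy as np

def general_case_rate(sigma, w_abs, Wx, Wn, dtheta):
    """Objective of (49). sigma, Wx, Wn: arrays of shape (n, L);
    w_abs: (L,). All sampled on a uniform frequency grid."""
    n, L = sigma.shape
    total = 0.0
    for i in range(n):
        cross = (w_abs ** 2) * (Wx.sum(axis=0) - Wx[i])   # sum over k != i
        snr = Wx[i] * np.maximum(sigma[i] - w_abs, 0.0) ** 2 / (Wn[i] + cross)
        total += np.log1p(snr).sum() * dtheta
    return total / (4 * np.pi)
```

Setting the cross terms to zero recovers the integrand of (18), which is the situation described in Corollary 5.3.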
B. Achievable Rates for Uncertain Noise
When the noise uncertainty is described by the set A3 , the
optimal PSD matrix of the transmitted signal Wxo (θ) and the
optimal PSD matrix of the noise Wno (θ) satisfy the saddle
point condition, as proved in Appendix B. This suggests that
the notion of the worst case noise can be employed.
To show the achievability of Ra2 given by Theorem 4.1, we assume the following: 1) the receiver has knowledge of the noise PSD matrix Wn(θ), while the transmitter does not have to know it; 2) the transmitter uses a "Gaussian codebook." In this paper, the "Gaussian codebook" is defined as the set of codewords $\{x_i\}_{i=1}^{M}$ of length N, where $x_i = [x_i(0), ..., x_i(N-1)]$, $x_i(k) \in \mathbb{C}^m$, k = 0, ..., N−1, are the sample paths of M independent discrete-time Gaussian processes, all having the same PSD matrix Wx^o(θ), the one that is optimal with respect to the worst case noise PSD, Wn^o(θ). It is assumed that the stochastic processes which generate the codewords are ergodic. Then, the achievable transmission rate is J(Wx^o(θ), Wn(θ)) for any choice of the noise PSD matrix Wn(θ) (proved in [12]). By the saddle point property, J(Wx^o(θ), Wn^o(θ)) ≤ J(Wx^o(θ), Wn(θ)), so the achievable transmission rate is lower bounded by J(Wx^o(θ), Wn^o(θ)). This transmission rate is achievable when the worst case noise (having PSD Wn^o(θ)) affects the transmission. Hence, Ra2 = J(Wx^o(θ), Wn^o(θ)) is an achievable rate for the whole class of noises A3, subject to the assumption that the receiver knows Wn(θ) while the transmitter does not have to know it. The transmitter has to know J(Wx^o(θ), Wn^o(θ)) in order to choose a transmission rate R < J(Wx^o(θ), Wn^o(θ)). Therefore, a "Gaussian codebook" provides robustness when the transmitter does not know the
noise. However, the previous consideration does not prove the
universality of the “Gaussian codebook”, i.e., it does not show
the existence of the single code that will be good for all
noises in A3 . Rather, based on the classical result [12] and
the saddle point property found here, it can be shown that
the probability of the decoding error, averaged over randomly
chosen codebooks, tends to zero, for any noise from A3 .
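One standard way to realize such a "Gaussian codebook" is to draw each codeword as a sample path of a stationary Gaussian process with the target PSD; the sketch below is our generic construction for the scalar case, not a procedure from the paper, and shapes white noise in the frequency domain.

```python
import numpy as np

def gaussian_codebook(M, N, psd, rng):
    """M codewords of length N from a real stationary Gaussian process
    whose PSD is sampled in `psd` (length N, nonnegative, theta_k = 2*pi*k/N).
    Shaping is done by filtering white noise in the frequency domain."""
    shape = np.sqrt(psd)
    codewords = []
    for _ in range(M):
        z = rng.standard_normal(N)
        x = np.fft.irfft(np.fft.rfft(z) * shape[: N // 2 + 1], n=N)
        codewords.append(x)
    return np.array(codewords)

rng = np.random.default_rng(3)
theta = 2 * np.pi * np.arange(1024) / 1024
psd = 1.0 / np.abs(1 - 0.5 * np.exp(-1j * theta)) ** 2   # example target PSD
C = gaussian_codebook(M=8, N=1024, psd=psd, rng=rng)
```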
VI. EXAMPLES
A. Uncertain Channel
The following example illustrates the computation of the
achievable rates, when the uncertainty with respect to the
channel frequency response matrix is described by the set A2 .
The nominal channel frequency response matrix is given by

$$H_{nom}(e^{j\theta}) = \begin{bmatrix}\dfrac{1}{1-\frac{1}{2}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{4}e^{-j\theta}}\\[4pt] \dfrac{1}{1-\frac{1}{4}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{2}e^{-j\theta}}\end{bmatrix}, \quad (50)$$
while the upper bound on the uncertainty set is given by

$$H_{ub}(e^{j\theta}) = \begin{bmatrix}\dfrac{\delta}{1-\frac{1}{2}e^{-j\theta}} & \dfrac{1.4}{1-\frac{1}{4}e^{-j\theta}}\\[4pt] \dfrac{1.4}{1-\frac{1}{4}e^{-j\theta}} & \dfrac{\delta}{1-\frac{1}{2}e^{-j\theta}}\end{bmatrix}, \quad (51)$$

where δ is a known constant. The magnitudes of [H_nom(e^{jθ})]_{11} and [H_ub(e^{jθ})]_{11} are shown in Fig. 3, for δ = 1.6.

Fig. 3. Magnitudes of [H_nom(e^{jθ})]_{11} and [H_ub(e^{jθ})]_{11}.
Fig. 4. σ1(θ) and |w(e^{jθ})|.
The size of the uncertainty set is determined by observing that

$$\bar{\sigma}\big(H_{ub}(e^{j\theta}) - H_{nom}(e^{j\theta})\big) = \bar{\sigma}\big(\Delta(e^{j\theta})w(e^{j\theta})I_m\big) \le |w(e^{j\theta})|.$$

For the connection between the maximal singular value and the ‖·‖∞ norm, see Appendix C. It is assumed that |w(e^{jθ})| = σ̄(H_ub(e^{jθ}) − H_nom(e^{jθ})) for each frequency θ ∈ [−π, π].
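The quantities in this example are easy to reproduce: sampling (50) and (51) on a frequency grid and taking the largest singular value of the difference yields |w(e^{jθ})|, after which the usable modes follow from (18)-(20). A sketch of ours:

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 512)
delta = 1.6

def hnom(t):
    a, b = 1 / (1 - 0.5 * np.exp(-1j * t)), 1 / (1 - 0.25 * np.exp(-1j * t))
    return np.array([[a, b], [b, a]])

def hub(t):
    a = delta / (1 - 0.5 * np.exp(-1j * t))
    b = 1.4 / (1 - 0.25 * np.exp(-1j * t))
    return np.array([[a, b], [b, a]])

# |w(e^{j theta})| = max singular value of Hub - Hnom at each frequency
w_abs = np.array([np.linalg.svd(hub(t) - hnom(t), compute_uv=False)[0]
                  for t in theta])
sigma = np.array([np.linalg.svd(hnom(t), compute_uv=False) for t in theta])

# Modes are usable where sigma_i(theta) - |w| > 0, cf. (18)-(20)
print((sigma[:, 1] > w_abs).mean())   # fraction of band where mode 2 survives
```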
In Figs. 4 and 5, the size of the uncertainty set |w(e^{jθ})| is compared to the singular values σ1(θ) and σ2(θ) of the nominal frequency response matrix H_nom(e^{jθ}), respectively, for δ = 1.6. From Fig. 5, the frequency range over which transmission over the second mode is optimal can be determined. As suggested by (18) and (19), the transmission takes place only over the frequencies where σ2(θ) > |w(e^{jθ})|. This implies that for a larger size of the uncertainty set |w(e^{jθ})|, the weaker mode will not be used in the transmission. In the case of the first singular value, the size of the uncertainty set |w(e^{jθ})| is smaller than σ1(θ) for all frequencies between −π and π, implying that all frequencies will be used for the transmission.

Fig. 5. σ2(θ) and |w(e^{jθ})|.
It is interesting to observe (19). It requires the positivity of µ − (σi(θ) − |w(e^{jθ})|)^{−2}, i = 1, ..., n. Fig. 6 shows (σ1(θ) − |w(e^{jθ})|)^{−2}, while Figs. 7 and 8 show (σ2(θ) − |w(e^{jθ})|)^{−2}. The peaks in Fig. 7 occur at the frequencies where σ2(θ) = |w(e^{jθ})|. In order to have transmission over the ith mode, the constant µ, which is chosen in accordance with the power constraint (19), must be larger than (σi(θ) − |w(e^{jθ})|)^{−2} over an interval of frequencies θ ∈ [−π, π]. Because σi(θ) − |w(e^{jθ})| is smaller for smaller singular values of the nominal channel frequency response matrix, (σi(θ) − |w(e^{jθ})|)^{−2} is larger for smaller singular values. This can be verified from Figs. 6 and 8. Thus, for small values of the transmitted power Px, the level of power poured into (σi(θ) − |w(e^{jθ})|)^{−2} may be small. Hence, for small transmitted power Px, it could happen that the weaker modes are left without power. In fact, larger uncertainty makes the weaker modes even weaker, which may restrict the transmission to the strongest mode only when the transmitted power is small.

Fig. 6. (σ1(θ) − |w(e^{jθ})|)^{−2}.
Fig. 7. (σ2(θ) − |w(e^{jθ})|)^{−2}.
Fig. 8. (σ2(θ) − |w(e^{jθ})|)^{−2}.
Fig. 9 shows the achievable rate versus the parameter δ, which determines the size of the uncertainty set, for a fixed value of Px = 1 W. As expected, the achievable rate decreases as the size of the uncertainty set increases.

Fig. 9. Achievable rate Ra1 vs. δ.
B. Uncertain Noise
Next, it will be shown how the achievable rates for the
MIMO communication channel can be computed when the
PSD matrix of the noise Wn (θ) belongs to the set A3 . The
channel frequency response is completely known, and it is
given by

$$H(e^{j\theta}) = \begin{bmatrix}\dfrac{1}{1-\frac{1}{2}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{4}e^{-j\theta}}\\[4pt] \dfrac{1}{1-\frac{1}{4}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{2}e^{-j\theta}}\end{bmatrix}. \quad (52)$$
In this case, the channel frequency response matrix H(e^{jθ}) is square and invertible. To find all required quantities, (29) and (30) are used. Fig. 10 shows the channel capacity vs. SNR = 10 log Px/Pn. The transmitted power is limited by Px = 0.1 W, while Pn is varied in a certain range.

Fig. 10. Achievable rate Ra2 vs. SNR.

It is interesting to notice that the Lagrange multipliers λ1^o and λ2^o represent the derivatives of Ra2 with respect to Pn and Px, respectively. The multipliers are shown in Figs. 11 and 12. It can be seen that Ra2 is more sensitive to variations in Pn and Px for high SNR. However, a better picture regarding the sensitivity of Ra2 with respect to Pn and Px is obtained by comparing the relative perturbation of Ra2, ∆Ra2/Ra2, to the relative perturbations of Pn, ∆Pn/Pn, and of Px, ∆Px/Px. Viewing Ra2 as a function of Pn and Px, and taking the limits with respect to Pn and Px, we get

$$S_n = \lim_{P_n\to 0}\frac{\Delta R_{a2}/R_{a2}}{\Delta P_n/P_n} = \frac{P_n}{R_{a2}}\frac{dR_{a2}}{dP_n}, \quad (53)$$

$$S_x = \lim_{P_x\to 0}\frac{\Delta R_{a2}/R_{a2}}{\Delta P_x/P_x} = \frac{P_x}{R_{a2}}\frac{dR_{a2}}{dP_x}. \quad (54)$$

The ratios of relative perturbations are shown in Figs. 13 and 14. For this particular example, the relative perturbation of Ra2 is between 1/3 and 1/2 of the relative perturbations of Pn and Px. Thus, the sensitivity of Ra2 with respect to Pn and Px is not significant.
Fig. 11. Derivative of Ra2 w.r.t. Pn.
Fig. 12. Derivative of Ra2 w.r.t. Px.
Fig. 13. Sensitivity w.r.t. Pn.
Fig. 14. Sensitivity w.r.t. Px.

VII. CONCLUSION

This paper deals with information theoretic bounds for compound MIMO Gaussian channels with memory, for two different types of uncertainties. The uncertainty of the channel frequency response matrix is described through a subset of the H^∞ space, while the uncertainty of the noise PSD matrix is represented through a subset of the L^1 function space. For both problems, explicit formulas for the achievable rates are derived, as well as the optimal PSD matrices of the transmitted signals. For the case when the noise is uncertain, the optimal PSD matrix of the noise is derived, and it is shown that it is related to the optimal PSD matrix of the transmitted signal through a Riccati matrix equation. In the special case when the channel frequency response matrix is square and invertible, the optimal PSD matrix of the noise is proportional to the optimal PSD matrix of the transmitted signal. For the case when the channel frequency response matrix is uncertain, the achievable rate and the optimal PSD matrix of the transmitted signal depend on the size of the uncertainty set. From these two formulas, it can be concluded that transmission over the strongest mode of the nominal channel frequency response matrix is the optimal strategy when the size of the uncertainty set increases. In addition, it is shown that, under certain conditions, the worst case channel can be identified and that the worst case code, i.e., the code that performs well over the worst case channel, is universal for the compound channel of the first type.

ACKNOWLEDGMENT

The authors would like to thank Professor Frank Kschischang of the Department of Electrical and Computer Engineering, University of Toronto, for his useful comments and suggestions.
APPENDIX A
PROOF OF THEOREM 3.1 AND THEOREM 3.3
Notice that the constraint H(ejθ ) ∈ A2 is the same as
k∆(ejθ )k∞ ≤ 1. Instead of working with this constraint,
we will work with the constraint ∆∗ (ejθ )∆(ejθ ) − Im ≤ 0,
for θ ∈ [0, 2π]. This notation means that ∆∗ ∆ − Im is
non-positive definite. The latter constraint implies the former
(see (86)). This modification does not change the problem;
it turns out that the optimal ∆o (ejθ ) lies on the boundary,
k∆o (ejθ )k∞ = 1.
Next, define the Lagrangian J1(Wx, ∆, K):

$$J_1 = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + (H_{nom}+w\Delta)W_x(H_{nom}+w\Delta)^*\big)\,d\theta + \int_0^{2\pi}\mathrm{Trace}\big[K(\Delta^*\Delta - I_m)\big]\,d\theta,$$
where K ≥ 0 is a Lagrangian matrix having appropriate dimensions. Next, we use the Kuhn-Tucker conditions [37]. The variation of J1 with respect to ∆(e^{jθ}) gives a necessary condition for the infimum of J1 with respect to ∆(e^{jθ}):

$$\frac{1}{4\pi}wW_x(H_{nom}+w\Delta^o)^*\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)^{-1} + K\Delta^{o,*} = 0, \quad (55)$$
which cannot be solved explicitly. Therefore, it has to be accounted for as an equality constraint in order to resolve the supremum part. The original problem $\sup_{W_x\in\mathcal{A}_1}\inf_{\|\Delta\|_\infty\le 1} J(W_x, W_n)$ is equivalent to $\sup_{W_x\in\mathcal{A}_1}\sup_{K\ge 0} J_1(W_x, \Delta^o, K)$ subject to the constraint (55) (see [37], [38]).
Further, introduce the Lagrangian J2(Wx, ∆^o, K, S, λ1) by

$$J_2 = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)\,d\theta + \int_0^{2\pi}\mathrm{Trace}\big[K(\Delta^{o,*}\Delta^o - I_m)\big]\,d\theta$$
$$- \int_0^{2\pi}\mathrm{Trace}\Big[S\Big(\frac{1}{4\pi}wW_x(H_{nom}+w\Delta^o)^* + K\Delta^{o,*}\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)\Big)\Big]\,d\theta - \lambda_1\Big(\int_0^{2\pi}\mathrm{Trace}(W_x)\,d\theta - P_x\Big), \quad (56)$$
where λ1 and S are Lagrange multipliers; λ1 is a positive constant, and S is a matrix having appropriate dimensions. Then

$$\sup_{W_x\in\mathcal{A}_1}\sup_{K\ge 0} J_1(W_x, \Delta^o, K), \quad (57)$$

subject to the constraint (55), is equivalent to

$$\inf_{\lambda_1\ge 0}\inf_{S}\sup_{K\ge 0}\sup_{W_x\in\mathcal{A}_1} J_2(W_x, \Delta^o, K, S, \lambda_1), \quad (58)$$
because (58) is a dual problem of (57) [37], [38]. Since Wx(θ) and ∆^o(e^{jθ}) are related through the equality constraint (55), the Lagrangian J2 has to be varied with respect to both Wx(θ) and ∆^o(e^{jθ}). By varying J2 with respect to ∆^o(e^{jθ}) and equating the derivative to zero, the following equation is obtained:

$$\frac{1}{4\pi}wW_x(H_{nom}+w\Delta^o)^*\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)^{-1} + K\Delta^{o,*} = W_xH_{nom}^*SK\Delta^{o,*}w + W_x\Delta^{o,*}SK\Delta^{o,*}|w|^2, \quad (59)$$

where the term on the left hand side is equal to zero (see (55)). This observation implies that

$$wW_x(\theta)\big(H_{nom}^*(e^{j\theta}) + w^*(e^{j\theta})\Delta^{o,*}(e^{j\theta})\big)SK\Delta^{o,*}(e^{j\theta}) = 0.$$
It follows that either Hnom (ejθ ) + w(ejθ )∆o (ejθ ) = 0, or
S = 0, or K = 0, or some combination of the previous
conditions is true. If H_nom(e^{jθ}) + w(e^{jθ})∆^o(e^{jθ}) = 0, then it follows from (55) that K = 0, meaning that the constraint imposed on ∆^o(e^{jθ}) vanishes and Ra1 is equal to zero, which is a trivial solution. Thus, the possibility that remains is S = 0. This and the Lagrangian J2 indicate that the sup and inf problems may be solved independently. To verify this claim, observe that when S = 0, the Lagrangian J2 in (56) corresponds to the Lagrangian when a saddle point exists. When a saddle point exists, the Lagrangian consists of the payoff function, the term that describes the constraint on ∆(e^{jθ}), and the term that describes the constraint on Wx(θ), while the Lagrangian is varied with respect to ∆(e^{jθ}) and Wx(θ) as if they were independent. This is exactly the case of (56) for S = 0. Hence, the conclusion is that

$$\sup_{W_x\in\mathcal{A}_1}\inf_{H\in\mathcal{A}_2} J(W_x(\theta), W_n(\theta)) = \inf_{H\in\mathcal{A}_2}\sup_{W_x\in\mathcal{A}_1} J(W_x(\theta), W_n(\theta)). \quad (60)$$
Thus, for S = 0, we vary the Lagrangian J2 with respect to Wx(θ) to get

$$(H_{nom}+w\Delta^o)^*\big(I + (H_{nom}+w\Delta^o)W_x^o(H_{nom}+w\Delta^o)^*\big)^{-1}(H_{nom}+w\Delta^o) = 4\pi\lambda_1 I_m, \quad (61)$$

which represents the equation satisfied by the optimal Wx^o(θ). From the Kuhn-Tucker conditions [37], [38],

$$\lambda_1\Big[\int_0^{2\pi}\mathrm{Trace}(W_x^o(\theta))\,d\theta - P_x\Big] = 0, \quad (62)$$

$$\int_0^{2\pi}\mathrm{Trace}\big[K(\Delta^{o,*}\Delta^o - I_m)\big]\,d\theta = 0. \quad (63)$$

Further, by observing that λ1 ≠ 0 and K ≠ 0 (because the opposite conditions imply the trivial solution C = 0), it follows that

$$\int_0^{2\pi}\mathrm{Trace}(W_x^o(\theta))\,d\theta = P_x, \quad (64)$$

$$\Delta^{o,*}\Delta^o - I_m = 0, \quad (65)$$

concluding the proof of Theorem 3.1.
However, Theorem 3.1 does not provide us with an explicit solution for ∆^o(e^{jθ}). The reason for this is that the ‖·‖∞ norm puts the constraint on ∆^*(e^{jθ})∆(e^{jθ}), but not on ∆(e^{jθ}) itself. So, in Theorem 3.3, we have to make an additional step to find ∆^o(e^{jθ}).

Next, it is shown why we deal with a particular case of ∆(e^{jθ}). We find what conditions should be satisfied such that the integrand

$$\det\big(I_p + (H_{nom}+w\Delta)W_x(H_{nom}+w\Delta)^*\big) \quad (66)$$

can be maximized with respect to Wx(θ) by using Hadamard's inequality. Having in mind that H_nom(e^{jθ}) = U(e^{jθ})Σ(θ)V^*(e^{jθ}), and using the fact that det(I_n + AB) = det(I_m + BA) (where A is an n × m matrix and B is an m × n matrix), the integrand can be expressed as

$$\det\big(I_m + V^*W_xV(\Sigma + wU^*\Delta V)^*(\Sigma + wU^*\Delta V)\big) = \det(I_m + \tilde{W}_xR^*R), \quad (67)$$
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. X, NO. X, 200X
4
4
where $R = \Sigma + wU^*\Delta V$ and $\tilde{W}_x = V^* W_x V$. Further, by using Hadamard's inequality (as in [32]), the determinant can be upper bounded by the product of the diagonal elements,
$$\det\left(I_p + \tilde{W}_x R^* R\right) \le \prod_{i=1}^{n} \left(1 + [Q]_{ii}\right), \tag{68}$$
where
$$Q = \tilde{W}_x R^* R. \tag{69}$$
The equality in (68) is achieved when $Q$ is diagonal. If $R^* R$ were diagonal, Hadamard's inequality could be used to maximize the mutual information rate over $\tilde{W}_x$, and the maximizing $\tilde{W}_x$ would then be diagonal. Therefore, the original sup-inf problem is solved by using this information concerning the diagonal property of $\tilde{W}_x$ and $R^* R$ to determine $\Delta^o(e^{j\theta})$, which minimizes $J(W_x(\theta), W_n(\theta))$.
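The role of Hadamard's inequality here can be illustrated numerically; the following editorial sketch (random test matrices, not quantities from the paper) confirms the bound (68) and its equality case when $R^* R$, and hence $Q$, is diagonal.

# Editorial sketch (random test matrices, not quantities from the paper):
# the Hadamard-type bound (68) and its equality case for diagonal Q.
import numpy as np

rng = np.random.default_rng(0)
n = 3
R = rng.normal(size=(n, n)) + 1j*rng.normal(size=(n, n))
Wx = np.diag(rng.uniform(0.1, 1.0, size=n))       # diagonal W~x >= 0

Q = Wx @ (R.conj().T @ R)
lhs = np.linalg.det(np.eye(n) + Q).real
rhs = np.prod(1.0 + np.diag(Q).real)
print(lhs <= rhs + 1e-9)                          # bound (68) holds

R_diag = np.diag(rng.uniform(0.5, 2.0, size=n))   # makes R*R diagonal
Q2 = Wx @ (R_diag.conj().T @ R_diag)
print(np.isclose(np.linalg.det(np.eye(n) + Q2).real,
                 np.prod(1.0 + np.diag(Q2).real)))  # equality in (68)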
Assume $\tilde{W}_x$ is diagonal and choose the matrix $\Delta(e^{j\theta})$ as follows
$$\Delta^o(e^{j\theta}) = U(e^{j\theta}) \begin{bmatrix} \Delta_1(e^{j\theta}) & 0 \\ 0 & 0 \end{bmatrix} V^*(e^{j\theta}), \tag{70}$$
where $\Delta_1(e^{j\theta})$ is an $n \times n$ matrix, $\Delta_1(e^{j\theta}) = \mathrm{diag}(\delta_1, \ldots, \delta_n)$, and $n$ is the rank of $H_{\mathrm{nom}}(e^{j\theta})$, $n \le \min(p, m)$. This ensures that $R^* R$ is diagonal. Then
$$\det\left(I_p + \tilde{W}_x R^* R\right) = \prod_{i=1}^{n} \left(1 + [\tilde{W}_x]_{ii}\, |\sigma_i + \delta_i w|^2\right), \tag{71}$$
where $\{\sigma_i\}_{i=1}^{n}$ are the singular values of $H_{\mathrm{nom}}(e^{j\theta})$. Note that $\Delta_1^* \Delta_1$ is the identity matrix, which follows from $\Delta^* \Delta = I_m$ (see (65)). It follows that $|\delta_i| = 1$, $i = 1, \ldots, n$. The determinant in (71) is minimized when $\xi_i = |\sigma_i + \delta_i w|^2$ is minimized, and $\xi_i$ has the following lower bound
$$\xi_i = \sigma_i^2 + 2\sigma_i |\delta_i| |w| \cos(\arg(\delta_i) + \arg(w)) + |\delta_i|^2 |w|^2 \ge (\sigma_i - |w|)^2, \tag{72}$$
where $\delta_i = |\delta_i| e^{j\arg(\delta_i)} = e^{j\arg(\delta_i)}$ and $w = |w| e^{j\arg(w)}$. The matrix $\Delta_1(e^{j\theta})$ that achieves the lower bound is given by (21). The optimal PSD matrix $W_x(\theta)$ is found by substituting (70) into (61) and using (64).
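To make the last step concrete, the following editorial sketch with illustrative values picks $\arg(\delta_i) = \pi - \arg(w)$, which drives the cosine in (72) to $-1$ and yields the worst-case gains $(\sigma_i - |w|)^2$, and then water-fills the transmit power over those gains; the water-filling form $\max(0, \mu - 1/\xi_i)$ is the diagonal specialization of (61), with the level $\mu$ fixed by the power constraint (64).

# Editorial sketch with illustrative values: the phase choice attaining the
# lower bound in (72), then water-filling over the worst-case gains.
import numpy as np

sigma = np.array([2.0, 1.2, 0.5])   # singular values of H_nom (arbitrary)
w = 0.4*np.exp(1j*0.7)              # uncertainty weight (arbitrary)
P_x = 3.0

# arg(delta_i) = pi - arg(w) drives the cosine in (72) to -1
delta = np.exp(1j*(np.pi - np.angle(w)))
xi = np.abs(sigma + delta*w)**2
print(np.allclose(xi, (sigma - abs(w))**2))   # worst-case gains attained

# bisection on the water level mu so that total power equals P_x
lo, hi = 0.0, P_x + 1.0/xi.min()
for _ in range(60):
    mu = 0.5*(lo + hi)
    if np.maximum(0.0, mu - 1.0/xi).sum() < P_x:
        lo = mu
    else:
        hi = mu
print(np.maximum(0.0, mu - 1.0/xi))           # optimal diagonal of W~x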
APPENDIX B
PROOF OF THEOREM 4.1

The achievable rate $R_{a2}$ in the case of the noise uncertainty is given in the form of the double optimization problem (13)
$$\sup_{W_x \in \mathcal{A}_1}\ \inf_{W_n \in \mathcal{A}_2} J(W_x(\theta), W_n(\theta)).$$
We formulate the Lagrangian $J_1(W_x, W_n, \lambda_1)$
$$J_1 = \frac{1}{4\pi}\int_0^{2\pi} \log\det\left(I + H W_x H^* W_n^{-1}\right) d\theta + \lambda_1\left(\int_0^{2\pi} \mathrm{Trace}(W_n)\, d\theta - P_n\right), \tag{73}$$
where $\lambda_1$ is a constant representing the Lagrange multiplier associated with the constraint imposed on $W_n$. Next, we apply the Kuhn-Tucker conditions [37], which give, as in (62),
$$\lambda_1 \left[\int_0^{2\pi} \mathrm{Trace}(W_n^o(\theta))\, d\theta - P_n\right] = 0. \tag{74}$$
The variation of $J_1$ with respect to $W_n$ gives the quadratic, Riccati-type equation
$$W_n^{o2}(\theta) + \frac{1}{2} W_n^o(\theta) H(e^{j\theta}) W_x^o(\theta) H^*(e^{j\theta}) + \frac{1}{2} H(e^{j\theta}) W_x^o(\theta) H^*(e^{j\theta}) W_n^o(\theta) - \frac{1}{4\pi\lambda_1^o} H(e^{j\theta}) W_x^o(\theta) H^*(e^{j\theta}) = 0, \tag{75}$$
which cannot be solved explicitly. Therefore, it has to be accounted for as an equality constraint in order to resolve the supremum part. The original problem $\sup_{W_x \in \mathcal{A}_1} \inf_{W_n \in \mathcal{A}_2} J(W_x, W_n)$ is equivalent to $\sup_{W_x \in \mathcal{A}_1} \sup_{\lambda_1 \ge 0} J_1(W_x, W_n^o, \lambda_1)$ subject to (75) (dual problem [37], [38]).
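For intuition (an editorial aside, not part of the proof), in the scalar case (75) collapses to the quadratic $w_n^2 + a w_n - a/(4\pi\lambda_1) = 0$ with $a = |h|^2 w_x$; the sketch below, with arbitrary values, solves it and verifies that the root satisfies the stationarity condition of the scalar Lagrangian.

# Editorial sketch (arbitrary values): scalar version of (75),
# wn^2 + a*wn - a/(4*pi*lam) = 0 with a = |h|^2 * wx, solved by the
# quadratic formula and checked against stationarity of the Lagrangian.
import numpy as np

h, wx, lam = 1.3, 0.8, 0.05
a = abs(h)**2 * wx
wn = (-a + np.sqrt(a**2 + a/(np.pi*lam)))/2.0   # positive root

# d/dwn [(1/4pi) log(1 + a/wn) + lam*wn] = -(a/4pi)/(wn^2 + a*wn) + lam
grad = -(a/(4.0*np.pi))/(wn**2 + a*wn) + lam
print(wn, grad)                                  # grad is numerically ~0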
Further, introduce the Lagrangian $J_2(W_x, W_n^o, \lambda_1, \lambda_2, K)$
$$J_2 = \frac{1}{4\pi}\int_0^{2\pi} \log\det\left(I_p + H W_x H^* W_n^{o-1}\right) d\theta + \lambda_1\left(\int_0^{2\pi} \mathrm{Trace}(W_n^o)\, d\theta - P_n\right) - \lambda_2\left(\int_0^{2\pi} \mathrm{Trace}(W_x)\, d\theta - P_x\right) - \int_0^{2\pi} \mathrm{Trace}\left[K\left(W_n^{o2} + \frac{1}{2} H W_x H^* W_n^o + \frac{1}{2} W_n^o H W_x H^* - \frac{1}{4\pi\lambda_1^o} H W_x H^*\right)\right] d\theta, \tag{76}$$
where $\lambda_2$ is a constant and $K$ is a matrix, representing Lagrange multipliers. Then,
$$\sup_{W_x \in \mathcal{A}_1}\ \sup_{\lambda_1 \ge 0} J_1(W_x, W_n^o, \lambda_1), \tag{77}$$
subject to (75), is equivalent to
$$\inf_{\lambda_2 \ge 0}\ \inf_{K}\ \sup_{\lambda_1 \ge 0}\ \sup_{W_x \in \mathcal{A}_1} J_2(W_x, W_n^o, \lambda_1, \lambda_2, K) \tag{78}$$
(dual problem [37], [38]). Since $W_x$ and $W_n^o$ are related through the equality constraint (75), the Lagrangian $J_2$ is varied with respect to both $W_x$ and $W_n^o$. By varying $J_2$ with respect to $W_x$, the following equation is obtained
$$H^* W_n^{o-1}\left(I + H W_x H^* W_n^{o-1}\right)^{-1} H - 4\pi H^* W_n K H + \frac{1}{\lambda_1} H^* K H = 4\pi\lambda_2 I_m. \tag{79}$$
By varying $J_2$ with respect to $W_n$, we have
$$-W_n^{o-1}\left(I_p + H W_x H^* W_n^{o-1}\right)^{-1} H W_x H^* W_n^{o-1} + 4\pi\lambda_1 I_p = 4\pi K\left(2 W_n^o + H W_x H^*\right). \tag{80}$$
Further, (80) can be rearranged to give
$$\frac{1}{\lambda_1^o}\left(I + H W_x H^* W_n^{o-1}\right) W_n^o K\left(2 W_n^o + H W_x H^*\right) W_n^o = 0, \tag{81}$$
which follows from (75). Hence, one or more factors in the product on the left-hand side must be equal to zero. From the setting of the problem, the only possibility that remains is $K = 0$. This implies that the two constraints imposed on $W_x$ and $W_n$ can be decoupled, and that $W_x$ and $W_n$ satisfy the saddle point property
$$J(W_x, W_n^o) \le J(W_x^o, W_n^o) \le J(W_x^o, W_n), \tag{82}$$
which is equivalent to
$$\sup_{W_x \in \mathcal{A}_1}\ \inf_{W_n \in \mathcal{A}_3} J(W_x, W_n) = \inf_{W_n \in \mathcal{A}_3}\ \sup_{W_x \in \mathcal{A}_1} J(W_x, W_n). \tag{83}$$
In addition, (79) and (80) transform into (25) and (24), respectively.
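As a sanity check of the saddle point property (82) (an editorial illustration on a scalar channel with arbitrary values, not from the paper), monotonicity of $J$ in $w_x$ and $w_n$ places the saddle at full transmit and noise powers, which a grid evaluation confirms.

# Editorial sketch (scalar channel, arbitrary values): the saddle point
# property (82); J increases in wx and decreases in wn, so the saddle
# sits at full transmit power P_x and full noise power P_n.
import numpy as np

h, P_x, P_n = 1.0, 2.0, 1.0
J = lambda wx, wn: np.log(1.0 + abs(h)**2 * wx/wn)
J_star = J(P_x, P_n)

wx_grid = np.linspace(1e-3, P_x, 200)
wn_grid = np.linspace(1e-3, P_n, 200)
print(all(J(wx, P_n) <= J_star + 1e-12 for wx in wx_grid),   # left half
      all(J_star <= J(P_x, wn) + 1e-12 for wn in wn_grid))   # right half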
APPENDIX C
H∞ SPACE

Let $D$ denote the unit disc of the complex plane, $D \triangleq \{z \in \mathbb{C} : |z| < 1\}$, and let $\partial D$ denote the boundary of $D$, $\partial D \triangleq \{z \in \mathbb{C} : |z| = 1\}$.
Definition C.1: A Hilbert space of matrix-valued functions $G$ on $\partial D$, which have a finite norm
$$\|G\|_{L^2} \triangleq \left(\int_{[0,2\pi)} \mathrm{Trace}\left[G^*(e^{j\theta}) G(e^{j\theta})\right] d\theta\right)^{1/2}, \tag{84}$$
is a Lebesgue-Bochner space, denoted by $L^2$.
Definition C.2: A Banach space of matrix-valued functions $G$ on $\partial D$, which have a finite norm
$$\|G\|_{L^\infty} \triangleq \underset{\theta \in [0,2\pi)}{\operatorname{ess\,sup}}\ \bar{\sigma}\left[G(e^{j\theta})\right], \tag{85}$$
is a Lebesgue-Bochner space, denoted by $L^\infty$. Here, $\bar{\sigma}(A)$ denotes the maximal singular value of the matrix $A$, and "ess sup" stands for the essential supremum of a function, defined as the smallest number $\alpha$ such that the set $\{\theta : \bar{\sigma}[G(e^{j\theta})] > \alpha\}$ has measure zero.
Definition C.3: The $H^\infty$ space is the closed subspace of $L^\infty$ whose matrix-valued functions $G$ are analytic and bounded on the unit disc $D$.
The norm associated with the space $H^\infty$, $\|\cdot\|_\infty$, which is given by (85), can also be expressed in terms of $\|\cdot\|_{L^2}$ as follows
$$\|G\|_\infty = \sup_{\|x\|_{L^2} \neq 0} \frac{\|G x\|_{L^2}}{\|x\|_{L^2}}. \tag{86}$$
This explains why $\|\cdot\|_\infty$ is sometimes called the induced or system norm. The $\|\cdot\|_\infty$ norm represents the maximum amplification that a system inflicts on a signal transmitted through it, and it corresponds to the maximal value of the Bode magnitude plot.
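As a numerical illustration of (85) and (86) (an editorial sketch; the transfer matrix is an arbitrary example, not one from the paper), the $H^\infty$ norm can be approximated by gridding the unit circle and taking the largest singular value.

# Editorial sketch (the transfer matrix G is an arbitrary example): the
# H-infinity norm (85) approximated by gridding the unit circle and
# taking the largest singular value of G(e^{j*theta}).
import numpy as np

def G(z):
    # analytic and bounded on the unit disc (pole at z = 2 lies outside)
    return np.array([[1.0/(1.0 - 0.5*z), 0.0],
                     [0.0,               2.0]])

thetas = np.linspace(0.0, 2.0*np.pi, 4096, endpoint=False)
sigma_max = max(np.linalg.svd(G(np.exp(1j*t)), compute_uv=False)[0]
                for t in thetas)
print(sigma_max)   # ~2.0, the sup of the largest singular value

For continuous $G$, the grid maximum converges to the essential supremum in (85) as the grid is refined.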
REFERENCES
[1] A. Lapidoth and P. Narayan, “Reliable communication under channel
uncertainty”, IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2148-2177,
Oct. 1998.
[2] S. Z. Denic, C. D. Charalambous, and S. M. Djouadi, "Channel capacity subject to frequency domain normed uncertainties - SISO and MIMO cases," accepted for presentation at the 17th International Symposium on Mathematical Theory of Networks and Systems, Japan, 2006.
[3] P. E. Caines, Linear stochastic systems. New York: John Wiley & Sons,
1988.
[4] R. Gallager, Information theory and reliable communication. New York:
Wiley, 1968.
[5] N. M. Blachman, “On the capacity of a band-limited channel perturbed
by statistically dependent interference,” IRE Trans. Inf. Theory, pp. 48-55,
Jan. 1962.
[6] P. L. Duren, Theory of Hp spaces. Dover Publications, Unabridged
edition, 2000.
[7] J. C. Doyle, B. A. Francis, and A. R. Tannenbaum, Feedback Control Theory. New York: Macmillan Publishing Company, 1992.
[8] A. F. Molisch, M. Steinbauer, M. Toeltsch, E. Bonek, and R. S. Thomä,
“Capacity of MIMO systems based on measured wireless channels,” IEEE
J. Sel. Areas Commun., vol. 20, no. 3, pp. 561-569, Apr. 2002.
[9] S. Vishwanath, S. Boyd, and A. Goldsmith, "Worst-case capacity of Gaussian vector channels," in Proc. 2003 Canadian Workshop on Information Theory, 2003.
[10] P. Lancaster and L. Rodman, Algebraic Riccati equations. Oxford University Press, USA, 1995.
[11] N. M. Blachman, “Communication as a game,” IRE Wescon 1957
Conference Record, vol. 2, pp. 61-66, 1957.
[12] L. H. Brandenburg and A. D. Wyner, “Capacity of the Gaussian channel
with memory: the multivariate case,” Bell Syst. Tech. J., vol. 53, no. 5,
1974.
[13] W. L. Root and P. P. Varaiya, “Capacity of classes of Gaussian channels,”
SIAM J. Appl. Math., vol. 16, no. 6, pp. 1350-1393, Nov. 1968.
[14] M. Médard, “The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel”, IEEE Trans.
Inf. Theory, vol. 46, no. 3, pp. 933-946, May 2000.
[15] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, "Uniform power allocation in MIMO channels: a game theoretic approach," IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1707-1727, Jul. 2003.
[16] T. Yoo and A. Goldsmith, “Capacity and optimal power allocation for
fading MIMO channels with channel estimation error,” IEEE Trans. Inf.
Theory, vol. 52, no. 5, pp. 2203-2214, May, 2006.
[17] B. Hassibi and B. M. Hochwald, “How much training is needed in
multiple-antenna wireless systems,” IEEE Trans. Inf. Theory, vol. 49, no.
4, pp. 951-963, Apr. 2003.
[18] S. Venkatesan, S. H. Simon, R. A. Valenzuela, “Capacity of MIMO
Gaussian channel with non-zero mean,” IEEE Vehicular Technology
Conference 2003, VTC 2003-Fall, 2003.
[19] D. Hösli and A. Lapidoth, “The capacity of a MIMO Ricean channel
is monotone in the singular values of the mean,” Proc. of the 5th
International ITG Conference on Source and Channel Coding (SCC),
Erlangen, Nuremberg, January 14-16, 2004.
[20] M. Godavarti, A. O. Hero, and T. L. Marzetta, "Min-capacity of a multiple-antenna wireless channel in a static Ricean fading environment," IEEE Trans. Wireless Commun., vol. 4, no. 4, pp. 1715-1723, Jul. 2005.
[21] L. Zheng and D. Tse, “Communication on the Grassmann manifold: a
geometric approach to the noncoherent multiple-antenna channel,” IEEE
Trans. Inf. Theory, vol. 48, no. 2, pp. 359-383, Feb. 2002.
[22] A. Lapidoth and S. M. Moser, “Capacity bounds via duality with
applications to multiple antenna systems on flat fading channels,” IEEE
Trans. Inf. Theory, vol. 49, no. 10, pp. 2426-2467, Oct. 2003.
[23] M. Uysal and C. N. Georghiades, "An efficient implementation of a maximum-likelihood detector for space-time block coded systems," IEEE Trans. Commun., vol. 51, no. 4, pp. 521-524, Apr. 2003.
[24] R. Ahlswede, "The capacity of a channel with arbitrarily varying Gaussian channel probability functions," Trans. 6th Prague Conf. Information Theory, Statistical Decision Functions and Random Processes, pp. 13-21, Sep. 1971.
[25] R. J. McEliece, "Communication in the presence of jamming: an information theoretic approach," in Secure Digital Communications, G. Longo, Ed. New York: Springer-Verlag, 1983, pp. 127-166.
[26] C. R. Baker and I.-F. Chao, “Information capacity of channels with
partially unknown noise. I. Finite dimensional channels,” SIAM J. Appl.
Math., vol. 56, no. 3, pp. 946-963, Jun. 1996.
[27] C. R. Baker and I.-F. Chao, “Information capacity of channels with
partially unknown noise. II. Infinite dimensional channels,” SIAM J.
Control and Optimization, vol. 34, no. 4, pp. 1461-1472, Jul. 1996.
[28] S. N. Diggavi and T. M. Cover, "The worst additive noise under a covariance constraint," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 3072-3081, Nov. 2001.
[29] H. Boche and E. A. Jorsweick, “Multiuser MIMO systems, worst case
noise, and transmitter cooperation,” Proc. of the 3rd IEEE Int. Symp.
Signal Processing and Information Technology, ISSPIT 2003, 2003.
[30] A. Kashyap, T. Basar, R. Srikant, “Correlated jamming on MIMO
Gaussian fading channels,” IEEE Trans. Inf. Theory, vol. 50, no. 9, pp.
2119-2123, Sep., 2004.
[31] N. Chiurtu, B. Rimoldi, and E. Telatar, "On the capacity of multi-antenna Gaussian channels," in Proc. ISIT 2001, Washington, DC, p. 53, Jun. 24-29, 2001.
[32] E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun. (ETT), vol. 10, no. 6, pp. 585-596, Nov. 1999.
[33] M. J. Osborne and A. Rubinstein, A course in game theory. MIT Press, 1994.
[34] J. Proakis, Digital communications. New York: McGraw-Hill, 1983.
[35] C. E. Shannon, “A note on a partial ordering for communication
channels,” Inform. and Control, vol. 1, pp. 390-397, 1958.
[36] A. W. Eckford, F. R. Kschischang, and S. Pasupathy, "On partial ordering of Markov modulated channels under LDPC decoding," in Proc. ISIT 2003, Yokohama, Japan, p. 295, 2003.
[37] D. G. Luenberger, Optimization by vector space methods, John Wiley
& Sons, 1969.
[38] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge University Press, 2003.