IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. X, NO. X, 200X

Information Theoretic Bounds for Compound MIMO Gaussian Channels

Stojan Z. Denic, Member, IEEE, Charalambos D. Charalambous, Senior Member, IEEE, and Seddik M. Djouadi, Member, IEEE

Abstract—Achievable rates for compound Gaussian multiple-input multiple-output channels are derived. Two types of channels, modeled in the frequency domain, are considered: 1) the channel frequency response matrix H belongs to a subset of the H^∞ normed linear space, and 2) the power spectral density (PSD) matrix of the Gaussian noise belongs to a subset of the L¹ space. The achievable rates of these two compound channels are related to the maximin of the mutual information rate. The minimum is with respect to the set of all possible matrices H or all possible PSD matrices of the noise. The maximum is with respect to all possible PSD matrices of the transmitted signal with bounded power. For the compound channel modeled by the set of matrices H, it is shown, under certain conditions, that the code for the worst case channel can be used for the whole class of channels. For the same model, the water-filling argument implies that the larger the set of matrices H, the smaller the bandwidth of the transmitted signal. For the second compound channel, an explicit relation between the maximizing PSD matrix of the transmitted signal and the minimizing PSD matrix of the noise is found: the two PSD matrices are related through a Riccati equation, which arises in Kalman filtering and linear-quadratic Gaussian control problems.

Index Terms—multiple-input multiple-output Gaussian channel, compound channel, robustness, universal code, power allocation, channel degrading

I. INTRODUCTION

A compound communication channel is used to model the situation in which the transmitter and receiver, although unaware of the true communication channel, know that it belongs to a certain set of channels.
The underlying assumption in the case of compound channels is that the true channel does not change throughout the transmission. Specifically, the model captures the situation in which "nature" chooses one channel from the set of all possible channels and keeps it unchanged from the beginning to the end of the transmission. Thus, a compound channel represents one way to model the uncertainty the transmitter and receiver have regarding the communication channel. The importance of compound channels has been pointed out in [1] (Section VII), where it is argued that compound channels may be used to model different combinations of (slow/fast) (flat/frequency-selective) fading channels. This paper is concerned with information theoretic bounds for two types of compound multiple-input multiple-output (MIMO) Gaussian channels. More precisely, the main goal of this paper is to determine achievable transmission rates for those channels.

S. Z. Denic is with the Telecommunications Research Lab, Toshiba Research Europe Limited, BS1 4ND, Bristol, UK (Email: stojan.denic@toshibatrel.com). Previously, he was with the Department of Electrical and Computer Engineering, University of Arizona, Tucson, USA, and the Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, Cyprus. C. D. Charalambous is with the Department of Electrical and Computer Engineering, University of Cyprus, Nicosia, Cyprus; also with the School of Information Technology and Engineering, University of Ottawa. S. M. Djouadi is with the Department of Electrical and Computer Engineering, University of Tennessee, Knoxville, USA (Email: djouadi@ece.utk.edu). This work was supported by the European Commission under the project ICCCSYSTEMS and by the NSERC under an operating grant (T-810-289-01).
An achievable transmission rate is defined in the following way: if an encoder uses a code rate smaller than the achievable rate for a given communication channel, the probability of decoding error can be made arbitrarily small as the codeword length increases. Hence, this is different from the channel capacity, which is defined as the maximal achievable transmission rate for a particular channel. The results presented here have been partially published in [2]. To derive the achievable rates, the mutual information rate formula for MIMO Gaussian channels, given in the frequency domain, is employed. The mutual information rate J(W_x, W_n) is given by ([3], page 146)

  J(W_x, W_n) = (1/4π) ∫_0^{2π} log det( I_p + H(e^{jθ}) W_x(θ) H*(e^{jθ}) W_n^{-1}(θ) ) dθ,   (1)

where W_x(θ) is the PSD matrix of the transmitted signal x, W_n(θ) is the PSD matrix of the Gaussian additive noise n, H(e^{jθ}) is the channel frequency response matrix, and θ ∈ [0, 2π]. The notation (·)* denotes the complex conjugate transpose, and I_p stands for the identity matrix of dimension p. Thus, we deal with channels in which the transmitted signal is constrained in power and then sent through a linear time-invariant filter [4]. From (1), it can be seen that there are two possible sources of uncertainty: one is the lack of knowledge of the channel matrix function H(e^{jθ}), while the other is the lack of knowledge of the noise PSD matrix W_n(θ). In this paper, these two types of uncertainty are modeled by two compound channel models. The uncertainty about the true channel matrix H(e^{jθ}) is due to imprecision in the channel model or channel measurements, while the uncertainty about the true W_n(θ) may come from the inability to estimate the interference from other users in a communication network. The difference between these two models of uncertainty is explained by Blachman in [5].
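As an illustration, the rate (1) can be approximated numerically by discretizing the frequency axis. The sketch below is ours, not the paper's; the channel, input PSD, and noise PSD are hypothetical stand-ins chosen only to make the block runnable.

```python
import numpy as np

def mutual_information_rate(H, Wx, Wn, K=512):
    """Approximate the mutual information rate (1),
    J = (1/4pi) * integral_0^{2pi} log det(I_p + H Wx H* Wn^{-1}) dtheta,
    by a Riemann sum over K frequency samples.
    H, Wx, Wn are callables mapping theta to matrices."""
    thetas = np.linspace(0.0, 2.0 * np.pi, K, endpoint=False)
    vals = []
    for th in thetas:
        Hm, Wxm, Wnm = H(th), Wx(th), Wn(th)
        p = Hm.shape[0]
        A = np.eye(p) + Hm @ Wxm @ Hm.conj().T @ np.linalg.inv(Wnm)
        vals.append(np.linalg.slogdet(A)[1])  # log|det A|, numerically stable
    return np.sum(vals) * (2.0 * np.pi / K) / (4.0 * np.pi)

# hypothetical flat 2x2 channel, unit input PSD, white noise of level 0.5
H = lambda th: np.array([[1.0, 0.2], [0.1, 0.8]], dtype=complex)
Wx = lambda th: np.eye(2)
Wn = lambda th: 0.5 * np.eye(2)
rate = mutual_information_rate(H, Wx, Wn)
```

For a frequency-flat channel the integrand is constant, so the sum reduces to (1/2) log det(I_p + 2 H H*), which gives a quick consistency check on the quadrature.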
According to his terminology, the uncertainty regarding the channel matrix H(e^{jθ}) is called "interference of the second kind", while the uncertainty regarding the noise PSD matrix W_n(θ) is called "interference of the first kind". The achievable rates of the two compound channels will be related to

  sup_{W_x ∈ A_1} inf_{B} J(W_x, W_n),   (2)

where J(W_x, W_n) is defined by (1), A_1 is the set of PSD matrices of the transmitted signal having bounded power, and B is the set of all possible channels, defined by the two channel models discussed above. In subsequent developments, the set B will be referred to as the "uncertainty" set. The first type of compound channel is described by a set of channel matrices H(e^{jθ}) that is a subset of H^∞, the space of matrix functions bounded and analytic in the open right-half plane (for the definition of the space and the corresponding norm, see Appendix C and [6]). Specifically, the uncertainty set is represented by an additive model

  H(e^{jθ}) = H_nom(e^{jθ}) + ΔH(e^{jθ}),   (3)

where H_nom(e^{jθ}) is the so-called nominal part, which is a known matrix, and ΔH(e^{jθ}) is an unknown perturbation that models the uncertainty and determines the size of the uncertainty set [7]. In practice, the set of channel matrices H(e^{jθ}) can be extracted from Nyquist or Bode plots obtained from channel measurements. An example of determining the capacity of MIMO channels from measurement data was demonstrated in [8]. Modeling uncertainty in the frequency domain enables us to determine how the achievable rate and the optimal transmission bandwidth change when the size of the uncertainty set varies. The main contributions are the following. For a general form of ΔH(e^{jθ}), we derive necessary conditions for the maximin optimization problem given by (2).
These necessary conditions provide a solution of (2) when there is uncertainty in both the singular values and the unitary matrices of the singular value decomposition of the channel frequency response matrix H(e^{jθ}). Further, the special case of ΔH(e^{jθ}) is considered in which the uncertainty is placed only on the singular values of H(e^{jθ}). We infer, based on the behavior of physical systems at high frequencies and the water-filling argument, that the optimal bandwidth shrinks as the size of the uncertainty set, i.e., the uncertainty, increases. The worst case channel matrix is identified as well. It can be seen that all other channels from the uncertainty set may be "degraded" to the worst case channel, implying that the code for the worst case channel may be used for all channels within the uncertainty set. It is shown, under certain conditions, that the worst case concept can be applied to the general form of ΔH(e^{jθ}). Also, from the derived formula for the achievable rate, it follows that as the uncertainty increases, the transmission takes place only over the largest singular values of the channel frequency response matrix H_nom(e^{jθ}). It should be noted that the formulation of the additive model implies partial knowledge of the channel state information (CSI) and no channel distribution information (CDI) regarding H(e^{jθ}). In this paper, CSI refers to knowledge of the channel matrix realization H(e^{jθ}), and CDI refers to knowledge of its probability distribution. The second type of compound channel is described by the set of noise PSD matrices

  { W_n(θ) : ∫_0^{2π} Trace(W_n(θ)) dθ ≤ P_n },   (4)

which is a subset of the L¹ space, the space of absolutely integrable functions. A similar problem was studied in [9], but for the case of memoryless channels. Our main contributions are the following.
While in [9] the maximin optimization problem was solved numerically, in this paper it is solved analytically, providing a relation between the optimal PSD matrix of the transmitted signal W_x^o(θ), which maximizes the mutual information rate J(W_x, W_n), and the noise PSD matrix W_n^o(θ), which minimizes it. Specifically, the two matrices are related through a Riccati equation, which is common in optimal estimation and control problems [10]. When the channel matrix H(e^{jθ}) is square and invertible, it turns out that the noise PSD matrix W_n^o(θ) is proportional to the optimal PSD matrix of the transmitted signal W_x^o(θ). This represents a generalization of the work found in [11] to the case of MIMO compound channels. The achievability of J(W_x^o, W_n^o) follows from the classical result [12], by employing "Gaussian codebooks" and maximum-likelihood decoding.

A. Literature Review

In this section, we give a review of the literature related to the problems considered in the paper.

1) Channels with Complete CSI and Uncertain Channel Matrix H: Gallager [4] was probably the first to study additive Gaussian channels in which the transmitted signal, constrained in power, is sent through a linear time-invariant filter that is not band-limited. He proved a channel coding theorem and its converse in the case of continuous-time channels. His work was generalized by Root and Varaiya [13], who computed the capacity of different classes of Gaussian channels. They derived the capacity formulas for the class of Gaussian MIMO memoryless channels and for the class of single-input single-output continuous-time channels when the uncertainty set is compact. Brandenburg and Wyner [12] dealt with discrete-time MIMO Gaussian channels when the channel matrix H(e^{jθ}) and the noise PSD W_n(θ) are completely known to the transmitter and the receiver.
The optimal PSD matrix of the transmitted signal is given in terms of "water-filling" over the singular values of the channel matrix H(e^{jθ}) in the frequency domain. Médard [14] computed upper and lower bounds on the mutual information for uncertain fading channels when the receiver uses a channel estimate in the decoding, for single-user and multiple-access channels. In [15], Palomar et al. considered a memoryless compound MIMO Gaussian channel in the absence of CDI and with partial CSI. The uncertainty set is defined as the set of channel matrices with unconstrained right singular vectors. It is shown that the optimal covariance matrix of the transmitted signal provides uniform power allocation among the transmitting antennas. The effect of the channel estimation error on the capacity and outage probability of memoryless MIMO fading channels is studied in [16], where the transmitter and the receiver are aware of the CDI and use the estimate of the channel matrix in the decoding. The optimal spatial and temporal power adaptation strategies are provided. It is interesting to note that the solution for the optimal spatial power adaptation found in [16] is similar to the optimal power adaptation obtained in this paper for the case where the uncertainty is imposed on the channel matrix H(e^{jθ}). The difference is that, in our case, we consider the compound channel, where the solution is obtained by applying the worst case channel concept and ignoring possible randomness of H(e^{jθ}), while in [16], the authors consider the capacity subject to imperfect CSI where the channel matrix has a Rayleigh distribution. The impact of a training sequence on the capacity of MIMO fading channels is studied in [17]. Under a block-fading assumption, it is shown that the number of training symbols is equal to or greater than the number of transmitting antennas.
Other related work includes results on the capacity of MIMO fading channels with non-zero mean (Ricean fading channels) [18], [19], [20]. The Ricean fading channel is modeled by the sum of a deterministic matrix and a random matrix whose elements are independent and identically distributed zero-mean complex circularly-symmetric Gaussian random variables. In [19], it is shown that the capacity of this channel is monotonically nondecreasing in the singular values of the deterministic matrix when the CSI is available to the receiver but not to the transmitter. In [20], the authors consider a compound Ricean fading channel. It is assumed that the deterministic part of the channel matrix is uncertain. Under the assumption that the transmitter does not have CSI, the optimal signal matrix, which achieves the capacity, is equal to the product of three matrices: a T × T unitary matrix, a T × M real non-negative diagonal matrix, and an M × M unitary matrix. M denotes the number of transmitting antennas, and T is the wireless channel coherence time, measured in number of symbol periods. In [21] and [22], the capacity of MIMO fading channels in the presence of CDI, but not CSI, is considered. The authors of [21] assume a block-fading channel model and take a geometric approach to computing the channel capacity. It is found that the channel capacity grows linearly with M*(1 − M*/T) for high signal-to-noise ratio (SNR), where M* = min{M, P, ⌊T/2⌋}, T and M are defined as in [20], and P represents the number of receive antennas. Also, it is shown that the optimal number of transmitting antennas is M*, i.e., the use of a larger number of antennas does not give a larger capacity gain. If the block-fading assumption is removed, then for high SNR it is proved that the channel capacity grows only double-logarithmically as a function of SNR [22]. For consideration of how the lack of CSI affects the design of a MIMO receiver see, for example, [23].
2) Uncertain Channel Noise: It appears that Blachman was the first to investigate the channel capacity subject to uncertain noise [11]. Using a two-player zero-sum game theoretic framework, he defined "pure" and "mixed" strategies and considered the case where the transmitter and the noise (jammer) can decide between a finite number of strategies. Hence, communication subject to uncertain noise can be understood as communication in the presence of jamming, which is further treated in, for instance, [24] and [25] for single-input single-output channels. In [26] and [27], Baker and Chao provide the most general mathematical framework for the analysis of compound channels with uncertain noise. Finite-dimensional [26] and infinite-dimensional [27] communication channels are examined, and the capacity and the optimal transmitter and jammer strategies are given as functions of the eigenvalues of operators that model the channel. In contrast to [27], we deal with stationary signals, so that the channel capacity is given in terms of the PSDs of the transmitted and noise signals, which may be more desirable from the engineering point of view. In [28], the capacity of memoryless MIMO channels subject to noise covariance constraints is obtained. The authors establish many interesting properties of the mutual information with respect to a maximin optimization problem similar to that defined in (2). Other references of interest for MIMO memoryless uncertain-noise channels include [29] and [30]. The former gives the relation between the eigenvalues of the transmitted signal covariance matrix and the covariance matrix of the noise. The latter considers the case of jamming over MIMO Gaussian Rayleigh channels when the jammer has access to the transmitted signal. It is found that knowledge of the channel input does not bring any advantage to the jammer.

B. Paper Organization

Section II gives the definitions of the two compound channels and the corresponding achievable rates.
Sections III and IV provide explicit formulas for the achievable rates of the compound channels defined in Section II, in terms of the maximin of the mutual information rate. Section V discusses the conditions under which the previously computed code rates are achievable. Section VI contains examples.

C. Notation

Z represents the set of integers. C represents the set of complex numbers. 0 is a zero matrix. arg(w) stands for the phase of the transfer function w(e^{jθ}). σ̄(A) denotes the maximal singular value of a matrix A. If A is a matrix, the notation A ≥ 0 means that A is non-negative definite. [A]_{ij} denotes the element of the matrix A located at the ith row and jth column.

II. PROBLEM DEFINITION

Consider a discrete-time MIMO channel defined by

  y(t) = Σ_{j=−∞}^{t} h(t − j) x(j) + n(t),   (5)

where x ≜ {x(t) : t ∈ Z} is an m-component complex stationary stochastic process representing the transmitted signal, n ≜ {n(t) : t ∈ Z} is a p-component complex Gaussian stochastic process representing an additive noise, y ≜ {y(t) : t ∈ Z} is a p-component complex stationary stochastic process representing the received signal, and h ≜ {h(t) : t ∈ Z} is a sequence of complex p × m matrices representing the impulse response of the MIMO communication channel. It is assumed that x generates a Hilbert space [3]. Here, H(e^{jθ}) represents the channel frequency response matrix, the discrete Fourier transform of h given by

  H(e^{jθ}) = Σ_{t=0}^{+∞} h(t) e^{jθt},   (6)

where θ is a normalized frequency. It is assumed that Σ_{t=−∞}^{+∞} h(t) e^{jθt} converges in L²(F_x) to a limit in L²(F_x), denoted by H(e^{jθ}). L²(F_x) is a Hilbert space of complex-valued Lebesgue-Stieltjes measurable functions H(e^{jθ}) of finite norm

  ‖H(e^{jθ})‖_{L²(F_x)} ≜ ∫_0^{2π} Trace( H(e^{jθ}) dF_x(θ) H*(e^{jθ}) ).   (7)

F_x denotes the matrix spectral distribution of x [3], which is assumed to be absolutely continuous with respect to the Lebesgue measure on [0, 2π].
Hence, dF_x(θ) = W_x(θ) dθ, where W_x(θ) represents the PSD matrix of x; W_n(θ) represents the PSD matrix of n. Next, the two compound channels are defined, together with the corresponding maximin optimization problems that will be related to their achievable rates.

First problem: Channel unknown, noise known. The compound channel is defined by the set

  A_2 ≜ { H ∈ H^∞ : H = H_nom + wΔ;  H, H_nom, w, Δ ∈ H^∞;  ‖Δ‖_∞ ≤ 1 },   (8)

where: 1) H_nom(e^{jθ}) is a nominal channel frequency response matrix, which is stable and is the result of previous measurements or beliefs regarding the channel; 2) w(e^{jθ}) is a known stable scalar transfer function that defines the size of the uncertainty set A_2 at each frequency θ; 3) Δ(e^{jθ}) is a stable variable frequency response matrix, which accounts for the phase uncertainty and acts as a scaling factor on the magnitude of the perturbation (for the definition of the H^∞ space and the associated norm, see Appendix C). Thus, H(e^{jθ}) describes a disk centered at H_nom(e^{jθ}) with a radius determined by w(e^{jθ}). To see this, observe that

  σ̄(H − H_nom) = σ̄(wΔ) ≤ |w|,   (9)

because ‖Δ(e^{jθ})‖_∞ ≤ 1 (for the definition of ‖·‖_∞, see Appendix C). In other words, the maximum singular value of H(e^{jθ}) − H_nom(e^{jθ}) is bounded by |w(e^{jθ})|. The uncertainty set is thus defined by bounding the maximum singular value of the matrix, which implies that all smaller singular values are bounded as well. This differs from, for instance, [31], where the uncertainty is defined with respect to the sum of the singular values of the channel matrix. Further, introduce the set of all possible PSD matrices of the transmitted signal,

  A_1 ≜ { W_x(θ) : ∫_0^{2π} Trace(W_x(θ)) dθ ≤ P_x }.   (10)

It is assumed that the transmitter and the receiver know the nominal channel frequency response matrix H_nom(e^{jθ}) as well as the size of the uncertainty set |w(e^{jθ})|; that is, both the transmitter and the receiver are aware of the uncertainty set A_2.

Second problem: Channel known, noise unknown. The noise uncertainty is defined through the uncertainty of the PSD matrix W_n(θ).
It is assumed that, although unknown, W_n(θ) belongs to the set

  A_3 ≜ { W_n(θ) : ∫_0^{2π} Trace(W_n(θ)) dθ ≤ P_n }.   (12)

The same constraint is introduced for the transmitter, so that A_1 = { W_x(θ) : ∫_0^{2π} Trace(W_x(θ)) dθ ≤ P_x }. An achievable rate of the compound channel described by (8) is related to the maximin problem

  R_a1 ≜ sup_{W_x ∈ A_1} inf_{H ∈ A_2} J(W_x, W_n),   (11)

where J(W_x, W_n) is given by (1), while an achievable rate of the compound channel described by A_3 is defined by

  R_a2 ≜ sup_{W_x ∈ A_1} inf_{W_n ∈ A_3} J(W_x, W_n).   (13)

In the next two sections, the solutions of (11) and (13) are presented.

III. FIRST PROBLEM: CHANNEL UNKNOWN, NOISE KNOWN

When the channel matrix is completely known, its singular value decomposition yields n independent parallel channels, where n is the number of nonzero singular values [32]. However, when the channel matrix is only partially known, as in (8), the singular value decomposition cannot be applied directly. This causes a problem for both the maximization and the minimization in (11). Since (11) is a double-constrained optimization problem, the Lagrange multiplier technique is applied, which gives the necessary conditions relating the optimal W_x(θ) and Δ(e^{jθ}). For simplicity of the derivation, it is assumed that W_n(θ) = I_p; the case W_n(θ) ≠ I_p can be treated by considering W_n^{-1/2}(θ) H(e^{jθ}) as an equivalent channel frequency response matrix, where W_n^{1/2}(θ) W_n^{1/2*}(θ) = W_n(θ).
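This whitening step can be checked numerically: the integrand of (1) is unchanged when W_n is absorbed into the equivalent channel W_n^{-1/2} H. The sketch below is ours; the dimensions and matrices at one fixed frequency are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 3, 2
# hypothetical channel, input PSD, and (non-identity) noise PSD at one frequency
H = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Wx = A @ A.conj().T + np.eye(m)                  # Hermitian positive definite
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Wn = B @ B.conj().T + np.eye(p)

# Hermitian square root Wn^{1/2} via eigendecomposition; here we need Wn^{-1/2}
evals, evecs = np.linalg.eigh(Wn)
Wn_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.conj().T

# equivalent whitened channel Wn^{-1/2} H with identity noise PSD
Heq = Wn_inv_half @ H

logdet = lambda M: np.linalg.slogdet(M)[1]
lhs = logdet(np.eye(p) + H @ Wx @ H.conj().T @ np.linalg.inv(Wn))
rhs = logdet(np.eye(p) + Heq @ Wx @ Heq.conj().T)
```

Both expressions agree because det(I + AB) = det(I + BA), so the reduction to W_n(θ) = I_p loses no generality.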
Theorem 3.1: The solution of the maximin optimization problem (11) is given by

  (1/4π) ∫_0^{2π} log det( I_p + (H_nom + wΔ^o) W_x^o (H_nom + wΔ^o)* ) dθ,

where the optimal W_x(θ) and Δ(e^{jθ}) are the solutions of

  (1/4π) w W_x^o (H_nom + wΔ^o)* ( I_p + (H_nom + wΔ^o) W_x^o (H_nom + wΔ^o)* )^{-1} + K Δ^{o,*} = 0,

and

  (H_nom + wΔ^o)* ( I_p + (H_nom + wΔ^o) W_x^o (H_nom + wΔ^o)* )^{-1} (H_nom + wΔ^o) = 4πλ_1 I_m,

together with

  ∫_0^{2π} Trace(W_x^o(θ)) dθ = P_x,   (14)
  Δ^{o,*}(e^{jθ}) Δ^o(e^{jθ}) − I_m = 0,   (15)

where the constant scalar λ_1 > 0 and the constant matrix K > 0 are Lagrange multipliers.
Proof. The proof is given in Appendix A.

Remark 3.2: Notice that in (8), the scalar-valued function w(e^{jθ}) may be replaced by a matrix-valued function W(e^{jθ}), and the expressions in Theorem 3.1 retain their form with w(e^{jθ}) substituted by W(e^{jθ}).

Theorem 3.1 provides the optimal power allocation for the general form of the perturbation ΔH(e^{jθ}). In this case, the uncertainty is imposed on the singular values of H(e^{jθ}) as well as on the unitary matrices U_c(e^{jθ}) and V_c(e^{jθ}), the factors in the singular value decomposition H(e^{jθ}) = U_c(e^{jθ}) Σ_c(θ) V_c*(e^{jθ}). On the other hand, computing the optimal power allocation of Theorem 3.1 requires numerical methods, and the achievability of R_a1 might be difficult to prove. Hence, a closed-form solution of (11) will be given for a specific structure of the matrix Δ(e^{jθ}),

  Δ(e^{jθ}) = U(e^{jθ}) [ Δ_1(e^{jθ})  0 ; 0  0 ] V*(e^{jθ}).   (16)

Here, Δ_1(e^{jθ}) is an n × n diagonal matrix, and n is the rank of H_nom(e^{jθ}), n ≤ min(p, m).
U(e^{jθ}) and V(e^{jθ}) are p × p and m × m unitary matrices, respectively, corresponding to the singular value decomposition of the nominal frequency response matrix H_nom(e^{jθ}) = U(e^{jθ}) Σ(θ) V*(e^{jθ}), where

  Σ = [ Σ_1  0 ; 0  0 ]  (p × m),   (17)

and Σ_1(θ) = diag(σ_1(θ), ..., σ_n(θ)) is a diagonal matrix whose elements are the singular values of H_nom(e^{jθ}). As it turns out, this assumption enables the application of Hadamard's inequality for the maximization of the mutual information with respect to the PSD matrix W_x(θ), as in [32]. The assumption also means that the uncertainty is imposed only on the singular values of H(e^{jθ}), and not on the unitary matrices U_c(e^{jθ}) and V_c(e^{jθ}). Because the mutual information is a function of the singular values, solving (11) subject to (16) provides an achievable rate when there is uncertainty in the singular values of the channel matrix H(e^{jθ}). Physically, if H(e^{jθ}) models a wireless communication channel, this situation could correspond to a fixed wireless link in which the transmit and receive antennas do not move. Then the angles of arrival and departure are known, implying that the matrices U_c(e^{jθ}) and V_c(e^{jθ}) are known. However, the channel gains may change due to variations in the propagation environment. In this case, the uncertainty in the channel gains is described by the uncertainty in the singular values, and (11) subject to (16) gives an achievable rate for this particular case. The achievability of R_a1 in the special and general cases is addressed in Section V.

Theorem 3.3: The solution of the maximin optimization problem (11) subject to (16) is given by

  R_a1 = (1/4π) Σ_{i=1}^{n} ∫_{S_i} log[ µ (σ_i(θ) − |w(e^{jθ})|)² ] dθ,   (18)

  Σ_{i=1}^{n} ∫_{S_i} ( µ − (σ_i(θ) − |w(e^{jθ})|)^{-2} ) dθ = P_x,   (19)
A matrix ∆o1 (ejθ ), which minimizes the mutual information rate, is given by −jarg(w)+jπ e 0 ··· 0 0 e−jarg(w)+jπ · · · 0 . (21) .. .. .. .. . . . . −jarg(w)+jπ 0 0 ··· e Proof. The proof is given in Appendix A. Corollary 3.4: If |w(ejθ )| is equal to zero, the formula for the channel capacity when there is no uncertainty is obtained [12]. Remark 3.5: In Appendix A, it is proved that the optimal solutions of (11), Wxo (θ) and ∆o (ejθ ), satisfy a saddle point property, which is equivalent to saying that sup inf J(Wx , Wn ) = inf Wx ∈A1 H∈A2 sup J(Wx , Wn ). (22) H∈A2 Wx ∈A1 This is shown by solving maximin and minimax problems directly, rather than using the Minimax Theorem of von Neumann [33]. Theorem 3.3 shows that the optimal transmitted power is given in the form of modified water-filling which implies that Ra1 will be different from zero only if there exists an interval of frequencies such that σi (θ) − |w(ejθ )| > 0, i=1,...,n. If the uncertainty |w(ejθ )| grows over the whole frequency range, at one point it will reach the lowest singular value σn (θ) of the nominal frequency response matrix Hnom (ejθ ). Then, the optimal way of transmission is to concentrate the power on the rest of the channel modes, and not to allocate the power to the mode that vanished because of uncertainty. By the same token, if the uncertainty keeps growing, the modes will vanish one by one, and at the end only the strongest mode will remain. In addition, the form of (18) indicates that we may deal with the worst case channel that is characterized by the singular values of the nominal channel frequency response matrix Hnom (ejθ ) reduced by the size of the uncertainty set |w(ejθ )|. More discussions on these issues are given in Section V. Another interesting point, which is obtained from the waterfilling (18) concerns the dependence of the optimal bandwidth on the uncertainty. 
Note the following: Theorem 3.3 implies that the optimal bandwidth depends on the frequencies over which µ − (σ_i(θ) − |w(e^{jθ})|)^{-2}, i = 1, ..., n, is positive. The optimal power allocation is found by pouring the power, constrained by P_x, into the wells (σ_i(θ) − |w(e^{jθ})|)^{-2}, i = 1, ..., n, up to the level µ. The shapes of the wells depend on the size of the uncertainty set |w(e^{jθ})| at each frequency θ. On the other hand, physical systems are subject to more uncertainty at high frequencies than at low frequencies. Thus, |w(e^{jθ})| should be assigned larger values at high frequencies. This aspect is explained by considering a wireless fading channel. The low-pass representation of the time-varying impulse response of a wireless fading channel is given by [34]

  c(τ, t) = Σ_n α_n(t) exp(−j2πf_c τ_n(t)) δ(τ − τ_n(t)),   (23)

where α_n(t) and τ_n(t) are attenuations and delays, respectively, and f_c is a carrier frequency. The phase of the signal θ_n(t) is determined by 2πf_c τ_n(t). This means that a change in τ_n(t) by 1/f_c results in a change in θ_n(t) by 2π. Therefore, for large f_c, 1/f_c is so small that a small motion in the transmission medium may change θ_n(t) by 2π. Hence, as frequency increases, the distance between a perturbed well (σ_i(θ) − |w(e^{jθ})|)^{-2} and the nominal well σ_i^{-2}(θ) increases. Consequently, the perturbed well is narrower than the nominal one, as illustrated in Fig. 1.

[Fig. 1. Modified water-filling: optimal bandwidth and uncertainty.]

From this, we may deduce that the optimal bandwidth for an uncertain channel is smaller than the optimal bandwidth for a completely known channel. One has to keep in mind that the two wells have to be filled using the total power P_x.

IV. SECOND PROBLEM: CHANNEL KNOWN, NOISE UNKNOWN

In this section, the solution of (13) is considered. The problem may be treated as a game between the transmitter x and "nature", which picks the PSD of the noise W_n(θ); this PSD is unknown but belongs to the set A_3. We solve the maximin problem directly, by applying the Lagrange multiplier technique.

Theorem 4.1: The solution of the maximin optimization problem (13) is given by

  (1/4π) ∫_0^{2π} log det( I_p + H(e^{jθ}) W_x^o(θ) H*(e^{jθ}) W_n^{o,-1}(θ) ) dθ,

where the optimal PSD matrix of the noise satisfies the matrix Riccati equation

  W_n^{o,2}(θ) + (1/2) W_n^o(θ) H(e^{jθ}) W_x^o(θ) H*(e^{jθ}) + (1/2) H(e^{jθ}) W_x^o(θ) H*(e^{jθ}) W_n^o(θ) − (1/(4πλ_1^o)) H(e^{jθ}) W_x^o(θ) H*(e^{jθ}) = 0,   (24)

while the optimal PSD matrix of the transmitted signal satisfies

  H*(e^{jθ}) W_n^{o,-1}(θ) ( I_p + H(e^{jθ}) W_x^o(θ) H*(e^{jθ}) W_n^{o,-1}(θ) )^{-1} H(e^{jθ}) = 4πλ_2^o I_m,   (25)

and λ_1^o > 0 and λ_2^o > 0 are the Lagrange multipliers associated with the constraint sets A_3 and A_1, respectively, which are computed from

  ∫_0^{2π} Trace(W_x^o(θ)) dθ = P_x,   (26)
  ∫_0^{2π} Trace(W_n^o(θ)) dθ = P_n.   (27)

Proof. The proof is given in Appendix B.

Remark 4.2: Theorem 4.1 shows that the optimal matrices W_x^o and W_n^o are related through the matrix Riccati equation, which emerges in the solutions of many optimal estimation and control problems. For instance, the time evolution of the error covariance matrix in Kalman filtering is described by a Riccati equation [34]. Further, when H(e^{jθ}) is square and invertible, some manipulation of (24) and (25) shows that

  W_x^o(θ) = (λ_1^o/λ_2^o) H*(e^{jθ}) W_n^o(θ) H^{-*}(e^{jθ}).   (28)

Moreover, the optimal PSD matrices of the communicator and the noise are given by

  W_x^o = (λ_1^o/λ_2^o) H* (4πλ_2^o I_p + 4πλ_1^o H H*)^{-1} H,   (29)
  W_n^o = H (4πλ_2^o I_p + 4πλ_1^o H* H)^{-1} H*.   (30)

The formula for the achievable rate becomes

  R_a2 = (1/4π) ∫_0^{2π} log det( I_p + (λ_1^o/λ_2^o) H(e^{jθ}) H*(e^{jθ}) ) dθ.   (31)

When the channel is single-input single-output, (28) gives

  W_x^o(θ) = (λ_1^o/λ_2^o) W_n^o(θ).   (32)
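As a numerical sanity check on Remark 4.2 (with a randomly generated, hypothetical square channel at one frequency and arbitrarily chosen multipliers λ_1^o, λ_2^o), the closed forms (29) and (30) can be verified to satisfy both the proportionality relation (28) and the Riccati equation (24):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
lam1, lam2 = 0.7, 1.3            # hypothetical Lagrange multipliers
# hypothetical square, invertible channel matrix at one frequency
H = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))

Hh = H.conj().T
inv = np.linalg.inv
# optimal PSD matrices per (29) and (30)
Wx = (lam1 / lam2) * Hh @ inv(4 * np.pi * lam2 * np.eye(p)
                              + 4 * np.pi * lam1 * (H @ Hh)) @ H
Wn = H @ inv(4 * np.pi * lam2 * np.eye(p)
             + 4 * np.pi * lam1 * (Hh @ H)) @ Hh

# proportionality relation (28): Wx = (lam1/lam2) H* Wn H^{-*}
rhs28 = (lam1 / lam2) * Hh @ Wn @ inv(Hh)

# Riccati equation (24) residual, which should vanish
HWxH = H @ Wx @ Hh
riccati = (Wn @ Wn + 0.5 * Wn @ HWxH + 0.5 * HWxH @ Wn
           - HWxH / (4 * np.pi * lam1))
```

Both residuals vanish to machine precision, which is consistent with the push-through identity H*(cI + dHH*)^{-1} = (cI + dH*H)^{-1}H* used in relating (29) and (30).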
Thus, the explicit solution of the optimization problem (13) provides an explicit relation between W_x^o(θ) and W_n^o(θ): through the Riccati equation (24) together with (25) in the general case, and through (28) in the special case of a square and invertible channel matrix H(e^{jθ}). In the special case, the optimal PSD matrix of the noise is proportional to the optimal PSD matrix of the transmitted signal, demonstrating that the optimal solution for both players is to match the opponent's statistical properties. This shows the advantage of a direct solution of the optimization problem (13), since it directly relates the optimal transmitter and noise strategies, which cannot be seen if the problem is solved by numerical techniques.

Remark 4.3: From the solution given in Appendix B, the optimal solutions W_x^o(θ) and W_n^o(θ) constitute a saddle point of the optimization problem (13), meaning

  J(W_x, W_n^o) ≤ J(W_x^o, W_n^o) ≤ J(W_x^o, W_n),   (33)

for any W_x(θ) ∈ A_1, W_x(θ) ≠ W_x^o(θ), and any W_n(θ) ∈ A_3, W_n(θ) ≠ W_n^o(θ). In addition, it follows from Appendix B that

  sup_{W_x ∈ A_1} inf_{W_n ∈ A_3} J(W_x, W_n) = inf_{W_n ∈ A_3} sup_{W_x ∈ A_1} J(W_x, W_n).   (34)

Hence, when the transmitter does not know the PSD of the noise, it will use the PSD matrix W_x^o(θ), which guarantees a transmission rate of at least J(W_x^o, W_n^o) according to (33). Similarly, "nature" tends to use the PSD matrix W_n^o(θ), which is a scaled version of the optimal PSD matrix W_x^o(θ). In addition, the derived formula can be used to model transmission over a communication channel subject to interference from other users in the network. One such example is a public cellular system in which all the mobile and base stations use the same number of antennas.
If the channel matrix between the interferer and the base station of the cellular system is denoted by G(e^{jθ}) (of dimension p × p), then the mutual information rate is given by
$$\frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + HW_xH^*(GW_nG^*)^{-1}\big)\,d\theta = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + H_1W_xH_1^*W_n^{-1}\big)\,d\theta, \qquad (35)$$
where H1(e^{jθ}) = G^{-1}(e^{jθ})H(e^{jθ}). Here, it is assumed that G(e^{jθ}) is an invertible matrix.

V. ACHIEVABLE RATES

A. Achievable Rates for Uncertain Channel Matrix

Special case of additive perturbation. For the compound channel described by the set A2, it is shown that the supremum and the infimum of the problem defined by (11) can be interchanged (see Appendix A). Moreover, the form of (18) and (19), subject to (16), suggests that we are dealing with the worst case channel, determined by the singular values of the nominal channel matrix Hnom(e^{jθ}) reduced by the size of the uncertainty set |w(e^{jθ})|. Hence, the MIMO capacity depends only on the singular values of the channel frequency response and not on the unitary matrices U(e^{jθ}) and V(e^{jθ}) [12]. Therefore, all channel matrices that have the same singular values constitute an equivalence class, where all members of the class have the same channel capacity. For the specific structure of ∆(e^{jθ}),
$$\Delta(e^{j\theta}) = U(e^{j\theta})\Delta_s(e^{j\theta})V^*(e^{j\theta}), \qquad (36)$$
where
$$\Delta_s(e^{j\theta}) \triangleq \begin{bmatrix} \Delta_1(e^{j\theta}) & 0 \\ 0 & 0 \end{bmatrix}, \qquad (37)$$
any channel frequency response matrix from A2 is given by
$$H(e^{j\theta}) = U(e^{j\theta})\big(\Sigma(\theta) + \Delta_s(e^{j\theta})w(e^{j\theta})\big)V^*(e^{j\theta}), \qquad (38)$$
where Σ(θ) + ∆s(e^{jθ})w(e^{jθ}) is diagonal. In order to achieve the worst case channel capacity, it is enough to diagonalize the channel matrix H(e^{jθ}) by precoding the transmitted signal by V(e^{jθ}) and by shaping the received signal by U*(e^{jθ}), as shown in Fig. 2. Consequently, n parallel one-dimensional compound channels are obtained, which enables the use of n one-dimensional codes. Each one-dimensional compound channel is represented by
$$\sigma_i(\theta) + \delta_{s,ii}(e^{j\theta})w(e^{j\theta}), \quad i = 1, \ldots, n. \qquad (39)$$
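The equality in (35) can be sanity-checked numerically: whitening the interference GWnG* by G^{-1} leaves the mutual information rate unchanged. The sketch below verifies the per-frequency determinant identity for randomly generated matrices (the PSD matrices are arbitrary positive definite choices).

```python
import numpy as np

rng = np.random.default_rng(1)
p = 2
H = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
G = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))  # invertible a.s.
# Arbitrary positive definite PSD matrices at one frequency
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
B = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
Wx = A @ A.conj().T + np.eye(p)
Wn = B @ B.conj().T + np.eye(p)

def logdet(M):
    # log of |det(M)| via the numerically stable slogdet
    return np.linalg.slogdet(M)[1]

lhs = logdet(np.eye(p) + H @ Wx @ H.conj().T @ np.linalg.inv(G @ Wn @ G.conj().T))
H1 = np.linalg.inv(G) @ H                       # equivalent channel of (35)
rhs = logdet(np.eye(p) + H1 @ Wx @ H1.conj().T @ np.linalg.inv(Wn))
assert np.isclose(lhs, rhs)
```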
Fig. 2. Transmission scheme for uncertain channel matrix (precoding by V at the transmitter, shaping by U* at the receiver).

In (39), δs,ii(e^{jθ}) is an element of ∆s(e^{jθ}) = diag(δs,11(e^{jθ}), ..., δs,nn(e^{jθ})), |δs,ii(e^{jθ})| ≤ 1, i = 1, ..., n. For each of the n one-dimensional channels defined by (39), the worst case channel is determined by σi(θ) − |w(e^{jθ})|, which represents the channel with the smallest magnitude out of all possible channels defined by (39). Thus, following Shannon's work [35] and the notion of "degrading" (for one practical example of degrading channels see [36]), it is possible to use n one-dimensional codes for the compound channel defined by A2 subject to (16), each of which is tuned to the corresponding worst case channel σi(θ) − |w(e^{jθ})|, i = 1, ..., n. Namely, a one-dimensional code tuned to the worst case channel will perform well over the whole class of channels defined by (39), because all other channels from the class are less detrimental than the worst case channel, i.e., they have larger magnitude. Moreover, we say that each channel from the set {σi(θ) + δs,ii(e^{jθ})w(e^{jθ}) : |δs,ii(e^{jθ})| ≤ 1} can be "degraded" to the worst case channel σi(θ) − |w(e^{jθ})|, −π ≤ θ ≤ π, meaning that each channel from this set can be transformed into the worst case channel by reducing its magnitude at each frequency θ to σi(θ) − |w(e^{jθ})|. This implies that the set of n one-dimensional codes is universal for the special case of the MIMO compound channel defined by (16). Therefore, Ra1 represents the channel capacity of the MIMO compound channel defined by the set A2 subject to (16).

General case of additive perturbation. In the general case, ∆(e^{jθ}) does not have the form of (36). Next, consider the general form of the additive uncertainty description given by
$$H(e^{j\theta}) = H_{nom}(e^{j\theta}) + \Delta(e^{j\theta})w(e^{j\theta}) \qquad (40)$$
$$= U(e^{j\theta})\Sigma(\theta)V^*(e^{j\theta}) + \Delta(e^{j\theta})w(e^{j\theta}) \qquad (41)$$
$$= U\big(\Sigma + U^*\Delta V w\big)V^*. \qquad (42)$$
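The precoding/shaping step of Fig. 2 can be sketched in a few lines: the SVD of the nominal channel yields U and V, precoding by V and shaping by U* diagonalizes the channel, and each resulting mode has worst-case gain σi − |w|. The channel values and the uncertainty radius below are illustrative, not taken from the paper's example.

```python
import numpy as np

rng = np.random.default_rng(2)
p = m = 3
# Illustrative nominal channel at one frequency (hypothetical values)
Hnom = rng.standard_normal((p, m)) + 1j * rng.standard_normal((p, m))
U, sigma, Vh = np.linalg.svd(Hnom)

# Precoding by V and shaping by U* diagonalizes the nominal channel
Heq = U.conj().T @ Hnom @ Vh.conj().T
assert np.allclose(Heq, np.diag(sigma), atol=1e-10)

# Worst-case per-mode gains for an assumed uncertainty radius |w|;
# modes with sigma_i <= |w| cannot support transmission
w_abs = 0.3
worst_gains = np.maximum(sigma - w_abs, 0.0)
```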
Thus, the second term within the parentheses in (42) corresponds to (36), i.e.,
$$\Delta_s(e^{j\theta}) \triangleq U^*(e^{j\theta})\Delta(e^{j\theta})V(e^{j\theta}). \qquad (43)$$
It follows that the most general case of (42) is equivalent to the case when ∆s(e^{jθ}) is not diagonal. Then, ∆s(e^{jθ}) can be written as the sum of a diagonal and an off-diagonal part, namely, ∆s(e^{jθ}) = ∆s,diag(e^{jθ}) + ∆s,off−diag(e^{jθ}). The effect of the off-diagonal elements of ∆s(e^{jθ}) may be viewed as an additive noise, which is explained next. If we use the same strategy as before, precoding the transmitted signal by V(e^{jθ}) and shaping the received signal by U*(e^{jθ}) (see [16]), the equivalent channel representation is given by
$$H_{eq}(e^{j\theta}) = U^*(e^{j\theta})H(e^{j\theta})V(e^{j\theta}) \qquad (44)$$
$$= \Sigma + \Delta_{s,diag}w + \Delta_{s,off-diag}w, \qquad (45)$$
in which the output of the ith channel is represented in the frequency domain by
$$y_i(e^{j\theta}) = \big(\sigma_i(\theta) + \delta_{s,ii}(e^{j\theta})w(e^{j\theta})\big)x_i(e^{j\theta}) + \sum_{k\neq i}\delta_{s,ik}(e^{j\theta})w(e^{j\theta})x_k(e^{j\theta}) + n_i(e^{j\theta}). \qquad (46)$$
Here, δs,ij(e^{jθ}), 1 ≤ i ≤ p, 1 ≤ j ≤ m, are the entries of ∆s(e^{jθ}). The off-diagonal elements δs,ij(e^{jθ}), i ≠ j, are absorbed into an equivalent noise
$$n_{eq,i}(e^{j\theta}) = n_i(e^{j\theta}) + \sum_{k\neq i}\delta_{s,ik}(e^{j\theta})w(e^{j\theta})x_k(e^{j\theta}). \qquad (47)$$
Thus, one can use n one-dimensional codes, each tuned to a corresponding σi(θ) − |w(e^{jθ})| channel, and include the additional term created by the off-diagonal elements of ∆s(e^{jθ}) in the noise. This means that the channel capacity in the most general case will be less than the value found in Theorem 3.3. It is interesting to note that this interpretation of the uncertainty is equivalent to the "interference of the second kind" introduced by Blachman in [5], where the overall interference has two parts: one orthogonal to the transmitted signal, and the other antiparallel to it.

Remark 5.1: Ra1, given in Theorem 3.3, represents an upper bound on the capacity of the MIMO compound channel defined by the set A2.
An achievable rate in the general case can be inferred from (46). First, k∆(e^{jθ})k∞ ≤ 1 and (43) imply k∆s(e^{jθ})k∞ ≤ 1 and |δs,ij(e^{jθ})| ≤ 1, 1 ≤ i ≤ p, 1 ≤ j ≤ m. Therefore, the power of the equivalent noise neq,i(e^{jθ}) can be upper bounded by
$$W_{n_{eq,i}} \le W_{n_i} + \sum_{k\neq i}|w|^2 W_{x_k}, \qquad (48)$$
where Wneq,i(θ), Wni(θ), and Wxk(θ) are the PSDs of the equivalent noise neq,i of the ith channel, the noise ni of the ith channel, and the signals xk coming from the transmitters k ≠ i, respectively. Also, it is assumed that the signals transmitted from different antennas are independent. Having (48) in mind, an achievable rate for the general case can be computed, and it is given by the following theorem.

Theorem 5.2: An achievable rate for the compound channel defined by the set A2 is the solution of the following optimization problem
$$\sup\ \frac{1}{4\pi}\sum_{i=1}^{n}\int_0^{2\pi}\log\Big(1 + \frac{W_{x_i}(\sigma_i - |w|)^2}{W_{n_i} + \sum_{k\neq i}|w|^2 W_{x_k}}\Big)\,d\theta \quad \text{s.t.}\ \sum_{i=1}^{n}\int_0^{2\pi}W_{x_i}\,d\theta = P_x. \qquad (49)$$
Proof. The proof follows from the achievability for single-input single-output channels. From (47), the equivalent noise at the ith receiver consists of two parts: one that is the product of the thermal noise at the ith receiver, and the other that is the product of the interference coming from the signals generated at the transmitters k ≠ i.

Corollary 5.3: If it is assumed that Σ_{k≠i}|w|²Wxk is negligible compared to Wni in Theorem 5.2, then the ensemble of n one-dimensional codes can achieve the transmission rate Ra1 given by Theorem 3.3. In practice, this could happen if the noise n is used to model not only the thermal noise, but also the interference from other users in a communication network.

B. Achievable Rates for Uncertain Noise

When the noise uncertainty is described by the set A3, the optimal PSD matrix of the transmitted signal Wxo(θ) and the optimal PSD matrix of the noise Wno(θ) satisfy the saddle point condition, as proved in Appendix B. This suggests that the notion of the worst case noise can be employed.
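The power allocation behind Theorem 5.2 can be sketched numerically under the simplification of Corollary 5.3 (interference term dropped): water-filling over parallel channels with gains (σi − |w|)². The sketch below works with a flat (frequency-independent) set of gains, which are illustrative values rather than the paper's example; the bisection searches for the water level µ of (19).

```python
import numpy as np

def waterfill(gains2, Px, noise=1.0):
    """Allocate total power Px over parallel channels with squared gains
    `gains2` by bisection on the water level mu."""
    def total_power(mu):
        return np.sum(np.maximum(mu - noise / gains2, 0.0))
    lo, hi = 0.0, noise / gains2.min() + Px + 1.0  # total_power(hi) > Px
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if total_power(mid) < Px:
            lo = mid
        else:
            hi = mid
    return np.maximum(lo - noise / gains2, 0.0)

# Hypothetical worst-case gains (sigma_i - |w|)^2 for three modes
gains2 = np.array([2.0, 0.8, 0.1])
P = waterfill(gains2, Px=1.0)
rate = np.sum(np.log(1.0 + P * gains2))  # nats, per the paper's units
assert np.isclose(P.sum(), 1.0, atol=1e-6)
```

Note that weak modes (small σi − |w|) may receive zero power for small Px, matching the discussion of the examples below.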
To show the achievability of Ra2 given by Theorem 4.1, we assume the following: 1) the receiver has knowledge of the noise PSD matrix Wn(θ), while the transmitter does not have to know it; 2) the transmitter uses a "Gaussian codebook". In this paper, the "Gaussian codebook" is defined as the set of codewords {xi}_{i=1}^{M} of length N, where xi = [xi(0), ..., xi(N − 1)], xi(k) ∈ C^m, k = 0, ..., (N − 1), are the sample paths of M independent discrete-time Gaussian processes, all having the same PSD matrix Wxo(θ), the one that is optimal with respect to the worst case noise PSD, Wno(θ). It is assumed that the stochastic processes that generate the codewords are ergodic. Then, the achievable transmission rate is J(Wxo(θ), Wn(θ)) for any choice of the noise PSD matrix Wn(θ) (proved in [12]). From the saddle point property, J(Wxo(θ), Wno(θ)) ≤ J(Wxo(θ), Wn(θ)), implying that the achievable transmission rate is lower bounded by J(Wxo(θ), Wno(θ)). This transmission rate is achievable when the worst case noise (having PSD Wno(θ)) affects the transmission. Hence, Ra2 = J(Wxo(θ), Wno(θ)) is an achievable rate for the whole class of noises A3, subject to the assumption that the receiver knows Wn(θ) while the transmitter does not have to. The transmitter has to know J(Wxo(θ), Wno(θ)) in order to choose a transmission rate R < J(Wxo(θ), Wno(θ)). Therefore, a "Gaussian codebook" provides robustness when the transmitter does not know the noise. However, the previous consideration does not prove the universality of the "Gaussian codebook", i.e., it does not show the existence of a single code that is good for all noises in A3. Rather, based on the classical result [12] and the saddle point property found here, it can be shown that the probability of decoding error, averaged over randomly chosen codebooks, tends to zero for any noise from A3.

VI. EXAMPLES

A.
Uncertain Channel

The following example illustrates the computation of the achievable rates when the uncertainty with respect to the channel frequency response matrix is described by the set A2. The nominal channel frequency response matrix is given by
$$H_{nom}(e^{j\theta}) = \begin{bmatrix} \dfrac{1}{1-\frac{1}{2}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{4}e^{-j\theta}} \\[2mm] \dfrac{1}{1-\frac{1}{4}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{2}e^{-j\theta}} \end{bmatrix}, \qquad (50)$$
while the upper bound on the uncertainty set is given by
$$H_{ub}(e^{j\theta}) = \begin{bmatrix} \dfrac{\delta}{1-\frac{1}{2}e^{-j\theta}} & \dfrac{1.4}{1-\frac{1}{4}e^{-j\theta}} \\[2mm] \dfrac{1.4}{1-\frac{1}{4}e^{-j\theta}} & \dfrac{\delta}{1-\frac{1}{2}e^{-j\theta}} \end{bmatrix}, \qquad (51)$$
where δ is a known constant. The magnitudes of [Hnom(e^{jθ})]11 and [Hub(e^{jθ})]11 are shown in Fig. 3, for δ = 1.6. The size of the uncertainty set is determined by observing that
$$\bar{\sigma}\big(H_{ub}(e^{j\theta}) - H_{nom}(e^{j\theta})\big) = \bar{\sigma}\big(\Delta(e^{j\theta})w(e^{j\theta})I_m\big) \le |w(e^{j\theta})|.$$
For the connection between the maximal singular value and k·k∞, see Appendix C. It is assumed that |w(e^{jθ})| = σ̄(Hub(e^{jθ}) − Hnom(e^{jθ})) for each frequency θ ∈ [−π, π].

Fig. 3. Magnitudes of [Hnom(e^{jθ})]11 and [Hub(e^{jθ})]11, for δ = 1.6.
Fig. 4. σ1(θ) and |w(e^{jθ})|.

In Figs. 4 and 5, the size of the uncertainty set |w(e^{jθ})| is compared to the singular values σ1(θ) and σ2(θ) of the nominal frequency response matrix Hnom(e^{jθ}), respectively, for δ = 1.6. From Fig. 5, the frequency range over which transmission over the second mode is optimal can be determined. As suggested by (18) and (19), the transmission takes place only over the frequencies where σ2(θ) > |w(e^{jθ})|. This implies that, for a larger size of the uncertainty set |w(e^{jθ})|, the weaker mode will not be used in the transmission.
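The quantities plotted in Figs. 3-5 can be recomputed directly from (50) and (51). The sketch below evaluates |w(e^{jθ})| = σ̄(Hub − Hnom) and the singular values of Hnom on a frequency grid; it assumes the matrix entries as reconstructed above, which should be checked against the original typeset equations.

```python
import numpy as np

delta = 1.6
thetas = np.linspace(-np.pi, np.pi, 512)
z = np.exp(-1j * thetas)
a = 1.0 / (1 - 0.5 * z)    # 1/(1 - (1/2) e^{-j theta})
b = 1.0 / (1 - 0.25 * z)   # 1/(1 - (1/4) e^{-j theta})

w_abs = np.empty_like(thetas)
sig1 = np.empty_like(thetas)
sig2 = np.empty_like(thetas)
for k in range(len(thetas)):
    Hnom = np.array([[a[k], b[k]], [b[k], a[k]]])
    Hub = np.array([[delta * a[k], 1.4 * b[k]], [1.4 * b[k], delta * a[k]]])
    # Size of the uncertainty set at this frequency: largest singular value
    w_abs[k] = np.linalg.svd(Hub - Hnom, compute_uv=False)[0]
    sig1[k], sig2[k] = np.linalg.svd(Hnom, compute_uv=False)

# Mode i is usable only at frequencies where sigma_i(theta) > |w(theta)|
usable_mode1 = sig1 > w_abs
assert np.all(sig1 >= sig2)
```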
In the case of the first singular value, the size of the uncertainty set |w(e^{jθ})| is smaller than σ1(θ) for all frequencies between −π and π, implying that all frequencies will be used for the transmission. It is interesting to observe (19): it requires the positivity of µ − (σi(θ) − |w(e^{jθ})|)^{-2}, i = 1, ..., n. Fig. 6 shows (σ1(θ) − |w(e^{jθ})|)^{-2}, while Figs. 7 and 8 show (σ2(θ) − |w(e^{jθ})|)^{-2}. The peaks in Fig. 7 occur at the frequencies where σ2(θ) = |w(e^{jθ})|. In order to have transmission over the ith mode, the constant µ, which is chosen in accordance with the power constraint (19), must be larger than (σi(θ) − |w(e^{jθ})|)^{-2} over an interval of frequencies θ ∈ [−π, π]. Because σi(θ) − |w(e^{jθ})| is smaller for smaller singular values of the nominal channel frequency response matrix, (σi(θ) − |w(e^{jθ})|)^{-2} is larger for smaller singular values. This can be verified from Figs. 6 and 8.

Fig. 5. σ2(θ) and |w(e^{jθ})|.

Thus, for small values of the transmitted power Px, the level of power poured onto (σi(θ) − |w(e^{jθ})|)^{-2} may be small. Hence, for a small transmitted power Px, it could happen that the weaker modes are left without power. In fact, larger uncertainty makes the weaker modes even weaker, which may result in transmission only over the strongest mode for small transmitted power. Fig. 9 shows the achievable rate versus the parameter δ, which determines the size of the uncertainty set, for a fixed value of Px = 1 W. As expected, the achievable rate decreases as the size of the uncertainty set increases.

B. Uncertain Noise

Next, it is shown how the achievable rates for the MIMO communication channel can be computed when the PSD matrix of the noise Wn(θ) belongs to the set A3. The channel frequency response is completely known, and it is given by
$$H(e^{j\theta}) = \begin{bmatrix} \dfrac{1}{1-\frac{1}{2}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{4}e^{-j\theta}} \\[2mm] \dfrac{1}{1-\frac{1}{4}e^{-j\theta}} & \dfrac{1}{1-\frac{1}{2}e^{-j\theta}} \end{bmatrix}. \qquad (52)$$
Fig. 6. (σ1(θ) − |w(e^{jθ})|)^{-2}.
Fig. 7. (σ2(θ) − |w(e^{jθ})|)^{-2}.
Fig. 8. (σ2(θ) − |w(e^{jθ})|)^{-2} (detail).
Fig. 9. Achievable rate Ra1 vs. δ.
Fig. 10. Achievable rate Ra2 vs. SNR.

In this case, the channel frequency response matrix H(e^{jθ}) is square and invertible. To find all required quantities, (29) and (30) are used. Fig. 10 shows the channel capacity vs. SNR = 10 log Px/Pn. The transmitted power is limited by Px = 0.1 W, while Pn is varied over a certain range. It is interesting to notice that the Lagrange multipliers λ1^o and λ2^o represent the derivatives of Ra2 with respect to Pn and Px, respectively. The multipliers are shown in Figs. 11 and 12. It can be seen that Ra2 is more sensitive to variations in Pn and Px for high SNR. But a better picture regarding the sensitivity of Ra2 with respect to Pn and Px is obtained by comparing the relative perturbation of Ra2, ∆Ra2/Ra2, to the relative perturbations of Pn, ∆Pn/Pn, and Px, ∆Px/Px. Viewing Ra2 as a function of Pn and Px, and taking the limits with respect to Pn and Px, we get
$$S_n = \lim_{\Delta P_n \to 0}\frac{\Delta R_{a2}/R_{a2}}{\Delta P_n/P_n} = \frac{P_n}{R_{a2}}\frac{dR_{a2}}{dP_n}, \qquad (53)$$
$$S_x = \lim_{\Delta P_x \to 0}\frac{\Delta R_{a2}/R_{a2}}{\Delta P_x/P_x} = \frac{P_x}{R_{a2}}\frac{dR_{a2}}{dP_x}. \qquad (54)$$
The ratios of relative perturbations are shown in Figs. 13 and 14. For this particular example, the relative perturbation of Ra2 is between 1/3 and 1/2 of the relative perturbations of Pn and Px. Thus, the sensitivity of Ra2 with respect to Pn and Px is not significant.
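The sensitivities (53)-(54) are straightforward to estimate by central differences once a rate function R(Pn, Px) is available. The sketch below uses a simple SISO-style rate R = log(1 + Px/Pn) purely as a stand-in; in the paper, R would be Ra2 computed from (29)-(31).

```python
import numpy as np

def R(Pn, Px):
    # Stand-in rate function (assumed, not the paper's Ra2)
    return np.log(1.0 + Px / Pn)

def sensitivities(Pn, Px, h=1e-6):
    """Estimate (53) and (54) by central differences."""
    dRdPn = (R(Pn + h, Px) - R(Pn - h, Px)) / (2 * h)
    dRdPx = (R(Pn, Px + h) - R(Pn, Px - h)) / (2 * h)
    r = R(Pn, Px)
    return Pn * dRdPn / r, Px * dRdPx / r

Sn, Sx = sensitivities(Pn=0.05, Px=0.1)
# For this stand-in rate, Pn*dR/dPn = -Px*dR/dPx, so Sn = -Sx
assert np.isclose(Sn, -Sx, atol=1e-5)
```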
Fig. 11. Derivative of Ra2 with respect to Pn (λ1^o).
Fig. 12. Derivative of Ra2 with respect to Px (λ2^o).
Fig. 13. Sensitivity with respect to Pn.
Fig. 14. Sensitivity with respect to Px.

VII. CONCLUSION

This paper deals with the information theoretic bounds of compound MIMO Gaussian channels with memory for two different types of uncertainties. The uncertainty of the channel matrix frequency response is described through a subset of H∞ space, while the uncertainty of the noise PSD matrix is represented through a subset of L1 function space. For both problems, explicit formulas for the achievable rates are derived, as well as the optimal PSD matrices of the transmitted signals. For the case when the noise is uncertain, the optimal PSD matrix of the noise is derived, and it is shown to be related to the optimal PSD matrix of the transmitted signal through a matrix Riccati equation. In the special case when the channel frequency response matrix is square and invertible, the optimal PSD matrix of the noise is proportional to the optimal PSD matrix of the transmitted signal. For the case when the channel frequency response matrix is uncertain, the achievable rate and the optimal PSD matrix of the transmitted signal depend on the size of the uncertainty set. From these two formulas, it can be concluded that transmission over the strongest mode of the nominal channel frequency response matrix is the optimal strategy as the size of the uncertainty set increases. In addition, it is shown that, under certain conditions, the worst case channel can be identified and that the worst case code, i.e., the code that performs well over the worst case channel, is universal for the compound channel of the first type.

ACKNOWLEDGMENT

The authors would like to thank Professor Frank Kschischang from the Department of Electrical and Computer Engineering, University of Toronto, for his useful comments and suggestions.
APPENDIX A
PROOF OF THEOREM 3.1 AND THEOREM 3.3

Notice that the constraint H(e^{jθ}) ∈ A2 is the same as k∆(e^{jθ})k∞ ≤ 1. Instead of working with this constraint, we work with the constraint ∆*(e^{jθ})∆(e^{jθ}) − Im ≤ 0, for θ ∈ [0, 2π], where the notation means that ∆*∆ − Im is non-positive definite. The latter constraint implies the former (see (86)). This modification does not change the problem; it turns out that the optimal ∆o(e^{jθ}) lies on the boundary, k∆o(e^{jθ})k∞ = 1.

Next, define the Lagrangian J1(Wx, ∆, K),
$$J_1 = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + (H_{nom}+w\Delta)W_x(H_{nom}+w\Delta)^*\big)\,d\theta + \int_0^{2\pi}\mathrm{Trace}\big[K(\Delta^*\Delta - I_m)\big]\,d\theta,$$
where K ≥ 0 is a Lagrangian matrix of appropriate dimensions. Next, we use the Kuhn-Tucker conditions [37]. The variation of J1 with respect to ∆(e^{jθ}) gives a necessary condition for the infimum of J1 with respect to ∆(e^{jθ}):
$$\frac{1}{4\pi}wW_x(H_{nom}+w\Delta^o)^*\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)^{-1} + K\Delta^{o*} = 0, \qquad (55)$$
which cannot be solved explicitly. Therefore, it has to be accounted for as an equality constraint in order to resolve the supremum part. The original problem sup_{Wx∈A1} inf_{k∆k∞≤1} J(Wx, Wn) is equivalent to sup_{Wx∈A1} sup_{K≥0} J1(Wx, ∆o, K) subject to the constraint (55) (see [37], [38]). Further, introduce the Lagrangian J2(Wx, ∆o, K, S, λ1) by
$$J_2 = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)\,d\theta + \int_0^{2\pi}\mathrm{Trace}\big[K(\Delta^{o*}\Delta^o - I_m)\big]\,d\theta$$
$$- \int_0^{2\pi}\mathrm{Trace}\Big[S\Big(\frac{1}{4\pi}wW_x(H_{nom}+w\Delta^o)^* + K\Delta^{o*}\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)\Big)\Big]\,d\theta - \lambda_1\Big(\int_0^{2\pi}\mathrm{Trace}(W_x)\,d\theta - P_x\Big), \qquad (56)$$
where λ1 and S are Lagrange multipliers; λ1 is a positive constant, and S is a matrix of appropriate dimensions.
Then
$$\sup_{W_x\in A_1}\ \sup_{K\ge 0} J_1(W_x, \Delta^o, K), \qquad (57)$$
subject to the constraint (55), is equivalent to
$$\inf_{\lambda_1\ge 0}\ \inf_{S}\ \sup_{K\ge 0}\ \sup_{W_x\in A_1} J_2(W_x, \Delta^o, K, S, \lambda_1), \qquad (58)$$
because (58) is a dual problem of (57) [37], [38]. Since Wx(θ) and ∆o(e^{jθ}) are related through the equality constraint (55), the Lagrangian J2 has to be varied with respect to both Wx(θ) and ∆o(e^{jθ}). By varying J2 with respect to ∆o(e^{jθ}) and equating the derivative to zero, the following equation is obtained
$$\frac{1}{4\pi}wW_x(H_{nom}+w\Delta^o)^*\big(I_p + (H_{nom}+w\Delta^o)W_x(H_{nom}+w\Delta^o)^*\big)^{-1} + K\Delta^{o*} = W_xH_{nom}^*SK\Delta^{o*}w + W_x\Delta^{o*}SK\Delta^{o*}|w|^2, \qquad (59)$$
where the term on the left-hand side is equal to zero (see (55)). This observation implies that
$$wW_x(\theta)\big(H_{nom}^*(e^{j\theta}) + w^*(e^{j\theta})\Delta^{o*}(e^{j\theta})\big)SK\Delta^{o*}(e^{j\theta}) = 0.$$
It follows that either Hnom(e^{jθ}) + w(e^{jθ})∆o(e^{jθ}) = 0, or S = 0, or K = 0, or some combination of these conditions holds. If Hnom(e^{jθ}) + w(e^{jθ})∆o(e^{jθ}) = 0, then from (55) it follows that K = 0, meaning that the constraint imposed on ∆o(e^{jθ}) vanishes and Ra1 is equal to zero, which is a trivial solution. Thus, the possibility that remains is S = 0. This and the form of the Lagrangian J2 indicate that the sup and inf problems may be solved independently. To verify this claim, observe that when S = 0, the Lagrangian J2, (56), corresponds to the Lagrangian when a saddle point exists. When a saddle point exists, the Lagrangian consists of the payoff function, the term describing the constraint on ∆(e^{jθ}), and the term describing the constraint on Wx(θ), while the Lagrangian is varied with respect to ∆(e^{jθ}) and Wx(θ) as if they were independent. This is exactly the case for (56) with S = 0. Hence, the conclusion is that
$$\sup_{W_x\in A_1}\ \inf_{H\in A_2} J(W_x(\theta), W_n(\theta)) = \inf_{H\in A_2}\ \sup_{W_x\in A_1} J(W_x(\theta), W_n(\theta)). \qquad (60)$$
Thus, for S = 0, we vary the Lagrangian J2 with respect to Wx(θ) to get
$$(H_{nom}+w\Delta^o)^*\big(I + (H_{nom}+w\Delta^o)W_x^o(H_{nom}+w\Delta^o)^*\big)^{-1}(H_{nom}+w\Delta^o) = 4\pi\lambda_1 I_m, \qquad (61)$$
which is the equation satisfied by the optimal Wxo(θ). From the Kuhn-Tucker conditions [37], [38],
$$\lambda_1\Big[\int_0^{2\pi}\mathrm{Trace}(W_x^o(\theta))\,d\theta - P_x\Big] = 0, \qquad (62)$$
$$\int_0^{2\pi}\mathrm{Trace}\big[K(\Delta^{o*}\Delta^o - I_m)\big]\,d\theta = 0. \qquad (63)$$
Further, by observing that λ1 ≠ 0 and K ≠ 0 (because the opposite conditions imply the trivial solution C = 0), it follows that
$$\int_0^{2\pi}\mathrm{Trace}(W_x^o(\theta))\,d\theta = P_x, \qquad (64)$$
$$\Delta^{o*}\Delta^o - I_m = 0, \qquad (65)$$
concluding the proof of Theorem 3.1.

However, Theorem 3.1 does not provide the explicit solution for ∆o(e^{jθ}). The reason is that the k·k∞ norm puts the constraint on ∆*(e^{jθ})∆(e^{jθ}), but not on ∆(e^{jθ}) itself. So, in Theorem 3.3, we make an additional step to find ∆o(e^{jθ}). Next, it is shown why we deal with a particular case of ∆(e^{jθ}). We find the conditions that should be satisfied such that the integrand
$$\det\big(I_p + (H_{nom}+w\Delta)W_x(H_{nom}+w\Delta)^*\big) \qquad (66)$$
can be maximized with respect to Wx(θ) by using Hadamard's inequality. Having in mind that Hnom(e^{jθ}) = U(e^{jθ})Σ(e^{jθ})V*(e^{jθ}), and using the fact that det(In + AB) = det(Im + BA) (where A is an n × m matrix and B is an m × n matrix), the integrand can be expressed as
$$\det\big(I_p + V^*W_xV(\Sigma + wU^*\Delta V)^*(\Sigma + wU^*\Delta V)\big) = \det(I_p + \tilde{W}_x R^*R), \qquad (67)$$
where R = Σ + wU*∆V and W̃x = V*WxV. Further, by using Hadamard's inequality (as in [32]), the determinant can be upper bounded by the product of the diagonal elements,
$$\det(I_p + \tilde{W}_xR^*R) \le \prod_{i=1}^{n}(1 + [Q]_{ii}), \qquad (68)$$
where
$$Q = \tilde{W}_xR^*R. \qquad (69)$$
Equality in (68) is achieved when Q is diagonal. If R*R were diagonal, Hadamard's inequality could be used to maximize the mutual information rate in W̃x, and the maximizing W̃x would be diagonal.
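The two facts used around (67)-(68) are easy to check numerically: the determinant identity det(In + AB) = det(Im + BA), and Hadamard's bound applied to I + Q with Q = W̃x R*R for diagonal positive W̃x. The matrices below are random illustrative instances.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
# det(I_n + AB) = det(I_m + BA)
d1 = np.linalg.det(np.eye(n) + A @ B)
d2 = np.linalg.det(np.eye(m) + B @ A)
assert np.isclose(d1, d2)

# Hadamard bound (68): det(I + Q) <= prod_i (1 + Q_ii), Q = Wx_tilde R* R
Rm = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Wx_t = np.diag(rng.uniform(0.1, 1.0, n))      # diagonal positive Wx_tilde
Q = Wx_t @ Rm.conj().T @ Rm
lhs = np.linalg.det(np.eye(n) + Q).real
rhs = np.prod(1.0 + np.diag(Q).real)
assert lhs <= rhs + 1e-9
```

(The bound follows because I + Q is similar to the Hermitian positive definite matrix I + W̃x^{1/2} R*R W̃x^{1/2}, which has the same diagonal products.)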
Therefore, the original sup-inf problem is solved by using this information concerning the diagonal property of W̃x and R*R to determine ∆o(e^{jθ}), which minimizes J(Wx(θ), Wn(θ)). Assume W̃x is diagonal and choose the matrix ∆(e^{jθ}) as follows
$$\Delta^o(e^{j\theta}) = U(e^{j\theta})\begin{bmatrix}\Delta_1(e^{j\theta}) & 0\\ 0 & 0\end{bmatrix}V^*(e^{j\theta}), \qquad (70)$$
where ∆1(e^{jθ}) is an n × n matrix, ∆1(e^{jθ}) = diag(δ1, ..., δn), and n is the rank of Hnom(e^{jθ}), n ≤ min(p, m). This ensures that R*R is diagonal. Then
$$\det(I_p + \tilde{W}_xR^*R) = \prod_{i=1}^{n}\big(1 + [\tilde{W}_x]_{ii}|\sigma_i + \delta_i w|^2\big), \qquad (71)$$
where {σi}_{i=1}^{n} are the singular values of Hnom(e^{jθ}). Note that ∆1*∆1 is the identity matrix, which follows from ∆*∆ = Im (see (63)); hence |δi| = 1, i = 1, ..., n. The determinant in (71) is minimized when ξi = |σi + δi w|² is minimized. ξi has the following lower bound
$$\xi_i = \sigma_i^2 + 2\sigma_i|\delta_i||w|\cos\big(\arg(\delta_i) + \arg(w)\big) + |\delta_i|^2|w|^2 \ge (\sigma_i - |w|)^2, \qquad (72)$$
where δi = |δi|e^{j arg(δi)} = e^{j arg(δi)} and w = |w|e^{j arg(w)}. The matrix ∆1(e^{jθ}) that achieves the lower bound is given by (21). The optimal PSD matrix Wx(θ) is found by substituting (70) into (61) and using (64).

APPENDIX B
PROOF OF THEOREM 4.1

The achievable rate Ra2 in the case of noise uncertainty is given in the form of the double optimization problem (13),
$$\sup_{W_x\in A_1}\ \inf_{W_n\in A_3} J(W_x(\theta), W_n(\theta)).$$
We formulate the Lagrangian J1(Wx, Wn, λ1),
$$J_1 = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I + HW_xH^*W_n^{-1}\big)\,d\theta + \lambda_1\Big(\int_0^{2\pi}\mathrm{Trace}(W_n)\,d\theta - P_n\Big), \qquad (73)$$
where λ1 is a constant, representing the Lagrange multiplier associated with the constraint imposed on Wn, and apply the Kuhn-Tucker conditions [37]. The variation of J1 with respect to Wn gives the quadratic, Riccati-type equation
$$W_n^{o2}(\theta) + \tfrac{1}{2}W_n^o(\theta)H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta}) + \tfrac{1}{2}H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta})W_n^o(\theta) - \frac{1}{4\pi\lambda_1^o}H(e^{j\theta})W_x^o(\theta)H^*(e^{j\theta}) = 0, \qquad (75)$$
which cannot be solved explicitly. Therefore, it has to be accounted for as an equality constraint in order to resolve the supremum part. The original problem sup_{Wx∈A1} inf_{Wn∈A3} J(Wx, Wn) is equivalent to sup_{Wx∈A1} sup_{λ1≥0} J1(Wx, Wno, λ1) subject to (75) (dual problem [37], [38]).
Further, introduce the Lagrangian J2(Wx, Wno, λ1, λ2, K),
$$J_2 = \frac{1}{4\pi}\int_0^{2\pi}\log\det\big(I_p + HW_xH^*W_n^{o-1}\big)\,d\theta + \lambda_1\Big(\int_0^{2\pi}\mathrm{Trace}(W_n^o)\,d\theta - P_n\Big) - \lambda_2\Big(\int_0^{2\pi}\mathrm{Trace}(W_x)\,d\theta - P_x\Big)$$
$$- \int_0^{2\pi}\mathrm{Trace}\Big[K\Big(W_n^{o2} + \tfrac{1}{2}HW_xH^*W_n^o + \tfrac{1}{2}W_n^oHW_xH^* - \frac{1}{4\pi\lambda_1^o}HW_xH^*\Big)\Big]\,d\theta, \qquad (76)$$
where λ2 is a constant and K is a matrix, representing Lagrange multipliers. Then,
$$\sup_{W_x\in A_1}\ \sup_{\lambda_1\ge 0} J_1(W_x, W_n^o, \lambda_1), \qquad (77)$$
subject to (75), is equivalent to
$$\inf_{\lambda_2\ge 0}\ \inf_{K}\ \sup_{\lambda_1\ge 0}\ \sup_{W_x\in A_1} J_2(W_x, W_n^o, \lambda_1, \lambda_2, K), \qquad (78)$$
(dual problem [37], [38]). Since Wx and Wno are related through the equality constraint (75), the Lagrangian J2 is varied with respect to both Wx and Wno. By varying J2 with respect to Wx, the following equation is obtained
$$H^*W_n^{o-1}\big(I + HW_xH^*W_n^{o-1}\big)^{-1}H - 4\pi H^*W_nKH + \frac{1}{\lambda_1}H^*KH = 4\pi\lambda_2 I_m. \qquad (79)$$
By varying J2 with respect to Wn, we have
$$-W_n^{o-1}\big(I_p + HW_xH^*W_n^{o-1}\big)^{-1}HW_xH^*W_n^{o-1} + 4\pi\lambda_1 I_p = 4\pi K\big(2W_n^o + HW_xH^*\big). \qquad (80)$$
Further, (80) can be massaged to give
$$\frac{1}{\lambda_1^o}\big(I + HW_xH^*W_n^{o-1}\big)W_n^oK\big(2W_n^o + HW_xH^*\big)W_n^o = 0, \qquad (81)$$
which follows from (75). Hence, one or more of the factors must be equal to zero. From the setting of the problem, the only possibility that remains is K = 0. This implies that the two constraints imposed on Wx and Wn can be decoupled, and that Wxo and Wno satisfy the saddle point property
$$J(W_x, W_n^o) \le J(W_x^o, W_n^o) \le J(W_x^o, W_n), \qquad (82)$$
which is equivalent to
$$\sup_{W_x\in A_1}\ \inf_{W_n\in A_3} J(W_x, W_n) = \inf_{W_n\in A_3}\ \sup_{W_x\in A_1} J(W_x, W_n). \qquad (83)$$
In addition, (79) and (80) transform into (25) and (24), respectively.

APPENDIX C
H∞ SPACE

Let D denote the unit disc of the complex plane, D ≜ {z ∈ C : |z| < 1}, and let ∂D denote the boundary of D, ∂D ≜ {z ∈ C : |z| = 1}.
Definition C.1: A Hilbert space of matrix-valued functions G on ∂D with finite norm
$$\|G\|_{L^2} \triangleq \int_{[0,2\pi)}\mathrm{Trace}\big[G^*(e^{j\theta})G(e^{j\theta})\big]\,d\theta \qquad (84)$$
is a Lebesgue-Bochner space, denoted by L2.

Definition C.2: A Banach space of matrix-valued functions G on ∂D with finite norm
$$\|G\|_{L^\infty} \triangleq \operatorname*{ess\,sup}_{\theta\in[0,2\pi)} \bar{\sigma}\big[G(e^{j\theta})\big] \qquad (85)$$
is a Lebesgue-Bochner space, denoted by L∞. Here, σ̄(A) denotes the maximal singular value of the matrix A, and "ess sup" stands for the essential supremum of a function, defined as the smallest number α such that the measure of the set {θ : σ̄[G(e^{jθ})] > α} is zero.

Definition C.3: The H∞ space is a closed subspace of L∞ whose matrix-valued functions G are analytic and bounded on the unit disc D. The norm associated with the space H∞, k·k∞, given by (85), can also be expressed in terms of k·kL2 as follows
$$\|G\|_\infty = \sup_{\|x\|_{L^2}\neq 0}\frac{\|Gx\|_{L^2}}{\|x\|_{L^2}}. \qquad (86)$$
This explains why k·k∞ is sometimes called the induced or system norm. The k·k∞ norm represents the maximum gain that a system inflicts on a signal transmitted through it, and corresponds to the maximal value of the Bode magnitude plot.

REFERENCES

[1] A. Lapidoth and P. Narayan, "Reliable communication under channel uncertainty," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2148-2177, Oct. 1998.
[2] S. Z. Denic, C. D. Charalambous, and S. M. Djouadi, "Channel capacity subject to frequency domain normed uncertainties - SISO and MIMO cases," in Proc. 17th International Symposium on Mathematical Theory of Networks and Systems, Japan, 2006.
[3] P. E. Caines, Linear Stochastic Systems. New York: John Wiley & Sons, 1988.
[4] R. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[5] N. M. Blachman, "On the capacity of a band-limited channel perturbed by statistically dependent interference," IRE Trans. Inf. Theory, pp. 48-55, Jan. 1962.
[6] P. L. Duren, Theory of Hp Spaces.
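Definition C.2 suggests a direct numerical approximation of the H∞ norm: evaluate σ̄[G(e^{jθ})] on a dense frequency grid and take the maximum. The sketch below applies this to the example transfer matrix of (52); the closed-form peak value is derived from the matrix's symmetric structure (singular values |a ± b|, maximized at θ = 0).

```python
import numpy as np

def hinf_norm(Hfun, n_grid=4096):
    """Grid approximation of (85): max over theta of the largest singular value."""
    thetas = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
    return max(np.linalg.svd(Hfun(t), compute_uv=False)[0] for t in thetas)

def H(theta):
    # The example channel (52)
    z = np.exp(-1j * theta)
    a = 1.0 / (1 - 0.5 * z)
    b = 1.0 / (1 - 0.25 * z)
    return np.array([[a, b], [b, a]])

norm = hinf_norm(H)
# By symmetry, singular values are |a + b| and |a - b|; the peak is at theta = 0,
# where a = 2 and b = 4/3, giving ||H||_inf = 10/3
expected = 2.0 + 4.0 / 3.0
assert np.isclose(norm, expected, rtol=1e-6)
```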
Dover Publications, unabridged edition, 2000.
[7] J. C. Doyle, B. A. Francis, and A. R. Tannenbaum, Feedback Control Theory. New York: Macmillan Publishing Company, 1992.
[8] A. F. Molisch, M. Steinbauer, M. Toeltsch, E. Bonek, and R. S. Thomä, "Capacity of MIMO systems based on measured wireless channels," IEEE J. Sel. Areas Commun., vol. 20, no. 3, pp. 561-569, Apr. 2002.
[9] S. Vishwanath, S. Boyd, and A. Goldsmith, "Worst-case capacity of Gaussian vector channels," in Proc. 2003 Canadian Workshop on Information Theory, 2003.
[10] P. Lancaster and L. Rodman, Algebraic Riccati Equations. Oxford University Press, USA, 1995.
[11] N. M. Blachman, "Communication as a game," in IRE Wescon 1957 Conference Record, vol. 2, pp. 61-66, 1957.
[12] L. H. Brandenburg and A. D. Wyner, "Capacity of the Gaussian channel with memory: the multivariate case," Bell Syst. Tech. J., vol. 53, no. 5, 1974.
[13] W. L. Root and P. P. Varaiya, "Capacity of classes of Gaussian channels," SIAM J. Appl. Math., vol. 16, no. 6, pp. 1350-1393, Nov. 1968.
[14] M. Médard, "The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel," IEEE Trans. Inf. Theory, vol. 46, no. 3, pp. 933-946, May 2000.
[15] D. P. Palomar, J. M. Cioffi, and M. A. Lagunas, "Uniform power allocation in MIMO channels: a game-theoretic approach," IEEE Trans. Inf. Theory, vol. 49, no. 7, pp. 1707-1727, Jul. 2003.
[16] T. Yoo and A. Goldsmith, "Capacity and optimal power allocation for fading MIMO channels with channel estimation error," IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 2203-2214, May 2006.
[17] B. Hassibi and B. M. Hochwald, "How much training is needed in multiple-antenna wireless links?," IEEE Trans. Inf. Theory, vol. 49, no. 4, pp. 951-963, Apr. 2003.
[18] S. Venkatesan, S. H. Simon, and R. A. Valenzuela, "Capacity of a Gaussian MIMO channel with nonzero mean," in Proc. IEEE Vehicular Technology Conference (VTC 2003-Fall), 2003.
[19] D. Hösli and A.
Lapidoth, "The capacity of a MIMO Ricean channel is monotonic in the singular values of the mean," in Proc. 5th International ITG Conference on Source and Channel Coding (SCC), Erlangen-Nuremberg, Jan. 14-16, 2004.
[20] M. Godavarti, A. O. Hero, and T. L. Marzetta, "Min-capacity of a multiple-antenna wireless channel in a static Ricean fading environment," IEEE Trans. Wireless Commun., vol. 4, no. 4, pp. 1715-1723, Jul. 2005.
[21] L. Zheng and D. Tse, "Communication on the Grassmann manifold: a geometric approach to the noncoherent multiple-antenna channel," IEEE Trans. Inf. Theory, vol. 48, no. 2, pp. 359-383, Feb. 2002.
[22] A. Lapidoth and S. M. Moser, "Capacity bounds via duality with applications to multiple antenna systems on flat fading channels," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2426-2467, Oct. 2003.
[23] M. Uysal and C. N. Georghiades, "An efficient implementation of a maximum-likelihood detector for space-time block coded systems," IEEE Trans. Commun., vol. 51, no. 4, pp. 521-524, Apr. 2003.
[24] R. Ahlswede, "The capacity of a channel with arbitrary varying Gaussian channel probability functions," in Trans. 6th Prague Conf. Information Theory, Statistical Decision Functions and Random Processes, pp. 13-21, Sep. 1971.
[25] R. J. McEliece, "Communications in the presence of jamming - an information theoretic approach," in Secure Digital Communications, G. Longo, Ed. New York: Springer-Verlag, pp. 127-166, 1983.
[26] C. R. Baker and I.-F. Chao, "Information capacity of channels with partially unknown noise. I. Finite dimensional channels," SIAM J. Appl. Math., vol. 56, no. 3, pp. 946-963, Jun. 1996.
[27] C. R. Baker and I.-F. Chao, "Information capacity of channels with partially unknown noise. II. Infinite dimensional channels," SIAM J. Control and Optimization, vol. 34, no. 4, pp. 1461-1472, Jul. 1996.
[28] S. N. Diggavi and T. M. Cover, "The worst additive noise under a covariance constraint," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 3072-3081, Nov.
2001.
[29] H. Boche and E. A. Jorswieck, "Multiuser MIMO systems, worst case noise, and transmitter cooperation," in Proc. 3rd IEEE Int. Symp. Signal Processing and Information Technology (ISSPIT 2003), 2003.
[30] A. Kashyap, T. Basar, and R. Srikant, "Correlated jamming on MIMO Gaussian fading channels," IEEE Trans. Inf. Theory, vol. 50, no. 9, pp. 2119-2123, Sep. 2004.
[31] N. Chiurtu, B. Rimoldi, and E. Telatar, "On the capacity of multi-antenna Gaussian channels," in Proc. ISIT 2001, Washington DC, p. 53, Jun. 24-29, 2001.
[32] E. Telatar, "Capacity of multi-antenna Gaussian channels," Eur. Trans. Telecommun. (ETT), vol. 10, no. 6, pp. 585-596, Nov. 1999.
[33] M. J. Osborne and A. Rubinstein, A Course in Game Theory. MIT Press, 1994.
[34] J. Proakis, Digital Communications. New York: McGraw-Hill, 1983.
[35] C. E. Shannon, "A note on a partial ordering for communication channels," Inform. and Control, vol. 1, pp. 390-397, 1958.
[36] A. W. Eckford, F. R. Kschischang, and S. Pasupathy, "On partial ordering of Markov modulated channels under LDPC decoding," in Proc. ISIT 2003, Yokohama, Japan, p. 295, 2003.
[37] D. G. Luenberger, Optimization by Vector Space Methods. John Wiley & Sons, 1969.
[38] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2003.