Free Deconvolution for MIMO Capacity Estimation

Øyvind Ryan
Centre of Mathematics for Applications, University of Oslo, P.O. Box 1053 Blindern, 0316 Oslo, Norway
Phone: +47 93 24 83 21, Fax: +47 22 85 43 49, Email: oyvindry@ifi.uio.no

Mérouane Debbah
SUPELEC, 3 rue Joliot-Curie, 91192 GIF SUR YVETTE CEDEX, France
Phone: +33 1 69 85 20 07, Fax: +33 1 69 85 12 59, Email: merouane.debbah@supelec.fr

Abstract: In many channel measurement applications, one needs to estimate some characteristics of the channels based on a limited set of measurements. This is mainly due to the highly time varying characteristics of the channel. In this contribution, it will be shown how free probability can be used for channel capacity estimation in MIMO systems. Free probability has already been shown to be an invaluable tool for describing the asymptotic behaviour of many systems when the dimensions of the system get large (i.e. the number of antennas). In particular, introducing the notion of free deconvolution, we provide hereafter an asymptotically (in the number of antennas) unbiased capacity estimator (w.r.t. the number of observations) for MIMO channels impaired with noise. Another unbiased estimator (for any number of observations) is also constructed by slightly modifying the free probability based estimator. The framework is then extended to MIMO channels with phase off-set and frequency drift for which no estimator has been provided so far.

I. INTRODUCTION

Random matrices, and in particular limit distributions of sample covariance matrices, have proved to be a useful tool for modelling systems, for instance in digital communications [1], nuclear physics [2] and mathematical finance [3]. A typical random matrix model is the information-plus-noise model,

$$W_n = \frac{1}{N}(R_n + \sigma X_n)(R_n + \sigma X_n)^H. \tag{1}$$

$R_n$ and $X_n$ are assumed independent random matrices of dimension $n \times N$, where $X_n$ contains i.i.d. standard (i.e. mean 0, variance 1) complex Gaussian entries.
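As a quick numerical sanity check of model (1), the sketch below draws one noise matrix and compares the normalized trace of (1) with the normalized trace of $\frac{1}{N}R_nR_n^H$ plus $\sigma^2$, which is what the first moment of (1) equals in expectation. The function name, the sizes and the particular choice of $R_n$ (entries $\pm 1$) are illustrative assumptions and are not taken from the paper.

```python
import random

def first_moment_check(n, N, sigma, seed=0):
    """Return (tr_n of the matrix in (1), tr_n((1/N) R R^H) + sigma^2).

    Only the normalized trace is needed, so the n x n matrix is never
    formed: tr_n of (1) equals the squared Frobenius norm of R + sigma*X
    divided by n*N.
    """
    rng = random.Random(seed)
    s = 0.5 ** 0.5  # std of real and imaginary parts, so that E|x|^2 = 1
    tr_w = 0.0
    tr_gamma = 0.0
    for i in range(n):
        for j in range(N):
            # Hypothetical deterministic signal matrix with entries +1 / -1.
            r = 1.0 if (i + j) % 2 == 0 else -1.0
            # Standard complex Gaussian noise entry (mean 0, variance 1).
            x = complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
            tr_w += abs(r + sigma * x) ** 2
            tr_gamma += r * r
    return tr_w / (n * N), tr_gamma / (n * N) + sigma ** 2

measured, predicted = first_moment_check(30, 100, 1.0)
print(measured, predicted)
```

The two printed numbers should be close but not identical, since a single finite realization of the noise is used.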
Expression (1) can be thought of as the sample covariance matrix of the random vectors $r_n + \sigma x_n$. Here $r_n$ can be interpreted as a vector containing the system characteristics (direction of arrival for instance in radar applications, or impulse response in channel estimation applications), and $x_n$ represents additive noise, with $\sigma$ a measure of the strength of the noise. Classical signal processing estimation methods consider the case where the number of observations is far bigger than the dimensions of the system, for which the matrix $W_n$ in (1) can be shown to be approximately

$$W_n = \Gamma_n + \sigma^2 I_n. \tag{2}$$

Here, $\Gamma_n$ is the true covariance of the signal. In this case, one can separate the signal from the noise subspace and infer (based only on the statistics of the signal) on the characteristics of the input signal. However, in many situations, one can gather only a limited number of observations during which the characteristics of the signal do not change. In order to model this case, $n$ and $N$ will be increased so that $\lim_{n\to\infty} \frac{n}{N} = c$, i.e. the number of observations is increased at the same rate as the number of parameters of the system.

Previous contributions have already dealt with this problem. In [4], Dozier and Silverstein explain how one can use the eigenvalue distribution of $\Gamma_n = \frac{1}{N} R_n R_n^H$ to estimate the eigenvalue distribution of $W_n$. In [5], [6], an algorithm was provided to estimate one from the other, using the concept of multiplicative free convolution, which admits a convenient implementation.

In this paper, channel capacity estimation in MIMO systems is used as a benchmark application, exploiting the connection between free probability theory and systems of the form (1). For MIMO channels with and without frequency off-sets, we derive explicit unbiased estimators which perform much better than classical ones.

This paper is organized as follows. Section II presents the problem under consideration.
Section III provides the basic concepts needed on free probability, including multiplicative and additive free (de)convolution. In section IV, we formalize the free probability approach as an estimator, and explain some of the shortcomings for MIMO models with frequency off-sets. Another estimator, called the unbiased capacity estimator, is then formalized to address the shortcomings of the free probability based estimator. In section V, simulations of the free probability based and the unbiased estimators are performed and compared.

In the following, upper (lower) boldface symbols will be used for matrices (column vectors), whereas lower symbols will represent scalar values. $(\cdot)^T$ will denote the transpose operator, $(\cdot)^*$ conjugation and $(\cdot)^H = ((\cdot)^T)^*$ hermitian transpose. $I_n$ will represent the $n \times n$ identity matrix. $Tr_n$ will denote the non-normalized trace on $n \times n$ matrices, while $tr_n = \frac{1}{n} Tr_n$ denotes the normalized trace. Also, we will throughout the paper use $c$ as a shorthand notation for the ratio between the number of rows and the number of columns in the random matrix model being considered, i.e. $c = \lim_{n\to\infty} \frac{n}{N}$ for systems of the form (1).

II. STATEMENT OF THE PROBLEM

In usual time varying measurement methods for MIMO systems, one validates models [7] by determining how the model fits with actual capacity measurements. In this setting, one has to be extremely cautious about the measurement noise, especially for far field measurements where the signal strength can be lower than the noise. The MIMO measured channel in the frequency domain can be modelled by

$$\hat{H}_i = \frac{1}{\sqrt{n}} \left( D_i^r H D_i^t + \sigma X_i \right) \tag{3}$$

where $\hat{H}_i$, $H$ and $X_i$ are respectively the $n \times n$ measured MIMO matrix ($n$ is the number of receiving and transmitting antennas$^1$), the $n \times n$ MIMO channel and the $n \times n$ noise matrix with i.i.d. zero mean unit variance Gaussian entries.

$^1$ Without loss of generality, we consider the same number of antennas at the transmitter and receiver.
We suppose that the channel $H$, although time varying, stays constant (block fading assumption) during $L$ blocks. $D_i^r$ and $D_i^t$ are diagonal matrices which represent phase off-sets and phase drifts (which are impairments due to the antennas and not the channel) at the receiver and transmitter. They are supposed to vary on a block basis and are given respectively by

$$D_i^r = \mathrm{diag}[e^{j\phi_1^i}, ..., e^{j\phi_n^i}] \quad \text{and} \quad D_i^t = \mathrm{diag}[e^{j\theta_1^i}, ..., e^{j\theta_n^i}],$$

where the phases $\phi_j^i$ and $\theta_j^i$ are random, independent, and uniformly distributed. We will also consider the simpler model without phase off-sets and phase drifts, i.e.

$$\hat{H}_i = \frac{1}{\sqrt{n}} (H + \sigma X_i). \tag{4}$$

The capacity of a channel with channel matrix $H$ and signal to noise ratio $\rho = \frac{1}{\sigma^2}$ is given by

$$C = \frac{1}{n} \log\det\left( I_n + \frac{1}{n\sigma^2} H H^H \right) = \frac{1}{n} \sum_{l=1}^{n} \log\left(1 + \frac{1}{\sigma^2}\lambda_l\right) \tag{5}$$

where $\lambda_l$ are the eigenvalues of $\frac{1}{n} H H^H$. The problem therefore consists of estimating the eigenvalues of $\frac{1}{n} H H^H$ based on few observations $\hat{H}_i$, which is paramount for modelling purposes. Note that the capacity expression supposes that the channel is perfectly known at the receiver and not at the transmitter. In practice, with the noise impairment, the channel will never be estimated perfectly and therefore expression (5) is not achievable. However, for MIMO modelling purposes, for which the capacity is often the matching metric, one needs to compare the capacity of the model with expression (5).

Different methods are currently used for channel capacity estimation. Usual methods discard, through an ad-hoc threshold procedure, all channels $\hat{H}_i$ for which the channel to noise ratio ($\frac{1}{n\sigma^2}\mathrm{Trace}(HH^H)$) is lower than a threshold, and then compute

$$\tilde{C}(\sigma^2) = \frac{1}{n} \log_2 \det\left( I + \frac{1}{\sigma^2} \Big(\frac{1}{M}\sum_{i=1}^{M} \hat{H}_i\Big) \Big(\frac{1}{M}\sum_{i=1}^{M} \hat{H}_i\Big)^H \right)$$

where $M \leq L$ is the number of channels having a signal to noise ratio higher than the threshold. One of the drawbacks of this method is that one will not analyse the true capacity but only the capacity of the "good channels".
Moreover, one has to limit the channel measurement campaign (in order to have enough channels higher than the threshold) to regions which are close enough (in terms of actual distance) to the base station. Other methods, in order to have a capacity estimate at a given signal to noise ratio (different from the measured one with noise variance $\sigma^2$), normalize each channel realization $\hat{H}_i$ and then compute, for a different value of the noise variance $\sigma_1^2$ (for example 10dB), the capacity estimate $\tilde{C}(\sigma_1^2)$. In the case where $\sigma^2$ is high and $\sigma_1^2$ is low, one usually finds a high capacity estimate as one measures only the noise, which is known to have a high multiplexing gain.

In this contribution, we will provide a neat framework, based on free deconvolution, for channel capacity estimation that circumvents all the previous drawbacks. Moreover, we will deal with model (3), for which no solution has been provided in the literature so far.

III. FRAMEWORK FOR FREE CONVOLUTION

Free probability [8] is a theory for non-commutative random variables (like matrices), which has grown into an entire field of research through the pioneering work of Voiculescu. These random variables are elements in what is called a noncommutative probability space. Free probability is analogous to classical probability in the sense that a general linear functional (such as the trace on the set of matrices) takes the place of the expectation, and the concept of freeness [8] takes the place of the concept of independence. It turns out that many types of random matrices (for which our expectation is the (classical) expectation of the trace) exhibit asymptotic freeness [8], meaning that the freeness relation holds only asymptotically when the matrices get large. For instance, random matrices $\frac{1}{\sqrt{n}} A_{n1}, \frac{1}{\sqrt{n}} A_{n2}, ...$, where the $A_{ni}$ are $n \times n$ with all entries independent and standard Gaussian (i.e. mean 0 and variance 1), are asymptotically free [8].
In free probability, one also has a transform analogous to the logarithm of the Fourier transform. One also has a convolution, called free convolution, which is linearized by this transform. We will denote additive free convolution by $\mu_1 \boxplus \mu_2$, and multiplicative free convolution by $\mu_1 \boxtimes \mu_2$, where $\mu_1$ and $\mu_2$ are probability measures. In free probability, these operations are formally defined by associating the moments of the probability measures $\mu_1$ and $\mu_2$ with the moments of non-commutative random variables $a_1$ and $a_2$ which are free, forming their sum/product, and associating the moments of the sum/product with another probability measure.

We will also find the concepts of additive and multiplicative free deconvolution useful: given probability measures $\mu$ and $\mu_2$, when there is a unique probability measure $\mu_1$ such that $\mu = \mu_1 \boxplus \mu_2$ ($\mu = \mu_1 \boxtimes \mu_2$), we will write $\mu_1 = \mu \boxminus \mu_2$ ($\mu_1 = \mu \oslash \mu_2$ respectively). We say that $\mu_1$ is the additive (respectively multiplicative) free deconvolution of $\mu$ with $\mu_2$.

One important measure is the Marčenko-Pastur law $\mu_c$ [9], characterized by the density

$$f^{\mu_c}(x) = \left(1 - \frac{1}{c}\right)^+ \delta(x) + \frac{\sqrt{(x-a)^+ (b-x)^+}}{2\pi c x}, \tag{6}$$

where $(z)^+ = \max(0, z)$, $a = (1 - \sqrt{c})^2$ and $b = (1 + \sqrt{c})^2$. It is known that $\mu_c$ describes asymptotic eigenvalue distributions of Wishart matrices. These have the form $\frac{1}{N} R R^H$, where $R$ is an $n \times N$ ($\frac{n}{N} \to c$) random matrix with independent standard Gaussian entries.

By the empirical eigenvalue distribution of an $n \times n$ random matrix $X$ we will mean the random atomic measure

$$\frac{1}{n}\left( \delta(\lambda_1(X)) + \cdots + \delta(\lambda_n(X)) \right),$$

where $\lambda_1(X), ..., \lambda_n(X)$ are the (random) eigenvalues of $X$. The empirical eigenvalue distribution of $X$ is also denoted $\mu_X$.

To make the connection between models (4), (3) and the model (1), we need the following result from free probability [5]:

Theorem 1: Assume that the empirical eigenvalue distribution of $\Gamma_n = \frac{1}{N} R_n R_n^H$ converges in distribution almost surely to a compactly supported probability measure $\mu_\Gamma$.
Then the empirical eigenvalue distribution of $W_n$ also converges in distribution almost surely to a compactly supported probability measure $\mu_W$, uniquely identified by

$$\mu_W \oslash \mu_c = (\mu_\Gamma \oslash \mu_c) \boxplus \mu_{\sigma^2 I}, \tag{7}$$

where $\oslash$ denotes multiplicative free deconvolution and $\boxplus$ additive free convolution.

When we have $L$ observations $\hat{H}_i$ in a MIMO system as in (4) or (3), we will form the $n \times nL$ random matrices

$$\hat{H}_{1...L} = \frac{1}{\sqrt{n}} H_{1...L} + \frac{\sigma}{\sqrt{nL}} X_{1...L} \tag{8}$$

with

$$\hat{H}_{1...L} = \frac{1}{\sqrt{L}} \left[ \hat{H}_1, \hat{H}_2, ..., \hat{H}_L \right],$$
$$H_{1...L} = \frac{1}{\sqrt{L}} \left[ D_1^r H D_1^t, D_2^r H D_2^t, ..., D_L^r H D_L^t \right],$$
$$X_{1...L} = [X_1, X_2, ..., X_L].$$

For the case $L = 1$, the formula

$$tr_n\left( \left( D_1^r H D_1^t (D_1^r H D_1^t)^H \right)^j \right) = tr_n\left( (HH^H)^j \right) \tag{9}$$

can be combined with theorem 1 to give the approximation

$$\mu_{\hat{H}_1 \hat{H}_1^H} \oslash \mu_1 = \left( \mu_{\frac{1}{n} HH^H} \oslash \mu_1 \right) \boxplus \mu_{\sigma^2 I} \tag{10}$$

for a single observation. This approximation works well when $n$ is large. For many observations, note that $H_{1...L} H_{1...L}^H = HH^H$ when there is no phase off-set and phase drift, so that the approximation

$$\mu_{\hat{H}_{1...L} \hat{H}_{1...L}^H} \oslash \mu_{\frac{1}{L}} = \left( \mu_{\frac{1}{n} HH^H} \oslash \mu_{\frac{1}{L}} \right) \boxplus \mu_{\sigma^2 I} \tag{11}$$

applies and generalizes (10). Note that the ratio between the number of rows and columns in the matrices $H_{1...L}$, $X_{1...L}$ and $\hat{H}_{1...L}$ is $c = \frac{1}{L}$, considering the horizontal stacking of the observations in a larger matrix. It is only this stacking which will be considered in this paper, so this value of $c$ will apply for all random matrix models we will consider.

When phase off-set and phase drift are added, it is much harder to adapt theorem 1 to produce the moments of $\frac{1}{n} HH^H$. The reason is that theorem 1 really helps us to find the moments of $\frac{1}{n} H_{1...L} H_{1...L}^H$. In the case without phase off-set and phase drift, this is enough since these moments are equal to the moments of $\frac{1}{n} HH^H$. However, equality between these moments does not hold when phase off-set and phase drift are added. In section IV, we will instead define an estimator for the channel capacity which does not stack observations into the matrix $H_{1...L}$ at all.

IV. NEW ESTIMATORS FOR CHANNEL CAPACITY

In this section, two new channel capacity estimators are defined.
First, a free probability based estimator is introduced, which will be shown to be asymptotically unbiased w.r.t. the number of observations. Then, by slightly modifying this, we will construct what we call the unbiased capacity estimator. This estimator will be shown to be unbiased for any number of observations.

A. The free probability based capacity estimator

The free probability based estimator can take advantage of knowledge of the rank of the channel matrix: with a lower rank, it can perform estimation with less computation.

Definition 1: By a free probability based estimator for the capacity of a channel (with channel matrix $H$) we will mean an estimator which performs the following steps:
1) Compute the first $r$ moments $\hat{h}_1, ..., \hat{h}_r$ of the sample covariance matrix $\hat{H}_{1...L} \hat{H}_{1...L}^H$ (i.e. compute $\hat{h}_j = tr_n\left( (\hat{H}_{1...L} \hat{H}_{1...L}^H)^j \right)$ for $1 \leq j \leq r$), where $r$ is the rank of $\frac{1}{n} HH^H$,
2) use (11) to estimate the first $r$ moments $h_1, ..., h_r$ of $\frac{1}{n} HH^H$,
3) estimate the $r$ nonzero eigenvalues $\lambda_1, ..., \lambda_r$ of $\frac{1}{n} HH^H$ from $h_1, ..., h_r$,
4) substitute these in (5).

Steps 2 and 3 in definition 1 need some elaboration. To address step 3, we remark that the $k$ nonzero eigenvalues of a matrix can be calculated from its first $k$ moments, by using the Newton-Girard formulas as described in [10]. An accompanying implementation of this can be found in [11]. To address step 2, a Matlab implementation [11] which performs free (de)convolution was developed and used for simulations in this paper. The following algorithm for computing free convolution with the Marčenko-Pastur law in terms of moments appeared first in [6]. It is easiest to describe for $c = 1$:

Proposition 1: If $m_n$ are the moments of $\mu$, and $M_n$ are the moments of $\mu \boxtimes \mu_1$, then we have that

$$M_n = \sum_{k \leq n} m_k \, \mathrm{coef}_{n-k}\left( (1 + M_1 z + M_2 z^2 + \cdots)^k \right), \tag{12}$$

where $\mathrm{coef}_k$ denotes the coefficient of $z^k$ in the expression it is applied to. Since $\mathrm{coef}_{n-k}$ with $k \geq 1$ only involves $M_1, ..., M_{n-1}$, (12) determines the $M_n$ recursively from the $m_n$, and conversely the $m_n$ from the $M_n$.
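Steps 2 to 4 of definition 1 can be sketched in a few lines for a single observation ($L = 1$, hence $c = 1$). The sketch below is not the Matlab code of [11] or [12]; all function names are hypothetical. It relies on three facts: recursion (12), the fact that additive free convolution with the point mass $\mu_{\sigma^2 I}$ is a plain translation of the measure by $\sigma^2$, and the Newton-Girard formulas. As a shortcut that is an assumption of this sketch rather than the procedure of the paper, the capacity is evaluated directly from the elementary symmetric polynomials $e_k$ of the nonzero eigenvalues through $\prod_l (1 + \lambda_l/\sigma^2) = \sum_k e_k \sigma^{-2k}$, so no root finding is needed. A base-2 logarithm is assumed, as in the classical estimators.

```python
import math

def poly_mul_trunc(a, b, deg):
    """Product of two polynomials (constant term first), truncated at z^deg."""
    out = [0.0] * (deg + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= deg:
                out[i + j] += ai * bj
    return out

def _lower_terms(m, M, n):
    """sum over k < n of m_k * coef_{n-k}((1 + M_1 z + ...)^k), cf. (12)."""
    base = [1.0] + list(M[:n - 1])  # only M_1 .. M_{n-1} are needed
    power, total = [1.0], 0.0
    for k in range(1, n):
        power = poly_mul_trunc(power, base, n - 1)  # base^k, truncated
        total += m[k - 1] * power[n - k]
    return total

def mp_convolve(m):
    """Moments of mu (x) mu_1 from the moments m = [m_1, ..., m_r] of mu."""
    M = []
    for n in range(1, len(m) + 1):
        M.append(m[n - 1] + _lower_terms(m, M, n))  # k = n term is m_n
    return M

def mp_deconvolve(M):
    """Inverse of mp_convolve: solve (12) for m_n, one n at a time."""
    m = []
    for n in range(1, len(M) + 1):
        m.append(M[n - 1] - _lower_terms(m, M, n))
    return m

def shift(m, s):
    """Moments of the measure translated by s (binomial expansion)."""
    full = [1.0] + list(m)  # zeroth moment is 1
    return [sum(math.comb(n, k) * full[k] * s ** (n - k) for k in range(n + 1))
            for n in range(1, len(m) + 1)]

def noisy_moments(h, sigma2):
    """Moments of the observation predicted by (10) from the moments h."""
    return mp_convolve(shift(mp_deconvolve(h), sigma2))

def estimate_channel_moments(h_hat, sigma2):
    """Step 2 for L = 1: invert (10) by shifting with -sigma^2 instead."""
    return mp_convolve(shift(mp_deconvolve(h_hat), -sigma2))

def elementary_symmetric(h, n):
    """Newton-Girard: e_0..e_r of the r nonzero eigenvalues from h_1..h_r."""
    p = [n * hj for hj in h]  # power sums; h_j are normalized traces
    e = [1.0]
    for k in range(1, len(p) + 1):
        acc = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(acc / k)
    return e

def capacity_from_moments(h, n, sigma2):
    """Steps 3-4 combined, assuming exactly len(h) nonzero eigenvalues."""
    e = elementary_symmetric(h, n)
    return math.log2(sum(ek / sigma2 ** k for k, ek in enumerate(e))) / n
```

For instance, `mp_convolve([1.0, 1.0, 1.0, 1.0])` reproduces the first four moments 1, 2, 5, 14 of the Marčenko-Pastur law $\mu_1$, and `estimate_channel_moments(noisy_moments(h, s), s)` returns `h` up to rounding. With noisy moment estimates the argument of the logarithm in `capacity_from_moments` can become non-positive, which a practical implementation would have to guard against.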
Matlab implementations of proposition 1 can be found in [12], together with a straightforward generalization to free convolution with any Marčenko-Pastur law.

From the results in section III we conclude that the free probability based estimator is asymptotically unbiased (w.r.t. the number of observations) for systems of type (4) (but not for systems of type (3)).

B. The unbiased capacity estimator

The definition below for the unbiased capacity estimator is motivated by computing expected values of mixed moments of Gaussian and deterministic matrices [10]:

Definition 2: The unbiased capacity estimator is defined for rank $r \leq 4$, with the requirement that the channel matrix $H$ is hermitian if $r = 4$, in the following way:
1) For each observation, perform the following:
   a) Compute the first $r$ moments $\hat{h}_{i1}, ..., \hat{h}_{ir}$ of the sample covariance matrix $\hat{H}_i \hat{H}_i^H$ (i.e. compute $\hat{h}_{ij} = tr_n\left( (\hat{H}_i \hat{H}_i^H)^j \right)$ for $1 \leq j \leq r$),
   b) find estimates $h_{i1}, h_{i2}, h_{i3}$ of the first three moments of $\frac{1}{n} HH^H$ by solving

   $$\begin{aligned}
   \hat{h}_{i1} &= h_{i1} + \sigma^2 \\
   \hat{h}_{i2} &= h_{i2} + 4\sigma^2 h_{i1} + 2\sigma^4 \\
   \hat{h}_{i3} &= h_{i3} + 6\sigma^2 h_{i2} + 3\sigma^2 h_{i1}^2 + 3\sigma^4\left(5 + \frac{2}{n^2}\right) h_{i1} + \sigma^6\left(5 + \frac{1}{n^2}\right),
   \end{aligned} \tag{13}$$

   and an estimate $h_{i4}$ of the fourth moment by solving the additional formulas

   $$\begin{aligned}
   \widehat{hd}_{i2} &= hd_{i2} + 2\sigma^2\left(1 + \frac{1}{n}\right) h_{i1} + \sigma^4\left(1 + \frac{2}{n}\right) \\
   \hat{h}_{i4} &= h_{i4} + 8\sigma^2 h_{i3} + 8\sigma^2 h_{i2} h_{i1} + 4\sigma^4\left(7 + \frac{6}{n^2}\right) h_{i2} + 4\sigma^4\left(7 + \frac{2}{n}\right) h_{i1}^2 \\
   &\quad + 4\sigma^4\left(\frac{1}{n} + \frac{2}{n^2}\right) hd_{i2} + 4\sigma^6\left(14 + \frac{2}{n} + \frac{17}{n^2} + \frac{8}{n^3}\right) h_{i1} + \sigma^8\left(14 + \frac{10}{n^2}\right)
   \end{aligned} \tag{14}$$

   when $r = 4$, where

   $$\widehat{hd}_{i2} = tr_n\left( \left( \mathrm{diag}(\hat{H}_i \hat{H}_i^H) \right)^2 \right)$$

   ($hd_{ij}$ is used as notation for the moments of a diagonal matrix, i.e. the d stands for diagonal).
   Form the estimates $h_j = \frac{1}{L} \sum_{i=1}^{L} h_{ij}$, $1 \leq j \leq r$, of the first moments of $\frac{1}{n} HH^H$,
2) estimate the $r$ nonzero eigenvalues $\lambda_1, ..., \lambda_r$ of $\frac{1}{n} HH^H$ from $h_1, ..., h_r$,
3) substitute these in (5).

A Matlab implementation performing these steps can be found in [11]. Without the assumption of $H$ being hermitian, the expression for the rank 4 estimator would be more complex. The following proposition shows that the estimator of definition 2 actually qualifies for its name. Its proof (found in [10]) is the background for the formulas in definition 2.

Proposition 2: The unbiased capacity estimator is actually unbiased (for any number of observations), i.e.

$$E(h_j) = tr_n\left( \left( \frac{1}{n} HH^H \right)^j \right), \quad 1 \leq j \leq 4.$$

If we restrict to $L = 1$ observation, the following holds:
1) The free probability based and the unbiased estimator coincide for the first two moments $h_1$ and $h_2$.
2) The third moment $h_3$ in the free probability based estimator is biased, with bias given by

$$-\frac{6\sigma^4 \, tr_n\left( \frac{1}{n} HH^H \right) + \sigma^6}{n^2}.$$

This holds regardless of whether model (3) or model (4) is used.

In this paper, the unbiased capacity estimator is used for systems with phase off-set and phase drift (where the free probability based estimator fails). Free capacity estimation is used for systems without phase off-set and phase drift, and has an implementation which can be adapted to channel matrices with any rank.

V. CHANNEL CAPACITY ESTIMATION

Several candidates for channel capacity estimators for (4) have been used in the literature. We will consider the following:

$$\begin{aligned}
C_1 &= \frac{1}{nL} \sum_{i=1}^{L} \log_2 \det\left( I + \frac{1}{\sigma^2} \hat{H}_i \hat{H}_i^H \right) \\
C_2 &= \frac{1}{n} \log_2 \det\left( I + \frac{1}{L\sigma^2} \sum_{i=1}^{L} \hat{H}_i \hat{H}_i^H \right) \\
C_3 &= \frac{1}{n} \log_2 \det\left( I + \frac{1}{\sigma^2} \Big( \frac{1}{L} \sum_{i=1}^{L} \hat{H}_i \Big) \Big( \frac{1}{L} \sum_{i=1}^{L} \hat{H}_i \Big)^H \right)
\end{aligned} \tag{15}$$

A. Channels without phase off-set and phase drift

In figure 1, $C_1$, $C_2$ and $C_3$ are compared with the free probability based and unbiased estimators for various numbers of observations, with $\sigma^2 = 0.1$, and a $10 \times 10$ channel matrix of rank 3. The channel considered has no phase drift or phase off-set. Among the classical estimators, only $C_3$ gives values close to the true capacity; $C_1$ and $C_2$ are seen to have a high bias. The free probability based and unbiased estimators are seen to give values closer to the true capacity than $C_3$.

Fig. 1. Comparison of various classical capacity estimators and the two new capacity estimators introduced in this paper for various numbers of observations, model (4). $\sigma^2 = 0.1$ and $n = 10$. The rank of $H$ was 3. (Plot: capacity versus number of observations. Curves: true capacity, classical capacity estimation ($C_1$), ($C_2$) and ($C_3$), free capacity estimation, unbiased capacity estimation.)

B. Channels with phase off-set and phase drift

In figure 2, the $C_3$ estimator is compared with the free probability based estimator, the unbiased estimator and the true capacity, for various numbers of observations, and with the same $\sigma$ and channel matrix as in figure 1. Phase off-set and phase drift have also been introduced. In this case, the free probability based estimator and the $C_3$ estimator seem to be biased.

Fig. 2. Comparison of capacity estimators which worked for model (4) for increasing number of observations. Model (3) is used. $\sigma^2 = 0.1$ and $n = 10$. The rank of $H$ was 3. (Plot: capacity versus number of observations. Curves: true capacity, classical capacity estimation ($C_3$), free capacity estimation, unbiased capacity estimation.)

In figure 3, we have varied $\sigma$, used 10 observations, and also formed a rank 3 channel matrix with $n = 4$. It is seen that the deviation from the true capacity is small, which makes the unbiased estimator a very good candidate for channel estimation in highly time-varying environments. In [10], the estimator was also tested with only one observation, for which the deviation from the true capacity was higher, but still quite small for small $\sigma$. In [10], the estimator was also tested for one observation with $n = 10$, for which the deviation from the true capacity was smaller.
Finally, let us use a channel matrix of rank 4. In this case we have increased the number of observations further to predict the channel capacity. In figure 4, unbiased capacity estimation is performed for a rank 4 channel matrix with $n = 4$ and 1600 observations. In [10], a comparison is also made for 50 observations.

Fig. 3. The unbiased estimator for $L = 10$ observations and $n = 4$, with varying values of $\sigma$. Model (3). The rank of $H$ was 3. (Plot: capacity versus $\sigma$. Curves: true capacity, unbiased capacity estimation.)

Fig. 4. The unbiased estimator for $L = 1600$ observations and $n = 4$, with varying values of $\sigma$. Model (3). The rank of $H$ was 4. (Plot: capacity versus $\sigma$. Curves: true capacity, unbiased capacity estimation.)

VI. CONCLUSION

In this paper, we have shown that free probability provides a neat framework for estimating the channel capacity for certain MIMO systems. In the case of highly time varying environments, where one can rely only on a limited set of noisy measurements, we have provided an asymptotically unbiased estimator of the channel capacity. A modified estimator called the unbiased estimator (in comparison with the asymptotically unbiased estimator) was also introduced to take into account the bias in the case of finite dimensions, and was proved to be adequate for low rank channel matrices. Moreover, although the results are based on asymptotic claims (in the number of observations), simulations show that the estimators also work well for a very low number of observations. Even when considering discrepancies such as phase drifts and phase off-set, the algorithm based on the unbiased estimator provided very good performance.

ACKNOWLEDGMENT

Mérouane Debbah is supported by Alcatel-Lucent within the Alcatel-Lucent Chair on flexible radio at SUPELEC.

REFERENCES

[1] E. Telatar, "Capacity of multi-antenna gaussian channels," Eur. Trans. Telecomm. ETT, vol. 10, no. 6, pp. 585–596, Nov. 1999.
[2] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller, "Random matrix theories in quantum physics: Common concepts," Physica Rep., vol. 299, pp. 190–, 1998.
[3] J.-P. Bouchaud and M. Potters, Theory of Financial Risks-From Statistical Physics to Risk Management. Cambridge: Cambridge University Press, 2000.
[4] B. Dozier and J. W. Silverstein, "On the empirical distribution of eigenvalues of large dimensional information-plus-noise type matrices," J. Multivariate Anal., vol. 98, no. 4, pp. 678–694, 2007.
[5] Ø. Ryan and M. Debbah, "Multiplicative free convolution and information-plus-noise type matrices," 2007, http://arxiv.org/abs/math.PR/0702342.
[6] Ø. Ryan and M. Debbah, "Free deconvolution for signal processing applications," Submitted to IEEE Trans. on Information Theory, 2007, http://arxiv.org/abs/cs.IT/0701025.
[7] J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mogensen, and F. Frederiksen, "A stochastic MIMO radio channel model with experimental validation," IEEE Journal on Selected Areas in Communications, vol. 20, no. 6, pp. 1211–1225, 2002.
[8] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy. American Mathematical Society, 2000.
[9] A. M. Tulino and S. Verdú, Random Matrix Theory and Wireless Communications. www.nowpublishers.com, 2004.
[10] Ø. Ryan and M. Debbah, "Channel capacity estimation using free probability theory," Submitted to IEEE Trans. Signal Process., 2007, http://arxiv.org/abs/0707.3095.
[11] Ø. Ryan, Tools for estimating channel capacity, 2007, http://ifi.uio.no/~oyvindry/channelcapacity/.
[12] Ø. Ryan, Computational tools for free convolution, 2007, http://ifi.uio.no/~oyvindry/freedeconvsignalprocapps/.