IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 71, NO. 8, AUGUST 2022 8729 A Compressive Sensing and Deep Learning-Based Time-Varying Channel Estimation for FDD Massive MIMO Systems Jiancun Fan , Senior Member, IEEE, Peizhe Liang, Zihan Jiao, and Xiaodong Han Abstract—To achieve the performance gains of massive multipleinput multiple-output (MIMO) systems, the downlink channel state information (CSI) must be acquired at the base station (BS). In frequency division duplexing (FDD) massive MIMO systems, the BS always first transmits downlink pilot symbols so that the user equipment (UE) can estimate CSI and then feedback to the BS. However, the huge number of antennas at the BS will lead to overwhelming feedback overhead. Moreover, time-varying caused by high mobility of user terminals makes the priori channel knowledge of the channels to change from one slot to another so that CSI aquisition is hard. To simultaneously reduce the overhead of overwhelming downlink pilot signaling and uplink feedback in timevarying massive MIMO systems, we propose a channel estimation scheme based on compressive sensing (CS) and deep learning (DL) in frequency division duplexing (FDD) massive MIMO systems. Specifically, we first develop a new CS-based algorithm for sparse channel estimation, which requires no priori knowledge of channel statistics. After obtaining the innitial channel estimation, we utilize two DL-based networks, named DnNet and DnLSTM respectively for denoising. Simulation results demonstrate that the proposed method can considerably reduce the training and feedback overhead and outperform the existing classical algorithms. Index Terms—Channel training and feedback, compressive sensing, deep learning, FDD, massive MIMO. I. INTRODUCTION ASSIVE multiple-input multiple-output (MIMO) has been viewed as a key technology for the fifth-generation (5G) wireless communication systems due to its huge performance gains in terms of capacity and reliability [1]. To achieve these performance gains of massive MIMO, the downlink channel state information (CSI) has to be acquired at the base station (BS) [2]. In time-division duplexing (TDD) systems, channel reciprocity is usually used to obtain the downlink CSI, but it requires accurate calibration of hardware circuitry with high M Manuscript received 19 February 2021; revised 25 October 2021 and 18 February 2022; accepted 10 April 2022. Date of publication 20 May 2022; date of current version 15 August 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 61671367 and in part by the Research Foundation of Science and Technology on Communication Networks Laboratory. The review of this article was coordinated by Prof. Namyoon Lee. (Corresponding author: Jiancun Fan.) The authors are with the School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China (e-mail: fanjc 0114@gmail.com; peizhel@stu.xjtu.edu.cn; 1413564217@qq.com; hxd13414 71413@stu.xjtu.edu.cn). Digital Object Identifier 10.1109/TVT.2022.3176290 cost [3]. In frequency-division duplexing (FDD) systems, the reciprocity of the uplink and downlink channels does not hold yet, hence the users have to estimate the downlink CSI first and then feed it back to the BS [4]. Fortunately, it doesn’t require complexity hardware calibration. Meanwhile, the traditional cellular system mainly works in FDD mode. Therefore, FDD massive MIMO is a very popular topic in academia and industry. However, FDD massive MIMO will result in a huge overhead of channel acquisition because the large-scale antennas are exploited at the BS. Especially, the time-varying fading of the channel will further increase the complexity of the channel acquisition. For the traditional methods, such as least-square (LS) or minimum mean-squared error (MMSE) algorithms [5], the required number of pilots always scales linearly with the number of BS antennas. To address this issue, compressive sensing (CS) has been widely applied in the channel estimation of massive MIMO so that the sparse CSI is recovered from a reduced number of received pilots [6], [7]. However, they cannot work well in the time-varying channel scenarios when considering the cost of channel estimation and feedback. Therefore, it is necessary to study the channel training and feedack scheme in FDD massive time-varying MIMO systems. Although the channel is time-varying, its statistical characteristics are often stable, thus the previous estimated information can be used in current estimation to track the time-varying characteristic of channel [8]–[11]. Especially, the stable statistical characteristics will result in the channel sparsity in some domain so the compressive sensing (CS) technique can be used to estimate the channel. In [8], a CS recovery approach is proposed to estimate the time-varying channel by using priori support information, and it also shows that the training signals could be further reduced by using the temporal correlation. In [9], the temporal correlation of time-varying channels is further exploited to propose a differential-based structured compressive sampling matching pursuit (S-CoSaMP) algorithm to acquire CSI. The above methods depend on the priori knowledge of channels, i.e. the correlation of channel as well as the sparsity of channel. In [10], a distributed compressed sensing (DCS)aided channel estimation approach is proposed to fully exploits slow variation of the channel statistics in consecutive time slots and spatially common sparsity within multiple subchannels in the frequency-domain. In [11], a feasible downlink training sequence design approach based on a partial CSI estimation 0018-9545 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information. Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. 8730 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 71, NO. 8, AUGUST 2022 is proposed for an FDD massive-MIMO system with a shorter coherence time. However, the above estimation schemes have critical requirements to the priori channel information. In [12]–[16], several channel estimation schemes without the priori channel information have been proposed. In [12] and [13], adaptive step size sparsity adaptive matching pursuit (AS-SAMP) and orthogonal matching pursuit algorithm with unknown sparsity (OMP-US) are proposed for sparse channel estimation when the number of non-zero coefficients is unknown. In [14], an adaptive channel estimation and feedback algorithm with low overhead for OFDM system is proposed, which can adaptively adjust the training overhead and pilot design for reliable CSI estimation. Compared with the method of M-SP in [8], [15] proposes an adaptive M-SP approach which could adaptively adjust the prior channel support quality parameter to the appropriate value in the case of model mismatch. In [16], an approximate message passing algorithm based on expectation-maximization and Gaussian-mixture distribution is proposed, which can simultaneously learn the signal distribution and recover the signal. However, all approaches based on CS have one drawback that the noise added on the non-zero elements is hard to eliminate. Therefore, it is necessary to find a better method to eliminate noise after using CS-based estimation. Recently, deep learning (DL) methods have been successfully applied in wireless communications [17]–[28], especially for the part of channel estimation and feedback. An autoencoderbased network, named CsiNet, has been designed for the CSI feedback. Specifically, the encoder compresses CSI at the user equipment (UE) and then the decoder at the BS reconstructs the channel matrix from the compressed representation [17]. Further more, an improved network, CsiNet+, further improves the performance of CsiNet by enlarging the convolutional kernel size [18]. In [19] and [20], two RNN networks are proposed to improve the feedback accuracy of CSI in Massive MIMO systems. In [21], a convolutional long short-term memory (LSTM) network (ConvLSTM-net)-based method has been proposed to predict the downlink CSI from the uplink CSI. By exploiting the correlation of time-varying channels, a convolutional neural network (CNN) and a recurrent neural network (RNN) in [22] extracts the spatial features and interframe correlation, respectively. Combining the CS-base and DL-based method, a framework, in [26], named CS-ReNet, compresses the perfect CSI at the user side and then reconstructs the CSI at the BS by utilizing deep neural network. A joint convolution residual network in [27] can benifit the MIMO channel feature extraction and recovery from the perspective of bit-level quantization performance. Considering the high mobility environments, [28] develops a channel estimation network, which consists of a CNN for mimicing the interpolation processes of frequency-domain and a bidirection LSTM (BiLSTM) network for time-domain channel prediction. Except the aforementioned data-driven approaches, there are also some model-driven deep learning approaches to obtain CSI. In [29], a fully convolutional denoising approximate message passing (FCDAMP) algorithm is proposed by combining fully convolutional denoising networks (FCDNet) with learned approximate message passing (LAMP) networks in millimeter-wave (mmWave) massive MIMO system. A dualdriven network with data and models is formed to retain the excellent characteristics of FCDNet and LAMP to further reduce the training information and improve the performance.In [30], a model-driven deep learning (MDDL)-based channel estimation and feedback scheme is proposed for wideband mmWave massive hybrid MIMO systems, where the channels’structured sparsity from an a priori model is first used and then learn the integrated trainable parameters from data samples. However, these above algorithms don’t eliminate the noise in the channel estimation and don’t reduce the estimation complexity by taking advantage of the time-varying characteristics of the channel. To solve the above problems, we propose a CS and DL-based channel estimation scheme for FDD massive MIMO systems, which simultaneously reduces the overhead for downlink training and uplink feedback and improve the channel estimation accuracy in the time-varying channel. In the scheme, the users directly feedback the received pilots to the BS first, and then the channel matrices are reconstructed using a CS-based method, named adaptive structured orthogonal matching pursuit (ASJOMP), and two DL-based methods, named DnNet as well as DnLSTM. Specifically, we first develop on AS-JOMP algorithm, which could adaptively reconstructe the channel by exploiting its structured sparsity without using the knowledge of the number of non-zero elements in CSI. The structured sparsity is caused by the tight arrangement of antennas at the base station which makes the signals experience similar paths. After obtaining the initial estimation of CSI, aiming at the problem that the preliminary CSI could not effectively eliminate the noise adding at the position of non-zero elements and did not make full use of the effective information between time-varying channels, we propose two network models based on deep learning, namely DnNet network based on multi-layer CNN and DnLSTM network based on a small number of layers CNN and LSTM network, to further improve the accuracy of the final CSI estimation. As is well-known, DL-based methods usually have high computing cost during training stage because of the large model sizes. In this article, we use a lightweight DNN, which has a lighter architecture than others for noise reduction and only needs two hundreds epochs to train. The proposed channel estimation scheme avoids the complex computations at power-limited users for channel estimation and CSI projection, which not only relieves the computational burden for the user devices but also reduces the cost of feedback. We analyze the CSI quality at the BS in terms of the normalized mean-squared error (NMSE). Simulation results demonstrate that the proposed method outperforms the existing classical algorithm. The main contribution of this article is summarized as follows: 1) We propose a channel estimation scheme based on CS and DL for FDD massive MIMO systems, which simultaneously reduces the overhead for downlink training and uplink feedback and improve the estimation accuracy. In this scheme, we first send the received pilots from the UE to the BS, and then reconstruct the channel matrices by using the method based on CS and DL. Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. FAN et al.: COMPRESSIVE SENSING AND DEEP LEARNING-BASED TIME-VARYING CHANNEL ESTIMATION FOR FDD MASSIVE MIMO SYSTEMS 8731 The recieved symbol, y, at the time-domain can be expressed as y= N BS hi ∗ xi + n, (2) i=1 Fig. 1. where hi ∈ C L×1 is the CSI from the ith BS antenna to the user as described in Section II, L is the length of channel, n ∈ C Nc ×1 denotes the independent and identically distributed (i.i.d.) additive white complex Gaussian noise (AWGN) with zero mean and variance, ∗ denotes convolution operation, which can be also expressed as the matrix multiplication. If the OFDM system is with a cyclic prefix of a proper length, then the linear convolution in (2) will turn into the circular convolution after removing the cyclic prefix at the receiver. We use proper zeros-padding (ZP) for hi , the convolution operation will be comb-type pilots. 2) We propose an adaptive structured orthogonal matching pursuit (AS-JOMP) algorithm, which adaptively reconstructs the sparse channel by exploiting the structured sparsity of wireless MIMO channels and requires no prior information about the sparsity of the time-varying channel. 3) We utilize two DL-based methods, DnNet and DnLSTM networks, trained by training set and verification set to denoise and learn the characteristic of the time-varying channel, which makes the system performance better when the signal-to-noise ratio (SNR) or the training overhead is low. The rest of this article is organized as follows. Section II provides the system model. Section III develops CS-based ASJOMP for CSI estimation and two DL-based methods, DnNeT and DnLSTM, for denoising. In Section IV, simulation results are presented and finally Section V concludes this article. II. CHANNEL MODEL We consider a massive MIMO OFDM system operating in the FDD mode. In this system, there are NBS antennas equipped at the BS and K scheduled single-antenna users. In this article, we adopt a comb pilot scheme, which is shown in Fig. 1. In this scheme, pilots are equally spaced in each OFDM symbol, and each circle represents a resource unit in the time-domain and frequency-domain. The users working at the non-overlapped frequency bands. The pilots at the ith transmit antenna of the kth user is cik ∈ C P ×1 , where P is the number of pilots. Without loss of generality, we omit subscript k in the subsequent discussion. The pilot symbol at the ith antenna is ci ∈ C P ×1 , where the elements in ci are randomly selected from OFDM symbol xfi . By inverse discrete Fourier transform (IDFT), we can get the transmission symbol at the time-domain as xi = F H xfi , yo = where xi ∈ C Nc ×1 , FH ∈ C Nc ×Nc is the DFT matrix, (·)H denotes the complex conjugate transpose, and Nc is the number of subcarriers. The transmission symbol, xi , at the time-domain is trandmitted by the ith antenna at the BS. hoi ∗ xi + n, (3) i=1 where hoi ∈ C Nc ×1 and the Lth to Nc th elements of hoi are zero. The convolution in (3) can be also expressed into matrix multiplication yc = N BS Xci hoi + n, (4) i=1 where Xci ∈ C Nc ×Nc is a Toeplitze matrix ⎡ xi [0] xi [Nc − 1] xi [Nc − 2] ⎢ xi [1] xi [0] xi [Nc − 1] ⎢ ⎢ xi [1] xi [0] Xci = ⎢ xi [2] ⎢ ⎣ ··· ··· ··· xi [Nc − 1] xi [Nc − 2] xi [Nc − 3] ⎤ xi [1] xi [2] ⎥ ⎥ xi [3] ⎥ ⎥ . (5) ⎥ ··· ⎦ · · · xi [0] ··· ··· ··· .. . In (4), yc is the accumulation of sequence that is cutted to the o length of Nc from yo , yc = +∞ i=−∞ y [n + i(2Nc − 1)]. And the circular convolution in (3) is used to represent by matrix convolution. By using DFT, the recieved symbol at the frequence domain can be expressed as yf = Fyc = N BS FXci hoi + Fn, (6) i=1 yf = N BS FXci FH Fhoi + Fn i=1 = (1) N BS N BS diag {xfi }Fhoi + Fn (7) i=1 Because Xci is the Toeplitze matrix, FXci FH is the diagonal matrix whose elements are xfi . To estimate the CSI hi , the users have to extract the pilot c ∈ C P ×1 from the recieved symbol yf . The received symbol yΩ Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. 8732 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 71, NO. 8, AUGUST 2022 pilot can be written as c yΩ = N BS diag {xfi,Ω }FΩ,L hi + FΩ nc , (8) i=1 where xfi,Ω is selected from xfi in the index set Ω, which is selected from subcarriers set {1, 2, . . . , Nc } as the type of comb pilot, FΩ,L ∈ C P ×L is correspondingly a partial Fourier matrix, the rows are selected form the index of pilot and the columns are selected from the first L columns from F. Obviously, xfi,Ω is equal to ci . Thus c yΩ = N BS Ci FΩ,L hi + nc , (9) i=1 where Ci diag{ci }, hi = [hi (1), hi (2), . . . , hi (L)]T is the CSI from the ith BS antenna to the user; nc = [n1 , . . . , nP ]T denotes the i.i.d. additive white complex Gaussian noise with zero mean and variance σn2 over the pilots. If comb-type pilots as shown in Fig. 1 are used, the received pilots for a certain user can be expressed as c = Φh + n, yΩ (10) where Φ = [C1 FΩ,L , C2 FΩ,L , . . . , CM FΩ,L ], Φ ∈ C P ×LNBS and h = [hT1 , hT2 , . . . , hTM ]T , h ∈ C LNBS ×1 . Based on many pratical measurements and theoretical analysis [31], we have an important observation that the number of non-zero elements of CSI hi is much smaller than its length, L, thus the channel could be compressed in the time-domain. It is obvious that (10) can be treated as a problem of CS, Φ is the sensing matrix, and h is the sparse signal. Thus we can reconc by utilizing a CS-based reconstruction struct h from Φ and yΩ algorithm. Under this channel model, the uplink channel has the same SNR as the downlink, and the noise parameter, σn2 , denotes the overall noise power both in the downlink and in the uplink [32]. For block-fading time-varying MIMO channels, the CSI changes from time slot to another but remains same during a slot. The dynamic channel could be modeled by the variation of CSI’s support and the evolution of the non-zero elements’ amplitudes [9] as ht = st ◦ gt , (11) where ht is the CSI at time t, st (l) ∈ {0, 1} denotes whether the lth support index of st is zero, gt (l) ∈ C denotes the amplitude at the lth support index of gt and ◦ denotes the Hadamard product. Specifically, we use a first-order Markov process to model the support change, i.e. we define two transition probabilities as p10 P (st+1 (l) = 1|st (l) = 0) , (12) p01 P (st+1 (l) = 0|st (l) = 1) , (13) and p 1−p p01 , to achieve the steady-state sparsity where we set p10 = rate p ∈ (0, 1). As for the amplitudes, we use a first-order autoregressive model expressed as gt (l) = ρgt−1 (l) + 1 − ρ2 vt (l), (14) where the correlation coefficient ρ = J0 (2πfd τ ) is given by the zero-order Bessel function of the first kind, with fd being the maximal Doppler frequency and τ being the time slot duration, and the parameter vt (l) ∼ CN (0, σω2 ) is the i.i.d. complex Gaussian variables. Some classical CS-based algorithms require to know ahead of time that the number of non-zero elements in h, which are not appropriate for the time-varying channel system since it is hard to get the priori knowledge at every time slot. So it is imperative to finding a new CS-based method that requires no priori knowledge of channel and more suitable for this system. III. JOINT CHANNEL TRAINING AND FEEDBACK In this section, we will propose a joint training and feedback scheme, which includes two steps. In the first step, we adopt the CS-based AS-JOMP algorithm to recover CSI h from received c at the BS, then we analyse the performence of ASpilots yΩ JOMP by discussing the mutual coherence property (MCP) of the sensing matrix. In the second step, we use the estimated CSI as the input of the DnNet and the DnLSTM to denoise and learn the time-variant characteristic between the time-varying channels. A. AS-JOMP Algorithm For simplicity, we omit superscript t in the above equations. For the FDD system, the CSI knowledge at the BS is obtained by the step that the ith user estimates the CSI and then feed it back to the BS. Using the conventional LS-based CSI estimation techniques, channel can be estimated by c , ĥ = Φ† yΩ † H H (15) −1 where Φ = Φ (ΦΦ ) is the Moore-Pensrose pseudoinverse. However, this LS-based approach requires P ≥ LNBS [5], which induces an overwhelming pilot training and CSI feedback overhead, particularly when NBS is large. Thankfully, when P < LNBS , (7) is a problem in underdetermined but could be soveled by CS technique by exploiting the sparsity of CSI c and Φ are referred to vector. For the general CS model, yΩ as the measurement and the measurement matrix, respectively. Moreover, h ∈ C LNBS ×1 , Φ ∈ C P ×LNBS and y ∈ C P ×1 . Obviously the overhead for downlink training is reduced because of the decrease of the number of pilots. Furthermore, the users feed back the recived pilots to the BS, in this way, the uplink feedback can also be reduced from LNBS to P . In many compressed sensing reconstruction algorithms, orthogonal matching pursuit (OMP) algorithm is the classic one which is used and improved many times [33]. OMP reconstructs the channel by iteratively recognizing the support set that contains the indices of the columns of Φ which correlated with the c . In each iteration, it selects an index based on measurement yΩ a maximum correlation test and subtracts the contribution of the corresponding column from the current measurement. The iterative process continues until the indices of non-zero elements are all recognized. However, the sparsity S is not available for practical applications in general. This motivates us to propose an improved AS-JOMP algorithm based on the characteristics of Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. FAN et al.: COMPRESSIVE SENSING AND DEEP LEARNING-BASED TIME-VARYING CHANNEL ESTIMATION FOR FDD MASSIVE MIMO SYSTEMS the practical channel. From [34], in the case of far field, the CSI between the BS antennas shares the common support and has the property of structured sparsity because of the close antenna (t) BS spacing at the BS. Thus {hi }N i=1 can be generated with the (t) NBS same support vectors {si }i=1 . Meanwhile, we also observe that the decreasing magnitudes of energy difference of remaining observation vectors between two consecutive iterations are different under the different SNR. Following the structured sparsity, the AS-JOMP algorithm divides the cross-correlation vector which is the product of the sensing matrix Φ and the remaining observation vectors ri into NBS vectors and summation, then selects the index of maximum value like OMP. This process can select more accurate indices of non-zero elements of CSI. With the second observation, two threshold parameter are used to stop the iteration under high and low SNR. Thus AS-JOMP algorithm can adaptively recover the CSI without any priori information of the sparsity. The pseudocode of the proposed algorithm is provided in Algorithm 1. For Algorithm 1, some notations should be further detailed. supp(Γ(x, k)) denotes an operator on x that find the index of the k largest value in x and Tmin () in step 13 means to find the first value that makes the inequality true. Then we further explain the main steps in Algorithm 1 as follows. First, for steps 3 − 8, we aim to find those columns of measurement matrix Φ that are participate in forming the measurement y. The elements in set Λ are the indices of the columns that are more correlated with y. Second, in steps 9 − 11, we use thresholds ξ1 to stop the iteration, obtaining the sparsity level and signal estimation in high SNR. In steps 12 − 15, we use thresholds ξ2 to stop the iteration and obtain the sparsity level in low SNR. Thus, until step 17, we get the true sparsity level s. Third, for steps 18 − 28, we can estimate the signal by the LS algorithm. Finally, the ture signal estimation can be got by step 29. Compared to the other state-of-the-art CS-based channel estimation schemes [33], [35], the proposed AS-JOMP algorithm has the following distinctive features: 1) The proposed AS-JOMP algorithm removes the unrealistic assumption that the channel sparsity level is required as the priori information for reliable channel estimation since it can adaptively acquire the sparsity level of wireless MIMO channels. The proposed stopping criteria enables AS-JOMP algorithm to estimate channels with good performance under high and low SNR levels. 2) The AS-JOMP algorithm offers a more precise support update by considering the structured sparsity of h, the BS support of each {hi }N i=1 is updated together. By doing so, the CSI recovery performance can be improved. B. Performance Analysis of Proposed AS-JOMP Algorithm From the theory of CS, sensing matrix Φ plays a decisive role in the recovery performance of sparse vectors h. In order to ensure the sparse signal recovery performance, many researches have proved that the sensing matrix should satisfy the restricted isometry property (RIP) and MCP [9], [33]. In this section, we will discuss the MCP of the sensing matrix in our method. 8733 Algorithm 1: AS-JOMP Algorithm. Input: Received pilots y; Measurement matrix Φ; Estimated sparsity s; Threshold ξ1 , ξ2 . Output: CSI recovery ĥ 1: Initialization: 2: ri = y, i = 1, f lag = 0, Λ = ∅. 3: while f lag = 0 do BS −1 e(mL + l), l = 4: e = ΦH ri ; z(l) = N m=0 1, 2, . . . , L 5: inx = supp(Γ(z, 1)) 6: J0 = {mL + l}, m = 0, 1, 2, . . . , NBS − 1, l = inx 7: Λ = Λ ∪ J0 ; ĥt = Φ†Λ y 8: i = i + 1; ri = y − Φ†Λ ĥt 9: if i > 1& ri−1 2 > ξ1 ri 2 then 10: s=i−1 11: break; 12: else if i >= m then 13: j = Tmin ( rj−1 2 < ξ2 rj 2 ), j = 2, . . . , m 14: do initial again 15: f lag = 1; s = j − 1 16: end if 17: end while 18: while f lag = 1 do BS −1 e(mL + l), l = 19: e = ΦH ri ; z(l) = N m=0 1, 2, . . . , L 20: inx = supp(Γ(z, 1)) 21: J0 = mL + l, m = 0, 1, 2, . . . , NBS − 1, l = inx 22: Λ = Λ ∪ J0 ; ĥt = Φ†Λ y 23: i = i + 1; ri = y − Φ†Λ ĥt 24: if i >= s then 25: f lag = 0 26: end if 27: end while 28: ĥΛ = ĥt 29: Return ĥ The coherence μ(Φ) is defined as the largest normalized inner product of any two columns of Φ μ(Φ) = max 1≤i=j≤LNBS ΦH :,i Φ:,j , Φ:,i Φ:,j (16) where Φ:,i is the ith column vector of μ(Φ). The sparsity, i.e. the number of non-zero elements of h, is SN BS . Suppose that ( 1 +1) SNBS ≤ μ(Φ)4 , then the estimated sparse vector ĥ has error bounded by [9], [33] ĥ − h 2 ≤ γ 1 − μ(Φ)(4SNBS − 1))) . (17) From (17), the estimation error is influenced by the parameters, the variance of noise vector γ, the value of μ(Φ), which is always greater than or equal to zero, and the sparsity level of the channel. Once the communication system is given, the number of antennas will be given. And the sparsity and SNR are determined by the propagation features and the characteristics Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. 8734 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 71, NO. 8, AUGUST 2022 of communication system. Therefore, the smaller value of the μ(Φ), the better recovery performance. It will make the design of Φ very essential. As for Φ = [C1 FΩ,L , C2 FΩ,L , . . . , CM FΩ,L ], the value of the μ(Φ) is determined by the pilot symbol Ci . Suppose that the elements in pilot sequence have random phase and unit amplitude, the magnitude of the column vectors of the matrix Φ √ is P . We have G= 1 H ΦH (:, i) Φ(:, j) Φ Φ= , P Φ(:, i) Φ(:, j) (18) thus the maximum absolute value of the non-diagonal elements gij (i = j) of G is μ(Φ), that is μ(Φ) = max |gij | . (19) i=j Fig. 2. The DnNet model for CSI denoising. And according to formula (18), (19) can be further expressed as P 2π 1 max μ(Φ) = cip cjp e−j Nc mp (n1 −n2 ) , P i=j˜or˜n1 =n2 p=1 C. CNN-Based DnNet Method (20) where 1 ≤ i, j ≤ LNBS , 0 ≤ n1 , n2 ≤ L − 1, cip (p = 1, 2, . . ., P ) is the complex conjugate of the kth element of ci , and the {mp }P p=1 is the subcarrier index which assigned to the −j Nc mp (n1 −n2 ) ∗ pilot. When i = j or n1 = n2 , P1 P p=1 cip cjp e is not the diagonal element of G. Consider three situations: r if i = j and n1 = n2 , (20) can be expressed as 2π μ(Φ) = P 2π 1 max e−j Nc mp (n1 −n2 ) , P p=1 (21) m because Ncp obey uniform distribution U[0, 1), the value of expectation of μ(Φ) is zero. r if i = j and n1 = n2 , (20) can be expressed as P P 1 1 max cip cjp = max ej2π(θjp −θip ) , P P p=1 p=1 (22) because θjp and θip is the phase of pilot cjp and cip , they obey uniform distribution U[0, 1), the value of expectation of μ(Φ) is zero. r if i = j and n1 = n2 , (20) can be expressed as μ(Φ) = P mp 1 max ej2π( Nc (n2 −n1 )+(θjp −θip )) . i =jandn =n P 1 2 p=1 (23) m because Ncp , θjp and θip obey uniform distribution U[0, 1), the value of expectation of μ(Φ) is zero. From the above analysis, the elements in pilot sequence have random phase and unit amplitude can ensure the value of the μ(Φ) be minimum, i.e. the value of expectation of μ(Φ) is zero. Hence, the sparse vector h can be estimated accurately. Eventhough the indices of non-zero elements of sparse channel can be selected by AS-JOMP algorithm, the noise that added on these indices does not be eliminate, and the time-variation characteristic is not well used. Based on these requirements, we proposed a DL-based network, named DnNet. μ(Φ) = In this section, we exploit the popular convolutional neural networks (CNNs) to denoise and improve the reconstruction quality. As we know, in the filed of signal denoising, the main method is sparse decomposition, whose principle is to use an over-complete atomic libraries to express the noise signal. The original signal is reconstructed with a few S-large coefficients and some small coefficients with noise are shielded. However, the noise that added on the S non-zero elements of the signal is hard to be eliminated with the CS-based method. Thus, we propose a CNN-based method, named DnNet, to denoise, i.e. learn the difference between the initial estimation ĥ and the original CSI h and then recover a more accuracy CSI h̃. The structure of our developed model, DnNet, is illustrated in Fig. 2. Here S1 , S2 and S3 denote the length, the width, and the number of feature maps, respectively. Once CSI is estimated, which is LNBS × 1, we reshape its real and imaginary parts into a 2LNBS × 1 real-valued vector and then normalize it. After that, we reshape the 2LNBS × 1 real-valued vector into two Ñ1 × Ñ2 matrices with Ñ1 × Ñ2 = LNBS , which can be regarded as an image with two channels. Then the reshaped initial estimate CSI is fed into several residual learning networks [36], named Resblock. Each Resblock consists of four layers, as shown in Fig. 2. The first layer is the input layer, for layers 2-4, S3 filters of size 3 × 3 × S3 are used, and batch normalization is added between convolution and Leaky rectified linear unit (LeakeyReLU), which is used as the activation function. We consider the input of the Resblock as a shortcut and add it to the output of the fourth convolutional layer to avoid the vanishing gradient problem. Once the channel matrix has been refined by all Resblocks, the channel matrix is inputted into the final convolutional layer and the sigmoid function is used to scale values to [0,1]. Finally, we reshape the output of our network and perform the anti-normalization to obtain the final reconstruction of CSI h̃. Through extending simulation, we find using five Resblocks can obtained the best performance, adding more Resblocks does not significantly boost reconstruction quality but increases computational complexity and training time. In this training, Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. FAN et al.: COMPRESSIVE SENSING AND DEEP LEARNING-BASED TIME-VARYING CHANNEL ESTIMATION FOR FDD MASSIVE MIMO SYSTEMS Fig. 3. The DnLSTM Network data transmission model. Fig. 4. The ConvLSTM network basic unit structure. parameters are updated by the adaptive moment estimation (ADAM) algorithm. The loss function is the mean-squared error (MSE), which is calculated as follows: L(Θ) = T 1 f (ĥ; Θ) − h) T i=1 2 2 (24) where · 2 is the Euclidean norm, Θ is the set of parameters of our deep learning network, T is the total number of samples in the training set, and f (ĥ; Θ) is the output of the trained model. D. CNN-Based DnLSTM Method In DnNet network, CNN can effectively extract the features of CSI matrix to reduce the noise, and the network can also learn time-varying features. However, at training stage, it is not reliable to only use multi-layer CNN to learn time-varying features, and there is no theoretical support. Therefore, this section proposes a joint network based on CNN and LSTM, named DnLSTM, which can not only extract channel structure information by CNN to reduce noise, but also learn the correlation of time-domain channel through LSTM network. In this way, the spatial and temporal characteristics of the channel can 8735 be effectively learned so that the final CSI estimate accuracy can be improved. The structure of DnLSTM network proposed in this article is shown in Fig. 3. In this network, the CSI channel group Ĥi ∈ C LNBS ×1×T is taken as input, where T is the number of CSI in a channel group and i = 1, 2, . . . , Nsample , where Nsample is the number of samples. The output result of the network is expressed as Hi ∈ C LNBS ×1×T . For convenience, we omit the subscript i and express Ĥ and H as Ĥ = [ĥ0 , ĥ1 , . . . ĥT −1 ] and H = [h0 , h1 , . . . hT −1 ]. At present, the input data of deep neural network are all real numbers. So we use two matrices of real numbers to represent the real and imaginary parts of the input channel matrix. Then all the elements in the matrix are normalized to [0,1], which can be equivalent to image data with two channels for convenient training. ĥ obtained by as-JOMP algorithm is a complex vector with the size of LNBS × 1. Therefore, the real and imaginary parts of the complex vector are separated into two real vectors of size LNBS × 1, and then transformed into two real matrices of size Ñ1 × Ñ2 , Ñ1 × Ñ2 = LNBS . Thus, the data of each layer of the network is a four-dimensional tensor, and S1 × S2 × S3 on the left of each layer represents the size of CSI channel group. S4 at the top is the number of feature maps. In fact, the proposed DnLSTM network is composed of three layers, mainly including channel feature extraction, timevarying channel correlation extraction, and the CSI reconstruction. The main components are as follows: 1) Channel Feature Extraction: Channel feature extraction is mainly completed by CNN network at the first layer. In this process, each matrix of time dimension can be separately convoluted to extract features. The convolution layer contains 64 3 × 3-sized convolution kernels with sliding step size of 1. The same padding is used to fill zeros around the input during convolution, so that the feature graph output by the network of each layer is kept the same size as the original channel matrix, and ReLU is used as activation function. 2) Time-varying Channel Correlation Extraction: The extraction of temporal correlation is mainly accomplished by the second layer LSTM network. At present, most LSTM networks predicting temporal state use fully connected structure, but the input of fully connected networks is one-dimensional and the spatial correlation is not taken into account. Compared with traditional methods, the convolution operation used in ConvLSTM network can get better space-time relation since the ConvLSTM layer, like LSTM, takes the output of the previous layer as the input of the next layer. As shown in Fig. 4, the ConvLSTM network consists of an input gate it to record new information to the current state, an output gate ot to control how much information of the current state is visible to the external network, a forgetting gate ft to control how much information of the historical state is allowed to enter after flowing to the current state, and a memory unit Ct to store information of the previous moment. Here Wxi , Whi , Wci , Wxo , Who , Wco , Wxf , Whf , Wcf is the weight matrix, bi , bo , bf , bc is the bias, ◦ denotes the Hadamard product, σ(·) represents the sigmoid layer. The output is numbers in range of [0,1]. Output 0 represents the proportion of information passing, 0 means no passing, 1 means all passing. Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. 8736 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 71, NO. 8, AUGUST 2022 The above parameters can be calculated by ⎧ it = σ (Wxi ∗ xt + Whi ∗ ht−1 + Wci ◦ Ct−1 + bi ) ⎪ ⎪ ⎪ ⎪ ⎨ ot = σ (Wxo ∗ xt + Who ∗ ht−1 + Wco ◦ Ct + bo ) ft = σ (Wxf ∗ xt + Whf ∗ ht−1 + Wcf ◦ Ct−1 + bf ) ⎪ ⎪ C ⎪ t = ft ◦ Ct−1 + it ◦ tanh (Wxc ∗ xt + Whc ∗ ht−1 + bc ) ⎪ ⎩ ht = ot ◦ tanh (Ct ) (25) We take the preliminary estimated T CSI matrices as input to the ConvLSTM layer in chronological order. At each time step, the ConvLSTM network can fuse the temporal correlation information learned at the previous time into the input of the current time step. The correlation information is updated as the time step passes. 3) CSI Reconstruction: In order to obtain the same dimension as the input data, we consider using the same padding, ReLU activation function and 3 × 3 × 2 size of filters. Finally, we implement dimension transformation and inverse normalization to obtain the final CSI estimation result h. IV. SIMULATION RESULTS This section investigates the performence of the proposed ASJOMP-based and DL-based joint channel training and feedback scheme. In the massive MIMO system, the length of the OFDM symbol is Nc = 1024, the BS is equipped with NBS = 16 antennas, and the channel length is L = 64. The parameters that generate the time-varying channel are set as follows [9]. The sparsity probability μ is set as 0.1, i.e. the average channel sparsity level is μL = 6, the transition probability p01 is set as 0.16, the movement velocity v of users is 12 km/h, the carrier frequency is 900 MHz, the time slot duration is τ = 0.5 ms, σv2 = 1 and the initial amplitudes g0 (l) ∼ CN (0, 1). The coefficients ξ1 and ξ2 in Algorithms 1 is set as 2 and 1.1, respectively. The training overhead is η = P/N . Keras and Tensorflow with a GPU backend are used to implement our proposed scheme. In the offline stage, the loss function of the network is measured by MSE and the exploited optimization algorithm is adaptive moment estimation (ADAM). The batch size is 200 with the epochs of 200. The training, validation and testing set contains 84,000, 18,000 and 18,000 samples, respectively. Before training the model, we normalize the estimated signal. The training set, validation set, and test set use the same parameters to perform the normalization and anti-normalization. The recovery performance is quantified by the NMSE as 2 (26) NMSE = E h − h̃ / h 22 , 2 where h and h̃ denote the original and the recovered CSI, respectively. We use the NMSE to measure the difference between the recovered CSI h̃ and the original h. Firstly, in Fig. 5, we compare the proposed AS-JOMP algorithm with other calssical methods that we have mentioned under different training overhead r J-OMP: Joint orthogonal matching pursuit (J-OMP) algorithm has been proposed in [4] for conducting the CSI reconstruction by exploiting the joint sparsity in the user Fig. 5. Performance comparison between CS reconstruction algorithms in different training overhead. channel matrices. J-OMP degrades as OMP [33] because we consider singel user instead of multiple users. r S-CoSaMP: Structured compressive sampling matching pursuit (S-CoSaMP) [9] is the improvement of CoSaMP [37] by exploiting the temporal correlation of MIMO channels. r SP: Subspace pursuit (SP) algorithm is a very important greedy methodology in compressed perception. It has a faster computing speed and better reconstruction probability, and is widely used in practice [38]. r SAMP: Sparsity adaptive matching pursuit (SAMP) algorithm require no sparsity S, it fill the support set by setting the step size and appropriate stopping conditions (which greatly affects the accuracy) [39]. r Oracle-LS: It is the performance bound in the CS-based algorithms. In our paper, this serves as a benchmark to evaluate the quality of channel estimation, in which the position of non-zero elements are assumed known and only the values are estimated by the LS estimator [4], [13]. From Fig. 5, when the training overhead, η, increases, the accuracy of reconstruction increases too. That is because when η increase, the P is large, the size of measurement marix is also large, so the compressive measurement y contains more information about channel, the more observations are made for the sparse channel, the more channel features are captured in the observed value vector, and the better reconstruction is. When the training overhead increase, the reconstruction performance of the algorithm is only slightly improved. On the other hand, the proposed scheme achieves more reduction in channel training and feedback overhead over the conventional scheme in achieving the same performece on recovering CSI at the BS. In Fig. 6, the NMSE with respect to SNR is illustrated when η = 0.3. Note that the NMSE of all methods decreases when the value of SNR increases because the noisy energy decreases. In addition, the AS-JOMP algorithm achieves the best NMSE performance and is superior to the other method. That is because it exploites the structured sparsity of channel and adaptively find the sparsity level under the both high and low SNR levels. Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. FAN et al.: COMPRESSIVE SENSING AND DEEP LEARNING-BASED TIME-VARYING CHANNEL ESTIMATION FOR FDD MASSIVE MIMO SYSTEMS Fig. 6. Performance comparison between CS reconstruction algorithms in different SNR. Fig. 8. Performance comparison between AS-JOMP algorithm and added DnNet as well as DnLSTM networks. COMPARISON OF Fig. 7. Performance comparison between CS reconstruction algorithms in different number of BS antennas. In massive MIMO systems, different antenna scale will lead to different channel estimation performance. Therefore, Fig. 7 shows the evaluation of the effect of BS antennas on the channel estimation results, when SNR=20, η = 0.3 and the number of subcarriers Nc is 1024. It can be observed that with the increase of NBS , the NMSE performance of the AS-JOMP algorithm as well as other algorithms deteriorates. However, AS-JOMP algorithm can always approach the performance limit, while NMSE performance of other algorithms deteriorates more seriously. This is because the proposed algorithm can more accurately estimate the position of non-zero elements while other methods cannot when they gradually increase. When NBS > 16, as the channel length L = 64 and Nc is 2048, the number of pilots P required by the traditional LS and MMSE method should be equal or greater than 1024, so it is not applicable. However, CS-based algorithms, especially the AS-JOMP algorithm, can still achieve good performance. Therefore, this method has great advantages in massive MIMO systems. Fig. 8 shows the NMSE of AS-JOMP algorithm, AS-JOMP + DnNet algorithm, and AS-JOMP + DnLSTM algorithm with SNR, where the number of BS antennas is 32, the pilot overhead is 30%, and the number of subcarriers is 1024. It can be observed 8737 TABLE I RECONSTRUCTION TIME OF EACH ALGORITHM that the estimation performance has been greatly improved by adding DnNet after the AS-JOMP algorithm, especially in the case of low SNR. This is because when the SNR is low, there are more different characteristics between the noisy CSI and the original CSI are more different. When the SNR is high, the accuracy of the AS-JOMP algorithm has been very good, so there is only little difference between the two methods. On the other hand, when DnLSTM is added after the AS-JOMP algorithm, the NMSE under different SNR is further reduced. This is because the AS-JOMP algorithm only estimates the channel in a single temporal state and does not use the information of time-varying channels, while the LSTM network in DnLSTM can effectively learn the long-term time-varying information. In this case, its performance is also better than the DnNet algorithm. In order to analyze the computation complexity of each algorithm, the comparison of the average reconstruction time of each algorithm is given in Table I. The trained DnNet and DnLSTM networks were tested on Nvidia GeForce RTX2070 GPU. As the traditional method cannot use GPU acceleration, Intel Core I7 CPU is used for calculation. It can be observed that J-OMP algorithm has the shortest reconstruction time, because it directly reconstructs the measured value according to the known information of the support and does not need to search for sparsity adaptively. S-CoSaMP and SP algorithms are also with known priori information, but the reconstruction time is higher than AS-JOMP algorithm. This is because AS-JOMP algorithm uses two iterative thresholds, which makes the adaptive sparseness search faster. At the same time, it can be seen that the time used for CSI estimation based on DnNet network and Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply. 8738 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 71, NO. 8, AUGUST 2022 DnLSTM network is much less than that required by traditional methods. The average CSI reconstruction time is 0.0001 and 0.0002 seconds, and the time used by DnLSTM network is the smallest because the structure of DnLSTM network is simpler than that of DnNet. V. CONCLUSION In this article, we have proposed a joint channel estimation method which contains two steps to reduce the training and feedback overhead in the time-varying channels. We use a new CS-based algorithm, named AS-JOMP, to reconstruct the CSI from the received pilots at the BS and then exploit a DL-based algorithm, named DnNet, to obtain more accurate results. The stimulation results have verified the performance gains on recovering CSI by the proposed method in terms of recovery accuracy. REFERENCES [1] Z. Qin, J. Fan, Y. Liu, Y. Gao, and G. Y. Li, “Sparse representation for wireless communications: A compressive sensing approach,” IEEE Signal Process. Mag., vol. 35, no. 3, pp. 40–58, May 2018. [2] Q. Shi, M. Razaviyayn, Z. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sep. 2011. [3] M. Arnold, S. Dörner, S. Cammerer, S. Yan, J. Hoydis, and S. T. Brink, “Enabling FDD massive MIMO through deep learning-based channel prediction,” 2019, arXiv:1901.03664. [4] X. Rao and V. K. N. Lau, “Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3261–3271, Jun. 2014. [5] M. K. Ozdemir and H. Arslan, “Channel estimation for wireless OFDM systems,” IEEE Commun. Surv. Tut., vol. 9, no. 2, pp. 18–48, Apr.–Jun. 2007. [6] X. Zhu, L. Dai, G. Gui, W. Dai, Z. Wang, and F. Adachi, “Structured matching pursuit for reconstruction of dynamic sparse channels,” in Proc. IEEE Glob. Commun. Conf., 2015, pp. 1–5. [7] Y. Han, P. Zhao, L. Sui, and Z. Fan, “Time-varying channel estimation based on dynamic compressive sensing for OFDM systems,” in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast., 2014, pp. 1–5. [8] H. Yang, Y. Fan, D. Liu, Z. Zheng, and S. Lin, “Compressive sensing and prior support based adaptive channel estimation in massive MIMO,” in Proc. 2nd IEEE Int. Conf. Comput. Commun., 2016, pp. 1618–1622. [9] W. Shen, L. Dai, Y. Shi, B. Shim, and Z. Wang, “Joint channel training and feedback for FDD massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 65, no. 10, pp. 8762–8767, Oct. 2016. [10] R. Zhang, H. Zhao, and J. Zhang, “Distributed compressed sensing aided sparse channel estimation in FDD massive MIMO system,” IEEE Access, vol. 6, pp. 18383–18397, 2018. [11] M. A. Naser, M. Q. Alsabah, and M. A. Taher, “A partial CSI estimation approach for downlink FDD massive-MIMO system with different base transceiver station topologies,” Wireless Pers. Commun., vol. 119, pp. 3609–3630, 2021. [12] Y. Zhang, R. Venkatesan, O. A. Dobre, and C. Li, “An adaptive matching pursuit algorithm for sparse channel estimation,” in Proc. IEEE Wireless Commun. Netw. Conf., 2015, pp. 626–630. [13] M. J. Azizipour and K. Mohamed-Pour, “Compressed channel estimation for FDD massive MIMO systems without prior knowledge of sparse channel model,” IET Commun., vol. 13, no. 6, pp. 657–663, Apr. 2019. [14] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169–6183, Dec. 2015. [15] X. Bi, J. Zhao, G. Wang, Y. Lu, L. Zhou, and D. Li, “Modified CSbased downlink channel estimation with temporal correlation in FDD massive MIMO systems,” in Proc. Wireless Telecommun. Symp., 2018, pp. 1–7. [16] J. P. Vila and P. Schniter, “Expectation-maximization Gaussian-mixture approximate message passing,” IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, Oct. 2013. [17] C. Wen, W. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, Oct. 2018. [18] J. Guo, C. Wen, S. Jin, and G. Y. Li, “Convolutional neural networkbased multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, Jan. 2020. [19] C. Lu, W. Xu, H. Shen, J. Zhu, and K. Wang, “Mimo channel information feedback using deep recurrent network,” IEEE Commun. Lett., vol. 23, no. 1, pp. 188–191, Jan. 2019. [20] X. Li and H. Wu, “Spatio-temporal representation with deep neural recurrent network in MIMO CSI feedback,” IEEE Wireless Commun. Lett., vol. 9, no. 5, pp. 653–657, May 2020. [21] J. Wang, Y. Ding, S. Bian, Y. Peng, M. Liu, and G. Gui, “UL-CSI data driven deep learning for predicting DL-CSI in cellular FDD systems,” IEEE Access, vol. 7, pp. 96105–96112, 2019. [22] T. Wang, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based CSI feedback approach for time-varying massive MIMO channels,” IEEE Wireless Commun. Lett., vol. 8, no. 2, pp. 416–419, Apr. 2019. [23] Y. Wang, J. Yang, M. Liu, and G. Gui, “LightAMC: Lightweight automatic modulation classification via deep learning and compressive sensing,” IEEE Trans. Veh. Technol., vol. 69, no. 3, pp. 3491–3495, Mar. 2020. [24] J. Fan, S. Chen, X. Luo, Y. Zhang, and G. Y. Li, “A machine learning approach for hierarchical localization based on multipath MIMO fingerprints,” IEEE Commun. Lett., vol. 23, no. 10, pp. 1765–1768, Oct. 2019. [25] Y. Ge and J. Fan, “Beamforming optimization for intelligent reflecting surface assisted MISO: A deep transfer learning approach,” IEEE Trans. Veh. Technol., vol. 70, no. 4, pp. 3902–3907, Apr. 2021. [26] P. Liang, J. Fan, W. Shen, Z. Qin, and G. Y. Li, “Deep learning and compressive sensing-based CSI feedback in FDD massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 9217–9222, Aug. 2020. [27] C. Lu, W. Xu, S. Jin, and K. Wang, “Bit-level optimized neural network for multi-antenna channel quantization,” IEEE Wireless Commun. Lett., vol. 9, no. 1, pp. 87–90, Jan. 2020. [28] Y. Liao, Y. Hua, and Y. Cai, “Deep learning based channel estimation algorithm for fast time-varying MIMO-OFDM systems,” IEEE Commun. Lett., vol. 24, no. 3, pp. 572–576, Mar. 2020. [29] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmwave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, Oct. 2018. [30] X. Ma, Z. Gao, F. Gao, and M. Di Renzo, “Model-driven deep learning based channel estimation and feedback for millimeter-wave massive hybrid MIMO systems,” IEEE J. Sel. Areas Commun., vol. 39, no. 8, pp. 2388–2406, Aug. 2021. [31] F. Kaltenberger, D. Gesbert, R. Knopp, and M. Kountouris, “Correlation and capacity of measured multi-user MIMO channels,” in Proc. IEEE 19th Int. Symp. Pers. Indoor Mobile Radio Commun., 2008, pp. 1–5. [32] H. Shirani-Mehr and G. Caire, “Channel state feedback schemes for multiuser MIMO-OFDM downlink,” IEEE Trans. Commun., vol. 57, no. 9, pp. 2713–2723, Sep. 2009. [33] M. F. Duarte and Y. C. Eldar, “Structured compressed sensing: From theory to applications,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4053–4085, Sep. 2011. [34] Y. Barbotin, A. Hormati, S. Rangan, and M. Vetterli, “Estimation of sparse MIMO channels with common support,” IEEE Trans. Commun., vol. 60, no. 12, pp. 3705–3716, Dec. 2012. [35] W. Dai and O. Milenkovic, “Subspace pursuit for compressive sensing: Closing the gap between performance and complexity,” CoRR, vol. abs/0803.0811, pp. 1–19, Jan. 2008. [36] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778. [37] D. Needell and J. Tropp, “CoSaMP: Iterative signal recovery from incomplete and inaccurate samples – sciencedirect,” Appl. Comput. Harmon. Anal., vol. 26, no. 3, pp. 301–321, May 2009. [38] N. Kato, B. Mao, F. Tang, Y. Kawamoto, and J. Liu, “Ten challenges in advancing machine learning technologies toward 6G,” IEEE Wireless Commun., vol. 27, no. 3, pp. 96–103, Jun. 2020. [39] T. T. Do, L. Gan, N. Nguyen, and T. D. Tran, “Sparsity adaptive matching pursuit algorithm for practical compressed sensing,” in Proc. 42nd Asilomar Conf. Signals Syst. Comput., 2008, pp. 581–587. Authorized licensed use limited to: The Islamia University of Bahawalpur. Downloaded on November 05,2022 at 20:48:54 UTC from IEEE Xplore. Restrictions apply.