A Near Optimal QoE-Driven Power Allocation Scheme for SVC-Based Video Transmissions Over MIMO Systems

Xiang Chen1, Jenq-Neng Hwang1, Chiung-Ying Wang2, Chung-Nan Lee3

1 Department of Electrical Engineering, Box 352500, University of Washington, Seattle, WA 98195, USA. Email: {xchen28, hwang}@uw.edu
2 Department of Information Management, TransWorld University, Yunlin, Taiwan, ROC. Email: ann@mail.twu.edu.tw
3 Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan, ROC. Email: cnlee@cse.nsysu.edu.tw

Abstract: In this paper, we propose a near-optimal power allocation scheme that maximizes the quality of experience (QoE) for scalable video coding (SVC) based video transmissions over multi-input multi-output (MIMO) systems. The scheme optimizes the received video quality according to the video frame error rate (FER), which may be caused by either transmission errors in the physical (PHY) layer or video coding structures in the application (APP) layer. Because the original optimization problem is complex, we decompose it into several sub-problems, each of which can be solved by classic convex optimization methods. Detailed algorithms with the corresponding theoretical derivations are provided. Simulations with real video traces demonstrate the effectiveness of the proposed scheme.

Keywords: power allocation; QoE; SVC; MIMO; convex optimization

I. INTRODUCTION

Due to the exponentially increasing demand for wireless multimedia applications, offering higher-quality video transmission over wireless environments has become a continuing endeavor for multimedia service providers [1]. However, the error-prone and band-limited nature of wireless channels creates obstacles for these bandwidth-consuming applications [2]. Multi-input multi-output (MIMO) technology, which can provide more reliable and efficient wireless communications, has been considered one of the solutions for better wireless video delivery [3].
Among the many existing MIMO techniques, the spatial multiplexing (SM) approach, which simultaneously transmits independent data streams on each antenna to achieve higher spectral efficiency [4], is suitable for high-data-rate video transmissions.

Scalable video coding (SVC) is another attractive technique for wireless video transmission. Videos can be encoded with temporal (frame rate), spatial (picture resolution) and quality (image fidelity) scalabilities [5]. Parts of the encoded bit stream (the higher enhancement layers) can be removed, and the resulting substream (the base layer and the lower enhancement layers) still forms a valid bit stream with lower resource consumption but lower video quality [6]. Therefore, SVC provides a way to switch adaptively between video qualities according to the available resources or channel conditions at the user end.

The research is based on work supported by the Ministry of Economic Affairs (MOEA) of Taiwan, under Grant number MOEA 102-EC-17-A-03-S1-214.

A significant amount of research has been conducted on transmitting SVC-based videos over MIMO-SM systems, which requires joint optimization of physical (PHY) layer structures and video characteristics in the application (APP) layer. Antenna selection is one of the major techniques for improving video quality. For instance, an adaptive channel selection (ACS) scheme has been proposed in [7], in which bit streams with higher priorities are transmitted through the antennas with higher signal-to-noise ratios (SNRs). A cross-layer dynamic antenna selection (CLDAS) scheme [8] is designed to jointly optimize the rate-distortion characteristics of source-channel encoding and the multiplexing-diversity tradeoff to mitigate the end-to-end video distortion. Power allocation has also been adopted in video transmissions with MIMO-SM techniques.
A maximum-throughput delivery scheme for SVC-based video over MIMO systems has been proposed in [1], in which the traditional capacity-achieving water-filling (WF) algorithm [9] is improved to account for the discrete modulation levels used in real applications. However, this scheme targets the throughput of the system, which may not directly reflect the quality of experience (QoE) of users. In [10], the proposed power allocation scheme enhances the quality of SVC video streaming over MIMO systems with a modified WF (M-WF) algorithm, in which unequal error protection (UEP) is applied by setting different bit-error-rate (BER) requirements on the base layer and the enhancement layers. Nevertheless, because they are chosen empirically, the fixed BER requirements may not be optimal under different channel conditions.

Transmission errors such as damaged or lost packets degrade video quality [11]. When SVC-based videos are transmitted, decoding errors in base layer frames propagate to the corresponding enhancement layer frames. Moreover, directly minimizing the BER in the PHY layer does not necessarily minimize the video frame error rate (FER) in the APP layer, because of different packet sizes and video coding structures. These characteristics motivate us to develop an effective power allocation scheme that optimizes the received video QoE.

In this paper, we propose a near-optimal QoE-driven power allocation scheme for SVC-based video transmissions over MIMO-SM systems. The proposed scheme maximizes the overall video quality based on the video FER, jointly considering video packet sizes, SVC layer structures and the PHY layer BERs of different modulations. Because this optimization problem is complex, we decompose it into several sub-problems, each of which can be solved by classic convex optimization methods. Detailed algorithms for searching for the optimal solutions and the corresponding theoretical analyses are provided.
Moreover, simulations with real SVC-based video traces are conducted to demonstrate the effectiveness of the proposed scheme.

This paper is organized as follows. In the next section, the system overview, including SVC-based video coding and the MIMO-SM system, is described. The problem formulation is provided in Section III. In Section IV, we describe our proposed optimization search algorithms together with theoretical analyses. Simulation results and concluding remarks are given in Sections V and VI, respectively.

Notations: Upper (lower) boldface letters are used for matrices (column vectors). $\mathrm{diag}(\mathbf{h})$ is a diagonal matrix with the elements of $\mathbf{h}$ on the diagonal. $\mathbf{1}$ denotes a column vector all of whose components are one. $(\cdot)^H$ denotes the Hermitian transpose and $(\cdot)^T$ the transpose. $\mathbf{I}_N$ denotes the $N \times N$ identity matrix. $\mathrm{dom}\, f$ denotes the domain of function $f$.

II. SYSTEM OVERVIEW

A. SVC-Based Video

An SVC-encoded video consists of a base layer and enhancement layers in a hierarchical dependency structure, where a video layer of higher quality can be processed only when the corresponding lower-quality layers have been successfully decoded. Therefore, the base layer is required for decoding all of the enhancement layers [12][13]. SVC supports temporal, spatial and quality scalabilities. In this paper, we only consider videos encoded with quality scalability; however, a similar idea can be applied to videos with temporal or spatial scalability.

The QoE at the user end is normally measured in utility values [14]. In order to maximize the overall utility of the recovered videos, we choose a perceptual quality model [15] for SVC-based videos with quality scalability:

$$u_l = \begin{cases} e^{c\,(1 - q_1/q_{\min})}, & l = 1 \\ e^{c\,(1 - q_l/q_{\min})} - e^{c\,(1 - q_{l-1}/q_{\min})}, & l \ge 2 \end{cases} \qquad (1)$$

where $c$ is a video-dependent model parameter, $q_l$ is the quantization stepsize of the $l$th video layer, and $q_{\min}$ is the minimum quantization stepsize, which corresponds to the video layer with the highest quality.

B. Proposed System Structure

The proposed MIMO-SM system for SVC-based video transmissions is shown in Fig. 1. A video sequence is encoded into one base layer and $L-1$ enhancement layers, which are fed into a MIMO system with $N_t$ ($N_t \ge L$) transmitter antennas and $N_r$ receiver antennas. At the transmitter side, an adaptive channel selection (ACS) module [1][7] is implemented so that video layers of higher importance, such as the base layer, are transmitted through the channels (antennas) with higher SNR. The power allocation module allocates appropriate power to the modulated symbols on each channel, based on the cross-layer video information and the channel state information (CSI) fed back from the receiver side, so that the overall utility is maximized. After precoding, the data symbols are transmitted through the wireless channels.

At the receiver side, a channel estimation module sends the CSI back to the transmitter side. In this paper, we assume that the CSI contains full channel knowledge and is fed back without any estimation error or delay. As in [1], we assume that the channel selection sequences and modulation schemes are known at the receiver side through a control channel. After decoding, detection, demodulation and channel selection, the received bit streams are fed into the SVC decoder for video reconstruction. We assume that no error concealment techniques are applied in the system. Therefore, any video frame containing a bit error is dropped. Moreover, if the $l$th layer is not successfully decoded, all higher layers of this frame (i.e., from $l+1$ to $L$) are also dropped.

C. MIMO Channel Model

The system equation can be described as:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (2)$$

where $\mathbf{y}$ is an $N_r \times 1$ complex received signal vector and $\mathbf{x}$ is an $N_t \times 1$ complex transmitted symbol vector with $E[\mathbf{x}\mathbf{x}^H] = \mathrm{diag}(\mathbf{p}) = \mathrm{diag}(p_1, p_2, \ldots, p_{N_t})$, subject to the normalized power constraint $\mathbf{1}^T\mathbf{p} = 1$ with every element of $\mathbf{p}$ non-negative. $\mathbf{n}$ is an $N_r \times 1$ independent and identically distributed (i.i.d.) complex additive white Gaussian noise (AWGN) vector.
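As a minimal numerical sketch of Eq. (2) and its power constraint, the snippet below scales unit-energy symbols by $\sqrt{p_k}$ so that $E[\mathbf{x}\mathbf{x}^H] = \mathrm{diag}(\mathbf{p})$ and passes them through a channel. The helper name `transmit`, the use of NumPy, the QPSK alphabet, the example powers and the randomly drawn complex Gaussian channel are all assumptions made for illustration, not part of the system described here.

```python
import numpy as np

def transmit(H, p, n0, num_symbols, rng):
    """Simulate y = Hx + n (Eq. (2)) for num_symbols channel uses.

    H  : (Nr, Nt) complex channel matrix
    p  : (Nt,) per-antenna powers, non-negative and summing to one
    n0 : noise variance per receive antenna
    """
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be non-negative and sum to one")
    nr, nt = H.shape
    # Unit-energy QPSK symbols (illustrative choice of constellation).
    bits = rng.integers(0, 2, size=(2, nt, num_symbols))
    s = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)
    x = np.sqrt(p)[:, None] * s              # gives E[x x^H] = diag(p)
    # Circularly symmetric complex Gaussian noise with variance n0.
    n = np.sqrt(n0 / 2) * (rng.standard_normal((nr, num_symbols))
                           + 1j * rng.standard_normal((nr, num_symbols)))
    return x, H @ x + n

rng = np.random.default_rng(0)
# Hypothetical 4x4 channel with unit-variance complex Gaussian entries.
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
p = np.array([0.4, 0.3, 0.2, 0.1])           # example power vector
x, y = transmit(H, p, n0=0.05, num_symbols=20000, rng=rng)
cov = x @ x.conj().T / x.shape[1]            # empirical estimate of E[x x^H]
```

The diagonal of `cov` reproduces `p`, while the off-diagonal entries shrink toward zero as the number of symbols grows, which is what the constraint on $E[\mathbf{x}\mathbf{x}^H]$ requires.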
$\mathbf{H}$ is the $N_r \times N_t$ channel matrix, all of whose elements are i.i.d. zero-mean circularly symmetric complex Gaussian (ZMCSCG) random variables with variance 1, i.e., $\mathcal{CN}(0, 1)$. Therefore, the average SNR of the system is $\rho = \mathbf{1}^T\mathbf{p}/N_0 = 1/N_0$.

Fig. 1. Proposed MIMO System for SVC-Based Video Transmissions.

The MIMO channel matrix $\mathbf{H}$ can be decomposed by the singular value decomposition (SVD):

$$\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^H, \qquad (3)$$

where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices and $\boldsymbol{\Lambda}$ is a diagonal matrix specified as:

$$\boldsymbol{\Lambda} = \mathrm{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_R}, 0, \ldots, 0\right), \qquad (4)$$

where $R = \min(N_r, N_t)$ is the rank of the channel matrix $\mathbf{H}$, and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_R$ are the nonzero eigenvalues of $\mathbf{H}\mathbf{H}^H$. If correct and full channel knowledge is available at both the transmitter and the receiver, the symbols are precoded with $\mathbf{V}$ before transmission and decoded with $\mathbf{U}^H$ at the receiver. Therefore, the received signal before detection can be expressed as:

$$\tilde{\mathbf{y}} = \mathbf{U}^H\mathbf{H}\mathbf{V}\mathbf{x} + \mathbf{U}^H\mathbf{n} = \boldsymbol{\Lambda}\mathbf{x} + \tilde{\mathbf{n}}. \qquad (5)$$

Since $\mathbf{U}$ is a unitary matrix, each element of $\tilde{\mathbf{n}} = \mathbf{U}^H\mathbf{n}$ still follows the complex Gaussian distribution, i.e., $\tilde{\mathbf{n}} \sim \mathcal{CN}(\mathbf{0}, N_0\mathbf{I}_{N_r})$. By using the precoder and decoder, a MIMO channel can thus be decomposed into $R$ independent single-input single-output (SISO) channels [1].

III. PROBLEM FORMULATION

In error-prone wireless channels, the receiver bit error rate (BER) of M-QAM can be approximated as [16]:

$$P_b \approx \frac{2\left(1 - \frac{1}{\sqrt{M}}\right)}{\log_2 M}\, Q\!\left(\sqrt{\frac{3\log_2 M}{M-1}\cdot\frac{2E_b}{N_0}}\right), \qquad (6)$$

where $M$ is the number of constellation points, $Q(\cdot)$ is the Gaussian Q-function (the tail probability of the standard normal distribution, so that $Q = 1 - \Phi$ as used in Eq. (12)), and $E_b/N_0$ is the ratio of average bit energy to average noise power. Since the SNR can be calculated from $E_b/N_0$, i.e., $\mathrm{SNR} = \log_2(M) \times E_b/N_0$, the BER for the $l$th channel in our proposed MIMO-SM system can be derived as:

$$P_{bl}(p_l) \approx \frac{2\left(1 - \frac{1}{\sqrt{M_l}}\right)}{\log_2 M_l}\, Q\!\left(\sqrt{\frac{3}{M_l-1}\,\rho\lambda_l p_l}\right), \qquad (7)$$

where $p_l$ is the $l$th element of the power vector $\mathbf{p}$. Suppose there are $s_l$ bits in total for transmitting the $l$th layer of a single video frame. The FER of layer $l$ can be calculated as:

$$f_l(\mathbf{p}) = 1 - \prod_{k=1}^{l}\left(1 - P_{bk}(p_k)\right)^{s_k}. \qquad (8)$$

Our optimization problem is to maximize the system utility based on the video frame error rate (FER), subject to the power constraints:

$$\max_{\mathbf{p}} \sum_{l=1}^{L} u_l \prod_{k=1}^{l}\left(1 - P_{bk}(p_k)\right)^{s_k} \quad \text{subject to } p_k \ge 0;\ \sum_{k=1}^{L} p_k = 1. \qquad (9)$$

Here, we consider a linear mapping between utilities and FER for simplicity. In fact, the actual utility model, as a decreasing function of FER, can vary when different error concealment techniques are applied. Thus, minimizing FER is more general in real applications. Furthermore, directly solving the optimization problem in Eq. (9) is not an easy task. Therefore, we decompose Eq. (9) into $L$ sub-problems: if only the layers up to the $l$th are allowed to be transmitted, the corresponding frame correction rate of layer $l$ (the probability that layers 1 through $l$ of a frame are all received without error), denoted as $\bar{f}_l(\mathbf{p}) = 1 - f_l(\mathbf{p})$, can be optimized:

$$\min_{\mathbf{p}}\ -\log \bar{f}_l(\mathbf{p}) = -\sum_{k=1}^{l} s_k \log\left(1 - P_{bk}(p_k)\right) \quad \text{subject to } p_k \ge 0;\ \sum_{k=1}^{l} p_k = 1. \qquad (10)$$

Note that in this case $p_{l+1} = p_{l+2} = \cdots = p_L = 0$ is implied, since the layers higher than $l$ are not allowed to be transmitted. If $\mathbf{p}_l^*$ denotes the solution of the $l$th sub-problem in Eq. (10), the solution of Eq. (9) in our proposed scheme is found by:

$$\mathbf{p}^* = \arg\max_{\mathbf{p}_l^*} \sum_{k=1}^{l} u_k \bar{f}_k(\mathbf{p}_l^*). \qquad (11)$$

Because the original problem is solved by choosing the best among the solutions of the sub-problems, the solution of Eq. (11) is a near-optimal solution of the original problem. In Section V, we demonstrate the effectiveness and near-optimality of the proposed scheme by comparing it with the globally optimal points obtained by exhaustive search.

IV. ALGORITHM DESIGN

A. Log-Concavity of Objective Functions in Sub-problems

To simplify the notation, we express Eq. (7) as:

$$P_{bl}(p_l) \approx A_l\, Q\!\left(B_l\sqrt{\rho\lambda_l p_l}\right) = A_l - A_l\,\Phi\!\left(B_l\sqrt{\rho\lambda_l p_l}\right), \qquad (12)$$

where $A_l$ and $B_l$ are the corresponding constants in Eq. (7), whose values are determined by $M_l$, and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. The objective function in Eq. (10) can be expressed as:

$$-\log \bar{f}_l(\mathbf{p}) = -\sum_{k=1}^{l} s_k \log\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k}\right)\right). \qquad (13)$$

As stated in [17], a twice-differentiable function $g(x)$ is log-concave if and only if, for all $x \in \mathrm{dom}\, g$,

$$g(x)\,g''(x) \le \left(g'(x)\right)^2, \qquad (14)$$

where $g'(x)$ and $g''(x)$ are, respectively, the first and second derivatives of $g$. We define the function $g_k$ as:

$$g_k(p_k) = 1 - P_{bk}(p_k) = 1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k}\right), \qquad (15)$$

which is non-negative. Its first derivative $g_k'$ is:

$$g_k'(p_k) = \frac{A_k B_k\sqrt{\rho\lambda_k}}{2\sqrt{p_k}}\,\phi\!\left(B_k\sqrt{\rho\lambda_k p_k}\right) = \frac{A_k B_k\sqrt{\rho\lambda_k}}{2\sqrt{2\pi p_k}}\, e^{-B_k^2\rho\lambda_k p_k/2}, \qquad (16)$$

where $\phi(\cdot)$ is the probability density function (pdf) of the standard normal distribution. Its second derivative is:

$$g_k''(p_k) = -\frac{A_k B_k\sqrt{\rho\lambda_k}}{4\sqrt{2\pi}}\, e^{-B_k^2\rho\lambda_k p_k/2}\left(p_k^{-1.5} + B_k^2\rho\lambda_k\, p_k^{-0.5}\right), \qquad (17)$$

which is non-positive for $p_k > 0$. Since $g_k''(p_k)\,g_k(p_k) \le 0 \le \left(g_k'(p_k)\right)^2$, $g_k(p_k)$ is log-concave. Consequently, each term $-s_k\log g_k(p_k)$ is convex, and Eq. (13), being a non-negative sum of convex functions, is also convex [17]. Therefore, the optimization problem in Eq. (10) can be solved by classic convex optimization methods. Examples of $\log(g_k(p_k))$ are plotted in Fig. 2.

Fig. 2. Examples of $\log(g_k(p_k))$ when $\lambda_k = 1$ and $\rho = 1$.

B. Conditions of Optimal Solutions in Sub-problems

The Lagrangian of the $l$th sub-problem in Eq. (10) can be derived as:

$$L_l(\mathbf{p}, \boldsymbol{\xi}, \upsilon) = -\sum_{k=1}^{l} s_k \log\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k}\right)\right) - \sum_{k=1}^{l}\xi_k p_k + \upsilon\left(\sum_{k=1}^{l} p_k - 1\right), \qquad (18)$$

where $\boldsymbol{\xi}$ and $\upsilon$ are the Lagrange multipliers associated with the inequality constraints and the equality constraint, respectively. The Karush-Kuhn-Tucker (KKT) conditions can be written for each $k = 1, \ldots, l$ as:

1. Primal feasible: $p_k^* \ge 0$; $\sum_{k=1}^{l} p_k^* = 1$.
2. Dual feasible: $\xi_k^* \ge 0$.
3. Complementary slackness: $\xi_k^* p_k^* = 0$.
4. Gradient of Lagrangian vanishes:

$$\left.\frac{\partial L_l(\mathbf{p}, \boldsymbol{\xi}, \upsilon)}{\partial p_k}\right|_{p_k^*,\,\xi_k^*,\,\upsilon^*} = -\frac{s_k A_k\,\phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right) B_k\sqrt{\rho\lambda_k}}{2\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right)\right)\sqrt{p_k^*}} - \xi_k^* + \upsilon^* = 0. \qquad (19)$$

For convex optimization problems, any point that satisfies the KKT conditions is primal and dual optimal, with zero duality gap [17]. The above KKT conditions imply:

$$\upsilon^* \ge \frac{s_k A_k\,\phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right) B_k\sqrt{\rho\lambda_k}}{2\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right)\right)\sqrt{p_k^*}} \qquad (20)$$

and

$$\left(\upsilon^* - \frac{s_k A_k\,\phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right) B_k\sqrt{\rho\lambda_k}}{2\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right)\right)\sqrt{p_k^*}}\right) p_k^* = 0. \qquad (21)$$

There are two cases in which Eq. (21) holds:

1) $p_k^* = 0$, which implies, because the right-hand side of Eq. (20) grows without bound as $p_k^* \to 0$:

$$\upsilon^* \ge \infty. \qquad (22)$$

2) $p_k^* > 0$, which implies:

$$\upsilon^* = \frac{s_k A_k\,\phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right) B_k\sqrt{\rho\lambda_k}}{2\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right)\right)\sqrt{p_k^*}}. \qquad (23)$$

Case 1 is trivial, since when $p_k^* = 0$ the video layers higher than $k$ are not allowed to be transmitted, which is equivalent to solving the $(k-1)$th sub-problem. Therefore, we only consider Case 2 specified in Eq. (23). For different $\upsilon^*$, any solution $\mathbf{p}_l^*$ satisfying Eq. (23) and the power constraint $\sum_{k=1}^{l} p_k = 1$ is an optimal point of the $l$th sub-problem in Eq. (10).

C. Proposed Algorithm

Based on Eq. (23), we define the function $h_k$ as:

$$h_k(p_k^*) = \log\frac{s_k A_k\,\phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right) B_k\sqrt{\rho\lambda_k}}{2\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right)\right)\sqrt{p_k^*}} = \log\frac{s_k A_k B_k\sqrt{\rho\lambda_k}}{2\sqrt{2\pi}\left(1 - A_k + A_k\,\Phi\!\left(B_k\sqrt{\rho\lambda_k p_k^*}\right)\right)\sqrt{p_k^*}} - \frac{B_k^2\rho\lambda_k p_k^*}{2}. \qquad (24)$$

Since $g_k'$ is positive and decreasing while $g_k$ is increasing, $h_k$ is strictly decreasing in $p_k$, so its inverse $h_k^{-1}$ is well defined. The proposed bisection search algorithm, which finds the optimal point $\mathbf{p}_l^*$ of the $l$th sub-problem, is shown in Fig. 3.

1. upper = min( $h_k(1)$ ), for $k = 1, \ldots, l$
2. lower = max( $h_k(\Delta)$ ), for $k = 1, \ldots, l$
3. while ( $\left|\sum_{k=1}^{l} p_k^* - 1\right| > \Delta$ )
4.     $\mu$ = (upper + lower)/2;
5.     $p_k^* = h_k^{-1}(\mu)$;
6.     if ( $\sum_{k=1}^{l} p_k^* < 1$ )
7.         lower = $\mu$;
8.     else
9.         upper = $\mu$;
10.    end if
11. end while

Fig. 3. Proposed Bisection Search Algorithm for the $l$th Sub-problem.

Here, $\Delta$ is a small positive number, fixed to a constant in our implementation. Note that "upper" and "lower" refer to the full-power and low-power ends of the search interval: because $h_k$ is decreasing, "upper" holds the smaller value of $\mu$. The overall optimization problem in Eq. (9) is solved by the proposed algorithm shown in Fig. 4.

1. $U_{\max} = 0$;
2. for $l = 1:L$
3.     Obtain $\mathbf{p}_l^*$ by solving the $l$th sub-problem
4.     $U_l = \sum_{k=1}^{l} \bar{f}_k(\mathbf{p}_l^*)\, u_k$;
5.     if ( $U_l > U_{\max}$ )
6.         $U_{\max} = U_l$;
7.         $\mathbf{p}^* = \mathbf{p}_l^*$;
8.     end if
9. end for

Fig. 4. Proposed Algorithm for the Optimization Problem in Eq. (9).

V. SIMULATION RESULTS

In this section, the effectiveness and near-optimality of our proposed algorithm are evaluated through extensive simulations. The video clips Foreman and City, with a resolution of 352×288, are encoded by the JSVM (Joint Scalable Video Model) version 9.19 [18]. The frame rates are both set to 30 fps. The GOP size is 8 with the frame pattern IBBBBBBB. There are 161 frames encoded in total, so that 20 GOPs with one additional I frame are included. Three additional quality enhancement layers are encoded with medium-grain scalability (MGS). The base quantization parameters of the four layers (i.e., one base layer and three enhancement layers) are set as QP = [45, 38, 35, 30], and the corresponding uniform quantization stepsizes can be calculated by $q = 2^{(QP-4)/6}$ [15]. Based on Eq. (1), the utilities of the four layers of Foreman are u = [0.5719, 0.2614, 0.0772, 0.0895] with an empirically chosen c = 0.12 [15]. The utilities of the four layers of City are u = [0.5459, 0.2749, 0.0826, 0.0966] with an empirically chosen c = 0.13 [15].

The encoded network abstraction layer unit (NALU) is fragmented by the link layer with a packet size of 48 bytes [8] and then transmitted through the PHY layer. A 4×4 MIMO-SM system is applied with 100 kHz bandwidth. The CSI is fed back every channel coherence time, which is assumed to be 1 ms. At the receiver side, we assume that the packets containing control messages, such as video coding parameters, are correctly received. Also, a perfect error detection scheme is assumed, so that bit errors at the receiver side can be detected. The undecodable NALUs, including those with erroneous bits caused by channel degradation and those with unsatisfied dependencies, are discarded before passing through the SVC decoder. We compare our proposed scheme with the traditional WF algorithm, the M-WF algorithm in [10] and the simple equal power allocation scheme in the simulations.

Figure 5 illustrates a snapshot of the system utilities calculated by the objective function in Eq. (9). The optimal curve, obtained by exhaustive search, is included for comparison. QPSK modulation is adopted for all the video layers, and the average SNR is set to 13 dB. Thirty simulation results with different channel matrices are included. The results of our proposed algorithm are very close to the optimal solutions. Even though the WF algorithm is optimal in terms of PHY layer capacity, it is no longer optimal in terms of APP layer utility. M-WF is better than WF because a UEP scheme on the base layer and enhancement layers is applied, but it is still far from optimal.

Fig. 5. Snapshot of System Utilities (QPSK, SNR: 13dB)

The NALUs successfully received by the four schemes, with the same seeds for random number generation of the wireless channel environments, are fed into the SVC decoder to reconstruct the videos. Figure 6 shows the PSNRs of the reconstructed Foreman clip when the average SNR is 18 dB and QPSK modulation is used for all the video layers. Our proposed method outperforms the other three, even though our optimization objective function is not PSNR. This is because, by applying our proposed scheme with reasonable utility functions, more video frames with higher quality layers are received. Since NALU sizes are included in our objective function, unequal error protection (UEP) of the lower layers of large NALUs, such as I-frames, is inherent in our scheme. Moreover, the better reception of I-frames also leads to higher PSNR for the subsequent B-frames in the same GOP. Note that the M-WF algorithm is not necessarily a good choice when transmitting NALUs of small size (i.e., B-frames), because over-protection of the base layer may waste power.

Fig. 6. PSNR of Reconstructed Video (Foreman, QPSK, SNR: 18dB)

Similar results for City are plotted in Fig. 7. Here, the system average SNR is set to 20 dB. The modulation schemes are QPSK, 16-QAM, 16-QAM and 64-QAM for video layers 1, 2, 3 and 4, respectively. Our proposed scheme again has higher PSNR than the other three schemes. Since the BERs of the different modulation schemes are part of our objective function, the advantage of our proposed scheme over the other methods is much more pronounced.

Fig. 7. PSNR of Reconstructed Video (City, l1: QPSK, l2: 16-QAM, l3: 16-QAM, l4: 64-QAM, SNR: 20dB)

VI. CONCLUSION

In this paper, we have proposed a near-optimal QoE-driven power allocation scheme for SVC-based video transmissions over a MIMO-SM system. Detailed algorithms are described with theoretical reasoning. […] terms of utility. Moreover, by applying our proposed scheme with real-world SVC video traces, users can receive more error-free video frames with higher quality layers, which leads to better PSNR or QoE for the reconstructed videos.
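The pieces of the scheme (the utilities of Eq. (1), the BER constants of Eq. (12), the bisection search of Fig. 3 and the outer loop of Fig. 4) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used for the reported results: all function names are hypothetical, $A_l$ and $B_l$ are read off Eq. (7) as $2(1 - 1/\sqrt{M_l})/\log_2 M_l$ and $\sqrt{3/(M_l - 1)}$, the inverse $h_k^{-1}$ is computed by an inner bisection clamped to $[10^{-12}, 1]$ (the paper does not say how it is evaluated), and the packet sizes, tolerance and channel gains $\rho\lambda_k$ in the example are made-up values.

```python
import math

def layer_utilities(qp, c):
    """Eq. (1), with stepsizes q = 2^((QP - 4) / 6) as in Section V."""
    q = [2.0 ** ((v - 4) / 6) for v in qp]
    q_min = min(q)
    cum = [math.exp(c * (1 - ql / q_min)) for ql in q]
    return [cum[0]] + [cum[i] - cum[i - 1] for i in range(1, len(cum))]

def qam_constants(m):
    """(A_l, B_l) of Eq. (12), read off Eq. (7) for M-QAM."""
    return 2 * (1 - 1 / math.sqrt(m)) / math.log2(m), math.sqrt(3 / (m - 1))

def g(p, a, b, gain):
    """Per-bit success probability g_k(p_k) of Eq. (15); gain = rho * lambda_k."""
    cdf = 0.5 * (1 + math.erf(b * math.sqrt(gain * p) / math.sqrt(2)))
    return 1 - a + a * cdf

def h(p, s, a, b, gain):
    """Eq. (24), valid for p > 0."""
    return (math.log(s * a * b * math.sqrt(gain))
            - math.log(2 * math.sqrt(2 * math.pi) * g(p, a, b, gain))
            - 0.5 * math.log(p) - b * b * gain * p / 2)

def h_inverse(mu, s, a, b, gain, p_min=1e-12):
    """Invert the decreasing function h on [p_min, 1] by bisection.
    Assumed inner solver; values outside the range are clamped."""
    lo, hi = p_min, 1.0
    if h(hi, s, a, b, gain) >= mu:
        return hi
    if h(lo, s, a, b, gain) <= mu:
        return lo
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if h(mid, s, a, b, gain) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_subproblem(s, a, b, gain, delta=1e-6, max_iter=200):
    """Bisection of Fig. 3 over mu = log(upsilon*) for layers 1..len(s)."""
    ks = range(len(s))
    upper = min(h(1.0, s[k], a[k], b[k], gain[k]) for k in ks)    # full-power end
    lower = max(h(delta, s[k], a[k], b[k], gain[k]) for k in ks)  # low-power end
    p = []
    for _ in range(max_iter):
        mu = 0.5 * (upper + lower)
        p = [h_inverse(mu, s[k], a[k], b[k], gain[k]) for k in ks]
        total = sum(p)
        if abs(total - 1) <= delta:
            break
        if total < 1:
            lower = mu   # too little power allocated: move mu down
        else:
            upper = mu   # too much power allocated: move mu up
    return p

def neg_log_success(p, s, a, b, gain):
    """Objective of Eq. (13) for the layers covered by p."""
    return -sum(s[k] * math.log(g(p[k], a[k], b[k], gain[k]))
                for k in range(len(p)))

def allocate(u, s, a, b, gain):
    """Outer loop of Fig. 4: keep the sub-problem solution with the largest utility."""
    best_u, best_p = 0.0, None
    for l in range(1, len(u) + 1):
        p = solve_subproblem(s[:l], a[:l], b[:l], gain[:l])
        util = sum(u[k] * math.exp(-neg_log_success(p[:k + 1], s, a, b, gain))
                   for k in range(l))
        if util > best_u:
            best_u, best_p = util, p + [0.0] * (len(u) - l)
    return best_p, best_u

# Hypothetical example: Foreman utilities, QPSK on all layers, 384-bit packets.
u = layer_utilities([45, 38, 35, 30], c=0.12)
a, b = zip(*[qam_constants(4)] * 4)
s = [384] * 4
gain = [200.0, 80.0, 30.0, 5.0]          # assumed values of rho * lambda_k
p_star, u_star = allocate(u, s, a, b, gain)
```

With QP = [45, 38, 35, 30], `layer_utilities` reproduces the utility vectors quoted in Section V for Foreman (c = 0.12) and City (c = 0.13) to four decimal places. At a converged interior solution of a sub-problem, all $h_k(p_k^*)$ take the same value $\mu$, which is the condition of Eq. (23) written in the log domain.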