Smooth Streaming over Wireless Networks
Sreya Chakraborty
Interim Report
EE-5359
Abstract: Smooth streaming is a difficult problem because network bandwidth is a limited
resource. This report examines the implications of video traffic smoothing on the number of
statistically multiplexed H.264 SVC, H.264/AVC, and MPEG-4 Part 2 streams, on the bandwidth
requirements for streaming, and on the introduced delay. SVC enables the transmission and
decoding of partial bit streams to provide video services with lower temporal or spatial
resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the
rate of the partial bit streams. In addition, two algorithms are proposed that compress multimedia
streams to a considerable degree.
Introduction: Smooth streaming is a challenge wherever bandwidth is low or limited. For
streaming video and audio data, UDP is in most cases preferred over TCP, since TCP introduces
various delays: it waits for acknowledgements and retransmits unacknowledged data, which
delays the arrival of frames. For streaming, a limited loss of data is acceptable, but the resulting
delay is not.
Modern video transmission and storage systems are based on RTP/IP for real-time services. Most
RTP/IP access networks are characterized by a wide range of connection qualities and receiving
devices. The varying connection quality is due to the adaptive resource sharing mechanisms of
these networks. Traditional digital video transmission and storage systems are based on H.222.0 [7]
for broadcasting services over satellite, cable, and terrestrial transmission channels and for
DVD storage, and on H.320 for conversational video conferencing services. The international
video coding standards H.262, H.263 and MPEG-4 already include several tools by which the most
important scalability modes can be supported. However, these tools have seen little use, partly
because of the characteristics of traditional video transmission systems and partly because the
quality scalability features came with a significant loss in coding efficiency as well as a large
increase in decoder complexity. Simulcast, in which several independently coded versions of the
same video are transmitted, provides functionality similar to that of a scalable bit stream.
The scalable video coding (SVC) extension of H.264/AVC, with its hierarchical B frames, also
compresses single-layer video. H.264/AVC and H.264 SVC video encoding are expected to be widely
adopted for wired and wireless network video transport due to their increased compression
efficiency compared to MPEG-4 Part 2 and their widespread inclusion in application standards. The
compression efficiency of a video codec is generally characterized by a rate-distortion (RD) curve [2]
that shows the bit rate of the compressed video stream as a function of the video quality
(distortion), which is typically measured in terms of the Peak Signal to Noise Ratio (PSNR). For
a given video quality, the lower the compressed bit rate, the more efficient the compression.
The improvements in RD compression efficiency with H.264 SVC and
H.264/AVC come at the expense of significantly increased variability of the encoded frame
sizes (in bits).
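As a brief illustration of the quality measure used on the RD curve, the following minimal sketch computes the PSNR between an original and a decoded frame. It assumes 8-bit samples (peak value 255) passed as flat sequences; the function name `psnr` and this interface are illustrative choices, not part of any encoder discussed here.

```python
import math

def psnr(original, decoded, peak=255):
    """Return the PSNR in dB between two equally long sample sequences.

    Assumes 8-bit samples by default (peak = 255).
    """
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between original and decoded samples
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10 * math.log10(peak ** 2 / mse)
```

A higher PSNR means lower distortion, so for two encodings with equal PSNR the one with the lower bit rate is the more efficient.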
The recently developed H.264/AVC video codec with its Scalable Video Coding (SVC) extension
compresses non-scalable (single-layer) and scalable video significantly more efficiently than
MPEG–4 Part 2. Since the traffic characteristics of encoded video have a significant impact on
its network transport, we examine the bit rate-distortion and bit rate variability-distortion
performance of single-layer video traffic of the H.264/AVC codec and SVC extension using long
CIF resolution videos. We compare the traffic characteristics of the hierarchical B frames (SVC)
with those of classical B frames. In addition, we examine the impact of frame size smoothing on
the video traffic to mitigate the effect of bit rate variability. Compared to MPEG–4 Part 2, the
H.264/AVC codec and SVC extension achieve lower average bit rates at the expense of
significantly increased traffic variability that remains at a high level even with smoothing.
Through simulations we investigate the implications of this increase in rate variability on (i)
frame losses when transmitting a single video, and (ii) the number of supported video streams
in a bufferless statistical multiplexing scenario with restricted link capacity and information loss.
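Frame size smoothing can be sketched as follows, under the assumption that smoothing replaces each block of `a` consecutive frames by the block's average frame size (the block length `a` and both function names are illustrative assumptions, not taken from the cited studies). The sketch also shows how the coefficient of variation drops after smoothing.

```python
import statistics

def smooth(frame_sizes, a):
    """Average frame sizes over non-overlapping blocks of `a` frames.

    Assumed smoothing rule: every frame in a block is sent at the
    block's mean size, so the total number of bits is unchanged.
    """
    if a < 1:
        raise ValueError("block length must be at least 1")
    smoothed = []
    for start in range(0, len(frame_sizes), a):
        block = frame_sizes[start:start + a]
        average = sum(block) / len(block)
        smoothed.extend([average] * len(block))
    return smoothed

def coefficient_of_variation(frame_sizes):
    """Standard deviation of the frame sizes divided by their mean."""
    return statistics.pstdev(frame_sizes) / statistics.fmean(frame_sizes)

sizes = [8000, 1000, 2000, 1000]   # toy trace in bits, not measured data
print(coefficient_of_variation(sizes))
print(coefficient_of_variation(smooth(sizes, 2)))
```

Larger blocks reduce variability further but add delay, since a block can only be sent once all of its frames are available.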
In general, video can be encoded (i) with fixed quantization scales, which results in nearly
constant video quality at the expense of variable video traffic (bit rate), or (ii) with rate control,
which adapts the quantization scales to keep the video bit rate nearly constant at the expense of
variable video quality. In order to examine the fundamental traffic characteristics of the
H.264/AVC video coding standard, which does not specify a normative rate control mechanism,
we focus primarily on encodings with fixed quantization scales. An additional motivation for
the focus on variable bit rate video encoded with fixed quantization scales is that the variable bit
rate streams allow for statistical multiplexing gains that have the potential to improve the
efficiency of video transport over communication networks. The development of video network
transport mechanisms that meet the strict playout deadlines of the video frames and efficiently
accommodate the variability of the video traffic is a challenging problem. A wide array of video
transport mechanisms has been developed and evaluated, based primarily on the characteristics
of MPEG–2 and MPEG–4 Part 2 encoded video. The widespread adoption of the new
H.264/AVC video standard necessitates the careful study of the traffic characteristics of video
coded with the new H.264/AVC codec and its extensions. Therefore, it is necessary to examine
the new video encoder’s statistical characteristics and compression performance from a
communication network perspective. We study the Main profile of the H.264/AVC encoder
using long Common Intermediate Format (CIF) 352x288 pixel resolution sequences. Our study
of the newest H.264 SVC extension analyzes single-layer (non-scalable) video traffic
characteristics of long CIF videos, i.e., although the H.264 SVC single-layer encoding supports
temporal scalability, we group the individual temporal layers and consider the aggregate stream.
H.264/AVC and H.264 SVC single-layer video traffic is significantly more variable than
MPEG–4 Part 2 traffic under similar encoding conditions. At the same time, we confirm the
significant average bit rate savings. The increased bit rate variability is observed over a wide
range of average qualities of the encoded streams and for all tested video sequences. This makes
the transport of H.264/AVC and H.264 SVC single-layer traffic more challenging than MPEG–4
Part 2 traffic.
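The effect of traffic variability on bufferless statistical multiplexing can be illustrated with a small sketch. It assumes that all streams are shifted copies of one frame size trace, that the link carries a fixed number of bits per frame period, and that any bits exceeding this capacity in a frame period are lost. The helper names and the toy trace are hypothetical.

```python
def rotate(trace, offset):
    """Return the trace cyclically shifted by `offset` frames."""
    offset %= len(trace)
    return trace[offset:] + trace[:offset]

def loss_ratio(streams, capacity):
    """Fraction of offered bits lost on a bufferless link.

    `capacity` is the number of bits the link carries per frame period.
    """
    offered = lost = 0
    for sizes in zip(*streams):          # one tuple per frame period
        aggregate = sum(sizes)
        offered += aggregate
        lost += max(0, aggregate - capacity)
    return lost / offered if offered else 0.0

def max_streams(trace, capacity, max_loss, offsets):
    """Largest number of shifted copies whose loss ratio stays within max_loss."""
    supported = 0
    for count in range(1, len(offsets) + 1):
        streams = [rotate(trace, off) for off in offsets[:count]]
        if loss_ratio(streams, capacity) > max_loss:
            break
        supported = count
    return supported

trace = [8000, 1000, 2000, 1000]         # toy trace in bits, not measured data
print(max_streams(trace, capacity=9000, max_loss=0.0, offsets=[0, 1, 2]))
```

With a more variable trace, large frames of different streams coincide more often, so fewer streams fit on the same link for a given loss bound.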
SVC’s temporal scalability is built on the hierarchical prediction concept for B frames.
Temporal Scalability with Hierarchical B Frames: The introduction of hierarchical B frames has
allowed the H.264 SVC encoder to achieve temporal scalability while at the same time
improving RD efficiency compared to the classical B frame prediction method employed by the
older MPEG standards (MPEG–1/2/4-Part 2) and by default in H.264/AVC. In Fig. 1, we
illustrate both concepts for predicting B frames.
Hierarchical B frames are an important new concept that was first introduced in H.264/AVC
using generalized B frames and was later found to be the best method to build the Scalable Video
Coding (SVC) extension on. Hence, the H.264 SVC encoded single-layer stream is decodable by
existing H.264/AVC codecs. The scalability modes do require new SVC capability, with the
supported modes depending on the applications or equivalently on the H.264 SVC profiles.
Fig. 1(a) depicts the classical B frame prediction structure, where each B frame is predicted
only from the preceding I or P frame and from the subsequent I or P frame. Other B frames are
not referenced since this is not allowed by video standards preceding H.264/AVC. This
restriction is lifted in the generalized B frame paradigm that was first introduced in the
H.264/AVC standard. Fig. 1(b) depicts the hierarchical B frame structure which uses B frames
for the prediction of B frames. The illustrated case is the dyadic hierarchy of B frames, meaning
that the number of B frames $n$ between the key pictures (I or P frames) equals $n = 2^k - 1$. The
hierarchy with 3 B frames ($k = 2$, I frame period of 16) is depicted in Fig. 1(b). In this example, the
frame sequence is $I_0 B_2 B_1 B_2 P_0 B_2 B_1 B_2 P_0 B_2 B_1 B_2 P_0 B_2 B_1 B_2$, where the index
represents the temporal layer number.
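The temporal layer indices of a dyadic hierarchy can be generated programmatically. The following sketch assumes that the key picture opens each group of $2^k$ frames, as in the sequence above, and that a B frame at position p within the group belongs to layer k minus the number of trailing zero bits of p. The function names are illustrative.

```python
def temporal_layers(k):
    """Temporal layer of each frame in a dyadic group of 2**k frames.

    Position 0 is the key picture (layer 0); the remaining 2**k - 1
    positions are hierarchical B frames.
    """
    layers = [0]
    for position in range(1, 2 ** k):
        # Number of trailing zero bits of the position
        trailing_zeros = (position & -position).bit_length() - 1
        layers.append(k - trailing_zeros)
    return layers

def frame_sequence(k, intra_period=16):
    """Frame types with layer indices over one I frame period."""
    layers = temporal_layers(k)
    frames = []
    for start in range(0, intra_period, len(layers)):
        for offset, layer in enumerate(layers):
            if offset == 0:
                frame_type = "I" if start == 0 else "P"
            else:
                frame_type = "B"
            frames.append(f"{frame_type}{layer}")
    return "".join(frames)

print(frame_sequence(2))  # I0B2B1B2P0B2B1B2P0B2B1B2P0B2B1B2
```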
The coding efficiency of hierarchical B frames depends on the number of hierarchical B frames
(temporal levels) and on the choice of quantization parameters for each B frame. Therefore,
H.264 SVC introduces cascading quantizers which assign a higher quantization parameter value
(lower quality) to B frames belonging to higher temporal layers. This concept is based on the
insight that the lowest temporal layer 0 requires higher quality than the next temporal layer, since
all other predictions depend on it. The quality of each subsequent temporal layer can be
gradually reduced since fewer layers depend on it. Apparently the quality fluctuation that is
introduced within a GoP is not subjectively noticeable by human observers, as studied by the
standard committee.
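The idea of cascading quantizers can be sketched as a quantization parameter (QP) that grows with the temporal layer. The increment of one QP step per layer is purely an illustrative assumption, not a value taken from the standard or from the reference encoder; only the H.264 QP upper bound of 51 is standard.

```python
def cascaded_qp(base_qp, num_layers, step=1):
    """QP for each temporal layer, starting at layer 0.

    `step` is a hypothetical per-layer increment; results are clipped
    to the maximum H.264 QP value of 51.
    """
    return [min(51, base_qp + step * layer) for layer in range(num_layers)]

print(cascaded_qp(28, 3))  # [28, 29, 30]
```

Layer 0 keeps the lowest QP (highest quality) because all other frames are predicted from it, directly or indirectly.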
For a video sequence consisting of $M$ frames encoded with a given quantization scale, we let $X_m$
($m = 1, \ldots, M$) denote the sizes [bits] of the encoded video frames. The mean frame size
$\bar{X}$ [bits] of the encoded video sequence is defined as
$$\bar{X} = \frac{1}{M} \sum_{m=1}^{M} X_m ,$$
while the variance $S_x^2$ of the frame sizes ($S_x$ is the standard deviation [bits]) is defined as
$$S_x^2 = \frac{1}{M} \sum_{m=1}^{M} \left( X_m - \bar{X} \right)^2 .$$
The coefficient of variation of frame sizes [unit free] is defined as
$$CoV_x = \frac{S_x}{\bar{X}} .$$
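These three statistics translate directly into code. The sketch below follows the definitions above, with the variance normalized by M; the function name and the toy trace are illustrative.

```python
def frame_size_statistics(sizes):
    """Return (mean, variance, coefficient of variation) of frame sizes in bits."""
    M = len(sizes)
    if M == 0:
        raise ValueError("at least one frame is required")
    mean = sum(sizes) / M
    variance = sum((x - mean) ** 2 for x in sizes) / M   # normalized by M
    cov = variance ** 0.5 / mean                         # unit free
    return mean, variance, cov

mean, variance, cov = frame_size_statistics([8000, 1000, 2000, 1000])
print(mean, variance, round(cov, 3))
```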
Fig.1 B frame prediction structures [8]
GoP Structure Comparison
Selected RD graphs for the Silence of the Lambs sequence encoded with H.264/AVC, H.264
SVC, and MPEG–4 Part 2 are depicted in Fig. 2(a), (c), and (e). Each figure depicts the RD
curves for all GoP structures for a particular encoder. We observe that the H.264/AVC encoder
achieves the best RD performance for GoP structure G16-B3 with almost coinciding RD curves.
For the MPEG–4 Part 2 encoder the RD efficiency decreases significantly with increasing
number of B frames in the GoP structures. Contrary to these two encoders, the H.264 SVC
encoder achieves best RD performance for the G16-B15 GoP structure and lowest for G16-B1.
From RD comparison plots between all three encoders, not included due to space constraints, we
find that for GoP structure G16-B1, H.264/AVC and H.264 SVC have comparable RD
performance. However, H.264 SVC increasingly outperforms H.264/AVC for GoP structures
G16-B3 to G16-B15.
We observe that the better the RD performance of a particular GoP structure, the higher the
corresponding traffic variability.
In the subsequent experiments, we employ four different GoP structures, namely
IBPBPBPBPBPBPBPB (16 frames, with 1 B frame per I/P frame), which we denote by G16-B1,
IBBBPBBBPBBBPBBB (16 frames, with 3 B frames per I/P frame) denoted by G16-B3,
IBBBBBBBPBBBBBBB (16 frames, with 7 B frames per I/P frame) denoted by G16-B7, and
IBBBBBBBBBBBBBBB (16 frames, with 15 B frames per I frame) denoted by G16-B15. In the
context of SVC, these four GoP structures are respectively designated by their “GoP size” which
is the number of hierarchical B frames plus one key picture, either of type I or P. Hence, G16-B1
has GoP size 2, G16-B3 has GoP size 4, G16-B7 has GoP size 8, and G16-B15 has GoP size 16.
In the following, we employ our own GoP structure notation to emphasize the repetitive I-P-B
frame type patterns in the encodings and to avoid confusion. These four GoP structures are
natural structures for hierarchical B frames and allow us to compare the three encoders based on
identical underlying GoP patterns.
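The four GoP patterns and their SVC "GoP size" follow mechanically from the number of B frames per I/P frame, as this small sketch shows (the function names are illustrative):

```python
def gop_pattern(b_frames, length=16):
    """Frame type pattern with `b_frames` B frames after each I/P frame."""
    pattern = []
    for position in range(length):
        if position == 0:
            pattern.append("I")
        elif position % (b_frames + 1) == 0:
            pattern.append("P")
        else:
            pattern.append("B")
    return "".join(pattern)

def svc_gop_size(b_frames):
    """SVC GoP size: the hierarchical B frames plus one key picture."""
    return b_frames + 1

for b in (1, 3, 7, 15):
    print(f"G16-B{b}", gop_pattern(b), svc_gop_size(b))
```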
We employ the H.264/AVC encoder in the Main profile with all compression tools enabled, as
specified in Section III-B, i.e., using variable block sizes, three reference frames for the past and
the future, referenced B frames, P and B frame weighted prediction, CABAC, and rate-distortion
optimization (RDO). We designate these settings by “Full-RDO”. The H.264 SVC settings are
similar.
We use the MPEG–4 Part 2 encoder in the Advanced Simple profile (ASP) to encode the
sequences, for comparison with the H.264/AVC encodings. This ASP profile adds B frames to
the Simple profile. We employ half pixel motion compensated prediction; RDO is not supported
by the reference encoder implementation. The MPEG–4 Part 2 encoder uses one reference frame
for the past and one for the future, and 16 × 16 blocks for motion estimation that can be split into
8 × 8 blocks.
Two algorithms are proposed that compress both audio and video streams. Because they have
linear time complexity, they add little processing overhead while reducing the bandwidth that
the streams consume. Hence, even under strongly fluctuating network traffic, a smoother
audio-video conferencing environment can be achieved. Both algorithms face the same trade-off:
either the original data is recovered more accurately after decompression at a relatively lower
compression ratio, or a higher compression ratio is achieved at the cost of losing more data.
ALGO-1 [6] is depicted in Fig. 2. It takes two bytes from the multimedia byte stream, which are
denoted as Uncompressed Byte-1 and Uncompressed Byte-2 in the figure. The four most
significant bits of each uncompressed byte are then placed in a single compressed byte, denoted
in the figure as Compressed Byte. First, the four most significant bits (a7,a6,a5,a4) of
Uncompressed Byte-1 are placed into the four most significant bit positions (c7,c6,c5,c4) of
Compressed Byte. Then the four most significant bits (b7,b6,b5,b4) of Uncompressed Byte-2 are
placed into the four least significant bit positions (c3,c2,c1,c0) of Compressed Byte. This
concludes the compression phase of ALGO-1 [5]. The compressed byte is then sent over the
network to the receiver, where it is decompressed into two bytes depicted in the figure as
Decompressed Byte-1 and Decompressed Byte-2. Bits (c7,c6,c5,c4) are placed into bit positions
(d7,d6,d5,d4) of Decompressed Byte-1, and bits (c3,c2,c1,c0) are placed into bit positions
(e7,e6,e5,e4) of Decompressed Byte-2. The question then arises of what bit values should be
placed in the four least significant bit positions of the decompressed bytes. In Fig. 2, the four
least significant bits of both decompressed bytes are padded with zeros. The compression
procedure for ALGO-1 is:
ALGO-1_COMPRESS(U)
    C[length[U]/2]                          // output holds half as many bytes as U
    j ← 0
    for i ← 0 to length[U]-1
        a ← U[i]
        b ← U[i+1]
        c ← 0
        (c7, c6, c5, c4) ← (a7, a6, a5, a4)
        (c3, c2, c1, c0) ← (b7, b6, b5, b4)
        C[j] ← c
        j ← j+1
        i ← i+1                             // skip the second byte of the pair
    return C
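The corresponding decompression procedure for ALGO-1 is:
ALGO-1_DECOMPRESS(C)
    D[length[C]*2]                          // output holds twice as many bytes as C
    j ← 0
    for i ← 0 to length[C]-1
        c ← C[i]
        d ← 0
        e ← 0
        (d7, d6, d5, d4) ← (c7, c6, c5, c4)
        (e7, e6, e5, e4) ← (c3, c2, c1, c0)
        if d > 0                            // adjust the zero-padded value by a fixed offset
            d ← d+10
        else
            d ← d-10
        if e > 0
            e ← e+10
        else
            e ← e-10
        D[j] ← d
        D[j+1] ← e
        j ← j+2                             // two bytes are written per compressed byte
    return D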
Fig. 2 Schematic diagram of ALGO-1 (Compression and Decompression) [6]
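A minimal runnable sketch of ALGO-1 is given below. It follows the zero-padding variant described in the text and shown in Fig. 2. The `pad` parameter, the function names, and the rejection of odd-length input are assumptions made for this illustration and are not taken from [6].

```python
def algo1_compress(data: bytes) -> bytes:
    """Pack the high nibbles of each pair of bytes into one byte (2:1)."""
    if len(data) % 2 != 0:
        raise ValueError("input length must be even")  # assumed policy
    out = bytearray()
    for i in range(0, len(data), 2):
        a, b = data[i], data[i + 1]
        # (c7..c4) <- (a7..a4), (c3..c0) <- (b7..b4)
        out.append((a & 0xF0) | (b >> 4))
    return bytes(out)

def algo1_decompress(compressed: bytes, pad: int = 0) -> bytes:
    """Expand each compressed byte into two bytes.

    The lost low nibbles are filled with `pad` (0 reproduces the
    zero padding described in the text).
    """
    if not 0 <= pad <= 0x0F:
        raise ValueError("pad must fit in four bits")
    out = bytearray()
    for c in compressed:
        out.append((c & 0xF0) | pad)          # (d7..d4) <- (c7..c4)
        out.append(((c & 0x0F) << 4) | pad)   # (e7..e4) <- (c3..c0)
    return bytes(out)

packed = algo1_compress(bytes([0xAB, 0xCD]))
print(packed.hex(), algo1_decompress(packed).hex())  # ac a0c0
```

With zero padding, each reconstructed byte differs from the original by at most 15, which is the price of the 2:1 compression ratio.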
ALGO-2 [6]: This algorithm compresses four consecutive bytes of a multimedia stream into three
compressed bytes. The compression and decompression processes are depicted in Fig. 3
and Fig. 4, respectively. As shown in Fig. 3, the four most significant bits (a7,a6,a5,a4) of
Uncompressed Byte-1 are placed in the four most significant bit positions (e7,e6,e5,e4) of
Compressed Byte-1. Next, the four most significant bits (b7,b6,b5,b4) of Uncompressed Byte-2
are placed in (e3,e2,e1,e0) of Compressed Byte-1. The same step is repeated for Uncompressed
Byte-3 and Uncompressed Byte-4: bits (c7,c6,c5,c4) of Uncompressed Byte-3 are placed in
(g7,g6,g5,g4) and bits (d7,d6,d5,d4) of Uncompressed Byte-4 are placed in (g3,g2,g1,g0) of
Compressed Byte-3. Compressed Byte-2 (f7,...,f0) carries two further bits of each uncompressed
byte; judging from the decompression mapping (v)-(viii) below, these are bits 3 and 2 of
Uncompressed Byte-1 to Uncompressed Byte-4, in that order.
On receiving the three compressed bytes, we map the bits in the following way as illustrated in
Fig. 4:
(i) (e7,e6,e5,e4) of Compressed Byte-1 to (h7,h6,h5,h4) of Decompressed Byte-1, respectively.
(ii) (e3,e2,e1,e0) of Compressed Byte-1 to (i7,i6,i5,i4) of Decompressed Byte-2, respectively.
(iii) (g7,g6,g5,g4) of Compressed Byte-3 to (j7,j6,j5,j4) of Decompressed Byte-3, respectively.
(iv) (g3,g2,g1,g0) of Compressed Byte-3 to (k7,k6,k5,k4) of Decompressed Byte-4, respectively.
(v) (f7,f6) of Compressed Byte-2 to (h3,h2) of Decompressed Byte-1, respectively.
(vi) (f5,f4) of Compressed Byte-2 to (i3,i2) of Decompressed Byte-2, respectively.
(vii) (f3,f2) of Compressed Byte-2 to (j3,j2) of Decompressed Byte-3, respectively.
(viii) (f1,f0) of Compressed Byte-2 to (k3,k2) of Decompressed Byte-4, respectively.
Fig. 3 Compression process of ALGO-2 [6]
Fig. 4 Decompression process of ALGO-2 [6]
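ALGO-2 can be sketched in the same way. The layout of Compressed Byte-2 is inferred here from the decompression mapping (v)-(viii), i.e. it is assumed to hold bits 3 and 2 of each of the four source bytes. The two least significant bits of every byte are assumed to be restored as zeros, and the input length is assumed to be a multiple of four. The function names are illustrative.

```python
def algo2_compress(data: bytes) -> bytes:
    """Compress every four bytes into three (4:3)."""
    if len(data) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")  # assumed policy
    out = bytearray()
    for n in range(0, len(data), 4):
        a, b, c, d = data[n:n + 4]
        byte1 = (a & 0xF0) | (b >> 4)             # e7..e4 <- a7..a4, e3..e0 <- b7..b4
        byte2 = (((a >> 2) & 3) << 6 | ((b >> 2) & 3) << 4
                 | ((c >> 2) & 3) << 2 | ((d >> 2) & 3))  # bits 3,2 of a, b, c, d
        byte3 = (c & 0xF0) | (d >> 4)             # g7..g4 <- c7..c4, g3..g0 <- d7..d4
        out += bytes([byte1, byte2, byte3])
    return bytes(out)

def algo2_decompress(compressed: bytes) -> bytes:
    """Rebuild four bytes from every three, following mappings (i)-(viii)."""
    if len(compressed) % 3 != 0:
        raise ValueError("input length must be a multiple of 3")
    out = bytearray()
    for n in range(0, len(compressed), 3):
        e, f, g = compressed[n:n + 3]
        out.append((e & 0xF0) | ((f >> 6) & 3) << 2)         # Decompressed Byte-1
        out.append(((e & 0x0F) << 4) | ((f >> 4) & 3) << 2)  # Decompressed Byte-2
        out.append((g & 0xF0) | ((f >> 2) & 3) << 2)         # Decompressed Byte-3
        out.append(((g & 0x0F) << 4) | (f & 3) << 2)         # Decompressed Byte-4
    return bytes(out)

packed = algo2_compress(bytes([0xAB, 0xCD, 0x12, 0x37]))
print(packed.hex(), algo2_decompress(packed).hex())  # acb113 a8cc1034
```

Compared with ALGO-1, six instead of four bits of every byte survive, so the reconstruction error is at most 3 per byte at the lower compression ratio of 4:3.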
References:
[1] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of
the H.264/AVC standard,” IEEE Trans. Circuits and Systems for Video Technology, vol. 17,
no. 9, pp. 1103-1120, Sep. 2007.
[2] G. Van der Auwera and M. Reisslein, “Implications of smooth streaming on statistical
multiplexing of H.264/AVC and SVC video streams,” IEEE Trans. Broadcasting, vol. 55, no. 3,
pp. 541-558, Sep. 2009.
[3] M. Wien, H. Schwarz, and T. Oelbaum, “Performance analysis of SVC,” IEEE Trans. Circuits
and Systems for Video Technology, vol. 17, no. 9, pp. 1194-1203, Sep. 2007.
[4] G. Van der Auwera, P. T. David, and M. Reisslein, “Traffic characteristics of H.264/AVC
variable bit rate video,” IEEE Communications Magazine, vol. 46, no. 11, pp. 164-174, Nov. 2008.
[5] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, “Introduction to Algorithms,”
1st ed., MIT Press and McGraw-Hill, Cambridge, MA, USA, 1990.
[6] T. R. Rahman and M. Rahman, “Compression algorithms for audio-video streaming,” IEEE
Conference Intelligent systems, modeling and simulation, pp. 187-192, 2010.
[7] ITU-T and ISO/IEC JTC 1, “Generic coding of moving pictures and associated audio
information-Part 1: Systems,” ITU-T Recommendation H.222.0 and ISO/IEC 13818-1 (MPEG-2
Systems), Nov. 1994.
[8] G. Van der Auwera, P. T. David, and M. Reisslein, “Traffic and quality characterization of
single-layer video streams encoded with the H.264/MPEG-4 advanced video coding standard and
scalable video coding extension,” IEEE Trans. Broadcasting, vol. 54, no. 3, pp. 698-718,
Aug. 2008.