Final Rep. - The University of Texas at Arlington

Investigation of Image Quality of Dirac, H.264 and H.265
Biju Shrestha (UTA ID: 1000113697 Email: biju.shrestha@mavs.uta.edu)
The University of Texas at Arlington
416 Yates Street, Arlington, Texas 76019-0016
Interim Report for EE 5359: Multimedia Processing
Acronyms and Abbreviations
AVC: advanced video coding
BBC: British Broadcasting Corporation
CBR: constant bit rate
CIF: common intermediate format
CODEC: coder and decoder
CSIQ: categorical subjective image quality
CSNR: channel signal to noise ratio
dB: decibel
FRExt: fidelity range extensions
FSIM: feature similarity index
GM: gradient magnitude
HEVC: high efficiency video coding
HVS: human visual system
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
IST: integer sine transform
ITU-T: International Telecommunication Union - Telecommunication Standardization Sector
IVC: images and video communications
JPEG: Joint Photographic Experts Group
kbps: kilobits per second
LIVE: Laboratory for Image and Video Engineering
MICT: Media Information and Communication Technology Laboratory
MPEG: Moving Picture Experts Group
MSE: mean squared error
MS SSIM: multi-scale structural similarity
MSU: Moscow State University
PC: phase congruency
PSNR: peak signal to noise ratio
QCIF: quarter common intermediate format
RGB: red, green and blue
SSIM: structural similarity index
TID2008: Tampere Image Database 2008
VBR: variable bit rate
VCEG: Video Coding Experts Group
Abstract
Several video compression standards exist, each offering improved performance and quality compared with its predecessors [2]. The image quality of Dirac, H.264 and H.265 can be investigated on various test sequences using metrics such as PSNR, CSNR, MSE, SSIM, MS SSIM and FSIM [3, 5, 7]. Conventional metrics such as PSNR and MSE measure intensity differences and cannot measure subjective fidelity [3]. Metrics such as SSIM and FSIM take the human visual system into account.
Introduction
A video codec is a tool used to compress and decompress digital video [2]. There are several video compression methods; the three discussed in this project are Dirac, H.264 and H.265 [1-3].
Dirac
The Dirac video codec was initially developed by BBC Research [1]. It is an open source software project and is powerful and flexible despite using only a small number of core tools [1]. Dirac offers the following features [1]:

• Multi-resolution transforms
• Inter and intra frame coding
• Frame and field coding
• Dual syntax
• CBR and VBR operations
• Variable bit depths
• Multiple chroma sampling formats
• Lossless and lossy coding
• Choice of wavelet filters
• Simple stream navigation
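Dirac's multi-resolution transform is a wavelet transform, and its choice of wavelet filters includes a LeGall (5,3) filter [1]. The sketch below shows one level of a generic integer 5/3 lifting transform on a 1-D signal, assuming whole-sample symmetric extension at the edges. It illustrates the idea only and is not Dirac's normative transform, whose exact lifting steps, shifts and edge handling are defined in the specification [1]; the function names are hypothetical.

```python
def lift53_forward(x):
    """One level of a generic integer LeGall 5/3 lifting transform (not Dirac's exact one).

    x must be an even-length sequence of integers. Returns (low, high): the
    low-pass (approximation) band and the high-pass (detail) band.
    Edges use whole-sample symmetric extension.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    even, odd = list(x[0::2]), list(x[1::2])
    m = len(odd)
    # Predict step: detail = odd sample minus the mean of its two even neighbours.
    high = [odd[i] - (even[i] + even[min(i + 1, m - 1)]) // 2 for i in range(m)]
    # Update step: smooth each even sample using the neighbouring details.
    low = [even[i] + (high[max(i - 1, 0)] + high[i] + 2) // 4 for i in range(m)]
    return low, high


def lift53_inverse(low, high):
    """Undo lift53_forward exactly by running the lifting steps in reverse order."""
    m = len(high)
    even = [low[i] - (high[max(i - 1, 0)] + high[i] + 2) // 4 for i in range(m)]
    odd = [high[i] + (even[i] + even[min(i + 1, m - 1)]) // 2 for i in range(m)]
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x


signal = [0, 2, 4, 6, 8, 10, 12, 14]
low, high = lift53_forward(signal)
print(low, high)  # [0, 4, 8, 13] [0, 0, 0, 2]
assert lift53_inverse(low, high) == signal  # integer lifting is perfectly reversible
```

For an image, the same step would be applied along rows and then columns, and repeating it on the low-pass band produces further resolution levels. Because every step is integer and exactly invertible, this kind of transform also supports lossless coding.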
Dirac has three main strands [15]: first, a compression specification for the byte stream and the decoder; second, software for compression and decompression; and third, algorithms designed to support simple and efficient hardware implementations [15]. Although Dirac is similar to many other video coding systems, it aims to combine effectiveness, efficiency and simplicity. The encoder and decoder architectures of Dirac are shown in figures 1 and 2, respectively.
Figure 1. Dirac encoder architecture [15]
Figure 2. Dirac decoder architecture [18]
H.264
H.264, also referred to as AVC, is a video compression standard [2]. H.264/MPEG-4 AVC is one of the international video coding standards and was jointly developed by the VCEG of the ITU-T and the MPEG of ISO/IEC [11]. It provides enhanced coding efficiency for a wide range of applications such as video telephony, video conferencing, TV, storage, streaming video, digital video authoring and digital cinema [11]. In addition, the FRExt amendment provides enhanced capabilities relative to the base specification [11].
H.264 does not define a particular CODEC; instead, it defines the syntax of the encoded bit stream and the method of decoding it. Typical encoder and decoder structures are shown in figures 3 and 4, respectively [1]. The various profiles of H.264 are shown in figure 5.
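The transform stage of an H.264 encoder such as the one in figure 3 uses a 4x4 integer approximation of the DCT. A minimal sketch of its forward core transform W = C_f X C_f^T, using the core matrix commonly documented for H.264 (see, e.g., [26]); the post-scaling that completes the transform is deliberately omitted because H.264 merges it into quantization. The helper names are illustrative only.

```python
# Forward core transform matrix C_f of the H.264 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def core_transform_4x4(block):
    """Return W = C_f X C_f^T for a 4x4 block of integer residual samples.

    Only integer additions, subtractions and multiplications are used; the
    normalising scale factors are left to the quantization stage.
    """
    return matmul(matmul(CF, block), transpose(CF))


flat = [[10] * 4 for _ in range(4)]
coeffs = core_transform_4x4(flat)
print(coeffs[0][0])  # 160; every other coefficient is 0
```

A flat block puts all of its energy into the DC coefficient, which is why smooth regions cost very few bits after quantization.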
Figure 3. H.264 encoder [2]
Figure 4. H.264 decoder [2]
Figure 5. Various profiles of H.264 [12]
H.265
H.265, also known as HEVC [3], can deliver significantly improved compression performance relative to AVC (ITU-T H.264 | ISO/IEC 14496-10) [10]. Alshina et al. [16] investigated coding efficiency at high resolution (HD 1080p) and reported average bit savings of 37% and 36% relative to MPEG-4 AVC for the hierarchical-B and IPPP structures, respectively [16]. A typical block-based video codec consists of many processes, including intra and inter prediction, transforms, quantization, entropy coding and filtering [17], as shown in Figure 6. Over the past decade, video coding techniques have gone through intensive research to achieve higher coding efficiency [17].
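A bit saving compares the bitrates two codecs need to reach the same quality. The basic arithmetic is sketched below with hypothetical numbers; averaged figures such as the 37% in [16] are usually computed over several rate points (for example with the Bjøntegaard delta-rate method), which this sketch does not implement. Both function names are hypothetical.

```python
def bit_saving_percent(rate_ref_kbps, rate_test_kbps):
    """Percentage bitrate reduction of a test codec relative to a reference codec.

    Assumes both rates were measured at the same objective quality.
    """
    if rate_ref_kbps <= 0:
        raise ValueError("reference rate must be positive")
    return 100.0 * (rate_ref_kbps - rate_test_kbps) / rate_ref_kbps


def rate_for_saving(rate_ref_kbps, saving_percent):
    """Bitrate implied by a given saving relative to the reference rate."""
    return rate_ref_kbps * (1.0 - saving_percent / 100.0)


# Hypothetical example: if AVC needs 1000 kbps for some quality level,
# a 37% saving corresponds to about 630 kbps at the same quality.
print(round(rate_for_saving(1000, 37), 2))
print(round(bit_saving_percent(1000, 630), 2))
```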
Figure 6. Encoder block diagram of H.265. Grey boxes are proposed tools and white boxes are
H.264/AVC tools [17]
Figure 7. Decoder block diagram of H.265. Grey boxes are proposed tools and white boxes are
H.264/AVC tools [27]
Image Quality Assessment using SSIM and FSIM
Digital images and videos are prone to different kinds of distortion during acquisition, processing, compression, storage, transmission and reproduction [5], and this degradation results in poor visual quality. Several metrics, such as FSIM, SSIM, PSNR and MSE, are widely used to quantify image quality [3, 8, 13, 14]. Conventional metrics such as PSNR and MSE depend directly on the intensity of an image and do not correlate with subjective fidelity ratings [3]; MSE cannot model the human visual system very accurately [4]. In this project, the PSNR, MSE and SSIM of Dirac, H.264 and H.265 are measured and compared to characterize the relative performance of the three codecs.
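MSE and PSNR are conventionally defined as MSE = (1/N) Σ (x_i - y_i)^2 and PSNR = 10 log10(255^2 / MSE) dB for 8-bit samples. A minimal sketch, assuming 8-bit images stored as lists of rows; it is not the implementation used by the MSU tool [27], and the function names are illustrative.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized images (lists of rows)."""
    if len(ref) != len(test) or any(len(r) != len(t) for r, t in zip(ref, test)):
        raise ValueError("images must have the same size")
    total = sum((a - b) ** 2 for r, t in zip(ref, test) for a, b in zip(r, t))
    count = sum(len(r) for r in ref)
    return total / count


def psnr(ref, test, peak=255):
    """PSNR in dB; peak=255 assumes 8-bit samples. Identical images give infinity."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


ref = [[100, 100], [100, 100]]
test = [[101, 99], [101, 99]]
print(mse(ref, test), round(psnr(ref, test), 2))  # 1.0 48.13
```

Because MSE only looks at intensity differences, a uniform brightness shift of 12 gray levels already gives MSE = 144, even though such a shift barely changes how the image looks.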
SSIM assesses image quality based on the degradation of structural information [5]. It assumes that the human visual system is highly adapted to extracting structural information from images [14], so preserving structural information is important when measuring image fidelity. Figure 8 shows the difference between nonstructural and structural distortions. Nonstructural distortions are changes in parameters such as luminance, contrast, gamma and spatial shift, and are usually caused by environmental and instrumental conditions during image acquisition and display [14]. Structural distortions, on the other hand, include additive noise, blur and lossy compression [14]; they change the structure of an image [14]. Figure 9 shows the measurement system used to calculate SSIM.
Figure 8. Difference between nonstructural and structural distortions [14]
Figure 9. Block diagram of SSIM measurement system [5]
Given two signals x = {x_i | i = 1, ..., N} and y = {y_i | i = 1, ..., N}, SSIM is evaluated from three comparisons (luminance, contrast and structure), defined by equations (1), (2) and (3), respectively [7].
$$ l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \qquad (1) $$
$$ c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \qquad (2) $$
$$ s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \qquad (3) $$
Here,
µx and µy = local sample means of x and y, respectively
σx and σy = local sample standard deviations of x and y, respectively
σxy = local sample covariance between x and y
C1, C2 and C3 = constants that stabilize the computations when the denominators become small
The general form of the SSIM index is obtained by combining equations (1), (2) and (3) [7]:
$$ \mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma} \qquad (4) $$
Here, α, β and γ are positive parameters that adjust the relative importance of the three components. Setting α = β = γ = 1 and C3 = C2/2 gives [7]:
$$ \mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (5) $$
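A minimal sketch of equation (5), computed once over two whole signals rather than in the sliding local window used in [5], where the resulting SSIM map is averaged. The constants C1 = (K1 L)^2 and C2 = (K2 L)^2 with K1 = 0.01, K2 = 0.03 and L = 255 are the defaults suggested in [5]; they are assumptions here because this report does not state them. The statistics use 1/N normalization, and ssim_global is a hypothetical name.

```python
def ssim_global(x, y, data_range=255, k1=0.01, k2=0.03):
    """Equation (5) evaluated over two whole signals given as flat lists.

    Assumed constants: C1 = (k1*L)^2, C2 = (k2*L)^2 (C3 = C2/2 is implied by eq. 5).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("signals must be non-empty and of equal length")
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


x = [10, 50, 90, 130, 170, 210]
print(ssim_global(x, x))                   # 1.0 for identical signals
print(ssim_global(x, [v + 12 for v in x]))  # slightly below 1: only luminance differs
```

A uniform shift leaves the contrast and structure terms at exactly 1, so only the luminance term lowers the score; this mirrors why a mean luminance shift keeps a high SSIM in Figure 10.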
Figure 10 shows images with different distortions, quantified using MSE and SSIM. To the human visual system (HVS) the distorted images are clearly of different quality, yet they all have approximately the same MSE. SSIM, in contrast, is lower for the images that look worse, so it indicates perceived image quality much better than MSE does.
(a) Original: MSE = 0, SSIM = 1
(b) Mean luminance shift: MSE = 144, SSIM = 0.988
(c) Contrast stretch: MSE = 144, SSIM = 0.913
(d) Impulse noise contamination: MSE = 144, SSIM = 0.840
(e) Blurring: MSE = 144, SSIM = 0.694
(f) JPEG compression: MSE = 142, SSIM = 0.662
Figure 10. MSE and SSIM measurement of images under different distortions. (a) original
image, (b) mean luminance shift, (c) contrast stretch, (d) impulse noise contamination, (e)
blurring, and (f) JPEG [22] compression [13]
FSIM is based on the premise that the HVS understands an image mainly through its low-level features [3]. PC is a dimensionless measure of the significance of a local structure [3]. PC and the image GM are used as the primary and secondary features, respectively, in FSIM [3]. The FSIM score is obtained by using PC as a weighting function for the local image quality characterized by PC and GM [3]. FSIM is designed for gray-scale images [3], and its extension FSIMc incorporates chrominance information. FSIM can be mathematically modeled as shown in equation (6) [3].
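$$ \mathrm{FSIM} = \frac{\sum_{x \in \Omega} S_L(x)\, PC_m(x)}{\sum_{x \in \Omega} PC_m(x)} \qquad (6) $$
Here, S_L(x) is the local similarity between the reference and distorted images at location x, PC_m(x) = max(PC_1(x), PC_2(x)) is the larger of the two images' phase congruency values at x, and Ω is the whole image spatial domain [3].
FSIMc can be mathematically modeled as shown in equation (7), and the computation process is illustrated in figure 11 [3].
$$ \mathrm{FSIM}_C = \frac{\sum_{x \in \Omega} S_L(x)\, [S_C(x)]^{\lambda}\, PC_m(x)}{\sum_{x \in \Omega} PC_m(x)} \qquad (7) $$
Here, S_C(x) is the chrominance similarity at x, and λ > 0 is the parameter used to adjust the importance of the chrominance components [3].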
Figure 11. Illustration for FSIM/FSIMc index computation. f1 is the reference image, and f2 is a
distorted version of f1 [3].
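The pooling step of equation (6) is a PC-weighted average of the local similarity map. A minimal sketch of that step alone, assuming the maps S_L, PC_1 and PC_2 have already been computed and flattened to lists; computing phase congruency and gradient magnitude (which in [3] involves log-Gabor filtering) is not shown, and fsim_pool is a hypothetical name.

```python
def fsim_pool(s_l, pc1, pc2):
    """PC-weighted pooling of a local similarity map, following equation (6).

    s_l: local similarity S_L(x); pc1, pc2: phase congruency of the reference
    and distorted images. All are flat lists over the same pixel positions.
    """
    if not s_l or not (len(s_l) == len(pc1) == len(pc2)):
        raise ValueError("maps must be non-empty and of equal size")
    pc_m = [max(a, b) for a, b in zip(pc1, pc2)]  # PC_m(x) = max(PC_1, PC_2)
    total_weight = sum(pc_m)
    if total_weight == 0:
        raise ValueError("phase congruency is zero everywhere")
    return sum(s * w for s, w in zip(s_l, pc_m)) / total_weight


# The pixel with high phase congruency dominates the score.
print(round(fsim_pool([1.0, 0.5], [0.9, 0.1], [0.6, 0.1]), 3))  # 0.95
```

Weighting by PC_m means that errors at perceptually significant structures (edges, lines) affect the score more than errors in flat regions.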
Each metric compares images quantitatively using a different approach. Table 1 shows the ranking of image quality assessment metric performance on six databases: on every database, FSIM ranks above SSIM, and SSIM ranks above PSNR.
Table 1. Ranking of image quality assessment metrics performance (FSIM, SSIM and PSNR) on
six databases [5].
Database    FSIM    SSIM    PSNR
TID2008     1       2       3
CSIQ        1       2       3
LIVE        1       2       3
IVC         1       2       3
MICT        1       2       3
A57         1       2       3
Results
Results using Foreman QCIF sequence
Video Information
QCIF sequence: foreman_qcif.yuv
Frame width: 176
Frame height: 144
Frame rate: 30 frames/second
Total frames used for encoding: 30
Figure 12: Original Foreman QCIF sequence [28]
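The test sequences are raw .yuv files, and the results are reported for the Y component. A minimal sketch of extracting the Y plane of each frame, assuming 8-bit planar YUV 4:2:0 frames (the usual layout of the sequences in [29], but an assumption here). read_y_planes is a hypothetical helper and is not part of the MSU tool [27].

```python
import io


def read_y_planes(stream, width, height, max_frames=None):
    """Yield the luma (Y) plane of each frame in a raw 8-bit YUV 4:2:0 stream.

    Each frame holds width*height Y bytes followed by two chroma planes of
    (width//2)*(height//2) bytes each. Y planes are yielded as lists of rows.
    """
    y_size = width * height
    frame_size = y_size + 2 * (width // 2) * (height // 2)
    count = 0
    while max_frames is None or count < max_frames:
        frame = stream.read(frame_size)
        if len(frame) < frame_size:
            break  # end of data; a trailing partial frame is ignored
        yield [list(frame[r * width:(r + 1) * width]) for r in range(height)]
        count += 1


# Synthetic two-frame 4x2 clip: 8 Y bytes then 4 chroma bytes per frame.
clip = io.BytesIO(bytes(range(8)) + bytes(4) + bytes(range(10, 18)) + bytes(4))
frames = list(read_y_planes(clip, width=4, height=2))
print(frames[1])  # [[10, 11, 12, 13], [14, 15, 16, 17]]

# For the QCIF sequences (176x144, 38016 bytes per frame), for example:
# with open("foreman_qcif.yuv", "rb") as f:
#     y_frames = list(read_y_planes(f, 176, 144, max_frames=30))
```

Matching frames from the original and decoded files could then be compared with an MSE/PSNR routine; how the MSU tool pools per-frame values into a sequence score is not described in this report.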
Dirac at 87.32 kbps; H.264 (baseline profile) at 87.6 kbps; H.265 at 76.80 kbps
Dirac at 152.85 kbps; H.264 (baseline profile) at 142.82 kbps; H.265 at 162.46 kbps
Dirac at 397.60 kbps; H.264 (baseline profile) at 323.76 kbps; H.265 at 398.21 kbps
Dirac at 4266.92 kbps; H.264 (baseline profile) at 3667.01 kbps; H.265 at 2301.14 kbps
Figure 13: Foreman QCIF sequence results using different codecs
Table 2: Tabular results for Y-component using Foreman QCIF sequence
[Plot: PSNR (dB) vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 14: PSNR achieved at various bitrates for Foreman QCIF sequence
[Plot: MSE vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 15: MSE achieved at various bitrates for Foreman QCIF sequence
[Plot: SSIM index vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 16: SSIM achieved at various bitrates for Foreman QCIF sequence
Results using Foreman CIF sequence
Video Information
CIF sequence: foreman_cif.yuv
Frame width: 352
Frame height: 288
Frame rate: 30 frames/second
Total frames used for encoding: 30
Figure 17: Original Foreman CIF sequence [28]
Dirac at 251.79 kbps; H.264 (baseline profile) at 96.64 kbps; H.265 at 93.44 kbps
Dirac at 295.83 kbps; H.264 (baseline profile) at 320.87 kbps; H.265 at 274.82 kbps
Dirac at 665.37 kbps; H.264 (baseline profile) at 565.55 kbps; H.265 at 500.16 kbps
Dirac at 17673.92 kbps; H.264 (baseline profile) at 9462.26 kbps; H.265 at 9562.62 kbps
Figure 18: Foreman CIF sequence results using different codecs
Table 3: Tabular results for Y-component using Foreman CIF sequence
[Plot: PSNR (dB) vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 19: PSNR achieved at various bitrates for Foreman CIF sequence
[Plot: MSE vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 20: MSE achieved at various bitrates for Foreman CIF sequence
[Plot: SSIM index vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 21: SSIM achieved at various bitrates for Foreman CIF sequence
Results using container QCIF sequence
Video Information
QCIF sequence: container_qcif.yuv
Frame width: 176
Frame height: 144
Frame rate: 30 frames/second
Total frames used for encoding: 30
Figure 22: Original container QCIF sequence [28]
Dirac at 47.08 kbps; H.264 (baseline profile) at 38.54 kbps; H.265 at 44.81 kbps
Dirac at 74.55 kbps; H.264 (baseline profile) at 69.37 kbps; H.265 at 74.51 kbps
Dirac at 218.86 kbps; H.264 (baseline profile) at 203.5 kbps; H.265 at 205.55 kbps
Dirac at 3568.48 kbps; H.264 (baseline profile) at 1806.49 kbps; H.265 at 1743.176 kbps
Figure 23: Container QCIF sequence results using different codecs
Table 4: Tabular results for Y-component using container QCIF sequence
[Plot: PSNR (dB) vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 24: PSNR achieved at various bitrates for container QCIF sequence
[Plot: MSE vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 25: MSE achieved at various bitrates for container QCIF sequence
[Plot: SSIM index vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 26: SSIM achieved at various bitrates for container QCIF sequence
Results using container CIF sequence
Video Information
CIF sequence: container_cif.yuv
Frame width: 352
Frame height: 288
Frame rate: 30 frames/second
Total frames used for encoding: 30
Figure 27: Original container CIF sequence [28]
Dirac at 122.01 kbps; H.264 (baseline profile) at 115.82 kbps; H.265 at 122.14 kbps
Dirac at 217.54 kbps; H.264 (baseline profile) at 233.19 kbps; H.265 at 208.02 kbps
Dirac at 906.96 kbps; H.264 (baseline profile) at 753.17 kbps; H.265 at 916.83 kbps
Dirac at 15501.95 kbps; H.264 (baseline profile) at 1555.43 kbps; H.265 at 11469.4 kbps
Figure 28: Container CIF sequence results using different codecs
Table 5: Tabular results for Y-component using container CIF sequence
[Plot: PSNR (dB) vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 29: PSNR achieved at various bitrates for container CIF sequence
[Plot: MSE vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 30: MSE achieved at various bitrates for container CIF sequence
[Plot: SSIM index vs. bitrate (kbps) for Dirac, H.264 and H.265]
Figure 31: SSIM achieved at various bitrates for container CIF sequence
Conclusions
This project studies the quality performance of different video codecs, with a primary focus on Dirac, H.264 and H.265 [19-21]. PSNR, MSE and SSIM were measured at various bitrates for all three codecs to make a comparative study. Using test sequences of different spatial/temporal resolutions, MATLAB, Microsoft Visual Studio and the MSU video quality measurement tool [27] were used extensively to assess the image quality of each codec at various bitrates. Figures 14-16, 19-21, 24-26 and 29-31 show how PSNR, MSE and SSIM vary with bitrate for the Foreman QCIF, Foreman CIF, container QCIF and container CIF sequences, respectively, and Tables 2-5 give the corresponding results in tabular form. Based on these results, H.265 performed better than H.264 and Dirac at all tested bitrates. However, at higher bitrates all three codecs perform close to each other in terms of these metrics.
References
[1] Dirac Video (2008, September 23), “Dirac Specification” [Online]. Available: http://diracvideo.org/download/specification/dirac-spec-latest.pdf
[2] I. Richardson (2011), “A Technical Introduction to H.264/AVC” [Online]. Available: http://www.vcodex.com/files/H.264_technical_introduction.pdf
[3] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity index for image quality assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378-2386, Aug. 2011.
[4] Z. Li and A. M. Tourapis, “New video quality metrics in the H.264 reference software,” Input Document to JVT, Hannover, DE, 20-25 Jul. 2008.
[5] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[6] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, vol. 2, pp. 1398-1402, 9-12 Nov. 2003.
[7] C. Li and A. C. Bovik, “Content-weighted video quality assessment using a three-component image model,” Journal of Electronic Imaging, vol. 19, pp. 65-71, Mar. 2010.
[8] X. Ran and N. Farvardin, “A perceptually-motivated three-component image model - part I: description of the model,” IEEE Transactions on Image Processing, vol. 4, no. 4, pp. 401-415, Apr. 1995.
[9] J. L. Li, G. Chen, and Z. R. Chi, “Image coding quality assessment using fuzzy integrals with a three-component image model,” IEEE Transactions on Fuzzy Systems, vol. 12, no. 1, pp. 99-106, Feb. 2004.
[10] G. J. Sullivan and J. Ohm, “Recent developments in standardization of high efficiency video coding (HEVC),” Proc. SPIE 7798, 77980V, 2010.
[11] G. Sullivan, P. Topiwala, and A. Luthra, “The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions,” SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, Aug. 2004.
[12] A. Puri, X. Chen, and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard,” Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[13] Z. Wang et al. (2003, February), “The SSIM index for image quality assessment” [Online]. Available: https://ece.uwaterloo.ca/~z70wang/research/ssim/
[14] C. Chukka, “A universal image quality index and SSIM comparison” [Online]. Available: http://www-ee.uta.edu/Dip/Courses/EE5359/chaitanyaee5359d.pdf
[15] BBC Research, “The technology behind Dirac” [Online]. Available: http://www.bbc.co.uk/rd/projects/dirac/technology.shtml
[16] E. Alshina et al., “Technical considerations of new challenges in video coding standardization,” International Organization for Standardization Organization Internationale De Normalisation ISO/IEC JTC1/SC29/WG11 Coding of Moving Pictures and Audio, Oct. 2008.
[17] S. Jeong et al., “Highly efficient video codec for entertainment quality,” ETRI Journal, vol. 33, no. 2, pp. 145-154, Apr. 2011.
[18] K. R. Rao and D. N. Kim, “Current video coding standards: H.264/AVC, Dirac, AVS China and VC-1,” 42nd Southeastern Symposium on System Theory (SSST), pp. 1-8, Mar. 2010.
[19] A. M. Tourapis (January 2009), “H.264/14496-10 AVC reference software manual” [Online]. Available: http://iphome.hhi.de/suehring/tml/JM%20Reference%20Software%20Manual%20%28JVT-AE010%29.pdf
[20] F. Bossen, D. Flynn, and K. Sühring (July 2011), “HEVC reference software manual” [Online]. Available: http://phenix.intevry.fr/jct/doc_end_user/documents/6_Torino/wg11/JCTVC-F634-v2.zip
[21] DiracPRO software: http://dirac.kw.bbc.co.uk/download/
[22] D. T. Lee, “JPEG 2000: Retrospective and new developments,” Proc. IEEE, vol. 93, pp. 32-41, Jan. 2005.
[23] KTA software: http://iphome.hhi.de/suehring/tml/download/KTA/
[24] H.264/AVC Reference Software: http://iphome.hhi.de/suehring/tml/download/
[25] A. Ravi, “Performance analysis and comparison of the Dirac video codec with H.264/MPEG-4 part 10 AVC,” M.S. thesis, Dept. Elect. Eng., Univ. of Texas at Arlington, 2009.
[26] I. E. G. Richardson, “H.264 and MPEG-4 video compression: video coding for next generation multimedia,” Great Britain: Wiley, 2003, pp. 159-223.
[27] MSU video quality measurement tool: http://compression.ru/video/quality_measure/video_measurement_tool_en.html
[28] B. Bross et al., “High efficiency video coding (HEVC) text specification draft 6,” Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 7th Meeting: Geneva, CH, 21-30 Nov. 2011.
[29] “YUV video sequences” [Online]. Available: http://trace.eas.asu.edu/yuv/
[30] A. Urs, “Multiplexing/de-multiplexing Dirac video with AAC audio bit stream,” M.S. thesis, Dept. Elect. Eng., Univ. of Texas at Arlington, May 2011.
[31] JM Software: http://iphome.hhi.de/suehring/tml/
[32] T. Wiegand and G. J. Sullivan, “The picturephone is here. Really,” IEEE Spectrum, vol. 48, pp. 50-54, Sept. 2011.
[33] A. Ravi and K. R. Rao, “Performance analysis and comparison of the Dirac video codec with H.264/MPEG-4 part 10 AVC,” IJWMIP, vol. 4, no. 4, pp. 635-654, 2011.
[34] N. Jayant, “Frontiers of audiovisual communications: new convergences of broadband communications, computing, and rich media,” Proceedings of the IEEE, vol. 100, no. 4, Apr. 2012.