Low Complexity Video Compression Based on Orthogonal Transforms — Neelesh Choudhary

International Journal of Engineering Trends and Technology (IJETT) – Volume 12 Number 10 – Jun 2014
Low Complexity Video Compression Based on Orthogonal Transforms
Neelesh Choudhary#1, Alok Jain#2
#Samrat Ashok Technological Institute, Vidisha, M.P., India
Abstract— Video is a 3D (three dimensional) array of pixels.
Two of its dimensions span the spatial domain of the moving
pictures and the remaining dimension spans time. Compared
with other multimedia data, video requires large transmission
bandwidth and storage space, so video compression is
necessary. The proposed technique compresses video by first
converting it from 3D to 2D using the Accordion function.
This transformation converts the temporal redundancy of the
video into spatial redundancy. To remove the spatial
redundancy, the Discrete Cosine Transform (DCT), Discrete
Walsh Transform (DWT) and Discrete Kekre Transform (DKT)
are applied separately. The transforms are compared on the
basis of the calculated parameters, and the Discrete Kekre
Transform is found to perform better than the Discrete
Cosine Transform and the Discrete Walsh Transform.
Keywords— Discrete cosine transform, Discrete Walsh
Transform, Discrete Kekre Transform, Accordion function,
Arithmetic encoding.
I. INTRODUCTION
Video is a sequence of pictures, or frames. A video signal has
high correlation between successive frames, which is known as
temporal redundancy. Neighbouring pixels within a frame are
also similar to some extent, which is known as spatial
redundancy. Video is now used in nearly every field of
information technology, so it must be compressed and
decompressed efficiently. The aim of video compression is to
reduce the size of the video signal, and thereby its storage
size and transmission time, without visibly degrading its
quality. This paper proposes a video compression technique
that removes temporal and spatial redundancy in order to
improve efficiency with minimum processing complexity.
Portable digital video applications require efficient real-time
video compression and decompression, so a technique that is
efficient, low cost and low complexity is desired. Motion
estimation is the most widely used approach among video
compression techniques. It exploits inter-frame correlation and
therefore provides well-organised compression. However, it is
computationally expensive and requires a large number of
operations per pixel [1, 2].
For stored video applications, the motion based video coding
standard MPEG is used, and its encoding process is typically
carried out offline. It is therefore not well suited to
real-time compression of portable video.
ISSN: 2231-5381
Another way to exploit inter-frame correlation is a
transform based approach. The objective is to find a
transform that has good energy compaction and allows
efficient subsequent entropy coding. For image compression,
the wavelet transform and the 2D DCT have shown these
properties, and both are used for image/video compression in
the spatial domain. The 2D DCT can be extended to a 3D DCT
by including time as the third dimension of the transform
[3, 4, 5, 6]. A transform coder provides a compression ratio
close to that of motion estimation based coding while being
less complex [7].
II. TRANSFORM BASED CODING
Transform coding has been in use for more than two
decades and has proven to be a highly efficient coding
technique, particularly in the spatial domain. It is now the
basis of almost all video coding standards. The most common
transform based video coder uses the DCT, as in the JPEG
compression standard. For video this is known as M-JPEG,
where the M stands for motion. Transform based coding is
shown in Fig. 1.
First, the input video frame is divided into N×N blocks and a
transform is applied to each block [6, 7, 8]. The objective of
the transform is to de-correlate the pixels of the input block.
This is achieved by redistributing the energy of the pixels so
that most of it is concentrated in a small set of transform
coefficients, a property known as energy compaction.
Fig. 1 Transform based video compression
Compression is achieved in two ways. First, low energy
coefficients are discarded because they have minimal impact on
the reconstruction. Second, the retained coefficients are
quantized according to their visual importance, because the
human visual system has different sensitivity to different
frequencies. The quantized coefficients are then suitably
coded [9, 10].
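As a rough illustration of the first mechanism, the sketch below zeroes all but the k largest-magnitude coefficients of a transformed block. The helper name `keep_largest` and the rule of selecting by magnitude are assumptions made for illustration; the paper does not state how the discarded coefficients are chosen.

```python
import numpy as np

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (hypothetical helper)."""
    if k <= 0:
        return np.zeros_like(coeffs)
    magnitudes = np.abs(coeffs).ravel()
    if k >= magnitudes.size:
        return coeffs.copy()
    # Flat indices of the k largest magnitudes.
    keep = np.argsort(magnitudes)[-k:]
    mask = np.zeros(magnitudes.size, dtype=bool)
    mask[keep] = True
    # Keep the selected coefficients and zero everything else.
    return np.where(mask.reshape(coeffs.shape), coeffs, 0)
```

With good energy compaction, a small k already retains most of the block energy, which is why the reconstruction error stays low.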
III. METHODOLOGY
The approach here is to represent the data in a highly
correlated form, obtained by exploiting both the temporal and
the spatial redundancy in the video signal. The input of the
encoder is a video cube (3D) made up of a number of frames.
This cube is decomposed into temporal frames, which are then
combined into one frame (2D). In the next step, the 2D frame
is divided into N×N blocks of pixels and the transform is
applied. A block diagram of accordion based transform coding
is shown in Fig. 2.
Fig.4 Spatial decomposition of frames
Initially, in spatial decomposition, each frame is decomposed
into the pixel intensity values of its rows and columns, as
shown in Fig. 4. The group of frames is then decomposed in
the temporal (time) domain.
Fig.5 Temporal decomposition of frames
Fig. 2 Block diagram representation of Accordion Based Transform Coding
Temporal decomposition is achieved by merging the video
cube pixels that have the same column rank, since the
resulting frames have a stronger correlation than the spatial
frames. The frames obtained from temporal decomposition are
called temporal frames.
A. Accordion Representation
The accordion representation transforms the video into an
image in which adjacent pixels of the temporal frames become
spatial neighbours [11]. Fig. 3 shows the first four frames of
the News reader sequence.
To obtain the accordion representation, temporal frames of
the same column rank are grouped together, since they are
highly correlated with each other. The direction of the arrows
in Fig. 6 shows the order in which the frames are arranged to
form the accordion.
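The grouping described above can be sketched as follows. This is a minimal illustration, assuming the video cube is a NumPy array of shape (T, H, W) and that the arrows of Fig. 6 denote a back-and-forth ordering in which the temporal order is reversed for every second column, as in the accordion method of [11]. The function names are hypothetical.

```python
import numpy as np

def accordion(video):
    """Map a (T, H, W) video cube to an (H, W*T) accordion image."""
    t, h, w = video.shape
    # One temporal frame per column rank: cols[x, y, n] = video[n, y, x].
    cols = video.transpose(2, 1, 0).copy()
    # Assumed accordion ordering: reverse time on every second column.
    cols[1::2] = cols[1::2, :, ::-1].copy()
    # Place the T pixels of each column rank side by side.
    return cols.transpose(1, 0, 2).reshape(h, w * t)

def inverse_accordion(acc, t):
    """Rebuild the (T, H, W) video cube from an accordion image."""
    h, wt = acc.shape
    w = wt // t
    cols = acc.reshape(h, w, t).transpose(1, 0, 2).copy()
    cols[1::2] = cols[1::2, :, ::-1].copy()
    return cols.transpose(2, 1, 0)
```

Because the mapping is only a permutation of pixels, it is lossless and costs no arithmetic, which is consistent with the low-complexity goal of the method.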
Fig.3. Frames of News reader video
Here, the objective is to carry out spatial and temporal
decomposition of the 3D video signal. Fig. 4 shows the
spatial decomposition of a video cube made of the first four
frames. I(h, w, tn) is the pixel intensity value at the
respective coordinate, where h and w index the height and
width of the video frame and tn is the time instance.
Fig.6 Arrangement of Temporal frames to form Accordion representation of
video
Fig. 6 shows the accordion representation obtained from the
spatially and temporally decomposed video file.
Fig.7 Accordion representation in the form of intensity values of video
file
The accordion representation of the News reader sequence is
shown in Fig. 7.
The full two-dimensional Walsh transform of a frame (image)
of size N×N requires only additions and subtractions, and no
multiplications [13]. The 4×4 Walsh transform matrix is shown
in Fig. 9.
$$
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1
\end{bmatrix}
$$
Fig. 9 4×4 Walsh Transform Matrix
Although the DWT is computationally simpler than the other
transforms, it is not suitable for applications that require a
high compression ratio, because the reconstruction quality
decreases drastically as the compression ratio increases.
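A minimal sketch of the Walsh matrix construction is given below. It assumes the common approach of building a Hadamard matrix by the Sylvester recursion and then sorting its rows by their number of sign changes (sequency); for N = 4 this reproduces the matrix of Fig. 9. The function names are hypothetical, and the matrix products are used only for clarity, since a fast implementation would need additions and subtractions only.

```python
import numpy as np

def hadamard(n):
    """Sylvester-type Hadamard matrix of order n (n must be a power of 2)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def walsh(n):
    """Walsh matrix: Hadamard rows sorted by number of sign changes."""
    h = hadamard(n)
    sign_changes = (np.diff(h, axis=1) != 0).sum(axis=1)
    return h[np.argsort(sign_changes)]

def walsh_2d(block):
    """Separable 2D Walsh transform of a square block."""
    w = walsh(block.shape[0])
    return w @ block @ w.T

def walsh_2d_inverse(coeffs):
    """Inverse transform, using W W^T = N I."""
    n = coeffs.shape[0]
    w = walsh(n)
    return (w.T @ coeffs @ w) / (n * n)
```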
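3) Discrete Kekre Transform: Unlike other transform
matrices, the Kekre transform matrix need not have a size
that is a power of 2; it can be of any size. All values on and
above the diagonal are one, the values just below the diagonal
are negative and depend on the row, and the rest of the lower
triangular part is zero [13, 14]. The N×N Kekre transform
matrix is given by:
$$
K(x, y) = \begin{cases} 1, & x \le y \\ -N + (x - 1), & x = y + 1 \\ 0, & x > y + 1 \end{cases} \qquad (3)
$$
where x and y are the row and column indices, with 1 ≤ x, y ≤ N.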
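The definition in (3) can be checked numerically with the short sketch below. The function names are hypothetical. The rows of the Kekre matrix are mutually orthogonal but not of unit length, so the inverse used here divides by the squared row norms; this normalisation is a standard consequence of row orthogonality rather than a detail taken from the paper.

```python
import numpy as np

def kekre_matrix(n):
    """N x N Kekre transform matrix following (3), for any n >= 1."""
    k = np.triu(np.ones((n, n)))
    for row in range(1, n):
        # 0-indexed row `row` is x = row + 1, so -N + (x - 1) = -n + row.
        k[row, row - 1] = -n + row
    return k

def kekre_forward(block):
    """Separable 2D Kekre transform of a square block."""
    k = kekre_matrix(block.shape[0])
    return k @ block @ k.T

def kekre_inverse(coeffs):
    """Inverse transform using K^{-1} = K^T D^{-1}, where D = K K^T is diagonal."""
    k = kekre_matrix(coeffs.shape[0])
    d = np.sum(k * k, axis=1)
    k_inv = k.T / d
    return k_inv @ coeffs @ k_inv.T
```

Note that the size need not be a power of 2: `kekre_matrix(5)` is as valid as `kekre_matrix(8)`.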
Fig.8 Accordion representation of Akiyo video file
The resulting IACC frames are divided into 8×8 blocks, and a
transform is applied to each block. The transforms used here
are the DCT, DWT and DKT. The transformed coefficients are
then quantized and encoded.
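The block decomposition step can be sketched as follows, assuming the frame dimensions are multiples of the block size and that the transform is separable, so each block X is mapped to T X T^T for some n×n transform matrix T. The helper names are hypothetical.

```python
import numpy as np

def to_blocks(frame, n=8):
    """Split an (H, W) frame into an (H/n, W/n, n, n) array of blocks."""
    h, w = frame.shape
    if h % n or w % n:
        raise ValueError("frame dimensions must be multiples of n")
    return frame.reshape(h // n, n, w // n, n).swapaxes(1, 2)

def from_blocks(blocks):
    """Reassemble an (H/n, W/n, n, n) array of blocks into an (H, W) frame."""
    bh, bw, n, _ = blocks.shape
    return blocks.swapaxes(1, 2).reshape(bh * n, bw * n)

def transform_blocks(blocks, t):
    """Apply the separable 2D transform T X T^T to every block at once."""
    return t @ blocks @ t.T
```

Any of the three transform matrices discussed in this paper can be passed as `t`.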
B. Transform Used:
The transforms used in the proposed work are described
below:
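1) Discrete Cosine Transform: The Discrete Cosine
Transform (DCT) is widely used in lossy image compression.
For an N×N block f(x, y) it is given as
$$
C(u, v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \qquad (1)
$$
where u, v = 0, 1, ..., N-1 and α is the normalization factor of (2).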
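The N×N Kekre transform matrix is shown in Fig. 10.
$$
\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 & 1 \\
-N+1 & 1 & 1 & \cdots & 1 & 1 \\
0 & -N+2 & 1 & \cdots & 1 & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 1 \\
0 & 0 & 0 & \cdots & -N+(N-1) & 1
\end{bmatrix}
$$
Fig. 10 N×N Kekre Transform Matrix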
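The normalization factor α used in (1) is
$$
\alpha(k) = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k = 1, 2, \ldots, N-1 \end{cases} \qquad (2)
$$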
Because the DCT is simple and performs satisfactorily in
compression, it is the most widely used transform in
transform coding. Its drawback is that it introduces blocking
artifacts at low bit rates.
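A small numerical sketch of the 2D DCT is given below. It assumes the orthonormal DCT-II form, in which the separable transform of a block X is D X D^T with D the N×N DCT matrix. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal N x N DCT-II matrix: d[k, x] = alpha(k) cos((2x+1) k pi / 2N)."""
    k = np.arange(n).reshape(-1, 1)
    x = np.arange(n).reshape(1, -1)
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def dct_2d(block):
    """Separable 2D DCT of a square block."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

def idct_2d(coeffs):
    """Inverse 2D DCT (the matrix is orthonormal, so its inverse is its transpose)."""
    d = dct_matrix(coeffs.shape[0])
    return d.T @ coeffs @ d
```

For a constant 8×8 block, all of the energy ends up in the single DC coefficient, which is the extreme case of energy compaction.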
2) Discrete Walsh Transform: The Walsh transform
matrix is defined as a set of N rows, denoted Wj, for
j = 0, 1, ..., N-1. Each Wj takes on the values +1 and -1, and
Wj[0] = 1 for all j. The Walsh transform matrix is obtained
from the Hadamard matrix of order N.
C. Quantization and Zigzag Scanning:
Quantization is the process of mapping a large set of values
to a smaller set by rounding values to some unit of
precision; it is the lossy step of the coder. After the
transformed values are quantized, an entropy encoder
compresses the quantized values further. Entropy coding is
lossless and therefore reversible.
An effective way to code the resulting set of quantized
coefficients is to scan them in zigzag order before encoding.
Fig. 11 shows the zigzag ordering of the quantized
coefficients.
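The two steps can be sketched as below. The uniform step size is an assumption made for illustration, since the paper does not give its quantization rule, and the zigzag order follows the JPEG-style convention that starts at the top-left coefficient and moves right first. The function names are hypothetical.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform quantization: round each coefficient to the nearest multiple of step."""
    return np.round(coeffs / step).astype(int)

def dequantize(levels, step):
    """Approximate reconstruction of the coefficients from quantization levels."""
    return levels * step

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in zigzag order."""
    def key(p):
        s = p[0] + p[1]
        # Odd anti-diagonals run top-right to bottom-left, even ones the reverse.
        return (s, p[0] if s % 2 else -p[0])
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)

def zigzag(block):
    """Flatten a square block into a 1D array in zigzag order."""
    return np.array([block[i, j] for i, j in zigzag_indices(block.shape[0])])
```

The zigzag order visits low-frequency coefficients first, so the zeros produced by quantization tend to cluster at the end of the scan, where run-length coding handles them efficiently.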
The performance parameters used here are the peak
signal-to-noise ratio (PSNR) and the compression ratio.
Fig. 11 Zig zag scanning of coefficients
D. Encoding:
In entropy encoding, the idea is to find a reversible
mapping of the quantized values such that the average number
of bits per symbol is minimized. The encoding used here is Run
Length Encoding (RLE) and arithmetic coding. RLE reduces the
physical size of a repeating string of characters. The
repeating string, called a run, is typically encoded in two
bytes. The first byte holds the number of characters in the
run and is called the run count. The second byte holds the
value of the character in the run, in the range 0 to 255, and
is called the run value. RLE schemes are simple and fast, but
their compression efficiency depends on the type of image
data being encoded.
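The run count and run value scheme described above can be sketched as follows. Runs are capped at 255 so that the count fits in one byte, as in the description; the function names and the list-of-pairs output format are assumptions made for illustration.

```python
def rle_encode(values, max_run=255):
    """Encode a sequence as a list of (run count, run value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v and runs[-1][0] < max_run:
            # Extend the current run.
            runs[-1] = (runs[-1][0] + 1, v)
        else:
            # Start a new run (new value, or current run is full).
            runs.append((1, v))
    return runs

def rle_decode(runs):
    """Expand (run count, run value) pairs back into the original sequence."""
    out = []
    for count, v in runs:
        out.extend([v] * count)
    return out
```

For a sequence with no repeats, every value becomes its own pair, so the output is larger than the input; this is the data dependence noted above.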
The idea behind arithmetic coding is to take the interval
from 0 to 1 and assign every symbol a sub-range of it in
proportion to its probability: the higher the probability, the
wider the range assigned to the symbol.
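The interval assignment can be illustrated with the sketch below, which only computes the final interval for a message and omits the bit-level output stage of a practical coder. Exact fractions are used to avoid rounding issues; the function names and the example probabilities are hypothetical.

```python
from fractions import Fraction

def symbol_ranges(probs):
    """Assign each symbol a sub-range of [0, 1) proportional to its probability."""
    ranges = {}
    low = Fraction(0)
    for symbol, p in probs.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges

def encode_interval(message, probs):
    """Narrow [0, 1) symbol by symbol and return the final (low, high) interval."""
    ranges = symbol_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = ranges[symbol]
        low, high = low + width * s_low, low + width * s_high
    return low, high
```

For example, with probabilities 1/2, 1/4 and 1/4 for the symbols a, b and c, the message "ab" maps to the interval [1/4, 3/8). Its width, 1/8, equals the product of the two symbol probabilities, so more probable messages get wider intervals and need fewer bits to identify.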
A. MSE and PSNR
The two error measures commonly used to compare image/video
compression techniques are the mean square error (MSE) and
the peak signal-to-noise ratio (PSNR). The MSE is the
average squared error between the compressed and the original
image, whereas the PSNR is the ratio between the maximum
possible power of the signal and the power of the corrupting
noise that affects the fidelity of its representation. Because
many signals have a very wide dynamic range, PSNR is usually
expressed on the logarithmic decibel scale. PSNR is the most
commonly used measure of reconstruction quality for lossy
compression methods.
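The MSE and PSNR are given as:
$$
\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ F(i, j) - F'(i, j) \right]^2 \qquad (4)
$$
where M×N is the number of rows and columns of the frame,
F'(i, j) is the reconstructed frame and F(i, j) is the original
frame.
$$
\mathrm{PSNR} = 20 \log_{10} \left( \frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}} \right) \qquad (5)
$$
Here, MAX is the maximum possible pixel value of the image.
When pixels are represented by 8 bits per sample, the
maximum value is 255.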
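Equations (4) and (5) translate directly into code. The sketch below assumes frames stored as NumPy arrays and 8-bit pixels by default; the function names are hypothetical.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between two frames of equal size, as in (4)."""
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB, as in (5)."""
    err = mse(original, reconstructed)
    if err == 0:
        # Identical frames: the ratio is unbounded.
        return float("inf")
    return float(20.0 * np.log10(max_value / np.sqrt(err)))
```

For example, a reconstruction that is off by 5 grey levels at every pixel has an MSE of 25 and a PSNR of about 34.15 dB.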
B. Compression Ratio
The compression ratio measures the extent to which the
video file is compressed; the aim is to make it as high as
possible without degrading the quality of the video. It is
given as:
$$
\mathrm{CR} = \frac{\text{number of bits in input video}}{\text{number of bits in compressed video}} \qquad (6)
$$
IV. EXPERIMENTS AND RESULTS
Experiments were performed to study the performance of the
proposed technique on the four test videos shown in Fig. 12.
DKT gives a better PSNR than DCT and DWT for every frame,
as is clear from Fig. 13.
[Plot: PSNR (dB, axis from 28 to 44) versus frame number (1 to 47) for ACC DKT, ACC DCT and ACC DWT]
Fig.12 (a) 1st frame of Akiyo (b) 4th frame of Deadline (c) 2nd frame of
Container (d) 100th frame of Bowing
The transform that provides a high PSNR together with a high
compression ratio is considered the best. The transforms used
here are the Discrete Cosine Transform (DCT), Discrete Walsh
Transform (DWT) and Discrete Kekre Transform (DKT).
Fig. 13 PSNR comparison between ACC-DKT, ACC-DCT and
ACC-DWT (Bowing test video).
Table I summarises the overall performance of the three
transforms, i.e. the Discrete Cosine Transform (DCT),
Discrete Walsh Transform (DWT) and Discrete Kekre Transform
(DKT), with PSNR reported at different compression ratios.
The transform that gives a high PSNR at a high compression
ratio is considered the best. From these observations, the
Kekre transform provides the highest PSNR at the different
compression ratios tested.
TABLE I
NUMERICAL RESULTS ON THE TEST VIDEOS, AVERAGED OVER 50 FRAMES (PSNR IN dB)

| Test video | CR | DCT | DWT | DKT |
|---|---|---|---|---|
| Akiyo | 10 | 32.15 | 33.70 | 34.26 |
| Akiyo | 25 | 31.62 | 32.25 | 33.13 |
| Akiyo | 50 | 29.48 | 29.10 | 31.66 |
| Deadline | 10 | 30.01 | 30.65 | 31.30 |
| Deadline | 25 | 30.00 | 30.00 | 30.55 |
| Deadline | 50 | 28.34 | 29.10 | 30.14 |
| Container | 10 | 32.21 | 33.18 | 35.33 |
| Container | 25 | 30.61 | 31.06 | 32.29 |
| Container | 50 | 28.06 | 29.06 | 30.28 |
| Bowing | 10 | 33.72 | 33.22 | 37.94 |
| Bowing | 25 | 32.48 | 33.01 | 37.66 |
| Bowing | 50 | 29.38 | 30.42 | 34.99 |
In this methodology, redundancies are exploited in the
temporal domain. Some artifacts produced by transform based
compression techniques persist in the accordion representation.
Applying transform based coding to the IACC frames maps the
temporal domain into the spatial domain. After quantization,
the high spatial frequencies of the IACC frames are
eliminated, and these actually represent the high temporal
frequencies of the 3D video. Thus, strong quantization
affects the fluency of the video rather than the quality of
the individual images.
References
[1] Q. L. X. Zhou and Y. Chen, “Implementation of h.264 decoder on
general purpose processors with media instructions”, in SPIE Conf. on
Image and Video Communications and Processing, 224-235, 2003.
[2] M. B. T. Q. N. A. Molino and F.Vacca, “Low complexity video codec for
mobile video conferencing”, in Eur. Signal Processing Conference, 665-668, 2004.
[3] S. B. Gokturk and A. M. Aaron, “Applying 3d techniques to video for
compression”, in Digital Video Processing (EE392J) Projects Winter
Quarter, 2002.
[4] T. Fryza, “Compression of Video Signals by 3D DCT Transform”,
Diploma thesis, Institute of Radio Electronics, FEKT Brno University of
Technology, Czech Republic, 2002.
[5] G. M.P. Servais, “Video Compression using the three dimensional
discrete cosine transform” in Proc. COMSIG, 27-32, 1997.
[6] R. A.Burg, “A 3d-dct real-time video compression system for low
complexity single chip vlsi implementation”, in the Mobile Multimedia
Conference (MoMuC), 2000.
[7] Jaya Krishna sunkara, E navaneethasagari, D pradeep, E Naga
Chaithanya,D Pravani and D V Sai Sudeer, “A new video compression
method using DCT/DWT and spiht based on Accordion representation”,
in International Journal of Image,Graphics and Signal Processing,2012.
[8] Reny Catherin L.,Thirupurasunthari P.,Sherley Arcksily Sylvia
A.,Sravani Kumari G.,Joany R.M and N.M. Nandhitha, “A servey on
Hybrid Image compression Techniques for Video Transmisson”, in
IJECE 2013.
[9] M.Atheeshwari and K.Mahesh, “video compression techniques- A
Comprehensive survey”, in IJARCSSE, 2014.
[10] Detlev Marpe, Heiko Schwarz and Thomas Wiegand, “Context based
adaptive binary arithmetic coding in the H.264/AVC video compression
standard”, in IEEE transaction on circuits and system for video
technology, 2003.
[11] Tarek Ouni, Walid Ayedi and Mohamed “New low complexity DCT
based video compression method” in IEEE 2009.
[12] Gregorio Bernabe, Jose M. Garcia and Jose Gonzalez, “A lossy 3D
wavelet transform for high quality compression of medical video”, in The
Journal of Systems and Software, Elsevier, 2009.
[13] Sudeep.D.thepade and Jaya H.Dewan, “varying proportions of
constituent transforms to generate hybrid wavelet transform for image
compression”, in IEEE International Conference on Emerging Trends in
computing communication and nanotechnology, 2013.
[14] Dr. H.B kekre, Sudeep D. Thepade and Akshay Maloo, “performance
comparison of image retrieval techniques using wavelet pyramids of
walsh, haar and kekre transforms”,in IJCA, 2010.
V. CONCLUSION
In this paper, a comparative study of orthogonal transforms,
namely the Discrete Cosine Transform, Discrete Walsh
Transform and Discrete Kekre Transform, is carried out for
video compression. The transforms are applied to the
accordion representation of the video, which is a 3D to 2D
conversion that turns temporal redundancy into spatial
redundancy, so that transform based coding can remove it. On
the basis of the PSNR and compression ratio calculated, the
Discrete Kekre Transform (DKT) performs better than the
Discrete Cosine Transform (DCT) and the Discrete Walsh
Transform (DWT): the DKT gives the highest PSNR of the
transforms used.
http://www.ijettjournal.org