A Novel Architecture of Optimized Truncation for JPEG2000

NING CHEN, SHIGEN XIE, XIANG XIE, LEIBO LIU, LI ZHANG, ZHIHUA WANG
Integrated Circuits and Systems Lab, Department of Electronics Engineering,
Tsinghua University
Beijing, 100084
CHINA
Abstract: - To cope with recent mobile applications in which images are used intensively, this paper presents
a novel optimized truncation architecture dedicated to JPEG2000 image coding. Using a rate-distortion slope
based method, the proposed architecture aims to achieve low computational cost and a small working memory
while maintaining high image quality. By deciding how many coding passes to retain as soon as the coding of
each code-block ends, the proposed architecture eliminates iterative computations, reduces the working memory
for bitstream buffering from the full image size down to the size of three code-blocks, and provides the same
rate-distortion performance as the software architecture. It has been implemented and verified inside a
JPEG2000 encoder on a Xilinx xc2v4000-ff1152 FPGA chip.
Key-Words: - Optimized Truncation, EBCOT, JPEG2000, VLSI, FPGA
1 Introduction
With recent advances in the functionality of mobile
systems and the transmission bandwidth of network
systems, there is strong demand for digital image
processing, including still and moving picture coding.
When image or video data are transmitted over a
communication channel with a specified bandwidth,
as in broadcasting or wireless environments, rate
control of the image codestream is indispensable.
Rate control is used to meet a particular target bitrate
or transmission time: it ensures that the codestream
uses the desired number of bytes while achieving the
highest possible image quality.
JPEG2000 [1], standardized in 2001, offers strong
rate control capability based on the wavelet
transform and embedded block coding. Embedded
block coding is highly advantageous because it
allows the use of a post-compression rate-distortion
optimization (PCRD) algorithm [2]. In other words, a
JPEG2000 encoder can truncate each block
codestream in an optimal way so that the required
bitrate is attained. JPEG2000 can therefore be
regarded as a viable image coding scheme for the
coming network era.
However, looking into the details of the algorithm,
coefficient bit modeling and arithmetic coding are
executed on a code-block basis, where a code-block
is a small rectangular portion of the decomposed
image. These code-blocks are encoded independently
of each other; hence, to control the total number of
bits of the coded image, a set of truncation (coding
termination) points for all code-blocks must be
determined according to the specified bitrate. One
approach to rate control is called rate-distortion
optimization [3]. Given a target bitrate, this scheme
evaluates the distortion incurred in the reconstructed
image for every candidate set of truncation points,
and then selects the set which gives the minimum
image distortion.
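As a minimal sketch of the selection just described, the following Python fragment exhaustively evaluates every candidate set of truncation points for a toy image of two code-blocks. The data layout (a list of (rate, distortion) pairs per block), the function name, and the assumption that total distortion is the sum of the per-block distortions are illustrative choices, not taken from the paper.

```python
from itertools import product

def best_truncation_set(blocks, budget):
    """Exhaustively pick one truncation point per code-block.

    blocks: list of code-blocks; each block is a list of (rate, distortion)
            pairs, one per candidate truncation point (hypothetical layout).
    budget: maximum total rate in bytes.
    Returns (choice, total_rate, total_distortion), or None if nothing fits.
    """
    best = None
    # Every combination of one truncation index per block is a candidate set.
    for choice in product(*(range(len(b)) for b in blocks)):
        rate = sum(blocks[i][n][0] for i, n in enumerate(choice))
        dist = sum(blocks[i][n][1] for i, n in enumerate(choice))  # additive (assumed)
        if rate <= budget and (best is None or dist < best[2]):
            best = (choice, rate, dist)
    return best

# Toy data: cumulative bytes and remaining distortion at each truncation point.
blocks = [
    [(0, 100), (10, 60), (25, 30), (45, 20)],
    [(0, 80), (8, 50), (20, 35), (40, 10)],
]
result = best_truncation_set(blocks, budget=50)
print(result)  # ((2, 2), 45, 65)
```

The number of candidate sets is the product of the per-block candidate counts, so this direct search is only usable on toy inputs; it illustrates why the full scheme is costly.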
This scheme attains fairly good image quality, but
suffers from high computational cost: coefficient bit
modeling and arithmetic coding must be completed
before rate control starts, which demands a large
buffer for the whole image, and iterative
computations are needed to find the appropriate set
of truncation points. Hence it remains difficult to
implement optimized truncation on chip. Current
JPEG2000 implementations employ one of two
methods to avoid this problem: 1. using quantization
coefficients instead of optimized truncation for rate
control, or 2. leaving the optimized truncation to
software on an MCU. Both methods sacrifice
flexibility, and the second also imposes a heavy
burden on system throughput and cost.
Motivated by this problem, we devise a new rate
control architecture which executes optimized
truncation in parallel with arithmetic coding. This
architecture first stores the codestream and the
code-block information of a code-block in separate
buffers, then estimates the rate-distortion slope for
each truncation point and selects the subset with
monotonically decreasing slopes. Once all the
rate-distortion slope metrics are available, the optimal
truncation point for the current block can be easily
decided. Referring to the information buffer to get the
truncated block length, the architecture accomplishes
rate control by simply shifting the buffer address to
truncate the block stream. An arbiter is needed to
manage the buffers and the output stream. As a result,
a considerable reduction in computational cost is
achieved by avoiding iterative truncation. At the same
time, the buffer size is reduced from tile size to
code-block size.
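The slope estimation and the selection of the monotonically decreasing subset can be sketched as follows. This is an illustrative software model, not the paper's hardware: the function name and argument layout are hypothetical, cumulative rates are assumed to be strictly increasing, and passes that fall off the convex hull are given slope 0, mirroring the convention used later in Section 3.1.

```python
def hull_slopes(d0, rates, dists):
    """Return one rate-distortion slope per coding pass.

    d0:    distortion before any pass is included.
    rates: cumulative length in bytes after each pass (strictly increasing).
    dists: remaining distortion after each pass.
    Passes that are not valid truncation points get slope 0.0.
    """
    # Each hull entry: (rate, distortion, slope leading into it, pass index).
    hull = [(0, d0, float("inf"), -1)]
    for k, (r, d) in enumerate(zip(rates, dists)):
        while True:
            r_prev, d_prev, s_prev, _ = hull[-1]
            dr, dd = r - r_prev, d_prev - d
            if dr <= 0 or dd <= 0:
                slope = 0.0  # no distortion gain: not a truncation candidate
                break
            slope = dd / dr
            if slope < s_prev:
                break        # slopes stay strictly decreasing: keep this point
            hull.pop()       # previous point lies under the hull: drop it
        if slope > 0:
            hull.append((r, d, slope, k))
    slopes = [0.0] * len(rates)
    for _, _, s, k in hull[1:]:
        slopes[k] = s
    return slopes

slopes = hull_slopes(100, [10, 20, 30, 40], [60, 50, 20, 15])
print(slopes)  # [4.0, 0.0, 2.0, 0.5]
```

In this example the second pass is removed from the candidate set because stopping after it is never better than stopping after the first or the third pass.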
Our optimized truncation architecture, which greatly
reduces computational cost, is especially effective in
low target bitrate applications which demand flexible
rate control and optimal rate-distortion performance.
2 JPEG2000 Coding and Rate Control
2.1 JPEG2000 Coding Algorithm
In the JPEG2000 coding scheme, a target image is
first divided into rectangular regions called tiles. A
2-D discrete wavelet transform (DWT) then
decomposes a tile into LL, HL, LH, and HH
subbands. The LL subband is a low-resolution
version of the original tile and is again decomposed
into four subbands, recursively. Thus each level
contributes three subbands, except the last, which has
four. This decomposition is called Mallat
decomposition. A subband is divided into
code-blocks, typically 64x64, each of which is coded
individually by the coefficient bit modeling described
below.
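The subband structure described above can be enumerated with a short sketch. The function names are hypothetical, the tile is assumed to start at the origin (so the low-pass half takes the rounded-up size), and precinct partitioning is ignored, so the code-block count is only an approximation of what a real encoder produces.

```python
import math

def mallat_subbands(width, height, levels):
    """List (name, level, width, height) for a dyadic Mallat decomposition."""
    bands = []
    w, h = width, height
    for level in range(1, levels + 1):
        lw, lh = (w + 1) // 2, (h + 1) // 2  # low-pass half (rounded up)
        hw, hh = w // 2, h // 2              # high-pass half (rounded down)
        # Three detail subbands per level; LL is decomposed again.
        bands += [("HL", level, hw, lh), ("LH", level, lw, hh), ("HH", level, hw, hh)]
        w, h = lw, lh
    bands.append(("LL", levels, w, h))       # the last level keeps its LL band
    return bands

def count_code_blocks(bands, cb=64):
    """Number of cb x cb code-blocks needed to cover every subband."""
    return sum(math.ceil(bw / cb) * math.ceil(bh / cb) for _, _, bw, bh in bands)

bands = mallat_subbands(512, 512, levels=3)
print(len(bands))                # 10 subbands: 3 per level plus the final LL
print(count_code_blocks(bands))  # 64 code-blocks of 64x64
```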
Wavelet coefficients in a code-block are quantized,
the quantized coefficients are separated into sign bits
and absolute values, and bit-planes are generated
from the bits of the absolute values such that each
bit-plane contains all the bits of the same significance
from all coefficients of the code-block. Coefficient
bit modeling labels the bits of a bit-plane based on
statistical information through three different coding
passes, which allows efficient compression by the
subsequent MQ-coder, an arithmetic coder. The
MQ-coder generates compressed image data and
information for each code-block independently.
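The separation into sign bits and magnitude bit-planes can be illustrated with a small sketch. The function names and the flat list of already-quantized integer coefficients are assumptions made for illustration; a real code-block is two-dimensional and is scanned in a specific order that is not modeled here.

```python
def bit_planes(coeffs, num_planes=None):
    """Split quantized coefficients into sign bits and magnitude bit-planes.

    coeffs: non-empty list of quantized integer coefficients.
    Returns (signs, planes) with planes ordered from most to least significant.
    """
    signs = [1 if c < 0 else 0 for c in coeffs]
    mags = [abs(c) for c in coeffs]
    if num_planes is None:
        num_planes = max(max(mags).bit_length(), 1)
    # Plane p holds bit p of every magnitude.
    planes = [[(m >> p) & 1 for m in mags] for p in range(num_planes - 1, -1, -1)]
    return signs, planes

def leading_zero_planes(planes):
    """Count all-zero bit-planes above the first plane that contains a 1."""
    n = 0
    for plane in planes:
        if any(plane):
            break
        n += 1
    return n

signs, planes = bit_planes([5, -3, 0, 12])
print(signs)   # [0, 1, 0, 0]
print(planes)  # [[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
```

When the number of planes is fixed in advance, the count of leading all-zero planes is the kind of per-block side information that Section 3.1 refers to as the zero bit-plane value.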
2.2 Rate Control Mechanisms
In the encoder, rate control can be achieved through
two distinct mechanisms: 1) the choice of quantizer
step sizes, and 2) the selection of the subset of
coding passes to include in the code stream. When
the first mechanism is employed, quantizer step
sizes are adjusted in order to control rate. Although
this rate control mechanism is conceptually simple,
it does have one potential drawback. Every time the
quantizer step sizes are changed, the tier-1 encoding
must be performed again. Since tier-1 coding
requires a considerable amount of computation, this
approach to rate control may not be practical in
computationally constrained encoders.
When the second mechanism is used, the encoder
can elect to discard coding passes in order to control
the rate. The encoder knows the contribution that
each coding pass makes to the rate, and can also
calculate the distortion reduction associated with
each coding pass. Using this information, the
encoder can include the coding passes in order of
decreasing distortion reduction per unit rate until the
bit budget has been exhausted. This process is the
optimized truncation in EBCOT. The approach is
very flexible in that different distortion metrics can
be easily accommodated (e.g., mean squared error,
visually weighted mean squared error, etc.). The
rate-distortion optimization adopted in the
JPEG2000 verification model (VM) [4] tries to
achieve the best image quality for the specified
bitrate. Since every code-block in JPEG2000 is
coded completely independently of the others, this
scheme first calculates pairs of coded data amount
and distortion value at the end of every pass in each
code-block. Among the sets of pairs which meet the
target bitrate, it selects the one which gives the
lowest total image distortion. Then, for each
code-block, the truncation point is set to the end of
the corresponding pass in the selected set.
3 Parallel Optimized Truncation Architecture
As mentioned before, rate-distortion optimized
truncation can find the set of truncation points which
attains the best image quality. However, this scheme
requires coefficient bit modeling and arithmetic
coding of all bit-planes of the code-blocks in the
entire image, which is the most computationally
intensive process in JPEG2000 coding. Furthermore,
in terms of working memory size, this scheme must
keep the compressed code-block data of the whole
image and a considerable amount of information
concerning coded data sizes and distortion values.
In contrast, our approach determines the truncation
point for each code-block as soon as its coding
process ends.
3.1 Proposed Optimized Truncation Scheme
The suggested architecture is composed of three
modules: code truncation, info truncation, and buffer
arbiter. The diagram below shows the architecture.
Fig.1. Proposed Optimized Truncation Block Diagram
The data needed to construct a JPEG2000 codestream
can be divided into two categories: code and info.
Code means the bytes generated by the entropy
encoder, while info stands for the code-block
information needed by the decoder. Info consists of
three types of data: the number of zero bit-planes, the
number of passes, and the cumulative length for each
pass in the code-block. Along with these data, the
rate-distortion slope of each pass is needed for
optimized truncation. It is worth noting that only the
truncated block length is needed in the output, not all
the cumulative lengths. These characteristics enable
some simplification in the implementation, which
will be discussed later.
In this design, separate handlers for code and info
simplify and clarify the architecture and minimize the
memory addressing effort. Although the packetization
process is not included in this encoder, the
independent info buffer enables easy integration
through such a friendly interface. On the other hand,
it demands two separate blocks of memory, which
imposes a burden on back-end design, especially
when multi-band encoders are used, as will be
discussed later. In addition, the buffer sizes differ for
code and info. To avoid buffer overflow, both buffers
must be twice the maximum capacity of a code-block.
For the default block size of 32x32, the code buffer
and info buffer can be chosen as 1024x16 and
128x16, since a block seldom generates more than
1024 bytes or 64 passes.
The proposed optimized truncation method can be
described as follows:
1. When block coding starts, the info handler gets
zero_bit_plane and pass_number.
2. For each code_byte, the code handler writes it into
the buffer and increases the address by one.
3. When a pass finishes, rd_slope gives out the
pass_length; the info handler writes it into the buffer
and increases the address by one.
4. When the block finishes, rd_slope gives out the
slopes for every pass one by one, in monotonically
decreasing order. Slopes of passes not suitable for
optimized truncation are set to zero.
5. The info handler compares these slopes with the
given threshold:
if slope > threshold
    optimized_pass_number = current_pass_number
else if slope == zero
    next
else if slope < threshold
    break
As noted in the previous section, the rate-distortion
slope is not needed for constructing a codestream, so
no memory access is needed.
6. The info handler moves the address to the
beginning of the block and writes the
optimized_pass_number.
7. The info handler increases the address by
optimized_pass_number, gets optimized_block_length
from the buffer, and asserts optimize_valid.
8. When optimize_valid is asserted, the code handler
gets optimized_block_length and shifts the address to
current_address - block_length + optimized_block_length.
3.2 Parallel Architecture for Three-Band Encoders
Since almost all JPEG2000 encoders employ three
entropy encoders to increase speed, we use three
copies of the code and info handlers to serve them,
with a single shared arbiter. Fig. 2 shows the block
diagram of the JPEG2000 encoder architecture. Since
the proposed optimized truncation scheme truncates
each block independently, just as the entropy encoder
codes each block independently, it can be seamlessly
integrated into the three-band parallel architecture.
Fig. 3 shows the three-band parallel optimized
truncation diagram.
Fig.2. JPEG2000 Encoder System Block Diagram
Fig.3. Proposed Architecture Block Diagram
Compared with the software architecture used in the
JPEG2000 Verification Model, the proposed
architecture provides the same rate-distortion
performance, because both choose the same
truncation points. At the same time, parallel
optimized truncation needs a much smaller buffer and
does not have to search the entire codestream. It is
worth pointing out that post-compression
optimization performed after the entire image is
encoded, if implemented in VLSI, suffers from a
large addressing effort.
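The per-block decision of steps 5 to 8 in Section 3.1 can be modeled in software roughly as follows. This is a behavioral sketch under stated assumptions, not the hardware design: the function names are hypothetical, the threshold is assumed to be non-negative, and a non-zero slope exactly equal to the threshold (a case the step list leaves open) is treated here as a stop condition.

```python
def truncate_block(code_bytes, pass_lengths, slopes, threshold):
    """Return (optimized_pass_number, optimized_block_length, kept_bytes).

    pass_lengths: cumulative length in bytes after each pass (step 3).
    slopes:       per-pass slopes in decreasing order, 0 for passes that are
                  not suitable truncation points (step 4).
    """
    optimized_pass_number = 0
    for current_pass_number, slope in enumerate(slopes, start=1):
        if slope > threshold:
            optimized_pass_number = current_pass_number
        elif slope == 0:
            continue  # unsuitable pass: look at the next one
        else:
            break     # slope at or below threshold: stop (equality is assumed)
    if optimized_pass_number == 0:
        optimized_block_length = 0  # nothing from this block is kept
    else:
        optimized_block_length = pass_lengths[optimized_pass_number - 1]
    return (optimized_pass_number, optimized_block_length,
            code_bytes[:optimized_block_length])

def shifted_address(current_address, block_length, optimized_block_length):
    """Step 8: move the write address back to the end of the kept bytes."""
    return current_address - block_length + optimized_block_length

code = bytes(range(40))
n, length, kept = truncate_block(code, [10, 20, 30, 40], [4.0, 0.0, 2.0, 0.5], 1.0)
print(n, length, len(kept))             # 3 30 30
print(shifted_address(140, 40, length)) # 130
```

Because the decision needs only the slopes of the block that has just finished, no search over other blocks is involved, which is what allows truncation to run alongside the entropy coder.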
3.3 Comparison with other schemes
Compared with other schemes, the proposed
architecture achieves low computational effort by
enabling a one-pass encoding process with parallel
rate control, and reduces the memory demand from
tile size (512x512x8) to three-block size
(32x32x8x3) without loss of rate-distortion
performance. Table 1 shows the comparison.
        Quantization          MCU-based Optimized    Proposed
        Method                Truncation             Architecture
Mem     N/A                   Tile-size              block-size x3
Area    N/A                   Depends on MCU         7.9 kgates
Time    Multi-pass encoding   Depends on MCU speed   Parallel one-pass
        process                                      encoding
PSNR    Good                  Best                   Best
Table 1. Comparison with other schemes
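The memory figures quoted in Section 3.3 can be checked with a few lines of arithmetic. The sketch assumes that 512x512x8 and 32x32x8x3 are bit counts (samples times 8 bits, times three blocks), and that the buffer sizes 1024x16 and 128x16 from Section 3.1 mean depth times word width in bits; both readings are interpretations, not statements from the paper.

```python
def bits_to_kib(bits):
    """Convert a bit count to kibibytes."""
    return bits / 8 / 1024

tile_bits = 512 * 512 * 8        # tile-size buffer of the software approach
proposed_bits = 32 * 32 * 8 * 3  # three code-blocks, one per entropy encoder

tile_kib = bits_to_kib(tile_bits)
proposed_kib = bits_to_kib(proposed_bits)
ratio = tile_bits / proposed_bits

# Buffer organization from Section 3.1, read as depth x width (assumed).
code_buffer_kib = bits_to_kib(1024 * 16)
info_buffer_kib = bits_to_kib(128 * 16)

print(tile_kib, proposed_kib, round(ratio, 2))  # 256.0 3.0 85.33
print(code_buffer_kib, info_buffer_kib)         # 2.0 0.25
```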
4 Conclusion
In this paper, a novel optimized truncation
architecture dedicated to JPEG2000 image coding is
proposed. The architecture runs concurrently with
code-block coding, including coefficient bit modeling
and arithmetic coding. While removing a considerable
part of the computational effort and working memory,
it achieves the same high image quality as the
software architecture. The proposed architecture is
thus suitable for use in an ASIC. It has been
integrated into a complete JPEG2000 encoder system
and verified on an FPGA.
5 Acknowledgement
This work was supported in part by the National
Basic Research Priorities Program of China (973
Program) Grant G2000036508 and the National
High Technologies Research and Development
Program of China (863 Program) Grant
2002AA1Z1420.
References:
[1] ISO/IEC JTC1/SC29/WG1 N2165, JPEG2000
Verification Model 9.1 (Technical Description), June
2001.
[2] D. Taubman, High Performance Scalable Image
Compression with EBCOT, IEEE Trans. Image
Processing, Vol. 9, No. 7, pp. 1158-1170, July 2000.
[3] Jin Li, Shawmin Lei, An Embedded Still Image
Coder with Rate-Distortion Optimization, IEEE
Trans. Image Processing, Vol. 8, No. 7, July 1999.
[4] ISO/IEC JTC1/SC29/WG1 N1684, JPEG2000
Verification Model 7.0 (Technical Description),
April 2000.