Parallel Design of JPEG2000 Image Compression Xiuzhen Huang CS Department UC Santa Barbara

April 30th, 2003
Outline
• Introduction to image compression
• JPEG2000 compression scheme
• Parallel implementation of JPEG2000
– On distributed-memory multiprocessors
– On shared-memory multiprocessors
• Conclusion
Introduction to Image Compression
Why do we need image compression?
File size of a small digital photo without compression:
1280 × 800 pixels × 3 (RGB) ≈ 3 M bytes
To speed up image transmission over the Internet and reduce image storage space, we need compression.
Introduction to Image Compression
Original picture: 3 M bytes
After JPEG2000 compression: 19 K bytes
• Compression ratio: more than 150:1
• No noticeable difference in picture quality
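As a rough check of these figures, the sketch below recomputes the raw size and the compression ratio. It assumes 3 bytes per pixel (one byte each for R, G, and B) and reads "19 K bytes" as 19 × 1024 bytes; the variable names are illustrative only.

```python
# Raw size of an uncompressed 1280 x 800 RGB photo.
# Assumption: one byte per color channel, so 3 bytes per pixel.
width, height, bytes_per_pixel = 1280, 800, 3
raw_bytes = width * height * bytes_per_pixel

# Assumption: "19 K bytes" means 19 * 1024 bytes.
compressed_bytes = 19 * 1024
ratio = raw_bytes / compressed_bytes

print(raw_bytes)     # 3072000
print(round(ratio))  # 158
```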
JPEG2000 International Standard
JPEG2000, the new international standard for image compression, is much more
efficient than the older JPEG international standard.
For the same compression ratio / bit rate / file size, the JPEG2000 picture
has much better quality.
[Figure: the original picture compared with JPEG and JPEG2000 versions at a compression ratio of 50:1. The JPEG version shows strong blockiness.]
JPEG2000 International Standard
JPEG2000 has a much higher computational complexity than JPEG,
especially for larger pictures.
A parallel implementation is needed to reduce compression time.
JPEG2000 Compression Scheme
Major steps of JPEG2000 image compression
Input → Wavelet transform → Blockwise partition → Coding of each block → Binary compressed data
• The wavelet transform uses most of the image compression time (>80%)
• A parallel implementation should therefore focus on the wavelet transform
JPEG2000 Compression Scheme
Brief Introduction to Wavelet Transform
Step 1: Horizontal wavelet transform of an image
for each row
do 1-D wavelet transform;
end
What is a 1-D wavelet transform?
JPEG2000 Compression Scheme
A simple example: 1-D Haar wavelet transform
One array of image data is passed through two filters, and each filter output is down-sampled by 2:
• Low-pass filter [1, 1]: average of neighboring pixels, giving the low-frequency coefficients (first half of the output)
• High-pass filter [1, -1]: difference of neighboring pixels, giving the high-frequency coefficients (second half of the output)
[Figure: the horizontal wavelet transform of each row splits the image into a Low half and a High half.]
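The Haar example can be sketched in a few lines of Python. This is a minimal illustration, not the JPEG2000 reference code: the function names `haar_1d` and `horizontal_transform` are hypothetical, the input length is assumed to be even, and dividing both the sum and the difference by 2 is an assumed normalization (the slide only gives the filters [1, 1] and [1, -1]).

```python
def haar_1d(values):
    """One level of a 1-D Haar transform (assumed normalization: divide by 2)."""
    if len(values) % 2 != 0:
        raise ValueError("this sketch assumes an even number of samples")
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    low = [(a + b) / 2 for a, b in pairs]   # low-pass [1, 1], down-sampled by 2
    high = [(a - b) / 2 for a, b in pairs]  # high-pass [1, -1], down-sampled by 2
    return low + high                       # low half first, then high half


def horizontal_transform(image):
    """Step 1: apply the 1-D transform to each row of a list-of-rows image."""
    return [haar_1d(row) for row in image]


print(haar_1d([4, 2, 6, 8]))  # [3.0, 7.0, 1.0, -1.0]
```

The first half of the output holds the pairwise averages and the second half holds the pairwise differences, matching the Low and High halves of the transformed row.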
JPEG2000 Compression Scheme
Wavelet Transform
Step 2: Vertical transform of image
for each column of the new image
do 1-D wavelet transform;
end
JPEG2000 Compression Scheme
[Figure: the horizontal wavelet transform of each row splits the image into Low and High halves; the vertical wavelet transform of each column then splits these into four subbands: Low-Low, Low-High, High-Low, and High-High.]
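The two steps, rows first and then columns, can be sketched as follows. This is only an illustration under stated assumptions: it uses the simple Haar filter with an assumed divide-by-2 normalization, the image is a list of equal-length rows with even width and height, and the names `haar_1d` and `haar_2d` are hypothetical.

```python
def haar_1d(values):
    """One level of a 1-D Haar transform (assumed normalization: divide by 2)."""
    if len(values) % 2 != 0:
        raise ValueError("this sketch assumes an even number of samples")
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [(a + b) / 2 for a, b in pairs] + [(a - b) / 2 for a, b in pairs]


def haar_2d(image):
    """Step 1: transform every row. Step 2: transform every column of the result."""
    rows_done = [haar_1d(row) for row in image]
    # zip(*rows_done) walks the intermediate image column by column.
    columns_done = [haar_1d(list(column)) for column in zip(*rows_done)]
    # Transpose back so the result is again a list of rows.
    return [list(row) for row in zip(*columns_done)]


flat = [[8] * 4 for _ in range(4)]
for row in haar_2d(flat):
    print(row)
```

For a constant image, only the top-left quadrant (low frequency in both directions) is non-zero; the other three quadrants hold differences and are all zero.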
Parallel Design of JPEG2000 Compression
Two Parallel Computing Architectures
Distributed-Memory Multiprocessors
• Each processor has its own memory module
• Processors communicate with each other over a high-speed network
• Programming tool: MPI (Message Passing Interface)
Shared-Memory Multiprocessors
• Has a single address space
• Allows processors to communicate through variables stored in a shared address space
• Programming tool: OpenMP
Parallel Implementation of JPEG2000 Compression
on Distributed-Memory Multiprocessors
Parallel Design of JPEG2000 Compression-DMP
Traditional Approach
• The image is first divided by rows into n regions.
• Each processor performs the 1-D horizontal wavelet transform on its region.
• Then the new image is divided by columns into n regions.
• Each processor performs the 1-D vertical wavelet transform on its region.
Switching from row regions to column regions means every processor needs data produced by the other processors, so this approach requires intensive data transmission among processors and has a very high network communication cost.
Parallel Design of JPEG2000 Compression-DMP
Tiling Approach
• The JPEG2000 international standard supports tile-based image compression.
• A large image is divided into several tiles, and each image tile is compressed independently.
[Figure: an image divided into nine tiles, labeled P1 to P9.]
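The tile partition itself can be sketched as below. This is an illustration only: the function name `split_into_tiles` is hypothetical, tiles are assumed to be square with side `tile_size`, and the image is a plain list of rows rather than the data structure used by any real JPEG2000 codec.

```python
def split_into_tiles(image, tile_size):
    """Cut a list-of-rows image into square tiles, scanning left to right, top to bottom.

    Returns a list of ((top, left), tile) pairs so each tile can be
    handed to a different processor and compressed on its own.
    """
    tiles = []
    for top in range(0, len(image), tile_size):
        for left in range(0, len(image[0]), tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            tiles.append(((top, left), tile))
    return tiles


image = [[4 * y + x for x in range(4)] for y in range(4)]
for origin, tile in split_into_tiles(image, 2):
    print(origin, tile)
```

Because each tile carries its own origin and pixel data, no tile needs data from a neighbor, which is what removes the inter-processor traffic of the traditional approach.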
Parallel Design of JPEG2000 Compression-DMP
MPI is chosen for the parallel implementation of JPEG2000 because the
JPEG2000 software is written in C, which MPI supports.
The basic framework is:
Parallel Design of JPEG2000 Compression-DMP
[Figure: compression time (sec) versus number of processors for a 512x512 image, with tile sizes 32 and 256.]
The figure shows the compression time for different tile sizes.
For each tile size, the compression time decreases as the number of processors increases.
Smaller tiles incur a larger computation overhead.
Parallel Design of JPEG2000 Compression-DMP
Note
• There is a jump between one process and two processes.
• When there is only one process, JPEG2000 compression is sequential.
• When two or more processes are involved in the program, Process 1 is responsible for collecting data, while the others process different tiles and send the processed data back to Process 1.
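The division of labor described in this note can be sketched as a simple assignment plan. This is not the MPI program from the talk: `assign_tiles` is a hypothetical helper, processes are numbered from 1 to match the slide (MPI ranks normally start at 0), and handing out tiles round-robin is an assumption.

```python
def assign_tiles(num_tiles, num_processes):
    """Map process numbers to the tile indices they compress.

    With one process, Process 1 compresses everything sequentially.
    With two or more, Process 1 only collects results and is left out
    of the plan; tiles go round-robin to Processes 2..num_processes.
    """
    if num_processes < 1:
        raise ValueError("need at least one process")
    if num_processes == 1:
        return {1: list(range(num_tiles))}
    workers = list(range(2, num_processes + 1))
    plan = {rank: [] for rank in workers}
    for tile in range(num_tiles):
        plan[workers[tile % len(workers)]].append(tile)
    return plan


for n in (1, 2, 3):
    print(n, assign_tiles(4, n))
```

Under these assumptions, two processes still leave a single worker compressing every tile, so the second process adds communication without sharing any compression work; the work is only split once a third process joins.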
Parallel Implementation of JPEG2000 Compression
on Shared-Memory Multiprocessors
Parallel Design of JPEG2000 Compression-SMP
A problem with the tile-based approach
Images compressed by JPEG, JPEG2000, and JPEG2000
with relatively small tiles.
Each tile is compressed independently, which causes
discontinuity across tile edges, also called blockiness.
Parallel Design of JPEG2000 Compression-SMP
• Another parallel architecture is the shared-memory multiprocessor (SMP).
• The excellent price-performance ratio of Intel-based SMPs makes such systems very popular in many data processing applications.
• Many programming tools are also available for shared-memory multiprocessors, such as OpenMP and Java Threads.
Parallel Design of JPEG2000 Compression-SMP
• On an SMP, we do not need to worry about data communication over a network, because the data is in shared memory, so there is no need for tile partitioning.
• Therefore, we can use the traditional data partitioning approach for the horizontal and vertical wavelet transforms.
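The row partitioning on shared memory can be sketched with threads that all write into one shared result. This is a stand-in, not the OpenMP code from the talk: Python threads share memory like an SMP but do not give a real speedup for this kind of loop, the names `row_bands` and `parallel_horizontal` are hypothetical, and the Haar filter with divide-by-2 normalization is assumed.

```python
from concurrent.futures import ThreadPoolExecutor


def haar_1d(values):
    """One level of a 1-D Haar transform (assumed normalization: divide by 2)."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    return [(a + b) / 2 for a, b in pairs] + [(a - b) / 2 for a, b in pairs]


def row_bands(num_rows, num_workers):
    """Split range(num_rows) into contiguous (first, last) bands, one per worker."""
    base, extra = divmod(num_rows, num_workers)
    bands, start = [], 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        if size:
            bands.append((start, start + size))
        start += size
    return bands


def parallel_horizontal(image, num_workers=4):
    """Each worker transforms its own band of rows and writes into a shared list."""
    result = [None] * len(image)

    def work(band):
        first, last = band
        for y in range(first, last):
            result[y] = haar_1d(image[y])  # no data is copied between workers

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(work, row_bands(len(image), num_workers)))
    return result


print(parallel_horizontal([[4, 2], [6, 8], [1, 1]], num_workers=2))
```

The vertical pass would partition by columns in the same way; since every worker reads from the same memory, nothing has to be redistributed between the two passes.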
Parallel Design of JPEG2000 Compression-SMP
• JPEG2000 image compression is implemented on a 4-processor SMP system using OpenMP directly.
• The speedup of the wavelet transform is only about 1.6, although it should be close to 4.
• Why?
Parallel Design of JPEG2000 Compression-SMP
It is found that the vertical wavelet transform takes more than 10 times as long as the horizontal transform.
But we know that the vertical and horizontal transforms have the same number of operations.
[Figure: comparison of the vertical and horizontal transforms.]
Parallel Design of JPEG2000 Compression-SMP
Cache Miss Problem
• In computer memory, the image data is stored line by line in raster-scan order (from left to right, from top to bottom).
• Each contiguous block of image data is brought into the cache from memory for the wavelet transform.
• In the horizontal wavelet transform, as the filter window moves, the data for the next step is usually already in the cache, so there are few cache misses.
Parallel Design of JPEG2000 Compression-SMP
Cache Miss Problem
• In the vertical wavelet transform, the filtering is done in the vertical direction, but the data is brought into the cache horizontally, so cache misses are very frequent.
[Figure: the filtering direction runs vertically while the data is loaded horizontally.]
Solution
Do the vertical transform of several columns at the same time, instead of column by column, to make full use of the data already in the cache.
This significantly reduces cache misses.
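The change in loop order can be sketched as follows. Python lists do not expose cache behavior, so this only shows the access pattern; the benefit described here would appear in a language such as C with a row-major array. The function names, the `block_width` parameter, and the Haar filter with divide-by-2 normalization are all assumptions made for illustration.

```python
def vertical_column_by_column(image):
    """Original order: finish one column before touching the next.

    Every step jumps to a different row, so consecutive accesses are far apart.
    """
    height, width = len(image), len(image[0])
    half = height // 2
    out = [[0.0] * width for _ in range(height)]
    for x in range(width):
        for k in range(half):
            out[k][x] = (image[2 * k][x] + image[2 * k + 1][x]) / 2
            out[half + k][x] = (image[2 * k][x] - image[2 * k + 1][x]) / 2
    return out


def vertical_blocked(image, block_width=4):
    """Improved order: sweep a block of adjacent columns together.

    For each pair of rows, all columns in the block are processed before
    moving down, so neighboring values from the same row are reused.
    """
    height, width = len(image), len(image[0])
    half = height // 2
    out = [[0.0] * width for _ in range(height)]
    for start in range(0, width, block_width):
        stop = min(start + block_width, width)
        for k in range(half):
            top, bottom = image[2 * k], image[2 * k + 1]
            for x in range(start, stop):
                out[k][x] = (top[x] + bottom[x]) / 2
                out[half + k][x] = (top[x] - bottom[x]) / 2
    return out


sample = [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9]]
print(vertical_blocked(sample, block_width=2))
```

Both functions perform the same arithmetic and return the same result; only the order in which the image is visited differs.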
Parallel Design of JPEG2000 Compression-SMP
[Figure: original vertical transform versus improved vertical transform.]
The vertical transform is sped up by about 10 times.
Parallel Design of JPEG2000 Compression-SMP
Using the improved vertical wavelet transform, the overall
speedup of the wavelet transform is now close to the number of
processors.
Conclusion
• Gave a brief review of JPEG2000 image compression.
• Discussed two approaches to the parallel implementation of JPEG2000 image compression: distributed-memory multiprocessors and shared-memory multiprocessors.
Questions?