RLE Image Compression

RLE is a natural candidate for compressing graphical data. A digital
image consists of small dots called pixels. Each pixel can be either one bit,
indicating a black or a white dot, or several bits, indicating one of several
colors or shades of gray. We assume that the pixels are stored in an array
called a bitmap in memory, so the bitmap is the input stream for the image.
Pixels are normally arranged in the bitmap in scan lines, so the
first bitmap pixel is the dot at the top left corner of the image, and the last
pixel is the one at the bottom right corner.
Compressing an image using RLE is based on the observation that if we
select a pixel in the image at random, there is a good chance that its
neighbors will have the same color. The compressor thus scans the bitmap
row by row, looking for runs of pixels of the same color. If the bitmap starts,
e.g., with 17 white pixels, followed by 1 black one, followed by 55 white
ones, etc., then only the numbers 17, 1, 55, . . . need be written on the output
stream.
The compressor assumes that the bitmap starts with white pixels. If this is
not true, then the bitmap starts with zero white pixels, and the output stream
should start with 0. The resolution of the bitmap should also be saved at the
start of the output stream.
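The scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names rle_encode_bilevel and rle_decode_bilevel are hypothetical, pixels are given as a flat list with 0 for white and 1 for black, and the resolution is written as two plain integers (width, height) in front of the run lengths.

```python
def rle_encode_bilevel(pixels, width, height):
    """Encode a bilevel bitmap (assumed: 0 = white, 1 = black) as run lengths.

    The output starts with the resolution, followed by alternating run
    lengths. The first run is always a white run, so it is 0 when the
    bitmap starts with a black pixel.
    """
    if len(pixels) != width * height:
        raise ValueError("bitmap size does not match the resolution")
    runs = []
    current = 0   # the compressor assumes the bitmap starts with white
    count = 0
    for p in pixels:
        if p == current:
            count += 1
        else:
            runs.append(count)   # close the current run
            current = p
            count = 1
    runs.append(count)
    return [width, height] + runs


def rle_decode_bilevel(stream):
    """Rebuild the pixel list from the stream produced above."""
    width, height, *runs = stream
    pixels = []
    color = 0                    # runs alternate, starting with white
    for n in runs:
        pixels.extend([color] * n)
        color ^= 1
    return pixels, width, height


# One scan line of 17 white, 1 black, and 55 white pixels.
bitmap = [0] * 17 + [1] + [0] * 55
print(rle_encode_bilevel(bitmap, 73, 1))   # [73, 1, 17, 1, 55]
```

A bitmap that starts with black pixels, such as [1, 1, 0], produces a leading 0 in the run lengths: [3, 1, 0, 2, 1].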
The size of the compressed stream depends on the complexity of the image.
The more detail, the worse the compression. However, Figure 1.4 shows
how scan lines go through a uniform area. A line enters through one point on
the perimeter of the area and exits through another point, and these two
points are not “used” by any other scan lines. It is now clear that the number
of scan lines traversing a uniform area is roughly
equal to half the length (measured in pixels) of its perimeter. Since the area
is uniform, each scan line contributes one number to the output stream. The
compression ratio of a uniform area thus roughly equals the ratio
(half the length of the perimeter) / (total number of pixels in the area).
RLE can also be used to compress grayscale images. Each run of pixels of
the same intensity (gray level) is encoded as a pair (run length, pixel value).
The run length usually occupies one byte, allowing for runs of up to 255
pixels. The pixel value occupies several bits, depending on the number of
gray levels (typically between 4 and 8 bits).
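As a rough illustration of the pair encoding, the following Python sketch turns a list of gray levels into (run length, pixel value) pairs. The function names and the max_run parameter are assumptions made for this example; runs longer than 255 are simply split so that each length would fit in one byte.

```python
def rle_encode_pairs(pixels, max_run=255):
    """Encode gray levels as (run length, pixel value) pairs.

    Runs longer than max_run are split into several pairs.
    """
    pairs = []
    i = 0
    while i < len(pixels):
        value = pixels[i]
        run = 1
        # Extend the run while the next pixel has the same gray level.
        while (i + run < len(pixels)
               and pixels[i + run] == value
               and run < max_run):
            run += 1
        pairs.append((run, value))
        i += run
    return pairs


def rle_decode_pairs(pairs):
    """Expand (run length, pixel value) pairs back into pixels."""
    pixels = []
    for run, value in pairs:
        pixels.extend([value] * run)
    return pixels


print(rle_encode_pairs([12] * 9 + [35, 76] + [87] * 3))
# [(9, 12), (1, 35), (1, 76), (3, 87)]
```

Note that an isolated pixel costs a full pair here, which is why the schemes discussed next store a count only for actual runs.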
Example: An 8-bit deep grayscale bitmap that starts with
12, 12, 12, 12, 12, 12, 12, 12, 12, 35, 76, 112, 67, 87, 87, 87, 5, 5, 5, 5, 5, 5, 1, . . .
is compressed into [9], 12, 35, 76, 112, 67, [3], 87, [6], 5, 1, . . . , where the
bracketed numbers indicate counts. The problem is to distinguish between a byte
containing a grayscale value (such as 12) and one containing a count (such
as [9]). Here are some solutions (although not the only possible ones):
1. If the image is limited to just 128 grayscales, we can devote one bit in
each byte to indicate whether the byte contains a grayscale value or a count.
2. If the number of grayscales is 256, it can be reduced to 255 with one value
reserved as a flag to precede every byte with a count. If the flag is, say, 255,
then the sequence above becomes
255, 9, 12, 35, 76, 112, 67, 255, 3, 87, 255, 6, 5, 1, . . . .
3. Again, one bit is devoted to each byte to indicate whether the byte
contains a grayscale value or a count. This time, however, these extra bits
are accumulated in groups of 8, and each group is written on the output
stream preceding (or following) the 8 bytes it “belongs to.”
Example: the sequence [9], 12, 35, 76, 112, 67, [3], 87, [6], 5, 1, . . . becomes
[10000010], 9, 12, 35, 76, 112, 67, 3, 87, [100.....], 6, 5, 1, . . . .
The total size of the extra bytes is, of course, 1/8 the size of the output
stream (they contain one bit for each byte of the output stream), so they
increase the size of that stream by 12.5%.
4. A group of m pixels that are all different is preceded by a byte with the
negative value −m. The sequence above is encoded by 9, 12, −4, 35, 76, 112,
67, 3, 87, 6, 5, ?, 1, . . . (the value of the byte with ? is positive or
negative depending on what follows the pixel of 1). The worst case is a
sequence of pixels (p1, p2, p2) repeated n times throughout the bitmap. It is
encoded as (−1, p1, 2, p2), four numbers instead of the original three! If each
pixel requires one byte, then the original three bytes are expanded into four
bytes. If each pixel requires three bytes, then the original three pixels
(comprising 9 bytes) are compressed into 1 + 3 + 1 + 3 = 8 bytes.
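Solutions 2, 3, and 4 can be compared on the sample sequence with a short Python sketch. It is not a definitive implementation: all function names are hypothetical, the sample is assumed to end right after the pixel 1 (so the last flag group is padded with zero bits and the byte marked ? becomes −1), and runs longer than 255 are not split.

```python
from itertools import groupby

FLAG = 255  # assumed flag value, as in solution 2; pixels must stay below it


def runs_of(pixels):
    """Collapse the pixels into (value, run length) items."""
    return [(value, len(list(group))) for value, group in groupby(pixels)]


def encode_with_flag(pixels):
    """Solution 2: every count is preceded by the reserved FLAG byte."""
    out = []
    for value, n in runs_of(pixels):
        if n == 1:
            out.append(value)
        else:
            out.extend([FLAG, n, value])
    return out


def encode_with_flag_groups(pixels):
    """Solution 3: one flag bit per byte (1 = count), written in groups
    of 8 bits that precede the 8 bytes they describe."""
    data, flags = [], []
    for value, n in runs_of(pixels):
        if n > 1:
            data.append(n)
            flags.append(1)
        data.append(value)
        flags.append(0)
    out = []
    for i in range(0, len(data), 8):
        bits = flags[i:i + 8]
        bits += [0] * (8 - len(bits))      # assumption: pad the last group
        out.append(int("".join(map(str, bits)), 2))
        out.extend(data[i:i + 8])
    return out


def encode_with_negative_counts(pixels):
    """Solution 4: -m precedes m pixels that are all different; a
    positive count n precedes one value repeated n times."""
    out, literals = [], []
    for value, n in runs_of(pixels):
        if n == 1:
            literals.append(value)
            continue
        if literals:
            out.extend([-len(literals)] + literals)
            literals = []
        out.extend([n, value])
    if literals:
        out.extend([-len(literals)] + literals)
    return out


sample = [12] * 9 + [35, 76, 112, 67] + [87] * 3 + [5] * 6 + [1]
print(encode_with_flag(sample))
print(encode_with_flag_groups(sample))
print(encode_with_negative_counts(sample))
print(encode_with_negative_counts([10, 20, 20]))   # worst case: [-1, 10, 2, 20]
```

For the sample, the three encoders produce 14, 13, and 13 numbers, respectively. The first group byte of solution 3 is 10000010 in binary (130), matching the example in the text.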
Two more points should be mentioned:
1. Since the run length cannot be 0, it makes sense to write the run length
minus one on the output stream. Thus the pair (3, 87) means a run of four
pixels with intensity 87. This way, a run can be up to 256 pixels long.
2. In color images it is common to have each pixel stored as three bytes,
representing the intensities of the red, green, and blue components of the
pixel. In such a case, runs of each color should be encoded separately. Thus
the pixels (171, 85, 34), (172, 85, 35), (172, 85, 30), and (173, 85, 33) should be
separated into the three sequences (171, 172, 172, 173, . . .), (85, 85, 85, 85,
. . .), and (34, 35, 30, 33, . . .). Each sequence should
be run-length encoded separately. This means that any method for
compressing grayscale images can be applied to color images as well.
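The separation into color components might look as follows in Python. The names rle_pairs and encode_color are hypothetical, and a simple (run length, value) pair encoding is assumed for each component; any of the grayscale methods above could be substituted.

```python
from itertools import groupby


def rle_pairs(values):
    """Encode one component sequence as (run length, value) pairs."""
    return [(len(list(group)), value) for value, group in groupby(values)]


def encode_color(pixels):
    """Split (r, g, b) pixels into three sequences and encode each one."""
    if not pixels:
        return ([], [], [])
    # zip(*pixels) regroups the triples into red, green, and blue sequences.
    return tuple(rle_pairs(plane) for plane in zip(*pixels))


pixels = [(171, 85, 34), (172, 85, 35), (172, 85, 30), (173, 85, 33)]
red, green, blue = encode_color(pixels)
print(red)     # [(1, 171), (2, 172), (1, 173)]
print(green)   # [(4, 85)]
print(blue)    # [(1, 34), (1, 35), (1, 30), (1, 33)]
```

In this small example only the green component forms a long run, which is exactly what is lost when the three bytes of each pixel are interleaved.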
Move-to-Front Coding
The basic idea of this method is to maintain the alphabet A of symbols
as a list where frequently occurring symbols are located near the front. A
symbol “s” is encoded as the number of symbols that precede it in this list.
Thus if A=(“t”, “h”, “e”, “s”,. . . ) and the next symbol in the input stream to
be encoded is “e”, it will be encoded as 2, since it is preceded by two
symbols. There are several possible variants to this method; the most basic
of them adds one more step: After symbol “s” is encoded, it is moved to the
front of list A. Thus, after encoding “e”, the alphabet is modified to A=(“e”,
“t”, “h”, “s”,. . . ). This move-to-front step reflects the hope that once “e” has
been read from the input stream, it will be read many more times and will, at
least for a while, be a common symbol. The move-to-front method is locally
adaptive, since it adapts itself to the frequencies of symbols in local areas of
the input stream.
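The basic variant can be sketched in Python as follows. The function name mtf_encode is an assumption of this example, and the alphabet is shortened to the four symbols shown above; every input symbol is assumed to be present in the alphabet.

```python
def mtf_encode(symbols, alphabet):
    """Encode each symbol as the number of symbols preceding it in the
    list, then move it to the front of the list."""
    table = list(alphabet)
    codes = []
    for s in symbols:
        index = table.index(s)             # symbols that precede s
        codes.append(index)
        table.insert(0, table.pop(index))  # the move-to-front step
    return codes, table


codes, table = mtf_encode("e", "thes")
print(codes)   # [2]
print(table)   # ['e', 't', 'h', 's']
```

Encoding “e” a second time right away would now yield 0, which is the behavior the move-to-front step is hoping for.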
The method thus produces good results if the input stream satisfies this hope,
i.e., if it contains concentrations of identical symbols (if the local frequency
of symbols changes significantly from area to area in the input stream). We
call this “the concentration property.” Here are two examples that illustrate
the move-to-front idea. Both assume the alphabet A=(“a”, “b”, “c”, “d”,
“m”, “n”, “o”, “p”).
1. The input stream “abcddcbamnopponm” is encoded as
C = (0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3) (Table 1.1a). Without the
move-to-front step it is encoded as C′ = (0, 1, 2, 3, 3, 2, 1, 0, 4, 5, 6, 7, 7, 6,
5, 4) (Table 1.1b). Both C and C′ contain codes in the same range [0, 7],
but the elements of C are smaller on average, since the input starts with a
concentration of “abcd” and continues with a concentration of “mnop”. (The
average value of C is 2.5, while that of C′ is 3.5.)
a abcdmnop 0
b abcdmnop 1
c bacdmnop 2
d cbadmnop 3
d dcbamnop 0
c dcbamnop 1
b cdbamnop 2
a bcdamnop 3
m abcdmnop 4
n mabcdnop 5
o nmabcdop 6
p onmabcdp 7
p ponmabcd 0
o ponmabcd 1
n opnmabcd 2
m nopmabcd 3
mnopabcd
(a)
a abcdmnop 0
b abcdmnop 1
c abcdmnop 2
d abcdmnop 3
d abcdmnop 3
c abcdmnop 2
b abcdmnop 1
a abcdmnop 0
m abcdmnop 4
n abcdmnop 5
o abcdmnop 6
p abcdmnop 7
p abcdmnop 7
o abcdmnop 6
n abcdmnop 5
m abcdmnop 4
(b)
a abcdmnop 0
b abcdmnop 1
c bacdmnop 2
d cbadmnop 3
m dcbamnop 4
n mdcbanop 5
o nmdcbaop 6
p onmdcbap 7
a ponmdcba 7
b aponmdcb 7
c baponmdc 7
d cbaponmd 7
m dcbaponm 7
n mdcbapon 7
o nmdcbapo 7
p onmdcbap 7
ponmdcba
(c)
a abcdmnop 0
b abcdmnop 1
c abcdmnop 2
d abcdmnop 3
m abcdmnop 4
n abcdmnop 5
o abcdmnop 6
p abcdmnop 7
a abcdmnop 0
b abcdmnop 1
c abcdmnop 2
d abcdmnop 3
m abcdmnop 4
n abcdmnop 5
o abcdmnop 6
p abcdmnop 7
(d)
Table 1.1: Encoding With and Without Move-to-Front.
2. The input stream “abcdmnopabcdmnop” is encoded as C = (0, 1, 2, 3, 4,
5, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7) (Table 1.1c). Without the move-to-front step it
is encoded as C′ = (0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7) (Table 1.1d).
The average of C is now 5.25, greater than that of C′, which is 3.5. The
move-to-front rule creates a worse result in this case, since the input does
not contain concentrations of identical symbols (it does not satisfy the
concentration property).
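The two examples can be checked with a small Python sketch that encodes each input with and without the move-to-front step and reports the average code value. The function name encode and its move_to_front parameter are assumptions made for this illustration.

```python
def encode(symbols, alphabet, move_to_front=True):
    """Return the list of codes; optionally skip the move-to-front step."""
    table = list(alphabet)
    codes = []
    for s in symbols:
        i = table.index(s)
        codes.append(i)
        if move_to_front:
            table.insert(0, table.pop(i))
    return codes


ALPHABET = "abcdmnop"
for text in ("abcddcbamnopponm", "abcdmnopabcdmnop"):
    with_mtf = encode(text, ALPHABET)
    without = encode(text, ALPHABET, move_to_front=False)
    print(text,
          sum(with_mtf) / len(with_mtf),
          sum(without) / len(without))
# abcddcbamnopponm 2.5 3.5
# abcdmnopabcdmnop 5.25 3.5
```

The first input satisfies the concentration property and benefits from the move-to-front step; the second does not, and its average code value grows from 3.5 to 5.25.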