Applications of the wavelet transform in image processing

Øyvind Ryan
Department of Informatics, University of Oslo
e-mail: oyvindry@ifi.uio.no∗

12 Nov 2004
Abstract

Mathematical methods applied in the most recent image formats are presented. First, the application of the wavelet transform in JPEG2000 is reviewed. JPEG2000 is a standard established by the same group that created the widely used JPEG standard, and it was established to solve some of the shortcomings of JPEG. Also presented are other recently established image formats that have wavelet transforms as part of their codec. The other components of modern image compression systems are also covered, together with the mathematical and statistical methods used.
1 Preliminaries
All image formats treated here use an image transform, quantization and coding, and all three are described for each of the formats in question. The transforms mentioned below can be separably extended to two dimensions for application to image processing. For simplicity we therefore state our results in one dimension only, and describe the separation process later. We use the value m for the block dimension (or, more precisely, the number of channels); for our wavelet transforms we will always have m = 2. We associate a transform with m filters, so that for m = 2 we have only two filters. In signal processing terms, these are to be interpreted as a low-pass and a high-pass filter.
If we denote the signal by x, the block dimension (or number of channels) describes the size of the block partitioning of the signal. JPEG applies only block transforms, meaning that if the signal is split into blocks x = (x[i]) (each x[i] a vector of dimension m), the transformed signal y = (y[i]) is given by
$$y[i] = A^* x[i]$$
for some m × m matrix A. In this way, one block has no influence on other blocks, for instance for pixels on block boundaries. This is what gives rise to the blockiness artifact in JPEG. Important block transforms are sketched below.
∗ Sponsored by the Norwegian Research Council, project nr. 160130/V30.
1.1 KLT (Karhunen-Loève Transform)
The KLT is the unique transform which decorrelates its input. To be precise, define the covariance matrix $C_X$ of a random vector $X$ by
$$C_X = E\left((X - \mu_X)(X - \mu_X)^*\right).$$
If the KLT transform is called $K$, then the random vector $Y = K^* X$ should have uncorrelated components, i.e. $C_Y = K^* C_X K$ is a diagonal matrix. This transform is what gives rise to principal component analysis (PCA). Among linear transforms, the KLT minimizes the MSE when keeping a given number of its principal components (when the principal components are ranked in decreasing order).
The drawback of the KLT is that we need to recompute the transform each time the statistics of the source change. By its nature, it cannot be separable either.
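As a small Python/numpy illustration (not part of the original text; the toy source and its dimensions are arbitrary assumptions), the KLT can be computed by diagonalizing an estimated covariance matrix:

import numpy as np

# Minimal sketch: estimate C_X from samples of a toy correlated source,
# take K as the eigenvector matrix of C_X, and verify that C_Y = K* C_X K
# is (numerically) diagonal. Sorting by decreasing eigenvalue gives PCA.
rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 8)) @ rng.standard_normal((8, 8))
C_X = np.cov(samples, rowvar=False)
eigvals, K = np.linalg.eigh(C_X)      # columns of K are eigenvectors of C_X
C_Y = K.T @ C_X @ K                   # diagonal up to numerical error
print(np.round(C_Y, 8))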
1.2 DFT (Discrete Fourier Transform)
The DFT is defined by the transform $A = \frac{1}{\sqrt{m}}\left(e^{2\pi i pq/m}\right)_{p,q}$, which is unitary. The DFT has an efficient implementation through the FFT. It is also separable.
One drawback of the DFT is that the transform works badly when the end points ($x_0$ and $x_{m-1}$) are far apart. If the full Fourier transform were applied in this case, many higher Fourier components would be introduced to compensate for this.
1.3 DCT (Discrete Cosine Transform)
The DCT is defined by
$$A = \left(c_q \cos\left(2\pi f_q \left(p + \tfrac{1}{2}\right)\right)\right)_{q,p},$$
where
$$c_q = \begin{cases}\sqrt{\tfrac{1}{m}} & \text{if } q = 0,\\[2pt] \sqrt{\tfrac{2}{m}} & \text{if } q \neq 0,\end{cases}\qquad f_q = \frac{q}{2m}.$$
The DCT can be constructed through the DFT by symmetrically extending the input sequence about the last point, applying the 2m-point DFT, and recovering the first m points afterwards. It is separable since the DFT is, and the FFT can be used as an efficient implementation of the DCT. There is no need to adapt the transform to the statistics of the source material (as with the KLT), and the DCT is a robust approximation to the KLT for natural image sources. It is used in JPEG, MPEG and CCITT H.261.
One drawback is that there is no way to use the DCT for lossless compression, since the outputs of the transform are not integers.
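The construction of the DCT through the FFT described above can be sketched as follows in Python/numpy (the function name and the normalization matching the c_q above are my own choices, not part of the original text):

import numpy as np

def dct_via_fft(x):
    # Symmetric extension about the last point, 2m-point DFT,
    # then recover the first m points (orthonormal DCT).
    m = len(x)
    y = np.concatenate([x, x[::-1]])                  # length 2m
    Y = np.fft.fft(y)
    k = np.arange(m)
    raw = 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * m)) * Y[:m])
    c = np.full(m, np.sqrt(2.0 / m))
    c[0] = np.sqrt(1.0 / m)                           # c_q from the definition above
    return c * raw

# Check against a direct evaluation of the DCT definition.
m = 8
x = np.random.randn(m)
q, p = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
A = np.cos(2 * np.pi * (q / (2 * m)) * (p + 0.5))
c = np.full(m, np.sqrt(2.0 / m)); c[0] = np.sqrt(1.0 / m)
assert np.allclose(dct_via_fft(x), (c[:, None] * A) @ x)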
2 JPEG (baseline)

2.1 Transform
The DCT is used as the transform of the input signal, after the input has been level shifted. If the elements of the input signal are 8 bits, level shifting means subtracting 128 from numbers in [0, 256), producing numbers in [−128, 128). The block dimension is always 8.
2.2 Quantization
Uniform midtread quantization (midtread means that 0 is the midpoint of the quantization interval about 0) is used. A quantization table, consisting of the step sizes of each coefficient quantizer, has size 8 × 8 and is emitted with the data itself. Smaller values in this table give less loss during encoding. The coefficients after quantization are called labels. The first label in a block is called the DC coefficient, the rest are called AC coefficients. Higher AC coefficients are typically rounded to 0, which is what provides good compression.
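A uniform midtread quantizer can be sketched as below (a Python/numpy illustration; the function names are my own, and the step size may be a scalar or an 8 × 8 table):

import numpy as np

def quantize(coeffs, step):
    # Midtread: round to the nearest multiple of the step size,
    # so the interval around 0 maps to the label 0.
    return np.round(coeffs / step).astype(int)

def dequantize(labels, step):
    return labels * step

block = np.random.randn(8, 8) * 50    # a toy block of transform coefficients
labels = quantize(block, 16.0)
reconstructed = dequantize(labels, 16.0)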
2.3 Coding
AC and DC coefficients are coded differently. For DC labels, the differences between successive labels are coded instead of the labels themselves; no such differences are taken for AC labels. Labels are partitioned into categories {0}, {−1, 1}, {−3, −2, 2, 3}, ..., of sizes $2^0, 2^1, 2^2, \ldots$, numbered 0, 1, 2, 3, .... The category numbers are Huffman coded, and coding is done in zig-zag scan order. Each label is coded as the Huffman code of its category number, followed by the value within the category (the number of bits required equals the category number). The zig-zag scan order ensures that many coefficients near the end of the traversal are zero; these are skipped with an end-of-block code.
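The category of a label is simply the number of bits needed for its magnitude, as the short helper below computes (a hypothetical name, used only for illustration):

def category(label):
    # 0 -> 0, +-1 -> 1, +-2..+-3 -> 2, +-4..+-7 -> 3, ...
    return abs(label).bit_length()

assert [category(v) for v in (0, -1, 1, -3, 2, 7)] == [0, 1, 1, 2, 2, 3]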
Drawbacks and limitations of the JPEG standard are
1. blockiness, due to splitting the image into independent parts,
2. blocks are always processed sequentially (there is no way to obtain other types of progression),
3. a lossy version only, since quantization is always performed on floating point values.
3 JPEG2000

3.1 Wavelet transform basics
We follow the introduction to subband transforms and wavelets used in [4]. A lighter introduction with more examples can be found in [5]; going through these two references together can be very helpful. Instead of applying a block transform, one can attempt a transform in which one block influences many other (surrounding) blocks. This may reduce the blockiness, even if the transformed signal at the end is partitioned into independent blocks anyway. We will consider a subband transform as our candidate for such a transform:
Definition 1 An (analysis) subband transform for the set of m × m matrices $\{A[i]\}_{i\in\mathbb{Z}}$ is defined by
$$y[n] = \sum_{i\in\mathbb{Z}} A^*[i]\, x[n-i].$$

Definition 2 A (synthesis) subband transform for the set of m × m matrices $\{S[i]\}_{i\in\mathbb{Z}}$ is defined by
$$x[n] = \sum_{i\in\mathbb{Z}} S[i]\, y[n-i].$$
Figure 1: One dimensional convolutional transform.
These definitions can be thought of as one dimensional convolutional transforms, as shown in Figure 1. The analysis transform produces a transformed signal from an input signal, while the synthesis transform should recover the input signal from the transformed signal. We say that we have perfect reconstruction if there exists a synthesis transform exactly inverting the analysis transform. JPEG2000 applies subband transforms with only two channels (m = 2), as opposed to JPEG's block transform with eight channels. Artifacts like blocking may thus be removed when using subband transforms, even though the number of channels is decreased.
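As a concrete sketch (my own example, assuming the Haar block transform as the only nonzero analysis/synthesis matrix), Definitions 1 and 2 can be implemented directly for m = 2, and synthesis then inverts analysis exactly:

import numpy as np

m = 2
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
A = {0: H}    # analysis matrices A[i]; only A[0] is nonzero here
S = {0: H}    # synthesis matrices S[i]; H is its own inverse

def analysis(x):
    # y[n] = sum_i A*[i] x[n - i]
    y = [np.zeros(m) for _ in x]
    for n in range(len(x)):
        for i, Ai in A.items():
            if 0 <= n - i < len(x):
                y[n] += Ai.conj().T @ x[n - i]
    return y

def synthesis(y):
    # x[n] = sum_i S[i] y[n - i]
    x = [np.zeros(m) for _ in y]
    for n in range(len(y)):
        for i, Si in S.items():
            if 0 <= n - i < len(y):
                x[n] += Si @ y[n - i]
    return x

blocks = [np.random.randn(m) for _ in range(4)]
assert all(np.allclose(a, b) for a, b in zip(blocks, synthesis(analysis(blocks))))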
3.2 Expressing transforms with filter banks
One can write
$$y[nm + q] = (x \star h_q)[mn], \quad 0 \le q < m, \qquad (1)$$
where the filter bank $\{h_q\}_{0\le q<m}$ is defined by $h_q[mi - j] = (A^*[i])_{q,j}$. This expresses the analysis operation through filter banks.
One can also write
$$x = \sum_{q=0}^{m-1} (\tilde{y}_q \star g_q),$$
where the filter bank $\{g_q\}_{0\le q<m}$ is defined by $g_q[mi + j] = (S[i])_{j,q}$, and where
$$\tilde{y}_q[i] = \begin{cases} y_q[i/m] & \text{if } m \text{ divides } i,\\ 0 & \text{otherwise.}\end{cases}$$
This expresses the synthesis operation through filter banks. When constructing subband transforms from wavelets, we will construct the transform by first finding a filter bank from the scaling function of the wavelet.
3.3 Expressing transforms in terms of vectors

Let $\langle\cdot,\cdot\rangle$ be the inner product in $\ell^2(\mathbb{Z})$. One can write $y_q[n] = \langle x, a_q^{(n)}\rangle$, where
$$a_q[k] = h_q^*[-k], \qquad a_q^{(n)}[k] = a_q[k - mn]$$
are the analysis vectors. This expresses the analysis operation in terms of the analysis vectors.
One can also write $x = \sum_{q=0}^{m-1}\sum_n y_q[n]\, s_q^{(n)}$, where
$$s_q[k] = g_q[k], \qquad s_q^{(n)}[k] = s_q[k - mn]$$
are the synthesis vectors. This expresses the synthesis operation in terms of the synthesis vectors.
3.4 Orthonormal and biorthogonal transforms

Definition 3 An orthonormal subband transform is a transform for which the synthesis vectors are orthonormal.

It is easy to show that for orthonormal subband transforms, the analysis and synthesis vectors are equal ($s_q = a_q$ for all q), and the analysis and synthesis matrices are reversed versions of one another, $A[i] = S[-i]$. Orthonormal subband transforms are the natural extension of orthonormal (unitary) block transforms.
If the analysis system is given by filters h0, h1, and the synthesis system is given by filters g0, g1, one can calculate the end-to-end transfer function of analysis combined with synthesis. In order to avoid aliasing, one will find that
$$\hat{h}_0(\omega + \pi)\hat{g}_0(\omega) + \hat{h}_1(\omega + \pi)\hat{g}_1(\omega) = 0, \qquad (2)$$
and if we in addition want perfect reconstruction,
$$\hat{h}_0(\omega)\hat{g}_0(\omega) + \hat{h}_1(\omega)\hat{g}_1(\omega) = 2. \qquad (3)$$
Example 1 Let us take a look at a popular definition of orthonormal subband transforms through filter banks. This is an alternative definition of what are called Quadrature Mirror Filters (QMF). Given a low-pass prototype f,
$$h_0[k] = g_0[-k] = f[k],$$
$$h_1[k] = g_1[-k] = (-1)^{k+1} f[-(k+1)].$$
Note that
$$g_1[n] = (-1)^{n+1} g_0[-(n-1)], \qquad (4)$$
or $\hat{g}_1(\omega) = e^{-i\omega}\,\hat{g}_0(\omega+\pi)^*$ in the Fourier domain. Note also that
$$h_1[n] = (-1)^{n+1} h_0[-(n+1)], \qquad (5)$$
or $\hat{h}_1(\omega) = e^{i\omega}\,\hat{h}_0(\omega+\pi)^*$ in the Fourier domain. These relations will be used when we construct biorthogonal transforms below.
It is not hard to see that the alias condition (2) is satisfied, and that perfect reconstruction (3) is satisfied if
$$|\hat{h}_0(\omega)|^2 + |\hat{h}_0(\omega \pm \pi)|^2 = 2. \qquad (6)$$
It is also not hard to show that the $\{h_0[i - 2n]\}_n$, $\{h_1[i - 2n]\}_n$ are orthonormal and give rise to an orthonormal subband transform if $f = h_0$ satisfies this.
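As a numerical illustration of condition (6) (my own example, with the Haar prototype f = h0 = (1, 1)/√2 as an assumption):

import numpy as np

h0 = np.array([1.0, 1.0]) / np.sqrt(2)

def H(w):
    # Fourier transform of h0 evaluated at the frequencies w
    return np.sum(h0 * np.exp(-1j * np.outer(w, np.arange(len(h0)))), axis=1)

w = np.linspace(0.0, np.pi, 5)
print(np.abs(H(w))**2 + np.abs(H(w + np.pi))**2)   # all entries close to 2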
Example 2 Lapped orthogonal transform with a cosine modulated filter bank: the analysis vectors are defined by
$$a_q[k] = c_q[k] = \begin{cases}\frac{1}{\sqrt{m}}\cos\left(2\pi f_q\left(k - \frac{m-1}{2}\right)\right) & \text{if } -m \le k < m,\\ 0 & \text{otherwise,}\end{cases}$$
where the cosine frequencies are $f_q = \frac{q + \frac{1}{2}}{2m}$, $0 \le q < m$. It is easy to verify that these analysis vectors give rise to an orthonormal subband transform, where all analysis matrices are 0, except for A[0] and A[1]. Such transforms are called lapped transforms.
One can get a more general family of lapped transforms by defining
$$a_q[k] = c_q[k]\, w[k],$$
where the windowing sequence w satisfies
$$w[k] = w[-1-k], \quad 0 \le k < m,$$
$$w^2[k] + w^2[m-1-k] = 2, \quad 0 \le k < m.$$
One can show that any window sequence satisfying these assumptions gives
rise to a new lapped orthonormal transform. These transforms work well,
and by choosing a windowing sequence wisely, one can obtain very good
frequency discrimination between the subbands.
The only thing we miss in the above example is linear phase (linear phase means that the filter sequence is symmetric or anti-symmetric about some point). Linear phase ensures that filter applications preserve the support of the filter, which is a very nice property to exploit in an implementation. As it turns out, we cannot get this in addition to orthonormality: one can show that there exist no two channel (m = 2) nontrivial, linear phase, finitely supported orthonormal subband transforms. We therefore extend our transforms to the following class.
Definition 4 A biorthogonal subband transform is a transform for which
$$\langle s_{q_1}^{(n_1)}, a_{q_2}^{(n_2)}\rangle = \delta[q_1 - q_2]\,\delta[n_1 - n_2], \quad 0 \le q_1, q_2 < m,\ n_1, n_2 \in \mathbb{Z}. \qquad (7)$$

Contrary to the case for orthonormal transforms, there exist two channel nontrivial, linear phase, finitely supported biorthogonal subband transforms. Biorthogonal transforms are important in image compression also because they may approximate orthonormal transforms well. It is not hard to see that biorthogonality is equivalent to perfect reconstruction.
Example 3 We will construct biorthogonal subband transforms in the following way: we start with filters h0, g0, and construct filters h1, g1 using equations (5) and (4). To get a biorthogonal transform, we must construct h0, g0 jointly so that equation (7) is satisfied.
Alias cancellation and perfect reconstruction in this case reduce to
$$\hat{h}_0(\omega+\pi)\hat{g}_0(\omega) = \hat{h}_0(\omega)^*\,\hat{g}_0(\omega+\pi)^*,$$
$$\hat{h}_0(\omega)\hat{g}_0(\omega) + \hat{h}_0(\omega+\pi)^*\,\hat{g}_0(\omega+\pi)^* = 2.$$
These equations are normally specified in another way, using functions associated with wavelets: $\hat{m}_0(\omega) = \frac{1}{\sqrt{2}}\hat{g}_0(\omega)$ and $\hat{\tilde{m}}_0(\omega) = \frac{1}{\sqrt{2}}\hat{h}_0(-\omega)$.
3.5 Multi-resolution analysis (MRA)
We turn now to the concept of constructing biorthogonal/orthonormal
transforms from wavelets.
Definition 5 A multi-resolution analysis (MRA) on $L^2(\mathbb{R})$ is a set of subspaces
$$\cdots \subset V^{(2)} \subset V^{(1)} \subset V^{(0)} \subset V^{(-1)} \subset V^{(-2)} \subset \cdots$$
satisfying the following properties.
(MR-1) $\overline{\bigcup_{m\in\mathbb{Z}} V^{(m)}} = L^2(\mathbb{R})$.
(MR-2) $\bigcap_{m\in\mathbb{Z}} V^{(m)} = \{0\}$.
(MR-3) $x(t) \in V^{(0)} \iff x(2^{-m}t) \in V^{(m)}$.
(MR-4) $x(t) \in V^{(0)} \iff x(t-n) \in V^{(0)}$.
(MR-5) There exists an orthonormal basis $\{\phi_n\}_{n\in\mathbb{Z}}$ for $V^{(0)}$ such that $\phi_n(t) = \phi(t-n)$. The function $\phi(t)$ is called the scaling function.
Since $V^{(0)} \subset V^{(-1)}$, MR-3 and MR-4 show that we can write
$$\phi(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} g_0[n]\,\phi(2t - n)$$
for some sequence g0, which is to be thought of as a low-pass prototype. This equation is called the two-scale equation. From the MR properties one can deduce from the two-scale equation that the vectors $\{g_0[i - 2n]\}_n$ are orthonormal. From Example 1 we can associate them with a function f, and obtain an orthonormal subband transform in this way. The high-pass filter obtained in this way, g1, can be used to construct a function
$$\psi(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} g_1[n]\,\phi(2t - n).$$
This function is called the mother wavelet, and has the nice property that its translated dilations $\psi_n^{(m)}(t) = \psi(2^{-m}t - n)$ are orthonormal functions spanning $L^2(\mathbb{R})$.
3.5.1 Interpretation of MRA in image processing
MRA has the following interpretation with respect to image processing. The input signal is represented as an element in $V^{(0)}$, by putting the components of the signal as coefficients for the translates of the scaling function:
$$x(t) = \sum_{n\in\mathbb{Z}} y_0^{(0)}[n]\,\phi(t - n).$$
Define $W^{(m)}$ as the span of the $\{\psi_n^{(m)}(t)\}_n$. It is not difficult to show that
1. $V^{(m)}$ and $W^{(m)}$ are orthogonal subspaces,
2. $V^{(m)} = V^{(m+1)} \oplus W^{(m+1)}$,
3. the coefficients in such a decomposition can be obtained by filtering with h0 and h1, respectively.
Note that points 1 and 2 imply that $V^{(0)} = \oplus_{i>0} W^{(i)}$.
We need to explain point 3 further. Equation (1) produces, through filtering with h0, h1, two sequences (i.e. the two polyphase components of the transformed signal) from an input signal. We let $y_0^{(0)}$ be the input signal, and let the two sequences produced be $y_0^{(1)}$ and $y_1^{(1)}$. Then one can show (by also using the two-scale equation) that
$$\underbrace{\sum_{n\in\mathbb{Z}} y_0^{(0)}[n]\,\phi(t-n)}_{\in V^{(0)}} = \underbrace{\sum_{n\in\mathbb{Z}} y_0^{(1)}[n]\,\phi_n^{(1)}(t)}_{\in V^{(1)}} + \underbrace{\sum_{n\in\mathbb{Z}} y_1^{(1)}[n]\,\psi_n^{(1)}(t)}_{\in W^{(1)}},$$
which explains point 3. This can be done iteratively, by writing
$$\underbrace{\sum_{n\in\mathbb{Z}} y_0^{(1)}[n]\,\phi_n^{(1)}(t)}_{\in V^{(1)}} = \underbrace{\sum_{n\in\mathbb{Z}} y_0^{(2)}[n]\,\phi_n^{(2)}(t)}_{\in V^{(2)}} + \underbrace{\sum_{n\in\mathbb{Z}} y_1^{(2)}[n]\,\psi_n^{(2)}(t)}_{\in W^{(2)}},$$
and so on. Therefore, we obtain wavelet coefficients (i.e. coefficients in $W^{(m)}$) by iterative applications of the filters h0, h1.
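A small sketch of this iteration, assuming the Haar filters so that each filtering step reduces to pairwise averaging and differencing (the function names are my own):

import numpy as np

def haar_split(y0):
    # One application of h0, h1 followed by downsampling by 2.
    even, odd = y0[0::2], y0[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

y0_0 = np.random.randn(16)        # coefficients of the signal in V(0)
y0_1, y1_1 = haar_split(y0_0)     # V(1) and W(1) coefficients
y0_2, y1_2 = haar_split(y0_1)     # V(2) and W(2) coefficients
# y0_2, y1_2 and y1_1 together represent the original signal,
# corresponding to V(0) = V(2) ⊕ W(2) ⊕ W(1).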
The interpretation of the wavelet subspaces $W^{(i)}$ in image processing is in terms of resolution: wavelet subspaces at higher indices can be thought of as image content at lower resolution, and the subspace $V^{(m)}$ for high m can be thought of as a base for obtaining a low resolution approximation of the image. If one writes
$$V^{(0)} = V^{(2)} \oplus W^{(2)} \oplus W^{(1)},$$
and decomposes a signal $x = y_0^{(2)} + y_1^{(2)} + y_1^{(1)}$ into components in these subspaces, one has in addition two approximations to the signal: $y_0^{(2)} + y_1^{(2)}$ and $y_0^{(2)}$, where the first one is a better approximation than the last one.
One can view these approximations as versions of the signal with higher
frequencies dropped, since the coefficients are obtained through filtering
with h0 , which can be viewed as a low-pass filter. The effect of dropping
high frequencies in the approximation can be seen especially at sharp
edges in an image. These get more blurred, since they can’t be represented
exactly at lower frequencies.
Transforms in image processing are two-dimensional, so we need a few comments on how we implement a separable transform. When a two-dimensional transform is separable, we can calculate it by applying the corresponding one-dimensional transform to the columns first, and then to the rows. When filtering, we have four possibilities:
1. low-pass filter to rows, followed by low-pass filter to columns (LL coefficients),
2. low-pass filter to rows, followed by high-pass filter to columns (HL coefficients),
3. high-pass filter to rows, followed by low-pass filter to columns (LH coefficients),
4. high-pass filter to rows, followed by high-pass filter to columns (HH coefficients).
When a separable transform is applied, only the LL coefficients may need further decomposition. When this decomposition is done at many levels, we get the subband decomposition in Figure 2. A similar type of decomposition is sketched for FBI fingerprint compression in section 3.12. The wavelet subspace decomposition in two dimensions has a similar form:
$$V^{(m)} = V^{(m+1)} \oplus W_{0,1}^{(m+1)} \oplus W_{1,0}^{(m+1)} \oplus W_{1,1}^{(m+1)},$$
and the mother wavelet basis functions are expressed in terms of the synthesis filters by
$$\psi_{0,1}(s_1, s_2) = 2\sum_{n_1,n_2\in\mathbb{Z}} g_0[n_1]\,g_1[n_2]\,\phi(2s_1 - n_1, 2s_2 - n_2),$$
$$\psi_{1,0}(s_1, s_2) = 2\sum_{n_1,n_2\in\mathbb{Z}} g_1[n_1]\,g_0[n_2]\,\phi(2s_1 - n_1, 2s_2 - n_2),$$
$$\psi_{1,1}(s_1, s_2) = 2\sum_{n_1,n_2\in\mathbb{Z}} g_1[n_1]\,g_1[n_2]\,\phi(2s_1 - n_1, 2s_2 - n_2).$$
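One level of the separable two-dimensional decomposition can be sketched as follows (again assuming the Haar filters; the function names are my own):

import numpy as np

def haar_split(x):
    # Low-pass and high-pass filtering plus downsampling along the first axis.
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def separable_level(img):
    L, H = haar_split(img)       # filter the columns first
    LL, LH = haar_split(L.T)     # then the rows of the low-pass part
    HL, HH = haar_split(H.T)     # and the rows of the high-pass part
    return LL.T, HL.T, LH.T, HH.T

img = np.random.rand(8, 8)
LL, HL, LH, HH = separable_level(img)   # each subband is 4 x 4
# At the next level, only LL is decomposed further (compare Figure 2).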
Figure 2: Passband structure for a two dimensional subband transform with three levels.
Example 4 With three different resolutions, our subband decomposition can be written
$$V^{(0)} = V^{(2)} \oplus W_{0,1}^{(2)} \oplus W_{1,0}^{(2)} \oplus W_{1,1}^{(2)} \oplus W_{0,1}^{(1)} \oplus W_{1,0}^{(1)} \oplus W_{1,1}^{(1)}$$
$$= LL_2 \oplus HL_2 \oplus LH_2 \oplus HH_2 \oplus HL_1 \oplus LH_1 \oplus HH_1.$$
Contributions from these subspaces appear in the same order as above in a JPEG2000 file, and including more of the subspaces results in higher resolution. We demonstrate this here with a computer generated file. Figures 3 through 9 show file sizes and images at all decomposition levels in this case. The subbands which are dropped are also shown graphically; these are blacked out. We see that we gradually lose resolution when dropping more and more wavelet subband spaces, but that even at the lowest resolution the image is recognizable, even though the file size is reduced from 105 kb to 17 kb. This is a very nice property for usage in web browsers. Note that these file sizes are calculated by replacing the contents of the dropped subbands with zeroes. They are close to the number of bytes we would get if the subbands were dropped in their entirety.
If we can find a wavelet with nice properties, most wavelet coefficients are
close to 0, and can thus be dropped in a lossy compression scheme.
3.5.2 Biorthogonal wavelets
If the orthonormality in (MR-5) is replaced with linear independence, we can follow the same reasoning as for orthonormality to create a mother wavelet function. The wavelet coefficient subspaces will not be orthogonal in this case. When iterating the filters g0, g1, we decompose into subspaces spanned by the scaling function and mother wavelet $\phi, \psi$. When we iterate the filters h0, h1, we decompose into subspaces spanned by the dual scaling function $\tilde\phi$ and dual mother wavelet $\tilde\psi$. The scaling and dual scaling functions differ only in the biorthogonal case. We can deduce a biorthogonal subband transform under appropriate conditions. Expressed with mother wavelets, the criterion for constructing a biorthogonal wavelet becomes
$$\langle \psi_n^{(m)}, \tilde\psi_{\tilde n}^{(\tilde m)}\rangle = \delta[n - \tilde n]\,\delta[m - \tilde m], \quad \forall\, n, m, \tilde n, \tilde m.$$
Figure 3: File with no loss, i.e. all wavelet subband spaces included. Its size is 105 kb.

Figure 4: We then remove the $W_{1,1}^{(1)}$ coefficients. Its size is 94 kb.

Figure 5: We then remove the $W_{1,0}^{(1)}$ coefficients also. Its size is 84 kb.

Figure 6: We then remove the $W_{0,1}^{(1)}$ coefficients also. Its size is 73 kb.

Figure 7: We then remove the $W_{1,1}^{(2)}$ coefficients also. Its size is 57 kb.

Figure 8: We then remove the $W_{1,0}^{(2)}$ coefficients also. Its size is 37 kb.

Figure 9: Finally, we remove the $W_{0,1}^{(2)}$ coefficients also. Its size is 17 kb.
Since we assume linear phase, we will work with biorthogonal wavelets
from now on.
Not all biorthogonal subband transforms and orthonormal transforms
give rise to wavelets. In order for a filter bank to give rise to a wavelet,
one can show that m̂0 (defined in example 3) must have a zero at π, and
that the number of zeroes affects the smoothness of the scaling function:
More zeroes means a smoother scaling function.
Daubechies found all FIR filters with N zeroes at π which give rise to orthonormal wavelets. Using similar calculations, Cohen, Daubechies and Feauveau [3] found FIR filters of odd length and linear phase, with delay-normalized transforms and with N, Ñ zeroes at π (these must both be even), which give rise to biorthogonal wavelets. These are
$$\hat{m}_0(\omega) = \cos\left(\frac{\omega}{2}\right)^N p_0(\cos\omega), \qquad \hat{\tilde{m}}_0(\omega) = \cos\left(\frac{\omega}{2}\right)^{\tilde N} \tilde{p}_0(\cos\omega),$$
where $p_0(x)\tilde{p}_0(x)$ is an arbitrary factorization of the polynomial (set $M = \frac{N+\tilde N}{2}$)
$$P(x) = \sum_{n=0}^{M-1}\binom{M+n-1}{n}\left(\frac{1-x}{2}\right)^n.$$
The factorization of the polynomial P(x) is not completely arbitrary, since we must group complex conjugate roots together to get real-valued filter coefficients.
Example 5 If we set $p_0(x) \equiv 1$, we obtain biorthogonal wavelets with filter banks consisting of dyadic fractions only. If we in addition set $N = \tilde N = 2$, we obtain the Spline 5/3 transform. This is used by JPEG2000 for lossless compression. One can show that Definition 1 simplifies to
$$y[n] = \sqrt{2}\left(\begin{pmatrix} -\frac{1}{8} & 0 \\ -\frac{1}{4} & -\frac{1}{8} \end{pmatrix} x[n+1] + \begin{pmatrix} \frac{3}{4} & \frac{1}{4} \\ -\frac{1}{4} & \frac{3}{4} \end{pmatrix} x[n] + \begin{pmatrix} -\frac{1}{8} & \frac{1}{4} \\ 0 & -\frac{1}{8} \end{pmatrix} x[n-1]\right)$$
in this case. Similarly, Definition 2 simplifies to
$$x[n] = \sqrt{2}\left(\begin{pmatrix} 0 & 0 \\ \frac{1}{4} & 0 \end{pmatrix} y[n+1] + \begin{pmatrix} \frac{1}{2} & -\frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} \end{pmatrix} y[n] + \begin{pmatrix} 0 & -\frac{1}{4} \\ 0 & 0 \end{pmatrix} y[n-1]\right).$$
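In implementations the 5/3 transform is usually realized through the lifting steps of the JPEG2000 reversible transform rather than by matrix convolutions. The sketch below uses circular extension at the boundaries for brevity (the standard itself uses symmetric extension), so it illustrates the idea rather than being a standard-conformant implementation:

import numpy as np

def fwd53(x):
    # Reversible 5/3 analysis via lifting: predict, then update.
    x = np.asarray(x, dtype=np.int64)
    xe, xo = x[0::2], x[1::2]
    d = xo - np.floor((xe + np.roll(xe, -1)) / 2).astype(np.int64)   # high-pass band
    s = xe + np.floor((np.roll(d, 1) + d + 2) / 4).astype(np.int64)  # low-pass band
    return s, d

def inv53(s, d):
    # Undo the lifting steps in reverse order.
    xe = s - np.floor((np.roll(d, 1) + d + 2) / 4).astype(np.int64)
    xo = d + np.floor((xe + np.roll(xe, -1)) / 2).astype(np.int64)
    x = np.empty(2 * len(s), dtype=np.int64)
    x[0::2], x[1::2] = xe, xo
    return x

x = np.random.randint(0, 256, size=16)
s, d = fwd53(x)
assert np.array_equal(inv53(s, d), x)   # exact (lossless) reconstruction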
Example 6 If we split the zeroes at π as equally as we can, and the factors in $p_0(x)$ and $\tilde p_0(x)$ equally also, and set $N = \tilde N = 4$, we obtain the wavelet JPEG2000 uses for lossy compression. This is the CDF 9/7 transform. For the lossless transform above, only A[−1], A[0], A[1], S[−1], S[0], S[1] were nonzero. For this lossy transform, only A[−2], A[−1], A[0], A[1], A[2], S[−2], S[−1], S[0], S[1], S[2] are nonzero.
5/3 and 9/7 above refer to the number of nonzero coefficients in the corresponding filters.
3.6 Transform

The wavelet transform, with different wavelet kernels. The transform may be skipped in its entirety (typically done in lossless compression), or applied fully or only partially. The DWT has efficient implementations, both in the lossy and the lossless case.
3.7 Quantization

Deadzone scalar quantization is the quantization method used by JPEG2000. Extensions to the standard open up for other quantization schemes.
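A deadzone scalar quantizer differs from the midtread quantizer used in JPEG in that the interval mapped to 0 is twice as wide as the other intervals. A minimal sketch (the function names and the reconstruction offset are my own choices):

import numpy as np

def deadzone_quantize(coeffs, step):
    # q = sign(c) * floor(|c| / step); the zero bin is (-step, step).
    return (np.sign(coeffs) * np.floor(np.abs(coeffs) / step)).astype(int)

def deadzone_dequantize(labels, step, r=0.5):
    # Reconstruct a fraction r into the decoded interval (r = 0.5 is common).
    return np.sign(labels) * (np.abs(labels) + r) * step * (labels != 0)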
3.8 Coding

MQ coding is used. The image is split into tiles (not smaller blocks as in JPEG), with a typical size of 512 × 512. Each tile is decomposed into constituent parts, using particular filter banks. Demands:
1. spatial random access into the bitstream,
2. a distortion scalable bitstream,
3. progression scalability,
4. resolution scalability.
3.9 Applications to video compression

MPEG4.
3.10 Applications to speech recognition

PCA (principal component analysis) is a common technique in speech recognition.
3.11 Applications to face recognition

Elastic bunch graph matching. Gabor wavelets.
Figure 10: Decomposition structure employed in the FBI fingerprint compression standard.
3.12 Applications to fingerprint compression and matching

The FBI uses its own standard [1] for compressing fingerprint images. Compression algorithms with wavelet-based transformations were selected in competition with compression using fractal transformations. The FBI's standard has similarities with the JPEG2000 standard, and especially with an extension to the JPEG2000 standard. It uses another subband decomposition, demonstrated in Figure 10.
Further decomposition of the LH, HL and HH bands like this may improve compression somewhat, since the effect of the filter bank application may be thought of as an "approximative orthonormalization process". The extension to the JPEG2000 standard also opens up for this type of more general subband decomposition. In the FBI's standard we may also use many different wavelets, with the coefficients of the corresponding filter banks signalled in the code-stream. The only constraint on the filters is that there should be no more than 32 nonzero coefficients. This is much longer than the filters used for lossy compression in JPEG2000 (9 nonzero coefficients). This may be necessary, since fingerprint images have many more sharp edges than most natural images. The FBI has its own set of recommended filters, while the JPEG2000 extension opens up for user-definable wavelet kernels as well. The coding is done differently in the FBI's standard: Huffman coding is used, with tables calculated on a per image basis. It turns out to be impossible (at least so far) to find a lossless compression algorithm for fingerprint images with a compression ratio of more than 2 : 1. Similar phenomena can be observed with JPEG2000: if an image with many sharp edges is wavelet-transformed, the compressed data may be larger than when the image is not wavelet-transformed (many small coefficients are obtained after wavelet transformation, and we would obtain compression in the lossy case only, since these would be quantized to 0 in the lossy case). JPEG2000 solves this by not making the wavelet transform mandatory: it can be applied down to a given level only, or skipped altogether.
3.13 Other applications of wavelets
Blending of multiple images [2]. If several lighting sources are combined,
the result may be obtained by combining a set of basis images. The
combination can be done very fast in the wavelet domain.
References

[1] WSQ Gray-scale Fingerprint Image Compression Specification.

[2] I. Drori and D. Lischinski. Fast multiresolution image operations in the wavelet domain. IEEE Transactions on Visualization and Computer Graphics, 9(3):395–412, 2003.

[3] A. Cohen, I. Daubechies, and J.-C. Feauveau. Biorthogonal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 45(5):485–560, June 1992.

[4] David S. Taubman and Michael W. Marcellin. JPEG2000: Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, 2002.

[5] Khalid Sayood. Introduction to Data Compression. Academic Press, 2000.