Blind Adaptive Sampling of Images
Zvi Devir and Michael Lindenbaum
Abstract—Adaptive sampling schemes choose different sampling
masks for different images. Blind adaptive sampling schemes use
the measurements that they obtain (without any additional or direct knowledge about the image) to wisely choose the next sample
mask. In this paper, we present and discuss two blind adaptive
sampling schemes. The first is a general scheme not restricted to
a specific class of sampling functions. It is based on an underlying
statistical model for the image, which is updated according to the
available measurements. A second less general but more practical
method uses the wavelet decomposition of an image. It estimates
the magnitude of the unsampled wavelet coefficients and samples
those with larger estimated magnitude first. Experimental results
show the benefits of the proposed blind sampling schemes.
Index Terms—Adaptive sampling, blind sampling, image representation, statistical pursuit, wavelet decomposition.
I. INTRODUCTION
IMAGE sampling is a method for extracting partial information from an image. In its simplest form, i.e., point sampling, every sample provides the value $f(x_i)$ of the image $f$ at some location $x_i$. In generalized sampling methods, every sample measures the inner product $c_i = \langle f, \varphi_i \rangle$ of the image $f$ with some sampling mask $\varphi_i$. Linear transforms may be considered as sampling processes, which use a rich set of masks, defined by their basis functions; examples include the discrete cosine transform (DCT) and the discrete wavelet transform (DWT).
A progressive sampling scheme is a sequential sampling
scheme that can be stopped after any number of samples, while
providing a “good” sampling pattern. Progressive sampling is
preferred when the number of samples is not predefined. This
is important, e.g., when we want to minimize the number of
samples and terminate the sampling process when sufficient
knowledge about the image is obtained.
Adaptive schemes generate different sets of sampling masks
for different images. Common sampling schemes (e.g., DCT
and DWT) are nonadaptive, in the sense of using a fixed set
of masks, regardless of the image. Adaptive schemes are potentially more efficient as they are allowed to use direct information from the image in order to better construct or choose the
set of sampling masks. For example, using a directional DCT
basis [21] that is adapted to the local gradient of an image patch
is more efficient than using a regular 2-D DCT basis. Other
examples of progressive adaptive sampling methods are the
matching pursuit and the orthogonal matching pursuit (OMP)
[11], [12], [14], which sample the image with a predefined set
of masks and choose the best mask according to various criteria.
In this paper, we consider a class of schemes that we refer to
as blind adaptive sampling schemes. Those schemes do not have
direct access to the image and make exclusive use of indirect information gathered from previous measurements for generating
their adaptive masks. A geological survey for underground oil
reserves may be considered as an example of such a scheme.
Drilling can be regarded as point sampling of the underground
geological structure, and the complete survey can be regarded
as a progressive sampling scheme, which ends when either oil
is found or funds are exhausted. This sampling scheme is blind
in the sense that the only data directly available to the geologists
are the measurements. The sampling scheme is adaptive in the
sense that the previous measurements are taken into account
when choosing the best spot for the next drill.
While blindness to the image necessarily makes the samples
somewhat less efficient, it implies that the masks are dependent only on the previous measurements. Therefore, storing the
various sampling masks is unnecessary. The sampling scheme
functions as a kind of decoder, i.e., given the same set of measurements, it will produce the corresponding set of sampling
masks. This property of blind sampling schemes stems from the
deterministic nature of the sampling process. Similar ideas stand
at the root of various coding schemes, such as adaptive Huffman
codes (e.g., [20]) or binary compression schemes, which construct the same online dictionary during encoding and decoding
(e.g., the Lempel–Ziv–Welch algorithm).
In this paper, we present two blind adaptive schemes, which
can be considered as generalizations of the adaptive farthest
point strategy (AFPS), a progressive blind adaptive sampling
scheme proposed in [6].
The first method, called statistical pursuit, generates an optimal (in a statistical sense) set of masks and can be restricted
to dictionaries of masks. The scheme maintains an underlying
statistical model of the image, derived from the available information about the image (i.e., the measurements). The model is
updated online as additional measurements are obtained. The
scheme uses this statistical model to generate an optimal mask
that provides the most additional information possible from the
image.
The second method, called blind wavelet sampling, works
with the limited yet large set of wavelet masks. It relies on an
empirical statistical model of wavelet coefficients. Using this
model, it estimates the magnitude of unknown wavelet coefficients from the sampled ones and samples the coefficients with
larger estimated magnitude first. Our experiments indicate that
this method collects information about the image much more
efficiently than alternative nonadaptive methods.
Fig. 1. One iteration of the nonadaptive FPS algorithm: (a) the sampled points
(sites) and their corresponding Voronoi diagram; (b) the candidates for sampling
(Voronoi vertices); (c) the farthest candidate chosen for sampling; and (d) the
updated Voronoi diagram.
The remainder of this paper is organized as follows: In Section II, the AFPS scheme is described as a particular case of a progressive blind adaptive sampling scheme. Section III presents the statistical pursuit scheme, and Section IV presents the blind wavelet sampling scheme. Experimental results are shown in Section V. Section VI concludes this paper and proposes related research paths.
II. AFPS ALGORITHM
We begin by briefly describing an earlier blind adaptive
sampling scheme [6], which inspired the work in this paper.
The algorithm, denoted as AFPS, is based on the farthest point
strategy (FPS), a simple, progressive, but nonadaptive point
sampling scheme.
A. FPS Algorithm
In the FPS, a point in the image domain is progressively
chosen, such that it is the farthest from all previously sampled
points. This intuitive rule leads to a truly progressive sampling
scheme, providing after every single sample a cumulative set of
samples, which is uniform in a deterministic sense and becomes
continuously denser [6].
To efficiently find its samples, the FPS scheme maintains a
Voronoi diagram of the sampled points. A Voronoi diagram [2]
is a geometric structure that divides the image domain into cells
corresponding to the sampled points (sites). Each cell contains
exactly one site and all points in the image domain, which are
closer to the site than to all other sites. An edge in the Voronoi
diagram contains points equidistant to two sites. A vertex in the
Voronoi diagram is equidistant to three sites (in the general case)
and is thus a local maximum of the distance function. Therefore, in order to find the next sample, it is sufficient to consider
only the Voronoi vertices (with some special considerations for
points on the image boundary).
After each sampling iteration, the new sampled point becomes a site, and the Voronoi diagram is accordingly updated.
Fig. 1 describes one iteration of the FPS algorithm. Note that,
because the FPS algorithm is nonadaptive, it produces a uniform sampling pattern regardless of the given image content.
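To make the selection rule concrete, here is a minimal Python sketch of FPS; it searches all grid points by brute force instead of maintaining the Voronoi diagram that the scheme uses for efficiency, so it is illustrative rather than the paper's implementation (function and variable names are ours).

```python
import numpy as np

def fps_points(shape, n_samples, seed_point=(0, 0)):
    """Brute-force farthest point strategy (FPS) on a pixel grid.

    Every grid point is a candidate, which is O(n * H * W); the paper
    restricts the search to Voronoi vertices instead.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

    sites = [np.asarray(seed_point, dtype=float)]
    # Distance from every grid point to its nearest site so far.
    dist = np.linalg.norm(grid - sites[0], axis=1)
    for _ in range(n_samples - 1):
        farthest = grid[np.argmax(dist)]   # farthest point from all sites
        sites.append(farthest)
        dist = np.minimum(dist, np.linalg.norm(grid - farthest, axis=1))
    return np.array(sites)

# The first samples spread uniformly over the domain and become denser.
print(fps_points((64, 64), 5))
```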
B. AFPS Algorithm

A more efficient, adaptive sampling scheme is derived from the FPS algorithm. Instead of using the Euclidean distance as a priority function, the geometrical distance is used along with either the estimated local variance of the image intensities or the equivalent local bandwidth. The resulting algorithm, denoted AFPS, samples the image more densely in places where it is more detailed and more sparsely where it is relatively smooth. Fig. 2 shows the first 1024, 4096, and 8192 sampling points produced by the AFPS algorithm for the cameraman image, using a priority function that combines $d(v)$, the distance of the candidate $v$ to its closest sampled neighbors, with $\hat{\sigma}(v)$, a local variance estimate.

Fig. 2. First 1024, 4096, and 8192 point samples of the cameraman image, taken according to the AFPS scheme.
A variant of the AFPS scheme, designed for range sampling
using a regularized grid pattern, was presented in [5].
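As a rough illustration, the AFPS selection step might look as follows. The product of distance and local variance below is an assumed combination, since the paper's exact priority function is not reproduced here; names are ours.

```python
import numpy as np

def afps_next(candidates, dist, sigma_hat):
    """Pick the next AFPS sample among candidate points.

    dist[i] is the distance of candidate i to its closest sampled
    neighbors, and sigma_hat[i] is a local variance estimate built from
    the measurements around it. Multiplying the two is one plausible way
    to combine them (an assumption, not the paper's exact formula).
    """
    priority = dist * sigma_hat   # large gaps and high detail both raise priority
    return candidates[np.argmax(priority)]
```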
III. STATISTICAL PURSUIT
In this section, we propose a sampling scheme based on a direct statistical model of the image. In contrast to point sampling
schemes such as the AFPS, this scheme may choose sampling
masks from an overcomplete family of basis functions or calculate optimal masks. The scheme updates an underlying statistical model for the image as more information is gathered during
the sampling process.
A. Simple Statistical Model for Images
Images are often regarded as 2-D arrays of scalar values (gray
levels or intensities). Yet, it is clear that arbitrary arrays of values
do not resemble natural images. Natural images contain structures that are difficult to explicitly define. Several attempts were
made to formulate advanced statistical models, which can approximate such global structures and provide good prediction
for missing parts [9]. Still, the local structure is easier to predict, and there exist some low-level statistical models, which
model local behavior fairly well.
Here, we consider a common and particularly simple second-order local statistical model for images. We regard the image $f$ as a Gaussian random vector, i.e.,

$$f \sim N(\mu, C) \qquad (1)$$

where $\mu$ is the mean vector and $C$ is the covariance matrix. For simplicity and without loss of generality, we assume $\mu = 0$.
Two neighboring pixels in an image often have similar gray-level values (colors). Statistically speaking, their colors have a strong positive correlation, which weakens as their distance grows. The exponential correlation model [10], [13] is a second-order stationary model based on this observation. According to this model, the covariance between the intensities of two arbitrary pixels $x_i$ and $x_j$ exponentially depends on their distance, i.e.,

$$C_{ij} = \sigma^2 e^{-\alpha \|x_i - x_j\|} \qquad (2)$$

where $\sigma^2$ is the variance of the intensities and $\alpha$ determines how quickly the correlation drops.
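For concreteness, a short sketch that builds the covariance matrix of model (2) for a small patch; the parameter values are arbitrary illustrations.

```python
import numpy as np

def exponential_covariance(h, w, sigma2=1.0, alpha=0.1):
    """Covariance matrix of model (2) for an h-by-w patch:
    cov(f(x_i), f(x_j)) = sigma2 * exp(-alpha * ||x_i - x_j||)."""
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return sigma2 * np.exp(-alpha * d)

C = exponential_covariance(8, 8)   # 64 x 64 covariance of a small patch
```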
B. Statistical Reconstruction

We consider linear sampling, where the $i$th sample is generated as an inner product of the image $f$ with some sampling mask $\varphi_i$. That is, $c_i = \varphi_i^T f$, where both the image and the mask are regarded as column vectors. A sampling process provides us with a set of sampling masks $\{\varphi_i\}_{i=1}^{n}$ and their corresponding measurements $\{c_i\}_{i=1}^{n}$. We wish to reconstruct an image from this partial information.

Let $\Phi = [\varphi_1, \ldots, \varphi_n]$ be a matrix containing the masks as its columns, and let $c = (c_1, \ldots, c_n)^T$ be a column vector of the measurements. Using those matrix notations, $c = \Phi^T f$.
The underlying statistical model of the image may be used to obtain an image estimate $\hat{f}$ based on the measurements $c$. For the second-order statistical model (1), the optimal estimator is linear [15]. The linear estimator is optimal in the $L_2$ sense, i.e., it minimizes the mean square error (MSE) between the true and reconstructed images, $\mathrm{MSE} = E[\|f - \hat{f}\|^2]$.

It is not hard to show that the image estimate $\hat{f}$, its covariance $\hat{C}$, and its MSE can be written in matrix form as

$$\hat{f} = C \Phi (\Phi^T C \Phi)^{-1} c \qquad (3)$$

$$\hat{C} = C - C \Phi (\Phi^T C \Phi)^{-1} \Phi^T C \qquad (4)$$

$$\mathrm{MSE} = \mathrm{trace}(\hat{C}) \qquad (5)$$

assuming $\{\varphi_i\}$ is linearly independent.
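A direct transcription of (3)-(5) might look as follows; this is a sketch without numerical safeguards (e.g., regularizing the Gram matrix), and the names are ours.

```python
import numpy as np

def statistical_reconstruction(C, Phi, c):
    """Equations (3)-(5): linear MMSE estimate from measurements
    c = Phi^T f for a zero-mean Gaussian image with covariance C.
    Phi holds the sampling masks as columns."""
    G = Phi.T @ C @ Phi                  # Gram matrix of the masks under C
    K = C @ Phi @ np.linalg.inv(G)
    f_hat = K @ c                        # (3) image estimate
    C_hat = C - K @ Phi.T @ C            # (4) posterior covariance
    mse = np.trace(C_hat)                # (5) expected squared error
    return f_hat, C_hat, mse
```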
It should be noted that the statistical reconstruction is analogous to the algebraic one. The algebraic (consistent) reconstruction of an image from its measurements is $\hat{f} = \Phi (\Phi^T \Phi)^{-1} c$ (assuming the linear independence of $\{\varphi_i\}$). That is, the algebraic reconstruction is a statistical reconstruction assuming the pixels are independent and identically distributed, i.e., $C = \sigma^2 I$. Searching for a new mask that minimizes the algebraic reconstruction error $\|f - \hat{f}\|^2$ leads to the OMP [14]. Analogously, searching for a new mask that minimizes the expected error $E[\|f - \hat{f}\|^2]$ leads to the statistical pursuit, which is discussed next.
C. Reconstruction Error Minimization

A greedy sampling strategy is to find a sampling mask that minimizes the MSE of $\hat{f}$.

If $\varphi$ is a linear combination of the previous masks, it is trivial to show that the MSE does not change (as no additional information is gained) and $\Delta\mathrm{MSE}(\varphi) = 0$. Therefore, we can assume that the new mask is linearly independent of $\Phi$.

Proposition: The reduction of the MSE, given a new mask $\varphi$ (linearly independent of $\Phi$), is

$$\Delta\mathrm{MSE}(\varphi) = \frac{\|\hat{C} \varphi\|^2}{\varphi^T \hat{C} \varphi} \qquad (6)$$

where $\hat{C}$ is the covariance of $\hat{f}$, defined in (4).

Proof: See Appendix A.
The aforementioned proposition justifies selection criteria for the next best mask(s) in several scenarios.

For the sake of brevity, we define $\hat{f}_n$ and $\hat{C}_n$ as the estimated image and its covariance after sampling the image with masks $\varphi_1, \ldots, \varphi_n$. $\mathrm{MSE}_n$ is the MSE after sampling $n$ masks, and $\Delta\mathrm{MSE}_n(\varphi)$ is the expected reduction of the MSE given an arbitrary mask $\varphi$ as the next selected mask. Without a subscript, $\hat{f}$ and $\hat{C}$ shall refer to the estimated image and its covariance, given all known masks.

$\hat{C}$ is a positive-semidefinite symmetric matrix, with the previous masks as its eigenvectors with corresponding zero eigenvalues. $\hat{C}$ may be regarded as the "portion" of the covariance matrix $C$ that is statistically independent of the previous masks.
D. Progressive Sampling Schemes

1) Predetermined Family of Masks: The masks are often selected from a predefined set of masks (a dictionary) $D$, such as the DCT or DWT basis. In such cases, the next best mask is determined by calculating $\Delta\mathrm{MSE}(\varphi)$ for each mask in the dictionary and choosing $\varphi^* = \arg\max_{\varphi \in D} \Delta\mathrm{MSE}(\varphi)$.
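A sketch of this dictionary search, using expression (6) for the expected MSE reduction (names are ours):

```python
import numpy as np

def best_dictionary_mask(C_hat, dictionary):
    """Greedy step of statistical pursuit over a dictionary.

    Evaluates the MSE reduction (6) for every candidate mask and returns
    the index of the maximizer. `dictionary` holds masks as columns."""
    best, best_gain = None, -np.inf
    for j in range(dictionary.shape[1]):
        phi = dictionary[:, j]
        denom = phi @ C_hat @ phi
        if denom <= 1e-12:          # mask carries no new information
            continue
        gain = np.sum((C_hat @ phi) ** 2) / denom   # Delta-MSE of (6)
        if gain > best_gain:
            best, best_gain = j, gain
    return best, best_gain
```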
2) Parametrized Masks: Suppose the mask $\varphi(\theta)$ depends on several parameters $\theta = (\theta_1, \ldots, \theta_k)$. We can differentiate (6) by the parameters of the mask, solve the resulting system of equations $\partial \Delta\mathrm{MSE}(\varphi(\theta)) / \partial \theta_i = 0$, and check all the extrema masks for the one that maximizes $\Delta\mathrm{MSE}(\varphi)$. However, solving the resulting system of equations is not a trivial task.
3) Optimal Mask: If the next mask is not restricted, the optimal mask is an eigenvector of $\hat{C}$ corresponding to its largest eigenvalue (see Appendix B for details). We shall call the eigenvector corresponding to the largest eigenvalue of $\hat{C}$ the largest eigenvector of $\hat{C}$ and mark it as $v_1(\hat{C})$.

4) Optimal Set of Masks: The largest eigenvector of $\hat{C}_n$ is the optimal $(n+1)$th mask. If that mask is chosen, it becomes an eigenvector of $\hat{C}_{n+1}$ with a corresponding eigenvalue of 0. Therefore, the $(n+2)$th optimal mask is the largest eigenvector of $\hat{C}_{n+1}$, which is the second largest eigenvector of $\hat{C}_n$. Collecting $k$ optimal masks together is equivalent to finding the $k$ largest eigenvectors of $\hat{C}_n$.
If we begin with no initial masks, the optimal first $k$ masks are simply the $k$ largest eigenvectors of $C$. Those are, not surprisingly, the first $k$ components obtained from the principal component analysis (PCA) [7], [8] of $C$, i.e., the covariance of the image $f$.
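In code, computing the unrestricted optimal masks reduces to an eigendecomposition (a NumPy sketch):

```python
import numpy as np

def optimal_masks(C_hat, k):
    """The k best unrestricted masks are the k largest eigenvectors of
    C_hat; with no initial masks (C_hat = C) these are the first k PCA
    components of the image model."""
    _, eigvecs = np.linalg.eigh(C_hat)   # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]       # columns: largest eigenvectors first
```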
E. Adaptive Progressive Sampling
Using the MSE minimization criterion, it is easy to construct a nonadaptive progressive sampling scheme that selects the optimal mask, i.e., the one that minimizes the estimated error at each step. Such a nonadaptive sampling scheme makes use of a fixed underlying statistical model for the image. However, as we gain information about the image, we can update the underlying model accordingly.
In the exponential model of (2), the covariance between two pixels depends only on their spatial distance. However, pairs associated with a large intensity dissimilarity are more likely to belong to different segments in the image and to thus be less correlated. If we have some image estimate, a pair of pixels may be characterized by their spatial distance and their estimated intensity dissimilarity.

Using the partial knowledge about the image obtained during the iterative sampling process, we now redefine the exponential correlation model (2). The new model is based on both the spatial distance and the estimated intensity distance, i.e.,

$$C_{ij} = \sigma^2 e^{-\alpha \|x_i - x_j\| - \beta |\hat{f}(x_i) - \hat{f}(x_j)|} \qquad (7)$$
Such correlation models are implicitly used in nonlinear filters such as the bilateral filter [19]. Naturally, other color-aware
models can be used. For example, instead of taking Euclidean
distances, we can take geodesic distances on the image color
manifold [17].
Introducing the model of (7) into a progressive sampling
scheme, we can construct an adaptive sampling scheme presented as Algorithm 1.
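Since the Algorithm 1 listing is not reproduced here, the following Python sketch shows one plausible rendering of the loop just described: pick the largest eigenvector of the posterior covariance, sample, re-estimate the image, and rebuild the covariance with the intensity-aware model (7). The parameter values and the `measure` callback are assumptions for illustration.

```python
import numpy as np

def statistical_pursuit(measure, C0, coords, n_samples, alpha=0.1, beta=0.05):
    """Sketch of the adaptive progressive sampling loop.
    measure(phi) returns the inner product of the hidden image with the
    mask phi; coords holds the pixel coordinates (N x 2)."""
    C = C0.copy()
    N = C0.shape[0]
    Phi = np.empty((N, 0))
    c = np.empty(0)
    spatial = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    f_hat = np.zeros(N)
    for _ in range(n_samples):
        # Posterior covariance given the masks sampled so far, as in (4).
        if Phi.shape[1]:
            G_inv = np.linalg.pinv(Phi.T @ C @ Phi)
            C_hat = C - C @ Phi @ G_inv @ Phi.T @ C
        else:
            C_hat = C
        # Optimal unrestricted mask: largest eigenvector of C_hat.
        _, V = np.linalg.eigh(C_hat)
        phi = V[:, -1]
        # Sample the image with the new mask.
        Phi = np.column_stack([Phi, phi])
        c = np.append(c, measure(phi))
        # Re-estimate the image, as in (3), and update the model per (7).
        f_hat = C @ Phi @ np.linalg.pinv(Phi.T @ C @ Phi) @ c
        intensity = np.abs(f_hat[:, None] - f_hat[None, :])
        C = np.exp(-alpha * spatial - beta * intensity)
    return Phi, c, f_hat
```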
F. Image Reconstruction From Adaptive Sampling

The statistical reconstruction of (3) requires the measurements, along with their corresponding masks. For nonadaptive sampling schemes, the masks are fixed regardless of the image, and there is no need to store them. For blind adaptive sampling schemes, where the set of masks differs for different images, there is no need to store them either.

At each iteration of the sampling process, a new sampling mask is constructed, and the image is sampled. Because of the deterministic nature of the sampling process, those masks can be recalculated during reconstruction. The reconstruction algorithm is almost identical to Algorithm 1, except for step 3(c), which now reads, "Pick $c_n$ from the list of stored measurements," and the stopping criterion at step 4 is accordingly updated.

IV. BLIND WAVELET SAMPLING

The statistical pursuit algorithm is quite general, but updating the direct underlying space-varying statistical model of the image is computationally costly. We now present an alternative blind sampling approach, which is limited to a family of wavelet masks and relies on an indirect statistical model of the image. Using the measurements that it obtains, the scheme chooses first the coefficient that is estimated to carry most of the energy.
A trivial adaptive scheme stores the largest wavelet coefficients and their corresponding masks. Such a scheme samples
(i.e., decomposes) the complete image and sorts the coefficients
according to their energy. However, it clearly uses all the image
information and is therefore not blind.
The proposed sampling scheme is based on the statistical
properties of a wavelet family; we use the statistical correlations
between magnitudes (or energies) of the wavelet coefficients.
Those statistical relationships are used to construct a number
of linear predictors, which predict the magnitude of the unsampled coefficients from the magnitude of the known coefficients.
This way, the presented scheme chooses the larger coefficients
without direct knowledge about the image.
The proposed sampling scheme is divided into three stages.
1) Learning Stage: The statistical properties of the wavelet family are collected and studied. This stage is done offline using a large set of images, and the resulting statistics are considered a constant model.
2) Sampling Stage: At each iteration of this blind sampling
scheme, the magnitudes of all unsampled coefficients are
estimated, and the coefficient with the largest magnitude is
greedily chosen.
3) Reconstruction Stage: The image is reconstructed using
the measurements obtained from the sampling stage.
As with the other blind schemes, it is sufficient to store the
values of the sampled coefficients since their corresponding
masks are recalculated during reconstruction.
A. Correlation Between Wavelet Coefficients
Wavelet coefficients are weakly correlated among themselves [3], [9]. However, their absolute values or their energies are highly correlated. For example, the correlation between the magnitudes of wavelet coefficients at different scales but similar spatial locations is relatively high. Nevertheless, the sign of a coefficient is hard to predict, and therefore, the correlations between the coefficients themselves remain low. This property is the foundation of zerotree encoding [16], which is used to efficiently encode quantized wavelet coefficients.
Each wavelet coefficient corresponds to a discrete wavelet basis function, which can also be interpreted as a sampling mask. The mask (and its associated coefficient) is specified by orientation (LL, LH, HL, or HH), level of decomposition (or scale), spatial location, and support.
Three relationships are defined.
1) Each coefficient has four (spatial) direct neighbors in the same block.
2) Each coefficient from the LH, HL, and HH blocks (for levels above the first) has four children. The children are of the same orientation one level below and occupy approximately the same spatial location as the parent coefficient. Each coefficient in the LH, HL, and HH blocks (except at the highest level) has a parent coefficient.
3) Each coefficient from the LH, HL, and HH blocks has two cousins that occupy the same location but with different orientations. At the highest level $L$, a third cousin in the LL block exists. Similarly, each coefficient in the LL block has three cousins in the LH, HL, and HH blocks.
We can further define second-order family relationships, such
as grandparents and grandchildren, cousin’s children, diagonal
neighbors, and so on. Those relationships carry lower correlation and can be indirectly approximated from the first-order relatives.
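The first-order relationships can be made concrete with an index map; the (level, orientation, row, col) convention below is illustrative, not the paper's notation.

```python
def relatives(level, orient, y, x, max_level):
    """First-order relatives of a wavelet coefficient, indexed as
    (level, orientation, row, col); orientations 'LH', 'HL', 'HH'
    (plus 'LL' at the highest level)."""
    rel = {
        # four spatial neighbors in the same block
        'neighbors': [(level, orient, y + dy, x + dx)
                      for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))],
        # two cousins: same location, other detail orientations
        'cousins': [(level, o, y, x)
                    for o in ('LH', 'HL', 'HH') if o != orient],
    }
    if level < max_level:                  # parent one level up
        rel['parent'] = (level + 1, orient, y // 2, x // 2)
    else:                                  # third cousin in the LL block
        rel['cousins'].append((level, 'LL', y, x))
    if level > 1:                          # four children one level down
        rel['children'] = [(level - 1, orient, 2 * y + dy, 2 * x + dx)
                           for dy in (0, 1) for dx in (0, 1)]
    return rel
```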
B. Learning Stage
The learning stage is done offline, and the statistical relationships are stored for the later sampling and reconstruction stages.
The measured correlations are considered as a fixed model. Our experiments showed the statistical characteristics to be almost indifferent to the image class. That is, the proposed method is robust across varying image classes.
At the learning stage, the wavelet coefficients are considered
as instances of random variables. We assume that the statistics
of the wavelet coefficients are independent of their spatial location, and we study complete blocks of wavelet coefficients as
instances of a single random variable. In addition, we assume
transposition invariance of the image and wavelet coefficients. Therefore, we expect the same behavior from wavelet coefficients of opposite orientations (e.g., the LH and HL blocks). It is common to assume scaling invariance as well, but our experiments showed that this assumption is not completely valid for discrete images.
See Section V-B for experimental results of the statistical
model for different types of wavelet families.
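As an illustration of the learning stage, the following sketch pools parent-child magnitude pairs over a training set using the PyWavelets package and estimates a single correlation coefficient; the paper's actual procedure (maximum-likelihood estimation over all relative types) is richer than this one statistic.

```python
import numpy as np
import pywt

def parent_child_corr(images, wavelet='db2', level=4):
    """Estimate the correlation between a detail coefficient's magnitude
    and its parent's magnitude, pooled over complete blocks of a set of
    training images."""
    child_mag, parent_mag = [], []
    for img in images:
        coeffs = pywt.wavedec2(img, wavelet, level=level)
        # coeffs = [LL_L, (LH_L, HL_L, HH_L), ..., (LH_1, HL_1, HH_1)]
        for lvl in range(1, len(coeffs) - 1):
            for parent, child in zip(coeffs[lvl], coeffs[lvl + 1]):
                h, w = parent.shape
                child = child[:2 * h, :2 * w]
                # each parent coefficient covers a 2x2 block of children
                par = np.repeat(np.repeat(parent, 2, axis=0), 2, axis=1)
                par = par[:child.shape[0], :child.shape[1]]
                child_mag.append(np.abs(child).ravel())
                parent_mag.append(np.abs(par).ravel())
    x = np.concatenate(child_mag)
    y = np.concatenate(parent_mag)
    return np.corrcoef(x, y)[0, 1]
```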
It is straightforward to build a linear predictor of the magnitude of a certain coefficient $c_0$, assuming it has known relatives $c_1, \ldots, c_k$, i.e., $\widehat{|c_0|} = \beta_0 + \sum_{i=1}^{k} \beta_i |c_i|$ [15]. However, the actual predictors differ according to the available observations.
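A minimal sketch of building one such predictor from stored second-order statistics, using standard linear MMSE formulas over standardized magnitudes (the argument layout is ours):

```python
import numpy as np

def magnitude_predictor(R_xx, r_xy, mu, sigma):
    """Build one linear predictor of a coefficient's magnitude from the
    magnitudes of its available relatives.

    R_xx : correlation matrix among the observed relatives' magnitudes
    r_xy : their correlations with the target magnitude
    mu, sigma : per-variable means and standard deviations, target last."""
    beta = np.linalg.solve(R_xx, r_xy)   # normal equations, standardized units
    def predict(x_obs):
        z = (x_obs - mu[:-1]) / sigma[:-1]
        return mu[-1] + sigma[-1] * (beta @ z)
    return predict
```

In practice, one such predictor is kept for every combination of available relatives, matching the remark above that the predictors differ according to the available observations.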
C. Sampling Stage
The sampling stage is divided into an initial phase and a progressive phase.
1) Initial Sampling Phase: Before any coefficient is sampled, the predictors can rely only on the expected mean. Therefore, the coefficients with the highest expected mean should be sampled first. For wavelet decompositions, the expected mean of the coefficients from the LL block is the highest, and we start by sampling them all.
The coefficients of the $LL_L$ block carry low correlation with their cousins (in the $LH_L$, $HL_L$, and $HH_L$ blocks). Therefore, after sampling the $LL_L$ block, we further decompose it "on the fly" into $LL_{L+1}$, $LH_{L+1}$, $HL_{L+1}$, and $HH_{L+1}$, in order to make use of the higher parent–children correlations. This way, we get better predictors for $LH_L$, $HL_L$, and $HH_L$.
2) Progressive Sampling Phase: The blind sampling algorithm has three types of coefficients, i.e., the coefficients it has already sampled, which we refer to as known coefficients; relatives of the known coefficients, which are the candidates for sampling; and the remaining coefficients. The algorithm keeps the candidates in a heap data structure, sorted according to their estimated magnitude.
The output of the algorithm is an array of the coefficient
values, as sampled at steps 1(a) and 2(a).
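A sketch of the progressive phase around a binary heap (Python's `heapq` is a min-heap, so priorities are negated); for simplicity, a candidate's estimate is not refreshed when further relatives of it become known, which the full algorithm may handle differently. The `predict`, `relatives_of`, and `sample` callbacks are assumptions.

```python
import heapq

def progressive_phase(initial_known, predict, relatives_of, sample, n_samples):
    """Keep candidate coefficients in a heap keyed by (minus) their
    estimated magnitude, sample the best one, then promote its relatives
    to candidates with fresh estimates."""
    known = dict(initial_known)        # coefficient id -> sampled value
    heap, in_heap = [], set()

    def push(cand):
        if cand not in known and cand not in in_heap:
            est = predict(cand, known)            # linear magnitude estimate
            heapq.heappush(heap, (-est, cand))
            in_heap.add(cand)

    for k in list(known):
        for r in relatives_of(k):
            push(r)
    while heap and len(known) < n_samples:
        _, cand = heapq.heappop(heap)
        in_heap.discard(cand)
        known[cand] = sample(cand)                # measure the coefficient
        for r in relatives_of(cand):
            push(r)
    return known
```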
D. Reconstruction Stage
Again, we mark the coefficients as known, candidates, and
the remaining coefficients. The input for the reconstruction algorithm is the array of values generated by the sampling algorithm (Algorithm 2).
Note that, while the value of the coefficient is stored in the array, its corresponding wavelet mask is not part of the stored information and is obtained during the reconstruction stage (step 2), using the predictors. Consequently, the algorithm does not need to keep the masks associated with the coefficients or their indexes.
Fig. 3. First 30 nonadaptive sampling masks.
Fig. 6. Ratio between the average reconstruction errors of adaptive and nonadaptive schemes, for unrestricted masks (PCA) and three overcomplete dictionaries (DB3, DB2, and Haar). The reference line (100%) represents the nonadaptive schemes.
V. EXPERIMENTAL RESULTS
Fig. 4. First 30 adaptive sampling masks.
Fig. 5. (Left) Image patch and (right) the reconstruction error using adaptive
and nonadaptive unrestricted masks.
A. Masks Generated by Statistical Pursuit
We start by illustrating the difference between the nonadaptive and adaptive masks. Figs. 3 and 4 present the first 30 masks used to sample a 32 × 32 image patch, shown in Fig. 5. The masks are produced by both nonadaptive and adaptive schemes, where, in both cases, the masks are unrestricted and the same exponential correlation model is used.
The nonadaptive sampling masks, shown in Fig. 3, closely
resemble the DCT basis and not by coincidence (the DCT is an
approximation of the PCA for periodic stationary random signals [1]). The adaptive sampling masks, shown in Fig. 4, present
more complicated patterns. As the sampling process advances,
it attempts to “study” the image at interesting regions, e.g., the
vertical edge at the center of the patch (see Fig. 5).
Fig. 5 presents the true reconstruction error for a varying
number of samples. For both adaptive and nonadaptive sampling schemes, the reconstruction is done using the linear estimator (3).
Fig. 6 presents the ratio between the reconstruction errors of the adaptive and nonadaptive schemes. We took 256 small patches (16 × 16 pixels each) and compared the average reconstruction errors. This experiment was repeated for several classes of masks, i.e., unrestricted masks (PCA) and an overcomplete family of Daubechies wavelets of order three (DB3), of order two (DB2), and of order one (Haar wavelets) [4].
Using the adaptive schemes reduces the reconstruction error by between 5% and 10% compared with the nonadaptive schemes (which use the same family of masks). Using the optimal representation (i.e., the PCA basis), the error is reduced by 5%, whereas using a less optimal basis (the Haar basis), the error is reduced even further, as the corresponding nonadaptive scheme is known to be suboptimal. For the first few coefficients, the adaptive scheme does not have much information to work with. Therefore, until more samples are gathered, the relative gain over the nonadaptive schemes is erratic. After sampling some more coefficients, the benefits of the adaptivity become more apparent.
B. Correlation Models for Blind Wavelet Sampling
In our experiments, we used a data set of 20 images. All the images were converted to grayscale and rescaled to 256 × 256 pixels. The images are shown in Fig. 7. The first two subsets are "natural" images, the third set is taken from a computed-tomography brain scan, and the fourth set is a collection of animations.

Fig. 7. Image sets used in our experiments.
TABLE I. Correlations between DB2 wavelet coefficients.
TABLE II. Correlations between DB3 wavelet coefficients.
Two examples of experimental correlation models are presented in Tables I and II. Those models were studied for Daubechies wavelets [4] of the second and third orders. A four-level wavelet decomposition was carried out over the image data set, and the correlation coefficients between different kinds of wavelet coefficients were estimated according to the maximum-likelihood principle.
Observing Tables I and II, we see that Daubechies wavelets of
second and third orders exhibit similar behavior. This behavior
was also experimentally found for other wavelet families. As
the order of the wavelet family increases, most of the correlation coefficients decrease. We also see that the images are not
scale invariant, and as the decomposition level decreases, the
correlation coefficients between related coefficients increase.
Our experiments show the statistical characteristics to be almost indifferent to the image classes. We tested the benefits of
using a specific (rather than the generic) model for each class of
images and found that it reduces the error by a negligible 1%.
It appears that the correlation model characterizes the statistical
behavior of the wavelet family and not of the different image
classes.
C. Blind Wavelet Sampling Results
Having obtained the correlation model for the wavelet family,
we now present some results of the blind sampling scheme.
Taking an image, we decompose it using a three-level wavelet decomposition with DB2. We compare the adaptive order, obtained by the blind adaptive sampling scheme, to a nonadaptive raster order and to an optimal order (which is not blind).
The nonadaptive raster-order scheme samples the coefficients
according to their block order, from the highest level to lower
ones. The optimal-order scheme assumes full knowledge of the
coefficients and samples them according to their energy.
Figs. 8–10 present partial reconstructions of the cameraman
image, where only some of the wavelet coefficients are used for
the reconstruction. There are three columns in the figures, i.e.,
the left, where the selected coefficients are marked; the middle,
where the reconstructed images, using the selected coefficients,
are presented; and the right, where the error images are shown.
The reconstruction error is also shown.
All three schemes start with the LL block and continue to sample the remaining wavelet coefficients in different orders.
Figs. 8–10 correspond to raster, adaptive, and optimal orders,
respectively.
In Fig. 11, we can see the reconstruction errors for the raster,
adaptive and optimal orders, averaged over all 20 images. Note
that the same reconstruction error may be achieved by the adaptive scheme using about half of the samples required by the nonadaptive scheme.
The main advantage of blind sampling schemes is that the
sampling results (the actual coefficients) need to be stored
but not their locations. Nonblind adaptive schemes, such as
choosing the largest coefficients of the decomposition (the
optimal-order scheme), require an additional piece of information, i.e., the exact location of each coefficient, to be stored
alongside its value.
From a compression point of view, we have to estimate the number of bits required for storing the additional information needed for the reconstruction.
Fig. 8. Reconstruction of the cameraman image using (a) the first 1024 coefficients of the $LL_3$ block; (b) 4096 coefficients of the $LL_3$, $LH_3$, $HL_3$, and $HH_3$ blocks; and (c) 16 384 coefficients of the $LL_3$, $LH_3$, $HL_3$, $HH_3$, $LH_2$, $HL_2$, and $HH_2$ blocks.
Fig. 11. Reconstruction error averaged over 20 images, using up to 16 384 wavelet coefficients, taken according to the raster, adaptive, and optimal orders. (Dashed line) Compression-aware comparison of the optimal order, taking into account the space required to store the coefficient indexes of the optimal order.
Fig. 9. Reconstruction of the cameraman image using (a) 4096 coefficients
taken according to the blind sampling order and (b) 16 384 coefficients taken
according to the blind sampling order.
Fig. 10. Reconstruction of the cameraman image using (a) 4096 coefficients
taken according to the optimal order and (b) 16 384 coefficients taken according
to the optimal order.
As a rough comparison, we assume that the location (index) of a coefficient takes $\log_2(256 \times 256) = 16$ bits. Optimistic assumptions on entropy encoders reduce the size required for storing the coefficient indexes to about 8 bits, disregarding quantization. Using this rough estimation of the storage requirements, each coefficient of the optimal order is equivalent, in a bit-storage sense, to about two coefficients of the blind adaptive scheme. In Fig. 11, the dashed line marks the reconstruction error of the optimal order, taking these storage considerations into account.

VI. CONCLUSION
In this paper, we have presented two novel blind adaptive
schemes. Our statistical pursuit scheme, presented in Section III,
maintains a second-order statistical model of the image, which is
updated as information is gathered during the sampling process.
Experimental results have shown that the reconstruction error
is smaller by between 5% and 10%, as compared with regular
nonadaptive sampling schemes, depending on the class of basis
functions. Due to its complexity, however, this scheme is most
suitable for small patches and a small number of coefficients.
Our blind wavelet sampling scheme, presented in Section IV,
is more suitable for complete images. It uses the statistical correlation between the magnitudes of wavelet coefficients.
Naturally, the optimal selection of the coefficients with the highest magnitudes produces superior results, but such nonblind methods require storage of the coefficient indexes, whereas the blind scheme stores only the coefficients. Taking into account the additional bit space, the blind wavelet sampling scheme produces results almost as good as the optimal selection of the masks.
Some open problems are left for future research, such as the
application of the statistical-pursuit scheme to image compression. Including quantization in the scheme and introducing an
appropriate entropy encoder can turn the sampling scheme into
a compression scheme. Replacing DCT or DWT sampling with
their statistical-pursuit counterparts reduces the $L_2$ error for
each patch by 5%–10%. However, some of the gain is expected
to be lost by the entropy encoder.
The blind wavelet sampling scheme makes use of linear predictors. However, it is known that the distribution of wavelet coefficients is not Gaussian [9], [18]. Therefore, higher order models of the relationships between the coefficients may yield better predictors and a better blind adaptive sampling scheme.
APPENDIX A

Let $\{\varphi_1, \ldots, \varphi_n\}$ be a linearly independent set of masks, where $\varphi_n = \varphi$ is the new mask, and let $\Phi' = [\varphi_1, \ldots, \varphi_{n-1}]$ denote the matrix of the previous masks, so that $\Phi = [\Phi', \varphi]$. According to (5), $\mathrm{MSE}_n$, i.e., the MSE of the whole image estimate after sampling the $n$th mask, is

$$\mathrm{MSE}_n = \mathrm{trace}\left(C - C \Phi (\Phi^T C \Phi)^{-1} \Phi^T C\right).$$

We now rewrite $\mathrm{MSE}_n$ while separating the elements influenced by $\varphi$, i.e., the new mask, from the elements that are independent of it. Writing $\Phi^T C \Phi$ in block form

$$\Phi^T C \Phi = \begin{pmatrix} \Phi'^T C \Phi' & \Phi'^T C \varphi \\ \varphi^T C \Phi' & \varphi^T C \varphi \end{pmatrix}$$

and applying a matrix blockwise inversion yields

$$\Delta\mathrm{MSE}_{n-1}(\varphi) = \mathrm{MSE}_{n-1} - \mathrm{MSE}_n = \frac{\left\|\left(C - C \Phi' (\Phi'^T C \Phi')^{-1} \Phi'^T C\right) \varphi\right\|^2}{\varphi^T \left(C - C \Phi' (\Phi'^T C \Phi')^{-1} \Phi'^T C\right) \varphi}.$$

Both the numerator and the denominator have similar quadratic forms. Therefore, let us define

$$\hat{C} = C - C \Phi' (\Phi'^T C \Phi')^{-1} \Phi'^T C.$$

Surprisingly, $\hat{C}$ is exactly the covariance (4) of $\hat{f}$, i.e., the estimated image based on the first $n-1$ measurements. Plugging $\hat{C}$ back into the numerator and the denominator yields

$$\Delta\mathrm{MSE}_{n-1}(\varphi) = \frac{\|\hat{C} \varphi\|^2}{\varphi^T \hat{C} \varphi}$$

which is (6).

APPENDIX B

Proposition: $v_1(\hat{C})$, which is an eigenvector corresponding to the largest eigenvalue of $\hat{C}$, is an optimal mask.

Proof: Let $\hat{C} = V \Lambda V^T$ be the eigendecomposition of $\hat{C}$, where $v_1, \ldots, v_N$ are the orthonormal eigenvectors of $\hat{C}$, sorted in descending order of their corresponding eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \geq 0$.

Let $\varphi$ be an arbitrary mask. If $\hat{C} \varphi = 0$, $\Delta\mathrm{MSE}(\varphi) = 0$. Otherwise, let $\varphi$ be represented in the eigenvector basis as $\varphi = \sum_i a_i v_i$. According to (6)

$$\Delta\mathrm{MSE}(\varphi) = \frac{\|\hat{C} \varphi\|^2}{\varphi^T \hat{C} \varphi} = \frac{\sum_i \lambda_i^2 a_i^2}{\sum_i \lambda_i a_i^2}.$$

Since $\lambda_1$ is the largest eigenvalue of $\hat{C}$, the following holds:

$$\Delta\mathrm{MSE}(\varphi) = \frac{\sum_i \lambda_i^2 a_i^2}{\sum_i \lambda_i a_i^2} \leq \frac{\lambda_1 \sum_i \lambda_i a_i^2}{\sum_i \lambda_i a_i^2} = \lambda_1.$$

For $\varphi = v_1$, $\Delta\mathrm{MSE}(v_1) = \lambda_1^2 / \lambda_1 = \lambda_1$. Hence, $v_1$ maximizes $\Delta\mathrm{MSE}$ and is an optimal mask.
REFERENCES
[1] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine transform,”
IEEE Trans. Comput., vol. C-23, no. 1, pp. 90–93, Jan. 1974.
[2] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry, Algorithms and Applications, 2nd ed. New
York: Springer-Verlag, 2000.
[3] R. W. Buccigrossi and E. P. Simoncelli, “Image compression via joint
statistical characterization in the wavelet domain,” IEEE Trans. Image
Process., vol. 8, no. 12, pp. 1688–1701, Dec. 1999.
[4] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: SIAM,
1992.
[5] Z. Devir and M. Lindenbaum, “Adaptive range sampling using a stochastic model,” J. Comput. Inf. Sci. Eng., vol. 7, no. 1, pp. 20–25, Mar.
2007.
[6] Y. Eldar, M. Lindenbaum, M. Porat, and Y. Zeevi, “The farthest point
strategy for progressive image sampling,” IEEE Trans. Image Process.,
vol. 6, no. 9, pp. 1305–1315, Sep. 1997.
[7] H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Educ. Psychol., vol. 24, no. 6, pp. 417–441, Sep.
1933.
[8] H. Hotelling, “Analysis of a complex of statistical variables into principal components,” J. Edu. Psychol., vol. 24, no. 7, pp. 498–520, Oct.
1933.
[9] J. Huang and D. Mumford, “Statistics of natural images and models,”
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Fort Collins, CO,
1999, pp. 541–547.
[10] A. K. Jain, Fundamentals of Digital Image Processing. Upper Saddle
River, NJ: Prentice-Hall, 1989.
[11] S. Mallat, A Wavelet Tour of Signal Processing. San Diego, CA: Academic, 1999.
[12] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415,
Dec. 1993.
[13] A. N. Netravali and B. G. Haskell, Digital Pictures. New York:
Plenum Press, 1995.
[14] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, “Orthogonal matching
pursuit: recursive function approximation with applications to wavelet
decomposition,” in Proc. 27th Annu. Asilomar Conf. Signals, Syst.,
Comput., 1993, pp. 40–44.
[15] A. Papoulis, Probability, Random Variables and Stochastic Processes. New York: McGraw-Hill, 2002.
[16] J. M. Shapiro, “Embedded image coding using zerotrees of wavelet
coefficients,” IEEE Trans. Signal Process., vol. 41, no. 12, pp.
3445–3462, Dec. 1993.
[17] N. Sochen, R. Kimmel, and R. Malladi, “A general framework for low
level vision,” IEEE Trans. Image Process., vol. 7, no. 3, pp. 310–318,
Mar. 1998.
[18] A. Srivastava, A. B. Lee, E. P. Simoncelli, and S. C. Zhu, “On advances
in statistical modeling of natural images,” J. Math. Imag. Vis., vol. 18,
no. 1, pp. 17–33, Jan. 2003.
[19] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. IEEE Int. Conf. Comput. Vis., 1998, pp. 839–846.
[20] J. S. Vitter, “Design and analysis of dynamic Huffman codes,” J. ACM,
vol. 34, no. 4, pp. 825–845, Oct. 1987.
[21] B. Zeng and J. Fu, “Directional discrete cosine transforms for image
coding,” in Proc. IEEE Int. Conf. Multimedia Expo, 2006, pp. 721–724.
Zvi Devir received the B.A. degrees in mathematics
and in computer science and the M.Sc. degree in computer science from the Technion–Israel Institute of
Technology, Haifa, Israel, in 2000, 2000, and 2007,
respectively.
From 2006 to 2010, he was with Medic Vision
Imaging Solutions, Haifa, Israel, a company he cofounded, where he was the Chief Scientific Officer.
Previously, he was with Intel, Haifa, Israel, mainly
working on computer graphics and mathematical
optimizations. He is currently with IARD Sensing
Solutions, Yagur, Israel, focusing on advanced video processing and spectral
imaging. His research interests include video and image processing, mainly
algebraic representations and differential methods for images.
Michael Lindenbaum received the B.Sc., M.Sc.,
and D.Sc. degrees from the Department of Electrical Engineering, Technion–Israel Institute of
Technology, Haifa, Israel, in 1978, 1987, and 1990,
respectively.
From 1978 to 1985, he served in the IDF in
Research and Development positions. He did his
Postdoc with the Nippon Telegraph and Telephone
Corporation Basic Research Laboratories, Tokyo,
Japan. Since 1991, he has been with the Department
of Computer Science, Technion. He was also a Consultant with Hewlett-Packard Laboratories Israel and spent sabbaticals in NEC
Research Institute, Princeton, NJ, in 2001 and in Telecom ParisTech, in 2011.
He also spent shorter research periods in the Advanced Telecommunications
Research, Kyoto, Japan, and the National Institute of Informatics, Tokyo.
He worked in digital geometry, computational robotics, learning, and various
aspects of computer vision and image processing. Currently, his main research
interest is computer vision, particularly statistical analysis of object recognition
and grouping processes.
Prof. Lindenbaum served on several committees of computer vision conferences and is currently an Associate Editor of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE.