Perceptual image distortion metric based on a statistically-derived divisive normalization model
Roberto Valerio (1), Rafael Navarro (1) and Bart M. ter Haar Romeny (2)
(1) Instituto de Óptica “Daza de Valdés” - CSIC, Madrid, Spain, 28006
{r.valerio, r.navarro}@io.cfmac.csic.es
(2) Department of Biomedical Engineering - Eindhoven University of Technology,
Eindhoven, The Netherlands, 5600 MB
B.M.terHaarRomeny@tue.nl
Abstract. We present a perceptual image distortion metric based on recent models of primate primary
visual cortex (V1). The perceptual metric is similar to that proposed by Teo and Heeger (1994) and
includes a linear filtering stage followed by a gain control mechanism, known as “divisive
normalization”, that explains some of the non-linear behaviour of V1 neurons. The main difference is
that in our case, following the latest V1 models, the divisive normalization is more general (it
considers not only neighbouring responses in orientation but also in position and scale) and also it is
adapted to natural image statistics. In particular, the parameters of the divisive normalization are fixed
using a novel statistically-derived model of minimum noticeable distortions in squared linear
coefficients of natural images. The results show that the proposed metric fits empirical data obtained
from contrast masking experiments very well (as well as the metric by Teo and Heeger, 1994).
Keywords: Perceptual quality metrics, non-linear models of V1 neurons, divisive normalization, natural image
statistics.
1. Introduction
In many different image processing applications the properties of the human visual system (HVS) can
be exploited to improve the performance from a visual quality point of view. The quality
improvement that can be achieved using an HVS-based approach is significant and applies to a broad
range of applications. In the last three decades, a great deal of effort has gone into the development of
quality assessment methods that take advantage of known characteristics of the HVS. Reviews on
perceptual image quality assessment algorithms can be found in Eckert and Bradley (1998) and
Pappas and Safranek (2000). The common element in these algorithms is always a computational
model of human vision.
In recent years, various authors have shown that the non-linear behaviour of V1 neurons in primate
visual cortex can be modelled by including a gain control stage, known as “divisive normalization”
(Bonds, 1989; Geisler and Albrecht, 1992; Heeger, 1992; Carandini, Heeger and Movshon, 1997),
after a linear filtering step. Divisive normalization not only can be used to describe the non-linear
response properties of neurons in visual cortex, but also yields image descriptors more relevant from a
perceptual point of view (Foley, 1994; Teo and Heeger, 1994). Very recently, Simoncelli and coworkers (Simoncelli and Schwartz, 1999; Schwartz and Simoncelli, 2001; Wainwright et al., 2002)
presented a statistically-derived divisive normalization model. In addition to its utility to characterize
the non-linear response properties of neurons in sensory systems, and thus to demonstrate that early
neural processing is well matched to the statistical properties of the stimuli, they showed empirically
that the statistical normalization model strongly reduces pairwise statistical dependences between
responses.
In this paper, we present a perceptual image distortion metric similar to that proposed by Teo and
Heeger (1994). Our perceptual metric is based on a statistically-derived divisive normalization model
of V1 neuron responses and has two main differences with respect to that by Teo and Heeger (1994):
first, the divisive normalization considers not only neighbouring responses in orientation but also in
position and scale and second, the parameters of the divisive normalization are adapted to natural
image statistics (through a novel statistically-derived model of minimum noticeable distortions in
squared linear coefficients of natural images) instead of being fixed exclusively to fit psychophysical
data. We show that the proposed metric fits psychophysical data describing the masking of a Gabor
function by sinusoidal gratings very well (as well as the metric by Teo and Heeger, 1994). The fit is much
better than that of other, simpler perceptual metrics.
2. Perceptual metric
The perceptual metric we propose here consists of the following three stages:
2.1. Linear stage
The linear stage is an approximately orthogonal four-level linear decomposition based on symmetric
quadrature mirror filters (QMF) with 9 coefficients (Simoncelli and Adelson, 1990), which are closely
related to wavelets (essentially, they are approximate wavelet filters). The basis functions of this
linear transform are localized in space, orientation and spatial frequency. This gives rise to 12
subbands (horizontal, vertical and diagonal for each of the 4 scales considered here) plus an additional
low-pass channel. Multiscale linear transforms like this are very popular for image representation.
2.2. Non-linear stage
The non-linear stage consists basically of a divisive normalization, in which the responses of the
previous linear filtering stage, $c_i$, are squared and then divided by a weighted sum of squared
neighbouring responses in space, orientation and scale, $\{c_j^2\}$, plus a constant, $d_i^2$ (Simoncelli and
Schwartz, 1999):
$$r_i = \frac{c_i^2}{d_i^2 + \sum_{j \neq i} e_{ij} c_j^2} \qquad (1)$$
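If we arbitrarily set to one the threshold at which a distortion becomes visible, then the minimum noticeable
distortion $\Delta c_i^2$ is $d_i^2 + \sum_{j \neq i} e_{ij} c_j^2$, because a change of that size in $c_i^2$ changes $r_i$ in Eq. 1 by
exactly one when the neighbouring coefficients are held fixed. Hence, the parameters of the divisive
normalization (constant $d_i^2$ and weights $\{e_{ij}\}$) can be obtained from a model of minimum noticeable
distortions $\Delta c_i^2$.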
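As a minimal sketch of Eq. 1, the following Python fragment computes the normalized response of a single coefficient and checks that adding the normalization pool to $c_i^2$ raises the response by one. The function names and the numerical values are hypothetical and chosen only for illustration; they are not the parameters used in this paper.

```python
def normalization_pool(c_neighbours, weights, d2):
    """Denominator of Eq. 1: d_i^2 + sum_j e_ij * c_j^2."""
    return d2 + sum(e_ij * c_j ** 2 for e_ij, c_j in zip(weights, c_neighbours))


def normalized_response(c_i, c_neighbours, weights, d2):
    """Eq. 1: squared linear response divided by the normalization pool."""
    return c_i ** 2 / normalization_pool(c_neighbours, weights, d2)


# Hypothetical linear responses and parameters (illustrative only).
c_i = 2.0
c_neighbours = [1.0, 3.0]
weights = [0.5, 0.25]
d2 = 0.25

pool = normalization_pool(c_neighbours, weights, d2)
r_i = normalized_response(c_i, c_neighbours, weights, d2)

# Increasing c_i^2 by the pool value (neighbours held fixed) shifts r_i by one,
# which is why the pool plays the role of the minimum noticeable distortion.
c_i_distorted = (c_i ** 2 + pool) ** 0.5
shift = normalized_response(c_i_distorted, c_neighbours, weights, d2) - r_i
print(pool, r_i, shift)
```

In a full implementation the neighbours would be the adjacent coefficients in position, orientation and scale; here they are just a short list.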
We propose the following statistically-derived model for $\Delta c_i^2$ given the neighbouring coefficients
$\{c_j^2\}$:
$$\Delta c_i^2 = \begin{cases} \sigma_{c_i}^2 \left[ z\!\left( \dfrac{p+1}{2} \right) \right]^2 - c_i^2 & \text{if } c_i^2 \le \Delta c_i^2 \\[2ex] y^2 - c_i^2 & \text{if } c_i^2 > \Delta c_i^2 \end{cases} \qquad (2)$$
where, in the second case, $y = \sqrt{c_i^2 + \Delta c_i^2}$ is the solution of
$$y = \sigma_{c_i} \, z\!\left( \frac{p}{2} + \Phi\!\left( \frac{\sqrt{2 c_i^2 - y^2}}{\sigma_{c_i}} \right) \right),$$
$\Phi(\cdot)$ is the Gaussian cumulative distribution function with unit standard deviation and $z(\cdot)$ is
its inverse function. Figure 1 shows a plot of this model.
Figure 1. Minimum noticeable distortion model with $p = 0.5$ and $\sigma_{c_i}^2 = 1$ (horizontal axis: $c_i^2$; vertical axis: $\Delta c_i^2$; both from 0 to 4).
Eq. 2 gives, for each value of $c_i^2$, the value $\Delta c_i^2$ that yields an error probability $p$ (that is, the
probability that the random variable $c_i^2$ lies in the interval $[c_i^2 - \Delta c_i^2,\ c_i^2 + \Delta c_i^2]$), assuming that the
conditional probability $p(c_i \mid \{c_j^2\})$ is Gaussian with zero mean and variance $\sigma_{c_i}^2$.
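The definition behind Eq. 2 can be checked numerically. The sketch below, which uses only the Python standard library, solves for the distortion that makes the interval probability equal to $p$ under the zero-mean Gaussian assumption stated above. The closed form is used when the interval reaches down to zero, and a simple bisection is used otherwise. The function names, the bisection and the tolerance are illustrative choices, not the procedure used to produce Figure 1.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit standard deviation


def interval_probability(c2, delta, sigma=1.0):
    """P(c_i^2 in [c2 - delta, c2 + delta]) for c_i ~ N(0, sigma^2)."""
    upper = (c2 + delta) ** 0.5 / sigma
    lower = max(c2 - delta, 0.0) ** 0.5 / sigma
    return 2.0 * (STD_NORMAL.cdf(upper) - STD_NORMAL.cdf(lower))


def min_noticeable_distortion(c2, p=0.5, sigma=1.0, tol=1e-10):
    """Distortion delta such that interval_probability(c2, delta, sigma) == p."""
    # First case of Eq. 2: the interval is clipped at zero, closed-form solution.
    z = STD_NORMAL.inv_cdf((p + 1.0) / 2.0)
    delta = (sigma * z) ** 2 - c2
    if c2 <= delta:
        return delta
    # Second case: the probability grows monotonically with delta on [0, c2],
    # so bisection finds the unique solution.
    lo, hi = 0.0, c2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if interval_probability(c2, mid, sigma) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sample the curve at a few points, with p = 0.5 and unit variance.
for c2 in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(c2, round(min_noticeable_distortion(c2), 3))
```

Because the Gaussian is scaled by $\sigma_{c_i}$, the whole curve scales with the variance: doubling $\sigma_{c_i}$ multiplies both axes by four.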
If we fix $p$ to 0.5, then the mean of $\Delta c_i^2$ over $c_i^2$ is $\langle \Delta c_i^2 \rangle \approx \sigma_{c_i}^2$. This means, according to what we
discussed above, that a good choice for the parameter values of the divisive normalization (constant
$d_i^2$ and weights $\{e_{ij}\}$) is the one that makes $d_i^2 + \sum_{j \neq i} e_{ij} c_j^2$ a good estimator of $\sigma_{c_i}^2$. It is important to note
that other authors (Schwartz and Simoncelli, 2001; Wainwright et al., 2002) have used this choice of
parameters ad hoc.
2.3. Error pooling
The final stage computes a Minkowski sum with exponent 2 of the differences $\Delta r_i$ (multiplied by
constants $k_i$ that adjust the overall gain) between the non-linear outputs from the reference image and
the non-linear outputs from the distorted image:
$$\Delta r = \sqrt{\sum_i k_i^2 \, |\Delta r_i|^2} \qquad (3)$$
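A minimal sketch of the pooling in Eq. 3 follows. It assumes the normalized responses of the reference and distorted images are available as flat lists, with one gain per coefficient (in the metric the gain is shared by all coefficients of a subband). The function name and the numbers are hypothetical.

```python
import math


def pooled_distortion(r_reference, r_distorted, gains):
    """Eq. 3: Minkowski sum with exponent 2 of the gain-weighted differences."""
    return math.sqrt(
        sum(
            (k * (r_ref - r_dist)) ** 2
            for k, r_ref, r_dist in zip(gains, r_reference, r_distorted)
        )
    )


# Hypothetical normalized responses and gains (illustrative only).
r_reference = [1.0, 2.0, 3.0]
r_distorted = [1.0, 2.5, 1.0]
gains = [1.0, 2.0, 0.5]

distortion = pooled_distortion(r_reference, r_distorted, gains)
print(distortion)
```

Identical images give a pooled distortion of zero, and larger gains make the corresponding subband count more towards the total.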
3. Results
To test the perceptual metric we have used empirical data from Teo and Heeger (1994) obtained from
contrast masking experiments conducted by Foley and Boynton (1994). The task in these experiments
is to detect a target pattern superimposed on a masker pattern. The maskers are 2 cycles per degree
(cpd) sinusoidal gratings of several orientations (0, 11.25, 22.5, 45 and 90 degrees). The target is a
vertically oriented 2 cpd Gabor patch with vertical and horizontal 1/e halfwidths of 0.5 degrees. The
target and the masker are presented simultaneously and viewed from a distance of 162 cm. We created
the corresponding digital images very easily using the program Discrim by Landy (2003).
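For readers who want to reproduce stimuli of this kind without Discrim, the following sketch evaluates a Gabor target and a grating masker at a point given in degrees of visual angle. It is not the Discrim implementation. The Gaussian envelope form, the cosine phase, the convention that 0 degrees means the same (vertical) orientation as the target, and the conversion of contrast to dB as 20 log10 of the contrast are all assumptions made here for illustration.

```python
import math


def gabor_target(x_deg, y_deg, contrast, freq_cpd=2.0, halfwidth_deg=0.5):
    """Vertically oriented Gabor; the envelope falls to 1/e at halfwidth_deg."""
    envelope = math.exp(-(x_deg ** 2 + y_deg ** 2) / halfwidth_deg ** 2)
    return contrast * envelope * math.cos(2.0 * math.pi * freq_cpd * x_deg)


def grating_masker(x_deg, y_deg, contrast, orientation_deg, freq_cpd=2.0):
    """Sinusoidal grating rotated by orientation_deg away from vertical."""
    theta = math.radians(orientation_deg)
    u = x_deg * math.cos(theta) + y_deg * math.sin(theta)
    return contrast * math.cos(2.0 * math.pi * freq_cpd * u)


def contrast_from_db(level_db):
    """Assumed convention: contrast in dB is 20 * log10(contrast)."""
    return 10.0 ** (level_db / 20.0)


# Contrast modulation of target plus masker at one point of the visual field.
target_contrast = contrast_from_db(-30.0)
masker_contrast = contrast_from_db(-20.0)
x, y = 0.25, 0.1
stimulus = gabor_target(x, y, target_contrast) + grating_masker(x, y, masker_contrast, 11.25)
print(stimulus)
```

Sampling these functions on a pixel grid additionally requires the number of pixels per degree, which depends on the display and on the 162 cm viewing distance.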
To fix the parameters of the divisive normalization we used a “training set” of six B&W natural
images with 512x512 pixel format (“Boats”, “Elaine”, “Goldhill”, “Lena”, “Peppers” and “Sailboat”).
We considered a 12-coefficient neighbourhood $\{c_j^2\}$ of squared coefficients adjacent to $c_i$ along the
four dimensions (8 in a square box in the 2D space, 2 in orientation and 2 in scale), and we used
maximum-likelihood (ML) estimation independently for each subband of the QMF pyramid.
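One way to set up such an ML estimation, assuming the zero-mean Gaussian conditional model of Section 2.2 with variance $d_i^2 + \sum_{j \neq i} e_{ij} c_j^2$, is to minimize the negative log-likelihood of the observed coefficients of a subband. The sketch below only evaluates that objective. The function name and the data are hypothetical, and the choice of optimizer is left open because it is not specified here.

```python
import math


def negative_log_likelihood(d2, weights, samples):
    """Negative log-likelihood of (c_i, neighbours) pairs under
    c_i ~ N(0, d2 + sum_j e_j * c_j^2)."""
    total = 0.0
    for c_i, c_neighbours in samples:
        variance = d2 + sum(e_j * c_j ** 2 for e_j, c_j in zip(weights, c_neighbours))
        total += 0.5 * (math.log(2.0 * math.pi * variance) + c_i ** 2 / variance)
    return total


# Hypothetical samples: (coefficient, its neighbouring coefficients).
samples = [
    (0.5, [0.4, -0.6]),
    (-2.0, [1.8, 2.2]),
    (0.1, [-0.2, 0.1]),
    (3.0, [-2.5, 3.5]),
]

# Candidate parameter sets can be compared by their objective values;
# an optimizer would search for the minimum over d2 >= 0 and weights >= 0.
print(negative_log_likelihood(0.1, [0.5, 0.5], samples))
print(negative_log_likelihood(0.1, [0.0, 0.0], samples))
```

With no neighbours, the objective is minimized when `d2` equals the mean of the squared coefficients, which is the usual ML variance estimate for zero-mean Gaussian data.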
The gains $k_i$ (one constant for each subband of the QMF pyramid) in the error
pooling stage, on the other hand, were determined by fitting the metric outputs to the psychophysical
data. The left panel of Figure 2 shows the results for a masker orientation of 0 degrees. As we can see,
the fit to the data is extremely good. Our metric yields much better results than simple perceptual
metrics, such as the “single filter, uniform masking” (SFUM) model by Ahumada (1996) (see the right
panel of Figure 2).
Figure 2. Results of fitting our metric (left) and the SFUM model (right) to empirical data for the 0-degree masker. Empirical data are
denoted by circles. Solid curves denote predicted target threshold contrasts. Axes in both panels: masker contrast (dB) versus target threshold contrast (dB).
While the fits to the 0-degree and 11.25-degree data are impressive, the fits to the other curves are
not as good. This is caused by the relatively broad orientation bandwidth of the QMFs (see Teo and
Heeger, 1994).
One important characteristic of the contrast masking data is the presence (or absence) of a “dipper”,
which indicates that, within that range of masker contrasts, the masker facilitates the detection of the
target. The particular nonlinearity used in our metric (Eq. 1) allows the dipper to be fitted quite well, which
is not the case for other non-linear functions (for example, if we simply take the square root in Eq. 1).
4. Summary and conclusions
We have presented a perceptual image distortion metric based on a statistically-derived divisive
normalization model of V1 neuron responses. Parameters of the divisive normalization have been
determined from natural image statistics using a novel statistically-derived model of minimum
noticeable distortions in squared linear coefficients of natural images. The resulting statistical way of
fixing the divisive normalization parameters is in complete agreement with the accepted hypothesis
that sensory systems are adapted to the signals to which they are exposed and also has been used ad
hoc in the literature. An important difference with other similar schemes is that the neighbourhood
considered in the divisive normalization contains image linear coefficients belonging to different
positions, orientations and scales. This permits the model to implement many intraband and interband
masking mechanisms. Finally, the results show that the perceptual metric fits
psychophysical data from classical contrast masking experiments very well, and we expect even better
results in more realistic experiments with natural stimuli.
Acknowledgements
This research was supported by the Spanish Commission for Research and Technology (CICYT) under grant DPI2002-04370-C02-02. Roberto Valerio was supported by a Madrid Education Council and European Social Fund Scholarship for
Training of Research Personnel, and by a City Hall of Madrid Scholarship for Researchers and Artists in the Residencia de
Estudiantes.
References
Ahumada, A. J., Jr. 1996. Simplified vision models for image quality assessment. In SID International
Symposium Digest of Technical Papers. Ed. Morreale, J. Society for Information Display, 27: 397-400.
Bonds, A. B. 1989. Role of inhibition in the specification of orientation selectivity of cells in the cat striate
cortex. Visual Neuroscience, 2: 41-55.
Carandini, M., Heeger, D. J. and Movshon, J. A. 1997. Linearity and normalization in simple cells of the
macaque primary visual cortex. J. Neuroscience, 17: 8621-8644.
Eckert, M. P. and Bradley, A. P. 1998. Perceptual quality metrics applied to still image compression. Signal
Processing, 70: 177-200.
Foley, J. M. 1994. Human luminance pattern mechanisms: masking experiments require a new model. Journal
of the Optical Society of America A, 11: 1710-1719.
Foley, J. M. and Boynton, G. M. 1994. A new model of human luminance pattern vision mechanisms: Analysis
of the effects of pattern orientation, spatial phase, and temporal frequency. In Computational Vision Based on
Neurobiology. Ed. Lawton, T. A. SPIE Proceedings, 2054.
Geisler, W. S. and Albrecht, D. G. 1992. Cortical neurons: Isolation of contrast gain control. Vision Research,
32(8): 1409-1410.
Heeger, D. J. 1992. Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9: 181-198.
Landy, M. S. 2003. A tool for determining image discriminability, http://www.cns.nyu.edu/~msl/discrim/discrimpaper.pdf.
Pappas, T. N. and Safranek, R. J. 2000. Perceptual criteria for image quality evaluation. In Handbook of Image
and Video Proc. Ed. Bovik, A. Academic Press.
Schwartz, O. and Simoncelli, E. P. 2001. Natural signal statistics and sensory gain control. Nature Neuroscience,
4(8): 819-825.
Simoncelli, E. P. and Adelson, E. H. 1990. Subband transforms. In Subband Image Coding. Ed. Woods, J. W.
Kluwer Academic Publishers. Chapter 4: 143-192.
Simoncelli, E. P. and Schwartz, O. 1999. Modeling surround suppression in V1 neurons with a statistically-derived normalization model. Advances in Neural Information Processing Systems, 11: 153-159.
Teo, P. C. and Heeger, D. J. 1994. Perceptual image distortion. In Human Vision, Visual Processing, and Digital
Display V. Eds. Rogowitz, B. E. and Allebach, J. P. Proc. SPIE, 2179: 127-141.
Wainwright, M. J., Schwartz, O. and Simoncelli, E. P. 2002. Natural image statistics and divisive normalization:
modeling nonlinearities and adaptation in cortical neurons. In Statistical Theories of the Brain. Eds. Rao, R.,
Olshausen, B., and Lewicki, M. MIT Press. Chapter 10: 203-222.