Full Reference Image Quality Assessment Based on Saliency Map Analysis
Tong Yubing*, Hubert Konik*, Faouzi Alaya Cheikh** and Alain Tremeau*
* Laboratoire Hubert Curien UMR 5516, Université Jean Monnet - Saint-Etienne, Université de Lyon, 42000 Saint-Etienne, France. E-mail: yubing.tong@univ-st-etienne.fr
** Computer Science & Media Technology, Gjøvik University College, PO Box 191, N-2802 Gjøvik, Norway
Abstract. Salient regions of an image are the parts that differ significantly from their neighbors. They tend to immediately attract our eyes and capture our attention, and they are therefore very important regions for the assessment of image quality. For the sake of simplicity, however, region saliency has not been fully considered in most previous image quality assessment models. PSNR_HVS and PSNR_HVS_M are two recent image quality estimation methods with promising performance [1], but neither uses salient region information; moreover, they divide images into fixed-size blocks and process each block independently, in the same way and with the same weights. In this paper, the contribution of each region to the global quality measure of an image is weighted with a variable weight computed as a function of its saliency. The idea is to take the visual attention mechanism into account: in salient regions, the differences between the distorted and original images are emphasized, as if we were observing the difference image through a magnifying glass. A mixed saliency map model based on Itti's model and face detection is proposed here. As faces play an important role in our visual attention, the face should be used as an independent feature of the saliency map; both low-level features (intensity, color, orientation) and a high-level feature (the face) are used in the mixed model. Differences in salient regions are then given more importance and thus contribute more to the image quality score. The saliency value of every point is correlated with that of its neighboring region, considering the statistical information of the point's neighborhood and the global saliency distribution. Experiments on the 1700 distorted images of the TID2008 database show that the performance of image quality assessment on the full set is enhanced. In particular, on the Exotic and Exotic2 subsets, which contain contrast change and mean shift distortions, the performance of the modified PSNR_HVS and PSNR_HVS_M based on the saliency map is greatly enhanced. PSNR_HVS and PSNR_HVS_M use only image intensity information, whereas our proposed method also takes color contrast and other information into account when assessing image quality, and can therefore reflect the behavior of our visual attention more effectively. For PSNR_HVS, the Spearman correlations on the Exotic and Exotic2 subsets are enhanced by nearly 69.1% and 16.4% respectively, and the Kendall correlations by nearly 60.5% and 6.7% respectively. For PSNR_HVS_M, the Spearman correlations are enhanced by nearly 61.3% and 15.3% respectively, and the Kendall correlations by nearly 51.55% and 4.76% respectively.
Key words: image quality assessment, saliency map, face detection, visual attention mechanism
1. Introduction
Subjective image quality assessment is a costly process: it requires a large number of observers and takes a lot of time. It therefore cannot be used in automatic evaluation programs or in real-time applications, and the trend is to assess image quality with objective methods. Usually, image quality assessment models are set up to approximate the subjective score of image quality. Some reference models have been proposed, such as those in VQEG [2]. Several methods have achieved better results than PSNR and MSE, including UQI, SSIM, LINLAB, PSNR_HVS, PSNR_HVS_M, NQM, WSNR and VSNR [3-16]. However, it has been demonstrated that, considering the wide range of possible distortion types, no existing metric performs well enough. PSNR_HVS and PSNR_HVS_M are two recent methods with high performance on the Noise, Noise2, Safe, Simple and Hard subsets of TID2008, which makes them appropriate for evaluating the efficiency of image filtering and lossy image compression [1]. But PSNR_HVS and PSNR_HVS_M show very low performance on the Exotic and Exotic2 subsets of the TID2008 database. With PSNR_HVS and PSNR_HVS_M, images are divided into fixed-size blocks, and every block is processed independently, in the same way and with the same weights.
Such a way of comparing images contradicts the way our HVS proceeds. Dividing an image into blocks of equal size irrespective of its content is counterproductive, since it breaks large objects and structures of the image into semantically non-meaningful small fragments. Additionally, it introduces strong discontinuities that were not present in the original image. Furthermore, it is proven that our HVS is selective in its processing of the visual stimulus. Thanks to this selectivity of our visual attention mechanism, human observers usually focus more on some regions than on others, irrespective of their size. It is therefore intuitive to think that an approach that treats all image regions in the same way, disregarding the variation of their contents, will never be able to faithfully estimate the perceived quality of the visual media. We therefore propose to use saliency information to mimic the selectivity of the HVS and to integrate it into existing objective image quality metrics, giving more importance to the contributions of salient regions over those of non-salient regions.
An image saliency map has been used to weight the results of SSIM, VIF, etc. [17], but the saliency map used in that study was in fact the image reconstructed from the phase spectrum by inverse Fourier transform, which reflects the presence of contours. This may not be enough, since the contours of an image are far from containing all of its information. The detection order of region saliency has also been used to weight the difference between reference and distorted images [18]: for every image, 20 time steps are used to find the salient regions; a salient region found first is assigned the largest weight, and vice versa. For the pixels within a detected salient region, uniform weighting and simple linear weighting were used. In this paper, we propose to consider additional information computed from the image contents that affects region saliency. We consider not only the saliency value of every pixel, but also the relative saliency degree of the current pixel with respect to its neighboring field and to the global image. Furthermore, the contribution of non-salient regions to the image quality score is reduced by assigning lower weights to them.
Faces play an important role in recognition and focus much of our attention [19]. The face should thus be used as a high-level feature in the saliency map analysis, in addition to low-level features such as those used in Itti's model [20], based on color, intensity and orientation. In this paper, we propose a mixed saliency map model based on Itti's model and a face detection model.
This paper is organized as follows. PSNR_HVS and PSNR_HVS_M are reviewed in Section 2; an example of distortion in a salient region is then given to show that salient regions contribute more to the perceived image quality, which is not considered in the PSNR_HVS and PSNR_HVS_M models. In Section 3, an image quality assessment model based on a mixed saliency map is proposed. Experimental results on images from the TID2008 database are presented and discussed in Section 4. Section 5 concludes the paper.
2. Analysis of Previous Work and Primary Conclusion
PSNR and MSE are two common methods used to assess the quality of a distorted image. They are defined by

$$\mathrm{PSNR} = 10 \lg\left(\frac{255^2}{\mathrm{MSE}}\right) \qquad (1)$$

$$\mathrm{MSE} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} \Delta(i,j)^2 \qquad (2)$$

$$\Delta(i,j) = a(i,j) - \tilde{a}(i,j) \qquad (3)$$

where (i,j) is the current pixel position, a(i,j) and ã(i,j) are the original and the distorted image respectively, and M and N are the height and width of the image. Neither image content information nor HVS characteristics are taken into account by PSNR and MSE when they are used to assess image quality. Consequently, PSNR and MSE cannot achieve good results when compared to subjective quality scores, especially for images such as those in the Noise, Noise2, Exotic and Exotic2 subsets, which include images corrupted with additive Gaussian noise, high frequency noise, impulse noise, Gaussian blur, etc. PSNR gives the worst results according to Spearman's and Kendall's correlations [1].
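For reference, equations (1)-(3) amount to only a few lines of code. The sketch below assumes 8-bit images stored as NumPy arrays; lg in equation (1) denotes the base-10 logarithm.

    import numpy as np

    def mse(ref, dist):
        # Mean squared error between reference and distorted images, eqs. (2)-(3)
        diff = ref.astype(np.float64) - dist.astype(np.float64)
        return float(np.mean(diff ** 2))

    def psnr(ref, dist):
        # Peak signal-to-noise ratio in dB for 8-bit images, eq. (1)
        m = mse(ref, dist)
        return float("inf") if m == 0 else 10.0 * np.log10(255.0 ** 2 / m)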
PSNR_HVS and PSNR_HVS_M are two models designed to improve the performance of PSNR and MSE. PSNR_HVS divides the image into non-overlapping 8x8 pixel blocks. Then, for every 8x8 block, the difference between the original and the distorted block is weighted by the coefficients of the Contrast Sensitivity Function (CSF). So equation (3) can be rewritten as follows:

$$\Delta_{PSNR\_HVS}(i,j) = \Delta(i,j) \cdot CSFCof(i,j) \qquad (4)$$

Here Δ(i,j) is calculated using DCT coefficients.

PSNR_HVS_M is defined in a similar way to PSNR_HVS, but the difference between the DCT coefficients is further multiplied by a contrast masking metric (CM) for every 8x8 block. The result is then weighted by CSFCof as follows:

$$\Delta_{PSNR\_HVS\_M}(i,j) = \Delta(i,j) \cdot CM(i,j) \cdot CSFCof(i,j) \qquad (5)$$
Consequently, a new MSE metric for PSNR_HVS can be defined as follows:

$$\mathrm{MSE}_{PSNR\_HVS} = \frac{1}{M \times N} \sum_{I=1}^{M/8} \sum_{J=1}^{N/8} \sum_{i=1}^{8} \sum_{j=1}^{8} \Delta_{PSNR\_HVS}(i,j)^2 \qquad (6)$$

where (I,J) is the position of an 8x8 block in the image and (i,j) is the position of a pixel in the 8x8 block. MSE_PSNR_HVS_M is defined in the same way. PSNR_HVS and PSNR_HVS_M are then computed by replacing the MSE in equation (1) with MSE_PSNR_HVS or MSE_PSNR_HVS_M.
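A minimal sketch of this block-wise computation is given below. It assumes grayscale NumPy arrays with dimensions divisible by 8, and it replaces the published CSFCof table of the PSNR-HVS metrics [10, 16] with a uniform placeholder, so it only illustrates the structure of equations (4) and (6), not the exact metric.

    import numpy as np
    from scipy.fftpack import dct

    CSF_COF = np.ones((8, 8))  # placeholder; the real CSF table is given in [10, 16]

    def block_dct2(block):
        # 2-D type-II DCT of one 8x8 block
        return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

    def mse_psnr_hvs(ref, dist):
        # Block-wise CSF-weighted MSE in the spirit of equations (4) and (6)
        h, w = ref.shape
        total = 0.0
        for y in range(0, h - 7, 8):
            for x in range(0, w - 7, 8):
                d = block_dct2(ref[y:y+8, x:x+8].astype(np.float64)) \
                    - block_dct2(dist[y:y+8, x:x+8].astype(np.float64))
                total += np.sum((d * CSF_COF) ** 2)  # eq. (4), squared and summed
        return total / (h * w)  # eq. (6)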
2.1 Analysis
With PSNR_HVS and PSNR_HVS_M, images are processed as non-overlapping 8x8 blocks, and every 8x8 block is considered to contribute equally to the image quality metric. From the standpoint of human visual perception, a fixed 8x8 block size is not optimal considering the variability of image content: the size of a salient region is not fixed, and independent blocks of fixed size may produce blockiness or sudden changes that greatly affect the subjective quality perception. As an illustration, the following figures show that different parts of an image contribute differently to the perceived image quality, and that degradations in salient regions may be more prominent and hence should contribute more to the final quality measure.
Figure 1. Reference image 'I18' of TID2008.
Figure 2. Saliency map of 'I18' with face detection.
Figure 3. 'I18' with noise in one salient region.
Figure 4. 'I18' with noise in four non-salient regions.
The image 'I18' and its corresponding saliency map are illustrated in Figures 1 and 2 respectively. Figure 3 is a distorted version of 'I18' with noise on the salient region comprising the face, neck and chest. The objective quality of this distorted image is 46.3 dB with PSNR, 33.74 dB with PSNR_HVS and 36.3 dB with PSNR_HVS_M. Figure 4 is another distorted version of 'I18', with noise on non-salient regions; its objective quality is 41.6 dB with PSNR, 32.4 dB with PSNR_HVS and 35.8 dB with PSNR_HVS_M. Here, a local smoothing filter was applied to the corresponding noisy parts of the saliency map. The objective metric values indicate that the quality of Figure 3 is better than that of Figure 4, yet it is easy to see that the perceived quality of Figure 4 is better than that of Figure 3, since the distortion was added to the non-salient regions of Figure 4. The distorted parts of Figure 4 are hardly noticeable unless they are carefully observed pixel by pixel. In Figure 5, the noisy non-salient regions of Figure 4 are marked with blue circles.
Figure 5. ‘I18’ with distortion in four non-salient regions.
For the reference image 'I18' of TID2008, noise was added in equal quantity to different parts of the image, and each time the image quality scores were computed and found to differ. This result confirms our initial expectation: quantitatively equal distortions yield different image quality scores. Each part of an image contributes differently to the perceived image quality, and distortions in salient regions affect image quality more profoundly than those in non-salient regions.
3. Image Quality Assessment Based on Region Saliency
In this section, the saliency map of an image is calculated using either Itti's saliency map model or, when faces are present in the image, the following mixed saliency map model. First, a simple and fast face detection program from OpenCV, based on Haar-like features, is used to decide whether the current image contains human faces [21]. According to that decision, either Itti's model or the mixed model is then used to calculate the saliency map.
3.1 Itti’s Saliency Map Model
The saliency map model that we propose is mainly based on Itti's visual attention model. Considering that faces play an important role in our daily social interactions and thus easily focus our visual attention, we propose a mixed saliency map model based on Itti's visual attention model and face detection.

Itti's saliency map model is a bottom-up visual attention mechanism based on color, intensity and orientation features. Each feature is analyzed using Gaussian pyramids at multiple scales. The model is based on seven feature maps: one intensity map, four orientation maps (at 0°, 45°, 90° and 135°) and two color opponency maps (red/green and blue/yellow). After a normalization step, these feature maps are summed into three conspicuity maps: the intensity conspicuity map C_i, the color conspicuity map C_c and the orientation conspicuity map C_o. Finally, the conspicuity maps are combined to obtain the saliency map according to the following equation:

$$S_{Itti} = \frac{1}{3} \sum_{k \in \{i, c, o\}} C_k \qquad (7)$$
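Assuming the three conspicuity maps have already been computed by an Itti-style implementation, the final combination of equation (7) reduces to the sketch below; the per-map range normalization is our assumption, standing in for Itti's normalization operator N(.).

    import numpy as np

    def itti_saliency(c_i, c_c, c_o):
        # Combine the three normalized conspicuity maps into one saliency map, eq. (7)
        def normalize(m):
            rng = m.max() - m.min()
            return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
        return (normalize(c_i) + normalize(c_c) + normalize(c_o)) / 3.0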
As an example, let us consider the image 'I01' of TID2008 (see Figure 6 (a)), its saliency map computed using Itti's model (Figure 6 (b)) and the corresponding surface plot (Figure 6 (c)). The more reddish a region of the saliency map is, the more salient its corresponding image region is. In Figure 6 (c), most of the regions are non-salient, except for a few salient regions. This concords with the selectivity of the HVS, which focuses only on some parts of the image instead of the whole content.
Figure 6. Image 'I01' with its saliency map and corresponding surface plot: (a) reference image 'I01' in TID2008; (b) saliency map of 'I01'; (c) surface plot of the 'I01' saliency map.
3.2 Saliency Map Model based on Face Detection
In many images, faces focus more attention than any other feature. Psychological tests have shown that faces, heads or hands are perceived prior to any other details [20], so faces can be used as high-level features for the saliency map. One drawback of Itti's visual attention model is that its saliency map is not well adapted to images containing faces. Several studies in face recognition have shown that skin hue features can be used to extract face information. To detect heads and hands in images, we used the face recognition and location algorithm used by Walther et al. [22]. This algorithm is based on a Gaussian model of the skin hue distribution in the (r', g') color space as an independent feature. For a given color pixel, the model's hue response is defined by the following equation:

$$h(r', g') = \exp\left\{-\frac{1}{2}\left[\frac{(r' - \mu_r)^2}{\sigma_r^2} + \frac{(g' - \mu_g)^2}{\sigma_g^2} - \frac{2\rho\,(r' - \mu_r)(g' - \mu_g)}{\sigma_r \sigma_g}\right]\right\} \qquad (8)$$

with

$$r' = \frac{r}{r + g + b}, \qquad g' = \frac{g}{r + g + b} \qquad (9)$$

where (μ_r, μ_g) are the means of the skin hue distribution, σ_r² and σ_g² are the variances of the r' and g' components, and ρ is the correlation between the components r' and g'. These parameters were statistically estimated from 1153 photographs containing faces. The function h(r', g') can be considered as a color variability function around a given hue.
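The hue response of equations (8) and (9) maps directly to code. In the sketch below, the mean, variance and correlation values are illustrative placeholders, not the values estimated by Walther et al. from the 1153 photographs, and the image is assumed to be in RGB channel order.

    import numpy as np

    # Illustrative placeholder parameters (NOT the statistically estimated values)
    MU_R, MU_G = 0.43, 0.31
    SIG_R, SIG_G, RHO = 0.05, 0.03, 0.5

    def skin_hue_response(rgb):
        # Gaussian skin-hue response h(r', g') of eq. (8) over an RGB image
        rgb = rgb.astype(np.float64)
        s = rgb.sum(axis=2) + 1e-9          # avoid division by zero
        rp = rgb[..., 0] / s                # r' of eq. (9)
        gp = rgb[..., 1] / s                # g' of eq. (9)
        dr, dg = rp - MU_R, gp - MU_G
        q = dr**2 / SIG_R**2 + dg**2 / SIG_G**2 - 2*RHO*dr*dg / (SIG_R*SIG_G)
        return np.exp(-0.5 * q)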
3.3 Mixed Saliency Map Model based on Face Detection
The mixed saliency analysis model that we propose is a linear combination of Itti's model and the Gaussian face detection model, as follows:

$$S_{MIX} = \alpha \cdot S_{Itti} + (1 - \alpha) \cdot S_{Face} \qquad (10)$$

where α is a constant. The best results that we obtained in our study were achieved for α = 3/7.
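The blend itself is a one-liner; the sketch below assumes both maps have been normalized to a common range beforehand.

    import numpy as np

    def mixed_saliency(s_itti, s_face, alpha=3.0 / 7.0):
        # Linear blend of eq. (10); alpha = 3/7 gave the best results in this study
        return alpha * np.asarray(s_itti) + (1.0 - alpha) * np.asarray(s_face)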
Figure 7. Saliency maps for the 'I18' reference image: (a) saliency map from the mixed model (with skin hue detection); (b) saliency map from Itti's model (without skin hue detection).
Figure 8. Saliency maps for the 'I23' reference image: (a) reference image 'I23' in TID2008; (b) saliency map from the mixed model (with skin hue detection); (c) saliency map from Itti's model (without skin hue detection).
For most images containing faces, heads or hands, the mixed model with skin hue detection gives better results than Itti's model, i.e. more accurate saliency maps. The two examples given in this paper show the difference between Itti's model and the mixed model for face images. The first example corresponds to the reference image 'I18' of TID2008, which contains a face, eyes and hands. Figure 7 (a) shows the saliency map computed with the mixed model, and Figure 7 (b) the saliency map computed with Itti's model. In Figure 1, the most salient regions, which attract the attention, are the face and the hands. Figures 7 (a) and 7 (b) show that the saliency map computed with the mixed model is more precise than the one computed with Itti's model.
Another interesting example is the reference image 'I23', which contains no human face. The original reference image is shown in Figure 8 (a). The most salient regions, which focus the attention, are the heads of the parrots, in particular their eyes and faces. Considering the hue of the parrots' faces, and in particular the hue of the neighborhood around the eyes, we computed the corresponding color variability function h(r', g') and then the mixed model associated with this hue distribution. The saliency map computed with the mixed model is given in Figure 8 (b) and the one computed with Itti's model in Figure 8 (c). Figures 8 (b) and 8 (c) show that the saliency map computed with the mixed model is more accurate than that computed with Itti's model. This second example shows that the mixed model can be extended to high-level features other than human faces.
3.4 Mixed Saliency Map Model based on Salient Region
We usually focus on salient regions rather than on salient points. This means that the saliency value of every pixel should be weighted as a function of the saliency values of the pixels belonging to its neighboring field, or of the saliency value of the region it belongs to. For each pixel belonging to a salient region, we propose to enlarge the area of the neighboring field, as if we were looking through a magnifying glass. For each pixel belonging to a non-salient region, we propose to give less weight to the neighboring field. We used a metric to define the salient regions and the neighboring field associated with a given pixel.
First we computed a binary mask B_{i,j} defined as follows:

$$B_{i,j} = \begin{cases} 0 & \text{if } S_{MIX}(i,j) < T_1, \\ 1 & \text{otherwise,} \end{cases} \qquad (11)$$

where T_1 is a threshold computed experimentally, S_MIX(i,j) is the saliency value computed from the saliency map model considered, and (i,j) is the pixel position in the image.
Next we computed, block by block, the relative saliency degree of the current pixel as a function of its neighboring field. The current pixel A(i,j), the current Block(I,J) and the overlapping neighboring field N(i,j) of size k × k are illustrated in Figure 9.

Figure 9. Current block, current pixel and its neighboring field.

β_{I,J} was defined as a saliency flag of the current block, as follows:

$$\beta_{I,J} = \begin{cases} \text{false} & \text{if } \sum_{i=1}^{8}\sum_{j=1}^{8} B_{Block(I,J)}(i,j) < T_2, \\ \text{true} & \text{otherwise,} \end{cases} \qquad (12)$$

where T_2 is an experimental threshold and (i,j) is the pixel position in Block(I,J).
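Equations (11) and (12) can be sketched as follows. The threshold values T_1 and T_2 are not given in the text, so they are left as parameters, and the comparison directions are our reading of the definitions above.

    import numpy as np

    def binary_mask(s_mix, t1):
        # Per-pixel mask B of eq. (11): 1 for salient pixels, 0 otherwise
        return (s_mix >= t1).astype(np.uint8)

    def block_saliency_flag(mask, I, J, t2):
        # Saliency flag beta of eq. (12) for the 8x8 block at block index (I, J)
        block = mask[8 * I: 8 * I + 8, 8 * J: 8 * J + 8]
        return int(block.sum()) >= t2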
Then, since salient regions attract more of the observers' attention than non-salient regions, we gave less weight to pixels belonging to non-salient regions. This means that the saliency value of every pixel is weighted by a function of the saliency values of the pixels belonging to its neighboring area. We considered several variables to compute the relative saliency of the current neighboring area, the current block and the current pixel.
Let us define γ_Block(I,J) and γ_region(i,j), the relative saliency degrees of the current block and of the current neighboring field, as functions of the average saliency and of the global image:

$$\gamma_{Block}(I,J) = \frac{1}{S_{Global}} \left( \frac{1}{64} \sum_{i=1}^{8} \sum_{j=1}^{8} S_{MIX}(i,j) \right) \qquad (13)$$

$$\gamma_{region}(i,j) = \frac{S_{Local}}{S_{Global}} \qquad (14)$$

with

$$S_{Local} = \frac{1}{k \times k} \sum_{i=1}^{k} \sum_{j=1}^{k} S_{MIX}(i,j) \qquad (15)$$

$$S_{Global} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} S_{MIX}(i,j) \qquad (16)$$

Let us also define γ_pixel_average(i,j) and γ_pixel_max(i,j), the relative saliency degrees of the current pixel as functions of its neighboring field and of the global image:

$$\gamma_{pixel\_average}(i,j) = \max\left( \frac{S_{MIX}(i,j)}{S_{Local}},\ \frac{S_{MIX}(i,j)}{S_{Global}} \right) \qquad (17)$$

$$\gamma_{pixel\_max}(i,j) = \frac{S_{MIX}(i,j)}{S_{Max\_Local}} \qquad (18)$$

with

$$S_{Max\_Local} = \max\left\{ S_{MIX}(i,j) \mid i \leq k,\ j \leq k \right\} \qquad (19)$$
Finally, to decrease the influence of non-salient regions, we computed a weighted saliency map ws(i,j) defined as follows:

$$ws(i,j) = \max\left( \gamma_{region}(i,j),\ \gamma_{Block}(I,J) \right), \qquad \gamma_{region}(i,j) \geq T_3, \qquad (20)$$

where T_3 is a threshold computed experimentally.
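A possible implementation of equations (13)-(16) and (20) is sketched below. The text does not specify the value of ws where the T_3 test fails, so the sketch assumes such pixels receive no saliency boost; the k x k local average of equation (15) is computed with a box filter.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def weighted_saliency_map(s_mix, k, t3):
        # Weighted saliency map ws of eq. (20), built from eqs. (13)-(16)
        s_global = float(s_mix.mean()) + 1e-9          # eq. (16)
        s_local = uniform_filter(s_mix.astype(np.float64), size=k,
                                 mode="nearest")       # eq. (15)
        gamma_region = s_local / s_global              # eq. (14)
        h, w = s_mix.shape
        gamma_block = np.zeros((h, w))
        for y in range(0, h - 7, 8):
            for x in range(0, w - 7, 8):
                gamma_block[y:y+8, x:x+8] = s_mix[y:y+8, x:x+8].mean() / s_global  # eq. (13)
        ws = np.maximum(gamma_region, gamma_block)     # eq. (20)
        ws[gamma_region < t3] = 0.0  # assumption: no boost below the T3 threshold
        return ws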
Thus, if we consider for example the saliency map of the reference image 'I18' given in Figure 7 (a), we obtain the weighted saliency map ws shown in Figure 10.
Figure 10. Surface plots of (a) the saliency map and (b) the weighted saliency map ws.
Comparing Figures 10 (a) and (b), we can see that ws reflects the fact that observers usually focus on the most salient parts rather than on all locally salient parts. The most salient regions are those which are not only locally salient but also salient with respect to the global image.
3.5 Image Quality Assessment weighted by Salient Region
In order to improve the efficiency of image quality metrics by taking the human visual attention mechanism into account, we propose to weight the image differences over salient regions rather than salient points. Considering that human observers are unable to focus on several areas at the same time, and that they assess the quality of an image first and mainly from its most salient areas, we propose to weight the image difference metrics by the weighted saliency map ws defined above. The PSNR_HVS metric can thus be computed with the following pseudo code:
// for the pixels in a target 8x8 block
for i = 1:8
    for j = 1:8
        if (β(I,J) is false)
            Δ_PSNR_HVS_S(i,j) = Δ(i,j) * CSFCof(i,j) / (CSFCof(i,j) + 1);
        else
            if ((γ_pixel_max > T4) && (γ_pixel_average > T5))
                Δ_PSNR_HVS_S(i,j) = Δ_PSNR_HVS(i,j) * ws(i,j);
            else
                Δ_PSNR_HVS_S(i,j) = Δ_PSNR_HVS(i,j);
            end
        end
    end
end

where (i,j) is the position of a pixel in an 8x8 block. The thresholds T3, T4 and T5 have been empirically set to 15, 0.5 and 40 respectively for the TID2008 database.
4. Experimental Results and Analysis
The images of the TID2008 database were used to test our image quality assessment model. TID2008 is the largest database of distorted images intended for the verification of full reference quality metrics [23]. We used TID2008 because it contains more distorted images, more types of distortion and more subjective experiments than the LIVE database [24]. TID2008 contains 1700 distorted images (25 reference images × 17 types of distortion × 4 levels of distortion), whereas LIVE contains 779 distorted images with only 5 types of distortion and 161 subjective experiments. The MOS (Mean Opinion Score) of image quality was computed from the results of 838 subjective experiments carried out by observers from Finland, Italy and Ukraine. The higher the MOS (0 minimal, 9 maximal; the MSE of each score is 0.019), the higher the visual quality of the image.
The distorted images are grouped into a full set and into different subsets, namely Noise, Noise2, Safe, Hard, Simple, Exotic and Exotic2, corresponding to different distortions. For example, the Noise subset contains several types of distortions such as high frequency noise, Gaussian blur and image denoising. In order to compare the accuracy of the image quality metrics weighted by salient regions with that of the non-weighted metrics, we computed the Spearman and Kendall correlation coefficients, two indices used in image quality assessment to measure the correlation of objective measures with human perception. Other methods, including PSNR and LINLAB, were also computed for comparison purposes.
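Both rank correlation coefficients are available in SciPy; given a vector of objective scores and the corresponding MOS values, they can be computed as follows.

    from scipy.stats import spearmanr, kendalltau

    def rank_correlations(objective_scores, mos):
        # Spearman and Kendall rank correlations with subjective MOS values
        rho, _ = spearmanr(objective_scores, mos)
        tau, _ = kendalltau(objective_scores, mos)
        return rho, tau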
Table 1. Spearman correlation coefficients.

| Subset  | PSNR  | LINLAB | PSNR_HVS | PSNR_HVS_S | δ (%)  | PSNR_HVS_M | PSNR_HVS_M_S | δ (%)  |
|---------|-------|--------|----------|------------|--------|------------|--------------|--------|
| Noise   | 0.704 | 0.839  | 0.917    | 0.914      | -0.327 | 0.918      | 0.920        | 0.218  |
| Noise2  | 0.612 | 0.853  | 0.933    | 0.863      | -7.50  | 0.930      | 0.871        | -6.344 |
| Safe    | 0.689 | 0.859  | 0.932    | 0.920      | -1.28  | 0.936      | 0.924        | -1.282 |
| Hard    | 0.697 | 0.761  | 0.791    | 0.814      | 2.908  | 0.783      | 0.816        | 4.215  |
| Simple  | 0.799 | 0.877  | 0.939    | 0.933      | -0.639 | 0.942      | 0.935        | -0.743 |
| Exotic  | 0.248 | 0.135  | 0.275    | 0.465      | 69.09  | 0.274      | 0.442        | 61.314 |
| Exotic2 | 0.308 | 0.033  | 0.324    | 0.377      | 16.358 | 0.287      | 0.331        | 15.331 |
| Full    | 0.525 | 0.487  | 0.594    | 0.622      | 4.71   | 0.559      | 0.595        | 6.44   |
Table 2. Kendall correlation coefficients.

| Subset  | PSNR  | LINLAB | PSNR_HVS | PSNR_HVS_S | δ (%)  | PSNR_HVS_M | PSNR_HVS_M_S | δ (%)  |
|---------|-------|--------|----------|------------|--------|------------|--------------|--------|
| Noise   | 0.501 | 0.652  | 0.751    | 0.745      | -0.799 | 0.752      | 0.752        | 0      |
| Noise2  | 0.424 | 0.671  | 0.780    | 0.680      | -12.82 | 0.771      | 0.689        | -10.63 |
| Safe    | 0.486 | 0.682  | 0.772    | 0.752      | -2.59  | 0.778      | 0.757        | -2.69  |
| Hard    | 0.516 | 0.569  | 0.614    | 0.634      | 3.257  | 0.606      | 0.637        | 5.11   |
| Simple  | 0.598 | 0.715  | 0.785    | 0.773      | -1.52  | 0.789      | 0.777        | -1.52  |
| Exotic  | 0.178 | 0.084  | 0.195    | 0.313      | 60.51  | 0.194      | 0.294        | 51.55  |
| Exotic2 | 0.225 | 0.026  | 0.238    | 0.254      | 6.72   | 0.210      | 0.220        | 4.76   |
| Full    | 0.369 | 0.381  | 0.476    | 0.472      | -0.8   | 0.449      | 0.455        | 1.34   |
In Tables 1 and 2, PSNR_HVS_S and PSNR_HVS_M_S are the modified PSNR_HVS and PSNR_HVS_M based on the weighted saliency map, and δ (%) is the resulting performance enhancement over the corresponding original metric. The original PSNR_HVS and PSNR_HVS_M are based on image difference metrics which assess image quality over independent blocks, without taking into account that salient regions contribute more to the image quality score.

Considering the Spearman correlation coefficients, PSNR_HVS and PSNR_HVS_M perform well on the Noise, Noise2, Safe, Hard and Simple subsets of TID2008, but not on the Exotic and Exotic2 subsets. With the weighted saliency map, the Spearman coefficients of PSNR_HVS and PSNR_HVS_M on the full set are enhanced, although there is a reduction on the Noise2 subset. On the Exotic and Exotic2 subsets, the performance of the modified PSNR_HVS and PSNR_HVS_M based on the saliency map is remarkably enhanced. For PSNR_HVS, the Spearman correlations on Exotic and Exotic2 are enhanced by nearly 69.1% and 16.4% respectively, and the Kendall correlations by nearly 60.5% and 6.7% respectively. For PSNR_HVS_M, the Spearman correlations are enhanced by nearly 61.3% and 15.3% respectively, and the Kendall correlations by nearly 51.55% and 4.8% respectively. Exotic and Exotic2 are the two subsets with contrast change and mean shift distortions. PSNR_HVS and PSNR_HVS_M use only intensity information, whereas our proposed method also detects color contrast, intensity and other information in the image quality assessment, so our method reflects the behavior of our visual attention more effectively than PSNR_HVS or PSNR_HVS_M.
Figure 11 illustrates the scatter plots of the MOS for the different models, including PSNR, LINLAB, PSNR_HVS and PSNR_HVS_S. The figure shows that the scatter plots of the modified PSNR_HVS and PSNR_HVS_M are better clustered than those of the original models, except for a few extreme points.

Figure 11. Scatter plots of the image quality assessment models; the plots with blue points are the results of the image quality assessment models based on the weighted saliency map.

Our results show that PSNR_HVS and PSNR_HVS_M work better than PSNR and LINLAB. The modified PSNR_HVS_S and PSNR_HVS_M_S preserve the performance of the original PSNR_HVS and PSNR_HVS_M on the Noise, Safe, Hard and Simple subsets, while the performance on the Exotic and Exotic2 subsets is improved remarkably.
5. Conclusions and Further Research
In this paper, the saliency map has been introduced to improve image quality assessment, based on the observation that salient regions contribute more to the perceived image quality. The saliency map is defined by a mixed model combining Itti's model and a face detection model. Salient region information, including local contrast saliency and local average saliency, was used instead of salient pixel information to weight the output of previous methods. The experimental results on the TID2008 database show that the weighted saliency map can remarkably enhance the performance of PSNR_HVS and PSNR_HVS_M on specific subsets.

Further research involves extending the test database and analyzing the extreme points in the scatter plots, for which the distance between the objective score and the MOS is large; for such images the image quality assessment models do not work accurately, and their performance would be enhanced by reducing the number of these extreme points. In addition, machine learning methods such as neural networks might be used to learn well-chosen coefficients for the mixed saliency map as well as the thresholds.
References

1. N. Ponomarenko, F. Battisti, K. Egiazarian, J. Astola and V. Lukin, "Metrics performance comparison for color image database", Proc. Fourth International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 14-16 (2009).
2. VQEG, "Final report from the video quality experts group on the validation of objective models of video quality assessment", http://www.vqeg.org/.
3. Matthew Gaubatz, "Metrix MUX Visual Quality Assessment Package: MSE, PSNR, SSIM, MSSIM, VSNR, VIF, VIFP, UQI, IFC, NQM, WSNR, SNR", http://foulard.ece.cornell.edu/gaubatz/metrix_mux/.
4. A. B. Watson, "DCTune: A technique for visual optimization of DCT quantization matrices for individual images", Soc. Inf. Display Dig. Tech. Papers, vol. XXIV, pp. 946-949 (1993).
5. Z. Wang and A. Bovik, "A universal image quality index", IEEE Signal Processing Letters, vol. 9, pp. 81-84 (2002).
6. Z. Wang, A. Bovik, H. Sheikh and E. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612 (2004).
7. Z. Wang, E. P. Simoncelli and A. C. Bovik, "Multi-scale structural similarity for image quality assessment", Proc. IEEE Asilomar Conference on Signals, Systems and Computers (2003).
8. B. Kolpatzik and C. Bouman, "Optimized error diffusion for high quality image display", Journal of Electronic Imaging, pp. 277-292 (1992).
9. B. W. Kolpatzik and C. A. Bouman, "Optimized universal color palette design for error diffusion", Journal of Electronic Imaging, vol. 4, pp. 131-143 (1995).
10. N. Ponomarenko, F. Silvestri, K. Egiazarian, M. Carli, J. Astola and V. Lukin, "On between-coefficient contrast masking of DCT basis functions", CD-ROM Proc. of the Third International Workshop on Video Processing and Quality Metrics (U.S.A.), p. 4 (2007).
11. H. R. Sheikh and A. C. Bovik, "Image information and visual quality", IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444 (2006).
12. N. Damera-Venkata, T. Kite, W. Geisler, B. Evans and A. Bovik, "Image quality assessment based on a degradation model", IEEE Transactions on Image Processing, vol. 9, pp. 636-650 (2000).
13. T. Mitsa and K. Varkur, "Evaluation of contrast sensitivity functions for the formulation of quality measures incorporated in halftoning algorithms", Proc. ICASSP, pp. 301-304 (1993).
14. H. R. Sheikh, A. C. Bovik and G. de Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics", IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117-2128 (2005).
15. D. M. Chandler and S. S. Hemami, "VSNR: A wavelet-based visual signal-to-noise ratio for natural images", IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284-2298 (2007).
16. K. Egiazarian, J. Astola, N. Ponomarenko, V. Lukin, F. Battisti and M. Carli, "New full-reference quality metrics based on HVS", CD-ROM Proc. of the Second International Workshop on Video Processing and Quality Metrics (Scottsdale), p. 4 (2006).
17. Qi Ma and Liming Zhang, "Saliency-based image quality assessment criterion", Proc. ICIC 2008, LNCS 5226, pp. 1124-1133 (2008).
18. Xin Feng, Tao Liu, Dan Yang and Yao Wang, "Saliency based objective quality assessment of decoded video affected by packet losses", Proc. ICIP 2008, pp. 2560-2563 (2008).
19. R. Desimone, T. D. Albright, C. G. Gross and C. Bruce, "Stimulus selective properties of inferior temporal neurons in the macaque", Journal of Neuroscience, vol. 4, pp. 2051-2062 (1984).
20. L. Itti and C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention", Vision Research, vol. 40, no. 10-12, pp. 1489-1506 (2000).
21. Face detection using OpenCV, http://opencv.willowgarage.com/wiki/FaceDetection, accessed May 2009.
22. D. Walther and C. Koch, "Modeling attention to salient proto-objects", Neural Networks, vol. 19, pp. 1395-1407 (2006).
23. TID2008, http://www.ponomarenko.info/tid2008.htm, accessed May 2009.
24. H. R. Sheikh, M. F. Sabir and A. C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms", IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3441-3452 (2006).