Full Reference Image Quality Assessment Based on Saliency Map Analysis

Tong Yubing*, Hubert Konik*, Faouzi Alaya Cheikh** and Alain Tremeau*

* Laboratoire Hubert Curien UMR 5516, Université Jean Monnet - Saint-Etienne, Université de Lyon, 42000 Saint-Etienne, France. E-mail: yubing.tong@univ-st-etienne.fr
** Computer Science & Media Technology, Gjøvik University College, PO Box 191, N-2802 Gjøvik, Norway

Abstract. Salient regions of an image are the parts that differ significantly from their neighbors. They tend to attract our eyes immediately and capture our attention, and are therefore very important in the assessment of image quality. For the sake of simplicity, region saliency has not been fully considered in most previous image quality assessment models. PSNR_HVS and PSNR_HVS_M are two recent image quality estimation methods with promising performance.1 However, neither uses any saliency information; moreover, both divide images into fixed-size blocks and process each block independently, in the same way and with the same weight. In this paper, the contribution of each region to the global quality measure of an image is weighted as a function of its saliency, so as to take the visual attention mechanism into account. In salient regions, the differences between the distorted and original images are emphasized, as if the difference image were observed through a magnifying glass. A mixed saliency map model based on Itti's model and face detection is proposed here. As faces play an important role in visual attention, they should be used as an independent feature of the saliency map. The mixed model therefore uses both low-level features (intensity, color, orientation) and a high-level feature (faces). Differences in salient regions are then given more importance and thus contribute more to the image quality score. The saliency value of every pixel is correlated with that of its neighboring region, considering both the statistics of the pixel's neighborhood and the global saliency distribution. Experiments on the 1700 distorted images of the TID2008 database show that the performance of the image quality assessment on the full set is enhanced. On the Exotic and Exotic2 subsets in particular, the performance of the modified PSNR_HVS and PSNR_HVS_M based on the saliency map is greatly enhanced. Exotic and Exotic2 are the two subsets with contrast change and mean shift distortions. PSNR_HVS and PSNR_HVS_M only use image intensity information, whereas the proposed method also exploits color contrast, intensity and other information, and can thus reflect the behavior of visual attention more effectively than PSNR_HVS or PSNR_HVS_M. For PSNR_HVS, the Spearman correlations on the Exotic and Exotic2 subsets are enhanced by nearly 69.1% and 16.4% respectively, and the Kendall correlations by nearly 60.5% and 6.7% respectively. For PSNR_HVS_M, the Spearman correlations are enhanced by nearly 61.3% and 15.3% respectively, and the Kendall correlations by nearly 51.55% and 4.76% respectively.

Key words: image quality assessment, saliency map, face detection, visual attention mechanism

1. Introduction

Subjective image quality assessment is a costly procedure which requires a large number of observers and takes a lot of time.
Therefore, it cannot be used in automatic evaluation programs or in real-time applications. Hence the trend is to assess image quality with objective methods. Usually, image quality assessment models are set up to approximate the subjective scores of image quality. Several reference models 1 have been proposed, for example within VQEG.2 Some methods have achieved better results than PSNR and MSE, including UQI, SSIM, LINLAB, PSNR_HVS, PSNR_HVS_M, NQM, WSNR, VSNR etc.3-16 However, it has been demonstrated that, considering the wide range of possible distortion types, no existing metric performs well enough on all of them. PSNR_HVS and PSNR_HVS_M are two recent methods with high performance on the Noise, Noise2, Safe, Simple and Hard subsets of TID2008, which makes them appropriate for evaluating the efficiency of image filtering and lossy image compression.1 However, PSNR_HVS and PSNR_HVS_M show very low performance on the Exotic and Exotic2 subsets of the TID2008 database. With PSNR_HVS and PSNR_HVS_M, images are divided into fixed-size blocks, and every block is processed independently, in the same way and with the same weight. Such a way of comparing images contradicts the way our HVS proceeds. Dividing an image into blocks of equal size irrespective of its content is counterproductive, since it breaks large objects and structures of the image into semantically non-meaningful small fragments. Additionally, it introduces strong discontinuities that were not present in the original image. Furthermore, it is well established that our HVS is selective in its processing of the visual stimulus. Because of this selectivity of the visual attention mechanism, human observers usually focus more on some regions than on others, irrespective of their size. Therefore, it is intuitive to think that an approach that treats all image regions in the same way, disregarding the variation of their contents, will never be able to faithfully estimate the perceived quality of the visual media. We therefore propose to use saliency information to mimic the selectivity of the HVS, and to integrate it into existing objective image quality metrics so as to give more importance to the contribution of salient regions over that of non-salient regions. An image saliency map has been used to weight the results of SSIM, VIF etc.17, but the saliency map used in that study was in fact the image reconstructed from the phase spectrum by inverse Fourier transform, which mainly reflects the presence of contours. This may not be enough, since the contours of an image are far from containing all of its information. The detection order of salient regions has also been used to weight the difference between reference and distorted images.18 For every image, 20 time steps are used to find the salient regions: a salient region found first is assigned the largest weight, and vice versa. For the pixels in a detected salient region, uniform weighting and simple linear weighting were used. In this paper, we propose to consider additional information, computed from the image content, that affects region saliency. We consider not only the saliency value of every pixel but also the relative saliency of the current pixel with respect to its neighborhood and to the global image. Furthermore, the contribution of non-salient regions to the image quality score is reduced by assigning them lower weights.
Faces play an important role in recognition and attract much of our attention.19 Faces should thus be used as a high-level feature in the saliency map analysis, in addition to low-level features such as those used in Itti's model 20 based on color, intensity and orientation. In this paper, we propose a mixed saliency map model based on Itti's model and a face detection model.

This paper is organized as follows. PSNR_HVS and PSNR_HVS_M are reviewed in section 2; an example of distortion in a salient region is then given to show that salient regions contribute more to the perceived image quality, which is not considered in the PSNR_HVS and PSNR_HVS_M models. In section 3, an image quality assessment model based on a mixed saliency map is proposed. Experimental results on images from the TID2008 database are presented and discussed in section 4. Section 5 concludes the paper.

2. Analysis of Previous Work and Primary Conclusion

PSNR and MSE are two common methods used to assess the quality of a distorted image, defined by

PSNR = 10 \lg \left( \frac{255^2}{MSE} \right)   (1)

MSE = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \Delta_{i,j}^2   (2)

\Delta_{i,j} = a(i,j) - \tilde{a}(i,j)   (3)

where (i,j) is the current pixel position, a(i,j) and \tilde{a}(i,j) are the original and the distorted images respectively, and M and N are the height and width of the image. Neither image content information nor HVS characteristics are taken into account by PSNR and MSE when they are used to assess image quality. Consequently, PSNR and MSE cannot achieve good results when compared to subjective quality scores, especially for images such as those in the Noise, Noise2, Exotic and Exotic2 subsets, which include images corrupted with additive Gaussian noise, high frequency noise, impulse noise, Gaussian blur etc. PSNR gives the worst results according to Spearman's and Kendall's correlations.1

PSNR_HVS and PSNR_HVS_M are two models designed to improve the performance of PSNR and MSE. PSNR_HVS divides the image into non-overlapping 8x8 pixel blocks. The difference \Delta_{i,j} between the original and the distorted blocks, computed on DCT coefficients, is then weighted for every 8x8 block by the coefficients of the Contrast Sensitivity Function (CSF). So equation (3) can be rewritten as follows:

\Delta^{PSNR\_HVS}_{i,j} = \Delta_{i,j} \cdot CSFCof_{i,j}   (4)

PSNR_HVS_M is defined in a similar way to PSNR_HVS, but the difference between the DCT coefficients is further multiplied by a contrast masking metric (CM) for every 8x8 block, and the result is then weighted by the CSFCof as follows:

\Delta^{PSNR\_HVS\_M}_{i,j} = \Delta_{i,j} \cdot CM(i,j) \cdot CSFCof_{i,j}   (5)

Consequently, a new MSE metric for PSNR_HVS can be defined as follows:

MSE_{PSNR\_HVS} = \frac{1}{M N} \sum_{I=1}^{M/8} \sum_{J=1}^{N/8} \sum_{i=1}^{8} \sum_{j=1}^{8} \left( \Delta^{PSNR\_HVS}_{i,j} \right)^2   (6)

where (I,J) is the position of an 8x8 block in the image and (i,j) is the position of a pixel in the 8x8 block. MSE_{PSNR\_HVS\_M} is defined in the same way. PSNR_HVS and PSNR_HVS_M are then computed by replacing the MSE in equation (1) with MSE_{PSNR\_HVS} or MSE_{PSNR\_HVS\_M}.

2.1 Analysis

With PSNR_HVS and PSNR_HVS_M, images are processed as non-overlapping 8x8 blocks, and every 8x8 block is considered to contribute equally to the image quality metric. From the standpoint of human visual perception, a fixed 8x8 block size is not optimal given the variability of image content: the size of a salient region is not fixed. Independent blocks of fixed size may also produce blockiness or sudden changes that greatly affect the subjective quality perception.
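To make the structure of equations (1)-(6) concrete, the following Python sketch computes a CSF-weighted PSNR over non-overlapping 8x8 DCT blocks. It is a minimal sketch, not a reference implementation: the CSF_COF matrix below is a neutral placeholder (the actual coefficient tables are those of reference 16), and the contrast masking term of PSNR_HVS_M is omitted.

    import numpy as np
    from scipy.fft import dctn

    # Placeholder 8x8 CSF weighting matrix (assumption; the real
    # PSNR_HVS tables are given in reference 16)
    CSF_COF = np.ones((8, 8))

    def psnr_hvs_sketch(ref, dist):
        """CSF-weighted PSNR over non-overlapping 8x8 DCT blocks.

        ref, dist: grayscale float arrays with values in [0, 255],
        dimensions assumed to be multiples of 8.
        """
        M, N = ref.shape
        se = 0.0
        for I in range(0, M, 8):
            for J in range(0, N, 8):
                # difference of DCT coefficients, CSF-weighted (eq. 4)
                d = dctn(ref[I:I+8, J:J+8], norm='ortho') \
                    - dctn(dist[I:I+8, J:J+8], norm='ortho')
                se += np.sum((d * CSF_COF) ** 2)
        mse = se / (M * N)                        # eq. (6)
        return 10 * np.log10(255.0 ** 2 / mse)    # eq. (1)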
As an illustration, the following figures show that different parts of an image contribute differently to the perceived image quality, and that degradation in salient regions is more prominent and hence should contribute more to the final quality measure.

Figure 1. Reference image 'I18' of TID2008.
Figure 2. Saliency map of 'I18' with face detection.
Figure 3. 'I18' with noise in one salient region.
Figure 4. 'I18' with noise in four non-salient regions.

The image 'I18' and its corresponding saliency map are illustrated in Figure 1 and Figure 2 respectively. Figure 3 is a distorted version of 'I18' with noise on the salient region including the face, neck and chest. The objective quality of this distorted image is 46.3 dB with PSNR, 33.74 dB with PSNR_HVS and 36.3 dB with PSNR_HVS_M. Figure 4 is another distorted version of 'I18', with noise on non-salient regions; its objective quality is 41.6 dB with PSNR, 32.4 dB with PSNR_HVS and 35.8 dB with PSNR_HVS_M. Here, a local smoothing filter was used to filter the parts corresponding to the noisy regions of the saliency map. According to the objective metric values, the quality of Figure 3 is better than that of Figure 4. Yet it is easy to see that the perceived quality of Figure 4 is better than that of Figure 3, since the distortions in Figure 4 were applied to non-salient regions. The distorted parts in Figure 4 are hardly noticeable unless the image is examined carefully, pixel by pixel. In Figure 5, the noisy non-salient regions of Figure 4 are marked with blue circles.

Figure 5. 'I18' with distortion in four non-salient regions.

For the reference image 'I18' of TID2008, noise was added in equal quantity to different parts of the image, and the image quality scores computed each time were found to be different. This result confirms our initial expectation that quantitatively equal distortions yield different image quality scores: each part of an image contributes differently to the perceived image quality, and distortions in salient regions affect image quality more profoundly than those in non-salient regions.

3. Image Quality Assessment Based on Region Saliency

In this section, the saliency map of an image is calculated using either Itti's saliency map model or the following mixed saliency map model when faces are present in the image. First, a simple and fast face detection program from OpenCV, based on Haar-like features, is used to decide whether the current image contains human faces.21 According to that decision, either Itti's model or the mixed model is then used to calculate the saliency map, as sketched below.
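The face-presence test can be sketched in a few lines with OpenCV's Haar cascade detector. This is a minimal illustration assuming the standard frontal-face cascade shipped with OpenCV, not the exact detector configuration used in our experiments.

    import cv2

    def contains_faces(image_bgr):
        """Return True if at least one frontal face is detected."""
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5)
        return len(faces) > 0

    # The result selects the saliency model: Itti's model alone,
    # or the mixed model of section 3.3 when faces are present.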
3.1 Itti's Saliency Map Model

The saliency map model that we propose is mainly based on Itti's visual attention model. Considering that faces play an important role in our daily social interactions and thus easily attract visual attention, we propose a mixed saliency map model combining Itti's visual attention model with face detection.

Itti's saliency map model is a bottom-up visual attention mechanism based on color, intensity and orientation features. Each feature is analyzed using Gaussian pyramids at multiple scales. The model is based on 7 feature maps: one intensity map, four orientation maps (at 0°, 45°, 90° and 135°) and two color opponency maps (red/green and blue/yellow). After a normalization step, these feature maps are summed into 3 conspicuity maps: an intensity conspicuity map C_i, a color conspicuity map C_c and an orientation conspicuity map C_o. Finally, the conspicuity maps are combined to obtain the saliency map according to the following equation:

S_{Itti} = \frac{1}{3} \sum_{k \in \{i,c,o\}} C_k   (7)

As an example, let us consider the image 'I01' of TID2008 (Figure 6 (a)), its saliency map computed using Itti's model (Figure 6 (b)) and the corresponding surface plot (Figure 6 (c)). The more reddish a region of the saliency map is, the more salient its corresponding image region is. In Figure 6 (c), most of the regions are non-salient, with only a few salient regions. This agrees with the selectivity of the HVS, which focuses only on some parts of the image instead of its whole content.

Figure 6. Image 'I01' with its saliency map and corresponding surface plot: (a) reference image 'I01'; (b) saliency map of 'I01'; (c) surface plot of the 'I01' saliency map.

3.2 Saliency Map Model Based on Face Detection

In many images, faces attract more attention than any other feature. Psychological tests have shown that faces, heads or hands are perceived prior to any other details.20 Faces can therefore be used as a high-level feature for the saliency map. One drawback of Itti's visual attention model is that its saliency map is not well adapted to images containing faces. Several studies in face recognition have shown that skin hue features can be used to extract face information. To detect heads and hands in images, we used the face recognition and location algorithm of Walther et al.,22 which is based on a Gaussian model of the skin hue distribution in the (r', g') color space as an independent feature. For a given color pixel (r', g'), the model's hue response is defined by the following equation:

h(r', g') = \exp \left( -\frac{1}{2} \left[ \frac{(r' - \bar{r})^2}{\sigma_r^2} + \frac{(g' - \bar{g})^2}{\sigma_g^2} - \frac{\rho (r' - \bar{r})(g' - \bar{g})}{\sigma_r \sigma_g} \right] \right)   (8)

r' = \frac{r}{r+g+b}, \quad g' = \frac{g}{r+g+b}   (9)

where (\bar{r}, \bar{g}) is the average of the skin hue distribution, \sigma_r^2 and \sigma_g^2 are the variances of the r' and g' components, and \rho is the correlation between the components r' and g'. These parameters were statistically estimated from 1153 photographs containing faces. The function h(r', g') can be considered as a color variability function around a given hue.

3.3 Mixed Saliency Map Model Based on Face Detection

The mixed saliency analysis model that we propose is a linear combination of Itti's model and the Gaussian face detection model:

S_{MIX} = \beta S_{Itti} + (1 - \beta) S_{Face}   (10)

where \beta is a constant. The best results in our study were achieved for \beta = 3/7.
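A minimal Python sketch of equations (8)-(10) follows. The skin hue parameters given as defaults are illustrative placeholders, not the values estimated by Walther et al. from the 1153 photographs, and s_itti is assumed to be produced by a separate implementation of Itti's model.

    import numpy as np

    def skin_hue_response(rgb, r_bar=0.43, g_bar=0.31,
                          sr=0.05, sg=0.03, rho=0.5):
        """Gaussian skin hue response h(r', g') in chromaticity
        space (eqs. 8-9); rgb is a float HxWx3 array."""
        s = rgb.sum(axis=2) + 1e-9
        rp, gp = rgb[..., 0] / s, rgb[..., 1] / s      # eq. (9)
        z = ((rp - r_bar)**2 / sr**2 + (gp - g_bar)**2 / sg**2
             - rho * (rp - r_bar) * (gp - g_bar) / (sr * sg))
        return np.exp(-0.5 * z)                        # eq. (8)

    def mixed_saliency(s_itti, s_face, beta=3/7):
        """Linear combination of the two saliency maps (eq. 10)."""
        return beta * s_itti + (1 - beta) * s_face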
Figure 7. Saliency maps of the 'I18' reference image: (a) saliency map from the mixed model (with skin hue detection); (b) saliency map from Itti's model.

Figure 8. Saliency maps of the 'I23' reference image: (a) reference image 'I23'; (b) saliency map from the mixed model (with skin hue detection); (c) saliency map from Itti's model.

For most images containing faces, heads or hands, the mixed model with skin hue detection gives better results than Itti's model, i.e. more accurate saliency maps. The two examples given in this paper show the difference between Itti's model and the mixed model for face images. The first example corresponds to the reference image 'I18' of TID2008, which contains a face, eyes and hands. Figure 7 (a) shows the saliency map computed with the mixed model and Figure 7 (b) the saliency map computed with Itti's model. In Figure 1, the most salient regions, those which attract attention, are the face and the hands. Figures 7 (a) and 7 (b) show that the saliency map computed with the mixed model is more precise than the one computed with Itti's model.

Another interesting example is the reference image 'I23', which contains no human face. The original reference image is shown in Figure 8 (a). The most salient regions, those which attract attention, are the heads of the parrots, in particular their eyes and faces. Considering the hue of the parrots' faces, in particular the hue of the neighborhood around the eyes, we computed the corresponding color variability function h(r', g') and then the mixed model associated with this hue distribution. The saliency map computed with the mixed model is given in Figure 8 (b) and the one computed with Itti's model in Figure 8 (c). Figures 8 (b) and 8 (c) show that the saliency map computed with the mixed model is more accurate than the one computed with Itti's model. This second example shows that the mixed model can be extended to high-level features other than human faces.

3.4 Mixed Saliency Map Model Based on Salient Regions

We usually focus on salient regions rather than salient points. This means that the saliency value of every pixel should be weighted as a function of the saliency values of the pixels belonging to its neighborhood, or of the saliency value of the region it belongs to. For each pixel belonging to a salient region, we propose to enlarge the area of the neighborhood, as if it were observed through a magnifying glass. For each pixel belonging to a non-salient region, we propose to give less weight to the neighborhood. We use a metric to define the salient regions and the neighborhood associated with a given pixel. First, we compute a binary mask B_{i,j} defined as follows:

B_{i,j} = 0 if S_{MIX}(i,j) < T_1, and B_{i,j} = 1 otherwise.   (11)
where T_1 is a threshold determined experimentally, S_{MIX}(i,j) is the saliency value computed from the chosen saliency map model and (i,j) is the pixel position in the image.

Next, we compute block by block the relative saliency of the current pixel as a function of its neighborhood. The current pixel A(i,j), the current block (I,J) and the overlapping neighborhood N(i,j) of size k x k are illustrated in Figure 9.

Figure 9. Current block, current pixel and its neighborhood.

A saliency flag \Phi_{I,J} of the current block is defined as follows:

\Phi_{I,J} = false if \sum_{i=1}^{8} \sum_{j=1}^{8} B_{Block(I,J)}(i,j) < T_2, and true otherwise.   (12)

where T_2 is an experimental threshold and (i,j) is the pixel position in Block(I,J). Then, as salient regions attract the attention of observers more than non-salient regions, we give less weight to pixels belonging to non-salient regions. This means that the saliency value of every pixel is weighted by a function of the saliency values of the pixels belonging to its neighborhood. We consider several variables to compute the relative saliency of the current neighborhood, current block and current pixel. Let us define \alpha_{Block}(I,J) and \alpha_{region}(i,j), the relative saliency of the current block and of the current neighborhood, as functions of the average saliency and of the global image:

\alpha_{Block}(I,J) = \frac{1}{64} \sum_{i=1}^{8} \sum_{j=1}^{8} \frac{S_{MIX}(i,j)}{S_{Global}}   (13)

\alpha_{region}(i,j) = \frac{S_{Local}}{S_{Global}}   (14)

with

S_{Local} = \frac{1}{k \times k} \sum_{i=1}^{k} \sum_{j=1}^{k} S_{MIX}(i,j)   (15)

S_{Global} = \frac{1}{M \times N} \sum_{i=1}^{M} \sum_{j=1}^{N} S_{MIX}(i,j)   (16)

Let us also define \alpha_{pixel\_average}(i,j) and \alpha_{pixel\_max}(i,j), the relative saliency of the current pixel as a function of its neighborhood and of the global image:

\alpha_{pixel\_average}(i,j) = \max \left( \frac{S_{MIX}(i,j)}{S_{Global}}, \frac{S_{MIX}(i,j)}{S_{Local}} \right)   (17)

\alpha_{pixel\_max}(i,j) = \frac{S_{MIX}(i,j)}{S_{Max\_Local}}   (18)

with

S_{Max\_Local} = \max_{i \le k, j \le k} S_{MIX}(i,j)   (19)

Finally, to decrease the influence of non-salient regions, we compute a weighted saliency map w_s(i,j) defined as follows:

w_s(i,j) = \max \left( \alpha_{region}(i,j), \alpha_{Block}(I,J) \right), \quad \alpha_{region}(i,j) \ge T_3   (20)

where T_3 is a threshold determined experimentally. Thus, if we consider for example the saliency map of the reference image 'I18' given in Figure 7 (a), we obtain the weighted saliency map w_s shown in Figure 10.

Figure 10. Surface plots of (a) the saliency map and (b) the weighted saliency map w_s.

Comparing Figures 10 (a) and (b), we can see that w_s reflects the fact that observers usually focus on the most salient parts instead of all locally salient parts. The most salient regions are those which are not only locally salient but also salient with respect to the global image.
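The computation of w_s can be sketched as follows. This is a minimal illustration of equations (13)-(20): since only the salient branch of equation (20) is spelled out above, the fallback weight of 1 for non-salient pixels is an assumption, and the neighborhood size k is arbitrary.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def weighted_saliency(s_mix, k=16, T3=15):
        """Weighted saliency map w_s from the mixed saliency map
        s_mix; image dimensions assumed to be multiples of 8."""
        M, N = s_mix.shape
        s_global = s_mix.mean()                              # eq. (16)
        # local mean over a k x k neighborhood (eqs. 14-15)
        alpha_region = uniform_filter(s_mix, size=k) / s_global

        # per-8x8-block relative saliency (eq. 13), replicated
        # back onto the pixel grid
        alpha_block = (s_mix.reshape(M // 8, 8, N // 8, 8)
                            .mean(axis=(1, 3)) / s_global)
        alpha_block = np.kron(alpha_block, np.ones((8, 8)))

        # eq. (20): keep the stronger of the two weights in salient
        # areas; elsewhere fall back to a neutral weight of 1
        # (an assumption, see the lead-in above)
        return np.where(alpha_region >= T3,
                        np.maximum(alpha_region, alpha_block),
                        1.0)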
3.5 Image Quality Assessment Weighted by Salient Regions

In order to improve the efficiency of image quality metrics by taking into account the human visual attention mechanism, we propose to weight the image differences by salient regions instead of salient points. Considering that human observers are unable to focus on several areas at the same time, and that they assess the quality of an image mainly from the most salient areas, we weight the image difference metrics by the weighted saliency map w_s defined above. The PSNR_HVS_S metric is thus computed with the following pseudo code:

    // for the pixels of an 8x8 target block
    for i = 1:8
        for j = 1:8
            if (Phi(I,J) is false)
                // non-salient block: neutral CSF weight
                CSFCof(i,j) = 1;
                Delta_PSNR_HVS_S(i,j) = Delta(i,j) * CSFCof(i,j);
            else if ((alpha_pixel_max > T4) && (alpha_pixel_average > T5))
                // salient pixel: emphasize the difference
                Delta_PSNR_HVS_S(i,j) = Delta_PSNR_HVS(i,j) * ws(i,j);
            else
                Delta_PSNR_HVS_S(i,j) = Delta_PSNR_HVS(i,j);
            end
        end
    end

where (i,j) is the position of a pixel in the 8x8 block. The thresholds T3, T4 and T5 were empirically set to 15, 0.5 and 40 respectively for the TID2008 database.

4. Experimental Results and Analysis

In this paper, the images of the TID2008 database were used to test our image quality assessment model. TID2008 is the largest database of distorted images intended for the verification of full reference quality metrics.23 We used the TID2008 database because it contains more distorted images, more types of distortion and more subjective experiments than the LIVE database.24 TID2008 contains 1700 distorted images (25 reference images x 17 types of distortion x 4 levels of distortion), whereas LIVE contains 779 distorted images with only 5 types of distortion and 161 subjective experiments. The MOS (Mean Opinion Score) of image quality was computed from the results of 838 subjective experiments carried out by observers from Finland, Italy and Ukraine. The higher the MOS (0 - minimal, 9 - maximal, MSE of each score is 0.019), the higher the visual quality of the image. The distorted images are grouped into a full set and into subsets (Noise, Noise2, Safe, Hard, Simple, Exotic, Exotic2) with different distortions. For example, the Noise subset contains several types of distortion such as high frequency noise, Gaussian blur, image denoising etc.

In order to compare the accuracy of the image quality metrics weighted by salient regions with that of the non-weighted metrics, we computed the Spearman and Kendall correlation coefficients. These two indexes are used in image quality assessment to measure the correlation of objective measures with human perception. Other methods, including PSNR and LINLAB, were also computed for comparison purposes.
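For reference, the two correlation coefficients can be computed directly with SciPy; scores and mos are assumed to be arrays of objective scores and MOS values aligned over the same distorted images.

    from scipy.stats import spearmanr, kendalltau

    def rank_correlations(scores, mos):
        """Spearman and Kendall correlations of objective scores
        against MOS values."""
        rho, _ = spearmanr(scores, mos)
        tau, _ = kendalltau(scores, mos)
        return rho, tau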
Table 1. Spearman correlation coefficients.

Subset    PSNR   LINLAB  PSNR_HVS  PSNR_HVS_S  Gain(%)  PSNR_HVS_M  PSNR_HVS_M_S  Gain(%)
Noise     0.704  0.839   0.917     0.914        -0.327  0.918       0.920           0.218
Noise2    0.612  0.853   0.933     0.863        -7.50   0.930       0.871          -6.344
Safe      0.689  0.859   0.932     0.920        -1.28   0.936       0.924          -1.282
Hard      0.697  0.761   0.791     0.814         2.908  0.783       0.816           4.215
Simple    0.799  0.877   0.939     0.933        -0.639  0.942       0.935          -0.743
Exotic    0.248  0.135   0.275     0.465        69.09   0.274       0.442          61.314
Exotic2   0.308  0.033   0.324     0.377        16.358  0.287       0.331          15.331
Full      0.525  0.487   0.594     0.622         4.71   0.559       0.595           6.44

Table 2. Kendall correlation coefficients.

Subset    PSNR   LINLAB  PSNR_HVS  PSNR_HVS_S  Gain(%)  PSNR_HVS_M  PSNR_HVS_M_S  Gain(%)
Noise     0.501  0.652   0.751     0.745        -0.799  0.752       0.752           0
Noise2    0.424  0.671   0.780     0.680       -12.82   0.771       0.689         -10.63
Safe      0.486  0.682   0.772     0.752        -2.59   0.778       0.757          -2.69
Hard      0.516  0.569   0.614     0.634         3.257  0.606       0.637           5.11
Simple    0.598  0.715   0.785     0.773        -1.52   0.789       0.777          -1.52
Exotic    0.178  0.084   0.195     0.313        60.51   0.194       0.294          51.55
Exotic2   0.225  0.026   0.238     0.254         6.72   0.210       0.220           4.76
Full      0.369  0.381   0.476     0.472        -0.8    0.449       0.455           1.34

In Tables 1 and 2, PSNR_HVS_S and PSNR_HVS_M_S denote the modified PSNR_HVS and PSNR_HVS_M based on the weighted saliency map, and Gain (%) is the resulting enhancement of performance. The original PSNR_HVS and PSNR_HVS_M assess image quality by independent blocks, without taking into account that salient regions contribute more to the image quality score. Considering the Spearman correlation coefficients, PSNR_HVS and PSNR_HVS_M perform well on the Noise, Noise2, Safe, Hard and Simple subsets of TID2008, but poorly on the Exotic and Exotic2 subsets. With the weighted saliency map, the Spearman coefficients of PSNR_HVS and PSNR_HVS_M on the full set are enhanced, although there is a reduction on the Noise2 subset. On the Exotic and Exotic2 subsets, the performance of the modified PSNR_HVS and PSNR_HVS_M is remarkably enhanced: for PSNR_HVS, the Spearman correlations on Exotic and Exotic2 increase by nearly 69.1% and 16.4% respectively, and the Kendall correlations by nearly 60.5% and 6.7% respectively; for PSNR_HVS_M, the Spearman correlations increase by nearly 61.3% and 15.3% respectively, and the Kendall correlations by nearly 51.55% and 4.8% respectively. Exotic and Exotic2 are the two subsets with contrast change and mean shift distortions. PSNR_HVS and PSNR_HVS_M only use intensity information, whereas the proposed method also takes color contrast, intensity and other information into account when assessing image quality, and can thus reflect the behavior of visual attention more effectively than PSNR_HVS or PSNR_HVS_M.

Figure 11 shows the scatter plots of the MOS against the different models, including PSNR, LINLAB, PSNR_HVS, PSNR_HVS_S, etc. The plots of the modified PSNR_HVS and PSNR_HVS_M are better clustered than those of the original models, except for a few extreme points.

Figure 11. Scatter plots of the image quality assessment models; the plots with blue points are the results of the image quality assessment models based on the weighted saliency map.

Our results show that PSNR_HVS and PSNR_HVS_M work better than PSNR and LINLAB. The modified PSNR_HVS_S and PSNR_HVS_M_S preserve the performance of the original PSNR_HVS and PSNR_HVS_M on the Noise, Safe, Hard and Simple subsets.
The performance on the Exotic and Exotic2 subsets is improved remarkably.

5. Conclusions and Further Research

In this paper, a saliency map has been introduced to improve image quality assessment, based on the observation that salient regions contribute more to the perceived image quality. The saliency map is defined by a mixed model combining Itti's model with a face detection model. Salient region information, including local contrast saliency and local average saliency, is used instead of salient pixel information to weight the output of previous methods. The experimental results on the TID2008 database show that the weighted saliency map can remarkably enhance the performance of PSNR_HVS and PSNR_HVS_M on specific subsets.

Further research involves extending the test database and analyzing the extreme points in the scatter plots, for which the distance between the objective score and the MOS is large; for such images the quality assessment models do not work accurately, and their performance can be enhanced by reducing the number of these extreme points. In addition, machine learning methods such as neural networks might be used to learn well-chosen coefficients for the mixed saliency map and its thresholds.

References

1 N. Ponomarenko, F. Battisti, K. Egiazarian, J. Astola, V. Lukin, "Metrics performance comparison for color image database", Proc. Fourth International Workshop on Video Processing and Quality Metrics for Consumer Electronics, pp. 14-16 (2009).
2 VQEG, "Final report from the video quality experts group on the validation of objective models of video quality assessment", http://www.vqeg.org/.
3 M. Gaubatz, "Metrix MUX Visual Quality Assessment Package: MSE, PSNR, SSIM, MSSIM, VSNR, VIF, VIFP, UQI, IFC, NQM, WSNR, SNR", http://foulard.ece.cornell.edu/gaubatz/metrix_mux/.
4 A. B. Watson, "DCTune: A technique for visual optimization of DCT quantization matrices for individual images", Soc. Inf. Display Dig. Tech. Papers, vol. XXIV, pp. 946-949 (1993).
5 Z. Wang, A. Bovik, "A universal image quality index", IEEE Signal Processing Letters, vol. 9, pp. 81-84 (2002).
6 Z. Wang, A. Bovik, H. Sheikh, E. Simoncelli, "Image quality assessment: from error visibility to structural similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612 (2004).
7 Z. Wang, E. P. Simoncelli, A. C. Bovik, "Multi-scale structural similarity for image quality assessment", Proc. IEEE Asilomar Conference on Signals, Systems and Computers (2003).
8 B. Kolpatzik, C. Bouman, "Optimized error diffusion for high quality image display", Journal of Electronic Imaging, pp. 277-292 (1992).
9 B. W. Kolpatzik, C. A. Bouman, "Optimized universal color palette design for error diffusion", Journal of Electronic Imaging, vol. 4, pp. 131-143 (1995).
10 N. Ponomarenko, F. Silvestri, K. Egiazarian, M. Carli, J. Astola, V. Lukin, "On between-coefficient contrast masking of DCT basis functions", CD-ROM Proc. of the Third International Workshop on Video Processing and Quality Metrics (U.S.A.), p. 4 (2007).
11 H. R. Sheikh, A. C. Bovik, "Image information and visual quality", IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444 (2006).
12 N. Damera-Venkata, T. Kite, W. Geisler, B. Evans, A. Bovik, "Image quality assessment based on a degradation model", IEEE Transactions on Image Processing, vol. 9, pp. 636-650 (2000).
13 T. Mitsa, K. Varkur, "Evaluation of contrast sensitivity functions for the formulation of quality measures incorporated in halftoning algorithms", Proc. ICASSP, pp. 301-304 (1993).
14 H. R. Sheikh, A. C. Bovik, G. de Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics", IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2117-2128 (2005).
15 D. M. Chandler, S. S. Hemami, "VSNR: A wavelet-based visual signal-to-noise ratio for natural images", IEEE Transactions on Image Processing, vol. 16, no. 9, pp. 2284-2298 (2007).
16 K. Egiazarian, J. Astola, N. Ponomarenko, V. Lukin, F. Battisti, M. Carli, "New full-reference quality metrics based on HVS", CD-ROM Proc. of the Second International Workshop on Video Processing and Quality Metrics (Scottsdale), p. 4 (2006).
17 Q. Ma, L. Zhang, "Saliency-based image quality assessment criterion", Proc. ICIC 2008, LNCS 5226, pp. 1124-1133 (2008).
18 X. Feng, T. Liu, D. Yang, Y. Wang, "Saliency based objective quality assessment of decoded video affected by packet losses", Proc. ICIP 2008, pp. 2560-2563 (2008).
19 R. Desimone, T. D. Albright, C. G. Gross, C. Bruce, "Stimulus selective properties of inferior temporal neurons in the macaque", Journal of Neuroscience, vol. 4, pp. 2051-2062 (1984).
20 L. Itti, C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention", Vision Research, vol. 40, no. 10-12, pp. 1489-1506 (2000).
21 Face detection using OpenCV, http://opencv.willowgarage.com/wiki/FaceDetection, accessed May 2009.
22 D. Walther, C. Koch, "Modeling attention to salient proto-objects", Neural Networks, vol. 19, pp. 1395-1407 (2006).
23 TID2008, http://www.ponomarenko.info/tid2008.htm, accessed May 2009.
24 H. R. Sheikh, M. F. Sabir, A. C. Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms", IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3441-3452 (2006).