2010 Fourth Pacific-Rim Symposium on Image and Video Technology

Color to Gray: Attention Preservation

Yezhou Yang1*, Mingli Song1, Jiajun Bu1, Chun Chen1, Cheng Jin2
1 Zhejiang University, 2 Fudan University
* Yang is now with the Department of Computer Science, University of Maryland, College Park.

Abstract

In this paper, we propose an approach to preserve a crucial visual cue in the color to grayscale transformation: attention. The main contributions are threefold: 1) preserving visual attention is more biologically plausible than preserving other low level cues, which makes our method better grounded in both biological and psychological theory; 2) we treat the saliency map from visual attention analysis as a classifier and aim to preserve the attended area in the output grayscale image; 3) a simple minimization problem tailored to this specific task is formulated and can be easily solved. Experimental results on test images indicate that our method is both practical and effective in preserving the visual attention of the original color image in the corresponding grayscale image.

1. Introduction

Conventional algorithms have been widely used in color to gray applications, for example in publishing as a less expensive alternative to full color images, in helping color blind people perceive visual cues in color images, and in enhancement for e-ink based book readers, which currently support only grayscale rendering. Color to gray algorithms [19, 9, 3] convert a color image to a grayscale one while preserving important visual cues. However, the definition of an important visual cue is still subjective and difficult to extract in the computer vision field. Although some researchers [9] have mentioned visual cue preservation for color to gray, they did not give a formal definition of important visual cues. Fortunately, it is clear that only the important visual cues attract human visual attention. In other words, the human visual system is usually sensitive to the important visual cues and neglects the other information in the image. So, if we want to transfer the visual cues successfully from a color image to a grayscale one, the color to gray algorithm must first of all keep the visual attention unchanged. For example, as shown in the transformed grayscale image in Figure 1, the grayscale rendering of the full color image obtained by linearly combining R, G, and B in the conventional way fails to preserve the visual attention on the red pepper (pimiento).

Figure 1. (a) Original Image (b) Matlab rgb2gray (c) Saliency map from eye-tracking data

It is widely accepted that grayscale mappings of color images derived solely from spectral uniformity are often inadequate because many visual cues are lost, especially visual attention. When we design an applicable and user-friendly color to gray algorithm, the natural characteristics of the human visual system should be incorporated into the framework. Given a color image, the tremendous amount of visual information in it makes it difficult for the human visual system to process fully [23]. Thus, when facing such an image, humans tend to pay more attention to a few parts of the image and neglect the others. Accordingly, many vision scientists [8, 18, 4, 11] nowadays state that visual attention is an extremely crucial part of the human vision system from the perspectives of psychophysics and neuroscience. Unlike low level visual cues such as contrast, attention analysis builds an elegant mapping from the biological nature of the vision system to a computational implementation, which provides us with a higher level visual cue.
On the other hand, as viewers, we are less concerned with the accuracy of light intensities and expect digital grayscale images to preserve meaningful visual cues, which help us detect the most important scene features. A widely accepted important visual cue is attention [6]. That is, viewers expect grayscale images to preserve the specific areas that attract people's attention in the original color images. Consider patients who suffer from complete color blindness and can only see the world in grayscale: their visual system can still pick out the important parts of a scene, so their embedded color to grayscale system apparently does not record light intensities alone.

In this paper, we propose that preserving the attention areas of an image is much more important in a color to grayscale solution than representing absolute pixel values or preserving other low level visual cues. We use well developed attention analysis and detection techniques to automatically obtain an attention description from the original color image. We then use it as the ground truth and derive a grayscale image whose attention description is similar to that of the original color image. Unlike other color to grayscale models that also address visual cue preservation, our method seeks visual attention preservation, which is believed to be more biologically reasonable than other low level visual cues and can be detected automatically using state-of-the-art techniques. This makes our algorithm not only more efficient and precise, but also more biologically plausible.
2. Related Works

2.1. Color to Grayscale Image Conversions

Many color to grayscale algorithms [19, 9, 3, 20] have been developed, based on a wide variety of techniques. The classic definition of color removal is a 3D to 1D transformation from a color image to a grayscale image that records only the light intensities. As a special case, color to gray through a linear combination of the R, G, and B channels is a time-efficient approach. Wyszecki and Stiles [25] combined the R, G, and B channels using a group of linear mapping functions. Wu and Rao [24] linearly combined the R, G, and B channels by defining a series of policies to separate the luminance value from the chrominance value, so as to construct the grayscale image from the luminance value.

Researchers also treat color to gray as a dimensionality reduction process which degrades the three dimensional color space to the one dimensional grayscale space. Both linear and nonlinear dimensionality reduction techniques can be applied to perform the color to gray operation. For example, principal component analysis (PCA) [15] and its kernel generalization (kernel PCA, KPCA) [17] can be utilized to carry out the color to gray transformation. The main purpose of this line of research is to build a 1D parameterization of the 3D color space, such as creating clusters in the color space and applying space-filling curves to them [22]. Alsam and Kolas [1] introduced a conversion method that aims to create a sharp grayscale image from the original color image rather than enhancing the separation between colors, and Smith et al. [20] combine global and local conversions in a way similar to Alsam and Kolas [1]. However, such methods ignore visual cue preservation and may not preserve the visual attention.

As we discussed in the previous section, another trend of color to grayscale research is based on the preservation of human vision cues, which is believed to be biologically plausible. For example, since vision studies show that humans are more sensitive to edges than to homogeneous color blocks, Bala and Eschbach [2] propose a spatial approach to color to grayscale conversion, in which they preserve chrominance edges locally by introducing high-frequency chrominance information into the luminance channel. Furthermore, human vision scientists have proposed that the human visual system does not perceive absolute values, but relies on relative assessments. Socolinsky and Wolff [21] applied image gradient information to model the contrast of a color image locally and used it as the visual cue to implement color to gray. However, because the contrast is built on the maximal gradient across the channels at a short scale, i.e., every pixel and its nearest neighbors, this algorithm cannot deal well with long-scale contrast regions. Gooch et al. [9] introduced a local algorithm called Color2Gray, in which the gray value of each pixel is iteratively adjusted to minimize an objective function based on the local contrasts between all pixel pairs. Rasche et al. [19] introduced a method that aims to preserve local contrast while maintaining consistent luminance. All of the methods above have been shown to be effective in some respects. However, the human vision system is too complex to be represented solely by contrast or edges, most of these methods need users to adjust parameters in order to create aesthetic results, and they still may not preserve the visual attention regions in the grayscale images.

2.2. Visual Attention Analysis

Tracing back to 1890, James [14] suggested that visual attention serves as a mediating mechanism involving competition between different aspects of the visual scene, selecting the most relevant areas to the detriment of others. As the mediation mechanism of human vision is not yet completely understood, computational methods for analyzing scenes have become the main way to determine the important and representative regions of an image. This kind of computational model is called visual attention analysis. Based on the work of Koch and Ullman [6], Itti proposed a saliency-based visual attention model for scene analysis in [12]. In this model, the visual input is first decomposed into a set of topographic feature maps which all feed into a master "saliency map" in a bottom-up manner. Then, a WTA (Winner-Take-All) competition is employed to select the most conspicuous image locations as attended points. This method has proved to be an elegant mapping from biological theories to a computational implementation. Although other researchers have since proposed visual attention analysis methods based on other assumptions, such as natural image statistics [5][26] or large scale machine learning [16], and claim higher accuracy against actual human eye tracking experiments, the main idea of visual attention stays unchanged, and Itti's implementation remains the classic and most widely trusted one.
So, in this paper, we use this implementation of visual attention analysis to obtain the automatically detected ground truth for our method. The input of this implementation is a natural color image, and the output we use in our method is a "saliency" map, which is claimed to represent the human attention distribution. Unfortunately, none of the state-of-the-art visual attention detectors can guarantee a perfect prediction of the attention area. For images accompanied by eye-tracking experiment data, we therefore also use a "saliency map" computed directly from the eye-tracking fixations as the ground truth, and for images without eye-tracking data, we additionally use human labeling data as the visual attention ground truth for testing. However, these points do not affect the conclusions of this paper or the theory presented, as we can assume that a more stable and robust visual attention detector is just around the corner.
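The paper itself relies on Itti's saliency implementation [13] (or eye-tracking and human labels) for this ground truth and gives no code. As a rough, self-contained stand-in for producing a saliency map from a color image, the sketch below instead uses the spectral residual detector of Hou and Zhang [10], which the paper cites; it is not the authors' pipeline, and the function name, working resolution, and smoothing parameters are our own illustrative assumptions.

```python
import numpy as np
import cv2  # OpenCV; any image library with resize/blur would do


def spectral_residual_saliency(image_bgr, work_size=64):
    """Rough stand-in for a saliency detector (spectral residual of Hou & Zhang [10]).

    image_bgr: uint8 BGR image. Returns a saliency map S in [0, 1] with the same
    height/width as the input, ready to be thresholded into attended / not-attended
    labels as in eq. (2).
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    small = cv2.resize(gray, (work_size, work_size))

    # Log-amplitude spectrum minus its local average: the "spectral residual"
    # carries the salient, unexpected structure of the image.
    spectrum = np.fft.fft2(small)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))

    # Back to the spatial domain, smooth, and resize to the original image size.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)
    saliency = cv2.resize(saliency, (image_bgr.shape[1], image_bgr.shape[0]))
    return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-12)
```

Any saliency source can be substituted here; the method described next only assumes a per-pixel saliency value that can be thresholded.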
3. Attention Preservation

As demonstrated in the previous section, we want to produce a grayscale image which has a similar attention description to its original color image. This raises the question: how do we preserve the attention description? Psychological studies show that features which deviate from the feature average tend to be attractive. So, in order to preserve attention areas in the grayscale image, we should first distinguish the attention areas from the other areas in the grayscale image. On the other hand, the grayscale image should also preserve the sense of reality of its corresponding color image. Only distinguishing attention areas from the rest of the image leads to over-clustering, which destroys the sense of reality of the original color image. So, the grayscale image should also preserve the variance of the original color image within a reasonable scale. In the next section, three metrics are proposed to evaluate the deviation of the attention areas from the rest of the grayscale image and the variance of the grayscale image. Finally, to balance these three metrics, an objective function is constructed and its corresponding optimization problem is solved.

4. Algorithm

4.1. Definition of the Problem

A color image is a group of 3D points. We use C to represent this group. Thus, if there are N points in the color image, C = [c_1, c_2, ..., c_N], and each point c_x is a 3D vector [R_x, G_x, B_x]. Similarly, we can treat the grayscale image as a group of 1D points. We use G to represent this group: G = [g_1, g_2, ..., g_N], and each point g_x is a 1D scalar. As stated in the previous section, we can treat color to gray as a dimensionality reduction process which degrades the three dimensional color space to the one dimensional grayscale space. The problem then becomes seeking a proper principal vector W = [w_1, w_2, w_3] that projects the 3D group C onto the 1D group G while preserving the visual attention cues of C in G. Formally,

G = WC    (1)

4.2. Visual Attention Cue

From the computational visual attention analysis method, we can get the saliency map S of each input color image. In our approach, we treat the saliency map as a classifier L, which labels the pixels of the original color image into two classes: attended and not-attended. Since the proportion of attended pixels varies among different images, we can use methods like fuzzy growing [7], or simply set a threshold \varepsilon, to extract the attended area from the saliency map S and divide the image into two classes:

L(x, y) = \begin{cases} 1 & \text{if } S(x, y) > \varepsilon \\ 0 & \text{otherwise} \end{cases}    (2)

Now we get the pixel groups C^l and G^l, which carry the visual attention label information: C^l = [c_1^l, c_2^l, ..., c_N^l], G^l = [g_1^l, g_2^l, ..., g_N^l]. In order to distinguish the attended class from the not-attended class, we want to minimize the distance within each class while maximizing the distance between the classes. Following the definition of Linear Discriminant Analysis, we define two scatter matrices for our two-class problem, the "between classes scatter matrix" D_b and the "within classes scatter matrix" D_t:

D_b = N_1 (\mu_1 - \bar{c})(\mu_1 - \bar{c})^T + N_2 (\mu_2 - \bar{c})(\mu_2 - \bar{c})^T    (3)

D_t = \sum_{c=1,2} \sum_{i \in c} (c_i - \mu_c)(c_i - \mu_c)^T    (4)

where

\mu_c = \frac{1}{N_c} \sum_{i \in c} c_i    (5)

\bar{c} = \frac{1}{N} \sum_i c_i = \sum_{c=1,2} \frac{N_c}{N} \mu_c    (6)

After projection, the between classes distance is W^T D_b W and the within classes distance is W^T D_t W.

4.3. Preserve the Variance

Solely maximizing the between classes distance while minimizing the within classes distance leads to over-clustering, which makes the output grayscale image lose its sense of reality. In order to prevent the points of the grayscale image from over-clustering, we have to preserve the variance of the point group G. Formally, we have to make sure the value of var(G) = var(WC) stays within a reasonable scale. Following the definition of Principal Component Analysis [15], var(WC) = W^T \Sigma W, in which \Sigma is the covariance matrix of C, with u the mean of C:

\Sigma = (C - u)^T (C - u)    (7)
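The construction in Sections 4.2 and 4.3 amounts to standard two-class LDA scatter matrices plus a global covariance. Below is a minimal NumPy sketch of eqs. (2)-(7), assuming the image is given as an H x W x 3 array and the saliency map has already been resized to H x W; the threshold value, function name, and the pixels-as-rows layout are our own illustrative choices, not from the paper.

```python
import numpy as np


def attention_statistics(image_rgb, saliency, eps=0.5):
    """Compute D_b, D_t (eqs. 3-6) and Sigma (eq. 7) from a saliency-labeled image.

    image_rgb: H x W x 3 pixel array; saliency: H x W map in [0, 1].
    Pixels are stored as rows here, so the later projection reads C @ w.
    """
    C = image_rgb.reshape(-1, 3).astype(np.float64)   # N x 3 pixel group
    labels = saliency.reshape(-1) > eps               # eq. (2): attended vs. not-attended
    c_bar = C.mean(axis=0)                            # overall mean (eq. 6)

    D_b = np.zeros((3, 3))
    D_t = np.zeros((3, 3))
    for cls in (True, False):
        C_c = C[labels == cls]
        if len(C_c) == 0:                             # degenerate image: one class only
            continue
        mu_c = C_c.mean(axis=0)                       # class mean (eq. 5)
        diff = (mu_c - c_bar)[:, None]
        D_b += len(C_c) * diff @ diff.T               # between-classes scatter (eq. 3)
        D_t += (C_c - mu_c).T @ (C_c - mu_c)          # within-classes scatter (eq. 4)

    centered = C - c_bar
    Sigma = centered.T @ centered                     # covariance-style matrix (eq. 7)
    return D_b, D_t, Sigma
```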
4.4. The Minimization Problem

In all, we want to maximize W^T D_b W, minimize W^T D_t W, and keep the value of W^T \Sigma W within a reasonable scale. We introduce a balance parameter \lambda and define the objective function J(W) as follows:

J(W) = \frac{\lambda W^T D_b W + (1 - \lambda) W^T \Sigma W}{W^T D_t W}    (8)

An important property of the objective J is that it is invariant w.r.t. rescalings of the vector W \to \alpha W. Hence, since the denominator is simply a scalar, we can always choose W such that W^T D_t W = 1. For this reason we can transform the problem of maximizing J into the following constrained optimization problem:

\min_W \; -(\lambda W^T D_b W + (1 - \lambda) W^T \Sigma W) \quad \text{s.t.} \quad W^T D_t W = 1    (9)

The corresponding Lagrangian is

L(W) = -\frac{1}{2} (\lambda W^T D_b W + (1 - \lambda) W^T \Sigma W) + \frac{1}{2} \gamma (W^T D_t W - 1)    (10)

The KKT conditions tell us that the following equation needs to hold at the solution:

D_t^{-1} (\lambda D_b + (1 - \lambda) \Sigma) W = \gamma W    (11)

This is a generalized eigenvalue equation, and the eigenvector corresponding to the largest eigenvalue maximizes the objective. That eigenvector is then taken as the projection principal vector W. Substituting it into eq. (1), we obtain the output grayscale image G.

5. Experiment and Performance

5.1. Implementation

We use natural images mainly from the eye tracking database [4] as our test samples, since Itti's method has been thoroughly tested on this database. We also tested our method on several other image databases, such as Hou's [10] and Judd's [16], and on images downloaded from the internet. The main framework of our method has three steps:

(1) We apply Itti's implementation [13] of visual attention analysis to the original color image C. As mentioned above, the output of this process is a small grayscale map called the "saliency" map. This map is believed to depict the human attention distribution over an image, a brighter area representing a more attended area in the original image. We then resize the saliency map to the size of the original image. For images on which Itti's implementation fails to predict accurate attention areas, we also use "saliency" maps derived from eye-tracking data and human label data.

(2) We compute the classifier L from the saliency map S. Then we compute the "within classes scatter matrix" D_t and the "between classes scatter matrix" D_b from the original color image data with attention labels, and the covariance matrix \Sigma from the original color image data. Finally, the generalized eigenvalue problem in eq. (11) is solved using Matlab to obtain the projection principal vector W.

(3) We substitute W into eq. (1) to get the final grayscale image G.
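Since eq. (11) is only a 3 x 3 generalized eigenvalue problem, steps (2) and (3) fit in a few lines. The sketch below reuses the attention_statistics helper from the previous sketch and uses scipy.linalg.eigh where the paper uses Matlab; the default lam = 0.5, the small ridge on D_t, and the final rescaling of G to [0, 1] are our own assumptions rather than details from the paper.

```python
import numpy as np
from scipy.linalg import eigh


def attention_preserving_gray(image_rgb, saliency, lam=0.5, eps=0.5):
    """Steps (2)-(3) of Section 5.1, given a precomputed saliency map.

    Solves D_t^{-1} (lam * D_b + (1 - lam) * Sigma) w = gamma * w  (eq. 11)
    and projects the pixels with the leading eigenvector (eq. 1).
    """
    D_b, D_t, Sigma = attention_statistics(image_rgb, saliency, eps)

    # Symmetric-definite generalized eigenproblem A w = gamma * D_t w;
    # a tiny ridge keeps D_t positive definite on degenerate images.
    A = lam * D_b + (1.0 - lam) * Sigma
    B = D_t + 1e-8 * np.eye(3)
    eigvals, eigvecs = eigh(A, B)          # eigenvalues in ascending order
    w = eigvecs[:, -1]                     # leading eigenvector = projection vector W

    gray = image_rgb.reshape(-1, 3).astype(np.float64) @ w   # eq. (1), pixels as rows
    gray = (gray - gray.min()) / (gray.max() - gray.min() + 1e-12)
    return gray.reshape(image_rgb.shape[:2])
```

In use, the saliency argument could come from the spectral-residual stand-in sketched earlier (minding that OpenCV loads images in BGR order), or from eye-tracking fixation maps, which is what the paper uses as its ground truth.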
5.2. Performance

Our method contains two separate time-consuming processes: the visual attention analysis and the eigenvalue problem solver. As we only use the "saliency" map from Itti's implementation, we can discard the Winner-Take-All (WTA) competition neural network, which is the most time-consuming part. Experiments show that Itti's implementation has an approximate computational complexity of O(P), where P is the number of pixels. On the other side, the computational complexity of constructing and solving the eigenvalue problem is O(P) as well. Thus, the upper bound of the total computational complexity is still O(P). Using an Intel Pentium 4 2.93GHz processor, computing a 600x800 image requires 54 seconds, which includes 40 seconds to run the visual attention analysis and 14 seconds to construct and solve the eigenvalue problem. Considering the time consumed by memory-to-disk operations on large sparse matrices and the uncertain complexity of the iterative solver, this experimental result approximately matches our expectation for the computational complexity of the algorithm. Under the same conditions, Rasche's method [19] takes almost 60 seconds for color quantization, which reduces the number of distinct colors to 256, and then needs another 60 seconds to run the optimization process. Without color quantization, Rasche's method takes more than 1400 seconds to run the optimization process.

6. Results and Discussion

Figure 2. (a) Original Image (b) Matlab rgb2gray (c) Smith's method [20] (d) Our method
Figure 3. (a) Original Image (b) Matlab rgb2gray (c) Smith's method [20] (d) Our method
Figure 4. (a) Original Image (b) Matlab rgb2gray (c) Rasche's method [19] (d) Our method

Figure 2 compares our attention preservation algorithm to the Matlab rgb2gray function and to the recent state-of-the-art method of Smith et al. [20] on a wide variety of natural images. Isoluminant colors in the attended and not-attended areas make the Matlab and Smith outputs lose the attention information. The effects of our attention preservation algorithm are sometimes subtle, but, for example, the color changes visibly brighten the region of the yellow flower (Row 1) and darken the red fire extinguisher (Row 2). We also test our method on the Color Blindness Chart (Figure 3). The Color Blindness Chart sometimes does not have an explicit attention area. However, our method can still produce reasonable results, especially on those images that contain large isoluminant regions with a small number of different chrominance values. Figure 4 compares our attention preservation algorithm to Rasche's method [19] on natural images. Rasche's method preserves a low level visual cue: local contrast. For some images, areas with high local contrast coincidentally match the attended areas, in which case Rasche's method can provide better performance. However, as color quantization has to be done as preprocessing for Rasche's method to keep the time complexity acceptable, its results normally lose the original color image's sense of reality (Rows 1 and 2). On the contrary, as we take the variance of the color image into consideration, our results preserve the attention areas without damaging the sense of reality. Moreover, when the high local contrast areas do not match the attention areas, our method provides better performance (Rows 3 and 4).

7. Conclusion and Future Work

In this paper, we introduce a new approach to the color to grayscale problem which preserves visual attention. Saliency maps computed from the original color image are used as ground truth to classify each pixel into two classes: attended and not-attended. In order to preserve the visual attention in the grayscale image, we distinguish the attended area by maximizing the ratio of the between classes distance to the within classes distance. At the same time, to preserve the original color image's sense of reality, we also take the variance into consideration and integrate it with the previous two metrics into an optimization framework. Finally we solve the optimization problem and obtain the output grayscale image. The experimental results show that by using the proposed approach, the color to grayscale transformation not only preserves visual attention but also becomes more effective and efficient. We also notice some limitations of our method. For example, the balance parameter \lambda is set arbitrarily; we will develop a learning process to find the optimal \lambda for the optimization. In addition, as discussed previously, the performance of our method largely depends on the accuracy of the saliency maps from visual attention analysis, and automatic visual attention analysis is still an open problem. However, these points do not affect the conclusions of this paper or the theory presented.
8. Acknowledgment

This paper is supported by the National Natural Science Foundation of China under Grant 60873124, by the Natural Science Foundation of Zhejiang Province under Grant Y1090516, and by the Fundamental Research Funds for the Central Universities under Grant 2009QNA5015.

References

[1] A. Alsam and O. Kolas. Grey colour sharpening. Proc. of the 14th Color Imaging Conference, pages 263-267, 2006.
[2] R. Bala and R. Eschbach. Spatial color-to-grayscale transform preserving chrominance edge information. Color Imaging Conference, pages 82-86, 2004.
[3] R. Brown. Photoshop tips: Converting colour to black-and-white. http://www.russellbrown.com/tips_tech.html, 2006.
[4] N. Bruce and J. Tsotsos. Saliency based on information maximization. Advances in Neural Information Processing Systems, 18:155-162, 2006.
[5] N. D. B. Bruce and J. K. Tsotsos. Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, 9(3):1-24, 2009.
[6] C. Koch and S. Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4:219-227, 1985.
[7] Y.-F. Ma and H.-J. Zhang. Contrast-based image attention analysis by using fuzzy growing. ACM Multimedia, pages 374-381, 2003.
[8] D. Gao and N. Vasconcelos. Bottom-up saliency is a discriminant process. IEEE International Conference on Computer Vision, 2007.
[9] A. Gooch, J. Tumblin, and B. Gooch. Color2Gray: Salience-preserving colour removal. ACM Transactions on Graphics, 24(3):634-639, 2005.
[10] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. IEEE Conference on Computer Vision and Pattern Recognition, pages 1-8, 2007.
[11] L. Itti and P. Baldi. Bayesian surprise attracts human attention. Advances in Neural Information Processing Systems, 18:1-8, 2006.
[12] L. Itti and C. Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2:194-203, 2001.
[13] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1998.
[14] W. James. The Principles of Psychology. Holt, New York, 1890.
[15] I. T. Jolliffe. Principal Component Analysis. Springer, 2002.
[16] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. International Conference on Computer Vision, 2009.
[17] S. Mika, B. Scholkopf, A. Smola, K. R. Muller, M. Scholz, and G. Ratsch. Kernel PCA and de-noising in feature spaces. Advances in Neural Information Processing Systems, 11:536-542, 1999.
[18] A. Oliva, A. Torralba, M. Castelhano, and J. Henderson. Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113:766-786, 2006.
[19] K. Rasche, R. Geist, and J. Westall. Detail preserving reproduction of colour images for monochromats and dichromats. IEEE Computer Graphics and Applications, 25(3):22-30, 2005.
[20] K. Smith, P.-E. Landes, J. Thollot, and K. Myszkowski. Apparent greyscale: A simple and fast conversion to perceptually accurate images and video. Computer Graphics Forum, 27(3), 2008.
[21] D. Socolinsky and L. Wolff. Multispectral image visualization through first-order fusion. IEEE Transactions on Image Processing, 11(8):923-931, 2002.
[22] A. Teschioni, C. S. Regazzoni, and E. Stringa. A Markovian approach to color image restoration based on space filling curves. International Conference on Image Processing, 2:462-465, 1997.
[23] J. Tsotsos. Analyzing vision at the complexity level. Behavioral and Brain Sciences, 13:423-445, 1990.
[24] H. R. Wu and K. R. Rao. Digital Video Image Quality and Perceptual Coding. CRC Press, 2001.
[25] G. Wyszecki and W. S. Stiles. Color Science: Concepts and Methods, Quantitative Data and Formulae. Wiley-Interscience, 2000.
[26] Y. Yang, M. Song, N. Li, J. Bu, and C. Chen. What is the chance of happening: a new way to predict where people look. 11th European Conference on Computer Vision, 2010.