Color to Gray: Attention Preservation

2010 Fourth Pacific-Rim Symposium on Image and Video Technology
Yezhou Yang1∗, Mingli Song1, Jiajun Bu1, Chun Chen1, Cheng Jin2
1 Zhejiang University, 2 Fudan University
Abstract

In this paper, we propose an approach to preserving a crucial visual cue, attention, in color-to-grayscale transformation. The main contributions are threefold: 1) preserving visual attention is more biologically plausible than preserving other low-level cues, which makes our method more theoretically reasonable from both biological and psychological perspectives; 2) we treat the saliency map from visual attention analysis as a classifier and aim to preserve the attended area in the output grayscale image; 3) a simple minimization function for this specific problem is established and can be easily solved. Experimental results on test images indicate that our method is both practical and effective in preserving visual attention from the original color image to the corresponding grayscale image.
1. Introduction

Conventional algorithms have been widely used in color-to-gray applications: for example, in publishing, as a less expensive alternative to full-color images; in helping color-blind people perceive the visual cues of color images better; and in enhancement for e-ink based book readers, which currently support only grayscale rendering. Color-to-gray algorithms [19, 9, 3] are used to convert a color image to a grayscale one while preserving important visual cues. However, the definition of an important visual cue is still subjective, and such cues are difficult to extract in the computer vision field. Although some researchers [9] have mentioned visual cue preservation for color to gray, they did not give a formal definition of important visual cues. Fortunately, it is clear that only the important visual cues attract human visual attention. In other words, the human visual system is usually sensitive to the important visual cues and neglects the other information in the image. So, if we want to transfer the visual cues successfully from a color image to a grayscale one, the color-to-gray algorithm must first of all keep the visual attention unchanged. For example, as shown in the transformed grayscale image in Figure 1, the grayscale rendering of the full color image obtained by linearly combining R, G, and B in the conventional way fails to preserve the visual attention on the red pepper (pimiento).

It is widely accepted that grayscale mappings of color images produced solely by spectrally uniform transformations are often inadequate, because many visual cues are lost, especially visual attention. When we design an applicable and user-friendly color-to-gray algorithm, the natural characteristics of the human visual system should be incorporated into the framework. Given a color image, the tremendous amount of visual information in it is difficult for the human visual system to process fully [23]. Thus, in front of such an image, humans tend to pay more attention to a few parts of the image and neglect the others. Hence, many vision scientists [8, 18, 4, 11] state that, from the perspectives of psychophysics and neuroscience, visual attention is an extremely crucial part of the human visual system. Unlike low-level visual cues such as contrast, attention analysis builds an elegant mapping from the biological nature of the visual system to a computational implementation, which provides us with a higher-level visual cue.
On the other hand, as viewers, we are less concerned
with the accuracy of light intensities and expect digital
grayscale images to preserve meaningful visual cues, which
can help us to detect the most important scene features. A
widely accepted important visual cue is attention [6]. That is, viewers expect grayscale images to preserve the specific areas that attract people's attention in the original colored ones. Consider patients with complete color blindness, who can only see the world in grayscale: their visual system can still pick out the important parts of a scene, so the color-to-grayscale mapping embedded in their vision apparently does not solely record light intensities.
In this paper, we propose that in a color-to-grayscale solution, preserving the attention areas of an image is much more important than representing absolute pixel values or preserving other low-level visual cues. We use well-developed attention analysis and detection technology to obtain an attention description from the original color image automatically. We then use it as the ground truth and propose a method that produces a grayscale image whose attention description is similar to that of its original color image. Unlike other color-to-grayscale models that also concern visual cue preservation, our method seeks visual attention preservation, which is believed to be more biologically reasonable than other low-level visual cues and can be detected automatically with state-of-the-art techniques. This makes our algorithm not only more efficient and precise, but also more biologically plausible.

∗ Yang is now with the Department of Computer Science, University of Maryland, College Park.

978-0-7695-4285-0/10 $26.00 © 2010 IEEE
DOI 10.1109/PSIVT.2010.63

Figure 1. (a) Original Image (b) Matlab rgb2gray (c) Saliency map from eye-tracking data
2. Related Works

2.1. Color to Grayscale Image Conversions

Many color-to-grayscale algorithms [19, 9, 3, 20] have been developed, based on a wide variety of techniques. The classic definition of color removal is a 3D-to-1D transformation from a color image to a grayscale image solely by recording the light intensities. As a special case, color to gray through a linear combination of the R, G, and B channels is a time-efficient approach. Wyszecki and Stiles [25] combined the R, G, and B channels using a group of linear mapping functions. Wu and Rao [24] linearly combined the R, G, and B channels by defining a series of policies to separate the luminance value from the chrominance value, so as to construct the grayscale image from the luminance value. Researchers also treat color to gray as a dimensionality reduction process that degrades the three-dimensional color space to the one-dimensional grayscale space. Both linear and nonlinear dimensionality reduction techniques can be applied to the color-to-gray operation. For example, principal component analysis (PCA) [15] and its kernel generalization (kernel PCA, KPCA) [17] can be utilized to carry out the color-to-gray transformation. The main purpose of this line of research is to build a 1D parameterization of the 3D color space, for example by creating clusters in the color space and applying space-filling curves to them [22]. Alsam and Kolas [1] introduced a conversion method that aims to create a sharp grayscale image from the original colors rather than enhancing the separation between colors, and Smith et al. [20] combine global and local conversions in a way similar to Alsam and Kolas [1]. However, such methods ignore visual cue preservation and may not preserve visual attention.

Another trend of color-to-grayscale research, as introduced in the previous section, is based on the preservation of human vision cues, which is believed to be biologically plausible. For example, since vision studies show that humans are more sensitive to edges than to homogeneous color blocks, Bala and Eschbach [2] propose a spatial approach to color-to-grayscale conversion, in which they preserve chrominance edges locally by introducing high-frequency chrominance information into the luminance channel. Furthermore, human vision scientists have proposed that the human visual system does not perceive absolute values but relies on relative assessments. Socolinsky and Wolff [21] applied image gradient information to model the contrast of a color image locally and used it as the visual cue for color-to-gray conversion. However, because the contrast is built from the maximal gradient across channels at a short scale, i.e., every pixel and its nearest neighbors, this algorithm cannot deal well with long-scale contrast regions. Gooch et al. [9] introduced a local algorithm called Color2Gray, in which the gray value of each pixel is iteratively adjusted to minimize an objective function based on the local contrasts between all pixel pairs. Rasche et al. [19] introduced a method that aims to preserve local contrast while maintaining consistent luminance. All of the above methods have been shown to be effective in some respects. However, the human visual system is too complex to be represented solely by contrast or edges; moreover, most of these methods require users to adjust parameters in order to create aesthetic results, and they still may not preserve the visual attention regions in the grayscale images.
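For concreteness, the fixed linear-combination baseline discussed above can be sketched in a few lines of Python (an illustrative helper of ours, not code from any of the cited papers; NumPy is assumed, and the weights are the common ITU-R BT.601 luma weights also used by Matlab's rgb2gray):

```python
import numpy as np

# Conventional color-to-gray baseline: a fixed linear combination of
# the R, G, and B channels (ITU-R BT.601 luma weights). This records
# light intensity only, so it cannot adapt to where visual attention
# falls in a particular image.
BT601_WEIGHTS = np.array([0.299, 0.587, 0.114])

def linear_gray(rgb):
    """rgb: H x W x 3 float array in [0, 1] -> H x W grayscale array."""
    return rgb @ BT601_WEIGHTS

# A pure red pixel and a mid-gray pixel map to almost the same gray
# level, which is exactly how attention on a red object is lost.
red = np.array([[[1.0, 0.0, 0.0]]])
gray = np.array([[[0.3, 0.3, 0.3]]])
print(linear_gray(red), linear_gray(gray))  # both near 0.3
```

This is precisely the failure mode of Figure 1: isoluminant colors collapse to indistinguishable gray values regardless of their salience.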
2.2. Visual Attention Analysis
Tracing back to 1890, James [14] suggested that visual attention serves as a mediating mechanism that involves competition between different aspects of the visual scene and selects the most relevant areas to the detriment of others. As the mediating mechanism of human vision is not yet completely understood, using computational methods to analyze scenes has become the main way of determining the important and representative regions of an image. Such computational models are called visual attention analysis. Based on the work of Koch and Ullman [6], Itti proposed a saliency-based visual attention model for scene analysis in
[12]. In this work, visual input is first decomposed into a
set of topographic feature maps, which all feed into a master "saliency map" in a bottom-up manner. Then, a winner-take-all (WTA) competition is employed to select the
most conspicuous image locations as attended points. This
method has proved to be an elegant mapping from biological theories to a computational implementation. Since Itti's work, other researchers have proposed visual attention analysis methods based on other assumptions, such as natural image statistics [5][26] or large-scale machine learning [16], and they claim their methods are more accurate with respect to actual human eye-tracking experiments; nevertheless, the main idea of visual attention stays unchanged, and Itti's implementation is still the classic one and is widely believed to be accurate. So, in this paper, we use this implementation of visual attention analysis as the automatically detected ground truth for our method. The input of the implementation is a natural color image, and the output we use in our method is a "saliency" map, which is claimed to represent the human attention distribution.
Unfortunately, none of the state-of-the-art visual attention detectors can guarantee a perfect prediction of the attention area. For images accompanied by eye-tracking data, we therefore also use a "saliency map" computed directly from the eye-tracking fixations as the ground truth. For images without eye-tracking data, we also use human-labeled data as the visual attention ground truth when testing our method. However, these points do not affect the conclusions of this paper or the theory presented, as we can assume that a more stable and robust visual attention detector is just around the corner.
3. Attention Preservation

As demonstrated in the previous section, we want to produce a grayscale image that has an attention description similar to that of its original color image. The question is: how do we preserve the attention description? Psychological studies show that features that deviate from the feature average tend to be attractive. So, in order to preserve the attention areas in the grayscale image, we should first distinguish the attention areas from the other areas of the grayscale image.

On the other hand, the grayscale image should also preserve the sense of reality of its corresponding color image. Only distinguishing the attention areas will lead to a problem of over-clustering, which devastates the sense of reality of the original color image. So, the grayscale image should also preserve the variance of the original color image within a reasonable scale.

In the next section, three metrics are proposed to evaluate the deviation of the attention areas from the rest of the grayscale image and the variance of the grayscale image. Finally, to balance these three metrics, an objective function is constructed and its corresponding optimization problem is solved.

4. Algorithm

4.1. Definition of the Problem

A color image is a group of 3D points. We use C to represent this group. Thus, if there are N points in the color image, C = [c1, c2, ..., cN], where each point cx is a 3D vector [Rx, Gx, Bx]. Similarly, we can treat the grayscale image as a group of 1D points G = [g1, g2, ..., gN], where each point gx is a scalar.

As claimed in the previous section, we can treat color to gray as a dimensionality reduction process that degrades the three-dimensional color space to the one-dimensional grayscale space. The problem then becomes seeking a proper projection vector W = [w1, w2, w3] that projects the 3D group C onto the 1D group G while preserving the visual attention cues of C in G. Formally,

G = WC    (1)

4.2. Visual Attention Cue

From a computational visual attention analysis method, we can get the saliency map S of each input color image. In our approach, we treat the saliency map as a classifier L, which labels the pixels of the original color image into two classes: attended and not-attended. Since the proportion of attended pixels varies among images, we can use methods such as fuzzy growing [7], or simply set a threshold ε, to extract the attended area from the saliency map S and divide the pixels into two classes:

L(x, y) = 1 if S(x, y) > ε; L(x, y) = 0 otherwise.    (2)
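Equations (1) and (2) translate directly into code. A minimal sketch (hypothetical helper names of ours, NumPy assumed, and a fixed threshold standing in for fuzzy growing [7]):

```python
import numpy as np

def attention_labels(saliency, eps=0.5):
    """Eq. (2): threshold the saliency map S with epsilon = eps,
    labelling each pixel 1 (attended) or 0 (not-attended)."""
    return (saliency > eps).astype(np.uint8)

def project_to_gray(rgb, w):
    """Eq. (1): G = WC, applied per pixel. rgb is an H x W x 3 array,
    w is the 3-vector projection W; returns the H x W grayscale image."""
    return rgb @ np.asarray(w, dtype=float)
```

The whole method then reduces to choosing the vector `w`; the sections that follow derive it from the labeled pixels.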
Now we obtain the pixel groups Cl and Gl, which are augmented with the visual attention label information: Cl = [c1^l, c2^l, ..., cN^l] and Gl = [g1^l, g2^l, ..., gN^l].

In order to distinguish the attended class from the not-attended class, we minimize the distance within each class while maximizing the distance between the classes. Following the definition of Linear Discriminant Analysis, we define two scatter matrices for our two-class problem, the "between-classes scatter matrix" Db and the "within-classes scatter matrix" Dt:

Db = N1 (μ1 − c̄)(μ1 − c̄)^T + N2 (μ2 − c̄)(μ2 − c̄)^T    (3)

Dt = Σ_{c=1,2} Σ_{i∈c} (ci − μc)(ci − μc)^T    (4)

μc = (1/Nc) Σ_{i∈c} ci    (5)

c̄ = (1/N) Σ_{c=1,2} Nc μc    (6)

Here Nc is the number of pixels in class c, μc is the mean color of class c, and c̄ is the mean color of the whole image.
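As an illustration, the scatter matrices (3)-(6), and the generalized eigen-problem of Section 4.4 that combines them with the covariance Σ and the balance parameter λ, might be computed as follows. This is a sketch of ours, not the authors' code; the function names are hypothetical, and the small ridge term that keeps Dt invertible for degenerate inputs is an implementation convenience not in the paper:

```python
import numpy as np

def scatter_matrices(pixels, labels):
    """pixels: N x 3 array of colors; labels: length-N 0/1 attention
    labels. Returns (Db, Dt), the between- and within-classes scatter
    matrices of eqs. (3) and (4)."""
    c_bar = pixels.mean(axis=0)                 # overall mean, eq. (6)
    Db = np.zeros((3, 3))
    Dt = np.zeros((3, 3))
    for cls in (0, 1):
        X = pixels[labels == cls]
        mu = X.mean(axis=0)                     # class mean, eq. (5)
        d = (mu - c_bar)[:, None]
        Db += len(X) * (d @ d.T)                # eq. (3)
        Dt += (X - mu).T @ (X - mu)             # eq. (4)
    return Db, Dt

def projection_vector(Db, Dt, Sigma, lam=0.5, ridge=1e-8):
    """Solve Dt^{-1} (lam*Db + (1-lam)*Sigma) W = gamma W, the
    generalized eigen-problem of Section 4.4, and return the
    eigenvector W with the largest eigenvalue."""
    M = np.linalg.inv(Dt + ridge * np.eye(3)) @ (lam * Db + (1 - lam) * Sigma)
    vals, vecs = np.linalg.eig(M)
    return np.real(vecs[:, np.argmax(np.real(vals))])
```

Feeding the resulting vector into G = WC (eq. (1)) yields the output grayscale image.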
After projection, the between-classes distance is W^T Db W and the within-classes distance is W^T Dt W.

4.3. Preserve the Variance

Solely maximizing the between-classes distance while minimizing the within-classes distance will lead to over-clustering, which makes the output grayscale image lose the sense of reality. In order to prevent the points of the grayscale image from over-clustering, we have to preserve the variance of the point group G. Formally, we have to make sure that the value of var(G) = var(WC) stays within a reasonable scale. Following the definition of Principal Component Analysis [15], var(WC) = W^T Σ W, in which Σ is the covariance matrix of C:

Σ = (C − c̄)(C − c̄)^T    (7)

where the mean color c̄ of eq. (6) is subtracted from every column of C.

4.4. The Minimization Problem

In all, we want to maximize W^T Db W, minimize W^T Dt W, and keep the value of W^T Σ W within a reasonable scale. We introduce a balance parameter λ and define the objective function J(W) as follows:

J(W) = (λ W^T Db W + (1 − λ) W^T Σ W) / (W^T Dt W)    (8)

An important property of the objective J is that it is invariant with respect to rescalings W → αW. Hence, since the denominator is a scalar, we can always choose W such that W^T Dt W = 1. For this reason we can transform the problem of maximizing J into the following constrained optimization problem:

minimize −(λ W^T Db W + (1 − λ) W^T Σ W)  subject to  W^T Dt W = 1    (9)

The corresponding Lagrangian is

L(W) = −(λ W^T Db W + (1 − λ) W^T Σ W) + γ (W^T Dt W − 1)    (10)

The KKT conditions tell us that the following equation needs to hold at the solution:

Dt^{-1} (λ Db + (1 − λ) Σ) W = γ W    (11)

This is a generalized eigenvalue equation, and the eigenvector corresponding to the largest eigenvalue maximizes the objective. That eigenvector is then treated as the projection vector W. Substituting it into eq. (1), we get the output grayscale image G.

5. Experiment and Performance

5.1. Implementation

We use natural images, mainly from the eye-tracking database [4], as our test samples, because Itti's method has been thoroughly tested on this database. We also tested our method on several other image databases, such as Hou's [10] and Judd's [16], and on images downloaded from the internet. The main framework of our method has three steps:

(1) We apply Itti's implementation [13] of visual attention analysis to the original color image C. As mentioned above, the output of this process is a small grayscale map called the "saliency" map. This map is believed to depict the human attention distribution over an image: a brighter area represents a more attended area of the original image. We then resize the saliency map to the size of the original image. For images on which Itti's implementation fails to predict accurate attention areas, we also use "saliency" maps from eye-tracking data and human-labeled data.

(2) We compute the classifier L from the saliency map S. Then we compute the "within-classes scatter matrix" Dt and the "between-classes scatter matrix" Db from the data of the original color image with the attention labels. We also compute the covariance matrix Σ from the data of the original color image. Finally, the generalized eigenvalue problem of eq. (11) is solved using Matlab, which gives the projection vector W.

(3) We substitute W into eq. (1) and obtain the final grayscale image G.

5.2. Performance

Our method contains two separate time-consuming processes: visual attention analysis and an eigenvalue problem solver. As we only use the "saliency" map from Itti's implementation, we can discard the winner-take-all (WTA) competition based on artificial neural networks, which is the most time-consuming part. Experiments show that Itti's implementation has an approximate computational complexity of O(P), in which P represents the number of pixels. On the other side, the computational complexity of constructing and solving the eigenvalue problem is O(P) too. Thus, the upper bound of the total computational complexity is still O(P). Using an Intel Pentium 4 2.93 GHz processor, computing a 600×800 image requires 54 seconds, which includes 40 seconds to run the visual attention analysis and 14 seconds to construct and solve the eigenvalue problem. Considering the time consumed by memory-to-disk operations on large-scale sparse matrices and the uncertain complexity of iterative methods, the experimental result approximately matches our expectation for the computational complexity of our algorithm. Under the same conditions, Rasche's method [19] costs almost 60 seconds to do color quantization, which reduces the number of distinct colors to 256, and then needs another 60 seconds to run the optimization process. Without color quantization, Rasche's method costs more than 1400 seconds to run the optimization process.

6. Results and Discussion

Figure 2 compares our attention preservation algorithm to the Matlab rgb2gray function and the recent state-of-the-art method of Smith et al. [20] on a wide variety of natural images. Isoluminant colors in attended and not-attended areas make the Matlab and Smith grayscale outputs lose the attention information. The effects of our attention preservation algorithm are sometimes subtle, but the changes visibly brightened the region of the yellow flower (Row 1) and darkened the red fire extinguisher (Row 2), for example.

Figure 2. (a) Original Image (b) Matlab rgb2gray (c) Smith's method [20] (d) Our method

We also test our method on Color Blindness Charts (Figure 3). A Color Blindness Chart sometimes does not have an explicit attention area. However, our method can still produce reasonable results, especially on images that contain large isoluminant regions with a small number of different chrominance values.

Figure 3. (a) Original Image (b) Matlab rgb2gray (c) Smith's method [20] (d) Our method

Figure 4 compares our attention preservation algorithm to Rasche's method [19] on natural images. Rasche's method can preserve a low-level visual cue: local contrast. For some images, areas with high local contrast coincidentally match the attended areas, and on them Rasche's method can provide better performance. However, as color quantization has to be done as a preprocessing step of Rasche's method for reasons of time complexity, its results normally lose the original color image's sense of reality (Rows 1 and 2). On the contrary, as we take the variance of the color image into consideration, our result can preserve the attention areas without damaging the sense of reality. Moreover, when the high-local-contrast areas do not match the attention areas, our method provides better performance (Rows 3 and 4).

Figure 4. (a) Original Image (b) Matlab rgb2gray (c) Rasche's method [19] (d) Our method

7. Conclusion and Future Work

In this paper, we introduce a new approach to the color-to-grayscale problem which can preserve visual attention. The saliency maps, computed from the original color image, are used as ground truth to classify each pixel into two classes: attended and not-attended. In order to preserve the visual attention in the grayscale image, we try to distinguish the attended area by maximizing the ratio of the between-classes distance to the within-classes distance. At the same time, to
preserve the original color image's sense of reality, we also take the variance into consideration and integrate it with the previous two metrics in an optimization framework. Finally, we
solve the optimization problem and get the output grayscale
images. The experimental results show that by using the
proposed approach, color to grayscale transformation not
only preserves visual attention, but also becomes more effective and efficient.
We also notice some limitations of our method. For example, the balance parameter λ is set arbitrarily. We will
develop a learning process to figure out the optimal λ for
the optimization. In addition, as introduced previously, the
performance of our method largely depends on the accuracy
of the saliency maps from visual attention analysis; to date, automatic visual attention analysis remains an open problem. However, these points do not affect the conclusions of this paper or the theory presented.
8. Acknowledgment

This paper is supported by the National Natural Science Foundation of China under Grant 60873124, by the Natural Science Foundation of Zhejiang Province under Grant Y1090516, and by the Fundamental Research Funds for the Central Universities under Grant 2009QNA5015.

References

[1] A. Alsam and O. Kolas. Grey colour sharpening. Proc. of the 14th Color Imaging Conference, pages 263–267, 2006.
[2] R. Bala and R. Eschbach. Spatial color-to-grayscale transform preserving chrominance edge information. Color Imaging Conference, pages 82–86, 2004.
[3] R. Brown. Photoshop tips: Converting colour to black-and-white. http : // ech.html, 2006.
[4] N. Bruce and J. Tsotsos. Saliency based on information maximization. Advances in Neural Information Processing Systems, 18:155–162, 2006.
[5] N. D. B. Bruce and J. K. Tsotsos. Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, 9(3):1–24, 2009.
[6] C. Koch and S. Ullman. Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4:219–227, 1985.
[7] Y.-F. Ma and H.-J. Zhang. Contrast-based image attention analysis by using fuzzy growing. ACM Multimedia, 34:374–381, Nov. 2003.
[8] D. Gao and N. Vasconcelos. Bottom-up saliency is a discriminant process. IEEE International Conference on Computer Vision, 2007.
[9] A. Gooch, J. Tumblin, and B. Gooch. Color2Gray: Salience-preserving colour removal. ACM Transactions on Graphics, 24(3):634–639, 2005.
[10] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
[11] L. Itti and P. Baldi. Bayesian surprise attracts human attention. Advances in Neural Information Processing Systems, 18:1–8, 2006.
[12] L. Itti and C. Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2:194–203, 2001.
[13] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, 1998.
[14] W. James. The Principles of Psychology. Holt, New York, 1890.
[15] I. T. Jolliffe. Principal Component Analysis. Springer, 2002.
[16] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. International Conference on Computer Vision, 2009.
[17] S. Mika, B. Scholkopf, A. Smola, K.-R. Muller, M. Scholz, and G. Ratsch. Kernel PCA and de-noising in feature spaces. Advances in Neural Information Processing Systems, 11:536–542, 1999.
[18] A. Oliva, A. Torralba, M. Castelhano, and J. Henderson. Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113:766–786, 2006.
[19] K. Rasche, R. Geist, and J. Westall. Detail preserving reproduction of colour images for monochromats and dichromats. IEEE Computer Graphics and Applications, 25(3):22–30, 2005.
[20] K. Smith, P.-E. Landes, J. Thollot, and K. Myszkowski. Apparent greyscale: A simple and fast conversion to perceptually accurate images and video. Computer Graphics Forum, 27(3), 2008.
[21] D. Socolinsky and L. Wolff. Multispectral image visualization through first-order fusion. IEEE Transactions on Image Processing, 11(8):923–931, 2002.
[22] A. Teschioni, C. S. Regazzoni, and E. Stringa. A Markovian approach to color image restoration based on space filling curves. International Conference on Image Processing, 2:462–465, 1997.
[23] J. Tsotsos. Analyzing vision at the complexity level. Behavioral and Brain Sciences, 13:423–445, 1990.
[24] H. R. Wu and K. R. Rao. Digital Video Image Quality and Perceptual Coding. CRC Press, 2001.
[25] G. Wyszecki and W. S. Stiles. Colour Science: Concepts and Methods, Quantitative Data and Formulae. Wiley-Interscience, 2000.
[26] Y. Yang, M. Song, N. Li, J. Bu, and C. Chen. What is the chance of happening: A new way to predict where people look. 11th European Conference on Computer Vision, 2010.