An Improved Color Coherence Vector Method for CBIR

Xingfeng Chen1,2,3, Xingfa Gu1,2,3, Hua Xu2,3
1. College of Automation, University of Electronic Science and Technology of China, Chengdu 610054, China
2. State Key Laboratory of Remote Sensing Science, Institute of Remote Sensing Applications, CAS, Beijing 100101, China
3. Demonstration Center of Spaceborn Remote Sensing, National Space Administration, Beijing 100101, China
1 chenxingfeng_001@163.com

Abstract: Content-based image retrieval (CBIR) helps people find what they need in very large image collections. CBIR retrieves images by visual features such as color, shape, and texture, of which color is the most widely used. The Color Coherence Vector (CCV) is a classical CBIR method. It resembles the color histogram but also captures some spatial information, and it has been shown to be more effective. This paper describes an improved color coherence vector method that carries more spatial information and works more efficiently.

Keywords: Content-based image retrieval (CBIR); Color Coherence Vector (CCV)

1. Introduction
Database management, computer, and network technologies are developing rapidly. It has become possible to obtain needed images from huge image databases, but image retrieval theory and technology still lag seriously behind. In the 1970s, most retrieval systems used text-based methods, and traditional text-based retrieval is by now quite mature; well-known search engines such as Yahoo and Google are examples. But as the saying goes, "seeing once is better than hearing a hundred times": text cannot replace direct visual communication. Content-based image retrieval (CBIR), in brief, means retrieving from an image database the images that are relevant to a given query image.
CBIR emerged in the 1990s. It retrieves images by visual features such as color, shape, and texture. Humans are usually more sensitive to color than to texture or shape. Moreover, computers usually represent images in RGB form, so color features are easy to compute and fast to extract. For these reasons, color-based retrieval systems are the most popular. Early systems mostly used the global color histogram. However, a histogram cannot describe local characteristics: two visually different images may have the same histogram. The histogram feature also has too many dimensions. To address these shortcomings, Pass et al. [1] proposed the Color Coherence Vector (CCV). Its core idea is to split each bin of the color histogram into two parts: a CCV stores, for each color, the number of coherent pixels and the number of incoherent pixels. By separating coherent from incoherent pixels, the CCV captures some spatial information and therefore makes finer distinctions than a color histogram.

This paper presents an improved CCV method that carries more spatial information without much additional computation. We first introduce the traditional CCV method, then describe the improved CCV in detail, then explain how the distance between two vectors is computed, and finally analyze the improved CCV and give our evaluation and outlook.

2. Color Coherence Vector (CCV) method
Many earlier CBIR systems retrieve images by color histogram. But the color histogram is poor at distinguishing images that differ mainly in how their colors are arranged; different images often have the same histogram. The reason is that the color histogram is a global feature and contains little spatial information. The CCV was proposed to include more spatial information by dividing each color bin of the histogram into two parts.
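Before turning to the definition, the weakness of the global histogram can be shown with a small example. The Python sketch below is illustrative only: `quantize` and `color_histogram` are hypothetical helpers, and the uniform quantization of each 8-bit RGB channel into 4 levels (64 discrete colors) is an assumption, since the paper does not state how colors are discretized. Two images containing the same pixels in different arrangements produce identical histograms.

```python
from collections import Counter

def quantize(pixel, levels=4):
    """Map an 8-bit (R, G, B) pixel to one of levels**3 discrete colors.

    Assumption: uniform per-channel quantization; the paper does not specify one.
    """
    r, g, b = (channel * levels // 256 for channel in pixel)
    return (r * levels + g) * levels + b

def color_histogram(image, levels=4):
    """Global histogram: pixel count for each discretized color index."""
    counts = Counter(quantize(p, levels) for row in image for p in row)
    return [counts[j] for j in range(levels ** 3)]

RED, BLUE = (255, 0, 0), (0, 0, 255)
left_right = [[RED, RED, BLUE, BLUE],
              [RED, RED, BLUE, BLUE]]    # two solid color blocks
alternating = [[RED, BLUE, RED, BLUE],
               [BLUE, RED, BLUE, RED]]   # the same pixels, interleaved
assert color_histogram(left_right) == color_histogram(alternating)
```

Quantization also addresses the dimensionality problem mentioned above: an unquantized 24-bit RGB histogram has 256³ = 16,777,216 bins, while 4 levels per channel leave 64.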
We classify the pixels within a given color bucket as either coherent or incoherent. A coherent pixel is part of a large group of pixels of the same color, while an incoherent pixel is not. We determine the pixel groups by computing connected components, where two pixels are adjacent if one is among the eight nearest neighbors of the other [1]. If a connected region of one color contains more pixels than a predefined threshold $\tau$, its pixels are coherent; the remaining pixels are incoherent. For a given discretized color, some of the pixels with that color will be coherent and some will be incoherent. Let $\alpha_j$ denote the number of coherent pixels of the j'th discretized color and $\beta_j$ the number of incoherent pixels. Clearly, the total number of pixels with that color is $\alpha_j + \beta_j$, so a color histogram would summarize an image as

$$\langle \alpha_1 + \beta_1,\ \alpha_2 + \beta_2,\ \ldots,\ \alpha_n + \beta_n \rangle.$$

Instead, for each color we compute the pair $(\alpha_j, \beta_j)$, which we call the coherence pair for the j'th color. The color coherence vector for the image consists of

$$\langle (\alpha_1, \beta_1),\ (\alpha_2, \beta_2),\ \ldots,\ (\alpha_n, \beta_n) \rangle.$$

By recording how the pixels of each color split into coherent and incoherent ones, the CCV captures some spatial information.

3. Improved Color Coherence Vector
When the CCV was proposed, its authors emphasized that it contains spatial information. The CCV counts coherent and incoherent pixels with a two-component entry per color, which describes the color distribution; it is thus an enhancement of the color histogram, the popular and traditional CBIR feature. Consider now how the CCV is computed. Because the CCV depends on coherent pixels, we must find the connected pixels in the image matrix, after which we obtain $\alpha$ and $\beta$ and hence the CCV. When the connected pixels are found, their positions are inevitably available at the same time. For the j'th discretized color, we define $\gamma_j$ to be the mean of the coordinates of the pixels in the largest connected region among the coherent pixels of that color.
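Both the coherence pair and $\gamma_j$ come out of the same connected-component pass, which is why the improvement is cheap. The Python sketch below illustrates this under stated assumptions and is not the authors' code: `improved_ccv` is a hypothetical name; the input is a 2-D list of already-discretized color labels, and `colors` must list every label that occurs; regions use 8-neighbor adjacency as in [1]; a region is coherent when it has more than $\tau$ pixels; and $\gamma_j$ is reported as the mean 1-indexed (row, column) position rounded half up to the nearest pixel (the rounding is our assumption, since the text only says "mean").

```python
import math

def improved_ccv(img, colors, tau):
    """Improved CCV of a 2-D list of discretized color labels.

    Returns {color: (alpha, beta, gamma)}. A connected region (8-neighbor
    adjacency) is coherent if it has more than tau pixels. gamma is the mean
    1-indexed (row, col) of the largest coherent region of that color, rounded
    half up (assumption), or (0, 0) if the color has no coherent region.
    """
    rows, cols = len(img), len(img[0])
    seen = set()
    alpha = {j: 0 for j in colors}
    beta = {j: 0 for j in colors}
    largest = {j: [] for j in colors}  # pixels of the biggest coherent region
    for r0 in range(rows):
        for c0 in range(cols):
            if (r0, c0) in seen:
                continue
            color, stack, region = img[r0][c0], [(r0, c0)], []
            seen.add((r0, c0))
            while stack:  # flood-fill one connected region
                r, c = stack.pop()
                region.append((r, c))
                for nr in range(r - 1, r + 2):
                    for nc in range(c - 1, c + 2):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and (nr, nc) not in seen and img[nr][nc] == color):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            if len(region) > tau:  # coherent region: counts toward alpha
                alpha[color] += len(region)
                if len(region) > len(largest[color]):
                    largest[color] = region  # gamma needs only this comparison
            else:  # incoherent region: counts toward beta
                beta[color] += len(region)
    result = {}
    for j in colors:
        region = largest[j]
        if region:
            mean_r = sum(r + 1 for r, _ in region) / len(region)
            mean_c = sum(c + 1 for _, c in region) / len(region)
            gamma = (math.floor(mean_r + 0.5), math.floor(mean_c + 0.5))
        else:
            gamma = (0, 0)
        result[j] = (alpha[j], beta[j], gamma)
    return result


clustered = [[1, 1, 2, 2]] * 4               # each color forms one 8-pixel block
striped = [[1, 1, 1, 1], [2, 2, 2, 2]] * 2   # four separate 4-pixel stripes
print(improved_ccv(clustered, colors=[1, 2], tau=4))
print(improved_ccv(striped, colors=[1, 2], tau=4))
```

The two 4×4 test images have the same histogram (8 pixels of each color), yet the first prints `{1: (8, 0, (3, 2)), 2: (8, 0, (3, 4))}` and the second `{1: (0, 8, (0, 0)), 2: (0, 8, (0, 0))}`: the sums $\alpha_j + \beta_j$ agree while the coherence entries differ. Keeping only the largest coherent region per color means $\gamma$ adds one size comparison per region to the ordinary CCV pass.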
$\gamma_j$ differs from $\alpha_j$: $\gamma_j$ describes the position of the largest connected coherent region, whereas $\alpha_j$ is the number of coherent pixels. The improved CCV is therefore

$$\langle (\alpha_1, \beta_1, \gamma_1),\ (\alpha_2, \beta_2, \gamma_2),\ \ldots,\ (\alpha_n, \beta_n, \gamma_n) \rangle.$$

Compared with the traditional CCV, the improved CCV adds $\gamma$, but $\gamma$ is obtained together with $\alpha$, so it requires little additional computation and does not noticeably reduce the efficiency of the CBIR system.

We now give an example of the improved CCV. Let the threshold be $\tau = 4$ and consider a grayscale image with only three values. Its digital number (DN) matrix is:

$$
\begin{pmatrix}
2 & 1 & 2 & 2 & 1 & 1\\
2 & 2 & 1 & 2 & 1 & 1\\
2 & 1 & 3 & 2 & 1 & 1\\
2 & 2 & 2 & 1 & 1 & 2\\
2 & 2 & 1 & 1 & 2 & 2\\
2 & 2 & 1 & 1 & 2 & 2
\end{pmatrix}
$$

First, we find the connected regions. Regions with DN = 1, 2, 3 are labeled A, B, C, respectively: A1 is the first connected region with DN = 1, A2 the second, and so on, and C is the only region with DN = 3. With 8-neighbor connectivity, the image matrix becomes:

$$
\begin{pmatrix}
B_1 & A_2 & B_1 & B_1 & A_1 & A_1\\
B_1 & B_1 & A_2 & B_1 & A_1 & A_1\\
B_1 & A_2 & C & B_1 & A_1 & A_1\\
B_1 & B_1 & B_1 & A_1 & A_1 & B_2\\
B_1 & B_1 & A_1 & A_1 & B_2 & B_2\\
B_1 & B_1 & A_1 & A_1 & B_2 & B_2
\end{pmatrix}
$$

The connected regions are summarized in Table 1.

Table 1: Connected regions of each color
Label    A1    A2    B1    B2    C
Color    1     1     2     2     3
Size     12    3     15    5     1

By the definition of the improved CCV, regions with more than $\tau = 4$ pixels are coherent: A1, B1, and B2 are coherent, while A2 and C are incoherent. This gives $\alpha$ and $\beta$. For color 1, A1 is the largest coherent region, so $\gamma_1 = (4, 5)$; for color 2, B1 is the largest, so $\gamma_2 = (3, 2)$; color 3 has no coherent pixels, so $\alpha_3 = 0$ and $\gamma_3$ is set to $(0, 0)$. The improved CCV of this image is given in Table 2.

Table 2: Improved CCV of the image
Color       1         2         3
$\alpha$    12        20        0
$\beta$     3         0         1
$\gamma$    (4, 5)    (3, 2)    (0, 0)

It can also be written as ⟨(12, 3, (4, 5)), (20, 0, (3, 2)), (0, 1, (0, 0))⟩. $\gamma$ gives the midpoint of the largest coherent region, so it adds spatial information about the image, and it needs no extra computation: it is obtained while $\alpha$ is being computed.
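The $\gamma$ values in Table 2 can be checked by hand. Reading "mean of coordinates" as the mean 1-indexed (row, column) position, and assuming the mean is rounded half up to the nearest pixel (our reading; it reproduces the reported values), A1 occupies columns 5 and 6 of rows 1 to 3, columns 4 and 5 of row 4, and columns 3 and 4 of rows 5 and 6, while B1 has 6, 4, 2, and 3 pixels in columns 1, 2, 3, and 4, whose row indices sum to 21, 17, 5, and 6:

$$
\begin{aligned}
\text{A1:}\quad & \textstyle\sum r = 2(1+2+3+4+5+6) = 42,\qquad \sum c = 3(5+6) + (4+5) + 2(3+4) = 56,\\
& \gamma_1 = \operatorname{round}\!\left(\tfrac{42}{12},\ \tfrac{56}{12}\right) = \operatorname{round}(3.50,\ 4.67) = (4,\ 5);\\
\text{B1:}\quad & \textstyle\sum r = 21 + 17 + 5 + 6 = 49,\qquad \sum c = 6\cdot 1 + 4\cdot 2 + 2\cdot 3 + 3\cdot 4 = 32,\\
& \gamma_2 = \operatorname{round}\!\left(\tfrac{49}{15},\ \tfrac{32}{15}\right) = \operatorname{round}(3.27,\ 2.13) = (3,\ 2).
\end{aligned}
$$

The counts also confirm the table: $\alpha_1 + \beta_1 + \alpha_2 + \beta_2 + \alpha_3 + \beta_3 = 12 + 3 + 20 + 0 + 0 + 1 = 36$, the number of pixels in the 6×6 image.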
4. Distance computation for Improved CCV
Each entry of the improved CCV in fact holds four numbers: $\alpha_j$, $\beta_j$, and the two coordinates of $\gamma_j$. The final distance between two images is computed in several steps. First, we compute the traditional CCV distance from $\alpha$ and $\beta$:

$$D_1 = \sum_{j=1}^{n} \left( |\alpha_j - \alpha'_j| + |\beta_j - \beta'_j| \right) \qquad (1)$$

This distance is simple and efficient. Second, we compute the $\gamma$ distance between the two images:

$$D_2 = \sum_{j=1}^{n} \sqrt{\left(\gamma_j(1) - \gamma'_j(1)\right)^2 + \left(\gamma_j(2) - \gamma'_j(2)\right)^2} \qquad (2)$$

where $\gamma_j(1)$ and $\gamma_j(2)$ are the two coordinates of $\gamma_j$. Third, $D_1$ and $D_2$ measure different features and their magnitudes differ widely, so they must be normalized and weighted before the final distance is computed. Suppose there are M images in the database. The distances between the reference image and the database then form the 2×M matrix $[D_1, D_2]$, and we apply Gaussian normalization to each of the vectors $D_1$ and $D_2$, obtaining the normalized matrix $[D_1', D_2']$. The final distance vector is

$$D = D_1' + D_2' \qquad (3)$$

5. Experimental result
We experimented on 230 different images. Before comparison, all images were processed with a mean filter to suppress small variations between neighboring pixels. The feature database is built by extracting the improved CCV from every image in the image database; Fig. 1 shows the program flow chart.

Fig. 1: Program flow chart (the image database and the reference image each pass through a mean filter and feature extraction; the database features form the feature database; similarity computation compares the reference features with the feature database, the results are ranked, and the ranking is displayed).
Fig. 2: Computing-time test image

First, we extracted both the CCV and the improved CCV of Fig. 2, a 256×256 single-band image. The two extractions took 31.875 s and 31.656 s, a difference of 0.219 s, or less than 0.7%. Adding $\gamma$ therefore has little impact on the efficiency of building the feature database.

Second, we compared the retrieval performance of the CCV and the improved CCV. CBIR has no exact evaluation criterion. In our experiment, the two methods retrieve images with different distance formulas: the CCV uses (1) and the improved CCV uses (3).
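Formulas (1) to (3) and the final ranking step can be sketched as follows. This Python illustration rests on assumptions and is not the authors' implementation: each feature is a list of $(\alpha_j, \beta_j, \gamma_j)$ tuples in a fixed color order (the layout of Table 2); the helper names `d1`, `d2`, `gauss_normalize`, and `rank` are hypothetical; formula (2) is read as a sum of Euclidean distances between the $\gamma$ points; and "Gaussian normalization" is taken to be the z-score $(x - \mu)/\sigma$ over the M database distances. A common variant in the CBIR literature divides by $3\sigma$ instead; because it rescales both terms by the same factor, it yields the same ranking.

```python
import math
from statistics import mean, pstdev

def d1(q, t):
    """Formula (1): L1 distance over the (alpha, beta) pairs of every color."""
    return sum(abs(qa - ta) + abs(qb - tb)
               for (qa, qb, _), (ta, tb, _) in zip(q, t))

def d2(q, t):
    """Formula (2), read as a sum of Euclidean distances between gammas."""
    return sum(math.dist(qg, tg) for (_, _, qg), (_, _, tg) in zip(q, t))

def gauss_normalize(values):
    """Z-score normalization across the M database images (assumed variant)."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def rank(query, database):
    """Database indices sorted by D = D1' + D2' (formula 3), most similar first."""
    n1 = gauss_normalize([d1(query, t) for t in database])
    n2 = gauss_normalize([d2(query, t) for t in database])
    scores = [a + b for a, b in zip(n1, n2)]
    return sorted(range(len(database)), key=scores.__getitem__)


query = [(12, 3, (4, 5)), (20, 0, (3, 2)), (0, 1, (0, 0))]  # Table 2
database = [query,                                            # toy vectors
            [(12, 3, (2, 2)), (20, 0, (3, 2)), (0, 1, (0, 0))],
            [(5, 10, (1, 1)), (15, 5, (6, 6)), (1, 0, (3, 3))]]
print(rank(query, database))  # [0, 1, 2]
```

The second toy vector has exactly the same $\alpha$ and $\beta$ as the query, so $D_1$ alone cannot separate it from an identical image; only the shifted $\gamma_1$ in $D_2$ ranks it below the exact match. Normalizing each distance vector before adding them keeps the pixel-count term $D_1$, which can run into the thousands for real images, from swamping the coordinate term $D_2$.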
It is not easy to compare the two methods with precise numbers. A retrieval method is commonly evaluated by three criteria: effectiveness, efficiency, and flexibility, of which effectiveness receives the most attention. Over 53 retrieval experiments, the improved CCV performed better than the traditional CCV, mainly in capturing image detail and color distribution.

6. Conclusion
We have introduced an improved CCV method for CBIR. Color features have been studied by many researchers, and some improvements proposed for the histogram can also be applied to the CCV, for example dividing the image into several subimages to obtain more spatial information. The improved CCV contains more spatial information and therefore performs better than the traditional CCV as a color feature. Color alone, however, is still insufficient to describe an image; CBIR systems combine color, texture, shape, and other features. We look forward to more precise image descriptions and, most importantly, to an international standard for CBIR and its evaluation.

References
[1] Greg Pass and Ramin Zabih, "Comparing Images Using Color Coherence Vectors," in Proc. ACM International Conference on Multimedia, Boston, MA, 1996, pp. 65–73.
[2] Zhiyong Zeng, "Key Techniques Research of CBIR," Doctoral dissertation, Xidian University, 2006.
[3] Gunther Heidemann, "Combining Spatial and Colour Information for Content Based Image Retrieval," Computer Vision and Image Understanding, vol. 94, pp. 234–270, 2004.
[4] Wei Na and Geng Guo-hua, "An Overview of Performance Evaluation in Content-based Image Retrieval," Journal of Image and Graphics, vol. 9, no. 11, Nov. 2004.
[5] Huang Cheng and Wang Guoying, "A Method of Image Retrieval Based on Color Coherence Vector," Computer Engineering, Jan. 2006.
[6] Mario A. Nascimento, "Effective and Efficient Region-based Image Retrieval," Journal of Visual Languages and Computing, vol. 14, pp. 151–179, 2003.

Biographies
Xingfeng Chen, male, born in 1984, is a graduate of the College of Automation, University of Electronic Science and Technology of China.
His research interests include remote sensing image processing, database management, and pattern recognition.
Xingfa Gu, male, born in 1962, is Deputy and Executive Director of the Institute of Remote Sensing Applications, Chinese Academy of Sciences. He has long worked on quantitative remote sensing and radiometric calibration and has published many research papers.
Hua Xu, male, born in 1978, is a teacher at the Institute of Remote Sensing Applications, CAS. His research covers software design and remote sensing calibration.