IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, TKDE-2012-09-0622

Robust Perceptual Image Hashing Based on Ring Partition and NMF

Zhenjun Tang, Xianquan Zhang, and Shichao Zhang, Senior Member, IEEE

Abstract—This paper designs an efficient image hashing with a ring partition and non-negative matrix factorization (NMF), which achieves both rotation robustness and good discriminative capability. The key contribution is a novel construction of a rotation-invariant secondary image, which is used for the first time in image hashing and makes the image hash resistant to rotation. In addition, NMF coefficients are approximately linearly changed by content-preserving manipulations, so hash similarity can be measured with the correlation coefficient. We conduct experiments on 346 images to illustrate the efficiency of the proposed hashing. Our experiments show that the proposed hashing is robust against content-preserving operations, such as image rotation, JPEG compression, watermark embedding, Gaussian low-pass filtering, gamma correction, brightness adjustment, contrast adjustment and image scaling. Receiver operating characteristics (ROC) curve comparisons with state-of-the-art algorithms demonstrate that the proposed hashing outperforms all of them in classification performance with respect to robustness and discrimination.

Index Terms—Image hashing, multimedia security, non-negative matrix factorization, ring partition.

1 INTRODUCTION

The Internet and multimedia devices, such as digital cameras, scanners and smart cellphones, have made it easier and faster to capture, store, share and convey hundreds of thousands of images, songs and videos. While we enjoy life increasingly with multimedia products, there are still many challenging problems facing academia and industry.
For example, there may be multiple copies of an image on a PC, where the copies can be in different digital representations but with the same visual content as the original. It is clearly a practical yet challenging issue to efficiently search for all similar versions (including the original and its copies) of an image in large-scale multimedia data [1, 2]. In particular, as powerful multimedia tools make digital operations, such as editing and tampering, much easier than ever before, assuring multimedia content security has become an important concern for the multimedia community [3]. These real demands have led to an emerging multimedia technology known as image hashing. We study a new and robust image hashing in this paper.

Image hashing not only allows us to quickly find image copies in large databases, but can also ensure the content security of digital images. Image hashing maps an input image to a short string, called an image hash, and has been widely used in image retrieval [1], image authentication [4], digital watermarking [5], image copy detection [6], tamper detection [7], image indexing [8], multimedia forensics [9], and reduced-reference image quality assessment [10]. There are classical cryptographic hash functions, e.g., SHA-1 and MD5, which convert an input message (document, image, graphic, text, video, etc.) into a fixed-size string. However, they are sensitive to bit-level changes and are therefore unsuitable for image hashing. This is because, in real applications, digital images often undergo normal digital processing, such as JPEG compression and image enhancement, which does not change their visual content. This leads to the first of two basic properties of an image hash function, known as perceptual robustness: a hash function should be robust against content-preserving operations, such as geometric transforms, format conversion and JPEG compression.
In other words, hashes of an image and its processed versions are expected to be the same or very similar. A hash is expected to change significantly only when the visual content is altered by malicious operations, such as object deletion and object insertion. Another basic property is discriminative capability, i.e., images with different contents should have different hashes. This means that the hash distance between different images should be large enough. Moreover, an image hash function should satisfy additional properties in some applications. For example, the image hash must depend on one or several keys for application in digital forensics.

————————————————
Z. Tang is with the Department of Computer Science, Guangxi Normal University, Guilin 541004, China. E-mail: tangzj230@163.com.
X. Zhang is with the Department of Computer Science, Guangxi Normal University, Guilin 541004, China. E-mail: zxq6622@126.com.
S. Zhang is with the Department of Computer Science, Guangxi Normal University, Guilin 541004, China, and the Faculty of Information Technology, University of Technology, Sydney, NSW 2007, Australia. E-mail: zhangsc@gxnu.edu.cn. (Corresponding author)
Manuscript received Sep. 4, 2012.
————————————————

Consequently, many image hashing algorithms have been successfully designed in the last decade. From an applied point of view, however, there are still some limitations in image hashing. For example, image rotation is a useful operation: in the post-processing of photographs, rotation is frequently exploited to correct photographs with an imperfect ground level. However, most existing algorithms are sensitive to rotation, including [4], [11], [12], [13], [14] and [15].
This means that they will falsely identify those corrected photographs as different images. Some methods are robust against rotation, but their discriminative capabilities are not good enough, such as [16], [17], [18], [19], [20] and [21]. It is a challenging task to simultaneously satisfy both rotation robustness and good discriminative capability when developing high-performance hashing algorithms.

In this paper, we propose an efficient and robust image hashing that meets both requirements. The algorithm is based on a ring partition and non-negative matrix factorization (NMF). The key technique is a novel construction of a secondary image achieved by the ring partition. The secondary image is invariant to image rotation and thus makes our algorithm resistant to rotation. In addition, the use of NMF provides the proposed algorithm with good discriminative capability. This is because NMF is an efficient technique for learning parts-based representations, and the more information about local image content an image hash contains, the more discriminative it is. We conduct experiments on 346 images, including 146 images from the USC-SIPI Image Database [22] and 200 different color images, where 100 images are taken from the Ground Truth Database [23], 67 images are downloaded from the Internet, and 33 images are captured by digital cameras. Our experiments demonstrate that the proposed algorithm reaches a desirable tradeoff between rotation robustness and discriminative capability.

The rest of this paper is organized as follows. Section 2 briefly recalls related work on image hashing. Section 3 describes the proposed image hashing. Sections 4 and 5 present the experimental results and performance comparisons, respectively. Conclusions are drawn in Section 6.
2 RELATED WORKS

The earliest work on image hashing was introduced by Schneider and Chang [24] at the 3rd International Conference on Image Processing (ICIP). At the 7th ICIP, Venkatesan et al. [11] published their famous paper entitled 'Robust image hashing'. Since then, many researchers have devoted themselves to developing image hashing algorithms. In terms of the underlying techniques, existing hashing algorithms can be roughly classified into the following five categories.

The first class includes hashing algorithms based on the discrete wavelet transform (DWT) [4, 11, 25, 26]. Venkatesan et al. [11] used statistics of wavelet coefficients to construct image hashes. This method is resilient to JPEG compression, median filtering and rotation within 2°, but fragile to gamma correction and contrast adjustment. Monga and Evans [25] exploited the end-stopped wavelet transform to detect visually significant feature points. To make a short hash, Monga et al. [26] proposed a polynomial-time heuristic clustering algorithm for feature point compression. Ahmed et al. [4] gave a secure image hashing scheme for authentication using DWT and SHA-1. It can be applied to tamper detection, but is fragile to some normal digital processing, such as brightness adjustment and contrast adjustment.

The second class uses the discrete cosine transform (DCT) [12, 13]. Fridrich and Goljan [12] found that DCT coefficients can indicate image content and, based on this observation, proposed a robust hashing for application in digital watermarking. This hashing is robust against normal processing, but sensitive to image rotation. Lin and Chang [13] designed an image authentication system using robust hashing based on invariant relations between DCT coefficients at the same position in separate blocks. The method can distinguish JPEG compression from malicious attacks, but is also fragile to rotation.
The third class advocates the Radon transform (RT) [16, 18, 27]. Lefebvre et al. [16] were the first to exploit the RT to construct robust hashes. Seo et al. [27] used the autocorrelation of each projection in the RT domain to design image hashing. Motivated by the RT, Roover et al. [18] designed a scheme called the RASH method: an image is divided into a set of radial projections of image pixels, a radial variance (RAV) vector is extracted from these projections, and the RAV vector is compressed by DCT. The RASH method is resilient to image rotation and rescaling, but its discriminative capability needs improvement.

The fourth class develops image hashing with the discrete Fourier transform (DFT) [14, 19, 21]. Swaminathan et al. [19] used DFT coefficients to produce image hashes. This algorithm is resilient to several content-preserving modifications, such as moderate geometric transforms and filtering. Wu et al. [14] exploited the RT in combination with the DWT and DFT to develop image hashing. Their algorithm is robust against print-scan attacks and small-angle rotation. Recently, Lei et al. [21] extracted moment features from the RT domain and used the significant DFT coefficients of the moments to produce hashes. The RT-DFT hashing is better than the methods of [19, 27] in classification performance between perceptual robustness and discriminative capability.

The last class employs matrix factorization [7, 17, 20]. Kozat et al. [17] viewed images and attacks as sequences of linear operators and proposed to calculate hashes using singular value decompositions (SVDs). This method, called SVD-SVD hashing, is robust against geometric attacks, e.g., rotation, at the cost of significantly increased misclassification. Monga and Mihcak [20] pioneered the use of NMF to derive image hashing.
They applied NMF to some sub-images, used the combination coefficients of the matrix factorization to construct a secondary image, obtained its low-rank matrix approximation by applying NMF again, and concatenated the matrix entries to form an NMF-NMF vector. To make a short hash, they calculated the inner products between the NMF-NMF vector and a set of weight vectors with i.i.d. Gaussian components of zero mean and unit variance. The NMF-NMF-SQ hashing is resilient to geometric attacks, but cannot resist some normal manipulations, e.g., watermark embedding. Tang et al. [7] observed an invariant relation in the NMF coefficient matrix and used it to construct robust hashes. This method is robust against JPEG compression, additive noise and watermark embedding, but fragile to image rotation.

Besides the above five categories, there are many other significant techniques for image hashing. For example, Mao and Wu [28] investigated the unicity distances of the hashing methods of [11, 19] and concluded that a key reused several dozen times is no longer reliable. Khelifi and Jiang [29] proposed a robust image hashing algorithm using the theory of optimum watermark detection. This method is resilient to rotation within 5°. Lu and Wu [9] designed an image hashing for digital forensics using visual words. Tang et al. [15] proposed a lexicographical framework for image hashing and gave an implementation based on DCT and NMF. In another work, Tang et al. [30] used structural features to design image hashing and proposed a similarity metric for tamper detection. A common weakness of the algorithms in [15, 30] is fragility to rotation. Recently, Lv and Wang [31] designed a hashing algorithm using the scale-invariant feature transform (SIFT) and the Harris detector. Tang et al.
[32] converted an RGB color image into the YCbCr and HSI color spaces, and extracted invariant moments [33] from each component to construct hashes. This method is resistant to image rotation of arbitrary degree, but falsely identifies 0.43% of different images as similar when 98% of visually identical images are correctly detected. In another study, Li et al. [34] proposed a robust image hashing based on random Gabor filtering (GF) and dithered lattice vector quantization (LVQ), where GF is used to extract image features resilient to rotation and LVQ is exploited for compression. The GF-LVQ hashing is also robust against image rotation of arbitrary degree and is fast. In [35], Tang et al. exploited the histogram of color vector angles to generate hashes. This algorithm is robust against rotation of arbitrary degree, but mistakenly views 17.05% of different images as identical when the correct detection ratio is 98%. Zhao et al. [36] used Zernike moments to design image hashing, which only tolerates rotation within 5°.

The robustness of some typical hashing algorithms against common content-preserving operations is summarized in Table 1, where 'Yes' means that the algorithm is robust against the operation, 'No' means that the algorithm is sensitive to the operation, and 'Unknown' means that, to the best of our knowledge, no such robustness result has been reported in the literature. From these results, we observe that some state-of-the-art algorithms are robust against image rotation of arbitrary degree, such as [17], [18], [20], [21], [32] and [34]. However, these algorithms are not good enough in discrimination. Therefore, more effort on developing high-performance algorithms with both rotation robustness and good discrimination is needed.
TABLE 1
ROBUSTNESS OF HASHING ALGORITHMS AGAINST COMMON CONTENT-PRESERVING OPERATIONS

Algorithm | JPEG compression | Watermark embedding | Image scaling | Image rotation | Gaussian low-pass filtering | Gamma correction | Brightness adjustment | Contrast adjustment
[4]  | Yes | Unknown | No  | No               | Yes | Unknown | No      | No
[7]  | Yes | Yes     | Yes | No               | Yes | Yes     | Yes     | Yes
[11] | Yes | Unknown | Yes | Within 2°        | Yes | No      | Unknown | No
[12] | Yes | Yes     | Yes | No               | Yes | Yes     | Yes     | Yes
[15] | Yes | Yes     | Yes | Within 1°        | Yes | Yes     | Yes     | Yes
[16] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes
[17] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes
[18] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes
[19] | Yes | Unknown | Yes | Within 20°       | No  | Yes     | No      | Unknown
[20] | Yes | No      | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes
[21] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes
[25] | Yes | Unknown | Yes | Within 5°        | Yes | Unknown | Unknown | Yes
[30] | Yes | Yes     | Yes | Within 1°        | Yes | Yes     | Yes     | Yes
[32] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes
[34] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes

3 PROPOSED IMAGE HASHING

Our image hashing consists of three steps, as shown in Fig. 1. In preprocessing, the input image is converted to a normalized image of a standard size. In the second step, the normalized image is partitioned into different rings, which are then used to construct a secondary image. In the third step, NMF is applied to the secondary image, and the image hash is finally formed by the NMF coefficients. In the following subsections, we present NMF, the ring partition scheme, and an illustration of our image hashing.

Fig. 1. Block diagram of our image hashing: input image → preprocessing → normalized image → ring partition → secondary image → NMF → image hash.

3.1 NMF

NMF is an efficient technique for dimensionality reduction, which has shown better performance than principal components analysis (PCA) and vector quantization (VQ) in learning parts-based representations [37]. In fact, NMF has been successfully used in face recognition [38], image representation [39], image analysis [40], signal separation [41], data clustering [42], and so on.

A non-negative matrix V = (V_{i,j})_{M×N} is generally viewed as a combination of N column vectors of size M×1. The NMF of V yields two non-negative matrix factors, i.e., B = (B_{i,k})_{M×K} and C = (C_{k,j})_{K×N}, where K is the rank of the NMF and K < min(M, N). The matrices B and C are called the base matrix and the coefficient matrix (or encoding matrix), respectively. They approximately represent V such that:

V ≈ BC                                                          (1)

In the literature, various algorithms [43, 44] have been proposed to compute NMF. In this paper, we employ the multiplicative update rules [43] to find B and C as follows.

B_{i,k} ← B_{i,k} · [Σ_{j=1}^{N} C_{k,j} V_{i,j} / (BC)_{i,j}] / [Σ_{j=1}^{N} C_{k,j}]
C_{k,j} ← C_{k,j} · [Σ_{i=1}^{M} B_{i,k} V_{i,j} / (BC)_{i,j}] / [Σ_{i=1}^{M} B_{i,k}]       (2)

where i = 1, …, M; j = 1, …, N; k = 1, …, K. The above rules correspond to minimizing the generalized Kullback-Leibler (KL) divergence:

F = Σ_{i=1}^{M} Σ_{j=1}^{N} [V_{i,j} log(V_{i,j} / (BC)_{i,j}) − V_{i,j} + (BC)_{i,j}]       (3)

3.2 Ring partition

In general, a rotation manipulation takes the image center as the origin of coordinates. Fig. 2 (a) is a central part of Airplane, and (b) is the corresponding part of the Airplane rotated by 30°. Obviously, the visual contents in the corresponding rings of Figs. 2 (a) and (b) are unchanged after image rotation.

Fig. 2. Ring partition of an image and its rotated version: (a) central part of Airplane; (b) rotated by 30°.

Thus, to make the image hash resilient to rotation, we can divide an image into different rings and use them to form a secondary image invariant to rotation. Fig. 3 is a schematic diagram of secondary image construction, where (a) is a square image divided into 7 rings (radii r_1, …, r_7) and (b) is the secondary image formed by these rings. The detailed scheme of secondary image construction is as follows.

Fig. 3.
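The multiplicative update rules for NMF can be sketched in a few lines of NumPy. This is an illustrative implementation under the KL-divergence objective, not the authors' code; the iteration count, random initialization, and the small `eps` guard are assumptions.

```python
import numpy as np

def nmf_kl(V, K, iters=300, seed=0):
    """Sketch of NMF by the multiplicative update rules of Eq. (2),
    which minimize the generalized KL divergence of Eq. (3).
    V is a non-negative M x N matrix; returns B (M x K) and C (K x N)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    B = rng.random((M, K)) + 0.1          # random non-negative start
    C = rng.random((K, N)) + 0.1
    eps = 1e-12                           # guards divisions by zero
    for _ in range(iters):
        R = V / (B @ C + eps)             # ratio V_ij / (BC)_ij
        B *= (R @ C.T) / (C.sum(axis=1) + eps)            # update rule for B
        R = V / (B @ C + eps)
        C *= (B.T @ R) / (B.sum(axis=0)[:, None] + eps)   # update rule for C
    return B, C
```

With K < min(M, N), the product BC is a parts-based low-rank approximation of V; in the proposed hashing, the entries of the coefficient matrix C become the hash.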
Schematic diagram of secondary image construction: (a) ring partition; (b) secondary image.

Let the size of the square image be m×m, n be the total number of rings, and R_k be the set of pixel values in the k-th ring (k = 1, 2, …, n). Here, we only use the pixels in the inscribed circle of the square image and divide the inscribed circle into rings of equal area. This is because each ring is expected to become a column of the secondary image. The ring partition can be done by calculating the circle radii and the distance between each pixel and the image center. As shown in Fig. 3 (a), the pixels of each ring are determined by two neighboring radii, except those of the innermost ring. Suppose that r_k is the k-th radius (k = 1, 2, …, n), labeled from the smallest value to the biggest. Thus, r_1 and r_n are the radii of the innermost and outermost circles, respectively. Clearly, r_n = ⌊m/2⌋ for an m×m image, where ⌊·⌋ denotes downward rounding. To determine the other radii, the area of the inscribed circle A and the average area of each ring Ā are first calculated as follows.

A = π r_n²                                                      (4)

Ā = A / n                                                       (5)

So r_1 can be computed by

r_1 = √(Ā / π)                                                  (6)

Thus, the other radii r_k (k = 2, 3, …, n−1) can be obtained by the following equation.

r_k = √(Ā / π + r_{k−1}²)                                       (7)

Let p(x, y) be the value of the pixel in the y-th row and the x-th column of the image (1 ≤ x, y ≤ m). Suppose that (x_c, y_c) are the coordinates of the image center. Thus, x_c = m/2 + 0.5 and y_c = m/2 + 0.5 if m is an even number. Otherwise, x_c = (m+1)/2 and y_c = (m+1)/2. The distance between p(x, y) and the image center (x_c, y_c) is measured by the Euclidean distance:

d_{x,y} = √((x − x_c)² + (y − y_c)²)                            (8)

Having obtained the circle radii and pixel distances, we can classify the pixel values into n sets as follows.

R_1 = {p(x, y) | d_{x,y} ≤ r_1}                                 (9)

R_k = {p(x, y) | r_{k−1} < d_{x,y} ≤ r_k}   (k = 2, 3, …, n)    (10)

Next, we rearrange the elements of R_k (k = 1, 2, …, n) into a vector u_k sorted in ascending order.
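As a concrete sketch, the radii of Eqs. (4)–(7) and the pixel classification of Eqs. (8)–(10) might be computed as follows. This is illustrative NumPy code, not the authors' implementation; the function names are ours.

```python
import numpy as np

def ring_radii(m, n):
    """Radii r_1..r_n of n equal-area rings inside the inscribed
    circle of an m x m image, following Eqs. (4)-(7)."""
    rn = m // 2                          # r_n = floor(m/2)
    Abar = np.pi * rn ** 2 / n           # Eqs. (4)-(5): average ring area
    r = [np.sqrt(Abar / np.pi)]          # r_1, Eq. (6)
    for _ in range(n - 2):
        r.append(np.sqrt(Abar / np.pi + r[-1] ** 2))  # Eq. (7)
    r.append(float(rn))
    return np.array(r)

def ring_sets(img, n):
    """Classify the pixel values of a square image into n rings,
    following Eqs. (8)-(10)."""
    m = img.shape[0]
    r = ring_radii(m, n)
    xc = yc = m / 2 + 0.5 if m % 2 == 0 else (m + 1) / 2
    ys, xs = np.mgrid[1:m + 1, 1:m + 1]              # 1-based coordinates
    d = np.sqrt((xs - xc) ** 2 + (ys - yc) ** 2)     # Eq. (8)
    sets = [img[d <= r[0]]]                          # R_1, Eq. (9)
    for k in range(1, n):
        sets.append(img[(d > r[k - 1]) & (d <= r[k])])  # R_k, Eq. (10)
    return sets
```

For m = 512 and n = 32, this yields r_32 = 256 and equal ring areas π(r_k² − r_{k−1}²) = Ā for every ring.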
This sorting guarantees that u_k is unrelated to rotation. As pixel coordinates are discrete, the number of pixels in each set is not always equal to Ā. Since the pixels of each ring are expected to form a column of the secondary image, u_k is mapped to a new vector v_k of size Ā×1 by linear interpolation. The secondary image V is then obtained by arranging these new vectors as follows.

V = [v_1, v_2, v_3, …, v_n]                                     (11)

As v_k is unrelated to rotation, V is also invariant to this operation. Besides the rotation-invariant merit, the secondary image has another advantage: it has fewer columns than the original image. Since each column is viewed as a high-dimensional vector and will be compressed by NMF, fewer columns help to produce a compact hash.

3.3 Illustrating our approach

Below, our image hashing is illustrated step by step.

(1) Preprocessing. The input image is first mapped to a normalized size m×m by bilinear interpolation. This ensures that our hashing is resilient to scaling and that hashes of images of different sizes have the same length. An RGB color image is converted into the YCbCr color space by the following equation.

[Y ]   [ 65.481   128.553    24.966] [R]   [ 16]
[Cb] = [−37.797   −74.203   112    ] [G] + [128]                (12)
[Cr]   [112       −93.786   −18.214] [B]   [128]

where R, G and B represent the red, green and blue components of a pixel, and Y, Cb and Cr are the luminance, blue-difference chroma and red-difference chroma, respectively. After color space conversion, we take the luminance component Y for representation.

(2) Ring partition. Divide Y into n rings and exploit them to produce a secondary image V using the scheme described in Subsection 3.2. The aim of this step is to construct a rotation-invariant matrix for dimensionality reduction.

(3) NMF. Apply NMF to V to obtain the coefficient matrix C. Concatenate the matrix entries to obtain a compact image hash. Thus, the hash length is L = n·K, where n is the number of rings and K is the rank of the NMF.
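Putting the steps above together, a self-contained sketch of the secondary-image construction might look as follows. The helper names, the use of `np.interp` for the linear interpolation, and the rounding of the column length Ā are our assumptions for illustration.

```python
import numpy as np

def rgb_to_y(rgb):
    """Luminance row of Eq. (12); assumes R, G, B are scaled to [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 65.481 * r + 128.553 * g + 24.966 * b + 16.0

def secondary_image(img, n):
    """Rotation-invariant secondary image V of Eq. (11): each ring's
    values are sorted (a rotation permutes pixels within a ring but
    preserves the multiset of values) and linearly interpolated to a
    common length, giving the columns of V."""
    m = img.shape[0]
    rn = m // 2
    Abar = np.pi * rn ** 2 / n                           # average ring area
    radii = np.sqrt(Abar / np.pi * np.arange(1, n + 1))  # r_k^2 = k*Abar/pi
    radii[-1] = rn                                       # r_n = floor(m/2)
    xc = yc = m / 2 + 0.5 if m % 2 == 0 else (m + 1) / 2
    ys, xs = np.mgrid[1:m + 1, 1:m + 1]
    d = np.sqrt((xs - xc) ** 2 + (ys - yc) ** 2)         # Eq. (8)
    length = int(round(Abar))                            # common column length
    cols, lo = [], -1.0
    for rk in radii:
        u = np.sort(img[(d > lo) & (d <= rk)])           # sorted vector u_k
        cols.append(np.interp(np.linspace(0, 1, length),
                              np.linspace(0, 1, len(u)), u))
        lo = rk
    return np.column_stack(cols)                         # V = [v_1, ..., v_n]
```

Applying NMF to V and concatenating the entries of the coefficient matrix C then yields a hash of length L = n·K.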
To measure the similarity between two image hashes, we take the correlation coefficient as the metric. Let h^(1) = [h_1^(1), h_2^(1), …, h_L^(1)] and h^(2) = [h_1^(2), h_2^(2), …, h_L^(2)] be two image hashes. The correlation coefficient is defined as

S = Σ_{i=1}^{L} [h_i^(1) − μ_1][h_i^(2) − μ_2] / { √(Σ_{i=1}^{L} [h_i^(1) − μ_1]²) · √(Σ_{i=1}^{L} [h_i^(2) − μ_2]²) + ε }       (13)

where ε is a small constant to avoid singularity when Σ_{i=1}^{L} [h_i^(1) − μ_1]² = 0 or Σ_{i=1}^{L} [h_i^(2) − μ_2]² = 0, and μ_1 and μ_2 are the means of h^(1) and h^(2), respectively. The range of S is [−1, 1]. The more similar the input images, the bigger the S value. If S is bigger than a pre-defined threshold T, the images are viewed as visually identical. Otherwise, they are different images, or one is a tampered version of the other. The choice of the correlation coefficient as the metric is based on the observation that NMF coefficients are approximately linearly changed by content-preserving manipulations. This will be empirically validated in Subsection 4.1.

We now present an example of our image hashing algorithm. Fig. 4 (a) is an image with an imperfect ground level. Fig. 4 (b) is a similar version of (a), which has undergone a sequence of digital processing: image rotation by 4°, JPEG compression with quality factor 50, brightness adjustment with Photoshop's scale 50, and Gaussian white noise of mean 0 and variance 0.01. Both Figs. 4 (a) and (b) are of size 760×560. We employed our image hashing with parameters m = 512, n = 32 and K = 2 to generate the image hashes of Figs. 4 (a) and (b). The results are illustrated in Table 2. We calculated the similarity between the hashes of Figs. 4 (a) and (b) and found that S = 0.9957, indicating that Figs. 4 (a) and (b) are a pair of visually identical images.

Fig. 4. A pair of visually identical images: (a) image with imperfect ground level; (b) a similar version of (a).
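The similarity metric can be sketched directly from Eq. (13). This is illustrative code; the value of ε and the function name are assumptions.

```python
import numpy as np

def hash_similarity(h1, h2, eps=1e-10):
    """Correlation coefficient S of Eq. (13); eps avoids the
    singularity when either hash has zero variance."""
    a = np.asarray(h1, float) - np.mean(h1)      # h^(1) centered on mu_1
    b = np.asarray(h2, float) - np.mean(h2)      # h^(2) centered on mu_2
    denom = np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum()) + eps
    return float((a * b).sum() / denom)
```

Two images are declared visually identical when S exceeds the threshold T; note that a constant hash yields S ≈ 0 rather than a division error, thanks to ε.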
4 EXPERIMENTAL RESULTS

In the experiments, the input image is resized to 512×512, the number of rings is 32, and the rank of the NMF is 2, i.e., m = 512, n = 32, and K = 2. Thus, the hash length is L = n·K = 64 decimal digits. To validate our hashing performance, experiments on perceptual robustness and discriminative capability are conducted in the next two subsections, respectively. The third subsection evaluates sensitivity to visual content changes. The effect of different parameters on hash performance, i.e., the number of rings and the rank of the NMF, is discussed in the final subsection.

TABLE 2
HASH VALUES OF THE IMAGES ILLUSTRATED IN FIG. 4

Fig. 4 (a): 114,165,134, 99, 89, 90, 88, 86, 83, 85, 83, 87, 88, 90, 91, 94, 95, 97,104,118,120,136,141,145,148,151,154,156,157,158,158,157, 71, 71,129,183,202,206,206,206,210,200,198,184,175,167,166,162,162,160,146,128,121,104, 98, 92, 88, 82, 77, 72, 67, 65, 64, 64

Fig. 4 (b): 140,192,158,124,112,109,106,106,102,107, 97,111,115,116,119,123,120,124,130,144,145,157,157,165,169,170,172,173,178,179,179,179, 86, 89,152,208,231,238,242,239,241,230,234,212,199,193,191,187,190,185,169,150,148,132,130,118,111,108,104, 98, 90, 88, 87, 88

4.1 Perceptual robustness

All images in the USC-SIPI Image Database [22] were taken as test images, except those in the fourth volume, i.e., 'Sequences', which are often used for sequence image retrieval. Thus, 146 test images were selected in our experiments. We attacked these images with normal content-preserving operations under different parameters, and calculated the similarities between the hashes of the original images and their attacked versions. The experiments demonstrated that the proposed hashing is robust against the tested operations except in a few cases.
Due to space limitations, typical examples are given in this paper, where Airplane, Baboon, House, Lena and Peppers, each of size 512×512, are the test images. We used the StirMark benchmark 4.0 [45] to attack them with JPEG compression, watermark embedding, scaling and rotation. The parameters are set as follows: quality factors for JPEG compression are 30, 40, …, 100; strengths of watermark embedding are 10, 20, …, 100; scaling ratios are 0.5, 0.75, 0.9, 1.1, 1.5 and 2.0; and rotation angles are ±1°, ±2°, ±5°, ±10°, ±15°, ±30°, ±45° and ±90°. Since the rotation operation expands the size of the attacked image, only the center parts of size 361×361 are taken from the original and the attacked images for hash generation. Brightness adjustment and contrast adjustment with Photoshop are also applied to the test images, both with parameters ±10 and ±20. Moreover, gamma correction and 3×3 Gaussian low-pass filtering in MATLAB are used to produce processed versions. The values of gamma correction are 0.75, 0.9, 1.1 and 1.25, and the standard deviations of the Gaussian blur are 0.3, 0.4, …, 1.0. The settings of our experiments are summarized in Table 3. Thus, each test image has 60 attacked versions.

We extracted the image hashes of the test images and their attacked versions and calculated their correlation coefficients. The results are presented in Fig. 5, where the abscissa is the parameter value of each operation and the ordinate represents the correlation coefficient. It is observed that all correlation coefficients in Figs. 5 (a), (b), (c) and (e) are bigger than 0.98. In Figs. 5 (d), (f) and (g), each has one value below 0.98: 0.9653, 0.9790 and 0.9763, respectively. In Fig. 5 (h), two values, 0.9797 and 0.9573, are smaller than 0.98, occurring when the Photoshop scales are −20 and 20. Therefore, we can choose T = 0.98 as a threshold to resist most of the above content-preserving operations.
In this case, 5/300 × 100% = 1.67% of pairs of visually identical images are falsely considered different images. When the threshold decreases to 0.95, no pair of visually identical images is viewed as distinct.

Moreover, we validated our hashing performance under combined attacks of rotation and other content-preserving operations. Here, we took Airplane as the test image and further attacked the Airplane rotated by 30° with JPEG compression with quality factor 30, 3×3 Gaussian low-pass filtering with unit standard deviation, gamma correction with γ = 1.1, brightness adjustment with Photoshop's scale 20, and contrast adjustment with Photoshop's scale 20, respectively. The S values of the five combined attacks are 0.98516, 0.99020, 0.98699, 0.99133 and 0.98801, respectively, all bigger than the above-mentioned thresholds.

Note that the correlation coefficient S is a useful metric for measuring linearity between two vectors. Its range is [−1, 1], where S = 1 means that the hash vectors are completely linearly dependent. The above S values are all bigger than 0.95. S values close to 1 indicate strong linear dependence, and therefore empirically verify that NMF coefficients are approximately linearly changed by content-preserving manipulations.

Fig. 5. Robustness test based on five standard test images (Airplane, Baboon, House, Lena, Peppers); correlation coefficient S versus operation parameter: (a) JPEG compression (quality factor); (b) watermark embedding (strength); (c) scaling (ratio); (d) rotation (angle); (e) Gaussian low-pass filtering (standard deviation); (f) gamma correction; (g) brightness adjustment (Photoshop's scale); (h) contrast adjustment (Photoshop's scale).

TABLE 3
THE USED CONTENT-PRESERVING OPERATIONS AND THEIR PARAMETER VALUES

Tool      | Manipulation                    | Parameter                | Parameter values
StirMark  | JPEG compression                | Quality factor           | 30, 40, 50, 60, 70, 80, 90, 100
StirMark  | Watermark embedding             | Strength                 | 10, 20, 30, 40, 50, 60, 70, 80, 90, 100
StirMark  | Scaling                         | Ratio                    | 0.5, 0.75, 0.9, 1.1, 1.5, 2.0
StirMark  | Rotation and cropping           | Rotation angle in degree | ±1, ±2, ±5, ±10, ±15, ±30, ±45, ±90
Photoshop | Brightness adjustment           | Photoshop's scale        | ±10, ±20
Photoshop | Contrast adjustment             | Photoshop's scale        | ±10, ±20
MATLAB    | Gamma correction                | γ                        | 0.75, 0.9, 1.1, 1.25
MATLAB    | 3×3 Gaussian low-pass filtering | Standard deviation       | 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0

4.2 Discriminative capability

To validate the discrimination, 200 different color images were collected to form an image database, where 100 images were taken from the Ground Truth Database [23], 67 images were downloaded from the Internet, and 33 images were captured by digital cameras. The image contents include buildings, human beings, sport, landscape, etc. The image sizes range from 256×256 to 2048×1536. Fig.
6 presents typical examples of the image database.

Fig. 6. Typical images in the test database: (a) images taken from the Ground Truth Database; (b) images downloaded from the Internet; (c) images captured by digital cameras.

We calculated the image hashes of these 200 images, computed the correlation coefficient S between each pair of different images, and obtained 19900 values. The distribution of these results is presented in Fig. 7, where the abscissa is the correlation coefficient S and the ordinate represents the frequency of S. In Fig. 7, the minimum and maximum S values are −0.6836 and 0.9717, and the mean and standard deviation of all S values are 0.3910 and 0.3014, respectively. If T=0.95 is selected as a threshold, 0.12% of the pairs of different images are falsely considered as visually similar images. As the threshold reaches 0.98, false judgment will not occur.

Fig. 7. Discrimination test based on 200 different images.

4.3 Sensitivity to visual content changes
When some operations, such as object deletion and object insertion, are applied to an image, the meaning of the attacked image differs from that of its original image. In other words, the original image and the attacked image are perceptually different. Generally, these operations are viewed as image tampering and are expected to be significantly reflected in the image hash [7, 9, 13, 19, 21, 30]. This means that image hashing should be sensitive to image tampering. To validate sensitivity to visual content changes, we simulated image tampering by deleting or inserting objects using Photoshop, and generated dozens of tampered images. We applied the proposed hashing to the original images and their tampered versions, calculated the similarity S between each pair of hashes, and found that all S values are smaller than the above-mentioned thresholds. For space limitation, four typical results are exemplified here. Figs.
8(a), (c), (e) and (g) are four original images, and Figs. 8(b), (d), (f) and (h) are their tampered versions, respectively. The hash similarity between each pair of images is given in Table 4. From the results, we observe that all S values are smaller than 0.90. Therefore, the proposed hashing can distinguish the tampered images from the processed images by using the above thresholds 0.95 or 0.98.

Fig. 8. Original images and their tampered versions: (a) original image sized 540×319; (b) tampered image with a vehicle deleted; (c) original image sized 538×388; (d) tampered image with a soldier deleted; (e) original image sized 482×319; (f) tampered image with jungle inserted; (g) original image sized 512×512; (h) tampered image with a flower inserted.

TABLE 4
SIMILARITY S BETWEEN THE ORIGINAL AND TAMPERED IMAGES

Image pair           S
Figs. 8(a) and (b)   0.861
Figs. 8(c) and (d)   0.724
Figs. 8(e) and (f)   0.853
Figs. 8(g) and (h)   0.788

4.4 Effect of number of rings and rank for NMF
The receiver operating characteristics (ROC) graph [46] was exploited to visualize the classification performances with respect to the robustness and the discriminative capability under different parameters. The true positive rate (TPR) PTPR and the false positive rate (FPR) PFPR are calculated as follows:

PTPR = n1 / N1    (14)

PFPR = n2 / N2    (15)

where n1 is the number of pairs of visually identical images considered as similar images, N1 is the total number of pairs of visually identical images, n2 is the number of pairs of different images viewed as similar images, and N2 is the total number of pairs of different images. Obviously, TPR and FPR indicate the perceptual robustness and the discriminative capability, respectively. In the experiments, all images are mapped to 512×512, i.e., m=512. The used ranks for NMF are 2 and 3, i.e., K=2 and K=3. For each rank, four numbers of rings are tested, i.e., n=8, n=16, n=32 and n=64. The test images used in Subsections 4.1 and 4.2 are also adopted here.
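Equations (14) and (15) reduce to simple counting once the pairwise similarities are available. The following generic Python sketch (not the authors' code; the score lists are made-up placeholders) computes both rates for one hypothetical decision threshold T:

```python
def tpr_fpr(similar_scores, different_scores, T):
    """Compute TPR = n1/N1 and FPR = n2/N2 for threshold T.

    similar_scores:   S values of visually identical image pairs
    different_scores: S values of different image pairs
    A pair is judged 'similar' when its S value is >= T."""
    n1 = sum(1 for s in similar_scores if s >= T)    # true positives
    n2 = sum(1 for s in different_scores if s >= T)  # false positives
    return n1 / len(similar_scores), n2 / len(different_scores)

# Made-up score lists for illustration only.
identical = [0.99, 0.98, 0.97, 0.96, 0.94]
different = [0.30, 0.55, 0.80, 0.96]
print(tpr_fpr(identical, different, T=0.95))  # (0.8, 0.25)
```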
For each group of parameter values, robustness validation and discrimination test are both conducted. We choose a threshold and calculate TPR and FPR. This process is repeated for different thresholds, and the ROC curve is then obtained. The used thresholds are 0.940, 0.950, 0.960, 0.970, 0.980, 0.985, 0.988, 0.990, 0.995 and 0.998. Fig. 9 shows the ROC curves under different parameters. It is observed that, for a fixed rank, the whole hashing performance improves when the number of rings increases. In fact, the number of rings is equal to the column number of the secondary image. Fewer columns lead to fewer features in the final hash, which inevitably hurts the discriminative capability. For a fixed number of rings, a bigger rank yields better hashing performances. This is because a bigger rank means more elements in the hash, which not only preserve the perceptual robustness, but also improve the discriminative capability. In fact, hashing performances are related to hash length. Generally, a short image hash has good robustness but poor discrimination. As the hash length increases, the discriminative capability is strengthened while the perceptual robustness slightly decreases. Note that an image hash is a compact representation, meaning that the hash length is expected to be short enough. Thus, a tradeoff is needed in choosing the hash length, i.e., the values of K and n. In the experiments, we find that, for 512×512 normalized images, hash lengths ranging from 40 to 100 can reach a desirable compromise between the robustness and the discrimination, e.g., (1) K=2 and n=32, (2) K=3 and n=16, (3) K=3 and n=32.
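The threshold-sweeping procedure above (one (FPR, TPR) point per threshold) can be sketched as follows. The threshold list is the one stated in the text; the two score lists are hypothetical stand-ins for the real robustness and discrimination data:

```python
def roc_points(similar_scores, different_scores, thresholds):
    """One (FPR, TPR) point per threshold; a pair is declared
    similar when S >= T, so lowering T raises both rates."""
    points = []
    for T in thresholds:
        tpr = sum(s >= T for s in similar_scores) / len(similar_scores)
        fpr = sum(s >= T for s in different_scores) / len(different_scores)
        points.append((fpr, tpr))
    return points

# Thresholds used in the experiments (Section 4.4).
thresholds = [0.940, 0.950, 0.960, 0.970, 0.980,
              0.985, 0.988, 0.990, 0.995, 0.998]
# Hypothetical S values standing in for the real test data.
identical = [0.999, 0.995, 0.990, 0.985, 0.970, 0.960]
different = [0.20, 0.50, 0.70, 0.90, 0.955]
for fpr, tpr in roc_points(identical, different, thresholds):
    print(fpr, tpr)
```

Plotting the resulting points with FPR on the abscissa and TPR on the ordinate yields a curve of the kind shown in Fig. 9.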
TABLE 5
AVERAGE TIME OF HASH GENERATION UNDER DIFFERENT PARAMETERS

K   n    Hash length (decimal digits)   Average time (seconds)
2   8    16                             2.5
2   16   32                             2.6
2   32   64                             2.8
2   64   128                            3.3
3   8    24                             2.7
3   16   48                             2.7
3   32   96                             2.9
3   64   192                            3.4

Fig. 9. ROC curves under different parameters: (a) K=2; (b) K=3.

To compare the average time of hash generation under different parameters, we recorded the total time of calculating the 200 different image hashes in the discrimination tests and then computed the time consumed for generating a single hash. Our algorithm was implemented in the MATLAB language, running on a PC with a 2.61 GHz AMD Athlon 64 X2 dual-core CPU and 1.87 GB RAM. The results are presented in Table 5. We observe from the table that, for a fixed K, the average time slightly increases as n becomes large. Similarly, for a fixed n, the consumed time increases with K.

5 PERFORMANCE COMPARISONS
To show the advantages of the proposed hashing, we compared it with some state-of-the-art algorithms: the RASH method [18], the RT-DFT hashing [21], the SVD-SVD hashing [17], the NMF-NMF-SQ hashing [20], the invariant moments based hashing [32], and the GF-LVQ hashing [34]. To make fair comparisons, the same images were adopted for the robustness validations and discrimination tests. All images were resized to 512×512 before hash generation [17, 20]. The luminance components in YCbCr color space were also taken for representation. The original similarity metrics of the assessed algorithms were adopted, i.e., the peak of cross correlation (PCC) for the RASH method, the modified L1 norm for the RT-DFT hashing, the L2 norm for the SVD-SVD hashing, the NMF-NMF-SQ hashing and the invariant moments based hashing, and the normalized Hamming distance for the GF-LVQ hashing.
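For reference, the two distance metrics named above besides PCC can be sketched generically as follows. These are the standard definitions, not the compared authors' implementations, and the example vectors are made up:

```python
def l2_norm(h1, h2):
    """Euclidean (L2) distance between two real-valued hash vectors,
    as used by the SVD-SVD, NMF-NMF-SQ and invariant moments based
    hashes; smaller values mean more similar images."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def normalized_hamming(b1, b2):
    """Fraction of differing bits between two equal-length binary
    hashes, as used by the GF-LVQ hashing; 0 means identical."""
    return sum(x != y for x, y in zip(b1, b2)) / len(b1)

print(l2_norm([1, 2, 3], [1, 2, 7]))                   # 4.0
print(normalized_hamming([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```

Note that these are distances (smaller is more similar), whereas PCC and the proposed S are similarities (larger is more similar); the decision rules below are oriented accordingly.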
The used parameters for the SVD-SVD hashing are as follows: the first number of overlapping rectangles is 100 with rectangle size 64×64, and the second number of overlapping rectangles is 20 with rectangle size 40×40. For the NMF-NMF-SQ hashing, the used parameters are: the sub-image number is 80, the height and width of the sub-images are 64, the rank of the first NMF is 2, and the rank of the second NMF is 1. For the GF-LVQ hashing, the image size is also normalized to 512×512 and the other parameter values are the same as those used in [34]. For the proposed hashing, the experimental results under the parameters K=2 and n=32 were taken for comparisons. Therefore, the hash lengths of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing and the proposed hashing are 40, 15, 1600, 64, 42 and 64 decimal digits, respectively, and the hash length of the GF-LVQ hashing is 120 bits. We exploited the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing and the GF-LVQ hashing to generate image hashes and calculated their hash similarities using the respective metrics. The results are shown in Fig. 10, where the results of the robustness validation and discrimination test are both presented for viewing classification performances. Fig. 10(a) shows the results of the RASH method. The authors of the RASH method [18] take 0.87 as the threshold: if PCC ≥ 0.87, the images are viewed as visually identical images; otherwise, they are different images.
It is observed that the minimum PCC of visually identical images is 0.649 and the maximum PCC of different images is 0.962. This means that PCC values in [0.649, 0.962] will inevitably be misclassified. Fig. 10(b) presents the results of the RT-DFT hashing, where the minimum modified L1 norm of different images is 0.023, but the maximum distance of visually identical images is 0.144. So the values in [0.023, 0.144] will be misclassified. Fig. 10(c) shows the L2 norms of the SVD-SVD hashing. The minimum value of different images is 0.105 and the maximum value of similar images is 0.806. Thus, the overlapping interval between the L2 norms of similar images and different images is [0.105, 0.806]. This implies that, for those L2 norms in this interval, misclassification cannot be avoided. The results of the NMF-NMF-SQ hashing are given in Fig. 10(d), where the minimum L2 norm of different images is 3184 and the maximum value of visually identical images is 14634. Therefore, distances in the interval [3184, 14634] will be falsely classified. The results of the invariant moments based hashing are illustrated in Fig. 10(e), where the minimum L2 norm of different images is 6.5 and the maximum value of visually identical images is 12.1. Thus, the overlapping interval is [6.5, 12.1]. The results of the GF-LVQ hashing are illustrated in Fig. 10(f), where the minimum normalized Hamming distance of different images is 0 and the maximum distance of visually identical images is 0.5417. Therefore, the overlapping interval is [0, 0.5417]. The results of the proposed hashing are presented in Fig. 10(g). It is found that the minimum S of similar images is 0.957 and the maximum S of different images is 0.972. So the overlapping interval is [0.957, 0.972]. Clearly, none of the above hashing algorithms can find a threshold dividing their values into two subsets without misclassification. It is also observed that the number of image pairs of the proposed hashing in its overlapping interval is smaller than those of the other algorithms in their respective intervals. Therefore, we can intuitively conclude that the proposed hashing outperforms the compared algorithms in classification performances.

Fig. 10. Performance comparisons: similarity distributions of visually identical images and different images for (a) the RASH method, (b) the RT-DFT hashing, (c) the SVD-SVD hashing, (d) the NMF-NMF-SQ hashing, (e) the invariant moments based hashing, (f) the GF-LVQ hashing and (g) the proposed hashing.

To make a theoretical analysis of the above results, ROC graphs were used again. For algorithms with the same FPR, the method with the bigger TPR outperforms the one with the smaller value. Similarly, for algorithms with the same TPR, the hashing with the smaller FPR is better than that with the bigger FPR. We exploited the thresholds of the respective
algorithms given in Table 6 to calculate their TPR and FPR values, and obtained the ROC curves shown in Fig. 11, where (a), (b), (c), (d), (e), (f) and (g) are the results of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing, respectively. From the results, we observe that, when FPR is close to 0, the TPRs of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing are 0.9100, 0.8767, 0.1259, 0.8133, 0.9300, 0.5167 and 0.9833, respectively. When TPR reaches 1, the optimal FPRs of these algorithms are 0.2691, 0.1454, 0.9451, 0.5347, 0.0311, 0.9113 and 0.00116, respectively. It is clear that the proposed hashing outperforms the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing and the GF-LVQ hashing.

TABLE 6
THRESHOLDS OF THE RESPECTIVE ALGORITHMS FOR ROC CURVES

Algorithm                         Thresholds
RASH method                       0.84, 0.85, 0.87, 0.88, 0.90, 0.92, 0.94, 0.96, 0.97, 0.98
RT-DFT hashing                    0.01, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.80
SVD-SVD hashing                   0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90
NMF-NMF-SQ hashing                1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000
Invariant moments based hashing   1, 4, 8, 12, 15, 18, 22, 27, 30, 34
GF-LVQ hashing                    0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.50, 0.55
Proposed hashing                  0.940, 0.950, 0.960, 0.970, 0.980, 0.985, 0.988, 0.990, 0.995, 0.998

Fig. 11. ROC curve comparisons among different hashing algorithms: (a) RASH method; (b) RT-DFT hashing; (c) SVD-SVD hashing; (d) NMF-NMF-SQ hashing; (e) invariant moments based hashing; (f) GF-LVQ hashing; (g) proposed hashing.

We also evaluated the content sensitivities of the compared algorithms. To do so, we applied the compared algorithms to the image pairs in Fig. 8, calculated their similarity values, and obtained the results shown in Table 7. To make fair comparisons, we took the optimum threshold of each algorithm when FPR ≈ 0. In this case, the optimum threshold of the proposed hashing is 0.98. With this threshold, the proposed hashing can correctly identify all tampered images. The optimum thresholds of the compared algorithms are listed in Table 7. It is observed that the RASH method cannot correctly identify the image pairs of Figs. 8(a) and (b), and Figs. 8(g) and (h). Similarly, the RT-DFT hashing cannot correctly identify the image pairs of Figs. 8(a) and (b), and Figs. 8(c) and (d). However, the SVD-SVD hashing can successfully detect all tampered images. For the NMF-NMF-SQ hashing, it only finds one pair of the original and tampered images.
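The overlapping intervals reported in this section follow directly from the two score distributions. A minimal sketch for a similarity metric where larger values mean more similar (the scores are made up, merely echoing the proposed hashing's reported interval):

```python
def overlap_interval(similar_scores, different_scores):
    """Interval where identical and different pairs mix, for a
    similarity metric (larger = more similar): from the minimum
    score of identical pairs up to the maximum score of different
    pairs. Returns None when a separating threshold exists."""
    lo = min(similar_scores)   # worst-scoring identical pair
    hi = max(different_scores) # best-scoring different pair
    return (lo, hi) if lo <= hi else None

# Hypothetical S values echoing the proposed hashing's situation.
identical = [0.957, 0.980, 0.990, 1.000]
different = [0.100, 0.500, 0.900, 0.972]
print(overlap_interval(identical, different))  # (0.957, 0.972)
```

For distance metrics (smaller = more similar) the roles are mirrored: the interval runs from the minimum distance of different pairs up to the maximum distance of identical pairs, as in the [0.023, 0.144] interval of the RT-DFT hashing.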
The invariant moments based hashing is insensitive to content changes and therefore cannot discover any tampered image. The GF-LVQ hashing can successfully detect the image pairs of Figs. 8(a) and (b), and Figs. 8(c) and (d), but fails to identify the image pairs of Figs. 8(e) and (f), and Figs. 8(g) and (h). In Table 7, the values in bold indicate those results which cannot be correctly detected by the corresponding algorithms. Moreover, we compared the run time of the seven algorithms. To this end, we recorded the total time of the respective discrimination tests to find the average time of hash generation. The average times of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing are 9.6, 12.8, 1.5, 2.4, 2.6, 0.67 and 2.8 seconds, respectively. Table 8 summarizes the performance comparisons among the different algorithms. We find that the proposed hashing is better than the compared algorithms in classification performances between perceptual robustness and discrimination. The proposed hashing and the SVD-SVD hashing are both sensitive to visual content changes, showing better performances than the other algorithms. As to hash length and time complexity of hash generation, the proposed hashing has moderate performances, and the best algorithm is the GF-LVQ hashing.

TABLE 7
SIMILARITY RESULTS BETWEEN EACH PAIR OF IMAGES IN FIG. 8 UNDER DIFFERENT ALGORITHMS

Algorithm                         Threshold   (a) and (b)   (c) and (d)   (e) and (f)   (g) and (h)
RASH method                       0.87        0.914         0.767         0.518         0.921
RT-DFT hashing                    0.05        0.047         0.047         0.054         0.053
SVD-SVD hashing                   0.20        0.432         0.343         0.315         0.402
NMF-NMF-SQ hashing                2500        1513          2722          1931          2249
Invariant moments based hashing   8.0         4.407         3.857         3.558         7.548
GF-LVQ hashing                    0.10        0.125         0.508         0             0

TABLE 8
PERFORMANCE COMPARISONS AMONG DIFFERENT ALGORITHMS

Algorithm                         Optimal TPR when FPR=0   Optimal FPR when TPR=1   Sensitivity to content changes   Hash length   Average time (seconds)
RASH method                       0.9100                   0.2691                   Moderate                         40 digits     9.6
RT-DFT hashing                    0.8767                   0.1454                   Moderate                         15 digits     12.8
SVD-SVD hashing                   0.1259                   0.9451                   Sensitive                        1600 digits   1.5
NMF-NMF-SQ hashing                0.8133                   0.5347                   Moderate                         64 digits     2.4
Invariant moments based hashing   0.9300                   0.0311                   Insensitive                      42 digits     2.6
GF-LVQ hashing                    0.5167                   0.9113                   Moderate                         120 bits      0.67
Proposed hashing                  0.9833                   0.00116                  Sensitive                        64 digits     2.8

6 CONCLUSIONS
In this paper, we have proposed a robust image hashing based on ring partition and NMF, which is robust against image rotation and has a desirable discriminative capability. A key component of our hashing is the construction of the secondary image, which is used for the first time in image hashing. The secondary image is rotation-invariant and therefore makes our hashes resistant to rotation. The discriminative capability of the proposed hashing is attributed to the use of NMF, which is good at learning parts-based representations. Experiments have been conducted to validate our hashing performances, and they show that our hashing is robust against content-preserving manipulations, such as image rotation, JPEG compression, watermark embedding, Gaussian low-pass filtering, gamma correction, brightness adjustment, contrast adjustment and image scaling, and reaches good discrimination.
The ROC curve comparisons have indicated that the proposed hashing is better than some well-known hashing algorithms in classification performances between the perceptual robustness and the discriminative capability. Research on image hashing is still under way. Future work will mainly focus on the use of the rotation-invariant secondary image in combination with other techniques. Issues under consideration include the capability of tamper detection in small regions, tamper localization, efficient color feature extraction, and so on.

ACKNOWLEDGMENTS
The authors sincerely thank the anonymous reviewers for their insightful comments and valuable suggestions, which substantially improved the quality of this paper. Many thanks to Dr. Y. Li for his source code. This work is supported in part by the Australian Research Council (ARC) under large grant DP0985456; the China 863 Program under grant 2012AA011005; the China 973 Program under grant 2013CB329404; the Natural Science Foundation of China under grants 61165009, 60963008 and 61170131; Guangxi "Bagui Scholar" Teams for Innovation and Research; the Guangxi Natural Science Foundation under grants 2012GXNSFGA060004, 2012GXNSFBA053166, 2011GXNSFD018026 and 0832104; the Project of the Education Administration of Guangxi under grant 200911MS55; and the Scientific Research and Technological Development Program of Guangxi under grant 10123005–8.

References
[1] M. Slaney and M. Casey, "Locality-sensitive hashing for finding nearest neighbors," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 128–131, 2008.
[2] M. N. Wu, C. C. Lin and C. C. Chang, "Novel image copy detection with rotating tolerance," Journal of Systems and Software, vol. 80, no. 7, pp. 1057–1069, 2007.
[3] S. Wang and X. Zhang, "Recent development of perceptual image hashing," Journal of Shanghai University (English Edition), vol. 11, no. 4, pp. 323–331, 2007.
[4] F. Ahmed, M. Y. Siyal and V. U.
Abbas, "A secure and robust hash-based scheme for image authentication," Signal Processing, vol. 90, no. 5, pp. 1456–1470, 2010.
[5] C. Qin, C. C. Chang and P. Y. Chen, "Self-embedding fragile watermarking with restoration capability based on adaptive bit allocation mechanism," Signal Processing, vol. 92, no. 4, pp. 1137–1150, 2012.
[6] C. S. Lu, C. Y. Hsu, S. W. Sun and P. C. Chang, "Robust mesh-based hashing for copy detection and tracing of images," Proc. of IEEE International Conference on Multimedia and Expo, vol. 1, pp. 731–734, 2004.
[7] Z. Tang, S. Wang, X. Zhang, W. Wei and S. Su, "Robust image hashing for tamper detection using non-negative matrix factorization," Journal of Ubiquitous Convergence and Technology, vol. 2, no. 1, pp. 18–26, 2008.
[8] E. Hassan, S. Chaudhury and M. Gopal, "Feature combination in kernel space for distance based image hashing," IEEE Transactions on Multimedia, vol. 14, no. 4, pp. 1179–1195, 2012.
[9] W. Lu and M. Wu, "Multimedia forensic hash based on visual words," Proc. of IEEE International Conference on Image Processing, pp. 989–992, 2010.
[10] X. Lv and Z. J. Wang, "Reduced-reference image quality assessment based on perceptual image hashing," Proc. of IEEE International Conference on Image Processing, pp. 4361–4364, 2009.
[11] R. Venkatesan, S.-M. Koon, M. H. Jakubowski and P. Moulin, "Robust image hashing," Proc. of IEEE International Conference on Image Processing, pp. 664–666, 2000.
[12] J. Fridrich and M. Goljan, "Robust hash functions for digital watermarking," Proc. of IEEE International Conference on Information Technology: Coding and Computing, pp. 178–183, 2000.
[13] C. Y. Lin and S. F. Chang, "A robust image authentication system distinguishing JPEG compression from malicious manipulation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 2, pp. 153–168, 2001.
[14] D. Wu, X. Zhou and X. Niu, "A novel image hash algorithm resistant to print–scan," Signal Processing, vol. 89, no. 12, pp. 2415–2424, 2009.
[15] Z. Tang, S. Wang, X. Zhang, W. Wei and Y.
Zhao, "Lexicographical framework for image hashing with implementation based on DCT and NMF," Multimedia Tools and Applications, vol. 52, no. 2–3, pp. 325–345, 2011.
[16] F. Lefebvre, B. Macq and J.-D. Legat, "RASH: Radon soft hash algorithm," Proc. of European Signal Processing Conference, pp. 299–302, 2002.
[17] S. S. Kozat, R. Venkatesan and M. K. Mihcak, "Robust perceptual image hashing via matrix invariants," Proc. of IEEE International Conference on Image Processing, pp. 3443–3446, 2004.
[18] C. D. Roover, C. D. Vleeschouwer, F. Lefebvre and B. Macq, "Robust video hashing based on radial projections of key frames," IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 4020–4036, 2005.
[19] A. Swaminathan, Y. Mao and M. Wu, "Robust and secure image hashing," IEEE Transactions on Information Forensics and Security, vol. 1, no. 2, pp. 215–230, 2006.
[20] V. Monga and M. K. Mihcak, "Robust and secure image hashing via non-negative matrix factorizations," IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, pp. 376–390, 2007.
[21] Y. Lei, Y. Wang and J. Huang, "Robust image hash in Radon transform domain for authentication," Signal Processing: Image Communication, vol. 26, no. 6, pp. 280–288, 2011.
[22] USC-SIPI Image Database. Available: http://sipi.usc.edu/database/, February 7, 2007.
[23] Ground Truth Database. Available: http://www.cs.washington.edu/research/imagedatabase/groundtruth/, accessed 8 May 2008.
[24] M. Schneider and S. F. Chang, "A robust content based digital signature for image authentication," Proc. of IEEE International Conference on Image Processing, vol. 3, pp. 227–230, 1996.
[25] V. Monga and B. L. Evans, "Perceptual image hashing via feature points: performance evaluation and tradeoffs," IEEE Transactions on Image Processing, vol. 15, no. 11, pp. 3452–3465, 2006.
[26] V. Monga, A. Banerjee and B. L.
Evans, "A clustering based approach to perceptual image hashing," IEEE Transactions on Information Forensics and Security, vol. 1, no. 1, pp. 68–79, 2006.
[27] J. S. Seo, J. Haitsma, T. Kalker and C. D. Yoo, "A robust image fingerprinting system using the Radon transform," Signal Processing: Image Communication, vol. 19, no. 4, pp. 325–339, 2004.
[28] Y. Mao and M. Wu, "Unicity distance of robust image hashing," IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, pp. 462–467, 2007.
[29] F. Khelifi and J. Jiang, "Perceptual image hashing based on virtual watermark detection," IEEE Transactions on Image Processing, vol. 19, no. 4, pp. 981–994, 2010.
[30] Z. Tang, S. Wang, X. Zhang and W. Wei, "Structural feature-based image hashing and similarity metric for tampering detection," Fundamenta Informaticae, vol. 106, no. 1, pp. 75–91, 2011.
[31] X. Lv and Z. J. Wang, "Perceptual image hashing based on shape contexts and local feature points," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 1081–1093, 2012.
[32] Z. Tang, Y. Dai and X. Zhang, "Perceptual hashing for color images using invariant moments," Applied Mathematics & Information Sciences, vol. 6, no. 2S, pp. 643–650, 2012.
[33] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[34] Y. Li, Z. Lu, C. Zhu and X. Niu, "Robust image hashing based on random Gabor filtering and dithered lattice vector quantization," IEEE Transactions on Image Processing, vol. 21, no. 4, pp. 1963–1980, 2012.
[35] Z. Tang, Y. Dai, X. Zhang and S. Zhang, "Perceptual image hashing with histogram of color vector angles," The 8th International Conference on Active Media Technology (AMT 2012), Lecture Notes in Computer Science, vol. 7669, pp. 237–246, 2012.
[36] Y. Zhao, S. Wang, X. Zhang and H.
Yao, "Robust hashing for image authentication using Zernike moments and local features," IEEE Transactions on Information Forensics and Security, vol. 8, no. 1, pp. 55–63, 2013.
[37] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788–791, 1999.
[38] J. Pan and J. Zhang, "Large margin based nonnegative matrix factorization and partial least squares regression for face recognition," Pattern Recognition Letters, vol. 32, no. 14, pp. 1822–1835, 2011.
[39] I. Buciu and I. Pitas, "NMF, LNMF, and DNMF modeling of neural receptive fields involved in human facial expression perception," Journal of Visual Communication and Image Representation, vol. 17, no. 5, pp. 958–969, 2006.
[40] R. Sandler and M. Lindenbaum, "Nonnegative matrix factorization with earth mover's distance metric for image analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1590–1602, 2011.
[41] S. Xie, Z. Yang and Y. Fu, "Nonnegative matrix factorization applied to nonlinear speech and image cryptosystems," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 55, no. 8, pp. 2356–2367, 2008.
[42] Y. Chen, L. Wang and M. Dong, "Non-negative matrix factorization for semisupervised heterogeneous data coclustering," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1459–1474, 2010.
[43] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems, vol. 13, pp. 556–562, 2000.
[44] I. Kotsia, S. Zafeiriou and I. Pitas, "A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems," IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, pp. 588–595, 2007.
[45] F. A. P. Petitcolas, "Watermarking schemes evaluation," IEEE Signal Processing Magazine, vol. 17, no. 5, pp. 58–64, 2000.
[46] T.
Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006.

Zhenjun Tang received the B.S. and M.Eng. degrees from Guangxi Normal University, Guilin, P.R. China, in 2003 and 2006, respectively, and the PhD degree from Shanghai University, Shanghai, P.R. China, in 2010. He is now an associate professor with the Department of Computer Science, Guangxi Normal University. His research interests include image processing and multimedia security. He has contributed more than 20 international papers and holds two China patents. He is a reviewer of several international journals, such as IEEE Transactions on Information Forensics and Security, Signal Processing, IET Image Processing, Multimedia Tools and Applications and the Imaging Science Journal.

Xianquan Zhang received the M.Eng. degree from Chongqing University, Chongqing, P.R. China. He is currently a professor with the Department of Computer Science, Guangxi Normal University. His research interests include image processing and computer graphics. He has contributed more than 40 papers.

Shichao Zhang received the PhD degree in computer science from Deakin University, Australia. He is currently a China 1000-Plan distinguished professor with the Department of Computer Science, Guangxi Normal University, China. His research interests include information quality and pattern discovery. He has published about 60 international journal papers and more than 60 international conference papers, and has won more than 10 national-level grants. He served/is serving as an associate editor for the IEEE Transactions on Knowledge and Data Engineering, Knowledge and Information Systems, and the IEEE Intelligent Informatics Bulletin. He is a senior member of the IEEE and the IEEE Computer Society and a member of the ACM.