IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, TKDE-2012-09-0622
Robust Perceptual Image Hashing Based on
Ring Partition and NMF
Zhenjun Tang, Xianquan Zhang, and Shichao Zhang, Senior Member, IEEE
Abstract—This paper designs an efficient image hashing with a ring partition and non-negative matrix factorization (NMF), which achieves both rotation robustness and good discriminative capability. The key contribution is a novel construction of a rotation-invariant secondary image, which is used for the first time in image hashing and helps to make the image hash resistant to rotation. In addition, NMF coefficients are approximately linearly changed by content-preserving manipulations, so hash similarity can be measured with the correlation coefficient. We conduct experiments with 346 images to illustrate the efficiency of the approach. Our experiments show that the proposed hashing is robust against content-preserving operations, such as image rotation, JPEG compression, watermark embedding, Gaussian low-pass filtering, gamma correction, brightness adjustment, contrast adjustment and image scaling. Receiver operating characteristics (ROC) curve comparisons with state-of-the-art algorithms are also conducted, and demonstrate that the proposed hashing is much better than all these algorithms in classification performance with respect to robustness and discrimination.
Index Terms—Image hashing, multimedia security, non-negative matrix factorization, ring partition.
1 INTRODUCTION
THE Internet and multimedia devices, such as digital cameras, scanners and smart cellphones, have made it easier and faster for us to capture, store, share and convey hundreds of thousands of images, songs and videos. While we increasingly enjoy our lives with multimedia products, there are still many challenging problems faced by academia and industry. For example, there may be multiple copies of an image on a PC, where the copies can be in different digital representations with the same visual content as the original. It is clearly a real yet challenging issue to efficiently search for all similar versions (including the original and its copies) of an image in large-scale multimedia data [1, 2]. In particular, as powerful multimedia tools make digital operations, such as editing and tampering, much easier than ever before, assurance of multimedia content security has become an important concern for the multimedia community [3]. These real demands lead to an emerging multimedia technology, known as image hashing. We study a new, robust image hashing method in this paper.
Image hashing not only allows us to quickly find image copies in large databases, but also can ensure content
security of digital images. Image hashing maps an input
image to a short string, called image hash, and has been
widely used in image retrieval [1], image authentication
[4], digital watermarking [5], image copy detection [6],
tamper detection [7], image indexing [8], multimedia forensics [9], and reduced-reference image quality assessment [10].
There are some classical cryptographic hash functions, e.g., SHA-1 and MD5, which can convert an input message (document, image, graphic, text, video, etc.) into a fixed-size string. However, they are sensitive to bit-level changes and are therefore not suitable for image hashing. This is because digital images often undergo normal digital processing in real applications, such as JPEG compression and image enhancement, without any change to their visual content. This leads to the first of two basic properties of an image hash function, known as perceptual robustness, i.e., a hash function should be robust against content-preserving operations, such as geometric transforms, format conversion and JPEG compression. In other words, the hashes of an image and its processed versions are expected to be the same or very similar. A hash is expected to change significantly only when the visual content is altered by malicious operations, such as object deletion and object insertion. The other basic property is discriminative capability, i.e., images with different contents should have different hashes. This means that the hash distance between different images should be large enough. Moreover, an image hash function should satisfy additional properties in some applications. For example, the image hash must be dependent on one
or several keys for application in digital forensics.
————————————————
• Z. Tang is with the Department of Computer Science, Guangxi Normal University, Guilin 541004, China. E-mail: tangzj230@163.com.
• X. Zhang is with the Department of Computer Science, Guangxi Normal University, Guilin 541004, China. E-mail: zxq6622@126.com.
• S. Zhang is with the Department of Computer Science, Guangxi Normal University, Guilin 541004, China, and with the Faculty of Information Technology, University of Technology, Sydney, NSW 2007, Australia. E-mail: zhangsc@gxnu.edu.cn. (Corresponding author)
Manuscript received Sep. 4, 2012.
————————————————
Consequently, many image hashing algorithms have been successfully designed in the last decade. From an applied perspective, however, there are still some limitations in image hashing. For example, image rotation is a useful operation: in the post-processing of photographs, it has frequently been exploited to correct photographs with an imperfect ground level. However, most existing algorithms are sensitive to rotation, including [4], [11],
[12], [13], [14] and [15]. This means that they will falsely identify those corrected photographs as different images. Some methods are robust against rotation, but their discriminative capabilities are not good enough, such as [16], [17], [18], [19], [20] and [21]. It is a challenging task to simultaneously satisfy both rotation robustness and good discriminative capability when developing high-performance hashing algorithms. In this paper,
we propose an efficient robust image hashing to meet
both rotation robustness and good discriminative capability. This algorithm is based on a ring partition and a nonnegative matrix factorization (NMF). The key technique is
a novel construction of secondary image achieved by the
ring partition. The secondary image is invariant to image
rotation, which makes our algorithm resistant to rotation. In addition, the use of NMF provides the proposed algorithm with good discriminative capability. This is because NMF is an efficient technique for learning parts-based representations, and the more information about local image content an image hash contains, the more discriminative the hash. We conduct experiments with 346 images to illustrate the efficiency, including 146 images
in the USC-SIPI Image Database [22], and 200 different
color images, where 100 images are taken from the
Ground Truth Database [23], 67 images are downloaded
from the Internet, and 33 images are captured by digital
cameras. Our experiments demonstrate that the proposed
algorithm reaches a desirable tradeoff between the rotation robustness and the discriminative capability.
The rest of this paper is organized as follows. Section 2
briefly recalls the related works on image hashing. Section 3 describes the proposed image hashing. Section 4
and Section 5 present the experimental results and performance comparisons, respectively. Conclusions are finally drawn in Section 6.
2 RELATED WORKS
The earliest work on image hashing was introduced by
Schneider and Chang [24] at the 3rd International Conference on Image Processing (ICIP). At the 7th ICIP, Venkatesan et al. [11] published their famous paper entitled
‘Robust image hashing’. From then on, many researchers
have devoted themselves to developing image hashing
algorithms. In terms of the underlying techniques, the
existing hashing algorithms can be roughly classified into
five categories as follows.
The first class includes hashing algorithms based on
discrete wavelet transform (DWT) [4, 11, 25, 26]. Venkatesan et al. [11] used statistics of wavelet coefficients to construct image hashes. This method is resilient to JPEG
compression, median filtering and rotation within 2°, but
fragile to gamma correction and contrast adjustment.
Monga and Evans [25] exploited the end-stopped wavelet
transform to detect visually significant feature points. To
make a short hash, Monga et al. [26] proposed a heuristic
clustering algorithm with a polynomial time for feature
point compression. Ahmed et al. [4] gave a secure image
hashing scheme for authentication by using DWT and
SHA-1. It can be applied to tamper detection, but is fragile
to some normal digital processing, such as brightness
adjustment and contrast adjustment.
The second class is the use of discrete cosine transform
(DCT) [12, 13]. Fridrich and Goljan [12] found that DCT
coefficients can indicate image content and then proposed
a robust hashing based on this observation for application
in digital watermarking. This hashing is robust against
normal processing, but sensitive to image rotation. Lin
and Chang [13] designed an image authentication system
using robust hashing, which is based on invariant relations between DCT coefficients at the same position in
separate blocks. The hashing method can distinguish
JPEG compression from malicious attacks, but is also fragile to rotation.
The third class advocates the Radon transform (RT) [16,
18, 27]. Lefebvre et al. [16] were the first to exploit RT to
construct robust hashes. Seo et al. [27] used autocorrelation of each projection in the RT domain to design
image hashing. Motivated by RT, Roover et al. [18] designed a scheme called the RASH method by dividing an image into a set of radial projections of image pixels, extracting a RAdial Variance (RAV) vector from these radial projections, and compressing the RAV vector by DCT. The
RASH method is resilient to image rotation and rescaling, but its discriminative capability needs to be improved.
The fourth class develops image hashing techniques
with discrete Fourier transform (DFT) [14, 19, 21].
Swaminathan et al. [19] used the DFT coefficients to produce image hashes. This algorithm is resilient to several
content-preserving modifications, such as moderate geometric transforms and filtering. Wu et al. [14] exploited
RT combining with DWT and DFT to develop image
hashing. Their algorithm is robust against print-scan attack and small angle rotation. Recently, Lei et al. [21] extracted moment features from RT domain and used significant DFT coefficients of the moments to produce
hashes. The RT-DFT hashing is better than the methods
[19, 27] in classification performances between the perceptual robustness and the discriminative capability.
The last class proposes to employ the matrix factorization [7, 17, 20]. Kozat et al. [17] viewed images and attacks
as a sequence of linear operators, and proposed to calculate hashes using singular value decompositions (SVDs).
This method, called SVD-SVD hashing, is robust against
geometric attacks, e.g., rotation, at the cost of significantly
increasing misclassification. Monga and Mihcak [20] pioneered the use of NMF to derive image hashing. They
applied NMF to some sub-images, used the combination
coefficients in the matrix factorization to construct a secondary image, obtained its low-rank matrix approximation by using NMF again, and concatenated the matrix
entries to form an NMF-NMF vector. To make a short
hash, they calculated the inner product between the
NMF-NMF vector and a set of weight vectors which have
i.i.d. Gaussian components of zero mean and unit variance. The NMF-NMF-SQ hashing is resilient to geometric
attacks, but it cannot resist some normal manipulations,
e.g., watermark embedding. Tang et al. [7] observed the
invariant relation existing in the NMF coefficient matrix
and used it to construct robust hashes. This method is
robust against JPEG compression, additive noise and watermark embedding, but also fragile to image rotation.
Besides the above five categories, there are also many
other significant techniques for image hashing. For example, Mao and Wu [28] investigated the unicity distances of
the hashing methods [11, 19] and concluded that a key
reused several dozen times is no longer reliable. Khelifi
and Jiang [29] proposed a robust image hashing algorithm by using the theory of optimum watermark detection. This method is resilient to rotation within 5°. Lu and
Wu [9] designed an image hashing for digital forensics
using visual words. Tang et al. [15] proposed a lexicographical framework for image hashing and gave an implementation based on DCT and NMF. In another work,
Tang et al. [30] used structural features to design image
hashing and proposed a similarity metric for tamper detection. A common weakness of the algorithms [15, 30] is
the fragility to rotation. Recently, Lv and Wang [31] designed a hashing algorithm using scale invariant feature
transform (SIFT) and Harris detector. Tang et al. [32] converted an RGB color image into YCbCr and HSI color
spaces, and extracted invariant moments [33] from each
component to construct hashes. This method is resistant
to image rotation with arbitrary degree, but will falsely identify 0.43% of different images as similar images when 98% of visually identical images are correctly detected. In
another study, Li et al. [34] proposed a robust image
hashing based on random Gabor filtering (GF) and dithered lattice vector quantization (LVQ), where GF is used
to extract image features resilient to rotation and LVQ is
exploited to conduct compression. The GF-LVQ hashing
is also robust against image rotation with arbitrary degree
and has a fast speed. In [35], Tang et al. exploited histogram of color vector angles to generate hashes. This algorithm is robust against rotation with arbitrary degree, but
will mistakenly view 17.05% of different images as identical
images when the ratio of correct detection is 98%. Zhao et
al. [36] used Zernike moments to design image hashing,
which only tolerates rotation within 5°.
Robustness performances of some typical hashing algorithms against common content-preserving operations
are summarized in Table 1, where the word 'Yes' means
that the algorithm is robust against such operation, the
word 'No' implies that the algorithm is sensitive to the
operation, and the word 'Unknown' represents that such
robustness result has not been reported in the literature as
far as we know. From the results, we observe that some
state-of-the-art algorithms are robust against image rotation with arbitrary degree, such as [17], [18], [20], [21], [32]
and [34]. However, we also find that these algorithms are
not good enough in discrimination. Therefore, more efforts on developing high performance algorithms with
rotation robustness and good discrimination are in demand.
TABLE 1
HASHING ALGORITHMS AGAINST COMMON CONTENT-PRESERVING OPERATIONS

| Algorithm | JPEG compression | Watermark embedding | Image scaling | Image rotation | Gaussian low-pass filtering | Gamma correction | Brightness adjustment | Contrast adjustment |
|---|---|---|---|---|---|---|---|---|
| [4]  | Yes | Unknown | No  | No               | Yes | Unknown | No      | No      |
| [7]  | Yes | Yes     | Yes | No               | Yes | Yes     | Yes     | Yes     |
| [11] | Yes | Unknown | Yes | Within 2°        | Yes | No      | Unknown | No      |
| [12] | Yes | Yes     | Yes | No               | Yes | Yes     | Yes     | Yes     |
| [15] | Yes | Yes     | Yes | Within 1°        | Yes | Yes     | Yes     | Yes     |
| [16] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
| [17] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
| [18] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
| [19] | Yes | Unknown | Yes | Within 20°       | No  | Yes     | No      | Unknown |
| [20] | Yes | No      | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
| [21] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
| [25] | Yes | Unknown | Yes | Within 5°        | Yes | Unknown | Unknown | Yes     |
| [30] | Yes | Yes     | Yes | Within 1°        | Yes | Yes     | Yes     | Yes     |
| [32] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
| [34] | Yes | Yes     | Yes | Arbitrary degree | Yes | Yes     | Yes     | Yes     |
3 PROPOSED IMAGE HASHING
Our image hashing consists of three steps, as shown in
Fig. 1. In preprocessing, the input image is converted to a
normalized image with a standard size. In the second
step, the normalized image is partitioned into different
rings, which are then used to construct a secondary image. In the third step, NMF is applied to the secondary
image, and the image hash is finally formed from the NMF coefficients. In the following subsections, we will present NMF, the ring-partition scheme, and an illustration of our image hashing.
Fig. 1. Block diagram of our image hashing: input image → preprocessing → normalized image → ring partition → secondary image → NMF → image hash.

3.1 NMF
NMF is an efficient technique of dimensionality reduction, which has shown better performance than principal components analysis (PCA) and vector quantization (VQ) in learning parts-based representations [37]. In fact, NMF has been successfully used in face recognition [38], image representation [39], image analysis [40], signal separation [41], data clustering [42], and so on. A non-negative matrix V = (Vi,j)M×N is generally viewed as a combination of N vectors of size M×1. The NMF of V yields two non-negative matrix factors, i.e., B = (Bi,j)M×K and C = (Ci,j)K×N, where K is the rank of the NMF and K < min(M, N). The matrices B and C are called the base matrix and the coefficient matrix (or encoding matrix), respectively. They can be used to approximately represent V such that:

V ≈ BC    (1)

In the literature, various algorithms [43, 44] have been proposed to compute an NMF. In this paper, we employ the multiplicative update rules [43] to find B and C as follows:

Bi,k ← Bi,k · [Σj=1..N Ck,j Vi,j/(BC)i,j] / [Σj=1..N Ck,j]
Ck,j ← Ck,j · [Σi=1..M Bi,k Vi,j/(BC)i,j] / [Σi=1..M Bi,k]    (2)
(i = 1, …, M; j = 1, …, N; k = 1, …, K)

The above rules correspond to minimizing the generalized Kullback-Leibler (KL) divergence:

F = Σi=1..M Σj=1..N [Vi,j log(Vi,j/(BC)i,j) − Vi,j + (BC)i,j]    (3)

3.2 Ring partition
In general, rotation manipulation takes the image center as the origin of coordinates. Fig. 2(a) is a central part of Airplane, and Fig. 2(b) is the corresponding part of the Airplane rotated by 30°. Obviously, the visual contents in the corresponding rings of Figs. 2(a) and (b) are unchanged after image rotation. Thus, to make the image hash resilient to rotation, we can divide an image into different rings and use them to form a secondary image invariant to rotation. Fig. 3 is a schematic diagram of secondary image construction, where (a) is a square image divided into 7 rings, labeled r1, …, r7 from the innermost outward, and (b) is the secondary image formed by these rings. The detailed scheme of secondary image construction is as follows.

Fig. 2. Ring partition of an image and its rotated version: (a) central part of Airplane; (b) the same part rotated by 30°.

Fig. 3. Schematic diagram of secondary image construction: (a) ring partition; (b) secondary image.
Let the size of the square image be m×m, let n be the total number of rings, and let Rk be the set of pixel values in the k-th ring (k = 1, 2, …, n). Here, we only use the pixels in the inscribed circle of the square image and divide the inscribed circle into rings of equal area, because each ring is expected to become a column of the secondary image. The ring partition can be done by calculating the circle radii and the distance between each pixel and the image center. As shown in Fig. 3(a), the pixels of each ring are determined by two neighboring radii, except those of the innermost ring. Suppose that rk is the k-th radius (k = 1, 2, …, n), labeled from small to large. Thus, r1 and rn are the radii of the innermost and outermost circles, respectively. Clearly, rn = ⌊m/2⌋ for an m×m image, where ⌊·⌋ means downward rounding. To determine the other radii, the area of the inscribed circle A and the average area of each ring Ā are first calculated as follows:

A = π·rn²    (4)

Ā = A/n    (5)

So r1 can be computed by

r1 = √(Ā/π)    (6)

Thus, the other radii rk (k = 2, 3, …, n−1) can be obtained by the following equation:

rk = √[(Ā + π·rk−1²)/π]    (7)
Let p(x, y) be the value of the pixel in the y-th row and the x-th column of the image (1 ≤ x, y ≤ m). Suppose that (xc, yc) are the coordinates of the image center. Thus, xc = yc = m/2 + 0.5 if m is an even number; otherwise, xc = yc = (m+1)/2. The distance between p(x, y) and the image center (xc, yc) can then be measured by the Euclidean distance as follows:

dx,y = √[(x − xc)² + (y − yc)²]    (8)
Having obtained the circle radii and pixel distances, we
can classify these pixel values into n sets as follows.
R1 = {p(x, y) | dx,y ≤ r1}
(9)
Rk = {p(x, y) | rk−1 < dx,y ≤ rk} (k = 2, 3, …, n)
(10)
Next, we rearrange the elements of Rk (k = 1, 2, …, n) into a vector uk sorted in ascending order. This operation guarantees that uk is unrelated to rotation. As pixel coordinates are discrete, the number of pixels in each set is not always equal to Ā. Since the pixels of each ring are expected to form a column of the secondary image, uk is then mapped to a new vector vk of size Ā×1 by linear interpolation. Thus, the secondary image V is obtained by arranging these new vectors as follows:

V = [v1, v2, v3, …, vn]    (11)

As each vk is unrelated to rotation, V is also invariant to this operation. Besides this rotation-invariance merit, the secondary image has another advantage: it has fewer columns than the original image. Since each column is viewed as a high-dimensional vector and will be compressed by NMF, fewer columns help to produce a compact hash.
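The ring-partition scheme above (Eqs. (4)-(11)) can be sketched in NumPy. This is our illustrative code, not the authors' implementation; the function name and the exact interpolation details are our assumptions:

```python
import numpy as np

def secondary_image(img, n):
    """Build the rotation-invariant secondary image V: partition the
    inscribed circle of an m x m image into n equal-area rings, sort
    each ring's pixels, and resample each ring into a fixed-length column."""
    m = img.shape[0]
    rn = m // 2                          # outermost radius: rn = floor(m/2)
    area = np.pi * rn ** 2               # inscribed-circle area, Eq. (4)
    abar = area / n                      # average ring area, Eq. (5)
    if n == 1:
        radii = [float(rn)]
    else:
        radii = [np.sqrt(abar / np.pi)]              # r1, Eq. (6)
        for _ in range(2, n):
            radii.append(np.sqrt(abar / np.pi + radii[-1] ** 2))  # Eq. (7)
        radii.append(float(rn))
    # Image center in 1-based coordinates, as in the text.
    c = m / 2 + 0.5 if m % 2 == 0 else (m + 1) / 2
    y, x = np.mgrid[1:m + 1, 1:m + 1]
    d = np.sqrt((x - c) ** 2 + (y - c) ** 2)         # Eq. (8)
    k_len = int(round(abar))             # target column length ~ average area
    cols, lower = [], -1.0               # lower = -1 so the center pixel joins ring 1
    for r in radii:
        # Eqs. (9)-(10): collect the ring's pixels; sorting removes
        # any dependence on where pixels sit within the ring.
        ring = np.sort(img[(d > lower) & (d <= r)].astype(float))
        # Linear interpolation to a fixed-length vector v_k.
        v = np.interp(np.linspace(0, ring.size - 1, k_len),
                      np.arange(ring.size), ring)
        cols.append(v)
        lower = r
    return np.column_stack(cols)         # V = [v1, ..., vn], Eq. (11)
```

Because rotation about the image center only permutes pixels within each ring, the sorted ring vectors, and hence V, are unchanged; this can be checked directly by comparing V for an image and its 90°-rotated copy.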
3.3 Illustrating our approach
Below, our image hashing is illustrated step by step with an example.
(1) Preprocessing. The input image is first mapped to a normalized size m×m by bilinear interpolation. This ensures that our hashing is resilient to scaling and that hashes of images of different sizes have the same length. For an RGB color image, we convert it into the YCbCr color space by the following equation:

⎡ Y  ⎤   ⎡  65.481  128.553   24.966 ⎤ ⎡ R ⎤   ⎡  16 ⎤
⎢ Cb ⎥ = ⎢ −37.797  −74.203  112.000 ⎥ ⎢ G ⎥ + ⎢ 128 ⎥    (12)
⎣ Cr ⎦   ⎣ 112.000  −93.786  −18.214 ⎦ ⎣ B ⎦   ⎣ 128 ⎦
where R, G and B represent the red, green and blue components of a pixel, and Y, Cb, and Cr are the luminance,
blue-difference chroma and red-difference chroma, respectively. After color space conversion, we take the luminance component Y for representation.
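Eq. (12) uses the standard 8-bit BT.601 coefficients; it can be written directly as a matrix product. A minimal sketch (assuming, as is usual for these coefficients, that R, G, B are normalized to [0, 1]):

```python
import numpy as np

# Eq. (12): BT.601 conversion matrix and 8-bit offsets.
T601 = np.array([[ 65.481, 128.553,  24.966],
                 [-37.797, -74.203, 112.0  ],
                 [112.0,   -93.786, -18.214]])
OFFSET = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Map RGB values in [0, 1] to 8-bit YCbCr; works on any (..., 3) array."""
    return np.asarray(rgb, float) @ T601.T + OFFSET
```

For example, pure white (1, 1, 1) maps to Y = 235, Cb = Cr = 128, the top of the 8-bit luma range.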
(2) Ring partition. Divide Y into n rings and exploit
them to produce a secondary image V by using the
scheme described in Subsection 3.2. The aim of this
step is to construct a rotation-invariant matrix for dimensionality reduction.
(3) NMF. Apply NMF to V to obtain the coefficient matrix C. Concatenate the matrix entries to obtain a compact image hash. Thus, the hash length is L = nK, where n is the number of rings and K is the rank of the NMF.
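The NMF step can be sketched with the multiplicative update rules of Eq. (2). The code below is an illustrative NumPy implementation under the generalized KL divergence of Eq. (3); the random initialization, iteration count and function names are our assumptions, not the authors' code:

```python
import numpy as np

def nmf_kl(V, K, iters=300, seed=1):
    """NMF via the multiplicative update rules of Eq. (2), which
    minimize the generalized KL divergence of Eq. (3):
    V (M x N) ~ B (M x K) @ C (K x N), all entries non-negative."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    B = rng.random((M, K)) + 0.1          # random positive initialization
    C = rng.random((K, N)) + 0.1
    tiny = 1e-12                          # guards divisions by zero
    for _ in range(iters):
        R = V / (B @ C + tiny)
        B *= (R @ C.T) / (C.sum(axis=1) + tiny)            # Eq. (2), B update
        R = V / (B @ C + tiny)
        C *= (B.T @ R) / (B.sum(axis=0)[:, None] + tiny)   # Eq. (2), C update
    return B, C

def hash_from_secondary(V, K=2):
    """Step (3): concatenate the entries of the coefficient matrix C,
    giving a hash of length L = nK (n = number of columns of V)."""
    _, C = nmf_kl(V, K)
    return C.flatten()
```

The multiplicative form guarantees that B and C stay non-negative throughout the iterations, which is why no projection step is needed.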
To measure the similarity between two image hashes, we take the correlation coefficient as the metric. Let h(1) = [h1(1), h2(1), …, hL(1)] and h(2) = [h1(2), h2(2), …, hL(2)] be two image hashes. The correlation coefficient is defined as

S = Σi=1..L [hi(1) − μ1][hi(2) − μ2] / {√(Σi=1..L [hi(1) − μ1]² · Σi=1..L [hi(2) − μ2]²) + ε}    (13)

where ε is a small constant to avoid singularity when Σi=1..L [hi(1) − μ1]² = 0 or Σi=1..L [hi(2) − μ2]² = 0, and μ1 and μ2 are the means of h(1) and h(2), respectively. The range of S is [−1,
1]. The more similar the input images, the bigger the S
value. If S is bigger than a pre-defined threshold T, the
images are viewed as visually identical. Otherwise, they are different images or one is a tampered version of the other. The reason for choosing the correlation coefficient as the metric is the observation that NMF coefficients are approximately linearly changed by content-preserving manipulations. This will be empirically validated in Subsection 4.1.
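Eq. (13) is a standard correlation coefficient stabilized by ε. A minimal sketch (function names are ours):

```python
import numpy as np

def hash_similarity(h1, h2, eps=1e-10):
    """Correlation coefficient S of Eq. (13); range [-1, 1]."""
    d1 = np.asarray(h1, float) - np.mean(h1)
    d2 = np.asarray(h2, float) - np.mean(h2)
    # eps avoids a zero denominator when either hash has zero variance.
    return float(np.sum(d1 * d2) /
                 (np.sqrt(np.sum(d1 ** 2) * np.sum(d2 ** 2)) + eps))

def visually_identical(h1, h2, T=0.98):
    """Classify a hash pair using threshold T (the paper uses 0.98 or 0.95)."""
    return hash_similarity(h1, h2) > T
```

Two hashes related by an approximately linear map, as content-preserving manipulations produce, score near 1 regardless of the slope or offset of that map, which is exactly why this metric suits NMF coefficients.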
We now present two examples of our image hashing
algorithm. Fig. 4 (a) is an image with imperfect ground
level. Fig. 4(b) is a similar version of 4(a), which underwent a sequence of digital processing: rotation by 4°, JPEG compression with quality factor 50, brightness adjustment with Photoshop's scale 50, and Gaussian white noise of mean 0 and variance 0.01. Figs. 4(a) and (b) are both of size 760×560. We employed our image hashing with parameters m=512, n=32 and K=2 to generate the hashes of Figs. 4(a) and (b). The results are illustrated in Table 2. We calculated the similarity between the hashes of Figs. 4(a) and (b) and found that S=0.9957, indicating that Figs. 4(a) and (b) are a pair of visually identical images.
Fig. 4. A pair of visually identical images: (a) image with imperfect ground level; (b) a similar version of (a).
4 EXPERIMENTAL RESULTS
In the experiments, the input image is resized to 512×512, the number of rings is 32, and the rank of the NMF is 2, i.e., m=512, n=32, and K=2. Thus, the hash length is L=nK=64 decimal digits. To validate the performance of our hashing, experiments on perceptual robustness and discriminative capability are conducted in the next two
subsections, respectively. The third subsection evaluates sensitivity to visual content changes. The effect of different parameters on hash performance, i.e., the number of rings and the rank of the NMF, is discussed in the final subsection.
TABLE 2
HASH VALUES OF THE IMAGES ILLUSTRATED IN FIGURE 4

| Image | Image hash |
|---|---|
| Fig. 4(a) | 114, 165, 134, 99, 89, 90, 88, 86, 83, 85, 83, 87, 88, 90, 91, 94, 95, 97, 104, 118, 120, 136, 141, 145, 148, 151, 154, 156, 157, 158, 158, 157, 71, 71, 129, 183, 202, 206, 206, 206, 210, 200, 198, 184, 175, 167, 166, 162, 162, 160, 146, 128, 121, 104, 98, 92, 88, 82, 77, 72, 67, 65, 64, 64 |
| Fig. 4(b) | 140, 192, 158, 124, 112, 109, 106, 106, 102, 107, 97, 111, 115, 116, 119, 123, 120, 124, 130, 144, 145, 157, 157, 165, 169, 170, 172, 173, 178, 179, 179, 179, 86, 89, 152, 208, 231, 238, 242, 239, 241, 230, 234, 212, 199, 193, 191, 187, 190, 185, 169, 150, 148, 132, 130, 118, 111, 108, 104, 98, 90, 88, 87, 88 |
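As a quick check, the similarity reported for this pair can be recomputed from the hash values in Table 2. The sketch below uses NumPy's `corrcoef`, i.e., Eq. (13) with ε = 0 (the hash values are transcribed from the table):

```python
import numpy as np

# Hash values transcribed from Table 2.
h_a = np.array([114,165,134, 99, 89, 90, 88, 86, 83, 85, 83, 87, 88, 90, 91, 94,
                 95, 97,104,118,120,136,141,145,148,151,154,156,157,158,158,157,
                 71, 71,129,183,202,206,206,206,210,200,198,184,175,167,166,162,
                162,160,146,128,121,104, 98, 92, 88, 82, 77, 72, 67, 65, 64, 64], float)
h_b = np.array([140,192,158,124,112,109,106,106,102,107, 97,111,115,116,119,123,
                120,124,130,144,145,157,157,165,169,170,172,173,178,179,179,179,
                 86, 89,152,208,231,238,242,239,241,230,234,212,199,193,191,187,
                190,185,169,150,148,132,130,118,111,108,104, 98, 90, 88, 87, 88], float)

# Pearson correlation = Eq. (13) with eps = 0; should be close to the
# reported S = 0.9957.
S = np.corrcoef(h_a, h_b)[0, 1]
```

The second hash sits roughly a constant offset above the first, which is why the correlation stays near 1 despite the brightness change.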
4.1 Perceptual robustness
All images in the USC-SIPI Image Database [22] were taken as test images, except those in the fourth volume, i.e., 'Sequences', which are often used for sequence image retrieval. This means that a total of 146 test images were selected for our experiments. We attacked these images with normal content-preserving operations under different parameters, and calculated the similarities between the hashes of the original images and their attacked versions. The experiments demonstrated that the proposed hashing is robust against the test operations except in a few cases. Due to space limitations, only typical examples are presented in this paper, where Airplane, Baboon, House, Lena and Peppers are the test images, each of size 512×512. We exploited the StirMark
benchmark 4.0 [45] to attack them, where JPEG compression, watermark embedding, scaling and rotation are all
used. The parameters are set as follows: quality factors for
JPEG compression are 30, 40, …, 100; strengths of watermark embedding are 10, 20, …, 100; scaling ratios are 0.5,
0.75, 0.9, 1.1, 1.5 and 2.0; and rotation angles are 1º, 2º, 5º,
10º, 15º, 30º, 45º, 90º, −1º, −2º, −5º, −10º, −15º, −30º, −45º
and −90°. Since the rotation operation expands the size of the attacked image, only the central parts of size 361×361 are taken from the original and attacked images for hash generation. Brightness adjustment and contrast adjustment with Photoshop are also applied to the test images; their parameters are both 10, 20, −10, and −20.
Moreover, gamma correction and 3×3 Gaussian low-pass filtering in MATLAB are also used to produce processed versions. The γ values for gamma correction are 0.75, 0.9,
1.1, 1.25, and the standard deviations for Gaussian blur
are 0.3, 0.4, ..., 1.0. The settings of our experiments are
summarized in Table 3. Therefore, each test image has 60
attacked versions.
We extracted image hashes of the test images and their
attacked versions, and calculated their correlation coefficients. The results are presented in Fig. 5, where the abscissa is the parameter values of each operation and the
ordinate represents the correlation coefficients. It is observed that all correlation coefficients in Figs. 5 (a), (b), (c)
and (e) are bigger than 0.98. In Figs. 5 (d), (f) and (g), each
has one value below 0.98. They are 0.9653, 0.9790 and
0.9763, respectively. In Fig. 5 (h), two values, i.e., 0.9797
and 0.9573, are smaller than 0.98 when the Photoshop’s
scales are −20 and 20. Therefore, we can choose T=0.98 as
a threshold to resist most of the above content-preserving
operations. In this case, 5/300 × 100% = 1.67% of the pairs of visually identical images are falsely considered as different
images. As the threshold decreases to 0.95, no pair of visually identical images is viewed as distinct images.
Moreover, we validated our hashing performances
under the combined attacks between rotation and other
content-preserving operations. Here, we took Airplane as
test image, and further attacked the Airplane rotated by 30° with JPEG compression (quality factor 30), 3×3 Gaussian low-pass filtering (unit standard deviation), gamma correction (γ = 1.1), brightness adjustment
with the Photoshop’s scale 20, and contrast adjustment
with the Photoshop’s scale 20, respectively. The S values
of the five combined attacks are 0.98516, 0.99020, 0.98699,
0.99133, and 0.98801, respectively, which are all bigger
than the above mentioned thresholds.
Note that the correlation coefficient S is a useful metric for measuring the linearity between two vectors. Its range is [−1, 1],
where S=1 means that the hash vectors are completely
linearly dependent. The above S values are all bigger than
0.95. Those S values close to 1 indicate strong linear dependences, and therefore empirically verify that NMF
coefficients are approximately linearly changed by content-preserving manipulations.
Fig. 5. Robustness test based on five standard test images (Airplane, Baboon, House, Lena, Peppers): (a) JPEG compression; (b) watermark embedding; (c) scaling; (d) rotation; (e) Gaussian low-pass filtering; (f) gamma correction; (g) brightness adjustment; (h) contrast adjustment. In each panel, the abscissa is the operation parameter and the ordinate is the correlation coefficient S.
TABLE 3
THE USED CONTENT-PRESERVING OPERATIONS AND THEIR PARAMETER VALUES

| Tool | Manipulation | Parameter | Parameter values |
|---|---|---|---|
| StirMark  | JPEG compression                | Quality factor            | 30, 40, 50, 60, 70, 80, 90, 100 |
| StirMark  | Watermark embedding             | Strength                  | 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 |
| StirMark  | Scaling                         | Ratio                     | 0.5, 0.75, 0.9, 1.1, 1.5, 2.0 |
| StirMark  | Rotation and cropping           | Rotation angle in degrees | ±1, ±2, ±5, ±10, ±15, ±30, ±45, ±90 |
| Photoshop | Brightness adjustment           | Photoshop's scale         | ±10, ±20 |
| Photoshop | Contrast adjustment             | Photoshop's scale         | ±10, ±20 |
| MATLAB    | Gamma correction                | γ                         | 0.75, 0.9, 1.1, 1.25 |
| MATLAB    | 3×3 Gaussian low-pass filtering | Standard deviation        | 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 |
4.2 Discriminative capability
To validate the discrimination, 200 different color images were collected to form an image database, where 100 images were taken from the Ground Truth Database [23], 67 images were downloaded from the Internet, and 33 images were captured by digital cameras. The image contents include buildings, human beings, sport, landscape, etc. The image sizes range from 256×256 to 2048×1536. Fig. 6 presents typical examples from the image database. We calculated the image hashes of these 200 images, computed the correlation coefficient S between each pair of different images, and then obtained 19,900 values. The distribution of these results is presented in Fig. 7, where the abscissa is the correlation coefficient S and the ordinate represents the frequency of S. In Fig. 7, the minimum and maximum S values are −0.6836 and 0.9717, and the mean and standard deviation of all S values are 0.3910 and 0.3014, respectively. If T=0.95 is selected as the threshold, 0.12% of the pairs of different images are falsely considered as visually similar images. As the threshold reaches 0.98, no false judgment occurs.

Fig. 6. Typical images in the test database: (a) images taken from the Ground Truth Database; (b) images downloaded from the Internet; (c) images captured by digital cameras.
Fig. 7. Discrimination test based on 200 different images.
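The count of 19900 values follows directly from the number of distinct pairs among 200 images; a quick check:

```python
from itertools import combinations

# For N images there are N*(N-1)/2 distinct image pairs, each
# contributing one correlation coefficient S to the distribution.
N = 200
pairs = list(combinations(range(N), 2))
print(len(pairs))  # -> 19900, the number of S values in the test
```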
4.3 Sensitivity to visual content changes
When some operations, such as object deletion and object
insertion, are applied to an image, the meaning of the
attacked image is different from that of its original image.
In other words, the original image and the attacked images are perceptually different. Generally, these operations
are viewed as image tampering and are expected to be
significantly reflected in image hash [7, 9, 13, 19, 21, 30].
This means that image hashing should be sensitive to image tampering. To validate sensitivity to visual content
changes, we simulated image tampering by deleting objects or inserting objects using Photoshop, and generated
dozens of tampered images. We applied the proposed hashing to the original images and their tampered images, calculated the similarity S between each pair of hashes, and found that all S values are smaller than the above-mentioned thresholds. Owing to space limitations, four typical results are exemplified here. Figs. 8(a), (c), (e) and (g) are
four original images, and Figs. 8(b), (d), (f) and (h) are
their tampered versions, respectively. Hash similarity
between each pair of images is given in Table 4. From the
results, we observed that all S values are smaller than
0.90. Therefore, the proposed hashing can distinguish
tampered images from the processed images by using the
above threshold 0.95 or 0.98.
(a) Original image sized 540×319
(b) Tampered image: a vehicle deleted
(c) Original image sized 538×388
(d) Tampered image: a soldier deleted
(e) Original image sized 482×319
(f) Tampered image: jungle inserted
(g) Original image sized 512×512
(h) Tampered image: a flower inserted
Fig. 8. Original images and their tampered versions.
TABLE 4
SIMILARITY S BETWEEN THE ORIGINAL AND TAMPERED IMAGES

Image pair          | S
Figs. 8(a) and (b)  | 0.861
Figs. 8(c) and (d)  | 0.724
Figs. 8(e) and (f)  | 0.853
Figs. 8(g) and (h)  | 0.788
4.4 Effect of number of rings and rank for NMF
Receiver operating characteristics (ROC) graph [46] was
exploited to visualize classification performances with
respect to the robustness and the discriminative capability
under different parameters. Thus, both the true positive rate (TPR) P_TPR and the false positive rate (FPR) P_FPR can be calculated as follows:

P_TPR = n1 / N1    (14)

P_FPR = n2 / N2    (15)
where n1 is the number of the pairs of visually identical
images considered as similar images, N1 is the total pairs
of visually identical images, n2 is the number of the pairs
of different images viewed as similar images, and N2 is
the total pairs of different images. Obviously, TPR and
FPR indicate the perceptual robustness and the discriminative capability, respectively.
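Eqs. (14) and (15) amount to simple counting under a chosen threshold T; a minimal Python sketch (the function and variable names are ours):

```python
def classification_rates(identical_S, different_S, T):
    """TPR of Eq. (14) and FPR of Eq. (15).

    identical_S: S values for pairs of visually identical images (N1 pairs)
    different_S: S values for pairs of different images (N2 pairs)
    A pair is judged 'similar' when its correlation coefficient S >= T.
    """
    n1 = sum(1 for s in identical_S if s >= T)   # correctly accepted pairs
    n2 = sum(1 for s in different_S if s >= T)   # falsely accepted pairs
    tpr = n1 / len(identical_S)                  # Eq. (14): n1 / N1
    fpr = n2 / len(different_S)                  # Eq. (15): n2 / N2
    return tpr, fpr

# Toy values: three identical-image pairs, three different-image pairs
tpr, fpr = classification_rates([0.99, 0.97, 0.96], [0.30, 0.96, -0.20], T=0.95)
print(tpr, fpr)  # -> 1.0 and 1/3: one different pair is falsely accepted
```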
In the experiments, all images are mapped to 512×512,
i.e., m=512. The used ranks for NMF are 2 and 3, i.e., K=2
and K=3. For each rank, four numbers of rings are tested,
i.e., n=8, n=16, n=32 and n=64. The test images used in the
Subsections 4.1 and 4.2 are also adopted here. For each
group of parameter values, robustness validation and
discrimination test are both calculated. We choose a
threshold and calculate TPR and FPR. This process is repeated for different thresholds and the ROC curve is then
obtained. The used thresholds are 0.940, 0.950, 0.960,
0.970, 0.980, 0.985, 0.988, 0.990, 0.995 and 0.998. Fig. 9 shows the ROC curves under different parameters. It is observed that, for a fixed rank, the overall hashing performance improves when the number of rings increases. In
fact, the number of rings is equal to column number of
the secondary image. Fewer columns will lead to fewer
features in the final hash, which will inevitably hurt the
discriminative capability. For a fixed number of rings, a bigger rank yields better hashing performance. This is because a bigger rank means more elements in the
hash, which not only preserve the perceptual robustness,
but also improve the discriminative capability. In fact,
hashing performances are related to hash length. Generally, a short image hash will have good robustness, but
makes poor discrimination. As hash length increases, discriminative capability is strengthened while perceptual
robustness slightly decreases. Note that image hash is a
compact representation, meaning that hash length is expected to be short enough. Thus, a tradeoff is needed in
choosing the hash length, i.e., the values of K and n. In the
experiments, we find that, for 512×512 normalized images, hash lengths ranging from 40 to 100 can reach a desirable compromise between the robustness and the discrimination, e.g., (1) K=2 and n=32, (2) K=3 and n=16, (3) K=3
and n=32.
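The threshold sweep that produces each ROC curve can be expressed in a few lines; a minimal Python sketch with toy similarity values (real curves use the robustness and discrimination results of Subsections 4.1 and 4.2):

```python
def rates(identical_S, different_S, T):
    # TPR (Eq. 14) and FPR (Eq. 15) at threshold T
    tpr = sum(s >= T for s in identical_S) / len(identical_S)
    fpr = sum(s >= T for s in different_S) / len(different_S)
    return tpr, fpr

# The thresholds used in the experiments
thresholds = [0.940, 0.950, 0.960, 0.970, 0.980,
              0.985, 0.988, 0.990, 0.995, 0.998]

# Toy similarity values standing in for the experimental S distributions
identical_S = [0.999, 0.992, 0.985, 0.970, 0.955]
different_S = [0.20, 0.55, 0.80, 0.93, 0.97]

roc = [rates(identical_S, different_S, T) for T in thresholds]
# Raising T lowers both TPR and FPR, tracing the ROC curve from the
# upper right toward the origin.
assert all(t1 >= t2 for (t1, _), (t2, _) in zip(roc, roc[1:]))
```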
TABLE 5
AVERAGE TIME OF HASH GENERATION UNDER DIFFERENT PARAMETERS

K | n  | Hash length (decimal digits) | Average time (seconds)
2 | 8  | 16  | 2.5
2 | 16 | 32  | 2.6
2 | 32 | 64  | 2.8
2 | 64 | 128 | 3.3
3 | 8  | 24  | 2.7
3 | 16 | 48  | 2.7
3 | 32 | 96  | 2.9
3 | 64 | 192 | 3.4
Fig. 9. ROC curves under different parameters: (a) K=2; (b) K=3. Each panel plots true positive rate against false positive rate for n=8, n=16, n=32 and n=64.
To compare the average time of hash generation under different parameters, we recorded the total time of calculating the 200 different image hashes in the discrimination tests and
then computed the average time for generating a single hash.
Our algorithm was implemented in MATLAB language,
running on a PC with 2.61 GHz AMD Athlon 64 X2 Dual
CPU and 1.87 GB RAM. The results are presented in Table 5. We observe from the table that, for a fixed K, average time slightly increases when the n value becomes
large. Similarly, for a fixed n, as the K value increases,
consumed time will also increase.
5 PERFORMANCE COMPARISONS
To show advantages of the proposed hashing, we compared it with some state-of-the-art algorithms: RASH
method [18], RT-DFT hashing [21], SVD-SVD hashing
[17], NMF-NMF-SQ hashing [20], invariant moments
based hashing [32], and GF-LVQ hashing [34]. To make
fair comparisons, the same images were adopted for robustness validations and discrimination tests. All images
were resized to 512×512 before hash generation [17, 20].
The luminance components in YCbCr color space were
also taken for representation. The original similarity metrics of the assessed algorithms were also adopted, i.e., the
peak of cross correlation (PCC) for the RASH method, the
modified L1 norm for the RT-DFT hashing, the L2 norm
for the SVD-SVD hashing, the NMF-NMF-SQ hashing
and the invariant moments based hashing, and the normalized Hamming distance for the GF-LVQ hashing. The
used parameters for the SVD-SVD hashing are as follows: the first number of overlapping rectangles is 100 with rectangle size 64×64, and the second number of overlapping rectangles is 20 with rectangle size 40×40. For the NMF-NMF-SQ hashing, the used parameters are: the sub-image number is 80, the height and width of sub-images are 64, the rank of the first NMF is 2 and the rank of the second NMF is 1. For the GF-LVQ hashing, the image size is also normalized to 512×512 and the other parameter values are the same as those used in [34]. For the proposed hashing, experimental results under the parameters K=2 and n=32 were
taken for comparisons. Therefore, the hash lengths of the
RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments
based hashing and the proposed hashing are 40, 15, 1600,
64, 42 and 64 decimal digits, respectively. And the hash
length of the GF-LVQ hashing is 120 bits.
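For reference, the L2 norm and the normalized Hamming distance among the metrics listed above can be sketched as follows (a minimal Python illustration with our own function names; the exact normalizations in the original papers may differ):

```python
import math

def l2_norm_distance(h1, h2):
    # Euclidean (L2) distance between real-valued hash vectors, as used
    # by the SVD-SVD, NMF-NMF-SQ and invariant-moments hashes.
    # It is a distance, so smaller values mean more similar images.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def normalized_hamming_distance(b1, b2):
    # Fraction of differing bits between binary hashes, as used by the
    # GF-LVQ hash; ranges from 0 (identical) to 1 (all bits differ).
    return sum(x != y for x, y in zip(b1, b2)) / len(b1)

print(l2_norm_distance([1.0, 2.0], [4.0, 6.0]))                 # -> 5.0
print(normalized_hamming_distance([0, 1, 1, 0], [0, 1, 0, 1]))  # -> 0.5
```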
We exploited the RASH method, the RT-DFT hashing,
the SVD-SVD hashing, the NMF-NMF-SQ hashing, the
invariant moments based hashing and the GF-LVQ hashing to generate image hashes and calculated their hash
similarities by using the respective metrics. The results
are shown in Fig. 10, where the results of robustness validation and discrimination test are both presented for
viewing classification performances. Fig. 10 (a) shows the results of the RASH method. The authors of the RASH
method [18] take 0.87 as the threshold. If PCC ≥ 0.87, the
images are viewed as visually identical images. Otherwise, they are different images. It is observed that the
minimum PCC of visually identical images is 0.649 and
the maximum PCC of different images is 0.962. This
means that PCC values in [0.649, 0.962] will inevitably be misclassified. Fig. 10 (b) presents the results of the RT-DFT
hashing, where the minimum modified L1 norm of different images is 0.023, but the maximum distance of visually
identical images is 0.144. So the values in [0.023, 0.144]
will be misclassified. Fig. 10 (c) shows the L2 norms of the
SVD-SVD hashing. The minimum value of different images is 0.105 and the maximum value of similar images is
0.806. Thus, the overlapping interval between the L2
norms of similar images and different images is [0.105,
0.806]. It implies that, for those L2 norms in this interval,
misclassification cannot be avoided. The results of the
NMF-NMF-SQ hashing are given in Fig. 10 (d), where the
minimum L2 norm of different images is 3184 and the
maximum value of visually identical images is 14634.
Therefore, distances in the interval [3184, 14634] will be
falsely classified. The results of the invariant moments
based hashing are illustrated in Fig. 10 (e), where the minimum L2 norm of different images is 6.5 and the maximum value of visually identical images is 12.1. Thus, the
overlapping interval is [6.5, 12.1]. The results of the GFLVQ hashing are illustrated in Fig. 10 (f), where the minimum normalized Hamming distance of different images
is 0 and the maximum distance of visually identical images is 0.5417. Therefore, the overlapping interval is [0,
0.5417]. The results of the proposed hashing are presented
in Fig. 10 (g). It is found that the minimum S of similar
images is 0.957 and the maximum S of different images is
0.972. So the overlapping interval is [0.957, 0.972]. Clearly,
all the above hashing algorithms cannot find a threshold
which can divide their values into two subsets without
misclassification. It is also observed that the image pair
number of the proposed hashing in the overlapping interval is smaller than those of other algorithms in their
respective intervals. Therefore, we can intuitively conclude that the proposed hashing outperforms the compared algorithms in classification performances.
Fig. 10. Performance comparisons: (a) RASH method; (b) RT-DFT hashing; (c) SVD-SVD hashing; (d) NMF-NMF-SQ hashing; (e) invariant moments based hashing; (f) GF-LVQ hashing; (g) proposed hashing. Each panel shows the frequency distributions of the similarity/distance values for visually identical images and for different images.
To make a theoretical analysis of the above results, ROC graphs were used again. For algorithms with the same FPR, the method with the larger TPR outperforms the one with the smaller TPR. Similarly, for algorithms with the same TPR, the hashing with the smaller FPR is better than that with the larger FPR. We exploited the thresholds of the respective
algorithms given in Table 6 to calculate their TPR and FPR values, and obtained the ROC curves shown in Fig. 11, where (a), (b), (c), (d), (e), (f) and (g) are the results of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing, respectively. From the results, we observe that, when FPR is close to 0, the TPRs of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing are 0.9100, 0.8767, 0.1259, 0.8133, 0.9300, 0.5167 and 0.9833, respectively. When TPR reaches 1, the optimal FPRs of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing are 0.2691, 0.1454, 0.9451, 0.5347, 0.0311, 0.9113 and 0.00116, respectively. It is clear that the proposed hashing outperforms the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing and the GF-LVQ hashing.

TABLE 6
THRESHOLDS OF THE RESPECTIVE ALGORITHMS FOR ROC CURVES

Algorithm                       | Thresholds
RASH method                     | 0.84, 0.85, 0.87, 0.88, 0.90, 0.92, 0.94, 0.96, 0.97, 0.98
RT-DFT hashing                  | 0.01, 0.05, 0.08, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.80
SVD-SVD hashing                 | 0.10, 0.20, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90
NMF-NMF-SQ hashing              | 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000
Invariant moments based hashing | 1, 4, 8, 12, 15, 18, 22, 27, 30, 34
GF-LVQ hashing                  | 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.50, 0.55
Proposed hashing                | 0.940, 0.950, 0.960, 0.970, 0.980, 0.985, 0.988, 0.990, 0.995, 0.998
Fig. 11. ROC curve comparisons among different hashing algorithms.
We also evaluated the content sensitivities of the compared algorithms. To do so, we applied the compared algorithms to the image pairs in Fig. 8, calculated their similarity values, and obtained the results shown in Table 7. We took the optimum threshold of each algorithm when FPR ≈ 0 to make fair comparisons. In this case, the optimum threshold of the proposed hashing is 0.98. With this threshold, the proposed hashing can correctly identify all tampered images. The optimum thresholds of the
compared algorithms are listed in Table 7. It is observed
that, the RASH method cannot correctly identify the image pairs of Fig. 8 (a) and Fig. 8 (b), and Fig. 8 (g) and Fig.
8 (h). Similarly, the RT-DFT hashing cannot correctly
identify the image pairs of Fig. 8 (a) and Fig. 8 (b), and
Fig. 8 (c) and Fig. 8 (d). However, the SVD-SVD hashing
can successfully detect all tampered images. For the
NMF-NMF-SQ hashing, it only finds a pair of the original
and tampered images. The invariant moments based
hashing is insensitive to content changes and therefore
cannot discover any tampered image. For the GF-LVQ
hashing, it can successfully detect the image pairs of Fig.
8 (a) and Fig. 8 (b), and Fig. 8 (c) and Fig. 8 (d), but fails to
identify the image pairs of Fig. 8 (e) and Fig. 8 (f), and Fig.
8 (g) and Fig. 8 (h). In Table 7, the texts in bold indicate those results which cannot be correctly detected by the corresponding algorithms.
Moreover, we compared the run time of the seven algorithms. To this end, we recorded the total time of the respective discrimination tests to find the average time of hash generation. The average times of the RASH method, the RT-DFT hashing, the SVD-SVD hashing, the NMF-NMF-SQ hashing, the invariant moments based hashing, the GF-LVQ hashing and the proposed hashing are 9.6, 12.8, 1.5, 2.4, 2.6, 0.67 and 2.8 seconds, respectively. Table 8 summarizes the performance comparisons among different algorithms. We find that the proposed hashing is better than the compared algorithms in classification performance with respect to perceptual robustness and discrimination. The proposed hashing and the SVD-SVD hashing
are both sensitive to visual content changes, showing better performances than other algorithms. As to hash length
and time complexity of hash generation, the proposed
hashing has moderate performances, and the best algorithm is the GF-LVQ hashing.
TABLE 7
SIMILARITY RESULTS BETWEEN EACH PAIR OF IMAGES IN FIG. 8 UNDER DIFFERENT ALGORITHMS

Algorithm                       | Threshold | (a) and (b) | (c) and (d) | (e) and (f) | (g) and (h)
RASH method                     | 0.87      | 0.914       | 0.767       | 0.518       | 0.921
RT-DFT hashing                  | 0.05      | 0.047       | 0.047       | 0.054       | 0.053
SVD-SVD hashing                 | 0.20      | 0.432       | 0.343       | 0.315       | 0.402
NMF-NMF-SQ hashing              | 2500      | 1513        | 2722        | 1931        | 2249
Invariant moments based hashing | 8.0       | 4.407       | 3.857       | 3.558       | 7.548
GF-LVQ hashing                  | 0.10      | 0.125       | 0.508       | 0           | 0
6 CONCLUSIONS
In this paper, we have proposed a robust image hashing
based on ring partition and NMF, which is robust against
image rotation and has a desirable discriminative capability. A key component of our hashing is the construction of the secondary image, which is used for the first time in image
hashing. The secondary image is rotation-invariant and
therefore makes our hashes resistant to rotation. Discriminative capability of the proposed hashing is attributed to
the use of NMF, which is good at learning parts-based
representation. Experiments have been conducted to validate our hashing performances, and showed that our
hashing is robust against content-preserving manipulations, such as image rotation, JPEG compression, watermark embedding, Gaussian low-pass filtering, gamma correction, brightness adjustment, contrast adjustment and image scaling, and reaches good discrimination. The ROC curve comparisons have indicated that the proposed hashing is better than some well-known hashing algorithms in classification performance between the perceptual robustness and the discriminative capability.

TABLE 8
PERFORMANCE COMPARISONS AMONG DIFFERENT ALGORITHMS

Algorithm                       | Optimal TPR when FPR=0 | Optimal FPR when TPR=1 | Sensitivity to content changes | Hash length | Average time (seconds)
RASH method                     | 0.9100 | 0.2691  | Moderate    | 40 digits   | 9.6
RT-DFT hashing                  | 0.8767 | 0.1454  | Moderate    | 15 digits   | 12.8
SVD-SVD hashing                 | 0.1259 | 0.9451  | Sensitive   | 1600 digits | 1.5
NMF-NMF-SQ hashing              | 0.8133 | 0.5347  | Moderate    | 64 digits   | 2.4
Invariant moments based hashing | 0.9300 | 0.0311  | Insensitive | 42 digits   | 2.6
GF-LVQ hashing                  | 0.5167 | 0.9113  | Moderate    | 120 bits    | 0.67
Proposed hashing                | 0.9833 | 0.00116 | Sensitive   | 64 digits   | 2.8
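The rotation invariance that the ring partition provides can be illustrated with a minimal Python sketch (our own illustration, not the paper's MATLAB implementation; the helper name and radii are hypothetical): a pixel's ring index depends only on its distance from the image center, so a rotation about the center leaves every pixel in its ring.

```python
import math

def ring_index(x, y, cx, cy, ring_radii):
    """Index of the ring containing pixel (x, y).

    ring_radii: increasing outer radii of the n rings. The index depends
    only on the distance to the center (cx, cy), so rotating the image
    about the center does not change any pixel's ring membership.
    """
    d = math.hypot(x - cx, y - cy)
    for i, r in enumerate(ring_radii):
        if d <= r:
            return i
    return None  # pixel lies outside the inscribed circle

# A pixel and its image under a 90-degree rotation about the center
cx, cy, radii = 8.0, 8.0, [2.0, 4.0, 6.0, 8.0]
x, y = 3.0, 5.0
xr, yr = cx + (y - cy), cy - (x - cx)   # rotate (x, y) by 90 degrees
print(ring_index(x, y, cx, cy, radii) == ring_index(xr, yr, cx, cy, radii))  # -> True
```

Because ring statistics are computed over these distance-based sets, features extracted per ring, and hence the secondary image built from them, are unchanged by rotation.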
Research on image hashing is still under way. Future work will mainly focus on the use of the rotation-invariant secondary image in combination with other techniques. Issues being considered include the capability of tamper detection in small regions, tamper localization, efficient color feature extraction, and so on.
ACKNOWLEDGMENTS
The authors sincerely thank the anonymous reviewers for their insightful comments and valuable suggestions, which have substantially improved the quality of this paper. Many thanks to Dr. Y. Li for his source code.
This work is supported in part by the Australian Research Council (ARC) under large grant DP0985456; the China 863 Program under grant 2012AA011005; the China 973 Program under grant 2013CB329404; the Natural Science Foundation of China under grants 61165009, 60963008 and 61170131; Guangxi "Bagui Scholar" Teams for Innovation and Research; the Guangxi Natural Science Foundation under grants 2012GXNSFGA060004, 2012GXNSFBA053166, 2011GXNSFD018026 and 0832104; the Project of the Education Administration of Guangxi under grant 200911MS55; and the Scientific Research and Technological Development Program of Guangxi under grant 10123005–8.

References
[1] M. Slaney and M. Casey, "Locality-sensitive hashing for finding nearest neighbors," IEEE Signal Processing Magazine, vol.25, no.2, pp.128–131, 2008.
[2] M. N. Wu, C. C. Lin and C. C. Chang, "Novel image copy detection with rotating tolerance," Journal of Systems and Software, vol.80, no.7, pp.1057–1069, 2007.
[3] S. Wang and X. Zhang, "Recent development of perceptual image hashing," Journal of Shanghai University (English Edition), vol.11, no.4, pp.323–331, 2007.
[4] F. Ahmed, M. Y. Siyal and V. U. Abbas, "A secure and robust hash-based scheme for image authentication," Signal Processing, vol.90, no.5, pp.1456–1470, 2010.
[5] C. Qin, C. C. Chang and P. Y. Chen, "Self-embedding fragile watermarking with restoration capability based on adaptive bit allocation mechanism," Signal Processing, vol.92, no.4, pp.1137–1150, 2012.
[6] C. S. Lu, C. Y. Hsu, S. W. Sun and P. C. Chang, "Robust mesh-based hashing for copy detection and tracing of images," Proc. of IEEE International Conference on Multimedia and Expo, vol.1, pp.731–734, 2004.
[7] Z. Tang, S. Wang, X. Zhang, W. Wei and S. Su, "Robust image hashing for tamper detection using non-negative matrix factorization," Journal of Ubiquitous Convergence and Technology, vol.2, no.1, pp.18–26, 2008.
[8] E. Hassan, S. Chaudhury and M. Gopal, "Feature combination in kernel space for distance based image hashing," IEEE Transactions on Multimedia, vol.14, no.4, pp.1179–1195, 2012.
[9] W. Lu and M. Wu, "Multimedia forensic hash based on visual words," Proc. of IEEE International Conference on Image Processing, pp.989–992, 2010.
[10] X. Lv and Z. J. Wang, "Reduced-reference image quality assessment based on perceptual image hashing," Proc. of IEEE International Conference on Image Processing, pp.4361–4364, 2009.
[11] R. Venkatesan, S.-M. Koon, M. H. Jakubowski and P. Moulin, "Robust image hashing," Proc. of IEEE International Conference on Image Processing, pp.664–666, 2000.
[12] J. Fridrich and M. Goljan, "Robust hash functions for digital watermarking," Proc. of IEEE International Conference on Information Technology: Coding and Computing, pp.178–183, 2000.
[13] C. Y. Lin and S. F. Chang, "A robust image authentication system distinguishing JPEG compression from malicious manipulation," IEEE Transactions on Circuits and Systems for Video Technology, vol.11, no.2, pp.153–168, 2001.
[14] D. Wu, X. Zhou and X. Niu, "A novel image hash algorithm resistant to print–scan," Signal Processing, vol.89, no.12, pp.2415–2424, 2009.
[15] Z. Tang, S. Wang, X. Zhang, W. Wei and Y. Zhao, "Lexicographical framework for image hashing with implementation based on DCT and NMF," Multimedia Tools and Applications, vol.52, no.2–3, pp.325–345, 2011.
[16] F. Lefebvre, B. Macq and J.-D. Legat, "RASH: Radon soft hash algorithm," Proc. of European Signal Processing Conference, pp.299–302, 2002.
[17] S. S. Kozat, R. Venkatesan and M. K. Mihcak, "Robust perceptual image hashing via matrix invariants," Proc. of IEEE International Conference on Image Processing, pp.3443–3446, 2004.
[18] C. D. Roover, C. D. Vleeschouwer, F. Lefebvre and B. Macq, "Robust video hashing based on radial projections of key frames," IEEE Transactions on Signal Processing, vol.53, no.10, pp.4020–4036, 2005.
[19] A. Swaminathan, Y. Mao and M. Wu, "Robust and secure image hashing," IEEE Transactions on Information Forensics and Security, vol.1, no.2, pp.215–230, 2006.
[20] V. Monga and M. K. Mihcak, "Robust and secure image hashing via non-negative matrix factorizations," IEEE Transactions on Information Forensics and Security, vol.2, no.3, pp.376–390, 2007.
[21] Y. Lei, Y. Wang and J. Huang, "Robust image hash in Radon transform domain for authentication," Signal Processing: Image Communication, vol.26, no.6, pp.280–288, 2011.
[22] USC-SIPI Image Database. Available: http://sipi.usc.edu/database/, February 7, 2007.
[23] Ground Truth Database. Available: http://www.cs.washington.edu/research/imagedatabase/groundtruth/, Accessed 8 May 2008.
[24] M. Schneider and S. F. Chang, "A robust content based digital signature for image authentication," Proc. of IEEE International Conference on Image Processing, vol.3, pp.227–230, 1996.
[25] V. Monga and B. L. Evans, "Perceptual image hashing via feature points: performance evaluation and tradeoffs," IEEE Transactions on Image Processing, vol.15, no.11, pp.3452–3465, 2006.
[26] V. Monga, A. Banerjee and B. L. Evans, "A clustering based approach to perceptual image hashing," IEEE Transactions on Information Forensics and Security, vol.1, no.1, pp.68–79, 2006.
[27] J. S. Seo, J. Haitsma, T. Kalker and C. D. Yoo, "A robust image fingerprinting system using the Radon transform," Signal Processing: Image Communication, vol.19, no.4, pp.325–339, 2004.
[28] Y. Mao and M. Wu, "Unicity distance of robust image hashing," IEEE Transactions on Information Forensics and Security, vol.2, no.3, pp.462–467, 2007.
[29] F. Khelifi and J. Jiang, "Perceptual image hashing based on virtual watermark detection," IEEE Transactions on Image Processing, vol.19, no.4, pp.981–994, 2010.
[30] Z. Tang, S. Wang, X. Zhang and W. Wei, "Structural feature-based image hashing and similarity metric for tampering detection," Fundamenta Informaticae, vol.106, no.1, pp.75–91, 2011.
[31] X. Lv and Z. J. Wang, "Perceptual image hashing based on shape contexts and local feature points," IEEE Transactions on Information Forensics and Security, vol.7, no.3, pp.1081–1093, 2012.
[32] Z. Tang, Y. Dai and X. Zhang, "Perceptual hashing for color images using invariant moments," Applied Mathematics & Information Sciences, vol.6, no.2S, pp.643–650, 2012.
[33] M. K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol.8, no.2, pp.179–187, 1962.
[34] Y. Li, Z. Lu, C. Zhu, and X. Niu, "Robust image hashing based on random Gabor filtering and dithered lattice vector quantization," IEEE Transactions on Image Processing, vol.21, no.4, pp.1963–1980, 2012.
[35] Z. Tang, Y. Dai, X. Zhang, and S. Zhang, "Perceptual image hashing with histogram of color vector angles," The 8th International Conference on Active Media Technology (AMT 2012), Lecture Notes in Computer Science, vol.7669, pp.237–246, 2012.
[36] Y. Zhao, S. Wang, X. Zhang, and H. Yao, "Robust hashing for image authentication using Zernike moments and local features," IEEE Transactions on Information Forensics and Security, vol.8, no.1, pp.55–63, 2013.
[37] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol.401, pp.788–791, 1999.
[38] J. Pan and J. Zhang, "Large margin based nonnegative matrix factorization and partial least squares regression for face recognition," Pattern Recognition Letters, vol.32, no.14, pp.1822–1835, 2011.
[39] I. Buciu and I. Pitas, "NMF, LNMF, and DNMF modeling of neural receptive fields involved in human facial expression perception," Journal of Visual Communication and Image Representation, vol.17, no.5, pp.958–969, 2006.
[40] R. Sandler and M. Lindenbaum, "Nonnegative matrix factorization with earth mover's distance metric for image analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.33, no.8, pp.1590–1602, 2011.
[41] S. Xie, Z. Yang and Y. Fu, "Nonnegative matrix factorization applied to nonlinear speech and image cryptosystems," IEEE Transactions on Circuits and Systems I: Regular Papers, vol.55, no.8, pp.2356–2367, 2008.
[42] Y. Chen, L. Wang and M. Dong, "Non-negative matrix factorization for semisupervised heterogeneous data coclustering," IEEE Transactions on Knowledge and Data Engineering, vol.22, no.10, pp.1459–1474, 2010.
[43] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems, vol.13, pp.556–562, 2000.
[44] I. Kotsia, S. Zafeiriou and I. Pitas, "A novel discriminant non-negative matrix factorization algorithm with applications to facial image characterization problems," IEEE Transactions on Information Forensics and Security, vol.2, no.3, pp.588–595, 2007.
[45] F. A. P. Petitcolas, "Watermarking schemes evaluation," IEEE Signal Processing Magazine, vol.17, no.5, pp.58–64, 2000.
[46] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol.27, no.8, pp.861–874, 2006.
Zhenjun Tang received the B.S. and M.Eng. degrees from Guangxi
Normal University, Guilin, P.R. China, in 2003 and 2006, respectively, and the PhD degree from Shanghai University, Shanghai, P.R.
China, in 2010. He is now an associate professor with the Department of Computer Science, Guangxi Normal University. His research
interests include image processing and multimedia security. He has
contributed more than 20 international papers. He holds two China
patents. He is a reviewer of several international journals such as
IEEE Transactions on Information Forensics and Security, Signal
Processing, IET Image Processing, Multimedia Tools and Applications and Imaging Science Journal.
Xianquan Zhang received the M.Eng. degree from Chongqing University, Chongqing, P.R. China. He is currently a Professor with the
Department of Computer Science, Guangxi Normal University. His
research interests include image processing and computer graphics.
He has contributed more than 40 papers.
Shichao Zhang received the PhD degree in computer science from Deakin University, Australia. He is currently a China 1000-Plan distinguished professor with the Department of Computer Science, Guangxi Normal University, China. His research interests include information quality and pattern discovery. He has published about 60 international journal papers and more than 60 international conference papers. He has won more than 10 national grants. He
served/is serving as an associate editor for the IEEE Transactions on
Knowledge and Data Engineering, Knowledge and Information Systems, and the IEEE Intelligent Informatics Bulletin. He is a senior
member of the IEEE and the IEEE Computer Society and a member
of the ACM.