Computer Vision and Image Understanding 128 (2014) 84–93
A statistical feature based approach to distinguish PRCG from photographs
Xiaofeng Wang a,*, Yong Liu a, Bingchao Xu a, Lu Li b, Jianru Xue b
a School of Science and Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
b Institute of Artificial Intelligence and Robotics at Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China
Article info
Article history:
Received 24 June 2013
Accepted 15 July 2014
Available online 24 July 2014
Keywords:
Distinguishing PRCG from photographs
Contourlet transformation
Homomorphic filtering
Detection accuracy
Robustness
Generalization capability
Abstract
We present a passive forensics method to distinguish photorealistic computer graphics (PRCG) from natural images (photographs). The goals of our work are to improve the detection accuracy and the robustness to content-preserving image manipulations. In the proposed method, Homomorphic filtering is used to highlight the detail information of the image. We find that the texture changes under the same Homomorphic filtering transformation differ between photographs and PRCG images, and we use difference matrices to describe these differences in texture change. We define a customized statistical feature, named texture similarity, and combine it with the statistical features extracted from the co-occurrence matrices of the difference matrices to construct forensic features. We then develop a statistical model and use an SVM as the classifier to distinguish PRCG from photographs. Experimental results show that the proposed method enjoys the following advantages: (1) it achieves high detection accuracy while remaining robust to content-preserving manipulations such as JPEG compression, noise addition, histogram equalization, and filtering; (2) it has satisfactory generalization capability, remaining effective when the training samples and the testing samples come from different sources.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
With the rapid growth and widespread use of digital cameras and mobile phones, photos can be taken anytime and anywhere. At the same time, the rapid development of computer vision and computer graphics technologies makes it increasingly convenient to generate photorealistic computer graphics (PRCG). These PRCG are so realistic that it is very difficult to differentiate them from photographs by the human eye (see Fig. 1). The old adage of "seeing is believing" is no longer true. This trend has brought new issues and challenges concerning the authenticity of digital images. Therefore, identifying PRCG from photographs has become an important topic in the field of media applications.
Photographs are pictures captured from the real world by an imaging device such as a camera. PRCG are images of scenes that do not exist in reality; they are generated purely by computers
and graphics software through the following process: first, a 3D polygon model is built to simulate the desired shape; then color, texture, and simulated light irradiation are applied to this model; finally, the PRCG is rendered using a virtual camera. Recent advances in computer graphics have brought amazing results: it is now possible to create models that are physically accurate, without visual artifacts or lighting inconsistencies. Therefore, distinguishing PRCG from photographs is a very challenging problem.
Many methodologies to distinguish PRCG from photographs have been established in recent years. The majority of existing methods in the literature are based on statistical learning models, in which features capturing the inconsistency between photographs and PRCG are used for classification. Therefore, the main difference among existing approaches lies in feature extraction. Feature extraction methods can be classified into two categories: the first is based on the transform domain, and the second is based on the physical characteristics of the imaging equipment.
Surveying the existing methods, there remain problems such as low detection precision, weak robustness, and unsatisfactory stability. In our work, we aim to improve the detection accuracy, robustness, and generalization capability, and we seek new features that are able to distinguish photographs from PRCG.
Fig. 1. The pictures in the top row are PRCG, and the pictures in the bottom row are digital photographs.
We find that the texture changes under the same Homomorphic filtering transformation differ between photographs and computer-generated images, so we use Homomorphic filtering to highlight the image detail information. Considering that the Contourlet transformation has multi-directional selectivity and anisotropy and can be used to describe the inherent characteristics of an image, we use the statistical features of the Contourlet sub-bands to construct the distinguishing features. We define a customized statistical feature, named texture similarity, and combine it with the statistical features extracted from the co-occurrence matrices of the difference matrices to construct forensic features. Then we develop a statistical model and use the least squares SVM as a classifier to separate PRCG from photographs. Experimental results show that the proposed method not only possesses satisfactory detection accuracy, but is also robust to content-preserving manipulations, and it provides satisfactory generalization capability across different image data sets.
The rest of this paper is organized as follows. In Section 2, we introduce the related works. The proposed method is described in Section 3. Section 4 contains experimental results. Finally, a short summary is presented in Section 5.
2. Related works
The earliest approaches to distinguishing photographs from PRCG were proposed by Ng and Chang [1] and Lyu and Farid [2], respectively. Work [1] presented a classification approach that used natural image statistics composed of 33-dimensional (33-D) power spectrum features, 24-D local patch features, and 72-D wavelet higher-order statistics. Subsequently, a geometry-based approach [3] was proposed by the same authors to model the physical differences between PRCG and photographs, in which 192-D features were extracted by analyzing local patch statistics, local fractal dimension, surface gradient, quadratic geometry, and Beltrami flow. Lyu and Farid [2] developed a statistical model based on a wavelet-like decomposition, in which 216-D features were generated using the first four order statistics (mean, variance, skewness, and kurtosis) of the sub-band coefficients and the inter-sub-band prediction errors. Another wavelet-transform-based method was proposed by Wang and Moulin [4], where 144-D features were extracted from the wavelet coefficient histogram. A similar method was presented by Chen et al. [5], in which 234-D features were generated using the statistical moments of image characteristic functions and the wavelet sub-band coefficients in the HSV color space. Cui et al. [6] used a similar method, in which three frequency-domain moments of the wavelet coefficient histogram were extracted and the resulting 78-D features were used as classification features; the detection rate of this method is 94%. Zheng and Ping [7] used 81-D features generated from an adjacent-pixel coherence histogram and 120-D co-occurrence matrix features extracted from the HSV color space as distinguishing features to identify PRCG from photographs. Sankar et al. [8] proposed an approach based on an aggregate of existing features and texture interpolation features to distinguish PRCG from real images; in their work, 80-D features were generated, of which 44-D came from the moment-based method, 24-D from the texture interpolation method, 6-D from a color histogram, and 6-D from patch statistics. Li et al. [9] proposed a distinguishing method in which the second-order difference statistics of the image and the first four order statistics of its prediction-error images were extracted to form 144-D distinguishing features.
Since the majority of real images are obtained by imaging equipment, some physical characteristics of the imaging equipment always remain from the imaging process, and these characteristics can be used to distinguish digital photographs from PRCG. Dehnie et al. [10] proposed an approach that used the different pattern noise of photographs and PRCG as the distinguishing feature. Dirik et al. [11] identified photographic images by detecting the Bayer color filter array (CFA) and the traces of chromatic aberration. Considering that most digital cameras employ an image sensor with a color filter array, so that the demosaicing process interpolates the raw image to produce at each pixel an estimate for each color channel, Gallagher and Chen presented a method [12] to distinguish PRCG from photographs by detecting the interpolation traces caused by the Bayer color filter array. This algorithm reaches a detection accuracy of 98.4% on the Columbia open data set, which is almost the highest among similar algorithms on the same data set. However, the algorithm is less robust to JPEG compression and noise, because such manipulations may destroy the interpolation traces caused by the Bayer color filter array. Another high-accuracy method was proposed by Li et al. [13]; it was reported that the detection accuracy can reach as high as 98.3% for distinguishing computer graphics (CG) from photographic images on their own data set, in which the computer graphics were collected from [14,15] and the photographic images were collected by the authors.
Zhang et al. [16] proposed a method that distinguishes PRCG from photographic images using a visual vocabulary on local image edges. They preprocessed image edge patches and projected them onto a 7-D sphere, and then a visual vocabulary was constructed by determining the key sampling points in accordance with Voronoi cells. Recently, Farid and Bravo [17] and Dang-Nguyen et al. [18,19] presented methods that can discriminate between computer-generated and natural human faces. Literature [20] presents more technical approaches, system designs, and other practical issues in tackling the distinction between photographic images and PRCG.
3. Proposed method
Although PRCG are so realistic that they are very difficult to differentiate from photographs by the human eye, their imaging mechanism and imaging process are completely dissimilar. The imaging process of a photograph is very complex, since it is inevitably influenced by both the natural environment and the acquisition equipment, while the process of generating a PRCG only crudely simulates the imaging process of a photograph. These differences lead to different characteristics in many aspects, especially in texture detail. Generally speaking, the texture details of photographs appear smooth, while those of PRCG appear quavering [9]. In addition, photographs are always influenced by the natural environment: if the illumination is uneven, the average brightness of the various parts of the image fluctuates, and the detail structure corresponding to dark regions is difficult to recognize. All these differences can be used to distinguish PRCG from photographs. To this end, we keep looking for ways to capture and describe the essential differences between photographs and PRCG.
In digital image processing, Homomorphic filtering is derived from the illumination-reflectance model of the image, and it can be used to perform simultaneous dynamic-range compression and contrast enhancement. For photographs, after Homomorphic filtering the intensity variation is reduced while the details are highlighted. However, such a variation is not obvious for PRCG after the same Homomorphic filtering processing. The reason is that natural texture is expressed differently from the texture of PRCG. See Fig. 2, in which (a) and (b) are photographs and (c) and (d) are PRCG: after Homomorphic filtering, the texture details of the photograph in shaded areas become apparent, but the PRCG shows no such change. For this reason, we can use the Homomorphic filter to capture the detail difference between photographs and PRCG.
Fig. 2. The visual effects of a photograph and a PRCG image before and after Homomorphic filtering: (a) the original photograph, (b) the photograph after Homomorphic filtering, (c) the original PRCG image, (d) the PRCG image after Homomorphic filtering.
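As an illustration (the paper does not give its filter parameters), the following is a minimal Python sketch of one common Gaussian high-emphasis formulation of Homomorphic filtering; the cutoff d0 and the gains gamma_l and gamma_h are illustrative values, not the authors' settings.

```python
import numpy as np

def homomorphic_filter(img, d0=30.0, gamma_l=0.5, gamma_h=2.0):
    """Gaussian high-emphasis Homomorphic filtering of a nonnegative 2-D grayscale image.

    d0, gamma_l and gamma_h are illustrative parameters, not values taken from the paper.
    """
    img = np.asarray(img, dtype=np.float64)
    log_img = np.log1p(img)                       # work in the log (illumination * reflectance) domain
    spec = np.fft.fftshift(np.fft.fft2(log_img))  # centered spectrum

    rows, cols = img.shape
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    d2 = u[:, None] ** 2 + v[None, :] ** 2        # squared distance from the spectrum center

    # High-emphasis transfer function: attenuates low frequencies (illumination)
    # and amplifies high frequencies (reflectance, i.e. texture detail).
    h = (gamma_h - gamma_l) * (1.0 - np.exp(-d2 / (2.0 * d0 ** 2))) + gamma_l

    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(h * spec)))
    return np.expm1(filtered)                     # back from the log domain
```

With gamma_h > 1 > gamma_l, low-frequency illumination is compressed while high-frequency texture detail is amplified, which is exactly the behavior exploited above.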
In addition, the Contourlet transformation [21,22] is able to capture a sparse expansion for typical images that are piecewise smooth away from smooth contours. The main difference between the Contourlet transform and the wavelet transform is that the latter cannot capture the geometrical smoothness of contours. Motivated by this fact, the statistics of the Contourlet sub-bands are used to construct the distinguishing features, and the least squares SVM is used as the classifier. Our method can be divided into two stages: feature extraction and classification. The framework is shown in Fig. 3.
3.1. Feature extraction
For an N × N image I, we take a 2-level Contourlet decomposition; the image is decomposed into a low-frequency sub-band and band-pass directional sub-bands (see Fig. 4).
We denote the coefficient matrices as $M^l$, $M^m_s$, and $M^h_t$, respectively; the corresponding sub-band images are indicated in Fig. 4.

$$M^l = \begin{bmatrix} m^l_{11} & m^l_{12} & \cdots & m^l_{1j} \\ m^l_{21} & m^l_{22} & \cdots & m^l_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ m^l_{i1} & m^l_{i2} & \cdots & m^l_{ij} \end{bmatrix} \tag{1}$$
Fig. 3. The framework of the proposed method. The original image I is processed by Homomorphic filtering to obtain the filtered image I'; both I and I' are decomposed by the Contourlet transform; difference matrices are computed from the two sets of coefficient matrices; statistical features of the gray level co-occurrence matrices of the difference matrices, together with the texture similarity, form 70-D feature vectors that are used for LS-SVM classifier training and classification.
Fig. 4. An example of the Contourlet transformation. The tested image is decomposed into two LP levels, which are then decomposed into four and eight directional sub-bands: $M^l$, $M^m_s$ (s = 1, 2, 3, 4), and $M^h_t$ (t = 1, 2, ..., 8), giving k = 1, 2, ..., 13 sub-bands in total. The tested image comes from the Columbia open data set.
$$M^m_s = \begin{bmatrix} m^m_{11} & m^m_{12} & \cdots & m^m_{1q} \\ m^m_{21} & m^m_{22} & \cdots & m^m_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ m^m_{p1} & m^m_{p2} & \cdots & m^m_{pq} \end{bmatrix}, \quad s = 1, 2, 3, 4 \tag{2}$$

$$M^h_t = \begin{bmatrix} m^h_{11} & m^h_{12} & \cdots & m^h_{1y} \\ m^h_{21} & m^h_{22} & \cdots & m^h_{2y} \\ \vdots & \vdots & \ddots & \vdots \\ m^h_{x1} & m^h_{x2} & \cdots & m^h_{xy} \end{bmatrix}, \quad t = 1, 2, \ldots, 8 \tag{3}$$

Let Hom(·) represent the convolution-based Homomorphic filtering transformation, and let I' = Hom(I). Then I and I' can be written as:

$$I = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} \tag{4}$$

$$I' = \begin{bmatrix} a'_{11} & a'_{12} & \cdots & a'_{1N} \\ a'_{21} & a'_{22} & \cdots & a'_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{N1} & a'_{N2} & \cdots & a'_{NN} \end{bmatrix} \tag{5}$$
We take the same 2-level Contourlet decomposition of the image I', and let $M'^l$, $M'^m_s$, and $M'^h_t$ represent the corresponding coefficient matrices:
$$M'^l = \begin{bmatrix} m'^l_{11} & m'^l_{12} & \cdots & m'^l_{1j} \\ m'^l_{21} & m'^l_{22} & \cdots & m'^l_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ m'^l_{i1} & m'^l_{i2} & \cdots & m'^l_{ij} \end{bmatrix} \tag{6}$$

$$M'^m_s = \begin{bmatrix} m'^m_{11} & m'^m_{12} & \cdots & m'^m_{1q} \\ m'^m_{21} & m'^m_{22} & \cdots & m'^m_{2q} \\ \vdots & \vdots & \ddots & \vdots \\ m'^m_{p1} & m'^m_{p2} & \cdots & m'^m_{pq} \end{bmatrix}, \quad s = 1, 2, 3, 4 \tag{7}$$

$$M'^h_t = \begin{bmatrix} m'^h_{11} & m'^h_{12} & \cdots & m'^h_{1y} \\ m'^h_{21} & m'^h_{22} & \cdots & m'^h_{2y} \\ \vdots & \vdots & \ddots & \vdots \\ m'^h_{x1} & m'^h_{x2} & \cdots & m'^h_{xy} \end{bmatrix}, \quad t = 1, 2, \ldots, 8 \tag{8}$$

In order to represent the difference in texture detail between photographs and PRCG, we define four categories of difference matrices $D_I$, $D^l$, $D^m_s$, and $D^h_t$:

$$D_I = I - I' \tag{9}$$

$$D^l = M^l - M'^l \tag{10}$$

$$D^m_s = M^m_s - M'^m_s, \quad s = 1, 2, 3, 4 \tag{11}$$

$$D^h_t = M^h_t - M'^h_t, \quad t = 1, 2, \ldots, 8 \tag{12}$$
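A minimal sketch of how the 14 difference matrices of Eqs. (9)-(12) can be assembled is given below; it assumes that the 13 Contourlet coefficient matrices of I and of I' = Hom(I) have already been obtained from some Contourlet implementation (producing them is outside the scope of this sketch).

```python
import numpy as np

def difference_matrices(image, filtered, coeffs, coeffs_filtered):
    """Build the 14 difference matrices D_k, k = 0, ..., 13, of Eqs. (9)-(12).

    `coeffs` and `coeffs_filtered` are assumed to be lists of the 13 Contourlet
    coefficient matrices of I and I' = Hom(I), ordered as
    [M^l, M^m_1, ..., M^m_4, M^h_1, ..., M^h_8].
    """
    diffs = [np.asarray(image, float) - np.asarray(filtered, float)]   # D_0 = D_I = I - I'
    for m, m_f in zip(coeffs, coeffs_filtered):                        # D^l, D^m_s, D^h_t
        diffs.append(np.asarray(m, float) - np.asarray(m_f, float))
    return diffs
```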
Considering that the Gray Level Co-occurrence Matrix (GLCM) is referred to as a co-occurrence distribution of the image pixels, it is defined over an image as the distribution of co-occurring pixel values at a given offset. Based on this behavior, the GLCM can be used to capture the properties of image texture. Mathematically, a co-occurrence matrix is defined as a square matrix G of size L, where L is the total number of gray levels in an N × M gray image I. The (i, j)-th entry of G is parameterized by an offset (Δx, Δy) as:

$$g(i, j) = \sum_{p=1}^{N} \sum_{q=1}^{M} \begin{cases} 1, & \text{if } I(p, q) = i \text{ and } I(p + \Delta x, q + \Delta y) = j \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

where i and j are image intensity values of the image I, p and q are spatial positions in the image I, and the offset (Δx, Δy) depends on the direction at which the matrix is computed.

In our scheme, the parameters of the gray-level co-occurrence matrix are chosen as follows. Assuming the gray levels of the gray image range from 0 to L − 1, the offset parameter Δx is set to 1 and Δy is set to 0, namely θ = 0°; this means that only the nearest neighbors in the horizontal direction are considered in the gray-level co-occurrence matrix calculation. For the 14 difference matrices $D_I$, $D^l$, $D^m_s$ (s = 1, 2, 3, 4), and $D^h_t$ (t = 1, 2, ..., 8), the corresponding co-occurrence matrices $G_k$ (k = 0, 1, 2, ..., 13) can be denoted as:

$$G_k = \begin{bmatrix} g_k(1,1) & \cdots & g_k(1,j) & \cdots & g_k(1,L) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ g_k(i,1) & \cdots & g_k(i,j) & \cdots & g_k(i,L) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ g_k(L,1) & \cdots & g_k(L,j) & \cdots & g_k(L,L) \end{bmatrix}, \quad k = 0, 1, 2, 3, \ldots, 13 \tag{14}$$

When k = 0, $G_k$ represents the GLCM of the difference matrix $D_I = I - I'$.

Co-occurrence matrices capture the features of the texture, but they are not directly useful for further analysis, such as the comparison of two image textures. In order to represent the texture details more compactly, we estimate the following five kinds of numeric features (statistics) from each of the above co-occurrence matrices.

(1) Energy. It is the quadratic sum of the elements of the GLCM:

$$f_k^1 = \sum_{i=1}^{L} \sum_{j=1}^{L} \left( g_k(i, j) \right)^2 \tag{15}$$

(2) Contrast. It is a measure of the intensity contrast between a pixel and its neighboring pixels over the whole image:

$$f_k^2 = \sum_{i=1}^{L} \sum_{j=1}^{L} (i - j)^2 \, g_k(i, j) \tag{16}$$

(3) Homogeneity. It measures the tightness of the distribution of the elements to the diagonal of the GLCM:

$$f_k^3 = \sum_{i=1}^{L} \sum_{j=1}^{L} \frac{g_k(i, j)}{1 + |i - j|} \tag{17}$$

(4) The maximum of the column mean:

$$R_{kj} = \frac{\sum_{i=1}^{L} g_k(i, j)}{L}, \quad j = 1, 2, 3, \ldots, L \tag{18}$$

$$f_k^4 = \max_{j} R_{kj} \tag{19}$$
(5) Texture similarity.

In order to quantitatively measure the texture change before and after Homomorphic filtering, we define a term named "texture similarity". We compute the texture similarity of the difference matrices $D_I$, $D^l$, $D^m_s$, and $D^h_t$. For simplicity, we denote the difference matrices uniformly as $D_k$, k = 0, 1, 2, 3, ..., 13, and estimate the texture similarity of each difference matrix.

For a difference matrix $D_k$ (k = 0, 1, 2, 3, ..., 13):

$$D_k = \begin{bmatrix} d^k_{11} & \cdots & d^k_{1j} & \cdots & d^k_{1N} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ d^k_{i1} & \cdots & d^k_{ij} & \cdots & d^k_{iN} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ d^k_{N1} & \cdots & d^k_{Nj} & \cdots & d^k_{NN} \end{bmatrix} = \begin{bmatrix} A^T_{k1} & \cdots & A^T_{kj} & \cdots & A^T_{kN} \end{bmatrix} \tag{20}$$

Here, $A_{kj} = \left\{ d^k_{1j}, d^k_{2j}, \ldots, d^k_{Nj} \right\}$.

We define an equivalence relation R on the set $A_{kj}$:

$$d^k_{ij} \, R \, d^k_{i'j} \iff d^k_{ij} = d^k_{i'j} \;\wedge\; d^k_{ij}, d^k_{i'j} \in A_{kj} \tag{21}$$

This means that $d^k_{ij}$ and $d^k_{i'j}$ satisfy the equivalence relation R if and only if the following two conditions hold: $d^k_{ij} = d^k_{i'j}$, and $d^k_{ij}, d^k_{i'j} \in A_{kj}$. The symbol "$\iff$" denotes "if and only if" (equivalence), and "$\wedge$" denotes "and".

Then the equivalence class of each element of $A_{kj}$ can be denoted as:

$$\forall d^k_{ij} \in A_{kj}, \quad \left[ d^k_{ij} \right]_R = \left\{ d^k_{i'j} \in A_{kj} \mid d^k_{ij} \, R \, d^k_{i'j} \right\} \tag{22}$$

Let

$$E_{kj} = \max_{i} \left\| \left[ d^k_{ij} \right]_R \right\| \tag{23}$$
Here, $\|\cdot\|$ denotes the operator that computes the cardinality of a set. Let

$$f_k^5 = \sum_{j=1}^{N} E_{kj} \tag{24}$$

We name $f_k^5$ the "texture similarity".

The 70-dimensional feature vector is then defined as:

$$V = \left\{ f_0^1, f_0^2, f_0^3, f_0^4, f_0^5,\; f_1^1, f_1^2, f_1^3, f_1^4, f_1^5,\; \ldots,\; f_{13}^1, f_{13}^2, f_{13}^3, f_{13}^4, f_{13}^5 \right\} \tag{25}$$
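As an illustration of Eqs. (15)-(25), the sketch below computes the five statistics of one difference matrix and assembles the 70-D vector V. It uses scikit-image's graycomatrix with the horizontal offset (Δx, Δy) = (1, 0) chosen above; the quantization of the real-valued difference matrices to integer gray levels is our own assumption, since the paper does not state how this step is performed.

```python
import numpy as np
from skimage.feature import graycomatrix  # scikit-image >= 0.19 (older releases name it greycomatrix)

def glcm_statistics(diff, levels=256):
    """Energy, contrast, homogeneity and maximum column mean of Eqs. (15)-(19)
    for one difference matrix. The quantization to `levels` gray levels is an
    assumption; the paper does not specify it."""
    d = np.asarray(diff, dtype=np.float64)
    d = d - d.min()
    if d.max() > 0:
        d = d / d.max() * (levels - 1)            # map values to 0 .. levels-1
    d = d.astype(np.uint8)

    # GLCM with offset (dx, dy) = (1, 0): horizontal nearest neighbors only
    g = graycomatrix(d, distances=[1], angles=[0], levels=levels)[:, :, 0, 0].astype(np.float64)

    i, j = np.indices(g.shape)
    energy = np.sum(g ** 2)                            # Eq. (15)
    contrast = np.sum((i - j) ** 2 * g)                # Eq. (16)
    homogeneity = np.sum(g / (1.0 + np.abs(i - j)))    # Eq. (17)
    max_col_mean = np.max(g.sum(axis=0) / levels)      # Eqs. (18)-(19)
    return [energy, contrast, homogeneity, max_col_mean]

def texture_similarity(diff):
    """Texture similarity f_k^5 of Eq. (24): for every column of the difference
    matrix, E_kj is the number of occurrences of the most frequent value (the
    size of the largest equivalence class of Eq. (22)); f_k^5 sums E_kj over
    all columns."""
    d = np.asarray(diff)
    return int(sum(np.unique(col, return_counts=True)[1].max() for col in d.T))

def feature_vector(difference_matrices):
    """70-D feature vector V of Eq. (25) from the 14 difference matrices D_k."""
    v = []
    for d in difference_matrices:
        v.extend(glcm_statistics(d))   # f_k^1 .. f_k^4
        v.append(texture_similarity(d))  # f_k^5
    return np.asarray(v)
```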
3.2. Classification
The Support Vector Machine (SVM) [23] is a state-of-the-art classification method based on statistical learning theory, and it has been applied successfully in text classification, image recognition, and biological information processing. In our work, we take the feature vector V as the distinguishing feature, use the LS-SVM classifier to distinguish photographs from PRCG, and use the radial basis function (RBF) as the kernel function.
To train the classifier, we label PRCG as "1" and photographs as "-1", use the feature vector V as the classification feature, and take all of them as the input of the LS-SVM to obtain the classifier through learning; this process is illustrated in Fig. 5.
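As an illustration of the classification stage, the sketch below trains a kernel classifier on the 70-D feature vectors. scikit-learn provides no least-squares SVM, so a standard soft-margin SVC with an RBF kernel is used here as a stand-in for the LS-SVM; the hyperparameter values are illustrative, not the authors' settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_classifier(X_train, y_train):
    """X_train: (n_samples, 70) matrix of feature vectors V from Section 3.1.
    y_train: +1 for PRCG, -1 for photographs (the labeling used in the paper).
    A soft-margin RBF SVC stands in for the LS-SVM; C and gamma are illustrative."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    clf.fit(X_train, y_train)
    return clf

# Usage sketch: labels = train_classifier(X_train, y_train).predict(X_test)
# where +1 means PRCG and -1 means photograph.
```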
4. Experimental results and analysis
In this section, we use experimental results to demonstrate the performance of the proposed method. Our algorithm is implemented and tested in MATLAB 2010a, and the experiments were performed on a computer with a Pentium(R) Dual-Core E5400 CPU @ 2.70 GHz and 2.00 GB of RAM.
4.1. Detection accuracy and performance analysis
The goal of this experiment is to test the detection accuracy and analyze the performance of the proposed algorithm. To this end, we formalize the definitions of the true negative rate and the true positive rate as follows:

$$P_{TN} = \text{True negative rate} = \frac{\text{number of natural images (CG) detected as natural images (CG)}}{\left\| \text{testing sample set} \right\|} \times 100\% \tag{26}$$

$$P_{TP} = \text{True positive rate} = \frac{\text{number of natural images (CG) detected as CG (natural images)}}{\left\| \text{testing sample set} \right\|} \times 100\% \tag{27}$$
Here, $\|\cdot\|$ denotes the operator that computes the cardinality of a set. The true negative rate represents the detection accuracy of the algorithm: it is the rate at which photographs are detected as photographs and PRCG are detected as PRCG. In order to test this index, we performed our experiments on the Columbia University Image Database [24]. The tested images contain a variety of content with various brightness levels, textures, and fine details. We randomly take 400 photographs and 400 PRCG as training samples, randomly take another 100 photographs and 100 PRCG as testing samples, and use the LS-SVM as the classifier.
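Under the definitions of Eqs. (26) and (27), P_TN and P_TP are complementary for a given test set; a hypothetical helper (not from the paper) that evaluates both from predicted labels could look as follows.

```python
import numpy as np

def detection_rates(y_true, y_pred):
    """P_TN (Eq. (26)) and P_TP (Eq. (27)) under the paper's convention:
    P_TN is the percentage of test samples whose predicted class equals the
    true class, P_TP the percentage that is misclassified, so they sum to 100%."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    p_tn = 100.0 * np.mean(y_pred == y_true)
    return p_tn, 100.0 - p_tn
```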
Let PTN(PG/CG) denote the detection rate when the testing samples contain both photographs and PRCG, PTN(PG) the detection rate when the testing samples are photographs, and PTN(CG) the detection rate when the testing samples are PRCG.
Table 1 presents 10 detection results together with comparison results against [6,12]; in each test, the training samples and the testing samples are randomly selected. As can be seen from Table 1, the true negative rate of the proposed method is satisfactory whether the testing samples contain photographs, PRCG images, or both. The proposed method is superior to the method of [6] in terms of detection rate; it is superior to the method of [12] when the testing samples contain only photographs or contain both photographs and PRCG, and slightly below the method of [12] when the testing samples contain only PRCG.
In order to confirm the availability of the proposed method, we tested 20 pairs of true negative and true positive rates of the proposed method, as well as of [6,12]. In this experiment, we generated training samples by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and generated testing samples by taking another 10n photographs and 10n PRCG images, respectively, where n = 1, ..., 20, using the LS-SVM as the classifier to obtain PTN and PTP. The results are shown in Table 2.
Literature [20] lists the classification performance of various methods on different data sets. We compare some results on the Columbia open data set; they are listed in Table 3.
There are other methods, such as [18,19], that provide high classification accuracy, but they distinguish between computer-generated and natural faces based on facial expressions. Since they are aimed at face images, it is not appropriate to test the accuracy of [18,19] on our data set. Work [13] reported that the detection accuracy can reach as high as 98.3% for distinguishing computer graphics (CG) from photographic images on their own data set, in which the computer graphics were collected from [14,15] and the photographic images were collected by the authors. However, in our experiments, the highest accuracy achievable by running their code on our data sets is 87.55%. We therefore further investigated the feature used in [13]. Work [13] uses the statistical difference of uniform gray-scale invariant local binary patterns (LBP) to distinguish PRCG from photographs: the original JPEG coefficients of the Y and Cr components, and their prediction errors, are used for the LBP calculation, 236-D LBP features are obtained, and an SVM is used for classification.
Fig. 5. The training and testing process of the LS-SVM classifier: the feature vector V is extracted from the training sample set (photographs and PRCG) and used to learn the LS-SVM classifier; the feature vector V extracted from the testing sample set is then fed to the trained classifier, which labels each test image as a natural image or PRCG.
Table 1
Detection rates and comparison results (Proposed = proposed method, Cui = Cui et al. [6], G&C = Gallagher and Chen [12]).

| Experiment index | Proposed PTN(PG) (%) | Proposed PTN(CG) (%) | Proposed PTN(PG/CG) (%) | Cui PTN(PG) (%) | Cui PTN(CG) (%) | Cui PTN(PG/CG) (%) | G&C PTN(PG) (%) | G&C PTN(CG) (%) | G&C PTN(PG/CG) (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 96.5 | 94.5 | 97.5 | 91.0 | 83.5 | 84.0 | 88.0 | 95.5 | 90.0 |
| 2 | 97.5 | 95.5 | 96.5 | 92.0 | 84.5 | 88.5 | 89.0 | 97.5 | 88.5 |
| 3 | 98.0 | 96.5 | 96.0 | 91.5 | 80.0 | 85.0 | 86.0 | 96.5 | 87.0 |
| 4 | 95.5 | 95.0 | 96.5 | 92.5 | 86.0 | 88.5 | 89.5 | 95.0 | 86.0 |
| 5 | 97.5 | 97.5 | 96.0 | 89.5 | 85.0 | 86.0 | 86.0 | 98.0 | 85.5 |
| 6 | 96.0 | 96.5 | 97.0 | 86.0 | 82.0 | 83.5 | 85.0 | 94.5 | 86.0 |
| 7 | 96.5 | 95.0 | 95.5 | 91.5 | 80.0 | 83.5 | 83.0 | 95.0 | 86.5 |
| 8 | 97.5 | 96.5 | 97.5 | 89.5 | 85.5 | 84.0 | 84.0 | 94.5 | 87.0 |
| 9 | 95.5 | 95.0 | 96.0 | 88.5 | 88.5 | 85.5 | 83.5 | 94.0 | 84.0 |
| 10 | 98.0 | 96.5 | 98.0 | 87.0 | 82.0 | 84.5 | 82.5 | 97.0 | 91.0 |
Table 2
PTN and PTP and comparison results.

(a) Testing samples are photographs:

| n | Proposed PTN (%) | Proposed PTP (%) | Cui et al. [6] PTN (%) | Cui et al. [6] PTP (%) | Gallagher and Chen [12] PTN (%) | Gallagher and Chen [12] PTP (%) |
|---|---|---|---|---|---|---|
| 1 | 100.00 | 0.00 | 100.00 | 0.00 | 100.00 | 0.00 |
| 2 | 100.00 | 0.00 | 95.00 | 5.00 | 100.00 | 0.00 |
| 3 | 100.00 | 0.00 | 96.67 | 3.33 | 96.70 | 3.30 |
| 4 | 100.00 | 0.00 | 95.00 | 5.00 | 97.50 | 2.50 |
| 5 | 100.00 | 0.00 | 92.00 | 8.00 | 94.00 | 6.00 |
| 6 | 98.67 | 1.33 | 91.60 | 8.40 | 90.00 | 10.00 |
| 7 | 97.14 | 2.86 | 85.71 | 14.92 | 98.60 | 1.40 |
| 8 | 96.67 | 3.33 | 86.25 | 13.75 | 92.50 | 7.50 |
| 9 | 97.78 | 2.22 | 87.78 | 12.22 | 94.40 | 5.60 |
| 10 | 98.00 | 2.00 | 91.00 | 9.00 | 97.00 | 3.00 |
| 11 | 97.27 | 2.73 | 88.18 | 11.82 | 94.50 | 5.50 |
| 12 | 97.50 | 2.50 | 87.50 | 12.50 | 93.30 | 6.70 |
| 13 | 99.23 | 0.77 | 90.00 | 10.00 | 93.10 | 6.90 |
| 14 | 99.29 | 0.71 | 90.00 | 10.00 | 93.60 | 6.40 |
| 15 | 98.00 | 2.00 | 91.33 | 8.67 | 94.00 | 6.00 |
| 16 | 99.38 | 0.62 | 91.88 | 8.12 | 92.50 | 7.50 |
| 17 | 97.06 | 2.94 | 90.59 | 9.41 | 92.90 | 7.10 |
| 18 | 98.89 | 1.11 | 92.22 | 7.78 | 96.10 | 3.90 |
| 19 | 97.89 | 2.11 | 92.60 | 7.40 | 94.20 | 5.80 |
| 20 | 98.00 | 2.00 | 92.00 | 8.00 | 93.50 | 6.50 |

(b) Testing samples are PRCG images:

| n | Proposed PTN (%) | Proposed PTP (%) | Cui et al. [6] PTN (%) | Cui et al. [6] PTP (%) | Gallagher and Chen [12] PTN (%) | Gallagher and Chen [12] PTP (%) |
|---|---|---|---|---|---|---|
| 1 | 100.00 | 0.00 | 100.00 | 0.00 | 100.00 | 0.00 |
| 2 | 100.00 | 0.00 | 100.00 | 0.00 | 100.00 | 0.00 |
| 3 | 100.00 | 0.00 | 96.67 | 3.33 | 100.00 | 0.00 |
| 4 | 100.00 | 0.00 | 95.00 | 5.00 | 100.00 | 0.00 |
| 5 | 100.00 | 0.00 | 90.00 | 10.00 | 100.00 | 0.00 |
| 6 | 100.00 | 0.00 | 90.50 | 9.50 | 98.30 | 1.70 |
| 7 | 100.00 | 0.00 | 87.14 | 12.86 | 97.10 | 2.90 |
| 8 | 98.75 | 1.25 | 85.00 | 15.00 | 97.50 | 2.50 |
| 9 | 98.89 | 1.11 | 84.44 | 15.56 | 97.80 | 2.20 |
| 10 | 99.00 | 1.00 | 81.00 | 19.00 | 98.00 | 2.00 |
| 11 | 99.09 | 0.91 | 90.00 | 10.00 | 98.10 | 1.90 |
| 12 | 99.17 | 0.83 | 85.83 | 14.17 | 98.30 | 1.70 |
| 13 | 97.00 | 3.00 | 85.38 | 14.62 | 97.70 | 2.30 |
| 14 | 96.43 | 3.57 | 82.86 | 17.14 | 97.10 | 2.90 |
| 15 | 96.00 | 4.00 | 80.67 | 19.33 | 96.70 | 3.30 |
| 16 | 96.63 | 3.37 | 85.63 | 14.37 | 96.90 | 3.10 |
| 17 | 96.47 | 3.53 | 84.12 | 15.88 | 97.60 | 2.40 |
| 18 | 96.89 | 3.11 | 84.44 | 15.56 | 97.80 | 2.20 |
| 19 | 96.32 | 3.68 | 84.55 | 15.45 | 97.90 | 2.10 |
| 20 | 97.50 | 2.50 | 84.50 | 15.50 | 98.00 | 2.00 |
Table 3
Classification accuracy comparison.

| Work | Approach of feature extraction | Feature dimension | Data set | Highest classification accuracy (%) |
|---|---|---|---|---|
| Ng et al. [3] | Fractal geometry, differential geometry | 192 | Columbia open data set | 83.5 |
| Chen et al. [5] | Statistical moments, wavelet coefficients | 234 | Columbia open data set | 82.1 |
| Cui et al. [6] | Statistical moments, wavelet coefficients | 78 | Columbia open data set | 94.0 |
| Sankar et al. [8] | Combining features | 557 | Columbia open data set | 90.0 |
| Gallagher and Chen [12] | Interpolation clues related to the Bayer color filter array | 1 | Columbia open data set | 98.4 |
| Proposed method | Statistical feature | 70 | Columbia open data set | 98.0 |
LBP is a typical texture feature; it strongly depends on the image texture characteristics and the image resolution. Since work [13] only uses the LBP feature directly, it is inevitable that its detection results are closely related to the image data set used. Moreover, the image resolution used in [13] is restricted to the range 500 × 500 to 800 × 800, while the image resolutions of the Columbia open data set range from 757 × 568 to 1152 × 768. All of these are possible reasons for the declining accuracy when method [13] is performed on our data set. Such situations are very common; e.g., work [25] reported an accuracy of 99.33%, yet work [13] reported that this accuracy declined to 69.94% when it was tested on the data set of [13]. Keeping a steady accuracy across different databases is a sign of good algorithm performance; we discuss this further in Section 4.3.
To illustrate the superiority of our method in a more convincing way, we only present comparison results between our algorithm and methods that use the Columbia open data set. We compare our method with [12] in detail, because method [12] reaches the highest accuracy on the Columbia open data set.
Receiver Operating Characteristic (ROC) curves are the most common performance evaluation tool for detection algorithms. However, a recent state-of-the-art work [26] reported that, in the presence of composite hypotheses (due to unknown photographs and PRCG images), a ROC curve is not sufficient to sum up the performance of a detector, because ROC curves are especially relevant for testing two simple hypotheses. In the situation of discriminating PRCG images from photographs, strictly speaking, a ROC curve has to be calculated for each possible photograph and PRCG image, except when it is theoretically established that the studied test is independent of the image content. Instead of using ROC curves, we demonstrate the distribution of the true positive rate and compare it with some existing works; the results are shown in Fig. 6.
Fig. 6. The plot and the comparison of the true positive rate. The left panel shows the true positive rate when the testing samples are photographs; the right panel shows the true positive rate when the testing samples are PRCG images.
Table 4
True negative rates under content-preserving manipulations (%).

| Method | Rate | Salt & pepper noise (VAR = 0.001) | Histogram equalization | Mean filtering (3 × 3) | JPEG (QF = 30) | JPEG (QF = 60) | JPEG (QF = 90) |
|---|---|---|---|---|---|---|---|
| Proposed method | PTN(PG/CG) | 90.00 | 81.50 | 85.00 | 87.50 | 86.00 | 94.50 |
| Proposed method | PTN(PG) | 83.50 | 80.50 | 82.00 | 83.00 | 88.50 | 90.50 |
| Proposed method | PTN(CG) | 90.50 | 70.50 | 86.00 | 86.50 | 84.00 | 93.50 |
| Cui et al. [6] | PTN(PG/CG) | 72.50 | 67.50 | 67.50 | 67.00 | 71.50 | 84.50 |
| Cui et al. [6] | PTN(PG) | 94.00 | 56.00 | 90.00 | 80.50 | 92.50 | 90.50 |
| Cui et al. [6] | PTN(CG) | 64.50 | 53.00 | 55.00 | 50.50 | 63.50 | 86.50 |
| Gallagher and Chen [12] | PTN(PG/CG) | 66.50 | 65.50 | 53.50 | 66.50 | 73.00 | 71.00 |
| Gallagher and Chen [12] | PTN(PG) | 88.00 | 88.00 | 37.00 | 40.00 | 41.00 | 46.50 |
| Gallagher and Chen [12] | PTN(CG) | 92.50 | 92.00 | 86.50 | 85.50 | 89.50 | 93.00 |
Fig. 6 shows the distribution and the comparison of the true positive rates. We can see that the true positive rates of the proposed method are all no more than 5%, which means that the proposed method works well regardless of the number of test images.
4.2. Robustness analysis
These experiments are performed to test whether the proposed method is robust to incidental changes caused by content-preserving manipulations, such as JPEG compression, noise addition, and histogram equalization. Robustness is critically important because it determines the practical applicability of the algorithm. As literature [20] notes, "The quoted performance may not be a sufficient indicator for the usefulness of the methods when it comes to real applications." Some methods work well on recognizing original and uncompressed photographic images; however, in practical applications the images are very likely to have been manipulated and to have undergone common image processing such as JPEG compression, noise addition, and so on. Therefore, a core goal of our work is to develop a robust distinguishing method that can tolerate common content-preserving manipulations.
Robustness can be expressed by the true negative rate when the images have undergone content-preserving manipulations. Generally, we say an algorithm is robust to a certain kind of manipulation if it maintains a satisfactory true negative rate after that manipulation. To test this property, we generate the training sample set using the same method as in Section 4.1, and we generate testing sample sets by applying noise addition, JPEG compression, and histogram equalization, respectively, to 200 randomly selected images from the Columbia University data set. The results shown in Table 4 are the average values over 10 tests, with the training samples and testing samples randomly selected in each test.
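As an illustration, the manipulated testing images of Table 4 could be generated roughly as follows. This is a sketch for grayscale uint8 images; the noise amount, the filter size, and the JPEG quality factors follow the table (the table's "VAR = 0.001" is mapped to the salt and pepper density here), while the specific library calls are our own choice, not the authors'.

```python
import io
import numpy as np
from PIL import Image
from scipy.ndimage import uniform_filter
from skimage import exposure, util

def jpeg_compress(img_uint8, quality):
    """Re-encode a 2-D uint8 grayscale image as JPEG with the given quality factor."""
    buf = io.BytesIO()
    Image.fromarray(img_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.array(Image.open(buf))

def content_preserving_variants(img_uint8):
    """Manipulated copies of one test image, mirroring the factors of Table 4."""
    img_float = img_uint8 / 255.0
    return {
        "salt_pepper": util.random_noise(img_float, mode="s&p", amount=0.001),
        "hist_eq": exposure.equalize_hist(img_float),
        "mean_3x3": uniform_filter(img_float, size=3),
        "jpeg_q30": jpeg_compress(img_uint8, 30),
        "jpeg_q60": jpeg_compress(img_uint8, 60),
        "jpeg_q90": jpeg_compress(img_uint8, 90),
    }
```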
As can be seen from Table 4, our method presents satisfactory detection results under content-preserving manipulations, which means that our method is robust to such manipulations. The method proposed by Gallagher and Chen [12] works well on recognizing original-size and uncompressed photographic images, but it is less robust to JPEG compression and noise, because such manipulations may destroy the interpolation clues caused by the Bayer color filter array.
4.3. Generalization capability analysis
Generally, different image data sets have different influences on the detection results of an algorithm. In real-world applications the data sources are varied, and it is impossible to train a classifier that works for all images from the Internet. The generalization capability of an algorithm refers to its availability across different image databases. High generalization capability means that the true negative rate of the algorithm remains satisfactory and varies only within a small range when the training samples and testing samples come from different sources.
Table 5
Generalization capability for different image databases.

| Training samples | Testing samples | PTN (%) |
|---|---|---|
| Columbia Image Database | UCID (photographs) | 95.00 |
| Columbia Image Database | UCID, Columbia Image Database | 94.00 |
| UCID, Columbia Image Database | Columbia Image Database (photographs, PRCG images) | 95.50 |
| UCID, Columbia Image Database | Columbia Image Database (photographs) | 91.00 |
| UCID, Columbia Image Database | Columbia Image Database (PRCG images) | 94.00 |
| Columbia Image Database | Art-CG gallery database | 95.50 |
| Art-CG gallery database, Columbia Image Database | Columbia Image Database (photographs, PRCG images) | 94.50 |
| Art-CG gallery database, UCID | Columbia Image Database (photographs, PRCG images) | 92.50 |
| Art-CG gallery database, UCID | Columbia Image Database (PRCG images) | 93.00 |
In order to test the generalization capability of the proposed method, our experiments are performed on image sets from UCID [27], the Columbia Image Database [24], and the Art-CG gallery database [28]. The nine experiments are as follows:
(1) Columbia → UCID: Training samples are generated by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and testing samples are generated by randomly taking 200 photographs from UCID.
(2) Columbia → UCID, Columbia: Training samples are generated by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking 100 photographs from UCID and another 100 PRCG images from the Columbia Image Database.
(3) Columbia, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking 100 photographs and 100 PRCG images from the Columbia Image Database.
(4) Columbia, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking 200 photographs from the Columbia Image Database.
(5) Columbia, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking another 200 PRCG images from the Columbia Image Database.
(6) Columbia → Art-CG gallery database: Training samples are generated by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and testing samples are generated by randomly taking 200 PRCG images from the Art-CG gallery database.
(7) Art-CG gallery database, Columbia → Columbia: Training samples are generated by randomly taking 400 photographs from the Columbia Image Database and 400 PRCG images from the Art-CG gallery database, and testing samples are generated by randomly taking 100 photographs and 100 PRCG images from the Columbia Image Database.
(8) Art-CG gallery database, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 CG images from the Art-CG gallery database, and testing samples are generated by randomly taking 100 photographs and 100 PRCG images from the Columbia Image Database.
(9) Art-CG gallery database, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Art-CG gallery database, and testing samples are generated by randomly taking 200 PRCG images from the Columbia Image Database.
Each experiment is repeated 10 times with different random
images. Table 5 shows the statistical average of the true negative
rate.
As can be seen from Table 5, the true negative rates of the proposed algorithm are satisfactory for different data sets, and they vary within a small range when the training samples and testing samples come from different sources. This shows that the proposed method remains available when the training samples and testing samples come from different sources; in other words, its generalization capability is satisfactory.
5. Conclusions
Nowadays, computer imagery and computer-generated images touch many aspects of our daily life. Computer graphics are becoming so photorealistic that they bring new challenges to the credibility of digital images. Identifying photorealistic computer graphics from photographs has become an important topic in the field of media applications.
In this paper, aiming to improve the detection accuracy, robustness, and generalization capability simultaneously, we proposed a new approach to identify photorealistic computer graphics from photographic images. In the proposed method, the Homomorphic filter is used to highlight the image detail information, the statistical features of the Contourlet sub-bands of the image are used to construct the distinguishing features, and the least squares SVM is then used for classification. Experimental results show that the proposed method reaches high detection accuracy; moreover, the algorithm is robust to content-preserving image manipulations, and it has satisfactory generalization capability across different data sets.
Acknowledgments
This work was supported by the National Basic Research Program of China (973 Program) under Grant No. 2012CB316400, and by the National Natural Science Foundation of China under Grant Nos. 61075007 and 61273252. We would like to thank Dr. Zhaohong Li for kindly providing us with their code. We also acknowledge the helpful comments and kind suggestions of the anonymous referees.
References
[1] T. Ng, S. Chang, Classifying photographic and photorealistic computer graphic
images using natural image, Technical Report #220-2006-6, Department of
Electrical Engineering, Columbia University, 2006.
[2] S. Lyu, H. Farid, How realistic is photorealistic?, IEEE Trans Signal Process. 53
(2) (2005) 845–850.
[3] T. Ng, S. Chang, J. Hsu, L. Xie, MP. Tsui, Physics motivated features for
distinguishing photographic images and computer graphics, in: Proceedings of
MULTIMEDIA 05, New York, NY, USA, ACM2005, pp. 239–248.
[4] Y. Wang, P. Moulin, On discrimination between photorealistic and photographic images, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, 14–19 May, 2005, pp. 161–164.
[5] W. Chen, Y.Q. Shi, G.R. Xuan, Identifying computer graphics using HSV color
model and statistical moments of characteristic functions, in: IEEE
International Conference on Multimedia & Expo, Beijing, China, 2–5 July,
2007, pp. 1123–1126.
[6] X. Cui, X. Tong, G. Xuan, Discrimination between photo images and computer
graphics based on statistical moments in the frequency domain of histogram,
in: Chinese Information Hiding Workshop, Nanjing, China, 2007, pp. 276–279.
[7] E. Zheng, X. Ping, Identifying computer graphics from digital photographs
based on coherence of adjacent pixels, J. Comput. Res. Dev. 46 (Suppl. I) (2009)
258–262.
[8] G. Sankar, V. Zhao, Y.H. Yang, Feature based classification of computer graphics
and real images, in: Proc. IEEE International Conference on Acoustics, Speech,
and Signal Processing, Taipei, Taiwan, 2009, pp. 1513–1516.
[9] W. Li, T. Zhang, E. Zheng, X. Ping, Identifying photorealistic computer graphics
using second-order difference statistics, in: Seventh International Conference
on Fuzzy Systems and Knowledge Discovery, Yantai, China, August 10–12,
2010, pp. 2316–2319.
[10] S. Dehnie, T. Senca, N. Memon, Digital image forensics for identifying computer
generated and digital camera images, in: Proceedings of the International
Conference on Image Processing, Atlanta, Georgia, USA, October 8–11, 2006.
[11] A.E. Dirik, S. Bayram, H.T. Sencar, N. Memon, New features to identify
computer generated images, in: Proceedings of the International Conference
on Image Processing, San Antonio, Texas, USA, September 16–October 19,
2007, pp. IV-433–IV-436.
[12] Andrew C. Gallagher, Tsuhan Chen, Image authentication by detecting traces
of demosaicing, in: Proceedings of the CVPR WVU Workshop, Anchorage, AK,
USA, June 2008, pp. 1–8.
[13] Z. Li, J. Ye, Y.Q. Shi, Distinguishing computer graphics from photographic
images using local binary patterns, in: International Workshop on Digital
Watermarking, 2012, pp. 228–241.
[14] http://www.creative-3d.net, http://www.3dlinks.com.
[15] J. Friedman, T. Hastie, Additive logistic regression: a statistical view of
boosting, Ann. Stat. 28 (2) (2000) 337–407.
[16] R. Zhang, R.D. Wang, T.T. Ng, Distinguishing photographic images and
photorealistic computer graphics using visual vocabulary on local image
edges, in: IWDW2011, Springer, LNCS7128, pp. 292–305.
[17] H. Farid, M.J. Bravo, Perceptual discrimination of computer generated and
photographic faces, Digital Invest. 8 (3–4) (2012) 226 (10).
[18] D.T. Dang Nguyen, G. Boato, F. De Natale, Discrimination between computer
generated and natural human faces based on asymmetry information, in: 20th
European Signal Processing Conference, Bucharest, August 27–31, 2012, pp.
1234–1238.
[19] D.-T. Dang-Nguyen, G. Boato, F.G.B.D. Natale, Identify computer generated
characters by analysing facial expressions variation, in: IEEE International
Workshop on Information Forensics and Security, 2012, pp. 252–257.
[20] Tian-Tsong Ng, Shih-Fu Chang, Discrimination of computer synthesized or
recaptured images from real images, Digital Image Forensics, 2013, p. 275–
309.
[21] M.N. Do, M. Vetterli, The contourlet transform: an efficient directional
multiresolution image representation, IEEE Trans. Image Process. 14 (12)
(2005) 2091–2106.
[22] D.D. Po, M.N. Do, Directional multiscale modeling of images using the
contourlet transform, IEEE Trans. Image Process. 15 (6) (2006) 1610–1620.
[23] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin
classifiers, in: D. Haussler (Ed.), 5th Annual ACM Workshop on COLT, PA, ACM
Press, Pittsburgh, 1992, pp. 144–152.
[24] T.T. Ng, S.F. Chang, J. Hsu, M. Pepeljugoski, Columbia Photographic Images and
Photorealistic Computer Graphics Dataset, Department of Electrical
Engineering, Columbia University, Advent Technical Report #205-2004-5,
February 2005.
[25] A.E. Dirik, N. Memon, Image tamper detection based on demosaicing artifacts,
in: Proc. IEEE ICIP, 2009, pp. 1497–1500.
[26] L. Fillatre, Adaptive steganalysis of least significant bit replacement in
grayscale natural images, IEEE Trans. Signal Process. 60 (2) (2012) 556–569.
[27] UCID, http://www-staff.lboro.ac.uk/~cogs/datasets/UCID/ucid.html.
[28] Art-CG Gallery Database, <http://cggallery.itsartmag.com/>.