Computer Vision and Image Understanding 128 (2014) 84–93

A statistical feature based approach to distinguish PRCG from photographs ☆

Xiaofeng Wang a,*, Yong Liu a, Bingchao Xu a, Lu Li b, Jianru Xue b

a School of Science and Shaanxi Key Laboratory for Network Computing and Security Technology, Xi'an University of Technology, Xi'an, Shaanxi 710048, PR China
b Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, PR China

☆ This work was supported by the National Basic Research Program of China (973 Program) under Grant No. 2012CB316400 and the National Natural Science Foundation of China under Grant Nos. 61075007 and 61273252.
* Corresponding author. E-mail address: xfwang66@sina.com.cn (X. Wang).
http://dx.doi.org/10.1016/j.cviu.2014.07.007

Article history: Received 24 June 2013; Accepted 15 July 2014; Available online 24 July 2014

Keywords: Distinguishing PRCG from photographs; Contourlet transformation; Homomorphic filtering; Detection accuracy; Robustness; Generalization capability

Abstract: We present a passive forensics method to distinguish photorealistic computer graphics (PRCG) from natural images (photographs). The goals of our work are to improve detection accuracy and robustness to content-preserving image manipulations. In the proposed method, homomorphic filtering is used to highlight the detail information of an image. We find that texture changes differently for photographs and for PRCG under the same homomorphic filtering transformation, and we use difference matrices to describe these texture changes. We define a customized statistical feature, named texture similarity, and combine it with statistical features extracted from the co-occurrence matrices of the difference matrices to construct forensic features. We then develop a statistical model and use an SVM as the classifier to distinguish PRCG from photographs. Experimental results show that the proposed method has the following advantages: (1) it reaches high detection accuracy while remaining robust to content-preserving manipulations such as JPEG compression, noise addition, histogram equalization, and filtering; (2) it has satisfactory generalization capability and remains usable when the training samples and the testing samples come from different sources. © 2014 Elsevier Inc. All rights reserved.

1. Introduction

With the rapid growth and widespread use of digital cameras and mobile phones, photos and pictures can be obtained anytime and anywhere. At the same time, the rapid development of computer vision and computer graphics technologies has made it ever more convenient to generate photorealistic computer graphics (PRCG). These PRCG are so realistic that it is very difficult to differentiate them from photographs by eye (see Fig. 1). The old adage "seeing is believing" is no longer true. This trend has brought new issues and challenges concerning the authenticity of digital images. Therefore, identifying PRCG from photographs has become an important topic in the field of media applications.

Fig. 1. The pictures in the top row are PRCG, and the pictures in the bottom row are digital photographs.

Photographs are pictures that are captured from the real world by an imaging device such as a camera. PRCG are images of scenes that do not exist in reality; they are generated only by computers
and graphic software through the following process: first, a 3D polygon model is built to simulate a desired shape; then color, texture, and simulated lighting are applied to the model; finally, the PRCG image is rendered with a virtual camera. Recent advances in computer graphics have produced an amazing result: it is now possible to create models that are physically accurate, without visual artifacts or lighting inconsistencies. Therefore, distinguishing PRCG from photographs is a very challenging problem.

Many methodologies to distinguish PRCG from photographs have been established in recent years. The majority of existing methods in the literature are based on statistical learning models, in which features capturing the inconsistencies between photographs and PRCG are used for classification. Therefore, the main difference between existing approaches lies in the feature extraction, and feature extraction methods can be classified into two categories: the first is based on transform domains, and the second is based on the physical characteristics of the imaging equipment. Surveying the existing methods, some problems remain, such as low detection precision, weak robustness, and unsatisfactory stability.

In our work, we aim at improving detection accuracy, robustness, and generalization capability, and we seek new features that are able to distinguish photographs from PRCG. We find that texture changes differently for photographs and computer-generated images under the same homomorphic filtering transformation, so we use homomorphic filtering to highlight the image detail information. Considering that the Contourlet transform has multi-direction selectivity and anisotropy, and can be used to describe the inherent characteristics of an image, we use the statistics of the Contourlet sub-bands to construct the distinguishing features. We define a customized statistical feature, named texture similarity, and combine it with the statistical features extracted from the co-occurrence matrices of the difference matrices to construct forensic features. We then develop a statistical model and use the least squares SVM as a classifier to separate PRCG from photographs. Experimental results show that the proposed method not only possesses satisfactory detection accuracy, but is also robust to content-preserving manipulations, and provides satisfactory generalization capability across different image data sets.

The rest of this paper is organized as follows. In Section 2, we review related work. The proposed method is described in Section 3. Section 4 contains experimental results. Finally, a short summary is presented in Section 5.

2. Related works

The earliest approaches to distinguish photographs from PRCG were proposed by Ng and Chang [1] and Lyu and Farid [2], respectively. Work [1] presented a classification approach that used natural image statistics features composed of 33-dimensional (33-D) power spectrum features, 24-D local patch features, and 72-D wavelet higher-order statistics.
Subsequently, a geometry-based approach [3] was proposed by the same authors to model the physical differences between PRCG and photographs, in which 192-D features were extracted by analyzing local patch statistics, local fractal dimension, surface gradient, quadratic geometry, and Beltrami flow. Lyu and Farid [2] developed a statistical model based on a wavelet-like decomposition, in which 216-D features were generated using the first four order statistics (mean, variance, skewness, and kurtosis) of the sub-band coefficients and the inter-sub-band prediction errors. Another wavelet transform based method was proposed by Wang and Moulin [4], where 144-D features were extracted from the wavelet coefficient histogram. A similar method was presented by Chen et al. [5], in which 234-D features were generated using the statistical moments of image characteristic functions and the wavelet sub-band coefficients in the HSV color space. Cui et al. [6] used a similar method, in which three frequency-domain moments of the wavelet coefficient histogram were extracted, and the resulting 78-D features were used as classification features; the detection rate of this method is 94%. Zheng and Ping [7] used 81-D features generated from an adjacent-pixel coherence histogram and 120-D co-occurrence matrix features extracted from the HSV color space as distinguishing features to identify PRCG from photographs. Sankar et al. [8] proposed an approach based on an aggregate of existing features and texture interpolation features to distinguish PRCG from real images. In their work, 80-D features were generated, of which 44-D came from the moment-based method, 24-D from the texture interpolation method, 6-D from the color histogram, and 6-D from patch statistics. Li et al. [9] proposed a distinguishing method in which the second-order difference statistics of the images and the first four order statistics of their prediction error images were extracted to form 144-D distinguishing features.

Since the majority of real images are obtained by imaging equipment, some physical characteristics of the equipment always remain after the imaging process, and these characteristics can be used to distinguish digital photographs from PRCG. Dehnie et al. [10] proposed an approach that used the different pattern noise of photographs and PRCG as distinguishing features. Dirik et al. [11] identified photographic images by detecting the Bayer color filter array (CFA) and the traces of chromatic aberration. Considering that most digital cameras employ an image sensor with a color filter array, and that the demosaicing process interpolates the raw image to produce an estimate of each color channel at each pixel, Gallagher and Chen presented a method [12] to distinguish PRCG from photographs by detecting the interpolation traces caused by the Bayer color filter array. This algorithm reaches a detection accuracy of 98.4% on the Columbia open data set, almost the highest among similar algorithms on that data set. However, the algorithm is less robust to JPEG compression and noise, because such manipulations may destroy the interpolation traces caused by the Bayer color filter array. Another high-accuracy method was proposed by Li et al. [13]; it was reported that the detection accuracy can reach as high as 98.3% for distinguishing computer graphics (CG) from photographic images on their own data set, in which the computer graphics were collected from [14,15] and the photographic images were collected by the authors. Zhang et al.
[16] proposed a method that distinguishes PRCG from photographic images using a visual vocabulary built on local image edges. They preprocessed image edge patches, projected them onto a 7-D sphere, and then constructed a visual vocabulary by determining the key sampling points according to Voronoi cells. Recently, Farid and Bravo [17] and Dang-Nguyen et al. [18,19] presented methods that can discriminate computer-generated from natural human faces. Literature [20] presents further technical approaches, system designs, and other practical issues in tackling the distinction between photographic images and PRCG.

3. Proposed method

Although PRCG are so realistic that they are very difficult to differentiate from photographs by eye, the imaging mechanisms and imaging processes of the two are completely dissimilar. The imaging process of a photograph is very complex, since it is inevitably influenced by both the natural environment and the acquisition equipment, while the process of generating a PRCG only crudely simulates the imaging process of a photograph. These differences lead to different characteristics in many aspects, especially in texture detail. Generally speaking, the texture details of photographs appear smooth, while those of PRCG appear jittery [9]. In addition, photographs are always influenced by the natural environment: if the illumination is uneven, the average brightness of the various parts of the image fluctuates, and the detail structure corresponding to dark regions becomes difficult to recognize. All these differences can be used to distinguish PRCG from photographs. To this end, we keep seeking ways to capture and describe the essential differences between photographs and PRCG.

In digital image processing, homomorphic filtering is derived from an illumination–reflectance model of the image, and it can be used to perform dynamic range compression and contrast enhancement simultaneously. For photographs, after homomorphic filtering the intensity variation is reduced while the details are highlighted. However, such a change is not obvious for PRCG after the same homomorphic filtering. The reason is that natural texture is expressed differently from the texture of PRCG. See Fig. 2, in which (a) and (b) are photographs and (c) and (d) are PRCG. We can see that after homomorphic filtering, the texture details of the photograph in shaded areas become apparent, but the PRCG does not show such a change. For this reason, we can use the homomorphic filter to capture the detail difference between photographs and PRCG.

Fig. 2. The visual effects of a photograph and a PRCG image before and after homomorphic filtering: (a) the original photograph; (b) the photograph after homomorphic filtering; (c) the original PRCG image; (d) the PRCG image after homomorphic filtering.

In addition, the Contourlet transform [21,22] is able to provide a sparse expansion for typical images that are piecewise smooth away from smooth contours. The main difference between the Contourlet transform and the wavelet transform is that the latter cannot capture the geometrical smoothness of contours. Motivated by this fact, the statistics of the Contourlet sub-bands are used to construct distinguishing features, and the least squares SVM is used as the classifier. Our method can be divided into two stages, feature extraction and classification; the framework is shown in Fig. 3.

Fig. 3. The framework of the proposed method.
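For readers who want to experiment with the preprocessing stage of Fig. 3, the sketch below shows one common way to implement homomorphic filtering (log domain, FFT, Gaussian high-emphasis filter, exponentiation). The paper does not specify its filter or its parameters, so the transfer function, the gamma values, and the cutoff below are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def homomorphic_filter(img, gamma_low=0.5, gamma_high=2.0, cutoff=30.0):
    """Illumination-reflectance based homomorphic filtering of a grayscale image:
    attenuate the low-frequency (illumination) component and boost the
    high-frequency (reflectance/detail) component.
    Parameter values are illustrative assumptions, not the paper's settings."""
    f = img.astype(np.float64) + 1.0                  # avoid log(0)
    log_img = np.log(f)
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))

    rows, cols = f.shape
    u = np.arange(rows) - rows / 2.0
    v = np.arange(cols) - cols / 2.0
    dist2 = u[:, None] ** 2 + v[None, :] ** 2         # squared distance from the centre
    # Gaussian high-emphasis transfer function (assumed form).
    H = (gamma_high - gamma_low) * (1.0 - np.exp(-dist2 / (2.0 * cutoff ** 2))) + gamma_low

    filtered = np.fft.ifft2(np.fft.ifftshift(H * spectrum)).real
    return np.exp(filtered) - 1.0
```

In the framework of Fig. 3, the output of such a step plays the role of the filtered image I', which is then decomposed with the same Contourlet transform as the original image I.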
3.1. Feature extraction

For an N × N image I, we take a 2-level Contourlet decomposition, which decomposes the image into a low-frequency sub-band and band-pass directional sub-bands (see Fig. 4). We denote the corresponding coefficient matrices as $M^l$, $M^m_s$, and $M^h_t$, respectively; the corresponding sub-band images are indicated in Fig. 4.

Fig. 4. An example of the Contourlet transform ($M^l$; $M^m_s$, s = 1, 2, 3, 4; $M^h_t$, t = 1, 2, ..., 8; k = 1, 2, ..., 13): the test image is decomposed into two LP levels, which are then decomposed into four and eight directional sub-bands. The test image comes from the Columbia open data set.

$$M^l = \begin{bmatrix} m^l_{11} & m^l_{12} & \cdots & m^l_{1j} \\ m^l_{21} & m^l_{22} & \cdots & m^l_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ m^l_{i1} & m^l_{i2} & \cdots & m^l_{ij} \end{bmatrix} \tag{1}$$

$$M^m_s = \begin{bmatrix} m^m_{11} & \cdots & m^m_{1q} \\ m^m_{21} & \cdots & m^m_{2q} \\ \vdots & \ddots & \vdots \\ m^m_{p1} & \cdots & m^m_{pq} \end{bmatrix}, \quad s = 1, 2, 3, 4 \tag{2}$$

$$M^h_t = \begin{bmatrix} m^h_{11} & m^h_{12} & \cdots & m^h_{1y} \\ m^h_{21} & m^h_{22} & \cdots & m^h_{2y} \\ \vdots & \vdots & \ddots & \vdots \\ m^h_{x1} & m^h_{x2} & \cdots & m^h_{xy} \end{bmatrix}, \quad t = 1, 2, \ldots, 8 \tag{3}$$

Let Hom(·) represent a convolution homomorphic filtering transformation, and let I' = Hom(I). Then I and I' can be written as:

$$I = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix} \tag{4}$$

$$I' = \begin{bmatrix} a'_{11} & a'_{12} & \cdots & a'_{1N} \\ a'_{21} & a'_{22} & \cdots & a'_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a'_{N1} & a'_{N2} & \cdots & a'_{NN} \end{bmatrix} \tag{5}$$

We take the same 2-level Contourlet decomposition of the image I', and let $M'^l$, $M'^m_s$, and $M'^h_t$ represent its coefficient matrices, respectively:

$$M'^l = \begin{bmatrix} m'^l_{11} & m'^l_{12} & \cdots & m'^l_{1j} \\ m'^l_{21} & m'^l_{22} & \cdots & m'^l_{2j} \\ \vdots & \vdots & \ddots & \vdots \\ m'^l_{i1} & m'^l_{i2} & \cdots & m'^l_{ij} \end{bmatrix} \tag{6}$$

$$M'^m_s = \begin{bmatrix} m'^m_{11} & \cdots & m'^m_{1q} \\ m'^m_{21} & \cdots & m'^m_{2q} \\ \vdots & \ddots & \vdots \\ m'^m_{p1} & \cdots & m'^m_{pq} \end{bmatrix}, \quad s = 1, 2, 3, 4 \tag{7}$$

$$M'^h_t = \begin{bmatrix} m'^h_{11} & m'^h_{12} & \cdots & m'^h_{1y} \\ m'^h_{21} & m'^h_{22} & \cdots & m'^h_{2y} \\ \vdots & \vdots & \ddots & \vdots \\ m'^h_{x1} & m'^h_{x2} & \cdots & m'^h_{xy} \end{bmatrix}, \quad t = 1, 2, \ldots, 8 \tag{8}$$

In order to represent the difference in texture detail between photographs and PRCG, we define four categories of difference matrices, $D_I$, $D^l$, $D^m_s$, and $D^h_t$:

$$D_I = I - I' \tag{9}$$

$$D^l = M^l - M'^l \tag{10}$$

$$D^m_s = M^m_s - M'^m_s, \quad s = 1, 2, 3, 4 \tag{11}$$

$$D^h_t = M^h_t - M'^h_t, \quad t = 1, 2, \ldots, 8 \tag{12}$$

The Gray Level Co-occurrence Matrix (GLCM) is the co-occurrence distribution of the image pixels: it is defined over an image as the distribution of co-occurring pixel values at a given offset, and it can therefore be used to capture the properties of image texture. Mathematically, a co-occurrence matrix is a square matrix G of size L, where L is the total number of gray levels in an N × M gray image I. The (i, j)-th entry of G is parameterized by an offset (Δx, Δy) as:

$$g(i, j) = \sum_{p=1}^{N} \sum_{q=1}^{M} \begin{cases} 1, & \text{if } I(p, q) = i \text{ and } I(p + \Delta x, q + \Delta y) = j \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

where i and j are image intensity values, p and q are spatial positions in the image I, and the offset (Δx, Δy) depends on the direction θ at which the matrix is computed. In our scheme, the parameters of the gray-level co-occurrence matrix are chosen as follows. Assuming the gray levels of the gray image run from 0 to L − 1, the offset parameter Δx is set to 1 and Δy is set to 0, namely θ = 0°; this means that only the nearest neighbors in the horizontal direction are considered in the GLCM calculation.

For the 14 difference matrices $D_I$, $D^l$, $D^m_s$ (s = 1, 2, 3, 4), and $D^h_t$ (t = 1, 2, ..., 8), the corresponding co-occurrence matrices $G_k$ (k = 0, 1, 2, ..., 13) can be written as:

$$G_k = \begin{bmatrix} g_k(1, 1) & \cdots & g_k(1, j) & \cdots & g_k(1, L) \\ \vdots & & \vdots & & \vdots \\ g_k(i, 1) & \cdots & g_k(i, j) & \cdots & g_k(i, L) \\ \vdots & & \vdots & & \vdots \\ g_k(L, 1) & \cdots & g_k(L, j) & \cdots & g_k(L, L) \end{bmatrix}, \quad k = 0, 1, 2, \ldots, 13 \tag{14}$$

When k = 0, $G_k$ represents the GLCM of the difference matrix $D_I = I - I'$. Co-occurrence matrices capture the features of the texture, but they are not directly useful for further analysis, such as the comparison of two image textures. In order to represent the texture details more compactly, we estimate the following five numeric features (statistics) from each of the above co-occurrence matrices.

(1) Energy. It is the quadratic sum of the elements of the GLCM:

$$f^1_k = \sum_{i=1}^{L} \sum_{j=1}^{L} \left( g_k(i, j) \right)^2 \tag{15}$$

(2) Contrast. It is a measure of the intensity contrast between a pixel and its neighboring pixels over the whole image:

$$f^2_k = \sum_{i=1}^{L} \sum_{j=1}^{L} (i - j)^2 \, g_k(i, j) \tag{16}$$

(3) Homogeneity. It measures the tightness of the distribution of the GLCM elements around the diagonal:

$$f^3_k = \sum_{i=1}^{L} \sum_{j=1}^{L} \frac{g_k(i, j)}{1 + |i - j|} \tag{17}$$

(4) The maximum of the column means:

$$R_{kj} = \frac{\sum_{i=1}^{L} g_k(i, j)}{L}, \quad j = 1, 2, 3, \ldots, L \tag{18}$$

$$f^4_k = \max_j R_{kj} \tag{19}$$

(5) Texture similarity. In order to quantitatively measure the texture change before and after homomorphic filtering, we define a term named "texture similarity". We compute the texture similarity of the difference matrices $D_I$, $D^l$, $D^m_s$, and $D^h_t$. For simplicity, we denote the difference matrices uniformly as $D_k$, k = 0, 1, 2, 3, ..., 13, and estimate the texture similarity of each difference matrix in turn. For the difference matrices $D_k$ (k = 0, 1, 2, 3, ..., 13):

$$D_k = \begin{bmatrix} d^k_{11} & \cdots & d^k_{1j} & \cdots & d^k_{1N} \\ \vdots & & \vdots & & \vdots \\ d^k_{i1} & \cdots & d^k_{ij} & \cdots & d^k_{iN} \\ \vdots & & \vdots & & \vdots \\ d^k_{N1} & \cdots & d^k_{Nj} & \cdots & d^k_{NN} \end{bmatrix} = \begin{bmatrix} A^T_{k1} & \cdots & A^T_{kj} & \cdots & A^T_{kN} \end{bmatrix} \tag{20}$$

Here, $A_{kj} = \{ d^k_{1j}, d^k_{2j}, \ldots, d^k_{Nj} \}$. We define an equivalence relation R on the set $A_{kj}$:

$$d^k_{ij} \; R \; d^k_{i'j} \iff d^k_{ij} = d^k_{i'j} \;\wedge\; d^k_{ij}, d^k_{i'j} \in A_{kj} \tag{21}$$

It means that $d^k_{ij}$ and $d^k_{i'j}$ satisfy the equivalence relation R if and only if the following two conditions hold: $d^k_{ij} = d^k_{i'j}$, and $d^k_{ij}, d^k_{i'j} \in A_{kj}$. The symbol "⟺" means "if and only if" ("is equivalent to"), and "∧" means "and". Then the equivalence class of each element of $A_{kj}$ can be written as:

$$\forall d^k_{ij} \in A_{kj}, \quad \left[ d^k_{ij} \right]_R = \left\{ d^k_{i'j} \in A_{kj} \wedge d^k_{ij} \; R \; d^k_{i'j} \right\} \tag{22}$$

Let

$$E_{kj} = \arg\max_i \left[ d^k_{ij} \right]_R \tag{23}$$

Here, ‖·‖ denotes the operator that computes the cardinality of a set. Let

$$f^5_k = \sum_{j=1}^{N} \left\| E_{kj} \right\| \tag{24}$$

We name $f^5_k$ the "texture similarity". The 70-dimensional feature vector is then defined as:

$$V = \left\{ f^1_0, f^2_0, f^3_0, f^4_0, f^5_0, \; f^1_1, f^2_1, f^3_1, f^4_1, f^5_1, \; \ldots, \; f^1_{13}, f^2_{13}, f^3_{13}, f^4_{13}, f^5_{13} \right\} \tag{25}$$
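To make the feature construction concrete, the sketch below computes the five statistics of Eqs. (15)–(19) and (24) for a single difference matrix D_k. It assumes the difference matrix is quantized to L integer gray levels before the GLCM is built, which the paper does not spell out, and the function names and the value L = 64 are our own illustrative choices; the homomorphic filtering and Contourlet decomposition that produce the 14 difference matrices are outside this snippet.

```python
import numpy as np

def quantize(d, L=64):
    """Map a real-valued difference matrix onto integer levels 0..L-1 (assumed step)."""
    return np.floor((d - d.min()) / (np.ptp(d) + 1e-12) * (L - 1)).astype(int)

def glcm(q, L=64):
    """Horizontal-offset (dx=1, dy=0) gray-level co-occurrence matrix, Eq. (13)."""
    g = np.zeros((L, L))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        g[i, j] += 1
    return g

def glcm_stats(g):
    """Energy, contrast, homogeneity and maximum column mean, Eqs. (15)-(19)."""
    L = g.shape[0]
    i, j = np.indices((L, L))
    energy = np.sum(g ** 2)
    contrast = np.sum((i - j) ** 2 * g)
    homogeneity = np.sum(g / (1.0 + np.abs(i - j)))
    max_col_mean = g.mean(axis=0).max()          # Eq. (19): max over columns of Eq. (18)
    return [energy, contrast, homogeneity, max_col_mean]

def texture_similarity(q):
    """Eq. (24): for each column, take the cardinality of the largest equivalence
    class of equal values (Eqs. (21)-(23)) and sum these sizes over the columns."""
    return float(sum(np.bincount(q[:, j]).max() for j in range(q.shape[1])))

def features_for_matrix(d, L=64):
    """Five statistics f1_k .. f5_k for one difference matrix D_k."""
    q = quantize(d, L)
    return glcm_stats(glcm(q, L)) + [texture_similarity(q)]

# The 70-D vector V of Eq. (25) concatenates these five numbers over the 14
# difference matrices D_I, D^l, D^m_s (s=1..4) and D^h_t (t=1..8):
# V = sum((features_for_matrix(Dk) for Dk in difference_matrices), [])
```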
3.2. Classification

The Support Vector Machine (SVM) [23] is a state-of-the-art classification method based on statistical learning theory, and it has been applied successfully in text classification, image recognition, and biological information processing. In our work, we take the feature vector V as the distinguishing feature, use the LS-SVM classifier to distinguish photographs from PRCG, and use the radial basis function (RBF) as the kernel function. To train the classifier, we label PRCG as "1" and photographs as "−1", use the feature vector V as the classification feature, and then feed all of them into the LS-SVM to obtain the classifier through learning. This process is illustrated in Fig. 5.

Fig. 5. Training and testing process of the LS-SVM classifier.
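A minimal sketch of this training and testing process is given below. It uses scikit-learn's SVC with an RBF kernel as a stand-in for the LS-SVM used in the paper (a true least squares SVM solves a slightly different optimization problem), and the function names, labels, and parameter values are illustrative assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def train_classifier(features_prcg, features_photo):
    """features_prcg, features_photo: arrays of shape (n_samples, 70),
    the 70-D vectors V of Eq. (25) for PRCG and photograph training images."""
    X = np.vstack([features_prcg, features_photo])
    y = np.hstack([np.ones(len(features_prcg)),        # PRCG labeled +1
                   -np.ones(len(features_photo))])     # photographs labeled -1
    # Standardize the features, then fit an RBF-kernel SVM (stand-in for LS-SVM).
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
    clf.fit(X, y)
    return clf

def true_negative_rate(clf, X_test, y_test):
    """True negative rate as used in Section 4.1: the fraction of test samples
    assigned to their true class, expressed as a percentage."""
    return float(np.mean(clf.predict(X_test) == y_test)) * 100.0
```

For detection, the 70-D feature vector of an unknown image is passed to clf.predict, and the sign of the output gives the predicted class.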
4. Experimental results and analysis

In this section, we use experimental results to demonstrate the performance of the proposed method. Our algorithm is implemented and tested in MATLAB 2010a, and the experiments are performed on a computer with a Pentium(R) Dual-Core E5400 CPU @ 2.70 GHz and 2.00 GB RAM.

4.1. Detection accuracy and performance analysis

The goal of this experiment is to test the detection accuracy and analyze the performance of the proposed algorithm. To this end, we formalize the definitions of the true negative rate and the true positive rate as follows:

$$P_{TN} = \text{True negative rate} = \frac{\text{number of natural images (CG) detected as natural images (CG)}}{\left\| \text{testing sample set} \right\|} \times 100\% \tag{26}$$

$$P_{TP} = \text{True positive rate} = \frac{\text{number of natural images (CG) detected as CG (natural images)}}{\left\| \text{testing sample set} \right\|} \times 100\% \tag{27}$$

Here, ‖·‖ denotes the operator that computes the cardinality of a set. The true negative rate represents the detection accuracy of the algorithm: it is the ratio of photographs detected as photographs and PRCG detected as PRCG.

In order to test this index, we performed our experiments on the Columbia University Image Database [24]. The tested images contain a variety of content with various brightness levels, textures, and fine details. We randomly take 400 photographs and 400 PRCG as training samples, randomly take another 100 photographs and 100 PRCG as testing samples, and use the LS-SVM as the classifier. Let PTN(PG/CG) denote the detection rate when the testing samples contain both photographs and PRCG, PTN(PG) the detection rate when the testing samples are photographs, and PTN(CG) the detection rate when the testing samples are PRCG. Table 1 presents 10 detection results and a comparison with [6,12]; in each test, the training samples and the testing samples are randomly selected.

Table 1
Detection rates (%) and comparison results.

                  Proposed method                 Cui et al. [6]                  Gallagher and Chen [12]
Experiment index  PTN(PG)  PTN(CG)  PTN(PG/CG)    PTN(PG)  PTN(CG)  PTN(PG/CG)    PTN(PG)  PTN(CG)  PTN(PG/CG)
1                 96.5     94.5     97.5          91.0     83.5     84.0          88.0     95.5     90.0
2                 97.5     95.5     96.5          92.0     84.5     88.5          89.0     97.5     88.5
3                 98.0     96.5     96.0          91.5     80.0     85.0          86.0     96.5     87.0
4                 95.5     95.0     96.5          92.5     86.0     88.5          89.5     95.0     86.0
5                 97.5     97.5     96.0          89.5     85.0     86.0          86.0     98.0     85.5
6                 96.0     96.5     97.0          86.0     82.0     83.5          85.0     94.5     86.0
7                 96.5     95.0     95.5          91.5     80.0     83.5          83.0     95.0     86.5
8                 97.5     96.5     97.5          89.5     85.5     84.0          84.0     94.5     87.0
9                 95.5     95.0     96.0          88.5     88.5     85.5          83.5     94.0     84.0
10                98.0     96.5     98.0          87.0     82.0     84.5          82.5     97.0     91.0

As can be seen from Table 1, the true negative rate of the proposed method is satisfactory whether the testing samples contain photographs, PRCG images, or both. The proposed method is superior to method [6] in terms of detection rate; it is superior to method [12] when the testing samples contain only photographs or contain both photographs and PRCG, and slightly below method [12] when the testing samples contain only PRCG.

In order to confirm the availability of the proposed method, we tested 20 pairs of true negative and true positive rates of the proposed method as well as of [6,12]. In this experiment, we generated the training samples by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and generated the testing samples by taking another 10n photographs and 10n PRCG images, respectively, where n = 1, ..., 20, using the LS-SVM as the classifier to obtain PTN and PTP. The results are shown in Table 2.

Table 2
PTN and PTP (%) and comparison results.

(a) Testing samples are photographs
     Proposed method    Cui et al. [6]     Gallagher and Chen [12]
n    PTN      PTP       PTN      PTP       PTN      PTP
1    100.00   0.00      100.00   0.00      100.00   0.00
2    100.00   0.00      95.00    5.00      100.00   0.00
3    100.00   0.00      96.67    3.33      96.70    3.30
4    100.00   0.00      95.00    5.00      97.50    2.50
5    100.00   0.00      92.00    8.00      94.00    6.00
6    98.67    1.33      91.60    8.40      90.00    10.00
7    97.14    2.86      85.71    14.92     98.60    1.40
8    96.67    3.33      86.25    13.75     92.50    7.50
9    97.78    2.22      87.78    12.22     94.40    5.60
10   98.00    2.00      91.00    9.00      97.00    3.00
11   97.27    2.73      88.18    11.82     94.50    5.50
12   97.50    2.50      87.50    12.50     93.30    6.70
13   99.23    0.77      90.00    10.00     93.10    6.90
14   99.29    0.71      90.00    10.00     93.60    6.40
15   98.00    2.00      91.33    8.67      94.00    6.00
16   99.38    0.62      91.88    8.12      92.50    7.50
17   97.06    2.94      90.59    9.41      92.90    7.10
18   98.89    1.11      92.22    7.78      96.10    3.90
19   97.89    2.11      92.60    7.40      94.20    5.80
20   98.00    2.00      92.00    8.00      93.50    6.50

(b) Testing samples are PRCG images
     Proposed method    Cui et al. [6]     Gallagher and Chen [12]
n    PTN      PTP       PTN      PTP       PTN      PTP
1    100.00   0.00      100.00   0.00      100.00   0.00
2    100.00   0.00      100.00   0.00      100.00   0.00
3    100.00   0.00      96.67    3.33      100.00   0.00
4    100.00   0.00      95.00    5.00      100.00   0.00
5    100.00   0.00      90.00    10.00     100.00   0.00
6    100.00   0.00      90.50    9.50      98.30    1.70
7    100.00   0.00      87.14    12.86     97.10    2.90
8    98.75    1.25      85.00    15.00     97.50    2.50
9    98.89    1.11      84.44    15.56     97.80    2.20
10   99.00    1.00      81.00    19.00     98.00    2.00
11   99.09    0.91      90.00    10.00     98.10    1.90
12   99.17    0.83      85.83    14.17     98.30    1.70
13   97.00    3.00      85.38    14.62     97.70    2.30
14   96.43    3.57      82.86    17.14     97.10    2.90
15   96.00    4.00      80.67    19.33     96.70    3.30
16   96.63    3.37      85.63    14.37     96.90    3.10
17   96.47    3.53      84.12    15.88     97.60    2.40
18   96.89    3.11      84.44    15.56     97.80    2.20
19   96.32    3.68      84.55    15.45     97.90    2.10
20   97.50    2.50      84.50    15.50     98.00    2.00

Literature [20] lists the classification performance of various methods on different data sets. We compare some results on the Columbia open data set; the results are listed in Table 3. Other methods, such as [18,19], also achieve high classification accuracy, but they distinguish between computer-generated and natural faces based on facial expressions; since they are aimed at face images, it is not appropriate to test the accuracy of [18,19] on our data set. Work [13] reported that the detection accuracy can reach as high as 98.3% for distinguishing computer graphics (CG) from photographic images on their own data set, in which the computer graphics were collected from [14,15] and the photographic images were collected by the authors. However, in our experiments, the highest achievable accuracy was 87.55% when running their code on our data sets. We further investigated the feature used in [13]. Work [13] uses the statistical difference of uniform gray-scale invariant local binary patterns (LBP) to distinguish PRCG from photographs: the original JPEG coefficients of the Y and Cr components and their prediction errors are used for the LBP calculation, 236-D LBP features are obtained, and an SVM is used for classification. LBP is a kind of
typical texture feature that depends strongly on the image texture characteristics and the image resolution. Because work [13] uses the LBP feature directly, its detection results inevitably depend closely on the image data set used. Moreover, the image resolution used in [13] is restricted to between 500 × 500 and 800 × 800, while the image resolutions in the Columbia open data set range from 757 × 568 to 1152 × 768. All of these are possible reasons for the decline in accuracy when method [13] is run on our data set. Such situations are very common. For example, work [25] reported an accuracy of 99.33%, yet work [13] reported that this accuracy declined to 69.94% when tested on the data set of [13]. Keeping a steady accuracy across different databases is a mark of good algorithm performance; we discuss this further in Section 4.3.

Table 3
Classification accuracy comparison.

Work                      Approach to feature extraction                                Feature dimension   Data set                 Highest classification accuracy (%)
Ng et al. [3]             Fractal geometry, differential geometry                       192                 Columbia open data set   83.5
Chen et al. [5]           Statistical moments, wavelet coefficients                     234                 Columbia open data set   82.1
Cui et al. [6]            Statistical moments, wavelet coefficients                     78                  Columbia open data set   94.0
Sankar et al. [8]         Combining features                                            557                 Columbia open data set   90.0
Gallagher and Chen [12]   Interpolation clues related to the Bayer color filter array   1                   Columbia open data set   98.4
Proposed method           Statistical feature                                           70                  Columbia open data set   98.0

To illustrate the superiority of our method in a more convincing way, we present comparison results only against methods that use the Columbia open data set. We compare our method with [12] in detail, because method [12] reaches the highest accuracy on the Columbia open data set. Receiver Operating Characteristic (ROC) curves are the most common performance evaluation tool for detection algorithms.
However, the recent state of the art [26] reported that, in the presence of composite hypotheses (due to unknown photographs and PRCG images), a ROC curve is not sufficient to summarize the performance of a detector. The reason is that ROC curves are especially relevant for testing two simple hypotheses. In the situation of discriminating PRCG images from photographs, strictly speaking, a ROC curve has to be calculated for each possible photograph and PRCG image, except when it is theoretically established that the studied test is independent of the image content. Instead of using a ROC curve, we therefore show the distribution of the true positive rate and compare it with some existing works. The results are shown in Fig. 6.

Fig. 6. The plot and comparison of the true positive rate. The left panel shows the true positive rate when the testing samples are photographs; the right panel shows the true positive rate when the testing samples are PRCG images.

Fig. 6 shows the distribution and the comparison of the true positive rates. We can see that the true positive rates of the proposed method are all no more than 5%, which means that the proposed method works well regardless of the number of images.

4.2. Robustness analysis

These experiments test whether the proposed method is robust to incidental changes caused by content-preserving manipulations such as JPEG compression, noise addition, and histogram equalization. Robustness is critically important because it determines the practical applicability of the algorithm. As literature [20] notes, "The quoted performance may not be a sufficient indicator for the usefulness of the methods when it comes to real applications." Some methods work well on recognizing original, uncompressed photographic images; in practice, however, images are very likely to have been manipulated and to have undergone common processing such as JPEG compression or noise addition. Therefore, a core goal of our work is to develop a robust distinguishing method that can tolerate common content-preserving manipulations.

Robustness can be expressed by the true negative rate in the case where images have undergone content-preserving manipulations. Generally, we say an algorithm is robust to a certain kind of manipulation if it holds a satisfactory true negative rate after that manipulation. To test this property, we generate the training sample set in the same way as in Section 4.1, and we generate the testing sample sets by applying noise addition, JPEG compression, and histogram equalization, respectively, to 200 randomly selected images from the Columbia University Data Set.
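For readers who want to reproduce this kind of robustness test, the snippet below shows how the manipulated testing images could be generated with NumPy, SciPy, scikit-image, and Pillow. The manipulation set mirrors the factors in Table 4 (salt & pepper noise, 3 × 3 mean filtering, histogram equalization, and JPEG at several quality factors), but the helper names are our own, the mapping of the paper's "VAR = 0.001" onto the amount parameter is an assumption, and the sketch assumes 8-bit grayscale inputs for simplicity.

```python
import io
import numpy as np
from PIL import Image
from scipy.ndimage import uniform_filter
from skimage.util import random_noise
from skimage.exposure import equalize_hist

def jpeg_compress(img, quality):
    """Re-encode an 8-bit grayscale array as JPEG with the given quality factor."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

def content_preserving_variants(img):
    """img: 8-bit grayscale array. Returns manipulated versions like those in Table 4."""
    f = img.astype(np.float64) / 255.0
    variants = {
        # Salt & pepper noise; 'amount' assumed to play the role of VAR = 0.001.
        "salt_pepper": (random_noise(f, mode="s&p", amount=0.001) * 255).astype(np.uint8),
        "hist_eq": (equalize_hist(f) * 255).astype(np.uint8),
        "mean_3x3": uniform_filter(img.astype(np.float64), size=3).astype(np.uint8),
    }
    for qf in (30, 60, 90):
        variants[f"jpeg_q{qf}"] = jpeg_compress(img, qf)
    return variants
```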
Table 4
True negative rates (%) under content-preserving manipulations.

                          Salt & pepper noise   Histogram      Mean filtering   JPEG        JPEG        JPEG
                          (VAR = 0.001)         equalization   (3 × 3)          (QF = 30)   (QF = 60)   (QF = 90)
Proposed method
  PTN(PG/CG)              90.00                 81.50          85.00            87.50       86.00       94.50
  PTN(PG)                 83.50                 80.50          82.00            83.00       88.50       90.50
  PTN(CG)                 90.50                 70.50          86.00            86.50       84.00       93.50
Cui et al. [6]
  PTN(PG/CG)              72.50                 67.50          67.50            67.00       71.50       84.50
  PTN(PG)                 94.00                 56.00          90.00            80.50       92.50       90.50
  PTN(CG)                 64.50                 53.00          55.00            50.50       63.50       86.50
Gallagher and Chen [12]
  PTN(PG/CG)              66.50                 65.50          53.50            66.50       73.00       71.00
  PTN(PG)                 88.00                 88.00          37.00            40.00       41.00       46.50
  PTN(CG)                 92.50                 92.00          86.50            85.50       89.50       93.00

The results shown in Table 4 are the average values over 10 tests; the training samples and testing samples were randomly selected in each test. As can be seen from Table 4, our method presents satisfactory detection results under content-preserving manipulations, which means that our method is robust enough to tolerate them. The method proposed by Gallagher and Chen [12] works well on recognizing original-size, uncompressed photographic images, but it is less robust to JPEG compression and noise, because such manipulations may destroy the interpolation clues caused by the Bayer color filter array.

4.3. Generalization capability analysis

Generally, different image data sets influence the detection results of an algorithm differently. In real-world applications the data sources are varied, and it is impossible to train a classifier that works for all images coming from the Internet. The generalization capability of an algorithm refers to its availability across different image databases. High generalization capability means that the true negative rate of the algorithm is satisfactory and varies only within a small range when the training samples and testing samples come from different sources.

In order to test the generalization capability of the proposed method, our experiments are performed on image sets from UCID [27], the Columbia Image Database [24], and the Art-CG gallery database [28]. The nine experiments are as follows:

(1) Columbia → UCID: Training samples are generated by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and testing samples are generated by randomly taking 200 photographs from UCID.
(2) Columbia → UCID, Columbia: Training samples are generated by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking 100 photographs from UCID and another 100 PRCG images from the Columbia Image Database.
(3) Columbia, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking 100 photographs and 100 PRCG images from the Columbia Image Database.
(4) Columbia, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking 200 photographs from the Columbia Image Database.
(5) Columbia, UCID →
Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Columbia Image Database, and testing samples are generated by taking another 200 PRCG images from the Columbia Image Database.
(6) Columbia → Art-CG gallery database: Training samples are generated by randomly taking 400 photographs and 400 PRCG images from the Columbia Image Database, and testing samples are generated by randomly taking 200 PRCG images from the Art-CG gallery database.
(7) Art-CG gallery database, Columbia → Columbia: Training samples are generated by randomly taking 400 photographs from the Columbia Image Database and 400 PRCG images from the Art-CG gallery database, and testing samples are generated by randomly taking 100 photographs and 100 PRCG images from the Columbia Image Database.
(8) Art-CG gallery database, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 CG images from the Art-CG gallery database, and testing samples are generated by randomly taking 100 photographs and 100 PRCG images from the Columbia Image Database.
(9) Art-CG gallery database, UCID → Columbia: Training samples are generated by randomly taking 400 photographs from UCID and 400 PRCG images from the Art-CG gallery database, and testing samples are generated by randomly taking 200 PRCG images from the Columbia Image Database.

Each experiment is repeated 10 times with different random images. Table 5 shows the statistical average of the true negative rate.

Table 5
Generalization capability for different image databases.

Training samples                                    Testing samples                                        PTN (%)
Columbia Image Database                             UCID (photographs)                                     95.00
Columbia Image Database                             UCID, Columbia Image Database                          94.00
UCID, Columbia Image Database                       Columbia Image Database (photographs, PRCG images)     95.50
UCID, Columbia Image Database                       Columbia Image Database (photographs)                  91.00
UCID, Columbia Image Database                       Columbia Image Database (PRCG images)                  94.00
Columbia Image Database                             Art-CG gallery database                                95.50
Art-CG gallery database, Columbia Image Database    Columbia Image Database (photographs, PRCG images)     94.50
Art-CG gallery database, UCID                       Columbia Image Database (photographs, PRCG images)     92.50
Art-CG gallery database, UCID                       Columbia Image Database (PRCG images)                  93.00

As can be seen from Table 5, the true negative rates of the proposed algorithm are satisfactory for the different data sets, and they vary only within a small range when the training samples and testing samples come from different sources. This shows that the proposed method remains usable when the training and testing samples come from different sources; in other words, its generalization capability is satisfactory.

5. Conclusions

Nowadays, computer imagery and computer-generated images touch many aspects of our daily life. Computer graphics are becoming so photorealistic that they pose new challenges to the credibility of digital images, and identifying photorealistic computer graphics from photographs has become an important topic in the field of media applications. In this paper, aiming at improving detection accuracy, robustness, and generalization capability simultaneously, we proposed a new approach to identify photorealistic computer graphics from photographic images. In the proposed method, the homomorphic filter is used to highlight the image detail information, the statistical features of the Contourlet sub-bands of the image are used to construct the distinguishing features, and the least squares SVM is then used for classification. Experimental results show that the proposed method reaches high detection accuracy; moreover, the algorithm is robust to content-preserving image manipulations and has satisfactory generalization capability across different data sets.

Acknowledgments

We would like to thank Dr. Zhaohong Li for kindly providing us with their code. We would also like to acknowledge the helpful comments and kind suggestions provided by the anonymous referees.

References

[1] T. Ng, S. Chang, Classifying photographic and photorealistic computer graphic images using natural image statistics, Technical Report #220-2006-6, Department of Electrical Engineering, Columbia University, 2006.
[2] S. Lyu, H. Farid, How realistic is photorealistic?, IEEE Trans. Signal Process. 53 (2) (2005) 845–850.
[3] T. Ng, S. Chang, J. Hsu, L. Xie, M.P. Tsui, Physics-motivated features for distinguishing photographic images and computer graphics, in: Proceedings of ACM MULTIMEDIA '05, New York, NY, USA, ACM, 2005, pp. 239–248.
[4] Y. Wang, P. Moulin, On discrimination between photorealistic and photographic images, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, 14–19 May 2006, pp. 161–164.
[5] W. Chen, Y.Q. Shi, G.R. Xuan, Identifying computer graphics using HSV color model and statistical moments of characteristic functions, in: IEEE International Conference on Multimedia & Expo, Beijing, China, 2–5 July 2007, pp. 1123–1126.
[6] X. Cui, X. Tong, G. Xuan, Discrimination between photo images and computer graphics based on statistical moments in the frequency domain of histogram, in: Chinese Information Hiding Workshop, Nanjing, China, 2007, pp. 276–279.
[7] E. Zheng, X. Ping, Identifying computer graphics from digital photographs based on coherence of adjacent pixels, J. Comput. Res. Dev. 46 (Suppl. I) (2009) 258–262.
[8] G. Sankar, V. Zhao, Y.H. Yang, Feature based classification of computer graphics and real images, in: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, Taipei, Taiwan, 2009, pp. 1513–1516.
[9] W. Li, T. Zhang, E. Zheng, X. Ping, Identifying photorealistic computer graphics using second-order difference statistics, in: Seventh International Conference on Fuzzy Systems and Knowledge Discovery, Yantai, China, 10–12 August 2010, pp. 2316–2319.
[10] S. Dehnie, T. Sencar, N. Memon, Digital image forensics for identifying computer generated and digital camera images, in: Proceedings of the International Conference on Image Processing, Atlanta, Georgia, USA, 8–11 October 2006.
[11] A.E. Dirik, S. Bayram, H.T. Sencar, N. Memon, New features to identify computer generated images, in: Proceedings of the International Conference on Image Processing, San Antonio, Texas, USA, 16–19 September 2007, pp. IV-433–IV-436.
[12] A.C. Gallagher, T. Chen, Image authentication by detecting traces of demosaicing, in: Proceedings of the CVPR WVU Workshop, Anchorage, AK, USA, June 2008, pp. 1–8.
[13] Z. Li, J. Ye, Y.Q. Shi, Distinguishing computer graphics from photographic images using local binary patterns, in: International Workshop on Digital Watermarking, 2012, pp. 228–241.
[14] http://www.creative-3d.net, http://www.3dlinks.com.
[15] J. Friedman, T. Hastie, Additive logistic regression: a statistical view of boosting, Ann. Stat. 28 (2) (2000) 337–407.
[16] R. Zhang, R.D. Wang, T.T. Ng, Distinguishing photographic images and photorealistic computer graphics using visual vocabulary on local image edges, in: IWDW 2011, LNCS 7128, Springer, pp. 292–305.
[17] H. Farid, M.J. Bravo, Perceptual discrimination of computer generated and photographic faces, Digital Invest. 8 (3–4) (2012) 226 (10).
[18] D.-T. Dang-Nguyen, G. Boato, F.G.B. De Natale, Discrimination between computer generated and natural human faces based on asymmetry information, in: 20th European Signal Processing Conference, Bucharest, 27–31 August 2012, pp. 1234–1238.
[19] D.-T. Dang-Nguyen, G. Boato, F.G.B. De Natale, Identify computer generated characters by analysing facial expressions variation, in: IEEE International Workshop on Information Forensics and Security, 2012, pp. 252–257.
[20] T.-T. Ng, S.-F. Chang, Discrimination of computer synthesized or recaptured images from real images, in: Digital Image Forensics, 2013, pp. 275–309.
[21] M.N. Do, M. Vetterli, The contourlet transform: an efficient directional multiresolution image representation, IEEE Trans. Image Process. 14 (12) (2005) 2091–2106.
[22] D.D. Po, M.N. Do, Directional multiscale modeling of images using the contourlet transform, IEEE Trans. Image Process. 15 (6) (2006) 1610–1620.
[23] B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in: D. Haussler (Ed.), 5th Annual ACM Workshop on COLT, ACM Press, Pittsburgh, PA, 1992, pp. 144–152.
[24] T.-T. Ng, S.-F. Chang, J. Hsu, M. Pepeljugoski, Columbia Photographic Images and Photorealistic Computer Graphics Dataset, Department of Electrical Engineering, Columbia University, ADVENT Technical Report #205-2004-5, February 2005.
[25] A.E. Dirik, N. Memon, Image tamper detection based on demosaicing artifacts, in: Proc. IEEE ICIP, 2009, pp. 1497–1500.
[26] L. Fillatre, Adaptive steganalysis of least significant bit replacement in grayscale natural images, IEEE Trans. Signal Process. 60 (2) (2012) 556–569.
[27] UCID, http://www-staff.lboro.ac.uk/~cogs/datasets/UCID/ucid.html.
[28] Art-CG Gallery Database, http://cggallery.itsartmag.com/.