Estimating the photorealism of images: Distinguishing paintings from photographs

Florin Cutzu, Riad Hammoud, Alex Leykin
Department of Computer Science, Indiana University, Bloomington, IN 47408
{florin, rhammoud, oleykin}@cs.indiana.edu

Abstract

Automatic classification of an image as a photograph of a real scene or as a painting is potentially useful for image retrieval and website filtering applications. The main contribution of this paper is the proposition of several features derived from the color, edge, and gray-scale-texture information of the image that effectively discriminate paintings from photographs. For example, we found that paintings contain significantly more pure-color edges, and that certain gray-scale-texture measurements (mean and variance of Gabor filter responses) are larger for photographs. Using a large set of images (12,000) collected from different web sites, the proposed features exhibit very promising classification performance (over 90%). A comparative analysis of the automatic classification results and psychophysical data is reported, suggesting that the proposed automatic classifier estimates the perceptual photorealism of a given picture.

1. Introduction

The goal of the present work was the determination of the image features distinguishing photographs of real-world, three-dimensional scenes from (photographs of) paintings, and the development of a classifier system for their automatic differentiation. In the context of this work, the class "painting" included not only conventional canvas paintings, but also frescoes and murals. Line (pencil or ink) drawings as well as computer-generated images were excluded. No restrictions were imposed on the historical period or on the style of the painting. The class "photograph" included exclusively color photographs of three-dimensional real-world scenes.

The problem of distinguishing paintings from photographs is non-trivial even for a human observer, as can be appreciated from the numerous illustrations shown later in the paper. To our knowledge, the present study is the first to address the problem of photograph-painting discrimination. This problem is related thematically to other work on broad image classification: city images vs. landscapes [4], indoor vs. outdoor [3], and photographs vs. graphics [2] differentiation. Distinguishing photographs from paintings is, however, more difficult than the above classifications due to the generality of the problem. One difficulty is that there are no constraints on the image content of either class, such as those successfully exploited in differentiating city images from landscapes or indoor from outdoor images. The problem of distinguishing computer-generated graphics from photographs is closest to the problem considered here, and their relation will be discussed in more detail in Section 7. At this point, it suffices to note that the differences between (especially realistic) paintings and photographs are subtler than the differences between graphics and photographs; in addition, the definition of computer-generated graphics used in [2] allowed the use of powerful constraints that are not applicable to the paintings class. From a theoretical standpoint, the problem of separating photographs from paintings is interesting because it constitutes a first attempt at revealing the features of real-world images that are mis-represented in hand-crafted images.
From a practical standpoint, our results are useful for the automatic classification of images in large electronic-form art collections, such as those maintained by many museums. A special application is distinguishing pornographic images from nude paintings: distinguishing paintings from photographs is important for web browser blocking software, which currently blocks not only pornography (photographs) but also artistic images of the human body (paintings).

Organization of the paper. Section 2 describes the image features used to differentiate between paintings and photographs; Section 3 describes the image set used. Section 4 contains the results: it describes the inter-relations between the various features, the discrimination performance obtained using one feature at a time, and the classification results obtained by using all features concurrently. In Section 5 some classified image samples are illustrated and discussed. Section 6 compares psychophysical data with the output of the classifier system. Section 7 places our results in the context of related work and outlines further work.

2. Proposed distinguishing features

This section details the features proposed for discriminating paintings from photographs: four so-called visual features, a hybrid spatial-color feature, and a texture-based feature. The formulation of these features was based upon the visual inspection of a large number of photographs and paintings.

2.1. Visual features

Four scalar-valued visual features were defined, as follows.

1. Color edges vs. intensity edges. The removal of color eliminates more visual information from a painting than from a photograph of a real scene. Specifically, conversion to gray-scale leaves most edges in photographs intact, but it eliminates many of the perceptual edges in paintings. These observations led to the following hypotheses: (1) Perceptual edges in photographs are, largely, intensity edges. These intensity edges can at the same time be color edges, and there are few "pure" color edges (color, not intensity, edges). (2) Many of the perceptual edges in paintings are pure color edges, as they result from color changes that are not accompanied by concomitant edge-like intensity changes.

A quantitative criterion was developed. Consider a color input image, painting or photograph. First, the intensity edges were found. Then, intensity information was removed by dividing the R, G, and B image components by the image intensity at each pixel, resulting in normalized RGB components. The color edges of the resulting "intensity-free" color image were determined by applying the Canny edge detector to the three color channels and fusing the resulting edges. Consider the edge pixels that are intensity but not color edges (pure intensity-edge pixels); hue does not change substantially across a pure intensity edge. Let $f_1$ denote the fraction of pure intensity-edge pixels:

$$f_1 = \frac{\#\{\text{intensity-edge pixels that are not color-edge pixels}\}}{\#\{\text{edge pixels}\}}.$$

Our hypothesis was that $f_1$ is larger for photographs.
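As a concrete illustration, the sketch below computes $f_1$ with numpy and scikit-image. The paper does not specify the Canny parameters, the exact channel normalization, or the rule used to fuse the three color-channel edge maps, so the sigma value, the rescaling to [0, 1], and the logical-OR fusion used here are assumptions.

```python
import numpy as np
from skimage.feature import canny

def pure_intensity_edge_fraction(rgb, sigma=1.0, eps=1e-6):
    """Feature f1: fraction of edge pixels that are intensity edges
    but not color edges. Parameter choices are illustrative."""
    rgb = rgb.astype(np.float64)
    intensity = rgb.mean(axis=2)                      # gray-scale image
    # Edges of the intensity image (Canny, as in the paper).
    intensity_edges = canny(intensity / (intensity.max() + eps), sigma=sigma)
    # Remove intensity by dividing each channel by the pixel intensity,
    # then detect edges per normalized channel and fuse them (logical OR).
    normalized = rgb / (intensity[..., None] + eps)
    color_edges = np.zeros(intensity.shape, dtype=bool)
    for c in range(3):
        chan = normalized[..., c]
        chan = chan / (chan.max() + eps)              # rescale to [0, 1]
        color_edges |= canny(chan, sigma=sigma)
    all_edges = intensity_edges | color_edges
    if all_edges.sum() == 0:
        return 0.0
    pure = intensity_edges & ~color_edges             # pure intensity edges
    return float(pure.sum()) / float(all_edges.sum())
```

Under this convention, a high $f_1$ means that most perceptual edges are plain intensity edges, as hypothesized for photographs.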
2. Spatial variation of color. Color changes to a larger extent from pixel to pixel in paintings than in photographs. Given an input image, its R, G, and B channels were normalized by division by the image intensity. If in the neighborhood of a pixel the R, G, and B color surfaces are parallel or nearly so, there is no substantial hue change in that neighborhood; if there is a non-zero angle between any two of these surfaces, color changes qualitatively. At each pixel, we determined the orientation of the plane that best fitted a neighborhood centered on the pixel of interest, in the R, G, and B domains respectively. Thus, at each pixel, one obtains three normals: $\mathbf{n}_R$, $\mathbf{n}_G$, $\mathbf{n}_B$. The sum of the areas of the facets of the pyramid determined by these normals was taken as a measure of the local variation of color around the pixel. Let $f_2$ denote the average of this quantity taken over all image pixels; $f_2$ should be, on average, larger for paintings than for photographs.

3. Number of unique colors. Paintings appear to have a larger color palette than photographs. The number of unique colors of an image was normalized by the total number of pixels, resulting in a measure, $f_3$, which we hypothesized to be larger for paintings.

4. Pixel saturation. Paintings tend to contain a larger percentage of highly saturated pixels, whereas photographs contain more unsaturated pixels. This can be seen in Figure 1, which displays the mean saturation histograms derived from all paintings and all photographs in our data sets. The input images were transformed to HSV color space, and their saturation histograms were determined using 20 bins. The ratio, $f_4$, between the count in the highest bin and the count in the lowest bin should, on average, be larger for paintings, and can be used to help painting-photograph discrimination.

Figure 1: The mean saturation histogram for photographs (black) and paintings (yellow); 20 bins were used. Photographs have more unsaturated pixels; paintings have more highly saturated pixels.

2.2. Pixel distribution in RGBXY space

An image is a point cloud in RGB space. The clouds of color-poor images (photographs, mostly) are restricted to subspaces of the 3-D space, having the appearance of cylinders or planes; the clouds of color-rich images (paintings, mostly) are fully 3-D and cannot be approximated well by a 1-D or 2-D subspace. One can enhance this representation by adding the two spatial coordinates, x and y, to the RGB vector of each image pixel, resulting in a 5-dimensional, joint color-location space we call RGBXY. The singular values of the covariance matrix of the RGBXY point cloud describe the variability of the image pixels both in color space and across the image. Typically, paintings use a larger color palette and also have a larger spatial variation of color, resulting in larger singular values for this covariance matrix. We represented each image by the 5-dimensional vector of the singular values of its RGBXY pixel covariance matrix. We expect paintings and photographs to be well separated in this space.
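The RGBXY descriptor reduces to a covariance computation; a minimal sketch, assuming raw 8-bit color values and pixel-index coordinates, follows. The paper does not state how the color and spatial axes are scaled relative to one another, which affects the singular values, so that scaling is left as an open choice here.

```python
import numpy as np

def rgbxy_singular_values(rgb):
    """The five singular values of the covariance matrix of the
    (R, G, B, x, y) pixel point cloud, in descending order."""
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]                  # pixel coordinates
    # One row per pixel: (R, G, B, x, y).
    cloud = np.column_stack([
        rgb.reshape(-1, 3).astype(np.float64),
        xs.reshape(-1, 1),
        ys.reshape(-1, 1),
    ])
    cov = np.cov(cloud, rowvar=False)            # 5 x 5 covariance matrix
    return np.linalg.svd(cov, compute_uv=False)
```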
2.3. Gray-scale texture

The features described above use color to distinguish between paintings and photographs. One can observe that the texture elements (texels) in photographs tend to be repetitive, while in hand-crafted images it is difficult to maintain texel uniformity. This section reviews the Gabor filter design and its implementation in this work on gray-scale images.

Gabor filter review. Following [6], the joint color-texture properties of images can be represented in terms of the energy density $e(x, y, \lambda)$ of the color image, where $\lambda$ is the wavelength of the light energy, $(u, v)$ denotes the spatial frequency, and $(x, y)$ the spatial coordinates. The texture measurement of the signal at a given spatial frequency is obtained by multiplication, in the Fourier domain, with a Gaussian function centered at that frequency [6][1]. This is equivalent to convolution with a Gabor filter in the spatial domain [8][7]. Therefore, the texture measurement at wavelength $\lambda$ is

$$c_{f,\theta}(x, y, \lambda) = \left| e(x, y, \lambda) * g_{f,\theta}(x, y) \right|,$$

where $*$ denotes convolution and

$$g_{f,\theta}(x, y) = \exp\!\left(-\frac{x'^2 + y'^2}{2\sigma^2}\right) \exp\!\left(2\pi i f x'\right), \qquad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta,$$

is the 2-D Gabor function at the radial center frequency $f$ (cycles/pixel) and filter orientation $\theta$. Here, $\sigma$ is inversely proportional to $f$.

Implementation. In the current implementation the color information was simplified first: the R, G, and B channels of the given RGB color image were used directly, instead of reconstructing the color spectrum $e(x, y, \lambda)$ from its derivatives at a given frequency [6]. A gray-scale image signal was then obtained by averaging the three color channels; the motivation is that we would like to derive a feature that is color-independent. The design of the Gabor filter bank used here is similar to the one described in [1], where the redundancy in the filtered images caused by the non-orthogonality of the Gabor wavelets is reduced. Four different scales and four different orientations (0, 45, 90, and 135 degrees) were used. Then the mean and the standard deviation of the Gabor responses across image locations were computed for each filtered image, and a feature vector of 72 dimensions was obtained.
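The following sketch illustrates this style of Gabor feature extraction with scikit-image. The center frequencies below are illustrative placeholders, not the filter-bank design of [1]; note also that four scales, four orientations, and two statistics yield 32 numbers, so the 72-dimensional vector reported above must involve filter-bank details not recoverable from the text.

```python
import numpy as np
from skimage.filters import gabor

FREQUENCIES = [0.05, 0.1, 0.2, 0.4]                      # four scales (assumed)
ORIENTATIONS = [0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]  # 0, 45, 90, 135 deg

def gabor_texture_features(rgb):
    """Mean and standard deviation of the Gabor response magnitude,
    per scale/orientation pair, computed on the gray-scale image."""
    gray = rgb.astype(np.float64).mean(axis=2)   # average the three channels
    feats = []
    for f in FREQUENCIES:
        for theta in ORIENTATIONS:
            real, imag = gabor(gray, frequency=f, theta=theta)
            mag = np.hypot(real, imag)           # filter output magnitude
            feats.extend([mag.mean(), mag.std()])
    return np.asarray(feats)                     # 4 scales x 4 orients x 2
```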
3. Image set

We used 6000 photographs and 6000 paintings, obtained from a variety of web sites (e.g., www.dlib.indiana.edu, www.artchive.com, www.freefoto.com). The paintings included a wide variety of artistic styles and historical periods, from Byzantine art and the Renaissance to Modernism (cubism, surrealism, pop art, etc.). The photographs were also very varied in content, including animals, humans, city scenes, landscapes, and indoor scenes. Image resolution was typical of web-available images. The mean image size was … pixels for paintings (standard deviation … pixels) and … pixels for photographs (standard deviation … pixels).

4. Results

Classification in visual-feature space. We first verified whether $f_1$, $f_2$, $f_3$, and $f_4$ capture genuinely different image properties using two measures of redundancy: the pairwise feature correlations and the singular values of the feature covariance matrix. We found that (1) the pairwise feature correlations are insignificant and (2) the smallest singular value is significantly larger than 0. To illustrate the distribution of the images in $(f_1, f_2, f_3, f_4)$ space, we computed the principal components of the combined painting and photograph data set. Figure 2 displays the paintings and the photographs in the subspace of the first two principal components. Interestingly, the photographs overlap a subclass of the paintings: the photograph data set coincides with the right "lobe" of the paintings point cloud. This observation is in accord with the larger variability of the paintings class indicated by the larger singular values of its covariance matrix, and with the realization that photographs can be construed as extremely realistic paintings.

Figure 2: Painting and photograph data points represented separately in the same two-dimensional space of the first two principal components of the combined painting-photograph image set. LEFT: paintings. RIGHT: photographs.

We measured the separation effectiveness of each of the visual features when used individually. Each feature was measured for all photographs and all paintings in the database, and an optimal threshold value, chosen so as to minimize the maximum of the two misclassification rates (for photographs and for paintings), was calculated. The results, listed in Table 1, indicate that the separation power of each visual feature taken alone is modest, with the edge-based feature the most effective.

Feature                        P miss rate (%)   Ph miss rate (%)   Order
f1 (pure intensity edges)      33.34             33.34              P, Ph
f2 (spatial color variation)   36.14             36.13              Ph, P
f3 (unique colors)             37.40             37.43              Ph, P
f4 (saturation ratio)          37.93             37.92              Ph, P

Table 1: Painting-photograph discrimination performance. P denotes paintings, Ph denotes photographs. For each feature, paintings were separated from photographs using an optimal threshold. The miss rate is defined as the proportion of images incorrectly classified. The last column indicates the order of the classes with respect to the threshold.
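The single-feature classifiers behind Table 1 amount to a one-dimensional threshold search. The minimize-the-maximum-miss-rate criterion is taken from the text; the exhaustive scan over observed feature values in the sketch below is an assumed implementation.

```python
import numpy as np

def optimal_threshold(painting_vals, photo_vals, photo_above=True):
    """Pick the threshold minimizing the larger of the two miss rates.
    photo_above: whether photographs are predicted above the threshold."""
    candidates = np.unique(np.concatenate([painting_vals, photo_vals]))
    best_t, best_cost = None, np.inf
    for t in candidates:
        if photo_above:
            p_miss = np.mean(painting_vals > t)   # paintings called photos
            ph_miss = np.mean(photo_vals <= t)    # photos called paintings
        else:
            p_miss = np.mean(painting_vals <= t)
            ph_miss = np.mean(photo_vals > t)
        cost = max(p_miss, ph_miss)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```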
Neural network classification methodology. A perceptron with six sigmoidal units in its single hidden layer was employed to perform painting-photograph discrimination in the joint visual-feature space $(f_1, f_2, f_3, f_4)$. Classifier performance was evaluated as follows. We partitioned the painting and photograph sets into 6 parts (non-overlapping subsets) of 1000 elements each. By pairing all photograph parts with all painting parts, 36 training sets were generated. Thus, a training set consisted of 1000 paintings and 1000 photographs, and the corresponding test set consisted of the remaining 5000 paintings and 5000 photographs. 36 networks were trained and tested, one for each training set. Due to the small size of the network, the convergence of the backpropagation calculation was quite rapid in almost all cases, and usually a few re-initializations of the optimization were sufficient for deriving an effective network. On average, the networks correctly classified 71% of the photographs and 72% of the paintings in the test set, with standard deviations of 4% and 5%, respectively.

Classification in RGBXY space. For visualization purposes, we determined the principal components of the combined painting and photograph data set encoded in the space of the five singular values of the RGBXY covariance matrix. Figure 3 displays separately the painting and the photograph subsets in the space spanned by the first two principal components. The examination of Figure 3 reconfirms the previously made observation that photographs appear to be a special case of paintings: the photograph point cloud has less variance and partially overlaps (at least in the space spanned by the first two principal components) a portion of the paintings point cloud. This observation is also supported by the larger singular values of the painting point cloud (5.03, 0.21, 0.1, 0.08, 0.002) compared to those of the photograph point cloud (4.15, 0.12, 0.08, 0.03, 0.003). Following the classification methodology described above, a six-unit perceptron was used to perform painting-photograph discrimination in the 5-dimensional RGBXY space. On average, the networks correctly classified 81% of the photographs and 81% of the paintings in the test set, with standard deviations of 3% and 3%, respectively.

Figure 3: RGBXY space: painting and photograph data points represented separately in the same two-dimensional space of the first two principal components of the combined painting-photograph image set. LEFT: photographs. RIGHT: paintings.

Classification in Gabor space. We calculated the means and the standard deviations of the texture features over all paintings and all photographs; Figure 4 displays the results. Interestingly, photographs tend to have more energy at horizontal and vertical orientations at all scales, while paintings have more energy at the diagonal (45 and 135 degree) orientations. Following the methodology described above, a six-neuron perceptron was used to perform painting-photograph discrimination in the space of the Gabor texture features. On average, the networks correctly classified 78% of the photographs and 79% of the paintings in the test set, with standard deviations of 4% and 5%, respectively.

Figure 4: Errorbar plots illustrating the dependence of the image-mean (TOP row) and the image-standard-deviation (BOTTOM row) of the Gabor filter output magnitude on filter scale and orientation, for the painting (red lines) and photograph (dashed blue lines) image sets. Errorbars represent the standard deviations determined across images. LEFT: horizontal orientation. MIDDLE: vertical orientation. RIGHT: diagonal orientations (the data for 45 and 135 degrees are presented together).

Discrimination using multiple classifiers. We found that the most effective method of combining the three neural classifiers described above is to simply average their outputs: the "committee of neural networks" idea (see, for example, [5]). An individual classifier outputs a number between 0 (perfect painting) and 1 (perfect photograph). Thus, if for a given input image the average of the outputs of the three classifiers was below 0.5, it was classified as a painting; otherwise it was considered a photograph. To evaluate the performance of this combination of the individual classifiers, we partitioned the painting and photograph sets into 6 equal parts each. By pairing all photograph parts with all painting parts, 36 training sets were generated. A training set consisted of 1000 paintings and 1000 photographs, and the corresponding test set consisted of the remaining 5000 paintings and 5000 photographs. Each of the three classifiers was trained on the same training set, and their individual and average performance was measured on the same test set. This procedure was repeated for all available training and test sets. Classifier performance is described in Table 2. The averaged (combined) classifier exceeds 90% correct, significantly outperforming the individual classifiers for both paintings and photographs. This improvement is to be expected, since each classifier operates in a different feature space.

Classifier   P hit rate (%)   Ph hit rate (%)
C_visual     72 ± 5           71 ± 4
C_RGBXY      81 ± 3           81 ± 3
C_Gabor      79 ± 5           78 ± 4
C_avg        94               92

Table 2: Classification performance: the mean and the standard deviation of the hit rates over the 36 testing sets. C_visual is the classifier operating in the space of the scalar-valued visual features, C_RGBXY is the classifier for RGBXY space, and C_Gabor is the classifier for Gabor space. C_avg is the average (combined) classifier. P denotes paintings, Ph denotes photographs.
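A sketch of the committee follows. The six-hidden-unit sigmoidal perceptrons are approximated with scikit-learn's MLPClassifier, a stand-in rather than the authors' implementation; the feature-matrix names are hypothetical, and the 0.5 decision threshold follows the description above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_net():
    # Rough analogue of a perceptron with six sigmoidal hidden units.
    return MLPClassifier(hidden_layer_sizes=(6,), activation='logistic',
                         max_iter=2000)

def committee_predict(nets, feature_sets):
    """Average the three classifier outputs and threshold at 0.5:
    below 0.5 -> painting (0), at or above -> photograph (1)."""
    # Each network sees its own feature space for the same images,
    # e.g. feature_sets = [X_visual, X_rgbxy, X_gabor] (hypothetical names).
    probs = [net.predict_proba(X)[:, 1] for net, X in zip(nets, feature_sets)]
    avg = np.mean(probs, axis=0)         # degree of photorealism in [0, 1]
    return (avg >= 0.5).astype(int), avg
```

Each network would be fit on its own feature matrix for the same 1000 + 1000 training images before being handed to committee_predict.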
5. Illustrating classifier performance

We illustrate the performance of our classifier with examples. We selected the best classifier from the set of classifiers from which the statistics in Table 2 were derived, and we studied its performance on its test set.

Typical photographs and paintings. For an input image, the output of the combined classifier is a number between 0 and 1, with 0 corresponding to a perfect painting and 1 to a perfect photograph; in other words, the classifier output (displayed above each image) can be interpreted as the degree of photorealism of the input image. We illustrate the behavior of the combined classifier by displaying images for which the classifier output was very close to 0 or very close to 1. These are thus images that our classifier considers to be typical paintings and typical photographs. We note that the error rate was very low (under 4%) at these output values. Figures 5 and 6 display several typical paintings. Note the variety of styles of these paintings: one is tempted to conclude that the features the classifiers use capture the essence of the "paintingness" of an image. Figures 7 and 8 display examples of typical photographs. We note that these tend to be ordinary photographs, not artistic or in any way (illumination, subject, etc.) unusual ones.

Figure 5: Images rated as typical paintings. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Figure 6: Images rated as typical paintings. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Figure 7: Images rated as typical photographs. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Figure 8: Images rated as typical photographs. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Misclassified images. The mistakes made by our classifier were interesting, in that they seemed to reflect the degree of perceptual photorealism of the input image. Figures 9, 10, and 11 display paintings that were incorrectly classified as photographs. Note that most of these incorrectly classified paintings look quite photorealistic at a local level, even if their content is not realistic. Figures 12, 13, and 14 display photographs that were incorrectly classified as paintings. These photographs correspond, by and large, to vividly colored objects (which sometimes are painted 3-D objects), to blurry or "artistic" photographs, or to photographs taken under unusual illumination conditions.

Figure 9: Paintings classified as photographs. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Figure 10: Paintings classified as photographs. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Figure 11: Paintings classified as photographs. Classifier output is displayed above each image; an output of 1 is a perfect photograph.

Figure 12: Photographs classified as paintings. Classifier output is displayed above each image; an output of 0 is a perfect painting.

Figure 13: Photographs classified as paintings. Classifier output is displayed above each image; an output of 0 is a perfect painting.

Figure 14: Photographs classified as paintings. Classifier output is displayed above each image; an output of 0 is a perfect painting.
6. Psychophysical experiments

The inspection of the correctly and the incorrectly classified images suggests that the output of the combined classifier correlates with the images' perceptual degree of photorealism. The goal of the experiment described in this section was to obtain subjective ratings of the degree of photorealism for both paintings and photographs. Once measured psychophysically, the subjective ratings can be compared with the output of the combined classifier for the same images, and thus a quantitative measure of the perceptual relevance of the classifier's output can be derived.

Two observations are in order. First, in the context of this work, the degree of photorealism of an image does not depend on image content, but on much lower-level image features. For example, certain surrealistic paintings (Magritte, Dali) are, according to our definition of the term, photorealistic, despite their non-realistic content. Thus, to remove the influence of image content, photorealism ratings should be performed locally, on image patches. Second, not all photographs have a maximum degree of photorealism: some photographs taken under peculiar illumination conditions and some "artistic" photographs look painting-like.

To reduce the influence of image content on photorealism ratings, the subjects were shown scrambled versions of the original images. The scrambled images were obtained by dividing the image into square blocks and then rearranging the blocks randomly, as shown in Figure 15.

Figure 15: LEFT: scrambled painting. RIGHT: scrambled photograph.
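Block scrambling of this kind is straightforward to reproduce; a sketch follows. The block size used for the stimuli is not stated in the paper, so the 32-pixel default is an assumption.

```python
import numpy as np

def scramble_blocks(img, block=32, rng=None):
    """Divide a color image (H, W, C) into square blocks and
    rearrange the blocks randomly."""
    rng = np.random.default_rng() if rng is None else rng
    h = img.shape[0] - img.shape[0] % block
    w = img.shape[1] - img.shape[1] % block
    img = img[:h, :w]                            # crop so blocks tile exactly
    gh, gw = h // block, w // block
    # Cut into a stack of (block x block) tiles, one per grid cell.
    tiles = (img.reshape(gh, block, gw, block, -1)
                .swapaxes(1, 2)
                .reshape(gh * gw, block, block, -1))
    tiles = tiles[rng.permutation(len(tiles))]   # shuffle the tiles
    # Reassemble the shuffled tiles into an image.
    return (tiles.reshape(gh, gw, block, block, -1)
                 .swapaxes(1, 2)
                 .reshape(h, w, -1))
```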
Two subjects rated 600 paintings and photographs randomly selected from among the image sets. The subjects were asked to estimate, on a scale from 1 to 10, the likelihood that the input image was a painting or a photograph: 1 corresponded to a perfect photograph, 10 to a perfect painting. The subjects made their responses using the F1-F10 keys on the computer keyboard. Each image was shown briefly (750 msec) and was tested twice, the two ratings being averaged. The ratings of the two subjects were highly similar, having a correlation coefficient of 0.90. The ratings of the two subjects were averaged and then compared to the outputs of the combined classifier for the same images. The correlation coefficient between the averaged subjective ratings and the classifier outputs was 0.865. This relatively strong correlation with perceptual data suggests that our classifier captured perceptually relevant features.

7. Discussion

We presented an image classification system that discriminates paintings from photographs. This image classification problem is challenging and interesting, as it is very general and must be solved in an image-content-independent fashion. Using low-level image features and a relatively small training set, we achieved discrimination performance levels of over 90%.

It is interesting to compare our results to the work of Athitsos et al. [2], who accurately (over 90% correct) distinguished photographs from computer-generated graphics. The authors used the term computer-generated graphics to denote desktop or web-page icons, not computer-rendered images of 3-D scenes. Obviously, paintings can be much more similar to photographs than icons are. Several features these authors used are similar to ours. Athitsos et al. noted that there is more variability in the color transitions from pixel to pixel in photographs than in graphics; we quantified the same feature (albeit in a different way) and found more variability in paintings than in photographs. The authors also observed that edges are much sharper in graphics than in photographs. We, on the other hand, found no difference in intensity-edge structure between photographs and paintings, but found instead that paintings have significantly more pure-color edges. Athitsos et al. found that graphics contain more saturated colors than photographs; we found that the same was true for paintings.
The authors found that graphics contain fewer unique (distinct) colors than photographs; we found paintings to have more unique colors than photographs. In addition, Athitsos et al. used two powerful color-histogram-based features: the prevalent color metric and the color histogram metric. We also found experimentally that hue (or full RGB) histograms are quite useful in distinguishing between photographs and paintings; for example, the hue corresponding to the color of the sky was quite characteristic of outdoor photographs. However, since hue is image-content-dependent to a large degree, we decided against using hue histograms (or RGB histograms) in our classifiers, as our intention was to distinguish paintings from photographs in an image-content-independent manner. Two of the features in Athitsos et al. (smallest dimension and dimension ratio) exploited the size characteristics of the graphics images and were not applicable to our problem.

Most of our features use color in one way or another. The Gabor features are the only ones that use image intensities exclusively, and taken in isolation they are not sufficient for accurate discrimination. Thus, color is critical for the good performance of our classifier. This appears to be different from human classification, since humans can effortlessly discriminate paintings from photographs in gray-scale images. However, it is possible that human painting-photograph discrimination relies heavily on image content, and thus is not affected by the loss of color information. To elucidate this point, we are planning to conduct psychophysical experiments similar to the one described in this paper on scrambled gray-level images. If the removal of color information affects the photorealism ratings significantly, it will mean that color is critical for human observers also.

It is easy to convince oneself that reducing image size (by smoothing and sub-sampling) renders the perceptual painting/photograph discrimination more difficult if the paintings have "realistic" content. Thus, it is reasonable to expect that the discrimination performance of our classifier will improve with increasing image resolution, a hypothesis that we are planning to verify in future work. In our study we employed images of modest resolution, typical of web-available images. Certain differences between paintings and photographs might be observable only at high resolutions. Specifically, although we did not observe any differences in the edge structure of paintings and photographs in our images, we suspect that the intensity edges in paintings differ from the intensity edges in photographs. In future work, we plan to study this issue on high-resolution images.

References

[1] B. S. Manjunath and W. Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837-842, 1996.
[2] V. Athitsos, M. J. Swain, and C. Frankel. Distinguishing photographs and graphics on the World Wide Web. In Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL '97), Puerto Rico, 1997.
[3] M. Szummer and R. W. Picard. Indoor-outdoor image classification. In IEEE International Workshop on Content-Based Access of Image and Video Databases (CAIVD '98), pages 42-51, 1998.
[4] A. Vailaya, A. K. Jain, and H.-J. Zhang. On image classification: City vs. landscapes. Pattern Recognition, 31:1921-1936, 1998.
[5] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, New York, 1995.
[6] M. A. Hoang et al. Measurement of color texture. citeseer.nj.nec.com/hoang02measurement.html.
[7] A. K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
[8] A. C. Bovik et al. Multichannel texture analysis using localized spatial filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):55-73, 1990.