Estimating the photorealism of images:
Distinguishing paintings from photographs
Florin Cutzu, Riad Hammoud, Alex Leykin
Department of Computer Science
Indiana University
Bloomington, IN 47408
{florin, rhammoud, oleykin}@cs.indiana.edu
Abstract
Automatic classification of an image as a photograph of a real scene or as a painting is potentially useful for image retrieval and website filtering applications. The main contribution of this paper is the proposal of several features, derived from the color, edge, and gray-scale-texture information of the image, that effectively discriminate paintings from photographs. For example, we found that paintings contain significantly more pure-color edges, and that certain gray-scale-texture measurements (the mean and variance of Gabor filter responses) are larger for photographs. Using a large set of images (12,000) collected from different web sites, the proposed features exhibit very promising classification performance (over 90%). A comparative analysis of the automatic classification results and psychophysical data is reported, suggesting that the proposed automatic classifier estimates the perceptual photorealism of a given picture.
1. Introduction
The goal of the present work was to determine the image features that distinguish photographs of real-world, three-dimensional scenes from (photographs of) paintings, and to develop a classifier system for their automatic differentiation. In the context of this work, the class
“painting” included not only conventional canvas paintings,
but also frescoes and murals. Line (pencil or ink) drawings as well as computer-generated images were excluded.
No restrictions were imposed on the historical period or
on the style of the painting. The class “photograph” included exclusively color photographs of three-dimensional
real-world scenes. The problem of distinguishing paintings
from photographs is non-trivial even for a human observer,
as can be appreciated from the numerous illustrations shown
later in the paper. To our knowledge, the present study is the
first to address the problem of photograph-painting discrimination. This problem is related thematically to other work
on broad image classification: city images vs. landscapes
[4], indoor vs. outdoor [3], and photographs vs. graphics
[2] differentiation. Distinguishing photographs from paintings is, however, more difficult than the above classifications, due to the generality of the problem. One difficulty is that there are no constraints on the image content of either class, such as those successfully exploited in differentiating city images from landscapes or indoor from outdoor
images. The problem of distinguishing computer-generated
graphics from photographs is closest to the problem considered here, and their relation will be discussed in more
detail in Section 7. At this point, it suffices to note that the
differences between (especially realistic) paintings and photographs are subtler than the differences between graphics
and photographs; in addition, the definition of computer-generated graphics used in [2] allowed the use of powerful constraints that are not applicable to the paintings class.
From a theoretical standpoint, the problem of separating
photographs from paintings is interesting because it constitutes a first attempt at revealing the features of real-world
images that are mis-represented in hand-crafted images.
From a practical standpoint, our results are useful for the automatic classification of images in large electronic-form art
collections, such as those maintained by many museums. A
special application is in distinguishing pornographic images
from nude paintings: distinguishing paintings from photographs is important for web browser blocking software,
which currently blocks not only pornography (photographs)
but also artistic images of the human body (paintings).
Organization of the paper Section 2 describes the image features used to differentiate between paintings and photographs; Section 3 describes the image set used. Section 4 contains the results: the inter-relations between the various features, the discrimination performance obtained using one feature at a time, and the classification results obtained by using all features concurrently. In Section 5 some classified image samples are illustrated and discussed. Section 6 compares psychophysical data with the output of the classifier system. Section 7 places our results in the context of related work and outlines further work.
2. Proposed distinguishing features
This section details the features proposed for discriminating paintings from photographs: four so-called visual features, a hybrid spatial-color feature, and a texture-based feature.
The formulation of these features was based upon the visual
inspection of a large number of photographs and paintings.
2.1. Visual features
Four scalar-valued visual features were defined, as follows.
1. Color edges vs. intensity edges The removal of color
eliminates more visual information from a painting than
from a photograph of a real scene. Specifically, conversion to gray-scale leaves most edges in photographs intact
but it eliminates many of the perceptual edges in paintings. These observations led to the following hypotheses:
(1) Perceptual edges in photographs are largely intensity edges. These intensity edges can at the same time be color edges, and there are few "pure" color edges (color, but not intensity, edges). (2) Many of the perceptual edges in paintings are pure color edges, as they result from color changes that are not accompanied by concomitant edge-like intensity changes. A quantitative criterion was developed. Consider a color input image, painting or photograph. First, the intensity edges were found. Next, intensity information was removed by dividing the R, G, and B image components by the image intensity at each pixel, resulting in normalized RGB components. The color edges of the resulting "intensity-free" color image were determined by applying the Canny edge detector to the three normalized color channels and fusing the resulting edges. Consider the edge pixels that are intensity but not color edges (pure intensity edge pixels); hue does not change substantially across a pure intensity edge. Let $f_1$ denote the fraction of pure intensity-edge pixels:

$f_1 = \dfrac{\#\{\text{pixels that are intensity, but not color, edges}\}}{\#\{\text{edge pixels}\}}$

Our hypothesis was that $f_1$ is larger for photographs.
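For concreteness, the following Python sketch shows one way to compute $f_1$; the use of the Canny detector follows the text, but the function name, the edge thresholds, and the rescaling of the normalized channels are our assumptions, not the paper's.

import cv2
import numpy as np

def pure_intensity_edge_fraction(bgr):
    """f1: fraction of edge pixels that are intensity edges but not color edges."""
    img = bgr.astype(np.float32)
    intensity = img.mean(axis=2) + 1e-6               # avoid division by zero
    int_edges = cv2.Canny(intensity.astype(np.uint8), 50, 150) > 0
    # Remove intensity information: divide each channel by the intensity.
    norm = img / intensity[..., None]
    norm = cv2.normalize(norm, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Color edges: Canny on each normalized channel, fused by logical OR.
    col_edges = np.zeros_like(int_edges)
    for c in range(3):
        col_edges |= cv2.Canny(norm[..., c], 50, 150) > 0
    all_edges = int_edges | col_edges
    if all_edges.sum() == 0:
        return 0.0
    return float((int_edges & ~col_edges).sum()) / float(all_edges.sum())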
2. Spatial variation of color Color changes to a larger extent from pixel to pixel in paintings than in photographs.
Given an input image, its R, G, and B channels were normalized by division by the image intensity. If, in the neighborhood of a pixel, the R, G, and B color surfaces are parallel or nearly so, there is no substantial hue change in that neighborhood; if there is a non-zero angle between any two of these surfaces, the color changes qualitatively. At each pixel we determined, in the R, G, and B domains respectively, the orientation of the plane that best fitted a small neighborhood centered on the pixel of interest. Thus, at each pixel, one obtains three normals, $\vec{n}_R$, $\vec{n}_G$, and $\vec{n}_B$. The sum of the areas of the facets of the pyramid determined by these normals was taken as a measure of the local variation of color around the pixel. Let $f_2$ denote the average of this quantity taken over all image pixels. $f_2$ should be, on average, larger for paintings than for photographs.
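A sketch of this computation follows. Approximating the least-squares plane fit by smoothed image gradients, and measuring the pyramid facet areas as the areas of the triangles spanned by pairs of unit normals, is our reading of the construction, not necessarily the authors' exact procedure.

import cv2
import numpy as np

def color_variation_feature(bgr, ksize=5):
    """f2: mean local color variation derived from per-channel plane normals."""
    img = bgr.astype(np.float32)
    intensity = img.mean(axis=2) + 1e-6
    norm = img / intensity[..., None]                 # intensity-normalized channels
    normals = []
    for c in range(3):
        ch = cv2.blur(norm[..., c], (ksize, ksize))   # window average ~ local plane fit
        gx = cv2.Sobel(ch, cv2.CV_32F, 1, 0, ksize=3) # plane slope in x
        gy = cv2.Sobel(ch, cv2.CV_32F, 0, 1, ksize=3) # plane slope in y
        n = np.stack([-gx, -gy, np.ones_like(gx)], axis=-1)
        normals.append(n / np.linalg.norm(n, axis=-1, keepdims=True))
    # Sum of the areas of the triangles spanned by each pair of unit normals.
    area = np.zeros(img.shape[:2], np.float32)
    for i in range(3):
        for j in range(i + 1, 3):
            area += 0.5 * np.linalg.norm(np.cross(normals[i], normals[j]), axis=-1)
    return float(area.mean())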
3. Number of unique colors Paintings appear to have a larger color palette than photographs. The number of unique colors in an image was normalized by the total number of pixels, resulting in a measure, $f_3$, which we hypothesized to be larger for paintings.
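Counting unique colors is straightforward; a minimal sketch, assuming 8-bit RGB input:

import numpy as np

def unique_color_fraction(rgb):
    """f3: number of unique colors divided by the number of pixels."""
    pixels = rgb.reshape(-1, 3).astype(np.uint32)
    # Pack R, G, B into one 24-bit integer so np.unique sees one value per color.
    packed = (pixels[:, 0] << 16) | (pixels[:, 1] << 8) | pixels[:, 2]
    return len(np.unique(packed)) / float(len(packed))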
4. Pixel saturation Paintings tend to contain a larger percentage of highly saturated pixels, whereas photographs contain more unsaturated pixels. This can be seen in Figure 1, which displays the mean saturation histograms derived from all paintings and all photographs in our data sets. The input images were transformed to HSV color space, and their saturation histograms were determined using 20 bins. The ratio, $f_4$, between the count in the highest bin and the count in the lowest bin should, on average, be larger for paintings, and can be used to help painting-photograph discrimination.
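A minimal sketch of $f_4$, assuming OpenCV's 8-bit HSV conversion; the guard against an empty lowest bin is our addition.

import cv2
import numpy as np

def saturation_ratio(bgr, bins=20):
    """f4: count in the highest saturation bin over the count in the lowest bin."""
    sat = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[..., 1]   # 8-bit saturation channel
    hist, _ = np.histogram(sat, bins=bins, range=(0, 256))
    return hist[-1] / float(hist[0] + 1)                 # +1 guards an empty lowest bin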
Figure 1: The mean saturation histogram for photographs
(black) and paintings (yellow). 20 bins were used. Photographs have more unsaturated pixels, paintings have more
highly saturated pixels.
2.2. Pixel distribution in RGBXY space
An image is a point-cloud in RGB space. The clouds of
color-poor images (photographs, mostly) are restricted to
subspaces of the 3-D space, having the appearance of cylinders or planes; the clouds of color-rich images (paintings,
mostly) are fully 3-D and cannot be approximated well by a
1-D or 2-D subspace. One can enhance this representation
by adding the two spatial coordinates, $x$ and $y$, to the RGB vector of each image pixel, resulting in a 5-dimensional joint color-location space we call RGBXY. The singular values of the $5 \times 5$ covariance matrix of the RGBXY point cloud describe the variability of the image pixels both in color space and across the image. Typically, paintings use a larger color palette and also have a larger spatial variation of color, resulting in larger singular values of this covariance matrix. We represented each image
by the 5-dimensional vector of the singular values of its
RGBXY pixel covariance matrix. We expect paintings and
photographs to be well-separated in this space.
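A compact sketch of the RGBXY feature; the pixel coordinates are used unscaled here, which is an assumption, since the paper does not state how the color and spatial units were balanced.

import numpy as np

def rgbxy_singular_values(rgb):
    """Five singular values of the covariance matrix of the RGBXY point cloud."""
    h, w, _ = rgb.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cloud = np.column_stack([rgb.reshape(-1, 3).astype(np.float64),
                             xs.ravel(), ys.ravel()])
    cov = np.cov(cloud, rowvar=False)                    # 5 x 5 covariance matrix
    return np.linalg.svd(cov, compute_uv=False)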
2.3. Gray-Scale-Texture
The features described above use color to distinguish between paintings and photographs. One can observe that the
texture elements (texels) in photographs tend to be repetitive
while in hand-crafted images it is difficult to maintain texel
uniformity. This section reviews the Gabor filter design and
its implementation in this work on gray-scale images.
Gabor filter review The joint color-texture properties of images were represented in the wavelength-Fourier domain $(u, v, \lambda)$, where $\lambda$ is the wavelength of the light energy of the three-dimensional density function $E(x, y, \lambda)$ of the color image, $(u, v)$ denotes the spatial frequency, and $(x, y)$ the spatial coordinates. The texture measurement of the signal at a given spatial frequency is obtained by multiplication with a Gaussian function centered at that frequency at scale $\sigma$ [6][1]. This is equivalent to convolution with a Gabor filter in the $(x, y)$ domain [8][7]. Therefore, the texture measurement in the $(x, y)$ domain at wavelength $\lambda$ is

$T(x, y, \lambda) = E(x, y, \lambda) * g(x, y),$

where $*$ denotes convolution, and

$g(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \exp\!\left(2\pi j F (x \cos\theta + y \sin\theta)\right)$

is the 2-D Gabor function at radial center frequency $F$ (cycles/pixel) and filter orientation $\theta \in [0, \pi)$. Here, $\sigma$ is inversely proportional to $F$.
Implementation In the current implementation the color information was first simplified by using the R, G, and B channels of the given RGB color image instead of reconstructing $E(x, y, \lambda)$ from its derivatives at a given frequency [6]. A gray-scale image signal was then obtained by averaging the three color channels; the motivation is to derive a feature that is color-independent. The design of the Gabor filter bank used here is similar to the one described in [1], where the redundancy in the filtered images caused by the non-orthogonality of the Gabor wavelets is reduced. Four different scales and four different orientations (0, 45, 90, 135 degrees) were used. The mean and the standard deviation of the Gabor responses across image locations were then computed for each filtered image, yielding a feature vector of 72 dimensions.
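A sketch of the gray-scale Gabor feature extraction follows; the kernel size, the wavelengths standing in for the four scales, and the $\sigma$-$F$ coupling constant are our choices, so the resulting dimensionality differs from the 72 reported above.

import cv2
import numpy as np

def gabor_texture_features(bgr, wavelengths=(4, 8, 16, 32), thetas=(0, 45, 90, 135)):
    """Mean and standard deviation of Gabor responses per scale and orientation."""
    gray = bgr.astype(np.float32).mean(axis=2)           # average the color channels
    feats = []
    for lam in wavelengths:                              # one wavelength per scale
        sigma = 0.56 * lam                               # sigma inversely proportional to F = 1/lam
        for th in thetas:
            kern = cv2.getGaborKernel((31, 31), sigma, np.deg2rad(th), lam, 0.5, 0)
            resp = np.abs(cv2.filter2D(gray, cv2.CV_32F, kern))
            feats += [resp.mean(), resp.std()]           # two statistics per filter
    return np.array(feats)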
3. Image set
We used 6000 photographs and 6000 paintings, obtained
from a variety of web sites (e.g. www.dlib.indiana.edu,
www.artchive.com, www.freefoto.com). The paintings included a wide variety of artistic styles and historical periods, from Byzantine Art and Renaissance to Modernism
(cubism, surrealism, pop art, etc.). The photographs were also highly varied in content, including animals, humans, city scenes, landscapes, and indoor scenes. Image resolution was typical of web-available images.

4. Results
Classification in $(f_1, f_2, f_3, f_4)$ space We first verified that $f_1$, $f_2$, $f_3$, and $f_4$ capture genuinely different image properties using two measures of redundancy: the pairwise feature correlations and the singular values of the feature covariance matrix. We found that (1) the pairwise feature correlations are insignificant and (2) the smallest singular value is significantly larger than 0. To illustrate the distribution of the images in $(f_1, f_2, f_3, f_4)$ space, we computed the principal components of the combined painting and photograph data set. Figure 2 displays the paintings and the photographs in the subspace of the first two principal components. Interestingly, the photographs overlap a subclass of the paintings: the photograph data set coincides with the right "lobe" of the paintings point cloud. This observation is in accord with the larger variability of the paintings class, indicated by the larger singular values of its covariance matrix, and with the view that photographs can be construed as extremely realistic paintings.
Figure 2: Painting and photograph data points represented separately in the same two-dimensional space of the first two principal components of the common painting-photograph image set. LEFT: paintings. RIGHT: photographs.
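The visualization can be reproduced along the following lines; the feature matrices below are hypothetical placeholders, with the actual values coming from the features of Section 2.

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical feature matrices: one row per image, columns f1..f4.
painting_feats = np.random.rand(6000, 4)
photo_feats = np.random.rand(6000, 4)

X = np.vstack([painting_feats, photo_feats])
proj = PCA(n_components=2).fit_transform(X - X.mean(axis=0))
paintings_2d, photos_2d = proj[:6000], proj[6000:]       # plotted separately in Figure 2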
We measured the separation effectiveness of each of the visual features when used individually. Each feature was measured for all photographs and all paintings in the database, and an optimum threshold value, chosen so as to minimize the maximum of the two misclassification rates (for photographs and for paintings), was calculated. The results, listed in Table 1, indicate that the separation power of each visual feature is modest; the edge-based feature, $f_1$, is the most effective.
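A sketch of the per-feature threshold search; the comparison direction below assumes the feature is larger for photographs, as hypothesized for $f_1$, and would be reversed for $f_2$ through $f_4$.

import numpy as np

def optimal_threshold(f_photos, f_paintings):
    """Threshold minimizing the larger of the two per-class miss rates."""
    best_t, best_cost = None, np.inf
    for t in np.unique(np.concatenate([f_photos, f_paintings])):
        miss_ph = np.mean(f_photos < t)          # photographs should fall above t
        miss_p = np.mean(f_paintings >= t)       # paintings should fall below t
        cost = max(miss_p, miss_ph)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost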
Feature   P miss rate (%)   Ph miss rate (%)   Order
$f_1$     33.34             33.34              P, Ph
$f_2$     36.14             36.13              P, Ph
$f_3$     37.40             37.43              P, Ph
$f_4$     37.93             37.92              P, Ph

Table 1: Painting-photograph discrimination performance. P denotes paintings, Ph denotes photographs. For each feature, paintings were separated from photographs using an optimal threshold. The miss rate is defined as the proportion of images incorrectly classified. The last column indicates the order of the classes with respect to the threshold.

Neural network classification methodology A perceptron with six sigmoidal units in its single hidden layer
was employed to perform painting-photograph discrimination in the joint visual feature space $(f_1, f_2, f_3, f_4)$. Classifier performance was evaluated as follows. We partitioned the painting and photograph sets into 6 parts (non-overlapping subsets) of 1000 elements each. By pairing
all photograph parts with all painting parts, 36 training sets
were generated. Thus, a training set consisted of 1000 paintings and 1000 photographs, and the corresponding test set
consisted of 5000 paintings and 5000 photographs. 36 networks were trained and tested, one for each training set.
Due to the small size of the network, the convergence of the backpropagation calculation was quite rapid in almost all cases, and usually a few re-initializations of the optimization were sufficient for deriving an effective network.
On average, the networks correctly classified 71% of the photographs and 72% of the paintings in the test set, with standard deviations of 4% and 5%, respectively.
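In scikit-learn terms, such a network could be set up as follows; the solver and iteration budget are our choices, and the feature matrices are hypothetical placeholders.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical feature matrices: rows are images, columns the four visual features.
X_train, y_train = np.random.rand(2000, 4), np.random.randint(0, 2, 2000)
X_test, y_test = np.random.rand(10000, 4), np.random.randint(0, 2, 10000)

# One hidden layer of six sigmoidal units, as described above.
net = MLPClassifier(hidden_layer_sizes=(6,), activation='logistic', max_iter=2000)
net.fit(X_train, y_train)                  # labels: 0 = painting, 1 = photograph
print(net.score(X_test, y_test))           # fraction of test images correctly classified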
Classification in RGBXY space For visualization purposes, we determined the principal components of the common painting and photograph data set encoded in the space
of the five singular values of the RGBXY covariance matrix. Figure 3 displays separately the painting and the photograph subsets in the space spanned by the first two principal components. The examination of Figure 3 reconfirms
the previously-made observation that photographs appear
to be a special case of paintings: the photograph point
cloud has less variance and partially overlaps (at least in the
space spanned by the first two principal components) with
a portion of the paintings point cloud. This observation is
also supported by the larger singular values of the painting point cloud (5.03, 0.21, 0.1, 0.08, 0.002) compared to
those of the photograph point cloud (4.15, 0.12, 0.08, 0.03,
0.003). Following the classification methodology described
above, a six-unit perceptron was used to perform painting-photograph discrimination in the 5-dimensional space of the RGBXY singular values. On average, the networks correctly classified 81% of the photographs and 81% of the paintings in the test set, with standard deviations of 3% and 3%, respectively.
Figure 3: RGBXY space: painting and photograph data points represented separately in the same two-dimensional space of the first two principal components of the common painting-photograph image set. LEFT: photographs. RIGHT: paintings.
Classification in Gabor space We calculated the means
and the standard deviations of the texture features over all
paintings and all photographs. Figure 4 displays the results.
Interestingly, photographs tend to have more energy at horizontal and vertical orientations at all scales, while paintings have more energy at diagonal (45 and 135 degrees)
orientations. Following the methodology described above, a six-neuron perceptron was used to perform painting-photograph discrimination in the space of the Gabor texture features. On average, the networks correctly classified 78% of the photographs and 79% of the paintings in the test set, with standard deviations of 4% and 5%, respectively.
Discrimination using multiple classifiers We found that
the most effective method of combining the three neural classifiers described above is simply to average their outputs, following the "committees" of neural networks idea (see, for example, [5]). An individual classifier outputs a number between 0 (perfect painting) and 1 (perfect photograph). Thus, if, for a given input image, the average of the outputs of the three classifiers was below 0.5, the image was classified as a painting; otherwise it was considered a photograph. To evaluate the
performance of this combination of the individual classifiers, we partitioned the painting and photograph sets into
6 equal parts each. By pairing all photograph parts with
all painting parts, 36 training sets were generated. A training set consisted of 1000 paintings and 1000 photographs,
and the corresponding test set consisted of the remaining
5000 paintings and 5000 photographs. Each of the three classifiers was trained on the same training set, and their individual and average performances were measured on
the same test set. This procedure was repeated for all available training and test sets. Classifier performance is described in Table 2. The averaged (combined) classifier exceeds 90% correct classification, significantly outperforming the individual classifiers for both paintings and photographs. This improvement is to be expected, since each classifier works in a different feature space.

Figure 4: Errorbar plots illustrating the dependence of the image-mean and image-standard-deviation of the Gabor filter outputs on filter scale and orientation for the painting (red lines) and photograph (dashed blue lines) image sets. TOP LEFT: horizontal orientation; image-set mean and image-set standard deviation of the image-mean of the Gabor filter output magnitude, as a function of scale; errorbars represent the standard deviations determined across images. TOP MIDDLE: vertical orientation. TOP RIGHT: diagonal orientations (the data for 45 and 135 degrees are presented together). BOTTOM LEFT: horizontal orientation; image-set mean and image-set standard deviation of the image-standard-deviation of the Gabor filter output magnitude, as a function of filter scale. BOTTOM MIDDLE: vertical orientation. BOTTOM RIGHT: diagonal orientations.

Classifier                          P hit rate    Ph hit rate
Visual features $(f_1, ..., f_4)$   72 ± 5 %      71 ± 4 %
RGBXY singular values               81 ± 3 %      81 ± 3 %
Gabor texture                       79 ± 5 %      78 ± 4 %
Combined (average)                  94 %          92 %

Table 2: Classification performance: the mean and the standard deviation of the hit rates over the testing sets. The first three rows are the classifiers operating in the space of the scalar-valued visual features, in RGBXY space, and in Gabor space, respectively; the last row is the average (combined) classifier. P denotes paintings, Ph denotes photographs.
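The averaging rule described above can be sketched as follows; the 0.5 decision threshold and the use of class-1 probabilities as classifier outputs are our assumptions.

import numpy as np

def committee_label(nets, feature_vectors):
    """Average the three classifiers' outputs and threshold at 0.5."""
    # Each network sees its own feature space: visual, RGBXY, and Gabor.
    outputs = [net.predict_proba(x.reshape(1, -1))[0, 1]   # P(photograph)
               for net, x in zip(nets, feature_vectors)]
    return "photograph" if np.mean(outputs) >= 0.5 else "painting"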
5. Illustrating classifier performance
We illustrate with examples the performance of our classifier. We selected the best classifier from the set of classifiers from which the statistics in Table 2 were derived, and we studied its performance on its test set.
Typical photographs and paintings For an input image, the output of the combined classifier is a number between 0 and 1, with 0 corresponding to a perfect painting and 1 to a perfect photograph; in other words, the classifier output (displayed above each image) can be interpreted as the degree of photorealism of the input image. We illustrate the behavior of the combined classifier by displaying images for which the classifier output was very close to 0 or very close to 1. Thus, these are images that our classifier considers to be typical paintings and photographs. We note that the error rate was very low (under 4%) at these output values. Figures 5 and 6 display several typical paintings. Note the variety of styles of these paintings: one is tempted to conclude that the features the classifiers use capture the essence of the "paintingness" of an image. Figures 7 and 8 display examples of typical photographs. We note that these tend to be typical photographs, not artistic or in any way (illumination, subject, etc.) unusual.
Figure 5: Images rated as typical paintings. Classifier output is displayed above each image. An output of 1 is a perfect photograph
Misclassified images The mistakes made by our classifier
were interesting, in that they seemed to reflect the degree
of perceptual photorealism of the input image. Figures 9,
10, 11 display paintings that were incorrectly classified as
photographs. Note that most of these incorrectly classified
paintings look quite photorealistic at a local level, even if
their content is not realistic. Figures 12, 13, and 14 display photographs that were incorrectly classified as paintings.
Figure 8: Images rated as typical photographs. Classifier
output is displayed above each image. An output of 1 is a
perfect photograph
Figure 6: Images rated as typical paintings. Classifier output is displayed above each image. An output of 1 is a perfect photograph
Figure 9: Paintings classified as photographs. Classifier
output is displayed above each image. An output of 1 is
a perfect photograph
Figure 7: Images rated as typical photographs. Classifier
output is displayed above each image. An output of 1 is a
perfect photograph
These photographs correspond, by and large, to vividly colored objects (which sometimes are painted 3-D objects), to blurry or "artistic" photographs, or to photographs taken under unusual illumination conditions.
6. Psychophysical experiments
The inspection of the correctly and the incorrectly classified images appears to suggest that the value of the output
of the combined classifier correlates with the images’ perceptual degree of photorealism. The goal of the experiment
described in this section was to obtain subjective ratings of
degree of photorealism for both paintings and photographs.
Once measured psychophysically, the subjective ratings can
then be compared with the output of the combined classifier for the same images, and thus a quantitative measure
of the perceptual relevance of the classifier’s output can be
derived. Two observations are in order. First, in the context of this work, the degree of photorealism of an image is
not dependent on image content, but on much lower-level
image features. For example, certain surrealistic paintings
(Magritte, Dali) are, according to our definition of the term,
photorealistic, despite their non-realistic content. Thus, to
remove the influence of image content, photorealism ratings
should be performed locally, on image patches. Second,
not all photographs have a maximum degree of photorealism: some photographs taken in peculiar illumination conditions and some “artistic” photographs look painting-like.
To reduce the influence of image content on photorealism
ratings, the subjects were shown scrambled versions of the
original images. The scrambled images were obtained by
dividing the image into square blocks, and then rearranging
the blocks randomly, as shown in Figure 15.
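Block scrambling of this kind can be implemented in a few lines; the block size below is our assumption, since the paper does not specify it.

import numpy as np

def scramble(img, block=32):
    """Divide the image into square blocks and rearrange them randomly."""
    h = img.shape[0] // block * block
    w = img.shape[1] // block * block
    img = img[:h, :w]                        # crop to a whole number of blocks
    blocks = [img[r:r + block, c:c + block]
              for r in range(0, h, block) for c in range(0, w, block)]
    np.random.shuffle(blocks)
    per_row = w // block
    rows = [np.hstack(blocks[i:i + per_row]) for i in range(0, len(blocks), per_row)]
    return np.vstack(rows)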
Two subjects rated 600 paintings and photographs randomly selected from among the image sets. The subjects
were asked to estimate, on a scale from 1 to 10, the likelihood that the input image was a painting or a photograph: 1
corresponded to a perfect photograph, 10 to a perfect painting. The subjects made their responses by using the F1–F10
keys on the computer keyboard.
Figure 10: Paintings classified as photographs. Classifier
output is displayed above each image. An output of 1 is a
perfect photograph
Figure 12: Photographs classified as paintings. Classifier
output is displayed above each image. An output of 0 is a
perfect painting
Figure 11: Paintings classified as photographs. Classifier
output is displayed above each image. An output of 1 is a
perfect photograph
Each image was shown briefly (750 msec) and was tested twice, the two ratings being averaged. The ratings of the two subjects were highly similar, having a correlation coefficient of 0.90. The ratings of the two subjects were averaged and then compared to the outputs of the combined classifier for the same images. The correlation coefficient between the averaged subjective ratings and the classifier outputs was 0.865. This relatively strong correlation with perceptual data suggests that our classifier captured perceptually relevant features.
7. Discussion
We presented an image classification system that discriminates paintings from photographs. This image classification
problem is challenging and interesting, as it is very general
and must be performed in image-content-independent fashion. Using low-level image features, and a relatively small
training set, we achieved discrimination performance levels of over 90%. It is interesting to compare our results with the work of Athitsos et al. [2], who accurately (over 90% correct) distinguished photographs from computer-generated graphics.
Figure 13: Photographs classified as paintings. Classifier
output is displayed above each image. An output of 1 is a
perfect photograph
The authors used the term computer-generated graphics to denote desktop or web-page icons and
not computer-rendered images of 3-D scenes. Obviously,
paintings can be much more similar to photographs than
icons are. Several features these authors used are similar
to ours. Athitsos et al. noted that there is more variability in
the color transitions from pixel to pixel in photographs than
in graphics. We also quantified the same feature (albeit in a
different way) and found more variability in paintings than
in photographs. The authors also observed that edges are
much sharper in graphics than in photographs. We, on the
other hand, found no difference in intensity edge structure
between photographs and paintings, but found instead that
paintings have significantly more pure-color edges. Athitsos et al. found that graphics contain more saturated colors than photographs; we found that the same was true for paintings. The authors found that graphics contain fewer unique (distinct) colors than photographs; we found paintings to have more unique colors than photographs. In addition, Athitsos et al. used two powerful color-histogram-based features: the prevalent color metric and the color histogram metric. We also found experimentally that hue (or full RGB) histograms are quite useful in distinguishing between photographs and paintings; for example, the hue corresponding to the color of the sky was quite characteristic of outdoor photographs. However, since hue is image-content-dependent to a large degree, we decided against using hue histograms (or RGB histograms) in our classifiers, as our intention was to distinguish paintings from photographs in an image-content-independent manner. Two of the features in Athitsos et al., smallest dimension and dimension ratio, exploited the size characteristics of the graphics images and were not applicable to our problem.

Most of our features use color in one way or another. The Gabor features are the only ones that use image intensities exclusively, and taken in isolation they are not sufficient for accurate discrimination. Thus, color is critical for the good performance of our classifier. This appears to differ from human classification, since humans can effortlessly discriminate paintings from photographs in gray-scale images. However, it is possible that human painting-photograph discrimination relies heavily on image content, and thus is not affected by the loss of color information. To elucidate this point, we plan to conduct psychophysical experiments similar to the one described in this paper on scrambled gray-level images. If the removal of color information affects the photorealism ratings significantly, it will mean that color is critical for human observers also.

It is easy to convince oneself that reducing image size (by smoothing and sub-sampling) renders the perceptual painting/photograph discrimination more difficult if the paintings have "realistic" content. Thus, it is reasonable to expect that the discrimination performance of our classifier will improve with increasing image resolution, a hypothesis that we plan to verify in future work. In our study we employed images of modest resolution, typical of web-available images. Certain differences between paintings and photographs might be observable only at high resolutions. Specifically, although we did not observe any differences in the intensity edge structure of paintings and photographs in our images, we suspect that the intensity edges in paintings differ from those in photographs. In future work, we plan to study this issue on high-resolution images.

Figure 14: Photographs classified as paintings. Classifier output is displayed above each image. An output of 1 is a perfect photograph.

Figure 15: LEFT: scrambled painting. RIGHT: scrambled photograph.
References
[1] B.S. Manjunath and W.Y. Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(8):837-842, 1996.
[2] V. Athitsos, M.J. Swain, and C. Frankel. Distinguishing photographs and graphics on the World Wide Web. In Workshop on Content-Based Access of Image and Video Libraries (CBAIVL '97), Puerto Rico, 1997.
[3] M. Szummer and R.W. Picard. Indoor-outdoor image classification. In IEEE International Workshop on Content-Based Access of Image and Video Databases (CAIVD '98), pages 42-51, 1998.
[4] A. Vailaya, A.K. Jain, and H.-J. Zhang. On image classification: City images vs. landscapes. Pattern Recognition, 31:1921-1936, 1998.
[5] C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, New York, 1995.
[6] M.A. Hoang et al. Measurement of color texture. citeseer.nj.nec.com/hoang02measurement.html.
[7] A.K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
[8] A.C. Bovik, M. Clark, and W.S. Geisler. Multichannel texture analysis using localized spatial filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(1):55-73, 1990.