Comparison of categorization criteria across image genres

Mari Laine-Hernandez
Department of Media Technology
Aalto University
School of Science and Technology
P.O.Box 15500, FI00076 Aalto
mari.laine-hernandez@tkk.fi
Stina Westman
Department of Media Technology
Aalto University
School of Science and Technology
P.O.Box 15500, FI00076 Aalto
stina.westman@tkk.fi
ABSTRACT
This paper describes a comparison of categorization criteria for three image genres. Two experiments were conducted in which naïve participants freely sorted stock photographs and abstract/surreal graphics. The results were compared to a previous study on magazine image categorization. The study also aimed to validate and generalize an existing framework for image categorization. Stock photographs were categorized mostly based on the presence of people and on whether they depicted objects or scenes. For abstract images, visual attributes were used the most. The lightness/darkness of images and their user-evaluated abstractness/representativeness also emerged as important criteria for categorization. We found that image categorization criteria for magazine and stock photographs are fairly similar, while the bases for categorizing abstract images differ more from the former two, most notably in the use of visual sorting criteria. However, according to the results of this study, people tend to use descriptors related to both image content and image production technique and style, as well as to interpret the affective impression of the images, in a way that remains constant across image genres. These facets are present in the evaluated categorization framework, which was deemed valid for these genres.

Keywords
Image categorization, free sorting, image categories, image genres

INTRODUCTION
The ability to represent images in meaningful and useful groups is essential for the ever-growing number of applications and services involving large collections of images, for both professional and personal use. Being central issues for visual information retrieval, subjective image categorization and description have been studied extensively. The subjects of the images in these studies vary from narrow topics such as images of people (Rorissa & Hastings, 2004; Rorissa & Iyer, 2008) or grayscale images of trees/forest (Greisdorf & O'Connor, 2001), to "generic" photographs with extremely varied content including e.g. people, objects and scenery (from a Kodak Photo-CD, Teeselink et al., 2000; from PhotoDisc, Mojsilovic & Rogowitz, 2001). They can also be selected based on a specific image genre such as vacation images (Vailaya et al., 1998), journalistic images (Laine-Hernandez & Westman, 2006; Sormunen et al., 1999) or magazine photographs (Laine-Hernandez & Westman, 2008; Westman & Laine-Hernandez, 2008), and as such may cover a wide range of semantic content.

The contents of the images have an influence on the categorization results. For instance, using photographs of people with the backgrounds removed, the participants of Rorissa and Hastings (2004) formed the following main categories: exercising, single men/women, working/busy, couples, poses, entertainment/fun, costume and facial expression. Vailaya et al. (1998), on the other hand, used outdoor vacation images which subjects sorted into the following categories: forests and farmlands, natural scenery and mountains, beach and water scenes, pathways, sunset/sunrise scenes, city scenes, bridges and city scenes with water, monuments, scenes of Washington DC, a mixed class of city and natural scenes, and face images. A comparison of the category labels in these two studies reveals the effect of the material that is being categorized.

The above-mentioned prior research has revealed that people evaluate image similarity mostly on a high, conceptual (as opposed to low, syntactic) level, based on the presence of people in the photographs, distinguishing e.g. between people, animals and inanimate objects, as well as according to whether the scenes and objects in the images are man-made or natural, e.g. buildings vs. landscapes (Laine-Hernandez & Westman, 2008; Mojsilovic & Rogowitz, 2001; Teeselink et al., 2000). Laine-Hernandez and Westman (2006) found that people also create image categories based on abstract concepts related to emotions or atmosphere, cultural references and visual elements. Professional image users (journalists) have
been shown to evaluate image similarity based on criteria
varying from syntactic/visual to highly abstract: shooting
distance and angle, colors, composition, cropping, direction
(horizontal/vertical), background, direction of movement,
objects in the image, number of people in the image, action,
facial expressions and gestures, and abstract theme
(Sormunen et al., 1999).
Image categorization for different image genres
We set out to compare three different genres of images:
magazine photographs, stock photographs, and abstract
images. In the case of magazine photographs, we looked at
the results obtained by Laine-Hernandez and Westman
(2008), and for stock photographs and abstract images we
conducted two new experiments with identical procedure.
As in many previous image categorization studies (e.g.
Laine-Hernandez & Westman, 2006; Rorissa & Hastings,
2004; Vailaya et al., 1998), we used free sorting to obtain
the image categories. The first two image genres (magazine and stock photographs) are more similar to each other, in production technique but also in style and semantic content, than to the third genre (computer-generated abstract graphics). By selecting these materials we hope to address the potentially different degrees of applicability of results on image categorization. Some
aspects of magazine image categorization behavior may
apply to stock photographs but not abstract images, while
other aspects might be shared between all three image
genres under study.
Most previous studies have involved a descriptive analysis of the category labels used but have not generated a hierarchical image categorization framework. Laine-Hernandez and Westman (2008), however, employed a qualitative data-based analysis of the category names given by their participants and produced the framework for magazine image categorization shown in Table 1.
The goals of the current study are to:
1. discover which image attributes (similarity criteria) are used when categorizing stock photographs and abstract images, and
2. validate the class framework of Laine-Hernandez and Westman (2008) by studying whether it can be generalized across image genres.
Main class: Subclasses
Function: Product photos, Reportage, Portraits, News photos, Illustration, Symbol photos, Advertisement, Profile piece, Vacation photos, Misc. function
People: Person, Social status, Gender, Posing, Groups, Relationships, Age, Expression, Eye contact
Object: Non-living, Buildings, Vehicles, Animals
Scene: Landscape, Interiors, Nature, Cityscape, Misc. scene
Theme: Food&drink, Work, Sports, Cinema, Travel, Fashion, Art, Transportation, Architecture, Home&family, Culture, Technology, Politics, Economy, Religion, Hobbies&leisure, Misc. theme
Story: Event, Time, Activity
Affective: Emotion, Mood
Description: Property, Number
Visual: Color, Composition, Motion, Shape
Photography: Distance, Black&white, Style, Image size, Cropping
Table 1. Framework for magazine image categorization (Laine-Hernandez & Westman, 2008)

METHODOLOGY
The study of Laine-Hernandez and Westman (2008) involved two participant groups, expert (staff members at magazines, newspapers, picture agencies and museum photograph archives) and non-expert (engineering students or university employees). Differences were found in the two groups' categorization behavior. For the two new experiments we recruited non-expert participants, and therefore only look at the results that Laine-Hernandez and Westman obtained with non-experts (N=18).
Experiment I – Stock photographs
Test images
One hundred stock photographs were chosen at random from a collection of over 4,000 color stock photographs with varied content, originally collected from the stock photography service Photos.com. Examples of the test
images are shown in Figure 1. The photographs were
printed so that the longer side of each photo measured
roughly 15 cm. Each photograph was glued on grey
cardboard for easier handling.
Figure 1. Examples of test images in Experiment I
Participants
The participants (N=20, 10 female) were engineering
students and university staff. Their age varied from 19 to 55
(mean 27). They were rewarded for their effort with movie
tickets.
Procedure
The participant was handed the test images in one random-ordered pile. The written task instruction was:
Sort the images into piles according to their similarity so
that images similar to each other are in the same pile. You
decide the basis on which you evaluate similarity. There are
no ‘correct’ answers, but it is about what you experience as
similarity. You decide the number of piles and how many
images there are in each pile. An individual image can also
form a pile. Take your time in completing the task. There is
no time limit for the experiment.
The participant was also told that there would be a discussion about the piles after the completion of the task. After completing the sorting task the participant was asked to explain the basis on which they had formed each pile, i.e. to describe and name them.
Experiment II – Abstract/surreal images
Test images
Test image candidates were collected from various websites
(caedes.net, deviantart.com, digitalart.org, flickr.com
creative commons, sxc.hu). All the collected images were
keyworded as “abstract” or “abstract/surreal” on these sites.
The images were vector art, 3D rendered art, fractal art,
computer desktop backgrounds, etc. We chose images based on high user ratings when these were provided by the website. In order to produce decent-quality prints, we required that
the smaller side of the image be at least 500 pixels and the
longer side at least 800 pixels. We looked for color images
that had visually complex, abstract content, but also images
with semi-representative content (e.g. 3D modeled images)
were included in order to make the test image collection
varied within the genre. We collected approximately 250
test image candidates, and then selected the final one
hundred test images from those at random. Examples of test
images are shown in Figure 2.
The images were printed at a resolution of 180 dpi, with the length of the longer side of the images varying between 11.2 and 12.7 cm. The images were glued on grey cardboard for easier handling. Image orientation was marked with a small circle drawn under each image denoting the bottom, and all images were handed to the participant in the correct orientation.

Figure 2. Examples of test images in Experiment II. Left: "Foucault Pendulum" by Burto (Caedes.net), center: "strained glass" by Buttersweet (Flickr.com), right: "The Dilemma of Solipsism" by WENPEDER (Caedes.net).

Participants
The participants (N=20, 9 female) were recruited through engineering students' newsgroups. Their age varied from 20 to 38 (mean 24). They were rewarded for their effort with movie tickets.

Procedure
The first part of the procedure was identical to the procedure described for Experiment I. Category names were not required for single-image categories due to the expected difficulty in describing these categories.

After sorting the images and describing the resulting categories, 9 of the participants (4 female) carried out an additional abstractness/representativeness evaluation task. The evaluation procedure was taken from the context of image quality evaluation (Leisti et al., 2009). The test images were placed in one pile in random order. A paper rating scale with the numbers 1 to 7 at equal intervals, and with the words "abstract" and "representative" written at the ends of the scale, was placed on a table. The participant was first asked to select the most abstract test image and place it on number 1 ("abstract"). The participant was then asked to select the most representative test image and place it on number 7 ("representative"). After that, the participant was asked to place the remaining images on numbers 1 to 7 so that representativeness grows linearly from 1 to 7. A mean opinion score was calculated from the participants' evaluations for each test image.

Data analysis
Qualitative analysis of category names
The category names or descriptions given by the
participants in Experiments I and II were analyzed
qualitatively by the authors and assigned to the main and
subclasses shown in Table 1. The reader should note that in
this paper, the terms “category” and “class” carry distinct
meanings. Category refers to an image group created by a
participant in an experiment. Class refers to an instance of
category names coded according to the categorization
framework developed by Laine-Hernandez & Westman
(2008). If the category included references to multiple
classes (i.e. multiclass category), two or more class
instances were created from the name. For example, if the
category was described as “Ocean-themed, blue images”,
the instances “ocean-theme” (Theme – Misc. theme) and
“blue” (Visual - Color) were separated.
The same procedure was employed by Laine-Hernandez
and Westman (2008) and the results obtained here will be
compared to those. Data from the second phase of the
sorting procedure in Laine-Hernandez and Westman (2008)
is disregarded here, as the results regarding the use of
classes were determined by the first part of the experiment,
identical to the procedure in Experiments I and II.
Statistical analysis
For purposes of statistical comparison, the distributions to be compared were tested for normality using the Shapiro-Wilk test. If they were normally distributed, parametric tests for two or more independent samples (t-test or one-way analysis of variance (ANOVA), respectively) were applied. If they were not normally distributed, we used non-parametric tests for two or more independent samples (Mann-Whitney U test or Kruskal-Wallis test).
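To make this test-selection rule concrete, the following minimal Python/SciPy sketch illustrates the same logic. It is an illustration only, not the analysis code used in the study (which is not published with the paper); the input format, a list of per-participant value arrays called groups, is an assumption.

from scipy import stats

def compare_groups(groups, alpha=0.05):
    # Illustrative sketch of the test-selection rule described above,
    # not the study's own analysis code.
    # Shapiro-Wilk normality check on every group.
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    if normal:
        # Parametric route: t-test for two groups, one-way ANOVA for more.
        if len(groups) == 2:
            return stats.ttest_ind(groups[0], groups[1])
        return stats.f_oneway(*groups)
    # Non-parametric route: Mann-Whitney U for two groups, Kruskal-Wallis for more.
    if len(groups) == 2:
        return stats.mannwhitneyu(groups[0], groups[1])
    return stats.kruskal(*groups)

For example, compare_groups could be called with the per-participant category counts of the three experiments to obtain the kind of comparison reported under Results.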
Multidimensional scaling (MDS)
The sorting data from Experiments I and II was also visualized in Matlab (The MathWorks) using non-metric multidimensional scaling. MDS is a multivariate data analysis method that seeks to express the structure of similarity data (given in the form of a distance matrix) spatially in a given (low) number of dimensions. The Standardized Residual Sum of Squares (STRESS) value is used to evaluate the goodness of fit of an MDS solution. A squared stress, normalized with the sum of the 4th powers of the inter-point distances, was used in the analysis. A stress value <0.025 has been considered excellent, 0.05 good, 0.10 fair and >0.20 poor (Kruskal, 1964). These are only rough guidelines and not applicable in all situations, as the stress value depends on the number of stimuli and dimensions. When the number of points (here images) is much larger than the number of dimensions of the space they are represented in, as in our case (100 images in a two-dimensional space), higher stress values may be acceptable (Borg & Groenen, 2005). In practice, stress values below 0.2 are taken to indicate an acceptable fit of the solution to the similarity data (Cox & Cox, 2001).

In order to obtain the MDS solution, the data from each experiment was first converted to an aggregate dissimilarity (distance) matrix as follows. The percent overlap for each pair of photographs i and j was calculated as the ratio of the number of subjects who placed both i and j in the same category to the total number of subjects. The percent overlap Sij gives a measure of similarity, which was then converted to a measure of dissimilarity: δij = 1 - Sij. MDS was performed on these dissimilarity matrices.
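The construction of the dissimilarity matrix and the scaling step can be sketched as follows. The study's own analysis was run in Matlab; the Python version below is only an illustrative equivalent, and the assumed input (one array of pile labels per participant, one label per image) is a simplification.

import numpy as np
from sklearn.manifold import MDS

def dissimilarity_matrix(sortings, n_images=100):
    # sortings: one array per participant assigning a pile id to every image.
    overlap = np.zeros((n_images, n_images))
    for assignment in sortings:
        a = np.asarray(assignment)
        overlap += (a[:, None] == a[None, :])   # i and j placed in the same pile
    s = overlap / len(sortings)                 # percent overlap S_ij
    d = 1.0 - s                                 # dissimilarity: delta_ij = 1 - S_ij
    np.fill_diagonal(d, 0.0)
    return d

def mds_solution(delta, seed=0):
    # Two-dimensional non-metric MDS on a precomputed dissimilarity matrix.
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=seed)
    coords = mds.fit_transform(delta)
    # Note: scikit-learn reports its own stress measure; the paper uses a
    # squared stress normalized by the fourth powers of the inter-point distances.
    return coords, mds.stress_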
Correspondence analysis (CA)
Correspondence analysis is an exploratory multivariate statistical technique used to analyze contingency tables, i.e. two-way and multi-way tables containing some measure of correspondence between the rows and columns. We used correspondence analysis to analyze the usage of subclasses in Experiments I and II. This was done after separating the multiclass categories into two or more categories. The image sorting data by itself was not used in the analysis, only the connections between images and subclasses. Only those subclasses that were connected to at least two images were included. The contingency table was formed by assigning subclasses (C) to columns and images (I) to rows. The value in cell (Ci, Ij) was determined by the number of times subclass Ci was used as a category description for image Ij.

CA was carried out with R (R Development Core Team, 2010), an environment for statistical computing, using the package FactoMineR (Husson et al., 2008). The results were further visualized using Matlab.
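For readers who want to follow the computation without R, the sketch below shows a standard correspondence analysis of such an image-by-subclass contingency table in NumPy. It only mirrors the underlying method; the study itself used FactoMineR, and the variable names are assumptions.

import numpy as np

def correspondence_analysis(N, n_dims=2):
    # N: contingency table with images as rows and subclasses as columns
    # (rows and columns are assumed to have non-zero totals).
    P = N / N.sum()                                       # correspondence matrix
    r = P.sum(axis=1)                                     # row masses
    c = P.sum(axis=0)                                     # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                                     # principal inertia per dimension
    explained = inertia / inertia.sum()
    rows = (U * sv) / np.sqrt(r)[:, None]                 # principal coordinates of images
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]              # principal coordinates of subclasses
    return rows[:, :n_dims], cols[:, :n_dims], explained[:n_dims]

The explained values correspond to the proportions of total inertia reported per dimension in the Results section.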
RESULTS
Number of categories
The numbers of categories formed for the different image genres are listed in Table 2. The number of test images was the same (100 images) in all three experiments. For magazine photographs, participants created 298 categories in total (out of which 2 were excluded from further analysis because of ambiguous meaning), for stock photographs 352 categories, and for abstract images 419 categories (out of which 2 were excluded from analysis because of ambiguous meaning, and 68 because they were single-image categories that did not need to be named by the participant). According to a one-way ANOVA there was no statistically significant difference between the three experiments in the number of categories formed.
Image genre   Total   Min   Max   Mean   Sd
Magazine      298     4     35    17     7
Stock         352     7     32    18     7
Abstract      419     4     39    21     10
Table 2. Number of categories for different image types
Time spent on sorting
With stock photographs the participants spent on average 16 minutes (min 7, max 32, sd 7) sorting the images (excluding the time it took to name the categories), with magazine photographs 24 minutes (min 9, max 45, sd 11), and with abstract images 33 minutes (min 13, max 76, sd 18). According to Mann-Whitney U tests, the difference between stock photographs and the other two image genres was significant (p<0.02), but the difference between magazine photographs and abstract images was not significant (p=0.11).
Category types
Out of the magazine photograph categories, 33.8% included two or more bases for sorting, resulting in a total of 409 class instances. Magazine photograph category names contained 1.38 classes on average. Of the stock photograph categories, 23.6% were multiclass categories, resulting in 442 class instances. Stock photograph category names contained an average of 1.26 classes. Of the analyzed abstract image categories, 38.1% were multiclass categories, resulting in a total of 507 class instances. Abstract image category names contained 1.45 classes on average.

It was also possible to form categories of single images. For magazine photographs 16% of the categories consisted of one image, for stock photographs 19% and for abstract images 21%. According to a Kruskal-Wallis test there was no significant difference between the three experiments in the number of single-image categories formed.
Category descriptions
For magazine photographs, according to Laine-Hernandez
and Westman (2008) “non-expert participants often formed
categories on various semantic levels instead of a
controlled categorization on a thematic level.” Category
descriptions often contained named objects, scenes and
aspects related to the people in the images. Categories were
also formed based on affective aspects, i.e. emotional
impact or the mood interpreted in the photograph.
Stock photograph categories mostly referred to the main semantic content of the images, often using just one word (e.g. animals, buildings, portraits, food) or a combination of two simple criteria (e.g. close-ups of objects, modern cityscapes, people working, nature details).
Most of the category descriptions for abstract images were
more complex than the ones for stock photographs. The
participants did not know beforehand that they would have
to name the categories (just that they would have to discuss
them in some way), so they probably grouped images based
on some feeling or intuition, resulting in category
descriptions such as: “A feel of realism, but also artistic
mess; a combination of a drawing and realism”, or “3D
modeled, the same mood, a very coherent group, spheres”,
or “bright colors, they remind me of stereotypical computer
graphics, could be (computer) desktop backgrounds, rich,
lots of details”. On the other hand, there were also many
concise category names targeting various semantic levels
(space, modern art, graffiti, flowers, landscapes, spheres,
splashes, spirals etc.).
Main class usage
The percentage distributions of main class usage for magazine photographs, stock photographs and abstract images are listed in Table 3 and visualized in Figure 3. The exact usage percentages of the main and subclasses for each image genre are also listed in the table in Appendix 1. In order to make the comparison between photographs and non-photographs (i.e. abstract images) possible, the main class Photography was extended to include aspects related to production technique (e.g. drawing, painting, comic-like).

Main class     Magazine   Stock   Abstract
Function**     3.2        3.4     9.9
People***      25.2       15.8    1.6
Object*        12.2       23.1    20.1
Scene***       10.3       18.6    3.2
Theme**        22.2       16.5    5.9
Story**        7.6        6.8     2.6
Affective      4.2        1.4     3.6
Description    7.6        6.6     8.7
Visual**       2.0        2.0     38.1
Photography    5.6        5.9     6.5
Table 3. Percentage distribution of main classes for different image genres. Magazine values are from Laine-Hernandez and Westman (2008); stock and abstract values are from Experiments I and II, respectively. Significant differences between all three image genres are marked with ***. Significant differences between abstract images vs. magazine and stock photographs are marked with **. The significant difference between magazine and stock photographs is marked with *.

Figure 3. Main class usage for different image genres.

For magazine photographs the main classes People (25.2%), Theme (22.2%), Object (12.2%) and Scene (10.3%) were used the most. The least-used classes were Visual (2.0%) and Function (3.2%). Stock photographs were also categorized mostly based on their main semantic content. The main classes that were used the most were Object (23.1%), Scene (18.6%), Theme (16.5%) and People (15.8%). The four most-used main classes were thus the same (although not in the same order) for magazine and stock photographs. The least-used classes for stock photographs were Affective (1.4%) and Visual (2.0%).

In contrast to magazine and stock photographs, abstract images were categorized the most based on visual features. The most-used main classes were Visual (38.1%), Object (20.1%), Function (9.9%) and Description (8.7%). The least-used classes were People (1.6%) and Story (2.6%).

According to a Kruskal-Wallis test the usage of the following three main classes did not significantly differ between the three experiments: Affective, Description and Photography. For the rest of the main classes we used the Mann-Whitney U test for two independent samples to see where the pairwise differences occurred. For classes Function, Story, Theme and Visual there was no significant difference between magazine and stock photographs, but the abstract images differed from the other two (Function p=0.050, Story p<0.007, Theme p<0.001, Visual p=0.000). For class Object the only significant difference occurred between magazine and stock photographs (p=0.018), and for class Function abstract images differed from both stock photographs (p=0.050) and magazine photographs (p=0.044). For classes People and Scene the differences were significant between all three image genres (p<0.001). The usage of the main classes was thus most similar between magazine and stock photographs; for seven out of ten main classes there was no significant difference.

We also analyzed for how many images (out of 100) each main class was used in each image genre. The results are depicted in Figure 4. The classes that were used to describe the most images across all genres are Description (80% of all images), Theme (74%), Function (71%) and Photography (71%).

Figure 4. Number of images (out of 100) for which each main class was used.

Multidimensional scaling results
A visualization of the two-dimensional MDS solution for stock photographs is presented in Figure 5. The stress value for the solution is 0.07. In the MDS solution several clusters of images can be identified, e.g. photographs of people, urban scenes, nature scenes, animals, non-living objects and food photographs. The diagonal axes of the solution seem to differentiate between 1) images with people and images without people, and 2) whether the images depict objects or scenes.

Figure 5. Result of the two-dimensional MDS solution for stock photographs.

A visualization of the two-dimensional MDS solution for abstract images is presented in Figure 6. The stress value of the solution is 0.17. This means that inter-categorizer agreement was lower for abstract images than for stock photographs.

Figure 6. Result of the two-dimensional MDS solution for abstract images.

Dark images seem to be placed in the bottom-left corner of the MDS visualization. We therefore converted the images to the CIELAB color space (Fairchild, 2005) and calculated their mean lightness. We then calculated the Pearson correlation coefficient r between the mean lightness and the MDS x+y values of the images, which resulted in r=0.756 (p<0.001), indicating a strong correlation. Representative images seem to be located in the top-left part of the MDS visualization. We therefore calculated the correlation between the abstractness scores (calculated as the mean of the participants' evaluations) and the MDS y−x coordinates of the images, resulting in r=0.772 (p<0.001), also indicating a strong correlation.
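The lightness correlation can be sketched as follows. The exact image-processing pipeline is not given in the paper, so this Python version, using scikit-image for the CIELAB conversion, is only an illustration; the function names and the assumption that images are RGB arrays scaled to [0, 1] are ours.

import numpy as np
from skimage.color import rgb2lab
from scipy.stats import pearsonr

def mean_lightness(img_rgb):
    # Mean of the L* (lightness) channel of the CIELAB representation.
    return rgb2lab(img_rgb)[..., 0].mean()

def lightness_vs_mds_diagonal(images, coords):
    # images: RGB arrays in the same order as the rows of the MDS
    # coordinate array coords (shape: n_images x 2).
    lightness = np.array([mean_lightness(im) for im in images])
    diagonal = coords[:, 0] + coords[:, 1]      # the "x + y" diagonal of the plot
    return pearsonr(lightness, diagonal)        # Pearson r and p-value

The abstractness correlation reported above would be computed in the same way, replacing lightness with the mean opinion scores and the diagonal with coords[:, 1] - coords[:, 0].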
Correspondence analysis results
The result of the correspondence analysis for the stock photographs is shown in Figure 7. Dimension 1 (horizontal axis) explains 14.3% of the total inertia (variance) of the data and dimension 2 (vertical axis) explains 13.5%. The two dimensions together thus account for 27.8% of the inertia.

The subclasses that correlate positively the most with dimension 1 are People - Person (r=0.87) and People - Portraits (r=0.74), and the ones that correlate negatively are Description - Property (r=-0.56) and Scene - Landscape (r=-0.53). The subclasses that correlate positively the most with dimension 2 are Object - Buildings (r=0.69) and Scene - Cityscape (r=0.67), and the ones that correlate negatively are Scene - Nature (r=-0.70) and Scene - Landscape (r=-0.55). All these correlations are statistically significant (p<0.001). Dimension 1 seems to differentiate between images of people and images with no people. Dimension 2 appears to represent the distinction between man-made and natural scenes/objects.

Figure 7. Result of the correspondence analysis based on subclass usage for stock photographs.

The result of the correspondence analysis for abstract images is shown in Figure 8. Dimension 1 explains 17.0% of the inertia and dimension 2 explains 15.4%, adding up to 32.4% of the total inertia. Even though the two dimensions explain more of the variance than the two dimensions for stock photographs, the solution is not as clear to interpret as the one for stock photographs (Figure 7). This is an indication of the complexity of similarity evaluations for abstract images. It is also partially due to the fact that two of the images, the only ones with clearly identifiable human figures, are isolated while the rest of the images are tightly clustered.

The subclasses correlating positively the most with dimension 1 are Photography - Technique (r=0.70) and Function - Advertisement (r=0.52), and the ones that correlate negatively are Theme - Misc. theme (r=-0.76) and Photography - Distance (r=-0.71). The subclasses that correlate positively the most with dimension 2 are People - Person (r=0.72) and Scene - Nature (r=0.43), and the ones that correlate negatively are Object - Non-living (r=-0.66) and Description - Property (r=-0.52). All these correlations are statistically significant (p<0.001). The semantic aspects represented by the two axes are not as clear as in the case of stock photographs, but the subclasses People - Person and Description - Property can be found at the ends of both dimension 1 of stock photographs and dimension 2 of abstract images.

Figure 8. Result of the correspondence analysis based on subclass usage for abstract images.
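One way to obtain subclass-dimension correlations of the kind quoted above, assuming the image-by-subclass contingency table N and the image coordinates from the correspondence analysis sketch given earlier, is to correlate each subclass's per-image usage counts with the images' coordinates on a dimension. The paper does not spell out the exact computation, so this is an illustrative reading rather than the study's own procedure.

import numpy as np
from scipy.stats import pearsonr

def subclass_dimension_correlations(N, row_coords, subclass_names, dim=0):
    # Correlate each subclass column of the contingency table N with the
    # image (row) coordinates on the chosen CA dimension.
    results = {}
    for j, name in enumerate(subclass_names):
        r, p = pearsonr(N[:, j], row_coords[:, dim])
        results[name] = (r, p)
    # Sort from the strongest positive to the strongest negative correlation.
    return dict(sorted(results.items(), key=lambda kv: -kv[1][0]))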
DISCUSSION
Complexity of image categorization
Magazine photograph category names contained on average
1.38 classes, stock photograph category names 1.26 classes,
and abstract image category names 1.45 classes. These
findings are roughly in line with Jörgensen’s (1995) results,
where one third of image group names were composed of
multiple (on average 1.5) terms. There is, however, some
variation, which could be interpreted as an indication of the
complexity of categorization criteria. The time taken to sort
the images can be interpreted to correlate with the difficulty
of grouping the test images into coherent categories. Taken
together, these two findings indicate that image categorization was simplest for stock photographs and most complex for abstract images. The more complex the categorization is, the more difficult it is to
complex the categorization is, the more difficult it is to
represent the criteria employed to perform it (e.g. in the
form of a categorization framework).
Applicability of the framework for image categorization
Based on our results on main class usage, the categorization
framework developed for magazine photographs (Laine-Hernandez & Westman, 2008) can be extended to apply to
stock photographs. For seven out of ten main classes there
were no significant usage differences. The only main
classes with significant differences were People, Scene and
Object. Since these classes describe the main semantic
content of the images, it is not surprising that their usage
differs between different image genres and even between
different image collections within genres. It should,
however, be noted that despite the differences, these three
classes were all among the four most-used classes for both
magazine and stock photographs.
Abstract images differed more from the two photograph genres; only three out of ten main classes did not show a statistically significant difference across all three image genres: Affective, Description and Photography. The usage of these classes was at best moderate when evaluated by their percentage share of the category descriptions. However, taking into account the number of images they were used to categorize, the classes Description, Function and Photography were among the four most-used main classes when averaged over the three image genres. This shows that they form important and widely applicable bases for image categorization across image genres.

The largest differences across all three image genres occurred in the use of the main classes People, Scene and Visual. Classes People and Scene were seldom used in the categorization of abstract images. As mentioned above, the differences can be explained by the subject matter of the test images. Of the 100 stock photographs, 38 both depicted people as the main subject matter and were grouped by the participants on that basis. The study of Laine-Hernandez and Westman (2008) included photographs from five different magazine types, each of them including photographs of people, one of them "predominantly" so. In contrast to this, only 2 of the 100 abstract images contained people depicted clearly enough for the participants to use them as sorting criteria, and these two images were strongly categorized based on this aspect (evident also from the correspondence analysis result in Figure 8). This demonstrates again the importance of the presence of people in images. Class Visual, on the other hand, was used rarely in the categorization of photographs, but in the case of abstract images it was by far the most used class. Visual aspects are therefore important in image categorization across genres.

Dimensions of similarity evaluations
The dimensions of image similarity criteria as revealed by MDS were different for stock photographs and abstract images. The result for stock photographs was largely similar to the results reported in earlier studies (Laine-Hernandez & Westman, 2008; Rogowitz et al., 1998; Teeselink et al., 2000). The result for abstract images, however, revealed entirely different bases for categorization: the lightness vs. darkness and the abstractness vs. representativeness of the images. These dimensions are not about the main semantic content of the images, but describe their visual and stylistic/representational aspects.

For stock photographs, the correspondence analysis conducted using sublevel class information revealed very similar axes to the MDS conducted by Rogowitz et al. (1998): man-made (buildings & cityscape) vs. natural (landscape & nature) and more human-like (person & portraits) vs. less human-like (property & landscape). Considering that the image sorting data was not used at all in the correspondence analysis, this result shows that the categorization framework of Laine-Hernandez and Westman (2008) is capable of accurately characterizing stock photographs when compared to user-evaluated image similarity. For abstract images the result was not as clear, indicating that the framework is perhaps less suitable for non-photographs and some other image genres.

Additions to the categorization framework
During the data analysis of Experiments I and II, a few possible additions to the categorization framework (Laine-Hernandez & Westman, 2008) emerged and were used in the data analysis for stock photographs and abstract images. Most notably, the addition of the subclass Object - Nature objects would logically fill the gap left between the subclasses Non-living and Animals. The subclass includes plants, flowers etc. and was used in both experiments. Because of the occurrences in stock photograph category names, the theme Technology was extended to include industry. In addition, especially for abstract images but to a lesser extent also for stock photographs, the function Art images appeared in the category names. Furthermore, visual impression (e.g. clear, fuzzy) was used as a categorization criterion for abstract images. The addition of these subclasses does not change the results of the comparisons between the three image genres, because the main class level remained unmodified.
CONCLUSIONS
This paper presented the results of two subjective image categorization experiments and a comparison between them as well as with an earlier study on the categorization of magazine photographs. We found that the image categorization criteria for magazine and stock photographs are fairly similar, while the bases for categorizing abstract images differ more from the former two. According to the results of this study, the image categorization framework developed for magazine photographs can be generalized fairly well to different image genres, and especially well to stock photographs. The categorization framework is detailed enough to characterize the contents and similarity of stock photographs.

ACKNOWLEDGMENTS
This study was funded by the Academy of Finland and the National Technology Agency of Finland.

REFERENCES
Borg, I., & Groenen, P. J. F. (2005). Modern Multidimensional Scaling: Theory and Applications, 2nd ed. Springer, New York.
Cox, T. F., & Cox, M. A. (2001). Multidimensional Scaling, 2nd ed. Chapman and Hall, Boca Raton.
Fairchild, M. D. (2005). Color Appearance Models, 2nd ed. Wiley-IS&T, Chichester, UK.
Greisdorf, H., & O'Connor, B. (2001). Modelling what users see when they look at images: a cognitive viewpoint. Journal of Documentation 58(1), 6-29.
Greisdorf, H., & O'Connor, B. (2002). What do users see? Exploring the cognitive nature of functional image retrieval. Proceedings of the 65th Annual Meeting of the American Society for Information Science and Technology 39, 383-390.
Husson, F., Josse, J., & Lê, S. (2008). FactoMineR: An R Package for Multivariate Analysis. Journal of Statistical Software 25(1), 1-18.
Jörgensen, C. (1995). Classifying Images: Criteria for Grouping as Revealed in a Sorting Task. Proceedings of the 6th ASIS SIG/CR Classification Research Workshop.
Jörgensen, C. (1998). Attributes of Images in Describing Tasks. Information Processing & Management 34(2), 161-174.
Kruskal, J. B. (1964). Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis. Psychometrika 29(1), 1-27.
Laine-Hernandez, M., & Westman, S. (2006). Image Semantics in the Description and Categorization of Journalistic Photographs. Proceedings of the 69th Annual Meeting of the American Society for Information Science and Technology.
Laine-Hernandez, M., & Westman, S. (2008). Multifaceted image similarity criteria as revealed by sorting tasks. Proceedings of the 71st Annual Meeting of the American Society for Information Science and Technology.
Leisti, T., Radun, J., Virtanen, T., Halonen, R., & Nyman, G. (2009). Subjective experience of image quality: attributes, definitions, and decision making of subjective image quality. In S. P. Farnand & F. Gaykema (Eds.), Image Quality and System Performance VI. San Jose, CA, USA.
Mojsilovic, A., & Rogowitz, B. (2001). Capturing Image Semantics with Low-level Descriptors. Proceedings of the IEEE International Conference on Image Processing (ICIP 2001) 1, 18-21.
R Development Core Team (2010). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.R-project.org.
Rogowitz, B. E., Frese, T., Smith, J. R., Bouman, C. A., & Kalin, E. (1998). Perceptual image similarity experiments. In Proceedings of the SPIE Conference on Human Vision and Electronic Imaging III, 3299, 576-590.
Rorissa, A., & Hastings, S. K. (2004). Free sorting of images: Attributes used for categorization. Proceedings of the 67th Annual Meeting of the American Society for Information Science and Technology.
Rorissa, A., & Iyer, H. (2008). Theories of cognition and image categorization: What category labels reveal about basic level theory. Journal of the American Society for Information Science and Technology 59(9), 1383-1392.
Sormunen, E., Markkula, M., & Järvelin, K. (1999). The Perceived Similarity of Photos - A Test-Collection Based Evaluation Framework for the Content-Based Photo Retrieval Algorithms. In S. W. Draper, M. D. Dunlop, I. Ruthven, & C. J. van Rijsbergen (Eds.), Mira 99: Evaluating Interactive Information Retrieval.
Teeselink, I. K., Blommaert, F., & de Ridder, H. (2000). Image Categorization. Journal of Imaging Science and Technology 44(6), 552-55.
Vailaya, A., Jain, A., & Zhang, H. J. (1998). On Image Classification: City Images vs. Landscapes. Pattern Recognition 31(12), 1921-1935.
Westman, S., & Laine-Hernandez, M. (2008). The effect of page context on magazine image categorization. Proceedings of the 71st Annual Meeting of the American Society for Information Science and Technology.
APPENDIX 1 Percentage distribution of main and subclasses in the three experiments. New additions to the class
hierarchy in italics.
Main/subclass
Function
Magazine
photographs
3.2
Stock
photographs
3.4
Abstract
images
Main/subclass
9.9
Magazine
photographs
Stock
photographs
Abstract
images
Cinema
1.7
0
0
Product photos
0.7
0
0
Travel
1.0
0
0
Reportage
0
0
0
Fashion
1.7
0
0
Portraits
0.2
1.8
0
Art
3.2
0.5
0.2
News photos
0.2
0
0
Transportation
0.5
0.7
0
Illustration
0
0
0
Architecture
1.0
1.8
0
Symbol photos
0
0
0
Home&family
0.5
0.7
0
Advertisement
0.2
0.2
2.8
Culture
0.5
0
0
0.7
2.7
0
0.7
0
0
Economy
0
0.2
0
Religion
0.2
0
0
Hobbies&leisure
0.2
1.4
0
Profile piece
0.5
0
0
Vacation photos
0.5
0.2
0
Industry &
Technology
Art images
-
0.9
3.8
Politics
Misc. function
0.7
0.2
3.4
People
25.2
15.8
1.6
Person
8.8
5.0
1.6
Social status
3.9
1.6
0
Gender
4.4
0.7
0
Posing
3.2
1.6
0
Groups
1.5
3.2
0
Relationships
0.7
1.1
0
Age
1.2
2.3
0
Expression
0.7
0.5
0
Eye contact
0.7
0
0
Object
12.2
23.1
8.1
13.8
Buildings
3.4
3.4
0.2
3.4
0.5
Animals
0
5.4
0.4
Nature objects
-
5.7
5.7
Scene
10.3
18.6
1.1
1.8
Time
2.4
0.9
0.6
Number
3.9
5.0
8.7
Visual
17.6
Composition
0
0.2
5.3
Impression
-
0
2.6
0.5
0.2
0.4
0.2
0
12.2
0
Shape
Nature
0.5
3.6
0.6
Theme
Food&drink
22.2
16.5
4.6
Photography
5.9
3.9
0
Work
2.2
1.8
0
Sports
2.0
0.7
0.2
38.1
1.6
0.2
1.6
2.0
1.2
1.0
6.6
2.0
Color
Interiors
6.4
3.2
8.7
0
1.0
Misc. scene
1.1
6.6
1.6
4.5
0
0.4
3.7
2.0
3.6
0.2
1.2
7.6
0.2
3.6
Property
Landscape
0.5
4.7
1.4
2.9
Mood
Motion
Cityscape
2.7
4.2
Emotion
3.2
5.5
2.6
2.4
Activity
0
2.3
6.8
Event
Affective
20.1
5.4
1.5
7.6
Description
Non-living
Vehicles
Misc. theme
Story
5.6
5.9
6.6
Distance
1.5
5.7
0.2
Black&white
1.7
0
0
Style/technique
0.7
0.2
6.3
Image size
1.2
0
0
Cropping
0.5
0
0