Lecture 7 - University of Surrey

CSM06 Information Retrieval
Lecture 7: Image Retrieval
Dr Andrew Salway a.salway@surrey.ac.uk
Recap…
• So far we have concentrated on text
analysis techniques and indexing-retrieval
of written documents
• The indexing-retrieval of visual
information (image and video data)
presents a new set of challenges –
especially for understanding the content
of images and videos…
Lecture 7: OVERVIEW
• Different kinds of metadata for indexing-retrieving
images (these also apply to videos)
• The “sensory gap” and the “semantic gap”, and
why these pose problems for image/video
indexing-retrieval
• Three approaches to the indexing-retrieval of
images:
– Manual indexing, e.g. CORBIS, Tate
– Content-based Image Retrieval (visual similarity;
query-by-example), e.g. QBIC and BlobWorld
– Automatic selection of keywords from text related
to images, e.g. WebSEEK, Google, AltaVista
Different kinds of images
• Photographs: holiday albums, news
archives, criminal investigations
• Fine art and museum artefacts
• Medical images: x-rays, scans
• Meteorological / Satellite Images
As with written documents, each
image in an image collection
needs to be indexed before it
can be retrieved...
Image Description Exercise
Imagine you are the indexer of an image
collection…
1) List all the words you can think of that describe the
following image, so that it could be retrieved by as
many users as possible who might be interested in
it. Your words do NOT need to be factually correct,
but they should show the range of things that could
be said about the image
2) Put your words into groups so that each group of
words says the same sort of thing about the image
3) Which words (metadata) do you think a machine
could extract from the image automatically?
Words to index the image…
Metadata for Images
• “A picture is worth a thousand
words…”
• The words that can be used to index
an image relate to different aspects
of it

• We need to label different kinds of
metadata for images
– to structure how we store /
process metadata
– some kinds of metadata will
require more human input than others
Metadata for Images
• Del Bimbo (1999):
– content-independent;
– content-dependent;
– content-descriptive.
• Shatford (1986):
(in effect refines ‘content descriptive’)
– pre-iconographic;
– iconographic;
– iconological.
Metadata for Images (Del Bimbo 1999)
• Content-independent: data which is not
directly concerned with image content,
and could not necessarily be extracted
from it, e.g. artist name, date, ownership
• Content-dependent: perceptual facts to
do with colour, texture, shape; can be
automatically (and therefore objectively)
extracted from image data
• Content-descriptive: entities, actions,
relationships between them as well as
meanings conveyed by the image; more
subjective and much harder to extract
automatically
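To make these distinctions concrete, here is a minimal sketch in Python of how the three kinds of metadata might be stored for one image; all field names and values are invented for illustration.

# Del Bimbo's three kinds of metadata, held as one record per image.
# Every name and value here is a hypothetical example.
image_metadata = {
    "content_independent": {       # not extractable from the image itself
        "artist": "J. Smith",
        "date": "1999",
        "ownership": "Tate",
    },
    "content_dependent": {         # computable from the pixels, objective
        "dominant_colour": "red",
        "texture": "smooth",
    },
    "content_descriptive": {       # subjective, hard to automate
        "entities": ["tomato", "table"],
        "actions": ["resting on"],
        "aboutness": ["freshness"],
    },
}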
Three levels of visual content
• Based on Panofsky (1939);
adapted by Shatford (1986) for
indexing visual information. In
effect refines ‘content descriptive’.
– Pre-iconographic: generic who,
what, where, when
– Iconographic: specific who, what,
where, when
– Iconological: abstract “aboutness”
The Sensory Gap
“The sensory gap is the
gap between the object in
the world and the
information in a
(computational)
description derived from
a recording of that scene”
(Smeulders et al 2000)
The Semantic Gap
“The semantic gap is the
lack of coincidence
between the information
that one can extract from
the visual data and the
interpretation that the
same data have for a user
in a given situation”
(Smeulders et al 2000)
Set Reading for Lecture 7
[Image slides illustrating the SEMANTIC GAP]
• What visual properties do these images
of tomatoes all have in common?
• What is this? A tomato? A setting sun?
A clown’s nose?…
• “democracy”
DISCUSSION
• What is the impact of the
sensory gap, and the
semantic gap, on image
retrieval systems?
Three Approaches to Image
Indexing-Retrieval
1. Index by manually attaching
keywords to images – query
by keywords
2. Index by automatically
extracting visual features from
images – query by visual
example
3. Index by automatically
extracting keywords from text
already connected to images
– query by keywords
1. Manual Image Indexing
• Rich keyword-based descriptions of
image content can be produced by
manual annotation
• May use a controlled vocabulary and
consensus decisions to minimise
subjectivity and ambiguity (see the
sketch after this slide)
• Cost can be prohibitive
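As a sketch of how a controlled vocabulary minimises subjectivity, the following Python fragment rejects annotations that fall outside an agreed term list. The vocabulary shown is invented; a real collection would use a scheme such as Iconclass (see the example systems below).

# Hypothetical controlled vocabulary; real schemes (Iconclass, the Art
# and Architecture Thesaurus) are far larger and hierarchical.
CONTROLLED_VOCABULARY = {"portrait", "landscape", "still-life",
                         "tomato", "sunset"}

def validate_annotation(keywords):
    """Accept only index terms drawn from the controlled vocabulary."""
    unknown = set(keywords) - CONTROLLED_VOCABULARY
    if unknown:
        raise ValueError(f"Terms not in vocabulary: {sorted(unknown)}")
    return set(keywords)

validate_annotation(["tomato", "still-life"])    # accepted
# validate_annotation(["squishy thing"])         # raises ValueError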
Example Systems
• Examples of manually annotated image libraries:
http://www.tate.org.uk/servlet/SubjectSearch
(Art gallery)
www.corbis.com (Commercial)
• Examples of controlled indexing schemes:
– www.iconclass.nl (Iconclass, developed
as an extensive decimal classification
scheme for the content of paintings)
– http://www.getty.edu/research/conducting_research/vocabularies/aat/
(Art and Architecture Thesaurus)
– http://www.sti.nasa.gov/products.html#pubtools
(NASA thesaurus for space / science
images)
2. Indexing-Retrieval based on
Visual Features
• Also known as “Content-based Image Retrieval”;
cf. del Bimbo’s content-dependent metadata
• To query:
– draw coloured regions (sketch-based query);
– or choose an example image (query by
example)
• Images with similar visual features are
retrieved (not necessarily similar
‘semantic content’)
Indexing-Retrieval based on
Visual Features
• Visual Features
– Colour
– Texture
– Shape
– Spatial Relations
• These features can be computed directly
from image data – they characterise the
pixel distribution in different ways
• Different features may help retrieve
different kinds of images
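As a sketch of how one such feature supports query by example, the following Python fragment indexes images by a global colour histogram and ranks a collection by histogram intersection. It assumes the Pillow library is installed, and all file names are hypothetical.

# Minimal content-based retrieval sketch: colour histograms compared
# by histogram intersection (1.0 means identical colour distributions).
from PIL import Image

def colour_histogram(path):
    """Normalised RGB histogram - characterises the pixel distribution."""
    img = Image.open(path).convert("RGB")
    counts = img.histogram()            # 3 x 256 bin counts (R, G, B)
    total = sum(counts)
    return [c / total for c in counts]

def similarity(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

# Query by example: rank the collection against the query image.
# Similar histograms do NOT guarantee similar 'semantic content'.
collection = ["beach.jpg", "sunset.jpg", "tomato.jpg"]   # hypothetical
query = colour_histogram("query.jpg")
ranked = sorted(collection, reverse=True,
                key=lambda p: similarity(query, colour_histogram(p)))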
What images would this query return?
Example Systems
• QBIC (Query By Image Content), developed
by IBM and used by, among others, the Hermitage Art
Museum
http://wwwqbic.almaden.ibm.com/
• Blobworld - developed by researchers at the
University of California
http://elib.cs.berkeley.edu/photos/blobworld/start.html
3. Extracting keywords from text
already associated with images…
“One way to resolve the semantic
gap comes from sources
outside the image by
integrating other sources of
information about the image in
the query. Information about an
image can come from a number
of different sources: the image
content, labels attached to the
image, images embedded in a
text, and so on.”
(Smeulders et al 2000).
Extracting keywords from text
already associated with images…
• Images are often accompanied by, or
associated with, collateral text, e.g.
the caption of a photograph in a
newspaper, the caption of a painting
in an art gallery…
• And, on the Web, the text in the
HREF tag
• Keywords can be extracted from the
collateral text and used to index the
image
WebSEEK System
• The WebSEEK system processes
HTML tags linking to image data files
in order to index visual information
on the Web
• NB. Current web search engines, like
Google and AltaVista, appear to be
doing something similar
WebSEEK System
(Smith and Chang 1997)
• Keyword indexing and subject-based
classification for WWW-based image retrieval:
user can query or browse hierarchy
• System trawls Web to find HTML pages with links
to images
• The HTML text in which the link to an image is
embedded is used for indexing and classifying
the image
• >500,000 images and videos indexed with
11,500 terms; 2,128 classes manually created
WebSEEK System
(Smith and Chang 1997)
• The WebSEEK system processed HTML
tags linking to image and video data files
in order to index visual information on
the Web
• Keywords are mapped automatically
to subject categories; the categories
are created beforehand with human
input
WebSEEK System
(Smith and Chang 1997)
• Term Extraction: terms extracted
from URLs, alt tags and hyperlink
text (sketched below), e.g.
http://www.mynet.net/animals/domestic-beasts/dog37.jpg
→ “animals”, “domestic”, “beasts”, “dog”
• Terms used to make an inverted
index for keyword-based retrieval
• Directory names also extracted, e.g.
“animals/domestic-beasts”
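A minimal Python sketch of this term extraction, reproducing the slide’s example URL; the separator splitting, trailing-digit stripping and stop-list are assumptions for illustration, not details taken from Smith and Chang.

import re
from collections import defaultdict

def extract_terms(url):
    """Split a URL into candidate index terms."""
    path = url.split("//", 1)[-1]               # drop the scheme
    parts = re.split(r"[/\-_.]", path.lower())  # split on separators
    terms = []
    for part in parts:
        term = re.sub(r"\d+$", "", part)        # "dog37" -> "dog"
        if term and term not in {"www", "com", "net", "mynet", "jpg", "gif"}:
            terms.append(term)
    return terms

# Build the inverted index: term -> set of image URLs.
inverted_index = defaultdict(set)
url = "http://www.mynet.net/animals/domestic-beasts/dog37.jpg"
for term in extract_terms(url):
    inverted_index[term].add(url)

print(extract_terms(url))   # ['animals', 'domestic', 'beasts', 'dog']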
WebSEEK System
(Smith and Chang 1997)
• Subject Taxonomy: a manually
created ‘is-a’ hierarchy with mappings
that assign key-terms automatically
to subject classes
• Facilitates browsing of the image
collection
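A minimal sketch of the key-term mappings in Python; the classes and mappings below are invented for illustration (WebSEEK’s 2,128 classes were created manually).

# Hypothetical key-term -> subject class mappings over an 'is-a' hierarchy.
KEY_TERM_TO_CLASS = {
    "dog": "animals/domestic-beasts",
    "cat": "animals/domestic-beasts",
    "lion": "animals/wild-beasts",
}

def classify(terms):
    """Map extracted key-terms to subject classes for browsing."""
    return {KEY_TERM_TO_CLASS[t] for t in terms if t in KEY_TERM_TO_CLASS}

print(classify(["animals", "domestic", "beasts", "dog"]))
# {'animals/domestic-beasts'}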
WebSEEK System
(Smith and Chang 1997)
• The success of this kind of approach
depends on how well the keywords
in the collateral text relate to the
image
• URLs, alt tags and hyperlink text
may or may not be informative about
the image content; even if
informative they tend to be brief –
perhaps further kinds of collateral
text could be exploited
Image Retrieval in Google
• Rather like WebSEEK, Google
appears to match keywords in file
names and in the ‘alt’ caption, e.g.
<img src="/images/020900.jpg" width=150
height=180 alt="David Beckham tussles
with Emmanuel Petit">
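A minimal sketch of extracting such keywords with the Python standard library, applied to the slide’s example tag; how Google actually indexes images is not public, so this only illustrates the idea.

from html.parser import HTMLParser

class ImgIndexer(HTMLParser):
    """Collect (src, keywords) pairs from <img> tags."""
    def __init__(self):
        super().__init__()
        self.entries = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if "src" in a and "alt" in a:
                self.entries.append((a["src"], a["alt"].lower().split()))

indexer = ImgIndexer()
indexer.feed('<img src="/images/020900.jpg" width=150 height=180 '
             'alt="David Beckham tussles with Emmanuel Petit">')
print(indexer.entries)
# [('/images/020900.jpg',
#   ['david', 'beckham', 'tussles', 'with', 'emmanuel', 'petit'])]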
Essential Exercise
Image Retrieval Exercise:
“The aim of this exercise is for you to
understand more about the
approaches used by different kinds of
systems to index and retrieve digital
images.”
**DOWNLOAD from module webpage**
Further Reading
• A paper about the WebSEEK system:
Smith and Chang (1997), “Visually Searching the Web for
Content”, IEEE Multimedia, July-September 1997,
pp. 12-20. **Available via library’s eJournal service.**
• Different kinds of metadata for images, and an
overview of content-based image retrieval:
Excerpts from del Bimbo (1999), Visual Information
Retrieval. Available in library short-term loan articles.
• For a comprehensive review of CBIR, and
discussions of the sensory gap and semantic gap:
Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A. and
Jain, R. (2000), “Content-based image retrieval at the
end of the early years”, IEEE Transactions on Pattern
Analysis and Machine Intelligence, 22(12), pp. 1349-1380.
**Available online through library’s eJournals.**
• Eakins (2002), “Towards Intelligent Image Retrieval”,
Pattern Recognition, 35, pp. 3-14.
• Enser (2000), “Visual Image Retrieval: seeking the alliance
of concept-based and content-based paradigms”,
Journal of Information Science, 26(4), pp. 199-210.
Lecture 7: LEARNING OUTCOMES
You should be able to:
- Define and give examples of different
kinds of metadata for images.
- Discuss how different kinds of image
metadata are appropriate for different
users of image retrieval systems.
- Explain what is meant by the sensory
gap and the semantic gap, and discuss
how they impact on image retrieval
systems.
- Describe, critique and compare three
different approaches to indexing-retrieving
images, with reference to example systems.
Reading ahead for LECTURE 8
If you want to prepare for next week’s lecture
then take a look at…
Informedia Research project:
http://www.informedia.cs.cmu.edu/
Yanai (2003), “Generic Image Classification Using Visual
Knowledge on the Web”, Procs ACM Multimedia 2003.
**Only Section 1 and Section 5 are essential.**