Image Tagging (Perry Rajnovic)

Image Tagging
Attaching textual meta-information
or semantic linkages to images
By Perry Rajnovic
What is a Digital Image?


Digital images are usually defined as an organized display of pixels, often called a bitmap.
Each pixel is a numeric representation of the color intensities at that point.
Each pixel may be explicitly defined or be the rendered output of a vector or graphics package.
These representations include no inherent textual elements or semantic description.
As a result, images are not easily machine-readable.
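
As a minimal sketch of the point above, a bitmap can be modeled as rows of numeric color values. The 2x2 image and the (red, green, blue) tuple layout are illustrative assumptions, not a specific file format.

```python
# A hypothetical 2x2 bitmap: each pixel is an (R, G, B) tuple of 0-255 intensities.
bitmap = [
    [(0, 0, 128), (0, 0, 128)],
    [(255, 255, 255), (0, 0, 128)],
]

def pixel_at(image, x, y):
    """Return the color intensities stored at column x, row y."""
    return image[y][x]

def count_color(image, color):
    """Count the pixels that exactly match a color."""
    return sum(pixel == color for row in image for pixel in row)

print(pixel_at(bitmap, 0, 1))
print(count_color(bitmap, (0, 0, 128)))
```

The data can say that three pixels share one color value, but nothing in it carries the words "navy blue" or states what the picture depicts.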

Why machine-readability?
Most searches are done via textual queries, so there must be a mechanism to link applicable keywords or phrases to images.
For blind users, conveying information about an image in another medium improves accessibility.
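
One possible mechanism for linking keywords to images is an inverted index from each keyword to the images that carry it. The sketch below assumes single-word tags and hypothetical file names. It illustrates the idea and does not describe any particular search engine.

```python
from collections import defaultdict

def build_index(image_tags):
    """Map each lowercase keyword to the set of images tagged with it."""
    index = defaultdict(set)
    for image, tags in image_tags.items():
        for tag in tags:
            index[tag.lower()].add(image)
    return index

def search(index, query):
    """Return the images that match every word of a textual query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical file names and tags, for illustration only.
image_tags = {
    "beach.jpg": ["sunset", "ocean"],
    "city.jpg": ["sunset", "skyline"],
}
index = build_index(image_tags)
print(sorted(search(index, "Sunset")))
print(sorted(search(index, "sunset ocean")))
```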

Image Contents
The contents of an image can be described in full prose (as in the adage “A picture is worth 1000 words”) or with just a few keywords covering spatial, temporal, or emotional aspects.
In many cases, accurately identifying the content of images requires human intervention.

Identifying Image Contents
Many good pattern recognition algorithms exist; however, few are able to interpret the patterns they extract.
Artificial intelligence algorithms can learn recognized patterns, but such a system’s flexibility is limited by its predefined knowledge base.

Identifying Text in Images

CAPTCHAs (or Completely Automated Public
Turing test to tell Computers and Humans Apart)
are images which contain a distorted rendering
of some text.
Their goal is to provide a task that is easy for humans but extremely hard for computer programs.
For this task, OCR is generally not sufficient to extract the text.
This is a good example of why machine-readable information should be available.

Example Tag Contents

As an example of what might be provided to tag an image, here is a list of words and phrases to describe this slide’s header:
  Navy Blue
  Squares
  Fade-out
  Horizontal Bar
  Minimalist
  Decorative

User Applications

Many applications take advantage of image tagging. A few examples:
  Apple iPhoto
  Google Picasa
  Adobe Photoshop Elements

Generally these programs use tagging for
organization and user-defined searching.
Web Applications

Several Web-based applications now include tagging for images, as well as other non-image-based features.
  Google ImageLabeler
  Flickr.com
  Facebook.com
  23hq.com
  Fotki.com
Google Images
Luis von Ahn developed the “ESP Game”, which can be used to tag images.
He presented a Google tech talk about the game as a form of human computation.
Google later licensed the technology to create a similar web application called the Google Images ImageLabeler.
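
As a rough sketch of how such a game can produce trustworthy tags: two players label the same image independently, and a label counts only when both offer it. The function name, the sample guesses, and the taboo list that pushes later rounds toward more specific words are assumptions made for illustration. The slides do not spell out the game's exact rules.

```python
def agreed_label(guesses_a, guesses_b, taboo=()):
    """Return the first label from player A that player B also offered,
    ignoring taboo words; return None if the players never agree."""
    banned = {word.lower() for word in taboo}
    offered_b = {guess.lower() for guess in guesses_b} - banned
    for guess in guesses_a:
        label = guess.lower()
        if label in offered_b:
            return label
    return None

# Round one: no taboo words, so the players agree on a general label.
round_one = agreed_label(["dog", "grass", "puppy"], ["animal", "dog", "beagle"])
print(round_one)

# Round two: the earlier label is taboo, so agreement needs a more specific word.
round_two = agreed_label(["dog", "puppy", "beagle"], ["dog", "beagle"], taboo=[round_one])
print(round_two)
```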

Google ImageLabeler
The ImageLabeler game allows random users to generate tags that accurately describe images.
The tags should be accurate due to game constraints, and they gain specificity after several rounds.
The computed tags can improve searches.

Flickr
Flickr is a “Web 2.0” photo hosting and
sharing site.
 Users are encouraged to upload photos,
then to name, describe, tag, annotate,
geotag, comment on, and group their
photos in collaborative ways.

Flickr - Tags



Tags are words or
phrases meant to act
as keywords.
They are searchable
within the site, and
can show popular
topics.
They improve search
relevance.
Flickr - Geotagging

Geotagging is a term for adding geospatial metadata to images, such as the latitude, longitude, and other directional indications of where a photo was taken.
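
A geotag can be pictured as a small record attached to a photo. The sketch below is one possible shape for it. The field names, the optional compass heading, and the sample coordinates are assumptions and do not reflect Flickr's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GeoTag:
    """Geospatial metadata for one photo (field names are illustrative)."""
    latitude: float                  # degrees, -90 (south) to 90 (north)
    longitude: float                 # degrees, -180 (west) to 180 (east)
    heading: Optional[float] = None  # compass direction the camera faced, 0-360

    def __post_init__(self):
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180")
        if self.heading is not None and not 0.0 <= self.heading < 360.0:
            raise ValueError("heading must be in [0, 360)")

def in_bounding_box(tag, south, west, north, east):
    """True if the tag lies inside a simple box (no antimeridian wrap)."""
    return south <= tag.latitude <= north and west <= tag.longitude <= east

# Hypothetical coordinates, for illustration only.
photo_location = GeoTag(latitude=41.15, longitude=-81.34, heading=90.0)
print(in_bounding_box(photo_location, 41.0, -82.0, 42.0, -81.0))
```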
What are annotations?

Wikipedia defines them as:
  Extra information associated with a particular point in a document or other piece of information.

The US DoD defines them as:
  A marking placed on imagery or drawings for explanatory purposes or to indicate items or areas of special importance.
Annotating Images

The use of annotations with images can provide several useful functions. Below are some examples:
  Point out a specific piece of content.
  Explain some icon or graphic.
  Summarize the meaning of some region.
  Provide additional information via text.
Flickr - Notes
Flickr provides a feature called Notes. It
uses a Flash-based implementation of an
annotation system.
 You can dynamically size a rectangular
region over a portion of the image, then
attach a snippet of text to describe it.
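
The idea of a rectangular region with attached text can be sketched as a small data structure. This is an illustrative model in pixel coordinates with made-up note text. It is not Flickr's implementation, which the slides describe only as Flash-based.

```python
from dataclasses import dataclass

@dataclass
class Note:
    """A rectangular region of an image with attached text (pixel coordinates)."""
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int
    text: str

    def contains(self, px, py):
        """True if the point (px, py) falls inside this note's rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

def notes_at(notes, px, py):
    """Return the text of every note covering a point, e.g. under the mouse."""
    return [note.text for note in notes if note.contains(px, py)]

notes = [
    Note(10, 10, 100, 50, "Lighthouse"),
    Note(60, 40, 80, 80, "Rocky shoreline"),
]
print(notes_at(notes, 70, 45))
print(notes_at(notes, 5, 5))
```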

FotoNotes
FotoNotes is a data format for annotating images.
It allows you to embed the metadata directly into the image files for portability.
Flickr’s Notes feature is inspired by this standard and its accompanying visualization implementation.

FotoNotes - More
It was developed by Greg Elin.
 The homepage provides links to groups
working with the standard.
 Additionally, an implementation which
works in most browsers is provided as-is
for customization.

Facebook
Facebook.com has a tagging feature that is integrated with “My Photos”.
It allows you to add a textual descriptor (a tag or a person’s name) to a specific point in the image.
This allows the module to describe who or what is included in a specific album.

Facebook – Tag Display
When the images are viewed, placing the mouse over a tag displays a fixed-size square indicating where the tag (person) is located within the image.
This enables users to identify objects by visual inspection or by matching the list of contained objects with their tag displays.
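
Because a tag is stored as a single point, the highlighted square has to be derived from it at display time. The sketch below shows one way to do that. The 100-pixel default side and the rule of shifting the square to keep it inside the image are assumptions, not details of Facebook's implementation.

```python
def tag_square(tag_x, tag_y, image_width, image_height, size=100):
    """Return (left, top, right, bottom) of a fixed-size square centered on a
    tag point, shifted as needed so that it stays inside the image."""
    side = min(size, image_width, image_height)
    left = min(max(tag_x - side // 2, 0), image_width - side)
    top = min(max(tag_y - side // 2, 0), image_height - side)
    return (left, top, left + side, top + side)

print(tag_square(300, 200, 640, 480))  # tag well inside the image
print(tag_square(10, 470, 640, 480))   # tag near the bottom-left corner
```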

Facebook – Links
Another capability incorporates the site’s
concept of friends. If the person you tag is
identified as your friend on the site, their
name will link to their profile.
 The site will also count this image in the
“photos of” feature on their profile, allowing
inclusion of photos added by other users.

Other Image Metadata: MPEG
The MPEG-7 standard is a “Multimedia Content Description Interface”.
“MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardises shall support as broad a range of applications as possible.”

Other Image Metadata: Adobe
Adobe Systems created a new metadata framework for images called XMP (Extensible Metadata Platform).
It is publicly documented, based on W3C standards, built on XML, and designed to eliminate growing incompatibility in metadata storage.
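
To make "built on XML" concrete, the sketch below writes a list of keywords as an RDF/XML fragment using the Dublin Core `dc:subject` property, which is how XMP commonly stores keywords, and then reads them back. It is a simplified illustration. A complete XMP packet adds wrapper elements that are omitted here, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

def keywords_to_rdf(keywords):
    """Build an RDF/XML fragment that stores keywords as a dc:subject bag."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    subject = ET.SubElement(description, f"{{{DC}}}subject")
    bag = ET.SubElement(subject, f"{{{RDF}}}Bag")
    for keyword in keywords:
        ET.SubElement(bag, f"{{{RDF}}}li").text = keyword
    return ET.tostring(root, encoding="unicode")

def rdf_to_keywords(xml_text):
    """Read the keywords back out of the fragment."""
    root = ET.fromstring(xml_text)
    return [item.text for item in root.iter(f"{{{RDF}}}li")]

packet = keywords_to_rdf(["Navy Blue", "Squares", "Minimalist"])
print(packet)
print(rdf_to_keywords(packet))
```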

Other Image Metadata: IPTC
The International Press Telecommunications Council created standards for the interchange of news data over a decade ago.
These standards persist in the council’s IIM standard and are also usable in the newer XMP framework.

Improving Clustering
Search Interfaces
Joint Term Project
By Perry Rajnovic
and Mark Zalar
Term Project
For my term project, I will be working with Mark Zalar to develop a new search engine interface.
It will draw inspiration from all of the top search engines today, along with the enhancements now possible using emerging technologies.

Project Goal
The goal of the project is to implement a search site that provides a highly usable interface for query refinement.
Our backend will use clustering mechanisms to allow for easy refocusing of search topics.
Our frontend will use AJAX for flexibility.

Frontend Design
The frontend will be designed with a
technology known as Asynchronous
JavaScript and XML (AJAX).
 This technology allows the site designer to
run unseen requests to the server and
parse XML-based results in the scripting
language for interactivity.
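
The parsing half of that request cycle can be sketched outside the browser. The XML layout below (`clusters`, `cluster`, `name`, `count`) is a hypothetical response format, not the project's actual one. In the browser, the same walk over the response would be written in JavaScript.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML that a clustering backend might return for one query.
RESPONSE = """
<clusters query="creature">
  <cluster name="animal" count="42"/>
  <cluster name="being" count="17"/>
</clusters>
"""

def parse_clusters(xml_text):
    """Turn the XML response into (name, count) pairs, largest cluster first."""
    root = ET.fromstring(xml_text)
    clusters = [(node.get("name"), int(node.get("count")))
                for node in root.findall("cluster")]
    return sorted(clusters, key=lambda pair: pair[1], reverse=True)

print(parse_clusters(RESPONSE))
```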

Frontend Theory
Most clustering-based search solutions available today use minimal interactivity.
Our theory is that letting users harness clustering dynamically as they refine a search will improve results and reduce the time needed to finish a search.

AJAX Functionality
Our site will use AJAX to dynamically reconfigure the clustering menu. This allows quicker browsing of clusters to identify the optimal range of pages to search within.
The menu will also use a novel interface that shows sibling and parent clusters.

AJAX Functionality 2
The results will be displayed to the user
with some animation.
 This will help to alert users when changes
are made to the order or set of results.
 Another advantage of this is that users will
be more aware of the difference between
clusters as they browse them.

Search Target
This search engine could target both
websites and images.
 Valid keywords improve content
knowledge.
 Clustering would be highly useful in finding
an image with a desired scene or set of
objects.

Example: search “creature”


The engine might identify a general cluster of “animal” or “being”.
“Animal” might have more results, so the medium-level clusters are shown for it.


The user wants a general discussion of mammals and selects that cluster.
The results change to focus on those related to mammals, both as a group and specifically.
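
The “creature” walk-through can be sketched as navigation over a cluster tree. The hierarchy, the result counts, and the names `Cluster` and `menu_for` are invented for illustration. A real backend would derive the clusters from the search results.

```python
class Cluster:
    """A node in a hypothetical cluster hierarchy."""

    def __init__(self, name, result_count=0, parent=None):
        self.name = name
        self.result_count = result_count
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def siblings(self):
        """Other clusters that share this cluster's parent."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]

def menu_for(cluster):
    """Names to show when a cluster is selected: parent, siblings, children."""
    return {
        "parent": cluster.parent.name if cluster.parent else None,
        "siblings": [c.name for c in cluster.siblings()],
        "children": [c.name for c in cluster.children],
    }

# Made-up hierarchy and counts for the "creature" query.
root = Cluster("creature")
animal = Cluster("animal", 120, root)
being = Cluster("being", 30, root)
mammals = Cluster("mammals", 60, animal)
birds = Cluster("birds", 40, animal)

# The general cluster with more results is expanded first.
largest = max(root.children, key=lambda c: c.result_count)
print(largest.name, [c.name for c in largest.children])

# The user selects "mammals"; the menu shows where that cluster sits.
print(menu_for(mammals))
```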
Results Display
To provide animation, a technology similar to that found in “TiddlyWiki” will be used.
This interface allows topics to be added and removed dynamically with animation.
Additionally, extra links can be attached to each topic for more functionality (open in new window, similar items, etc.).

Future Enhancements
Our implementation will provide a basic
mockup of the interface and refinement
techniques made available.
 Several enhancements could be made to
this interface that would improve its
usability or functionality.

Enhancements in Search
Taking advantage of a meta-search would give the clustering algorithm a higher volume of data with which to generate data topologies to be explored.
Using adaptive search (by userID or global optimization) would improve clustering by favoring the clusters that are used most often.
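
One simple reading of the adaptive idea is to count how often each cluster is selected and to list the most-used clusters first, either per user or across all users. The class below is a hypothetical sketch of that bookkeeping and is not part of the planned implementation.

```python
from collections import Counter

class ClusterUsage:
    """Counts cluster selections per user and globally (a hypothetical design)."""

    def __init__(self):
        self.global_counts = Counter()
        self.user_counts = {}

    def record(self, user_id, cluster):
        """Note that a user selected a cluster."""
        self.global_counts[cluster] += 1
        self.user_counts.setdefault(user_id, Counter())[cluster] += 1

    def rank(self, clusters, user_id=None):
        """Order clusters by how often they were chosen, most used first.
        With a user_id, that user's history is used; otherwise global counts."""
        if user_id is not None:
            counts = self.user_counts.get(user_id, Counter())
        else:
            counts = self.global_counts
        return sorted(clusters, key=lambda c: counts[c], reverse=True)

usage = ClusterUsage()
for user, cluster in [("alice", "mammals"), ("alice", "mammals"),
                      ("bob", "birds"), ("bob", "birds"), ("bob", "birds")]:
    usage.record(user, cluster)

print(usage.rank(["mammals", "birds", "reptiles"]))
print(usage.rank(["mammals", "birds", "reptiles"], user_id="alice"))
```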

Enhancements in Interface
Because the site will be AJAX based, a
large amount of flexibility is possible with
respect to changes in the interface.
 The browser window is similar to a
canvas, with all of the site’s underlying
Document Object Model available for
addition, modification or deletion.
