Visual information Systems:
Lessons for its Future
Prof. D. Petkovic, SFSU
Prof. R. Jain, UC Irvine
Dpetkovic@cs.sfsu.edu
SPIE January 2005, San Jose
Prof. D. Petkovic, Prof. R. Jain
Goals
• Look at past and present of visual information systems
from the standpoint of us, computer science researchers in
content based retrieval, CV, AI, and multimedia
• Analyze progress and status
• Identify future opportunities and challenges in making
content based retrieval and our work become part of
successful applications
• Discuss role of CV, AI and content based retrieval
researchers for future development
Intended to be critical and self-critical, and to call for action
and changes in the way we do the work
Assumption: ultimately, research has to influence real world
applications
What are visual information
systems?
Are visual information systems only this?
(content based retrieval, CV and AI-centric view)
• Image, video
• Information – retrieval, query, browsing, visualization
No, they are all this!
• Image, video, audio
• Related data, metadata, measurements
• Links to related info, WWW info
• Time, location
• Integration
• Information – retrieval, query, browsing, visualization, delivery
of all the information
What do users do with visual
information systems?
• Search or browse for images/video of their current interest
and then review/playback/process the results?
• But most often: search or browse for information and
knowledge where image/video is but one aspect of it
– Entertain
– Learn
– Explore
– Investigate/experiment/evaluate
– Communicate
– Teach, train
– Manage personal data
Examples of commercial or near-commercial visual info systems
• Recording and sharing of personal visual data
– MyLifeBits (Microsoft Bay Area Research)
• http://www.research.microsoft.com/barc/MediaPresence/MyLifeBits.aspx
– Internet Photo Albums
• http://photos.yahoo.com/ph//my_photos
• Scientific research and education
– Astronomy: SkyServer (Microsoft Bay Area Research)
• http://cas.sdss.org/dr3/en/
– Bioinformatics
• http://hedgehog.sfsu.edu/home/index.aspx
• Cell video
Examples (2)
• News
– http://news.yahoo.com/
• Entertainment
– http://movies.yahoo.com/
• Visual info sharing on the WWW
– http://video.search.yahoo.com/
• Art
– Getty Museum
• http://www.getty.edu/art/
– Hermitage
• http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
Examples (3)
• Remote sensing and surveillance
– http://www.landsat.org/
• Training and education
– http://www.employeeuniversity.com/corporatevideotraining/index.htm
– http://coursestream.sfsu.edu/
• Biometrics
– Face recognition and matching
– Fingerprints
– Iris
Search vs. browse or manually
prepared material
• It is not always search/query over indexed collections
(whether they are manually or automatically indexed)
• Very often (entertainment, on-line learning) the primary
function is browsing from a limited list of well organized
and manually prepared material.
• “Carefully prepare once – show/sell many times” paradigm
justifies investment and need for manual expert preparation
– Movie trailers are works of art with the purpose of marketing and
sales – high level of expert manual prep is required and will likely
stay that way
• Currently, the market for “Carefully prepare once – show/sell
many times” is much larger
• Search is most often based on current (changing) interests
Some history and perspective…
• Early nineties (BI: before Internet): excitement of early
discovery and (over)promises
• Mid to late nineties: explosion of the Internet, things are hot!
Promise of ubiquitously available data. Furious work to
achieve goals (research and startup community). WWW
media emerging. MPEG-7 started
• 2000 and beyond: “crash” of Internet R&D (i.e., it became
ubiquitous). Promises of content based retrieval still
unfulfilled → visual info systems applications doing well
(media is an integral part of WWW applications) but with
little use of content based retrieval techniques
Content based Retrieval – the
early dream
• It was immediately identified that the process of indexing
(i.e., attaching searchable metadata) of images and video is a
big problem
• Idea of content based retrieval:
– Process images and videos to automatically extract searchable
indices. Heavy use of AI, CV and PR (pattern recognition) was to be applied
– Indexes to be used for search and retrieval by “similarity” (“show
me image like this”)
• Content based retrieval was ultimately supposed to make
indexing and searching of vast image/video databases
automated and economical and reduce or even eliminate
need for text metadata
• Great excitement among research community and some
potential customers, and many excellent pieces of work
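The “show me image like this” idea can be illustrated with one of the simplest content based indices: a coarse color histogram compared by histogram intersection. The sketch below is illustrative only and is not QBIC's or any product's method. It assumes an image is a plain list of (r, g, b) tuples, and the names color_histogram, histogram_intersection and rank_by_similarity are hypothetical.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized coarse RGB histogram as a dict {bin: fraction}."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def rank_by_similarity(query_pixels, collection):
    """Rank the named images of a collection by similarity to the query, best first."""
    q = color_histogram(query_pixels)
    scored = [(histogram_intersection(q, color_histogram(p)), name)
              for name, p in collection.items()]
    return sorted(scored, reverse=True)

# Tiny synthetic "images" (made-up data, for illustration only).
red_image = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
mostly_red = [(240, 20, 20)] * 80 + [(10, 250, 10)] * 20
blue_image = [(10, 10, 250)] * 100

collection = {"mostly_red": mostly_red, "blue": blue_image}
ranking = rank_by_similarity(red_image, collection)
```

Here the mostly red image ranks above the blue one. The index captures color distribution only; it says nothing about who or what is in the picture.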
QBIC history
• Excitement among researchers at IBM Almaden Research. First
prototype very exciting, generated a lot of related work in research
community. Many good papers and patents (QBIC and others)
• Excitement among early customers in art and stock imaging/video
• Marketing was first skeptical but then started to oversell (e.g. you do not
need text metadata any more) → we needed to get involved to tone it
down
• Transferred into IBM Digital Library and DB2
• Business (real $) did not happen:
– It was hard to estimate QBIC added value
– QBIC search was limited
– Too early (this was before or early Internet times)
• But QBIC did bring good marketing and attention to multimedia
features of IBM DB2 and DL. It was used successfully as a marketing
tool
– http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
• IBM QBIC group lost some credibility in IBM product divisions
• QBIC grew into CueVideo (video + audio indexing and search)
Status today: successful visual
information systems applications
Characteristics of successful
commercial systems today
• Content consists of images/video but also of a variety of
critically important related data (text, audio, prices, links,
measurements etc.) arranged in an easy-to-use GUI
• Indexing and data organization done predominantly
manually with predefined and simple metadata structures
and ontology
• Metadata schemas defined by domain professionals, not
computer scientists. Most are very simple. MPEG-7 not
widely used
• Search is very simple: title, author, and sometimes a few
keywords against manually entered data
• Browsing: by alphabet, time, price, using video key, image
thumbnails, often from manually prepared collections
• Content based indexing and retrieval not used
How is indexing of images/video
done today?
• Manually entered metadata, usually from a fixed list/structure
• Defined metadata structure into which the content providers can
publish the content (many standards exist). Most used
standards are relatively simple (e.g. Really Simple Syndication,
RSS: http://blogs.law.harvard.edu/tech/rss)
• WWW: crawlers analyze image “context”: where on the WWW
page the image is, ALT tags, use of the associated text linked to
image etc.
• Use of manually generated closed captions for video indexing
• Only very rudimentary content based analysis: image type,
dimensions, whether the image is color or B&W, photographic or
clip art etc.
• Even basic content based retrieval (color histograms,
composition) practically not used
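The crawler-style analysis of image “context” can be sketched with Python's standard html.parser module. This is a minimal illustration, not how any particular search engine works. The class ImageContextParser, the helper index_terms, the eight-word context window and the sample page are all assumptions made for the example.

```python
from html.parser import HTMLParser

class ImageContextParser(HTMLParser):
    """Collect, for every <img>, its src, ALT text and the text just before it."""

    def __init__(self, context_words=8):
        super().__init__()
        self.context_words = context_words
        self.recent_words = []   # sliding window over the most recent page text
        self.images = []         # one dict per <img> tag found

    def handle_data(self, data):
        self.recent_words.extend(data.split())
        self.recent_words = self.recent_words[-self.context_words:]

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            self.images.append({
                "src": attributes.get("src") or "",
                "alt": attributes.get("alt") or "",
                "context": " ".join(self.recent_words),
            })

def index_terms(image):
    """Lower-cased keywords taken from the ALT text and the surrounding text."""
    text = f"{image['alt']} {image['context']}"
    words = (w.strip(".,:;!?").lower() for w in text.split())
    return sorted({w for w in words if w})

# Made-up page, for illustration only.
page = """
<html><body>
<h1>Hermitage collection</h1>
<p>Portrait of a lady, oil on canvas.</p>
<img src="lady.jpg" alt="Portrait painting">
</body></html>
"""

parser = ImageContextParser()
parser.feed(page)
images = parser.images
terms = index_terms(images[0])
```

Every keyword comes from text near the image. No pixel is examined, which matches the observation above that content based analysis is barely used.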
Content based Retrieval – why it
is not enough
• Assume content based retrieval worked perfectly. What
could it ultimately do?
– Image color and spatial composition
– Recognition/matching of some major objects (people, buildings)
– Motion, action recognition
– Full speech to text
• Even this ideal situation is not enough! We also need:
– Other info about the image/video (when, who, where, what, related
scientific measurements…)
– Who, where, when and why
– Related data and links to related data etc.
– Integration and synchronization with other sources of data across
semantics, time, location, cause/effect dimensions
… and much more, none of it recorded in pixels
What next?
Future opportunities and challenges
– some ideas
• Improve process of media annotation and indexing
(automated and semi-automated)
• Define visual ontology, application-specific first, then more
general
• Leverage and improve speech recognition, general and
domain-specific
• Integrate a variety of data (media and related data) and
provide unified multimedia modeling and handling
• Incorporate time and location search into the mainstream
Improve process of media annotation
and indexing
• Automate metadata that lend themselves to automation. Leverage
semiautomated means, but pay utmost attention to HCI
• Compute indexes based on all related data and clues (WWW links,
tags, audio, GPS etc.)
• Allow multimedia annotation to help: annotate text, outline
image/video objects using pointing, add links…
• Use the power of the Internet community to enable economical media
annotations
– e.g. ESP annotation game by CMU www.espgame.org
• Improve usability to enable annotation at the most opportune time and
make it very easy to use (during capture, in free/fun time etc.)
• Leverage speech (audio tags and speech recognition)
• Pay attention to ease of use and GUI
• Use time and location
• Image and video can be the data but also an index to libraries
Define visual ontology, general or
application-specific
• Define ontology of visual media: structure, terms etc. as
well as related extraction procedures
• Not clear if a general ontology is practical → work on
domain-specific ones first, then try to generalize
• Make it simple and work with domain experts
• Offer procedures for automated and semi-automated
instantiation of ontologies using all available info
• Much work already done outside of CS community (e.g.
domain-specific standards for data submissions)
Links to some metadata standards (most are
XML based and developed outside of CS)
• Dublin Core
– http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/
• MPEG-7 ISO standard for video (class 8)
– http://www.mpeg.org/MPEG/starting-points.html#mpeg7
• METS for Digital Libraries
– http://www.loc.gov/standards/mets/
• AIIM Standards (Enterprise Content Management)
– http://xml.coverpages.org/AIMM-Images200104.html
– http://xml.coverpages.org/umnImages.html
– http://digital.lib.umn.edu/elements.html
• Really Simple Syndication (RSS), to be used by Yahoo video search
– http://blogs.law.harvard.edu/tech/rss
Leverage and improve speech
recognition, general and
domain-specific
• Speech and audio have a wealth of information: semantics
and timing. They are easily available, natural and effective
for input
• Speech recognition engines today are trained on general
English, without specific names and domain-specific
terms. Problem: the terms most often searched for (names,
domain-specific terms) are not in the speech engine's vocabulary →
develop domain-specific speech engines
• Leverage speech and audio as annotation medium
• Push speech and audio annotation into capture devices
• Synchronize, cross-index speech with related textual data
for indexing and increased accuracy
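Cross-indexing speech with time can be sketched as an inverted index from spoken words to the moments they occur. The sketch assumes that some speech engine or closed-caption source has already produced (start time in seconds, text) segments. The transcript below is made up, and build_time_index and find_moments are hypothetical names.

```python
from collections import defaultdict

def build_time_index(transcript):
    """Map each lower-cased word to the start times of the segments that contain it."""
    index = defaultdict(list)
    for start_seconds, text in transcript:
        for raw_word in text.lower().split():
            word = raw_word.strip(".,:;!?")
            if word:
                index[word].append(start_seconds)
    return index

def find_moments(index, term):
    """Return the times (in seconds) at which a term is spoken, or an empty list."""
    return index.get(term.lower(), [])

# Made-up time-stamped transcript segments, for illustration only.
transcript = [
    (0.0, "Welcome to the lecture on image retrieval."),
    (12.5, "QBIC was an early retrieval system."),
    (30.0, "Speech gives us semantics and timing."),
]

index = build_time_index(transcript)
retrieval_moments = find_moments(index, "retrieval")
qbic_moments = find_moments(index, "QBIC")
```

A name such as “QBIC” can only be found here if the recognizer produced it in the first place, which is the argument above for domain-specific speech engines.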
Integrate a variety of data (media and related
data) and provide unified multimedia
modeling and handling
• Visual information is based on visual media AND related
information (links, text, documents, measurements, slides
etc.) → enable integrated indexing, data organization,
search and browse
• Leverage time and location in indexing and search
• Create unified multimedia data models with unified
storage, indexing and query, across semantics, time,
location
• Integrate across a variety of data types semantically, at
the GUI level, and at the system level (e.g. cross-index video
with slides and text info)
• From data to information: old “chasm” still exists – work
on it, first by solving some concrete applications
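One way to picture a unified model is a single record type shared by all media, queried in one call across semantics (keywords), time and location. The sketch below is an assumption-laden illustration, not a proposed standard. The MediaItem fields, the query function, the coordinates and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

@dataclass
class MediaItem:
    """One item of any media type, described by shared metadata (hypothetical schema)."""
    uri: str
    media_type: str              # e.g. "video", "image", "slides", "text"
    captured_at: datetime
    lat: float
    lon: float
    keywords: set = field(default_factory=set)

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers (haversine formula)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def query(items, keyword=None, start=None, end=None, near=None, radius_km=1.0):
    """Filter items of all media types by keyword, time window and location at once."""
    results = []
    for item in items:
        if keyword is not None and keyword not in item.keywords:
            continue
        if start is not None and item.captured_at < start:
            continue
        if end is not None and item.captured_at > end:
            continue
        if near is not None and distance_km(item.lat, item.lon, *near) > radius_km:
            continue
        results.append(item)
    return results

# Made-up items: a talk video and its slides in one city, an older photo in another.
san_jose = (37.33, -121.89)
items = [
    MediaItem("talk.mpg", "video", datetime(2005, 1, 17, 9, 0), 37.33, -121.89, {"retrieval"}),
    MediaItem("talk.ppt", "slides", datetime(2005, 1, 17, 9, 0), 37.33, -121.89, {"retrieval"}),
    MediaItem("photo.jpg", "image", datetime(2004, 6, 1, 12, 0), 37.77, -122.42, {"retrieval"}),
]

hits = query(items, keyword="retrieval", start=datetime(2005, 1, 1),
             near=san_jose, radius_km=10.0)
hit_uris = [item.uri for item in hits]
```

The query returns the video and its slides together because they share time, place and topic, regardless of media type.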
But also…..
• Work very closely with users and domain experts. Develop
real and complete applications
• Take a broader usage, system and application view (vs.
looking for applications of AI, CV and content based
retrieval)
• Collaborate with DB, HCI and Internet systems researchers
• Leverage all sources of information, not only image and
video
• Perform extensive experimental evaluations and participate
in formal benchmarks (see e.g. NIST TREC competition
rules)
• Contribute to and participate in standards activities
• Pay much more attention to GUI and HCI and perform
more formal and complete user evaluations
Not doing this risks making us irrelevant…
Acknowledgement
• We thank J. Gray, J. Gemmell (Bay Area Microsoft
Research), R. Singh (SFSU), B. Horowitz (Yahoo), A.
Amir (IBM Almaden Research) for comments and
feedback
Some history and perspective…
• Late eighties and early nineties: excitement of
early discovery and (over)promises
– CPU, networks and storage started to enable reasonably good
manipulation, rendering and processing of images
– Multimedia appears as a field
– DB people are being courted by CV/AI people to broaden their
views and include multimedia data
– First projects on content based retrieval: e.g. QBIC (IBM),
PhotoBook (MIT)…
– Startup activity: Virage and others
– Interest from CV and AI researchers
– First joint conferences with DB and CV communities
– Many (over) promises that caught the eye of investors, marketers
etc.
Some history and perspective…
• Mid to late Nineties: explosion of Internet, things
are hot! Furious work to achieve goals
– Internet enabled better communication and gradually made images,
then video feasible to manipulate, send and view. Explosive
growth
– Advances in compression, networking, CPU, media formats,
standards and storage helped greatly. Cheap capture devices
start to appear
– Multimedia moves and melds slowly into Internet
– DB vendors start to embrace multimedia types (blobs, extenders,
blades, cartridges)
– Content based retrieval becomes a very popular research topic for
CV and AI. Many conferences and workshops organized
– MPEG-7 activity started
– Availability of research and venture funding continues
– First trials, first products with content based retrieval (Virage,
IBM, Informix…)
Some history and perspective…
• 2000 and beyond: wake up, crash of Internet (i.e.,
it became ubiquitous). Promises of content based
retrieval still unfulfilled
– Internet became a common thing (which is good) but lost its research
appeal → it became a “vehicle”. It is still growing rapidly with
more and more visual data
– Explosion of image and video on internet as well as cheap capture
devices (e.g. phones capturing audio+image+video+text+GPS)
– Further advances in networking, CPU, storage made image and
video ubiquitously available and affordable
– Startups based on content based retrieval not doing well or folded
– Strong research activity
– Most applications resolved by researchers outside of CS
– Minimal or no use of content based retrieval in commercial world