CuZero: Interactive Video Search for Informed User Shih-Fu Chang

advertisement
CuZero: Interactive Video Search
for Informed User
Shih-Fu Chang
(joint work with Eric Zavesky)
Digital Video and Multimedia Lab
Columbia University
www.ee.columbia.edu/dvmm
Oct. 2008
The Goggle Era Search Paradigm
“…type in a few words at most, then expect the engine to
bring back the perfect results. More than 95 percent of
us never use the advanced search features most
engines include, …”
– The Search, J. Battelle, 2003




Simple keyword search is the primary search
method
Extension to visual search seems to be a
natural desirable step
Examples: text search on Google, Flickr,
Youtube
digital video | multimedia lab
The Role of User
Formulate
query
Result triage
An Example of Web Image Search

Text Query : “Manhattan Cruise” over Goggle Image



How to visualize large image/video sets?
Why are these images returned? No word-document
matching
If this does not work, how do we choose better
search terms?
digital video | multimedia lab
Minor Changes in Keywords  Big Difference

Text Query : “Cruise around Manhattan”
digital video | multimedia lab
Pains of Frustrated Users

Forced to take “one shot” searches, iterating queries
with a trial and error approach...
Problem:
User’s Inability in Forming Queries

Difficult to choose words/concepts without in-depth
knowledge of data and vocabulary
A Call for Research Attention
?
Research needed
to address
user frustration
A lot of work
on content
analytics
Solution-Keep User Informed and Engaged

Help users stay “informed”




in formulating queries
in understanding search results
in changing search rapidly and flexibly
Keep users “engaged”

Instant feedback with minimal disruption
as opposed to slow “trial-and-error”
An attempt: CuZero
Zero-Latency Informed Search & Navigation
A lot of efforts in visual tagging

Multi-source tagging





Social tagging
Tagging by game
Geo-context tagging
Auto-tagging by
content analysis
Useful for keyword
search
S.-F. Chang, Columbia U.
11
Tagging by Content Analytics




Audio-visual features
Machine learning models
Web metadata
Context (geo, user etc)

Large pools of semantic
tags, concept labels
Semantic Indexes
- +
...
Statistical models
Anchor
Snow
Soccer
Building
Outdoor
Example: LSCOM concept pool/models
374 concept detection models:
objects, people, location, scenes,
events, etc
airplane airplane_takeoff airport_or_airfield armed_person building car
cityscape crowd desert dirt_gravel_road entertainment explosion_fire forest
highway hospital insurgents landscape maps military military_base
military_personnel mountain nighttime people-marching person
powerplants riot river road rpg shooting smoke tanks urban vegetation
vehicle waterscape_waterfront weapons weather
A User Community Approach to
Vocabulary Development

Large Scale Concept Ontology for Multimedia
(IBM-Columbia-CMU joint effort in 2005-6)

Ontology for tagging multi-source news video for analyst tasks

Joint effort by government analysts, librarians, researchers

Criteria: Useful, Observable, and Machine Detectable
834 concepts defined, extended to 2000+ by Cyc, 449
concepts annotated

Labeled over TRECVID 2005 development set, 61,000 shots


30+ annotators at Columbia and CMU

33 million judgments
Free for download (350+ downloads so far)


http://www.ee.columbia.edu/dvmm/lscom/
14
Results of sample concept detectors
waterfront
bridge
crowd
explosion fire
US flag
Military personnel
• Does not really solve the computer vision
problem
• But the plurality of the models make it interesting
for search
• Similar to speech retrieval from imperfect ASR
digital video | multimedia lab
Sample detection results (TRECVID2008)
Airplane flying
Classroom
Demonstration
Or Protest
Cityscape
Singing
Example: “Smoke” Detector
(Video Source: Youtube)
• Compare to other tags, content-based analysis
provide precision for deep tagging
• Ping point exact segment of concept
A prototype CuZero:
Zero-Latency Informed Search & Navigation
Informed User:
Instant Informed Query Formulation
Informed User for Visual Search:
Instant visual concept suggestion
Instant
Concept
Suggestion
query time concept mining
Lexical mapping
LSCOM
Mapping keywords to concept definition,
synonyms, sense context, etc
Co-occurrent concepts
images
car
road
Basketball courts and the
American won the Saint Denis on
the Phoenix Suns because of the 50
point for 19 in their role within the
National Association of Basketball
George Rizq led Hertha for the
local basketball game the
wisdom and sports
championship of the president
text
Baghdad to attend the game I see
more goals and the players did
not offer great that Beijing
Games as the beginning of his
brilliance Nayyouf 10 this
atmosphere the culture of sports
championship
Query-Time Concept Mining
Initial text query
results
Find dominant
visual concepts
person
meeting
dominant concepts
visual mining
CuZero Real-Time Query Interface
(demo)
Query auto
completion &
expansion
Instant
Concept
Suggestion
Achieving Zero Latency Exploration
Concept formulation (“car”)
concept formulation (“snow”)
• Overlap (concept formulation) with (map rendering)
• Hide rendering latency during user interaction
• Course-to-fine concept map planning/rendering
• Speed optimization on-going …
A prototype CuZero:
Zero-Latency Informed Search & Navigation
Address Information Overflow:
Effective Visualization
linear browser restricts
inspection flexibility
CMU Informedia Concept Filter
only people
only outdoor
Informed User:
Rapid Exploration of Results
Media Mill Rotor Browser
CuZero visualization:
Help users stay informed and engaged



Create a concept map as search anchors
Direct user control by sliding through the map
Instant display for each location, without new query
“sky”
“water”
“boat”
Achieve Breadth-Depth Flexibility
by instant concept map navigation


(demo)
Breadth: Quick sliding in the concept map
Depth: Deep exploration of each result list
Concept map
Deep exploration of
single permutation
Columbia CuZero Video Search Engine
- addressing the search frustration
Query Formulation
Support intuitive formation in arbitrary
combination of semantic concepts and
visual examples
(Video Source: Youtube)
road
fire
smoke
Navigation and exploration
Real-time refinement and update of query.
Fast, lightweight environment
Web-based client-server approach for instant
deployment with large data archive.
Smoke
Road
Explosion
Running 1
Running 2
dynamic
refinement
Simple Extensions




More flexibility in modifying concept
map
Construct composite “super” concepts
from simple ones
Construct personal search toolbox
Apply concept template to streaming
mode
Concept Map Refinement
(current method requires multi-panel interaction)
3
Changing navigation map requires
additional query reformulation and search
execution
1 User wants to replace
2
‘road’ with a related
concept
Co-occurrence can
suggest related concepts
4
Will interrupt the browsing task and
requires cumbersome text-entry and manual
concept selection
33
In-place Anchor Editing
• Allow fast lookup of
related anchors or
allow user to swap in a
ʻback-upʼ anchor
• May also suggest
strong metadata
relationships
Car
Road
• Capture time
• Author/channel
Urban
8:00
PM
• Geo location
34
NBC
Outdoor
Formulating a Super-Anchor
• Navigation map allows precise
specification of anchor weights
Car
• For search topics of high-interest, user
may have invested significant time in
formulation/browsing
• With ideal location in navigation map,
want to save this anchor configuration
Super Anchor
Road
(explosion
+ car +
outdoor + road)
Explosio
n Fire
Outdoor
• Apply on new search at later time
• Share with other users for
collaborative filtering
• Eventually build a personalized search
tool box
35
Super-Anchor
Filters
Super-Anchors: Automated Alerts
• Passive monitoring of live streams
to trigger alert
• Analyst benefits from customized
search parameters for specific
target
• Also reuse super-anchors in new
navigation map.
Snow
Super
Anchor
a
b
alternate database or streaming content
Super Anchor
(explosion + car + outdoor + road)
  
36
c
threshold
Deeper issues



Is keyword-concept search
the right paradigm?
Perhaps for certain vertical
domains?
How far can automatic
content analytics go?



Scale up the number of
concepts?
Find the right visual lexicon?
Are they accurate enough
for search?
S.-F. Chang, Columbia U.
37
Summary



Effective visual search requires attention
to help improve user experience
More challenges than tagging and retrieval
CuZero Design Principles

Keep user informed and engaged in



Query formulation and result visualization
Instant feedback
Support fusion of search modalities



Visual concept
Example
Keyword /speech transcripts
More Information

LSCOM lexicon and annotation



Include 449 concepts (object, scene, event et al)
http://www.ee.columbia.edu/lscom
Columbia DVMM Lab



S.-F. Chang, Columbia U.
http://www.ee.columbia.edu/dvmm
374 SVM-based concept detectors
Video search demos
39
Download