TRECVID Evaluations
Mei-Chen Yeh
03/27/2012
Introduction
• Text REtrieval Conference (TREC)
– Organized by the National Institute of Standards and Technology (NIST)
– Support from government agencies
– Annual evaluation (NOT a competition)
– Different “tracks” over the years, e.g.
• web retrieval, email spam filtering, question answering, routing, spoken documents, OCR, video (standalone conference from 2001)
• TREC Video Retrieval Evaluation (TRECVID)
Introduction
• Objectives of TRECVID
– Promote progress in content-based analysis and
retrieval from digital videos
– Provide open, metrics-based evaluation
– Model real world situations
Introduction
• Evaluation is driven by participants
• The collection is fixed, available in the spring
– 50% of the data used for development, 50% for testing
• Test queries available in July, with one month until submission
• More details:
– http://trecvid.nist.gov/
TRECVID Video Collections
• Test data
– Broadcast news
– TV programs
– Surveillance videos
– Video rushes provided by the BBC
– Documentary and educational materials supplied by the Netherlands Institute for Sound and Vision (2007-2009)
– The Gatwick airport surveillance videos provided by the UK Home Office (2009)
– Web videos (2010)
• Languages
– English
– Arabic
– Chinese
Collection History
• 2011
– 19,200 online videos (150 GB, 600 hours)
– 50 hours of airport surveillance videos
• 2012
– 27,200 online videos (200 GB, 800 hours)
– 21,000 equal-length short clips of BBC rush videos
– Airport surveillance videos (not yet announced)
– A ~4,000-hour collection of Internet multimedia
Tasks
• Semantic indexing (SIN)
• Known-item search (KIS)
• Content-based copy detection (CCD) – through 2011
• Interactive surveillance event detection (SED)
• Instance search (INS)
• Multimedia event detection (MED)
• Multimedia event recounting (MER) – since 2012
Semantic indexing
• System task:
– Given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked by their likelihood of containing the concept (see the sketch after this slide).
• 500 concepts (since 2011)
• “Concept pair” (2012)
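As a concrete illustration of the system task, the sketch below assembles a ranked run for each concept. This is a hypothetical outline, not the official submission pipeline: score_shot() stands in for a trained per-concept classifier, and the data layout is assumed.

```python
# Minimal sketch of assembling a semantic-indexing run (assumptions:
# score_shot() is a trained concept classifier; shot IDs are strings).

def score_shot(concept: str, shot_id: str) -> float:
    """Placeholder for a trained classifier's confidence that the
    shot contains the concept."""
    raise NotImplementedError

def build_run(concepts, shot_ids, max_results=2000):
    """Return, per concept, at most max_results shot IDs ranked by
    descending classifier confidence."""
    run = {}
    for concept in concepts:
        scored = sorted(
            ((score_shot(concept, s), s) for s in shot_ids),
            reverse=True,
        )
        run[concept] = [shot for _, shot in scored[:max_results]]
    return run
```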
Examples
• Boy (one or more male children)
• Teenager
• Scientists (images of people who appear to be scientists)
• Dark-skinned people
• Handshaking
• Running
• Throwing
• Eaters (putting food or drink in his/her mouth)
• Sadness
• Anger
• Windy (scenes showing windy weather)
• Full list available on the TRECVID website
Examples (concept pairs)
• Beach + Mountain
• Old_People + Flags
• Animal + Snow
• Bird + Waterscape_waterfront
• Dog + Indoor
• Driver + Female_Human_Face
• Person + Underwater
• Table + Telephone
• Two_People + Vegetation
• Car + Bicycle
Known-item search
• Models the situation in which someone knows of a video, has seen it before, and believes it is contained in a collection, but doesn't know where to look.
• Inputs
– A text-only description of the video desired
– A test collection of videos
• Outputs
– Top-ranked videos (automatic or interactive mode); a minimal retrieval sketch follows the query examples below
Examples
• Find the video with the guy talking about how it
just keeps raining.
• Find the video about some guys in their
apartment talking about some cleaning schedule.
• Find the video where a guy talks about the FBI
and Britney Spears.
• Find the video with the guy in a yellow T-shirt
with the big letter M on it.
• …
http://www-nlpir.nist.gov/projects/tv2010/ki.examples.html
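For the automatic mode, one plausible baseline is plain text retrieval over whatever text is available per video (metadata or ASR transcripts). The sketch below ranks videos by TF-IDF similarity to the query; it is an assumed baseline, not the evaluated systems' method, and the video_texts layout is hypothetical.

```python
import math
from collections import Counter

# Sketch of an automatic known-item search baseline: TF-IDF ranking of
# videos by their associated text. video_texts maps video ID -> text
# (e.g., metadata or ASR transcript); this layout is an assumption.

def tokenize(text):
    return text.lower().split()

def rank_videos(query, video_texts, top_k=100):
    docs = {vid: Counter(tokenize(t)) for vid, t in video_texts.items()}
    n = len(docs)
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())                  # document frequency
    idf = {w: math.log(n / df[w]) for w in df}

    q = Counter(tokenize(query))
    scores = {}
    for vid, tf in docs.items():
        # Dot product of query and document TF-IDF vectors,
        # normalized by document vector length.
        dot = sum(q[w] * tf[w] * idf.get(w, 0.0) ** 2 for w in q)
        norm = math.sqrt(sum((tf[w] * idf[w]) ** 2 for w in tf)) or 1.0
        scores[vid] = dot / norm
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```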
Content-based copy detection
• Determines whether a query video contains a segment copied, possibly with transformations, from a video in the reference collection
Surveillance event detection
• Detects human behaviors in vast amounts of surveillance video, in real time (see the sketch after the event examples)
• For public safety and security
• Event examples
– Person runs
– Cell to ear
– Object put
– People meet
– Embrace
– Pointing
– …
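Since the task stresses real-time processing, detection has to run over a stream rather than over the whole video at once. The sketch below is a hypothetical streaming setup: per-event detectors (assumed, trained elsewhere) score a sliding window of frame features and fire when a threshold is crossed.

```python
from collections import deque

# Hypothetical streaming event detector: score a sliding window of
# per-frame features with one detector per event and report crossings.

def detect_events(frame_features, detectors, window=25, threshold=0.8):
    """frame_features: iterable of per-frame feature vectors (assumed).
    detectors: {event_name: callable(list_of_features) -> score}."""
    buf = deque(maxlen=window)
    for t, feat in enumerate(frame_features):
        buf.append(feat)
        if len(buf) < window:
            continue                      # wait until the window fills
        for event, score_fn in detectors.items():
            score = score_fn(list(buf))
            if score >= threshold:
                yield event, t, score     # event, frame index, confidence
```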
Instance search
• Finds video segments of a specific person, object, or place, given a visual example.
Instance search
• Input
– a collection of test clips
– a collection of queries, each delimiting a person, object, or place entity in some example video
• Output
– for each query, up to 1000 of the clips most likely to contain a recognizable instance of the entity (see the sketch below)
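One common way to realize this input/output contract is nearest-neighbor ranking in a shared feature space. The sketch below assumes feature vectors have already been extracted for the query example and every test clip (the extractor and the clip_vecs layout are assumptions) and ranks clips by cosine similarity.

```python
import numpy as np

# Hypothetical instance-search ranker: cosine similarity between a
# query example's feature vector and precomputed per-clip vectors.

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def instance_search(query_vec, clip_vecs, max_results=1000):
    """clip_vecs: {clip_id: np.ndarray} (assumed layout).
    Returns up to max_results clip IDs, best match first."""
    scores = {cid: cosine(query_vec, v) for cid, v in clip_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:max_results]
```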
Query examples
Multimedia event detection
• System task
– Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos, and give the strength of evidence for each such judgment (see the sketch after this slide).
• In 2010
– Making a cake: one or more people make a cake
– Batting a run in: within a single play during a baseball-type game, a batter hits a ball and one or more runners (possibly including the batter) score a run
– Assembling a shelter: one or more people construct a
temporary or semi-permanent shelter for humans that
could provide protection from the elements.
• 15 new events were released for 2011; the 2012 events have not yet been announced.
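The required output pairs every test event with every test video and attaches a strength of evidence. A minimal sketch of that contract is below; score_event() stands in for a trained event model and is purely an assumption.

```python
# Hypothetical multimedia event detection output loop: for each
# (event, video) pair, emit a present/absent decision plus a
# confidence score ("strength of evidence").

def score_event(event: str, video_id: str) -> float:
    """Placeholder for a trained event model's confidence in [0, 1]."""
    raise NotImplementedError

def run_med(events, video_ids, threshold=0.5):
    results = []
    for event in events:
        for vid in video_ids:
            conf = score_event(event, vid)
            results.append((event, vid, conf >= threshold, conf))
    return results
```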
Multimedia event recounting
• New in 2012
• Task
– Once a multimedia event detection system has found an event in a
video clip, it is useful for a human user to be able to examine the
evidence on which the system's decision was based. An important goal
is for that evidence to be semantically meaningful to a human.
• Input
– a clip and an event kit: name, definition, explication (a textual exposition of the terms and concepts), evidential descriptions, and illustrative video exemplars
• Output
– a clear, concise text-only (alphanumeric) recounting or summary of
the key evidence that the event does in fact occur in the video
Schedule
• Feb.: call for participation
• Apr.: complete the guidelines
• Jun.-Jul.: release query data
• Sep.: submissions due
• Oct.: results returned
• Nov.: paper submissions due
• Dec.: workshop
Call for partners
• Standardized evaluations and comparisons
• Test on large collections
• Failures are not embarrassing, and can be
presented at the TRECVID workshop!
• Anyone can participate!
– A “priceless” resource for researchers