TRECVID Evaluations
Mei-Chen Yeh
03/27/2012

Introduction
• Text REtrieval Conference (TREC)
  – Organized by the National Institute of Standards and Technology (NIST)
  – Support from government agencies
  – Annual evaluation (NOT a competition)
  – Different “tracks” over the years, e.g.
    • web retrieval, email spam filtering, question answering, routing, spoken documents, OCR, video (standalone conference from 2001)
• TREC Video Retrieval Evaluation (TRECVID)

Introduction
• Objectives of TRECVID
  – Promote progress in content-based analysis and retrieval from digital videos
  – Provide open, metrics-based evaluation
  – Model real-world situations

Introduction
• Evaluation is driven by participants
• The collection is fixed and available in the spring
  – 50% of the data used for development, 50% for testing
• Test queries available in July; about 1 month to submission
• More details
  – http://trecvid.nist.gov/

TRECVID Video Collections
• Test data
  – Broadcast news
  – TV programs
  – Surveillance videos
  – Video rushes provided by the BBC
  – Documentary and educational materials supplied by the Netherlands Institute for Sound and Vision (2007-2009)
  – The Gatwick airport surveillance videos provided by the UK Home Office (2009)
  – Web videos (2010)
• Languages
  – English
  – Arabic
  – Chinese

Collection History
• 2011
  – 19,200 online videos (150 GB, 600 hours)
  – 50 hours of airport surveillance videos
• 2012
  – 27,200 online videos (200 GB, 800 hours)
  – 21,000 equal-length, short clips of BBC rush videos
  – Airport surveillance videos (not yet announced)
  – ~4,000-hour collection of Internet multimedia

Tasks
• Semantic indexing (SIN)
• Known-item search (KIS)
• Content-based copy detection (CCD) – by 2011
• Interactive surveillance event detection (SED)
• Instance search (INS)
• Multimedia event detection (MED)
• Multimedia event recounting (MER) – since 2012

Semantic indexing
• System task:
  – Given the test collection, master shot reference, and concept definitions, return for each concept a list of at most 2000 shot IDs from the test collection, ranked according to their likelihood of containing the concept (a run-construction sketch follows the concept-pair examples below).
• 500 concepts (since 2011)
• “Concept pair” (2012)

Examples
• Boy (One or more male children)
• Teenager
• Scientists (Images of people who appear to be scientists)
• Dark skinned people
• Handshaking
• Running
• Throwing
• Eaters (Putting food or drink in his/her mouth)
• Sadness
• Anger
• Windy (Scenes showing windy weather)
Full list

Example (concept pair)
• Beach + Mountain
• Old_People + Flags
• Animal + Snow
• Bird + Waterscape_waterfront
• Dog + Indoor
• Driver + Female_Human_Face
• Person + Underwater
• Table + Telephone
• Two_People + Vegetation
• Car + Bicycle
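Aside (not from the original slides): a minimal Python sketch of how a semantic indexing run might be assembled under the task rules above, keeping for each concept at most 2000 shot IDs ranked by confidence. The scoring structure, shot IDs, and concept names are invented placeholders, not TRECVID tooling.

```python
# Hedged sketch: rank shots per concept and keep at most 2000, as the
# SIN task requires. All names and scores below are illustrative only.

MAX_SHOTS_PER_CONCEPT = 2000  # limit stated in the task definition


def build_sin_run(concept_scores):
    """concept_scores: dict mapping concept name -> {shot_id: confidence}."""
    run = {}
    for concept, shot_scores in concept_scores.items():
        ranked = sorted(shot_scores.items(), key=lambda item: item[1], reverse=True)
        run[concept] = [shot_id for shot_id, _ in ranked[:MAX_SHOTS_PER_CONCEPT]]
    return run


if __name__ == "__main__":
    # Toy scores from a hypothetical concept detector.
    toy_scores = {
        "Running": {"shot1_1": 0.92, "shot1_2": 0.15, "shot2_7": 0.64},
        "Handshaking": {"shot3_4": 0.88, "shot1_1": 0.05},
    }
    for concept, shots in build_sin_run(toy_scores).items():
        print(concept, shots)
```

An actual submission must follow the run format specified in that year's TRECVID guidelines; this only illustrates the ranking-and-truncation step.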
Known-item search
• Models the situation in which someone knows of a video, has seen it before, believes it is contained in a collection, but doesn't know where to look.
• Inputs
  – A text-only description of the video desired
  – A test collection of videos
• Outputs
  – Top-ranked videos (automatic or interactive mode)

Examples
• Find the video with the guy talking about how it just keeps raining.
• Find the video about some guys in their apartment talking about some cleaning schedule.
• Find the video where a guy talks about the FBI and Britney Spears.
• Find the video with the guy in a yellow T-shirt with the big letter M on it.
• …
http://www-nlpir.nist.gov/projects/tv2010/ki.examples.html

Content-based copy detection

Surveillance event detection
• Detects human behaviors in vast amounts of surveillance video, in real time!
• For public safety and security
• Event examples
  – Person runs
  – Cell to ear
  – Object put
  – People meet
  – Embrace
  – Pointing
  – …

Instance search
• Finds video segments of a specific person, object, or place, given a visual example.

Instance search
• Input
  – a collection of test clips
  – a collection of queries that delimit a person, object, or place entity in some example video
• Output
  – for each query, up to the 1000 clips most likely to contain a recognizable instance of the entity

Query examples

Multimedia event detection
• System task
  – Given a collection of test videos and a list of test events, indicate whether each of the test events is present anywhere in each of the test videos and give the strength of evidence for each such judgment (a minimal sketch of this output appears at the end of these notes).
• In 2010
  – Making a cake: one or more people make a cake
  – Batting a run in: within a single play during a baseball-type game, a batter hits a ball and one or more runners (possibly including the batter) score a run
  – Assembling a shelter: one or more people construct a temporary or semi-permanent shelter for humans that could provide protection from the elements
• 15 new events were released for 2011; the 2012 events have not yet been announced.

Multimedia event recounting
• New in 2012
• Task
  – Once a multimedia event detection system has found an event in a video clip, it is useful for a human user to be able to examine the evidence on which the system's decision was based. An important goal is for that evidence to be semantically meaningful to a human.
• Input
  – a clip and an event kit (name, definition, explication, i.e., a textual exposition of the terms and concepts, evidential descriptions, and illustrative video exemplars)
• Output
  – a clear, concise, text-only (alphanumeric) recounting or summary of the key evidence that the event does in fact occur in the video

Schedule
• Feb. call for participation
• Apr. complete the guidelines
• Jun.-Jul. release query data
• Sep. submission due
• Oct. return the results
• Nov. paper submission due
• Dec. workshop

Call for partners
• Standardized evaluations and comparisons
• Test on large collections
• Failures are not embarrassing, and can be presented at the TRECVID workshop!
• Anyone can participate!
  – A “priceless” resource for researchers
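Aside (not from the original slides): a minimal Python sketch of the MED output described earlier, i.e., for every (test video, test event) pair a present/absent judgment plus a strength-of-evidence score. The detector, clip IDs, event names, and threshold are invented placeholders; real submissions follow the format in the official MED evaluation plan.

```python
# Hedged sketch: report, for each (video, event) pair, whether the event is
# judged present and the strength of evidence. The scoring function below is
# a stand-in for a real detector; all IDs and names are illustrative.

def score_event(video_id, event_name):
    """Placeholder detector returning an invented confidence in [0, 1]."""
    return 0.5


def run_med(test_videos, test_events, threshold=0.5):
    """Return (video, event, present, strength_of_evidence) tuples."""
    results = []
    for video in test_videos:
        for event in test_events:
            strength = score_event(video, event)
            results.append((video, event, strength >= threshold, strength))
    return results


if __name__ == "__main__":
    for row in run_med(["clip_0001", "clip_0002"],
                       ["Making a cake", "Batting a run in"]):
        print(row)
```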