Visual Information Systems: Lessons for its Future
Prof. D. Petkovic, SFSU
Prof. R. Jain, UC Irvine
Dpetkovic@cs.sfsu.edu
SPIE January 2005, San Jose

Goals
• Look at the past and present of visual information systems from our standpoint as computer science researchers in content based retrieval, CV, AI, and multimedia
• Analyze progress and status
• Identify future opportunities and challenges in making content based retrieval and our work part of successful applications
• Discuss the role of CV, AI and content based retrieval researchers in future development
Intended to be critical and self-critical, and to call for action and changes in the way we do the work
Assumption: ultimately, research has to influence real world applications

What are visual information systems?

Are visual information systems only this? (content based retrieval, CV and AI centric view)
• Image, video
• Information: retrieval, query, browsing, visualization

No, they are all this!
• Image, video, audio
• Related data, metadata, measurements, links to related info, WWW info
• Time, location
• Integration
• Information: retrieval, query, browsing, visualization, delivery of all the information

What do users do with visual information systems?
• Search or browse for images/video of their current interest and then review/playback/process the results?
• But most often: search or browse for information and knowledge where image/video is but one aspect of it
  – Entertain
  – Learn
  – Explore
  – Investigate/experiment/evaluate
  – Communicate
  – Teach, train
  – Manage personal data

Examples of commercial or near-commercial visual info systems
• Recording and sharing of personal visual data
  – MyLifeBits (Microsoft Bay Area Research)
    • http://www.research.microsoft.com/barc/MediaPresence/MyLifeBits.aspx
  – Internet photo albums
    • http://photos.yahoo.com/ph//my_photos
• Scientific research and education
  – Astronomy: SkyServer (Microsoft Bay Area Research)
    • http://cas.sdss.org/dr3/en/
  – Bioinformatics
    • http://hedgehog.sfsu.edu/home/index.aspx
    • Cell video

Examples (2)
• News
  – http://news.yahoo.com/
• Entertainment
  – http://movies.yahoo.com/
• Visual info sharing on the WWW
  – http://video.search.yahoo.com/
• Art
  – Getty Museum
    • http://www.getty.edu/art/
  – Hermitage
    • http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English

Examples (3)
• Remote sensing and surveillance
  – http://www.landsat.org/
• Training and education
  – http://www.employeeuniversity.com/corporatevideotraining/index.htm
  – http://coursestream.sfsu.edu/
• Biometrics
  – Face recognition and matching
  – Fingerprints
  – Iris

Search vs. browse or manually prepared material
• It is not always search/query over indexed collections (whether they are manually or automatically indexed)
• Very often (entertainment, on-line learning) the primary function is browsing from a limited list of well organized and manually prepared material
• The “Carefully prepare once, show/sell many times” paradigm justifies the investment in, and need for, manual expert preparation
  – Movie trailers are works of art with the purpose of marketing and sales; a high level of expert manual preparation is required and will likely stay that way
• Currently, the market for “Carefully prepare once, show/sell many times” is much larger
• Search is most often based on current (changing) interests

Some history and perspective…
• Early nineties (BI: before Internet): excitement of early discovery and (over)promises
• Mid to late nineties: explosion of Internet, things are hot! Promise of ubiquitously available data. Furious work to achieve goals (research and startup community). WWW media emerging. MPEG7 started
• 2000 and beyond: “Crash” of Internet R&D (i.e. it became ubiquitous). Promises of content based retrieval still unfulfilled; visual info systems applications doing well (media is an integral part of WWW applications) but with little use of content based retrieval techniques

Content based Retrieval – the early dream
• It was immediately identified that the process of indexing (i.e. attaching searchable metadata) of image and video is a big problem
• Idea of content based retrieval:
  – Process images and videos to automatically extract searchable indices. AI, CV and PR were to be applied heavily
  – Indexes to be used for search and retrieval by “similarity” (“show me an image like this”)
• Content based retrieval was ultimately supposed to make indexing and searching of vast image/video databases automated and economical, and to reduce or even eliminate the need for text metadata
• Great excitement among the research community and some potential customers, and many excellent pieces of work

QBIC history
• Excitement among researchers at IBM Almaden Research. First prototype very exciting; it generated a lot of related work in the research community. Many good papers and patents (QBIC and others)
• Excitement among early customers in art and stock imaging/video
• Marketing was skeptical at first but then started to oversell (e.g. “you do not need text metadata any more”); we needed to get involved to tone it down
• Transferred into IBM Digital Library and DB2
• Business (real $) did not happen:
  – It was hard to estimate the added value of QBIC
  – QBIC search was limited
  – Too early (this was before or in early Internet times)
• But QBIC did bring good marketing and attention to the multimedia features of IBM DB2 and DL. It was used successfully as a marketing tool
  – http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English
• IBM QBIC group lost some credibility in IBM product divisions
• QBIC grew into CueVideo (video + audio indexing and search)

Status today: successful visual information systems applications

Characteristics of successful commercial systems today
• Content consists of images/video but also of a variety of critically important related data (text, audio, prices, links, measurements etc.) arranged in an easy to use GUI
• Indexing and data organization are done predominantly manually, with predefined and simple metadata structures and ontology
• Metadata schemas are defined by domain professionals, not computer scientists. Most are very simple. MPEG 7 is not widely used
• Search is very simple: title, author, and sometimes a few keywords against manually entered data
• Browsing: by alphabet, time, price, using video key frames and image thumbnails, often from manually prepared collections
• Content based indexing and retrieval not used

How is indexing of images/video done today
• Manually entered metadata, usually from a fixed list/structure
• Defined metadata structure into which the content providers can publish the content (many standards exist). The most used standards are relatively simple (e.g. Really Simple Syndication, RSS: http://blogs.law.harvard.edu/tech/rss)
• WWW: crawlers analyze image “context”: where on the WWW page the image is, ALT tags, use of the associated text linked to the image etc.
• Use of manually generated closed captions for video indexing
• Only very rudimentary content based analysis: image type, dimensions, whether the image is color or B&W, photographic or clip art etc.
• Even basic content based retrieval (color histograms, composition) is practically not used

Content based Retrieval – why it is not enough
• Assume content based retrieval worked perfectly. What could it ultimately do?
  – Image color and spatial composition
  – Recognition/matching of some major objects (people, buildings)
  – Motion, action recognition
  – Full speech to text
• Even this ideal situation is not enough! We also need:
  – Other info about the image/video (when, who, where, what, related scientific measurements…)
  – Who, where, when and why
  – Related data and links to related data etc.
  – Integration and synchronization with other sources of data across semantics, time, location, cause/effect dimensions
….. And much more, none of it recorded in pixels

What next?

Future opportunities and challenges – some ideas
• Improve the process of media annotation and indexing (automated and semi-automated)
• Define visual ontology, application specific first, then more general
• Leverage and improve speech recognition, general and domain specific
• Integrate a variety of data (media and related data) and provide unified multimedia modeling and handling
• Incorporate time and location search into the mainstream

Improve process of media annotation and indexing
• Automate metadata that lend themselves to automation. Leverage semi-automated means, but pay utmost attention to HCI
• Compute indexes based on all related data and clues (WWW links, tags, audio, GPS etc.)
• Allow multimedia annotation to help: annotate text, outline image/video objects using pointing, add links…
• Use the power of the Internet community to enable economical media annotation
  – e.g. ESP annotation game by CMU: www.espgame.org
• Improve usability to enable annotation at the most opportune time and make it very easy to use (during capture, in free/fun time etc.)
• Leverage speech (audio tags and speech recognition)
• Pay attention to ease of use and GUI
• Use time and location
• Image and video can be the data but also an index to libraries

Define visual ontology, general or application specific
• Define an ontology of visual media: structure, terms etc., as well as related extraction procedures
• It is not clear whether a general ontology is practical; work on domain specific ones first, then try to generalize
• Make it simple and work with domain experts
• Offer procedures for automated and semi-automatic instantiation of ontologies using all available info
• Much work has already been done outside of the CS community (e.g. domain specific standards for data submissions)

Links to some metadata standards (most are XML based and developed outside of CS)
• Dublin Core
  – http://dublincore.org/documents/2003/04/02/dc-xml-guidelines/
• MPEG7 ISO standard for Video (class 8)
  – http://www.mpeg.org/MPEG/starting-points.html#mpeg7
• METS for Digital Libraries
  – http://www.loc.gov/standards/mets/
• AIIM Standards (Enterprise Content Management)
  – http://xml.coverpages.org/AIMM-Images200104.html
  – http://xml.coverpages.org/umnImages.html
  – http://digital.lib.umn.edu/elements.html
• Really Simple Syndication (RSS), to be used by Yahoo video search
  – http://blogs.law.harvard.edu/tech/rss

Leverage and improve speech recognition, general and domain specific
• Speech and audio carry a wealth of information: semantics and timing. They are easily available, and natural and effective for input
• Speech recognition engines are today trained on general English, with no specific names and domain specific terms.
  Problem: the terms most often searched for (names, domain specific terms) are not covered by the speech engine; develop domain specific speech engines
• Leverage speech and audio as an annotation medium
• Push speech and audio annotation into capture devices
• Synchronize and cross-index speech with related textual data for indexing and increased accuracy

Integrate variety of data (media and related data) and provide unified multimedia modeling and handling
• Visual information is based on visual media AND related information (links, text, documents, measurements, slides etc.). Enable integrated indexing, data organization, search and browse
• Leverage time and location in indexing and search
• Create unified multimedia data models with unified storage, indexing and query, across semantics, time, location
• Integrate across a variety of data types semantically, at the GUI level, and at the system level (e.g. cross-index video with slides and text info)
• From data to information: the old “chasm” still exists; work on it, first by solving some concrete applications

But also…..
• Work very closely with users and domain experts. Develop real and complete applications
• Take a broader usage, system and application view (vs. looking for applications of AI, CV and content based retrieval)
• Collaborate with DB, HCI and Internet systems researchers
• Leverage all sources of information, not only image and video
• Perform extensive experimental evaluations and participate in formal benchmarks (see e.g. NIST TREC competition rules)
• Contribute to and participate in standards activities
• Pay much more attention to GUI and HCI, and perform more formal and complete user evaluations
Not doing this risks making us irrelevant…

Acknowledgement
• We thank J. Gray, J. Gemmell (Bay Area Microsoft Research), R. Singh (SFSU), B. Horowitz (Yahoo), A. Amir (IBM Almaden Research) for comments and feedback

Some history and perspective…
• Late eighties and early nineties: excitement of early discovery and (over)promises
  – CPU, networks and storage started to enable reasonably good manipulation, rendering and processing of images
  – Multimedia appears as a field
  – DB people are being courted by CV/AI people to broaden their views and include multimedia data
  – First projects on content based retrieval: e.g. QBIC (IBM), PhotoBook (MIT)…
  – Startup activity: Virage and others
  – Interest from CV and AI researchers
  – First joint conferences with DB and CV communities
  – Many (over)promises that caught the eye of investors, marketers etc.

Some history and perspective…
• Mid to late nineties: explosion of Internet, things are hot! Furious work to achieve goals
  – Internet enabled better communication and gradually made images, then video, feasible to manipulate, send and view. Explosive growth
  – Advances in compression, networking, CPU, media formats, standards and storage helped greatly. Cheap capture devices starting to appear
  – Multimedia moves and melds slowly into Internet
  – DB vendors start to embrace multimedia types (blobs, extenders, blades, cartridges)
  – Content based retrieval becomes a very popular research topic for CV and AI. Many conferences and workshops organized
  – MPEG7 activity started
  – Availability of research and venture funding continues
  – First trials, first products with content based retrieval (Virage, IBM, Informix…)

Some history and perspective…
• 2000 and beyond: wake up, crash of Internet (i.e. it became ubiquitous). Promises of content based retrieval still unfulfilled
  – Internet became a common thing (which is good) but lost its research appeal; it became a “vehicle”. It is still growing rapidly with more and more visual data
  – Explosion of image and video on the Internet as well as cheap capture devices (e.g. phones capturing audio+image+video+text+GPS)
  – Further advances in networking, CPU and storage made image and video ubiquitously available and affordable
  – Startups based on content based retrieval not doing well or folded
  – Strong research activity
  – Most applications resolved by researchers outside of CS
  – Minimal or no use of content based retrieval in the commercial world
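
Illustration: what "basic content based retrieval" by color histogram looks like
The slides refer several times to color histograms and search by “similarity” without showing them. The sketch below is an illustration only and is not taken from QBIC or any other system named in the talk: it assumes an image is simply a list of RGB tuples, quantizes colors into a few bins, and ranks a toy collection by histogram intersection. All names (`color_histogram`, `histogram_intersection`, `rank_by_similarity`) and the toy data are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram over quantized RGB colors."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(weight, h2.get(color, 0.0)) for color, weight in h1.items())


def rank_by_similarity(query_pixels, collection):
    """Return collection names sorted by decreasing similarity to the query."""
    q = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(q, color_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    return [name for _score, name in sorted(scored, reverse=True)]


# Toy data (hypothetical): tiny "images" given as lists of RGB tuples.
collection = {
    "sunset": [(250, 120, 30)] * 6 + [(40, 20, 90)] * 2,
    "forest": [(20, 130, 40)] * 7 + [(90, 60, 30)] * 1,
    "beach": [(240, 200, 140)] * 4 + [(30, 120, 220)] * 4,
}
query = [(245, 110, 20)] * 5 + [(20, 130, 40)] * 3

ranking = rank_by_similarity(query, collection)
print(ranking)
```

The ranking depends only on pixel colors. Nothing in it captures who, where, when or why, which is the kind of information the talk argues has to come from metadata and related data.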