SESSION TWO
Using stuff
Metadata content and ontologies: requirements for effective retrieval
Rough Guide to Image Management CILIP, 31 March 2010

Create metadata
  © Radio Times

Metadata needs:
  ‘Bibliographic’ description: creator, title, subject etc.
  Format details
  Relationships, source
  Context, language
  Rights
  Technical data
  Standards

Standards
  For a convenient listing see http://metadata.net/
  DCMI: Dublin Core Metadata Initiative http://dublincore.org/
  MODS: Metadata Object Description Schema http://www.loc.gov/standards/mods/
  METS: Metadata Encoding and Transmission Standard http://www.loc.gov/standards/mets/
  RDF: Resource Description Framework http://www.w3.org/RDF/

Why bother?
  Machine indexing of texts is advanced and quite efficient
  Not so for pictures, where meaning/significance is often attributed by context
  E.g. ‘the first computer’, ‘the last man on the moon’
  Context must be described in metadata

Ontologies
  Ontologies provide a way of defining context
  A three-dimensional thesaurus
  If we need words, we need definitions of words
  Especially in multiple languages

Getting started with ontologies
  Useful page from AI Topics: http://www.aaai.org/AITopics/html/ontol.html
  Marine Metadata Interoperability: http://marinemetadata.org/guides/vocabs/ont/definition
  Gives comprehensive guidance on using ontologies and related tools, applicable beyond the marine domain

Finding ontologies and tools
  Swoogle http://swoogle.umbc.edu/
  Domain-specific, e.g. FAO Agricultural Information Management Standards (AIMS) http://aims.fao.org/pages/377/sub

Linguistic tools
  ULAN: Union List of Artist Names Online http://www.getty.edu/research/conducting_research/vocabularies/ulan/
  TGN: Thesaurus of Geographic Names Online http://www.getty.edu/research/conducting_research/vocabularies/tgn/
  AAT: Art & Architecture Thesaurus Online http://www.getty.edu/research/conducting_research/vocabularies/aat/
  ICONCLASS http://www.iconclass.nl/
  WordNet http://wordnet.princeton.edu/

Content-based Image Retrieval
  Automatic analysis of colour distribution and shapes
  Edge detection to determine shape

Just how big is the ‘semantic gap’?
  To what extent is it now possible for computers to identify objects within images by direct inspection of the pixel information?
  The results I am about to show you are from two state-of-the-art automated methods, one for object detection and one for semantic segmentation
  Independently they produce good results, and in combination they are remarkable
  Credits: Jamie Shotton (2007) Contour and Texture for Visual Recognition of Object Categories. Ph.D. thesis, University of Cambridge

Object detection using contour fragments
  These results are obtained using the first method, based upon contour fragments, used here to detect the presence of horses in images
  The algorithm has been ‘educated’ using a set of training images, and has then been let loose on these and other test images, which it has analysed automatically
  On the left of each pair, the green boxes surround the detected horses, while on the right the contour fragments used in the detection are shown
  This method works well on a variety of objects
  It gives few false positives and few false negatives, with almost perfect results for motorbikes and cows!
  However, it does require training, and has not yet been tested on biological research images

Automatic image segmentation using texture
  The second method combines texture, colour, shape and context
  It learns from a set of 591 training images pre-labelled for 21 object classes

Results of the ‘texture’ method for the semantic segmentation of test images
  [Figure: segmented test images with regions labelled building, car, road, grass, sheep, water, cow, cat, sky, book, flower, bicycle, sign, chair]

...but the method is not perfect
  [Figure: imperfectly segmented test images with regions labelled sky, tree, cow, grass, building, dog, sign, water, road, bike]
  As Jamie says in his conclusion, concerning the capabilities of machine vision:
  “While we are still a considerable way from accurately recognizing the tens of thousands of classes that humans effortlessly distinguish, despite incredible variations in appearance, we believe that this thesis has taken a positive step towards a solution”
  So the semantic gap between the capabilities of machine vision and the necessity for human metadata annotation is perhaps not as wide as I made out initially!
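As a rough illustration of the ‘automatic analysis of colour distribution’ mentioned under Content-based Image Retrieval, the sketch below ranks images by how closely their colour histograms match a query image. It is a toy example under stated assumptions: images are supplied as plain lists of RGB tuples, the function names and sample data are invented for illustration, and histogram intersection is simply one common similarity measure. It is not the contour or texture method from the thesis credited above.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Quantise (r, g, b) pixels (0-255 per channel) into coarse bins
    and return the fraction of pixels falling in each bin."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity between 0 and 1; 1 means identical colour distributions."""
    return sum(min(share, h2.get(bin_, 0.0)) for bin_, share in h1.items())


def rank_by_colour(query_pixels, library):
    """Return image names from `library`, most similar to the query first."""
    query_hist = colour_histogram(query_pixels)
    scores = {
        name: histogram_intersection(query_hist, colour_histogram(pixels))
        for name, pixels in library.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data: each "image" is just a flat list of RGB tuples.
meadow = [(30, 160, 40)] * 70 + [(120, 180, 240)] * 30    # mostly green, some sky
seascape = [(20, 60, 200)] * 80 + [(120, 180, 240)] * 20  # mostly blue, some sky
query = [(35, 150, 45)] * 60 + [(125, 185, 235)] * 40     # green field under sky

print(rank_by_colour(query, {"meadow": meadow, "seascape": seascape}))
# prints ['meadow', 'seascape']
```

Note what the sketch cannot do: it matches colours, not meaning. It would rank a green billiard table alongside the meadow, which is exactly the semantic gap that descriptive metadata has to fill.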
Content-based Video Retrieval
  Works better: moving objects are easier to analyse
  Broadcasting systems use the audio stream to help index video
  Informedia Digital Video Library http://www.informedia.cs.cmu.edu/
  “combines speech, image and natural language understanding to automatically transcribe, segment and index linear video for intelligent search and image retrieval”

SESSION TWO
Using stuff
Format and delivery issues

There’s no such thing as a digital image!
  Digital images are just a stream of 1s and 0s
  They have to be processed to be seen
  Almost all processing degrades the image
  How much degradation is acceptable?

Typical formats
  RAW: unprocessed, exactly as captured by camera
  TIFF: processed but uncompressed. Generally best for archiving
  JPEG: processed and compressed. Best for ‘working’ copies, usually OK for web, not always for publication

How big do you want it?
  DPI is no guide to quality: it depends on size of original and size of output
  Better to quote size in pixels
  Output size depends on resolution of output device
  An image that is 1000 × 800 pixels:
  On an old 72ppi monitor will view at 13.9” × 11.1”
  On a new 96ppi monitor will view at 10.4” × 8.3”
  On an average inkjet (150lpi) will print at 6.7” × 5.3”
  On a high quality printer (250lpi) will print at 4” × 3.2”
  No. of pixels ÷ Output resolution = Output size
  (http://www.jiscdigitalmedia.ac.uk/stillimages/advice/do-digital-images-exist-in-the-real-world/)

Choosing a file format
  Archive highest quality – generally TIFF
  Use working copies – generally JPEG – for display
  PDF or PSD may be appropriate for some projects
  See http://www.jiscdigitalmedia.ac.uk/stillimages/advice/choosing-a-file-format-for-digital-still-images/

Delivering to the end user
  Low-res JPEGs OK for web or PowerPoint
  High-res JPEGs normally needed for publication
  Author’s responsibility to check publisher’s requirements
  Normally chargeable – plus reproduction rights
  To keep or not to keep a library copy?

If you keep a copy…
  Needs long-term storage
  Needs adequate metadata
  May need additional scanning to create logical unit
  … so needs institutional policy decision

SESSION TWO
Using stuff
Rights issues and commercial factors

Copyright in images
  Photographs and images are protected as artistic works, provided they are original and ‘fixed’
  This right does not need to be stated
  Electronic/digital copyright is not specifically mentioned in law, which lags behind technology
  Ease of copying and conversion makes infringement easy; permission given for one format may not apply to another

Who has the rights?
  The creator of the image
  The creator of the object imaged
  The subject of the image

Don’t do it!
  The Internet is NOT a copyright-free zone
  DO seek copyright permission
  DO acknowledge the source
  DON’T alter the image
  Paul Pedley, Copyright and images, Library and Information Update, 6(6) May 2007, 36-37

Fair dealing
  You may use images for private study and NON-COMMERCIAL research
  But not on websites OR INTRANETS, because that is equivalent to multiple copying
  Permission must always be sought for that
  Establishing the copyright owner can be extremely difficult

Gowers proposals
  Gowers Review of Intellectual Property, HM Treasury, The Stationery Office, 2006
  Proposes provision for ‘orphan works’ where copyright owner cannot be traced
  Intellectual Property Office [=Patent Office] should issue guidance on parameters of ‘reasonable search’
  And establish a voluntary register of copyright

How long?
  70 years after death of photographer (if UK citizen) for photos taken after August 1989; earlier, can be longer or shorter
  Take advice!
Open Access
  Creative Commons http://creativecommons.org/
  Creative Archive (BBC) http://creativearchive.bbc.co.uk/
  Science Commons http://sciencecommons.org/
  All offer opportunity for creators to license material for web use: non-commercial, credited, share-alike

More info
  JISC Digital Media: http://www.jiscdigitalmedia.ac.uk/stillimages/advice/copyright-and-digital-images/

Pricing your own material
  No standard guidelines
  Reproduction fees vary widely
  V&A (http://www.vam.ac.uk/resources/buying/) often taken as ‘best practice’: has now scrapped reproduction fees for scholarly publications
  Remember quoted prices are maxima – may be discounted or waived
  Administration is costly
  Remember original aim of digitising

Buying material
  Unless for library collection, best for enquirer to deal direct with source
  May need advice on format, type of rights required etc.
  For library retention use highest quality possible
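
Worked example for ‘How big do you want it?’
  The rule given on that slide (No. of pixels ÷ Output resolution = Output size) can be checked with a few lines of Python. This is only a sketch: the function name is invented for illustration, and the four device resolutions are the ones quoted on the slide, not measured values.

```python
def output_size_inches(width_px, height_px, resolution_per_inch):
    """Number of pixels / output resolution = output size in inches,
    rounded to one decimal place."""
    return (
        round(width_px / resolution_per_inch, 1),
        round(height_px / resolution_per_inch, 1),
    )


# Device resolutions as quoted on the slide (pixels or lines per inch).
devices = [
    ("old 72ppi monitor", 72),
    ("new 96ppi monitor", 96),
    ("average inkjet (150lpi)", 150),
    ("high quality printer (250lpi)", 250),
]

for label, resolution in devices:
    width_in, height_in = output_size_inches(1000, 800, resolution)
    print(f'{label}: {width_in}" x {height_in}"')
```

  For the 150lpi case, 1000 ÷ 150 = 6.67, so the width rounds to 6.7 inches. The same function can be turned around for planning: to fill a given print size, multiply the size in inches by the device resolution to get the number of pixels to ask for.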