Image Tagging
Attaching textual meta-information or semantic linkages to images
By Perry Rajnovic

What is a Digital Image?
A digital image is usually defined as an organized display of pixels, often called a bitmap. Each pixel is a numeric representation of the color intensities at that point. A pixel may be explicitly defined or may be the result rendered by a vector or graphics package. These representations include no inherent textual elements or semantic description, so images are not easily machine-readable.

Why machine-readability?
Most searches are done via textual queries, so there must be a mechanism to link applicable keywords or phrases to images. For blind users, conveying information about an image in another medium improves accessibility.

Image Contents
The contents of an image can be described in full prose (as in the adage “A picture is worth 1000 words”) or with just a few keywords covering spatial, temporal, or emotional aspects. In many cases, accurately identifying the content of images requires human intervention.

Identifying Image Contents
Many good pattern recognition algorithms exist; however, few are able to interpret the patterns they extract. Artificial intelligence algorithms can learn recognized patterns, but such a system’s flexibility is limited by its predefined knowledge base.

Identifying Text in Images
CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are images that contain a distorted rendering of some text. Their goal is to pose a task that is easy for humans but extremely hard for computer programs to perform equally well. For this task, OCR is generally not sufficient to extract the text. This is a good example of why machine-readable information should be available.

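Returning to the point made under “Why machine-readability?”, one common way to link keywords to images is an inverted index that maps each tag to the images carrying it. The following is a minimal sketch of that idea, not a description of any particular product; the file names, tags, and the function names `build_tag_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_tag_index(image_tags):
    """Map each lowercase tag to the set of image names carrying it."""
    index = defaultdict(set)
    for image, tags in image_tags.items():
        for tag in tags:
            index[tag.lower()].add(image)
    return index


def search(index, query):
    """Return the images whose tags include every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


# Hypothetical file names and tags, for illustration only.
image_tags = {
    "beach.jpg": ["sunset", "ocean", "calm"],
    "harbor.jpg": ["ocean", "boats"],
    "forest.jpg": ["trees", "calm"],
}
index = build_tag_index(image_tags)
print(sorted(search(index, "ocean calm")))  # ['beach.jpg']
```

Requiring every query word to match keeps results precise; a real engine would likely also rank partial matches.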
Example Tag Contents
As an example of what might be provided to tag an image, here is a list of words and phrases describing this slide’s header: Navy Blue, Squares, Fade-out, Horizontal Bar, Minimalist, Decorative.

User Applications
Many applications take advantage of image tagging; a few examples are Apple iPhoto, Google Picasa, and Adobe Photoshop Elements. Generally these programs use tagging for organization and user-defined searching.

Web Applications
Several Web-based applications now include tagging for images, as well as for other non-image features: Google ImageLabeler, Flickr.com, Facebook.com, 23hq.com, and Fotki.com.

Google Images
Luis von Ahn developed the “ESP Game”, which could be used to tag images. He presented a Google tech talk about the game as a form of human computation. Google later licensed the technology to create a similar web application called the Google Images ImageLabeler.

Google ImageLabeler
The ImageLabeler game allows random users to generate tags that accurately describe images. The tags should be accurate due to the game’s constraints, and they gain specificity after several rounds. The resulting tags can improve searches.

Flickr
Flickr is a “Web 2.0” photo hosting and sharing site. Users are encouraged to upload photos, then to name, describe, tag, annotate, geotag, comment on, and group their photos in collaborative ways.

Flickr - Tags
Tags are words or phrases meant to act as keywords. They are searchable within the site and can show popular topics. They improve search relevance.

Flickr - Geotagging
Geotagging is a term for adding geospatial metadata to images, such as the latitude, longitude, and other directional indications of where a photo was taken.

What are annotations?
Wikipedia defines them as: Extra information associated with a particular point in a document or other piece of information.
The US DoD defines them as: A marking placed on imagery or drawings for explanatory purposes or to indicate items or areas of special importance.

Annotating Images
Annotations on images can serve several useful functions. Some examples: point out a specific piece of content; explain some icon or graphic; summarize the meaning of some region; provide additional information via text.

Flickr - Notes
Flickr provides a feature called Notes. It uses a Flash-based implementation of an annotation system. You can dynamically size a rectangular region over a portion of the image, then attach a snippet of text to describe it.

FotoNotes
FotoNotes is a data format for annotating images. It allows you to embed the metadata directly into the image files for portability. Flickr’s Notes feature is inspired by this standard and its accompanying visualization implementation.

FotoNotes - More
FotoNotes was developed by Greg Elin. The homepage provides links to groups working with the standard. Additionally, an implementation that works in most browsers is provided as-is for customization.

Facebook
Facebook.com has a tagging feature that is integrated with “My Photos”. It allows you to add a textual descriptor (a tag or a person’s name) to a specific point in the image. This allows the module to describe who or what is included in a specific album.

Facebook – Tag Display
When the images are viewed, placing the mouse over a tag displays a fixed-size square indicating where the tag (person) is located within the image. This enables users to identify objects by visual inspection or by matching the list of contained objects with their tag displays.

Facebook – Links
Another capability incorporates the site’s concept of friends. If the person you tag is identified as your friend on the site, their name will link to their profile. The site will also count this image in the “photos of” feature on their profile, allowing inclusion of photos added by other users.

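The region-based annotations described for Flickr Notes and the Facebook tag display can be modeled as a rectangle plus a snippet of text, with a hit test to decide which notes to show under the mouse pointer. The sketch below is an assumption about how such a structure might look, not the format used by Flickr, Facebook, or FotoNotes; `RegionNote`, `notes_at`, and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegionNote:
    """A rectangular region of an image with attached text (pixel units)."""
    x: int
    y: int
    width: int
    height: int
    text: str

    def contains(self, px, py):
        """True if the point (px, py) falls inside this region."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def notes_at(notes, px, py):
    """Return the text of every note whose region covers the point."""
    return [note.text for note in notes if note.contains(px, py)]


# Hypothetical annotations on a single image.
notes = [
    RegionNote(10, 10, 100, 80, "Lighthouse"),
    RegionNote(90, 60, 50, 50, "Sailboat"),
]
print(notes_at(notes, 95, 70))   # ['Lighthouse', 'Sailboat']
print(notes_at(notes, 20, 20))   # ['Lighthouse']
```

Regions may overlap, so the lookup returns a list rather than a single note.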
Other Image Metadata: MPEG
The MPEG-7 standard is a “Multimedia Content Description Interface”. “MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardises shall support as broad a range of applications as possible.”

Other Image Metadata: Adobe
Adobe Systems created a new metadata framework for images called XMP (Extensible Metadata Platform). It is publicly documented, based on W3C standards, built on XML, and designed to eliminate growing incompatibility in metadata storage.

Other Image Metadata: IPTC
The International Press Telecommunications Council created standards for the interchange of news data over a decade ago. These standards persist in their IIM standard and are also usable in the newer XMP framework.

Improving Clustering Search Interfaces
Joint Term Project
By Perry Rajnovic and Mark Zalar

Term Project
For my term project, I will be working with Mark Zalar to develop a new search engine interface. It will draw inspiration from all of the top search engines today, along with the enhancements now possible using emerging technologies.

Project Goal
The goal of the project is to implement a search site that provides a highly usable interface for query refinement. Our backend will use clustering mechanisms to allow for easy refocusing of search topics. Our frontend will use AJAX for flexibility.

Frontend Design
The frontend will be designed with a technology known as Asynchronous JavaScript and XML (AJAX). This technology allows the site designer to issue background requests to the server, unseen by the user, and to parse the XML-based results in the scripting language for interactivity.

Frontend Theory
Most clustering-based search solutions available today offer minimal interactivity. Our theory is that letting users harness the power of clustering dynamically as they refine a search will improve results and reduce the time necessary to finish a search.

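To make the “parse XML-based results” step from Frontend Design concrete, here is a minimal sketch of turning a server response into a nested cluster menu. On the actual site this would run in JavaScript in the browser; Python is used here only to show the logic. The XML layout, the attribute names, and `parse_clusters` are assumptions, and the cluster names are borrowed from the “creature” example later in the deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical response format; the project does not specify one.
SAMPLE_RESPONSE = """
<clusters query="creature">
  <cluster name="animal" results="120">
    <cluster name="mammals" results="45"/>
    <cluster name="reptiles" results="30"/>
  </cluster>
  <cluster name="being" results="40"/>
</clusters>
""".strip()


def parse_clusters(element):
    """Recursively convert <cluster> children into nested dictionaries."""
    return [
        {
            "name": child.get("name"),
            "results": int(child.get("results", "0")),
            "children": parse_clusters(child),
        }
        for child in element.findall("cluster")  # direct children only
    ]


root = ET.fromstring(SAMPLE_RESPONSE)
menu = parse_clusters(root)
for cluster in menu:
    print(cluster["name"], cluster["results"],
          [child["name"] for child in cluster["children"]])
```

Keeping the parent and child clusters in one nested structure would let the menu show sibling and parent clusters without another round trip to the server.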
AJAX Functionality
Our site will use AJAX to dynamically reconfigure the clustering menu. This allows quicker browsing of clusters to identify the optimal range of pages to search within. The menu will also use a novel interface that shows sibling and parent clusters.

AJAX Functionality 2
The results will be displayed to the user with some animation. This will help alert users when changes are made to the order or set of results. Another advantage is that users will be more aware of the differences between clusters as they browse them.

Search Target
This search engine could target both websites and images. Valid keywords improve knowledge of the content. Clustering would be highly useful in finding an image with a desired scene or set of objects.

Example: search “creature”
The engine might identify a general cluster of “animal” or “being”. “Animal” might have more results, so the medium-level clusters are shown for it. Suppose the user wants a general discussion of mammals and selects that cluster. The results change to focus on those related to mammals, both as a group and specifically.

Results Display
To provide animation, a technology similar to that found in “TiddlyWiki” will be used. This interface allows topics to be added and removed dynamically with animation. Additionally, extra links can be attached to each topic for more functionality (open in new window, similar items, etc.).

Future Enhancements
Our implementation will provide a basic mockup of the interface and the refinement techniques made available. Several enhancements could be made to this interface that would improve its usability or functionality.

Enhancements in Search
Taking advantage of a meta-search would give the clustering algorithm a higher volume of data with which to generate the data topologies to be explored. Using adaptive search (by userID or global optimization) would improve clustering by favoring the clusters that are chosen more often.

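The adaptive search idea under Enhancements in Search can be sketched as a usage counter that reorders clusters by how often they have been selected, either per user or globally. This is only one possible reading of “adaptive search”, offered under that assumption; the `ClusterUsage` class, its methods, and the user IDs are hypothetical.

```python
from collections import Counter


class ClusterUsage:
    """Track cluster selections per user and across all users."""

    def __init__(self):
        self.global_counts = Counter()
        self.user_counts = {}

    def record(self, user_id, cluster):
        """Note that a user selected a cluster."""
        self.global_counts[cluster] += 1
        self.user_counts.setdefault(user_id, Counter())[cluster] += 1

    def rank(self, clusters, user_id=None):
        """Order clusters by selection count, most used first.

        With a user_id, only that user's history is used; otherwise the
        global counts are used. Ties keep their original order.
        """
        if user_id is None:
            counts = self.global_counts
        else:
            counts = self.user_counts.get(user_id, Counter())
        return sorted(clusters, key=lambda cluster: -counts[cluster])


usage = ClusterUsage()
for _ in range(2):
    usage.record("alice", "mammals")
for _ in range(3):
    usage.record("bob", "reptiles")

candidates = ["birds", "mammals", "reptiles"]
print(usage.rank(candidates))                   # ['reptiles', 'mammals', 'birds']
print(usage.rank(candidates, user_id="alice"))  # ['mammals', 'birds', 'reptiles']
```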
Enhancements in Interface
Because the site will be AJAX-based, a large amount of flexibility is possible with respect to changes in the interface. The browser window is similar to a canvas, with all of the site’s underlying Document Object Model available for addition, modification, or deletion.
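The three Document Object Model operations named above (addition, modification, deletion) can be illustrated on a small results list. The sketch uses Python’s standard `xml.dom.minidom` as a stand-in for the browser DOM; the site itself would perform the equivalent calls from JavaScript, and the markup and result names here are hypothetical.

```python
from xml.dom.minidom import parseString

# A hypothetical fragment standing in for the page's results list.
doc = parseString('<ul id="results"><li>Result A</li><li>Result B</li></ul>')
results = doc.documentElement

# Addition: create a new list item and append it to the list.
item = doc.createElement("li")
item.appendChild(doc.createTextNode("Result C"))
results.appendChild(item)

# Modification: change the text of the first item.
first = results.getElementsByTagName("li")[0]
first.firstChild.data = "Result A (updated)"

# Deletion: remove the second item.
second = results.getElementsByTagName("li")[1]
results.removeChild(second)

remaining = [li.firstChild.data for li in results.getElementsByTagName("li")]
print(remaining)  # ['Result A (updated)', 'Result C']
```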