BIT 3193 MULTIMEDIA DATABASE
CHAPTER 4: QUERYING MULTIMEDIA DATABASES

• The structure of an image is much less explicit, so we need to apply techniques that will identify a structure.
• Characterizing the content of visual objects is much more complex and uncertain; visual objects are characterized by feature vectors.
• A feature is an attribute derived by transforming the original visual object using an image analysis algorithm.
• The visual query mode involves matching the input image against pre-extracted features of real objects; the pre-extracted features are held in the database.
• Purpose: to extract a set of numerical features that removes redundancy from the image and reduces its dimension.
• The most commonly used features for content-based image retrieval are shape, color and texture.
• A content-based image retrieval (CBIR) system uses image visual content features to retrieve relevant images from an image database.
• CBIR systems retrieve images according to the specified features that users are interested in; features such as texture, color, shape and location properties can reflect the contents of an image.

Example: Trainable System for Object Detection
• A set of positive example images of the object class considered (e.g., images of frontal faces) and a set of negative examples (e.g., any non-face image) are collected.
• The images are transformed into feature vectors in a chosen representation (e.g., a vector the size of the image with one value at each pixel location; this is called the "pixel" representation).
• The vectors (examples) are used to train a pattern classifier, such as a Support Vector Machine (SVM), to learn the classification task of separating positive from negative examples.
• To detect objects in out-of-sample images, the system slides a fixed-size window over an image and uses the trained classifier to decide which patterns show the objects of interest.
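The sliding-window detection step described above can be sketched as follows. This is a minimal illustration, not the actual system: the trained SVM is replaced by a stub classifier (a mean-intensity threshold), and the toy image and all names are invented for the example.

```python
# Sketch of sliding-window object detection. The "classifier" is a
# stand-in for a trained SVM: it accepts windows whose mean pixel
# intensity exceeds a threshold.

def classify(window):
    # Stand-in for the trained pattern classifier.
    flat = [p for row in window for p in row]
    return sum(flat) / len(flat) > 0.5

def detect(image, win=2, step=1):
    """Slide a fixed-size window over a 2D image (list of lists of
    pixel intensities) and return the positions the classifier accepts."""
    hits = []
    h, w = len(image), len(image[0])
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            # Extract the same kind of features as in training
            # (here: the raw "pixel" representation of the window).
            window = [row[x:x + win] for row in image[y:y + win]]
            if classify(window):
                hits.append((y, x))
    return hits

image = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1],
]
print(detect(image))  # one bright 2x2 region detected at (0, 2)
```

In a real detector the window contents would be transformed into the chosen feature representation (pixel, eigenvector or wavelet) before classification, and the scan would be repeated at several image scales.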
• At each window position, the system extracts the same set of features as in the training step and feeds them into the classifier; the classifier output determines whether or not the window contains an object of interest.

Representation Techniques for Face and People Detection
• Pixel representation
• Eigenvector representation
• Wavelet representation

• Multimedia data, such as images or video, are typically represented or stored as very high-dimensional vectors.
• The processing time for searching or performing other operations in such systems is heavily affected by the fact that the data are so high-dimensional.
• It is therefore practically important to find compact representations of multimedia data that do not significantly affect the performance of systems such as detectors.

• Retrieval can be based on:
  • color: using color histograms and color variants
  • texture: variation in intensity and topography of surfaces
  • shape: using aspect ratios, circularity and moments for global features, and boundary segments for local features
  • position: using spatial indexing
  • image transformations: using transforms
  • appearance: using a combination of color, texture and intensity surfaces

Table 4.1: Features used in retrieval

Feature    | Measures                                                                                           | Theory                  | Main use                           | Problems
Color      | Histogram                                                                                          | Swain and Ballard       | Color indexing                     | Lighting variations
Texture    | Pixel intensity (illumination, topography); degrees of directionality, regularity and periodicity | Gabor filters; fractals | Indexing; texture thesaurus        |
Shape      | Global features (aspect ratio, circularity, moments); local features (boundary segments)          | Active contours         | Shape indexing; object recognition |
Appearance | Global features (curvature, orientation); local features (local curvatures and orientation)       | Transforms              | Image classification               |
Position   | Spatial relationships                                                                              | Tessellations (Voronoi) | Object recognition                 | Spikes and holes in objects cause errors in indexing

Table 4.2: Advantages and disadvantages of feature methods
of retrieval

Feature    | Advantages                                                                                                           | Disadvantages
Color      | Can be applied to all colored images, 2D and 3D                                                                      |
Texture    | Distinguishes between image regions with similar color, e.g. sea and sky                                             | Large feature vectors, each containing 4000 elements, have been used
Shape      | Important in image segmentation; can classify images as stick-like, plate-like or blob-like                          | Representation is difficult; viewpoint changes an object's shape; spikes and holes; 3D is very difficult
Appearance | Important way of judging similarity; can generate invariant measures; describes an image at varying levels of detail |
Position   | Can be applied to 2D and 3D images                                                                                   | Images must contain objects in defined spatial relationships; spatial indexing is not useful unless combined with color and texture

• There are two alternative approaches:
  • use a query image: the user can provide an image, or compose a target image by selecting and clicking color palettes and texture patterns
  • use user-defined features: allow the user to select a sample image; in the query process, the distribution of image objects is then recomputed in terms of the distance from the sample image
• Automatic methods can be used for generating content-dependent metadata.
• Speech recognition techniques are used for the identification of both speakers and the spoken words.
• Factors which influence the complexity of the identification problems encountered include:
  • isolated words (easier to recognize)
  • single speaker (one is easier)
  • vocabulary size (smaller is easier)
  • grammar (tightly constrained is easier)
• Users can use query by example (QBE).
• The technologies used to achieve this have to be integrated, and include:
  • large-vocabulary speech recognition
  • speaker segmentation
  • speaker clustering
  • speaker identification
  • name spotting
  • topic classification
  • story segmentation
• Videos are far more complex.
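Query by example with the color-histogram feature from Table 4.1 can be sketched as follows: the sample image is reduced to a normalized histogram and database images are ranked by histogram intersection (the similarity measure used in Swain and Ballard's color indexing). The toy images, names and bin count are invented for illustration.

```python
# Sketch of query-by-example CBIR: rank database images by the
# intersection of their color histograms with the query histogram.
# "Images" here are toy lists of grey-level pixel values in [0, 256).

def histogram(pixels, bins=4):
    """Quantize pixel values into `bins` bins; return a normalized histogram."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    return [c / len(pixels) for c in counts]

def intersection(h1, h2):
    """Histogram intersection similarity: 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))

database = {
    "sunset": [200, 210, 220, 230],  # mostly bright pixels
    "forest": [60, 70, 80, 90],      # mid-range pixels
    "night":  [5, 10, 15, 20],       # dark pixels
}

def query_by_example(sample_pixels, db):
    """Return database image names ranked by similarity to the sample."""
    q = histogram(sample_pixels)
    return sorted(db, key=lambda name: intersection(q, histogram(db[name])),
                  reverse=True)

print(query_by_example([195, 205, 215, 225], database))
# a bright query image ranks "sunset" first
```

The same ranking scheme works for any feature vector; only the feature extraction and the distance measure change.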
• Role of video feature extraction:
  • image-based features
  • motion-based features (e.g. motion of the camera)
  • object detection and tracking
  • speech recognition
  • speaker identification
  • word spotting
  • audio classification

[Diagram: hierarchical model of a video document. A clip (attributes: category, title, date, source, duration, theme) is indexed into one or more scenes/story segments (frame start, frame end, number of shots, event, keywords, theme); each scene consists of shots (duration, frame start, frame end, camera, audio level), a shot being what is captured between a record and a stop camera operation; each shot is a sequence of frames (frame number).]

• Clip
  • a digital video document that can last from a few seconds to a few hours
• Scene
  • a sequential collection of shots unified by a common event or locale (background)
  • a clip has one or more scenes
• Shot
  • the fundamental unit; much research has focused on segmenting video by detecting the boundaries between camera shots
  • defined as a sequence of frames captured by a single camera in a single continuous action in time and space (example: two people having a conversation)
  • the low-level syntactic building block of a video sequence
• The video operations are:
  • create
  • concatenate, union and intersection (based on temporal and spatial conditions)
  • output
• Query example: "Show the details of movies where a character said 'I am not interested in a semantic argument, I just need the protein'."

[Figure A: Video retrieval system. Components: user query inputs and query processing, query presentation, video processing and annotation, visual summaries, indexes, digital video collection, content delivery, and access control and rights management.]

ISO/IEC 13249 (SQL/MM)
• SQL Multimedia and Application Packages
• Standardized in 2001 by an ISO subcommittee SC32 working group
• Provides structured object types and methods to store and manipulate image data by content
• Supports the object-relational (OR) data model
• Parts:
  • Part 1: Framework
  • Part 2: Full-Text
  • Part 3: Spatial
  • Part 5: Still Image
  • Part 6: Data Mining

Object types that comply with the first edition of the
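The shot-boundary detection mentioned above is commonly done by comparing consecutive frames; a minimal sketch is to flag a cut wherever the difference between per-frame histograms exceeds a threshold. The toy frames (flat lists of grey-level pixels) and the threshold value are invented for illustration.

```python
# Sketch of shot-boundary detection: declare a cut between frame i-1
# and frame i when their histogram difference exceeds a threshold.

def histogram(frame, bins=4):
    """Normalized histogram of grey-level pixel values in [0, 256)."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    return [c / len(frame) for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Return the indices i where a cut occurs before frame i."""
    cuts = []
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        diff = sum(abs(a - b) for a, b in zip(prev, cur))
        if diff > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Two dark frames, then an abrupt change to bright frames.
frames = [[10, 20, 30], [12, 22, 28], [200, 210, 220], [205, 215, 225]]
print(shot_boundaries(frames))  # cut detected before frame 2
```

Histogram comparison catches abrupt cuts well; gradual transitions (fades, dissolves) need more elaborate measures over longer frame windows.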
ISO/IEC 13249-5:2001 SQL/MM Part 5: Still Image standard:

• SI_AverageColor object type: describes the average color feature of an image.
• SI_Color object type: encapsulates the color values of a digitized image.
• SI_ColorHistogram object type: describes the relative frequencies of the colors exhibited by samples of an image.
• SI_FeatureList object type: describes an image that is represented by a composite feature. The composite feature is based on up to four basic image features (SI_AverageColor, SI_ColorHistogram, SI_PositionalColor and SI_Texture) and their associated feature weights.
• SI_StillImage object type: represents digital images with inherent image characteristics such as height, width, format, and so on.
• SI_PositionalColor object type: describes the positional color feature of an image. Assuming that an image is divided into n by m rectangles, the positional color feature characterizes an image by the n by m most significant colors of the rectangles.
• SI_Texture object type: describes the texture feature of an image, characterized by the size of repeating items (coarseness), brightness variations (contrast) and predominant direction (directionality).

Read the following website for further information on the Oracle implementation of SQL/MM Still Image:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14297/ch_stimgref.htm#CHDBAGID

Example of a media table for still images defined as per the SQL/MM standard. Given the following PM.SI_MEDIA table definition in the Oracle implementation:

CREATE TABLE PM.SI_MEDIA (
  PRODUCT_ID       NUMBER(6),
  PRODUCT_PHOTO    SI_StillImage,
  AVERAGE_COLOR    SI_AverageColor,
  COLOR_HISTOGRAM  SI_ColorHistogram,
  FEATURE_LIST     SI_FeatureList,
  POSITIONAL_COLOR SI_PositionalColor,
  TEXTURE          SI_Texture,
  CONSTRAINT id_pk PRIMARY KEY (PRODUCT_ID));

Example 1:
• Construct an SI_AverageColor object from a specified color using the SI_AverageColor(averageColorSpec) constructor.
Solution:

DECLARE
  myColor    SI_Color;
  myAvgColor SI_AverageColor;
BEGIN
  myColor := NEW SI_Color(null, null, null);
  myColor.SI_RGBColor(10, 100, 200);
  myAvgColor := NEW SI_AverageColor(myColor);
  INSERT INTO PM.SI_MEDIA (product_id, average_color)
    VALUES (75, myAvgColor);
  COMMIT;
END;

Example 2:
• Derive an SI_AverageColor value using the SI_AverageColor(sourceImage) constructor.

Solution:

DECLARE
  myimage    SI_StillImage;
  myAvgColor SI_AverageColor;
BEGIN
  SELECT product_photo INTO myimage
    FROM PM.SI_MEDIA WHERE product_id = 1;
  myAvgColor := NEW SI_AverageColor(myimage);
END;

Example 3:
• Insert into the PM.SI_MEDIA table an object with PRODUCT_ID = 1 and an average color of RED = 20, GREEN = 30 and BLUE = 50.

Solution:

DECLARE
  myColor    SI_Color;
  myAvgColor SI_AverageColor;
BEGIN
  myColor := NEW SI_Color(null, null, null);
  myColor.SI_RGBColor(20, 30, 50);
  myAvgColor := NEW SI_AverageColor(myColor);
  INSERT INTO PM.SI_MEDIA (product_id, average_color)
    VALUES (1, myAvgColor);
  COMMIT;
END;

Example 4:
• Derive an SI_AverageColor object for the image with PRODUCT_ID = 13 using the SI_FindAvgClr() function.
Solution:

DECLARE
  myimage    SI_StillImage;
  myAvgColor SI_AverageColor;
BEGIN
  SELECT product_photo INTO myimage
    FROM PM.SI_MEDIA WHERE product_id = 13;
  myAvgColor := SI_FindAvgClr(myimage);
END;

• In 2002, the ISO MPEG subcommittee published a new standard: MPEG-7, formally named the Multimedia Content Description Interface.
• MPEG-4 was the first multimedia representation standard, based on object-based coding.
• MPEG-7 is currently the most complete description standard for multimedia data: any audio/visual material associated with multimedia data can be indexed and searched.
• MPEG-7 provides:
  • a set of Descriptors (D): quantitative measures of audio/visual features
  • Description Schemes (DS): the structure of descriptors and their relationships
• MPEG-7 descriptions can be associated with:
  • still pictures, graphics, 3D models, audio, speech and video
  • composition information about how these elements are combined in a multimedia presentation (scenarios)
• MPEG-7 descriptions do not depend on the way the described content is coded or stored: it is possible to create an MPEG-7 description of an analogue movie, or of a picture printed on paper, in the same way as of digitized content.
• MPEG-7 can exploit the advantages provided by MPEG-4 coded content.
• Material encoded using MPEG-4 provides the means to encode audio-visual material as objects having certain relations in time (synchronization) and space (on the screen for video, or in the room for audio).
• It is thus possible to attach descriptions to elements (objects) within the scene, such as audio and visual objects.
• The same material can be described using different types of features, tuned to the area of application. For example, for visual material:
  • a lower abstraction level would be a description of shape, size, texture, color, movement (trajectory) and position ("where in the scene can the object be found?")
  • the highest level would give semantic information: "This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background."
• Apart from describing what is depicted in the content, MPEG-7 can carry the following additional information about the multimedia data:
  • the form (e.g.
JPEG, MPEG-2)
  • the overall data size (helps determine whether the material can be "read" by the user terminal)
  • conditions for accessing the material (includes links to a registry with intellectual property rights information, and the price)
  • classification (includes parental rating, and content classification into a number of pre-defined categories)
  • links to other relevant material (helps the user speed up the search)
  • the context (the occasion of the recording, e.g. Olympic Games 1996, final of the 200 metre hurdles, men)

• Main elements of the MPEG-7 standard:
  • Description Tools: Descriptors (D) and Description Schemes (DS)
  • a Description Definition Language (DDL): defines the syntax of the MPEG-7 Description Tools and allows the creation of new Description Schemes
  • system tools: support the binary coded representation for efficient storage and transmission, transmission mechanisms (for both textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
• The key information that the Description Tools capture includes:
  • structural information on spatial, temporal or spatio-temporal components of the content (scene cuts, segmentation into regions, region motion tracking)
  • low-level features of the content (colors, textures, sound timbres, melody description)
  • conceptual information about the reality captured by the content (objects and events, interactions among objects)
  • information about how to browse the content in an efficient way (summaries, variations, spatial and frequency subbands)
  • information about collections of objects
  • information about the interaction of the user with the content (user preferences, usage history)

[Figures: scope of MPEG-7; main elements of MPEG-7; abstract representation of possible applications using MPEG-7]

Integration of MPEG-7 into an MMDBMS
• MPEG-7 relies on XML Schema, so mapping strategies from XML to the database data model are an issue.
• Querying with SQL/MM: due to the rich descriptions provided by MPEG-7, enhancements to SQL/MM are needed; operations that manipulate, and produce as results, XML are one option.
• Indexing methods for multidimensional data can be used to index multimedia data; MPEG-7 provides methods for semantic indexing.

More on MPEG-7 can be found in ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio: MPEG-7 Overview, http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
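Because MPEG-7 descriptions are XML documents (the DDL is based on XML Schema), storing them in a database amounts to producing, parsing and mapping XML. The sketch below builds and reads back a minimal MPEG-7-style description; the element names are simplified illustrations, not the normative MPEG-7 schema.

```python
# Sketch of producing and parsing a minimal MPEG-7-style XML description.
# Element names are simplified for illustration; the real MPEG-7 schema
# is far richer and is defined by the DDL (XML Schema based).
import xml.etree.ElementTree as ET

def build_description(title, file_format):
    """Serialize a toy description with a title and the content's form."""
    root = ET.Element("Mpeg7")
    desc = ET.SubElement(root, "Description")
    ET.SubElement(desc, "Title").text = title
    ET.SubElement(desc, "FileFormat").text = file_format
    return ET.tostring(root, encoding="unicode")

def read_title(xml_text):
    """Parse a description back and extract the title element."""
    root = ET.fromstring(xml_text)
    return root.findtext("Description/Title")

doc = build_description("Olympic Games 1996, 200 m hurdles final", "MPEG-2")
print(read_title(doc))
```

An MMDBMS would map such elements either onto object-relational columns (the SQL/MM approach) or store the description whole and query it with XML operations.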