Abstract:
The purpose of this paper is to present and clarify part of the challenge that multimedia database systems (MMDBS) represent. MMDBS are the next step, chronologically and technically, after relational and object-oriented database systems. Classical primary keys, tables and attributes are no longer the central concepts: specialists now work with magnetic storage media for the data, use pixels and color histograms to compare images, extract screenshots and frames from video files, and measure audio features to identify sounds. This paper also focuses on concepts such as similarity search, similarity models and their mathematical basis. The applications of MMDBS do not stop here: crime investigators use MMDBS for fingerprint, voice and face recognition. Both the efficiency and the effectiveness of the presented techniques are thoroughly analyzed, and the benefits over traditional approaches are shown by evaluating the new methods on real-world test datasets.
Keywords:
Multimedia, Pixels, Database systems, Information retrieval, Color histogram
Dan Andreea-Irina
Alexandru Ioan Cuza University
Iasi, Romania andreeairina.dan@gmail.com
The mid-1990s saw the first wave of development and marketing of multimedia database systems. The most important products of this wave were MediaDB (now MediaWay), Jasmine and Itasca (the commercial successor of ORION). Each of these products could handle different data types and offered query facilities for retrieving, inserting and updating data. Many of them disappeared from the market after a few years; only a few were able to adapt to the changes in hardware and software.
In the second wave, commercial systems for multimedia content were launched, providing advanced user-defined objects for various media types. The most remarkable one was Informix, launched during 1996-1998. The most advanced solutions on today's market are Oracle 10g, IBM DB2 and IBM Informix.
The third wave includes projects currently under development or recently completed. Most of these applications are able to deal with the richer semantic content of the data.
Most of these products are based on the MPEG standards (MPEG-7 and MPEG-21).
Multimedia is an irreversible trend in information technology that improves data quality and minimizes the loss of information. The main advantage of multimedia databases is that multimedia information is stored compactly on disk, is accessible, and can be easily processed and manipulated.
The term multimedia database system is used for CD collections, for systems that provide tools to help users organize and search multimedia information (browsers), for video-on-demand systems, for CAD systems that use a database as storage medium, and for relational databases that combine tabular information with so-called Binary Large Objects (BLOBs).
The term 'medium' has different definitions in different areas. According to Dutton, it means a mediator; in the colloquial sense, it refers to the press, radio, television and the internet; in physics, it denotes sound and electromagnetic waves; in hardware, it denotes storage and transmission media (telephone, radio); and in the logical sense, it denotes abstract data types (text, graphics, video, etc.).
All these definitions have in common that messages are sent and received through the medium. The medium facilitates communication between two or more partners: the sender generates signals in the medium, and the receiver picks up these signals and interprets them.
1.1 Multimedia data types
1.1.1 Text
This medium is often reduced to a plain string representation. A richer representation, as needed in a document archive, should also contain structural information and formatting information.
Structural information includes the title, author, chapters and sections, typically represented with markup languages such as SGML or HTML, which can also carry formatting information. Even with formatting, text is the most economical medium in terms of disk space.
Without compression, an A4 page of text takes about 2 KB. Users of a multimedia database expect to be able to search text both for individual words and for word combinations. Desirable options include a relatively high tolerance to typing or conversion (OCR) errors, search by synonyms, and search for similar passages in documents.
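Tolerance to typing and OCR errors is commonly implemented with an edit distance. The sketch below assumes the Levenshtein distance (minimum number of character insertions, deletions and substitutions) as the error measure; the function name `fuzzy_find` and the sample sentence with the OCR error "vldeo" are hypothetical, not part of any particular database product.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances between "" and every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, text: str, max_errors: int = 1) -> list:
    """Hypothetical helper: words of `text` within `max_errors` edits of `query`."""
    return [w for w in text.lower().split()
            if levenshtein(query.lower(), w) <= max_errors]


document = "Multimedia databases store images, audio and vldeo data"
print(fuzzy_find("video", document))  # ['vldeo']
```

Raising `max_errors` makes the search more tolerant but also returns more unrelated words; real systems usually combine such a measure with an index rather than scanning every word.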
1.1.2 Raster images
Raster images are most often obtained from a digital camera or a scanner, but they can also be generated from other data types (text, diagrams). There are many formats and compression methods (GIF, TIFF, JPEG, etc.).
A raster image is conceptually a matrix of picture elements (pixels). In the simplest case each pixel is a single bit, and images can be combined pixel by pixel with Boolean operators (AND, OR).
Color and grayscale images need more than one bit per pixel. The number of bits per pixel (e.g. 24) is known as the pixel depth or color depth. The pixel depth is constant within a raster image, but different images in an image database may have different color depths [1].
For raster images with a lower pixel depth (e.g. 8 bits), colors are represented using color tables. A color table maps each pixel value to a color, most often encoded in the RGB (red, green, blue) color model. Alternative models are based on other primary colors (CMY) or on a separation into brightness and color components (YIQ).
At higher pixel depths, color values are typically stored directly in the pixels (without a color table), since the table would become too large [4].
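The color-table mechanism can be sketched as a simple lookup: each pixel stores only a small index, and the table supplies the actual RGB triple. The four-entry `PALETTE` below is a hypothetical example; an 8-bit table would have up to 256 entries.

```python
# Hypothetical 4-entry color table; an 8-bit palette can hold up to 256 entries.
PALETTE = [
    (0, 0, 0),        # index 0: black
    (255, 0, 0),      # index 1: red
    (0, 255, 0),      # index 2: green
    (255, 255, 255),  # index 3: white
]


def decode_indexed(pixels, palette):
    """Replace every stored index by the RGB triple it refers to in the table."""
    return [[palette[i] for i in row] for row in pixels]


indexed_image = [[0, 1], [2, 3]]  # 2x2 image, one small index per pixel
rgb = decode_indexed(indexed_image, PALETTE)
print(rgb[1][0])  # (0, 255, 0)

# Why tables stop paying off at high pixel depths: the number of entries explodes.
for depth in (8, 24):
    print(depth, "bits ->", 2 ** depth, "possible table entries")
```

With 24 bits per pixel the table would need 16,777,216 entries, which is why such images store the color values directly.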
Storage requirements for a raster image vary with the application and the compression used, from a few KB for compressed black-and-white images up to 100 MB for satellite images.
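The uncompressed size follows directly from the image dimensions and the pixel depth. The dimensions below (1024 x 768, and a 6000 x 6000 "satellite scene") are hypothetical values chosen only to show the order of magnitude; KB and MB are taken as powers of 1000.

```python
def raw_size_bytes(width: int, height: int, pixel_depth_bits: int) -> int:
    """Uncompressed size of a raster image with a constant pixel depth."""
    total_bits = width * height * pixel_depth_bits
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical image dimensions, for illustration only
for depth in (1, 8, 24):
    print(depth, "bit:", raw_size_bytes(1024, 768, depth), "bytes")
# 1 bit: 98304 bytes, 8 bit: 786432 bytes, 24 bit: 2359296 bytes

# A hypothetical 6000 x 6000 true-color satellite scene:
print(raw_size_bytes(6000, 6000, 24) / 1_000_000, "MB")  # 108.0 MB
```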
Basic operations that a multimedia database should support include selecting parts of an image and scaling (changing the resolution). Search methods can be classified into search by textual descriptions and automatic search for images with similar content (similar colors, shapes).
1.1.3 Graphics
Unlike raster images, graphics contain a more abstract description of the visual content. A graphic is a hierarchy of basic shapes such as lines, circles and polygons. The main advantages over raster images are easier manipulation by the user and the fact that basic shapes can be edited individually [6].
In raster images, selecting individual sub-objects is difficult if not impossible. Depending on their complexity, vector graphics typically use memory more efficiently than raster images. Because graphics formats are poorly standardized, there have been few approaches to integrating manipulation and search operators for them into database systems.
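The hierarchy of basic shapes can be sketched as a small tree of Python dataclasses. The classes `Circle`, `Line` and `Group` and the bicycle drawing are hypothetical illustrations, not a standard graphics format; the point is that one shape can be located and edited on its own, which a pixel matrix does not allow.

```python
from dataclasses import dataclass, field


@dataclass
class Circle:
    cx: float
    cy: float
    r: float


@dataclass
class Line:
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass
class Group:
    """A node in the shape hierarchy; children are shapes or nested groups."""
    name: str
    children: list = field(default_factory=list)

    def find(self, name):
        """Depth-first search for a (sub-)group by name; None if absent."""
        if self.name == name:
            return self
        for child in self.children:
            if isinstance(child, Group):
                hit = child.find(name)
                if hit is not None:
                    return hit
        return None


# Hypothetical drawing: a bicycle made of two wheels and a frame
drawing = Group("bicycle", [
    Group("front_wheel", [Circle(80, 20, 15)]),
    Group("rear_wheel", [Circle(20, 20, 15)]),
    Group("frame", [Line(20, 20, 50, 40), Line(50, 40, 80, 20)]),
])

# Edit one basic shape individually; the rest of the drawing is untouched.
drawing.find("front_wheel").children[0].r = 18
```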
1.1.4 Audio files (music, sound, speech)
Thanks to the wide availability of sound cards and CD-ROMs, the audio medium has become very popular. Spoken language is by far the most important means of communication between people.
Music, noise and spoken language differ mainly in their quality requirements. The simplest representation of a sound sequence is Pulse Code Modulation (PCM), in which the signal is sampled at fixed intervals (determined by the sampling rate) and each sample value is quantized with a fixed number of bits [5].
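The two PCM steps, sampling at fixed intervals and quantizing each sample to a fixed number of bits, can be sketched as follows. The 1 kHz test tone, the 8 kHz sampling rate and the signed 8-bit quantization are assumptions chosen for a short, checkable output.

```python
import math


def pcm_encode(signal, duration_s, sampling_rate_hz, bits):
    """Sample `signal` (a function of time with values in [-1, 1]) at fixed
    intervals and quantize each sample to a signed integer with `bits` bits."""
    max_level = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits
    n_samples = round(duration_s * sampling_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sampling_rate_hz                    # sampling instant
        value = max(-1.0, min(1.0, signal(t)))      # clip to the valid range
        samples.append(round(value * max_level))    # quantization
    return samples


def tone(t):
    """A 1 kHz sine test tone."""
    return math.sin(2 * math.pi * 1000 * t)


codes = pcm_encode(tone, duration_s=0.001, sampling_rate_hz=8000, bits=8)
print(codes)  # [0, 90, 127, 90, 0, -90, -127, -90]
```

At 8000 samples per second and 8 bits per sample, this stream amounts to 64 kbit/s before any compression.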
The sampling rate determines the quality of the audio clips. According to the sampling theorem, the sampling rate must be at least twice the highest frequency present in the audio signal.
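What happens when the sampling theorem is violated can be shown for a single pure tone: frequencies above half the sampling rate fold back (alias) to a lower apparent frequency. The helper names are hypothetical; the folding rule used is the standard one for a pure sinusoid.

```python
def min_sampling_rate(max_frequency_hz):
    """Sampling theorem: sample at least twice the highest frequency present."""
    return 2 * max_frequency_hz


def apparent_frequency(f_hz, sampling_rate_hz):
    """Frequency a pure tone appears to have after sampling (aliasing):
    the distance from f to the nearest multiple of the sampling rate."""
    k = round(f_hz / sampling_rate_hz)
    return abs(f_hz - k * sampling_rate_hz)


print(min_sampling_rate(20_000))        # 40000 (audio CDs use 44100 Hz)
print(apparent_frequency(3000, 8000))   # 3000: below 4 kHz, reproduced correctly
print(apparent_frequency(5000, 8000))   # 3000: above 4 kHz, aliased
```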
There are many audio compression methods. For example, telephone-quality audio requires a minimum of about 20 KB per minute. By analogy with searching text and images, audio clips can be searched using pattern recognition methods [2].
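To put the 20 KB per minute figure into perspective, the sketch below computes uncompressed PCM storage per minute. It assumes telephone audio as 8 kHz, 8-bit mono PCM and CD audio as 44.1 kHz, 16-bit stereo PCM, and takes KB as 1000 bytes.

```python
def pcm_bytes_per_minute(sampling_rate_hz, bits_per_sample, channels=1):
    """Uncompressed PCM storage: samples/s * bytes/sample * channels * 60 s."""
    return sampling_rate_hz * bits_per_sample // 8 * channels * 60


telephone = pcm_bytes_per_minute(8000, 8)      # 480000 bytes, about 480 KB
cd_audio = pcm_bytes_per_minute(44100, 16, 2)  # 10584000 bytes, about 10.6 MB
compressed_phone = 20_000                      # about 20 KB per minute (from the text)
print(telephone, cd_audio, telephone / compressed_phone)  # 480000 10584000 24.0
```

Under these assumptions, reaching 20 KB per minute for telephone speech implies a compression ratio of about 24:1.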
Music in an abstract form (e.g. as a sequence of notes or MIDI data), which is interpreted by a synthesizer, occupies a special position among audio data.
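In the abstract form, a melody is stored as note numbers and durations, and the synthesizer computes the pitches itself. The sketch assumes the usual MIDI convention (note 69 = A4 = 440 Hz, equal temperament); the three-note `melody` is a hypothetical example.

```python
def midi_note_to_hz(note: int) -> float:
    """Equal-tempered pitch of a MIDI note number (69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


# Hypothetical melody stored abstractly as (note number, duration in beats)
melody = [(60, 1), (64, 1), (67, 2)]  # C4, E4, G4
for note, beats in melody:
    print(note, round(midi_note_to_hz(note), 2), beats)
# 60 261.63 1
# 64 329.63 1
# 67 392.0 2
```

Three notes take a handful of bytes in this form, whereas even one second of sampled audio takes thousands.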
1.1.5 Video Files
A video is a sequence of raster images (frames) in close temporal relationship. To create a continuous motion picture, about 25 frames per second have to be stored. The frames usually have to be synchronized with other media types (audio, text). Video files need much more storage space: at 25 frames per second and 250 KB per frame, one minute of video takes 375 MB. Compression is therefore inevitable.
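The 375 MB figure follows from frames per second times frame size times 60 seconds. The sketch reproduces it (taking 1 MB = 1000 KB) and extends it to a hypothetical 90-minute film.

```python
def video_size_per_minute_kb(frames_per_second: int, frame_size_kb: int) -> int:
    """Uncompressed video storage for one minute: fps * frame size * 60 s."""
    return frames_per_second * frame_size_kb * 60


kb = video_size_per_minute_kb(25, 250)
print(kb, "KB =", kb / 1000, "MB per minute")       # 375000 KB = 375.0 MB per minute
print(kb * 90 / 1_000_000, "GB for a 90-minute film")  # 33.75 GB
```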
An MMDBS should support the usual video cutting operations, the selection of still images, and conversion to a given format, quality or compression level.
1.1.6 Graphs
Many structures of human thought can be described by graphs, such as electrical circuit diagrams or organizational structures. A multimedia database system should therefore support the graph medium. Search mechanisms should take the graph structure (nodes and edges) into account and support both global similarity (using the entire graph to identify a similar one) and partial similarity (finding similar subgraphs).
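One way to measure global graph similarity is the maximal-common-subgraph distance of Bunke and Shearer [4], d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|). The sketch below assumes small, unlabeled, undirected graphs and induced subgraphs, and finds the common subgraph by brute force, which is exponential and only suitable for illustration. The organizational charts are hypothetical.

```python
from itertools import combinations, permutations


def has_edge(edges, u, v):
    return (u, v) in edges or (v, u) in edges


def mcs_size(nodes1, edges1, nodes2, edges2):
    """Node count of a maximum common induced subgraph (brute force)."""
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):
        for subset in combinations(nodes1, k):
            for image in permutations(nodes2, k):
                mapping = dict(zip(subset, image))
                # Induced: an edge exists in G1 exactly when it exists in G2.
                if all(has_edge(edges1, a, b) == has_edge(edges2, mapping[a], mapping[b])
                       for a, b in combinations(subset, 2)):
                    return k
    return 0


def graph_distance(nodes1, edges1, nodes2, edges2):
    """d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), for non-empty graphs."""
    common = mcs_size(nodes1, edges1, nodes2, edges2)
    return 1 - common / max(len(nodes1), len(nodes2))


# Hypothetical organizational charts: a boss with two vs. three direct reports
chart_a = (["boss", "x", "y"], {("boss", "x"), ("boss", "y")})
chart_b = (["head", "p", "q", "r"], {("head", "p"), ("head", "q"), ("head", "r")})
print(graph_distance(*chart_a, *chart_b))  # 0.25
```

A distance of 0 means the graphs are isomorphic; partial similarity could be served by the same `mcs_size` routine, reporting the common subgraph itself instead of a normalized score.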
The first and most important question in similarity search is which notion of similarity the search is based on. For the different media types, many similarity measures have been developed for different aspects of the objects.
For color images, many similarity models are based on color histograms: two images are considered similar if each color occurs with a relatively similar frequency in both images. Other similarity models for color images compare the images pixel by pixel [8].
A commonly used approach is to express the similarity of two objects as a distance between them.
In general, the distance function is a metric: the distance is 0 if and only if the objects are equal, so the distance between two different objects is always positive; moreover, the distance is symmetric and satisfies the triangle inequality.
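Whether a given distance function behaves like a metric can be spot-checked on sample objects. The sketch below (hypothetical helper `is_metric_on`, sample points chosen for illustration) tests identity, non-negativity, symmetry and the triangle inequality. The squared Euclidean distance, a plausible-looking choice, fails the triangle inequality, which matters because metric index structures rely on that property to prune the search.

```python
import math
from itertools import product


def euclidean(p, q):
    return math.dist(p, q)


def squared_euclidean(p, q):
    return math.dist(p, q) ** 2


def is_metric_on(d, objects, tol=1e-12):
    """Spot-check the metric axioms of distance d on the given sample objects."""
    for x, y in product(objects, repeat=2):
        dxy = d(x, y)
        if dxy < 0 or (dxy == 0) != (x == y) or not math.isclose(dxy, d(y, x)):
            return False
    for x, y, z in product(objects, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


samples = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]
on_a_line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(is_metric_on(euclidean, samples))            # True
print(is_metric_on(squared_euclidean, on_a_line))  # False: 4 > 1 + 1
```

A successful check on samples does not prove that a function is a metric; it only rules out obvious violations.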
2.1. Similarity Models: General Concepts
Many similarity models use general concepts that make the subjective notion of similarity more objective: percentage similarity (common properties lead to a similarity of up to 100%) and similarity based on distance. In the latter case, the value of a distance function describes the similarity of the objects: the greater the distance, the more different the objects are. An object has distance 0 to itself.
2.1.1 Interval (range) queries
The query parameters are the query object q and a maximum similarity distance ε. The result set is $\mathrm{sim}_\varepsilon(q) = \{a \in DB \mid d(a, q) \le \varepsilon\}$. The number of results is not known in advance; we only know that it lies between 0 and $|DB|$, while the distances of the results lie between 0 and ε [9].
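A range query can be sketched as a linear scan that keeps every object within distance ε of q. The 2-D feature vectors are hypothetical, Euclidean distance is assumed via `math.dist` (Python 3.8+), and a real system would use an index instead of scanning the whole database.

```python
import math


def range_query(db, q, eps, d):
    """sim_eps(q): all objects a in db with d(a, q) <= eps (linear scan)."""
    return [a for a in db if d(a, q) <= eps]


# Hypothetical 2-D feature vectors
db = [(0, 0), (1, 0), (0, 2), (3, 3), (5, 1)]
q = (0, 0)
print(range_query(db, q, 0.5, math.dist))  # [(0, 0)]
print(range_query(db, q, 1.0, math.dist))  # [(0, 0), (1, 0)]
print(range_query(db, q, 2.5, math.dist))  # [(0, 0), (1, 0), (0, 2)]
```

The three calls show how strongly the result size depends on the choice of ε.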
The disadvantage of range queries is that we do not know in advance which value ε should take: if ε is too small, we get too few results; if it is too large, we get too many.
2.1.2 Nearest neighbor queries
The only parameter is the query object q, and the result set is $NN(q) = \{a \in DB \mid \forall a' \in DB: d(a, q) \le d(a', q)\}$. The result contains at least one object; if several objects share the minimal distance, several definitions are possible for returning "just one". The distance of the results is not known in advance; we only know that it equals $\varepsilon = \min\{d(a, q) \mid a \in DB\}$ [3].
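The nearest-neighbor definition can be sketched directly: compute the minimal distance, then keep every object that reaches it, so ties are preserved. The sample points are hypothetical; returning only the first element would be one possible "just one" convention.

```python
import math


def nearest_neighbors(db, q, d):
    """NN(q): every object whose distance to q is minimal (ties are kept)."""
    eps = min(d(a, q) for a in db)  # eps = min{ d(a, q) | a in DB }
    return [a for a in db if d(a, q) == eps]


db = [(2, 0), (0, 2), (3, 3), (5, 1)]
print(nearest_neighbors(db, (0, 0), math.dist))  # [(2, 0), (0, 2)]
```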
2.1.3 k-nearest neighbor queries
The parameters are the query object q and the desired number of results k. The result set is the smallest set $NNQ(k) \subseteq DB$ with $|NNQ(k)| \ge k$ such that $\forall a \in NNQ(k), \forall a' \in DB \setminus NNQ(k): d(a, q) < d(a', q)$. The result contains at least k objects; the distance range of the results is not known in advance, but it satisfies $\varepsilon = \max\{d(a, q) \mid a \in NNQ(k)\}$ [3].
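Because the definition uses a strict inequality between members and non-members, all objects tied with the k-th distance must be included, which is why the result may contain more than k objects. A sketch with hypothetical sample points:

```python
import math


def knn_query(db, q, k, d):
    """NNQ(k): the k nearest objects to q plus any objects tied with the
    k-th distance, so the result may contain more than k objects."""
    if k <= 0:
        return []
    ranked = sorted(db, key=lambda a: d(a, q))
    if k >= len(ranked):
        return ranked
    eps = d(ranked[k - 1], q)  # eps = max{ d(a, q) | a in NNQ(k) }
    return [a for a in ranked if d(a, q) <= eps]


db = [(1, 0), (0, 2), (2, 0), (3, 3), (5, 1)]
print(knn_query(db, (0, 0), 2, math.dist))  # [(1, 0), (0, 2), (2, 0)]
```

Here k = 2, but (0, 2) and (2, 0) are both at distance 2, so three objects are returned.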
2.1.4 Incremental ranking ('Give me more' queries)
This method is motivated by the fact that we often do not know an appropriate value for ε at the beginning of a search (e.g. in internet search engines). The results are returned sorted by their distance to the query object.
The process is the following:
Specify a query object q at the beginning.
Call the getnext(k_i) function repeatedly; each call returns the following results in order of distance, until the desired result is found [6].
The number of retrieved results thus grows gradually through the sequence $k_1, k_2, \dots, k_n$, until $NNQ(k_n)$ has been determined.
The database objects are thus returned (at least partially) in ascending order of their distance to the query object: for the returned objects $O_1, \dots, O_N$, $\forall i, j \in \{1, \dots, N\}: i < j \Rightarrow d(O_i, q) \le d(O_j, q)$ [5].
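Incremental ranking can be sketched with a Python generator over a heap: results come out in ascending distance, and each `getnext` call consumes only as many as requested. `ranking` and `getnext` are hypothetical names; a real MMDBS would traverse an index structure instead of building a heap over the whole database.

```python
import heapq
import math
from itertools import islice


def ranking(db, q, d):
    """Yield the objects of db one at a time in ascending distance to q."""
    heap = [(d(a, q), i, a) for i, a in enumerate(db)]  # i breaks distance ties
    heapq.heapify(heap)
    while heap:
        _, _, a = heapq.heappop(heap)
        yield a


def getnext(stream, k):
    """Hypothetical getnext(k): the next k results of an open ranking."""
    return list(islice(stream, k))


db = [(3, 3), (1, 0), (5, 1), (0, 2)]
stream = ranking(db, (0, 0), math.dist)
print(getnext(stream, 2))  # [(1, 0), (0, 2)]
print(getnext(stream, 2))  # [(3, 3), (5, 1)]
print(getnext(stream, 2))  # []
```

The caller never has to choose ε or k up front; it simply keeps asking for more until it is satisfied.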
2.2 Similarity models for images
Search in image databases can, in principle, be done as in relational databases, by searching standard attributes such as the primary key (e.g. the file name). Image search also uses secondary attributes (information such as date, place of origin, author, etc.).
Keyword search fails when the property being searched for is not available as a search descriptor (e.g. finding all images with a high proportion of green in the lower part of the image).
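The example query in the previous paragraph can be answered from the pixels themselves. The sketch assumes images given as lists of rows of (r, g, b) tuples and treats a pixel as "green" when its green channel exceeds both red and blue by 20%; that threshold, the helper name and the tiny sample images are all hypothetical.

```python
def green_fraction_bottom(image, threshold=1.2):
    """Share of pixels in the lower half of an RGB image (list of rows of
    (r, g, b) tuples) whose green channel clearly dominates red and blue."""
    rows = image[len(image) // 2:]  # bottom half
    pixels = [p for row in rows for p in row]
    greenish = [p for p in pixels
                if p[1] > threshold * p[0] and p[1] > threshold * p[2]]
    return len(greenish) / len(pixels)


sky, grass = (120, 170, 250), (40, 160, 50)
meadow = [[sky, sky], [sky, sky], [grass, grass], [grass, sky]]
city = [[sky, sky], [sky, sky], [sky, sky], [grass, sky]]
images = {"meadow.jpg": meadow, "city.jpg": city}

# "Find all images with a high proportion of green in the lower part"
print([name for name, img in images.items() if green_fraction_bottom(img) >= 0.5])
# ['meadow.jpg']
```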
Searching by image content relies on features derived from the internal representation of the image (its pixels). Manual indexing, by contrast, has disadvantages such as high costs and other practical problems. This type of search uses color, texture or shape features.
The image features by which we can search are colors (described by color histograms), textures (the material and nature of image segments), the shapes of image segments (contours), individual pixels, and morphology.
The most popular systems for content-based search are:
QBIC: Query By Image (and video) Content, developed by IBM Almaden Research Center
Image Miner, developed by the Technology Center for Computer Science, University of Bremen
VisualSEEk, developed at the Telecom Research Center, Columbia University, NY
MARS: Multimedia Analysis and Retrieval System, developed by the University of Illinois at Urbana-Champaign
Surfimage, developed by INRIA Rocquencourt, France, etc.
A color histogram is generated by fixing a color space (e.g. RGB, HSV, HLS, Munsell) and selecting a set of representative elements in that color space (reference points) [1].
The color histogram is then calculated as follows: for each pixel, the count of the nearest representative element is incremented by one. Normalizing the histogram (dividing each count by the number of pixels) makes histograms of images of any size comparable.
Texture describes the material and nature of image segments.
In QBIC, the texture model depends on the coarseness, directionality and orientation of the image elements (e.g. wall joints, gravel, gradient distributions in images) and on the contrast of the pattern (e.g. the contrast between a white wall and sand).
To obtain the best results, QBIC computes the variance of the gray-value histogram, the granularity or scale of the texture (e.g. sand vs. paving stones, smooth walls vs. ribbed walls), the orientation and size of image components, and evaluates various distance functions on the histogram-based color and texture representations of the pictures.
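QBIC's texture features are reported to be modified versions of Tamura's coarseness, contrast and directionality. The sketch shows only a contrast measure in the commonly cited Tamura form, σ / α₄^(1/4), where α₄ is the kurtosis of the gray values; it is an assumption for illustration, not QBIC's exact implementation, and the gray-value samples are hypothetical.

```python
import statistics


def tamura_contrast(gray_values):
    """Contrast in the style of Tamura: standard deviation / kurtosis**(1/4)."""
    mean = statistics.fmean(gray_values)
    var = statistics.pvariance(gray_values, mu=mean)  # variance of gray values
    if var == 0:
        return 0.0  # a perfectly uniform patch has no contrast
    mu4 = statistics.fmean((g - mean) ** 4 for g in gray_values)
    kurtosis = mu4 / var ** 2
    return var ** 0.5 / kurtosis ** 0.25


white_wall = [250, 252, 251, 249]  # nearly uniform gray values
checkerboard = [0, 255, 0, 255]    # strongly alternating gray values
print(round(tamura_contrast(white_wall), 3))  # low contrast (about 0.99)
print(tamura_contrast(checkerboard))          # 127.5
```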
A histogram describes the distribution of the reference values in an image.
Histograms are based on a discretization of the color space by a set of prototypes $\{c_i \mid i = 1, \dots, n\}$, so the resulting histogram has n dimensions. For each pixel, the nearest prototype is determined (a Voronoi partition of the color space). The value $h_I(i)$ is the number of pixels of image I assigned to prototype $c_i$: $h_I(i) = |\{x \mid \lVert I(x) - c_i \rVert \le \lVert I(x) - c_j \rVert \text{ for all } j = 1, \dots, n\}|$.
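The histogram $h_I(i)$ can be sketched with a nearest-prototype assignment in RGB space, followed by normalization and an L1 distance for comparing histograms. The five prototypes and the sample pixels are hypothetical; ties between prototypes are assigned to the lowest index here, whereas the formula above would count such a pixel for every tied prototype.

```python
def color_histogram(pixels, prototypes, normalize=True):
    """h_I(i): number (or share) of pixels whose nearest prototype is c_i."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    h = [0] * len(prototypes)
    for p in pixels:
        # Voronoi assignment: index of the nearest prototype (lowest index on ties)
        nearest = min(range(len(prototypes)), key=lambda i: sq_dist(p, prototypes[i]))
        h[nearest] += 1
    if normalize:  # makes images of different sizes comparable
        h = [count / len(pixels) for count in h]
    return h


def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))


prototypes = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
small = [(250, 10, 5), (240, 20, 20), (10, 10, 10), (245, 250, 250)]
large = small * 25  # same colors, 25 times as many pixels
print(color_histogram(small, prototypes))  # [0.25, 0.5, 0.0, 0.0, 0.25]
print(l1_distance(color_histogram(small, prototypes),
                  color_histogram(large, prototypes)))  # 0.0
```

Without normalization, the two images would have very different counts even though their color distributions are identical.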
Shape similarity search is used for sample images, archives, graphics, open searches, medical imaging, etc. A problem with the Euclidean distance is that, when comparing images whose elements are displaced by different amounts, the displacements differ, but once the elements no longer overlap the Euclidean distance always has the same value [7].
The cause of this problem is that the distance reacts either too strongly to changes or not at all. In these cases, translation invariance is a powerful solution, but robustness to local changes is also desired. One solution is to check neighboring pixels, rather than considering only the direct overlap of the pixels in the difference image. The size of the neighborhood may be the same for all pixels or may vary across the image.
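The problem and the neighborhood idea can be illustrated with one-dimensional "images" containing a single bar. The tolerant distance below is a hypothetical variant written for this illustration, not the method of [6] or [7]: a pixel counts as matching if a close value exists within `radius` positions in the other image, checked in both directions. It is not guaranteed to be a metric.

```python
import math


def euclidean(a, b):
    """Pixel-by-pixel distance: only the direct overlap of pixels counts."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood_distance(a, b, radius=1):
    """Hypothetical tolerant distance: each pixel is compared with the best
    match within `radius` positions of the other image (both directions)."""
    def one_way(u, v):
        total = 0
        for x, value in enumerate(u):
            window = v[max(0, x - radius): x + radius + 1]
            total += min((value - w) ** 2 for w in window)
        return total
    return math.sqrt(one_way(a, b) + one_way(b, a))


def bar(position, width=2, length=12):
    """A 1-D 'image' containing a single bar of ones."""
    return [1 if position <= i < position + width else 0 for i in range(length)]


original = bar(2)
for shift in (1, 2, 4):
    moved = bar(2 + shift)
    print(shift, round(euclidean(original, moved), 3),
          round(neighborhood_distance(original, moved), 3))
# 1 1.414 0.0
# 2 2.0 1.414
# 4 2.0 2.0
```

The Euclidean distance is the same for shifts of 2 and 4, while the tolerant variant still distinguishes them and ignores the one-pixel shift entirely.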
2.3. Videos
Videos are becoming increasingly important in the exchange of information, in various forms: illustrative clips (simulations, animations), presentations, discussions, video conferences, monitoring systems, and internet video (YouTube, video on demand).
Video files must therefore be stored efficiently and be accessible and retrievable.
Current databases store videos using various methods, such as BLOBs or smart BLOBs, retrieval through metadata, or decomposition into key frames.
Video is a medium structured in time and space; it cannot be viewed as a set of individual frames, but must be treated as a whole document. Video abstraction divides videos into structured parts (a video index).
Video retrieval techniques cover all the issues of multimedia databases:
Image retrieval, for the description of single images, key frames, etc.
Audio retrieval, for the analysis of the soundtrack, spoken language recognition, etc.
Text retrieval, for searching subtitles, summaries, etc.
In video analysis, retrieval techniques are combined: for example, to identify persons through segmentation and subtitle recognition, to recognize actors by their voices, to classify objects using shape and speech information, or to find interesting scenes based on audience applause.
Video retrieval techniques are used to flag news items of personal interest (all clips on subjects of interest), to recognize the film genre automatically (action, comedy), to detect commercials on TV channels, and to record interesting TV programs automatically.
Video data is modeled mainly to capture the temporal and spatial structure of the video content. Modeling is important for time-dependent queries (e.g. find clips in which an object falls). This process is divided into two parts: modeling and representation of video, and video segmentation and summarization [6].
The main components of scene detection are a visual representation of the video frames (color, texture), the computation of differences between neighboring frames, and a threshold that decides "how much is enough?". Key frames differ from the other frames; they can be described by information extracted from color histograms, top histograms or motion histograms [4].
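The scene-detection components can be sketched as follows: a gray-value histogram per frame, the L1 difference between neighboring frames, and a threshold deciding "how much is enough". The 4-bin histogram, the threshold of 0.5 and the tiny 4-pixel frames are hypothetical choices; real detectors tune these values and typically use color or motion features as well.

```python
def gray_histogram(frame, bins=4, max_value=256):
    """Normalized gray-value histogram of a frame (a flat list of gray values)."""
    h = [0] * bins
    for g in frame:
        h[g * bins // max_value] += 1
    return [c / len(frame) for c in h]


def detect_cuts(frames, threshold=0.5):
    """Indices i where frame i starts a new scene: the L1 distance between the
    histograms of neighboring frames i-1 and i exceeds the threshold."""
    hists = [gray_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts


dark = [10, 20, 30, 40]          # hypothetical 4-pixel frames
dark_moved = [20, 10, 40, 30]    # same content, pixels rearranged (motion)
bright = [200, 210, 220, 250]
video = [dark, dark_moved, dark, bright, bright]
print(detect_cuts(video))  # [3]
```

Because histograms ignore pixel positions, the motion between the first frames does not trigger a cut, while the change in overall brightness at frame 3 does.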
The largest companies interested in processing video databases are, in order of market share in this segment:
1. IBM Research Center
2. Intel
3. Microsoft Redmond & Beijing
4. Kodak at Rochester
5. Xerox at Palo Alto
6. Google
7. Yahoo!
Despite all the drawbacks and all the unsuccessful attempts to create an environment that supports the processing of multimedia data, multimedia databases have continued to be considered a new direction in information technology, information systems and database processing.
Multimedia databases are developing slowly and each step forward is a small one, but specialists are confident that each of these steps advances the science of processing methods and environments.
The fact that companies like Oracle invest in creating an environment (interMedia) for working with image databases gives us confidence that all this work and research will lead to satisfying results.
[1] Baeza-Yates R., Cunto W., Manber U., Wu S. - Proximity matching using fixed-queries trees, Macmillan, London, 2004
[2] Bozkaya T., Ozsoyoglu M. - Distance-based indexing for high-dimensional metric spaces, Duden, Berlin, 1997
[3] Brin S. - Near neighbor search in large metric spaces, Pearson, London, 2005
[4] Bunke H., Shearer K. - A Graph Distance Metric Based on the Maximal Common Subgraph, Pattern Recognition Letters, Macmillan, London, 2008
[5] Burkhard W., Keller R. - Some approaches to best-match file searching, Bloomsbury, Berlin, 1973
[6] Ankerst M., Kriegel H.-P., Seidl T. - A Multistep Approach for Shape Similarity Search in Image Databases, TKDE, 1998
[7] Chavez E., Marroquin J., Baeza-Yates R. - Spaghettis: An array based algorithm for similarity queries in metric spaces, IEEE CS Press, 1999
[8] Faloutsos C., Barber R., Flickner M., Hafner J., Niblack W., Petkovic D., Equitz W. - Efficient and Effective Querying by Image Content, Journal of Intelligent Information Systems 3, Bern, 1994
[9] Flickner M., Sawhney H., Niblack W., Ashley J., Huang Q., Dom B., Gorkani M., Hafner J., Lee D., Petkovic D., Steele D., Yanker P. - Query by Image and Video Content: The QBIC System, IEEE Computer, Berlin, 1995