Solving Multimedia DBMS Storage Problems

Tertiary Storage Systems: Solving Multimedia DBMS Storage Problems
Greg Magsamen – September 27, 2007
Introduction
For enterprise-sized industries and services that deal in multimedia data, special problems arise in how to
store their content and keep it readily accessible. On-demand video, image and audio
services must be able to provide content without bankrupting themselves on storage. A
single user can afford to keep multimedia on a personal PC, but enterprises must balance
storage cost, availability and service quality for their product. This paper takes the position that, to
be a viable multimedia vendor, an enterprise must back its multimedia database with a hierarchical
storage system in order to serve multimedia content.
This paper does not address the peer-to-peer (P2P) model of media sharing, which is a different
architecture altogether. It addresses large multimedia services and organizations that hold large amounts
of multimedia content in a multimedia database. Examples include services such as Google
Earth, YouTube and Facebook, and heavy users of multimedia such as the news services CNN and
Reuters. Even an audio “jukebox” (logical on the web or physical on the wall) needs cost-effective storage
for its content. It would be impractical for every song to be stored locally on the device playing the content.
One school of thought is to use an existing database feature called Binary Large Objects (BLOBs) for
direct storage of multimedia content. However, this feature quickly becomes ineffective, because a database
that stores binary representations of multimedia objects needs to reside on single or mirrored disk
systems. This becomes overly expensive, especially when the database contains rarely accessed
content that takes up as much disk space as frequently used multimedia.
Multimedia Data: Its Nature
Multimedia data, consisting of alphanumeric, graphics, image, animation, video, and audio objects, is
quite different from standard alphanumeric data in terms of both presentation and semantics. From a
presentation viewpoint, multimedia data is huge and involves time dependent characteristics that must be
adhered to for coherent viewing [8].
At the heart of multimedia information systems lies the multimedia database management system.
Traditionally, a database consists of a controlled collection of data related to a given entity, while a
database management system, or DBMS, is a collection of interrelated data with the set of programs
used to define, create, store, access, manage, and query the database. Similarly, we can view a
multimedia database as a controlled collection of multimedia data items, such as text, images, graphic
objects, sketches, video, and audio.
A multimedia DBMS provides support for multimedia data types, plus facilities for the creation, storage,
access, query, and control of the multimedia database. The different data types involved in multimedia
databases require special methods for optimal storage, access, indexing, and retrieval [1]. Some
multimedia data types such as video, audio and animation sequences also have temporal requirements,
which have implications on their storage, manipulation, and presentation. The problems become more
acute when various data types from possibly disparate sources must be presented within or at a given
time. Similarly, images, graphics, and video data have spatial constraints in terms of their content [1].
Multimedia DBMS: Its Purpose
A multimedia database management system provides a suitable environment for using and managing
multimedia database information. Therefore, it must support the various multimedia data types, in addition
to providing facilities for traditional DBMS functions like database definition and creation, data retrieval,
data access and organization, data independence, privacy, integration, integrity control, version control,
and concurrency support. The functions of a multimedia DBMS basically resemble those of a traditional
DBMS. However, the nature of multimedia information makes new demands, in particular the need to store
huge amounts of multimedia data in an enterprise-sized system [1].
Multimedia DBMS: Requirements
For the multimedia DBMS, and its related storage system, to serve its expected purpose, it must meet
certain special requirements. Some of the requirements specific to a multimedia DBMS are [1]:
- Traditional DBMS capabilities
- Huge capacity storage management
- Information retrieval capabilities
- Media integration, composition, and presentation
- Multimedia query support
- Multimedia interface and interactivity
- Performance
In this paper the requirements of 1) Huge capacity storage management and 2) Information retrieval
capabilities are addressed. Multimedia storage systems store and retrieve data from storage devices and
manage related issues including data placement, scheduling, file management, continuous data delivery,
memory buffering, and pre-fetching. For high-data-rate multimedia systems, storage systems have long
been viewed as a primary bottleneck for two reasons. First, multimedia applications have a much higher
storage system load than previous applications. Second, storage devices have become only marginally
faster compared to increased processor and network performance [9]. This increasing speed mismatch
has fueled a search for new storage structures and file system storage and retrieval mechanisms.
Administrators who design and implement multimedia storage systems must consider several issues,
including [9]:
- What kind of storage device to use
- How to order the requests
- Where to put data
- How to manage memory
- How to deal with overload situations
Storage Management: Huge Capacity
The storage requirements in multimedia systems can be characterized by their huge capacities and the
storage system’s hierarchical organization. Hierarchical storage places the multimedia data objects in a
hierarchy of devices, either online, near-line, or offline [1]. In general, the highest level provides the
highest performance, highest cost, smallest storage capacity, and least permanence. The permanence
improves, however at significant additional cost, with the use of nonvolatile random access memory [1].
Cost and performance (in terms of access time) decrease as we go down the hierarchy, while storage
capacity and permanence increase. Typically, in most multimedia storage systems the highest level of
storage is (volatile) random access memory (RAM), followed by magnetic disk drives. These provide
online services. Optical storage devices provide the next level of storage. They are online in some cases
but near-line (like online jukeboxes) in most. The lowest level in the storage hierarchy consists of
offline storage devices, including magnetic tapes, optical disks, and so forth. These may or may not be
directly connected to the computer. They offer the highest storage capacity and permanence but provide
the least performance in terms of access time [1].
Again, multimedia applications require support for the storage and retrieval of multimedia data, which
typically consists of video, audio, text and images. The timing characteristics and the large volumes of
data make the design of a multimedia storage system a critical task [4].
These huge volumes of data specifically characterize multimedia information. For instance, storing an
uncompressed 10-minute video sequence at 30 frames per second requires about 38
Gbytes, reducible to about 3.8 Gbytes with a compression ratio of 10:1. The potential for huge
volumes of data in multimedia information systems becomes apparent when you consider that a
movie could run as long as two hours (about 45 Gbytes per movie at that ratio), and a typical video
repository would house thousands of movies; even one thousand of them amount to 45,000 Gbytes = 45 Terabytes [1].
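The arithmetic behind these figures can be checked with a short sketch. It takes the 38 Gbyte and 3.8 Gbyte figures from the text as given inputs, and it assumes a repository of exactly 1,000 movies and decimal units (1,000 Gbytes per Terabyte); the function and constant names are illustrative only. Note that the two quoted sizes imply a compression ratio of about 10:1.

```python
# Figures quoted in the text, treated as given inputs rather than measurements.
UNCOMPRESSED_GB_PER_10_MIN = 38.0
COMPRESSED_GB_PER_10_MIN = 3.8
MOVIE_MINUTES = 120            # a two-hour movie
MOVIES_IN_REPOSITORY = 1000    # assumption: "thousands of movies" taken as 1,000
GB_PER_TB = 1000               # assumption: decimal units


def implied_compression_ratio(uncompressed_gb, compressed_gb):
    """Ratio implied by an uncompressed size and a compressed size."""
    return uncompressed_gb / compressed_gb


def movie_size_gb(gb_per_10_min, minutes):
    """Scale a per-10-minute size up to a full running time."""
    return gb_per_10_min * minutes / 10


ratio = implied_compression_ratio(UNCOMPRESSED_GB_PER_10_MIN, COMPRESSED_GB_PER_10_MIN)
movie_gb = movie_size_gb(COMPRESSED_GB_PER_10_MIN, MOVIE_MINUTES)
repository_tb = movie_gb * MOVIES_IN_REPOSITORY / GB_PER_TB

print(f"implied compression ratio: {ratio:.0f}:1")
print(f"two-hour movie: {movie_gb:.1f} Gbytes")
print(f"{MOVIES_IN_REPOSITORY} movies: {repository_tb:.1f} Terabytes")
```

The result is about 45.6 Gbytes per movie, which the text rounds down to 45 Gbytes, and correspondingly about 45 Terabytes for a thousand titles.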
A multimedia service will typically need to employ several secondary (e.g., disk) and tertiary (e.g., tapes,
optical disks) storage devices to permanently store the data. A small amount of RAM is used to stage the
data retrieved from disks and tapes before it is transmitted to clients. Cost, latency and transfer rates of
these devices are as described in the table below [6].
Storage Device     Cost/MB   Latency   Data Rate
Magnetic Disks     20c       25 ms     5-8 MBps
Tapes (low end)    0.5c      3 min     1.5 MBps
Tapes (high end)   0.7c      3 min     10 MBps
Optical Disks      10c                 1.25 MBps
Table: Cost, latency and data rate of storage devices [6]
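To make the cost gap concrete, the following sketch prices the media for a 45,000 Gbyte repository on each device class, using the cost-per-megabyte figures from the table [6]. It is a rough illustration only: it assumes decimal units (1,000 MB per Gbyte), counts media cost alone (no drives, robots or redundancy), and uses 1997-era prices as quoted. The names are hypothetical.

```python
# Cost per megabyte in cents, as quoted in the table [6].
CENTS_PER_MB = {
    "magnetic disk": 20.0,
    "tape (low end)": 0.5,
    "tape (high end)": 0.7,
    "optical disk": 10.0,
}

REPOSITORY_GB = 45_000   # repository size used earlier in the text
MB_PER_GB = 1000         # assumption: decimal units


def media_cost_dollars(size_gb, cents_per_mb):
    """Media cost in dollars for size_gb of data at cents_per_mb."""
    return size_gb * MB_PER_GB * cents_per_mb / 100


costs = {
    device: media_cost_dollars(REPOSITORY_GB, cents)
    for device, cents in CENTS_PER_MB.items()
}

for device, dollars in costs.items():
    print(f"{device:16s} ${dollars:>12,.0f}")
```

Under these assumptions the disk-only repository costs about $9 million in media, against roughly $225,000 to $315,000 on tape, a difference of more than an order of magnitude.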
Tertiary Storage Systems: An Old Solution
Storage of multimedia data is a critical issue for the overall system's performance and functionality.
Multimedia applications (services) must be able to support the variety of data defined as multimedia
content. Because multimedia data consists of various objects such as text, image, video and graphics, these
objects need to be synchronized and to meet certain timing requirements. The development and evolution of
new applications characterized by high storage needs has strengthened the role and
importance of Tertiary Storage Systems (TSS) [2]. Recent technological advances have resulted in wide
availability of commercial products offering near-line, robot-based, tertiary storage libraries. Therefore,
such libraries have become an important component of modern large-scale storage servers. Physical
storage of multimedia objects is a demanding and challenging problem because of multimedia’s two principal
constraints, namely size and timing [2].
The main focus of this paper is to argue for the use of a hierarchical storage system (HSS),
and specifically for the need for tertiary devices. At the top of the computer storage hierarchy is primary
storage (i.e., RAM used for caches and main memory). Below RAM is solid state memory and then magnetic disk
devices, commonly identified as secondary storage. At the bottom of the hierarchy are tertiary storage
devices. Tertiary storage includes magnetic tapes, optical disk devices and some more recent
technologies like optical tapes and holographic storage. From top to bottom of this hierarchy average
access times increase, while the cost per megabyte of storage decreases dramatically. Therefore,
tertiary storage is inexpensive, albeit slow [2].
However, recent advances in tertiary storage technology have made it of increasing interest to computer
system designers. First, increases in bits-per-inch and tracks-per-inch densities have increased tape
capacity. Second, a variety of inexpensive tape drives have become available. Third, the low cost of
magnetic tape media, compared to that of magnetic disks, makes it economical to build massive tertiary
storage systems. Finally, a large number of robotic devices for handling magnetic tapes allow access
to tertiary storage without human intervention, making response times more predictable [2].
Real World: Hierarchical Usage of Tertiary Storage
The large enterprise sized multimedia systems addressed here need to capture, process, store, and
maintain a variety of information sources. These applications always include a multimedia database
management system whose performance relies on the underlying storage system.
Although some information systems store all their data on online magnetic disks, the huge amount of
multimedia data makes this storage architecture neither practical nor economical. A multi-level
hierarchical storage system (HSS) provides sufficient storage capacity at a more economical cost than
disk-only systems, but the long access latency of data held in tertiary storage
devices degrades the performance of the storage system [4]. To mitigate this, multimedia content can
either be staged (copied to disk in full before delivery) or pipelined (delivered while the rest is still
being read) from tertiary storage devices. Even so, cost must still be balanced against access time, and
DBMS access algorithms are needed to retrieve multimedia in an efficient manner.
A multimedia DBMS must therefore manage and organize multimedia data stored at any level in the
hierarchy. It must have mechanisms for automatically migrating multimedia data objects from one level of
the storage hierarchy to another. In general, even when the data is stored in offline storage devices, the
multimedia DBMS should have information on how to easily locate the specific device containing the
multimedia data being sought. Data migration in multilayered storage systems is not peculiar to
multimedia DBMSs. All databases handling huge amounts of data must address this issue. The
interconnection between the memory systems is obviously a problem, especially when a multimedia
database involves distributed sources of data [1].
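A minimal sketch of the bookkeeping this implies is given below: a catalog that records the level and device holding each object, so that the object can be located even when it is offline, and that promotes an object to disk once it has been requested often enough. The class names, the device labels and the access-count promotion rule are all assumptions made for illustration; the paper does not prescribe a particular migration policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    ONLINE = 0      # magnetic disk
    NEAR_LINE = 1   # robotic tape or optical library
    OFFLINE = 2     # shelved media


@dataclass
class Placement:
    tier: Tier
    device_id: str      # hypothetical label of the volume holding the object
    accesses: int = 0


class Catalog:
    """Records where each object lives so it can be located at any level."""

    def __init__(self, promote_after=3):
        # Assumed policy: promote to disk after this many requests.
        self.promote_after = promote_after
        self.placements = {}

    def register(self, object_id, tier, device_id):
        self.placements[object_id] = Placement(tier, device_id)

    def locate(self, object_id):
        placement = self.placements[object_id]
        return placement.tier, placement.device_id

    def record_access(self, object_id, online_device="disk-0"):
        """Count a request and migrate the object upward if it is hot."""
        placement = self.placements[object_id]
        placement.accesses += 1
        if placement.tier != Tier.ONLINE and placement.accesses >= self.promote_after:
            placement.tier = Tier.ONLINE
            placement.device_id = online_device
        return placement.tier

    def demote(self, object_id, tier, device_id):
        """Move an object down the hierarchy and reset its access count."""
        self.placements[object_id] = Placement(tier, device_id)


catalog = Catalog(promote_after=3)
catalog.register("clip-17", Tier.NEAR_LINE, "tape-042")
for _ in range(3):
    catalog.record_access("clip-17")
print(catalog.locate("clip-17"))
```

In a real system the promotion step would also trigger the physical copy from tape to disk; here only the catalog entry changes.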
On-demand video services encompass many important applications pertaining to entertainment,
information, and education, such as movie-on-demand, news-on-demand and distance learning. In
order to offer a wide variety of programs, a video system must accommodate a large number of video
titles in a cost-effective manner. Traditionally, magnetic disks have been used in video servers to stream
videos because of their high throughput, low access latency, and random data access. This is the main
reason why most of the previously proposed commercial video servers are based on magnetic disks [3].
However, magnetic disks are still not cost-effective for storing a large volume of video files because of their
relatively high cost. On the other hand, low-cost tertiary storage, commonly known as a library or “jukebox”
(logical), is designed for mass storage in excess of terabytes. Because of its relatively long access
latency, however, tertiary storage may not yet be suitable for direct video streaming. Therefore, a
cost-effective approach is to make use of a hierarchical storage system consisting of a tertiary level and a
secondary level. In this system, the tertiary storage stores all video files, which are dynamically
transferred or “staged” onto the secondary level for streaming according to user demand. This caching
operation also resolves the mismatch between the drive bandwidth in the tertiary storage and the streaming
rate of the videos, at the expense of some disk bandwidth due to staging [3]. Again, the DBMS must manage this.
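The bandwidth mismatch can be illustrated with the figures quoted earlier in this paper: a two-hour movie of about 45 Gbytes, a tape latency read here as 3 minutes, and tape data rates of 1.5 and 10 MBps [6]. The sketch below is a back-of-the-envelope calculation under those assumptions, with MBps taken to mean megabytes per second and decimal units throughout. The test for whether a drive "keeps up" is a deliberate simplification (drive rate at least equal to playback rate), not a model taken from [3].

```python
MOVIE_MB = 45_000              # about 45 Gbytes per two-hour movie
MOVIE_SECONDS = 2 * 60 * 60
TAPE_LATENCY_S = 3 * 60        # assumed 3 min for mount and seek

# Tape data rates in MB per second, as quoted in the device table [6].
TAPE_MBPS = {"tape (low end)": 1.5, "tape (high end)": 10.0}


def playback_rate_mbps(size_mb, duration_s):
    """Average rate at which the movie must be delivered to a viewer."""
    return size_mb / duration_s


def staging_time_s(size_mb, latency_s, rate_mbps):
    """Time to copy the whole movie from tape to disk."""
    return latency_s + size_mb / rate_mbps


def drive_keeps_up(rate_mbps, playback_mbps):
    """Simplified test: can the drive feed a single stream directly?"""
    return rate_mbps >= playback_mbps


playback = playback_rate_mbps(MOVIE_MB, MOVIE_SECONDS)
print(f"playback rate: {playback:.2f} MBps")
for device, rate in TAPE_MBPS.items():
    minutes = staging_time_s(MOVIE_MB, TAPE_LATENCY_S, rate) / 60
    print(f"{device}: staging takes {minutes:.0f} min, "
          f"keeps up with playback: {drive_keeps_up(rate, playback)}")
```

With these inputs the low-end drive is slower than playback, so the title must be staged to disk in full before streaming, while the high-end drive could in principle feed one stream but still needs well over an hour to stage the title.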
Such a hierarchical system is attractive if many titles are not very popular because, in this case, the
secondary storage space can be effectively shared among the titles. This is particularly true for
video-on-demand systems [3]. The disk space at the secondary level can hence be partitioned into two parts:
one storing the popular movies and one for staging the not-so-popular ones.
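The two-part layout can be sketched as follows: a fixed set of popular titles that always stay on disk, plus a small staging area shared by everything else. For simplicity the sketch assumes equal-sized titles and evicts the least recently used staged title when the area is full; that eviction rule and all names are assumptions for illustration, not the dimensioning model of [3].

```python
from collections import OrderedDict


class SecondaryStore:
    """Disk level split into a resident part and a shared staging part."""

    def __init__(self, popular_titles, staging_slots):
        if staging_slots < 1:
            raise ValueError("staging area needs at least one slot")
        self.resident = set(popular_titles)   # popular titles, never evicted
        self.staging_slots = staging_slots
        self.staged = OrderedDict()           # least recently used first

    def request(self, title):
        """Serve a title and report where it was found."""
        if title in self.resident:
            return "resident"
        if title in self.staged:
            self.staged.move_to_end(title)    # mark as most recently used
            return "staged-hit"
        if len(self.staged) >= self.staging_slots:
            self.staged.popitem(last=False)   # evict least recently used
        self.staged[title] = True             # stands in for the tape-to-disk copy
        return "staged-from-tertiary"


store = SecondaryStore(popular_titles={"blockbuster"}, staging_slots=2)
for title in ["blockbuster", "documentary", "short", "documentary", "archive"]:
    print(title, "->", store.request(title))
```

Only the last kind of result, "staged-from-tertiary", pays the long tertiary access latency, which is why the scheme works best when requests for unpopular titles are rare.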
Conclusion: Multimedia Data in World Wide Web
As web applications grow, efficient and dependable multimedia databases will become
essential. Access to various large, visual multimedia databases over the Internet plays an increasingly
important role in numerous applications such as geographic information systems, medical information
systems and distributed publishing [10]. Businesses increasingly provide and use services, applying
formal (Web) services technology for the description, composition, and management of multimedia as
services. At the same time, social communities are emerging on the Web, applying less formal practices
and Web 2.0 technology for the provision and collection of diverse content [7].
The Web is rapidly moving towards a platform for mass collaboration in content production and
consumption. Fresh content on a variety of topics, people, and places is being created and made
available on the Web at breathtaking speed. Navigating the content effectively requires techniques
such as aggregating various portal feeds, but it also demands cost-effective storage mechanisms [5].
Once thought outdated, hierarchical data storage systems, including tertiary storage, should be the new
paradigm in multimedia database architecture. This paper has argued that the architecture
for multimedia databases should not rely on expensive disk-only systems, but should employ the HSS
model and make use of tertiary storage for reasons of cost and access.
References
[1] Donald A. Adjeroh, Kingsley C. Nwosu, "Multimedia Database Management—Requirements and
Issues," IEEE MultiMedia, vol. 04, no. 3, pp. 24-33, Jul-Sept, 1997.
[2] Vakali, Athena, Terzi, Evimaria, “Multimedia data storage and representation issues on tertiary
storage subsystems: an overview”, ACM SIGOPS Operating Systems Review, Volume 35 , Issue 2,
Pages: 61 – 77, April 2001.
[3] S.-H. Gary Chan, Fouad A. Tobagi, "Modeling and Dimensioning Hierarchical Storage Systems for
Low-Delay Video Services," IEEE Transactions on Computers, vol. 52, no. 7, pp. 907-919, Jul., 2003.
[4] Clement H. C. Leung, "Retrieving Multimedia Objects From Hierarchical Storage Systems,"
Eighteenth IEEE Symposium on Mass Storage Systems and Technologies (MSS'01), p. 297, 2001.
[5] Yih-Farn Robin Chen, Giuseppe Di Fabbrizio, David Gibbon, Serban Jora, Bernard Renger, Bin Wei,
“Geotracker: geospatial and temporal RSS navigation”,
Proceedings of the 16th international conference on World Wide Web WWW '07,
SESSION: Smarter browsing, Pages: 41 – 50, May 2007.
[6] Ozden, Banu, Rastogi, Rajeev, Silberschatz, Avi, “Architecture Issues in Multimedia Storage
Systems”, ACM SIGMETRICS Performance Evaluation Review, Volume 25, Issue 2, pages 3-12, Sept.
1997.
[7] Stefan Tai, Nirmit Desai, Pietro Mazzoleni, “Service communities: applications and middleware”,
Foundations of Software Engineering, Proceedings of the 6th international workshop on Software
engineering and middleware, SESSION: Networking and services, Pages: 17 – 22, 2006.
[8] Grosky, William I., “Managing multimedia information in database systems”, Communications of the
ACM, Volume 40, Issue 12, pages 72 – 80, Dec., 1997.
[9] Pal Halvorsen, Carsten Griwodz, Vera Goebel, Ketil Lund, Thomas Plagemann, Jonathan Walpole,
"Storage System Support for Continuous-Media Applications, Part 1: Requirements and Single-Disk
Issues," IEEE Distributed Systems Online, vol. 05, no. 1, Jan. 2004.
[10] Oya Kalipsiz, "Multimedia Databases," Fourth International Conference on Information
Visualisation (IV'00), p. 111, 2000.