Tertiary Storage Systems: Solving Multimedia DBMS Storage Problems

Greg Magsamen – September 27, 2007

Introduction

Enterprise-sized industries and services that deal in multimedia data face special problems in storing their content and keeping it readily accessible. On-demand video, image, and audio services must deliver content without bankrupting themselves on storage. A single user can afford to keep multimedia on a personal PC, but an enterprise must balance storage, availability, and services for its product. This paper takes the position that, to be viable, a multimedia vendor must back its multimedia database with a hierarchical storage system.

This paper does not address the peer-to-peer (P2P) model of media sharing, which is a different architecture altogether. It addresses large multimedia services and heavy users of multimedia content held in a multimedia database. Examples are services such as Google Earth, YouTube, and Facebook, and news organizations such as CNN or Reuters. Even an audio “jukebox” (logical on the web or physical on the wall) needs cost-effective storage of its content; it would be impractical to store every song locally on the device that plays it.

One school of thought is to use an existing database feature called Binary Large Objects (BLOBs) to store multimedia content directly. However, this approach quickly becomes ineffective, because a database that stores binary representations of multimedia objects needs to reside on single or mirrored disk systems. This becomes overly expensive, especially when the database contains rarely accessed content that takes up the same amount of space as frequently used multimedia.

Multimedia Data: Its Nature

Multimedia data, consisting of alphanumeric, graphics, image, animation, video, and audio objects, is quite different from standard alphanumeric data in terms of both presentation and semantics. From a presentation viewpoint, multimedia data is huge and has time-dependent characteristics that must be respected for coherent viewing [8].

At the heart of multimedia information systems lies the multimedia database management system. Traditionally, a database consists of a controlled collection of data related to a given entity, while a database management system, or DBMS, is a collection of interrelated data together with the set of programs used to define, create, store, access, manage, and query the database. Similarly, we can view a multimedia database as a controlled collection of multimedia data items, such as text, images, graphic objects, sketches, video, and audio. A multimedia DBMS provides support for multimedia data types, plus facilities for the creation, storage, access, query, and control of the multimedia database.

The different data types involved in multimedia databases require special methods for optimal storage, access, indexing, and retrieval [1]. Some multimedia data types, such as video, audio, and animation sequences, also have temporal requirements, which have implications for their storage, manipulation, and presentation. The problems become more acute when various data types from possibly disparate sources must be presented within or at a given time. Similarly, images, graphics, and video data have spatial constraints in terms of their content [1].

Multimedia DBMS: Its Purpose

A multimedia database management system provides a suitable environment for using and managing multimedia database information.
Therefore, it must support the various multimedia data types, in addition to providing facilities for traditional DBMS functions like database definition and creation, data retrieval, data access and organization, data independence, privacy, integration, integrity control, version control, and concurrency support. The functions of a multimedia DBMS basically resemble those of a traditional DBMS. However, the nature of multimedia information makes new demands, specifically how to store the huge amounts of multimedia data in an enterprise-sized system [1].

Multimedia DBMS: Requirements

For the multimedia DBMS and its related storage system to serve their expected purpose, they must meet certain special requirements. Some of the requirements specific to a multimedia DBMS are [1]:

- Traditional DBMS capabilities
- Huge capacity storage management
- Information retrieval capabilities
- Media integration, composition, and presentation
- Multimedia query support
- Multimedia interface and interactivity
- Performance

This paper addresses two of these requirements: 1) huge capacity storage management and 2) information retrieval capabilities.

Multimedia storage systems store and retrieve data from storage devices and manage related issues including data placement, scheduling, file management, continuous data delivery, memory buffering, and pre-fetching. For high-data-rate multimedia systems, storage systems have long been viewed as a primary bottleneck for two reasons. First, multimedia applications place a much higher load on the storage system than previous applications. Second, storage devices have become only marginally faster, while processor and network performance have increased much more [9]. This growing speed mismatch has fueled a search for new storage structures and for new file system storage and retrieval mechanisms.
Administrators who design and implement multimedia storage systems must consider several issues, including [9]:

- What kind of storage device to use
- How to order the requests
- Where to put data
- How to manage memory
- How to deal with overload situations

Storage Management: Huge Capacity

The storage requirements of multimedia systems are characterized by huge capacities and by the hierarchical organization of the storage system. Hierarchical storage places the multimedia data objects in a hierarchy of devices that are online, near-line, or offline [1]. In general, the highest level provides the highest performance, highest cost, smallest storage capacity, and least permanence. Permanence at this level can be improved with nonvolatile random access memory, but only at significant additional cost [1]. Cost and performance (in terms of access time) decrease as we go down the hierarchy, while storage capacity and permanence increase.

Typically, in most multimedia storage systems the highest level of storage is (volatile) random access memory (RAM), followed by magnetic disk drives. These provide online services. Optical storage devices provide the next level of storage. They are online in some cases but near-line (like online jukeboxes) in most cases. The lowest level in the storage hierarchy consists of offline storage devices, including magnetic tapes, optical disks, and so forth. These may or may not be directly connected to the computer. They offer the highest storage capacity and permanence but the lowest performance in terms of access time [1].

Again, multimedia applications require support for the storage and retrieval of multimedia data, which typically consists of video, audio, text, and images. The timing characteristics and the large volumes of data make the design of a multimedia storage system a critical task [4]. These huge volumes of data specifically characterize multimedia information.
For instance, a 10-minute uncompressed video sequence at 30 frames per second requires about 38 Gbytes of storage, reducible to about 3.8 Gbytes with a compression ratio of 10:1. The potential for huge volumes of data in multimedia information systems becomes apparent when you consider that a movie could run as long as two hours (about 45 Gbytes per movie), and a typical video repository would house thousands of movies (1,000 movies already amount to 45,000 Gbytes = 45 Terabytes) [1].

A multimedia service will typically need to employ several secondary (e.g., disk) and tertiary (e.g., tape, optical disk) storage devices to store the data permanently. A small amount of RAM is used to stage the data retrieved from disks and tapes before it is transmitted to clients. Cost, latency, and transfer rates of these devices are described in the table below [6].

Table: Cost, latency, and data rate by storage device [6]

Storage Device      Cost/MB   Latency   Data Rate
Magnetic Disks      20c       25 ms     5-8 MBps
Tapes (low end)     0.5c      3 min     1.5 MBps
Tapes (high end)    0.7c      3 min     10 MBps
Optical Disks       10c                 1.25 MBps

Tertiary Storage Systems: An Old Solution

Storage of multimedia data is a critical issue for the overall system's performance and functionality. Multimedia applications (services) must be able to support the variety of data defined as multimedia content. Because multimedia data consists of various objects such as text, image, video, and graphics, these objects need to be synchronized and must meet certain timing requirements. The development and evolution of new applications characterized by high storage needs has strengthened the role and importance of Tertiary Storage Systems (TSS) [2]. Recent technological advances have resulted in wide availability of commercial products offering near-line, robot-based, tertiary storage libraries. Therefore, such libraries have become an important component of modern large-scale storage servers.
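
The storage figures and the device table given earlier can be combined into a rough back-of-the-envelope comparison. The following Python sketch is illustrative only: it assumes decimal units (1 Gbyte = 1000 Mbytes), a library of exactly 1,000 two-hour titles, and the low end of the 5-8 MBps range for magnetic disks. The function and variable names are hypothetical and do not come from the cited sources. Optical disks are left out because no latency value is available for them in the table.

```python
# Rough cost and read-time estimate built from the figures quoted in the text
# and the device table from [6]. All names are illustrative assumptions.

GB_PER_10_MIN_COMPRESSED = 3.8   # compressed size of a 10-minute sequence [1]
MOVIE_MINUTES = 120              # a two-hour movie
LIBRARY_TITLES = 1000            # assumed library size ("thousands of movies")

# Cost in cents per Mbyte, latency in seconds, data rate in Mbytes per second.
# The disk data rate uses the low end of the 5-8 MBps range (an assumption).
DEVICES = {
    "magnetic disk":   {"cents_per_mb": 20.0, "latency_s": 0.025, "rate_mbps": 5.0},
    "tape (low end)":  {"cents_per_mb": 0.5,  "latency_s": 180.0, "rate_mbps": 1.5},
    "tape (high end)": {"cents_per_mb": 0.7,  "latency_s": 180.0, "rate_mbps": 10.0},
}


def movie_size_gb(minutes=MOVIE_MINUTES):
    """Compressed size of one movie in Gbytes."""
    return GB_PER_10_MIN_COMPRESSED * minutes / 10


def library_cost_dollars(device, titles=LIBRARY_TITLES):
    """Media cost in dollars of holding the whole library on one device type."""
    size_mb = movie_size_gb() * 1000 * titles
    return size_mb * DEVICES[device]["cents_per_mb"] / 100


def full_read_seconds(device):
    """Access latency plus the time to transfer one complete movie."""
    d = DEVICES[device]
    return d["latency_s"] + movie_size_gb() * 1000 / d["rate_mbps"]


if __name__ == "__main__":
    for name in DEVICES:
        print(f"{name}: ${library_cost_dollars(name):,.0f} for the library, "
              f"{full_read_seconds(name) / 60:.1f} min to read one movie")
```

Under these assumptions the disk-only library costs 40 times as much as the low-end tape library (20c versus 0.5c per Mbyte), while reading a whole movie from tape takes well over an hour. This is the trade-off between cost and access time that a hierarchical storage system is meant to balance.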
The physical storage of multimedia objects is a demanding and challenging problem because of multimedia's two principal constraints, namely size and timing [2]. The main aim of this paper is to argue for the use of a hierarchical storage system (HSS), and specifically for the need for tertiary devices.

At the top of the computer storage hierarchy is primary storage (i.e., RAM used for caches and main memory). Below RAM is solid state memory and then magnetic disk devices, commonly identified as secondary storage. At the bottom of the hierarchy are tertiary storage devices. Tertiary storage includes magnetic tapes, optical disk devices, and some more recent technologies like optical tapes and holographic storage. From top to bottom of this hierarchy, average access times increase while the cost per megabyte of storage decreases dramatically. Therefore, tertiary storage is inexpensive, albeit slow [2].

However, recent advances in tertiary storage technology have made it of increasing interest to computer system designers. First, increases in bits-per-inch and tracks-per-inch densities have increased tape capacity. Second, a variety of inexpensive tape drives have become available. Third, the low cost of magnetic tape media, compared to that of magnetic disks, makes it economical to build massive tertiary storage systems. Finally, a large number of robotic devices for handling magnetic tapes allow access to tertiary storage without human intervention, making response times more predictable [2].

Real World: Hierarchical Usage of Tertiary Storage

The large, enterprise-sized multimedia systems addressed here need to capture, process, store, and maintain a variety of information sources. These applications always include a multimedia database management system whose performance relies on the underlying storage system.
Although some information systems store all their data on online magnetic disks, the huge amount of multimedia data makes this storage architecture neither practical nor economical. A multi-level hierarchical storage system (HSS) provides sufficient storage capacity at a more economical cost than disk-only systems, but it invariably brings with it the long access latency of data held in tertiary storage devices, which degrades the performance of the storage system [4]. To address this, multimedia content can either be staged or pipelined from tertiary storage devices. The balancing act remains, however, and DBMS access algorithms are needed to retrieve multimedia efficiently.

A multimedia DBMS must therefore manage and organize multimedia data stored at any level in the hierarchy. It must have mechanisms for automatically migrating multimedia data objects from one level of the storage hierarchy to another. In general, even when the data is stored in offline storage devices, the multimedia DBMS should have information on how to easily locate the specific device containing the multimedia data being sought. Data migration in multilayered storage systems is not peculiar to multimedia DBMSs. All databases handling huge amounts of data must address this issue. The interconnection between the memory systems is an additional problem, especially when a multimedia database involves distributed sources of data [1].

On-demand video services encompass many important applications pertaining to entertainment, information, and education, such as movie-on-demand, news-on-demand, distance learning, etc. In order to offer a wide variety of programs, a video system must accommodate a large number of video titles in a cost-effective manner. Traditionally, magnetic disks have been used in video servers to stream videos because of their high throughput, low access latency, and random data access.
This is the main reason why most of the previously proposed commercial video servers are based on magnetic disks [3]. However, magnetic disks are still not cost-effective for storing a large volume of video files because of their relatively high cost. Low-cost tertiary storage, on the other hand, commonly known as a library or (logical) “jukebox”, is designed for mass storage in excess of terabytes. However, because of their relatively long access latency, tertiary systems may not yet be suitable for streaming video directly.

Therefore, a cost-effective approach is to use a hierarchical storage system consisting of a tertiary level and a secondary level. In this system, the tertiary storage holds all video files, which are dynamically transferred or “staged” onto the secondary level for streaming according to user demand. This caching operation also resolves the mismatch between the drive bandwidth of the tertiary storage and the streaming rate of the videos, at the expense of some disk bandwidth consumed by staging [3]. Again, the DBMS must manage this. Such a hierarchical system is attractive if many titles are not very popular because, in this case, the secondary storage space can be effectively shared among the titles. This is particularly true for video-on-demand systems [3]. The disk space at the secondary level can hence be partitioned into two parts: one part storing the popular movies and one part for staging the not-so-popular ones.

Conclusion: Multimedia Data in World Wide Web

As web applications grow, efficient and dependable multimedia databases will become essential. Access to various large, visual multimedia databases over the Internet plays an increasingly important role in numerous applications such as geographic information systems, medical information systems, and distributed publishing [10].
Businesses increasingly provide and use services, applying formal (Web) services technology for the description, composition, and management of multimedia as services. At the same time, social communities are emerging on the Web, applying less formal practices and Web 2.0 technology for the provision and collection of diverse content [7]. The Web is rapidly moving towards a platform for mass collaboration in content production and consumption. Fresh content on a variety of topics, people, and places is being created and made available on the Web at breathtaking speed. Navigating the content effectively does require techniques such as aggregating various portal feeds, but it also demands cost-effective storage mechanisms [5].

Once thought outdated, hierarchical data storage systems, including tertiary storage, should be the new paradigm in multimedia database architecture. This paper has argued that the architecture for multimedia databases should not rely on expensive single-disk systems, but should employ the HSS model and make use of tertiary storage for reasons of cost and access.

References

[1] Donald A. Adjeroh, Kingsley C. Nwosu, "Multimedia Database Management—Requirements and Issues," IEEE MultiMedia, vol. 4, no. 3, pp. 24-33, Jul.-Sept. 1997.
[2] Athena Vakali, Evimaria Terzi, "Multimedia data storage and representation issues on tertiary storage subsystems: an overview," ACM SIGOPS Operating Systems Review, vol. 35, no. 2, pp. 61-77, Apr. 2001.
[3] S.-H. Gary Chan, Fouad A. Tobagi, "Modeling and Dimensioning Hierarchical Storage Systems for Low-Delay Video Services," IEEE Transactions on Computers, vol. 52, no. 7, pp. 907-919, Jul. 2003.
[4] Clement H. C. Leung, "Retrieving Multimedia Objects From Hierarchical Storage Systems," Eighteenth IEEE Symposium on Mass Storage Systems and Technologies (MSS'01), p. 297, 2001.
[5] Yih-Farn Robin Chen, Giuseppe Di Fabbrizio, David Gibbon, Serban Jora, Bernard Renger, Bin Wei, "Geotracker: geospatial and temporal RSS navigation," Proceedings of the 16th International Conference on World Wide Web (WWW '07), session: Smarter browsing, pp. 41-50, May 2007.
[6] Banu Ozden, Rajeev Rastogi, Avi Silberschatz, "Architecture Issues in Multimedia Storage Systems," ACM SIGMETRICS Performance Evaluation Review, vol. 25, no. 2, pp. 3-12, Sept. 1997.
[7] Stefan Tai, Nirmit Desai, Pietro Mazzoleni, "Service communities: applications and middleware," Foundations of Software Engineering, Proceedings of the 6th International Workshop on Software Engineering and Middleware, session: Networking and services, pp. 17-22, 2006.
[8] William I. Grosky, "Managing multimedia information in database systems," Communications of the ACM, vol. 40, no. 12, pp. 72-80, Dec. 1997.
[9] Pal Halvorsen, Carsten Griwodz, Vera Goebel, Ketil Lund, Thomas Plagemann, Jonathan Walpole, "Storage System Support for Continuous-Media Applications, Part 1: Requirements and Single-Disk Issues," IEEE Distributed Systems Online, vol. 5, no. 1, Jan. 2004.
[10] Oya Kalipsiz, "Multimedia Databases," Fourth International Conference on Information Visualisation (IV'00), p. 111, 2000.