IP Multicast in Digital Television Transmission Infrastructure

by

Kirimania Murithi

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2001

© Kirimania Murithi, MMI. All rights reserved.

The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author: Department of Electrical Engineering and Computer Science, May 26, 2001

Certified by: V. Michael Bove, Jr., Principal Research Scientist, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

IP Multicast in Digital Television Transmission Infrastructure

by

Kirimania Murithi

Submitted to the Department of Electrical Engineering and Computer Science on May 26, 2001, in partial fulfillment of the requirements for the degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science

Abstract

Simultaneous access to popular data on the Internet calls for IP multicast protocols. The digital television (DTV) transmission infrastructure has not been sufficiently utilized as a means of IP multicast, despite the congestion problems that face implementations of IP multicast applications over the Internet. Due to the nature of DTV transmission and coding schemes, significant portions of the DTV channel bandwidth end up unused.
This unused bandwidth can be leveraged for DTV IP multicast, that is, broadcasting Internet content that has been detected to be in high demand. In this thesis, DTV channel coding and compression schemes are explored and analyzed in depth, leading to the implementation of an IP multicast protocol that encodes packetized Internet data into the unused spectrum of a DTV transmission channel.

Thesis Supervisor: V. Michael Bove, Jr.
Title: Principal Research Scientist

Acknowledgments

Dr. V. Michael Bove, Jr. - Principal Research Scientist, MIT Media Lab
Prof. William F. Schreiber - Professor of Elec. Eng., Emeritus, Sr. Lecturer
Prof. George C. Verghese - Professor of Elec. Eng. & Computer Science
Everest Huang - MIT Elec. Eng. PhD Candidate
Dwaine Clarke - MIT Computer Science Master's Candidate

Contents

1 Overview 8
2 Introduction 10
  2.1 Motivation for DTV IP Multicast 10
  2.2 Problem Description 11
  2.3 Problem Statement 12
  2.4 The Objective and General System Model 13
3 History & Background Information 15
  3.1 Streaming Media and the Internet 15
  3.2 IP Multicast over the Internet 16
  3.3 DTV Transmission Infrastructure Standards Development 17
    3.3.1 Conventional Analog Television Standards 18
    3.3.2 The Advanced Television Systems Committee (ATSC) 18
    3.3.3 The Digital Video Broadcasting (DVB) Project 19
    3.3.4 Digital Broadcast Schemes and Modulation Formats 19
    3.3.5 Moving Picture Experts Group (MPEG) Standards 22
    3.3.6 Dolby Digital 24
  3.4 Transition from Analog to DTV Transmission 24
  3.5 The PC/Internet-enabled Devices in DTV Transmission 25
  3.6 DTV Encoders and Developments in DTV IP Multicast 27
4 Theoretical Analysis 28
  4.1 Internet Transport versus DTV IP Multicast 28
  4.2 The Structure of an MPEG-2 Bit-Stream 29
  4.3 Fundamentals of MPEG-2 Video Compression Algorithms 30
  4.4 The MPEG-2 Video Coding Techniques 30
    4.4.1 Intraframe Coding Techniques - Transform Domain Coding - DCT 31
    4.4.2 Interframe Coding Techniques - Motion Compensated Prediction 32
  4.5 Coding of Bit-Streams - CBR versus VBR 33
5 Experimentation Procedures & Implementation Details 36
  5.1 DTV Transmission Channel Characterization 36
    5.1.1 MSSG Encoder Model 36
    5.1.2 MSSG Decoder Model 37
  5.2 Analysis of the MPEG-2 Video Transport Stream 37
  5.3 Internet IP Data Injecting/Extraction Protocols 40
    5.3.1 IP Internet Data Encoding (Injecting) Protocol 41
    5.3.2 IP Internet Data Extraction Protocol 42
  5.4 Results Analysis and Discussion 43
6 Conclusion 45
7 Recommendations 47
  7.1 Limitations 47
  7.2 Future Work 48
A Media & Streaming File Formats 49
  A.1 Media File Formats 49
  A.2 Streaming File Formats 49
B MPEG-2 Bit-Stream Codec Model 51
  B.1 MPEG-2 Codec Parameter File 51
  B.2 Encoder Usage of MSSG Software 53
  B.3 Decoder Usage of MSSG Software 66
C MPEG-2 Bit-Stream Data Analysis 69
D IP Data Injecting Protocol 71
E IP Data Extraction Protocol 78
Bibliography 87

List of Figures

4-1 CBR Multiplexing of Data 34
4-2 VBR Multiplexing of Data 34
5-1 Varying sizes (in bytes) of a 5 minute long encoded MPEG-2 video/image frame sequence 38
5-2 A histogram of the sizes of encoded MPEG-2 video/image frames in a 5 minute long bitstream 39
5-3 Varying sizes (in bytes) of the first 100 encoded video/image frames in an MPEG-2 sequence 40

List of Tables

A.1 Media File Formats 49
A.2 Streaming File Formats 50

Chapter 1

Overview

In general, multicast is the delivery of data simultaneously to one or more destinations using a single, local transmission operation [2]. Internet Protocol (IP) multicast involves delivery of IP data, that is, Internet data that is packetized. The two forms of IP multicast explored are Internet IP multicast and Digital Television (DTV) IP multicast. Internet IP multicast is the transmission of data packets - usually audio and video streams - to multiple users simultaneously via the Internet infrastructure [16]. On the other hand, DTV IP multicast is the transmission of similar data via the digital television transmission infrastructure. This transmission scheme is similar to that used for radio and TV programs over the airwaves.
The DTV transmission infrastructure has not been sufficiently utilized as a means of IP multicast, despite the congestion problems that face implementations of Internet IP multicast. Internet congestion is on the rise mainly due to downloading and streaming of large data content, at times simultaneously. Attempts at using the Internet for large-audience and real-time viewing of content have resulted in poor response times and network overloading. Simultaneous access to popular data on the Internet calls for IP multicast protocols. However, implementations of IP multicast applications over the Internet are not cost effective, and do not solve the Internet congestion problem [10]. Instead, they lead to more congestion. Ultimately, much larger margins for peak data traffic capacity must be incorporated into the requirements for the Internet infrastructure. There is a need for a new solution.

Due to the nature of DTV transmission and coding schemes, significant portions of the DTV channel bandwidth end up unused. Thus, broadcasters and service providers can take advantage of the unused bandwidth of their DTV channels to broadcast Internet content that has been detected to have a large demand, hence a prime candidate for IP multicast. In this thesis, DTV channel coding and compression schemes are explored and analyzed in depth, leading to the implementation of an IP multicast protocol that encodes packetized Internet data into the unused spectrum of a digital television transmission channel. The Internet data types used, without loss of generality, consisted mainly of streaming media (please refer to appendix A).

Chapter two of the thesis introduces DTV IP multicast. It describes the motivation behind DTV multicast research and implementation, the objective and the problem that was targeted, and the general system model that was developed. Chapter three contains the history and background information relevant to DTV IP multicast.
It provides history and background information on streaming media on the Internet, IP multicast over the Internet, and the standards that have been developed for the DTV transmission infrastructure. Further, it explains the need for the transition from analog transmission to DTV transmission, the role of the PC and other Internet-enabled devices in DTV transmission, and the current developments in DTV transmission encoders and IP multicast applications.

Chapter four explores the theoretical analysis behind the research in DTV IP multicast and the design and implementation that was adopted. In this context, the Internet transport infrastructure is analyzed and compared to DTV IP multicast. The structure of the MPEG-2 bitstream is also described, as well as the fundamentals of MPEG-2 video compression algorithms. Additionally, the MPEG-2 video, audio, and data stream coding techniques are presented and analyzed.

Chapter five describes the experimentation procedures and implementation details. In this chapter, the characterization of a DTV channel using an MPEG-2 transport stream is described, and the results and analysis of this channel presented. The design and implementation of real-time data injecting protocols are also described.

Chapter six is the conclusion, and chapter seven contains the recommendations. In chapter seven, the limitations that were encountered are presented, along with avenues for future research in this field.

Chapter 2

Introduction

2.1 Motivation for DTV IP Multicast

Internet Protocol (IP) traffic and the number of users accessing the Internet continue to grow at exponential-like rates. Additionally, large numbers of users continue to seek broadband access as well as access to the same content from popular web sites. As its popularity has increased, Internet usage has shifted from predominantly basic Web browsing and e-mail to such applications as online shopping, online trading, voice chat, interactive games and streaming media.
Currently, there are approximately 200 million Internet users worldwide, and of these there may typically be 26 million users accessing the Internet at a given time [4]. The number of total and simultaneous Internet users continues to grow, which results in increasing traffic demand on the Internet infrastructure and access facilities, leading to Internet congestion. In addition, because of recent improvements in content compression, creation, transport, distribution and editing tools (integration of video, audio, image and data) for the Web, higher access data rates are needed per end user, which also increases demand for higher transport capacity [6].

Implementation of service enhancements on the Internet in the form of facility upgrades requires significant capital outlays and other resources. The spread of the Internet tends to be limited by the current network fabric and the bottlenecks that develop at the most popular or heavily trafficked points of the network. With increasing numbers of subscribers, these bottlenecks become a more difficult issue to address.

Various forms of IP multicast applications over the Internet have subsequently been designed and implemented in the last few years. However, these applications are not cost effective, and do not solve the Internet congestion problem [10]. Instead, they lead to more congestion. Ultimately, much larger margins for peak data traffic capacity must be incorporated into the requirements for the Internet infrastructure. Hence, simultaneous downloads and streaming of large data content call for a new solution that implements IP multicast without increasing or complicating the traffic on the Internet. This solution involves the use of the Digital Television (DTV) transmission infrastructure to deliver data to a PC or a set-top box equipped with a digital signal receiver.
A reliable IP multicast protocol that takes full advantage of the fast delivery system of DTV channels reduces Internet congestion, provides an alternative avenue for transmission of large data content that is in high demand, and ultimately leads to efficient and economic use of the underutilized DTV spectrum.

2.2 Problem Description

A digital television signal is transmitted over the same general set of frequencies used by analog television broadcasts, but instead of continuous analog components carrying video and audio information, there is a single, high-speed bitstream [9]. This bitstream is a combination of encoded video, encoded audio and system data (e.g., program guides). A digital video signal is created by digitizing the image to be transmitted into a frame of pixels, then reducing the number of bits needed to represent the image using a compression method sanctioned by the Moving Picture Experts Group, known as MPEG-2. Further, a digital audio signal is created by digitizing the sound to be transmitted and reducing the number of bits needed to represent the signal using a compression method known as Dolby Digital.

In the US, the FCC mandates that by the year 2003, all broadcasts must be digital, and analog TV is scheduled to end in 2006 [6]. Consequently, the FCC is in the process of implementing and allocating DTV frequencies for over-the-air transmission, otherwise known as 'terrestrial' transmission. In the US alone, there are 68 'terrestrial' broadcast channels. Today, most US homes can receive a digital signal, although few have digital receivers. Despite the fact that nearly 2/3 of American homes have cable service, more than 1/2 of all receivers use antennas [12].

Each allocated DTV channel has a 19.4 million bits per second (Mbps) capacity to deliver content to viewers, but far less than that amount is needed for the digital broadcast bitstream. Typically, high definition TV (HDTV) uses 16 Mbps for its broadcast bitstream, while standard definition TV (SDTV) uses 4 Mbps.
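A rough estimate of the reclaimable capacity follows directly from the figures just quoted; this short sketch simply restates that arithmetic (the function name is illustrative, not from the thesis):

```python
# Spare capacity per 19.4 Mbps terrestrial DTV channel, using the
# broadcast bitstream figures quoted in the text (HDTV 16 Mbps, SDTV 4 Mbps).
CHANNEL_MBPS = 19.4

def spare_mbps(program_mbps: float) -> float:
    """Bandwidth left over after the broadcast bitstream is budgeted."""
    return CHANNEL_MBPS - program_mbps

print(round(spare_mbps(16.0), 1))   # HDTV program: 3.4 Mbps left for data services
print(round(spare_mbps(4.0), 1))    # SDTV program: 15.4 Mbps left for data services
```

Even a channel carrying a full HDTV program leaves a few megabits per second for data services, and an SDTV channel leaves far more.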
In addition, empty packets that are inserted into the bitstream to synchronize the transmission timing may waste up to 10% extra bandwidth [14]. All of this unused bandwidth provides an opportunity for broadcasters to capture millions of dollars in revenue from new data services, in particular, those that implement IP multicast over DTV channels. Broadcasters can offer viewers enhanced TV programming and delivery as well as new subscription-based information and entertainment services that integrate Web content delivery with their traditional TV programming.

The enormous potential of IP multicast rests with DTV channels, where data is delivered just once but to many recipients. This process allows for the download of vast amounts of data to communities of users over a high-speed data link, that is, the unused spectrum on a channel. Examples of how IP multicast in this context may be used are corporate information to multiple locations (e.g., price information, promotional media to supermarket or other retail chains), and community information (e.g., curriculum-related media to schools and colleges). With the large data capacity offered by IP multicast over DTV channels, it is entirely feasible that entire video and sound programs could be downloaded quickly and efficiently, without Internet overload, and stored at the locations where they are needed. Hard disk drives and other memory devices are increasing in capacity to tens, possibly hundreds, of gigabytes, yet their prices are falling fast.

Unlike DTV terrestrial channels, where the currently unused bandwidth can be leveraged, in digital cable channels most of the bandwidth is already in use, sometimes in a wasteful manner [6]. Indeed, cable bandwidth is now in short supply due to the complicated ways in which cable operators allocate their bandwidth.
Additionally, despite the fact that cable channels have more bandwidth at their disposal in comparison to DTV terrestrial channels, wireless transmission has a huge added advantage in that it does not entail the need for cable drops and other physical connection points. Also, other wireless devices such as cellular phones and mobile computers can be integrated into the transmission implementation.

2.3 Problem Statement

New technologies that reclaim otherwise wasted bandwidth in the DTV transmission infrastructure need to be developed. Unfortunately, the current implementations that have been adopted do not take full advantage of all the unused bandwidth of the DTV channels [7]. The process of reclaiming the unused bandwidth in the DTV transmission infrastructure in part involves the design of new applications that inject Internet TCP/IP data into the available spaces of the transmission channel. Internet data that does not need to be sent in real time can be inserted into a compressed digital bitstream on a budgeted DTV channel on an as-available basis. This process is possible because the MPEG-2 compression scheme results in a very "bursty" output, a consequence of the time-varying information content of the video signal being encoded. Additionally, real-time data, such as streaming Web video or Web radio, can also be included in the implementation by developing effective buffering techniques at the receiving end.

Nevertheless, the distinction between data broadcasting (one sender to many receivers) and one-to-one transmission as needed in telephone networks and specific Internet data exchanges must be kept in mind. For example, combining Web access with watching TV programs creates a problem in that everyone watches the same program but wants different data from the Internet.
Assuming that the data being transmitted during an IP multicast is common data that everyone is trying to download (e.g., corporate information to multiple locations, community information for schools), this process would not be a problem. Otherwise, customized IP multicast is a possible alternative. In customized IP multicast, the source sends all the data to all the receivers, but arranges that each receiver sees only the data that was requested. Internet-intensive content can also be transmitted and cached closer to the end users' locations by developing Web caching technologies. It is also appropriate to consider a scenario where a significant proportion, or all, of the available bandwidth is allocated to a particular receiver on a time-division basis to facilitate a very rapid download/transmission of an item before serving the next request in a similar manner. This process is possible through encapsulation, whereby an IP encapsulator is configured to allocate available bandwidth to any one receiver in the desired manner [6].

2.4 The Objective and General System Model

The objective is twofold. First, to implement IP multicast protocols that enable proficient spectrum sharing of DTV content and Internet data, while at the same time enhancing DTV spectrum efficiency. Second, to provide an alternative route for Internet streaming media data other than the Internet transport infrastructure, thereby reducing Internet congestion mainly caused by multiple simultaneous downloads of popular data.

The implementation employed is based on a PC DTV viewing environment, although it could be adapted for digital set-top boxes and digital TV sets that support software. The results from the implementation illustrate that the transmission speed of popular Internet data to PC viewers can be significantly increased using the DTV transmission infrastructure.
The approach taken in the implementation aims to show that an economic model whereby broadcasters sell unused bandwidth to content providers for revenue can be realized. To meet the objectives stated above, DTV channel coding and compression schemes are studied and analyzed in depth. A digital transmission channel is modeled using an MPEG-2 encoder, a broadcast-quality MPEG-2 bitstream from a typical broadcast station, and an MPEG-2 decoder to convert the encoded bitstream into a sequence of video frames. The traffic characteristics of the MPEG-2 bitstream generated by the encoder are studied and analyzed. The channel transmission budget is estimated based on the current broadcasting standards.

A protocol for packaging Internet data (IP packets) into the unused spectrum of the channel on an as-available basis without corrupting the bitstream content is implemented. Likewise, another protocol is developed for extracting the injected data from the bitstream and buffering the extracted data at the client. Finally, the data rate delivered using DTV IP multicast is estimated and compared to the data rates of current Internet IP multicast implementations for similar data types. This is accomplished by extrapolating the results obtained from the experimentation to reflect a typical IP multicast to multiple clients in a DTV transmission infrastructure, and comparing the observations against those obtained from an Internet-based IP multicast session.

Several data types from the Internet are accounted for in the design and implementation of the DTV IP multicast protocols, e.g., streaming Web video, streaming MP3 data, and streaming Web radio data (please refer to appendix A). Prior to being injected into the DTV bitstream at the transmission end, the Internet data is packetized (i.e., broken down into packets). At the receiving end, these packets are extracted and buffered to reconstruct the transmitted Internet data.
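To make the injecting/extraction idea concrete, here is a minimal, self-contained sketch (not the thesis's actual protocol): it replaces null transport-stream packets with data packets on an assumed PID and reassembles the payload at the receiver. The 188-byte packet size, 0x47 sync byte, and 0x1FFF null PID are MPEG-2 transport-stream facts; the chosen data PID and the 2-byte length-prefix framing are illustrative assumptions.

```python
# Illustrative sketch: IP data injection into the null packets of an
# MPEG-2 transport stream, and the matching extraction at the receiver.

TS_PACKET = 188
SYNC = 0x47
NULL_PID = 0x1FFF          # MPEG-2 null (stuffing) packets
DATA_PID = 0x0FFE          # hypothetical PID chosen for injected IP data
CHUNK = TS_PACKET - 6      # 4-byte TS header + 2-byte length prefix

def pid_of(pkt: bytes) -> int:
    # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
    return ((pkt[1] & 0x1F) << 8) | pkt[2]

def make_packet(pid: int, payload: bytes) -> bytes:
    header = bytes([SYNC, 0x40 | (pid >> 8), pid & 0xFF, 0x10])
    body = len(payload).to_bytes(2, "big") + payload
    return header + body.ljust(TS_PACKET - 4, b"\xff")

def inject(stream: list, data: bytes) -> list:
    """Replace null packets with data packets until all chunks are placed."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    out = []
    for pkt in stream:
        if chunks and pid_of(pkt) == NULL_PID:
            out.append(make_packet(DATA_PID, chunks.pop(0)))
        else:
            out.append(pkt)            # video/audio packets pass untouched
    if chunks:
        raise ValueError("not enough null packets for the data")
    return out

def extract(stream: list) -> bytes:
    """Collect and reassemble the injected chunks at the receiver."""
    data = b""
    for pkt in stream:
        if pid_of(pkt) == DATA_PID:
            n = int.from_bytes(pkt[4:6], "big")
            data += pkt[6:6 + n]
    return data

# Round trip over a toy stream: 3 video packets interleaved with 4 nulls.
video = make_packet(0x0100, b"video payload")
nulls = make_packet(NULL_PID, b"")
stream = [video, nulls, video, nulls, nulls, video, nulls]
message = b"IP datagram bytes destined for multicast receivers"
assert extract(inject(stream, message)) == message
```

The key property, as in the thesis's protocol, is that the program packets are never touched: only stuffing packets are replaced, so the decoder's view of the video is unchanged.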
In conducting the experiments, MPEG-2 bitstreams for both SDTV transmission and HDTV transmission were considered.

Chapter 3

History & Background Information

3.1 Streaming Media and the Internet

Streaming media has been around for a few years, starting with RealNetworks' (then Progressive Networks) streaming audio in the mid-1990s, followed by streaming video a few years later [22]. Streaming audio/video accentuates Web page information with video and voice but does not necessarily require large storage resources at the receiver. Unlike a TV broadcast, which has a dedicated local 6 MHz (19.4 Mbps) channel, the quality (bit rate) of streaming video is limited by the available local and long-haul network transport capacity. Further, data packet transport on the Internet is currently on a best-effort basis; therefore, the quality of the streaming media can vary with traffic during peak and off-peak hours. Additionally, most local access is provided via modems with peak rates in the range of 28.8 Kbps to 56 Kbps, which produce a "jerky" effect combined with intermittent voice during peak usage. A smaller video window with limited motion is also typical [22].

With the availability of cable modems and Digital Subscriber Line (xDSL) access at data rates of 256 Kbps to 6 Mbps, much higher than Plain Old Telephone Service (POTS) modems, broadband streaming media at 100 Kbps to 1 Mbps can readily be supported. This technology opens the door for Internet content with almost TV-like quality and stereophonic sound while maintaining a reasonable window size on the computer monitor screen. The issue remaining here is the transport of such content over the Internet. With the rise in subscription by multiple users to interactive broadband services that potentially lead to simultaneous downloads of large data content, the Internet infrastructure must be greatly expanded to support the transport of broadband streaming media to multiple simultaneous users.
As a result, there is an immense requirement for much more transport capacity on the Internet transport infrastructure. Another significant trend is the decreasing cost per megabyte of hard disk storage, from $2 to $0.03, over the last five years [4]. A typical hard drive (> 10 GB) today can economically store content comparable to that of several CDs or a few DVDs. With many simultaneous users and multiple downloads, content distribution is capable of creating traffic congestion even with an OC-192 fiber capacity (9.95 Gbps) on the Internet backbone. As an example, delivering a 1 GB file to 10,000 users in one hour via unicast (point-to-point transmission) over the Internet would require backbone capacity on the order of 24 Gbps. To efficiently use the Internet backbone capacity and serve multiple users, large files and streaming media must be transmitted via IP multicast.

3.2 IP Multicast over the Internet

The Internet consists of a network of many routers that use source-to-destination IP address routing and primarily supports unicast (point-to-point) traffic. To deliver a large file or to stream packets at a high data rate from a source to many receiving destinations, the transmission must be repeated for each user, even when there is more than one end-user destination at the far end of the routing path. This process is highly inefficient: it requires multiple transmissions of the content at the source and also results in large amounts of redundant traffic on the adjacent paths from the source. As a result, less capacity is available for other unicast traffic, and congestion occurs with content that is in high demand.

To eliminate the duplicate transmissions associated with unicast, Internet IP multicast was introduced. Internet IP multicast employs multicast-enabled routers (mrouters) [16]. This form of transmission relies on the mrouters to route multiple copies of packets to the appropriate distribution paths (one copy per path) leading to the intended end users.
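The unicast capacity figure quoted above (1 GB to 10,000 users in one hour) can be reproduced with simple arithmetic, assuming 1 GB = 2^30 bytes:

```python
# Reproduce the back-of-envelope unicast estimate from the text:
# delivering a 1 GB file to 10,000 users in one hour, point-to-point.
file_bits = 1 * 2**30 * 8        # one file in bits (1 GB = 2^30 bytes)
users = 10_000
seconds = 3600                   # one hour

required_bps = file_bits * users / seconds
print(f"{required_bps / 1e9:.1f} Gbps")   # ~23.9 Gbps, on the order of 24 Gbps

# More than double a single OC-192 backbone link (9.95 Gbps):
assert required_bps > 2 * 9.95e9
```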
The routing intelligence at the mrouter is accomplished by having each end user register to the multicast session address at their closest mrouter. The registration process ripples backwards from mrouter to mrouter until it reaches the mrouter closest to the source. This backward rippling forms a multicast distribution tree that guides the transmission from the source. At the end of the multicast session, the distribution tree is torn down by the mrouters, and in effect end-user registrations are dropped until a new multicast session is initiated.

Due to the complexity of the many different topological configurations of end users that Internet IP multicast has to support, various protocols for setting up mrouter networks for IP multicast have been developed (e.g., DVMRP, MOSPF, MSDP, BGMP). These different protocols are designed to cover dense and sparse network operations, to prune and graft distribution trees, to administer session log on/off, and to locate the multicast sources [10]. Nevertheless, providing a broad-based IP multicast capability for many end users located anywhere on the Internet, and at any time, requires that all unicast routers be upgraded to mrouters. The use of these complex protocols can also prove difficult when applied on a large scale. In addition, the cost of replacing the existing routers with mrouters before their full capital depreciation cannot be readily justified for small multicast sessions, since these sessions can be conducted with unicast routing at less expense.

To support data transmission during Internet IP multicast sessions via mrouters, several high-level protocols, mostly real-time oriented, have been implemented [17]. These include the Real-time Transport Protocol (RTP), the Real-time Control Protocol (RTCP) that works in conjunction with RTP, the Resource Reservation Protocol (RSVP), and the Real-time Streaming Protocol (RTSP).
These protocols, at different levels of maturity, have already been used in implementations of Internet IP multicast applications such as the Multicast Backbone (MBONE). The MBONE, sometimes called the Multicast Internet, is a virtual network layered on top of the physical Internet to support routing of IP multicast packets over the Internet [19]. Set up in 1994 as an experimental and volunteer effort, the MBONE originated in an effort to multicast audio and video from the Internet Engineering Task Force (IETF) meetings. Since most IP servers and routers on the Internet do not have mrouter capabilities, the MBONE was designed to form a network within the Internet that could transmit IP multicasts via all kinds of routers.

During an MBONE IP multicast session, tunneling is used to forward multicast packets through routers on the network that are not designed to handle multicast, i.e., the non-mrouters. In the tunneling procedure, an MBONE router that is sending a packet to another MBONE router through a non-MBONE part of the network must encapsulate the multicast packet as a unicast packet so that the non-MBONE routers can transmit it [19]. The receiving MBONE router decapsulates the unicast packet upon reception and forwards it appropriately. This process complicates the Internet traffic and does not save significant bandwidth. Consequently, the current MBONE implementation has a channel bandwidth of about 500 Kbps.

It is therefore important to explore IP multicast over the DTV transmission infrastructure. However, as an emerging digital field, this infrastructure is beset by myriad approaches that have resulted in competing standards and, at times, conflicting technologies, some of which may have to be reconciled to guarantee success in the transition from analog transmission to digital. The US DTV standards and the European standards are primed to dominate the DTV industry.
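The tunneling step described above can be sketched as plain IP-in-IP encapsulation (IP protocol number 4): the whole multicast datagram becomes the payload of an ordinary unicast datagram. This minimal sketch omits checksums and the MBONE routing specifics, and all addresses are made up:

```python
# Sketch of MBONE-style tunneling: wrap a multicast IP packet in a
# unicast IPv4 header (protocol 4 = IP-in-IP) so non-multicast routers
# forward it, then strip the wrapper at the far end of the tunnel.
import struct

def ipv4_header(src: str, dst: str, proto: int, payload_len: int) -> bytes:
    def ip(a): return bytes(int(x) for x in a.split("."))
    # version/IHL, TOS, total length, id, flags/frag, TTL, proto, checksum (0 here)
    return struct.pack("!BBHHHBBH", 0x45, 0, 20 + payload_len,
                       0, 0, 64, proto, 0) + ip(src) + ip(dst)

def encapsulate(mcast_packet: bytes, tunnel_src: str, tunnel_dst: str) -> bytes:
    """Sending MBONE router: hide the multicast packet inside a unicast one."""
    return ipv4_header(tunnel_src, tunnel_dst, 4, len(mcast_packet)) + mcast_packet

def decapsulate(packet: bytes) -> bytes:
    """Receiving MBONE router: strip the outer 20-byte unicast header."""
    assert packet[9] == 4, "not an IP-in-IP packet"
    return packet[20:]

# Inner packet: a UDP datagram (proto 17) to multicast group 224.2.0.1.
inner = ipv4_header("18.0.0.5", "224.2.0.1", 17, 8) + b"\x00" * 8
tunneled = encapsulate(inner, "18.0.0.1", "128.9.0.2")
assert decapsulate(tunneled) == inner
```

The 20-byte outer header on every packet is part of why, as noted above, tunneling saves no significant bandwidth.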
3.3 DTV Transmission Infrastructure Standards Development

Television broadcasts began in the United States in 1939 with the National Broadcasting Company (NBC). The Federal Communications Commission (FCC) set the first American standards for analog broadcast television in 1941. In 1953, the National Television System Committee (NTSC) set the standards for color television broadcasts in the United States [8]. The NTSC standards are also used in Japan. In the US, the Advanced Television Systems Committee (ATSC) develops standards for DTV transmission, while the Digital Video Broadcasting (DVB) Project sets the standards in Europe [8].

3.3.1 Conventional Analog Television Standards

There are three major analog television standards in the world: the US National Television System Committee (NTSC) standard, the European Phase Alternation Line (PAL) standard, and the French Sequential Couleur Avec Memoire (SECAM) standard [5].

The National Television System Committee (NTSC)

In 1953, the NTSC was responsible for developing a set of standard protocols for TV broadcast transmission and reception in the United States. The NTSC standards have not changed significantly since their inception, except for the addition of new parameters for color signals [5]. NTSC signals are not directly compatible with computer systems; hence the need for adapters in the computer environment.

An NTSC TV image has 525 horizontal lines per frame (complete screen image). These lines are scanned from left to right, and from top to bottom. In the scanning, every other line is skipped, therefore it takes two screen scans to complete a frame: one scan for the odd-numbered horizontal lines, and another scan for the even-numbered lines. Each half-frame screen scan takes approximately 1/60 of a second; hence, a complete frame is scanned every 1/30 second. This alternate-line scanning system is known as interlacing.
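The interlaced-scanning timing just described can be checked with a few lines of arithmetic:

```python
# Check the NTSC interlacing arithmetic described above.
lines_per_frame = 525
fields_per_frame = 2                  # odd lines, then even lines
field_time = 1 / 60                   # seconds per half-frame scan

lines_per_field = lines_per_frame / fields_per_frame
frame_time = fields_per_frame * field_time
frame_rate = 1 / frame_time

assert lines_per_field == 262.5       # each field carries half the lines
assert abs(frame_rate - 30.0) < 1e-9  # one complete frame every 1/30 s
```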
Phase Alternation Line (PAL)

PAL is the analog television display standard used in Europe and certain other parts of the world. PAL scans the cathode ray tube horizontally 625 times to form the video image.

Sequential Couleur Avec Memoire (SECAM)

SECAM analog TV display technology is the standard in France and the countries of the former Soviet Union. Like PAL, SECAM scans the cathode ray tube horizontally 625 times to form the video image.

3.3.2 The Advanced Television Systems Committee (ATSC)

In 1987, the FCC formed the Advisory Committee on Advanced Television Service (ACATS), whose purpose was to advise the FCC on the development of advanced television (ATV). ACATS decided not to consider further improvements to the NTSC analog television system but instead to concentrate solely on DTV - an all-digital television system [6]. Hence, the ATSC, a standards organization created in 1982 by companies in the television industry, embarked on promoting the establishment of technical standards for all aspects of ATV systems. Based in Washington, D.C., the ATSC has an international membership of over 200 organizations (up from an original 25) that includes broadcasters, motion picture companies, telecommunications carriers, cable TV programmers, consumer electronics manufacturers, and computer hardware and software companies. The ATSC standards specify technologies for the transport, format, compression, and transmission of DTV in the US. The main ATSC standards for DTV are the 8-Level Vestigial Sideband (8-VSB) modulation format, the MPEG-2 standards for video signal compression, and Dolby Digital for audio signal coding. The ATSC is finalizing DTV standards for data broadcasting and interactive services. ATSC standards have sparked controversy in the DTV industry in the US. First, cable companies have not yet determined how to efficiently integrate ATSC standards into their television systems, because cable systems use different modulation formats from those enacted by the ATSC.
One of the modulation formats used by cable systems is the Quadrature Amplitude Modulation (QAM) scheme, whose features allow cable operators to encode many programs into their cable spectrum [6]. However, the "must carry" rule enforced by the FCC upon cable operators demands that cable transmissions carry local broadcast programs alongside their own content. Additionally, satellite transmissions use yet another set of modulation formats, mostly the Quadrature Phase Shift Keying (QPSK) scheme [9]. Satellite systems are also under the "must carry" rule in the US. Second, the current DTV standards controversy is compounded by competition from the European standards. In Europe, the Digital Video Broadcasting (DVB) Project sets the standards for terrestrial broadcast. DVB uses Coded Orthogonal Frequency Division Multiplexing (COFDM) as the modulation scheme and the MPEG-2 standards for both audio and video encoding. Nevertheless, the ATSC DTV standards will likely dominate in North America (including Mexico), Japan, and Korea [6].

3.3.3 The Digital Video Broadcasting (DVB) Project

In the early 1990s, European broadcasters, consumer equipment manufacturers, and regulatory bodies formed the European Launching Group (ELG) to discuss introducing DTV throughout Europe. The Digital Video Broadcasting (DVB) Project was created from the ELG membership in 1993 [22]. A fundamental decision of the DVB Project was the use of Coded Orthogonal Frequency Division Multiplexing (COFDM) as the modulation format and the selection of MPEG-2 for compression of both audio and video signals. DVB is reputed for its robust transmission, which opens the possibility of providing crystal-clear television programming to television sets in buses, cars, trains, and even hand-held televisions.

3.3.4 Digital Broadcast Schemes and Modulation Formats

The two most dominant modulation formats in DTV transmission are 8-Level Vestigial Sideband (8-VSB) and Coded Orthogonal Frequency Division Multiplexing (COFDM).
8-VSB is a standard radio frequency (RF) modulation format chosen by the ATSC for the transmission of digital television to consumers in the United States and other adopting countries. Countries in Europe (and others under the DVB Project) have adopted an alternative format, COFDM [22].

The ATSC 8-Level Vestigial Sideband (8-VSB)

The ATSC 8-VSB system uses a layered digital system architecture consisting of a picture layer that supports a number of different video formats; a compression layer that transforms the raw video and audio samples into a coded bitstream; and a radio frequency (RF) modulation/transmission layer [5]. The ATSC 8-VSB system is a single-carrier technology that employs vestigial sideband (VSB) modulation similar to that used by conventional analog television. The transmission layer modulates a serial bitstream into a signal that can be transmitted over a 6 MHz television channel. The ATSC 8-VSB system transmits data using trellis coding with 8 discrete levels of signal amplitude [5]. Complex coding techniques and adaptive equalization are used to make reception of the transmitted data more robust to propagation impairments such as multipath (strong static signal reflections), noise, and interference [21]. The 6 MHz ATSC 8-VSB system transmits data at a rate of 19.4 Mbps. There is also a 16-VSB mode that has 16 discrete amplitude levels and supports up to about 38.78 Mbps of data on a 6 MHz channel. 8-VSB is considered effective for the simultaneous transmission of more than one DTV program (statistical multiplexing) and for the broadcasting of data alongside a television program (datacasting) because it supports large data payloads.

DVB-T Coded Orthogonal Frequency Division Multiplexing (COFDM)

The DVB-T COFDM system is based on the European Terrestrial Digital Video Broadcasting (DVB-T) standards. In contrast to VSB, the DVB-T COFDM system is a multi-carrier technology.
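As a consistency check on the 19.4 Mbps figure quoted above, the 8-VSB payload rate can be derived from the ATSC framing parameters. The short calculation below is a back-of-the-envelope sketch, assuming the standard A/53 framing (10.76 Msymbols/s, 832 symbols per data segment, 313 segments per field with one field-sync segment, and one 188-byte MPEG-2 transport packet per data segment):

```python
# ATSC 8-VSB framing parameters (per the ATSC A/53 framing)
symbol_rate = 4.5e6 * 684 / 286          # ~10.762 Msymbols/s
symbols_per_segment = 832                # 828 data symbols + 4 segment-sync symbols
segments_per_field = 313                 # 312 data segments + 1 field-sync segment
ts_packet_bits = 188 * 8                 # one MPEG-2 transport packet per data segment

segments_per_sec = symbol_rate / symbols_per_segment
data_segments_per_sec = segments_per_sec * 312 / 313
payload_rate = data_segments_per_sec * ts_packet_bits

print(round(payload_rate / 1e6, 2))      # 19.39
```

The result, about 19.39 Mbps, matches the commonly quoted 19.4 Mbps payload of a 6 MHz 8-VSB channel.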
The principle of COFDM is to break a single data stream into many parallel, lower-rate data streams and then use many sub-carriers to transmit these lower-rate streams simultaneously [5]. To ensure that the sub-carriers do not interfere with one another, the frequency spacing between them is carefully chosen so that the sub-carriers are mutually orthogonal [21]. The individual sub-carriers are typically modulated using a form of either quadrature amplitude modulation (QAM) or quadrature phase shift keying (QPSK) [21]. The multi-carrier design of COFDM makes it resistant to transmission channel impairments such as multipath propagation, narrowband interference, and frequency-selective fading [5]. COFDM avoids interference from multipath echoes by making the symbol duration greater than the temporal spread of the multipath, and by applying a guard interval between data symbols during which the receiver does not look for information. Guard intervals are designed such that most multipath echoes arrive within the guard period and therefore do not interfere with the reception of data symbols. Further, because the information is spread among many carriers, if narrowband interference or fading occurs, only a small amount of information is lost.

ATSC 8-VSB versus DVB-T COFDM

Each system has its unique advantages and disadvantages. The ATSC 8-VSB system, in general, has a higher data rate capability, has better threshold or carrier-to-noise (C/N) performance, requires less transmitter power for equivalent coverage, and is more robust to impulse and phase noise [5]. On the other hand, the DVB-T COFDM system has better performance in both dynamic and high-level (up to 0 dB) long-delay static multipath situations [6]. The COFDM system may also offer advantages for single frequency networks and mobile reception.
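The guard-interval mechanism described above can be illustrated numerically. The NumPy sketch below uses illustrative parameters (64 sub-carriers, a 16-sample cyclic-prefix guard): it builds one OFDM symbol from QPSK data, passes it through a two-path channel whose echo delay is shorter than the guard, and shows that the receiver recovers the sub-carrier data exactly after a one-tap equalization per sub-carrier.

```python
import numpy as np

rng = np.random.default_rng(0)
N, CP = 64, 16                       # sub-carriers, guard (cyclic prefix) length

# QPSK data on each sub-carrier
bits = rng.integers(0, 2, size=(N, 2))
X = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

# OFDM modulation: IFFT, then prepend the last CP samples as the guard interval
x = np.fft.ifft(X)
tx = np.concatenate([x[-CP:], x])

# Two-path channel: a direct path plus an echo delayed by 5 samples (< CP)
h = np.zeros(6, dtype=complex)
h[0], h[5] = 1.0, 0.5
rx = np.convolve(tx, h)[: len(tx)]

# Receiver: drop the guard, FFT, then equalize with the channel response
Y = np.fft.fft(rx[CP : CP + N])
H = np.fft.fft(h, N)
X_hat = Y / H

print(np.allclose(X_hat, X))         # True: the echo is absorbed by the guard
```

Because the echo delay is shorter than the guard interval, the linear channel acts as a circular convolution on the useful part of the symbol, and each sub-carrier sees only a single complex gain that is trivially equalized. An echo longer than the guard would break this property, which is why the guard length is a key design parameter.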
A single frequency network is a network of several stations that broadcast the same signal simultaneously using multiple transmitters. The data throughput of COFDM DTV operation in a 6 MHz channel is less than the 19.4 Mbps provided by 8-VSB operation. Tests indicate that for a 6 MHz channel, COFDM provides a usable data rate of 18.66 Mbps, about 5 percent less than 8-VSB [5]. While a 5 percent data rate difference is relatively small, it has some impact on the ability to provide certain high-definition television programming. This difference makes 8-VSB more suitable for data applications, including emerging broadband services, as well as more appropriate for HDTV programming. The higher data capacity also enables 8-VSB to provide other services, such as multi-channel video and ancillary data, more efficiently. In addition, 8-VSB system operation is significantly more cost-effective for DTV transmission [5]. A COFDM station's construction costs would be higher because of the need for a more powerful transmitter, heavier antenna and transmission lines, and possibly a stronger tower. The cost of operating the station would also increase because more electric power would be used. Additional power would be needed because COFDM has a higher C/N threshold than 8-VSB and also operates with a higher peak-to-average signal power ratio than 8-VSB. As such, a COFDM station would require substantially more power than an 8-VSB station (6 dB, or four times, more power) to serve the same area [5]. Broadcasters using COFDM would therefore be faced with losing substantial coverage, or incurring significantly higher costs for more powerful transmitters and additional electric power for operation. 8-VSB also exhibits more immunity to impulse noise than COFDM. Impulse noise occurs particularly in the VHF band and the lower portion of the UHF band. There have been significant problems of interference from impulse noise (RF noise from vacuum cleaners, hair dryers, light dimmers, power lines, etc.)
to COFDM service in Great Britain [5]. COFDM is 8 dB more susceptible than 8-VSB to the impulse noise commonly found in consumer homes. In theory, the 8-VSB and COFDM systems should perform nearly the same in providing service where there is static multipath, but COFDM can generally be expected to perform better in situations where there is dynamic multipath, e.g., in mobile operations. With 8-VSB, multipath reflection, or ghosting, is processed through an adaptive equalizer, and more complex equalizers are needed to handle stronger reflections and longer intervals of reflection. As a single-carrier system, VSB has a higher symbol rate relative to multi-carrier systems such as COFDM. When a signal is transmitted, it meets obstructions such as buildings that scatter it and cause it to take multiple paths to reach its final destination - the receiver. VSB data symbols might not be long enough to withstand multipath echoes without complex adaptive equalization techniques. COFDM's better performance in multipath situations makes it attractive for mobile television viewing. Indeed, COFDM is ideal for Europe because stations in Europe transmit the same signal 100 percent of the time across many borders using single frequency networks [8]. However, COFDM's benefits for large single frequency network operation and mobile service may be inconsistent with the current structure of broadcasting in the United States and other ATSC-compliant countries. In these countries, different programs along with local advertising are broadcast at different times throughout the day depending upon geographic location. Therefore, in order to replicate existing NTSC service with COFDM, it might be necessary to revisit the DTV Table of Allotments [5]. Early evaluations of VSB indicated that it does not support mobile television viewing. As a result, VSB equipment manufacturers are developing ways to solve this problem and that of multipath conditions.
It is expected that devices such as internal antennas will help in overcoming these limitations [8]. It is clear that DTV modulation techniques need to be enhanced, and the competing formats resolved and reconciled, for the success of DTV transmission. Nevertheless, the research, the ideas, the design and implementation, and the analysis presented in this thesis are not limited to any particular modulation format or encoding technique. The approach employed can be adopted for a variety of transmission schemes.

3.3.5 Moving Picture Experts Group (MPEG) Standards

The MPEG group was established to develop a common format for coding and storing digital video and associated audio information [7]. The MPEG standards are an evolving set of standards for video and audio compression developed by MPEG. The MPEG group completed the first phase of its work in 1991 with the deployment of the MPEG-1 standard. MPEG-1 was designed for coding progressive video at a transmission rate of about 1.5 Mbps. It was designed specifically for multimedia CD-ROM (compact disc) applications [13]. MPEG-1 audio layer 3 (MP3) also evolved from early MPEG work. In response to a need for greater input format flexibility, higher data rates, and better error resilience, the MPEG-2 standard was developed [7]. MPEG-2 was designed for coding interlaced images at transmission rates above 4 Mbps. MPEG-2 is used for digital television broadcast and digital versatile disk (DVD) content. An MPEG-2 player is backward compatible, meaning it can also handle MPEG-1 data [6]. A proposed MPEG-3 standard, intended for HDTV, was merged with the MPEG-2 standard when it became clear that MPEG-2 met the HDTV requirements. An MPEG-4 standard is in the final stages of development and release. It is a much more ambitious standard and addresses speech and video synthesis, fractal geometry, computer visualization, and an artificial intelligence (AI) approach for reconstructing images [6].
An MPEG-7 standard is also being discussed, but is still in its conceptual stage.

Moving Picture Experts Group standard 2 (MPEG-2)

The MPEG-2 encoding standard is the accepted compression technique for all sorts of new products and services - from satellite broadcasting to DVD to the new terrestrial DTV transmission [6]. MPEG-2 video compression exploits spatial and temporal redundancies occurring in video [7]. Spatial redundancy is exploited by coding each frame separately, with a technique referred to as Intraframe coding [13]. Additional compression can be achieved by taking advantage of the fact that consecutive frames are often almost identical. This temporal compression, which has the potential for major reduction over simply encoding each frame separately, is referred to as Interframe coding [7]. In Intraframe coding, a frequency-based transform (the discrete cosine transform - DCT) algorithm is used to exploit spatial correlations between nearby pixels within an image frame [1]. On the other hand, a motion-compensated prediction algorithm is used in Interframe coding. In this case, the differences between a frame and its preceding frame are calculated and only those differences are encoded [7]. In the simplest form of Interframe coding, the Intraframe technique is used to code the differences between two successive frames. Additional techniques used in MPEG-2 compression include quantization, bi-directional prediction, and Huffman coding. Quantization, a "lossy" compression step, selectively discards information that can be acceptably lost from visual content without affecting how the human eye perceives the image [13]. In bi-directional prediction, some frames are predicted from the content of the frames that immediately precede and immediately follow them. The Huffman coding technique uses code tables based on statistics about the encoded data [6].
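The gain from Interframe coding can be seen in a toy example: when only a small region changes between consecutive frames, the frame-to-frame difference (the residual) is mostly zero and codes far more compactly than a second full frame would. The frame contents below are contrived purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
frame1 = rng.integers(0, 256, size=(8, 8)).astype(np.int16)

# Next frame: identical except for a small "moving object" region
frame2 = frame1.copy()
frame2[2:4, 2:4] += 10

# Interframe coding: transmit only the frame-to-frame difference
residual = frame2 - frame1

# Most of the residual is zero, so it codes far more compactly
nonzero = np.count_nonzero(residual)
print(nonzero, "of", residual.size)      # 4 of 64

# The decoder reconstructs the new frame exactly from reference + residual
assert np.array_equal(frame1 + residual, frame2)
```

An entropy coder such as Huffman coding then spends almost no bits on the long runs of zeros, which is where the bulk of the interframe compression gain comes from.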
The ultimate goal of the MPEG-2 compression technique is bit-rate reduction for storage and transmission of data by exploiting redundancies within that data. By using MPEG-2 coding, broadcasters can transmit digital signals using existing terrestrial, cable, and satellite systems. Both the SDTV and HDTV digital television formats use MPEG-2 compression. SDTV's picture and sound quality is similar to that of a DVD. HDTV programming, on the other hand, presents five times as much information as SDTV, resulting in cinema-quality programming.

3.3.6 Dolby Digital

Dolby Digital is a digital audio coding technique that reduces the amount of data needed to produce high-quality sound by taking advantage of how the human ear processes sound [6]. The fewer the bits used to represent an audio signal, the greater the coding noise; therefore, effective data rate reduction also involves audio noise reduction. Dolby Digital supports a five-channel audio transmission system plus a low-frequency subwoofer channel for full surround sound. In the consumer electronics industry and in ATSC-compliant countries, the Dolby Digital soundtrack is the standard audio format for DVD, SDTV, and HDTV, and is also used for digital cable and satellite transmissions. Dolby Digital coding takes maximum advantage of human auditory masking by dividing the audio spectrum of each channel into narrow frequency bands of different sizes that are optimized with respect to the frequency selectivity of human hearing. When the coding noise is close to the frequency of an audio signal, the audio signal masks the noise such that the human ear hears only the intended audio signal [6]. This property makes it possible to sharply filter out the coding noise by forcing it to stay very close in frequency to the frequency components of the audio signal being coded. Sometimes the coding noise cannot be masked because it is not at the same frequency as an audio signal.
In such cases, the noise must be reduced or eliminated to preserve the sound quality of the original signal. Masking, reducing, or eliminating the noise can reduce the amount of data in the audio signal to 1/10 of the data on a compact disc (CD).

3.4 Transition from Analog to DTV Transmission

The FCC set deadlines for stations in the US to complete the DTV transition process. Commercial television stations must complete construction of DTV facilities by 2002, and public television stations must complete their DTV facilities by 2003. The FCC's schedule for the transition to DTV proposes that everyone in the US should have access to DTV by 2002, although analog transmissions will continue for some time after that year. After the switch to digital has been completed, regular television sets will need converters to receive broadcasts. To make the transition to DTV a smooth one, in the Telecommunications Act of 1996 the FCC allotted to each existing broadcaster an additional 6 MHz channel for digital transmissions, so that broadcasters can send out both analog and digital transmissions simultaneously during the transition period. This procedure is known as simulcasting. Broadcasters must make the transition from analog to digital transmission for several reasons. The key benefit is the high transport efficiency of a digital format broadcast. Digital compression packs five or more times as many channels into a given distribution-network bandwidth relative to analog transmission [11]. This advantage in turn increases the broadcaster's revenue potential by delivering more content to the end-users. Second, all other related segments of the telecommunications industry that are either direct or indirect competitors to terrestrial broadcasting have made, or are in the process of making, the transition to digital.
These competitors include commercial wireless service providers, such as cellular and mobile computers; wired services, such as digital subscriber line (DSL) and cable television systems; and direct broadcast satellites. For broadcasters to remain competitive and improve their services, they must make the transition. Third, the advantages of using digital techniques relative to analog for representing, storing, processing, and transmitting signals are overwhelming [11]. Digital signals are more robust than analog signals: when digital transmission errors occur, they can be detected and corrected. In addition, digital signals can be encrypted with ease. They can also be manipulated and processed using modern computer techniques and, as such, take advantage of the greater processing power and falling costs of computers. Additionally, with digital signals, different types of signals or services can be multiplexed or provided on a common transmission facility with ease. Finally, a successful transition of television broadcast from analog to digital will free up spectrum for other uses as determined by the marketplace [11]. It is possible to do much more with the current 6 MHz channel than what today's analog SDTV provides. With digital technology, we can continue to have traditional broadcast services as well as exciting new broadcaster-provided services that include HDTV, multiple streams of SDTV (statistical multiplexing), and new datacasting services such as DTV IP multicast.

3.5 The PC/Internet-enabled Devices in DTV Transmission

The inception of DTV creates new opportunities for the PC industry and other Internet-enabled devices. Trends in the current DTV market dictate that the PC and other Internet-enabled devices will play a major role in the future of digital broadcasting [9]. Today, the visual computing PC has the processing power and the flexibility to receive digital information of all kinds, including digital broadcast.
It also stores, processes, and presents highly visual information to the viewer. The PC model as a vessel for digital broadcasting presents an interactive medium for DTV transmission. In most households today, the PC is the only programmable device that is connected to the Internet. Nevertheless, other household devices such as PDAs and mobile computers are quickly being transformed into Internet access devices, in the process providing another avenue for DTV transmission. As the PC market grows and its visual computing performance increases, media viewership trends show that Internet viewing hours are increasing while traditional TV viewing is going down [4]. The new digital medium presents an opportunity for broadcasters and content providers to offer new data services alongside interactive programming to the PC-viewing audience. Currently, most PC viewership is conducted via Web streaming, at the expense of increasingly overcrowding the Internet. The DTV transmission infrastructure facilitates better viewership without overburdening the Internet. Additionally, it provides an opportunity to reduce Internet congestion by providing another route for streaming media data and large downloads. PCs are getting cheaper and becoming easier to use. A PC with a large screen display, equipped for DTV reception, is likely to cost far less than a digital television set, and in addition offers increased benefits such as Internet access and other interactive features. Most consumer digital television sets are being introduced at prices between $5,000 and $10,000. A well-equipped PC, with a DTV receiver and a large screen monitor, costs less than half as much, yet offers more functionality [3]. The rate of technology change of the PC and other handheld household devices exceeds that of the mostly passive TV [4].
Thus, new communication technologies are likely to be integrated into these devices faster than into TVs, in part due to the shorter average lifetime of PCs and other handheld household devices relative to TVs. Most major companies in the PC and TV industries support PC Theater [3]. For example, Compaq Computer and Intel have proposed the PC Theater initiative, which would establish "plug and play" standards that would let audio/video devices and PC devices work together. Dell Computer is currently pre-installing DTV receivers (PCI-bus receiver cards) in its products to make them more attractive to end-users. Digital TV tuners that let PCs pick up TV signals cost about $150. Since DTV receivers are projected to fall in price, the use of PCs equipped with digital receivers is projected to be on the rise in the US. As the PC and digital TV convergence begins, the computer industry hopes to dominate by quickly producing and selling many reasonably priced, TV-ready PCs. PC makers estimate that they can have 40 million digital machines in homes by the end of 2001 [3]. By contrast, studies from Forrester Research indicate that the TV industry may not sell 20 million digital sets by then. Also, many consumers may decide to buy converter boxes and keep their analog TVs for a while rather than purchase new digital TVs. Owners who do not want to buy expensive digital sets will be able to receive digital format broadcast programming, but they will not benefit from digital TV's high-quality pictures and sound [3]. IP multicast over the DTV transmission infrastructure presents an attractive avenue for the transmission of large Internet data to the PC Theater environment. Internet data that does not need to be sent in real time can be inserted in a compressed video stream on an as-available time basis. This process can also be extended to real-time data like streaming Web video or Web radio by developing effective buffering techniques at the client.
The delivery capacity of DTV IP multicast "blows away any broadband on the horizon".

3.6 DTV Encoders and Developments in DTV IP Multicast

Most commercially available MPEG-2 transmission encoders use the Constant Bit-Rate (CBR) mode of transmission [7]. CBR is a uniform transmission rate, which means that for varying information content, this transmission mode does not use up all the available bandwidth in a channel. The CBR mode of transmission is designed to guarantee adequate bandwidth for peak data rates during real-time transmission; hence its popularity in real-time voice and video traffic. In Asynchronous Transfer Mode (ATM), for example, CBR guarantees bandwidth for the peak cell rate of the application [2]. However, the individual video frames that are encoded using MPEG-2 compression techniques contain drastically varying amounts of information, resulting in wildly varying encoding requirements from frame to frame in order to maximize bandwidth efficiency. With CBR encoders, the output rate must be selected large enough to achieve a minimum quality for the video frame with the most information, that is, the frame from the most difficult scene to encode. In order to achieve maximum bandwidth efficiency for bursty data traffic, such as an MPEG-2 bitstream, the Variable Bit-Rate (VBR) mode of transmission is more appropriate. However, because TV broadcast needs to be real-time, implementations of VBR that try to leverage the unused bandwidth in the DTV channels result in undesirable delays of the information being transmitted and complicated buffering at the transmitter. As a result, VBR encoding is common in storage applications such as DVDs, but not in transmission applications. However, VBR encoding can be adapted such that bandwidth in DTV channels that would otherwise be idle is used for the transmission of other data services.
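The unused bandwidth left by CBR operation can be quantified with a small example. The per-frame bit demands below are invented purely for illustration; the point is that a CBR channel must be provisioned for the hardest frame, so every easier frame leaves capacity idle that a scheme such as DTV IP multicast could fill with opportunistic data.

```python
# Illustrative per-frame bit demands for one second of 30 fps video:
# two "hard" frames (e.g., scene changes) among mostly easy frames.
frame_bits = [400_000 if i % 15 == 0 else 120_000 for i in range(30)]

peak = max(frame_bits)                 # the hardest frame sets the CBR rate
cbr_capacity = peak * 30               # bits the channel must carry per second
actually_used = sum(frame_bits)

idle_fraction = 1 - actually_used / cbr_capacity
print(f"{idle_fraction:.0%} of the CBR channel is idle")   # 65% of the CBR channel is idle
```

With these contrived numbers, roughly two thirds of the provisioned channel is idle; the real idle fraction depends entirely on the video content, but the mechanism - provisioning for the peak while transmitting the average - is the same.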
VBR transmission leverages more of the unused bandwidth of DTV channels in comparison to the current CBR transmission.

Chapter 4

Theoretical Analysis

4.1 Internet Transport versus DTV IP Multicast

The Transmission Control Protocol (TCP/IP) works very well with one-to-one transmission traffic on an Internet transport system that is not over-congested [18]. This traffic, e.g., e-mail exchanges, does not usually involve the transfer of large data to multiple users, at times simultaneously, as is the case with Internet media streaming. Multimedia traffic, which comprises a significant portion of potential IP multicast traffic, possesses different characteristics and hence requires the use of different protocols to provide the necessary services without over-burdening the Internet transport system. Multimedia applications can generally forego the complexity of the TCP/IP protocols and use instead a simpler transport framework. Most playback algorithms can tolerate missing data much better than the lengthy delays caused by retransmissions that are common in TCP/IP, and they also do not require guaranteed in-sequence delivery [18]. Large data content downloads involve online delivery of streaming media (video/audio), shareware applications, software upgrades, and electronic documentation. Although the TCP/IP protocols ensure that downloads are 100% accurate by their transmission control from router to router, they are not capable of covering many simultaneous users without replicating each transmission [18]. If there are 100 users online requesting the same content, that content has to be transmitted 100 times. To reach many users simultaneously with minimum transport capacity, the use of IP multicast data transfer is preferred. Unfortunately, simple Internet IP multicast leads to more Internet congestion.
In addition, it uses the User Datagram Protocol (UDP/IP), which, unlike TCP/IP, does not support retransmission between routers for error correction [16]. TCP/IP is not applicable in IP multicast implementations because it is a point-to-point protocol that has a return-path requirement for acknowledgement [16]. UDP/IP provides only a best-effort transmission, with errored packets being dropped instead of being retransmitted. On transmission paths with either significant traffic congestion or high error rates, file corruption, and to some extent corruption of streaming media, is highly probable [18]. Using IP multicast over the DTV transmission infrastructure makes it possible to create services that can be simultaneously distributed over the DTV spectrum and the Internet. Further, the use of DTV IP multicast provides broadcasters and content providers with a simple way to use standard computer applications to access data transmitted to computer-hosted receivers. DTV IP multicast also provides the broadcasting system with a simple mechanism for data transport using optimized application-layer protocols for streaming media as well as file transfer. DTV IP multicast operates above the transport layer of the encoded MPEG-2 stream, and the physical nature of the underlying transmission infrastructure is transparent to its implementation. An important challenge facing DTV service designers lies in devising data services that operate in the most common, broadcast-only environment and scale up in user-level functions with the increased capability of the underlying DTV infrastructure [11].

4.2 The Structure of an MPEG-2 Bit-Stream

An MPEG-2 bitstream contains three major frame types: Intracoded (I) frames, which are self-contained still pictures; Predictive (P) frames, which are block-by-block differences from the previous frames; and Bi-directional (B) frames, which are the differences from the previous and next frames [7].
While the I-frames are intracoded, the P-frames and the B-frames are intercoded. The I-frames must therefore appear regularly in the stream, since they are needed to decode the subsequent intercoded frames, that is, the P-frames and the B-frames. Indeed, the decoding process cannot begin until an I-frame is received. An I-frame is usually inserted into a stream approximately every half second. P-frames and B-frames use macroblocks to code interframe differences. A macroblock is composed of 16x16 pixels in the luminance space and 8x8 pixels in the chrominance space for the simplest color format [7]. A macroblock is encoded by searching the previous frame or the next frame for the closest match. In a frame with a fixed background and a moving foreground object, for example, the foreground object can be represented by macroblocks from the previous frame and an offset that represents the motion. While the P-frames only require past frames as a reference to code the differences, the B-frames require both past and future frames. The information content of the three frame types in an MPEG-2 bitstream varies greatly. The I-frames have the largest sizes, while the P-frames tend to be larger than the B-frames.

4.3 Fundamentals of MPEG-2 Video Compression Algorithms

The ultimate goal of video compression algorithms is the bit-rate reduction of data for transmission and storage by exploiting redundancies in the data content and encoding a "minimum set" of information. The performance of video compression techniques depends on the amount of redundancy contained in the image data as well as on the actual compression techniques used for coding. With practical coding schemes, a trade-off between coding performance (high compression with sufficient quality) and implementation complexity is targeted.
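The macroblock search described in Section 4.2 can be sketched as an exhaustive block-matching search over a small window, scoring candidate offsets by the sum of absolute differences (SAD). This is a simplified illustration only; real encoders use much faster search strategies and code the prediction residual along with the motion vector.

```python
import numpy as np

def find_motion_vector(prev, cur, top, left, size=16, search=8):
    """Exhaustive block matching: find the offset into `prev` that best
    matches the size x size block of `cur` at (top, left), by minimum
    sum of absolute differences (SAD)."""
    block = cur[top:top + size, left:left + size].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue                              # candidate falls off the frame
            cand = prev[y:y + size, x:x + size].astype(int)
            sad = np.abs(block - cand).sum()
            if best_sad is None or sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best

# Synthetic test: the new frame is the old one shifted down 3 and right 2
rng = np.random.default_rng(2)
prev = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)
cur = np.roll(np.roll(prev, 3, axis=0), 2, axis=1)

print(find_motion_vector(prev, cur, 24, 24))   # (-3, -2): best match lies up-left
```

Here the SAD at the true offset is exactly zero, so the search recovers the motion vector; with real video the minimum SAD is merely small, and the nonzero residual is what gets transform-coded.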
For the development of the MPEG-2 compression algorithms, the consideration of the capabilities of current and future technologies as guided by the existing standards was most important [13]. Depending on the application's requirements, the compression of data may be "lossless" or "lossy". The aim of "lossless" coding is to reduce video data for transmission and storage while retaining the quality of the original images prior to encoding. In contrast, the aim of "lossy" coding techniques - which are more relevant to the applications envisioned by the MPEG-2 video standards - is to meet a given target bit-rate for transmission and storage [13]. The DTV transport infrastructure demands applications that work efficiently with communication channels that have low or constrained bandwidth. In these applications, high video compression is achieved by degrading the video quality such that the decoded image quality is reduced (to an acceptable level) compared to the quality of the original images prior to encoding. The smaller the capacity of the channel, the higher the necessary compression of the video data. The ultimate aim of lossy coding techniques is to optimize image quality for a given target bit-rate subject to an optimization criterion.

4.4 The MPEG-2 Video Coding Techniques

The MPEG-2 digital video coding techniques are statistical in nature. Video sequences usually contain statistical redundancies in both temporal and spatial directions [7]. The basic statistical property upon which MPEG-2 compression techniques rely is the inter-pixel correlation, including the assumption of simple correlated motion between consecutive frames. Thus, it is assumed that the value of a particular image pixel can be predicted from nearby pixels within the same frame (using Intraframe coding techniques) or from pixels of a nearby frame (using Interframe techniques).
It is clear that in some circumstances, e.g., during scene changes of a video sequence, the temporal correlation between pixels in nearby frames is small or even vanishes, to the extent that the video scene resembles a collection of uncorrelated still images. In such cases, Intraframe coding techniques are appropriate to exploit spatial correlation in order to achieve efficient data compression. The MPEG-2 compression algorithms employ Discrete Cosine Transform (DCT) coding techniques on image blocks of 8x8 pixels for Intraframe coding [1]. However, if the correlation between pixels in nearby frames is high, i.e., in cases where two consecutive frames have similar or identical content, it is desirable to use Interframe coding techniques that employ temporal prediction (motion compensated prediction between frames). In MPEG-2 video coding schemes, an adaptive combination of temporal motion compensated prediction followed by transform coding of the remaining spatial information is used to achieve high data compression.

4.4.1 Intraframe Coding Techniques - Transform Domain Coding - DCT

The term Intraframe coding refers to the fact that the various compression techniques are performed relative to the information contained only within the current frame, and not relative to any other frame in the video sequence. In other words, no temporal processing is performed outside of the current picture or frame. The basic processing block of Intraframe coding is the Discrete Cosine Transform (DCT). In general, neighboring pixels within an image tend to be highly correlated. As such, it is desirable to use an invertible transform to concentrate the signal energy into a few decorrelated parameters. The DCT is near optimal for a large class of images in decomposing the signal into its underlying spatial frequencies. In Fourier analysis, a signal is decomposed into weighted sums of orthogonal sines and cosines that can be added together to reproduce the original signal.
Besides decorrelation of the signal data, the other important property of the DCT is its efficient energy concentration. Because the DCT implies an even-symmetric extension of each block, sharp discontinuities at the block boundaries are avoided, allowing the energy to be concentrated towards the lower end of the frequency spectrum. Consequently, Transform Domain Coding is a very popular compression method for still image coding and video coding. Once the Intraframe image content has been decorrelated, the transform coefficients, rather than the original pixels of the image, are encoded [13]. In this process, the input images are split into disjoint blocks of pixels, b, of size NxN pixels. The transformation can be represented as a matrix operation using an NxN transformation matrix, A, to obtain the NxN transform coefficients, c, i.e.,

c = Ab

The transformation is reversible, hence the original NxN block of pixels, b, can be reconstructed from c using the inverse transformation, i.e.,

b = A^(-1)c

(for the orthonormal DCT, A^(-1) = A^T). Among many possible alternatives, the DCT applied to small image blocks of usually 8x8 pixels has become the most successful transform for still image and video coding [13]. In addition to their high decorrelation performance, DCT-based implementations are also used in most image and video coding standards due to the availability of fast DCT algorithms suitable for real-time implementations. The major objective of the DCT-based algorithms is to make as many transform coefficients as possible small enough that they do not need to be coded for transmission [1]. At the same time, it is desirable to minimize statistical dependencies between coefficients with the aim of reducing the number of bits needed to encode the remaining coefficients. Coefficients with small variances (the variability of the coefficients as averaged over a large number of frames) are less significant for reconstruction of the image blocks than coefficients with large variances.
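The transform pair c = Ab and b = A^(-1)c can be illustrated with a small, self-contained Python sketch that builds the orthonormal DCT-II matrix for N = 8 and verifies both the perfect reconstruction and the concentration of energy into the lowest-order (DC) coefficient. The sample pixel row is arbitrary illustrative data, not taken from the thesis:

```python
import math

def dct_matrix(n=8):
    """Orthonormal DCT-II transformation matrix A (n x n), as nested lists."""
    A = [[math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n)]
         for k in range(n)]
    for i in range(n):
        A[0][i] *= math.sqrt(1.0 / n)        # DC row scaling
    for k in range(1, n):
        for i in range(n):
            A[k][i] *= math.sqrt(2.0 / n)    # AC row scaling
    return A

def matvec(A, b):
    return [sum(a * x for a, x in zip(row, b)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

A = dct_matrix(8)
b = [52.0, 55, 61, 66, 70, 61, 64, 73]       # one row of sample pixel values

c = matvec(A, b)                  # forward transform: c = A b
b_rec = matvec(transpose(A), c)   # inverse: b = A^(-1) c = A^T c (orthonormal A)

print(max(abs(x - y) for x, y in zip(b, b_rec)))  # ~0: perfect reconstruction
print(round(c[0], 1))             # 177.5, the dominant DC coefficient
```

For this smooth block of pixels, the DC coefficient carries most of the energy while the higher-order coefficients are small, which is exactly the property the quantization stage exploits.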
On average, only a small number of DCT coefficients need to be transmitted to the receiver to obtain a valuable approximate reconstruction of the image blocks [1]. Further, the most significant DCT coefficients are concentrated around the low-order DCT coefficients, and the significance of the coefficients decays with increasing distance from the lowest-order (DC) coefficient. This implies that higher DCT coefficients are less important for reconstruction than lower DCT coefficients. The DCT is closely related to the Discrete Fourier Transform (DFT) [20]; therefore, DCT coefficients can be given a frequency interpretation close to that of the DFT. Thus, low DCT coefficients relate to low spatial frequencies within image blocks and high DCT coefficients to high frequencies. This property is used in DCT-based coding schemes to remove redundancies contained in the image data based on human visual system criteria. The human viewer is more sensitive to reconstruction errors related to low spatial frequencies than to high frequencies. Therefore, a frequency-adaptive weighting (quantization) of the coefficients according to human visual perception (perceptual quantization) is often employed to improve the visual quality of the decoded images for a given bit rate [13].

4.4.2 Interframe Coding Techniques - Motion Compensated Prediction

The previously discussed Intraframe coding technique is limited to processing the video signal on a spatial basis, relative only to information within the current video frame. Considerably more compression efficiency can be obtained, however, if the inherent temporal (time-based) redundancies are exploited as well. Temporal processing to exploit this redundancy uses a technique known as Motion Compensated Prediction, which relies on motion estimation. Starting with an Intraframe (I-frame), the encoder can forward predict a future frame. This frame is commonly referred to as a P-frame, and it may also be predicted from other P-frames, although only in a forward time manner.
Each P-frame in this sequence is predicted from the frame immediately preceding it, whether it is an I-frame or a P-frame. Unlike P-frames, I-frames are coded spatially with no reference to any other frame in the sequence. The encoder also has the option of using a combination of forward and backward interpolated prediction. These frames are commonly referred to as bi-directional interpolated prediction frames, or just B-frames. The B-frames are coded based on a forward prediction from a previous I-frame or P-frame, as well as a backward prediction from a succeeding I-frame or P-frame. The main advantage of the usage of B-frames is coding efficiency. In most cases, B-frames will result in fewer bits being coded overall. Quality can also be improved in the case of moving objects that reveal hidden areas within a video sequence. Backward prediction in this case allows the encoder to make more intelligent decisions on how to encode the video within these areas. Also, since B-frames are not used to predict future frames, errors generated will not be propagated further within the sequence. As a result, the majority of the frame types in an MPEG-2 bitstream are B-frames. The temporal prediction technique used in MPEG-2 video is based on motion estimation. The basic premise of motion estimation is that in most cases, consecutive video frames will be similar except for changes induced by objects moving within the frames. In the trivial case of zero motion between frames (and no other differences caused by noise), it is easy for the encoder to efficiently predict the current frame as a duplicate of the prediction frame. When this is done, the only information necessary to transmit to the decoder is the syntactic overhead needed to reconstruct the picture from the original reference frame. When there is motion in the images, the process is not as simple.
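The search for the closest-matching block can be illustrated with a toy full-search (exhaustive) block-matching sketch. The 4x4 block size, the 2-pixel search radius, and the synthetic frames are deliberate simplifications of the 16x16 macroblock search an MPEG-2 encoder performs; they are illustrative choices, not parameters from the thesis:

```python
# Toy full-search block matching: find, for one 4x4 block of the current
# frame, the best-matching block in the previous frame within a +/-2 pixel
# window, using the sum of absolute differences (SAD) as the matching cost.

def sad(prev, cur, bx, by, dx, dy, n=4):
    """Cost of predicting the current block at (bx, by) from the previous
    frame displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - prev[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def motion_vector(prev, cur, bx, by, n=4, r=2):
    """Exhaustively search the +/-r window for the lowest-SAD displacement."""
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # skip displacements that fall outside the previous frame
            if not (0 <= bx + dx and bx + dx + n <= len(prev[0])
                    and 0 <= by + dy and by + dy + n <= len(prev)):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]

# Previous frame: a bright 4x4 square at column 2, row 2. Current frame:
# the same square shifted one pixel to the right.
prev = [[0] * 10 for _ in range(10)]
cur = [[0] * 10 for _ in range(10)]
for j in range(4):
    for i in range(4):
        prev[2 + j][2 + i] = 200
        cur[2 + j][3 + i] = 200

print(motion_vector(prev, cur, bx=3, by=2))  # (-1, 0): content came from one pixel left
```

Instead of transmitting the block's pixels, the encoder can now transmit just the motion vector plus the (here zero) prediction residual, which is the source of the interframe compression gain described above.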
Still, in such a case, the prediction of the actual video frame is given by a motion compensated prediction from a previously coded frame. Motion compensated prediction compression algorithms are a powerful tool in reducing temporal redundancies between frames, and are therefore used extensively in the MPEG-2 video coding standards [13]. In these algorithms, video frames are usually separated into macroblocks of 16x16 pixels, and a single motion vector is used to estimate and code each of these blocks for transmission. Only the difference between the original images and the motion compensated prediction images is transmitted.

4.5 Coding of Bit-Streams - CBR versus VBR

The MPEG-2 encoding standards define methods for multiplexing one or more audio, video, or optional data streams. Each of these streams is packetized and then multiplexed to form a single output stream for transmission [7]. Most commercially available MPEG-2 transmission encoders use the Constant Bit-Rate (CBR) mode of transmission [7]. As figure 4-1 illustrates, multiplexing of different data streams in a CBR mode of transmission is straightforward - the channel bandwidth is apportioned according to the maximum requirement of each stream. This process, however, does not use up all the available bandwidth because the individual video frames that are encoded contain drastically varying amounts of information. With CBR encoders, the bandwidth apportioned to each stream must be large enough to transmit enough data to achieve an acceptable quality of images from the frames with the most information, that is, the I-frames.

Figure 4-1: CBR Multiplexing of Data.

Figure 4-2: VBR Multiplexing of Data.

On the other hand, figure 4-2 illustrates another form of MPEG-2 transmission encoding, the Variable Bit-Rate (VBR) mode of transmission.
This mode of transmission is commonly used in storage applications such as DVDs, but not in current real-time transmission applications because of potential delays and buffering requirements at the transmitter. However, an MPEG-2 bitstream is varying in nature because it is composed of three frame types that vary greatly in size, that is, the I, P, and B frames. Although the VBR mode of transmission presents a challenge in the multiplexing of different data streams, it offers a better opportunity to use up all the bandwidth available in a DTV transmission channel. Given a channel budget, other forms of useful data can be injected into the unused spaces of the bandwidth that are available due to the varying nature of the MPEG-2 bitstream. The injected data is then transmitted alongside this bitstream and extracted at the receiver.

Chapter 5 Experimentation Procedures & Implementation Details

5.1 DTV Transmission Channel Characterization

In the simulation of a DTV transmission channel, a broadcast-quality MPEG-2 video bitstream was used. Both SDTV and HDTV bitstreams were considered for experimentation, but only the analysis and results of the latter are presented. The same strategy that was employed in the experimentation procedures for the HDTV bitstream can easily be adopted for an SDTV bitstream. The HDTV bitstream that was used lasted about 5 minutes, which, although not a very long time, was sufficient for the purposes of this research. This time limit was basically a consequence of the enormous computer hard-disk space requirements associated with HDTV video frames. The characteristics of this bitstream were analyzed with the help of a modified version of the MPEG Software Simulation Group (MSSG) software [15]. The MSSG develops MPEG-2 software with the purpose of providing aid in understanding the various algorithms that comprise an MPEG-2 encoder and decoder, and in the process, giving a sample implementation based on advanced encoding models.
This software project, mostly useful for academic research, is still under ongoing development. The MSSG software can be simply classified as an MPEG-2 bitstream encoder and decoder. The encoder and the decoder were verified using a set of verification pictures, a small bitstream, and a Unix shell script to automatically test the outputs of the encoder and the decoder.

5.1.1 MSSG Encoder Model

The MSSG MPEG-2 encoder software converts an ordered set of uncompressed input video frames into a compressed and coded bitstream sequence compliant with the MPEG-2 compression standards. With various modifications (please refer to appendix B), this software was adapted to NTSC transmission encoding standards (6 MHz channel, 8-VSB encoding, 30 frames/second) to generate the results that are presented. Although not presented, PAL transmission and encoding standards (8 MHz channel, COFDM encoding, 25 frames/second) were also considered. The changes made to this software package included adjusting the parameters for compression and transmission rates to match the standards employed in NTSC transmission. Some tools that were initially designed for this software also assumed the CBR mode of transmission; those tools were replaced with VBR mode of transmission tools.

5.1.2 MSSG Decoder Model

The MSSG MPEG-2 decoder software converts a video bitstream into an ordered set of uncompressed video frames. Just like the encoder, this part of the software was modified to make it user-friendly and consistent with NTSC transmission standards. In order to verify and observe the MPEG-2 bitstream characteristics and properties prior to the decoding process, various aspects of this software were modified accordingly. Other changes made to this software included the modifications necessary to make it work alongside the IP data extraction protocol, which is discussed later.
5.2 Analysis of the MPEG-2 Video Transport Stream

The varying nature of an MPEG-2 video transport stream is clearly illustrated in figure 5-1. This figure represents the varying sizes (in bytes) of the video/image frames in a coded MPEG-2 bitstream. The HDTV MPEG-2 bitstream that was used to generate this data lasted about 5 minutes. In this bitstream sequence, the maximum size of the encoded video frames was 46,440 bytes, while the minimum was 2,848 bytes. This is indeed a huge variation (a range of about 43,592 bytes), but it is consistent with the theoretical analysis and expectations (please refer to chapter 4). From this observation, it is clear that the CBR mode of transmission (without the undesirable buffering and delays at the transmitter) does not use up all the available bandwidth because the individual video frames that are encoded contain drastically varying amounts of information. With CBR encoding, the bandwidth apportioned to the bitstream must be large enough to transmit enough data to achieve an acceptable quality of images from the video frames with the most information, that is, the I-frames. The MPEG-2 bitstream is drastically varying in nature because it is composed of three frame types that vary greatly in size, that is, the I, the P, and the B frames. The B-frames, coded based on a forward prediction from a previous I-frame or P-frame as well as a backward prediction from a succeeding I-frame or P-frame, are extensively used in MPEG-2 bitstreams because of their coding efficiency in terms of space and quality.

Figure 5-1: Varying sizes (in bytes) of a 5-minute-long encoded MPEG-2 video/image frame sequence.

In most cases, B-frames result in the least bits being coded
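The variation quoted for the experimental bitstream follows directly from the two reported extremes. In the short check below, the attribution of the extremes to an I-frame and a B-frame is an assumption consistent with the preceding analysis, not a per-frame measurement reported in the text:

```python
# Frame-size variation of the experimental HDTV bitstream, reproduced from
# the two extremes reported in the text.
max_frame = 46_440   # bytes; the largest encoded frame (presumably an I-frame)
min_frame = 2_848    # bytes; the smallest encoded frame (presumably a B-frame)

variation = max_frame - min_frame
print(variation)                        # 43592 bytes, as stated
print(round(max_frame / min_frame, 1))  # the largest frame is ~16.3x the smallest
```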
overall, followed by the P-frames, while the I-frames require the most bits. As a result, the majority of the frame types in an MPEG-2 bitstream are B-frames, while the fewest are the I-frames. Figure 5-2 is an illustration of this phenomenon. This figure, generated using the MATLAB script in appendix C, is a histogram of the sizes of the encoded video frames in the 5-minute-long MPEG-2 bitstream described above. The total number of video/image frames in this bitstream sequence was about 10,500.

Figure 5-2: A histogram of the sizes of encoded MPEG-2 video/image frames in a 5-minute-long bitstream.

In figure 5-2, the first Gaussian distribution, representing the majority of the video frames, corresponds to the B-frames in the bitstream sequence. The middle Gaussian distribution corresponds to the P-frames, while the third and smallest distribution corresponds to the I-frames. This observation is also consistent with the theoretical analysis and expectations of the structure of an MPEG-2 bitstream. Further, figure 5-3 was generated using the MATLAB "step" function, as illustrated in the script in appendix C. This figure is simply a magnified version of figure 5-1, where only the first 100 frames are considered. From this small sample of frames, it is clear which frames are the I-frames, P-frames, or B-frames. Additionally, the pattern is also clear.

Figure 5-3: Varying sizes (in bytes) of the first 100 encoded video/image frames in an MPEG-2 sequence.
For example, the 7th frame appears to be an I-frame, followed by a sequence of B-frames until the 22nd frame, which is most likely a P-frame. This observation is consistent with the histogram of frame sizes illustrated in figure 5-2.

5.3 Internet IP Data Injecting/Extraction Protocols

The maximum encoded frame size observed in the MPEG-2 bitstream described above was 46,440 bytes (46,440 x 8 = 371,520 bits). The NTSC transmission encoding standards allow for a transmission rate of 30 frames/second. Therefore, for the CBR mode of transmission, the bandwidth requirement for this bitstream would be (371,520 x 30) bits/sec, which is about 11 Mbps. A typical HDTV broadcast bitstream is theoretically expected to use up to a maximum of 16 Mbps of bandwidth. In this case, the extra 5 Mbps of bandwidth could be attributed to the MPEG-2 bitstream sample that was employed not being a complete representation of the characteristics of the entire program bitstream sequence. As such, some encoded video frames that were not observed in the 5-minute duration may have bigger sizes than the observed maximum of 46,440 bytes. Further, broadcasting stations also use some of this extra space for encoding program metadata such as digital TV guides and copyright information. Still, in most cases, there is some space left over at the end that can be used for multiplexing other program sequences or data streams. The VBR mode of transmission generates more extra space than the CBR mode. However, this extra space is "bursty" and unpredictable, and therefore difficult to leverage for real-time transmission applications that multiplex different program data streams, because of potential delays and buffering requirements at the transmitter. Nevertheless, this extra space can efficiently be leveraged for IP multicast applications, and it is therefore a primary focus of this research.
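The CBR bandwidth requirement worked out above is a direct calculation from the observed maximum frame size and the NTSC frame rate: under CBR, every frame slot must be wide enough for the largest frame.

```python
# CBR bandwidth requirement for the experimental bitstream, as derived in
# the text: the worst-case (largest) frame fixes the per-frame slot width.
max_frame_bytes = 46_440
bits_per_frame = max_frame_bytes * 8   # 371,520 bits
frame_rate = 30                        # NTSC frames/second

cbr_bps = bits_per_frame * frame_rate
print(cbr_bps)                   # 11145600 bits/s, i.e. about 11 Mbps
print(round(cbr_bps / 1e6, 1))   # 11.1
```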
The VBR mode of transmission offers a better opportunity for using up all the bandwidth available in a DTV transmission channel, and in the process, provides an alternative route for the transmission of large data content other than the Internet infrastructure. Given a channel budget, Internet data can be injected into the unused spaces of the channel bandwidth available as a result of the varying nature of the MPEG-2 bitstream. The injected data is then transmitted alongside this bitstream and extracted at the receiver. Two protocols, one at the encoder stage and the other at the decoder stage, were designed and implemented (from scratch) to accomplish this task (please refer to appendices D & E) by working alongside the MSSG software.

5.3.1 IP Internet Data Encoding (Injecting) Protocol

This protocol packages packetized Internet data (IP packets) into the unused spectrum of the DTV channel on an as-available basis without corrupting the MPEG-2 bitstream content. The Internet data, which consisted mainly of media and streaming file formats (please refer to appendix A), was first broken down into IP packets. These IP packets were composed of headers, the data itself, and footers. The headers and footers were necessary so that the transmitted data could be reconstructed at the receiver. After the generation of the MPEG-2 bitstream sequence through compression and encoding of the video frames, this protocol began by computing the "local maximum size" of the first 100 encoded video frames in the bitstream sequence. The number of encoded video frames, 100, was chosen based on observations of the nature of the MPEG-2 bitstream that was used for experimentation (please refer to figure 5-3). It is expected that the more encoded video frames in a bitstream sequence are used to determine this number, the better the overall results in determining the extra space that can be leveraged.
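The windowed "local maximum size" computation at the heart of the injecting protocol can be sketched as follows. The helper name and the toy frame sizes are illustrative; the actual implementation is listed in appendix D:

```python
# Sketch of the injecting protocol's space accounting: for each window of
# 100 encoded frames, the "local maximum size" bounds the slot width, and
# every smaller frame leaves (local_max - size) bytes of room for IP packets.

WINDOW = 100

def available_space(frame_sizes, window=WINDOW):
    """Total bytes of unused slot space, accumulated window by window."""
    total = 0
    for start in range(0, len(frame_sizes), window):
        chunk = frame_sizes[start:start + window]
        local_max = max(chunk)   # updated after every 100 transmitted frames
        total += sum(local_max - s for s in chunk)
    return total

# Toy window: one large I-frame followed by smaller P- and B-frames (sizes
# chosen for illustration, loosely echoing the observed extremes).
sizes = [46_440] + [8_000] * 59 + [4_000] * 40
print(available_space(sizes))    # 3965560 bytes of injectable space
```

Even in this toy window, the single large I-frame dominates the slot width, leaving most of each smaller frame's slot free for injected data.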
The "local maximum size" value, which is updated sequentially after the transmission of every 100 encoded video frames is completed, was necessary to determine how much extra space was unused in those 100 encoded video frames. IP packets would then be injected into the encoded video frames with sizes less than the value of the "local maximum size". This process repeats until the end of the MPEG-2 bitstream sequence being transmitted. The IP data injecting protocol takes full advantage of the varying nature of the MPEG-2 bitstream. It was designed to work with the VBR mode of transmission, although it also serves the same purpose in CBR mode. Nevertheless, the focus of this research was to determine how much extra space can be leveraged for Internet IP multicast in a VBR mode of transmission, which would otherwise not be possible in CBR mode. For the 5-minute-long MPEG-2 bitstream that was examined, a total of 356,701,896 bytes (356,701,896 x 8 = 2,853,615,168 bits) of unused space was leveraged. This amounts to approximately 9.5 Mbps of extra bandwidth. Granted, this extra bandwidth is on the high side because the other forms of data that accompany a broadcast program, such as program metadata, were not accounted for. More importantly, a statistical multiplexer also works to multiplex other program bitstreams with varying MPEG-2 bitstreams where possible. A statistical multiplexer seeks to bundle multiple programs into one optimized, multiplexed output stream. It uses high-speed programmable digital signal processors to dynamically adjust bandwidth allocation amongst different program bitstreams by exploiting the variations in encoded MPEG-2 streams. In statistical multiplexing, different program bitstreams are multiplexed to form a single CBR bitstream. However, in this arrangement, if one program has a higher share of the total multiplex bandwidth, the other program streams typically have an aggregate lower share of the multiplex bandwidth.
Therefore, owing to the sporadic nature of an MPEG-2 bitstream, this process cannot be used to entirely leverage all the unused bandwidth in a channel, especially when the bandwidth that is left over is not large enough to multiplex any of the available program bitstreams. Therefore, even though a statistical multiplexer may be able to leverage a significant amount of the extra bandwidth observed in this experiment, the results obtained are a clear indication of the enormous potential of the DTV transmission infrastructure in IP multicast.

5.3.2 IP Internet Data Extraction Protocol

This protocol extracts the injected IP packets from the MPEG-2 bitstream, reconstructs the transmitted data from these packets, and buffers the extracted data to ensure that the data being transmitted can be accessed as it is received, without waiting for the entire streaming file to finish being transmitted. This process took place prior to the decoding of the encoded MPEG-2 bitstream sequence. The longer the duration of the MPEG-2 bitstream, the larger the IP data content that was transmitted alongside the encoded video frames. The implementation of this protocol ensured that once the IP data was extracted from the encoded video frames, the resulting MPEG-2 bitstream sequence was identical to the original bitstream prior to the injection of the IP data at the transmitter. In order to view the received IP data in real time, a very big buffer (on the order of several gigabytes) must be implemented at the receiver. Where this implementation is not possible, the data can be put into a streaming file such that new incoming data is appended to this file as the file is accessed from the top.
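The extraction step can be sketched with a toy framing format. The delimiters and helper names below are hypothetical stand-ins for the header/footer structure defined in appendix E; the sketch only illustrates the requirement that the recovered bitstream be byte-identical to the original:

```python
# Toy sketch of the extraction protocol: injected packets are delimited by
# header/footer markers (hypothetical here) so the receiver can strip them
# out of a received frame and restore the original encoded frame exactly.

HDR, FTR = b"<IP>", b"</IP>"

def extract(frame: bytes):
    """Return (clean_frame, payloads) for one received frame."""
    payloads = []
    while True:
        start = frame.find(HDR)
        if start < 0:
            return frame, payloads
        end = frame.find(FTR, start)
        payloads.append(frame[start + len(HDR):end])
        frame = frame[:start] + frame[end + len(FTR):]

video = b"\x00\x00\x01MPEG2-DATA"                       # stand-in frame bytes
received = video + HDR + b"packet-1" + FTR + HDR + b"packet-2" + FTR

clean, packets = extract(received)
assert clean == video       # the bitstream is identical to the original
print(packets)              # [b'packet-1', b'packet-2']
```

In a streaming receiver, each recovered payload would be appended to the output buffer (or streaming file) as it arrives, while the cleaned frames are passed on to the MPEG-2 decoder.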
5.4 Results Analysis and Discussion

The research objective was clearly met, that is, implementing IP multicast protocols that enable spectrum sharing of DTV content and Internet IP data while enhancing DTV spectrum efficiency, and therefore providing an alternative route for Internet streaming media data other than the Internet transport infrastructure. The alternative data route is critical for reducing Internet congestion, mainly caused by multiple simultaneous downloads of large amounts of popular data. The implementation discussed above is based on a PC DTV viewing environment, although it could be adapted for digital set-top boxes and digital TV sets that support software. The results from the implementation illustrate that using the DTV transmission infrastructure can significantly increase the transmission speed of popular Internet data to PC DTV viewers. The research approach and the implementation aim to foster an economic model whereby broadcasters exchange unused bandwidth with content providers for revenue without complicating the transmission scheme at the transmitter. The enormous potential of IP multicast rests with the DTV transmission infrastructure, where data is delivered just once but to many recipients. This process allows for the download of vast amounts of data to communities of users over a high-speed data link, that is, the unused spectrum on a DTV channel. The significant extra bandwidth in DTV transmission channels clearly illustrates the advantage of using the DTV transmission infrastructure for IP multicast. By contrast, current Internet IP multicast transmission rates vary from tens of kilobits per second (Kbps) to a few megabits per second (Mbps), depending on the structure of the network between the server and the clients. The Internet infrastructure is made up of a myriad of independent crisscrossing networks that employ a backbone structure made up of numerous routers and servers.
The variations in delays from these servers and routers are the primary cause of the poor performance of streaming media over the Internet. With the rapid increase in Internet usage, and the subsequent congestion, this performance is not likely to improve. IP multicast over DTV channels eliminates the obstacles to getting rich media and other large content to users because there are no routers in the path to introduce delays or overloaded servers. Further, since IP multicast over DTV channels is a broadcast data service, even explosive growth of the user base has no effect on performance. With the advent of DTV transmission, the potential exists for digital broadcasting to play a role as a high-speed unidirectional "overlay" to the Internet. Over time, data that is accessed by many people can be broadcast, leaving the traditional bi-directional Internet more available for true point-to-point communications, such as e-commerce and video teleconferencing.

Chapter 6 Conclusion

The very structure of the Internet is contrary to the delivery of rich-media content and other forms of large data that are accessed simultaneously by users. The Internet is made up of a myriad of independent crisscrossing networks that employ a backbone structure with numerous routers and servers. The variations in delays from these servers and routers are the primary cause of the poor performance of streaming media over the Internet. The Internet was not designed to deliver rich media on a large scale and in real time. The availability and access of video, music, games, software, and other forms of rich media is growing faster than the Internet network can keep up with. Even more challenging, the user population is growing even faster, with each new user placing an independent demand on the network. Implementations of Internet IP multicast have not solved this problem.
However, IP multicast over DTV channels eliminates the obstacles to getting rich media and other large content to users because there are no routers in the path to introduce delays or overloaded servers. Further, since IP multicast over DTV channels is a broadcast data service, even explosive growth of the user base has no effect on performance. With the advent of DTV transmission, the potential exists for digital broadcasting to play a role as a high-speed unidirectional "overlay" to the Internet. Over time, data that is accessed by many people can be broadcast, leaving the traditional bi-directional Internet more available for true point-to-point communications, such as e-commerce and video teleconferencing. The research conducted implemented IP multicast protocols that enable spectrum sharing of DTV content and Internet IP data while enhancing DTV spectrum efficiency, and therefore providing an alternative route for Internet streaming media data other than the Internet transport infrastructure. The alternative data route is critical for reducing Internet congestion, mainly caused by multiple simultaneous downloads of large amounts of popular data. The implementation conducted was based on a PC DTV viewing environment, although it could be adapted for digital set-top boxes and digital TV sets that support software. The results from the implementation illustrate that using the DTV transmission infrastructure can significantly increase the transmission speed of popular Internet data to PC DTV viewers. The research approach and the implementation aim to foster an economic model whereby broadcasters exchange unused bandwidth with content providers for revenue without complicating the transmission scheme at the transmitter. The enormous potential of IP multicast rests with the DTV transmission infrastructure, where data is delivered just once but to many recipients.
This process allows for the download of vast amounts of data to communities of users over a high-speed data link, namely the unused spectrum on a DTV channel. The significant extra bandwidth in DTV transmission channels clearly illustrates the advantage of using the DTV transmission infrastructure for IP multicast. By contrast, current Internet IP multicast transmission rates vary from tens of kilobits per second (Kbps) to a few megabits per second (Mbps), depending on the structure of the network between the server and the clients. With the rapid increase in Internet usage, and the congestion that follows, this performance is not likely to improve.

Chapter 7

Recommendations

7.1 Limitations

The main limitation in conducting the experiments and procedures for this research was disk space and processing power. HDTV video frames (MPEG-2 files, in general) take a lot of space. Processing these files on a desktop computer also demanded considerable time, especially given the huge number of video frames in even a short bitstream sequence. While this limitation can be overcome with more hard-disk space and processing power, there are other challenging obstacles facing DTV transmission. The uncertainty in standards and modulation formats may stall the transition to DTV transmission. The current controversy over 8-VSB and COFDM could lead to problems in the deployment of DTV broadcasting. Additionally, each DTV broadcasting channel is limited to a 6 MHz (19.4 Mbps) capacity, even with the development and adoption of more efficient modulation formats. Cable and satellite transmission networks, on the other hand, can increase their bandwidth more readily. Unlike satellite and cable transmission, terrestrial (wireless) broadcasting suffers from multipath interference. This problem needs to be solved, especially in the US and other ATSC-compliant countries, for DTV transmission to be successful.
Additionally, satellite and cable transmissions are likely to evolve faster than terrestrial broadcasting. An example of such evolution is cable transmission in which content is carried as IP packets instead of the current bitstreams, so that bandwidth can be used more effectively and the switching schemes improved. Finally, mobile reception of DTV IP multicast is still in question. It is not yet clear how to implement an affordable, reliable mobile receiver that can extract the transmitted data from a DTV channel.

7.2 Future Work

The PC DTV IP multicast model that has been presented can be adapted for other net-connected devices, such as digital TVs, digital set-top boxes and mobile devices. Further, transmission models that allow broadcasters to combine their unused bandwidth into a single cumulative high-speed data link for IP multicast can also be developed. Finally, more research is needed on how to develop affordable receivers for DTV IP multicast in order to accommodate mobile data reception.

Appendix A

Media & Streaming File Formats

A.1 Media File Formats

The content from broadcast media can be stored in several digital file formats. Table A.1 shows a selection of common standard media file formats for video and audio representations.

A.2 Streaming File Formats

A streaming file format is one that has been specially encoded so that it can be played while it downloads, instead of having to wait for the whole file to download. Some form of compression is usually part of the streaming format. It is possible to stream some standard media file formats; however, it is usually more efficient to encode them into streaming file formats. A streaming file format includes extra information such as timing, compression and copyright information. Table A.2 below shows a selection of common streaming file formats.
Table A.1: Media File Formats

Media Type and Name (Video/Audio)            File Format Extension
Quicktime Video                              .mov
MPEG Video                                   .mpg
MPEG Layer 3 Audio                           .mp3
Wave Audio                                   .wav
Audio Interchange Format                     .aif
Sound Audio File Format                      .snd
Audio File Format (Sun OS)                   .au
Audio Video Interleaved (Microsoft Win)      .avi

Table A.2: Streaming File Formats

Media Type and Name (Video/Audio)            File Format Extension
Advanced Streaming Format (Microsoft)        .asf
Real Video/Audio file (Progressive Networks) .rm
Real Audio file (Progressive Networks)       .ra
Real Pix file (Progressive Networks)         .rp
Real Text file (Progressive Networks)        .rt
Shock Wave Flash (Macromedia)                .swf
Vivo Movie File (Vivo Software)              .viv

Appendix B

MPEG-2 Bit-Stream Codec Model

With modest changes, the MPEG Software Simulation Group (MSSG) [15] software was adopted for MPEG-2 bitstream encoding and decoding. Presented here is the parameter file for the encoding and decoding process, with the changes that were employed to achieve NTSC transmission standards, as well as the usage instructions.

B.1 MPEG-2 Codec Parameter File

MPEG-2 Test Sequence, 30 frames/sec
testd        /* name of source files */
qYd          /* name of reconstructed images ("-": don't store) */
-            /* name of intra quant matrix file ("-": default matrix) */
-            /* name of non intra quant matrix file ("-": default matrix) */
stat.out     /* name of statistics file ("-": stdout) */
0            /* input picture file format: 0=*.Y,*.U,*.V, 1=*.yuv, 2=*.ppm */
150          /* number of frames */
0            /* number of first frame */
00:00:00:00  /* timecode of first frame */
15           /* N (# of frames in GOP) */
3            /* M (I/P frame distance) */
0            /* ISO/IEC 11172-2 stream */
0            /* 0:frame pictures, 1:field pictures */
704          /* horizontal_size */
480          /* vertical_size */
2            /* aspect_ratio_information 1=square pel, 2=4:3, 3=16:9, 4=2.21:1 */
5            /* frame_rate_code 1=23.976, 2=24, 3=25, 4=29.97, 5=30 frames/sec. */
5000000.0    /* bit_rate (bits/s) */
112          /* vbv_buffer_size (in multiples of 16 kbit) */
0            /* low_delay */
0            /* constrained_parameters_flag */
4            /* Profile ID: Simple = 5, Main = 4, SNR = 3, Spatial = 2, High = 1 */
8            /* Level ID: Low = 10, Main = 8, High 1440 = 6, High = 4 */
0            /* progressive_sequence */
1            /* chroma_format: 1=4:2:0, 2=4:2:2, 3=4:4:4 */
2            /* video_format: 0=comp., 1=PAL, 2=NTSC, 3=SECAM, 4=MAC, 5=unspec. */
5            /* color_primaries */
5            /* transfer_characteristics */
4            /* matrix_coefficients */
704          /* display_horizontal_size */
480          /* display_vertical_size */
0            /* intra_dc_precision (0: 8 bit, 1: 9 bit, 2: 10 bit, 3: 11 bit) */
1            /* top_field_first */
0 0 0        /* frame_pred_frame_dct (I P B) */
0 0 0        /* concealment_motion_vectors (I P B) */
1 1 1        /* q_scale_type (I P B) */
1 0 0        /* intra_vlc_format (I P B) */
0 0 0        /* alternate_scan (I P B) */
0            /* repeat_first_field */
0            /* progressive_frame */
0            /* P distance between complete intra slice refresh */
0            /* rate control: r (reaction parameter) */
0            /* rate control: avg_act (initial average activity) */
0            /* rate control: Xi (initial I frame global complexity measure) */
0            /* rate control: Xp (initial P frame global complexity measure) */
0            /* rate control: Xb (initial B frame global complexity measure) */
0            /* rate control: d0i (initial I frame virtual buffer fullness) */
0            /* rate control: d0p (initial P frame virtual buffer fullness) */
0            /* rate control: d0b (initial B frame virtual buffer fullness) */
2 2 11 11    /* P:  forw_hor_f_code forw_vert_f_code search_width/height */
1 1 3 3      /* B1: forw_hor_f_code forw_vert_f_code search_width/height */
1 1 7 7      /* B1: back_hor_f_code back_vert_f_code search_width/height */
1 1 7 7      /* B2: forw_hor_f_code forw_vert_f_code search_width/height */
1 1 3 3      /* B2: back_hor_f_code back_vert_f_code search_width/height */

B.2 Encoder Usage of MSSG Software

/* name of source frame files */

A printf format string defining the name of the input files.
It has to contain exactly one numerical descriptor (%d, %x, etc.). Example: frame%02d. The encoder then looks for the files frame00, frame01, frame02, ... The encoder adds an extension (.yuv, .ppm, etc.) which depends on the input file format. Input files have to be in frame format, containing two interleaved fields (for interlaced video).

/* name of reconstructed frame files */

This user parameter tells the encoder what name to give the reconstructed frames. These frames are identical to the frame reconstructions of decoders following normative guidelines (except, of course, for differences caused by different IDCT implementations). Specifying a name starting with - (or just - by itself) disables output of reconstructed frames. The reconstructed frames are always stored in Y,U,V format (see below), independent of the input file format.

/* name of intra quant matrix file ("-": default matrix) */

Setting this to a value other than - specifies a file containing a custom intra quantization matrix to be used instead of the default matrix specified in ISO/IEC 13818-2 and 11172-2. This file has to contain 64 integer values (range 1...255) separated by white space (blank, tab, or newline), one corresponding to each of the 64 DCT coefficients. They are ordered line by line, i.e. in v-u frequency matrix order (not by the zig-zag pattern used for transmission). The file intra.mat contains the default matrix as a starting point for customization. It is neither necessary nor recommended to specify the default matrix explicitly. Large values correspond to coarse quantization and consequently more noise at that particular spatial frequency. For the intra quantization matrix, the first value in the file (the DC value) is ignored. Use the parameter intra_dc_precision (see below) to define the quantization of the DC value.
/* name of non intra quant matrix file ("-": default matrix) */

This parameter field follows the same rules as described for the intra quant matrix parameter above, but specifies the file for the NON-INTRA coded (predicted/interpolated) blocks. In this case the first coefficient of the matrix is NOT ignored. The default matrix uses a constant value of 16 for all 64 coefficients (a flat matrix is thought to statistically minimize mean square error). The file inter.mat contains an alternate matrix, used in the MPEG-2 test model.

/* name of statistics file */

Statistics output is stored into the specified file. - directs statistics output to stdout.

/* input picture file format */

A number defining the format of the source input frames.

Code  Format description
0     separate files for luminance (.Y extension) and chrominance (.U, .V); all files are in headerless 8 bit per pixel format. .U and .V must correspond to the selected chroma_format (4:2:0, 4:2:2, 4:4:4, see below). Note that in this document, Cb = U and Cr = V. This format is also used in the Stanford PVRG encoder.
1     similar to 0, but concatenated into one file (extension .yuv). This is the format used by the Berkeley MPEG-1 encoder.
2     PPM, Portable PixMap; only the raw format (P6) is supported.

/* number of frames */

This defines the length of the sequence in integer units of frames.

/* number of first frame */

Usually 0 or 1, but any other (positive) value is valid.

/* timecode of first frame */

This line is used to set the timecode encoded into the first 'Group of Pictures' header. The format is based on the SMPTE style: hh:mm:ss:ff (hh=hour, mm=minute, ss=second, ff=frame (0..picture_rate-1)).

/* N (# of frames in GOP) */

This defines the distance between I frames (and 'Group of Pictures' headers). Common values are 15 for 30 Hz video and 12 for 25 Hz video.

/* M (I/P frame distance) */

Distance between consecutive I or P frames. Usually set to 3. N has to be a multiple of M.
M = 1 means no B frames in the sequence. (In a future edition of this program, M=0 will mean only I frames.)

/* ISO/IEC 11172-2 stream */

Set to 1 if you want to generate an MPEG-1 sequence. In this case some of the subsequent MPEG-2 specific values are ignored.

/* picture format */

0 selects frame picture coding, in which both fields of a frame are coded simultaneously; 1 selects field picture coding, where fields are coded separately. The latter is permitted for interlaced video only.

/* horizontal_size */

Pixel width of the frames. It does not need to be a multiple of 16. You have to provide a correct value even for PPM files (the PPM file header is currently ignored).

/* vertical_size */

Pixel height of the frames. It does not need to be a multiple of 16. You have to provide a correct value even for PPM files (the PPM file header is currently ignored).

/* aspect_ratio_information */

Defines the display aspect ratio. Legal values are:

Code  Meaning
1     square pels
2     4:3 display
3     16:9 display
4     2.21:1 display

MPEG-1 uses a different coding of aspect ratios. In this case codes 1 to 14 are valid.

/* frame_rate_code */

Defines the frame rate (for interlaced sequences: field rate is twice the frame rate). Legal values are:

Code  Frames/sec   Meaning
1     24000/1001   23.976 fps -- NTSC encapsulated film rate
2     24           standard international cinema film rate
3     25           PAL (625/50) video frame rate
4     30000/1001   29.97 fps -- NTSC video frame rate
5     30           NTSC drop-frame (525/60) video frame rate
6     50           double frame rate/progressive PAL
7     60000/1001   double frame rate NTSC
8     60           double frame rate drop-frame NTSC

/* bit_rate */

A positive floating point value specifying the target bitrate, in units of bits/sec.

/* vbv_buffer_size (in multiples of 16 kbit) */

Specifies, according to the Video Buffering Verifier decoder model, the size of the bitstream input buffer required in downstream decoders in order for the sequence to be decoded without underflows or overflows.
You probably will wish to leave this value at 112 for MPEG-2 Main Profile at Main Level, and at 20 for MPEG-1 Constrained Parameters Bitstreams.

/* low_delay */

When set to 1, this flag specifies that the encoder operates in low delay mode. Essentially, no B pictures are coded and a different rate control strategy is adopted which allows picture skipping and VBV underflows. This feature has not yet been implemented. Please leave at zero for now.

/* constrained_parameters_flag */

Always 0 for MPEG-2. You may set this to 1 if you encode an MPEG-1 sequence which meets the parameter limits defined in ISO/IEC 11172-2 for constrained parameter bitstreams:

horizontal_size      <= 768
vertical_size        <= 576
picture area         <= 396 macroblocks
pixel rate           <= 396x25 macroblocks per second
vbv_buffer_size      <= 20x16384 bit
bit_rate             <= 1856000 bits/second
motion vector range  <= -64...63.5

/* Profile ID */

Specifies the subset of the MPEG-2 syntax required for decoding the sequence. All MPEG-2 sequences generated by the current version of the encoder are either Main Profile or Simple Profile sequences.

Code  Meaning                     Typical use
1     High Profile                production equipment requiring 4:2:2
2     Spatially Scalable Profile  simulcasting
3     SNR Scalable Profile        simulcasting
4     Main Profile                95% of TVs, VCRs, cable applications
5     Simple Profile              low cost memory, e.g. no B pictures

/* Level ID */

Specifies coded parameter constraints, such as bitrate, sample rate, and maximum allowed motion vector range.

Code  Meaning          Typical use
4     High Level       HDTV production rates: e.g. 1920 x 1080 x 30 Hz
6     High 1440 Level  HDTV consumer rates: e.g. 1440 x 960 x 30 Hz
8     Main Level       CCIR 601 rates: e.g. 720 x 480 x 30 Hz
10    Low Level        SIF video rate: e.g. 352 x 240 x 30 Hz

/* progressive_sequence */

0 in the case of sequences containing interlaced video (e.g. video camera source), 1 for progressive video (e.g. film source).
/* chroma_format */

Specifies the resolution of chrominance data.

Code  Meaning
1     4:2:0  half resolution in both dimensions (most common format)
2     4:2:2  half resolution in horizontal direction (High Profile only)
3     4:4:4  full resolution (not allowed in any currently defined profile)

/* video_format: 0=comp., 1=PAL, 2=NTSC, 3=SECAM, 4=MAC, 5=unspec. */

/* color_primaries */

Specifies the x, y chromaticity coordinates of the source primaries.

Code  Meaning
1     ITU-R Rec. 709 (1990)
2     unspecified
4     ITU-R Rec. 624-4 System M
5     ITU-R Rec. 624-4 System B, G
6     SMPTE 170M
7     SMPTE 240M (1987)

/* transfer_characteristics */

Specifies the opto-electronic transfer characteristic of the source picture.

Code  Meaning
1     ITU-R Rec. 709 (1990)
2     unspecified
4     ITU-R Rec. 624-4 System M
5     ITU-R Rec. 624-4 System B, G
6     SMPTE 170M
7     SMPTE 240M (1987)
8     linear transfer characteristics

/* matrix_coefficients */

Specifies the matrix coefficients used in deriving luminance and chrominance signals from the green, blue, and red primaries.

Code  Meaning
1     ITU-R Rec. 709 (1990)
2     unspecified
4     FCC
5     ITU-R Rec. 624-4 System B, G
6     SMPTE 170M
7     SMPTE 240M (1987)

/* display_horizontal_size */
/* display_vertical_size */

display_horizontal_size and display_vertical_size specify the "intended display's" active region (which may be smaller or larger than the encoded frame size).

/* intra_dc_precision */

Specifies the effective precision of the DC coefficient in MPEG-2 intra coded macroblocks. 10 bits usually achieves quality saturation.

Code  Meaning
0     8 bit
1     9 bit
2     10 bit
3     11 bit

/* top_field_first */

Specifies which of the two fields of an interlaced frame comes earlier. The top field corresponds to what is often called the "odd field," and the bottom field is also sometimes called the "even field."

Code  Meaning
0     bottom field first
1     top field first

/* frame_pred_frame_dct (I P B) */

Setting this parameter to 1 restricts motion compensation to frame prediction and DCT to frame DCT.
You have to specify this separately for I, P and B picture types.

/* concealment_motion_vectors (I P B) */

Setting these three flags informs the encoder whether or not to generate concealment motion vectors for intra coded macroblocks in the three respective coded picture types. This feature is mostly useful in intra-coded pictures, but may also be used in low-delay applications (which attempt to exclusively use P pictures for video signal refresh, saving the time it takes to download a coded intra picture across a channel). Concealment motion vectors in B pictures are rather pointless, since there is no error propagation from B pictures. This feature is currently not implemented. Please leave the values at zero.

/* q_scale_type (I P B) */

These flags set linear (0) or non-linear (1) quantization scale type for the three respective picture types.

/* intra_vlc_format (I P B) */

Selects one of the two variable length coding tables for intra coded blocks. Table 1 is considered to be statistically optimized for intra coded pictures coded within the sweet spot range (e.g. 0.3 to 0.6 bit/pixel) of MPEG-2.

Code  Meaning
0     table 0 (= MPEG-1)
1     table 1

/* alternate_scan (I P B) */

Selects one of two entropy scanning patterns defining the order in which quantized DCT coefficients are run-length coded. The alternate scanning pattern is considered to be better suited for interlaced video where the encoder does not employ sophisticated forward quantization (as is the case in our current encoder).

Code  Meaning
0     Zig-Zag scan (= MPEG-1)
1     Alternate scan

/* repeat_first_field */

If set to one, the first field of a frame is repeated after the second by the display process. The exact function depends on progressive_sequence and top_field_first. repeat_first_field is mainly intended to serve as a signal for the decoder's display process to perform 3:2 pulldown.

/* progressive_frame */

Specifies whether the frames are interlaced (0) or progressive (1).
MPEG-2 permits mixing of interlaced and progressive video. The encoder currently only supports either interlaced or progressive video; progressive_frame is therefore constant for all frames and usually set identical to progressive_sequence.

/* intra slice refresh picture period (P factor) */

This value indicates the number of successive P pictures in which all slices (macroblock rows in our encoder model) are refreshed with intra coded macroblocks. This feature assists low delay mode coding. It is currently not implemented.

/* rate control: r (reaction parameter) */
/* rate control: avg_act (initial average activity) */
/* rate control: Xi (initial I frame global complexity measure) */
/* rate control: Xp (initial P frame global complexity measure) */
/* rate control: Xb (initial B frame global complexity measure) */
/* rate control: d0i (initial I frame virtual buffer fullness) */
/* rate control: d0p (initial P frame virtual buffer fullness) */
/* rate control: d0b (initial B frame virtual buffer fullness) */

These parameters modify the behavior of the rate control scheme. Usually set them to 0, in which case default values are computed by the encoder.

/* P:  forw_hor_f_code forw_vert_f_code search_width/height */
/* B1: forw_hor_f_code forw_vert_f_code search_width/height */
/* B1: back_hor_f_code back_vert_f_code search_width/height */
/* B2: forw_hor_f_code forw_vert_f_code search_width/height */
/* B2: back_hor_f_code back_vert_f_code search_width/height */

This set of parameters specifies the maximum length of the motion vectors. If this length is set smaller than the actual movement of objects in the picture, motion compensation becomes ineffective and picture quality drops. If it is set too large, an excessive number of bits is allocated for motion vector transmission, indirectly reducing picture quality, too. All f_code values have to be in the range 1 to 9 (1 to 7 for MPEG-1), which translate into maximum motion vector lengths as follows:

Code  Range (inclusive)   Max search width/height
1     -8 ... +7.5         7
2     -16 ... +15.5       15
3     -32 ... +31.5       31
4     -64 ... +63.5       63
5     -128 ... +127.5     127
6     -256 ... +255.5     255
7     -512 ... +511.5     511
8     -1024 ... +1023.5   1023
9     -2048 ... +2047.5   2047

f_code is specified individually for each picture type (P, Bn), direction (forward prediction, backward prediction) and component (horizontal, vertical). Bn is the n'th B frame surrounded by I or P frames (e.g.: I B1 B2 B3 P B1 B2 B3 P ...). For MPEG-1 sequences, horizontal and vertical f_code have to be identical and the range is restricted to 1...7. P frame values have to be specified if N (N = # of frames in GOP) is greater than 1 (otherwise the sequence contains only I frames). M - 1 (M = distance between I/P frames) sets of values (two lines each) have to be specified for B frames. The first line of each set defines values for forward prediction (i.e. from a past frame), the second line those for backward prediction (from a future frame).

search_width and search_height set the (half) width of the window used for motion estimation. The encoder currently employs exhaustive integer vector block matching. Execution time for this algorithm depends on the product of search_width and search_height and, to a large extent, determines the speed of the encoder. Therefore these values have to be chosen carefully. Here is an example of how to set these values, assuming a maximum motion of 10 pels per frame in the horizontal and 5 pels per frame in the vertical direction, and M=3 (I B1 B2 P):

search width / height:

           forward          backward
           hor.   vert.     hor.   vert.
I -> B1    10     5         B1 <- P    20     10
I -> B2    20     10        B2 <- P    10     5
I -> P     30     15
f_code values are then selected as the smallest ones resulting in a range larger than the search widths/heights:

3 2 30 15    /* P:  forw_hor_f_code forw_vert_f_code search_width/height */
2 1 10 5     /* B1: forw_hor_f_code forw_vert_f_code search_width/height */
3 2 20 10    /* B1: back_hor_f_code back_vert_f_code search_width/height */
3 2 20 10    /* B2: forw_hor_f_code forw_vert_f_code search_width/height */
2 1 10 5     /* B2: back_hor_f_code back_vert_f_code search_width/height */

B.3 Decoder Usage of MSSG Software

mpeg2decode {options} input.m2v {upper.m2v} {outfile}

Options:

-vn  verbose output (n: level)
     Instructs mpeg2decode to generate informative output about the sequence to stdout. Increasing the level (-v1, -v2, etc.) results in more detailed output.

-on  output format (0: YUV, 1: SIF, 2: TGA, 3: PPM, 4: X11, 5: X11 HiQ)
     Chooses a file format for the decoded pictures. Default is 0 (YUV). The following formats are currently supported:

     YUV: three headerless files, one for each component. The luminance component is stored with an extension of .Y; the chrominance components are stored as .U and .V respectively. The size of the chrominance files depends on the chroma_format used by the sequence. In the case of 4:2:0 they have half resolution in both dimensions; in the case of 4:2:2 they are subsampled in the horizontal direction only, while 4:4:4 uses full chrominance resolution. All components are stored in row order from top left to bottom right.

     SIF: one headerless file with interleaved components. Component order is Cb, Y, Cr, Y. This format is also known as Abekas or CCIR Rec. 656 format. The chrominance components have half resolution in the horizontal direction (4:2:2) and are aligned with the even luminance samples. File extension is .SIF.

     TGA: Truevision TGA [4] 24 bit R,G,B format in uncompressed (no run length coding) format with .tga extension.

     PPM: Portable PixMap format as defined in PBMPLUS [5], a graphics package by Jef Poskanzer. Extension is .ppm.
     X11: display decoded video on an X Window System server. The current version supports only 8 bit color display. You can use the DISPLAY environment variable to select a (non-default) display. The output routines perform 8 bit dithering and interlaced to progressive scan conversion. You can choose between two different scan conversion algorithms (only for 4:2:0 interlaced streams):
     - a high quality slow algorithm (-o5, X11 HiQ)
     - a faster but less accurate algorithm (-o4, X11)

-f   store interlaced frames in frame format
     By default, interlaced video is stored field by field. The -f option permits storing both fields of a frame into one file.

-r   use double precision reference IDCT
     The -r option selects a double precision inverse DCT which is primarily useful for comparing results from different decoders. The default is to use a faster integer arithmetic only IDCT implementation which meets the criteria of IEEE 1180-1990 [3].

-s infile   spatial scalable sequence
     Spatial scalable video is decoded in two passes. The -s option specifies the names of the output files from the first (lower layer) pass to the second (enhancement layer) pass. 'infile' describes the name format of the lower layer pictures for spatial scalable sequences in a format similar to outfile as described below.

-q   Set this switch to suppress output of warnings to stderr. Usually a bad idea.

-t   Setting this option activates low level tracing to stdout. This is mainly for debugging purposes. Output is extremely voluminous. It currently doesn't cover all syntactic elements.

outfile
     This parameter has to be specified for output types -o0 to -o3 only. It describes the names of the output files as a printf format string. It has to contain exactly one integer format descriptor (e.g. %d, %02d) and, except for frame storage (-f option or progressive video), a %c descriptor.
     Example: out%02d_%c generates files out00a.*, out00b.*, out01a.*, ...
     ('a' denotes the top field, 'b' the bottom field, and .* is the suffix appropriate for the output format.)

upper.m2v is the name of the upper layer bitstream of an SNR scalable stream or a data partitioning scalable bitstream (input.m2v is the lower layer).

Appendix C

MPEG-2 Bit-Stream Data Analysis

This Matlab script was used to analyze data obtained from a 5-minute session of an MPEG-2 broadcast bitstream. With this script, the figures in Chapter 5 were generated, as well as the values for space availability in a typical MPEG-2 bitstream and the rate of data transmission.

close all;
clear all;

load outputStats.txt;
A = outputStats;        % 10517x1 matrix with continuous values

N = size(A,1);          % number of rows of A, i.e. 10517
for i = 1:N-1
    B(i,1) = A(i+1,1) - A(i,1);
end

figure(1)
plot(B)
axis([1 10516 1 48000]);
xlabel('Sequence of Image Frames')
ylabel('Size of Image Frames in Bytes')
print -deps chap5fig1.eps

figure(2)
hist(B,100)
xlabel('Size of Image Frames in Bytes')
ylabel('Number of Image Frames')
print -deps chap5fig2.eps

figure(3)
stairs(B(1:100))
xlabel('Sequence of Image Frames')
ylabel('Size of Image Frames in Bytes')
print -deps chap5fig3.eps

disp('Maximum Frame-Size of the Bit-Stream')
MAXB = max(B)
disp('Minimum Frame-Size of the Bit-Stream')
min(B)
disp('Mean of the Frame-Sizes in the Bit-Stream')
mean(B)
%std(B)

x = 1:10516;
size(x);

K = size(B,1);          % number of frames (size(B) alone returns a vector)
TOTAL = 0;
for j = 1:K
    TOTAL = TOTAL + (MAXB - B(j,1));
end
disp('Size available in Bytes - For a 5 minute bitstream')
TOTAL

Appendix D

IP Data Injecting Protocol

This protocol packages packetized Internet data (IP packets) into the unused spectrum of the DTV channel on an as-available basis, without corrupting the MPEG-2 bitstream content. It was designed and implemented by the author from scratch.
#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

int main(int argc, char *argv[])
{
    DIR *dp;
    struct dirent *dirp;
    int count = 0;
    int i = 0;
    int num = 0;
    int entireStreamTransmittedFlag = 0;

    /* Files */
    /* INPUT */
    const char *sourceFileDirectory = "frames/";         /* MAY HAVE TO CHANGE */
    char *directoryFiles[2000];                          /* MAY HAVE TO CHANGE */
    const char *streamingFile = "stream/bitstream";      /* MAY HAVE TO CHANGE */
    /* OUTPUT */
    const char *encodedFileDirectory = "encodedFrames/"; /* MAY HAVE TO CHANGE */

    /* OTHER VARS */
    FILE *sourceFileFPtr = NULL;    /* input */
    FILE *streamingFileFPtr = NULL; /* input */
    FILE *encodedFileFPtr = NULL;   /* output */
    char fullSourceFileName[100] = "";
    char fullEncodedFileName[100] = "";
    char *fileContents = NULL;
    char *testContents = NULL;
    int maxFileSize = 0;
    int sizeOfAFileName = 80;

    /* read in command line arguments, check them, and convert
       the argument into an int */
    if (argc != 2) {
        printf("Usage: myEncode maxFileSize\n");
        exit(0);
    } else {
        maxFileSize = atoi(argv[1]);
        printf("%s%d\n", "The maxFileSize argument entered is: ", maxFileSize);
        /* STUB!!! TAKE OUT */
        /* maxFileSize = 25000; */
    }

    /* read files in the frame source directory */
    if ((dp = opendir(sourceFileDirectory)) == NULL) {
        printf("%s%s\n", "Error: could not open: ", sourceFileDirectory);
        exit(0);
    }
    while ((dirp = readdir(dp)) != NULL) {
        if ((strncmp(dirp->d_name, ".", 1)) &&
            (strncmp(dirp->d_name, "..", 2))) {
            if ((directoryFiles[count] =
                     calloc(sizeOfAFileName, sizeof(char))) == NULL) {
                printf("%s\n", "Error: could not allocate memory for filenames");
                exit(0);
            }
            memcpy(directoryFiles[count], dirp->d_name, strlen(dirp->d_name));
            count++; /* number of files in the directory */
        }
    }

    /* open the streaming file in binary mode */
    if ((streamingFileFPtr = fopen(streamingFile, "rb")) == NULL) {
        printf("%s%s\n", "Error: could not open ", streamingFile);
        exit(0);
    }

    entireStreamTransmittedFlag = 0;

    /* count is the number of files in the directory */
    for (i = 0; i < count; i++) { /* for each file in the directory */
        /* full source file name (input file) */
        strcpy(fullSourceFileName, sourceFileDirectory);
        strcat(fullSourceFileName, directoryFiles[i]);
        /* full encoded file name (output file) */
        strcpy(fullEncodedFileName, encodedFileDirectory);
        strcat(fullEncodedFileName, directoryFiles[i]);

        printf("Processing file: [%s]\n", fullSourceFileName);

        /* allocate fileContents to be maxFileSize bytes, then read
           the contents of the source file into it */
        if ((fileContents = calloc(maxFileSize, sizeof(char))) == NULL) {
            printf("%s%s\n", "Error: could not allocate memory for "
                   "fileContents when processing ", fullSourceFileName);
            exit(0);
        }
        /* open the file in binary mode */
        if ((sourceFileFPtr = fopen(fullSourceFileName, "rb")) == NULL) {
            printf("%s%s\n", "Error: could not open ", fullSourceFileName);
            exit(0);
        }
        /* read in the file char by char */
        num = 0;
        while ((num < maxFileSize) &&
               (fread(&fileContents[num], sizeof(char), 1,
                      sourceFileFPtr) == 1)) {
            num++;
        }

        /* check if num has reached maxFileSize; if it has, and there is
           still something to be read, return an error */
        if ((testContents = calloc(1, sizeof(char))) == NULL) {
            printf("%s%s\n", "Error: could not allocate memory for "
                   "testContents when processing ", fullEncodedFileName);
            exit(0);
        }
        if (num >= maxFileSize) {
            if (fread(&testContents[0], sizeof(char), 1,
                      sourceFileFPtr) == 1) {
                printf("%s%s\n", "Error: file too large ", fullSourceFileName);
                exit(0);
            }
        }
        /* close the file */
        fclose(sourceFileFPtr);

        if (entireStreamTransmittedFlag) {
            /* print into the encoded file output */
            if ((encodedFileFPtr = fopen(fullEncodedFileName, "wb")) == NULL) {
                printf("%s%s\n", "Error: could not open ", fullEncodedFileName);
                exit(0);
            }
            fwrite(fileContents, sizeof(char), num, encodedFileFPtr);
            fclose(encodedFileFPtr);
        } else {
            /* append the next bytes in the stream onto fileContents */
            /* check to see if there is enough space */
            if (num < (maxFileSize - 2)) { /* i.e. (maxFileSize - 3) or less */
                /* Set delimiter ... */
                fileContents[num] = '1'; num++;
                fileContents[num] = '1'; num++;
                fileContents[num] = '1'; num++;
                /* read in the next contents of streamingFile until
                   maxFileSize */
                while ((num < maxFileSize) &&
                       (fread(&fileContents[num], sizeof(char), 1,
                              streamingFileFPtr) == 1)) {
                    num++;
                }
                /* end of streamingFile */
                if (num < maxFileSize) {
                    printf("%s%s\n", "Reached end of streaming file while "
                           "processing ", fullSourceFileName);
                    printf("%s\n", "Encoding complete, entire stream "
                           "transmitted");
                    entireStreamTransmittedFlag = 1;
                }
            } /* if (num < (maxFileSize - 2)) */

            /* print into the encoded file output */
            if ((encodedFileFPtr = fopen(fullEncodedFileName, "wb")) == NULL) {
                printf("%s%s\n", "Error: could not open ", fullEncodedFileName);
                exit(0);
            }
            fwrite(fileContents, sizeof(char), num, encodedFileFPtr);
            fclose(encodedFileFPtr);
        } /* else */
    } /* for */

    fclose(streamingFileFPtr);
    closedir(dp);

    if (!entireStreamTransmittedFlag) {
        printf("%s\n", "WARNING: Encoding incomplete, entire stream was "
               "not transmitted");
    }
    exit(0);
} /* main */

Appendix E

IP Data Extraction Protocol
This protocol extracts the injected IP packets from the MPEG-2 bitstream, reconstructs the transmitted data from these packets, and buffers the extracted data so that it can be accessed as it is received, without waiting for the entire streaming file to finish transmitting. This protocol was also designed and implemented by the author from scratch.

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>

// holds binary strings for writing to a file
typedef struct SData {
  char *data;   // holds s-expression
  int length;   // length of data
} SData;

char *myBinaryStrstr(char *working_string, int working_string_length,
                     char *search_string, int search_string_length)
{
  char *p1 = "\0";
  char *p2 = "\0";
  int p2Length = 0;

  p2 = working_string;
  p2Length = working_string_length;

  while ((p1 = memchr(p2, search_string[0], p2Length)) != NULL) {
    if ((memcmp(p1, search_string, search_string_length)) == 0) {
      return p1;  /* found search_string */
    }
    // p2Length = old p2Length - (new p2 - old p2)
    p2Length = p2Length - ((p1 + 1) - p2);
    p2 = (p1 + 1);
  }
  return NULL;  /* failed to find search_string */
}

void myStringTokenize(SData *string, SData *search_string,
                      SData **resultTokens, int *resultSize)
{
  char *working_string = "\0";
  int temp_length = 0;
  char *p1 = "\0";
  int i = 0;
  int working_string_length = 0;

  //working_string = ap_pstrdup(r->pool, string);
  working_string = calloc(string->length, sizeof(char));
  memcpy(working_string, string->data, string->length);

  i = 0;
  working_string_length = string->length;

  /* insert bytes before the delimiter '111' into an array in resultTokens.
     This does it for each 111 found, and increments i with each pass */
  while ((p1 = myBinaryStrstr(working_string, working_string_length,
                              search_string->data, search_string->length)) != NULL) {
    /* copy first part over */
    temp_length = p1 - working_string;
    //temp_string = ap_pstrndup(r->pool, working_string, temp_length);

    // allocate memory for this SData
    resultTokens[i] = calloc(1, sizeof(SData));
    resultTokens[i]->data = calloc(temp_length, sizeof(char));
    memcpy(resultTokens[i]->data, working_string, temp_length);
    resultTokens[i]->length = temp_length;

    /* find last part */
    p1 = p1 + strlen(search_string->data);

    /* make last part the working string */
    //temp_string = ap_pstrcat(r->pool, temp_string, p1, NULL);
    working_string_length = working_string_length -
                            (temp_length + strlen(search_string->data));
    // copy over the remainder (memmove: source and destination overlap)
    memmove(working_string, p1, working_string_length);
    // set rest of bytes to null
    memset((working_string + working_string_length), '\0',
           (string->length - working_string_length));

    i++;
  }

  // insert the bit stream that was appended (if everything goes ok ...)
  // or insert the input string if no delimiter was found
  resultTokens[i] = calloc(1, sizeof(SData));
  resultTokens[i]->data = calloc(working_string_length, sizeof(char));
  memcpy(resultTokens[i]->data, working_string, working_string_length);
  resultTokens[i]->length = working_string_length;
  i++;

  if (i > *resultSize) {
    *resultSize = -1;
  } else {
    *resultSize = i;
  }
}

int main(int argc, char *argv[])
{
  DIR *dp;
  struct dirent *dirp;
  int count = 0;
  int i = 0;
  int num = 0;
  SData **resultTokens;
  int resultSize = 2;

  /* Files */
  /* INPUT */
  const char *encodedFileDirectory = "encodedFrames/";           /* MAY HAVE TO CHANGE */
  char *directoryFiles[2000];                                    /* MAY HAVE TO CHANGE */
  /* OUTPUT */
  const char *decodedFileDirectory = "decodedFrames/";           /* MAY HAVE TO CHANGE */
  const char *streamingFile = "decodedFrames/decodedBitstream";  /* MAY HAVE TO CHANGE */

  /* OTHER VARS */
  FILE *streamingFileFPtr = NULL;  /* output */
  FILE *encodedFileFPtr = NULL;    /* input */
  FILE *decodedFileFPtr = NULL;    /* output */
  SData *encodedSData = calloc(1, sizeof(SData));
  SData *searchStringSData = calloc(1, sizeof(SData));
  char fullEncodedFileName[100] = "\0";
  char fullDecodedFileName[100] = "\0";
  char *fileContents = "\0";
  char *testContents = "\0";
  int maxFileSize = 0;
  int sizeOfAFileName = 80;

  /* read in command line arguments, check them,
     and convert the argument into an int */
  if (argc != 2) {
    printf("Usage: myEncode maxFileSize\n");
    exit(0);
  } else {
    maxFileSize = atoi(argv[1]);
    printf("%s%d\n", "The maxFileSize argument entered is: ", maxFileSize);
    /* STUB!!!
       TAKE OUT */
    //maxFileSize = 25000;
  }

  /* read files in the encoded frame directory */
  if ((dp = opendir(encodedFileDirectory)) == NULL) {
    printf("%s%s\n", "Error: could not open: ", encodedFileDirectory);
    exit(0);
  }
  while ((dirp = readdir(dp)) != NULL) {
    if ((strncmp(dirp->d_name, ".", 1)) &&
        (strncmp(dirp->d_name, "..", 2))) {
      if ((directoryFiles[count] = calloc(sizeOfAFileName, sizeof(char))) == NULL) {
        printf("%s\n", "Error: could not allocate memory for filenames");
        exit(0);
      }
      memcpy(directoryFiles[count], dirp->d_name, strlen(dirp->d_name));
      count++;  // number of files in the directory
    }
  }

  // open streaming file in binary mode
  if ((streamingFileFPtr = fopen(streamingFile, "wb")) == NULL) {
    printf("%s%s\n", "Error: could not open ", streamingFile);
    exit(0);
  }

  searchStringSData->data = "111";
  searchStringSData->length = 3;

  // count is the number of files in the directory
  for (i = 0; i < count; i++) {  // for each file in the directory
    // full encoded file name (input file)
    strcpy(fullEncodedFileName, encodedFileDirectory);
    strcat(fullEncodedFileName, directoryFiles[i]);

    // full decoded file name (output file)
    strcpy(fullDecodedFileName, decodedFileDirectory);
    strcat(fullDecodedFileName, directoryFiles[i]);

    printf("Processing file: [%s]\n", fullEncodedFileName);

    /* allocate size of fileContents to be maxFileSize,
       then read in contents of the encoded file into it */

    // allocate memory
    if ((fileContents = calloc(maxFileSize, sizeof(char))) == NULL) {
      printf("%s%s\n",
             "Error: could not allocate memory for fileContents when processing ",
             fullEncodedFileName);
      exit(0);
    }

    // open file in binary mode
    if ((encodedFileFPtr = fopen(fullEncodedFileName, "rb")) == NULL) {
      printf("%s%s\n", "Error: could not open ", fullEncodedFileName);
      exit(0);
    }

    // read in the file char by char
    num = 0;
    while ((num < maxFileSize) &&
           (fread(&fileContents[num], sizeof(char), 1, encodedFileFPtr) == 1)) {
      num++;
    }

    /* check if num is greater than maxFileSize;
       if it is, return an error */
    if ((testContents =
         calloc(1, sizeof(char))) == NULL) {
      printf("%s%s\n",
             "Error: could not allocate memory for testContents when processing ",
             fullEncodedFileName);
      exit(0);
    }

    // if num is >= maxFileSize, and there is still something to be read,
    // then error
    if (num >= maxFileSize) {
      if (fread(&testContents[0], sizeof(char), 1, encodedFileFPtr) == 1) {
        printf("%s%s\n",
               "Error: file too large; increase maxFileSize argument ",
               fullEncodedFileName);
        exit(0);
      }
    }

    /*
    if (num >= maxFileSize) {
      printf("%s%s\n", "Error: file too large; increase maxFileSize argument ",
             fullEncodedFileName);
      exit(0);
    }
    */

    // close the file
    fclose(encodedFileFPtr);

    // make encodedSData
    encodedSData->data = fileContents;
    encodedSData->length = num;

    resultSize = 2;
    resultTokens = (SData **) calloc(resultSize, sizeof(SData *));

    myStringTokenize(encodedSData, searchStringSData, resultTokens, &resultSize);

    if (resultSize == -1) {
      printf("%s%s\n", "Error: more than 1 delimiter found in ", fullEncodedFileName);
      exit(0);
    }

    if ((decodedFileFPtr = fopen(fullDecodedFileName, "wb")) == NULL) {
      printf("%s%s\n", "Error: could not open ", fullDecodedFileName);
      exit(0);
    }
    fwrite(resultTokens[0]->data, sizeof(char), resultTokens[0]->length, decodedFileFPtr);
    fclose(decodedFileFPtr);

    if (resultSize == 2) {
      if (resultTokens[1] != NULL) {
        if (resultTokens[1]->length != 0) {
          fwrite(resultTokens[1]->data, sizeof(char),
                 resultTokens[1]->length, streamingFileFPtr);
        }
      }
    }
  } /* for */

  fclose(streamingFileFPtr);
  closedir(dp);

  exit(0);
} /* main */

Bibliography

[1] Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. Prentice-Hall, Inc., New Jersey, second edition, 1999. DSP Bible.

[2] Grenville J. Armitage. IP multicasting over ATM networks. IEEE Journal on Selected Areas in Communications, 1997. NJ, USA.

[3] David Clark. PC and TV makers battle over convergence. IEEE Computer, CA, USA, 1997. The conversion from analog to digital has begun.

[4] eMarketer.
DTV. http://www.emarketer.com/, online journal, 2001.

[5] Bruce Franca. DTV report on COFDM and 8-VSB performance. FCC Report, 1999. FCC, Office of Engineering and Technology, USA.

[6] Miller Freeman. The Guide To Digital Television. Miller Freeman PSN Inc., New York, second edition, 1999. DTV Stuff.

[7] Steven Gringeri. Transmission of MPEG-2 video streams over ATM. GTE Laboratories, 1998. MA, USA.

[8] MIT DSP Group. MPEG. http://rleweb.mit.edu/Publications/currents/cur10-1.htm#change, online journal, 2001.

[9] John B. Casey and Ken Aupperle. Digital television and the PC. Hauppauge Computer Works, Inc., 1998. USA.

[10] K. Jong, J. W. Carlin, and H. M. Shuler. IP transport trends and satellite applications. IBC 2000 Proceedings, 2000. Loral Skynet, USA.

[11] Milan Milenkovic. Delivering interactive services via digital TV infrastructure. IEEE MultiMedia feature article, 1998. Intel, USA.

[12] William Schreiber. Technological decision-making at the national level. MIT DSP, 1998. USA.

[13] Thomas Sikora. MPEG-1 and MPEG-2 digital video coding standards. Heinrich-Hertz-Institut Berlin, Image Processing Department, 2000.

[14] SkyStream. Optimizing bandwidth - the SkyStream approach: new competitive opportunities for service providers. http://www.skystream.com/products/optimizing.pdf, online journal, 2000. USA.

[15] ISO/IEC 13818 Draft International Standard. Generic coding of moving pictures and associated audio. http://www.mpeg.org/MPEG/index.html, online journal, 1996. Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, MPEG93/457.

[16] W. Richard Stevens. TCP/IP Illustrated, Volume 1: The Protocols. Addison Wesley Longman, Inc., first edition, 1994.

[17] Vicki Johnson and Marjory Johnson. IP multicast APIs and protocols. Stardust.com, Inc., CA, USA, 2000. A technical description of IP multicast APIs, and an overview of application requirements for their use.

[18] Vicki Johnson and Marjory Johnson. IP multicast APIs and protocols. Stardust.com, Inc., CA, USA, 2000.
A technical description of IP multicast APIs, and an overview of application requirements for their use.

[19] Vicki Johnson and Marjory Johnson. IP multicast backgrounder. IP Multicast Initiative (IPMI), CA, USA, 2000. How IP multicast paves the way for next-generation network applications.

[20] Graham Wade. Signal Coding and Processing. Cambridge University Press, second edition, 1994. Coding Techniques.

[21] Stephen G. Wilson. Digital Modulation and Coding. Prentice-Hall, Inc., New Jersey, first edition, 1996. Digital Modulation Bible.

[22] www.whatis.com. DTV. http://whatis.techtarget.com/, online journal, 2001.