Fine Granularity Adaptive Multi-Receiver Video Streaming

Viktor S. Wold Eide (a,b), Frank Eliassen (b,a), Jørgen Andreas Michaelsen (b), Frank Jensen (b)
(a) Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway
(b) University of Oslo, P.O. Box 1080 Blindern, N-0314 Oslo, Norway
ABSTRACT
Efficient delivery of video data over computer networks has been studied extensively for decades. Still, multi-receiver video
delivery is challenging, due to heterogeneity and variability in network availability, end node capabilities, and receiver
preferences. Our earlier work has shown that content-based networking is a viable technology for fine granularity multi-receiver video streaming. By exploiting this technology, we have demonstrated that each video receiver is provided with
fine grained and independent selectivity along the different video quality dimensions: region of interest, signal to noise
ratio for the luminance and the chrominance planes, and temporal resolution. Here we propose a novel adaptation scheme
combining such video streaming with state-of-the-art techniques from the field of adaptation to provide receiver-driven
multi-dimensional adaptive video streaming. The scheme allows each client to individually adapt the quality of the received
video according to its currently available resources and own preferences. The proposed adaptation scheme is validated
experimentally. The results demonstrate adaptation to variations in available bandwidth and CPU resources roughly over
two orders of magnitude and that fine grained adaptation is feasible given radically different user preferences.
Keywords: adaptive multi-receiver video streaming, scalable video coding, content-based publish-subscribe systems
1. INTRODUCTION
Efficient delivery of video data over computer networks, such as the Internet, has been studied extensively for decades.
As a result, relatively good solutions exist for streaming video from a sender to a single receiver. However, the content
of a single video stream is often of interest to a number of receivers simultaneously. Delivering such a video stream
efficiently to a potentially large number of receivers is complicated and represents a challenge even within large LANs.1
Video streaming over WANs or the Internet is even harder. Despite extensive research multi-receiver video streaming still
represents a challenge which awaits satisfactory solutions.2, 3
The challenge is to provide each video receiver with the best possible video quality when considering resource limitations and preferences, while maintaining efficiency and scalability at the sender side, in the network, and at the receiver
side. In other words, the challenge is to provide each receiver with a video stream which is customized according to individual preferences and currently available resources. The main difficulties are related to the combination of heterogeneity,
variability, and efficient one-to-many delivery, as discussed in the following.
The sources for heterogeneity are manifold. Video servers and clients are often connected to networks by diverse
technologies having different characteristics. Similarly, the end node capabilities may differ radically with respect to
processing capabilities, display resolution, and power availability. Receivers may also have different preferences regarding
the relative importance of the different video quality dimensions. Some prefer frame rate over frame quality and vice versa.
In addition to these somewhat static differences, video streaming systems have to cope with variability on a shorter
time scale. The video content itself changes over time, which often translates into variable resource requirements, for
example, with respect to bit rates and processing needs. The resource availability may also change over time. The available
bandwidth experienced by each receiver may vary due to congestion and changes in signal strength for wireless equipment.
Similarly, the processing capacity and power availability may also vary over time. If resources become scarce, some
receivers may prefer to sacrifice quality in the temporal dimension instead of reducing spatial quality. This illustrates that
different receivers may have different preferences regarding how to adapt to variations in resource availability.
Handling heterogeneity and variability is further complicated by the need for efficient one-to-many delivery. In unicast
delivery each client connects directly to a server. The server may then provide each client with a stream customized to the
user preferences and current resource availability. However, unicast is inefficient and does not scale, since the server has to
handle each and every video receiver individually. Network or application level multicast delivery may improve network
efficiency by moving and distributing the onus of packet replication and forwarding to downstream nodes. However, a
single multicast stream provides each client with little or no selectivity. Simulcast delivery may provide clients with a
choice between a few streams, each having a different tradeoff between quality characteristics and resource requirements.
These streams may be delivered on different multicast channels. In order to provide each receiver with selectivity, a number
of streams with different quality and resource characteristics are necessary. However, each additional stream carries some
redundant video data and reduces network efficiency. This tradeoff between selectivity and efficiency makes the approach
rather coarse grained in practice. A combination of layered coding and multicast4 is also rather coarse grained, but
improves network efficiency, as the amount of redundant information in different streams is reduced.

Further author information: V.S.W.E.: viktore@simula.no, F.E.: frank@ifi.uio.no, J.A.M.: jorgenam@ifi.uio.no, F.J.: fnjensen@ifi.uio.no
Fine granularity and efficient multi-receiver video streaming can be realized by exploiting content-based networking for
data delivery.5–7 Content-based networking systems are realized by overlay networks. A combination of several attributes
and values in each packet determine the overlay routing, forwarding, and delivery. In effect each receiver may unilaterally
customize the video quality in each and every dimension, such as region of interest, signal-to-noise ratio for the luminance
and chrominance planes, and temporal resolution.
This paper describes how state-of-the-art techniques from the field of adaptation can be bridged with video streaming
over content-based networking. A novel adaptation scheme is proposed which takes advantage of the fine grained selectivity provided by such video streaming systems. The scheme is validated experimentally and the results demonstrate that
fine grained adaptation is feasible given quite different user preferences. The experiments show that a receiver may adapt
to small variations in resource availability, over roughly two orders of magnitude, with respect to both bandwidth and CPU.
The rest of the paper is structured as follows. Sect. 2 presents background information on content-based networking
and gives an overview of how such communication systems can be used to realize a fine granularity multi-receiver video
streaming system. Sect. 3 describes our adaptation scheme for multi-dimensional adaptive video streaming which is able
to exploit all quality dimensions supported by the underlying video streaming system. Empirical results presented in Sect.
4 demonstrate the ability of our adaptation scheme to support fine grained multi-dimensional adaptation, while taking user
preferences and a wide range of resource availability situations into account. A comparison to related work on adaptive
multi-receiver video streaming is provided in Sect. 5. Sect. 6 concludes the paper and presents some ongoing work.
2. BACKGROUND
This section briefly describes content-based networking and how such systems can be used to support fine granularity
multi-receiver video streaming, the basis for the proposed unilateral receiver-driven adaptation scheme.
2.1. Content-Based Networking
In content-based networking, as described in,8 messages are forwarded based on content and not on an explicit address.
Each message contains a set of attribute-value pairs and clients express interest in certain messages by specifying predicates
over these attributes and values. The predicates are used by the network nodes for routing. Messages are forwarded based
on content-based routing tables and delivered only to clients having matching selection predicates. Filtering of messages
is pushed towards the source while replication is pushed towards the destinations. Consequently, each message should
traverse a link at most once. Distributed content-based publish-subscribe systems are intimately related to content-based
networking—clients express their interest by means of subscriptions and send messages by publishing.
Publish-subscribe communication has proven useful in a wide range of applications, including high performance systems.9 The expressiveness of the subscription languages provided by different kinds of publish-subscribe systems varies,
where content-based systems provide most expressiveness∗ . Examples of content-based publish-subscribe systems include.11–14 In these systems, the messages are called event notifications or just notifications for short. Clients inject
notifications into the network by publishing, as depicted in Fig. 1. Other clients express their interests using subscriptions, as predicates over the attribute-value pairs, and are notified accordingly. Both subscriptions and notifications are
being pruned inside the content-based network. As an example, consider the case where two clients are connected to the
same content-based network node, as illustrated in Fig. 1. Notifications are not forwarded in the network before at least
one client has expressed interest. When the first client subscribes and thereby registers interest in some notifications, the
subscription may get forwarded over the links numbered 5, 4, 3, 2, and 1. State is then maintained in the network nodes to
allow notifications to flow, e.g., on the reverse path. When the client connected by the link numbered 6 subscribes, the first
network node only forwards the subscription if this new subscription is not covered by the first subscription.
Supposing a content-based network contains a publisher that supplies financial market information, a notification
may for example contain the attributes and values [order=sell ticker=abc price=15 volume=2000]. An interested receiver may have subscribed using the predicates [ticker=abc price<20], and will be notified accordingly.
∗ For a survey on the publish-subscribe communication paradigm and the relations to other interaction paradigms see.10
A subscriber with the subscription [volume>2000] will not be notified. Note that attributes not specified become
unconstrained.
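The matching behavior in the financial-market example can be illustrated with a small sketch. The dictionary-based notification and predicate encoding below are our own illustration, not the interface of any particular publish-subscribe system:

```python
# Hypothetical sketch of content-based matching: a notification is a set of
# attribute-value pairs, a subscription is a conjunction of predicates.
def matches(subscription, notification):
    """Return True if every predicate in the subscription holds."""
    for attr, (op, value) in subscription.items():
        if attr not in notification:
            return False  # a constrained attribute is missing -> no match
        actual = notification[attr]
        if op == "=" and actual != value:
            return False
        if op == "<" and not actual < value:
            return False
        if op == ">" and not actual > value:
            return False
    return True  # attributes left unconstrained match anything

notification = {"order": "sell", "ticker": "abc", "price": 15, "volume": 2000}
sub_a = {"ticker": ("=", "abc"), "price": ("<", 20)}  # is notified
sub_b = {"volume": (">", 2000)}                       # is not notified
print(matches(sub_a, notification))  # True
print(matches(sub_b, notification))  # False
```

Note that the unspecified attributes (order, volume) in the first subscription play no role in the decision, mirroring the "unconstrained" semantics described above.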
Figure 1. Content-based networking example (a video server, video clients, and network nodes exchanging publish, subscribe,
and notify messages over numbered links in a content-based network)

Architectures and algorithms for scalable wide area content-based publish-subscribe systems have been studied extensively.12, 14 Complementary to the WAN case is the challenge of efficiently distributing very high rate event notifications
between a large number of clients within a smaller region, e.g., a LAN or an administrative domain. In Fig. 1 this corresponds
to the notifications forwarded over the links numbered 5, 6, and to other clients within the same domain. An architecture for
a distributed content-based event notification service targeted at LAN or intra domain usage is described in.15 In short, a
mapping from the "event notification space" to IP multicast addresses is specified. Hence, a notification is mapped to an IP
multicast address and efficiently forwarded to all clients having matching subscriptions. A client may publish several
thousand notifications, carrying several megabytes of data per second, more than sufficient for streaming compressed high
quality video.
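The mapping idea can be sketched as follows. This is purely illustrative: the actual mapping in the cited architecture is specified differently, and the address pool, range, and hash choice here are assumptions:

```python
import hashlib

# Illustrative only: deterministically map a notification's attribute-value
# pairs onto one of a fixed pool of IP multicast group addresses, so that
# receivers with matching subscriptions can join the corresponding groups.
POOL_BASE = (239, 255, 0, 0)  # administratively scoped range (assumption)
POOL_SIZE = 256               # number of groups in the pool (assumption)

def multicast_address(notification):
    # Sort attributes so the address is independent of dictionary order.
    key = ",".join(f"{k}={notification[k]}" for k in sorted(notification))
    index = int(hashlib.sha256(key.encode()).hexdigest(), 16) % POOL_SIZE
    a, b, c, _ = POOL_BASE
    return f"{a}.{b}.{c}.{index}"

addr = multicast_address({"sid": 10, "tl": 0, "qy": 0, "row": 1, "col": 1})
print(addr)  # e.g. 239.255.0.x -- identical attributes always map to the same group
```

The essential property is determinism: every node computes the same group address for the same notification attributes, so no coordination is needed at publish time.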
2.2. Video Streaming over Content-Based Networking
The work reported in6 describes a video streaming system layered on top of a content-based network. By utilizing the rich
routing opportunities offered by the content-based network, the video streaming system allows video receivers (clients) to
select video quality along a set of Quality of Service (QoS) dimensions without affecting other clients or the video server.
A method for video encoding was devised to accommodate the publish-subscribe paradigm and allow for a fine grained
and independent selectivity in each quality dimension.
The video encoder partitions a set of captured video frames into multiple fragments that each map to a point in the QoS
space in such a way that the full set of fragments span the QoS space. The video encoder then encapsulates each part of
the video signal in a notification by marking up the binary video data with attributes and values describing where in the
QoS space the fragment belongs. Each notification is published into the content-based network, and routed by the network
towards clients that have expressed interest through subscriptions. Thus, clients may opt to receive only a subset of the full
video signal, depending on their needs and resource availability. Efficient delivery of video data is maintained in terms of
network utilization and end node decoding requirements.
The video coding scheme currently supports selectivity along the following video quality dimensions: region of interest,
signal to noise ratio for the luminance and the chrominance planes, and temporal resolution. The techniques used to
achieve independent selectivity for each of these dimensions are briefly described in the following.
A main principle of the encoding scheme is that of layered encoding. In layered encoding the video signal is encoded
into a number of layers—a higher layer encodes video data corresponding to higher quality. The layers are coded cumulatively in order to reduce the amount of redundant information across layers. A client requests a particular quality
in a QoS dimension by subscribing to notifications corresponding to the requested quality level and all lower levels for
that dimension. Since the data corresponding to a quality level in one QoS dimension is encapsulated independently from
data of other QoS dimensions, each video receiver may independently customize the video signal along different video
quality dimensions by subscribing to the corresponding notifications. This is in contrast to, e.g., Receiver-driven Layered
Multicast4 where the sender determines the QoS dimensions to be affected for each additional layer.
Video receivers may select the region of interest in terms of so-called superblocks. A superblock contains a number of
16 × 16 pixel macroblocks, is self contained and represents the smallest selectable region. Superblocks are indexed by a
row and a column number and the attribute names are row and col respectively. With respect to colors, the luminance part
of the video signal is processed and encapsulated in notifications separately from the chrominance part. In the signal-to-noise ratio (SNR) dimension, a layered coding is used, which relies on the Discrete Cosine Transform (DCT) and bit-plane
coding.16 In essence, notifications for the base layer contain the most significant bits. The attributes for selecting the SNR
in the luminance and chrominance dimensions are named qy (quality luminance) and qc (quality chrominance) respectively.
As an example, a receiver may subscribe to the luminance part of the video signal at a higher quality than the chrominance
part: [qy<=2 qc<=1]. The temporal dimension is also realized by a layered coding scheme, where each additional layer
increases (doubles) the frame rate. The superblocks in the first frame in a group of pictures (GOP) are intra coded and thus
self contained. The rest of the GOP is coded to exploit temporal redundancy (similar to I and P frames in MPEG). The
attribute for selecting the temporal resolution is named tl (temporal layer).
The number of notifications generated per video frame is determined by the number of superblocks and the number of
luminance and chrominance SNR layers. The current implementation uses a base layer and three enhancement layers for
the temporal, luminance, and chrominance dimensions. With respect to the region of interest dimension, three rows and two
columns are used. Clearly, the number of notifications generated can be substantial. However, content-based routers may
encapsulate several notifications in a single packet destined for another content-based router in order to increase efficiency
by exploiting the full payload capacity of the underlying network technology.

Figure 2. Four video clients with different interests
Consider the notification [sid=10 tl=1 qy=3 qc=-1 col=1 row=2 blob=q34i23QR...D], where sid is the video
stream identifier attribute, and blob is the binary video data. Because it is possible to leave dimensions unspecified, and thus
unconstrained in the subscription, a client may express interest in the full quality video stream by specifying [sid=10].
Another client interested in only the luminance data at reduced frame rate, would use another subscription such as [sid=10
tl<=2 qc<0]. A third client may receive all regions in full quality, but at the lowest frame rate, by using a subscription
such as [sid=10 tl=0]. More complex requirements can be expressed by combinations.
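The three subscription examples above can be mimicked in a small sketch. The attribute names (sid, tl, qy, qc, row, col) come from the paper; the predicate representation is our own illustration:

```python
# Illustrative matching of video subscriptions against one notification.
def satisfies(predicates, notification):
    """True if every (attribute, predicate) pair holds for the notification."""
    return all(pred(notification.get(attr)) for attr, pred in predicates)

# One fragment of the video signal, as in the paper's example notification.
frame_part = {"sid": 10, "tl": 1, "qy": 3, "qc": -1, "col": 1, "row": 2}

full_quality = [("sid", lambda v: v == 10)]          # everything for stream 10
luma_low_fps = [("sid", lambda v: v == 10),          # luminance only,
                ("tl", lambda v: v <= 2),            # reduced frame rate
                ("qc", lambda v: v < 0)]
lowest_fps   = [("sid", lambda v: v == 10),          # all regions, full quality,
                ("tl", lambda v: v == 0)]            # lowest frame rate

print(satisfies(full_quality, frame_part))  # True: only sid is constrained
print(satisfies(luma_low_fps, frame_part))  # True: tl=1<=2 and qc=-1<0
print(satisfies(lowest_fps, frame_part))    # False: tl=1 is not the base layer
```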
Fig. 2 illustrates that different video receivers may independently select different parts of the video signal (if the images
are unclear in the printed copy, please refer to the electronic version). The four screenshots in the figure illustrate the
following selections (from left to right): full quality and colors; only luminance in first superblock row, only chrominance
in second superblock row, and both luminance and chrominance in the last superblock row; only some of the superblocks and
in lowest luminance quality; only luminance and low quality, except for a region having full quality and colors.
The flexibility of the video coding scheme allows a receiver to even specify different temporal resolutions (frame rates)
for different superblocks (frame regions). To fully exploit the flexibility of the video streaming system without sacrificing
expressiveness, a layer of indirection is needed to automatically obtain suitable configurations based on preferences and
available resources. Hence, a suitable adaptation scheme is needed. This is the topic of the following section.
3. MULTI-DIMENSIONAL RECEIVER-DRIVEN ADAPTATION
Here we show how the described video coding and encapsulation scheme can be used to support receiver-driven adaptive
video streaming. The proposed adaptation scheme is unilateral in the sense that each receiver dynamically and independently may adapt the quality of the received video according to own preferences and resource availability. This is a unique
feature of our approach made possible by the underlying content-based networking services and video encoding scheme.
3.1. Overview of Adaptation Scheme
Our approach to adaptation is to combine the above described techniques for fine-granularity multi-receiver video streaming
with state-of-the-art techniques for video adaptation. A candidate video adaptation technique must allow dynamic selection
of video data subscriptions (video quality) as a function of user preferences and resource availability. A candidate technique
must also handle multi-dimensional QoS and resource spaces. We hypothesize that due to the fine granularity selectivity of
the video streaming, the resulting adaptation scheme should be able to find a set of video data subscriptions that in resource
requirements closely matches any reasonably balanced combination of CPU and network bandwidth availability, while at
the same time satisfying individual user preferences. Consequently, the main principle of the adaptation scheme will be
that a receiver changes the video data subscriptions dynamically according to the resource availability.
While there are many techniques for selecting video quality as a function of resource availability and user preferences, our choice for this demonstration fell on multi-dimensional utility functions. A utility function is a measure of user
satisfaction as a function of QoS or resource availability. We model an adaptive video streaming system, inspired by adaptation spaces,17 as several video data subscription alternatives (the adaptation space). We are currently also investigating
other formal adaptation models, including,18 to see whether they may provide a better foundation for multi-dimensional
receiver-driven adaptation.
The different subscription alternatives differ in the quality they provide and resources they require. Determining the
optimal subscription alternative is done by a selection mechanism, based on information about resource availability, resource requirements of different subscription alternatives, and user utility. If a subscription alternative has too high resource
requirements compared to available resources, the service may fail in unpredictable ways such as failing to completely decode received video data. A task of the adaptation system is thus to select the subscription alternative that maximizes utility
while keeping resource consumption within the limits of the available resources. An adaptation system would normally
also include a policy for when to adapt. While this is an essential part of any adaptation system, we do not consider details
of adaptation policies in this paper.
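A minimal sketch of such a selection mechanism might look as follows. The subscription alternatives and their utility and resource figures are invented for illustration; a real adaptation space would enumerate many more alternatives:

```python
# Illustrative adaptation space: each alternative is a set of subscriptions
# with an associated utility and estimated resource requirements.
alternatives = [
    {"name": "full",       "utility": 1.00, "bw_kbps": 1400, "cpu": 0.90},
    {"name": "half-rate",  "utility": 0.75, "bw_kbps": 700,  "cpu": 0.50},
    {"name": "luma-only",  "utility": 0.55, "bw_kbps": 450,  "cpu": 0.35},
    {"name": "base-layer", "utility": 0.20, "bw_kbps": 100,  "cpu": 0.10},
]

def select(available_bw_kbps, available_cpu):
    """Pick the highest-utility alternative that fits the available resources."""
    feasible = [a for a in alternatives
                if a["bw_kbps"] <= available_bw_kbps and a["cpu"] <= available_cpu]
    return max(feasible, key=lambda a: a["utility"]) if feasible else None

print(select(800, 0.6)["name"])  # half-rate
print(select(200, 0.2)["name"])  # base-layer
```

When resource availability changes, the receiver simply reruns the selection and changes its subscriptions to the new winner, which is the main principle of the scheme described above.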
The service provided after adaptation consumes a certain amount of resources and provides a certain utility to the
user. Thus, any given point in the adaptation space has a corresponding point both in the resource and the utility space. In
contrast, different subscription alternatives may result in the same user satisfaction or have the same resource requirements,
that is, single points in the utility and resource spaces may have several corresponding points in the adaptation space.
Utility functions are in general n-dimensional functions taking values from an n-dimensional QoS space as argument.
However, such functions are generally very complex and challenging to define and to compute with.19 A simpler but less
accurate approach, also adopted in the work of,17 is to define overall utility as a weighted sum of a set of dimensional
utility functions. A dimensional utility function measures user satisfaction in one QoS-dimension only. The weights of
the overall utility functions usually correspond to the relative level of importance of each QoS-dimension as preferred by
the user. Utility functions normally map the degree of satisfaction into a real number, often in the range [0, 1], where 0
indicates that the corresponding quality of the service is below the minimum required by the user, while 1 indicates that
the corresponding quality is at or above the required maximum quality level.
3.2. Fine-Granularity Unilateral Adaptation Scheme
When applying the above principles to adaptive video streaming, the goal of maximizing utility means finding the subscription with highest utility within resource constraints. In the following we describe how QoS is specified and the form of the
utility function. We also discuss the issue of determining the resource requirements of each alternative set of subscriptions.
3.2.1. Quality and Resources
Quality in the temporal, luminance, and chrominance dimensions is specified as required quality layer. Superblocks (regions of interest) do not have an independent quality dimension, but rather we allow the required quality of each superblock
to be specified using the temporal, luminance, and chrominance QoS dimensions. As superblocks are independently coded,
the required quality of one superblock can be specified independently of the required quality of other superblocks. Hence
a receiver could subscribe to any subset of superblocks of the video image, each with completely different video qualities.
As described, in the Adaptation Spaces approach the resource requirement of each subscription alternative must be
known. While the approach allows any kind of resource to be taken into account, we will limit this study to consider
network bandwidth and CPU utilization as the dimensions of the resource space. Determining resource requirements for a
given quality level of a service is in general a hard problem, and many solution approaches exist but none that solves the
problem in general in a scalable manner. The general solution to this problem, however, is not the focus of this paper.
To estimate the resource requirements of subscription alternatives for the experiments described in this paper we applied
a combined approach of measurements and estimates. We measured the resource consumption of each possible subscription
involving all superblocks, where all superblocks have equal quality dimensions (in total 96 different possible subscriptions).
For subscriptions involving several superblocks with different quality dimensions we estimate the resource requirements as
the sum of resource requirements of each involved superblock assuming that each superblock contributes an equal amount
to the measured resource consumption. This seems like a feasible approach since video data of different superblocks are
encoded and encapsulated into notifications independently of each other. Hence it appears reasonable to assume that the
resource requirement of subscribing to multiple superblocks can be estimated as the sum of the resource requirements
of the subscriptions for each superblock. This assumption is supported by experiments.6 Of course, the validity of the
resource estimates is limited to the specific video content and CPUs used during the experiment. This is a general problem
in QoS management. In order to make the observed values also valid for other CPU types and classes of video content
scale factors are often used. Our experiments, however, do not attempt to validate such an approach.
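The estimation approach described above can be sketched as follows, assuming a measured table of whole-frame resource consumption at uniform quality (the numbers and the two quality dimensions used here are invented for illustration):

```python
# Sketch of the per-superblock resource estimate: the cost of a mixed-quality
# subscription is approximated as the sum of per-superblock shares of the
# measured uniform-quality subscriptions.
N_SUPERBLOCKS = 6  # 3 rows x 2 columns, as in the paper's configuration

# Measured bandwidth (kbps) for a whole frame at uniform (tl, qy) quality.
measured_bw = {(0, 0): 60, (0, 3): 240, (3, 0): 300, (3, 3): 1200}

def estimate_bw(per_block_quality):
    """per_block_quality: one (tl, qy) tuple per subscribed superblock."""
    # Each superblock is assumed to contribute an equal share of the
    # measured whole-frame consumption at its quality level.
    return sum(measured_bw[q] / N_SUPERBLOCKS for q in per_block_quality)

# Two blocks at full quality, one at base quality, three not subscribed:
print(estimate_bw([(3, 3), (3, 3), (0, 0)]))  # 1200/6 + 1200/6 + 60/6 = 410.0
```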
3.2.2. Defining the Utility Function
When defining a utility function for the demonstration of how our video coding and encapsulation scheme support receiverdriven adaptive video streaming, our goal was to take all video quality dimensions into account. Hence utility should
address both the temporal, luminance and chrominance quality dimensions, as well as receiver preferences for region of
interest. The latter feature of utility might be used by receivers to express preferences for regions they do not want to
receive, or receive in lesser (or higher) quality than other regions. This approach makes the resulting utility function
somewhat different in form from a function defined as a plain weighted sum of dimensional utilities.
In accordance with the above goal, we define in equation 1 overall utility as the sum of per block utility, where N is the
number of superblocks, t is the temporal quality layer, y is the luminance quality layer, and c is the chrominance quality
layer. The utility contribution of superblock i is defined in equation 2, where Y_i'(t, y) denotes the utility contribution of the
luminance part of the signal and C_i'(t, c) denotes the contribution of the chrominance part. These are dependent on the
temporal layer t since the temporal dimension determines the amount of luminance or chrominance data received.
U(t, y, c) = \sum_{i=1}^{N} U_i(t, y, c)    (1)

U_i(t, y, c) = Y_i'(t, y) + C_i'(t, c)    (2)
Equations 3 and 4 model per block utility related to preference of region of interest. As indicated by the presence of the
two equations, region of interest preference can be specified independently in the luminance and chrominance dimensions.
The symbol sgn denotes the sign function such that sgn (x) = 1 if x > 0, −1 if x < 0, and 0 if x = 0.
Y_i'(t, y) = Y_i(y) + sgn(Y_i(y)) T_i(t)    (3)

C_i'(t, c) = C_i(c) + sgn(C_i(c)) T_i(t)    (4)
In order to explain equations 3 and 4 we first need to look at equations 5, 6 and 7 below. These define per block weighted
utility for the temporal, luminance and chrominance QoS dimensions respectively. The symbols K_t, K_y and K_c denote
dimensional utility functions, while the symbols W_t, W_y and W_c denote receiver preference weights assigned to each QoS
dimension. These functions and weights apply to all blocks in an image. The weights may take values in the domain [0, 1]
such that \sum_{i \in \{t, y, c\}} W_i = 1. Additionally, a user can specify per block preference weights for the temporal, luminance and
chrominance dimensions, denoted by the symbols W_i^t, W_i^y, W_i^c in the equations below, each with value domain [0, 1]. W_i^y and
W_i^c may also take the special value −1 to indicate that the superblock i is not wanted by the receiver for the corresponding
dimension. The idea is that subscriptions that include a superblock that is not wanted by the receiver will pay a penalty
of negative utility for the corresponding dimension, that is, the per block weighted dimensional utilities Y_i and C_i may
take negative values. This penalty is also reflected in equations 3 and 4. Because of the use of the sign function in these
equations, the utility functions they define may take negative values. For any subscription which includes a superblock
not wanted by the receiver, there exists at least one subscription with higher utility—the same subscription, but where
superblocks not wanted are not included.
T_i(t) = W_t W_i^t K_t(t)    (5)

Y_i(y) = W_y W_i^y K_y(y)    (6)

C_i(c) = W_c W_i^c K_c(c)    (7)
Note that in this paper the dimensional utility functions K_t, K_y and K_c are defined as a set of coefficient values, each specifying the utility value for a quality layer of a QoS-dimension. This can conveniently be represented as a one-dimensional
array for each dimensional utility function. For examples we refer to later sections of this paper.
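Equations 1 through 7 can be evaluated directly once the dimensional utility functions are given as coefficient arrays. The coefficients and weights below are illustrative, not taken from the paper's experiments:

```python
from math import copysign

# Illustrative dimensional utility functions, one coefficient per quality
# layer (base layer + 3 enhancement layers), and dimensional weights.
K_t = [0.5, 0.7, 0.8, 0.9]
K_y = [0.4, 0.6, 0.8, 0.9]
K_c = [0.2, 0.4, 0.6, 0.7]
W_t, W_y, W_c = 0.7, 0.2, 0.1  # receiver preference weights, sum to 1

def sgn(x):
    return 0 if x == 0 else int(copysign(1, x))

def block_utility(t, y, c, w_it, w_iy, w_ic):
    T_i = W_t * w_it * K_t[t]           # eq. (5)
    Y_i = W_y * w_iy * K_y[y]           # eq. (6)
    C_i = W_c * w_ic * K_c[c]           # eq. (7)
    Y_prime = Y_i + sgn(Y_i) * T_i      # eq. (3)
    C_prime = C_i + sgn(C_i) * T_i      # eq. (4)
    return Y_prime + C_prime            # eq. (2)

# Eq. (1): overall utility is the sum over all subscribed superblocks.
# Second block has w_ic = -1: its chrominance is unwanted and pays a penalty.
blocks = [(3, 3, 3, 1.0, 1.0, 1.0), (3, 3, 3, 1.0, 1.0, -1)]
U = sum(block_utility(*b) for b in blocks)
print(round(U, 3))  # 1.62
```

The penalty is visible in the numbers: the second block contributes only 0.11 instead of 1.51, so dropping its chrominance subscription would raise overall utility, exactly as argued above.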
3.2.3. Limiting QoS Variation Across Regions
Since a receiver may assign different preference weights to the different regions (superblocks) in a video image, the video
quality across regions may vary widely. While this may be suitable for some application types, such as parallel processing
of video content, it may be less suitable for others. In order to let receivers limit the variability of QoS across regions,
we introduce a mechanism that constrains quality deviation across regions. This mechanism allows a receiver to specify a
quality deviation value D_d^{max} for each quality dimension d. D_d^{max} is a normalized measure for maximum allowed variation
of quality in dimension d across regions, such that D_d^{max} = 0 means that no variation of quality in dimension d is allowed
between regions, while D_d^{max} = 1 means there are no constraints on the quality variation. Otherwise, the allowed variation
increases as the value of D_d^{max} increases from 0 to 1.
For a given subscription alternative and quality dimension d, the quality deviation D_d is defined as the normalized sum
of vector distances, D_{d,i}, between quality layers over all regions i, i = 1, . . . , N. In equations 8 and 9, l_i^d is the quality
layer subscribed to for quality dimension d in region i, n_d is the total number of quality layers in dimension d, and N is the
number of superblocks.

D_d = \frac{1}{(n_d - 1) N} \sum_{i=1}^{N} D_{d,i}    (8)

D_{d,i} = \sqrt{ \frac{1}{N^2} \sum_{j=1}^{N} \left( l_i^d - l_j^d \right)^2 }    (9)
The use of quality deviation constraints effectively limits the search space of subscription alternatives, as all subscriptions
not satisfying the specified maximum quality deviation are excluded as candidate configurations.
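A deviation check of this kind might be sketched as follows. The normalization used here is one reasonable choice and not necessarily identical to the paper's; it is meant only to show how a D_max threshold prunes candidate subscriptions:

```python
from math import sqrt

# Illustrative quality-deviation check for one quality dimension: compute a
# normalized deviation of the per-region quality layers and reject
# subscription alternatives that exceed the receiver's D_max.
def deviation(layers, n_layers):
    """layers: quality layer per region; n_layers: layers in this dimension."""
    N = len(layers)
    # Per-region distance to all other regions' layers, normalized by N.
    per_region = [sqrt(sum((li - lj) ** 2 for lj in layers) / N ** 2)
                  for li in layers]
    # Normalize by the layer range and the number of regions.
    return sum(per_region) / ((n_layers - 1) * N)

uniform = [2, 2, 2, 2, 2, 2]  # same layer in every region
mixed   = [3, 0, 3, 0, 3, 0]  # alternating full/base quality
print(deviation(uniform, 4))         # 0.0 -- no variation across regions
print(deviation(mixed, 4) <= 0.25)   # rejected if D_max for this dimension is 0.25
```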
4. EMPIRICAL RESULTS
The purpose of this section is to determine experimentally whether video streaming over content-based networking provides
a firm foundation for realizing fine grained and multi-dimensional adaptation. The fine grained and independent selectivity
should give different receivers the ability to customize the video stream according to their preferences, while the fine
grained scalability, in terms of bit rates and decoding requirements, should allow each video receiver to find a configuration
that closely matches the resource availability. Combined, this should allow receiver-driven adaptation over a wide range of
resource availabilities with fine granularity.

Figure 3. Results for no regional variation in quality (Action): (a) combined resource utilization, (b) utility, (c) bandwidth adaptation, (d) CPU adaptation.

Figure 4. Results for no regional variation in quality (Detail): (a) combined resource utilization, (b) utility, (c) bandwidth adaptation, (d) CPU adaptation.
In the following three experiments, we test to what extent the adaptation system can select a configuration that utilizes the
available resources and maximizes the utility for the receiver, given different receiver preferences and their resource
constraints. The experiments will also validate whether the proposed adaptation scheme has sufficient expressiveness for
capturing and handling different receiver preferences. It should be noted that the proposed scheme is not meant to be
exposed to human users directly. Rather, we assume that it is possible to find a more appropriate end user model and
that a mapping to our system level model can be derived.

Table 1. Dimensional weights

             Action              Detail              Parallel
             W_t   W_y   W_c     W_t   W_y   W_c     W_t   W_y   W_c
Weights      0.7   0.2   0.1     0.1   0.6   0.3     0.7   0.2   0.1

Table 2. Dimensional utility functions

             Action              Detail              Parallel
Layers       K_t   K_y   K_c     K_t   K_y   K_c     K_t   K_y   K_c
+3           0.9   0.7   0.4     0.4   0.9   0.7     0.9   0.7   0.4
+2           0.8   0.6   0.3     0.3   0.8   0.6     0.8   0.6   0.3
+1           0.7   0.5   0.2     0.2   0.7   0.5     0.7   0.5   0.2
base         0.6   0.4   0.1     0.1   0.6   0.4     0.6   0.4   0.1
null         0.0   0.0   -       0.0   0.0   -       0.0   0.0   -
It should also be noted that these experiments only consider how different receivers would choose an appropriate configuration, given their resource constraints and preferences.
The regulation part of adaptation, which deals with responsiveness, smoothness, and other issues related to when a receiver
should transition from one configuration to another, is considered a separate issue and not addressed any further here.
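As an illustration of how the dimensional weights (Table 1) and per-layer utility values (Table 2) interact, the sketch below scores a configuration for the Action profile. The weighted-sum aggregation and the dictionary layout are our assumptions; the actual utility model is defined by the adaptation scheme itself (cf. Sect. 3.2).

```python
# Dimensional weights (Table 1) and per-layer utility values (Table 2)
# for the Action profile.  Treating the "null" layer as utility 0.0 in
# every dimension is our assumption.
WEIGHTS = {"t": 0.7, "y": 0.2, "c": 0.1}
K = {
    "t": {"null": 0.0, "base": 0.6, "+1": 0.7, "+2": 0.8, "+3": 0.9},
    "y": {"null": 0.0, "base": 0.4, "+1": 0.5, "+2": 0.6, "+3": 0.7},
    "c": {"null": 0.0, "base": 0.1, "+1": 0.2, "+2": 0.3, "+3": 0.4},
}

def utility(config):
    """Weighted-sum score of a configuration such as {'t': '+3', 'y': 'base', 'c': 'null'}."""
    return sum(WEIGHTS[d] * K[d][layer] for d, layer in config.items())
```

Under these values, Action scores a configuration that spends a layer on temporal quality higher than one spending it on luminance, which is exactly the preference the weights are meant to express.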
4.1. Handling Different Preferences
In this subsection we introduce two receivers with differing interests. These two different and conflicting sets of interests
are used to illustrate how the adaptation system configures the underlying video receiving software and to what extent
the system is able to maximize a given receiver's preferences when available resources are constrained. We name the two
receivers Action and Detail as mnemonics for their respective sets of preferences.
By inspecting the dimensional weights in Table 1 we see that Action prefers temporal resolution, as its weight for
the temporal dimension is high relative to the other dimensions. Detail prefers high quality luminance data instead. We
should expect these differences in preference to result in different adaptations of the video stream. The full preference
specifications for the two receivers are given in Table 1 and 2 (the preference specification for Parallel will be used in
the last experiment, cf. Sect. 4.3). In the current experiment, we allow no regional variation in quality by specifying
D^max_t = D^max_y = D^max_c = 0 (cf. Sect. 3.2.3).
We simulated how the adaptation system would configure the underlying mechanism given a set of preferences and
resource constraints. By not considering the regional dimension we arrive at 96 configuration possibilities that receive
(a) combined resource utilization
(b) utility
(c) combined resource utilization
(d) utility
Figure 5. Results from full regional variations in luminance and chrominance quality: (a)(b) Action and (c)(d) Detail
video data. The simulation decreases the resource availability for both dimensions in equal steps until no configuration
alternatives are left. The CPU availability started at 100 % and was reduced by 1 % in each step. The bandwidth availability
started at 1400 kbps, which was enough to receive the full stream, and was then reduced in 5 kbps steps. The two
dimensions were reduced independently, resulting in 28000 different resource availability scenarios. For each scenario,
all configurations with resource requirements less than or equal to the available resources were considered, and only the
configuration with the highest utility survived. In cases where the utility was equal, the configuration with the lowest
resource requirement was selected.
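The selection rule just described can be sketched as follows. This is a minimal illustration; the tuple representation and the use of a summed resource requirement as the tie-breaker are our assumptions.

```python
def select_configuration(configs, avail_cpu, avail_bw):
    """Pick the feasible configuration with the highest utility.

    configs: iterable of (utility, cpu_req, bw_req) tuples.
    Among configurations of equal utility, the one with the lowest
    combined resource requirement wins, mirroring the rule in the text.
    """
    feasible = [c for c in configs
                if c[1] <= avail_cpu and c[2] <= avail_bw]
    if not feasible:
        return None  # resource availability too constrained
    # Highest utility first; among equals, lowest resource requirement.
    return max(feasible, key=lambda c: (c[0], -(c[1] + c[2])))
```

Running this once per resource availability scenario reproduces the sweep described above: each scenario yields exactly one surviving configuration, or none when resources are exhausted.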
The results from the simulations of the Action and Detail preference sets are visualized in Fig. 3 and 4 respectively
(if the images are unclear in the printed copy, please refer to the electronic version). Fig. 3(a) and 4(a) show the resource
utilization for each of the resource availability scenarios in each preference set. The x-axis shows the various possibilities
for CPU availability, while the y-axis shows the bandwidth availability. The intensity (color) represents to what degree the
surviving configuration exploits the resource availability. The resource utilization is a normalized metric composed of both
CPU and bandwidth utilization:

    R = sqrt( (r_c / a_c)^2 + (r_b / a_b)^2 ) / sqrt(2)        (10)
In the above equation, rc and rb denote the required CPU and bandwidth for a given configuration, while ac and ab denote
the available CPU and bandwidth given by the resource scenario. I.e., CPU and bandwidth requirements are normalized
with respect to the available resources and combined using a vector distance calculation. If the normalized CPU and
bandwidth utilization are both one, R is also one. Similarly, if both are zero, R is zero, and if both are 1/2, R is 1/2.
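Equation 10 translates directly into code; a minimal sketch (the function name is ours):

```python
import math

def resource_utilization(r_c, r_b, a_c, a_b):
    """Combined resource utilization R of equation 10.

    r_c, r_b : required CPU and bandwidth of a configuration.
    a_c, a_b : available CPU and bandwidth in the scenario.
    """
    # Normalize each requirement by its availability and combine the two
    # as a vector distance, scaled so that full use of both gives R = 1.
    return math.sqrt((r_c / a_c) ** 2 + (r_b / a_b) ** 2) / math.sqrt(2)
```

The properties stated in the text follow immediately: R = 1 when both normalized utilizations are one, R = 0 when both are zero, and R = 1/2 when both are 1/2.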
In a region of the resource space where the resource availability is reasonably balanced between the dimensions, the
utilization is high. As seen by comparing Fig. 3(a) and 4(a), the shape of these regions is somewhat dependent on the
receiver's preferences. Note that when resource availability is low in one dimension, it is not possible to exploit resource
availability in the other dimension. This can be seen along the bottom edge and left edge of both figures.
In a similar manner Fig. 3(b) and 4(b) present the resulting utility for a given resource scenario. The utility is normalized so that a value of 1 represents full utility. As expected, the utility is highest when both CPU and bandwidth are
unconstrained (upper right corner of the plot) since the receiver can receive and decode all dimensions in full quality. As
the resource availability falls towards the bottom edge and the left edge, the configuration fulfills the receiver's
preference to a lesser and lesser degree, until the resource availability is too constrained and no valid configuration exists.
Fig. 3(c) and 4(c) present different configurations, chosen by the adaptation system, to maximize utility while conforming to resource constraints in the bandwidth dimension. It is assumed here that CPU is in abundance. For both plots, the
utility is monotonically increasing as more resources are made available. However, different choices are made with respect
to quality dimensions as a result of differing preferences. In the case of Action, quality in the temporal dimension increases
early, while luminance quality data increases later as bandwidth availability increases. For Detail opposite choices are
made; luminance quality is prioritized while the temporal quality is increased when bandwidth allows, but never at the expense of luminance quality. Similarly, Fig. 3(d) and 4(d) present adaptations that maximize utility when the CPU resource
is constrained and bandwidth is plentiful. The trends observed for the bandwidth plots are also found here. However, CPU
usage does not vary significantly for different luminance and chrominance layers, as shown empirically in.6 Consequently,
the choices in figure 3(d) appear less clear.
4.2. Increasing QoS Variations Across Regions
We now extend the previous experiment by introducing selectivity in the regional dimension. The quality layers in each
superblock are configured individually, resulting in 10^12 potential configurations.
Figure 6. Parallel processing results: (a) combined resource utilization, (b) utility, (c) bandwidth adaptation, (d) CPU adaptation.
In this experiment we consider the configurations that arise when luminance and chrominance quality is allowed to
vary across regions, D^max_y = D^max_c = 1, while D^max_t = 0 as in the previous experiment. By constraining the temporal
layer, so that all regions have the same frame rate, the number of valid configurations is reduced to just below 10^9.
Table 3. Regional weights

Quality      Action           Detail           Parallel
W_i^y        0.13  0.14       0.13  0.14        0.5   0.5
             0.19  0.21       0.19  0.21       -1    -1
             0.15  0.18       0.15  0.18       -1    -1
W_i^c        0.13  0.14       0.13  0.14       -1    -1
             0.19  0.21       0.19  0.21       -1    -1
             0.15  0.18       0.15  0.18       -1    -1

The receiver preferences for Action and Detail are also used in this experiment, but their preference profiles are extended
with regional weights (cf. Table 3). In this experiment the regional weights are chosen so that it is beneficial, utility wise,
to prioritize the center regions for both receivers. Allowing quality to vary across regions enables a more fine grained
adaptation. The set of possible configurations is increased by a factor 10^7 compared to the previous experiment. The
results from this simulation are presented in Fig. 5.
4.3. Region of Interest and Parallel Processing
In this experiment, we consider a case in which the receivers are nodes in a parallel processing video analysis system.
Each node will perform some analysis task on a fraction of the full frame in order to partition the computational load. If
all nodes were to receive and decode a full frame only to analyze a fraction of it, the solution would not scale.20, 21 The
partitioning of load by distributing the regions amongst the available nodes is made possible by allowing negative weights
in the utility function (cf. Sect. 3.2.2). If the CPU requirements for the analysis tasks are variable, it would be beneficial
to adapt the requirements for the video decoding accordingly. Similarly, if the analysis task requires the communication of
a large amount of results, an ability to also reduce the bandwidth requirements for receiving video data gracefully would
also be desirable.
The parallel processing receivers are essentially the same, so we only need to consider one of them, dubbed Parallel.
Parallel uses the same preference set as Action. However, Parallel uses the regional weights to express which regions to
receive (cf. Table 3). By setting some of the regional weights to a negative value the adaptation system would incur a
penalty, utility wise, by including them. Here, only the luminance quality is allowed to vary between the regions, because
no chrominance data is received and D^max_t = 0 as in the preceding experiments. The number of possible configurations is
in this case 97 if we count a configuration where no data is received.
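The effect of the negative weights can be sketched as follows, assuming a frame divided into six superblocks of which Parallel wants the first two. The weight values follow Table 3; the linear scoring rule and the function name are our assumptions.

```python
def regional_score(weights, received):
    """Sum of regional weights over the superblocks actually received.

    weights  : per-superblock weight (negative for unwanted regions).
    received : per-superblock 0/1 flag for whether it is subscribed to.
    """
    return sum(w * r for w, r in zip(weights, received))

# Parallel's luminance weights: positive for its own regions, a penalty
# of -1 for every region handled by another node (cf. Table 3).
weights = [0.5, 0.5, -1.0, -1.0, -1.0, -1.0]
```

Receiving exactly SB1 and SB2 scores 1.0; any configuration that also includes a foreign superblock scores strictly lower, so the adaptation system never selects it.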
In Fig. 6, the results from the simulation of the parallel processing scenario are plotted. Note that the maximum
resource usage in these plots is lower than in the other results presented in this section, because only a fraction of the full
frame is received and decoded. It is assumed that most of the CPU is needed to perform the analysis task. As seen in figure
6(a), there are several possibilities for utilizing available resources to a high degree even though only two regions and two
quality dimensions are considered.
Fig. 6(c) and 6(d) present the choices made by the adaptation system as bandwidth and CPU availability are increased.
The bandwidth plot is similar to figure 3(c). Here, the two regions are stepped up in luminance quality in turn. Fig. 6(d)
shows that both regions receive the best luminance quality early. Again, this is because no significant savings can be made
in the CPU dimension by selecting lower luminance quality. As seen in Fig. 6(c) and 6(d) the utility is rapidly increasing,
and remains high as long as both regions are receiving luminance data.
5. RELATED WORK
A pioneering work for handling both the multi-receiver and the heterogeneity challenge is RLM,4 receiver-driven layered
multicast. RLM combines layered video coding techniques with transport over several multicast channels and supports a
combination of spatial and temporal scalability. The mechanism provided for adaptation is to join/leave multicast groups.
Clients with low bandwidth connections may register interest in the base layer only, while other clients may subscribe to a
number of additional multicast addresses, as bandwidth permits. Regarding user preferences, a layering policy determines
the dimension to be refined for each additional layer. This layering policy is fixed at the sender side and determines how the
quality of the video is refined when a client subscribes to an additional multicast address. As an example, the first additional
layer may improve the spatial quality, while the second improves the frame rate. Although an interesting approach, some
receivers may prefer temporal resolution over spatial quality and vice versa. Such conflicts cannot be resolved, since the
layering policy is fixed at the sender side. As the number of scalable video quality dimensions increases a large number of
multicast addresses is needed and the specification of layering policies becomes even more problematic.
In order to bridge the heterogeneity gap created by differences in resource availability, hardware capabilities, and software incompatibilities, media gateway systems have been proposed for streaming.22 These systems are overlay networks—
the media gateways are internal network nodes, while senders and receivers are at the edges. Gateways receive media
streams from upstream nodes, before forwarding the processed and potentially transformed streams to downstream gateways and receivers. The mechanism provided for adaptation is any kind of video filtering and transformation operation,
such as temporal scaling, signal-to-noise-ratio scaling, or spatial partitioning as a preparation step for parallel processing.
The cost associated with this flexibility is increased network and processing resource consumption and additional delay.
Although several receivers may share a single gateway, they may share interest in only some part of the video signal.
Hence, it seems difficult to handle such cases efficiently, processing and delivery wise.
In23, 24 a framework for real-time quality-adaptive media streaming is presented. The goal is to allow video data to
be encoded once and streamed anywhere, by adapting to bandwidth availability. In the scheme, called priority-progress
streaming, video data are transformed into a scalable representation. Spatial scalability is realized by transcoding DCT
coefficients hierarchically to a set of levels, while frame dropping is used for temporal scalability. This allows video data
to be broken up into small chunks which can then be assigned different priorities based on utility functions. The timeline is
divided into distinct time intervals, so-called adaptation windows. The most important chunks are sent first and whatever
remains in the previous adaptation window is dropped when the system transits to the next adaptation window. One-to-many streaming requires an overlay multicast structure,24 where each edge is realized by means of the priority-progress
unicast streaming approach. TCP is used for both unicast and multicast transport. The mechanism provided for adaptation
is dropping, by the sender or by a node within the multicast tree. The video quality may therefore differ throughout the
distribution tree. Since the priorities are assigned at the root of the tree, only a single policy may be specified for a single
session. Similar to RLM, the system seems unable to handle video receivers with different preferences regarding the
relative importance of the different video quality dimensions.
In,3 a survey of solutions for adaptive multicast over the Internet is presented. Similar to some of the listed approaches,
ours takes advantage of scalable coding. Contrary to these listed approaches, our approach allows each video receiver to
independently and with fine granularity express interest in different parts of the video signal. Additionally, our system
supports more scalable video quality dimensions than what is usually the case, a complicating factor from an adaptation modelling point of view. In particular the incorporation of region of interest scalability necessitated a sophisticated
adaptation scheme, where each region within a frame is treated separately utility wise.
6. CONCLUSION AND FURTHER WORK
We have presented an approach for adaptive multi-receiver video streaming. The approach builds on our earlier developed
video coding scheme which uses content-based networking for efficient multi-receiver video streaming. The coding scheme
allows each receiver to independently and with fine granularity customize the video stream with respect to regions of
interest, the signal-to-noise ratio for the luminance and the chrominance planes, and the temporal resolution.
The contribution is a novel adaptation scheme which combines such video streaming with state-of-the-art techniques
from the field of adaptation to provide receiver-driven multi-dimensional adaptive video streaming. Each video receiver
may independently and unilaterally adapt the video quality along multiple independent quality dimensions, to match processing and networking resources currently available and according to its own preferences. Results from experiments demonstrate (1) adaptation to variations in available bandwidth and CPU resources, roughly over two orders of magnitude, (2)
that fine grained adaptation is feasible given radically different user preferences, and (3) that for any reasonably balanced
combination of bandwidth and CPU availability, both resource utilization and user utility can be kept at a high level.
Currently we are looking more into the regulation part of the adaptation problem, addressing issues related to when
and how a receiver should transit from one configuration to another. The ability to receive different regions in different
qualities may allow virtual pan and zoom in very high quality video streams based on visual focus—high quality for the
regions looked at and less quality in the periphery. Responsiveness and smoothness are critical aspects in such scenarios.
REFERENCES
1. S. D. Servetto, R. Puri, J.-P. Wagner, P. Scholtes, and M. Vetterli, “Video Multicast in (Large) Local Area Networks,”
in Proceedings of IEEE INFOCOM, 2, pp. 733–742, June 2002.
2. A. Ganjam and H. Zhang, “Internet Multicast Video Delivery,” Proceedings of the IEEE 93, pp. 159–170, January 2005.
3. J. Liu, B. Li, and Y.-Q. Zhang, “Adaptive Video Multicast over the Internet,” IEEE Multimedia 10, pp. 22–33,
January/March 2003.
4. S. McCanne, M. Vetterli, and V. Jacobson, “Low-Complexity Video Coding for Receiver-Driven Layered Multicast,”
IEEE Journal of Selected Areas in Communications 15, pp. 983–1001, August 1997.
5. V. S. W. Eide, F. Eliassen, and J. A. Michaelsen, “Exploiting Content-Based Networking for Video Streaming,” in
Proceedings of the ACM Multimedia, (Demo), ACM MM’04, NY, USA, pp. 164–165, October 2004.
6. V. S. W. Eide, F. Eliassen, and J. A. Michaelsen, “Exploiting Content-Based Networking for Fine Granularity MultiReceiver Video Streaming,” in Proceedings of MMCN’05, SPIE, USA, 5680, pp. 155–166, January 2005.
7. F. Jensen, “Adaptive Video Streaming over an Event Notification Service,” September 2005. Master Thesis, Department of Informatics, University of Oslo, Norway.
8. A. Carzaniga, M. J. Rutherford, and A. L. Wolf, “A Routing Scheme for Content-Based Networking,” in Proceedings
of IEEE INFOCOM, 2, pp. 918–928, (Hong Kong, China), March 2004.
9. G. Eisenhauer, F. E. Bustamante, and K. Schwan, “Event Services in High Performance Systems,” Cluster Computing 4(3), pp. 243–252, 2001.
10. P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec, “The Many Faces of Publish/Subscribe,” ACM
Computing Surveys (CSUR) 35, pp. 114–131, June 2003.
11. B. Segall, D. Arnold, J. Boot, M. Henderson, and T. Phelps, “Content Based Routing with Elvin4,” in Proceedings of
AUUG2K, Canberra, Australia, June 2000.
12. L. Opyrchal, M. Astley, J. S. Auerbach, G. Banavar, R. E. Strom, and D. C. Sturman, “Exploiting IP Multicast in
Content-Based Publish-Subscribe Systems," in Middleware, LNCS 1795, pp. 185–207, Springer, 2000.
13. P. R. Pietzuch and J. M. Bacon, “Hermes: A Distributed Event-Based Middleware Architecture,” in Proceedings of
DEBS’02, Vienna, Austria, pp. 611–618, IEEE, July 2002.
14. A. Carzaniga, D. S. Rosenblum, and A. L. Wolf, “Design and Evaluation of a Wide-Area Event Notification Service,”
ACM Transactions on Computer Systems 19, pp. 332–383, August 2001.
15. V. S. W. Eide, F. Eliassen, O. Lysne, and O.-C. Granmo, “Extending Content-based Publish/Subscribe Systems with
Multicast Support,” Tech. Rep. 2003-03, Simula Research Laboratory, July 2003.
16. W. Li, “Overview of Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Transactions on Circuits and
Systems for Video Technology 11, pp. 301–317, March 2001.
17. S. Bowers, L. Delcambre, D. Maier, C. Cowan, P. Wagle, D. McNamee, A.-F. L. Meur, and H. Hinton, “Applying
adaptation spaces to support quality of service and survivability,” in DISCEX ’00, 2, pp. 271–283, Jan. 2000.
18. D. Gotz and K. Mayer-Patel, “A General Framework for Multidimensional Adaptation,” in Proceedings of the ACM
Multimedia, pp. 612–619, ACM, 2004.
19. S.-F. Chang and A. Vetro, “Video Adaptation: Concepts, Technologies, and Open Issues,” in Proceedings of the IEEE,
93, pp. 148–158, January 2005.
20. V. S. W. Eide, F. Eliassen, O.-C. Granmo, and O. Lysne, “Supporting Timeliness and Accuracy in Distributed Realtime Content-based Video Analysis,” in Proceedings of the ACM Multimedia, USA, pp. 21–32, November 2003.
21. V. S. W. Eide, O. C. Granmo, F. Eliassen, and J. A. Michaelsen, “Real-time Video Content Analysis: QoS-Aware
Application Composition and Parallel Processing,” ACM Transactions on Multimedia Computing, Communications,
and Applications, (TOMCCAP) 2, pp. 149–172, May 2006.
22. W. T. Ooi and R. van Renesse, "Distributing Media Transformation Over Multiple Media Gateways," in Proceedings of
ACM Multimedia, September 30 - October 05, Canada, pp. 159–168, 2001.
23. J. Huang, C. Krasic, J. Walpole, and W. Feng, “Adaptive Live Video Streaming by Priority Drop,” in IEEE Conference
on Advanced Video and Signal Based Surveillance (AVSS 2003), USA, IEEE Computer Society, 2003.
24. C. Krasic, A Framework for Quality-Adaptive Media Streaming: Encode Once — Stream Anywhere. PhD thesis, OGI
School of Science & Engineering at Oregon Health & Science University, February 2004.