Fine Granularity Adaptive Multi-Receiver Video Streaming

Viktor S. Wold Eide (a,b), Frank Eliassen (b,a), Jørgen Andreas Michaelsen (b), Frank Jensen (b)
(a) Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway
(b) University of Oslo, P.O. Box 1080 Blindern, N-0314 Oslo, Norway
ABSTRACT
Efficient delivery of video data over computer networks has been studied extensively for decades. Still, multi-receiver video
delivery is challenging, due to heterogeneity and variability in network availability, end node capabilities, and receiver
preferences. Our earlier work has shown that content-based networking is a viable technology for fine granularity multi-receiver video streaming. By exploiting this technology, we have demonstrated that each video receiver is provided with
fine grained and independent selectivity along the different video quality dimensions: region of interest, signal to noise
ratio for the luminance and the chrominance planes, and temporal resolution. Here we propose a novel adaptation scheme
combining such video streaming with state-of-the-art techniques from the field of adaptation to provide receiver-driven
multi-dimensional adaptive video streaming. The scheme allows each client to individually adapt the quality of the received
video according to its currently available resources and own preferences. The proposed adaptation scheme is validated
experimentally. The results demonstrate adaptation to variations in available bandwidth and CPU resources roughly over
two orders of magnitude and that fine grained adaptation is feasible given radically different user preferences.
Keywords: adaptive multi-receiver video streaming, scalable video coding, content-based publish-subscribe systems
1. INTRODUCTION
Efficient delivery of video data over computer networks, such as the Internet, has been studied extensively for decades.
As a result, relatively good solutions exist for streaming video from a sender to a single receiver. However, the content
of a single video stream is often of interest to a number of receivers simultaneously. Delivering such a video stream
efficiently to a potentially large number of receivers is complicated and represents a challenge even within large LANs.1
Video streaming over WANs or the Internet is even harder. Despite extensive research multi-receiver video streaming still
represents a challenge which awaits satisfactory solutions.2, 3
The challenge is to provide each video receiver with the best possible video quality when considering resource limitations and preferences, while maintaining efficiency and scalability at the sender side, in the network, and at the receiver
side. In other words, the challenge is to provide each receiver with a video stream which is customized according to individual preferences and currently available resources. The main difficulties are related to the combination of heterogeneity,
variability, and efficient one-to-many delivery, as discussed in the following.
The sources for heterogeneity are manifold. Video servers and clients are often connected to networks by diverse
technologies having different characteristics. Similarly, the end node capabilities may differ radically with respect to
processing capabilities, display resolution, and power availability. Receivers may also have different preferences regarding
the relative importance of the different video quality dimensions. Some prefer frame rate over frame quality and vice versa.
In addition to these somewhat static differences, video streaming systems have to cope with variability on a shorter
time scale. The video content itself changes over time, which often translates into variable resource requirements, for
example, with respect to bit rates and processing needs. The resource availability may also change over time. The available
bandwidth experienced by each receiver may vary due to congestion and changes in signal strength for wireless equipment.
Similarly, the processing capacity and power availability may also vary over time. If resources become scarce, some
receivers may prefer to sacrifice quality in the temporal dimension instead of reducing spatial quality. This illustrates that
different receivers may have different preferences regarding how to adapt to variations in resource availability.
Handling heterogeneity and variability is further complicated by the need for efficient one-to-many delivery. In unicast
delivery each client connects directly to a server. The server may then provide each client with a stream customized to the
user preferences and current resource availability. However, unicast is inefficient and does not scale, since the server has to
handle each and every video receiver individually. Network or application level multicast delivery may improve network
efficiency by moving and distributing the onus of packet replication and forwarding to downstream nodes. However, a
single multicast stream provides each client with little or no selectivity. Simulcast delivery may provide clients with a
choice between a few streams, each having a different tradeoff between quality characteristics and resource requirements.
These streams may be delivered on different multicast channels. In order to provide each receiver with selectivity, a number
of streams with different quality and resource characteristics are necessary. However, each additional stream carries some
redundant video data and reduces network efficiency. This tradeoff between selectivity and efficiency makes the approach
rather coarse grained in practice. A combination of layered coding and multicast4 is also rather coarse grained, but
improves network efficiency, as the amount of redundant information in different streams is reduced.

Further author information: V.S.W.E.: viktore@simula.no, F.E.: frank@ifi.uio.no, J.A.M.: jorgenam@ifi.uio.no, F.J.: fnjensen@ifi.uio.no
Fine granularity and efficient multi-receiver video streaming can be realized by exploiting content-based networking for
data delivery.5–7 Content-based networking systems are realized by overlay networks. A combination of several attributes
and values in each packet determine the overlay routing, forwarding, and delivery. In effect each receiver may unilaterally
customize the video quality in each and every dimension, such as region of interest, signal-to-noise ratio for the luminance
and chrominance planes, and temporal resolution.
This paper describes how state-of-the-art techniques from the field of adaptation can be bridged with video streaming
over content-based networking. A novel adaptation scheme is proposed which takes advantage of the fine grained selectivity provided by such video streaming systems. The scheme is validated experimentally and the results demonstrate that
fine grained adaptation is feasible given quite different user preferences. The experiments show that a receiver may adapt
to small variations in resource availability, over roughly two orders of magnitude, with respect to both bandwidth and CPU.
The rest of the paper is structured as follows. Sect. 2 presents background information on content-based networking
and gives an overview of how such communication systems can be used to realize a fine granularity multi-receiver video
streaming system. Sect. 3 describes our adaptation scheme for multi-dimensional adaptive video streaming which is able
to exploit all quality dimensions supported by the underlying video streaming system. Empirical results presented in Sect.
4 demonstrate the ability of our adaptation scheme to support fine grained multi-dimensional adaptation, while taking user
preferences and a wide range of resource availability situations into account. A comparison to related work on adaptive
multi-receiver video streaming is provided in Sect. 5. Sect. 6 concludes the paper and presents some ongoing work.
2. BACKGROUND
This section briefly describes content-based networking and how such systems can be used to support fine granularity
multi-receiver video streaming, the basis for the proposed unilateral receiver-driven adaptation scheme.
2.1. Content-Based Networking
In content-based networking, as described in,8 messages are forwarded based on content and not on an explicit address.
Each message contains a set of attribute-value pairs and clients express interest in certain messages by specifying predicates
over these attributes and values. The predicates are used by the network nodes for routing. Messages are forwarded based
on content-based routing tables and delivered only to clients having matching selection predicates. Filtering of messages
is pushed towards the source while replication is pushed towards the destinations. Consequently, each message should
traverse a link at most once. Distributed content-based publish-subscribe systems are intimately related to content-based
networking—clients express their interest by means of subscriptions and send messages by publishing.
Publish-subscribe communication has proven useful in a wide range of applications, including high performance systems.9 The expressiveness of the subscription languages provided by different kinds of publish-subscribe systems varies,
where content-based systems provide most expressiveness∗ . Examples of content-based publish-subscribe systems include.11–14 In these systems, the messages are called event notifications or just notifications for short. Clients inject
notifications into the network by publishing, as depicted in Fig. 1. Other clients express their interests using subscriptions, as predicates over the attribute-value pairs, and are notified accordingly. Both subscriptions and notifications are
being pruned inside the content-based network. As an example, consider the case where two clients are connected to the
same content-based network node, as illustrated in Fig. 1. Notifications are not forwarded in the network before at least
one client has expressed interest. When the first client subscribes and thereby registers interest in some notifications, the
subscription may get forwarded over the links numbered 5, 4, 3, 2, and 1. State is then maintained in the network nodes to
allow notifications to flow, e.g., on the reverse path. When the client connected by the link numbered 6 subscribes, the first
network node only forwards the subscription if this new subscription is not covered by the first subscription.
Supposing a content-based network contains a publisher that supplies financial market information, a notification
may for example contain the attributes and values [order=sell ticker=abc price=15 volume=2000]. An interested receiver may have subscribed using the predicates [ticker=abc price<20], and will be notified accordingly.
∗ For a survey on the publish-subscribe communication paradigm and the relations to other interaction paradigms see.10
A subscriber with the subscription [volume>2000] will not be notified. Note that attributes not specified become
unconstrained.
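The matching behavior in the financial-market example can be illustrated with a small sketch. The dictionary-based notification and predicate encoding below are our own illustration, not the interface of any particular publish-subscribe system:

```python
# Hypothetical sketch of content-based matching: a notification is a set of
# attribute-value pairs, a subscription is a conjunction of predicates.
def matches(subscription, notification):
    """Return True if every predicate in the subscription holds."""
    for attr, (op, value) in subscription.items():
        if attr not in notification:
            return False  # a constrained attribute is missing -> no match
        actual = notification[attr]
        if op == "=" and actual != value:
            return False
        if op == "<" and not actual < value:
            return False
        if op == ">" and not actual > value:
            return False
    return True  # attributes left unconstrained match anything

notification = {"order": "sell", "ticker": "abc", "price": 15, "volume": 2000}
sub_a = {"ticker": ("=", "abc"), "price": ("<", 20)}  # is notified
sub_b = {"volume": (">", 2000)}                       # is not notified
print(matches(sub_a, notification))  # True
print(matches(sub_b, notification))  # False
```

Note that the unspecified attributes (order, volume) in the first subscription play no role in the decision, mirroring the "unconstrained" semantics described above.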
Figure 1. Content-based networking example (a video server, video clients, and network nodes exchanging publish, subscribe,
and notify messages over numbered links in a content-based network)

Architectures and algorithms for scalable wide area content-based publish-subscribe systems have been studied extensively.12, 14 Complementary to the WAN case is the challenge of efficiently distributing very high rate event notifications
between a large number of clients within a smaller region, e.g., a LAN or an administrative domain. In Fig. 1 this corresponds
to the notifications forwarded over the links numbered 5, 6, and to other clients within the same domain. An architecture for
a distributed content-based event notification service targeted at LAN or intra domain usage is described in.15 In short, a
mapping from the "event notification space" to IP multicast addresses is specified. Hence, a notification is mapped to an IP
multicast address and efficiently forwarded to all clients having matching subscriptions. A client may publish several
thousand notifications, carrying several megabytes of data per second, more than sufficient for streaming compressed high
quality video.
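The mapping idea can be sketched as follows. This is purely illustrative: the actual mapping in the cited architecture is specified differently, and the address pool, range, and hash choice here are assumptions:

```python
import hashlib

# Illustrative only: deterministically map a notification's attribute-value
# pairs onto one of a fixed pool of IP multicast group addresses, so that
# receivers with matching subscriptions can join the corresponding groups.
POOL_BASE = (239, 255, 0, 0)  # administratively scoped range (assumption)
POOL_SIZE = 256               # number of groups in the pool (assumption)

def multicast_address(notification):
    # Sort attributes so the address is independent of dictionary order.
    key = ",".join(f"{k}={notification[k]}" for k in sorted(notification))
    index = int(hashlib.sha256(key.encode()).hexdigest(), 16) % POOL_SIZE
    a, b, c, _ = POOL_BASE
    return f"{a}.{b}.{c}.{index}"

addr = multicast_address({"sid": 10, "tl": 0, "qy": 0, "row": 1, "col": 1})
print(addr)  # e.g. 239.255.0.x -- identical attributes always map to the same group
```

The essential property is determinism: every node computes the same group address for the same notification attributes, so no coordination is needed at publish time.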
2.2. Video Streaming over Content-Based Networking
The work reported in6 describes a video streaming system layered on top of a content-based network. By utilizing the rich
routing opportunities offered by the content-based network, the video streaming system allows video receivers (clients) to
select video quality along a set of Quality of Service (QoS) dimensions without affecting other clients or the video server.
A method for video encoding was devised to accommodate the publish-subscribe paradigm and allow for a fine grained
and independent selectivity in each quality dimension.
The video encoder partitions a set of captured video frames into multiple fragments that each map to a point in the QoS
space in such a way that the full set of fragments span the QoS space. The video encoder then encapsulates each part of
the video signal in a notification by marking up the binary video data with attributes and values describing where in the
QoS space the fragment belongs. Each notification is published into the content-based network, and routed by the network
towards clients that have expressed interest through subscriptions. Thus, clients may opt to receive only a subset of the full
video signal, depending on their needs and resource availability. Efficient delivery of video data is maintained in terms of
network utilization and end node decoding requirements.
The video coding scheme currently supports selectivity along the following video quality dimensions: region of interest,
signal to noise ratio for the luminance and the chrominance planes, and temporal resolution. The techniques used to
achieve independent selectivity for each of these dimensions are briefly described in the following.
A main principle of the encoding scheme is that of layered encoding. In layered encoding the video signal is encoded
into a number of layers—a higher layer encodes video data corresponding to higher quality. The layers are coded cumulatively in order to reduce the amount of redundant information across layers. A client requests a particular quality
in a QoS dimension by subscribing to notifications corresponding to the requested quality level and all lower levels for
that dimension. Since the data corresponding to a quality level in one QoS dimension is encapsulated independently from
data of other QoS dimensions, each video receiver may independently customize the video signal along different video
quality dimensions by subscribing to the corresponding notifications. This is in contrast to, e.g., Receiver-driven Layered
Multicast4 where the sender determines the QoS dimensions to be affected for each additional layer.
Video receivers may select the region of interest in terms of so-called superblocks. A superblock contains a number of
16 × 16 pixel macroblocks, is self contained and represents the smallest selectable region. Superblocks are indexed by a
row and a column number and the attribute names are row and col respectively. With respect to colors, the luminance part
of the video signal is processed and encapsulated in notifications separately from the chrominance part. In the signal-to-noise ratio (SNR) dimension, a layered coding is used, which relies on the Discrete Cosine Transform (DCT) and bit-plane
coding.16 In essence, notifications for the base layer contain the most significant bits. The attributes for selecting the SNR
in the luminance and chrominance dimensions are named qy (quality luminance) and qc (quality chrominance) respectively.
As an example, a receiver may subscribe to the luminance part of the video signal at a higher quality than the chrominance
part: [qy<=2 qc<=1]. The temporal dimension is also realized by a layered coding scheme, where each additional layer
increases (doubles) the frame rate. The superblocks in the first frame in a group of pictures (GOP) are intra coded and thus
self contained. The rest of the GOP is coded to exploit temporal redundancy (similar to I and P frames in MPEG). The
attribute for selecting the temporal resolution is named tl (temporal layer).
The number of notifications generated per video frame is determined by the number of superblocks and the number of
luminance and chrominance SNR layers. The current implementation uses a base layer and three enhancement layers for
the temporal, luminance, and chrominance dimensions. With respect to the region of interest dimension, three rows and two
columns are used. Clearly, the number of notifications generated can be substantial. However, content-based routers may
encapsulate several notifications in a single packet destined for another content-based router in order to increase efficiency
by exploiting the full payload capacity of the underlying network technology.

Figure 2. Four video clients with different interests
Consider the notification [sid=10 tl=1 qy=3 qc=-1 col=1 row=2 blob=q34i23QR...D], where sid is the video
stream identifier attribute, and blob is the binary video data. Because it is possible to leave dimensions unspecified, and thus
unconstrained in the subscription, a client may express interest in the full quality video stream by specifying [sid=10].
Another client interested in only the luminance data at reduced frame rate, would use another subscription such as [sid=10
tl<=2 qc<0]. A third client may receive all regions in full quality, but at the lowest frame rate, by using a subscription
such as [sid=10 tl=0]. More complex requirements can be expressed by combinations.
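The three subscription examples above can be mimicked in a small sketch. The attribute names (sid, tl, qy, qc, row, col) come from the paper; the predicate representation is our own illustration:

```python
# Illustrative matching of video subscriptions against one notification.
def satisfies(predicates, notification):
    """True if every (attribute, predicate) pair holds for the notification."""
    return all(pred(notification.get(attr)) for attr, pred in predicates)

# One fragment of the video signal, as in the paper's example notification.
frame_part = {"sid": 10, "tl": 1, "qy": 3, "qc": -1, "col": 1, "row": 2}

full_quality = [("sid", lambda v: v == 10)]          # everything for stream 10
luma_low_fps = [("sid", lambda v: v == 10),          # luminance only,
                ("tl", lambda v: v <= 2),            # reduced frame rate
                ("qc", lambda v: v < 0)]
lowest_fps   = [("sid", lambda v: v == 10),          # all regions, full quality,
                ("tl", lambda v: v == 0)]            # lowest frame rate

print(satisfies(full_quality, frame_part))  # True: only sid is constrained
print(satisfies(luma_low_fps, frame_part))  # True: tl=1<=2 and qc=-1<0
print(satisfies(lowest_fps, frame_part))    # False: tl=1 is not the base layer
```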
Fig. 2 illustrates that different video receivers may independently select different parts of the video signal (if the images
are unclear in the printed copy, please refer to the electronic version). The four screenshots in the figure illustrate the
following selections (from left to right): full quality and colors; only luminance in first superblock row, only chrominance
in second superblock row, and both luminance and chrominance in the last superblock row; only some of the superblocks and
in lowest luminance quality; only luminance and low quality, except for a region having full quality and colors.
The flexibility of the video coding scheme allows a receiver to even specify different temporal resolutions (frame rates)
for different superblocks (frame regions). To fully exploit the flexibility of the video streaming system without sacrificing
expressiveness, a layer of indirection is needed to automatically obtain suitable configurations based on preferences and
available resources. Hence, a suitable adaptation scheme is needed. This is the topic of the following section.
3. MULTI-DIMENSIONAL RECEIVER-DRIVEN ADAPTATION
Here we show how the described video coding and encapsulation scheme can be used to support receiver-driven adaptive
video streaming. The proposed adaptation scheme is unilateral in the sense that each receiver dynamically and independently may adapt the quality of the received video according to own preferences and resource availability. This is a unique
feature of our approach made possible by the underlying content-based networking services and video encoding scheme.
3.1. Overview of Adaptation Scheme
Our approach to adaptation is to combine the above described techniques for fine-granularity multi-receiver video streaming
with state-of-the-art techniques for video adaptation. A candidate video adaptation technique must allow dynamic selection
of video data subscriptions (video quality) as a function of user preferences and resource availability. A candidate technique
must also handle multi-dimensional QoS and resource spaces. We hypothesize that due to the fine granularity selectivity of
the video streaming, the resulting adaptation scheme should be able to find a set of video data subscriptions that in resource
requirements closely matches any reasonably balanced combination of CPU and network bandwidth availability, while at
the same time satisfying individual user preferences. Consequently, the main principle of the adaptation scheme will be
that a receiver changes the video data subscriptions dynamically according to the resource availability.
While there are many techniques for selecting video quality as a function of resource availability and user preferences, our choice for this demonstration fell on multi-dimensional utility functions. A utility function is a measure of user
satisfaction as a function of QoS or resource availability. We model an adaptive video streaming system, inspired by adaptation spaces,17 as several video data subscription alternatives (the adaptation space). We are currently also investigating
other formal adaptation models, including,18 to see whether they may provide a better foundation for multi-dimensional
receiver-driven adaptation.
The different subscription alternatives differ in the quality they provide and resources they require. Determining the
optimal subscription alternative is done by a selection mechanism, based on information about resource availability, resource requirements of different subscription alternatives, and user utility. If a subscription alternative has too high resource
requirements compared to available resources, the service may fail in unpredictable ways such as failing to completely decode received video data. A task of the adaptation system is thus to select the subscription alternative that maximizes utility
while keeping resource consumption within the limits of the available resources. An adaptation system would normally
also include a policy for when to adapt. While this is an essential part of any adaptation system, we do not consider details
of adaptation policies in this paper.
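A minimal sketch of such a selection mechanism might look as follows. The subscription alternatives and their utility and resource figures are invented for illustration; a real adaptation space would enumerate many more alternatives:

```python
# Illustrative adaptation space: each alternative is a set of subscriptions
# with an associated utility and estimated resource requirements.
alternatives = [
    {"name": "full",       "utility": 1.00, "bw_kbps": 1400, "cpu": 0.90},
    {"name": "half-rate",  "utility": 0.75, "bw_kbps": 700,  "cpu": 0.50},
    {"name": "luma-only",  "utility": 0.55, "bw_kbps": 450,  "cpu": 0.35},
    {"name": "base-layer", "utility": 0.20, "bw_kbps": 100,  "cpu": 0.10},
]

def select(available_bw_kbps, available_cpu):
    """Pick the highest-utility alternative that fits the available resources."""
    feasible = [a for a in alternatives
                if a["bw_kbps"] <= available_bw_kbps and a["cpu"] <= available_cpu]
    return max(feasible, key=lambda a: a["utility"]) if feasible else None

print(select(800, 0.6)["name"])  # half-rate
print(select(200, 0.2)["name"])  # base-layer
```

When resource availability changes, the receiver simply reruns the selection and changes its subscriptions to the new winner, which is the main principle of the scheme described above.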
The service provided after adaptation consumes a certain amount of resources and provides a certain utility to the
user. Thus, any given point in the adaptation space has a corresponding point both in the resource and the utility space. In
contrast, different subscription alternatives may result in the same user satisfaction or have the same resource requirements,
that is, single points in the utility and resource spaces may have several corresponding points in the adaptation space.
Utility functions are in general n-dimensional functions taking values from an n-dimensional QoS space as argument.
However, such functions are generally very complex and challenging to define and to compute with.19 A simpler but less
accurate approach, also adopted in the work of,17 is to define overall utility as a weighted sum of a set of dimensional
utility functions. A dimensional utility function measures user satisfaction in one QoS-dimension only. The weights of
the overall utility functions usually correspond to the relative level of importance of each QoS-dimension as preferred by
the user. Utility functions normally map the degree of satisfaction into a real number, often in the range [0, 1], where 0
indicates that the corresponding quality of the service is below the minimum required by the user, while 1 indicates that
the corresponding quality is at or above the required maximum quality level.
3.2. Fine-Granularity Unilateral Adaptation Scheme
When applying the above principles to adaptive video streaming, the goal of maximizing utility means finding the subscription with highest utility within resource constraints. In the following we describe how QoS is specified and the form of the
utility function. We also discuss the issue of determining the resource requirements of each alternative set of subscriptions.
3.2.1. Quality and Resources
Quality in the temporal, luminance, and chrominance dimensions is specified as required quality layer. Superblocks (regions of interest) do not have an independent quality dimension, but rather we allow the required quality of each superblock
to be specified using the temporal, luminance, and chrominance QoS dimensions. As superblocks are independently coded,
the required quality of one superblock can be specified independently of the required quality of other superblocks. Hence
a receiver could subscribe to any subset of superblocks of the video image, each with completely different video qualities.
As described, in the Adaptation Spaces approach the resource requirement of each subscription alternative must be
known. While the approach allows any kind of resource to be taken into account, we will limit this study to consider
network bandwidth and CPU utilization as the dimensions of the resource space. Determining resource requirements for a
given quality level of a service is in general a hard problem, and many solution approaches exist but none that solves the
problem in general in a scalable manner. The general solution to this problem, however, is not the focus of this paper.
To estimate the resource requirements of subscription alternatives for the experiments described in this paper we applied
a combined approach of measurements and estimates. We measured the resource consumption of each possible subscription
involving all superblocks, where all superblocks have equal quality dimensions (in total 96 different possible subscriptions).
For subscriptions involving several superblocks with different quality dimensions we estimate the resource requirements as
the sum of resource requirements of each involved superblock assuming that each superblock contributes an equal amount
to the measured resource consumption. This seems like a feasible approach since video data of different superblocks are
encoded and encapsulated into notifications independently of each other. Hence it appears reasonable to assume that the
resource requirement of subscribing to multiple superblocks can be estimated as the sum of the resource requirements
of the subscriptions for each superblock. This assumption is supported by experiments.6 Of course, the validity of the
resource estimates is limited to the specific video content and CPUs used during the experiment. This is a general problem
in QoS management. In order to make the observed values also valid for other CPU types and classes of video content
scale factors are often used. Our experiments, however, do not attempt to validate such an approach.
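The estimation approach described above can be sketched as follows, assuming a measured table of whole-frame resource consumption at uniform quality (the numbers and the two quality dimensions used here are invented for illustration):

```python
# Sketch of the per-superblock resource estimate: the cost of a mixed-quality
# subscription is approximated as the sum of per-superblock shares of the
# measured uniform-quality subscriptions.
N_SUPERBLOCKS = 6  # 3 rows x 2 columns, as in the paper's configuration

# Measured bandwidth (kbps) for a whole frame at uniform (tl, qy) quality.
measured_bw = {(0, 0): 60, (0, 3): 240, (3, 0): 300, (3, 3): 1200}

def estimate_bw(per_block_quality):
    """per_block_quality: one (tl, qy) tuple per subscribed superblock."""
    # Each superblock is assumed to contribute an equal share of the
    # measured whole-frame consumption at its quality level.
    return sum(measured_bw[q] / N_SUPERBLOCKS for q in per_block_quality)

# Two blocks at full quality, one at base quality, three not subscribed:
print(estimate_bw([(3, 3), (3, 3), (0, 0)]))  # 1200/6 + 1200/6 + 60/6 = 410.0
```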
3.2.2. Defining the Utility Function
When defining a utility function for the demonstration of how our video coding and encapsulation scheme support receiverdriven adaptive video streaming, our goal was to take all video quality dimensions into account. Hence utility should
address both the temporal, luminance and chrominance quality dimensions, as well as receiver preferences for region of
interest. The latter feature of utility might be used by receivers to express preferences for regions they do not want to
receive, or receive in lesser (or higher) quality than other regions. This approach makes the resulting utility function
somewhat different in form from a function defined as a plain weighted sum of dimensional utilities.
In accordance with the above goal, we define in equation 1 overall utility as the sum of per block utility, where N is the
number of superblocks, t is the temporal quality layer, y is the luminance quality layer, and c is the chrominance quality
layer. The utility contribution of superblock i is defined in equation 2, where Y_i'(t, y) denotes the utility contribution of the
luminance part of the signal and C_i'(t, c) denotes the contribution of the chrominance part. These are dependent on the
temporal layer t since the temporal dimension determines the amount of luminance or chrominance data received.
U(t, y, c) = \sum_{i=1}^{N} U_i(t, y, c)    (1)

U_i(t, y, c) = Y_i'(t, y) + C_i'(t, c)    (2)
Equations 3 and 4 model per block utility related to preference of region of interest. As indicated by the presence of the
two equations, region of interest preference can be specified independently in the luminance and chrominance dimensions.
The symbol sgn denotes the sign function such that sgn (x) = 1 if x > 0, −1 if x < 0, and 0 if x = 0.
Y_i'(t, y) = Y_i(y) + sgn(Y_i(y)) T_i(t)    (3)

C_i'(t, c) = C_i(c) + sgn(C_i(c)) T_i(t)    (4)
In order to explain equations 3 and 4 we first need to look at equations 5, 6 and 7 below. These define per block weighted
utility for the temporal, luminance and chrominance QoS dimensions respectively. The symbols K_t, K_y and K_c denote
dimensional utility functions, while the symbols W_t, W_y and W_c denote receiver preference weights assigned to each QoS
dimension. These functions and weights apply to all blocks in an image. The weights may take values in the domain [0, 1]
such that \sum_{i \in \{t, y, c\}} W_i = 1. Additionally, a user can specify per block preference weights for the temporal, luminance and
chrominance dimensions, denoted by the symbols W_i^t, W_i^y, W_i^c in the equations below, each with value domain [0, 1]. W_i^y and
W_i^c may also take the special value −1 to indicate that the superblock i is not wanted by the receiver for the corresponding
dimension. The idea is that subscriptions that include a superblock that is not wanted by the receiver will pay a penalty
of negative utility for the corresponding dimension, that is, the per block weighted dimensional utilities Y_i and C_i may
take negative values. This penalty is also reflected in equations 3 and 4. Because of the use of the sign function in these
equations, the utility functions they define may take negative values. For any subscription which includes a superblock
not wanted by the receiver, there exists at least one subscription with higher utility—the same subscription, but where
superblocks not wanted are not included.
T_i(t) = W_t W_i^t K_t(t)    (5)

Y_i(y) = W_y W_i^y K_y(y)    (6)

C_i(c) = W_c W_i^c K_c(c)    (7)
Note that in this paper the dimensional utility functions K_t, K_y and K_c are defined as a set of coefficient values, each specifying the utility value for a quality layer of a QoS-dimension. This can conveniently be represented as a one-dimensional
array for each dimensional utility function. For examples we refer to later sections of this paper.
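Equations 1 through 7 can be evaluated directly once the dimensional utility functions are given as coefficient arrays. The coefficients and weights below are illustrative, not taken from the paper's experiments:

```python
from math import copysign

# Illustrative dimensional utility functions, one coefficient per quality
# layer (base layer + 3 enhancement layers), and dimensional weights.
K_t = [0.5, 0.7, 0.8, 0.9]
K_y = [0.4, 0.6, 0.8, 0.9]
K_c = [0.2, 0.4, 0.6, 0.7]
W_t, W_y, W_c = 0.7, 0.2, 0.1  # receiver preference weights, sum to 1

def sgn(x):
    return 0 if x == 0 else int(copysign(1, x))

def block_utility(t, y, c, w_it, w_iy, w_ic):
    T_i = W_t * w_it * K_t[t]           # eq. (5)
    Y_i = W_y * w_iy * K_y[y]           # eq. (6)
    C_i = W_c * w_ic * K_c[c]           # eq. (7)
    Y_prime = Y_i + sgn(Y_i) * T_i      # eq. (3)
    C_prime = C_i + sgn(C_i) * T_i      # eq. (4)
    return Y_prime + C_prime            # eq. (2)

# Eq. (1): overall utility is the sum over all subscribed superblocks.
# Second block has w_ic = -1: its chrominance is unwanted and pays a penalty.
blocks = [(3, 3, 3, 1.0, 1.0, 1.0), (3, 3, 3, 1.0, 1.0, -1)]
U = sum(block_utility(*b) for b in blocks)
print(round(U, 3))  # 1.62
```

The penalty is visible in the numbers: the second block contributes only 0.11 instead of 1.51, so dropping its chrominance subscription would raise overall utility, exactly as argued above.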
3.2.3. Limiting QoS Variation Across Regions
Since a receiver may assign different preference weights to the different regions (superblocks) in a video image, the video
quality across regions may vary widely. While this may be suitable for some application types, such as parallel processing
of video content, it may be less suitable for others. In order to let receivers limit the variability of QoS across regions,
we introduce a mechanism that constrains quality deviation across regions. This mechanism allows a receiver to specify a
quality deviation value D_d^{max} for each quality dimension d. D_d^{max} is a normalized measure for maximum allowed variation
of quality in dimension d across regions, such that D_d^{max} = 0 means that no variation of quality in dimension d is allowed
between regions, while D_d^{max} = 1 means there are no constraints on the quality variation. Otherwise, the allowed variation
increases as the value of D_d^{max} increases from 0 to 1.
For a given subscription alternative and quality dimension d, the quality deviation D_d is defined as the normalized sum
of vector distances, D_{d,i}, between quality layers over all regions i, i = 1, . . . , N. In equations 8 and 9, l_i^d is the quality
layer subscribed to for quality dimension d in region i, n_d is the total number of quality layers in dimension d, and N is the
number of superblocks.

D_d = \frac{1}{(n_d - 1) N} \sum_{i=1}^{N} D_{d,i}    (8)

D_{d,i} = \sqrt{ \frac{1}{N^2} \sum_{j=1}^{N} \left( l_i^d - l_j^d \right)^2 }    (9)
The use of quality deviation constraints effectively limits the search space of subscription alternatives, as all subscriptions
not satisfying the specified maximum quality deviation are excluded as candidate configurations.
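A deviation check of this kind might be sketched as follows. The normalization used here is one reasonable choice and not necessarily identical to the paper's; it is meant only to show how a D_max threshold prunes candidate subscriptions:

```python
from math import sqrt

# Illustrative quality-deviation check for one quality dimension: compute a
# normalized deviation of the per-region quality layers and reject
# subscription alternatives that exceed the receiver's D_max.
def deviation(layers, n_layers):
    """layers: quality layer per region; n_layers: layers in this dimension."""
    N = len(layers)
    # Per-region distance to all other regions' layers, normalized by N.
    per_region = [sqrt(sum((li - lj) ** 2 for lj in layers) / N ** 2)
                  for li in layers]
    # Normalize by the layer range and the number of regions.
    return sum(per_region) / ((n_layers - 1) * N)

uniform = [2, 2, 2, 2, 2, 2]  # same layer in every region
mixed   = [3, 0, 3, 0, 3, 0]  # alternating full/base quality
print(deviation(uniform, 4))         # 0.0 -- no variation across regions
print(deviation(mixed, 4) <= 0.25)   # rejected if D_max for this dimension is 0.25
```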
4. EMPIRICAL RESULTS
The purpose of this section is to determine experimentally whether video streaming over content-based networking provides
a firm foundation for realizing fine grained and multi-dimensional adaptation. The fine grained and independent selectivity
should give different receivers the ability to customize the video stream according to their preferences, while the fine
grained scalability, in terms of bit rates and decoding requirements, should allow each video receiver to find a configuration
that closely matches the resource availability. Combined, this should allow receiver-driven adaptation over a wide range of
resource availabilities with fine granularity.

Figure 3. Results for no regional variation in quality (Action): (a) combined resource utilization, (b) utility, (c) bandwidth adaptation, (d) CPU adaptation.

Figure 4. Results for no regional variation in quality (Detail): (a) combined resource utilization, (b) utility, (c) bandwidth adaptation, (d) CPU adaptation.
In the following three experiments, we test to what extent the adaptation system can select a configuration that utilizes the
available resources and maximizes the utility for the receiver, given different receiver preferences and their resource
constraints. The experiments will also validate whether the proposed adaptation scheme has sufficient expressiveness for
capturing and handling different receiver preferences. It should be noted that the proposed scheme is not meant to be
exposed to human users directly. Rather, we assume that it is possible to find a more appropriate end user model and
that a mapping to our system level model can be derived.

Table 1. Dimensional weights

             Action              Detail              Parallel
             W_t   W_y   W_c     W_t   W_y   W_c     W_t   W_y   W_c
Weights      0.7   0.2   0.1     0.1   0.6   0.3     0.7   0.2   0.1

Table 2. Dimensional utility functions

             Action              Detail              Parallel
Layers       K_t   K_y   K_c     K_t   K_y   K_c     K_t   K_y   K_c
+3           0.9   0.7   0.4     0.4   0.9   0.7     0.9   0.7   0.4
+2           0.8   0.6   0.3     0.3   0.8   0.6     0.8   0.6   0.3
+1           0.7   0.5   0.2     0.2   0.7   0.5     0.7   0.5   0.2
base         0.6   0.4   0.1     0.1   0.6   0.4     0.6   0.4   0.1
null         0.0   0.0   -       0.0   0.0   -       0.0   0.0   -
It should also be noted that these experiments only consider how different receivers would choose an appropriate configuration, given their resource constraints and preferences.
The regulation part of adaptation, which deals with responsiveness, smoothness, and other issues related to when a receiver
should transition from one configuration to another, is considered a separate issue and not addressed any further here.
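As an illustration of how the dimensional weights (Table 1) and per-layer utility values (Table 2) interact, the sketch below scores a configuration for the Action profile. The weighted-sum aggregation and the dictionary layout are our assumptions; the actual utility model is defined by the adaptation scheme itself (cf. Sect. 3.2).

```python
# Dimensional weights (Table 1) and per-layer utility values (Table 2)
# for the Action profile.  Treating the "null" layer as utility 0.0 in
# every dimension is our assumption.
WEIGHTS = {"t": 0.7, "y": 0.2, "c": 0.1}
K = {
    "t": {"null": 0.0, "base": 0.6, "+1": 0.7, "+2": 0.8, "+3": 0.9},
    "y": {"null": 0.0, "base": 0.4, "+1": 0.5, "+2": 0.6, "+3": 0.7},
    "c": {"null": 0.0, "base": 0.1, "+1": 0.2, "+2": 0.3, "+3": 0.4},
}

def utility(config):
    """Weighted-sum score of a configuration such as {'t': '+3', 'y': 'base', 'c': 'null'}."""
    return sum(WEIGHTS[d] * K[d][layer] for d, layer in config.items())
```

Under these values, Action scores a configuration that spends a layer on temporal quality higher than one spending it on luminance, which is exactly the preference the weights are meant to express.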
4.1. Handling Different Preferences
In this subsection we introduce two receivers with differing interests. These two different and conflicting sets of interests
are used to illustrate how the adaptation system configures the underlying video receiving software and to what extent
the system is able to maximize a given receiver's preferences when available resources are constrained. We name the two
receivers Action and Detail as mnemonics for their respective sets of preferences.
By inspecting the dimensional weights in Table 1 we see that Action prefers temporal resolution, as its weight for
the temporal dimension is high relative to the other dimensions. Detail prefers high quality luminance data instead. We
should expect these differences in preference to result in different adaptations of the video stream. The full preference
specifications for the two receivers are given in Table 1 and 2 (the preference specification for Parallel will be used in
the last experiment, cf. Sect. 4.3). In the current experiment, we allow no regional variation in quality by specifying
D^max_t = D^max_y = D^max_c = 0 (cf. Sect. 3.2.3).
We simulated how the adaptation system would configure the underlying mechanism given a set of preferences and
resource constraints. By not considering the regional dimension we arrive at 96 configuration possibilities that receive
(a) combined resource utilization
(b) utility
(c) combined resource utilization
(d) utility
Figure 5. Results from full regional variations in luminance and chrominance quality: (a)(b) Action and (c)(d) Detail
video data. The simulation decreases the resource availability for both dimensions in equal steps until no configuration
alternatives are left. The CPU availability started at 100 % and was reduced by 1 % in each step. The bandwidth availability
started at 1400 kbps, which was enough to receive the full stream, and was then reduced in 5 kbps steps. The two
dimensions were reduced independently, resulting in 28000 different resource availability scenarios. For each scenario,
all configurations with resource requirements less than or equal to the available resources were considered, and only the
configuration with the highest utility survived. In cases where the utility was equal, the configuration with the lowest
resource requirement was selected.
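The selection rule just described can be sketched as follows. This is a minimal illustration; the tuple representation and the use of a summed resource requirement as the tie-breaker are our assumptions.

```python
def select_configuration(configs, avail_cpu, avail_bw):
    """Pick the feasible configuration with the highest utility.

    configs: iterable of (utility, cpu_req, bw_req) tuples.
    Among configurations of equal utility, the one with the lowest
    combined resource requirement wins, mirroring the rule in the text.
    """
    feasible = [c for c in configs
                if c[1] <= avail_cpu and c[2] <= avail_bw]
    if not feasible:
        return None  # resource availability too constrained
    # Highest utility first; among equals, lowest resource requirement.
    return max(feasible, key=lambda c: (c[0], -(c[1] + c[2])))
```

Running this once per resource availability scenario reproduces the sweep described above: each scenario yields exactly one surviving configuration, or none when resources are exhausted.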
The results from the simulations of the Action and Detail preference sets are visualized in Fig. 3 and 4 respectively
(if the images are unclear in the printed copy, please refer to the electronic version). Fig. 3(a) and 4(a) show the resource
utilization for each of the resource availability scenarios in each preference set. The x-axis shows the various possibilities
for CPU availability, while the y-axis shows the bandwidth availability. The intensity (color) represents to what degree the
surviving configuration exploits the resource availability. The resource utilization is a normalized metric composed of both
CPU and bandwidth utilization:

    R = sqrt( (r_c / a_c)^2 + (r_b / a_b)^2 ) / sqrt(2)        (10)
In the above equation, rc and rb denote the required CPU and bandwidth for a given configuration, while ac and ab denote
the available CPU and bandwidth given by the resource scenario. I.e., CPU and bandwidth requirements are normalized
with respect to the available resources and combined using a vector distance calculation. If the normalized CPU and
bandwidth utilization are both one, R is also one. Similarly, if both are zero, R is zero, and if both are 1/2, R is 1/2.
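Equation 10 translates directly into code; a minimal sketch (the function name is ours):

```python
import math

def resource_utilization(r_c, r_b, a_c, a_b):
    """Combined resource utilization R of equation 10.

    r_c, r_b : required CPU and bandwidth of a configuration.
    a_c, a_b : available CPU and bandwidth in the scenario.
    """
    # Normalize each requirement by its availability and combine the two
    # as a vector distance, scaled so that full use of both gives R = 1.
    return math.sqrt((r_c / a_c) ** 2 + (r_b / a_b) ** 2) / math.sqrt(2)
```

The properties stated in the text follow immediately: R = 1 when both normalized utilizations are one, R = 0 when both are zero, and R = 1/2 when both are 1/2.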
In a region of the resource space where the resource availability is reasonably balanced between the dimensions, the
utilization is high. As seen by comparing Fig. 3(a) and 4(a), the shape of these regions is somewhat dependent on the
receiver's preferences. Note that when resource availability is low in one dimension, it is not possible to exploit resource
availability in the other dimension. This can be seen along the bottom edge and left edge of both figures.
In a similar manner Fig. 3(b) and 4(b) present the resulting utility for a given resource scenario. The utility is normalized so that a value of 1 represents full utility. As expected, the utility is highest when both CPU and bandwidth are
unconstrained (upper right corner of the plot) since the receiver can receive and decode all dimensions in full quality. As
the resource availability falls towards the bottom edge and the left edge, the configuration fulfills the receiver's
preference to a lesser and lesser degree, until the resource availability is too constrained and no valid configuration exists.
Fig. 3(c) and 4(c) present different configurations, chosen by the adaptation system, to maximize utility while conforming to resource constraints in the bandwidth dimension. It is assumed here that CPU is in abundance. For both plots, the
utility is monotonically increasing as more resources are made available. However, different choices are made with respect
to quality dimensions as a result of differing preferences. In the case of Action, quality in the temporal dimension increases
early, while luminance quality data increases later as bandwidth availability increases. For Detail opposite choices are
made; luminance quality is prioritized while the temporal quality is increased when bandwidth allows, but never at the expense of luminance quality. Similarly, Fig. 3(d) and 4(d) present adaptations that maximize utility when the CPU resource
is constrained and bandwidth is plentiful. The trends observed for the bandwidth plots are also found here. However, CPU
usage does not vary significantly for different luminance and chrominance layers, as shown empirically in.6 Consequently,
the choices in figure 3(d) appear less clear.
4.2. Increasing QoS Variations Across Regions
We now extend the previous experiment by introducing selectivity in the regional dimension. The quality layers in each
superblock are configured individually, resulting in 10^12 potential configurations.
Figure 6. Parallel processing results: (a) combined resource utilization, (b) utility, (c) bandwidth adaptation, (d) CPU adaptation.
In this experiment we consider the configurations that arise when luminance and chrominance quality is allowed to
vary across regions, D^max_y = D^max_c = 1, while D^max_t = 0 as in the previous experiment. By constraining the temporal
layer, so that all regions have the same frame rate, the number of valid configurations is reduced to just below 10^9.
Table 3. Regional weights

Quality      Action           Detail           Parallel
W_i^y        0.13  0.14       0.13  0.14        0.5   0.5
             0.19  0.21       0.19  0.21       -1    -1
             0.15  0.18       0.15  0.18       -1    -1
W_i^c        0.13  0.14       0.13  0.14       -1    -1
             0.19  0.21       0.19  0.21       -1    -1
             0.15  0.18       0.15  0.18       -1    -1

The receiver preferences for Action and Detail are also used in this experiment, but their preference profiles are extended
with regional weights (cf. Table 3). In this experiment the regional weights are chosen so that it is beneficial, utility wise,
to prioritize the center regions for both receivers. Allowing quality to vary across regions enables a more fine grained
adaptation. The set of possible configurations is increased by a factor 10^7 compared to the previous experiment. The
results from this simulation are presented in Fig. 5.
4.3. Region of Interest and Parallel Processing
In this experiment, we consider a case in which the receivers are nodes in a parallel processing video analysis system.
Each node will perform some analysis task on a fraction of the full frame in order to partition the computational load. If
all nodes were to receive and decode a full frame only to analyze a fraction of it, the solution would not scale.20, 21 The
partitioning of load by distributing the regions amongst the available nodes is made possible by allowing negative weights
in the utility function (cf. Sect. 3.2.2). If the CPU requirements for the analysis tasks are variable, it would be beneficial
to adapt the requirements for the video decoding accordingly. Similarly, if the analysis task requires the communication of
a large amount of results, an ability to also reduce the bandwidth requirements for receiving video data gracefully would
also be desirable.
The parallel processing receivers are essentially the same, so we only need to consider one of them, dubbed Parallel.
Parallel uses the same preference set as Action. However, Parallel uses the regional weights to express which regions to
receive (cf. Table 3). By setting some of the regional weights to a negative value the adaptation system would incur a
penalty, utility wise, by including them. Here, only the luminance quality is allowed to vary between the regions, because
no chrominance data is received and D^max_t = 0 as in the preceding experiments. The number of possible configurations is
in this case 97 if we count a configuration where no data is received.
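The effect of the negative weights can be sketched as follows, assuming a frame divided into six superblocks of which Parallel wants the first two. The weight values follow Table 3; the linear scoring rule and the function name are our assumptions.

```python
def regional_score(weights, received):
    """Sum of regional weights over the superblocks actually received.

    weights  : per-superblock weight (negative for unwanted regions).
    received : per-superblock 0/1 flag for whether it is subscribed to.
    """
    return sum(w * r for w, r in zip(weights, received))

# Parallel's luminance weights: positive for its own regions, a penalty
# of -1 for every region handled by another node (cf. Table 3).
weights = [0.5, 0.5, -1.0, -1.0, -1.0, -1.0]
```

Receiving exactly SB1 and SB2 scores 1.0; any configuration that also includes a foreign superblock scores strictly lower, so the adaptation system never selects it.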
In Fig. 6, the results from the simulation of the parallel processing scenario are plotted. Note that the maximum
resource usage in these plots is lower than in the other results presented in this section, because only a fraction of the full
frame is received and decoded. It is assumed that most of the CPU is needed to perform the analysis task. As seen in figure
6(a), there are several possibilities for utilizing available resources to a high degree even though only two regions and two
quality dimensions are considered.
Fig. 6(c) and 6(d) present the choices made by the adaptation system as bandwidth and CPU availability are increased.
The bandwidth plot is similar to figure 3(c). Here, the two regions are stepped up in luminance quality in turn. Fig. 6(d)
shows that both regions receive the best luminance quality early. Again, this is because no significant savings can be made
in the CPU dimension by selecting lower luminance quality. As seen in Fig. 6(c) and 6(d) the utility is rapidly increasing,
and remains high as long as both regions are receiving luminance data.
5. RELATED WORK
A pioneering work for handling both the multi-receiver and the heterogeneity challenge is RLM,4 receiver-driven layered
multicast. RLM combines layered video coding techniques with transport over several multicast channels and supports a
combination of spatial and temporal scalability. The mechanism provided for adaptation is to join/leave multicast groups.
Clients with low bandwidth connections may register interest in the base layer only, while other clients may subscribe to a
number of additional multicast addresses, as bandwidth permits. Regarding user preferences, a layering policy determines
the dimension to be refined for each additional layer. This layering policy is fixed at the sender side and determines how the
quality of the video is refined when a client subscribes to an additional multicast address. As an example, the first additional
layer may improve the spatial quality, while the second improves the frame rate. Although an interesting approach, some
receivers may prefer temporal resolution over spatial quality and vice versa. Such conflicts cannot be resolved, since the
layering policy is fixed at the sender side. As the number of scalable video quality dimensions increases a large number of
multicast addresses is needed and the specification of layering policies becomes even more problematic.
In order to bridge the heterogeneity gap created by differences in resource availability, hardware capabilities, and software incompatibilities, media gateway systems have been proposed for streaming.22 These systems are overlay networks—
the media gateways are internal network nodes, while senders and receivers are at the edges. Gateways receive media
streams from upstream nodes, before forwarding the processed and potentially transformed streams to downstream gateways and receivers. The mechanism provided for adaptation is any kind of video filtering and transformation operation,
such as temporal scaling, signal-to-noise-ratio scaling, or spatial partitioning as a preparation step for parallel processing.
The cost associated with this flexibility is increased network and processing resource consumption and additional delay.
Although several receivers may share a single gateway, they may share interest in only some part of the video signal.
Hence, it seems difficult to handle such cases efficiently, processing and delivery wise.
In23, 24 a framework for real-time quality-adaptive media streaming is presented. The goal is to allow video data to
be encoded once and streamed anywhere, by adapting to bandwidth availability. In the scheme, called priority-progress
streaming, video data are transformed into a scalable representation. Spatial scalability is realized by transcoding DCT
coefficients hierarchically to a set of levels, while frame dropping is used for temporal scalability. This allows video data
to be broken up into small chunks which can then be assigned different priorities based on utility functions. The timeline is
divided into distinct time intervals, so-called adaptation windows. The most important chunks are sent first and whatever
remains in the previous adaptation window is dropped when the system transits to the next adaptation window. One-to-many streaming requires an overlay multicast structure,24 where each edge is realized by means of the priority-progress
unicast streaming approach. TCP is used for both unicast and multicast transport. The mechanism provided for adaptation
is dropping, by the sender or by a node within the multicast tree. The video quality may therefore differ throughout the
distribution tree. Since the priorities are assigned at the root of the tree, only a single policy may be specified for a single
session. Similar to RLM, the system seems unable to handle video receivers with different preferences regarding the
relative importance of the different video quality dimensions.
In,3 a survey of solutions for adaptive multicast over the Internet is presented. Similar to some of the listed approaches,
ours takes advantage of scalable coding. Contrary to these listed approaches, our approach allows each video receiver to
independently and with fine granularity express interest in different parts of the video signal. Additionally, our system
supports more scalable video quality dimensions than what is usually the case, a complicating factor from an adaptation modelling point of view. In particular the incorporation of region of interest scalability necessitated a sophisticated
adaptation scheme, where each region within a frame is treated separately utility wise.
6. CONCLUSION AND FURTHER WORK
We have presented an approach for adaptive multi-receiver video streaming. The approach builds on our earlier developed
video coding scheme which uses content-based networking for efficient multi-receiver video streaming. The coding scheme
allows each receiver to independently and with fine granularity customize the video stream with respect to regions of
interest, the signal-to-noise ratio for the luminance and the chrominance planes, and the temporal resolution.
The contribution is a novel adaptation scheme which combines such video streaming with state-of-the-art techniques
from the field of adaptation to provide receiver-driven multi-dimensional adaptive video streaming. Each video receiver
may independently and unilaterally adapt the video quality along multiple independent quality dimensions, to match processing and networking resources currently available and according to its own preferences. Results from experiments demonstrate (1) adaptation to variations in available bandwidth and CPU resources, roughly over two orders of magnitude, (2)
that fine grained adaptation is feasible given radically different user preferences, and (3) that for any reasonably balanced
combination of bandwidth and CPU availability, both resource utilization and user utility can be kept at a high level.
Currently we are looking more into the regulation part of the adaptation problem, addressing issues related to when
and how a receiver should transit from one configuration to another. The ability to receive different regions in different
qualities may allow virtual pan and zoom in very high quality video streams based on visual focus—high quality for the
regions looked at and less quality in the periphery. Responsiveness and smoothness are critical aspects in such scenarios.
REFERENCES
1. S. D. Servetto, R. Puri, J.-P. Wagner, P. Scholtes, and M. Vetterli, “Video Multicast in (Large) Local Area Networks,”
in Proceedings of IEEE INFOCOM, 2, pp. 733–742, June 2002.
2. A. Ganjam and H. Zhang, “Internet Multicast Video Delivery,” Proceedings of the IEEE 93, pp. 159–170, January 2005.
3. J. Liu, B. Li, and Y.-Q. Zhang, “Adaptive Video Multicast over the Internet,” IEEE Multimedia 10, pp. 22–33,
January/March 2003.
4. S. McCanne, M. Vetterli, and V. Jacobson, “Low-Complexity Video Coding for Receiver-Driven Layered Multicast,”
IEEE Journal of Selected Areas in Communications 15, pp. 983–1001, August 1997.
5. V. S. W. Eide, F. Eliassen, and J. A. Michaelsen, “Exploiting Content-Based Networking for Video Streaming,” in
Proceedings of the ACM Multimedia, (Demo), ACM MM’04, NY, USA, pp. 164–165, October 2004.
6. V. S. W. Eide, F. Eliassen, and J. A. Michaelsen, “Exploiting Content-Based Networking for Fine Granularity MultiReceiver Video Streaming,” in Proceedings of MMCN’05, SPIE, USA, 5680, pp. 155–166, January 2005.
7. F. Jensen, “Adaptive Video Streaming over an Event Notification Service,” September 2005. Master Thesis, Department of Informatics, University of Oslo, Norway.
8. A. Carzaniga, M. J. Rutherford, and A. L. Wolf, “A Routing Scheme for Content-Based Networking,” in Proceedings
of IEEE INFOCOM, 2, pp. 918–928, (Hong Kong, China), March 2004.
9. G. Eisenhauer, F. E. Bustamante, and K. Schwan, “Event Services in High Performance Systems,” Cluster Computing 4(3), pp. 243–252, 2001.
10. P. T. Eugster, P. A. Felber, R. Guerraoui, and A.-M. Kermarrec, “The Many Faces of Publish/Subscribe,” ACM
Computing Surveys (CSUR) 35, pp. 114–131, June 2003.
11. B. Segall, D. Arnold, J. Boot, M. Henderson, and T. Phelps, “Content Based Routing with Elvin4,” in Proceedings of
AUUG2K, Canberra, Australia, June 2000.
12. L. Opyrchal, M. Astley, J. S. Auerbach, G. Banavar, R. E. Strom, and D. C. Sturman, “Exploiting IP Multicast in
Content-Based Publish-Subscribe Systems," in Middleware, LNCS 1795, pp. 185–207, Springer, 2000.
13. P. R. Pietzuch and J. M. Bacon, “Hermes: A Distributed Event-Based Middleware Architecture,” in Proceedings of
DEBS’02, Vienna, Austria, pp. 611–618, IEEE, July 2002.
14. A. Carzaniga, D. S. Rosenblum, and A. L. Wolf, “Design and Evaluation of a Wide-Area Event Notification Service,”
ACM Transactions on Computer Systems 19, pp. 332–383, August 2001.
15. V. S. W. Eide, F. Eliassen, O. Lysne, and O.-C. Granmo, “Extending Content-based Publish/Subscribe Systems with
Multicast Support,” Tech. Rep. 2003-03, Simula Research Laboratory, July 2003.
16. W. Li, “Overview of Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Transactions on Circuits and
Systems for Video Technology 11, pp. 301–317, March 2001.
17. S. Bowers, L. Delcambre, D. Maier, C. Cowan, P. Wagle, D. McNamee, A.-F. L. Meur, and H. Hinton, “Applying
adaptation spaces to support quality of service and survivability,” in DISCEX ’00, 2, pp. 271–283, Jan. 2000.
18. D. Gotz and K. Mayer-Patel, “A General Framework for Multidimensional Adaptation,” in Proceedings of the ACM
Multimedia, pp. 612–619, ACM, 2004.
19. S.-F. Chang and A. Vetro, “Video Adaptation: Concepts, Technologies, and Open Issues,” in Proceedings of the IEEE,
93, pp. 148–158, January 2005.
20. V. S. W. Eide, F. Eliassen, O.-C. Granmo, and O. Lysne, “Supporting Timeliness and Accuracy in Distributed Realtime Content-based Video Analysis,” in Proceedings of the ACM Multimedia, USA, pp. 21–32, November 2003.
21. V. S. W. Eide, O. C. Granmo, F. Eliassen, and J. A. Michaelsen, “Real-time Video Content Analysis: QoS-Aware
Application Composition and Parallel Processing,” ACM Transactions on Multimedia Computing, Communications,
and Applications, (TOMCCAP) 2, pp. 149–172, May 2006.
22. W. T. Ooi and R. van Renesse, "Distributing Media Transformation Over Multiple Media Gateways," in Proceedings of
ACM Multimedia, September 30 - October 05, Canada, pp. 159–168, 2001.
23. J. Huang, C. Krasic, J. Walpole, and W. Feng, “Adaptive Live Video Streaming by Priority Drop,” in IEEE Conference
on Advanced Video and Signal Based Surveillance (AVSS 2003), USA, IEEE Computer Society, 2003.
24. C. Krasic, A Framework for Quality-Adaptive Media Streaming: Encode Once — Stream Anywhere. PhD thesis, OGI
School of Science & Engineering at Oregon Health & Science University, February 2004.