A Proxy Smoothing Service for
Variable-Bit-Rate Streaming Video
Jennifer Rexford
AT&T Labs - Research
Florham Park NJ
Joint work with Subhabrata Sen, Don Towsley, and Andrea Basso
Outline
• Background and motivation
– Burstiness of compressed video streams
– Smoothing techniques for stored video
• Online smoothing of variable-bit-rate video
– Sliding-window smoothing algorithm
– Performance evaluation on MPEG traces
• Integration of smoothing with prefix caching
– Caching initial frames of popular video streams
– Resource allocation across multiple streams
• Prototype proxy smoothing service
– Software design of proxy service in Windows NT
– MPEG-2 PC-based video streaming testbed
• Conclusions and ongoing work
Video Streaming Applications
• Live, interactive video
– Video teleconferencing, video phones, etc.
– Tight delay constraints to support interactivity
• Stored, non-interactive video
– Movies, distance learning, Web videos, etc.
– Video recorded in advance; loose delay constraints
• Live, non-interactive video
– Course lectures, news, sporting events, conferences
– Video not recorded in advance; loose delay constraints
Network Environment
Challenges of Video Streaming
• High bandwidth requirements of compressed video
– 4-6 megabits/second for high-quality MPEG-2 streams
• Burstiness of frame sizes on several time scales
– MPEG group-of-pictures structure (I, P, B frames)
– Differences in action and detail within/across scenes
• Bandwidth limitations on clients and links
– 10 or 100 Mbps shared local area network
– 27 Mbps cable channel, 1.5 Mbps ADSL
• Lack of end-to-end control of path from source
• Poor delay, throughput, and loss in the Internet
Compressed Video Streams
Approaches to Handling Variability
• Constant-bit-rate encoding of each stream
– Adjust quality of encoding to stay at constant rate
– Quality degradation during scenes with action & detail
• Statistical multiplexing of variable rate streams
– Rely on mixing to reduce the aggregate peak rate
– Limited effectiveness on access links
• Selective discard of packets/frames in stream
– Discard packets/frames during transient congestion
– Noticeable degradation in video quality
• Transcoding or layered encoding to reduce bit rate
– Re-encode the video stream at different quality at proxy
– Quality degradation; hard to transcode at link speeds
Smoothing Stored Video
For prerecorded video streams:
• All video frames stored in advance at server
• Prior knowledge of all frame sizes (f_i, i = 1, 2, ..., n)
• Prior knowledge of client buffer size (b)
→ workahead transmission into client buffer
[Figure: number of bytes; client buffer of b bytes]
Smoothing Constraints
[Figure: transmission schedule with rate changes; x-axis: time (in frames)]
Given frame sizes {f_i} and buffer size b
– Buffer underflow constraint (L_k = f_1 + f_2 + … + f_k)
– Buffer overflow constraint (U_k = min(L_k + b, L_n))
– Find a schedule S_k with L_k <= S_k <= U_k for all k
→ O(n) algorithm minimizes peak and variability
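The constraints above can be sketched in a few lines. This is a minimal illustration, not the O(n) algorithm named on the slide: it builds L_k and U_k, then checks whether any schedule with a given peak rate fits between them by sending as early as the overflow bound allows. The function names and the idea of testing one candidate peak rate are assumptions made for the example.

```python
from itertools import accumulate

def constraints(frames, b):
    """Return (L, U): cumulative underflow and overflow bounds per frame slot."""
    L = list(accumulate(frames))          # L_k = f_1 + ... + f_k
    U = [min(l + b, L[-1]) for l in L]    # U_k = min(L_k + b, L_n)
    return L, U

def feasible_at_peak(frames, b, r):
    """True if some schedule with L_k <= S_k <= U_k sends at most r bytes per slot."""
    L, U = constraints(frames, b)
    sent = 0
    for lo, hi in zip(L, U):
        sent = min(hi, sent + r)   # send as much as the rate and client buffer allow
        if sent < lo:              # client would underflow in this slot
            return False
    return True
```

With frame sizes [2, 2, 10, 2, 2, 10] and b = 8, a peak of 5 bytes per slot is feasible while 4 is not; without a client buffer (b = 0) the peak must match the largest frame.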
Reducing the Peak Rate
Limitations of Smoothing Model
• Assumes prerecorded stored video
→ but… need to support live and prerecorded video
• Assumes smoothing is performed by server
→ but… server is in the domain of another provider
• Assumes end-to-end control of the network
→ but… the Internet is decentralized
• Assumes server knows the client buffer size
→ but… the client may be in a different domain
Online Smoothing
Source or proxy can delay the stream by w time units:
[Figure: stream with delay w; client buffer of b bytes]
Larger window w reduces burstiness, but…
– Larger buffer at the source/proxy
– Larger processing load to compute schedule
– Larger playback delay at the client
Online Smoothing Model
• Arrival of A_i bits to proxy by time i (in frames)
• Smoothing buffer of B bits at proxy
• Smoothing window (playout delay) of w frames
• Playout of D_{i-w} bits by client by time i
• Playout buffer of b bits at client
→ transmission of S_i bits by proxy by time i
Online Smoothing
• Must send enough to avoid underflow at client
→ S_i must be at least D_{i-w}
• Cannot send more than the client can store
→ S_i must be at most D_{i-w} + b
• Cannot send more than the data that has arrived
→ S_i must be at most A_i
• Must send enough to avoid overflow at proxy
→ S_i must be at least A_i - B
max{D_{i-w}, A_i - B} <= S_i <= min{D_{i-w} + b, A_i}
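The combined inequality can be evaluated directly. The sketch below is illustrative only: the function name, the list-based representation of the cumulative vectors A and D, and the convention that the client has played nothing before time w are assumptions for the example.

```python
def send_bounds(A, D, i, w, b, B):
    """Bounds (lo, hi) on S_i, the cumulative bits sent by the proxy by time i.

    A[i]: cumulative bits arrived at the proxy by time i.
    D[j]: cumulative playout amount, read at index j = i - w.
    """
    played = D[i - w] if i >= w else 0   # assumption: nothing played before time w
    lo = max(played, A[i] - B)           # avoid client underflow and proxy overflow
    hi = min(played + b, A[i])           # avoid client overflow; cannot send unarrived data
    if lo > hi:
        raise ValueError("no feasible S_i: buffers too small for this window")
    return lo, hi
```

For example, with A = D = [0, 10, 30, 40, 60], w = 2, b = 25, and B = 30, the proxy must have sent between 10 and 35 bits by time 3.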
Online Smoothing Constraints
Source/proxy has w frames ahead of current time t:
[Figure: number of bytes vs. time (in frames); the region beyond the window is labeled "don't know the future"]
Modified smoothing constraints as more frames arrive...
Smoothing Star Wars
[Figure panels: GOP averages; 2-second window; 30-second window]
• MPEG-1 Star Wars, 12-frame group-of-pictures
• Max frame 23160 bytes, mean frame 1950 bytes
• Client buffer b=512 kbytes
Reducing Computational Complexity
• No need to compute schedule at every time unit
– Limited information from new frame arrivals
– Limited impact on trajectory of the schedule
• Execute online algorithm every a time units
– Perform O(w) work every a time units
– Limit number of rate changes
• Performance implications
– Very small increases in peak and variance of rates
– Setting a = w/2 performs almost as well as a = 1
Parameters in Smoothing Model
• Algorithm parameters
– Window w (in number of frame slots)
– Client buffer size b (in bytes)
– Source/proxy buffer size B (in bytes)
– Computation interval a (in frames)
– Frame-size prediction interval p (in frames)
• Performance metrics
– Peak rate of the smoothed stream
– Coefficient of variation (standard-deviation/mean)
– Effective bandwidth (given buffer and loss rate)
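The first two metrics are simple to compute from a smoothed rate sequence. The sketch below assumes the schedule is given as a list of per-slot transmission rates and uses the population standard deviation; effective bandwidth is left out because it depends on a buffer and loss model that the slide does not spell out.

```python
from statistics import mean, pstdev

def peak_rate(rates):
    """Largest per-slot transmission rate in the smoothed schedule."""
    return max(rates)

def coefficient_of_variation(rates):
    """Standard deviation of the rates divided by their mean."""
    return pstdev(rates) / mean(rates)
```

A perfectly constant schedule has a coefficient of variation of zero, so lower values indicate a smoother stream.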
Peak Rate vs. Window Size
(varying client buffer size for MPEG-1 Wizard of Oz)
• Dramatic decrease in bandwidth variability
• Online algorithm approaches offline scheme
• Ten-second window gives most of the gain
Peak Rate vs. Client Buffer
(varying window size for MPEG-1 Wizard of Oz)
• Significant reductions with a few Mbytes of buffer
• Diminishing returns for larger client buffer sizes
• Window size w should scale with buffer size b
Proxy vs. Client Buffer
(varying prediction under 512-kbyte total buffer & 30-frame window)
• Need buffer at each end for good performance
• Even split of buffer for large p; more at proxy for small p
• Simple prediction schemes are very effective
Prefix Caching to Avoid Start-Up Delay
• Avoid start-up delay for prerecorded streams
– Proxy caches initial part of popular video streams
– Proxy starts satisfying client request more quickly
– Proxy requests remainder of the stream from server
→ smooth over large window without large delay
• Use prefix caching to hide other Internet delays
– TCP connection from browser to server
– TCP connection from player to server
– Dejitter buffer at the client to tolerate jitter
– Retransmission of lost packets
→ apply to “point-and-click” Web video streams
New Questions
• Video streaming protocol
– How to get the proxy in the path?
– How to receive an initial copy of the prefix?
– How to retrieve the remaining frames of the video?
• Smoothing model
– What changes in the smoothing constraints?
– What changes in the basic performance properties?
• Proxy resource allocation
– How much prefix is needed to hide Internet delays?
– How to allocate between caching and smoothing?
– How to allocate resources across multiple streams?
Protocol Issues
• Ensuring that requests go through the proxy
– Configuration of proxy in client browser or player
– Placement of transparent proxy in the path
• Caching of the initial frames of the video
– Server replication of the prefix
– Proxy prefetching of the prefix
– Proxy caching of prefix after first request
• Transparent retrieval of remaining frames
– Range request operation in HTTP 1.1
– Absolute positioning in RTSP
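As an illustration of the HTTP/1.1 option, a proxy that already holds the prefix could ask the server only for the bytes that follow it. The sketch below builds such a request without sending it; the function name and any URL passed to it are hypothetical, and it assumes the prefix length is known in bytes.

```python
import urllib.request

def remainder_request(url, prefix_bytes):
    """Build (but do not send) a request for everything after the cached prefix."""
    req = urllib.request.Request(url)
    # Open-ended byte range: from the first byte not in the prefix to the end.
    req.add_header("Range", f"bytes={prefix_bytes}-")
    return req
```

A server that honors the range answers with status 206 (Partial Content); one that ignores it returns the whole resource, so a proxy would need to handle both cases.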
Changes to Smoothing Model
Separate parameter s for client start-up delay
Prefix cache stores the first w-s frames
Arrival vector Ai includes cached frames
Prefix buffer does not empty after transmission
• Send entire prefix before overflow of bs
• Frame sizes may be known in advance (cached)
Performance Evaluation
• Comparison to original online smoothing model
– Pro: can have large window and small start-up delay
– Pro: performance is virtually indistinguishable
– Con: storing prefix nearly doubles buffer requirement
– Con: may be difficult to smooth at beginning of video
• Allocation of prefix and smoothing buffers
– Small prefix buffer limits size of smoothing window
→ small window w restricts workahead smoothing
– Large prefix buffer limits size of smoothing buffer
→ small bs requires aggressive transmission schedule
Peak Rate vs. Window Size
(varying total proxy buffer size for MPEG-1 Wizard of Oz)
• Convex, cup-shaped curve of peak rate vs. buffer
• Simple binary search for optimal allocation
• Heuristic: pick largest w that does not constrain b
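Because the curve is convex, its minimum can be located by comparing neighboring points rather than scanning every allocation. The sketch below is a generic binary search over an integer-indexed convex function; treating f(w) as "peak rate for window w under a fixed total proxy buffer" is an assumption for the example, and f would come from running the smoothing algorithm.

```python
def argmin_convex(f, lo, hi):
    """Return an integer in [lo, hi] that minimizes a convex function f."""
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) <= f(mid + 1):
            hi = mid        # curve is flat or rising here: minimum is at mid or to the left
        else:
            lo = mid + 1    # curve is still falling: minimum is to the right
    return lo
```

Each step halves the search range, so only about log2(hi - lo) pairs of evaluations of f are needed.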
Peak Rate vs. Prefix Buffer Size
(varying total proxy buffer size for MPEG-1 Wizard of Oz)
Allocating Resources Across Streams
• Performance issues
– Limited buffer (M) and/or bandwidth (B) at proxy
– Collection of V videos with different popularity
– Videos with different sequences of frame sizes
• Optimization problem
– Allocate prefix buffer bp for each video v =1,…, V
– Allocate smoothing buffer bs for each of nv requests
– Obey constraint on buffer (M) or bandwidth (B)
– Minimize the usage of the other resource (M or B)
Simplifying the Problem
• Complex resource allocation problem
– Assign bp, bs, and w for each video v
– Buffer requirement: sum over v of {bp(v) + nv * bs(v)}
– Bandwidth requirement: sum over v of {nv * peak(v)}
• Reduce problem to selecting w for each video
– Select same bs and w across all requests for v
– Select prefix buffer bp as first w-s frames
– Select bs as max smoothing buffer for window w
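The two requirement sums and the prefix rule can be written out as a small sketch. The class and function names are hypothetical, and the per-video values of bs and peak are taken as given inputs rather than computed by the smoothing algorithm.

```python
from dataclasses import dataclass

@dataclass
class VideoPlan:
    n_requests: int      # nv: concurrent requests for this video
    prefix_buffer: int   # bp(v): bytes of cached prefix, shared by all requests
    smooth_buffer: int   # bs(v): bytes of smoothing buffer per request
    peak: float          # peak(v): peak rate of one smoothed stream

def prefix_bytes(frames, w, s):
    """Size of a prefix holding the first w - s frames."""
    return sum(frames[:max(w - s, 0)])

def buffer_requirement(plans):
    """Sum over v of bp(v) + nv * bs(v)."""
    return sum(p.prefix_buffer + p.n_requests * p.smooth_buffer for p in plans)

def bandwidth_requirement(plans):
    """Sum over v of nv * peak(v)."""
    return sum(p.n_requests * p.peak for p in plans)
```

Note that the prefix is counted once per video while the smoothing buffer and peak rate scale with the number of requests.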
Greedy Algorithm
• Further simplifying the problem
– Selecting w determines bp(v), bs(v), and peak(v)
– Consider the nv*peak(v) vs. bp(v)+nv*bs(v) curve
– Curve is piecewise-linear, convex, non-increasing
• Greedy algorithm for buffer constraint M
– Select the video with steepest initial slope
– Assign buffer space to this video for max gain
– Repeat until reaching the buffer constraint M
• Greedy algorithm for bandwidth constraint B
– Repeat until bandwidth usage no longer exceeds constraint B
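The buffer-constrained variant can be sketched as follows. This is a simplified reading of the steps above, with several assumptions: each video's curve is given as a list of (buffer, bandwidth) breakpoints sorted by strictly increasing buffer, allocation moves only from one breakpoint to the next, and the loop stops as soon as the steepest remaining step no longer fits within M.

```python
def greedy_buffer_allocation(curves, M):
    """Pick one breakpoint per video; return (choice, buffer_used, total_bandwidth)."""
    choice = [0] * len(curves)                 # start every video at its smallest buffer
    used = sum(c[0][0] for c in curves)
    if used > M:
        raise ValueError("M is below the minimum buffer requirement")
    while True:
        best = None                            # (slope, video index, extra buffer)
        for v, c in enumerate(curves):
            k = choice[v]
            if k + 1 < len(c):
                extra = c[k + 1][0] - c[k][0]  # additional buffer for the next step
                gain = c[k][1] - c[k + 1][1]   # bandwidth saved by the next step
                slope = gain / extra
                if best is None or slope > best[0]:
                    best = (slope, v, extra)
        if best is None or used + best[2] > M:
            break                              # nothing left, or steepest step does not fit
        used += best[2]
        choice[best[1]] += 1
    bandwidth = sum(c[k][1] for c, k in zip(curves, choice))
    return choice, used, bandwidth
```

Convexity is what makes the greedy choice reasonable: within one video, each further step saves less bandwidth per byte than the one before, so the steepest available step is always the next one along some curve.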
Illustration of Greedy Algorithm
[Figure: bandwidth vs. buffer curves for video 1 and video 2]
Building a Smoothing Proxy
• Performance results
– Memory: a few megabytes of RAM is sufficient
– CPU: 1-2 msec to smooth 30 sec (300 MHz PC)
– Bandwidth: 2-4 Mbps feasible on personal computer
• Solution with off-the-shelf components
– 300 MHz Pentium Pro with 192 megabytes of RAM
– Input and output on 10 megabit/second Ethernet
– Windows NT operating system with WinSock 2.0
Reality Sets In
• Video stream is packetized, not a fluid
– Smoothing constraints must be applied to packets
– Proxy cannot transmit the stream at arbitrary rates
• System does not have support for traffic shaping
– Cannot control the inter-packet spacing at fine scale
– E.g., 2 msec spacing for 15-packet frames (30 fps)
• Interrupt latency, timer jitter, and data copying
– Limited control over timer expiration times
– Latency in processing I/O and timer operations
– Need to avoid extra copying of video frames
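The 2 msec figure follows from dividing one frame time evenly among the packets of a frame. A minimal sketch of that arithmetic, assuming packets are spaced uniformly within each frame slot (the function name is hypothetical):

```python
def inter_packet_spacing_ms(frames_per_second, packets_per_frame):
    """Even spacing between packets, in milliseconds."""
    return 1000.0 / (frames_per_second * packets_per_frame)
```

At 30 frames per second and 15 packets per frame this gives roughly 2.2 msec, well below the tens of milliseconds that ordinary timers can resolve.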
Time-Sharing the Processor
• Reception of incoming packets
– Smooth over more frames by receiving often
– Avoid double-copy from kernel to user space
– Avoid the worst-case scenario of overflow
• Computation of smooth schedule
– Must run often enough to maximize smoothing
– Fortunately, does not need to read or write data
• Transmission of packets according to schedule
– Must run often enough to control packet spacing
– Avoid the bad case of sending a large burst
– Avoid the worst case of client underflow
Key Design Decisions
• Single thread of control
– No operating system control over fine-grain sharing
• High-performance counter for timing operations
– Timers are too inaccurate (tens of milliseconds)
– How often should the counter be checked?
• Overlapped I/O to avoid double copying
– Receive and send directly to/from the user-space buffer
– How many outstanding sends and receives?
• Explicit pacing of packet transmissions
– How often should the send routine be invoked?
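A single-threaded pacing loop driven by a high-resolution counter might look like the sketch below. It is an illustration, not the proxy's implementation: Python's time.perf_counter stands in for the high-performance counter, the send callback is a placeholder for an actual socket send, and the loop busy-waits rather than sleeping.

```python
import time

def paced_send(packets, spacing_s, send):
    """Call send(pkt) for each packet, keeping calls at least spacing_s apart on average."""
    next_deadline = time.perf_counter()
    for pkt in packets:
        # Poll the counter instead of relying on a coarse-grained timer.
        while time.perf_counter() < next_deadline:
            pass
        send(pkt)
        # Advance from the previous deadline, not from "now", so delays do not accumulate.
        next_deadline += spacing_s
```

Busy-waiting consumes the processor, which is one reason the questions above about how often to check the counter and invoke the send routine matter in a real proxy that must also receive packets and compute schedules.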
LiveNet MPEG-2 Testbed
(developed by Andrea Basso, Glenn Cash, and Reha Civanlar)
• MPEG-2 encoder
– MPEG-2 encoder board (MPEGXpress)
– Software to read into buffers and stream into network
• Real-time packetizer
– Parses MPEG-2 stream and divides frames into slices
– Packs slices into Real-time Transport Protocol (RTP) packets
• MPEG-2 decoder
– Software for packet reception and error concealment
– MPEG-2 decoder board (DarimVision)
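The packetizer step can be illustrated with a short sketch. It is not the LiveNet packetizer: it builds only the 12-byte fixed RTP header, omits the MPEG video-specific header that the RTP payload format for MPEG video adds after it, and groups whole slices greedily into payloads. The function names, the grouping policy, and the max_payload parameter are assumptions for the example.

```python
import struct

RTP_VERSION = 2
PT_MPV = 32  # static RTP payload type for MPEG-1/MPEG-2 video elementary streams

def rtp_header(seq, timestamp, ssrc, marker=False, payload_type=PT_MPV):
    """Fixed 12-byte RTP header with no padding, extension, or CSRC entries."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | payload_type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)

def pack_slices(slices, max_payload):
    """Group whole slices into payloads of at most max_payload bytes.

    A slice larger than max_payload gets a payload of its own (not fragmented here).
    """
    payloads, current = [], b""
    for s in slices:
        if current and len(current) + len(s) > max_payload:
            payloads.append(current)
            current = b""
        current += s
    if current:
        payloads.append(current)
    return payloads
```

Keeping slices whole means a lost packet costs only the slices it carried, which is what allows the decoder side to apply error concealment.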
LiveNet Testbed
Conclusions
• Online smoothing model
– Applicable to many non-interactive applications
– Significantly lowers burstiness of compressed video
– Enables high-quality video across access networks
• Prefix caching
– Hides start-up delay for smoothing and other operations
– Effective resource allocation schemes at the proxy
• Practical application
– Transparent to the origin video source/server
– Implementation with commercial off-the-shelf parts
– Integration with MPEG-2 and Real-time Transport Protocol (RTP)
Ongoing Work
• Prototyping the proxy smoothing service
– Completion of implementation of proxy service
– Performance evaluation of parameterized system
• Combining smoothing with other mechanisms
– Discard, transcoding, feedback, and retransmission
– Exploiting prefix cache to hide additional latency
• Measurement of Web-initiated video streaming
– Collection of video packet traces in AT&T WorldNet
– Study of potential for (partial) caching at the proxy