A Proxy Smoothing Service for Variable-Bit-Rate Streaming Video
Jennifer Rexford
AT&T Labs - Research
Florham Park NJ
http://www.research.att.com/~jrex
Joint work with Subhabrata Sen, Don Towsley, and Andrea Basso

Outline
• Background and motivation
  – Burstiness of compressed video streams
  – Smoothing techniques for stored video
• Online smoothing of variable-bit-rate video
  – Sliding-window smoothing algorithm
  – Performance evaluation on MPEG traces
• Integration of smoothing with prefix caching
  – Caching initial frames of popular video streams
  – Resource allocation across multiple streams
• Prototype proxy smoothing service
  – Software design of proxy service in Windows NT
  – MPEG-2 PC-based video streaming testbed
• Conclusions and ongoing work

Video Streaming Applications
• Live, interactive video
  – Video teleconferencing, video phones, etc.
  – Tight delay constraints to support interactivity
• Stored, non-interactive video
  – Movies, distance learning, Web videos, etc.
  – Video recorded in advance; loose delay constraints
• Live, non-interactive video
  – Course lectures, news, sporting events, conferences
  – Video not recorded in advance; loose delay constraints

Network Environment
[Figure only]

Challenges of Video Streaming
• High bandwidth requirements of compressed video
  – 4-6 Megabits/second for high-quality MPEG-2 streams
• Burstiness of frame sizes on several time scales
  – MPEG group-of-pictures structure (I, P, B frames)
  – Differences in action and detail within/across scenes
• Bandwidth limitations on clients and links
  – 10 or 100 Mbps shared local area network
  – 27 Mbps cable channel, 1.5 Mbps ADSL
• Lack of end-to-end control of path from source
• Poor delay, throughput, and loss in the Internet

Compressed Video Streams
[Figure only]

Approaches to Handling Variability
• Constant-bit-rate encoding of each stream
  – Adjust quality of encoding to stay at constant rate
  – Quality degradation during scenes with action & detail
• Statistical multiplexing of variable-rate streams
  – Rely on mixing to reduce the aggregate peak rate
  – Limited effectiveness on access links
• Selective discard of packets/frames in stream
  – Discard packets/frames during transient congestion
  – Noticeable degradation in video quality
• Transcoding or layered encoding to reduce bit rate
  – Re-encode the video stream at different quality at proxy
  – Quality degradation; hard to transcode at link speeds

Smoothing Stored Video
For prerecorded video streams:
• All video frames stored in advance at server
• Prior knowledge of all frame sizes (f_i, i = 1, 2, ..., n)
• Prior knowledge of client buffer size (b)
• Workahead transmission into client buffer
[Figure: server holding frames 1, 2, ..., n sends to a client with a b-byte buffer]

Smoothing Constraints
[Figure: number of bytes vs. time (in frames), showing curves U, S, and L, with the rate changes of S marked]
Given frame sizes {f_i} and buffer size b
  – Buffer underflow constraint (L_k = f_1 + f_2 + … + f_k)
  – Buffer overflow constraint (U_k = min(L_k + b, L_n))
  – Find a schedule S_k between the constraints
O(n) algorithm minimizes peak and variability

Reducing the Peak Rate
[Figure only]

Limitations of Smoothing Model
• Assumes prerecorded stored video
  but… need to support live and prerecorded video
• Assumes smoothing is performed by server
  but… server is in the domain of another provider
• Assumes end-to-end control of the network
  but… the Internet is decentralized
• Assumes server knows the client buffer size
  but… the client may be in a different domain

Online Smoothing
Source or proxy can delay the stream by w time units:
[Figure: streaming video enters the source/proxy, which sends the stream with delay w to a client with a b-byte buffer]
Larger window w reduces burstiness, but…
  – Larger buffer at the source/proxy
  – Larger processing load to compute schedule
  – Larger playback delay at the client

Online Smoothing Model
[Figure: arrivals A_i enter the proxy buffer B; the proxy transmits S_i to the client buffer b; the client plays out D_{i-w}]
• Arrival of A_i bits to proxy by time i (in frames)
• Smoothing buffer of B bits at proxy
• Smoothing window (playout delay) of w frames
• Playout of D_{i-w} bits by client by time i
• Playout buffer of b bits at client
• Transmission of S_i bits by proxy by time i

Online Smoothing
• Must send enough to avoid underflow at client
  S_i must be at least D_{i-w}
• Cannot send more than the client can store
  S_i must be at most D_{i-w} + b
• Cannot send more than the data that has arrived
  S_i must be at most A_i
• Must send enough to avoid overflow at proxy
  S_i must be at least A_i - B
max{D_{i-w}, A_i - B} <= S_i <= min{D_{i-w} + b, A_i}

Online Smoothing Constraints
Source/proxy has w frames ahead of current time t:
[Figure: number of bytes vs. time (in frames); curves U and L are known from t to t+w-1 and marked "?" beyond, because the future is not known]
Modified smoothing constraints as more frames arrive...

Smoothing Star Wars
[Figure panels: GOP averages; 2-second window; 30-second window]
• MPEG-1 Star Wars, 12-frame group-of-pictures
• Max frame 23160 bytes, mean frame 1950 bytes
• Client buffer b = 512 kbytes

Reducing Computational Complexity
• No need to compute schedule at every time unit
  – Limited information from new frame arrivals
  – Limited impact on trajectory of the schedule
• Execute online algorithm every a time units
  – Perform O(w) work every a time units
  – Limit number of rate changes
• Performance implications
  – Very small increases in peak and variance of rates
  – Setting a = w/2 performs almost as well as a = 1

Parameters in Smoothing Model
• Algorithm parameters
  – Window w (in number of frame slots)
  – Client buffer size b (in bytes)
  – Source/proxy buffer size B (in bytes)
  – Computation interval a (in frames)
  – Frame-size prediction interval p (in frames)
• Performance metrics
  – Peak rate of the smoothed stream
  – Coefficient of variation (standard deviation/mean)
  – Effective bandwidth (given buffer and loss rate)

Peak Rate vs. Window Size
(varying client buffer size for MPEG-1 Wizard of Oz)
• Dramatic decrease in bandwidth variability
• Online algorithm approaches offline scheme
• Ten-second window gives most of the gain

Peak Rate vs. Client Buffer
(varying window size for MPEG-1 Wizard of Oz)
• Significant reductions with a few Mbytes of buffer
• Diminishing returns for larger client buffer sizes
• Window size w should scale with buffer size b

Proxy vs. Client Buffer
(varying prediction under 512-kbyte total buffer & 30-frame window)
• Need buffer at each end for good performance
• Even buffer split for large p, more at proxy for small p
• Simple prediction schemes are very effective

Prefix Caching to Avoid Start-Up Delay
• Avoid start-up delay for prerecorded streams
  – Proxy caches initial part of popular video streams
  – Proxy starts satisfying client request more quickly
  – Proxy requests remainder of the stream from server
  smooth over large window without large delay
• Use prefix caching to hide other Internet delays
  – TCP connection from browser to server
  – TCP connection from player to server
  – Dejitter buffer at the client to tolerate jitter
  – Retransmission of lost packets
  apply to "point-and-click" Web video streams

New Questions
• Video streaming protocol
  – How to get the proxy in the path?
  – How to receive an initial copy of the prefix?
  – How to retrieve the remaining frames of the video?
• Smoothing model
  – What changes in the smoothing constraints?
  – What changes in the basic performance properties?
• Proxy resource allocation
  – How much prefix is needed to hide Internet delays?
  – How to allocate between caching and smoothing?
  – How to allocate resources across multiple streams?
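
The question of what changes in the smoothing constraints starts from the baseline online bounds, max{D_{i-w}, A_i - B} <= S_i <= min{D_{i-w} + b, A_i}. A minimal Python sketch of those bounds follows. It assumes that frame i arrives at the proxy at time i and is played out at time i + w, and that all sizes share one unit; the function name `online_bounds` and the sample numbers are illustrative and do not come from the talk.

```python
from itertools import accumulate


def online_bounds(frame_sizes, w, b, B):
    """Return (lower, upper) bounds on cumulative transmission S_i
    for time slots i = 1 .. n + w.

    Assumptions (illustrative, not from the talk): frame i arrives at the
    proxy at time i and is played out by the client at time i + w.
    """
    n = len(frame_sizes)
    cum = [0] + list(accumulate(frame_sizes))  # cum[k] = f_1 + ... + f_k
    bounds = []
    for i in range(1, n + w + 1):
        arrived = cum[min(i, n)]             # A_i: data at the proxy by time i
        played = cum[i - w] if i > w else 0  # D_{i-w}: data consumed by time i
        lower = max(played, arrived - B)     # avoid client underflow, proxy overflow
        upper = min(played + b, arrived)     # avoid client overflow, unsent data
        bounds.append((lower, upper))
    return bounds


if __name__ == "__main__":
    for slot, (lo, hi) in enumerate(online_bounds([4, 1, 1, 4], w=2, b=5, B=5), 1):
        print(slot, lo, hi)
```

A transmission schedule is feasible only while lower <= upper in every slot; a slot where the bounds cross means the chosen w, b, and B cannot carry the stream without loss.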

Protocol Issues
• Ensuring that requests go through the proxy
  – Configuration of proxy in client browser or player
  – Placement of transparent proxy in the path
• Caching of the initial frames of the video
  – Server replication of the prefix
  – Proxy prefetching of the prefix
  – Proxy caching of prefix after first request
• Transparent retrieval of remaining frames
  – Range request operation in HTTP/1.1
  – Absolute positioning in RTSP

Changes to Smoothing Model
• Separate parameter s for client start-up delay
• Prefix cache stores the first w-s frames
• Arrival vector A_i includes cached frames
• Prefix buffer does not empty after transmission
• Send entire prefix before overflow of b_s
• Frame sizes may be known in advance (cached)
[Figure: arrivals A_i enter the proxy, which holds smoothing buffer b_s and prefix buffer b_p; the proxy transmits S_i to client buffer b_c; the client plays out D_{i-s}]

Performance Evaluation
• Comparison to original online smoothing model
  – Pro: can have large window and small start-up delay
  – Pro: performance is virtually indistinguishable
  – Con: storing prefix nearly doubles buffer requirement
  – Con: may be difficult to smooth at beginning of video
• Allocation of prefix and smoothing buffers
  – Small prefix buffer limits size of smoothing window
    small window w restricts workahead smoothing
  – Large prefix buffer limits size of smoothing buffer
    small b_s requires aggressive transmission schedule

Peak Rate vs. Window Size
(varying total proxy buffer size for MPEG-1 Wizard of Oz)
• Convex, cup-shaped curve of peak rate vs. buffer
• Simple binary search for optimal allocation
• Heuristic: pick largest w that does not constrain b

Peak Rate vs. Prefix Buffer Size
(varying total proxy buffer size for MPEG-1 Wizard of Oz)
[Figure only]

Allocating Resources Across Streams
• Performance issues
  – Limited buffer (M) and/or bandwidth (B) at proxy
  – Collection of V videos with different popularity
  – Videos with different sequences of frame sizes
• Optimization problem
  – Allocate prefix buffer b_p for each video v = 1, …, V
  – Allocate smoothing buffer b_s for each of n_v requests
  – Obey constraint on buffer (M) or bandwidth (B)
  – Minimize the usage of the other resource (M or B)

Simplifying the Problem
• Complex resource allocation problem
  – Assign b_p, b_s, and w for each video v
  – Buffer requirement: sum_v {b_p(v) + n_v * b_s(v)}
  – Bandwidth requirement: sum_v {n_v * peak(v)}
• Reduce problem to selecting w for each video
  – Select same b_s and w across all requests for v
  – Select prefix buffer b_p as first w-s frames
  – Select b_s as max smoothing buffer for window w

Greedy Algorithm
• Further simplifying the problem
  – Selecting w determines b_p(v), b_s(v), and peak(v)
  – Consider the n_v*peak(v) vs. b_p(v)+n_v*b_s(v) curve
  – Curve is piecewise-linear, convex, non-increasing
• Greedy algorithm for buffer constraint M
  – Select the video with steepest initial slope
  – Assign buffer space to this video for max gain
  – Repeat until reaching the buffer constraint M
• Greedy algorithm for bandwidth constraint B
  – Repeat until the bandwidth constraint B is no longer exceeded

Illustration of Greedy Algorithm
[Figure: bandwidth vs. buffer curves for video 1 and video 2, with greedy steps numbered #1 through #6]

Building a Smoothing Proxy
• Performance results
  – Memory: a few megabytes of RAM is sufficient
  – CPU: 1-2 msec to smooth 30 sec (300 MHz PC)
  – Bandwidth: 2-4 Mbps feasible on personal computer
• Solution with off-the-shelf components
  – 300 MHz Pentium Pro with 192 megabytes of RAM
  – Input and output on 10 megabit/second Ethernet
  – Windows NT operating system with WinSock 2.0

Reality Sets In
• Video stream is packetized, not a fluid
  – Smoothing constraints must be applied to packets
  – Proxy cannot transmit the stream at arbitrary rates
• System does not have support for traffic shaping
  – Cannot control the inter-packet spacing at fine scale
  – E.g., 2 msec spacing for 15-packet frames (30 fps)
• Interrupt latency, timer jitter, and data copying
  – Limited control over timer expiration times
  – Latency in processing I/O and timer operations
  – Need to avoid extra copying of video frames

Time-Sharing the Processor
• Reception of incoming packets
  – Smooth over more frames by receiving often
  – Avoid double-copy from kernel to user space
  – Avoid the worst-case scenario of overflow
• Computation of smooth schedule
  – Must run often enough to maximize smoothing
  – Fortunately, does not need to read or write data
• Transmission of packets according to schedule
  – Must run often enough to control packet spacing
  – Avoid the bad case of sending a large burst
  – Avoid the worst case of client underflow

Key Design Decisions
• Single thread of control
  – No operating system control over fine-grain sharing
• High-performance counter for timing operations
  – Timers are too inaccurate (tens of milliseconds)
  – How often should the counter be checked?
• Overlapped I/O to avoid double copying
  – Receive and send directly to/from the user-space buffer
  – How many outstanding sends and receives?
• Explicit pacing of packet transmissions
  – How often should the send routine be invoked?

LiveNet MPEG-2 Testbed
(developed by Andrea Basso, Glenn Cash, and Reha Civanlar)
• MPEG-2 encoder
  – MPEG-2 encoder board (MPEGXpress)
  – Software to read into buffers and stream into network
• Real-time packetizer
  – Parses MPEG-2 stream and divides frames into slices
  – Packing slices into Real-Time Protocol (RTP) packets
• MPEG-2 decoder
  – Software for packet reception and error concealment
  – MPEG-2 decoder board (DarimVision)

LiveNet Testbed
[Figure only]

Conclusions
• Online smoothing model
  – Applicable to many non-interactive applications
  – Significantly lowers burstiness of compressed video
  – Enables high-quality video across access networks
• Prefix caching
  – Hides start-up delay for smoothing and other operations
  – Effective resource allocation schemes at the proxy
• Practical application
  – Transparent to the origin video source/server
  – Implementation with commercial off-the-shelf parts
  – Integration with MPEG-2 and Real-Time Protocol

Ongoing Work
• Prototyping the proxy smoothing service
  – Completion of implementation of proxy service
  – Performance evaluation of parameterized system
• Combining smoothing with other mechanisms
  – Discard, transcoding, feedback, and retransmission
  – Exploiting prefix cache to hide additional latency
• Measurement of Web-initiated video streaming
  – Collection of video packet traces in AT&T WorldNet
  – Study of potential for (partial) caching at the proxy
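
Backup sketch: greedy allocation under a buffer constraint

The greedy algorithm for the buffer constraint M (steepest slope first, repeat until M is reached) might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the implementation from this work: each video's bandwidth-vs-buffer curve is given as a list of (buffer, bandwidth) breakpoints with strictly increasing buffer and a convex, non-increasing shape, and a step that does not fit in the remaining buffer is skipped rather than taken partially. The names `greedy_buffer_allocation` and `curves` are hypothetical.

```python
import heapq


def greedy_buffer_allocation(curves, M):
    """Pick one breakpoint per video so total buffer stays within M.

    curves: {video: [(buffer, bandwidth), ...]} with strictly increasing
    buffer and convex, non-increasing bandwidth (assumed input format).
    Returns ({video: chosen (buffer, bandwidth)}, total buffer used).
    """
    pos = {v: 0 for v in curves}  # index of the current breakpoint per video
    used = sum(points[0][0] for points in curves.values())
    if used > M:
        raise ValueError("minimum buffer requirement already exceeds M")

    heap = []

    def push_next_step(v):
        points, i = curves[v], pos[v]
        if i + 1 < len(points):
            extra_buffer = points[i + 1][0] - points[i][0]
            saved_bandwidth = points[i][1] - points[i + 1][1]
            # Negate the slope so the steepest reduction pops first.
            heapq.heappush(heap, (-saved_bandwidth / extra_buffer, v, extra_buffer))

    for v in curves:
        push_next_step(v)

    while heap:
        _, v, extra_buffer = heapq.heappop(heap)
        if used + extra_buffer > M:
            continue  # step does not fit; this video keeps its current point
        used += extra_buffer
        pos[v] += 1
        push_next_step(v)

    return {v: curves[v][pos[v]] for v in curves}, used


if __name__ == "__main__":
    demo = {
        "v1": [(0, 10), (2, 6), (4, 5)],
        "v2": [(0, 8), (1, 5), (3, 4)],
    }
    print(greedy_buffer_allocation(demo, M=4))
```

Because each curve is convex, the steps of one video come off the heap in order of decreasing benefit, so a single priority queue over "next step per video" is enough to follow the steepest-slope rule.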