Cloud Streaming
Jingwen Wang

Video content distribution
  - Nearly 90% of all consumer IP traffic is expected to consist of video content distribution
    - Web video like YouTube, P2P video like BitTorrent
  - Content distribution requirements:
    - Scalable and secure media storage, processing and distribution
    - Anytime, anywhere, any-device consumption
    - Low latency, global distribution

Cloud Provides a Better Way
  - Massive Rapid Scale
  - File Transfer
  - Low IT Costs
  - High Reliability
  - Accredited Security

CloudStream
  - Motivation:
    - Current solution for delivering videos: progressive download via CDN
      - Non-adaptive codec
      - Video freezes
  - WANT: an SVC-based video proxy that delivers high-quality Internet streaming adapting to variable conditions
    - Video transcoding from original formats to SVC
    - Video streaming to different users under Internet dynamics

CloudStream
  - Implemented on one processor:
    - Video transcoding to SVC is highly complex and transcoding speed is relatively slow
      - A long wait before a user can access the transcoded video
      - Video freezes because transcoded video data is unavailable
  - To enable real-time transcoding and allow scalable support for multiple concurrent videos:
    - Use the cloud: CloudStream
    - Partition a video into clips and map them to different compute nodes in order to achieve encoding parallelization

Concerns
  - Encoding parallelization:
    - Multiple video clips can be mapped to compute nodes at different times
    - A first-task first-server scheme can introduce unbalanced computation load, leading to transcoding jitter
    - The transcoding component should not speed up video encoding at the expense of degrading the encoded video quality
  - Streaming jitter:
    - Video clips arrive at the streaming component in batches
    - A surge in demand for network resources causes some data to miss its expected arrival time at the user

Metrics affecting Streaming Quality
  - Streaming Quality:
    - Access time: transcoding and streaming latencies
    - Video freezes: transcoding and streaming jitters
  - Video Content:
    - The temporal motion metric TM
    - The
spatial detail metric SD

Encoding Parallelization
  - SVC coding structure:
    - A video: non-overlapping, coding-independent GOPs
    - A picture: layers
    - A layer: coding-independent slices
    - A slice: macro-blocks
  - Parallelism:
    - Across different compute nodes: inter-node parallelism
    - Shared-memory address parallelism inside one compute node: intra-node parallelism

Multi-level Parallelization Scheme
  - Multi-level encoding parallelization:
    - GOPs: have the largest work granularity. Inter-node parallelism!
    - Slices: independent, with a relatively large amount of work. Intra-node parallelism!
      - Each slice on a different CPU

Intra-node Parallelism
  - Limit the average computation time spent over the GOP to an upper bound Tth
    - Shortens the access time!
  - The minimum number of slices encoded in parallel: Mmin

  Notation                     Definition
  M                            Number of slices encoded in parallel in a picture
  NMB,i, Nslice,i              Number of MBs or slices in the i-th layer of a picture
  TMB,i(M), Tslice,i(M)        Average encoding time of one MB or slice in the i-th layer with M parallel slices
  Tpic,i(M)                    Average encoding time of the i-th layer of a picture
  Tpic(M)                      Average encoding time of a picture
  TGOP(M)                      Average encoding time of a GOP

Inter-node Parallelism
  - Achieve real-time transcoding
  - Transcoding jitters are introduced by variation in GOP encoding time
  - Goal:
    - Minimize transcoding jitters
    - Minimize the number of compute nodes

Estimation of GOP's Encoding Time
  - A multi-variable regression model
    - At a given encoding configuration
    - Train on videos with different video content characteristics TM and SD to build the regression model
  - 90% of the predicted values on the testing data fall within 10% error

Problem Formulation
  - Based on the approximation of each GOP's encoding time
  - Given Q jobs
    - Each job i has a deadline di and a processing time pi
    - Multiple nodes run in parallel; each job is processed without preemption on its machine until completion
    - Lateness li can be computed as ci (actual completion
time) - di
    - Upper bound of lateness: τ
  - WANT: bound the lateness of these jobs
    - Find the minimal number of machines N and minimize τ
  - Complexity:
  - Solution:
    - HallSh-based Mapping
    - Lateness-first Mapping

HallSh-based Mapping
  - HallSh-based Mapping (HM): set an upper bound τ and find the minimal N that satisfies it
  - Use the HallSh machine scheduling algorithm as a black box
  - minMS2approx algorithm:
    1. Pick ε = min_i{(di - pi)/τ}
    2. Run HallSh with an increasing number of machines until the maximum lateness among all jobs is < (1 + ε) * τ, and set the machine number at this point to be K
    3. HallSh returns the scheduling results of all jobs. For a job with lateness over the upper bound on a particular machine j, move it along with all future jobs on machine K to a new machine K + j. Then compute the new completion time for all jobs on this new machine
    4. N is the number of used machines

Lateness-first Mapping
  - Lateness-first Mapping (LFM): compute the minimal N based on the deadline of each job and minimize τ for the given N
  - Deciding the minimum N: Tpic(M) * R < SG * N
  - Minimizing τ given N:
    - For the i-th job in every N jobs, compute its adjusted processing time p'i = pi - (di - d1)
    - Sort these jobs in descending order of p'i
    - Schedule the job with the largest p'i to the first available compute node, the second largest one to the second available node

Test
  - SVC: JSVM
  - Environment:
    - Input: 64 480p video GOPs
    - GOP: 8 pictures
    - Picture: 4 temporal layers, 2 spatial layers, 1 quality layer
    - Up to 4 cores on each compute node
    - The number of slices corresponds to the number of cores

Performance
  - Average encoding time and speedup using up to 4 cores in intra-node parallelism
  - LFM
  - HM

Comparing LFM & HM
  - HM can successfully decide the appropriate compute node number and limit the transcoding jitters
  - HM may require a larger N than LFM to achieve the same lateness constraint

Cloud Download
  - Using cloud utilities to achieve high-quality content distribution for unpopular videos
  - Motivation:
    - Video
content distribution dominates Internet traffic
    - High-quality video content distribution is of great significance
      1. High data health
      2. High data transfer rate

Motivation of Cloud Download
  - High data health
    - Data health: number of available full copies of the shared file in a BitTorrent swarm
    - Data health < 1.0 is unhealthy
    - Use data health to represent the data redundancy level of a video file
  - High data transfer rate
    - Enables online video streaming
    - Live & VoD

State-of-the-art Techniques: CDN
  - CDN (Content Distribution Network)
    - Strategically deployed edge servers
    - Servers cooperate to replicate or move data according to data popularity and server load
    - A user obtains a copy from a nearby edge server
  - CDN: limited storage and bandwidth
    - Not cost-effective for a CDN to replicate unpopular videos to the edge servers
    - A charged facility, serving only the content providers who have paid

State-of-the-art Techniques: P2P
  - P2P (Peer-to-Peer)
    - End users form P2P data swarms
    - Data is exchanged directly between peers
    - Real strength shows for popular file sharing
  - P2P: poor performance for unpopular videos
    - Too few peers
    - Low data health
    - Low data transfer rate

Neither CDN nor P2P works well in distributing unpopular videos, due to low data health or low data transfer rate.
Worldwide deployment of cloud utilities provides a novel perspective to solve the problem: Cloud Download!

Cloud Download (figure: cloud, high data rate)
  - First, a user sends a video request to the cloud
  - Subsequently, the cloud downloads the requested video from the file link and stores it in the cloud cache
  - The user retrieves the requested video from the cloud at a high data rate via the intra-cloud data transfer acceleration

User-side Energy Efficiency
  - Common download of an unpopular video
    - A common user keeps his computer (& NIC) powered on for long hours
    - Much energy is wasted while waiting
  - Cloud download of an unpopular video
    - The user can just be "offline"
    - When the video is ready, the user quickly retrieves it in a short time
    - User-side energy efficient!
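The three Cloud Download steps above (request, cloud-side download into the cache, retrieval) can be sketched as a toy Python model. This is only an illustration under stated assumptions: the class name `CloudDownloadSketch`, the injected `fetch_from_link` callable, and the example link are all hypothetical and do not come from the VideoCloud system. A repeated request is served from the cache, since the cloud only downloads a video the first time it is requested.

```python
from typing import Callable, Dict


class CloudDownloadSketch:
    """Toy model of the request -> download -> cache -> retrieve flow.

    All names here are hypothetical; this is not the real system's API.
    """

    def __init__(self, fetch_from_link: Callable[[str], bytes]) -> None:
        # fetch_from_link stands in for the cloud downloading from the file link.
        self._fetch = fetch_from_link
        self._cache: Dict[str, bytes] = {}
        self.origin_downloads = 0  # how many times the cloud hit the original link

    def request(self, link: str) -> None:
        """Step 1 and 2: the user submits a link; the cloud downloads it on a miss."""
        if link not in self._cache:
            self._cache[link] = self._fetch(link)
            self.origin_downloads += 1

    def is_ready(self, link: str) -> bool:
        """The user may stay offline and only check back for readiness."""
        return link in self._cache

    def retrieve(self, link: str) -> bytes:
        """Step 3: the user retrieves the cached video from the cloud."""
        return self._cache[link]


def fake_fetch(link: str) -> bytes:
    # Placeholder for a real (slow) download of an unpopular video.
    return ("video-bytes-for:" + link).encode()


cloud = CloudDownloadSketch(fake_fetch)
link = "http://example.com/rare.mp4"  # hypothetical link
cloud.request(link)   # first request: the cloud downloads and caches
cloud.request(link)   # second request: served from the cache
video = cloud.retrieve(link) if cloud.is_ready(link) else b""
```

In this sketch the download is synchronous for brevity; in the flow described above the user would be offline during the download and return only for the final retrieval.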
Cloud Download: View Startup Delay
  - The only drawback of Cloud Download:
    - For some videos, the user must wait for the cloud to download them: view startup delay
  - This drawback is effectively alleviated by the implicit and secure data reuse among users
    - The cloud only downloads a video when it is requested for the first time: cloud cache!
    - Subsequent requests are satisfied directly
    - Secure because it is oblivious to users
    - Data reuse rate -> 87%

System Architecture (figure labels: video request; data transfer (high data rate); data store/cache; data download; check cache)

Component Function
  - ISP Proxy: receive & restrict requests in each ISP
  - Task Manager: check cache
  - Task Dispatcher: load balance
  - Downloaders: cloud download data
  - Cache: store and upload data

Hardware Composition

  Building Block    # of servers            Memory   Storage                  Bandwidth
  ISP Proxy         6                       8 GB     250 GB                   1 Gbps (Intranet), 0.3 Gbps (Internet)
  Task Manager      4                       8 GB     250 GB                   1 Gbps (Intranet)
  Task Dispatcher   3                       8 GB     460 GB                   1 Gbps (Intranet)
  Downloaders       140                     8 GB     460 GB                   1 Gbps (Intranet), 0.325 Gbps (Internet)
  Cloud Cache       400 chunk servers,      8 GB     4 TB (chunk server),     1 Gbps (Intranet), 0.3 Gbps (Internet)
                    93 upload servers,               250 GB (upload server)
                    3 index servers

Cache Capacity Planning & Replacement Strategy
  - Handle 0.22M daily requests
  - Average video size: 379 MB
  - Video cache duration: < 7 days
  - Thus, C = 379 MB * 0.22M * 7 ≈ 584 TB
  - Cache replacement strategies
    - 17 days of trace-driven simulations
    - FIFO vs. LRU vs. LFU
    - FIFO worst, LFU best!

Performance Evaluation
  - Dataset
    - Complete running log of the VideoCloud system over 17 days: Jan. 1, 2011 to Jan.
17, 2011
    - 3.87M video requests, around 1.0M unique videos
  - Metrics
    - Data transfer rate
    - View startup delay
    - Energy efficiency

Data transfer rate & View startup delay

Energy Efficiency
  - User-side energy efficiency
    - E1: users' energy consumption using common download
    - Eu: users' energy consumption using cloud download
    - User-side energy efficiency = (E1 - Eu)/E1 = 92%
  - Overall energy efficiency
    - Ec: the cloud's energy consumption
    - E2: the total energy consumption of the cloud and users, so E2 = Ec + Eu
    - Overall energy efficiency = (E1 - E2)/E1 = 86%

Cloud Download application
  - Cloud transcoding for mobile users: http://xf.qq.com
    - A mobile user submits a video link and the transcoding parameters to the cloud
    - The cloud downloads the video from the Internet via cloud download
    - The cloud transcodes the downloaded video and transfers the transcoded video back to the user

References
  - Huang et al., CloudStream: Delivering high-quality streaming videos through a cloud-based SVC proxy, INFOCOM 2011
  - Huang et al., Cloud download: using cloud utilities to achieve high-quality content distribution for unpopular videos, ACM Multimedia 2011
  - http://www.slideshare.net/AmazonWebServices/aws-for-media-content-inthe-cloud-miles-ward-amazon-web-services-and-bhavik-vyas-aspera
  - The QQCyclone platform. http://xf.qq.com

Thank you!