Cloud Streaming
Jingwen Wang
Video content distribution
Nearly 90% of all consumer IP traffic is expected to consist of video content distribution
- Web video like YouTube, P2P video like BitTorrent
Content distribution requirements:
- Scalable and secure media storage, processing, and distribution
- Anytime, anywhere, any-device consumption
- Low latency, global distribution
Cloud Provides a Better Way
- Massive Scale
- Rapid File Transfer
- Low IT Costs
- High Reliability
- Accredited Security
CloudStream
Motivation:
- Current solution for delivering videos: progressive download via CDN
  - Non-adaptive codec
  - Video freezes
- WANT: an SVC-based video proxy that delivers high-quality Internet streaming, adapting to variable conditions
  - Video transcoding from original formats to SVC
  - Video streaming to different users under Internet dynamics
CloudStream
Implementation on a single processor:
- Video transcoding to SVC is highly complex and transcoding speed is relatively slow
  - A long delay before a user can access the transcoded video
  - Video freezes because of unavailability of transcoded video data
To enable real-time transcoding and allow scalable support for multiple concurrent videos:
- Use the cloud: CloudStream
  - Partition a video into clips and map them to different compute nodes to achieve encoding parallelization
Concerns
Encoding parallelization:
- Multiple video clips can be mapped to compute nodes at different times
- A first-task, first-server scheme can introduce unbalanced computation load, causing transcoding jitter
- The transcoding component should not speed up video encoding at the expense of degrading the encoded video quality
Streaming jitter:
- Video clips arrive at the streaming component in batches
- A demand surge of network resources leads to some data not arriving at the user by the expected arrival time
Metrics affecting Streaming Quality
Streaming quality:
- Access time
  - Transcoding and streaming latencies
- Video freezes
  - Transcoding and streaming jitters
Video content:
- The temporal motion metric TM
- The spatial detail metric SD
Encoding Parallelization
SVC coding structure:
- A video → non-overlapping, coding-independent GOPs
- A picture → layers
- A layer → coding-independent slices
- A slice → macro-blocks
Parallelism:
- Across different compute nodes: inter-node parallelism
- Shared-memory parallelism inside one compute node: intra-node parallelism
Multi-level Parallelization Scheme
Multi-level encoding parallelization:
- GOPs: the largest work granularity
  - Inter-node parallelism!
- Slices: coding-independent, relatively large amount of work
  - Intra-node parallelism!
  - Each slice on a different CPU
Intra-node Parallelism
Intra-node parallelism:
- Limit the average computation time spent over a GOP to an upper bound Tth
  - Shortens the access time!
- The minimum number of slices encoded in parallel: Mmin (see the sketch after the notation table below)
Notations
Notation | Definition
M | Number of slices encoded in parallel in a picture
NMB,i, Nslice,i | Number of MBs or slices in the i-th layer of a picture
TMB,i(M), Tslice,i(M) | Average encoding time of one MB or slice in the i-th layer with M parallel slices
Tpic,i(M) | Average encoding time of the i-th layer of a picture
Tpic(M) | Average encoding time of a picture
TGOP(M) | Average encoding time of a GOP
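A hedged sketch of how these quantities could combine, assuming the Nslice,i slices of layer i are split evenly across the M parallel encoders and that a GOP contains SG pictures (SG is introduced here for illustration; the paper's exact formulas may differ):

$$T_{pic,i}(M) \approx \left\lceil \frac{N_{slice,i}}{M} \right\rceil T_{slice,i}(M), \qquad T_{pic}(M) = \sum_{i} T_{pic,i}(M), \qquad T_{GOP}(M) = S_G \, T_{pic}(M)$$

so that the minimum parallelism degree under the bound Tth is

$$M_{min} = \min \{\, M \;:\; T_{GOP}(M) \le T_{th} \,\}$$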
Inter-node Parallelism
Inter-node parallelism:
- Achieves real-time transcoding
- Transcoding jitters are introduced by variation in GOP encoding time
- Goals:
  - Minimize transcoding jitters
  - Minimize the number of compute nodes
Estimation of GOP’s Encoding Time
A multi-variable regression model (sketched below):
- Built for a given encoding configuration
- Trained on videos with different content characteristics TM and SD
- 90% of the predicted values on the testing data fall within 10% error
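A minimal sketch of such a regression, assuming scikit-learn, a simple feature set (TM, SD, and their interaction), and hypothetical per-GOP training data; the paper's exact model terms and training set are not reproduced here:

```python
# Sketch: estimate a GOP's encoding time from its content metrics TM and SD.
import numpy as np
from sklearn.linear_model import LinearRegression

def build_features(tm, sd):
    """Stack TM, SD and an interaction term into a feature matrix."""
    tm, sd = np.asarray(tm, dtype=float), np.asarray(sd, dtype=float)
    return np.column_stack([tm, sd, tm * sd])

# Hypothetical training data: per-GOP (TM, SD) and measured encoding times in seconds.
train_tm = [12.3, 30.1, 8.7, 25.4]
train_sd = [40.2, 55.8, 33.1, 60.9]
train_time = [3.9, 7.2, 2.8, 6.5]

model = LinearRegression().fit(build_features(train_tm, train_sd), train_time)

# Estimate encoding time for a new GOP before mapping it to a compute node.
predicted = model.predict(build_features([18.0], [47.5]))
print(f"estimated GOP encoding time: {predicted[0]:.2f} s")
```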
Problem Formulation
Problem formulation:
- Based on the approximation of each GOP's encoding time
- Given Q jobs
- Each job i has a deadline di and a processing time pi
- Multiple nodes run in parallel; each job is processed without preemption on one machine until its completion
- Lateness li is computed as ci (actual completion time) - di
- Upper bound on lateness: τ
- WANT: bound the lateness of these jobs, i.e., find the minimal number of machines N and minimize τ
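Restated compactly, with the same notation as the bullets above (a sketch of the optimization, not necessarily the paper's exact program):

$$l_i = c_i - d_i, \qquad \max_{1 \le i \le Q} l_i \le \tau, \qquad \text{minimize } N \text{ and then } \tau \text{ over non-preemptive schedules of the } Q \text{ jobs on } N \text{ machines}$$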
Complexity:
Solution:
- HallSh-based Mapping (HM)
- Lateness-first Mapping (LFM)
HallSh-based Mapping
HallSh-based Mapping (HM):
- Set an upper bound on τ and find the minimal N that satisfies it
- Use the HallSh machine scheduling algorithm as a black box
minMS2approx algorithm (a hedged sketch follows the steps):
1. Pick ε = min_i {(di - pi)/τ}
2. Run HallSh, increasing the number of machines until the maximum lateness among all jobs is below (1 + ε)·τ; set the machine number at this point to K
3. HallSh returns the scheduling results of all jobs. For a job whose lateness exceeds the upper bound on a particular machine j, move it, along with all subsequent jobs on that machine, to a new machine K + j, then compute the new completion times for all jobs on this new machine
4. N is the number of machines used
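A minimal sketch of the HM loop, treating HallSh as a black-box function hall_shmoys(jobs, k) with an assumed signature; the relocation in step 3 is simplified and may differ in detail from the paper:

```python
# Sketch of HallSh-based Mapping (HM). `hall_shmoys` is an assumed black box that
# schedules `jobs` (a list of (p_i, d_i) tuples) on k machines and returns
# (schedule, lateness): schedule[m] is the ordered list of job indices on machine m,
# and lateness is a dict mapping job index -> c_i - d_i.
def hm_mapping(jobs, tau, hall_shmoys, max_machines=64):
    # Step 1: epsilon from the tightest slack, relative to the lateness bound.
    eps = min((d - p) / tau for p, d in jobs)

    # Step 2: grow the machine count until max lateness drops below (1 + eps) * tau.
    for k in range(1, max_machines + 1):
        schedule, lateness = hall_shmoys(jobs, k)
        if max(lateness.values()) < (1 + eps) * tau:
            break
    K = k

    # Step 3: a job whose lateness still exceeds tau is moved, together with the jobs
    # scheduled after it on the same machine, onto a fresh machine.
    extra = {}
    for m, job_ids in enumerate(schedule):
        for pos, j in enumerate(job_ids):
            if lateness[j] > tau:
                extra[K + m] = job_ids[pos:]
                schedule[m] = job_ids[:pos]
                break

    # Step 4: N is the total number of machines actually used.
    n_used = sum(1 for s in schedule if s) + len(extra)
    return schedule, extra, n_used
```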
Lateness-first Mapping
Lateness-first Mapping (LFM):
- Compute the minimal N based on the deadline of each job, then minimize τ for the given N
- Deciding the minimum N:
  - Tpic(M)·R < SG·N
- Minimizing τ given N (sketched below):
  - For the i-th job in every N jobs, compute its adjusted processing time p'i = pi - (di - d1)
  - Sort the N jobs in decreasing order of p'i
  - Schedule the job with the largest p'i to the first available compute node, the second largest to the second available node, and so on
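A minimal sketch of LFM under the same job representation, with R and SG read as the picture rate and GOP size used in the inequality above (both interpretations are assumptions):

```python
import math

# Sketch of Lateness-first Mapping (LFM). `jobs` is a list of (p_i, d_i) tuples,
# ordered so that d1 is the earliest deadline within each batch of N jobs.
def lfm_mapping(jobs, t_pic, rate, gop_size):
    # Minimum N from the slide's inequality T_pic(M) * R < S_G * N.
    n = max(1, math.ceil(t_pic * rate / gop_size))

    machines = [0.0] * n      # next-free time of each compute node
    assignment = {}           # job index -> compute node
    for start in range(0, len(jobs), n):
        batch = jobs[start:start + n]
        d1 = batch[0][1]
        # Adjusted processing time p'_i = p_i - (d_i - d_1).
        adjusted = [(p - (d - d1), idx) for idx, (p, d) in enumerate(batch, start)]
        # Largest adjusted processing time goes to the earliest-available node, and so on.
        for p_adj, idx in sorted(adjusted, reverse=True):
            node = min(range(n), key=lambda m: machines[m])
            assignment[idx] = node
            machines[node] += jobs[idx][0]
    return n, assignment
```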
Test
SVC: JSVM
Environment:
- Input: 64 480p video GOPs
- GOP: 8 pictures
- Picture: 4 temporal layers, 2 spatial layers, 1 quality layer
- Up to 4 cores on each compute node
- Number of slices corresponding to the number of cores
Performance
Average encoding time and speedup using up to 4 cores in intra-node parallelism (figures for LFM and HM)
Comparing LFM & HM
- HM can successfully decide the appropriate number of compute nodes and limit the transcoding jitters
- HM may require a greater N than LFM to achieve the same level of lateness constraint
Cloud Download
Using cloud utilities to achieve high-quality content distribution for unpopular videos
Motivation:
- Video content distribution dominates Internet traffic
- High-quality video content distribution is of great significance:
  1. high data health
  2. high data transfer rate
Motivation of Cloud Download
High data health
- Data health: the number of available full copies of the shared file in a BitTorrent swarm
- Data health < 1.0 is unhealthy
- Data health represents the data redundancy level of a video file
High data transfer rate
- Enables online video streaming
- Live & VoD
State-of-the-art Techniques: CDN
CDN (Content Distribution Network):
- Strategically deploys edge servers
- Edge servers cooperate to replicate or move data according to data popularity and server load
- A user obtains a copy from a nearby edge server
CDN: limited storage and bandwidth
- Not cost-effective for a CDN to replicate unpopular videos to the edge servers
- A charged facility that only serves the content providers who have paid
State-of-the-art Techniques: P2P
P2P (Peer-to-Peer):
- End users form P2P data swarms
- Data is directly exchanged between peers
- Its real strength shows in popular file sharing
P2P: poor performance for unpopular videos
- Too few peers
  - Low data health
  - Low data transfer rate
Neither CDN nor P2P works well for distributing unpopular videos, due to low data health or low data transfer rate
The worldwide deployment of cloud utilities provides a novel perspective to solve the problem: Cloud Download!
Cloud Download
(Figure: users request videos through the cloud and receive them at high data rate)
Cloud Download
First, a user sends a video request to the cloud
Then, the cloud downloads the requested video from the file link and stores it in the cloud cache
Finally, the user retrieves the requested video from the cloud at high data rate via intra-cloud data transfer acceleration
User-side Energy Efficiency
Common download of an unpopular video:
- A common user keeps his computer (and NIC) powered on for long hours
- Much energy is wasted while waiting
Cloud download of an unpopular video:
- The user can simply stay "offline"
- When the video is ready, it is quickly retrieved in a short time
- User-side energy efficient!
Cloud Download: View Startup Delay
The only drawback of Cloud Download:
- For some videos, the user must wait for the cloud to download them first:
  - View startup delay
This drawback is effectively alleviated:
- By the implicit and secure data reuse among users
- The cloud only downloads a video when it is requested for the first time:
  - Cloud cache!
  - Subsequent requests are satisfied directly
  - Secure because the reuse is oblivious to users
  - Data reuse rate reaches 87%
System Architecture
(Architecture diagram: video request → check cache → data download → data store/cache → data transfer to the user at high data rate)
Component Function
- ISP Proxy: receives and restricts requests in each ISP
- Task Manager: checks the cache
- Task Dispatcher: load balancing
- Downloaders: download data
- Cloud Cache: stores and uploads data
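A hedged sketch of how a request could flow through these components; all names below are hypothetical and only illustrate the cache-hit/cache-miss path described above:

```python
# Illustrative request flow through the Cloud Download components (names hypothetical).

class CloudCache:
    """The cloud cache as seen by the Task Manager: a simple link -> data store."""
    def __init__(self):
        self._store = {}

    def get(self, link):
        return self._store.get(link)

    def put(self, link, data):
        self._store[link] = data


def handle_request(link, cache, downloaders):
    """ISP Proxy entry point: serve a video request, downloading only on a cache miss."""
    data = cache.get(link)                                      # Task Manager: check cache
    if data is None:
        downloader = min(downloaders, key=lambda d: d["load"])  # Task Dispatcher: load balance
        data = f"<bytes of {link}>"                             # Downloader: fetch from source (stubbed)
        downloader["load"] += 1
        cache.put(link, data)                                   # Cloud Cache: store for later reuse
    return data                                                 # uploaded to the user at high intra-cloud rate


# The second request for the same link is served straight from the cache.
cache = CloudCache()
downloaders = [{"id": 1, "load": 0}, {"id": 2, "load": 0}]
handle_request("http://example.com/video.mp4", cache, downloaders)
handle_request("http://example.com/video.mp4", cache, downloaders)
```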
Hardware Composition
Building Block | # of Servers | Memory | Storage | Bandwidth
ISP Proxy | 6 | 8 GB | 250 GB | 1 Gbps (Intranet), 0.3 Gbps (Internet)
Task Manager | 4 | 8 GB | 250 GB | 1 Gbps (Intranet)
Task Dispatcher | 3 | 8 GB | 460 GB | 1 Gbps (Intranet)
Downloaders | 140 | 8 GB | 460 GB | 1 Gbps (Intranet), 0.325 Gbps (Internet)
Cloud Cache | 400 chunk servers, 93 upload servers, 3 index servers | 8 GB | 4 TB (chunk server), 250 GB (upload server) | 1 Gbps (Intranet), 0.3 Gbps (Internet)
Cache Capacity Planning & Replacement Strategy
Handle 0.22M daily requests:
- Average video size: 379 MB
- Video cache duration: < 7 days
- Thus, C = 379 MB × 0.22M × 7 ≈ 584 TB (checked below)
Cache replacement strategies:
- 17-day trace-driven simulations
- FIFO vs. LRU vs. LFU
- FIFO worst, LFU best!
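As a quick check of the capacity figure above (reading 0.22M as 0.22 × 10^6 requests per day and 1 TB as 10^6 MB):

$$C \approx 379\ \text{MB} \times 0.22 \times 10^{6}\ \tfrac{\text{requests}}{\text{day}} \times 7\ \text{days} \approx 584\ \text{TB}$$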
Performance Evaluation
Dataset:
- Complete running log of the VideoCloud system over 17 days: Jan. 1, 2011 - Jan. 17, 2011
- 3.87M video requests, around 1.0M unique videos
Metrics:
- Data transfer rate
- View startup delay
- Energy efficiency
Data transfer rate & View startup delay
Energy Efficiency
User-side energy efficiency:
- E1: users' energy consumption using common download
- Eu: users' energy consumption using cloud download
- User-side energy efficiency = (E1 - Eu)/E1 = 92%
Overall energy efficiency:
- Ec: the cloud's energy consumption
- E2: the total energy consumption of the cloud and users, so E2 = Ec + Eu
- Overall energy efficiency = (E1 - E2)/E1 = 86%
Cloud Download application
Cloud transcoding for mobile users:
- http://xf.qq.com
- A mobile user submits a video link and the transcoding parameters to the cloud
- The cloud downloads the video from the Internet via cloud download
- The cloud transcodes the downloaded video and transfers the transcoded video back to the user
References
- Huang et al., "CloudStream: Delivering high-quality streaming videos through a cloud-based SVC proxy," INFOCOM 2011.
- Huang et al., "Cloud Download: Using cloud utilities to achieve high-quality content distribution for unpopular videos," ACM Multimedia 2011.
- http://www.slideshare.net/AmazonWebServices/aws-for-media-content-inthe-cloud-miles-ward-amazon-web-services-and-bhavik-vyas-aspera
- The QQCyclone platform. http://xf.qq.com
Thank you !