Cloud Streaming
Jingwen Wang
Video content distribution
Nearly 90% of all consumer IP traffic is expected to consist of video content distribution
- Web video like YouTube, P2P video like BitTorrent
Content distribution requirements:
- Scalable and secure media storage, processing, and distribution
- Anytime, anywhere, any-device consumption
- Low latency, global distribution
Cloud Provides a Better Way
- Massive Scale
- Rapid File Transfer
- Low IT Costs
- High Reliability
- Accredited Security
CloudStream
Motivation:
- Current solution for delivering videos: progressive download via CDN
  - Non-adaptive codec
  - Video freezes
- WANT: an SVC-based video proxy that delivers high-quality Internet streaming, adapting to variable conditions
  - Video transcoding from original formats to SVC
  - Video streaming to different users under Internet dynamics
CloudStream
Implementation on a single processor:
- Video transcoding to SVC is highly complex and transcoding speed is relatively slow
  - A long delay before a user can access the transcoded video
  - Video freezes because of unavailability of transcoded video data
To enable real-time transcoding and allow scalable support for multiple concurrent videos:
- Use the cloud: CloudStream
  - Partition a video into clips and map them to different compute nodes to achieve encoding parallelization
Concerns
Encoding parallelization:
- Multiple video clips can be mapped to compute nodes at different times
- A first-task, first-server scheme can introduce unbalanced computation load, causing transcoding jitter
- The transcoding component should not speed up video encoding at the expense of degrading the encoded video quality
Streaming jitter:
- Video clips arrive at the streaming component in batches
- A demand surge of network resources leads to some data not arriving at the user by the expected arrival time
Metrics affecting Streaming Quality
Streaming quality:
- Access time
  - Transcoding and streaming latencies
- Video freezes
  - Transcoding and streaming jitters
Video content:
- The temporal motion metric TM
- The spatial detail metric SD
Encoding Parallelization
SVC coding structure:
- A video → non-overlapping, coding-independent GOPs
- A picture → layers
- A layer → coding-independent slices
- A slice → macro-blocks
Parallelism:
- Across different compute nodes: inter-node parallelism
- Shared-memory parallelism inside one compute node: intra-node parallelism
Multi-level Parallelization Scheme
Multi-level encoding parallelization:
- GOPs: the largest work granularity
  - Inter-node parallelism!
- Slices: coding-independent, relatively large amount of work
  - Intra-node parallelism!
  - Each slice on a different CPU
Intra-node Parallelism
Intra-node parallelism:
- Limit the average computation time spent over a GOP to an upper bound Tth
  - Shortens the access time!
- The minimum number of slices encoded in parallel: Mmin (see the sketch after the notation table below)
Notations
Notation | Definition
M | Number of slices encoded in parallel in a picture
NMB,i, Nslice,i | Number of MBs or slices in the i-th layer of a picture
TMB,i(M), Tslice,i(M) | Average encoding time of one MB or slice in the i-th layer with M parallel slices
Tpic,i(M) | Average encoding time of the i-th layer of a picture
Tpic(M) | Average encoding time of a picture
TGOP(M) | Average encoding time of a GOP
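A hedged sketch of how these quantities could combine, assuming the Nslice,i slices of layer i are split evenly across the M parallel encoders and that a GOP contains SG pictures (SG is introduced here for illustration; the paper's exact formulas may differ):

$$T_{pic,i}(M) \approx \left\lceil \frac{N_{slice,i}}{M} \right\rceil T_{slice,i}(M), \qquad T_{pic}(M) = \sum_{i} T_{pic,i}(M), \qquad T_{GOP}(M) = S_G \, T_{pic}(M)$$

so that the minimum parallelism degree under the bound Tth is

$$M_{min} = \min \{\, M \;:\; T_{GOP}(M) \le T_{th} \,\}$$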
Inter-node Parallelism
Inter-node parallelism:
- Achieves real-time transcoding
- Transcoding jitters are introduced by variation in GOP encoding time
- Goals:
  - Minimize transcoding jitters
  - Minimize the number of compute nodes
Estimation of GOP’s Encoding Time
A multi-variable regression model (sketched below):
- Built for a given encoding configuration
- Trained on videos with different content characteristics TM and SD
- 90% of the predicted values on the testing data fall within 10% error
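A minimal sketch of such a regression, assuming scikit-learn, a simple feature set (TM, SD, and their interaction), and hypothetical per-GOP training data; the paper's exact model terms and training set are not reproduced here:

```python
# Sketch: estimate a GOP's encoding time from its content metrics TM and SD.
import numpy as np
from sklearn.linear_model import LinearRegression

def build_features(tm, sd):
    """Stack TM, SD and an interaction term into a feature matrix."""
    tm, sd = np.asarray(tm, dtype=float), np.asarray(sd, dtype=float)
    return np.column_stack([tm, sd, tm * sd])

# Hypothetical training data: per-GOP (TM, SD) and measured encoding times in seconds.
train_tm = [12.3, 30.1, 8.7, 25.4]
train_sd = [40.2, 55.8, 33.1, 60.9]
train_time = [3.9, 7.2, 2.8, 6.5]

model = LinearRegression().fit(build_features(train_tm, train_sd), train_time)

# Estimate encoding time for a new GOP before mapping it to a compute node.
predicted = model.predict(build_features([18.0], [47.5]))
print(f"estimated GOP encoding time: {predicted[0]:.2f} s")
```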
Problem Formulation
Problem formulation:
- Based on the approximation of each GOP's encoding time
- Given Q jobs
- Each job i has a deadline di and a processing time pi
- Multiple nodes run in parallel; each job is processed without preemption on one machine until its completion
- Lateness li is computed as ci (actual completion time) - di
- Upper bound on lateness: τ
- WANT: bound the lateness of these jobs, i.e., find the minimal number of machines N and minimize τ
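Restated compactly, with the same notation as the bullets above (a sketch of the optimization, not necessarily the paper's exact program):

$$l_i = c_i - d_i, \qquad \max_{1 \le i \le Q} l_i \le \tau, \qquad \text{minimize } N \text{ and then } \tau \text{ over non-preemptive schedules of the } Q \text{ jobs on } N \text{ machines}$$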
Complexity:
Solution:
- HallSh-based Mapping (HM)
- Lateness-first Mapping (LFM)
HallSh-based Mapping
HallSh-based Mapping (HM):
- Set an upper bound on τ and find the minimal N that satisfies it
- Use the HallSh machine scheduling algorithm as a black box
minMS2approx algorithm (a hedged sketch follows the steps):
1. Pick ε = min_i {(di - pi)/τ}
2. Run HallSh, increasing the number of machines until the maximum lateness among all jobs is below (1 + ε)·τ; set the machine number at this point to K
3. HallSh returns the scheduling results of all jobs. For a job whose lateness exceeds the upper bound on a particular machine j, move it, along with all subsequent jobs on that machine, to a new machine K + j, then compute the new completion times for all jobs on this new machine
4. N is the number of machines used
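A minimal sketch of the HM loop, treating HallSh as a black-box function hall_shmoys(jobs, k) with an assumed signature; the relocation in step 3 is simplified and may differ in detail from the paper:

```python
# Sketch of HallSh-based Mapping (HM). `hall_shmoys` is an assumed black box that
# schedules `jobs` (a list of (p_i, d_i) tuples) on k machines and returns
# (schedule, lateness): schedule[m] is the ordered list of job indices on machine m,
# and lateness is a dict mapping job index -> c_i - d_i.
def hm_mapping(jobs, tau, hall_shmoys, max_machines=64):
    # Step 1: epsilon from the tightest slack, relative to the lateness bound.
    eps = min((d - p) / tau for p, d in jobs)

    # Step 2: grow the machine count until max lateness drops below (1 + eps) * tau.
    for k in range(1, max_machines + 1):
        schedule, lateness = hall_shmoys(jobs, k)
        if max(lateness.values()) < (1 + eps) * tau:
            break
    K = k

    # Step 3: a job whose lateness still exceeds tau is moved, together with the jobs
    # scheduled after it on the same machine, onto a fresh machine.
    extra = {}
    for m, job_ids in enumerate(schedule):
        for pos, j in enumerate(job_ids):
            if lateness[j] > tau:
                extra[K + m] = job_ids[pos:]
                schedule[m] = job_ids[:pos]
                break

    # Step 4: N is the total number of machines actually used.
    n_used = sum(1 for s in schedule if s) + len(extra)
    return schedule, extra, n_used
```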
Lateness-first Mapping
Lateness-first Mapping (LFM):
- Compute the minimal N based on the deadline of each job, then minimize τ for the given N
- Deciding the minimum N:
  - Tpic(M)·R < SG·N
- Minimizing τ given N (sketched below):
  - For the i-th job in every N jobs, compute its adjusted processing time p'i = pi - (di - d1)
  - Sort the N jobs in decreasing order of p'i
  - Schedule the job with the largest p'i to the first available compute node, the second largest to the second available node, and so on
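A minimal sketch of LFM under the same job representation, with R and SG read as the picture rate and GOP size used in the inequality above (both interpretations are assumptions):

```python
import math

# Sketch of Lateness-first Mapping (LFM). `jobs` is a list of (p_i, d_i) tuples,
# ordered so that d1 is the earliest deadline within each batch of N jobs.
def lfm_mapping(jobs, t_pic, rate, gop_size):
    # Minimum N from the slide's inequality T_pic(M) * R < S_G * N.
    n = max(1, math.ceil(t_pic * rate / gop_size))

    machines = [0.0] * n      # next-free time of each compute node
    assignment = {}           # job index -> compute node
    for start in range(0, len(jobs), n):
        batch = jobs[start:start + n]
        d1 = batch[0][1]
        # Adjusted processing time p'_i = p_i - (d_i - d_1).
        adjusted = [(p - (d - d1), idx) for idx, (p, d) in enumerate(batch, start)]
        # Largest adjusted processing time goes to the earliest-available node, and so on.
        for p_adj, idx in sorted(adjusted, reverse=True):
            node = min(range(n), key=lambda m: machines[m])
            assignment[idx] = node
            machines[node] += jobs[idx][0]
    return n, assignment
```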
Test
SVC: JSVM
Environment:
- Input: 64 480p video GOPs
- GOP: 8 pictures
- Picture: 4 temporal layers, 2 spatial layers, 1 quality layer
- Up to 4 cores on each compute node
- Number of slices corresponding to the number of cores
Performance
Average encoding time and speedup using up to 4 cores in intra-node parallelism (figures for LFM and HM)
Comparing LFM & HM
- HM can successfully decide the appropriate number of compute nodes and limit the transcoding jitters
- HM may require a greater N than LFM to achieve the same level of lateness constraint
Cloud Download
Using cloud utilities to achieve high-quality content distribution for unpopular videos
Motivation:
- Video content distribution dominates Internet traffic
- High-quality video content distribution is of great significance:
  1. high data health
  2. high data transfer rate
Motivation of Cloud Download
High data health
- Data health: the number of available full copies of the shared file in a BitTorrent swarm
- Data health < 1.0 is unhealthy
- Data health represents the data redundancy level of a video file
High data transfer rate
- Enables online video streaming
- Live & VoD
State-of-the-art Techniques: CDN
CDN (Content Distribution Network):
- Strategically deploys edge servers
- Edge servers cooperate to replicate or move data according to data popularity and server load
- A user obtains a copy from a nearby edge server
CDN: limited storage and bandwidth
- Not cost-effective for a CDN to replicate unpopular videos to the edge servers
- A charged facility that only serves the content providers who have paid
State-of-the-art Techniques: P2P
P2P (Peer-to-Peer):
- End users form P2P data swarms
- Data is directly exchanged between peers
- Its real strength shows in popular file sharing
P2P: poor performance for unpopular videos
- Too few peers
  - Low data health
  - Low data transfer rate
Neither CDN nor P2P works well for distributing unpopular videos, due to low data health or low data transfer rate
The worldwide deployment of cloud utilities provides a novel perspective to solve the problem: Cloud Download!
Cloud Download
(Figure: users request videos through the cloud and receive them at high data rate)
Cloud Download
First, a user sends a video request to the cloud
Then, the cloud downloads the requested video from the file link and stores it in the cloud cache
Finally, the user retrieves the requested video from the cloud at high data rate via intra-cloud data transfer acceleration
User-side Energy Efficiency
Common download of an unpopular video:
- A common user keeps his computer (and NIC) powered on for long hours
- Much energy is wasted while waiting
Cloud download of an unpopular video:
- The user can simply stay "offline"
- When the video is ready, it is quickly retrieved in a short time
- User-side energy efficient!
Cloud Download: View Startup Delay
The only drawback of Cloud Download:
- For some videos, the user must wait for the cloud to download them first:
  - View startup delay
This drawback is effectively alleviated:
- By the implicit and secure data reuse among users
- The cloud only downloads a video when it is requested for the first time:
  - Cloud cache!
  - Subsequent requests are satisfied directly
  - Secure because the reuse is oblivious to users
  - Data reuse rate reaches 87%
System Architecture
(Architecture diagram: video request → check cache → data download → data store/cache → data transfer to the user at high data rate)
Component Function
- ISP Proxy: receives and restricts requests in each ISP
- Task Manager: checks the cache
- Task Dispatcher: load balancing
- Downloaders: download data
- Cloud Cache: stores and uploads data
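A hedged sketch of how a request could flow through these components; all names below are hypothetical and only illustrate the cache-hit/cache-miss path described above:

```python
# Illustrative request flow through the Cloud Download components (names hypothetical).

class CloudCache:
    """The cloud cache as seen by the Task Manager: a simple link -> data store."""
    def __init__(self):
        self._store = {}

    def get(self, link):
        return self._store.get(link)

    def put(self, link, data):
        self._store[link] = data


def handle_request(link, cache, downloaders):
    """ISP Proxy entry point: serve a video request, downloading only on a cache miss."""
    data = cache.get(link)                                      # Task Manager: check cache
    if data is None:
        downloader = min(downloaders, key=lambda d: d["load"])  # Task Dispatcher: load balance
        data = f"<bytes of {link}>"                             # Downloader: fetch from source (stubbed)
        downloader["load"] += 1
        cache.put(link, data)                                   # Cloud Cache: store for later reuse
    return data                                                 # uploaded to the user at high intra-cloud rate


# The second request for the same link is served straight from the cache.
cache = CloudCache()
downloaders = [{"id": 1, "load": 0}, {"id": 2, "load": 0}]
handle_request("http://example.com/video.mp4", cache, downloaders)
handle_request("http://example.com/video.mp4", cache, downloaders)
```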
Hardware Composition
Building Block | # of Servers | Memory | Storage | Bandwidth
ISP Proxy | 6 | 8 GB | 250 GB | 1 Gbps (Intranet), 0.3 Gbps (Internet)
Task Manager | 4 | 8 GB | 250 GB | 1 Gbps (Intranet)
Task Dispatcher | 3 | 8 GB | 460 GB | 1 Gbps (Intranet)
Downloaders | 140 | 8 GB | 460 GB | 1 Gbps (Intranet), 0.325 Gbps (Internet)
Cloud Cache | 400 chunk servers, 93 upload servers, 3 index servers | 8 GB | 4 TB (chunk server), 250 GB (upload server) | 1 Gbps (Intranet), 0.3 Gbps (Internet)
Cache Capacity Planning & Replacement Strategy
Handle 0.22M daily requests:
- Average video size: 379 MB
- Video cache duration: < 7 days
- Thus, C = 379 MB × 0.22M × 7 ≈ 584 TB (checked below)
Cache replacement strategies:
- 17-day trace-driven simulations
- FIFO vs. LRU vs. LFU
- FIFO worst, LFU best!
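As a quick check of the capacity figure above (reading 0.22M as 0.22 × 10^6 requests per day and 1 TB as 10^6 MB):

$$C \approx 379\ \text{MB} \times 0.22 \times 10^{6}\ \tfrac{\text{requests}}{\text{day}} \times 7\ \text{days} \approx 584\ \text{TB}$$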
Performance Evaluation
Dataset:
- Complete running log of the VideoCloud system over 17 days: Jan. 1, 2011 - Jan. 17, 2011
- 3.87M video requests, around 1.0M unique videos
Metrics:
- Data transfer rate
- View startup delay
- Energy efficiency
Data transfer rate & View startup delay
Energy Efficiency
User-side energy efficiency:
- E1: users' energy consumption using common download
- Eu: users' energy consumption using cloud download
- User-side energy efficiency = (E1 - Eu)/E1 = 92%
Overall energy efficiency:
- Ec: the cloud's energy consumption
- E2: the total energy consumption of the cloud and users, so E2 = Ec + Eu
- Overall energy efficiency = (E1 - E2)/E1 = 86%
Cloud Download application
Cloud transcoding for mobile users:
- http://xf.qq.com
- A mobile user submits a video link and the transcoding parameters to the cloud
- The cloud downloads the video from the Internet via cloud download
- The cloud transcodes the downloaded video and transfers the transcoded video back to the user
References
- Huang et al., "CloudStream: Delivering high-quality streaming videos through a cloud-based SVC proxy," INFOCOM 2011.
- Huang et al., "Cloud Download: Using cloud utilities to achieve high-quality content distribution for unpopular videos," ACM Multimedia 2011.
- http://www.slideshare.net/AmazonWebServices/aws-for-media-content-inthe-cloud-miles-ward-amazon-web-services-and-bhavik-vyas-aspera
- The QQCyclone platform. http://xf.qq.com
Thank you !