ShadowStream: Performance Experimentation as a
Capability in Production Internet Live Streaming
Networks
Presented by: Chen Alexandre Tian (HUST)
Richard Alimi (Google)
Richard Yang (Yale)
David Zhang (PPLive)
1
Live Streaming is Widely Used
• Many recent major events live streamed
on the Internet
• Many daily events are streamed as well
• Justin.tv, livestream, …
2
State of the Art of Live Streaming Systems
Hybrid system (e.g., Adobe Flash 10.1 and later)
• CDN seeding
• P2P with BitTorrent-like protocols
3
Performance of Live Streaming Systems
Becomes Difficult to Understand/Predict
System software is becoming more complex
4
Internet Environment Complexity
• ADSL modem buffers
• PowerBoost
• Inter-ISP throttling
• …
Results are misleading if real network
features are not considered.
5
Need Evaluation at the Right Scale
Results are misleading if the target scale
is not considered.
6
Key Idea of ShadowStream
The production system provides an
ideal evaluation platform: real users,
real networks, at scale.
7
Starting Point: Use the Experiment Algorithm
on Real Users
First Challenge: How to achieve both accuracy and user protection?
[Figure: the Experiment engine plays at a virtual playpoint ahead of the user's playpoint; a piece missed at the virtual playpoint is recorded there, and CDN Protection injects the missing piece before the user's playpoint reaches it two seconds later.]
8
Issues of CDN Protection
• Scale
  • 100,000 clients @ 1 Mbps → 100 Gbps
  • More demand with concurrent test channels
• Network bottleneck
  • There can be bottlenecks from CDN edge
    servers to streaming clients
9
New Idea: Scaling Up with Stable Protection
Observation: a stable version with reasonable
performance already exists.
Issue: loss of experiment accuracy.
10
Why the Loss of Accuracy?
11
Converge to a Balance Point
We should observe m(θ0), but we
actually observe m(θ’).
12
Putting-Together: Cascading Protection
for Accuracy and Scalability
Q: Any remaining challenge?
13
Real User Behaviors Differ from Testing
Behaviors
Idea: transparently orchestrate experimental
scenarios from existing, already-playing clients
• Virtual arrivals / virtual departures
[Figure: orchestration pipeline: test specification → triggering → virtual arrival control → virtual departure control]
14
Independent Arrivals Achieving a Global
Arrival Pattern
Peers generate arrival times by drawing random
numbers independently from the same
cumulative distribution function.
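The scheme above can be sketched as per-client inverse-CDF sampling (an illustrative sketch; the function and variable names are assumptions, not ShadowStream's API):

```python
import bisect
import random

def sample_arrival_time(times, cdf, u=None):
    """Draw one arrival time by inverting a discretized CDF.

    times: sorted candidate arrival times (seconds)
    cdf:   cumulative probabilities aligned with `times` (last value 1.0)
    Each client calls this independently with its own random draw, so
    the aggregate over many clients follows the target arrival pattern.
    """
    if u is None:
        u = random.random()
    i = bisect.bisect_left(cdf, u)       # first bucket whose cumulative mass covers u
    return times[min(i, len(times) - 1)]

# Example target pattern: 20% of arrivals by t=10 s, the rest by t=60 s.
random.seed(0)                            # fixed seed for reproducibility
times, cdf = [10, 60], [0.2, 1.0]
draws = [sample_arrival_time(times, cdf) for _ in range(10000)]
early = sum(1 for t in draws if t == 10) / len(draws)   # close to 0.2
```

No coordination is needed: because every client inverts the same CDF, the global arrival pattern emerges from independent local draws.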
15
From Idea to System
Challenge:
How to minimize developers’ engineering effort?
16
Streaming Hypervisor
Hypervisor API needed for each streaming engine:
• getSysTime()
• getLagRange(), getMaxStartupDelay()
• writePiece(), getPieceMap()
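A minimal sketch of how this per-engine API could look; the method names come from the slide, while the bodies and default values are illustrative assumptions:

```python
import time

class StreamingHypervisor:
    """Sketch of the per-engine hypervisor API named on this slide.
    Method names are from the slide; bodies are illustrative
    assumptions, not ShadowStream's implementation."""

    def __init__(self, time_shift=0.0, lag_range=(2.0, 30.0), max_startup_delay=5.0):
        self.time_shift = time_shift        # seconds this engine lags realtime
        self.lag_range = lag_range          # allowed lag bounds, in seconds
        self.max_startup_delay = max_startup_delay
        self.pieces = {}                    # piece index -> payload

    def getSysTime(self):
        # Virtual time: real time minus the engine's configured shift.
        return time.time() - self.time_shift

    def getLagRange(self):
        return self.lag_range

    def getMaxStartupDelay(self):
        return self.max_startup_delay

    def writePiece(self, index, data):
        self.pieces[index] = data

    def getPieceMap(self):
        # Bitmap-like view of which pieces this engine holds.
        return sorted(self.pieces)

hv = StreamingHypervisor(time_shift=2.0)
hv.writePiece(91, b"payload")
```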
17
Computing Window Bounds
• Hypervisor calls getLagRange()
18
Sharing and Information Flow Control
19
Compositional Software Framework
Example: Adding an admission control
component
20
Evaluation:
Experiment Accuracy & Protection
Only CDN as the Protection:
Cascaded Protection:
21
Evaluation:
Experimental Opportunities
SH Sports channel and HN Satellite channel, pplive,
September 6, 2010
22
Evaluation:
Accuracy of Distributed Arrivals
Arrival function from: Sachin Agarwal, Jatinder Pal Singh, Aditya Mavlankar, Pierpaolo
Baccichet, and Bernd Girod, “Performance and Quality-of-Service Analysis of a Live P2P Video
Multicast Session on the Internet,” in Proceedings of IWQoS 2008, Springer, June 2008.
23
Take Home Idea
Many Internet-scale systems are unique
systems that are difficult to build/test.
The ShadowStream scheme consists of the
following key ideas:
• Conduct shadow experiments using the real
  system and real users
• Protection and accuracy present dual challenges
  • Use Stable for scalable protection
  • Introduce external resources (CDN) to remove
    interference on competing resources
• Create shadow behaviors from real users
24
Thanks for coming!
Questions?
25
Metric of Live Streaming Performance
Piece missing ratio
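As a trivial sketch (function and variable names are hypothetical), the metric is the fraction of expected pieces that missed their playback deadline:

```python
def piece_missing_ratio(delivered, expected):
    """Fraction of pieces that missed their playback deadline.

    delivered: set of piece indices received before their deadline
    expected:  set of piece indices that should have been played
    """
    if not expected:
        return 0.0
    missing = expected - delivered
    return len(missing) / len(expected)

# Example: 2 of 100 pieces missed their deadline -> ratio 0.02
expected = set(range(100))
delivered = expected - {17, 91}
ratio = piece_missing_ratio(delivered, expected)
```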
26
Backup Slides
27
Streaming on the Internet
28
Virtual Sliding Window
A streaming engine has two sliding
windows: an upload window (P2P) and a
download window (CDN and P2P).
Each engine calls getSysTime() on the Hypervisor;
based on the real system time and the engine's
time-shift value, the Hypervisor assigns a virtual
system time to each engine.
Each engine calculates the left and right bounds
of its download window.
Each engine advances its sliding window at
the channel rate μ pieces per second.
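The window computation above can be sketched as follows; the window is expressed in pieces, and the exact bound formulas are illustrative assumptions:

```python
def window_bounds(virtual_time, start_time, rate, window_len):
    """Compute download-window bounds for an engine (a sketch).

    virtual_time: engine's virtual system time, from getSysTime()
    start_time:   virtual time at which the channel started
    rate:         channel rate mu, in pieces per second
    window_len:   window length, in pieces
    The window advances at the channel rate: the right edge tracks
    the newest piece generated so far, the left edge trails it by
    window_len pieces.
    """
    right = int((virtual_time - start_time) * rate)
    left = max(0, right - window_len)
    return left, right

# An engine lagging 2 s behind realtime at t=100 s,
# channel at 10 pieces/s, 50-piece window:
left, right = window_bounds(virtual_time=100.0 - 2.0, start_time=0.0,
                            rate=10, window_len=50)
```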
29
30
The Reasoning Behind
• The CDN sees the original miss-ratio/supply-ratio curve
• P2P Protection sees the curve minus δ
31
Specification
Define multiple classes of clients (e.g.,
cable or DSL, estimated upload capacity
class, or network location).
Each class j has an arrival rate function λj(t).
A client's lifetime is determined by the
distribution Lx.
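A specification of this shape could be modeled as follows (field names and schema are assumptions for illustration, not ShadowStream's format):

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ClientClass:
    """One class of clients in a test specification (illustrative sketch)."""
    name: str                                 # e.g. "cable" or "dsl"
    arrival_rate: Callable[[float], float]    # lambda_j(t), arrivals per second
    lifetime_cdf: Callable[[float], float]    # lifetime distribution L_x

@dataclass
class TestSpec:
    classes: Dict[str, ClientClass]

# Example: a flash crowd of DSL clients, then a long tail.
spec = TestSpec(classes={
    "dsl": ClientClass(
        name="dsl",
        arrival_rate=lambda t: 5.0 if t < 60 else 1.0,  # arrivals/s over time
        lifetime_cdf=lambda x: min(1.0, x / 600.0),     # uniform on [0, 600] s
    ),
})
```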
32
Local Replacement for Uncontrolled
Early Departures
Capturing client state
Substitution
33
Triggering Condition
Predict(t): an autoregressive integrated moving
average (ARIMA) method that uses both
recent test-channel states and the past
history of the same program
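The slide names ARIMA; as a much simpler stand-in, the triggering check can be sketched with a moving-average forecast (names and threshold logic are hypothetical):

```python
def predict_next(history, window=5):
    """Stand-in for the slide's Predict(t): forecast the next channel-state
    sample from recent history. The slide uses ARIMA; this sketch uses a
    plain moving average over the last `window` samples instead."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def should_trigger(history, threshold, window=5):
    """Trigger the experiment only if the predicted viewer count
    reaches the scale the test specification requires."""
    return predict_next(history, window) >= threshold

# Recent concurrent-viewer samples for the channel:
viewers = [900, 950, 1000, 1100, 1050]
should_trigger(viewers, threshold=1000)   # predicted 1000.0 viewers
```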
34
Independent Arrivals Algorithm
35
CDN Capacity and Window Length
CDN window set to 4 seconds:
• The TCP retransmission timeout is 3
  seconds for piece loss
• 1 extra second for waiting for the
  retransmitted piece
Window length
36
Starting up the engine
When starting a streaming engine x, the Streaming
Hypervisor gives x pointers to its download and
upload windows.
At time a(s), the client joins the test channel and
the Stable engine starts.
At time a(e) > a(s), the client joins the testing; the
Experiment Engine and CDN Protection Engine start.
After starting, an engine begins to download pieces
from the target playpoint to the end of its
download window.
Pieces before startup should be protected by the
CDN and counted in the CDN capacity calculation.
37
ShadowStream Outline
Motivation and Challenge
Experiment Protection and Accuracy
Experiment Orchestration
Implementation
Evaluation
38
Client Substitution
Client substitution delay with client
dynamics.
39
Backup Slides
40
Sec. 8: Limitation Discussion
(Do we really need this?)
If the Experiment consumes resources while no
pieces are received at all (give priority to
Protection?)
The download link is the bottleneck
41
Modeling P2P Protection
Given an experiment engine e and target rate R,
the miss ratio is mR,e(θ), or me(θ).
Given a protection engine p, the required rescue
bandwidth is η(e,p,θ) = Θk(me(θ), p) · me(θ).
42
P2P Protection Gives No Accurate Result
• If P1 is the protection, there would exist balance point(s)
• If P2 is the protection, there would be a negative feedback loop
• In either case, there is no accuracy at all
43
44
45
Live Streaming
Live Streaming on the Internet
• Live audio/video content distribution on the
  Internet
  • e.g., NBC Winter Olympics 2010 live, using
    Microsoft Silverlight® P2P live streaming
46
47
Example: PPLive
From PPLive’s Presentation
Founded by graduate students from Huazhong University
of Science & Technology
PPLive is:
• An online video broadcasting and advertising network
• Provides an online viewing experience comparable to TV
• An efficient P2P technology platform and test bench

Estimated global installed base: 75 million
Monthly active users*: 20 million
Daily active users: 3.5 million
Peak concurrent users: 2.2 million
Monthly average concurrent users: 1.5 million
Weekly average usage time: 11 hours
48
49
50
51
Challenges
How to achieve both experiment
accuracy and user protection?
How to produce desired experiment
pattern?
How to minimize developers’
engineering effort?
52
Starting Point: Use the Experiment Algorithm
on Real Users
53
A simple example
• No user-visible piece misses
• Missing piece 91 is recorded
• Piece download assignment is adaptive
54
55
Three issues
• Information flow control: although piece 91 is
  downloaded by the Protection Engine, it should not
  be labeled as downloaded in the Experiment
  Engine.
• Duplicate avoidance: since both the Experiment
  Engine and the Protection Engine are running, if their
  download windows overlap, they may download the
  same piece.
• Experiment feasibility: the lag from realtime is
  determined when client i joins the test channel with
  the Protection Engine, to make both experiment and
  protection feasible.
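The first two issues can be illustrated with a small sketch (an illustration of the design constraint, not ShadowStream's implementation): each engine's piece map reflects only its own downloads, while the shared store rejects duplicate writes:

```python
class PieceStore:
    """Sketch of per-engine piece visibility. The shared store holds
    the actual data so the viewer never sees a miss, but each engine
    only 'sees' the pieces it downloaded itself."""

    def __init__(self):
        self.data = {}     # piece index -> payload (shared by all engines)
        self.owner = {}    # piece index -> engine that fetched it first

    def write(self, engine, index, payload):
        # Duplicate avoidance: only the first engine to fetch a piece stores it.
        if index in self.data:
            return False
        self.data[index] = payload
        self.owner[index] = engine
        return True

    def piece_map(self, engine):
        # Information flow control: an engine's map shows only its own
        # downloads, so a piece rescued by Protection still counts as
        # an Experiment miss.
        return {i for i, e in self.owner.items() if e == engine}

store = PieceStore()
store.write("experiment", 90, b"ok")
store.write("protection", 91, b"rescued")
# Piece 91 is playable (in the shared store) yet absent from the
# experiment's own piece map.
```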
56
Download