Oral Presentation

An Empirical Study of Flash Crowd
Dynamics in a P2P-based Live Video
Streaming System
Bo Li, Gabriel Y. Keung, Susu Xie, Fangming Liu, Ye Sun, and
Hao Yin
Email: lfxad@cse.ust.hk
Hong Kong University of Science & Technology
Dec 2, 2008 @ IEEE GLOBECOM, New Orleans
Overview: Internet Video Streaming

Enables video distribution from any place to anywhere in the world, in any format
Cont.

Recently, there has been significant deployment of Peer-to-Peer (P2P) technology for Internet live video streaming

Protocol designs: Overcast, CoopNet, SplitStream, Bullet, etc.
Real deployments: ESM, CoolStreaming, PPLive, etc.
Key advantages:
 Easy to deploy: requires minimum support from the infrastructure
 Good scalability: greater demands also generate more resources, since each peer not only downloads the video content but also uploads it to other participants
Challenges

Real-time constraints: requiring timely and sustained streaming delivery to all participating peers
Performance-demanding: involving bandwidth requirements of hundreds of kilobits per second, and even more for higher quality video
Large-scale and extreme peer dynamics: tens of thousands of users simultaneously participating in the streaming, joining and leaving at will; especially challenging under flash crowd
Motivation
Challenge: Large-scale & extreme peer dynamics
Current P2P live streaming systems still suffer from potentially
long startup delay & unstable streaming quality
Especially under realistic challenging scenarios such as flash crowd

Flash crowd


A large increase in the number of users joining the streaming
in a short period of time (e.g., during the initial few minutes of
a live broadcast program)
Difficult to quickly accommodate new peers within a
stringent time constraint, without significantly impacting the
video streaming quality of existing and newly arrived peers

Different from file sharing

Cont.

There has been little prior study on the detailed dynamics of P2P live streaming systems during a flash crowd and its impacts

E.g., Hei et al.'s measurement on PPLive examined the dynamics of the user population during the annual Spring Festival Gala on Chinese New Year
Focus
How to capture various effects of flash crowd in
P2P live streaming systems?
What are the impacts of flash crowd on
user experience & behaviors, and on system scale?
What are the rationales behind them?
Outline

System Architecture

Measurement Methodology

Important Results




Short Sessions under Flash Crowd
User Retry Behavior under Flash Crowd
System Scalability under Flash Crowd
Summary
Some Facts of the CoolStreaming System

CoolStreaming: Cooperative Overlay Streaming
First released in 2004
Roxbeam Inc. received a USD 30M investment; currently deployed through YahooBB, the largest video streaming portal in Japan

Downloads: 2,000,000
Average online users: 20,000
Peak-time online users: 150,000
Google entries (keyword: Coolstreaming): 400,000
CoolStreaming System Architecture

Membership manager: maintains a partial view of the overlay via gossip
Partnership manager: establishes & maintains TCP connections (partnerships) with other nodes; exchanges data availability via Buffer Maps (BM)
Stream manager: provides stream data (segments) to the local player; decides where and how to retrieve stream data; hybrid push & pull
Mesh-based (Data-driven) Approaches

No explicit structures are constructed and maintained (e.g., Coolstreaming, PPLive)
Data flow is guided by the availability of data:
 The video stream is divided into segments of uniform length; the availability of segments in a peer's buffer is represented by a buffer map (BM)
 Peers periodically exchange data availability info with a set of partners (a partial view of the overlay) and retrieve currently unavailable data from each other
 A segment scheduling algorithm determines which segments are to be fetched from which partners (see the sketch below)
Overhead & delay: peers need to explore content availability with one another, which is usually achieved with a gossip protocol
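To make the buffer-map exchange and scheduling concrete, the following is a minimal sketch in Python (not the actual CoolStreaming implementation; the partner data structures and the load-balancing tie-break are illustrative assumptions):

    # Sketch of buffer-map driven segment scheduling (illustrative only).
    # A buffer map is modeled as the set of segment ids a peer currently holds.

    def schedule_requests(local_bm, partner_bms, urgency_order):
        """Decide which missing segments to request from which partners.

        local_bm:      set of segment ids held locally
        partner_bms:   dict partner_id -> set of segment ids that partner holds
        urgency_order: list of segment ids, most urgent (earliest deadline) first
        Returns: dict partner_id -> list of segment ids to request
        """
        requests = {p: [] for p in partner_bms}
        for seg in urgency_order:
            if seg in local_bm:
                continue                      # already have this segment
            holders = [p for p, bm in partner_bms.items() if seg in bm]
            if not holders:
                continue                      # no partner has it yet; retry next round
            # Assumption: pick the partner with the fewest pending requests,
            # a simple load-balancing heuristic rather than the paper's exact rule.
            chosen = min(holders, key=lambda p: len(requests[p]))
            requests[chosen].append(seg)
        return requests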
Measurement Methodology

Each user reports its activities & internal status to the log server periodically
Using HTTP: each peer's log is compacted into the query-parameter part of a URL string (see the sketch below)

3 types of status report
 QoS report: % of video data missing the playback deadline
 Traffic report
 Partner report

4 events of each session
 Join event
 Start subscription event
 Media player ready event: the peer has received sufficient data to start playing
 Leave event
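As an illustration of this HTTP-based reporting, the sketch below builds such a log URL; the server address, path, and parameter names are hypothetical, since the slides do not give the actual reporting format:

    # Illustrative sketch of periodic status reporting over HTTP.
    # The endpoint and parameter names are assumptions, not CoolStreaming's real format.
    from urllib.parse import urlencode

    LOG_SERVER = "http://log.example.com/report"   # hypothetical log server

    def build_qos_report_url(peer_id, session_id, miss_ratio, num_partners):
        params = {
            "type": "qos",                 # report type: qos / traffic / partner
            "peer": peer_id,
            "session": session_id,
            "miss": f"{miss_ratio:.4f}",   # fraction of video data missing the playback deadline
            "partners": num_partners,
        }
        return LOG_SERVER + "?" + urlencode(params)

    # A peer would fetch a URL like this periodically, e.g.:
    # build_qos_report_url("peer-42", "sess-7", 0.013, 5)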
Log & Data Collection

Real-world traces obtained from a live event broadcast on Yahoo Japan using the CoolStreaming system
 A sports channel on Sept. 27, 2006 (24 hours)
 Live baseball game broadcast at 18:00
 Stream bit-rate is 768 Kbps
 24 dedicated servers with 100 Mbps connections
How to capture flash crowd effects?

Two key measures:

Short session distribution
 Counts sessions that either fail to start viewing a program or whose service is disrupted during the flash crowd
 Session duration is the time interval between a user joining and leaving the system

User retry behavior
 To cope with the possible service disruption often observed during a flash crowd, each peer can reconnect (retry) to the program
Short Sessions under Flash Crowd

Filter out normal sessions (i.e., users who successfully join the program)
Focus on short sessions with duration <= 120 sec and <= 240 sec (extracted as in the sketch below)
The number of short sessions increases significantly at around 18:00, when the flash crowd occurs with a large number of peers joining the live broadcast program
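A minimal sketch of how such short sessions could be extracted from the session logs; the record format and the per-minute bucketing are illustrative assumptions, while the 120-second threshold comes from the slide:

    # Sketch: counting short sessions per minute from join/leave timestamps.
    # Each session record is assumed to be (join_time, leave_time) in seconds.
    from collections import Counter

    def short_session_counts(sessions, threshold=120):
        """Count sessions no longer than `threshold` seconds, bucketed by join minute."""
        counts = Counter()
        for join_time, leave_time in sessions:
            if leave_time - join_time <= threshold:
                counts[join_time // 60] += 1   # bucket by the minute the peer joined
        return counts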
Strong Correlation Between the Number
of Short Sessions and Peer Joining Rate
What are the rationales behind these observations?

Relevant factors:
 User client connection fault
 Insufficient uploading capacity from at least one of the parents
 Poor sustainable bandwidth at the beginning of the stream subscription
 Long waiting time (timeout) for accumulating sufficient video content in the playback buffer

Newly arriving peers do not have adequate content to share with others; thus initially they can only consume the uploading capacity of existing peers
With only partial knowledge (gossip), the delay in gathering enough upload bandwidth resources among peers and the heavy resource competition could be the fundamental bottleneck
Approximate User Impatient Time

In the face of poor playback continuity, users either reconnect or opt to leave
Compare the total downloaded bytes of a session with the expected total playback video bytes according to the session duration
Extract sessions with insufficient downloaded bytes (see the sketch below)
The avg. user impatient time is between 60 s and 120 s
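A minimal sketch of the comparison described above, assuming the 768 Kbps stream bit-rate from the trace; the session record fields and the 50% "insufficient" cutoff are illustrative assumptions:

    # Sketch: flag sessions whose download volume falls short of what continuous
    # playback over the session duration would require (768 Kbps stream).
    STREAM_BITRATE_BPS = 768_000          # bits per second, from the measured trace

    def is_starved_session(duration_sec, downloaded_bytes, cutoff=0.5):
        """True if the peer downloaded less than `cutoff` of the bytes needed
        for continuous playback over its session duration."""
        expected_bytes = STREAM_BITRATE_BPS * duration_sec / 8
        return downloaded_bytes < cutoff * expected_bytes

    def impatient_durations(sessions, cutoff=0.5):
        """Durations of starved sessions; their distribution approximates how long
        users wait before giving up (the 'impatient time')."""
        return [d for d, dl in sessions if is_starved_session(d, dl, cutoff)]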
User Retry Behavior under Flash Crowd

Retry rate: the number of peers that opt to re-join the overlay with the same IP address and port, per unit time (see the sketch below)
User perspective: playback could be restored
System perspective: retries amplify the join rate
Users could have tried many times before successfully starting a video session
Again shows that the flash crowd has a significant impact on the initial joining phase
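A minimal sketch of this retry-rate computation from the join events, using the same (IP, port) criterion as above; the event record layout is an assumption:

    # Sketch: retry rate = number of re-joins from the same (IP, port) per minute.
    from collections import Counter, defaultdict

    def retry_rate_per_minute(join_events):
        """join_events: iterable of (timestamp_sec, ip, port), in time order."""
        joins_seen = defaultdict(int)   # (ip, port) -> number of joins so far
        retries = Counter()             # minute -> number of retry joins
        for ts, ip, port in join_events:
            key = (ip, port)
            if joins_seen[key] > 0:     # any subsequent join counts as a retry
                retries[ts // 60] += 1
            joins_seen[key] += 1
        return retries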
System Scalability under Flash Crowd

Media player ready: the peer has received sufficient data to start playing, i.e., it has successfully joined
The gap between the join rate and the media player ready rate illustrates the "catch-up process"
The media player ready rate picks up when the flash crowd occurs and increases steadily; however, the ratio between these two rates stays <= 0.67
This implies that the system has the capability to accommodate a sudden surge of user arrivals (flash crowd), but only up to some maximum limit

Media Player Ready Time under different time periods
Considerably longer during the period when the peer join rate is higher
Scale-Time Relationship

System perspective:
 Though there could be enough aggregate resources brought by newly arriving peers, they cannot be utilized immediately
 It takes time for the system to exploit such resources, i.e., newly arriving peers (with only a partial view of the overlay) need to find & consume existing resources to obtain adequate content for startup before they can contribute to others

User perspective:
 This causes long startup delay & disrupted streaming (thus short sessions, retries, and impatience)

Future work: what is the relationship between the amount of initial buffering and the system scale? (long buffering means longer startup delay; short buffering hurts continuity)
Summary

Based on real-world measurements, we capture flash crowd effects:
 The system can scale up to a limit during the flash crowd
 Strong correlation between the number of short sessions and the joining rate
 The user behavior during flash crowd is best captured by the number of short sessions, retries, and the impatient time
 Relevant rationales behind these findings
Future work


Modeling to quantify and analyze flash crowd
effects
Correlation among initial system capacity, the user
joining rate/startup delay, and system scale?


Intuitively, a larger initial system size can tolerate a higher
joining rate
Challenge: how to formulate the factors and performance
gaps relevant to partial knowledge (gossip)?
Based on the above study, and perhaps more importantly for practical systems, how can servers help alleviate the flash crowd problem, i.e., shorten users' startup delays and boost system scaling?
 Commercial systems have utilized self-deployed servers or CDNs
 Coolstreaming on Yahoo Japan used 24 servers in different regions, which allowed users to join a program in the order of seconds
 PPLive utilizes CDN services
 On the measurement side, examine what real-world systems do and experience
 On the technical side, derive the relationship between the amount of server provisioning and the expected number of viewers along with their joining behaviors; further, study how servers should be geographically distributed
References

B. Li, S. Xie, Y. Qu, Y. Keung, C. Lin, J. Liu, and X. Zhang, "Inside the New Coolstreaming: Principles, Measurements and Performance Implications," in Proc. of IEEE INFOCOM, Apr. 2008.

Susu Xie, Bo Li, Gabriel Y. Keung, and Xinyan Zhang, "Coolstreaming: Design, Theory and Practice," IEEE Transactions on Multimedia, 9(8): 1661-1671, December 2007.

Bo Li, Susu Xie, Gabriel Y. Keung, Jiangchuan Liu, Ion Stoica, Hui Zhang, and Xinyan Zhang, "An Empirical Study of the Coolstreaming+ System," IEEE Journal on Selected Areas in Communications, 25(9): 1-13, December 2007.
Q&A
Thanks !
Additional Info & Results
Comparison with the first release

The initial system adopted a simple pull-based scheme
 Content availability information exchanged using buffer maps
 Per-block overhead
 Longer delay in retrieving the video content

Implemented a hybrid pull and push mechanism
 Blocks are pushed by a parent node to a child node, except for the first block
 Lower overhead associated with each video block transmission
 Reduces the initial delay and increases the video playback quality

Multiple sub-stream scheme is implemented
 Enables multi-source and multi-path delivery for video streams

Gossip protocol was enhanced to handle the push function
Buffer management and scheduling schemes were re-designed to deal with the dissemination of multiple sub-streams
Gossip-based Dissemination

Gossip protocol (as used in BitTorrent)
Iteration (see the sketch below):
 Nodes send messages to random sets of nodes
 Each node does similarly in every round
 Messages gradually flood the whole overlay
Pros: simple, robust to random failures, decentralized
Cons: latency trade-off
Related to Coolstreaming: updated membership content; multiple sub-streams
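A minimal sketch of one such gossip round for membership dissemination; the fanout value and the message contents are illustrative assumptions:

    # Sketch: one round of gossip-based membership dissemination.
    import random

    FANOUT = 3   # assumed number of random targets contacted per round

    def gossip_round(node_id, known_peers, send):
        """Send this node's membership view to a few randomly chosen peers.

        known_peers: set of peer ids this node currently knows about
        send(target, message): transport function supplied by the caller
        """
        pool = sorted(known_peers - {node_id})
        for target in random.sample(pool, min(FANOUT, len(pool))):
            # The message carries this node's partial view; receivers merge it into
            # their own view, so membership info gradually floods the overlay.
            send(target, {"from": node_id, "peers": list(known_peers)})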
Multiple Sub-streams

The video stream is divided into blocks
Each block is assigned a sequence number
An example of stream decomposition is sketched below
Adoption of the gossip concept from P2P file-sharing applications
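A minimal sketch of such a stream decomposition, assuming blocks are assigned to sub-streams round-robin by sequence number (the slides do not specify the exact mapping):

    # Sketch: decompose a block sequence into K sub-streams round-robin.
    # Only the block / sequence-number structure comes from the slides; the
    # round-robin mapping itself is an assumption.

    def split_into_substreams(blocks, num_substreams=2):
        """blocks: iterable of (sequence_number, payload) tuples.
        Returns a list with one block list per sub-stream."""
        substreams = [[] for _ in range(num_substreams)]
        for seq, payload in blocks:
            substreams[seq % num_substreams].append((seq, payload))
        return substreams

    # With two sub-streams, blocks 0, 2, 4, ... form sub-stream 0 and blocks
    # 1, 3, 5, ... form sub-stream 1, so each sub-stream can come from a
    # different parent (multi-source, multi-path delivery).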
Buffering

Synchronization buffer
 Received blocks are first put into the synchronization buffer of the corresponding sub-stream
 Blocks with continuous sequence numbers are combined
Cache buffer
 Combined blocks are stored in the cache buffer (see the sketch below)
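A minimal sketch of this two-stage buffering, assuming blocks move to the cache buffer as soon as the next expected sequence number is available (a simplification of the actual CoolStreaming logic):

    # Sketch: per-sub-stream synchronization buffers feeding a single cache buffer.
    # Blocks are merged back into sequence order before being cached for playback.
    import heapq

    class Buffering:
        def __init__(self, num_substreams=2):
            self.sync = [[] for _ in range(num_substreams)]  # one min-heap per sub-stream
            self.cache = []      # combined, in-order blocks ready for the player
            self.next_seq = 0    # next sequence number expected overall
            self.k = num_substreams

        def receive(self, seq, payload):
            # Each received block first goes to the sync buffer of its sub-stream.
            heapq.heappush(self.sync[seq % self.k], (seq, payload))
            self._combine()

        def _combine(self):
            # Move blocks to the cache as long as the next expected one is present.
            while True:
                heap = self.sync[self.next_seq % self.k]
                if heap and heap[0][0] == self.next_seq:
                    self.cache.append(heapq.heappop(heap))
                    self.next_seq += 1
                else:
                    break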
Comparison with the 1st release (II)
Comparison with the 1st release (III)
Parent-children and partnership

Partners are connected via TCP connections
Parents supply video streams to their children over TCP connections
System Dynamics
Peer Join and Adaptation

Stream bit-rate normalized to ONE
Two sub-streams
The weight of a node is its outgoing bandwidth
Node E is a newly arrived peer
Peer Adaptation
Peer Adaptation in Coolstreaming

Inequality (1) is used to monitor the buffer status of the received sub-streams at node A
 If this inequality does not hold, it implies that at least one sub-stream is delayed beyond the threshold value Ts
Inequality (2) is used to monitor the buffer status in the parents of node A
 If this inequality does not hold, it implies that a parent node is considerably lagging behind in the number of blocks received compared to at least one of the partners that is currently not a parent of node A
(a sketch of these two monitoring checks follows)
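A minimal sketch of the two monitoring checks, implemented directly from the textual description above; the inequalities themselves appear only as a figure in the slides, so this exact formulation (and the threshold name Tp for the second check) is an assumption:

    # Sketch: buffer-status monitoring for peer adaptation, per the prose above.
    # hs[s]      = highest block sequence number received for sub-stream s at node A
    # parent_hs  = highest sequence numbers reported by node A's current parents
    # partner_hs = highest sequence numbers reported by partners that are not parents

    def substreams_in_sync(hs, Ts):
        """Check (1): no received sub-stream lags the most advanced sub-stream
        by more than Ts blocks."""
        return max(hs) - min(hs) <= Ts

    def parents_keeping_up(parent_hs, partner_hs, Tp):
        """Check (2): no current parent lags a non-parent partner by more than
        Tp blocks; if it does, node A should switch to a better parent."""
        if not parent_hs or not partner_hs:
            return True
        return max(partner_hs) - min(parent_hs) <= Tp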
User Types Distribution
Contribution Index
Conceptual Overlay Topology

Source node: O
Super-peers: {A, B, C, D}
Moderate-peers: {a}
Casual-peers: {b, c, d}
Event Distributions
Media Player Ready Time under different time periods
Session Distribution
Download