Measurements, Analysis, and Modeling of BitTorrent-like Systems
Lei Guo (1), Songqing Chen (2), Zhen Xiao (3), Enhua Tan (1), Xiaoning Ding (1), and Xiaodong Zhang (1)
(1) College of William and Mary, (2) George Mason University, (3) AT&T Labs - Research
1
Basic Model of P2P Systems
• Peers sharing different files self-organize into a P2P network
• Exchange files they desire
• Limitations
– Free riding
– Large file downloading
Examples: Gnutella, KaZaa, eDonkey/eMule/Overnet
2
BitTorrent: Fast Delivery with Incentive
• A large file is divided into chunks
• Peers interested in the same file self-organize into a torrent
• Peers exchange file chunks with each other
• Incentive is established by tit for tat
• Very simple and effective; scales fairly well during flash crowds
[Figure: Torrent of Bits]
3
BitTorrent Traffic
• Online users
– 6.8 million in August 2004, 9.6 million in August 2005 (BigChampagne)
• Traffic volume
– 53% of all P2P traffic on the Internet in June 2004 (CacheLogic)
P2P traffic: 60-80%
Other traffic: 20-30%
Source: CacheLogic, 2004
Limited Understanding of BitTorrent
• Existing studies on BitTorrent systems (INFOCOM04, SIGCOMM04)
– Unrealistic assumptions in system model: no evolution considered
– Single-torrent based: more than 85% of BT users join multiple torrents
• What remains unclear about BitTorrent systems during the entire lifetime
– Service availability
– Service stability
– Service fairness
• Objectives of this work
– Evolution of single-torrent system, and limitations of BT
– Multi-torrent model for inter-torrent relation and collaboration
5
Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
• Modeling and characterization of multi-torrent system
• Inter-torrent collaboration
• Conclusion
6
How BitTorrent Works: Publishing
The publisher
– Create a meta file
– Publish on a Web site
– Start the tracker site
– Start a BT client as the initial seed
foo.torrent
announce: tracker URL for bootstrap
creation date: epoch time of file creation
length: file size
name: file name
piece length: chunk size
pieces: SHA1 hash key of each chunk
[Figure: the seed, the tracker site (peer list, "I am here!"), and the Web site (foo.torrent)]
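The meta file fields listed above can be illustrated with a short sketch. It is not the publisher tooling used in the study: the function names, the in-memory `data` argument, and the tracker URL are assumptions for illustration. Following the standard .torrent layout, the file-specific fields (length, name, piece length, pieces) are nested under an `info` dictionary and the result is bencoded.

```python
import hashlib
import time


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be written in sorted order.
        encoded = {
            (key.encode("utf-8") if isinstance(key, str) else key): item
            for key, item in value.items()
        }
        body = b"".join(bencode(key) + bencode(encoded[key]) for key in sorted(encoded))
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(data, name, tracker_url, piece_length=256 * 1024, creation_date=None):
    """Build the meta file fields from the slide for a file held in memory."""
    # One 20-byte SHA1 digest per chunk, concatenated into a single byte string.
    pieces = b"".join(
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,
        "creation date": int(time.time()) if creation_date is None else creation_date,
        "info": {
            "length": len(data),
            "name": name,
            "piece length": piece_length,
            "pieces": pieces,
        },
    }


if __name__ == "__main__":
    # Hypothetical 10-byte file split into 4-byte chunks (3 chunks in total).
    meta = make_metainfo(b"x" * 10, "foo", "http://tracker.example/announce", piece_length=4)
    torrent_bytes = bencode(meta)  # the bytes that would be saved as foo.torrent
    print(len(meta["info"]["pieces"]) // 20, "pieces,", len(torrent_bytes), "bytes")
```

Hashing each chunk separately is what lets a downloader verify chunks one at a time as they arrive from different peers.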
7
How BitTorrent Works: Downloading
The downloader
– Download the meta file
– Start a BT client, connect to the tracker site
– Get peer list from tracker
– Get first chunk from other peers (seeds)
[Figure: the downloader gets foo.torrent from the Web site, reports "I am here!" to the tracker site, and receives the peer list]
8
How BitTorrent Works: Downloading
The downloader
– Download the meta file
– Start a BT client, connect to the tracker site
– Get peer list from tracker
– Get first chunk from other peers (seeds)
– Exchange file chunk with other peers
– Download complete: become a new seed
[Figure: the downloader (User), the seed, the tracker site, and the Web site]
9
How BitTorrent Works: Downloading
The downloader
– Download the meta file
– Start a BT client, connect to the tracker site
– Get peer list from tracker
– Get first chunk from other peers (seeds)
– Exchange file chunk with other peers
– Download complete: become a new seed
– Initial seed leaves
Future performance depends on the arrival and departure of new downloaders and seeds
[Figure: the downloader (User) becomes a new seed after the initial seed leaves; tracker site and Web site]
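The downloading steps above can be walked through with a toy, in-memory simulation. This is only a sketch under simplifying assumptions: the `Peer` and `Tracker` classes and the `download` function are hypothetical names, there is no networking, and chunk transfer is reduced to copying a chunk index from any listed peer that holds it.

```python
class Peer:
    def __init__(self, name, num_chunks, chunks=()):
        self.name = name
        self.num_chunks = num_chunks
        self.chunks = set(chunks)

    @property
    def is_seed(self):
        # A peer holding every chunk acts as a seed.
        return len(self.chunks) == self.num_chunks


class Tracker:
    """Keeps the peer list of one torrent (in memory only)."""

    def __init__(self):
        self.peers = {}

    def announce(self, peer):
        """Register the peer ("I am here!") and return the other known peers."""
        others = [p for p in self.peers.values() if p.name != peer.name]
        self.peers[peer.name] = peer
        return others

    def leave(self, peer):
        self.peers.pop(peer.name, None)


def download(peer, tracker):
    """Fetch every missing chunk from any listed peer that holds it."""
    peer_list = tracker.announce(peer)
    for index in range(peer.num_chunks):
        if index in peer.chunks:
            continue
        if any(index in other.chunks for other in peer_list):
            peer.chunks.add(index)
    return peer.is_seed


if __name__ == "__main__":
    tracker = Tracker()
    seed = Peer("initial-seed", 5, range(5))
    tracker.announce(seed)

    alice = Peer("alice", 5)
    print("alice completed:", download(alice, tracker))  # served by the initial seed

    tracker.leave(seed)                                   # initial seed leaves
    bob = Peer("bob", 5)
    print("bob completed:", download(bob, tracker))      # served by alice, now a seed
```

The second download only succeeds because an earlier downloader stayed on as a seed, which is the dependence on arrivals and departures that the slide points out.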
10
Our Methodology of this Study
• Measurement
– BitTorrent traffic pattern
– Meta file downloading and tracker statistics
• Analysis
– BitTorrent user behavior and performance limitations
– Curve fitting, parameter estimation and validation of
mathematical models
• Modeling
– Torrent evolution and inter-torrent relation
– Fluid model, probability model, and graph model
11
Meta File Downloading
• The first HTTP packets of .torrent file downloading
– Cable network: 3,000+ downloads, 1,000+ torrent meta files
– Server farm: 50 tracker sites host hundreds of torrents
– Gigascope: fast Internet traffic monitoring tool by AT&T
• What information does it contain?
– Torrent birth time
– Peer arrival time to the torrent (packet capture time of downloading)
– About 10 days
foo.torrent
announce: tracker URL
creation date: epoch time of file creation
length: file size
name: file name
piece length: chunk size
pieces: SHA1 hash key of each chunk
12
Torrent Statistics on Trackers
• Professional/dedicated tracker sites
– Each may host thousands of torrents at the same time
– http://www.alluvion.org/ and http://www.crapness.com/, collected
by University of Massachusetts, Amherst
– Ex: alluvion -- 1,500 torrents, 550 are fully traced
• What information does it contain?
– Torrents: torrent birth time, file size, number of peers/seeds
– Peers: request time, downloading/uploading bytes,
downloading/uploading bandwidth
– Sampled every 0.5 hour for 48 days
13
Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
– The evolution of torrent over time
– Limitations of current BitTorrent systems
• Modeling and characterization of multi-torrent system
• Inter-torrent collaboration
• Conclusion
14
Torrent Popularity
Peer arrivals: decrease with time exponentially
• Peer arrival rate $\propto$ derivative of CCDF $\propto e^{-t/\tau}$
• $\lambda(t) = \lambda_0 e^{-t/\tau}$
• Individual torrents: relative deviation of 6% on average
[Figure: CCDF of peer arrival vs. time after torrent birth (day), raw data and linear fit, for the tracker site workload and the meta file workload]
[Figure: relative deviation (%) of individual torrents, torrents ranked by population (non-ascending order)]
15
Torrent Death
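Peer n arrives at time $t_n$:
• peer arrival rate: $\lambda(t_n)$; inter-arrival time: $\Delta t_n = t_{n+1} - t_n = 1/\lambda(t_n)$
• seed leaving rate: $\gamma$; seed service time: $1/\gamma$
• downloading rate: $u_n$; downloading time: $1/u_n$
When $t_n \to \infty$, what will happen?
• inter-arrival time > seed service time: torrent dead
[Figure: timeline of peer n (arrives at $t_n$, downloads for $1/u_n$, serves for $1/\gamma$) and peer n+1 (arrives at $t_{n+1}$, downloads for $1/u_{n+1}$)]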
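The death condition can be turned into a lifespan estimate. The short derivation below is a sketch that assumes the exponentially decreasing arrival rate $\lambda(t) = \lambda_0 e^{-t/\tau}$ and writes $\gamma$ for the seed leaving rate; the torrent dies once the inter-arrival time exceeds the seed service time.

```latex
\frac{1}{\lambda(t)} > \frac{1}{\gamma}
\;\Longleftrightarrow\;
\lambda_0 e^{-t/\tau} < \gamma
\;\Longleftrightarrow\;
t > \tau \log\frac{\lambda_0}{\gamma}
\qquad\Longrightarrow\qquad
T_{life} = \tau \log\frac{\lambda_0}{\gamma}
```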
16
Torrent Population and Lifespan
$T_{life} = \tau \log(\lambda_0 / \gamma)$
$N_{all} = \int_0^{\infty} \lambda_0 e^{-t/\tau}\,dt = \lambda_0 \tau$
Most torrents are small (avg 102)
Most torrents are short-lived (avg 8 days)
[Figure: torrent population vs. rank of torrents, trace and model]
[Figure: torrent lifespan (hour) per torrent, trace and model]
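The two closed forms on this slide are easy to evaluate. The sketch below assumes the exponential arrival model $\lambda(t) = \lambda_0 e^{-t/\tau}$; the function names and the parameter values in the example are illustrative assumptions, not values taken from the traces. A midpoint-rule sum is included as a sanity check of the population integral.

```python
import math


def torrent_lifespan(lam0, tau, gamma):
    """T_life = tau * log(lam0 / gamma)."""
    return tau * math.log(lam0 / gamma)


def total_population(lam0, tau):
    """N_all = integral of lam0 * exp(-t / tau) over [0, inf) = lam0 * tau."""
    return lam0 * tau


def population_numeric(lam0, tau, t_end, dt=0.01):
    """Midpoint-rule approximation of the same integral, truncated at t_end."""
    steps = round(t_end / dt)
    return sum(lam0 * math.exp(-(k + 0.5) * dt / tau) * dt for k in range(steps))


if __name__ == "__main__":
    # Hypothetical torrent: 5 peers/hour initially, tau = 20 hours, seeds leave at 0.5/hour.
    lam0, tau, gamma = 5.0, 20.0, 0.5
    print("lifespan (hours):", torrent_lifespan(lam0, tau, gamma))
    print("population (closed form):", total_population(lam0, tau))
    print("population (numeric):", population_numeric(lam0, tau, t_end=20 * tau))
```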
17
Downloading Failure Ratio
• Define: $R_{fail} = N_{fail} / N_{all} = \gamma / \lambda_0$
• Avg downloading failure ratio
– about 10%
• Different evolution patterns
• Small population $\Rightarrow$ large $R_{fail}$
– Reminder: most torrents have small population!
• Altruistic peers make torrents long-lived
[Figure: torrent population and downloading failure ratio, torrents ranked in non-ascending order of downloading failure ratio]
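The failure ratio follows from the lifespan and population formulas. The derivation below is a sketch under one stated assumption: every peer that arrives after the torrent dies, i.e. after $T_{life} = \tau \log(\lambda_0/\gamma)$, fails to download.

```latex
N_{fail} = \int_{T_{life}}^{\infty} \lambda_0 e^{-t/\tau}\,dt
         = \lambda_0 \tau\, e^{-T_{life}/\tau}
         = \lambda_0 \tau \cdot \frac{\gamma}{\lambda_0}
         = \tau \gamma,
\qquad
R_{fail} = \frac{N_{fail}}{N_{all}}
         = \frac{\tau \gamma}{\lambda_0 \tau}
         = \frac{\gamma}{\lambda_0}
         = e^{-T_{life}/\tau}
```

Because $N_{all} = \lambda_0 \tau$, a small population at a fixed $\tau$ means a small $\lambda_0$, and therefore a large $R_{fail}$.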
18
Torrent Evolution: Fluid Model
• Existing model (SIGCOMM 04)
– Constant arrival rate: $\lambda$ = const
– Torrent reaches equilibrium
• The correct model
– Exponentially decreasing arrival rate
– Torrent eventually dies
– Verified by our measurements
• Two completely different pictures
19
Torrent Evolution: Modeling Results
• Flash crowd
– Downloader #: increases exponentially
– Seed #: increases exponentially
• Peak time
– A very short duration
– Constant arrival model: flat peak
• Attenuation: a long tail
– Downloader #: decreases exponentially
– Seed #: decreases exponentially
– Constant arrival model is far from the reality: no attenuation
• Torrent death
[Figure: # of downloaders and # of seeds vs. time (hour); trace, model, and constant arrival model]
20
Performance Stability
104
101
10
model
trace
download speed
105
8
10
5
6
103
4
101
2
0
50
100
150
200
time (hour)
Only stable when torrent is large
Fluctuate significantly after peak time
0
50
100
downloader
seed
150
200
avg download speed (byte/sec)
15
Snapshot of torrents at time t
# of peers
avg download speed (byte/sec)
Evolution over time
torrents
Larger torrents have higher and
more table performance
21
Service Unfairness
Contribution ratio: uploaded bytes / downloaded bytes
Unfairness: higher download speed, lower uploading contribution
Seeds serve high speed downloaders first
– Peers not willing to serve after downloading
– Not due to new file downloading: selfish
[Figure: download speed (byteps) and peer contribution ratio vs. ranked peers]
[Figure: # of torrents and peer contribution ratio vs. ranked peers]
22
Single-torrent Model: Summary
• Torrent evolution over time
– Exponentially decreasing arrival rate
– Flash crowd – short peak – long tailed attenuation
• BitTorrent Limitations
– Content availability: torrent death
– Performance stability
– Service fairness
23
Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
• Modeling and characterization of multi-torrent system
– Traffic pattern and user behavior
– Graph based model of inter-torrent relation
• Inter-torrent collaboration
• Conclusion
24
Multi-torrent Environment Dynamics
[Figure: CDF of torrents (torrent birth), CDF of requests (request arrival), and CDF of peers (peer birth) vs. torrent birth time, request arrival time, and peer birth time (hour); raw data with linear and asymptotic fits]
• Considering peers and torrents on the Internet as an open system
– Torrent birth rate, torrent request rate, and peer birth rate are constant
• Implication: avg # of torrents a peer requests = torrent request rate / peer birth rate = constant
• The lifecycle of a BT peer: downloading, seeding, sleeping, …, dead
Peer Request Pattern: Request Rate
Peer request rate: requests by a peer to different torrents per unit time
Assume $r(t) = r_0 e^{-t/\tau_r}$, with $\int_0^{\infty} r(t)\,dt = \text{const}$
$\tau_r \approx 77$ years!
• Peer request process: seems Poisson-like
• Request a new torrent with a probability p: participation probability
• Dead with probability 1-p
[Figure: # of torrents and $\tau_r$ (day) vs. peers]
26
Peer Request Pattern: Participation Probability
Probability model
• $i = N p^{m-1}$ peers request at least m torrents
• $m - 1 = (\log i - \log N) / \log p$
• p = 0.8551
• Probability model confirmed
Another estimation of p
• $\bar{m}$ = torrent request rate / peer birth rate
• $\bar{m} = \sum_{k=1}^{N} k\,p^{k-1}(1-p) \approx \frac{1}{1-p}$
• $\bar{m} = 7.514$
• $\Rightarrow p = 0.8548$
[Figure: number of torrents (m) vs. peer rank (log i), raw data and linear fit]
27
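The rank-based estimate of the participation probability p can be sketched as an ordinary least-squares fit. The code assumes the probability model $i = N p^{m-1}$, so that m is linear in $\log i$ with slope $1/\log p$. The function name and the synthetic data are assumptions for illustration; the fitting procedure actually used in the study is not given on the slide.

```python
import math


def fit_participation_probability(torrents_per_peer):
    """Estimate p from per-peer torrent counts, assuming i = N * p**(m - 1).

    Peers are ranked in non-ascending order of the number of torrents they
    requested, so the peer at rank i requested m torrents and i peers requested
    at least m. A least-squares line m = a + b * log(i) then has slope
    b = 1 / log(p).
    """
    ranked = sorted(torrents_per_peer, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ranked) / len(ranked)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ranked))
    slope = sxy / sxx
    return math.exp(1.0 / slope)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from the model itself (not trace data).
    n_peers, true_p = 1000, 0.85
    counts = [1 + math.log(i / n_peers) / math.log(true_p) for i in range(1, n_peers + 1)]
    print("estimated p:", fit_participation_probability(counts))
```

On real traces the counts are integers with many ties, so the fit would be noisier than in this synthetic example.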
Inter-torrent Relation Graph: How Can Torrents Help Each Other?
• Some peers in torrent j have downloaded i
• Some peers in torrent i have downloaded j
[Figure: torrents i and j connected by two directed edges, labeled 1 and 2]
28
Inter-torrent Relation Graph: How Can Torrents Help Each Other?
• Some peers in torrent i have downloaded j
• Some peers in torrent j have downloaded i
• Edge weight $W_{i,j}$: number of such peers
• Out-degree: $SP_i = \sum_{j \ge 1} W_{i,j}$
• In-degree: $SG_j = \sum_{i \ge 1} W_{i,j}$
[Figure: weighted out-degree and weighted in-degree vs. torrents and torrent size (# of online peers), trace and model]
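The relation graph can be built from per-peer records, as in the sketch below. It rests on an assumed reading of the edge direction: $W_{i,j}$ counts peers currently in torrent i that have already downloaded j, so torrent i can serve torrent j. The function name and the input dictionaries are hypothetical.

```python
from collections import defaultdict


def relation_graph(current_torrent, downloaded):
    """Return (weights, out_degree, in_degree) of the inter-torrent relation graph.

    current_torrent: dict mapping peer -> torrent the peer is currently in
    downloaded:      dict mapping peer -> set of torrents the peer downloaded before
    weights[(i, j)]: number of peers in torrent i that have downloaded j
    """
    weights = defaultdict(int)
    for peer, i in current_torrent.items():
        for j in downloaded.get(peer, ()):
            if j != i:
                weights[(i, j)] += 1

    out_degree = defaultdict(int)  # SP_i: sum of W[i, j] over j
    in_degree = defaultdict(int)   # SG_j: sum of W[i, j] over i
    for (i, j), weight in weights.items():
        out_degree[i] += weight
        in_degree[j] += weight
    return dict(weights), dict(out_degree), dict(in_degree)


if __name__ == "__main__":
    current = {"p1": "i", "p2": "i", "p3": "j"}
    history = {"p1": {"j"}, "p2": {"j", "k"}, "p3": {"i"}}
    weights, sp, sg = relation_graph(current, history)
    print("weights:", weights)
    print("out-degree SP:", sp)
    print("in-degree SG:", sg)
```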
29
Single-torrent vs. Multi-torrent Model
• Single-torrent model
– Longer seed service time, lower download failure rate
– Seed service time can only increase to a limited extent, but inter-arrival time increases exponentially
– Small improvement
• Multi-torrent model
– Old peers come back multiple times
– Higher peer arrival rate, shorter peer inter-arrival time
– Significant improvement
30
Single-torrent vs. Multi-torrent Model
Single-torrent model
• $T_{life} = \tau \log(\lambda_0/\gamma)$, $R_{fail} = \gamma/\lambda_0 = 0.1$
• Seeds stay 10 times longer: $\gamma^* = \gamma/10$
• $T^*_{life} = \tau \log(\lambda_0/\gamma^*) = T_{life} + \tau \log 10$
• $R^*_{fail} = \gamma^*/\lambda_0 = \frac{1}{10} R_{fail} = 0.01$
Multi-torrent model
• $\lambda'(t) = \lambda_0 e^{-t/\tau}\,\frac{q^{k(t)+1} - 1}{q - 1} = \lambda(t)\,\frac{q^{k(t)+1} - 1}{q - 1}$, where $k(t) = rt$ and $q = p\,e^{1/(r\tau)} > 1$
• Torrent death: $\lambda'(T'_{life}) = \gamma$
• $T'_{life} \approx \frac{T_{life}}{\tau r \log(1/p)}$, about $6\,T_{life}$
• $R'_{fail} = e^{-T'_{life}/\tau} = R_{fail}^{6} = 1 \times 10^{-6} \approx 0$
Inter-torrent collaboration is much more effective than stimulating
seeds to serve longer
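The comparison on this slide can be reproduced numerically. The sketch below assumes $T_{life} = \tau \log(\lambda_0/\gamma)$ and $R_{fail} = e^{-T_{life}/\tau}$; only the ratio $\gamma/\lambda_0 = 0.1$, the factor of 10 for seed service time, and the factor of 6 for the multi-torrent lifespan come from the slide, while the individual parameter values are illustrative assumptions.

```python
import math


def lifespan(lam0, tau, gamma):
    """T_life = tau * log(lam0 / gamma)."""
    return tau * math.log(lam0 / gamma)


def failure_ratio(t_life, tau):
    """R_fail = exp(-T_life / tau); equals gamma / lam0 in the single-torrent model."""
    return math.exp(-t_life / tau)


# Illustrative values (assumptions) chosen so that gamma / lam0 = 0.1.
lam0, tau, gamma = 10.0, 24.0, 1.0

base = lifespan(lam0, tau, gamma)
longer_seeding = lifespan(lam0, tau, gamma / 10)  # seeds stay 10 times longer
multi_torrent = 6 * base                          # lifespan factor quoted on the slide

scenarios = [
    ("single-torrent", base),
    ("seeds stay 10x longer", longer_seeding),
    ("multi-torrent", multi_torrent),
]
for label, t_life in scenarios:
    print(f"{label}: T_life = {t_life:.1f}, R_fail = {failure_ratio(t_life, tau):.1e}")
```

Making seeds stay ten times longer only adds $\tau \log 10$ to the lifespan, whereas a sixfold lifespan raises the failure ratio to the sixth power, which is why the second option looks so much stronger.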
31
Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
• Modeling and characterization of multi-torrent system
• Inter-torrent collaboration
– Tracker site overlay
– Instant incentive for collaboration
• Conclusion
32
Tracker Site Overlay
• Neighbor-in: torrents that can serve me
• Neighbor-out: torrents that I can serve (peer list)
• Self-organized P2P network (a logical structure)
• An instance of inter-torrent relation graph
• A built-in mechanism for content search, covering 99%+ of torrents
• Trackerless BitTorrent: uses DHT to store meta file
[Figure: overlay of nodes A, B, C, and D with their neighbor-in and neighbor-out tables]
33
Incentive for Inter-Torrent Collaboration
[Figure: users Jack and Tom, file A and file D, and torrents A, B, C, and D ("Thanks Jack!")]
Instant incentive: similar to the “tit-for-tat” principle
• Neighboring cycle detection
• Neighboring cycle construction
– Bandwidth trading: get one chunk, serve multiple peers
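Neighboring cycle detection can be illustrated with a small graph search. The slide does not specify the algorithm, so the sketch below swaps in a plain breadth-first search over a hypothetical "can serve" graph: it looks for the shortest cycle through a starting node, in which every participant serves the next one and is eventually served in return. The function name, the `max_length` bound, and the example graph are assumptions.

```python
from collections import deque


def find_serving_cycle(can_serve, start, max_length=4):
    """Return the shortest cycle [start, ..., last] such that each node can serve
    the next and last can serve start; None if no cycle with at most max_length
    nodes exists."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nxt in sorted(can_serve.get(path[-1], ())):
            if nxt == start and len(path) > 1:
                return path
            if nxt not in path and len(path) < max_length:
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    # Hypothetical graph: A can serve B, B can serve C, C can serve A and D.
    can_serve = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": set()}
    print(find_serving_cycle(can_serve, "A"))  # a cycle through A, if any
    print(find_serving_cycle(can_serve, "D"))  # D serves nobody, so no cycle
```

Bounding the cycle length keeps the search cheap and keeps the exchange close to a direct tit-for-tat trade.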
34
Conclusion
• Extensive analysis and modeling to study the behaviors
of BT-like systems
– Tracker trace and .torrent downloading trace
– Mathematical model
• BitTorrent system has its limitations due to exponentially
decreasing peer arrival rate
– Service availability, performance stability, and fairness
• Graph based multi-torrent model
• System design for inter-torrent collaboration
35
Thank you!
36
Backup for Questions
37
Torrent Lifespan
$\lambda(t) = \lambda_0 e^{-t/\tau}$, so $\log \lambda(t) = \log \lambda_0 - t/\tau$
• Extract $\lambda(t)$ and $t$ from trace
• Get $\lambda_0$ and $\tau$ using linear regression
• Lifespan model verified by measurement
Torrent lifespan: $\Delta t_n = t_{n+1} - t_n = 1/\lambda(t_n) > 1/\gamma \;\Rightarrow\; T_{life} = \tau \log(\lambda_0/\gamma)$
[Figure: torrent lifespan (hour) per torrent, trace and model]
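The regression step can be sketched with a hand-rolled least-squares fit of $\log \lambda(t)$ against $t$. The function name and the synthetic sample below are assumptions; how arrival rates were binned from the traces is not stated on the slide.

```python
import math


def fit_exponential_rate(times, rates):
    """Least-squares fit of log(rate) = log(lam0) - t / tau; returns (lam0, tau)."""
    logs = [math.log(rate) for rate in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


if __name__ == "__main__":
    # Synthetic, noise-free rates (not trace data): lam0 = 8 per hour, tau = 30 hours.
    times = list(range(100))
    rates = [8.0 * math.exp(-t / 30.0) for t in times]
    lam0, tau = fit_exponential_rate(times, rates)
    print("lam0:", lam0, "tau:", tau)
```

With the fitted values and a seed leaving rate $\gamma$, the modeled lifespan is $\tau \log(\lambda_0/\gamma)$, which can then be compared with the lifespan observed in the trace.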
38
Torrent Population
Total population: $N_{all} = \int_0^{\infty} \lambda_0 e^{-t/\tau}\,dt = \lambda_0 \tau$
• Model verified by measurement
• Observations:
– The population of most torrents is small (102 on average)
– Downloading failure ratio: $R_{fail} = N_{fail} / N_{all}$
– Small population $\Rightarrow$ large $R_{fail}$
[Figure: torrent population vs. rank of torrents (in non-ascending order of modeling results), trace and model]
39
Torrent Evolution: Fluid Model
Basic equation set
$\frac{dx(t)}{dt} = \lambda_0 e^{-t/\tau} - \mu\,(\eta\,x(t) + y(t))$
$\frac{dy(t)}{dt} = \mu\,(\eta\,x(t) + y(t)) - \gamma\,y(t)$
$x(0) = 0,\quad y(0) = 1$
Resolution
$x(t) = a\,e^{-t/\tau} + b\,e^{\beta_1 t} + d_1 e^{\beta_2 t}$
$y(t) = c_1 e^{-t/\tau} + c_2 e^{\beta_1 t} + d_2 e^{\beta_2 t}$
$u(t) = \mu\,(\eta + y(t)/x(t))$
Parameters
x(t): number of downloaders
y(t): number of seeds
$\lambda_0$: initial peer arrival rate
$\tau$: attenuation parameter of $\lambda$
$\mu$: uploading bandwidth
c: downloading bandwidth (c >> $\mu$)
$\gamma$: seed leaving rate
$\eta$: file sharing efficiency
$\beta_1, \beta_2$: eigenvalues of the equation set
a, b, $c_1$, $c_2$, $d_1$, $d_2$: constants
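The fluid model can also be explored numerically instead of through the closed-form resolution. The sketch below applies forward Euler to $dx/dt = \lambda(t) - \mu(\eta x + y)$ and $dy/dt = \mu(\eta x + y) - \gamma y$ with $x(0) = 0$, $y(0) = 1$, and compares an exponentially decreasing arrival rate with a constant one. All parameter values are illustrative assumptions, and the cap that keeps the number of downloaders non-negative is a guard added for this sketch, not part of the slide's equations.

```python
import math


def simulate(lam0, tau, mu, eta, gamma, t_end, dt=0.01, constant_arrival=False):
    """Forward-Euler integration of the fluid model; returns a list of (t, x, y)."""
    x, y = 0.0, 1.0  # x: downloaders, y: seeds
    history = []
    for step in range(round(t_end / dt) + 1):
        t = step * dt
        history.append((t, x, y))
        lam = lam0 if constant_arrival else lam0 * math.exp(-t / tau)
        arrivals = lam * dt
        # Completed downloads in this step, capped so that x never goes negative.
        completions = min(mu * (eta * x + y) * dt, x + arrivals)
        departures = gamma * y * dt
        x += arrivals - completions
        y += completions - departures
    return history


if __name__ == "__main__":
    params = dict(lam0=10.0, tau=20.0, mu=0.5, eta=0.5, gamma=1.0, t_end=200.0)
    decaying = simulate(**params)
    constant = simulate(constant_arrival=True, **params)
    peak_time, peak_x, _ = max(decaying, key=lambda row: row[1])
    print(f"decaying arrivals: peak of {peak_x:.1f} downloaders at t = {peak_time:.1f}")
    print("decaying arrivals, final (x, y):", decaying[-1][1:])
    print("constant arrivals, final (x, y):", constant[-1][1:])
```

With these assumed parameters the constant-arrival run settles at a non-zero equilibrium, while the decaying-arrival run rises to a peak and then drains toward zero, matching the two pictures contrasted in the talk.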
40
Peer Request Pattern: Summary
• Multi-torrent environment: an open model
– Torrent birth rate: 0.9454 per hour (nearly a constant)
– Peer birth rate: 19.37 per hour (nearly a constant)
– Torrent request rate (for all peers over all torrents): 133.39 per hour (nearly a constant)
– Actually increases slowly according to BigChampagne; increasing rate: (9.6 M - 6.8 M) / (6.8 M × 365 × 24 hours) ≈ 0.0047% per hour
• Peer request pattern
– Lifecycle: downloading, seeding, sleeping, …, next req with prob. p
– Peer participation probability: 0.85
– Request rate (for different torrents by a peer): Poisson-like
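The summary numbers can be cross-checked with a few lines of arithmetic. The sketch below takes the three rates and the BigChampagne user counts from the slides and assumes the geometric relation between the mean number of torrents per peer and p, namely mean = 1 / (1 - p); the variable names are hypothetical.

```python
torrent_request_rate = 133.39  # requests per hour, all peers over all torrents
peer_birth_rate = 19.37        # new peers per hour

# Average number of torrents a peer requests, and the implied participation probability.
mean_torrents_per_peer = torrent_request_rate / peer_birth_rate
participation_probability = 1.0 - 1.0 / mean_torrents_per_peer

# Relative growth in online users per hour between August 2004 and August 2005.
users_2004, users_2005 = 6.8e6, 9.6e6
hourly_growth = (users_2005 - users_2004) / users_2004 / (365 * 24)

print(f"participation probability: {participation_probability:.4f}")
print(f"hourly growth: {hourly_growth * 100:.4f}% per hour")
```

The implied probability comes out at about 0.85, in line with the value on the slide, and the hourly growth is small enough to justify treating the rates as nearly constant.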
41
Tracker Site Overlay
• Table size: $O\left(\frac{N}{1-p}\right)$
• Node degree distribution
– Similar to unstructured P2P
networks
• Many content search and msg
routing algorithms
– Flooding
– Random walk
– …
• Trackerless BitTorrent: uses
DHT to store meta file
42
Simulation Experiments
Without vs. with inter-torrent collaboration:
• Content availability (downloading failure ratio): $R_{fail} \approx 0$
• Performance stability (downloading speed): more stable
• Service fairness (contribution ratio): more balanced
Inter-torrent collaboration can improve BitTorrent performance
43