BitTorrent
BitTorrent network

On the itinerary:
 Introduction to BitTorrent
 Basics & properties
 3 interesting analysis results
Publishing

How to publish a (usually large) file?
 Dedicated server:
 Easy to manage, easy to find, persistent service
 Nevertheless…
 BitTorrent:
 Organizes multiple clients that share the same file
 Leverages the upload bandwidth of the participants
 Self-scaling, resilient, operates well during “flash crowd” periods
 Takes 50%-60% of all p2p traffic
The Basics of BitTorrent

Content provider (everyday people) wants to publish a file (the initial seed):
 Creates a meta file (*.torrent)
 Publishes this (light) *.torrent file on a web server
 File is broken into small blocks (32-256 KB)
 Uploads blocks to other peers
 The goal: to publish the file to many nodes with the help of other peers, with minimum load on the seeder
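
The publishing steps above can be sketched roughly as follows. This is a simplified illustration, not the real *.torrent format (which is bencoded and has its own field layout): the dictionary keys, the function names and the tracker URL are assumptions made for the example. The one detail taken from the actual protocol is that each block is identified by its SHA-1 hash.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int = 256 * 1024) -> list[bytes]:
    """Cut the payload into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_meta(name: str, data: bytes, tracker_url: str,
               block_size: int = 256 * 1024) -> dict:
    """Build a light description of the file (hypothetical field names)."""
    blocks = split_into_blocks(data, block_size)
    return {
        "announce": tracker_url,          # where peers find the tracker
        "name": name,
        "length": len(data),
        "block_size": block_size,
        # One SHA-1 digest per block lets a peer verify each block on arrival.
        "block_hashes": [hashlib.sha1(b).hexdigest() for b in blocks],
    }


meta = build_meta("demo.bin", b"x" * 600_000, "http://tracker.example/announce")
print(len(meta["block_hashes"]))  # 3
```

The meta description stays small regardless of the file size, which is why it can be hosted on an ordinary web server.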
The Basics of BitTorrent

Third party (the tracker):
 A tracker site keeps track of the active participants + extra statistics.
 Upon requests from nodes, supplies a random subset of active nodes
 Receives updates from the active nodes
 Keeps track of new nodes joining the ‘torrent’ (or ‘swarm’) and nodes that left it
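
A toy in-memory tracker illustrating the behavior listed above. It is a sketch only: the class name, the `announce` method and the event names are assumptions for the example and do not reproduce the real tracker HTTP interface. The subset size of 40 is the approximate neighbor count quoted in these slides.

```python
import random


class Tracker:
    """Keeps the set of active peers and hands out random subsets."""

    def __init__(self, subset_size=40, seed=None):
        self.active = set()
        self.subset_size = subset_size
        self.rng = random.Random(seed)

    def announce(self, peer, event="update"):
        """Register a join/update/leave and return random other peers."""
        if event == "stopped":
            self.active.discard(peer)   # peer left the swarm
        else:
            self.active.add(peer)       # new peer or periodic update
        others = sorted(self.active - {peer})
        k = min(self.subset_size, len(others))
        return self.rng.sample(others, k)


tracker = Tracker(seed=1)
for i in range(100):
    tracker.announce(f"peer{i}")
neighbors = tracker.announce("newcomer")
print(len(neighbors))  # 40
```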
The Basics of BitTorrent

Peer (leecher) who is interested:
 Obtains the public *.torrent file
 Is directed to the tracker
 Obtains a list of random neighbors (~40)
 Downloads blocks from, and uploads blocks to, its ‘best’ neighbors (choking and unchoking)
 Upon download completion, becomes a seed
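
One way to picture the choice of ‘best’ neighbors is the sketch below: a peer unchokes the neighbors that recently sent it data at the highest rates, plus one randomly chosen extra neighbor (an "optimistic unchoke") so that newcomers get a chance. The function name, the number of slots and the input format are assumptions for illustration, not values taken from these slides.

```python
import random


def choose_unchoked(download_rates, regular_slots=3, rng=random):
    """download_rates maps neighbor -> bytes/s recently received from it.

    Returns the neighbors to unchoke: the top senders, then one random
    neighbor from the rest (if any).
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:regular_slots]
    rest = ranked[regular_slots:]
    if rest:
        unchoked.append(rng.choice(rest))  # optimistic unchoke
    return unchoked


rates = {"a": 50, "b": 10, "c": 30, "d": 5, "e": 0}
print(choose_unchoked(rates)[:3])  # ['a', 'c', 'b']
```

All other neighbors stay choked, meaning the peer does not upload to them until the next re-evaluation.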
BitTorrent demo (Wikipedia)
BitTorrent basic schemes

(Immediate) Problems arise:
 Last block problem: assuming nodes depart upon completion, how would leechers obtain the last block?
 Free riding problem: willing to download, but unwilling to serve others

Simple and effective solutions:
 Last block problem: nodes employ a Local Rarest First policy
 Free riding problem: nodes employ a “Tit For Tat” policy, i.e. give more to those from whom you receive more
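
A minimal sketch of Local Rarest First, assuming each peer knows which blocks each of its neighbors holds: among the blocks it still needs, the peer requests the one held by the fewest neighbors, so that rare blocks get replicated before their holders leave. The function name and data layout are hypothetical.

```python
import random
from collections import Counter


def pick_rarest_block(my_blocks, neighbor_blocks, rng=random):
    """my_blocks: set of block indices already held.
    neighbor_blocks: dict neighbor -> set of block indices it holds.
    Returns the index of a needed block with the fewest local copies,
    or None if no neighbor has anything new.
    """
    counts = Counter()
    for blocks in neighbor_blocks.values():
        counts.update(blocks - my_blocks)   # count only blocks we still need
    if not counts:
        return None
    rarest = min(counts.values())
    candidates = sorted(b for b, c in counts.items() if c == rarest)
    return rng.choice(candidates)           # break ties at random


neighbors_have = {"p1": {0, 1, 2}, "p2": {1, 2}, "p3": {1}}
print(pick_rarest_block({0}, neighbors_have))  # 2
```

"Local" here means the rarity is estimated only from the peer's own neighbor set, not from the whole swarm.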
On the itinerary

One interesting limitation of BitTorrent networks
 BitTorrent provides poor service availability
 Shown via analysis of tracker logs over a long period

Proposition of a peer selection approach
 Lowers ISP costs and resource usage
 Does not require cooperation between ISPs and peers
 Implemented as part of the Azureus client, codename ‘Ono’

Tradeoffs
 Performance (avg. download time) vs. fairness (avg. share ratio)
BitTorrent limitations

Taken from the paper “Measurements, Analysis, and Modeling of BitTorrent-like Systems” [1]
 Inspects overall performance over the lifetime of a torrent
 Analysis is based on traces
 A model is derived and used to draw conclusions
 The derived conclusions are verified against the observed behavior

Limitations were found:
 Poor service availability (coming next)
 Fluctuating download performance
 Unfair service to peers
Service availability - Analysis

Analysis is based on traces
 Tracker logs (~1500 torrents, sampled every 30 sec)
 Traces from servers that publish *.torrent files

Extracted data
 Identify peers
 Birth time and size of the torrent
 For each peer in the same torrent: arrival time, download & upload bandwidth, cumulative downloaded & uploaded bytes
Service availability - Analysis
Y-axis at time t: the total number of requests for all torrents in the trace, minus the cumulative number of requests that arrived within time t of each torrent's birth.
 Similar observations are seen in the *.torrent metadata traces

Service availability - Analysis



This suggests that the request rate decreases exponentially after a torrent is born
Notice that this is a cumulative measure
Does a specific torrent behave the same?
 Use the least squares method to measure how much a specific torrent deviates from this logarithmic fitting
 Analyze the distribution of the average relative deviation (which is mostly small, 6% on average)
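
The fitting step can be illustrated with a small least-squares sketch. It assumes the curve being fitted is an exponential decay y(t) ≈ y0·exp(-t/τ), so that ln y is linear in t. The function names, the synthetic data and the exact deviation measure are assumptions made for the example; they are not taken from the paper.

```python
import math


def fit_exponential_decay(times, remaining):
    """Least-squares line through (t, ln y).

    Returns (y0, tau) such that y is approximately y0 * exp(-t / tau).
    """
    logs = [math.log(y) for y in remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -1.0 / slope


def mean_relative_deviation(times, remaining, y0, tau):
    """Average of |observed - fitted| / fitted over all sample points."""
    fitted = [y0 * math.exp(-t / tau) for t in times]
    return sum(abs(y - f) / f for y, f in zip(remaining, fitted)) / len(times)


# Synthetic, noise-free data (not from the trace): tau = 2.5.
ts = list(range(10))
ys = [1000 * math.exp(-t / 2.5) for t in ts]
y0_hat, tau_hat = fit_exponential_decay(ts, ys)
print(round(tau_hat, 3))  # 2.5
```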
Service availability – A model


Based on the observations, define the torrent popularity at time t as the peer arrival rate, which is the derivative of the peer arrival time distribution for that torrent.
Arrival rate: $\lambda(t) = \lambda_0 e^{-t/\tau}$
 Where $\lambda_0$ is the initial arrival rate when the torrent starts
 And $\tau$ is the attenuation parameter
 Both are evaluated from the observations
Service availability – A model

Define
 Torrent lifespan: duration from the birth to the time when no complete copy remains (thus new leechers would not be able to complete the download)
 Inter-arrival time between two successive arriving peers can be approximated as $1/\lambda(t)$, the reciprocal of the arrival rate
 Assume seeds leave the system at rate $\gamma$; then the average service time for a seed is approximately $1/\gamma$
Service availability – A model

Look at consecutive peers that join the torrent
 Peers n and n+1 join the torrent at times t(n) and t(n+1) respectively
 The inter-arrival time between them is approximately $1/\lambda(t(n))$
 Peer n downloads the file with speed u(n) and stays in the torrent for its download time plus its seeding time (about $1/\gamma$)
 When the peer arrival rate is small enough (n is large), peer n+1, with speed u(n+1) <= u(n), can only be served by peer n
Service availability – A model
 Thus when $1/\lambda(t) > 1/\gamma$, peer n+1 can’t complete the download and the torrent is dead
 Using the definition of arrival rate, we get the torrent lifespan: $T_{life} = \tau \ln(\lambda_0 / \gamma)$
 $\lambda_0$ (initial arrival rate) and $\tau$ (attenuation parameter) are extracted from the trace (using linear regression), as well as $\gamma$
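
A small numeric sketch of the lifespan argument, under stated assumptions: the arrival rate decays exponentially as λ(t) = λ0·exp(-t/τ), seeds leave at rate γ (a symbol introduced here for illustration), and the torrent is considered dead once the inter-arrival time 1/λ(t) exceeds the seed service time 1/γ. Solving 1/λ(t) = 1/γ for t gives τ·ln(λ0/γ). The parameter values below are made up and are not the trace values.

```python
import math


def arrival_rate(t, lam0, tau):
    """Assumed exponential decay of the peer arrival rate."""
    return lam0 * math.exp(-t / tau)


def torrent_lifespan(lam0, tau, gamma):
    """Time at which the inter-arrival time 1/lambda(t) reaches 1/gamma."""
    if lam0 <= gamma:
        return 0.0   # arrivals are already too sparse at birth
    return tau * math.log(lam0 / gamma)


lifespan = torrent_lifespan(lam0=100.0, tau=2.0, gamma=1.0)
print(round(lifespan, 2))  # 9.21
```

At the computed lifespan the arrival rate equals γ exactly, which is the crossover point of the argument.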
 Compare results from the model to what the trace holds

Results match very well
Model vs. Observations
 Comparison of torrent lifespans: the average lifespan is 8.89 according to the trace and 8.34 based on the model
Service availability - Summary

Conclusions were obtained relying on extensive trace analysis and modeling
 Existing BitTorrent systems provide poor service availability
 This is due to the exponentially decreasing peer arrival rate
 This provides strong motivation for inter-torrent collaboration
Next on the itinerary

One interesting limitation of BitTorrent networks
 BitTorrent provides poor service availability
 Shown via analysis of tracker logs over a long period

Proposition of a peer selection approach
 Lowers ISP costs and resource usage
 Does not require cooperation between ISPs and peers
 Implemented as part of the Azureus client, codename ‘Ono’

Tradeoffs
 Performance (avg. download time) vs. fairness (avg. share ratio)
Reducing cross-ISP traffic

Taken from the paper “Taming the torrent” [2]
 Motivation: the overwhelming popularity of p2p (70% of internet traffic worldwide) has yielded significant revenues for ISPs.
 However, p2p traffic has significantly increased ISPs’ costs, particularly in terms of cross-ISP traffic
 This has driven ISPs to try to forcefully reduce p2p traffic
 Blocking specific ports, tricking clients into closing connections
 Deep packet inspection
 Caching
Reducing cross-ISP traffic

One approach to alleviate this pain is to use an oracle that provides knowledge about which peers are in the same ISP
 This would benefit both ISPs and the p2p community
 But this requires p2p users and ISPs to collaborate and to trust each other
 Not likely to be adopted
Reducing cross-ISP traffic

Another approach is to recycle data that is already being collected by Content Distribution Networks (CDNs)
 CDNs attempt to improve web performance by redirecting requests to replica servers
 The goal is to help content providers (e.g. CNN) distribute content by redirecting requests to replica servers that are:
 Topologically proximate
 Lower-latency
CDNs as oracles

Hypothesis: when peers exhibit similar redirection behavior, they are likely to be close to the same replica server, and thus to each other
 Represent redirection behavior using ratio-maps
 Each ratio represents the frequency of redirection to a specific replica
 The number of replicas is usually small (max 31)
 Keep a time window (~ a day)
CDNs redirections as ratio-maps

The ratio map of a peer is a set of (replica server, ratio) pairs; for peer a: $\mu_a = \langle (r_k, f_k), (r_l, f_l), \ldots, (r_m, f_m) \rangle$
 Specifically, if peer a is redirected toward replica server r1 75% of the time window, and toward replica server r2 25% of the time window, then the corresponding ratio-map is $\mu_a = \langle (r_1, 0.75), (r_2, 0.25) \rangle$
 The sum of all ratios $f_i$ in a given ratio map equals one
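
As an illustration, a ratio map can be built from a list of observed redirections. The sketch assumes lookups are made at evenly spaced intervals within the time window, so that counting redirections approximates the fraction of time; the function name and the replica labels are hypothetical.

```python
from collections import Counter


def build_ratio_map(redirections):
    """redirections: replica-server names seen in the time window,
    one entry per lookup. Returns replica -> fraction of lookups."""
    counts = Counter(redirections)
    total = sum(counts.values())
    return {replica: n / total for replica, n in counts.items()}


ratio_map = build_ratio_map(["r1", "r1", "r2", "r1"])
print(ratio_map)  # {'r1': 0.75, 'r2': 0.25}
```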

Similarity via ratio-maps

Define a metric that, given two peers, produces a value describing the similarity between the peers’ redirection behavior
 We are looking for overlap in the redirection frequency maps of each pair of peers
 Use cosine-similarity between two peers a and b:
Cosine-similarity

Similarity of a and b: $\mathrm{cos\_sim}(a,b) = \dfrac{\sum_{i \in I_a} \mu_{a,i}\,\mu_{b,i}}{\sqrt{\sum_{i \in I_a} \mu_{a,i}^2 \cdot \sum_{i \in I_b} \mu_{b,i}^2}}$
 Each sum is over the set $I_a$ ($I_b$) of replica servers to which peer a (b) has been redirected over the time window
 $\mu_{a,i}$ is the ratio of time that peer a has been redirected to replica server i
 Cosine-similarity is analogous to a normalized dot product
 When maps are identical, it equals 1
 When maps are orthogonal (no common replica), it equals 0
 Values lie in [0,1]
 Determine a threshold (0.15)
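
The computation is short enough to sketch directly. Ratio maps are represented here as plain dictionaries from replica name to ratio; the function names are hypothetical, and treating "above the threshold" as "recommended" is an assumption about how the 0.15 threshold is applied.

```python
import math


def cosine_similarity(map_a, map_b):
    """Cosine similarity of two ratio maps (replica -> ratio)."""
    dot = sum(ratio * map_b.get(replica, 0.0) for replica, ratio in map_a.items())
    norm_a = math.sqrt(sum(r * r for r in map_a.values()))
    norm_b = math.sqrt(sum(r * r for r in map_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0   # an empty map shares nothing with anyone
    return dot / (norm_a * norm_b)


def is_recommended(map_a, map_b, threshold=0.15):
    """Assumed rule: prefer a peer whose similarity exceeds the threshold."""
    return cosine_similarity(map_a, map_b) > threshold


a = {"r1": 0.75, "r2": 0.25}
b = {"r2": 1.0}
c = {"r3": 1.0}
print(round(cosine_similarity(a, b), 3))  # 0.316
print(is_recommended(a, c))               # False
```

Replicas missing from one map contribute zero to the dot product, so peers with no replica in common score exactly 0.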
CDNs implementation

Ono, an extension to the Azureus client
 Upon handshake of two peers, exchange ratio-maps
 This enables Ono to perform biased peer selection
 Performs a DNS lookup for each CDN name to determine redirection behavior and encodes it in ratio-maps
 Periodically updates the ratio-maps
 Overhead is extremely small
 18KB upstream, 36KB downstream per day
 Computation of cosine-similarity is easy
Ono-recommended empirical results


Over 120,000 peers use Ono
Ono collects extra network data
 Ping and traceroute to replica servers and peers
 Obtain feedback on the biased peer selection
 Cross-ISP hops are not easy to determine
 IP hops are easy to count and give some measure

Compare Ono-recommended peer selection to random peer selection
Ono-recommended empirical results
 Cumulative distribution function of the number of IP hops taken along paths between an Ono client and its peers.
 Each value represents the average number of hops for all peers seen by a particular Ono client during a 6-hour interval

Ono finds shorter paths
Median is less than half
More than 20% are only one hop away, vs. less than 2%
Ono-recommended empirical results
 Each IP address was mapped to its corresponding Autonomous System (AS) id
 Similar to the previous graph

Over 33% of paths found by Ono do not leave the origin AS
Median AS hops is one, vs. less than 10% in the random case
CDNs as oracles - Summary

Recycling network views collected by CDNs
 Good internet citizenship in terms of reducing cross-ISP traffic
 Performance of peers is not affected
 Scalable (the more clients adopt it, the more accurate the bias becomes)
 Easily and freely available
Last on the itinerary

One interesting limitation of BitTorrent networks
 BitTorrent provides poor service availability
 Shown via analysis of tracker logs over a long period

Proposition of a peer selection approach
 Lowers ISP costs and resource usage
 Does not require cooperation between ISPs and peers
 Implemented as part of the Azureus client, codename ‘Ono’

Tradeoffs
 Performance (avg. download time) vs. fairness (avg. share ratio)
BitTorrent Tradeoffs

Taken from the paper “The delicate tradeoffs in BitTorrent-like file sharing protocol design” [3]
 Peers that participate in BT are heterogeneous with regard to download and upload capacities
 Taking a system approach
 The system throughput depends critically on the “fat” peers
 However, this might result in unfairness towards those who contribute more
 This in turn would encourage peers to supply a low upload rate to others
BitTorrent Tradeoffs

A user would want the download rate it gets to be proportional to the upload rate it supplies
 Assuming peers take the system overview
 Long lasting, steady state, rational

Two parameters and their interrelations are explored
 Performance: minimum average download time
 Fairness: ratio between give and take
Model


W.L.O.G. the file is of size 1
Assume an average peer arrival rate of $\lambda$
Assume peers do not abort
 Upon completion, a peer leaves the torrent (BT provides no incentive for seeding)

Assume n classes (types) of peers
For each new peer arrival, with probability $p_i$ it belongs to type i
 Thus, the average arrival rate for class i is $\lambda_i = p_i \lambda$
 Class i has Ui and Di as upload and download capacities
 Assume U1>U2>…>Un (type 1 are the “fat” ones…)

Model

Visualization of the model for n=2
Model – measuring performance

Assume (quite naturally) that the bottleneck is the upload capacity
 i.e. no network bottleneck such as server saturation
The file uploading capacity of the entire system is $\sum_i x_i U_i$
Consider the steady state:
 Define $x_i$ as the average number of type i peers
 Approximation:
 Substitute the latter in the former, in steady state:
Model – measuring performance

In steady state, the capacity should be equal to the arrival rate (as the file size is 1)
 Obtain
 Define the “share-ratio” and rewrite as
 In a steady and balanced system, share-ratio should be 1
 Average system download time
Those two equations define the solution space, as well as the resulting performance
Feasible solutions are the set
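
The equations on these slides did not survive, so the following is only one self-consistent reading of the steady-state argument, not the paper's own notation. Assumed symbols: $\lambda$ is the total arrival rate, $p_i$ the probability that an arrival is of type i, $x_i$ the average number of type i peers, $T_i$ the average download time of type i, and $\rho_i$ its share-ratio (bytes uploaded per byte downloaded, with file size 1). The approximation step is taken to be Little's law, and the feasibility condition is taken to be the download capacity limit.

```latex
\begin{align*}
C &= \sum_{i=1}^{n} x_i U_i
  && \text{total upload capacity, every peer uploading at } U_i \\
x_i &\approx \lambda_i T_i = p_i \lambda T_i
  && \text{Little's law (assumed approximation)} \\
\sum_{i=1}^{n} p_i \lambda\, T_i U_i &= \lambda
  \;\Longrightarrow\; \sum_{i=1}^{n} p_i U_i T_i = 1
  && \text{capacity equals arrival rate} \\
\rho_i &:= U_i T_i
  \;\Longrightarrow\; \sum_{i=1}^{n} p_i \rho_i = 1
  && \text{average share-ratio is 1} \\
T &= \sum_{i=1}^{n} p_i T_i = \sum_{i=1}^{n} \frac{p_i \rho_i}{U_i}
  && \text{average system download time} \\
T_i &\ge \frac{1}{D_i}
  && \text{a peer cannot download faster than } D_i
\end{align*}
```

Under this reading, the constraint on the share-ratios and the expression for $T$ are the two equations that span the solution space.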
Model – measuring fairness


Consider share-ratio as a good and natural measure of fairness
Define fairness index:
 Measures how equal the ratios are (if all are the same, it equals 1)

(after some work) Obtain:
Also expressed in terms of upload and download of each class
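
The fairness formula itself is missing from these slides. As a stand-in with the stated property (it equals 1 when all share-ratios are equal and drops as they diverge), the sketch below uses a weighted form of Jain's fairness index. This is an assumption for illustration and may differ from the index defined in the paper; the function name and the example numbers are made up.

```python
def fairness_index(share_ratios, weights=None):
    """Weighted Jain-style index: (sum w*r)^2 / (sum w*r^2).

    share_ratios: one share-ratio per class.
    weights: fraction of peers in each class (defaults to equal weights).
    Equals 1 when all ratios are identical, and is smaller otherwise.
    """
    if weights is None:
        weights = [1.0 / len(share_ratios)] * len(share_ratios)
    mean = sum(w * r for w, r in zip(weights, share_ratios))
    mean_sq = sum(w * r * r for w, r in zip(weights, share_ratios))
    return mean * mean / mean_sq


# Two classes: one third "fat" peers uploading twice what they download,
# two thirds "thin" peers uploading half of what they download.
print(round(fairness_index([2.0, 0.5], [1 / 3, 2 / 3]), 3))  # 0.667
```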
Rate strategies

General assumption: all peers maximize their upload capacity
 Based on experiments
We want the optimal average download time T
 Solve the constrained problem
 Use the Lagrange multiplier method
Rate strategies

The optimal average download time T that is obtained is:
 We get an assignment for
 The system gives the “thin” peers (other than type-1) maximum upload capacity
 “Thin” peers get more than they contribute
 Calculate the fairness index under this solution (shown to be quite low)
Rate strategies

Now, apply the same strategy to achieve optimal fairness
 Then check what the resulting performance measure is
Optimal fairness is achieved when
We get different assignments for
Compare the two:
 In terms of system performance, we have:
 In terms of system fairness, we have:
Entire design space


Actually those are only two of infinitely many solutions for assigning
The space lies on a curve
 Example: a system of two types with specific capacities
Simulation results

Experiments with a two-type system (with the same characteristics as the last system)
 Average downloading time:
 Fairness:
Simulation results

Fundamental tradeoffs:
 Taken for the extreme values of the two strategies:
Summary:
 Cannot enjoy the best of both worlds
 Current BitTorrent implementations lie somewhere on the curve
Think about …

With regard to the first paper (service availability), how was the ~8.3 average torrent lifespan deduced?
 Thank you (those who are still awake…)
Papers

[1] Measurements, Analysis, and Modeling of BitTorrent-like Systems
Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang

[2] Taming the Torrent - A Practical Approach to Reducing Cross-ISP Traffic in Peer-to-Peer Systems
David R. Choffnes and Fabián E. Bustamante

[3] The Delicate Tradeoffs in BitTorrent-like File Sharing Protocol Design
Bin Fan, Dah-Ming Chiu, John C.S. Lui