On the Impact of Seed Scheduling in Peer-to-Peer Networks
Boston University, Computer Science Technical Report BUCS-TR-2010-034

Flavio Esposito (flavio@cs.bu.edu), Computer Science Department, Boston University, Boston, MA
Ibrahim Matta (matta@cs.bu.edu), Computer Science Department, Boston University, Boston, MA
Debajyoti Bera (dbera@iiitd.ac.in), IIIT-Delhi, New Delhi, India
Pietro Michiardi (pietro.michiardi@eurecom.fr), Eurecom Institute, Sophia Antipolis, France
Abstract—In a content distribution (file sharing) scenario, the initial phase is delicate due to the lack of global knowledge and the dynamics of the overlay. An unwise piece dissemination in this phase can cause delays in reaching steady state, thus increasing file download times. After showing that finding the scheduling strategy for optimal dissemination is computationally hard, even when offline knowledge of the overlay is given, we devise a new class of scheduling algorithms at the seed (the source peer with full content), based on a proportional fair approach, and we implement them on a real file sharing client. In addition to simulation results, we validated on our own file sharing client (BUTorrent) that our solution improves the average downloading time of a standard file sharing protocol by up to 25%. Moreover, we give theoretical upper bounds on the improvements that our scheduling strategies may achieve.
Index Terms—Peer-to-peer, seed scheduling, distributed protocols, BitTorrent.
I. INTRODUCTION

In the last decade, content distribution has switched from
the traditional client-server model to the peer-to-peer
swarming model. Swarming, i.e. parallel download of a file
from multiple self-organizing peers with concurrent upload to
other requesting peers, is one of the most efficient methods for
multicasting bulk data. Such efficiency is in large part driven
by cooperation in the ad-hoc network forming the swarm. This
paradigm has recently been widely studied as it enables a lot of
potential applications (see e.g., [1]–[4] and references therein),
with particular focus on making content distribution protocols
efficient and robust.
In particular, measurements [4], simulation [5], and analytical studies [6] on BitTorrent-like protocols have shown that,
even though peers cooperate, leveraging each other’s upload
capacity, this operation is not done optimally in every possible
scenario. Prior research in this area has therefore focused
on both the analysis and the improvement of clients (or
downloaders) in their strategies for selecting which neighbor (peer) to download from — including dealing with free riders using rational tit-for-tat exchanges [3], or reducing inter-ISP traffic using geographic or topological locality [7] — and which part of the content (piece) to download next [8]. (This work has been partially supported by National Science Foundation grants, including CISE/CCF Award #0820138, CISE/CSR Award #0720604, CISE/CNS Award #0524477, CNS/ITR Award #0205294, and CISE/EIA RI Award #0202067.)
Other studies [9], [10] have shown that there may be source bottlenecks due to an uneven piece distribution during initial phases and due to churns — transient phases characterized by a burst of simultaneous requests for a content. In fact, during churn, downloaders' strategies are less effective since the connected (neighboring) peers have either no interesting content, or no content at all. As a consequence, the time to download increases.
We claim that the source (also called seed) can help through piece scheduling: rather than simply responding to downloaders' requests immediately in a FCFS manner, the seed can provide a notion of fairness in distributing the content pieces.
We propose to provide piece-level fairness via proportional fair scheduling at the source of the content to ensure that different pieces are uniformly distributed over the peer-to-peer network. Specifically, we show when and how this policy enforced by the seed may result in a more effective exchange of pieces and therefore in a shorter average time to download.
The main contributions and the contents of the paper are
summarized as follows: in Section II we give an overview
of the BitTorrent protocol, defining some notions that we use
in later sections. In Section III we describe why a content
distribution protocol ala BitTorrent does not work well in
dynamic scenarios, then go on to describe the problem of
source scheduling of pieces to allow downloaders to finish as fast as possible, which we prove to be NP-hard.
Consequently, in Section IV, we describe our polynomial
time seed scheduling algorithm for content distribution protocols and evaluate its performance in later sections.
In particular, in Section V we give an upper bound on the
performance improvement that our seed scheduling algorithm
may achieve, after showing why scheduling at the seed is
important to reduce the time to download. Moreover, we adapt
the analytical fluid model in [4] so it is valid even during
the transient (initial and churn) phases, capturing the effect of
seed scheduling, and we show that a smarter seed scheduling
increases the chances that leechers will more often find their
missing pieces among their neighbouring leechers.
Section VI experimentally validates our analysis of effective swarming using both a known event-driven peer-to-peer simulator [11] and an implementation of our solution on a real file sharing client (that we call BUTorrent [12]), confirming also on PlanetLab [13] that in dynamic overlays we improve the average downloading time by up to 25%. Finally, in Section VII we discuss related work.
II. BITTORRENT OVERVIEW

BitTorrent is a file sharing protocol for content dissemination. The content is divided into 256 KB pieces so that peers not having the whole content can speed up their download by exchanging pieces among themselves. A peer-to-peer system applying a swarming protocol labels the peers interested in downloading pieces leechers, and peers that act only as servers (i.e., only upload content) seeders or seeds. In this work, we consider a torrent T to be a set of at least one seed, at least two leechers, a tracker, which is a special entity that helps bootstrapping the peer-to-peer network, and a file F, split in p pieces, to be distributed. Moreover, we use the following definitions:
Definition 1: (Neighborhood) The neighborhood or peerset of a peer v, denoted as Γ(v), is defined as the set of peers directly connected to v. We denote its size |Γ(v)| as k_v.
The BitTorrent protocol works as follows (see Figure 1(a) for an illustration): if a peer v_i, leecher or seed, has pieces that another connected leecher v_j does not have, an "interested" message is sent from v_j to v_i. Together with the interested message, a binary vector called bitfield is sent. Every peer has a bitfield, of length equal to the number of pieces of F, and a bit is set to one if the peer has the corresponding piece. Through the bitfield dissemination, each peer v has information of the pieces present in Γ(v) (only). Amongst all the interested leechers, only M ≤ k_v (M = 4 in the first version of the protocol) peers are given an opportunity to download data, using the choke algorithm, which is applied every re-choking interval T_rc (T_rc is, mysteriously, set to ten seconds in most file sharing implementations ala BitTorrent). For seeds, it is simply a round robin schedule among the peers in Γ(s).
For leechers, choking is based instead on two mechanisms: (i) tit-for-tat and (ii) optimistic unchoke. The tit-for-tat peer selection algorithm captures the strategy of preferring to send pieces to leechers which had served them data in the past: upload bandwidth is exchanged for download bandwidth, encouraging fair trading. In the optimistic unchoke, one leecher (up to M) is chosen at random from Γ(v). The objective is mainly to explore potentially faster peers; furthermore, this peer selection strategy is more effective than the tit-for-tat in the initial rounds because few leechers have interesting pieces to offer. In Figure 1(a) only two leechers (A and B) are in Γ(s), so both are unchoked.
Unchoked leechers select one piece of the content they want to download using the Local Rarest First (LRF) algorithm. Thanks to the disseminated bitfield, a leecher v counts how many copies of each piece are present in Γ(v), and then requests the rarest using LRF; any ties are broken at random. As soon as a peer gets a request it replies with the piece. In Figure 1(a) the piece with ID 2 goes to the leecher A. Upon reception of a piece, leechers inform all their neighbors with a "have" message. Now, if leecher B does not have the piece with ID 2 yet, B sends an interested message to A.
Definition 2: (Initial Phase) Given a torrent T, we define the initial phase of T to be the time interval between the first scheduling decision made by the first seed, and the time the last piece has been completely uploaded to some leecher in the seed's peerset.
After the initial phase is complete, we say that the protocol enters the steady state.
For completeness of our protocol overview, we also introduce the last phase of the protocol. When a leecher v has only one piece left, BitTorrent changes its normal behavior: v requests its last piece from all its k_v neighbors; as soon as some peer w replies with the piece, v sends to Γ(v) \ w a "cancel" message to avoid waste of upload capacity. Formally we have:
Definition 3: (End-Game Mode) Given a torrent T, we say that a leecher v of T is in the end-game mode if it has completed the download of all but its last piece.
As in [5], we ignore the end-game mode as it is inconsequential to our study, having no effect on the steady state performance.
Fig. 1. (a) BitTorrent protocol: from left to right, the sequence of messages. (b) Global information leads to faster content distribution. Peers unaware of each other cannot use swarming.
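To make the LRF selection described above concrete, the following is a minimal sketch of one request step, assuming bitfields are represented as Python sets of piece IDs (the helper name lrf_request and the data layout are ours, not part of BitTorrent):

    from collections import Counter
    import random

    def lrf_request(my_pieces, neighbor_bitfields):
        # Count how many neighbors hold each piece (replica count in Γ(v)).
        counts = Counter()
        for bitfield in neighbor_bitfields:
            counts.update(bitfield)
        # Only pieces we miss and that some neighbor can serve are candidates.
        candidates = set(counts) - my_pieces
        if not candidates:
            return None
        rarest = min(counts[i] for i in candidates)
        # LRF: request the locally rarest piece, breaking ties at random.
        return random.choice([i for i in candidates if counts[i] == rarest])

    print(lrf_request({0}, [{0, 1, 2}, {2}]))  # -> 1 (one replica, vs. two of piece 2)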
III. PROTOCOL WEAKNESSES AND PROBLEM DEFINITION
We consider the problem of a seed scheduling the piece whose injection helps reduce the download time for all leechers, as well as the time it takes for the entire content to be fully injected by the seed. We show how, especially in dynamic overlays, increasing the efficiency of swarming translates into reducing the requests to the seeds, which in turn reduces the first content uploading time. As a consequence, the average leechers' downloading time is also reduced. To do so, we need the following definitions:
Definition 4: (Effectiveness) Given a peer-to-peer system, we define the effectiveness of a file sharing η as the probability that a leecher v has at least an interesting piece for its neighborhood Γ(v).
We discuss the effectiveness in detail in Section V-B.
Definition 5: (Burstiness) Given a dynamic peer-to-peer system — peers join and leave the overlay — we define burstiness as the ratio of the peak rate of leechers' arrival to the average arrival rate during a given observation period.
Observe that the peak rate is defined as the ratio of the
size of a churn (number of newly arriving leechers) to the
length of the interval over which the churn occurs, and that
the average arrival rate is defined as the ratio of the total
number of leechers to the total considered time.
Definition 6: (Seed Utilization) We define seed utilization
as the ratio of the number of uploads from a seed to the average
number of uploads from all leechers in the system.
Definition 7: (Clustering Coefficient) Given a graph G = (V, E), and denoting with E_i ⊆ E the subset of edges that lie among vertex i and its neighbors, we define the Clustering Coefficient for G [14] as:
    CC = (1/|V|) · Σ_{i=1..|V|} E_i / C(k_{v_i}+1, 2)   (1)

where C(a, b) denotes the binomial coefficient.
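As a sanity check on Equation (1), here is a small sketch that computes CC for a graph given as adjacency sets; following the C(k_{v_i}+1, 2) normalization, we assume E_i counts the edges among v_i and its neighbors (function and variable names are ours):

    from itertools import combinations

    def clustering_coefficient(adj):
        # adj: dict mapping each vertex to the set of its neighbors
        # (undirected graph, no isolated vertices assumed).
        total = 0.0
        for v, neighbors in adj.items():
            group = neighbors | {v}                        # v and its k_v neighbors
            possible = len(group) * (len(group) - 1) // 2  # C(k_v + 1, 2)
            e_i = sum(1 for a, b in combinations(group, 2) if b in adj[a])
            total += e_i / possible
        return total / len(adj)

    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    print(clustering_coefficient(triangle))  # 1.0: every possible edge exists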
A. Global vs Local Information

Consider Figure 1(b): in the first round the seed uploads piece 1 to leecher A, which updates its bitfield vector. Since A is not connected to B, no "have" message is sent. In the second round, the seed uploads the same piece 1 to B. The same happens in the third and fourth rounds for piece 2. Peers unaware of each other cannot use swarming, which slows down their downloading time as they ask the seed for the same pieces. To use swarming, peers have to have neighbors with interesting pieces. Seeds are interesting by definition as they have the whole content; leechers may not be because the LRF algorithm does not yield an equal distribution of pieces (and replicas), so it is less likely for a leecher to find its missing pieces at other neighboring leechers since the view leechers have is limited. Henceforth, we refer to the equal distribution of pieces as "piece fairness". The lack of piece fairness is especially pronounced when the clustering coefficient of the overlay is low.
Without fairness, requests to the seed for the same pieces can occur more frequently before the whole content is injected. This can lead to torrent death if the seed leaves the system prematurely and leechers are left with incomplete content. In some other cases, uploading the same piece twice can even be costly. For example, if the seed is run using the Amazon S3 [16] service, higher seed utilization results in higher monetary costs for using Amazon resources (bandwidth, CPU, etc.).
B. Dynamic Overlays

The problem we present here is related to the dynamic nature of the overlays: the lack of offline knowledge of which peer will be present at which instant of time makes the piece distribution problem challenging. In fact, in case a seed or a leecher schedules a piece to a leecher j that later goes temporarily or permanently offline, the effectiveness of the overlay segment containing j is inevitably reduced and so performance decreases.
Although BitTorrent has been widely studied by the research community and recognized as a robust protocol, it is far from being an ideal protocol in every scenario. In the presence of burstiness, for example, i.e., churn phases, peersets can be too small to significantly benefit from swarming [15]. Moreover, for overlays with low clustering coefficient, seed utilization can be surprisingly high. In our work we focus on seeds rather than leechers.
To the best of our knowledge, only in [5] have seeds been taken into consideration, assuming significant changes to the whole BitTorrent protocol. In our approach, we only modify the scheduling at the seed, leaving the rest of the BitTorrent protocol intact.
Currently, piece selection in BitTorrent-like protocols is done uniquely by the leechers through the LRF algorithm; the seed simply replies to the leechers' requests in a first come first serve (FCFS) fashion. The scheduling is crucial in the transient phases since the injection of too many duplicates diminishes the swarming efficiency and wastes seed upload bandwidth. During any transient phase, almost every piece has the same LRF rarity ranking, as almost no leecher has any piece, and since in LRF ties are broken at random, a leecher may end up asking for the same piece that a current neighbor already asked for in the current or previous rounds.
C. The Initial Phase Problem
A third problem occurs during the initial phase of a torrent.
We generalize the initial phase concept to a transient phase,
namely, any phase in which the system does not fully utilize
its swarming properties due to the lack of pieces to trade.
Accordingly, in the rest of the paper we use transient and initial phase interchangeably to mean any phase displaying poor interest among unchoked leechers.
Recalling how the tit-for-tat mechanism works (Section II), when a new group of leechers joins the overlay (churn phase), even though swarming techniques are used, leechers have not yet downloaded many pieces, so every leecher will most probably (though not surely, due to the rare optimistic unchoke events) ask the seed for its missing pieces, making the seed a bottleneck. Therefore, during any transient phase, the seed is one of the few interesting peers. Even though this congestion at the source (seed) is inevitable, and may not last for too long, this phase can be prolonged if the dissemination is inefficient.
We are not the first to observe that source bottlenecks occur
due to inefficient piece dissemination during flash crowds and
churn phases [17]. We consider any churn phase to be another
initial phase, and so during all the initial phases of a torrent,
a wise seed scheduling is crucial.
D. Example of Inefficiency of the Initial Phase

We illustrate the inefficiency of the initial phase through an example.
Definition 8: (Collision) We define a collision event at round i, C_i, to be the event that two leechers ask the seed for the same piece at round i.
Note that if two connected leechers generate a collision at round i, and both are served, we have a suboptimal use of the seed upload capacity, as one leecher could download the colliding piece using swarming.
Definition 9: (Earliest Content Uploading Time) We define the earliest content uploading time as the time it takes for the seed to upload every piece of the file at least once.
A collision event slows down performance with respect to both the earliest content uploading time and the downloading time, since the seed's upload capacity is used to upload the same piece more than once. In particular, for every occurring collision, an extra round may be needed to inject the whole content.
Example of optimal seed scheduling: Let us consider one seed, six pieces, and three leechers A, B and C. Assume that, running LRF, the leechers end up asking the seed for pieces in the following order: A = [1, 2, 3, 4, 5, 6], B = [4, 5, 3, 1, 6, 2] and C = [3, 6, 2, 5, 1, 4]. Let us also assume the seed uploads three requests in each round. After the first seed scheduling round, pieces 1, 4 and 3 are uploaded, and after the second round, all the content is injected. This is the optimal case: to inject six pieces the seed needs two rounds.
Collision example: If instead the LRF for B ends up with the permutation B = [4, 2, 1, 3, 6, 5], we notice that in the second scheduling round we have a collision between leechers A and B, both asking the seed for piece 2, and so two scheduling rounds are not enough anymore; to inject the whole content we need to wait three rounds (both cases are replayed in the sketch at the end of this subsection).
Nowadays, BitTorrent-like protocols do not have policies for the seed to schedule pieces, since the piece selection is done by LRF. In the next sections we present an algorithm that attempts to overcome the lack of leechers' awareness of each other, improving piece fairness, and we show that scheduling is crucial for boosting swarming effects.
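The example above can be checked mechanically. A minimal sketch, under this subsection's assumptions (fully connected leechers, so a piece the seed has already injected is never re-requested; one request per leecher per round, all served):

    def rounds_to_inject(request_orders, p):
        # Each leecher requests the first piece in its preference order that
        # the seed has not yet injected; the seed serves every request.
        uploaded, rounds = set(), 0
        while len(uploaded) < p:
            batch = {next(i for i in order if i not in uploaded)
                     for order in request_orders}
            uploaded |= batch
            rounds += 1
        return rounds

    A = [1, 2, 3, 4, 5, 6]
    B = [4, 5, 3, 1, 6, 2]
    C = [3, 6, 2, 5, 1, 4]
    print(rounds_to_inject([A, B, C], 6))                   # 2: the optimal case
    print(rounds_to_inject([A, [4, 2, 1, 3, 6, 5], C], 6))  # 3: the collision on piece 2 costs a round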
E. Complexity of Piece Dissemination

To motivate our attempt to devise a better scheduling at the seed, we show that finding the optimal piece dissemination is computationally hard, even if offline knowledge about the overlay of the peers is given. As described in Section III, one focus of the seed scheduling is to increase swarming efficiency — increasing the chances that neighbouring leechers have interesting pieces to trade — and therefore to reduce downloading time. As an additional benefit, such efficiency reduces upload requests to the seed, whose upload capacity may turn out to be the bottleneck in the initial phases, as shown in [9], [10].
We consider a simplified offline variant of the problem which distributes all the pieces among the leechers such that they can exchange pieces in the best possible way. Suppose that there is only one seed and a static set of leechers, whose topology is known to the seed. To compute the complexity of the problem, instead of the peers asking for pieces, we let, as in [5], the seed decide which piece to give to which peer. The seed is allowed to upload pieces to all of its peers in any round, but we restrict the seed's activity to only the first round: the seed neither uploads any piece nor plays any other role on all but the first round. Therefore, it becomes necessary to make one additional assumption, that the number of pieces is upper bounded by the number of leechers. The objective of the seed is to minimize the time taken by all peers to download all the pieces.
Since the seed knows the topology, the number of rounds for this case is surely a lower bound on the number of rounds it would take if the leechers applied any piece selection algorithm (e.g., LRF) to decide which pieces to request. We show that even in this simplified form, it is NP-hard for the seed to decide how to schedule its upload for the first and only round. Only leechers which are reachable from one another can effectively participate in swarming; hence, we assume that all leechers form a connected graph; our hardness result applies independently to each connected component of the overlay.
We prove the NP-hardness by considering the following decision version of a graph theoretic model of the above problem.
Definition 10: (Complete Piece Dissemination) Given an undirected connected graph G(V, E), an integer k ≤ |V| and an integer p ≤ |V|, we want to decide if there is a way to assign one piece p_v ∈ {1, ..., p} to every vertex v such that the following holds:
1) Before the beginning of round 1, vertex v is given piece p_v;
2) In each subsequent round, v collects all the pieces in its neighborhood;
3) By the end of round k, v has collected all the pieces 1, ..., p.
We now show that the Complete Piece Dissemination problem is NP-complete by reducing from another graph problem, the Generalized Domatic Partition (GDM), which is a generalization of a known NP-complete problem, Domatic Partition [18]. In order to do that, we generalize the Definition 1 of neighborhood to k hops.
Definition 11: (k-neighborhood) Let G(V, E) denote an undirected graph. Given a subset S ⊆ V of vertices, we define the neighborhood of S as Γ(S) = ∪_{v∈S} Γ(v). The k-neighborhood of a vertex v is inductively defined as the set of all vertices that are reachable from v by a path of length k or less. Formally:

    Γ_k(v) = {v}                              if k = 0
    Γ_k(v) = Γ_{k−1}(v) ∪ Γ(Γ_{k−1}(v))       if k > 0   (2)

Furthermore, the k-neighborhood of a set of vertices is defined as Γ_k(S) = ∪_{v∈S} Γ_k(v).
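Equation (2) translates directly into an iterated neighborhood expansion; a minimal sketch (function names are ours):

    def k_neighborhood(adj, v, k):
        # Γ_k(v) per Equation (2): Γ_0(v) = {v};
        # Γ_k(v) = Γ_{k-1}(v) ∪ Γ(Γ_{k-1}(v)).
        reached = {v}
        for _ in range(k):
            reached |= {w for u in reached for w in adj[u]}
        return reached

    def k_neighborhood_of_set(adj, S, k):
        # Γ_k(S): union of Γ_k(v) over v in S.
        return set().union(*(k_neighborhood(adj, v, k) for v in S))

    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(k_neighborhood(path, 0, 2))              # {0, 1, 2}
    print(k_neighborhood_of_set(path, {0, 3}, 1))  # {0, 1, 2, 3}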
Definition 12: (Dominating Set) Given an undirected connected graph G, a dominating set S of G is a set of vertices such that each vertex either belongs to S or is a neighbor of some vertex in S, i.e., every vertex v ∈ Γ_1(S).
Definition 13: (Domatic Partition) Given an undirected connected graph G, a domatic partition of G is a partition of the vertices such that each part is a dominating set. The size of a domatic partition is the number of parts. The domatic number of a graph is the maximum size of any domatic partition.
For any connected graph, its domatic number is at least 2
and at most δ + 1 where δ is the minimum degree of any
vertex in the graph. It is well known that finding the domatic
number of any graph is NP-hard [19]; in other words, the
following problem is NP-complete: Given a graph G and an
integer p ≤ |V |, is there a domatic partition of size p?
We generalize the above definitions to include the concept
of k-neighborhood.
Definition 14: (Generalized Dominating Set) Given an
undirected connected graph G(V, E), a k-th order dominating
set is a set of vertices S such that for every vertex v ∈ V ,
v ∈ Γk (S).
Definition 15: (Generalized Domatic Partition) Given an
undirected connected graph G(V, E), an integer k ≤ |V |, and
an integer p ≤ |V |, does there exist a partition of V into
V1 , . . . , Vp such that each Vi is a k-th order dominating set?
Lemma 1: The Generalized Domatic Partition problem is
NP-complete.
Proof: Observe that the original problems of dominating set and domatic partition are the special cases, with k = 1, of generalized dominating set and generalized domatic partition, so the generalized problem is NP-hard. It is easy to see that generalized domatic partition is in NP, since given any partition, it can be verified in polynomial time whether each part indeed forms a k-th order dominating set. Therefore, the generalized domatic partition problem is NP-complete.
Theorem 2: The Complete Piece Dissemination problem is NP-complete.
Proof: The complete piece dissemination and the generalized domatic partition problems are equivalent: the vertices initially given piece i in the complete piece dissemination problem form the part V_i in a generalized domatic partition, and vice-versa. Since the generalized domatic partition problem is NP-complete, so is the complete piece dissemination problem.
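The equivalence in Theorem 2 suggests a direct checker: an assignment of pieces completes dissemination within k rounds exactly when each piece's initial holders form a k-th order dominating set. A minimal sketch (names ours):

    def gamma_k(adj, S, k):
        # Γ_k(S): vertices reachable from S by a path of length at most k.
        reached = set(S)
        for _ in range(k):
            reached |= {w for u in reached for w in adj[u]}
        return reached

    def completes_in_k_rounds(adj, piece_of, p, k):
        # piece_of: vertex -> initially assigned piece in {0, ..., p-1}.
        # True iff each class V_i = {v : piece_of[v] = i} is a k-th order
        # dominating set, i.e., Γ_k(V_i) covers every vertex (Theorem 2).
        vertices = set(adj)
        for i in range(p):
            Vi = {v for v in adj if piece_of[v] == i}
            if gamma_k(adj, Vi, k) != vertices:
                return False
        return True

    # A 4-cycle with p = 2 pieces and k = 1: opposite corners share a piece.
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(completes_in_k_rounds(cycle, {0: 0, 2: 0, 1: 1, 3: 1}, p=2, k=1))  # True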
We end this section by discussing our assumptions, pointing out how the Complete Piece Dissemination problem, which we have shown to be NP-complete, might differ from real systems running peer-to-peer protocols ala BitTorrent. In general, the scenario to analyze is far more complex
as overlays may have multiple connected (but uncoordinated)
seeds, thereby allowing more pieces in the network in a shorter
time. Moreover, these protocols have a limit on the number
of TCP connections that any leecher may keep open (in the
first version of BitTorrent for example, the maximum degree
was set to eighty), and a limit on the number of simultaneous
uploads by a seed or a leecher, thereby limiting the overlay
size and therefore the rate of piece exchange. A significant
difference is the dynamic nature of a real overlay, where
leechers join and leave, seeds arrive and depart from the
system, resulting in a constantly changing overlay structure.
Theoretical analysis of such dynamic systems, as well as
design of protocols that would also orchestrate cooperation
among the seeds (besides the usual cooperation among leechers) are beyond the aims of this paper and are left as interesting
research directions.
IV. SEED SCHEDULING

In this section we provide a detailed explanation of how we propose to modify any swarming protocol ala BitTorrent. Our solution, based on the Proportional Fair Scheduling (PFS) algorithm [20], [21], exploits and improves the method conceived by Bharambe et al. [5], where memory about pieces scheduled in the past was first introduced. Furthermore, we formally present the scheduling algorithm that we applied at the seed.
A. Detailed Idea of our PFS Scheduling at the Seed

Smartseed [5] inserts intelligence into the seed by actively injecting the pieces least uploaded in the past, changing the nature of the BitTorrent protocol from pull (on demand) to push (proactive). Our seed strategy, instead, consists of unchoking every possible leecher, so that all the requests are taken into account, and uploading the M (four) pieces that are both requested the most in the current round and uploaded the least in the past rounds, without changing the way BitTorrent-like protocols work. The piece requested the most in the current scheduling round represents the instantaneous need, namely, the present, while the least uploaded piece represents the past scheduling decisions. We can view Smartseed as considering only the past. The present need gives the seed a rough view of which pieces leechers currently need, perhaps because neighboring leechers do not have these pieces, while past uploads give the seed the ability to inject the piece that is least replicated from its vantage point. By equalizing the number of piece replicas, the probability that leechers have pieces that are of interest to their neighboring leechers increases.
Formally, if r_i is the number of requests for piece i at the current round, and τ_i is the throughput for piece i, namely, the number of times piece i has been uploaded in all the previous rounds, PFS chooses the piece i* that maximizes the ratio r_i[t]/τ_i[t−1], namely:

    i*[t] = arg max_i r_i[t] / (τ_i[t−1] + ε)   (3)

where ε is just a small positive constant introduced to avoid division by zero. r_i is decreased every time a piece message is sent and τ_i is increased every time a piece has been completely uploaded. The BitTorrent protocol splits the 256 KB piece further into sub-pieces. We ignored this further fragmentation in our simulations but not in our system implementation (BUTorrent [12]).
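A one-line realization of Equation (3), assuming the request counts and the throughput history are kept as dictionaries keyed by piece ID (a sketch; variable names are ours):

    EPSILON = 1e-6  # the ε of Equation (3), avoiding division by zero

    def pfs_pick(requests, throughput):
        # requests[i]   = r_i[t]: requests for piece i in the current round
        # throughput[i] = τ_i[t-1]: smoothed count of past uploads of piece i
        # (ties would be broken at random in the real client)
        return max(requests, key=lambda i: requests[i] / (throughput[i] + EPSILON))

    # Piece 7 was requested as often as piece 3 but uploaded less in the past:
    print(pfs_pick({3: 2, 7: 2}, {3: 0.5, 7: 0.1}))  # -> 7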
B. History has to be forgotten

When peersets (neighborhoods) are dynamic, keeping track of the pieces uploaded since the beginning of the torrent is a bad idea, since some leechers may have left and others may have arrived. The intuition is that the weight that τ_i brings into the scheduling decision has to decrease over time. For this reason, every time the seed uploads a piece, we update τ_i using an exponential weighted moving average (EWMA), namely:

    τ_i[t+1] = β · I_i[t] + (1 − β) · τ_i[t]   (4)

where I_i[t] is an indicator function such that:

    I_i[t] = 1 if piece i was uploaded at round (time) t, and 0 otherwise.   (5)

From power series analysis [22], we can express the smoothing factor β in terms of N time periods (scheduling decisions) as β = 2/(N+1). This means that an upload decision is forgotten after N rounds.
What is the best choice for N then? Let us assume that a considered seed S has at most Γ connections (in BitTorrent, Γ = 80). Let us also assume a worst-case scenario, where the seed's neighborhood is made of leechers that do not serve each other. When the peerset is static (peers neither arrive nor depart), the seed can schedule the same piece i at most Γ times, one for each leecher, since peers do not ask again for a piece they already have.
If the seed happens again to receive a request for piece i, it means the request is coming from a new leecher which joined the overlay after the first time i was uploaded by the seed. So it is wise to upload that piece again. In other words, if a seed has received Γ requests for the same piece, the (Γ+1)-th request should be counted as fresh, and so we set β = 2/(Γ+1) so that the oldest upload decision for this piece gets forgotten.
As shown in our simulation study, the value of β should depend on the level of burstiness of the system. The challenge here is that the scheduling is an online process. If the seed does not know in advance how dynamic the peerset is, a static value of β may only work for certain cases. For classic BitTorrent overlays, where burstiness values are not extremely high [10], a value of β = 2/(80+1) ≈ 0.0247 is a good heuristic.
Notice that 0 ≤ β ≤ 1 and that the difference between BitTorrent and PFS with β = 1 (no memory of the history at all) is that PFS with β = 1 serves the pieces requested the most in each round, considering all its connections (peerset), while seeds in BitTorrent apply FCFS and round robin over the leechers in their peerset. Moreover, if the peerset of the seed is static, β → 0, i.e., remembering the whole history, is the best choice.
A more elaborate solution would be to use a control approach to estimate β. We leave this approach for future work.
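A sketch of the forgetting update of Equations (4) and (5) with the β = 2/(Γ+1) heuristic (names ours):

    GAMMA = 80                  # Γ: maximum number of seed connections
    BETA = 2.0 / (GAMMA + 1)    # ≈ 0.0247, the heuristic suggested above

    def ewma_update(throughput, uploaded_piece, beta=BETA):
        # τ_i[t+1] = β·I_i[t] + (1−β)·τ_i[t], where the indicator I_i[t]
        # of Equation (5) is 1 only for the piece uploaded at round t.
        for i in throughput:
            indicator = 1.0 if i == uploaded_piece else 0.0
            throughput[i] = beta * indicator + (1.0 - beta) * throughput[i]

With β = 1 the history is wiped each round (the memoryless case compared against BitTorrent above); with β → 0 the whole history is remembered, the best choice for static peersets.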
C. Our PFS Algorithm

In this subsection we formally present the Proportional Fair Seed Scheduling. We have implemented this algorithm in a new file sharing client that we call BUTorrent [12].

    Input: Γ (seed connections), T_rc (re-choking interval), T (timeout for collecting requests).
    Output: Set of M pieces to schedule per round.
    while seed S has connected leechers do
        if re-choking interval T_rc expires then
            foreach leecher i in peerset of S do
                unchoke i;
            end
            while S receives requests for pieces within T do
                update vector of requests R = {r_1, ..., r_p};
                foreach q = 1, ..., M do
                    S sends piece q (seed applies FCFS);
                    update R and τ;
                end
            end
            if collected requests > M then
                β ← 2/(Γ + 1);
                foreach k = 1, ..., M do
                    i*_k ← arg max_i r_i/(τ_i + ε)   (break ties at random);
                    r_{i*_k} ← r_{i*_k} − 1;
                    foreach j = 1, ..., p do
                        if j = i*_k then I_j = 1 else I_j = 0;
                        τ_j ← β·I_j + (1 − β)·τ_j;
                    end
                end
                S chokes the N − M peers whose requests were not scheduled;
            end
            keep serving leechers which requested i = i*_k;
            update R and τ as above for every uploaded piece;
        end
    end
    Algorithm 1: Seed Scheduling in BUTorrent [12].
In order to have a global view, the seed unchokes every leecher in its peerset, allowing them to request their LRF pieces. Then a timer T is started. The first M requests (in BitTorrent and in our experiments M = 4) are served by the seed as BitTorrent does (FCFS). This is because the collection of requests may take some time due to congestion in the underlay network, and we do not want to waste seed uploading capacity. Moreover, some of the Γ requests the seed is expecting may never arrive. That is why we need the timer T: leechers send their "interested" messages to their whole neighborhood (peerset), so it is possible that between the time the "interested" message arrives at the seed and the time the unchoke message is sent by the seed, a leecher may have started downloading its remaining pieces from other peers, having nothing else to request from the seed.
When the timer T expires, PFS is run on the collected requests. Since the seed upload capacity has to be divided by M, if the collected requests are fewer than M there is no choice to make and so no reason to apply PFS. After choosing the best PFS pieces, up to M other uploads begin, namely, the seed does not interrupt the upload of the first M pieces requested during the time T. Lastly, the N − M requests that were not selected are freed by choking the requesting peers again. After the PFS decision, for a time T_rc the M peers whose requests were selected are kept unchoked and so they may request more pieces. So M PFS decisions are made every T_rc. (The choice of T_rc is not trivial and it should not be static, as overlays are not. We briefly show this point in our simulations, though leaving an extensive exploration of this parameter as an open question. The BitTorrent implementation sets it to 10 seconds.)
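Putting the pieces of Algorithm 1 together, here is a compact sketch of one re-choking round, assuming the requests collected while the timer T runs arrive as (leecher, piece) pairs (all names ours; the real BUTorrent client also handles sub-pieces and choking state):

    M, BETA, EPS = 4, 2.0 / 81, 1e-6   # upload slots, forgetting factor, ε of Eq. (3)

    def seed_round(collected, throughput):
        # FCFS head: the first M requests are served immediately so that
        # upload capacity is not wasted while requests are still arriving.
        served = [piece for _, piece in collected[:M]]
        # Aggregate the remaining requests into r_i and apply PFS (Eq. (3)).
        r = {}
        for _, piece in collected[M:]:
            r[piece] = r.get(piece, 0) + 1
        for _ in range(min(M, len(r))):
            best = max(r, key=lambda i: r[i] / (throughput.get(i, 0.0) + EPS))
            served.append(best)
            del r[best]              # requesters of unselected pieces are choked again
        for piece in served:         # EWMA history update, Eq. (4)
            for i in list(throughput):
                throughput[i] *= (1.0 - BETA)
            throughput[piece] = throughput.get(piece, 0.0) + BETA
        return served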
V. ANALYTICAL RESULTS

In this section we give theoretical evidence that introducing smarter scheduling at the seed may be highly effective in terms of performance. We also show how the effectiveness of file sharing η (Definition 4), which our algorithm improves, thereby reducing downloading time, can capture the effect of seed scheduling using the fluid model in [4].

A. Collisions during the initial phases

In Section III we defined the initial phase problem (Section III-C), showing with an example (Section III-D) how serving the same piece in a round to two different connected leechers may result in a suboptimal use of the seed uploading capacity. We have thus devised PFS (Section IV-C) to determine which requests the seed should satisfy, limiting the negative effects of what we have defined as collisions (Definition 8). In this subsection we show theoretical bounds on the advantage of using PFS over a FCFS scheduling approach in reducing both these collisions and therefore the earliest content uploading time (Definition 9).
Leechers follow the LRF piece selection algorithm, i.e., in each scheduling round, every leecher makes a request for a piece randomly chosen among its equally rarest missing pieces among its direct neighbors. The seed scheduling thus depends heavily on the current leechers' connectivity, and, as we have shown earlier in Section III-E, solving it optimally in terms of the earliest content uploading time is NP-hard, even when the overlay is static and known to the seed.
We assume the worst-case scenario of only one seed and, to simplify our analysis, we consider, as in [4], uniform upload and download capacity among the M concurrently active connections. We denote the total number of leechers as L. We also assume a fully connected overlay, so that swarming is most effective. Under this last assumption, a piece downloaded by one leecher is not requested again from the seed by other leechers.
Under the assumptions given above, we can quantify the expected number of collisions in any particular round by using a variant of the Birthday Paradox adapted to our scenario.
Lemma 3: If L leechers have to choose a piece uniformly at random from p pieces, then the expected number of pairs of leechers requesting the same piece is C(L,2)·(1/p).
Proof: Consider C(L,2) indicator random variables X_ij for i, j = 1, ..., L with i < j, where X_ij = 1 if leechers i and j request the same piece and 0 otherwise. Since every leecher chooses a piece at random, Pr[X_ij = 1] = 1/p and therefore E[X_ij] = 1/p. Consider the random variable X = Σ_{i<j} X_ij counting the total number of collisions. Then the expected number of total collisions is given by E[X] = Σ_{i<j} E[X_ij] = C(L,2)·(1/p).
Let us consider an initial phase of a torrent with one seed and L fully connected leechers (any small-world network, such as those exhibited by leechers as evident in recent experiments [23], would show a similar connectivity), trying to download a file with p pieces. Let P^l denote the set of pieces yet to be uploaded at the beginning of round l; note how, for l = 1, we have P^1 = {1, ..., p}, and P^{l+1} ⊂ P^l constantly shrinks until the initial phase is over at round r when P^r = ∅. According to our assumptions, once the seed uploads a piece to some leecher, other leechers do not request it from the seed but obtain it from one of the other leechers. Thus, every leecher requests from the seed a piece chosen uniformly at random from P^l. Therefore, according to Lemma 3, as l increases, the size of the set P^l decreases, and so the expected number of collisions is higher in later rounds. Note how the number of collisions even in the earlier rounds increases as the number of pieces p becomes comparable to the number of leechers L. In particular, the expected number of collisions is larger than zero even in the first round when p < C(L,2).
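Lemma 3 is easy to confirm numerically; a minimal Monte Carlo sketch (the parameters are illustrative):

    import random
    from math import comb

    def mean_collisions(L, p, trials=20000):
        total = 0
        for _ in range(trials):
            picks = [random.randrange(p) for _ in range(L)]
            total += sum(picks[i] == picks[j]
                         for i in range(L) for j in range(i + 1, L))
        return total / trials

    L, p = 20, 100
    print(mean_collisions(L, p))   # empirical mean, close to...
    print(comb(L, 2) / p)          # ...the Lemma 3 prediction: 190/100 = 1.9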
Let us now show the two improvement bounds induced by the use of our proportional fair heuristic: first, on the number of wasted upload slots due to collision events that may arise, and second, on the earliest content uploading time (Definition 9). Note that the performance of PFS in terms of both collisions and earliest content uploading time is never worse than FCFS. In fact, in FCFS the M pieces requested and served may or may not have duplicates, depending upon the nature of the overlay and the arrival order of requests. Therefore, we may have the same or fewer collisions when using PFS, but not more. From a theoretical point of view this implies that, without complete knowledge of the overlay, we are only able to show an upper bound on the improvement.
Proposition 4: In a peer-to-peer network with one seed and homogeneous bandwidth, the PFS piece selection algorithm decreases, with respect to FCFS, the total number of collisions as well as the earliest content uploading time by a factor of at most M, where M is the number of simultaneously active connections.
Proof: See appendix.
B. Effectiveness during initial phases

In this subsection we extend the Qiu-Srikant model [4] to capture the initial phase of a torrent, modeling the seed scheduling effect through one of its crucial parameters, the effectiveness of file sharing η, defined in Section III.
In particular, in [4] there is the assumption that each leecher has a number of pieces uniformly distributed in {0, ..., p − 1}, where p is the number of pieces of the served file.
This assumption is invalid in the initial phase of a torrent. Consider Figure 2: 350 leechers join at once a torrent for downloading a file of 400 MB with only one seed. We stop the simulation before leechers can possibly reach the steady state. We notice that (i) the number of pieces each downloader has in the initial phase of a torrent follows a power law distribution, and (ii) using the proportional fair scheduling, the skewness α of the Zipf distribution is higher (α = 1.4 for a first come first serve ala BitTorrent and α = 1.9 for our PFS). The higher α is, the higher the probability that a peer has interesting pieces for its neighbors, and so the stronger the swarming effects. Notice how for 400 MB, with a piece size of 256 KB, we have 1600 pieces, but the probability of having more than 100 is less than 0.1. More precisely, the simulation is stopped after 1600 seed scheduling decisions, i.e., the time needed by an ideal round robin seed scheduling to inject the whole content.
Fig. 2. The number of pieces each downloader has in the initial phase of a torrent follows a power law distribution. In particular, the skewness α of the Zipf distribution is α = 1.4 for a first come first serve ala BitTorrent and α = 1.9 for our PFS.
The Qiu-Srikant fluid model captures the evolution of the numbers of seeds y(t) and leechers x(t) in the overlay. Let λ be the arrival rate of new leechers, c and µ the download and upload bandwidth of all leechers, respectively, θ the rate at which downloaders abort the download, γ the seed abandon rate, and η the effectiveness. From [4] we have:

    dx/dt = λ − θx(t) − min{cx(t), µ(ηx(t) + y(t))},
    dy/dt = min{cx(t), µ(ηx(t) + y(t))} − γy(t).   (6)
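The trajectories discussed below can be reproduced by integrating Equation (6) numerically; a forward-Euler sketch using the stability parameters quoted later in this section (θ = 0.001, γ = 0.2, c = µ = 1, from [4]; the function name and step size are ours):

    def fluid_model(lam, eta, x0=350.0, y0=1.0, theta=1e-3, gamma=0.2,
                    c=1.0, mu=1.0, dt=0.01, horizon=250.0):
        # x(t): leechers, y(t): seeds, per the Qiu-Srikant model of Eq. (6).
        x, y, t, trace = x0, y0, 0.0, []
        while t < horizon:
            completed = min(c * x, mu * (eta * x + y))  # download completion rate
            x += (lam - theta * x - completed) * dt
            y += (completed - gamma * y) * dt
            t += dt
            trace.append((t, x, y))
        return trace

    # Higher effectiveness empties the leecher population faster:
    print(fluid_model(lam=0.0, eta=1.0)[-1])
    print(fluid_model(lam=0.0, eta=0.8)[-1])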
Our experiments revealed that the number of pieces that a leecher has in initial and churn phases is not uniformly distributed but follows a power law distribution. For lack of space, we do not show such experiments. So we have:

    η = 1 − P{leecher i has no piece useful for its peerset},

and so for two leechers i and j:

    η = 1 − P^k,  where  P = P{leecher j needs no piece from leecher i} = P{leecher j has all pieces of leecher i}.

Thus:

    P = (1/c²) · Σ_{n_j=1..p−1} Σ_{n_i=0..n_j} [1/(n_i·n_j)^α] · [C(p−n_i, n_j−n_i) / C(p, n_j)];   (7)

where c = Σ_{i=1..p} i^(−α) is the normalization constant of the Zipf distribution. Therefore we obtain:

    C(p−n_i, n_j−n_i) = C(p−n_i, p−n_i−n_j+n_i) = C(p−n_i, p−n_j);

hence,

    P = (1/c²) · Σ_{n_j=1..p−1} Σ_{n_i=0..n_j} [1/(n_i·n_j)^α] · [(p−n_i)!·n_j! / ((n_j−n_i)!·p!)]   (8)

and so:

    η = 1 − [ (1/(c²·p!)) · Σ_{n_j=1..p−1} (n_j!/n_j^α) · Σ_{n_i=0..n_j} ((p−n_i)! / (n_i^α·(n_j−n_i)!)) ]^k.   (9)
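Rather than evaluating the sums of Equation (9) directly, the same quantity can be estimated by simulation; a Monte Carlo sketch, assuming piece counts are Zipf-distributed over {1, ..., p−1} and piece sets are uniform given the count (all modeling choices here are ours):

    import random

    def estimate_eta(p=50, alpha=1.4, k=4, trials=20000):
        ranks = list(range(1, p))                   # piece counts 1..p-1
        weights = [n ** -alpha for n in ranks]      # Zipf(α); random.choices normalizes
        contained = 0
        for _ in range(trials):
            ni, nj = random.choices(ranks, weights, k=2)
            Si = set(random.sample(range(p), ni))
            Sj = set(random.sample(range(p), nj))
            contained += Si <= Sj   # leecher j already has all of leecher i's pieces
        P = contained / trials
        return 1.0 - P ** k         # η = 1 - P^k

    # Higher skewness α yields higher effectiveness, as in Figure 4(c):
    print(estimate_eta(alpha=1.4), estimate_eta(alpha=1.9))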
1) Leechers: In Figure 3(a), we present the evolution of the number of leechers (x(t) in Equation (6)). Admissible values of η depend on the typical values of skewness α ∈ [1, 4], plugged into Equation (9). Results in this section are obtained under the stability conditions described in [4]; in particular, θ = 0.001, γ = 0.2, c = 1, µ = 1 and λ = {0, 2, 5}, representing different levels of burstiness of the arrival process of new peers. Starting with one seed and 350 leechers, i.e., x(0) = 350 and y(0) = 1, set as in our later simulations, we plot the number of leechers in the system to study the impact of λ and of the effectiveness η of file sharing.
The analytical model provides the insight that for static peersets (λ = 0), the improvement we can achieve is limited even if we bring effectiveness to one through judicious seed scheduling. When burstiness increases, even a small improvement in η is significant in terms of reduction in total time to download. Note that a shorter downloading time implies a smaller number of leechers in the system.
2) Seeds: In Figures 3(b), 4(a) and 4(b) we show the impact of the effectiveness on the evolution of the number of seeds in the system (y(t) in Equation (6)). In Figure 3(a), three different constant leechers' arrival rates are represented. In Figure 4(a), we tested the impact of effectiveness for a random leechers' arrival rate with average λ = 10, and in Figure 4(b) we show the seed evolution with a random leechers' arrival rate and average λ = 30. The common result is that, independently of the level of peers' arrival (burstiness), at the beginning of the torrent — the initial phase — higher values of effectiveness increase the number of seeds present at the same time in the system, namely, more leechers complete their download faster for higher effectiveness.
Fig. 3. Burstiness impact on effectiveness: (a) leechers evolution; (b) seeds evolution. A faster decrease over time indicates a shorter downloading time. We study here the impact of different rates of arrival λ on the effectiveness of file sharing; for a given arrival rate, higher effectiveness has a higher impact on the completion time of the initial 350 leechers. In (b), leechers depart when their download is complete. Constant leechers' arrival rate λ = {0, 2, 5}.

Fig. 4. Impact of effectiveness on the seed evolution for random leechers' arrival rates, (a) λ = 10 and (b) λ = 30, and (c) skewness impact on effectiveness with the outdegree k as parameter. The skewness α has less impact on the effectiveness (η) when the average number of active connections increases. The maximum number of connections is kept to 80 as in the original version of the BitTorrent protocol.
3) Skewness Impact on Effectiveness: As we showed in Figure 2, during the initial phase of a torrent, the number of pieces a leecher has follows a power-law distribution. As the torrent approaches steady state, the skewness of the Zipf distribution grows until it becomes reasonable to assume a uniform distribution over the pieces a leecher has. In Figure 4(c) we show the impact of the skewness on the effectiveness, with the average outdegree k of a node as parameter; k measures the number of active connections (downloads) a leecher maintains during its lifetime in the torrent. The first observation from Figure 4(c) is that, for a fixed k, the effectiveness grows with the skewness; this confirms that the skewness grows during the lifetime of a torrent. The second observation is that the effectiveness grows with the average number of active connections a leecher has: when a leecher is on average unchoked by more peers, it has more chances to get more pieces, and therefore the effectiveness increases.

VI. EXPERIMENTAL RESULTS

We performed a series of experiments to assess the performance of our Proportional Fair Seed Scheduling (PFS), both using the GPS simulator [11], creating our own random physical topologies, and over PlanetLab [13]. Overlay topologies are created by the tracker, which randomly connects together a maximum of eighty peers. In all the simulation experiments, we set a homogeneous bandwidth of 2 Mbps. For our PlanetLab testing we implemented PFS on a real client [12], starting from the instrumented mainline client [24]; we left the link bandwidth unconstrained and measured, on average, about 1.5 Mbps. In all our experiments there is only one seed, and leechers leave when they are done. This choice of one seed was made to show results in the worst-case scenario. In the following plots, 95% confidence intervals are shown.
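To make the scheduling loop concrete, the sketch below shows one plausible shape of a proportional-fair seed scheduler. It is a minimal illustration, not the BUtorrent code: the class and method names are ours, and the exact service metric is an assumption; Section IV defines the actual PFS rule and the forgetting factor β.

```python
# Minimal sketch of a proportional-fair seed scheduler (illustrative only;
# names and the exact ranking metric are our assumptions, not BUtorrent code).
# The seed keeps an exponentially smoothed average of the service granted to
# each requesting leecher; beta plays the role of the forgetting factor.

class ProportionalFairSeed:
    def __init__(self, beta=0.02, upload_slots=4):
        self.beta = beta              # forgetting factor (0.02 in Section VI-A)
        self.slots = upload_slots     # M upload slots (M = 4 in BitTorrent)
        self.avg_service = {}         # leecher id -> smoothed service share

    def schedule(self, requests):
        """Pick up to `slots` of the pending (leecher_id, piece_id) requests,
        favoring leechers that received the least service so far and never
        uploading the same piece twice in one round."""
        ranked = sorted(requests, key=lambda r: self.avg_service.get(r[0], 0.0))
        chosen, pieces = [], set()
        for leecher, piece in ranked:
            if piece not in pieces:
                chosen.append((leecher, piece))
                pieces.add(piece)
            if len(chosen) == self.slots:
                break
        # Exponentially forgotten service average, as in proportional fairness:
        # old observations decay at rate (1 - beta).
        served = {leecher for leecher, _ in chosen}
        for leecher in {l for l, _ in requests}:
            old = self.avg_service.get(leecher, 0.0)
            hit = 1.0 if leecher in served else 0.0
            self.avg_service[leecher] = (1.0 - self.beta) * old + self.beta * hit
        return chosen
```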
[Plot data lost in extraction. Panels (a) and (b): Number of Seeds in the System vs. Time [s], curves for η ∈ {0.8, 0.9, 1}; panel (c): Effectiveness (η) vs. Skewness (α), curves for outdegree k ∈ {1, 1.5, 2, 2.5, 3, 4}.]

(a) λ = 10          (b) λ = 30          (c) outdegree k as parameter

Fig. 4. Skewness impact on effectiveness: (a) Impact of the effectiveness η on the evolution of the number of seeds in the system, out of the initial 350 leechers. Leechers depart when their download is complete. Constant leechers' arrival rate, λ = 10. (b) Impact of the effectiveness η on the evolution of the number of seeds in the system. Leechers depart when their download is complete. Random arrival rate with average λ = 30. (c) For highly connected peersets (bottom curve only one neighbor, top curve 101 neighbors), the skewness α has less impact on the effectiveness (η) as the average number of active connections increases. The maximum number of connections is kept at 80, as in the original version of the BitTorrent protocol.
A. Simulation Experiment 1: Average Downloading Time

Figure 5(a) shows the average downloading time of a 400 MB file for an overlay of 350 peers, for different values of burstiness (defined in Section III). For PFS, we set the forgetting factor β to 0.02, as explained in Section IV. We also show the performance of Global Rarest First (GRF), where each peer is connected to every other peer; GRF serves as a lower bound on the average download time. We observe that PFS improves the leecher average downloading time by up to 25%, depending on the level of burstiness in the arrival process. To obtain a burstiness value of 200, for example, we used a peak rate of 10, meaning that 10 leechers join the overlay in 1 second, over a simulation period of 7000 seconds. In this way we have an average arrival rate of 350 peers / 7000 s = 0.05 peers/s. With a peak rate of 10 and an average arrival rate of 0.05, we obtain a burstiness level of B = peak/average = 10/0.05 = 200. When the burstiness level is high, leechers have too small peersets to make good use of swarming. When instead the burstiness is too low for the chosen value of β, forgetting too fast has the same effect as not remembering at all (as BitTorrent does). Even in situations where there is no significant gain in time to download (the first point of Figure 5(a) shows burstiness = 1), Section VI-C shows that it is still beneficial to use PFS to reduce seed utilization.
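The burstiness bookkeeping above is just a ratio of rates; a two-line check (the helper name is ours, for illustration):

```python
# Sketch of the burstiness computation used above (B = peak / average).
# The function name is ours; Section III gives the definition.

def burstiness(peak_rate, total_peers, sim_period_s):
    """B = peak arrival rate / average arrival rate."""
    average_rate = total_peers / sim_period_s   # peers per second
    return peak_rate / average_rate

# The example from the text: peak of 10 leechers/s, 350 peers over 7000 s.
assert burstiness(10, 350, 7000) == 200.0
```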
B. Simulation Experiment 2: Seed with Global View

In Figure 5(b) we show results for a slight variation of the PFS algorithm: the seed is connected with, and so unchokes, all the leechers in the network. We see that the average time to download for different levels of burstiness, with exactly the same simulation settings as the experiment described in Figure 5(a), improves slightly. A second observation is that, for a unitary value of burstiness, the average downloading time does not change significantly regardless of the scheduling algorithm. Section VI-C shows why it is useful to adopt a proportional fair scheduling strategy at the seed even for low levels of burstiness.

C. Simulation Experiment 3: Seed Utilization

Figure 6 shows the utilization (defined in Section III) of the unique seed for different file sizes, with a unitary burstiness value in case (a) and a value of 25 in case (b). We note that: (i) with PFS, seeds are less congested by leechers' requests; (ii) since leechers leave when they are done, many more requests are made to the seed at the end of the torrent, when fewer connections are active: leechers find themselves alone at the end, so they all ask the seed, it being the only one left with interesting content; seeds using PFS therefore receive fewer requests thanks to the better piece distribution, and lower seed utilization means higher effectiveness and so a delay in reaching the phase where leechers are alone with the seed; (iii) the seed utilization decreases monotonically with the file size: when the file size increases, there is more time for leechers to use swarming. Overall, requests to the seed are fewer if seed scheduling is smarter. For a static peerset (every peer joins at the same simulation time), the seed is the only source and so it gets many requests, but many fewer if the scheduling algorithm is smarter (Figure 6(a)). For a higher value of burstiness instead (Figure 6(b)), we note a smaller improvement in seed utilization, because smaller groups of peers at a time are downloading the file; in fact, we also notice fewer requests to the seed. We conclude that, even for static cases or small values of burstiness that do not lead to a significant improvement in downloading time, better seed scheduling guarantees better seed utilization. Vice versa, we observed that, when the burstiness is high, we have less improvement in seed utilization, but the average time to download is smaller due to the more effective piece distribution.
D. Fairness Index

In Figure 6(c) we show some fairness-related results during the download of a short file (2 MB) from a fairly small peerset (10 leechers). We use a standard measure of fairness, Jain's fairness index, which is defined as:

J(x_1, x_2, \ldots, x_n) = \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n \sum_{i=1}^{n} x_i^2}    (10)

where x_i is the number of copies of piece i. The result ranges from 1/n (worst case) to 1 (best case). As always, we stopped the simulation when every leecher had completed the download, and we show Jain's fairness index over time for both cases, using a first-come first-serve and a proportional fair scheduling at the seed.

The first result to note is that when some intelligence is applied at the seed, i.e., when using a proportional fair scheduling, the piece distribution is more equalized and, in fact, the fairness index is always higher. As we note in the first part of the graph, this implies that when the piece fairness is higher, the download is faster: when the Jain index reaches a unitary level, all the considered peers have the same number of pieces, and so they have completed the download. Using a first-come first-serve approach, the first 10 seconds are not enough for the first set of unchoked leechers to complete their download, while with the proportional fair scheduling the fairness index reaches its maximum value in less than 5 seconds.

The second interesting result can be observed after the first rechoking interval, set to 10 seconds. Only M leechers are allowed to make a request in the first unchoking interval, and they quickly download all the pieces. After the first 10 seconds, though, a new set of leechers' requests lowers Jain's fairness index until the steady state of the piece distribution is reached again, because the new set of unchoked leechers also has the opportunity to download.

This observation has an important consequence for the design of peer-to-peer protocols: in particular, the rechoking interval should not be fixed at 10 seconds, but should adapt to the size of the peerset and to its capacity.

The third result, related to the previous two observations, concerns the total downloading time. Although, using our PFS algorithm, it is barely visible that the total downloading time of all the peers (i.e., of the slowest peer) ends earlier (by about 5 seconds), it is more important to note that, when using PFS, there are periods during the simulation in which no leecher downloads any piece, only because the rechoking interval protocol parameter is set to too high a value. This means that no leechers except the first M (M = 4 in BitTorrent) are allowed to request any piece. This observation led us to also analyze the effect of tuning the rechoking interval parameter, a result that we report in the next subsection.
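Equation (10) transcribes directly into code; the sketch below (ours, for checking the 1/n and 1 bounds) takes the vector of piece-copy counts:

```python
# Jain's fairness index, Eq. (10): x[i] is the number of copies of piece i.

def jain_index(x):
    n = len(x)
    s = sum(x)
    sq = sum(v * v for v in x)
    return (s * s) / (n * sq)

print(jain_index([5, 5, 5, 5]))   # 1.0  -> every piece equally replicated
print(jain_index([8, 0, 0, 0]))   # 0.25 -> one piece hoarded: the 1/n worst case
```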
[Plot data lost in extraction. Figure 5, panels (a) and (b) ("Burstiness Impact on Average Time to Download, 95% conf.int."): Average Time to Download [s] vs. Burstiness, curves BitTorrent (First Come First Serve), Proportional Fair Scheduling (PFS), Global Rarest First; panel (c): CDF vs. Downloading Time [s], curves "Rechoke every {1, 2, 3, 5, 7, 10} [s]". Figure 6, panels (a) and (b): Seed Utilization vs. File Size [MB]; panel (c): Jain's Fairness Index vs. Time [s].]

(a) ADT          (b) ADT (seed connected to every leecher)          (c) Peer selection frequency

Fig. 5. (a) Average Download Time (ADT) for different burstiness values: 350 peers are sharing a 400 MB file. Leechers always depart when they are done with their download. The proportional fair scheduling guarantees up to 25% of improvement over the First Come First Serve approach used by the BitTorrent protocol. (b) Average download time for different burstiness values, with the seed having full view over the peerset. (c) Applying the peer selection algorithm more often helps the average download time.

(a) Seed Utilization with Burstiness B = 1          (b) Seed Utilization with B = 25          (c) Jain Fairness Index

Fig. 6. Seed utilization with 350 peers. A seed applying PFS is less congested due to better piece distribution. (a) Burstiness = 1. (b) Burstiness = 25. Note the different scale. (c) Jain's fairness index for a static peerset in the initial phase of a torrent. PFS guarantees a more equal piece distribution. After the seed unchoking interval, PFS recovers fairness faster.
E. Rechoking Interval

As a last simulation result, we show in Figure 5(c) the impact of the rechoking interval (the time period after which every peer applies the peer selection algorithm again) on the average downloading time. We include this result to suggest an interesting research direction more than as an actual performance analysis of PFS. In most BitTorrent-like file sharing protocols, the rechoking interval period has been (mysteriously) set to 10 seconds. Although for the rest of our simulations we kept this choice invariant, to analyze each contribution separately, we note here how it can impact the average download time.

We simulated an overlay of 350 peers, among which only one is a seed, all interested in downloading a file of 20 MB with homogeneous bandwidth. As we can see, even if the peers all arrive at the beginning, and no seed leaves the peerset when it has completed its download, applying the peer selection algorithm with different frequencies has a non-negligible impact on the average downloading time. It is important to notice that the simulation takes into account the overhead of the control messages. The take-home message here is that refreshing the peer view, by increasing the frequency at which the best leechers are selected, helps the average download time. Although it is intuitive that this parameter should be adapted dynamically, depending on the peers' rate of arrival and departure, we can see how, even in static cases, a more frequently updated view of the system helps the downloading speed.

We therefore suggest as an interesting research direction the design of adaptive protocols that dynamically change seed parameters given the network conditions.
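As one hypothetical reading of this research direction (our illustration only, not an implemented protocol), a rechoking interval could shrink for small or churny peersets and stay at the default for large static ones:

```python
# Hypothetical adaptive rechoking interval: the heuristic, the constants, and
# the function name are our assumptions, sketching the suggestion above.
# The interval shrinks when the peerset is small or churny, so the peer view
# is refreshed more often, and is clamped to bound control-message overhead.

def rechoke_interval(peerset_size, arrivals_per_s, base=10.0, lo=1.0, hi=10.0):
    # Fewer peers -> cheaper rechokes -> refresh more often.
    size_factor = min(1.0, peerset_size / 80.0)   # 80 = max peerset size
    # Higher churn -> stale views hurt more -> refresh more often.
    churn_factor = 1.0 / (1.0 + arrivals_per_s)
    interval = base * size_factor * churn_factor
    return max(lo, min(hi, interval))

print(rechoke_interval(10, 0.05))   # small, nearly static swarm: ~1.2 s
print(rechoke_interval(80, 0.0))    # full static peerset: the default 10 s
```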
TABLE I
PlanetLab experiment with a peerset of 350 active leechers and a unique initial seed. In dynamic overlays the forgetting factor β brings improvement in ADT.

Protocol               Burstiness   ADT [s]   Improvement
BitTorrent             No           839       –
BitTorrent             Low          1328      –
BitTorrent             High         2183      –
BUtorrent              No           733       12.6%
BUtorrent              Low          1120      15.6%
BUtorrent              High         2171      0.5%
BUtorrent (β = 0.02)   No           809       3.5%
BUtorrent (β = 0.02)   Low          1031      22.3%
BUtorrent (β = 0.02)   High         1924      11.8%
F. PlanetLab Experiment

To validate our results, we tested our BUtorrent on PlanetLab. We ran the scheduling algorithms we wish to compare simultaneously, to minimize the difference in bandwidth available to the different experiments.

We ran a set of experiments with 350 PlanetLab nodes sharing a 400 MB file, and we report the results in Table I. We found that BUtorrent improves the average download time over BitTorrent by 12.6% in static overlays (no burstiness); with the forgetting factor β = 0.02, the improvement is 22.3% for low burstiness (B = 400) and 11.8% for high burstiness (B = 800).

We show the cumulative distribution function (CDF) of the downloading time for static (Figure 7(a)) and dynamic (Figure 7(b)) peersets. In Figure 7(a), a file of 400 MB is shared among the 350 leechers in the static peerset case. As always in our experiments, leechers depart after their download is complete. Note how the forgetting factor β decreases performance here, and also that there is not much improvement for the group of leechers with lower download capacity, which are rarely unchoked by other leechers.

In Figure 7(b) we introduce a burstiness level of 400. We can see that the forgetting factor plays a crucial role in improving the downloading time.

In such dynamic scenarios, the effect of the slow peers has less impact because, in bursty cases, smaller groups of peers must collaborate using swarming to finish as fast as possible. Namely, when there is a small group of peers to download from, leechers do not have much choice when they apply the peer selection algorithm.

Notice also that the average downloading time is higher when the peerset is dynamic, because many leechers arrive later in the torrent.
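The Improvement column of Table I is just the relative reduction in ADT; for instance (values from Table I):

```python
# Relative improvement in Average Download Time (ADT), as in Table I.
def improvement(adt_bittorrent, adt_butorrent):
    return 100.0 * (adt_bittorrent - adt_butorrent) / adt_bittorrent

# BitTorrent vs. BUtorrent (beta = 0.02), low burstiness: 1328 s -> 1031 s.
print(f"{improvement(1328, 1031):.1f}%")   # 22.4% (22.3% in Table I, rounding)
```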
[Plot data lost in extraction. Both panels: Cumulative Distribution Function vs. Downloading Time [s]; curves First Come First Serve, Proportional Fair Scheduling, and Proportional Fair Scheduling (β = 0.025).]

(a) PlanetLab with static peerset          (b) PlanetLab with dynamic peerset

Fig. 7. (a) Cumulative Distribution Function (CDF) of the downloading time for 350 leechers downloading a file of size 400 MB and joining the overlay at once (no burstiness). Leechers depart when they finish their download. (b) CDF of the downloading time for 350 leechers downloading a file of size 400 MB. Burstiness value of 400. Leechers depart when they finish their download.
VII. RELATED WORK

Swarming techniques have received strong interest from the research community, peer-to-peer traffic being a substantial share of the whole Internet traffic [25]. "Between 50 and 65% of all download traffic is P2P related. Between 75 and 90% of all upload traffic is P2P related" [26]. The most popular protocol using swarming is BitTorrent [1].

A. Improving peer strategies

The BitTorrent protocol has established swarming as one of the freshest and most promising ideas in contemporary networking research, and has thus kick-started a (tidal) wave of research articles in the area. Many proposals on how to improve the BitTorrent protocol by modifying the behavior of leechers have therefore appeared in the literature. Recent work [27] argues for greedy strategies; we cite BitTyrant [2] as an example. Other work [3] considered game-theoretical approaches. Our focus is only on the seed, without modifying leechers.

Smartseed [5], [21] applies some strategies at the seed to boost performance. However, Smartseed is not backwards compatible with the BitTorrent protocol, as opposed to our seed scheduling proposal, which keeps every other fundamental algorithm of BitTorrent intact. Moreover, Smartseed does not take into account dynamic scenarios, one of the key aspects of our results. Another, unpublished, approach involving the seed is given by the superseed mode of BitTornado [28]. The goal of superseed is to enable under-provisioned seeds to effectively inject content. An experimental study of its performance can be found in [29].

A recent solution whose main goal is to reduce the seed load has been discussed in [30]. The authors propose to estimate the popularity of each of the pieces, using information from directly-connected neighbors and by probing additional peers, to later decide which piece to advertise as available to the neighbors. Although this policy is effective in limiting the seed uploads, it has to deal with the trade-off of extending the peer download time by up to 3 times.

B. Scheduling, churn and transient phases

More generally, the authors in [31] show strategies for selecting resources that minimize unwanted effects induced by churn phases. In our case the resources are the pieces that the seed has to schedule.

Scheduling for BitTorrent is also discussed by Mathieu and Reynier in [9], which analyzes starvation in flash-crowd phases. Using optimization theory, an uplink-sharing fluid model whose goal is to find the optimal scheduling (piece selection) for peer-to-peer networks has been investigated in [8] and used in [32] and [33]. These approaches focus either on the end-game mode, where leechers are missing their last pieces, or on the analysis of the minimum distribution time. Our system-oriented approach targets the minimization of the average downloading time by exclusively minimizing the transient phases.

Measurement studies were also carried out, with a focus on BitTorrent transient phases [17]. The goal is to understand, given a peer-to-peer system, how quickly the full service capacity can be reached after a burst of demands for a particular content, namely, how long the system stays in the transient phase. We studied initial phases using the fluid model adopted by Qiu and Srikant in [4]. We fixed their notion of effectiveness to capture seed scheduling effects during initial phases.
C. Fairness

Even though the concept has been applied to peer selection rather than piece selection, a recent solution showing how considering fairness may help speed up the leechers' downloading time was discussed in FairTorrent [34].
VIII. CONCLUSION

In this work we considered the piece selection of BitTorrent-like content distribution protocols. We studied analytically, and provided simulation and real-deployment evidence, that improving the scheduling algorithm at the seed can be crucial to shorten the initial phase of a torrent, therefore reducing the average downloading time of the leechers. Our idea is to give seeds a more global view of the system, supporting and not substituting the Local Rarest First piece selection algorithm used by BitTorrent-like protocols. We devised our seed scheduling algorithm, inspired by Proportional Fair Scheduling [20], and implemented it in a real file sharing client that we call BUtorrent [12]. We found in simulation and PlanetLab experiments that BUtorrent, in dynamic overlays, increases the effectiveness of file sharing and reduces the congestion of requests at the seed, improving the average downloading time by up to 25% over BitTorrent.
ACKNOWLEDGMENT
We are grateful to Professor Steve Homer, Ilir Capuni, Alessandro Duminuco, and Heiko Röglin for their valuable comments, and to Damiano Carra and Nobuyuki Mitsutake for their previous work on this project.
APPENDIX - PROOF OF PROPOSITION 4

Let us consider the initial phase of a torrent with one seed and L fully connected leechers, trying to download a file with p pieces. We consider Case A, where the seed uses FCFS to choose which requests to satisfy, and Case B, where the seed uses PFS. We compute the expected number of collisions in the initial phase for each case, from which the claim follows.

Case A: In the worst case, the first M requests to the seed could all turn out to be for the same piece. This would result in the largest possible number of collisions in each round; namely, the seed may end up uploading the same piece in all its M upload slots, i.e., r = p and |P^{l+1}| = |P^{l}| - 1 for all l. The expected number of total collisions in the initial phase would then be:

\frac{L}{2} \sum_{i=1}^{p} \frac{1}{|S^i|} = \frac{L}{2} \sum_{i=1}^{p} \frac{1}{p-(i-1)} = \frac{L}{2} \sum_{j=1}^{p} \frac{1}{j} \approx \frac{L}{2} \ln p

Case B: However, if the seed follows PFS, it will be able to upload M distinct pieces in its M upload slots in most rounds (in the final few rounds there will be fewer than M distinct pieces to upload, and collisions cannot be avoided). Therefore, for this case, the number of rounds in the initial phase reduces to p/M, and the expected number of collisions is reduced by a factor of M, since:

\frac{L}{2} \sum_{i=1}^{p/M} \frac{1}{|S^i|} = \frac{L}{2} \sum_{i=1}^{p/M} \frac{1}{p - M(i-1)} = \frac{L}{2} \sum_{i=1}^{p/M} \frac{1}{M} \cdot \frac{1}{p/M - (i-1)} \approx \frac{L}{2} \frac{1}{M} \ln \frac{p}{M}
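A short numerical evaluation of the two sums above (our sanity check; L, p and M are arbitrary example values) confirms the roughly factor-M reduction:

```python
# Numerical check of the collision counts in the Appendix: FCFS gives
# (L/2) * H(p) ~ (L/2) ln p collisions; PFS cuts this by about a factor of M.
import math

def fcfs_collisions(L, p):
    return (L / 2) * sum(1 / j for j in range(1, p + 1))

def pfs_collisions(L, p, M):
    # p/M rounds, collision term 1 / (p - M*(i-1)) in round i
    return (L / 2) * sum(1 / (p - M * (i - 1)) for i in range(1, p // M + 1))

L, p, M = 350, 1600, 4
exact_fcfs = fcfs_collisions(L, p)             # ~ 1392
exact_pfs = pfs_collisions(L, p, M)            # ~ 287
print(exact_fcfs / exact_pfs)                  # ~ 4.8, roughly the factor M = 4
print((L / 2) * math.log(p))                   # ~ 1291 (log approximation)
print((L / (2 * M)) * math.log(p / M))         # ~ 262  (log approximation)
```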
REFERENCES
[1] B. Cohen, “Incentives build robustness in bittorrent,” bittorrent.org,
Tech. Rep., 2003.
13
[31] B. P. Godfrey, S. Shenker, and I. Stoica, “Minimizing churn in distributed
systems,” in SIGCOMM ’06: Proceedings of the 2006 conference on
Applications, technologies, architectures, and protocols for computer
communications. New York, NY, USA: ACM, 2006, pp. 147–158.
[32] R. Kumar and K. W. Ross, “Peer-assisted file distribution: The minimum
distribution time,” in in HOTWEB, Boston, USA, 2006.
[33] R. Sweha, V. Ishakian, and A. Bestavros, “Angels in the cloud – a peer-assisted bulk-synchronous content distribution service,” Boston University, Tech. Rep. BUCS-TR-2010-024, Aug. 2010.
[34] A. Sherman, J. Nieh, and C. Stein, “Fairtorrent: bringing fairness to peerto-peer systems,” in CoNEXT ’09: Proceedings of the 5th international
conference on Emerging networking experiments and technologies. New
York, NY, USA: ACM, 2009, pp. 133–144.
[2] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, and A. Venkataramani, “Do incentives build robustness in bittorrent?” in Proceedings of
USENIX NSDI 2007. USENIX, April 2007.
[3] D. Levin, K. Lacurts, N. Spring, and B. Bhattacharjee, “BitTorrent
is an auction: analyzing and improving BitTorrent’s incentives,” in
SIGCOMM, vol. 38, no. 4, New York, NY, 2008, pp. 243–254.
[4] D. Qiu and R. Srikant, “Modeling and performance analysis of
bittorrent-like peer-to-peer networks,” in SIGCOMM. New York, NY:
ACM, 2004, pp. 367–378.
[5] A. R. Bharambe, C. Herley, and V. N. Padmanabhan, “Analyzing and improving a bittorrent networks performance mechanisms,” in INFOCOM,
2006.
[6] J. Mundinger, R. Weber, and G. Weiss, “Optimal scheduling of peer-topeer file dissemination,” J. of Scheduling, vol. 11, no. 2, pp. 105–120,
2008.
[7] OpenP4P. [Online]. Available: http://www.openp4p.net
[8] J. Mundinger, R. Weber, and G. Weiss, “Optimal scheduling of peertopeer file dissemination,” J. Scheduling, 2006.
[9] F. Mathieu and J. Reynier, “Missing piece issue and upload strategies in
flashcrowds and p2p-assisted filesharing,” in Telecommunications, 2006.
AICT-ICIW., 2006, p. 112.
[10] M. Izal, G. Urvoy-Keller, E. Biersack, P. Felber, A. Hamra, and
L. Garces-Erice, “Dissecting bittorrent: Five months in a torrent’s
lifetime,” in Proceedings of the 5th Passive and Active Measurement
Workshop, 2004.
[11] W. Yang and N. Abu-Ghazaleh, “Gps: a general peer-to-peer simulator
and its use for modeling bittorrent,” in Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2005. 13th IEEE
International Symposium on, 2005, pp. 425–432.
[12] [Online]. Available: http://csr.bu.edu/butorrent
[13] B. N. Chun, D. E. Culler, T. Roscoe, A. C. Bavier, L. L. Peterson,
M. Wawrzoniak, and M. Bowman, “Planetlab: an overlay testbed for
broad-coverage services,” Computer Communication Review, vol. 33,
no. 3, pp. 3–12, 2003.
[14] D. J. Watts and S. H. Strogatz, “Collective dynamics of ’small-world’
networks.” Nature, vol. 393, pp. 440–442, 1998.
[15] R. Bindal, P. Cao, W. Chan, J. Medved, G. Suwala, T. Bates, and
A. Zhang, “Improving traffic locality in bittorrent via biased neighbor
selection,” in ICDCS, 2006.
[16] [Online]. Available: http://aws.amazon.com/s3/
[17] X. Yang and G. de Veciana, “Service capacity of peer to peer networks,”
in INFOCOM, 2004.
[18] E. Cockayne and S. Hedetniemi, “Optimal domination in graphs,” IEEE
Transactions on Circuits and Systems, vol. 22, no. 11, pp. 855–857, Nov
1975.
[19] M. Garey and D. Johnson, Computers and Intractability: A Guide to the
Theory of NP-Completeness. W. H. Freeman, 1979.
[20] H. Kushner and P. Whiting, “Convergence of proportional-fair sharing
algorithms under general conditions,” Wireless Communications, IEEE
Transactions on, vol. 3, 2004.
[21] P. Michiardi, K. Ramachandran, and B. Sikdar, “Modeling seed scheduling strategies in BitTorrent,” in Networking 2007, 6th IFIP international
conference on Networking, May 14 -18, 2007, Atlanta, USA — Also
published as LNCS Volume 4479, May 2007.
[22] J. C. Burkill and H. Burkill, A Second Course in Mathematical Analysis. Cambridge University Press, 2002.
[23] C. Dale, J. Liu, J. Peters, and B. Li, “Evolution and enhancement of
bittorrent network topologies,” in Quality of Service, 2008. IWQoS 2008.
16th International Workshop on, 2008, pp. 1–10.
[24] [Online]. Available: http://www-sop.inria.fr/planete/Arnaud.Legout/Projects/
[25] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell,
T. Seely, and C. Diot, “Packet-level traffic measurements from the sprint
ip backbone,” IEEE Network, vol. 17, pp. 6–16, 2003.
[26] [Online]. Available: http://torrentfreak.com/peer-to-peer-traffic-statistics/, 2010.
[27] D. Carra, G. Neglia, and P. Michiardi, “On the impact of greedy
strategies in bittorrent networks: the case of bittyrant,” in IEEE P2P,
2008.
[28] [Online]. Available: http://bittornado.com/docs/superseed.txt
[29] Z. Chen, Y. Chen, C. Lin, V. Nivargi, and P. Cao, “Experimental analysis
of super-seeding in bittorrent,” in Communications, 2008. ICC ’08. IEEE
International Conference on, 19-23 2008, pp. 65 –69.
[30] B. Sanderson and D. Zappala, “Reducing source load in bittorrent,”
Computer Communications and Networks, International Conference on,
vol. 0, pp. 1–6, 2009.