Introduction to BitTorrent

Alexey Zagalsky
Based on:
  - data from Wikipedia
  - slides from “Introduction to BitTorrent” by Arvid Norberg
  - slides from “BitTorrent Background” by Hilel

What is BitTorrent
BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data
It has been estimated that it accounted for roughly 27% to 55% of all Internet traffic (depending on geographical location) as of February 2009
Programmer Bram Cohen designed the protocol in April 2001

Reasons for Adoption
Better performance through “pull-based” transfer
  - Slow nodes don’t bog down other nodes
Allows uploading from hosts that have downloaded parts of a file
Practical Reasons (perhaps more important!)
  - Working implementation with simple, well-defined interfaces for plugging in new content
  - Many recent competitors got sued / shut down
    - Napster, Kazaa

How It Works
The file to be distributed is split into pieces, and a SHA-1 hash is calculated for each piece

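The splitting and hashing step can be sketched as follows. This is a minimal illustration, not a client implementation: the function name `piece_hashes` and the constant `PIECE_LENGTH` are made up for this example, and the 256 KB default is just the typical value quoted on the metadata slide.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size (256 KB); hypothetical constant


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return one SHA-1 digest per piece."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


if __name__ == "__main__":
    content = bytes(600 * 1024)  # stand-in for a real file: 600 KB of zero bytes
    hashes = piece_hashes(content)
    print(len(hashes))     # 3 pieces: 256 KB + 256 KB + 88 KB
    print(len(hashes[0]))  # 20, the size of a SHA-1 digest in bytes
```

Only the last piece may be shorter than the others; every other piece has exactly `piece_length` bytes.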
How It Works
A metadata file (.torrent) is distributed to all peers
  - Usually via Web, Email, etc.
The metadata contains:
  - SHA-1 hashes of all pieces
  - Tracker reference (URL)
  - Piece length: usually 256 KB
    - Is it better to have smaller or bigger pieces?

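To make the metadata contents concrete, here is a rough sketch that builds such a structure for a single file. The slides list only the piece hashes, the tracker URL, and the piece length. The key names (`announce`, `info`, `piece length`, `pieces`, `name`, `length`) and the bencoded serialization follow the published metainfo format as I understand it. The helper names `bencode` and `build_metainfo` and the tracker URL are hypothetical.

```python
import hashlib


def bencode(value) -> bytes:
    """Encode ints, strings, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def build_metainfo(name: str, data: bytes, tracker_url: str,
                   piece_length: int = 256 * 1024) -> dict:
    """Build a single-file metadata dictionary (hypothetical helper)."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": tracker_url,  # tracker reference (URL)
        "info": {
            "name": name,
            "length": len(data),
            "piece length": piece_length,
            "pieces": pieces,  # concatenated 20-byte SHA-1 digests
        },
    }


if __name__ == "__main__":
    meta = build_metainfo("demo.bin", bytes(600 * 1024),
                          "http://tracker.example/announce")
    print(len(meta["info"]["pieces"]) // 20)  # number of pieces
    torrent_bytes = bencode(meta)             # what a .torrent file would hold
```

The sketch also hints at the piece-size question: each piece costs 20 bytes in the metadata, so smaller pieces make the .torrent file larger, while bigger pieces mean more data must be re-downloaded when a single piece fails verification.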
How It Works
The user makes the file itself available through a BitTorrent node acting as a seed
The tracker is a central server keeping a list of all peers participating in the swarm
A swarm is the set of peers that are participating in distributing the same file
A peer joins a swarm by asking the tracker for a peer list and connecting to those peers

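The tracker's role can be illustrated with a toy in-memory model. This is only a sketch of the bookkeeping described above: a real tracker is reached over the network, and the class name `Tracker`, the method `announce`, and the string torrent identifier are assumptions made for this example.

```python
from collections import defaultdict


class Tracker:
    """Toy in-memory tracker: maps a torrent identifier to the peers in its swarm."""

    def __init__(self) -> None:
        self.swarms: dict[str, set[tuple[str, int]]] = defaultdict(set)

    def announce(self, torrent_id: str, peer: tuple[str, int]) -> list[tuple[str, int]]:
        """Register the peer in the swarm and return the other known peers."""
        others = sorted(self.swarms[torrent_id] - {peer})
        self.swarms[torrent_id].add(peer)
        return others


if __name__ == "__main__":
    tracker = Tracker()
    seed = ("10.0.0.1", 6881)
    newcomer = ("10.0.0.2", 6881)
    print(tracker.announce("demo-torrent", seed))      # [] since the swarm was empty
    print(tracker.announce("demo-torrent", newcomer))  # [('10.0.0.1', 6881)]
```

The newcomer would then open connections to the returned peers directly; the tracker itself never carries file data.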
A Peer Joins The Swarm
[Figure: A Peer Joins The Swarm]
Seeding A File
[Figure: Seeding A File]

Terminology
A downloader is any peer that does not have the entire file and is downloading the file
A leecher is:
  - A peer who has a negative effect on the swarm by having a very poor share ratio
  - A downloader
A seeder is a peer that has an entire copy of the torrent and offers it for upload

Goals
Efficiency
  - Ability to download from many peers yields fast downloads
  - Minimize piece overlap among peers
    - Download random pieces
    - Rarest First algorithm
Reliability
  - Tolerant to dropping peers
  - Ability to verify data integrity (SHA-1 hashes)

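The integrity check can be sketched in a few lines. The function name `verify_piece` and the idea of passing the expected digests as a list are assumptions for illustration; the point is only that a downloaded piece is accepted when its SHA-1 digest matches the one from the metadata.

```python
import hashlib


def verify_piece(index: int, data: bytes, expected_hashes: list[bytes]) -> bool:
    """Return True if the downloaded piece matches the digest from the metadata."""
    return hashlib.sha1(data).digest() == expected_hashes[index]


if __name__ == "__main__":
    original = [b"first piece", b"second piece"]
    expected = [hashlib.sha1(p).digest() for p in original]
    print(verify_piece(0, b"first piece", expected))   # True
    print(verify_piece(1, b"sec0nd piece", expected))  # False: corrupted data
```

A piece that fails the check can simply be discarded and requested again, possibly from a different peer, which is part of what makes the swarm tolerant to faulty or dropping peers.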
Rarest First
The piece picking algorithm used in BitTorrent is called Rarest First
To maximize the distributed copies, maximize the availability of the rarest pieces
Picks a random piece from the set of rarest pieces
No peer has global knowledge of piece availability; it is approximated by the availability among neighbors

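A minimal sketch of Rarest First under the approximation described above: availability is counted only among the pieces that neighbors report having. The function name `pick_rarest` and the representation of piece sets as Python sets of indices are assumptions for this example.

```python
import random
from collections import Counter
from typing import Optional


def pick_rarest(have: set[int], neighbor_pieces: list[set[int]],
                rng=random) -> Optional[int]:
    """Pick a random piece among the rarest pieces we still need."""
    # Count, for each piece we lack, how many neighbors can offer it.
    counts = Counter(
        piece
        for pieces in neighbor_pieces
        for piece in pieces
        if piece not in have
    )
    if not counts:
        return None  # nothing useful is available from the neighbors
    rarest = min(counts.values())
    candidates = [piece for piece, count in counts.items() if count == rarest]
    return rng.choice(candidates)


if __name__ == "__main__":
    neighbors = [{0, 1, 2}, {0, 1}, {0, 3}]
    # Availability: piece 0 -> 3, piece 1 -> 2, pieces 2 and 3 -> 1 each.
    print(pick_rarest(set(), neighbors))   # 2 or 3, chosen at random
    print(pick_rarest({2, 3}, neighbors))  # 1
```

Choosing at random among equally rare pieces keeps neighbors from all requesting the same piece at once.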
Incentive to Share
Policies to determine to whom to send data:
  - Tit-for-Tat
    - Upload to whoever uploads the most to you
    - “Survival of the fittest”
    - Theoretically increases performance by encouraging fast peers to upload to you and giving them even more pieces to upload to others
    - May result in suboptimal situations
  - Optimistic Unchoking
    - In hope of discovering better partners
    - To ensure that newcomers get a chance to join the swarm

Tit-for-tat as Incentive to Upload
Want to encourage all peers to contribute
Peer A is said to choke peer B if it (A) decides not to upload to B
Each peer (say A) unchokes at most 4 interested peers at any time
  - The three with the largest upload rates to A
    - Where the tit-for-tat comes in
  - Another randomly chosen (Optimistic Unchoke)
    - To periodically look for better choices

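The unchoke decision described above can be sketched as one selection round. The function name `choose_unchoked`, the peer labels, and the rate values are made up for illustration; a real client would repeat this decision periodically with freshly measured rates.

```python
import random


def choose_unchoked(upload_rate_to_us: dict[str, float], interested: set[str],
                    regular_slots: int = 3, rng=random) -> set[str]:
    """Return the peers to unchoke: the top uploaders plus one optimistic unchoke."""
    # Tit-for-tat: rank interested peers by how fast they upload to us.
    ranked = sorted(interested,
                    key=lambda peer: upload_rate_to_us.get(peer, 0.0),
                    reverse=True)
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: one randomly chosen peer among the rest.
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.add(rng.choice(remaining))
    return unchoked


if __name__ == "__main__":
    rates = {"a": 50.0, "b": 40.0, "c": 30.0, "d": 5.0, "e": 0.0}
    chosen = choose_unchoked(rates, set(rates))
    print(sorted(chosen))  # 'a', 'b', 'c' plus either 'd' or 'e'
```

Peer `e` has uploaded nothing yet, as a newcomer would not have, but it can still be picked for the optimistic slot, which is how newcomers get their first pieces.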
Limitations
Content unavailability
  - Although swarming scales well to tolerate flash crowds for popular content, it is less useful for unpopular content
The leech problem
  - A user may often choose to leave the swarm as soon as they have a complete copy of the file they are downloading
Pieces are not downloaded in sequential order (think VOD)

Trackerless Torrents
Common problems with Trackers:
  - Single point of failure
Solutions:
  - Multiple Trackers (splits swarms)
  - DHT

Distributed Hash Table
Works as a hash table with SHA-1 hashes as keys
The key is the info-hash, the hash of the metadata
  - It uniquely identifies a torrent
The data is a peer list of the peers in the swarm

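A toy model of this lookup is sketched below. The slides say only that the info-hash is the key and the peer list is the value. Storing each entry on the node whose ID is closest to the key by XOR distance is an assumption borrowed from Kademlia-style DHTs, and the class name `ToyDHT`, its methods, and the stand-in metadata bytes are hypothetical. A real DHT also involves routing between nodes, which is omitted here.

```python
import hashlib


class ToyDHT:
    """Toy distributed hash table: info-hash -> set of peers, spread over nodes."""

    def __init__(self, node_ids: list[int]) -> None:
        # Each node holds its own small dictionary of info-hash -> peers.
        self.nodes: dict[int, dict[bytes, set]] = {nid: {} for nid in node_ids}

    def _closest_node(self, info_hash: bytes) -> int:
        key = int.from_bytes(info_hash, "big")
        # Assumed placement rule: smallest XOR distance between node ID and key.
        return min(self.nodes, key=lambda node_id: node_id ^ key)

    def announce(self, info_hash: bytes, peer: tuple[str, int]) -> None:
        """Add a peer to the peer list stored under the info-hash."""
        store = self.nodes[self._closest_node(info_hash)]
        store.setdefault(info_hash, set()).add(peer)

    def get_peers(self, info_hash: bytes) -> list[tuple[str, int]]:
        """Return the peer list stored under the info-hash (empty if unknown)."""
        store = self.nodes[self._closest_node(info_hash)]
        return sorted(store.get(info_hash, set()))


if __name__ == "__main__":
    node_ids = [int.from_bytes(hashlib.sha1(name).digest(), "big")
                for name in (b"node-1", b"node-2", b"node-3")]
    dht = ToyDHT(node_ids)
    # Stand-in bytes for the encoded metadata; the info-hash is its SHA-1 digest.
    info_hash = hashlib.sha1(b"stand-in for the encoded metadata").digest()
    dht.announce(info_hash, ("10.0.0.1", 6881))
    dht.announce(info_hash, ("10.0.0.2", 6881))
    print(dht.get_peers(info_hash))
```

Because every participant computes the same info-hash from the same metadata, they all arrive at the same node and the same peer list without any central tracker.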