INTRODUCTION TO BITTORRENT
Alexey Zagalsky

Based on:
- Data from Wikipedia
- Slides from “Introduction to BitTorrent” by Arvid Norberg
- Slides from “BitTorrent Background” by Hilel

What is BitTorrent
- BitTorrent is a peer-to-peer file sharing protocol used for distributing large amounts of data
- As of February 2009, it was estimated to account for roughly 27% to 55% of all Internet traffic, depending on geographical location
- Programmer Bram Cohen designed the protocol in April 2001

Reasons for Adoption
- Better performance through “pull-based” transfer
  - Slow nodes don’t bog down other nodes
  - Allows uploading from hosts that have downloaded only parts of a file
- Practical reasons (perhaps more important!)
  - Working implementation with simple, well-defined interfaces for plugging in new content
  - Many recent competitors got sued or shut down (Napster, Kazaa)

How It Works
- The file to be distributed is split into pieces, and a SHA-1 hash is calculated for each piece
- A metadata file (.torrent) is distributed to all peers
  - Usually via Web, email, etc.
- The metadata contains:
  - SHA-1 hashes of all pieces
  - Tracker reference (URL)
  - Piece length: usually 256 KB
    - Are smaller or larger pieces better?
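
The piece-hashing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular client: the names `PIECE_LENGTH`, `piece_hashes`, and `verify_piece` are hypothetical, and the data is held in memory rather than read from a file.

```python
import hashlib

# Assumption: the "usual" 256 KB piece length mentioned in the slides.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each.

    The last piece may be shorter than piece_length.
    """
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against the hash listed in the metadata."""
    return hashlib.sha1(piece).digest() == hashes[index]


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    hashes = piece_hashes(payload)
    # Each SHA-1 digest is 20 bytes, so the metadata grows by 20 bytes per piece.
    print(len(hashes), "pieces,", 20 * len(hashes), "bytes of hashes")
```

The sketch also hints at the trade-off raised in the last bullet: smaller pieces mean more hashes in the metadata file, while larger pieces mean more data must be downloaded before any of it can be verified and shared.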
How It Works (continued)
- The user makes the file itself available through a BitTorrent node acting as a seed
- The tracker is a central server that keeps a list of all peers participating in the swarm
- A swarm is the set of peers participating in distributing the same file
- A peer joins a swarm by asking the tracker for a peer list and connecting to those peers

A Peer Joins The Swarm
[figures]

Seeding A File
[figure]

Terminology
- A downloader is any peer that does not yet have the entire file and is still downloading it
- A leecher is either:
  - A peer that has a negative effect on the swarm because of a very poor share ratio, or
  - Simply a downloader
- A seeder is a peer that has a complete copy of the torrent and offers it for upload

Goals
- Efficiency
  - Downloading from many peers yields fast downloads
  - Minimize piece overlap among peers
    - Download random pieces
    - Rarest First algorithm
- Reliability
  - Tolerant of peers dropping out
  - Ability to verify data integrity (SHA-1 hashes)

Rarest First
- The piece-picking algorithm used in BitTorrent is called Rarest First
- To maximize the number of distributed copies, it maximizes the availability of the rarest pieces
- It picks a random piece from the set of rarest pieces
- No peer has global knowledge of piece availability; each peer approximates it by the availability among its neighbors

Incentive to Share
Policies that determine to whom to send data:
- Tit-for-Tat
  - Upload to whoever uploads the most to you
  - “Survival of the fittest”
  - Theoretically increases performance by encouraging fast peers to upload to you and giving them even more pieces to upload to others
  - May result in suboptimal situations
- Optimistic Unchoking
  - In the hope of discovering better partners
  - To ensure that newcomers get a chance to join the swarm

Tit-for-tat as Incentive to Upload
- The aim is to encourage all peers to contribute
- Peer A is said to choke peer B if A decides not to upload to B
- Each peer (say A) unchokes at most 4 interested peers at any time
  - The three with the largest upload rates to A
    - This is where tit-for-tat comes in
  - One more chosen at random (Optimistic Unchoke)
    - To periodically look for better choices

Limitations
- Content unavailability
  - Although swarming scales well enough to tolerate flash crowds for popular content, it is less useful for unpopular content
- The leech problem
  - A user may often leave the swarm as soon as they have a complete copy of the file they are downloading
- Pieces are not downloaded in sequential order (think VOD)

Trackerless Torrents
- Common problems with trackers:
  - Single point of failure
- Solutions:
  - Multiple trackers (splits swarms)
  - DHT

Distributed Hash Table
- Works as a hash table with SHA-1 hashes as keys
- The key is the info-hash, the hash of the metadata
  - It uniquely identifies a torrent
- The data is a peer list of the peers in the swarm
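
The key/value view of the DHT described above can be illustrated with a toy, single-process table. This is only a sketch under loud assumptions: a real DHT spreads the table across many nodes, and none of that distribution or routing is modeled here. The names `ToyPeerTable`, `announce`, `get_peers`, and `info_hash` are hypothetical, and the sketch simply hashes whatever metadata bytes it is given.

```python
import hashlib

Peer = tuple[str, int]  # (IP address, port); an illustrative representation


def info_hash(metadata: bytes) -> bytes:
    """Return the SHA-1 digest used as the torrent's key (20 bytes)."""
    return hashlib.sha1(metadata).digest()


class ToyPeerTable:
    """Single-process stand-in for the DHT: info-hash -> set of peers."""

    def __init__(self) -> None:
        self._peers: dict[bytes, set[Peer]] = {}

    def announce(self, key: bytes, peer: Peer) -> None:
        """Record that a peer is participating in the swarm for this key."""
        self._peers.setdefault(key, set()).add(peer)

    def get_peers(self, key: bytes) -> set[Peer]:
        """Return a copy of the peer list stored under this key."""
        return set(self._peers.get(key, set()))


if __name__ == "__main__":
    table = ToyPeerTable()
    key = info_hash(b"example metadata")
    table.announce(key, ("10.0.0.1", 6881))
    table.announce(key, ("10.0.0.2", 6881))
    print(sorted(table.get_peers(key)))
```

In this picture a joining peer plays the same role toward the table that it previously played toward the tracker: it looks up the info-hash, receives a peer list, and adds itself to that list.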