BitTorrent: The protocol, its background and uses

1. BitTorrent Background
   a) What is BitTorrent?
   b) Who's the author, history
2. The Protocol
   a) Terminology
   b) Distributed Scenario
   c) Structure of .torrent files
   d) Protocol between peers and trackers
3. BitTorrent Applications
   a) BitTorrent Inc, usages throughout industry

BitTorrent
“You get so tired of having your work die,” he says. “I just wanted to make something that people would actually use.”
• The above quote is from Bram Cohen, BitTorrent's author, in an interview with Wired in 2005.

What is BitTorrent? From 10,000 feet
• Efficient content distribution system using file swarming.
• Does not perform all the functions of a typical P2P system, like searching.
http://www.cs.uiowa.edu/~ghosh/bittorrent.ppt

What is BitTorrent?
• BitTorrent introduced two novel concepts:
  • Rather than providing a search protocol itself, it was designed to integrate seamlessly with the Web and made files (torrents) available via Web pages, which could be searched for using standard Web search tools.
  • It enabled so-called file swarming; that is, once a peer starts downloading a file, it also makes whatever portion of the file it has downloaded immediately available for sharing.

What is BitTorrent?
• The file-swarming process is enabled through the use of a tracker:
  • an HTTP-based server used to dynamically synchronise and update the peers as they are downloading; it tracks the availability of pieces of the file on the network.
• The tracker can also monitor users' usage of the network: how much does each peer contribute?
• The protocol then applies a tit-for-tat scheme, which divides bandwidth according to how much a peer contributes to the other peers in the network: if you do not share, you cannot consume.

BitTorrent: Bram Cohen
• Born 1975 - computer programmer
• Engineered large parts of Mojo Nation (mojonation.net), parts of which are similar in flavour to BitTorrent (pre-April 2001).
• April 2001: focused on authoring the peer-to-peer (P2P) BitTorrent protocol and writing the first file-sharing program to use the protocol, also known as BitTorrent.
• He is also the organizer of the San Francisco Bay Area P2P-hackers meeting and the co-author of Codeville. He currently lives in the San Francisco Bay Area.

Start of BitTorrent - CodeCon
• Cohen unveiled his novel ideas at the first CodeCon conference in 2002.
• CodeCon is a conference for hackers and technology enthusiasts.
• Co-organised by Bram and his roommate Len Sassaman.
• CodeCon is intended to be a low-cost conference (i.e. <$100) with a focus on developers doing presentations of working code, rather than on companies with products to sell.
• It remains an event for those seeking information about new directions in software, though BitTorrent continues to lay claim to the title of "most famous presentation".

Features?
• Peer-to-peer in nature

Taxonomy for Distributed Systems
The taxonomy is based on the following factors and their relation to centralization:
1. Resource Discovery: what is the mechanism for discovering resources on a distributed system?
   • Examples: DNS, Napster Lookup, Jini LUS, UDDI, Gnutella broadcast etc.
2. Resource Availability: scalability. Do resources scale with the network? Does access to them scale with the network?
3. Resource Communication: two types:
   • Brokered communication (centralized): communication is passed through a central server; resources do not have direct references to each other.
   • Point to point (decentralized, peer to peer): a direct connection between the sender and the receiver.

Centralization of Point-to-Point Connections
• True peer to peer, e.g. Gnutella: equal peers, balanced (equal) load on communication.
• Web server: many-to-one relationship between users and the web server, and therefore this can be considered centralized communication.
• BitTorrent: [Figure: BitTorrent peers, with connections labelled "pieces"]

Features?
• Peer-to-peer in nature
• Central server called a tracker
• Tracker uses HTTP
• Download and upload at the same time
• Efficiency improves the more a file is downloaded

Downloading Speeds
Download speeds depend on two factors:
• BitTorrent keeps track of how much you contribute to hosting files for the group.
  • The more you share, the faster your downloads.
• The more people trading a file, the more options for obtaining its pieces.
  • So, unlike the old Napster, popularity doesn't bog down the process; it gives it a shot of adrenaline.
• Trackers are also more dynamic than Napster servers: they provide updates.

File Swarming
• File swarming allows users to download files at the maximum download capacity of their broadband connection.
• It enables simultaneous downloads of pieces of the same file from multiple users.
• This is significant because broadband has far lower upload bandwidth than download bandwidth.
  • Upload bandwidth can be ten times slower than download bandwidth.
  • Connecting to, say, ten peers balances this mismatch and enables full download capacity.

BitTorrent Protocol
• The BitTorrent protocol is an open specification.
• It can be found in full on the BitTorrent Web site.
• It is updated periodically in order to keep various BitTorrent applications compatible.

Terminology 1
• Torrent - metadata file containing the information about a file to be shared on the BitTorrent network
• Peer - a participant in the network
• Seed - a peer that has a complete copy of the file (the first seed probably created the torrent)
• Swarm - the peers that are connected to one another and interested in a particular file
• Tracker - server responsible for keeping track of the peers in a swarm

Terminology 2
• Choked - state of a connection when a peer does not wish to upload data at this time (perhaps because it already has too many connections)
• Interested - a client is “interested” when it wants to download pieces of a file from another BitTorrent node.
• Piece - a fixed-size chunk of a file in BitTorrent; the piece size is typically a power of 2 and depends on the file size. Common sizes are 256 KB, 512 KB or 1 MB.
• Bencoding - terse format for BitTorrent messages

BitTorrent
A BitTorrent application generally has the following components:
• An 'original' downloader (the seed)
• An ordinary web server
• The end users' web browsers, in which a user clicks on:
• A static 'metainfo' file (a .torrent file), which
• Starts the end user's downloading application (BitTorrent)
• A BitTorrent tracker
There are ideally many end users for a single file.

Lectures as .torrent
[Figure: Seed (Ian T.), Web Server, User Web Browser, Tracker, BitTorrent Client and other BitTorrent Clients (enthusiastic students)]
1. Ian creates IansLectures.torrent (metadata) and uploads it to a Web site. Web sites contain .torrent files.
2. A user clicks IansLectures.torrent, which launches the BitTorrent client because of the MIME mapping from .torrent to the BitTorrent application.
3. Clients show interest in IansLectures.torrent.
4. The BitTorrent client contacts the specified tracker and finds “interested” clients.
5. Clients connect to each other and to the seed to download pieces.

BitTorrent Messages - Bencoding
• Bencoding is a way to specify and organize data in a terse format. It supports the following 4 types:
• Strings are encoded as <string length>:<string data>, e.g. 4:spam represents the string "spam".
• Integers are encoded as i<integer>e, e.g. i3e represents the integer 3.
• Lists are encoded as l<bencoded values>e, e.g. l4:spam4:eggse represents the list of two strings [ "spam", "eggs" ].
• Dictionaries are encoded as d<bencoded string><bencoded element>e; note that keys must be bencoded strings. E.g. d4:spaml1:a1:bee represents the dictionary { "spam" => [ "a", "b" ] }.

.torrent Files
The content of a ".torrent" is a bencoded dictionary, containing:
• announce: the URL of the tracker (string); later versions have lists of trackers.
• info: a dictionary that describes the file(s) of the torrent. It contains the following:
  • name: name for the file
  • piece length: number of bytes in each piece (integer)
  • pieces: string consisting of the concatenation of all 20-byte SHA1 hash values, one per piece (byte string)
• The format differs between the single-file case (as above) and the multi-file case, where the info dictionary carries a files list with one entry per file, and each entry uses path in place of name to identify its file uniquely.

BitTorrent - Trackers
• Centralised: all clients go to one server.
• The BitTorrent solution: customers help distribute content.
• Their contribution grows at the same rate as their demand, creating limitless scalability for a fixed cost.
• The tracker maintains the process.

Tracker Scenario
[Figure: Seed, Tracker and peers BT 1, BT 2 and BT 3. Arrow labels: Step 1 - Pieces 1, 2 and 3; Tracker update; Step 2 - Pieces 4, 5 and 6; Step 2 - Piece 1; Step 2 - Piece 2; Step 2 - Piece 3.]

Tracker GET Request
Peer -> Tracker
• info_hash - 20-byte SHA1 hash of the bencoded form of the info value from the metainfo file.
• peer_id - string of length 20 containing the ID of the downloader, generated at random at the start of a new download.
• ip - IP address (or DNS name) of the peer.
• port - port number for the peer; the downloader tries port 6881 and, if that port is taken, tries 6882, then 6883, etc., giving up after 6889.
• uploaded - the total amount uploaded so far.
• downloaded - the total amount downloaded so far.
• left - number of bytes this peer still has to download.
• event - optional key which maps to started, completed, or stopped (or empty, which is the same as not being present).

Tracker Response
• Tracker -> peer
• Tracker responses are bencoded dictionaries.
• If a tracker response has the key failure reason, then that maps to a human-readable string which explains why the query failed, and no other keys are required.
• Otherwise, it must have two keys:
  • interval, which maps to the number of seconds the downloader should wait between regular rerequests.
  • peers, which maps to a list of dictionaries corresponding to peers, each of which contains the keys peer id, ip, and port, which map to the peer's self-selected ID, IP address or DNS name as a string, and port number, respectively.

Scenario
[Figure, repeated over several animation steps: Tracker, Web Server, Peers A, B and C, and the Downloader (“US”); peers are labelled [Leech] or [Seed].]

Strengths
• Better bandwidth utilization
• Unprecedented speeds
  • Up to 7 MB/s from the Internet.
• Limits free riding: tit-for-tat
• Limits leech attacks: coupling of upload and download
• Spurious files are not propagated
• Ability to resume a download
• Open-source implementations

Potential Drawbacks
• Small files: latency, overhead
• Scalability
  • Millions of peers: tracker behavior (uses 1/1000 of bandwidth)
  • Single point of failure: although there can be many trackers, there is only one tracker assigned to each torrent file
  • Difficult to load balance
  • Solved later by having lists of alternative trackers
• Robustness
  • System progress depends on the altruistic nature of seeds (and peers)
  • Malicious attacks and leeches.

Who Uses it?
• 160 million clients, 100 million active users
• According to their website, the company has announced partnerships with some 55 companies, including:

BitTorrent: summary
1. BitTorrent
   a) Underlying file sharing protocol
   b) Role of the .torrent
   c) Use and role of the tracker
   d) BitTorrent scenario
   e) How file swarming works
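
Worked example: bencoding, info_hash and the tracker request
The sketch below is one possible way to tie together the bencoding rules, the .torrent info dictionary and the tracker GET request described in the protocol slides. It is a minimal illustration, not a reference client. The tracker URL, the file name and the file content are hypothetical. The length key and the sorting of dictionary keys are taken from the published BitTorrent specification rather than from these slides.

```python
import hashlib
import os
from urllib.parse import urlencode


def bencode(value):
    """Encode an int, str/bytes, list or dict with the four bencoding rules."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys must be strings; the published specification also wants them sorted.
        pairs = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead in (b"l", b"d"):
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        if lead == b"l":
            return items, pos + 1
        # A dictionary is a flat run of key, value, key, value, ...
        return dict(zip(items[0::2], items[1::2])), pos + 1
    colon = data.index(b":", pos)
    start = colon + 1
    length = int(data[pos:colon])
    return data[start:start + length], start + length


def piece_hashes(content, piece_length):
    """Concatenate the 20-byte SHA1 digest of every piece of the content."""
    return b"".join(
        hashlib.sha1(content[i:i + piece_length]).digest()
        for i in range(0, len(content), piece_length)
    )


def announce_url(metainfo, peer_id, port, uploaded, downloaded, left, event="started"):
    """Build the tracker GET request URL from the metainfo and transfer counters."""
    info_hash = hashlib.sha1(bencode(metainfo["info"])).digest()
    query = urlencode({
        "info_hash": info_hash,  # raw bytes, percent-encoded by urlencode
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "event": event,
    })
    return metainfo["announce"] + "?" + query


# Hypothetical torrent: stand-in content, 256 KB pieces, made-up tracker URL.
content = b"lecture notes " * 40000
metainfo = {
    "announce": "http://tracker.example.org/announce",
    "info": {
        "name": "IansLectures.pdf",
        "length": len(content),
        "piece length": 262144,
        "pieces": piece_hashes(content, 262144),
    },
}
url = announce_url(metainfo, peer_id=os.urandom(20), port=6881,
                   uploaded=0, downloaded=0, left=len(content))
print(url)

# A made-up tracker response, encoded and decoded again (decoded keys are bytes).
response = bencode({"interval": 1800,
                    "peers": [{"peer id": b"A" * 20, "ip": "192.0.2.10", "port": 6881}]})
decoded, _ = bdecode(response)
print(decoded[b"interval"], decoded[b"peers"][0][b"ip"])
```

The decoder returns byte strings rather than text because fields such as pieces and peer id hold raw bytes that are not valid text. A real client would also send the request over HTTP and handle the failure reason key, which this sketch leaves out.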