Measurements, Analysis, and Modeling of BitTorrent-like Systems
Lei Guo1, Songqing Chen2, Zhen Xiao3, Enhua Tan1, Xiaoning Ding1, and Xiaodong Zhang1
1College of William and Mary, 2George Mason University, 3AT&T Labs - Research

Basic Model of P2P Systems
• Peers sharing different files self-organize into a P2P network
• Exchange files they desire
• Limitations
– Free riding
– Large file downloading
• Examples: Gnutella, KaZaa, eDonkey/eMule/Overnet

BitTorrent: Fast Delivery with Incentive
• A large file is divided into chunks
• Peers interested in the same file self-organize into a torrent
• Peers exchange file chunks with each other
• Incentive is established by tit for tat
• Very simple and effective; scales fairly well during flash crowds
[Diagram: Torrent of Bits]

BitTorrent Traffic
• Online users
– 6.8 million in August 2004, 9.6 million in August 2005 (BigChampagne)
• Traffic volume
– 53% of all P2P traffic on the Internet in June 2004 (CacheLogic)
[Chart: P2P traffic 60-80%, other traffic 20-30%. Source: CacheLogic, 2004]

Limited Understanding of BitTorrent
• Existing studies on BitTorrent systems (INFOCOM04, SIGCOMM04)
– Unrealistic assumptions in system model: no evolution considered
– Single-torrent based, although more than 85% of BT users join multiple torrents
• What is not yet clear about BitTorrent systems
– Service availability
– Service stability during the entire lifetime
– Service fairness
• Objectives of this work
– Evolution of a single-torrent system, and limitations of BT
– Multi-torrent model for inter-torrent relation and collaboration

Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
• Modeling and characterization of multi-torrent system
• Inter-torrent collaboration
• Conclusion

How BitTorrent Works: Publishing
[Diagram: the initial seed, holding the file chunks and foo.torrent, announces "I am here!"]
• The publisher
– Create a meta file
– Publish it on a Web site
– Start the tracker site
– Start a BT client as the initial seed
• Meta file (foo.torrent) fields
– announce: tracker URL for bootstrap
– creation date: epoch time of file creation
– length: file size
– name: file name
– piece length: chunk size
– pieces: SHA1 hash key of each chunk
[Diagram: Web site hosting foo.torrent; tracker site holding the peer list]

How BitTorrent Works: Downloading
• The downloader
– Download the meta file
– Start a BT client, connect to the tracker site
– Get peer list from tracker
– Get first chunk from other peers (seeds)
– Exchange file chunks with other peers
– Download complete: become a new seed
– Initial seed leaves
• Future performance depends on the arrival and departure of new downloaders and seeds
[Diagram: users download foo.torrent from the Web site, announce "I am here!" to the tracker site, receive the peer list, and exchange chunks with the seed and with each other]
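The meta file fields listed on the publishing slide can be illustrated with a minimal Python sketch. It is an illustration only, not the BitTorrent client's implementation: the function names (`split_into_pieces`, `build_meta_file`, `verify_piece`) and the tracker URL are hypothetical, and the fields are kept as a flat dictionary exactly as the slide lists them, whereas a real .torrent file is bencoded and nests the file fields under an info dictionary.

```python
import hashlib
import time
from typing import Dict, List


def split_into_pieces(data: bytes, piece_length: int) -> List[bytes]:
    """Cut the file content into fixed-size chunks; the last one may be shorter."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def build_meta_file(name: str, data: bytes, tracker_url: str,
                    piece_length: int) -> Dict[str, object]:
    """Assemble the fields named on the slide (flat layout, an assumption)."""
    pieces = split_into_pieces(data, piece_length)
    return {
        "announce": tracker_url,             # tracker URL for bootstrap
        "creation date": int(time.time()),   # epoch time of file creation
        "length": len(data),                 # file size in bytes
        "name": name,                        # file name
        "piece length": piece_length,        # chunk size in bytes
        "pieces": [hashlib.sha1(p).digest() for p in pieces],  # SHA1 per chunk
    }


def verify_piece(meta: Dict[str, object], index: int, piece: bytes) -> bool:
    """A downloader checks a received chunk against the published SHA1 hash."""
    return hashlib.sha1(piece).digest() == meta["pieces"][index]


if __name__ == "__main__":
    # Hypothetical 10-byte "file" and tracker URL, for illustration only.
    meta = build_meta_file("foo", b"a" * 10, "http://tracker.example/announce", 4)
    print(len(meta["pieces"]), "pieces;", "first piece ok:",
          verify_piece(meta, 0, b"aaaa"))
```

Because every chunk has its own hash, a downloader can validate each chunk as it arrives from any peer, which is what makes exchanging chunks with untrusted peers workable.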
Our Methodology of this Study
• Measurement
– BitTorrent traffic pattern
– Meta file downloading and tracker statistics
• Analysis
– BitTorrent user behavior and performance limitations
– Curve fitting, parameter estimation and validation of mathematical models
• Modeling
– Torrent evolution and inter-torrent relation
– Fluid model, probability model, and graph model

Meta File Downloading
• The first HTTP packets of .torrent file downloads
– Cable network: 3,000+ downloads, 1,000+ torrent meta files
– Server farm: 50 tracker sites hosting hundreds of torrents
– Gigascope: fast Internet traffic monitoring tool by AT&T
• What information does it contain?
– Torrent birth time
– Peer arrival time to the torrent (packet capture time of the download)
– Collected over about 10 days
[Sidebar: foo.torrent fields (announce, creation date, length, name, piece length, pieces)]

Torrent Statistics on Trackers
• Professional/dedicated tracker sites
– Each may host thousands of torrents at the same time
– http://www.alluvion.org/ and http://www.crapness.com/, collected by University of Massachusetts, Amherst
– Ex: alluvion hosts 1,500 torrents, of which 550 are fully traced
• What information does it contain?
– Torrents: torrent birth time, file size, number of peers/seeds
– Peers: request time, downloaded/uploaded bytes, download/upload bandwidth
– Sampled every 0.5 hour for 48 days

Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
– The evolution of torrent over time
– Limitations of current BitTorrent systems
• Modeling and characterization of multi-torrent system
• Inter-torrent collaboration
• Conclusion

Torrent Popularity
• Peer arrivals: decrease with time exponentially
• Peer arrival rate (time derivative of the CCDF of peer arrivals): $\lambda(t) = \lambda_0 e^{-t/\tau}$
[Figure: CCDF of peer arrival vs. time after torrent birth (day), raw data and linear fit, for the meta file workload and the tracker site workload]
[Figure: relative deviation (%) for individual torrents, ranked by population in non-ascending order; 6% on average]

Torrent Death
• Peer n arrives at time $t_n$
• Peer arrival rate: $\lambda(t)$; inter-arrival time: $t_{n+1} - t_n \approx 1/\lambda(t_n)$
• Seed leaving rate: $\gamma$; seed service time: $1/\gamma$
• Downloading rate: $u$; downloading time: $1/u$
• When $t_n \to \infty$, what will happen?
– Inter-arrival time > seed service time: $1/\lambda(t_n) > 1/\gamma$
– Torrent dead
[Timeline: peer n arrives at $t_n$, peer n+1 arrives at $t_{n+1}$; the torrent is dead in between]

Torrent Population and Lifespan
• Lifespan: $T_{life} = \tau \log(\lambda_0 / \gamma)$
• Population: $N_{all} = \int_0^{\infty} \lambda_0 e^{-t/\tau}\, dt = \lambda_0 \tau$
• Most torrents are small (avg 102)
• Most torrents are short-lived (avg 8 days)
[Figure: torrent population vs. rank of torrents, trace and model]
[Figure: torrent lifespan (hour) vs. torrents, trace and model]

Downloading Failure Ratio
• Avg downloading failure ratio: about 10%
• Different evolution patterns
• Small population → large $R_{fail}$
– Reminder: most torrents have small population!
• Define: $R_{fail} = N_{fail} / N_{all}$
• Altruistic peers make torrents long-lived
[Figure: torrent population and downloading failure ratio, for torrents ranked in non-ascending order of downloading failure ratio]

Torrent Evolution: Fluid Model
• Existing model (SIGCOMM 04)
– Constant arrival rate: $\lambda$ = const
– Torrent reaches equilibrium
• The correct model
– Exponentially decreasing arrival rate
– Torrent dies eventually
– Verified by our measurements
• Two completely different pictures

Torrent Evolution: Modeling Results
• Flash crowd
– Downloader #: increases exponentially
– Seed #: increases exponentially
• Peak time
– A very short duration
– Constant arrival model: flat peak
• Attenuation: a long tail
– Downloader #: decreases exponentially
– Seed #: decreases exponentially
– Constant arrival model is far from reality: no attenuation
• Torrent death
[Figure: # of downloaders and # of seeds vs. time (hour), for the trace, the model, and the constant arrival model]

Performance Stability
• Evolution over time
– Only stable when the torrent is large
– Fluctuates significantly after peak time
• Snapshot of torrents at time t
– Larger torrents have higher and more stable performance
[Figure: # of peers (downloaders, seeds; model and trace) and avg download speed (byte/sec) vs. time (hour)]
[Figure: avg download speed (byte/sec) across torrents]

Service Unfairness
• Contribution ratio: uploaded bytes / downloaded bytes
• Unfairness: download speed, uploading contribution
• Seeds serve high speed downloaders first
– Peers not willing to serve after downloading
– Not due to new file downloading: selfish
[Figure: peer contribution ratio (+) and download speed in bytes/sec (x) for ranked peers]
[Figure: peer contribution ratio (+) and # of torrents (x) for ranked peers]

Single-torrent Model: Summary
• Torrent evolution over time
– Exponentially decreasing arrival rate
– Flash crowd, short peak, long-tailed attenuation
• BitTorrent limitations
– Content availability: torrent death
– Performance stability
– Service fairness

Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
• Modeling and characterization of multi-torrent system
– Traffic pattern and user behavior
– Graph based model of inter-torrent relation
• Inter-torrent collaboration
• Conclusion

Multi-torrent Environment Dynamics
• Considering peers and torrents on the Internet as an open system
– Torrent birth rate, torrent request rate, and peer birth rate are constant
• Implication: avg # of torrents a peer requests = torrent request rate / peer birth rate = constant
• The lifecycle of a BT peer: downloading, seeding, sleeping, …, dead
[Figure: CDF of torrents (torrent birth), CDF of requests (request arrival), and CDF of peers (peer birth) vs. time (hour); raw data with linear and asymptotic fits]

Peer Request Pattern: Request Rate
• Peer request rate: requests by a peer to different torrents per unit time
• Assume $r(t) = r_0 e^{-t/\tau_r}$
– Fitted $\tau_r \approx 77$ years, so $r$ is effectively constant
• Peer request process: seems Poisson-like
• Request a new torrent with probability p: participation probability
• Dead with probability 1-p
[Figure: # of torrents (x) and r (+) for ranked peers]

Peer Request Pattern: Participation Probability
• Probability model
– $N p^{m-1}$ peers request at least m torrents: $i = N p^{m-1}$
– $m = 1 + \frac{\log i - \log N}{\log p}$
– Linear fit: p = 0.8551
• Probability model confirmed
• Another estimation of p
– $\bar{m} = \frac{\text{torrent request rate}}{\text{peer birth rate}} = \sum_{k \geq 1} k\, p^{k-1} (1-p) = \frac{1}{1-p}$
– $\bar{m} = 7.514$, p = 0.8548
[Figure: number of torrents (m) vs. peer rank (log i), raw data and linear fit]

Inter-torrent Relation Graph: How Torrents Can Help with Each Other?
• Each torrent is a node; torrents i and j are connected by a directed edge when
– some peers in torrent i have downloaded j, or
– some peers in torrent j have downloaded i
• Edge weight $W_{i,j}$: number of such peers
• Out-degree: $SP_i = \sum_{j \geq 1} W_{i,j}$
• In-degree: $SG_j = \sum_{i \geq 1} W_{i,j}$
[Figure: weighted out-degree and weighted in-degree vs. torrents, compared with torrent size (# of online peers); trace and model]

Single-torrent vs. Multi-torrent Model
• Single-torrent model
– Longer seed service time → lower download failure rate
– Seed service time is limited, but inter-arrival time grows exponentially
– Small improvement
• Multi-torrent model
– Old peers come back multiple times
– Higher peer arrival rate, shorter peer inter-arrival time
– Significant improvement

Single-torrent vs. Multi-torrent Model
• Single-torrent model
– $T_{life} = \tau \log(\lambda_0 / \gamma)$, $R_{fail} = \gamma / \lambda_0 = 0.1$
– Seeds stay 10 times longer: $\gamma^* = \gamma / 10$
– $T^*_{life} = \tau \log(\lambda_0 / \gamma^*) = T_{life} + \tau \log 10$
– $R^*_{fail} = \gamma^* / \lambda_0 = \frac{1}{10} R_{fail} = 0.01$
• Multi-torrent model
– $\lambda'(t) = \lambda_0 e^{-t/\tau}\, \frac{q^{k(t)+1} - 1}{q - 1}$, with $k(t) = rt$ and $q = p\, e^{1/(r\tau)}$
– Torrent death: $\lambda'(T'_{life}) = \gamma$
– $T'_{life} \approx \frac{T_{life}}{r \tau \log(1/p)} \approx 6\, T_{life}$
– $R'_{fail} = R_{fail}^{6} = 1 \times 10^{-6} \approx 0$, where $R_{fail} = e^{-T_{life}/\tau}$
• Inter-torrent collaboration is much more effective than stimulating seeds to serve longer

Outline
• BitTorrent mechanism and our methodology
• Modeling and characterization of single-torrent system
• Modeling and characterization of multi-torrent system
• Inter-torrent collaboration
– Tracker site overlay
– Instant incentive for collaboration
• Conclusion

Tracker Site Overlay
• Neighbor-in: torrents that can serve me
• Neighbor-out: torrents that I can serve (peer list)
• Self-organized P2P network (a logical structure)
• An instance of inter-torrent relation graph
• A built-in mechanism for content search, covering 99%+ of torrents
• Trackerless BitTorrent: uses DHT to store meta file
[Diagram: tracker sites A, B, C, D linked through their neighbor-in and neighbor-out tables]

Incentive for Inter-Torrent Collaboration
[Diagram: users in different torrents (A, B, D) serve files to each other; "Thanks Jack!"]
• Instant incentive: similar to the "tit-for-tat" principle
• Neighboring cycle detection
• Neighboring cycle construction
– Bandwidth trading: get one chunk, serve multiple peers

Conclusion
• Extensive analysis and modeling to study the behaviors of BT-like systems
– Tracker trace and .torrent downloading trace
– Mathematical model
• BitTorrent system has its limitations due to exponentially decreasing peer arrival rate
– Service availability, performance stability, and fairness
• Graph based multi-torrent model
• System design for inter-torrent collaboration

Thank you!

Backup for Questions

Torrent Lifespan
• $\lambda(t) = \lambda_0 e^{-t/\tau}$, so $\log \lambda(t) = \log \lambda_0 - t/\tau$
• Extract $t$ and $\lambda(t)$ from trace: $t_{n+1} - t_n \approx 1/\lambda(t_n)$
• Get $\lambda_0$ and $\tau$ using linear regression
• $T_{life} = \tau \log(\lambda_0 / \gamma)$
• Lifespan model verified by measurement
[Figure: torrent lifespan (hour) vs. torrents, trace and model]

Torrent Population
• Total population: $N_{all} = \int_0^{\infty} \lambda_0 e^{-t/\tau}\, dt = \lambda_0 \tau$
• Model verified by measurement
• Observations:
– The population of most torrents is small (102 on average)
– Downloading failure ratio: $R_{fail} = N_{fail} / N_{all}$
– Small population → large $R_{fail}$
[Figure: torrent population vs. rank of torrents (in non-ascending order of modeling results), trace and model]

Torrent Evolution: Fluid Model
• Basic equation set
$\frac{dx(t)}{dt} = \lambda_0 e^{-t/\tau} - \mu\,(\eta x(t) + y(t))$,
$\frac{dy(t)}{dt} = \mu\,(\eta x(t) + y(t)) - \gamma y(t)$,
$x(0) = 0$, $y(0) = 1$.
• Resolution
$x(t) = a e^{\beta_1 t} + b e^{\beta_2 t} + d_1 e^{-t/\tau}$,
$y(t) = c_1 e^{\beta_1 t} + c_2 e^{\beta_2 t} + d_2 e^{-t/\tau}$,
$u(t) = \mu \left( \eta + \frac{y(t)}{x(t)} \right)$.
• Parameters
– $x(t)$: number of downloaders
– $y(t)$: number of seeds
– $\lambda_0$: initial peer arrival rate
– $\tau$: attenuation parameter of $\lambda$
– $\mu$: uploading bandwidth
– $c$: downloading bandwidth ($c \gg \mu$)
– $\gamma$: seed leaving rate
– $\eta$: file sharing efficiency
– $\beta_1, \beta_2$: eigenvalues of the equation set
– $a, b, c_1, c_2, d_1, d_2$: constants

Peer Request Pattern: Summary
• Multi-torrent environment: an open model
– Torrent birth rate: 0.9454 per hour (nearly a constant)
– Peer birth rate: 19.37 per hour (nearly a constant)
– Torrent request rate (for all peers over all torrents): 133.39 per hour (nearly a constant)
• Actually increases slowly according to BigChampagne
– Increasing rate: $\frac{9.6\,\text{M} - 6.8\,\text{M}}{6.8\,\text{M} \times 365 \times 24\ \text{hours}} = 0.0047\%$ per hour
• Peer request pattern
– Lifecycle: downloading, seeding, sleeping, …, next request with prob. p
– Peer participation probability: 0.85
– Request rate (for different torrents by a peer): Poisson-like

Tracker Site Overlay
• Table size: $O\left(\frac{N}{1-p}\right)$
• Node degree distribution
– Similar to unstructured P2P networks
• Many content search and message routing algorithms
– Flooding
– Random walk
– …
• Trackerless BitTorrent: uses DHT to store meta file

Simulation Experiments
• Compared: without inter-collaboration vs. with inter-collaboration
– Content availability (downloading failure ratio): $R_{fail} \approx 0$
– Performance stability (downloading speed): more stable
– Service fairness (contribution ratio): more balanced
• Inter-torrent collaboration can improve BitTorrent performance
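
The arithmetic behind the backup slides can be checked with a short Python sketch. It assumes the closed forms read off the slides: lifespan $T_{life} = \tau \log(\lambda_0/\gamma)$, population $N_{all} = \lambda_0 \tau$, failure ratio $R_{fail} = e^{-T_{life}/\tau}$, and mean number of requested torrents $1/(1-p)$. The function names and the values chosen for $\lambda_0$, $\tau$, and $\gamma$ in the usage part are hypothetical; only the rates 133.39 and 19.37 per hour and the user counts 6.8 M and 9.6 M come from the slides.

```python
import math


def torrent_lifespan(lambda0: float, tau: float, gamma: float) -> float:
    """T_life = tau * log(lambda0 / gamma): the time at which the peer
    inter-arrival time 1/lambda(t) reaches the seed service time 1/gamma."""
    if not (lambda0 > gamma > 0 and tau > 0):
        raise ValueError("need lambda0 > gamma > 0 and tau > 0")
    return tau * math.log(lambda0 / gamma)


def torrent_population(lambda0: float, tau: float) -> float:
    """N_all = integral of lambda0 * exp(-t / tau) over [0, inf) = lambda0 * tau."""
    return lambda0 * tau


def failure_ratio(lambda0: float, tau: float, gamma: float) -> float:
    """R_fail = exp(-T_life / tau), which simplifies to gamma / lambda0."""
    return math.exp(-torrent_lifespan(lambda0, tau, gamma) / tau)


def participation_probability(request_rate: float, peer_birth_rate: float) -> float:
    """Solve m = 1 / (1 - p) for p, with m = request rate / peer birth rate."""
    mean_torrents = request_rate / peer_birth_rate
    return 1.0 - 1.0 / mean_torrents


def hourly_growth_percent(users_start: float, users_end: float, hours: float) -> float:
    """Relative growth per hour, in percent, spread linearly over the period."""
    return (users_end - users_start) / (users_start * hours) * 100.0


if __name__ == "__main__":
    # Hypothetical torrent: 50 arrivals/hour initially, tau = 30 h, gamma = 5/hour.
    lambda0, tau, gamma = 50.0, 30.0, 5.0
    print("T_life (hours):", torrent_lifespan(lambda0, tau, gamma))
    print("N_all:", torrent_population(lambda0, tau))
    print("R_fail:", failure_ratio(lambda0, tau, gamma))
    # Seeds staying 10 times longer (gamma / 10) only divides R_fail by 10.
    print("R_fail, longer seeding:", failure_ratio(lambda0, tau, gamma / 10))
    # Rates and user counts quoted on the slides.
    print("p:", participation_probability(133.39, 19.37))
    print("growth (% per hour):", hourly_growth_percent(6.8e6, 9.6e6, 365 * 24))
```

Under these formulas, making seeds stay ten times longer adds only $\tau \log 10$ to the lifespan, which is the "small improvement" that the single-torrent comparison refers to.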