BitTorrent network

On the itinerary:
- Introduction to BitTorrent
  - Basics & properties
- 3 interesting analysis results

Publishing
- How to publish a (usually large) file?
- Dedicated server: easy to manage, easy to find, persistent service
- Nevertheless…
- BitTorrent:
  - Organizes multiple clients that share the same file
  - Leverages the upload bandwidth of the participants
  - Self-scaling, resilient, operates well in “flash crowd” periods
  - Takes 50%-60% of all p2p traffic

The Basics of BitTorrent
Content provider (everyday people) wants to publish a file (the initial seed):
- Creates a meta file (*.torrent)
- Publishes this (light) *.torrent file on a web server
- The file is broken into small blocks (32-256 KB)
- Uploads blocks to other peers
- The goal: to publish the file to many nodes with the help of other peers, with minimum load on the seeder

The Basics of BitTorrent
Third party (the tracker):
- A tracker site keeps track of the active participants + extra statistics
- Upon requests from nodes, supplies a random subset of active nodes
- Receives updates from the active nodes
- Keeps track of new nodes joining the ‘torrent’ (or ‘swarm’) and of nodes that left it

The Basics of BitTorrent
Peer (leecher) who is interested:
- Obtains the public *.torrent file
- Is directed to the tracker
- Obtains a list of random neighbors (~40)
- Downloads and uploads blocks to its ‘best’ neighbors (choking and unchoking)
- Upon download completion, becomes a seed

BitTorrent demo (Wikipedia)

BitTorrent basic schemes
(Immediate) problems arise:
- Last block problem: assume nodes depart upon completion; how would leechers obtain the last block?
- Free riding problem: peers willing to download, but unwilling to serve others
Simple and effective solutions:
- Last block problem: nodes employ a Local Rarest First policy
- Free riding problem: nodes employ a “Tit for Tat” policy, i.e. give more to those from whom you receive more

On the itinerary
- One interesting limitation of BitTorrent networks
  - BitTorrent provides poor service availability
  - Shown via analysis of tracker logs over a long period
- Proposition of a peer selection approach
  - Enables ISPs to lower costs and resources
  - Does not require cooperation between ISPs and peers
  - Implementation of the proposition as part of the Azureus client, codename ‘Ono’
- Tradeoffs
  - Performance (avg. download time) vs. Fairness (avg. share ratio)

BitTorrent limitations
- Taken from the paper “Measurements, Analysis, and Modeling of BitTorrent-like Systems” [1]
- Inspects overall performance over the lifetime of a torrent
- Analysis is based on traces
- A model is derived and used to draw conclusions
- Verifies that the derived conclusions match the observed behavior
- Limitations were found:
  - Poor service availability (coming next)
  - Fluctuating download performance
  - Unfair service to peers

Service availability - Analysis
- Analysis is based on traces
  - Tracker logs (~1500 torrents, sampled every 30 sec)
  - Traces from servers that publish *.torrent files
- Extracted data
  - Identify peers
  - Birth time of the torrent, size
  - For each peer in the same torrent: arrival time, download & upload bandwidth, cumulative downloaded & uploaded bytes

Service availability - Analysis
- Y-axis at time t: the total number of requests for all torrents in the trace minus the cumulative number of requests for all torrents by time t after their birth
- Similar observations are seen in the *.torrent metadata traces

Service availability - Analysis
- This suggests that the request rate decreases exponentially after a torrent is born
- Notice that this is a cumulative measure
- Does a specific torrent behave the same?
- Use the least squares method to measure how much a specific torrent deviates from this logarithmic fit
- Analyze the distribution of the average relative deviation (which is mostly small: 6% on average)

Service availability – A model
- Based on the observations, define the torrent popularity at time t as the peer arrival rate, which is the derivative of the peer arrival time distribution for that torrent
- Arrival rate: $\lambda(t) = \lambda_0 e^{-t/\tau}$
  - where $\lambda_0$ is the initial arrival rate when the torrent starts
  - and $\tau$ is the attenuation parameter
  - Both are evaluated from the observations

Service availability – A model
- Define the torrent lifespan: the duration from the torrent's birth to the time at which no complete copy remains (thus new leechers would not be able to complete the download)
- The inter-arrival time between two successive arriving peers can be approximated as $\delta t = 1/\lambda(t)$
- Assume seeds leave the system at rate $\gamma$; then the average service time of a seed is approximately $1/\gamma$

Service availability – A model
- Look at consecutive peers that join the torrent
- Peers n and n+1 join the torrent at times t(n) and t(n+1), respectively
- The inter-arrival time between them is approximately $1/\lambda(t(n))$
- Peer n downloads the file with speed u(n) and stays in the torrent for a time duration […]
- When the peer arrival rate is small enough (n is large), peer n+1, with speed u(n+1) <= u(n), can only be served by peer n

Service availability – A model
- Thus, when $1/\lambda(t) > 1/\gamma$, peer n+1 can’t complete the download and the torrent is dead
- Using the definition of the arrival rate, we get the torrent lifespan: $T_{life} = \tau \ln(\lambda_0 / \gamma)$
- Both $\lambda_0$ and $\tau$ are extracted from the trace (using linear regression), as well as $\gamma$
- Compare results from the model to what the trace holds
- Results match very well

Model vs.
Observations
- Comparison of torrent lifespans: the average lifespan is 8.89 according to the trace and 8.34 based on the model

Service availability - Summary
- Conclusions were obtained relying on extensive trace analysis and modeling
- Existing BitTorrent systems provide poor service availability
- This is due to the exponentially decreasing peer arrival rate
- This provides strong motivation for inter-torrent collaboration

Next on the itinerary
- One interesting limitation of BitTorrent networks
  - BitTorrent provides poor service availability
  - Shown via analysis of tracker logs over a long period
- Proposition of a peer selection approach
  - Enables ISPs to lower costs and resources
  - Does not require cooperation between ISPs and peers
  - Implementation of the proposition as part of the Azureus client, codename ‘Ono’
- Tradeoffs
  - Performance (avg. download time) vs. Fairness (avg. share ratio)

Reducing cross-ISP traffic
- Taken from the paper “Taming the Torrent” [2]
- Motivation: the overwhelming popularity of p2p (70% of internet traffic worldwide) yielded significant revenues for ISPs
- However, p2p traffic has significantly increased ISPs’ costs, particularly in terms of cross-ISP traffic
- This has driven ISPs to try to forcefully reduce p2p traffic
  - Blocking specific ports, tricking clients into closing connections
  - Deep packet inspection
  - Caching

Reducing cross-ISP traffic
- One approach to alleviate this pain is to use an oracle that provides knowledge about which peers are in the same ISP
- This would benefit both the ISPs and the p2p community
- But this requires p2p users and ISPs to collaborate and to trust each other
- Not likely to be adopted

Reducing cross-ISP traffic
- Another approach is to recycle data that is already being collected by Content Distribution Networks (CDNs)
- CDNs attempt to improve web performance by redirecting requests to replica servers
- The goal is to help content providers (e.g.
  CNN) to distribute content by redirecting requests to replica servers that:
  - Are topologically proximate
  - Provide lower latency

CDNs as oracles
- Hypothesis: when peers exhibit similar redirection behavior, they are likely to be close to the same replica server, and thus to each other
- Represent redirection behavior using ratio-maps
  - Each ratio represents the frequency of redirection to a specific replica
  - The number of replicas is usually small (max 31)
  - Keep a time window (~ a day)

CDN redirections as ratio-maps
- The ratio-map of a peer a is a set of (replica server, ratio) pairs: $\nu_a = \{(r_1, \nu_{a,1}), (r_2, \nu_{a,2}), \ldots\}$
- Specifically, if peer a is redirected toward replica server r1 75% of the time window, and toward replica server r2 25% of the time window, then the corresponding ratio-map is $\nu_a = \{(r_1, 0.75), (r_2, 0.25)\}$
- The sum of all $\nu_{a,i}$ in a given ratio-map equals one

Similarity via ratio-maps
- Define a metric that, given two peers, produces a value describing the similarity between the peers’ redirection behavior
- We are looking for overlap in the redirection frequency maps of each pair of peers
- Use cosine similarity between two peers a and b

Cosine-similarity
- Distance(a,b): $\cos\_sim(a,b) = \dfrac{\sum_{i \in I_a} \nu_{a,i}\,\nu_{b,i}}{\sqrt{\sum_{i \in I_a} \nu_{a,i}^2}\;\sqrt{\sum_{i \in I_b} \nu_{b,i}^2}}$
- The sum is over $I_a$ ($I_b$), the set of replica servers to which peer a (b) has been redirected over the time window
- $\nu_{a,i}$ is the ratio of time that peer a has been redirected to replica server i
- Cosine similarity is analogous to a dot product
- When the maps are identical, it equals 1
- When the maps are orthogonal (no common replica), it equals 0
- Values lie in [0,1]
- Determine a threshold (0.15)

CDNs implementation
- Ono, an extension to the Azureus client
- Upon handshake, two peers exchange ratio-maps
- This enables Ono to perform biased peer selection
- Performs a DNS lookup for each CDN name to determine redirection behavior and encodes it in ratio-maps
- Periodically updates the ratio-maps
- Overhead is extremely small: 18 KB upstream, 36 KB downstream per day
- Computation of cosine similarity is easy

Ono-recommended empirical results
- Over 120,000 peers use Ono
- Ono collects extra network data
  - Ping, Trace-Route to replica
    servers and peers
  - Obtain feedback on the biased peer selection
- Not easy to determine cross-ISP hops
  - Counting IP hops is easy and gives some measure
- Compare Ono-recommended peer selection to random peer selection

Ono-recommended empirical results
- Figure: cumulative distribution function of the number of IP hops taken along paths between an Ono client and its peers
  - Each value represents the average number of hops for all peers seen by a particular Ono client during a 6-hour interval
- Ono finds shorter paths
  - The median is less than half
  - More than 20% are only one hop away, vs. less than 2% in the random case

Ono-recommended empirical results
- Each IP address was mapped to its corresponding Autonomous System (AS) id
- Similar to the previous graph
- Over 33% of paths found by Ono do not leave the origin AS
- Median AS hops is one vs. less than 10% in the random case

CDNs as oracles - Summary
- Recycles network views collected by CDNs
- Good internet citizenship in terms of reducing cross-ISP traffic
- Performance of peers is not affected
- Scalable (the more clients adopt it, the more accurate the bias gets)
- Easily and freely available

Last on the itinerary
- One interesting limitation of BitTorrent networks
  - BitTorrent provides poor service availability
  - Shown via analysis of tracker logs over a long period
- Proposition of a peer selection approach
  - Enables ISPs to lower costs and resources
  - Does not require cooperation between ISPs and peers
  - Implementation of the proposition as part of the Azureus client, codename ‘Ono’
- Tradeoffs
  - Performance (avg. download time) vs. Fairness (avg. share ratio)

BitTorrent Tradeoffs
- Taken from the paper “The Delicate Tradeoffs in BitTorrent-like File Sharing Protocol Design” [3]
- Peers that participate in BT are heterogeneous with regard to download and upload capacities
- Taking a system approach
  - The system throughput depends critically on the “fat” peers
  - However, this might result in unfairness towards those who contribute more
  - This in turn would encourage peers to supply a low upload rate to others

BitTorrent Tradeoffs
- A user would want the download it gets to be proportional to the upload it supplies
- Assuming peers take the system overview
  - Long lasting, steady state, rational
- Two parameters and their interrelations are explored
  - Performance: average download time (to be minimized)
  - Fairness: ratio between give and take

Model
- W.L.O.G. the file is of size 1
- Assume an average peer arrival rate of $\lambda$
- Assume peers do not abort
- Upon completion, a peer leaves the torrent (BT provides no incentive for seeding)
- Assume n classes (types) of peers
- For each new peer arrival, with probability $p_i$ it belongs to type i
- Thus, the average arrival rate for class i is $\lambda_i = p_i \lambda$
- Class i has $U_i$ and $D_i$ as upload and download capacities
- Assume $U_1 > U_2 > \ldots > U_n$ (type 1 are the “fat” ones…)

Model
- Visualization of the model for n=2

Model – measuring performance
- Assume (quite natural) that the bottleneck is the upload capacity, i.e.
  no network bottleneck such as server saturation
- The file uploading capacity of the entire system is: […]
- Consider the steady state:
  - Define […] as the average number of type i peers
  - Approximation: […]
  - Substitute the latter into the former, in steady state: […]

Model – measuring performance
- In steady state, the capacity should be equal to the arrival rate (as the file size is 1)
  - Obtain: […]
- Define the “share-ratio”: […]
  - In a steady and balanced system, the share-ratio should be 1
  - and rewrite as: […]
- Average system download time: […]
- Those two equations define the solution space, as well as the resulting performance
- Feasible solutions are the set: […]

Model – measuring fairness
- Consider the share-ratio a good and natural measure of fairness
- Define a fairness index: […]
  - Measures how equal the ratios are (if all are the same, it equals 1)
- (After some work) obtain: […]
  - Also expressed in terms of the upload and download of each class

Rate strategies
- General assumption: all peers maximize their upload capacity
  - Based on experiments
- We want the optimal average download time T
  - Solve the constrained problem
  - Use the Lagrange multiplier method

Rate strategies
- The optimal average download time T that is obtained is: […]
- We get an assignment for […]
  - The system gives the “thin” peers (other than type 1) maximum upload capacity
  - “Thin” peers get more than they contribute
- Calculate the fairness index under this solution (shown to be quite low)

Rate strategies
- Now, apply the same strategy to achieve optimal fairness
  - Then check the resulting performance measure
- Optimal fairness is achieved when: […]
- We get different assignments for […]
- Compare the two:
  - In terms of system performance, we have: […]
  - In terms of system fairness, we have: […]

Entire design space
- Actually, those are only two of infinitely many solutions for assigning […]
- The space lies on a curve
- Example: a system of two types with specific capacities

Simulation results
- Experiments with a two-type system (with the same characteristics as the last system)
- Figures: average downloading time; fairness

Simulation results
- Fundamental tradeoffs:
  - Taken for
    the extreme values of the two strategies: […]
- Summary: cannot enjoy both heavens
- Current BitTorrent implementations lie somewhere on the curve

Think about …
- With regard to the first paper (service availability), how was the ~8.3 average torrent lifespan deduced?

Thank you (those who are still awake…)

Papers
[1] Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang. Measurements, Analysis, and Modeling of BitTorrent-like Systems
[2] David R. Choffnes and Fabián E. Bustamante. Taming the Torrent - A Practical Approach to Reducing Cross-ISP Traffic in Peer-to-Peer Systems
[3] Bin Fan, Dah-Ming Chiu, and John C.S. Lui. The Delicate Tradeoffs in BitTorrent-like File Sharing Protocol Design
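
Returning to the ratio-map slides: the similarity test between two peers can be sketched in a few lines. This is only an illustration under stated assumptions, not Ono's actual code. It assumes a ratio-map is a plain dictionary from a replica-server label to its redirection ratio; the names `cosine_similarity` and `is_recommended`, the replica labels, and the strict `>` comparison against the threshold are all hypothetical. Only the 0.15 threshold and the 75%/25% example come from the slides.

```python
import math

# Threshold quoted in the slides; how it is compared (> vs >=) is an assumption.
THRESHOLD = 0.15


def cosine_similarity(map_a, map_b):
    """Cosine similarity of two ratio-maps (dict: replica label -> ratio)."""
    # Only replicas that appear in both maps contribute to the numerator.
    dot = sum(ratio * map_b[replica]
              for replica, ratio in map_a.items()
              if replica in map_b)
    norm_a = math.sqrt(sum(ratio * ratio for ratio in map_a.values()))
    norm_b = math.sqrt(sum(ratio * ratio for ratio in map_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty map shares nothing with any other map
    return dot / (norm_a * norm_b)


def is_recommended(map_a, map_b, threshold=THRESHOLD):
    """True when two peers' redirection behavior is similar enough."""
    return cosine_similarity(map_a, map_b) > threshold


# Hypothetical peers: a is the 75%/25% example from the slides.
peer_a = {"r1": 0.75, "r2": 0.25}
peer_b = {"r1": 0.25, "r2": 0.75}  # same replicas, different mix
peer_c = {"r3": 1.0}               # no replica in common with a

print(cosine_similarity(peer_a, peer_b))
print(is_recommended(peer_a, peer_b), is_recommended(peer_a, peer_c))
```

With these made-up maps, peers a and b share both replicas and pass the threshold, while a and c share none and score 0, matching the "orthogonal maps" case on the slide.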