Topics in Database Systems: Data Management in Peer-to-Peer Systems
Search in Unstructured P2P (p2p, Fall 05)

Outline
• Search strategies in unstructured p2p
• Routing Indexes

Topics in Database Systems: Data Management in Peer-to-Peer Systems
D. Tsoumakos and N. Roussopoulos, "A Comparison of Peer-to-Peer Search Methods", WebDB 2003

Overview
• Centralized
  – constantly-updated directory hosted at central locations
  – does not scale well: update load, single point of failure
• Decentralized but structured
  – the overlay topology is highly controlled, and files (or metadata/indexes) are placed not at random nodes but at specified locations
  – "loosely" vs. "highly" structured; DHTs
• Decentralized and unstructured
  – peers connect in an ad-hoc fashion
  – the location of documents/metadata is not controlled by the system
  – no guarantee that a search succeeds, no bounds on search time

Flooding on Overlays
(figure sequence: a node asks "xyz.mp3?"; the query is flooded hop-by-hop across the overlay until a node holding xyz.mp3 is reached)

Search in Unstructured P2P
• Must find a way to stop the search: Time-to-Live (TTL)
• Exponential number of messages
• Cycles (?)

Search in Unstructured P2P
• BFS vs. DFS
  – BFS: better response time, but reaches a larger number of nodes (message overhead per node and overall)
  – note: a BFS search continues (while the TTL is not exhausted) even if the object has already been located on a different path
• Recursive vs. iterative
  – during search, does the node issuing the query contact the other nodes directly (iteratively), or is the query forwarded recursively from node to node?
  – does the result follow the same path back?

Iterative vs. Recursive Routing
• Iterative: the originator requests the IP address of each hop
  – message transport is actually done via direct IP
• Recursive: the message is transferred hop-by-hop
(figure: retrieve(K1) routed across a network of key-value stores)

Search in Unstructured P2P
Two general types of search in unstructured p2p:
• Blind: try to propagate the query to a sufficient number of nodes (example: Gnutella)
• Informed: utilize information about document locations (example: Routing Indexes)
Informed search increases the cost of join in exchange for an improved search cost.

Blind Search Methods
Gnutella: uses flooding (BFS) to contact all accessible nodes within the TTL value
• huge overhead on a large number of peers, plus overall network traffic
• hard to find unpopular items
• such traffic has been measured at up to 60% of total Internet bandwidth consumption

Overlay Networks
• P2P applications need to:
  – Track identities & (IP) addresses of peers
    • may be many!
    • may have significant churn (update rate)
    • best not to keep n² ID references
  – Route messages among peers
    • if you don't keep track of all peers, this is "multi-hop"
• This is an overlay network
  – peers are doing both naming and routing
  – IP becomes "just" the low-level transport
    • all the IP routing is opaque

P2P Cooperation Models
• Centralized model
  – global index held by a central authority (single point of failure)
  – direct contact between requestors and providers
  – example: Napster
• Decentralized model
  – examples: Freenet, Gnutella
  – no global index, no central coordination; global behavior emerges from local interactions, etc.
  – direct contact between requestors and providers (Gnutella), or mediated by a chain of intermediaries (Freenet)
• Hierarchical model
  – introduction of "super-peers"
  – mix of the centralized and decentralized models
  – example: DNS
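To make the flooding mechanics above concrete, here is a minimal sketch of TTL-limited flooding with duplicate suppression over an adjacency-list overlay. The function name and the toy topology are illustrative assumptions, not part of any Gnutella implementation:

```python
from collections import deque

def flood_search(graph, source, target, objects, ttl):
    """TTL-limited BFS flooding (Gnutella-style blind search).

    graph:   dict node -> list of neighbor nodes (the overlay)
    objects: dict node -> set of objects stored at that node
    Returns (set of nodes where the object was found, #messages sent).
    """
    hits, messages = set(), 0
    seen = {source}                       # duplicate suppression (cycles!)
    frontier = deque([(source, ttl)])
    while frontier:
        node, budget = frontier.popleft()
        if target in objects.get(node, set()):
            hits.add(node)                # note: flooding keeps going anyway
        if budget == 0:
            continue
        for nb in graph[node]:
            messages += 1                 # every forwarded copy costs a message
            if nb not in seen:            # already-seen copies are dropped ...
                seen.add(nb)              # ... but were still transmitted
                frontier.append((nb, budget - 1))
    return hits, messages

# toy overlay: node 3 holds the file two hops away from node 0
g = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(flood_search(g, 0, "xyz.mp3", {3: {"xyz.mp3"}}, ttl=3))
```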
Free-riding on Gnutella [Adar00]
• 24-hour sampling period:
  – 70% of Gnutella users share no files
  – 50% of all responses are returned by the top 1% of sharing hosts
• A social problem, not a technical one
• Problems:
  – degradation of system performance: collapse?
  – increase of system vulnerability
  – a "centralized" ("backbone") Gnutella: copyright issues?
• Verified hypotheses:
  – H1: a significant portion of Gnutella peers are free riders
  – H2: free riders are distributed evenly across domains
  – H3: hosts often share files nobody is interested in (files that are never downloaded)

Free-riding Statistics - 1 [Adar00]
H1: Most Gnutella users are free riders. Of 33,335 hosts:
  – 22,084 (66%) of the peers share no files
  – 24,347 (73%) share ten or fewer files
  – the top 1 percent (333 hosts) share 37% (1,142,645) of the total files shared
  – the top 5 percent (1,667 hosts) share 70% of the total files shared
  – the top 10 percent (3,334 hosts) share 87% (2,692,082) of the total files shared

Free-riding Statistics - 2 [Adar00]
H3: Many servents share files nobody downloads. Of 11,585 sharing hosts:
  – the top 1% of sites provide nearly 47% of all answers
  – the top 25% of sites provide 98% of all answers
  – 7,349 (63%) never provide a query response

Free Riders
• File sharing studies:
  – lots of people download
  – few people serve files
• Is this bad?
  – If there's no incentive to serve, why do people do so?
  – What if there are strong disincentives to being a major server?

Simple Solution: Thresholds
• Many programs allow a threshold to be set
  – don't upload a file to a peer unless it shares > k files
• Problems:
  – What's k?
  – How to ensure the shared files are interesting?

Categories of Queries [Sripanidkulchai01]
(figure: the top 20 queries, categorized)

Popularity of Queries [Sripanidkulchai01]
• Very popular documents are approximately equally popular
• Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing a query for the i-th most popular query is proportional to 1/i^α)
• Access frequency of web documents also follows Zipf-like distributions, so caching might also work for Gnutella

Caching in Gnutella [Sripanidkulchai01]
• Average bandwidth consumption in tests: 3.5 Mbps
• Best case: trace 2 (73% hit rate = 3.7x traffic reduction)

Topology of Gnutella [Jovanovic01]
• Power-law properties verified ("find everything close by")
• Backbone + outskirts
• Power-Law Random Graph (PLRG): the node degrees follow a power-law distribution: if one ranks all nodes from the most connected to the least connected, then the i-th most connected node has ω/i^a neighbors, where ω is a constant

Gnutella Backbone [Jovanovic01]
(figure: the measured Gnutella backbone)
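As a rough illustration of the PLRG definition above, one can draw the ranked degree sequence ω/i^a and then match stubs (half-edges) at random. The parameter values and the stub-matching details below are assumptions for the sketch, not taken from [Jovanovic01]:

```python
import random

def plrg_degrees(n, omega, a):
    """Ranked power-law degree sequence: the i-th most connected node
    gets roughly omega / i**a stubs (at least 1, so nobody is isolated)."""
    return [max(1, int(omega / i**a)) for i in range(1, n + 1)]

def plrg_graph(degrees, seed=0):
    """Connect nodes randomly once degrees are fixed: shuffle all stubs,
    pair them up, and drop self-loops / duplicate edges."""
    rng = random.Random(seed)
    stubs = [i for i, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    return {(min(u, v), max(u, v))
            for u, v in zip(stubs[::2], stubs[1::2]) if u != v}

edges = plrg_graph(plrg_degrees(n=1000, omega=100, a=0.8))
print(len(edges), "edges")
```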
Why does it work? It's a small world! [Hong01]
• Milgram: 42 out of 160 letters made it from Omaha to Boston (~6 hops)
• Watts: between order and randomness
  – short-distance clustering + long-distance shortcuts
• Regular graph: n nodes, k nearest neighbors → path length ~ n/2k (4096/16 = 256)
• Rewired graph (1% of nodes): path length ~ random graph, clustering ~ regular graph
• Random graph: path length ~ log(n)/log(k) (~4)

Links in the small world [Hong01]
• "Scale-free" link distribution
  – scale-free: independent of the total number of nodes
  – characteristic of small-world networks
• The proportion of nodes having a given number of links n is P(n) = 1/n^k
  – most nodes have only a few connections
  – some have a lot of links: important for binding disparate regions together

Freenet: Links in the small world [Hong01]
P(n) ~ 1/n^1.5
(figure: Freenet's "scale-free" link distribution)

Gnutella: New Measurements
[1] S. Saroiu, P. K. Gummadi, S. D. Gribble. "A Measurement Study of Peer-to-Peer File Sharing Systems". Proc. Multimedia Computing and Networking (MMCN) 2002, San Jose, CA, USA, January 2002.
[2] M. Ripeanu, I. Foster, A. Iamnitchi. "Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design". IEEE Internet Computing, 6(1), 2002.
[3] E. P. Markatos. "Tracing a Large-Scale Peer-to-Peer System: An Hour in the Life of Gnutella". 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002.
[4] Y. Chawathe, S. Ratnasamy, L. Breslau, S. Shenker. "Making Gnutella-like P2P Systems Scalable". Proc. ACM SIGCOMM, August 2003.
[5] Q. Lv, P. Cao, E. Cohen, K. Li, S. Shenker. "Search and Replication in Unstructured Peer-to-Peer Networks". ICS 2002: 84-95.

Gnutella: Bandwidth Barriers
• Clip2 measured Gnutella over 1 month:
  – a typical query is 560 bits long (including TCP/IP headers)
  – 25% of the traffic is queries, 50% pings, 25% other
  – on average each peer seems to have 3 other peers actively connected
• Clip2 found a scalability barrier, with substantial performance degradation once queries/sec > 10:
  10 queries/sec × 560 bits/query × 4 (to account for the other three quarters of message traffic) × 3 simultaneous connections = 67,200 bps
• 10 queries/sec is the maximum in the presence of many dialup users
• This won't improve: more bandwidth brings larger files

Gnutella: Summary
• Completely decentralized
• Hit rates are high
• High fault tolerance
• Adapts well and dynamically to changing peer populations
• Protocol causes high network traffic (e.g., 3.5 Mbps). For example, with C = 4 connections per peer and TTL = 7, one ping packet can cause 2 · Σ_{i=0..TTL} C·(C−1)^i = 26,240 packets
• No estimates of the duration of queries can be given
• No probability of a successful query can be given
• Topology is unknown, so algorithms cannot exploit it
• Free riding is a problem
• Reputation of peers is not addressed
• Simple, robust, and scalable (at the moment)
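A quick sanity check of the two back-of-the-envelope numbers above (the helper function is a hypothetical name for this sketch):

```python
def ping_packets(c, ttl):
    """One ping fans out over C connections, each forwarding to C-1
    others for TTL hops; every ping also triggers a pong (factor 2)."""
    return 2 * sum(c * (c - 1) ** i for i in range(ttl + 1))

assert ping_packets(c=4, ttl=7) == 26_240

# Clip2's barrier: 560-bit queries are 25% of traffic (hence the factor 4),
# sent over 3 actively connected peers, at the 10 queries/sec threshold
assert 10 * 560 * 4 * 3 == 67_200   # bps, roughly a dialup modem's capacity
```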
Hierarchical Networks (& Queries)
• DNS
  – hierarchical name space ("clients" + hierarchy of servers)
  – hierarchical routing with aggressive caching
  – 13 managed "root servers"
• Traditional pros/cons of hierarchical data management
  – works well for things aligned with the hierarchy (esp. physical locality)
  – inflexible: no data independence!

Commercial Offerings
• JXTA
  – Java/XML framework for p2p applications
  – name resolution and routing are done with floods & superpeers (you can always add your own if you like)
• MS WinXP p2p networking
  – an unstructured overlay, flooded publication and caching
  – "does not yet support distributed searches"
• Both have some security support
  – authentication via signatures (assumes a trusted authority)
  – encryption of traffic

Lessons and Limitations
• Client-server performs well, but is not always feasible
• Ideal performance is often not the key issue!
• Things that flood-based systems do well:
  – organic scaling
  – decentralization of visibility and liability
  – finding popular stuff (e.g., caching)
  – fancy local queries
• Things that flood-based systems do poorly:
  – finding unpopular stuff [Loo et al., VLDB 04]
  – fancy distributed queries
  – vulnerabilities: data poisoning, tracking, etc.
  – guarantees about anything (answer quality, privacy, etc.)

Summary and Comparison of Approaches

System    | Paradigm                      | Search Type       | Search Cost (messages)        | Autonomy
Gnutella  | breadth-first search on graph | string comparison | 2 · Σ_{i=0..TTL} C·(C−1)^i    | very high
FreeNet   | depth-first search on graph   | string comparison | O(log n) ?                    | very high
Chord     | implicit binary search trees  | equality          | O(log n)                      | restricted
CAN       | d-dimensional space           | equality          | O(d · n^(1/d))                | high
P-Grid    | binary prefix trees           | prefix            | O(log n)                      | high

More on Search
Search options:
• query expressiveness (type of queries)
• comprehensiveness (all results, or just the first k)
• topology
• data placement
• message routing

Comparison

                 | Gnutella  | CAN
Topology         | power law | grid
Data placement   | arbitrary | hashing
Message routing  | flooding  | directed

(Expressiveness, comprehensiveness, autonomy, efficiency, robustness, and other criteria are left as discussion points.)

Parallel Clusters
(figure: parallel clusters; links out of these clusters are not shown; search at only a fraction of the nodes!)

Other Open Problems besides Search: Security
• availability (e.g., coping with DoS attacks)
• authenticity
• anonymity
• access control (e.g., IP protection, payments, ...)

Trustworthy P2P
• Many challenges here. Examples:
  – authenticating peers
  – authenticating/validating data, both stored (poisoning) and in flight
  – ensuring communication
  – validating distributed computations
  – avoiding denial of service; ensuring fair resource/work allocation
  – ensuring privacy of messages: content, quantity, source, destination

Authenticity
(figure: a record with title: origin of species, author: charles darwin, date: 1859, body: "In an island far, far away ..."; is it authentic?)

More than Just File Integrity
(figure: the same record extended with a checksum field; integrity of the bytes alone does not establish authenticity)

More than Fetching One File
(figure: several candidate versions of the same record, T=origin, A=darwin, with Y=?, Y=1800, Y=1859, and one version with an extra field B=abcd, from which the requestor must pick the authentic one)
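The checksum idea above in a few lines, using Python's standard hashlib. The record layout is the toy example from the slides, and as the comment notes, a checksum alone does not solve authenticity:

```python
import hashlib, json

def record_digest(record):
    """Canonical checksum over metadata + body. This detects accidental or
    in-flight corruption, but a forger can simply recompute the checksum
    for a doctored record: integrity alone is not authenticity."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

doc = {"title": "origin of species", "author": "charles darwin",
       "date": 1859, "body": "In an island far, far away ..."}
print(record_digest(doc)[:16], "...")
```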
Solutions
• Authenticity function A(doc): T or F
  – computed at expert sites, or at all sites?
  – can use signatures: expert → sig(doc) → user
• Voting based: authentic is what the majority says
• Time based: e.g., the oldest (available) version is authentic

Added Challenge: Efficiency
• Example: current music sharing
  – everyone has the authenticity function
  – but downloading files is expensive
• Solution: track peer behavior
(figure: good peers, and a bad peer to be avoided)

Issues
• Trust computations in a dynamic system
• Overloading good nodes
• Bad nodes can provide good content sometimes
• Bad nodes can build up reputation
• Bad nodes can form collectives
• ...

Security & Privacy
• Issues:
  – anonymity
  – reputation
  – accountability
  – information preservation
  – information quality
  – trust
  – denial-of-service attacks

Blind Search Methods
Modified-BFS: forward the query only to a fraction of the neighbors (some random subset)
Iterative deepening: start BFS with a small TTL and repeat the BFS at increasing depths if the first BFS fails
• works well when there is some stop condition and a "small" flood will satisfy the query
• otherwise it causes even bigger loads than standard flooding (more later ...)

Blind Search Methods
Random walks: the node that poses the query sends out k query messages to an equal number of randomly chosen neighbors
• each query message follows its own path, at each step randomly choosing one neighbor to forward it to; each path is a "walker"
• two methods to terminate each walker: TTL-based, or the checking method (the walkers periodically check with the query source whether the stop condition has been met)
• reduces the number of messages to k × TTL in the worst case
• provides some kind of local load balancing

Blind Search Methods
Random walks: in addition, the protocol can bias its walks towards high-degree nodes (choose the highest-degree neighbor)

Blind Search Methods
Using super-nodes:
• super-peers (or ultrapeers) are connected to each other
• each super-peer is also connected to a number of leaf nodes
• routing happens among the super-peers; the super-peers then contact their leaf nodes

Blind Search Methods
Using super-nodes: Gnutella2
• when a super-peer (or hub) receives a query from a leaf, it forwards it to its relevant leaves and to neighboring super-peers
• the neighboring hubs process the query locally and forward it to their relevant leaves
• neighboring super-peers regularly exchange local repository tables to filter out traffic between them

Blind Search Methods
• Ultrapeers can be installed (KaZaA) or self-promoted (Gnutella)
• Interconnection between the super-peers
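A minimal sketch of the hub-side filtering just described. The class layout and the table representation are assumptions for illustration, not the Gnutella2 wire protocol (real hubs exchange compact query-routing tables and also need duplicate suppression and hop limits for cycles):

```python
class Hub:
    """A super-peer: holds its leaves' keyword tables plus the repository
    tables its neighbor hubs have exchanged with it."""

    def __init__(self):
        self.leaves = {}      # leaf id -> set of keywords shared there
        self.neighbors = []   # adjacent Hub objects
        self.tables = {}      # neighbor hub -> its exchanged keyword set

    def query(self, keyword, from_hub=None):
        # answer from local leaves first
        hits = [leaf for leaf, kws in self.leaves.items() if keyword in kws]
        # forward only to neighbor hubs whose table can match (traffic filter)
        for hub in self.neighbors:
            if hub is not from_hub and keyword in self.tables.get(hub, set()):
                hits += hub.query(keyword, from_hub=self)
        return hits

a, b = Hub(), Hub()
a.neighbors, b.neighbors = [b], [a]
b.leaves = {"leaf7": {"xyz"}}
a.tables = {b: {"xyz"}}        # b advertised that some leaf shares "xyz"
print(a.query("xyz"))          # -> ['leaf7']
```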
Informed Search Methods
Local Indices
• each node indexes all files stored at all nodes within a certain radius r, and can answer queries on their behalf
• the search visits nodes at specific depths; the hop distance between two consecutive search levels is 2r+1
• increased cost for join/leave: flood within radius r (TTL = r) whenever a node joins or leaves the network

Informed Search Methods
Intelligent BFS
• each node stores simple statistics about its neighbors: (query, neighborID) tuples for recently answered requests from or through those neighbors, so that it can rank them
• for each query, a node finds similar past queries and selects a direction; but how?

Informed Search Methods
Intelligent or Directed BFS: heuristics for selecting a direction
• >RES: returned the most results for previous queries
• <TIME: shortest satisfaction time
• <HOPS: minimum hops to results
• >MSG: forwarded the largest number of messages (of all types), suggesting that the neighbor is stable
• <QLEN: shortest queue
• <LAT: shortest latency
• >DEG: highest degree

Informed Search Methods
Intelligent or Directed BFS
• no negative feedback
• depends on the assumption that nodes specialize in certain documents

Informed Search Methods
APS
• again, each node keeps a local index with one entry for each object it has requested, per neighbor; the entry reflects the relative probability of that neighbor being chosen to forward the query
• k independent walkers with probabilistic forwarding: each node forwards the query to one of its neighbors based on the local index (for each object, choose a neighbor using the stored probabilities)
• if a walker succeeds, the probability is increased, else it is decreased: take the reverse path to the requestor and update the probabilities, either after a walker miss (optimistic update) or after a hit (pessimistic update)

Topics in Database Systems: Data Management in Peer-to-Peer Systems
Q. Lv et al., "Search and Replication in Unstructured Peer-to-Peer Networks", ICS'02

Search and Replication in Unstructured Peer-to-Peer Networks
• the type of replication depends on the search strategy used
• (i) a number of blind-search variations of flooding
• (ii) a number of (metadata) replication strategies
• evaluation method: study how they work for a number of different topologies and query distributions

Methodology
Performance of search depends on three aspects of a p2p system:
• network topology: the graph formed by the p2p overlay network
• query distribution: the distribution of query frequencies for individual files
• replication: the number of nodes that have a particular file
Assumption: fixed network topology and fixed query distribution. The results still hold if one assumes that the time to complete a search is short compared to the time scale of changes in the network topology and the query distribution.

Network Topology
(1) Power-Law Random Graph (PLRG): a 9239-node random graph; node degrees follow a power-law distribution: when ranked from the most connected to the least connected, the i-th ranked node has ω/i^a neighbors, where ω is a constant; once the node degrees are chosen, the nodes are connected randomly
(2) Normal Random Graph (Random): a 9836-node random graph
(3) Gnutella Graph (Gnutella): a 4736-node graph obtained in Oct 2000; the node degree distribution roughly follows a two-segment power law
(4) Two-Dimensional Grid (Grid): a two-dimensional 100x100 grid

Query Distribution
Assume m objects. Let q_i be the relative popularity of the i-th object (in terms of queries issued for it). Values are normalized: Σ_{i=1..m} q_i = 1
(1) Uniform: all objects are equally popular, q_i = 1/m
(2) Zipf-like: q_i ∝ 1/i^α

Replication
Each object i is replicated on r_i nodes, and the total number of objects stored is R, that is Σ_{i=1..m} r_i = R
(1) Uniform: all objects are replicated at the same number of nodes, r_i = R/m
(2) Proportional: the replication of an object is proportional to its query probability, r_i ∝ q_i
(3) Square-root: the replication of object i is proportional to the square root of its query probability, r_i ∝ √q_i
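The query and replication distributions above can be written down directly. The sketch below returns fractional replica counts, an idealization of the paper's integer placements:

```python
import math

def zipf_popularity(m, alpha):
    """q_i proportional to 1/i**alpha, normalized so sum(q) == 1."""
    raw = [1 / i**alpha for i in range(1, m + 1)]
    s = sum(raw)
    return [x / s for x in raw]

def replication(q, R, strategy):
    """r_i per strategy, normalized so sum(r) == R."""
    if strategy == "uniform":
        w = [1.0] * len(q)
    elif strategy == "proportional":
        w = list(q)
    elif strategy == "square-root":
        w = [math.sqrt(x) for x in q]
    else:
        raise ValueError(strategy)
    s = sum(w)
    return [R * x / s for x in w]

q = zipf_popularity(m=100, alpha=1.0)
print(replication(q, R=1000, strategy="square-root")[:3])
```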
Query Distribution & Replication
• When the replication is uniform, the query distribution is irrelevant: since all objects are replicated by the same amount, search times are equivalent for both hot and cold items
• When the query distribution is uniform, all three replication distributions are equivalent (uniform!)
• Thus, there are three relevant query-distribution/replication combinations:
  (1) Uniform/Uniform
  (2) Zipf-like/Proportional
  (3) Zipf-like/Square-root

Metrics
• Pr(success): probability of finding the queried object before the search terminates
• #hops: delay in finding an object, measured in number of hops
• #msgs per node: overhead of an algorithm, measured as the average number of search messages each node in the p2p network has to process
• #nodes visited
• percentage of message duplication
• peak #msgs: the number of messages that the busiest node has to process (to identify hot spots)
These are per-query measures; for an aggregated performance measure, each query is weighted by its probability.

Simulation Methodology
For each experiment:
• first select the topology and the query/replication distributions
• for each object i with replication r_i, generate numPlace different sets of random replica placements (each set contains r_i random nodes on which to place the replicas of object i)
• for each replica placement, randomly choose numQuery different nodes from which to initiate the query for object i
This yields numPlace × numQuery queries. In the paper, numPlace = 10 and numQuery = 100, i.e., 1000 different queries per object.

Limitation of Flooding: choice of TTL
• too low: the node may not find the object, even if it exists
• too high: burdens the network unnecessarily
• example: search for an object replicated at 0.125% of the nodes (~11 nodes out of ~9000)
• note that the right TTL depends on the topology; it also depends on the replication (which, however, is unknown)
(figure: overhead as a function of TTL; the overhead also depends on the topology)

Limitation of Flooding: duplicate messages
• there are many duplicate messages (due to cycles), particularly in highly connected graphs: multiple neighbors send copies of the same query to a node
• duplicate messages can be detected and not forwarded, BUT the number of duplicate messages can still be excessive, and it worsens as the TTL increases
(figure: message load at different nodes)

Limitation of Flooding: comparison of the topologies
• power-law and Gnutella-style graphs are particularly bad for flooding: highly connected nodes mean more duplicated messages, because many nodes' neighbor sets overlap
• the random graph is best, because in a truly random graph the duplication ratio (the likelihood that the next node has already received the query) equals the fraction of nodes visited so far, as long as that fraction is small; the random graph also gives a better load distribution among nodes

Two New Blind Search Strategies
1. Expanding ring (iterative deepening): not a fixed TTL
2. Random walks (more details below): reduce the number of duplicate messages
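A sketch of the expanding-ring policy (TTL = 1, 3, 5, ... as on the next slide), reusing the flood_search sketch from earlier. Unlike the real protocol, each retry here re-floods from scratch rather than having nodes remember query IDs, so it slightly overcounts messages:

```python
def expanding_ring(graph, source, target, objects,
                   start_ttl=1, step=2, max_ttl=9):
    """Successive floods with growing TTL, stopping at the first hit.
    Assumes flood_search(graph, source, target, objects, ttl) from the
    earlier sketch is in scope."""
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        hits, msgs = flood_search(graph, source, target, objects, ttl)
        total += msgs
        if hits:                     # stop as soon as some flood succeeds
            return hits, total
        ttl += step                  # otherwise retry with a larger TTL
    return set(), total
```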
Expanding Ring or Iterative Deepening
• note that since flooding queries nodes in parallel, the search may not stop even when the object has been located
• use successive floods with increasing TTL: a node starts a flood with a small TTL; if the search is not successful, the node increases the TTL and starts another flood; the process repeats until the object is found
• works well when hot objects are replicated more widely than cold objects

Expanding Ring or Iterative Deepening (details)
Need to define:
• a policy: at which depths the successive iterations occur (i.e., the successive TTLs)
• a time period W between successive iterations: after waiting for a period W without receiving a positive response (i.e., the requested object), the query initiator resends the query with a larger TTL
• nodes maintain the IDs of seen queries for W + ε; a node that receives the same message as in a previous round does not process it, it just forwards it

Expanding Ring
• start with TTL = 1 and increase it by a step of 2 at each iteration
• for replication over 10%, the search stops at TTL 1 or 2

Expanding Ring
Comparison of message overhead between flooding and expanding ring: even for objects replicated at 0.125% of the nodes, and even if flooding uses the best TTL for each topology, expanding ring still halves the per-node message overhead.

Expanding Ring
• the improvement is more pronounced for the Random and Gnutella graphs than for the PLRG, partly because the very-high-degree nodes in the PLRG reduce the opportunity for incremental retries in the expanding ring
• introduces a slight increase in the delay of finding an object: from 2-4 hops with flooding to 3-6 with expanding ring

Random Walks
• forward the query to one randomly chosen neighbor at each step; each message is a "walker"
• k-walkers: the requesting node sends k query messages, and each query message takes its own random walk
• k walkers after T steps should reach roughly the same number of nodes as 1 walker after kT steps, so this cuts the delay by a factor of k
• 16 to 64 walkers give good results

Random Walks
When to terminate the walks:
• TTL-based, or
• checking: the walker periodically checks with the original requestor before walking to the next node (it still uses a (larger) TTL, just to prevent loops)
Experiments show that checking once every 4th step strikes a good balance between the overhead of the checking messages and the benefits of checking.

Random Walks
• compared to flooding: a 32-walker random walk reduces message overhead by roughly two orders of magnitude for all queries across all network topologies, at the expense of a slight increase in the number of hops (from 2-6 to 4-15)
• compared to expanding ring: the 32-walker random walk outperforms expanding ring as well, particularly on the PLRG and Gnutella graphs

Random Walks: keeping state
• each query has a unique ID, and its k walkers are tagged with this ID
• for each ID, a node remembers the neighbors to which it has already forwarded the query; when a new walker with the same ID arrives, the node forwards it to a different (randomly chosen) neighbor
• improves Random and Grid by reducing the message overhead by up to 30% and the number of hops by up to 30%; small improvements for Gnutella and the PLRG
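A sketch of the k-walker random walk with the checking method. The defaults (32 walkers, a check every 4th step) follow the paper; the termination mechanics are simplified assumptions (walkers here learn of a hit only at their next check-in):

```python
import random

def k_walker_search(graph, source, target, objects, k=32,
                    max_steps=1024, check_every=4, seed=0):
    """k independent random walkers with periodic checking. The large
    max_steps bound only guards against endless loops; in practice the
    walk ends when a check reveals that some walker has already hit."""
    rng = random.Random(seed)
    walkers = [source] * k
    found, messages = False, 0
    for step in range(1, max_steps + 1):
        for i, node in enumerate(walkers):
            nxt = rng.choice(graph[node])   # forward to one random neighbor
            messages += 1
            if target in objects.get(nxt, set()):
                found = True                # walkers only learn of this ...
            walkers[i] = nxt
        if found and step % check_every == 0:
            break                           # ... at their next check-in
    return found, messages
```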
Principles of Search
• Adaptive termination is very important: use expanding ring or the checking method
• Message duplication should be minimized: preferably, each query should visit a node just once
• The granularity of the coverage should be small: each additional step should not significantly increase the number of nodes visited

Replication
Next time.