Towards Taxonomy-based Routing in P2P Networks
Alexander Löser
Advisor: 許子衝
Student: 羅英辰
Student ID: M97G0216

Introduction (1)
The development of smart, scalable approaches for the discovery and location of data sources in distributed heterogeneous information systems is an important problem in many scientific and commercial domains.
In the e-learning domain, a large number of digital e-learning repositories have been built in recent years.

Introduction (2)

Super-Peer based Architecture (1)
A super-peer is a node that acts as a centralized server to a subset of clients, e.g. information providers and information consumers.
Super-peers are also connected to each other, as peers in a pure P2P system are (Figure 2). They route messages over this overlay network, and submit and answer queries on behalf of their clients and themselves.

Super-Peer based Architecture (2)

Models and Queries (1)
Each peer is classified by paths in one or more taxonomies and publishes a model, based on semi-structured XML data, that contains the taxonomies and paths.
The Open Directory Project (ODP) is an open content directory of websites, also known as DMOZ (from its original domain name, directory.mozilla.org).

Models and Queries (2)

Models and Queries (3)
To look up peer models, we use a subset of the XPath (XML Path) language.

Distributed Hash Tables (DHT)

Indexing Peer Models and Taxonomies in a DHT (1)
Models are indexed in a catalog based on a Distributed Hash Table.
The catalog is distributed across the super-peer (SP-SP) network.
Consider the model with PID=E.

Indexing Peer Models and Taxonomies in a DHT (2)
Use SHA-1 (Secure Hash Algorithm).

Indexing Peer Models and Taxonomies in a DHT (3)
SUCC (successor)

Indexing Peer Models and Taxonomies in a DHT (4)

CHORD protocol

Indexing Peer Models and Taxonomies in a DHT (5)
Keys are stored clockwise at the closest node with the next higher hash value.

Lookup Models in a DHT (1)
Exact Lookups
BFS-based Lookups
Conjunctive Lookup

Lookup Models in a DHT (2)
Exact Lookups
Figure 4.
q1: /Computers/Programming/Languages/Java.
The taxonomy path of the query is hashed to $EA66, and then a lookup on the Chord ring is executed.
The result of the lookup is the set of PIDs of the peers storing models with this classification path, e.g. the peers with the PIDs D, E, and F.

Lookup Models in a DHT (3)
BFS-based Lookups

Lookup Models in a DHT (4)
Conjunctive Lookup
Ex: Figure 4, q3

Storage Load Balancing Strategies

Implementation and Evaluation (1)
50 super-peers
15,000 peers
Join and leave within 3600 s
Without load balancing (-VS-LBM)
Virtual server (+VS)
Partition-based load balancing (+LBM)
Combination of partition-based load balancing and virtual server (+LBM+VS)

Implementation and Evaluation (2)
Our load balancing approach performs better than the virtual server approach (+VS) and the simulation without any load balancing (-VS-LBM).
These results are valid for a small super-peer network, such as the one simulated in our experiment.
In our approach, we are only able to reduce the number of taxonomy paths a super-peer is responsible for.

Implementation and Evaluation (3)
Each peer issues an exact query for a taxonomy path every 240 s.
The average bandwidth each super-peer requires for serving queries and for handling joining and leaving peers is 25 KByte/s.

Implementation and Evaluation (4)
Figure 10 shows the costs of our storage load balancing approach in two cases: joining and leaving peer nodes only (J/L), and issuing queries together with joining and leaving peer nodes (J/L + Query).

Implementation and Evaluation (5)
J/L: Join and Leave

Summary and Further Work
We presented a completely new approach for enabling efficient semantic query routing in P2P networks.
Much work remains, for example dynamic storage load balancing strategies that allow super-peers to join and leave the catalog at a high frequency while the catalog remains robust.
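As an illustration of the indexing and lookup slides, the following is a minimal Python sketch of how taxonomy paths might be hashed with SHA-1, placed on a Chord-style ring at the successor node, and then queried with exact and conjunctive lookups. It is not the authors' implementation. The names `ring_id`, `successor`, and `Catalog`, the 16-bit ring size (chosen only so that keys resemble the slide's $EA66), the numeric super-peer identifiers, and the second taxonomy path are all assumptions made for this example. A real Chord ring routes through finger tables, whereas this sketch finds the successor by searching a locally known, sorted list of node identifiers.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumption: small ring so ids look like the slide's $EA66


def ring_id(text):
    """Hash a taxonomy path with SHA-1 and keep the top RING_BITS bits."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - RING_BITS)


def successor(node_ids, key):
    """Return the first node id >= key, wrapping around the ring.

    node_ids must be sorted in ascending order.
    """
    idx = bisect_left(node_ids, key)
    return node_ids[idx % len(node_ids)]


class Catalog:
    """Hypothetical catalog spread over a fixed set of super-peers."""

    def __init__(self, super_peer_ids):
        self.node_ids = sorted(super_peer_ids)
        # One bucket per super-peer: taxonomy path -> set of peer ids (PIDs).
        self.buckets = {nid: {} for nid in self.node_ids}

    def _bucket_for(self, path):
        return self.buckets[successor(self.node_ids, ring_id(path))]

    def publish(self, pid, taxonomy_paths):
        """Index a peer model under each of its taxonomy paths."""
        for path in taxonomy_paths:
            self._bucket_for(path).setdefault(path, set()).add(pid)

    def exact_lookup(self, path):
        """Return the PIDs classified under exactly this path."""
        return set(self._bucket_for(path).get(path, set()))

    def conjunctive_lookup(self, paths):
        """Return the PIDs classified under all of the given paths."""
        results = [self.exact_lookup(p) for p in paths]
        return set.intersection(*results) if results else set()


# Usage with made-up super-peer ids and a hypothetical second path.
catalog = Catalog([0x1000, 0x5000, 0x9000, 0xF000])
java = "/Computers/Programming/Languages/Java"
xml = "/Computers/Data_Formats/Markup_Languages/XML"  # hypothetical path
for pid in ("D", "E", "F"):
    catalog.publish(pid, [java])
catalog.publish("E", [xml])

print(sorted(catalog.exact_lookup(java)))               # ['D', 'E', 'F']
print(sorted(catalog.conjunctive_lookup([java, xml])))  # ['E']
```

Buckets are keyed by the full path rather than by the truncated hash, so two paths that collide on the small 16-bit ring still return separate result sets. The conjunctive lookup here is simply the intersection of several exact lookups, which is one plausible reading of the slides and not a confirmed description of the original algorithm.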