Content-Based Search

Search in Distributed Networks

Lecture: Peer-to-peer networks
Professor: Dr. Robert Tolksdorf

Elena Antonenko (elena.Antonenko@web.de)
Malte Münchert (muencher@inf.fu-berlin.de)
Jing Zhao (zhao@inf.fu-berlin.de)
Shunfeng Zhang (zhang@inf.fu-berlin.de)

Language of the talk
• English instead of German.
• Comment: German is also a very beautiful language!
• Questions can be asked in German.

Structure of our talk
• Introduction
• Content-Agnostic Search (Shunfeng)
• Content-Based Search (Elena)
• Pastry (Malte)
• JXTA Search (Jing)

Introduction
• Most applications (file sharing, instant messaging, chatting) involve finding objects and resources of interest and exchanging resources with other peers.
• This is accomplished by a system of advertisements and queries.

Introduction: Advertisement/query model
• Resource providers publish resources and resource consumers send search queries; or
• Resource seekers advertise needs on the network and resource providers query the network for resources.

Introduction
• The problem reduces to advertisement consumers querying a dynamic and distributed directory of advertisements.
• The distributed directory is built using a subset of all the peers in the network.

Content-Agnostic Search >>> Basic concept
• The organization of the peers does not depend on the resources they index or point to.

Content-Agnostic Search
• Central mediator networks
• Networks forming random connected graphs
• Networks with regular structure

Content-Agnostic Search >>> Central mediator
• Peers register content with the central server.
• Peers query the central server for information.
• Roles of the central server: matchmaker or broker.

Content-Agnostic Search >>> Central mediator as Matchmaker
[Figure: matchmaker between Peer and Requester; message labels: Advertise, Unadvertise, ASK-ALL: "who can help?", Reply: name1 + info1…]

Content-Agnostic Search >>> Central mediator as Matchmaker
• Requester: an agent with an objective that it wants achieved by some other agent.
• Matchmaker: an agent that knows the names of many agents and their corresponding capabilities.
• Server: an agent that has committed itself to fulfilling objectives on behalf of other agents.

Content-Agnostic Search >>> Central mediator as Broker
[Figure: broker between Peer and Requester; message labels: Advertise, Unadvertise, STREAM-ALL: "Request", REPLY]

Content-Agnostic Search >>> Central mediator as Broker
• Requester: an agent that has an objective that it wants achieved by another agent.
• Broker: an agent that knows the names of some other agents and their corresponding capabilities, and advertises its own capabilities as some function of the capabilities of these other agents.
• Brokered server: an agent that has committed to the broker to take on a predetermined class of objectives.

Content-Agnostic Search >>> Central mediator
Advantages
• Comprehensive
• Fast updates
• Minimal message exchange
Disadvantages
• Central point of failure
• Not scalable
• Needs a central authority
Comment: these can be solved with decentralized mediators.

Content-Agnostic Search >>> Networks forming random connected graphs
• Nodes are connected to a few random neighbors.
• Example: the Gnutella network (already covered in the 2nd talk of the lecture).
• Power law networks: the search takes advantage of the power-law link distribution of naturally occurring networks.

Content-Agnostic Search >>> Power Law Networks
• Power-law distribution: a few nodes have very high connectivity; many nodes have very low connectivity.

Content-Agnostic Search >>> Power Law Networks
• Rule: each time, one node with two edges is added; the edges connect to nodes with higher degree.

Content-Agnostic Search >>> Power Law Networks
• Power law
graphs are dynamically constructed.
• The rewiring of nodes does not occur randomly; nodes attach preferentially to the most connected nodes.

Content-Agnostic Search >>> Power Law Networks
• The power-law search algorithm needs a modification to the basic Gnutella approach.

Content-Agnostic Search >>> Power Law Networks
The Gnutella approach
• Broadcasts to all neighbors
Modified Gnutella
• Queries go to the neighbor with the highest number of connections
• Can exchange with every neighbor
• Exchanges with the first- and second-degree neighbors

Content-Agnostic Search >>> Power Law Networks
Advantages of PLN
• Networks of decentralized mediators
• Broadcasting queries to all neighbors is avoided
• Search cost is reduced

Content-Based Search: Introduction
• The content of queries is used to route messages efficiently to the most relevant peers.
• Search techniques include:
  • Content-mapping networks
  • Some variations of publish/subscribe networks

Content-Based Search: Content-Mapping Search Networks
• Every peer in the network indexes a "zone" of the advertisement space.
• The zone is dynamic.
• The size of the zone depends on the number of peers.
• Peers map advertisement content to the space.
• Mapping is performed using hash functions.
• Examples include CAN, Chord, Tapestry, and Pastry.

Content-Based Search: Distributed Hash Table (DHT)
• A DHT provides the same functionality as a traditional hash table.
• A DHT stores key-value pairs.
• The data structure is distributed over different nodes.
• It provides the functions insert(id, item) and item = query(id).
• An item can be anything: a data object, document, file, or pointer to a file.

Content-Based Search: Content Addressable Network (CAN)
• CAN is based on a virtual d-dimensional coordinate space.
• Each node and item is associated with a unique id in the d-dimensional space.
• Goals:
  • Scale to hundreds of thousands of nodes
  • Handle rapid arrival and failure of nodes

Content-Based Search: CAN Example: Two-Dimensional Space
• The space is divided between the nodes.
• Together, the nodes cover the entire space.
• Each node
covers either a square or a rectangular area.
• Example: node n1:(1, 2) is the first node that joins and covers the entire space.
• Node n2:(4, 2) joins; the space is divided between n1 and n2.
• Node n3:(3, 5) joins too.
• Nodes n4:(5, 5) and n5:(6, 6) join.
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 0); f3:(2, 1); f4:(7, 5)
• Each item is stored by the node that owns its mapping in the space.

Content-Based Search: CAN: Query Example
• Each node knows its neighbors in the d-dimensional space.
• A query is forwarded to the neighbor that is closest to the query id.
• Example: assume n1 queries f4.

Content-Based Search: CAN Routing
• For d dimensions with n equal zones, each node has 2d neighbors.
• Routing table size: O(d)
• A file is guaranteed to be found in at most d × n^(1/d) steps, where n is the total number of nodes.
• Algorithm: choose the neighbor nearest to the destination.

Content-Based Search: CAN: Multi-Dimension
• Increasing the number of dimensions reduces the path length.

Content-Based Search: Chord: Introduction
• Chord is a distributed lookup protocol.
• Given a key (data item), it maps the key onto a node (peer).
• A hash function assigns each node and key an m-bit identifier.
• A node's identifier is defined by hashing the node's IP address.
• A key identifier is produced by hashing the key.
• ID(node) = hash(196.178.0.1)
• ID(key) = hash("jingle-bells.mp3")

Content-Based Search: Chord: Data Structure
• Identifiers are ordered in a virtual ring of size 2^m.
• Each node maintains:
  • A finger table: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i, that is, successor(id) with id = n + 2^i
  • Its predecessor node
• An item identified by id is stored on the successor node of id.

Content-Based Search: Chord: Example
• Assume an identifier space 0..7.
• Node n1:(1) joins; all entries in its finger table are initialized to itself.
• Nodes n2:(2), n0:(0), and n6:(6) join.
• Nodes: n0:(0), n1:(1), n2:(2), n6:(6)
• Items: f1:(1), f7:(7)
• Upon receiving a query for item id, a node:
  • checks whether it stores the item locally;
  • if not, forwards the query to the largest node in its successor table that does not exceed id.

Content-Based Search: Chord: Properties
• Routing table size: O(log N), where N is the total number of nodes
• A file is guaranteed to be found in O(log N) steps.

Pastry: Introduction
• Decentralized and scalable DHT network
• Designed for efficient message routing between nodes

What does DHT mean?
• Distributed Hash Table
• There is a hash value for every peer.
• Every peer has knowledge of some other peers (stored in a hash table).
• Together, the hash tables of all peers represent a complete map of all peers.

Peer    Hash          IP
Peer1   7cb3e8f0a     217.4.87.4
        8aa59047f     67.9.21.7
        0a5b4765c     212.90.1.44
Peer2   d1a8d54f35    19.1.27.2
        85ac7542ba    40.92.4.120
Peer3   ...           ...
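Chord, like Pastry, places hashed identifiers on a ring. The Chord example above (identifier space 0..7, nodes 0, 1, 2, 6, items 1 and 7) can be checked with a minimal Python sketch. It assumes a global, static view of the ring, which a real Chord node does not have; the names M, RING, NODES, successor, and finger_table are illustrative only and do not come from any Chord implementation.

```python
M = 3                        # identifier bits, so the ring has 2**M = 8 positions
RING = 2 ** M
NODES = sorted([0, 1, 2, 6])  # node identifiers from the example (assumed static)


def successor(ident):
    """Return the first node whose identifier equals or follows ident on the ring."""
    ident %= RING
    for node in NODES:
        if node >= ident:
            return node
    return NODES[0]          # wrap around past the largest identifier


def finger_table(n):
    """Entry i points at successor(n + 2**i) for i = 0 .. M-1."""
    return [successor(n + 2 ** i) for i in range(M)]


if __name__ == "__main__":
    for n in NODES:
        print("node", n, "finger table", finger_table(n))
    # An item is stored on the successor node of its identifier.
    print("f1 is stored on node", successor(1))
    print("f7 is stored on node", successor(7))
```

Under these assumptions item f1 lands on node 1 and item f7 wraps around the ring to node 0, and the finger table of node 0 is [1, 2, 6].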
The Pastry namespace
• Peers reside on a virtual circle made up of all possible addresses.
[Figure: address circle with labels 2^0 and 2^128; blue points represent peers]

Pastry routing
• A message is sent to the (known) node that is numerically closest to the target node.
• The procedure is repeated until the target node is reached.
[Figure labels: Origin, Closest to target, Distance, Destination]

Prefix match
• A method to estimate the difference between two keys/addresses.
• The prefix match is the number of equal digits before the first difference.

Key1         Key2         Prefix match
20b28a0d18   20b98a50f7   |20b| = 3
1f8319b020   712a650fa4   || = 0

Routing table for node 1234 (example)
The prefix match increases from top to bottom; "-" marks the slot of the node's own digit.

Prefix   Next digit 0   1      2      3
(none)   03f3           -      20d3   3238
1...     100a           1127   -      1339
12..     1207           1210   122d   -
123..    1230           1231   1232   1233
Leaf set
Example leaf set with l = 6:
• l/2 numerically closest smaller nodes: 1209, 121f, 1230
• Our node: 1234
• l/2 numerically closest larger nodes: 123a, 1270, 12ac

Utilized structures (summary)
• The routing table has a tree structure.
• The "leaf set" table lists the numerically closest neighbors.

Routing algorithm
• If the target node is part of the leaf set, the message is sent to it directly.
• Otherwise, the routing table is checked for a node with a greater prefix match than our node.
• If there is still no target available, the leaf set is queried for a node that is numerically closer but has the same prefix match as our node.

Routing algorithm (demonstration)
• Example node: 1234
• Leaf set: smaller (<) 120a, 1221; larger (>) 125f, 1297
• Routing table rows: 03f3, 20d3, 3238 / 100a, 1127, 1339 / 1207, 1210, 122d / 1230, 1231, 1232, 1233
• Message to 1203

Pastry: Routing (2)
• If the node the message is sent to is not the target node, these steps are repeated.
• The prefix match increases with every node the message travels through.
• O(log_16 N) steps (usually 5-7; at most 32 nodes to reach the target)

Outline
• Introduction to JXTA Search
• Architecture and Components
• Query Routing Protocol (QRP)
• Query Resolution
• Platform Bindings

Introduction to JXTA Search
• Originally developed by Infrasearch, which was acquired by Sun in March 2001.
• Defines an XML protocol that enables search in P2P networks.
• Open source.
• Supports "wide search" and "deep search".

JXTA Search: "Wide Search" and "Deep Search"
• Wide search of distributed devices, such as PCs, handhelds, and cell phones
• Deep search of rich content sources, such as Web servers

JXTA Search: Participants
Three participants:
• JXTA Search information providers
• JXTA Search consumers
• JXTA Search hub
Consumer applications send requests to the JXTA Search network via the nearest JXTA Search hub.
The hub determines which of the known providers should receive the query based on provider metadata. The hub sends the requests to providers, receives responses, and sends responses back to consumers.
The QRP enables participants in the network to exchange information in a seamless manner without having to understand the structure of their presentation layers.

Architecture and Components
The JXTA Search network architecture consists of the following components:
• Provider Service
• Consumer Service
• Registration Service
• Hub Service
• Message Flow
[Figure: The JXTA Search network architecture]

JXTA Search Hub Service
The JXTA Search hub service consists of two subcomponents, the router and the resolver. At the heart of JXTA Search is the "router/resolver".
• JXTA Search router: routes and manages query connections; collates results and returns them to consumers.
• JXTA Search resolver: maintains an index of providers' registrations; when a query is received, matches the query against a set of providers that may be good at answering it.

Architecture and Components: Distributed Search
• Central to the JXTA Search infrastructure are "hubs".
• Each hub has a series of providers that form its local network.
• These providers typically have something in common.
• Hubs are expected to become an efficient way to group peers with similar content, geography, or queryspaces.

Query Routing Protocol (QRP)
• Query messages
• Response messages
• Registration messages

Queryspaces
• Providers may have widely differing types of content or resources in their datastores.
• The notion of queryspaces makes it possible to define the structure of a query and its associated registration.
• Queryspaces are a fundamental component of the JXTA Search framework.
• Like XML namespaces, queryspaces are identified by unique URIs.

QRP: Query Messages
Query messages are structured as follows:
• The default namespace is http://search.jxta.org.
• The query message is contained within the envelope <request>...</request>.
• The unique query ID is specified in the uuid attribute of the <request> tag.
• The queryspace is specified in the query-space attribute of the <request> tag.
• The query data can be arbitrary XML within a namespace that matches http://search.jxta.org/text, which includes the tag <query> to mark the start of the actual query data and the tag <text> to specify free text, or within any other namespace specified by the queryspace definition.
The example is a simple text query for the term JXTA in the http://search.jxta.org/text queryspace.
[Figure: Example query message]

QRP: Response Messages
The response message is structured as follows:
• The default namespace is http://search.jxta.org.
• The response message is enveloped within the <responses>...</responses> tags, with each specific response enveloped in <response>...</response> tags.
• The body of the response is contained within the <data>...</data> tags. It can be arbitrary well-formed XML.
The example is a response to a query answered by a JXTA peer running a stock quote service.
[Figure: Example response message]

QRP: Registration Messages
Information providers must register with the JXTA Search network. To register, a provider contacts an access point with a registration message, an XML document with three components:
• A queryspace URL: when queries are posted to this queryspace, the provider's predicates are checked for matches.
• A set of predicates. The predicates define the structure and content of the queries in which the provider is interested.
• The provider's query server endpoint, either a JXTA pipe ID or a URL. Queries that match one of the provider's predicates are posted to this endpoint.
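The three registration components above can be mimicked in a toy resolver. The following Python sketch is an illustration only, not the JXTA Search API or its XML wire format: the Registration class, the matching_endpoints function, the provider URLs, and the urn:example:stock-quotes queryspace are all hypothetical, and predicates are simplified to plain sets of lower-case terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical in-memory stand-in for a provider registration."""
    queryspace: str        # URI identifying the queryspace
    predicates: frozenset  # simplified assumption: predicates are lower-case terms
    endpoint: str          # the provider's query server: a pipe ID or a URL


def matching_endpoints(registrations, queryspace, text):
    """Return endpoints whose queryspace equals the query's and whose terms overlap it."""
    words = set(text.lower().split())
    return [
        r.endpoint
        for r in registrations
        if r.queryspace == queryspace and words & r.predicates
    ]


# Made-up providers for illustration.
registrations = [
    Registration("http://search.jxta.org/text", frozenset({"jxta", "p2p"}),
                 "http://provider-a.example/query"),
    Registration("http://search.jxta.org/text", frozenset({"weather"}),
                 "http://provider-b.example/query"),
    Registration("urn:example:stock-quotes", frozenset({"jxta"}),
                 "http://provider-c.example/query"),
]

if __name__ == "__main__":
    print(matching_endpoints(registrations, "http://search.jxta.org/text", "JXTA search"))
```

In this sketch only provider A receives the text query for "JXTA search": provider B shares the queryspace but none of the terms, and provider C shares a term but is registered in a different queryspace.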
QRP: Registration Messages
[Figure: Example registration message, with the query server, the query space, and the predicate body labeled]

Query Resolution
• Queries are resolved by a resolver, which matches query terms to registration terms. Providers whose registration terms match the query terms are returned by the resolver.
• The task is to determine to which set of providers a given query should be routed. Sending all queries to all providers is inefficient; JXTA Search attempts greater efficiency:
  • Method 1: define a framework for providers to register the types of queries they are interested in receiving.
  • Method 2: provide an efficient query resolution and routing service.
• The minimal condition for matching a query to a provider is that the query must have the same query-space as the provider registration.

Bindings: JXTA Search over JXTA
JXTA Search is network and message-format agnostic. Two platform bindings are supported: JXTA and Web bindings.
JXTA Search over JXTA:
• Pipes are used as the communication mechanism. Query Request messages are accepted on an input pipe, and Query Response messages are returned on an output pipe specified inside the Query Request message.
• JXTA Search messages are embedded in JXTA messages:
  • Query request: request and response pipe
  • Query response: response
  • Registration: registration and response pipe

Bindings: JXTA Search over HTTP
• JXTA Search messages are transported via POST.
• There is a Web front end:
  • Aggregation of responses
  • Presentation of responses (raw HTML)
  • Query ranking
  • Provider signup facilities

JXTA Search: Summary
• A novel approach to query routing in distributed networks.
• Using a simple XML protocol combined with powerful but simple indexing and matching engines, it provides developers with the capability to connect multiple consumer and provider applications together for the purposes of information discovery and exchange.

References
[1] K. Decker, M. Williamson, and K. Sycara. "Matchmaking and brokering", Proceedings of the Second International Conference on Multi-Agent Systems (ICMAS-96), 1996.
[2] Clip2. "Gnutella: to the bandwidth barrier and beyond", http://www.clip2.com/gnutella.html, November 2000.
[3] Microsoft. "Pastry – Overview", http://www.research.microsoft.com/%7Eantr/PAST/overview.pdf
[4] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. "A scalable content-addressable network", ACM SIGCOMM, 2001.
[5] report, Sun Microsystems, Inc., 2001. http://search.jxta.org/JXTAsearch.pdf.
[6] Coderman. "Decentralized resource discovery in large peer based networks", http://www.cubicmetercrystal.com/alpine/discovery.html
