Content-Based Search

Search in Distributed Networks
Lecture: Peer-to-peer networks
Professor: Dr. Robert Tolksdorf
Elena Antonenko
Malte Münchert
Jing Zhao
Shunfeng Zhang
elena.Antonenko@web.de
muencher@inf.fu-berlin.de
zhao@inf.fu-berlin.de
zhang@inf.fu-berlin.de
Language of the talk:

English instead of German!

Comment: German is also a very beautiful
language!

Questions can be asked in German!
Structure of our talk:

Introduction

Content-Agnostic Search (Shunfeng);

Content-Based Search (Elena);

Pastry(Malte);

JXTA Search (Jing)
Introduction

Most applications (file sharing, instant messaging, chatting) involve
finding objects and resources of interest and
exchanging resources with other peers.


Accomplished by a system of advertisements
and queries
Introduction
Advertisement/query model:
Resource providers publish resources and resource consumers send
search queries;
Resource seekers advertise their needs on the network and resource
providers query the network for matching needs;
Introduction

The problem reduces to:
querying a dynamic and distributed directory of
advertisements by advertisement consumers


Distributed directory is built using a subset of
all the peers in the network
Content-Agnostic Search
>>>basic concept
The organization of the peers does not depend on
the resources they index or point to;
Content-Agnostic search
Central mediator networks
Networks forming random connected graphs
Networks with regular structure
Content-Agnostic Search
>>> central mediator

Register content with the central server;

Query the central server for Information;

Roles of central server:
Matchmaker
 Broker;

Content-Agnostic Search
>>> central mediator as Matchmaker
ASK-ALL: who can help?
Matchmaker
Reply: name1 +
info1…
Unadvertise
Advertise
Peer
Requester
Content-Agnostic Search
>>> central mediator as Matchmaker

Requester: an agent with an objective that it
wants to be achieved by some other agent.

Matchmaker: an agent that
knows the names of many agents
 and their corresponding capabilities.


Server: an agent that has committed itself to
fulfilling objectives on behalf of other agents.
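
The matchmaker role described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `Matchmaker`, the method names, and the peer and capability names are hypothetical and merely mirror the Advertise, Unadvertise and ASK-ALL messages on the slide.

```python
from collections import defaultdict


class Matchmaker:
    """Central mediator that only answers 'who can help?' (hypothetical API)."""

    def __init__(self):
        # capability -> set of provider names that advertised it
        self._providers = defaultdict(set)

    def advertise(self, name, capability):
        self._providers[capability].add(name)

    def unadvertise(self, name, capability):
        self._providers[capability].discard(name)

    def ask_all(self, capability):
        """Return the sorted names of all providers advertising the capability."""
        return sorted(self._providers[capability])


mm = Matchmaker()
mm.advertise("peer-a", "mp3-search")
mm.advertise("peer-b", "mp3-search")
mm.unadvertise("peer-a", "mp3-search")
# The requester receives names only and contacts the peers itself.
print(mm.ask_all("mp3-search"))
```

The matchmaker never touches the request itself; it only hands back names, which is the main difference to the broker role.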
Content-Agnostic Search
>>> central mediator as Matchmaker
Content-Agnostic Search
>>> central mediator as Broker
STREAM-ALL: „Request“
Broker
REPLY
Unadvertise
Advertise
Peer
Requester
Content-Agnostic Search
>>>central mediator as Broker

Requester: an agent that has an objective that the agent
wants to have achieved by another agent.

Broker:



an agent that knows the names of some other agents and their
corresponding capabilities,
and advertises its own capabilities as some function of the capabilities
of these other agents.
Brokered Server: an agent that has committed to the broker to
taking on a predetermined class of objectives.
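
As a rough sketch of the broker role, the mediator below forwards the request to a brokered server and passes the reply back, so the requester never learns who answered. The class `Broker`, its methods and the example capability are assumptions made for illustration.

```python
class Broker:
    """Central mediator that forwards requests itself (hypothetical API)."""

    def __init__(self):
        # capability -> callable provided by a brokered server
        self._servers = {}

    def advertise(self, capability, handler):
        self._servers[capability] = handler

    def unadvertise(self, capability):
        self._servers.pop(capability, None)

    def request(self, capability, payload):
        """Forward the payload to the brokered server and return its reply."""
        handler = self._servers.get(capability)
        if handler is None:
            return None  # no brokered server committed to this objective
        return handler(payload)


broker = Broker()
broker.advertise("uppercase", str.upper)
reply = broker.request("uppercase", "hello")
print(reply)
```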
Content-Agnostic Search
>>>central mediator

Advantages



Comprehensive
Fast update
Minimized messages
exchange

Disadvantages



Single point of failure
Not scalable
Needs a central
authority
Comment:
Can be solved with
decentralized mediators
Content-Agnostic Search
Content-Agnostic search
Central mediator networks
Networks forming random connected graphs
Networks with regular structure
Content-Agnostic Search
>>>Network forming random connected Graphs

Nodes are connected to few random
neighbors

Example: Gnutella network

Already covered in the second talk of the lecture

Power Law Networks
The search takes advantage of the power law link
distribution of naturally occurring networks
Content-Agnostic Search
>>>Power Law Networks
Content-Agnostic Search
>>>Power Law Networks

Power law distribution:
few nodes have very high connectivity
many nodes with very low connectivity
Content-Agnostic Search
>>>Power Law Networks
Content-Agnostic Search
>>>Power Law Networks
Rule: at each step, add one node
with two edges;
the edges connect to nodes with higher
degree
Content-Agnostic Search
>>>Power Law Networks
Content-Agnostic Search
>>>Power Law Networks
Power law graphs are constructed
dynamically
The rewiring of nodes does not occur randomly; new links
attach preferentially to the most connected
nodes.
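
The construction rule can be illustrated with a small preferential-attachment sketch. It is a simplified model and not the exact procedure from the talk: the function name, the fully connected starting core and the default of two edges per new node are assumptions.

```python
import random


def grow_power_law_graph(num_nodes, edges_per_node=2, seed=0):
    """Grow a graph where new nodes prefer well-connected targets."""
    rng = random.Random(seed)
    core = edges_per_node + 1
    # Assumed start: a small fully connected core.
    adjacency = {i: {j for j in range(core) if j != i} for i in range(core)}
    # Every node appears once per incident edge, so a uniform draw from
    # this list is a draw proportional to node degree.
    endpoints = [i for i in range(core) for _ in adjacency[i]]
    for new in range(core, num_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        adjacency[new] = set()
        for target in targets:
            adjacency[new].add(target)
            adjacency[target].add(new)
            endpoints.extend([new, target])
    return adjacency


graph = grow_power_law_graph(200)
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
print(degrees[:5], degrees[-5:])  # a few hubs, many low-degree nodes
```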

Content-Agnostic Search
>>>Power Law Networks

Power law search algorithm

needs modification to the basic Gnutella approach;
Content-Agnostic Search
>>>Power Law Networks
The Gnutella approach:
Broadcasting to all neighbors
Can exchange with every neighbor
Modified Gnutella:
Forwarding to the neighbor with the highest number of connections
Exchange with the first- and second-degree neighbors
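
A minimal sketch of the modified search, assuming each node knows the items held by its first- and second-degree neighbors and forwards the query to its best-connected unvisited neighbor. The function name, the toy graph and the item names are hypothetical.

```python
def high_degree_search(adjacency, items, start, wanted, max_hops=50):
    """Walk toward high-degree nodes instead of broadcasting.

    Returns (holder, visited): the node holding the item (or None) and the
    list of nodes that forwarded the query.
    """
    visited = []
    current = start
    for _ in range(max_hops):
        visited.append(current)
        # Nodes whose content the current node is assumed to know about.
        known = {current} | adjacency[current]
        for neighbor in adjacency[current]:
            known |= adjacency[neighbor]
        for node in sorted(known):
            if wanted in items.get(node, ()):
                return node, visited
        candidates = [n for n in adjacency[current] if n not in visited]
        if not candidates:
            return None, visited
        # Forward to the unvisited neighbor with the most connections.
        current = max(candidates, key=lambda n: (len(adjacency[n]), n))
    return None, visited


adjacency = {
    "a": {"hub"}, "hub": {"a", "b", "c", "d"}, "b": {"hub"},
    "c": {"hub"}, "d": {"hub", "e"}, "e": {"d", "f"}, "f": {"e"},
}
items = {"f": {"song"}}
result = high_degree_search(adjacency, items, "a", "song")
print(result)
```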
Content-Agnostic Search
>>>Power Law Networks

Advantages of PLN

Networks of decentralized mediators

Broadcasting queries to all neighbors avoided

Search cost reduced
Content-Based Search: Introduction
Content of queries is used to efficiently route
the messages to the most relevant peers
 Search techniques include:

Content-mapping networks;
 Some variations of publish/subscribe networks;

Content-Based Search
Content – Mapping Search Networks
Every peer in the network indexes a "zone" of the
advertisement space
 The zone is dynamic
 Size of the zone depends on the number of
peers
 Peers map advertisement content to the space
 Mapping is performed using hash functions
 Examples include: CAN, Chord, Tapestry,
Pastry

Content-Based Search
Distributed Hash Table (DHT)




A DHT provides the same functionality as a
traditional hash table
A DHT stores key-value pairs
The data structure is distributed over different
nodes
Provides the functions:
insert(id, item);
item = query(id);
An item can be anything: a data object,
document, file, pointer to a file
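
The two functions can be mimicked with a toy, single-process table. This is only a sketch: the class `ToyDHT` is hypothetical, and placing a key by hash modulo the node count is a deliberate simplification, not the scheme used by CAN, Chord or Pastry.

```python
import hashlib


class ToyDHT:
    """Key-value pairs spread over several node-local dictionaries."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self._order = sorted(self.nodes)

    def _owner(self, key_id):
        # Simplified placement: hash the id and take it modulo the node count.
        digest = hashlib.sha1(key_id.encode("utf-8")).digest()
        return self._order[int.from_bytes(digest, "big") % len(self._order)]

    def insert(self, key_id, item):
        self.nodes[self._owner(key_id)][key_id] = item

    def query(self, key_id):
        return self.nodes[self._owner(key_id)].get(key_id)


dht = ToyDHT(["n1", "n2", "n3"])
dht.insert("jingle-bells.mp3", "pointer-to-file")
print(dht.query("jingle-bells.mp3"))
```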
Content-Based Search
Content Addressable Network (CAN)
CAN is based on a virtual d-dimensional
coordinate space
Each node and item is associated with a unique id
in the d-dimensional space
 Goals

Scales to hundreds of thousands of nodes
 Handles rapid arrival and failure of nodes

Content-Based Search
CAN Example: Two Dimensional Space

Space is divided between the
nodes
Together, the nodes cover the
entire space
Each node covers either
a square or a
rectangular area
Example:
Node n1:(1, 2) is the first
node to join and covers
the entire space
Content-Based Search
CAN Example: Two Dimensional Space

Node n2: (4, 2) joins
space is divided
between n1 and n2
Content-Based Search
CAN Example: Two Dimensional Space

Node n3:(3, 5) joins too
Content-Based Search
CAN Example: Two Dimensional Space

Nodes n4:(5, 5) and
n5:(6,6) join
Content-Based Search
CAN Example: Two Dimensional Space

Nodes: n1:(1, 2);
n2:(4,2);
n3:(3, 5);
n4:(5,5);
n5:(6,6)
 Items:
f1:(2,3);
f2:(5,0);
f3:(2,1);
f4:(7,5)
Content-Based Search
CAN Example: Two Dimensional Space

Each item is stored by
the node that owns its
mapping in the space
Content-Based Search
CAN: Query Example

Each node knows its
neighbors in the d-dimensional space
Forward the query to the
neighbor that is closest
to the query id
Example: assume n1
queries f4
Content-Based Search
CAN Routing
For d dimensions with n equal zones, each
node has 2d neighbors
Routing table size: O(d)
Guarantees that a file is found in at most
d × n^(1/d) steps, where n is the total number
of nodes
Algorithm: choose the neighbor nearest to the
destination
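
The greedy rule can be tried out on an idealized CAN. The sketch assumes n equal zones arranged as a wrap-around grid with `side` zones per dimension (so n = side^d); real CAN zones are split dynamically and are not this regular. All function names are illustrative.

```python
from itertools import product


def torus_distance(a, b, side):
    """Number of zone-to-zone steps between a and b on a wrap-around grid."""
    return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))


def neighbors(zone, side):
    """The 2d neighbors of a zone: one step up or down along each axis."""
    result = []
    for axis in range(len(zone)):
        for step in (-1, 1):
            moved = list(zone)
            moved[axis] = (moved[axis] + step) % side
            result.append(tuple(moved))
    return result


def route(source, target, side):
    """Greedy routing: always hand over to the neighbor nearest the target."""
    path = [source]
    current = source
    while current != target:
        current = min(neighbors(current, side),
                      key=lambda z: torus_distance(z, target, side))
        path.append(current)
    return path


SIDE, D = 4, 2  # 16 equal zones in two dimensions
path = route((0, 0), (2, 3), SIDE)
print(len(path) - 1, "hops:", path)
```

For this grid the bound from the slide is d × n^(1/d) = 2 × 4 = 8 steps, and the greedy path stays below it for every pair of zones.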

Content-Based Search
CAN: Multi-Dimension

Increase in the dimension reduces the path length
Content-Based Search
Chord: Introduction
Chord is a distributed lookup protocol
 Given a key (data item), it maps the key onto a
node (peer).
 Hash function assigns each node and key an
m-bit identifier.
 A node’s identifier is defined by hashing the
node’s IP address.
 A key identifier is produced by hashing the
key


ID(node) = hash(196.178.0.1)

ID(key) = hash(“jingle-bells.mp3”)
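
A short sketch of how such identifiers could be derived. SHA-1 is the hash function commonly associated with Chord; reducing the digest modulo 2^m to obtain an m-bit identifier is an assumption made here for illustration, as is the function name `chord_id`.

```python
import hashlib


def chord_id(value, m):
    """Map a string (IP address or key) to an m-bit identifier."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = chord_id("196.178.0.1", 8)
key_id = chord_id("jingle-bells.mp3", 8)
print(node_id, key_id)  # both lie on the same ring of size 2^8
```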
Content-Based Search
Chord: Data Structure
Identifiers are ordered in a virtual ring of size
2^m
Each node maintains:
Finger table: entry i in the finger table of node n is the first node that
succeeds or equals n + 2^i
Predecessor node
An item identified by id is stored on the
successor node of id: successor(id)
Content-Based Search
Chord: Example
Assume an identifier space 0..7
Node n1:(1) joins:
all entries in its
finger table are initialized to itself

Content-Based Search
Chord: Example

Nodes n2:(2), n0:(0),
n6:(6) join
Content-Based Search
Chord: Example
Nodes: n0:(0), n1:(1),
n2:(2), n6:(6)
Items: f1:(1), f7:(7)
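
The finger tables and item placement for this example can be recomputed with a small sketch. It assumes m = 3 (identifier space 0..7) and finger entries i = 0..m-1; the function names are illustrative.

```python
M = 3                 # identifier space 0..7
NODES = [0, 1, 2, 6]  # n0, n1, n2, n6


def successor(ident, nodes=NODES, m=M):
    """First node whose id succeeds or equals ident on the ring."""
    ident %= 2 ** m
    for node in sorted(nodes):
        if node >= ident:
            return node
    return min(nodes)  # wrap around the ring


def finger_table(n, nodes=NODES, m=M):
    """Entry i is successor(n + 2^i) for i = 0 .. m-1."""
    return [successor(n + 2 ** i, nodes, m) for i in range(m)]


for n in NODES:
    print(n, finger_table(n))
print("f1 ->", successor(1), " f7 ->", successor(7))
```

Item f1 ends up on node n1 and item f7 wraps around the ring to node n0.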
Content-Based Search
Chord: Example
Upon receiving a query for item id, a node
• Checks whether it stores the
item locally
• If not, forwards the query
to the largest node in its
successor table that
does not exceed id
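
The lookup steps can be simulated on the example ring (nodes 0, 1, 2, 6 in the identifier space 0..7). This is a single-process sketch with hypothetical function names; falling back to the immediate successor when no table entry precedes the id is an assumption about how the rule is completed.

```python
M = 3
RING = 2 ** M
NODES = [0, 1, 2, 6]


def successor(ident):
    ident %= RING
    return next((n for n in sorted(NODES) if n >= ident), min(NODES))


def predecessor(node):
    ordered = sorted(NODES)
    return ordered[ordered.index(node) - 1]


def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b], walking clockwise from a."""
    span = (b - a) % RING or RING
    offset = (x - a) % RING or RING
    return offset <= span


def lookup(start, item_id):
    """Return (node storing the item, list of nodes the query visited)."""
    path = [start]
    current = start
    while not in_interval(item_id, predecessor(current), current):
        fingers = [successor(current + 2 ** i) for i in range(M)]
        next_node = fingers[0]  # immediate successor as fallback
        for finger in fingers:
            if in_interval(finger, current, item_id):
                next_node = finger  # largest entry that does not exceed id
        current = next_node
        path.append(current)
    return current, path


print(lookup(1, 7))
```

Starting at n1, the query for f7 travels via n6 to n0, which stores the item.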
Content-Based Search
Chord: Properties
Routing table size O(log(N)) , where N is the total
number of nodes
 Guarantees that a file is found in O(log(N)) steps

Content-Based Search
Pastry - Introduction
Decentralized and scalable DHT-network
 Designed for efficient message routing
between nodes

What does DHT mean?

Distributed Hash Table
Hash value for every
peer
Every peer has
knowledge of some
other peers (stored in a
hash table)
All hash tables from all
peers represent a
complete map for all
peers

Peer    Hash          IP
Peer1   7cb3e8f0a     217.4.87.4
        8aa59047f     67.9.21.7
        0a5b4765c     212.90.1.44
Peer2   d1a8d54f35    19.1.27.2
        85ac7542ba    40.92.4.120
Peer3   ...           ...
The Pastry namespace

Peers reside on a virtual
circle made up from all
possible addresses
Blue points represent
peers
(Figure: address circle labeled 2^0 and 2^128)
Pastry routing

The message is sent to the
(known) node which is
numerically closest to
the target node
The procedure is repeated
until the target node is
reached
(Figure labels: Origin, Closest to target, Distance, Destination)
Prefix match

A method to estimate the
difference between two
keys / addresses
The prefix match is the
number of equal digits
before the first difference
Key1: 20b28a0d18
Key2: 20b98a50f7
Prefix match: |20b| = 3
Key1: 1f8319b020
Key2: 712a650fa4
Prefix match: || = 0
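
The prefix match is easy to compute; a minimal sketch, with the function name `prefix_match` chosen for illustration:

```python
def prefix_match(key1, key2):
    """Number of equal leading digits before the first difference."""
    count = 0
    for a, b in zip(key1, key2):
        if a != b:
            break
        count += 1
    return count


print(prefix_match("20b28a0d18", "20b98a50f7"))
print(prefix_match("1f8319b020", "712a650fa4"))
```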
Routing table for node 1234 (Example)
Prefix match   Shared prefix   Entries
0              (none)          03f3   -      20d3   3238
1              1...            100a   1127   -      1339
2              12..            1207   1210   122d   -
3              123..           1230   1231   1232   1233
("-" marks the column of the node's own next digit; the prefix match increases from top to bottom)
Leaf set
Example leaf set with l=6
l/2 numerically closest smaller nodes: 1209, 121f, 1230
Our node: 1234
l/2 numerically closest larger nodes: 123a, 1270, 12ac
Utilized structures (Summary)
Routing table has tree structure
 „Leaf set“ table lists numerically closest
neighbors

Routing algorithm
If the target node is part of the leaf set, the message
is sent directly
Otherwise, the routing table is checked for a node
with a greater prefix match than our node
If still no target is available, the leaf set is queried for a
numerically closer node with the same prefix match as our node

Routing algorithm (Demonstration)
Example node = 1234
Leaf set, smaller (<): 120a, 1221
Leaf set, larger (>): 125f, 1297
Message to 1203
Routing table:
03f3   -      20d3   3238
100a   1127   -      1339
1207   1210   122d   -
1230   1231   1232   1233
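
The three routing rules can be traced for this demonstration with a small sketch. It is a simplification under assumptions: the routing table is scanned as a flat list instead of being indexed by row and column, the leaf-set check is a plain membership test as stated on the slide, and the function names are hypothetical.

```python
def prefix_match(key1, key2):
    count = 0
    for a, b in zip(key1, key2):
        if a != b:
            break
        count += 1
    return count


def next_hop(own, target, leaf_set, routing_table):
    """Pick the next node for a message, following the three rules in order."""
    if target == own or target in leaf_set:
        return target  # rule 1: deliver directly
    own_match = prefix_match(own, target)
    better = [n for n in routing_table if prefix_match(n, target) > own_match]
    if better:  # rule 2: a node sharing a longer prefix with the target
        return max(better, key=lambda n: prefix_match(n, target))

    def distance(node):
        return abs(int(node, 16) - int(target, 16))

    # Rule 3: same prefix match, but numerically closer than our node.
    closer = [n for n in leaf_set + routing_table
              if prefix_match(n, target) >= own_match
              and distance(n) < distance(own)]
    return min(closer, key=distance) if closer else own


LEAF_SET = ["120a", "1221", "125f", "1297"]
ROUTING_TABLE = ["03f3", "20d3", "3238", "100a", "1127", "1339",
                 "1207", "1210", "122d", "1230", "1231", "1232", "1233"]
hop = next_hop("1234", "1203", LEAF_SET, ROUTING_TABLE)
print(hop)
```

For the message to 1203, node 1234 shares the prefix "12" with the target, and the routing table entry 1207 shares "120", so the message is forwarded there.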
Pastry – Routing (2)
If the node the message is sent to is not the
target node, these steps are repeated
The prefix match increases with every node the
message travels through
O(log_16 N) steps (usually 5-7 nodes, max. 32 nodes to
reach the target)

Outline
Introduction to JXTA Search
Architecture and Components
Query Routing Protocol (QRP)
Query Resolution
Platform Bindings
Introduction to JXTA Search

Originally developed by Infrasearch, which
was acquired by Sun in March 2001.
Defines an XML protocol that enables
search in P2P networks.
Open source code.
Supports "Wide Search" and "Deep Search".
JXTA Search:
"Wide Search" and "Deep Search"
Wide search of distributed devices, such as PCs, handhelds, and cell
phones
Deep search of rich content sources such as Web servers
JXTA Search-Participants
Three participants:
• JXTA Search Information Providers
• JXTA Search Consumers
• JXTA Search Hub

• Consumer applications send requests to the JXTA Search network via
the nearest JXTA Search hub.
• The hub determines which of the known providers should receive the
query based on provider meta-data.
• The hub sends the requests to providers, receives responses, and
sends responses back to consumers.
• The QRP (Query Routing Protocol) enables participants in the network to exchange information
in a seamless manner without having to understand the structure of
their presentation layers.
Architecture and components
The JXTA Search network architecture
consists of the following
components:
• Provider Service
• Consumer Service
• Registration Service
• Hub Service
• Message Flow
(Figure: The JXTA Search network architecture.)
JXTA Search Hub Service
The JXTA Search Hub Service consists of two subcomponents:
Router and Resolver
At the heart of JXTA Search is the "router/resolver".
JXTA Search Router
- routes and manages query
connections,
- collates results and returns
results to consumers
JXTA Search Resolver
- maintains an index of
provider's registrations,
- and when a query is received,
matches the query against a set
of providers that may be good
at answering the query.
Architecture and components
Distributed Search
• Central to the JXTA Search infrastructure are "hubs".
• Each hub has a series of providers that form its local network.
• These providers typically have something in common.
• Hubs are expected to become an efficient way to group peers with similar
content, geography or queryspaces.
Outline
Introduction to JXTA Search
Architecture and Components
Query Routing Protocol (QRP)
Query Messages
Response Messages
Registration Messages
Query Resolution
Platform Bindings
Queryspaces
Providers may have widely differing types of
content or resources in their datastores
The notion of queryspaces makes it possible to define
the structure of a query and its associated
registration.
Queryspaces are a fundamental component of the
JXTA Search framework. Like XML namespaces,
queryspaces are identified by unique URIs.

QRP - Query Messages
Query messages are structured as
follows:
The default namespace is
http://search.jxta.org
The query message is contained
within the envelope
<request>...</request>.
The unique query ID is specified in
the uuid attribute of the <request>
tag.
The queryspace is specified in the
query-space attribute of the <request>
tag.
The query data can be arbitrary
XML within a namespace that
matches http://search.jxta.org/text,
which includes the tag <query> to
specify the start of the actual query
data and the tag <text> to specify
free text, or within any other
namespace specified by the queryspace definition.
A simple text query for the term JXTA in the
http://search.jxta.org/text queryspace is shown
as follows:
Example Query Messages
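
The original example message is not reproduced here. The sketch below rebuilds a query message from the description alone, so the exact layout is an assumption: only the `<request>` envelope, the `uuid` and `query-space` attributes, and the `<query>` and `<text>` tags come from the talk; the function name and the generated ID are illustrative.

```python
import uuid
import xml.etree.ElementTree as ET

QRP_NS = "http://search.jxta.org"
TEXT_QUERYSPACE = "http://search.jxta.org/text"


def build_text_query(term, query_id=None):
    """Build a <request> envelope carrying a free-text query."""
    request = ET.Element("request", {
        "xmlns": QRP_NS,                          # default namespace
        "uuid": query_id or uuid.uuid4().hex,     # unique query ID
        "query-space": TEXT_QUERYSPACE,           # queryspace of the query
    })
    query = ET.SubElement(request, "query")       # start of the query data
    ET.SubElement(query, "text").text = term      # free text
    return ET.tostring(request, encoding="unicode")


message = build_text_query("JXTA", query_id="example-id")
print(message)
```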
QRP - Response Messages
The response message is
structured as follows:
The default namespace is
http://search.jxta.org.
The response message is
enveloped within the
<responses>...</responses>
tags, with each specific
response enveloped in
<response>...</response>
tags.
The body of the response
is contained within the
<data>...</data> tags. It can
be arbitrary well-formed
XML.
A response to the query answered by a JXTA peer
running a stock quote service appears as follows:
Example Response Message
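
The original response example is not reproduced here either. As a sketch under assumptions, the code below wraps an arbitrary XML body in the `<responses>`, `<response>` and `<data>` envelope described above and reads it back; the `<quote>` element and its placeholder attribute are hypothetical and carry no real data.

```python
import xml.etree.ElementTree as ET

QRP_NS = "http://search.jxta.org"


def build_responses(bodies):
    """Wrap each XML body in <response><data>...</data></response>."""
    root = ET.Element("responses", {"xmlns": QRP_NS})
    for body in bodies:
        response = ET.SubElement(root, "response")
        ET.SubElement(response, "data").append(body)
    return ET.tostring(root, encoding="unicode")


def extract_bodies(xml_text):
    """Return the child elements of every <data> element."""
    root = ET.fromstring(xml_text)
    return [child
            for data in root.iter("{%s}data" % QRP_NS)
            for child in data]


placeholder = ET.Element("quote", {"symbol": "EXAMPLE"})  # hypothetical body
xml_text = build_responses([placeholder])
bodies = extract_bodies(xml_text)
print(xml_text)
```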
QRP-Registration Messages
Information providers must register with the JXTA Search
network.
To register, a provider contacts an access point with a
registration message.
The registration message is an XML document with three components:
• The queryspace URI. When queries are posted to this queryspace, the
provider's predicates are checked for matches.
• A set of predicates. The predicates define the structure and content of
the queries in which the provider is interested.
• The provider's query server endpoint, either a JXTA pipe ID or a
URL. Queries which match one of the provider's predicates are
posted to this endpoint.
Example Registration Message
(Figure labels: the query server, the query space, the predicate body)
Query Resolution
Queries are resolved by a resolver that matches query terms to registration
terms. Providers whose registration terms match the query terms are returned
by the resolver.
Goal: determine to which set of providers a given query
should be routed. Sending all queries to all
providers is inefficient.
JXTA Search aims for greater efficiency:
Method 1
Define a framework for providers to
register the types of queries they are
interested in receiving
Method 2
Provide an efficient query resolution
and routing service
The minimal condition for matching a query to a provider is that the query must have
the same query-space as the provider registration.
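
A minimal sketch of the resolution step, assuming registrations are reduced to a queryspace, a flat set of predicate terms and an endpoint. Real registrations carry XML predicates, so plain term overlap is a simplification; the function name, the dictionary fields and the endpoints are hypothetical.

```python
def resolve(query_space, query_terms, registrations):
    """Return the endpoints of providers whose registration matches the query."""
    wanted = {term.lower() for term in query_terms}
    matches = []
    for registration in registrations:
        # Minimal condition: same query-space as the provider registration.
        if registration["query_space"] != query_space:
            continue
        # Simplified predicate check: at least one shared term.
        if wanted & {t.lower() for t in registration["predicate_terms"]}:
            matches.append(registration["endpoint"])
    return matches


registrations = [
    {"query_space": "http://search.jxta.org/text",
     "predicate_terms": ["jxta", "p2p"],
     "endpoint": "http://provider-a.example/query"},
    {"query_space": "http://search.jxta.org/text",
     "predicate_terms": ["weather"],
     "endpoint": "http://provider-b.example/query"},
    {"query_space": "urn:example:other-space",
     "predicate_terms": ["jxta"],
     "endpoint": "http://provider-c.example/query"},
]
targets = resolve("http://search.jxta.org/text", ["JXTA"], registrations)
print(targets)
```

Only the first provider is selected: the second has no matching term and the third is registered in a different queryspace.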
Bindings-JXTA Search over JXTA
JXTA Search is network and message format agnostic. Two platform
bindings are supported: JXTA and Web bindings.
JXTA Search over JXTA
Pipes are used as the communication mechanism. Query
Request messages are accepted on an input pipe, and
Query Response messages are returned on an output pipe
specified inside the Query Request message.
JXTA Search messages are embedded in JXTA messages:
– Query Request: Request and Response Pipe
– Query Response: Response
– Registration: Registration and Response Pipe
Bindings-JXTA Search over HTTP
JXTA Search over HTTP

JXTA Search messages are transported via HTTP POST.

There is a Web front end:
– Aggregation of responses
– Presentation of responses (raw HTML)
– Query ranking
– Provider signup facilities
JXTA Search -Summary
• A novel approach for query routing in distributed
networks.
• Uses a simple XML protocol combined with powerful
but simple indexing and matching engines.
• Provides developers with the capability to connect
multiple consumer and provider applications together
for the purposes of information discovery and
exchange.