p2p, Fall 05

Topics in Database Systems: Data Management in
Peer-to-Peer Systems
Search in Unstructured P2p
Outline
• Search Strategies in Unstructured p2p
• Routing Indexes
Topics in Database Systems: Data Management in
Peer-to-Peer Systems
D. Tsoumakos and N. Roussopoulos, “A Comparison of Peer-to-Peer Search Methods”, WebDB’03
Overview
• Centralized
– Constantly-updated directory hosted at central locations (does not scale well, costly updates, single point of failure)
• Decentralized but structured
– The overlay topology is highly controlled, and files (or metadata/indexes) are placed not at random nodes but at specified locations
– “Loosely” vs. “highly” structured (DHT)
• Decentralized and unstructured
– Peers connect in an ad-hoc fashion
– The location of documents/metadata is not controlled by the system
– No guarantee that a search succeeds
– No bounds on search time
Flooding on Overlays
[Figure sequence: a node issues the query “xyz.mp3 ?”, which is flooded hop by hop across the overlay until it reaches the node storing xyz.mp3]
Search in Unstructured P2P
Must find a way to stop the search: Time-to-Live (TTL)
Exponential Number of Messages
Cycles (?)
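A minimal sketch of TTL-limited flooding, assuming the overlay is given as an adjacency-list dictionary. The function name `flood` and its parameters are hypothetical, and duplicate suppression is modeled with a simple set of nodes that have already seen the query. The sketch counts every forwarded copy, so cycles show up as extra messages.

```python
from collections import deque

def flood(graph, source, holders, ttl):
    """Flood a query from `source`; return (found, messages_sent).

    graph   : dict mapping node -> list of neighbours (assumed overlay model)
    holders : set of nodes that store the requested object
    ttl     : maximum number of hops a query copy may travel
    """
    seen = {source}
    frontier = deque([(source, None, ttl)])
    messages = 0
    found = source in holders
    while frontier:
        node, parent, remaining = frontier.popleft()
        if remaining == 0:
            continue                      # TTL exhausted: stop forwarding
        for nb in graph[node]:
            if nb == parent:
                continue                  # do not send back to the sender
            messages += 1                 # one message per forwarded copy
            if nb in seen:
                continue                  # duplicate caused by a cycle: dropped
            seen.add(nb)
            if nb in holders:
                found = True              # flooding does not stop on a hit
            frontier.append((nb, node, remaining - 1))
    return found, messages

# A 4-node ring: 0 - 1 - 2 - 3 - 0
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(flood(ring, 0, {2}, ttl=1))   # (False, 2)
print(flood(ring, 0, {2}, ttl=2))   # (True, 4)
```

With TTL = 2 the object is found, but one of the four messages is a duplicate that arrives at node 2 over the second side of the ring.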
Search in Unstructured P2P
BFS vs DFS
BFS gives better response time but contacts a larger number of nodes (higher message overhead per node and overall)
Note: a BFS search continues (until the TTL is reached), even if the object has already been located on a different path
Recursive vs Iterative
Whether, during the search, the node issuing the query contacts the other nodes directly (iterative) or the query is passed on from node to node (recursive)
Does the result follow the same path back?
Iterative vs. Recursive Routing
Iterative: Originator requests IP address of each hop
• Message transport is actually done via direct IP
Recursive: Message transferred hop-by-hop
[Figure: nodes each holding a key-value (K, V) table; the originator issues retrieve (K1)]
Search in Unstructured P2P
Two general types of search in unstructured p2p:
Blind: try to propagate the query to a sufficient number
of nodes (example Gnutella)
Informed: utilize information about document locations
(example Routing Indexes)
Informed search trades a higher join cost for a lower search cost
Blind Search Methods
Gnutella:
Use flooding (BFS) to contact all accessible nodes within the
TTL value
Imposes a huge overhead on a large number of peers and on overall network traffic
Hard to find unpopular items
Up to 60% bandwidth consumption of the total Internet traffic
Overlay Networks
• P2P applications need to:
– Track identities & (IP) addresses of peers
• May be many!
• May have significant churn (update rate)
• Best not to have n^2 ID references
– Route messages among peers
• If you don’t keep track of all peers, this is “multi-hop”
• This is an overlay network
– Peers are doing both naming and routing
– IP becomes “just” the low-level transport
• All the IP routing is opaque
P2P Cooperation Models
• Centralized model
– global index held by a central authority (single point of failure)
– direct contact between requestors and providers
– Example: Napster
• Decentralized model
– Examples: Freenet, Gnutella
– no global index, no central coordination, global behavior emerges from local interactions, etc.
– direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet)
• Hierarchical model
– introduction of “super-peers”
– mix of centralized and decentralized model
– Example: DNS
Free-riding on Gnutella [Adar00]
• 24-hour sampling period:
– 70% of Gnutella users share no files
– 50% of all responses are returned by the top 1% of sharing hosts
• A social problem, not a technical one
• Problems:
– Degradation of system performance: collapse?
– Increase of system vulnerability
– “Centralized” (“backbone”) Gnutella → copyright issues?
• Verified hypotheses:
– H1: A significant portion of Gnutella peers are free riders
– H2: Free riders are distributed evenly across domains
– H3: Hosts often share files nobody is interested in (files that are not downloaded)
Free-riding Statistics - 1 [Adar00]
H1: Most Gnutella users are free riders
Of 33,335 hosts:
– 22,084 (66%) of the peers share no files
– 24,347 (73%) share ten or fewer files
– Top 1 percent (333) hosts share 37% (1,142,645) of total files shared
– Top 5 percent (1,667) hosts share 70% (2,182,087) of total files shared
– Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared
Free-riding Statistics - 2 [Adar00]
H3: Many servents share files nobody downloads
Of 11,585 sharing hosts:
– Top 1% of sites provide nearly 47% of all answers
– Top 25% of sites provide 98% of all answers
– 7,349 (63%) never provide a query response
Free Riders
• File sharing studies
– Lots of people download
– Few people serve files
• Is this bad?
– If there’s no incentive to serve, why do people do so?
– What if there are strong disincentives to being a major
server?
Simple Solution: Thresholds
• Many programs allow a threshold to be set
– Don’t upload a file to a peer unless it shares > k files
• Problems:
– What’s k?
– How to ensure the shared files are interesting?
Categories of Queries [Sripanidkulchai01]
Categorized top 20 queries
Popularity of Queries [Sripanidkulchai01]
• Very popular documents are approximately equally popular
• Less popular documents follow a Zipf-like distribution (i.e., the probability of seeing the i-th most popular query is proportional to $1/i^{\alpha}$)
• Access frequency of web documents also follows Zipf-like distributions → caching might also work for Gnutella
Caching in Gnutella [Sripanidkulchai01]
• Average bandwidth consumption in tests: 3.5 Mbps
• Best case: trace 2 (73% hit rate = 3.7 times traffic reduction, since 1/(1 − 0.73) ≈ 3.7)
Topology of Gnutella [Jovanovic01]
• Power-law properties verified (“find everything close by”)
• Backbone + outskirts
Power-Law Random Graph (PLRG):
The node degrees follow a power law distribution: if one ranks all nodes from the most connected to the least connected, then the i-th most connected node has $\omega / i^{\alpha}$ neighbors, where $\omega$ is a constant.
Gnutella Backbone [Jovanovic01]
Why does it work? It’s a small World! [Hong01]
• Milgram: 42 out of 160 letters from Nebraska to Boston arrived (~ 6 hops)
• Watts: between order and randomness
– short-distance clustering + long-distance shortcuts
Regular graph: n nodes, k nearest neighbors → path length ~ n/2k (4096/16 = 256)
Rewired graph (1% of nodes): path length ~ random graph, clustering ~ regular graph
Random graph: path length ~ log(n)/log(k) (~ 4)
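The two path-length estimates on this slide can be checked with a few lines of arithmetic. The sketch below assumes n = 4096 and k = 8; the value k = 8 is inferred from the slide's 4096/16 and is not stated explicitly. The function names are hypothetical.

```python
import math

def regular_path_length(n, k):
    """Approximate path length in a regular ring lattice: n / 2k."""
    return n / (2 * k)

def random_path_length(n, k):
    """Approximate path length in a random graph: log(n) / log(k)."""
    return math.log(n) / math.log(k)

n, k = 4096, 8   # k = 8 is an assumption inferred from 4096/16 = 256
print(regular_path_length(n, k))            # 256.0
print(round(random_path_length(n, k), 2))   # 4.0
```

The gap between 256 and 4 hops is what a small fraction of rewired long-distance links recovers while keeping the clustering of the regular graph.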
Links in the small World [Hong01]
• “Scale-free” link distribution
– Scale-free: independent of the total number of nodes
– Characteristic for small-world networks
– The proportion of nodes having a given number of links n is: $P(n) = 1/n^{k}$
– Most nodes have only a few connections
– Some have a lot of links: important for binding disparate regions together
Freenet: Links in the small World [Hong01]
$P(n) \sim 1/n^{1.5}$
Freenet: “Scale-free” Link Distribution [Hong01]
Gnutella: New Measurements
[1] Stefan Saroiu, P. Krishna Gummadi, Steven D. Gribble:
A Measurement Study of Peer-to-Peer File Sharing Systems,
Proceedings of Multimedia Computing and Networking (MMCN)
2002, San Jose, CA, USA, January 2002.
[2] M. Ripeanu, I. Foster, and A. Iamnitchi.
Mapping the gnutella network: Properties of large-scale peer-to-peer systems and implications for
system design.
IEEE Internet Computing Journal, 6(1), 2002
[3] Evangelos P. Markatos,
Tracing a large-scale Peer to Peer System: an hour in the life of Gnutella,
2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002.
[4] Y. Chawathe, S. Ratnasamy, L. Breslau, and S. Shenker.
Making Gnutella-like P2P Systems Scalable. In Proc. ACM SIGCOMM (Aug. 2003).
[5] Qin Lv, Pei Cao, Edith Cohen, Kai Li, Scott Shenker:
Search and replication in unstructured peer-to-peer networks. ICS 2002: 84-95
Gnutella: Bandwidth Barriers
• Clip2 measured Gnutella over 1 month:
– typical query is 560 bits long (including TCP/IP headers)
– 25% of the traffic are queries, 50% pings, 25% other
– on average each peer seems to have 3 other peers actively connected
• Clip2 found a scalability barrier with substantial performance degradation if queries/sec > 10:
10 queries/sec
* 560 bits/query
* 4 (to account for the other 3 quarters of message traffic)
* 3 simultaneous connections
= 67,200 bps
→ 10 queries/sec maximum in the presence of many dialup users
→ won’t improve (more bandwidth → larger files)
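The Clip2 bandwidth estimate can be reproduced as a short calculation. This is only a sketch of the arithmetic on the slide; the function name and parameter names are hypothetical, and the defaults are the measured values quoted above.

```python
def gnutella_load_bps(queries_per_sec, bits_per_query=560,
                      query_share=0.25, connections=3):
    """Per-peer traffic in bits/sec implied by a given query rate.

    query_share : fraction of all traffic that consists of queries (25%),
                  so total traffic is the query traffic divided by it (x4)
    connections : number of simultaneously connected peers
    """
    query_bits = queries_per_sec * bits_per_query
    total_bits = query_bits / query_share
    return total_bits * connections

print(gnutella_load_bps(10))   # 67200.0
```

At 10 queries/sec the load of 67,200 bps already exceeds what a 56 kbps dialup modem can carry, which is where the barrier comes from.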
Gnutella: Summary
• Completely decentralized
• Hit rates are high
• High fault tolerance
• Adapts well and dynamically to changing peer populations
• Protocol causes high network traffic (e.g., 3.5 Mbps). For example:
– 4 connections C / peer, TTL = 7
– 1 ping packet can cause $2 \sum_{i=0}^{TTL} C \, (C-1)^{i} = 26{,}240$ packets
• No estimates on the duration of queries can be given
• No probability for successful queries can be given
• Topology is unknown → algorithms cannot exploit it
• Free riding is a problem
• Reputation of peers is not addressed
• Simple, robust, and scalable (at the moment)
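The packet count for a single ping can be verified directly from the sum 2 * Σ_{i=0..TTL} C * (C − 1)^i. The function name below is hypothetical; the factor 2 is taken from the slide's formula.

```python
def ping_packets(c, ttl):
    """Packets caused by one ping: 2 * sum_{i=0}^{TTL} C * (C - 1)^i."""
    return 2 * sum(c * (c - 1) ** i for i in range(ttl + 1))

print(ping_packets(4, 7))   # 26240
```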
Hierarchical Networks (& Queries)
• DNS
– Hierarchical name space (“clients” + hierarchy of servers)
– Hierarchical routing w/aggressive caching
• 13 managed “root servers”
• Traditional pros/cons of Hierarchical data mgmt
– Works well for things aligned with the hierarchy
• Esp. physical locality
– Inflexible
• No data independence!
Commercial Offerings
• JXTA
– Java/XML Framework for p2p applications
– Name resolution and routing is done with floods & superpeers
• Can always add your own if you like
• MS WinXP p2p networking
– An unstructured overlay, flooded publication and caching
– “does not yet support distributed searches”
• Both have some security support
– Authentication via signatures (assumes a trusted authority)
– Encryption of traffic
Lessons and Limitations
• Client-Server performs well
– But not always feasible
• Ideal performance is often not the key issue!
• Things that flood-based systems do well
– Organic scaling
– Decentralization of visibility and liability
– Finding popular stuff (e.g., caching)
– Fancy local queries
• Things that flood-based systems do poorly
– Finding unpopular stuff [Loo, et al VLDB 04]
– Fancy distributed queries
– Vulnerabilities: data poisoning, tracking, etc.
– Guarantees about anything (answer quality, privacy, etc.)
Summary and Comparison of Approaches
           Paradigm                        Search Type         Search Cost (messages)                  Autonomy
Gnutella   Breadth-first search on graph   String comparison   $2 \sum_{i=0}^{TTL} C \, (C-1)^{i}$     very high
FreeNet    Depth-first search on graph     String comparison   O(Log n) ?                              very high
Chord      Implicit binary search trees    Equality            O(Log n)                                restricted
CAN        d-dimensional space             Equality            O(d n^(1/d))                            high
P-Grid     Binary prefix trees             Prefix              O(Log n)                                high
More on Search
Search Options
– Query Expressiveness (type of queries)
– Comprehensiveness (all results, or just the first (or first k) results)
– Topology
– Data Placement
– Message Routing
Comparison
                    Gnutella    CAN
Topology            pwr law     grid
Data Placement      arbitrary   hashing
Message Routing     flooding    directed
Expressiveness
Comprehensiveness
Autonomy
Efficiency
Robustness
Others?
Parallel Clusters
Links out of these clusters not shown
→ search at only a fraction of the nodes!
Other Open Problems besides Search: Security
• Availability (e.g., coping with DoS attacks)
• Authenticity
• Anonymity
• Access Control (e.g., IP protection, payments, ...)
Trustworthy P2P
• Many challenges here. Examples:
– Authenticating peers
– Authenticating/validating data
• Stored (poisoning) and in flight
– Ensuring communication
– Validating distributed computations
– Avoiding Denial of Service
• Ensuring fair resource/work allocation
– Ensuring privacy of messages
• Content, quantity, source, destination
Authenticity
title: origin of species
author: charles darwin
?
date: 1859
body: In an island far,
far away ...
...
More than Just File Integrity
title: origin of species
author: charles darwin
?
date: 1859 00
body: In an island far,
far away ...
checksum
More than Fetching One File
[Figure: a query (T=origin, Y=?, A=darwin, B=?) and the differing records returned by peers: (T=origin, Y=1800, A=darwin), (T=origin, Y=1859, A=darwin), (T=origin, Y=1859, A=darwin, B=abcd)]
Solutions
• Authenticity Function A(doc): T or F
– at expert sites, at all sites?
– can use signature (the expert provides sig(doc) to the user)
• Voting Based
– authentic is what majority says
• Time Based
– e.g., oldest version (available) is authentic
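The voting-based option can be illustrated with a small sketch. It assumes the user has already fetched candidate copies as byte strings and treats copies with the same SHA-256 digest as the same version; the function name `majority_version` and the strict-majority rule are assumptions made for illustration, not part of the slides.

```python
import hashlib
from collections import Counter

def majority_version(copies):
    """Return the copy backed by a strict majority of peers, or None.

    copies : list of bytes objects, one per responding peer (assumed input)
    """
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(copies):
        return None                      # no strict majority: undecided
    return copies[digests.index(winner)]

answers = [b"date: 1859", b"date: 1859", b"date: 1800"]
print(majority_version(answers))         # b'date: 1859'
```

Such a scheme inherits the weakness listed later under Issues: a collective of bad nodes can outvote the honest ones.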
Added Challenge: Efficiency
• Example: Current music sharing
– everyone has authenticity function
– but downloading files is expensive
• Solution: Track peer behavior
[Figure: peers labeled “good peer” and “bad peer”]
Issues
• Trust computations in dynamic system
• Overloading good nodes
• Bad nodes can provide good content sometimes
• Bad nodes can build up reputation
• Bad nodes can form collectives
• ...
Security & Privacy
• Issues:
– Anonymity
– Reputation
– Accountability
– Information Preservation
– Information Quality
– Trust
– Denial of service attacks
Blind Search Methods
Modified-BFS:
Choose only a ratio of the neighbors (some random subset)
Iterative Deepening:
Start BFS with a small TTL and repeat the BFS at
increasing depths if the first BFS fails
Works well when there is some stop condition and a
“small” flood will satisfy the query
Else even bigger loads than standard flooding
(more later …)
Blind Search Methods
Random Walks:
The node that poses the query sends out k query messages to an equal number of randomly chosen neighbors
Each message then follows its own path, at each step randomly choosing one neighbor to forward it to
Each path is a walker
Two methods to terminate each walker:
• TTL-based, or
• checking method (the walkers periodically check with the query source whether the stop condition has been met)
Reduces the number of messages to k x TTL in the worst case
Some kind of local load-balancing
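A minimal sketch of a k-walker random walk with TTL-based termination, assuming an adjacency-list overlay. The function name and parameters are hypothetical, and each walker here stops only on a hit or when its own TTL runs out (the checking method is not modeled). The message count never exceeds k x TTL.

```python
import random

def k_random_walk(graph, source, holders, k, ttl, rng):
    """Run k independent walkers; return (found, messages_sent).

    graph   : dict mapping node -> list of neighbours (assumed overlay model)
    holders : set of nodes that store the requested object
    rng     : a random.Random instance, passed in for reproducibility
    """
    found = source in holders
    messages = 0
    for _ in range(k):
        node = source
        for _ in range(ttl):
            node = rng.choice(graph[node])   # forward to one random neighbour
            messages += 1
            if node in holders:
                found = True
                break                        # this walker stops on a hit
    return found, messages

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
found, msgs = k_random_walk(ring, 0, {2}, k=4, ttl=5, rng=random.Random(0))
print(found, msgs <= 4 * 5)
```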
Blind Search Methods
Random Walks:
In addition, the protocol biases its walks towards high-degree nodes (choosing the highest-degree neighbor)
Blind Search Methods
Using Super-nodes:
Super (or ultra) peers are connected to each other
Each super-peer is also connected with a number of leaf nodes
Routing among the super-peers
The super-peers then contact their leaf nodes
Blind Search Methods
Using Super-nodes:
Gnutella2
When a super-peer (or hub) receives a query from a leaf, it
forwards it to its relevant leaves and to neighboring super-peers
The hubs process the query locally and forward it to their
relevant leaves
Neighboring super-peers regularly exchange local repository
tables to filter out traffic between them
Blind Search Methods
Ultrapeers can be installed (KaZaA) or self-promoted (Gnutella)
Interconnection between
the superpeers
Informed Search Methods
Local Index
Each node indexes all files stored at all nodes within a certain
radius r and can answer queries on behalf of them
The query is processed only at certain depths: the hop distance between two consecutive searches is 2r+1
Increased cost for join/leave
• A node floods its neighborhood with TTL = r when it joins or leaves the network
Informed Search Methods
Intelligent BFS
Nodes store simple statistics on their neighbors:
(query, NeighborID) tuples for recently answered requests from or through those neighbors,
so that they can rank them
For each query, a node finds similar past queries and selects a direction
How?
Informed Search Methods
Intelligent or Directed BFS
Heuristics for Selecting Direction
>RES: Returned most results for previous queries
<TIME: Shortest satisfaction time
<HOPS: Min hops for results
>MSG: Forwarded the largest number of messages (all types),
suggests that the neighbor is stable
<QLEN: Shortest queue
<LAT: Shortest latency
>DEG: Highest degree
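As an illustration of how one of these heuristics could drive forwarding, the sketch below ranks neighbors by the >RES criterion and keeps only the best fraction of them. The per-neighbor statistics dictionary, the function name, and the 50% default are assumptions made for the example.

```python
import math

def pick_neighbors(results_returned, fraction=0.5):
    """Select the neighbours that returned the most results (>RES).

    results_returned : dict neighbour -> number of results it returned
                       for previous queries (assumed local statistics)
    fraction         : share of neighbours to forward the query to
    """
    ranked = sorted(results_returned, key=results_returned.get, reverse=True)
    count = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:count]

stats = {"a": 5, "b": 0, "c": 9, "d": 2}
print(pick_neighbors(stats))   # ['c', 'a']
```

Any of the other heuristics could be substituted by changing the statistic and the sort direction (for example ascending order for <HOPS or <LAT).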
Informed Search Methods
Intelligent or Directed BFS
• No negative feedback
• Depends on the assumption that nodes specialize in certain
documents
Informed Search Methods
APS (Adaptive Probabilistic Search)
Again, each node keeps a local index, with one entry per neighbor for each object it has requested;
the entry reflects the relative probability of that neighbor being chosen to forward a query for the object
k independent walkers and probabilistic forwarding
Each node forwards the query to one of its neighbors based on the local index (for each object, choose a neighbor using the stored probability)
If a walker succeeds, the probability is increased; otherwise it is decreased
The update travels along the reverse path to the requestor, either after a walker miss (optimistic update) or after a hit (pessimistic update)
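A minimal sketch of the per-node index used for probabilistic forwarding. The class name, the initial index value of 30, the step of 10, and the floor of 1 are all assumptions chosen for illustration; only the idea of choosing a neighbor with probability proportional to its index value, and raising or lowering that value after a hit or miss, comes from the slide.

```python
import random

class ApsIndex:
    """Local index of one node: object -> {neighbour: index value}."""

    def __init__(self, neighbors, initial=30):
        self.neighbors = list(neighbors)
        self.initial = initial            # assumed starting value
        self.values = {}

    def _row(self, obj):
        # Create the row lazily the first time an object is seen.
        return self.values.setdefault(
            obj, {n: self.initial for n in self.neighbors})

    def probability(self, obj, neighbor):
        row = self._row(obj)
        return row[neighbor] / sum(row.values())

    def choose(self, obj, rng):
        """Pick a neighbour with probability proportional to its value."""
        row = self._row(obj)
        return rng.choices(list(row), weights=list(row.values()), k=1)[0]

    def update(self, obj, neighbor, success, step=10):
        """Raise the value after a hit, lower it after a miss."""
        row = self._row(obj)
        delta = step if success else -step
        row[neighbor] = max(1, row[neighbor] + delta)

index = ApsIndex(["a", "b"])
index.update("xyz.mp3", "a", success=True)
print(round(index.probability("xyz.mp3", "a"), 3))   # 0.571
```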
Topics in Database Systems: Data Management in
Peer-to-Peer Systems
Q. Lv et al., “Search and Replication in Unstructured Peer-to-Peer Networks”, ICS’02
Search and Replication in Unstructured Peer-to-Peer
Networks
Type of replication depends on the search strategy used
(i) A number of blind-search variations of flooding
(ii) A number of (metadata) replication strategies
Evaluation Method: Study how they work for a number of different topologies and query distributions
Methodology
Three aspects of P2P
Performance of search depends on
• Network topology: graph formed by the p2p overlay network
• Query distribution: the distribution of query frequencies for individual files
• Replication: number of nodes that have a particular file
Assumption: fixed network topology and fixed query distribution
Results still hold, if one assumes that the time to complete a search
is short compared to the time of change in network topology and in
query distribution
Network Topology
(1) Power-Law Random Graph
A 9239-node random graph
Node degrees follow a power law distribution
When the nodes are ranked from the most connected to the least, the i-th ranked node has degree $\omega / i^{\alpha}$, where $\omega$ is a constant
Once the node degrees are chosen, the nodes are connected randomly
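A rough sketch of this two-step construction: first assign degrees ω/i^α by rank, then connect nodes at random. The random pairing of "stubs" used here (with self-loops and repeated edges simply dropped) is an assumption for illustration; the slides do not say how the random connection was carried out. Function names and the small parameter values are hypothetical.

```python
import random

def plrg_degrees(n, omega, alpha):
    """Degree of the i-th ranked node: omega / i^alpha, at least 1."""
    return [max(1, round(omega / i ** alpha)) for i in range(1, n + 1)]

def connect_randomly(degrees, rng):
    """Pair up degree 'stubs' at random; drop self-loops and duplicates."""
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            edges.add((min(a, b), max(a, b)))
    return edges

degrees = plrg_degrees(4, omega=8, alpha=1)
print(degrees)                                   # [8, 4, 3, 2]
edges = connect_randomly(degrees, random.Random(0))
```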
Network Topology
(2) Normal Random Graph
A 9836-node random graph
Network Topology
(3) Gnutella Graph (Gnutella)
A 4736-node graph obtained in Oct 2000
Node degrees roughly follow a two-segment power law distribution
Network Topology
(4) Two-Dimensional Grid (Grid)
A two dimensional 100x100 grid
Query Distribution
Assume m objects
Let $q_i$ be the relative popularity of the i-th object (in terms of queries issued for it)
Values are normalized: $\sum_{i=1}^{m} q_i = 1$
(1) Uniform: All objects are equally popular
$q_i = 1/m$
(2) Zipf-like
$q_i \propto 1 / i^{\alpha}$
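Both query distributions can be generated in a few lines. This is a sketch with hypothetical function names; the Zipf-like weights 1/i^α are divided by their sum so that the q_i add up to 1, as required by the normalization.

```python
def uniform_popularity(m):
    """q_i = 1/m for every object."""
    return [1 / m] * m

def zipf_popularity(m, alpha):
    """q_i proportional to 1 / i^alpha, normalized to sum to 1."""
    weights = [1 / i ** alpha for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

q = zipf_popularity(100, alpha=1.2)   # alpha = 1.2 is an arbitrary example
print(q[0] > q[1] > q[2])             # True: popularity falls with rank
```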
Replication
Each object i is replicated on $r_i$ nodes and the total number of objects stored is R, that is
$\sum_{i=1}^{m} r_i = R$
(1) Uniform: All objects are replicated at the same number of nodes
$r_i = R/m$
(2) Proportional: The replication of an object is proportional to the query probability of the object
$r_i \propto q_i$
(3) Square-root: The replication of an object i is proportional to the square root of its query probability $q_i$
$r_i \propto \sqrt{q_i}$
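The three replication strategies differ only in the weight given to each object before scaling to the total R. The sketch below returns fractional replica counts; rounding to whole nodes is left out, and the function name and the strategy strings are hypothetical.

```python
import math

def replicas(q, total, strategy):
    """Split `total` replicas over objects with query probabilities q."""
    if strategy == "uniform":
        weights = [1.0] * len(q)
    elif strategy == "proportional":
        weights = list(q)
    elif strategy == "square-root":
        weights = [math.sqrt(x) for x in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    scale = total / sum(weights)          # enforce sum of r_i = R
    return [w * scale for w in weights]

q = [0.8, 0.2]
for name in ("uniform", "proportional", "square-root"):
    print(name, [round(r, 1) for r in replicas(q, 100, name)])
```

For q = (0.8, 0.2) the popular object is 4 times as popular, gets 4 times the replicas under proportional replication, and only 2 times under square-root replication.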
Query Distribution & Replication
When the replication is uniform, the query distribution is
irrelevant (since all objects are replicated by the same amount,
search times are equivalent for both hot and cold items)
When the query distribution is uniform, all three replication
distributions are equivalent (uniform!)
Thus, three relevant combinations query-distribution/replication
(1) Uniform/Uniform
(2) Zipf-like/Proportional
(3) Zipf-like/Square-root
Metrics
Pr(success): probability of finding the queried object before the
search terminates
#hops: delay in finding an object as measured in number of hops
#msgs per node: Overhead of an algorithm as measured in
average number of search messages each node in the p2p has to
process
#nodes visited
Percentage of message duplication
Peak #msgs: the number of messages that the busiest node has
to process (to identify hot spots)
These are per-query measures
Aggregated performance measure: each per-query measure weighted by the probability of its query
Simulation Methodology
For each experiment,
First select the topology and the query/replication distributions
For each object i with replication ri, generate numPlace different sets
of random replica placements (each set contains ri random nodes on
which to place the replicas of object i)
For each replica placement, randomly choose numQuery different nodes
from which to initiate the query for object i
Thus, we get numPlace x numQuery queries
In the paper, numPlace = 10 and numQuery = 100 -> 1000 different
queries per object
Limitation of Flooding
Choice of TTL
• Too low: the node may not find the object, even if it exists
• Too high: burdens the network unnecessarily
[Figure: search for an object that is replicated at 0.125% of the nodes (~11 nodes out of 9,000)]
Note that the right TTL depends on the topology
It also depends on the replication (which is, however, unknown)
Limitation of Flooding
Choice of TTL
Overhead also depends on the topology
Limitation of Flooding
There are many duplicate messages (due to cycles)
particularly in high connectivity graphs
Multiple copies of a query are sent to a node by multiple
neighbors
Duplicated messages can be detected and not forwarded
BUT, the number of duplicate messages can still be
excessive and worsens as TTL increases
Limitation of Flooding
Different nodes
Limitation of Flooding: Comparison of the topologies
Power-law and Gnutella-style graphs are particularly bad with flooding
Highly connected nodes mean more duplicate messages, because many nodes’ neighbors overlap
Random graph is best, because in a truly random graph the duplication ratio (the likelihood that the next node has already received the query) is the same as the fraction of nodes visited so far, as long as that fraction is small
Random graph also gives a better load distribution among nodes
Two New Blind Search Strategies
1. Expanding Ring – not a fixed TTL (iterative deepening)
2. Random Walks (more details) – reduce number of duplicate messages
Expanding Ring or Iterative Deepening
Note that since flooding queries nodes in parallel, the search may not stop even if the object has been located
Use successive floods with increasing TTL
• A node starts a flood with a small TTL
• If the search is not successful, the node increases the TTL and starts another flood
• The process repeats until the object is found
Works well when hot objects are replicated more widely than cold objects
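The successive floods can be sketched as a loop over increasing TTL values. The sketch assumes an adjacency-list overlay and models each flood simply as "all nodes within TTL hops"; the waiting period between rounds is omitted. The default TTL policy (1, 3, 5, 7) follows the linear step of 2 used later in these slides, and the function names are hypothetical.

```python
def nodes_within(graph, source, ttl):
    """Set of nodes a flood with the given TTL reaches (plain BFS)."""
    seen = {source}
    frontier = [source]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return seen

def expanding_ring(graph, source, holders, ttls=(1, 3, 5, 7)):
    """Return the first TTL at which a holder is reached, else None."""
    for ttl in ttls:
        if nodes_within(graph, source, ttl) & holders:
            return ttl
    return None

# A 5-node line: 0 - 1 - 2 - 3 - 4
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(expanding_ring(line, 0, {2}))   # 3
```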
Expanding Ring or Iterative Deepening (details)
Need to define
• A policy: at which depths the iterations are to occur (i.e. the successive TTLs)
• A time period W between successive iterations
– after waiting for a time period W, if it has not received a positive response (i.e. the requested object), the query initiator resends the query with a larger TTL
Nodes maintain the IDs of queries for W + ε
A node that receives the same message as in the previous round does not process it again; it just forwards it
Expanding Ring
Start with TTL = 1 and increase it linearly by a step of 2 at each iteration
For replication over 10%, the search stops at TTL 1 or 2
Expanding Ring
Comparison of message overhead between flooding and expanding ring
Even for objects that are replicated at only 0.125% of the nodes, and even if flooding uses the best TTL for each topology, expanding ring still halves the per-node message overhead
Expanding Ring
More pronounced improvement for Random and Gnutella graphs than for the PLRG, partly because the very high-degree nodes in the PLRG reduce the opportunity for incremental retries in the expanding ring
Introduces a slight increase in the delay of finding an object: from 2–4 hops with flooding to 3–6 hops with expanding ring
Random Walks
Forward the query to a randomly chosen neighbor at each step
Each message a walker
k-walkers
The requesting node sends k query messages and each query
message takes its own random walk
k walkers after T steps should reach roughly the same number of
nodes as 1 walker after kT steps
So cut delay by a factor of k
16 to 64 walkers give good results
Random Walks
When to terminate the walks
• TTL-based
• Checking: the walker periodically checks with the original requestor before walking to the next node (it still uses a (larger) TTL, just to prevent loops)
Experiments show that checking once at every 4th step strikes a good balance between the overhead of the checking messages and the benefits of checking
Random Walks
When compared to flooding:
The 32-walker random walk reduces message overhead by roughly
two orders of magnitude for all queries across all network
topologies at the expense of a slight increase in the number of
hops (increasing from 2-6 to 4-15)
When compared to expanding ring,
The 32-walkers random walk outperforms expanding ring as well,
particularly in PLRG and Gnutella graphs
Random Walks
Keeping State
• Each query has a unique ID and its k walkers are tagged with this ID
• For each ID, a node remembers the neighbors to which it has forwarded the query
• When a new query with the same ID arrives, the node forwards it to a different neighbor (randomly chosen)
Improves Random and Grid by reducing the message overhead by up to 30% and the number of hops by up to 30%
Small improvements for Gnutella and PLRG
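A small sketch of the per-node state described above. The class name is hypothetical, and what a node does once every neighbor has already been used for a query ID is not specified in the slides; here it simply starts over, which is an assumption.

```python
import random

class WalkerState:
    """Remembers, per query ID, which neighbours already got the query."""

    def __init__(self, neighbors):
        self.neighbors = list(neighbors)
        self.used = {}                    # query ID -> set of neighbours

    def next_hop(self, query_id, rng):
        used = self.used.setdefault(query_id, set())
        fresh = [n for n in self.neighbors if n not in used]
        if not fresh:                     # assumption: all tried, start over
            used.clear()
            fresh = list(self.neighbors)
        choice = rng.choice(fresh)        # random among unused neighbours
        used.add(choice)
        return choice

state = WalkerState(["a", "b", "c"])
rng = random.Random(1)
hops = [state.next_hop("q1", rng) for _ in range(3)]
print(sorted(hops))                       # ['a', 'b', 'c']
```

Three walkers of the same query arriving at this node are sent to three different neighbors, which is how the scheme cuts duplicate visits.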
Principles of Search
• Adaptive termination is very important
– Expanding ring or the checking method
• Message duplication should be minimized
– Preferably, each query should visit a node just once
• Granularity of the coverage should be small
– Each additional step should not significantly increase the number of nodes visited
Replication
Next time