Implementing Declarative Overlays

Boon Thau Loo¹, Tyson Condie¹, Joseph M. Hellerstein¹,²,
Petros Maniatis², Timothy Roscoe², Ion Stoica¹
¹University of California at Berkeley, ²Intel Research Berkeley
Overlays Everywhere…
Overlay networks are widely used today:
- Routing and forwarding component of large-scale distributed systems
- Provide new functionality over existing infrastructure
Many examples, variety of requirements:
- Packet delivery: Multicast, RON
- Content delivery: CDNs, P2P file sharing, DHTs
- Enterprise systems: MS Exchange
Overlay networks are an integral part of many large-scale distributed systems.
Problem
Non-trivial to design, build and deploy an overlay correctly:
- Iterative design process: desired properties → distributed algorithms and protocols → simulation → implementation → deployment → repeat…
- Each iteration takes significant time and requires a variety of expertise
The Goal of P2
Make overlay development more accessible:
- Focus on algorithms and protocol designs, not the implementation
Tool for rapid prototyping of new overlays:
- Specify overlay network at a high level
- Automatically translate specification to protocol
- Provide execution engine for protocol
Aim for “good enough” performance:
- Focus on accelerating the iterative design process
- Can always hand-tune implementation later
Outline
- Overview of P2
- Architecture By Example:
  - Data Model
  - Dataflow framework
  - Query Language
  - Chord
- Additional Benefits:
  - Overlay Introspection
  - Automatic Optimizations
- Conclusion
Traditional Overlay Node
[Figure: a traditional overlay node. Packets in → overlay program, which maintains network state (node and route tables) → packets out.]
P2 Overlay Node
[Figure: a P2 overlay node. An overlay description, written in a declarative query language (or directly in a dataflow scripting language), is compiled by a planner into runtime dataflows that maintain the network state in local tables (node, route, …). Packets enter through a Network In dataflow and leave through a Network Out dataflow of the P2 query processor.]
Advantages of the P2 Approach
Declarative Query Language:
- Concise, high-level expression
- Statically checkable (termination, correctness)
- Ease of modification
- Unifying framework for introspection and implementation
Automatic optimizations:
- Query and dataflow level
Data Model
[Figure: the local tables (node, route, …) sit between the Network In and Network Out dataflows.]
Relational data: relational tables and tuples.
Two kinds of tables (see the Overlog sketch below):
- Stored, soft state:
  - E.g. neighbor(Src,Dst), forward(Src,Dst,NxtHop)
- Transient streams:
  - Network messages: message(Rcvr, Dst)
  - Local timer-based events: periodic(NodeID, 10)
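As a sketch (not from the slides), the stored/stream distinction might be written in Overlog along these lines, using the materialize declaration from the P2 papers; the lifetime, size, and key parameters here are illustrative assumptions:

  /* materialize(name, lifetime, max-size, primary key) declares a stored,
     soft-state table; the parameters below are assumed for illustration. */
  materialize(neighbor, 120, infinity, keys(1)).
  materialize(forward, 120, infinity, keys(1,2)).
  /* message(Rcvr, Dst) and periodic(NodeID, 10) need no declaration:
     anything not materialized is a transient event stream. */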
Dataflow Framework
[Figure: the dataflow graph, with the local tables (node, route, …) between the Network In and Network Out dataflows.]
C++ dataflow elements, similar to Click:
- Flow elements (mux, demux, queues)
- Network elements (congestion control, retry, rate limitation)
In addition:
- Relational operators (joins, selections, projections, aggregation)
Outline
- Overview of P2
- Architecture By Example:
  - Data Model
  - Dataflow framework
  - Query Language
  - Chord in P2
- Additional Benefits:
  - Overlay Introspection
  - Automatic Optimizations
- Conclusion
Example: Ring Routing
A simple ring routing example:
- Each node has an address and an identifier
- Each object has an identifier
- Every node knows its successor
- Objects are “served” by their successor
[Figure: an identifier ring (0–63) with nodes and objects placed by identifier, e.g. 3, 13, 15, 18, 22, 24, 28, 33, 37, 40, 42, 56, 58, 60 around the ring.]
Ring State
Stored tables (see the sketch below):
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)
[Figure: on the ring, node 58 stores node(IP58,58) and succ(IP58,60,IP60); node 40 stores node(IP40,40) and succ(IP40,58,IP58).]
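A minimal sketch of that ring state as Overlog declarations and ground facts (the materialize parameters and the fact syntax are illustrative assumptions; the slides’ IP58/IP40 stand in for real addresses):

  materialize(node, infinity, 1, keys(1)).  /* one node tuple per node */
  materialize(succ, infinity, 1, keys(1)).  /* single successor, as on this slide */
  node(IP58, 58).   succ(IP58, 60, IP60).   /* state at node 58 */
  node(IP40, 40).   succ(IP40, 58, IP58).   /* state at node 40 */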
Example: Ring Lookup
Find the responsible node for a given key k:

  n.lookup(k)
    if k in (n, n.successor]
      return n.successor.addr
    else
      return n.successor.lookup(k)

[Figure: the lookup hops from node to successor around the ring until the responsible node is found.]
Ring Lookup Events
Event streams:
- lookup(Addr, Req, K)
- response(Addr, K, Owner)
[Figure: node 40 (node(IP40,40), succ(IP40,58,IP58)) receives lookup(IP40,IP37,59) and forwards lookup(IP58,IP37,59) to node 58 (node(IP58,58), succ(IP58,60,IP60)), which sends response(IP37,59,IP60) back to the requester IP37.]
Pseudocode → Dataflow “Strands”
[Figure: the lookup pseudocode maps to two dataflow strands in the P2 node. Strand 1 implements the “found” branch (return n.successor.addr); Strand 2 implements the “forward” branch (return n.successor.lookup(k)). Both read the local node and succ tables and are wired between the Network In and Network Out dataflows.]
Strand Elements

  Event stream → Element1 → Element2 → … → Elementn → Actions

- Event: incoming network messages, periodic timers
- Condition: process event using strand elements
- Action: outgoing network messages, local table updates
Pseudocode → Strand 1
Stored tables:
- node(NAddr, N)
- succ(NAddr, Succ, SAddr)
Event streams:
- lookup(Addr, Req, K)
- response(Addr, K, Owner)

  n.lookup(k)
    if k in (n, n.successor]
      return n.successor.addr
    else
      return n.successor.lookup(k)

Strand 1:
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
- Action: SEND response(Req, K, SAddr) to Req
Pseudocode to Strand 1
Strand 1 as a dataflow:

  lookup → Join (match lookup.Addr = node.Addr, against table node)
         → Join (match lookup.Addr = succ.Addr, against table succ)
         → Select (filter K in (N, Succ])
         → Project (format response(Req, K, SAddr))
         → response

[Figure: the strand sits between the Network In and Network Out dataflows, reading the local node and succ tables.]
Pseudocode to Strand 2
Strand 2:
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K not in (N, Succ]
- Action: SEND lookup(SAddr, Req, K) to SAddr

Strand 2 as a dataflow:

  lookup → Join (lookup.Addr = node.Addr, against table node)
         → Join (lookup.Addr = succ.Addr, against table succ)
         → Select (K not in (N, Succ])
         → Project (lookup(SAddr, Req, K))
         → lookup
Strand Execution
[Figure: incoming lookup tuples from the Network In dataflow are duplicated to both strands. Strand 1 emits response tuples for keys this node’s successor owns; Strand 2 re-emits lookup tuples to forward the rest. Both outputs feed the Network Out dataflow.]
Actual Chord Lookup Dataflow
[Figure: the full compiled lookup dataflow. Network In feeds a Demux on tuple name: node, finger, and bestSucc tuples are inserted into their materialized tables, while lookup tuples are queued. Strand L1 joins lookup with node and bestSucc and, when K in (N, S], projects a lookupRes. Strands L2 and L3 pick the best finger to forward to, using min<D> and min<BI> aggregates over finger with D := K - B - 1, B in (N, K). Outputs pass through a Mux and a Demux on @local?; remote tuples are queued to Network Out. TimedPullPush(0) elements and a RoundRobin scheduler connect the stages.]
Query Language: Overlog
An “SQL” equivalent for overlay networks.
Based on Datalog:
- Declarative recursive query language
- Well-suited for querying properties of graphs
- Well-studied in the database literature (static analysis, optimizations, etc.)
Extensions (see the sketch below):
- Data distribution, asynchronous messaging, periodic timers and state modification
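A minimal sketch of these extensions in one rule, assuming the node and succ schemas from the ring example (succNotify is an invented stream and the 10-second period is illustrative): every 10 seconds, each node announces itself to its successor.

  /* periodic fires locally every 10 s; @SAddr in the head ships the tuple
     to the successor: timers, messaging and data distribution in one rule. */
  succNotify@SAddr(SAddr, NAddr, N) ←
      periodic@NAddr(NAddr, 10),
      node@NAddr(NAddr, N),
      succ@NAddr(NAddr, Succ, SAddr).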
Query Language: Overlog
Datalog rule syntax:
  <head> ← <condition1>, <condition2>, …, <conditionN>.
Overlog rule syntax:
  <Action> ← <event>, <condition1>, …, <conditionN>.
Query Language: Overlog
Overlog rule syntax:
  <Action> ← <event>, <condition1>, …, <conditionN>.

Strand 1:
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: lookup(NAddr, Req, K) & node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
- Action: SEND response(Req, K, SAddr) to Req

  response@Req(Req, K, SAddr) ← lookup@NAddr(NAddr, Req, K),
      node@NAddr(NAddr, N),
      succ@NAddr(NAddr, Succ, SAddr),
      K in (N, Succ].
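Strand 2, the forwarding case, translates the same way; this is a sketch by direct analogy with the rule above, not a rule shown on the slides:

  /* Forward the lookup to the successor when this node does not own K. */
  lookup@SAddr(SAddr, Req, K) ← lookup@NAddr(NAddr, Req, K),
      node@NAddr(NAddr, N),
      succ@NAddr(NAddr, Succ, SAddr),
      K not in (N, Succ].

The two rules share their entire join body; only the interval test and the head differ, which is exactly what the multi-query sharing optimization shown later exploits.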
P2-Chord
Chord routing, including:
- Multiple successors
- Stabilization
- Optimized finger maintenance
- Failure recovery
47 OverLog rules, 13 table definitions.
Other examples:
- Narada, flooding, routing protocols
[The full P2-Chord specification is shown on the slide in 10 pt font.]
Performance Validation
Experimental setup:
- 100 nodes on Emulab testbed
- 500 P2-Chord nodes
Main goals:
- Validate expected network properties
Sanity checks:
- Logarithmic diameter and state (“correct”)
- BW-efficient: 300 bytes/s/node
Benefits of P2
- Introspection with Queries
- Automatic optimizations
- Reconfigurable Transport (work in progress)
Introspection with Queries
With Atul Singh (Rice) and Peter Druschel (MPI).
Unifying framework for debugging and implementation:
- Same query language, same platform
Execution tracing/logging:
- Rule and dataflow level
- Log entries stored as tuples and queried (see the sketch below)
Correctness invariants, regression tests as queries:
- “Is the Chord ring well formed?” (3 rules)
- “What is the network diameter?” (5 rules)
- “Is Chord routing consistent?” (11 rules)
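A minimal sketch of the logging idea, assuming the lookup schema from the ring example (the lookupLog table and its parameters are invented for illustration): every lookup event is copied into a stored table that later queries can inspect.

  /* lookupLog is hypothetical; it captures each lookup as a queryable tuple. */
  materialize(lookupLog, infinity, infinity, keys(1,2,3)).
  lookupLog@NAddr(NAddr, Req, K) ← lookup@NAddr(NAddr, Req, K).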
Automatic Optimizations
Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005).
Multi-query sharing:
- Common “subexpression” elimination
- Caching and reuse of previously computed results
- Opportunistically share message propagation across rules
[Figure: the two lookup strands share identical joins (lookup.Addr = node.Addr, lookup.Addr = succ.Addr); only the selects (K in (N, Succ] vs. K not in (N, Succ]) and projects (response(Req, K, SAddr) vs. lookup(SAddr, Req, K)) differ, so the join work can be shared.]
Automatic Optimizations
Cost-based optimizations:
- Join ordering affects performance
[Figure: the same two lookup strands; the optimizer may reorder the node and succ joins to reduce work before the selects and projects.]
Open Questions
- The role of rapid prototyping?
- How good is “good enough” performance for rapid prototypes?
- When do developers move from rapid prototypes to hand-crafted code?
- Can we achieve “production quality” overlays from P2?
Future Work
- “Right” language
- Formal data and query semantics
- Static analysis:
  - Optimizations
  - Termination
  - Correctness
Conclusion
P2: Declarative Overlays
- Tool for rapid prototyping of new overlay networks
Declarative Networks
- Research agenda: specify and construct networks declaratively
- Declarative Routing: Extensible Routing with Declarative Queries (SIGCOMM 2005)
Thank You
http://p2.cs.berkeley.edu

[Backup slide: latency CDF for P2-Chord. Median and average latency around 1 s.]