Declarative Networking:
Extensible Networks with Declarative Queries
Boon Thau Loo
University of California, Berkeley
Era of change for the Internet
"In the thirty-odd years since its invention, new uses and abuses … are pushing the Internet into realms that its original design neither anticipated nor easily accommodates …"
Overcoming Barriers to Disruptive Innovation in Networking, NSF Workshop Report '05
Efforts at Internet Innovation
Evolution: overlay networks
• Commercial (Akamai, VPNs, MS Exchange servers)
• P2P (file sharing, telephony)
• Research prototypes on testbeds (PlanetLab)
Revolution: clean-slate design
• NSF Future Internet Design (FIND) program
• NSF Global Environment for Network Investigations (GENI) initiative
[Figure: an overlay network layered on top of the Internet.]
Missing: software tools that can significantly accelerate Internet innovation
Approach: Declarative Networking
A declarative framework for networks:
• Declarative language: "ask for what you want, not how to implement it"
• Declarative specifications of networks, compiled to distributed dataflows
• Runtime engine to execute distributed dataflows
Observation: recursive queries are a natural fit for routing
P2 Declarative Networking System
http://p2.cs.berkeley.edu
P2 Declarative Networking System
[Figure: network specifications expressed as queries enter a Query Planner, which compiles them into a dataflow engine: network elements (UDP Rx/Tx, CC Rx/Tx), queues, a demux and round-robin scheduler, and rule strands (lookup, path, ...) over local tables (e.g., link), together implementing network protocols.]
The Case for Declarative
Ease of programming:
• Compact, high-level representation of protocols
• Orders-of-magnitude reduction in code size
• Easy customization
Safety:
• Queries are "sandboxed" within the query processor
• Potential for static analysis techniques for safety
What about efficiency?
• No fundamental overhead when executing standard routing protocols
• Application of well-studied query optimizations
• Note: the same question was asked of relational databases in the '70s.
Main Contributions
Declarative Routing [HotNets '04, SIGCOMM '05]:
• Extensible routers (balance of flexibility, efficiency, and safety)
Declarative Overlays [SOSP '05]:
• Rapid prototyping of new overlay networks
Database Fundamentals [SIGMOD '06]:
• Network-specific query language and semantics
• Distributed recursive query execution strategies
• Query optimizations, classical and new
A Breadth of Use Cases
Implemented to date:
• Textbook routing protocols (3-8 lines, UCB/Wisconsin)
• Chord DHT overlay routing (47 lines, UCB/IRB)
• Narada mesh (16 lines, UCB/Intel)
• Distributed Gnutella/web crawlers (dataflow, UCB)
• Lamport/Chandy snapshots (20 lines, Intel/Rice/MPI)
• Paxos distributed consensus (44 lines, Harvard)
In progress:
• OSPF routing (UCB)
• Distributed junction-tree statistical inference (UCB)
Outline
Background
The Connection: Routing as a Query
• Execution model
• Path-vector protocol example
• Query specification → protocol implementation
• More examples
Realizing the Connection
• P2: declarative routing engine
Beyond routing: Declarative Overlays
Conclusion
Traditional Router
[Figure: a traditional router. The control plane runs the routing protocol, which issues neighbor-table and forwarding-table updates; the forwarding plane's routing infrastructure uses the neighbor and forwarding tables to forward packets.]
Review: Path Vector Protocol
Advertisement: the entire path to a destination
Each node receives an advertisement, adds itself to the path, and forwards it to its neighbors
[Figure: chain a-b-c-d. d's route propagates hop by hop: c advertises path=[c,d], b advertises path=[b,c,d], and a ends up with path=[a,b,c,d].]
Declarative Router
[Figure: a declarative router. The P2 engine in the control plane executes declarative queries that implement the routing protocol over input and output tables, issuing neighbor-table and forwarding-table updates to the forwarding plane, which forwards packets as before.]
Introduction to Datalog
Datalog rule syntax:
<result> :- <condition1>, <condition2>, …, <conditionN>.
The <result> is the rule head; the conditions form the rule body.
Types of conditions in the body:
• Input tables: e.g., the link(src,dst) predicate
• Arithmetic and list operations
The head is an output table
• Recursive rules: the result of a head appears in a rule body
All-Pairs Reachability
R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
link(a,b) – "there is a link from node a to node b"
reachable(a,b) – "node a can reach node b"
R1: "For all nodes S and D, if there is a link from S to D, then S can reach D."
Input: link(source, destination)
Output: reachable(source, destination)
All-Pairs Reachability
R1: reachable(S,D) :- link(S,D)
R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
R2: "For all nodes S, D, and Z, if there is a link from S to Z, AND Z can reach D, then S can reach D."
Input: link(source, destination)
Output: reachable(source, destination)
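Concretely, a centralized Datalog engine can evaluate R1/R2 by iterating the rules to a fixpoint. Below is a minimal semi-naïve sketch in Python (not the P2 engine; the link facts are made up for illustration):

# Minimal sketch of semi-naive evaluation for:
#   R1: reachable(S,D) :- link(S,D)
#   R2: reachable(S,D) :- link(S,Z), reachable(Z,D)
link = {("a", "b"), ("b", "c"), ("c", "d")}   # hypothetical facts

reachable = set(link)          # R1: every link is reachable
delta = set(link)              # tuples newly derived in the last round
while delta:
    new = set()
    for (s, z) in link:        # R2: join link(S,Z) with new reachable(Z,D)
        for (z2, d) in delta:
            if z == z2 and (s, d) not in reachable:
                new.add((s, d))
    reachable |= new
    delta = new                # only new tuples feed the next iteration

print(sorted(reachable))
# [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d'), ('c','d')]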
Towards Network Datalog
Specify tuple placement
• Value-based partitioning of tables
Tuples to be combined are co-located
• A rule rewrite ensures the body is always single-site
All communication is among neighbors
• No multihop routing during basic rule execution
• Enforced via simple syntactic restrictions
Network Datalog
Location specifier: "@S"
R1: reachable(@S,D) :- link(@S,D)
R2: reachable(@S,D) :- link(@S,Z), reachable(@Z,D)
Query: reachable(@M,N)
Input table: link. Output table: reachable.
All-Pairs Reachability
Query: reachable(@a,N)
[Figure: a four-node network with links a→b, b→a, b→c, c→b, c→d, d→c. Each node stores only its own fragment of the link input table and the reachable output table, partitioned by the location specifier: e.g., link(@a,b) and the derived tuples reachable(@a,b), reachable(@a,c), reachable(@a,d) live at node a.]
Path Vector in Network Datalog
R1: path(@S,D,P) :- link(@S,D), P=(S,D).
R2: path(@S,D,P) :- link(@Z,S), path(@Z,D,P2), P=S⊕P2.
Query: path(@S,D,P)
P=S⊕P2: add S to the front of P2
Input: link(@source, destination)
Query output: path(@source, destination, pathVector)
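The rules can also be read operationally. The sketch below simulates R1/R2 on the earlier a-b-c-d chain in plain Python (hypothetical topology, not P2 code): both body predicates of R2 live at Z, so every derived path tuple is shipped to the node named by its location specifier, which is exactly a path-vector advertisement. A loop check is added so the sketch terminates, as a real path-vector implementation would do.

links = {("a","b"), ("b","a"), ("b","c"), ("c","b"), ("c","d"), ("d","c")}
paths = {n: set() for n in "abcd"}            # path table, partitioned by @S

inbox = [(s, d, (s, d)) for (s, d) in links]  # R1: one path per link
while inbox:
    s, d, p = inbox.pop()                     # deliver path(@s, d, p)
    if (d, p) in paths[s]:
        continue                              # duplicate, already derived
    paths[s].add((d, p))
    # R2 at node s: for each link(@s, nbr), derive path(@nbr, d, nbr (+) p)
    # and ship it to nbr, a path-vector advertisement. The "nbr not in p"
    # loop check keeps the sketch finite, as real path vector would.
    for (src, nbr) in links:
        if src == s and nbr not in p:
            inbox.append((nbr, d, (nbr,) + p))

print(sorted(t for t in paths["a"] if t[0] == "d"))
# [('d', ('a', 'b', 'c', 'd'))]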
Query Execution
R1: path(@S,D,P) :- link(@S,D), P=(S,D).
R2: path(@S,D,P) :- link(@Z,S), path(@Z,D,P2), P=S⊕P2.
Query: path(@a,d,P)
[Figure: the four-node network again. Each node's neighbor (link) table holds its local link tuples; applying R1 at node c derives path(@c,d,[c,d]) into c's forwarding (path) table.]
Query Execution
R1: path(@S,D,P) :- link(@S,D), P=(S,D).
R2: path(@S,D,P) :- link(@Z,S), path(@Z,D,P2), P=S⊕P2.
Query: path(@a,d,P)
Matching the variable Z is a relational "join".
Communication patterns are identical to those in the actual path-vector protocol.
[Figure: c's tuple path(@c,d,[c,d]) is advertised to b, which derives path(@b,d,[b,c,d]); b advertises it to a, which derives path(@a,d,[a,b,c,d]) in its forwarding table.]
Sanity Check
All-pairs shortest-latency-path query:
• Query convergence time: proportional to the diameter of the network; same as hand-coded path vector.
• Per-node communication overhead: increases linearly with the number of nodes.
• Same scalability trends as hand-coded PV/DV protocols.
Outline
Background
The Connection: Routing as a Query
• Execution model
• Path-vector protocol example
• Query specifications → protocol implementation
• Example queries
Realizing the Connection
Declarative Overlays
Conclusion
Example Routing Queries
• Best-path routing
• Distance vector
• Dynamic source routing
• Policy decisions
• QoS-based routing
• Link state
• Multicast overlays (single-source & CBT)
Takeaways:
• Compact, natural representation
• Customization: easy to make modifications to get new protocols
• Connection between query optimization and protocols
All-pairs All-paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=S⊕P2.
Query: path(@S,D,P,C)
All-pairs Best-path
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=S⊕P2.
R3: bestPathCost(@S,D,min<C>) :- path(@S,D,P,C).
R4: bestPath(@S,D,P,C) :- bestPathCost(@S,D,C), path(@S,D,P,C).
Query: bestPath(@S,D,P,C)
Customizable Best-Paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=FN(C1,C2), P=S⊕P2.
R3: bestPathCost(@S,D,AGG<C>) :- path(@S,D,P,C).
R4: bestPath(@S,D,P,C) :- bestPathCost(@S,D,C), path(@S,D,P,C).
Query: bestPath(@S,D,P,C)
Customizing C, AGG and FN: lowest RTT, lowest loss rate, highest capacity, best-k
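The customization point is easy to see in miniature: once paths are enumerated, the choice of FN (how link costs compose) and AGG (which composite cost wins) fixes the policy. A hedged sketch, with a made-up path set and a hypothetical best_paths helper mirroring R3/R4:

from functools import reduce

def best_paths(paths, fn, agg):
    """paths: {(src, dst): [(path_tuple, [link_costs])]} -> best per pair."""
    result = {}
    for (s, d), cands in paths.items():
        costed = [(p, reduce(fn, costs)) for (p, costs) in cands]
        best_cost = agg(c for (_, c) in costed)                    # R3: AGG<C>
        result[(s, d)] = [(p, c) for (p, c) in costed if c == best_cost]  # R4
    return result

paths = {("a", "d"): [(("a","b","c","d"), [1, 1, 1]),   # hypothetical paths
                      (("a","c","d"),     [5, 1])]}

print(best_paths(paths, fn=lambda c1, c2: c1 + c2, agg=min))  # lowest total cost
print(best_paths(paths, fn=min,                    agg=max))  # widest (bottleneck) path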
All-pairs All-paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=S⊕P2.
Query: path(@S,D,P,C)
Distance Vector
R1: path(@S,D,D,C) :- link(@S,D,C).
R2: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2.
R3: shortestLength(@S,D,min<C>) :- path(@S,D,Z,C).
R4: nextHop(@S,D,Z,C) :- path(@S,D,Z,C), shortestLength(@S,D,C).
Query: nextHop(@S,D,Z,C)
Count-to-infinity problem?
Distance Vector with Split Horizon
R1: path(@S,D,D,C) :- link(@S,D,C).
R2: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2, W != S.
R3: shortestLength(@S,D,min<C>) :- path(@S,D,Z,C).
R4: nextHop(@S,D,Z,C) :- path(@S,D,Z,C), shortestLength(@S,D,C).
Query: nextHop(@S,D,Z,C)
Distance Vector with Poisoned Reverse
R1: path(@S,D,D,C) :- link(@S,D,C).
R2: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2, W != S.
R3: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=∞, W = S.
R4: shortestLength(@S,D,min<C>) :- path(@S,D,Z,C).
R5: nextHop(@S,D,Z,C) :- path(@S,D,Z,C), shortestLength(@S,D,C).
Query: nextHop(@S,D,Z,C)
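Operationally, the W != S and C=∞ conditions control what a node advertises back to the neighbor it learned a route from. A small illustrative sketch (the routes structure and function are hypothetical, not P2 internals):

INF = float("inf")

def advertisements(routes, neighbors, mode="split_horizon"):
    """routes: {dest: (next_hop, cost)}. Returns {neighbor: {dest: cost}}.

    split_horizon    -- omit routes learned via that neighbor (W != S)
    poisoned_reverse -- advertise them with infinite cost (C = INF, W = S)
    """
    ads = {}
    for n in neighbors:
        ad = {}
        for dest, (next_hop, cost) in routes.items():
            if next_hop == n:
                if mode == "poisoned_reverse":
                    ad[dest] = INF      # R3: poison the reverse direction
                # split horizon: simply do not advertise back to next_hop
            else:
                ad[dest] = cost
        ads[n] = ad
    return ads

routes = {"d": ("b", 3), "c": ("b", 2), "b": ("b", 1)}   # hypothetical routes
print(advertisements(routes, ["b", "c"], "split_horizon"))
print(advertisements(routes, ["b", "c"], "poisoned_reverse"))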
All-pairs All-Paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=S⊕P2.
Query: path(@S,D,P,C)
Dynamic Source Routing
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@Z,D,C2), path(@S,Z,P1,C1), C=C1+C2, P=P1⊕D.
Query: path(@S,D,P,C)
Predicate reordering: path-vector protocol → dynamic source routing
Other Routing Examples
• Best-path routing
• Distance vector
• Dynamic source routing
• Policy decisions
• QoS-based routing
• Link state
• Multicast overlays (single-source & CBT)
Outline
Background
The Connection: Routing as a Query
Realizing the Connection
• Dataflow generation and execution
• Recursive query processing
• Optimizations
• Semantics in a dynamic network
Beyond routing: Declarative Overlays
Conclusion
Dataflow Graph
[Figure: a single P2 node. Network In (UDP Rx, CC Rx) feeds queues and a demux that routes tuples (e.g., link) into rule strands (lookup, path, ...) over local tables; strand outputs pass through a round-robin scheduler and queues to Network Out (CC Tx, UDP Tx).]
Nodes in the dataflow graph ("elements"):
• Network elements (send/recv, congestion control, retry, rate limiting)
• Flow elements (mux, demux, queues)
• Relational operators (selects, projects, joins, aggregates)
Dataflow Strand
[Figure: one rule strand highlighted within the node's dataflow graph, between Network In and Network Out.]
Strand elements: Input Tuples → Element1 → Element2 → … → Elementn → Output Tuples
Input: incoming network messages, local table changes, local timer events
Condition: process input tuples using the strand's elements
Output: outgoing network messages, local table updates
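The element abstraction is easy to mimic: a strand is a chain of per-tuple operators. A toy sketch (illustrative only; P2's actual element API is C++ and looks quite different):

# Each element maps one input tuple to zero or more output tuples;
# a strand chains elements together.
def select(pred):
    return lambda t: [t] if pred(t) else []

def project(f):
    return lambda t: [f(t)]

def strand(elements):
    def run(tuples):
        for elem in elements:
            tuples = [out for t in tuples for out in elem(t)]
        return tuples
    return run

# Example: keep links with cost < 10, then project to (src, dst).
links = [("a", "b", 3), ("a", "c", 42), ("b", "c", 5)]   # hypothetical tuples
s = strand([select(lambda t: t[2] < 10), project(lambda t: (t[0], t[1]))])
print(s(links))   # [('a', 'b'), ('b', 'c')]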
Rule → Dataflow "Strands"
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P=S⊕P2.
[Figure: rule R2 compiled into a strand within the single-node dataflow graph.]
Localization Rewrite
Rules may have body predicates at different locations:
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P=S⊕P2.
Matching the variable Z is a "join", but link and path live at different nodes.
Rewritten rules:
R2a: linkD(S,@D) :- link(@S,D)
R2b: path(@S,D,P) :- linkD(S,@Z), path(@Z,D,P2), P=S⊕P2.
The join on Z in R2b is now single-site.
Dataflow Strand Generation
R2b: path(@S,D,P) :- linkD(S,@Z), path(@Z,D,P2), P=S⊕P2.
[Figure: two strands are generated. One takes new linkD tuples from Network In, joins them with the local path table (linkD.Z = path.Z), projects path(S,D,P), and sends the result to path.S; the other takes new path tuples, joins them with the local linkD table (path.Z = linkD.Z), projects, and likewise sends to path.S.]
Recursive Query Evaluation
Semi-naïve evaluation:
• Iterations (rounds) of synchronous computation
• Results from the i-th iteration are used in the (i+1)-th
[Figure: a ten-node network. The path table fills in synchronized waves over the link table: 1-hop paths, then 2-hop, then 3-hop.]
Problem: unpredictable delays and failures
Pipelined Semi-naïve (PSN)
Fully asynchronous evaluation:
• Tuples computed in any iteration are pipelined to the next iteration
• Natural for distributed dataflows
[Figure: the same ten-node network; path tuples propagate as they arrive, a relaxation of semi-naïve evaluation.]
Pipelined Evaluation
Challenges:
• Does PSN produce the correct answer?
• Is PSN bandwidth-efficient, i.e., does it make the minimum number of inferences?
Duplicate avoidance: local timestamps
Theorems, for rules p(x,z) :- p1(x,y), p2(y,z), …, pn(y,z), q(z,w) recursive w.r.t. p:
• RS_SN(p) = RS_PSN(p), where RS is the result set
• No repeated inferences in computing RS_PSN(p)
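The pipelined discipline, sketched for the earlier reachable query: new tuples enter a work queue and are processed in whatever order they arrive, with a seen-set standing in for P2's per-tuple timestamps (a simplification; the facts are hypothetical):

# Sketch: pipelined semi-naive (PSN) evaluation of
#   reachable(S,D) :- link(S,D)
#   reachable(S,D) :- link(S,Z), reachable(Z,D)
# Tuples are processed one at a time in arrival order, with no global rounds.
from collections import deque

link = {("a", "b"), ("b", "c"), ("c", "d")}   # hypothetical facts

seen = set()                    # duplicate suppression (timestamps in P2)
queue = deque(link)             # base tuples enter the pipeline
while queue:
    z, d = queue.popleft()      # a newly derived reachable(z, d)
    if (z, d) in seen:
        continue                # already inferred: do not re-propagate
    seen.add((z, d))
    for (s, z2) in link:        # join with link(S,Z) as tuples arrive
        if z2 == z:
            queue.append((s, d))   # pipeline the inference immediately

print(sorted(seen))
# [('a','b'), ('a','c'), ('a','d'), ('b','c'), ('b','d'), ('c','d')]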
Outline
Background
The Connection: Routing as a Query
P2 Declarative Networking System
• Dataflow generation and execution
• Recursive query processing
• Optimizations
Beyond routing: Declarative Overlays
Conclusion
Overview of Optimizations
Traditional, evaluated in the network context:
• Aggregate selections
• Magic-sets rewrite
• Predicate reordering (PV/DV → DSR)
New, motivated by the network context:
• Multi-query optimizations: query-result caching, opportunistic message sharing
• Cost-based optimizations (work in progress): neighborhood density function, hybrid rewrites (cf. the Zone Routing Protocol)
Aggregate Selections
Prune communication using the running state of a monotonic aggregate:
• Avoid sending tuples that do not affect the value of the aggregate
• E.g., shortest-paths query
Challenge in a distributed setting:
• Out-of-order (in terms of the monotonic aggregate) arrival of tuples
• Solution: periodic aggregate selections: buffer tuples, periodically send the best-aggregate tuples
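The monotone filter at the heart of aggregate selections fits in a few lines. A sketch for min<C> (the tuples are hypothetical; the periodic variant would buffer candidates and flush only the current best on each timer tick):

# Sketch: an aggregate-selection filter for min<C>. A path tuple is
# propagated only if it improves on the best cost seen so far, so
# tuples that cannot affect the aggregate are never sent.
best = {}                               # (src, dst) -> lowest cost so far

def agg_select(src, dst, path, cost):
    """Return True if this tuple should be propagated."""
    key = (src, dst)
    if cost < best.get(key, float("inf")):
        best[key] = cost                # running state of the aggregate
        return True
    return False                        # pruned: cannot change min<C>

arrivals = [("a", "d", ("a","b","c","d"), 3),   # hypothetical, out of order
            ("a", "d", ("a","c","d"),     6),
            ("a", "d", ("a","e","d"),     2)]
print([p for (s, d, p, c) in arrivals if agg_select(s, d, p, c)])
# [('a', 'b', 'c', 'd'), ('a', 'e', 'd')]

Note how the out-of-order arrival of the cost-2 path causes two sends where one would have sufficed; this is the overhead that the periodic, buffering variant reduces.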
Aggregate Selections Evaluation
P2 implementation of routing protocols on Emulab (100 nodes), running all-pairs best-path queries with aggregate selections:
• Aggregate selections reduce communication overhead
• More effective when the link metric is correlated with network delay
• Periodic aggregate selections reduce communication overhead further
Outline
Background
The Connection: Routing as a Query
Realizing the Connection
• P2: declarative routing engine
Beyond routing: Declarative Overlays
Conclusion
Recall: Declarative Routing
[Figure: the declarative router again. The P2 engine in the control plane executes declarative queries over input and output tables, issuing neighbor-table and forwarding-table updates to the forwarding plane, which forwards packets.]
Declarative Overlays
[Figure: a declarative overlay node. The P2 engine, driven by declarative queries, implements both the control and forwarding planes at the application level: it maintains overlay topology tables and forwards packets over default Internet routing.]
Declarative Overlays
More challenging to specify:
• Not just querying for routes using input links
• Rules for generating the overlay topology
• Message delivery, acknowledgements, failure detection, timeouts, periodic probes, etc.
• Extensive use of timer-based event predicates:
ping(@D,S) :- periodic(@S,10), link(@S,D)
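The timer-driven style can be mimicked with a simulated clock: periodic(@S,10) fires every 10 seconds at S, and each firing joins with the link table to emit ping tuples shipped to each neighbor. A toy sketch with made-up neighbors and simulated (not wall-clock) time:

# Toy sketch of the rule  ping(@D,S) :- periodic(@S,10), link(@S,D):
# every 10 simulated seconds, node S emits a ping to each neighbor D.
links = {"S": ["D1", "D2"]}                  # hypothetical link(@S,D) tuples

def run(node, period=10, horizon=30):
    pings = []
    for t in range(0, horizon + 1, period):  # periodic(@node, period) fires
        for d in links[node]:
            pings.append((t, d, node))       # ping(@d, node) shipped to d
    return pings

for t, dst, src in run("S"):
    print(f"t={t:2}s  ping(@{dst},{src})")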
P2-Chord
Chord routing, including:
• Multiple successors
• Stabilization
• Optimized finger maintenance
• Failure detection
47 rules, 13 table definitions (in 10 pt font)
MIT Chord: ~100x more code
Another example: the Narada mesh in 16 rules
Actual Chord Lookup Dataflow
[Figure: the compiled lookup dataflow. Incoming tuples from Network In are demultiplexed by tuple name; lookup rules L1-L3 join lookups against the node, bestSucc, and finger tables, apply selections such as "K in (N, S]" and min<D> aggregation over fingers (D := K - B - 1, B in (N, K)), project lookupRes results, and multiplex output through a queue toward Network Out, demultiplexed by whether the destination is local or remote. Materialization elements insert into the node, finger, and bestSucc tables.]
P2-Chord Evaluation
P2 nodes running Chord on 100 Emulab nodes:
• Logarithmic lookup hop-count and state ("correct")
• Median lookup latency: 1-1.5 s
• Bandwidth-efficient: ~300 bytes/s/node
Moving up the stack
Querying the overlay:
• Routing tables are "views" to be queried
• Queries on route resilience, network diameter, path length
Recursive queries for network discovery:
• Distributed Gnutella crawler on PlanetLab [IPTPS '03]
• Distributed web crawler over DHTs on PlanetLab
• Oct '03 distributed crawl: 100,000 nodes, 20 million files
Outline
Background
The Connection: Routing as a Query
Realizing the Connection
Beyond routing: Declarative Overlays
Conclusion
A Sampling of Related Work
Databases:
• Recursive queries: software analysis, trust management, distributed systems diagnosis
• Opportunities: computational biology, data integration, sensor networks
Networking:
• XORP – extensible routers
• High-level routing specifications: Meta-Routing, routing logic
Future Directions
Declarative Networking:
• Static checks on desirable network properties
• Automatic cost-based optimizations
• Component-based network abstractions
Core Internet Infrastructure:
• Declarative specifications of ISP configurations
• P2 deployment in routers
Distributed Data Management on Declarative Networks
[Figure: a layered stack. Data-management applications (P2P search, network monitoring, P2P data integration, collaborative filtering, content distribution networks) issue distributed queries in SQL, XML, or Datalog; these run over distributed algorithms (consensus (Harvard), 2PC, Byzantine agreement, snapshots (Rice/Intel), replication), which in turn run over P2 declarative networks (customized routes, DHTs, flooding, gossip, multicast mesh).]
Run-time cross-layer optimizations:
• Reoptimize data placement and queries
• Reconfigure networks based on data and query workloads
Other Work
Internet-Scale Query Processing:
• PIER – distributed query processor on DHTs
• http://pier.cs.berkeley.edu [VLDB 2003, CIDR 2005]
P2P Search Infrastructures:
• P2P web search and indexing [IPTPS 2003]
• Gnutella measurements on PlanetLab [IPTPS 2004]: distributed Gnutella crawler and monitoring
• Hybrid P2P search [VLDB 2004]
Contributions and Summary
P2 Declarative Networking System:
• Declarative routing engine: extensible routing infrastructure
• Declarative overlays: rapid prototyping of overlay networks
• Database fundamentals: query language; new distributed query execution strategies and optimizations; semantics in dynamic networks
Period of flux in Internet research:
• Declarative networks can play an important role
Thank You
Download: http://p2.cs.berkeley.edu