Declarative Networking: Extensible Networks with Declarative Queries
Boon Thau Loo, University of California, Berkeley

Era of change for the Internet
"In the thirty-odd years since its invention, new uses and abuses ... are pushing the Internet into realms that its original design neither anticipated nor easily accommodates..."
-- Overcoming Barriers to Disruptive Innovation in Networking, NSF Workshop Report '05

Efforts at Internet Innovation
Evolution: overlay networks
- Commercial (Akamai, VPNs, MS Exchange servers)
- P2P (file sharing, telephony)
- Research prototypes on testbeds (PlanetLab)
Revolution: clean-slate design
- NSF Future Internet Design (FIND) program
- NSF Global Environment for Network Investigations (GENI) initiative
Missing: software tools that can significantly accelerate Internet innovation.

Approach: Declarative Networking
A declarative framework for networks:
- Declarative language: "ask for what you want, not how to implement it"
- Declarative specifications of networks, compiled to distributed dataflows
- Runtime engine to execute distributed dataflows
Observation: recursive queries are a natural fit for routing.

P2 Declarative Networking System
http://p2.cs.berkeley.edu
[Figure: network specifications expressed as queries are compiled by a query planner into a dataflow engine whose elements (UDP Rx/Tx, CC Rx/Tx, queues, round-robin scheduler, demux, lookup and rule strands) read and update local tables (link, path, ...) to implement network protocols.]

The Case for Declarative
Ease of programming:
- Compact, high-level representation of protocols
- Orders-of-magnitude reduction in code size
- Easy customization
Safety:
- Queries are "sandboxed" within the query processor
- Potential for static analysis techniques on safety
What about efficiency?
- No fundamental overhead when executing standard routing protocols
- Well-studied query optimizations apply
- Note: the same question was asked of relational databases in the '70s.
Main Contributions
- Declarative Routing [HotNets '04, SIGCOMM '05]: extensible routers (a balance of flexibility, efficiency, and safety)
- Declarative Overlays [SOSP '05]: rapid prototyping of new overlay networks
- Database fundamentals [SIGMOD '06]: network-specific query language and semantics; distributed recursive query execution strategies; query optimizations, classical and new

A Breadth of Use Cases
Implemented to date:
- Textbook routing protocols (3-8 lines, UCB/Wisconsin)
- Chord DHT overlay routing (47 lines, UCB/IRB)
- Narada mesh (16 lines, UCB/Intel)
- Distributed Gnutella/web crawlers (dataflow, UCB)
- Lamport/Chandy snapshots (20 lines, Intel/Rice/MPI)
- Paxos distributed consensus (44 lines, Harvard)
In progress:
- OSPF routing (UCB)
- Distributed junction-tree statistical inference (UCB)

Outline
- Background
- The Connection: Routing as a Query
- Execution Model: Path-Vector Protocol Example (query specification as protocol implementation)
- More Examples
- Realizing the Connection: P2 Declarative Routing Engine
- Beyond routing: Declarative Overlays
- Conclusion

Traditional Router
The control plane runs the routing protocol, which reads the neighbor table and pushes forwarding-table updates down to the forwarding plane; the forwarding plane uses the forwarding table to move packets through the routing infrastructure.

Review: Path-Vector Protocol
- Advertisement: the entire path to a destination
- Each node receives an advertisement, adds itself to the path, and forwards it to its neighbors
Example: c advertises [c,d]; b adds itself and advertises [b,c,d]; a learns the path [a,b,c,d].

Declarative Router
In the P2 engine, declarative queries replace the hand-coded routing protocol in the control plane: input tables (e.g., the neighbor table) feed the queries, and output tables drive forwarding-table updates; the forwarding plane is unchanged.

Introduction to Datalog
Datalog rule syntax:
<result> :- <condition1>, <condition2>, ..., <conditionN>.
The <result> is the rule head; the conditions form the rule body.
Types of conditions in the body:
- Input tables: e.g., the link(src,dst) predicate
- Arithmetic and list operations
The head is an output table. Rules are recursive when the head predicate also appears in a rule body.

All-Pairs Reachability
R1: reachable(S,D) :- link(S,D).
R2: reachable(S,D) :- link(S,Z), reachable(Z,D).
- link(a,b): "there is a link from node a to node b"
- reachable(a,b): "node a can reach node b"
R1: "For all nodes S and D, if there is a link from S to D, then S can reach D."
R2: "For all nodes S, D, and Z, if there is a link from S to Z, and Z can reach D, then S can reach D."
Input: link(source, destination). Output: reachable(source, destination).

Towards Network Datalog
- Specify tuple placement: value-based partitioning of tables
- Tuples to be combined are co-located: a rule rewrite ensures the body is always single-site
- All communication is among neighbors: no multihop routing during basic rule execution
- Enforced via simple syntactic restrictions

Network Datalog
The location specifier "@S" states where each tuple is stored:
R1: reachable(@S,D) :- link(@S,D).
R2: reachable(@S,D) :- link(@S,Z), reachable(@Z,D).
Query: reachable(@a,N).
Input table: link(@S,D); output table: reachable(@M,N).
Example (nodes a, b, c, d with link facts link(@a,b), link(@b,a), link(@b,c), link(@c,b), link(@c,d), link(@d,c)): each node stores its own fragments of the link and reachable tables, and the query reachable(@a,N) returns every node that a can reach.

Path Vector in Network Datalog
R1: path(@S,D,P) :- link(@S,D), P=(S,D).
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P=SP2.    (SP2: add S to the front of P2)
Query: path(@S,D,P).
Input: link(@source, destination). Query output: path(@source, destination, pathVector).

Query Execution
R1: path(@S,D,P) :- link(@S,D), P=(S,D).
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P=SP2.
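The reachability rules R1/R2 above can be simulated centrally with standard bottom-up, semi-naive evaluation. This is a minimal sketch of my own (the function name and data encoding are illustrative, not P2 code), ignoring location specifiers and the network:

```python
# Semi-naive evaluation of:
#   R1: reachable(S,D) :- link(S,D).
#   R2: reachable(S,D) :- link(S,Z), reachable(Z,D).
def all_pairs_reachable(links):
    reachable = set(links)   # R1: every link is a one-hop reach
    delta = set(links)       # tuples newly derived in the last round
    while delta:
        # R2: join links with the *new* reachable tuples on the shared variable Z
        new = {(s, d) for (s, z) in links for (z2, d) in delta if z == z2}
        delta = new - reachable      # keep only genuinely new facts
        reachable |= delta           # results of round i feed round i+1
    return reachable

links = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(all_pairs_reachable(links)))
```

Joining only against the delta (rather than the whole reachable table) is what distinguishes semi-naive from naive evaluation; Network Datalog distributes exactly this computation by partitioning the tables on their location specifiers.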
Query: path(@a,d,P).
Each node's link table is its neighbor table, and its path table becomes its forwarding table. Execution starts next to the destination: by R1, c derives path(@c,d,[c,d]).
Matching the variable Z across body predicates is a database "join". By R2, b joins its link to c with c's path tuple to derive path(@b,d,[b,c,d]), and a then derives path(@a,d,[a,b,c,d]). The communication patterns are identical to those in the actual path-vector protocol.

Sanity Check
All-pairs shortest-latency path query:
- Query convergence time: proportional to the diameter of the network — same as hand-coded PV
- Per-node communication overhead: increases linearly with the number of nodes
- Same scalability trends as hand-coded PV/DV protocols

Outline
- Background
- The Connection: Routing as a Query
- Execution Model: Path-Vector Protocol Example (query specification as protocol implementation)
- Example Queries
- Realizing the Connection
- Declarative Overlays
- Conclusion

Example Routing Queries
Best-path routing, distance vector, dynamic source routing, policy decisions, QoS-based routing, link state, multicast overlays (single-source and CBT).
Takeaways:
- Compact, natural representation
- Customization: easy modifications yield new protocols
- Connection between query optimization and protocols

All-Pairs All-Paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=SP2.
Query: path(@S,D,P,C).

All-Pairs Best-Path
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=SP2.
R3: bestPathCost(@S,D,min<C>) :- path(@S,D,P,C).
R4: bestPath(@S,D,P,C) :- bestPathCost(@S,D,C), path(@S,D,P,C).
Query: bestPath(@S,D,P,C).

Customizable Best-Paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=FN(C1,C2), P=SP2.
R3: bestPathCost(@S,D,AGG<C>) :- path(@S,D,P,C).
R4: bestPath(@S,D,P,C) :- bestPathCost(@S,D,C), path(@S,D,P,C).
Query: bestPath(@S,D,P,C).
Customizing C, AGG, and FN yields lowest RTT, lowest loss rate, highest capacity, or best-k paths.

All-Pairs All-Paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=SP2.
Query: path(@S,D,P,C).

Distance Vector
R1: path(@S,D,D,C) :- link(@S,D,C).
R2: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2.
R3: shortestLength(@S,D,min<C>) :- path(@S,D,Z,C).
R4: nextHop(@S,D,Z,C) :- path(@S,D,Z,C), shortestLength(@S,D,C).
Query: nextHop(@S,D,Z,C).
Count-to-infinity problem?

Distance Vector with Split Horizon
R1: path(@S,D,D,C) :- link(@S,D,C).
R2: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2, W!=S.
R3: shortestLength(@S,D,min<C>) :- path(@S,D,Z,C).
R4: nextHop(@S,D,Z,C) :- path(@S,D,Z,C), shortestLength(@S,D,C).
Query: nextHop(@S,D,Z,C).

Distance Vector with Poisoned Reverse
R1: path(@S,D,D,C) :- link(@S,D,C).
R2: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2, W!=S.
R3: path(@S,D,Z,C) :- link(@S,Z,C1), path(@Z,D,W,C2), C=infinity, W=S.
R4: shortestLength(@S,D,min<C>) :- path(@S,D,Z,C).
R5: nextHop(@S,D,Z,C) :- path(@S,D,Z,C), shortestLength(@S,D,C).
Query: nextHop(@S,D,Z,C).

All-Pairs All-Paths
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@S,Z,C1), path(@Z,D,P2,C2), C=C1+C2, P=SP2.
Query: path(@S,D,P,C).

Dynamic Source Routing
R1: path(@S,D,P,C) :- link(@S,D,C), P=(S,D).
R2: path(@S,D,P,C) :- link(@Z,D,C2), path(@S,Z,P1,C1), C=C1+C2, P=P1D.    (P1D: append D to P1)
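The distance-vector rules with split horizon (R1-R4 above) can be sketched in Python. This is my own illustration, not P2 code; it applies the min<C> aggregate of R3/R4 eagerly so the derivation terminates on cyclic graphs, and it collapses the path/shortestLength/nextHop tables into one dictionary of best entries per (source, destination):

```python
# Distance vector with split horizon:
#   R1: path(@S,D,D,C)  :- link(@S,D,C).
#   R2: path(@S,D,Z,C)  :- link(@S,Z,C1), path(@Z,D,W,C2), C=C1+C2, W!=S.
#   R3/R4: keep only the min-cost path per (S,D) as nextHop(S,D,Z,C).
def distance_vector(links, nodes):
    INF = float("inf")
    # best[(s, d)] = (cost, next_hop); R1 seeds one-hop paths whose next hop is d
    best = {(s, d): (c, d) for (s, d, c) in links}
    changed = True
    while changed:               # iterate to fixpoint
        changed = False
        for (s, z, c1) in links:
            for d in nodes:
                if d == s or (z, d) not in best:
                    continue
                c2, w = best[(z, d)]
                if w == s:       # split horizon: R2's W != S condition
                    continue
                c = c1 + c2      # R2: extend z's best path by the link s->z
                if c < best.get((s, d), (INF, None))[0]:
                    best[(s, d)] = (c, z)    # min<C> aggregate (R3/R4)
                    changed = True
    return best                  # nextHop(S,D,Z,C) as {(S, D): (C, Z)}
```

For example, with symmetric links a-b (cost 1), b-c (cost 1), and a-c (cost 5), node a's entry for c converges to cost 2 via next hop b, and split horizon keeps b from advertising that route back through a.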
Query: path(@S,D,P,C).
Predicate reordering turns the path-vector protocol into dynamic source routing.

Other Routing Examples
Best-path routing, distance vector, dynamic source routing, policy decisions, QoS-based routing, link state, multicast overlays (single-source and CBT).

Outline
- Background
- The Connection: Routing as a Query
- Realizing the Connection: dataflow generation and execution; recursive query processing; optimizations; semantics in a dynamic network
- Beyond routing: Declarative Overlays
- Conclusion

Dataflow Graph
[Figure: a single P2 node's dataflow graph — network-in elements (UDP Rx, CC Rx), rule strands over local tables (link, path, lookup, ...), and network-out elements (queues, round-robin scheduler, demux, CC Tx, UDP Tx).]
Nodes in the dataflow graph ("elements"):
- Network elements (send/receive, congestion control, retry, rate limitation)
- Flow elements (mux, demux, queues)
- Relational operators (selects, projects, joins, aggregates)

Dataflow Strand
Input tuples -> Element1 -> Element2 -> ... -> ElementN -> output tuples.
- Input: incoming network messages, local table changes, local timer events
- Condition: process the input tuple using the strand's elements
- Output: outgoing network messages, local table updates

Rule Dataflow "Strands"
Each rule compiles into strands in the dataflow, e.g.:
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P=SP2.

Localization Rewrite
Rules may have body predicates at different locations:
R2: path(@S,D,P) :- link(@S,Z), path(@Z,D,P2), P=SP2.    (matching the variable Z is a join)
Rewritten rules:
R2a: linkD(S,@D) :- link(@S,D).
R2b: path(@S,D,P) :- linkD(S,@Z), path(@Z,D,P2), P=SP2.

Dataflow Strand Generation
R2b: path(@S,D,P) :- linkD(S,@Z), path(@Z,D,P2), P=SP2.
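The effect of the localized rules R2a/R2b can be sketched as a pipelined, message-driven simulation. This is my own illustration under simplifying assumptions (one shared queue stands in for the network, costs are omitted, and the function name is hypothetical); each newly derived path tuple is immediately "shipped" to the node named by its location specifier, mirroring the message pattern of a path-vector protocol:

```python
from collections import deque

# Pipelined evaluation of:
#   R1:  path(@S,D,P)  :- link(@S,D), P=(S,D).
#   R2b: path(@S,D,P)  :- linkD(S,@Z), path(@Z,D,P2), P=SP2.
def run_path_vector(links):
    path = {}        # per-node path tables: path[node] = {(dest, path_tuple)}
    queue = deque()  # in-flight tuples: (owning node, dest, path)
    for (s, d) in links:                 # R1 seeds one-hop paths
        queue.append((s, d, (s, d)))
    while queue:                         # pipelined: no global rounds
        node, dest, p = queue.popleft()
        if (dest, p) in path.setdefault(node, set()):
            continue                     # duplicate avoidance
        path[node].add((dest, p))
        # R2b: every neighbor s with link(s, node) derives path(s, dest, s.p),
        # and the new tuple is shipped to s (its location specifier)
        for (s, n) in links:
            if n == node and s not in p:     # skip cyclic paths
                queue.append((s, dest, (s,) + p))
    return path

print(run_path_vector({("a", "b"), ("b", "c"), ("c", "d")}))
```

Because tuples are processed one at a time as they arrive, this is closer to pipelined semi-naive evaluation than to round-based semi-naive evaluation: no node waits for a global iteration boundary before propagating results.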
The rule compiles into two strands: one takes new linkD tuples arriving from the network, joins them with the local path table on Z, projects path(S,D,P), and sends the result to path.S; the other takes new path tuples, joins them with the local linkD table on Z, and likewise projects path(S,D,P) and sends it to path.S.

Recursive Query Evaluation
Semi-naive evaluation: iterations (rounds) of synchronous computation; results from the i-th iteration are used in the (i+1)-th. Over the network, the i-th round derives the i-hop paths (1-hop, then 2-hop, then 3-hop, ...).
Problem: unpredictable delays and failures.

Pipelined Semi-naive (PSN)
Fully asynchronous evaluation: tuples computed in any iteration are pipelined to the next iteration. Natural for distributed dataflows; a relaxation of semi-naive evaluation.

Pipelined Evaluation
Challenges:
- Does PSN produce the correct answer?
- Is PSN bandwidth-efficient, i.e., does it make the minimum number of inferences?
Duplicate avoidance: local timestamps.
Theorems, for rules of the form p(x,z) :- p1(x,y), p2(y,z), ..., pn(y,z), q(z,w) that are recursive w.r.t. p:
- RS_SN(p) = RS_PSN(p), where RS denotes the result set
- There are no repeated inferences in computing RS_PSN(p)

Outline
- Background
- The Connection: Routing as a Query
- P2 Declarative Networking System: dataflow generation and execution; recursive query processing; optimizations
- Beyond routing: Declarative Overlays
- Conclusion

Overview of Optimizations
Traditional optimizations, evaluated in the network context:
- Aggregate selections
- Magic-sets rewrite
- Predicate reordering (PV/DV vs. DSR)
New optimizations motivated by the network context:
- Multi-query optimizations: query-result caching, opportunistic message sharing
- Cost-based optimizations (work in progress): neighborhood density function, hybrid rewrites (cf. the Zone Routing Protocol)

Aggregate Selections
- Prune communication using the running state of a monotonic aggregate: avoid sending tuples that cannot affect the aggregate's value (e.g., in a shortest-paths query)
- Challenge in a distributed setting: out-of-order (in terms of the monotonic aggregate) arrival of tuples
- Solution: periodic aggregate selections — buffer tuples and periodically send the best-aggregate tuples

Aggregate Selections Evaluation
P2 implementations of routing protocols on Emulab (100 nodes), running all-pairs best-path queries with aggregate selections:
- Aggregate selections reduce communication overhead
- They are more effective when the link metric is correlated with network delay
- Periodic aggregate selections reduce communication overhead further

Outline
- Background
- The Connection: Routing as a Query
- Realizing the Connection: P2 Declarative Routing Engine
- Beyond routing: Declarative Overlays
- Conclusion

Recall: Declarative Routing
In the P2 engine, declarative queries run in the control plane over input tables (the neighbor table) and produce output tables (forwarding-table updates); the forwarding plane and routing infrastructure are unchanged.

Declarative Overlays
For overlays, the P2 engine implements both the control and forwarding planes at the application level: declarative queries maintain the overlay topology tables, and packets are forwarded between overlay nodes over default Internet routing.

Declarative overlays are more challenging to specify:
- Not just querying for routes over input links: rules generate the overlay topology itself
- Message delivery, acknowledgements, failure detection, timeouts, periodic probes, etc.
Extensive use of timer-based event predicates:
ping(@D,S) :- periodic(@S,10), link(@S,D).

P2-Chord
Chord routing, including multiple successors, stabilization, optimized finger maintenance, and failure detection: 47 rules and 13 table definitions (one page in 10-point font). MIT-Chord is roughly 100x more code. Another example: the Narada mesh in 16 rules.

Actual Chord Lookup Dataflow
[Figure: the compiled Chord lookup dataflow — joins of lookup tuples against the node, bestSucc, and finger tables, min<D> aggregations over fingers in (N,K), and materialization, queue, mux/demux, and network-in/out elements.]

P2-Chord Evaluation
P2 nodes running Chord on 100 Emulab nodes:
- Logarithmic lookup hop count and state ("correct")
- Median lookup latency: 1-1.5 s
- Bandwidth-efficient: 300 bytes/s/node

Moving Up the Stack
Querying the overlay: routing tables are "views" to be queried — queries on route resilience, network diameter, path length.
Recursive queries for network discovery:
- Distributed Gnutella crawler on PlanetLab [IPTPS '03]
- Distributed web crawler over DHTs on PlanetLab; an Oct '03 distributed crawl covered 100,000 nodes and 20 million files

Outline
- Background
- The Connection: Routing as a Query
- Realizing the Connection
- Beyond routing: Declarative Overlays
- Conclusion

A Sampling of Related Work
Databases:
- Recursive queries: software analysis, trust management, distributed systems diagnosis
- Opportunities: computational biology, data integration, sensor networks
Networking:
- XORP: extensible routers
- High-level routing specifications: metarouting, routing logic

Future Directions
Declarative networking:
- Static checks on desirable network properties
- Automatic cost-based optimizations
- Component-based network abstractions
Core Internet infrastructure:
- Declarative specifications of ISP configurations
- P2 deployment in routers
Distributed data management on declarative networks:
- Data-management applications (SQL, XML, Datalog) over distributed queries: P2P search, network monitoring, P2P data integration, collaborative filtering, content distribution networks
- Distributed algorithms on P2 declarative networks: consensus (Harvard), 2PC, Byzantine agreement, snapshots (Rice/Intel), replication; customized routes, DHTs, flooding, gossip, multicast mesh
- Run-time cross-layer optimizations: reoptimize data placement and queries; reconfigure networks based on data and query workloads

Other Work
Internet-scale query processing:
- PIER, a distributed query processor on DHTs: http://pier.cs.berkeley.edu [VLDB 2003, CIDR 2005]
P2P search infrastructures:
- P2P web search and indexing [IPTPS 2003]
- Gnutella measurements on PlanetLab [IPTPS 2004]
- Distributed Gnutella crawler and monitoring
- Hybrid P2P search [VLDB 2004]

Contributions and Summary
P2 Declarative Networking System:
- Declarative routing engine: extensible routing infrastructure
- Declarative overlays: rapid prototyping of overlay networks
- Database fundamentals: query language; new distributed query execution strategies and optimizations; semantics in dynamic networks
In this period of flux in Internet research, declarative networks can play an important role.

Thank You