Implementing Declarative Overlays
Boon Thau Loo (1), Tyson Condie (1), Joseph M. Hellerstein (1,2), Petros Maniatis (2), Timothy Roscoe (2), Ion Stoica (1)
(1) University of California at Berkeley, (2) Intel Research Berkeley

Overlays Everywhere…
- Overlay networks are widely used today:
  - Routing and forwarding components of large-scale distributed systems
  - Provide new functionality over existing infrastructure
- Many examples, with a variety of requirements:
  - Packet delivery: multicast, RON
  - Content delivery: CDNs, P2P file sharing, DHTs
  - Enterprise systems: MS Exchange
- Overlay networks are an integral part of many large-scale distributed systems.

Problem
- It is non-trivial to design, build, and deploy an overlay correctly.
- Iterative design process: desired properties → distributed algorithms and protocols → simulation → implementation → deployment → repeat…
- Each iteration takes significant time and requires a variety of expertise.

The Goal of P2
- Make overlay development more accessible:
  - Focus on algorithms and protocol design, not the implementation
- A tool for rapid prototyping of new overlays:
  - Specify the overlay network at a high level
  - Automatically translate the specification into a protocol
  - Provide an execution engine for the protocol
- Aim for "good enough" performance:
  - Focus on accelerating the iterative design process
  - Can always hand-tune the implementation later

Outline
- Overview of P2
- Architecture by example: data model, dataflow framework, query language, Chord
- Additional benefits: overlay introspection, automatic optimizations
- Conclusion

Traditional Overlay Node
[Figure: packets flow in, are processed by a hand-coded overlay program that maintains network state (node and route tables), and flow out.]

P2 Overlay Node
[Figure: the overlay is described in a declarative query language; a planner compiles that description into runtime dataflows that maintain the network state (node and route tables).]
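The event-driven structure of a P2 node can be sketched as a toy rule engine. This is hypothetical illustration code, not P2's actual C++ runtime: the `Node` class, the handler signature, and the tuple format are all invented here. The idea it shows is the one in the figure: incoming tuples are dispatched by name to the strands registered for that event, and each strand returns tuples destined for the network-out dataflow.

```python
from collections import defaultdict

class Node:
    """Toy sketch of a P2-style node: stored tables plus event-handling
    strands. Incoming tuples are dispatched by tuple name; each strand
    returns the action tuples it produces."""

    def __init__(self):
        self.tables = defaultdict(set)    # stored, soft state (e.g. succ rows)
        self.strands = defaultdict(list)  # event name -> handler functions

    def on(self, event_name, handler):
        """Register a strand for one event stream."""
        self.strands[event_name].append(handler)

    def receive(self, tup):
        """Demux an incoming tuple to every matching strand (cf. the
        Network In dataflow) and collect the outgoing action tuples."""
        name, *args = tup
        out = []
        for handler in self.strands[name]:
            out.extend(handler(self, *args))
        return out

# A trivial strand: answer a ping event with a pong message to the sender.
n = Node()
n.on("ping", lambda node, src: [("pong", src)])
```

In P2 itself, of course, the strands are not hand-written callbacks: the planner generates them as dataflows from the declarative specification.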
[P2 node figure, continued: the P2 query processor runs Network In and Network Out dataflows over local tables; alternatively, the overlay can be described directly in a dataflow scripting language.]

Advantages of the P2 Approach
- Declarative query language:
  - Concise, high-level expression
  - Statically checkable (termination, correctness)
  - Ease of modification
  - A unifying framework for introspection and implementation
- Automatic optimizations, at both the query and the dataflow level

Data Model
- Relational data: relational tables and tuples
- Two kinds of tables:
  - Stored, soft state, e.g. neighbor(Src, Dst), forward(Src, Dst, NxtHop)
  - Transient streams:
    - Network messages: message(Rcvr, Dst)
    - Local timer-based events: periodic(NodeID, 10)

Dataflow Framework
- C++ dataflow elements, similar to Click:
  - Flow elements (mux, demux, queues)
  - Network elements (congestion control, retry, rate limitation)
- In addition: relational operators (joins, selections, projections, aggregation)

Example: Ring Routing
- Each node has an address and an identifier
- Each object has an identifier
- Objects are "served" by their successor: every node knows its successor
[Figure: identifier ring with node and object identifiers, e.g. nodes 37, 40, 58, and 60.]

Ring State
- Stored tables:
  - node(NAddr, N)
  - succ(NAddr, Succ, SAddr)
- Example: node(IP58, 58) with succ(IP58, 60, IP60); node(IP40, 40) with succ(IP40, 58, IP58)

Example: Ring Lookup
- Find the node responsible for a given key k:

  n.lookup(k)
    if k in (n, n.successor]
      return n.successor.addr
    else
      return n.successor.lookup(k)

Ring Lookup Events
- Event streams:
  - lookup(Addr, Req, K)
  - response(Addr, K, Owner)
- Example trace: node IP37 issues a lookup for key 59; the lookup is forwarded as lookup(IP40, IP37, 59) and then lookup(IP58, IP37, 59); node IP58 owns the interval and answers with response(IP37, 59, IP60).

Pseudocode → Dataflow "Strands"
[Figure: the lookup pseudocode compiles into two dataflow strands that read the node and succ tables, fed by the Network In dataflow and feeding the Network Out dataflow.]

Strand Elements
- A strand: event stream → Element1 → Element2 → … → ElementN → actions
- Event: incoming network messages, periodic timers
- Condition: process the event using the strand's elements
- Action: outgoing network messages, local table updates

Pseudocode to Strand 1
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
- Action: SEND response(Req, K, SAddr) to Req
- Dataflow strand: lookup → Join (lookup.Addr = node.Addr) → Join (lookup.Addr = succ.Addr) → Select (K in (N, Succ]) → Project (response(Req, K, SAddr)) → response

Pseudocode to Strand 2
- Event: RECEIVE lookup(NAddr, Req, K)
- Condition: node(NAddr, N) & succ(NAddr, Succ, SAddr) & K not in (N, Succ]
- Action: SEND lookup(SAddr, Req, K) to SAddr
- Dataflow strand: lookup → Join (lookup.Addr = node.Addr) → Join (lookup.Addr = succ.Addr) → Select (K not in (N, Succ]) → Project (lookup(SAddr, Req, K)) → lookup

Strand Execution
[Figure: an incoming lookup is duplicated to both strands; strand 1 emits a response when the key falls in (N, Succ], and strand 2 forwards the lookup otherwise.]

Actual Chord Lookup Dataflow
[Figure: the full generated dataflow, including Network In/Out, demux by tuple name and by locality (remote vs. local), TimedPullPush elements, queues, joins against the node, bestSucc, and finger tables, min aggregations over finger distances (D := K - B - 1, B in (N, K)), and Insert elements materializing the stored tables.]

Query Language: Overlog
- An "SQL" equivalent for overlay networks
- Based on Datalog:
  - A declarative recursive query language, well suited to querying properties of graphs
  - Well studied in the database literature (static analysis, optimizations, etc.)
- Extensions: data distribution, asynchronous messaging, periodic timers, and state modification

Query Language: Overlog
- Datalog rule syntax:
  <head> :- <condition1>, <condition2>, …, <conditionN>.
- Overlog rule syntax:
  <action> :- <event>, <condition1>, …, <conditionN>.
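The two lookup strands can be mimicked in a few lines of ordinary code. This is a sketch, not P2 itself: `in_interval`, `handle_lookup`, and the literal table contents are invented for illustration. The one subtlety it makes explicit is that `K in (N, Succ]` is a circular interval on the identifier ring, so the test must handle wraparound past zero.

```python
def in_interval(k, n, succ):
    """Circular interval membership: K in (N, Succ] on the identifier ring."""
    if n < succ:
        return n < k <= succ
    return k > n or k <= succ  # interval wraps past zero, e.g. (58, 3]

# Stored tables from the ring example: node(NAddr, N), succ(NAddr, Succ, SAddr)
node = {"IP58": 58, "IP40": 40, "IP37": 37}
succ = {"IP58": (60, "IP60"), "IP40": (58, "IP58"), "IP37": (40, "IP40")}

def handle_lookup(addr, req, k):
    """Process one lookup(Addr, Req, K) event at node `addr`.
    Strand 1: if K in (N, Succ], SEND response(Req, K, SAddr) to Req.
    Strand 2: otherwise, SEND lookup(SAddr, Req, K) to SAddr."""
    n = node[addr]            # join: lookup.Addr = node.Addr
    s, saddr = succ[addr]     # join: lookup.Addr = succ.Addr
    if in_interval(k, n, s):
        return ("response", req, k, saddr)
    return ("lookup", saddr, req, k)
```

Replaying the slide's trace: a lookup for key 59 starting at IP37 is forwarded to IP40, then to IP58, which returns response(IP37, 59, IP60).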
- The lookup handler as an Overlog rule:
  - Event: RECEIVE lookup(NAddr, Req, K)
  - Condition: lookup(NAddr, Req, K) & node(NAddr, N) & succ(NAddr, Succ, SAddr) & K in (N, Succ]
  - Action: SEND response(Req, K, SAddr) to Req

  response@Req(Req, K, SAddr) :-
      lookup@NAddr(NAddr, Req, K),
      node@NAddr(NAddr, N),
      succ@NAddr(NAddr, Succ, SAddr),
      K in (N, Succ].

P2-Chord
- Full Chord routing, including:
  - Multiple successors
  - Stabilization
  - Optimized finger maintenance
  - Failure recovery
- 47 OverLog rules, 13 table definitions (the whole program fits on one slide in 10 pt font)
- Other examples: Narada, flooding, routing protocols

Performance Validation
- Experimental setup: 500 P2-Chord nodes on a 100-node Emulab testbed
- Main goal: validate the expected network properties
- Sanity checks:
  - Logarithmic diameter and state ("correct")
  - Bandwidth-efficient: ~300 bytes/s/node

Benefits of P2
- Introspection with queries
- Automatic optimizations
- Reconfigurable transport (work in progress)

Introspection with Queries
- With Atul Singh (Rice) and Peter Druschel (MPI)
- A unifying framework for debugging and implementation: same query language, same platform
- Execution tracing/logging at the rule and dataflow level; log entries are stored as tuples and queried
- Correctness invariants and regression tests expressed as queries:
  - "Is the Chord ring well formed?" (3 rules)
  - "What is the network diameter?" (5 rules)
  - "Is Chord routing consistent?" (11 rules)

Automatic Optimizations
- Application of traditional Datalog optimizations to network routing protocols (SIGCOMM 2005)
- Multi-query sharing:
  - Common "subexpression" elimination
  - Caching and reuse of previously computed results
  - Opportunistically share message propagation across rules
[Figure: the two lookup strands share the joins lookup.Addr = node.Addr and lookup.Addr = succ.Addr, diverging only at the selections K in (N, Succ] and K not in (N, Succ] and the final projections.]

Automatic Optimizations
- Cost-based optimizations: join ordering affects performance
[Figure: the same lookup strand with the node and succ joins in different orders.]

Open Questions
- The role of rapid prototyping:
  - How good is "good enough" performance for rapid prototypes?
  - When do developers move from rapid prototypes to hand-crafted code?
  - Can we achieve "production quality" overlays from P2?

Future Work
- The "right" language
- Formal data and query semantics
- Static analysis: optimizations, termination, correctness

Conclusion
- P2: declarative overlays, a tool for rapid prototyping of new overlay networks
- Declarative networks research agenda: specify and construct networks declaratively
- Declarative routing: "Declarative Routing: Extensible Routing with Declarative Queries" (SIGCOMM 2005)

Thank You
http://p2.cs.berkeley.edu

P2 Latency
[Figure: lookup latency CDF for P2-Chord; median and average latency are around 1 s.]
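As an addendum, the multi-query sharing optimization on the earlier slide can be illustrated concretely. This is hypothetical code (the function names and table literals are invented): rather than each strand independently joining the lookup event against the node and succ tables, the shared join prefix is computed once per event, and only the final selections and projections differ.

```python
def in_interval(k, n, s):
    """K in (N, S] on the circular identifier ring."""
    return n < k <= s if n < s else (k > n or k <= s)

def shared_lookup_strands(lookups, node, succ):
    """Evaluate both lookup strands over a batch of lookup(Addr, Req, K)
    events, computing the common joins lookup.Addr = node.Addr and
    lookup.Addr = succ.Addr once instead of once per strand."""
    responses, forwards = [], []
    for addr, req, k in lookups:
        n = node[addr]             # shared join with node(NAddr, N)
        s, saddr = succ[addr]      # shared join with succ(NAddr, Succ, SAddr)
        if in_interval(k, n, s):   # Select K in (N, Succ]      -> strand 1
            responses.append(("response", req, k, saddr))
        else:                      # Select K not in (N, Succ]  -> strand 2
            forwards.append(("lookup", saddr, req, k))
    return responses, forwards
```

In P2 this rewriting happens automatically at the dataflow level; the sketch only shows why it saves work when many rules share a join prefix.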