SPRING-OPEN
SDN based WAN Control of Open Segment Routers
An ONF TAG Project

Saurav Das
Project Lead & ONF Consultant
Ciena talk, Oct 23rd, 2014

Outline
• Motivation & Project Goals
• Project Description
• Progress
• What next?

Motivation: ONF Point of View
SDN/OpenFlow successful
• in DataCenters
• with Software Switches
• and Overlay networks
But when it comes to Hardware switches, misconceptions abound
• OpenFlow is not mature
• OpenFlow does not work with current hardware
• OpenFlow does not scale
• SDN/OpenFlow is about centralized control

OpenFlow has evolved towards production readiness.
[Timeline figure: additions to switch state, interface and behavior per OpenFlow version]
• 1.0 (Q4 ‘09): flows, ports; single message queue w/optional barriers; forward {0, 1, n}; match Eth, VLAN, IP, L4
• 1.1 (Q1 ‘11): + Group Tables; + Multiple Tables/Pipelines; + forward 1-in-n (ECMP); + match QinQ, MPLS, SCTP; + match virtual ports
• 1.2 (Q4 ‘11): + IPv6; + multiple controllers; + extensible match; + extensible actions
• 1.3 (Q2 ‘12): + per-flow metering; + tunnel-id; + multiple channels (auxiliary connections)
• 1.4 (Q4 ‘13): + optical ports; + synchronized tables; + bundle messages

ONF TAG Project Goals
1. Demonstrate maturity and scale of the ONF work product in hardware readily available today using the latest stable versions of ONF protocols – eg. OF 1.3.4.
2. Provide feedback to ONF WGs on their work product from an implementation of the chosen networking scenario.
3. Promote adoption by creating a core-kernel that is extensible for value-add towards deployment, interoperability and differentiation.

Non-Goals
1. Not creating GA product; no QA; will not be ready for production nor interoperate with other networks and network control planes. Will support some elements helpful for productization (eg. config, troubleshooting/OAM, visibility etc.)
2. Not delivering a specific service like Bandwidth-TE/VPN/NFV. Instead supporting core-capabilities to build such services on top (extensibility options)
3. Not a plugfest – data and control plane choices will be made; however choices should be replaceable by other parts, both commercial and open-source, as long as they conform to the requirements

N/w. Scenario: SDN based WAN Control
[Figure: a controller (Routing Service, Discovery Service, Forwarding Service, Controller System) receives requests and performs routing, recovery and label imposition; it controls Open Segment Routers (OSR) over OpenFlow]
• SR labels imposed by controller
• OSR FIB built by controller

One Way to Implement SR
[Figure: a Controller (OpenDaylight or Cisco ONE or Juniper NorthStar) and a PCE (eg. Cariden/Cisco or WANDL/Juniper) above routers running OSPFv2, OSPFv3, ISIS]
• PCEP for tunnel req & label imposition
• BGP-LS for topology info
• Routers: routing, recovery, label distribution (new in SR)
• IETF working on extending all of these protocols for Segment Routing
• Controller/PCE not required for certain use cases - just configure routers for SR via CLI

Why Segment Routing
Segment Routing (SR) or SPRING (IETF name) – Source Packet Routing In NetworkinG
• Eliminates label distribution protocols – LDP and RSVP-TE
• Thereby eliminates synchronization and state management complexities
• Label distribution via OSPF or ISIS with suitable extensions (see IETF drafts)
• Source routing via ‘segments’
• Segments map to ‘labels’ in MPLS data plane
• MPLS data plane unchanged – SR operations PUSH, NEXT, CONTINUE map to MPLS operations PUSH, POP, SWAP (with same label) resp.
• Introduces globally significant labels - node segments
• Retains locally significant labels – adjacency segments
• Can use ECMP shortest-paths and Explicit Paths (loose, strict)
• Can be used for TE/VPN/PBR/Service-chains
Think of Segment Routing as giving new meaning to labels, allowing different network operations and a simpler control plane without changing the data plane!
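
The mapping of SR operations onto MPLS operations given under "Why Segment Routing" can be illustrated on a label stack. This is only a sketch: the function name `apply_sr_op`, the list representation (top of stack at index 0) and the example labels are assumptions made for illustration, not part of the project.

```python
# SR operation -> MPLS operation, as listed on the slide.
SR_TO_MPLS = {"PUSH": "PUSH", "NEXT": "POP", "CONTINUE": "SWAP"}


def apply_sr_op(stack, op, label=None):
    """Return a new label stack after one SR operation.

    stack[0] is the top of the stack (a hypothetical representation).
    """
    if op == "PUSH":
        if label is None:
            raise ValueError("PUSH needs a label")
        return [label] + list(stack)      # MPLS PUSH
    if not stack:
        raise ValueError(op + " needs a non-empty label stack")
    if op == "NEXT":
        return list(stack[1:])            # MPLS POP: move on to the next segment
    if op == "CONTINUE":
        return list(stack)                # MPLS SWAP with the same label: stack unchanged
    raise ValueError("unknown SR operation: " + op)


# Example: impose node segment 106, then steer via 102 first.
stack = apply_sr_op([], "PUSH", 106)
stack = apply_sr_op(stack, "PUSH", 102)   # top of stack is now 102
stack = apply_sr_op(stack, "CONTINUE")    # transit router keeps the active segment
stack = apply_sr_op(stack, "NEXT")        # segment 102 completed, 106 becomes active
```

Because CONTINUE swaps a label for itself, a transit router never needs label state beyond the globally known node segments.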
ONF TAG Project Core Requirements
• Must work on Hardware
• Must use ONF Protocols
• Must use Available Commodity Parts
• Provide Feedback to Standards
• Diversity of Solutions
• Must be Extensible

Project Deliverables
1. Open Segment Router on 1 hardware platform
2. WAN Controller
• Supports Discovery and Routing Services
• Label imposition for segment-routing/stitching
• GUI/CLI, troubleshooting, stats
3. System Prototype & Demonstration
• Segment routed island
• Demonstrate discovery & several routing scenarios
• Extensible towards deployment & interoperability
4. Feedback
• What was not implemented and why?
• Gaps/inefficiencies in protocol
• HW requirements

Project Milestones & Timeline
[Timeline figure: Open Segment Router (OSR), WAN Controller, Controller-OSR Integration, and System Prototype & Demonstration, plotted against June 1st, Aug 1st, Oct 1st and Dec 1st]

Routing Service: Scenario # 1
Default Routing using Node Segments, ECMP and PHP
[Figure: routers 101 to 106 with subnets 10.10.1.0/24, 10.10.3.0/24, 10.10.4.0/24 and 10.10.6.0/24; ECMP paths towards 106; label still 106 along the paths, with PHP before 106]
• Global label 106 imposed on pkts dst. to the 10.10.6 subnet
• 101, 102 … 106 are Node Segments allocated out of the SRGB, and bound to the router loopback addresses.
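
Scenario # 1 can be sketched in a few lines: shortest paths to router 106 are computed, and every other router gets one entry for global label 106 that either keeps the label over all equal-cost next hops or pops it at the penultimate hop (PHP). The link list, the hop-count metric and the names `LINKS` and `build_label_entries` are assumptions made for illustration; the slide does not specify them.

```python
from collections import deque

# Hypothetical links for the six-router figure (an assumption, not from the slide).
LINKS = [(101, 102), (101, 103), (102, 103), (102, 104),
         (103, 105), (104, 105), (104, 106), (105, 106)]


def build_label_entries(links, dst):
    """Per-router forwarding entry for the node segment of `dst`.

    Uses hop count as the metric (an assumption). Routers one hop away
    from `dst` pop the label (PHP); all others keep it (CONTINUE).
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search from the destination gives hop distances.
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    entries = {}
    for node, d in dist.items():
        if node == dst:
            continue
        # Every neighbor one hop closer to dst is an equal-cost next hop.
        next_hops = sorted(n for n in neighbors[node] if dist[n] == d - 1)
        action = "POP" if d == 1 else "CONTINUE"
        entries[node] = {"label": dst, "action": action, "next_hops": next_hops}
    return entries


entries = build_label_entries(LINKS, 106)
for router in sorted(entries):
    print(router, entries[router])
```

With this assumed topology, router 101 load-balances label 106 over next hops 102 and 103, while 104 and 105 pop the label before delivering to 106.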
Routing Service: Scenario # 2
Policy Routing
[Figure: routers 101 to 106 with subnets 10.10.1.0/24, 10.10.3.0/24, 10.10.4.0/24 and 10.10.6.0/24; link X marked; label stacks 102 106 and 104 106; Anycast Node Segment 999]
• Policy#1 – Traffic from .3 to .6 should avoid link X
• Policy#2 – Flow ‘f’ from .1 to .6 should stay in upper plane

Routing Service: Scenario # 3
TE Support: Load-balancing among non-ECMP Paths
[Figure: same topology; label stack 12009 106; non-ECMP paths]
• Same adjacency segment 12009 assigned to both outgoing links for load-balancing at 102, to 104 or 103
• Once at 104 or 103, it’s just SPF to 106

Routing Service: Scenario # 4
TE Support: Explicit Routing
[Figure: same topology with the desired explicit path]
• Desired Explicit Path requires label stack: 102 103 105 104 106
• Deep-stacks can cause problems in merchant silicon
1) Cannot push many labels all at once
2) Can cause loss of entropy if hw cannot read down to L3/L4 headers
• Solution: use Segment stitching
• Stitching Segments: Pop 105, Push 104 106

Routing Service: Scenario # 5
Service Chaining
[Figure: same topology with the desired chain through a Firewall (Adjacency Segment 9002) and DPI (Adjacency Segment 16555); label stack 103 9002 105 16555 106]
• Note: Could have used segment-stitching or label-swapping to avoid deep label stack

SPRING-OPEN Data Plane Requirements
[Figure: Bare-metal Hardware (CPU, ASIC) running an OS Distribution, Supporting Processes, OF Client, Gluework and SDK]

SPRING-OPEN Control Plane Requirements
• Routing Service: Policy Routing Manager – ACL, TE Support, Service-Chains; Default Routing Manager; Recovery Services
• Discovery Service: Network Snapshot Manager; Stats/OAM Manager; Config. Manager; Resource Manager; Link/Nbr. Disc.
• Forwarding Service: Consistent Update Manager; C2D Sync Manager; C2C Sync Manager
• Controller System: Visibility/Debug Fwk.; Conn. Mgr. / Event Engine; HA Manager; Dist. DB

SPRING-OPEN Control Plane Requirements: Routing Service
Components: Policy Routing Manager – ACL, TE Support, Service-Chains; Default Routing Manager; Recovery Services
• SPF Routing with Node Segments
• Use of ECMP and PHP
• Convergence
• Protection
• Connectivity management – ACL policies
• Avoiding links, nodes
• TE support – explicit strict paths
• Load balancing over non-equal-cost paths
• Service chaining

SPRING-OPEN Control Plane Requirements: Discovery Service
Components: Network Snapshot Manager; Stats/OAM Manager; Config. Manager; Resource Manager; Link/Nbr. Disc.
• Network wide view of topology, traffic, capabilities and resource limits
• Maintains API for requests from routing, forwarding services & external req.
• Provides versioning
• LLDP based distributed Link/Neighbor Discovery
• Data Plane Stats
• Data Plane Troubleshooting
• Node/link characteristics, capabilities & constraints (eg. table-types, bw etc.)
• Scope of identifiers, namespaces & association with nodes/intfs
• Verifying configuration vs. discovered resources
• Proxy edge services – eg. ARP, ICMP

SPRING-OPEN Control Plane Requirements: Forwarding Service
• Consistent Update Manager: responsible for consistency requirements when updating
  - Multiple entries in a table
  - Multiple tables in a switch
  - Multiple switches in a network
• C2D Sync Manager: responsible for syncing controller-to-data-plane forwarding state
• C2C Sync Manager: responsible for syncing controller-to-controller forwarding state

SPRING-OPEN Control Plane Requirements: Controller System
[Figure labels, grouped by component]
• Conn. Mgr. / Event Engine: OpenFlow 1.3; New Handshake – better error handling, better state machine (SM)
• Typed Table Abstraction: TTP1, TTP2, TTP3
• Visibility/Debug Fwk.: REST, GUI, CLI, Dashboard, Tseries, Config, RT view; debugCounters, debugEvents
• HA Manager: Leader Election; Support for EQUALS
• Dist. DB: Dist. key-value store; Persistence; Notifications
Project Members (Committed / Considering)
• Switch Development: NTT (Lagopus), Dell (FTOS), Intel, Broadcom
• Controller Development: ON.Lab (ONOS), ONF
• Switch Contribution: Delta, Dell, NTT
• Advisory, Engineering, Testbed: Verizon, NTT, Google, Tencent, Intel, Broadcom

ON.Lab Involvement
• SPRING-OPEN: IPv4 unicast routing using MPLS labels, following Segment Routing rules
• ONOS, a platform for multiple services: Multi-layer, Overlay, Security
• ONOS, a platform for multiple switch types: Typed Table Hardware, Software Switches, Un-typed tabled hardware, Optical Switches

ONOS System Architecture, v0.1.5 (current)
[Figure: Control Applications use the ONOS Graph API on top of the Network Graph (eventually consistent global view). ONOS Instances 1, 2 and 3 each run Intent F/W, a Topology Replica and OpenFlow Manager+ (+Floodlight Drivers), with hosts attached to the network below.]
• Event Notifications: Hazelcast
• Persistence: RAMCloud, low-latency k/v store (Strongly Consistent)
• Coordination: Zookeeper, Distributed Registry (Strongly Consistent)

Progress
[Timeline figure: mid-May, 1st June, 1st July, 8th August, 1st Sept, mid Oct, end Nov; branches master, onos13 and onos13integration; markers 25, 26, 27]
• onos13: OF 1.3 support, Driver Manager, I/O State Machine, Role management, Debug framework
• onos13integration: Unit tests, Manual Integration

New ONOS (1.0 + 1.3)
[Figure: New ONOS is built from Old ONOS and Newer Floodlight, with several parts marked "modified", plus the new changes below]
Old ONOS (1.0 switches)
- old state machine (or lack thereof)
- old switch/port handling
- registry service (zookeeper)
- role management/changer
- ONOS storage + upper functionality
- old controller
Newer Floodlight (1.0 switches)
- new I/O state machine
- new switch/port handling
- new role management
- new debug framework
- new storage/sync-manager
- new controller
- switch manager
- role manager
New Changes (1.3 switches)
- new OF Library (Loxigen)
- new support for different switches - DriverManager
- support for Role.EQUAL
- simultaneous support for 1.0 and 1.3 switches
- prototyping
Test & Integration
- integration with master
- unit test coverage > master
- ensured nightly tests are passing
- ensured global context and app functionality
- reviewed and merged to master

Progress
• Prototyping: CPqD13, OVS13, Dell13

SPRING-OPEN Hardware Abstraction
[Figure: OpenFlow pipeline. The incoming packet enters at the Ingress Port; Pkt. + Meta-Data + Action Set {} pass between the flow tables; Apply Actions and the Group Table lead to the Egress Port and the outgoing packet.]
• Flow tables, by table number: VLAN Flow Table [0], Termination MAC Flow Table [10], Unicast IPv4 Routing Flow Table [20], MPLS Forwarding Flow Table [30], ACL Policy Flow Table [50]
• Apply Actions: push/pop mpls, TTL, Set, Output or Group
• Group Table Entries: L3 Unicast, MPLS Unicast, ECMP

Progress
• Network Config Manager

ONOS NetworkConfigManager
[Figure: each ONOS Instance loads a Startup Config from a config file into the Network Config Mgr.; a Config Service (CLI/REST) and the Running Config; a Topology Publisher for hosts, switches and links; instances connected by a Channel]

Filtering Logic
• Restrict switches? No → Default Allow
  - Has Config? No → ACCEPT
  - Has Config? Yes → Allowed? No → DENY (Deny list); Yes → ACCEPT & ADD
• Restrict switches? Yes → Default Deny
  - Has Config? No → DENY
  - Has Config? Yes → Allowed? No → DENY; Yes → ACCEPT & ADD (Allow list)

Progress
• New branches: ntt, cli, gui, dell, onos-spring
• SR Prototype (onos-spring): Saurav (ONF), Sangho (ON.Lab), Srikanth (Ericsson/ON.Lab partner)

Dell Switch Progress
• Delivered two switches with pre-alpha software for integration with controller

Demo
• Default Segment Routing with MPLS (node-segments) and ECMP shortest-paths - communication between subnets across the SR WAN works
• ARP/ICMP handling, subnet-configuration, pinging router-IPs (normal router behavior) works
• Link and Switch failure recovery works
• Policy routing works for one use-case - creating an SR tunnel and assigning flow(s) to it
• Segment stitching works (where a tunnel requires pushing more than 3 labels, we stitch segments of the tunnel to get around hardware limitations)

Demo
[Figure: routers 101 to 106 with router IPs 192.168.0.1 to 192.168.0.6; hosts h1 and h6; subnets 7.7.7.0/24 and 10.0.1.0/24]
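
The segment stitching used in the demo can be sketched as splitting a deep label stack into pieces the hardware can push at once. The sketch below assumes a push limit of 3 labels (from the demo description) and assumes that the last label of each piece is a node segment naming the router that performs the next push; the function name `plan_stitching` and the `"ingress"` marker are hypothetical.

```python
def plan_stitching(label_stack, max_push=3):
    """Split a label stack into stitched pieces.

    Returns a list of (where, labels_to_push) pairs. The first piece is
    pushed at the ingress router; each later piece is pushed at the router
    identified by the last label of the previous piece, which is assumed
    to be a node segment.
    """
    if max_push < 1:
        raise ValueError("max_push must be at least 1")
    pieces = [label_stack[i:i + max_push]
              for i in range(0, len(label_stack), max_push)]
    plan = []
    where = "ingress"
    for piece in pieces:
        plan.append((where, piece))
        where = piece[-1]   # stitching point for the next piece
    return plan


# Explicit path from Scenario # 4: label stack 102 103 105 104 106.
plan = plan_stitching([102, 103, 105, 104, 106])
for where, labels in plan:
    print(where, "pushes", labels)
```

For the Scenario # 4 stack this yields a push of 102 103 105 at ingress and a push of 104 106 at router 105, matching the "Pop 105, Push 104 106" step shown on that slide.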
Options for Extensibility
• Extend the controller for hierarchical, geographically distributed control

SDN WAN Architecture
[Figure: Global Controllers above Local Controllers; sites connected by WAN links]

Google’s B4 Architecture
[Figure: a global layer (Gateway, Central TE Servers) above Site A, Site B and Site C controllers; site controller servers run Quagga, RAP, TE-AGENT, OFC and Paxos; switch hardware runs OFA; iBGP, ISIS across the B4 WAN; eBGP to the Data Centers. © 2013 SDN Academy, LLC™. All Rights Reserved.]

Microsoft’s SWAN Architecture
[Figure 5: Architecture of SWAN. Service hosts and service brokers in each datacenter, network agents and switches, and a SWAN controller over the inter-DC WAN]

Options for Extensibility
• Add E-BGP on the controller for exchanging reachability information, route selection and more

Options for Extensibility
• Provide L3VPN/VPLS/VPWS services
• Provide full blown TE solution with bandwidth optimization, calendaring etc.
• Extend control plane to work with optical switches / networks
• Interoperability with traditional LDP/IGP control plane

IP Routing without an IGP
[Figure: topology of routers 100 to 110, shown in three steps]
• Consistent updates – loop free updates

Segment Stitching
[Figure: routers 101 to 106 with the desired explicit path]
• Desired Explicit Path requires label stack: 102 103 105 104 106
• Deep-stacks can cause problems in merchant silicon
1) Cannot push many labels all at once
2) Can cause loss of entropy if hw cannot read down to L3/L4 headers
• Solution: use Segment stitching
• Stitching Segments: Pop 105, Push 104 106

B4’s In-Place Replacement Model
[Figure: Google’s B4 architecture, as shown earlier. © 2013 SDN Academy, LLC™. All Rights Reserved.]

SPRING-OPEN’s Parallel Nw Model
[Figure: an SDN Fabric alongside the Traditional Network, forming a Parallel Network]
• parallel SDN fabric, interacts with traditional network and outside world using E-BGP
• small number of sites
• low volume of production traffic
• as confidence is gained, grow users at site, increase footprint to more sites

Options for Extensibility
• In-band control
• Add FRR to data plane recovery
• Deeper buffers & QoS in white-box platform
• Scale-out Segment Routers with white-boxes
• More OAM / troubleshooting features
• Security features
• Multicast/IPv6
• … and much more

Summary
• Motivation & Project Goals
  - Demonstrate maturity & scale of ONF work product
  - Promote adoption by creating core-kernel
• Project Description
  - SDN based WAN control of Open Segment Routers
  - Controllers, Bare-metal, merchant-Si, MPLS, OF1.3
  - Prototype & Demonstrate several Segment Routing scenarios in 6 months – multi-member-company effort
• Progress
  - Prototyping with software switches using OF1.3
  - Integration with Dell hardware switch beginning Nov
• Next
  - Lots of extensibility options for value-add, interoperability and deployment