A hybrid networking architecture

Malathi Veeraraghavan and Admela Jukan
University of Virginia
mvee@virginia.edu
Feb. 28, 2010

1 Introduction

A recent funding opportunity announcement [1] defines hybrid networking as follows: A hybrid networking paradigm combines traditional packet and circuit switching concepts over a single integrated backbone network to provide differentiated network services to high-end science applications with different end-to-end networking performance requirements. Further, it lists the key networking challenges of hybrid networks as: dynamic allocation of resources across multiple networking modes, hybrid networking traffic engineering services and inter-domain peering services, and protection and recovery mechanisms for hybrid networks. The purpose of this document is to define a hybrid networking architecture that meets these networking challenges.

2 Background: Current ESnet deployment and services

Figure 1 shows the ESnet deployment as of Summer 2009. It consists of two nodes at each PoP, a core IP router and a Science Data Network (SDN) MPLS switch, which are interconnected by one or more 10 Gb/s links. There are multiple inter-PoP links, some interconnecting core IP routers and others interconnecting SDN switches. There are two primary types of connectivity services offered by ESnet [2]: IP-routed services and virtual circuit services. The IP-routed services are supported by the core IP routers, and the virtual circuit services are supported by the SDN switches. The latter also require an OSCARS (On-demand Secure Circuits and Advance Reservation System) InterDomain Controller (IDC) [3], a circuit scheduler through which users and applications can request circuits of known durations and rates for immediate or future use. In addition to the router/switch equipment at the PoPs, ESnet deploys its own routers in many of the site networks (a), referred to as Provider Edge (PE) routers. Customer Edge (CE) routers owned and operated by the sites connect to these PE routers within the sites. PE routers typically connect to the core IP routers at the PoPs, though some site routers connect to the core SDN switches directly as well, e.g., Fermilab. Packet filters and route filters are executed on these site PE routers. It is preferable to execute these filters at the site PE routers rather than at the core routers, because core routers handle traffic from multiple sites, which would make the aggregate size of the filter files unmanageably large (per-site inbound and outbound filters often exceed 10K lines [4]). ESnet also peers with commercial networks and other research-and-education networks (RENs).

(a) The term "site" is used to refer to national laboratories and other organizations connected to ESnet.

Figure 1: Summer 09 deployment [Source: http://www.es.net/pub/maps/current.pdf]

3 Integrated network

The first step in designing a hybrid network, as per the definition stated in Section 1, is to determine equipment suitable as integrated nodes. The next step is to determine how these integrated nodes, located at PoPs and possibly some sites, are interconnected.

3.1 Integrated node

For this context, an integrated node is defined as one that supports both (i) IP packet forwarding, and (ii) virtual circuit switching. Given the ubiquity of IP-routed services, IP packet forwarding capability is a must in any integrated node. For virtual circuit switching, there are a number of technology choices: SONET/SDH, WDM, MPLS variants, and Ethernet VLANs/Carrier Ethernet.
Circuit switches, such as SONET/SDH switches and WDM switches, are one potential choice. Multi-Service Provisioning Platforms (MSPPs), such as Ciena's Core Director, integrate Ethernet VLAN capability with SONET/SDH switching. These MSPPs are used in Internet2's Dynamic Circuit Service (DCS) deployment. However, they do not support IP packet forwarding. MPLS is built into IP routers, and is the current solution used for the virtual circuit services provided by ESnet. Equipment such as Juniper's M, MX and T series systems, and Cisco's Catalyst 6500 series, among others, supports IP packet forwarding, MPLS switching and Ethernet VLAN switching. VLAN switching is made more scalable with new standards defined under the umbrella term "Carrier Ethernet." The InterDomain Controller Protocol (IDCP) [5] has been defined to offer users inter-domain virtual circuits. Recalling Metcalfe's law, that value grows as the square of the number of connected endpoints, this extension of virtual circuit services to inter-domain usage is highly important. SONET/SDH, WDM, and MPLS technologies have no data-plane constraints for scaling to inter-domain usage. However, for Ethernet VLANs, Carrier Ethernet standards, such as IEEE 802.1ad (Provider Bridges) and 802.1ah (Provider Backbone Bridges), are required to overcome the scalability problems of the basic 12-bit VLAN identifier [6]. In summary, the key requirements for an integrated node are that it supports: (i) IP packet forwarding, and (ii) a virtual circuit switching technology that scales for inter-domain usage.

3.2 Integrated links

Figure 2 depicts an integrated node that meets the high-level requirements specified in Section 3.1 in that it supports both IP packet-forwarding capability and a scalable virtual circuit switching capability. In this section, we focus on how the links can be shared between IP-routed and virtual circuit services. Shown in Figure 2 are access links to sites and peers and inter-PoP links, on both of which static circuits (shown in blue) and dynamic circuits (b) (shown in red) can be provisioned. IP addresses will be assigned to the static circuits, causing packets arriving on them to be handled by IP packet forwarding. Frames arriving on the dynamic circuits will be handled by the virtual circuit switch (a minimal sketch of this sharing model appears below). Figure 3 shows an illustrative example of how integrated nodes can be deployed using a part of the ESnet topology. Five PoPs, PNWG, DENV, ALBU, ELPA and SUNN, and three sites, PNNL, LBL and LANL, are shown. Since some sites may have just IP routers while others may have integrated nodes, Figure 3 shows, as an example, only an IP router at the PNNL site (which could be a PE or CE router) and integrated nodes at the LBL and LANL sites. Links between PoPs are shown as 100GbE. Blue dashed lines depict static circuits provisioned via a network management system, such as the Spectrum NMS currently used by ESnet [2]. A red dashed line depicts a dynamic circuit set up between LBL and LANL, which is created by the OSCARS IDC. Projects such as TeraPaths [7], StorNet [8], and ESCPS [9] will increase the number of end applications that request the use of virtual circuits.

Figure 2: Integrated node and links

Figure 3: An example integrated (hybrid) network deployment for a part of the ESnet topology

(b) The term "dynamic circuit" is used as a generic term covering both circuits and virtual circuits, depending on the technology used. ESnet refers to this service in [2] as just "virtual circuit" service. Hence these terms are used interchangeably in this document.
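To make the link-sharing model concrete, the following is a minimal sketch in Python, assuming a simple software model in which each integrated link's capacity is split between a static allocation that carries IP-routed traffic and a pool from which dynamic circuit reservations are admitted. The class and method names are illustrative placeholders, not an actual ESnet, Spectrum, or OSCARS interface; the 100GbE/10 Gb/s split anticipates the example allocation given in Section 4.1.

```python
# Illustrative model of one integrated link shared between the two services.
class IntegratedLink:
    def __init__(self, capacity_gbps, static_gbps):
        if static_gbps > capacity_gbps:
            raise ValueError("static allocation exceeds link capacity")
        self.capacity = capacity_gbps              # total link capacity
        self.static = static_gbps                  # reserved for IP-routed service
        self.dynamic_pool = capacity_gbps - static_gbps
        self.reservations = {}                     # circuit id -> reserved rate

    def reserve_dynamic(self, circuit_id, rate_gbps):
        """Admit a dynamic circuit reservation if the dynamic pool has room."""
        in_use = sum(self.reservations.values())
        if in_use + rate_gbps > self.dynamic_pool:
            return False                           # reject: pool exhausted
        self.reservations[circuit_id] = rate_gbps
        return True

    def release_dynamic(self, circuit_id):
        self.reservations.pop(circuit_id, None)

# Example: a 100GbE inter-PoP link with 10 Gb/s set aside for static circuits.
link = IntegratedLink(capacity_gbps=100, static_gbps=10)
print(link.reserve_dynamic("LBL-LANL-1", 40))      # True
print(link.reserve_dynamic("PNNL-LBL-1", 60))      # False: only 50 Gb/s left in pool
```

In an actual deployment, this admission decision for dynamic circuits would be made by the OSCARS IDC against the capacity allocated to the dynamic circuit service, as discussed in Section 4.1.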
Such an integrated use of links for both IP-routed traffic and dynamic virtual circuit traffic requires the support of additional network functionality, which is described in the next section.

4 Additional functionality in hybrid networks

As stated in Section 1, three networking challenges are identified for hybrid networks in [1]. These are: dynamic allocation of resources across multiple networking modes, hybrid networking traffic engineering services and inter-domain peering services, and protection and recovery mechanisms for hybrid networks.

4.1 Dynamic allocation of resources between IP-routed service and dynamic circuit service

The capacity of all inter-PoP links and site/peer access links needs to be divided between IP-routed services, managed by the Spectrum NMS, and virtual circuit services, controlled by the OSCARS IDC. For example, each 100GbE link between PoPs may be divided into a 10 Gb/s allocation for static circuits between PoPs to carry IP-routed traffic, with the remaining 90 Gb/s allocated to the dynamic circuit service. In practice, the OSCARS IDC could manage the creation/deletion of both dynamic circuits and static circuits, with the Spectrum NMS used for monitoring and other functions. Even in this scenario, capacity should still be divided between static circuits and dynamic circuits to prevent starvation of either of these two service types. For an optimal computation of capacity allocations for the two types of services, IP-routed and dynamic circuit services, network management systems that have a complete view of the current routing and traffic conditions are required. These consist of:

Hybrid route monitoring servers: Basic route monitoring servers have been implemented to listen to, but not actively participate in, the distributed routing protocols executed by the route processors of IP routers. Special monitoring systems such as the OSPF Monitor [10] have been successfully deployed in large ISP networks and have been integrated within monitoring and management systems in order to identify faults in IP networks [11]. Similar systems can be implemented for IS-IS, BGP, and any other routing protocols deployed by ESnet. New hybrid route monitoring servers, which are required to support hybrid capacity allocation servers (see below), could build on these systems and add functionality, such as determining the cause of routing changes and the effects of these changes on routing in the network.

Hybrid traffic monitoring servers: Basic traffic monitoring servers are deployed in ISPs to periodically read out link loads measured by SNMP agents running within IP routers. Averaging is done on the order of 10-30 seconds. Traffic matrices (showing traffic levels between each pair of PoPs) can be estimated from these link-level measurements using techniques such as the Gravity model [12] (see the sketch below). Another technique used to determine traffic matrices is to deploy a full mesh of MPLS LSPs between PoPs with no rate policing/limiting, and then obtain SNMP traffic measurements on these LSPs [13]. Some such mechanism is required to determine traffic matrices for both the IP-routed and dynamic circuit services, so that optimization tools can then compute new capacity allocations.
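As an illustration of the traffic-matrix estimation step, the following is a minimal sketch of the Gravity model referenced above, assuming per-PoP ingress and egress totals have already been derived from SNMP link-load measurements. The PoP names and traffic volumes are invented for the example and do not represent measured ESnet traffic.

```python
# Minimal Gravity-model sketch: T[i][j] is proportional to ingress[i] * egress[j].
def gravity_traffic_matrix(ingress, egress):
    """Estimate PoP-to-PoP demands from per-PoP ingress/egress totals."""
    total = sum(ingress.values())
    return {
        i: {j: ingress[i] * egress[j] / total for j in egress if j != i}
        for i in ingress
    }

# Hypothetical measured totals (arbitrary units, e.g., Mb/s averaged over 30 s).
ingress = {"PNWG": 400.0, "DENV": 250.0, "SUNN": 350.0}
egress  = {"PNWG": 300.0, "DENV": 300.0, "SUNN": 400.0}

tm = gravity_traffic_matrix(ingress, egress)
for src, row in tm.items():
    for dst, rate in row.items():
        print(f"{src} -> {dst}: {rate:.1f}")
```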
Hybrid capacity allocation servers: Using information from the hybrid route monitoring and traffic monitoring servers, forecasting can be done to project expected traffic for a time interval into the future. Optimization algorithms can then be executed to determine the ideal capacity divisions between IP-routed and dynamic circuit services. The allocations determined for the IP-routed service can be implemented with rate policing on the static circuits established between the integrated nodes. The allocations determined for the dynamic circuit service would need to be communicated to the OSCARS IDC for its use as it accepts/rejects requests for circuit reservations. This dynamic capacity allocation functionality is important if a service provider chooses to operate the network at high levels of link utilization. However, service providers typically operate their links at less than 50% utilization, both to absorb sudden surges, as in the REN community (e.g., Internet2 has a stated "headroom practice" of operating links at a maximum of 25-30% utilization to "enable researchers to engage in unpredictable large-bandwidth applications" [14]), and to handle additional traffic loads caused by rerouting if and when failures occur (an often-cited reason by commercial providers for maintaining low link utilization). In other words, service providers typically overprovision their link capacities by adding new links as traffic loads increase. Furthermore, dynamic capacity allocation for the IP-routed service could require rerouting of some of the static circuits and/or rerouting of IP-layer traffic. Operations divisions of service providers typically have strong resistance to changing the network topology because of the potential for "route flaps" and drastic changes in end-to-end packet latency (e.g., greater than 10 ms). For these reasons, the frequency with which dynamic capacity reallocations will be required is likely to be low.

4.2 Hybrid network traffic engineering services

The ESnet services document [2] explains the traffic-engineering services deployed on today's ESnet as follows: "ESnet employs a variety of techniques to make the best use of the resources deployed, these include: scavenger service, and site specific traffic engineering to support programmatic needs such as LHC Tier1 to Tier2 support at FNAL." The scavenger service is implemented to be "consistent with Internet2's QBone Scavenger Service (QBSS)" and is "done by allocating a separate queue within each router in ESnet and configuring it with an aggressive drop profile and minimal service quota" so that large bulk transfers do not impact day-to-day traffic. The site-specific traffic engineering is implemented with Policy-Based Routing (PBR) to map specific traffic flows onto virtual circuits established through the dynamic circuit service (using the OSCARS IDC). The integrated (hybrid) network architecture, described in Section 3, should build on these deployed traffic engineering services. There are two approaches, both of which can be viewed as hybrid element management systems: LambdaStation [15] and HNTES [16]. In the LambdaStation approach, end applications, such as dCache/SRM, signal LambdaStation servers associated with IP routers that a particular flow being generated by the application would prefer the use of virtual circuits. The LambdaStation server communicates with the OSCARS IDC to dynamically create the circuit, and then configures the IP router using PBR to redirect packets corresponding to that flow to the newly established circuit. By deploying such LambdaStation servers, better use of the two service types is accomplished with the help of end-user applications. A sketch of this control flow appears below.
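The following is a minimal sketch, purely as an illustration of the control flow just described: an end application flags a flow, a server requests a dynamic circuit from the IDC, and PBR is configured to steer the flow onto that circuit. Every name in it (classes, functions, the IDC URL, the router name, the flow parameters) is a hypothetical placeholder and does not correspond to actual LambdaStation or OSCARS IDC interfaces.

```python
from dataclasses import dataclass

@dataclass
class FlowSpec:
    src_ip: str
    dst_ip: str
    dst_port: int
    rate_gbps: float

def request_circuit(idc_url: str, flow: FlowSpec, duration_s: int) -> str:
    """Placeholder for a circuit reservation request to the IDC; returns a circuit id."""
    # In a real deployment this would be a web-service call to the OSCARS IDC.
    return f"circuit-{flow.src_ip}-{flow.dst_ip}"

def install_pbr_rule(router: str, flow: FlowSpec, circuit_id: str) -> None:
    """Placeholder for configuring policy-based routing on the site/PE router."""
    print(f"{router}: steer {flow.src_ip}->{flow.dst_ip}:{flow.dst_port} onto {circuit_id}")

def handle_application_hint(flow: FlowSpec) -> None:
    """An end application signals that this flow would prefer a virtual circuit."""
    circuit_id = request_circuit("https://idc.example.net/", flow, duration_s=3600)
    install_pbr_rule("pe-router-example", flow, circuit_id)

# Example invocation with documentation-range IP addresses and an arbitrary port.
handle_application_hint(FlowSpec("198.51.100.10", "203.0.113.20", 5000, 5.0))
```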
In the Hybrid Network Traffic Engineering Software (HNTES) approach, NetFlow data collected by routers (NetFlow collection is currently enabled in ESnet routers) is analyzed offline using an Offline Flow Analysis Tool (OFAT). Unlike on the Internet, where P2P flows that implement port masquerading make 5-tuple identification of long flows challenging, in the ESnet context it is easier to detect long flows generated by scientific applications. For example, the Unidata LDM application used by climate scientists runs on a well-known TCP port, 388. An analysis of Internet2 NetFlow data has already shown that many flows generated by this application are indeed of long duration (running tens of minutes to hours). When long flows are identified by OFAT, their flow identifiers (a subset of the 5-tuple: source IP address, destination IP address, source port number, destination port number, and IP protocol number) are placed in a monitored flow database (MFDB). The associated IP router is configured to mirror packets from these flows to the HNTES system. A flow monitoring software module executed within the HNTES system captures these packets and then communicates with the OSCARS IDC to create a dynamic circuit for a particular flow. Again, PBR is used to redirect packets within the IP router to the newly established circuit. These two hybrid network traffic engineering schemes ensure a more effective usage of the capacity allocations that are determined by the hybrid capacity allocation servers described in Section 4.1. The hybrid capacity allocation algorithms cannot be executed too often, as doing so could cause instabilities. Therefore, these hybrid network traffic engineering servers ensure that traffic is more effectively spread between these two allocations. Even as user applications initiate flows directed to the IP-routed service, these traffic engineering servers selectively redirect some of these flows to the virtual circuit service to improve overall performance.

4.3 Protection and recovery mechanisms for hybrid networks

The ESnet services document [2] notes that "ESnet's multiple ring backbone topology insures that no single backbone circuit failure will cause an outage to a site. The internal routing protocols are configured to switch to a backup path within 2 seconds upon determining a backbone link has failed." The topology in Figure 1 shows these rings. For example, if there is a failure on the link between DENV and KANS, the ring passing through HOUS, ELPA and ALBU can be used to reroute IP-routed traffic. This means restoration is occurring at the IP-routed layer (Layer 3). In the integrated (hybrid) network described in Section 3, since ESnet would have its own capability to provision circuits via its virtual circuit switching engines, instead of establishing single static circuits between PoPs for the IP-routed service as shown in Figure 3, two path-disjoint circuits could be established for the IP-routed service, with one circuit being the working path and the second a protection path (a sketch of disjoint-path selection appears below). This would consume more bandwidth but offer faster restoration than the IP-layer restoration implemented today. It does, however, require the virtual circuit switching technology implemented within the integrated nodes, as shown in Figure 2, to support automated protection switching schemes such as those offered by SONET or MPLS fast reroute. Whether Carrier Ethernet supports such automated protection should be considered when choosing the integrated node equipment.
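The idea of provisioning a working circuit and a link-disjoint protection circuit between a pair of PoPs can be illustrated with a small sketch. The topology below is an invented toy fragment that reuses PoP names from Figures 1 and 3, and the "shortest path, then shortest path avoiding its links" heuristic is purely illustrative; it is not the path computation used by OSCARS or the Spectrum NMS.

```python
from collections import deque

def shortest_path(adj, src, dst, banned=frozenset()):
    """Breadth-first search that avoids the undirected links listed in `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path, cur = [], dst
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return list(reversed(path))
        for nbr in adj[node]:
            if nbr not in prev and frozenset((node, nbr)) not in banned:
                prev[nbr] = node
                queue.append(nbr)
    return None

def disjoint_pair(adj, src, dst):
    """Return a working path and a link-disjoint protection path (simple heuristic)."""
    working = shortest_path(adj, src, dst)
    if working is None:
        return None, None
    used = {frozenset(e) for e in zip(working, working[1:])}
    return working, shortest_path(adj, src, dst, banned=used)

# Invented toy topology using PoP names from the figures (hop counts only).
adj = {
    "SUNN": ["DENV", "ELPA"],
    "DENV": ["SUNN", "KANS", "ALBU"],
    "KANS": ["DENV", "HOUS"],
    "HOUS": ["KANS", "ELPA"],
    "ELPA": ["SUNN", "ALBU", "HOUS"],
    "ALBU": ["DENV", "ELPA"],
}
work, prot = disjoint_pair(adj, "SUNN", "KANS")
print("working:   ", work)    # SUNN-DENV-KANS
print("protection:", prot)    # SUNN-ELPA-HOUS-KANS
```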
Alternate schemes are possible in which IP routing tables list routes via the backup virtual circuits, but the backup virtual circuits are kept in a "down" state until failures occur on the primary virtual circuits. When a fault-management NMS sees an alarm indicating a failure of a primary virtual circuit, it could signal the router to bring up the backup virtual circuit interface, allowing the IP packet forwarding engine to immediately find reachability via the backup virtual circuits for the affected addresses. This form of restoration would be faster than IP (Layer 3) restoration, such as the ring-based solution used in ESnet today, since the latter requires IGP routing protocol messages to be exchanged before new reachability information stabilizes for addresses that become unreachable when a link fails. In addition to an immediate protection switch to a backup circuit, commercial service providers such as AT&T [17] implement a two-phase approach leveraging the hybrid nature of these integrated networks. Phase 1 is the immediate protection switch to the backup circuit. Phase 2 consists of a hybrid fault-management NMS computing an alternate set of two paths (working and protection), and then communicating with either the OSCARS IDC or the Spectrum NMS to establish these circuits, based on whichever of these two software systems handles the static circuits required for the IP-routed service.

5 Literature review on hybrid networks

Urushidani et al. [18] use the term hybrid networks in the following manner: "some academic backbone networks [Internet2 and GEANT2 are cited] focus on providing layer-1 circuit services as well as packet services by using hybrid network architectures composed of IP routers and next-generation SDH/SONET devices". Effectively, this definition does not require an integrated network deployment in order to support the two types of services, unlike the definition provided in [1]. Gauger et al. [19] define a hybrid network as follows: "an optical network architecture is called hybrid if it combines two or more basic network technologies at the same time." Three optical network technologies are named: optical packet switching (OPS), e.g., Nejabati et al. [20], optical burst switching (OBS), e.g., Qiao and Yoo [21], and optical circuit switching (OCS) technologies operating on wavelengths, wavebands or fiber. Optical hybrid networks are then classified as: (a) client-server, (b) parallel, and (c) integrated. The IP-routed services layer is not considered part of this definition of "hybrid" networks. In all three cases of "optical hybrid networks," IP routers are considered the endpoints of the hybrid optical network. In client-server networks, OPS and OBS form "client layers" that use wavelength-, waveband- or fiber-based circuits established through the "OCS server layer." IP packets from IP routers are carried within optical packets through an OPS network, or in bursts through an OBS network. These optical packet switches or optical burst switches are interconnected via optical circuits established a priori through the OCS network. In the parallel optical hybrid network, IP routers can feed packets directly into both the (i) OPS/OBS network, and (ii) OCS network. An example of a parallel hybrid network is the polymorphic multiservice optical network (PMON) proposed by de Miguel et al. [22]. The third class of hybrid architectures is the integrated hybrid network, where the optical circuit switching capability is integrated with an optical packet or burst switch.
The hybrid optical switch (HOS) proposed by Xin et al. [23] combines an OBS with an OCS. Another such integrated OBS/OCS hybrid node was proposed by Lee et al. [24]. Recently, Grid and cloud computing communities have used the term "hybrid networking" in the context of connectivity services for scientific communities, such as in a paper by de Laat et al. [25]. A paper by Yeh et al. [26] discusses the important issue of alarm correlation in combined optical/IP networks. In summary, most of the existing literature uses the concept of "optical hybrid networking" to refer to network technologies that combine different optical switching mechanisms (such as optical circuit switching, optical burst switching, and optical packet switching). Unlike the architecture described in this paper, IP packet forwarding capability is not part of these "optical hybrid nodes." In all cases, IP routers are edge devices that connect to these optical hybrid networks. In addition to these publications on "optical hybrid networking," there is a significant amount of literature on integrating the design of IP-routed networks with Routing and Wavelength Assignment (RWA) algorithms in optical circuit-switched networks, such as SONET/SDH and WDM networks. These papers are not cited here because, in those designs, the optical circuit-switched networks and the IP-routed networks are essentially separate networks, with the former serving the latter by providing point-to-point connectivity between IP routers (e.g., Gauger et al.'s client-server architecture).

6 Summary

There are several advantages to creating an integrated network, consisting of integrated nodes that support both IP packet forwarding and virtual circuit switching, and one set of shared inter-PoP and site/peer access links on which two distinct types of services, IP-routed and dynamic virtual circuit services, are supported. Equipment maintenance costs, collocation service costs, and costs of wide-area link leases will all be lower than in a solution where separate network equipment and separate links are deployed to offer customers these two types of services. Three networking functions are identified to enable the support of these dual services on this single integrated network: (i) dynamic link capacity allocations for the two services, (ii) hybrid network traffic-engineering services to effectively use both capacity partitions, and (iii) hybrid fault management systems for improved protection and recovery capabilities.

References

[1] Office of Science, Financial Assistance, Funding Opportunity Announcement DE-FOA-0000264, "High-Capacity Optical Networking and Deeply Integrated Middleware Services for Distributed Petascale Science," http://www.science.doe.gov/grants/FOA-10-0000264.html
[2] Joseph Burrescia, Michael Collins, and William Johnston, "ESnet Services and Service Level Descriptions Version 4.0," July 17, 2009, http://www.es.net/hypertext/ESnetServiceLevels-V4.0.pdf
[3] https://oscars.es.net/OSCARS/
[4] Conversation with Chin Guok, ESnet, October 2009.
[5] Dante, Internet2, Canarie and ESnet (DICE), "Inter-domain Controller (IDC) Protocol Specification," Feb. 9, 2010. [Online]. Available: http://www.controlplane.net/idcp-v1.1/idc-protocol-specification-v1.1-feb092010.pdf
[6] Samer Salam and Ali Sajassi, "Provider Backbone Bridging and MPLS: Complementary Technologies for Next-Generation Carrier Ethernet Transport," IEEE Communications Magazine, March 2008.
[7] TeraPaths: Configuring End-to-End Virtual Network Paths with QoS Guarantees, https://www.racf.bnl.gov/terapaths/
[8] StorNet, http://indico.fnal.gov/conferenceOtherViews.py?view=standard&confId=2970
[9] End Site Control Plane System (ESCPS), http://indico.fnal.gov/conferenceOtherViews.py?view=standard&confId=2970
[10] A. Shaikh and A. Greenberg, "OSPF monitoring: Architecture, design, and deployment experience," in Proc. Networked Systems Design and Implementation, March 2004.
[11] Use of OSPFMON in TSO Diagnostics tools used by Cisco, http://www.cisco.com/en/US/docs/ios/sw_upgrades/interlink/r2_0/sysmgmt/smtools.html#wp878634
[12] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot, "Traffic matrix estimation: Existing techniques and new directions," in Proc. ACM SIGCOMM, Aug. 2002.
[13] X. Xiao, A. Hannan, B. Bailey, and L. Ni, "Traffic engineering with MPLS in the Internet," IEEE Network, no. 2, pp. 28-33, Mar/Apr 2000.
[14] R. P. Vietzke, "Internet2 Headroom Practice," Aug. 15, 2008, https://wiki.internet2.edu/confluence/download/attachments/17383/Internet2+Headroom+Practice+8-1408.pdf?version=1
[15] The Lambda Station Project. [Online]. Available: http://www.lambdastation.org/
[16] Hybrid Network Traffic Engineering Software (HNTES), http://www.ece.virginia.edu/mv/research/DOE09/documents/deliverables/feb2010/mv-hyntes.pdf
[17] A. Chiu, G. L. Choudhury, G. Clapp, R. Doverspike, J. W. Gannett, J. G. Klincewicz, G. Li, R. A. Skoog, J. L. Strand, A. C. Von Lehmen, and D. Xu, "Network Design and Architectures for Highly Dynamic Next-Generation IP-Over-Optical Long Distance Networks," Journal of Lightwave Technology, vol. 27, no. 12, pp. 1878-1890, June 2009.
[18] S. Urushidani, S. Abe, Y. Ji, K. Fukuda, M. Koibuchi, M. Nakamura, S. Yamada, K. Shimizu, R. Hayashi, I. Inoue, and K. Shiomoto, "Design of versatile academic infrastructure for multilayer network services," IEEE Journal on Selected Areas in Communications, vol. 27, no. 3, pp. 253-267, April 2009.
[19] C. M. Gauger, et al., "Hybrid Optical Network Architectures: Bringing Packets and Circuits Together," IEEE Communications Magazine, August 2006.
[20] R. Nejabati, G. Zervas, D. Simeonidou, M. J. O'Mahony, and D. Klonidis, "The 'OPORON' Project: Demonstration of a Fully Functional End-to-End Asynchronous Optical Packet-Switched Network," Journal of Lightwave Technology, vol. 25, no. 11, pp. 3495-3510, Nov. 2007.
[21] C. Qiao and M. Yoo, "Optical burst switching (OBS) - a new paradigm for an optical Internet," Journal of High Speed Networks, vol. 8, no. 1, 1999.
[22] Ignacio de Miguel, et al., "Polymorphic Architectures for Optical Networks and their Seamless Evolution towards Next Generation Networks," Photonic Network Communications, vol. 8, no. 2, September 2004.
[23] C. Xin, C. Qiao, Y. Ye, and S. Dixit, "A Hybrid Optical Switching Approach," in Proc. IEEE GLOBECOM, 2003.
[24] Gyu Myoung Lee, Bartek Wydrowski, Moshe Zukerman, Jun Kyun Choi, and Chuan Heng Foh, "Performance Evaluation of an Optical Hybrid Switching System," in Proc. IEEE GLOBECOM, 2003.
[25] C. de Laat, et al., "A distributed topology information system for optical networks based on the semantic web," Optical Switching and Networking, vol. 5, no. 2-3, June 2008, pp. 85-93.
[26] E. Yeh, et al., "Design of Alarm Management System in Hybrid IP/Optical Networks," in Proc. International Conference on Advanced Information Networking and Applications Workshops, 2009.