Hybrid Networks
What are they? And why should the E-VLBI community care?
Jerry Sobieski
Mid-Atlantic Crossroads
Sept 18, 2006
MIT Haystack Observatory

Hybrid Networks
• Hybrid Networks refer to emerging network technologies that allow the Internet to support both traditional packet-based services and [new] connection-oriented services.
  – What is old is new again
• So how do these “new” connection-oriented services differ from the current Internet architecture?

Outline
• What's the problem with the Internet?
  – Why is it so hard to take advantage of high performance nets?
  – How does the network transport data?
  – Some of the implications, and why these issues are still alive
• What are hybrid networks?
  – How do they work?
  – How do they address the legacy performance issues?
• What is the state of the art in hybrid networks?
• Some prospects for the E-VLBI community
  – Some suggestions as to how hybrid networks could be applied to the global consortium of VLBI resources…

What is the problem?
• Despite the increasing reach and link capacity of global R&E networks, these networks are still unable to deliver deterministic, predictable, and repeatable performance to demanding applications (such as E-VLBI).
  – Real-time transfer of E-VLBI data across the globe exposes these flows to unpredictable conditions within the network itself, and imposes stringent requirements on the end systems to adapt to those conditions.
  – Near-real-time transfers still require “fairly reliable” transfer of very large data sets. Such a transfer would [ideally] go as fast as possible… but the achievable rate can be highly variable in best effort networks, and unconstrained flows wreak havoc on reliability and congestion mitigation protocols…
• So what could possibly be happening in the network that makes a simple file transfer so difficult to make fast? (And how do these hybrid networks come into play?)
The “best effort” Internet (basic concepts only)
• The current Internet still relies on a “best effort” service philosophy:
  – The network does not [cannot] guarantee delivery of every packet
    • Congestion can cause packets to be dropped (not enough buffers…)
    • Link errors corrupt packets (there is no point delivering known-faulty data…)
  – The network uses only one preferred path for forwarding packets – even when other paths are available, and even when that path is congested.
    • Fast-path forwarding in router hardware is optimized for this.
• Traditional solution(s):
  – Increase the buffers at each hop along the path
    • This increases latency and jitter, and only reduces (does not eliminate) the probability of loss… and any loss causes a significant performance drop.
  – Increase the backbone speed to better match aggregation needs
    • This doesn't scale in the R&E world… How many 1GE access links does it take to fill a 10Gbps backbone link?

So why do big flows create problems?
(or: How do we establish “reliable” flows across the Internet?)
• The Internet supports two key transport layer mechanisms:
  – User Datagram Protocol (UDP)
    • Transmits data packets from one application to another with no guarantee of delivery, i.e. if a packet is dropped… it just never shows up, and the application must be able to detect and deal with that.
  – Transmission Control Protocol (TCP)
    • Reliable data transport protocol
    • Provides a mechanism to detect missing packets and have the source retransmit them. Packets are [re]transmitted until all have been received, and all are delivered to the application in the order sent.

Implications of TCP #1: bandwidth*delay tuning
• In order to ensure delivery of datagrams, the TCP sender must retain sent datagrams in memory until they are acknowledged by the receiver.
  – This is known as the TCP “window”…
  – TCP will only allow a “window's worth” of data to be unacknowledged
    • i.e. TCP will send packets until the entire window has been sent, and will then transmit more packets only as previous packets are acknowledged.
• This “window” must be large enough to fill the network path from source to destination (round trip delay × bandwidth).
  – A large bandwidth*delay product requires a large window so that the “pipe” remains full and does not sit idle awaiting ACKs.
  – So TCP hosts must be “tuned” to work well when a transfer spans long (global) distances over high capacity links.
• Failure to tune TCP stacks is one of the most common reasons emerging applications are unable to take advantage of high performance R&E networks.
  …but not the only reason.

Implications of TCP #2: congestion management
• Consider the situation where many TCP flows converge on a single router…
• As the output queues fill, arriving packets must be dropped…
• TCP retransmits the lost packets…
• More data is dumped into the network… more congestion… more TCP sessions drop packets… still more retransmitted data… worse congestion…
  – Result: congestive collapse of the network
• In order to prevent congestive collapse, TCP slows (pretty much stops) its transmit rate when congestion is detected.
  – This allows the network to “catch up” and reduces the number of resends required.
  – TCP will back off whenever a packet goes missing – regardless of the … TCP can't tell the difference, nor can it divine where the error occurred or why.
• In order to find a maximum transmission rate that won't cause congestion, TCP must then slowly increase its transmission rate again (“Slow Start”).
• TCP back-off (this slowing of the packet transmission rate) and the “Slow Start” resumption are critical to proper operation of the Internet as a whole, and provide reliable delivery of datagrams…
• E-VLBI needs a predictable and repeatable performance environment to use high speed networks effectively.
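The two TCP implications above can be made concrete with a little arithmetic. The sketch below is a minimal Python illustration, assuming a 1 Gbps end-to-end path with a 100 ms round trip, a 1460-byte segment size, and a classic Reno-style sender that halves its window on a loss and then adds one segment per round trip. The function names are hypothetical, and modern stacks (e.g. CUBIC) recover differently, so treat the results as orders of magnitude rather than predictions.

```python
import math


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth*delay product: bytes that must be unacknowledged ("in flight")
    at any moment to keep the path full, i.e. the minimum useful TCP window."""
    return math.ceil(bandwidth_bps * rtt_s / 8)


def aimd_recovery_s(bandwidth_bps: float, rtt_s: float, mss_bytes: int = 1460) -> float:
    """Rough time for a Reno-style sender to climb back to full rate after a
    single loss: the window is halved, then grows by one segment (MSS) per RTT,
    so regaining the lost half takes about W/2 round trips."""
    window_segments = bdp_bytes(bandwidth_bps, rtt_s) / mss_bytes
    return (window_segments / 2) * rtt_s


if __name__ == "__main__":
    # Assumed example path: 1 Gbps end to end with a 100 ms round trip.
    bw, rtt = 1e9, 0.100
    print(f"window needed: {bdp_bytes(bw, rtt) / 1e6:.1f} MB")           # 12.5 MB
    print(f"recovery after one loss: {aimd_recovery_s(bw, rtt):.0f} s")  # 428 s
    # The same loss on a 10 ms (regional) path recovers 100x faster,
    # because recovery time grows with the square of the RTT.
    print(f"same loss at 10 ms RTT: {aimd_recovery_s(bw, 0.010):.1f} s")
```

A single loss on a long, fat path can cost minutes of reduced throughput, which is why one brief burst of competing traffic is so damaging to a large flow. On a real host the computed window would be applied through the operating system's socket buffer settings (e.g. `SO_SNDBUF`/`SO_RCVBUF` or system-wide TCP buffer limits), which the kernel may cap.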
Implications for High Performance Applications – like E-VLBI
• This “fair share”/“best effort” behaviour is necessary because TCP provides no a priori information about the flow to the network, and the network has not traditionally had the intelligence to use such information had it been present…
• This TCP sharing works respectably well when the offered traffic in a TCP session is only a small fraction (<1%) of the overall link capacity, and many TCP sessions are present…
  – The occasional dropped packet affects very few flows, and affects those only minimally.
• BUT… high performance applications may generate flows that consume all available bandwidth
  – i.e. any other traffic will cause significant variability in performance.
  – TCP may run at 1GE for seconds or minutes, then suddenly back off due to a brief burst, and then take many seconds to get back up to speed.
  – This doesn't work well for real-time applications,
  – and it means that pre-staging of files for near-real-time processing is unpredictable as well.

Enter Connection Oriented Services
• Connection-oriented services are point-to-point paths set up across a network that have dedicated network resources associated with the path.
  – Example: Phone line – 64Kbps of dedicated capacity between the calling party and the called party.
  – Example: OC192 SONET circuit – 9.4 Gbps between Onsala and Haystack (looking ahead)
• In general, connection-oriented services provide a means for the user to specify service requirements for a flow, and allow the network to allocate sufficient resources to that flow a priori (before initiating the flow) and then release those resources when the user no longer requires them.
  – This process is called provisioning, and includes path selection and establishment at each network element along the path.
  – Often this is a manual process, sometimes semi-automated.
  – Emerging experimental networks (such as DRAGON, and similar projects in Japan and Europe) are developing the tools and technologies for fully automated circuit establishment.

The emergence of connection oriented services in the Internet
• Over the last decade, the most widely used technology for establishing connections within the Internet has been Multi-Protocol Label Switching (MPLS)
  – MPLS inserts a shim header into the IP packet that associates the packet with a specific forwarding path through the network.
  – All inbound packets carrying a given label on a given interface will be forwarded to a specific outbound interface, and the input shim label will be swapped at each hop for a new label recognized by the next hop.
• MPLS is an effective means of establishing virtual private networks between a group of end user sites
  – Since it is integrated into IP routers, it is easy to leverage IP networks where these Label Switched Paths (LSPs) are small and numerous compared to backbone link capacity (<1%).
  – Due to its reliance on expensive and complex router technologies, it is not cost effective if the MPLS LSPs are few (i.e. where each LSP requires a significant fraction of the link capacity, >10%) and/or long lived.
    • In these cases, lower layer circuits such as point-to-point Ethernet or SONET are generally more cost effective.
• Internal to Internet service provider networks, MPLS LSPs are used for “Traffic Engineering” in order to allow these networks to overcome forwarding limitations in router hardware
  – LSPs create forwarding adjacencies (tunnels) across the network that make remote routers appear as neighbors to a local router, providing the ability to sort traffic and distribute it across multiple backbone links.
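The per-hop label swapping described for MPLS can be sketched as a toy simulation. This is a simplified Python model, not router code: the router names, interface names, and label values are all hypothetical, and it omits the ingress push (where an IP lookup selects the LSP), penultimate-hop popping, and label stacks. Each router forwards on an exact (inbound interface, inbound label) match and rewrites the label for the next hop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelEntry:
    out_interface: str
    out_label: Optional[int]  # None means "pop": the packet leaves the LSP here


# Hypothetical label forwarding tables, one per router,
# keyed by (inbound interface, inbound label).
LFIB = {
    "R1": {("ge0", 100): LabelEntry("ge1", 200)},
    "R2": {("ge0", 200): LabelEntry("ge2", 300)},
    "R3": {("ge1", 300): LabelEntry("ge3", None)},
}

# Hypothetical links: (router, outbound interface) -> (next router, its inbound interface)
LINKS = {
    ("R1", "ge1"): ("R2", "ge0"),
    ("R2", "ge2"): ("R3", "ge1"),
}


def forward(router: str, in_if: str, label: int) -> list:
    """Follow a labeled packet hop by hop; return (router, label_in, label_out) per hop."""
    path = []
    while True:
        # Exact match on (interface, label): no IP longest-prefix lookup at transit hops.
        entry = LFIB[router][(in_if, label)]
        path.append((router, label, entry.out_label))
        if entry.out_label is None:
            return path  # label popped: the packet exits the LSP at this router
        router, in_if = LINKS[(router, entry.out_interface)]
        label = entry.out_label  # the swapped label is what the next hop recognizes


if __name__ == "__main__":
    print(forward("R1", "ge0", 100))
```

Calling `forward("R1", "ge0", 100)` walks the LSP R1 → R2 → R3 and returns `[("R1", 100, 200), ("R2", 200, 300), ("R3", 300, None)]`. The labels only have meaning between adjacent hops, which is what lets each router choose its own label values.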
Evolution of MPLS to GMPLS
• MPLS introduced connection-oriented capabilities into the Internet service provider network.
• The OSPF-TE and RSVP-TE protocols provide the routing and provisioning capabilities that simplify the creation of LSPs in ISP networks.
• Other network technologies such as metro Ethernet, SONET/SDH, DWDM, etc. began to emerge as both alternatives and supplements to the IP backbone.
• So the routing and signaling protocols of MPLS Traffic Engineering were extended to include a number of new switching technologies.
• Generalized MPLS, or GMPLS, was born.

Generalized Multi-Protocol Label Switching - GMPLS
• GMPLS describes a hierarchy of switching types:
  – Packet – MPLS based LSPs (PSC)
  – Layer2 – Ethernet frame based (L2SC)
  – Time Division Multiplexed – SONET/SDH (TDM)
  – Lambda – wavelength switching (LSC)
  – Fiber – optical fiber switching (FSC)
• GMPLS also includes the routing and signaling protocol extensions to support these technologies: GMPLS-OSPF-TE and GMPLS-RSVP-TE.

“Light Paths”: Terminology for the New Millennium
• The emergence of very high capacity and low cost optical wavelength based telecommunications technologies made the prospect of dedicated and [almost] free capacity an attractive and seemingly achievable networking nirvana…
• Alas, waves are not free, or cheap…
• But they are less expensive than traditional carrier services,
• And they provide enormous capacity (10 Gbps is the norm today).
• So the concept of a wave, or “light path”, for every project that needed high capacity or predictable and repeatable performance began to take shape…
  – And the term is now used to describe the new models for circuits and connection-oriented services being explored in current optical networks.
• “Light path” is a new term that refreshes the ideas behind connection-oriented services
  – Light paths complement IP services, and are generally integrated with IP networks, yet they promote the proposition that dedicated, predictable, and repeatable network services are required even today with such high performance networks.

Closing the loop: Hybrid Networks
• Hybrid Networks refer to emerging network technologies that allow the Internet to support both traditional packet-based services and [new] connection-oriented services.
• These services can coexist!
  – A user can [will] be able to access both from their workstation/cluster/lab/etc.
  – IP services will likely run over and in conjunction with Light Path services, but other data formats are possible…
• These services will enable “affinity groups” to establish customized, dedicated, and highly dynamic network infrastructure that suits their needs
  – No longer will such specialized networks be expensive or complex.
  – Such specialized networks will be able to evolve and morph to meet the changing needs of the collaborating organizations…

State of the Art:
• Many projects around the world are exploring hybrid architectures:
  – DRAGON (NSF) – Washington, DC regional network, including the BOSnet link to Haystack
  – HOPI (Internet2) and “newnet” – dynamic wave services, US footprint with international links
  – GEANT (EU) JRA3, national/regional EU networks
  – JGN II (NICT) – several experimental GMPLS testbeds (some with industry carriers)
  – NLR – static wave services, US footprint, international access
• Current capabilities:
  – Static provisioning is common, but still slow (weeks to months).
  – Dynamic establishment of light paths is available in pockets (DRAGON, HOPI), with initial successful experiments over UKLight, Netherlight, NorthernLight, and JGN II.
  – User APIs are being refined – much is already available, but it is still early. More user friendly interfaces are coming this fall and next spring.
  – Application Specific Topologies (ASTs) have been demonstrated, but with limited flexibility. Better version(s) available by SC06 and into spring '07.
• Open research topics:
  – Inter-domain automated provisioning
  – Advanced (book-ahead) scheduling
  – Dynamic ASTs

DRAGON and Friends deployments
[Figure: Operational contiguous GMPLS L2SC dynamic reach: JGN II (JP), SE, UK, NL, HOPI, DRAGON.]

Application Specific Topologies
• Many applications need something more than a single simple point-to-point connection…
  – These apps may need multiple connections between many different locations (e.g. E-VLBI).
  – These connections are needed simultaneously with other non-network resources (e.g. sensors, computational clusters, etc.).
  – These resource sets may change physical layout based upon availability, but the logical topology is persistent…
• Example: E-VLBI

E-VLBI Application Specific Topology
[Figure: Logical e-VLBI topology (correlator C; telescopes X, Y, Z) and physical instantiations of the same application specific topology, mapped onto sites including MIT Haystack, Westford, and NASA Goddard (US), Dwingeloo (NL), Onsala (SE), Kashima (JP), Kokee Park (HI), and Sheshan (CN).]

Application Specific Topologies using XML
<topology>
  <resource>
    <resource_type> eVLBI.Mark5a </resource_type>
    <name> Haystack.muk1 </name>
    <ip_addr> muk1.haystack.mit.edu </ip_addr>
    <te_addr> muk1-ge0.haystack.mit.edu </te_addr>
    <appl> /usr/local/evlbi_script </appl>
  </resource>
  <resource>
    <resource_type> eVLBI.Mark5a </resource_type>
    <name> Westford1 </name>
    <ip_addr> wstf.haystack.mit.edu </ip_addr>
    <te_addr> wstf-ge0.haystack.mit.edu </te_addr>
    <appl> /usr/local/evlbi_script </appl>
  </resource>
  <resource>
    <resource_type> EtherPipeBasic </resource_type>
    <src> Haystack.muk1 </src>
    <dest> Westford.muk1 </dest>
    <datarate> 1 Gbs </datarate>
  </resource>
</topology>

Application Specific Topologies
• Live demonstration at the Internet2 Spring Member Meeting (April 2006, Washington DC)
  – See www.internet2.edu for the webcast of the “HOPI update” presentation.
• Set up global multi-link topologies in ~30 seconds

E-VLBI Application Specific Network
[Figure: Mark 5 units, a correlator/compute cluster, and a visualization station, each attached through a VLSR to the global R&E hybrid infrastructure.]

E-VLBI Application Specific Network
[Figure: Mark 5 units, a correlator/compute cluster, and an HS storage cluster, each attached through a VLSR to the global R&E hybrid infrastructure.]

E-VLBI Application Specific Network
[Figure: Mark 5 units, a correlator/compute cluster, and HS distributed virtual storage, each attached through a VLSR to the global R&E hybrid infrastructure.]

Hybrid Networks and E-VLBI
• The E-VLBI community constitutes an “Affinity Group”
  – i.e. a group of collaborators whose common interests allow and encourage them to work with each other on a myriad of projects, sharing resources and expertise.
• Hybrid network technologies will provide a set of tools to the E-VLBI community that will:
  – Resolve many technical challenges associated with the E-VLBI workflow process
  – Provide a broad range of capabilities that can be employed and integrated into future application architectures
  – Others…

The Emerging Environment
• Hybrid Networks are on their way…
  – Over the next 12-36 months, these services will become more common and broader reaching.
• User interfaces are improving and becoming easier to use:
  – Web based graphical interfaces
  – Programmatic APIs
  – GRID interfaces and integration
• E-VLBI has a great deal of visibility in the network research and engineering world…
  – The network geeks see E-VLBI as one of those “defining” apps.
  – The E-VLBI community should leverage this interest to move the science techniques and infrastructure forward.

The End
• Jerry Sobieski – jerrys@maxgigapop.net

Application Specific Topologies
• The simple E-VLBI application model: [figure]
• A more detailed model: [figure]

The Internet in 2006
• What happens to the Over Engineering BCP in 2006?
  – Access speeds are now measured in Mbps (10 to 100X).
  – Streaming video increases the average flow rate, nominally 10X: MPEG-2 streaming typically runs at 4 Mbps to 45 Mbps.
  – Every laptop now has GE. Data transport is no longer limited by the access hardware.
• How many FE (100Mbps) flows does it take to saturate a 10GE backbone? How many GE (1Gbps) flows does it take?
• This situation is exacerbated in the R&E environment by widespread FE/GE access to labs, dorms, data repositories, computational clusters, sensors, etc.
• Over Engineering is not effective in the R&E environment (and probably won't work in the emerging enterprise environment much longer either).
  – This is why E-VLBI has had such difficulty using network links…
• So how do we architect the network to provide the capacity and raw performance needed by new technologies and the new applications and services?
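The aggregation questions above come down to simple division. A minimal Python sketch, assuming every flow runs flat out at its access rate and ignoring framing overhead; the helper names are hypothetical. It also prints each flow's share of the backbone, for comparison with the <1% regime in which TCP sharing works respectably well.

```python
import math


def flows_to_saturate(backbone_bps: float, flow_bps: float) -> int:
    """How many flows at flow_bps it takes to fill a backbone of backbone_bps."""
    return math.ceil(backbone_bps / flow_bps)


def link_share(backbone_bps: float, flow_bps: float) -> float:
    """Fraction of the backbone consumed by a single flow."""
    return flow_bps / backbone_bps


TEN_GE = 10e9  # 10GE backbone, in bits per second
ACCESS_RATES = {
    "MPEG-2 stream (4 Mbps)": 4e6,
    "MPEG-2 stream (45 Mbps)": 45e6,
    "FE (100 Mbps)": 100e6,
    "GE (1 Gbps)": 1e9,
}

for name, rate in ACCESS_RATES.items():
    print(f"{name:24s} {flows_to_saturate(TEN_GE, rate):5d} flows to fill 10GE, "
          f"{link_share(TEN_GE, rate):.2%} of the link each")
```

So 100 FE flows, or just 10 GE flows, fill a 10GE backbone, and a single GE flow occupies 10% of it, far outside the regime where occasional drops are spread thinly across many small flows.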