Hybrid Networks What are they? And why should the E-VLBI community care?

Hybrid Networks
What are they?
And why should the E-VLBI
community care?
Jerry Sobieski
Mid-Atlantic Crossroads
Sept 18, 2006
MIT Haystack Observatory
Hybrid Networks
• Hybrid Networks refer to emerging network technologies
that allow the Internet to support both traditional packet
based services as well as [new] connection oriented
services.
– What is old is new again
• So how do these “new” connection oriented services
differ from the current Internet architecture?
Outline
• What’s the problem with the internet?
– Why is it so hard to take advantage of high performance nets?
– How does the network transport data
– Some of the implications – and why these issues are still alive
• What are hybrid networks?
– How do they work
– How they address the legacy performance issues
• What is the state of the art in hybrid networks?
• Some prospects for the E-VLBI community
– Some suggestions as to how H.N.s could be applied to the
global consortium of VLBI resources…
What is the problem?
• Despite increasing reach and link capacity of global R&E networks,
the networks are still unable to deliver a deterministic, predictable,
and repeatable performance to demanding applications (such as E-VLBI)
– Real time transfer of E-VLBI data across the globe exposes these flows
to unpredictable conditions within the network itself, and imposes
stringent requirements on the end systems to adapt to these
unpredictable conditions
– Near real time transfers still require “fairly reliable” transfer of very large
data sets. This transfer would [ideally] like to go as fast as possible…but
this can be highly variable in best effort networks, and unconstrained
flows wreak havoc on reliability and congestion mitigation protocols…
• So what could possibly be happening in the network that makes a
simple file transfer so difficult to make fast? (and how do these hybrid
networks come into play?)
The “best effort” Internet
(basic concepts only)
• Current Internet still relies on “best effort” service philosophy:
– The network does not [cannot] guarantee delivery of every packet
• Congestion can cause packets to be dropped (not enough buffers…)
• Link errors corrupt a packet (no point delivering known faulty data..)
– The network only uses one preferred path for forwarding packets – even
when other paths are available, and even when congested.
• Fast path forwarding in the router hardware is optimized for this
• Traditional solution(s):
– Increase the buffers at each hop along the path
• Increases latency and jitter, and only reduces (does not eliminate) loss
probability…any loss causes significant performance drop
– Increase the backbone speed to better match aggregation needs
• Doesn’t scale in the R&E world…How many 1GE access links does it take to
fill a 10Gbps backbone link?
So why do big flows create problems?
(or: How do we establish “reliable” flows across the Internet?)
• The Internet supports two key transport layer
mechanisms:
– User Datagram Protocol (UDP)
• Transmits data packets from one application to another with
no guarantee of delivery, i.e. if a packet is dropped…it just
never shows up, and the application must be able to detect
and deal with it.
– Transmission Control Protocol (TCP)
• Reliable data transport protocol
• Provides a mechanism to detect missing packets, and have
the source re-transmit those packets. All packets are
[re]transmitted until all are received and all are delivered to
the application in the order sent.
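Because UDP offers no delivery guarantee, the receiving application has to notice gaps on its own. The following is a minimal sketch of one way to do that, assuming the sender stamps each datagram with a consecutive sequence number. The function name and the numbering scheme are illustrative assumptions, not part of any E-VLBI tool.

```python
def find_missing(received_seqs, first_seq, last_seq):
    """Return the sorted sequence numbers in [first_seq, last_seq] never received."""
    seen = set(received_seqs)
    return [s for s in range(first_seq, last_seq + 1) if s not in seen]

# Hypothetical arrival order: two datagrams were dropped and two were reordered.
arrived = [1, 2, 5, 4, 7, 8]
missing = find_missing(arrived, 1, 8)
print(missing)
```

What the application does with the missing list (ignore it, interpolate, or request a resend) is its own decision; TCP makes that choice for the application by always retransmitting.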
Implications of TCP:
#1: bw*delay tuning
• In order to ensure delivery of datagrams, the TCP sender must retain the
sent datagrams in memory until they are acknowledged by the receiver.
– This is known as the TCP “window”…
– TCP will only allow a “window’s worth” of data to be unacknowledged
  • i.e. TCP will send packets until the entire window has been sent, and will
  then only transmit more packets as previous packets are acknowledged.
• This “window” must be large enough to fill the network link from source to
destination (round trip delay × bandwidth)
– A large BW*Delay product requires a large window so that the “pipe” remains full
and does not sit idle awaiting ACKs.
– So TCP hosts must be “tuned” to work well in situations where the transfer is
across long (global) distances with high capacity links.
• Failure to tune TCP stacks is one of the most common reasons emerging
applications are unable to take advantage of high performance R&E
networks.
….but not the only reason.
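The window arithmetic can be made concrete with a short calculation. This is a sketch only: the 1 Gbps rate, the 100 ms round trip time, and the 64 KiB untuned window are assumed example values, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth*delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_seconds / 8

def window_limited_rate_bps(window_bytes, rtt_seconds):
    """Best-case throughput when at most one window can be sent per round trip."""
    return window_bytes * 8 / rtt_seconds

# Assumed example: a 1 Gbps path with a 100 ms round trip time.
needed = bdp_bytes(1e9, 0.100)
print(f"window needed: {needed / 1e6:.1f} MB")

# The same path with an assumed untuned 64 KiB window.
rate = window_limited_rate_bps(64 * 1024, 0.100)
print(f"64 KiB window tops out near {rate / 1e6:.2f} Mbps")
```

Under these assumptions the path needs roughly 12.5 MB in flight, while a 64 KiB window caps the flow at about 5 Mbps no matter how fast the links are.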
Implications of TCP:
#2: congestion management
• Consider the situation where many TCP flows converge on a single router…
  • As the output queues fill, arriving packets must be dropped…
  • TCP retransmits lost packets…
  • More data is dumped into the network…more congestion…more TCP sessions drop
  packets…still more retransmission data…worse congestion…
  – Result: Congestive collapse of the network
• In order to prevent congestive collapse, TCP slows (pretty much stops) the
transmit rate when congestion is detected.
– This allows the network to “catch up” and reduces the number of resends required.
– TCP will back off whenever a packet goes missing – regardless of the …TCP can’t
tell the difference, nor can it divine where the error occurred or why.
• In order to find a maximum transmission rate that won’t cause congestion, TCP must slowly
increase its transmission rate – “Slow Start”
• TCP Back-Off (this slowing of the packet transmit rate) and the “Slow Start”
resumption are critical to proper operation of the internet as a whole, and provide
reliable delivery of datagrams...
• E-VLBI needs a predictable and repeatable performance environment to effectively use
high speed networks.
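To get a feel for how long the climb back to full speed can take, here is a deliberately simplified model. It assumes a Reno-style rule (the window is halved on a loss, then grows by one packet per round trip), which is an assumption of this sketch rather than a description of any particular TCP stack. The link rate, round trip time, and packet size are likewise assumed example values.

```python
def rtts_to_recover(full_window_pkts):
    """Round trips needed to climb back to the full window after one loss,
    assuming the window is halved and then grows by one packet per RTT."""
    window = full_window_pkts // 2
    rtts = 0
    while window < full_window_pkts:
        window += 1
        rtts += 1
    return rtts

# Assumed example: 1 Gbps path, 100 ms RTT, 1500-byte packets.
rtt = 0.100
full_window = int(1e9 * rtt / 8 / 1500)   # packets in flight needed to fill the path
recovery_seconds = rtts_to_recover(full_window) * rtt
print(f"full window: {full_window} packets, recovery: {recovery_seconds:.0f} s")
```

Under these assumptions a single lost packet costs several minutes of reduced throughput. Real stacks and newer congestion control algorithms behave differently, but the sensitivity to isolated losses on long, fast paths is the point.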
Implications to High Performance
Applications – like E-VLBI
• This “fair share”/“best effort” behaviour is necessary because TCP provides
no a priori information about the flow to the network, and the network has not
traditionally had the intelligence to use it had it been present…
• This TCP sharing works respectably well when the offered traffic in a TCP
session is only a small fraction (<1%) of the overall link capacity, and many
TCP sessions are present…
– The occasional dropped packet affects very few flows, and those are only
affected minimally
• BUT…high performance applications may generate flows that consume all
available bandwidth – i.e. any other traffic will cause significant variability in
performance.
– TCP may run at 1GE for seconds or minutes and then suddenly
back off due to a brief burst and then take many seconds to get
back to speed.
– This doesn’t work well for real-time applications,
– And it means that pre-staging of files for near-real-time is
unpredictable as well.
Enter Connection Oriented Services
• Connection Oriented Services are point to point paths set up across a
network that have dedicated network resources associated with the path.
– Example: Phone line – 64 Kbps dedicated capacity between the calling party and
the called party.
– Example: OC192 SONET circuit – 9.4 Gbps between Onsala and Haystack
(looking ahead)
• In general, connection oriented services provide a means for the user to
specify service requirements for a flow, and allow the network to allocate
sufficient resources to this flow a priori (before initiating the flow) and
then to release those resources when the user no longer requires them.
– This process is called provisioning, and includes path selection and
establishment at each network element along the path.
– Often this is a manual process, sometimes semi-automated,
– Emerging experimental networks (such as DRAGON, and similar projects in
Japan and Europe) are developing the tools and technologies for fully automated
circuit establishment.
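The allocate-then-release cycle described above can be sketched in a few lines. This is a toy model under stated assumptions: the `Link` class, the `provision` and `release` functions, and the capacities are all hypothetical, path selection is taken as already done, and no real signaling protocol is modeled.

```python
class Link:
    """One network element on the path, with a fixed capacity in Mbps."""
    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.reserved_mbps = 0

    def available(self):
        return self.capacity_mbps - self.reserved_mbps

def provision(path, rate_mbps):
    """Reserve rate_mbps on every link of the path, or reserve nothing at all."""
    if any(link.available() < rate_mbps for link in path):
        return False
    for link in path:
        link.reserved_mbps += rate_mbps
    return True

def release(path, rate_mbps):
    """Return a previously reserved rate to every link of the path."""
    for link in path:
        link.reserved_mbps -= rate_mbps

# Hypothetical three-hop path whose middle hop is a 1 Gbps link.
path = [Link(10_000), Link(1_000), Link(10_000)]
first = provision(path, 800)    # fits on every hop
second = provision(path, 800)   # middle hop has only 200 Mbps left
release(path, 800)
third = provision(path, 800)    # fits again once the first circuit is released
print(first, second, third)
```

The all-or-nothing check is the essential difference from best effort forwarding: a request that cannot be honored end to end is refused up front instead of degrading flows already admitted.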
The emergence of connection oriented
services in the Internet
• Over the last decade, the most widely used technology for establishing connections
within the Internet has been Multi-Protocol Label Switching (MPLS)
– MPLS inserts a shim header into the IP packet that associates a packet with a specific
forwarding path through the network.
– All inbound packets carrying a given label on a given interface are forwarded to a specific
outbound interface, and the input shim label is swapped at each hop to a new label
recognized by the next hop.
• MPLS is an effective means of establishing virtual private networks between a group
of end user sites
– Since it is integrated into IP routers, it is easy to leverage IP networks where these Label
Switched Paths (LSPs) are small and numerous compared to backbone link capacity (<1%)
– Due to its reliance on expensive and complex router technologies, it is not cost effective if the
number of MPLS LSPs is few (i.e. where LSPs require a significant fraction of the link capacity,
>10%) and/or long lived.
  • In these cases, lower layer circuits such as p2p Ethernet or SONET are generally more cost effective
• Internal to internet service provider networks, MPLS LSPs are used for “Traffic
Engineering” in order to allow these networks to overcome forwarding limitations in
router hardware
– LSPs create forwarding adjacencies (tunnels) across the network that make remote routers
appear as neighbors to a local router, thus providing the ability to sort traffic and distribute
the traffic across multiple backbone links.
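Label swapping is easy to picture as a per-router table lookup. The sketch below is illustrative only: the router names, interface names, and label values are invented, and no real router configuration or MPLS implementation is being reproduced.

```python
# Per-router label tables: (in_interface, in_label) -> (out_interface, out_label)
TABLES = {
    "R1": {("ge0", 17): ("ge1", 42)},
    "R2": {("ge0", 42): ("ge2", 99)},
}
# Wiring between routers: (router, out_interface) -> (next router, its in_interface)
LINKS = {("R1", "ge1"): ("R2", "ge0")}

def forward(router, in_if, label):
    """Follow a label switched path; return (router, in_label, out_label) per hop."""
    hops = []
    while router in TABLES:
        out_if, out_label = TABLES[router][(in_if, label)]
        hops.append((router, label, out_label))
        nxt = LINKS.get((router, out_if))
        if nxt is None:          # the packet leaves the modeled network here
            break
        router, in_if = nxt
        label = out_label
    return hops

hops = forward("R1", "ge0", 17)
print(hops)
```

Each router consults only the incoming interface and label, never the IP header, which is what lets the path be pinned independently of normal IP routing.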
Evolution of MPLS to GMPLS
• MPLS introduced connection oriented capabilities into
the Internet Service Provider network
• OSPF-TE and RSVP-TE protocols provide the routing and
provisioning capabilities for simplifying the creation of LSPs in
the ISP networks
• Other network technologies such as metro ethernet,
sonet/sdh, DWDM, etc began to emerge as both
alternatives and supplements to the IP backbone.
• So the routing and signaling protocols of MPLS Traffic
Engineering were extended to include a number of new
switching technologies
• Generalized MPLS or GMPLS was born.
Generalized Multi-Protocol Label
Switching - GMPLS
GMPLS describes a hierarchy of switching types:
Packet – MPLS based LSPs (PSC)
Layer2 – Ethernet frame based (L2SC)
Time Division Multiplexed – SONET/SDH (TDM)
Lambda – Wavelength switching (LSC)
Fiber – Optical fiber switching (FSC)
GMPLS also includes the routing and signaling protocol
extensions to support these technologies:
GMPLS-OSPF-TE, and GMPLS-RSVP-TE
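The hierarchy of switching types can be written down as an ordered enumeration. This is a sketch: the numeric values are arbitrary, and the nesting rule in `can_nest` (a path may ride inside a path of the same or a coarser switching type) is a simplifying assumption of the example, not a full statement of the GMPLS rules.

```python
from enum import IntEnum

class SwitchingType(IntEnum):
    """Ordered from finest (packet) to coarsest (fiber), in the order listed above."""
    PSC = 1    # packet, MPLS based LSPs
    L2SC = 2   # layer 2, Ethernet frames
    TDM = 3    # time division multiplexed, SONET/SDH
    LSC = 4    # lambda, wavelength switching
    FSC = 5    # fiber switching

def can_nest(inner, outer):
    """Assumed rule for this sketch: inner may be carried inside outer
    when outer is the same type or coarser."""
    return inner <= outer

packet_in_lambda = can_nest(SwitchingType.PSC, SwitchingType.LSC)
fiber_in_tdm = can_nest(SwitchingType.FSC, SwitchingType.TDM)
print(packet_in_lambda, fiber_in_tdm)
```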
“Light Paths”: Terminology for the New
Millennium
• The emergence of very high capacity and low cost optical wavelength
based telecommunications technologies made the prospect of dedicated
and [almost] free capacity an attractive and seemingly achievable
networking nirvana…
• Alas, waves are not free, or cheap…
• But they are less expensive than traditional carrier services,
• And they provide enormous capacity (10 Gbps is the norm today)
• So the concept of a wave, or “light path”, for every project that needed high
capacity or predictable and repeatable performance began to take shape…
– And it is now used to describe the new models for circuits and connection
oriented services being explored in current optical networks
• A “light path” is a new term that refreshes the ideas for connection oriented
services – Light paths complement IP services, and are generally integrated
with IP networks, and yet promote the proposition that dedicated,
predictable, and repeatable network services are required even today with
such high performance networks.
Closing the loop: Hybrid Networks
• Hybrid Networks refer to emerging network technologies that allow
the Internet to support both traditional packet based services as well
as [new] connection oriented services.
• These services can coexist!
– A user can [will] be able to access both from their
workstation/cluster/lab/etc
– IP services will likely run over and in conjunction with Light Path
services, but other data formats are possible…
• These services will enable “affinity groups” to establish customized,
dedicated, and highly dynamic network infrastructure that suits their
needs
– No longer will such specialized networks be expensive or complex
– Such specialized networks will be able to evolve and morph to meet the
changing needs of the collaborating organizations…
State of the Art:
• Many projects around the world are exploring hybrid architectures:
– DRAGON (NSF) – Washington, DC regional network including BOSnet link to Haystack
– HOPI (Internet2) and “newnet” – dynamic wave services, US footprint with international links
– GEANT (EU) JRA3, national/regional EU networks
– JGN II (NICT) – several experimental GMPLS testbeds (some with industry carriers)
– NLR – Static wave services, US footprint, international access
• Current capabilities:
– Static provisioning is common, but still slow (weeks to months)
– Dynamic establishment of light paths is available in pockets (DRAGON, HOPI, with initial
successful experiments over UKLight, Netherlight, NorthernLight, and JGN II).
– User APIs being refined – much is already available, but still early. More user friendly
interfaces coming this fall and next spring.
– AST demonstrated, but limited flexibility. Better version(s) available by SC06 and into the
spring 07.
• Open research topics:
– Inter-domain automated provisioning
– Advanced (bookahead) scheduling
– Dynamic ASTs
DRAGON and Friends deployments:
Operational contiguous GMPLS L2SC dynamic reach:
[Figure: map of the connected networks and regions: JGN II (JP), SE, UK, NL, HOPI, DRAGON]
Application Specific Topologies
• Many applications need something more than a single
simple point-to-point connection…
– These apps may need multiple connections between many
different locations (e.g. E-VLBI)
– These connections are needed simultaneously with other non-network resources (e.g. sensors, computational clusters, etc)
– These resource sets may change physical layout based upon
availability, but the logical topology is persistent…
• Example: E-VLBI
E-VLBI Application Specific Topology
Logical e-VLBI Topology:
[Figure: a correlator (C) and telescopes (X, Y, Z)]
Physical Instantiations of the Application Specific Topology
[Figure: the logical nodes C, X, Y, and Z mapped onto physical sites. Sites shown: MIT Haystack, US; Dwingeloo, NL; NASA Goddard, US; Onsala, SE; Westford, US; Kashima, JP; Kokee Park, HI; Sheshan, CN]
Application Specific Topologies using XML
<topology>
  <resource>
    <resource_type> eVLBI.Mark5a </resource_type>
    <name> Haystack.muk1 </name>
    <ip_addr> muk1.haystack.mit.edu </ip_addr>
    <te_addr> muk1-ge0.haystack.mit.edu </te_addr>
    <appl> /usr/local/evlbi_script </appl>
  </resource>
  <resource>
    <resource_type> eVLBI.Mark5a </resource_type>
    <name> Westford1 </name>
    <ip_addr> wstf.haystack.mit.edu </ip_addr>
    <te_addr> wstf-ge0.haystack.mit.edu </te_addr>
    <appl> /usr/local/evlbi_script </appl>
  </resource>
  <resource>
    <resource_type> EtherPipeBasic </resource_type>
    <src> Haystack.muk1 </src>
    <dest> Westford.muk1 </dest>
    <datarate> 1 Gbs </datarate>
  </resource>
</topology>
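A topology description like the one above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the example. Treating every `EtherPipeBasic` resource as a link and everything else as an end system is an interpretation made for this example, and `load_resources` is a hypothetical helper, not part of the DRAGON software.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example topology (one end system, one link).
DOC = """
<topology>
  <resource>
    <resource_type> eVLBI.Mark5a </resource_type>
    <name> Haystack.muk1 </name>
    <ip_addr> muk1.haystack.mit.edu </ip_addr>
  </resource>
  <resource>
    <resource_type> EtherPipeBasic </resource_type>
    <src> Haystack.muk1 </src>
    <dest> Westford.muk1 </dest>
    <datarate> 1 Gbs </datarate>
  </resource>
</topology>
"""

def load_resources(xml_text):
    """Return one dict per <resource>, mapping child tag -> whitespace-stripped text."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: (child.text or "").strip() for child in res}
            for res in root.findall("resource")]

resources = load_resources(DOC)
# Assumption of this sketch: EtherPipeBasic resources are links, the rest are end systems.
nodes = [r for r in resources if r["resource_type"] != "EtherPipeBasic"]
links = [r for r in resources if r["resource_type"] == "EtherPipeBasic"]
print(len(nodes), "node(s);", links[0]["src"], "->", links[0]["dest"])
```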
Application Specific Topologies
• Live demonstration at Internet2 Spring Member Meeting
(April 2006, Washington DC)
– See www.internet2.edu for webcast of “HOPI update” presentation.
• Set up global multi-link topologies
– ~30 seconds
E-VLBI Application Specific Network
[Figure: diagram elements: Mark 5 units, a correlator/compute cluster, and a visualization station, attached through VLSRs to the global R&E hybrid infrastructure]
E-VLBI Application Specific Network
[Figure: diagram elements: Mark 5 units, a correlator/compute cluster, and an HS storage cluster, attached through VLSRs to the global R&E hybrid infrastructure]
E-VLBI Application Specific Network
[Figure: diagram elements: Mark 5 units, a correlator/compute cluster, and HS distributed virtual storage, attached through VLSRs to the global R&E hybrid infrastructure]
Hybrid Networks and E-VLBI
• The E-VLBI community constitutes an “Affinity Group” –
i.e. a group of collaborators whose common interests
allow and encourage them to work with each other on a
myriad of projects, sharing resources and expertise.
• Hybrid network technologies will provide a set of tools to
the E-VLBI community that will:
– Resolve many technical challenges associated with E-VLBI
workflow process
– Provide a broad range of capabilities that can be employed and
integrated into future application architectures
– Others…
The Emerging Environment
• Hybrid Networks are on their way…
– Over the next 12-36 months, these services will become more
common and broader reaching
• User interfaces are improving and becoming easier to
use
– Web based graphical interfaces
– Programmatic APIs
– GRID interfaces and integration
• E-VLBI has a great deal of visibility in the network
research and engineering world…
– The network geeks see E-VLBI as one of those “defining” apps
– The E-VLBI community should leverage this interest to move the
science techniques and infrastructure forward
The End
• Jerry Sobieski
– jerrys@maxgigapop.net
Application Specific Topologies
• The Simple E-VLBI application model:
• A more detailed model:
The Internet in 2006
• What happens to the Over Engineering BCP in 2006?
– Access speeds are now measured in Mbps (10x to 100x)
– Streaming video increases the average flow rate, nominally 10x; MPEG-2 streaming
typically runs at 4 Mbps to 45 Mbps
– Every laptop now has GE. Data transport is no longer limited by the access hardware.
• How many FE (100 Mbps) flows does it take to saturate a 10GE backbone? How
many GE (1 Gbps) flows does it take?
• This situation is exacerbated in the R&E environment by widespread FE/GE access
to labs, dorms, data repositories, computational clusters, sensors, etc.
• Over Engineering is not effective in the R&E environment (and probably won’t work in
the emerging enterprise environment much longer either.)
This is why E-VLBI has had such difficulty using network links…
• So how do we architect the network to provide the capacity and raw performance
needed by new technology and the new applications and services?
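The saturation question on this slide is simple arithmetic, sketched here for completeness. The function name is hypothetical, and each flow is assumed to run at its full access rate.

```python
def flows_to_saturate(backbone_bps, flow_bps):
    """Smallest number of full-rate flows whose sum reaches the backbone capacity."""
    return -(-backbone_bps // flow_bps)   # integer ceiling division

BACKBONE = 10_000_000_000                           # 10GE backbone link
fe_flows = flows_to_saturate(BACKBONE, 100_000_000)     # Fast Ethernet sources
ge_flows = flows_to_saturate(BACKBONE, 1_000_000_000)   # Gigabit Ethernet sources
print(fe_flows, ge_flows)
```

One hundred FE sources or only ten GE sources fill the link, which is why over-provisioning stops working once GE access is commonplace.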