Ultra-fast IP link and interface provisioning
with applications to IP restoration
Panagiotis Sebos, Jennifer Yates, Guangzhi Li, Dongmei Wang,
Albert Greenberg, Monica Lazer, Charles Kalmanek and Dan Rubenstein*
AT&T Labs-Research, 180, Park Ave, Florham Park, NJ 07932, {psebos,jyates,gli,mei,albert}@research.att.com
*EE Dept, Columbia Univ., 500W 120th Str., Room 1312, NY, NY 10027; danr@ee.columbia.edu
Abstract: We integrate IP, data-link and transport capabilities for ultra-fast IP interface and link
provisioning, and demonstrate two IP restoration schemes that use re-configurable transport
networks: 1:N router interface protection and dynamic establishment of a new link upon failure.
1. Introduction
Ultra-fast IP network restoration is important for supporting new services such as video on demand and voice over
IP. Achieving ultra-fast IP restoration through the integration of mechanisms at the IP and transport layers has
attracted a great deal of interest [1-3].
We propose detailed mechanisms for ultra-fast IP link and interface provisioning, enabling 1:N IP interface and
link protection, using integrated IP and transport capabilities. Applications include IP network failure recovery,
traffic engineering and surge protection. A central idea is to equip each IP router with one or more spare interfaces
connected to a re-configurable transport. Via rapid IP provisioning of a new IP link using the spare interfaces on two
routers, the IP network can economically and rapidly adjust to a very wide spectrum of failure scenarios at the IP
layer and below [2]. While 1:N interface protection is a familiar concept in transport networks, it is unavailable
today in IP. In IP networks, rapid interface or link provisioning requires coping with complex interactions between
the transport, data-link and IP layers. It also requires coping with large and complex state information associated
with each IP router interface, including interface specific forwarding and packet handling information, distributed
across components of the interface hardware, with constraints on read and write access. To rapidly redirect traffic
from one interface to another, IP routing tables must be rewritten, the physical connectivity between routers has to
be re-configured and the data-link has to be optimized.
We have implemented the detailed mechanisms required in the transport network, data-link-layer and IP routers
to achieve ultra-fast IP interface and link provisioning. We measure the performance of these mechanisms in the
applications to 1:N IP interface protection and dynamic link establishment. 1:N interface protection for IP routers is
a new concept; it has significant potential where interfaces are currently a single point of failure in IP networks. The
idea of using a spare interface to dynamically establish a new link between two routers was proposed in [2] for
handling sudden traffic surges – we apply it here to handle failures. Our prototype implementation indicates that
there is no fundamental obstacle to realizing these important capabilities in next generation IP networks. Additional
research and experimentation are being conducted in collaboration with Avici Systems to demonstrate feasibility in
commercial routers with more complex IP services than demonstrated here.
2. Prototype Description
Figure 1 illustrates our experimental setup. The testbed uses two PC Linux-based software IP routers and four
transport nodes, each consisting of a photonic cross-connect (PXC) and a PXC controller running AT&T’s research
prototype GMPLS control plane [4]. The PC routers utilize Open Shortest Path First (OSPF) for IP routing and the
Optical Internetworking Forum (OIF) User-to-Network Interface (UNI) to communicate with the PXC control plane.
Figure 1. Experimental setup for (a) 1:N interface protection and
(b) dynamic establishment of a new link in response to failure.
1:N interface failure protection is illustrated in Figure 1(a). A link between routers R1 and R2 (illustrated using a
heavy black line) is routed over the re-configurable transport network and a spare interface on R1 protects N
working interfaces (N = 1 is depicted). When a failure occurs, the IP router R1 signals PXC1 to request that the
connection be switched from the primary interface to the spare interface. The switching is done locally at PXC1
without involving any other PXC or the remote router (R2). IP traffic is re-routed from the failed interface to the
protection interface.
Figure 1(b) illustrates our experimental setup for dynamic link establishment. Traffic between routers R3 and R4
is initially routed over a direct point-to-point IP link (i.e., not routed over the PXCs). Upon failure, a new connection
is established over the PXCs to form a new IP link between the spare router interfaces. IP traffic is then re-routed
from the failed link to the newly established one.
The first step in failure recovery in both setups is to identify the failure. This can be achieved using link layer
detection, or via polling (keep-alive) mechanisms internal to the router to identify internal interface failures.
Once a failure is identified, physical layer connectivity is re-established. If using SONET/SDH router interfaces
connected to a SONET/SDH cross-connect, 1:N protection can be achieved using standard SONET/SDH automatic
protection switching (APS) to signal from the router to the SONET cross-connect. However, PXCs are unable to
terminate SONET overhead bytes so we instead adapt the OIF UNI to implement protection switching. UNI
signaling can also be used for protection switching for Ethernet interfaces, which do not support APS functionality.
For the dynamic link establishment (Figure 1(b)), we used the UNI and GMPLS to establish new connectivity
within the transport network. An arbitration scheme is required to prevent the routers at both ends of the failed
link from each requesting bandwidth when they detect the failure. In our implementation, the router with the higher router ID is
responsible for requesting the new connectivity. This is determined by having each router query its local routing
protocol database to determine the neighboring router’s ID.
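The arbitration rule above can be sketched as follows. This is a minimal illustration, not the prototype's code: the function name is hypothetical, and it assumes router IDs are the usual 32-bit OSPF identifiers written in dotted-quad form and compared numerically.

```python
import ipaddress


def should_request_connection(local_router_id: str, neighbor_router_id: str) -> bool:
    """Return True if the local router is the one that requests new connectivity.

    Assumption: router IDs are 32-bit OSPF router IDs in dotted-quad
    notation, compared as unsigned integers.
    """
    local = int(ipaddress.IPv4Address(local_router_id))
    neighbor = int(ipaddress.IPv4Address(neighbor_router_id))
    if local == neighbor:
        # Router IDs must be unique within a routing domain.
        raise ValueError("local and neighbor router IDs are identical")
    # Only the router with the higher ID signals the transport network.
    return local > neighbor
```

Because both ends evaluate the same comparison on the same pair of IDs, exactly one of them issues the request, and no extra message exchange is needed to decide.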
Once the physical connectivity has been re-established, IP traffic must be re-routed from the failed interface to
the new working interface. The forwarding table in a typical router maps each IP address prefix to an outgoing
(physical) interface identifier. To allow packets to be rapidly rerouted to the protection interface, we introduce a
logical interface identifier that then maps to the outgoing physical interface. This additional level of redirection
avoids updating multiple entries in the forwarding table.
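The extra level of redirection can be illustrated with a toy forwarding table. The sketch below is an assumption-laden model rather than the kernel data structure used in the prototype: the class, method and interface names are hypothetical, and the longest-prefix match is a simple linear scan.

```python
import ipaddress


class ForwardingTable:
    """Toy forwarding table with a logical-interface indirection layer."""

    def __init__(self):
        self.prefix_to_logical = {}    # ip_network -> logical interface id
        self.logical_to_physical = {}  # logical interface id -> physical interface

    def add_route(self, prefix, logical_if):
        self.prefix_to_logical[ipaddress.ip_network(prefix)] = logical_if

    def bind(self, logical_if, physical_if):
        # A single write here redirects every prefix that uses logical_if.
        self.logical_to_physical[logical_if] = physical_if

    def lookup(self, destination):
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.prefix_to_logical if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
        return self.logical_to_physical[self.prefix_to_logical[best]]


table = ForwardingTable()
table.add_route("10.1.0.0/16", "L0")
table.add_route("10.2.0.0/16", "L0")
table.bind("L0", "pos0")           # working interface (hypothetical name)
before = table.lookup("10.1.2.3")
table.bind("L0", "pos-spare")      # protection switch: one update, not one per prefix
after = table.lookup("10.2.9.9")
```

The per-prefix entries are never touched during the switch; only the single logical-to-physical binding changes, which is what keeps the re-routing step short regardless of table size.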
Minimizing service interruption time necessitates that our failure recovery be kept local, which requires that it be
concealed from IP routing protocols (in our case OSPF). Achieving this requires that the protection interface
impersonate the failed one by copying IP and link layer state, such as IP addresses, from the failed
interface to the protection interface. Commercial routers have more complicated state, with complex restrictions on
read and write access, than we had in our experiments; for example, access control lists, queuing parameters and
protocol specific information must be copied.
We experimented with Packet over SONET (POS) and Gigabit Ethernet technologies. These technologies differ
in how they establish link layer connectivity and carry IP traffic. Specifically, Gigabit Ethernet uses Medium Access
Control (MAC) addresses to identify network interfaces, and the Address Resolution Protocol (ARP) to map
between IP and MAC addresses. If the MAC addresses are statically configured on the interfaces, then ARP
must resolve the new IP-to-MAC address mappings after protection switching. We tuned the Linux kernel
to optimize the ARP negotiations. We also experimented with copying the MAC address from the failed interface to
the protection interface, avoiding the complexity of optimizing ARP.
POS uses PPP’s Link Control Protocol (LCP) to establish link layer connectivity and test the link, and uses the
Network Control Protocol (NCP) to configure network layer protocols (e.g., IP) [5]. Once physical connectivity is
established between two POS interfaces, PPP executes LCP and NCP synchronization. We found that the first LCP
synchronization message was lost due to race conditions as the two POS interfaces at either end of the link identify
that the link is available. LCP retransmits this message – the default retransmission timeout is 3 seconds, yielding
failure recovery exceeding 3 seconds. We dramatically reduced this by tuning the retransmission timer.
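The effect of the retransmission timer on recovery time can be made concrete with a rough model. The sketch below assumes a fixed timer with no backoff and counts only the delay caused by lost LCP messages; the function names and the 50 ms budget parameter are illustrative, not part of the prototype.

```python
def lcp_loss_penalty_ms(restart_timer_ms: int, lost_messages: int = 1) -> int:
    """Delay added by lost LCP messages, assuming a fixed retransmission timer.

    Each lost message costs one full timer period before the retransmission.
    """
    if restart_timer_ms < 0 or lost_messages < 0:
        raise ValueError("timer and loss count must be non-negative")
    return restart_timer_ms * lost_messages


def fits_budget(restart_timer_ms: int, other_delays_ms: int, budget_ms: int = 50) -> bool:
    """Check whether one lost message still leaves recovery within the budget."""
    return other_delays_ms + lcp_loss_penalty_ms(restart_timer_ms) <= budget_ms


default_penalty = lcp_loss_penalty_ms(3000)  # 3-second default timer
tuned_penalty = lcp_loss_penalty_ms(10)      # a timer tuned down to 10 ms
```

Under this model a single lost message with the 3-second default already rules out sub-50ms recovery, whatever the other delays are, which is why tuning the timer is a precondition rather than a refinement.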
There were also a number of other optimizations required in the Linux kernel to achieve fast failure recovery,
including optimization of kernel to user space communications, prioritization of UNI signaling messages, and rapid
internal copying of routing table instances.
3. Experimental Results
The 1:N interface protection and dynamic link establishment experiments were conducted using both POS and
Ethernet interfaces on 1GHz Pentium III PCs. The PXCs used had a specified cross-connect time of less than 12ms.
Table 1 presents the average restoration times, measured using the number of lost packets when sending constant
bit-rate traffic through the routers. Failures were invoked by pulling out the fiber from the receive port of router R1
of Figure 1(a) for 1:N protection, and R3 of Figure 1(b) for dynamic link establishment.
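The conversion from lost packets to restoration time can be sketched as below. This is a minimal illustration under stated assumptions: the traffic is strictly constant bit-rate with fixed-size packets, every packet sent during the outage is lost, and the bit rate, packet size and loss count in the example are hypothetical values, not measurements from our experiments.

```python
def packets_per_second(bit_rate_bps: float, packet_size_bytes: int) -> float:
    """Packet rate of a constant bit-rate stream with fixed-size packets."""
    if bit_rate_bps <= 0 or packet_size_bytes <= 0:
        raise ValueError("bit rate and packet size must be positive")
    return bit_rate_bps / (packet_size_bytes * 8)


def restoration_time_ms(lost_packets: int, rate_pps: float) -> float:
    """Estimate outage duration from the number of packets lost.

    Assumes every packet sent during the outage was lost and none before
    or after it, so the outage lasts lost_packets / rate_pps seconds.
    """
    if rate_pps <= 0:
        raise ValueError("packet rate must be positive")
    return lost_packets * 1000 / rate_pps


# Hypothetical example values, chosen only to illustrate the arithmetic.
rate = packets_per_second(100_000_000, 1250)  # 100 Mb/s stream, 1250-byte packets
estimate = restoration_time_ms(250, rate)     # 250 packets lost
```

The resolution of this method is one inter-packet interval, so a higher packet rate gives a finer-grained estimate of the restoration time.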
Table 1: Restoration measurements
We measured ultra-fast (sub-50ms) restoration times for 1:N interface protection and
dynamic link establishment with both Gigabit Ethernet and POS interfaces. Generally, restoration times are smaller
downstream than upstream (R1 to R2 versus R2 to R1 for the 1:N protection experiments) because the failure directly
impacts only the upstream direction. However, in the 1:N Ethernet protection experiments we copied over MAC
addresses for R1, which added 10ms to the downstream direction.
Transport network signaling was very fast, particularly for the 1:N protection which only requires local
reconfiguration. The rest of the recovery time included physical layer cross-connection, polling mechanisms within
the line cards to identify changes in link status (failed/working), updating IP addresses and routing tables, and layer-two synchronization.
Before our kernel and link layer optimizations, we measured restoration times exceeding a second for Ethernet
and three seconds for POS interfaces. The optimizations we made are clearly crucial for ultra-fast recovery. For the
POS interfaces, we initially used the default PPP timers [5]; we reduced the LCP restart (retransmission) timer to
10ms to achieve the sub-50ms results above.
Our experiments did not include network-wide propagation delays – if the two routers are distant, then we will
have additional propagation delays which will slow the signaling messages, the LCP/NCP/ARP synchronization and
consequently the restoration times.
4. Conclusions
Our experiments have demonstrated ultra-fast failure recovery for 1:N router interface protection and dynamic link
establishment in response to failure. Router, link and transport layer modifications and optimizations were required,
but minimal, if any, protocol enhancements were necessary.
5. References
[1] Y. Ye et al., “A Simple Dynamic Integrated Provisioning/Protection Scheme in IP over WDM Networks,” IEEE Commun.
Mag., Nov. 2001.
[2] P. Pongpaibool et al., “Handling IP Traffic Surges via Optical Layer Reconfiguration,” OFC 2002.
[3] S. Phillips et al., “Network Studies in IP/Optical Layer Restoration,” OFC 2002.
[4] D. Wang et al., “Optical NNI Inter-working Demonstration,” LEOS Summer Topical Meeting, July 2002.
[5] W. Simpson, editor, “The Point-to-Point Protocol (PPP),” IETF RFC 1661, July 1994.