A hardware implementation of a signaling protocol

Haobo Wang, Malathi Veeraraghavan and Ramesh Karri
Polytechnic University, New York
haobo_w@photon.poly.edu, mv@poly.edu, ramesh@india.poly.edu

Abstract: Signaling protocols in connection-oriented networks are implemented primarily in software, for two important reasons: their complexity and the need for flexibility. However, even with state-of-the-art processors, software implementations of signaling protocols are rarely capable of handling more than 1000 calls/sec. Correspondingly, call setup delays per switch are on the order of milliseconds. Toward improving performance, we implemented a signaling protocol in reconfigurable FPGA hardware. Our implementation demonstrates the feasibility of a 100X, and potentially 1000X, speedup vis-à-vis its software counterpart. The impact of this work can be quite far-reaching, allowing circuit-switched networks to support a variety of new applications, possibly ones with short call holding times.

1. INTRODUCTION

Signaling protocols are used in connection-oriented networks primarily to set up and release connections. Examples of signaling protocols include Signaling System 7 (SS7) in telephony networks [1], the User Network Interface (UNI) and Private Network-Network Interface (PNNI) signaling protocols in Asynchronous Transfer Mode (ATM) networks [2][3], the Label Distribution Protocol (LDP) [4], Constraint-based Routing LDP (CR-LDP) [5] and the Resource ReSerVation Protocol (RSVP) [6] in Multi-Protocol Label Switching (MPLS) networks, and the extensions of these protocols for Generalized MPLS (GMPLS) [7]-[11], which supports Synchronous Optical Network (SONET), Synchronous Digital Hierarchy (SDH) and Dense Wavelength Division Multiplexed (DWDM) networks.

Signaling protocols are implemented primarily in software. There are two important reasons for this choice. First, signaling protocols are quite complex, with many messages, parameters and procedures.
Second, signaling protocols are updated often, requiring a certain amount of flexibility for upgrading field implementations. While these are two good reasons for implementing signaling protocols in software, the price paid is performance. Even with the latest processors, software signaling protocol implementations are rarely capable of handling more than 1000 calls/sec. Correspondingly, call setup delays per switch are on the order of milliseconds.

Toward improving performance, we undertook a hardware implementation of a signaling protocol. We used reconfigurable hardware, i.e., Field Programmable Gate Arrays (FPGAs) [12]-[16], to address the inflexibility problem. These devices are a compromise between the general-purpose processors used in software implementations at one end of the flexibility-performance spectrum and Application Specific Integrated Circuits (ASICs) at the opposite end of this spectrum. FPGAs can be reprogrammed with updated versions as signaling protocols evolve, while significantly improving call handling capacities relative to software implementations. As for the challenge posed by the complexity of signaling protocols, our approach is to implement only the basic, frequently used operations of the signaling protocol in hardware, and to relegate the complex, infrequently used operations to software.

Our VHDL implementation has been mapped into two FPGAs on the WILDFORCE reconfigurable computing board: a Xilinx XC4036 FPGA with 62% resource utilization and an XC4013 with 8% resource utilization. From the timing simulations, we determined that a call can be processed in 6.6 microseconds assuming a 25 MHz clock, yielding a call handling capacity of about 150,000 calls/sec.

Footnote 1: This work is sponsored by NSF grant 0087487 and by NYSTAR (The New York Agency of Science, Technology and Academic Research) through the Center for Advanced Technology in Telecommunications (CATT) at Polytechnic University.
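The quoted capacity is simple arithmetic on the per-call service time; a back-of-the-envelope check (illustrative only, using the 6.6-microsecond hardware figure above and a nominal 1 ms per-call time for software):

```python
def calls_per_sec(per_call_seconds):
    # Sustained call-handling capacity for a given per-call service time.
    return round(1.0 / per_call_seconds)

hw = calls_per_sec(6.6e-6)  # 6.6 us per call in hardware -> ~150,000 calls/sec
sw = calls_per_sec(1.0e-3)  # ~1 ms per call in software  -> ~1,000 calls/sec
speedup = hw // sw          # roughly two orders of magnitude
```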
Optimizing this implementation (for example, using pipelining) will increase the call handling capacity even further.

The impact of this work can be quite far-reaching. By decreasing call processing time, it becomes conceivable to set up and tear down calls more often, leading to a finer granularity of resource sharing and hence better utilization. For example, if a SONET circuit is set up and held for a long duration, given that data traffic using the circuit is bursty, the circuit utilization is often low. However, if fast call setup/teardown is possible, circuits can be dynamically allocated when needed, leading to improved utilization.

Section 2 presents background material on connection setup and teardown procedures and surveys prior work on this topic. Section 3 describes the signaling protocol we implemented in hardware. Section 4 describes our FPGA implementation, while Section 5 summarizes our conclusions.

Footnote 2: VHDL stands for VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuits.

2. BACKGROUND AND PRIOR WORK

In this section, as background material, we provide a brief review of connection setup and release. We also describe prior work on this topic.

2.1 Background

Connection setup and release procedures, along with the associated message exchanges, constitute the signaling protocol, illustrated in Figure 1. Setting up a connection consists of five steps:

1. Determine the next-hop switch toward which the connection should be routed. This task can be accomplished by consulting routing tables.
2. Check for the availability of the required resources (link capacity and, optionally, buffer space) and reserve them.
3. Assign "labels" for the connection. The exact form of the "label" depends on the type of connection-oriented network. For example, in SONET/SDH networks the label identifies an interface/timeslot pair, while in ATM networks it is a Virtual Path Identifier/Virtual Channel Identifier (VPI/VCI) pair.
4. Program the switch fabric to map incoming labels to outgoing labels.
5. Update the connection state information.

[Figure 1. Illustration of connection setup]

Upon completion of data exchange, the connection is released with a similar end-to-end release procedure.

With regard to implementation, we illustrate the typical architecture of a switch in connection-oriented networks in Figure 2. The user-plane hardware consists of the switch fabric and the line cards that terminate the incoming/outgoing lines carrying user traffic. The control-plane unit consists of a signaling protocol engine, which could have a hardware accelerator, as we are proposing, or be implemented completely in software residing on the microprocessor. Management-plane processing is omitted from this figure. Also, the software processes required for initialization, maintenance of the switch, fault handling, etc., and various other details are not shown.

[Figure 2. Unfolded view of a switch: a control plane with routing and signaling processes, a signaling hardware accelerator and NICs, above a user plane of input/output signaling interfaces, line cards and the space fabric]

2.2 Prior work

There are many signaling protocols, as listed in Section 1. In addition, many other signaling protocols have been proposed in the literature [18]-[27]. Some of these, such as the Fast Reservation Protocol (FRP) [27], fast reservation schemes [20][21], YESSIR [18], UNITE [19] and PCC [23], have been designed to achieve low call setup delays by improving the signaling protocols themselves. FRP is the only signaling protocol that has been implemented in ASIC hardware; such an ASIC implementation is inflexible. More recently, Molinero-Fernandez and McKeown [28] have been implementing a technique called TCP Switching, in which the TCP SYNchronize segment triggers connection setup and the TCP FINish segment triggers release. By processing these segments inside switches, the technique becomes comparable to a signaling protocol for connection setup/release. They are implementing this technique in FPGAs.

3. SIGNALING PROTOCOL

In this section, we describe the signaling protocol we implemented in hardware. It is not a complete signaling protocol specification. Our approach is to define a subset large enough that a significant percentage of user requirements can be handled with it. Infrequent operations are delegated to software, as shown in Figure 2. Therefore, this description often leaves out details that are handled by software.

3.1 Signaling messages

[Figure 3. Signaling messages: 32-bit-aligned field layouts of the Setup (type 0001), Setup-Success (type 0010), Release (type 0011) and Release-Confirm (type 0100) messages]

We defined four signaling messages: Setup, Setup-Success, Release, and Release-Confirm. Figure 3 illustrates the detailed fields of these messages. The Message Length field specifies the length of the message. The Time-to-Live (TTL) field has the same function as in IP networks. The Message Type field distinguishes the four message types. The Connection Reference identifies a connection locally. The Source IP Address, Destination IP Address, and Previous Node's IP Address fields specify the end hosts and the previous node, respectively. The Bandwidth field specifies the bandwidth requirement of the connection. The interface/timeslot pairs are used to program the switch fabric.
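The paper's message format fixes only the alignment rules, not the algorithms; a sketch of the checksum and pair-encoding logic, assuming an IP-style 16-bit one's-complement checksum and 8-bit interface/timeslot numbers (both assumptions, not specified in the original):

```python
import struct

def ones_complement_checksum(msg: bytes) -> int:
    # 16-bit one's-complement sum over the whole message, as in IP headers;
    # the actual checksum algorithm is an assumption, the paper does not name one.
    assert len(msg) % 4 == 0, "signaling messages are 32-bit aligned"
    total = 0
    for (half,) in struct.iter_unpack("!H", msg):
        total += half
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (~total) & 0xFFFF

def encode_pairs(pairs):
    # Each interface/timeslot pair occupies 16 bits (8 + 8, assumed widths);
    # an odd pair count gets 16 pad bits so the message stays 32-bit aligned.
    body = b"".join(struct.pack("!BB", iface, slot) for iface, slot in pairs)
    if len(pairs) % 2:
        body += b"\x00\x00"
    return body
```

For example, `encode_pairs([(3, 14)])` yields four bytes: the single pair followed by 16 pad bits.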
If there are an odd number of interface/timeslot pairs, a 16-bit pad is needed because the message is 32-bit aligned. The Checksum field covers the whole message. In the Setup-Success message, the Bandwidth field records the allocated bandwidth. In the Release and Release-Confirm messages, the Cause field gives the reason for the release. The Message Length, Message Type and Connection Reference fields are common to all messages and occupy the same relative positions; this arrangement simplifies the hardware design.

3.2 State transition diagram of a connection

[Figure 4. State transition diagram of a connection: Closed -> Setup-Sent -> Established -> Release-Sent -> Closed]

Each connection passes through a certain sequence of states at each switch. In our protocol, we define four states: Closed (0001), Setup-Sent (0010), Established (0011) and Release-Sent (0100). Figure 4 shows the state transition diagram. Initially, a connection is in the Closed state. When a switch receives a Setup message, if resources are available, it accepts the request, sends the message to the next switch on the path, and marks the state of the connection as Setup-Sent. When the switch receives a Setup-Success message, which means all switches along the path have successfully established the connection, the state of the connection changes to Established. Release-Sent means the switch has received a Release message, freed the allocated resources, and sent the message to the previous node. When the switch receives the Release-Confirm message, the connection is successfully terminated and its state returns to Closed.

3.3 Data tables

[Figure 5. Data tables used by the signaling protocol: indexes and return/written fields of the Routing, CAC, Connectivity, State and Switch-mapping tables]

There are five tables associated with the signaling protocol: the Routing table, the Connection Admission Control (CAC) table, the Connectivity table, the State table, and the Switch-mapping table, shown in Figure 5. The Routing table is used to determine the next-hop switch; the index is the destination address, and the fields include the address of the next switch and the corresponding output interface. The CAC table maintains the total and available bandwidth on the output interfaces leading to neighboring switches. The Connectivity table maps the output interfaces of neighboring switches to the local input interfaces; this information is used to program the switch fabric. The State table maintains the state information associated with each connection. The connection reference is the index into this table; the fields include the connection references and addresses of the previous and next switches, the bandwidth allocated to the connection, and, most importantly, the state information as defined in Figure 4.

Switch fabrics such as PMC-Sierra's PM5372, Agere's TDCS6440G and Vitesse's VSC9182 have similar programming interfaces. For example, the VSC9182 has an 11-bit address bus A[10:0] and a 10-bit data bus D[9:0]. The switch is programmed by presenting the output channel/timeslot number on A[10:0] and the input channel/timeslot number on D[9:0].
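In software terms, this programming model amounts to a single bus write per cross-connect; a sketch in which the entry fields mirror a generic switch-mapping row, and the bit packing and the write_reg helper are hypothetical (assuming 16 interfaces of 16 timeslots each, as in our prototype):

```python
from dataclasses import dataclass

@dataclass
class SwitchMapEntry:
    # One switch-mapping row, indexed by connection reference: the incoming
    # and outgoing interface/timeslot pairs (field names are illustrative).
    in_iface: int
    in_slot: int
    out_iface: int
    out_slot: int

def program_crossconnect(entry, write_reg):
    # VSC9182-style programming: present the output channel/timeslot on the
    # address bus A[10:0] and the input channel/timeslot on the data bus
    # D[9:0]. The 4-bit packing assumes 16 interfaces x 16 timeslots.
    addr = ((entry.out_iface << 4) | entry.out_slot) & 0x7FF
    data = ((entry.in_iface << 4) | entry.in_slot) & 0x3FF
    write_reg(addr, data)  # write_reg is a hypothetical bus-write helper
```

Mapping, say, incoming interface 3/timeslot 14 to outgoing interface 2/timeslot 5 then issues one write of data 0x3E to address 0x25.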
We define a generic Switch-mapping table to emulate this programming interface, with the connection reference as the index and the incoming and outgoing interface/timeslot pairs as the fields.

4. FPGA-BASED IMPLEMENTATION OF SIGNALING

To demonstrate the feasibility and advantages of hardware signaling, we implemented a signaling hardware accelerator in FPGAs. We used the WILDFORCE reconfigurable computing board shown in Figure 6. It consists of five XC4000-series Xilinx FPGAs: one XC4036 (CPE0) and four XC4013s (PE1-PE4). All five FPGAs can be used for user logic, while a crossbar provides programmable interconnections between them. In addition, there are three FIFOs on the board and one Dual-Port RAM (DPRAM) attached to CPE0. The board is connected to the host system through the PCI bus and supports a C-language API through which the host can dynamically configure the FPGAs and access the on-board FIFOs and RAMs.

[Figure 6. Architecture of the WILDFORCE board: host PCI bus, crossbar, CPE0 with mezzanine memory, PE1-PE4, FIFOs, SRAM and DPRAM]

For our prototype implementation, we used CPE0, PE1, FIFO0, FIFO1 and the DPRAM. CPE0 implements the signaling hardware accelerator state machine, the State and Switch-mapping tables, the FIFO0 controller, and the DPRAM controller. The DPRAM holds the Routing, CAC and Connectivity tables. FIFO0 and FIFO1 serve as receive and transmit buffers for signaling messages. PE1 implements the FIFO1 controller and provides the data path between CPE0 and FIFO1.

In the following subsections, we explain some of the simplifications we made, describe the state transition diagram of the hardware accelerator, and present two novel approaches to managing resources such as timeslots and connection references.

4.1 Reducing the table sizes

Due to the limited size of the on-board DPRAM, we reduced the size of all five tables. The prototype implementation supports 5-bit addresses for all nodes, 16 interfaces per switch, a maximum bandwidth of 16 OC-1s per interface, and a maximum of 32 simultaneous connections at each switch. The tables can easily be expanded to full size when designing a customized board. In recent years, there has been significant progress on fast table lookup in the literature and in industry. Lookup/classification co-processors are widely available, such as Silicon Access Networks' iAP, PMC-Sierra's ClassiPI and Solidum's PAX1200; these chips can easily process 65M lookups/sec. In our prototype implementation, we deferred the routing table lookup to such an external co-processor and used three memory accesses to simulate a routing table lookup.

4.2 State transition diagram of the signaling hardware accelerator

[Figure 7. State transition diagram of the signaling hardware accelerator: from the ready state, a received message is delimited and checked (checksum, TTL, parameters, state lookup), dispatched by message type through the table accesses (read route, read/write CAC, read connectivity, read/write switch-mapping, write state) and resource allocation/release steps (connection reference, timeslots), and the outgoing message is transmitted; a failed checksum, expired TTL, invalid parameter, missing route, unavailable bandwidth or state mismatch leads to the error state]

Figure 7 shows the detailed state transition diagram of the signaling hardware accelerator. When a signaling message arrives, it is temporarily buffered in FIFO0. The hardware accelerator reads the message from FIFO0 and delimits it according to the Message Length field. The Checksum field is verified, and the State table is consulted to check the current state of the connection. Based on the Message Type field, the accelerator then processes the message accordingly. Processing a Setup message involves checking the TTL field, looking up the Routing table to determine the next switch and the corresponding output interface, checking and updating the CAC table, consulting the Connectivity table to determine the input interface, allocating a connection reference to identify the connection, allocating timeslots, and programming the Switch-mapping table. The Setup-Success message requires no special processing. Processing a Release message involves updating the CAC table and releasing the timeslots reserved for the connection. When a Release-Confirm message is processed, the allocated connection reference is freed and the connection is finally terminated. After a message has been successfully processed, the State table is updated; the new message is generated, buffered temporarily in FIFO1, and then transmitted to the next switch on the path.

4.3 Managing the available timeslots

[Figure 8. Timeslot generator: the bit-vector for the selected interface is read into a scratch register, passed through a priority decoder to find the first free timeslot, marked used, and written back; in the example, timeslot 14 of interface 3 is allocated]

Figure 8 illustrates the timeslot generator used to manage the available timeslots. Each entry in the timeslot table is a bit-vector corresponding to an output interface, with the bit position determining the timeslot number and the bit value determining the availability of that timeslot ('0' available, '1' used). A priority decoder selects the first available timeslot. The table works as follows. When the signaling state machine supplies an interface number and the allocate control signal (the timeslot number is ignored), the bit-vector corresponding to that interface is sent to the priority decoder and the first available timeslot is returned. The bit corresponding to that timeslot is then marked as used (changed from 0 to 1) and the updated bit-vector is written back to the table. Deallocating a timeslot follows a similar pattern, except that the supplied timeslot number determines the bit position to clear. In the example of Figure 8, timeslot 14 of interface 3 is allocated.

4.4 Managing the connection references in hardware

A connection reference is used to identify a connection locally. It is allocated when a connection is established and deallocated when the connection is terminated. A straightforward implementation of a connection reference generator is a bit-vector combined with a priority decoder: the priority decoder finds the first available bit position (a bit marked '0'), outputs its index as the connection reference, and marks the bit as used ('1'). However, this approach is impractical when there can be up to 2^12 simultaneous connections, which requires a bit-vector with 4096 entries.

Figure 9 illustrates our improvement to this basic approach. A table with 256 entries of 16-bit vectors records the availability of connection references. The bit-vector at the current table pointer is sent to the priority decoder, which returns the first available bit position. The 8-bit table pointer concatenated with the 4-bit bit position forms a 12-bit connection reference. The corresponding bit is marked as used (from 0 to 1) and the updated bit-vector is written back to the table. If all bits in the current bit-vector are marked as used, the pointer is incremented by one. Deallocation follows a similar pattern: the bit corresponding to the connection reference is unmarked (from 1 to 0) and the updated bit-vector is written back. This approach can be parallelized by partitioning the table into several smaller tables, each with its own pointer and priority decoder, forming several smaller generators that work concurrently; a round-robin counter chooses a connection reference from among the generators.

[Figure 9. Connection reference generator]

4.5 Simulation

We developed a prototype VHDL model of the signaling hardware accelerator, used the Synplify tool to synthesize the design, and used the Xilinx Alliance tools to place and route it. CPE0 (XC4036) uses 62% of its resources, while PE1 (XC4013) uses 8% of its resources.

We performed timing simulations of the signaling hardware accelerator using the ModelSim simulator. The simulation result for the Setup message is shown in Figure 10. While receiving and transmitting a Setup message (requesting a bandwidth of OC-12 at a cross-connect rate of OC-1) consume 12 clock cycles each, processing the Setup message consumes 53 clock cycles. Overall, this translates into 77 clock cycles to receive, process and transmit a Setup message. Processing the Setup-Success, Release and Release-Confirm messages consumes about 70 clock cycles in total, since these messages are much shorter and simpler. A detailed breakdown of the clock cycles consumed by each kind of signaling message is given in Table 1.

[Figure 10. Timing simulation of the Setup message]

Table 1. Clock cycles consumed by the various messages

  Message:       Setup    Setup-Success   Release   Release-Confirm
  Clock cycles:  77-101*  9               51        10

  * The upper bound corresponds to a worst-case search through a four-option routing table.

Assuming a 25 MHz clock, this translates into 3.1 to 4.0 microseconds for Setup message processing and 2.8 microseconds for Setup-Success, Release and Release-Confirm message processing combined. Thus, a complete setup and teardown of a connection consumes about 6.6 microseconds. Compare this with the 1-2 milliseconds it takes to process signaling messages in software. We are currently optimizing the design to operate at 100 MHz, thereby reducing the processing times even further. We are also exploring pipelined processing of signaling messages, with selective duplication of the data path, to further improve the throughput.

5. CONCLUSIONS

Implementing signaling protocols in hardware poses considerably more problems than implementing user-plane protocols such as IP and ATM. Our implementation has demonstrated hardware handling of functions such as parsing out the various fields of messages, maintaining state information, and writing resource availability tables and switch-mapping tables, all of which are operations not encountered when processing IP or ATM headers. We also demonstrated the significant performance benefit of implementing signaling protocols in hardware: call handling within a few microseconds. This preliminary implementation of our signaling protocol in VHDL shows quite promising results. Our current work is to implement CR-LDP for SONET networks in hardware. Overall, this prototype implementation of a signaling protocol in FPGA hardware has demonstrated the potential for 100X-1000X speedup vis-à-vis software implementations on state-of-the-art processors. The impact of this work can be quite far-reaching, allowing circuit-switched networks to support a variety of new applications, possibly ones with short call holding times.

ACKNOWLEDGMENTS

We thank Reinette Grobler for specifying the signaling protocol, and Brian Douglas and Shao Hui for the first VHDL implementation of the protocol.

REFERENCES

[1] T. Russell, Signaling System #7, 2nd edition, McGraw-Hill, New York, 1998.
[2] The ATM Forum Technical Committee, "User Network Interface Specification v3.1," af-uni-0010.002, Sept. 1994.
[3] The ATM Forum Technical Committee, "Private Network-Network Interface Specification v1.0 (PNNI 1.0)," af-pnni-0055.000, March 1996.
[4] L. Andersson, P. Doolan, N. Feldman, A. Fredette, B. Thomas, "LDP Specification," IETF RFC 3036, Jan. 2001.
[5] B. Jamoussi (editor), et al., "Constraint-Based LSP Setup using LDP," IETF RFC 3212, Jan. 2002.
[6] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, "Resource ReSerVation Protocol (RSVP) Version 1 Functional Specification," IETF RFC 2205, Sept. 1997.
[7] E. Mannie (editor), "GMPLS Architecture," IETF Internet Draft, draft-many-gmpls-architecture-00.txt, March 2001.
[8] E. Mannie (editor), et al., "GMPLS Extensions for SONET and SDH Control," IETF Internet Draft, draft-ietf-ccamp-gmpls-sonet-sdh-01.txt, June 2001.
[9] P. Ashwood-Smith, et al., "Generalized MPLS - Signaling Functional Description," IETF Internet Draft, draft-ietf-mpls-generalized-signaling-05.txt, July 2001.
[10] P. Ashwood-Smith, et al., "Generalized MPLS Signaling - CR-LDP Extensions," IETF Internet Draft, draft-ietf-mpls-generalized-cr-ldp-03.txt, May 2001.
[11] P. Ashwood-Smith, et al., "Generalized MPLS Signaling - RSVP-TE Extensions," IETF Internet Draft, draft-ietf-mpls-generalized-rsvp-te-04.txt, July 2001.
[12] M. Gokhale, et al., "SPLASH: A Reconfigurable Linear Logic Array," Intl. Conf. on Parallel Processing, 1990.
[13] A. K. Yeung, J. M. Rabaey, "A 2.4 GOPS Data-Driven Reconfigurable Multiprocessor IC for DSP," Proc. of Intl. Solid State Circuits Conference, pp. 108-109, 1995.
[14] A. DeHon, "DPGA-Coupled Microprocessors: Commodity ICs for the Early 21st Century," Proc. IEEE Symposium on FPGAs for Custom Computing Machines, Apr. 1994.
[15] D. Clark and B. Hutchings, "Supporting FPGA Microprocessors Through Retargetable Software Tools," Proc. IEEE Symposium on FPGAs for Custom Computing Machines, Apr. 1996.
[16] J. E. Vuillemin, P. Bertin and D. Ronchin, "Programmable Active Memories: Reconfigurable Systems Come of Age," IEEE Trans. on VLSI Systems, Mar. 1996.
[17] D. Bertsekas and R. Gallager, Data Networks, Prentice Hall, 1987.
[18] P. Pan and H. Schulzrinne, "YESSIR: A Simple Reservation Mechanism for the Internet," IBM Research Report RC 20967, Sept. 2, 1997.
[19] G. Hjalmtsson and K. K. Ramakrishnan, "UNITE - An Architecture for Lightweight Signaling in ATM Networks," Proc. of IEEE Infocom'98, San Francisco, CA, 1998.
[20] T. Hellstern, "Fast SVC Setup," ATM Forum Contribution 97-0380, 1997.
[21] L. Roberts, "Inclusion of ABR in Fast Signaling," ATM Forum Contribution 97-0796, 1997.
[22] R. Ramaswami and A. Segall, "Distributed network control for wavelength routed optical networks," Proc. of IEEE Infocom'96, San Francisco, CA, Mar. 24-28, 1996, pp. 138-147.
[23] M. Veeraraghavan, G. L. Choudhury, and M. Kshirsagar, "Implementation and Analysis of Parallel Connection Control (PCC)," Proc. of IEEE Infocom'97, Kobe, Japan, Apr. 7-11, 1997.
[24] I. Cidon, I. S. Gopal, and A. Segall, "Connection Establishment in High-Speed Networks," IEEE Transactions on Networking, Vol. 1, No. 4, Aug. 1993, pp. 469-481.
[25] K. Murakami and M. Katoh, "Control Architecture for Next-Generation Communication Networks Based on Distributed Databases," IEEE Journal on Selected Areas in Communications, Vol. 7, No. 3, Apr. 1989.
[26] L. A. Crutcher and A. Gill Waters, "Connection Management for an ATM Network," IEEE Network Magazine, Vol. 6, pp. 42-55, Nov. 1992.
[27] P. Boyer and D. P. Tranchier, "A Reservation Principle with Applications to the ATM Traffic Control," Computer Networks and ISDN Systems, Vol. 24, 1992, pp. 321-334.
[28] P. Molinero-Fernandez and N. McKeown, "TCP Switching: Exposing Circuits to IP," IEEE Micro, Vol. 22, No. 1, Jan./Feb. 2002, pp. 82-89.
[29] M. Veeraraghavan and R. Karri, "Towards enabling a 2-3 orders of magnitude improvement in call handling capacities of switches," NSF proposal 0087487, 2001.