Intelligent ATM VC Management for Quality of Service Sensitive IP Flows

Thesis KOM-S-0059
Gero Dittmann
October 1999

Tutor: Dipl.-Wirtsch.-Inf. Jens Schmitt
Industrial Process and System Communications (KOM)
Prof. Dr.-Ing. Ralf Steinmetz
Institute of Computer Engineering
Department of Electrical Engineering and Information Technology
Darmstadt University of Technology, Germany

German title: Intelligentes ATM VC Management für Quality-of-Service sensible IP-Flows (Studienarbeit KOM-S-0059, Technische Universität Darmstadt)

Ehrenwörtliche Erklärung (Statutory Declaration)

I hereby declare that I have written this thesis without the help of third parties and using only the cited sources and aids. All passages taken from these sources have been marked as such. This work has not been submitted in the same or a similar form to any examination authority.

Darmstadt, 21 October 1999
Gero Dittmann

Table of Contents

Ehrenwörtliche Erklärung
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Background and Motivation
  1.2 New Approach
  1.3 Structure of the Document
2 IP over ATM
  2.1 The Internet Protocol (IP)
  2.2 Asynchronous Transfer Mode (ATM)
  2.3 Classical IP over ATM
    2.3.1 Architecture
    2.3.2 Operation
  2.4 Fore IP
  2.5 LAN Emulation
    2.5.1 Architecture
    2.5.2 Operation
  2.6 IP Multicast over ATM
    2.6.1 Data Distribution
    2.6.2 Address Resolution
3 IP QoS over ATM
  3.1 Integrated Services and RSVP
    3.1.1 Controlled-Load Service
    3.1.2 Guaranteed Service
    3.1.3 Admission Control and Policing
    3.1.4 RSVP
  3.2 Differentiated Services
    3.2.1 Expedited Forwarding PHB
    3.2.2 Assured Forwarding PHB
  3.3 ATM QoS
    3.3.1 Service Categories
    3.3.2 Service Parameters
    3.3.3 Traffic Descriptors
  3.4 Mapping IP QoS to ATM
    3.4.1 Different Approaches
    3.4.2 Connection Management
    3.4.3 Class and Parameter Mapping
4 Network Protocol Stack in UNIX
  4.1 UNIX Device Drivers
  4.2 STREAMS
  4.3 The IP STREAMS Stack
5 Project Goals
  5.1 Previous Implementation
  5.2 New Approach
6 Software Architecture
  6.1 Filter Terminology
  6.2 Kernel Module
    6.2.1 Location of Interception
    6.2.2 VC Connection Establishment
    6.2.3 Kernel Module Commands
    6.2.4 Programming Language
  6.3 ATM Driver Interfaces
    6.3.1 Underlying Best-Effort System
    6.3.2 ATM Destination Address Resolution
    6.3.3 ATM API
  6.4 Structure Overall View
  6.5 Filtering
    6.5.1 Filter Matching
    6.5.2 Relevant IP Header Fields
  6.6 User Space
    6.6.1 Programming Language
    6.6.2 API
7 The VCM Library
  7.1 Survey
  7.2 Exceptions
  7.3 Class Reference
    7.3.1 KernelInterface
    7.3.2 VCMcontrol
    7.3.3 UNIcontrol
    7.3.4 VCDVCTuple
    7.3.5 FilterRule
    7.3.6 Filter
    7.3.7 QoS
    7.3.8 VC
    7.3.9 PointToPointVC
    7.3.10 MultipointVC
    7.3.11 AddrPartyIDTuple
    7.3.12 AddressResolver
    7.3.13 SimpleAddressResolver
  7.4 Using VCM Library
    7.4.1 Egress
    7.4.2 Ingress
8 The VCM Console
  8.1 Implementation
    8.1.1 The Classes
    8.1.2 The Lists
    8.1.3 The Parser
  8.2 VCM Console Commands
    8.2.1 Notation
    8.2.2 Reference
9 Summary
10 Evaluation
11 Outlook
Appendix A: Acronyms
Appendix B: References

List of Figures

Figure 1: The ATM reference model.
Figure 2: CLIP: Inter-LIS routing.
Figure 3: VCs in an ELAN.
Figure 4: MARS with multicast mesh.
Figure 5: MARS with MCS.
Figure 6: RSVP merge.
Figure 7: QoS class mappings.
Figure 8: The IP STREAMS stack.
Figure 9: Overall view of the VCM kernel components.
Figure 10: IP and TCP/UDP header fields for flow identification.
Figure 11: The VCM library.
Figure 12: The VCM exception classes.
Figure 13: KernelInterface class.
Figure 14: VCMcontrol class.
Figure 15: UNIcontrol class.
Figure 16: VCDVCTuple class.
Figure 17: FilterRule class.
Figure 18: Filter class.
Figure 19: QoS class.
Figure 20: VC class.
Figure 21: PointToPointVC class.
Figure 22: MultipointVC class.
Figure 23: AddrPartyIDTuple class.
Figure 24: AddressResolver class.
Figure 25: SimpleAddressResolver class.

List of Tables

Table 1: Service category attributes and guarantees.
Table 2: UNIcontrol methods.
Table 3: UNIcontrol exceptions.
Table 4: FilterRule parameters.
Table 5: Filter reconfiguration methods.

1 Introduction

Over the past few years, a new form of communication has come into widespread use all over the industrialized world and has revolutionized interaction among people and between people and institutions, such as companies or public administrations: the Internet.

The Internet is based on the Internet Protocol (IP) suite. Originally developed in the 1970s for the U.S. military, it was integrated into the UNIX operating system and in this way soon spread through the academic community. It was not until the 1990s, with the invention of the World Wide Web (WWW), that the Internet started to gain considerable ground in the private and business sectors, a growth that continues today. This led to a sky-rocketing demand for data transmission capacity.

In addition, a growing demand for multimedia services, such as audio and video, can be observed today. These applications place certain Quality of Service (QoS) requirements on the transmission lines: not only high bandwidth but also real-time parameters, e.g. low delay and jitter. Besides, most multimedia traffic is bursty, which makes it inefficient to use lines with a rigidly reserved bandwidth.
In order to use their optic fibre lines efficiently for traditional telephony as well as for all kinds of data traffic at the same time, the telecommunications companies started developing a new networking technology in the 1980s that would integrate all these kinds of traffic: ATM (Asynchronous Transfer Mode). Today, ATM holds a considerable share of the backbone market.

Unfortunately, ATM was not developed with an eye on IP. Thus, many of the features that IP offers, from routing to multicast or the added QoS support, are difficult to realize on an ATM link layer, since the corresponding ATM mechanisms follow entirely different philosophies. Overcoming these problems has been the subject of research projects for several years now, starting in 1994 with Classical IP over ATM [JNP99].

1.1 Background and Motivation

One of the research focuses at KOM is the QoS support of IP, in particular the Resource Reservation Protocol (RSVP). A goal is to enable RSVP to make reservations on an ATM link layer and to solve the above-mentioned problems arising in this context. A first implementation is described in [Zin97]. This implementation was intended to serve as a proof of concept only, since it has some major drawbacks:

• The forwarding of incoming IP packets is handled in user space, which is inefficient since the packets come from device drivers in kernel space and need to be copied to and from user space.
• The packets are duplicated. While this is no violation of IP rules, it means a massive waste of transmission capacity.
• Multicast is not supported.

The third issue was addressed by a follow-up work described in [Rom98]. The first two problems can be solved by relocating the forwarding functionality to kernel space and working directly on the device drivers. This has been the main objective of this thesis.

1.2 New Approach

What is presented in this thesis is a mechanism to forward incoming IP packets in the kernel space directly to the appropriate ATM QoS connection, without any copies being sent to user space. The design aims not only at IntServ/RSVP as a user but is kept flexible enough to be easily extended to also support DiffServ or other IP QoS frameworks. The kernel module that handles all this is controlled from user space via control messages. The whole functionality is encapsulated and made accessible to programmers by a C++ library with an easy-to-use interface.

1.3 Structure of the Document

Chapters 2 to 4 give a brief overview of the technological background on which this thesis is based. Chapter 2 introduces the IP and ATM protocol standards and what has been defined for transmitting best-effort IP traffic over ATM subnets. Chapter 3 gives a survey of QoS mechanisms in IP and ATM and how they could interwork. Chapter 4 provides some basic information about UNIX network drivers. Chapter 5 defines the problems that this project tries to solve, and chapter 6 documents the design decisions made in the process and the resulting software structure. Chapter 7 describes in detail the C++ library that is the interface to the software. Chapter 8 gives an example application that uses the library to manually configure filters. In conclusion, chapters 9 to 11 summarize what has been achieved, compare it with the goals, and make suggestions for subsequent work.
2 IP over ATM

This chapter gives an overview of the different paradigms that IP and ATM networks follow. Then three techniques to make those protocols interwork are presented: Classical IP over ATM, Fore IP, and LAN Emulation.

2.1 The Internet Protocol (IP)

On the network layer, IP offers a connectionless, packet-routed data transmission service. On the transport layer there are basically two alternatives to choose from: the Transmission Control Protocol (TCP), which offers a connection-oriented service, or the User Datagram Protocol (UDP) for connectionless service.

The transport layer takes data streams and breaks them up into datagrams of up to 64 Kbytes. In practice, the usual size is 1500 bytes. The datagrams are transmitted through the network and possibly further fragmented along the way. At the destination the original datagram is reassembled by the network layer. Finally, the transport layer restores the data stream. On layer 3, no guarantees are given with respect to the order of transmitted packets. Packets of the same data stream may take different paths through the network.

IP was designed to make data transmission possible over all kinds of data link layers. Thus, no definitions below layer 3 have been made, except for the interface to layer 2.

2.2 Asynchronous Transfer Mode (ATM)

With the growing demand for data transmission in the 1980s, the telephone companies (telcos) were looking for a way to avoid ending up with a variety of networking technologies, including traditional telephony, each with its own management system. Institutionalized in the American National Standards Institute (ANSI) and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T, formerly CCITT), they developed a single new network that was supposed to replace the existing telephone system and all the data networks by offering services for all kinds of information transfer: ATM. Today, further standardization work is performed by the ATM Forum.

The basic idea behind ATM is the fragmentation of all transmitted information into small, fixed-size packets called cells. A cell is 53 bytes long: a 5-byte header and 48 bytes of payload. The connection-oriented cell switching of ATM combines the advantages of legacy network paradigms:

• It is flexible enough to handle both constant-rate audio and video traffic and variable-rate data traffic.
• Once a call is set up, all cells follow the same path to the destination. Thus, preservation of cell order is guaranteed.
• The fixed cell size makes it easy to perform switching at the hardware level at maximum speed.

ATM operates at 155 Mbps and 622 Mbps, and higher speeds have already been defined. These rates are based on the Synchronous Optical NETwork (SONET) system (a.k.a. SDH, Synchronous Digital Hierarchy) for optic fibre lines. Because of this high bandwidth, ATM holds a considerable share not only in the telco market, but also in the campus backbone market for high-speed LAN interconnection. Furthermore, ATM is intended to go all the way to the desktop and to replace today's LAN technology. It is not clear whether this goal will be reached in the future.

ATM has its own reference model, depicted in Figure 1, which differs from the OSI model and the TCP/IP model. It consists of a physical layer, the ATM layer, the ATM Adaptation Layer (AAL), and upper layers for everything the users put on top of the first three.
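Before turning to the AAL, it is worth pinning down the cell format in code. The following C++ sketch simply records the constants given above; the individual header fields (GFC, VPI, VCI, PT, CLP, HEC) are the standard UNI cell header layout, added here for illustration rather than taken from this chapter.

    // Fixed ATM cell format as described above. The field breakdown of the
    // 5-byte header follows the standard UNI layout (illustrative addition).
    #include <cstddef>
    #include <cstdint>

    constexpr std::size_t CELL_SIZE    = 53;  // total cell length in bytes
    constexpr std::size_t HEADER_SIZE  = 5;   // 5-byte header
    constexpr std::size_t PAYLOAD_SIZE = 48;  // 48 bytes of payload

    // UNI cell header fields, packed into the 40 header bits:
    //   GFC (4) | VPI (8) | VCI (16) | PT (3) | CLP (1) | HEC (8)
    struct AtmCellHeader {
        std::uint8_t  gfc;  // Generic Flow Control, 4 bits
        std::uint8_t  vpi;  // Virtual Path Identifier, 8 bits at the UNI
        std::uint16_t vci;  // Virtual Circuit Identifier, 16 bits
        std::uint8_t  pt;   // Payload Type, 3 bits
        std::uint8_t  clp;  // Cell Loss Priority, 1 bit
        std::uint8_t  hec;  // Header Error Control, 8 bits
    };

    static_assert(CELL_SIZE == HEADER_SIZE + PAYLOAD_SIZE, "53 = 5 + 48");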
The AAL is responsible for taking larger packets from the user, segmenting them into cells, transmitting them, and reassembling them at the destination. There are four defined AAL protocols for different service classes:

• AAL 1 for real-time, constant bit rate, connection-oriented traffic, e.g. uncompressed audio and video.
• AAL 2 for real-time, variable bit rate, connection-oriented traffic, e.g. compressed audio and video.
• AAL 3/4 for connection-oriented or connectionless, variable bit rate traffic with no real-time requirements, e.g. data.
• AAL 5, which serves the same service class as AAL 3/4, but with a more efficient use of the cell payload. IP packets are usually sent using AAL 5.

Figure 1: The ATM reference model (physical layer, ATM layer, ATM Adaptation Layer, upper layers).

The signalling between an ATM switch and a user system is done over the User-Network Interface (UNI) as defined by the ATM Forum. Although version 4.0 was specified in 1996, it is not safe to expect anything later than UNI 3.1 [ATM94] in deployed switches. Thus, this thesis is based on the latter.

An ATM connection is called a Virtual Circuit (VC), and several VCs are combined in a Virtual Path (VP). A VC can be set up in advance by the network administrator; this is called a Permanent Virtual Circuit (PVC). Or it can be established dynamically by the user by giving the ATM address of the destination; this is a Switched Virtual Circuit (SVC). Each cell of a virtual connection is tagged with the corresponding identifiers (VPI/VCI).

2.3 Classical IP over ATM

2.3.1 Architecture

In the Classical IP model, as defined in [JNP99], an ATM network is treated as a link layer for the IP protocol stack. ATM networks under Classical IP are divided into Logical IP Subnets (LISes) in which all the members have the same IP network/subnetwork address and subnet mask. Each member is connected to the ATM network and can communicate with other members in the same LIS directly via ATM. Thus, a mesh of PVCs and/or SVCs is established among the members of the LIS. Each member is able to map between IP addresses and ATM NSAP-format addresses using an ATM-based Address Resolution Protocol (ARP) and Inverse ARP service: ATMARP and InATMARP. One ATMARP/InATMARP server is used to provide address resolution in a unicast ATM environment for all members in the LIS. Traffic that goes from one LIS to another has to pass through a router which is a member of both LISes (Figure 2).

Figure 2: CLIP: Inter-LIS routing (an IP router that is a member of both LISes forwards traffic between LIS 1 and LIS 2; each LIS has its own ARP server).

SVC management is performed via Q.2931 as specified in the ATM Forum UNI Specification [ATM94], a broadband signalling protocol designed to establish connections dynamically at the User-Network Interface (UNI). All signalling occurs over VPI/VCI 0/5. Q.2931 connections are bidirectional, with the same VPI/VCI pair used to transmit and receive. Once a Classical IP connection has been established, IP datagrams are encapsulated using the IEEE 802.2 Logical Link Control / SubNetwork Attachment Point (LLC/SNAP) standard [Hei93] and are segmented into ATM cells using AAL5. The default Maximum Transmission Unit (MTU) is 9,180 bytes with a maximum packet size of 65,535 bytes. Classical IP normally does not support IP broadcast or IP multicast.
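To make the encapsulation overhead tangible, the following sketch (an illustration of the scheme just described, not code from this thesis) computes how many ATM cells an IP datagram occupies when it is wrapped in an 8-byte LLC/SNAP header, terminated by the 8-byte AAL5 trailer, and padded up to a multiple of the 48-byte cell payload.

    // Number of AAL5 cells needed for an LLC/SNAP-encapsulated IP datagram.
    // Assumed overheads: 8-byte LLC/SNAP header, 8-byte AAL5 trailer,
    // padding to a multiple of the 48-byte cell payload.
    #include <cstddef>
    #include <iostream>

    std::size_t cellsForDatagram(std::size_t ipBytes)
    {
        const std::size_t llcSnapHeader = 8;
        const std::size_t aal5Trailer   = 8;   // UU, CPI, length, CRC-32
        const std::size_t cellPayload   = 48;
        const std::size_t pduBytes      = ipBytes + llcSnapHeader + aal5Trailer;
        return (pduBytes + cellPayload - 1) / cellPayload;   // round up = padding
    }

    int main()
    {
        std::cout << cellsForDatagram(1500) << " cells for a 1500-byte datagram\n";  // 32
        std::cout << cellsForDatagram(9180) << " cells for the 9180-byte MTU\n";     // 192
    }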
2.3.2 Operation

Once a host knows its own ATM address and the ATM address of its ARP server, it attempts to establish a connection to the ARP server, which is used to send ARP requests and receive ARP replies. When the connection to the ARP server has been established, the ARP server sends an Inverse ARP (InARP) request on the new VC to learn the host's IP address. When an InARP reply is received, the ARP server places that host's IP-to-ATM address mapping in its ARP cache. Therefore, over time, the ARP server dynamically learns the IP-to-ATM address mappings of all the hosts in its LIS. It can then respond to ARP requests directed towards it for hosts in its LIS.

When a host wants to communicate with another host in its LIS, it first sends an ARP request to the ARP server containing the IP address to be resolved. When an ARP reply is received from the ARP server, the host creates an entry in its ARP cache for the given IP address and stores the IP-to-ATM address mapping. This ARP cache entry is marked as complete.

To ensure that all of the IP-to-ATM address mappings known by a certain host are up to date, hosts are required to age their ARP entries. A host must validate its ARP entries every 15 minutes. Any ARP entries not associated with open connections are immediately removed. A host validates its SVCs by sending an ARP request to the ARP server. An ARP server validates its entries by sending an InARP request on the VC. If a reply is not received, the ARP entry is marked invalid. Once an ARP entry is marked invalid, an attempt is made to revalidate it before transmitting. Transmission proceeds only if validation is successful. If a VC associated with an invalid ARP entry is closed, the entry is removed.

2.4 Fore IP

Fore Systems uses its own proprietary signalling protocol called Simple Protocol for ATM Network Signalling (SPANS) on the well-known VPI/VCI 0/15. On top of SPANS they offer an IP-over-ATM solution called Fore IP (see [For97], [For98], [For98B]) which allows communication using AAL4 or AAL5 with no encapsulation. It uses a broadcast mechanism for ARP and thus works without an ARP server. Also, it supports direct communication among all hosts on a physical ATM network without the use of IP routers.

Connectionless traffic, like ARP requests/responses or IP broadcast traffic, is sent using a Connectionless Server (CLS). One instance of the CLS is implemented on each Fore ATM switch. Connectionless traffic to and from the CLS is transmitted using AAL4 over the well-known VPI/VCI 0/14. IP broadcast packets and ARP requests are forwarded out every active SPANS port using the reverse path flooding algorithm, while ARP responses, of course, are forwarded to the requesting source only, using the ATM routing tables. In order to prevent the switch controller from being overwhelmed with CLS traffic, a token-based mechanism is used to limit the amount of traffic accepted by the CLS. The CLS will only forward connectionless traffic if it has a token available. Tokens are provided to the CLS at a predefined rate, up to a predefined maximum number of tokens.

Address resolution is accomplished by broadcasting an ARP packet to all hosts in the LAN. The ARP request packet contains the source IP address, the source MAC address (which in this case is an ATM address), and the destination IP address. Upon receiving an ARP request packet, each endstation examines the destination IP address.
If the destination IP address matches the IP address of the interface on which it was received, the host fills in the MAC address for this interface and returns the ARP packet to the source MAC address provided within the ARP packet. Since ATM is a connection-based network technology, there is no built-in broadcast capability. In the Fore IP implementation, the CLS provides ARP services as previously described. Endstations are configured to use the CLS VPI/VCI for all ARP traffic.

Since IP is a connectionless technology, the connection establishment from one IP endstation to another through an ATM network must be provided transparently by the ATM layer. Fore IP does this dynamically using SPANS. The following steps outline the processing of IP packets by the driver on a Fore IP endstation:

1. IP determines that the packet should be routed out the ATM interface on the local endstation.
2. The IP packet is handed from the IP stack to the ATM driver.
3. The driver checks the local ATM ARP cache for an existing connection to the destination IP address.
4. If a connection exists, the driver transmits the packet using the AAL type and VPI/VCI specified in the ATM ARP cache.
5. If a connection does not exist, an ARP request is issued and, on response, a connection is opened via SPANS.

Fore IP also supports IP multicast over ATM point-to-multipoint connections. In order for an endstation to begin receiving data from a particular IP multicast group, it must be added to the point-to-multipoint connection for that group. Endstations are added to the point-to-multipoint connection by joining the SPANS group corresponding to the IP multicast group. Since IP multicast is supported via hardware point-to-multipoint connections, there is no need for a multicast server to process each multicast packet. This has a positive impact on the performance of the system.

The biggest drawback of Fore IP is that it does not scale, since it treats the whole ATM network as a single broadcast domain. In a huge network this would first lead to excessive broadcast traffic and then to massive loss of broadcast packets as an effect of the CLS token mechanism.

2.5 LAN Emulation

2.5.1 Architecture

LAN Emulation (LANE) was designed by the ATM Forum to allow existing network protocols to run over ATM networks [ATM95]. It allows using ATM as a backbone for connecting legacy networks. It also allows multiple Emulated LANs (ELANs) to run simultaneously and independently on the same ATM network. LANE differs from other IP-over-ATM schemes in that it uses ATM as a Medium Access Control (MAC)-level protocol below the Logical Link Control (LLC), while the others use ATM as a data link protocol below IP. It uses an overlay model to run transparently across existing ATM switches and signalling protocols; it is in essence a protocol for bridging across ATM. It makes no pretense of being an internetworking protocol, nor of dealing with the issues of scalability this would involve. It is solely a LAN protocol, like Ethernet.

LANE consists of four mandatory entities:

• LAN Emulation Client (LEC)
• LAN Emulation Configuration Server (LECS)
• LAN Emulation Server (LES)
• Broadcast/Unknown Server (BUS)

The LEC is the entity in end systems which performs data forwarding, address resolution, and other control functions. The LES implements the control coordination function for the Emulated LAN.
The LES provides a facility for registering and resolving MAC addresses and/or route descriptors to ATM addresses: the LAN Emulation Address Resolution Protocol (LE ARP). Clients may register the LAN destinations they represent with the LES.

The LECS implements the assignment of individual LECs to different emulated LANs (ELANs). Based upon its own policies, configuration database, and information provided by clients, it assigns any client which requests configuration information to a particular emulated LAN service by giving the client the LES's ATM address.

The BUS handles data sent by a LEC to the broadcast MAC address ('FFFFFFFFFFFF'), all multicast traffic, and initial unicast frames which are sent by a LEC before the Data Direct target ATM address has been resolved. LES and BUS may reside on the same device and are then referred to as a co-located BUS. This configuration allows for more intelligent traffic handling.

Communication among LANE components is ordinarily handled by several types of switched virtual circuits (SVCs). Some SVCs are unidirectional; others are bidirectional. Some are point-to-point and others are point-to-multipoint. Figure 3 illustrates the various virtual circuits mentioned.

Figure 3: VCs in an ELAN (Configure Direct, Control Direct, Control Distribute, Multicast Send, Multicast Distribute, and Data Direct VCs between the LECs, LECS, LES, and BUS).

2.5.2 Operation

The following process normally occurs after a LANE client has been enabled on a host: The client sets up a connection to the LECS (a bidirectional point-to-point Configure Direct VC connection) to find the ATM address of the LES for its ELAN. LECs find the LECS by using the following methods in the listed order:

• Locally configured ATM address
• Interim Local Management Interface (ILMI)
• Fixed address defined by the ATM Forum
• PVC 0/17

Using the same VC Connection (VCC), the LECS returns the ATM address and the name of the LES for the client's emulated LAN. The client sets up a connection to the LES for its emulated LAN (bidirectional point-to-point Control Direct VCC) to exchange control traffic. Once a Control Direct VCC is established between a LEC and a LES, it remains up.

The server for the ELAN sets up a connection to the LECS to verify that the client is allowed to join the ELAN (bidirectional point-to-point Server Configure VCC). The server's configuration request contains the client's MAC address, its ATM address, and the name of the ELAN. The LECS checks its database to determine whether the client can join that LAN; then it uses the same VCC to inform the server whether the client is or is not allowed to join. If allowed, the LES adds the LEC to the unidirectional point-to-multipoint Control Distribute VCC and confirms the join over the bidirectional point-to-point Control Direct VCC. If disallowed, the LANE server rejects the join over the bidirectional point-to-point Control Direct VCC.

The LEC sends LE ARP packets for the broadcast address, which is all 1s. This sets up the VCCs to and from the BUS. LE ARP allows the LES to fulfill the basic responsibility of LANE, resolving MAC addresses into ATM addresses. This allows LECs to set up direct SVC connections to other LECs for unicast data forwarding. Broadcast/multicast traffic is sent to the BUS first and then redistributed to all the receivers.
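The data path that results from this machinery can be summarized in a few lines of code. The sketch below is illustrative only; the type and function names are my own, not the LANE API. It shows the decision a LEC makes for a unicast frame: if LE ARP has already resolved the destination MAC address, the frame is sent on the Data Direct VC, otherwise it goes to the BUS while an LE ARP request is issued.

    // Illustrative sketch of a LEC's unicast forwarding decision.
    #include <array>
    #include <cstdint>
    #include <map>
    #include <vector>

    using MacAddress = std::array<std::uint8_t, 6>;
    using AtmAddress = std::array<std::uint8_t, 20>;   // 20-byte NSAP-format address
    using Frame      = std::vector<std::uint8_t>;

    struct Lec {
        std::map<MacAddress, AtmAddress> leArpCache;   // filled by LE ARP replies

        void sendUnicast(const MacAddress& dstMac, const Frame& frame) {
            auto it = leArpCache.find(dstMac);
            if (it != leArpCache.end()) {
                // Destination already resolved: use (or set up) a Data Direct VC.
                sendOnDataDirectVc(it->second, frame);
            } else {
                // Not resolved yet: forward the initial frames via the BUS and
                // ask the LES to resolve the MAC address in parallel.
                sendToBus(frame);
                sendLeArpRequest(dstMac);
            }
        }

        // The three primitives below stand in for signalling and data transfer.
        void sendOnDataDirectVc(const AtmAddress&, const Frame&) {}
        void sendToBus(const Frame&) {}
        void sendLeArpRequest(const MacAddress&) {}
    };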
The LAN Emulation protocol defines mechanisms for emulating either an Ethernet (IEEE 802.3) or Token Ring (IEEE 802.5) LAN to attached host LECs. Supporting IP over LAN Emulation is the same as supporting IP over either of these IEEE 802 LANs, with no modification to higher-layer protocols. It should be noted, however, that LANE provides no means of directly connecting Ethernet and Token Ring emulations. A gateway is still required to bridge between them. Forwarding packets between different emulated LANs must be accomplished via routers, either ATM-attached conventional routers or a form of ATM router implementing LANE at two or more interfaces to different emulated networks (if the router is attached only to an ATM network, this configuration is called "router on a stick").

2.6 IP Multicast over ATM

Supporting IP multicast over ATM subnets faces two major problems that broadcast-capable link layers like IEEE 802.3 do not face:

1. Since ATM is connection-oriented, the endpoints that want to send IP multicast data over an ATM subnet need to use connections to transmit the data to the ATM egress devices.
2. The IP multicast addresses need to be resolved to the ATM addresses of the appropriate ATM egress devices. The unicast address resolution mechanisms fail to deliver this service because they associate only one ATM address with a given IP address, whereas an IP multicast address possibly resolves to a multitude of ATM egress addresses.

In this section existing standard solutions to these problems are introduced.

2.6.1 Data Distribution

There are two approaches to efficiently transmit multicast data to the ATM egress devices:

• Every multicast sender establishes a point-to-multipoint connection to all receivers in its multicast group, over which the multicast data is sent. This is called a "mesh". Fore IP implements this for multicast support.
• The senders transmit their data via a unicast connection to a Multicast Server (MCS). The MCS has a point-to-multipoint connection to all receivers over which the data is distributed. The BUS in the LANE architecture plays the role of an MCS.

Both approaches have their drawbacks: With an increasing number of senders, the mesh uses a lot more connections, which unnecessarily binds network resources. The MCS introduces a single point of failure and also raises questions concerning scalability. The answer to these questions might be load balancing between multiple MCSes.

2.6.2 Address Resolution

In LANE the BUS takes care of distributing multicast traffic. Thus, the clients do not need to care about multicast address resolution. With Fore IP, multicast is provided by SPANS.

In [Arm96] an infrastructure for supporting IP multicasting with Classical IP is defined. The ATM ARP server of the Classical IP model as discussed in section 2.3 is extended to associate an IP multicast address with more than one ATM address, i.e. the ATM addresses of the ATM egress devices for that IP multicast group. This server is called a Multicast Address Resolution Server (MARS). A MARS is responsible for a "cluster" of endpoints which are currently required to be members of the same LIS. Thus, external routing needs to be performed when crossing LIS boundaries.

An ATM endpoint that wants to join an IP multicast group registers with the MARS by sending a MARS_JOIN message. Leaving a multicast group works with the MARS_LEAVE message.
The MARS then adds the endpoint's ATM address to the resolution table for that particular multicast group, or removes it, respectively. An MCS registers and unregisters with a MARS using the MARS_MSERV and MARS_UNSERV messages, respectively.

A multicast sender requests the appropriate ATM addresses for its multicast group by sending a MARS_REQUEST message to the MARS. Depending on whether the group is served via a mesh or by an MCS, the MARS answers with the ATM addresses of all endpoints or with the single address of the MCS, in both cases encapsulated in MARS_MULTI messages. The sender then establishes its point-to-multipoint VC to the returned ATM addresses. It does not notice any difference between sending to a single receiver and sending to an MCS. If no entry for a particular multicast group can be found in the MARS table, a MARS_NAK is sent to the requestor. The MARS can force senders to shift from mesh distribution to an MCS at any time by issuing a MARS_MIGRATE message.

Figure 4: MARS with multicast mesh (the sender resolves the group via MARS_REQUEST/MARS_MULTI and opens a point-to-multipoint data VC to the receivers; the MARS maintains a ClusterControlVC to the cluster members).

In order to notify multicast senders of changes in the membership of their multicast group, the MARS maintains a point-to-multipoint ClusterControlVC to all cluster members and a ServerControlVC to the MCSes, over which changes are signalled. In a meshed cluster all MARS_JOIN and MARS_LEAVE messages are forwarded by the MARS to its ClusterControlVC. To an MCS, changes are signalled with the MARS_SJOIN and MARS_SLEAVE messages.

Figure 5: MARS with MCS (senders transmit over a unicast data VC to the MCS, which distributes the data over a point-to-multipoint VC; the MARS maintains a ServerControlVC to the MCS and a ClusterControlVC to the cluster members).

3 IP QoS over ATM

At this time the protocols used in the Internet provide only a "Best Effort" (BE) service, i.e. no guarantees are given for available bandwidth, maximum delay, or jitter. Because of variable queueing delays and congestion losses, real-time applications such as telephony, video conferencing, or video on demand do not work well across the Internet without such guarantees. In order to solve this problem, two groups formed within the Internet Engineering Task Force (IETF), which is the standardization body for the Internet. These groups are the Integrated Services (IntServ) group and the Differentiated Services (DiffServ) group, who follow two competing approaches.

The entity that QoS reservations are made for is called a flow. A flow is a set of packets traversing a part of a network, all of which are granted the same QoS guarantees. A packet is usually identified as belonging to a flow by a defined set of header fields, e.g. addresses, port numbers, protocols, or QoS tags. The most obvious differences between the two IETF approaches are whether flows are handled individually or aggregated, the persistence of reservations, and where in the network flow states are maintained (per-hop vs. edge-to-edge signalling).

3.1 Integrated Services and RSVP

In the Integrated Services Architecture [BCS94] a flow is identified by the IP address, transport layer protocol type, and port number of the destination, along with a list of sources identified by their IP address and port number. The sender protocol type must be the same. Guarantees for available network resources are given for every flow separately, according to requests from the end applications.
These requests can be passed to the routers by network management procedures, e.g. with the Simple Network Management Protocol (SNMP), or, which is the common case, by using a reservation protocol. Mostly this is the Resource ReSerVation Protocol (RSVP), which was designed with IntServ in mind, but others, e.g. ST-II, might be used as well. Guarantees are given for the lifetime of a flow and on a per-hop basis, i.e. every router along the path of a flow needs to keep information about the state of the flow.

The IntServ group has defined a number of QoS classes, of which two have received the most attention: Controlled-Load Service and Guaranteed Service.

3.1.1 Controlled-Load Service

Controlled-Load Service (CL) is defined in [Wro97] as providing "the client data flow with a quality of service closely approximating the QoS that same flow would receive from an unloaded network element" even when the network is actually overloaded. Normally, no packets should be lost with CL, and delay should be minimal. It is aimed at real-time applications that are able to adapt to modest variations in the network load, e.g. vic, vat, or nv. A router would provide CL using admission control mechanisms. It has to reject traffic that would lead to congestion. The specification does not quantitatively define a congestion situation; this interpretation is left to the system administrator.

3.1.2 Guaranteed Service

Guaranteed Service (GS) [SPG97] offers a service similar to a leased line, i.e. an assured level of bandwidth, a maximum end-to-end delay, and no queueing loss for conforming packets of a data flow, provided no network components fail and no routing changes occur during the lifetime of the flow. The considered delay is only the queueing delay, not the fixed delay which depends on the chosen path. GS is intended for real-time applications that are sensitive to packet delay exceeding a certain maximum.

3.1.3 Admission Control and Policing

Network nodes which provide QoS need to be able to measure their load and to enforce given traffic policies. Applications that request a service guarantee give an approximation of their traffic behavior called the Tspec (T for "traffic"). For GS and CL it is composed of the following parameters:

• Peak rate, p (bytes per second).
• Depth of a token bucket, b (bytes).
• Token bucket rate, r (bytes per second).
• Minimum policed unit, m (bytes) - all packets smaller than m are policed as being of size m.
• Maximum datagram size, M (bytes).

Also, the applications give their service requirements, called the Rspec (R for "reservation"). While for CL no Rspec is defined, for GS it has the parameters:

• Bandwidth, R (bytes per second).
• Slack term, S (ms) - allows for a trade-off between bandwidth and delay.

Based on these parameters, the nodes along the flow path decide whether they are able to provide the requested service without affecting the guarantees already granted to other flows. If sufficient resources are available, the request is accepted. Once a service request has been accepted, the flow must be monitored to check whether its traffic conforms to the given Tspec. Packets that exceed the flow's Tspec cannot be given the same guarantees as conforming packets, in order to keep them from affecting the guarantees for conforming traffic. They are either dropped instantly, or they may be tagged on the link layer to be treated like BE traffic.
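The following sketch shows how a node might police a flow against the Tspec parameters just listed. It is a generic token-bucket check under the stated parameters (r, b, p, m, M), not the policing code of any particular router implementation; the peak-rate check is only hinted at in a comment.

    // Illustrative token-bucket policer for the Tspec (r, b, p, m, M) above.
    #include <algorithm>
    #include <cstddef>

    class TspecPolicer {
    public:
        TspecPolicer(double r, double b, double p, std::size_t m, std::size_t M)
            : rate_(r), depth_(b), peak_(p), minUnit_(m), maxSize_(M),
              tokens_(b), lastTime_(0.0) {}

        // Returns true if a packet of 'size' bytes arriving at time 'now'
        // (in seconds) conforms to the Tspec.
        bool conforms(std::size_t size, double now) {
            if (size > maxSize_) return false;                 // larger than M
            const double bytes =
                static_cast<double>(std::max(size, minUnit_)); // police as >= m

            // Refill the bucket at rate r, capped at depth b.
            tokens_ = std::min(depth_, tokens_ + rate_ * (now - lastTime_));
            lastTime_ = now;

            // (A complete policer would also check the peak rate p, e.g. with
            // a second, shallow bucket; omitted here for brevity.)
            if (bytes <= tokens_) {
                tokens_ -= bytes;
                return true;
            }
            return false;                                      // exceeds Tspec
        }

    private:
        double rate_, depth_, peak_;
        std::size_t minUnit_, maxSize_;
        double tokens_;
        double lastTime_;
    };

A packet that fails this check would then be dropped or tagged for best-effort treatment, as described above.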
3.1.4 RSVP

The Resource ReSerVation Protocol was designed to enable hosts in an IntServ environment to request specific QoS from the network for particular flows. The RSVP messages are used to deliver these requests to all routers along the paths of the flows and to establish and maintain state to provide the requested service. An RSVP reservation generally applies to each node along the data path, but only in simplex style, i.e. resources are reserved in one direction only. RSVP is not a routing protocol but an Internet control protocol, like ICMP or IGMP. It is not concerned with finding the path the data will take but only with reserving forwarding resources along this path.

RSVP supports unicast as well as multicast sessions. In order to accommodate large multicast groups, reservations are receiver-initiated, and the receiver is also responsible for maintaining the reservation. This is crucial since reservations are soft state, i.e. they need to be updated at regular intervals, otherwise they will time out and be removed.

3.1.4.1 Messages

The fundamental RSVP messages are the Path and the Resv message. Path messages are originated by the traffic sender and follow the path of the data. Their purpose is to provide possible receivers with information about the traffic characteristics and about the path that the data will take, so that receivers can make appropriate reservation requests. Routers along the path update Path messages according to the resources they have available.

Receivers answer a Path message with a Resv message, which carries a reservation request to the routers along the path between sender and receiver. It takes exactly the reverse path of the data packets. The Resv message creates and maintains reservation state in each router along the way. It consists of a "flowspec" that specifies the desired QoS, and a "filterspec" that defines the flow to receive the QoS. Together this pair is called a "flow descriptor". With IntServ the flowspec includes the Rspec and the Tspec described in section 3.1.3. They are determined by the service models and are generally opaque to RSVP. The filterspec may generally define any fields in any protocol header.

The Path message contains the following information:

• Previous Hop Address - the unicast IP address of the previous hop, which is used to route the Resv messages hop-by-hop in the reverse direction.
• Sender Template - a filterspec that identifies the packets belonging to this flow.
• Sender Tspec - defines the traffic characteristics of this flow.
• Adspec (optional) - gives information about the QoS that the routers along the path are able to provide.

3.1.4.2 Merging

RSVP allows for heterogeneous receivers in a multicast tree. Without further measures, this would lead to multiple reservations for the same session being installed on, and forwarded over, the same link. RSVP avoids this by merging reservations in routers where Resv messages for the same session come in on more than one interface: only the Resv message with the greater reservation is forwarded. Figure 6 shows an example with a bandwidth parameter.

Figure 6: RSVP merge (receivers request 10, 5, and 3 Mbit/s; each router forwards only the largest reservation upstream).

3.1.4.3 Reservation Styles

For sessions with more than one sender, three reservation styles have been defined to allow reservation sharing among flows from different senders of the same session.
This makes sense, for instance, in virtual conferences where most of the time only one person is speaking:

• Wildcard Filter (WF) Style: With a WF-style reservation request, the flows from all senders of a session share a single reservation.
• Shared Explicit (SE) Style: With an SE-style reservation request, some, but not all, sender flows of a session share a single reservation. The included senders are specified by the receiver.
• Fixed Filter (FF) Style: An FF-style reservation request creates an exclusive reservation for data packets from a particular sender, not sharing resources with other senders' packets.

3.2 Differentiated Services

With the Differentiated Services Architecture [BBC+98], flows are identified only at network boundaries. Upon entry to a DiffServ-capable network, packets are mapped to a QoS class, called a "Behavior Aggregate" (BA), that has been requested for the flow they belong to. The Type-of-Service (ToS) field in the IP header is set accordingly, requiring a defined per-hop behavior (PHB) in the network, and routers in the DiffServ network provide QoS forwarding according only to the ToS byte. Thus, backbone routers do not need to maintain state for every flow traversing their networks, since flows receive aggregate QoS handling per BA. Two major PHB groups have been defined so far:

3.2.1 Expedited Forwarding PHB

The Expedited Forwarding (EF) PHB [JNP99] emulates a leased line with low delay and low packet loss. The traffic contract gives a peak data rate that is allocated for a BA. The first-hop DiffServ node needs to cut off traffic that exceeds the peak rate for a flow by dropping packets. DiffServ interior nodes handle EF packets with a "forward me first" policy in a separate queue with the lowest delay.

3.2.2 Assured Forwarding PHB

In the Assured Forwarding (AF) PHB [HBWW99] there are four defined traffic classes to which a flow may be assigned by the customer or the provider. For each AF class the DiffServ nodes allocate a certain amount of buffer space and bandwidth. Within each AF class, packets are marked with one of three drop precedence values. AF traffic shares queues with BE traffic, but in case of network congestion BE packets are dropped first, while policy-conforming assured service packets with the lowest drop precedence are dropped last. Thus, only a statistical guarantee of bandwidth and no guarantee concerning delay is granted. AF packets are guaranteed to arrive in order. Packets that exceed their flow's bandwidth are usually not instantly dropped but are marked with a higher drop precedence value. In case of network congestion these packets are then dropped with a higher probability.

3.3 ATM QoS

3.3.1 Service Categories

The following service categories have been defined for ATM:

• Constant Bit Rate (CBR)
The CBR service category is designed to support real-time applications like video and audio. CBR is used by applications that require a fixed amount of bandwidth that is continuously available during the connection lifetime. CBR provides strict delay bounds and low cell loss. In UNI 3.x this category is called BCOB-A (Broadband Class Of Bearer - A).
• Real-Time Variable Bit Rate (rt-VBR)
The rt-VBR service category is designed to support bursty traffic with strict end-to-end delay requirements, like compressed video.
• Non Real-Time Variable Bit Rate (nrt-VBR)
The nrt-VBR service category is intended for applications that have bursty traffic characteristics and do not have tight constraints on delay and delay variation. nrt-VBR does not guarantee any delay bounds, and traffic may therefore be buffered. In UNI 3.x this category is called BCOB-C.
• Available Bit Rate (ABR)
The ABR service category is intended for sources that are able to reduce or increase their information rate if the network requires them to do so. This allows them to exploit changes in the ATM layer traffic characteristics, e.g. bandwidth availability, subsequent to connection establishment. ABR guarantees only a minimum amount of bandwidth and may be limited to a specified peak emission rate.
• Unspecified Bit Rate (UBR)
The UBR service category is a best-effort service intended for non-critical applications which do not require a specified QoS. UBR sources are expected to transmit non-continuous bursts of cells. UBR shares the remaining bandwidth without any specific feedback mechanisms. The UBR service does not specify traffic-related service guarantees and does not include the notion of a per-connection negotiated bandwidth.

rt-VBR and ABR are new with UNI 4.0 and thus are not supported by UNI 3.1.

3.3.2 Service Parameters

In UNI 4.0 there are three QoS parameters used to measure the performance of the network for a given connection. These can be negotiated between end systems and networks as part of the traffic contract. The parameters are:

• Cell Loss Ratio (CLR)
The CLR captures the loss that occurs when the buffering resources overrun because of simultaneous arrivals of bursts from different connections. The CLR QoS parameter is defined on a per-connection basis as the number of lost cells divided by the total number of transmitted cells. The number of lost cells includes cells not reaching their destination, cells that are received with invalid headers, and cells whose content has been corrupted by errors. The total number of transmitted cells is the total number of conforming cells transmitted over a time period.
• Maximum Cell Transfer Delay (CTD)
The CTD is the time elapsed between the departure time of a cell from the generating end system and its arrival time at the destination. A maximum CTD is set, and cells with delays that exceed this maximum are assumed lost or unusable.
• Cell Delay Variation (CDV)
The CDV represents the difference between the maximum CTD and the minimum CTD. This metric allows the evaluation of the maximum possible delay between two consecutive cells that were deterministically spaced. It also allows the estimation of the worst possible amount of cell clumping due to queueing.

In UNI 3.x the service parameters are indicated by the 'QoS Class', which is essentially an index into a network-specific table of values for the actual QoS parameters.

• Class 0 refers to best-effort service.
• Class 1 results in leased-line behavior.
• Class 2 is designed for delay-dependent, connection-oriented traffic.
• Class 3 supports connection-oriented but delay-independent data transfer.
• Class 4 was specified for connectionless traffic.

3.3.3 Traffic Descriptors

Traffic descriptors attempt to capture the cell inter-arrival pattern for resource allocation. Each connection has two sets of traffic descriptors, one set for each direction, which are conveyed at connection setup.
The set of applicable traffic descriptors may vary depending on the connection's service category. Traffic descriptors for an ATM connection include one or more of the following parameters:

• Peak Cell Rate (PCR) is the maximum available bandwidth for a connection, in cells per second.
• Sustainable Cell Rate (SCR) is the upper bound on the average transmission rate of the conforming cells of an ATM connection over time.
• Maximum Burst Size (MBS) represents the burstiness factor of the connection. It indicates the maximum number of cells that can be transmitted by the source at PCR while still complying with the negotiated SCR.
• Minimum Cell Rate (MCR) is the minimum allocated bandwidth for a connection. It does not describe the behavior of the traffic; the connection can send traffic at a higher rate than MCR. It is used for bandwidth-on-demand services like ABR to ensure that a connection does not starve when there is no more available bandwidth.

Table 1 shows which of these parameters are applicable to which service category.

Service     Traffic          Guarantees
Category    Description      Low Cell Loss   Delay/Variance   Bandwidth
CBR         PCR              X               X                X
rt-VBR      PCR, SCR, MBS    X               X                X
nrt-VBR     PCR, SCR, MBS    X               NO               X
ABR         PCR, MCR         X               NO               X
UBR         (PCR)            NO              NO               NO
Table 1. Service category attributes and guarantees.

3.4 Mapping IP QoS to ATM

This section concentrates on IntServ/RSVP over ATM. The IETF has already come up with some RFCs on this topic: [Ber98a], [Ber98b], [GB98], [Cra98].

3.4.1 Different Approaches

There are some major differences between the architectures of ATM and IntServ/RSVP, as listed below:

• Reservation Initiation
With RSVP, the receiver reserves network resources by issuing a Resv message. In ATM, the sender establishes the connection and at the same time determines the QoS parameters to be used with this connection.
• Reservation State; Dynamic QoS; Reservation Time
While ATM makes the reservation for a node once at connection setup and implicitly keeps it stable over the connection lifetime (hard state), an RSVP client needs to refresh the reservation regularly in order to keep it (soft state). The RSVP solution allows for flexible routing changes and for QoS parameter renegotiation, which is not possible with ATM.
• Multicast
RSVP allows for multiple senders and multiple receivers in one multicast session. ATM has only point-to-multipoint connections.
• Heterogeneity
In multicast sessions with RSVP it is possible to have receivers with different QoS requirements, each getting exactly the QoS they ask for. ATM gives all receivers of a point-to-multipoint connection the same QoS.
• QoS classes and parameters
The QoS classes into which the traffic is divided, and the parameters available for these classes as described in sections 3.1 and 3.3, are different.

In the following sections, possible resolutions to these differences are introduced as discussed in the above-mentioned RFCs.

3.4.2 Connection Management

A major issue in IntServ/RSVP - ATM interworking is the set of rules for setting up and tearing down VC connections. Possible approaches will be enumerated in the following, as defined in [Cra98].

3.4.2.1 Best Effort VCs

An IntServ/RSVP over ATM system is likely to maintain a best-effort VC over which RSVP sessions are established and traffic with no QoS reservations is forwarded.

3.4.2.2 RSVP Signalling VCs

An important question is over which path to send the RSVP messages themselves.
The main ideas are:

• use the same VC as the data
• a single VC per session
• a single point-to-multipoint VC multiplexed among sessions
• multiple point-to-point VCs multiplexed among sessions

In order to avoid message loss, a QoS should be assigned to the signalling VCs.

3.4.2.3 VC Initiation

The difference in the location of connection initiation is resolved easily by forwarding RSVP messages to the ingress ATM node and letting it initiate the connection downstream. All the egress node needs to do is accept the incoming VC.

3.4.2.4 Dynamic QoS

The mismatch between the dynamic QoS reservation style of RSVP and the static ATM reservations is handled by establishing a new VC with the newly required QoS every time RSVP requests a change in the reservation parameters. In order not to drop the whole connection just because of a reservation change, the old VC remains in place until the new VC has been established. If the new VC cannot be established, the old VC continues to be used.

3.4.2.5 Heterogeneity

Heterogeneity occurs when receivers request different levels of QoS within one multicast session. Four models have been defined for dealing with this situation:

• Full Heterogeneity
A separate VC is provided for each requested QoS. This model gives every user exactly what he asked for, but at the price of requiring the most network resources and possibly sending a lot of duplicate packets over some links.
• Limited Heterogeneity
Only two alternative VCs are provided: one QoS VC and one best-effort VC. Receivers have to choose between those two. This is a trade-off between matching reservations exactly and tying up network resources.
• Homogeneity
Only one QoS VC is used for all receivers. This approach is obviously simple to implement. The drawbacks are that a receiver might be charged for a QoS that he has not requested, and that a receiver might not be added due to a lack of QoS resources although a best-effort connection would have been possible.
• Modified Homogeneity
Receivers are added to a single QoS VC unless the available resources do not allow another receiver to be added. In this case, a best-effort connection is established to the otherwise rejected receivers. This model is a compromise between the limited heterogeneity and the homogeneity models.

3.4.2.6 Flow Aggregation

It is possible to map more than one reservation to one VC, thus aggregating flows. While setting up a dedicated VC per flow results in an easy implementation, aggregation provides reservations with no setup times, and a simple solution for dynamic QoS as well as for heterogeneity. A difficult problem is what QoS to choose for aggregate VCs.

3.4.2.7 Short-Cuts

Short-cuts allow VCs to be established directly between two hosts on different IP subnets, thus crossing LIS boundaries. Short-cuts can be found with the Next Hop Resolution Protocol (NHRP). The issues here are when to establish short-cuts and with what QoS.

3.4.3 Class and Parameter Mapping

A critical operation in finding appropriate mappings from IntServ/RSVP to ATM is the translation of QoS classes and parameters. Approaches to this problem as defined in [GB98] are introduced in the following.

3.4.3.1 Service Categories

Figure 7 shows reasonable mappings from the IntServ QoS classes to the ATM service categories.

Figure 7 QoS class mappings (the IntServ classes Guaranteed Service, Controlled Load, and Best Effort mapped to the ATM service categories CBR, rt-VBR, nrt-VBR, ABR, and UBR).

These mappings follow the definitions quite naturally.
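The mapping can be expressed as a simple lookup. The following C++ sketch is illustrative only; the enumerations and the function are hypothetical and not part of the VCM software or of any API referenced in this document.

    // Hypothetical sketch of the class mapping of Figure 7 (compare section 3.4.3.2).
    enum class IntServClass { GuaranteedService, ControlledLoad, BestEffort };
    enum class AtmCategory  { CBR, rtVBR, nrtVBR, ABR, UBR };

    // One possible ATM service category per IntServ class. The alternatives
    // noted in the comments require UNI 4.0; with UNI 3.x they disappear.
    AtmCategory mapClass(IntServClass c) {
        switch (c) {
        case IntServClass::GuaranteedService: return AtmCategory::CBR;    // or rt-VBR
        case IntServClass::ControlledLoad:    return AtmCategory::nrtVBR; // or ABR
        default:                              return AtmCategory::UBR;    // Best Effort
        }
    }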
For UNI 3.x, since rt-VBR and ABR are not available, the mappings are unambiguous.

3.4.3.2 Traffic Descriptors

Only the major transformations are described here. There are a lot of pitfalls to avoid, which are described in detail in [GB98].

Guaranteed Service

When using rt-VBR for GS, the straightforward mapping of the receiver's Tspec/Rspec to the ATM traffic descriptors would be:

    PCR = p
    SCR = R
    MBS = b

In order to allow for a trade-off between bandwidth reservation and maximum delay, a more flexible mapping can be used if either the ATM ingress device can buffer a burst of size b (as given in the Tspec) or the MBS is at least b. As an additional parameter, the AVAILABLE_PATH_BANDWIDTH (APB) is taken into account, which is given by the Adspec in the Path message. The APB states the lowest bandwidth of all links traversed by the Path message. The parameter relationships should be:

    r <= SCR <= R <= PCR <= APB <= line rate
    r <= p <= APB

When transmitting GS traffic over CBR connections, PCR may be set to R if peaks can be sufficiently buffered. Otherwise, PCR must be the maximum of R and r.

Controlled Load

For CL over nrt-VBR a straightforward approach may be used:

    PCR = p
    SCR = r
    MBS = b

When using ABR VCs, MCR is set to r. Since ABR has no parameter that specifies burst behavior, additional buffer space should be provisioned in the ATM ingress device to cope with bursts.

Best Effort

For BE service over UBR there are no parameters to be determined. Still, a system administrator may decide to reserve a minimal bandwidth for BE traffic using the ABR service category.

3.4.3.3 QoS Classes

The QoS classes of UNI 3.x should be set depending on the service category used, like this:

• QoS_Class_0 for UBR and ABR (required!).
• QoS_Class_1 for CBR and rt-VBR.
• QoS_Class_3 for nrt-VBR.

3.4.3.4 Units Conversion

IntServ parameters are measured in bytes and bytes per second, whereas ATM parameters are measured in cells and cells per second. Also, the overhead introduced by the ATM headers needs to be taken into account. For an IP packet of B bytes that is encapsulated in an AAL-5 PDU with H header bytes per protocol data unit (PDU), the number C of cells it takes to carry the packet is roughly

    C = B / 48

and more precisely

    C = (H + B + 8 + 47) / 48

with the division rounded down to an integer. The '8' accounts for the AAL-5 trailer, and the '+ 47' makes the integer division round the packet up to whole cells, so that a partially filled last cell is counted as a full one.
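As a plain illustration of this conversion (a sketch, not code from the VCM implementation; the function name and types are chosen freely):

    #include <cstdint>

    // Cells needed for an IP packet of B bytes with H encapsulation header
    // bytes per AAL-5 PDU: the 8 covers the AAL-5 trailer, and adding 47
    // before the integer division rounds up to whole cells.
    std::uint32_t cellsPerPacket(std::uint32_t B, std::uint32_t H) {
        return (H + B + 8 + 47) / 48;
    }

For example, a 1500-byte packet with an 8-byte encapsulation header needs (8 + 1500 + 8 + 47) / 48 = 32 cells.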
4 Network Protocol Stack in UNIX

4.1 UNIX Device Drivers

In order to use a peripheral device, such as an ATM network interface card, a computer's operating system needs a piece of software that provides routines for the kernel and user programs to give commands to the device and to exchange information with it. This software is called a device driver. There are two categories of device drivers, depending on how the data coming from the device is organized: all devices can be handled by a character device driver, while block device drivers are specialized to handle devices with a file system. In addition, there is a special kind of device that only provides an access point to the kernel but does not represent a hardware interface, e.g. the null device, which simply discards all data passed to it. Such drivers are called pseudo drivers.

There is a standard set of operations for UNIX device drivers:

• Initialization is only called once by the kernel for every device, usually at boot time. As the name suggests, the routine does basic function tests and sets flags and counters to their reset values.
• Open prepares the device for access by a user application.
• Close is called when the user application does not need the device any more.
• Read and Write are for data exchange with the device.
• Input/output control sends control messages (IOCTLs) to the device. This is used to get status information about a device and to send configuration commands to it.
• Interrupt handler is called when the device causes an interrupt. An interrupt is the means for a device to get the attention of the kernel. It is used, for instance, when the device completes an operation and wants to let the kernel know, or when new data has arrived at the device and needs to be transferred to main memory.
• Poll is the service routine for a device that cannot generate interrupts and thus needs to be actively asked for events.
• Select is used for synchronous multiplexing.
• Strategy is used by block devices to sort the read and write requests in their queue. Character devices do not use this routine.

Drivers do not need to provide all routines. For instance, drivers for output-only devices, such as printers, would not have a read function.

The input/output control mechanism is very important because this service will later be used to configure filters in the kernel. Basically, an IOCTL message holds the command that should be executed and a data block with parameters for that command.

4.2 STREAMS

STREAMS is an architecture for device drivers that brings the idea of layered protocol stacks to the kernel. Developed in 1983 by Dennis M. Ritchie, STREAMS has since been integrated into most UNIX variants. STREAMS driver stacks consist of three categories of instances:

• A STREAMS driver
• STREAMS modules
• A STREAMS head

The STREAMS driver is like a traditional device driver that provides access procedures to hardware devices. The STREAMS head is the interface from user space to kernel space; it is addressed by system calls. In between head and driver, STREAMS modules evaluate and usually modify the data units passing through. The modules are intended to resemble, for example, layers of the OSI reference model.

Data or IOCTLs are exchanged between these instances via communication paths ('streams') over which messages are sent. To indicate which kind of information a message carries, each message has a message type, e.g. M_DATA for payload data or M_PROTO for control and protocol information. Modules can be pushed onto or popped off a stream dynamically, and automatically when a device is opened.

Modules may have more than one STREAMS interface upstream or downstream. These modules are called multiplexors. A multiplexor with multiple upstream interfaces and one downstream interface is called "N-to-1". A multiplexor with multiple downstream interfaces and one upstream interface is called "1-to-M". There are also "N-to-M" multiplexors. Multiplexors are of course much more complex than normal modules because they need to perform routing between the connected streams.

Wherever a stream connects to a STREAMS head, module, or driver, two queues are allocated by the kernel: one upstream and one downstream. Here messages wait to be handled.
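As a hedged illustration (not taken from the VCM sources), pushing a module onto a stream and linking a stream below a multiplexing driver is typically done with the standard STREAMS ioctls from <stropts.h>; the device and module names below are placeholders:

    #include <fcntl.h>
    #include <stropts.h>   // STREAMS ioctls: I_PUSH, I_LINK; declares ioctl()
    #include <unistd.h>

    int plumbExample() {
        // Open the upper (multiplexing) driver; the device name is a placeholder.
        int upper = open("/dev/exampledrv", O_RDWR);
        if (upper < 0) return -1;

        // Push a module onto the stream; it now sees every message
        // travelling between the stream head and the driver.
        if (ioctl(upper, I_PUSH, "examplemod") < 0) { close(upper); return -1; }

        // Link a lower stream underneath the multiplexing upper driver.
        int lower = open("/dev/lowerdrv", O_RDWR);
        if (lower < 0) { close(upper); return -1; }
        int muxid = ioctl(upper, I_LINK, lower);   // remember muxid for I_UNLINK
        return muxid < 0 ? -1 : muxid;
    }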
4.3 The IP STREAMS Stack

Figure 8 The IP STREAMS stack (user applications above the STREAMS heads; the tcp, udp, ip, and arp modules/drivers in the kernel; and the Ethernet and IP-over-ATM drivers above their respective network interface cards).

With STREAMS, the architecture of a device driver conveniently matches the layered reference model of IP, with a module/driver for each protocol layer. The layers are represented not only by modules because the user sends data to a driver, not to a module. Thus, each layer that should be directly accessible by the user needs to have its own driver interface. When making use of the layer 4 services, the user gives the data to the STREAMS head, which passes it either to the TCP or to the UDP module/driver. Figure 8 shows that from there it goes to the IP module/driver. Downstream from IP there are several ARP modules and link layer drivers, one for each network interface. Since both TCP and UDP pass data to the IP module and there are streams for each link layer technology, the IP module must be an N-to-M multiplexor.

5 Project Goals

5.1 Previous Implementation

The first implementation of RSVP-over-ATM at KOM is a proof of concept only [Zin97]. With [Rom98], some of the deficiencies of the previous work are eliminated, but not all of them. For implementation reasons the forwarding of the IP packets is done in user space and thus faces some major efficiency problems:

• Every packet is copied to user space, classified, and copied back to kernel space. These copy operations are very "expensive" with respect to processing time.
• The packets are only copied to user space, while the original packets are sent through the IP stack and forwarded over a best-effort VC anyway. Thus, packets belonging to an IP flow for which an ATM QoS VC has been set up are duplicated, which leads to a massive waste of network resources. In [Rom98], an attempt was made to avoid this by introducing a packet filter into the kernel.

5.2 New Approach

The major goal was to solve the above-mentioned problems by intercepting the packets of interest before they leave the node on a best-effort connection and by relocating the forwarding process to kernel space. This would avoid packets crossing the user space boundary and thus make the forwarding much more efficient. Also, packets would no longer be copied; instead, the original packet would be forwarded to the QoS VC. Packet duplication and the associated waste of bandwidth would be prevented.

The concept should be extended to not only support RSVP/IntServ but to be as flexible as possible in terms of the relevant layer 3/4 header fields for the filtering and in terms of applying filtering rules to these fields. For the user space software that controls the filtering in the kernel, an easy-to-use API should be provided that grants access to the whole functionality. This API is particularly aimed at daemons like RSVP/IntServ or DiffServ. Finally, the use of the API should be demonstrated with an example application. The project is called "Virtual Connection Manager" (VCM).

6 Software Architecture

In this chapter, the problems that were met in the concept elaboration process are described, possible solutions are enumerated, and the decisions are documented. Finally, the whole concept is illustrated.
6.1 Filter Terminology

Here are some important definitions that will be used throughout the text: A predicate defines a set of values that a header field can have, such as an IP address and a mask, or a port range. A filter rule is a complete list of predicates that are matched against a header in an "AND" fashion, i.e. every single predicate needs to match for the rule to match. A filter associates a set of filter rules ("OR" combined) with a list of VCs over which matching packets are to be forwarded.

6.2 Kernel Module

6.2.1 Location of Interception

Since known IP-over-ATM drivers cannot make QoS reservations, only best-effort packets can be transmitted over them. QoS VCs need to be set up directly using the ATM driver, and the corresponding packets have to be forwarded through it. There are several locations in the IP STREAMS stack where the data messages could be intercepted. The alternatives have to be evaluated with respect to efficiency, compatibility, programming style, and complexity of implementation.

6.2.1.1 Modification of the IP-over-ATM Driver

Since the source code of the driver has been available, modification of the driver was possible. In the IP-over-ATM driver, data messages coming from the IP driver can be associated with a flow. If there is a reservation for the flow, the packet would have to be forwarded over the corresponding VC. Otherwise, the packet would have to be sent down the best-effort VC.

6.2.1.2 Modification of the IP Module

The source code of the IP STREAMS stack has been available as well, so modifications here were another alternative. From the IP module, an additional stream can be established to the ATM driver. The IP module would then have to be modified in order to identify packets belonging to a flow with a reserved VC and forward them to the corresponding VC via the ATM driver.

6.2.1.3 STREAMS Module/Driver

A STREAMS module/driver (from now on simply called "module") can be pushed underneath the IP module. The appropriate place seems to be between the IP module and the IP-over-ATM driver, since only packets that are routed over ATM arrive here. It only needs to be determined whether a packet is to be transmitted over a QoS VC or only best-effort. Neither the IP module nor the IP-over-ATM driver should notice a difference whether the interception module is there or not. To achieve this, the interception module needs to be linked below the IP module in place of the IP-over-ATM driver. From the view of the native ATM driver, the interception module needs to behave like the IP module. Thus, it has to provide a Data Link Provider Interface (DLPI) both upstream and downstream, which is the standard IP interface to layer 2.

6.2.1.4 Evaluation

For the first two alternatives, third-party code needs to be manipulated. This is not desirable for several reasons:

• Future updates of the modified software would always need to be modified again in order to provide the same functionality as discussed here.
• The modified code is not free to be published.
• It is generally better style to introduce a new module with extended functionality than to alter various sections of existing code.

The third alternative seems more complex to implement because a whole new STREAMS module needs to be created and the DLPI communication needs to be emulated. Besides, it might be slightly less efficient because of the additional STREAMS queue management.
6.2.1.5 Conclusion

The outlined disadvantages of code modification are not acceptable, while those of the third alternative are. Thus, the development of a STREAMS module was chosen.

6.2.2 VC Connection Establishment

Somewhere in the system, the VCs with the user-requested QoS parameters have to be set up. This could be done in the kernel, by directly interfacing with the ATM driver's kernel module, or in user space, employing an API as provided by the equipment vendor.

6.2.2.1 In the Kernel

There are two alternatives for interfacing with the ATM driver's kernel module:

• Using direct procedure calls to the ATM module.
• Sending IOCTLs as generated by the user space APIs.

The first option would have been possible because of the availability of the source code for the ATM driver. However, it has not been considered, due to the reasons given in section 6.2.1.4. With the source code, the IOCTLs could have been analyzed and emulated.

6.2.2.2 In User Space

There are several user space APIs to choose from which enable the user to set up ATM connections and to send data across them. Since the routing should be done in the kernel, a question to answer is how to get the virtual circuit descriptors (VCDs), as returned by the setup routines, from user space to the kernel module in order to enable it to use the connections.

The solution found to this problem is to let the VCM kernel module intercept messages between the stream head and the kernel driver of the API used. The user space part of VCM sends a dummy data packet over every new connection which is filtered out by the kernel module. This packet is then used as a template for sending data over the connection by simply appending the data to it.

6.2.2.3 Evaluation

Both alternatives have been analyzed, and it has been found that the IOCTL protocol between the APIs and the ATM driver is very complex, while connection setup using an API in user space is a matter of a few function calls. Also, a simple way to hand over the VCDs to the kernel module can be used. Thus, it has been decided to set up VCs from user space.

6.2.3 Kernel Module Commands

In order to configure the filtering in the kernel module from user space, a set of IOCTLs had to be defined. The following IOCTLs have been found necessary (a sketch of how such a command might be sent to the driver follows the list):

• VCM_NEWFILTER inserts a new filter into the filter list and fetches the VC templates.
• VCM_DELFILTER removes a filter from the filter list.
• VCM_ADDVC2FILTER adds one or more VCs to the list of VCs over which a filter forwards matching packets.
• VCM_CHANGEFILTER removes the filtering rule from a filter and substitutes it with a new one.
• VCM_ADDFILTER inserts a new filter with a new filter rule into the filter list but copies the VC templates from an existing filter.
• VCM_CHANGEVC4FILTER removes VCs from a filter's VC list and substitutes them with new VC templates.
• VCM_DELETEVCFROMFILTER removes VCs from the list of VCs over which a filter forwards matching packets.
• VCM_EXISTFILTER scans the filter list for an identical filter and returns a success message.
• VCM_LISTFILTER prints a listing of all active filters to the console.
• VCM_FLUSH removes all filters.
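Commands like these are typically carried to a STREAMS driver inside an I_STR ioctl. The helper below is a sketch under that assumption, not the actual implementation; the command constants and the vcm_filter_t layout come from the (not shown) VCM headers.

    #include <stropts.h>   // struct strioctl, I_STR; declares ioctl()

    // Hypothetical helper: send one VCM command with its parameter block.
    int sendVcmIoctl(int vcmFd, int cmd, void *params, int len) {
        struct strioctl sic;
        sic.ic_cmd    = cmd;                          // e.g. VCM_NEWFILTER
        sic.ic_timout = 0;                            // default STREAMS ioctl timeout
        sic.ic_len    = len;                          // size of the parameter block
        sic.ic_dp     = static_cast<char *>(params);  // e.g. a vcm_filter_t
        return ioctl(vcmFd, I_STR, &sic);             // < 0 on failure
    }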
6.2.4 Programming Language

Object-oriented programming is today considered the modern paradigm for code design. With all interfaces to the operating system being in C, an object-oriented approach would have to use C++ as the programming language. Since the available driver source code is in procedural C and all resources found on kernel programming are based on C, doing kernel programming in C++ would have introduced an artificial border between two programming paradigms, which would have made debugging as well as getting the big picture of the code unnecessarily complicated. For this reason, procedural C was chosen for the kernel code.

6.3 ATM Driver Interfaces

6.3.1 Underlying Best-Effort System

Obviously, it is not necessary to invent a new best-effort IP-over-ATM system. One of the existing architectures can be chosen for the transmission of those packets for which no QoS VC has been set up. The requirements to meet are: administration should be as easy as possible; for possibly large installations, scalability is vital; and, in addition, interoperability of equipment from different vendors can be taken into account, which is only provided by non-proprietary protocols. The candidates are:

6.3.1.1 LAN Emulation (LANE)

LANE is standardized by the ATM Forum [ATM95]. Most ATM devices should support it. For installation, three servers have to be set up: LECS, LES, and BUS. This makes installation and administration quite an effort and introduces efficiency problems because of the multitude of connections between the entities. LANE allows for multiple emulated LANs per ATM network, each with its own LANE servers, which provides scalability.

6.3.1.2 Classical IP over ATM

Classical IP is standardized by the IETF [JNP99]. Again, most ATM devices should support it. Classical IP allows for multiple LISes. Only one ARP server per LIS has to be set up. Besides, a provisional mechanism has been found to dynamically resolve IP addresses to ATM addresses with Classical IP.

6.3.1.3 Fore IP

Fore IP is a proprietary protocol by Fore Systems ([For97], [For98], [For98B]). Thus, only Fore products support it. Fore IP uses the SPANS signalling protocol for the setup process, which is also Fore proprietary. With SPANS, the user is relieved of any installation work apart from assigning IP addresses and masks. Only one subnetwork can be installed per ATM network, which severely limits scalability.

6.3.1.4 Evaluation

For the proposed STREAMS module it makes no difference which kind of IP-over-ATM driver is used, as long as it talks DLPI, which all of them do. Only the name of the driver to be linked below the module has to be changed in order to switch from one protocol to the other. Thus, the decision for one of the aforementioned protocols can easily be revised at any time. They can even be used in parallel, with a separate VCM module on top of each.

LANE is the most complex solution. This makes it inconvenient to set up and use, and introduces efficiency issues. Fore IP has the least administration effort of all candidates, but this does not make a decisive difference compared to Classical IP. It does not scale well and it does not interoperate. Classical IP is easy to set up and use and is scalable. Also, it is standardized and thus does interoperate. An advantage of Classical IP for this project has been the possibility to easily resolve addresses.

6.3.1.5 Conclusion

Best-effort delivery has been tested with both Classical IP and Fore IP. For the reasons mentioned, LANE has not been considered. The VCM design is flexible enough to switch to another protocol with no software changes but only minor configuration adjustments.
6.3.2 ATM Destination Address Resolution

In order to find the ATM address that a QoS VC has to be established to, the next-hop address of the IP flow needs to be resolved to the corresponding ATM address. Obviously, the best solution would be to utilize the mechanism of the underlying IP-over-ATM protocol. Unfortunately, none of the architectures offers an API to their address resolution mechanism. Therefore, either a work-around or another approach has to be found.

6.3.2.1 LAN Emulation

The only way to access the LAN Emulation kernel module is via IOCTLs. There is a tool called elarp that displays ARP table entries. Thus, there must be a way to read the ARP table from user space. Unfortunately, no documentation for these IOCTLs has been available.

6.3.2.2 Classical IP over ATM

Also for Classical IP over ATM a set of IOCTLs is defined. In this set there seem to be commands for address resolution, since the tool cliparp is able to display ARP entries. Again, no IOCTL documentation has been available.

6.3.2.3 Fore IP

No documented IOCTLs have been found.

6.3.2.4 Static Routing Table

The address mapping can be done manually. A file has to be created with every single ARP entry in it, which can then be read by the connection management. This provides an easy way to control the routing of flows but is of course very rigid and inflexible.

6.3.2.5 Work-around Using Shell Tools

The easiest way to implement address resolution is to make use of the shell tools that come with the ATM driver, which display ARP table entries. They can be called via a system() call, the output can be redirected to a file, and the file can be read by the connection manager.

6.3.2.6 Conclusion

In the long run, the only acceptable solution is to access the address resolution mechanism of the underlying best-effort system. Apparently, the only way to do this elegantly is to generate the appropriate IOCTLs. Due to the lack of documentation for any of the three available IP-over-ATM architectures, it has been found too time-consuming for the scope of this thesis to work out the IOCTL protocol between user space and kernel and to emulate it. Static routing might be suitable for some applications, but in general it is too inflexible to be a serious alternative.

Thus, the only remaining option has been the above-mentioned work-around using the cliparp shell command. A small script has been written that extracts the ARP entries from the cliparp output. VCM calls this script, redirects the output to a file, and finally reads this file. However, this is a major limitation of this implementation and should be addressed soon.

6.3.3 ATM API

The setup of VCs for QoS reservations is done through an API to the ATM driver. The Fore Systems Network Interface Cards (NICs) that were used for this project are accessible through two APIs.

6.3.3.1 XTI

XTI (X/Open Transport Interface) is the X/Open standard for an API to transport services at layer 4 of the OSI reference model. It is described in the X/Open specification [TOG97], which is part of the UNIX specification. The XTI implementation shipped with the Fore adapters provides access to only a subset of the ATM QoS parameters.

6.3.3.2 SDAPI

The Signalling and Data Transfer Application Programming Interface (SDAPI) is Fore Systems' proprietary API to the Q.2931 signalling protocol. It grants full access to the ATM QoS features. The SDAPI has not been officially published by Fore Systems but is part of the Fore Partners program.
The SDAPI provides access to the full set of signalling functions. This includes the possibility to specify all ATM QoS parameters when setting up a new VC.

6.3.3.3 Fore Code

It would also have been possible to interface directly with the Fore source code and use the user space functions to send the IOCTLs to the kernel. This way, access is granted to all parameters the Fore driver software offers.

6.3.3.4 Conclusion

Using the Fore source code, again, faces the problems discussed in 6.2.1.4. The advantage of XTI is the fact that it is an open standard, which makes it very easy to port software that uses it to NICs of alternative vendors. Unfortunately, it offers only very limited access to the ATM QoS parameters. If it is already difficult to find appropriate mappings from IP QoS mechanisms, like RSVP, to ATM with the full set of ATM parameters, then it is hopeless to find any reasonable mappings with the reduced parameter set of XTI. Since the SDAPI is the only Fore API that offers the full set of QoS parameters, there was no choice but to use it. Being a proprietary API, no other vendor will offer the SDAPI for their NICs. Thus, when porting VCM to other NICs, a comparable API needs to be found.

6.4 Overall View of the Structure

Figure 9 depicts the whole VCM structure. The STREAMS driver le0 is the Ethernet device driver, and fa0 is the Fore IP driver. In user space there is the VCM library, which is explained in chapter 7, as an interface for programs that use the VCM functionality. One of these programs will probably be an RSVP daemon, which is used as an example here.

Figure 9 Overall view of the VCM kernel components (user space: RSVP-VIC, an RSVP daemon with the RAPI, and the VCM library with its VCM control and UNI control paths; kernel: the tcp, udp, ip, and arp STREAMS modules, the vcm module plumbed between ip and the fa0 convergence driver, the sdapi driver, and SPANS, above the Ethernet and ATM controllers; with the flexible VCM design, any other convergence driver could take the place of fa0).

6.5 Filtering

6.5.1 Filter Matching

6.5.1.1 Pattern Masking

A first idea for testing header fields for a match with a given filter rule was to introduce bit patterns and bit masks for the whole header. To test for a match, the pattern as well as the header in question would each be logically "AND" combined with the bit mask and subsequently tested for equality. This would give a very generic, efficient, and flexible way of filtering. Two major obstacles led to giving up this approach:

• Because of the option fields in the IP header, the transport header is not located at a fixed distance from the start of the IP header. Varying positions cannot be supported with this approach.
• While bit masks work perfectly with IP addresses, the layer 4 port numbers are often matched against a range of values. This also cannot be supported.

6.5.1.2 Specialized Matching Rules

Because of the problems encountered when trying to use one comparison algorithm for all header fields, three specialized matching patterns were conceived:

• Address Match
• Exact Match
• Range Match

Address match is the IP algorithm with addresses and subnet masks: the address bits are logically AND combined with a subnet mask, the same operation is performed on a table entry, and the two results are compared, just like the algorithm described in 6.5.1.1. Exact match is used for comparisons with a single value such as a protocol number. Range match aims at port numbers, which are expected to lie within a certain range.
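A minimal C++ sketch of the three predicate types, assuming the port offset gives the width of the allowed range above the lower bound; the function names and types are chosen for illustration and are not the kernel module's actual code.

    #include <cstdint>

    // Address match: compare under a subnet mask, e.g. IP source/destination address.
    bool addressMatch(std::uint32_t addr, std::uint32_t ruleAddr, std::uint32_t mask) {
        return (addr & mask) == (ruleAddr & mask);
    }

    // Exact match: a single value, e.g. the layer 4 protocol number.
    bool exactMatch(std::uint8_t value, std::uint8_t ruleValue) {
        return value == ruleValue;
    }

    // Range match: e.g. TCP/UDP ports between a lower bound and bound + offset.
    bool rangeMatch(std::uint16_t port, std::uint16_t lower, std::uint16_t offset) {
        std::uint32_t upper = static_cast<std::uint32_t>(lower) + offset;
        return port >= lower && port <= upper;
    }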
Because of the problem with the variable-length options field in the IP header, the locations of the header fields to look for are programmed into the kernel code and are not configurable by the user.

6.5.2 Relevant IP Header Fields

Of the fields in the IPv4 header as defined in [Pos81], the following have been identified as relevant for flow detection:

• Type of Service (ToS)
• Protocol
• Source Address
• Destination Address

Also, in the layer 4 TCP or UDP header, the port numbers are significant. They are given in the first 32 bits of both the TCP and the UDP header (see Figure 10).

Figure 10 IP and TCP/UDP header fields for flow identification (the IPv4 header with its Type of Service, Protocol, Source Address, and Destination Address fields, followed by the TCP/UDP source and destination ports).

The ToS byte is only considered by DiffServ; otherwise it is usually ignored in today's Internet. RSVP is sensitive to the layer 4 protocol, the source and destination addresses, and the source and destination layer 4 port numbers. VCM currently considers only the RSVP-relevant fields but can easily be extended to also handle the ToS byte or other header fields.

6.6 User Space

6.6.1 Programming Language

The goal in designing the API to the VCM kernel functionality was to make it easy to understand and use. The object-oriented programming paradigm seems to fit this requirement best. Since it is most likely that API users will program in C/C++ and the API needs to interface with the C constructs of the VCM kernel part, it was obviously the best solution to offer a C++ API.

6.6.2 API

Using VCM from other programs, for instance an RSVP daemon, should be as easy as possible. Two basic designs were considered:

• A library, offering all VCM operations as function calls.
• A daemon which runs in the background and handles commands from user processes.

Additionally, multi-user operation seemed interesting.

6.6.2.1 Single User Library

A single user library needs to be multi-threaded:

• One control thread is started when the API is initiated. It is basically responsible for the dispatch_loop() function, which periodically polls the SDAPI file descriptors for new messages. If there are any messages, it calls an SDAPI handler function that has been passed to it. This callback function updates the connection tables and calls the callback function of the corresponding VC. This method might also do some handling and then escalates the event to the responsible Filter, which, again, might update some lists. Finally, a function is called that may be overloaded by the user.
• The second thread is the user thread from which the control thread is forked off. This thread returns to the user code after each library call.

The SDAPI callback function also needs to communicate with the VC setup method in order to let it know whether the setup has been successful. The setup must not return prior to getting this information.

6.6.2.2 Single User Daemon

The single user library should be easy to wrap in a daemon process without any changes to the library. Communication with the user could be handled e.g. via sockets with a handshake mechanism.
6.6.2.3 Multiple Instances of a Single User Library

If multiple instances of the single user library are to be allowed on one system, the kernel module needs to be able to handle more than one user instance, i.e. it needs to handle multiple STREAMS queue pairs and it needs to associate signalling messages coming from the network with the right queues.

6.6.2.4 Shared Library

With a shared library, all users would share one STREAMS connection to the kernel module. Thus, new users must not initialize this connection again, which must be ensured by the library. A solution to this might be shared memory between the users. Again, signalling messages coming from the network need to be associated with the correct users.

6.6.2.5 Multi-User Daemon

A daemon for multiple users could simply use the single user library. It just needs to keep track of the connections of each user in order to associate incoming messages with the right user.

6.6.2.6 Conclusion

Since the single user library is the basis for most of the above-mentioned user interfaces, the single user library was implemented first. Because deployment in a multi-user environment seems an important scenario, the library was developed with an eye on this. Where it was obvious, the implementation was made multi-user safe. Otherwise, possible problems were indicated by comments in the source code. The single user library is a first step that should be carried on towards multi-user capability in the near future.

7 The VCM Library

7.1 Survey

The library structure reproduces the real objects that the filtering deals with as closely as possible. This should make it easy to perceive the purpose of the library objects and how they act in concert. This perception is further supported by the Unified Modeling Language (UML) diagrams in this chapter. The UML has been standardized by the Object Management Group (OMG) in [RUP97]. Figure 11 shows the UML diagram of all classes in the library.

7.2 Exceptions

When a severe error occurs in one of the VCM methods, the method throws an exception. Exceptions are C++ objects that are passed to the method caller as an error report. When an exception is thrown, the method in which this happens is aborted. The caller can see what went wrong from the exception class type. Just like normal classes, exceptions can have a number of attributes which provide more information about the error. The method caller is supposed to catch the exception, evaluate the information provided, and handle the exception accordingly. If an exception is not handled by the calling method, this method is also aborted and the exception is forwarded one level up in the caller hierarchy. If the top level is reached and the exception is still not handled there, the program exits.

Another way to report errors would be to return error numbers. The exception mechanism has been preferred mainly because of the additional information provided by exception attributes, and because class constructors cannot provide a return value. Besides, this approach conforms better to the object-oriented programming paradigm.

Figure 12 shows all exception classes defined in the VCM library. The base exception class is VCMexception, from which all other exception classes in the library are derived. This makes it possible to catch all exceptions thrown by the library by simply catching VCMexception.
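For instance, a caller that does not need to distinguish error types could write the following (a sketch; the calls inside the try block are omitted):

    try {
        // ... calls into the VCM library ...
    } catch (VCMexception &e) {
        // every exception thrown by the library derives from VCMexception,
        // so this single handler catches them all
    }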
Similarly, each class has its own base exception class, so that all exceptions thrown by one class can be caught by simply catching that base exception. VCMexception is the only globally defined exception class. The others are implemented as nested classes within the classes whose methods throw them. Section 7.3 lists, for each class, the exceptions that its methods might throw.

Figure 11 The VCM library (UML class diagram of all library classes: KernelInterface with its base classes VCMcontrol and UNIcontrol, Filter, FilterRule, VC with its subclasses PointToPointVC and MultipointVC, QoS, VCDVCTuple, AddrPartyIDTuple, and the AddressResolver/SimpleAddressResolver classes).
Figure 12 The VCM exception classes (the hierarchy below VCMexception, with the base classes FilterError, UNIerror, VCMcontrolError, VCerror, KernelError, and QoSerror and the exceptions derived from them).

7.3 Class Reference

7.3.1 KernelInterface

Figure 13 The KernelInterface class (derived from VCMcontrol and UNIcontrol; constructor KernelInterface(ipConvergenceModule : const String&, ipAddress : const String&, netMask : const String&, unit : int); referenced via ki_p by Filter and VC objects).

The KernelInterface class hides the interface to the kernel part of VCM and to the SDAPI from the user. For every ATM card (a.k.a. unit) in the system on which the user wants to run VCM, exactly one KernelInterface needs to be instantiated. A reference to this object is then passed to all objects that need to communicate with the kernel module, namely VC objects and Filter objects.

The wrapping of the kernel functionality is actually further divided into the VCMcontrol class and the UNIcontrol class, which are described in the following sections. KernelInterface incorporates these classes and their methods by being derived from both of them. The division corresponds to the two STREAMS paths into the kernel.

N.B.: Instantiating the KernelInterface class also creates an endpoint for incoming VCM connections, because the UNIcontrol accepts connection requests from remote VCM instances. Once the VCM kernel module has been installed by the VCMcontrol constructor, it forwards all incoming data to the IP module.

7.3.1.1 Attribute

The running attribute indicates whether a KernelInterface has already been instantiated on the NIC unit given by the array index. If this is the case, the library should not create a new instance; instead, the user should use the existing one.

7.3.1.2 Methods

The KernelInterface constructor is called with the following parameters:

• The driver name of the IP convergence module which is to be plumbed below the VCM driver, e.g. fa0.
• The IP address of the convergence module.
• The IP subnet mask of the convergence module.
• The ATM adapter unit on which this KernelInterface should operate.

The constructor tests the running attribute. If it is already set for the ATM adapter unit, the created VCMcontrol and UNIcontrol instances are destructed and an exception is thrown.

7.3.1.3 Exceptions

The base exception class of KernelInterface is KernelError. Currently there is only one derived exception, KernelInterfaceRunning. It is thrown by the constructor if the running attribute is already set when the constructor is called. Since KernelInterface is derived from VCMcontrol and UNIcontrol, the KernelInterface constructor may also forward exceptions thrown by these classes' constructors.
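A minimal usage sketch, assuming the constructor signature described above; the concrete driver name, addresses, and unit number are example values only.

    try {
        KernelInterface ki("fa0",            // IP convergence driver plumbed below VCM
                           "192.168.1.17",   // its IP address (example value)
                           "255.255.255.0",  // its subnet mask (example value)
                           0);               // ATM adapter unit
        // a reference to ki is then passed to the Filter and VC objects
    } catch (VCMexception &e) {
        // e.g. KernelInterfaceRunning if an instance already exists on unit 0,
        // or an exception forwarded from the VCMcontrol/UNIcontrol constructors
    }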
7.3.2 VCMcontrol

Figure 14 The VCMcontrol class (the file descriptor attributes vcmDevFd, foreip_ipFd, and foreip_arpFd, the protected doIoctl and destroy helpers, and the filter manipulation methods newFilter, addVC2Filter, changeFilter, addFilter, changeVC4Filter, deleteVCfromFilter, deleteFilter, filterExists, and flush).

The VCMcontrol class provides the interface to the VCM STREAMS driver.

7.3.2.1 Attributes

The three attributes, which are file descriptors to the VCM device and the best-effort IP convergence module, are used to plumb the VCM driver between the IP driver and the convergence driver, and to send IOCTLs to the VCM driver.

7.3.2.2 Methods

The constructor is only called by the KernelInterface class, which then acts as a facade for the VCMcontrol. Its methods are normally not called directly by the user but only by other VCM objects via a KernelInterface reference. The destroy method is only called by the destructor.

VCMcontrol offers methods for all kinds of filter instantiation and manipulation. These methods send the appropriate IOCTLs to the kernel. At this time, the methods correspond one-to-one to the set of IOCTLs as defined in section 6.2.3, except for VCM_LISTFILTER, which has been left out because of the limited operation so far. The IOCTLs are sent by the doIoctl method.

7.3.2.3 Exceptions

The exception base class for VCMcontrol is VCMcontrolError. Currently the only derived class is NoKernelModule. It is thrown by the VCMcontrol constructor if the installation of the kernel module fails.

7.3.3 UNIcontrol

Figure 15 The UNIcontrol class (the SDAPI pointer aPI_p, cardAddress, controlThreadID, the event synchronisation attributes eventCond, eventMutex, eventVCD_p, and eventType, the VC list, and the administrative methods listed in Table 2).

The UNIcontrol class provides the interface to the SDAPI.

7.3.3.1 Attributes

A pointer to the SDAPI instance is stored in aPI_p, and the ATM address of the NIC that the UNIcontrol object is working on is stored in cardAddress. The controlThreadID is a pointer to the event handling thread, which is described below. The event attributes are used for inter-thread communication.

7.3.3.2 Methods

The constructor is only called by the KernelInterface class, which then acts as a facade for the UNIcontrol. Its methods are normally not called directly by the user but only by other VCM objects via a KernelInterface reference. The only UNIcontrol constructor parameter is the unit (ATM adapter card) number.
The destroy method is only called by the destructor.

UNIcontrol offers administrative methods for the SDAPI, which are listed in Table 2.

Method            Description
getAPI_p          Returns the identifier of the API instance, which is needed for sending commands to it.
getCardAddress    Returns the ATM address of the unit this UNIcontrol is working on.
getEventCond_p    Returns the condition variable for the communication between control thread and user thread.
getEventMutex_p   Returns the mutex for the communication between control thread and user thread.
getEventVCD_p     Returns the VCD for which the last VC event has occurred.
resetEventVCD_p   Resets the event VCD to NULL; called by VC objects that have fetched their event.
getEventType      Returns the last VC event.
vcListIndex       Returns the index into the UNIcontrol.vcList for a given VCD. This is used to let a VC object set the VC reference in the list to itself on successful connection setup.
sDAPIcallback     Called whenever there is an event on one of the VCs that have been created using this UNIcontrol instance.
controlThread     Dispatched as a separate thread; polls events from the VCs that have been created using this UNIcontrol instance; when there is an event, the sDAPIcallback function is called.
UpwardSync        Called by the sDAPIcallback function; handles communication with the user thread when it is synchronously waiting for an event on connection setup.
destroy           A cleanup method which is called by the UNIcontrol destructor.
Table 2. UNIcontrol methods.

On instantiation of UNIcontrol, a thread is started that handles the incoming VC events for all connections that are set up using this UNIcontrol instance. This thread is kicked off with the controlThread function, which waits for the SDAPI to deliver events. When an event occurs, the sDAPIcallback function is called. This function does the first-level handling of the events. It updates the UNIcontrol.vcList according to the events, i.e. if a SETUP, a CONNECT, or an ADDPARTY message occurs, the corresponding VC is inserted into the list, and on occurrence of a RELEASE message, the VC is deleted from the list. Then, for outgoing VCs, the VC object is notified.

There are two different situations for the VC object notification: synchronous and asynchronous. In the asynchronous case, the VC object is not expecting any events because it has not asked for a change in the connection state. sDAPIcallback then calls VC.asyncHandler, which is an empty method that should be overloaded by the library user to take appropriate action on incoming events. In the synchronous case, the VC object has initiated a change in the state of the connection and is waiting for a confirming event. This has been implemented with the condition variable "eventCond" and the corresponding mutex "eventMutex". The VC object waits for a change of eventCond. When an event occurs that is to be signaled synchronously, sDAPIcallback calls the UpwardSync function. Here the eventVCD is set to the VC that the event occurred on, and the eventType is also set. Then the change of eventCond is broadcast to the eventCond listeners. This way the waiting VC object is unblocked. Prior to waiting for a VC event, the VC object calls UNIcontrol.resetEventVCD to make sure that it does not take an old event for a new one.
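The following is a hedged, generic C++ sketch of this synchronisation pattern with plain pthreads primitives; all names are placeholders, and the real code uses the UNIcontrol members described above.

    #include <pthread.h>

    struct EventSlot {               // stands in for the UNIcontrol event attributes
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
        void           *vcd;         // VC the last event belongs to
        int             type;        // type of the last event
    };

    void initEventSlot(EventSlot &s) {
        pthread_mutex_init(&s.mutex, nullptr);
        pthread_cond_init(&s.cond, nullptr);
        s.vcd  = nullptr;
        s.type = 0;
    }

    // Called from the control thread when an event arrives (compare UpwardSync).
    void signalEvent(EventSlot &s, void *vcd, int type) {
        pthread_mutex_lock(&s.mutex);
        s.vcd  = vcd;
        s.type = type;
        pthread_cond_broadcast(&s.cond);   // wake the waiting VC object
        pthread_mutex_unlock(&s.mutex);
    }

    // Called by a VC object waiting synchronously for the outcome of its request.
    int waitForEvent(EventSlot &s, void *myVcd) {
        pthread_mutex_lock(&s.mutex);
        while (s.vcd != myVcd)             // ignore events meant for other VCs
            pthread_cond_wait(&s.cond, &s.mutex);
        int type = s.type;
        s.vcd = nullptr;                   // compare resetEventVCD_p
        pthread_mutex_unlock(&s.mutex);
        return type;
    }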
The way communication between the threads is handled at this time is not 100% multi-user safe, since new events might overwrite older events before these have been received by the associated VC objects. In order to make this safe, the mutex construct should be extended so that a new event cannot be written before the old one has been read.

UpwardSync, sDAPIcallback, and controlThread need to be declared static because they interface with legacy C code. Static declaration is the only way to avoid linkage errors with legacy C code.

7.3.3.3 Exceptions

The base exception class for UNIcontrol is UNIerror. Table 3 shows the seven exceptions derived from it, which are all thrown by the UNIcontrol constructor.

Exception      Cause
NoAddress      The ATM address of the NIC unit could not be determined.
NoAPI          The SDAPI user instance could not be created.
NoSDU          No SDU could be created in order to listen for incoming calls.
CannotListen   Listening for incoming calls failed.
NoThread       The control thread could not be created.
NoMutex        The mutex for inter-thread communication could not be created.
NoCond         The condition variable for inter-thread communication could not be created.
Table 3. UNIcontrol exceptions.

7.3.4 VCDVCTuple

Figure 16 The VCDVCTuple class (a vcd/VC pointer pair used as an entry in the UNIcontrol.vcList, with get/set methods and the comparison operators needed for sorted lists).

This class associates a VC object with the corresponding VCD. It is used as an entry in the UNIcontrol.vcList. There are only get and set methods defined, plus two operators that are needed to keep objects in a sorted list like the UNIcontrol.vcList.

7.3.5 FilterRule

Figure 17 The FilterRule class (a wrapper around vcm_filter_t, with a constructor taking the parameters of Table 4 and set methods for the individual header fields).

The FilterRule class wraps the vcm_filter_t structure defined in vcm_dev.h. The rule element of this structure constitutes the IP header values for which a matching IP packet is to be forwarded over certain VCs. The relevant header fields have been identified in section 6.5.2. For the constructor and the set methods, the FilterRule class accepts values in the format given in Table 4.

Parameter                        Format
IP source address                ipaddr_t as defined in <inet/ip.h>
IP source subnet mask            ipaddr_t
Layer 4 source port              u_short; the lower border of the allowed port range
Layer 4 source port offset       u_short; defines the range within which the port numbers may lie
IP destination address           ipaddr_t
IP destination subnet mask       ipaddr_t
Layer 4 destination port         u_short; the lower border of the allowed port range
Layer 4 destination port offset  u_short; defines the range within which the port numbers may lie
Layer 4 protocol                 either IPPROTO_TCP or IPPROTO_UDP
Table 4. FilterRule parameters.
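A sketch of constructing a rule, assuming the constructor takes its arguments in the order of Table 4; the addresses and ports are example values, and details such as the byte order handling of ipaddr_t may differ in the real code.

    #include <netinet/in.h>   // IPPROTO_UDP
    #include <arpa/inet.h>    // inet_addr

    // Example: match UDP packets from 192.168.1.0/24 to one host,
    // destination ports 5004 to 5005, any source port.
    FilterRule rule(inet_addr("192.168.1.0"),      // IP source address
                    inet_addr("255.255.255.0"),    // IP source subnet mask
                    0, 0xffff,                     // source port lower bound and offset (any port)
                    inet_addr("192.168.1.42"),     // IP destination address
                    inet_addr("255.255.255.255"),  // IP destination subnet mask (single host)
                    5004, 1,                       // destination port lower bound and offset
                    IPPROTO_UDP);                  // layer 4 protocol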
7.3.6 Filter
Figure 18: Filter class (class diagram).

7.3.6.1 Attributes
As defined in section 6.1, the Filter class associates a set of FilterRule objects with a set of VCs over which packets that match one of the rules are forwarded. The rules are recorded in filterRuleSet, and the VCs are recorded in vcSet. Additionally, the order in which the VCs have been set up is logged in vcSetupOrder. Since Filter needs to send IOCTLs to the VCM kernel driver, it has a reference to a KernelInterface object in ki_p.

7.3.6.2 Methods
The constructor is called with a list of FilterRules, a list of VCs, and a KernelInterface reference, and initializes the attributes accordingly. The VC.setFilterInCharge method is called for each enlisted VC, which invokes the VC.setup method if the VC is not open already. If this is successful, the VC is inserted into the Filter's vcSet. Accordingly, the destructor deregisters with all VCs. When the last Filter deregisters, the VC is closed.

There are three methods which are called by the control thread when an unexpected event occurs on a VC that is associated with this Filter instance. If it is a RELEASE message, unexpectedRelease is called, which removes the VC from the VC lists. For a DROPPARTY event the unexpectedDropParty method is called. Both methods then call callbackOnAsync.

Filter also offers methods to reconfigure the rule set and the VC set. These are listed in Table 5.

addRule: adds a FilterRule to the filterRuleSet.
addVC: adds a VC to the vcSet.
changeRule: adds a new FilterRule and removes an old one from the filterRuleSet.
changeVC: adds a new VC and removes an old one from the vcSet.
removeRule: removes a FilterRule from the filterRuleSet.
removeVC: removes a VC from the vcSet.
changeRuleSet: adds a set of new FilterRules and removes all old rules from the filterRuleSet.
changeVCSet: adds a set of new VCs and removes all old VCs from the vcSet.
Table 5: Filter reconfiguration methods.
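A minimal sketch of swapping one VC of a live Filter for another, using changeVC from Table 5: the filter and VC variables are assumed to exist already, and the exception classes used in the catch clauses are described in the following section.

    // Sketch: replace oldVC by newVC on a running Filter.
    try {
        filter->changeVC(oldVC, newVC);   // adds newVC first, then removes oldVC
    } catch (AddVCerror& e) {
        // newVC could not take over all rules; oldVC has not been removed yet
    } catch (RemoveVCerror& e) {
        // oldVC could not be unlinked cleanly from every rule
    }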
7.3.6.3 Exceptions
The base exception class for Filter is FilterError. Derived from it are seven exceptions that may be thrown by the Filter methods. In addition, some exceptions from other classes may be propagated:

• The Filter constructor throws a NewFilterError exception if it failed to open one of the VCs or to install one of the FilterRules. The affected VCs are reported in NewFilterError.vcs, and the affected FilterRules are listed in NewFilterError.rules.
• The Filter destructor throws a DeleteFilterError exception if it failed to remove a FilterRule or to close a VC. The objects concerned are given in DeleteFilterError.vcs and DeleteFilterError.rules, respectively.
• addVC throws an AddVCerror exception if it fails to apply a FilterRule to the VC. The FilterRules that could not be applied are listed in AddVCerror.rules. addVC may also forward an exception thrown by VC.setup.
• changeVC propagates exceptions thrown by addVC or removeVC. If an addVC exception occurs, the old VC has not yet been removed.
• removeRule throws a LastRule exception if there is only one FilterRule left in this Filter.
• removeVC throws a RemoveVCerror exception if the VC could not be unlinked from a FilterRule or if this was the last VC of this Filter. The FilterRules that could not be unlinked are listed in RemoveVCerror.rules. If it was the last VC, this is indicated in the RemoveVCerror object; note that in this case the VC has nevertheless been removed. removeVC also propagates exceptions thrown by VC.close.
• changeRuleSet throws an IncompleteRuleChange exception if FilterRules could not be removed or installed. The FilterRules concerned are listed in IncompleteRuleChange.notAdded and IncompleteRuleChange.notDeleted, respectively.
• changeVCSet throws an IncompleteVCchange exception if a VC could not be set up or closed, or if a VC could not be associated with a FilterRule. The corresponding objects are listed in IncompleteVCchange.failedVCs, IncompleteVCchange.failedRules, and IncompleteVCchange.unclosedVCs, respectively.

7.3.7 QoS
Figure 19: QoS class (class diagram).
This class is a wrapper for the sd_sdu_t structure defined by the SDAPI. The QoS parameters in the SDU are set by the class constructor. The VCM library user generates a QoS object for every VCC he intends to open. The QoS object is passed to the VC object's constructor. For setting up a connection, the VC object extracts the SDU with the getSdu_p method.

There is a QoSerror exception base class defined in the QoS class. The only derived exception class is CannotCreateSDU, which is thrown by the QoS constructor if it cannot obtain an SDU structure to fill in the QoS parameters.
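For illustration, a QoS object for a constant-bit-rate connection might be created as sketched below, using the constructor signature shown in Figure 19. The enumerator names for the traffic type and QoS class, as well as the rate value, are assumptions; the actual values are defined in the library and SDAPI headers.

    // Sketch: QoS parameters for a CBR connection with a peak cell rate of
    // 10000 cells/s (enumerator names and rate value are illustrative only).
    QoS* qos = new QoS(CBR, qos_class_1, 10000 /* pcr */);
    // The object is later passed to a PointToPointVC or MultipointVC
    // constructor, which extracts the SDU via qos->getSdu_p().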
7.3.8 VC
Figure 20: VC class (class diagram).
The VC class is the abstract base class for two kinds of VCs: point-to-point VCs and point-to-multipoint VCs. Using this base class, references to a VC can be made independently of which of the two kinds it is. This is useful, for instance, for the VC lists in the Filter and UNIcontrol classes.

7.3.8.1 Attributes
Since VCs need to communicate with the kernel in order to set up a connection, they have a reference to a KernelInterface instance in ki_p. The ATM address of the remote end is given in destAddr_p, and the QoS parameters are given by the object in qos_p. When the VC is set up, the open flag is set to TRUE and vCD_p is set to the value returned by the SDAPI. For point-to-multipoint VCs, the partyID_p is also set by the SDAPI. When a Filter registers with the VC, an entry that points at that Filter is added to the filtersInCharge list.

7.3.8.2 Methods
When calling the constructor of a VC, the ATM address of the remote end, a QoS object, and a reference to a KernelInterface are passed. The connection is opened when the first Filter registers with the VC using the setFilterInCharge method, which then calls setup. This is an abstract method that needs to be defined by the derived classes. They call the protected setup method with a specific parameter indicating whether it is a point-to-point or a point-to-multipoint connection. The protected setup method then initiates connection setup with the SDAPI and waits for the CONNECT message before returning.

Once the connection is set up, some dummy data can be sent over it with the sendData method. The IOCTL that results from this call is intercepted by the VCM kernel module and then used as a template for sending data over the VCC, as suggested in section 6.2.2.2.

When the VC is handed over to a Filter object, the Filter calls the setFilterInCharge method, pointing at itself. The asyncHandler is called by the control thread when an unexpected event has occurred on the VC. If it is a RELEASE event, the asyncHandler sets the open flag to FALSE and calls the unexpectedRelease method of the registered Filters in filtersInCharge. The connection is closed with the close method, which is called at the latest by the VC destructor or when the last Filter deregisters using the removeFilterInCharge method.

7.3.8.3 Exceptions
The VC exception base class is VCerror. There are exception classes derived from it that are thrown by the VC constructor, and one thrown by the VC destructor. If the setup call to the SDAPI fails, the constructor throws a CannotSetupVC exception with the SDAPI error number in CannotSetupVC.error. If the CONNECT request is answered with a RELEASE message, the constructor throws a GotReleaseMsg exception. If the request is answered with another negative message, a GotUnknownMsg is thrown. The VC destructor throws a CannotCloseVC exception if the close call to the SDAPI fails. Again, the error number is given in CannotCloseVC.error.

7.3.9 PointToPointVC
Figure 21: PointToPointVC class (class diagram).
The only modification that PointToPointVC makes to the base class VC concerns the two setup functions. The abstract public setup method is implemented to simply call the protected setup method, while the protected setup method is overloaded to call its base class equivalent with the parameter set for a point-to-point connection.
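The two-level setup pattern can be sketched as follows. The point-to-point flag name is an assumption (only SD_SETUPFLAG_PMP is documented for the multipoint case in section 7.3.10), so this illustrates the structure rather than the actual code.

    // Sketch of the setup forwarding in PointToPointVC (flag name assumed).
    void PointToPointVC::setup() {               // implements the abstract method
        setup(0);                                // dummy argument, see Figure 21
    }

    int PointToPointVC::setup(int /* dummy */) { // protected overload
        VC::setup(SD_SETUPFLAG_P2P);             // base class performs the SDAPI setup
        return 0;
    }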
7.3.10 MultipointVC
Figure 22: MultipointVC class (class diagram).

7.3.10.1 Attributes
In addition to the attributes of the base class VC, the MultipointVC class keeps a list of the ATM addresses and party IDs of the remote ends. This list is called addrPartyList and its elements are AddrPartyIDTuple objects.

7.3.10.2 Methods
Like the PointToPointVC class, the MultipointVC class implements the abstract public setup method and overloads the protected setup method. The public method calls the protected one and then adds the connection to the addrPartyList. The protected method calls its base class equivalent with the parameter set to SD_SETUPFLAG_PMP.

The asyncHandler method is also overloaded. When a RELEASE message occurs, it is handled just as in the base class. In addition, when a DROPPARTY event occurs, the party is deleted from the addrPartyList and unexpectedDropParty is called on the Filters in filtersInCharge. Moreover, a destination can be added to a MultipointVC using addDest, and a destination can be dropped by calling deleteDest.

7.3.10.3 Exceptions
The MultipointVC methods throw the same exceptions as the VC base class.

7.3.11 AddrPartyIDTuple
Figure 23: AddrPartyIDTuple class (class diagram).
This class associates the ATM address of a remote MultipointVC end with the corresponding party ID. It is used as an entry in the MultipointVC.addrPartyList. Only get and set methods are defined, plus the two comparison operators that are needed to keep objects in a sorted list like the MultipointVC.addrPartyList.

7.3.12 AddressResolver
Figure 24: AddressResolver class (class diagram).
The only purpose of this class is to provide a uniform interface, the resolve method, for resolving an IP address to a next-hop ATM address. The class is declared abstract to allow for different resolution mechanisms. An implementation of this class is instantiated by the library user in order to resolve the IP destination address of a flow to an ATM address that can be passed to a VC class constructor.

7.3.13 SimpleAddressResolver
Figure 25: SimpleAddressResolver class (class diagram).
This implementation of an AddressResolver simply runs a shell script called "resolve_addr", which uses the Fore shell program cliparp to resolve an IP address to a next-hop ATM address. The output of the script is redirected to a file, from which the result is read and returned to the caller.
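Because AddressResolver is abstract, other resolution mechanisms can be plugged in by deriving from it. The following sketch of a resolver backed by a locally configured table is purely hypothetical; only the resolve interface comes from the library.

    #include <map>
    #include <netinet/in.h>   // in_addr_t

    // Hypothetical resolver that looks up the next-hop ATM address in a
    // statically configured table instead of calling cliparp.
    class TableAddressResolver : public AddressResolver {
    public:
        void addEntry(in_addr_t ip, atm_addr_t* atm) { table[ip] = atm; }

        atm_addr_t* resolve(in_addr_t ipAddr) {
            std::map<in_addr_t, atm_addr_t*>::iterator it = table.find(ipAddr);
            return (it != table.end()) ? it->second : NULL;   // NULL if unknown
        }

    private:
        std::map<in_addr_t, atm_addr_t*> table;   // IP address -> ATM address
    };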
7.4 Using the VCM Library

7.4.1 Egress
On an ATM egress device that VCM connections terminate at, all that needs to be done is to instantiate a KernelInterface. It will accept incoming VCM connection requests and forward the data to the IP protocol stack. On an ingress device, a KernelInterface instance is present anyway; thus, every ingress device also accepts connections from other ingress devices.

7.4.2 Ingress
The first step on the ingress side is to overload Filter.callbackOnAsync as needed in order to handle unexpected events on the VCs, i.e. RELEASE or DROPPARTY messages. Overloading this method is optional.

The basic classes to be instantiated are a KernelInterface and a SimpleAddressResolver. Then, for each Filter, one or more FilterRules need to be defined. After setting their parameters, the rules are inserted into a Filter::RuleList. For each VC of the Filter, a QoS object is created and passed to the PointToPointVC or MultipointVC constructor. The VCs are inserted into a Filter::VCList. Both lists are then passed to the Filter constructor, where the possible NewFilterError exception needs to be caught. If no exception occurs, IP over ATM traffic is now forwarded according to the newly installed Filter. The Filter may be modified using the methods described in Table 5, and parties may be added to or deleted from a MultipointVC using the appropriate MultipointVC methods (see section 7.3.10). A complete sketch of these steps follows.
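The sketch below strings the steps together on the ingress side. All addresses, rates, and the helper makeExampleRule from section 7.3.5 are illustrative, the traffic type and QoS class enumerators are assumptions, the list insertion calls are written in STL style and may differ from the actual RuleList/VCList interface, and error handling is reduced to the constructor exception mentioned above.

    #include <arpa/inet.h>

    // Sketch of an ingress-side setup (illustrative values and names only).
    int installExampleFilter() {
        KernelInterface*       kernel   = new KernelInterface();
        SimpleAddressResolver* resolver = new SimpleAddressResolver();

        // 1. Define the filter rules.
        Filter::RuleList rules;
        rules.push_back(makeExampleRule());                  // from section 7.3.5

        // 2. Resolve the destination and create a VC with QoS parameters.
        atm_addr_t* dest = resolver->resolve(inet_addr("10.0.2.5"));
        QoS*        qos  = new QoS(CBR, qos_class_1, 10000); // assumed enumerators
        Filter::VCList vcs;
        vcs.push_back(new PointToPointVC(dest, qos, kernel));

        // 3. Combine rules and VCs in a Filter; this opens the VC and installs
        //    the rules in the kernel module.
        try {
            Filter* filter = new Filter(rules, vcs, kernel);
            (void)filter;  // kept alive for as long as forwarding is wanted
        } catch (NewFilterError& e) {
            // inspect e.vcs and e.rules to see what could not be set up
            return -1;
        }
        return 0;
    }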
8 The VCM Console
The VCM Console is an example program that uses the VCM library. It provides access to the VCM through a command-line interface. The user can manually configure filter rules and VCs and combine them in a filter.

8.1 Implementation
Because of the procedural character of a command-line parser, the VCM Console is written in a procedural rather than an object-oriented style. Only a few classes are defined; most of the work is done in a single large main procedure.

8.1.1 The Classes
First of all, the Filter.callbackOnAsync method is overloaded by the class "MyFilter". Here, an error message is produced if an unexpected RELEASE or DROPPARTY event occurs. The classes "Number" and "NumberedListEntry" and the "PointLess" struct are used to extend the list collection by a list ordered by integer numbers.

8.1.2 The Lists
The VCM Console maintains three global lists:
• vcList, which contains all instantiated single VCs.
• ruleListList, which contains Filter::RuleLists.
• filterList, which contains all defined Filters.
With the console commands, entries can be added to or removed from these lists. The vc command adds a VC object for a given destination with certain QoS requirements to the vcList. The rule-list command adds a FilterRule object to one of the lists in ruleListList. The filter command combines one of the RuleLists in ruleListList with one or more VCs from the vcList. Finally, the same commands prefixed with "no" remove an entry from a list. List entries are always identified by an instance number.

8.1.3 The Parser
The parsing is performed in a while(1) loop which is exited by the exit or quit command. For the commands that add entries to vcList or filterList, the parser verifies whether an instance with that number is already present in the respective list. If so, the command is rejected. Otherwise, the command-line options are examined one after the other and parsed immediately. If a mandatory option is missing or a parameter is incorrect, the parsing is aborted and an error message is displayed stating the cause of the problem.

Three utility functions have been defined for the parser:
• getNextArgument extracts a word out of a string object after a given position. This is used to find the next option or parameter in a command line.
• stringToInt converts a string object to an integer number. This is used to get the parameter values out of the command-line string.
• getAddress converts an ATM address in a string object to the atm_addr_t format needed by the VC constructor.

8.2 VCM Console Commands

8.2.1 Notation
The notation of the commands follows these conventions:
• A word without brackets is a mandatory keyword: vc
• A word in angle brackets is a mandatory parameter: <vc-number>
• Several words separated by vertical bars and enclosed in curly braces are alternative keywords; choosing exactly one of them is mandatory: {atm|ip}
• A parameter block in square brackets is an optional keyword/parameter: [pcr <pcr>]
• Three dots mean that the previous parameter may be repeated as often as needed: [<vc-list> ...]

8.2.2 Reference
Here is the complete listing of the VCM Console commands:

help
?
Displays all possible commands and options.

vc <vc-number> {atm|ip} <address> {cbr|ubr} [class <qos-class>] [pcr <pcr>] [scr <scr>] [mbs <mbs>]
Creates a new VC identified by <vc-number>. {atm|ip} determines whether the following <address> is an ATM address or an IP address. {cbr|ubr} is the ATM service category, <qos-class> is the ATM service class and must be in the range of 1 to 4. <pcr>, <scr>, and <mbs> are the ATM QoS parameters.

rule-list <rule-list-number> <source-address> <mask> [<operator> <port> [<port>]] <destination-address> <mask> [<operator> <port> [<port>]] [{tcp|udp}]
Adds a new rule to the rule list identified by <rule-list-number>. If the rule list does not exist yet, it is created. <source-address> and <mask> are the IP address and subnet mask of the source that is to be filtered. The <operator> and <port> parameters determine the transport layer ports to be filtered. These are the possible operators:
• lt with one port number: all port numbers less than <port>.
• eq with one port number: only the port number equal to <port>.
• gt with one port number: all port numbers greater than <port>.
• range with two port numbers: all port numbers from the first <port> to the second <port>.
The second set of these parameters works the same way for the destination address, mask, and port numbers. Finally, [{tcp|udp}] is the transport layer protocol that is filtered. If this parameter is not specified, both protocols are filtered.

filter <filter-number> <rule-list-number> <vc-number> [<vc-number> ...]
Installs a new filter identified by <filter-number>. <rule-list-number> identifies the rule list that should be applied to this filter. Then, a variable number of <vc-number> arguments gives the IDs of the VCs over which the filtered traffic is forwarded.

no {vc|rule-list|filter} <instance-number>
Deletes an entry in one of the administrative lists. {vc|rule-list|filter} determines what type of object should be deleted, and <instance-number> gives the ID of the concrete instance.

quit
exit
Removes all filters and closes all VCCs that have been initiated by this VCM Console instance, and exits the VCM Console.

# [<comment>]
A line starting with # is interpreted as a comment and ignored. An empty line is also ignored.
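As an illustration, the following session sketch combines these commands; the ATM address, instance numbers, IP addresses, ports, and rates are invented and only meant to show the syntax.

    # create VC 1 to an ATM destination, CBR with a peak cell rate of 10000
    vc 1 atm 47.0005.80ffe1000000f21a2b3c.0020481a2b3c.00 cbr class 1 pcr 10000
    # rule list 1: TCP traffic from 10.0.1.0/24 to 10.0.2.5, ports 5000-5009
    rule-list 1 10.0.1.0 255.255.255.0 10.0.2.5 255.255.255.255 range 5000 5009 tcp
    # filter 1 forwards traffic matching rule list 1 over VC 1
    filter 1 1 1
    # remove the filter again and leave the console
    no filter 1
    quit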
9 Summary
In the telecommunications industry today there are two predominant networking technologies: the Internet and ATM. The Internet dominates the desktop market, while many telecommunication providers run ATM in their backbones. Consequently, there is a need to make both architectures interwork. For best-effort traffic some standards have already been defined, e.g. Classical IP over ATM. With the growing demand for multimedia services it becomes clear that the different systems for Quality of Service provisioning must also be enabled to work together.

In this thesis, building on the existing basic standards, a STREAMS kernel infrastructure for Solaris has been developed that allows QoS daemons such as an RSVP/IntServ daemon to set up ATM VCs on a Fore card and to forward IP packets over these VCs efficiently. The QoS parameters of the VCs and the filter rules for the forwarding are freely configurable. ATM address resolution is provided, making use of Classical IP mechanisms. The functionality is made available to programmers in user space via an easy-to-use C++ library. It should be easy to modify the software to also accommodate alternative IP QoS frameworks such as DiffServ.

Keywords: Quality of Service (QoS), Internet Protocol (IP), Asynchronous Transfer Mode (ATM), Integrated Services (IntServ), Resource ReSerVation Protocol (RSVP)

10 Evaluation
The major goal was to relocate the forwarding of QoS-supported packets from user space to kernel space. This has been achieved by implementing a STREAMS driver/module that performs the filtering and redirects filtered packets to the appropriate QoS VCs. The only data that crosses the boundary into user space is configuration messages, which is unavoidable as long as the QoS daemons run in user space. Packet duplication has also been eliminated.

The kernel part of the software cannot be reconfigured to filter on header fields other than the defined IP header fields, because this would have led to undesirably complex configuration commands. Still, the filtering routines can easily be modified to consider other header fields. While a multicast address resolution mechanism is not provided, the software allows point-to-multipoint VCs with QoS parameters to be set up, over which multicast traffic may be forwarded.

The user-space API has been designed to provide the full functionality with a lean interface. This has been demonstrated with the VCM Console, which allows connection and filtering parameters to be configured manually. Of course, there is a lot of room for improvement: multicast address resolution, efficiency-tuned kernel algorithms, multi-user support, etc. Still, the major goals have been reached: the software provides a basis for implementing and evaluating algorithms that map IP QoS architectures, particularly IntServ/RSVP, to ATM.

11 Outlook
Because of the complexity of the subject, there are many details in the implementation that need further fine-tuning. Many of those have been mentioned throughout the text or are indicated as comments in the source code, including error handling, table lookup, the implementation of the VCM_LISTFILTER command, and improvements to the address resolution mechanism.

The adaptation of the library to other convergence modules, address resolution mechanisms, or ATM cards might be simplified if a factory design pattern, as described in [GHJV95], were used for those parts of the code. Some major improvements would be: full multicast support including address resolution (e.g. using MARS), multi-user support, and of course porting the software to other vendors' ATM NICs. Because of the object-oriented library style, most of those changes will not require major changes to the API.

Thus, a first integration with an RSVP/IntServ daemon is already possible and should be implemented soon. Demand for multimedia services is likely to grow at increasing speed, which makes the development of algorithms for end-to-end QoS services a hot subject for research. A VCM-based RSVP daemon provides a comfortable environment for testing these algorithms.
Appendix A: Acronyms

AAL       ATM Adaptation Layer
ABR       Available Bit Rate
ANSI      American National Standards Institute
API       Application Programmers Interface
ARP       Address Resolution Protocol
ATM       Asynchronous Transfer Mode
BA        Behavior Aggregate
BE        Best Effort
BUS       Broadcast / Unknown Server
CBR       Constant Bit Rate
CDV       Cell Delay Variation
CDVT      Cell Delay Variation Tolerance
CL        Controlled Load
CLIP      Classical IP
CLR       Cell Loss Ratio
CLS       Connectionless Server
CTD       Cell Transfer Delay
DiffServ  Internet Differentiated Services
DLPI      Data Link Provider Interface
ELAN      Emulated LAN
FTP       File Transfer Protocol
GS        Guaranteed Service
ICMP      Internet Control Message Protocol
IEEE      Institute of Electrical and Electronics Engineers
IETF      Internet Engineering Task Force
IGMP      Internet Group Management Protocol
ILMI      Integrated Layer Management Interface
IntServ   Internet Integrated Services
IP        Internet Protocol
ITU-T     International Telecommunication Union - Telecommunication Standardization Sector
LAN       Local Area Network
LANE      LAN Emulation
LEC       LANE Client
LECS      LANE Configuration Server
LES       LANE Server
LIS       Logical IP Subnet
LLC       Logical Link Control
MAC       Medium Access Control
MARS      Multicast Address Resolution Server
MBS       Maximum Burst Size
MCS       Multicast Server
MCR       Minimum Cell Rate
MTU       Maximum Transmission Unit
NHRP      Next Hop Resolution Protocol
NIC       Network Interface Card
nrt-VBR   non real-time Variable Bit Rate
OMG       Object Management Group
OSI       Open Systems Interconnection
PDU       Protocol Data Unit
PHB       Per-Hop-Behavior
PVC       Permanent Virtual Circuit
QoS       Quality of Service
RFC       Request For Comments
RSVP      Resource ReSerVation Protocol
rt-VBR    real-time Variable Bit Rate
SCR       Sustainable Cell Rate
SDAPI     Signalling and Data Transfer API
SDU       Service Data Unit
SNAP      SubNetwork Attachment Point
SNMP      Simple Network Management Protocol
SONET     Synchronous Optical NETwork
SPANS     Simple Protocol for ATM Network Signalling
SVC       Switched Virtual Circuit
TCP       Transmission Control Protocol
ToS       Type of Service
UBR       Unspecified Bit Rate
UDP       User Datagram Protocol
UML       Unified Modeling Language
UNI       User-Network Interface
VC        Virtual Channel (a.k.a. Virtual Circuit)
VCC       VC Connection
VCD       VC Descriptor
VCI       VC Identifier
VCM       VC Manager
VP        Virtual Path
VPI       VP Identifier
WWW       World Wide Web
XTI       X/Open Transport Interface

Appendix B: References

[Arm96] Grenville Armitage: Support for Multicast over UNI 3.0/3.1 based ATM Networks. IETF Network Working Group, RFC 2022, November 1996.
[ATM94] The ATM Forum: ATM User-Network Interface (UNI) Specification, Version 3.1. Prentice Hall, September 1994.
[ATM95] The ATM Forum: LAN Emulation over ATM Version 1.0. The ATM Forum, January 1995.
[BBC+98] Steven Blake, David L. Black, Mark A. Carlson, Elwyn Davies, Zheng Wang, Walter Weiss: An Architecture for Differentiated Services. IETF Network Working Group, RFC 2475, December 1998.
[BCS94] Bob Braden, David Clark, Scott Shenker: Integrated Services in the Internet Architecture: an Overview. IETF Network Working Group, RFC 1633, June 1994.
[Ber98a] Lou Berger: RSVP over ATM Implementation Guidelines. IETF Network Working Group, RFC 2379, August 1998.
[Ber98b] Lou Berger: RSVP over ATM Implementation Requirements. IETF Network Working Group, RFC 2380, August 1998.
[Bra97] Bob Braden (Editor) et al.: Resource ReSerVation Protocol (RSVP) - Version 1 Functional Specification. IETF Network Working Group, RFC 2205, September 1997.
[Cra98] Eric S. Crawley (Editor) et al.: A Framework for Integrated Services and RSVP over ATM. IETF Network Working Group, RFC 2382, August 1998.
[For97] Fore Systems, Inc.: ForeRunnerLE ATM Workgroup Switch Installation Manual. Fore Systems, Software Version 5.0.x, October 1997.
[For98] Fore Systems, Inc.: ForeRunnerLE HE/200E ATM Adapters for Solaris Systems User's Manual. Fore Systems, Software Version 5.0, May 1998.
[For98B] Fore Systems, Inc.: ForeThought Partners Unix Adapter Documentation. Fore Systems, ForeThought 5.0, August 1998.
[GB98] Mark W. Garrett, Marty Borden: Interoperation of Controlled-Load Service and Guaranteed Service with ATM. IETF Network Working Group, RFC 2381, August 1998.
[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides: Design Patterns. Addison-Wesley, 1995.
[HBWW99] Juha Heinanen, Fred Baker, Walter Weiss, John Wroclawski: Assured Forwarding PHB Group. IETF Network Working Group, RFC 2597, June 1999.
[Hei93] Juha Heinanen: Multiprotocol Encapsulation over ATM Adaptation Layer 5. IETF Network Working Group, RFC 1483, July 1993.
[JNP99] Van Jacobson, Kathleen Nichols, Kedarnath Poduri: An Expedited Forwarding PHB. IETF Network Working Group, RFC 2598, June 1999.
[Lau94] Mark Laubach: Classical IP and ARP over ATM. IETF Network Working Group, RFC 1577, January 1994.
[Pos81] John Postel (Editor): Internet Protocol. IETF, RFC 791, September 1981.
[Rom98] F. Javier Antich Romaguera: RSVP/IP Multicast in Heterogeneous IP/ATM Networks. Diploma Thesis, KOM, TU Darmstadt, March 1998.
[RUP97] Rational and UML Partners: UML Notation Guide, Version 1.1. OMG, document ad/97-08-05, September 1997.
[SPG97] Scott Shenker, Craig Partridge, Roch Guerin: Specification of Guaranteed Quality of Service. IETF Network Working Group, RFC 2212, September 1997.
[TOG97] The Open Group: Networking Services (XNS), Issue 5. X/Open Document Number C523, February 1997.
[Tan96] Andrew S. Tanenbaum: Computer Networks. Prentice Hall, Third Edition, 1996.
[Wan97] Zheng Wang: User-Share Differentiation (USD): Scalable bandwidth allocation for differentiated services. IETF Internet Draft, draft-wang-diff-serv-usd-00.txt, November 1997.
[Wro97] John Wroclawski: Specification of the Controlled-Load Network Element Service. IETF Network Working Group, RFC 2211, September 1997.
[Zin97] Michael Zink: Integration der Dienstgütearchitekturen des Internet und ATM: Überblick und Evaluierung von Ansätzen für RSVP über ATM. Diploma Thesis, KOM, TU Darmstadt, August 1997.
[Zse97] Tanja Zseby: Support for IP Multicast over UNI 3.x: Link Layer Extension. Diploma Thesis, TU Berlin, September 1997.