Design of a MAC Protocol for Wavelength Sharing in a Passive Optical Distribution Network

by Richard Thommes

B.Sc., Mathematics and Engineering: Control and Communications, Queen's University at Kingston, Ontario, Canada, 2000

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Science at the Massachusetts Institute of Technology, June 2002.

© 2002 Massachusetts Institute of Technology. All Rights Reserved.

Author: Department of Electrical Engineering and Computer Science, May 16, 2002
Certified by: Professor Vincent W.S. Chan, Director, EECS Laboratory for Information and Decision Systems, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Submitted to the Department of Electrical Engineering and Computer Science on May 10, 2002, in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Science.

Abstract: The model under consideration in this thesis is a passive optical distribution network shared by a number of users generating IP packets that are destined for an edge router port at an access node. Each user generates data in a bursty fashion, meaning it only has packets to send a small fraction of the time. Therefore, in order to maintain efficient router port usage and to keep the cost low, a large number of users must share each port. All the users connected to the port share a single wavelength (or a group of wavelengths). This thesis develops a MAC protocol that allows the access router port and this wavelength to be efficiently shared between the users.
A major impetus of the thesis is to identify presently available hardware that may be used to physically implement the protocol at a reasonable cost. An assessment is made of the theoretical delay and throughput characteristics of various classes of multi-access schemes. A reservation scheme is chosen as the best candidate. A high-level specification of a reservation-based MAC protocol is presented. This specification defines all the control data that must be exchanged between the users and the access router in order for the protocol to operate, and addresses the need for synchronization between all users and the router/scheduler. Next, the hardware implementation of this protocol is addressed. An architecture making use of existing Gigabit Ethernet technology is suggested. Finally, the performance difference between a contention-based and contention-free version of the reservation MAC protocol is assessed.

Acknowledgements

Personal: I would like to thank first and foremost my family: my parents, Fred and Rita, and my brother Ed. Further, I wish to acknowledge my fellow students at MIT: Jason Bau, Serena Chan, Patrick Choi, Todd Coleman, Lillian Dai, Roop Ganguly, Chi (Kyle) Guan, Ramesh Johari, Desmond Lun, Tengo Saengudomlert, Etty Shin, Guy Weichenberg, Yonggang Wen.

Academic: Thanks to my thesis advisor, Dr. Vincent Chan. This thesis was supported by the MIT Lincoln Laboratory under the Architectural Study for Next Generation Internet, Award # BX-7276.

Contents

Introduction
  1.1 Objective
  1.2 Model
1. Analysis of Multi-Access Schemes
  1.1 Static TDM
    1.1.1 Overview
    1.1.2 Throughput
    1.1.3 Delay
  1.2 Slotted Aloha
    1.2.1 Overview
    1.2.2 Throughput
  1.3 Slotted CSMA/CD
    1.3.1 Overview
    1.3.2 Throughput
  1.4 Optical CDMA
    1.4.1 Overview
    1.4.2 Throughput
    1.4.3 Delay
  1.5 Reservation
    1.5.1 Overview
    1.5.2 Throughput
    1.5.3 Delay
    1.5.4 Contention Reservation Optimization
  1.6 Token-Passing
    1.6.1 Overview
    1.6.2 Throughput
    1.6.3 Delay
  1.7 Preliminary Performance Evaluation
    1.7.1 Slotted Aloha
    1.7.2 Slotted CSMA/CD
    1.7.3 TDM & O-CDMA
    1.7.4 Reservation and Token-Passing
2. Reservation Protocol Specification
  2.1 Overview
  2.2 Access Node Packets
    2.2.3 Data Packets
    2.2.4 Control Packets
      2.2.4.1 Types of Control Packets
  2.3 User Packets
    2.3.1 Data Packets
    2.3.2 Control Packets
      2.3.2.1 Types of Control Packets
  2.4 Registration
    2.4.1 User Removal
  2.5 Timing
    2.5.1 Synchronization
    2.5.2 Synchronization Issues
      2.5.2.1 Precision of Measured Quantities
      2.5.2.2 Clock Drift
    2.5.3 Maximum Discrepancy between AN and Any User
    2.5.4 Maximum Discrepancy between Any Two Users
    2.5.5 Compensating for Propagation Delay
    2.5.6 Collisions between Upstream Packets
  2.6 Reservation Schemes
    2.6.1 Static TDM
      2.6.1.1 AN Operation
      2.6.1.2 User Operation
    2.6.2 Contention Reservation
      2.6.2.1 Operation
  2.7 Scheduling Algorithm
3. Hardware
  3.1 Gigabit Ethernet
    3.1.1 History
    3.1.2 Frame Structure
    3.1.3 Gigabit Ethernet Layers
    3.1.4 Available Gigabit Ethernet Hardware
  3.2 Implementing the Reservation Protocol
    3.2.1 Mapping Packets to GE Frames
    3.2.2 User Hardware
    3.2.3 Access Node Hardware
      3.2.3.1 Access Node Scheduler Hardware: A Closer Examination
4. Performance Evaluation
  4.1 TDM Reservation
  4.2 Aloha-Contention
  4.3 Effect of Bounds
  5.1 Limitations
  5.2 Alternatives
  5.3 Improvements
    5.3.1 Shortening the Reservation Interval
    5.3.2 Retaining Reservation Requests Exceeding the Bound
Conclusion
Appendix: Simulation Code
References

Introduction

The use of fiber optics is commonplace in modern communication networks. The majority of long-distance telephone calls and Internet packets traverse a fiber link for at least part of their path through the network. While fiber optic technology is also utilized in access networks, it is typically implemented in the form of a point-to-point link between an end-user and an access node. This type of architecture requires that each user have its own dedicated port on a network access router. The problem with this approach is that if the users generate traffic in a bursty manner, the port utilization will be low and the expense high. Since each user only transmits a small fraction of the time, the port will mostly sit idle. Furthermore, if a large number of users are to access one router, it becomes prohibitively expensive to have a separate port for each one.
1.1 Objective

The objective of this thesis is to develop a high-level architecture design for efficiently sharing a router port between multiple users connected to a passive optical network. Solving this problem involves dealing with both Physical Layer and Data Link Layer (DLL) issues. Specifically, in order to share the channel, a DLL Multi-Access Control (MAC) protocol must be developed. In order for users to transmit their data, a Physical Layer solution must be designed.

1.2 Model

The physical model of the distribution network consists of a number of users connected to an Access Node (AN) via a common fiber. The term "user" will be used to denote any entity other than the AN connected to the passive distribution network. A typical user may be a high-end workstation or a router terminating a corporate LAN. The AN consists of an IP router and hardware handling the MAC protocol functionality. Depending on the protocol chosen, this hardware's role may include sending control packets to the users and receiving control packets from them. Upstream control packets will be processed by the MAC hardware and will not be forwarded to the router port. Downstream control packets will be interleaved with downstream IP packets. The functionality of the MAC hardware is transparent to the IP router. Figure 1.1 illustrates an overview of the model.

[Figure 1.1: Model Overview. An Access Node, comprising an IP router and MAC hardware, is connected by a shared fiber (multiple levels) to users 1 through N. Upstream IP packets travel on wavelength λ1, and downstream IP packets travel on λ2.]

A subset of users sharing a single wavelength destined for the IP router port will be considered. A major focus of the research is to develop a scheme that could be implemented at a reasonable cost using technology that is presently available. As a result, for cost reasons, a separate control channel will not be used. Users generate all their data in the form of IP packets.
Some simple assumptions will be made about the bursty nature in which user IP packets are generated. Each user is typically only active about 10% of the time. The term "active" refers to a user being in a state where it may generate IP packets. Users are assumed to be active independently of one another. The total number of active users follows a binomial distribution. While a user is active, it is not continuously generating packets. Instead, new packets are created in bursts. Packet bursts arrive only during a small proportion of the time a user is active, on the order of 1%. Packets destined for users connected to the port arrive at the IP router from elsewhere. They are modeled as being generated by a random process at the IP router. Once generated, they must be sent downstream to the users. There may be as many as 1000 users in this group. The fiber connecting the users and the AN is of a general broadcast structure. The average distance between the AN and the users is 5 km, and the maximum distance between the AN and any user is 10 km. The fiber allows all the users sharing the port to send packets upstream to the AN. The specific structure of the fiber will not be considered. Upstream and downstream communication may occur simultaneously. This will be modeled as follows: upstream packets are sent on wavelength λ1, and downstream packets are sent on λ2. If upstream and downstream packets are sent over the same physical fiber, these wavelengths must be distinct. If separate fibers are used, λ1 and λ2 can be the same.

1. Analysis of Multi-Access Schemes

The following section introduces the classes of multi-access schemes being considered to form the basis of a MAC protocol for sharing the upstream wavelength. General measures of their performance are provided. These performance measures include throughput and delay. In some cases formulas are omitted because analytical results are not available.
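As a minimal numerical sketch of the user-activity model described in Section 1.2, the fragment below draws the number of simultaneously active users. The parameter values mirror the illustrative figures above (up to 1000 users, each active roughly 10% of the time); the function itself and its names are hypothetical, added only for illustration.

```python
import random

def sample_active_users(n_users=1000, p_active=0.10, seed=1):
    """Draw one sample of the number of simultaneously active users.

    Each user is assumed to be independently active with probability
    p_active, so the count of active users is Binomial(n_users, p_active),
    as in the model of Section 1.2. Parameter values are illustrative.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_users) if rng.random() < p_active)

# Averaging many independent draws should land near n_users * p_active = 100.
counts = [sample_active_users(seed=s) for s in range(200)]
mean_active = sum(counts) / len(counts)
```

Samples of this binomial count play the role of A, the number of active users, in the throughput bounds derived below.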
Packets are assumed to be of a fixed length, each requiring X seconds to transmit over the shared upstream channel. There are N total users sharing the channel, and the upstream channel rate is C bits/second. In order to obtain closed-form expressions for delay, a simplified model for the generation of new IP packets will be used. Namely, each of the A active users will generate packets according to a Poisson process of rate λ. This is admittedly a violation of the assumption that packets arrive in bursts, since the Poisson process models packets as arriving one at a time. Rather than having ~10% of the total number of users sharing the access network generating packets in bursts at any given moment (as is more likely to be seen in a real network), the model used will have them generate packets "smoothly". This means that any specific delay values derived are only approximations to the behavior of a real network.

1.1 Static TDM

1.1.1 Overview

Users are synchronized to a common time source, and time is divided into fixed-length slots of length X. Each of the N users is statically assigned every Nth slot, during which time it may send a packet. If a user does not have any data to send during a slot, that slot remains idle.

1.1.2 Throughput

Each user is assigned 1/N of the total available upstream capacity. A user not generating data during a given time period wastes its entire allocation of the channel. Due to the nature of static TDM, wasted slots cannot be reallocated to other users. A user may only generate data while it is active, following the definition of Section 1.2. Thus, throughput is upper-bounded by the proportion of active users. If there are A active users,

throughput_TDM ≤ A/N    (1.1)

1.1.3 Delay

When a new packet is generated by a user, it must wait for all the packets ahead of it to be sent. Each user may be modeled as an M/D/1 queue. The average delay of this type of queue is given by the Pollaczek-Khinchin formula.
It has the following general form [3]:

W_{M/D/1} = γβ² / (2(1 − γβ))    (1.2)

Here γ is the Poisson rate at which packets arrive, β is the time required to transmit one packet, and W_{M/D/1} is the average queuing delay. In the model under consideration, a user may only send once every N slots, meaning β = NX. Each user generates new packets at a rate of γ = λ. In addition to waiting for all the queued packets to be sent, a new packet must wait until the user is first allowed to transmit during its allocated slot. Since there are N total slots, the average waiting time until the next assigned slot is:

W_nextslot = NX/2    (1.3)

The total average delay is the sum of (1.2) and (1.3):

W_TDM = λN²X² / (2(1 − λNX)) + NX/2    (1.4)

1.2 Slotted Aloha

1.2.1 Overview

Aloha refers to a family of contention-based strategies in which each user sends data packets without checking whether the common channel is free. This means that users may collide with one another when attempting to transmit. The variant of Aloha considered here is slotted: users are synchronized to a common time source, and time is divided into fixed-length slots. The slot length is chosen to match the time required to transmit one packet. The packets generated are initially buffered before transmission. Users may only send packets at the beginning of a slot. During each slot, every user that has buffered packets sends one with a certain probability. This probability must be strictly less than one. If this were not the case, throughput would drop to zero following a collision: each user involved in the collision would attempt to retransmit its packet in every subsequent slot, and the collisions would continue ad infinitum.
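The contention behavior just described can be sketched with a short Monte Carlo estimate. The fragment below is a deliberately simplified, hypothetical illustration: the backlog is held fixed at M users rather than evolving with new arrivals and retransmissions, each backlogged user transmits with probability q in every slot, and a slot with exactly one transmitter counts as a success.

```python
import random

def aloha_success_rate(m_backlogged, q, n_slots=100_000, seed=0):
    """Estimate the fraction of slots in which exactly one of
    m_backlogged users transmits (a success in slotted Aloha).

    Simplifying assumption: the backlog stays fixed at m_backlogged
    users, so this estimates the idealized per-slot success probability
    rather than the behavior of a full arrival/retransmission system.
    """
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_slots):
        transmitters = sum(1 for _ in range(m_backlogged) if rng.random() < q)
        if transmitters == 1:
            successes += 1
    return successes / n_slots

# With the transmit probability set to q = 1/M, the success rate
# approaches 1/e (about 0.368) as M grows.
rate = aloha_success_rate(m_backlogged=20, q=1 / 20)
```

The estimate matches the analytical success probability derived in the next subsection.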
1.2.2 Throughput

If there are M users with buffered packets to send, and each transmits with probability q during a slot, the probability that a given user successfully transmits a packet is given by:

q(1 − q)^(M−1)    (1.5)

The derivation of this expression is readily apparent: in order to be successful, the given user must attempt to transmit (an event of probability q), and the other users with packets must remain idle (each of the M−1 other users is independently idle with probability (1 − q), so the probability of them all being idle is the product of the individual probabilities). The system throughput is limited by the probability of a successful transmission. A success occurs when one user transmits and all the other users remain idle. Since there are M users with packets to send, there are M different ways for a successful transmission to occur. These events are mutually disjoint, so the total probability of a success is just the sum of the individual probabilities of each event:

P_success = Mq(1 − q)^(M−1)    (1.6)

The above is maximized for q = 1/M. If q can be adapted in "real time" to adjust for a change in M, the algorithm is said to be stabilized. Stabilization is essential for the throughput to remain close to its ideal value. If M can be accurately measured, the maximum ideal throughput of stabilized, slotted Aloha, as a function of M, is given by:

throughput_{ideal, Slotted Aloha} = (1 − 1/M)^(M−1)    (1.7)

This expression converges to 1/e quickly with M, as illustrated in the following plot:

[Figure 1.1: Aloha Throughput. Ideal throughput versus the number of users with packets to send; the curve converges to 1/e.]

In reality, the exact value of M is typically not known exactly and is instead estimated. This lowers the throughput from the theoretical maximum illustrated in Fig. 1.1.

1.3 Slotted CSMA/CD

1.3.1 Overview

Carrier Sense Multiple Access/Collision Detection (CSMA/CD) has similarities to Aloha. It is also a contention-based scheme in which collisions between users may occur.
However, one difference with CSMA/CD is that users monitor the channel to make sure it is not in use before attempting to transmit. Collisions can still occur due to propagation delays: if a user finds the channel free, it may be that another user has just begun broadcasting but its packets have not yet arrived at the first user. The other defining property of CSMA/CD is that a user continues to monitor the channel after it has transmitted in order to detect collisions, and ceases transmission after one time slot if it finds that one has occurred. The length of a time slot is determined by the maximum possible delay between a user sending a packet and determining that it was involved in a collision. Thus, if a user sends a packet at the beginning of a slot and does not detect a collision by the end of the slot, it will know with certainty that its transmission was successful.

1.3.2 Throughput

The maximum time to detect a collision is the maximum propagation delay between any two users: this is the longest it can take a transmitting user to determine that another user is also transmitting. This value will be denoted by T_c. After a successful transmission ends, a contention period starts. It is divided into contention slots of length T_c. Just as with Aloha, each user with data packets to send will do so at the beginning of a slot with probability q. If a contention slot remains idle or a collision occurs, it is wasted. For each slot, the probability of a successful transmission (exactly one user transmitting a packet) is P_success, given by (1.6). Thus, the distribution of the number of contention slots needed until the first success is geometric, with an expected value of 1/P_success. It follows that the expected number of wasted contention slots is:

1/P_success − 1    (1.8)

Once a successful transmission occurs, a time period of length X will be used to send a data packet.
Thus, for every successfully transmitted packet of length X, the average amount of wasted time is T_c multiplied by (1.8). This means the maximum average throughput is given by:

Throughput_CSMA/CD = X / (X + T_c(1/P_success − 1)) = 1 / (1 + a(1/P_success − 1)),  where a = T_c/X    (1.9)

For simplicity, the assumption has been made that X is an integer multiple of T_c. This allows a new contention period to begin immediately after the end of a successful transmission: no time is wasted waiting for the next contention slot to start. After (1.6) is substituted into (1.9), it is elementary to find that the optimal value of q is again 1/M. This results in the throughput converging quickly, in M, to:

1 / (1 + a(e − 1))    (1.10)

While the derivation above assumed T_c < X and hence a < 1, (1.9) still holds for cases where this is not true. A value of a > 1 corresponds to the situation where, after the packet is sent, the channel stays idle while the transmitter waits for a collision that it can do nothing about. It is readily apparent that CSMA/CD is an illogical choice in this case: collision detection is not only useless, it actually increases the delay. A value of a = 1 corresponds to slotted Aloha: the sender does not stop transmitting due to a collision, and does not continue to monitor the channel after it is done. The following plot illustrates the dependence of maximum throughput (the optimal value of q is used) on the value of a:

[Figure 1.2: CSMA/CD Throughput. Maximum throughput versus the number of users with packets to send, shown for several values of a.]

As expected, values of a > 1 give poorer throughput than slotted Aloha.

1.4 Optical CDMA

1.4.1 Overview

Code Division Multiple Access (CDMA) has been successfully deployed for several years in cellular communication. Optical CDMA (O-CDMA) is being studied as an alternative technology for sharing an optical channel [17]. In this scheme, more than one user may use the channel at any time. Each is assigned a unique binary "spreading code".
When a user sends data, it multiplies the data by this code. Specifically, a "1" bit is sent by user i as:

c_i(t) = Σ_{n=1}^{F} c_{i,n} · p(t − nT_p)

Here F is the length of the spreading code, {c_{i,n}, n = 1..F} ⊂ {0,1} makes up the unique spreading code for user i, and p(t) is a light pulse whose duration T_p is the time needed to send one bit over the channel. The "0" bit is sent as an all-zero sequence of length F. The receiver checks for the presence of a particular user's transmission by correlating the received signal, which is composed of the sum of all the coded signals sent by active users, with that user's spreading code. In order for correct detection to happen, the autocorrelation peak K, common to all users' codes, must be greater than both the cross-correlation constraint λ_c and the autocorrelation constraint λ_a. These are defined by:

K = Σ_{n=1}^{F} c_{j,n} · c_{j,n},  ∀j

This is equivalent to the number of "1"s in a spreading code.

λ_c = max Σ_{n=1}^{F} c_{j,n} · c_{k,n+m},  0 ≤ m ≤ F−1,  j ≠ k

This is the maximum correlation between any shifted versions of any two distinct users' spreading codes.

λ_a = max Σ_{n=1}^{F} c_{j,n} · c_{j,n+m},  1 ≤ m ≤ F−1

The above is the maximum correlation between two shifted versions of the same spreading code. O-CDMA may be either synchronous or asynchronous. The above formulae hold for asynchronous O-CDMA, where a user may transmit at any time. In synchronous O-CDMA, time is slotted and a user may only send at the beginning of a slot. In this case λ_a does not matter, since it is not possible to simultaneously receive two time-shifted transmissions from the same user. Furthermore, the λ_c constraint is less strict for synchronous O-CDMA:

λ_c = Σ_{n=1}^{F} c_{j,n} · c_{k,n},  j ≠ k

One class of optical codes that has been extensively studied [29] and is well understood is the class of Orthogonal Optical Codes (OOC). For these types of code, λ_a and λ_c are unity. OOC codes are suitable for asynchronous O-CDMA.
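As a concrete check of these correlation definitions, the fragment below evaluates K, λ_a, and λ_c for a small code with F = 13 and K = 3. The two codewords (pulses at positions {0, 1, 4} and {0, 2, 7}) are a standard textbook OOC example assumed here for illustration, not a code taken from this thesis, and shifts are taken cyclically, as is conventional for OOCs.

```python
def cyclic_correlation(x, y, shift):
    """Correlation of code x with code y cyclically shifted by `shift` chips."""
    F = len(x)
    return sum(x[n] * y[(n + shift) % F] for n in range(F))

F = 13
# Pulse positions of the two codewords of an assumed OOC example
# with F = 13 and K = 3.
positions = [{0, 1, 4}, {0, 2, 7}]
codes = [[1 if n in pos else 0 for n in range(F)] for pos in positions]

# Autocorrelation peak K: the number of "1"s in a codeword.
K = cyclic_correlation(codes[0], codes[0], 0)

# Autocorrelation constraint: maximum over nonzero shifts of a code with itself.
lambda_a = max(cyclic_correlation(c, c, m) for c in codes for m in range(1, F))

# Cross-correlation constraint: maximum over all shifts of two distinct codes.
lambda_c = max(cyclic_correlation(codes[0], codes[1], m) for m in range(F))
```

For this code the computation gives K = 3 with λ_a and λ_c both equal to 1, so the peak comfortably exceeds both constraints, as correct detection requires.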
It should be noted that the usage of the word "orthogonal" to describe these codes represents a bit of a misnomer, since the cross-correlation of distinct codes is not zero. The number of supportable unique codes N is related to F and K as follows:

N ≤ ⌊(F − 1) / (K(K − 1))⌋,  K ≥ 2    (1.11)

For a fixed N, increasing K (and hence F) lowers the probability of false detection at the receiver due to the presence of other users' transmissions. Specific expressions for the probability of error as a function of K lie beyond the scope of this thesis. In the synchronous case, codes exist which have N = F. In fact, the simplest case of such a code is bit-based TDM: for N users, each user is allocated every Nth bit-slot. This still meets the definition of an O-CDMA code: the user assigned the jth bit-slot is equivalently assigned the spreading code (0...010...0), where the only 1-bit is in the jth position. As one might expect, no codes exist with F < N.

1.4.2 Throughput

When a user wishes to send 1 bit, it must encode it as an F-bit sequence. Since all N users may transmit simultaneously, every transmitted F-bit sequence can carry 1 bit of data for every user. Using F transmitted bits to carry at most N bits of data limits the maximum throughput to N/F. Thus, of the total channel rate of C, only a fraction N/F is available for transmitting data. Each user is effectively assigned 1/N of this available channel rate. As with static TDM, a user not generating data during a given time period wastes its entire allocation of the channel. Thus, with A active users, the total throughput is upper-bounded by the product of (1.1) and the fraction N/F:

Throughput_O-CDMA ≤ (A/N) · (N/F) = A/F    (1.12)

1.4.3 Delay

The average delay for O-CDMA is closely related to that of TDM. The substitution β = FX is made in equation (1.2), since each user may transmit at an effective rate of 1/F of the channel capacity. For asynchronous O-CDMA, a user may send at any time; it does not have to wait until its next assigned slot.
Thus, equation (1.3) does not apply, and the average delay is given by:

$$W_{\text{O-CDMA,asynchronous}} = \frac{A\lambda F^2 X^2}{2(1 - A\lambda F X)} \qquad (1.13)$$

For synchronous O-CDMA, a user may begin transmission once every F bit-slots. This means a newly arrived packet must wait on average F/2 slots before the user may transmit. Each of these slots is long enough to send one bit: 1/C seconds. Thus, the average total delay for synchronous O-CDMA is:

$$W_{\text{O-CDMA,synchronous}} = \frac{A\lambda F^2 X^2}{2(1 - A\lambda F X)} + \frac{F}{2C} \qquad (1.14)$$

1.5 Reservation

1.5.1 Overview

A reservation-based multi-access strategy requires users to inform a central scheduler of their intent to send accumulated data packets. The scheduler subsequently informs the users of when to send their packets, and how many may be sent. The scheme is divided into two alternating phases: the reservation phase, during which users send their requests to the scheduler, and the data phase, when users transmit data packets as allocated by the scheduler. There are two possible ways to implement the reservation phase: contention-free or contention-oriented. The contention-free approach divides the reservation interval using static TDM or O-CDMA. Since it was shown in Section 1.4.2 that O-CDMA requires at least as much time as TDM to send the same amount of data, only TDM reservation will henceforth be considered. Each of the N users is assigned a unique slot which it uses to request packet transmission times. Contention reservation reduces the length of the reservation interval, but requires users wishing to reserve to compete with one another when sending requests. The motivation is as follows: if only a small fraction of the N users typically have data packets waiting to be sent, most of the N reservation slots used in the contention-free scheme will remain idle. The reservation phase represents overhead, and therefore should be made as short as possible. Since users compete to send requests, collisions are possible.
All users involved in a collision will be unsuccessful in their attempt to send a request to the scheduler. They must wait till the next reservation interval to try again. Thus, designing a contention-reservation scheme involves finding a balance between the length of the reservation interval and the probability that a user will be unsuccessful in reserving.

1.5.2 Throughput

The maximum achievable throughput is given by:

$$\frac{\text{Max. time spent sending packets per data phase}}{(\text{Max. time spent sending packets per data phase}) + (\text{Avg. length of reservation phase})}$$

There is no inherent limit on how many packets may be sent during a data phase. Thus, assuming a fixed reservation phase length, the scheme can approach a throughput of close to 1. In a typical implementation, the scheduler would place a limit on how many packets each user may send per data phase. This is to eliminate the possibility of a user sending a large number of packets continuously and choking off the other users for an extended period of time. With N total users sharing a channel, a bound of B packets/user/data phase, a reservation phase of length R, and data packets that take a time X to transmit, the throughput is upper-bounded by:

$$\text{Throughput}_{\text{reservation,bounded}} \le \frac{NBX}{NBX + R} \qquad (1.15)$$

If, for a given time period, it is known that only A of the N users are generating data packets, N can be replaced by A for a tighter bound on the maximum throughput during that time.

1.5.3 Delay

The first case considered will be contention-free reservation, in which there is no bound on the number of packets each user may send per cycle. The derivation of this delay presented below is taken directly from [3]. It is repeated here to justify the use of a similar approach later to derive new expressions for the delays of bounded and contention-based reservation schemes. The assumption is that newly generated packets are sent according to a first-come, first-served policy.
This means that a newly generated packet must wait for three events to complete between the time it arrives and the time it is transmitted: waiting for all the packets ahead of it to be transmitted, waiting through one reservation phase, and waiting through the residual time. The residual time is defined as the time to complete the transaction occurring at the instant the new packet is created: either the transmission of another packet or a reservation operation. Writing out an expression for the delay and taking expectations gives:

$$E\{W_{\text{reservation,unbounded}}\} = E\{R\} + E\{K\}X + E\{Q\}$$

where
W_reservation,unbounded = delay of a packet
R = duration of the next reservation interval
X = time to transmit one packet
K = total number of packets waiting in all users' queues
Q = residual time

Next, Little's law is utilized. It states that if packets arrive at a rate r to a queue and wait an average time w before being sent, the average number in the queue is given by:

$$n = rw$$

Assuming a cumulative packet generation rate of a, one may apply Little's law to find K:

$$K = aW_{\text{reservation,unbounded}}$$

Using simplified notation, the average waiting time is now given by:

$$W_{\text{reservation,unbounded}} = R + aX\,W_{\text{reservation,unbounded}} + Q$$

Solving for $W_{\text{reservation,unbounded}}$:

$$W_{\text{reservation,unbounded}} = \frac{R + Q}{1 - aX} \qquad (1.16)$$

Queuing theory provides the following expression for Q:

$$Q = \frac{aX^2}{2} + \frac{(1-aX)\,\overline{R^2}}{2\overline{R}} \qquad (1.17)$$

Substituting this result into (1.16) provides:

$$W_{\text{reservation,unbounded}} = \frac{aX^2}{2(1-aX)} + \frac{\overline{R}}{1-aX} + \frac{\overline{R^2}}{2\overline{R}}$$

If the reservation interval is a fixed time R, the above simplifies to:

$$W_{\text{reservation,unbounded}} = \frac{aX^2}{2(1-aX)} + \frac{R(3-aX)}{2(1-aX)} \qquad (1.18)$$

Now consider how this formula changes when a bound of B is placed on how many packets one user may transmit at any one time. In this case, a packet may have to wait through more than one reservation interval until its reservation is made.
In particular, if a user has $\eta$ packets in its queue, a newly arrived packet must wait through $\lceil \eta/B \rceil$ reservation intervals: a total delay of $\lceil \eta/B \rceil R$. Here, R is considered fixed. If the distribution of $\eta$ is $f(\eta)$, then the expected number of reservation intervals a packet must wait is given by:

$$\sum_{\eta=0}^{\infty} \left\lceil \frac{\eta}{B} \right\rceil f(\eta)$$

However, the distribution of $\eta$ is generally not known, so one may proceed as follows. Let $0 \le \Delta(\eta) < 1$ be an unknown correction factor to compensate for the ceiling function. Then:

$$\sum_{\eta=0}^{\infty} \left\lceil \frac{\eta}{B} \right\rceil f(\eta) = \sum_{\eta=0}^{\infty} \left( \frac{\eta}{B} + \Delta(\eta) \right) f(\eta) = \frac{E\{\eta\}}{B} + \Delta$$

where $\Delta = E\{\Delta(\eta)\}$, $0 \le \Delta < 1$. Since all active users generate packets at a Poisson rate of $\lambda$ each, $E\{\eta\}$ can be obtained from Little's law:

$$E\{\eta\} = \lambda W_{\text{reservation,bounded}}$$

One may now enumerate the total delay seen by a newly arrived packet in a bounded reservation system. As in the unbounded case, it must wait for the residual time and for the packets ahead of it to be transmitted; these values remain unchanged. However, as discussed above, it must wait not through one reservation period of length R, but through $E\{\eta\}/B + \Delta$ of them. Writing an expression for the delay gives:

$$W_{\text{reservation,bounded}} = \left( \frac{\lambda W_{\text{reservation,bounded}}}{B} + \Delta \right) R + aX\,W_{\text{reservation,bounded}} + Q$$

Rearranging,

$$W_{\text{reservation,bounded}} = \frac{\Delta R + Q}{1 - aX - \frac{\lambda R}{B}}$$

Substituting (1.17) for Q, one obtains:

$$W_{\text{reservation,bounded}} = \frac{\frac{aX^2}{2} + \frac{(1-aX)R}{2} + \Delta R}{1 - aX - \frac{\lambda R}{B}}$$

Finally, since $\Delta < 1$, the following inequality holds:

$$W_{\text{reservation,bounded}} < \frac{aX^2 + R(3-aX)}{2\left(1 - aX - \frac{\lambda R}{B}\right)} \qquad (1.19)$$

Furthermore, $W_{\text{reservation,bounded}} \ge W_{\text{reservation,unbounded}}$. This follows from the fact that a newly arrived packet in a bounded reservation scheme must wait through at least one reservation interval, whereas in the unbounded case, a new packet waits through exactly one reservation period.

Next, the case of unbounded contention reservation will be considered. Let the length of a contention-reservation interval be given by R.
Assume that the probability that a user attempting to register is successful in a given reservation phase is fixed at p. A new packet will thus have to wait for a randomly distributed number of reservation intervals, rather than just one as in the contention-free case. Since the probability of a success is fixed for all reservation intervals, this distribution is geometric with parameter p. Therefore, the average number of reservation intervals a packet has to wait through is 1/p, and the average total time spent waiting during reservation intervals is given by R/p. Everything else remains the same as in the unbounded, contention-free case. This means the contention-reservation delay is obtained by replacing R in (1.16) by R/p and following the same steps for the remainder of the derivation. The final result is:

$$W_{\text{reservation,unbounded,contention}} = \frac{aX^2}{2(1-aX)} + R\left( \frac{1}{p(1-aX)} + \frac{1}{2} \right) \qquad (1.20)$$

One can proceed in a similar fashion for the bounded, contention-based reservation scheme. A user has to wait through $E\{\eta\}/B + \Delta$ successful reservations. Each of these takes an average time of R/p. Everything else remains the same as in the contention-free bounded case, giving the following inequality for the delay:

$$W_{\text{reservation,bounded,contention}} < \frac{aX^2 + \frac{R}{p}(3-aX)}{2\left(1 - aX - \frac{\lambda R}{Bp}\right)} \qquad (1.21)$$

1.5.4 Contention Reservation Optimization

The contention scheme to be considered is slotted Aloha. In this approach, a certain number of slots, $\xi(t)$, are available during each reservation phase. This number is not necessarily constant; in general it may change for each reservation phase. However, users are informed by the AN of how many slots are available before each reservation phase. Every user that has data packets to send will send a reservation request packet during the reservation phase, randomly picking one of the available slots. Given the above assumptions, $R_c$ does not remain constant, since it is a function of $\xi(t)$.
Specifically,

$$R_c(t) = \xi(t)T_r + T_f \qquad (1.22)$$

where $T_r$ is the length of one reservation slot and $T_f$ is a fixed value that accounts for other delays during the reservation phase. $T_f$ is composed of a processing delay at the AN, and a transmission and propagation delay associated with the AN sending a packet informing users when to send their data. The value of p is also variable: it is a function of the number of users attempting to register during a given reservation interval, U(t), and the number of available slots. A user has its request successfully received if each of the other U(t)-1 users chooses a different slot. Thus, the expression for p(t) is given by:

$$p(t) = \left(1 - \frac{1}{\xi(t)}\right)^{U(t)-1} \qquad (1.23)$$

The fact that p(t) and $R_c(t)$ vary with time makes the analysis difficult. Thus, initially a simpler model of contention reservation, one where these values remain constant for all reservation intervals, will be examined. Keeping $R_c(t)$ at a constant value $R_c$ requires having the same number of reservation slots during each reservation phase. Thus, $\xi(t)$ will be fixed as $\xi$. In order for p(t) to be fixed as p, U(t) must be a constant U. Even though the assumption that the same number of users attempt to register each reservation phase is unrealistic, it will be made to allow the following calculations to proceed. Starting with (1.20), the expressions for $R_c$ and p are substituted in:

$$W_{\text{reservation,unbounded,Aloha-contention}} = \frac{aX^2}{2(1-aX)} + (\xi T_r + T_f)\left( \frac{1}{\left(1-\frac{1}{\xi}\right)^{U-1}(1-aX)} + \frac{1}{2} \right) \qquad (1.24)$$

In order to optimize the performance of the Aloha-reservation, contention-based protocol, the average delay must be minimized. Thus, given a value of aX (henceforth referred to as the load $\rho$) and U users attempting to register, the objective is to find an optimal number of slots $\xi$. This is achieved by the standard approach of taking the derivative of the delay with respect to $\xi$, equating it to zero, and solving for $\xi$.
After simplification, the optimal value of $\xi$ is given by the solution to the following equation:

$$T_r\,\xi(\xi - U) + T_f(1-U) + \frac{T_r(1-\rho)}{2}\,\xi(\xi-1)\left(1-\frac{1}{\xi}\right)^{U-1} = 0 \qquad (1.25)$$

An explicit solution for $\xi$ as a function of U, $\rho$, $T_r$, and $T_f$ could not be obtained. However, (1.25) can be numerically solved for specific values of these variables. As mentioned before, the assumption that U remains constant is unrealistic. The model which will next be considered is one in which the number of active users, A, generating packets according to a Poisson process is fixed. $\xi$ will still remain fixed. A user will only attempt to register during a reservation phase if it has any data packets waiting to be sent. Thus, U(t) will vary over time. A user will have data packets waiting to be sent in the following cases:

1) The user generated at least one new packet since the previous reservation phase.

2) The user generated no packets since the previous reservation phase, but generated one or more in the period between the beginning of the second-last reservation phase and the beginning of the last reservation phase, and was unsuccessful in reserving during the last reservation phase.

3) The user generated no packets since two reservation phases ago, but generated one or more in the period between the beginning of the third-last reservation phase and the beginning of the second-last reservation phase, and was unsuccessful in reserving during both of the last two reservation phases.

...

n) The user generated no packets since (n-1) reservation phases ago, but generated one or more in the period between the beginning of the nth-last reservation phase and the beginning of the (n-1)th-last reservation phase, and was unsuccessful in registering during the last (n-1) reservation phases.

The above pattern continues for an infinite number of cases.
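Before turning to the time-varying model, note that the fixed-U optimization described above can be carried out numerically without root-finding on the optimality condition: one can simply evaluate the delay (1.24) over a range of integer slot counts and pick the minimizer. A minimal sketch; the parameter values below are illustrative assumptions, not figures from this thesis.

```python
def aloha_reservation_delay(xi, U, rho, X, Tr, Tf):
    """Average delay of unbounded Aloha-contention reservation, eq. (1.24).

    xi: reservation slots per phase; U: contending users;
    rho: network load aX; X: packet transmission time;
    Tr: slot length; Tf: fixed per-phase overhead.
    """
    p = (1.0 - 1.0 / xi) ** (U - 1)   # success probability, eq. (1.23)
    R = xi * Tr + Tf                  # reservation interval, eq. (1.22)
    # a*X^2/(2(1-aX)) with a = rho/X reduces to rho*X/(2(1-rho))
    return rho * X / (2 * (1 - rho)) + R * (1 / (p * (1 - rho)) + 0.5)

# Illustrative values: 50 contending users, load 0.5,
# X = 12.2 us/frame, Tr = 0.6 us/slot, Tf = 100 us fixed overhead.
U, rho, X, Tr, Tf = 50, 0.5, 12.2e-6, 0.6e-6, 100e-6
best_xi = min(range(2, 1000),
              key=lambda xi: aloha_reservation_delay(xi, U, rho, X, Tr, Tf))
print(best_xi)
```

The minimum is shallow: too few slots makes collisions frequent (small p), while too many slots stretches the reservation interval, which is exactly the trade-off that (1.25) balances.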
In order to perform further analysis, more variables must first be defined:

* Let $T_1$ be the time between the beginning of the current reservation phase and the beginning of the previous one, $T_2$ be the time between the beginnings of the second-last and last reservation phases, and in general let $T_i$ be the time between the beginnings of the ith-last and (i-1)th-last reservation phases.

* Let $p_i$ be the probability of a successful reservation in the ith-last reservation phase.

Since each user generates packets according to a Poisson process of rate $\lambda$, the probability f(n) that n packets are generated by a user during an interval of length $T_i$ is given by:

$$f(n) = \frac{(\lambda T_i)^n e^{-\lambda T_i}}{n!}$$

It follows that during an interval of length $T_i$, a user will generate no packets with probability $e^{-\lambda T_i}$ and generate at least one packet with the complementary probability $1 - e^{-\lambda T_i}$. Therefore, the probabilities of the above cases can now be expressed:

$$p(1) = 1 - e^{-\lambda T_1}$$
$$p(2) = e^{-\lambda T_1}\left(1 - e^{-\lambda T_2}\right)(1 - p_1)$$
$$p(3) = e^{-\lambda T_1} e^{-\lambda T_2}\left(1 - e^{-\lambda T_3}\right)(1 - p_1)(1 - p_2)$$
$$p(n) = e^{-\lambda T_1} e^{-\lambda T_2} \cdots e^{-\lambda T_{n-1}} \left(1 - e^{-\lambda T_n}\right)(1 - p_1)(1 - p_2) \cdots (1 - p_{n-1})$$

The probability $\Phi$ that a user will attempt to register in the current reservation phase is the infinite sum of these cases. $T_i$ and $p_i$ are random quantities with a joint distribution. Furthermore, the distributions of the two vectors $(T_i, p_i)$ and $(T_{i+1}, p_{i+1})$ exhibit dependence. This can be illustrated with the following example: a larger-than-average value of $T_i$ makes it more likely that users generated packets since the last reservation phase, meaning that the number of users attempting to register will tend to be higher than average, lowering the probability $p_i$ of a successful reservation for any given user. The high proportion of users unsuccessful in reserving results in fewer being allowed to send data packets during the subsequent data interval, thus tending to lower $T_{i+1}$. Furthermore, unsuccessful users will attempt to register again during the subsequent reservation phase and hence affect $p_{i+1}$.
The dependence described is not limited to consecutive intervals. For instance, it is apparent from the above example that the effect of the large $T_i$ may extend to $T_{i+2}$ and $p_{i+2}$ and beyond. Given the complex joint distribution of the random quantities, an expected value of the infinite sum describing the probability of a user registering could not be found. To continue the analysis, several simplifying approximations will be made. A simulation will later be run to provide an indication of how closely the simplified approach approximates the true behavior of the contention-based reservation protocol. First, all values of $p_i$ will be replaced by a fixed approximation p. Next, all values of $T_i$ will be replaced by the expected amount of time between the beginnings of consecutive reservation intervals, T. The value of T depends on the network load $\rho$ and the length of the reservation phase, $R_c$. The value of $\rho$ must be the same as the average proportion of time dedicated to transmitting data packets. Packets are sent during the data phase, and the channel sits idle during the reservation phase. Thus, $R_c$ makes up an average proportion $(1-\rho)$ of the length of the average cycle (a cycle being one reservation phase and one data phase), and the data phase makes up a proportion $\rho$. Thus,

$$T = \frac{R_c}{1-\rho}$$

With these simplifications, the infinite sum for $\Phi$ can be re-written as:

$$\Phi = \left(1 - e^{-\lambda T}\right)\left[1 + e^{-\lambda T}(1-p) + e^{-2\lambda T}(1-p)^2 + e^{-3\lambda T}(1-p)^3 + \cdots \right] = \left(1 - e^{-\lambda T}\right)\sum_{n=0}^{\infty}\left[e^{-\lambda T}(1-p)\right]^n = \frac{1 - e^{-\lambda T}}{1 - e^{-\lambda T}(1-p)} \qquad (1.26)$$

If there are A total users generating packets according to a Poisson process, an approximation to the expected number of users, U, attempting to register each reservation phase is given by:

$$U = A\Phi = A\,\frac{1 - e^{-\lambda T}}{1 - e^{-\lambda T}(1-p)} \qquad (1.27)$$

Substituting (1.22) into (1.27) provides:

$$U = A\,\frac{1 - e^{-\frac{\lambda(\xi T_r + T_f)}{1-\rho}}}{1 - e^{-\frac{\lambda(\xi T_r + T_f)}{1-\rho}}(1-p)} \qquad (1.28)$$

Since it takes X seconds to send one packet, the capacity of the channel, in packets/s, is 1/X.
A users generating at an average rate of $\lambda$ packets/second each, for a total load of $\rho$, means the following relation holds:

$$A\lambda = \frac{\rho}{X} \quad \Rightarrow \quad \lambda = \frac{\rho}{XA} \qquad (1.29)$$

Substituting (1.29) into (1.28) gives:

$$U = A\,\frac{1 - e^{-\frac{\rho(\xi T_r + T_f)}{XA(1-\rho)}}}{1 - e^{-\frac{\rho(\xi T_r + T_f)}{XA(1-\rho)}}(1-p)} \qquad (1.30)$$

Finally, substituting in the following approximation for p:

$$p = \left(1 - \frac{1}{\xi}\right)^{U-1}$$

provides:

$$U = A\,\frac{1 - e^{-\frac{\rho(\xi T_r + T_f)}{XA(1-\rho)}}}{1 - e^{-\frac{\rho(\xi T_r + T_f)}{XA(1-\rho)}}\left[1 - \left(1-\frac{1}{\xi}\right)^{U-1}\right]} \qquad (1.31)$$

Given values for $T_r$, $T_f$, $\rho$, and A, an approximate relationship between $\xi$ and U can be obtained using (1.31). For each value of $\xi$ substituted into (1.31), the corresponding value of U is found by numerically solving the equation. This procedure may be repeated for various values of $\xi$ to obtain a set of vectors $(\xi, U)$. Finally, each vector can be substituted back into (1.20) in order to find the choice of $\xi$ that minimizes the average delay. It is important to emphasize again that the above relation is only an approximation, since random variables have been replaced by their expected values. Simulations should be carried out in order to get an indication of how accurate (1.31) is in providing an optimal choice of slots for a contention-based Aloha reservation scheme.

It is instructive to consider the limiting situation when p=1. This case corresponds to contention-free TDM reservation. Let $R_{TDM}$ be the length of the reservation interval. Making these two substitutions into (1.26) yields:

$$U_{TDM} = A\left(1 - e^{-\frac{\lambda R_{TDM}}{1-\rho}}\right) \qquad (1.32)$$

Here $U_{TDM}$ is the expected number of users that register during each reservation phase in a contention-free setting. Finally, substituting (1.29) into (1.32) provides:

$$U_{TDM} = A\left(1 - e^{-\frac{\rho R_{TDM}}{AX(1-\rho)}}\right) \qquad (1.33)$$

1.6 Token-Passing

1.6.1 Overview

Token passing is a contention-free, non-synchronous scheme. Users are sequentially given a chance to send data packets through the use of a token, a special control packet that circulates between them. Upon receiving the token, a user transmits its queued data packets and, when finished, sends the token to the next user to continue the cycle.
A user receiving the token when it has no data to send immediately forwards it.

1.6.2 Throughput

The throughput analysis is similar to that of the reservation case. The maximum achievable throughput is given by:

$$\frac{\text{Max. time spent sending data packets per cycle}}{(\text{Max. time spent sending data packets per cycle}) + (\text{Time spent passing the token per cycle})}$$

where a cycle is defined as the period of time in which the token reaches each user exactly once. There is no inherent limit on how many packets may be sent during one cycle. Thus, assuming a fixed time spent passing the token, the scheme can approach a throughput of close to 1. If a limit B is placed on how many data packets a user may send per token possession, and the average amount of time to pass the token between any two users is L, the throughput is limited to:

$$\frac{BX}{BX + L} \qquad (1.34)$$

If, for a given time period, it is known that only A of the N users have any data to send, the token still has to pass between all N users. In this case the throughput is limited to:

$$\frac{ABX}{ABX + NL}$$

1.6.3 Delay

For the unbounded case, a newly generated packet at a given user has to wait for three events to transpire: all data packets queued by users that get the token before the given user must be transmitted, the token must pass between all the users ahead of it, and the current residual time must come to an end. This is expressed as:

$$E\{W_{\text{token,unbounded}}\} = E\{Y\} + E\{K'\}X + E\{Q'\}$$

where
W_token,unbounded = queuing delay of a packet
Q' = residual time
Y = total duration of all token-passing operations the packet must wait through
X = time to transmit one packet
K' = number of packets queued by users that get the token before the given user

At first observation, E{K'} may seem difficult to calculate, since, unlike reservation, a token-passing scheme does not transmit packets on a strictly first-come, first-served basis.
For instance, consider a case where there are 10 users, indexed by (1,...,10), with the token passing between them in the order of the index. If a new packet is generated at user 10 while the token is at user 5, the newly arrived packet will be transmitted before any of the queued packets at users 1 through 4 that arrived before it. On the other hand, data packets generated by users 5 through 10 after the new packet under consideration is generated, and before the token leaves the respective user, will be transmitted before the packet under consideration. Fortunately, an important result from queuing theory [3] states that the average delay experienced by packets does not depend on the order in which they are transmitted (as long as the order is not influenced by the relative size of the packets). This result implies that the average number of packets transmitted before the newly arrived packet is the same as if packets were sent on a first-come, first-served basis. Thus, from Little's law,

$$K' = aW_{\text{token,unbounded}}$$

where a = aggregate Poisson arrival rate of new data packets. A newly arrived data packet is equally likely to arrive when the token is at any user. Thus, since the token passes between all N users, on average the new packet must wait through N/2 token-passing operations. Assuming the average time to pass the token is L,

$$Y = \frac{LN}{2}$$

Using simplified notation, the average waiting time is now given by:

$$W_{\text{token,unbounded}} = \frac{LN}{2} + aX\,W_{\text{token,unbounded}} + Q'$$

Solving for $W_{\text{token,unbounded}}$, one gets:

$$W_{\text{token,unbounded}} = \frac{\frac{LN}{2} + Q'}{1 - aX}$$

Queuing theory [3] provides the following expression for Q':

$$Q' = \frac{aX^2}{2} + \frac{(1-aX)L}{2}$$

Substituting this result into the expression for $W_{\text{token,unbounded}}$, one obtains:

$$W_{\text{token,unbounded}} = \frac{aX^2}{2(1-aX)} + \frac{L(N + 1 - aX)}{2(1-aX)} \qquad (1.35)$$

Next, the bounded case will be considered. If a newly generated packet arrives at a queue with $\eta$ packets, it must wait for $\lceil \eta/B \rceil$ token-passing cycles.
The first of these will involve the token passing between N/2 users on average, and the remaining ones will have the token passing between all N users. Thus, in the bounded case, Y takes on a value of:

$$Y = \frac{NL}{2} + NL\left(\left\lceil \frac{\eta}{B} \right\rceil - 1\right)$$

With analogy to the reservation case,

$$E\left\{\left\lceil \frac{\eta}{B} \right\rceil\right\} = \frac{\lambda W_{\text{token,bounded}}}{B} + \Delta$$

Therefore, for the bounded case,

$$E\{Y\} = NL\left(\frac{\lambda W_{\text{token,bounded}}}{B} + \Delta - \frac{1}{2}\right)$$

The expressions for K' and Q' remain unchanged. Solving for $W_{\text{token,bounded}}$, one obtains:

$$W_{\text{token,bounded}} = \frac{aX^2 + L(2\Delta N + 1 - aX - N)}{2\left(1 - aX - \frac{N L \lambda}{B}\right)}$$

This provides the following inequality:

$$W_{\text{token,bounded}} < \frac{aX^2 + L(N + 1 - aX)}{2\left(1 - aX - \frac{N L \lambda}{B}\right)} \qquad (1.36)$$

Furthermore, $W_{\text{token,bounded}} \ge W_{\text{token,unbounded}}$. This follows from the fact that a newly generated packet in a bounded token-passing scheme may have to wait for several token cycles, whereas new packets in the unbounded case only wait until the token first arrives.

1.7 Preliminary Performance Evaluation

The viability of using the above multi-access schemes as the basis for a MAC protocol that shares the upstream wavelength between the users will now be examined. The objective here is to select a candidate, or candidates, that warrant further investigation. Performance calculations are only estimates of true network performance, intended to eliminate weaker candidates. Before proceeding, additional information must be presented. The physical layer connectivity will be implemented using Gigabit Ethernet (GE) technology. An overview of GE will be given later. For now, the important characteristics of GE, in terms of impacting the performance of a MAC protocol, are summarized here:

* Maximum frame size: 1526 bytes (9026 if Jumbo Frames are used)
* Minimum frame size: 72 bytes
* Channel rate: 125 Mbytes/second

For simplification, the assumption will be made that all data frames sent by users are of fixed length: the maximum 1526 bytes.
There is some justification for this simplification: when a higher-layer process wishes to send some data, it will divide it into segments so that the data can be sent in the form of IP packets. Since the IP overhead bytes stay the same no matter the amount of data an IP packet contains, higher-layer processes tend to be optimized to send packets as large as possible. In fact, measurements of Internet traffic show [6] that the single most common IP packet size is 1500 bytes, which corresponds to a 1526-byte GE frame after encapsulation. Where appropriate, the effect of using fixed-length 9026-byte jumbo frames will be considered. Throughput is measured as the proportion of the 1 Gbit/s upstream channel that is available for sending data frames. Although 26 bytes of each data frame are overhead (or more if the encapsulated IP packet headers are also considered), this factor will not be considered when calculating throughput, since it is common to all multi-access schemes. The number of users sharing the GE channel, N, may be as large as 1000. Since each user is active approximately 10% of the time, the expected number of active users is:

$$A = (0.1)N$$

A further assumption is that users remain in their active or inactive states for long durations of time, meaning A varies slowly and may be accurately estimated during operation of the distribution network. The performance of the network at any point in time is a function of the current value of A.

1.7.1 Slotted Aloha

The above assumptions are not directly applicable to the Aloha throughput formula (1.7), since the number of users actively generating packets according to a Poisson process is not easily translated to the average number attempting to transmit during a given slot. However, some general conclusions can be drawn. The maximum throughput of Aloha drops below 0.5 when more than 2 users have data to send at any given time.
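This behavior follows from the standard slotted-Aloha result (the thesis's formula (1.7) is not reproduced in this excerpt): with n contending users each transmitting in a slot with the optimal probability 1/n, the per-slot success probability is $(1-1/n)^{n-1}$, which equals 0.5 at n=2 and falls toward 1/e as n grows. A quick check:

```python
def aloha_max_throughput(n):
    """Best-case slotted-Aloha throughput with n contending users,
    each transmitting in a slot with the optimal probability 1/n."""
    return (1 - 1 / n) ** (n - 1)

for n in (2, 3, 10, 100):
    print(n, aloha_max_throughput(n))
```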
Given that on the order of 100 users may be generating traffic at any time, it seems reasonable to assume that typically more than 2 would be attempting to transmit at any time. Thus, one would expect the throughput to be less than 0.5, and indeed close to the limiting value of 1/e (≈0.37). A major impetus in designing the MAC protocol was to make efficient use of the router port. Clearly, a throughput this low does not meet the requirement.

1.7.2 Slotted CSMA/CD

One must find the value of a for the access network in order to evaluate the merit of a CSMA/CD-based protocol. Since users cannot receive upstream transmissions, they are unable to do collision detection directly. The AN has to be involved, either detecting collisions and informing the users, or simply echoing back all transmitted frames. $T_c$ would thus be twice the propagation delay from the user furthest from the AN: a transmitted frame must travel to the AN and come back. The furthest distance from any user to the AN is 10 km, and the propagation speed of light in fiber is approximately $2\times10^8$ m/s. Therefore,

$$T_c = 2\cdot\frac{1\times10^4\ \text{m}}{2\times10^8\ \text{m/s}} = 1\times10^{-4}\ \text{s}$$

Next,

$$T_p = \frac{1526\ \text{bytes}}{125\ \text{Mbytes/s}} = 1.2\times10^{-5}\ \text{s}$$

In this case, a ≈ 10. If jumbo frames are used, $T_p$ will be increased by a factor of approximately 6, meaning a will still be greater than 1. Since CSMA/CD is only an improvement over Aloha for values of a smaller than 1, it warrants no further consideration.

1.7.3 TDM & O-CDMA

Both of these candidates are quickly eliminated from contention because of the way they divide upstream capacity. They both assign 1/N of the total capacity (or less, in the case of asynchronous O-CDMA) to each user, whether it is generating data or not. Since the assumed model is that only approximately 10% of users are active at any time, the throughput is limited to about 0.1.

1.7.4
Reservation and Token-Passing

Before comparing the two remaining schemes, it is important to discuss how each would be implemented given the constraints of GE and the assumptions about the access network. For the token-passing protocol, transmissions of the token between users must go through the AN, since users cannot communicate directly. When a user has completed its transmissions, it informs the AN, which then informs the next user. The average distance from a user to the AN was given as 5 km; thus the token must travel an average of 10 km each time it moves between users. The actual token does not carry much information, only the address of the intended recipient (which is in every GE frame header anyway), so the shortest frame size of 72 bytes should be used. This frame must be transmitted twice per token-passing: by the user terminating transmission, and by the AN. Therefore, ignoring processing overhead at the AN, the average time to pass a token, L, is given by:

$$L = 2\,(\text{prop. delay} + \text{trans. delay}) = 2\left(\frac{5\times10^3\ \text{m}}{2\times10^8\ \text{m/s}} + \frac{72\ \text{bytes}}{125\ \text{Mbytes/s}}\right) = 5.1152\times10^{-5}\ \text{s}$$

For TDM-based reservation, each of the users is assigned a slot in which to send a reservation frame. Again, this frame will not carry much information, only the number of requested packets. Thus, a 72-byte frame will be used. The transmissions are scheduled, meaning that the frames can ideally be sent one after another if the users are perfectly synchronized to a common clock and compensate for their propagation delay to the AN. Unlike the token-passing protocol, the propagation delay will not be added to each transmission, since the reservation frames are pipelined: while one is in flight, the next can already be sent, so the propagation delay is only counted once for all the reservation frames. After the AN receives the requests, it sends back a frame instructing all users when to transmit. Presumably this information could be contained within a maximum-size frame of 1526 bytes.
An ideally synchronized TDM reservation interval is therefore composed of: the time to transmit all the user requests, the time for the last one to propagate to the AN, the time for the AN to transmit its frame, and the time for it to propagate to all the users. Processing delay is again ignored. For 1000 users, this represents a total length of time, R, of:

$$R = 1000\cdot\frac{72\ \text{bytes}}{125\ \text{Mbytes/s}} + \frac{1526\ \text{bytes}}{125\ \text{Mbytes/s}} + 2\cdot\frac{1\times10^4\ \text{m}}{2\times10^8\ \text{m/s}} = 6.8821\times10^{-4}\ \text{s}$$

Given the values of R and L, it is now possible to compare the delay values of the two schemes. The remaining free parameter is a, the aggregate generation rate of new frames. It should be noted that a only appears in the delay formulae within the product aX. The significance of this product can be seen by considering the units of the two variables: X is the time it takes to transmit a frame, given in seconds/frame, and a is the rate at which new frames are generated, given in frames/second. Their product is a unitless value between 0 and 1, representing the load placed on the network. This can be defined as the rate at which new frames are generated, as a fraction of the maximum rate at which frames can be transmitted across the channel. The network load will be the free parameter in the subsequent delay plots.

[Figure 1.3: Unbounded TDM Reservation (1.18) and Unbounded Token-Passing (1.35); log plot of average delay (s) vs. load]

[Figure 1.4: Bounded TDM Reservation (1.19) and Bounded Token-Passing (1.36); log plot of average delay (s) vs. load. T-P: Token-Passing; R: Reservation; B: Bound (max. number of packets sent per cycle)]

Notes regarding Figure 1.4:

* Recall that for bounded T-P and Reservation, the delay formulas (1.19) and (1.36) are upper limits. The lower limits are given by (1.18) and (1.35), respectively.
Thus, the actual average delay lies between the plotted curves representing these upper and lower limits.
* The load value corresponding to the asymptote of a curve is the maximum throughput of that particular scheme/bound.
* The upper limit on the delay performance of a reservation scheme with a 1000-packet bound is so close to the delay of the unbounded case that the curves representing these two cases are coincident.

Figures 1.3 and 1.4 indicate that a reservation protocol provides significantly better delay performance, both for bounded and unbounded cases. Further, for a given bound, the reservation scheme provides a much better maximum throughput. However, it is necessary to take a closer look at some of the idealized assumptions made, and how a physically implemented protocol would differ from the ideal. The most significant assumption for the reservation protocol is perfect synchronization. In reality, hardware limitations introduce inaccuracies to the synchronization. In order to counteract this problem and ensure that there are not any collisions, guard times must be introduced between transmissions - times when no user is transmitting. The longer these guard times are, the longer the reservation period is. Secondly, the processing time required for the AN to schedule the packets was not included. The token-passing assumptions are quite realistic: only the processing time of the AN in returning the token was ignored. Thus, it seems likely that the performance of an actual token-passing protocol would mirror the theoretical results quite closely. In short, the performance of a physically implemented reservation protocol is significantly more uncertain - but potentially much better - than that of a token-passing protocol.
In light of this observation, the following course of action suggests itself: devise and attempt to optimize a reservation protocol and evaluate its delay performance, taking into consideration all the non-ideal factors described above. Optimization will include consideration of the contention-based version of the reservation protocol. If its delay is more favorable than that of the theoretical token-passing protocol, the reservation protocol will be deemed the correct choice. Otherwise, a token-passing protocol must subsequently be specified and its performance evaluated.

2. Reservation Protocol Specification

2.1 Overview

The purpose of this section is to describe the logical functioning of the reservation protocol. Specific hardware implementation issues and performance evaluations are discussed later. Descriptions of packets refer only to the elements of importance to the reservation protocol. Additional framing data will also be addressed later. All users have a unique identification number (ID) which allows them to be recognized by the AN. When a user transmits any type of packet, its ID number will be contained in the header information. When the AN sends a packet intended for a specific user, that user's ID will appear in the header. Some of the AN packets will be addressed to all users. These are identified by a unique broadcast identifier in the header. The protocol has two phases: reservation and data transmission. During the reservation phase, each user that has stored data packets waiting to be transmitted sends a control packet to the AN indicating the number of buffered data packets at the present time. After receiving all requests, the AN carries out a scheduling algorithm. The AN next informs the users of the specific times when each one is allowed to send its data packets, and how many packets each may send. The data transmission phase occurs next. It is the time when users transmit data according to the schedule provided by the AN.
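The addressing rule just described (a per-user ID plus a reserved broadcast identifier in each header) can be sketched as follows; the field names and the broadcast value are illustrative assumptions, not values from the text:

```python
from dataclasses import dataclass

BROADCAST_ID = 0xFFFF  # hypothetical reserved broadcast identifier

@dataclass
class ANPacket:
    """Downstream packet header: recipient ID (or broadcast) plus a type flag."""
    recipient_id: int     # a specific user's ID, or BROADCAST_ID
    type_flag: str        # identifies a data packet or a control packet type
    payload: bytes = b""  # encapsulated IP packet or control information

def addressed_to(packet: ANPacket, my_id: int) -> bool:
    # A user acts on a packet if it is broadcast or carries its own ID.
    return packet.recipient_id in (BROADCAST_ID, my_id)

tp = ANPacket(BROADCAST_ID, "TP")
sd = ANPacket(42, "SD")
print(addressed_to(tp, 7), addressed_to(sd, 7))  # True False
```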
2.2 Access Node Packets

Each user constantly monitors the shared channel for downstream packets originating from the AN. All packets sent by the AN fall into one of two categories:

2.2.3 Data packets

These contain the encapsulated IP packets that arrive at the AN via the IP router and are then sent downstream to the distribution network. Each user examines the destination ID. Those packets intended for other users are discarded. Those intended for the given user are forwarded to the network layer.

Packet Format:
ID of intended recipient | Flag indicating a data packet | IP packet

2.2.4 Control packets

Control packets may carry a special broadcast ID, meaning all users examine them and respond accordingly. Others will be addressed to a specific user. This user will act based on the packet, while all the other users ignore the packet. All control packets are of a fixed structure known a priori by the users - this information is stored in non-volatile memory by each user.

Packet Format:
ID of intended recipient / or broadcast ID | Flag indicating the type of control packet | Additional control information

2.2.4.1 Types of Control Packets

Control packets originating from the AN fall into nine categories.

Reservation Announcement (RA):
Description: It informs all users of the specific times when they may send data packets, and how many; also when the next reservation phase is to occur. This is the only control packet not of a fixed size.
Type: broadcast
Control Information: dependent on the reservation scheme used.
Discussed in: 2.6.1.1 and 2.6.2.1

Time Packet (TP):
Description: It is for synchronizing user clocks to the AN clock.
Type: broadcast
Control Information: current time and date, in the following layout: hours : minutes : seconds : nanoseconds.
Discussed in: 2.5.1

Slot Designation (SD):
Description: It informs a user of its slot assignment in the case of a TDM reservation protocol.
Type: This packet is addressed to a specific user.
Control Information: the user's assigned slot number
Discussed in: 2.5.1

Emergency Packet (EP):
Description: It is sent to initiate the beginning of Synchronization Mode (SM) if the AN detects a high level of collisions indicating loss of synchronization. The packet informs all users to immediately cease transmission, and cancel any future scheduled transmissions.
Type: broadcast
Control Information: none; no additional information is necessary.
Discussed in: 2.5.6

Begin Measurement (BM):
Description: It informs a user that it should now begin sending a Propagation Measurement (PM) packet, so that it can measure its propagation delay to the AN.
Type: This packet is addressed to a specific user.
Control Information: Specifies TB, the time required by the AN, in nanoseconds, to process an incoming PM and transmit a Propagation Return packet. A user requires this information so that it can calculate its propagation delay to the AN.
Discussed in: 2.5.1

Propagation Return (PR):
Description: After a user sends a PM packet, the AN responds by sending back a PR packet.
Type: This packet is addressed to a specific recipient.
Control Information: none
Discussed in: 2.5.1

New Users (NU):
Description: This packet informs users that have just joined the network that they may attempt to register with the AN.
Type: broadcast
Control Information: none
Discussed in: 2.4

Successfully Registered (SR):
Description: This packet informs a user that the AN has successfully received its Network Registration packet.
Type: addressed to a specific user
Control Information: none
Discussed in: 2.4

Registration Collision (RC):
Description: This packet informs users that a collision has occurred between NR packets and that each unsuccessful user should transmit another NR with a certain probability strictly less than 1.
Type: broadcast
Control Information: a probability value
Discussed in: 2.4

2.3 User Packets

2.3.1 Data Packets

User-generated IP packets are encapsulated before being sent to the AN. These encapsulated packets are of a fixed length X. They do not require a destination address since they are all sent to the AN. Although they do not strictly require a source address either (the AN does not need to know the sender; it simply forwards the IP packet to the router), one is nevertheless added to allow the AN to check for problems with user reservations. For instance, if a certain user is scheduled to send at a given time, but a packet from a different user is received, the AN will detect a problem.

Packet Format:
ID of user sending the data | Flag indicating a data packet | IP packet

2.3.2 Control Packets

Again, a destination address is not necessary. The ID of the sending user is, however, required.

Packet Format:
ID of user sending the control packet | Flag indicating the type of control packet | Additional control information

2.3.2.1 Types of Control Packets

Reservation Request (RR):
Description: It informs the AN of how many data packets are waiting in memory to be transmitted.
Control Information: the number of packets.
Discussed in: 2.6.1.1 and 2.6.2.1

Propagation Measurement (PM):
Description: This packet is sent after the user receives a BM packet from the AN. The user sends this packet and waits to receive a PR packet from the AN. It then uses this information to calculate its propagation delay.
Control Information: none
Discussed in: 2.5.1

Network Registration (NR):
Description: This packet is sent by a user attempting to register with the AN.
Control Information: none
Discussed in: 2.4

2.4 Registration

It is expected that new users will be physically added to the network from time to time. The AN must be made aware of all new users, as will become apparent in Section 2.5.
To this end, the AN periodically sends out an NU packet informing new users that they may make their presence known. During this time the regular operation of the reservation protocol is suspended. It is assumed that registration periods occur at a low enough frequency so as to have no significant effect on network performance. In response to an NU packet, all new users (if any) immediately reply with an NR packet containing their ID. If more than one new user attempts to register, it is possible that a collision between NR packets may result. Due to different propagation delays between users and the AN, it is certainly also possible that two or more users attempt to register and are all successful. If the AN successfully receives one or more NR packets, it will send back an SR packet for each NR received. This packet informs the new users that they have successfully registered. If the AN detects a collision, it initiates a back-off among the competing users, meaning that each one will send a subsequent NR with a probability strictly less than one. The AN sends out an RC packet informing the users of the explicit probability value. Immediately after receiving the RC, each user randomly chooses whether to transmit an NR based on the probability provided. If the AN detects another collision, this indicates that more than two users may be competing. Thus, it sends out another RC packet with a lower probability value. After an idle, the AN sends out an RC containing the same probability value as the previous one. If the AN receives a successful NR following one or more collisions, it will know that at least one further new user exists that has not yet successfully registered. Therefore, the AN subsequently sends out an NU packet informing the remaining new user(s) to register (with probability 1). If another collision is observed, a back-off is again initiated.
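The collision/back-off loop described above can be exercised with a toy simulation. The collision model below is a simplification (any round with more than one transmission is treated as a collision, ignoring the possibility, noted above, that staggered propagation delays let several NRs succeed), and the probability-halving rule and floor are assumptions, not values from the text:

```python
import random

def register_new_users(n_new: int, seed: int = 1) -> int:
    """Return the number of contention rounds until all n_new users register."""
    rng = random.Random(seed)
    unregistered = n_new
    p = 1.0        # the initial NU invites everyone with probability 1
    rounds = 0
    while unregistered > 0:
        rounds += 1
        senders = sum(1 for _ in range(unregistered) if rng.random() < p)
        if senders == 1:
            unregistered -= 1     # AN returns an SR to the lone sender...
            p = 1.0               # ...then an NU re-invites the rest with p = 1
        elif senders > 1:
            p = max(p / 2, 0.05)  # RC with a lower probability (assumed halving)
        # senders == 0: idle round; the AN re-sends an RC with the same p
    return rounds

print(register_new_users(4))
```

Since each contention round registers at most one user, at least n_new rounds are always needed; lowering the RC probability after each collision is what keeps the expected number of extra rounds small.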
The operation of the AN and the new users during the registration phase is summarized in the following two diagrams:

Figure 2.1: AN Operation During the Registration Phase [state diagram: the AN sends an NU and waits; if it correctly receives one or more NR packets and detects no collisions, it sends an SR for each NR received and returns to regular operation for a certain amount of time; if it detects a collision, it sends an RC (and an SR for any correctly received NR packets) and repeats on each subsequent collision or idle]

Figure 2.2: Operation of a New User During the Registration Phase [state diagram: a new user waits for an NU and sends an NR; on receiving an RC it sends another NR with the probability indicated by the RC; on receiving an SR it has successfully registered at the AN]

2.4.1 User Removal

An existing user may occasionally be physically removed from the network. If static TDM reservation is used (2.6.1), the AN should be made aware of this, so that it can eliminate the user's reservation slot. A user may be removed abruptly, and it is therefore not reasonable to expect it to send a packet to the AN requesting to be "de-registered". A polling algorithm, in which the AN periodically communicates with all users to ensure they are still on the network, is also flawed - the AN cannot distinguish between users that are powered down and those that have been removed. A simple alternative is to have a network operator inform the AN of the removal of a user through a configuration interface. This would not have to be done immediately, since the AN will continue to operate normally even if not aware of the removal of a user - that user's slot will simply be wasted during every reservation phase.

2.5 Timing

The reservation-based scheme involves the scheduling of future events. It is therefore necessary for all users and the AN to have access to a common time source. The AN and all users have clocks which are synchronized with one another.
The term "clock" refers to a device that keeps track of elapsing time. It will not be used to describe an electronic oscillator that produces a fixed frequency. The AN sends out an RA packet at the end of each reservation phase informing all users of when to transmit. Upon receiving this packet, every user extracts its designated transmission time and uses its own clock to begin transmission at the appropriate time. The challenge in this approach is to achieve and maintain synchronization between all the users' clocks and the master time clock at the AN. The following strategy is suggested.

2.5.1 Synchronization

The AN periodically sends out a broadcast TP which contains the present time of its master clock. Each user receives the packet and sets its own clock accordingly. The problem is that some time elapses between the instant the TP is sent and the instant it is received. This delay is composed of the following three elements: the generation and transmission of the TP by the AN, the propagation of the TP from the AN to the user, and the processing of the packet by the user. These elements are illustrated in the following diagram:

Figure 2.3: Delay Between TP Generation and Users Setting Their Clocks
TA: time for the AN to generate and transmit the Timing Packet
Di: the propagation delay for the TP to travel from the AN to user i
Tu: time required by a user to process the TP and set its time clock

In order to compensate for its transmission delay, the AN adds TA to the time value sent in the TP. A user compensates for the remaining delays by adding the sum of Di and Tu to the time value of the TP. It then sets its clock to this calculated value. TA is a function of the AN hardware and the rate at which data can be transmitted across the fiber.
The value should be precisely determined by the manufacturer during the design and testing phase of the AN. The AN will have this value stored in non-volatile memory. Tu is a function of only the user hardware. Again, it must be determined by the manufacturer and placed in non-volatile user memory. Presumably it would be identical for all users; however, if this is not the case, it does not matter, since each user knows its own value of Tu. The value of Di is different for each user, and is a function of the distance from a user to the AN. The following procedure is proposed for having each user determine its value of Di: After a new user has successfully registered, the AN sends it a BM packet. The BM packet contains a value called TB; its significance will be addressed shortly. Upon receiving the BM, the user sends a PM control packet and begins a timer once the transmission is completed. Immediately after receiving the PM from the user, the AN sends back a PR. The time between the AN receiving the PM and returning a PR is given by TB. This value includes the time to process the inbound PM, generate the PR, and transmit this packet. Again, TB is a function of the AN hardware and the rate at which data can be sent across the fiber. It must be measured beforehand and stored. When the user receives the PR, it stops the timer. The value of the timer corresponds to 2Di + TB, meaning the user has sufficient information to calculate Di. The assumption here is that the propagation delay is the same upstream as downstream. The procedure is illustrated in Figure 2.4.
Figure 2.4: Propagation Delay Measurement [timing diagram: the AN sends a BM packet; the user replies with a PM packet and starts a timer; after the turnaround time TB the AN returns a PR packet; the user stops its timer, which reads 2Di + TB]

2.5.2 Synchronization Issues

Once a new user determines its value of Di, it will ideally be able to perfectly synchronize its clock to that of the AN. However, in reality this synchronization will not be perfect, for a number of reasons. This issue will now be addressed.

2.5.2.1 Precision of measured quantities

As mentioned above, TA, TB, and Tu are delay values that have been previously measured, namely during the design and testing phase of the user and AN hardware. As with any measured quantity, there is some uncertainty associated with each. In particular, it may be due to limitations of the measuring equipment, and the possibility that these values change over the operating temperature range of the equipment. Assuming that the maximum uncertainty of these quantities is known, one can calculate the resulting maximum difference between the AN master time and the clock value of any user. Let ΔTA, ΔTB, and ΔTu be the maximum uncertainties of the three measured quantities. This means that, for instance, the true value of the time required for the AN to generate and transmit the TP would lie in the range [TA - ΔTA, TA + ΔTA]. It is important to note that, since the value of Di is calculated by the user subtracting TB from the time it takes to receive back a PR and dividing the result by two, the uncertainty of Di is ΔTB/2. The value that user clocks are set to upon reception of a TP is the sum of TA, Tu, Di, and the time of the master clock when the TP packet is sent. Therefore, the Fixed Uncertainty (UF), the maximum difference between the clock of the AN and that of any user immediately after receiving a TP, is given by the sum of the uncertainties of the quantities used in setting the clock:

UF = ΔTA + ΔTB/2 + ΔTu

2.5.2.2 Clock Drift

A clock utilizes a reference frequency from an electronic oscillator in order to keep accurate time.
However, such an oscillator is not perfect in its generation of the reference frequency. It has a characteristic stability measurement that expresses its maximum deviation from the nominal value of its frequency. Since the frequency does not remain perfectly constant, the clock using the oscillator will not keep perfect time. The instability of the oscillator is measured in parts per million (ppm). One ppm corresponds to a maximum error of 1 μs/s for the clock using the oscillator. This means that for each second the clock is running, the time value it produces can diverge from the time it was initially set to by one μs for every ppm of oscillator instability. The maximum error value, expressed in s/s, will henceforth be referred to as the Uncertainty Rate (UR).

2.5.3 Maximum discrepancy between AN and any user

In order to counteract the effects of clock drift, the AN periodically sends out a TP so that the users can resynchronize. If a TP packet is sent at least once every σ seconds, the maximum discrepancy, dAU, between the AN clock and the clock of any user is given by:

dAU = UF + 2·UR·σ

A justification is required for the factor of 2: just as the clock of a user is subject to drift, so is the AN master clock. It is assumed that the same type of oscillator is used to implement the AN clock, meaning its UR will be the same. Now, if a user experiences maximum clock drift in, say, the positive direction (it runs fast) during the time interval of length σ, and the AN experiences maximum negative clock drift during that time (it runs slow), the two clocks will be off by 2·UR·σ. This is in addition to the initial UF value.

2.5.4 Maximum discrepancy between any two users

During regular operation, a user will have exclusive access to the upstream channel for a short time before allowing the next scheduled user to transmit.
It is during these transitions of channel control, when one user stops transmitting and the next user begins, that collisions may occur due to discrepancies between users' clocks. In order to counteract the possibility of collisions, it is necessary to add a guard time - a time when no user is scheduled to transmit. The amount of guard time needed is equivalent to the maximum discrepancy between the clocks of any two users. This is a different value than dAU. Specifically, the error in the TA value does not matter: all users set their clocks to the same broadcast TP. Since the AN compensates for the time to generate and transmit the TP, any deviation from the true value will be common to all users. In other words, all users will share a common offset from the AN time. However, every user may have its own unique deviation from the nominal Tu; thus, this offset does not cancel. Furthermore, the propagation delay from the AN is measured separately for each user. This means that for different users the deviation from the nominal value of TB may differ (if, for instance, the AN undergoes a temperature change between measurements and its processing time is affected). The worst-case difference between any two users, i and j, results from the following case: one user, say i, is subjected to the maximum positive errors of the measured quantities, +ΔTB/2 and +ΔTu, and then undergoes maximum positive drift. User j is subjected to the maximum negative errors, -ΔTB/2 and -ΔTu, and then undergoes maximum negative clock drift. Their maximum clock discrepancy, and thus the required guard time, gc, is given by:

gc = ΔTB + 2(ΔTu + σ·UR)

Figure 2.5 illustrates this derivation. In this diagram, the "nominal" end/beginning refers to the time when the users would start/end their transmission if they were perfectly synchronized to the AN. Since the value of the TA error is common to both users i and j, it shifts their beginning/ending times by the same amount.
The maximum over-shoot is the amount of time user i may transmit beyond its allocated time, due to the factors indicated. The maximum under-shoot is how much before its scheduled time user j may start transmitting.

Maximum "under-shoot" = ΔTu + ΔTB/2 + σ·UR
Maximum "over-shoot" = ΔTu + ΔTB/2 + σ·UR

Figure 2.5: Illustration of Required Guard Time [the nominal end of user i's transmission and the nominal beginning of user j's transmission, both shifted equally by the TA error, are separated by the required guard time gc = ΔTB + 2(ΔTu + σ·UR)]

2.5.5 Compensating for Propagation Delay

It has been shown that in order to have clock synchronization - within the guard time gc - between all users, each user had to compensate for the propagation delay of downstream TP packets. Upstream packets are subject to the same propagation delay, and therefore compensation is also required whenever a user sends a packet. Failure to account for upstream propagation delays may lead to numerous collisions. For instance, consider a case where user i is scheduled to transmit from time T0 until time T1, and user j is to transmit next after waiting for the guard time gc. Let user i have a propagation delay Di to the AN and user j have a propagation delay Dj. Assuming, initially, perfect clock synchronization and no upstream propagation delay compensation, user i ceases transmission at time T1 and its data continues to arrive at the AN until time T1 + Di. User j waits until time T1 + gc to transmit, and the first of its data arrives at the AN at time T1 + gc + Dj. A collision will occur between the transmissions of user i and user j if the following holds:

Di - Dj > gc

The presence of the guard time gc reduces the likelihood of a collision under the assumption of perfect clock synchronization. However, once this assumption is removed, gc cannot be relied on to lessen the likelihood of a collision due to propagation delay differences.
In the worst-case scenario, the clock of user i is gc seconds faster than that of user j, meaning the entire guard time will be used just to compensate for the clock discrepancy. Thus, a collision may occur if Di > Dj and, more generally, any time the propagation delay of the user ceasing transmission is longer than that of the one next starting to transmit. One possibility for compensating for upstream propagation delays is to add an additional guard time between transmissions. This guard time must be long enough to avoid a collision under the worst-case scenario: the user ceasing transmission has the maximum possible propagation delay to the AN, and the next user has the shortest possible propagation delay. Since the largest distance between any user and the AN is 10 km, the longest propagation delay is:

1x10^4 m / (2x10^8 m/s) = 5x10^-5 s     (2.1)

A user may lie arbitrarily close to the AN, so the lower limit on the propagation delay is 0. This means that the guard time would have to be increased by a value equal to (2.1). The other way of handling propagation delays is to have each user begin its transmission early, so that its packets arrive at the AN at the scheduled time. Thus, in the above example, user i would begin its transmission at time T0 - Di. Each user has the stored value of its propagation delay from the procedure described in Section 2.5.1. This value has an uncertainty of ΔTB/2, meaning that collisions could still occur. In order to eliminate this possibility, an additional guard time to compensate for propagation uncertainties, gp, must be added. The value of gp is ΔTB. Presumably, TB can be measured to a much greater accuracy than the value of (2.1). Under this assumption, the second method for compensating for upstream propagation delays extends the required guard time by a smaller amount than the first, and will therefore be the one used in the reservation protocol.
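The guard-time budget implied by this choice can be compared numerically against the 50 μs blanket guard of the first method. All the uncertainty values and the TP interval below are illustrative assumptions, not figures from the text:

```python
# Illustrative guard-time budget (all times in seconds).
dT_B = 8e-9      # assumed uncertainty of the AN turnaround time TB
dT_U = 5e-9      # assumed uncertainty of the user processing time Tu
U_R = 20e-6      # assumed 20 ppm oscillator instability, in s/s
sigma = 0.1      # assumed interval between TP resynchronizations

g_c = dT_B + 2 * (dT_U + sigma * U_R)   # clock-discrepancy guard time
g_p = dT_B                              # propagation-uncertainty guard time
g = g_c + g_p                           # total guard time between transmissions

# For comparison, the worst-case propagation delay of equation (2.1):
worst_case = 1e4 / 2e8
print(g < worst_case)  # True
```

With these assumed numbers the total guard time is about 4 μs, dominated by oscillator drift over the TP interval σ; shortening σ (resynchronizing more often) shrinks it directly, which is the trade-off the early-transmission method exploits.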
The total guard time, g, between transmissions, necessary to compensate for clock discrepancies and propagation uncertainties, is given by:

g = gc + gp

2.5.6 Collisions between Upstream Packets

Because of the guard time between user transmissions, collisions between upstream data packets should not occur during normal operation. The presence of collisions indicates that a loss of synchronization beyond the guard times has occurred. The AN cannot directly detect a collision. Instead, it checks the integrity of all received data packets. During collision-free operation there should be only a small fraction of corrupt data packets, since transmissions over fiber are not error-prone. As the network operates, the AN keeps track of the rate at which it receives corrupt packets. If this rate begins to increase rapidly, the AN will conclude that collisions are taking place. If the AN detects collisions, it will attempt to reestablish synchronization by entering a special Synchronization Mode (SM). First, it sends out an EP informing all users to immediately cease transmission. It then goes through the same process described in Section 2.5.1 for every registered user, allowing each one to re-measure its propagation delay. When completed, it sends out another EP informing users to continue normal operations. Entering SM is somewhat of an extreme action to take, since it results in the network going down for a period of time. However, the justification is as follows: collisions between data packets represent catastrophic events; the access network is not designed to operate properly while they occur. If collisions are happening, the performance of the network has been compromised to such an extent that letting it continue in its present state is not an option. Also, the root cause of the collisions is not likely to resolve itself during normal operation. There is no guarantee that going through SM will eliminate the occurrence of collisions.
In fact, the only cause of loss of synchronization that will be solved by SM is that one or more of the users' stored propagation delay values has changed significantly since the time it was last measured (for instance, due to a severe change in the operating environment). There are other, arguably more likely, causes of collisions, such as hardware malfunctions. However, these other causes may not be rectifiable without outside intervention. Thus, if, upon going through SM, the AN immediately continues to observe collisions, it will indicate an access network failure to an outside entity.

2.6 Reservation Schemes

This section discusses two options for how users make requests for sending packets, and how the AN informs them of when they may send.

2.6.1 Static TDM

In this scheme, the reservation phase gives each user an opportunity to send an RR free of contention. There are N slots during the reservation phase, and each user is assigned one of the slots in which to send an RR. A user that is not on-line, or has no data packets waiting to be sent, will remain idle during its slot. The slot assignment is the same each time. Users must be aware of their slot assignment. This is handled after a new user registers: an SD packet is sent that informs it of which slot number it has been assigned. When a new user registers, its slot position will be after those of all the other users; thus, these users are unaffected. However, if a user is removed from the network and its slot eliminated (Section 2.4.1), all the users whose slots were after that of the departed one must have their slots reassigned. Thus, the AN sends out an SD to each user. An overview of the reservation phase of the protocol is given in Figure 2.6. It shows three of the N users reserving. The users are identified by their assigned reservation slot.
As described in Section 2.5.5, each user compensates for its propagation delay Di by beginning its transmission early, so that the RR packet arrives at the AN at the scheduled time. The guard time, g, is allocated between all RR requests to eliminate collisions. It is important to note the behavior of User 2: although its reservation slot is after that of User 1, it actually begins its transmission before User 1 because it is significantly farther from the AN.

Figure 2.6: TDM Reservation Phase [Users 1, 2, and N each transmit their RR packet early enough for it to arrive at the AN in their assigned reservation slot, with guard time g between slots; the AN then runs the scheduling algorithm during its processing time and transmits the RA broadcast packet, which propagates back to all users]

After the AN receives all the requests, it runs a scheduling algorithm and sends out the RA packet informing all users when to send during the data phase. An overview of the data phase of the protocol is given in Figure 2.7. All users have received the RA packet and therefore know when to send their data packets. The guard time g is added between the time one user ceases transmission and the next one begins.

Figure 2.7: Data Phase [Users 1, 2, and N transmit in their allocated data packet slots, with guard time g between consecutive users' transmissions]

2.6.1.1 AN Operation

The AN receives all the reservation packets from the users, and uses them to generate and broadcast an RA packet which informs users of when they may send their data packets. In order to generate the RA, it first runs its scheduling algorithm (Section 2.7) after receiving all the requests. Next, the AN calculates the Beginning of the Data Phase (BD) value, the time when the data of the first user is to arrive.
The AN must ensure that, when the first user allowed to transmit receives the RA, that user's clock has not yet passed the time at which it is to begin broadcasting. Otherwise, the first user would effectively receive notification to begin broadcasting in the past. This may lead to several undesirable outcomes: the user may not know how to handle the situation and not transmit at all, or it might try to send all its packets anyway, run past its allotted time, and cause collisions with the next user. The first user is to start transmitting at time BD - D1, according to its clock. Thus, when it receives the RA, the value of BD it contains must be at least D1 ahead of its clock. This means that when the AN generates the BD value it must add a number of values to the present time of its master clock. These include: the maximum discrepancy between its clock and that of the user, the time it takes to transmit the RA, and twice D1 (to allow the RA to propagate to the user, and to allow the user to begin broadcasting at time BD - D1). Just prior to transmitting the RA, the AN records its clock value and adds the total length of time of the above factors to calculate BD. The maximum clock discrepancy is given by dAU; UF, UR, and α will be stored in non-volatile AN memory to allow the AN to calculate dAU. The time required to transmit the RA can also be calculated by the AN, since the channel rate is known. The exact propagation delay to user 1, D1, will not be known by the AN, because it is the users, not the AN, that measure and store the propagation delays. However, the AN can simply use the worst case: the propagation delay for a user at the maximum 10 km distance. This ensures that the first scheduled user (and every other user, for that matter) receives the RA before its clock time exceeds the time when it must start transmitting.
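The BD calculation above can be sketched as follows. This is an illustrative sketch only: the numeric values for the worst-case propagation delay and the clock discrepancy bound are assumptions, not figures from the thesis.

```python
# Sketch of the AN's BD calculation (Section 2.6.1.1). All numeric constants
# here are illustrative assumptions. Times are in ns, the clock resolution
# used by the protocol.

D_MAX = 50_000        # assumed worst-case propagation delay for 10 km of fiber, ns
D_AU  = 100           # assumed maximum AN/user clock discrepancy dAU, ns

def compute_bd(an_clock_now_ns, ra_length_bits, bit_time_ns=1):
    """BD must lie far enough ahead that the first user can still obey it.

    The AN adds: the maximum clock discrepancy, the time to transmit the RA,
    and twice the worst-case propagation delay (the RA must travel to the
    user, and the user must start at BD - D1, up to D_MAX before BD).
    """
    t_ra = ra_length_bits * bit_time_ns   # GE sends one bit per ns
    return an_clock_now_ns + D_AU + t_ra + 2 * D_MAX

bd = compute_bd(an_clock_now_ns=1_000_000, ra_length_bits=800)
assert bd == 1_000_000 + 100 + 800 + 100_000
```

Using the worst-case delay D_MAX in place of the unknown D1 is exactly the conservative choice described above: it guarantees every scheduled user receives the RA in time.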
The next step for the AN is to calculate the Beginning of Reservation Phase (BR), the starting time of the next reservation phase. BR is calculated as the time required to send all the scheduled packets, added to BD. Using the above information, the AN generates and sends the RA packet. The format of its control information is as follows, where k is the total number of users that are allowed to transmit:

- Beginning of Data Phase (BD)
- ID of 1st user allowed to send packets; number of packets the user may send
- ID of 2nd user allowed to send packets; number of packets the user may send
- ...
- ID of kth user allowed to send packets; number of packets the user may send
- Beginning of Reservation Phase (BR)

In order to allow users to parse the RA, its various sections will be of a fixed bit length that is known to the users. The BD section will always be the same size, LT bits (a sufficient number of bits to represent an explicit time). After this, the two sections that make up the transmission information for each user will always be the same sizes: let LI designate the length of the "ID section", and LN the length of the "Number of Packets section". Finally, the length of the BR section will also be LT. The operation of the AN scheduler is summarized in Figure 2.8.

[Figure 2.8: AN Operation - receive Reservation Request packets from users; run the scheduling algorithm; calculate BD and BR; generate and transmit the Reservation Announcement packet; wait until BR.]

2.6.1.2 User Operation

Because of the fixed lengths of the RA sections, a user knows precisely which bits to examine in order to extract its intended information. After receiving the RA, it first records the total length of the packet, LP. Unlike the other lengths, this one is not constant: it depends on the number of users, k, that are allowed to send data packets. Next, the user stores the value of BD (the first LT bits in the packet) and the value of BR (the last LT bits of the packet). It then calculates the value of k as follows:

k = (LP - 2LT) / (LI + LN)

Next, it checks the values of all the ID sections. There is one starting at each of the following bit positions:

LT + 1 + j(LI + LN), j = 0, 1, ..., k-1

If a user finds its own ID in one of these positions, say starting at bit b, it next records how many packets it is allowed to send: this value begins at bit position b + LI. Finally, a user must determine the time when it is to begin its transmission. As mentioned before, the first user allowed to transmit begins sending its packets at time BD - D1. Every other user must perform additional calculations to determine its transmission time. First, a user finds the number of packets that are to be sent before its turn comes. It can determine this number by summing all the Number of Packets sections before its own. Thus, the value of Zi, the number of packets to be sent before the ith user allowed to send packets begins transmission, is given by

Zi = sum from j = 0 to i-2 of θ(LT + LI + 1 + j(LI + LN))

where θ(*) is defined as "the value of the bit sequence of length LN beginning at bit position *". All data packets are of a fixed size, and each requires a fixed transmission time X; there is a guard time g after one user ceases transmitting and before the next begins. Thus, the ith user allowed to transmit can calculate its explicit transmission time Ti as follows:

Ti = BD + g(i-1) + X·Zi - Di

Each user stores its calculated value of Ti and transmits its allotment of packets at that time. The RA also allows users to calculate the time at which they may next send an RR. The value of BR specifies the time when the user assigned the first reservation slot is allowed to transmit its RR. The other users calculate the explicit time at which they are to send their next RR. RR packets require a fixed transmission time Y, and there is also a guard time g between transmissions. Thus, the user which has the jth assigned slot may send its RR packet at time Pj:

Pj = BR + (Y + g)(j-1) - Dj

Note: The designations "ith user allowed to transmit" and "user which has the jth assigned slot" are distinct identifiers.
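The user-side parsing and timing calculations can be sketched as follows. This is an illustrative sketch: the field widths follow the 6-byte/2-byte sizes chosen later in the thesis (LT = LI = 48 bits, LN = 16 bits), the helper names are assumptions, and bit positions are 0-indexed here rather than 1-indexed as in the prose.

```python
# Sketch of a user parsing the RA and computing its transmission times
# (Section 2.6.1.2). Hypothetical helper names; bits is a string of '0'/'1'.

L_T, L_I, L_N = 48, 48, 16     # time, ID, and packet-count field widths (bits)

def parse_ra(bits):
    """Return (BD, BR, [(user_id, n_packets), ...]) from an RA bit string."""
    k = (len(bits) - 2 * L_T) // (L_I + L_N)   # k = (LP - 2*LT) / (LI + LN)
    bd = int(bits[:L_T], 2)                    # first LT bits
    br = int(bits[-L_T:], 2)                   # last LT bits
    grants = []
    for i in range(k):
        base = L_T + i * (L_I + L_N)           # start of the i-th ID section
        uid = int(bits[base:base + L_I], 2)
        n = int(bits[base + L_I:base + L_I + L_N], 2)
        grants.append((uid, n))
    return bd, br, grants

def data_time(bd, grants, my_uid, x, g, d_i):
    """T_i = BD + g*(i-1) + X*Z_i - D_i, with Z_i the packets granted before us."""
    z = 0
    for i, (uid, n) in enumerate(grants, start=1):
        if uid == my_uid:
            return bd + g * (i - 1) + x * z - d_i
        z += n
    return None                                # not granted this cycle

def reservation_time(br, slot_j, y, g, d_j):
    """P_j = BR + (Y + g)*(j - 1) - D_j for the user holding the j-th slot."""
    return br + (y + g) * (slot_j - 1) - d_j
```

For example, with BD = 1000, grants of 3 and 2 packets to users 5 and 9, X = 100, and g = 10, user 9 (second allowed to transmit, Z = 3, D = 5) computes T = 1000 + 10 + 300 - 5 = 1305.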
The order in which users are allowed to transmit changes each data phase, depending on which users sent RR packets in the previous reservation phase. On the other hand, the order of the reservation slots is fixed. The operation of a typical user is summarized in Figure 2.9 (in this example, the user is the ith one allowed to transmit, and has the jth assigned reservation slot).

[Figure 2.9: User Operation - the user receives and parses the RA; if allowed to transmit, it calculates Ti and begins transmitting its allocated number of data packets at that time; it then calculates Pj and, if it has data in its buffer, transmits a Reservation Request packet at Pj; otherwise it continues to monitor inbound packets.]

2.6.2 Contention Reservation

The contention reservation scheme that will be considered is slotted Aloha. A certain number of reservation slots are available per reservation phase, and any user having stored data packets sends an RR packet in a randomly selected slot. The AN may dynamically change the number of slots available during each reservation interval depending on its estimate of the arrival rate λ. If two users transmit in the same slot, a collision occurs and neither is successful. By the nature of the network, users are not able to detect collisions directly. They simply wait until the AN sends out its RA packet; if their RR packet was successfully received, they will be allowed to send data packets. Otherwise, they remain idle until the next reservation period.

2.6.2.1 Operation

In order to implement a contention-based scheme, only a few modifications to the TDM case are needed. The RA packet has one additional section, of length LC bits, specifying the number of contention slots NC in the next reservation phase. This new section follows BR. The rest of the RA packet remains unchanged. The AN must be able to detect when the data it receives during a reservation slot is the result of a collision, and ignore it.
In order to do this, it simply checks the integrity of every RR packet and discards those that contain errors. After receiving all the successful RR packets, the behavior of the AN is the same as in the TDM case: it runs a scheduling algorithm to decide how many packets each user may send. The BD and BR calculations are unchanged as well. When a user receives the RA, it calculates k as:

k = (LP - 2LT - LC) / (LI + LN)

It then stores the value of BD (the first LT bits in the packet), the value of NC (the last LC bits of the packet), and the value of BR (the LT bits preceding NC). The user next searches the packet for its ID and, if found, calculates its Ti value. This procedure remains unchanged from the TDM case. Next, the user calculates its next reservation time. In order to do so, it randomly picks an integer Zi in the range [0 ... NC - 1], and then calculates its reservation time as

BR + Zi(Y + g) - Di

A user will only transmit an RR packet at this calculated time if it has packets waiting to be sent.

2.7 Scheduling Algorithm

There is a wide variety of possible scheduling algorithms available. Generally they offer a trade-off between fairness and efficiency. An example of a fair algorithm is one that keeps track of how many packets each user has recently sent (where the definition of "recently" depends on how far into the past the algorithm tracks user behavior), and when assigning channel usage gives priority to users who have sent fewer packets recently. An example of an efficient algorithm is one in which every user is allowed to send all its queued data packets every cycle. This results in long data intervals, meaning shorter average delays. However, it can lead to high-rate users "hogging" the channel with a large data transaction and causing long delays for other users. The structure of the access network makes the scheduling algorithm entirely transparent to the user.
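The contention-mode reservation timing from Section 2.6.2.1 can be sketched as follows. This is an illustrative sketch with assumed parameter values; the function name is hypothetical.

```python
import random

# Sketch of a user's contention-mode reservation timing (Section 2.6.2.1).
# The user draws a random slot Z_i in [0, Nc-1] and transmits its RR at
# BR + Z_i*(Y + g) - D_i, and only if it actually has packets queued.

def contention_reservation_time(br, n_c, y, g, d_i, rng=random):
    z = rng.randrange(n_c)          # uniformly chosen contention slot
    return br + z * (y + g) - d_i

# With BR = 10000, Nc = 8, Y = 50, g = 10, D_i = 5, the result falls on one
# of the 8 slot boundaries between 9995 and 10415 inclusive.
t = contention_reservation_time(br=10_000, n_c=8, y=50, g=10, d_i=5)
assert 9_995 <= t <= 10_415 and (t - 9_995) % 60 == 0
```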
A user requests to send packets, and sends them when allocated channel access by the AN: it is entirely oblivious to the scheduling done by the AN. This feature affords great flexibility. It is not necessary to be limited to one scheduling scheme; indeed, the AN can be programmed to perform any number of scheduling algorithms. It can even be configured to provide Quality of Service. For instance, high-priority users could be guaranteed permission to send at least some minimum number of data packets each data phase. Thus, it is proposed that the AN not be constrained to a single scheduling algorithm. Instead, the AN is to be configurable via network operator commands to provide any desirable scheduling algorithm. However, for simplicity, this thesis only considers unbounded and statically-bounded round-robin scheduling. In these schemes, each user requesting to send is always allowed to send during the following data interval. In the unbounded case, each user is allowed to send all the packets it requested. In the bounded case, a user may only send up to a certain fixed number of packets, B.

3. Hardware

3.1 Gigabit Ethernet

Gigabit Ethernet (GE) technology will be used to provide the physical layer hardware needed for the access network. It was chosen because it is becoming a widely adopted technology for transmitting gigabit-per-second data over fiber. The required hardware for implementing a GE link is available from a wide array of vendors, and its cost is continuously declining.

3.1.1 History

Ethernet is a CSMA/CD-based LAN technology that was first designed in 1973 at Xerox. In 1982 the IEEE released the first official 10 Mbps Ethernet standard, called 802.3. It specified both the physical and data link layer operation. In 1995 a 100 Mbps standard was introduced which was fully backward compatible. In 1998, the GE (1000 Mbps) standard was released; 802.3z specifies GE operation over a fiber optic medium.
GE maintains the same frame structure as the previous Ethernet versions. Currently, a standard for 10-Gigabit Ethernet, 802.3ae, is being finalized.

3.1.2 Frame Structure

| Preamble: 7 bytes | SFD: 1 byte | DA: 6 bytes | SA: 6 bytes | Length/Type: 2 bytes | Data: 46-1500 bytes | FCS: 4 bytes |

Preamble: This is a sequence of alternating 1 and 0 bits that allows the receiver to perform clock recovery: synchronizing to the times when one bit ends and the next begins, so that the remainder of the packet can be correctly received.

Start of Frame Delimiter (SFD): This is a unique bit sequence that informs the receiver that the preamble has come to an end and the remainder of the frame is about to begin.

Destination Address (DA): Each user has a unique identifier called a MAC address. The DA field contains the MAC address of the intended recipient, allowing users to discern whether to process the frame or ignore it. A bit sequence of all 1s indicates a broadcast frame, meaning it is intended for all users.

Source Address (SA): The MAC address of the station transmitting the frame is included so the receiver can identify the sender.

Length/Type: This field can be used to designate the length of the subsequent data field, or to identify the type of packet encapsulated within the data field. The code 0800 (hex) specifies an IP packet.

Data: The information being transmitted, such as an encapsulated IP packet or control data. Note: the 1500-byte length restriction on this field is per the IEEE standard; however, some vendors offer GE products that expand the size of this field up to 9000 bytes. Frames whose data length is between 1501 and 9000 bytes are called "Jumbo Frames".

Frame Check Sequence (FCS): This is a checksum generated by performing a cyclic redundancy check (CRC) on all the other fields of the frame except the Preamble and SFD. Upon reception, the CRC is recalculated by the receiver. If it does not match, the receiver assumes the frame is corrupted and discards it.
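The frame layout and FCS check above can be sketched as follows. This is an illustrative sketch, not a conforming implementation: Python's zlib.crc32 computes the same CRC-32 algorithm used for the Ethernet FCS, but wire-level bit ordering is glossed over here, and the helper names are assumptions.

```python
import struct
import zlib

# Sketch of GE frame assembly and checking per Section 3.1.2.
PREAMBLE  = b"\x55" * 7      # alternating 1/0 bits for clock recovery
SFD       = b"\xD5"          # start-of-frame delimiter
BROADCAST = b"\xFF" * 6      # DA of all 1s: broadcast frame

def build_frame(dst, src, ethertype, payload):
    if not 46 <= len(payload) <= 1500:
        raise ValueError("data field must be 46-1500 bytes (no jumbo frames)")
    body = dst + src + struct.pack("!H", ethertype) + payload
    # The FCS covers DA through Data, but not the Preamble or SFD.
    fcs = struct.pack("<I", zlib.crc32(body))
    return PREAMBLE + SFD + body + fcs

def check_frame(frame):
    """Receiver side: recompute the CRC over DA..Data and compare with the FCS."""
    body, fcs = frame[8:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body)) == fcs

frame = build_frame(BROADCAST, b"\x00\x11\x22\x33\x44\x55", 0x0800, b"\x00" * 46)
assert check_frame(frame)
# Corrupting any covered byte makes the receiver discard the frame.
assert not check_frame(frame[:-1] + bytes([frame[-1] ^ 0xFF]))
```

A minimum-size frame here is 7 + 1 + 6 + 6 + 2 + 46 + 4 = 72 bytes, matching the field widths in the layout above.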
3.1.3 Gigabit Ethernet Layers

The GE standard follows a hierarchical structure, as illustrated in Figure 3.1.

[Fig 3.1: GE Layers and Connection to OSI Reference Model - from top to bottom: LLC (Logical Link Control), MAC (Media Access Control), RS (Reconciliation Sub-layer), GMII (Gigabit Medium Independent Interface), PCS (Physical Coding Sub-layer), PMA (Physical Medium Attachment), PMD (Physical Medium Dependent), MDI (Media Dependent Interface), and the fiber optic media. The LLC and MAC correspond to the OSI data link layer; the layers below them correspond to the physical layer.]

LLC: This layer specifies the mode of transport for GE: unacknowledged, connectionless, best-effort service. GE relies on higher layers to detect lost frames and deal with retransmission. The LLC simply forwards data packets to and from the network layer.

MAC: This layer performs all the CSMA/CD functions, such as checking whether the channel is free prior to a frame transmission, and detecting collisions. In most practical applications, GE is used in a contention-free setting in full-duplex mode, meaning the CSMA/CD functionality is disabled and frames may be sent and received simultaneously. The MAC layer receives outgoing packets from the network layer, formats them as GE frames, and sends them across the GMII. The layer is also responsible for checking the DA and FCS of frames coming in across the GMII. The data contained in non-corrupted frames intended for the user is forwarded to the network layer.

RS: This layer collects bits from the MAC, 8 at a time, for transmission across the GMII.

GMII: This is an 8-bit-wide transmit/receive data path, capable of simultaneously transferring one byte of data in each direction between the RS and the PCS on each cycle of a 125 MHz reference oscillator.

PCS: This layer encodes 8-bit sequences received from the GMII into 10-bit sequences using a block code specified by the GE standard. The two extra bits add redundancy to allow error correction at the receiver.
They also ensure that a sufficient number of transitions exist in the transmitted data to enable accurate clock recovery by the PMA layer at the receiver. Each 10-bit sequence is then sent in parallel to the PMA. The PCS layer performs the complementary operation on data originating from the PMA: the received 10-bit sequences are decoded to 8-bit sequences, and error correction is performed as necessary.

PMA: This layer serializes the data originating from the PCS. This serial data is then sent to the PMD layer, one bit per cycle of a 1.25 GHz reference clock. Clock recovery on the 1.25 Gb/s serial data received from the PMD is performed, and the data is retimed. Signal processing algorithms are applied to limit any inter-symbol interference that may be present. The data is then deserialized and passed on to the PCS layer.

PMD/MDI: The functions of these two layers are closely tied together. The electrical signal from the PMA is used by a driver to control a laser which transmits the data over the fiber link. Incoming optical signals are received by a photodetector which converts them back to electrical signals, which are forwarded to the PMA. The 802.3z standard includes specifications for a number of different types of fibers and lasers. 1000Base-SX specifies transmission over multi-mode fiber using an 850 nm laser; transmission distances are limited to 550 meters. 1000Base-LX includes a specification for transmission over single-mode fiber using a 1310 nm laser; transmission distances of up to 5 km are guaranteed by this configuration.

3.1.4 Available Gigabit Ethernet Hardware

A number of vendors make products which handle the tasks of the above layers. The functionality of the MAC layer is handled by a dedicated Gigabit Ethernet MAC controller chip. It receives data packets from the host user, which also specifies the DA and Length/Type fields. The unique MAC address associated with a user is typically stored in a ROM chip.
The controller adds this value as the SA. It also adds the Preamble and SFD. Finally, it calculates and adds the FCS, and sends out the frame across the GMII. The controller compares the DA of GE frames arriving across the GMII with the stored address; those that do not match are discarded. Those that do match are checked for integrity using the FCS field. Frames deemed non-corrupted have their FCS, Preamble, and SFD fields removed and are forwarded to the host user. Some controller chips keep track of the number of corrupted frames received in an on-chip register that can be accessed by the host. The interface between the MAC controller and the host is not specified by the 802.3 standard; it is vendor specific. Many commercially available controller chips have an interface designed to communicate across a Peripheral Component Interconnect (PCI) bus. The PCI bus is a ubiquitous technology used within computers, switches, and other systems to interconnect peripheral cards with system memory and the central processing unit (CPU). Access to the bus is mediated by a dedicated PCI controller chip. Two examples of GE MAC chips with a PCI interface are the Intel® 82543GC [13] and the National Semiconductor® DP83820 [22]. PCS and PMA tasks can be performed by a single integrated chip; two examples of such a product are the 88E1043 Alaska from Marvell® [20] and the L84700 from LSI Logic® [19]. PMD and MDI functionality is available in a single standardized module called a GBIC: a Gigabit Interface Converter. A number of vendors have GBICs available that adhere to the 1000Base-LX specification but are able to transmit over distances greater than 5 km. Cisco® claims that its ESR-GBIC-LH GBIC can transmit reliably over 10 km of 9-micron single-mode fiber [5]. Finisar® makes the same claim for its FTR-1319-3A model GBIC [8]. Some manufacturers, including Cisco and Finisar, are also making GBICs that adhere to a new standard called 1000Base-ZX.
It uses a 1550 nm laser over single-mode fiber and can reliably transmit over a distance of 70 km. This standard has not yet been officially recognized by the IEEE as part of the 802.3 specification.

3.2 Implementing the Reservation Protocol

This section describes how the logical functioning of the reservation protocol can be carried out using a combination of available GE hardware and other components.

3.2.1 Mapping Packets to GE Frames

Packets are sent within Ethernet frames. Each user has a GE MAC address, which serves as its User ID, as defined in Section 2.1. Since this address is in the SA field of every frame sent by a user, the AN can identify the source of received frames. The AN also has a GE address that appears as the SA in all frames it transmits. Because users receive all inbound frames from the AN, this SA serves no purpose and is ignored by the users; it is only included to maintain the proper GE frame format, so that the GE hardware at the user side can process the frame. Similarly, all frames transmitted by users are intended for the AN, but must still contain a DA. It is set to the broadcast pattern rather than the GE MAC address of the AN, so that users do not have to be provided with this address. The broadcast code ensures that the GE hardware at the AN does not discard the frames. Frames originating from the AN have either the intended recipient's GE MAC address in the DA field, or the broadcast code if the frame is a control frame intended for all users. The "flag indicating a data packet", defined in Section 2.2.3, is implemented by setting the Type field to indicate an encapsulated IP packet. The "flag indicating the type of control packet" from Section 2.2.4 is also implemented by the Type field. The IEEE standard maps a number of Type field bit patterns to various types of encapsulated packets, but many of the possible patterns remain unassigned.
A total of twelve different control packet types were identified in Sections 2.2.4.1 and 2.3.2.1. Each of these will be assigned a unique bit pattern in the Type field not defined in the 802.3 standard. While this is technically an abuse of the standard, it does not present any problems, as will become apparent from the subsequent hardware specification. With GE it takes 1 ns to transmit each bit, meaning that all scheduled transmissions take an integer multiple of this unit of time. Thus, in order to provide sufficient granularity, the clock of each user and that of the AN will have a resolution - the smallest unit of time tracked - of 1 ns. Explicit time values contained within the TP, the BD and BR fields of the RA (Section 2.6.1.1), and the TB field of the BM (Section 2.5.1) will be in the format hours:minutes:seconds:nanoseconds. The number of bits required for a time value of this format is:

⌈log2 24⌉ (hour field) + ⌈log2 60⌉ (minute field) + ⌈log2 60⌉ (second field) + ⌈log2 10^9⌉ (ns field) = 5 + 6 + 6 + 30 = 47 bits

One unused bit, set to 0, will be prepended to time values so that they are exactly 6 bytes in length. Since a GE frame must have a minimum data field of 368 bits (46 bytes), the TP frame will have 320 bits of padding appended to the time value. These are ignored when the user parses the frame and sets the clock. For RR frames (Section 2.3.2.1), the number of bits used to specify the number of data packets waiting to be sent will be limited to 16, corresponding to a maximum request of 65535 packets. The remaining 352 bits in the RR frame data field will be padding. RA frames will also use 16 bits to inform users of how many packets each may send. Thus, an RA frame requires 8 bytes for each reservation granted: 6 bytes indicating the user's MAC address, and 2 bytes indicating the number of data packets that may be sent. EP, PR, NU, and SR frames (Section 2.2.4.1), as well as NR and PM frames (Section 2.3.2.1), carry no data.
This means all 368 data bits will be padding.

3.2.2 User Hardware

Each user has a Network Interface Card (NIC) which transmits and receives data across the shared fiber. The goal in designing this card is to use as much existing GE technology as possible in order to keep costs low. One possible solution is to use an off-the-shelf GE card and carry out all the tasks related to the reservation protocol in software running on the host user's CPU. There are several problems with this. First, the additional burden on the user CPU will adversely affect the performance of other running processes. Second, the time required to carry out calculations related to the protocol may vary with the load placed on the CPU by the other processes, making it difficult to ensure frames are sent at exactly the allocated times. Finally, none of the available NICs surveyed offered more than one megabyte of buffer space, which means that most IP packets generated by the Network layer would have to be buffered in the user's system memory. Given these drawbacks to a software-based reservation MAC, the solution is to design a new NIC which handles the reservation protocol processing in hardware. The following high-level design is proposed: A 1000Base-LX GBIC from one of the manufacturers guaranteeing a transmission range of 10 km is used. Although a 1000Base-ZX GBIC would provide a greater range (70 km), these modules have only recently been released to market and tend to cost on the order of ten times as much as the LX ones. Since no user is more than 10 km from the AN, a 10 km LX GBIC is sufficient. The GBIC is connected to a conventional PCS/PMA chip, which performs its functions on all inbound serial data from the GBIC and on outgoing parallel data destined for the GBIC. This chip requires an external 125 MHz oscillator, whose signal it multiplies by ten to get a 1.25 GHz reference. This oscillator will be included on the NIC.
The PCS/PMA connects to a GE MAC controller chip across a GMII interface. This interface requires a 125 MHz reference signal, which is provided by the same on-card oscillator utilized by the PCS/PMA chip. The host interface of the controller is PCI compatible, to facilitate simple system integration. The controller operates in full-duplex mode (no CSMA/CD), and its flow-control functionality is disabled. Its functionality is limited to formatting outgoing Ethernet frames and processing inbound ones. The GE MAC chip will henceforth be referred to as the GE Framer, since this is a more accurate description of its role. The next module represents a major divergence from an off-the-shelf NIC: the output of the GE Framer is not in fact connected to the host user, but to a custom piece of hardware which will be referred to as the Protocol Processing Unit (PPU). The PPU receives all packets originating from the GE Framer and identifies them by reading their Type field. Its task is to carry out all the functions specifically related to the reservation protocol. It includes a CPU, firmware stored on a ROM chip (non-volatile memory that cannot be modified), Flash memory (non-volatile memory which may be written to), RAM, and a PCI interface along with a PCI controller chip. The CPU executes the firmware stored in the ROM. This code provides the instructions for carrying out the reservation protocol. The ROM also contains the values of TU, ΔTU, ΔTB, UR, and G (as defined in Sections 2.5.1 through 2.5.2.2), all of which would have been determined during the design and testing stage of the NIC and AN. Furthermore, it contains the values of X, the time to send a data frame, and Y, the time to send an RR frame. The PPU also includes the clock described in Section 2.5. This clock uses the same 125 MHz oscillator previously mentioned as a frequency reference. It multiplies the frequency by a factor of 8 to achieve a 1 GHz signal, and advances 1 ns on every cycle.
The clock is accessible by the CPU: the CPU is able to read the present time of the clock, and may set the clock by providing it with a time value. The clock also contains two registers, each of which can store a time value; the CPU has write access to these registers. When the clock's present time matches the value stored within one of the registers, the clock sends an interrupt signal to the CPU. The PPU is connected to the host user through the PPU's PCI bus. All IP packets generated by the user arrive at the PPU over the bus and are stored in the RAM. They do not have to be processed by the CPU before being stored, so they are written directly to memory. This is accomplished by utilizing the PCI Direct Memory Access (DMA) feature, which allows incoming data to bypass the CPU. A high-level view of the user NIC is given in Figure 3.2.

[Fig 3.2: User NIC - the optical signal enters the GBIC, which connects over a serial link to the PCS/PMA chip, which connects over the GMII to the GE Framer; the Framer connects over the PCI bus to the PPU (CPU, RAM, ROM, Flash memory, clock, and PCI controller), which in turn provides the interface to the host user for IP packets.]

The following is a list of scenarios encountered by the PPU CPU, and how it responds in each:

The CPU receives an RA packet: It parses the packet and calculates the time when it is to initiate sending its data packets, the number of packets it may send (if any), and the time it is to initiate sending an RR packet. First it calculates Ti and Pj as discussed in Sections 2.6.1.2 and 2.6.2.1 (assuming it is the ith user allowed to transmit packets and is assigned the jth reservation slot). These are the times when the appropriate frames should begin being transmitted across the fiber. However, there is some processing delay between the time the CPU initiates the transmission of frames and when they are actually sent. These delays include the time to generate the packet, have it pass across the PCI bus, and be processed by the GE Framer, the PCS/PMA chip, and the GBIC.
The total delay values for IP and RR transmission must be known by the CPU. They are measured during the design and testing of the NIC, and stored in its ROM (just like the other measured quantities, such as TU). Using Ti, Pj, and the stored delays, the CPU calculates the times when it must initiate transmission of stored IP packets and of the next RR packet. It writes these two values to the clock registers, and stores the number of IP packets to be sent within its cache.

The CPU receives a TP: It adds TU and Di to the value carried by the packet, and sets the clock to the calculated value, as described in Section 2.5.1.

The CPU receives an EP: A flag within the processor is set to indicate that it has entered Synchronization Mode, as discussed in Section 2.5.6 (unless this flag was already set, in which case the EP indicates the end of SM, and the flag is cleared). Next, the processor sends a flow-control signal to the user to inform the higher-layer applications to cease generating IP packets. Finally, it clears any time values stored in the clock registers. It ignores future RA packets until it receives another EP.

The CPU receives a BM packet: The processor parses the packet, extracts the value of TB, and stores it in its cache (Section 2.5.1). If the static TDM protocol is running (Section 2.6.1), the processor also extracts its slot assignment data and stores it in the Flash memory. Next, the processor generates a PM packet and sends it to the GE Framer across the PCI bus. It records the clock value at the moment the PM packet's transmission is complete.

The CPU receives a PR packet: It reads the time of the clock when the packet is received into its cache. The earlier recorded time (when the PM was sent) is then subtracted from the arrival time of the PR packet.
Using this delay, the processor calculates its propagation delay to the AN, Di, which it stores in its Flash memory (Section 2.5.1).

The CPU receives an interrupt from the clock informing it that it is to initiate transmission of stored IP packets: The processor enables the specified number of data packets to be sent from RAM across the PCI bus to the GE Framer.

The CPU receives an interrupt from the clock informing it that it is to initiate transmission of an RR packet: The processor reads from the RAM how many data packets are waiting to be sent. Using this information, it creates an RR packet, which it sends across the PCI bus to the GE Framer.

The CPU receives an IP data packet: It forwards it across the PCI bus to the host user.

The CPU receives an SD packet: It stores the specified slot assignment in Flash memory.

The CPU receives an NU packet: It checks a flag in Flash memory which indicates whether the user has previously registered. If the flag indicates that the user has not registered, the CPU immediately initiates the transmission of an NR packet and sets a flag indicating that an NR has been sent.

The CPU receives an RC packet: It checks the flag indicating whether an NR has been sent. If so, the CPU randomly chooses whether to initiate the transmission of another NR, based on the probability value carried in the RC packet.

The CPU receives an SR packet: It sets a flag in Flash memory to indicate that the user has successfully registered. It also clears the flag indicating that it had sent an NR packet.

3.2.3 Access Node Hardware

The AN includes the same hardware as the user NICs. Upstream frames - those originating from users - go through the same cascade of hardware: a GBIC, then a PCS/PMA chip, followed by a GE Framer. The inbound packets are then fed into a module which will be referred to as the Access Node Scheduler (ANS). It includes the same hardware as the PPU on the NICs, but the firmware being executed is different.
Further, the stored values within it are a, UF, TA, ΔTA, TB, ΔTB, and ΔTu. The Flash memory contains the MAC addresses of all the users that have registered (Section 2.4). The ANS examines the Type Field of all upstream packets and takes appropriate action. The ANS is also connected to the IP router. The AN hardware is accurately represented by Figure 3.2, after the following substitutions are made: "Interface to host user" becomes "Interface to IP Router", and "PPU" becomes "ANS". During normal operation (i.e. not during SM or registration), the ANS will periodically send two types of control packets - RR and TP ones. Once every a seconds, the ANS reads the value of its clock, forms a TP packet as discussed in Section 2.5.1, and sends it to the GE Framer. The RA packet is sent after the ANS has received all RR packets during a reservation phase, as described in Section 2.6.1.1. During the times that it sends these packets to the GE Framer, IP packets from the router coming into the ANS must be stored in RAM. As soon as the control packets have been sent, the stored IP packets are sent. IP packets that arrive while the ANS is neither sending a control packet nor a buffered IP packet are sent directly across the PCI bus to the GE Framer. Thus, a priority scheme exists for downstream packets: control packets, followed by buffered IP packets, followed by newly arrived IP packets. The CPU, in conjunction with the PCI controller chip, enforces this priority system. Inbound IP packets, once identified by the CPU, are sent to the IP router via the PCI bus. Under normal operation, the only type of control packet received by the ANS is the RR. If the Aloha contention reservation scheme is used, some RR frames may collide. Data arriving at the GE Framer that corresponds to two or more RR frames involved in a collision will be deemed corrupted after the Framer performs the CRC check - meaning it is not forwarded to the ANS.
Thus, under all conditions the ANS only has to deal with valid RR packets. The ANS CPU identifies the SA and the number of packets requested in each RR. If an unbounded scheduling algorithm is running, it simply stores these two values in RAM. In the bounded scheme (Section 2.7), it compares the number of packets requested in each RR to the bound. If the request is less than the bound, it stores the request; otherwise, it substitutes the bound. Once all RR packets have been received and processed, the CPU calculates the BD and BR values (Sections 2.6.1.1 and 2.6.2.1). If it is running a contention reservation scheme, it will also calculate Nc. Finally, using all these values, it generates and sends the RA packet. As just mentioned, the CPU waits until all RR packets have been received before generating and sending the RA packet. Since not all users send RR packets, it cannot simply wait until it receives a certain number of RR packets. Instead, it waits until the time allocated to the sending of RR packets has come to an end. This occurs at BR + N(g + Y) for a TDM reservation scheme, or at BR + Nc(g + Y) for a contention-based one. The appropriate time value is calculated after the RA is sent, and stored in a clock register. Once the CPU receives the subsequent interrupt from the clock, it knows that it has received all requests and can generate the next RA packet. As mentioned in Section 2.5.6, a high rate of upstream data packet collisions causes the AN to deduce that a loss of synchronization has occurred, and to enter SM mode. The GE Framer chip used on the NIC will keep track of all corrupted packets. Periodically, the CPU will poll the Framer in order to calculate the rate of corrupted packets. If the rate exceeds a pre-defined limit, the ANS will enter SM mode.
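The per-RR bookkeeping and the end-of-window timing just described can be sketched as follows. The helper names are illustrative assumptions; the arithmetic (BR + N(g + Y), and clamping a request to the bound) follows the text above.

```python
import math

# Sketch of the ANS reservation-phase bookkeeping (names are
# illustrative assumptions, not from the thesis).
def rr_window_end(br_s: float, n_slots: int, g_s: float, y_s: float) -> float:
    """End of the RR window: BR + N*(g + Y); use Nc in place of N for
    the contention-based scheme."""
    return br_s + n_slots * (g_s + y_s)

def record_request(requests: dict, sa: bytes, want: int, bound) -> None:
    """Bounded scheme: store min(request, bound); unbounded if bound is None."""
    requests[sa] = want if bound is None or want <= bound else bound

# 500 RR slots of 576 ns each, guard time neglected:
end = rr_window_end(0.0, 500, 0.0, 576e-9)
assert math.isclose(end, 2.88e-4)
```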
If a contention-reservation scheme is being used, it is important for the ANS to discriminate between corrupt RR packets - which are expected to appear during normal operation - and corrupt data packets, which indicate a loss of synchronization. In order to make this distinction, the ANS will clear the Framer's record of corrupted packets after each reservation phase, and poll it after each data phase. During SM, the ANS ignores any RR packets it receives from users. First, it sends out an EP packet. It then goes through the list of all users stored in Flash memory, and for each one it generates a BM packet, sends it to the GE Framer, and waits to receive a PM packet from the Framer. Once it receives the PM packet, it immediately sends out a PR packet and then sends the BM packet addressed to the next user. Once it has gone through all the users, it sends out another EP packet. Next, it sends out an RA packet that informs users of the next BR time.

3.2.3.1 Access Node Scheduler Hardware: A Closer Examination

The ANS must be able to handle the requirements of the reservation protocol. This section will quantify these requirements and suggest specific hardware that may be used to meet them.

PCI Bus

Data from the following sources may be moving across the bus at any time:

* IP packets from the IP router, destined for RAM or the GE Framer. The limit on the instantaneous rate at which these packets may arrive is determined by the maximum output rate of the IP router. Although the AN can only send IP packets at a maximum average rate of 1 Gb/s, it is possible that the router may supply packets in bursts that exceed this rate for short durations of time.
* Stored IP packets moving from RAM to the GE Framer. These may be transmitted at up to 1 Gb/s in order to take full advantage of the capacity of the GE link.
* User packets arriving from the GE Framer, destined for the CPU. These may arrive over the GE link at a rate of up to 1 Gb/s.
It is possible that all three of the above transactions occur simultaneously. The PCI bus found in typical PCs runs at 33 MHz, with 32 bits of data transferred in parallel per cycle [36]. This results in a total throughput of just over 1 Gb/s, which is not enough for the above requirements. However, a faster type of PCI bus exists which runs at 66 MHz and has 64 parallel data paths. Its total data rate is 4.2 Gb/s, which in theory allows maximum IP router bursts of 2.2 Gb/s. In reality, some of the PCI capacity is used for signaling overhead, somewhat lowering this value.

RAM: The system memory must be able to provide data at a rate of up to 1 Gb/s in order to fully utilize the GE link. It must also be able to write data at the burst rate of the router. Synchronous Dynamic RAM (SDRAM) is commonly used in modern PCs. Reads and writes occur at a rate of 64 bits per cycle, with the clock speed determined by the Front Side Bus (FSB), which connects the CPU to the RAM. A standard speed of 100 MHz is commonly used, which results in a maximum throughput of 6.4 Gb/s. Although newer-generation memory such as Double Data Rate (DDR) SDRAM provides a higher throughput, SDRAM is sufficient for the requirements of the ANS and less expensive than the newer memory.

CPU: The delay performance of the reservation protocol is affected by how quickly the ANS performs its scheduling algorithm after receiving all RR packets and sends out an RA packet. In order to estimate the effect of CPU speed on this operation, the operations that must be performed by the CPU will be closely examined.
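As a quick numeric check of the bus and memory rates quoted above (raw clock-times-width products, ignoring signaling overhead):

```python
# Back-of-envelope check of the quoted bus/memory rates.
def throughput_gbps(clock_mhz: float, width_bits: int) -> float:
    return clock_mhz * 1e6 * width_bits / 1e9

assert round(throughput_gbps(33, 32), 3) == 1.056   # standard PCI
assert round(throughput_gbps(66, 64), 3) == 4.224   # 66 MHz / 64-bit PCI
assert round(throughput_gbps(100, 64), 1) == 6.4    # 100 MHz FSB SDRAM
```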
For each RR packet that arrives and is stored in RAM, the CPU must:

1) read the data from memory
2) examine the Type Field to verify that it is an RR packet
3) if running a bounded reservation scheme: compare the number of packets requested to the bound, and if the bound is smaller, replace the request with the bound
4) write the SA address and the number of allocated data packets back to memory
5) add the number of allocated packets to a running total stored in the CPU cache; this total is necessary to later calculate BR

The RR packets arrive from the GE Framer and are stored in RAM in the following format:

DA: 6-byte user MAC address
SA: 6-byte broadcast code
Type Field: 2 bytes indicating an RR
Request: 2 bytes indicating the number of data packets waiting to be sent
Padding: 44 bytes to meet the 46-byte minimum data length

Step 1 requires the CPU to read only the first 16 bytes from memory, since the padding is irrelevant. When performing a read from SDRAM, there is some initial latency: the first 4-byte segment of data typically takes 5 cycles of the FSB to receive, and subsequent 4-byte segments arrive on every cycle [2]. Thus, it takes 8 total FSB cycles to complete Step 1. Steps 2 and 3 each require the comparison of two 2-byte values. Step 3 also requires an exchange operation if the request exceeds the bound. Step 4 requires writing 8 bytes to memory; using the same approach as for the memory-read analysis, this requires 6 cycles. Step 5 requires adding a 2-byte quantity to a value stored in cache. Steps 1 and 4 together take 14 FSB cycles. Since the FSB is assumed to be running at 100 MHz, the total time required is 140 ns. In order to approximate the time needed for the three remaining operations, it is necessary to consider the instruction set of the CPU used. The type of CPU that will be considered is one using the "80x86" instruction set [31, 32].
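The five per-RR steps above can be sketched as follows. The 16 meaningful header bytes follow the layout listed (DA 6 B, SA 6 B, Type 2 B, Request 2 B), but the numeric value of the RR type code is a placeholder assumption, not a value from the thesis.

```python
import struct

RR_TYPE = 0x0003  # hypothetical Type Field value for an RR (assumption)

def process_rr(header16: bytes, bound, table: dict) -> None:
    da, sa, ptype, request = struct.unpack("!6s6sHH", header16)  # step 1
    if ptype != RR_TYPE:                                         # step 2
        return
    if bound is not None and request > bound:                    # step 3
        request = bound
    table[sa] = request                                          # step 4
    table["total"] = table.get("total", 0) + request             # step 5 (for BR)

hdr = struct.pack("!6s6sHH", b"\x01" * 6, b"\x02" * 6, RR_TYPE, 10)
table = {}
process_rr(hdr, 4, table)   # bounded scheme with bound = 4
```

The instruction-level cost of these steps on an 80x86-class processor is estimated next.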
This variety of processor is readily available and integrates easily with the SDRAM and PCI bus. Two examples are the Athlon from AMD® and the Pentium from Intel®. The 80x86 instruction set includes the "CMP" instruction, which is appropriate for carrying out Step 2. It compares two 2-byte operands and sets a flag indicating whether they are equal. The value of the first operand, the Type Code indicating an RR packet, is contained in the code being executed by the CPU. The second operand, the Type Field value from the received packet, is stored in a CPU register after the memory read. Since both values are immediately available to the CPU, it is able to perform the CMP instruction in one CPU cycle. The CMP instruction is also used to carry out the first part of Step 3. Again, both operands are immediately available to the CPU, so it requires one processor cycle. If the request exceeds the bound and the exchange is necessary, this operation can be performed using the "XCHG" instruction, which requires 3 CPU cycles. The "ADD" instruction is appropriate for Step 5; it requires 3 CPU cycles. The total number of cycles required by the CPU to perform the above steps is at most 8. The time per cycle depends on the speed at which the CPU is running. The slowest 80x86 processors generally available at the present time run at 200 MHz. Such processors can complete all the instructions in 40 ns. Adding the 140 ns for memory access, the estimate of the total time required to process an RR packet is at most 180 ns. The CPU does not have to wait until all RR packets have been received and stored in memory before it starts performing its calculations; it may process each one as soon as it arrives. The time required to send an RR packet is:

72 bytes / (125·10^6 bytes/s) = 576 ns (3.1)

This means that the time between consecutive RR packets arriving in the RAM is at least 576 ns (in fact it will be longer, due to the guard time between transmissions).
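Putting the cycle counts above together (a back-of-envelope check, not a cycle-accurate model):

```python
# Per-RR timing budget from the estimates above.
FSB_NS = 10.0    # 100 MHz front-side bus -> 10 ns per cycle
CPU_NS = 5.0     # 200 MHz low-end 80x86 -> 5 ns per cycle

memory_ns = (8 + 6) * FSB_NS             # steps 1 and 4: 140 ns
compute_ns = (1 + 1 + 3 + 3) * CPU_NS    # CMP, CMP, XCHG, ADD: 40 ns
per_rr_ns = memory_ns + compute_ns       # 180 ns per RR, worst case

rr_wire_ns = 72 * 8 / 1.0                # 72 bytes at 1 Gb/s = 1 bit/ns: 576 ns
assert per_rr_ns < rr_wire_ns            # RRs can be processed in "real time"
```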
Since only approximately 180 ns are needed to process each RR packet, the CPU is able to handle them in "real time". The only task remaining for the CPU after all RR packets are received is to calculate BD and BR, as described in Section 2.6.1.1. Calculating BD requires adding a fixed amount of time to the present value of the AN clock and writing the result to memory. The ADD instruction requires the operands to be integers of 4 bytes or less. Since one of the operands, the present time value, is 6 bytes, ADD cannot be used. Instead, the operands must be represented as 8-byte double-precision floating-point variables. These variables have a precision of 53 bits (with the remaining 11 bits representing the exponent, which is not relevant in this case), meaning the time values can be represented as floating-point variables with no loss of precision. The floating-point addition instruction "FADD" can then perform this addition; it requires 3 cycles. The memory write requires 6 FSB cycles. BR is calculated by multiplying the total number of data packets to be sent by X and adding the result to BD. The multiplication can be carried out as a floating-point multiplication using the "FMUL" instruction, which requires 3 cycles. Next, the FADD instruction is executed. Finally, a memory write is performed, which again requires 6 FSB cycles.

The total estimated processing time required for these operations is:

12 FSB cycles · 10 ns/FSB cycle + 9 CPU cycles · 5 ns/CPU cycle = 165 ns

As will be shown shortly, this value is negligible compared to the other factors that comprise one reservation phase. Thus, even a low-end 200 MHz x86 CPU provides sufficient processing speed to have essentially no effect on the reservation phase duration.

Oscillator

Recall that the size of the required guard time between transmissions is given by:

g ≜ 2(ΔTB + ΔTu + a·UR) (3.2)

The value of UR depends on the instability of the oscillator used. The instability rating, in ppm, is equivalent to UR in µs/s.
Since the channel sits idle during the guard time, it is desirable to minimize g - and hence the instability rating of the oscillator. A survey of commercially available oscillators was conducted. The main source of their instability is variation in the temperature of the environment. A type of oscillator that addresses this problem is the Temperature Compensated Crystal Oscillator (TCXO). As suggested by the name, TCXOs include a circuit that compensates for changes in temperature in order to keep the frequency from being (significantly) affected. They provide much better stability than oscillators that do not adapt to temperature changes. TCXOs that operate at 125 MHz with 1 ppm instability over an operating range from 0 to 50°C are available from a number of vendors. Two specific models are the Vectron® TC210 (Z5) [37] and the Corning® 956WHAB [7]. None of the surveyed oscillators lacking temperature compensation offered an instability rating of less than 20 ppm. One of the two TCXOs mentioned (or a comparable one from another manufacturer) should be used on the NIC.

ANS/PPU Integration

Rather than using a custom design to combine all the ANS/PPU hardware, it is desirable to use an existing integration. This approach should reduce design and manufacturing costs. So-called Single Board Computers (SBCs), which compactly combine all the required ANS hardware with the exception of the clock, are commercially available. However, an SBC typically includes a significant amount of additional hardware not required by the ANS/PPU. Intel offers a line of networking hardware that may be used for an alternative implementation of the ANS/PPU. It includes the IXP1200 network processor [11], which is optimized to perform packet processing and network control tasks. It has a "StrongARM" core, which uses a different instruction set than the 80x86 processors discussed above. The IXP1200 may be combined with the IXM1200 Network Processor Base Card [12].
This card includes additional features needed for the ANS/PPU, including SDRAM, Flash memory, and a PCI interface. As with the SBC, this approach requires only the clock to be externally implemented. The combination of the IXP1200 processor and the IXM1200 base card appears to be a good choice for implementing the ANS/PPU. A version of the processor is available that runs at 200 MHz, so its performance may be comparable to the low-end 80x86 processors. However, specific instruction-set documentation, including cycle counts, could not be obtained.

4. Performance Evaluation

In this section the delay performance of both the TDM and Aloha-contention reservation protocols will be calculated, based on the protocol specification of Section 2 and the hardware suggested in Section 3.

4.1 TDM Reservation

In order to evaluate the average delay of the MAC protocol, the length of the reservation phase interval must be calculated. As illustrated in Figure 2.6, it consists of the following components:

* The time during which all N users sharing the distribution network may transmit their RR frames
* The processing delay at the AN
* The time required by the AN to transmit an RA frame
* The delay between the time the RA is sent and the time the data from the first user allowed to transmit arrives

The reservation interval begins immediately after the last frame of the previous data phase has been sent. Each of the N users sharing the distribution network is statically assigned one slot in which to send an RR frame. The length of each slot is the time required to send an RR frame, TRR, which was calculated in equation (3.1) as 576 ns. Thus, the total time dedicated to these slots is:

576N ns (4.1)

A guard time of length g seconds is required between each slot. As shown in (3.2), the oscillator instability UR and the time between successive transmissions of the TP, a, appear as a product.
Thus, the effect that UR has on g can be made arbitrarily small by choosing a small enough value of a. A large number of TP packets may be sent down the channel each second before the downstream capacity remaining for sending IP packets is significantly lowered. For example, since each TP frame is 72 bytes in length and the downstream channel can send 125 Mbytes each second, sending 1000 TP frames per second takes up the following percentage of the total capacity:

(72 bytes · 1000/s) / (125 Mbytes/s) = 0.0576%

If a TCXO with 1 ppm accuracy is used, sending 1000 TPs a second - corresponding to an a of 1 ms - would reduce the guard-time contribution to

a·UR = (1 µs/s)(1 ms) = 1 ns

This value is negligible compared to TRR. The sizes of ΔTu and ΔTB, the other values affecting the size of g, depend on how accurately the processing delays at the AN and user can be measured, and on how much these delays may vary during operation of the network. This was discussed in Section 2.5.2.1. Test equipment, such as an oscilloscope, is available with sample rates beyond 1 GHz, meaning it is possible to measure Tu and TB with an uncertainty of less than 1 ns. TB is composed of the time required by the AN to process an inbound frame and to generate and transmit a new one. The transmission time is fixed by the GE downstream link. Tu is composed of only a processing delay. The processing-time contribution to TB and Tu consists of a certain number of CPU cycles. The number of cycles required is not expected to change over time, since the CPU can always expend its full processing power to perform the required operation. Thus, the only source of variation in Tu and TB is instability of the oscillator generating the CPU cycles.
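The drift and overhead figures above can be checked numerically. This is a small sketch; the helper names and unit conventions (UR in s/s, a in seconds) are assumptions:

```python
# Numeric check of the TP overhead and the a*U_R guard-time term.
def guard_drift_s(a_s: float, ur_s_per_s: float) -> float:
    return a_s * ur_s_per_s            # the a*U_R term inside g

def tp_overhead(tp_per_s: float, tp_bytes: int = 72,
                link_bytes_per_s: float = 125e6) -> float:
    return tp_per_s * tp_bytes / link_bytes_per_s

# 1 ppm TCXO (U_R = 1e-6 s/s) with a TP every 1 ms contributes 1 ns,
# and 1000 TPs/s cost 0.0576% of the downstream capacity:
drift_ns = guard_drift_s(1e-3, 1e-6) * 1e9
overhead_pct = tp_overhead(1000) * 100
```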
Presumably, if an oscillator with a low instability rating is used - such as the TCXO used to drive the clock - the values of ΔTu and ΔTB will become so small that they too are negligible compared to TRR. Given the above discussion, the entire guard time g will be assumed negligible, and it will not be included in subsequent calculations of the reservation phase. As discussed in Section 3.2.1, the RA packet consists of 38 fixed bytes (26 bytes for GE framing and 12 bytes to represent the two explicit time values BD and BR) and 8 bytes for each of the U users that are allowed to transmit after reserving during the previous contention period. Thus, the time to transmit the RA frame is:

(38 + 8U) bytes / (125 Mbytes/s) = (304 + 64U) ns (4.2)

From Section 2.6.1.1, the AN schedules the first user in the RA frame to begin transmitting at the appropriate time so that its data frames begin to arrive after twice the maximum propagation delay between any user and the AN. This means that after the RA is sent, the first data from this user arrives after:

2(10 km) / (2·10^5 km/s) = 10^5 ns (4.3)

The total length of the reservation phase is given by the sum of (4.1), (4.2), and (4.3):

R_TDM = (100304 + 576N + 64U) ns (4.4)

The assertion made in Section 3.2.3.1, that the processing delay of the CPU is negligible, is justified by (4.3). In fact, the 304 ns fixed time for the RA packet could also be ignored. The expected value of U may be calculated using (1.33). Substituting (4.4) and X = 1526 bytes / (125 Mbytes/s) ≈ 12.2 µs into (1.33) yields:

U_TDM = A(1 - e^(-p·(100304 + 576N + 64·U_TDM)·10^-9 / (A·X·(1-p)))) (4.5)

This equation does not have a closed-form solution for U_TDM, but may be numerically solved for specific values of N, p, and A. Let R̄TDM denote the expected reservation interval length, obtained after solving for U_TDM and substituting it into the expression for R_TDM.
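The fixed point of (4.5) is easily found by simple iteration. The sketch below assumes the reconstructed form of (4.5) shown above, so it should be checked against the derivation of (1.33) in Section 1:

```python
import math

XBAR_S = 1526 / 125e6   # mean packet transmission time, ~12.2 us

def solve_u_tdm(n: int, a: int, p: float, iters: int = 100) -> float:
    """Iterate U <- A*(1 - exp(-p*R(U)/(A*X*(1-p)))) until it settles."""
    u = 0.0
    for _ in range(iters):
        r = (100304 + 576 * n + 64 * u) * 1e-9   # R_TDM from (4.4), in s
        u = a * (1.0 - math.exp(-p * r / (a * XBAR_S * (1.0 - p))))
    return u

u = solve_u_tdm(500, 50, 0.5)   # the N=500, A=50 case at load p = 0.5
assert 0.0 < u < 50.0           # at most A users can be active
```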
Using equation (1.18), the average delay for the implementation of the unbounded TDM reservation protocol specified in Sections 2 and 3 is given by:

W_TDM Reservation Implementation, Unbounded = p·X/(2(1-p)) + R̄TDM·(3-p)/(2(1-p)) + X (4.6)

Recall from Section 1.7.1 the assumption that A ≈ 0.1N. In light of it, the following three cases will be plotted: A=50, N=500; A=100, N=1000; A=200, N=2000.

Figure 4.1: TDM-Reservation Delay (average delay vs. load p for the three cases)

4.2 Aloha-Contention

This discussion follows the notation of Section 1.5.4. The fixed time Tf consists of (4.2) and (4.3). Ts is the time required to send an RR, given by (3.1). Thus,

Tf = (100304 + 64U) ns
Ts = 576 ns

Equations (1.31) and (1.20) may now be used as described in Section 1.5.4 to find an approximation to the optimal number of slots for given values of A and p, and the corresponding average delay. The following is a plot of the optimal number of slots, according to the approximations of Section 1.5.4, for A = 50, 100, and 200.

Figure 4.2: Optimal Number of Contention Slots for A=50, 100, and 200 Active Users

As previously mentioned, the optimal average delays derived using equations (1.31) and (1.20) are only approximations, and a simulation should be conducted to gauge their accuracy. To this end, code for a simulation was written using the Maple 7 mathematics package. Packets are modeled as arriving according to a Poisson process, by generating inter-arrival times according to an exponential distribution. The parameters A and p can be freely varied. The code used for the simulation is provided in Appendix A. The following three plots were generated for values of A = 50, 100, and 200.
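The core of that simulation can be sketched compactly (in Python rather than the Maple of Appendix A). The function names are illustrative; the two mechanisms - exponential inter-arrival times for Poisson traffic, and success only in contention slots chosen by exactly one user - follow the description above:

```python
import random

def poisson_arrivals(rate_per_s: float, horizon_s: float,
                     rng: random.Random) -> list:
    """Arrival times of a Poisson process via exponential inter-arrivals."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= horizon_s:
            return times
        times.append(t)

def contention_outcome(contenders: list, slots: int,
                       rng: random.Random) -> list:
    """Users whose randomly chosen RR slot was not shared (no collision)."""
    choice = {u: rng.randrange(slots) for u in contenders}
    return [u for u, s in choice.items()
            if sum(1 for v in choice.values() if v == s) == 1]

rng = random.Random(7)
winners = contention_outcome(list(range(100)), 50, rng)
# winners: the subset of 100 contending users whose RRs did not collide
```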
For each value of p considered, the simulation was run, and equations (1.31) and (1.20) were numerically optimized (also using Maple). Ten values of p were evaluated for each plot: 0.1, 0.2, ..., 0.8, 0.9, and 0.99. The circles are the simulation values, while the crosses are the optimizations of (1.31) and (1.20). For comparison, the corresponding TDM reservation plots - for A = 0.1N - from Section 4.1 are included. These are represented by diamonds. There is no significance to the fact that the TDM plots have more data points.

Figure 4.3: 50 Active Users - Average Delay for Contention Reservation According to the Optimization Approximation (Eqns. 1.31 and 1.20) and Simulation Results; TDM Reservation also Shown

Figure 4.4: 100 Active Users - Average Delay for Contention Reservation According to the Optimization Approximation (Eqns. 1.31 and 1.20) and Simulation Results; TDM Reservation also Shown

Figure 4.5: 200 Active Users - Average Delay for Contention Reservation According to the Optimization Approximation [(1.31) and (1.20)] and Simulation Results; TDM Reservation also Shown

These results are encouraging, since the simulations track the performance of the approximate optimization quite closely. This indicates that replacing the random variables by their expected values in the derivation of equation (1.31) - under the assumption of equal-rate Poisson packet generation among all active users - resulted in a good estimate of delay performance. These plots also show that the contention-based reservation scheme performs significantly better than static TDM, especially at low loading.
This is a direct result of the assumption that A = 0.1N, which means that 90% of the TDM reservation slots are wasted.

4.3 Effect of Bounds

The formulas derived in Section 1 - (1.19) and (1.21) - that predict the performance of placing bounds on the number of packets each user sends give only upper limits on the average delay. In order to get a better understanding of the behavior, simulations were conducted. The results indicated that bounds have very little effect on the delay, and that the average delay is in fact close to its lower limit, provided by the unbounded case. Only when the load came very close to the maximum throughput supported by the bound - given by (1.15) - were increases in delay noticeable. The reason behind this behavior is the relatively smooth arrival characteristics of Poisson traffic: large bursts of data that significantly exceed the imposed bounds are simply not observed. Bounds are only significant when more realistic traffic models are used, as will be addressed in the next section.

5. Discussion

5.1 Limitations

The most significant weakness of the analysis done in this thesis is the assumption that packets arrive according to a Poisson process. While this model allows existing analytical results from queuing theory to be applied, it does not accurately represent the type of traffic seen in actual LANs. Observations of network traffic conducted during the last decade indicate that it exhibits self-similarity [9]. This property means that the arrival characteristics of new packets are similar over different time scales - that burstiness doesn't "average out". Figure 5.1 exhibits the difference between actual, self-similar traffic and Poisson traffic. The vertical axis is the instantaneous arrival rate of new packets, and the horizontal one is time. Specific units are not provided for the axes, since the figure is only intended to illustrate a high-level view of the general behavior.
From top to bottom, the time scale becomes increasingly coarse. The highlighted portion on a graph corresponds to the entire time scale of the graph above it. The Poisson model exhibits bursty arrival characteristics on the shorter timescales, but becomes increasingly smooth as it is averaged out over longer time periods. The observed network traffic exhibits self-similarity by remaining bursty over all the time scales. The self-similarity property is well modeled by packet arrivals that occur in bursts, where the number of packets in each burst follows a heavy-tailed distribution [9]. A heavy-tailed distribution has a finite mean but an infinite variance. An example is the Pareto distribution, defined by the following complementary distribution function:

P(X > x) = (k/x)^α, x ≥ k

(with shape parameter 1 < α ≤ 2, yielding a finite mean but infinite variance). If the reservation protocol is implemented in an actual access network, it must be able to efficiently handle the occasional large bursts of data generated by users. In this case, the impact of bounds is much more significant than for the Poisson model. If no bound exists, then a user generating a large burst will be allowed to send all the data during the next data phase. While this represents an efficient use of the channel, it will cause large delays for the other users. On the other hand, if a bound is used, care must be taken not to make it too short. The shorter the bound, the more total data phases are required to completely transmit all the packets from a single large burst. Each additional data phase required adds more delay, due to the associated reservation phase. It seems overly simplistic to address the problem of a heavy-tailed burst distribution with a fixed bound. Instead, the AN may dynamically change the bound based on all the requests received during a reservation phase. For instance, if the only request received is from a user wishing to send a large transaction, the AN may allocate more packets to be sent than if it received additional requests from other users.
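The dynamic-bound idea can be sketched as follows. Both pieces are illustrative assumptions rather than the thesis's algorithm: the Pareto sampler uses standard inverse-CDF sampling of P(X > x) = (k/x)^α, and the grant policy simply gives a lone requester the whole data phase while splitting it evenly among multiple requesters.

```python
import random

def pareto_burst(rng: random.Random, alpha: float = 1.5, k: float = 1.0) -> int:
    """Sample a heavy-tailed burst size: P(X > x) = (k/x)^alpha."""
    u = 1.0 - rng.random()               # uniform in (0, 1]
    return int(k / u ** (1.0 / alpha))   # inverse-CDF sampling

def dynamic_grants(requests: dict, phase_capacity: int) -> dict:
    """Hypothetical AN policy: a lone large requester gets the whole data
    phase; otherwise the phase is split evenly so one burst cannot
    starve the other users."""
    if len(requests) == 1:
        (user, want), = requests.items()
        return {user: min(want, phase_capacity)}
    share = phase_capacity // len(requests)
    return {u: min(want, share) for u, want in requests.items()}

assert dynamic_grants({"a": 5000}, 1000) == {"a": 1000}
assert dynamic_grants({"a": 5000, "b": 10}, 1000) == {"a": 500, "b": 10}
```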
In order to establish an efficient scheduling algorithm, simulations will likely have to be conducted; few analytical results exist for network performance under heavy-tailed packet arrival bursts.

Figure 5.1: Poisson Traffic vs. Actual Network Traffic [38]

5.2 Alternatives

All the hardware specified in Section 3 is commercially available. Where applicable, care was taken to suggest hardware - such as the CPU and RAM - that is not state-of-the-art, in order to keep costs down. While the costs of the components needed for the user NICs and the AN are reasonable, significant research and design expense may be needed to successfully integrate all the components and develop the required firmware. A competing design for sharing the IP port is to eliminate the need for a MAC protocol by using WDM and having a dedicated wavelength for each user. A wideband optical receiver at the AN - or multiple receivers, each tuned to one of the user wavelengths - will receive all the signals, perform optical-to-electronic conversion, and forward the IP packets to the router. Each user may use an off-the-shelf GE card, with the standard laser replaced by one operating at the desired wavelength. This approach is simpler to implement, but the need to use lasers and receivers operating at various wavelengths may result in higher hardware costs. Further, the number of users that may share a port is limited by the number of available wavelengths. In order to choose between this alternative design and the reservation protocol presented in the thesis, a detailed assessment of the costs must be conducted. An evaluation must also be made as to how many users can realistically share one IP port, based on their traffic generation characteristics and the capacity of the port. If this number exceeds the number of available wavelengths, then the WDM approach cannot fully utilize the port.
5.3 Improvements

5.3.1 Shortening the Reservation Interval

A significant proportion of the reservation phase is due to the delay between the AN sending its RA frame and beginning to receive data from the first scheduled user. As described in Section 2.6.1.1, the AN is not aware of the propagation delay to any of the users. In order to ensure that the first user has sufficient time to begin transmitting its packets when receiving the RA, the AN must schedule this user as though it were the maximum 10 km distance away. If the AN were aware of its distance to all users, it could schedule the closest users first and thereby reduce the round-trip delay. For instance, if the closest user were 500 m away, the AN could schedule it to transmit so that its data arrives only 5 µs after the RA was sent, since

2(500 m) / (2·10^8 m/s) = 5 µs

Users that are further away would be scheduled to transmit later, to ensure they have received the RA by the time it is their turn to transmit. This approach would require the addition of a new control frame which users send to the AN, informing it of their propagation delay after measuring it upon registering.

5.3.2 Retaining Reservation Requests Exceeding the Bound

In the bounded, contention-based reservation case, if a user's request exceeds the bound, the AN does not keep a record of the additional packets. This means that they must be reserved again in the subsequent reservation interval. Since the request may be unsuccessful due to a collision, the packets may have to wait for several cycles before they can be sent. If the AN kept track of the requests exceeding the bound, it could schedule these packets in the subsequent data phase and eliminate the potential additional delay experienced by the packets. In order to avoid duplicate requests under this new approach, each data packet would have to be assigned an identification number.
This would increase the amount of overhead needed for scheduling - rather than just requesting a certain number of packets, a user has to provide the identifier for each one. The RA packet would also have to identify the specific packets that may be sent.

Conclusion

This thesis has addressed the problem of sharing an IP router port among a group of bursty users that access it over a shared optical channel. Chapter 1 examined multi-access schemes that were candidates for use in a MAC protocol. TDM and Optical CDMA were determined to be poor choices due to their inefficiency when a small proportion of users are active. Slotted Aloha and CSMA/CD were shown to provide too low a throughput at high network loading. Of the remaining candidates, reservation was chosen over token-passing due to its lower average delay. This difference is due to the fact that reservation requests can be pipelined, whereas each token-passing operation requires a round-trip delay.

Chapter 2 provided a high-level specification of the protocol. It addressed synchronization issues and quantified the amount of guard time needed between transmissions. It also identified all the control packets needed to implement the protocol. The chapter also compared the operation of a contention-free TDM reservation protocol to an Aloha contention-based one.

Chapter 3 introduced Gigabit Ethernet, and described how the protocol could be implemented using a combination of GE hardware and other technology. Specific hardware was suggested, and a high-level architecture of the AN and the user NICs was provided.

Chapter 4 provided a final performance evaluation of the reservation protocol implemented using the hardware described in Chapter 3, under the assumption of Poisson traffic and a small fraction of active users. It was found that the variant using a contention-based reservation phase had significantly better delay performance.
Finally, the simulations conducted indicated that the simplified approach introduced in Chapter 1 for finding the optimal number of contention slots and the resulting delay yielded good estimates.

Appendix: Simulation Code

In this example, 100 users are active and 50 contention slots are available.

users:=100;
maxpack:=2000;                  //most packets a user may send during simulation//
slots:=50;
fixed:=1.00304E-4;              //propagation delay and fixed transmission time element of RA//
slot:=array[1..slots];          //array for tracking contention slots//
maysend:=array[1..users];       //tracks users who are successful in reserving//
pick:=rand(1..slots);           //function for picking a random contention slot//
for i from 1 by 1 to slots      //initialize//
do slot[i]:=0 end do:
for i from 1 by 1 to users      //initialize//
do maysend[i]:=0 end do:
resttime:=5.76E-7*slots+fixed;  //time to send all RR packets + fixed//
delay:=array[users,maxpack];    //stores delays for all packets//
arrival:=array[users,maxpack];  //stores arrival times//
sent:=array[users,maxpack];     //stores departure times//
for i from 1 by 1 to users      //initialize//
do arrival[i,0]:=0 end do:
packettime:=0.00001208;         //time to send a data packet//
packetsthiscycle:=array[1..4000];    //tracks packets sent per cycle//
lastpacket:=array[users];       //indexes total number of packets each user has to send//
lastsent:=array[users];         //tracks last packet currently sent by a user during simulation//
hadpackettosend:=array[users,4000];  //tracks which users sent packets each cycle//
randomize();
for i from 1 by 1 to users do   //generate the arrival times for data packets//
  for j from 1 while arrival[i,(j-1)]<1    //simulate 1 second//
  do arrival[i,j]:=arrival[i,(j-1)]+stats[random,exponential[81.92]](1):
     //use exponential-distribution random generator for inter-arrival times//
     lastpacket[i]:=(j-1):      //record total packets each user has to send//
  end do
end do:
totalpackets:=0:
for i from 1 by 1 to users      //total the number of packets to be sent//
do totalpackets:=totalpackets+lastpacket[i]:
end do:
for i from 1 by 1 to users      //initialize//
do for j from 1 by 1 to maxpack
   do sent[i,j]:=0 end do
end do:
for i from 1 by 1 to users      //initialize//
do lastsent[i]:=1 end do:
for i from 1 by 1 to users      //initialize//
do for j from 1 by 1 to 2000
   do hadpackettosend[i,j]:=0 end do
end do:
for i from 1 by 1 to 2000       //initialize//
do packetsthiscycle[i]:=0 end do:
systime:=0;                     //simulation time//
previous:=0;                    //end time of previous data interval//
packets:=0;                     //packets sent//
totalsending:=0;                //users sending each cycle//
for i from 1 by 1 while packets<totalpackets    //actual simulation starts here//
do systime:=(systime+resttime+totalsending*6.4E-8):
                                //advance systime by length of reservation phase//
   for x from 1 by 1 to slots   //initialize//
   do slot[x]:=0 end do:
   for x from 1 by 1 to users   //initialize//
   do maysend[x]:=0 end do:
   for j from 1 by 1 to users   //test if each user has packets to send//
   do if arrival[j,(lastsent[j]+1)]<=previous then
        hadpackettosend[j,i]:=1:
      end if:
   end do:
   for j from 1 by 1 to users do    //each user with packets attempts to reserve//
     if hadpackettosend[j,i]=1 then
       temp:=pick():            //pick a slot//
       if slot[temp]=0 then
         maysend[j]:=1:
         slot[temp]:=j:         //if no other user sending, no collision//
       else
         maysend[j]:=0:         //collision: user cannot send its data//
         maysend[slot[temp]]:=0 //the other colliding user also can't send//
       end if:
     end if:
   end do:
   totalsending:=0:             //track how many users are sending data//
   for j from 1 by 1 to users
   do totalsending:=totalsending+maysend[j] end do:
   for k from 1 by 1 to users do
     //all users successfully reserving send their packets that arrived
       before the beginning of the reservation phase//
     if maysend[k]=1 then
       for j from lastsent[k] by 1 while (arrival[k,j]<=previous and j<=lastpacket[k])
       do if (sent[k,j]=0) then
            systime:=(systime+packettime):   //inc. systime//
            sent[k,j]:=systime:              //record transmission time//
            packets:=packets+1:
            packetsthiscycle[i]:=packetsthiscycle[i]+1:
            lastsent[k]:=j:
          end if:
       end do:
     end if:
   end do:
   previous:=systime:
end do:
cycle:=i;                       //record number of cycles taken to send all packets//
for i from 1 by 1 to users do
  for j from 1 by 1 to lastpacket[i]
  do delay[i,j]:=(sent[i,j]-arrival[i,j]) end do:
end do:                         //calculate waiting time for each packet//
counter:=0:
for i from 1 by 1 to users do
  for j from 1 by 1 to lastpacket[i]
  do counter:=counter+delay[i,j] end do
end do:                         //add up all the delays//
counter/(packets);              //find average delay//
temp:=0:
for i from 1 by 1 to users do
  for j from 1 by 1 to cycle
  do temp:=temp+hadpackettosend[i,j]: end do:
end do:                         //add up total number of times users had data to send//
evalf(temp/(users*cycle));      //find average proportion of time users have data to send//

References

[1] Adaptec. PCI, 64-Bit and 66-MHz Benefits. 2002. http://www.adaptec.com/worldwide/product/markeditorial.html?prodkey=pci64bit&cat=%2ffechnology%2fSCSI%2f&type=SCSI
[2] Arnold, Eric E. Memory Types PC100 DRAM FPM EDO PC66 PC133 PC150 SDRAM nDRAM. 2002. http://home.cfl.rr.com/eaa/MemoryTypes.htm
[3] Bertsekas, Dimitri and Gallager, Robert. Data Networks, Second Edition. Prentice Hall, 1992.
[4] Cappé, Olivier et al. "Long-Range Dependence and Heavy-Tail Modeling for Teletraffic Data". To appear in IEEE Signal Processing Magazine, 2002.
[5] Cisco. Cisco 10000 ESR Gigabit Ethernet Interface Module. 2001. http://www.cisco.com/warp/public/cc/pd/rt/10000/prodlit/gige-ds.htm
[6] Claffy, K. et al. "Packet Sizes and Sequencing", Cooperative Association for Internet Data Analysis (CAIDA) Traffic Analysis Teaching CD. 2001. http://traffic.caida.org/TrafficAnalysis/Learn/Size/index.html
[7] Corning. 956W Data Sheet. 2001. http://www.comingfrequency.com/catalog/datasheets/956w.pdf
[8] Finisar. Optical GBIC Transceiver Modules. 2002.
http://www.finisar.com/media/productdocumentdetail/site2_1042273571_FTR1319-3A.pdf
[9] Fischer, Martin J. and Fowler, Thomas B. "Fractals, Heavy-Tails, and the Internet." Mitretek Technology Summaries. 2001. www.mitretek.org/pubs/mitretek-summariessummer/SigmaPubs/Fractals.PDF
[10] Froberg, N.M. et al. "The NGI ONRAMP Test Bed: Reconfigurable WDM Technology for Next Generation Regional Access Networks". Journal of Lightwave Technology, Vol. 18, No. 12, December 2000.
[11] Intel. IXP 1200 Network Processor Family Product Brief. 2001. http://www.intel.com/design/network/prodbrf/279040.htm
[12] Intel. IXDP 1200 Advanced Development Platform Product Brief. 2001. http://www.intel.com/design/network/prodbrf/279042.htm
[13] Intel. 82543GC Gigabit Ethernet Controller. 2002. http://www.intel.com/design/network/products/lan/controllers/82543gc.htm
[14] Kadambi, Jayant; Crayford, Ian and Mohan Kalkunte. Gigabit Ethernet: Migrating to High-Bandwidth LANs. Prentice Hall PTR, 1998.
[15] Kam, Anthony C. and Siu, Kai-Yeung. "Supporting Bursty Traffic with Bandwidth Guarantee in WDM Distribution Networks". IEEE Journal on Selected Areas in Communications, Vol. 18, No. 10, October 2000.
[16] Keiser, Gerd E. Local Area Networks. McGraw Hill, 1989.
[17] Keiser, Gerd E. Optical Fiber Communications. McGraw Hill, 1999.
[18] Kwong, Wing C. "Performance Comparison of Asynchronous and Synchronous Code-Division Multiple-Access Techniques for Fiber-Optic Local Area Networks". IEEE Transactions on Communications, Vol. 39, No. 11, November 1991.
[19] LSI Logic. SpeedBlazer L80710 Gigabit and L84700 Quad Serializer/Deserializer. 2000. http://206.204.107.130/techlib/marketing-docs/networking/speedblazer.pdf
[20] Marvell. 88E1043 Alaska Quad Fiber Transceiver. 2002. http://www.marvell.com/Internet/Products/products/1,2414,3-30-185-31,00.html
[21] Modiano, Eytan and Berry, Richard. "A Novel Medium Access Control Protocol for WDM-Based LAN's and Access Networks Using a Master/Slave Scheduler."
Journal of Lightwave Technology, Vol. 18, No. 4, April 2000.
[22] National Semiconductor. DP83820 10/100/1000 Mb/s PCI Ethernet Network Interface Controller. 2002. http://www.national.com/ds/DP/DP83820.pdf
[23] Peterson, Larry L. and Davie, Bruce S. Computer Networks: A Systems Approach. Morgan Kaufmann Publishers, 2000.
[24] Ramaswami, Rajiv and Sivarajan, Kumar N. Optical Networks: A Practical Perspective. Morgan Kaufmann Publishers, 1998.
[25] Robertazzi, Thomas G. Computer Networks and Systems: Queuing Theory and Performance Evaluation. Springer Verlag, 1994.
[26] Soper, Mark. Efficiency for the Ethernet MAC protocol - CSMA-CD. EECS 122 Course Notes (Berkeley), 1996. http://robotics.eecs.berkeley.edu/~eel22/FALL1996/Discussion/DiscWeek6/discweek6/node3.html
[27] Rom, Raphael and Sidi, Moshe. Multiple Access Protocols: Performance and Analysis. Springer Verlag, 1990.
[28] Ross, Sheldon M. Introduction to Probability Models, Sixth Edition. Academic Press, 1997.
[29] Salehi, Jawad A. "Code Division Multiple-Access Techniques in Optical Fiber Networks - Part I: Fundamental Principles". IEEE Transactions on Communications, Vol. 37, No. 8, August 1989.
[30] Sassen, Sander. Lies, Damned Lies, and a Different Perspective. 2002. http://www.hardwarecentral.com/hardwarecentral/reports/1686/1/
[31] Schmidt, Mike. 80x86 Integer Instruction Set (8088-Pentium). 2002. http://www.quantasm.com/opcode_i.html
[32] Schmidt, Mike. 80x87 Instruction Set (x87 - Pentium). 2002. http://www.quantasm.com/opcodef.html
[33] Strässlin, Thomas and Gagnaire, Maurice. "A Flexible MAC Protocol for All-Optical WDM Metropolitan Area Networks". IPCCC 2000, 2000.
[34] Tanenbaum, Andrew S. Computer Networks, Third Edition. Prentice Hall PTR, 1998.
[35] Tasaka, Shuji. Performance Analysis of Multiple Access Protocols. MIT Press, 1986.
[36] TechFest. PCI Local Bus Technical Summary. 1999. http://www.techfest.com/hardware/bus/pci.htm
[37] Vectron International. TC-210 (Z5) Series TCXO's. 2001.
http://www.vectron.com/products/tcxo/tc-210_011601.pdf
[38] Willinger, Walter and Paxson, Vern. "Where Mathematics Meets the Internet." Notices of the AMS. September 1998.
[39] Windl, Ulrich (ed.). The NTP FAQ and HOWTO. 2002. http://www.eecis.udel.edu/~ntp/ntpfaq/NTP-a-faq.htm
[40] Xilinx. DS200 (v1.0) Product Specification. 2002. http://www.xilinx.com/ipcenter/catalog/logicore/docs/gig-eth-mac.pdf