2000 Systems Engineering Capstone Conference • University of Virginia

A SIMULATION ANALYSIS OF THE INITIALIZATION OF DOCSIS-COMPLIANT CABLE MODEM SYSTEMS

Student team: Rolando Domdom, Brian Espey, Mark Goodman, Karen Jones, Vincent Lim
Faculty Advisor: Dr. Stephen Patek, Department of Systems Engineering
Client Advisors: Jerry Lankford and Mac Hartless, Ericsson, Inc., Lynchburg, VA
E-mail: EUSJERL@am1.ericsson.se

KEYWORDS: Cable Modem, DOCSIS, Event-Driven Simulation, MAC Protocol, Contention Resolution

ABSTRACT

An analysis of Data-Over-Cable Service Interface Specifications (DOCSIS) cable modem (CM) systems has been conducted, with specific emphasis on initialization and effective network recovery time. An object-oriented, event-driven simulation has been developed to simulate cable modem network topologies consisting of 200, 500, and 1000 CMs. In particular, the simulation is a high-fidelity model of a single-channel upstream system, with the assumption of an infinite-bandwidth downstream. In the analysis, distributions of CM initialization completion time were constructed using data from the simulation trials. The plots for the various CM network topologies show aggregated distributions of CM initialization time. The 200 and 500 CM network distributions tend to be shifted leftward, with heavy right tails, while the 1000 CM network exhibits an almost constant rate of CM initialization.

INTRODUCTION

In this paper we present preliminary simulation results that indicate the recovery times of generic DOCSIS-compliant networks with up to 1000 CMs that share a single upstream channel. We consider the case where all CMs receive power simultaneously, as though power in the metropolitan area is suddenly restored after a thunderstorm. The problem we consider is very similar to the “Ready, Go!
Situation” studied by Ishihara and Okada, which relates to the transient performance of Ethernet in the classroom, where large numbers of students compete for server access over a common LAN (Ishihara 1999).

Other researchers have examined the performance of competing media access control (MAC) layer protocols for hybrid fiber/coax (HFC) networks. The purpose of a MAC protocol is to define a bandwidth allocation scheme and to resolve contentions. In DOCSIS-compliant networks the contention resolution algorithm is defined in detail, so the major decision vendors must make is how to generate the bandwidth allocation MAP. The first HFC MAC protocol was (X)DQRAP (Wu 1994). The basic principle of this algorithm was a fixed number of data grant and request opportunities. The next protocol, CPR, was very similar to the first; the primary difference was the redefinition of a data mini-slot to allow for piggybacking requests (Limb 1997). Piggybacking involves requesting bandwidth during a data grant period. This feature significantly reduces the number of request mini-slots needed and causes less contention during those periods. More recently, Golmie et al. have compared the performance of various MAC protocols for HFC networks (Golmie 1997), and of IEEE 802.14 and DOCSIS networks (Golmie 1998).

The basic problem with the current research is that little is known about the head-end control mechanism. Control mechanisms are highly proprietary, and therefore little about them has been published in the public domain. DOCSIS defines this structure and the basic protocol for cable modem networks, but no information exists regarding implementation details. In addition, all prior research has focused on steady-state traffic flow rather than the initialization process. Our simulation focuses on the algorithm’s performance during initialization.
PROJECT SCOPE

This project was commissioned by Ericsson, Incorporated, a worldwide supplier of telecommunications products, which is currently developing a new line of cable modem and cable modem termination system (CMTS) products. These cable modem products are part of Ericsson’s strategy to enter the residential broadband market.

As part of this cable modem development strategy, Ericsson desired a means of analyzing the effective time-to-recovery of a downed cable modem network after catastrophic failure. This time-to-recovery is defined as the time needed for all of a network’s CMs to come back online after being disconnected due to a network system failure, such as a power outage or a malfunctioning CMTS.

To aid Ericsson in analyzing network recovery time, a simulation was designed to analyze the effective time-to-recovery of cable modem network systems. This simulation is based on DOCSIS, which defines a standard interface between cable lines and modems that allows interoperability between different CM products.

OVERVIEW OF DOCSIS NETWORKS

Development of cable modems is currently governed by one of two protocols, IEEE 802.14 or DOCSIS. The modems being developed by Ericsson follow the DOCSIS protocol. DOCSIS was developed by a consortium of cable TV operators, and defines the interface specifications for a cable modem network.

A cable modem network comprises two basic elements: the downstream and the upstream. The downstream is the medium through which the CMTS sends messages to all cable modems. Its bandwidth is very wide, so for the purposes of modeling, one can reasonably assume that the bandwidth is infinite.
In other words, the CMTS can send as much information as necessary to all attached CMs, and the only factor that determines how long it will take a CM to receive a message is the physical distance between it and the CMTS.

The second major component of a DOCSIS network is the upstream. The upstream is the medium through which CMs send messages to the CMTS. In general, these messages are either requests of various types (for physical maintenance or for more upstream bandwidth) or data packets that a particular modem is sending to the CMTS. Within one upstream channel, the signal for only one frame can occupy a particular point on the cable line at a time. If two or more frames occupy the cable at the same point and at the same time, i.e., if a collision occurs, the CMTS is unable to interpret the resulting message. Thus none of the CMs that sent a request at that time on the same upstream channel will receive a response from the CMTS, and they will have to send another request.

To reduce the collision problems associated with this process, DOCSIS defines a protocol that all CMs must follow whenever they need to use the upstream. The purpose of this protocol is twofold: 1) to regulate access to the upstream, so that only certain CMs can use it at a given time, and 2) to provide a means of contention resolution, so that when a collision does occur, it is resolved as quickly as possible. The first function is implemented through the bandwidth allocation MAP, and the second through a well-defined contention resolution process.

Bandwidth Allocation MAP

The bandwidth allocation MAP is a message that the CMTS sends to all CMs on a particular downstream. It describes the ways in which an upstream channel can be used for a certain range of time. There are essentially three different types of actions a CM can perform within a given MAP interval: 1) attempt to join the network, 2) request bandwidth, or 3) send data using bandwidth the CMTS has granted it.
The first two actions may result in contention, as the time intervals described in these regions of the MAP define broadcast opportunities. The third action occurs within a unicast time interval, allowing only one particular CM to transmit, so it is impossible for contention to occur. The basic decision the CMTS must make when constructing the MAP is how to distribute the amount of time allocated to each of these three types of actions.

The MAP is composed of an array of information elements (IEs), each of which describes the allowed upstream usage for a certain time interval. In our simulation, we consider four types of IEs: Initial Maintenance, Station Maintenance, Requests, and Grants. A CM uses the first two when it begins the initialization process. The purpose of these regions is to range the CMs, so that they are aware of the physical propagation delay experienced by frames as they traverse the cable line. The details of how these two regions are used are discussed below. After joining the network, the CM uses only the latter two types of IEs. A portion of an example map is shown in Figure 1.

Address      Type                  Offset
BROADCAST    Initial Maintenance   0
BROADCAST    Request               4
CM2          Grant                 16

Figure 1: Sample MAP Message

Each row in this table represents one IE. The offset field is used for determining the time interval described by the information element. For example, the second IE in this map defines a broadcast Request region that begins 4 mini-slots after the start time of this map. A mini-slot is a discrete unit of time that must be a power-of-two multiple of 6.25 µs. All components on a DOCSIS network track time in these discrete mini-slots.

Contention Resolution

When two or more CMs decide to send data within the same time interval of a broadcast MAP region, a collision occurs, destroying all of the messages involved. A contention resolution algorithm (CRA) defines the protocol the CMs must follow when attempting retransmission. The CRA is based on the concept of backoff windows. A backoff window defines an interval from which a CM randomly selects a number. This number indicates the number of transmission opportunities the CM must defer before it attempts retransmission.

DOCSIS specifies that the size of the backoff window must grow in truncated binary exponential form, with the initial backoff window and the maximum backoff window controlled by the CMTS. These values are communicated to the CMs in the MAP as exponents of two. When a CM sends its first request, it sets its backoff window exponent, b, equal to the Data Backoff Start defined in the map. The CM randomly selects a number, R, between 0 and 2^b - 1. It defers R transmission opportunities, transmits, and waits for a response from the CMTS. If the CM does not receive a response, it increments b by 1 (unless b is equal to the maximum backoff value) and repeats the process. This cycle continues until the CM receives a response, or until it exhausts its retries. DOCSIS defines the exhaustion point as 16 retries.

Initialization Process

The initialization process for a single CM can be divided into eight primary phases (DOCSIS, 1999). Figure 2 gives a visual representation of the initialization process flow.

Figure 2: Flow Diagram of DOCSIS Initialization [Adapted from DOCSIS, 1999]

Our simulation captures the first seven of these phases. The first two are primarily physical layer operations, and are modeled in the simulation simply as receipts of various types of control frames from the CMTS. Beginning with the third phase, Ranging, each CM must use the upstream to complete the required tasks. At the start of the Ranging process, a CM scans the map for an Initial Maintenance region.
It generates a Ranging Request within this interval, and waits for a Ranging Response from the CMTS as well as a Station Maintenance opportunity in a subsequent map. Because this region is broadcast, contention may occur, in which case the CM attempts retransmission based on the CRA discussed above. Once the CMTS receives the initial Ranging Request, it adds a unicast Station Maintenance region for the CM in the next map. The CM uses this region to generate a second Ranging Request. Once it receives a response from the CMTS, it moves on to the next phase of initialization.

The remaining phases are all implemented as Request/Grant cycles. For instance, to obtain an IP address, the CM generates a Request for bandwidth, waits for a Grant in the next map, and sends the IP address request in a subsequent Grant region. The time to completion for these phases is modeled as a normally distributed random variable with a specified mean and variance.

PROBLEM DEFINITION

Ericsson is most interested in the initialization phase, since a network is considered to have fully “recovered” from a power outage once all CMs have completed initialization. Once power is restored, all CMs will simultaneously come online and attempt to initialize themselves on the network. The large number of resulting collisions forces many CMs to back off and defer transmission opportunities, increasing the time to initialization and causing redundant data retransmission.

The focus of this paper is to determine how recovery time varies with the number of CMs that are attempting to initialize on a given network. We used a basic map generation algorithm (discussed in the next section), which prioritizes IEs by their type, and examined the effectiveness of this algorithm.
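The deferral behavior behind this recovery delay is the truncated binary exponential backoff described under Contention Resolution. The following Python fragment is a minimal sketch of that procedure under stated assumptions, not the SimDOCSIS implementation: the function names and the `try_send` callback are hypothetical, and reading the retry limit as one initial attempt followed by up to 16 retries is an interpretation of the text.

```python
import random

MAX_RETRIES = 16  # exhaustion point stated in the text


def pick_deferral(b, rng):
    """Pick R uniformly from the backoff window [0, 2**b - 1]."""
    return rng.randint(0, 2 ** b - 1)


def contend(try_send, backoff_start, backoff_end, rng=None):
    """Transmit in a broadcast region using truncated binary exponential backoff.

    try_send(deferred) is a hypothetical callback: it is told how many
    transmission opportunities were deferred and returns True if the CMTS
    responded (no collision). Returns the number of attempts made, or None
    if the retries were exhausted.
    """
    rng = rng or random.Random()
    b = backoff_start  # window exponent starts at the value given in the map
    for attempt in range(1, MAX_RETRIES + 2):  # assumed: first try + 16 retries
        deferred = pick_deferral(b, rng)
        if try_send(deferred):
            return attempt
        b = min(b + 1, backoff_end)  # window doubles until it reaches the cap
    return None
```

As a usage example, a CM whose first two transmissions collide and whose third succeeds reports three attempts, regardless of the random deferrals chosen:

```python
outcomes = iter([False, False, True])
print(contend(lambda deferred: next(outcomes), 4, 8, random.Random(1)))  # prints 3
```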
THE SIMDOCSIS SIMULATION TOOL

Assumptions

In order to begin analyzing network recovery time, a network simulation was designed to model DOCSIS-compliant cable modem networks. While the SimDOCSIS model attempts to follow the DOCSIS specifications as accurately as possible, the following assumptions were made to simplify modeling:

- Downstream bandwidth is infinite
- Processing time is constant for all frame types
- Downstream and upstream are error free (no noise)
- Receipt of two SYNC messages completes synchronization
- Min and max backoff window exponents are fixed during the operation of the network
- Times for TFTP file transfer, IP address acquisition, and time-of-day acquisition are normally distributed

The assumption of infinite downstream bandwidth is based on the downstream’s significantly larger bandwidth relative to the upstream. Also, with respect to the receipt of SYNC messages, we assume that the required physical operations occur instantaneously. Thus, the only aspect that we model is the receipt of two synchronization frames from the CMTS.

SimDOCSIS MAP Algorithm

The MAP algorithm implemented in the SimDOCSIS simulation tool is based on the prioritization of IEs by type. First, the CMTS checks whether it needs to include an Initial Maintenance region, based on the amount of time that has elapsed since the last Initial Maintenance region was included. Next, the CMTS grants Station Maintenance IEs for all CMs that successfully sent Ranging Requests in a previous Initial Maintenance interval. The CMTS then allocates Grants based on the Requests that it has received. The CMTS uses the remaining time described by the map for the Request region. Each map is 400 mini-slots (10 ms) long.

EXPERIMENT DESIGN

These experiments were designed to analyze the effective time-to-recovery of a cable modem network downed by a catastrophic failure, such as a power outage. The experiments’ objective was to determine how recovery time varies with the number of CMs on the network.
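The type-based prioritization of the SimDOCSIS MAP algorithm described above can be illustrated with a short Python sketch. It is a simplified reading of the prose, not the SimDOCSIS code: the `IE` class, the `build_map` function, and the policy of carrying unserved work over to the next map are hypothetical, and the region sizes are taken from the parameters listed in Table 1 (4 time-ticks of 6.25 µs per mini-slot, 400 mini-slots per map, 56 slots per Initial Maintenance region, 1 slot per Station Maintenance region).

```python
from dataclasses import dataclass

MINI_SLOT_US = 4 * 6.25  # 4 time-ticks of 6.25 us each = 25 us per mini-slot
MAP_SLOTS = 400          # mini-slots per map; 400 * 25 us = 10 ms


@dataclass
class IE:
    address: str  # "BROADCAST" or a CM identifier
    kind: str     # Initial Maintenance, Station Maintenance, Grant, or Request
    offset: int   # start of the region, in mini-slots from the start of the map


def build_map(im_due, ranged_cms, grant_requests, im_slots=56, sm_slots=1):
    """Fill one map in priority order: IM, Station Maintenance, Grants, Request.

    im_due:         True if an Initial Maintenance region is due in this map
    ranged_cms:     CMs whose initial Ranging Request was received
    grant_requests: list of (cm, slots) bandwidth requests received
    Work that does not fit is returned for a later map (assumed policy).
    """
    ies, offset = [], 0
    if im_due:
        ies.append(IE("BROADCAST", "Initial Maintenance", offset))
        offset += im_slots
    pending_sm = list(ranged_cms)
    while pending_sm and offset + sm_slots <= MAP_SLOTS:
        ies.append(IE(pending_sm.pop(0), "Station Maintenance", offset))
        offset += sm_slots
    pending_grants = list(grant_requests)
    while pending_grants and offset + pending_grants[0][1] <= MAP_SLOTS:
        cm, slots = pending_grants.pop(0)
        ies.append(IE(cm, "Grant", offset))
        offset += slots
    if offset < MAP_SLOTS:
        # Whatever time is left in the map becomes the broadcast Request region.
        ies.append(IE("BROADCAST", "Request", offset))
    return ies, pending_sm, pending_grants
```

For example, `build_map(True, ["CM7"], [("CM2", 12)])` places the Initial Maintenance region at offset 0, the Station Maintenance opportunity for CM7 at offset 56, the Grant for CM2 at offset 57, and the Request region at offset 69. One consequence of this ordering is that a large backlog of Station Maintenance and Grant regions shrinks the Request region, which is the trade-off a more dynamic algorithm would have to manage.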
In order to analyze network recovery time, three cable modem network topologies consisting of 200, 500, and 1000 CMs were simulated. CMs were evenly dispersed along a 5000-meter trunk line from the CMTS. Also, all CMs were initialized to come online immediately (i.e., at 0 ms) to mimic the near-instantaneous powering-on of CMs after a power outage is resolved. In this paper, we ignore the effect of network traffic attributable to CMs that complete initialization. This is based on the assumption that immediately after a power outage is resolved, people on the network will not be attempting to use their modems, so a minimal amount of data traffic is generated.

CM initialization time data were collected from fifty simulated trials of each network configuration. The data were then aggregated and plotted in histograms to approximate the initialization completion time distributions for each of the three network types.

Table 1: Experiment Simulation Parameters

Parameter                     Value
Number of CMs                 200, 500, and 1000
CM Distances                  Evenly dispersed within 5,000 meters of CMTS
Number of Upstreams           1
Time-Ticks Per Mini-Slot      4
Online Time                   0 ms
No. of Mini-Slots per Map     400
Slots Per Initial Maint.      56
Slots Per Station Maint.      1
Slots Per Request             1
Ranging-Backoff Window        0-1
Data-Backoff Window           4-8
TFTP Time Dist.               ~N(2000, 1000^2)

RESULTS

200 CM Network

Results from the simulation trials of a 200 CM network are shown below. Figure 3 is a histogram showing the number of CMs initialized during a given time interval for all fifty simulation runs. The middle curve depicts the average number of CMs initialized during an interval, and the two surrounding curves lie one standard deviation above and below that average.

Figure 3: Distribution of CM Initialization Times for Simulations of 200 CM Network

This heavy-tailed distribution is shifted leftward, indicating that most CMs were able to initialize quickly, within the first half (~10,000 ms) of the total simulation time. However, the marginal number of CMs initialized diminishes thereafter, and a significant amount of time is needed to initialize the relatively few remaining CMs. Part of this lag is attributable to heavy contention for the Initial Maintenance (IM) region, although the network consists of relatively few CMs, meaning that contention is probably not the primary factor. It is more likely that in this small network, these last-to-initialize CMs had a longer TFTP response interval relative to other CMs. This greater TFTP time can significantly delay their initialization completion while they wait for a TFTP response.

Most of the completion variance is in the 3,000-9,000 ms region, which can be attributed to the high level of contention during these intervals. As the number of initialized CMs increases, the variance of completed CMs diminishes, reflecting the effect of fewer contentions. In general, a 200 CM network completes initialization within 20,000 ms, with most runs completing in the 14,000-17,000 ms region (the 75th percentile is at about 16,000 ms). Figure 4 depicts the total network initialization times for the fifty simulated runs.

Figure 4: Cumulative Network Initialization Times for Simulations of 200 CM Network

500 CM Network

Figure 5 gives the CM initialization times for a 500 CM network. Again, the distribution seems to exhibit the leftward shift present in the 200 CM case, and peaks in the 11,000-12,000 ms time interval. The heavy tail in this larger CM network is due both to the TFTP effects also present in the 200 CM network and to the large amount of contention for IM slots. Further examination of the levels of network contention has found that, on average, a 500 CM network is subject to approximately 20,000 incidences of IM contention, which is about a ten-fold increase over the 2,000 incidences of IM contention in the 200 CM networks.
In Figure 7, we can see that a 500 CM network generally has an effective time-to-recovery of 44,000 ms. The 75th percentile lies approximately in the 36,000-38,000 ms interval, where a little over 40 simulations complete the initialization process for all 500 modems.

Figure 5: Distribution of CM Initialization Times for Simulations of 500 CM Network

Perhaps the most interesting feature of this graph is the anomalous second “peak” that is apparent in the 16,000-18,000 ms time interval. Completion time variance increases in this time interval as more CMs come onto the network. It is important to note that this phenomenon is evident not only at the aggregate level, but also within the individual simulation runs. Figure 6 shows a sample of five simulation trials, in which the “dip” at the 14,000-16,000 ms interval and the corresponding second peak are clearly present.

Figure 6: Sample Plots of Individual Simulations of 500 CM Networks

At this point the reason for the second peak is unclear. One hypothesis is that these peaks are the result of a sudden increase in the number of CMs attempting to initialize onto the network, such as when a range response timeout expires, or when CMs restart their backoff windows after exhausting their retry limit. Further research is needed in profiling the states of CMs throughout the initialization process to determine the effects of timeouts and exhausted retries.

Figure 7: Cumulative Network Initialization Times for Simulations of 500 CM Network

1000 CM Network

The 1000 CM network was the most time-consuming to simulate, due to the large amount of contention and network activity during the initialization process. Figure 8 presents the distribution of CM initialization times. The distinct peaks, so prominent in the 200 and 500 CM networks, have flattened out considerably, as the marginal rate of CM completion is more evenly distributed across the entire initialization process.

Figure 8: Distribution of CM Initialization Times for Simulations of 1000 CM Network

Another aspect of this flattened curve is that the initialization distribution does not appear to exhibit the obvious leftward shift present in the cases of the 200 and 500 CM networks. This indicates that, unlike in the two smaller networks, there is enough IM contention early in the initialization process to force the majority of CMs to defer a large number of times before they receive an IM acknowledgement. Thus the marginal rate of initialization is much more gradual than in the 200 or 500 CM network cases.

Figure 8 also shows evidence of the multiple-peak effect first witnessed in the simulations of the 500 CM network. In the 1000 CM case, however, there appear to be at least two peaks: one in the 77,000-99,000 ms interval, and another in the 121,000-132,000 ms range. We are currently investigating the cause of these peaks.

The 1000 CM network graph shows a fairly high level of variance, which stays relatively constant over the duration of the simulation, again indicating that contention is consistently hampering CMs from initializing themselves onto the network. This high, constant variance, even as the simulations near completion, is uncharacteristic of the 200 CM and 500 CM networks.

Figure 9 shows that a 1000 CM network has an initialization recovery time of approximately 230,000 ms, with at least 75% of the simulation trials completing their initialization processes within 190,000-198,000 ms.

Figure 9: Cumulative Network Initialization Times for Simulations of 1000 CM Network

CONCLUSIONS

An object-oriented, event-driven simulation program has been developed for the purpose of simulating and analyzing the performance of DOCSIS-compliant cable modem networks. This project was commissioned by Ericsson, Incorporated, a worldwide supplier of telecommunications products, which desired a DOCSIS simulation to analyze the effective time-to-recovery of a downed cable modem network after catastrophic failure.

In order to begin analyzing recovery time, network scenarios consisting of 200, 500, and 1000 CMs were each simulated for fifty trial runs. Data were collected for each simulation run and aggregated to produce histograms representing distributions of CM initialization times over a given time interval. The approximate recovery times for each network type are given in Table 2.

Table 2: Summary of Network Recovery Times

CMs      Approximate Recovery Time (ms)
200      20,000
500      44,000
1000     230,000

In general, the 200 and 500 CM network cases exhibited leftward-shifted distributions with heavy tails. This indicates that most CMs were able to initialize fairly quickly, but the remaining few tended to have delayed completion times due to long TFTP response times (primarily in the 200 CM case) and large numbers of deferred transmission opportunities resulting from IM contention.

Unlike the 200 and 500 CM cases, the 1000 CM network tends to show a flattened curve throughout the initialization process, indicating a level of contention high enough to force many CMs to defer transmission opportunities. This is reflected in the gradual (almost linear) rate of CM initialization, relative to the two smaller networks.

Finally, the 500 and 1000 CM cases also show evidence of “multiple peaks.” However, the cause of these additional peaks is unknown, and further investigation into the individual CM states at these peak times is needed.

RECOMMENDATIONS

The results presented in this paper give a good first-cut approximation of effective network recovery time from a catastrophic failure. Clearly, recovery time can be improved by increasing the initial size of the IM region to accommodate the flood of early, simultaneous traffic characteristic of the period after a power outage. This would require a trade-off with other aspects of the network, such as data transmission. To determine the optimal use of bandwidth, several factors should be studied further.

Because SimDOCSIS has an extensible feature set, additional types of analyses can be performed to determine optimal bandwidth allocation algorithms. The modular nature of the SimDOCSIS simulation framework allows for the inclusion of multiple bandwidth allocation MAP algorithms, so that their relative performance can be tested under conditions similar to those of the experiments discussed in this paper.

Further analyses using SimDOCSIS should also investigate the effects of traffic (both telephony and data) upon initialization times. The inclusion of traffic should significantly increase the effective recovery time, and requires better, more dynamic MAP algorithms to accommodate the trade-offs between initializing CMs not yet on the network and granting bandwidth to already initialized CMs. Including partial grants and piggybacking should significantly aid the CM initialization process in the presence of traffic. Finally, quality of service (QoS) needs to be considered in order to accurately model networks with varying service levels for their CMs.

REFERENCES

MCNS Holdings L.P.
“Data-Over-Cable Service Interface Specifications, Radio Frequency Interface Specifications, SP-RFIv1.1-I02-990731.” Cable Television Laboratories, Inc., 1999.

Golmie, N., G. Pieris, S. Masson, and D. Su. “A MAC Protocol for HFC Networks: Design Issues and Performance Evaluation.” Computer Communications, vol. 20, pp. 1042-1050, 1997.

Golmie, N., F. Monveaux, and D. Su. “A Comparison of MAC Protocols for Hybrid Fiber/Coax Networks: IEEE 802.14 vs. MCNS.” National Institute of Standards and Technology, 1998.

Ishihara, S. and M. Okada. “Performance Investigation of Ethernet LANs for Educational Computer Systems.” Proc. of SMC'99 - 1999 IEEE Systems, Man, and Cybernetics Conference.

Limb, J.O. and D. Sala. “A Protocol for Efficient Transfer of Data over Fiber/Coax Systems.” IEEE Trans on Networking, vol. 5, no. 6, Dec. 1997.

Wu, C-T. and G. Campbell. “Extended DQRAP (XDQRAP) A Cable TV Protocol Functioning as a Distributed Switch.” Proc. 1994 1st International Workshop on Community Networking, pp. 191-198, July 13-14, 1994.

BIOGRAPHIES

Rolando Domdom is a fourth-year Systems Engineering major, concentrating in both computer information and management systems. Rolando will be working for Sprint Inc. in the Fall of 2000.

Brian Espey is a fourth-year Systems Engineering major, concentrating in computer information systems, with a minor in computer science. Brian has accepted a software engineering position with Best! Software, Inc.

Mark Goodman is a fourth-year Systems Engineering student, concentrating in economic systems. Mark will be working for Merrill Lynch in New York City, NY.

Karen Jones is a fourth-year Systems Engineering student, with a minor in biomedical systems. Karen has accepted a position with Ernst & Young.

Vincent Lim is a fourth-year Systems Engineering student, with a major in economics and a minor in computer science. Vince has accepted a position as an investment banking analyst at Salomon Smith Barney.