Control and Provisioning of Ultra-High Speed Networks for Large Science Applications

Principal Investigator: Biswanath Mukherjee
Co-Principal Investigators: Dipak Ghosal and Xin Liu
Department of Computer Science, University of California, Davis, CA 95616
E-mail: mukherjee@cs.ucdavis.edu
Phone: +1-530-752-4826; +1-530-752-7004; Fax: +1-530-752-4767

Submitted to: Office of Science Notice DE-FG01-04ER04-03, High-Performance Network Research: Scientific Discovery through Advanced Computing (SciDAC) and Mathematical, Informational, and Computational Sciences (MICS)

Program Manager: Dr. Thomas D. Ndousse
Mathematical, Informational, and Computational Sciences Division
Germantown Bldg/SC-31, Office of Science, U.S. Department of Energy
1000 Independence Avenue, SW, Washington, DC 20858-1290
Email: tndousse@sc.doe.gov
Phone: +1-301-903-9960; Fax: +1-301-903-7774

Table of Contents

1. DOE Large Science Applications .... 3
2. Current State of the Networks .... 5
3. Research Plan .... 7
   3.1 Traffic Grooming and Bandwidth Provisioning .... 7
   3.2 Survivability and Fault-Tolerant Network Provisioning .... 9
   3.3 Distributed Space-Time Scheduling .... 11
   3.4 Low-Latency and Fault-Tolerant Signaling Plane .... 14
   3.5 Scalable and Fault-Tolerant Network Primitives and Intelligent Services .... 15
4. Collaboration and Applications .... 16
5. Statement of Work .... 17
6. References .... 18
7. Appendix A: Budget .... 20
8. Appendix B: Biographical Information .... 21

1. DOE Large Science Applications

The next generation of supercomputers holds enormous promise for meeting the demands of a number of large-scale scientific computations from fields as diverse as earth science, climate modeling, astrophysics, fusion energy science, molecular dynamics, nanoscale materials science, and genomics (see Table 1 [DOE03] for a list of these applications and their characteristics). Among the DOE-sponsored large-science applications, a specific example is the Genomes To Life (GTL) Program. The goal of GTL is to use DNA sequences of microbes and higher organisms, including humans, as starting points to systematically answer questions related to the fundamental underlying processes of living systems.
Towards this end, the key goals of the GTL program [GTL03] are to (1) identify the protein machines that carry out critical life functions; (2) characterize the gene regulatory networks that control these machines; (3) explore the functional repertoire of complex microbial communities in their natural environments, to provide a foundation for understanding and using their remarkably diverse capabilities to address DOE missions; and (4) develop the computational capability to integrate and understand these data and begin to model complex biological systems.

Table 1: Characteristics of various large-scale science applications [DOE03].

| Science Area | Current End-to-End Throughput | 5-Year End-to-End Throughput | 5-10-Year End-to-End Throughput | General Remarks |
|---|---|---|---|---|
| High Energy Physics | 0.5 Gbps | 100 Gbps | 1.0 Tbps | high throughput |
| Climate (Data & Computations) | 0.5 Gbps | 160-200 Gbps | n Tbps | high throughput |
| SNS NanoScience | does not yet exist | 1.0 Gbps steady state | Tbps & control channels | remote control & time-critical transport |
| Fusion Energy | 500 MB/min (burst) | 500 MB/20 sec (burst) | n Tbps | time-critical transport |
| Astrophysics | 1 TB/week | N*N multicast, 1 TB+ & stable streams | n Tbps | computational steering & collaborations |
| Genomics (Data & Computations) | 1 TB/day | 100s of users | Tbps & control channels | high throughput & steering |

Consider the scenario in which a user participating in a genomics research project is running software such as mpiBLAST [DCF03], which, at different stages of the computation, needs to make repeated accesses to bio-databases scattered all over the country. These databases are very large, typically several gigabytes in size, and are growing at a rate faster than Moore's Law [DOE03], i.e., doubling every 12 months rather than every 18 months. Research groups can currently install local copies of these large databases to save data-transfer time.
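The growth rate just cited can be made concrete with a quick sketch. Only the doubling periods (12 versus 18 months) come from [DOE03]; the 10-GB starting size and 5-year horizon are illustrative assumptions:

```python
# Exponential growth: size(t) = size0 * 2**(t / T_double).
# Only the doubling periods (12 vs. 18 months) come from [DOE03];
# the 10-GB initial size is a hypothetical example.

def projected_size_gb(initial_gb: float, months: int, doubling_months: float) -> float:
    """Database size after `months`, doubling every `doubling_months`."""
    return initial_gb * 2 ** (months / doubling_months)

# A hypothetical 10-GB bio-database after 5 years:
bio_db = projected_size_gb(10, 60, 12)   # bio-data pace (12-month doubling) -> 320 GB
moore  = projected_size_gb(10, 60, 18)   # Moore's-Law pace (18-month doubling) -> ~101 GB
```

The factor-of-three gap after only five years is why maintaining local copies quickly becomes untenable.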
However, as the amount of bio-data increases exponentially due to the advanced capabilities of analytical technologies for biology, it will soon become unrealistic to keep local copies of all bio-databases in every single biology lab. As a result, from time to time (during the computation), large chunks of data will be downloaded from a number of different bio-databases. In addition, future applications may also require distributed collaborative visualization, remote computational steering, and remote instrument control [DOE03]. This poses important and challenging networking research and development issues, as highlighted in the workshop report [DOE03]: an ultra high-performance network with powerful and flexible provisioning and transport modalities is needed to meet the demands of DOE large-scale science applications. Dynamic provisioning of ultra high-speed networks is identified as a critical area of research for networking technologies of DOE large-scale science projects. In this proposal, we focus on the control and provisioning of ultra high-speed networks. To elaborate, we will address the following challenges:

1. Traffic Grooming and Bandwidth Provisioning: Fiber-optic technology is the dominant choice for building and operating long-haul backbone networks because of fiber's enormous bandwidth capacity. A single strand of fiber can support 160 wavelength channels, each operating at 10 Gbps today using commercial off-the-shelf components (and extendable to 320 channels and 40 Gbps/channel in the foreseeable future). However, not all network nodes or interfaces (e.g., bio-databases, supercomputer interfaces, or other large-science DOE applications) may need such a large capacity.
Therefore, how to efficiently provision high-capacity pipes of diverse bandwidth granularities (perhaps ranging from several wavelength channels to a single wavelength to sub-wavelength channel capacity) between network nodes and interfaces is a very important problem, and is known as the traffic-grooming problem. Concisely, traffic grooming refers to the mechanisms for intelligently aggregating/de-aggregating and switching lower-speed traffic streams between higher-capacity trunks (such as wavelength channels). Based on our preliminary work, we plan to further develop grooming strategies for large-science applications. 2. Fault-tolerant Network Provisioning: Reliability and fast restoration are highly desirable features of a network designed to support DOE large-science applications. Noting the huge capacity of a fiber and the fact that network failures (particularly fiber cuts) do occur more often than we wish, it is imperative that excellent protection and restoration schemes be designed in the next-generation UltraScienceNet. Given the diversity of bandwidth granularities of the various high-capacity pipes, there are several additional research challenges, e.g., should each bandwidth pipe be protected separately or should protection be set up at wavelength channel levels? Should the spare capacity be set up for "dedicated protection" for a connection or for "shared protection" which can be pooled for different connections? Should the protection/recovery be performed on a per-link basis or a connection's end-to-end basis or on the basis of a "sub-path" of a connection? Should the recovery paths be pre-computed (and periodically recomputed based on current network state for efficiency) or should they be dynamically discovered after a failure occurs? These methods will have different performance tradeoffs on reliability, restorability, restoration time, etc. 
It is envisioned that, perhaps, all of these approaches may need to co-exist in the same network, because different applications may have different requirements on fault tolerance. We propose to investigate the applicability of the above fault-management approaches to bandwidth provisioning for the GTL application as well as other DOE large-scale science applications.

3. Space-Time Scheduling of Large Data Transfers: The problem of aggregating large data files from distributed databases and/or terascale computing facilities will be a common task in many large-science applications. This requires intelligent distributed space-time scheduling of large data transfers. In this project, we consider the problem of aggregating large data files from distributed databases and address the corresponding challenges from a network-architecture perspective. The objective is to minimize the total time delay for data aggregation. The two dimensions of determining both the path (space) and the time make the problem difficult and differentiate it from the machine-scheduling problems reported in the literature [Pin02]. We have formulated the problem as a Time-Path Scheduling Problem (TPSP), shown that TPSP is NP-complete, and developed heuristic algorithms for efficient scheduling. In this project, we will extend our previous work and further develop scheduling strategies suitable for large-science applications.

4. Low-delay and Fault-tolerant Signaling and Control Plane Architecture: The scheduled transfer of large data sets will require a signaling and control plane architecture that can be used to set up the schedule as well as manage and control the network resources [BBM03]. A key aspect of such an architecture will be to minimize the end-to-end delay of the signaling and control messages. In the context of the genomics application, low-delay requirements also arise in sending control messages to supercomputers.
The prediction and modeling tasks of the biological sciences, such as the simulation of the dynamics of bio-molecules over large time-scales [DOE03], will be carried out on supercomputers at different physical sites, and will require data to be transferred from bio-databases to the supercomputers, and various inputs and control messages to be processed from the user running the simulations. To provide the computational power needed for these long time-scale simulations, tasks must be very tightly coordinated to ensure the effective utilization of the supercomputers. Thus, it is important that end-to-end message delays be minimized over the networks, to ensure that the supercomputers do not idle waiting for control messages. We will explore an in-fiber, out-of-band signaling architecture and investigate the use of redundant signaling paths to meet the low-delay and fault-tolerance requirements.

5. Scalable Network Primitives and Services: High-performance networking consists not only of the infrastructure but also of various protocols and services. This will require new transport-layer protocols that can move large amounts of data efficiently and with low latency. Algorithms must be developed to mitigate receiver-side bottlenecks that may arise when large amounts of data from a number of different databases are aggregated at a client. Network primitives such as application-layer multicasting [Jan00], caching [Ora01], intelligent data replication [PBB01], data bundling based on access patterns [KoG99], and sharing of partial computation among experts will be required to extend the capabilities of the UltraNet.

The remainder of the proposal is organized as follows. Section 2 outlines the current state of the networks for large-scale science applications; in particular, we discuss ESnet and the goals of the newly proposed UltraNet. Section 3 gives details of the research plan. Section 4 outlines collaborations and applications, and Section 5 enumerates the statement of work.
The references, the budget, and the biographical information of the PIs are provided in Sections 6 through 8.

2. Current State of the Networks

The Energy Sciences Network (ESnet), shown in Figure 1, is a high-speed network serving thousands of DOE scientists and collaborators worldwide. A pioneer in providing high-bandwidth, reliable connections, ESnet enables researchers at national laboratories, universities, and other institutions to communicate with each other using the collaborative capabilities needed to address some of the world's most important scientific challenges. The newer challenges of DOE large-scale science applications, however, require capabilities that far transcend ESnet's production capabilities. The next-generation network demands are simply beyond the capabilities of ESnet, both in terms of the required large bandwidths and the sophistication of the capabilities. First, there is no provision in ESnet for testing Gbps dedicated cross-country connections with dynamic switching capability. Second, during the technology-development process, it is quite possible for various components of the network to be unavailable for production operations; such situations would cause undue disruptions to normal ESnet activities.

Figure 1: The ESnet backbone network.

A number of proposals have recently been funded to extend ESnet. The science UltraNet [RWD03] is one important effort with which this proposed research will be closely aligned. The key goal of UltraNet is to eliminate the ever-widening performance gap between link speeds and application throughputs. While optical technologies promise lambda-switched links at Tbps rates, they do not by themselves provide the provisioning and transport technologies to deliver this performance to the application layer.
Legacy protocols, including the most widely deployed transport protocol, namely the Transmission Control Protocol (TCP), and other network components (which are optimized for low network speeds) cannot easily scale to the unprecedented optical link bandwidths. UltraNet is exploring innovative, scalable architectural options that use a minimum number of layers and make wavelengths available directly to the applications [RWD03]. UltraNet will provide a rich environment to explore high-performance transport protocols that can achieve throughputs on the order of the available capacity in the optical core networks. TCP was designed and optimized for low-speed data transfers over congested IP-based networks. However, its effectiveness in ultra high-speed networks based on the emerging all-optical networks is being seriously questioned, especially for the transfer of petabytes of data over intercontinental distances [PFD03,STP03,Flo01,SCTP]. Another key issue to be addressed by UltraNet is traffic engineering. While MPLS has recently been extended to IP-based DWDM networks to take advantage of the optical bandwidths and address congestion problems in the IP layer, the required advanced traffic-engineering methods have unfortunately not been widely deployed in operational networks because they involve complex interdomain signaling and costing. UltraNet will provide an excellent environment to prototype the needed practical traffic-engineering methods within the context of DOE networking environments. Clearly, the goal of UltraNet is to develop the infrastructure and networking technologies required to support the needs of DOE large-scale science applications. The purpose of this proposed research is to extend the capabilities of UltraNet by enabling it with scalable and fault-tolerant network services and primitives that will allow rapid deployment of large-science applications.

3. Research Plan

In this project, we focus on the control and provisioning of ultra-high speed networks for large-scale science applications. We expect our project to complement and extend the research of UltraNet. Our research plan includes (a) traffic grooming and bandwidth provisioning, (b) survivability and fault-tolerant network provisioning, (c) distributed space-time scheduling of large data transfers over ultra high-speed networks, (d) a low-delay and fault-tolerant signaling and control plane architecture, and (e) scalable network primitives and services. Figure 2 shows the roadmap of the proposed research and its potential impact on UltraNet and ESnet.

Figure 2: Roadmap of the proposed research. The proposed research at UC Davis (traffic grooming, signaling and control plane architecture, scalable services, distributed space-time scheduling) transfers technology to UltraNet (a breakable R&D network with scheduled operations, ultra-high speed, and a nearly all-optical core), and in turn to the production ESnet (which connects all DOE sites with 24x7 high-reliability, best-effort delivery for routine Internet activities).

3.1 Traffic Grooming and Bandwidth Provisioning

We envision that large-science applications and the next-generation communication infrastructure will employ high-bandwidth optical networks as the dominant backbone technology. Optical networks based on wavelength-division multiplexing (WDM) technology have the ability to satisfy the bandwidth requirements of the large-science applications and the future Internet infrastructure by scaling up the existing capability (particularly bandwidth) by two or three orders of magnitude. Under WDM, the optical transmission spectrum is carved up into a number of non-overlapping wavelength (or frequency) bands, with each wavelength supporting a single communication channel operating at whatever rate one desires, e.g., peak electronic speed.
By allowing users to transmit simultaneously on different WDM channels, the huge opto-electronic bandwidth mismatch is overcome and the aggregate traffic carried by the network is increased. Point-to-point WDM transmission technology is quite mature today, while the corresponding switching technologies (optical crossconnects (OXCs)) are still maturing. But bandwidth is precious, especially for large-science applications. Once WDM transmission technology is deployed on the network backbone, efficiently utilizing the huge bandwidth at our disposal is of paramount importance. While a single fiber strand has over a terabit per second of bandwidth and a wavelength channel has over a gigabit per second of transmission speed, the network may still be required to support traffic connections at rates lower than the full wavelength capacity. The capacity requirements of these low-rate traffic connections can range from STS-1 (51.84 Mbps or lower) up to the full wavelength capacity. In order to save network cost and to improve network performance, it is very important for the network operator to be able to multiplex/demultiplex multiple low-speed connections onto/from high-capacity circuit pipes and to intelligently switch them at intermediate nodes. This is referred to as the traffic-grooming problem [ZhM02,ClG02,Gro99,ToN94,ZhS00,FTU02,OZM02]. For traffic grooming, a node should be able to switch traffic at wavelength granularity as well as at finer granularities. Figure 3 shows the logical view of a simplified grooming-node architecture. (In this figure, the mux/demux form the transmission system, while the other blocks form the switching system.) This hierarchical grooming node consists of a wavelength-switch fabric (W-Fabric) and a grooming fabric (G-Fabric). The W-Fabric performs wavelength routing; the G-Fabric performs multiplexing, demultiplexing, and switching of low-speed connections.
A portion of the incoming wavelengths to the W-Fabric can be dropped to the G-Fabric through the grooming-drop ports for sub-wavelength-granularity switching. The groomed traffic can then be added back to the W-Fabric through the grooming-add ports. The number of grooming ports determines the grooming capacity of a node.

Figure 3: Grooming-node architecture (left) and the corresponding auxiliary graph (right). The architecture shows the fiber inputs/outputs with demux/mux, the wavelength layer (W-Fabric), the grooming layer (G-Fabric) reached through the grooming-add and grooming-drop ports, and the local Tx/Rx at the access layer.

We propose a generic graph model for traffic grooming [ZhM02]. This model uses an auxiliary graph to represent the different grooming-node architectures and the current network state, and takes into account various resource constraints, such as the number of free wavelengths on each fiber and the number of available grooming ports at each node. Figure 3 shows the grooming-node architecture (left) and its corresponding auxiliary graph (right). (For clarity, we refer to nodes and links in the auxiliary graph as vertices and edges.) The W-Fabric is modeled as a layer consisting of input vertex I and output vertex O; the G-Fabric is modeled as the access layer consisting of input vertex AI and output vertex AO; a grooming-add port is modeled by an edge from vertex AO to vertex O; and a grooming-drop port is modeled by an edge from vertex I to vertex AI. A unidirectional fiber is represented as an edge from vertex O at the source node to vertex I at the destination node of the link. A lightpath layer, consisting of input vertex LI and output vertex LO, is added to model existing lightpaths sourced/sunk at a node. A lightpath is represented as an edge from vertex LO at the source node to vertex LI at the destination node. Every edge is associated with two attributes: one indicating the available capacity, and the other indicating the cost of the resource that the edge represents.
Given a connection request, by computing the shortest path from the access-layer output vertex (AO) at the source node to the access-layer input vertex (AI) at the destination node, we can determine how to set up new lightpath(s) and how to route the connection onto these lightpath(s) and/or some existing lightpath(s). Given a traffic demand T(s,d,g,m), we need to determine how to route the traffic under the current network state. In general, for a traffic demand T(s,d,g,m) in a network, there are four possible operations that can be used to carry the traffic without altering the existing lightpaths:

• Operation 1: Route the traffic onto an existing lightpath directly connecting the source s and the destination d.
• Operation 2: Route the traffic through multiple existing lightpaths.
• Operation 3: Set up a new lightpath directly between the source s and the destination d, and route the traffic onto this lightpath. Using this operation, we set up only one lightpath if the amount of traffic is less than the capacity of the lightpath.
• Operation 4: Set up one or more lightpaths that do not directly connect the source s and the destination d, and route the traffic onto these lightpaths and/or some existing lightpaths. Using this operation, we need to set up at least one lightpath. However, since some existing lightpaths may be utilized, the number of wavelength-links used to set up the new lightpaths is likely less than the number of wavelength-links needed to set up a lightpath directly connecting s and d.

Different orderings of these operations form different grooming policies [ZhM02]. A grooming policy determines how to carry the traffic in a given situation; it reflects the intentions of the network operator. In this project, we plan to compare the properties of various grooming policies and develop policies based on the characteristics of GTL and other large-science applications.
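The routing step described above can be sketched as a capacity-aware shortest-path search over the auxiliary graph. The two-node topology, vertex names like "A.AO", and the cost/capacity numbers below are illustrative assumptions, not taken from [ZhM02]:

```python
import heapq

# Auxiliary graph: adjacency map of vertex -> {neighbor: (cost, free_capacity)}.
# Vertex naming follows the text ("A.AO" = access-layer output at node A);
# the two-node topology and all numbers are a made-up example.
graph = {
    "A.AO": {"A.O": (1, 16)},   # grooming-add port at node A
    "A.O":  {"B.I": (5, 16)},   # fiber link A -> B
    "B.I":  {"B.AI": (1, 16)},  # grooming-drop port at node B
    "B.AI": {},
}

def shortest_route(graph, src, dst, demand):
    """Dijkstra over the auxiliary graph, skipping edges whose free
    capacity is below the requested sub-wavelength demand."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, (cost, cap) in graph[u].items():
            if cap < demand:
                continue                      # grooming/wavelength constraint
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(pq, (d + cost, v))
    if dst not in dist:
        return None                           # request blocked: no capacity
    path, u = [dst], dst
    while u != src:
        u = prev[u]
        path.append(u)
    return list(reversed(path))

print(shortest_route(graph, "A.AO", "B.AI", demand=4))
# -> ['A.AO', 'A.O', 'B.I', 'B.AI']
```

A grooming policy would then be expressed through the edge costs: for example, pricing new-lightpath edges above existing-lightpath edges biases the search toward Operations 1 and 2.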
3.2 Survivability and Fault-tolerant Network Provisioning

Reliability and fast restoration are highly desirable features of a network designed to support DOE large-science applications. However, network failures do occur more often than we wish. Table 2 shows some typical data on network-component (transmitter, receiver, fiber link (cable), etc.) failure rates and failure-repair times according to Bellcore (now Telcordia). In Table 2, FIT (failures in time) denotes the average number of failures in 10^9 hours, Tx denotes optical transmitters, Rx denotes optical receivers, and MTTR means mean time to repair. Although the problem of how connection availability is affected by network failures is currently attracting a lot of interest [HoM02,ACQ02,RaM02,WSM02,WSM02FOEC], we still lack a systematic methodology to quantitatively estimate a connection's availability, especially when protection schemes are used. It is imperative that excellent protection and restoration schemes be designed in the next-generation UltraScienceNet. The reliability requirements of these applications may not be identical, because of their diverse service characteristics. In commercial networks, availability requirements are specified in a Service Level Agreement (SLA), which is a contract between the network operator and a customer. Usually, service reliability is represented by connection availability, which is defined as the probability that the connection will be found in the operating state at a random time in the future. Table 3 shows some typical SLAs. Connection availability can be computed statistically based on the failure frequency and failure-repair rate, reflecting the percentage of time a connection is "alive" or "up" during its entire service period.

Table 2: Failure rates and repair times (Bellcore).
| Metric | Bellcore Statistics |
|---|---|
| Equipment MTTR | 2 hrs |
| Cable-cut MTTR | 12 hrs |
| Cable-cut rate | 4.39/yr/1000 miles |
| Tx failure rate | 10867 FIT |
| Rx failure rate | 4311 FIT |

There are two types of fault-recovery mechanisms. If backup resources (routes and wavelengths) are pre-computed and reserved in advance, we call it a protection scheme. Otherwise, if, when a failure occurs, another route and a free wavelength have to be discovered dynamically for each interrupted connection, we call it a restoration scheme. Generally, dynamic restoration schemes are more efficient in utilizing network capacity because they do not allocate spare capacity in advance, and they provide resilience against different kinds of failures (including multiple failures); but protection schemes have faster recovery times and can guarantee recovery from the disrupted services they are designed to protect against (a guarantee which restoration schemes cannot provide).

Table 3: Illustrative service classes.

| Service Type | Availability | Down Time/Year |
|---|---|---|
| Basic | 99% | 87.6 hours |
| Premium | 99.5% | 43.8 hours |
| Silver | 99.9% | 8.76 hours |
| Gold | 99.99% | 52.56 mins |
| Platinum | 99.999% | 5.26 mins |

Protection schemes can be classified as ring protection and mesh protection. Ring-protection schemes include Automatic Protection Switching (APS) and Self-Healing Rings (SHR). Both ring protection and mesh protection can be further divided into two groups: path protection and link protection. In path protection, the traffic is rerouted through a link-disjoint backup route (backup path) once a link failure occurs on its working path (primary path). In link protection, the traffic is rerouted only around the failed link. While path protection leads to efficient utilization of backup resources and lower end-to-end propagation delay for the recovered route, link protection provides faster protection-switching time.
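The figures in Tables 2 and 3 are related by the standard steady-state availability formula A = MTTF / (MTTF + MTTR), with MTTF in hours obtained from a FIT rate as 10^9 / FIT. A quick sketch using values from the tables:

```python
HOURS_PER_YEAR = 8760

def availability_from_fit(fit: float, mttr_hours: float) -> float:
    """Steady-state availability A = MTTF / (MTTF + MTTR),
    where MTTF (hours) = 1e9 / FIT (failures per 10^9 hours)."""
    mttf = 1e9 / fit
    return mttf / (mttf + mttr_hours)

def downtime_per_year_hours(availability: float) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1 - availability) * HOURS_PER_YEAR

# Table 3 sanity check: a 99.9% ("Silver") connection is down ~8.76 h/yr.
print(downtime_per_year_hours(0.999))            # ~8.76

# A transmitter from Table 2 (10867 FIT, 2-h equipment MTTR):
a_tx = availability_from_fit(10867, 2.0)
print(a_tx, downtime_per_year_hours(a_tx) * 60)  # ~0.99998, ~11.4 min/yr
```

Note that this covers a single component in isolation; a connection's availability compounds the availabilities of every component along its path (and of any backup path), which is why the systematic methodology called for above is non-trivial.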
Recently, researchers have proposed the idea of sub-path protection in a mesh network, in which a primary path is divided into a sequence of segments and each segment is protected separately. Compared with path protection, sub-path protection can achieve higher scalability and faster recovery time for a modest sacrifice in resource utilization. (Node failures can also be handled by calculating node-disjoint routes. Note, however, that carrier-class optical crossconnects (OXCs) in network nodes must be 1+1 (master/slave) protected in hardware for both the OXC's switch fabric and its control unit. The OXC's port cards don't have to be 1+1 protected, since they take up the bulk of the space (perhaps over 80%) and cost of an OXC; a port-card failure can instead be handled as link and/or wavelength-channel failure(s). Node failures remain important to protect against in scenarios where an entire node, or a collection of nodes in a part of the network, may be taken down, possibly by a natural disaster or a malicious attacker.)

Link, sub-path, and path protection schemes can be dedicated or shared. In dedicated protection, there is no sharing of backup resources; in shared protection, backup wavelengths can be shared on some links as long as their protected segments (links, sub-paths, paths) are mutually diverse. If shared protection is used, OXCs on backup paths cannot be configured until a failure occurs; hence, recovery time in shared protection is longer, but its resource utilization is better than that of dedicated protection. Dynamic restoration can also be classified as link-, sub-path-, or path-based, depending on the type of rerouting. In link restoration, the end nodes of the failed link dynamically discover a route around the link for each connection (or "live" wavelength) that traverses the link.
In path restoration, when a link fails, the source and the destination node of each connection that traverses the failed link are informed about the failure (possibly via messages from the nodes adjacent to the failed link). The source and destination nodes of each connection independently discover a backup route on an end-to-end basis. In sub-path restoration, when a link fails, the upstream node of the failed link detects the failure and discovers a backup route from itself to the corresponding destination node for each disrupted connection. Link restoration is fastest and path restoration is slowest among the above three schemes. Sub-path restoration time lies in between. Figure 4 summarizes the classification of protection and restoration schemes. Figure 4: Different protection and restoration schemes in WDM mesh networks. In summary, given the diversity of bandwidth granularities of the various high-capacity pipes, there are several additional research challenges, e.g., should each bandwidth pipe be protected separately or should protection be set up at wavelength channel levels? Should the spare capacity be set up for "dedicated protection" for a connection or for "shared protection" which can be pooled for different connections? Should the protection/recovery be performed on a per-link basis or a connection's end-to-end basis or on the basis of a "sub-path" of a connection? Should the recovery paths be pre-computed (and periodically recomputed based on current network state for efficiency) or should they be dynamically discovered after a failure occurs? These methods will have different performance tradeoffs on reliability, restorability, restoration time, etc. It is envisioned that, perhaps, all of these approaches may need to co-exist in the same network because different applications may have different requirements on fault tolerance. 
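The dedicated-versus-shared question above hinges on a simple sharing rule: backup wavelengths can be pooled only by connections whose primaries cannot fail together. A minimal sketch of the standard link-disjointness test (the ring topology and node numbering are made up for illustration):

```python
def links_of(path):
    """Path given as a node list -> set of undirected links."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def can_share_backup(primary_a, primary_b):
    """Under shared protection against single link failures, two
    connections may pool backup capacity iff their primary paths are
    link-disjoint, so one failure never claims the spare twice."""
    return not (links_of(primary_a) & links_of(primary_b))

# Hypothetical 5-node network: primaries 1-2-3 and 4-5 are link-disjoint,
# so their backups can share spare capacity; 1-2-3 and 2-3-4 share link 2-3.
print(can_share_backup([1, 2, 3], [4, 5]))     # -> True
print(can_share_backup([1, 2, 3], [2, 3, 4]))  # -> False
```

A provisioning algorithm would run this check when deciding whether a new connection's backup can ride existing spare wavelengths or must reserve fresh capacity, which is where shared protection earns its utilization advantage.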
We propose to investigate the applicability of the above fault-management approaches to bandwidth provisioning for the GTL application as well as other DOE large-science applications.

3.3 Distributed Space-Time Scheduling

To support various system-level tasks of the large-science applications, it will be necessary to form interdisciplinary centers consisting of experts located at academic, government, and industrial research labs in different geographical locations. The data related to different aspects of the system will be collected, processed, analyzed, and stored at different locations. These data must be cooperatively accessed and analyzed by teams of experts, and it will be cost-effective to support such efforts over ultra high-speed networks. This requires intelligent distributed space-time scheduling of large data transfers. In this project, we consider the problem of aggregating large data files from distributed databases and address the corresponding challenges from a network-architecture perspective. We believe that an Optical Burst-Switched (OBS) network is a suitable candidate for this application. The problem is modeled as one of identifying a time-path schedule (TPS) in a graph representation of the network, as described in the following. The time-path scheduling problem (TPSP) is proven to be NP-complete [BSZ04]. Thus, we propose a Mixed Integer Linear Programming (MILP)-based approach and three heuristics to solve TPSP. We first formulate TPSP. Let us consider an OBS mesh network topology. The mesh network can be represented as a graph G(V, E), as shown in Figure 5. The vertices V represent OBS nodes, and the edges E represent the optical links connecting the OBS nodes. We assume that all the optical links have the same capacity C (say, OC-192).
For simplicity of exposition, as well as for application to a non-WDM burst-switched network, let each optical link have one wavelength only; the problem can easily be extended to incorporate WDM. Each (Genome) data warehouse is connected to an OBS node through a dedicated link of capacity C. There may be multiple data warehouses connected to one OBS node. A supercomputer is connected to its OBS node through dedicated links, so that there is no bandwidth bottleneck from the OBS node to the supercomputer. Since all of the above links are dedicated, they are not represented in the graph. Figure 5. Graph representation of the TPSP. At a certain step in the computation, the supercomputer may require data aggregated from multiple data warehouses before it resumes computation. This process is modeled as the transfer of files which must be sent from the source OBS nodes (to which the corresponding data warehouses are connected) to the destination supercomputer. It should be noted that one OBS node may be connected to several data warehouses. A query is first issued by the supercomputer to all data warehouses to determine the file size required from each warehouse. Alternatively, based on how some of these applications develop in the future, the file-size information may already be available at the supercomputer. The file size provides information on the expected transmission delay as the file is transferred from the source node to the destination. The time that it takes to transfer a file along a route, T_f, is the sum of the transmission delay, the propagation delay, and the overhead. Because the file size (s_f) is typically large (greater than 5 Gbytes, and perhaps as large as petabytes in some future applications), the transmission delay dominates T_f.
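The claim that transmission delay dominates T_f is easy to check with rough numbers. The link rate, path length, and overhead below are illustrative assumptions (OC-192 is taken as roughly 10 Gb/s):

```python
# Illustrative numbers only: OC-192 payload approximated as 10 Gb/s,
# a 3,000 km path, and a 1 ms fixed overhead.
LINK_RATE_BPS = 10e9             # ~OC-192 line rate
PROP_DELAY_S = 3_000_000 / 2e8   # 3,000 km at ~2e8 m/s in fiber = 15 ms

def transfer_time(file_bytes, overhead_s=0.001):
    """T_f = transmission delay + propagation delay + overhead."""
    transmission = file_bytes * 8 / LINK_RATE_BPS
    return transmission + PROP_DELAY_S + overhead_s

print(transfer_time(5e9))    # 5 GB file: ~4 s, dominated by transmission
print(transfer_time(1e12))   # 1 TB file: ~800 s
```

Even for the smallest files considered (5 GB), transmission delay is two orders of magnitude larger than the propagation and overhead terms combined.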
In our graph model, at each OBS node v, there exists a set of files S_v = {f_v1, f_v2, ..., f_vl} whose transfer times are pre-computed and denoted by the set T_v = {T_fv1, T_fv2, ..., T_fvl}. The OBS node to which the supercomputer is connected is modeled as d; all files are destined to d. The objective is to determine the following: 1. Route: the path through which a file should be transferred from the source to the destination. 2. Time schedule: the time at which a file has to be transmitted in a single burst so that it can be transferred through the route determined in Step 1. This is important because two files which share a link on their routes should not be transmitted at the same time, to avoid collision, due to the constraints of an OBS network described below. In an OBS network, although limited data buffering at OBS nodes is currently possible using fiber delay lines, it is inadequate for buffering very large files such as those in our case. Hence, once a data warehouse starts transmitting a file, the file must reach the destination in a single burst; there is no possibility of buffering it along the path. We assume that files cannot be fragmented. This simplifies the burst-assembly process, reduces the overhead of burst regeneration at the destination, and eliminates the possibility of errors arising from misaligned fragments. Hence, we utilize only a single path from the source to the destination, and we assume that this path contains no cycles. OBS switches do not have the ability to multiplex two different incoming data streams onto the same outgoing link. Therefore, each link can transfer only one file at a time. The aim is to minimize the total time for data aggregation; the last file to reach the destination is the bottleneck, since computation cannot begin until all the data has been accumulated.
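The no-multiplexing and single-burst constraints mean that a candidate time-path schedule is feasible only if, on every link, the time intervals of the files routed over it are pairwise disjoint. A minimal feasibility checker (the data layout is an assumption for illustration, not the MILP formulation of [BSZ04]):

```python
def schedule_feasible(schedule):
    """schedule: list of (path, start_time, transfer_time) tuples, where
    path is a node sequence.  A file occupies every link on its path for
    its whole transfer (single unfragmented burst, no buffering en route),
    and each link can carry only one file at a time."""
    per_link = {}
    for path, start, duration in schedule:
        for u, v in zip(path, path[1:]):
            link = tuple(sorted((u, v)))
            per_link.setdefault(link, []).append((start, start + duration))
    for intervals in per_link.values():
        intervals.sort()
        for (s1, e1), (s2, e2) in zip(intervals, intervals[1:]):
            if s2 < e1:          # overlap on a shared link -> collision
                return False
    return True

# Two files sharing link (B, D): serialized vs. overlapping transfers.
ok = [(['A', 'B', 'D'], 0.0, 4.0), (['C', 'B', 'D'], 4.0, 2.0)]
bad = [(['A', 'B', 'D'], 0.0, 4.0), (['C', 'B', 'D'], 1.0, 2.0)]
print(schedule_feasible(ok), schedule_feasible(bad))   # True False
```

The MILP and the heuristics described next both search over route and start-time choices while enforcing exactly this per-link disjointness.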
The two coupled dimensions of determining both the path and the time make this problem exceptionally hard and differentiate it from the machine-scheduling problems reported in the literature [Pin02]. Thus, we have formulated the problem as a Mixed Integer Linear Program (MILP) [BSZ04], which can be solved using a MILP solver such as CPLEX [CPL]. However, the size of the MILP grows rapidly with the number of files because a set of several equations is created for every pair of files. Hence, the MILP is not very efficient for solving larger problems. Therefore, we propose efficient heuristics to solve the problem, and we use the MILP only for a comparative study. We have proposed three heuristic algorithms to yield close-to-optimum solutions for TPSP, summarized as follows: LONGEST-FILE-FIRST (LFF) SCHEDULING: This heuristic is based on the intuition that the longest file (the one with the largest transfer time) is the bottleneck for scheduling, because it requires more resources in terms of the amount of time that links must be free for it to be transferred. Therefore, the LFF algorithm schedules the longest files first, so that they get priority on the network's resources and are scheduled earlier. For choosing the path over which to transfer a file, the algorithm chooses the best path among K randomly chosen paths. The overall worst-case running-time complexity of LFF is O(K r f^2), where r is the path length and f is the number of files. DISJOINT-PATH (DP) SCHEDULING: This heuristic is based on the intuition that files can be transferred along link-disjoint paths in parallel. The idea is to compute the maximum number of disjoint paths from the sources of the files to the destination d. This can be computed through an implementation of the Max-Flow algorithm [CLR01] on the following modified graph. All the links have unit capacity.
A dummy source node is connected to all the nodes which have files not yet scheduled, with link capacity equal to the number of such files. The destination is connected to a dummy destination with capacity equal to the number of files yet to be scheduled. The Max-Flow algorithm then identifies the disjoint paths as those consisting of links with unit flow. The worst-case running-time complexity of the DP heuristic is O(r^4 r f^2). MOST-DISTANT-FILE-FIRST (MDFF) SCHEDULING: This heuristic is based on the intuition that files which are most distant from the destination, in terms of number of links, occupy more links and are hence the bottleneck for scheduling. The heuristic aims at scheduling these files first, when the network is relatively resource-free. The worst-case running time is O(K r f^2). These approaches are compared through simulations on a 24-node topology in Figure 6. The Longest-File-First (LFF) heuristic performs very well when the number of files to be aggregated is small, while the Disjoint-Path (DP) heuristic should be preferred for a large number of files. Also, LFF performs close to the MILP for small networks, where the MILP can provide a solution in a reasonable computing time. We plan to extend our existing heuristic algorithms and further develop algorithms that are tailored for large-science applications. We also plan to test our algorithms in real large-science projects, e.g., GTL applications on UltraNet. Figure 6: Performance of the heuristics, with a lower bound on finish time. 3.4 Low-Delay and Fault-Tolerant Signaling and Control Plane Architecture The need for a low-delay and fault-tolerant signaling and control plane architecture for a network such as UltraNet arises for many reasons. First, supercomputers at different physical sites will be harvested to provide the computational power needed for long time-scale simulations, and these supercomputers must be tightly coordinated to ensure their effective utilization.
It is important that end-to-end message delays be minimized over the network to ensure that the supercomputers do not idle while waiting for control messages; a single second of idle time represents the loss of several teraflops of compute power [DOE03]. This end-to-end delay minimization represents a significant challenge for networking technologies. Such problems are currently addressed in a limited way in overlay networks and daemons, but highly focused research and development efforts are needed for an effective solution for this class of applications. We will identify the needs of large-science applications and propose corresponding schemes. Second, in order to implement a space-time schedule of data transfers between end hosts, supercomputers, and databases, it is necessary to have a signaling and control network that can be used to set up the schedule and to manage and control the network resources. One issue is how to implement the signaling network. Current approaches employ in-fiber, in-band techniques, which are simple but do not provide the capability to deploy sophisticated scheduling algorithms. As part of this research, we will investigate in-fiber, out-of-band signaling. To address the fault-tolerance issue, we will investigate multipath approaches for signaling and control messages. Third, in order to manage and control the network resources of the ultra-high-speed network, it is necessary to develop a very fast, reliable, and powerful control plane. The control plane must guarantee delivery of control messages with minimum delay. As part of this research, we will investigate the requirements of the control-plane architecture for large-science applications and build upon the knowledge plane architecture proposed in [CPR03].
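One simple realization of the multipath approach for fault-tolerant signaling is to duplicate each control message across link-disjoint paths and let the receiver suppress duplicates, so that a single path failure neither loses nor delays the message. A sketch under assumed message fields (a unique sequence number plus a payload; this is an illustration, not a proposed protocol design):

```python
class MultipathReceiver:
    """Accepts the first copy of each control message and drops duplicates.
    The sender (not shown) transmits one copy of every message per
    link-disjoint path; each message carries a unique sequence number."""
    def __init__(self):
        self.seen = set()
        self.delivered = []

    def on_receive(self, seq, payload):
        if seq in self.seen:
            return False          # duplicate from a slower or backup path
        self.seen.add(seq)
        self.delivered.append(payload)
        return True

rx = MultipathReceiver()
# Message 1 arrives on both paths; path A's copy of message 2 is lost.
arrivals = [(1, 'setup', 'path A'), (1, 'setup', 'path B'),
            (2, 'teardown', 'path B')]
for seq, payload, _path in arrivals:
    rx.on_receive(seq, payload)
print(rx.delivered)   # ['setup', 'teardown'] despite the loss on path A
```

The cost of this redundancy is extra control-channel bandwidth, which is small relative to the data plane of an ultra-high-speed network.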
3.5 Scalable and Fault-Tolerant Network Primitives and Intelligent Services In this research project, we will investigate how application-layer multicasting, caching, and intelligent data replication can be used to implement a high-performance network infrastructure for GTL applications. Towards this end, we will build upon the pseudo-serving paradigm proposed in [KoG99]. Pseudo-serving is a P2P file-sharing system comprising two components: a superserver and a set of pseudoservers. The former grants the latter access to files in exchange for some amount of network and storage resources, specified through a contract between the system and each user. Under normal circumstances, no resources are requested and the superserver acts as a regular server. As demand begins to exceed the superserver's ability to provide service, the superserver offers a contract to the requesting pseudoserver, obligating the pseudoserver to serve the file it will retrieve to N other requesters within T seconds. It is released from its obligations when it has serviced N other requesters or when T seconds have passed. In exchange for this resource contribution, the superserver gives the requesting pseudoserver a referral to another pseudoserver: the one closest to the requester known to contain the file, which is obligated to provide service as part of its own contractual obligations. Pseudo-serving thus provides a framework for sharing local storage, bandwidth, and even partial computation across organizational boundaries. The current application of pseudo-serving to dissipate the flash-crowd problem is somewhat limiting because what is transferred is the same file; in GTL applications there will be a small number of users who will retrieve large amounts of data. This restriction can be removed by organizing files into sets of files, or packages.
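The contract mechanics described above can be captured in a few lines. The class and field names below are hypothetical; only the release rule (serve N other requesters within T seconds, released on fulfillment or expiry) comes from the description of [KoG99]:

```python
class Contract:
    """Pseudo-serving contract: the pseudoserver agrees to serve the file
    to n_target other requesters within window_s seconds of accepting.
    It is released either by fulfilling the quota or by expiry."""
    def __init__(self, accepted_at, n_target, window_s):
        self.accepted_at = accepted_at
        self.n_target = n_target
        self.window_s = window_s
        self.served = 0

    def record_service(self, now):
        """Count one requester served, unless the contract is already released."""
        if not self.released(now):
            self.served += 1

    def released(self, now):
        fulfilled = self.served >= self.n_target
        expired = now - self.accepted_at >= self.window_s
        return fulfilled or expired

c = Contract(accepted_at=0.0, n_target=3, window_s=60.0)
for t in (5.0, 10.0, 20.0):
    c.record_service(t)
print(c.released(20.0), Contract(0.0, 3, 60.0).released(61.0))   # True True
```

The superserver's only state per pseudoserver is this small contract record, which is what makes the scheme scale under flash crowds.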
Before we see how packages work, we first examine why pseudo-serving may not work well when more than one file is requested from the super-server, which for the GTL application is a meta-data controller for all the bio-databases. Suppose there are many files on a bio-database. Nothing prevents pseudo-serving from working on a per-file basis, so that contracts are set based on the incoming rate of requests for individual files. The problem arises when this per-file rate of request is low but the cumulative rate of requests for all the files on the super-server is high. Under such circumstances, the super-server may not be able to handle the incoming stream of requests alone, yet pseudo-servers are not able to satisfy contracts set by the super-server, and so pseudo-serving is ineffective. Now, suppose files are organized into packages, where each package is a group of N files, and users are allowed to retrieve only packages. Under this arrangement, the rate of request for each package is N times the rate of request for individual files. The file holding time is therefore reduced by a factor of N. Using this packaging mechanism, contracts previously too difficult to satisfy because of their long file holding times can be made more attractive to the user. The problem with packaging, of course, is that users now need to download a package that may be significantly larger than the original file. Depending on the load on the bio-databases, from a user's point of view, retrieving packages may or may not be more attractive than retrieving only the desired file directly from the super-server. The optimal package size strikes a balance between making contracts reasonably satisfiable and providing sufficient benefit to the user in reducing the total download time. This topic will be investigated as part of this research. 4.
Collaboration and Applications This project will be executed in close collaboration with the DOE UltraScienceNet project at Oak Ridge National Laboratory (ORNL), in particular with the UltraScienceNet PIs, Dr. Nagi Rao, Dr. Bill Wing, and their colleagues. The PI of our proposed project, Professor Biswanath Mukherjee, has been cooperating with Dr. Nagi Rao, Dr. Bill Wing, and others over the past 1.5 years towards defining the research challenges for the bandwidth-provisioning problems of DOE large-science applications. In fact, Professor Mukherjee was invited by Dr. Nagi Rao and Dr. Bill Wing to co-chair (along with Dr. Wing) the "Provisioning Group" of the "DOE Workshop on Ultra-High Speed Transport Protocols and Provisioning for Large Scale Science Applications" held at Argonne National Laboratory in April 2003 [DOE03]. Professor Mukherjee made important contributions to the workshop by co-leading the discussions on provisioning. He then contributed to the final workshop report through his ideas on: (1) dynamic provisioning of high-capacity pipes of various bandwidth granularities, ranging from multiple wavelengths to a full wavelength to sub-wavelength capacity; (2) how to employ generalized multi-protocol label switching (GMPLS) to facilitate such dynamic provisioning; (3) survivable bandwidth provisioning; and (4) a separate control channel with deterministic or bounded delay and jitter for control-loop operations. We propose to build upon the relationship that has been established between Professor Mukherjee and the UltraScienceNet team. Specifically, we plan to extend it to a true research collaboration to ensure that our research team at UC Davis is working on important research problems (w.r.t. the missions of the UltraScienceNet team and the DOE). We anticipate that our research results will complement (and extend the knowledge gained from) the UltraScienceNet, and our research results can be tested on the UltraScienceNet platform as well.
One of our Co-PIs, Professor Dipak Ghosal, also has a long working relationship with Dr. Nagi Rao. They share common interests in transport-layer protocols and application-layer research problems. Our collaboration will build upon this relationship as well. It is also worth mentioning that Professors Mukherjee and Ghosal have an ongoing collaborative research project with Dr. Wu-Chun Feng of Los Alamos National Laboratory through a UC-LANL seed-grant project entitled "Wide-Area Transport and Signaling Protocols for Genome-to-Life (GTL) Applications" ($45,214; 9/1/03-8/31/04). We propose to exploit this collaboration as well for successful execution of the proposed project. Our additional collaborators in the DOE community include Dr. Rose P. Tsang of Sandia National Laboratory, with whom Professor Ghosal has a research collaboration. This relationship will also be utilized, if necessary, for the proposed project. The following are the expected outcomes of this proposed research: 1. A report on the networking issues for the DOE GTL program. This report will discuss the specific requirements of the GTL program and the corresponding requirements on the networking infrastructure and protocols. 2. This research will develop various heuristics to perform space-time scheduling of large file transfers that minimize the total data-aggregation delay. 3. The research will compare and contrast various methods to mitigate the receiver-side congestion that will arise when large data is simultaneously transferred from the various bio-databases to a client. The analysis will be done using a combination of simulation and analytical models. 4. The research will develop the requirements for a low-delay and fault-tolerant signaling and control plane architecture. 5. This research will investigate traffic grooming algorithms for efficient network bandwidth utilization. 6. We will propose reliable network provisioning strategies that meet the requirements of large-science applications. 7.
The research will investigate and design a framework through which users can share local storage and partial computation. The applicability of application-layer multicasting, caching, and data replication to GTL applications will also be determined. 8. We will fully test the proposed algorithms and protocols on UltraNet in collaboration with UltraNet researchers. 5. Statement of Work The main components of this proposal are: (a) traffic grooming and bandwidth provisioning; (b) survivability and fault-tolerant network provisioning; (c) distributed space-time scheduling of large data transfers over ultra-high-speed networks; (d) low-delay and fault-tolerant signaling and control plane architecture; and (e) scalable network primitives and services. Research, engineering, and application milestones:
Year 1:
Identify and analyze suitable traffic grooming algorithms for GTL applications.
Develop and extend existing heuristic space-time scheduling algorithms under dynamic network states.
Propose and study suitable network survivability strategies for large-science applications.
Year 2:
Technology transfer to UltraNet
o Test proposed space-time scheduling and grooming algorithms on UltraNet (at ORNL) with the collaboration of UltraNet researchers.
Develop low-latency signaling strategies to control GTL simulations at remote DOE terascale computing facilities.
Develop a robust application-layer multicasting framework for simultaneous downloading of very large datasets to multiple clients.
Year 3:
Technology transfer to UltraNet
o Test proposed low-latency signaling strategies and application-layer multicasting algorithms on UltraNet (at ORNL) with the collaboration of UltraNet researchers.
o Fully test the algorithms developed for a set of large-science applications, in addition to GTL.
Develop the requirements for a fault-tolerant, in-fiber, out-of-band signaling architecture.
Develop a scalable control-plane architecture for UltraNet, taking into account the requirements of the various large-science applications. 6. References [ACQ02] V. Anand, S. Chauhan, and C. Qiao, "Sub-path protection: A new framework for optical layer survivability and its quantitative evaluation," Dept. of Computer Science and Engineering, State University of New York at Buffalo, Tech. Report 2002-01, Jan. 2002. [BBM03] A. Bassi, M. Beck, T. Moore, J. S. Plank, M. Swany, R. Wolski, and G. Fagg, "The Internet Backplane Protocol: A Study in Resource Sharing," Future Generation Computing Systems, vol. 19, no. 4, pp. 551-561, Elsevier, May 2003. [BSZ04] A. Banerjee, N. Singhal, J. Zhang, C. N. Chuah, and B. Mukherjee, "A Time-Path Scheduling Problem (TPSP) for Aggregating Large Data Files from Distributed Databases using an Optical Burst-Switched Network," accepted for presentation and publication in Proc. IEEE International Conference on Communications (ICC 2004), Paris, France. [ClG02] M. Clouqueur and W. D. Grover, "Availability analysis of span-restorable mesh networks," IEEE J. Selected Areas in Communications, vol. 20, pp. 810-821, May 2002. [CLR01] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, "Introduction to Algorithms," Second Edition, MIT Press, 2001. [CPL] ILOG CPLEX. http://www.ilog.com/products/cplex/product/suite.cfm [CPR03] D. D. Clark, C. Partridge, J. C. Ramming, and J. Wroclawski, "A Knowledge Plane for the Internet," Proc. ACM SIGCOMM 2003. [DCF03] A. Darling, L. Carey, and W. Feng, "The Design, Implementation, and Evaluation of mpiBLAST," ClusterWorld 2003 (Best Paper Award), June 2003. [DOE03] DOE Workshop on Ultra-High Speed Transport Protocols and Provisioning for Large Scale Science Applications, Argonne National Laboratory, Argonne, IL, 2003.
http://www.csm.ornl.gov/ghpn/wk2003_workshops.html [Flo01] S. Floyd, "HighSpeed TCP for Large Congestion Windows," IETF Internet Draft, ICSI Center for Internet Research, Berkeley, California. http://www.icir.org/floyd/papers/draft-floyd-tcp-highspeed-01.txt [FTU02] A. Fumagalli, M. Tacca, F. Unghvary, and A. Farago, "Shared path protection with differentiated reliability," in Proc. IEEE ICC, pp. 2157-2161, April 2002. [GTL03] DOE Genomes to Life Program. http://doegenomestolife.org/ [Gro99] W. D. Grover, "High availability path design in ring-based optical networks," IEEE/ACM Trans. Networking, vol. 7, pp. 558-574, Aug. 1999. [HoM02] P.-H. Ho and H. Mouftah, "A framework for service-guaranteed shared protection in WDM mesh networks," IEEE Communications Mag., vol. 40, pp. 97-103, Feb. 2002. (Note: there exists a vast literature on networking, optics, and applications related to this proposal; far from being complete, this reference list is only a small sample highlighting the presentation.) [Jan00] J. Jannotti et al., "Overcast: Reliable Multicasting with an Overlay Network," in Proc. Fourth Symposium on Operating Systems Design and Implementation (OSDI 2000), San Diego, California, USA, 2000. [KoG99] K. Kong and D. Ghosal, "Mitigating Server Side Congestion Through Pseudo-Serving," IEEE/ACM Transactions on Networking, vol. 7, no. 4, 1999. [Muk97] B. Mukherjee, "Optical Communication Networks," McGraw-Hill, pp. 259-288, 1997. [Ora01] A. Oram (ed.), "Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology," O'Reilly & Associates, 2001. [OZM02] C. Ou, H. Zang, and B. Mukherjee, "Sub-path protection for scalability and fast recovery in optical WDM mesh networks," in Proc. OFC, paper ThO6, Mar. 2002. [Pin02] M. Pinedo, "Scheduling: Theory, Algorithms, and Systems," Second Edition, Prentice Hall, 2002. [PBB01] James S. Plank, Alexander Bassi, Micah Beck, Terence Moore, D.
Martin Swany, and Rich Wolski, "Managing Data Storage in the Network," IEEE Internet Computing, vol. 5, no. 5, pp. 50-58, September/October 2001. [PFD03] PFLDnet, First International Workshop on Protocols for Fast Long-Distance Networks, CERN, Geneva, Switzerland, 2003. [RaM02] S. Ramamurthy and B. Mukherjee, "Survivable WDM mesh networks, Part II -- Restoration," in Proc. IEEE ICC, pp. 2023-2030, June 1999. (Also, IEEE/OSA Journal of Lightwave Technology, to appear, 2002.) [SCTP] Stream Control Transmission Protocol (SCTP). http://www.sctp.de [STP03] Scheduled Transfer Protocol (ST), High-Performance Parallel Interface Standards Group. http://www.hippi.org/cST.html [ToN94] M. To and P. Neusy, "Unavailability analysis of long-haul networks," IEEE J. Selected Areas in Communications, vol. 12, pp. 100-109, Jan. 1994. [WSM02] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, "Path vs. sub-path vs. link restoration for fault management in IP-over-WDM networks: Performance comparisons using GMPLS control signaling," IEEE Communications Mag., vol. 40, pp. 2-9, Nov. 2002. [WSM02FOEC] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, "Fault monitoring and restoration in optical WDM networks," in Proc. National Fiber Optic Engineers Conference, Sep. 2002. [ZhM02] K. Zhu and B. Mukherjee, "On-line Provisioning Connections of Different Bandwidth Granularity in WDM Mesh Networks," Proc. IEEE/OSA Optical Fiber Communication Conference (OFC '02), Anaheim, CA, March 2002. [ZhS00] D. Zhou and S. Subramaniam, "Survivability in optical networks," IEEE Network, vol. 14, pp. 16-23, Nov./Dec. 2000. 7. Appendix A: Budget The cost of this project is $K in the first year, $K in the second year, and $K in the third year (for FY2004 through FY2006).
Cost for each year includes graduate-student assistantships and tuition fees for five graduate students, and one month of salary for each PI. In each year, one full-time post-doctoral fellow will be supported at UC Davis. The post-doctoral fellow will be involved in all aspects of the research plan and will work closely with the graduate students and the PIs. The budget includes travel money for DOE project meetings, meetings for technology transfer to UltraNet, and attendance at conferences and workshops to present relevant research results. Finally, the budget also includes equipment money to buy desktop PCs/workstations for the graduate students and the postdoc. Graduate student fees for Research Assistants include non-California-resident tuition. The University of California, Davis campus does not have an "Out of State" or "Non-Resident" Tuition Remission program; however, we are requesting approval to charge the tuition for 3 students in the first year. Technical-support salary is requested for computer technical support directly related to the scientific research objectives of this project. This will include installation, troubleshooting, and maintenance of specialized software and/or networking capabilities required for simulation software such as ns, as well as installation of hardware and/or research instrumentation required to meet the scientific research objectives of this project. 8. Appendix B: Biographical Information
BISWANATH MUKHERJEE
Department of Computer Science, University of California, Davis, CA 95616, USA
Phone: +1-530-752-4826; FAX: +1-530-752-4767
Electronic mail: mukherjee@cs.ucdavis.edu
WWW: http://networks.cs.ucdavis.edu/~mukherje/
EDUCATION
1987 Ph.D., Electrical Engineering, University of Washington, Seattle
1980 B.Tech. (Hons.), Electronics and Electrical Communications Engg., IIT Kharagpur (India)
ACADEMIC APPOINTMENTS
1995- Professor/Computer Science, University of California, Davis
1997-00 Department Chair/Computer Science, University of California, Davis
1992-95 Associate Professor/Computer Science, University of California, Davis
1987-92 Assistant Professor/Computer Science, University of California, Davis
1984-87 Research & Teaching Assistant/Electrical Engineering, University of Washington
CURRENT RESEARCH INTERESTS
Lightwave Networks; Wireless Networks; Network Security
AWARDS
1984-85 GTE Teaching Fellowship, University of Washington
1986-87 General Electric Foundation Fellowship, University of Washington
1991 Co-winner, Best Paper Award, 14th National Computer Security Conference, for the paper "DIDS (Distributed Intrusion Detection System) -- Motivation, Architecture, and an Early Prototype"
1994 Co-winner, Paper Award, 17th National Computer Security Conference, for the paper "Testing Intrusion Detection Systems: Design Methodologies and Results from an Early Prototype"
RESEARCH PUBLICATIONS
Please visit B. Mukherjee's website (http://networks.cs.ucdavis.edu/~mukherje/) for details on his research publications. A. List of up to Five Publications Most Closely Related to Proposed Project: 1. B. Mukherjee, Optical Communication Networks, New York: McGraw-Hill, July 1997. 2. B. Mukherjee, "WDM-Based Local Lightwave Networks -- Part I: Single-Hop Networks; Part II: Multihop Networks," IEEE Network, vol. 6: Part I: no. 3, pp. 12-27, May 1992; Part II: no. 4, pp. 20-32, July 1992. (Nominated for IEEE and IEEE Communications Society Paper Awards. Also, a revised/updated version was published in Encyclopedia for Telecommunications as an Invited Article.) 3. L. Sahasrabuddhe and B.
Mukherjee, "Light-Trees: Optical Multicasting for Improved Performance in Wavelength-Routed Networks," IEEE Communications Magazine, vol. 37, no. 2, pp. 67-73, Feb. 1999. 4. D. Datta, B. Ramamurthy, H. Feng, J. P. Heritage, and B. Mukherjee, "Impact of transmission impairments on the teletraffic performance of wavelength-routed optical networks," IEEE/OSA Journal of Lightwave Technology, vol. 17, no. 10, pp. 1713-1723, Oct. 1999. 5. B. Mukherjee, "WDM Optical Communication Networks: Progress and Challenges" (Invited Paper), IEEE Journal on Selected Areas in Communications (Special Issue on "Protocols and Architectures for Next Generation Optical WDM Networks"), vol. 18, no. 10, pp. 1810-1824, Oct. 2000. B. List of up to Five Other Significant Publications: 1. B. Mukherjee and J. S. Meditch, "The p(i)-persistent protocol for unidirectional broadcast bus networks," IEEE Transactions on Communications, vol. 36, pp. 1277-1286, Dec. 1988. 2. B. Mukherjee and J. S. Meditch, "Integrating voice with the p(i)-persistent protocol for unidirectional broadcast bus networks," IEEE Transactions on Communications, vol. 36, pp. 1287-1295, Dec. 1988. 3. B. Mukherjee, D. Banerjee, S. Ramamurthy, and A. Mukherjee, "Some principles for designing a wide-area optical network," IEEE/ACM Transactions on Networking, vol. 4, pp. 684-696, Oct. 1996. (Originally appeared in IEEE Infocom '94; it was selected by the IEEE Infocom '94 conference program committee as one of the top few papers (out of 449 submissions), recommended to the IEEE/ACM Transactions on Networking, and published in the journal after its own independent review.) 4. B. Mukherjee, L. T. Heberlein, and K. N. Levitt, "Network intrusion detection," IEEE Network, vol. 8, no. 3, pp. 26-41, May/June 1994. 5. B. Guha and B. Mukherjee, "Network security via reverse engineering of TCP code: Vulnerability analysis and proposed solutions," IEEE Network, vol. 11, no. 4, pp. 40-49, July/August 1997.
PROFESSIONAL SERVICE
Editor, IEEE/ACM Transactions on Networking (1994-2000)
Editor-at-Large, Optical Communications and Networking, IEEE Communications Society (1999-2000)
Technical Program Chair, IEEE INFOCOM '96 Conference
Member of the Editorial Board and Senior Technical Editor, IEEE Network (1997-2000)
Member of the Editorial Board, Journal of High-Speed Networks
Member of the Editorial Board, ACM/Baltzer Wireless Information Networks (WINET) journal
Member of the Editorial Board, Photonic Network Communications journal
Member of the Editorial Board, Optical Networks journal
Proposal Evaluation Panel: National Science Foundation (1993-present)
NSF Panels/Workshops: (1) All-Optical Networks (Jan. 93); (2) Optical Commun. and Networks (March 94); (3) CISE International Cooperation (Oct. 97); (4) Ultra-High-Capacity Optical Networks (Oct. 02)
Member of the Technical Program Committee, IEEE INFOCOM 89-90, 92-99 conferences; IEEE GLOBECOM 92; ACM SIGCOMM 93; (and many other conferences)
Reviewer of proposals for: National Science Foundation; NASA/HPCC; State of California MICRO Program; Hong Kong Research Grants Council; Govt. of Singapore; Israel Science Foundation
Founder, Chairman, and Chief Technology Officer, Summit Networks, San Jose, CA (Feb. '00 - Aug. '02): a startup specializing in building optical networking equipment
Member, Board of Directors, IPLocks, San Jose, CA (Feb. '02 - present): building computer security products
Graduate Students and Postdoctoral Researchers Supervised: Subrata Banerjee, PhD (Cisco; previously Director of Software, Accordion Networks; Asst. Prof. at Stevens Tech, Phillips Research); Feiling Jia, PhD (Atoga Systems, previously at SBC/Pacific Bell); Shao-kong Kao, PhD (Foundry Networks; previously Sun Microsystems, Alidian); Michael S. Borella, PhD (3Com; previously Asst. Prof.
at DePaul Univ.); Dhritiman Banerjee, PhD (VP/cofounder of Internet Photonics; previously at Bell Labs./Lucent); Jason Iness, PhD (Intel); Byrav Ramamurthy, PhD (Asst. Prof. at Univ. of Nebraska); S. Ramu Ramamurthy, PhD (CIENA; previously at Tellium and Bellcore); Jason Jue, PhD (Asst. Prof. at University of Texas--Dallas); Laxman H. Sahasrabuddhe, PhD (SBC; previously at Amber Networks); Nick Puketza, PhD (Lecturer at UC Davis); Xiaoxin Wu, PhD (postdoc at Purdue; previously at Arraycom); Wushao Wen, PhD (CIENA; previously at Mahi Networks); Hui Zang, PhD (Sprint Adv. Technology Lab.); Shun Yao, PhD (postdoc at UC Davis); Jian Wang, PhD (Asst. Prof. at Florida Intl. Univ.); L. T. Heberlein, MS (Net Squared); Justin Doak, MS (LANL); Kui Zhang, MS (Cisco); Biswaroop Guha, MS (Hewlett-Packard); Kirk Bradley, MS (SRI); plus 11 other MS degrees. Currently supervising approx. 13 graduate students, mostly PhDs.
PI's PhD Advisor: Professor James S. Meditch, University of Washington, Seattle

DIPAK GHOSAL
Department of Computer Science, University of California, Davis, CA 95616
E-mail: ghosal@cs.ucdavis.edu
Tel.: (530) 754-9251; Fax: (530) 752-4767
WWW: http://networks.cs.ucdavis.edu/~ghosal

Education
Post-Doctoral Studies, Computer Science, Institute for Advanced Computer Studies, University of Maryland, USA, September 1990.
Ph.D., Computer Science, The Center for Advanced Computer Studies, University of Louisiana, USA, July 1988.
M.Sc. (Engg.), Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore, India, December 1985.
B.Tech., Dept. of Electrical Engineering, Indian Institute of Technology, Kanpur, India, May 1983.

Professional Experience
July 1999 - Present: Associate Professor, Department of Computer Science, University of California, Davis, CA 95616.
January 1996 - June 1999: Assistant Professor, Department of Computer Science, University of California, Davis, CA 95616.
September 1990 - December 1995: Member of the Technical Staff, Bell Communications Research, Red Bank, New Jersey 07701, USA.
September 1988 - August 1990: Research Associate, Institute for Advanced Computer Studies, The University of Maryland, College Park, MD 20742, USA.
July 1986 - July 1988: Research Assistant, The Center for Advanced Computer Studies, University of Louisiana, Lafayette, LA 70504, USA.
August 1983 - December 1985: Research Fellow, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India.

Recent Research Publications
Julee Pandya, Prasant Mohapatra, and Dipak Ghosal, "Asymptotic Analysis of a Peer Enhanced Cache Invalidation Scheme," WiOpt'04: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, March 24-26, 2004, University of Cambridge, UK.
Stephen Mueller, Rose P. Tsang, and Dipak Ghosal, "Multipath Routing in Mobile Ad Hoc Networks: Issues and Challenges" (invited paper), to appear in Lecture Notes in Computer Science, 2004.
Dipak Ghosal, Benjamin Poon, and Keith Kong, "P2P Contracts: A Framework for Resources and Service Exchange," accepted for publication in the special issue of Future Generation Computer Systems, 2004.
S. Kovvuri, V. Pandey, B. Mukherjee, D. Ghosal, and D. Sarkar, "A Call-Admission Control (CAC) Algorithm for Providing Guaranteed QoS in Cellular Networks," Intl. Journal of Wireless Information Networks, 2003. {Preliminary version: S. Kovvuri, V. Pandey, D. Ghosal, B. Mukherjee, and D. Sarkar, "A call-admission control (CAC) algorithm for providing guaranteed QoS in cellular networks," Proc., IEEE Wireless Access Systems, San Francisco, CA, Dec. 2000.}
W. Wen, B. Mukherjee, S.-H. Gary Chan, and D. Ghosal, "LVMSR: An efficient algorithm to multicast layered video," Computer Networks, March 2003. {Preliminary version: W. Wen, S.-H. Gary Chan, D. Ghosal, and B.
Mukherjee, "LVMSR: An efficient algorithm to multicast layered video," Proc., IEEE ICC 2000 conference, New Orleans, LA, pp. 254-258, June 2000.}
B. Reynolds and D. Ghosal, "STEM: Secure Telephony Enabled Middlebox," IEEE Communications Magazine, Special Issue on Security in Telecommunication Networks, October 2002.
J. Burns and D. Ghosal, "Design and Analysis of a New Algorithm for Automatic Detection and Control of Media-Stimulated Focused Overload," to appear in Telecommunication Systems, 2002.
J. Abramson, X.-Y. Fang, and D. Ghosal, "Analysis of an Enhanced Signaling Network for Scalable Mobility Management in Next Generation Wireless Networks," IEEE GLOBECOM, November 2002.

Professional Service
Served on many NSF and UC Core panels
Program Committee Member: 1995 Distributed Computing Conference; INFOCOM 1995-1997, 2000, 2001, 2003; Performance 1996; SDPS 1996; MASCOTS 1994, 2001
Referee for NSF proposals, IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Computer Magazine, and IEEE Transactions on Software Engineering
Member of IEEE Computer Society, IEEE Communications Society, and ACM

Grants and Awards
1998-1999: MICRO Grant. Title: "Emerging Customer Data Network Management" (industry support committed from SBC). PIs: Biswanath Mukherjee and Dipak Ghosal.
1997-2002: NSF CAREER Award. Proposal title: "A Career Development Plan for Research and Education in High Speed Networks." PI: Dipak Ghosal.
1998-2003: NSF Award. Proposal title: "Complementing Internet Caching with Pseudo-serving to Mitigate Network Congestion." PIs: Dipak Ghosal and Louis S. Hakimi.
2002-2003: HP Technology Award, Mobile Technology Solutions Grant. PIs: Prasant Mohapatra and Dipak Ghosal.
2003-2004: Sandia Labs.
Title: "Application of Mobile Ad Hoc and Sensor Networks for Facilities Protection." PI: Dipak Ghosal.
2003-2004: Los Alamos National Laboratory. Title: "Wide-Area Transport and Signaling Protocols for Genomes To Life (GTL) Applications." PIs: Biswanath Mukherjee, Dipak Ghosal, and Wu-Fung Chung.
2003-2005: NSF Award. Proposal title: "Security Architecture for IP Telephony." PIs: Dipak Ghosal and S. Felix Wu.
2003-2004: California Institute for Energy Efficiency (CIEE). Proposal title: "Enabling Demand Response with Vehicular Mesh Networks (VMesh)." Status: pending. PIs: Chen-Nee Chuah, Dipak Ghosal, and Michael H. Zhang.

Patents/Inventions
Keith Kong and Dipak Ghosal, "A Self-Scaling Scheme for Avoiding Server-Side Congestion in the Internet," approved October 2002, US Patent 6,473,401 B1.

Names of Graduate and Post-Graduate Advisors, Advisees, and Collaborators
Ph.D. advisor: Dr. Laxmi N. Bhuyan, Professor, Department of Computer Science, University of California, Riverside.
Post-doctoral advisor: Dr. Satish K. Tripathi, Dean of Engineering and Johnsons Professor of Engineering, University of California, Riverside.
Advisees: Jennifer Yick, Howard Cheung, Archana Bhratidhasan, Vijay Ponduru, Brennen Reynolds, Julee Pandya, Jeremy Abramson, James Xiao-yan Fang, Keith Kong, Vijoy Pandey, Sujatha Balaraman, Xiaoxin Wu, Raja Mukhopadhaya, Ashok Swamy, Arijit Mukherji, Narana Kannappan.
Research Collaborators: Biswanath Mukherjee, Randy Katz, Rajeev Motwani, Matthew Caesar, T. V. Lakshman, Tsong-Ho Wu, Gopal Mempat, Jonathan Chao, Debanjan Saha, Satish Tripathi, Erol Gelenbe, Giuseppe Serazzi.

XIN LIU
Department of Computer Science, University of California, Davis, CA 95616
E-mail: liu@cs.ucdavis.edu
Tel.: (530) 754-6907; Fax: (530) 752-4767
WWW: http://www.cs.ucdavis.edu/~liu

EDUCATION
Ph.D. (2002), Electrical & Computer Engineering, Purdue University
M.S. (1997), Electrical Engineering, Xi'an Jiaotong University
B.S. (1994), Electrical Engineering, Xi'an Jiaotong University

ACADEMIC APPOINTMENTS
2003 - Present: Assistant Professor, Computer Science, University of California, Davis
2002-2003: Post-doctoral Research Associate, Univ. of Illinois, Urbana-Champaign

CURRENT RESEARCH INTERESTS
Wireless Networks; Network Security

AWARDS
2003 Best Paper Award, Computer Networks (Elsevier) Journal, for the paper "A Framework for Opportunistic Scheduling in Wireless Networks."

RECENT RESEARCH PUBLICATIONS
1. X. Liu, E. K. P. Chong, and N. B. Shroff, "A Framework for Opportunistic Scheduling in Wireless Networks," Computer Networks, vol. 41, no. 4, pp. 451-474, March 2003.
2. X. Liu, E. K. P. Chong, and N. B. Shroff, "Opportunistic Transmission Scheduling with Resource-Sharing Constraints in Wireless Networks," IEEE Journal on Selected Areas in Communications, vol. 19, no. 10, pp. 2053-2064, October 2001.
3. X. Liu, E. K. P. Chong, and N. B. Shroff, "Joint Scheduling and Power-Allocation for Interference Management in Wireless Networks," Proceedings of the 2002 IEEE Vehicular Technology Conference, Vancouver, Canada, September 2002, vol. 3, pp. 1892-1896.
4. X. Liu, E. K. P. Chong, and N. B. Shroff, "Efficient Scheduling in Wireless Networks," Proceedings of the 2001 IEEE INFOCOM, Alaska, April 2001, vol. 2, pp.
5. X. Liu and R. Srikant, "The Timing Capacity of Single-server Queues with Multiple Input and Output Terminals," to appear in the proceedings of the DIMACS Workshop on Network Information Theory, 2004.

PROFESSIONAL SERVICE
Member, Technical Program Committee, IEEE INFOCOM 2003-2004
Referee for IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Computer Magazine
Member of IEEE Computer Society, IEEE Communications Society, and ACM

GRADUATE AND POSTDOCTORAL ADVISORS
Edwin K. P. Chong, Dept. of Elec. & Comp. Engr., Colorado State University
Ness B. Shroff, Dept. of Elec. & Comp.
Engr., Purdue University
R. Srikant, Dept. of Elec. & Comp. Engr., University of Illinois