D R A F T - Computer Science - the University of California, Davis

Control and Provisioning of Ultra-High Speed Networks
for Large Science Applications
Principal Investigator: Biswanath Mukherjee
Co-Principal Investigators: Dipak Ghosal and Xin Liu
Department of Computer Science
University of California
Davis, CA 95616
E-mail: mukherjee@cs.ucdavis.edu
Phone: +1-530-752-4826; +1-530-752-7004
Fax: +1-530-752-4767
Submitted to:
Office of Science
Notice DE-FG01-04ER04-03
High-Performance Network Research:
Scientific Discovery through Advanced Computing (SciDAC) and
Mathematical, Informational, and Computational Sciences (MICS)
Program Manager:
Dr. Thomas D. Ndousse
Mathematical, Informational, and Computational Sciences Division
Germantown Bldg/SC-31
Office of Science
U.S. Department of Energy
1000 Independence Avenue, SW
Washington, DC 20858-1290
Email: tndousse@sc.doe.gov
Phone: +1-301-903-9960,
Fax: +1-301-903-7774
Table of Contents
1. DOE LARGE SCIENCE APPLICATIONS
2. CURRENT STATE OF THE NETWORKS
3. RESEARCH PLAN
3.1 TRAFFIC GROOMING AND BANDWIDTH PROVISIONING
3.2 SURVIVABILITY AND FAULT-TOLERANT NETWORK PROVISIONING
3.3 DISTRIBUTED SPACE TIME SCHEDULING
3.4 LOW LATENCY AND FAULT-TOLERANT SIGNALING PLANE
3.5 SCALABLE AND FAULT-TOLERANT NETWORK PRIMITIVES AND INTELLIGENT SERVICES
4. COLLABORATION AND APPLICATIONS
5. STATEMENT OF WORK
6. REFERENCES
7. APPENDIX A: BUDGET
8. APPENDIX B: BIOGRAPHICAL INFORMATION
1. DOE Large Science Applications
The next generation supercomputers hold an enormous promise for meeting the demands of a number of large-scale
scientific computations from fields as diverse as earth science, climate modeling, astrophysics, fusion energy
science, molecular dynamics, nanoscale materials science, and genomics (see Table 1 [DOE03] for a list of
applications and their characteristics). Among the DOE-sponsored large-science applications, a specific example is
the Genomes To Life (GTL) Program. The goal of GTL is to use DNA sequences of microbes and higher
organisms, including humans, as starting points to systematically answer questions related to the fundamental
underlying processes of living systems. Towards this end, the key goals of the GTL program [GTL03] are to (1)
identify the protein machines that carry out critical life functions, (2) characterize the gene regulatory networks that
control these machines, (3) explore the functional repertoire of complex microbial communities in their natural
environments to provide a foundation for understanding and using their remarkably diverse capabilities to address
DOE missions, and (4) develop the computational capability to integrate and understand these data and begin to
model complex biological systems.
Table 1 Characteristics of various large-scale science applications [DOE03].
Science Areas | Current End2End Throughput | 5 Years End2End Throughput | 5-10 Years End2End Throughput | General Remarks
High Energy Physics | 0.5 Gbps E2E | 100 Gbps E2E | 1.0 Tbps | high throughput
Climate Data & Computations | 0.5 Gbps E2E | 160-200 Gbps | n Tbps | high throughput
SNS NanoScience | does not exist | 1.0 Gbps steady state | Tbps & control channels | remote control & high throughput
Fusion Energy | 500MB/min (burst) | 500MB/20sec (burst) | n Tbps | time critical transport
Astrophysics | 1TB/week | N*N multicast | 1TB+ & stable streams | computational steering & collaborations
Genomics Data & Computations | 1TB/day | 100s users | Tbps & control channels | high throughput & steering
Consider the scenario in which a user participating in a genomics research project is running software like
mpiBLAST [DCF03], which, at different stages of the computation, needs to make repeated accesses to bio-databases that are scattered all over the country. These databases are very large, typically several gigabytes in
size, and are growing at a rate faster than Moore’s Law [DOE03], i.e., doubling every 12 months rather than every 18
months. The research groups can currently install local copies of these large databases to save the data transfer
time. However, as the amount of bio-data increases exponentially due to the advanced capabilities in analytical
technologies for biology, it will soon become unrealistic to keep local copies of all bio-databases in every single
biology lab. As a result, from time to time (during the computation), large chunks of data will be downloaded from
a number of different bio-databases. In addition, future applications may also require distributed collaborative
visualization, remote computational steering, and remote instrument control [DOE03]. This poses important and
challenging networking research and development issues, as highlighted in the workshop report [DOE03]:
An ultra high-performance network with powerful and flexible provisioning and transport
modalities is needed to meet the demand of the DOE large-scale science application.
Dynamic provisioning of ultra high-speed networks is identified as a critical area of research for networking
technologies of DOE large-scale science projects. In this proposal, we focus on the control and provisioning of ultra
high-speed networks. To elaborate, we will address the following challenges:
1. Traffic Grooming and Bandwidth Provisioning: Fiber-optic technology is the dominant choice for
building and operating long-haul backbone networks because of fiber's enormous bandwidth capacity. A
single strand of fiber can support 160 wavelength channels, each operating at 10 Gbps today using
commercial off-the-shelf components (and extendable to 320 channels and 40 Gbps/channel in the
foreseeable future). However, not all network nodes or interfaces (e.g., bio-databases, supercomputer
interfaces, or other large-science DOE applications) may need such a large capacity. Therefore, how to
efficiently provision high-capacity pipes of diverse bandwidth granularities (perhaps ranging from several
wavelength channels to a single wavelength to sub-wavelength channel capacity) between network nodes
and interfaces is a very important problem, and is known as the traffic-grooming problem. Concisely,
traffic grooming refers to the mechanisms for intelligently aggregating/de-aggregating and switching
lower-speed traffic streams between higher-capacity trunks (such as wavelength channels). Based on our
preliminary work, we plan to further develop grooming strategies for large-science applications.
2. Fault-tolerant Network Provisioning: Reliability and fast restoration are highly desirable features of a
network designed to support DOE large-science applications. Noting the huge capacity of a fiber and the
fact that network failures (particularly fiber cuts) do occur more often than we wish, it is imperative that
excellent protection and restoration schemes be designed in the next-generation UltraScienceNet. Given
the diversity of bandwidth granularities of the various high-capacity pipes, there are several additional
research challenges, e.g., should each bandwidth pipe be protected separately or should protection be set up
at wavelength channel levels? Should the spare capacity be set up for "dedicated protection" for a
connection or for "shared protection" which can be pooled for different connections? Should the
protection/recovery be performed on a per-link basis or a connection's end-to-end basis or on the basis of a
"sub-path" of a connection? Should the recovery paths be pre-computed (and periodically recomputed
based on current network state for efficiency) or should they be dynamically discovered after a failure
occurs? These methods will have different performance tradeoffs on reliability, restorability, restoration
time, etc. It is envisioned that, perhaps, all of these approaches may need to co-exist in the same network
because different applications may have different requirements on fault tolerance. We propose to
investigate the applicability of the above fault-management approaches on bandwidth provisioning for the
GTL application as well as other DOE large-scale science applications.
3. Space-Time Scheduling of Large Data Transfers: Aggregating large data files from
distributed databases and/or terascale computing facilities will be a common task in many large-science
applications. This requires intelligent distributed space-time scheduling for large data transfers. In this
project, we consider the problem of aggregating large data files from distributed databases and address the
corresponding challenges from a network architecture perspective. The objective is to minimize
the total time delay for data aggregation. Having to determine both the path (space) and the
time makes the problem difficult and differentiates it from the machine-scheduling problems that have
been reported in the literature [Pin02]. We have formulated the problem as a Time-Path Scheduling Problem
(TPSP). We have shown that TPSP is NP-complete and developed heuristic algorithms for efficient scheduling.
In this project, we will extend our previous work and further develop scheduling strategies suitable for
large-science applications.
4. Low-delay and Fault-tolerant Signaling and Control Plane Architecture: The scheduled transfer of
large data sets will require a signaling and control plane architecture that can be used to set up the schedule as
well as manage and control the network resources [BBM03]. A key aspect of such an architecture will be to
minimize the end-to-end delay of the signaling and control messages. In the context of genomics
applications, low-delay requirements also arise in sending control messages to supercomputers. The
prediction or modeling tasks for biological sciences, such as the simulation of dynamics of bio-molecules
over large time-scales [DOE03] will be carried out in supercomputers at different physical sites and will
require data to be transferred from bio-databases to supercomputers and various inputs and control
messages to be processed from the user running the simulations. To provide the computational power
needed for these long time-scale simulations, tasks must be very tightly coordinated to ensure the effective
utilization of the supercomputers. Thus, it is important that end-to-end message delays be minimized over
the networks to ensure that the supercomputers do not idle waiting for control messages. We will explore
in-fiber-out-of-band signaling architecture and investigate the use of redundant signaling paths to meet the
low-delay and fault-tolerant requirements.
5. Scalable Network Primitives and Services: High-performance networking consists not only of the
infrastructure but also of various protocols and services. This will require new transport-layer protocols
that can transport large amounts of data efficiently and with low latency. Algorithms must be developed to
mitigate receiver-side bottlenecks that may arise when large amounts of data from a number of different
databases are aggregated at a client. Network primitives such as application-layer multicasting [Jan00],
caching [Ora01], intelligent data replication [PBB01], data bundling based on access patterns [KoG99], and
sharing of partial computation among experts will be required to extend the capabilities of the UltraNet.
The remainder of the proposal is organized as follows. Section 2 outlines the current state of the networks for large-scale science applications. In particular, we discuss ESnet and the goals of the newly proposed UltraNet. Section 3
gives details of the research plan. Section 4 outlines collaborations and applications, and Section 5 presents the
statement of work. The references, the budget, and the biographical information of the PIs are provided in
Sections 6 through 8.
2. Current State of the Networks
The Energy Sciences Network, or ESnet, (shown in Figure 1) is a high-speed network serving thousands of DOE
scientists and collaborators worldwide. A pioneer in providing high-bandwidth, reliable connections, ESnet enables
researchers at national laboratories, universities and other institutions to communicate with each other using the
collaborative capabilities needed to address some of the world's most important scientific challenges. The newer
challenges of DOE large-scale science applications, however, demand capabilities that far exceed those of ESnet's
production network, both in terms of the required bandwidth and the sophistication of the services. First, there is no provision in
ESnet for testing Gbps dedicated cross-country connections with dynamic switching capability. Second, during the
technology development process, it is quite possible for various components of the network to be unavailable for
production operations; such situations would cause undue disruptions to normal ESnet activities.
Figure 1 The ESnet backbone network.
A number of proposals have been recently funded to extend ESnet. The science UltraNet [RWD03] is one
important effort that this proposed research will be closely aligned to. The key goal of UltraNet is to eliminate the
ever-widening performance gap between link speeds and application throughputs. While optical technologies
promise lambda switched links at Tbps rates, they do not provide provisioning and transport technologies to deliver
this performance to the application layer. Legacy protocols, including the most widely deployed transport protocols,
namely Transmission Control Protocol (TCP), and other network components (that are optimized for low network
speeds) cannot easily scale to the unprecedented optical link bandwidths. UltraNet is exploring innovative scalable
architectural options that use a minimum number of layers that make wavelengths available directly to the
applications [RWD03].
UltraNet will provide a rich environment to explore high-performance transport protocols that will achieve
throughputs of the order of available capacity in the optical core networks. TCP was designed and optimized for
low-speed data transfers over congested IP-based networks. However, its effectiveness in ultra high-speed networks
based on the emerging all-optical networks is being seriously questioned, especially in the transfer of petabytes of data
over intercontinental distances [PFD03,STP03,Flo01,SCTP]. Another key issue to be addressed by UltraNet is
traffic engineering. While MPLS has recently been extended to IP-based DWDM networks to take advantage of the
optical bandwidths to address the congestion problem in the IP layer, the required advanced traffic
engineering methods have not been widely deployed in operational networks because they involve complex inter-domain signaling and costing. UltraNet will provide an excellent environment to prototype the needed practical
traffic engineering methods within the context of DOE networking environments.
Clearly the goal of UltraNet is to develop the infrastructure and networking technologies required to support the
needs of DOE large-scale science applications. The purpose of this proposed research is to extend the capabilities
of UltraNet by enabling it with scalable and fault-tolerant network services and primitives that will allow rapid
deployment of large science applications.
3. Research Plan
In this project, we focus on the control and provisioning of ultra-high speed networks for large-scale science
applications. We expect our project to complement and extend the research of UltraNet. Our research proposal
includes (a) traffic grooming and bandwidth provisioning, (b) survivability and fault-tolerant network provisioning,
(c) distributed space-time scheduling of large data transfers over ultra-high speed networks, (d) low-delay and fault-tolerant signaling and control plane architecture, and (e) scalable network primitives and services. Figure 2 shows the
roadmap of the proposed research and its potential impact on UltraNet and ESnet.
[Figure 2: The proposed research at UC Davis (traffic grooming; signaling and control plane architecture; scalable services; distributed space-time scheduling) feeds technology transfer to UltraNet, the research network (R&D, breakable; scheduled operations; ultra-high speed; nearly all-optical), which in turn feeds technology transfer to ESnet, the production network (connects all DOE sites; 7x24 operation and high reliability; best-effort delivery; routine Internet activities).]
Figure 2 Roadmap of the proposed research.
3.1 Traffic Grooming and Bandwidth Provisioning
We envision that large-science applications and the next-generation communication infrastructure will employ
high-bandwidth optical networks as the dominant backbone technology. Optical networks based on wavelength-division multiplexing (WDM) technology have the ability to satisfy the bandwidth requirements of large-science applications and the future Internet infrastructure by scaling up existing capability (particularly
bandwidth) by 2 or 3 orders of magnitude. Under WDM, the optical transmission spectrum is carved up into a
number of non-overlapping wavelength (or frequency) bands, with each wavelength supporting a single
communication channel operating at whatever rate one desires, e.g., peak electronic speed. By allowing users to
transmit simultaneously on different WDM channels, the huge opto-electronic bandwidth mismatch problem is
solved and the aggregate traffic carried by the network is increased.
Point-to-point WDM transmission technology is quite mature today, while the corresponding switching
technologies (optical crossconnects (OXCs)) are still maturing. But bandwidth is precious, especially for large-science applications. Once WDM transmission technology is deployed on the network backbone, efficiently
utilizing the huge bandwidth at our disposal is of paramount importance.
While a single fiber strand has over a terabit-per-second bandwidth and a wavelength channel has over a gigabit-per-second transmission speed, the network may still be required to support traffic connections at rates that are
lower than the full wavelength capacity. The capacity requirement of these low-rate traffic connections can
range from STS-1 (51.84 Mbps) or lower up to full wavelength capacity. In order to save network cost and to
improve network performance, it is very important for the network operator to be able to mux/demux multiple low-speed connections onto/from high-capacity circuit pipes, and intelligently switch them at intermediate nodes. This
is referred to as the traffic-grooming problem [ZhM02,ClG02,Gro99,ToN94,ZhS00,FTU02,OZM02].
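As a minimal sketch of the multiplexing step described above, the following Python fragment packs low-rate connections onto wavelength channels with a simple first-fit rule. The wavelength capacity of 192 STS-1 units (an OC-192 channel), the function name first_fit_groom, and the first-fit rule itself are illustrative assumptions; this is not the grooming strategy proposed in this project.

```python
def first_fit_groom(demands, capacity=192):
    """Pack demands (in STS-1 units) onto wavelengths using first fit.

    capacity=192 assumes an OC-192 wavelength (an illustrative choice).
    Returns a list of wavelengths, each a list of the demands it carries.
    """
    wavelengths = []  # demands assigned to each wavelength
    free = []         # remaining capacity of each wavelength
    for d in demands:
        if d > capacity:
            raise ValueError("demand exceeds the capacity of one wavelength")
        for i, room in enumerate(free):
            if d <= room:
                # The demand fits on an already-lit wavelength.
                wavelengths[i].append(d)
                free[i] -= d
                break
        else:
            # No lit wavelength has room: light a new one.
            wavelengths.append([d])
            free.append(capacity - d)
    return wavelengths


# Example: a mix of OC-48 (48), OC-12 (12), and OC-3 (3) connections.
plan = first_fit_groom([48, 48, 12, 48, 3, 48, 12])
```

Without grooming, each of the seven connections in the example would occupy its own wavelength; with grooming they share two.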
For traffic grooming, a node should switch traffic at wavelength granularity as well as finer granularity. Figure 3
shows the logical view of a simplified grooming-node architecture. (In this figure, Mux/Demux form the
transmission system, while the other blocks form the switching system.) This hierarchical grooming node consists
of a wavelength-switch fabric (W-Fabric) and a grooming fabric (G-Fabric). The W-Fabric performs wavelength
routing; the G-Fabric performs multiplexing, demultiplexing, and switching of low-speed connections. A portion
of the incoming wavelengths to the W-Fabric can be dropped to the G-Fabric through the grooming-drop ports for
sub-wavelength-granularity switching. The groomed traffic can then be added to the W-Fabric through the
grooming-add ports. The number of grooming ports determines the grooming capacity of a node.
[Figure 3, left: a grooming node with fiber in/out, Demux/Mux, a wavelength-switch fabric (W-Fabric), a grooming fabric (G-Fabric), grooming-add and grooming-drop ports, and local Tx/Rx. Right: the auxiliary graph with a wavelength (λ) layer, a lightpath layer, and an access layer (vertices AI and AO).]
Figure 3 Grooming-node architecture and the corresponding auxiliary graph.
We propose a generic graph model for traffic grooming [ZhM02]. This model uses an auxiliary graph to represent
the different grooming node architectures and current network state, and takes into account various resource
constraints, such as the number of free wavelengths on each fiber and the number of available grooming ports at
each node. Figure 3 shows the grooming-node architecture (left) and its corresponding auxiliary graph (right). (For clarity, we refer to a node and a link in the auxiliary graph as a vertex and an edge, respectively.) The W-Fabric is modeled as the wavelength (λ) layer, consisting of input vertex I and output vertex O; the G-Fabric is modeled as the
access layer, consisting of input vertex AI and output vertex AO; a grooming-add port is modeled by an edge from
vertex AO to vertex O; and a grooming-drop port is modeled by an edge from vertex I to vertex AI. A
unidirectional fiber is represented as an edge from vertex O at the source node to vertex I at the destination node
of the link. A lightpath layer consisting of input vertex LI and output vertex LO is added to model existing
lightpaths sourced/sunk at a node. A lightpath is represented as an edge from vertex LO at the source node to vertex
LI at the destination node. Every edge is associated with two attributes: one indicating the available capacity and
the other indicating the cost of the resource which the edge represents.
Given a connection request, by computing the shortest path from the access-layer output port (AO) at the source node
to the access-layer input port (AI) at the destination node, we can determine how to set up lightpath(s) and how to route
the connection onto these lightpath(s) and/or some existing lightpath(s).
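The shortest-path computation on the auxiliary graph can be sketched as follows. This is a simplified illustration under stated assumptions, not the model of [ZhM02] itself: the fiber, lightpath, grooming-add, and grooming-drop edges follow the description above, whereas the optical-bypass edge (I to O), the re-grooming edge (AI to AO), the edges joining the access and lightpath layers, and all cost values are hypothetical choices made here so that the example runs end to end.

```python
import heapq

INF = float("inf")


def add_edge(g, u, v, capacity, cost):
    """Add a directed edge carrying the two attributes (capacity, cost)."""
    g.setdefault(u, []).append((v, capacity, cost))
    g.setdefault(v, [])


def build_aux_graph(nodes, fibers, lightpaths):
    g = {}
    for n in nodes:
        add_edge(g, (n, "AO"), (n, "O"), INF, 1)   # grooming-add port
        add_edge(g, (n, "I"), (n, "AI"), INF, 1)   # grooming-drop port
        add_edge(g, (n, "I"), (n, "O"), INF, 0)    # assumed: optical bypass
        add_edge(g, (n, "AI"), (n, "AO"), INF, 1)  # assumed: re-grooming
        add_edge(g, (n, "AO"), (n, "LO"), INF, 0)  # assumed: enter a lightpath
        add_edge(g, (n, "LI"), (n, "AI"), INF, 0)  # assumed: leave a lightpath
    for u, v, cap, cost in fibers:
        add_edge(g, (u, "O"), (v, "I"), cap, cost)
    for u, v, cap, cost in lightpaths:
        add_edge(g, (u, "LO"), (v, "LI"), cap, cost)
    return g


def route(g, src, dst, demand):
    """Dijkstra from (src, AO) to (dst, AI) over edges with enough capacity."""
    start, goal = (src, "AO"), (dst, "AI")
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, cap, cost in g[u]:
            if cap < demand:
                continue  # edge cannot carry this demand
            nd = d + cost
            if nd < dist.get(v, INF):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None  # request is blocked
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical 3-node line A -> B -> C with one existing lightpath A -> B
# that has 12 units of spare capacity; free wavelengths carry 48 units.
g = build_aux_graph(
    nodes=["A", "B", "C"],
    fibers=[("A", "B", 48, 10), ("B", "C", 48, 10)],
    lightpaths=[("A", "B", 12, 1)],
)
small = route(g, "A", "C", 12)  # fits on the existing lightpath, then B -> C
large = route(g, "A", "C", 24)  # too big for it, so a new A -> C lightpath
```

In the example, the small request reuses the existing lightpath and is re-groomed at B, while the large request bypasses B optically on a new end-to-end lightpath. Which of the two outcomes is preferred depends entirely on the edge costs assigned.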
Given a traffic demand T(s,d,g,m), we need to determine how to route the traffic under the current network state. In
general, for a traffic demand T(s,d,g,m) in a network, there are four possible operations that can be used to carry the
traffic without altering the existing lightpaths.
• Operation 1: Route the traffic onto an existing lightpath directly connecting the source s and the destination
d.
• Operation 2: Route the traffic through multiple existing lightpaths.
• Operation 3: Set up a new lightpath directly between the source s and the destination d and route the traffic
onto this lightpath. Using this operation, we set up only one lightpath if the amount of the traffic is less than
the capacity of the lightpath.
• Operation 4: Set up one or more lightpaths that do not directly connect the source s and the destination d,
and route the traffic onto these lightpaths and/or some existing lightpaths. Using this operation, we need to
set up at least one lightpath. However, since some existing lightpaths may be utilized, the number of
wavelength-links used to set up the new lightpaths is probably less than that of wavelength-links needed to
set up a lightpath directly connecting the source s and the destination d.
The different ordering of the possible operations forms different grooming policies [ZhM02]. A grooming policy
determines how to carry the traffic in a certain situation. It reflects the intentions of the network operator. In this
project, we plan to compare the properties of various grooming policies and develop the policies based on the
characteristics of GTL and other large-science applications.
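The idea that a grooming policy is an ordering of the four operations can be sketched as below. The two policy names and the choose_operation helper are hypothetical and introduced only for illustration; the feasibility of each operation is assumed to be determined elsewhere (for example, from the auxiliary graph).

```python
from enum import Enum


class Op(Enum):
    EXISTING_DIRECT = 1    # Operation 1: existing lightpath from s to d
    EXISTING_MULTIHOP = 2  # Operation 2: several existing lightpaths
    NEW_DIRECT = 3         # Operation 3: new lightpath from s to d
    NEW_MIXED = 4          # Operation 4: new plus existing lightpaths


# Two hypothetical policies that differ only in the order of the operations.
FEWEST_NEW_LIGHTPATHS = [
    Op.EXISTING_DIRECT, Op.EXISTING_MULTIHOP, Op.NEW_DIRECT, Op.NEW_MIXED,
]
FEWEST_TRAFFIC_HOPS = [
    Op.EXISTING_DIRECT, Op.NEW_DIRECT, Op.EXISTING_MULTIHOP, Op.NEW_MIXED,
]


def choose_operation(policy, feasible):
    """Return the first operation in policy order that is feasible."""
    for op in policy:
        if op in feasible:
            return op
    return None  # no operation can carry the demand: the request is blocked


# A demand for which only Operations 2 and 3 are currently possible.
feasible = {Op.EXISTING_MULTIHOP, Op.NEW_DIRECT}
choice_a = choose_operation(FEWEST_NEW_LIGHTPATHS, feasible)
choice_b = choose_operation(FEWEST_TRAFFIC_HOPS, feasible)
```

The same network state yields different decisions under the two orderings, which is how a policy encodes the operator's intention.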
3.2 Survivability and Fault-tolerant Network Provisioning
Reliability and fast restoration are highly desirable features of a network designed to support DOE large-science
applications. However, network failures do occur more often than we wish. Table 2 shows some typical data on
network component (transmitter, receiver, fiber link (cable), etc.) failure rates and failure-repair times according to
Bellcore (now Telcordia). In Table 2, FIT (failure-in-time) denotes the average number of failures in 10^9 hours, Tx
denotes optical transmitters, Rx denotes optical receivers, and MTTR means mean time to repair. Although the
problem of how the connection availability is affected by network failures is currently attracting a lot of interest
[HoM02,ACQ02,RaM02,WSM02,WSM02FOEC], we still lack a systematic methodology to quantitatively
estimate a connection’s availability, especially when protection schemes are used. It is imperative that excellent
protection and restoration schemes be designed in the next-generation UltraScienceNet. The reliability requirement
for these applications may not be identical because of their diverse service characteristics. In commercial
networks, availability requirements are specified using a Service Level Agreement (SLA), which is a contract between the
network operator and a customer. Usually, service reliability is represented by connection availability, which is
defined as the probability that the connection will be found in the operating state at a random time in the future.
Table 3 shows some typical SLAs. Connection availability can be computed statistically based on the failure
frequency and failure repair rate, reflecting the percentage of time a connection is “alive” or “up” during its entire
service period.
Table 2: Failure rates and repair times (Bellcore).
Metric          | Bellcore Statistics
Equipment MTTR  | 2 hrs
Cable-Cut MTTR  | 12 hrs
Cable-Cut Rate  | 4.39/yr/1000 miles
Tx failure rate | 10867 FIT
Rx failure rate | 4311 FIT
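To illustrate how the figures in Table 2 translate into connection availability, the sketch below uses the standard steady-state relation availability = MTTF / (MTTF + MTTR). It assumes an unprotected connection consisting of one transmitter, one receiver, and one cable in series, with independent failures and 8760 hours per year; these modeling choices and the function names are assumptions made here, not a methodology taken from the cited literature.

```python
HOURS_PER_YEAR = 8760.0


def availability(mttf_hours, mttr_hours):
    """Steady-state availability of a single repairable component."""
    return mttf_hours / (mttf_hours + mttr_hours)


def mttf_from_fit(fit):
    """FIT counts failures per 10^9 hours, so MTTF = 10^9 / FIT hours."""
    return 1e9 / fit


def mttf_from_cable(length_miles, cuts_per_year_per_1000_miles=4.39):
    """Mean time between cable cuts, in hours, for a cable of given length."""
    cuts_per_year = cuts_per_year_per_1000_miles * length_miles / 1000.0
    return HOURS_PER_YEAR / cuts_per_year


def connection_availability(length_miles):
    """Assumed series model: Tx, Rx, and cable must all be up."""
    a_tx = availability(mttf_from_fit(10867), 2.0)              # equipment MTTR
    a_rx = availability(mttf_from_fit(4311), 2.0)               # equipment MTTR
    a_cable = availability(mttf_from_cable(length_miles), 12.0)  # cable-cut MTTR
    return a_tx * a_rx * a_cable


a_1000 = connection_availability(1000.0)
```

Under these assumptions the cable term dominates: over long distances, fiber cuts contribute far more unavailability than transmitter or receiver failures.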
There are two types of fault-recovery mechanisms. If backup resources (routes and wavelengths) are pre-computed
and reserved in advance, we call it a protection scheme. Otherwise, when a failure occurs, if another route and a
free wavelength have to be discovered dynamically for each interrupted connection, we call it a restoration scheme.
Generally, dynamic restoration schemes are more efficient in utilizing network capacity because they do not
allocate spare capacity in advance, and they provide resilience against different kinds of failures (including multiple
failures); but protection schemes have faster recovery time and they can guarantee recovery from the failures
they are designed to protect against (a guarantee which restoration schemes cannot provide).
Table 3: Illustrative service classes.
Service Type | Availability | Down Time/Year
Basic        | 99%          | 87.6 hours
Premium      | 99.5%        | 43.8 hours
Silver       | 99.9%        | 8.76 hours
Gold         | 99.99%       | 52.56 mins
Platinum     | 99.999%      | 5.26 mins
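The down-time column of Table 3 follows directly from the availability column, assuming a year of 8760 hours. A short sketch of the arithmetic (the function name is chosen here for illustration):

```python
HOURS_PER_YEAR = 8760.0


def downtime_hours_per_year(availability_percent):
    """Expected down time per year for a given availability in percent."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


service_classes = {
    "Basic": 99.0,
    "Premium": 99.5,
    "Silver": 99.9,
    "Gold": 99.99,
    "Platinum": 99.999,
}
downtime = {name: downtime_hours_per_year(a) for name, a in service_classes.items()}
```

Each additional "nine" of availability cuts the permitted down time by a factor of ten, which is why the highest service classes leave only minutes per year for failure recovery.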
Protection schemes can be classified as ring protection and mesh protection. Ring-protection schemes include
Automatic Protection Switching (APS) and Self-Healing Rings (SHR). Both ring protection and mesh protection
can be further divided into two groups: path protection and link protection. In path protection, the traffic is rerouted
through a link-disjoint backup route (backup path) once a link failure occurs on its working path (primary path).2
In link protection, the traffic is rerouted only around the failed link. While path protection leads to efficient
utilization of backup resource and lower end-to-end propagation delay for the recovered route, link protection
provides faster protection-switching time. Recently, researchers have proposed the idea of sub-path protection in a
mesh network by dividing a primary path into a sequence of segments and protecting each segment separately.
Compared with path protection, sub-path protection can achieve high scalability and fast recovery time for a modest
sacrifice in resource utilization.
2 Node failures can also be considered by calculating node-disjoint routes. However, one should also note that carrier-class optical crossconnects (OXCs) in
network nodes must be 1+1 (master/slave) protected in the hardware for both the OXC’s switch fabric and its control unit. The OXC’s port cards, however,
do not have to be 1+1 protected since they take up the bulk of the space (perhaps over 80%) and cost of an OXC; also, a port-card failure can be handled as a link
and/or wavelength channel failure. However, node failures are important to protect against in scenarios where an entire node (or a collection of nodes in a
part of the network) may be taken down, possibly due to a natural disaster or a malicious attacker.
Link, sub-path, and path protection schemes can be dedicated or shared. In dedicated protection, there is no sharing
between backup resources, while in shared protection, backup wavelengths can be shared on some links as long as
their protected segments (links, sub-paths, paths) are mutually diverse. OXCs on backup paths cannot be configured
until the failure occurs if shared protection is used. So, recovery time in shared protection is longer but its resource
utilization is better than dedicated protection.
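The sharing condition for path protection can be sketched as follows: two backup paths may share a wavelength on a common link only if their primary paths have no link in common, so that a single link failure never activates both backups. The sketch assumes single-link failures, undirected links, and a greedy grouping of connections per link; the helper names and the example topology are hypothetical.

```python
def links(path):
    """Undirected links of a path given as a sequence of node names."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def can_share_backup(primary_a, primary_b):
    """Backups may share capacity only if the primaries are link-disjoint."""
    return links(primary_a).isdisjoint(links(primary_b))


def backup_wavelengths_needed(connections, shared):
    """Count backup wavelength-links for (primary, backup) path pairs."""
    per_link = {}
    for primary, backup in connections:
        for link in links(backup):
            per_link.setdefault(link, []).append(primary)
    total = 0
    for primaries in per_link.values():
        if not shared:
            total += len(primaries)  # dedicated: one wavelength per backup
            continue
        groups = []  # each group of mutually disjoint primaries shares one
        for p in primaries:
            for grp in groups:
                if all(can_share_backup(p, q) for q in grp):
                    grp.append(p)
                    break
            else:
                groups.append([p])
        total += len(groups)
    return total


# Two connections whose backups both traverse link D-E.
connections = [
    (["A", "B", "C"], ["A", "D", "E", "C"]),
    (["D", "F", "E"], ["D", "E"]),
]
dedicated = backup_wavelengths_needed(connections, shared=False)
pooled = backup_wavelengths_needed(connections, shared=True)
```

In the example the two primaries are link-disjoint, so the backups can pool one wavelength on link D-E and the shared scheme needs one fewer wavelength-link than the dedicated scheme.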
Dynamic restoration can also be classified as link, sub-path, or path based depending on the type of rerouting. In
link restoration, the end nodes of the failed link dynamically discover a route around the link, for each connection
(or “live” wavelength) that traverses the link. In path restoration, when a link fails, the source and the destination
node of each connection that traverses the failed link are informed about the failure (possibly via messages from the
nodes adjacent to the failed link). The source and destination nodes of each connection independently discover a
backup route on an end-to-end basis. In sub-path restoration, when a link fails, the upstream node of the failed link
detects the failure and discovers a backup route from itself to the corresponding destination node for each disrupted
connection. Link restoration is fastest and path restoration is slowest among the above three schemes. Sub-path
restoration time lies in between. Figure 4 summarizes the classification of protection and restoration schemes.
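The three restoration schemes differ only in which pair of nodes searches for the detour. The sketch below makes this concrete using a fewest-hop breadth-first search on a hypothetical six-node topology; wavelength availability, signaling, and restoration time are deliberately not modeled, and the function names are introduced here for illustration.

```python
from collections import deque


def bfs_route(adj, src, dst, failed):
    """Fewest-hop route from src to dst that avoids the failed link."""
    bad = frozenset(failed)
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            route = []
            while u is not None:
                route.append(u)
                u = prev[u]
            return route[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) != bad:
                prev[v] = u
                queue.append(v)
    return None  # no route survives the failure


def restore(adj, path, failed, scheme):
    """failed = (upstream node, downstream node), in the order of the path."""
    up, down = failed
    i = path.index(up)
    if scheme == "link":      # end nodes of the failed link find a detour
        detour = bfs_route(adj, up, down, failed)
        return None if detour is None else path[:i] + detour + path[i + 2:]
    if scheme == "sub-path":  # upstream node reroutes to the destination
        detour = bfs_route(adj, up, path[-1], failed)
        return None if detour is None else path[:i] + detour
    if scheme == "path":      # source and destination reroute end to end
        return bfs_route(adj, path[0], path[-1], failed)
    raise ValueError(scheme)


# Hypothetical topology; the connection A-B-C-D loses link B-C.
adj = {
    "A": ["B", "E"],
    "B": ["C", "F", "A"],
    "C": ["B", "D", "F"],
    "D": ["C", "E", "F"],
    "E": ["A", "D"],
    "F": ["B", "C", "D"],
}
working = ["A", "B", "C", "D"]
by_link = restore(adj, working, ("B", "C"), "link")
by_subpath = restore(adj, working, ("B", "C"), "sub-path")
by_path = restore(adj, working, ("B", "C"), "path")
```

In this example, link restoration yields the longest recovered route and path restoration the shortest, reflecting the trade-off between a local, fast repair and an end-to-end, more resource-efficient one.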
Figure 4: Different protection and restoration schemes in WDM mesh networks.
In summary, given the diversity of bandwidth granularities of the various high-capacity pipes, there are several
additional research challenges, e.g., should each bandwidth pipe be protected separately or should protection be set
up at wavelength channel levels? Should the spare capacity be set up for "dedicated protection" for a connection or
for "shared protection" which can be pooled for different connections? Should the protection/recovery be
performed on a per-link basis or a connection's end-to-end basis or on the basis of a "sub-path" of a connection?
Should the recovery paths be pre-computed (and periodically recomputed based on current network state for
efficiency) or should they be dynamically discovered after a failure occurs? These methods will have different
performance tradeoffs on reliability, restorability, restoration time, etc. It is envisioned that, perhaps, all of these
approaches may need to co-exist in the same network because different applications may have different
requirements on fault tolerance. We propose to investigate the applicability of the above fault-management
approaches on bandwidth provisioning for the GTL application as well as other DOE large-science applications.
3.3 Distributed Space Time Scheduling
To support various system-level tasks of the large-science applications, it will be necessary to form
interdisciplinary centers consisting of experts located at various academic, government, and industrial research labs
in different geographical locations. The data related to different aspects of the system will be collected, processed,
analyzed, and stored at different locations. This data must be cooperatively accessed and analyzed by teams of
experts. It will be cost-effective to support such efforts over ultra high-speed networks. This requires intelligent
distributed space-time scheduling of large data transfers. In this project, we consider the problem of aggregating
large data files from distributed databases and address the corresponding challenges involved from a network
architecture perspective. We believe that an Optical Burst-Switched (OBS) network is a suitable candidate for this
application. The problem is modeled as one of identifying a time-path schedule (TPS) in a graph representation of
the network, as described in the following. The TPS problem (TPSP) is proven to be NP-complete [BSZ04]. Thus,
we propose a Mixed Integer Linear Programming (MILP)-based approach and three heuristics to solve TPSP.
We first formulate TPSP. Consider an OBS mesh network topology. The mesh network can be
represented as a graph G(V, E), as shown in Figure 5. The vertices V represent OBS nodes, and the edges E represent
the optical links connecting the OBS nodes. We assume that all the optical links have the same capacity C (say,
OC-192). For simplicity of exposition, as well as for application to a non-WDM burst-switched network, let each
optical link have one wavelength only; the problem can be easily extended to incorporate WDM.
Each (Genome) data warehouse is connected to an OBS node through a dedicated link of capacity C. There may be
multiple data warehouses connected to one OBS node. A supercomputer is connected to its OBS node through
dedicated links, so that there is no bandwidth bottleneck from the OBS node to the supercomputer. Because all of the
above links are dedicated, they are not represented in the graph.
Figure 5. Graph representation of the TPSP.
At a certain step in the computation, the supercomputer may require data aggregated from multiple data warehouses
before it resumes computation. This process is modeled as the transfer of files that need to be sent from the
source OBS nodes (to which the corresponding data warehouses are connected) to the destination supercomputer. It
should be noted that one OBS node may be connected to several data warehouses. A query is first issued by the
supercomputer to all data warehouses to determine the file size required from each warehouse. Alternatively, depending
on how some of these applications develop in the future, the file-size information may already be available at the
supercomputer. The file size determines the expected transmission delay as the file is transferred from
the source node to the destination. The time that it takes to transfer a file along a route, $T_f$, is the sum of the
transmission delay, the propagation delay, and the overhead. Because the file size ($s_f$) is large (typically
greater than 5 Gbytes, and perhaps as large as Petabytes in some future applications), the transmission delay
dominates $T_f$.
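To make the claim that transmission delay dominates concrete, the following Python sketch evaluates the three components of the transfer time for one example file. The route length (4000 km), the propagation speed in fiber (about 2 x 10^8 m/s), the 1 ms overhead, and the function name `transfer_time` are illustrative assumptions; only the 5-Gbyte file size and the OC-192 link (nominally about 9.953 Gb/s) come from the text.

```python
def transfer_time(size_bytes, capacity_bps, route_km, overhead_s=0.0):
    """Return (transmission, propagation, total) delays in seconds."""
    transmission = size_bytes * 8 / capacity_bps   # time to push all bits onto the link
    propagation = route_km * 1e3 / 2e8             # assumed ~2e8 m/s in fiber
    return transmission, propagation, transmission + propagation + overhead_s

# Example values (assumptions): 5 GB file, OC-192 line rate, 4000 km route.
tx, prop, total = transfer_time(5e9, 9.953e9, 4000, overhead_s=1e-3)
print(f"transmission = {tx:.3f} s, propagation = {prop * 1e3:.1f} ms, total = {total:.3f} s")
```

Under these assumptions the transmission delay is roughly 4 s while the propagation delay is about 20 ms, so the transfer time is determined almost entirely by the file size and the link capacity.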
In our graph model, at each OBS node $v$, there exists a set of files $S_v = \{f_{v1}, f_{v2}, \ldots, f_{vl}\}$ whose transfer times $T_f$ are pre-computed
and denoted by the set $T_v = \{T_{f_{v1}}, T_{f_{v2}}, \ldots, T_{f_{vl}}\}$. The OBS node to which the
supercomputer is connected is denoted by $d$; all the files are destined to it.
The objective is to determine the following:
1. Route: The path through which a file should be transferred from the source to the destination.
2. Time schedule: The time at which a file has to be transmitted in a single burst so that it can be transferred
through the route determined in Step 1. This is important because two files which share a link on their
routes should not be transmitted at the same time to avoid collision due to the constraints of an OBS
network described below.
In an OBS network, although limited data buffering at OBS nodes is currently possible using fiber delay lines, it is
inadequate for buffering very large files as they exist in our case. Hence, once a data warehouse starts transmitting a
file, it must reach the destination in a single burst, and there is no possibility of buffering it along the path. We
assume that the files cannot be fragmented. This simplifies the burst-assembly process, reduces the overhead of
burst regeneration at the destination, and eliminates the possibility of errors arising due to misaligned fragments.
Hence, we utilize only a single path from the source to the destination. We also assume that this path may contain
no cycles. OBS switches do not have the ability to multiplex two different incoming data streams onto the same
outgoing link. Therefore, each link can transfer only one file at a time.
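The constraints above (a single unfragmented burst per file, no buffering along the path, and one file per link at a time) can be checked mechanically for a candidate schedule. The following Python sketch is an illustration under simplifying assumptions and is not the formulation of [BSZ04]: links are treated as undirected, a file is assumed to occupy every link of its path for its whole transfer time, and propagation offsets between successive links are ignored. The function names are hypothetical.

```python
from itertools import combinations

def links_of(path):
    """Set of undirected links traversed by a node path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}

def conflicts(schedule):
    """schedule maps file name -> (path, start, duration).

    Returns the pairs of files that share a link while overlapping in time.
    """
    bad = []
    for (a, (pa, sa, da)), (b, (pb, sb, db)) in combinations(schedule.items(), 2):
        shares_link = bool(links_of(pa) & links_of(pb))
        overlaps = sa < sb + db and sb < sa + da
        if shares_link and overlaps:
            bad.append((a, b))
    return bad

def finish_time(schedule):
    """Time at which the last file arrives (the quantity to be minimized)."""
    return max(start + dur for _, start, dur in schedule.values())

# Two files whose routes share link w-d (hypothetical nodes u, v, w, d).
schedule = {"f1": (["u", "w", "d"], 0, 4), "f2": (["v", "w", "d"], 0, 3)}
print(conflicts(schedule))            # f1 and f2 collide on w-d
schedule["f2"] = (["v", "w", "d"], 4, 3)
print(conflicts(schedule), finish_time(schedule))
```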
The aim is to minimize the total time for data aggregation, i.e., the time at which the last file reaches the destination.
The last file is the bottleneck, since computation cannot begin until all the data has been accumulated. Having to
determine both the path and the time makes this problem exceptionally hard, and differentiates it from all
machine-scheduling problems which have been reported in the literature [Pin02]. We have formulated the problem
as a Mixed Integer Linear Program (MILP) [BSZ04], which can be solved using a MILP solver such as CPLEX
[CPL]. However, the size of the MILP grows rapidly with the number of files because a set of several equations is
created for every pair of files. Hence, the MILP is not very efficient for solving larger problems, and we use it
only for a comparative study. We have also proposed three heuristic algorithms that yield close-to-optimum
solutions for TPSP, as summarized in the following:
LONGEST-FILE-FIRST (LFF) SCHEDULING: This heuristic is based on the intuition that the longest file
(the one having the largest transfer time) is the bottleneck for scheduling, because it requires the links on its route
to be free for the longest amount of time. Therefore, the LFF algorithm aims at
scheduling the longest files first, so that they get priority on the network's resources and get scheduled earlier. For
choosing the path over which to transfer a file, the algorithm chooses the best path among K randomly chosen
paths. The overall worst-case running-time complexity of LFF is $O(Krf^2)$, where r is the path length and f is the
number of files.
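The LFF idea can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of [BSZ04]: the candidate paths are supplied by the caller (standing in for the K randomly chosen paths), the "best" path is assumed to be the one that allows the earliest start, links are undirected, and propagation offsets are ignored. All names are hypothetical.

```python
def earliest_start(busy, links, duration):
    """Earliest t >= 0 such that [t, t + duration) is free on every link."""
    intervals = sorted(iv for link in links for iv in busy.get(link, []))
    t = 0.0
    for start, end in intervals:
        if t + duration <= start:
            break                      # the window fits before this busy interval
        t = max(t, end)                # otherwise move past it
    return t

def lff_schedule(files, candidates):
    """files: name -> (source, duration); candidates: source -> list of paths.

    Returns name -> (path, start, duration), scheduling longest files first.
    """
    busy, plan = {}, {}
    for name, (src, dur) in sorted(files.items(), key=lambda kv: -kv[1][1]):
        best = None
        for path in candidates[src]:
            links = [frozenset(hop) for hop in zip(path, path[1:])]
            start = earliest_start(busy, links, dur)
            if best is None or start < best[1]:
                best = (path, start, links)
        path, start, links = best
        for link in links:             # reserve every link of the chosen path
            busy.setdefault(link, []).append((start, start + dur))
        plan[name] = (path, start, dur)
    return plan

# Hypothetical instance: sources a and b, destination d.
files = {"f1": ("a", 5), "f2": ("a", 3), "f3": ("b", 4)}
candidates = {"a": [["a", "d"], ["a", "b", "d"]], "b": [["b", "d"]]}
for name, entry in lff_schedule(files, candidates).items():
    print(name, entry)
```

In this instance the shortest file f2 is scheduled last and is routed over a-b-d once link b-d becomes free at time 4, so all data is aggregated by time 7.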
DISJOINT-PATH (DP) SCHEDULING: This heuristic is based on the intuition that files can be transferred
along link-disjoint paths in parallel. The idea is to compute the maximum number of disjoint paths from the sources
of the files to the destination $d$. This can be computed through an implementation of the Max-Flow algorithm
[CLR01] on the following modified graph. All the links have unit capacity. A dummy source node is connected to
all the nodes which have files not yet scheduled, with link capacity equal to the number of files. The destination is
connected to a dummy destination with capacity equal to the number of files yet to be scheduled. The Max-Flow
algorithm then identifies the disjoint paths as those consisting of links with unit flow. The worst-case running-time
complexity of the DP heuristic is $O(r^4 + rf^2)$.
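The modified-graph construction can be sketched with a textbook Edmonds-Karp max-flow. This is a minimal sketch under stated assumptions rather than the DP heuristic itself: each undirected link is modeled as two unit-capacity arcs, the capacity from the dummy source to a node is interpreted as the number of unscheduled files at that node, and the sketch only counts the link-disjoint paths instead of extracting them. The labels "S" and "T" for the dummy nodes and all function names are hypothetical.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:              # smallest residual on the path
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:              # push flow, update residuals
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

def count_disjoint_paths(links, pending, dest):
    """links: undirected (u, v) pairs; pending: node -> unscheduled file count."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in links:
        cap[u][v] += 1                            # unit capacity in each direction
        cap[v][u] += 1
    for node, n in pending.items():
        cap["S"][node] += n                       # dummy source -> file sources
    cap[dest]["T"] += sum(pending.values())       # destination -> dummy sink
    return max_flow(cap, "S", "T")

# Hypothetical topology: only two links enter d, so at most two files move in parallel.
links = [("a", "d"), ("a", "b"), ("b", "d"), ("c", "b")]
print(count_disjoint_paths(links, {"a": 2, "c": 1}, "d"))
```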
MOST-DISTANT-FILE-FIRST (MDFF) SCHEDULING: This heuristic is based on the intuition that files
which are most distant from the destination, in terms of number of links, occupy more links and are hence the
bottleneck for scheduling. The heuristic aims at scheduling these files first, when the network is relatively
resource-free. The worst-case running time is $O(Krf^2)$.
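The ordering step of MDFF can be illustrated as follows. This Python sketch only computes the order in which files would be considered, using hop counts from a breadth-first search rooted at the destination; path selection and time-slot assignment would proceed as in the other heuristics. The tie-breaking rule (input order) and the function names are assumptions.

```python
from collections import deque

def hop_distance(adj, dest):
    """Number of links from every reachable node to dest (BFS)."""
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mdff_order(files, adj, dest):
    """files: name -> source node. Returns names, most distant source first."""
    dist = hop_distance(adj, dest)
    return sorted(files, key=lambda name: -dist[files[name]])

# Hypothetical chain c - b - a - d with destination d.
adj = {"d": ["a"], "a": ["d", "b"], "b": ["a", "c"], "c": ["b"]}
print(mdff_order({"f1": "a", "f2": "c", "f3": "b"}, adj, "d"))
```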
These approaches are compared through simulations on a 24-node topology in Figure 6. The Longest-File-First
(LFF) heuristic performs very well when the number of files to be aggregated is small, while the Disjoint-Paths
(DP) heuristic should be preferred for a large number of files. Also, LFF performs close to the MILP for small
networks where the MILP can provide a solution in a reasonable computing time. We plan to extend our existing
heuristic algorithms and further develop algorithms that are tailored for large-science applications. We also plan to
test our algorithms in real large-science projects, e.g., GTL applications on UltraNet.
Figure 6: Performance of the heuristics – with lower bound on finish time.
3.4 Low-delay and Fault-tolerant Signaling and Control Plane Architecture
The need for a low-delay and fault-tolerant signaling and control plane architecture for a network such as UltraNet
arises for several reasons. First, supercomputers at different physical sites will be harnessed to provide the
computational power needed for these long time-scale simulations, which must be tightly coordinated to ensure the
effective utilization of the computers. It is important that the end-to-end message delays be minimized over the
networks to ensure that the supercomputers do not idle waiting for control messages. Note that, on a multi-teraflop
machine, a single second of idle time represents the loss of several trillion floating-point operations [DOE03]. The end-to-end delay
minimization represents a significant challenge to networking technologies. Such problems are currently addressed
in a limited way in overlay networks and daemons, but highly focused research and development efforts are needed
for an effective solution to this class of applications. We will identify the needs of large-science applications and
propose corresponding schemes.
Second, in order to implement a space-time schedule of data transfers between end hosts, supercomputers, and
databases, it is necessary to have a signaling and control network that can be used to set up the schedule and manage
and control the network resources. One issue is how to implement the signaling network. Current approaches
employ in-fiber-in-band techniques, which are simple but do not provide the capability to deploy sophisticated
scheduling algorithms. As part of this research, we will investigate in-fiber-out-of-band signaling. To address the
fault-tolerance issue, we will investigate multipath approaches for signaling and control messages.
Third, in order to manage and control the network resources of the ultra-high speed network, it is necessary to
develop a very fast, reliable, and powerful control plane. The control plane must guarantee delivery of control
messages with minimum delay. As part of the research we will investigate the requirements of the control plane
architecture for large-science applications and build upon the knowledge plane architecture proposed in [CPR03].
3.5 Scalable and Fault-Tolerant Network Primitives and Intelligent Services
In this research project we will investigate how application-layer multicasting, caching, and intelligent data
replication can be used to implement a high-performance network infrastructure for GTL applications. Towards this
end, we will build upon the pseudo-serving paradigm proposed in [KoG99]. Pseudoserving is a P2P file sharing
system comprising two components: a superserver and a set of pseudoservers. The former grants the latter access
to files in exchange for some amount of network and storage resource, specified through a contract between the
system and each user. Under normal circumstances, no resources are requested and the superserver acts as a
regular server. As demand begins to exceed the superserver’s ability to provide service, the superserver offers a
contract to the requesting pseudoserver. In it, the pseudoserver is obligated to serve the file it will retrieve to N
other requesters within T seconds. It is released from its obligations should it service N other requesters before T
seconds or should T seconds have passed without it having serviced N requesters. In exchange for this resource
contribution, the superserver gives to the requesting pseudoserver a referral to another pseudoserver. This other
pseudoserver is the one closest to the requester known to contain the file and is obligated to provide service as part
of its contractual obligations. Pseudo-serving thus provides a framework for sharing local storage, bandwidth, and
even partial computation across organizational boundaries.
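The release rule of a pseudo-serving contract described above (serve N other requesters, or let T seconds pass, whichever comes first) can be captured in a small Python sketch. The class and field names are hypothetical and are not taken from [KoG99]; time is passed in explicitly so that the example stays deterministic.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    n_required: int      # N: number of other requesters to serve
    t_limit: float       # T: lifetime of the obligation, in seconds
    issued_at: float     # time at which the contract was accepted
    served: int = 0      # requesters served so far

    def record_service(self):
        """Count one more requester served by this pseudoserver."""
        self.served += 1

    def released(self, now):
        """True once N requesters were served or T seconds have elapsed."""
        return self.served >= self.n_required or now - self.issued_at >= self.t_limit

contract = Contract(n_required=2, t_limit=10.0, issued_at=0.0)
print(contract.released(now=5.0))     # False: neither condition met yet
contract.record_service()
contract.record_service()
print(contract.released(now=5.0))     # True: N requesters served before T
```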
The current application of pseudo-serving to dissipate the flash-crowd problem is somewhat limiting because what
is transferred is the same file; in GTL applications, there will be a small number of users who will retrieve large amounts of
data. This restriction can be removed by organizing files into sets of files, or packages. Before we see how
packages work, we first examine why pseudo-serving may not work well when more than one file is requested from
the super-server, which for the GTL application is a meta-data controller for all the bio-databases.
Suppose there are many files on a bio-database. There is nothing that prevents pseudo-serving from
working on a per-file basis, so that contracts are set based on the incoming rate of requests for individual files. The
problem arises when this per-file rate of requests is low but the cumulative rate of requests for all the files on the
super-server is high. Under such circumstances, the super-server may not be able to handle the incoming stream of
requests alone, and pseudo-servers are not able to satisfy the contracts set by the super-server, so pseudo-serving is
ineffective. Now, suppose files are organized into packages, where each package is a group of N files. Moreover,
users are allowed to retrieve only packages. Under this arrangement, the rate of requests for each package is N times
the rate of requests for individual files. The file holding time is therefore reduced by a factor of N. Using this
packaging mechanism, contracts previously too difficult to satisfy because of their long file holding times can be
made more attractive to the user. The problem with packaging, of course, is that users now need to download a package
that may be significantly larger than the original file. Depending on the load on the bio-databases, from a user's
point of view, retrieving packages may or may not be more attractive than retrieving only the file directly from the
super-server. The optimal package size strikes a good balance between making contracts reasonably satisfiable and
providing sufficient benefit to the user in reducing the total download time. This topic will be investigated as part
of this research.
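The packaging tradeoff can be made concrete with a small back-of-the-envelope calculation. The Python sketch below assumes that all files are equally popular and of equal size, that the holding time is the expected time to receive the number of requests named in the contract, and that the download rate is fixed; the numbers and names are illustrative assumptions only.

```python
def holding_time(contract_requests, request_rate):
    """Expected time (s) to receive contract_requests requests at request_rate (req/s)."""
    return contract_requests / request_rate

def compare(file_rate, package_size, contract_requests, file_bytes, link_bps):
    """Holding and download times for per-file versus per-package pseudo-serving."""
    return {
        "holding_per_file_s": holding_time(contract_requests, file_rate),
        "holding_per_package_s": holding_time(contract_requests, package_size * file_rate),
        "download_file_s": file_bytes * 8 / link_bps,
        "download_package_s": package_size * file_bytes * 8 / link_bps,
    }

# Assumed values: 0.01 requests/s per file, packages of 10 files, contracts for
# 5 requesters, 5 GB files, 10 Gb/s effective download rate.
for key, value in compare(0.01, 10, 5, 5e9, 10e9).items():
    print(f"{key}: {value:.1f}")
```

With these assumed values the holding time drops by the package size (a factor of 10), while the download time grows by the same factor, which is the balance that the optimal package size has to strike.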
4. Collaboration and Applications
This project will be executed in close collaboration with the DOE UltraScienceNet project at Oak Ridge National
Laboratory (ORNL), in particular with the UltraScienceNet PIs -- Dr. Nagi Rao, Dr. Bill Wing, and their
colleagues.
The PI of our proposed project, Professor Biswanath Mukherjee, has been cooperating with Dr. Nagi Rao, Dr. Bill
Wing, and others over the past 1.5 years towards defining the research challenges for the bandwidth-provisioning
problems for DOE Large-Science Applications. In fact, Professor Mukherjee was invited by Dr. Nagi Rao and Dr.
Bill Wing to co-chair (along with Dr. Wing) the "Provisioning Group" of the "DOE Workshop on Ultra-High Speed
Transport Protocols and Provisioning for Large Scale Science Applications" held at Argonne National Laboratory
in April 2003 [DOE03]. Professor Mukherjee made important contributions to the workshop by co-leading the
discussions on provisioning. Then, he contributed to the final workshop report through his ideas on: (1) dynamic
provisioning of high-capacity pipes of various bandwidth granularities ranging from multiple wavelengths to a full
wavelength to sub-wavelength capacity; (2) how to employ generalized multi-protocol label switching (GMPLS) to
facilitate the dynamic provisioning; (3) survivable bandwidth provisioning; (4) separated control channel with
deterministic or bounded delay and jitter for control-loop operations; etc.
We propose to build upon the relationship that has been set up between Professor Mukherjee and the
UltraScienceNet team. Specifically, we plan to extend it to a true research collaboration to ensure that our research
team at UC Davis is working on important research problems (w.r.t. the missions of the UltraScienceNet team and
the DOE). We anticipate that our research results will complement (and extend the knowledge gained from) the
UltraScienceNet, and our research results can be tested on the UltraScienceNet platform as well.
One of our Co-PIs, Professor Dipak Ghosal, also has a long working relationship with Dr. Nagi Rao. They share
common interests in transport-layer protocols and application-layer research problems. Our collaboration will build
upon this relationship as well.
It is also worth mentioning that Professors Mukherjee and Ghosal have an ongoing collaborative research
project with Dr. Wu-Chun Feng of Los Alamos National Laboratory using a UC-LANL seed-grant project entitled
"Wide-Area Transport and Signaling Protocols for Genome-to-Life (GTL) Applications"; $45,214; 9/1/03 - 8/31/04. We propose to exploit this collaboration also for successful execution of the proposed project.
Our additional collaborators in the DOE community include Dr. Rose P. Tsang of Sandia National Laboratory, with
whom Professor Ghosal has a research collaboration. This relationship will also be utilized, if necessary, for the proposed
project.
The following are the expected outcomes of this proposed research:
1. A report on the networking issues for the DOE GTL program. This report will discuss the specific
requirements of the GTL program and the corresponding requirements on the networking infrastructure and
protocols.
2. This research will develop various heuristics to perform space-time scheduling of large file transfers that
minimize the total data aggregation delay.
3. The research will compare and contrast various methods to mitigate receiver-side congestion that will arise
when large data is simultaneously transferred from the various bio-databases to a client. The analysis will
be done using a combination of simulation and analytical models.
4. The research will develop the requirements for a low-delay and fault-tolerant signaling and control plane
architecture.
5. This research will investigate traffic grooming algorithms for efficient network bandwidth utilization.
6. We will propose reliable network provisioning strategies that meet the requirements for large-science
applications.
7. The research will investigate and design a framework using which users can share local storage and partial
computation. The applicability of application layer multicasting, caching, and data replication in the GTL
applications will also be determined.
8. We will fully test the proposed algorithms and protocols on UltraNet in collaboration with UltraNet researchers.
5. Statement of Work
The main components of this proposal include (a) traffic grooming and bandwidth provisioning, (b) survivability
and fault-tolerant network provisioning, (c) distributed space-time scheduling of large data transfers over ultra-high
speed networks, (d) low-delay and fault-tolerant signaling and control plane architecture, and (e) scalable network
primitives and services.
Research, engineering, and application milestones:
Year 1:
• Identify and analyze suitable traffic grooming algorithms for GTL applications.
• Develop and extend existing heuristic space-time scheduling algorithms under dynamic network states.
• Propose and study suitable network survivability strategies for large-science applications.
Year 2:
• Technology transfer to UltraNet
    o Test proposed space-time scheduling and grooming algorithms on UltraNet (at ORNL) with the
      collaboration of UltraNet researchers.
• Develop low-latency signaling strategies to control GTL simulations at remote DOE terascale computing
  facilities.
• Develop a robust application-layer multicasting framework for simultaneous downloading of very large
  datasets to multiple clients.
Year 3:
• Technology transfer to UltraNet
    o Test proposed low-latency signaling strategies and application-layer multicasting algorithms on
      UltraNet (at ORNL) with the collaboration of UltraNet researchers.
    o Fully test algorithms developed for a set of large-science applications, in addition to GTL.
• Develop the requirements for a fault-tolerant in-fiber-out-of-band signaling architecture.
• Develop a scalable control plane architecture for UltraNet, taking into account the requirements of the
  various large-science applications.
6. References (see footnote 3)
[ACQ02] V. Anand, S. Chauhan, and C. Qiao, “Sub-path protection: A new framework for optical layer
survivability and its quantitative evaluation,” Dept. of Computer Science and Engineering, State
University of New York at Buffalo, Tech. Report 2002-01, Jan. 2002.
[BBM03] Alessandro Bassi, Micah Beck, Terry Moore, James S. Plank, Martin Swany, Rich Wolski, and Graham
Fagg, “The Internet Backplane Protocol: A Study in Resource Sharing,” Future Generation Computing
Systems, 19(4), May 2003, pp. 551-561. Elsevier.
[BSZ04] A. Banerjee, N. Singhal, J. Zhang, C. N. Chuah, and B. Mukherjee, “A Time-Path Scheduling Problem
(TPSP) for Aggregating Large Data Files from Distributed Databases using an Optical-Burst Switched
Network,” accepted for presentation and publication in the proceedings of International
Communications Conference (ICC 2004), Paris, France.
[ClG02] M. Clouqueur and W. D. Grover, “Availability analysis of span-restorable mesh networks,” IEEE J.
Selected Areas in Communications, vol. 20, pp. 810--821, May 2002.
[CLR01] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, “Introduction to Algorithms,” Second Edition, MIT
Press, 2001.
[CPL] http://www.ilog.com/products/cplex/product/suite.cfm
[CPR03] D. D. Clark, C. Partridge, J. C. Ramming, J. Wroclawski, A Knowledge Plane for the Internet. ACM
SIGCOMM 2003.
[DCF03] A. Darling, L. Carey, and W. Feng, “The Design, Implementation, and Evaluation of mpiBLAST,”
ClusterWorld 2003, Best Paper Award, June 2003.
[DOE03] DOE Workshop on Ultra-High Speed Transport Protocols and Provisioning for Large Scale Science
Applications. Argonne National Lab, Argonne, IL, 2003.
http://www.csm.ornl.gov/ghpn/wk2003_workshops.html
[Flo01] Internet Engineering Task Force, ICSI Center for Internet Research, Berkeley,
California. http://www.icir.org/floyd/papers/draft-floyd-tcp-highspeed-01.txt
[FTU02] A. Fumagalli, M. Tacca, F. Unghvary, and A. Farago, “Shared path protection with differentiated
reliability,” in Proc. IEEE ICC, pp. 2157--2161, April 2002.
[GTL03] DOE Genomes to Life Program. http://doegenomestolife.org/.
[Gro99] W. D. Grover, “High availability path design in ring-based optical networks,” IEEE/ACM Trans.
Networking, vol. 7, pp. 558--574, Aug. 1999.
[HoM02] P.-H. Ho and H. Mouftah, “A framework for service-guaranteed shared protection in WDM mesh
networks,” IEEE Communications Mag., vol. 40, pp. 97--103, Feb. 2002.
Footnote 3: There exists a vast body of literature in networking, optics, and applications that is related to this proposal.
Far from being a complete set of references, this list is only a small sample of the literature, included to support the presentation.
[Jan00] Jannotti, J., et al. Overcast: Reliable Multicasting with an Overlay Network. In Fourth Symposium on
Operating Systems Design and Implementation (OSDI 2000). 2000. San Diego, California, USA.
[KoG99] K. Kong and D. Ghosal, Mitigating Server Side Congestion Through Pseudo-Serving. IEEE/ACM
Transactions on Networking, 1999. 7(4).
[Muk97] B. Mukherjee, “Optical Communication Networks,” McGraw-Hill, pp. 259-288, 1997.
[Ora01] A. Oram, Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology. 2001: O'Reilly &
Associates.
[OZM02] C. Ou, H. Zang, and B. Mukherjee, “Sub-path protection for scalability and fast recovery in optical
WDM mesh networks,” in Proc. OFC, p. ThO6, Mar. 2002.
[Pin02] M. Pinedo, “Scheduling: Theory, Algorithms, and Systems,” Second Edition, Prentice Hall, 2002.
[PBB01] James S. Plank, Alexander Bassi, Micah Beck, Terence Moore, D. Martin Swany, and Rich Wolski,
“Managing Data Storage in the Network,” IEEE Internet Computing, 5(5), September/October 2001,
pp. 50-58.
[PFD03] PFLDnet, First International Workshop on Fast Long-Distance Networks, CERN, Geneva, Switzerland,
2003.
[RaM02] S. Ramamurthy and B. Mukherjee, “Survivable WDM mesh networks, Part II -- restoration,” in Proc.
IEEE ICC, pp. 2023--2030, June 1999. (Also, IEEE JLT, to appear, 2002.)
[RWD03] N. S. Rao, W. R. Wing, T. H. Dunigan, DOE UltraScience Net: Experimental Ultra-Scale Network
Research Testbed. http://www.csm.ornl.gov/ultranet/UltraNet_ORNL_Prop.pdf
[SCTP] Stream Control Transmission Protocol (SCTP). http://www.sctp.de
[STP03] Scheduled Transfer Protocol (ST), High-Performance Parallel Interface Standards Group.
http://www.hippi.org/cST.html.
[ToN94] M. To and P. Neusy, “Unavailability analysis of long-haul networks,” IEEE J. Selected Areas in
Communications, vol. 12, pp. 100--109, Jan. 1994.
[WSM02] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, “Path vs. sub-path vs. link restoration for fault
management in IP-over-WDM networks: Performance comparisons using GMPLS control signaling,”
IEEE Communications Mag., vol. 40, pp. 2--9, Nov. 2002.
[WSM02FOEC] J. Wang, L. Sahasrabuddhe, and B. Mukherjee, “Fault monitoring and restoration in optical WDM
networks,” in Proc. National Fiber Optic Engineers Conference, Sep. 2002.
[ZhM02] K. Zhu and B. Mukherjee, “On-line Provisioning Connections of Different Bandwidth Granularity in
WDM Mesh Networks,” Proc., IEEE/OSA Optical Fiber Communication Conference (OFC) '02,
Anaheim, CA, March 2002.
[ZhS00] D. Zhou and S. Subramaniam, “Survivability in optical networks,” IEEE Network, vol. 14, pp. 16--23,
Nov./Dec. 2000.
7. Appendix A: Budget
The cost of this project is $K in the first year, $K in the second year, and $K in the third year (for FY2004 through
2006). The cost for each year includes graduate student assistantships and tuition fees for five graduate students and one
month of salary for each PI. In each year, one full-time post-doctoral fellow will be supported at UC Davis. The
post-doctoral fellow will be involved in all aspects of the research plan and will work closely with the graduate
students and the PIs. The budget includes travel money for DOE project meetings, meetings for technology
transfer to UltraNet, and conferences and workshops at which to present relevant research results.
Finally, the budget also includes equipment money to buy desktop PCs/workstations for the graduate students and the
postdoc.
Graduate Student Fees for Research Assistants including non-California-resident tuition. The University of
California, Davis campus does not have an “Out of State” or “Non-Resident” Tuition Remission program.
However, we are requesting approval to charge the tuition for 3 students in the first year.
Technical support salary is requested for computer technical support directly related to the scientific research
objectives of this proposed project. This will include installation, troubleshooting, and maintenance of
specialized software and/or networking capabilities required for simulation software such as ns, as well as
installation of hardware and/or research instrumentation required to meet the scientific research objectives of this
project.
8. Appendix B: Biographical Information
BISWANATH MUKHERJEE
Department of Computer Science
University of California
Davis, CA 95616, USA
Phone: +1-530-752-4826; FAX: +1-530-752-4767
Electronic mail: mukherjee@cs.ucdavis.edu
WWW: http://networks.cs.ucdavis.edu/~mukherje/
EDUCATION
1987     Ph.D., Electrical Engineering, University of Washington, Seattle
1980     B.Tech. (Hons.), Electronics and Electrical Communications Engg., IIT Kharagpur (India)
ACADEMIC APPOINTMENTS
1995-    Professor/Computer Science, University of California, Davis
1997-00  Department Chair/Computer Science, University of California, Davis
1992-95  Associate Professor/Computer Science, University of California, Davis
1987-92  Assistant Professor/Computer Science, University of California, Davis
1984-87  Research & Teaching Assistant/Electrical Engineering, University of Washington
CURRENT RESEARCH INTERESTS
Lightwave Networks; Wireless Networks; Network Security
AWARDS
1984-85  GTE Teaching Fellowship, University of Washington
1986-87  General Electric Foundation Fellowship, University of Washington
1991     Co-winner, Best Paper Award, 14th National Computer Security Conference, for the paper "DIDS
         (Distributed Intrusion Detection System) -- Motivation, Architecture, and an Early Prototype."
1994     Co-winner, Paper Award, 17th National Computer Security Conference, for the paper
         "Testing Intrusion Detection Systems: Design Methodologies and Results from an Early
         Prototype."
RESEARCH PUBLICATIONS
Please visit B. Mukherjee's website (http://networks.cs.ucdavis.edu/~mukherje/) for details on his research
publications.
A. List of up to Five Publications Most Closely Related to Proposed Project:
1. B. Mukherjee, Optical Communication Networks, New York: McGraw-Hill, July 1997.
2. B. Mukherjee, "WDM-Based Local Lightwave Networks -- Part I: Single-Hop Networks; Part II: Multihop
Networks," IEEE Network, vol. 6: Part I: no. 3, pp. 12-27, May 1992; Part II: no. 4, pp. 20-32, July 1992.
(Nominated for IEEE and IEEE Communications Society Paper Awards. Also, revised/updated version
published in Encyclopedia for Telecommunications as an Invited Article.)
3. L. Sahasrabuddhe and B. Mukherjee, ``Light-Trees: Optical Multicasting for Improved Performance in
Wavelength-Routed Networks,'' IEEE Communications Magazine, vol. 37, no. 2, pp. 67-73, Feb. 1999.
4. D. Datta, B. Ramamurthy, H. Feng, J. P. Heritage, and B. Mukherjee, "Impact of transmission impairments on
the teletraffic performance of wavelength-routed optical networks," IEEE/OSA Journal of Lightwave
Technology, vol. 17, no. 10, pp. 1713-1723, Oct. 1999.
5. B. Mukherjee, ``WDM Optical Communication Networks: Progress and Challenges" (Invited Paper), IEEE
Journal on Selected Areas in Communications (Special Issue on ``Protocols and Architectures for Next
Generation Optical WDM Networks"), vol. 18, no. 10, pp. 1810-1824, Oct. 2000.
B. List of up to Five Other Significant Publications.
1. B. Mukherjee and J. S. Meditch, "The p(i)-persistent protocol for unidirectional broadcast bus networks," IEEE
Transactions on Communications, vol. 36, pp. 1277-1286, Dec. 1988.
2. B. Mukherjee and J. S. Meditch, "Integrating voice with the p(i) persistent protocol for unidirectional broadcast
bus networks," IEEE Transactions on Communications, vol. 36, pp. 1287-1295, Dec. 1988.
3. B. Mukherjee, D. Banerjee, S. Ramamurthy, and A. Mukherjee, "Some principles for designing a wide-area
optical network," IEEE/ACM Transactions on Networking, vol. 4, pp. 684-696, Oct. 1996. (Originally
appeared in IEEE Infocom '94, was selected by the IEEE Infocom '94 conference program committee as one of
the top few papers (out of 449 submissions), recommended to the IEEE/ACM Transactions on Networking, and
published in the journal after its own independent review.)
4. B. Mukherjee, L. T. Heberlein, and K. N. Levitt, "Network intrusion detection," IEEE Network, vol. 8, no. 3,
pp. 26-41, May/June 1994.
5. B. Guha and B. Mukherjee, "Network security via reverse engineering of TCP code: Vulnerability analysis and
proposed solutions," IEEE Network, vol. 11, no. 4, pp. 40-49, July/August 1997.
PROFESSIONAL SERVICE
Editor, IEEE/ACM Transactions on Networking (1994-2000)
Editor-at-Large, Optical Communications and Networking, IEEE Communications Society (1999-2000)
Technical Program Chair, IEEE INFOCOM '96 Conference
Member of the Editorial Board and Senior Technical Editor, IEEE Network (1997-2000)
Member of the Editorial Board, Journal of High-Speed Networks
Member of the Editorial Board, ACM/Baltzer Wireless Information Networks (WINET) journal
Member of the Editorial Board, Photonic Network Communications journal
Member of the Editorial Board, Optical Networks journal
Proposal Evaluation Panel: National Science Foundation (1993-present)
NSF Panels/Workshops: (1) All-Optical Networks (Jan. 93); (2) Optical Commun. and Networks (March 94); (3)
CISE International Cooperation (Oct. 97); (4) Ultra-High-Capacity Optical Networks (Oct. 02).
Member of the Technical Program Committee, IEEE INFOCOM 89-90, 92-99 conferences; IEEE GLOBECOM
92; ACM SIGCOMM 93; (and many other conferences)
Reviewer of proposals for: National Science Foundation; NASA/HPCC; State of California MICRO Program;
Hong Kong Research Grants Council; Govt. of Singapore; Israel Science Foundation
Founder, Chairman, and Chief Technology Officer, Summit Networks, San Jose, CA (Feb. '00 - Aug. '02): A
startup specializing in building optical networking equipment.
Member, Board of Directors, IPLocks, San Jose, CA (Feb.'02 - present): building computer security products.
Graduate Students and Postdoctoral Researchers Supervised:
Subrata Banerjee, PhD (Cisco; previously Director of Software, Accordion Networks; Asst. Prof. at Stevens Tech,
Philips Research); Feiling Jia, PhD (Atoga Systems, previously at SBC/Pacific Bell); Shao-kong Kao, PhD
(Foundry Networks; previously Sun Microsystems, Alidian); Michael S. Borella, PhD (3Com; previously Asst.
Prof. at DePaul Univ.); Dhritiman Banerjee, PhD (VP/cofounder of Internet Photonics; previously at Bell
Labs./Lucent); Jason Iness, PhD (Intel); Byrav Ramamurthy, PhD (Asst. Prof. at Univ. of Nebraska); S. Ramu
Ramamurthy, PhD (CIENA; previously at Tellium and Bellcore); Jason Jue, PhD (Asst. Prof. at University of
Texas--Dallas); Laxman H. Sahasrabuddhe, PhD (SBC; previously at Amber Networks); Nick Puketza, PhD
(Lecturer at UC Davis); Xiaoxin Wu, PhD (postdoc at Purdue; previously at Arraycom); Wushao Wen, PhD
(CIENA; previously at Mahi Networks); Hui Zang, PhD (Sprint Adv. Technology Lab.); Shun Yao, PhD (postdoc
at UC Davis); Jian Wang, PhD (Asst. Prof. at Florida Intl. Univ.); L. T. Heberlein, MS (Net Squared); Justin
Doak, MS (LANL); Kui Zhang, MS (Cisco); Biswaroop Guha, MS (Hewlett-Packard); Kirk Bradley, MS (SRI);
plus 11 other MS degrees. Currently supervising approx. 13 graduate students, mostly PhDs.
PI's PhD Advisor: Professor James S. Meditch, University of Washington, Seattle
DIPAK GHOSAL
Department of Computer Science
University of California
Davis, CA 95616
E-mail: ghosal@cs.ucdavis.edu
Tel. No.: (530) 754 9251
Fax: (530) 752 4767
WWW: http://networks.cs.ucdavis.edu/~ghosal
Education
• Post-Doctoral Studies, Computer Science, Institute for Advanced Computer Studies, University of
Maryland, USA, September 1990.
• Ph.D., Computer Science, The Center for Advanced Computer Studies, University of Louisiana, USA, July
1988.
• M.Sc.(Engg.), Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore, India,
December 1985.
• B.Tech., Dept. of Electrical Engineering, Indian Institute of Technology, Kanpur, India, May 1983.
Professional Experience
• July 1999 - Present: Associate Professor, Department of Computer Science, University of California,
Davis, CA 95616.
• January 1996 - June 1999: Assistant Professor, Department of Computer Science, University of California,
Davis, CA 95616.
• September 1990 - December 1995: Member of the Technical Staff, Bell Communications Research, Red
Bank, New Jersey 07701, USA.
• September 1988 - August 1990: Research Associate, Institute for Advanced Computer Studies, The
University of Maryland, College Park, MD 20742, USA.
• July 1986 - July 1988: Research Assistant, The Center for Advanced Computer Studies, University of
Louisiana, Lafayette, LA, 70504, USA.
• August 1983 - December 1985: Research Fellowship, Department of Computer Science and Automation,
Indian Institute of Science, Bangalore, India.
Recent Research Publications
• Julee Pandya, Prasant Mohapatra, and Dipak Ghosal, “Asymptotic Analysis of a Peer Enhanced Cache
Invalidation Scheme,” WiOpt'04: Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks,
24th - 26th of March, 2004, University of Cambridge, UK.
• Stephen Mueller, Rose P. Tsang, and Dipak Ghosal, “Multipath Routing in Mobile Ad Hoc Network –
Issues and Challenges,” Invited paper. To appear in Lecture Notes in Computer Science, 2004.
• Dipak Ghosal, Benjamin Poon, and Keith Kong, “P2P Contracts: A Framework for Resources and Service
Exchange,” accepted for publication in the special issue of Future Generation Computer Systems, 2004.
• S. Kovvuri, V. Pandey, B. Mukherjee, D. Ghosal, and D. Sarkar, “A Call-admission Control (CAC)
Algorithm for Providing Guaranteed QoS in Cellular Networks,” Intl. Journal of Wireless Information
Networks, 2003 {Preliminary version: S. Kovvuri, V. Pandey, D. Ghosal, B. Mukherjee, and D. Sarkar, “A
call-admission control (CAC) algorithm for providing guaranteed QoS in cellular networks,” Proc., IEEE
Wireless Access Systems, San Francisco, CA, Dec. 2000}.
• W. Wen, B. Mukherjee, S.-H. Gary Chan, and D. Ghosal, “LVMSR: An efficient algorithm to multicast
layered video,” Computer Networks, March 2003 {Preliminary version: W. Wen, S.-H. Gary Chan, D.
Ghosal, and B. Mukherjee, “LVMSR: An efficient algorithm to multicast layered video,” Proc., IEEE ICC
2000 conference, New Orleans, LA, pp. 254-258, June 2000}.
• B. Reynolds and D. Ghosal, “STEM: Secure Telephony Enabled Middlebox,” IEEE Communications
Magazine Special Issue on Security in Telecommunication Networks, October 2002.
• J. Burns and D. Ghosal, “Design and Analysis of a New Algorithm for Automatic Detection and Control of
Media-Stimulated Focussed Overload,” to appear in Telecommunication Systems, 2002.
• J. Abramson, X-yan Fang, and D. Ghosal, “Analysis of an Enhanced Signaling Network for Scalable Mobility
Management in Next Generation Wireless Networks,” in IEEE Globecom, November 2002.
Professional Service
• Served on many NSF and UC Core panels.
• Program Committee Member of 1995 Distributed Computing Conference, Infocom 1995-1997, 2000, 2001,
2003, Performance 1996, SDPS 1996, MASCOT 1994, 2001.
• Referee for NSF Proposals, IEEE/ACM Transactions on Networking, IEEE Transactions on Computers,
IEEE Computer Magazine, IEEE Transactions on Software Engineering.
• Member of IEEE Computer Society, IEEE Communications Society, and ACM.
Grants and Awards
1998-1999: MICRO Grant. Title “Emerging Customer Data Network Management.” (Industry support
committed from SBC). PIs: Biswanath Mukherjee and Dipak Ghosal.
1997-2002: NSF Career Award. Proposal Title “A Career Development Plan for Research and Education in
High Speed Networks.” PI: Dipak Ghosal
1998-2003: NSF Award. Proposal Title “Complementing Internet Caching with Pseudo-serving to Mitigate
Network Congestion.” PIs: Dipak Ghosal and Louis S Hakimi
2002-2003: HP Technology Award, Mobile Technology Solutions Grant. PIs: Prasant Mohapatra and
Dipak Ghosal.
2003–2004: Sandia Labs. Title “Application of Mobile Ad Hoc and Sensor Networks for Facilities
Protection,” PI: Dipak Ghosal.
2003–2004: Los Alamos National Labs. Title “Wide-Area Transport and Signaling Protocols for Genome
To Life (GTL) Applications.” PIs: Biswanath Mukherjee, Dipak Ghosal, and Wu-Fung Chung.
2003–2005: NSF Award. Proposal Title “Security Architecture for IP Telephony.” PIs: Dipak Ghosal and S.
Felix Wu.
2003–2004: California Institute for Energy Efficiency (CIEE). Proposal Title “Enabling Demand Response
with Vehicular Mesh Networks (VMesh).” Status: pending. PIs: Chen-Nee Chuah, Dipak Ghosal, and
Michael H. Zhang.
Patents/Inventions
Keith Kong and Dipak Ghosal, “A Self-Scaling Scheme for Avoiding Server-Side Congestion in the
Internet,” Approved October 2002, US Patent 6,473,401 B1
Names of graduate and post-graduate advisors, advisees, and collaborators
• Ph.D. advisor: Dr. Laxmi N. Bhuyan, Professor, Department of Computer Science, University of
California, Riverside.
• Post-doctoral advisor: Dr. Satish K. Tripathi, Dean of Engineering and Johnsons Professor of Engineering,
University of California, Riverside.
• Advisees: Jennifer Yick, Howard Cheung, Archana Bhratidhasan, Vijay Ponduru, Brennen Reynolds, Julee
Pandya, Jeremy Abramson, James Xiao-yan Fang, Keith Kong, Vijoy Pandey, Sujatha Balaraman, Xiaoxin
Wu, Raja Mukhopadhaya, Ashok Swamy, Arijit Mukherji, Narana Kannappan.
• Research Collaborators: Biswanath Mukherjee, Randy Katz, Rajeev Motwani, Matthew Caesar, T. V.
Lakshman, Tsong-Ho Wu, Gopal Mempat, Jonathan Chao, Debanjan Saha, Satish Tripathi, Erol Gelenbe,
Giuseppe Serazzi.
XIN LIU
Department of Computer Science
University of California
Davis, CA 95616
E-mail: liu@cs.ucdavis.edu
Tel.: (530) 754-6907
Fax: (530) 752-4767
http://www.cs.ucdavis.edu/~liu
EDUCATION
2002 Ph.D. Electrical & Comp. Engineering, Purdue University
1997 M.S. Electrical Engineering, Xi’an Jiaotong University
1994 B.S. Electrical Engineering, Xi’an Jiaotong University
ACADEMIC APPOINTMENTS
2003 Assistant Professor/Computer Science, University of California, Davis
2002-2003 Post-doctoral Research Associate, Univ. of Illinois, Urbana-Champaign
CURRENT RESEARCH INTERESTS
Wireless Networks; Network Security
AWARDS
2003 Best Paper Award, Computer Networks (Elsevier) Journal, for the paper "A Framework for
Opportunistic Scheduling in Wireless Networks."
RECENT RESEARCH PUBLICATIONS
1. X. Liu, E. K. P. Chong, and N. B. Shroff, “A Framework for Opportunistic Scheduling in Wireless
Networks,” Computer Networks, vol. 41, no. 4, pp. 451-474, March, 2003.
2. X. Liu, E. K. P. Chong, and N. B. Shroff, “Opportunistic Transmission Scheduling with Resource-Sharing
Constraints in Wireless Networks,” IEEE Journal on Selected Areas in Communications, vol. 19, no. 10,
pp. 2053-2064, October, 2001.
3. X. Liu, E. K. P. Chong, and N. B. Shroff, “Joint Scheduling and Power-Allocation for Interference
Management in Wireless Networks,” Proceedings of the 2002 IEEE Vehicular Technology Conference,
Vancouver, Canada, September, 2002, vol. 3, pp. 1892– 1896.
4. X. Liu, E. K. P. Chong, and N. B. Shroff, “Efficient Scheduling in Wireless Networks,” Proceedings of the
2001 IEEE INFOCOM, Alaska, April 2001, vol. 2, pp.
5. X. Liu and R. Srikant, “The Timing Capacity of Single-server Queues with Multiple Input and Output
Terminals,” to appear in the Proceedings of the DIMACS Workshop on Network Information Theory,
2004.
PROFESSIONAL SERVICE
• Member, Technical Program Committee, IEEE INFOCOM 2003-2004
• Referee for IEEE/ACM Transactions on Networking, IEEE Transactions on Computers, IEEE Computer
Magazine
• Member of IEEE Computer Society, IEEE Communications Society, and ACM.
Graduate and Postdoctoral Advisors
• Edwin K. P. Chong, Dept. of Elec. & Comp. Engr., Colorado State University
• Ness B. Shroff, Dept. of Elec. & Comp. Engr., Purdue University
• R. Srikant, Dept. of Elec. & Comp. Engr., University of Illinois