SCHEDULING FILE TRANSFERS ON A CIRCUIT-SWITCHED NETWORK
DISSERTATION
for the Degree of
DOCTOR OF PHILOSOPHY (Electrical Engineering)
Hojun Lee
May 2004
SCHEDULING FILE TRANSFERS ON A CIRCUIT-SWITCHED NETWORK
DISSERTATION
Submitted in Partial Fulfillment
of the Requirements for the
Degree of
DOCTOR OF PHILOSOPHY (Electrical Engineering)
at the
POLYTECHNIC UNIVERSITY
by
Hojun Lee
May 2004
Approved:
__________________________
Department Head
__________________________
Date
Copy No.____
Approved by the Guidance Committee:
Major:
Electrical Engineering
________________________
Malathi Veeraraghavan
Professor of
Electrical and Computer Engineering
________________________
Date
Major:
Electrical Engineering
________________________
Shivendra S. Panwar
Professor of
Electrical and Computer Engineering
________________________
Date
Minor:
Electrical Engineering
________________________
Torsten Suel
Professor of
Computer and Information Science
________________________
Date
Minor:
Electrical Engineering
________________________
Edwin K. P. Chong
Professor of
Electrical and Computer Engineering
Colorado State University
________________________
Date
Microfilm or other copies of this dissertation are obtainable from:
UMI Dissertations Publishing
Bell & Howell Information and Learning
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, Michigan 48106-1346
VITA
Hojun Lee was born in Pusan, Korea on February 22, 1971. He received the B.S.
degree in Electrical Engineering from Polytechnic University, Brooklyn, in 1997. He
received the M.S. degree in Electrical Engineering from Columbia University, New
York, in 1999. He is currently working toward the Ph.D. degree in Electrical Engineering
at the Polytechnic University.
He received a fellowship from Village Networks, Eatontown, NJ, from 2001 to 2002,
working on a network throughput comparison of optical metro ring architectures.
Mr. Lee is a student member of the IEEE, a member of KSEA (Korean-American
Scientists and Engineers Association), and a member of KSA (Korean Student
Association) at Polytechnic University.
With grateful thanks to my parents
ACKNOWLEDGEMENTS
I would like to thank my Ph.D. thesis supervisor, Professor Malathi Veeraraghavan, who
influenced me most during my years at Polytechnic. Professor Veeraraghavan taught me
how to look for new areas of research, how to understand the state of the art quickly,
how to write good technical papers, and how to present my ideas effectively.
I thank Professor E. K. P. Chong, Professor S. Panwar, and Professor T. Suel for
serving on my defense committee.
A special thanks to Hua Li from Colorado State University. He not only provided help
with packet-switched system simulations but also offered valuable insights into my
dissertation, especially on discrete-time unit simulation.
I would like to acknowledge my fellow students at Polytechnic and the University of
Virginia, including Jaewoo Park, Seunghun Cha, Sungjun Lee, Jongha Lee, Sangwook
Suh, Jeff Tao, Xuan Zheng, and Tao Li.
I also owe deep thanks to my friends in New York who supported me during my
studies at Polytechnic: Jenny Kim, Seiwoon Kim, and Jaehuk Lee.
Pursuing a Ph.D. requires not only technical skill but also a tremendous amount of
stamina and courage. I would like to thank my parents, Kiehwa Lee and Yangja Yoo, and
my sister, Inha Lee, for sharing their unconditional love with me and giving me the
courage required to pursue my goals at Polytechnic.
AN ABSTRACT
SCHEDULING FILE TRANSFERS ON A CIRCUIT-SWITCHED NETWORK
by
Hojun Lee
Advisor: Malathi Veeraraghavan
Submitted in Partial Fulfillment of the Requirements
for the degree of Doctor of Philosophy (Electrical Engineering)
May 2004
In the current Internet, files are typically transferred using application-layer protocols
such as http and ftp, with TCP as the transport protocol. TCP has been studied
extensively and has proven its worth as a reliable protocol under a variety of network
conditions and applications. However, current TCP implementations are not adequate to
support the high performance required by applications such as those encountered in
eScience projects (large file transfers). While others are working to improve TCP for
high-speed networks, we propose an end-to-end optical circuit-switched solution called
Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH). This
solution is proposed on an add-on basis to the basic Internet service already available to
end hosts, which has significant advantages. It allows the optical circuit-switched network
to be run in a call-blocking mode: given the presence of the primary path through the
Internet, an end host can fall back to the TCP/IP path if its call request is blocked. We
analyze this mode of operation.

We also define a call-scheduling mode of operation for the optical circuit-switched
network. This scheme is based on a new varying-bandwidth list scheduling approach,
which overcomes the main drawback of using circuit-switched networks for file
transfers. Adopting this scheme in CHEETAH instead of the call-blocking mode can
yield further gains, i.e., lower file-transfer delays. For example, for a very large file
(e.g., 1TB), an end host may complete its transfer sooner on a scheduled end-to-end
circuit than on the TCP/IP backup path.
Table of Contents

Chapter 1: Background and Problem Statement
Chapter 2: Proposed CHEETAH service and its application to file transfers
  2.1 Equipment
  2.2 Optical Connectivity Service (OCS)
  2.3 Hardware acceleration of signaling protocol implementations
  2.4 Transport protocol used over the Ethernet/EoS circuit
Chapter 3: Operating the optical circuit-switched network in call-blocking mode
  3.1 Analytical model
  3.2 Numerical results
    3.2.1 Numerical results for transfer delays of "large" files
    3.2.2 Numerical results for transfer delays of "small" files
    3.2.3 Optical circuit-switched network utilization considerations
    3.2.4 Implementation of routing decision algorithm
Chapter 4: Call-queueing/scheduling mode
  4.1 Scheduling file transfers on a single link
    4.1.1 Varying-Bandwidth List Scheduling (VBLS) overview
    4.1.2 Detailed description of VBLS
    4.1.3 VBLS with Channel Allocation (VBLS/CA) overview
    4.1.4 Detailed description of VBLS/CA
  4.2 Analysis and simulation results
    4.2.1 Traffic model
    4.2.2 Validation of simulation against analytical results
    4.2.3 Sensitivity analysis
    4.2.4 Comparison of VBLS with FBLS and PS
      4.2.4.1 Numerical results
    4.2.5 Practical considerations
Chapter 5: Multiple-link cases (centralized and distributed)
  5.1 VBLS algorithm for multiple-link case
  5.2 Analysis and simulation results
    5.2.1 Traffic model
    5.2.2 Sensitivity analysis
Chapter 6: Conclusions and future work
Appendices
List of Figures

Figure 1. Current architecture: IP routers interconnect different types of networks. CHEETAH enables direct Ethernet/EoS circuits between hosts (see dashed lines and text in italics). File transfers between end hosts in enterprise building 1 and enterprise building 2 have a choice of two paths: (i) the TCP/IP path through primary NICs, Ethernet switches, leased circuits I and II, and IP router I; (ii) the Ethernet/EoS circuit through secondary NICs, MSPPs, and the optical circuit-switched network
Figure 2. Plot of equation (2) for large files with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$
Figure 3. Plot of equation (2) for large files with a link rate of 1Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$
Figure 4. Plot of equation (2) for small files with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 4$
Figure 5. Plot of equation (2) for small files with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$
Figure 6. Plot of utilization $u$ with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, $k = 20$
Figure 7. Single link mode
Figure 8. Example of $\mu(t)$: $P_1 = 0$, $\mu(0) = 0$, $P_2 = 10$, $z_{max} = 9$, and $P_{z_{max}} = P_9 = 80$
Figure 9. Shaded area shows the allocation for the example 50MB file transfer, with $T^i_{req} = 50$ and $R^i_{max} = 2$ channels; per-channel link capacity is 1Gbps and one time unit is 10ms
Figure 10. Per-channel availability: $G_{11} = 10$, $H_{11} = 30$, $n_1 = 2$; the $\mu(t)$ shown in Figure 8 is derived from this $A_j(t)$. Lines indicate time ranges when the channel is available
Figure 11. Dashed lines show the allocation of resources for the example 75MB file transfer described above with $T^i_{req} = 10$ and $R^i_{max} = 2$ channels
Figure 12. File latency comparison between analytical and simulation results
Figure 13. A comparison of VBLS file latency and mean file transfer delay for files requesting the same $R^i_{max}$; link capacity is 100 channels
Figure 14. File latency comparison for two different values of k (500MB and 10GB) while keeping p constant at 10GB
Figure 15. Frequencies for different file size ranges (total number of bins = 50; bin size = 1.99GB or 1.8GB)
Figure 16. File throughput metric for files requesting three different values for $R^i_{max}$: 1, 2, and 4 channels (link capacity = 10Gbps)
Figure 17. File throughput metric for files requesting three different values for $R^i_{max}$: 1, 5, and 10 channels (link capacity = 1Gbps)
Figure 18. File throughput comparison for different values of $T_{discrete}$ (0.05, 0.5, 1, and 2 sec)
Figure 19. Utilization comparison for different values of $T_{discrete}$ (0.05, 0.5, 1, and 2 sec)
Figure 20. File throughput metric for files requesting $R^i_{max}$ of 1 channel (10Gbps)
Figure 21. File throughput metric for files requesting $R^i_{max}$ of 5 channels (50Gbps)
Figure 22. File throughput metric for files requesting $R^i_{max}$ of 10 channels (100Gbps)
Figure 23. VBLS on a path of K hops
Figure 24. Example of $\mu_1(t)$: $P_1 = 0$, $\mu(0) = 0$, $P_2 = 10$, $z_{max} = 9$, and $P_{z_{max}} = P_9 = 80$ - Link 1 (shaded area shows the allocation for the example 25MB file transfer)
Figure 25. Example of $\mu_2(t)$: $P_1 = 0$, $\mu(0) = 0$, $P_2 = 10$, $z_{max} = 9$, and $P_{z_{max}} = P_9 = 80$ - Link 2 (shaded area shows the allocation for the example 25MB file transfer)
Figure 26. Network model
Figure 27. Percentage of blocked calls comparison for different values of M when $T_{discrete} = 0.01$ sec and $p_{12} = p_{23} = 5$ms
Figure 28. File throughput comparison for different values of M when $T_{discrete} = 0.01$ sec and $p_{12} = p_{23} = 5$ms
Figure 29. Percentage of blocked calls comparison for different values of $T_{discrete}$ when M = 3 and $p_{12} = p_{23} = 5$ms
Figure 30. File throughput comparison for different values of $T_{discrete}$ when M = 3 and $p_{12} = p_{23} = 5$ms
Figure 31. Dashed lines show the allocation of resources for the example 15.625MB file transfer described above with $T^i_{req} = 15$ and $R^i_{max} = 3$
Figure 32. Dashed lines show the allocation of resources for the example 3.125MB file transfer described above with $T^i_{req} = 32$ and $R^i_{max} = 1$
List of Tables

Table 1. Input parameters plus the time to transfer a 1GB file and a 1TB file
Table 2. Crossover file sizes in the [5MB, 1GB] range when r = 1Gbps, $T_{prop}$ = 0.1ms, and k = 20
Table 3. Crossover file sizes when r = 100Mbps and $T_{prop}$ = 0.1ms
Table 4. Notation for VBLS
Table 5. Additional notation
Table 6. Notation for the multiple-link case
Table 7. Input parameters for example 1
Table 8. TRL vectors for each round (Example 1)
Table 9. Input parameters for example 2
Table 10. TRL vectors for each round (Example 2)
Chapter 1. Background and Problem Statement
Files are commonly transferred on the Internet using application-layer protocols such
as http and ftp, with TCP as their transport protocol. Since file transfers do not have a
stringent delay requirement, it is quite acceptable to incur the retransmission delays
associated with TCP's error-correction mechanism or the rate slowdowns associated with
TCP's congestion-control mechanism. Since file transfers are typically small (an average
of 10KB per flow has been cited in [1]), even with retransmission delays, the total time
for the transfers is small enough to ignore the excess delays caused by retransmissions or
rate slowdowns. However, there are some applications that require the transfer of large
files [2]-[6]. Of particular interest is the effective throughput of large-file transfers, e.g.,
terabyte- and petabyte-sized ($10^{15}$ bytes) files created in particle physics, earth
observation, bioinformatics, radio astronomy, and other scientific studies, for which
current TCP has been shown to be inadequate [7]-[8]. One set of solutions calls for
enhancing TCP to improve end-to-end throughput, thus limiting upgrades to the end
hosts. Such improvements can be made via congestion control [9]-[11] and/or flow
control [12]-[14]. A second set of solutions requires upgrades to routers within the
Internet. For example, Mathis [15] proposes the use of a larger Maximum Transmission
Unit (MTU) to improve end-to-end throughput. Given that the Internet is a global
network of IP routers that interconnects different types of networks, as illustrated in
Figure 1, research aimed at improving file-transfer performance between any two hosts
connected via the Internet must focus on enhancing TCP and/or IP, as is currently being
done in these two sets of solutions.
We propose a third set of solutions, which we call Circuit-switched High-speed
End-to-End Transport ArcHitecture (CHEETAH). This solution leverages the dominance
of Ethernet in LANs and of SONET in MANs and WANs, the deployment of optical
fiber to enterprises, and the deployment of a new network system called the
Multi-Service Provisioning Platform (MSPP) in enterprises. In this solution, hosts would
be equipped with second (high-speed) Ethernet NICs, which would be connected directly
to ports of enterprise MSPPs. These MSPPs have the capability of encapsulating Ethernet
frames into SONET frames for transport on wide-area segments. By establishing a
wide-area SONET/SDH circuit between the two enterprise MSPPs, and mapping Ethernet
signals from/to hosts to/from this wide-area circuit, we can realize end-to-end high-speed
"Ethernet/EoS circuits."

Clearly, this solution has limited applicability compared to the first two sets of
solutions, because it can be used only when both ends have access to the CHEETAH
service and are interconnected by the same optical network. However, if a given network
has a large coverage area, this solution will be useful for many transfers. Current-day
SONET/SDH/WDM circuit-switched networks of different service providers are largely
isolated, but as standards evolve, they can be interconnected directly to achieve larger
coverage areas. A few research optical networks, such as the Canarie network [16],
extend coast-to-coast across Canada.
Figure 1: Current architecture: IP routers interconnect different types of networks.
CHEETAH enables direct Ethernet/EoS circuits between hosts (see dashed lines and text
in italics). File transfers between end hosts in enterprise building 1 and enterprise
building 2 have a choice of two paths: (i) the TCP/IP path through primary NICs,
Ethernet switches, leased circuits I and II, and IP router I; (ii) the Ethernet/EoS circuit
through secondary NICs, MSPPs, and the optical circuit-switched network
Since we cannot provision wide-area Ethernet-over-SONET circuits between every
pair of end hosts that need to communicate, the circuit-switched network has to support
dynamic circuit setup and release. Not only have standards been specified for signaling
protocols for SONET/SDH/WDM networks [17], these signaling protocols have also
been implemented in many vendors' switches [18]. If the optical network is operated on
a dynamic shared basis, then it is possible that circuit resources may not be available
when a call request is being processed. In this case, the network can either (i) block the
call, i.e., reject the call setup request, or (ii) queue the call.
Blocking calls is indeed an option in our proposed mode of operation, because hosts
subscribing to the CHEETAH service are equipped with second Ethernet cards so as not
to interfere with the host's primary Internet service. The implication is that if a call is
blocked for lack of resources through the optical circuit-switched network, the end host
can fall back to the TCP/IP path through its primary NIC. There are many advantages to
this mode of operation, which, of course, comes at the cost of additional Ethernet cards.
We describe this solution in detail in Chapter 3, in which we define the conditions under
which an Ethernet/EoS circuit setup should be attempted. For example, if the file is very
small, because of the setup overhead and the negative impact on utilization, we
recommend that such a circuit setup not be attempted. Based on the loading conditions
on the two paths, it may happen that even for large files, the TCP/IP path is the preferred
one. However, there will be conditions under which it is worth attempting a circuit
setup, because if it succeeds, the file-transfer delay can be significantly smaller. For
example, a 1GB file transfer on a TCP/IP path with a round-trip time of 50ms, a link rate
of 1Gbps, and a loss probability of 0.0001 takes 395.7 sec, while on a circuit with the
same link rate, the transfer time is only 8.08 sec.
For very large files, e.g., on the order of terabytes, this solution of attempting a circuit
setup and, if it fails, falling back to the TCP/IP path may not be a good one. Instead, if
the circuit-switched network is operated in call-queueing mode, there is a likelihood that
the total file-transfer delay, even after being queued for a circuit, could be lower than the
delay through the TCP/IP path. For example, we computed that we need over 4 days and
15 hours to transfer a 1TB file with TCP if the round-trip propagation delay is 50ms and
the bottleneck link rate is 1Gbps, even if the packet loss rate on the end-to-end path is as
low as 0.0001. On the other hand, if we successfully set up a 1Gbps circuit, we can
complete the transfer in 2.3 hours. Hence, for this case (large file transfers), we propose
that the optical circuit-switched network be run in a call-queueing mode. Prior work on
call queueing is fairly limited, and in practice, call-queueing systems have not been
implemented or tested extensively. Furthermore, when call holding times are large, as
can be expected in eScience applications, where a remote scientist needs a circuit for a
few hours of experimentation, call queueing is not practical. Instead, we propose
"scheduling calls." We discovered that this is indeed possible for file transfers if the
network is provided with file sizes.
Prior work on scheduling file transfers/calls, making "book-ahead" reservations, and
packing algorithms includes [19]-[30]. In [19], Coffman describes file-transfer
scheduling schemes for a star network, assuming all transfers are of unit bandwidth but
arbitrary durations. The paper by Erlebach and Jansen [20] extends this work to star and
tree networks with arbitrary bandwidths and arbitrary durations. Both papers obtain
competitive ratios for online greedy heuristics called list-scheduling algorithms,
characterizing their performance when compared to offline optimal solutions. The metric
being compared is makespan, the total time needed to transfer a set of files. The basic
list scheduling (LS) algorithm works as follows: if there is a call i such that its required
bandwidth $b_i$ is available on all links of the end-to-end path, LS schedules the first
such call in the list of all calls; if not, it waits until one of the active transfers completes
(see the sketch below). This heuristic is extended for arbitrary bandwidths in a star
network as Decreasing Bandwidth List Scheduling (DBLS), and for trees as the
Level-List Scheduling (LLS) and List Scheduling by Levels (LSL) schemes [20]. Both
[19] and [20] require file-transfer requests to specify a bandwidth requirement, and
schedule the constant bandwidth requested for each transfer.
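As a concrete illustration of the basic LS heuristic on a single link, here is a sketch in Python (our own illustration, not code from [19] or [20]; the call representation and event loop are hypothetical):

```python
import heapq

def list_schedule(calls, link_capacity):
    """Basic list scheduling (LS) on a single link. Each call is a
    (required_bandwidth, duration) pair; LS starts the first queued call
    whose bandwidth fits, else waits for an active transfer to complete.
    Assumes each call's bandwidth is at most the link capacity."""
    pending = list(enumerate(calls))  # preserve list order
    active = []                       # min-heap of (finish_time, bandwidth)
    free, now, schedule = link_capacity, 0.0, []
    while pending:
        started = True
        while started:                # start every call that currently fits,
            started = False           # always rescanning from the list head
            for i, (idx, (bw, dur)) in enumerate(pending):
                if bw <= free:
                    free -= bw
                    heapq.heappush(active, (now + dur, bw))
                    schedule.append((idx, now))
                    pending.pop(i)
                    started = True
                    break
        if pending:                   # nothing fits: jump to next completion
            finish, bw = heapq.heappop(active)
            now, free = finish, free + bw
    return schedule

# Three calls of (bandwidth, duration) on a link of capacity 10:
# call 0 starts at t=0, call 2 fits beside it, call 1 must wait until t=5.
print(list_schedule([(6, 5.0), (6, 2.0), (4, 1.0)], 10))
```

Note that LS allocates each call its requested constant bandwidth for its whole duration; the varying-bandwidth approach of Chapter 4 relaxes exactly this restriction.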
For arbitrary topologies, the file-scheduling problem has the added dimension of route
selection. In [21], four greedy algorithms are proposed, all of which appear to be focused
on the route-selection problem rather than on file scheduling. Although the file-transfer
request specifies the file size (as in our approach), the network considered is
packet-switched: the whole link bandwidth is used for each transfer, and buffers are
assumed at all switches. Thus, it differs from our problem, which is aimed at
circuit-switched networks.
There has been some work on book-ahead (BA) reservations, in which a user requests
that a connection be set up at some future time. In such reservations, the duration of the
call is typically specified, either as a deterministic number or as a distribution. Papers on
this topic include [22]-[25]. All of these papers develop and analyze algorithms,
centralized or distributed, to perform BA reservations, but only in conjunction with call
blocking; none address the issue of call queueing or scheduling. In other words, when a
user makes a BA request, the network analyzes the probability of honoring this request
at the desired future start time if the call is admitted now. If this probability is below
some threshold, the network simply blocks the call. None of these papers address the
concept of giving the BA call a delayed start time relative to the requested start time.
Some of these papers [22]-[25] analyze schemes that allow for the sharing of resources
between immediate-request (IR) calls and BA calls. They typically assume that IR calls
do not specify their expected durations. In contrast, we require all calls, IR and BA, to
specify their file sizes.
In bin packing problems, the goal is to pack a given set of blocks into finite-sized bins
using the smallest number of bins [26]. In container loading problems, the goal is to fill a
single bin of infinite height to a minimum possible height [27]. In the knapsack-loading
problem, each block has an associated profit and the problem is to select those blocks that
will maximize profit when loaded into the bin [28]. This classification of packing
problems is obtained from [29]. In none of these problems can the blocks be broken up
into pieces to fit into bins, which is what happens when we schedule files with varying
bandwidth allocation in different time ranges. Thus, the heuristics proposed for these
problems do not directly apply. Solutions to the job-shop scheduling problem [30]
typically compute optimal schedules that can be used for the offline problem equivalent
to our online problem. Our problem is online because resources have to be scheduled for
a file transfer without information about file transfer requests that arrive subsequent to
the request being scheduled.
We examine the problem of call scheduling with knowledge of file sizes in detail in
Chapters 4 and 5. If such a mechanism is implemented in the optical circuit-switched
network, the end-host application can attempt a circuit setup for very large files, obtain a
completion time for the transfer based on the schedule allocated, and then decide whether
it should resort to the TCP/IP path. If the optical circuit-switched network is heavily
loaded, then indeed the 1TB file-transfer request may result in a scheduled completion
time greater than 4 days and 15 hours; in this case, the end host can resort to the TCP/IP
path.
The CHEETAH approach of growing in a network solution as an add-on to basic
Internet access is comparable to providing people with multiple transportation options.
For example, a traveler between New York and Washington, DC has a choice of flying,
riding a train, or driving, all of which are more-or-less comparable from a time
perspective. Based on conditions on these three transportation networks, the traveler can
choose. With CHEETAH, we provide a similar choice to end-host applications.
Chapter 2. Proposed CHEETAH service and its application to file transfers
Our solution calls for equipping end hosts with second (high-speed) Ethernet NICs and
connecting these NICs directly to MSPPs, as illustrated in Figure 1. MSPPs are then
interconnected across wide-area networks using EoS circuits. The circuits are established
and released dynamically using signaling protocols. Section 2.1 describes the equipment
needed to support the CHEETAH service.
Since the CHEETAH service can only be used for communication between end hosts
located on an optical circuit-switched network, a host requires some support to first
determine whether its correspondent end host (the end host with which it is
communicating) is reachable via an end-to-end Ethernet/EoS circuit. In Section 2.2, we
describe a support service for this purpose called “Optical Connectivity Service (OCS).”
Next, we consider the question of how to use the CHEETAH service for file-transfer
applications. File-transfer sessions require the exchange of many back-and-forth
messages in addition to the actual file transfer. We propose using a TCP connection via
the primary Internet path for such short exchanges, and limiting the use of end-to-end
Ethernet/EoS circuits for the actual file transfers. To achieve high utilization of the
circuit-switched network, we propose (i) setting up the end-to-end high-speed
Ethernet/EoS circuit just prior to the actual transfer and releasing it immediately after the
file transfer, (ii) operating the circuit-switched network in call-blocking mode, (iii) using
circuits only for certain transfers, and (iv) using a unidirectional EoS circuit from the
server to the client (since this is the primary direction of data flow).
The implication of holding circuits only for the duration of file transfers is that call
holding times can be quite small. For example, a 1MB transfer on a 100Mbps link incurs
a transmission delay of only 80ms. This means call setup delays should be kept low and
call handling capacities of switches should be high. Therefore, we recommend a
hardware-accelerated implementation of signaling protocols at MSPPs, Add/Drop
Multiplexers (ADMs), crossconnects and other optical circuit switches. Section 2.3
describes our current work on hardware-accelerated signaling implementations.
In Section 2.4, we consider the question of transport protocols for end-to-end
Ethernet/EoS circuits. We found a transport protocol called Scheduled Transfer (ST), an
ANSI standard [31], which is ideally suited for end-to-end Ethernet/EoS circuits. Section
2.4 describes our data-transport approach.
2.1 Equipment
Due to the “add-on” characteristic of the CHEETAH service, hosts that want access to
this service should be equipped with second Ethernet NICs that are connected “directly”
to the MSPP Ethernet cards as shown in Figure 1. Some of the MSPPs and
SONET/SDH/WDM switches (crossconnects, ADMs) should be enhanced with signaling
protocol engines to handle dynamic call setup and release. Circuits can be provisioned
between nodes that do not have signaling capability. Adding signaling engines to MSPPs
allows for concentration on access links from enterprises. Furthermore, application
software in end hosts should be upgraded to interface with the CHEETAH service.
2.2 Optical Connectivity Service (OCS)
A support service called the “Optical Connectivity Service (OCS)” is proposed to
provide end hosts a mechanism to determine whether or not their correspondent end hosts
have access to the CHEETAH service. OCS can be implemented much like the Domain
Name Service (DNS) with enterprises and service provider networks maintaining servers
with information on end hosts that have access to the CHEETAH service. These servers
would answer queries from end hosts in much the same manner as DNS servers answer
queries for IP addresses and other information. With caching, the delay incurred in this
step can be reduced.
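To illustrate the intended DNS-like behavior, here is a minimal sketch of a client-side OCS lookup with caching; the interface, server function, and hostname are hypothetical, since no wire protocol is defined here:

```python
import time

# Hypothetical OCS client: answers "does this host have CHEETAH access?"
# much as a DNS stub resolver answers address queries, with caching.
class OCSClient:
    def __init__(self, query_server, ttl=300.0):
        self.query_server = query_server   # function: hostname -> bool
        self.ttl = ttl
        self.cache = {}                    # hostname -> (answer, expiry)

    def has_cheetah_access(self, hostname):
        entry = self.cache.get(hostname)
        if entry and entry[1] > time.time():
            return entry[0]                # cached answer avoids a query round trip
        answer = self.query_server(hostname)
        self.cache[hostname] = (answer, time.time() + self.ttl)
        return answer

# Toy directory standing in for enterprise/provider OCS servers
directory = {"server.enterprise2.example": True}
ocs = OCSClient(lambda h: directory.get(h, False))
print(ocs.has_cheetah_access("server.enterprise2.example"))  # True, then cached
```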
2.3 Hardware acceleration of signaling protocol implementations
Processing signaling protocol messages involves many data table reads/writes,
parsing/constructing complex messages, maintaining state information, managing timers,
etc. For example, consider call setup. Upon receiving a call setup message, a call
processor needs to parse out parameters, such as the destination address and requested
bandwidth, and then perform several actions. First, it determines the next-hop switch
through which to reach the destination, typically by consulting a precomputed routing
table (similar to the longest-prefix match operation in IP routers). Second, it selects an
available. Third, it selects free time-slots and/or wavelengths on the selected interface.
Finally, it programs the switch fabric by writing a switch configuration table. This table is
used by the switch to route data bits received at/on a given timeslot/wavelength on an
incoming interface to a given timeslot/wavelength on a corresponding outgoing interface.
Other actions performed by the signaling protocol processing engines at switches include
updating state information and constructing the outgoing signaling message. Similar
actions are performed for circuit release.
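The per-switch setup actions can be summarized in a short sketch; the routing table, interface records, and switch configuration table below are simplified stand-ins for the state a real signaling engine would maintain:

```python
def process_setup(msg, routing_table, interfaces, switch_config):
    """Sketch of call-setup processing at one switch (error handling,
    state updates, and forwarding of the outgoing Setup message omitted)."""
    # 1. Route lookup: next-hop switch for the destination
    next_hop = routing_table[msg["dest"]]
    # 2. Interface selection: a link to that next hop with enough free capacity
    intf = next(i for i in interfaces
                if i["next_hop"] == next_hop and i["free_slots"] >= msg["slots"])
    # 3. Time-slot selection: grab free time-slots/wavelengths on that interface
    out_slots = [intf["slot_map"].pop() for _ in range(msg["slots"])]
    intf["free_slots"] -= msg["slots"]
    # 4. Fabric programming: map (incoming port, slot) to (outgoing port, slot)
    #    in the configuration table consulted by the data path
    for in_slot, out_slot in zip(msg["in_slots"], out_slots):
        switch_config[(msg["in_port"], in_slot)] = (intf["port"], out_slot)
    return {"out_port": intf["port"], "out_slots": out_slots}

# Hypothetical example: a 2-slot setup request arriving on port 0
interfaces = [{"next_hop": "B", "port": 1, "free_slots": 3, "slot_map": [5, 6, 7]}]
config = {}
msg = {"dest": "hostX", "slots": 2, "in_port": 0, "in_slots": [1, 2]}
print(process_setup(msg, {"hostX": "B"}, interfaces, config), config)
```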
Accelerating signaling protocol processing engines is a challenging task. In [32], our
colleagues designed the signaling protocol specifically for SONET networks with a goal
of achieving high performance rather than flexibility. They implemented the basic and
frequently used operations in Field Programmable Gate Arrays (FPGAs), and relegated
the complex and infrequently used operations (e.g., processing of optional parameters
and error handling) to software. They modeled the signaling protocol in VHDL and then
mapped it onto two FPGAs on the WILDFORCE™ reconfigurable board: a Xilinx
XC4036XLA FPGA with 62% resource utilization and an XC4013XLA with 8%
resource utilization. The hardware implementation handles four messages: Setup,
Setup-success, Release, and Release-confirm. From the timing simulations, done using
the ModelSim simulator, call setup message processing consumes between 77 and 101
clock cycles. Assuming a 25MHz clock, this translates into 3.08-4 μs. Compare this
with the millisecond-scale software implementations of signaling protocols [33].
2.4 Transport protocol used over the Ethernet/EoS circuit
In this section, we consider the question of what transport protocol to use on these
end-to-end high-speed Ethernet/EoS circuits. TCP is a poor choice for dedicated
end-to-end circuits because of its slow start and congestion avoidance algorithms. Also,
TCP's window-based flow control and positive-ACK-based error control scheme are not
well suited to dedicated end-to-end circuits. Hence we considered a number of other
transport protocols, including high-speed transport protocols such as [34]-[35] and
OS-bypass protocols [36]-[38]. Of these, we selected the Scheduled Transfer (ST)
protocol, which is an ANSI standard [31] and is ideally suited for end-to-end circuits
carrying Ethernet frames.

ST provides sufficient hooks to allow for a high-speed, OS-bypass implementation, a
feature that is necessary to achieve true high-speed end-to-end throughput. It does this by
having the sender specify a receiver memory address in the data block header, which
causes the receiving NIC to simply write the received payload using Direct Memory
Access (DMA) into the specified memory location. This results in a low end-host
transport-layer delay. ST offers flexibility in its flow control and error control schemes.
For flow control, we propose using a rate-control approach in which the circuit rate is
selected taking into account the rate at which the receiving application can process
received data from memory. An alternative is to have the receiver allocate a large-enough
buffer space for the entire file prior to the start of the transfer. This solution, however,
limits the maximum size of files that can be transferred, which may anyway be necessary
from a network circuit-sharing perspective; that is, we limit file sizes to a Maximum
File Transfer Size (MFTS) per session. For error control, we propose using ST's support
for negative acknowledgments (NAKs), given that data blocks will be delivered in
sequence on the Ethernet/EoS circuit. Missing/errored blocks resulting from bit errors
will need to be retransmitted; ST supports the selective repeat approach.
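Because blocks arrive in sequence on the circuit, the receiver can detect a loss as a jump in block numbers; the following sketch of NAK generation is our own illustration, not the ST wire format:

```python
def on_block(expected, received_seq):
    """In-sequence delivery means any gap in block numbers corresponds to
    blocks lost to bit errors; NAK exactly those so the sender can
    retransmit them selectively (selective repeat).
    Returns (next expected block number, block numbers to NAK)."""
    if received_seq == expected:
        return expected + 1, []
    return received_seq + 1, list(range(expected, received_seq))

expected = 0
for seq in [0, 1, 4, 5]:            # blocks 2 and 3 corrupted in transit
    expected, naks = on_block(expected, seq)
    if naks:
        print("NAK", naks)           # -> NAK [2, 3]
```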
Chapter 3. Operating the optical circuit-switched network in call-blocking mode
In this chapter, we analyze the case when the optical circuit-switched network is
operated in call-blocking mode. If a call is blocked, the end-host application falls back to
the TCP/IP path. Given that end hosts with access to the CHEETAH service have two
paths for certain transfers, they have to make a routing decision on whether or not to
attempt setting up an Ethernet/EoS circuit. Such an attempt is not always a good idea, as
we show through the analysis below. On the other hand, in some circumstances, if there
is a large difference in the delays on the two paths, it may be well worth attempting the
circuit setup, as we also show in the analysis below.
3.1 Analytical model
Let $E[T_{cheetah}]$ be the mean delay incurred if an Ethernet/EoS circuit setup is attempted
prior to the file transfer:

$$E[T_{cheetah}] = (1 - P_b)\,(E[T_{setup}] + T_{transfer}) + P_b\,(E[T_{fail}] + E[T_{tcp}]) \qquad (1)$$

where $P_b$ is the call-blocking probability on the optical circuit-switched network,
$E[T_{setup}]$ is the mean call-setup delay of a successful circuit setup, $T_{transfer}$ is the time to
transfer the file on the Ethernet/EoS circuit, $E[T_{fail}]$ is the mean delay incurred in a failed
call-setup attempt, and $E[T_{tcp}]$ is the mean delay incurred in sending the file on the
TCP/IP path. If the call is not blocked, the mean delay experienced is $E[T_{setup}] + T_{transfer}$;
but if it is blocked, then after incurring a cost $E[T_{fail}]$, the end host has to use the TCP/IP
path and hence will incur the $E[T_{tcp}]$ delay. Comparing $E[T_{tcp}]$, the delay incurred if a
circuit setup is not attempted, with $E[T_{cheetah}]$, the delay incurred if a circuit setup is
attempted, and approximating $E[T_{fail}]$ to be equal to $E[T_{setup}]$, results in:
$$\begin{cases}
\text{use TCP/IP path} & \text{if } \dfrac{E[T_{setup}]}{1 - P_b} \geq E[T_{tcp}] - T_{transfer} \\[2mm]
\text{attempt circuit setup} & \text{if } \dfrac{E[T_{setup}]}{1 - P_b} < E[T_{tcp}] - T_{transfer}
\end{cases} \qquad (2)$$
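Decision rule (2) is straightforward to apply at an end host; the helper below is our own transcription (delays in seconds), using the 1GB example values from Chapter 1 for illustration:

```python
def attempt_circuit_setup(e_t_setup, e_t_tcp, t_transfer, p_b):
    """Decision rule (2): attempt an Ethernet/EoS circuit setup only if the
    setup cost, inflated by the blocking probability, is smaller than the
    delay saved relative to the TCP/IP path."""
    return e_t_setup / (1.0 - p_b) < e_t_tcp - t_transfer

# 1GB file, 1Gbps link, Ploss = 0.0001, Tprop = 50ms: E[Ttcp] ~ 395.7s,
# circuit transfer time ~ 8s, E[Tsetup] ~ 55.3ms; even at Pb = 0.3 the
# rule says to attempt the circuit setup.
print(attempt_circuit_setup(0.0553, 395.7, 8.0, 0.3))  # True
```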
Next, we obtain expressions for $E[T_{tcp}]$, $E[T_{setup}]$, and $T_{transfer}$. $E[T_{tcp}]$ is obtained using
the models of [39]-[40], which capture the time spent in slow start, $E[T_{ss}]$; the expected
cost of a recovery following the first loss, $E[T_{loss}]$; the time spent in congestion
avoidance, $E[T_{ca}]$; and the time to delay the ACK for the initial segment, $E[T_{delayack}]$:

$$E[T_{tcp}] = E[T_{ss}] + E[T_{loss}] + E[T_{ca}] + E[T_{delayack}] \qquad (3)$$
$E[T_{ss}]$ is a function of the Round-Trip Time (RTT); $W_{max}$, the limitation posed by the
sender or receiver window; $w_1$, the initial congestion window; $P_{loss}$, the loss rate; the
number of data segments in the file transfer; and the number of segments for which an
ACK is generated (for example, if an ACK-every-other-segment strategy is used, this
number is 2). The $E[T_{loss}]$ term is a function of $T_o$, the average duration of a first
time-out in a sequence of one or more time-outs; the RTT; and the probability of the first
loss being detected with a retransmission time-out or with a triple-duplicate ACK. The
reader is referred to [39] for details of these two terms, $E[T_{ss}]$ and $E[T_{loss}]$. $E[T_{ca}]$ is
a function of the number of data segments in the file transfer, $P_{loss}$, RTT, $T_o$, and $W_{max}$,
as derived in [39]. We set the final term $E[T_{delayack}]$ to 0 because we assume a starting
initial window size of 2 [41] and the ACK-every-other-segment strategy. We do not
include TCP connection setup time, assuming that the connection is already open
(because messaging needed prior to the actual file transfer, such as the name of the file
being requested, would require the TCP connection to be opened first). If this is the first
file transfer within an application session, the actual transfer starts in the slow-start
phase. For subsequent transfers, the window could potentially be in a larger state, but
idle periods between transfers are often long, and [41] specifies that the congestion
window should be reset to a Restart Window size (2 segments) whenever the session is
idle for more than one retransmission timeout. Hence we assume that all file transfers
start in the slow-start phase.
The mean call setup delay $E[T_{setup}]$¹ includes mean signaling-message transmission
delays, mean call-processing delays (to process signaling protocol messages), and a
round-trip propagation delay:

$$E[T_{setup}] = \frac{m_{sig}}{r_s}\left(1 + \frac{\rho_{sig}}{2(1 - \rho_{sig})}\right)(k + 1) + T_{sp}\left(1 + \frac{\rho_{sp}}{2(1 - \rho_{sp})}\right)k + T_{prop} \qquad (4)$$

where $m_{sig}$ is the cumulative size of signaling messages used in a call setup, $r_s$ is the
signaling link rate, $k$ is the number of switches on the end-to-end path, $T_{sp}$ is the
signaling-message processing time incurred at each switch, and $T_{prop}$ is the round-trip
propagation delay.
The second component, $T_{transfer}$, is the actual file-transfer delay:

$$T_{transfer} = \frac{f}{r_c} + \frac{T_{prop}}{2} \qquad (5)$$

where $f$ is the size of the file being transferred and $r_c$ is the data rate of the circuit. We
have not included retransmission delays here because on Ethernet/EoS circuits,
retransmissions are required only when random bit errors affect a block of data, and
these types of errors also impact delays on the TCP/IP path. Since our approach is to
compare delays on the TCP/IP path and on Ethernet/EoS circuits before deciding whether
or not to attempt a circuit setup, we have omitted retransmission delays due to bit errors
on both paths. Including this delay would in fact favor using the Ethernet/EoS circuit,
because bit errors on the TCP/IP path would be misinterpreted as packet losses caused by
congestion, leading to reductions in sending rates.

¹ We assume the queueing delay for the signaling link is that of an M/D/1 queue at load $\rho_{sig}$, and the
queueing delay for the call processor is that of an M/D/1 queue at load $\rho_{sp}$. M/D/1 queueing models are
quite accurate since inter-arrival times between file transfers have been shown to be exponentially
distributed [42], and signaling message lengths and call-processing delays are more-or-less constant.
3.2 Numerical results
3.2.1 Numerical results for transfer delays of “large” files
Input parameter values assumed for the numerical computation are shown in Table 1.
We assume four values for $P_{loss}$, two values for the bottleneck link rate $r$, and three
values of the round-trip propagation delay $T_{prop}$, to create a total of 24 cases. RTT is
computed from $T_{prop}$ and a rough estimate of the queueing plus service delay at the
bottleneck link. We derive this estimate by determining the load at which an M/D/1/k
system² will experience the assumed $P_{loss}$ values. $W_{max}$, as stated earlier, is determined
by limitations on the sender or receiver window. For all the cases, we set $W_{max}$ to the
delay-bandwidth product, i.e., $W_{max} = RTT \times r$. When the congestion window reaches
$W_{max}$, any further increase is irrelevant because the system will reach a streaming state
in which ACKs are received in time to permit further packet transmissions before the
sender completes emitting its current congestion window.

² While packet transmission (service) time is more-or-less deterministic because of MTU restrictions, the
packet arrival process at a buffer feeding the bottleneck link is known not to be a Poisson process.
However, we use this approximate model to obtain a rough estimate of the queueing plus service delay. As
seen from the numerical values, this component is not significant.
Using the input parameters shown in Table 1, we compute $E[T_{tcp}]$ as given by (3) for a
1GB file and a 1TB file and list the values in the last two columns of Table 1. The
round-trip propagation delay $T_{prop}$ has a significant impact on total file-transfer delay.
For example, for a 1GB file transfer, increasing $T_{prop}$ from 5ms to 50ms results in a
considerable increase in $E[T_{tcp}]$, from 89.45s to 396.5s. Also, at large values of the
round-trip propagation delay (50ms), for a given $P_{loss}$, there is not much benefit gained
from increasing the bottleneck link rate from 100Mbps to 1Gbps: compare 396.5s for a
100Mbps link with 395.7s for a 1Gbps link for the 1GB file transfer. Increasing the
bottleneck link rate has value when the propagation delay is small; the higher the rate,
the smaller the propagation delay at which this benefit can be seen. Loss probability
$P_{loss}$ also plays an important role. Even in a low-propagation-delay environment
($T_{prop}$ of 0.1ms), $E[T_{tcp}]$ jumps from 82.25s to 283.56s for the 1GB file transfer as
$P_{loss}$ increases from 0.0001 to 0.1. If an end-to-end GbE/EoS circuit is established for
the 1GB file transfer, the sum $E[T_{setup}] + T_{transfer}$ is 80.08 sec when the link rate is
100Mbps and 8.08 sec when the link rate is 1Gbps. These numbers are obtained
assuming both $\rho_{sp}$ and $\rho_{sig}$ are 0.8, $T_{prop}$ is 50ms, and there are 20 switches on the
end-to-end path. The major component of these values is $T_{transfer}$; $E[T_{setup}]$ is only
55.3ms. The total message length for call-setup-related signaling messages is assumed to
be 100 bytes, and the call-processing delay per switch is assumed to be 4μs given our
hardware-accelerated signaling implementations (see Section 2.3).
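The sketch below evaluates (4) and (5) with the parameter values just quoted. The signaling link rate $r_s$ is not stated in the text; the 10Mbps value used here is an assumption, chosen because it reproduces the quoted 55.3ms setup delay:

```python
def e_t_setup(m_sig_bits, r_s, rho_sig, rho_sp, k, t_sp, t_prop):
    """Equation (4): M/D/1 queueing-plus-service for the signaling message on
    k+1 links and for call processing at k switches, plus round-trip prop."""
    sig = (m_sig_bits / r_s) * (1 + rho_sig / (2 * (1 - rho_sig))) * (k + 1)
    sp = t_sp * (1 + rho_sp / (2 * (1 - rho_sp))) * k
    return sig + sp + t_prop

def t_transfer(file_bits, r_c, t_prop):
    """Equation (5): transmission time plus one-way propagation delay."""
    return file_bits / r_c + t_prop / 2

# 100-byte signaling messages, rho = 0.8, k = 20 switches, Tsp = 4us,
# Tprop = 50ms; r_s = 10Mbps is our assumption (not stated in the text).
setup = e_t_setup(100 * 8, 10e6, 0.8, 0.8, 20, 4e-6, 50e-3)
print(round(setup * 1e3, 1), "ms")                    # -> 55.3 ms
# 1GB file on a 1Gbps circuit: E[Tsetup] + Ttransfer
print(round(setup + t_transfer(8e9, 1e9, 50e-3), 2))  # -> 8.08 s
```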
Table 1: Input parameters plus the time to transfer a 1GB file and a 1TB file
(input parameters: Ploss, rate r, Tprop; intermediate derived results: queueing plus service delay, RTT, Wmax; final results: E[Ttcp])

Case  Ploss    Rate r    Tprop   Queueing + service  RTT (ms)  Wmax (pkts)  E[Ttcp], 1GB file  E[Ttcp], 1TB file
1     0.0001   100Mbps   0.1ms   0.2ms               0.3       2.5          82.25s             22.9 hours
2     0.0001   100Mbps   5ms     0.2ms               5.2       41           89.45s             1 day and 1.3 hours
3     0.0001   100Mbps   50ms    0.2ms               50.2      418          396.5s             4 days and 15.3 hours
4     0.0001   1Gbps     0.1ms   0.02ms              0.12      10           8.25s              2.3 hours
5     0.0001   1Gbps     5ms     0.02ms              5.02      418          39.6s              11.1 hours
6     0.0001   1Gbps     50ms    0.02ms              50.02     4168         395.7s             4 days and 14.9 hours
7     0.001    100Mbps   0.1ms   0.26ms              0.36      3            82.93s             22.9 hours
8     0.001    100Mbps   5ms     0.26ms              5.26      43.8         135.4s             1 day and 0.1 hour
9     0.001    100Mbps   50ms    0.26ms              50.26     418.8        1293s              4 days and 15.4 hours
10    0.001    1Gbps     0.1ms   0.026ms             0.13      10.8         8.64s              2.3 hours
11    0.001    1Gbps     5ms     0.026ms             5.03      419          129.4s             11.1 hours
12    0.001    1Gbps     50ms    0.026ms             50.03     4169         1287s              4 days and 14.9 hours
13    0.01     100Mbps   0.1ms   0.38ms              0.48      4            92.41s             22.9 hours
14    0.01     100Mbps   5ms     0.38ms              5.38      44.8         471.7s             1 day and 0.2 hours
15    0.01     100Mbps   50ms    0.38ms              50.38     419.8        4417s              4 days and 15.7 hours
16    0.01     1Gbps     0.1ms   0.038ms             0.138     11.5         12.43s             2.3 hours
17    0.01     1Gbps     5ms     0.038ms             5.038     419.8        441.7s             11.2 hours
18    0.01     1Gbps     50ms    0.038ms             50.04     4169.8       4387s              4 days and 14.9 hours
19    0.1      100Mbps   0.1ms   0.68ms              0.78      6.5          283.56s            22.9 hours
20    0.1      100Mbps   5ms     0.68ms              5.68      47.33        2064.9s            1 day and 0.3 hours
21    0.1      100Mbps   50ms    0.68ms              50.68     422.33       18424s             4 days and 16.3 hours
22    0.1      1Gbps     0.1ms   0.068ms             0.168     14           61.07s             2.3 hours
23    0.1      1Gbps     5ms     0.068ms             5.068     422.33       1842.4s            11.2 hours
24    0.1      1Gbps     50ms    0.068ms             50.07     4172.3       18202s             4 days and 15 hours
Compare the file-transfer delays for a 1TB file shown in Table 1 with the delays on an
end-to-end high-speed Ethernet/EoS circuit. For example, with a 1Gbps Ethernet/EoS
circuit, a 1TB file will take about 2.2 hours, which is comparable to the TCP/IP-path
numbers in the low-propagation-delay environment where $T_{prop}$ is 0.1ms, but
significantly less than the TCP/IP-path numbers when $T_{prop}$ is 5 or 50ms. The bulk of
the 2.2 hours is the file-transfer time $T_{transfer}$; $E[T_{setup}]$ is on the order of milliseconds,
as shown above. This low delay on the end-to-end circuit is possible, of course, only if
the call is not blocked; once a circuit is set up, there is no reduction in delay due to
competition from other users.

To take the blocking probability into account, we plot (2), the basis for the routing
decision, in Figure 2 and Figure 3 for the 100Mbps and 1Gbps link rates, respectively.
For the three horizontal lines on which $P_b$ values are listed, the y-axis is the left-hand
side of (2), i.e., $E[T_{setup}]/(1 - P_b)$. For the remaining three lines, which are marked
"Difference" with $P_{loss}$ values, the y-axis is the right-hand side of (2), i.e.,
$E[T_{tcp}] - T_{transfer}$. In Figure 2, when the link rate is 100Mbps, an Ethernet/EoS circuit
should be attempted for the entire file range (5MB, 1GB) if $P_b$ and $P_{loss}$ have the
values shown, because $E[T_{setup}]/(1 - P_b)$ is always less than the difference term
$E[T_{tcp}] - T_{transfer}$ (see (2)). However, when the bottleneck link rate increases to 1Gbps
(see Figure 3), while we see a similar pattern when $T_{prop}$ is 50ms (a WAN
environment), in a lower-propagation-delay environment (Figure 3(a), in which $T_{prop}$ =
0.1ms) there are crossover file sizes below which an end host should resort directly to
the TCP/IP path and above which it should attempt an Ethernet/EoS circuit setup. These
crossover file sizes are listed in Table 2.
Figure 2: Plot of equation (2) for large files with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, k = 20

Figure 3: Plot of equation (2) for large files with a link rate of 1Gbps, $\rho_{sig} = \rho_{sp} = 0.7$, k = 20
Table 2: Crossover file sizes in the [5MB, 1GB] range when r = 1Gbps, Tprop = 0.1ms, and k = 20
(rows: loading on the TCP/IP path; columns: loading on the circuit-switched network)

                   Pb = 0.01   Pb = 0.1   Pb = 0.3
Ploss = 0.0001     22MB        24MB       30MB
Ploss = 0.001      9MB         10MB       12MB
Ploss = 0.01       <5MB        <5MB       <5MB
In the current-day Internet, where bottleneck link rates are on the order of Mbps for
enterprise users, it is worthwhile attempting a circuit setup for files of 5MB and over in
most MAN and WAN environments ($T_{prop}$ of 0.1ms, 5ms, 50ms). This holds true even
as rates increase to 100Mbps. But as links are upgraded to Gbps rates, such circuit
attempts should be made mainly in wide-area environments or for larger files.
3.2.2 Numerical results for transfer delays of “small” files
Even though our motivation for this work comes from high-end scientific applications
with very large files, we wanted to understand whether the CHEETAH service could be
used for smaller files (100KB to 5MB). Unlike for larger files, where we studied the
impact of link rate, here we study the impact of the number of switches on the
end-to-end path, keeping the link rate at 100Mbps. Figure 4 plots the results for the case
when the number of switches on the end-to-end path k is 4, and Figure 5 plots the k = 20
case.

(a) $T_{prop}$ is 0.1ms; (b) $T_{prop}$ is 50ms
Figure 4: Plot of equation (2) for small files with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, k = 4

Our first observation is that in the wide-area network scenarios (Figure 4(b) and Figure
5(b)), an Ethernet/EoS circuit should be attempted for the entire file range (100KB,
5MB) for the $P_b$ and $P_{loss}$ values considered, because the difference term
$E[T_{tcp}] - T_{transfer}$ is always greater than $E[T_{setup}]/(1 - P_b)$.
For a lower-propagation-delay environment, e.g., $T_{prop}$ of 0.1ms (Figure 4(a) and
Figure 5(a)), we see crossover file sizes below which an end host should resort directly to
the TCP/IP path and above which it should attempt an Ethernet/EoS circuit setup. These
crossover file sizes are listed in Table 3. The number of switches on the end-to-end path
k has little impact on the total transfer times, but it does affect $E[T_{setup}]$, especially
when $T_{prop}$ is 0.1ms. As a result, the crossover file sizes in Figure 5(a) are much larger
than those in Figure 4(a), as seen in Table 3.
(a) $T_{prop}$ is 0.1ms; (b) $T_{prop}$ is 50ms
Figure 5: Plot of equation (2) for small files with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, k = 20
In summary, in the current-day Internet, where bottleneck link rates are on the order of
Mbps for enterprise users, it is worthwhile attempting a circuit setup for files of 5MB and
over in most MAN and WAN environments ($T_{prop}$ of 0.1ms, 5ms, 50ms). This holds
true even as rates increase to 100Mbps. But as links are upgraded to the Gbps range, such
circuit attempts should be made mainly in wide-area environments or for larger files.
Table 3: Crossover file sizes when r = 100Mbps and Tprop = 0.1ms
(rows: loading on the TCP/IP path; columns: loading on the circuit-switched network)

                   k = 4                            k = 20
                   Pb=0.01   Pb=0.1   Pb=0.3        Pb=0.01   Pb=0.1   Pb=0.3
Ploss = 0.0001     610KB     640KB    840KB         2.4MB     2.65MB   3.4MB
Ploss = 0.001      490KB     550KB    730KB         2MB       2.2MB    2.8MB
Ploss = 0.01       120KB     140KB    140KB         500KB     550KB    650KB
3.2.3 Optical circuit-switched network utilization considerations
While file-transfer delay is an important user measure for making the routing decision
of whether or not to attempt a circuit setup, service provider measures such as utilization
should also be considered since utilization ultimately does impact users through prices
charged. Total network utilization has two components: aggregate network utilization $u_a$
and per-circuit utilization $u_c$, which are given by:

$$u_a = \frac{(1 - P_b)\,\lambda}{m}, \quad \text{where } P_b = \frac{\lambda^m / m!}{\sum_{k=0}^{m} \lambda^k / k!} \text{ (Erlang-B formula)}, \qquad (6)$$

$$u_c = \frac{E[T_{transfer}]}{E[T_{setup}] + E[T_{transfer}]}, \quad \text{where } E[T_{transfer}] = \frac{E[X]}{r_c}, \qquad (7)$$

$\lambda$ is the offered traffic, $m$ is the number of circuits, $E[X]$ is the average file size, and
$r_c$ is the circuit rate.
Restricting transfers on the circuit-switched network to files larger than some crossover
file size $\tau$, we can compute the fractional offered load $\lambda'$ and the average file size
$E[X \mid X > \tau]$ if we know the distribution of file sizes. Reference [43] suggests a Pareto
distribution for file sizes. Using this distribution, we compute the fractional offered load
$\lambda'$ as:

$$\lambda' = \lambda\,\frac{P(X > \tau)\,E[X \mid X > \tau]}{E[X]} = \lambda \left(\frac{k}{\tau}\right)^{\alpha} \frac{\alpha\tau/(\alpha - 1)}{\alpha k/(\alpha - 1)} = \lambda \left(\frac{k}{\tau}\right)^{\alpha - 1} \qquad (8)$$

where $\alpha$, the shape parameter, is 1.06, and $k$, the scale parameter, is 1000 bytes, as
computed in [43], and $\lambda$ is the total offered load. We note that the offered load
decreases as $\tau$ increases, which means the aggregate utilization $u_a$ decreases for a
given $P_b$. However, as $\tau$ increases, the per-circuit utilization $u_c$ increases.
Combining the two components of utilization, we obtain the total utilization $u$ as
follows:

$$u = \frac{(1 - P_b)\,\lambda'}{m} \times \frac{E[X \mid X > \tau]/r_c}{E[T_{setup}] + E[X \mid X > \tau]/r_c} \qquad (9)$$
We plot the total utilization $u$ in Figure 6 for different call-blocking probabilities $P_b$
and different values of $\tau$ and $T_{prop}$. As the crossover file size $\tau$ is increased, the plots
show utilization increasing because of the second factor, i.e., the per-circuit utilization
increases. However, the drop in the offered load, and the corresponding drop in the
aggregate utilization, slows the increase of the total utilization, making it stabilize at
some value below 1 or even drop slightly. In these plots, to keep $P_b$ constant as $\tau$ is
increased, we compute $m$ for each value of $\tau$ using the second equation of (6). The
"zigzag" pattern of the plots occurs because $m$ has to be an integer.
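The computation behind these plots can be sketched as follows, stringing together (6)-(9): Erlang-B blocking, the Pareto fractional load of (8), and the integer search for the smallest m that holds $P_b$ at its target (the source of the zigzag). The implementation and the example parameter values are our own illustration:

```python
def erlang_b(load, m):
    """Erlang-B blocking probability via the standard stable recursion."""
    b = 1.0
    for k in range(1, m + 1):
        b = load * b / (k + load * b)
    return b

def total_utilization(load, tau, setup_delay, r_c, pb_target,
                      alpha=1.06, k_scale=1000.0):
    """Equations (6)-(9): total utilization when only files larger than
    tau (bytes) use the circuit network; file sizes are Pareto(alpha, k)."""
    load_frac = load * (k_scale / tau) ** (alpha - 1)   # (8)
    m = 1                                               # smallest integer m
    while erlang_b(load_frac, m) > pb_target:           # keeping Pb <= target
        m += 1
    pb = erlang_b(load_frac, m)
    mean_transfer = (alpha * tau / (alpha - 1)) * 8 / r_c  # E[X|X>tau] / r_c
    u_c = mean_transfer / (setup_delay + mean_transfer)    # (7)
    return (1 - pb) * load_frac / m * u_c                  # (9) = (6) x (7)

# Example: offered load 50 Erlangs, 150KB crossover, 100Mbps circuits,
# E[Tsetup] ~ 4ms (roughly (4) with k = 20, rho = 0.7, Tprop = 0.1ms), Pb <= 0.3
print(total_utilization(50, 150e3, 4e-3, 100e6, 0.3))
```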
(a) $P_b$ = 0.3; (b) $P_b$ = 0.01
Figure 6: Plot of utilization u with a link rate of 100Mbps, $\rho_{sig} = \rho_{sp} = 0.7$, k = 20
From our file-transfer delay analysis, we did not have a crossover file size when $T_{prop}$
is large (e.g., 50ms), but from the utilization analysis here we see the need to place a
lower bound. Without such a lower bound, per-circuit utilization can be poor. For
example, for a 100KB file transfer on a 100Mbps circuit with 4 switches on the
end-to-end path, we need 50.158ms of setup time against only 8ms of total transfer time.
As a result, the per-circuit utilization is only 13.7%, which is why the 50ms plots are at a
lower utilization than the 0.1ms plots in Figure 6.

Another observation is that high utilizations are possible by operating the network at a
high call-blocking probability (30%). For example, with $\lambda = 50$ and $T_{prop} = 0.1$ms,
with a blocking probability of 30% we can achieve 90% utilization at a crossover file
size of 150KB, while at a low blocking probability (1%) we can achieve only 73%
utilization for the same crossover file size. Thus, when the CHEETAH service is first
introduced, the number of end hosts equipped with second NICs and enterprises equipped
with MSPPs will be small, and the network can be operated at high utilization and a high
call-blocking probability, with many file transfers resorting to the TCP/IP path upon
rejection from the optical network. But with growth in the number of CHEETAH service
participants (as $\lambda$ increases), lower call-blocking probabilities can be achieved while
maintaining high utilization.
These plots have been generated assuming all calls are of the long-distance variety ($T_{prop}$ is 50ms) or all calls are in small propagation-delay environments ($T_{prop}$ is 0.1ms). In reality, different file transfers will experience different round-trip propagation delays. This means the routing decision algorithm should have different crossover file sizes for different end-to-end paths.
3.2.4 Implementation of routing decision algorithm
The routing decision algorithm implemented at an end host could use dynamically obtained values of RTTs, $P_b$, $P_{loss}$, and link rate. However, such a dynamic algorithm could be complex. While RTT measurements can be made during the TCP connection establishment handshake, other parameters are harder to estimate. Tomography experiments have shown that $P_{loss}$ can be estimated by end hosts [44]. Another option is to have network management stations track these values and respond to queries from end hosts. Since the benefit of using Ethernet/EoS circuits may not be significant for small file sizes, we need to carefully study the value of introducing this complexity. Alternatively, we could define static values for RTT and crossover file size based on nominal operating conditions of the two networks and simplify the routing decision algorithm implemented at end hosts. This needs experimental study.
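As an illustration of the simplified static approach, the sketch below chooses a path from a nominal crossover size per RTT class; the function names and threshold values are hypothetical, not parameters fixed by this dissertation.

    # A minimal sketch of a static routing decision at the end host.
    def crossover_for_rtt(rtt_s):
        """Hypothetical crossover file size (bytes) for a nominal RTT class."""
        return 150e3 if rtt_s < 0.001 else 1e9   # illustrative thresholds

    def choose_path(file_size_bytes, rtt_s):
        """Return 'circuit' to attempt an Ethernet/EoS circuit, else 'tcp'."""
        if file_size_bytes >= crossover_for_rtt(rtt_s):
            return "circuit"    # attempt setup; fall back to TCP/IP if rejected
        return "tcp"            # small file: setup overhead not worth it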
Another question is whether the CHEETAH service should be implemented from IP router to IP router rather than end-to-end. We note that the routing decision on whether or not to attempt an Ethernet/EoS circuit is difficult to make within an IP router. This is because it is hard to extract information on the file size and RTT at a router that supports many flows, and both of these parameters are important in making this decision. Other attempts have been made in the past to perform flow classification within routers and then trigger cut-through connections between routers [45]. Given the difficulties with these solutions, we conclude that the routing decision is best made at the end hosts, where it is easier to determine these parameters, and hence we propose CHEETAH as an end-to-end service. This work was presented at the PFLDNET2003 workshop [46] and at Opticomm2003 [47].
Chapter 4. Call-queueing/scheduling mode
In Table 1 (see Section 3.2.1), we list the delays for a 1TB file transfer. The numbers show that we need over 4 days and 15 hours with TCP if the round-trip propagation delay is 50ms, almost independent of the bottleneck link rate (100Mbps or 1Gbps) and $P_{loss}$. On the other hand, if we set up a 1Gbps circuit, we can complete the transfer in 2.3 hours. This means that with such large files, we should not adopt an "attempt-circuit-setup-and-if-rejected-fall-back-to-Internet-path" approach (blocking mode). Instead, if the circuit-switched network offered some form of call queueing, then perhaps the wait time for a circuit could be shorter than the 4-day 15-hour TCP time. Hence we started looking for call-queueing algorithms. However, we quickly concluded that call queueing was really not feasible on a multiple-hop circuit because utilization would suffer significantly if an upstream switch held resources while the call is queued at a downstream switch. Instead, call scheduling is possible if file size information is provided to the network.
Before we design call-scheduling schemes for multiple-hop paths, we design call-scheduling algorithms for the single-link case. In the single-link problem, the main question
is what bandwidth to assign a file transfer. Given that file transfers can occur at any rate,
this is a key question that needs to be answered. We propose a scheme in which a file
transfer is provided a vector of Time-Range-Capacity (TRC) allocations when admitted,
where the capacity allocation varies from time range to time range. This is unlike the
fixed bandwidth allocation mode where a fixed assignment of bandwidth is made for the
entire duration of the transfer. To enable our proposed TRC allocation mode, we require
end hosts to provide the network the sizes of the files to be transferred. With information
on the size of the file that an end host wants to transfer, the network can fit this file into
time ranges when bandwidth is available based on the TRC allocations of ongoing
transfers. This allows the network to offer an incoming file transfer an increased amount
of bandwidth for future time ranges when there are fewer competing transfers.
Besides file size, we require end hosts requesting a file transfer i to specify two more parameters: $R^i_{max}$, a maximum bandwidth limit for the transfer, and $T^i_{req}$, a desired start time for the transfer.

First consider $R^i_{max}$. In principle, any amount of bandwidth can be allocated to a new transfer, because file transfers do not have an inherent bandwidth requirement. In practice, however, end hosts engaging in file transfers have communication-link interface bandwidth limitations and/or processing limitations. Making a bandwidth allocation larger than $R^i_{max}$ will only result in wasted bandwidth. Hence, we require end hosts to provide the network this information. If a given transfer has no such limit (in other words, its limit is larger than the shared link bandwidth), then $R^i_{max}$ is simply set equal to the link bandwidth.

Second, consider $T^i_{req}$. An end host application may want to make a reservation for a circuit to transfer a file at a later time, say when it expects to have more resources. By booking ahead, it should have a greater probability of receiving the maximum bandwidth it requests from its requested start time. As in any reservation system, end users that alert the resource arbitrator ahead of time should be encouraged, to allow for better management of resources. Hence, in solving the problem of resource allocation for file transfers, we allow both "immediate-request" (IR) and "book-ahead" (BA) calls.
To understand the impact of these parameters, consider the following. If we disallow BA calls, and there is no $R^i_{max}$ constraint, the single-link sharing problem becomes simple. Without $R^i_{max}$ constraints, every transfer can take advantage of the full link capacity, so each file is simply transmitted one after the other. In comparing such a solution with packet-by-packet statistical multiplexing, the main issue becomes fairness. If a very large file grabs the full resources of the link, smaller transfers will end up with unduly large queueing delays. A solution is to limit the maximum file size to some number, which we call a Maximum File Transfer Size (MFTS). This is analogous to the Maximum Transmission Unit (MTU) used in packet-switched networks to solve the same fairness problem at the packet level. The smaller the MFTS, the fairer the solution. In the limit, if the MFTS is equal to the MTU, then the scheme reduces to a packet-by-packet statistical multiplexing scheme.
A slightly more complex problem is one in which we still disallow BA calls but allow $R^i_{max}$ to vary from transfer to transfer. In this case, the switch has to find the time instant beyond which each channel becomes free and try to maximally assign $R^i_{max}$ channels in each time range. The problem is still not that complex, because an incoming call will not find any "holes" in the allocation, i.e., there will be no free time ranges followed by reserved time ranges on any channel past the instant of its arrival. To find a TRC vector for this incoming transfer, the network can simply assign the minimum of the available capacity and $R^i_{max}$ in each time range. Adding in BA calls creates holes in the future schedules, making the problem more complex.
In summary, the goal of this work is to develop a scheduling scheme to schedule file transfers characterized by $(F^i, T^i_{req}, R^i_{max})$, where $F^i$ is the size of the file, $T^i_{req}$ is the requested starting time, and $R^i_{max}$ is the requested maximum bandwidth, on a single link of capacity C. To visualize a model of our system, see Figure 7. Source hosts make requests to transfer files to a destination D through a shared link leading out of the switch. We assume that the shared single link consists of m channels. File-transfer requests arrive and depart dynamically. The $R^i_{max}$ constraint can be thought of as arising from the access links connecting each source node $S_i$ to the switch. A small part of the shared link resources is set aside for signaling, and the remaining part is allocated for the actual file transfers. We expect the delay impact incurred from signaling (call setup delay plus the increased transfer delay resulting from the reduction in bandwidth caused by setting aside bandwidth for signaling) to be comparable to the packet-header overhead incurred in the packet-by-packet statistical multiplexing scheme.
We present two heuristic schemes for this scheduling. In the first scheme, our goal is to determine the capacity allocation in different time ranges to generate a TRC vector for a transfer. In the second scheme, we additionally determine the exact channels allocated to the transfer in each time range, and thus find a Time-Range-channeL (TRL) allocation vector. This becomes important when we extend our schemes from the single-link TDM/FDM multiplexing problem to the multiple-link circuit-switched network problem. Allocation of resources to an end-to-end circuit traversing multiple links requires an allocation of channels on each link of the end-to-end path. In some networks, there is a constraint of maintaining the same channel number on all links of the end-to-end path, e.g., in optical wavelength-division multiplexed (WDM) networks where the switches are not equipped with wavelength converters. Therefore, in addition to the total capacity-allocation problem, we address the problem of how to allocate exact channel numbers in our single-link context.
Figure 7: Single-link model: source hosts $S_1, \ldots, S_N$ connected through a circuit switch to destination D over a shared link
4.1 Scheduling file transfers on a single link
We can think of three ways to assign bandwidth to file requests as they arrive:

- Greedy scheme: allocates the maximum bandwidth available that is less than or equal to $R^i_{max}$
- Socialistic scheme: readjusts the bandwidth allocation of all ongoing transfers every time a new call is admitted, so that the link capacity, C, is divided equally at all times among the active file transfers
- Capitalistic scheme: requires requestors to provide information on the price they are willing to pay for each transfer, and uses this information to allocate bandwidth in a manner that will maximize revenues for the owner of the shared link
In this dissertation, we will only describe a greedy scheme for scheduling calls. Details of the scheme are provided in sub-sections 4.1.1-4.1.4 below.

The socialistic scheme may be feasible on a single link, but could be hard to implement in the multiple-link scenario. This is because in the multiple-link scenario, the bandwidth allocated to a circuit is the minimum available among all the links on the end-to-end path. This constraint will result in some bandwidth lying idle on certain links and may also cause shorter-route calls to get a larger share of the cumulative bandwidth. The packet-by-packet statistical multiplexing scheme indeed achieves socialistic scheduling, automatically admitting any new transfer and dividing the link bandwidth equally among all ongoing transfers. This is hard, if not impossible, to achieve in circuit-switched networks.
In contrast, a capitalistic scheme is hard to implement in the packet-by-packet statistical multiplexing mode, while a capitalistic scheme that allocates bandwidth according to how much a user is willing to pay is more feasible in circuit-switched networks. Given the increasing interest in enabling service providers to earn revenues on the Internet by providing differentiated services [48], we believe that there will be an increasing level of interest in using circuit-switched networks for file transfers, since the latter appear to be better suited for capitalistic resource sharing. Capitalistic schemes are also possible with extended versions of packet switching, e.g., with priority queueing or other forms of scheduling. Such schemes lie in between the extremes of the complete-sharing packet-switched scheme and the complete-partitioning fixed-bandwidth TDM/FDM scheme.
Before we can develop socialistic and capitalistic schemes in packet-switched and circuit-switched networks, and then compare them to either validate or disprove our intuition that packet-switched networks are better suited for socialistic scheduling and circuit-switched networks are better suited for capitalistic scheduling, we start by developing a greedy heuristic that overcomes the fundamental disadvantages of fixed-bandwidth TDM/FDM.
We define two greedy schemes (list scheduling schemes) for an m-channel single link. The first is called Varying-Bandwidth List Scheduling (VBLS), and the second, a special case of practical interest, is called VBLS with Channel Allocation (VBLS/CA). VBLS is the basic heuristic, in which we maintain the total bandwidth allocated to a file transfer in different time ranges, while VBLS/CA takes into account practical considerations and tracks the actual channel allocations in different time ranges. Table 4 lists the notation used in the algorithm.
Table 4: Notation for VBLS

  $F^i$ : File transfer size requested by call i
  $T^i_{req}$ : Start time requested for call i
  $R^i_{max}$ : Maximum rate requested for call i, expressed as a number of channels; typically limited by the access-link rate or end-host processing rates
  $TRC^i = \{(B^i_k, E^i_k, C^i_k),\ k = 1, \ldots, \eta^i\}$ : Time-Range-Capacity allocation: capacity $C^i_k$ is assigned to call i in time range k, starting at $B^i_k$ and ending at $E^i_k$
  $\Delta(t)$ : Capacity availability function: total number of available channels at time t, expressed in the form $\Delta(t) = m_z$ for $P_z \le t < P_{z+1}$ and $\Delta(t) = m$ for $t \ge P_{z_{max}}$, where $m_z \le m$ and $z = 1, 2, \ldots, z_{max}$; $z_{max}$ denotes the number of times $\Delta(t)$ changes value before reaching m at $t = P_{z_{max}}$, after which all m channels of the link remain available; see Figure 8 for an example
  $\gamma$ : Per-channel bandwidth
  $T_{discrete}$ : Discrete time unit
Figure 8: Example of $\Delta(t)$ (number of channels versus time): $P_1 = 0$, $\Delta(0) = 0$, $P_2 = 10$, $z_{max} = 9$, and $P_{z_{max}} = P_9 = 80$
4.1.1 Varying-Bandwidth List Scheduling (VBLS) overview
The scheduler maintains an available-capacity function $\Delta(t)$. Since it knows the TRC allocations of all ongoing file transfers, it knows when, and how much, link capacity is available for a new request. A request i specifies $(F^i, T^i_{req}, R^i_{max})$. The switch's response is $TRC^i$, which is an allocation of capacity over different time ranges for request i. Capacity allocation for a new request is made on a round-by-round basis, where a round consists of the procedures used to allocate capacity for a time range that extends between two consecutive change points in $\Delta(t)$. In a time range between two consecutive change points, we determine whether the entire remaining file can be transferred within this time range (i.e., whether the holding time ends within it), and whether the available capacity is greater than, or less than or equal to, $R^i_{max}$. We define four cases corresponding to the four possible outcomes of these two decisions. At the end of each round, we compute the remaining size of the file and start the next round.
4.1.2 Detailed description of VBLS
Start algorithm: Set time $\tau = T^i_{req}$ and remaining file $\phi = F^i$; $k = 1$.

Repeat loop (start next round):

Find z such that $P_z \le \tau < P_{z+1}$ in the capacity availability function $\Delta(t)$. If $\Delta(\tau) = 0$, then reset $\tau = P_{z+1}$ and continue the repeat loop (start next round).

Case 1: The number of available channels is less than or equal to $R^i_{max}$, and the whole file can be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) \le R^i_{max}$ and $(P_{z+1} - \tau)\,\Delta(\tau)\,\gamma \ge \phi$. Then:

- Set $B^i_k = \tau$, $E^i_k = \tau + \phi / (\Delta(\tau)\,\gamma)$, $C^i_k = \Delta(\tau)$ (the begin time, end time, and capacity allocation for the kth range of file transfer i). Set $\eta^i = k$ (the total number of time ranges allocated to file transfer i). Terminate the repeat loop.

Case 2: The number of available channels is less than or equal to $R^i_{max}$, and the whole file cannot be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) \le R^i_{max}$ and $(P_{z+1} - \tau)\,\Delta(\tau)\,\gamma < \phi$. Then:

- Set $B^i_k = \tau$, $E^i_k = P_{z+1}$, $C^i_k = \Delta(\tau)$. Set $\phi = \phi - (P_{z+1} - \tau)\,\Delta(\tau)\,\gamma$, then set $k = k + 1$ and $\tau = P_{z+1}$. Continue the repeat loop (start next round).

Case 3: The number of available channels is greater than $R^i_{max}$, and the whole file can be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) > R^i_{max}$ and $(P_{z+1} - \tau)\,R^i_{max}\,\gamma \ge \phi$. Then:

- Set $B^i_k = \tau$, $E^i_k = \tau + \phi / (R^i_{max}\,\gamma)$, $C^i_k = R^i_{max}$. Set $\eta^i = k$. Terminate the repeat loop.

Case 4: The number of available channels is greater than $R^i_{max}$, and the whole file cannot be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) > R^i_{max}$ and $(P_{z+1} - \tau)\,R^i_{max}\,\gamma < \phi$. Then:

- Set $B^i_k = \tau$, $E^i_k = P_{z+1}$, $C^i_k = R^i_{max}$. Set $\phi = \phi - (P_{z+1} - \tau)\,R^i_{max}\,\gamma$, then set $k = k + 1$ and $\tau = P_{z+1}$. Continue the repeat loop (start next round).

End repeat loop.
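To make the round structure concrete, the following is a minimal Python sketch of this loop. Capping the allocation rate at $R^i_{max}$ with min() collapses Cases 1/3 and 2/4 into two branches; the names and the change-point representation of $\Delta(t)$ are illustrative, not the dissertation's implementation.

    # A minimal sketch of the VBLS loop of Section 4.1.2 (illustrative names).
    def vbls(file_size, t_req, r_max, change_points, gamma):
        """Return the TRC vector [(B_k, E_k, C_k), ...] for one request.

        change_points: sorted [(P_z, m_z), ...]; availability is m_z on
        [P_z, P_{z+1}), the last m_z persists forever and must be > 0.
        gamma: per-channel bandwidth in file-size units per time unit.
        """
        trc, tau, remaining = [], t_req, file_size
        while remaining > 0:
            # Find z such that P_z <= tau < P_{z+1}
            z = max(i for i, (p, _) in enumerate(change_points) if p <= tau)
            avail = change_points[z][1]
            next_p = (change_points[z + 1][0]
                      if z + 1 < len(change_points) else float("inf"))
            if avail == 0:                       # no capacity: skip ahead
                tau = next_p
                continue
            rate = min(avail, r_max)             # merges Cases 1/2 with 3/4
            if (next_p - tau) * rate * gamma >= remaining:   # Cases 1 and 3
                trc.append((tau, tau + remaining / (rate * gamma), rate))
                remaining = 0                    # whole file fits: terminate
            else:                                # Cases 2 and 4
                trc.append((tau, next_p, rate))
                remaining -= (next_p - tau) * rate * gamma
                tau = next_p
        return trc

    # Reproduces the 5GB example below (sizes in Gb, 1 Gb per channel-unit):
    # vbls(40, 50, 2, [(0, 0), (50, 1), (60, 2), (70, 3), (80, 4)], 1.0)
    # -> [(50, 60, 1), (60, 70, 2), (70, 75.0, 2)]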
As an example of VBLS, consider scheduling a transfer of a 5GB file with a $T^i_{req}$ of 50 and an $R^i_{max}$ of 2. Let $\gamma$, the per-channel link capacity, be 10Gbps, and let each unit of time correspond to 100ms. Assume the 4-channel link state is as shown in Figure 8. In the time range $50 \le t < 60$, we can schedule 1 channel for the transfer. Within this range, 1.25GB ($= 10\text{Gbps} \times 10 \times 100\text{ms}$) can be transferred. In the $60 \le t < 70$ range, we can allocate 2 channels, since $R^i_{max}$ is 2; therefore we can transfer 2.5GB. The remaining 1.25GB can be assigned to 2 channels past t = 70. Even though the available capacity is 3 channels in the $70 \le t < 80$ range, we can only assign two channels because of the $R^i_{max}$ limit. Therefore, the TRC vector is $\{(50, 60, 1), (60, 70, 2), (70, 75, 2)\}$, where each tuple is of the form $(B^i_k, E^i_k, C^i_k)$, and the number of ranges for this call, $\eta^i$, is 3. The TRC allocation is indicated by the shaded area in Figure 9.
Figure 9: Shaded area shows the allocation for the example 5GB file transfer, with $T^i_{req} = 50$ and $R^i_{max} = 2$ channels; per-channel link capacity is 10Gbps and one time unit is 100ms
4.1.3 VBLS with Channel Allocation (VBLS/CA) overview
VBLS/CA is a specific instance of VBLS. In this heuristic, the network keeps track of the individual channel occupancy as a function of time for each of the m channels on the link, unlike before, where the network only tracked the total available capacity over time ($\Delta(t)$). This is required for practical reasons: in electronic TDM switches, the channel numbers are needed to establish the crossconnections at the right instants in time, while additionally, in all-optical WDM networks without wavelength converters, the same channel has to be selected on multiple links in the multiple-link problem. We introduce additional notation in Table 5 to handle the tracking of channels and then describe VBLS/CA.
Table 5: Additional notation

  $A_j(t)$ : Indicator of the availability of channel $C_j$: $A_j(t) = 1$ if $G^d_j \le t < H^d_j$ and 0 otherwise, for $1 \le d \le n_j$ and $1 \le j \le m$. $G^d_j$ and $H^d_j$ are the start and end points of time range d during which channel $C_j$ is available; the end point of the last time range, $H^{n_j}_j$, is necessarily $\infty$. See Figure 10 for an example.
  $C^\Delta(t)$ : The exact set of channels available at instant t
  $TRL^i = \{(B^i_k, E^i_k, L^i_k),\ k = 1, \ldots, \eta^i\}$ : Time-Range-channeL allocation: channel $L^i_k$ is assigned to call i in time range k, starting at $B^i_k$ and ending at $E^i_k$
Figure 10: Per-channel availability $A_j(t)$ for channels 1-4 (lines indicate the time ranges when each channel is available); $G^1_1 = 10$, $H^1_1 = 30$, $n_1 = 2$. The $\Delta(t)$ shown in Figure 8 is derived from this $A_j(t)$.
A summary of the additions needed to account for channels is as follows. First, we track the channel availability $A_j(t)$ with time t for each channel j, in addition to tracking $\Delta(t)$, the total available bandwidth. We also track the set of available channels, $C^\Delta(t)$. Time ranges are demarcated by changes in $\Delta(t)$ or $C^\Delta(t)$. For example, in the $\Delta(t)$ shown in Figure 8, one channel is available in the range $30 \le t < 60$. However, different channels are available in different sub-ranges within this range: channel 4 is available in the range $30 \le t < 40$, and channel 1 is available in the range $40 \le t < 60$. Thus, t = 40 should be regarded as a change point. Second, we keep track of a set called $C_{open}$, which is the set of channels that remains available past a time-range demarcation point in $\Delta(t)$ or $C^\Delta(t)$. The reason for tracking these is so that they can be allocated in the next time range to save switch-programming time. Third, if multiple channels are allocated within the same time range, we count each such allocation as a separate entry in the Time-Range-channeL (TRL) vector. Therefore, a round of the algorithm could advance the range-tracking variable k by a number greater than 1, unlike in the basic VBLS scheme, where it always advances by exactly 1. Fourth, when there are more candidate channels than the heuristic needs, we propose two selection rules. If the file transfer completes within the time range where such a choice needs to be made, we choose the channels with the smallest remaining available time; if the file transfer does not complete, we choose the channels with the largest remaining available time. The rationale for the former is to limit "holes" in the time-range allocations of channels, while the rationale for the latter is to decrease the number of switch reprogramming actions needed.
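As an illustration of the demarcation rule, change points can be computed directly from the per-channel availability ranges, since the set of available channels changes exactly at interval endpoints; this sketch and its names are illustrative.

    # A possible demarcation helper for VBLS/CA (illustrative names):
    # 'ranges' maps channel j to its availability intervals [(G, H), ...].
    def change_points(ranges):
        """Sorted instants where the set of available channels changes."""
        points = set()
        for intervals in ranges.values():
            for g, h in intervals:
                points.update((g, h))
        return sorted(points)

    # Hypothetical two-channel example:
    # change_points({1: [(10, 30), (40, float('inf'))], 4: [(10, 40)]})
    # -> [10, 30, 40, inf]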
There are other practical details, such as accounting for the time needed to change the crossconnections established through the switch at the boundary of every time range in the TRL allocation for a file. In electronic switches, the time to set up or release a cross-connection is in nanoseconds, while in all-optical switches it is currently in milliseconds. Capacity allocations in TRL vectors should take this overhead into account.
4.1.4 Detailed description of VBLS/CA
Start algorithm: Set time $\tau = T^i_{req}$ and remaining file $\phi = F^i$; $k = 1$; $C_{open} = \emptyset$.

Repeat loop (start next round):

Find z such that $P_z \le \tau < P_{z+1}$. If $\Delta(\tau) = 0$, then reset $\tau = P_{z+1}$ and continue the repeat loop.
Case 1: The number of available channels is less than or equal to $R^i_{max}$, and the whole file can be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) \le R^i_{max}$ and $(P_{z+1} - \tau)\,\Delta(\tau)\,\gamma \ge \phi$. Then:

- Set $C^i_{k'} = C_{open}[k' - k + 1]$ for $k' = k, k+1, \ldots, (k + |C_{open}| - 1)$ (continue using the open channels from the previous time range).
- Check whether $(P_{z+1} - \tau)\,|C_{open}|\,\gamma \ge \phi$:
  o If true, set $B^i_{k'} = \tau$ and $E^i_{k'} = \tau + \phi / (|C_{open}|\,\gamma)$ for $k' = k, k+1, \ldots, (k + |C_{open}| - 1)$. Set $\eta^i = k + |C_{open}| - 1$. Terminate the repeat loop.
  o If false, choose all channels in $C^\Delta(\tau) \setminus C_{open}$ and set these channel numbers into $C^i_{k'}$ for $k' = (k + |C_{open}|), \ldots, (k + \Delta(\tau) - 1)$. Set $B^i_{k'} = \tau$ and $E^i_{k'} = \tau + \phi / (\Delta(\tau)\,\gamma)$ for $k' = k, k+1, \ldots, (k + \Delta(\tau) - 1)$. Set $\eta^i = k + \Delta(\tau) - 1$. Terminate the repeat loop.
Case 2: The number of available channels is less than or equal to $R^i_{max}$, but the whole file cannot be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) \le R^i_{max}$ and $(P_{z+1} - \tau)\,\Delta(\tau)\,\gamma < \phi$. Then:

- Set $C^i_{k'} = C_{open}[k' - k + 1]$ for $k' = k, k+1, \ldots, (k + |C_{open}| - 1)$ (continue using the open channels from the previous time range).
- Choose all channels in $C^\Delta(\tau) \setminus C_{open}$ and set these channel numbers into $C^i_{k'}$ for $k' = (k + |C_{open}|), \ldots, (k + \Delta(\tau) - 1)$.
- Set $B^i_{k'} = \tau$ and $E^i_{k'} = P_{z+1}$ for $k' = k, k+1, \ldots, (k + \Delta(\tau) - 1)$.
- Store the set of still-open channels as $C_{open}$ by removing the channels for which $H^d_j = P_{z+1}$ from the set of channels $C^i_{k'}$, $k' = k, k+1, \ldots, (k + \Delta(\tau) - 1)$. Set $\phi = \phi - (P_{z+1} - \tau)\,\Delta(\tau)\,\gamma$, then set $k = k + \Delta(\tau)$ and $\tau = P_{z+1}$. Continue the repeat loop (start next round).
Case 3: The number of available channels is greater than $R^i_{max}$, and the whole file can be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) > R^i_{max}$ and $(P_{z+1} - \tau)\,R^i_{max}\,\gamma \ge \phi$. Then:

- Check whether $|C_{open}| \ge R^i_{max}$:
  o If true, choose the $R^i_{max}$ channels out of the $C_{open}$ channels with the smallest leftover available ranges by comparing $H^d_j - \tau$ over all $j \in C_{open}$. The reason for choosing the channels with the smallest leftover available ranges is to leave the smallest gaps. If multiple channels have the same smallest leftover range, choose at random. Set these channel numbers into $C^i_{k'}$ for $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Set $B^i_{k'} = \tau$ and $E^i_{k'} = \tau + \phi / (R^i_{max}\,\gamma)$ for $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Set $\eta^i = k + R^i_{max} - 1$. Terminate the repeat loop.
  o If false, set $C^i_{k'} = C_{open}[k' - k + 1]$ for $k' = k, k+1, \ldots, (k + |C_{open}| - 1)$ (continue using the open channels from the previous time range), and check whether $(P_{z+1} - \tau)\,|C_{open}|\,\gamma \ge \phi$:
    o If true, set $B^i_{k'} = \tau$ and $E^i_{k'} = \tau + \phi / (|C_{open}|\,\gamma)$ for $k' = k, k+1, \ldots, (k + |C_{open}| - 1)$. Set $\eta^i = k + |C_{open}| - 1$. Terminate the repeat loop.
    o If false, from the channels in $C^\Delta(\tau) \setminus C_{open}$, choose the $(R^i_{max} - |C_{open}|)$ channels with the smallest leftover available ranges by comparing $H^d_j - \tau$ over all $j \in C^\Delta(\tau)$, $j \notin C_{open}$; again, this choice leaves the smallest gaps, and ties are broken at random. Set these channel numbers into $C^i_{k'}$ for $k' = (k + |C_{open}|), \ldots, (k + R^i_{max} - 1)$. Set $B^i_{k'} = \tau$ and $E^i_{k'} = \tau + \phi / (R^i_{max}\,\gamma)$ for $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Set $\eta^i = k + R^i_{max} - 1$. Terminate the repeat loop.
Case 4: The number of available channels is greater than $R^i_{max}$, but the whole file cannot be transmitted before the next change in the available capacity curve, i.e., $\Delta(\tau) > R^i_{max}$ and $(P_{z+1} - \tau)\,R^i_{max}\,\gamma < \phi$. Then:

- Check whether $|C_{open}| \ge R^i_{max}$:
  o If true, choose the $R^i_{max}$ channels with the largest leftover available ranges by comparing $H^d_j - \tau$ over all $j \in C_{open}$. The reason for choosing the channels with the largest leftover available ranges is that fewer switch-reprogramming actions are then required. If multiple channels have the same largest leftover range, choose at random. Set these channel numbers into $C^i_{k'}$ for $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Set $B^i_{k'} = \tau$ and $E^i_{k'} = P_{z+1}$ for $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Store the set of still-open channels as $C_{open}$ by removing the channels for which $H^d_j = P_{z+1}$ from the set of channels $C^i_{k'}$, $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Set $\phi = \phi - (P_{z+1} - \tau)\,R^i_{max}\,\gamma$, then set $k = k + R^i_{max}$ and $\tau = P_{z+1}$. Continue the repeat loop (start next round).
  o If false, set $C^i_{k'} = C_{open}[k' - k + 1]$ for $k' = k, k+1, \ldots, (k + |C_{open}| - 1)$ (continue using the open channels from the previous time range). From the channels in $C^\Delta(\tau) \setminus C_{open}$, choose the $(R^i_{max} - |C_{open}|)$ channels with the largest leftover available ranges by comparing $H^d_j - \tau$ over all $j \in C^\Delta(\tau)$, $j \notin C_{open}$; again, this reduces the number of switch-reprogramming actions, and ties are broken at random. Set these channel numbers into $C^i_{k'}$ for $k' = (k + |C_{open}|), \ldots, (k + R^i_{max} - 1)$. Set $B^i_{k'} = \tau$ and $E^i_{k'} = P_{z+1}$ for $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Store the set of still-open channels as $C_{open}$ by removing the channels for which $H^d_j = P_{z+1}$ from the set of channels $C^i_{k'}$, $k' = k, k+1, \ldots, (k + R^i_{max} - 1)$. Set $\phi = \phi - (P_{z+1} - \tau)\,R^i_{max}\,\gamma$, then set $k = k + R^i_{max}$ and $\tau = P_{z+1}$. Continue the repeat loop (start next round).

End repeat loop.
Merge time ranges on a given channel:

- Find the subset of time ranges $TRL^i_{m'}$ corresponding to each channel m'. Order the time ranges in this subset in increasing order, so that $E^i_k \le B^i_{k+1}$ for $k = 1, 2, \ldots, |TRL^i_{m'}| - 1$. If $E^i_k = B^i_{k+1}$, then merge the kth and (k+1)th ranges. At the end of this operation, we obtain the $\eta^i$ time ranges of $TRL^i$.

End algorithm.
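As a sketch of the channel-selection rules used in Cases 3 and 4, the helper below sorts candidate channels by their leftover availability $H^d_j - \tau$; ties are broken arbitrarily rather than randomly, and all names are illustrative.

    # A possible helper for VBLS/CA channel selection (illustrative names).
    def pick_channels(candidates, count, avail_end, finishing):
        """Pick 'count' channels: smallest leftover availability if the
        transfer finishes in this range (limits gaps), largest otherwise
        (limits switch reprogramming). avail_end[j] holds H_j^d."""
        ordered = sorted(candidates, key=lambda j: avail_end[j],
                         reverse=not finishing)
        return ordered[:count]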
As an example of VBLS/CA, assume that the link is in the same starting state as shown in Figure 8 and Figure 10, and that the file to be scheduled is a 75MB file with a $T^i_{req}$ of 10 and an $R^i_{max}$ of 2. Let $\gamma$, the per-channel link bandwidth, be 1Gbps, and let each unit of time correspond to 10ms as before. In the time range $10 \le t < 20$, we can schedule 2 channels for the transfer. This falls into Case 4, where $\Delta(\tau) > R^i_{max}$ and the whole file cannot be transmitted before the next change in $\Delta(t)$, which occurs at t = 20. Here we see that a choice needs to be made from among the three available channels 1, 2, and 4 (see Figure 10). We choose channels 1 and 4 because they have the largest leftover times in their availability ranges. At the end of the first round, we find two time-range-channel allocations, {(10, 20, 1), (10, 20, 4)}, where 10 is the begin time, 20 is the end time, and the channel numbers are 1 and 4 in the two allocations, respectively. With these allocations, we can transmit 25MB of the file.
The next round begins with the set $C_{open}$ containing channels {1, 4}, because these channels continue to be available in the next z range of $\Delta(t)$, which is $20 \le t < 30$. Since the remaining part of the file cannot be fully transmitted in this range, and the available capacity is equal to $R^i_{max}$ (2 channels), we fall into the Case 2 category. At the end of this second round, we find two more time-range-channel allocations, {(20, 30, 1), (20, 30, 4)}, where 20 is the begin time, 30 is the end time, and the channel numbers are 1 and 4, respectively. With these allocations, we can again transmit 25MB of the file.
The third round begins with the set $C_{open}$ consisting of only one channel, {4}. An interesting case occurs here. We note that the end of this time range is at t = 40, even though in the $\Delta(t)$ curve shown in Figure 8 it appears to stretch to t = 60. The reason is that $C^\Delta(t)$, the set of available channels, has a change at t = 40: the one available channel changes at this point from channel 4 to channel 1 (see Figure 10). Therefore, this third round, which also falls into the Case 2 category, results in one time-range-channel allocation, {(30, 40, 4)}. With this allocation, we can schedule 12.5MB of the remainder of the file, leaving only 12.5MB to schedule.
The fourth round starts with no channels in the $C_{open}$ list. Since we can finish the transfer before the next change point, which occurs at t = 60, and the available capacity is less than $R^i_{max}$, we select the one available channel, which is channel 1. This is a Case 1 round, and the last allocation becomes {(40, 50, 1)}.
At this point, when we finish the repeat loop of the heuristic, the TRL allocation vector looks as follows: {(10, 20, 1), (10, 20, 4), (20, 30, 1), (20, 30, 4), (30, 40, 4), (40, 50, 1)}. Illustrating the final merge function described in the VBLS/CA heuristic, we now merge consecutive time ranges on a given channel to avoid extra switch-reprogramming actions. Following this merge, the final TRL vector is {(10, 30, 1), (10, 40, 4), (40, 50, 1)}, and $\eta^i$, the number of time ranges for the transfer, is 3. The allocation is illustrated in Figure 11.
Figure 11: Dashed lines on the per-channel availability chart $A_j(t)$ show the allocation of resources for the example 75MB file transfer described above, with $T^i_{req} = 10$ and $R^i_{max} = 2$ channels
4.2 Analysis and simulation results
We describe our traffic model in Section 4.2.1. In Section 4.2.2, we show how we validated our simulation against an analytical model. To understand the VBLS system better, we carried out four sensitivity-analysis experiments, which are described in Section 4.2.3. In Section 4.2.4, we describe a simulation comparison of VBLS with a fixed-bandwidth list scheduling (FBLS) scheme and a packet-switched (PS) system. It shows that VBLS can overcome the classical drawback of circuit switching described at the beginning of Chapter 4 and achieve performance close to that of PS. Finally, we describe some practical considerations with regard to the implementation of VBLS in Section 4.2.5.
4.2.1 Traffic model
We assume that file-transfer requests arrive according to a Poisson process with rate $\lambda$. The requested start time $T^i_{req}$ for all transfers is assumed to be equal to the corresponding call arrival time, i.e., all calls are of the "immediate-request" type. We assume that file sizes are distributed according to a bounded Pareto distribution [49]. Specifically, the file-size probability density function is given by (10):

$$f_X(x) = \frac{\alpha k^{\alpha} x^{-\alpha - 1}}{1 - (k/p)^{\alpha}}, \quad k \le x \le p \qquad (10)$$

where $\alpha$ is the shape parameter, and k and p are the lower and upper bounds, respectively, of the allowed file-size range.
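For the simulations, bounded Pareto file sizes can be drawn by inverse-transform sampling of (10); the sketch below assumes sizes and bounds share the same units, and the names are illustrative.

    # A possible bounded Pareto sampler for (10) via inverse-transform sampling.
    import random

    def bounded_pareto(alpha, k, p):
        """Draw one file size from the density (10) on [k, p]."""
        u = random.random()
        # Invert the CDF F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha)
        return k / (1.0 - u * (1.0 - (k / p) ** alpha)) ** (1.0 / alpha)

    # e.g., the Section 4.2.2 settings: alpha = 1.1, k = 500MB, p = 100GB
    sizes = [bounded_pareto(1.1, 500e6, 100e9) for _ in range(5)]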
4.2.2 Validation of simulation against analytical results
To validate our VBLS simulator, we model and simulate a simple case in which all file requests set their $R^i_{max}$ to match the link capacity, C. In this case, we can model the VBLS system as an M/G/1 queue. In an M/G/1 system where the arrival rate is $\lambda$ and X is a random variable representing the service time, the average waiting time E(W) is [50]:

$$E(W) = \frac{\lambda\, E[X^2]}{2(1 - \rho)} \qquad (11)$$

where $E[X^2]$ is the second moment of the service-time distribution and $\rho$ is the system load, $\rho = \lambda E[X]$. Using the file-size distribution specified in (10) and the link capacity C, which is also the circuit rate³:

$$E[X^2] = \frac{1}{C^2}\int_k^p x^2 f_X(x)\,dx = \frac{1}{C^2}\cdot\frac{\alpha k^{\alpha}}{1 - (k/p)^{\alpha}}\cdot\frac{p^{2-\alpha} - k^{2-\alpha}}{2 - \alpha} \qquad (12)$$
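The analytical curve in Figure 12 can be evaluated directly from (11) and (12); the sketch below assumes consistent units (file sizes in bits, C in bits per second) and uses illustrative names.

    # A sketch of the analytical M/G/1 file latency from (11)-(12).
    def mg1_wait(load, alpha, k, p, C):
        """Mean waiting time E(W) of (11) at system load rho = load."""
        norm = 1.0 - (k / p) ** alpha
        # E[X]: mean service time (mean file size / C)
        mean = alpha * (p ** (1 - alpha) - k ** (1 - alpha)) * k ** alpha \
               / ((1 - alpha) * norm) / C
        # E[X^2]: second moment of the service time, eq. (12)
        second = alpha * k ** alpha * (p ** (2 - alpha) - k ** (2 - alpha)) \
                 / ((2 - alpha) * norm) / C ** 2
        lam = load / mean                 # rho = lambda * E[X]
        return lam * second / (2.0 * (1.0 - load))

    # e.g., k = 4e9 bits (500MB), p = 8e11 bits (100GB), C = 1e12 bits/s:
    # mg1_wait(0.5, 1.1, 4e9, 8e11, 1e12)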
Figure 12: File latency (sec) versus system load: comparison between analytical and simulation results
Figure 12 shows our numerical results comparing the mean waiting time for the analytical model from (12) with our simulation results. Input parameter values are as follows: k = 500MB, p = 100GB, and $\alpha$ = 1.1. We define the term file latency to denote the mean waiting time across all files transferred. The waiting time for a given file transfer i is defined as the time duration between $T^i_{req}$ (which is the call arrival time in our simulations) and the time instant at which the first bit of the file gets transmitted or served. System load, $\rho$, is defined as $\lambda$ multiplied by the mean service time E[X]. The latter is fixed because the mean file size and link capacity are fixed. This means that to generate an increasing system load, we simply increase the call arrival rate $\lambda$. Note that for stability the system load $\rho$ must be below 1.

³ Here we make an idealized assumption that the service time is the file size divided by the circuit rate. Realistically, we will need retransmissions due to link errors and/or flow-control buffer overflows.
As can be seen in Figure 12, the analysis and simulation results match closely. In other words, our simulation code is validated and can be used to obtain more interesting results in which calls specify varying values of $R^i_{max}$, and/or to compare VBLS with other scheduling schemes.
4.2.3 Sensitivity analysis
In this section, we carry out four experiments: (i) to understand the impact of $R^i_{max}$ when all calls request the same constant value of $R^i_{max}$; (ii) to understand the impact of the allowed file-size range (i.e., the parameters k and p); (iii) to understand the impact of $R^i_{max}$ when calls request three different values of $R^i_{max}$; and (iv) to understand the impact of the size of $T_{discrete}$ (the discrete time unit).
For the first experiment, we still assume that all calls request the same $R^i_{max}$, but we study the system performance under different values of $R^i_{max}$: 1, 5, 10, and 100 channels, with the link capacity C assumed to be 100 channels. It is not the actual rate of a channel that is significant, but rather the ratio between $R^i_{max}$ and C. As can be expected, if $R^i_{max}$ for all calls is 1 channel, we obtain lower file latency than in the case when all files request an $R^i_{max}$ equal to the link capacity; in the latter case, the mean waiting time is higher, as shown in Figure 13(a). However, if we consider the mean file-transfer delay, which includes the file latency (mean waiting time) and the mean service time (transmission delay), we get the opposite result, as shown in Figure 13(b): the case in which all calls request 100 channels outperforms the case in which all calls request 1 channel. This is effectively the same as the well-known result that the mean response time in an M/M/1 system with service rate $m\mu$ is lower than that of a system of m M/M/1 servers in which each server operates at service rate $\mu$ [50].
Figure 13: A comparison of VBLS (a) file latency and (b) mean file-transfer delay for files requesting the same $R^i_{max}$ (1, 5, 10, and 100 channels); link capacity is 100 channels
For the second experiment, we studied the sensitivity of the results to the lower and upper bounds of the allowed file-size range, k and p in equation (10). This experiment is useful in helping us determine the file-size range suitable for dedicated circuits. Clearly, the use of dedicated circuits for small files is not recommended, because the round-trip propagation delay incurred in establishing a wide-area circuit could add a significant overhead to the total transfer delay, especially as link rates increase. Therefore, we compare two cases (only the lower bound k is varied):

- Case 1: k = 500MB, p = 100GB, $\alpha$ = 1.1
- Case 2: k = 10GB, p = 100GB, $\alpha$ = 1.1

For each case, we ran three sets of simulations with $R^i_{max}$ equal to 1 channel, 5 channels, and 10 channels, respectively, for all calls. The link capacity was assumed to be 100 channels.
Figure 14: File latency comparison for two different values of k (500MB and 10GB) while keeping p constant at 100GB; curves shown for $R^i_{max}$ = 1, 5, and 10 channels
As shown in Figure 14, the file latency (mean waiting time) is lower for Case 1 than for Case 2. This is an interesting result, because it is the opposite of what we expected. Since the file-size range is smaller in Case 2, we expected the service-time variance to be lower, and this in turn to result in a lower file latency (see equation (11)). This reasoning is incorrect because of the form of the Pareto distribution: the variance is in fact greater in Case 2. We demonstrate this by an analysis of the simulated values. We divided the entire file-size range in each case into 50 bins, and counted the frequency of occurrence of file sizes in each bin over all the generated file-transfer requests. As can be seen in Figure 15, most of the files in Case 1 belong to the small file-size bins (the first or second bins). This means the probabilities associated with these small file-size bins are quite high, while the probabilities of files falling in the larger-size bins are small. Since the mass of the distribution in Case 1 is concentrated at small file sizes (E[X] in Case 1 is only 2.27GB), the second moment $E[X^2]$, and hence the variance, is actually lower in Case 1 than in Case 2.
Figure 15: Frequency of file sizes in each of 50 bins (bin size = 1.99GB or 1.8GB) for Case 1 (k = 500MB, p = 100GB, $\alpha$ = 1.1; mean file size = 2.27GB) and Case 2 (k = 10GB, p = 100GB, $\alpha$ = 1.1; mean file size = 24.6GB)
From this analysis, it is clear why the file latency is lower if we decrease the lower bound k while keeping the upper bound p constant. The opposite is true if we keep the lower bound k constant but increase the upper bound p: the density curve flattens out as p is increased, thus increasing the variance and, correspondingly, the file latency.
For the third experiment, we studied the impact of $R^i_{max}$ when we allow three different values of $R^i_{max}$. We carried out the following two cases:

- Case 1: $\gamma$ (per-channel rate) = 10Gbps, C = 1Tbps (100 channels); $R^i_{max}$ = 1 (30%), $R^i_{max}$ = 2 (30%), and $R^i_{max}$ = 4 (40%)
- Case 2: $\gamma$ (per-channel rate) = 1Gbps, C = 100Gbps (100 channels); $R^i_{max}$ = 1 (30%), $R^i_{max}$ = 5 (30%), and $R^i_{max}$ = 10 (40%)

For the input parameters, we chose $\alpha = 1.1$, k = 100MB, and p = 1TB. Instead of a constant $R^i_{max}$ for all calls, we allowed the different values of $R^i_{max}$ listed above; in both cases, the link capacity was assumed to be 100 channels.

For this experiment, we used a new performance metric called file throughput, which we define as the long-run average of the file size divided by the file-transfer delay. The file-transfer delay of file i is defined as the time duration between $T^i_{req}$ and the instant when the transmission of file i is complete.
Figure 16: File throughput (Gbps) versus system load for files requesting three different values of $R^i_{max}$: 1, 2, and 4 channels ($\gamma$ = 10Gbps)

Figure 17: File throughput (Gbps) versus system load for files requesting three different values of $R^i_{max}$: 1, 5, and 10 channels ($\gamma$ = 1Gbps)
As shown in Figure 16, file transfers with the smallest value of $R^i_{max}$ (1 in our example) suffer the least degradation of file throughput at a given system load. This degradation becomes more apparent for the three different values of $R^i_{max}$ as the system load increases. For example, at a load of 0.91, the system performance degradation for a file transfer with $R^i_{max} = 1$ is 28.5%, while for file transfers with $R^i_{max} = 2$ and $R^i_{max} = 4$, these numbers are 33.3% and 37.9%, respectively. Similar results are obtained in Figure 17. For instance, at a load of 0.92, the system performance degradation for a file transfer with $R^i_{max} = 1$ is 13.3%, while for file transfers with $R^i_{max} = 5$ and $R^i_{max} = 10$, these numbers are 29.4% and 39.5%, respectively. From this experiment, we learn that when we allow different values of $R^i_{max}$, file requests with larger values of $R^i_{max}$ experience greater system performance degradation than those with smaller values as the system load increases.
For the fourth experiment, we studied the impact of the discrete time unit under different values of $T_{discrete}$: 0.05, 0.5, 1, and 2 sec. We assume that all calls request the same $R^i_{max}$ (1 channel in our experiment), and the link capacity C is assumed to be 100 channels.

Figure 18 shows the file-throughput comparison for the different values of $T_{discrete}$ (0.05, 0.5, 1, and 2 sec). As the value of $T_{discrete}$ is increased, the file throughput gets worse as the system load increases. This is because more unused time ranges are occupied due to the discretization, resulting in succeeding file transfers experiencing larger file latency. For example, if all calls request the same $R^i_{max}$ of 1 channel (10Gbps), then with a discrete time unit of 1 sec, the most we could end up scheduling for each file in the worst case is 1.25GB. Since the file-size range is from 500MB to 100GB, this 1.25GB is significant per file transfer, so the impact on the file throughput is larger. In addition, a large value of $T_{discrete}$ (say 2 sec) sacrifices utilization more, as shown in Figure 19. However, one interesting finding is that the utilization does not drop much under low-load conditions. This is because each file does not then experience the file delay caused by other ongoing file transfers; the file delay incurred under low load depends only on the length of the discrete time unit. Under low-load conditions, this file delay degrades the system performance but does not make a significant impact on the utilization. In Figure 19, the utilization curve does not increase beyond a certain load; this means that the system is already full at that load.
Figure 18: File throughput (Gbps) versus system load for different values of $T_{discrete}$ (0.05, 0.5, 1, and 2 sec)

Figure 19: Utilization (%) versus system load for different values of $T_{discrete}$ (0.05, 0.5, 1, and 2 sec)
To understand the impact on utilization theoretically, we derived the following upper bound for the utilization penalty due to discretization. Suppose that, in a discrete-time implementation of VBLS, the discrete-time unit is $T_{discrete}$ seconds, i.e., the scheduler is invoked every $T_{discrete}$ seconds. A file transfer that takes T seconds under ideal non-discrete-time VBLS will need $\lceil T / T_{discrete} \rceil$ time slots to complete under discrete-time VBLS. In other words, the system utilization loses

$$\frac{\lceil T / T_{discrete} \rceil\, T_{discrete} - T}{T} \times 100\ \text{percent.}$$

It is interesting to find an upper bound for the utilization penalty incurred by this discretization. Denote the sequence of sizes of the files arriving at the system by $\{F^i, i = 1, 2, \ldots\}$ and the times taken for the files to get through the system by $\{T^i, i = 1, 2, \ldots\}$. Clearly $T^i \ge F^i / (\gamma R^i_{max})$ for all i, since the speed of the ith file transfer is bounded by $\gamma R^i_{max}$, where $\gamma$ is the per-channel bandwidth. The utilization loss $U^i_{loss}$ for the ith file, computed as $(\lceil T^i / T_{discrete} \rceil\, T_{discrete} - T^i)/T^i$, can be bounded by $T_{discrete}/T^i$, because $\lceil T^i / T_{discrete} \rceil\, T_{discrete} - T^i \le T_{discrete}$. Further, $U^i_{loss}$ can be bounded by:

$$U^i_{loss} \le \frac{T_{discrete}}{F^i / (\gamma R^i_{max})} = \frac{\gamma R^i_{max}\, T_{discrete}}{F^i} \qquad (13)$$

Therefore, the average utilization loss

$$\frac{1}{n}\sum_{i=1}^{n} U^i_{loss} \qquad (14)$$

is upper bounded by

$$\frac{1}{n}\sum_{i=1}^{n} \frac{\gamma R^i_{max}\, T_{discrete}}{F^i} \qquad (15)$$

If the $F^i$, $i = 1, 2, \ldots$, are identically distributed, then

$$\frac{1}{n}\sum_{i=1}^{n} \frac{\gamma R^i_{max}\, T_{discrete}}{F^i} \to \gamma R^i_{max}\, T_{discrete}\, E\!\left[\frac{1}{F^i}\right] \ \text{as}\ n \to \infty \qquad (16)$$

In other words, we can compute the upper bound on the system utilization loss incurred by discretization as

$$\gamma R^i_{max}\, T_{discrete}\, E\!\left[\frac{1}{F^i}\right]. \qquad (17)$$

For example, in our experiment, when all files request the same $R^i_{max}$ of 1 channel, where the per-channel bandwidth is 10Gbps, we can calculate the upper bound on the system utilization loss from (17). When $T_{discrete}$ equals 0.05 sec, the utilization loss incurred by the discretization is always less than 0.6567%. However, this value increases as $T_{discrete}$ increases. For example, under the same simulation settings, when $T_{discrete}$ equals 2 sec, the upper bound on the utilization loss is 26.26% from (17).
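As a numerical sketch, the bound (17) can be estimated by Monte Carlo, reusing the bounded_pareto() sampler sketched in Section 4.2.1 to approximate $E[1/F^i]$; units are assumed consistent (file sizes in bits, $\gamma$ in bits per second), and the names are illustrative.

    # A possible Monte Carlo evaluation of the discretization bound (17).
    def loss_bound(gamma, r_max, t_discrete, alpha, k, p, n=200000):
        """Estimated upper bound on the average utilization loss, eq. (17)."""
        inv_mean = sum(1.0 / bounded_pareto(alpha, k, p)
                       for _ in range(n)) / n          # estimates E[1/F^i]
        return gamma * r_max * t_discrete * inv_mean   # eq. (17)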
4.2.4 Comparison of VBLS with FBLS and PS
The primary objective of this simulation study is to compare the performance of VBLS with that of two alternative file-transfer schemes serving the same file requests: (1) a packet-switched (PS) system, and (2) a fixed-bandwidth list scheduling (FBLS) scheme (FBLS is the greedy scheme that schedules each file request to start as soon as possible, using a fixed bandwidth of $R^i_{max}$). The rationale for this comparison is to illustrate that although VBLS is a circuit-based resource-sharing scheme, its throughput behavior (on a file-by-file basis) more closely mimics packet switching than it does FBLS. As pointed out before, standard circuit switching using FBLS is expected to produce significantly lower file throughputs than packet switching, simply because ongoing file transfers cannot exploit the release of bandwidth resulting from completed file transfers. However, the variable-bandwidth nature of VBLS in scheduling file transfers mitigates this loss of throughput; indeed, by design, VBLS exploits the bandwidth released by completed file transfers.

In the simulation of the PS system, files are divided into packets of length 1500 bytes, which arrive at an infinite packet buffer at a constant packet rate equal to $R^i_{max}$ divided by the packet length. In other words, the packet interarrival time for a file is the packet length divided by the $R^i_{max}$ value for the file.

For the input parameters, we choose $\alpha = 1.1$, k = 500MB, and p = 100GB. Instead of a constant $R^i_{max}$ for all calls, here we allow three values of $R^i_{max}$: 1, 5, and 10 channels, with corresponding probabilities of 0.3, 0.3, and 0.4, respectively. The link capacity, C, is 100 channels. We use the same performance metric used in Section 4.2.3, namely file throughput.
4.2.4.1 Numerical results
Figures 20-22 show plots of the file throughput versus system load for VBLS, FBLS, and packet switching (PS). Our plots are categorized according to the value of $R^i_{max}$; the rationale here is that file throughputs (especially at low loads) are naturally limited by their $R^i_{max}$ values, and so comparing throughputs for files with different $R^i_{max}$ values is inappropriate.
Figure 20: File throughput (Gbps) versus system load for PS, VBLS, and FBLS, for files requesting an $R^i_{max}$ of 1 channel (10Gbps)

Figure 21: File throughput (Gbps) versus system load for PS, VBLS, and FBLS, for files requesting an $R^i_{max}$ of 5 channels (50Gbps)

Figure 22: File throughput (Gbps) versus system load for PS, VBLS, and FBLS, for files requesting an $R^i_{max}$ of 10 channels (100Gbps)
As we can see in Figures 20-22, VBLS achieves throughput values that lie above those of FBLS, as expected. Significantly, the throughput performance of VBLS is indistinguishable from packet switching. This serves to illustrate our main point: by taking into account file sizes and varying the bandwidth allocation for each transfer over its transfer duration, we mitigate the performance degradation usually associated with circuit-based methods.

We note that our simulation of the PS scheme is of an infinite-buffer system. This is clearly an idealized packet-switching scenario. In practice, buffers will be finite, which means packet losses will occur due to congestion. Mechanisms such as TCP's congestion control schemes are then required to recover from these packet losses with retransmissions and rate adjustments. TCP mechanisms can add significant delays to the total file-transfer delays [47].
4.2.5 Practical considerations
VBLS achieves close-to-PS performance at the cost of complexity relative to FBLS. First, as noted in Section 4.1.1, VBLS requires that the circuit switches be reprogrammed multiple times within a transfer, unlike the fixed-bandwidth allocation mode, where the switch is only programmed at the start and end of a call. With electronic TDM switches, where switch-programming times are on the order of nanoseconds [32], the impact of reprogramming on utilization will be less than with the slower optical Micro-Electro-Mechanical Switches (MEMS) [51]. If VBLS is used only for very large files, then these overheads will be relatively insignificant.

Second, to make it practical to implement $\Delta(t)$, the capacity availability function, we need to limit the number of bandwidth change points, $z_{max}$. For this purpose, we discretize time and only allow bandwidth changes to fall on discrete time instants. The smaller the discrete time unit, the larger the storage needed for $\Delta(t)$; the larger this unit, the worse the utilization, because bandwidth cannot be reassigned in the middle of a time range. Details such as these will be explored in an implementation of VBLS.
Propagation delays and clock synchronization issues become important in distributed implementations of this scheduling algorithm. For now, since the dynamic provisioning of circuits is handled in a centralized manner, the VBLS scheduler can also be centralized, avoiding propagation-delay and clock-synchronization problems. Maintenance of the schedules for switch reprogramming should ideally be at the switches themselves, with timer-based triggers.
Adopting the VBLS scheme in CHEETAH instead of the call-blocking mode could further improve the gain, i.e., reduce file-transfer delay. For example, when a very large file arrives (for example, 1TB; see the beginning of Chapter 4), the end host could receive high-rate service via the end-to-end circuit rather than over the TCP/IP backup path. In addition, the routing decisions made in Chapter 3 might have to be adapted. This work was presented in [52]-[55].
Chapter 5. Multiple-link cases (centralized and distributed)

The centralized on-line greedy scheme for the multiple-link case is a straightforward extension of the one-link case: we can create a new availability function $\Delta_{new}(t)$ reflecting the bandwidth available across all links. In the case of the distributed on-line greedy scheme for multiple links, however, each switch must keep the same information, such as $TRC^i$ and $TRL^i$, in a distributed manner. In this chapter, we will only describe the distributed on-line greedy scheme for the multiple-link case. Details of the scheme are provided in Section 5.1. In Section 5.2, we describe the analysis and simulation results.
5.1 VBLS algorithm for multiple-link case
There are a few practical issues that need to be solved before the call-scheduling algorithm can be implemented in the multiple-link case. For example, we need to deal with time synchronization. Given that multiple end hosts and switches have to interpret the allocated times, such as $B^i_k$, $E^i_k$, and so on, in the same manner, some synchronization method is needed. An extensive survey done in 1995 reported that most machines using NTP (Network Time Protocol) to synchronize their clocks are within 21ms of their synchronization sources, and all are within 29ms on average [56]. Mechanisms that use relative time values or some such approach are needed to deal with these differences.

The call setup delay of 50ms is not an issue, because other calls can use the circuit while a call is being scheduled. However, by the time the first bit of an 8ms transfer (100KB/100Mbps) arrives in CA from Boston, 25ms have elapsed during which the resources at the Boston switch are held but not utilized; hence we need staggered setup. Merging the propagation-delay and clock-synchronization problems naturally aids the staggered-setup approach. We adopt the rule that a time instant specified in a message sent by a switch $S_n$ is as per the clock in that switch.
Figure 23: VBLS on a path of K hops: switches $S_1, S_2, S_3, \ldots, S_N$; forward call-scheduling messages carry $(TRC^{if}_n, TS^f_n)$ and reverse messages carry $(TRC^{ir}_n, TS^r_{n+1})$
Consider the first time range $(B^i_{11}, E^i_{11}, C^i_1)$ in the TRC vector $TRC^{if}_1$ carried forward from switch $S_1$ to switch $S_2$ in Figure 23. The call-scheduling message suffers a certain propagation delay $p_{12}$. Furthermore, the clocks at switch $S_1$ and switch $S_2$ may not be synchronized. Let this clock difference be denoted $\delta_{12}$, with switch $S_2$'s clock ahead of $S_1$'s clock⁴. Switch $S_1$ places a time stamp $TS^f_1$ just before it emits the call-scheduling message onto the link. When switch $S_2$ receives the message, it immediately places a time stamp $TS^f_2$, the current time by switch $S_2$'s clock, into the message buffer along with the message. Then

$$TS^f_2 = TS^f_1 + \delta_{12} + p_{12} \qquad (18)$$

Switch $S_2$ cannot directly interpret the times carried in $TRC^{if}_1$ for the ith call; instead, it checks whether the bandwidth $C^i_1$ is available within the time range $(PB^i_{12}, PE^i_{12})$, where

$$PB^i_{12} = B^i_{11} + \delta_{12} + p_{12} = B^i_{11} + TS^f_2 - TS^f_1 \qquad (19)$$

$$PE^i_{12} = E^i_{11} + \delta_{12} + p_{12} = E^i_{11} + TS^f_2 - TS^f_1 \qquad (20)$$

⁴ If $S_1$'s clock is ahead of $S_2$'s clock, $\delta_{12}$ will be negative.
The above action results in identifying time ranges that are subsets (proper or equal) of
the time ranges selected in the TRC vector received from the upstream switch. Assuming
i
C1i  Rmax
, the goal is to find a set of ranges (which are sub-ranges of ( PB12i , PE12i ) ) in
which C ij is less than or equal to C1i where j  2 .
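To make the timestamp arithmetic of (18)-(20) concrete before moving to the multi-range
operations, the following minimal Python sketch shifts an upstream time range into the
local clock of the downstream switch. It is not from the dissertation's implementation;
the function name and the example numbers are illustrative. It relies only on the
observation behind (18)-(20): the send and receive timestamps differ by exactly
$\delta_{12} + p_{12}$, so neither the clock offset nor the propagation delay needs to be
known separately.

    def potential_range(b_up, e_up, ts_send, ts_recv):
        # ts_recv - ts_send = delta_12 + p_12 (eq. 18), so adding this shift
        # to an upstream (B, E) pair yields (PB, PE) per eqs. (19)-(20).
        shift = ts_recv - ts_send
        return b_up + shift, e_up + shift

    # Hypothetical values: S1 stamps 50.000s at send; S2 stamps 50.015s at
    # receipt (10ms propagation plus 5ms clock offset, folded together).
    print(potential_range(50.0, 60.0, 50.000, 50.015))  # (50.015, 60.015)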
We define the following operations:

$$TRC_{(n+1)}^{if} = TRC_n^{if} \otimes \Gamma_{(n+1)}(t) \qquad (21)$$

$$TRC_n^{ir} = TRC_n^{if} \ominus TRC_{(n+1)}^{if} \qquad (22)$$

A range $(B_{k(n+1)}^{if}, E_{k(n+1)}^{if}, C_{k(n+1)}^{if})$ belongs to
$TRC_{(n+1)}^{if}$ if

$$PB_{k(n+1)}^i \leq B_{k'(n+1)}^i \leq E_{k'(n+1)}^i \leq PE_{k(n+1)}^i \quad \text{and} \quad C_{k'(n+1)}^i \leq C_{kn}^i \qquad (23)$$

for

$$k' \in \left\{ \sum_{r=1}^{k} \eta_{rn},\ \sum_{r=1}^{k} \eta_{rn} + 1,\ \ldots,\ \sum_{r=1}^{k} \eta_{r(n+1)} \right\} \qquad (24)$$

A range $(B_{kn}^{ir}, E_{kn}^{ir}, C_{kn}^{ir})$ belongs to $TRC_n^{ir}$ if

$$C_{kn}^{ir} = C_{kn}^{if} - C_{k(n+1)}^{if} > 0, \quad \text{where } B_{kn}^{ir} \leq t \leq E_{kn}^{ir} \qquad (25)$$
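In code, the forward-trimming operation of (21) and (23) amounts to intersecting each
potential range with the local availability function. A minimal single-range Python
sketch follows; the function name and the piecewise-step representation of
$\Gamma_{(n+1)}(t)$ are our illustrative assumptions, not the dissertation's data
structures.

    def trim_range(pb, pe, c_up, gamma_steps):
        # gamma_steps: list of (start, end, channels) pieces of the local
        # availability function. Returns the sub-ranges of (pb, pe) and the
        # capacity usable in each, capped at the upstream allocation c_up.
        out = []
        for s, e, avail in gamma_steps:
            b, t = max(pb, s), min(pe, e)
            if b < t and avail > 0:
                out.append((b, t, min(avail, c_up)))
        return out

    # Upstream range (50, 60) at capacity 2; local availability in pieces:
    print(trim_range(50, 60, 2, [(40, 55, 1), (55, 70, 3)]))
    # -> [(50, 55, 1), (55, 60, 2)]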
Table 6 lists our notation for the multiple-link case.

Table 6: Notations for the multiple-link case

Symbol: $TRC_n^{if} = \{(B_{kn}^{if}, E_{kn}^{if}, C_{kn}^{if})\}$, $k = 1, \ldots, \alpha_n^i$, $(n = 1, 2, \ldots, N)$
Meaning: Time-Range-Capacity allocation: capacity $C_{kn}^{if}$ is assigned to call $i$
in the time range $k$ starting at $B_{kn}^{if}$ and ending at $E_{kn}^{if}$ by switch
$n$. Since the number of time ranges can change from link to link, we add the subscript
$n$ to $\alpha_n^i$.

Symbol: $TRC_n^{ir} = \{(B_{kn}^{ir}, E_{kn}^{ir}, C_{kn}^{ir})\}$, $k = 1, \ldots, \alpha_n^i$
Meaning: Time-Range-Capacity allocation: capacity $C_{kn}^{ir}$ to be released for call
$i$ starting at $B_{kn}^{ir}$ and ending at $E_{kn}^{ir}$ at switch $n-1$.

Symbol: $\Gamma_n(t)$
Meaning: Capacity availability function: total number of available channels at time $t$
at switch $n$. $\Gamma_n(t)$ is expressed in the form $\Gamma_n(t) = m_z$ for
$P_z \leq t < P_{z+1}$, where $m_z \leq m$ and $z = 1, 2, \ldots, z_{max}$; $z_{max}$
denotes the number of times $\Gamma_n(t)$ changes value before reaching $m_n$ at
$t = P_{z_{max}}$, after which all $m_n$ channels of link $n$ remain available.

Symbol: $\gamma$
Meaning: Per-channel bandwidth.

Symbol: $(PB_{kn}^i, PE_{kn}^i)$
Meaning: Potential begin and end times of the $k$th range on the $n$th link.

Symbol: $M$
Meaning: Multiplicative factor used in reserving TRCs; if M = 5, then the TRC vector
reserved is 5 times the TRC allocation needed to transfer the file.

Symbol: $\eta_{rn}$
Meaning: The $r$th range on the $(n-1)$th link becomes $\eta_{rn}$ ranges on the $n$th
link.
As an example of VBLS, consider scheduling the transfer of a 50MB file with a $T_{req}^i$
of 50 and an $R_{max}^i$ of 2. Let $\gamma$, the per-channel link bandwidth, be 1Gbps,
and let each unit of discrete time correspond to 10ms. Assume the 4-channel link state is
as shown in Figure 24. In the time range $50 \leq t < 60$, we can schedule 1 channel for
the transfer. Within this range, 12.5MB ($= 1\text{Gbps} \times 10 \times 10\text{ms}$)
can be transferred. In the $60 \leq t < 70$ range, we can allocate 2 channels since
$R_{max}^i$ is 2. Therefore, we can transfer 25MB. The remaining 12.5MB can be assigned
to 2 channels past $t = 70$. Even though the available bandwidth is 3 channels in the
range $70 \leq t < 80$, we can only assign two channels because of the $R_{max}^i$ limit.
Therefore, the TRC vector is {(50, 60, 1), (60, 75, 2)}, where each tuple is of the form
$(B_k^i, E_k^i, C_k^i)$, and the number of ranges for this call, $\alpha^i$, is 2 if the
multiplicative factor M is 1. But if M is 2, then $TRC^{1f}$ = {(50, 60, 1), (60, 95,
2)}, so as to be able to transfer 100MB when we really only need a TRC allocation to
transfer 50MB. The TRC allocation is indicated by the shaded area in Figure 24.
Figure 24: Example of $\Gamma_1(t)$ (link 1), with $P_1 = 0$, $\Gamma(0) = 0$,
$P_2 = 10$, $z_{max} = 9$, $P_{z_{max}} = P_9 = 80$, and M = 2. The shaded area shows the
allocation for the example 50MB file transfer.
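The allocation logic of this example can be captured in a few lines of code. The
following Python sketch is illustrative only: the function names, the step values assumed
for $\Gamma_1(t)$ in the window of interest, and the unit conventions are ours, not the
dissertation's. It greedily walks the discrete time axis from $T_{req}^i$, allocating
$\min(\Gamma_1(t), R_{max}^i)$ channels per time unit and merging consecutive units of
equal capacity into $(B, E, C)$ ranges.

    def vbls_trc(gamma, t_req, r_max, channel_units):
        # Greedy single-link TRC allocation: allocate min(gamma(t), r_max)
        # channels per discrete time unit until `channel_units` channel-units
        # (file size / per-channel rate / unit length) are scheduled.
        trc, t, need = [], t_req, float(channel_units)
        while need > 0:
            c = min(gamma(t), r_max)
            if c > 0:
                dur = min(1.0, need / c)          # last range may end mid-unit
                b, e = float(t), t + dur
                if trc and trc[-1][1] == b and trc[-1][2] == c:
                    trc[-1] = (trc[-1][0], e, c)  # extend the previous range
                else:
                    trc.append((b, e, c))
                need -= c * dur
            t += 1
        return trc

    def gamma1(t):
        # Step values consistent with the example above (channels before
        # t = 50 are assumed unavailable to this call).
        if t < 50: return 0
        if t < 60: return 1
        if t < 70: return 2
        if t < 80: return 3
        return 4

    # 50MB at 1Gbps with 10ms units = 40 channel-time-units:
    print(vbls_trc(gamma1, t_req=50, r_max=2, channel_units=40))
    # -> [(50.0, 60.0, 1), (60.0, 75.0, 2)], the TRC vector derived above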
At switch 2, the scheduler compares the received $TRC^{1f}$ with $\Gamma_2(t)$ and
creates a new TRC vector. Assuming the propagation delay is 10ms and the clocks are
synchronized, as shown in Figure 25, the TRC vectors at switch 2 are as follows:
$TRC^{2f}$ = {(51, 80, 1), (80, 80.5, 2)} and $TRC^{2r}$ = {(60, 80, 1), (85.5, 95, 2)}.
Figure 25: Example of $\Gamma_2(t)$ (link 2), with $P_1 = 0$, $\Gamma(0) = 0$,
$P_2 = 10$, $z_{max} = 9$, and $P_{z_{max}} = P_9 = 80$. The shaded area shows the
allocation for the example 50MB file transfer.
5.2 Analysis and simulation results
We describe our traffic model in Section 5.2.1. To understand the VBLS system better in
multiple-link scenarios, we carry out two sensitivity-analysis experiments, which are
described in Section 5.2.2.
5.2.1 Traffic model
The network model studied is presented in Figure 26. It consists of three switches, three
sources (Source, Src1, and Src2), and three destinations (Dest, Dest1, and Dest2). Both
link capacities, $C_{12}$ (between SW1 and SW2) and $C_{23}$ (between SW2 and SW3), are
assumed to be 100 channels (the per-channel bandwidth is 10Gbps). File transfers between
Source and Dest are studied ("study traffic"), and the file transfers between Srcx and
Destx are created as "interference traffic" to simulate cross traffic.
Similar to the single-link case (see Section 4.2.1), we assume that file transfer
requests from all sources arrive according to a Poisson process. The request start time
$T_{req}^i$ of each file from all sources is equal to its arrival time. The size of each
file is distributed according to a bounded Pareto distribution with a mean of 2.27GB. For
the input parameters, we choose $\alpha = 1.1$, $k = 500\text{MB}$, and
$p = 100\text{GB}$.
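For readers who wish to reproduce this traffic model, file sizes with these parameters
can be drawn by inverting the bounded Pareto CDF. This is a minimal sketch; the function
name and the choice of Python's standard random module are ours.

    import random

    def bounded_pareto(alpha, k, p, rng=random):
        # Inverse-CDF sampling: F(x) = (1 - (k/x)**alpha) / (1 - (k/p)**alpha)
        # for k <= x <= p, so x = k * (1 - u*(1 - (k/p)**alpha))**(-1/alpha).
        u = rng.random()
        return k * (1.0 - u * (1.0 - (k / p) ** alpha)) ** (-1.0 / alpha)

    # alpha = 1.1, k = 500MB, p = 100GB (sizes in bytes):
    sizes = [bounded_pareto(1.1, 500e6, 100e9) for _ in range(200000)]
    print(sum(sizes) / len(sizes) / 1e9)  # sample mean, close to 2.27 (GB)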
To simulate the scheduling scheme under different load conditions, the mean file arrival
rate of the study traffic is kept constant through all simulations while the mean arrival
rate of the interference traffic is varied. The mean arrival rate used by Source is 10
files/sec, which amounts to a system load of 18% introduced to the network by Source. The
mean arrival rates used for the interference traffic (generated by Src1 and Src2) are
varied over 5, 10, 15, 20, 25, 30, 35, and 40 files/sec. With these different load
conditions, the links between switches experience a total load varying from 27% to 91%.
Figure 26: Network model. The study traffic flows from Source through SW1, SW2, and SW3
to Dest; Src1/Dest1 and Src2/Dest2 generate the interference traffic.
5.2.2 Sensitivity analysis
In this section, we carry out two experiments: (i) to understand the impact of M
(Multiplicative factor), and (ii) to understand the impact of the discrete time unit.
For the first experiment, we assume that all calls request the same $R_{max}^i$ (= 1
channel) and that the link capacity is 100 channels; the discrete time unit and the
propagation delay are fixed at 10ms and 5ms, respectively, and the clocks are
synchronized. Under different values of M (the multiplicative factor), i.e., M = 2,
M = 3, and M = 4, we study the percentages of blocked calls and the file throughput as
the interference traffic load increases.
As shown in Figure 27, the call-blocking percentages of our VBLS scheme show that
increasing the value of M decreases the call-blocking probability significantly. When SW1
uses a large value of M, the chances that a succeeding switch can successfully find a set
of time ranges belonging to the time ranges allocated by SW1 are increased. However, as
the value of M increases, the file throughput drops significantly, as shown in Figure 28.
This is because successive file transfers at SW1 suffer large file latency caused by the
size of M: for example, if M equals four, SW1 must allocate four times $F^i$. Because of
these large occupied time ranges, succeeding file transfers experience large file delays.
Figure 27: Comparison of the percentages of blocked calls (%) versus interference traffic
load for different values of M (M = 2, 3, 4) when $T_{discrete} = 0.01\text{sec}$ and
$p_{12} = p_{23} = 5\text{ms}$.
Figure 28: Comparison of file throughput (Gbps) versus interference traffic load for
different values of M (M = 2, 3, 4) when $T_{discrete} = 0.01\text{sec}$ and
$p_{12} = p_{23} = 5\text{ms}$.
For the second experiment, we assume that all calls request the same $R_{max}^i$ (= 1
channel) and that the link capacity is 100 channels. The values of M and the propagation
delay are fixed at 3 and 5ms, respectively, and the clocks are synchronized. Under
different values of $T_{discrete}$ (the discrete time unit), i.e., $T_{discrete} =
0.01\text{sec}$, $T_{discrete} = 0.1\text{sec}$, and $T_{discrete} = 1\text{sec}$, we
study the percentages of blocked calls and the file throughput as the interference
traffic load increases.
As shown in Figure 29, as the value of $T_{discrete}$ increases, the call-blocking
percentages of our VBLS scheme increase as well. This is because if $T_{discrete}$ is
large, much larger time ranges are occupied at SW1 and the succeeding switches. When SW2
receives the TRC vector from SW1, the scheduler at SW2 has less chance of finding subsets
of the time ranges selected in that vector because the available empty ranges are
smaller. As already noticed in the single-link case, there is a tradeoff between the
storage needed for $\Gamma(t)$ and the system performance. As expected, the file
throughput drops significantly as the system load increases (see Figure 30). This result
is consistent with those found in the single-link case (see Section 4.2.3).
Figure 29: Comparison of the percentages of blocked calls (%) versus interference traffic
load for different values of $T_{discrete}$ (1sec, 0.1sec, 0.01sec) when M = 3 and
$p_{12} = p_{23} = 5\text{ms}$.
Figure 30: Comparison of file throughput (Gbps) versus interference traffic load for
different values of $T_{discrete}$ (0.01sec, 0.1sec, 1sec) when M = 3 and
$p_{12} = p_{23} = 5\text{ms}$.
Chapter 6. Conclusions and future work
In Chapters 2 and 3, we propose improving delay performance of file transfers by
using intra-network paths where possible. Specifically, we propose a service called
CHEETAH in which pairs of end hosts are connected on a call-by-call basis via high-speed end-to-end Ethernet/EoS circuits. This is feasible today given the deployment of
fiber to enterprises, MSPPs in enterprises and EoS technologies within these MSPPs.
Seeking to achieve high utilization, we propose setting up unidirectional EoS circuits and
only holding circuits for the duration of the actual file transfers. The CHEETAH service
is proposed as an add-on to basic Internet access service. The latter allows for the optical
circuit-switched network to be operated in call-blocking mode such that if the circuit
setup is blocked, an end host can fall back to the TCP/IP path. If the circuit setup is
successful, there is a huge advantage in minimizing total delay especially in wide-area
environments. For example, a 1TB file requires 2.2 hours on a 1Gbps end-to-end circuit,
but could take more than 4 days on a TCP/IP path in a WAN environment. We analyzed
the conditions under which a circuit setup should be attempted. For WAN environments
and large files, it is clear that a circuit setup should be attempted. We also found that for
medium-sized files (MBs) in WAN environments, it is worthwhile making this attempt.
In lower propagation-delay environments, if bottleneck link rates are on the order of
100Mbps, for files larger than 3.5MB, it becomes worthwhile attempting a circuit setup.
For higher link rates (1Gbps), or smaller files, one should consider the loading conditions
on the two paths, probability of packet loss on the TCP/IP path, and call-blocking
probability through the circuit-switched network, before deciding whether or not to
attempt the circuit setup.
In Chapters 4 and 5, instead of the call-blocking mode in CHEETAH, we propose a
call-scheduling algorithm in which varying levels of capacity are allocated to a file
transfer using knowledge of the file sizes of all admitted transfers. We called this
heuristic for scheduling file transfers Varying-Bandwidth List Scheduling (VBLS).
By having end-host applications provide file sizes to the VBLS scheduler, it is possible
to make a Time-Range-Capacity (TRC) vector allocation for file transfers. This approach
overcomes a well-known drawback of using circuits for file transfers, where a
fixed-bandwidth allocation mode fails to allow users to take advantage of bandwidth that
becomes available subsequent to the start of a transfer. We demonstrate through
simulation that with VBLS we can improve performance significantly over fixed-bandwidth
schemes, making it indistinguishable from packet switching.
By adopting the VBLS scheme in CHEETAH instead of the call-blocking mode, we could
further improve the gain, i.e., achieve lower file-transfer delays. For example, when a
large file arrives (for example 1TB; see the beginning of Chapter 4), the end host could
receive high-rate service via the end-to-end circuit rather than over the TCP/IP backup
path. In addition, the routing decisions made in Chapter 3 might have to be adapted.
As one part of our future work, we can include a second class of user requests,
specifically targeted at interactive (long-holding-time) applications, such as remote
visualization and simulation steering. Such requests will be specified as
$(H^i, R_{min}^i, R_{max}^i, T_{req}^i)$, where $H^i$ is the holding time, $R_{min}^i$
and $R_{max}^i$ are the minimum and maximum bandwidth acceptable to the user, and
$T_{req}^i$ is the requested start time. The VBLS scheduler can handle such requests in
the same manner as it does file-transfer requests, whereby it allocates a TRC vector
starting at some time $T_{start}^i \geq T_{req}^i$ and ending at $T_{start}^i + H^i$.
During this interval, $T_{start}^i \leq t \leq T_{start}^i + H^i$, varying levels of
bandwidth will be allocated in a TRC vector such that the capacity assigned in any time
range $k$ is not less than $R_{min}^i$ and not greater than $R_{max}^i$.
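The feasibility condition such an allocation must satisfy is easy to state in code. In
the sketch below, the function name, tuple layout, and example numbers are illustrative
assumptions, not part of the dissertation.

    def valid_interactive_trc(trc, t_start, h, r_min, r_max):
        # A TRC vector {(B, E, C)} serves an interactive request if it covers
        # [t_start, t_start + h] without gaps and every range's capacity C
        # satisfies r_min <= C <= r_max.
        covered = t_start
        for b, e, c in sorted(trc):
            if b > covered or not (r_min <= c <= r_max):
                return False
            covered = max(covered, e)
        return covered >= t_start + h

    print(valid_interactive_trc([(10, 20, 2), (20, 40, 3)],
                                t_start=10, h=30, r_min=1, r_max=3))  # True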
As another part of our future work, we can extend the simulations for the multiple-link
case. One possible set of simulations is to extend the sensitivity analysis of VBLS more
thoroughly by varying the propagation delays while keeping the other parameters, i.e., M
and $T_{discrete}$, fixed.
Another possible set of simulations is to compare VBLS with TCP/IP. In our previous
simulations comparing VBLS with a packet-switched system (PS), we assumed that the buffer
size is infinite (no packet loss). In practice, that is impossible: with a finite buffer,
there would be packet loss due to buffer overflow, and lost packets would have to be
retransmitted. The time spent retransmitting lost packets degrades the performance of a
packet-switched system.
Appendices
A-1. Examples of VBLS/CA
In this section, to illustrate the VBLS/CA algorithm more thoroughly, we provide the
following two examples. For both cases, assume the granularity of our discrete time is
1ms and the per-channel rate $\gamma$ is 1Gbps.
For the first example, assume the file being scheduled is 15.625MB, which means it
requires 125 slots. We set the other parameters of the file transfer request as indicated
in Table 7. The state of the single link is shown in Figure 31. Table 8 shows the TRL
allocation vector for each round. After Round 10, all the time ranges have been
identified for the file transfer. Now we need to merge ranges on the same channel; a
short sketch of this merging step is given below. From Table 8, we see that for channel
2, the ranges (35-40), (40-45), (45-50), (50-55), and (55-60) can be merged. For channel
3, the ranges (30-35) and (35-40), and (45-50) and (50-55), can be merged. After merging
the ranges for channels 4 and 5, the final allocation is $TRL^i$ = {($B_1^i = 15$,
$E_1^i = 30$, $L_1^i = 5$), ($B_2^i = 15$, $E_2^i = 25$, $L_2^i = 4$), ($B_3^i = 15$,
$E_3^i = 20$, $L_3^i = 2$), ($B_4^i = 30$, $E_4^i = 40$, $L_4^i = 3$), ($B_5^i = 35$,
$E_5^i = 60$, $L_5^i = 2$), ($B_6^i = 40$, $E_6^i = 68.3$, $L_6^i = 4$), ($B_7^i = 45$,
$E_7^i = 55$, $L_7^i = 3$), ($B_8^i = 55$, $E_8^i = 68.3$, $L_8^i = 5$), ($B_9^i = 60$,
$E_9^i = 68.3$, $L_9^i = 1$)}. This allocation is shown with dashed lines in Figure 31.
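The merging step itself is simple enough to state in code. A minimal Python sketch
follows; the function name and tuple layout are our illustrative choices.

    def merge_ranges(ranges):
        # Input: (B, E, L) tuples, where L is the channel number.
        # Back-to-back ranges on the same channel are merged into one.
        merged = {}
        for b, e, ch in sorted(ranges, key=lambda r: (r[2], r[0])):
            runs = merged.setdefault(ch, [])
            if runs and runs[-1][1] == b:
                runs[-1] = (runs[-1][0], e)   # contiguous: extend
            else:
                runs.append((b, e))
        return [(b, e, ch) for ch, runs in merged.items() for b, e in runs]

    # The channel-2 ranges from Table 8 collapse into a single range:
    print(merge_ranges([(35, 40, 2), (40, 45, 2), (45, 50, 2),
                        (50, 55, 2), (55, 60, 2)]))   # [(35, 60, 2)]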
Table 7: Input parameters for example 1

Parameter      Value
$T_{req}^i$    15
$R_{max}^i$    3 channels
$m$            5 channels
Figure 31: Link state $A^i(t)$ (channels 1-5 versus time). Dashed lines show the
allocation of resources for the example 15.625MB file transfer described above, with
$T_{req}^i = 15$ and $R_{max}^i = 3$.
Table 8: TRL vectors for each round (Example 1)

Case | (?) | $\beta$ | $P_{z+1}$ | $\Gamma(v)$ | $k$ | $z$ | $v$ | $C_{open}$ | New $TRL^i$ entries
Round 0 (initial) | - | 125 | 20 | 2 | 1 | 0 | $T_{req}^i$ = 15 | {} | {}
Round 1 | 4 | 110 | - | - | 4 | 1 | 20 | {4, 5} | ($B_1^i$ = 15, $E_1^i$ = 20, $L_1^i$ = 5), ($B_2^i$ = 15, $E_2^i$ = 20, $L_2^i$ = 4), ($B_3^i$ = 15, $E_3^i$ = 20, $L_3^i$ = 2)
Round 2 | 2 | 100 | 25 | 1 | 6 | 2 | 25 | {5} | ($B_4^i$ = 20, $E_4^i$ = 25, $L_4^i$ = 4), ($B_5^i$ = 20, $E_5^i$ = 25, $L_5^i$ = 5)
Round 3 | 2 | 95 | 30 | 1 | 7 | 3 | 30 | {} | ($B_6^i$ = 25, $E_6^i$ = 30, $L_6^i$ = 5)
Round 4 | 2 | 90 | 35 | 2 | 8 | 4 | 35 | {3} | ($B_7^i$ = 30, $E_7^i$ = 35, $L_7^i$ = 3)
Round 5 | 2 | 80 | 40 | 2 | 10 | 5 | 40 | {2} | ($B_8^i$ = 35, $E_8^i$ = 40, $L_8^i$ = 3), ($B_9^i$ = 35, $E_9^i$ = 40, $L_9^i$ = 2)
Round 6 | 2 | 70 | 45 | 3 | 12 | 6 | 45 | {2, 4} | ($B_{10}^i$ = 40, $E_{10}^i$ = 45, $L_{10}^i$ = 2), ($B_{11}^i$ = 40, $E_{11}^i$ = 45, $L_{11}^i$ = 4)
Round 7 | 2 | 55 | 50 | 4 | 15 | 7 | 50 | {2, 3, 4} | ($B_{12}^i$ = 45, $E_{12}^i$ = 50, $L_{12}^i$ = 2), ($B_{13}^i$ = 45, $E_{13}^i$ = 50, $L_{13}^i$ = 3), ($B_{14}^i$ = 45, $E_{14}^i$ = 50, $L_{14}^i$ = 4)
Round 8 | 4 | 40 | 55 | 4 | 18 | 8 | 55 | {1, 2, 4} | ($B_{15}^i$ = 50, $E_{15}^i$ = 55, $L_{15}^i$ = 2), ($B_{16}^i$ = 50, $E_{16}^i$ = 55, $L_{16}^i$ = 3), ($B_{17}^i$ = 50, $E_{17}^i$ = 55, $L_{17}^i$ = 4)
Round 9 | 4 | 25 | 60 | 3 | 21 | 9 | 60 | {1, 4, 5} | ($B_{18}^i$ = 55, $E_{18}^i$ = 60, $L_{18}^i$ = 2), ($B_{19}^i$ = 55, $E_{19}^i$ = 60, $L_{19}^i$ = 4), ($B_{20}^i$ = 55, $E_{20}^i$ = 60, $L_{20}^i$ = 5)
Round 10 (exit loop) | 1 | 0 | 70 | 2 | 24 | 10 | 70 | {5} | ($B_{21}^i$ = 60, $E_{21}^i$ = 68.3, $L_{21}^i$ = 1), ($B_{22}^i$ = 60, $E_{22}^i$ = 68.3, $L_{22}^i$ = 4), ($B_{23}^i$ = 60, $E_{23}^i$ = 68.3, $L_{23}^i$ = 5)
For the second example, assume the file being scheduled is 3.125MB, which means it
requires 25 slots. We set the other parameters of the file transfer request as indicated
in Table 9. The state of the single link is shown in Figure 32. Table 10 shows the TRL
allocation vector for each round. After Round 6, all the time ranges have been
identified. Now we need to merge ranges on the same channel, if any. From Table 10, we
see that for channel 2, the ranges (35-40), (40-45), (45-50), (50-55), and (55-60) can be
merged into one time range. Therefore, $TRL^i$ is {($B_1^i = 35$, $E_1^i = 60$,
$L_1^i = 2$)}. The final allocation is illustrated with dashed lines in Figure 32.
Table 9: Input parameters for example 2

Parameter      Value
$T_{req}^i$    32
$R_{max}^i$    1 channel
$m$            7 channels
Figure 32: Link state $A^i(t)$ (channels 1-7 versus time). Dashed lines show the
allocation of resources for the example 3.125MB file transfer described above, with
$T_{req}^i = 32$ and $R_{max}^i = 1$.
Table 10: TRL vectors for each round (Example 2)

Case | (?) | $\beta$ | $P_{z+1}$ | $\Gamma(v)$ | $k$ | $z$ | $v$ | $C_{open}$ | New $TRL^i$ entries
Round 0 (initial) | - | 25 | - | 0 | 1 | 0 | $T_{req}^i$ = 32 | {} | {}
Round 1 | - | 25 | 35 | 2 | 1 | 1 | 35 | {} | {}
Round 2 (repeat loop) | 4 | 20 | 40 | 5 | 2 | 2 | 40 | {2, 3} | ($B_1^i$ = 35, $E_1^i$ = 40, $L_1^i$ = 2)
Round 3 | 4 | 15 | 45 | - | 3 | 3 | 45 | {2, 4, 6, 7} | ($B_2^i$ = 40, $E_2^i$ = 45, $L_2^i$ = 2)
Round 4 | 4 | 10 | 50 | - | 4 | 4 | 50 | {2, 4, 6, 7} | ($B_3^i$ = 45, $E_3^i$ = 50, $L_3^i$ = 2)
Round 5 | 4 | 5 | 55 | 6 | 10 | 5 | 55 | {1, 2, 3, 4, 6} | ($B_4^i$ = 50, $E_4^i$ = 55, $L_4^i$ = 2)
Round 6 | 4 | 0 | 60 | 3 | 6 | 6 | 60 | {1, 4, 5} | ($B_5^i$ = 55, $E_5^i$ = 60, $L_5^i$ = 2)
A-2. Some characteristics of the bounded Pareto distribution
In our simulation, for the file-size distribution, we used the bounded Pareto
distribution rather than the exponential distribution.
In this section, we briefly go over some important characteristics of the bounded Pareto
distribution. These are summarized as follows:

- Bounded Pareto (truncated Pareto) distribution: the probability density function (PDF)
of the file size is given by

$$f_X(x) = \frac{\alpha k^{\alpha} x^{-\alpha-1}}{1 - (k/p)^{\alpha}}, \qquad k \leq x \leq p \qquad (26)$$

where $\alpha$ is the shape parameter, $k$ the lower bound, and $p$ the upper bound.
- Mean of this distribution (first moment):

$$E_X(x) = \int_k^p x\, f_X(x)\, dx = \frac{\alpha k^{\alpha}}{1 - (k/p)^{\alpha}} \cdot \frac{k^{1-\alpha} - p^{1-\alpha}}{\alpha - 1} \qquad (27)$$

- Second moment of this distribution:

$$E_X(x^2) = \int_k^p x^2 f_X(x)\, dx = \frac{\alpha k^{\alpha}}{1 - (k/p)^{\alpha}} \cdot \frac{k^{2-\alpha} - p^{2-\alpha}}{\alpha - 2} \qquad (28)$$

- General formula for the $j$th moment of the bounded Pareto distribution:

$$E_X(x^j) = \int_k^p x^j f_X(x)\, dx = \begin{cases} \dfrac{\alpha k^{\alpha}\left(k^{j-\alpha} - p^{j-\alpha}\right)}{(\alpha - j)\left(1 - (k/p)^{\alpha}\right)} & \text{if } \alpha \neq j \\[2ex] \dfrac{\alpha k^{\alpha}\left(\ln p - \ln k\right)}{1 - (k/p)^{\alpha}} & \text{if } \alpha = j \end{cases} \qquad (29)$$

- Variance of the file-size distribution:

$$\sigma_X^2 = \int_k^p \left(x - E_X(x)\right)^2 f_X(x)\, dx = E_X(x^2) - E_X(x)^2 \qquad (30)$$
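As a numerical cross-check of these formulas, the following small Python script (the
function name and unit choices are ours, not the dissertation's) evaluates (29) and (30)
for the parameters used in Section 5.2.1, recovering a mean file size of about 2.27GB.

    import math

    def bp_moment(j, alpha, k, p):
        # j-th moment of the bounded Pareto(alpha, k, p), per eq. (29).
        norm = 1.0 - (k / p) ** alpha
        if alpha == j:
            return alpha * k ** alpha * (math.log(p) - math.log(k)) / norm
        return (alpha * k ** alpha * (k ** (j - alpha) - p ** (j - alpha))
                / ((alpha - j) * norm))

    alpha, k, p = 1.1, 500e6, 100e9       # Section 5.2.1 parameters, in bytes
    mean = bp_moment(1, alpha, k, p)
    var = bp_moment(2, alpha, k, p) - mean ** 2   # eq. (30)
    print(mean / 1e9)                     # ~2.27 (GB)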
Bibliography
[1] Kevin Thompson, Gregory J. Miller, and Rick Wilder. Wide-area Internet traffic
patterns and characteristics. IEEE Network, 11(6), November 1997.
[2] T. Ndousse, “High-performance Networks Research Program Overview,”
http://www.sc.doe.gov/ascr/mics/hpn/index.html.
[3] Napster, http://www.napster.com.
[4] Gnutella, http://www.wego.com
[5] H. B. Newman, M. H. Ellisman, J. A. Orcutt, “Data-intensive e-science frontier
research,” Communications of ACM, Vol. 46, No. 11, pp. 68-77, Nov. 2003.
[6] T. DeFanti, C. D. Laat, J. Mambretti, K. Neggers, B. St. Arnaud, “TransLight: a
global-scale LambdaGrid for e-science,” Communications of ACM, Vol. 46, No. 11,
pp. 34-41, Nov. 2003.
[7] First International Workshop on Protocols for Fast Long-Distance Networks,
PFLDnet 2003, http://data-ag.web.cern.ch/datatag/pfldnet2003/, Feb. 3-4, 2003,
Geneva, Switzerland.
[8] W. Feng and P. Tinnakornsrisupapa, “The Failure of TCP in High-Performance
Computational Grids,” Proc. of SC2000: High-Performance Network and
Computing Conference, Dallas, TX, Nov. 2000.
[9] S. Floyd, “High Speed TCP and Quick-Start for Fast Long-Distance Networks,” PFLDnet
2003, http://datatag.web.cern.ch/datatag/pfldnet2003/, Feb. 3-4, 2003, Geneva,
Switzerland.
[10] C. Jin, D. Wei, S. Low, J. Bunn, D. H. Choe, J. C. Doyle, H. Newman, S. Ravot, S.
Singh, G. Buhrmaster, R. M. A. Cottrell, and F. Paganini, “FAST Kernel: Background Theory
and Experimental Results,” PFLDnet 2003, http://datatag.web.cern.ch/datatag/pfldnet2003/,
Feb. 3-4, 2003, Geneva, Switzerland.
[11] T. Kelly, “Scalable TCP: Improving Performance in High Speed Wide Area Networks,”
PFLDnet 2003, http://datatag.web.cern.ch/datatag/pfldnet2003/, Feb. 3-4, 2003, Geneva,
Switzerland.
[12] J. Semke, J. Mahdavi, and M. Mathis, “Automatic TCP Buffer Tuning,” Proc. of
ACM SIGCOMM 1998, 28(4), October 1998.
[13] W. Feng, M. Gardner, M. Fish, and E. Weigle, “Automatic Flow-Control
Adaptation for Enhancing Network Performance in Computational Grids,” Journal
of Grid Computing 2003.
[14] M. Gardner, W. Feng, and M. Fish, “Dynamic Right-sizing in FTP (drsFTP): An
Automatic Technique for Enhancing Grid Performance,” Proc. of the IEEE Symposium on
High-Performance Distributed Computing, July 2002.
[15] Matthew Mathis, www.psc.edu/~mathis/MTU
[16] Bill St. Arnaud, “Proposed CA*net 4 Network Design and Research Program,”
Revision no. 8, April 2, 2002.
[17] P. Ashwood-Smith, et al. “Generalized MPLS - RSVP-TE Extensions,” IETF
Internet Draft, draft-ietf-mpls-generalized-rsvp-te-04.txt, July 2001.
[18] Optical Internetworking Forum, “User Network Interface (UNI) 1.0 Signaling
Specification,” Oct. 1, 2001, http://www.oiforum.com/public/documents/OIF-UNI-01.0.pdf.
[19] E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, A. S. Lapaugh, “Scheduling File
Transfers,” SIAM Journal of Computing, vol. 14, no. 3, Aug. 1985, pp. 744-780.
[20] T. Erlebach and K. Jansen, “Off-line and On-line Call Scheduling in Stars and
Trees,” in Proceedings of the 23rd International Workshop on Graph-Theoretic
Concepts in Computer Science, WG ’97, LNCS1335, pp. 195-213, Springer-Verlag
1997.
[21] J. T. Havil, W. Mao, and R. Simha, “A Lower bound for on-line File Transfer
Routing and Scheduling,” Proceedings of the 1997 Conference on Information
Sciences and Systems, 225-230, 1997.
[22] D. Wischik and A. Greenberg, “Admission control for booking ahead shared
resources,” Proc. of IEEE Infocom, 1998, pp. 873-882
[23] D. Ferrari, A. Gupta, G. Ventre, “Distributed advanced reservation of real-time
connections,” Tech. Rep. TR95-008, International Computer Science Institute,
Berkeley, March 1995.
[24] W. Reinhardt, “Advanced resource reservation and its impact on reservation
protocols.” In Proc. IWACA’94.
[25] L. C. Wolf, L. Delgrossi, R. Steinmetz, S. Schaller, and H. Wittig, “Issues of
reserving resources in advance,” Proc. of NOSSDAV ’95, 1995.
[26] S. Martello and D. Vigo, “Exact solution of the two-dimensional bin packing
problem,” Management Science, 1998.
[27] E. E. Bischoff and M. D. Marriott, “A Comparative Evaluation of Heuristics for
Container Loading,” European Journal of Operations Research, 44: 267-276, 1990
[28] M. Gehring, K. Menscher, and M. Meyer, “A Computer Based Heuristic for Packing
Pooled Shipment Containers,” European Journal of Operations Research, 44: 277-288, 1990.
[29] S. Martello, D. Pisinger, D. Vigo, “The Three-Dimensional Bin Packing Problem,”
Operations Research, 48, 256-267, 2000.
[30] M. Pinedo, Scheduling: Theory, algorithms and systems, Prentice Hall, Inc. 1995.
[31] I. R. Philip and Y.-L. Liong, “The Scheduled Transfer (ST) Protocol,” 3rd Intl.
Workshop on Communications, Architecture and Applications for Network-Based
Parallel Computing (CANC’99), Lecture Notes in Computer Science, vol. 1502, Jan.
1999.
[32] H. Wang, M.Veeraraghavan and R. Karri, “A hardware implementation of a
signaling protocol,” Proc. of Opticomm 2002, July 29-Aug. 2, 2002, Boston, MA.
[33] S. K. Long, R. R. Pillai, J. Biswas, and T. C. Khong, “Call Performance Studies on
the ATM Forum UNI Signaling,”
http://www.krdl.org.sg/Research/Publications/Papers/pillai_uni_perf.pdf.
[34] W. Doeringer, D. Dykeman, M. Kaiserswerth, B. W. Meister, H. Rudin, R.
Williamson, “A survey of light-weight transport protocols for high-speed networks”,
IEEE Trans. Comm., 38(11):2025-39, Nov. 1990.
[35] S. Iren, P. D. Amer, and P.T. Conrad, “The Transport Layer: Tutorial and Survey,”
ACM Computing Surveys, Vol. 31, No. 4, Dec. 99.
[36] M. Blumrich, C. Dubnicki, E. Felten, and K. Li, “Protected, User-Level DMA for the
SHRIMP Network Interface,” in Proceedings of the 2nd International Symposium on
High-Performance Computer Architecture, San Jose, CA, Feb. 3-7, 1996, pp. 154-165.
[37] P. Druschel, L. L. Peterson, and B. S. Davie, “Experiences with a High-Speed Network
Adapter: A Software Perspective,” in Proceedings of ACM SIGCOMM ’94, Aug. 1994.
[38] S. Pakin, M. Lauria, and A. Chien, “High Performance Messaging on Workstations:
Illinois Fast Messages (FM) for Myrinet,” In Proceedings of Supercomputing ’95,
San Diego, CA, 1995.
[39] N. Cardwell, S. Savage, and T. Anderson, “Modeling TCP Latency,” Proc. of IEEE
Infocom, Mar. 26-30, 2000, Tel-Aviv, Israel, pp. 1724-1751.
[40] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP Throughput: A
Simple Model and its Empirical Validation,” Proc. of ACM SIGCOMM 98, Aug. 31
- Sep. 4, Vancouver Canada, pp. 303-314.
[41] M. Allman, V. Paxson, W. Stevens, “TCP Congestion Control”, IETF RFC 2581,
Apr. 1999.
[42] V. Paxson and S. Floyd, “Wide-area traffic: The failure of Poisson Modeling,”
IEEE/ACM Trans. Networking, Vol. 3, pp. 226-244, June 1995.
[43] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web Traffic
Evidence and Possible Causes,” Proc. of the SPIE International Conference on
Performance and Control of Network Systems, Nov., 1997.
[44] T. Bu, N. G. Duffield, F. Lo Presti, D. Towsley, “Network Tomography on General
Topologies,” Proceedings of ACM SIGMETRICS 2002.
[45] P. Newman, G. Minshall, T. Lyon, L. Huston, “Flow Labeled IP: A Connectionless
Approach to ATM,” Proc. of IEEE Infocom 1996.
[46] M. Veeraraghavan, H. Lee, and X. Zheng, “File transfers across optical circuit-switched networks,” PFLDnet 2003, Feb. 3-4, 2003, Geneva, Switzerland.
[47] M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W. Feng, “CHEETAH:
Circuit-switched High-speed End-to-End Transport ArcHitecture,” accepted for publication
in the Proc. of Opticomm 2003, Oct. 13-17, Dallas, TX.
[48] J. Walrand, “Managing QoS in Heterogeneous Networks, Internet Next Generation,”
http://robotics.eecs.berkeley.edu/~wlr/Presentations/Managing%20QoS.pdf, Porquerolles,
2003.
[49] M. E. Crovella, M. Harchol-Balter, and C. D. Murta, “Task Assignment in a
Distributed System: Improving Performance by Unbalancing Load,” BUCS-TR-1997-018, October
31, 1997.
[50] D. Bertsekas and R. Gallager, Data Networks, Prentice Hall: New Jersey, 1986.
[51] R. Van der Meer, “MEMS Madness,” June 19, 2001,
http://www.lightreading.com/document.asp?doc_id=6149&site-trading.
[52] M. Veeraraghavan, H. Lee, E. K. P. Chong, and H. Li, “A varying-bandwidth list
scheduling heuristic for file transfers,” in Proc. of ICC 2004, June 20-24, Paris,
France.
[53] M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, and H. Li, “Scheduling
and Transport for File Transfers on High-speed Optical Circuits,” PFLDnet 2004, Feb.
16-17, 2004, Argonne, Illinois, http://www-didc.lbl.gov/PFLDnet2004/.
[54] H. Lee, M. Veeraraghavan, E. K. P. Chong, and H. Li, “Lambda scheduling algorithm
for file transfers on high-speed optical circuits,” in Proc. of the Workshop on Grids and
Advanced Networks (GAN’04), part of the IEEE International Symposium on Cluster Computing
and the Grid (CCGrid 2004), April 19-22, 2004, Chicago, Illinois.
[55] M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. K. P. Chong, and H. Li, “Scheduling
and Transport for File Transfers on High-speed Optical Circuits,” Journal of Grid
Computing (JOGC 2004), Special Issue on High Performance Networking, to appear.
[56] David L. Mills, Ajit Thyagarajan, and Brian C. Huffman, “Internet timekeeping around
the globe,” in Proc. Precision Time and Time Interval (PTTI) Applications and Planning
Meeting, Long Beach, CA, Dec. 1997.