Automatic Bandwidth Delay Product Discovery
Probably the most important unsolved technical problem with TCP over
high-performance networks today is automatically determining the
bandwidth-delay-product (BDP), which is used to specify the simplified
maximum TCP window size for each TCP session. The correct BDP is
extremely important for maximum utilization of high-performance
networking without undue memory consumption, particularly over very
long-distance high-performance networks.
The simplified BDP is calculated by multiplying the round-trip-time (RTT)
by the maximum bandwidth of the least-capable hop of all of the router hops
between two hosts using the TCP protocol. The RTT is easily obtained with
the "ping" utility, which sends an ICMP Echo Request message, a special
IP message that the receiving host echoes back to the sending host.
Simply measuring the time it takes to get the echoed message back provides
the RTT.
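As a worked illustration of the calculation above, the following minimal sketch multiplies an RTT by a bottleneck bandwidth. The function name, the 70 ms RTT, and the 1 Gbit/s bottleneck are hypothetical values chosen for the example, and the division by 8 assumes bandwidth is quoted in bits per second while the window is sized in bytes.

```python
def simplified_bdp_bytes(rtt_seconds: float, bottleneck_bits_per_second: float) -> int:
    """Return the simplified BDP in bytes: RTT times bottleneck bandwidth."""
    # Bandwidth is assumed to be in bits/s; TCP windows are sized in bytes.
    return round(rtt_seconds * bottleneck_bits_per_second / 8)


# Hypothetical path: 70 ms round trip, least-capable hop is 1 Gbit/s.
window = simplified_bdp_bytes(0.070, 1e9)
print(window)  # 8750000 bytes, roughly 8.75 MB
```

A window much smaller than this cannot keep such a path full, while a much larger one only consumes memory.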
Right now, however, there is no quick and easy method to
automatically obtain the least-bandwidth number between an arbitrary pair
of TCP hosts, so all users who expect to obtain high-performance
networking results are required to manually compute this value, which
essentially implies that such users must have an intimate knowledge of the
complete network topology between their own host and any host with which
they wish to communicate. Furthermore, every application that is to make
use of this information requires special coding and special user-interface
parameters. This is a ludicrous situation akin to having to be an expert
automobile mechanic to be able to drive an automobile (which was actually
the case at the dawn of the automobile age). As long as BDP discovery
requires network-engineer training of every user who must deal with it, we too
will remain only at the dawn of the high-performance networking era.
The method proposed herein for automatic BDP discovery and caching is to
use a simple mechanism modeled after the ICMP Echo Request and Echo
Reply protocol to discover the bandwidth of the least-capable hop between a
given source and destination host pair. This new mechanism could be a new
type of ICMP Request/Reply pair, or it could be a simple enhancement to
the existing Echo Request/Reply, but using a new IP option class/number
combination. The main difference between the new mechanism and the
existing ICMP Request/Reply pair is that the router would have to process
two new fields in the message.
This new mechanism would actually be different from the ICMP Echo
protocol in only a couple of ways, and would work as follows:
1. Two new fields would be defined for the Request and Reply message
types:
a) A Next-Hop-Least-Bandwidth-Request Field, and
b) A Next-Hop-Least-Bandwidth-Reply Field
2. BDP Request and Reply would work like Echo Request and Reply do
now except each router along the path would intercept the BDP messages
and perform the following before forwarding the message to the next
hop:
a) If the message is a Request, then the router would compare the value of
the Next-Hop-Least-Bandwidth-Request field with the router’s best
knowledge of the next hop’s maximum possible bandwidth. If the
next-hop bandwidth is less than the field’s value, the router would
overwrite the field with the next-hop bandwidth; otherwise the field’s
existing value is left unchanged. Essentially, the router should always
compare the maximum possible bandwidth of the next hop with the current
Next-Hop-Least-Bandwidth-Request value. The goal is to return the maximum possible
bandwidth of the least-capable link in the path so that the maximum TCP
window will never be too small to consume all possible bandwidth,
should such bandwidth ever be available. If a router isn't exactly sure of
a link's maximum possible bandwidth, it should therefore use the largest
bandwidth it thinks a link may be capable of. Note that this procedure is
not a bandwidth allocation scheme and a router should never use a value
smaller than the maximum bandwidth that a link is capable of.
b) If the message is a Reply, then the router would compare the value of the
Next-Hop-Least-Bandwidth-Reply field with the router’s best knowledge
of the next hop’s maximum possible bandwidth. If the next-hop
bandwidth is less than the field’s value, the router would overwrite the
field with the next-hop bandwidth; otherwise the field's existing value is
left unchanged. Essentially, the router should always compare the maximum
possible bandwidth of the next hop with the current
Next-Hop-Least-Bandwidth-Reply value. The goal is to return the maximum possible bandwidth of
the least-capable link in the path so that the maximum TCP window will
never be too small to consume all possible bandwidth, should such
bandwidth ever be available. If a router isn't exactly sure of a link's
maximum possible bandwidth, it should therefore use the largest
bandwidth it thinks a link may be capable of. Note that this procedure is
not a bandwidth allocation scheme and a router should never use a value
smaller than the maximum bandwidth that a link is capable of.
3. The source host would fill in the initial Next-Hop-Least-Bandwidth-Request
value, while the destination host would fill in the initial
Next-Hop-Least-Bandwidth-Reply value. The destination host is also
responsible for converting the received Request message into the
outgoing Reply message, preserving all information received in the
Request message, just as it would for a normal Echo Request message.
The destination host could also extract the value of the
Next-Hop-Least-Bandwidth-Request Field.
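The three steps above can be sketched as a small simulation. This is only an illustration under stated assumptions, not a wire format: the class and function names are hypothetical, bandwidths are assumed to be in bits per second, and each host is assumed to fill in its initial value from the maximum bandwidth of its own first link (the proposal does not specify the initial value).

```python
from dataclasses import dataclass


@dataclass
class BdpMessage:
    # Field names mirror the proposal; units (bits/s) are an assumption.
    least_bw_request: float
    least_bw_reply: float = float("inf")
    is_reply: bool = False


def router_process(msg: BdpMessage, next_hop_max_bw: float) -> BdpMessage:
    """Step 2: lower the relevant field if the next hop is less capable."""
    if msg.is_reply:
        msg.least_bw_reply = min(msg.least_bw_reply, next_hop_max_bw)
    else:
        msg.least_bw_request = min(msg.least_bw_request, next_hop_max_bw)
    return msg


def probe(forward_links, return_links):
    """Simulate one Request/Reply exchange over lists of per-link maximum bandwidths."""
    # Step 3: the source fills in the initial Request value (assumed: its first link).
    msg = BdpMessage(least_bw_request=forward_links[0])
    for bw in forward_links[1:]:  # each router checks its next hop
        router_process(msg, bw)
    # The destination turns the Request into a Reply, preserving the Request field,
    # and fills in the initial Reply value (assumed: its first return link).
    msg.is_reply = True
    msg.least_bw_reply = return_links[0]
    for bw in return_links[1:]:
        router_process(msg, bw)
    return msg.least_bw_request, msg.least_bw_reply


# Hypothetical asymmetric path: 1 Gbit/s bottleneck outbound, 622 Mbit/s on the way back.
fwd, ret = probe([10e9, 1e9, 10e9], [10e9, 10e9, 622e6])
print(fwd, ret)
```

Because each router only ever lowers a field, the value that arrives is the minimum over the links traversed, which is the least-capable hop in that direction.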
Note that when the BDP Reply arrives at the source host, it would provide
the RTT plus the least-bandwidth information for both the forward and
return paths between the source host and the destination host, which is all
the information necessary to calculate the pair of simplified BDPs. This
information would be cached in a new kernel table indexed by destination
host IP address, and individual cached entries would time out after some
suitable interval to force fresh information to be obtained.
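The caching behavior described above might look like the following user-space sketch. It is not a kernel table: the class name, the 300-second default timeout, and the stored tuple layout are all assumptions made for illustration. The clock is injectable so that expiry can be exercised without waiting.

```python
import time


class BdpCache:
    """Per-destination cache of probe results with a fixed timeout (hypothetical)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # dest IP -> (expiry, rtt, forward_bw, return_bw)

    def put(self, dest_ip, rtt_seconds, forward_bw, return_bw):
        expiry = self._clock() + self._ttl
        self._entries[dest_ip] = (expiry, rtt_seconds, forward_bw, return_bw)

    def get(self, dest_ip):
        """Return (rtt, forward_bw, return_bw), or None if missing or expired."""
        entry = self._entries.get(dest_ip)
        if entry is None:
            return None
        expiry, rtt, fwd_bw, ret_bw = entry
        if self._clock() >= expiry:
            # Expired: drop the entry so the caller is forced to probe again.
            del self._entries[dest_ip]
            return None
        return rtt, fwd_bw, ret_bw


# Usage with a fake clock; 192.0.2.1 is a documentation-only address.
now = [0.0]
cache = BdpCache(ttl_seconds=10.0, clock=lambda: now[0])
cache.put("192.0.2.1", 0.070, 1e9, 622e6)
fresh = cache.get("192.0.2.1")
now[0] = 10.0
stale = cache.get("192.0.2.1")
```

Here `fresh` holds the cached tuple, while `stale` is `None` because the entry has timed out, which would prompt a new BDP Request.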
Development of this BDP protocol initially requires the cooperation of at
least one router vendor, though a crude prototype could be demonstrated
with traceroute and SNMP-derived information.