Critique

Efficient Policies for Carrying Web Traffic Over Flow-Switched Networks
By Anja Feldmann, Jennifer Rexford, and Ramon Caceres
Paper Critique by:
Tauhid Mazedur Rahman
Reji Varghese
Muhammad Murshed Alam
Abstract:
For the last few years the Internet has been experiencing explosive growth, and that growth is hard to predict. As the Internet grows, there is an increasing demand for faster and more efficient traffic management over high-speed links. Several approaches have therefore been proposed that establish dedicated shortcut paths for long-lived flows to improve performance. With this method the network can use its resources more efficiently and, as a result, move traffic faster.
In their paper, Anja Feldmann, Jennifer Rexford, and Ramon Caceres evaluate the processor and switch overheads of a flow-switched network. They characterize flow traffic along several parameters and find that a moderate level of aggregation, trigger, and timeout, combined with partial route aggregation, gives the best cost-performance balance for the network.
I. Introduction:
The Internet as a whole is growing so fast that the limited network resources cannot handle the huge, unending traffic, so there is an urgent need for more efficient packet-switching techniques in routers. Since traffic keeps growing, network resources will never keep pace with it; the only option is to make sure the available resources are used to their full capacity. In this context, many researchers have suggested creating shortcuts for flows. A shortcut is a fast switching path, and a flow is a group of related IP packets that are treated as a unit. The idea behind a flow is that the router groups packets that share some property, such as the same destination address, and serves them all in the same way. This reduces the overhead of examining every packet individually and therefore boosts performance. Once a flow is defined, the next task is to create a shortcut for it. In a network, a shortcut is a predefined path that takes minimal switching time. Normally, when a packet arrives, the router has to look up the routing table, match the address against the prefix masks, and then forward the packet; this costs both time and computational power. But when a shortcut exists for a flow, the router already knows where to forward each packet of that flow and does not need to consult the routing table. That saves both the lookup time and the overhead of path computation. Tag switching is one implementation of this concept.
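To make the contrast concrete, the following sketch (ours, not taken from the paper) models a router that first checks a shortcut table keyed by flow and falls back to a routing-table lookup only when no shortcut exists. All addresses, prefixes, and interface names are invented for illustration, and the shortcut is installed on the very first packet, a simplification that ignores the trigger discussed later.

```python
# Illustrative sketch (not from the paper): a router that forwards packets
# over an established shortcut when one exists, and falls back to a
# routing-table lookup otherwise. Addresses and interfaces are made up.
import ipaddress

routing_table = [            # (prefix, outgoing interface), longest prefix wins
    (ipaddress.ip_network("0.0.0.0/0"), "if0"),
    (ipaddress.ip_network("10.1.0.0/16"), "if1"),
    (ipaddress.ip_network("10.1.2.0/24"), "if2"),
]

shortcuts = {}               # flow key -> outgoing interface (the "shortcut")

def routing_lookup(dst):
    """Expensive path: longest-prefix match over the routing table."""
    dst = ipaddress.ip_address(dst)
    matches = [(net, ifc) for net, ifc in routing_table if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def forward(src, dst):
    """Cheap path first: a single dictionary lookup on the flow key."""
    key = (src, dst)         # one possible flow definition (host-to-host)
    if key in shortcuts:
        return shortcuts[key]            # shortcut hit: no table lookup
    ifc = routing_lookup(dst)            # shortcut miss: full lookup
    shortcuts[key] = ifc                 # install a shortcut for later packets
    return ifc

print(forward("192.0.2.7", "10.1.2.9"))  # first packet: routing lookup -> if2
print(forward("192.0.2.7", "10.1.2.9"))  # later packets: shortcut hit -> if2
```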
It is quite evident that establishing flows and then creating and maintaining shortcuts reduces the forwarding overhead on routers, saves time, and can support quality-of-service (QoS) features. But does the concept have any drawbacks? Yes: creating and maintaining shortcuts consumes network resources. Both flows and shortcuts are constrained by the number of router interfaces, the signaling capacity, and the number of switches. The authors of the paper therefore explored the tradeoffs inherent in using different flow definitions and different policies for creating and maintaining shortcut paths. They tried to find policies that reduce the network overhead while still serving most of the packets over flows and shortcuts.
Feldmann, Rexford, and Caceres considered three parameters for defining flows and shortcuts: aggregation, timeout, and trigger. Aggregation is the addressing level at which packets are combined; for example, a flow can consist of all packets travelling between the same two networks. The timeout is the maximum idle interval allowed within a flow: since network resources are limited, a router should not hold resources indefinitely waiting for more packets of a flow, so the flow is ended after a period of inactivity. The trigger is the number of packets that must be observed before a shortcut is established; the router sets up a shortcut only after that many packets with the same flow characteristics have passed through it.
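A minimal sketch of how these parameters could interact is shown below. It is our illustration, not the authors' code: the trigger and timeout values, the packet format, and the crude three-octet network prefix used for net-to-net aggregation are all assumptions.

```python
# Illustrative sketch (ours, not the authors' code): deciding when to set up a
# shortcut for a flow based on a packet-count trigger and an inactivity timeout.
# The values below are assumptions for illustration, not the paper's defaults.
TRIGGER = 3        # packets that must arrive before a shortcut is created
TIMEOUT = 60.0     # seconds of inactivity after which the flow is torn down

flows = {}         # flow key -> {"count": int, "last": float, "shortcut": bool}

def flow_key(pkt, aggregation="host-to-host"):
    """The aggregation level controls how much of the header forms the key."""
    if aggregation == "port-to-port":
        return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"])
    if aggregation == "host-to-host":
        return (pkt["src"], pkt["dst"])
    # net-to-net: keep only the first three octets as a crude network prefix
    return (pkt["src"].rsplit(".", 1)[0], pkt["dst"].rsplit(".", 1)[0])

def on_packet(pkt, now):
    key = flow_key(pkt)
    state = flows.get(key)
    if state is None or now - state["last"] > TIMEOUT:
        state = {"count": 0, "last": now, "shortcut": False}   # start a new flow
    state["count"] += 1
    state["last"] = now
    if not state["shortcut"] and state["count"] >= TRIGGER:
        state["shortcut"] = True        # trigger reached: set up a shortcut
    flows[key] = state
    return "shortcut" if state["shortcut"] else "hop-by-hop"

pkt = {"src": "10.1.2.9", "sport": 1042, "dst": "192.0.2.7", "dport": 80}
for t in (0.0, 0.5, 1.0, 1.5):
    print(t, on_packet(pkt, t))        # the third packet triggers the shortcut
```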
By varying these three parameters they explored three metrics of interest: the percentage of traffic that follows a shortcut, the shortcut setup rate, and the number of simultaneous shortcuts. The percentage of total traffic that follows a shortcut path is a measure of the performance gain, since that traffic avoids per-packet processing at the routers. In contrast, the shortcut setup rate and the number of simultaneous shortcuts measure the network overhead caused by creating and maintaining the dedicated paths. The shortcut setup rate is the number of shortcuts created during a fixed time interval; since network resources are limited, a high setup rate consumes more of them, which is undesirable. The number of simultaneous shortcuts is the number of shortcuts that are active at the same time. When a router creates a shortcut it must inform the neighboring routers, which causes signaling overhead, and each active shortcut occupies state in the switches, so a higher number of simultaneous shortcuts increases the load on the network.
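The three metrics can be made concrete with a small sketch (ours). Given a hypothetical record of each shortcut's setup time, teardown time, and bytes carried, it computes the setup rate, the peak number of simultaneously active shortcuts, and the fraction of bytes carried over shortcuts; the record format and the numbers are assumptions.

```python
# Illustrative sketch (ours): computing the three metrics the authors study
# from a list of shortcut records. The record format is an assumption.
# Each record: (setup_time_s, teardown_time_s, bytes_over_shortcut)
shortcut_records = [
    (0.0, 75.0, 40_000),
    (10.0, 30.0, 8_000),
    (12.0, 90.0, 120_000),
]
total_bytes = 200_000          # all bytes seen in the trace, shortcut or not
trace_seconds = 100.0

# 1) Shortcut setup rate: setups per second over the whole trace.
setup_rate = len(shortcut_records) / trace_seconds

# 2) Peak number of simultaneous shortcuts: sweep setup/teardown events.
events = [(t, +1) for t, _, _ in shortcut_records] + \
         [(t, -1) for _, t, _ in shortcut_records]
active = peak = 0
for _, delta in sorted(events):
    active += delta
    peak = max(peak, active)

# 3) Percentage of traffic carried over shortcuts.
shortcut_share = 100.0 * sum(b for _, _, b in shortcut_records) / total_bytes

print(f"setup rate: {setup_rate:.2f}/s, peak simultaneous: {peak}, "
      f"bytes over shortcuts: {shortcut_share:.1f}%")
```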
II. Methodology for data collection and measurements
The authors studied the temporal and spatial characteristics of Internet traffic. While collecting the data they considered several issues: the traffic trace should reflect the dominance of the World Wide Web, long continuous traces should be observed, the endpoint machines should be taken into account, and the flow and shortcut policies should be evaluated over a range of parameter settings. On this basis they had reasonable confidence that their data collection was representative of Internet traffic in general. They also argued that their work extends previous studies of flows and shortcuts.
A. Trace collection
Extensive packet-level traces were collected on an AT&T Research Lab network. A T1 line connects the AT&T lab to the external Internet, and a 10 Mbit/s Ethernet segment of the internal network carries all traffic to and from the T1 line. A sniffer machine running tcpdump was placed on this segment to record information about every packet passing along the line. The sniffer collected data on 100,000 packets in raw binary format in promiscuous mode, so each packet could be examined in detail. The collected data for every packet was then analyzed from different aspects.
A total of four tools were used in the packet-level analysis: tcpdump read each packet and generated an ASCII log, a Perl script classified each packet into flows, S-PLUS functions processed the log files to compute the various performance metrics, and tcpreduce estimated the breakdown of data transferred by application (FTP, HTTP, SMTP, DNS, etc.).
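We do not have the authors' Perl script, but a hypothetical Python equivalent of the flow-classification step might look like the sketch below; the simplified log format and the sample lines are our own assumptions.

```python
# Illustrative sketch (ours, not the authors' Perl script): grouping lines of
# a simplified, tcpdump-like ASCII log into host-to-host flows and counting
# packets and bytes per flow. The log format here is assumed for illustration.
from collections import defaultdict

log_lines = [
    "0.000 10.1.2.9 192.0.2.7 1500",     # time  src  dst  length
    "0.020 10.1.2.9 192.0.2.7 1500",
    "0.500 10.1.2.9 198.51.100.3 576",
]

flow_bytes = defaultdict(int)
flow_packets = defaultdict(int)

for line in log_lines:
    _, src, dst, length = line.split()
    key = (src, dst)                      # host-to-host aggregation
    flow_bytes[key] += int(length)
    flow_packets[key] += 1

for key in flow_bytes:
    print(key, flow_packets[key], "pkts", flow_bytes[key], "bytes")
```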
It was found that 76.1% of the bytes were incoming traffic, 16.4% outgoing traffic, and the remaining 7.5% internal traffic; the internal traffic consisted mostly of domain name service messages.
From the optional header fields of the HTTP request messages, the operating system of the client machine was identified. Hosts running Linux or Windows were classified as personal computers, while IP addresses associated with multiple types of machines, such as Windows and SunOS, were classified as proxy servers. Some hosts were classified manually when the packet trace did not contain enough information.
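A hypothetical version of that classification is sketched below; the User-Agent strings and the decision rules are our own illustration of the idea, not the authors' procedure.

```python
# Illustrative sketch (ours): classifying client hosts from the operating
# systems observed in their HTTP User-Agent headers, roughly in the spirit of
# the paper's classification. Strings and rules are our assumptions.
def classify_host(user_agents):
    """user_agents: set of User-Agent strings seen from one IP address."""
    systems = set()
    for ua in user_agents:
        lowered = ua.lower()
        if "windows" in lowered or "linux" in lowered:
            systems.add("pc")
        elif "sunos" in lowered or "solaris" in lowered:
            systems.add("unix")
        else:
            systems.add("unknown")
    if len(systems - {"unknown"}) > 1:
        return "proxy server"        # multiple machine types behind one address
    if systems == {"pc"}:
        return "personal computer"
    if systems == {"unix"}:
        return "UNIX workstation"
    return "unclassified"            # would require manual inspection

print(classify_host({"Mozilla/3.0 (Windows 95)"}))
print(classify_host({"Mozilla/3.0 (Windows 95)", "Mozilla/3.0 (X11; SunOS)"}))
```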
B. Traffic Characteristics
The data collected from the AT&T network show that half of the outgoing traffic consists of 40-byte TCP acknowledgement packets, so the average packet size is only 123 bytes. More than 60% of the incoming packets are 552, 576, or 1500 bytes long, which corresponds to common maximum transfer unit (MTU) sizes on the network; the internal Ethernet bounds the maximum packet size. Packet sizes thus differ heavily between incoming and outgoing traffic.
The authors carried out a one-week trace collection starting at 11 am on February 19, 1997 and computed the hourly average of the bandwidth consumed by the packets. The resulting graph shows that the traffic load depends on the day of the week, with most of the load during working days and less in the evenings; weekend traffic was very light. This indicates that Internet traffic fluctuates with the day and the time of day.
Using the tcpreduce tool, the authors estimated traffic properties for the various protocols. The table drawn from the tcpreduce output shows that traffic characteristics differ from protocol to protocol, and that HTTP dominates the traffic load.
Comments
The authors followed a systematic measurement procedure for collecting and analyzing the data. However, we think the data collection has some minor defects:
a) The AT&T network is a research network, which hardly experiences the huge traffic volumes faced by a commercial network.
b) The network saw far more incoming traffic than outgoing or internal traffic; does the rest of the Internet show the same behavior?
c) The internal Ethernet limits packets to 1500 bytes, which differs for other network technologies.
d) The authors' assumption of only two client operating systems may not reflect the actual scenario, even though most machines run them.
e) The number of packets was 100,000, which may not be enough considering the exponential growth of the Internet.
f) Some hosts were classified manually, which could introduce errors.
g) The authors did not say anything about the protocols they labeled as "other" and "unknown".
h) All the packets were captured at a single end point, which may have biased the traffic characteristics.
III. HTTP Response Flows
In this section the authors analyze the collected data in terms of traffic type, end-point aggregation, timeout value, and machine type.
A. Interpreting Flow-Size Distributions
The distribution shows a relatively broad mixture of flows between 100 and 100,000 bytes, with concentrations at certain sizes; for example, a large number of flows falls around 250-300 bytes. One third of the flows are between 1,000 and 10,000 bytes and another one third between 100 and 1,000 bytes.
B. Unique Characteristics of Web Traffic
Different applications give the traffic distinctive characteristics. For example, Telnet flows last a long time, although a 60-second timeout will split a single Telnet session into multiple flows, and Telnet does not generate as many bytes per flow as HTTP does. SMTP flows show a high concentration of flows with fewer than 1,000 bytes, with the remainder centered around 3,000 bytes. HTTP flows fall into three main categories: flows of fewer than 150 bytes stem from error messages and failed TCP sessions, flows of 150-300 bytes come from cache messages, and flows above 300 bytes correspond to actual Web page transfers.
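Those three categories amount to a simple size-based rule; a sketch (ours) of such a labeling function follows, using the byte thresholds quoted above.

```python
# Illustrative sketch (ours): labeling HTTP response flows with the byte
# thresholds described above (error/failed session, cache message, page transfer).
def classify_http_flow(bytes_in_flow):
    if bytes_in_flow < 150:
        return "error message or failed TCP session"
    if bytes_in_flow <= 300:
        return "cache message"
    return "Web page transfer"

for size in (90, 220, 4800):
    print(size, "->", classify_http_flow(size))
```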
C. HTTP Flows by Machine Type
Flow characteristics also depend a great deal on the type of end machine. Modem-connected hosts and personal computers have the same distribution of bytes and packets per flow, but the modem link makes modem-connected flows much longer-lived. UNIX machines have more bytes and packets per flow due to longer transfers of PostScript and image files. Proxy servers show a mixture of longer and shorter flows depending on whether there is a cache hit: if the requested data is already in the cache, no data transfer takes place; otherwise the data must be transferred. If the cache messages are removed, proxy servers follow the UNIX machine flow distribution.
D. Combining Multiple Web Responses
Traffic can be combined at three levels of aggregation: port-to-port, host-to-host, and net-to-net. Port-to-port combines traffic originating and terminating at the same source and destination ports, host-to-host combines traffic between the same pair of end-point hosts, and net-to-net combines flows from the same source network to the same destination network. Host-to-host aggregation can counter some of the unfairness associated with port-to-port aggregation, where aggressive users can degrade the performance of other users by opening multiple TCP sessions to a server. Combining flows host-to-host also increases the number of packets per flow, reducing the high concentration of short-lived flows. Aggregating traffic beyond the host level does not yield a substantial improvement over host-to-host.
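The effect of moving from port-to-port to host-to-host aggregation can be illustrated with a small sketch (ours): several parallel TCP connections between one client and one server collapse into a single host-to-host flow with many more packets per flow. The connection list is made up.

```python
# Illustrative sketch (ours): the same parallel TCP connections counted as
# port-to-port flows versus a single host-to-host flow. Connections are made up.
connections = [
    # (src, sport, dst, dport, packets) -- e.g. a browser opening 4 connections
    ("10.1.2.9", 1040, "192.0.2.7", 80, 6),
    ("10.1.2.9", 1041, "192.0.2.7", 80, 5),
    ("10.1.2.9", 1042, "192.0.2.7", 80, 7),
    ("10.1.2.9", 1043, "192.0.2.7", 80, 4),
]

port_flows = {(s, sp, d, dp) for s, sp, d, dp, _ in connections}
host_flows = {(s, d) for s, _, d, _, _ in connections}
packets = sum(p for *_, p in connections)

print(len(port_flows), "port-to-port flows,",
      packets / len(port_flows), "packets per flow on average")
print(len(host_flows), "host-to-host flow,",
      packets / len(host_flows), "packets per flow")
```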
IV. Network Shortcut Overhead
The authors find that the shortcut setup rate increases as the trigger (the number of packets required before setup) decreases, and decreases as the flow timeout increases. The number of simultaneous shortcut connections likewise increases as the trigger decreases, but it decreases as the timeout decreases. The timeout therefore has a conflicting effect on overhead: a higher timeout reduces the shortcut setup rate but increases the number of simultaneous shortcut connections.
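This trade-off can be seen in a toy simulation (ours, with invented arrival times): with a longer timeout, bursts separated by idle gaps reuse a single shortcut, so there are fewer setups, but the shortcut stays installed through the gaps and is held much longer.

```python
# Illustrative toy simulation (ours): how the flow timeout trades setup rate
# against simultaneous shortcut state. Arrival times are invented; one flow.
def simulate(arrivals, timeout, trigger=1):
    """Return (number of shortcut setups, total seconds a shortcut was held)."""
    setups, held, last, active_since = 0, 0.0, None, None
    count = 0
    for t in arrivals:
        if last is None or t - last > timeout:       # idle gap ends the flow
            if active_since is not None:
                held += (last + timeout) - active_since   # shortcut torn down
            count, active_since = 0, None
        count += 1
        if active_since is None and count >= trigger:
            setups += 1
            active_since = t                          # shortcut installed
        last = t
    if active_since is not None:
        held += (last + timeout) - active_since
    return setups, held

arrivals = [0, 1, 2, 40, 41, 42, 80, 81, 82]          # three bursts, 40 s apart
for timeout in (10, 60):
    setups, held = simulate(arrivals, timeout)
    print(f"timeout={timeout:>2}s: {setups} setups, shortcut held {held:.0f}s")
```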
V. Route Aggregation
The authors find that partial aggregation of traffic reduces both the shortcut setup rate and the number of simultaneous shortcut connections, and that further reduction is possible by increasing the flow timeout. The reduction is dramatic when the aggregated traffic flows from a network to multiple users. For both host-to-host and host-to-net address aggregation, a larger overhead reduction was possible for 3-hop than for 7-hop aggregation. Based on these findings, the paper suggests dividing a large network into multiple small regions, with separate shortcut connections across each part of the route.
VI. Further Research
To better understand efficient policies for carrying web traffic over flow-switched networks, the following concerns need to be considered:
1) Data were collected only at the AT&T research lab. The traffic distribution may not be the same if data were collected at other sources or at intermediate routers. We think data need to be collected at different kinds of sources and destinations, as well as at intermediate routers.
2) Data need to be collected and analyzed for other periods of the year. The authors collected data for only one week in February, which may not be sufficient to represent traffic throughout the year.
3) In this paper the authors measure network performance by the setup rate, the number of simultaneous connections, and the percentage of traffic carried over shortcuts. Other performance issues, such as delay, congestion, and packet loss, also need to be evaluated for a flow-switched network.
4) Further research is needed on a detailed breakdown of web traffic by content type, as well as on the implications of push technology and the new features in the emerging HTTP standards, to fully understand web traffic over flow-switched networks.