Available Bandwidth and TCP Throughput

Optimizing Network Performance In
Replicated Hosting
Peter Steenkiste (CMU)
with
Ningning Hu (CMU),
Oliver Spatscheck (AT&T),
Jia Wang (AT&T)
Ningning Hu
Carnegie Mellon University
Motivation
The question of how to use latency to select a replicated web server has been well studied
How about using available bandwidth?
Outline
Pathneck
Internet end user RTT distribution and
access bandwidth distribution
Optimization results
 For RTT
 For bandwidth
 For data transmission time
Pathneck: Recursive Packet Train (RPT)
[Figure: RPT layout. 20 measurement packets (60 B each, TTL 1, 2, …, 20), then 60 load packets (500 B each, TTL 100), then 20 measurement packets (60 B each, TTL 20, …, 2, 1).]
Two measurement packets, one from each end of the train, are dropped at each router
The resulting ICMP packets allow the source to estimate the train length at each hop
Changes in train length provide bounds on the available bandwidth of each link
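The reasoning in the last bullet can be illustrated with a short sketch. It is not Pathneck's implementation: it assumes a simple fluid model in which a link whose available bandwidth is below the train's arrival rate stretches the train, and it treats the train size as constant even though two measurement packets are dropped per hop. The helper `bandwidth_bounds`, its `threshold` parameter, and the per-hop gap input (time between the ICMP replies for the head and tail packets at each hop) are hypothetical.

```python
# Train size from the RPT slide: 60 load packets of 500 B plus
# 20 measurement packets of 60 B on each side (assumed constant here).
TRAIN_BYTES = 60 * 500 + 2 * 20 * 60  # 32,400 bytes


def bandwidth_bounds(gaps_s, train_bytes=TRAIN_BYTES, threshold=0.0):
    """Infer a bound on each link's available bandwidth from train lengths.

    gaps_s[i] is the measured train length (seconds) after hop i + 1.
    Returns (hop, kind, rate_bps) tuples for hops 2..n, where kind is
    "upper" (available bandwidth < rate_bps) or "lower" (>= rate_bps).
    """
    bounds = []
    for i in range(1, len(gaps_s)):
        # Rate at which the train arrives at hop i + 1, in bits per second.
        rate_in = train_bytes * 8 / gaps_s[i - 1]
        if gaps_s[i] > gaps_s[i - 1] * (1 + threshold):
            # The train was stretched: the link could not sustain rate_in.
            bounds.append((i + 1, "upper", rate_in))
        else:
            # The train kept its length: the link sustained at least rate_in.
            bounds.append((i + 1, "lower", rate_in))
    return bounds
```

For made-up gaps of 10 ms, 10 ms, and 20 ms after hops 1 to 3, `bandwidth_bounds([0.010, 0.010, 0.020])` reports a lower bound of about 25.9 Mbps for hop 2 and an upper bound of about 25.9 Mbps for hop 3, since the train was stretched only at hop 3.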
Pathneck Operation
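[Figure: an RPT with 4 measurement packets at each end (TTL 1 to 4 at the head, 4 to 1 at the tail) and 5 load packets (TTL 100) leaves source S and crosses routers R1, R2, and R3. Each router decrements every TTL by one and drops the two measurement packets whose TTL reaches 0; the source measures the gaps g1, g2, g3 between the two ICMP replies from each router.]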
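The TTL mechanics in the diagram can be reproduced with a few lines of code. This sketch tracks TTLs only, not timing, and assumes that every router decrements each TTL by one and drops (with an ICMP reply) any packet whose TTL reaches 0; `simulate_rpt` is a hypothetical helper, and load packets get TTL 100 as in the RPT layout.

```python
def simulate_rpt(n_measure, n_load, n_hops, load_ttl=100):
    """Track the TTLs of a recursive packet train across n_hops routers.

    Returns (icmp_hops, train): the hop that generated each ICMP reply,
    and the TTLs of the packets still in the train after the last hop.
    """
    # Head measurement packets get TTL 1..n, tail packets n..1.
    train = (list(range(1, n_measure + 1))
             + [load_ttl] * n_load
             + list(range(n_measure, 0, -1)))
    icmp_hops = []
    for hop in range(1, n_hops + 1):
        # Each router decrements every packet's TTL by one.
        train = [ttl - 1 for ttl in train]
        # Packets whose TTL hits 0 are dropped and trigger an ICMP reply.
        icmp_hops.extend([hop] * train.count(0))
        train = [ttl for ttl in train if ttl > 0]
    return icmp_hops, train
```

With 4 measurement packets per side, 5 load packets, and 3 hops, as in the diagram, `simulate_rpt(4, 5, 3)` returns ICMP sources `[1, 1, 2, 2, 3, 3]` (two replies per router) and the surviving train `[1, 97, 97, 97, 97, 97, 1]`.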
Pathneck Properties
Pathneck is an active probing tool designed for locating Internet bottlenecks
 It is efficient and effective
 It also provides route, delay, and bandwidth information
 For technical details, please see www.cs.cmu.edu/~hnn/pathneck
We improved Pathneck so that it also covers the last hop
 This allows us to measure the RTT and the access bandwidth of many end users.
Methodology
Measurement sources: 18 nodes in a large tier-1 ISP
 14 in the US, 3 in Europe, and 1 in East-Asia
 A large fraction of the paths traverse other ISPs
 They play the role of candidate replica sites
Measurement destinations: 164,130 IP addresses from different prefixes
 67,271 IPs correspond to real online hosts
 Firewalls and similar devices sometimes require us to use an intermediate node as a “virtual” destination
 They play the role of clients accessing the web
Results
Internet end user RTT distribution and
access bandwidth distribution
Optimization results
 For RTT
 For bandwidth
 For data transmission time
RTT Distribution
[Figure: RTT distributions measured from sources in Europe, US-NE, and East-Asia]
The RTT “views” of Internet clients from different
geographical locations are significantly different
Bandwidth Distribution
[Figure: bandwidth distributions measured from sources in East-Asia, Europe, and US-NE]
The bandwidth “views” from different locations are much more alike
End Access Bandwidth Distribution
[Figure: distribution of end-user access bandwidth, annotated “Limited by downstream bandwidth of measurement source”]
62.5% < 10 Mbps
50% < 4.2 Mbps
40% < 2.2 Mbps
Low access bandwidth still dominates among end
users
Bottleneck Location Distribution
75% of bottleneck links are at the last two hops
 There is little chance of avoiding these bottlenecks through replication
However, when access bandwidth is higher than 40 Mbps, content replication can help improve performance
Results
Internet end user RTT distribution and
access bandwidth distribution
Optimization results
 For RTT
 For bandwidth
 For data transmission time
Optimization Algorithm
We use a simple greedy algorithm to optimize the performance of our replication infrastructure
 At each step, select the replica node that has the largest marginal utility
The greedy algorithm has been shown to obtain results very close to optimal
 In our study, it is only 0.1% worse than the optimal result found by brute-force search
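A minimal sketch of greedy replica selection, alongside the brute-force search used for comparison. It assumes each client is served by whichever selected replica gives it the best value and that the objective is the mean over clients; `objective`, `greedy_select`, `brute_force_select`, and the `metric` matrix layout are hypothetical illustrations, not the study's code. Pass `minimize=False` for bandwidth, which should be maximized.

```python
from itertools import combinations


def objective(metric, nodes, minimize=True):
    """Mean over clients of the best value offered by any node in `nodes`.

    metric[j][c] is the performance of client c when served from node j
    (e.g. RTT to minimize, or bandwidth to maximize).
    """
    pick = min if minimize else max
    n_clients = len(metric[0])
    return sum(pick(metric[j][c] for j in nodes) for c in range(n_clients)) / n_clients


def greedy_select(metric, k, minimize=True):
    """Add, one at a time, the node that most improves the objective."""
    chosen = []
    for _ in range(min(k, len(metric))):
        candidates = [j for j in range(len(metric)) if j not in chosen]
        # Score each candidate by the objective with it added.
        scores = {j: objective(metric, chosen + [j], minimize) for j in candidates}
        best = (min if minimize else max)(candidates, key=scores.__getitem__)
        chosen.append(best)
    return chosen


def brute_force_select(metric, k, minimize=True):
    """Exhaustively search all k-node subsets (feasible only for small inputs)."""
    pick = min if minimize else max
    best = pick(combinations(range(len(metric)), k),
                key=lambda nodes: objective(metric, nodes, minimize))
    return list(best)
```

On the toy RTT matrix `[[10, 10, 100, 100], [100, 100, 10, 10], [50, 50, 50, 50]]`, `greedy_select` picks `[2, 0]` (mean 30) while `brute_force_select` finds `[0, 1]` (mean 10). The matrix is contrived to show that greedy is a heuristic; the slide reports that on the measured data the gap was only 0.1%.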
RTT Optimization
[Figure: RTT optimization results, with labels US-Central, East-Asia, US-West, Europe, and US-East]
RTT optimization results have a clear geographical
pattern
The first 5 replicas provide most of the benefit
Marginal Utility of RTT Optimization
The first 5 nodes each give a significant improvement (i.e., larger than 5%)
[Marginal utility: the relative performance improvement contributed by a specific node]
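Following the definition above, marginal utility can be computed from the objective value after each greedy step. This sketch assumes utility is measured relative to the objective before the node was added; how the first replica's utility is defined is not stated on the slide, so the sketch starts at the second. `marginal_utilities` is a hypothetical helper.

```python
def marginal_utilities(values, minimize=True):
    """Relative improvement contributed by each added replica.

    values[i] is the objective (e.g. mean RTT) after the first i + 1
    replicas are selected; the result has one entry per replica after
    the first.
    """
    utilities = []
    for prev, cur in zip(values, values[1:]):
        # For RTT a decrease is an improvement; for bandwidth an increase is.
        gain = prev - cur if minimize else cur - prev
        utilities.append(gain / prev)
    return utilities
```

For made-up mean RTTs of 100, 80, 72, and 71 ms after 1 to 4 replicas, the utilities are 20%, 10%, and about 1.4%, so only the second and third replicas clear a 5% threshold.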
Bandwidth Optimization
The first 2 replicas provide most of the
benefit
Marginal Utility for Bandwidth Optimization
Only the first 2 (3) nodes give a significant improvement
For Well-provisioned Access Links
[Figure annotations: 74%, 35%, 54 Mbps]
Replication can indeed improve bandwidth performance for end users with access bandwidth larger than 40 Mbps
Data Transmission Time
An end user’s data transmission time depends on delay, bandwidth, and data size
We estimate data transmission time using a simplified TCP model: a slow-start phase followed by a congestion-avoidance phase
 Assumes no packet loss
 Slow start: transfer time is delay sensitive
 Congestion avoidance: transfer time is bandwidth sensitive
Data size determines whether replication should optimize delay or bandwidth
 Use the “slow-start size” as the crossover point
Results: 70% of paths have a slow-start size larger than 10 KB
 This is larger than the average web page
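The simplified TCP model above can be sketched as follows. All parameters are assumptions rather than values from the study: a 1460-byte MSS, an initial window of one segment, a window that doubles every RTT until it reaches the bandwidth-delay product, and then transfer at the full available bandwidth; connection setup and losses are ignored. `slow_start_size` and `transfer_time` are hypothetical names.

```python
# Hypothetical model parameters (assumptions, not from the study).
MSS_BYTES = 1460         # maximum segment size
INIT_CWND_SEGMENTS = 1   # initial congestion window


def slow_start_size(rtt_s, bw_bps, mss=MSS_BYTES, init_cwnd=INIT_CWND_SEGMENTS):
    """Bytes sent during slow start before the window reaches the BDP."""
    bdp_bytes = bw_bps * rtt_s / 8
    cwnd = init_cwnd * mss
    sent = 0
    while cwnd < bdp_bytes:
        sent += cwnd
        cwnd *= 2
    return sent


def transfer_time(size_bytes, rtt_s, bw_bps, mss=MSS_BYTES, init_cwnd=INIT_CWND_SEGMENTS):
    """Estimated seconds to deliver size_bytes, assuming no packet loss."""
    bdp_bytes = bw_bps * rtt_s / 8
    cwnd = init_cwnd * mss
    remaining = size_bytes
    elapsed = 0.0
    # Slow start: one window per RTT, doubling each round (delay sensitive).
    while remaining > 0 and cwnd < bdp_bytes:
        remaining -= cwnd
        elapsed += rtt_s
        cwnd *= 2
    # Afterwards the path is full: the rest drains at the available bandwidth.
    if remaining > 0:
        elapsed += remaining * 8 / bw_bps
    return elapsed
```

With a made-up 100 ms RTT and 8 Mbps of available bandwidth, the slow-start size is 185,420 bytes. A 10 KB transfer finishes in 3 RTTs (0.3 s), entirely within slow start, so it is delay sensitive; a 1 MB transfer spends 0.7 s in slow start and about 0.81 s at full bandwidth, so bandwidth dominates.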
Data Transmission Time (2)
The transmission times for 10KB, 100KB, 1MB and
10MB are 0.4s, 1.1s, 6.4s, and 59.2s, respectively
Related Work
Content replication with different optimization metrics
 Geographic location, network hops, and latency
 Retrieval, update, and storage costs
 QoS guarantees, …
Greedy algorithms used in replica selection
Conclusion
Quantify Internet end-node access-bandwidth distribution and bottleneck location distribution
Two differences distinguish bandwidth optimization from RTT optimization
 Geographic location is not important for
bandwidth optimization
 For throughput, only well-provisioned end
users can benefit from content replication