Symbiotic Routing in Future Data Centers

Hussam Abu-Libdeh (Cornell University)
Paolo Costa, Antony Rowstron, Greg O’Shea, Austin Donnelly (Microsoft Research Cambridge)
1
Data center networking
• Network principles evolved from Internet systems
• Multiple administrative domains
• Heterogeneous environment
• But data centers are different
• A single administrative domain
• Total control over all operational aspects
• Re-examine the network in this new setting
2
Rethinking DC networks
• New proposals for data center network architectures
• DCell, BCube, Fat-tree, VL2, PortLand …
• Target properties: TCO, scalability, bandwidth, fault tolerance, graceful degradation, performance isolation, modular design, commodity components
• But the network interface has not changed!
3
Challenge
• The network is a black box to applications
• Must infer network properties
• Locality, congestion, failures, etc.
• Little or no control over routing
• Applications are a black box to the network
• Must infer flow properties
• E.g. Traffic engineering/Hedera
• As a consequence
• Today’s data centers and proposals use a single routing protocol
• Routing trade-offs are made in an application-agnostic way
• E.g. latency, throughput, etc.
4
CamCube
• A new data center design
• Nodes are commodity x86 servers with local storage
• Container-based model: 1,500-2,500 servers
• Direct-connect 3D torus topology
• Six Ethernet ports / server
• Servers have (x,y,z) coordinates
• Defines coordinate space
[Figure: 3D torus with x, y, z axes; servers addressed by (x, y, z) coordinates, e.g. (0,2,0)]
• Simple 1-hop API
• Send/receive packets to/from 1-hop neighbours
• Not using TCP/IP
• Everything is a service
• Run on all servers
• Multi-hop routing is a service
• Simple link state protocol
• Route packets along shortest paths from source to destination
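To make the coordinate space and the routing service concrete, here is a minimal sketch of torus coordinates, the six 1-hop neighbours, and the wrap-around hop distance. This is not CamCube's actual code; the type name TorusCoord and the explicit side parameter are assumptions for illustration.

    using System;
    using System.Collections.Generic;

    // Hypothetical sketch of a 3D torus coordinate on a cube of side 'side'.
    struct TorusCoord
    {
        public readonly int X, Y, Z;
        public TorusCoord(int x, int y, int z) { X = x; Y = y; Z = z; }

        // Per-axis distance with wrap-around (the torus links close each ring).
        static int AxisDistance(int a, int b, int side)
        {
            int d = Math.Abs(a - b);
            return Math.Min(d, side - d);
        }

        // Total 1-hop distance between two servers on a side x side x side torus.
        public int DistanceTo(TorusCoord other, int side)
        {
            return AxisDistance(X, other.X, side)
                 + AxisDistance(Y, other.Y, side)
                 + AxisDistance(Z, other.Z, side);
        }

        // The six 1-hop neighbours reachable through the six Ethernet ports.
        public IEnumerable<TorusCoord> Neighbours(int side)
        {
            yield return new TorusCoord((X + 1) % side, Y, Z);
            yield return new TorusCoord((X + side - 1) % side, Y, Z);
            yield return new TorusCoord(X, (Y + 1) % side, Z);
            yield return new TorusCoord(X, (Y + side - 1) % side, Z);
            yield return new TorusCoord(X, Y, (Z + 1) % side);
            yield return new TorusCoord(X, Y, (Z + side - 1) % side);
        }
    }

A routing service that always forwards to a neighbour with smaller DistanceTo the destination follows shortest paths on the torus.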
5
Development experience
• Built many data center services on CamCube
• E.g.
• High-throughput transport service
• Desired property: high throughput
• Large-file multicast service
• Desired property: low link load
• Aggregation service
• Desired property: distribute computation load over servers
• Distributed object cache service
• Desired property: per-key caches, low path stretch
6
Per-service routing protocols
• Higher flexibility
• Services optimize for different objectives
• High-throughput transport → disjoint paths
• Increases throughput
• File multicast → non-disjoint paths
• Decreases network load
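As an illustration of how one direct-connect topology can offer both kinds of paths, the sketch below uses dimension-order routing with a permuted axis order: different orders tend to produce paths that avoid sharing links (useful for throughput), while a single fixed order makes flows overlap (useful for keeping multicast link load low). This is a hedged sketch of the idea, not the services' actual path-selection code; all names and the mechanism are assumptions.

    using System.Collections.Generic;

    // Illustrative sketch: walk from src to dst correcting one axis at a time,
    // in the given order, taking the shorter way round each ring.
    static class PathSketch
    {
        public static List<(int x, int y, int z)> DimensionOrderPath(
            (int x, int y, int z) src, (int x, int y, int z) dst,
            char[] axisOrder, int side)
        {
            var path = new List<(int x, int y, int z)> { src };
            var cur = src;
            foreach (char axis in axisOrder)      // e.g. {'x','y','z'} or {'z','y','x'}
            {
                while (Get(cur, axis) != Get(dst, axis))
                {
                    cur = Step(cur, axis, Get(dst, axis), side);
                    path.Add(cur);
                }
            }
            return path;
        }

        static int Get((int x, int y, int z) c, char axis) =>
            axis == 'x' ? c.x : axis == 'y' ? c.y : c.z;

        // Move one hop along 'axis' towards 'target', the shorter way round the ring.
        static (int x, int y, int z) Step((int x, int y, int z) c, char axis, int target, int side)
        {
            int cur = Get(c, axis);
            int forward = (target - cur + side) % side;            // hops going "up"
            int delta = forward <= side - forward ? 1 : side - 1;  // +1 or -1 (mod side)
            int next = (cur + delta) % side;
            return axis == 'x' ? (next, c.y, c.z)
                 : axis == 'y' ? (c.x, next, c.z)
                 : (c.x, c.y, next);
        }
    }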
7
What is the benefit?
• Prototype Testbed
• 27 servers, 3x3x3 CamCube
• Quad core, 4 GB RAM, six 1Gbps Ethernet ports
• Large-scale packet-level discrete event simulator
• 8,000 servers, 20x20x20 CamCube
• 1Gbps links
• Service code runs unmodified on cluster and simulator
8
Service-level benefits
• High throughput transport service
• 1 sender, 2,000 receivers
• Sequential iteration
• 10,000 packets/flow
• 1,500 bytes/packet
• Metric: throughput
• Shown: custom/base ratio
[Chart: CDF of flows vs. custom/base throughput ratio (0-5)]
9
Service-level benefits
• Large-file multicast service
• 8,000-server network
• 1 multicast group
• Group size: 0% → 100% of servers
• Metric: # of links in multicast tree
• Shown: custom/base ratio
[Chart: links reduction vs. number of servers in the group (%)]
10
Service-level benefits
• Distributed object cache service
• 8,000-server network
• 8,000,000 key-object pairs, evenly distributed among servers
• 1 primary + 8 replicas per key; replicas unpopulated initially
• 800,000 lookups (100 per server), keys picked from a Zipf distribution
• Metric: path length to nearest hit
[Chart: CDF of lookups vs. path length, custom routing vs. base routing]
11
Network impact
• Ran all services simultaneously
• No correlation in link usage
• Reduction in link utilization
[Charts: (1) fraction of links vs. services per link (0-4 services); (2) change in link utilization, custom/base packet ratio per service: Key-value Cache, Multicast, Fixed Path, Aggregation, High-Throughput Transport]
• Take-away: custom routing reduced network load and increased service-level performance
12
Symbiotic routing relations
• Multiple routing protocols running concurrently
• Routing state shared with base routing protocol
• Services
• Use one or more routing protocols
• Use base protocol to simplify their custom protocols
• Network failures
• Handled by base protocol
• Services route for common case
[Diagram: Services A, B, C, each with its own routing protocol (1, 2, 3), layered over the base routing protocol and the network]
13
Building a routing framework
• Simplify building custom routing protocols
• Routing:
• Build routes from set of intermediate points
• Coordinates in the coordinate space
[Diagram: F(packet, local coord) → next coord]
• Services provide forwarding function ‘F’
• Framework routes between intermediate points
• Use base routing service
• Consistently remap coordinate space on node failure
• Queuing:
• Services manage packet queues per link
• Fair queuing between services per link
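A minimal sketch of what such a framework's service-facing interface could look like; RoutingServiceBase, Forward, and BaseRouting.SendToward are hypothetical names, not the real CamCube API. The service only supplies F (the signature matches the cache-service F shown later); the framework picks an intermediate coordinate and lets the base protocol carry the packet to it.

    using System.Collections.Generic;

    // Hypothetical sketch of the routing framework's service-facing interface.
    public abstract class RoutingServiceBase
    {
        // Per-service forwarding function: given the packet and the key
        // (coordinate) this node currently acts for, return the next
        // intermediate key(s) the packet should visit.
        protected abstract List<ulong> F(int neighborIndex, ulong currentDestinationKey, Packet packet);

        // Framework side: pick an intermediate point and hand the packet to the
        // base (link-state, shortest-path) routing service to reach it.
        internal void Forward(int neighborIndex, ulong currentKey, Packet packet)
        {
            List<ulong> nextKeys = F(neighborIndex, currentKey, packet);
            if (nextKeys.Count == 0)
                return;                              // service consumed the packet locally
            ulong next = nextKeys[0];                // policy for choosing among ties omitted
            BaseRouting.SendToward(next, packet);    // base protocol handles failed links
        }
    }

    // Placeholder declarations so the sketch is self-contained.
    public class Packet { }
    public static class BaseRouting
    {
        public static void SendToward(ulong key, Packet packet) { /* shortest-path hop */ }
    }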
14
Example: cache service
• Distributed key-object caching
• Key-space mapped onto CamCube coordinate space
• Per-key caches
• Evenly distributed across coordinate space
• Cache coordinates easily computable from the key
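For illustration only, a hedged sketch of how "easily computable" cache coordinates might work: hash the key together with a cache index to get a deterministic coordinate that every server can compute locally. The hash choice and all names are assumptions, not CamCube's actual scheme.

    // Illustrative sketch: map a key plus a cache index deterministically onto a
    // coordinate in a side x side x side cube, with no lookup needed.
    static class CacheMapping
    {
        public static (int x, int y, int z) CacheCoord(ulong key, int cacheIndex, int side)
        {
            // Mix the key with the cache index so each cache lands elsewhere.
            ulong h = Mix(key ^ ((ulong)cacheIndex * 0x9E3779B97F4A7C15UL));
            int x = (int)(h % (ulong)side);
            int y = (int)((h / (ulong)side) % (ulong)side);
            int z = (int)((h / (ulong)(side * side)) % (ulong)side);
            return (x, y, z);
        }

        // Simple 64-bit mixer (splitmix64 finalizer); any good hash would do.
        static ulong Mix(ulong v)
        {
            v ^= v >> 30; v *= 0xBF58476D1CE4E5B9UL;
            v ^= v >> 27; v *= 0x94D049BB133111EBUL;
            v ^= v >> 31;
            return v;
        }
    }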
15
Cache service routing
• Routing
• Source → nearest cache or primary
• On cache miss: cache → primary
• Populate cache: primary → cache
• F function computed at
• Source
• Cache
• Primary
• Different packets can use different links
• Accommodate network conditions
• E.g. congestion
[Diagram: lookup path from source/querier to nearest cache to primary server]
16
Handling failures
• On link failure
• Base protocol routes around failure
• On replica server failure
• Key space consistently remapped
by framework
• F function does not change
• Developer only targets common case
• Framework handles corner cases
[Diagram: source/querier, nearest cache, primary server, with the base protocol routing around a failure]
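A hedged sketch of what consistent remapping could look like: if the server owning a coordinate has failed, every node deterministically agrees on the nearest live server instead, so the service's F function is unchanged. The names and the tie-break rule are assumptions, not the framework's actual algorithm.

    using System.Collections.Generic;
    using System.Linq;

    // Illustrative sketch of consistent remapping on server failure.
    static class Remap
    {
        public static (int x, int y, int z) LiveOwner(
            (int x, int y, int z) target,
            ISet<(int x, int y, int z)> failed,
            IEnumerable<(int x, int y, int z)> allServers,
            int side)
        {
            if (!failed.Contains(target))
                return target;                       // common case: owner is alive
            // Corner case handled by the framework, not by the service:
            // pick the closest live server, with a deterministic tie-break.
            return allServers
                .Where(s => !failed.Contains(s))
                .OrderBy(s => Distance(s, target, side))
                .ThenBy(s => s)
                .First();
        }

        static int Distance((int x, int y, int z) a, (int x, int y, int z) b, int side)
        {
            int d(int p, int q) { int v = System.Math.Abs(p - q); return System.Math.Min(v, side - v); }
            return d(a.x, b.x) + d(a.y, b.y) + d(a.z, b.z);
        }
    }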
17
Cache service F function
protected override List<ulong> F(int neighborIndex, ulong currentDestinationKey, Packet packet)
{
    List<ulong> nextKeys = new List<ulong>();
    // extract packet details
    ulong itemKey = LookupPacket.GetItemKey(packet);
    ulong sourceKey = LookupPacket.GetSourceKey(packet);
    if (currentDestinationKey == sourceKey) // am I the source?
    {
        // if at source, route to nearest cache or primary
        // get the list of caches (using KeyValueStore static method)
        ulong[] cachesKey = ServiceKeyValueStore.GetCaches(itemKey);
        // iterate over all cache nodes and keep the closest ones
        int minDistance = int.MaxValue;
        foreach (ulong cacheKey in cachesKey)
        {
            int distance = node.nodeid.DistanceTo(LongKeyToKeyCoord(cacheKey));
            if (distance < minDistance)
            {
                nextKeys.Clear();
                nextKeys.Add(cacheKey);
                minDistance = distance;
            }
            else if (distance == minDistance)
            {
                nextKeys.Add(cacheKey);
            }
        }
    }
    else if (currentDestinationKey != itemKey) // am I the cache?
    {
        // if cache miss, route to primary
        nextKeys.Add(itemKey);
    }
    return nextKeys;
}
18
Framework overhead
• Benchmark performance
• Single server in testbed
• Communicate with all six 1-hop neighbors (Tx + Rx)
• Sustained 11.8 Gbps throughput
• Out of an upper bound of 12 Gbps
• User-space routing overhead
[Chart: CPU utilization (%), Baseline vs. Framework]
19
What have we done
• Services only specify a routing “skeleton”
• Framework fills in the details
• Control messages and failures handled by framework
• Reduce routing complexity for services
• Opt-in basis
• Services define custom protocols only if they need to
20
Network requirements
• Per-service routing not limited to CamCube
• Network need only provide:
• Path diversity
• Provide routing options
• Topology awareness
• Expose server locality and connectivity
• Programmable components
• Allow per-service routing logic
21
Conclusions
• Data center networking from the developer’s perspective
• Custom routing protocols to optimize for application-level
performance requirements
• Presented a framework for custom routing protocols
• Applications specify a forwarding function (F) and queuing hints
• Framework manages network state, control messages, and
remapping on failure
• Multiple routing protocols running concurrently
• Increase application-level performance
• Decrease network load
22
Thank You!
Questions?
hussam@cs.cornell.edu
23
Cache service
Insert throughput
[Chart: insert throughput (Gbps) vs. concurrent insert requests (0-140), for F=3 and F=27, with and without disk; annotations: "Ingress bandwidth bounded (3 front-ends)", "Disk I/O bounded"]
24
Cache service
Lookup requests/second
[Chart: lookup rate (reqs/s) vs. concurrent lookup requests (0-140), for F=3 and F=27; annotation: "Ingress bandwidth bounded"]
25
Cache service
CPU Utilization on FEs
[Chart: CPU utilization (%) on front-ends vs. concurrent requests (0-140): lookup and insert (no disk), F=3 (3 front-ends) and F=27 (27 front-ends)]
26
CamCube link latency
[Chart: round-trip time (microseconds) for 1,500-byte and 9,000-byte packets: UDP (x-cable), CamCube (1 hop), UDP (switch), TCP (x-cable), TCP (switch)]
27