Defense - Northwestern Networks Group

A Scalable, Commodity Data Center
Network Architecture
Mohammad Al-Fares, Alexander
Loukissas, Amin Vahdat
Presented by Gregory Peaker and Tyler Maclean
Overview
• Structure and properties of a data center
• Desired properties in a DC architecture
• Fat-tree based solution
• Evaluation of the fat-tree approach
Common data center topology
Problems with common DC topology
• Single point of failure
• Oversubscription of links higher up in the
topology
– Typical oversubscription is between 2.5:1 and 8:1
– Trade-off between cost and provisioning
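As a small illustration of how the oversubscription ratio is computed (the switch port counts below are example numbers chosen for this sketch, not figures from the paper):

```python
# Sketch: oversubscription ratio of a single aggregation switch
# (example port counts, not from the paper).
def oversubscription(host_ports: int, uplink_ports: int, port_gbps: float = 1.0) -> float:
    # Ratio of bandwidth hosts can offer to bandwidth available upward.
    return (host_ports * port_gbps) / (uplink_ports * port_gbps)

ratio = oversubscription(48, 8)  # 48 GigE host ports sharing 8 GigE uplinks
# ratio == 6.0, i.e. 6:1 -- inside the typical 2.5:1 to 8:1 range
```

At a 6:1 ratio, hosts can use only one sixth of their NIC bandwidth when all of them communicate through the uplinks at once.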
Properties of solutions
• Compatible with Ethernet and TCP/IP
• Cost effective
– Low power consumption & heat emission
– Cheap infrastructure
– Commodity hardware
• Allows host communication at line speed
– Oversubscription ratio of 1:1
Cost of maintaining switches
Review of Layer 2 & Layer 3
• Layer 2
– Data Link Layer
• Ethernet
• MAC address
– One spanning tree for entire network
• Prevents looping
• Ignores alternate paths
• Layer 3
– Network Layer
• IP (TCP runs above it, at layer 4)
– Shortest-path routing between source and destination
– Best-effort delivery
Fat-Tree Based Solution
• Connect end hosts together using a fat-tree
topology
– Infrastructure consists of cheap commodity devices
• Every port is the same speed
– All devices can transmit at line speed if packets
are distributed along existing paths
– A k-port fat tree can support k³/4 hosts
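The sizing formulas from the paper can be sketched in a few lines (the dictionary layout is just this example's choice):

```python
# Sketch of the fat-tree sizing formulas from the paper: a fat tree
# built from k-port switches has k pods, (k/2)^2 core switches,
# k^2 pod switches (k/2 edge + k/2 aggregation per pod), and
# supports k^3/4 end hosts.
def fat_tree_stats(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "core_switches": (k // 2) ** 2,
        "pod_switches": k * k,
        "hosts": k ** 3 // 4,
    }
```

For example, 48-port commodity switches yield `fat_tree_stats(48)["hosts"] == 27648`, the 27k-host cluster the paper highlights.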
Fat-Tree Topology
Problems with a vanilla Fat-tree
• Layer 3 will only use one of the existing equal-cost
paths
• Packet reordering occurs if layer 3 blindly
takes advantage of path diversity
– Creates overhead at the host, as TCP must reorder the
packets
Fat-Tree Modified
• Enforce a special addressing scheme in the DC
– Allows hosts attached to the same switch to route only
through that switch
– Allows intra-pod traffic to stay within the pod
– Format: unused.PodNumber.SwitchNumber.EndHost
• Use two-level lookups to distribute traffic and
maintain packet ordering.
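The paper's concrete scheme uses 10 as the unused leading octet: pod switches get 10.pod.switch.1 and hosts get 10.pod.switch.ID with ID in [2, k/2 + 1]. A small sketch enumerating one pod's host addresses under that scheme:

```python
# Sketch of the 10.pod.switch.ID addressing from the paper.
def pod_host_addresses(k: int, pod: int) -> list[str]:
    # Edge switches in a pod are numbered 0..k/2-1; each serves k/2
    # hosts with IDs 2..k/2+1 (ID 1 belongs to the switch itself).
    return [
        f"10.{pod}.{switch}.{host_id}"
        for switch in range(k // 2)
        for host_id in range(2, k // 2 + 2)
    ]
```

For k = 4, pod 1 holds the four hosts `10.1.0.2`, `10.1.0.3`, `10.1.1.2`, `10.1.1.3`.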
Two-Level Lookups
• First level is a prefix lookup
– Used to route down the topology to the end host
• Second level is a suffix lookup
– Used to route up toward the core
– Diffuses and spreads out traffic
– Maintains packet ordering by using the same ports
for the same end host
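The two levels can be sketched as follows; the table entries are illustrative ones for an aggregation switch in pod 2 of a k = 4 fat tree, using the paper's 10.pod.switch.host convention:

```python
import ipaddress

# Illustrative two-level table (a simplification of the paper's design).
PREFIXES = [
    ("10.2.0.0/24", 0),  # route *down* to the subnet of edge switch 0
    ("10.2.1.0/24", 1),  # route *down* to the subnet of edge switch 1
]
SUFFIXES = [
    (2, 2),  # host IDs ending in .2 route *up* on port 2
    (3, 3),  # host IDs ending in .3 route *up* on port 3
]

def lookup(dst: str) -> int:
    addr = ipaddress.ip_address(dst)
    # Level 1: prefix match handles intra-pod (downward) traffic.
    for prefix, port in PREFIXES:
        if addr in ipaddress.ip_network(prefix):
            return port
    # Level 2: suffix match on the host ID spreads inter-pod (upward)
    # traffic across core links, while keeping all traffic for one
    # destination host on one port, preserving packet ordering.
    host_id = int(dst.rsplit(".", 1)[1])
    for suffix, port in SUFFIXES:
        if host_id == suffix:
            return port
    raise LookupError(f"no route for {dst}")
```

Intra-pod destinations such as `10.2.0.5` hit a prefix and go down on port 0; inter-pod destinations such as `10.3.1.2` miss every prefix and are spread over the up-ports by their host ID suffix.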
Diffusion Optimizations
• Flow classification
– Eliminates local congestion
– Assigns traffic to ports on a per-flow basis
instead of a per-host basis
• Flow scheduling
– Eliminates global congestion
– Prevents long-lived flows from sharing the same
links
– Assigns long-lived flows to different links
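Per-flow port assignment can be sketched with a hash of the flow's 4-tuple; this is a static simplification for illustration, whereas the paper's flow classifier additionally rebalances port assignments over time:

```python
import hashlib

# Sketch: assign packets to up-ports per *flow* rather than per host
# (static hashing variant; the paper's classifier is dynamic).
def assign_up_port(src: str, dst: str, sport: int, dport: int, n_up_ports: int) -> int:
    key = f"{src}:{sport}->{dst}:{dport}".encode()
    # All packets of one flow hash to the same up-port, preserving
    # ordering, while distinct flows spread across the available ports.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_up_ports
```

Because the hash is deterministic, a given TCP flow always takes the same path, so no in-flow reordering is introduced.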
Results: Heat & Power Consumption
Implementation
• NetFPGA:
• 4 Gigabit ports, 36 Mb SRAM
• 64 MB DDR2, 3 Gb/s SATA port
• Implemented elements in Click modular router software
• Two-level table
• Initialized with preconfigured information
• Flow classifier
• Distributes output evenly across local ports
• Flow reporter + flow scheduler
• Communicates with a central scheduler
Evaluation
• Purpose: measure bisection bandwidth
• Fat-tree: 10 machines connected to a 48-port switch
• Hierarchical: 8 machines connected to a 48-port switch
Results
Related Work
• Myrinet – popular for cluster-based supercomputers
• Benefit: low latency
• Cost: proprietary; host responsible for load balancing
• InfiniBand – used in high-performance computing
environments
• Benefit: proven to scale, with high bandwidth
• Cost: imposes its own layer 1-4 protocols
• Uses a fat-tree topology
• Many massively parallel computers, such as Thinking
Machines and SGI systems, use fat trees
Conclusion
• The Good: cost per gigabit and energy per gigabit are
going down
• The Bad: data centers are growing faster than
commodity Ethernet devices
• Our fat-tree solution
• Is better: a 27k-node cluster using 10 GigE is technically
infeasible with a traditional hierarchy, which would cost $690M even if built
• Is faster: equal or higher bandwidth in tests
• Increases fault tolerance
• Is cheaper: 20k hosts cost $37M for hierarchical and
$8.67M for fat-tree (1 GigE)
• KO's the competing data center designs
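The 1 GigE cost comparison above can be checked with simple arithmetic on the slide's own numbers:

```python
# Worked check of the slide's 1 GigE cost comparison for 20k hosts.
hosts = 20_000
hierarchical_cost = 37_000_000  # $37M (hierarchical, from the slide)
fat_tree_cost = 8_670_000       # $8.67M (fat tree, from the slide)

per_host_hier = hierarchical_cost / hosts            # $1850 per host
per_host_fat = fat_tree_cost / hosts                 # $433.50 per host
savings = 1 - fat_tree_cost / hierarchical_cost      # ~77% cheaper
```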