Data Center Networks Jennifer Rexford COS 461: Computer Networks

advertisement
Data Center Networks
Jennifer Rexford
COS 461: Computer Networks
Lectures: MW 10-10:50am in Architecture N101
http://www.cs.princeton.edu/courses/archive/spr12/cos461/
Networking Case Studies
Data Center
Enterprise
Backbone
Cellular
Wireless
2
Cloud Computing
3
Cloud Computing
• Elastic resources
– Expand and contract resources
– Pay-per-use
– Infrastructure on demand
• Multi-tenancy
– Multiple independent users
– Security and resource isolation
– Amortize the cost of the (shared) infrastructure
• Flexible service management
4
Cloud Service Models
• Software as a Service
– Provider licenses applications to users as a service
– E.g., customer relationship management, e-mail, ..
– Avoid costs of installation, maintenance, patches, …
• Platform as a Service
– Provider offers platform for building applications
– E.g., Google’s App-Engine
– Avoid worrying about scalability of platform
5
Cloud Service Models
• Infrastructure as a Service
– Provider offers raw computing, storage, and
network
– E.g., Amazon’s Elastic Computing Cloud (EC2)
– Avoid buying servers and estimating resource needs
6
Enabling Technology: Virtualization
• Multiple virtual machines on one physical machine
• Applications run unmodified as on real machine
• VM can migrate from one computer to another
7
Multi-Tier Applications
• Applications consist of tasks
– Many separate components
– Running on different machines
• Commodity computers
– Many general-purpose computers
– Not one big mainframe
– Easier scaling
Multi-Tier Applications
Front end
Server
Aggregator
Aggregator
Aggregator
… …
Aggregator
…
Worker
9
Worker
…
Worker
Worker
Worker
Data Center Network
10
Virtual Switch in Server
11
Top-of-Rack Architecture
• Rack of servers
– Commodity servers
– And top-of-rack switch
• Modular design
– Preconfigured racks
– Power, network, and
storage cabling
12
Aggregate to the Next Level
13
Modularity, Modularity, Modularity
• Containers
• Many containers
14
Data Center Network Topology
Internet
CR
S
AR
AR
S
S
S
S
…
~ 1,000 servers/pod
15
CR
...
S
…
AR
AR
...
•
•
•
•
Key
CR = Core Router
AR = Access Router
S = Ethernet Switch
A = Rack of app. servers
Capacity Mismatch
CR
CR
~ 200:1
AR
AR
AR
AR
S
S
S
S
S
S
~ 40:1
S
~ S5:1
…
16
S
S
…
...
S
…
S
…
Data-Center Routing
Internet
CR
DC-Layer 3
DC-Layer 2
SS
…
AR
AR
SS
SS
S
S
SS
CR
...
SS
…
~ 1,000 servers/pod == IP subnet
17
AR
AR
...
•
•
•
•
Key
CR = Core Router (L3)
AR = Access Router (L3)
S = Ethernet Switch (L2)
A = Rack of app. servers
Reminder: Layer 2 vs. Layer 3
• Ethernet switching (layer 2)
– Cheaper switch equipment
– Fixed addresses and auto-configuration
– Seamless mobility, migration, and failover
• IP routing (layer 3)
– Scalability through hierarchical addressing
– Efficiency through shortest-path routing
– Multipath routing through equal-cost multipath
• So, like in enterprises…
18
– Connect layer-2 islands by IP routers
Case Study: Performance
Diagnosis in Data Centers
http://www.eecs.berkeley.edu/~minl
anyu/writeup/nsdi11.pdf
19
Applications Inside Data Centers
….
….
Front end Aggregator
Server
….
….
Workers
20
Challenges of Datacenter Diagnosis
• Multi-tier applications
– Hundreds of application components
– Tens of thousands of servers
• Evolving applications
– Add new features, fix bugs
– Change components while app is still in operation
• Human factors
– Developers may not understand network well
– Nagle’s algorithm, delayed ACK, etc.
21
Diagnosing in Today’s Data Center
App logs:
#Reqs/sec
Response time
1% req. >200ms
delay
Host
App
OS
SNAP:
Diagnose net-app interactions
Packet trace:
Filter out trace for
long delay req.
Packet
sniffer
Switch logs:
#bytes/pkts per minute
22
Problems of Different Logs
App logs:
Application-specific
Packet trace:
Too expensive
Host
App
OS
SNAP:
Generic, fine-grained, and
lightweight
Runs everywhere, all the time
Packet
sniffer
Switch logs:
Too coarse-grained
23
TCP Statistics
• Instantaneous snapshots
– #Bytes in the send buffer
– Congestion window size, receiver window size
– Snapshots based on random sampling
• Cumulative counters
– #FastRetrans, #Timeout
– RTT estimation: #SampleRTT, #SumRTT
– RwinLimitTime
– Calculate difference between two polls
24
Identifying Performance Problems
Sender App
– Not any other problems
Send Buffer
– Send buffer is almost full
Network
Receiver
– #Fast retransmission
– #Timeout
– RwinLimitTime
– Delayed ACK
Sampling
Direct
measure
Inference
diff(SumRTT)/diff(SampleRTT) > MaxDelay
25
SNAP Architecture
At each host for every connection
Collect
data
Direct access to OS
-Polling per-connection statistics:
• Snapshots (#bytes in send buffer)
• Cumulative counters (#FastRestrans)
-Adaptive tuning of polling rate
26
SNAP Architecture
At each host for every connection
Collect
data
Performance
Classifier
Classifying based on the life of data transfer
-Algorithms for detecting performance problems
-Based on direct measurement in the OS
27
SNAP Architecture
At each host for every connection
Collect
data
Performance
Classifier
Crossconnection
correlation
Direct access to data center configurations
-Input
• Topology, routing information
• Mapping from connections to processes/apps
-Correlate problems across connections
• Sharing the same switch/link, app code 28
SNAP Deployment
• Production data center
– 8K machines, 700 applications
– Ran SNAP for a week, collected petabytes of data
• Identified 15 major performance problems
– Operators: Characterize key problems in data center
– Developers: Quickly pinpoint problems in app
software, network stack, and their interactions
29
Characterizing Perf. Limitations
#Apps that are limited
for > 50% of the time
Sender
App
551
Apps
– Bottlenecked by CPU, disk, etc.
– Slow due to app design (small writes)
Send
Buffer
1
App
– Send buffer not large enough
Networ
k
6
Apps
– Fast retransmission
– Timeout
Receiver
8 Apps
– Not reading fast enough (CPU, disk, etc.)
144
30
Apps – Not ACKing fast enough (Delayed ACK)
Delayed ACK
• Delayed ACK caused significant problems
– Delayed ACK was used to reduce bandwidth usage
and server interruption
B
A
B has data
to send
….
Delayed ACK
should be
disabled in data
centers
B doesn’t have
data to send
200 ms
31
Diagnosing Delayed ACK with SNAP
• Monitor at the right place
– Scalable, low overhead data collection at all hosts
• Algorithms to identify performance problems
– Identify delayed ACK with OS information
• Correlate problems across connections
– Identify the apps with significant delayed ACK issues
• Fix the problem with operators and developers
– Disable delayed ACK in data centers
32
Conclusion
• Cloud computing
– Major trend in IT industry
– Today’s equivalent of factories
• Data center networking
– Regular topologies interconnecting VMs
– Mix of Ethernet and IP networking
• Modular, multi-tier applications
– New ways of building applications
– New performance challenges
33
Load Balancing
34
Load Balancers
• Spread load over server replicas
– Present a single public address (VIP) for a service
– Direct each request to a server replica
10.10.10.1
Virtual IP (VIP)
192.121.10.1
10.10.10.2
10.10.10.3
35
Wide-Area Network
Servers
Router
DNS
Server
DNS-based
site
36
selection
Data
Centers
Servers
Router
Internet
Clients
Wide-Area Network: Ingress Proxies
Data
Centers
Servers
Router
Proxy
Servers
Router
Proxy
Clients
37
Download