Data Center
Scale Computing
If computers of the kind I have advocated become the computers
of the future, then computing may someday be organized as a
public utility just as the telephone system is a public utility. . . .
The computer utility could become the basis of a new and
important industry.
John McCarthy
MIT centennial celebration (1961)
Presentation by:
Ken Bakke
Samantha Orogvany
John Greene
Outline
● Introduction
● Data Center System Components
● Design and Storage Considerations
● Data Center Power Supply
● Data Center Cooling
● Data Center Failures and Fault Tolerance
● Data Center Repairs
● Current Challenges
● Current Research and Trends
● Conclusion
Data Center vs. Warehouse-Scale Computer
Data center
• Provide colocated equipment
• Consolidate heterogeneous computers
• Serve a wide variety of customers
• Binaries typically run on a small number of computers
• Resources are partitioned and separately managed
• Facility and computing resources are designed separately
• Share security, environmental, and maintenance resources
Warehouse-scale computer
• Designed to run massive internet applications
• Individual applications run on thousands of computers
• Homogeneous hardware and system software
• Central management for a common resource pool
• The design of the facility and the computer hardware is integrated
Need for Warehouse-scale Computers
• Renewed focus on client-side consumption of web resources
• Constantly increasing numbers of web users
• Constantly expanding amounts of information
• Desire for rapid response for end users
• Focus on cost reduction while delivering massive applications
• Increased interest in Infrastructure as a Service (IaaS)
Performance and Availability Techniques
• Replication
• Reed-Solomon codes
• Sharding
• Load balancing
• Health checking
• Application-specific compression
• Eventual consistency
• Centralized control
• Canaries
• Redundant execution and tail tolerance
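To make the last item concrete, here is a minimal Python sketch of redundant execution via hedged requests; the replica names, the 10 ms hedge delay, and the simulated RPC are illustrative assumptions, not details from the slides.

```python
# Minimal sketch of redundant execution ("hedged requests"): if the primary
# replica is slow, send the same request to a backup and take the first reply.
import concurrent.futures
import random
import time

REPLICAS = ["server-a", "server-b"]   # hypothetical replicas holding the same data
HEDGE_DELAY = 0.010                   # send a backup request after 10 ms

def query(replica: str, key: str) -> str:
    """Stand-in for an RPC; occasionally exhibits a long tail."""
    time.sleep(random.choice([0.002, 0.002, 0.002, 0.200]))
    return f"{key}@{replica}"

def hedged_query(key: str) -> str:
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(query, REPLICAS[0], key)
        try:
            return first.result(timeout=HEDGE_DELAY)       # fast path: primary answered
        except concurrent.futures.TimeoutError:
            second = pool.submit(query, REPLICAS[1], key)  # hedge: ask the backup too
            done, _ = concurrent.futures.wait(
                [first, second],
                return_when=concurrent.futures.FIRST_COMPLETED)
            # A production version would also cancel the losing request.
            return next(iter(done)).result()

print(hedged_query("user:42"))
```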
Major system components
• Typical server: 4 CPUs with 8 dual-threaded cores each, yielding 32 cores per server
• Typical rack: 40 servers plus a 1 or 10 Gbps Ethernet switch
• Cluster: a cluster switch plus 16-64 racks
A cluster may therefore contain tens of thousands of processing threads.
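A quick back-of-the-envelope check of those figures, assuming two hardware threads per core (as the "dual-threaded" cores above imply):

```python
# Rough check of the cluster sizes quoted above (assumes 2 hardware threads per core).
cores_per_server   = 4 * 8                    # 4 CPUs x 8 cores each
threads_per_server = cores_per_server * 2     # dual-threaded cores
threads_per_rack   = threads_per_server * 40  # 40 servers per rack

for racks in (16, 64):                        # the 16-64 rack range above
    print(racks, "racks ->", threads_per_rack * racks, "hardware threads")
# 16 racks -> 40,960 threads; 64 racks -> 163,840 threads.
```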
Low-end Server vs SMP
• Communication latency is roughly 1000x lower within an SMP than between low-end servers
• The SMP advantage shrinks for applications too large to fit on a single server
Figure: Performance advantage of a cluster built with large SMP server nodes (128-core SMP) over a cluster with the same number of processor cores built with low-end server nodes (four-core SMP), for clusters of varying size.
Brawny vs. Wimpy
Advantages of wimpy computers
• Multicore CPUs carry a cost premium of 2-5x versus multiple smaller CPUs
• Memory- and I/O-bound applications do not take advantage of faster CPUs
• Slower CPUs are more power efficient
Disadvantages of wimpy computers
• Increasing parallelism is programmatically difficult
• Programming costs increase
• Networking requirements increase
• Fewer, smaller tasks create load-balancing difficulties
• Amdahl's law limits the achievable speedup
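A small sketch of that last point: Amdahl's law caps the speedup from adding wimpy cores whenever part of the work is serial. The 10% serial fraction below is purely illustrative.

```python
# Amdahl's law: with serial fraction s, speedup on n cores is 1 / (s + (1 - s)/n).
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With 10% serial work (an assumed figure), more cores give diminishing returns:
for n in (4, 16, 64, 256):
    print(n, "cores ->", round(amdahl_speedup(0.10, n), 1), "x speedup")
# 4 -> 3.1x, 16 -> 6.4x, 64 -> 8.8x, 256 -> 9.7x (never above 10x).
```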
Design Considerations
• Software design and improvements can be made to align with architectural choices
• Resource requirements and utilization can be balanced among all applications
  o Spare CPU cycles can be used for compute-intensive applications
  o Spare storage can be used for archival purposes
• Fungible resources are more efficient
• Workloads can be distributed to fully utilize servers
• Focus on cost-effectiveness: smart programmers may be able to restructure algorithms to match a less expensive design
Storage Considerations
Private Data
• Local DRAM, SSD, or disk
Shared State Data
• High throughput for thousands of users
• Robust performance, tolerant of errors
• Unstructured storage (Google GFS)
  o A master plus thousands of “chunk” servers
  o Utilizes every system with a disk drive
  o Cross-machine replication
  o Robust performance, tolerant of errors
• Structured storage
  o Bigtable provides a (row, key, timestamp) mapping to a byte array
  o Trade-offs favor high performance and massive availability
  o The eventual consistency model leaves applications managing consistency issues
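A toy illustration of the structured-storage model described above, i.e., a (row, column, timestamp) to byte-array mapping kept in memory. This is a hypothetical stand-in for exposition, not the Bigtable API.

```python
# Toy version of the Bigtable data model: (row, column, timestamp) -> bytes.
# In-memory only; Bigtable itself shards rows into tablets across many servers.
import time
from collections import defaultdict
from typing import Optional

class TinyTable:
    def __init__(self) -> None:
        # row key -> column key -> list of (timestamp, value), newest first
        self._cells = defaultdict(lambda: defaultdict(list))

    def put(self, row: str, column: str, value: bytes,
            ts: Optional[float] = None) -> None:
        versions = self._cells[row][column]
        versions.append((ts if ts is not None else time.time(), value))
        versions.sort(key=lambda tv: tv[0], reverse=True)   # newest version first

    def get(self, row: str, column: str) -> Optional[bytes]:
        versions = self._cells[row][column]
        return versions[0][1] if versions else None

t = TinyTable()
t.put("com.cnn.www", "contents:html", b"<html>...</html>")
print(t.get("com.cnn.www", "contents:html"))
```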
Google File System
WSC Network Architecture
Leaf Bandwidth
• Bandwidth between servers in common rack
• Typically managed with a commodity switch
• Easily increased by increasing number of ports or speed of ports
Bisection Bandwidth
• Bandwidth between the two halves of a cluster
• Matching leaf bandwidth requires as many uplinks into the fabric as there are links within a rack
• Since distances are longer, optical interfaces are required
Three-Stage Topology
A three-stage topology is required to maintain the same throughput as a single switch.
Network Design
• Oversubscription ratios of 4-10 are common (see the sketch below)
• Limit network cost per server
• Offloading to special networks
• Centralized management
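To see what an oversubscription ratio means for one machine, here is a small sketch using the 40-server, 10 Gbps rack described earlier; the specific ratios are the 4-10 range quoted above.

```python
# Per-server bandwidth leaving the rack under different oversubscription ratios.
servers_per_rack = 40      # from the rack description earlier
link_gbps = 10             # server-to-switch link speed

for ratio in (1, 4, 10):
    uplink_gbps = servers_per_rack * link_gbps / ratio   # total uplink capacity
    per_server_gbps = uplink_gbps / servers_per_rack     # worst-case share per server
    print(f"oversubscription {ratio}:1 -> {per_server_gbps:.1f} Gbps per server off-rack")
# 1:1 -> 10.0 Gbps, 4:1 -> 2.5 Gbps, 10:1 -> 1.0 Gbps.
```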
Service level response times
• If individual servers exceed 1 s at their 99th, 99.9th, or 99.99th percentile, the fraction of user requests slower than 1 s grows with the number of servers each request must touch
• Selective replication is one mitigating strategy
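The arithmetic behind that concern: if each server independently exceeds 1 s with probability p, a request that must touch n servers is slow with probability 1 - (1 - p)^n. The 100-server fan-out below is an illustrative assumption.

```python
# Tail latency at scale: fraction of fan-out requests that exceed 1 s.
def slow_request_fraction(p_slow: float, n_servers: int) -> float:
    return 1.0 - (1.0 - p_slow) ** n_servers

for p in (0.01, 0.001, 0.0001):            # 99th, 99.9th, 99.99th percentile > 1 s
    print(f"p={p}: {slow_request_fraction(p, 100):.1%} of 100-server requests > 1 s")
# Roughly 63%, 9.5%, and 1.0% respectively.
```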
Power Supply Distribution
Uninterruptible Power Systems
● A transfer switch chooses the active power input, from either the utility or a generator
● After a power failure, the transfer switch detects the generator and, after 10-15 seconds, switches load to it
● The UPS contains energy storage to bridge the gap between the utility failure and the generators reaching full load
● It also conditions the incoming feed, removing spikes and sags from the AC feed
Example of Power Distribution Units
Traditional PDU
• Takes in the power output from the UPS
• Regulates the power with transformers and distributes it to the servers
• Typically handles 75-225 kW
• Provides redundancy by switching between two power sources
Examples of Power Distribution
Facebook’s power distribution system
• Designed to increase power efficiency by reducing energy loss to about 15%
• Eliminates the UPS and PDU and adds an on-board 12 V battery for each cabinet
Power Supply Cooling Needs
Air Flow Considerations
• Fresh-air cooling
  o “Opening the windows”
• Closed-loop systems
  o Underfloor systems
  o Servers sit on raised concrete tile floors
Power Cooling Systems
2-Loop Systems
• Loop 1: hot-air/cool-air circuit (red/blue arrows)
• Loop 2: liquid supply to the Computer Room Air Conditioning (CRAC) units and heat discharge
Example of Cooling System Design
3 - Loop System
• The chiller sends cooled water to the CRACs
• Heated water is sent from the building to the chiller for heat dispersal
• A condenser water loop flows into the cooling tower
Cooling System for Google
Estimated Annual Costs
Estimated Carbon Costs for Power
Based on local utility power generated from oil, natural gas, coal, or renewable sources, including hydroelectricity, solar, wind, and biofuels.
Power Efficiency
Sources of Efficiency Loss
• Overhead of cooling systems, such as chillers
• Air movement
• IT equipment
• Power distribution units
Improvements to Efficiency
• Handle air flow more carefully: keep the cooling path short and keep hot air from the servers separate from the rest of the system
• Consider raising cooling temperatures
• Employ “free cooling” by locating the datacenter in a cooler climate
• Select more efficient power systems
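The standard way to quantify the losses in the left-hand column is power usage effectiveness (PUE), the ratio of total facility power to IT equipment power. The numbers below are illustrative assumptions, not measurements from this presentation.

```python
# PUE = total facility power / IT equipment power (1.0 would be a perfect facility).
it_power_kw     = 1000.0   # servers, storage, networking gear (assumed)
cooling_kw      = 400.0    # chillers, CRACs, air movement (assumed)
distribution_kw = 100.0    # UPS, PDU, and wiring losses (assumed)

pue = (it_power_kw + cooling_kw + distribution_kw) / it_power_kw
print(f"PUE = {pue:.2f}")  # 1.50: each watt of computing needs 0.5 W of overhead
```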
Data Center Failures
Reliability of a Data Center
There is a trade-off between the cost of failures (and of repairing them) and the cost of preventing them.
Fault Tolerance
• Traditional servers require a high degree of reliability and redundancy to prevent failures as much as possible
• For warehouse-scale computers, this is not practical
  o Example: a cluster of 10,000 servers will average about one server failure per day
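A rough check of that example, assuming an annual per-server failure rate of about 4% (an illustrative figure, not one given in the slides):

```python
# Expected daily server failures in a 10,000-server cluster.
servers = 10_000
annual_failure_rate = 0.04                 # assumed per-server rate

failures_per_day = servers * annual_failure_rate / 365
print(f"{failures_per_day:.1f} expected server failures per day")   # ~1.1
```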
Data Center Failures
Fault Severity Categories
• Corrupted
  o Data is lost, corrupted, or cannot be regenerated
• Unreachable
  o Service is down
• Degraded
  o Service is available, but limited
• Masked
  o Faults occur, but fault tolerance hides them from the user
Data Center Fault Causes
Causes
• Software errors
• Faulty configurations
• Human error
• Networking faults
• Faulty hardware
It is easier to tolerate known hardware issues than software bugs or human error.
Repairs
• It is not critical to repair individual servers quickly
• In reality, repairs are scheduled as a “daily sweep”
• Individual failures mostly do not affect overall data center health
• The system is designed to tolerate faults
Google Restarts and Downtime
Relatively New Class of Computers
• Facebook founded in 2004
• Google’s Modular Data Center in 2005
• Microsoft’s Online Services Division in 2005
• Amazon Web Services in 2006
• Netflix added streaming in 2007
Balanced System
• The nature of the workload at this scale is:
  o Large volume
  o Large variety
  o Distributed
• This means no servers (or parts of servers) get to slack while others do the work
• Keep servers busy to amortize cost
• Need high performance from all components!
Imbalanced Parts
• Latency lags bandwidth
Imbalanced Parts
• CPUs have been the historical focus
Focus Needs to Shift
• The push toward SaaS will highlight these disparities
• Requires concentrating research on:
  o Improving non-CPU components
  o Improving responsiveness
  o Improving the end-to-end experience
Why does latency matter?
• Responsiveness is dictated by latency
• Productivity is affected by responsiveness
Real Estate Considerations
• Land
• Power
• Cooling
• Taxes
• Population
• Disasters
Google’s Data Centers
Economical Efficiency
• The data center building is a non-trivial cost
  o And that does not include land
• Servers are the bigger cost
  o More servers are desirable
  o Busy servers are desirable
Improving Efficiency
• Better components
  o Energy proportional (less use == less energy)
• Power-saving modes
  o Transparent (e.g., clock gating)
  o Active (e.g., CPU throttling)
  o Inactive (e.g., idle drives stop spinning)
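A minimal sketch of what energy proportionality means: with a linear power model, a server that idles at half of peak power still burns most of that peak at typical utilization. The idle and peak wattages are assumptions for illustration.

```python
# Linear server power model: power = idle + (peak - idle) * utilization.
def server_power_w(utilization: float,
                   idle_w: float = 100.0, peak_w: float = 200.0) -> float:
    return idle_w + (peak_w - idle_w) * utilization

for u in (0.0, 0.3, 1.0):
    w = server_power_w(u)
    print(f"utilization {u:.0%}: {w:.0f} W ({w / server_power_w(1.0):.0%} of peak)")
# At 30% utilization (a common operating point) this server still draws 65% of
# peak power, which is why energy-proportional components matter.
```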
Changing Workloads
• Workloads are more agile in nature
• SaaS enables shorter release cycles
  o Office 365 updates several times per year
  o Some Google services update weekly
• Even major software gets rewritten
  o The Google search engine has been rewritten from scratch 4 times
• Internet services are still young
  o Usage can be unpredictable
YouTube
• Started in 2005
• Fifth most popular site within its first year
Adapting
• Strike a balance between the need to deploy quickly and longevity
  o Need it fast and good
• Design to make software easy to create
  o Easier to find programmers
• Redesign when warranted
  o Google Search’s rewrites removed inefficiencies
  o Contrast with Intel’s backwards compatibility spanning decades
Future Trends
● Continued emphasis on:
  ○ Parallelism
  ○ Networking, both within and to/from datacenters
  ○ Reliability via redundancy
  ○ Optimizing efficiency (energy proportionality)
● Environmental impact
● Energy costs
● Amdahl’s law will remain a major factor
● Need increased focus on end-to-end systems
● Computing as a utility?
“Anyone can build a fast CPU. The trick is to build a fast system.”
-Seymour Cray