Introduction

CIT 470: Advanced Network and System Administration
Data Centers
Topics
Data Center: A facility for housing a large amount of computer or communications equipment.
1. Racks
2. Power
3. PUE
4. Cooling
5. Containers
6. Economics
Google DC in The Dalles
Located near a 3.1 GW hydroelectric power station on the Columbia River
Google DC in The Dalles
Inside a Data Center
Inside a Container Data Center
A data center is composed of:
• A physically safe and secure space
• Racks that hold computer, network, and storage devices
• Electric power sufficient to operate the installed devices
• Cooling to keep the devices within their operating temperature ranges
• Network connectivity throughout the data center and to places beyond
Data Center Components
Data Center Tiers
See http://uptimeinstitute.org/ for more details about tiers.
Racks: The Skeleton of the DC
• 19” rack standard
– EIA-310D
– Also specified by other standards (e.g., IEC 60297).
• NEBS 21” racks
– Telecom equipment.
• 2-post or 4-post
• Air circulation (fans)
• Cable management
• Doors or open
Rack Units – 1U = 1.75 inches (44.45 mm) of vertical rack space.
Rack Sizes
http://www.gtweb.net/rackframe.html
Rack Purposes
Organize equipment
– Increase density with vertical stacking.
Cooling
– Internal airflow in the rack cools servers.
– Data center airflow is determined by the arrangement of racks.
Wiring Organization
– Cable guides keep cables within racks.
Rack Power Infrastructure
• Different power sockets can be on different circuits.
• Individual outlet control (power cycling).
• Current monitoring and alarms.
• Network managed (web or SNMP).
Rack-Mount Servers
1U
4U
Blade Servers
Buying a Rack
Buy the right size
– Space for servers.
– Power, patch panels, etc.
Be sure it fits your servers.
– Appropriate mounting rails.
– Shelves for non-rack servers.
Environment options
– Locking front and back doors
– Sufficient power and cooling
– Power/environment monitors
– Console if needed
Space
Aisles
Wide enough to move equipment.
Separate hot and cold aisles.
Hot spots
Result from poor air flow.
Servers in a hot spot can overheat even when the average room temperature is low.
Work space
A place for SAs to work on servers.
Desk space, tools, etc.
Capacity
Room to grow.
Data Center Power Distribution
http://www.42u.com/power/data-center-
UPS (Uninterruptible Power Supply)
Provides emergency power when utility fails
– Most use batteries to store power
Conditions power, removing voltage spikes
Standby UPS
• Power will be briefly interrupted during switchover
• Computers may lock up or reboot during the interruption
• No power conditioning
• Short battery life
• Very inexpensive
http://myuninterruptiblepowersupply.com/toplogy.htm
Online UPS
• AC -> DC -> AC conversion design
• True uninterrupted power without switching
• Extremely good power conditioning
• Longer battery life
• Higher price
http://myuninterruptiblepowersupply.com/toplogy.htm
Power Distribution Unit (PDU)
Takes a high-voltage feed and divides it into many 110/120 V circuits that feed servers.
– Similar to the breaker panel in a house.
Estimating Per-Rack Power
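One way to sketch the per-rack estimate, assuming you simply sum each device's measured (or derated nameplate) draw and compare it against the rack circuit's continuous-load capacity. The device counts, wattages, and the 80% derating below are illustrative assumptions, not values from the slides.

```python
# Hypothetical per-rack power estimate (all numbers are illustrative).
devices = {
    "1U web servers": (20, 225),     # (count, average watts each)
    "4U database servers": (2, 450),
    "ToR switch": (1, 150),
}

total_w = sum(count * watts for count, watts in devices.values())

circuit_va = 208 * 30            # one 208 V, 30 A rack circuit
usable_va = circuit_va * 0.8     # keep continuous load at 80% of the rating

# Treating power factor as ~1, so watts and VA compare directly.
print(f"Estimated rack load: {total_w} W")
print(f"Usable circuit capacity: {usable_va:.0f} VA")
print("OK" if total_w <= usable_va else "Over capacity: split across more circuits")
```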
The Power Problem
• 4-year power cost = server purchase price.
• Upgrades may have to wait for electricity.
• Power is a major data center cost
– $5.8 billion for server power in 2005.
– $3.5 billion for server cooling in 2005.
– $20.5 billion for purchasing hardware in 2005.
Measuring Power Efficiency
PUE is the ratio of total building power to IT power; it measures the efficiency of the data center's building infrastructure.
SPUE is the ratio of total server input power to its useful power, where useful power is the power consumed by the CPU, DRAM, disks, motherboard, etc.
Excludes losses due to power supplies, fans, etc.
Computation efficiency depends on software and workload and measures useful work done per watt.
Power Usage Effectiveness (PUE)
PUE = Total data center power / IT equipment power
– PUE = 2 indicates that for each watt used to power IT equipment, another watt is used for HVAC, power distribution, etc.
– Approaches 1 as the data center becomes more efficient.
PUE variation
– Industry average > 2
– Microsoft = 1.22
– Google = 1.19
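As a quick illustration of the definition above (the kW figures are assumed, not from the slides):

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """PUE = total data center power / IT equipment power."""
    return total_facility_kw / it_kw

it_load_kw = 1000.0      # power drawn by servers, storage, and network gear
overhead_kw = 900.0      # cooling, UPS/PDU losses, lighting, etc.

print(pue(it_load_kw + overhead_kw, it_load_kw))  # 1.9 -> 0.9 W of overhead per IT watt
```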
Data Center Energy Usage
Sources of Efficiency Losses
UPS
– 88-94% efficiency
– Less if lightly loaded
PDU voltage transformation
– 0.5% or less
Cables from PDU to racks
– 1-3% depending on distance and cable type
Computer Room Air Conditioning (CRAC)
– Delivery of cool air over long distances uses fan power and increases air temperature
Cooling a Data Center
• Keep temperatures within 18-27 °C
• Cooling equipment rated in BTUs
– 1 kW = 3412 BTUH (1 W ≈ 3.412 BTUH)
– BTUH = British Thermal Units / Hour
• Keep humidity between 30-55%
– High = condensation
– Low = static shock
• Avoid hot/cold spots
– Can produce condensation
Computer Room Air Conditioning
• Large-scale, highly reliable air conditioning units from companies like Liebert.
• Cooling capacity measured in tons.
Waterworks for Data Center
Estimating Heat Load
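A minimal sketch of the heat-load arithmetic, assuming essentially every watt of electrical load becomes heat that the cooling plant must remove; the 120 kW IT load is an assumed example, while the BTU/h and tons-of-cooling conversion factors are standard.

```python
# Convert an electrical load into required cooling capacity.
BTUH_PER_KW = 3412        # 1 kW of load ~ 3412 BTU/h of heat
BTUH_PER_TON = 12000      # 1 ton of cooling = 12,000 BTU/h

it_load_kw = 120.0        # assumed IT load for this example
heat_btuh = it_load_kw * BTUH_PER_KW
tons = heat_btuh / BTUH_PER_TON

print(f"{heat_btuh:,.0f} BTU/h  ~=  {tons:.1f} tons of cooling")
```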
Hot-Cold Aisle Architecture
• Server air intake from cold aisles
• Server air exhaust into hot aisles
• Improve efficiency by reducing mixing of hot and cold air
Free Cooling
• Cooling towers dissipate heat by evaporating water, reducing or eliminating the need to run chillers
• Google Belgium DC uses 100% free cooling
Improving Cooling Efficiency
Air flow handling: Hot air exhausted by servers does not mix with cold air, and the path to the cooling coil is short, so little energy is spent moving air.
Elevated cold aisle temperatures: The cold aisle of containers is kept at 27 °C rather than 18-20 °C.
Use of free cooling: In moderate climates, cooling towers can eliminate the majority of chiller runtime.
Server PUE (SPUE)
Primary sources of inefficiency
– Power Supply Unit (PSU) (70-75% efficiency)
– Voltage Regulator Modules (VRMs)
• Can lose more than 30% power in conversion losses
– Cooling fans
• Software can reduce fan RPM when not needed
SPUE ratios of 1.6-1.8 are common today
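A rough sketch of how those loss sources turn into an SPUE number; the component wattages and efficiencies below are assumptions chosen to land in the cited 1.6-1.8 range, not measurements.

```python
# SPUE = total power entering the server / power delivered to useful components.
useful_w = 250.0          # CPU + DRAM + disks + motherboard (assumed)
vrm_loss_w = 45.0         # voltage regulator conversion losses (assumed)
fan_w = 20.0              # cooling fans (assumed)
psu_efficiency = 0.75     # wall AC -> internal DC conversion

dc_power_w = useful_w + vrm_loss_w + fan_w
wall_power_w = dc_power_w / psu_efficiency
spue = wall_power_w / useful_w

print(f"SPUE = {spue:.2f}")   # ~1.68 with these assumptions
```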
Power Supply Unit Efficiency
80 PLUS initiative to promote PSU efficiency
– 80+% efficiency at 20%, 50%, 100% of rated load
– Can be less than 80% efficient at idle power load
First 80 PLUS PSU shipped in 2005
Server Useful Power Consumption
Device                                 Power Usage
Intel Xeon W5590 3.33 GHz Quad Core    130 W
Intel Xeon E5430 2.66 GHz Quad Core    80 W
Intel Xeon E5502 2.13 GHz Dual Core    80 W
7200 RPM Hard Drive                    7 W
10,000 RPM Hard Drive                  14 W
15,000 RPM Hard Drive                  20 W
DDR2 DIMM                              1.65 W
Video Card                             20-120 W
The best method to determine power usage is to measure it
https://www.wattsupmeters.com/
Server Utilization ~10-50%
“The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007
It is surprisingly hard to achieve high levels of utilization of typical servers (and your home PC is even worse).
Figure 1. Average CPU utilization of more than 5,000 servers during a six-month period. Servers are rarely completely idle and seldom operate near their maximum utilization, instead operating most of the time at between 10 and 50 percent of their maximum utilization levels.
Server Power Usage Range: 50-100%
“The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007
Energy efficiency = Utilization / Power
Figure 2. Server power usage and energy efficiency at varying utilization levels, from idle to peak performance. Even an energy-efficient server still consumes about half its full power when doing virtually no work.
Server Utilization vs. Latency
[Figure: latency as a function of server utilization, from idle to 100%]
Improving Power Efficiency
Application consolidation
– Reduce the number of applications by eliminating old applications in favor of new ones that can serve the purpose of multiple old ones.
– Allows elimination of old app servers.
Server consolidation
– Use a single DB for multiple applications.
– Move light services like NTP onto shared boxes.
Use SAN storage
– Local disks are typically highly underused.
– Use a SAN so servers share a single storage pool.
Improving Power Efficiency
Virtualization
– Host services on VMs instead of on physical servers.
– Host multiple virtual servers on a single physical server.
Only-as-needed servers
– Power down servers when not in use.
– Works best with cloud computing.
Granular capacity planning
– Measure computing needs carefully.
– Buy minimal CPU, RAM, and disk configurations based on your capacity measurements and forecasts.
Containers
Data center in a shipping container.
– 4-10X normal data center density.
– 1000s of servers.
– 100s of kW of power.
Advantages
– Efficient cooling
– High server density
– Rapid deployment
– Scalability
Vendor offerings: http://www.datacentermap.com/blog/datacenter-container-55.html
Microsoft Chicago Data Center
Google Container Patents
Containers docked at central power spline
Vertical stack of containers
Container air flow diagram, with a center cold aisle and hot air return behind servers
Data Center Failure Events
Hardware Isn’t Reliable Enough
If servers are 99% reliable, then
a system with 10 servers is 0.99^10 ≈ 90% reliable
a system with 100 servers is 0.99^100 ≈ 37% reliable
[Chart: system reliability 0.99^n as the number of servers n grows from 1 to 100]
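The math behind the chart is just the product of independent per-server reliabilities; a short loop reproduces its endpoints.

```python
# System reliability when all n servers must be up and each is
# independently 99% reliable: R(n) = 0.99 ** n.
for n in (1, 10, 50, 100):
    print(f"{n:>4} servers: {0.99 ** n:.1%} reliable")
```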
Fault-Tolerant Architecture
Must use fault-tolerant software architecture
– Hardware must detect faults
– Hardware must notify software in a timely fashion
Fault-tolerant architecture reduces costs
– Choose the hardware reliability level that maximizes cost efficiency, not just reliability
Fault-tolerant architecture can improve performance
– Spreading processing and storage across many servers improves bandwidth and CPU capacity
Causes of Service Disruptions
Total Cost of Ownership (TCO)
TCO = Data Center Depreciation
+ Data Center Operating Expenses (Opex)
+ Server Depreciation
+ Server Operating Expenses (Opex)
Depreciation is the process of allocating the cost of an asset across the period during which it is used.
Example: server cost = $10,000, $0 residual value, annual depreciation over 4 years = $2,500
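A minimal sketch of the straight-line depreciation used in the example above (the $10,000 / 4-year figures are the slide's):

```python
def straight_line_depreciation(cost: float, residual: float, years: int) -> float:
    """Annual depreciation: spread (cost - residual value) evenly over the asset's life."""
    return (cost - residual) / years

# Slide example: $10,000 server, $0 residual value, 4-year life -> $2,500/year.
print(straight_line_depreciation(10_000, 0, 4))
```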
Cost to Build Data Center
• Primary components (power, cooling, space) scale roughly linearly with space.
• 80% of total construction cost goes to power + cooling
• Typical depreciation periods of 10-15 years
Operational Costs
Operational costs include
– Electricity
– Salaries for personnel
– Server maintenance contracts
– Software licenses
Larger data centers are cheaper
– Smaller number of sysadmins per server
– Fixed number of security guards
For a multi-MW data center, opex is typically $0.02-$0.08 per watt per month
Case Study
Tier 3 multi-MW data center
– Dell 2950 III EnergySmart servers (300 W, $6,000)
– Cost of electricity is 6.2¢/kWh
– Servers financed with 3-year loan @ 12%
– Cost of DC construction is $15/W, 12-yr lifetime
– DC opex is 4¢/W/month
– PUE = 2.0
– Server lifetime is 3 years
– Server maintenance is 5% of capex
– Server avg power = 75% of peak
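A rough sketch of a monthly per-server cost built from the parameters above. It assumes electricity is priced per kWh, DC construction and opex are per watt of provisioned (peak) power, and a standard loan-amortization formula for the server financing; it illustrates how the inputs combine rather than reproducing the slide's worked answer.

```python
# Rough monthly per-server TCO from the case-study parameters (units as assumed above).
server_price = 6000.0
peak_w = 300.0
avg_w = 0.75 * peak_w                  # 225 W average draw
pue = 2.0
elec_per_kwh = 0.062
dc_build_per_w = 15.0                  # $/W of provisioned power
dc_life_months = 12 * 12               # 12-year DC lifetime
dc_opex_per_w_month = 0.04
monthly_rate = 0.12 / 12               # 12% annual loan, monthly compounding
loan_months = 36

# Standard amortized loan payment for the server purchase.
server_month = server_price * monthly_rate / (1 - (1 + monthly_rate) ** -loan_months)
maint_month = 0.05 * server_price / 12                       # 5% of capex per year
power_month = (avg_w / 1000) * pue * 24 * 30 * elec_per_kwh  # facility power billed monthly
dc_depr_month = peak_w * dc_build_per_w / dc_life_months
dc_opex_month = peak_w * dc_opex_per_w_month

total = server_month + maint_month + power_month + dc_depr_month + dc_opex_month
print(f"~${total:.0f} per server per month")
```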
Key Points
Data center components
– Physically secure space
– Racks, the DC skeleton
– Power, including UPS and PDU
– Cooling
– Networking
Power efficiency (on average, 4 years of power costs ≈ server purchase price)
– PUE = Data center power / IT equipment power
– Most power in traditional DC goes to cooling, UPS
– SPUE = Server PUE; inefficiencies from PSU, VRM, fans
Cooling
– Heat load estimation
– Air flow control (hot/cold aisle architecture or containers)
– Higher cold air temperatures (27 °C vs. 20 °C)
– Free cooling (cooling towers)
TCO = DC depr + DC opex + Svr depr + Svr opex
References
1. Luiz André Barroso and Urs Hölzle, "The Case for Energy-Proportional Computing," IEEE Computer, Vol. 40, Issue 12, December 2007.
2. Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, 1st edition, Morgan and Claypool Publishers.
3. Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso, "Power Provisioning for a Warehouse-Sized Computer," ISCA '07: Proceedings of the 34th Annual International Symposium on Computer Architecture.
4. Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup, The Practice of System and Network Administration, Second Edition, Addison-Wesley Professional, 2007.
5. Evi Nemeth, Garth Snyder, Trent R. Hein, and Ben Whaley, UNIX and Linux System Administration Handbook, 4th edition, Prentice Hall, 2010.