Columbia University’s
Advanced Concept Data Center Pilot Project
April 30, 2010
Agenda
11:30am  Data Center Tour
12:00pm  Welcome and Opening Remarks (Bryan P. Berry, NYSERDA)
         Project Review (Alan Crosswell, Chief Technologist, CUIT; Richard Hall, Project Manager, CUIT)
1:00pm   Installation Summary of Monitoring and Measurement Tools (Ian Katz, Manager, CUIT Data Center)
1:45pm   Measurement Plan and Initial Results (Raj Bose and Peter Crosta, Research Computing Services, CUIT)
2:45pm   Closing Comments (Alan Crosswell, Chief Technologist, CUIT; Victoria Hamilton, Research Initiatives Coordinator)
3:00pm   Meeting Adjourned
2
Welcome and Opening Remarks
Bryan P. Berry, NYSERDA
3
Faculty House
• The recently renovated Faculty House has been awarded the prestigious Leadership in Energy and Environmental Design (LEED) Gold certification by the United States Green Building Council.
  http://www.environment.columbia.edu/blog/facultyhouseawardedleedgold
  – First LEED Gold certified building on the Morningside campus.
  – The first McKim, Mead & White building in the country to be given this designation.
• To learn about Columbia University’s other “green initiatives,” please visit Columbia University’s Environmental Stewardship webpage:
  http://www.environment.columbia.edu/initiatives
4
Advanced Concepts Data Center Pilot
Project - Review
Alan Crosswell, Assoc VP and Chief Technologist, CUIT
The opportunities
• Data centers consume 3% of all electricity in New York State (1.5%
nationally as of 2007). That’s 4.5 billion kWh annually.
• Use of IT systems especially for research high performance
computing (HPC) is growing.
• We need space for academic purposes such as wet labs, especially
in our constrained urban location.
• Columbia’s commitment to Mayor Bloomberg’s PlaNYC goal of a 30% carbon footprint reduction by 2017.
• NYS Gov. Paterson’s “15 by 15” goal of a 15% electrical demand reduction by 2015.
• The national Save Energy Now goal of a 25% reduction in energy intensity within 10 years.
6
Green data center best practices
1. Measure and validate
– You can’t manage what you don’t measure.
2. Power and cooling infrastructure efficiency
– Best Practices for Datacom Facility Energy Efficiency. ASHRAE
(ISBN 978-1-933742-27-4)
3. IT equipment efficiency
– Moore’s Law performance improvements
– Energy Star/EPEAT
– Server consolidation and virtualization
– BIOS, OS and Application tuning
7
Measuring infrastructure efficiency
• The most common measure is Power Use Effectiveness (PUE) or its reciprocal, Data Center infrastructure Efficiency (DCiE); a small worked sketch follows this list.

  PUE = [Total Datacenter Electrical Load] / [Datacenter IT Equip. Electrical Load]

• PUE only measures the efficiency of the electrical and cooling infrastructure.
• Chasing a good PUE can lead to bizarre results, since heavily loaded facilities usually use their cooling systems more efficiently.
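A minimal sketch, using illustrative numbers rather than the measured CU values, of how PUE and its reciprocal DCiE fall out of the two loads in the formula above:

```python
# PUE = total facility load / IT equipment load; DCiE is its reciprocal.
# The loads below are illustrative, not measured values.

def pue(total_kw: float, it_kw: float) -> float:
    """Power Use Effectiveness: total facility power over IT equipment power."""
    return total_kw / it_kw

def dcie(total_kw: float, it_kw: float) -> float:
    """Data Center infrastructure Efficiency: the reciprocal of PUE."""
    return it_kw / total_kw

if __name__ == "__main__":
    total, it = 500.0, 250.0                  # hypothetical loads in kW
    print(f"PUE  = {pue(total, it):.2f}")     # 2.00
    print(f"DCiE = {dcie(total, it):.0%}")    # 50%
```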
8
LBNL Average PUE for 12 Data Centers
Power Use Effectiveness (PUE) = 2.17
9
Making the server slice bigger, the pie smaller, and greener
• Reduce the PUE ratio by improving electrical & mechanical
efficiency.
– Google claims a PUE of 1.2
• Consolidate data centers (server rooms)
– Claimed more efficient when larger (prove it!)
– Free up valuable space for wet labs, offices, classrooms.
• Reduce the overall IT load through
– Server efficiency (newer, more efficient hardware)
– Server consolidation & sharing
• Virtualization
• Shared research clusters
• Move servers to a zero-carbon data center
10
Data center electrical best practices
• 95% efficient 480V room UPS
– Basement UPS room vs. wasting 40% of rack space
– Flywheels or batteries?
• 480V distribution to PDUs at ends of rack rows
– Transformed to 208/120V at PDU
– Reduces copper needed, transmission losses
• 208V power to servers vs. 120V
– More efficient (how much?)
• Variable Frequency Drives (VFDs) for cooling fans and pumps
  – Motor power consumption increases as the cube of the speed (see the sketch after this list).
• Generator backup
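A minimal sketch of the cube-law relationship cited above for VFD-driven fans and pumps; the speed settings are illustrative, not measured values:

```python
# Fan/pump affinity law: power scales roughly with the cube of shaft speed,
# so a modest VFD slowdown yields a large power saving.

def relative_power(speed_fraction: float) -> float:
    """Power draw relative to full speed under the cube-law approximation."""
    return speed_fraction ** 3

if __name__ == "__main__":
    for pct in (100, 90, 80, 70):
        frac = pct / 100
        print(f"{pct:>3}% speed -> ~{relative_power(frac):.0%} of full power")
    # e.g. 80% speed -> ~51% of full power, 70% -> ~34%
```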
11
Data center mechanical best practices
• Air flow – reduce mixing, increase delta-T
– Hot/cold or double hot aisle separation
– 24-36” under floor plenum
– Plug up leaks in floor and in racks (blanking panels)
– Duct CRAC returns to an overhead plenum if possible
– Perform CFD modeling
• Alternative cooling technique: In-row or in-rack cooling
– Reduces or eliminates hot/cold air mixing
– More efficient transfer of heat (how much?)
– Supports much higher power density
– Water-cooled servers are making a comeback
12
Measuring IT systems efficiency
• A complementary measure to PUE is the amount of useful work
being performed by the IT equipment. What should the metric be?
• MIPS per kWh?
• kilobits per MWh (an early NSFNet node benchmark :-)
• SPECpower benchmark
13
• Highly recommended reading
• Surveys 261 higher ed institutions
• Cites Columbia’s environmental sustainability efforts and NYSERDA’s role
14
Our NYSERDA project
• New York State Energy Research & Development Authority is a public benefit
corporation funded by NYS electric utility customers. http://www.nyserda.org
• Columbia competed for and was awarded an “Advanced Concepts Datacenter
demonstration project”. 18 months starting April 2009
• ~ $1.2M ($447K Direct costs from NYSERDA)
• Goals:
  – Learn about and test some industry best practices in a “real world” datacenter
  – Measure and verify claimed energy efficiency improvements.
  – Share lessons learned with our peers.
15
External Visiting Committee
• Laurie Kerr, Senior Policy Advisor for Energy and Green Buildings, NYC Mayor’s Office of Long-Term Planning and Sustainability
• Vace Kundakci, Assistant Vice President for Information Technology and Chief Information Officer, City College of New York/City University of New York
• Timothy Lance, President and Board Chair, New York State Education and Research Network (NYSERNet)
• Marilyn McMillan, Associate Provost and Chief Information Technology Officer, New York University
Role:
The External Visiting Committee is charged with maintaining skepticism about our plans and our
accomplishments from the perspective of external institutions. To the extent we demonstrate
feasibility and potential impact, they will serve as powerful ambassadors to other institutions.
16
Internal Advisory Group
• Wilmouth A. Elmes, Associate Vice President of Engineering/Technical Services, Manhattanville Development Project
• Doug McKean, Director, Capital Project Management
• Arthur M. Langer, School of Engineering and Applied Science, Senior Director of the Center for Technology, Innovation, and Community Engagement and Faculty & Associate Director, Executive Masters of Science in Technology Management at the School of Continuing Education
• Nilda Mesa, Assistant Vice President, Environmental Stewardship
• Scott W. Norum, Chief Administrative Officer for Arts and Sciences and Vice President, Office of the Vice President for Arts and Sciences
• Leonard Peters, School of Business, Associate Dean and Chief Information Officer – Information Technology
Role:
The Internal Advisory Group serves two primary functions. Not only do we anticipate they will prove
a valuable sounding board for the execution of the pilot (implementation guidance/expert judgment),
but we expect them to pose the hard questions about issues surrounding scale-up (feedback).
17
Research Faculty User Group
• Liam Paninski, Associate Professor, Dept. of Statistics
• Kathryn V. Johnston, Associate Professor, Dept. of Astronomy
• Mary Putman, Clare Boothe Luce Associate Professor of Astronomy
• Greg L. Bryan, Assistant Professor, Dept. of Astronomy
Role:
Responsible for vetting the results of the combined HPC cluster for the Department of Astronomy
and the Department of Statistics
18
Working Group
• Co-Principal Investigators:
  – Alan Crosswell, Associate Vice President and Chief Technologist, CUIT
  – Victoria Hamilton, Director of the Office for Research Initiatives (ORI)
• Richard Hall, Project Manager
• Anthony Cirillo, AVP Systems and Operations
• Peter Miner, Senior Director, Systems Engineering
• Ian Katz, Data Center Facilities Manager
• Melissa Metz, Director of Systems Engineering
• Fran Ovios, Director of Systems Engineering
• Stew Feuerstein, Director and Chief Architect
• Joseph Rini, Director of Network Engineering
• Megan Andersen, Research Systems Administrator
• Ed McArthur, CU Facilities Project Manager
• Raj Bose, Manager, Research Computing Services
• Don Lanini, Senior Systems Engineer
Role:
Day-to-day responsibility for the planning, execution, and measurement required by the proposal. The Columbia University in-house team will perform much of the work.
19
CU Data Center Improvement Program
• Begun with an assessment and recommendation performed by Bruns-Pak, Inc. in 2009 ($50K).
• CUF Operations HVAC study by Horizon Engineering.
• Generator overload mitigation study by Rowland Engineering ($32K).
• JB&B, Gensler & Structuretone developed a master plan ($152K) based on the Bruns-Pak work, which was used to develop:
  – DOE ARRA grant application for HVAC improvements (not awarded).
  – NIH ARRA grant application for electrical improvements (awarded 4/15/10, $10M Core Research Computing Facility).
  – Will be used for future funding opportunities.
  – Informing portions of the NYSERDA project (chilled racks).
20
Advanced Concepts Data Center
Pilot Project - Scope Review
Richard Hall, Project Manager, CUIT
21
Scope of Work Review
Milestone No.   Description
1               Project Approval
2               Inventory Servers
3               Instrument Power
4               Instrument Cooling
5               Develop Datacenter Profile
6               Implement 9 Racks
7               Replace 30 Servers
8               Compare Clusters
9               Implement Server Power Management
10              Increase Chilled Water Set Point 5 Degrees
12              Project Review
13              Project Close
22
Scope of Work – Detail
To achieve the overall project objectives, CUIT has broken down the project into major tasks.
• Inventory (task 2) – COMPLETE
  – Create detailed physical inventory of existing in-scope servers
• Instrument server power consumption (task 3) – COMPLETE
  – Install network-monitored power meters for each server
  – Perform data collection at 5-min intervals
• Instrument server input air temperature and overall DC chilled water (task 4) – COMPLETE
  – Install server input ambient air temperature sensors for each server
  – Install BTU metering on data center supply and return lines
  – Perform data collection at 5-min intervals
• Establish overall Data Center profile (task 5) – September 2010
  – Utilize equipment load results to establish baselines
  – Develop PUE ratio for the entire data center & inventoried servers
• Implement 9 racks of high power density in-row cooling (task 6) – September 2010
  – Install 9 server racks outfitted for high power density
  – Provide in-row cooling subsystem for the 9 server racks
23
Scope of Work Detail Cont’d
• Replace 30 “old” servers and measure efficiency improvement (task 7) – September 2010
  – Consolidate the replacement servers into high density racks and re-implement the same IT services
  – Take measurements of before-and-after power consumption
  – Document expected and actual efficiency improvement
• Compare old and new research clusters (task 8) – COMPLETE
  – Benchmark applications on the new Astronomy/Statistics HPC cluster
• Implement server power management (task 9) – September 2010
  – Install server BIOS/high-level power management feature upgrades to servers (identified in task 2)
• Increase chilled water set point and measure (task 10) – May 2010
  – Document measured before-and-after energy consumption
• Communicate results (task 11) – On-going
  – Share results with key stakeholders
24
Installation Summary of Monitoring
and Measurement Tools
Ian Katz, Data Center Facilities Manager, CUIT
25
Progress so far…
• Installed power meters throughout the Data Center
  – Established overall data center power usage: ~290 kW
• Installed metered PDUs and plugged in inventoried hosts
  – Ready for idle benchmarking and service group power measurements
• Installed chilled water flow meters
  – Established overall data center heat load: ~120 tons
• Estimated CU Data Center PUE (Power Use Effectiveness)
• Other Data Center improvements
26
Selected Metering Products
• Power Panel Metering
– WattNode Meter
– Babel Buster SPX (ModBus to SNMP translator)
• Server Level Metering
– Raritan PDU
• Chilled Water Metering
– Flexim – Fluxus ADM 7407
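Since the Babel Buster SPX translates the WattNode Modbus readings to SNMP, and the Raritan PDUs are network-monitored, meter data can be polled programmatically. Below is a minimal sketch of such a poll; the hostname, community string, and OID are placeholders rather than the project's actual values, and it simply shells out to net-snmp's snmpget:

```python
# Minimal sketch of polling a power reading over SNMP at the project's
# 5-minute collection interval. All identifiers below are placeholders.
import subprocess
import time

HOST = "pdu-example.cc.columbia.edu"   # placeholder hostname
COMMUNITY = "public"                   # placeholder community string
POWER_OID = "1.3.6.1.4.1.99999.1.1.0"  # placeholder OID for active power (W)

def read_power() -> str:
    """Run net-snmp's snmpget and return its raw output for the power OID."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, HOST, POWER_OID],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), read_power())
        time.sleep(300)  # 5-minute interval, as in tasks 3 and 4
```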
27
Power Meter Installation
• Installed WattNodes in 20 power panels
• 17 panels in the Data Center
• 3 main feeder panels in the Mechanical Room
  – ATS 2 & 3 – HVAC load
  – ATS 4 – IT load
• 290 kW IT load read from PP 1, 2, 3, 4, 5, 6, 16, 26, 27
• 120 kW HVAC load read from ATS 2 & 3
28
Chilled Water Meter Installation
• Flexim meters installed in the Mechanical Room
• Sensors installed to measure flow rate and temperature
• Result is heat flow rate in tons (see the sketch below):
  – HF (tons) = Vol Flow (gpm) × ∆T (°F) / 24
• Sensors installed in 3 locations:
  – Liebert CRACs 1 – 6
  – AC 1 & 2
  – Dry Coolers
• Meters tied into the same Modbus network as the WattNodes
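A minimal sketch of the heat-flow formula above; the flow and delta-T values are illustrative, chosen only to reproduce the ~120 ton figure reported earlier:

```python
# One ton of cooling is 12,000 BTU/h, and water carries roughly 500 BTU/h
# per gpm per degree F, so tons ≈ gpm * deltaT / 24.

def heat_flow_tons(flow_gpm: float, delta_t_f: float) -> float:
    """Chilled-water heat flow in tons from volumetric flow and delta-T."""
    return flow_gpm * delta_t_f / 24.0

if __name__ == "__main__":
    # e.g. 288 gpm at a 10 degree F delta-T is about 120 tons.
    print(f"{heat_flow_tons(288, 10):.0f} tons")
```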
29
Server Level Metering
• Meter many different hardware types with Raritan PDUs
  – Sun: Netra T1, V100, V210, V240, 280R, V880, T2000
  – HP: DL360G4p, DL360G5p, DL380G5
• 28 servers identified to:
  – Establish an active/idle benchmark
  – Investigate service usage comparisons
• Blade chassis (HP c7000) and blade servers (HP BL460c) metered with built-in tools.
30
[Diagram: metering locations. WattNode meters on the campus-level power panels and the main IT power feed (ATS 4); Flexim meters on the chilled water pipes in the Mechanical Room (100 level); Raritan Power Distribution Units (PDUs) and Uninterruptible Power Supplies (UPSs) at the server racks and CRAC units in the Data Center (200 level).]
CU Data Center PUE Pie Chart
Power Use Effectiveness = 2.15

Component                         Load (kW)   Share
Servers (IT load)                 247         47%
HVAC chilled water                120         23%
HVAC fans, pumps & compressors    114         21%
UPS overhead                      44          8%
Lighting                          5           1%
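The reported PUE can be reproduced directly from the loads above; a quick arithmetic sketch:

```python
# Reproduce the PUE figure from the pie-chart loads (all in kW, as reported
# on this slide).

loads_kw = {
    "Servers (IT load)": 247,
    "HVAC chilled water": 120,
    "HVAC fans, pumps & compressors": 114,
    "UPS overhead": 44,
    "Lighting": 5,
}

total_kw = sum(loads_kw.values())        # 530 kW
it_kw = loads_kw["Servers (IT load)"]    # 247 kW

print(f"Total facility load: {total_kw} kW")
print(f"PUE = {total_kw / it_kw:.2f}")   # 530 / 247 = 2.15
```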
32
More Data Center Improvements
• Overhead cable trays
  – Will allow us to clean up under the raised floor
• New Data Center layout
  – Hot aisle / cold aisle format
• *Future* – Duct CRAC units and use the ceiling as a plenum
  – To return hot air from the hot aisles to the CRACs
33
Measurement Plan and Initial
Results
Raj Bose and Peter Crosta, Research Computing Services, CUIT
34
Overview: Tasks 7, 8, 9
• Task 7: Comparing power consumption of old and new(er) hardware
• Task 8: High performance computing (HPC) cluster power consumption comparison
• Task 9: Power management and tuning
35
Task 7: Out with the old, in with the new
• If we replace old servers with new servers, how will power consumption change?
  [Photos: IBM 7090 in the University Computer Center, 1966; Microsoft’s Chicago data center, 2009]
36
Task 7: Power measurement plan
• Inventory servers
• Determine comparison groups
• Two-tiered power measurement approach:
  1) Pre/post migration comparison
  2) SPECpower benchmark
37
Task 7: Pre/post migration comparisons
• Power consumption of the same IT services on different hardware, measured before and after migration from an old server to a new server.
• E.g.:
  – Service: E-mail; Hardware: old; Time: 1 week; Energy: 68 kWh
  – Service: E-mail; Hardware: new; Time: 1 week; Energy: ?? kWh
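A minimal sketch, assuming evenly spaced 5-minute power samples like those collected in tasks 3 and 4, of how a week of readings turns into the kWh figures compared above; the sample values are made up:

```python
# Energy is approximated as average power times elapsed time, summed over
# evenly spaced 5-minute samples (the collection interval used in task 3).
# The sample values below are made up, not measured readings.

SAMPLE_INTERVAL_HOURS = 5 / 60  # 5-minute polling interval

def energy_kwh(samples_watts: list[float]) -> float:
    """Approximate energy in kWh from a series of power samples in watts."""
    watt_hours = sum(w * SAMPLE_INTERVAL_HOURS for w in samples_watts)
    return watt_hours / 1000.0

if __name__ == "__main__":
    # One week of a steady ~400 W draw: 7 days * 24 h * 12 samples/hour.
    week_of_samples = [400.0] * (7 * 24 * 12)
    print(f"{energy_kwh(week_of_samples):.1f} kWh")  # about 67.2 kWh
```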
38
Task 7: SPECpower benchmark
• Industry-standard benchmark to evaluate performance and power
• Addresses the performance of server-side Java
• Finds maximum ssj_ops (server-side Java operations per second)
• With simultaneous power measurement, allows calculation of ssj_ops / watt (performance per watt)
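This is not the SPEC harness itself, just a minimal sketch of the ssj_ops-per-watt arithmetic it reports; the throughput and power pairs below are made up:

```python
# SPECpower_ssj2008 reports performance per watt: the overall score is the
# sum of ssj_ops across load levels divided by the sum of average power,
# including active idle. The numbers below are made up for illustration.

# (target load, ssj_ops achieved, average power in watts)
measurements = [
    (1.00, 120_000, 250.0),
    (0.50,  60_000, 180.0),
    (0.10,  12_000, 140.0),
    (0.00,       0, 130.0),   # active idle
]

total_ops = sum(ops for _, ops, _ in measurements)
total_watts = sum(watts for _, _, watts in measurements)
print(f"Overall ssj_ops/watt: {total_ops / total_watts:.1f}")

for load, ops, watts in measurements:
    print(f"load {load:4.0%}: {ops / watts:6.1f} ssj_ops per watt")
```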
39
Task 7: SPECpower results – HP DL360G4p
• On some systems, power consumption decreases as load decreases.
40
Task 7: SPECpower results – HP DL360G4p
41
Task 7: SPECpower results – Sun Sunfire V880
• On other systems, power consumption is constant regardless of load.
42
Task 7: Summary
• Designed a plan to measure old and new server power consumption in multiple ways:
  – Energy consumed while running the same IT services
  – Performance per watt of power used
• Pre-migration measurement period is complete; awaiting migrations and follow-up measurements
• Benchmarking has begun and is nearing completion
43
Task 8: Cluster comparison
• Can a new, larger research cluster be more energy efficient than an older, smaller research cluster?
  [Photos: the Beehive and Hotfoot clusters]
44
Task 8: The clusters
Beehive
• Built in 2005
• 16 cores
• 8 servers
• Dual-core 2.2 GHz AMD Opteron
• 2 to 8 GB RAM
• 10 TB SATA storage
• OpenPBS scheduler
• Theoretical peak GFlops: 61.6
• Idle power: 2.7 kW

Hotfoot
• Built in 2009
• 256 cores
• 16 high-density blades (2 servers each)
• Dual quad-core 2.66 GHz Intel Xeon
• 16 GB RAM
• 30 TB SATA storage
• Condor scheduler
• Theoretical peak GFlops: 1361.9
• Idle power: 4.1 kW
45
Task 8: Comparison plan
• Power use in active idle state
  – Beehive = 2.7 kW
  – Hotfoot = 4.1 kW
• Energy consumption at load (see the MPI sketch below):
  – Counting to one billion
  – Summing primes from 2 to 2 million (MPI)
  – Summing primes from 2 to 15 million (MPI)
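A minimal sketch of the kind of MPI job used in the comparison above, summing primes up to N across ranks with mpi4py; this is not the project's actual benchmark code:

```python
# Sum primes from 2 to N across MPI ranks.
# Run with e.g.: mpirun -n 14 python sum_primes.py
from mpi4py import MPI

N = 2_000_000  # the "2 to 2 million" case; use 15_000_000 for the larger run

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank checks a strided slice of the range, then results are reduced.
local_sum = sum(n for n in range(2 + rank, N + 1, size) if is_prime(n))
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Sum of primes up to {N}: {total}")
```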
46
Task 8: Energy use while running jobs
New cluster uses less energy to run research jobs than old cluster.
Job                                               Cluster   Runtime         Time Difference   Energy     Energy Difference
Count to one billion on 1 core                    Beehive   3.33 minutes    0.46 minutes      0.15 kWh   133%
                                                  Hotfoot   2.87 minutes                      0.20 kWh
Sum primes between 2 and 2 million on 14 cores    Beehive   13.02 minutes   8.09 minutes      0.61 kWh   57%
                                                  Hotfoot   4.93 minutes                      0.35 kWh
Sum primes between 2 and 15 million on 14 cores   Beehive   8.92 hours      5.05 hours        24.2 kWh   67%
                                                  Hotfoot   3.87 hours                        16.3 kWh
Sum primes between 2 and 15 million on 256 cores  Hotfoot   15.85 minutes   8.66 hours        1.3 kWh    5%

(Energy Difference is Hotfoot’s energy as a percentage of Beehive’s for the same job; the 256-core Hotfoot run is compared against Beehive’s 14-core run.)
47
Task 8: Summary
• Older cluster consumes less power and uses less energy at baseline
• Advantages of newer cluster are evident as utilization increases
48
Task 9: Power tuning
• Implement server-, BIOS-, and OS-level power tuning and power management (see the sketch below)
• Re-run benchmarks and service group comparisons to collect additional power usage data
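A minimal sketch of one OS-level knob of the kind task 9 covers, assuming a Linux host that exposes the cpufreq sysfs interface; this is not the project's actual tuning procedure:

```python
# Switch the Linux cpufreq governor via sysfs (requires root and a kernel
# with cpufreq support). "ondemand" scales CPU frequency down under light
# load, one example of OS-level power management.
import glob

TARGET_GOVERNOR = "ondemand"

def set_governor(governor: str) -> None:
    """Write the requested governor to every CPU's scaling_governor file."""
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"):
        with open(path, "w") as f:
            f.write(governor)

def current_governors() -> set[str]:
    """Report the distinct governors currently in use across CPUs."""
    paths = glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor")
    return {open(p).read().strip() for p in paths}

if __name__ == "__main__":
    set_governor(TARGET_GOVERNOR)
    print("governors now:", current_governors())
```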
49
Summary of Tasks 7, 8, and 9
• Task 7: In progress
• Task 8: Completed – the newer cluster uses less energy than the older cluster for research-intensive work
• Task 9: In progress
50
Closing Comments
Alan Crosswell, Associate VP and Chief Technologist, CUIT
Victoria Hamilton, Director of Research Initiatives, Office of the Executive Vice
President for Research (OEVPR)
51
Project Communications
• Conference presentations, publications, etc.
– Workshop planned for the January(?) timeframe
– Proposal submitted: 2010 EDUCAUSE Annual Meeting (October)
– 05/03/10: participation in NSF Workshop on Sustainable HPC
– 04/15/10: cited in ECAR Green IT study
– 03/03/10: participation in Datacenter Dynamics panel
– 10/20/09: presentation at AITP-LI meeting
– 10/06/09: presentation at Internet2 Member Meeting
– 03/04/09: participation in CANARIE Green IT workshop
– Cited in The Greening of IT (ISBN 978-0-13-715083-0)
• Project blog: http://blogs.cuit.columbia.edu/greendc
• Email: nyserda-committee@lists.columbia.edu
52
Thanks to many groups around Columbia
and within CUIT
• Facilities, Data Center, Operations
• Statistics and Astronomy
• NetDev, NetProj, NFS
• UNIX, WISE
• RCS, PM Office
Thank You, NYSERDA
This work is supported in part by the New York State Energy Research and Development
Authority (NYSERDA agreement number 11145). NYSERDA has not reviewed the information
contained herein, and the opinions expressed do not necessarily reflect those of NYSERDA or the
State of New York.
53